DeepSeek-OCR
OCR
deepseek-ocr
OCR-focused multimodal stack for documents.
Use DeepSeek-OCR when you need resilient text extraction across mixed-quality captures. Same chat contract as your text models — swap the `model` field only.
Best for
Image OCR
Mode
OCR
Reasoning
No reasoning mode
Tools / coding
—
Throughput
860 tokens/sec
At a glance
- Modalities: Image · Multimodal
- Multimodal messages for scans
- Works with guarded backend-for-frontend (BFF) patterns
- Batch pages for throughput
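One possible reading of "Batch pages for throughput" is to send several single-page requests concurrently instead of one page at a time. The sketch below assumes that reading. `ocr_pages` and the `send` callable are hypothetical names, and `send` stands in for whatever server-side client posts one page image and returns its text; `max_workers=4` is an assumed starting point, not a documented limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def ocr_pages(
    pages: Sequence[bytes],
    send: Callable[[bytes], str],
    max_workers: int = 4,
) -> List[str]:
    """Run `send` on every page concurrently and return texts in page order.

    `send` is a hypothetical callable that posts one page image and returns
    the extracted text. Tune `max_workers` against your gateway's rate limits.
    """
    if not pages:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so page order is kept
        # even when individual requests finish out of order.
        return list(pool.map(send, pages))


if __name__ == "__main__":
    # Stand-in sender for a dry run; swap in a real HTTP call in your app.
    def fake_send(page: bytes) -> str:
        return f"{len(page)} bytes received"

    print(ocr_pages([b"page-1", b"page-two"], fake_send))
```

Passing the sender in as a callable keeps the API key inside the server-side function, which fits the guarded BFF pattern mentioned above.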
Integration examples
These snippets are adapted for this model's API mode: OCR. The `model` field is set to `deepseek-ocr`. If your deployment uses an alias, use that alias instead when you paste a snippet into your app. For general concepts, see the main documentation hub.
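A minimal sketch of keeping the model id in one place, so that a deployment alias can replace `deepseek-ocr` without touching the rest of the request. The names `resolve_model` and `chat_request` and the `OCR_MODEL` environment variable are hypothetical, introduced only for this illustration.

```python
import os
from typing import Any, Dict, List, Optional

DEFAULT_MODEL = "deepseek-ocr"  # model id used by this page's snippets


def resolve_model(alias: Optional[str] = None) -> str:
    """Pick the model id: explicit alias, then OCR_MODEL, then the default.

    OCR_MODEL is a hypothetical environment variable used only in this sketch.
    """
    return alias or os.environ.get("OCR_MODEL") or DEFAULT_MODEL


def chat_request(
    messages: List[Dict[str, Any]],
    alias: Optional[str] = None,
    max_tokens: int = 1200,
) -> Dict[str, Any]:
    """Build a chat-completions body; only the `model` value varies by deployment."""
    return {
        "model": resolve_model(alias),
        "messages": messages,
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    # "ocr-prod" is a made-up alias for demonstration.
    print(chat_request([{"role": "user", "content": "ping"}], alias="ocr-prod"))
```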
Keys and base URL
Replace YOUR_API_KEY with the key from Models after you sign in. This page's snippets use http://app.ai-grid.io:4000 when you are signed in (see DOCS_AUTHENTICATED_API_BASE_URL). Otherwise they keep the placeholder host for sovereign-safe documentation.
For PDFs, render each page to PNG/JPEG first, then send each page as image input. Keep files small enough for browser or gateway limits.
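The per-page flow described above can be sketched with the Python standard library. `build_ocr_request`, `MAX_IMAGE_BYTES`, and the 5 MB cap are assumptions for illustration, since this page does not state an actual size limit. The message shape mirrors the curl example on this page.

```python
import base64
import json
from typing import Any, Dict

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumed cap; check your gateway's real limit


def build_ocr_request(
    image: bytes,
    mime: str = "image/png",
    model: str = "deepseek-ocr",
) -> Dict[str, Any]:
    """Build a chat-completions body that carries one page image as a data URL."""
    if mime not in ("image/png", "image/jpeg"):
        raise ValueError(f"unsupported image type: {mime}")
    if len(image) > MAX_IMAGE_BYTES:
        raise ValueError("image too large; downscale or re-encode the page first")
    encoded = base64.b64encode(image).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Extract all readable text. Preserve tables and line breaks.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime};base64,{encoded}"},
                    },
                ],
            }
        ],
        "max_tokens": 1200,
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a rendered page; read a real PNG in practice.
    body = build_ocr_request(b"placeholder page bytes")
    print(json.dumps(body, indent=2)[:300])
```

The returned dictionary can be serialized with `json.dumps` and posted to `/v1/chat/completions` with the same `Authorization: Bearer` header the curl snippet uses. For a multi-page PDF, call the function once per rendered page.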
Extract text from an image
OCR
/v1/chat/completions
# Base64-encode the page on a single line (-w 0 is the GNU coreutils flag that disables line wrapping).
IMAGE_DATA="$(base64 -w 0 document-page.png)"
curl http://app.ai-grid.io:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Extract all readable text. Preserve tables and line breaks." },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,'"$IMAGE_DATA"'"
            }
          }
        ]
      }
    ],
    "max_tokens": 1200
  }'
Other models
Open-weight · Text LLM
gpt-oss-120b
General agents
gpt-oss-120b
Large open stack for general reasoning, code, and assistants.
Throughput
1,000 tokens/sec
- Strong default for assistants and tool loops
- OpenAI-compatible `/v1/chat/completions`
- Pair with LangGraph for stateful agents
Qwen · Text LLM
Qwen3-30B-A3B-Thinking
Reasoning
Qwen3-30B-A3B-Thinking
Reasoning-forward 30B tier for planning and analysis.
Throughput
660 tokens/sec
- Extended thinking style outputs
- Ideal for LangGraph flows
- Tune system prompts for chain-of-thought depth
Google · Text LLM
google/gemma-4-31B
High-volume chat
google/gemma-4-31B
Fast 31B text generation for assistants and RAG answers.
Throughput
580 tokens/sec
- Great for high-volume chat
- Pairs with LangChain ChatOpenAI
- Stable for LlamaIndex completion nodes
Alibaba · Embedding
Alibaba-NLP/gte-Qwen2-7B-instruct
Embeddings
Alibaba-NLP/gte-Qwen2-7B-instruct
Dense text encoder for semantic search and clustering.
Throughput
No throughput info
- Vector / similarity workflows
- LlamaIndex Settings helper
- Keep API id in sync with the model catalog
Zhipu / Z.ai · OCR
zai-org/GLM-OCR
PDF OCR
zai-org/GLM-OCR
Document and image OCR with multimodal chat messages.
Throughput
1.96 PDF pages/sec
- Vision + OCR style prompts
- Great for ingestion pipelines
- Resize images for latency