Model library
Compare AIGrid-ready chat, embedding, and OCR models by capability, throughput, and best-fit workload. Each card links to model-specific integration examples.
Model type
Showing 6 of 6 models
Open-weight · Text LLM
gpt-oss-120b
General agents
gpt-oss-120b
Large open stack for general reasoning, code, and assistants.
Throughput
1,000 tokens/sec
- Strong default for assistants and tool loops
- OpenAI-compatible `/v1/chat/completions`
- Pair with LangGraph for stateful agents
Qwen · Text LLM
Qwen3-30B-A3B-Thinking
Reasoning
Qwen3-30B-A3B-Thinking
Reasoning-forward 30B tier for planning and analysis.
Throughput
660 tokens/sec
- Extended thinking style outputs
- Ideal for LangGraph flows
- Tune system prompts for chain-of-thought depth
Google · Text LLM
google/gemma-4-31B
High-volume chat
google/gemma-4-31B
Fast 31B text generation for assistants and RAG answers.
Throughput
580 tokens/sec
- Great for high-volume chat
- Pairs with LangChain ChatOpenAI
- Stable for LlamaIndex completion nodes
Zhipu / Z.ai · OCR
zai-org/GLM-OCR
PDF OCR
zai-org/GLM-OCR
Document and image OCR with multimodal chat messages.
Throughput
1.96 pages PDF/sec
- Vision + OCR style prompts
- Great for ingestion pipelines
- Resize images for latency
DeepSeek · OCR
DeepSeek-OCR
Image OCR
deepseek-ocr
OCR-focused multimodal stack for documents.
Throughput
860 tokens/sec
- Multimodal messages for scans
- Works with guarded BFF patterns
- Batch pages for throughput
Alibaba · Embedding
Alibaba-NLP/gte-Qwen2-7B-instruct
Embeddings
Alibaba-NLP/gte-Qwen2-7B-instruct
Dense text encoder for semantic search and clustering.
Throughput
No throughput info
- Vector / similarity workflows
- LlamaIndex Settings helper
- Keep API id in sync with the model catalog