Alibaba-NLP/gte-Qwen2-7B-instruct
Embedding · Dense text encoder for semantic search and clustering.
Embedding model with an instruct profile, suited to vector search, reranking, and LlamaIndex ingestion. Call it through whichever mode (embeddings or completion) your deployment exposes under this model id.
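Because this is an instruct-style embedding model, the published gte-Qwen2-7B-instruct model card prefixes search queries (not the documents being searched) with a one-line task description. The sketch below assumes your deployment passes `input` strings through unchanged, so the client must add the prefix itself; check whether your server already does this before adopting it. `instruct_query` is a hypothetical helper name.

```shell
# Prefix a search query with a task instruction, following the
# "Instruct: <task>\nQuery: <query>" layout from the gte-Qwen2-7B-instruct
# model card. Documents/passages are embedded as-is, without this prefix.
# instruct_query is a hypothetical helper, not part of any AIGrid tooling.
instruct_query() {
  # $1 = one-sentence task description, $2 = raw query text
  printf 'Instruct: %s\nQuery: %s' "$1" "$2"
}
```

The prefix contains a newline, so let a JSON tool build the request body, for example `jq -n --arg m "Alibaba-NLP/gte-Qwen2-7B-instruct" --arg q "$(instruct_query "Given a web search query, retrieve relevant passages that answer the query" "what is sovereign AI?")" '{model: $m, input: [$q]}'`.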
- Best for: Embeddings
- Mode: Embedding
- Reasoning: No reasoning mode
- Tools / coding: n/a
- Throughput: No throughput info
At a glance
- Modalities: Text
- Vector / similarity workflows
- LlamaIndex Settings helper
- Keep API id in sync with the model catalog
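For vector and similarity workflows, embeddings are usually compared with cosine similarity. Below is a minimal POSIX shell sketch, assuming each vector has already been flattened to a comma-separated string; `cosine_similarity` is a hypothetical helper meant for quick checks, and a vector database or numeric library will be far faster for real workloads.

```shell
# Cosine similarity between two embedding vectors given as comma-separated
# strings, e.g. "0.12,-0.03,0.88". Prints the score with six decimals.
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ",")
    m = split(b, y, ",")
    if (n == 0 || n != m) {
      print "cosine_similarity: vectors must be non-empty and the same length" > "/dev/stderr"
      exit 1
    }
    dot = 0; na = 0; nb = 0
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # running dot product
      na  += x[i] * x[i]   # squared norm of the first vector
      nb  += y[i] * y[i]   # squared norm of the second vector
    }
    if (na == 0 || nb == 0) {
      print "cosine_similarity: zero-length vector" > "/dev/stderr"
      exit 1
    }
    printf "%.6f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}
```

Assuming the endpoint returns an OpenAI-style body with vectors under `data[].embedding`, `jq -r '.data[0].embedding | map(tostring) | join(",")'` turns the first vector into the expected format.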
Integration examples
These snippets target this model's API mode, Embeddings, with the model field set to Alibaba-NLP/gte-Qwen2-7B-instruct. If your deployment exposes the model under an alias, put that alias in the model field when you paste a snippet into your app. For general concepts, see the main documentation hub.
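To keep the model field in one place (for example when your deployment uses an alias), a script can read it from an environment variable and fall back to the catalog id. This is only a sketch: `EMBED_MODEL` and `embeddings_body` are hypothetical names, and the function does no JSON escaping, so it suits plain inputs without double quotes, backslashes, or newlines.

```shell
# Build the JSON body for /v1/embeddings. EMBED_MODEL is a hypothetical
# variable: set it to your deployment's alias, otherwise the catalog id is used.
# No JSON escaping is done, so inputs must not contain ", \ or newlines.
embeddings_body() {
  model="${EMBED_MODEL:-Alibaba-NLP/gte-Qwen2-7B-instruct}"
  printf '{"model": "%s", "input": [' "$model"
  sep=""
  for text in "$@"; do
    printf '%s"%s"' "$sep" "$text"   # emit each input as a JSON string
    sep=", "
  done
  printf ']}\n'
}
```

To use it with the curl request on this page, pipe its output into curl and replace the inline `-d '{...}'` argument with `-d @-`, which tells curl to read the request body from standard input.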
Keys and base URL
Replace YOUR_API_KEY with the key from Models after you sign in. This page's snippets use http://app.ai-grid.io:4000 when you are signed in (see DOCS_AUTHENTICATED_API_BASE_URL); otherwise they keep a placeholder host for sovereign-safe documentation.
Create embeddings
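Rather than pasting the key and host into every command, you can read them from the environment. A minimal sketch: `AIGRID_BASE_URL`, `AIGRID_API_KEY`, `embeddings_url`, and `create_embeddings` are hypothetical names, and the default host is the signed-in base URL described above.

```shell
# AIGRID_BASE_URL and AIGRID_API_KEY are hypothetical variable names.
# The default base URL is the signed-in host used by this page's snippets.
embeddings_url() {
  base="${AIGRID_BASE_URL:-http://app.ai-grid.io:4000}"
  base="${base%/}"   # drop one trailing slash to avoid "//v1/embeddings"
  printf '%s/v1/embeddings\n' "$base"
}

# POST a JSON body (passed as $1) to the embeddings endpoint.
create_embeddings() {
  # Abort with a clear message if the key is unset or empty.
  : "${AIGRID_API_KEY:?set AIGRID_API_KEY to the key from Models}"
  curl -sS "$(embeddings_url)" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AIGRID_API_KEY" \
    -d "$1"
}
```

For example, once `AIGRID_API_KEY` is exported, `create_embeddings '{"model": "Alibaba-NLP/gte-Qwen2-7B-instruct", "input": ["hello"]}'` posts a single input.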
Embeddings (`/v1/embeddings`)
```shell
curl http://app.ai-grid.io:4000/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "Alibaba-NLP/gte-Qwen2-7B-instruct",
    "input": [
      "AIGrid provides sovereign AI infrastructure.",
      "Embeddings power search, clustering, and retrieval."
    ]
  }'
```
Other models
gpt-oss-120b · Open-weight · Text LLM · General agents
API id: gpt-oss-120b
Large open stack for general reasoning, code, and assistants.
Throughput: 1,000 tokens/sec
- Strong default for assistants and tool loops
- OpenAI-compatible `/v1/chat/completions`
- Pair with LangGraph for stateful agents
Qwen3-30B-A3B-Thinking · Qwen · Text LLM · Reasoning
API id: Qwen3-30B-A3B-Thinking
Reasoning-forward 30B tier for planning and analysis.
Throughput: 660 tokens/sec
- Extended thinking style outputs
- Ideal for LangGraph flows
- Tune system prompts for chain-of-thought depth
google/gemma-4-31B · Google · Text LLM · High-volume chat
API id: google/gemma-4-31B
Fast 31B text generation for assistants and RAG answers.
Throughput: 580 tokens/sec
- Great for high-volume chat
- Pairs with LangChain ChatOpenAI
- Stable for LlamaIndex completion nodes
zai-org/GLM-OCR · Zhipu / Z.ai · OCR · PDF OCR
API id: zai-org/GLM-OCR
Document and image OCR with multimodal chat messages.
Throughput: 1.96 PDF pages/sec
- Vision + OCR style prompts
- Great for ingestion pipelines
- Resize images for latency
DeepSeek-OCR · DeepSeek · OCR · Image OCR
API id: deepseek-ocr
OCR-focused multimodal stack for documents.
Throughput: 860 tokens/sec
- Multimodal messages for scans
- Works with guarded BFF patterns
- Batch pages for throughput