Alibaba-NLP/gte-Qwen2-7B-instruct

EmbeddingAlibaba-NLP/gte-Qwen2-7B-instruct

Dense text encoder for semantic search and clustering.

Embedding / instruct profile suited to vector search, reranking, and LlamaIndex ingestion. Use the embeddings or completion mode your deployment exposes under this model id.

Best for

Embeddings

Mode

Embedding

Reasoning

No reasoning mode

Tools / coding

Throughput

No throughput info

At a glance

  • Modalities: Text
  • Vector / similarity workflows
  • LlamaIndex Settings helper
  • Keep API id in sync with the model catalog

Integration examples

These snippets are adapted for this model's API mode: Embeddings. The model field is set to Alibaba-NLP/gte-Qwen2-7B-instruct. If your deployment uses an alias, mirror that value when you paste into your app. For general concepts, see the main documentation hub.

Deploy a model →

Keys and base URL

Replace YOUR_API_KEY with the key from Models after you sign in. This page's snippets use http://app.ai-grid.io:4000 when you are signed in (see DOCS_AUTHENTICATED_API_BASE_URL). Otherwise they keep the placeholder host for sovereign-safe documentation.

Create embeddings

Embeddings

/v1/embeddings
Embeddings · HTTP
curl http://app.ai-grid.io:4000/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "Alibaba-NLP/gte-Qwen2-7B-instruct",
    "input": [
      "AIGrid provides sovereign AI infrastructure.",
      "Embeddings power search, clustering, and retrieval."
    ]
  }'

Other models

Open-weight · Text LLM

gpt-oss-120b

General agents

GO
TextMultimodalAdvanced reasoningToolsCode

gpt-oss-120b

Large open stack for general reasoning, code, and assistants.

Throughput

1,000 tokens/sec

  • Strong default for assistants and tool loops
  • OpenAI-compatible `/v1/chat/completions`
  • Pair with LangGraph for stateful agents

Qwen · Text LLM

Qwen3-30B-A3B-Thinking

Reasoning

Q3
TextAdvanced reasoningToolsCode

Qwen3-30B-A3B-Thinking

Reasoning-forward 30B tier for planning and analysis.

Throughput

660 tokens/sec

  • Extended thinking style outputs
  • Ideal for LangGraph flows
  • Tune system prompts for chain-of-thought depth

Google · Text LLM

google/gemma-4-31B

High-volume chat

G4
TextStandard reasoningToolsCode

google/gemma-4-31B

Fast 31B text generation for assistants and RAG answers.

Throughput

580 tokens/sec

  • Great for high-volume chat
  • Pairs with LangChain ChatOpenAI
  • Stable for LlamaIndex completion nodes

Zhipu / Z.ai · OCR

zai-org/GLM-OCR

PDF OCR

GL
ImageMultimodalNo reasoning mode

zai-org/GLM-OCR

Document and image OCR with multimodal chat messages.

Throughput

1.96 pages PDF/sec

  • Vision + OCR style prompts
  • Great for ingestion pipelines
  • Resize images for latency

DeepSeek · OCR

DeepSeek-OCR

Image OCR

DS
ImageMultimodalNo reasoning mode

deepseek-ocr

OCR-focused multimodal stack for documents.

Throughput

860 tokens/sec

  • Multimodal messages for scans
  • Works with guarded BFF patterns
  • Batch pages for throughput