Qwen3-30B-A3B-Thinking

Reasoning-forward 30B tier for planning and analysis.

Optimized for deliberate multi-step reasoning such as policy drafting, structured analysis, and orchestration prompts. Call it like any other chat model, passing your AIGrid deployment string as the model name.

Best for: Reasoning
Mode: Text LLM
Reasoning: Advanced reasoning
Tools / coding: Tools · Code
Throughput: 660 tokens/sec

At a glance

  • Modalities: Text
  • Extended thinking-style outputs
  • Ideal for LangGraph flows
  • Tune system prompts for chain-of-thought depth
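As a minimal sketch of the last bullet, the script below adds a system message that asks the model to keep its deliberation short. Loosen or tighten the wording and compare results, since how strictly a thinking model follows such instructions can vary. The prompt text, the `build_chat_body` and `json_escape` helpers, and the `AIGRID_API_KEY` / `AIGRID_BASE_URL` environment variables are illustrative assumptions, not part of the AIGrid API; the request mirrors the chat-completions call documented on this page.

```shell
#!/usr/bin/env bash
# Sketch: steer reasoning depth with a system message.
# The prompt wording and the AIGRID_* variable names are assumptions.
set -euo pipefail

# Escape a string so it can sit inside a JSON string literal.
json_escape() {
  local s=$1
  s=${s//'\'/'\\'}      # backslash -> \\
  s=${s//'"'/'\"'}      # double quote -> \"
  s=${s//$'\n'/'\n'}    # newline -> \n
  s=${s//$'\r'/'\r'}    # carriage return -> \r
  s=${s//$'\t'/'\t'}    # tab -> \t
  printf '%s' "$s"
}

# build_chat_body MODEL SYSTEM_PROMPT USER_PROMPT: print a chat-completions JSON body.
build_chat_body() {
  printf '{"model":"%s","messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Hypothetical depth instruction; tighten or loosen it and compare the outputs.
SYSTEM_PROMPT="Reason step by step, but keep the deliberation short: at most five brief steps, then give the final answer."

# Only send a request when a key is configured.
if [ -n "${AIGRID_API_KEY:-}" ]; then
  build_chat_body "Qwen3-30B-A3B-Thinking" "$SYSTEM_PROMPT" \
      "Draft a three-point data-retention policy for a small clinic." |
    curl -sS "${AIGRID_BASE_URL:-http://app.ai-grid.io:4000}/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${AIGRID_API_KEY}" \
      --data-binary @-
fi
```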

Integration examples

The snippets below target this model's API mode, chat completions, with the model field set to Qwen3-30B-A3B-Thinking. If your deployment uses an alias, put that alias in the model field when you paste a snippet into your app. For general concepts, see the main documentation hub.

Keys and base URL

Replace YOUR_API_KEY with the key from the Models page after you sign in. When you are signed in, the snippets on this page use http://app.ai-grid.io:4000 as the base URL (see DOCS_AUTHENTICATED_API_BASE_URL); otherwise they show a placeholder host so the documentation stays sovereign-safe.
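One way to keep the key out of pasted commands, and to switch to a deployment alias without editing each snippet, is to read all three values from the environment. This is a sketch only: `AIGRID_API_KEY`, `AIGRID_BASE_URL`, `AIGRID_MODEL`, and the `chat_url` helper are hypothetical names, and the defaults are the base URL and model id shown on this page.

```shell
#!/usr/bin/env bash
# Sketch: read the key, base URL, and model id from the environment.
# AIGRID_API_KEY, AIGRID_BASE_URL, and AIGRID_MODEL are hypothetical names.
set -euo pipefail

: "${AIGRID_BASE_URL:=http://app.ai-grid.io:4000}"  # base URL used on this page
: "${AIGRID_MODEL:=Qwen3-30B-A3B-Thinking}"         # set to your alias if you have one

# chat_url: print the chat-completions URL, tolerating a trailing slash in the base URL.
chat_url() {
  printf '%s/v1/chat/completions' "${AIGRID_BASE_URL%/}"
}

if [ -n "${AIGRID_API_KEY:-}" ]; then
  # Model ids and aliases are assumed to be plain identifiers (no quotes to escape).
  curl -sS "$(chat_url)" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${AIGRID_API_KEY}" \
    -d "{\"model\":\"${AIGRID_MODEL}\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello from AIGrid!\"}]}"
else
  echo "AIGRID_API_KEY is not set; copy your key from the Models page." >&2
fi
```

For example, `AIGRID_MODEL=my-alias AIGRID_API_KEY=... ./chat.sh` would send the same greeting to an aliased deployment (`my-alias` and `chat.sh` are placeholders).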

Send a chat completion

Endpoint: POST /v1/chat/completions

curl http://app.ai-grid.io:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "Qwen3-30B-A3B-Thinking",
    "messages": [
      { "role": "user", "content": "Hello from AIGrid!" }
    ]
  }'
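Qwen3 thinking models usually emit their deliberation before a closing `</think>` tag. Depending on the deployment's chat template and server settings, the returned content may contain both `<think>` and `</think>`, only the closing tag, or no tags at all (when the server moves the reasoning into a separate response field). The sketch below assumes the tags arrive inline in the message content; with an OpenAI-compatible response you could pull that content out first, for example with `jq -r '.choices[0].message.content'`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split a Qwen3 thinking reply into its reasoning trace and final answer.
# Assumes the tags arrive inline in the message content (deployment-dependent).
set -euo pipefail

# trim TEXT: strip leading and trailing whitespace, including newlines.
trim() {
  local s=$1
  s=${s#"${s%%[![:space:]]*}"}
  s=${s%"${s##*[![:space:]]}"}
  printf '%s' "$s"
}

# answer_part TEXT: everything after the last </think>, or TEXT itself if no tag.
answer_part() {
  trim "${1##*'</think>'}"
}

# reasoning_part TEXT: everything before the first </think>, minus an opening <think>.
# Prints nothing when the reply has no </think> tag.
reasoning_part() {
  case $1 in
    *'</think>'*) ;;
    *) return 0 ;;
  esac
  local r
  r=${1%%'</think>'*}
  trim "${r#*'<think>'}"
}

# Example reply shaped like the output of a thinking model.
reply=$'<think>\nThe user wants a greeting.\n</think>\n\nHello from AIGrid!'
printf 'Reasoning: %s\n' "$(reasoning_part "$reply")"
printf 'Answer: %s\n' "$(answer_part "$reply")"
```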

Other models

Open-weight · Text LLM

gpt-oss-120b

Large open stack for general reasoning, code, and assistants.

Best for: General agents
Capabilities: Text · Multimodal · Advanced reasoning · Tools · Code
Throughput: 1,000 tokens/sec

  • Strong default for assistants and tool loops
  • OpenAI-compatible `/v1/chat/completions`
  • Pair with LangGraph for stateful agents
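A sketch of the first request in such a tool loop, assuming the gpt-oss-120b deployment honors the `tools` and `tool_choice` fields as the OpenAI Chat Completions API defines them (consistent with the OpenAI-compatible endpoint noted above, but not confirmed on this page). `get_weather` is a hypothetical tool, and the helper and environment-variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: first request of an OpenAI-style tool loop with gpt-oss-120b.
# get_weather is a hypothetical tool; AIGRID_* variable names are assumptions.
set -euo pipefail

# Escape a string so it can sit inside a JSON string literal.
json_escape() {
  local s=$1
  s=${s//'\'/'\\'}
  s=${s//'"'/'\"'}
  s=${s//$'\n'/'\n'}
  s=${s//$'\r'/'\r'}
  s=${s//$'\t'/'\t'}
  printf '%s' "$s"
}

# tool_request_body PROMPT: print a body that declares one function tool.
tool_request_body() {
  local prompt
  prompt=$(json_escape "$1")
  cat <<EOF
{
  "model": "gpt-oss-120b",
  "messages": [{ "role": "user", "content": "$prompt" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up the current weather for a city.",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }],
  "tool_choice": "auto"
}
EOF
}

if [ -n "${AIGRID_API_KEY:-}" ]; then
  tool_request_body "What's the weather in Oslo right now?" |
    curl -sS "${AIGRID_BASE_URL:-http://app.ai-grid.io:4000}/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${AIGRID_API_KEY}" \
      --data-binary @-
fi
```

If the reply's assistant message contains `tool_calls`, run the named tool yourself, then repeat the request with that assistant message plus a `role: "tool"` message carrying the result and the matching `tool_call_id`.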

Google · Text LLM

google/gemma-4-31B

Fast 31B text generation for assistants and RAG answers.

Best for: High-volume chat
Capabilities: Text · Standard reasoning · Tools · Code
Throughput: 580 tokens/sec

  • Great for high-volume chat
  • Pairs with LangChain ChatOpenAI
  • Stable for LlamaIndex completion nodes

Alibaba · Embedding

Alibaba-NLP/gte-Qwen2-7B-instruct

Dense text encoder for semantic search and clustering.

Best for: Embeddings
Capabilities: Text · No reasoning mode
Throughput: not listed

  • Vector / similarity workflows
  • LlamaIndex Settings helper
  • Keep API id in sync with the model catalog
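A minimal request sketch, assuming the deployment exposes an OpenAI-style `/v1/embeddings` route (this page documents only the chat-completions route, so check your deployment). The model string lives in one variable so it is easy to keep in sync with the catalog; `embedding_body`, `json_escape`, and the `AIGRID_*` variables are hypothetical names. In an OpenAI-compatible response, each vector would appear under `data[i].embedding` in input order.

```shell
#!/usr/bin/env bash
# Sketch: request embeddings from gte-Qwen2-7B-instruct.
# Assumes an OpenAI-style /v1/embeddings route; AIGRID_* names are hypothetical.
set -euo pipefail

EMBED_MODEL="Alibaba-NLP/gte-Qwen2-7B-instruct"  # keep in sync with the model catalog

# Escape a string so it can sit inside a JSON string literal.
json_escape() {
  local s=$1
  s=${s//'\'/'\\'}
  s=${s//'"'/'\"'}
  s=${s//$'\n'/'\n'}
  s=${s//$'\r'/'\r'}
  s=${s//$'\t'/'\t'}
  printf '%s' "$s"
}

# embedding_body TEXT...: print a body with one input string per argument.
embedding_body() {
  local inputs="" item
  for item in "$@"; do
    inputs+="${inputs:+,}\"$(json_escape "$item")\""
  done
  printf '{"model":"%s","input":[%s]}' "$EMBED_MODEL" "$inputs"
}

if [ -n "${AIGRID_API_KEY:-}" ]; then
  embedding_body "How do I rotate an API key?" "Key rotation policy for AIGrid tenants" |
    curl -sS "${AIGRID_BASE_URL:-http://app.ai-grid.io:4000}/v1/embeddings" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${AIGRID_API_KEY}" \
      --data-binary @-
fi
```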

Zhipu / Z.ai · OCR

zai-org/GLM-OCR

Document and image OCR with multimodal chat messages.

Best for: PDF OCR
Capabilities: Image · Multimodal · No reasoning mode
Throughput: 1.96 PDF pages/sec

  • Vision + OCR style prompts
  • Great for ingestion pipelines
  • Resize images for latency
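For the multimodal chat messages mentioned above, a common convention is to send the image as a base64 `data:` URL inside an `image_url` content part. This sketch assumes the GLM-OCR deployment accepts that OpenAI-style format and defaults to PNG input; the instruction text, helper names, and `AIGRID_*` variables are illustrative. The body is piped to curl with `--data-binary @-` rather than passed with `-d`, because a base64-encoded page can exceed the operating system's per-argument size limit.

```shell
#!/usr/bin/env bash
# Sketch: OCR one image with GLM-OCR through a multimodal chat message.
# Assumes OpenAI-style image_url content parts with base64 data URLs.
set -euo pipefail

# image_data_url FILE [MIME]: print a data: URL with the file base64-encoded on one line.
image_data_url() {
  local mime=${2:-image/png}
  printf 'data:%s;base64,%s' "$mime" "$(base64 < "$1" | tr -d '\n')"
}

# ocr_body FILE [MIME]: print a chat body with an instruction and one image part.
ocr_body() {
  printf '{"model":"zai-org/GLM-OCR","messages":[{"role":"user","content":[%s,%s]}]}' \
    '{"type":"text","text":"Extract all text from this image."}' \
    "{\"type\":\"image_url\",\"image_url\":{\"url\":\"$(image_data_url "$1" "${2:-image/png}")\"}}"
}

# Usage: ./ocr.sh page.png
if [ -n "${AIGRID_API_KEY:-}" ] && [ "$#" -ge 1 ]; then
  ocr_body "$1" |
    curl -sS "${AIGRID_BASE_URL:-http://app.ai-grid.io:4000}/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${AIGRID_API_KEY}" \
      --data-binary @-
fi
```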

DeepSeek · OCR

DeepSeek-OCR

OCR-focused multimodal stack for documents.

Best for: Image OCR
Capabilities: Image · Multimodal · No reasoning mode
Model id: deepseek-ocr
Throughput: 860 tokens/sec

  • Multimodal messages for scans
  • Works with guarded backend-for-frontend (BFF) patterns
  • Batch pages for throughput
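To batch pages for throughput, one option is to send several page requests concurrently. The sketch below assumes PNG page images, an OpenAI-style `image_url` message, and the `deepseek-ocr` model id from the card above; `MAX_PARALLEL`, `ocr_page`, `run_batch`, the `AIGRID_*` variables, and the `.json` output naming are hypothetical. It waits for each group of jobs to finish before starting the next, which sticks to bash 3.2 features at the cost of some idle time when one page in a group is slow.

```shell
#!/usr/bin/env bash
# Sketch: OCR many page images with deepseek-ocr, a few requests at a time.
# MAX_PARALLEL, the .json output naming, and PNG input are assumptions.
set -euo pipefail

MAX_PARALLEL=${MAX_PARALLEL:-4}   # positive integer; tune to your quota

# ocr_page IMAGE: send one page and save the raw JSON reply as IMAGE.json.
ocr_page() {
  local img=$1 b64
  b64=$(base64 < "$img" | tr -d '\n')
  curl -sS "${AIGRID_BASE_URL:-http://app.ai-grid.io:4000}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${AIGRID_API_KEY}" \
    --data-binary @- -o "$img.json" <<EOF
{"model":"deepseek-ocr","messages":[{"role":"user","content":[{"type":"text","text":"Transcribe this page."},{"type":"image_url","image_url":{"url":"data:image/png;base64,${b64}"}}]}]}
EOF
}

# run_batch CMD FILE...: run CMD on each file in the background, waiting for each
# group of MAX_PARALLEL jobs to finish before starting the next group.
run_batch() {
  local cmd=$1 i=0 f
  shift
  for f in "$@"; do
    "$cmd" "$f" &
    i=$((i + 1))
    if (( i % MAX_PARALLEL == 0 )); then
      wait
    fi
  done
  wait
}

# Usage: ./batch-ocr.sh pages/*.png
if [ -n "${AIGRID_API_KEY:-}" ] && [ "$#" -gt 0 ]; then
  run_batch ocr_page "$@"
fi
```

For example, `MAX_PARALLEL=8 ./batch-ocr.sh pages/*.png` would write `pages/p1.png.json` next to `pages/p1.png`, and so on for each page.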