gpt-oss-120b

Large open-weight model for general reasoning, code, and assistant workloads.

Use gpt-oss-120b for production chat and multi-step agents. Point LangChain, LangGraph, or plain HTTP clients at your AIGrid instance so that traffic stays under your organization's API key and routing policy.

Best for: General agents
Mode: Text LLM
Reasoning: Advanced reasoning
Tools / coding: Tools · Code
Throughput: 1,000 tokens/sec

Model pricing

Token prices are listed in Algerian dinars (DA) per one million tokens.

Input price: 19 DA / 1M tokens
Output price: 75 DA / 1M tokens
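Input and output tokens are billed at different rates, so the cost of one request is (input tokens × 19 + output tokens × 75) / 1,000,000 DA. A minimal shell sketch of that arithmetic follows; the function and variable names are illustrative, and it assumes billing is purely linear in token counts, with no minimum charge or rounding.

```bash
#!/usr/bin/env bash
# Prices from this page, in DA per 1M tokens; update them if the catalog changes.
INPUT_PRICE_DA_PER_M=19
OUTPUT_PRICE_DA_PER_M=75

# estimate_cost_da INPUT_TOKENS OUTPUT_TOKENS
# Prints the cost in DA, assuming purely linear per-token billing.
estimate_cost_da() {
  awk -v in_tok="$1" -v out_tok="$2" \
      -v in_price="$INPUT_PRICE_DA_PER_M" -v out_price="$OUTPUT_PRICE_DA_PER_M" \
      'BEGIN { printf "%.4f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}

# A 2,000-token prompt with an 800-token answer; prints 0.0980
estimate_cost_da 2000 800
```

With these prices, that example request costs 0.098 DA (0.038 DA for input plus 0.060 DA for output).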

At a glance

Modalities: Text
Strong default for assistants and tool loops
OpenAI-compatible `/v1/chat/completions`
Pair with LangGraph for stateful agents
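Because the endpoint is OpenAI-compatible, a tool loop would normally start by declaring functions in the request's `tools` array. The sketch below assumes AIGrid accepts that field for gpt-oss-120b; `get_weather`, `build_tool_payload`, and the `AIGRID_API_KEY` / `AIGRID_BASE_URL` variables are hypothetical names for this example, and the request is only sent when a key is set.

```bash
#!/usr/bin/env bash
# Sketch: declare one function tool in the OpenAI-compatible "tools" format.
# Assumption: AIGrid honors "tools" for gpt-oss-120b; get_weather is hypothetical.
AIGRID_BASE_URL="${AIGRID_BASE_URL:-https://app.ai-grid.io}"

build_tool_payload() {
  cat <<'JSON'
{
  "model": "gpt-oss-120b",
  "messages": [
    { "role": "user", "content": "What is the weather in Algiers?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ]
}
JSON
}

# Only call the API when a key is configured; the body is read from stdin.
if [ -n "${AIGRID_API_KEY:-}" ]; then
  build_tool_payload | curl -sS "$AIGRID_BASE_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AIGRID_API_KEY" \
    --data-binary @-
fi
```

In the OpenAI-compatible format, a tool call would come back in `choices[0].message.tool_calls`; your agent loop runs the function and sends the result back as a `tool` role message before the next turn.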

Integration examples

These snippets target this model's API mode, chat completions, with the `model` field set to `gpt-oss-120b`. If your deployment uses an alias, use that value instead when you paste a snippet into your app. For general concepts, see the main documentation hub.

Keys and base URL

Replace `YOUR_API_KEY` with the key from the Models page after you sign in. When you are signed in, the snippets on this page use `https://app.ai-grid.io` as the base URL (see `DOCS_AUTHENTICATED_API_BASE_URL`); otherwise they show a placeholder host, the sovereign-safe default for public documentation.
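To avoid pasting the key and host into every snippet, you can keep them in environment variables. This sketch assumes the variable names `AIGRID_API_KEY` and `AIGRID_BASE_URL` (chosen for this example, not defined by AIGrid) and falls back to the signed-in host used on this page; `chat_completions_url` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical variable names for this example; set them in your shell profile
# or secret manager, e.g.:
#   export AIGRID_API_KEY="YOUR_API_KEY"   # key from the Models page
#   export AIGRID_BASE_URL="https://app.ai-grid.io"

# Fall back to the signed-in host shown on this page.
AIGRID_BASE_URL="${AIGRID_BASE_URL:-https://app.ai-grid.io}"

# Print the chat completions endpoint, dropping one trailing slash from the base URL.
chat_completions_url() {
  printf '%s/v1/chat/completions\n' "${AIGRID_BASE_URL%/}"
}

chat_completions_url
```

Later requests can then call `curl "$(chat_completions_url)" -H "Authorization: Bearer $AIGRID_API_KEY" ...` instead of hard-coding the URL and key.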

For vision and OCR models, send multimodal user content: mix `image_url` and `text` parts in the `messages` array, following the same schema as OpenAI-compatible vision chat.
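A minimal sketch of such a multimodal message follows. It uses the `zai-org/GLM-OCR` model ID from the catalog below purely as an example, builds the JSON with `printf` (so it assumes the prompt and URL contain no double quotes or backslashes), and names the helper `build_vision_payload` for illustration only. Whether a given OCR model accepts remote URLs, base64 `data:` URLs, or both is an assumption to check against that model's documentation.

```bash
#!/usr/bin/env bash
# Sketch: one user message that mixes a "text" part and an "image_url" part.
# Assumption: arguments contain no double quotes or backslashes, because
# the JSON is assembled with printf and nothing is escaped.

build_vision_payload() {
  local model="$1" image_url="$2" prompt="$3"
  printf '{"model":"%s","messages":[{"role":"user","content":[' "$model"
  printf '{"type":"text","text":"%s"},' "$prompt"
  printf '{"type":"image_url","image_url":{"url":"%s"}}' "$image_url"
  printf ']}]}\n'
}

# For a local scan, a base64 data URL could replace the https URL, e.g.:
#   image_url="data:image/png;base64,$(base64 < scan.png | tr -d '\n')"
build_vision_payload "zai-org/GLM-OCR" "https://example.com/scan.png" \
  "Extract all text from this page."

# Send it only when a key is configured.
if [ -n "${AIGRID_API_KEY:-}" ]; then
  build_vision_payload "zai-org/GLM-OCR" "https://example.com/scan.png" \
    "Extract all text from this page." |
    curl -sS https://app.ai-grid.io/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $AIGRID_API_KEY" \
      --data-binary @-
fi
```

For arbitrary prompt text, a JSON tool such as `jq` is a safer way to build the body than `printf`.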

Send a chat completion

Endpoint: `POST /v1/chat/completions`

```bash
curl https://app.ai-grid.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [
      { "role": "user", "content": "Hello from AIGrid!" }
    ]
  }'
```
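Chat completion responses follow the OpenAI-compatible shape, so the reply text would sit at `choices[0].message.content`. The sketch below extracts it; it assumes `python3` is available (with `jq`, `jq -r '.choices[0].message.content'` is equivalent), and `extract_reply` and `AIGRID_API_KEY` are hypothetical names for this example.

```bash
#!/usr/bin/env bash
# Sketch: print only the assistant's reply from a chat completion response.
# Assumes the OpenAI-compatible shape choices[0].message.content and python3.
# Prints "None" if content is null (for example, when the model returned tool calls).

extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Offline check against a canned response body; prints: Hello!
echo '{"choices":[{"index":0,"message":{"role":"assistant","content":"Hello!"}}]}' | extract_reply

# Live call, only when a key is configured.
if [ -n "${AIGRID_API_KEY:-}" ]; then
  curl -sS https://app.ai-grid.io/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AIGRID_API_KEY" \
    -d '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "Hello from AIGrid!"}]}' |
    extract_reply
fi
```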

Other models

Qwen · Text LLM

Qwen3-30B-A3B-Thinking

Best for: Reasoning

Text · Advanced reasoning · Tools · Code

Model ID: `Qwen3-30B-A3B-Thinking`

Reasoning-forward 30B tier for planning and analysis.

Throughput: 660 tokens/sec
Input: 73 DA / 1M tokens
Output: 148 DA / 1M tokens

Extended thinking style outputs
Ideal for LangGraph flows
Tune system prompts for chain-of-thought depth
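One way to steer reasoning depth is a system message ahead of the user turn. The sketch below assumes the model ID `Qwen3-30B-A3B-Thinking` shown on this card and uses illustrative prompt wording; how strongly a system prompt changes the length of the model's thinking is not specified here, so treat the wording as a starting point to tune. `build_reasoning_payload` is a hypothetical helper and does no JSON escaping.

```bash
#!/usr/bin/env bash
# Sketch: steer reasoning depth with a system message (illustrative wording).
# Assumption: arguments contain no double quotes or backslashes.

build_reasoning_payload() {
  local system_prompt="$1" question="$2"
  printf '{"model":"Qwen3-30B-A3B-Thinking","messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}]}\n' \
    "$system_prompt" "$question"
}

build_reasoning_payload \
  "Think step by step, but keep your reasoning to a few short steps before answering." \
  "Plan a three-step migration from one database to another."

# Send it only when a key is configured.
if [ -n "${AIGRID_API_KEY:-}" ]; then
  build_reasoning_payload "Reason briefly, then answer." "Plan a three-step migration." |
    curl -sS https://app.ai-grid.io/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $AIGRID_API_KEY" \
      --data-binary @-
fi
```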

Google · Text LLM

google/gemma-4-31B

Best for: High-volume chat

Text · Standard reasoning · Tools · Code

Model ID: `google/gemma-4-31B`

Fast 31B text generation for assistants and RAG answers.

Throughput: 580 tokens/sec
Input: 73 DA / 1M tokens
Output: 148 DA / 1M tokens

Great for high-volume chat
Pairs with LangChain ChatOpenAI
Stable for LlamaIndex completion nodes

Alibaba · Embedding

Alibaba-NLP/gte-Qwen2-7B-instruct

Best for: Embeddings

Text · No reasoning mode

Model ID: `Alibaba-NLP/gte-Qwen2-7B-instruct`

Dense text encoder for semantic search and clustering.

Throughput: not listed
Input: Free
Output: Free

Vector / similarity workflows
LlamaIndex Settings helper
Keep API id in sync with the model catalog
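For semantic search, the encoder would typically be called once per batch of texts, with the returned vectors stored in your index. This sketch assumes AIGrid exposes an OpenAI-compatible `/v1/embeddings` endpoint for `Alibaba-NLP/gte-Qwen2-7B-instruct` (this page only documents chat completions), and `build_embedding_payload` is a hypothetical helper that does no JSON escaping.

```bash
#!/usr/bin/env bash
# Sketch: batch several texts into one embeddings request.
# Assumptions: an OpenAI-compatible /v1/embeddings endpoint exists, and the
# texts contain no double quotes or backslashes (nothing is escaped here).

build_embedding_payload() {
  local first=1 text
  printf '{"model":"Alibaba-NLP/gte-Qwen2-7B-instruct","input":['
  for text in "$@"; do
    # Comma-separate every string after the first.
    if [ "$first" -eq 1 ]; then first=0; else printf ','; fi
    printf '"%s"' "$text"
  done
  printf ']}\n'
}

build_embedding_payload "How do I rotate an API key?" "Routing policies per organization"

# Send it only when a key is configured.
if [ -n "${AIGRID_API_KEY:-}" ]; then
  build_embedding_payload "How do I rotate an API key?" "Routing policies per organization" |
    curl -sS https://app.ai-grid.io/v1/embeddings \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $AIGRID_API_KEY" \
      --data-binary @-
fi
```

In the OpenAI-compatible format, each vector would be returned at `data[i].embedding`, in the same order as the inputs.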

Zhipu / Z.ai · OCR

zai-org/GLM-OCR

Best for: PDF OCR

Image · Multimodal · No reasoning mode

Model ID: `zai-org/GLM-OCR`

Document and image OCR with multimodal chat messages.

Throughput: 1.96 PDF pages/sec
Input: 8 DA / 1M tokens
Output: 8 DA / 1M tokens

Vision + OCR style prompts
Great for ingestion pipelines
Resize images for latency

DeepSeek · OCR

DeepSeek-OCR

Best for: Image OCR

Image · Multimodal · No reasoning mode

Model ID: `deepseek-ocr`

OCR-focused multimodal stack for documents.

Throughput: 860 tokens/sec
Input: 8 DA / 1M tokens
Output: 8 DA / 1M tokens

Multimodal messages for scans
Works with guarded BFF patterns
Batch pages for throughput
