zai-org/GLM-OCR

Document and image OCR with multimodal chat messages.

Send OpenAI-style multimodal user content (image + instructions) to extract text from scans, PDF renders, and screenshots.

Best for: PDF OCR
Mode: OCR
Reasoning: No reasoning mode
Tools / coding: none listed
Throughput: 1.96 PDF pages/sec
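As a rough planning aid, the listed throughput can be turned into a batch-time estimate. This is a minimal sketch that assumes the 1.96 pages/sec figure is a sustained average for rendered PDF pages; the function name and that interpretation are assumptions, not something this page states.

```python
# Assumption: the throughput listed above is a sustained average, not a guarantee.
PAGES_PER_SECOND = 1.96


def estimate_seconds(page_count: int, pages_per_second: float = PAGES_PER_SECOND) -> float:
    """Return a rough wall-clock estimate, in seconds, for OCR-ing page_count pages."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    return page_count / pages_per_second
```

Actual latency also depends on image size, network time, and queueing, so treat the result as a lower bound.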

Model pricing

Token prices are shown in Algerian dinars per one million tokens.

Input price: 8 DA / 1M tokens
Output price: 8 DA / 1M tokens
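The per-million pricing above reduces to simple arithmetic. The sketch below is illustrative: the function name is hypothetical, and it assumes you already know the input and output token counts for a request (how image content is counted in tokens is not stated on this page).

```python
# Prices from this page, in Algerian dinars (DA) per one million tokens.
INPUT_PRICE_DA_PER_MILLION = 8
OUTPUT_PRICE_DA_PER_MILLION = 8
TOKENS_PER_MILLION = 1_000_000


def estimate_cost_da(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in DA of a request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Sum in integer arithmetic first, then scale down once.
    weighted_tokens = (
        input_tokens * INPUT_PRICE_DA_PER_MILLION
        + output_tokens * OUTPUT_PRICE_DA_PER_MILLION
    )
    return weighted_tokens / TOKENS_PER_MILLION
```

For example, 250,000 input tokens plus 50,000 output tokens comes to 2 DA + 0.4 DA = 2.4 DA.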

At a glance

Modalities: Image · Multimodal
Vision + OCR-style prompts
Well suited to ingestion pipelines
Resize images before sending to reduce latency

Integration examples

The snippets below are adapted to this model's API mode (OCR), with the model field set to zai-org/GLM-OCR. If your deployment uses an alias, use that alias as the model value when you paste a snippet into your app. For general concepts, see the main documentation hub.

Keys and base URL

Replace YOUR_API_KEY with the key listed under Models after you sign in. When you are signed in, the snippets on this page use https://app.ai-grid.io as the base URL (see DOCS_AUTHENTICATED_API_BASE_URL); when you are not, they show a placeholder host so that the documentation stays sovereign-safe.

For PDFs, render each page to PNG/JPEG first, then send each page as image input. Keep files small enough for browser or gateway limits.
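The per-page workflow above can be sketched as follows, assuming the pages have already been rendered to PNG or JPEG bytes by a tool of your choice (rendering itself is outside this sketch). The helper names and the 4 MiB size cap are hypothetical; this page does not state a concrete limit, so substitute the one your gateway enforces.

```python
import base64

# Hypothetical cap: the page only says to keep files small, so this is a placeholder.
MAX_IMAGE_BYTES = 4 * 1024 * 1024


def to_data_url(image_bytes: bytes, mime_type: str = "image/png",
                max_bytes: int = MAX_IMAGE_BYTES) -> str:
    """Encode one rendered page as a base64 data URL, rejecting oversized images."""
    if len(image_bytes) > max_bytes:
        raise ValueError(f"image is {len(image_bytes)} bytes; limit is {max_bytes}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_page_payloads(pages: list[bytes], prompt: str,
                        model: str = "zai-org/GLM-OCR",
                        max_tokens: int = 1200) -> list[dict]:
    """Build one chat-completions payload per page, preserving page order."""
    return [
        {
            "model": model,
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {"type": "image_url", "image_url": {"url": to_data_url(page)}},
                    ],
                }
            ],
            "max_tokens": max_tokens,
        }
        for page in pages
    ]
```

Each returned dictionary can be serialized to JSON and posted as its own request, so a failure on one page does not affect the others.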

Extract text from an image

Endpoint: POST /v1/chat/completions (OCR · HTTP)

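# Base64-encode the page image onto a single line. -w 0 is the GNU coreutils flag;
# on macOS, use: base64 -i document-page.png
IMAGE_DATA="$(base64 -w 0 document-page.png)"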

curl https://app.ai-grid.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "zai-org/GLM-OCR",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Extract all readable text. Preserve tables and line breaks." },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,'"$IMAGE_DATA"'"
            }
          }
        ]
      }
    ],
    "max_tokens": 1200
  }'
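The same request can be issued from Python with only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and the response parsing assumes the server returns the OpenAI-style chat-completions shape (choices[0].message.content), which this page implies but does not show.

```python
import json
import urllib.request

BASE_URL = "https://app.ai-grid.io"  # base URL used by the snippet above
DEFAULT_PROMPT = "Extract all readable text. Preserve tables and line breaks."


def build_ocr_request(api_key: str, data_url: str, prompt: str = DEFAULT_PROMPT,
                      max_tokens: int = 1200) -> urllib.request.Request:
    """Build the POST request for one image, given as a base64 data URL."""
    payload = {
        "model": "zai-org/GLM-OCR",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_text(response_body: dict) -> str:
    """Pull the text out of a response. Assumes an OpenAI-style response shape."""
    return response_body["choices"][0]["message"]["content"]


def run_ocr(api_key: str, data_url: str, timeout: float = 60.0) -> str:
    """Send the request and return the extracted text (performs a network call)."""
    request = build_ocr_request(api_key, data_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(json.load(response))
```

Calling run_ocr("YOUR_API_KEY", data_url) with a data URL such as data:image/png;base64,... would return the model's text output. Building the request separately from sending it keeps the payload easy to inspect before any network traffic occurs.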

Other models

Open-weight · Text LLM

gpt-oss-120b

Best for: General agents
Text · Multimodal · Advanced reasoning · Tools · Code

Large open stack for general reasoning, code, and assistants.

Throughput: 1,000 tokens/sec
Input: 19 DA / 1M tokens
Output: 75 DA / 1M tokens

Strong default for assistants and tool loops
OpenAI-compatible `/v1/chat/completions`
Pair with LangGraph for stateful agents

Qwen · Text LLM

Qwen3-30B-A3B-Thinking

Best for: Reasoning
Text · Advanced reasoning · Tools · Code

Reasoning-forward 30B tier for planning and analysis.

Throughput: 660 tokens/sec
Input: 73 DA / 1M tokens
Output: 148 DA / 1M tokens

Extended thinking style outputs
Ideal for LangGraph flows
Tune system prompts for chain-of-thought depth

Google · Text LLM

google/gemma-4-31B

Best for: High-volume chat
Text · Standard reasoning · Tools · Code

Fast 31B text generation for assistants and RAG answers.

Throughput: 580 tokens/sec
Input: 73 DA / 1M tokens
Output: 148 DA / 1M tokens

Great for high-volume chat
Pairs with LangChain ChatOpenAI
Stable for LlamaIndex completion nodes

Alibaba · Embedding

Alibaba-NLP/gte-Qwen2-7B-instruct

Best for: Embeddings
Text · No reasoning mode

Dense text encoder for semantic search and clustering.

Throughput: no throughput info
Input: Free
Output: Free

Vector / similarity workflows
LlamaIndex Settings helper
Keep API id in sync with the model catalog

DeepSeek · OCR

DeepSeek-OCR (deepseek-ocr)

Best for: Image OCR
Image · Multimodal · No reasoning mode

OCR-focused multimodal stack for documents.

Throughput: 860 tokens/sec
Input: 8 DA / 1M tokens
Output: 8 DA / 1M tokens

Multimodal messages for scans
Works with guarded BFF patterns
Batch pages for throughput