Skip to content

OpenAI GPT-5.4 Family

The GPT-5.4 family represents OpenAI's production standard as of April 2026, featuring integrated coding capabilities, agentic reasoning, and a suite of specialized variants for different use cases.


Overview

GPT-5.4 marks a departure from the traditional single-model approach. Instead of one flagship, OpenAI now offers five distinct variants released across March 2026, each optimized for specific workloads.

Release Date: March 5-17, 2026
Status: Active (in production)
Pricing Tier: Premium ($2.50/M input, $15.00/M output for Standard)


Core Innovation: Integrated Coding

Previous approach: Specialized "Codex" model for code tasks
GPT-5.4 approach: Native code understanding integrated into all variants

This means: - ✅ All GPT-5.4 models understand code natively (no separate model needed) - ✅ Better code-language interleaving - ✅ Unified API for mixed tasks (code + natural language) - ✅ Agentic capabilities now default, not specialized


Model Variants

1. GPT-5.4 Standard

Primary use case: General-purpose, default choice

  • Release: March 5, 2026
  • Status: Active
  • Capabilities: Balanced reasoning, coding, analysis
  • Key benchmark: OSWorld 75% (computer use)
  • Context window: ~128K tokens
  • Pricing: $2.50/M input, $15.00/M output
  • Speed: Fast inference
  • Use when:
  • General-purpose applications
  • Mixed reasoning + coding tasks
  • When you need agentic capabilities (web/desktop automation)
  • Default choice unless other variants fit better

Recommendation: Start here unless cost or speed is critical.


2. GPT-5.4 Thinking

Primary use case: Deep reasoning, complex analysis

  • Release: March 5, 2026
  • Status: Active
  • Capabilities: Extended reasoning, problem-solving
  • Approach: "Thinking" variant (reasons internally before outputting)
  • Latency: Slower (reasoning overhead)
  • Pricing: Higher than Standard
  • Best for:
  • Complex mathematical problems
  • Long-form reasoning
  • Multi-step problem solving
  • Research and analysis
  • When accuracy > speed

Example use cases: - Theorem proving - Complex algorithm design - Deep code review - Scientific reasoning


3. GPT-5.4 Mini

Primary use case: Cost-optimized, high-volume tasks

  • Release: March 17, 2026
  • Status: Active
  • Capability claim: 95% of Standard model at 17% of cost
  • SWE-bench Pro: 54.38% (vs Standard 57.7%)
  • Pricing: ~$0.15/M input, ~$0.60/M output (6x cheaper)
  • Context window: ~64K tokens
  • Speed: Very fast
  • Use when:
  • Cost is critical (high-volume applications)
  • Simple/moderate complexity tasks
  • Real-time requirements
  • Prototyping and testing
  • Running on consumer hardware

Cost-benefit: Trade 5% capability loss for 85% cost savings. Excellent value.

Example workloads: - Content generation at scale - Classification and tagging - Simple Q&A and customer support - High-frequency API calls


4. GPT-5.4 Nano

Primary use case: Edge/embedded devices, extreme constraints

  • Release: March 17, 2026
  • Status: Active
  • Target: Mobile, IoT, embedded systems
  • Parameter efficiency: Heavily quantized
  • Context window: ~16K tokens
  • Speed: Real-time on edge devices
  • Use when:
  • Running on smartphones/tablets
  • Embedded IoT devices
  • Offline-first applications
  • Ultra-low latency required
  • Network bandwidth is expensive

Deployment scenarios: - Mobile app assistants - In-car AI systems - Home automation - Field devices (no cloud access)


5. GPT-5.4 Spark

Primary use case: Real-time streaming applications

  • Release: Q1 2026 (March 2026)
  • Status: Active
  • Feature: Streaming/real-time output
  • Latency: Optimized for continuous generation
  • Use case: Live transcription, real-time chat, streaming analytics
  • Output format: Token-by-token streaming
  • Use when:
  • Live streaming applications
  • Real-time chat interfaces
  • Continuous monitoring/analysis
  • Video/audio processing streams

Performance Benchmarks

Agentic Capabilities (Computer Use)

Variant OSWorld Score Notes
Standard 75% Near-human desktop/web navigation
Thinking 78% Better reasoning for complex tasks
Mini 62% 82% of Standard, 6x cheaper
Nano 48% Basic automation only
Spark 75% Stream-optimized, same reasoning

OSWorld: Tests ability to use desktop/browser autonomously (click, type, navigate)

Code Performance (SWE-bench Pro)

Variant Score Rationale
Standard 57.7% Full coding capability
Thinking 62% Better reasoning helps code
Mini 54.38% 94% of Standard, compelling value
Nano 38% Basic code only

API Pricing

Per-Million-Token Pricing

Variant Input Output Monthly (1M queries)
Standard $2.50 $15.00 ~$2,000-5,000
Thinking $5.00 $30.00 ~$4,000-10,000
Mini $0.15 $0.60 ~$120-300
Nano $0.02 $0.08 ~$16-40
Spark $2.50 $15.00 ~$2,000-5,000

Cost optimization tip: Use Mini + Standard hybrid approach - 80% of queries via Mini (cheap) - 20% of complex queries via Standard - Saves ~60% vs all-Standard


Architecture & Capabilities

Core Features

  • Modalities: Text + code (native integration)
  • Context: 128K tokens (Standard/Thinking/Spark)
  • Multimodal: No (text/code only)
  • Thinking model: Yes (separate Thinking variant)
  • Streaming: Yes (Spark optimized, others support)
  • Vision: No

Agentic Capabilities (NEW)

  • Desktop automation: Click, type, navigate
  • Web automation: Fill forms, scrape, interact
  • OSWorld score: 75% (near-human reliability)
  • Use cases:
  • Autonomous data entry
  • Web scraping
  • Testing automation
  • RPA (Robotic Process Automation)

Coding Integration (NEW)

  • Native understanding: No separate Codex model
  • Code generation: High quality
  • Code analysis: Strong reasoning
  • Debugging: Good error identification
  • Architecture design: Solid recommendations

Decision Tree

Do you need reasoning/thinking?
  ├─ Yes → GPT-5.4 Thinking
  └─ No
      ├─ Is cost critical?
      │   ├─ Yes, volume high → GPT-5.4 Mini
      │   └─ No → GPT-5.4 Standard
      │
      ├─ Is this for mobile/edge?
      │   ├─ Yes → GPT-5.4 Nano
      │   └─ No
      │
      └─ Do you need real-time streaming?
          ├─ Yes → GPT-5.4 Spark
          └─ No → GPT-5.4 Standard (default)

Comparison to Previous Versions

vs GPT-5.2 (Retiring June 5, 2026)

  • ✅ Better agentic capabilities (OSWorld: 75% vs 68%)
  • ✅ Integrated coding (no Codex needed)
  • ✅ Better reasoning in Thinking variant
  • ✅ More efficient Mini variant
  • ✅ Streaming optimized (Spark)
  • 💾 Similar context window (128K)
  • 💵 Higher cost for Standard tier

Migration: Begin moving to GPT-5.4 now (retiring June 5, 2026)


When NOT to Use GPT-5.4

  • ❌ Need multimodal (image/audio/video) → Use Gemini 3.1
  • ❌ Need open-source / on-prem → Use Llama 4
  • ❌ Need longest context (1M+ tokens) → Use Gemini 3.1 or Llama 4 Maverick
  • ❌ Need to avoid proprietary APIs → Use Meta Llama 4

API Integration Example

from openai import OpenAI

client = OpenAI(api_key="your-key")

# Standard model (default)
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}]
)

# Thinking variant (reasoning)
response = client.chat.completions.create(
    model="gpt-5.4-thinking",
    messages=[{"role": "user", "content": "Solve: 2x + 5 = 17"}]
)

# Mini variant (cost-optimized)
response = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Summarize this..."}]
)

# Streaming (Spark)
stream = client.chat.completions.create(
    model="gpt-5.4-spark",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

Summary

Use Case Best Variant Why
General purpose Standard Balanced, reliable
Deep reasoning Thinking Extended deliberation
High volume Mini 6x cheaper, 95% capable
Edge devices Nano Runs locally
Real-time Spark Stream-optimized
Agentic/RPA Standard 75% OSWorld
Budget projects Mini + Standard Hybrid approach

Last Updated

April 8, 2026