LLM API Pricing Comparison 2025

OpenAI • Anthropic • Cohere • Google • Meta • Open Source

Last updated: October 2025 | All prices in USD per 1M tokens (paired prices are input/output)

🎯 Quick Recommendations

💰 Best Value

Gemini 1.5 Flash: $0.075/$0.30 per 1M tokens

🚀 Best Performance

GPT-4 Turbo: $10/$30 per 1M tokens

⚖️ Best Balance

Claude 3.5 Sonnet: $3/$15 per 1M tokens

🔓 Open Source

Self-hosted Llama 3.1: Hardware cost only

Complete API Pricing Comparison

All major LLM providers with their latest pricing (October 2025)

Provider / Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) | Context Window | Best For
OpenAI GPT-4 Turbo | $10.00 | $30.00 | 128K tokens | Complex reasoning, coding
OpenAI GPT-4 | $30.00 | $60.00 | 8K tokens | Highest quality tasks
OpenAI GPT-3.5 Turbo | $0.50 | $1.50 | 16K tokens | Fast, affordable tasks
Anthropic Claude 3.5 Sonnet | $3.00 | $15.00 | 200K tokens | Analysis, writing, coding
Anthropic Claude 3 Opus | $15.00 | $75.00 | 200K tokens | Complex, nuanced tasks
Anthropic Claude 3 Haiku 🏆 | $0.25 | $1.25 | 200K tokens | Fast, economical tasks
Google Gemini 1.5 Flash 🏆 | $0.075 | $0.30 | 1M tokens | High-volume, cost-sensitive
Google Gemini 1.5 Pro | $1.25 | $5.00 | 2M tokens | Long context, multimodal
Cohere Command R+ | $3.00 | $15.00 | 128K tokens | RAG, enterprise search
Cohere Command R | $0.50 | $1.50 | 128K tokens | Conversational AI
Cohere Command Light | $0.30 | $0.60 | 4K tokens | Simple, fast tasks
Meta Llama 3.1 405B (via Together AI) | $3.75 | $4.50 | 128K tokens | Open source, customizable
Meta Llama 3.1 70B (via Together AI) | $0.88 | $0.88 | 128K tokens | Balance cost/performance
Meta Llama 3.1 8B (via Together AI) | $0.18 | $0.18 | 128K tokens | High-volume, simple tasks
Mistral Large | $4.00 | $12.00 | 128K tokens | Complex reasoning
Mistral Medium | $2.70 | $8.10 | 32K tokens | General purpose
Mistral Small | $1.00 | $3.00 | 32K tokens | Fast, efficient tasks
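
To compare rows on a single number, one option is to blend input and output prices by an assumed traffic mix and then filter by the context window you need. The sketch below copies a subset of the rows above. The `ModelPrice` class, the helper names, and the 25% output share are illustrative assumptions, not part of any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_usd: float     # USD per 1M input tokens
    output_usd: float    # USD per 1M output tokens
    context_tokens: int  # context window, in tokens


# Subset of the comparison table (October 2025 figures from this page).
PRICES = [
    ModelPrice("GPT-4 Turbo", 10.00, 30.00, 128_000),
    ModelPrice("GPT-3.5 Turbo", 0.50, 1.50, 16_000),
    ModelPrice("Claude 3.5 Sonnet", 3.00, 15.00, 200_000),
    ModelPrice("Claude 3 Haiku", 0.25, 1.25, 200_000),
    ModelPrice("Gemini 1.5 Flash", 0.075, 0.30, 1_000_000),
    ModelPrice("Gemini 1.5 Pro", 1.25, 5.00, 2_000_000),
    ModelPrice("Llama 3.1 70B (Together AI)", 0.88, 0.88, 128_000),
]


def blended_price(model, output_share=0.25):
    """USD per 1M tokens, assuming `output_share` of all tokens are output."""
    return (1 - output_share) * model.input_usd + output_share * model.output_usd


def cheapest_models(min_context, output_share=0.25, prices=PRICES):
    """Models whose context window is at least `min_context`, cheapest first."""
    eligible = [m for m in prices if m.context_tokens >= min_context]
    return sorted(eligible, key=lambda m: blended_price(m, output_share))


if __name__ == "__main__":
    for m in cheapest_models(min_context=100_000):
        print(f"{m.name}: ${blended_price(m):.3f} per 1M blended tokens")
```

Changing `output_share` matters most for models with a large gap between input and output prices, so it is worth setting it from your own traffic rather than relying on the default.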

💰 API Cost Calculator

Calculate your monthly API costs based on usage
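
As a rough stand-in for an interactive calculator, the following sketch estimates monthly spend from request volume and average token counts. The function name, parameters, and the 30-day month are assumptions made for illustration; the prices come from the comparison table.

```python
def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 input_usd_per_m, output_usd_per_m, days=30):
    """Estimated monthly spend in USD.

    input_tokens / output_tokens are averages per request;
    prices are in USD per 1M tokens.
    """
    total_in = requests_per_day * input_tokens * days
    total_out = requests_per_day * output_tokens * days
    return (total_in * input_usd_per_m + total_out * output_usd_per_m) / 1_000_000


if __name__ == "__main__":
    # Example workload: 1,000 requests/day, 500 input + 200 output tokens each.
    sonnet = monthly_cost(1_000, 500, 200, 3.00, 15.00)
    flash = monthly_cost(1_000, 500, 200, 0.075, 0.30)
    print(f"Claude 3.5 Sonnet: ${sonnet:.2f}/month")  # $135.00/month
    print(f"Gemini 1.5 Flash:  ${flash:.3f}/month")
```

For this example workload the estimate is $135 per month on Claude 3.5 Sonnet and just under $3 on Gemini 1.5 Flash. It ignores free tiers, caching discounts, and batch pricing.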

📊 Which API Should You Choose?

🚀 High-Performance Applications

Need the absolute best quality regardless of cost

  • Best: GPT-4 Turbo, Claude 3 Opus
  • Cost: $10-15 per 1M input tokens ($30-75 per 1M output tokens)
  • Use for: Complex reasoning, advanced coding, critical decisions

⚖️ Balanced Performance

Great quality at reasonable costs

  • Best: Claude 3.5 Sonnet, Gemini 1.5 Pro
  • Cost: $1.25-3 per 1M input tokens
  • Use for: Most production applications, content generation

💰 High-Volume / Cost-Sensitive

Maximum efficiency for large-scale deployments

  • Best: Gemini 1.5 Flash, Claude 3 Haiku
  • Cost: $0.075-0.25 per 1M input tokens
  • Use for: Chatbots, summaries, classification at scale

🔓 Maximum Control

Full control over data and customization

  • Best: Self-hosted Llama 3.1, Mistral
  • Cost: Hardware and hosting only (no per-token fees)
  • Use for: Sensitive data, fine-tuning, unlimited usage

Provider Deep Dives

OpenAI API Pricing

Best Known For: Industry-leading performance, extensive ecosystem

GPT-4 Turbo

$10 input / $30 output per 1M tokens

Best for complex reasoning, coding, and analysis

GPT-3.5 Turbo

$0.50 input / $1.50 output per 1M tokens

Great for chatbots and simple tasks

Anthropic Claude API Pricing

Best Known For: Long context windows (200K tokens), safety-focused

Claude 3.5 Sonnet

$3 input / $15 output per 1M tokens

Best balance of intelligence and speed

Claude 3 Opus

$15 input / $75 output per 1M tokens

Top performance for complex tasks

Claude 3 Haiku

$0.25 input / $1.25 output per 1M tokens

Fast and economical

Cohere API Pricing

Best Known For: Enterprise RAG, multilingual capabilities, embedding models

Command R+

$3 input / $15 output per 1M tokens

Advanced RAG and search

Command R

$0.50 input / $1.50 output per 1M tokens

Efficient conversational AI

Google Gemini API Pricing

Best Known For: Ultra-long context (2M tokens), multimodal capabilities, lowest cost

Gemini 1.5 Flash

$0.075 input / $0.30 output per 1M tokens

Most cost-effective option

Gemini 1.5 Pro

$1.25 input / $5 output per 1M tokens

2M token context window

Frequently Asked Questions

What's the difference between input and output tokens?

Input tokens are the text you send to the API (your prompt), while output tokens are the text the model generates in response. Most providers charge more for output tokens since generation is computationally more expensive.

How many tokens is 1,000 words?

Roughly 1,000 words equals 1,300-1,500 tokens in English. This varies by language and content type. A simple rule: 1 token ≈ 0.75 words or 1 word ≈ 1.3 tokens.
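
The rule of thumb above can be turned into a quick estimator. This is only a word-count heuristic using the 1.3-1.5 tokens-per-word range quoted here. Real tokenizers differ by model and language, and the function names are illustrative.

```python
def estimate_tokens(text, tokens_per_word=1.3):
    """Rough English token estimate from a whitespace word count."""
    words = len(text.split())
    return round(words * tokens_per_word)


def estimate_token_range(word_count, low=1.3, high=1.5):
    """(low, high) token estimate for a given number of English words."""
    return round(word_count * low), round(word_count * high)


if __name__ == "__main__":
    print(estimate_token_range(1_000))  # (1300, 1500)
    print(estimate_tokens("How many tokens is this short prompt?"))
```

For billing-grade numbers, count tokens with the provider's own tokenizer or read the usage fields returned by the API.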

Should I use API or subscription (ChatGPT Plus, Claude Pro)?

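Subscription ($20/month): Good for personal use, with a flat fee and generous (though capped) chat usage.
API (pay-per-token): Better for building applications, with programmatic access and rate limits that scale with your account tier. Because API cost grows with usage, it is usually the cheaper option only at low volume: $20 buys about 1.3M output tokens from Claude 3.5 Sonnet at $15 per 1M, so below roughly 1-2M tokens/month the API tends to win.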

Can I use open-source models via API?

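Yes! Services like Together AI, Anyscale, Replicate, and Fireworks AI offer hosted APIs for open-source models like Llama 3.1, Mistral, and others. Prices are often 50-90% cheaper than comparable proprietary models, though the largest open models (such as Llama 3.1 405B at $3.75 per 1M input tokens) can cost as much as mid-tier proprietary ones.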

What about rate limits?

All providers enforce rate limits, typically on requests and tokens per minute or per day. Higher-tier accounts get higher limits. For production apps, check the limits for your tier and retry rate-limited requests with backoff rather than immediately.
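
One common way to implement retry logic is exponential backoff with jitter. The sketch below is provider-agnostic: `RateLimitError` is a hypothetical stand-in for whatever exception your client library raises on an HTTP 429, and `call_with_retry` and its parameters are illustrative names.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def call_with_retry(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff.

    The wait before retry k is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)] ("full jitter"), which spreads
    out retries from many clients that were throttled at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Usage would look like `call_with_retry(lambda: client_call(prompt))`, where `client_call` is your own wrapper that raises `RateLimitError`. If the provider returns a `Retry-After` header, honoring it is usually preferable to a computed delay.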

Ready to Choose Your LLM API?

Compare 2,800+ models, check benchmarks, and find deployment guides

💻 Or Self-Host for Maximum Savings

If you have high volume (tens of millions of tokens per month or more), self-hosting open-source models can be cheaper than paying per token, potentially 10-50x cheaper at very large scale:

Llama 3.1 70B

One-time: $2,000 GPU

Monthly: $200 cloud hosting

Break-even at ~20M tokens/month vs Claude 3.5 Sonnet

Mistral 7B

One-time: $500 GPU

Monthly: $50 cloud hosting

Great for high-volume simple tasks
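
The Llama 3.1 70B break-even figure above can be sanity-checked with a small calculation. This sketch assumes the $200 monthly hosting cost is the only recurring expense and that traffic is half input and half output tokens. Both are illustrative assumptions, as are the function names.

```python
def break_even_tokens_per_month(monthly_hosting_usd, api_input_usd_per_m,
                                api_output_usd_per_m, output_share=0.5):
    """Monthly token volume at which the API bill equals the hosting bill."""
    blended = ((1 - output_share) * api_input_usd_per_m
               + output_share * api_output_usd_per_m)
    return monthly_hosting_usd / blended * 1_000_000


def months_to_recoup(hardware_usd, monthly_hosting_usd, monthly_api_bill_usd):
    """Months until one-time hardware cost is repaid by monthly savings.

    Returns None when self-hosting does not save money each month.
    """
    savings = monthly_api_bill_usd - monthly_hosting_usd
    if savings <= 0:
        return None
    return hardware_usd / savings


if __name__ == "__main__":
    # $200/month hosting vs Claude 3.5 Sonnet at $3 input / $15 output.
    tokens = break_even_tokens_per_month(200, 3.00, 15.00)
    print(f"Break-even: {tokens / 1e6:.1f}M tokens/month")  # 22.2M tokens/month
    # $2,000 GPU, $200/month hosting, replacing a $700/month API bill.
    print(months_to_recoup(2_000, 200, 700))  # 4.0
```

Under these assumptions the break-even lands near 22M tokens per month, in line with the ~20M estimate. An input-heavy workload pushes it higher, and the calculation leaves out engineering time, electricity, and any quality difference between models.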
