LLM API Pricing Comparison 2025
OpenAI • Anthropic • Cohere • Google • Meta • Open Source
Last updated: October 2025 | All prices in USD per 1M tokens
🎯 Quick Recommendations
💰 Best Value
Gemini 1.5 Flash: $0.075/$0.30 per 1M tokens
🚀 Best Performance
GPT-4 Turbo: $10/$30 per 1M tokens
⚖️ Best Balance
Claude 3.5 Sonnet: $3/$15 per 1M tokens
🔓 Open Source
Self-hosted Llama 3.1: Hardware cost only
Complete API Pricing Comparison
All major LLM providers with their latest pricing (October 2025)
| Provider / Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Best For |
|---|---|---|---|---|
| OpenAI GPT-4 Turbo | $10.00 | $30.00 | 128K tokens | Complex reasoning, coding |
| OpenAI GPT-4 | $30.00 | $60.00 | 8K tokens | Highest quality tasks |
| OpenAI GPT-3.5 Turbo | $0.50 | $1.50 | 16K tokens | Fast, affordable tasks |
| Anthropic Claude 3.5 Sonnet | $3.00 | $15.00 | 200K tokens | Analysis, writing, coding |
| Anthropic Claude 3 Opus | $15.00 | $75.00 | 200K tokens | Complex, nuanced tasks |
| Anthropic Claude 3 Haiku 🏆 | $0.25 | $1.25 | 200K tokens | Fast, economical tasks |
| Google Gemini 1.5 Flash 🏆 | $0.075 | $0.30 | 1M tokens | High-volume, cost-sensitive |
| Google Gemini 1.5 Pro | $1.25 | $5.00 | 2M tokens | Long context, multimodal |
| Cohere Command R+ | $3.00 | $15.00 | 128K tokens | RAG, enterprise search |
| Cohere Command R | $0.50 | $1.50 | 128K tokens | Conversational AI |
| Cohere Command Light | $0.30 | $0.60 | 4K tokens | Simple, fast tasks |
| Meta Llama 3.1 405B (via Together AI) | $3.75 | $4.50 | 128K tokens | Open source, customizable |
| Meta Llama 3.1 70B (via Together AI) | $0.88 | $0.88 | 128K tokens | Balance cost/performance |
| Meta Llama 3.1 8B (via Together AI) | $0.18 | $0.18 | 128K tokens | High-volume, simple tasks |
| Mistral Large | $4.00 | $12.00 | 128K tokens | Complex reasoning |
| Mistral Medium | $2.70 | $8.10 | 32K tokens | General purpose |
| Mistral Small | $1.00 | $3.00 | 32K tokens | Fast, efficient tasks |
💰 API Cost Calculator
Calculate your monthly API costs based on usage
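A monthly estimate can be reproduced with a few lines of Python. The sketch below is a minimal illustration, not an official calculator: the prices are copied from the comparison table, and the dictionary keys are hypothetical labels chosen for readability, not official API model identifiers.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
# The keys are illustrative labels, not official API model IDs.
PRICES = {
    "gpt-4-turbo": (10.00, 30.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gemini-1.5-flash": (0.075, 0.30),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly spend in USD for the given token volumes."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 5M input tokens and 1M output tokens per month.
    for model in PRICES:
        cost = monthly_cost(model, input_tokens=5_000_000, output_tokens=1_000_000)
        print(f"{model}: ${cost:,.2f}/month")
```

For that example workload, Claude 3.5 Sonnet comes to $30.00 per month and GPT-4 Turbo to $80.00.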
📊 Which API Should You Choose?
🚀 High-Performance Applications
Need the absolute best quality regardless of cost
- Best: GPT-4 Turbo, Claude 3 Opus
- Cost: $10-15 per 1M input tokens
- Use for: Complex reasoning, advanced coding, critical decisions
⚖️ Balanced Performance
Great quality at reasonable costs
- Best: Claude 3.5 Sonnet, Gemini 1.5 Pro
- Cost: $1.25-3 per 1M input tokens
- Use for: Most production applications, content generation
💰 High-Volume / Cost-Sensitive
Maximum efficiency for large-scale deployments
- Best: Gemini 1.5 Flash, Claude 3 Haiku
- Cost: $0.075-0.25 per 1M input tokens
- Use for: Chatbots, summaries, classification at scale
🔓 Maximum Control
Full control over data and customization
- Best: Self-hosted Llama 3.1, Mistral
- Cost: Hardware and hosting only (no per-token fees)
- Use for: Sensitive data, fine-tuning, unlimited usage
Provider Deep Dives
OpenAI API Pricing
Best Known For: Industry-leading performance, extensive ecosystem
GPT-4 Turbo
$10 input / $30 output per 1M tokens
Best for complex reasoning, coding, and analysis
GPT-3.5 Turbo
$0.50 input / $1.50 output per 1M tokens
Great for chatbots and simple tasks
Anthropic Claude API Pricing
Best Known For: Long context windows (200K tokens), safety-focused
Claude 3.5 Sonnet
$3 input / $15 output per 1M tokens
Best balance of intelligence and speed
Claude 3 Opus
$15 input / $75 output per 1M tokens
Top performance for complex tasks
Claude 3 Haiku
$0.25 input / $1.25 output per 1M tokens
Fast and economical
Cohere API Pricing
Best Known For: Enterprise RAG, multilingual capabilities, embedding models
Command R+
$3 input / $15 output per 1M tokens
Advanced RAG and search
Command R
$0.50 input / $1.50 output per 1M tokens
Efficient conversational AI
Google Gemini API Pricing
Best Known For: Ultra-long context (2M tokens), multimodal capabilities, lowest cost
Gemini 1.5 Flash
$0.075 input / $0.30 output per 1M tokens
Most cost-effective option
Gemini 1.5 Pro
$1.25 input / $5 output per 1M tokens
2M token context window
Frequently Asked Questions
What's the difference between input and output tokens?
Input tokens are the text you send to the API (your prompt), while output tokens are the text the model generates in response. Most providers charge more for output tokens since generation is computationally more expensive.
How many tokens is 1,000 words?
Roughly 1,000 words equals 1,300-1,500 tokens in English. This varies by language and content type. A simple rule: 1 token ≈ 0.75 words or 1 word ≈ 1.3 tokens.
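The rule of thumb can be turned into a quick estimator. This is a rough sketch that assumes English text and the 1.3 tokens-per-word ratio quoted above. Real counts depend on each provider's tokenizer, so use it for budgeting only, not for billing. The function name is hypothetical.

```python
TOKENS_PER_WORD = 1.3  # rule of thumb for English; varies by language and tokenizer


def estimate_tokens(text: str, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Approximate the token count of `text` from its whitespace-separated word count."""
    word_count = len(text.split())
    return round(word_count * tokens_per_word)


if __name__ == "__main__":
    sample = " ".join(["word"] * 1000)  # a synthetic 1,000-word document
    print(estimate_tokens(sample))
```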
Should I use API or subscription (ChatGPT Plus, Claude Pro)?
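Subscription ($20/month): Good for personal use, with a flat fee and generous (but capped) chat usage.
API (pay-per-token): Better for building applications, with programmatic access and configurable rate limits. Because API cost scales with usage, it is usually the cheaper option at low volume: on a mid-priced model such as Claude 3.5 Sonnet, $20 buys roughly 1-2M tokens per month for output-heavy or balanced workloads.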
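To compare the two options for a specific model, compute the monthly token volume at which pay-per-token spend equals the subscription fee. The sketch below assumes a single blended price based on the share of tokens that are output. The function name and the default 50/50 split are illustrative assumptions.

```python
def subscription_breakeven_tokens(
    subscription_usd: float,
    input_price_per_m: float,
    output_price_per_m: float,
    output_share: float = 0.5,
) -> float:
    """Return the tokens per month at which API spend equals the subscription fee.

    `output_share` is the fraction of all tokens that are output tokens (0 to 1).
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    blended_price = (
        input_price_per_m * (1.0 - output_share) + output_price_per_m * output_share
    )
    return subscription_usd / blended_price * 1_000_000


if __name__ == "__main__":
    # Claude 3.5 Sonnet at $3 input / $15 output, compared with a $20 subscription.
    tokens = subscription_breakeven_tokens(20.0, 3.0, 15.0)
    print(f"Break-even: {tokens / 1_000_000:.1f}M tokens/month")
```

With an even input/output split, the break-even for Claude 3.5 Sonnet is about 2.2M tokens per month. With output tokens only it drops to about 1.3M.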
Can I use open-source models via API?
Yes! Services like Together AI, Anyscale, Replicate, and Fireworks AI offer hosted APIs for open-source models such as Llama 3.1 and Mistral. Prices are often 50-90% lower than those of flagship proprietary models, although budget proprietary models such as Gemini 1.5 Flash can cost even less.
What about rate limits?
All providers have rate limits (requests per minute/day). Higher tier accounts get better limits. For production apps, always check rate limits and implement proper retry logic.
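A common retry pattern is exponential backoff with jitter. The sketch below is provider-agnostic: `RateLimitError` is a hypothetical stand-in for whatever exception your client raises on an HTTP 429 response, and `request` is any zero-argument callable that performs the API call. The defaults are illustrative, not provider recommendations.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def call_with_retry(request, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `request()` and retry on RateLimitError with exponential backoff.

    The wait before retry k is drawn uniformly from [0, min(max_delay, base_delay * 2**k)]
    ("full jitter"), which spreads out retries from many concurrent clients.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay_cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay_cap))
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests. In production, prefer a provider's `Retry-After` header over the computed delay when one is returned.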
💻 Self-Host for Maximum Savings
If you have sustained high volume (well above 10M tokens/month), self-hosting open-source models can be 10-50x cheaper per token once the hardware is paid off:
Llama 3.1 70B
One-time: $2,000 GPU
Monthly: $200 cloud hosting
Break-even at ~20M tokens/month vs Claude Sonnet
Mistral 7B
One-time: $500 GPU
Monthly: $50 cloud hosting
Great for high-volume simple tasks
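The Llama 3.1 70B break-even figure can be sanity-checked with simple arithmetic. The sketch below assumes a blended Claude 3.5 Sonnet price of $9 per 1M tokens (an even split of $3 input and $15 output, which is an assumption rather than a figure from the pricing table) and ignores electricity, maintenance, and engineering time. The function names are hypothetical.

```python
from typing import Optional


def breakeven_tokens_per_month(hosting_usd: float, api_price_per_m: float) -> float:
    """Monthly token volume at which API spend equals the monthly hosting cost."""
    return hosting_usd / api_price_per_m * 1_000_000


def payback_months(
    hardware_usd: float,
    tokens_per_month: float,
    hosting_usd: float,
    api_price_per_m: float,
) -> Optional[float]:
    """Months needed for API savings to repay the one-time hardware cost.

    Returns None when self-hosting saves nothing at the given volume.
    """
    api_cost = tokens_per_month / 1_000_000 * api_price_per_m
    savings = api_cost - hosting_usd
    if savings <= 0:
        return None
    return hardware_usd / savings


if __name__ == "__main__":
    blended_sonnet = 9.0  # assumed blended USD price per 1M tokens (50% input, 50% output)
    print(breakeven_tokens_per_month(200.0, blended_sonnet))
    print(payback_months(2000.0, 50_000_000, 200.0, blended_sonnet))
```

Under these assumptions the monthly break-even is about 22M tokens, in line with the ~20M figure above. At 50M tokens per month, the $2,000 GPU pays for itself in 8 months.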