OpenAI has launched a new option called Flex API, designed to help users save money on AI workloads that aren't urgent. It's especially useful for tasks that don't need fast results, like organizing data or running background processes.
How Flex API Works
The Flex API trades speed for cost: responses are slower, and capacity may occasionally be unavailable. In exchange, users pay about half the standard rate, which makes it ideal for non-critical work where timing isn't a concern.
Flex API Is Now in Beta
This new option is currently in beta testing and supports two AI models:
- o3 model
- o4-mini model
Users can try it now, especially if they’re handling large amounts of data that don’t require immediate answers.
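OpenAI's API documentation describes opting into Flex by setting a `service_tier` field on a request. As a minimal sketch (the helper function below is hypothetical, written here just to show where the field would go in a Chat Completions-style payload):

```python
# Hypothetical helper: build a request payload, optionally opting
# into Flex processing via the `service_tier` field. Check OpenAI's
# current API reference before relying on the exact parameter name.

def build_request(model: str, prompt: str, flex: bool = False) -> dict:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if flex:
        # Opt in to slower, cheaper Flex processing
        payload["service_tier"] = "flex"
    return payload

# A standard request and its Flex counterpart differ only in this field.
standard = build_request("o3", "Summarize this log file.")
flex = build_request("o3", "Summarize this log file.", flex=True)
```

Everything else about the request stays the same; only the service tier changes, so switching an existing batch pipeline over to Flex should be a one-line change.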
Big Savings With New Token Pricing
OpenAI has cut token prices in half for requests that use Flex:
- For the o3 model:
  - $5 per million input tokens (down from $10)
  - $20 per million output tokens (down from $40)
- For the o4-mini model:
  - $0.55 per million input tokens (down from $1.10)
  - $2.20 per million output tokens (down from $4.40)
This makes Flex a cost-effective choice for businesses or developers managing a lot of AI tasks.
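To make the savings concrete, here is a back-of-envelope calculation using the o4-mini rates listed above (the 50M/10M token volumes are an invented example, not figures from OpenAI):

```python
def cost(input_tokens: float, output_tokens: float,
         in_price: float, out_price: float) -> float:
    """Dollar cost of a job, given per-million-token rates."""
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Example: a batch job with 50M input and 10M output tokens on o4-mini.
standard = cost(50e6, 10e6, 1.10, 4.40)  # $99.00 at standard pricing
flex = cost(50e6, 10e6, 0.55, 2.20)      # $49.50 with Flex
```

Since both rates are halved, the total is exactly half regardless of the input/output mix.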
Competing with Google and Others
OpenAI's Flex launch comes as competitors, especially Google, roll out their own budget options. Google's recently announced Gemini 2.5 Flash, for example, targets lower cost and latency.
Who Should Use Flex?
If your work involves:
- Data tagging
- Content filtering
- Large batch jobs
- Non-urgent requests
then Flex could save you a lot of money while still getting the job done. It's well suited to background or delayed tasks.
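Because Flex capacity can be temporarily unavailable, background jobs should be written to wait and try again rather than fail. A minimal backoff sketch (where `send_request` is a hypothetical stand-in for your actual API call, and `RuntimeError` stands in for whatever "resource unavailable" error your client raises):

```python
import time

def with_retries(send_request, max_attempts: int = 5, base_delay: float = 1.0):
    """Call send_request, retrying with exponential backoff on failure."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RuntimeError:  # stand-in for a "resource unavailable" error
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Wait 1s, 2s, 4s, ... before the next attempt
            time.sleep(base_delay * 2 ** attempt)
```

For truly deferred work this pattern fits naturally: the job queue simply holds the task a little longer whenever Flex capacity is busy.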