Google's latest Gemini 3.5 Flash model is causing a stir in the generative AI world, boasting impressive token output speeds while remaining cost-effective.
Continual updates have been a hallmark of Google’s approach to Gemini over the past year, but this time the company claims the release is different. Tulsee Doshi, senior director of product management for Gemini, explains that the innovations in 3.5 Flash are designed to be woven into multiple products, marking a significant step toward making complex agentic tasks economically viable.
The new model can generate nearly 300 tokens per second, about three times the rate of its predecessor, 3.1 Pro, yet it still matches the benchmark scores of that larger model. This combination of speed and benchmark performance could be a game-changer for businesses looking to integrate AI into their operations.
Google has gone so far as to suggest that the companies consuming the most AI tokens might save up to a billion dollars annually by switching to Gemini 3.5 Flash. Its API pricing is significantly lower than that of its Pro counterpart, making it more accessible and potentially more profitable for businesses in the long run.







