In recent months, the landscape of AI coding agents has seen remarkable advancements, particularly with tools like OpenAI’s Codex and Anthropic’s Claude Code. These technologies have evolved to provide developers with new capabilities for rapidly building prototypes, interfaces, and boilerplate code.
Speed has become a headline metric in the race among AI coding tools. Codex-Spark's 1,000 tokens per second is respectable, but it sits at the lower end of what the industry's fastest hardware can deliver. Cerebras, a leading AI hardware company, has recorded speeds of up to 2,100 tokens per second on models such as Llama 3.1 70B, and OpenAI's gpt-oss-120B has reached 3,000 tokens per second. These benchmarks suggest that Codex-Spark's comparatively slower pace may stem from the size or complexity of its underlying model.
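To put those throughput figures in perspective, a rough back-of-the-envelope sketch shows what they mean in wall-clock time. The tokens-per-second numbers come from the article; the 4,000-token file size is an illustrative assumption, not a figure from any benchmark:

```python
# Rough wall-clock comparison at the throughputs cited above.
# FILE_TOKENS is an assumed size for a generated source file.
throughputs_tps = {
    "Codex-Spark": 1_000,
    "Cerebras (Llama 3.1 70B)": 2_100,
    "gpt-oss-120B": 3_000,
}

FILE_TOKENS = 4_000  # illustrative assumption

for name, tps in throughputs_tps.items():
    seconds = FILE_TOKENS / tps
    print(f"{name}: ~{seconds:.1f}s to emit {FILE_TOKENS} tokens")
```

At these rates the gap is seconds, not minutes, per file, which is why latency matters most for tight edit-and-retry loops rather than one-off generations.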
The Rise of AI Coding Assistants
The past year can rightly be characterized as a breakout period for AI coding assistants. Industry giants such as OpenAI, Google, and Anthropic are fervently competing to deliver more capable coding agents. In this fast-paced environment, latency has become a pivotal factor; faster coding tools allow developers to iterate their projects more efficiently.
OpenAI, in particular, has been active in refining its Codex line, rolling out updates like GPT-5.2 in December after CEO Sam Altman emphasized the competitive pressures from Google. Just days ago, the company unveiled GPT-5.3-Codex, showcasing its commitment to staying ahead of the curve.
Diversifying from Nvidia: A Significant Pivot
Yet the story surrounding Codex-Spark extends beyond raw performance scores. It runs primarily on Cerebras' Wafer Scale Engine 3, a chip roughly the size of a dinner plate, and marks a significant shift in how AI models are deployed. The partnership, initiated in January, is a strategic move by OpenAI to diversify its hardware sourcing.
Recently, OpenAI has worked diligently to lessen its reliance on Nvidia. In a noteworthy deal signed in October 2025, the company entered into a multi-year agreement with AMD and later established a $38 billion cloud computing alliance with Amazon in November. OpenAI has also been developing its own custom AI chips to be manufactured by TSMC in the future.
Despite plans for a substantial $100 billion infrastructure deal with Nvidia, tensions have arisen, contributing to a more cautious approach. Reports have indicated that OpenAI expressed dissatisfaction with the performance of some Nvidia chips for inference tasks—exactly the kind of workload that Codex-Spark is designed to excel at.
Balancing Speed and Accuracy
Ultimately, speed is a crucial element in AI coding, but it can come at the expense of accuracy. For developers working in their code editors, AI-generated suggestions arriving at 1,000 tokens per second feel less like solving a puzzle together and more like operating a powerful saw: the tool does the work quickly, but careless use is dangerous. Developers will need to stay vigilant as they adopt these faster tools.
As the AI coding landscape continues to evolve, staying informed about these developments will be crucial for developers aiming to leverage AI for enhanced productivity.