AI agents have been getting a lot of attention, but they can be expensive to run because they consume many tokens and make repeated calls to LLMs. Some AI developers have beaten performance benchmarks by using exceptionally expensive agentic runs. That approach doesn't translate to real-world business applications, where costs matter.
This brings me to a new AI paper from researchers at Princeton University. In a nutshell, they argue that cost should be considered alongside accuracy when evaluating AI agent systems, and they highlight techniques that can deliver high accuracy at a massively lower cost. The chart below, plotting accuracy against cost, shows what they achieved.
A sample of the techniques used to achieve high accuracy at lower cost is described below. Each is also labelled on the dots in the chart above.
Retrying Strategy: Set the model's temperature to zero and call it up to a fixed number of times, stopping as soon as an answer passes a specified test. You can do this with cheaper models like GPT-3.5, retrying just a handful of times until you get a usable answer. Note that even with temperature set to zero, these models are still somewhat stochastic, so repeated calls can yield different outputs.
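Here is a minimal sketch of what that might look like with the OpenAI Python SDK. The model name, the attempt count, and the `passes_check` verifier are all illustrative placeholders, not the paper's exact setup; swap in whatever test makes sense for your task.

```python
from openai import OpenAI

client = OpenAI()

def passes_check(answer: str) -> bool:
    # Hypothetical verifier: replace with your own test, e.g. running
    # generated code against unit tests or checking an expected format.
    return answer.strip() != ""

def retry_at_zero_temperature(prompt: str, max_attempts: int = 5) -> str | None:
    """Call a cheap model at temperature 0 up to max_attempts times,
    returning the first answer that passes the check."""
    for _ in range(max_attempts):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # cheaper model (illustrative choice)
            messages=[{"role": "user", "content": prompt}],
            temperature=0,          # still not fully deterministic in practice
        )
        answer = response.choices[0].message.content or ""
        if passes_check(answer):
            return answer
    return None  # every attempt failed the check
```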
Warming Strategy: This works like the retrying strategy, but you incrementally increase the temperature on each attempt, going from 0 up to 0.5. Turning up the temperature (i.e. the randomness of sampling) increases the variety of answers and can reach a successful result faster and more cheaply.
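A sketch of the same loop with a temperature ramp, reusing the `client` and `passes_check` placeholders from the previous sketch; the linear 0-to-0.5 schedule is just one way to "warm up" the sampling.

```python
def retry_with_warming(prompt: str, max_attempts: int = 5) -> str | None:
    """Like the retrying strategy, but raise the temperature a little on
    each attempt, from 0 up to 0.5, to vary the sampled answers."""
    for attempt in range(max_attempts):
        # Linearly ramp temperature: 0 on the first try, 0.5 on the last.
        temperature = 0.5 * attempt / max(max_attempts - 1, 1)
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        answer = response.choices[0].message.content or ""
        if passes_check(answer):
            return answer
    return None
```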
Escalation Strategy: Start with a cheaper model and escalate to more capable, expensive models only if the initial attempts fail. This conserves resources by reserving the expensive models for the cases the cheaper ones cannot handle.
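A simple sketch of escalation, again reusing `client` and `passes_check` from above. The model ladder and per-model attempt count are assumptions for illustration, not the specific configuration used in the paper.

```python
def escalate(prompt: str, attempts_per_model: int = 3) -> str | None:
    """Try a cheap model first; move to a more expensive model only
    after every cheap attempt fails the check."""
    for model in ["gpt-3.5-turbo", "gpt-4o"]:  # ordered cheapest to priciest
        for _ in range(attempts_per_model):
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0,
            )
            answer = response.choices[0].message.content or ""
            if passes_check(answer):
                return answer
    return None  # even the most expensive model failed the check
```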
The paper has more on the topic, and it's worth reading in full if you're building real-world agentic applications.