Day 100 of 100 Days of AI 🎉
AI agents have been getting a lot of attention, but they can be expensive to run: they consume a lot of tokens and make multiple calls to LLMs. Some AI developers have beaten performance benchmarks by using exceptionally expensive agentic runs, but that approach doesn't translate to real-world business applications, where costs matter.
That brings me to a new AI paper from researchers at Princeton University. In a nutshell, the researchers argue that cost should be considered alongside accuracy when evaluating AI agent systems. The paper also highlights techniques that can produce high accuracy at a massively lower cost — see the accuracy-versus-cost chart below for what they achieved.
A sample of the techniques used to achieve high accuracy at lower cost is listed below, with a short code sketch under each one. The techniques are also labelled on the dots in the chart above.
Retry Strategy: Set the model's temperature to zero and call it up to a fixed number of times, stopping as soon as an answer passes a specified test. You can do this with cheaper models like GPT-3.5, and a handful of repeated calls is often enough to get a desired answer. Note that even when temperature is set to zero, these models are still stochastic, which is why repeated attempts can produce different results.
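To make this concrete, here's a minimal Python sketch of the retry loop. The call_llm wrapper, the passes_test callback, and the model name are my own illustrative choices rather than the paper's exact setup, and the sketch assumes the OpenAI Python client with an API key in your environment.

```python
from typing import Callable

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def call_llm(prompt: str, model: str, temperature: float) -> str:
    """Thin wrapper around a single chat-completion call."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content or ""


def retry_strategy(
    prompt: str,
    passes_test: Callable[[str], bool],
    max_attempts: int = 5,
) -> str | None:
    """Call a cheap model at temperature 0 up to max_attempts times,
    returning the first answer that passes the test."""
    for _ in range(max_attempts):
        answer = call_llm(prompt, model="gpt-3.5-turbo", temperature=0.0)
        if passes_test(answer):
            return answer
    return None  # every attempt failed the test
```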
Warming Strategy: The same as the retry strategy, but you incrementally increase the temperature on each try, ramping from 0 up to 0.5. Turning up the temperature (i.e., randomness) increases the variety of answers and can surface a passing result sooner, and therefore more cheaply.
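Here's a sketch of the warming idea, reusing the hypothetical call_llm helper from the retry sketch above. The linear 0-to-0.5 ramp is one simple schedule under my assumptions, not necessarily the paper's exact one.

```python
from typing import Callable


def warming_strategy(
    prompt: str,
    passes_test: Callable[[str], bool],
    max_attempts: int = 5,
) -> str | None:
    """Retry loop that ramps temperature from 0.0 up to 0.5 across attempts.
    Reuses call_llm from the retry sketch above."""
    step = 0.5 / max(max_attempts - 1, 1)
    for attempt in range(max_attempts):
        temperature = step * attempt  # 0.0 on the first try, 0.5 on the last
        answer = call_llm(prompt, model="gpt-3.5-turbo", temperature=temperature)
        if passes_test(answer):
            return answer
    return None
```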
Escalation Strategy: You start with a cheaper model and escalate to more intelligent, expensive models only if the earlier attempts fail. This reserves the expensive models for the cases where the cheaper ones fail.
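And a sketch of escalation under the same assumptions, again reusing the call_llm helper. The model lineup here is illustrative; substitute whatever cheap-to-expensive ordering fits your stack.

```python
from typing import Callable


def escalation_strategy(
    prompt: str,
    passes_test: Callable[[str], bool],
    models: tuple[str, ...] = ("gpt-3.5-turbo", "gpt-4"),  # cheapest first
) -> str | None:
    """Try each model in order of cost, escalating only after a failure.
    Reuses call_llm from the retry sketch above."""
    for model in models:
        answer = call_llm(prompt, model=model, temperature=0.0)
        if passes_test(answer):
            return answer
    return None  # even the most capable model failed
```

The appeal of this design is that the expensive model's cost is only incurred on the hard cases, e.g. escalation_strategy("Write a sort function.", passes_test=my_unit_test) never touches GPT-4 if GPT-3.5's answer passes the test.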
The paper has more on the topic and it’s worth reading in full if you’re building real-world agentic applications.