Day 81 of 100 Days of AI

The Plateau of Generative AI?

Dr Mike Pound – YouTube

Many people in generative AI circles believe we can get to God-like AI by training models on ever larger swaths of data. Their view is that if we can train gargantuan models on all of the world’s data, we’ll have artificial general intelligence (“AGI”) that can excel at any task.

Thanks to this video from Dr Mike Pound (and the research paper he discusses), I’m increasingly sceptical that the existing approach of pumping more data into models will get us to AGI. Why? Here are the key empirical points I took away from Dr Pound’s video and the research article on the topic:

Diminishing Returns to Data — Researchers are finding that you need exponentially more data to get incremental improvements in performance. There’s a robust “log-linear” scaling trend: performance improves linearly only as the data grows exponentially. For example, imagine you can train a model on 100 examples of a task to get a performance score of 15%. To get a 30% score you would need 100^2 examples (i.e. 10,000). To get a 45% score you would need 100^3 examples (i.e. 1 million), and so forth: each additional 15 points of performance costs 100 times more data. Is it really possible to find exponentially more data in pursuit of incremental performance gains? At some point, you will get diminishing and possibly negative returns.
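
To make the numbers concrete, here’s a minimal Python sketch of that log-linear trend. The constants are just the illustrative figures from the example above, not a fit from the paper.

# A minimal sketch of the log-linear scaling trend described above.
# The constants are the illustrative numbers from the example, not the paper's fit.

BASE_EXAMPLES = 100      # examples needed for the baseline score
BASE_SCORE = 15.0        # performance score (%) at that baseline
POINTS_PER_STEP = 15.0   # score gained per multiplicative step in data
DATA_FACTOR = 100        # each step costs 100x more examples

def examples_needed(target_score):
    """Examples required to hit target_score if score is linear in log(data)."""
    steps = (target_score - BASE_SCORE) / POINTS_PER_STEP
    return BASE_EXAMPLES * DATA_FACTOR ** steps

for score in (15, 30, 45, 60, 75):
    print(f"{score}% -> ~{examples_needed(score):,.0f} examples")

# 15% -> ~100 examples
# 30% -> ~10,000 examples
# 45% -> ~1,000,000 examples
# 60% -> ~100,000,000 examples
# 75% -> ~10,000,000,000 examples

Under these assumptions, reaching a 75% score would take ten billion examples. That’s the trap: the cost curve explodes while the benefit curve crawls.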

Rare Things Are Pervasive But Not in the Data — The world is full of ideas, concepts, events and tasks that rarely appear in training datasets. It’s incredibly tough to collect lots of examples of something rare, yet if a model hasn’t seen enough of a concept it will underperform on it. Consider the number of scenarios you might encounter while driving a car on a road with other human drivers. No amount of data collection and model training can capture every possibility on a road. This is why we don’t yet have fully autonomous self-driving cars. It’s also why existing AI techniques might never get us there.

Figure 6 from the paper: “No ‘Zero-Shot’ Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance”

AI Models Today Struggle With Rare & Nuanced Concepts — Even if you train a model on all the data in the world, that data will have a long-tail distribution. In other words, a few things will appear a lot and many things will appear infrequently. For example, if you ask a Gen AI model to produce an image of an aircraft, a worm, or a bird, you’ll get lots of good generic results. But try asking it to produce an image of a Piaggio P.180 Avanti, or some other obscure object or animal, and current AI models fail. The authors of this paper found that 40 Gen AI models consistently underperformed on a long-tail dataset of nuanced concepts (the “Let-it-Wag” dataset) versus a broader dataset (e.g. ImageNet). You can see an example I ran below with ChatGPT (GPT-4o, using DALL-E 3) versus a real image from Wikipedia.

Above: DALL-E 3 image of a Piaggio P.180 Avanti
Above: Real-life image from Wikipedia of a Piaggio P.180 Avanti
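
To see how quickly a long tail starves rare concepts of examples, here’s a small Python sketch that samples a toy corpus from a Zipf-like distribution. The vocabulary size, corpus size, and exponent are arbitrary assumptions chosen for illustration; nothing here comes from the paper.

import random
from collections import Counter

# Toy illustration of a long-tail (Zipf-like) concept distribution.
# All constants are arbitrary assumptions chosen for illustration.
random.seed(0)
NUM_CONCEPTS = 10_000    # distinct concepts in the "world"
CORPUS_SIZE = 1_000_000  # samples in the training corpus
ZIPF_EXPONENT = 1.1      # heavier tail as this approaches 1.0

# The concept ranked k appears with probability proportional to 1 / k^s.
weights = [1 / k ** ZIPF_EXPONENT for k in range(1, NUM_CONCEPTS + 1)]
corpus = random.choices(range(NUM_CONCEPTS), weights=weights, k=CORPUS_SIZE)
counts = Counter(corpus)

# Even a million samples leave most concepts with almost no coverage.
rare = sum(1 for c in range(NUM_CONCEPTS) if counts[c] < 100)
print(f"Concepts seen fewer than 100 times: {rare / NUM_CONCEPTS:.0%}")

Under these toy assumptions, the vast majority of concepts show up fewer than a hundred times, and it’s exactly this sparsely covered tail where the paper finds models underperform.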

What does this all mean for the future of Gen AI?

Spending more money on data and compute is going to hit limits. We are probably in a Gen AI bubble, and once the performance of these models plateaus we might see a market correction in how much money flows into the technology.

But there’s hope for more progress. AI researchers are experimenting with new techniques and algorithms. Perhaps a fundamentally new architecture will get us beyond what current generative AI techniques can achieve.

For now, though, I don’t think we are on a fast path to AGI. But then again, there’s no expert consensus on the matter, so only time will tell.

P.S. Here’s a good video that lays out why large language models probably won’t turn into AGI.
