The Plateau of Generative AI?
There are many people in Generative AI circles who believe that we can get to God-like AI by training models on ever larger swaths of data. Their view is that if we can train gargantuan models on all of the world’s data, we’ll have artificial general intelligence (“AGI”) that can excel at any task.
Thanks to this video from Dr Mike Pound (and this research paper he discusses), I’m increasingly sceptical that the existing approach of pumping ever more data into models will get us to AGI. Why? Here are the key empirical points I took away from Dr Pound’s video and the research paper on the topic:
Diminishing Returns to Data — Researchers are finding that you need exponentially more data to get incremental improvements in performance. There’s a robust “log-linear” scaling trend: performance improves roughly in proportion to the logarithm of the amount of data, so each fixed gain in score requires multiplying the data by a constant factor. For example, imagine that training a model on 100 examples of a task gets you a performance score of 15%. To get a 30% score you would need 100^2 examples (i.e. 10,000), and to get a 45% score you would need 100^3 examples (i.e. 1 million), and so forth (there’s a short code sketch after these three points that makes this arithmetic concrete). Is it really possible to find exponentially more data in pursuit of incremental performance gains? At some point you hit diminishing, and possibly negative, returns.
Rare Things are Pervasive But Not in the Data — The world is full of ideas, concepts, events and tasks that rarely appear in training datasets. It’s incredibly tough to gather lots of examples of something rare, yet if a model hasn’t seen enough of a concept it will underperform on it. Consider the number of scenarios you might encounter while driving a car on a road with other human drivers: no amount of data collection and model training can capture every possibility. This is why we don’t yet have fully autonomous self-driving cars, and why existing AI techniques might never get us there.
AI Models Today Struggle With Rare & Nuanced Concepts — Even if you train a model on all the data in the world, that data will have a long-tail distribution. In other words, a few things will appear a lot and many things will appear infrequently. For example, if you ask a Gen AI model to produce an image of an aircraft, a worm, or a bird, you’ll get lots of good generic results. But try asking it to produce an image of a Piaggio Avanti P180 aircraft, or some other obscure object or animal, and current AI models fail. The authors of this paper found that 40 Gen AI models consistently underperformed on more nuanced data (the “Let-it-Wag” dataset) versus a broader dataset (e.g. ImageNet). You can see an example I ran below with ChatGPT (GPT-4o, using DALL·E 3) versus a real image from Wikipedia.
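To make the arithmetic in the first point concrete, here is a minimal sketch of that toy log-linear relationship. The constants are the illustrative numbers from this post (15% at 100 examples, i.e. roughly 7.5 points of score per tenfold increase in data), not fitted values from the paper Dr Pound discusses:

```python
# A minimal sketch of the "log-linear" scaling idea described above.
# The constants are purely illustrative (taken from the toy numbers in
# this post), not fitted values from the research paper.
import math

def performance(num_examples: int, gain_per_decade: float = 7.5) -> float:
    """Toy model: score (in %) grows linearly with log10 of the data."""
    return gain_per_decade * math.log10(num_examples)

def examples_needed(target_score: float, gain_per_decade: float = 7.5) -> int:
    """Invert the toy model: data needed grows exponentially with the target."""
    return round(10 ** (target_score / gain_per_decade))

for target in (15, 30, 45, 60):
    print(f"{target}% -> ~{examples_needed(target):,} examples")
# 15% -> ~100 examples
# 30% -> ~10,000 examples
# 45% -> ~1,000,000 examples
# 60% -> ~100,000,000 examples
```

The inversion is the uncomfortable part: under a log-linear trend, every extra 15 points of score costs a hundred times more data than the previous 15 did.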
What does this all mean for the future of Gen AI?
Spending more money on data and compute will hit limits. We are probably in a Gen AI bubble, and once the performance of these models plateaus, we might see a market correction in how much money flows into the technology.
But there’s hope for more progress. AI researchers are experimenting with new techniques and algorithms. Perhaps a fundamentally new architecture will get us beyond what current generative AI techniques can achieve.
For now though, I don’t think we are on a fast path to AGI. But then again, there’s no expert consensus on the matter, so only time will tell.
P.S. Here’s a good video that lays out why large language models probably won’t turn into AGI.