Just 18 months ago, OpenAI released GPT-3.5 Turbo, which doubled the input context window of its predecessor, GPT-3. We went from 2,048 tokens to 4,096 tokens, and that felt like a significant leap. But today we enjoy context windows of 128,000 tokens with GPT-4o.
How much further can we go? Today I was perusing this Google paper, and it turns out their research team can achieve a 10 million token context window with Gemini 1.5! Not only that: as you can see in the charts below from the June 2024 update of the paper, the model achieves almost perfect recall across the equivalent of 7 million words, up to 107 hours of audio, or 10 hours of video. These are incredibly impressive results!
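As a rough sanity check on those numbers, here is a minimal back-of-envelope sketch. The ~0.7 words-per-token ratio is an assumption on my part (a common rule of thumb for English text), not a figure taken from the paper:

```python
# Assumption: ~0.7 English words per token (rule of thumb, not from the paper).
WORDS_PER_TOKEN = 0.7
gemini_ctx = 10_000_000  # research context window reported for Gemini 1.5

# 10M tokens * 0.7 words/token = 7,000,000 words, matching the ~7M-word figure.
print(gemini_ctx * WORDS_PER_TOKEN)

# Growth factors between the context windows mentioned above:
print(4_096 / 2_048)         # GPT-3 -> GPT-3.5 Turbo: 2x
print(128_000 / 4_096)       # GPT-3.5 Turbo -> GPT-4o: 31.25x
print(gemini_ctx / 128_000)  # GPT-4o -> Gemini 1.5: 78.125x
```

In other words, the jump from GPT-4o to the reported Gemini 1.5 research window is a larger multiple than the jump from GPT-3 all the way to GPT-4o.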