Day 92 of 100 Days of AI

I’ve just finished listening to this podcast with the founders of https://www.factory.ai/. Their startup automates the drudge work of software engineering: writing tests, debugging, documentation, and migrations.

I found the founders’ views compelling. For example, they don’t think AI will replace software engineers suddenly and en masse. Instead, we are likely to see gradual automation, task by task. Factory.ai already has bots that can automatically review code changes, a bot that can rewrite and improve code, and bots that can plan larger projects.

What they aren’t building, and what isn’t yet achievable, is a fully autonomous AI software engineer. We are still some way away from that, and companies attempting to build such systems will probably struggle to create real business value. (Hint: those systems aren’t yet reliable!)

I found the podcast insightful and it’s worth a watch if you’re interested in the automation of work or are building a company in the space.

Day 91 of 100 Days of AI

I finally have a YouTube video summarizer working! I’ve deployed the app with Streamlit.io. On the backend I’m using the Gemini 1.5 Flash LLM because it’s cheap and fast! The summaries are “chained” together using Langchain. A 20-minute video can be summarized in around 10 seconds.
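
For anyone curious, here’s a minimal sketch of the shape of the app, assuming youtube-transcript-api for transcripts and a GOOGLE_API_KEY in the environment. The chunk sizes and the video-ID input are simplified stand-ins, not the exact deployed code.

```python
# Minimal sketch of the app: Streamlit front end, LangChain refine chain,
# Gemini 1.5 Flash on the backend. Illustrative only, not the deployed code.
import streamlit as st
from youtube_transcript_api import YouTubeTranscriptApi
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import ChatGoogleGenerativeAI

def summarize_video(video_id: str) -> str:
    # 1. Fetch the transcript and join it into one string.
    transcript = " ".join(
        chunk["text"] for chunk in YouTubeTranscriptApi.get_transcript(video_id)
    )
    # 2. Split into documents small enough to pass to the model one at a time.
    splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
    docs = [Document(page_content=c) for c in splitter.split_text(transcript)]
    # 3. Chain the per-chunk summaries together with the refine chain.
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)
    chain = load_summarize_chain(llm, chain_type="refine")
    return chain.invoke({"input_documents": docs})["output_text"]

st.title("YouTube Video Summarizer")
video_id = st.text_input("YouTube video ID")  # e.g. the part after watch?v=
if st.button("Summarize") and video_id:
    with st.spinner("Summarizing…"):
        st.markdown(summarize_video(video_id))
```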

I watch a lot of YouTube videos, but sometimes I want to get a quick overview of a video before investing time into it. I’ll be using this app going forward to get a preview of content and also to get a recap of long videos I have already watched. Here’s an example output summary generated from this YouTube video.

Day 90 of 100 Days of AI

I’ve been testing out the new Claude LLM and it’s better at coding than GPT-4o. I might actually start using it more given how fast and effective it is. You can see the benchmarks below that Anthropic provided. Self-published benchmarks probably carry some bias, but in my experience the model has been performing better than GPT-4o in some instances.

Day 89 of 100 Days of AI

Today I got a YouTube summarizer script working! I can input a YouTube URL and get a useful summary of the video using the Langchain “refine” documents chain. I really like this because it iterates over portions of the text, refining the summary until something more thorough is generated.
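
Roughly, the refine mechanics look like the snippet below. The prompts and the placeholder transcript chunk are illustrative assumptions rather than my exact script; transcript fetching is shown in the Day 91 sketch above.

```python
# Sketch of the refine step: the chain drafts a summary from the first chunk,
# then revisits and refines it with each subsequent chunk. Prompts are illustrative.
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

docs = [Document(page_content="...transcript chunk...")]  # placeholder chunked transcript

initial_prompt = PromptTemplate.from_template(
    "Write a concise summary of this transcript excerpt:\n\n{text}"
)
refine_prompt = PromptTemplate.from_template(
    "Here is the summary so far:\n{existing_answer}\n\n"
    "Refine it using this additional excerpt:\n{text}"
)

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)
chain = load_summarize_chain(
    llm,
    chain_type="refine",
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
)
print(chain.invoke({"input_documents": docs})["output_text"])
```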

I’ll share this on Github soon along with some demo examples. I’ve used the Gemini 1.5 Flash model via API. As you can see in the charts below, it performs reasonably well, it’s fast, and it’s really cheap!

Day 88 of 100 Days of AI

This evening I experimented with Langchain’s summarisation frameworks. This is something LLMs are great at natively, but with Langchain, you can use even more sophisticated summarisation techniques. Here’s a GPT-generated summary of these techniques based on the Langchain documentation:

From ChatGPT-4o:

1. Stuff Method

  • Concept: Simply concatenate all documents into a single prompt and pass that prompt to a language model (LLM).
  • Usage:
    • Suitable for cases where the combined document size does not exceed the model’s token limit.
    • Useful for quick and simple summarization tasks.
  • Pros: Easy to implement.
  • Cons: Limited by the token capacity of the LLM; not efficient for large sets of documents.

2. Map-Reduce Method

  • Concept: A two-step approach where documents are first summarized individually (map), and then these summaries are combined into a final summary (reduce).
  • Usage:
    • Appropriate for summarizing large collections of documents.
    • Effective when documents are too large to be processed in a single prompt.
  • Pros: Can handle larger datasets by breaking them down into smaller chunks.
  • Cons: More complex to implement compared to the stuff method; may require tuning to balance between map and reduce stages.

3. Refine Method

  • Concept: Iteratively updates a summary by passing through the documents sequentially, refining the summary at each step.
  • Usage:
    • Best for situations where the documents can provide additional context sequentially.
    • Useful for creating a more detailed and nuanced summary.
  • Pros: Produces a progressively refined and detailed summary.
  • Cons: Can be time-consuming and computationally expensive due to iterative nature.
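
To make the mapping to code concrete, here’s a small illustrative snippet (my own, not from the docs verbatim) showing how each of these techniques is selected with the chain_type argument of LangChain’s load_summarize_chain. The document contents and model choice are placeholders.

```python
# Each summarization technique above maps to a chain_type value in LangChain.
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain_google_genai import ChatGoogleGenerativeAI

docs = [Document(page_content="...chunk of text to summarize...")]  # placeholder documents
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)

stuff_chain = load_summarize_chain(llm, chain_type="stuff")            # one big prompt
map_reduce_chain = load_summarize_chain(llm, chain_type="map_reduce")  # per-chunk summaries, then combine
refine_chain = load_summarize_chain(llm, chain_type="refine")          # iteratively refine a running summary

summary = refine_chain.invoke({"input_documents": docs})["output_text"]
print(summary)
```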

Day 87 of 100 Days of AI

Given the note I wrote yesterday about the need to be “programmatically” lazy, I’m rewriting my YouTube reviewer system. I’m building it initially without agents, and I’ll only add an agentic layer in areas that are more unstructured. I’m not yet sure which areas will need this, but once I have a deterministic version of the system working, I’ll then consider AI agents.

My hope is that this system will be cheaper to run and that it will output higher quality results. You can see a previous version of the system here.

Day 86 of 100 Days of AI

The temptation with any new powerful tool is to apply it in more places than is necessary. What’s that saying…”to a man with a hammer, every problem is a nail”?

The same appears to be happening with large language model frameworks and AI agent frameworks. I’ve experienced some of this personally. Some things I can do programmatically without frameworks, but I’ve been tempted to use AI agents whose API calls to OpenAI are bloated with tokens I didn’t need to use.

In the video below Matt Williams reminds us to consider simplicity.

One quote worth remembering here is this:

“…only lazy programmers will want to write the kind of tools that might replace them in the end. Only a lazy programmer will avoid writing monotonous, repetitive code. The tools and processes inspired by laziness speed up production.”

Philipp Lenssen

Day 85 of 100 Days of AI

One of the pioneers of how large language models are trained and fine-tuned, Jeremy Howard, has a great article here on “What policy makers need to know about AI (and what goes wrong if they don’t)”. The article is actually very accessible to everyday readers. Here’s an excerpt as an example.

The rest of the article makes the broad point that policy makers don’t understand how large language models work. Howard notes that “AI models are general purpose computation devices, which can be quickly and easily modified and used for any purpose. Therefore, the requirement that they must be shown to be safe prior to release, means that they can’t be released at all.”

The full article is worth reading if you’re building in AI or are interested in how this new powerful technology will be regulated.

Day 84 of 100 Days of AI

I tested out a few AI agents today with the goal of optimising for cost, but they remain expensive to run if you have tasks that involve multi-step reasoning over chunky bits of content.

One group of tasks that I gave to GPT-4o cost $0.15 and took 2 minutes to process. That’s $4.50 per hour. Running these tasks continuously 8 hours per day over 250 working days a year would cost $9,000.
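
For reference, the back-of-the-envelope arithmetic behind those figures:

```python
# Back-of-the-envelope check on the numbers above.
cost_per_batch = 0.15        # dollars for one multi-step task batch
minutes_per_batch = 2

hourly_cost = (60 / minutes_per_batch) * cost_per_batch  # 30 batches/hour -> $4.50
annual_cost = hourly_cost * 8 * 250                      # 8 h/day, 250 working days -> $9,000
print(f"${hourly_cost:.2f}/hour, ${annual_cost:,.0f}/year")
```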

The cost of models is going to come down, but thinking through these numbers made me reconsider how to build agents. They need to be resource-conscious and efficient. On the flip side, startups that are selling agentic services have to find a way to price them for the value they create, not just what they cost to run. For example, paying $9,000 a year for automation isn’t crazy if it frees up time that is worth 10x or more.

Day 83 of 100 Days of AI

OpenAI launched a feature called “function calling” last year, but I hadn’t used it and didn’t understand it until today.

Part of my misunderstanding was because the feature is poorly named. It made me (and many other more technical people) expect that the capability means an LLM can call functions. That isn’t the case. OpenAI’s “function calling” allows developers to ensure a model’s outputs fit a particular format. Instead of generating just regular text, the model can strictly adhere to JSON format, for example. The YouTuber below provides the clarity I needed on this today. He also shows how to achieve the same JSON format effect using a local on-device model with Ollama.
https://youtu.be/RXDWkiuXtG0?feature=shared
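
To make that concrete, here’s a rough sketch using the OpenAI Python client. The get_weather schema is purely illustrative; the point is that forcing the tool call makes the model return arguments as JSON matching the schema, rather than free-form text.

```python
# Sketch: using OpenAI "function calling" to get structured JSON output.
# The model never executes get_weather; it just returns JSON arguments
# that conform to the schema below.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative schema, not a real function being run
        "description": "Report the weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Lisbon in celsius?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}},  # force the tool call
)

args = response.choices[0].message.tool_calls[0].function.arguments
print(json.loads(args))  # e.g. {"city": "Lisbon", "unit": "celsius"}
```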