AI for the Rest of Us

Stop. There's an AI Agent For That!

Fauzi
June 02, 2024 • Reading Time: 6 minutes

One of the most exciting developments in the last few months has been the proliferation of use cases for Large Language Models. One such use case I am excited about is Autonomous AI Agents. Today, we will discuss what these agents are, how they relate to standard Large Language Models, and what they are capable of. Let's get started!

What is an Agent?

An LLM-powered autonomous agent (agent, for short) is simply a Large Language Model that has been enhanced with four capabilities: planning, memory, tool use, and collaboration. Each capability enables the LLM to act autonomously to achieve a specified goal without being told how. In other words, an agent allows you to specify a high-level goal, and it figures out how to achieve that goal by itself. Let's briefly discuss each of these capabilities and what kinds of behavior they enable.

Components of an agent

Planning

Planning enables the LLM to divide a goal into smaller, more manageable subgoals. For example, given the goal “Plan a trip for 4 to Mexico in the summer,” the agent may choose to break that goal down into three different subgoals: “find flights to Mexico,” “find the best cities in Mexico for a vacation,” and “find the nicest hotels in those cities.”
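As a rough sketch of what planning could look like in code, the example below asks a model to split a goal into subgoals and parses the reply into a list. Everything here is an assumption for illustration: `fake_llm` is a hypothetical stand-in that returns a canned answer instead of calling a real model, and `plan` is a made-up helper name, not part of any agent framework.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a canned numbered list."""
    return (
        "1. find flights to Mexico\n"
        "2. find the best cities in Mexico for a vacation\n"
        "3. find the nicest hotels in those cities"
    )


def plan(goal: str, llm: Callable[[str], str] = fake_llm) -> List[str]:
    """Ask the model for subgoals and return them as a clean list of strings."""
    prompt = f"Break this goal into short subgoals, one per numbered line:\n{goal}"
    subgoals = []
    for line in llm(prompt).splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading "1. " style prefix if the model added one.
        head, sep, rest = line.partition(". ")
        subgoals.append(rest if sep and head.isdigit() else line)
    return subgoals


subgoals = plan("Plan a trip for 4 to Mexico in the summer")
for step in subgoals:
    print("-", step)
```

In a real agent, each subgoal would then be handed back to the model (or to another agent) to work on, and the plan could be revised as results come in.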

Memory

Memory enables the LLM to remember and recall details over a period of time. Standard LLM chatbots like ChatGPT already have a form of short-term memory, which lets them follow a conversation while it lasts. However, the memory capabilities of agents extend beyond that, enabling them to remember details such as your preferences and previous conversations over an extended period.
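One simple way to picture long-term memory is a small store that outlives a single conversation. The sketch below saves facts to a JSON file and loads them again in a later "session". The `AgentMemory` class and its method names are hypothetical, and real agents often use more elaborate storage (databases or search over past conversations, for example); this only shows the basic idea.

```python
import json
import os
import tempfile


class AgentMemory:
    """Hypothetical long-term memory: a dict of facts persisted to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.facts = {}
        # Load anything remembered in an earlier session.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.facts = json.load(f)

    def remember(self, key, value):
        """Store a fact and write it to disk right away."""
        self.facts[key] = value
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.facts, f)

    def recall(self, key, default=None):
        """Return a stored fact, or the default if nothing was remembered."""
        return self.facts.get(key, default)


# Use a temporary folder so the example does not touch your own files.
memory_path = os.path.join(tempfile.mkdtemp(), "memory.json")

session_one = AgentMemory(memory_path)
session_one.remember("diet", "vegetarian")

# A brand-new object, as if the agent had been restarted days later.
session_two = AgentMemory(memory_path)
recalled = session_two.recall("diet")
print(recalled)
```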

Tool Use

The word “tool” in this context simply means any capability that the LLM was not trained to have, but can be given access to. This could be almost anything that can be done on a computer. For instance, on its own, ChatGPT or Claude can write an email but cannot send it. As an agent, Claude could write the email and then interact with Gmail to send it to a specific recipient. The ability to interact with Gmail would be encapsulated in some code that the LLM could use as a tool. Pretty much anything can be a tool for an LLM, as long as it can be represented as code (most, if not all, actions on a computer can be encapsulated as code).
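To make "a tool is just code the LLM can call" a little more concrete, here is a minimal sketch. It assumes the model replies with a small JSON message naming the tool and its arguments; that message format, the `TOOLS` registry, and `run_tool_call` are all made up for illustration. The `send_email` function is a stub that only records the message in a list, it does not talk to Gmail.

```python
import json

SENT = []  # stand-in outbox so the example never sends real email


def send_email(to, subject, body):
    """Stub tool: records the email instead of calling a real mail service."""
    SENT.append({"to": to, "subject": subject, "body": body})
    return f"email queued for {to}"


# Hypothetical registry mapping tool names to ordinary Python functions.
TOOLS = {"send_email": send_email}


def run_tool_call(raw):
    """Parse the model's JSON request and run the matching tool."""
    call = json.loads(raw)
    name = call["tool"]
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](**call["arguments"])


# Pretend this string came back from the model.
model_output = json.dumps({
    "tool": "send_email",
    "arguments": {
        "to": "sam@example.com",
        "subject": "Dinner on Friday?",
        "body": "Are you free at 7pm?",
    },
})

result = run_tool_call(model_output)
print(result)
```

The result string would normally be passed back to the model so it knows the action succeeded and can decide what to do next.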

Collaboration

Given the individual capabilities of agents, putting two or more agents together can enable you to do a lot more than you could with just one agent. For instance, you could have each agent specialize in a specific task or subgoal. You could also have scenarios where agents pass messages to each other and delegate tasks to one another to meet the required goal.

Agents can communicate and collaborate

What can Agents do?

Now that you understand what agents are and the components of an agent, let’s discuss the kinds of things agents can do.

Agents have all the capabilities of a Large Language Model. However, due to their added capabilities (planning, memory, tool use, and collaboration), they can handle more complex tasks. For instance, consider a scenario where you want to research food and nutrition to create a meal plan that keeps you nourished and energized. Using a standard Large Language Model, you would start by prompting the model for information, going back and forth, refining, and following up for more information until you got the output you desired. Using an agent, this interaction would be very different. You would tell the agent your goal, and it would go off on its own, independently researching on your behalf, iterating until it found what it deems to be the best results for you.

In this example, you could use multiple agents to achieve even better results. One agent could take on the role of a nutritionist, another a chef, and a third could be a personal assistant. The nutritionist agent could focus exclusively on gathering information from the internet about nutrition and food and then recommending ingredients. The chef agent could take the nutritionist's recommendations and combine them in novel ways to create different dishes. Lastly, the personal assistant agent could take the dishes created by the chef and plan them by day, setting reminders in your calendar to shop for ingredients for each meal.
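The hand-off between the three agents can be sketched as a simple pipeline in which each agent receives the previous agent's output. This is only an illustration under loud assumptions: each "agent" below is a plain function returning canned data rather than an LLM, and the `Agent` class and `run_pipeline` helper are hypothetical names, not a real framework.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    role: str
    act: Callable[[List[str]], List[str]]


def nutritionist(_request):
    # A real agent would research this; here the answer is canned.
    return ["lentils", "spinach", "brown rice"]


def chef(ingredients):
    # Pair up neighbouring ingredients into simple dishes.
    return [f"{a} and {b} bowl" for a, b in zip(ingredients, ingredients[1:])]


def assistant(dishes):
    # Assign one dish per day, in order.
    days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    return [f"{day}: {dish}" for day, dish in zip(days, dishes)]


def run_pipeline(agents, start):
    """Pass a message along the chain, letting each agent transform it."""
    message = start
    for agent in agents:
        message = agent.act(message)
    return message


team = [
    Agent("nutritionist", nutritionist),
    Agent("chef", chef),
    Agent("personal assistant", assistant),
]

meal_plan = run_pipeline(team, [])
for entry in meal_plan:
    print(entry)
```

A straight line is the simplest arrangement. Agents could also send messages back and forth or delegate tasks to one another, which this sketch does not attempt.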

How can you use agents today?

Agents are still a fairly new concept. Today, most tools that exist are targeted at developers. However, as time goes on, these developers (myself included) will start to harness the power of agents to build software that solves ever more complex problems. This is already starting to happen, with tools like MultiOn and RelevanceAI.

MultiOn is an agent that lives in your browser and can take complex actions on your behalf. Here's a video demonstrating it shopping on Amazon, ordering an Uber, and booking events.

Tools like MultiOn are just scratching the surface of the kinds of workflows agents will enable us to automate. I am currently working on one such workflow based on my personal needs, and I will be making it available to subscribers of this newsletter in the coming weeks. Make sure you are subscribed so you don't miss it. Until then, happy prompting!

— Fauzi
