24.11.06
If you're involved in the space of LLMs, you're probably hearing more and more about agents. After all, in a recent Reddit AMA, Sam Altman said "we will have better and better models, but i think the thing that will feel like the next giant breakthrough will be agents". You can believe this or not, but it's been over a year and a half since GPT-4 was released, and it's unclear whether we will see big jumps in foundation models from OpenAI or other companies in the future.
So instead, many people are thinking more and more about LLM agents. Several agentic frameworks already exist that take different approaches to making LLMs act more effectively in the real world; LangChain, AutoGen, and CrewAI are a few well-known examples.
But ask yourself: have you actually heard of, or seen, an "agent" being used for a real-world purpose that couldn't be accomplished just about as easily without the "agent" abstraction?
I'm serious, by the way - if you have, I would be very curious to hear about it. Email me at anders@langur.ai.
Don't get me wrong - these libraries have their uses - but I think we're still a long way from having the kinds of agents we can actually rely on for complex real-world tasks.
Most "agents" you hear about are state machines doing things like RAG for Q&A chatbots. We already know LLMs are good at this stuff, so these platforms might serve as useful frameworks to build certain agentic behaviors, but we aren't really opening any "new" capabilities here. When you use an agentic framework for something, you either (1) give up too much control to the point where the agent becomes too unpredictable, or (2) restrict the agent into a specific behavior pattern, which might be handy when your task fits into a neat box - but this is not always the case.
If we're going to have an effective agent abstraction layer on top of LLMs/prompting, it needs to be reliable while retaining behavioral dynamism. You should be able to give an agent an arbitrarily complex task and trust that it will work towards it based on reasonable assumptions.
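One way to picture such an abstraction is an interface where the caller states the task plus the assumptions the agent is allowed to operate under, and the framework is responsible for keeping behavior inside them. Everything in this sketch (`Agent`, `Assumption`, `run`) is a hypothetical illustration, not any existing API, Langur's included.

```python
from dataclasses import dataclass, field

@dataclass
class Assumption:
    # An explicit, human-readable boundary on agent behavior,
    # e.g. "may read anything under ./reports, may write only to ./out".
    description: str

@dataclass
class Agent:
    task: str
    assumptions: list[Assumption] = field(default_factory=list)

    def run(self) -> str:
        # A real implementation would plan dynamically, then validate
        # each proposed step against self.assumptions before acting.
        raise NotImplementedError

# The goal: arbitrarily complex tasks, bounded by stated assumptions,
# rather than a fixed pipeline or an unconstrained loop.
agent = Agent(
    task="Triage this week's support tickets and draft replies",
    assumptions=[Assumption("drafts only; never send email directly")],
)
```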
It's not completely clear how to achieve this right now, but I do think there are identifiable areas we can work on to get much closer.
I'm building Langur, an agentic framework that attempts to address these limitations. I want to build a platform that takes LLM agents from an interesting gimmick that's occasionally useful to dependable actors you can trust to operate in production systems and solve real problems.
If you're curious to learn more, or want to share your own thoughts on the topic, visit https://langur.ai or shoot me an email at anders@langur.ai. Thanks for reading!