OpenHands CodeAct 2.1: An Open, State-of-the-Art Software Development Agent

Written by Graham Neubig, Xingyao Wang

Published on November 1, 2024

OpenHands just released the strongest AI software developer to date, and it is open source, reproducible, and available for anyone to use.

Based on our new agent, CodeAct 2.1 (powered by the new Claude 3.5 Sonnet), we achieved:

🥇 53% resolve rate on SWE-Bench Verified
🥇 41.7% resolve rate on SWE-Bench Lite

These numbers demonstrate the strength of our agent, and the improvement in user experience is just as noticeable: it resolves more issues, more autonomously, than ever before. Read on for more details, or jump right in to:

What is SWE-Bench?

SWE-Bench is a benchmark designed to evaluate the ability of large language models (LLMs) to address real-world software engineering challenges. It consists of GitHub issues and their corresponding pull requests from 12 popular Python repositories, and it tasks models with generating code patches that resolve the specified issues. Because of its realism, and because AI agents that can autonomously solve real-world software development challenges could have vast benefits, it is widely used throughout academia and industry as a gold standard for measuring the abilities of AI software developers.
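To make "resolve rate" concrete: in SWE-Bench, a generated patch counts as resolving an issue when the tests that the reference fix makes pass (FAIL_TO_PASS) now pass and the tests that already passed (PASS_TO_PASS) still pass. The sketch below is a simplified, hypothetical model of that scoring step, not the official evaluation harness. `InstanceResult`, `is_resolved`, and `resolve_rate` are illustrative names, and the test outcomes are assumed to have been collected already (the real harness runs each repository's tests inside a container).

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """Hypothetical per-issue test outcomes after applying a model's patch."""
    instance_id: str
    fail_to_pass: dict[str, bool]  # tests the reference fix makes pass -> passed now?
    pass_to_pass: dict[str, bool]  # tests that passed before -> still passing?


def is_resolved(r: InstanceResult) -> bool:
    # Resolved only if every target test now passes AND nothing regressed.
    return all(r.fail_to_pass.values()) and all(r.pass_to_pass.values())


def resolve_rate(results: list[InstanceResult]) -> float:
    """Fraction of benchmark instances whose patch resolves the issue."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


if __name__ == "__main__":
    results = [
        InstanceResult("fixed", {"test_bug": True}, {"test_old": True}),
        InstanceResult("regressed", {"test_bug": True}, {"test_old": False}),
        InstanceResult("unfixed", {"test_bug": False}, {"test_old": True}),
    ]
    print(f"{resolve_rate(results):.1%}")  # only "fixed" counts
```

Under this definition, 53% on SWE-Bench Verified (500 instances) corresponds to roughly 265 resolved issues, and 41.7% on SWE-Bench Lite (300 instances) to roughly 125.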

What did we do?

We released a new agent, OpenHands CodeAct 2.1. CodeAct is a single, general-purpose agent built on top of a language model; it solves a wide variety of tasks by writing and executing Python and bash code (see the CodeAct paper for more details).
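The core idea of CodeAct is that the model's actions are themselves executable code: at each step the model emits a Python or bash snippet, the environment runs it, and the output is fed back as the next observation. The following is a minimal sketch of that loop under stated assumptions. The `policy` callable stands in for the language model, and `execute`, `run_agent`, and `scripted_policy` are hypothetical names, not OpenHands APIs. A real agent also sandboxes execution, which this sketch does not do.

```python
import subprocess
import sys
from typing import Callable, Iterable

# Hypothetical action format: (kind, source), kind in {"bash", "python", "finish"}.
Action = tuple[str, str]
Policy = Callable[[list[str]], Action]


def execute(action: Action, timeout: int = 30) -> str:
    """Run a code action and return its combined stdout/stderr as the observation."""
    kind, source = action
    if kind == "bash":
        cmd = ["bash", "-c", source]
    elif kind == "python":
        cmd = [sys.executable, "-c", source]
    else:
        raise ValueError(f"unknown action kind: {kind}")
    # NOTE: no sandboxing here; a real agent runs actions in an isolated environment.
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.stdout + proc.stderr


def run_agent(policy: Policy, task: str, max_steps: int = 10) -> list[str]:
    """Alternate between code actions and execution observations until 'finish'."""
    history = [task]
    for _ in range(max_steps):
        kind, source = policy(history)  # an LLM would choose the next action here
        if kind == "finish":
            break
        history.append(f"{kind}: {source}")
        history.append(execute((kind, source)))
    return history


def scripted_policy(actions: Iterable[Action]) -> Policy:
    """Stand-in for an LLM: replays a fixed list of actions, then finishes."""
    it = iter(actions)
    return lambda history: next(it, ("finish", ""))


if __name__ == "__main__":
    policy = scripted_policy([("python", "print(6 * 7)")])
    for entry in run_agent(policy, "Compute 6 * 7."):
        print(entry.rstrip())
```

Keeping the whole interaction as a flat history of actions and observations mirrors how the transcript would be fed back to the model on the next step.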

CodeAct was already a competitive agent for SWE-Bench, but for CodeAct 2.1, we made a number of further improvements:

Switched to function calling, which lets us specify precisely which functions (tools) the language model can use and have it invoke them in a structured format
Switched to Anthropic's new Claude 3.5 Sonnet model, released in October 2024
Made several fixes that make it easier for the agent to navigate directories
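With function calling, the agent no longer has to parse actions out of free-form text: each tool is declared with a name, a description, and a JSON Schema for its arguments, and the model replies with a structured call naming the tool and its inputs. The sketch below assumes Anthropic-style `tool_use` and `tool_result` content blocks. The tool names `execute_bash` and `list_directory` and the `dispatch` helper are illustrative assumptions, not necessarily the tools CodeAct 2.1 actually exposes, and the argument check covers only required keys rather than full schema validation.

```python
import os
import subprocess

# Tool declarations in the JSON-Schema style used by function-calling APIs.
# Field names follow Anthropic's Messages API; the tools themselves are hypothetical.
TOOLS = [
    {
        "name": "execute_bash",
        "description": "Run a bash command in the sandbox and return its output.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
    {
        "name": "list_directory",
        "description": "List the entries of a directory, sorted by name.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
]


def _execute_bash(command: str) -> str:
    proc = subprocess.run(["bash", "-c", command], capture_output=True, text=True, timeout=30)
    return proc.stdout + proc.stderr


def _list_directory(path: str) -> str:
    return "\n".join(sorted(os.listdir(path)))


HANDLERS = {"execute_bash": _execute_bash, "list_directory": _list_directory}


def _result(tool_use_id: str, content: str, is_error: bool = False) -> dict:
    block = {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
    if is_error:
        block["is_error"] = True
    return block


def dispatch(tool_use: dict) -> dict:
    """Turn a model-issued tool_use block into a tool_result block."""
    name, args = tool_use["name"], tool_use["input"]
    spec = next((t for t in TOOLS if t["name"] == name), None)
    if spec is None:
        return _result(tool_use["id"], f"unknown tool: {name}", is_error=True)
    missing = [k for k in spec["input_schema"]["required"] if k not in args]
    if missing:
        return _result(tool_use["id"], f"missing arguments: {missing}", is_error=True)
    return _result(tool_use["id"], HANDLERS[name](**args))
```

Returning malformed calls to the model as error results, rather than crashing, gives it a chance to correct its next call.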

Why is this important?

From a user experience perspective, the difference is palpable. We use OpenHands every day in our own development, through both the UI and the GitHub issue resolver, and things are smoother: more issues get resolved without human intervention, bigger issues can be tackled, and agents get stuck in loops less often.

Further, at All Hands AI we believe strongly that AI agents are going to be transformative for software development. Because this is software that will touch every developer's workflow, it should also be developed in the open, so that all developers can see it, use it, and contribute to its development. Making the best AI software developers open source is a step towards this mission.

If this mission resonates with you, we would love to have you start out by using OpenHands through the links above, and further:

Citation