Andrej Karpathy has released Autoresearch, an open-source framework that turns ML experimentation into an autonomous loop. The 630-line Python tool lets AI agents read their own source code, form hypotheses for improvement, modify the code, run experiments, and commit successful changes via Git, all without human intervention.
The framework, published in early March 2026, has already run 50 experiments overnight on a single GPU. Each dot in Karpathy’s visualisation represents an independent experiment: the agent proposes a change, such as adjusting a learning rate or modifying architecture depth, runs the training loop, evaluates the result, and decides whether to keep or discard the modification.

How does Autoresearch work?
The design is deliberately minimal. Autoresearch reads its own codebase, generates a hypothesis for improvement, implements the change, runs the experiment, and evaluates the result against a metric. Successful experiments are committed to Git; failed ones are discarded. The loop continues autonomously until the allotted GPU time runs out or the user stops it.
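The keep-or-discard cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not Autoresearch's actual code. The function names (run_experiment, propose_change, research_loop), the toy loss and the starting settings are all assumptions. A random perturbation stands in for the agent's hypothesis, and an in-memory assignment stands in for the Git commit.

```python
import random


def run_experiment(config):
    """Toy stand-in for a training run; returns a loss to minimise.

    Hypothetical objective: a real run would train a model and
    report a validation metric instead.
    """
    return (config["lr"] - 0.003) ** 2 + 0.01 * abs(config["depth"] - 8)


def propose_change(config, rng):
    """Stand-in for the agent's hypothesis: perturb one setting."""
    candidate = dict(config)  # copy so the current best is untouched
    if rng.random() < 0.5:
        candidate["lr"] = config["lr"] * rng.choice([0.5, 2.0])
    else:
        candidate["depth"] = max(1, config["depth"] + rng.choice([-1, 1]))
    return candidate


def research_loop(config, budget, seed=0):
    """Run `budget` experiments, keeping only changes that lower the loss."""
    rng = random.Random(seed)
    best_loss = run_experiment(config)
    history = []  # one (loss, kept) entry per experiment
    for _ in range(budget):
        candidate = propose_change(config, rng)
        loss = run_experiment(candidate)
        kept = loss < best_loss
        if kept:
            # "Commit": the candidate becomes the new baseline.
            config, best_loss = candidate, loss
        # Otherwise the candidate is simply dropped ("discard").
        history.append((loss, kept))
    return config, best_loss, history


if __name__ == "__main__":
    best, loss, history = research_loop({"lr": 0.01, "depth": 4}, budget=50)
    print(best, loss, sum(kept for _, kept in history), "experiments kept")
```

Each entry in history corresponds to one experiment, with a flag recording whether the change was kept. In a real setup the expensive step is run_experiment, which is why the number of experiments per night is bounded by GPU time.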
What makes the framework interesting is its simplicity rather than its sophistication. At 630 lines of Python, it strips agentic experimentation down to its essential loop. There is no elaborate orchestration and no dependency on a complex agent framework, just a tight read-modify-evaluate-commit cycle that a single GPU can sustain overnight.

Beyond ML research
The design pattern behind Autoresearch applies well beyond machine learning. Any domain where code can be modified, evaluated against a metric, and iterated on autonomously fits the model. Software testing, optimisation problems, and configuration tuning are all candidates for the same approach.
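To illustrate how the pattern carries over, here is a hedged sketch of the same loop written without any ML-specific parts and applied to configuration tuning. Everything in it is hypothetical: the improve helper, the toy throughput model with its peak at 8 workers and a batch size of 64, and the tweak function are assumptions made for the example, not part of Autoresearch.

```python
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def improve(
    state: S,
    propose: Callable[[S], S],
    score: Callable[[S], float],
    steps: int,
) -> Tuple[S, float]:
    """Domain-agnostic keep-or-discard loop; a higher score is better."""
    best = score(state)
    for _ in range(steps):
        candidate = propose(state)
        value = score(candidate)
        if value > best:
            state, best = candidate, value  # keep the winner
    return state, best


def throughput(cfg):
    """Toy metric (assumed): peaks at 8 workers and a batch size of 64."""
    return -((cfg["workers"] - 8) ** 2) - ((cfg["batch"] - 64) / 16) ** 2


rng = random.Random(1)


def tweak(cfg):
    """Propose a neighbouring configuration by nudging one setting."""
    new = dict(cfg)
    key = rng.choice(["workers", "batch"])
    step = 1 if key == "workers" else 16
    new[key] = max(step, new[key] + rng.choice([-step, step]))
    return new


start = {"workers": 2, "batch": 16}
tuned, best = improve(start, tweak, throughput, steps=200)
print(tuned, best)
```

Only propose and score change from one domain to the next. For a code repository, keeping a change could map to a Git commit and discarding it to resetting the working tree. For a test suite, the score might be the number of passing tests.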
Karpathy’s framework also highlights a broader trend: AI agents that improve themselves. The loop of proposing changes, evaluating them, and committing winners is a primitive form of self-improvement. As agent frameworks grow more capable, the question is whether this pattern scales from single-GPU ML experiments to larger, more consequential domains.


