NousCoder-14B from Nous Research is an open-source coding model landing in the middle of the Claude Code moment



Nous Research, an open-source AI startup backed by cryptocurrency venture firm Paradigm, released a new competitive programming model on Monday that it says matches or exceeds many larger proprietary systems – trained in just four days using 48 of Nvidia’s latest B200 graphics processors.

The model, called NousCoder-14B, is another entry in a crowded field of AI coding assistants, but it arrives at a particularly fraught moment: Claude Code, the agentic programming tool from rival Anthropic, has dominated social media discussion since New Year’s Day, with developers posting breathless testimonials about its capabilities. The concurrent developments underscore how quickly AI-powered software development is evolving, and how fiercely companies large and small are competing for what many believe will become a core technology for how software is written.

NousCoder-14B achieved an accuracy rate of 67.87 percent on LiveCodeBench v6, a standardized evaluation that tests models on competitive programming problems published between August 2024 and May 2025. That figure represents a 7.08-percentage-point improvement over its base model, Alibaba’s Qwen3-14B, according to Nous Research’s technical report published alongside the release.

“I gave Claude Code a description of the problem, and it generated what we built last year in an hour,” Jana Dugan, a lead engineer at Google responsible for the Gemini API, wrote in a viral post on X last week that captured the mood around AI coding tools. Dugan was describing a distributed agent orchestration system her team spent a year developing – a system that Claude Code approximated from a three-paragraph prompt.

This juxtaposition is instructive: While Anthropic’s Claude Code has captured the imagination with its end-to-end software development demos, Nous Research is betting that open source alternatives trained on verifiable problems can fill the gap – and that transparency in how these models are built is as important as raw capabilities.


How Nous Research built an AI coding model that anyone can reproduce

What sets the NousCoder-14B release apart from many competitor announcements is its radical openness. Nous Research has published not only the model weights, but also the entire reinforcement learning environment, benchmark suite, and training tools – built on the company’s Atropos framework – making it possible for any researcher with sufficient compute to reproduce or extend the work.

“The open-sourcing of the Atropos suite provides the infrastructure needed for reproducible Olympiad-level reasoning research,” one observer noted on X, summarizing its importance to the academic and open-source communities.

The model was trained by Joe Li, a resident researcher at Nous Research and himself a former competitive programmer. Li’s technical report reveals an unexpected personal dimension: he compared the model’s improvement path to his own journey on Codeforces, a competitive programming platform where participants receive ratings based on contest performance.

Based on rough estimates that map LiveCodeBench scores to Codeforces ratings, Li calculates that NousCoder-14B’s improvement – from a rating range of roughly 1600-1750 to 2100-2200 – mirrors a jump that took him nearly two years of consistent practice between the ages of 14 and 16. The model completed the equivalent climb in four days.

“Watching the final training run was a very surreal experience,” Li wrote in the technical report.

But Li is quick to note an important caveat that speaks to broader questions about AI’s efficiency: he solved roughly 1,000 problems during those two years, while the model required 24,000. Humans, at least for now, remain vastly more sample-efficient learners.


Inside a reinforcement learning system that trains on 24,000 competitive programming problems

The training process behind NousCoder-14B provides a window into the increasingly sophisticated techniques researchers are using to improve AI reasoning through reinforcement learning.

This approach is based on what researchers call “verifiable rewards” – a system in which a model generates software solutions, those solutions are executed against test cases, and the model receives a simple binary signal: correct or incorrect. Although this feedback loop is straightforward in theory, it requires significant infrastructure to implement at scale.

Nous Research used Modal, a cloud computing platform, to run sandboxed code execution in parallel. Each of the 24,000 training problems has hundreds of test cases on average, and the system must verify that the generated code produces correct output within time and memory constraints – 15 seconds and 4 GB, respectively.
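Modal handles the sandboxing and parallelism in the actual pipeline; as a rough illustration of the verification step itself, here is a minimal POSIX-only runner (a sketch, not Nous Research’s code) that enforces the reported 15-second and 4 GB limits on a single test case:

```python
import resource
import subprocess
import sys

TIME_LIMIT_S = 15                 # reported per-solution time limit
MEMORY_LIMIT_B = 4 * 1024 ** 3    # reported 4 GB memory limit

def _cap_memory():
    # Runs in the child process before exec; caps its address space (POSIX only).
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT_B, MEMORY_LIMIT_B))

def run_case(solution_code: str, stdin_text: str, expected: str) -> bool:
    """Execute one test case under the time and memory limits and check
    that stdout matches the expected output (whitespace-stripped)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", solution_code],
            input=stdin_text, capture_output=True, text=True,
            timeout=TIME_LIMIT_S, preexec_fn=_cap_memory,
        )
    except subprocess.TimeoutExpired:
        return False  # time-limit violation
    return proc.returncode == 0 and proc.stdout.strip() == expected.strip()
```

In the real system this check runs over hundreds of cases per problem, and the binary pass/fail result becomes the reward signal.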

The training used a technique called DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), which the researchers found performed slightly better than alternatives in their experiments. A key technique is “dynamic sampling” – discarding training examples where the model either succeeds on every attempt or fails on every attempt, since these provide no useful gradient signal for learning.
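The dynamic-sampling filter can be sketched in a few lines (the batching and gradient machinery are omitted; names here are illustrative):

```python
def dynamic_sampling_filter(groups: dict[str, list[int]]) -> list[str]:
    """Keep only problems whose sampled rollouts show mixed outcomes.

    `groups` maps a problem id to the binary rewards (0 or 1) of its
    sampled solutions. Problems where every attempt succeeds, or every
    attempt fails, carry zero advantage under group-relative objectives,
    so they are dropped from the training batch.
    """
    kept = []
    for problem_id, rewards in groups.items():
        if 0 < sum(rewards) < len(rewards):  # mixed successes and failures
            kept.append(problem_id)
    return kept
```

Only the “mixed” problems contribute a learning signal, so compute is spent where the model is neither hopeless nor already perfect.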

The researchers also adopted “iterative context extension,” first training the model with a context window of 32,000 tokens before expanding to 40,000 tokens. During evaluation, expanding the context to approximately 80,000 tokens yielded the best results, with an accuracy of 67.87 percent.

Perhaps most importantly, the training pipeline overlaps inference and verification – once the model has generated a solution, it starts working on the next problem while the previous solution is verified. This pipelining, combined with asynchronous training where multiple model instances run in parallel, maximizes hardware utilization on expensive GPU clusters.
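The overlap can be illustrated with a thread pool: generation proceeds on the main loop while verification of earlier solutions runs in worker threads. `generate` and `verify` below are hypothetical stand-ins, not Nous Research’s actual code:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def generate(problem: str) -> str:
    """Stand-in for model inference (hypothetical)."""
    time.sleep(0.01)  # simulate generation latency
    return f"solution for {problem}"

def verify(solution: str) -> bool:
    """Stand-in for sandboxed test execution (hypothetical)."""
    time.sleep(0.01)  # simulate verification latency
    return True

def pipelined_rollout(problems: list[str]) -> list[bool]:
    """Generate solutions sequentially, but hand each one to a worker
    thread for verification, so generating the next problem is not
    blocked on checking the previous one."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(verify, generate(p)) for p in problems]
        return [f.result() for f in futures]
```

The same idea scales up when inference and verification run on separate machines, as in the Modal-backed pipeline the report describes.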


A looming data shortage that may slow the progress of AI coding models

There’s a discovery buried in Li’s technical report that has major implications for the future of AI development: the training dataset for NousCoder-14B includes “a significant portion of all readily available competitive programming problems that can be verified in a standardized dataset format.”

In other words, in this particular field, researchers are approaching the limits of high-quality training data.

“The total number of competitive programming problems on the Internet is about the same order of magnitude,” Li wrote, referring to the 24,000 problems used for training. “This suggests that in competitive programming, we are approaching the limits of high-quality data.”

This observation reflects growing concern across the AI industry about data limitations. While compute continues to scale according to well-understood economic and engineering principles, training data is “increasingly limited,” Li says.

“It appears that some of the most important research going forward will be in the areas of synthetic data generation, algorithms, and data-efficient architectures,” he concluded.

The challenge is particularly acute for competitive programming because the field requires problems with known correct solutions that can be automatically verified. Unlike natural language tasks where human evaluation or automated metrics suffice, the code either works or it doesn’t – making synthetic data generation considerably more difficult.

Li identified one possible avenue: training models not only to solve problems, but also to generate solvable problems, enabling a form of self-play similar to techniques that have proven successful in game-playing AI systems. “Once the problem of generating synthetic problems is solved, self-play becomes a very interesting direction,” he wrote.


A $65 million bet that open source AI can compete with big tech companies

Nous Research has carved out a niche in the AI landscape: a company committed to open-source releases that compete with, and sometimes even exceed, proprietary alternatives.

The company raised $50 million in April 2025 in a round led by Paradigm, the cryptocurrency-focused venture firm founded by Coinbase co-founder Fred Ehrsam. Total funding reached $65 million, according to some reports. The investment reflects growing interest in a decentralized approach to AI training, an area in which Nous Research has developed its Psyche platform.

Previous releases include Hermes 4, a set of models that we reported “outperforms ChatGPT without content limitations,” and DeepHermes-3, which the company describes as an early “toggleable reasoning” model – allowing users to activate extended reasoning capabilities on demand.

The company has developed a distinct aesthetic and community, raising some doubts about whether style might overshadow substance. “I’ll believe pfp anime company. Stop benchmarking,” one critic wrote on X.

Others raised technical questions. “Based on the benchmark, Nemotron is the best,” one commenter noted, referring to Nvidia’s family of language models. Another asked whether NousCoder-14B was “agent-focused or just ‘one-shot’ coding” – an important distinction for practical software development, where iterating over feedback typically yields better results than single attempts.


What the researchers say needs to happen next for AI coding tools to continue improving

The release outlines several avenues for future work that hint at where AI coding research may be heading.

Multi-turn reinforcement learning tops the list. Currently, the model receives only the final binary reward – success or failure – after generating a solution. But competitive programming problems typically come with public test cases that provide intermediate feedback: compilation errors, incorrect output, and time-limit violations. Training models to incorporate this feedback across multiple attempts could dramatically improve performance.
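Such a multi-turn loop might look like the sketch below; `model` and `run_tests` are hypothetical callables for illustration, not part of the released pipeline:

```python
def solve_with_feedback(model, problem: str, run_tests, max_turns: int = 3):
    """Illustrative multi-turn loop: after each failed attempt, the
    execution feedback (compile error, wrong answer, timeout) is folded
    back into the prompt so the next attempt can correct it.

    `model(prompt)` returns candidate code; `run_tests(code)` returns a
    (passed, feedback) pair. Both are stand-ins for real components.
    """
    prompt = problem
    for _ in range(max_turns):
        code = model(prompt)
        passed, feedback = run_tests(code)
        if passed:
            return code  # would earn the final binary reward of 1
        prompt = f"{prompt}\n\nPrevious attempt failed: {feedback}\nPlease fix it."
    return None  # every attempt failed: reward 0
```

The open question the report raises is how to assign credit across these intermediate turns, rather than rewarding only the final outcome.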

Controlling response length also remains a challenge. The researchers found that incorrect solutions tended to be longer than correct solutions, and response lengths quickly saturated the context windows available during training – a pattern that various algorithmic tweaks failed to resolve.

Perhaps most ambitiously, Li proposed “problem generation and self-play” – training models both to solve and to create programming problems. This would directly address the data-scarcity problem by enabling models to generate their own training data.

“Humans are great at generating problems that are interesting and useful to other competitive programmers, but there still appears to be a significant gap in LLMs’ ability to generate creative problems,” Li wrote.

The model is now available on Hugging Face under the Apache 2.0 license. For researchers and developers who want to build on the work, Nous Research has published the full Atropos training package alongside it.

What took Li two years of teenage dedication – rising from a 1600-rated novice to a 2100-rated competitor on Codeforces – the AI replicated in 96 hours. He needed about 1,000 problems; the model needed 24,000. But soon, these systems may learn to write their own problems, teach themselves, and leave human benchmarks behind entirely.

The question is no longer whether machines can learn to program. It’s whether they will soon be better teachers than we ever were.
