Microsoft releases Fara1.5: a family of browser computer use agents (4B/9B/27B) that outperforms OpenAI Operator and Gemini 2.5 Computer Use on Online-Mind2Web


Microsoft Research’s AI Frontiers lab has released Fara1.5, a family of Computer Use Agent (CUA) models for browsers. The release comes in three sizes: Fara1.5-4B, Fara1.5-9B, and Fara1.5-27B. The models are integrated with MagneticLite, Microsoft’s sandboxed browser interface for these agents.

Computer use agents are pixel-to-action models that drive a real browser. They read screenshots and issue mouse and keyboard actions to complete tasks. Recent agent products such as OpenAI’s Operator and Google’s Gemini 2.5 Computer Use fall into this category.

Fara1.5-27B achieved a task success rate of 72% on Online-Mind2Web, a benchmark that covers 300 tasks across 136 popular websites. In the same evaluation, OpenAI Operator scored 58.3% and Gemini 2.5 Computer Use scored 57.3%. Yutori’s Navigator n1 reached 64.7%, and Fara1.5-9B scored 63.4%. The 9B result is almost double that of the previous model, Fara-7B, which scored 34.1% on the same benchmark.

https://www.microsoft.com/en-us/research/articles/fara1-5-computer-use-agent/

Architecture and agent loop

The models are built on Qwen3.5 base checkpoints in 4B, 9B, and 27B variants. They operate in a loop of observing, thinking, and acting. At each step, the model receives the conversation history so far and the last three browser screenshots. It then emits a thought and exactly one next action.
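The observe, think, act loop can be sketched in a few lines. This is a minimal illustration and not Microsoft's implementation: `run_episode`, `Step`, the callback signatures, and the action strings are all hypothetical names chosen for this example. The only details taken from the article are the three-screenshot window and the single action per step.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple

MAX_SCREENSHOTS = 3  # from the article: the model sees the last three screenshots


@dataclass
class Step:
    thought: str
    action: str  # hypothetical string format, e.g. "scroll(down)"


def run_episode(
    task: str,
    observe: Callable[[], bytes],
    policy: Callable[[List[str], List[bytes]], Step],
    act: Callable[[str], bool],
    max_steps: int = 20,
) -> List[Step]:
    """Run observe -> think -> act until `act` returns True or steps run out."""
    history: List[str] = [f"task: {task}"]
    screenshots: Deque[bytes] = deque(maxlen=MAX_SCREENSHOTS)  # drops the oldest
    steps: List[Step] = []
    for _ in range(max_steps):
        screenshots.append(observe())               # observe
        step = policy(history, list(screenshots))   # think: one thought, one action
        steps.append(step)
        history.append(f"thought: {step.thought}")
        history.append(f"action: {step.action}")
        if act(step.action):                        # act; True means finished
            break
    return steps


def demo() -> Tuple[List[Step], int]:
    """Drive the loop with stub callbacks; returns steps and the widest window seen."""
    frames = iter(bytes([i]) for i in range(100))
    window_sizes: List[int] = []

    def policy(history: List[str], shots: List[bytes]) -> Step:
        window_sizes.append(len(shots))
        n = len(window_sizes)
        return Step(f"step {n}", "stop" if n == 5 else "scroll(down)")

    steps = run_episode("find the docs", lambda: next(frames), policy,
                        lambda action: action == "stop")
    return steps, max(window_sizes)
```

Calling `demo()` runs five steps with stub callbacks. The `deque(maxlen=3)` keeps the screenshot window bounded while the text history keeps growing, which is one plausible way to keep visual context cheap over long tasks.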

The action space includes standard mouse and keyboard inputs and web-specific actions such as web search. It also exposes meta-actions for context management, such as saving facts for later use and asking the user clarifying questions. These meta-actions allow the agent to operate over longer horizons and work collaboratively with users.

Training mix

Training uses supervised fine-tuning on approximately two million samples. The mix consists of 60% web trajectories and 12.8% synthetic environments. Form filling and user interactions account for 12.5%. Grounding contributes 8.8% and VQA 4.9%. Smaller slices cover GUI dragging, instruction following, and safety. The loss is applied only to the last three turns of each trajectory.
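A quick arithmetic check shows how much of the mix is left for the smaller categories, and a small helper illustrates what "loss on the last three turns" could look like as a per-turn mask. This is a sketch under assumptions: the dictionary keys are labels chosen here, and the article does not say how a turn maps to tokens, so the mask is per turn only.

```python
from typing import Dict, List

# Percentages as reported in the article; the keys are labels chosen for this sketch.
TRAINING_MIX: Dict[str, float] = {
    "web_trajectories": 60.0,
    "synthetic_environments": 12.8,
    "form_filling_and_user_interaction": 12.5,
    "grounding": 8.8,
    "vqa": 4.9,
}


def remaining_share(mix: Dict[str, float]) -> float:
    """Percentage points not covered by the named categories."""
    return round(100.0 - sum(mix.values()), 1)


def loss_mask(num_turns: int, supervised_turns: int = 3) -> List[int]:
    """Return 1 for turns that contribute to the loss, 0 for the rest.

    Only the final `supervised_turns` turns are supervised; shorter
    trajectories are supervised in full.
    """
    kept = min(num_turns, supervised_turns)
    return [0] * (num_turns - kept) + [1] * kept
```

Under these figures, `remaining_share(TRAINING_MIX)` leaves about one percentage point for the smaller categories, and `loss_mask(5)` supervises only the final three of five turns.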


FaraGen1.5: Synthetic data pipeline

FaraGen1.5 is the synthetic pipeline that generated the training trajectories. It consists of three modular components: environments, solvers, and verifiers.

Environments are divided into two types. Open-web tasks run on live websites that don’t require logins. Gated-domain tasks require authenticated sessions or involve irreversible actions, such as sending email.

For the gated domains, the team built six synthetic clones called FaraEnvs. They cover Mail, Calendar, Stream, ML, Live, and Scheduler. Each clone has a realistic front end, a fully functional API, and a database populated with persona-based data.

These environments were built using the GitHub Copilot CLI together with iterative human refinement. Because the team controls the entire stack, it knows the correct outcome for each task. For tasks that change the backend, an LLM judge compares database snapshots taken before and after execution. Tasks that do not change state are scored against precomputed reference answers.
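The before-and-after comparison can be illustrated with a plain structural diff. In the article an LLM does the comparing; the sketch below only shows the kind of change summary such a judge could be handed. The snapshot layout (table name, then primary key, then row), the function name, and the sample rows are assumptions made for this example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Snapshot = Dict[str, Dict[int, Row]]  # table name -> primary key -> row (assumed layout)


def diff_snapshots(before: Snapshot, after: Snapshot) -> Dict[str, Dict[str, List[int]]]:
    """List added, removed, and updated primary keys for each table that changed."""
    changes: Dict[str, Dict[str, List[int]]] = {}
    for table in sorted(set(before) | set(after)):
        old, new = before.get(table, {}), after.get(table, {})
        added = sorted(k for k in new if k not in old)
        removed = sorted(k for k in old if k not in new)
        updated = sorted(k for k in new if k in old and new[k] != old[k])
        if added or removed or updated:
            changes[table] = {"added": added, "removed": removed, "updated": updated}
    return changes


# Hypothetical mail-clone state before and after an agent run.
before = {"emails": {1: {"to": "ana@example.com", "sent": False}}}
after = {
    "emails": {
        1: {"to": "ana@example.com", "sent": True},
        2: {"to": "bo@example.com", "sent": False},
    }
}
changes = diff_snapshots(before, after)
```

Here `changes` reports that row 1 was updated and row 2 was added in `emails`. A task that should leave the backend untouched would be expected to produce an empty diff.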

The solver agent uses OpenAI’s GPT-5.4 with custom tools that mirror the Fara1.5 action space. The solver scored 83% on Online-Mind2Web using the automated WebJudge. The previous solver, used for Fara-7B, scored 67% in the same evaluation. A user simulator is invoked when the solver issues an ask_user call or when the task is finished.

Three verifiers gate the trajectories that go into training. The correctness verifier uses LLM-generated rubrics for open-web tasks and database diffs for synthetic tasks. The efficiency verifier penalizes redundant or unnecessary actions. The user-interaction verifier checks whether the agent paused at critical points.
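The gating step amounts to keeping a trajectory only if every check passes. The sketch below uses deliberately simple stand-ins for the three checks, since the article does not give their internals: the `Trajectory` fields, the "no identical consecutive actions" efficiency rule, and the function names are all assumptions. Only the `ask_user` action name comes from the article.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trajectory:
    actions: List[str]
    task_solved: bool   # stand-in for a rubric or database-based correctness result
    needed_pause: bool  # whether the task contained a critical point


def is_correct(t: Trajectory) -> bool:
    return t.task_solved


def is_efficient(t: Trajectory) -> bool:
    # Toy rule: reject trajectories that repeat the same action back to back.
    return all(a != b for a, b in zip(t.actions, t.actions[1:]))


def paused_when_needed(t: Trajectory) -> bool:
    return (not t.needed_pause) or ("ask_user" in t.actions)


VERIFIERS: Dict[str, Callable[[Trajectory], bool]] = {
    "correctness": is_correct,
    "efficiency": is_efficient,
    "user_interaction": paused_when_needed,
}


def failed_checks(t: Trajectory) -> List[str]:
    """Names of the verifiers this trajectory fails, in a fixed order."""
    return [name for name, check in VERIFIERS.items() if not check(t)]


def keep_for_training(t: Trajectory) -> bool:
    return not failed_checks(t)


good = Trajectory(["click(send)", "ask_user", "click(confirm)"], True, True)
loopy = Trajectory(["scroll", "scroll", "click"], True, False)
silent = Trajectory(["click(send)"], True, True)
kept = [t for t in (good, loopy, silent) if keep_for_training(t)]
```

Only `good` survives: `loopy` fails the efficiency check and `silent` fails the user-interaction check. Returning the names of failed checks, not just a boolean, makes it easier to see which verifier is discarding the most data.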

Critical points and safety

Fara1.5 is trained to stop and ask the user in three situations. First, the task requires personal information that the user has not provided. Second, the task description is vague or lacks details needed to carry it out. Third, an irreversible action is about to be taken without prior approval.
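The three situations can be written out as an explicit decision rule. In Fara1.5 this behavior is learned through training, so the code below is only a way to make the conditions concrete, not how the model works internally. The `StepContext` fields, the enum, and the order in which conditions are checked are assumptions for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PauseReason(Enum):
    MISSING_PERSONAL_INFO = "missing personal information"
    AMBIGUOUS_TASK = "ambiguous or underspecified task"
    IRREVERSIBLE_WITHOUT_APPROVAL = "irreversible action without approval"


@dataclass
class StepContext:
    needs_personal_info: bool = False
    info_provided: bool = False
    task_is_ambiguous: bool = False
    action_is_irreversible: bool = False
    user_approved: bool = False


def pause_reason(ctx: StepContext) -> Optional[PauseReason]:
    """Return why the agent should stop and ask, or None to continue."""
    if ctx.needs_personal_info and not ctx.info_provided:
        return PauseReason.MISSING_PERSONAL_INFO
    if ctx.task_is_ambiguous:
        return PauseReason.AMBIGUOUS_TASK
    if ctx.action_is_irreversible and not ctx.user_approved:
        return PauseReason.IRREVERSIBLE_WITHOUT_APPROVAL
    return None
```

For example, `pause_reason(StepContext(action_is_irreversible=True))` returns `PauseReason.IRREVERSIBLE_WITHOUT_APPROVAL`, while the same context with `user_approved=True` returns `None` and the agent proceeds.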

Safety training uses public safety datasets and internal tasks aligned with Microsoft’s Responsible AI policy. Within MagneticLite, all agent actions are logged and auditable. The sandboxed browser also acts as a security boundary between the agent and the user’s device.

Other benchmarks

On WebVoyager, Fara1.5-27B scores 88.6%, the 9B scores 86.6%, and the 4B scores 80.8%. The 9B also outperforms similarly sized peers such as MolmoWeb 8B, GUI-Owl-1.5 8B, and Holo2 8B. All Fara1.5 evaluations use Browserbase to stabilize sessions and reduce session-level blocking. Figures are averaged over three independent runs.

On WebTailBench v1.5, which targets long-horizon web tasks, Fara1.5-9B achieved 64.5% process success and 32.3% outcome success. GPT-5.4 reached 79.6% process success and 57.4% outcome success on the same benchmark.

Key takeaways

The key points, one line each:

  • Microsoft Research has released Fara1.5, a family of 4B, 9B, and 27B browser computer use agents built on top of Qwen3.5.
  • Fara1.5-27B scored 72% on Online-Mind2Web, beating OpenAI Operator (58.3%), Gemini 2.5 CU (57.3%), and Yutori Navigator n1 (64.7%).
  • The FaraGen1.5 synthetic data pipeline unlocks training on gated domains via six functional application clones (FaraEnvs) built using the GitHub Copilot CLI.
  • Fara1.5 pauses to ask the user at critical points: missing information, ambiguous tasks, or irreversible actions without consent.
