CORPGEN advances AI agents for real work

February 27, 2026


At a glance

Today’s AI agent benchmarks test one task at a time, while real workplace productivity requires managing dozens of interdependent tasks at once. To reflect this, we created a setting called Multi-Horizon Task Environments (MHTEs).
Under multi-task loads, leading computer-using agents degrade sharply, with completion rates dropping from 16.7% to 8.7%.
CORPGEN introduces digital employees, with hierarchical planning, memory isolation, and experiential learning, delivering up to 3.5 times higher completion rates than baselines across three independent agent backends.
Because CORPGEN is architecture-agnostic and modular, its gains come from system design rather than any single base model, and it benefits directly as underlying models improve.

By mid-morning, a typical knowledge worker is already juggling a client report, a budget spreadsheet, a slide deck, and an email backlog, all interdependent and all demanding attention at once. For AI agents to be genuinely useful in that environment, they will need to operate the same way, but today’s best models are evaluated one task at a time, not dozens at once.

In our paper, “CORPGEN: Simulating Corporate Environments with Autonomous Digital Employees in Multi-Horizon Task Environments,” we propose an agent framework that equips AI with the memory, planning, and learning capabilities to close that gap.

Introducing Multi-Horizon Task Environments

Replicating the reality of workplace multitasking requires a new kind of evaluation environment. In response, we developed Multi-Horizon Task Environments (MHTEs), settings where an agent must manage multiple complex tasks simultaneously. Each task requires 10 to 30 dependent steps within a single session spanning five hours.

To determine what a benchmark would need to test, we ran MHTEs at scale on some of today’s leading AI agents, exposing four weaknesses. First, memory fills up. An agent cannot hold details for multiple active tasks at once. Second, information from one task interferes with reasoning about another. Third, tasks don’t depend on each other in simple sequences. They form complex webs where an agent must constantly check whether upstream work is finished before it can move forward on anything downstream. Fourth, every action cycle requires reprioritizing across all active tasks, not simply resuming where the agent left off.
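The third weakness, the dependency web, can be made concrete with a small sketch. The task names and the `ready_tasks` helper are hypothetical and not taken from the paper; the sketch only illustrates what checking upstream work before moving downstream might look like.

```python
from typing import Dict, List, Set

# Hypothetical task graph: each task maps to the upstream tasks it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "budget_spreadsheet": set(),
    "client_report": {"budget_spreadsheet"},
    "slide_deck": {"client_report", "budget_spreadsheet"},
    "email_summary": {"slide_deck"},
}


def ready_tasks(deps: Dict[str, Set[str]], done: Set[str]) -> List[str]:
    """Return unfinished tasks whose upstream tasks are all complete."""
    return sorted(
        task
        for task, upstream in deps.items()
        if task not in done and upstream <= done  # subset test: all upstream done
    )


# Nothing is finished yet, so only the task with no dependencies is ready.
print(ready_tasks(DEPENDENCIES, set()))
# Finishing the spreadsheet unblocks the report, but not yet the slide deck.
print(ready_tasks(DEPENDENCIES, {"budget_spreadsheet"}))
```

An agent that runs a check like this on every cycle pays a cost that grows with the number of active tasks, which is one reason concurrent loads are harder than single-task benchmarks suggest.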

We also tested three independent agent systems under increasing loads. As the number of concurrent tasks rose from 12 to 46, completion rates fell from 16.7% to 8.7% across all systems.

CORPGEN’s architecture

CORPGEN introduces digital employees: LLM-powered AI agents with persistent identities, role-specific expertise, and realistic work schedules. They operate Microsoft Office applications through GUI automation and perform consistently within MHTEs over hours of continuous activity. Figure 1 illustrates how a digital employee moves through a full workday.

Figure 1. Each day begins with a structured plan and memory loaded from previous sessions. The agent then works through overlapping tasks in repeated cycles, storing key results at day’s end to inform the next session. The diagram shows a digital employee’s workday in three phases: Day Init, where the agent loads memory and generates a daily plan; Execution Cycles, where the agent repeatedly retrieves context, reasons and acts through a ReAct loop, and persists results across 50+ interleaved tasks; and Day End, where the agent generates a reflection and consolidates experience into long-term memory. Labels below the diagram show the tiered memory architecture and experiential learning components.
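The three phases of the workday can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not CORPGEN’s implementation: `run_workday`, `WorkdayLog`, and the `act` callback (standing in for the reason-and-act step) are hypothetical names, and memory is reduced to a list of strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WorkdayLog:
    plan: List[str] = field(default_factory=list)
    results: List[str] = field(default_factory=list)
    reflection: str = ""


def run_workday(
    long_term_memory: List[str],
    pending_tasks: List[str],
    act: Callable[[str, List[str]], str],
    max_cycles: int = 10,
) -> WorkdayLog:
    log = WorkdayLog()

    # Day Init: build the daily plan (here, simply the pending tasks in order).
    log.plan = list(pending_tasks)
    queue = list(log.plan)

    # Execution cycles: retrieve context, act, persist the result.
    for _ in range(max_cycles):
        if not queue:
            break
        task = queue.pop(0)
        context = [note for note in long_term_memory if task in note]
        log.results.append(act(task, context))

    # Day End: write a reflection and consolidate it into long-term memory.
    log.reflection = f"completed {len(log.results)} of {len(log.plan)} planned tasks"
    long_term_memory.append(log.reflection)
    return log


memory = ["report: use the Q3 template"]
day = run_workday(memory, ["report", "deck"], lambda task, ctx: f"{task}:{len(ctx)}")
print(day.reflection)
```

The reflection appended at day’s end is what the next session’s Day Init would load, which is how the loop carries experience forward.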

CORPGEN addresses each of the four weaknesses of concurrent task execution (memory overload, cross-task interference, dependency complexity, and reprioritization) in a targeted way. Hierarchical planning breaks objectives into daily goals and then into moment-to-moment decisions, allowing the agent to act from a structured plan instead of reviewing all available tasks before each step.
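As a rough sketch of the idea, a plan can be held as a small hierarchy and the next action read off it directly. The `Plan` and `DailyGoal` classes below are hypothetical illustrations, not the data structures used in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DailyGoal:
    description: str
    steps: List[str]
    next_step: int = 0  # index of the next step to execute

    def done(self) -> bool:
        return self.next_step >= len(self.steps)


@dataclass
class Plan:
    objective: str
    daily_goals: List[DailyGoal] = field(default_factory=list)

    def next_action(self) -> Optional[str]:
        """Return the next step of the first unfinished goal, or None when done."""
        for goal in self.daily_goals:
            if not goal.done():
                step = goal.steps[goal.next_step]
                goal.next_step += 1
                return step
        return None


plan = Plan(
    objective="Deliver the quarterly client package",
    daily_goals=[
        DailyGoal("Update budget", ["open workbook", "refresh totals"]),
        DailyGoal("Draft report", ["insert budget table"]),
    ],
)
print(plan.next_action())
```

The point of the structure is that each step consults only the plan, so the cost of choosing the next action does not grow with the full list of available tasks.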

Subagents perform complex operations like web research in isolated contexts, preventing cross-task contamination. A tiered memory system enables selective recall of task-related information rather than retaining everything in active context. Adaptive summarization compresses routine observations while preserving critical information, keeping memory growth controlled.
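A minimal sketch of tiered memory with selective recall follows. It makes two loud simplifications: the `TieredMemory` class is hypothetical, and truncating a note to 40 characters is only a crude stand-in for adaptive summarization, which in a real system would be done by a model.

```python
from collections import deque
from typing import Deque, Dict, List


class TieredMemory:
    def __init__(self, working_capacity: int = 3) -> None:
        # Working memory: a small bounded window; old entries fall off automatically.
        self.working: Deque[str] = deque(maxlen=working_capacity)
        # Long-term memory: notes keyed by task, so recall can be selective.
        self.long_term: Dict[str, List[str]] = {}

    def observe(self, task: str, note: str, critical: bool = False) -> None:
        # Stand-in for adaptive summarization: keep critical notes verbatim,
        # shorten routine ones (assumption: truncation approximates a summary).
        stored = note if critical else note[:40]
        self.working.append(f"{task}: {stored}")
        self.long_term.setdefault(task, []).append(stored)

    def recall(self, task: str) -> List[str]:
        """Return only the notes for one task, leaving other tasks out of context."""
        return list(self.long_term.get(task, []))


memory = TieredMemory(working_capacity=2)
memory.observe("report", "Scrolled through the draft and found nothing unusual in section two")
memory.observe("deck", "Deadline moved to Friday", critical=True)
print(memory.recall("deck"))
```

Keying long-term notes by task is what keeps details of one task from leaking into reasoning about another.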

Because these mechanisms are not tied to a specific base model, we tested CORPGEN across three different agents. In each case, we observed consistent gains. The improvements came from the architecture, not from the strength of any particular model. Figure 2 shows how the mechanisms fit together within CORPGEN’s architecture.

Figure 2. Four mechanisms support concurrent task execution in CORPGEN: hierarchical planning, isolated subagents, tiered memory, and adaptive summarization. The diagram places the Digital Employee at the center, with persistent identity, execution engine, cognitive tools, sub-agents, and context management. On the left, Hierarchical Planning decomposes strategic objectives into tactical plans and operational actions. On the right, Sub-Agents as Tools shows a Research Agent and a Computer-Use agent (UFO2) operating in isolated contexts. At the bottom, the Tiered Memory Architecture spans working memory, structured long-term memory, and semantic memory via Mem0. Experiential Learning in the bottom right captures successful trajectories and routes feedback to UFO2. Multi-Employee Collaboration at the top shows async communication via Email and Teams with no shared state.

How digital employees collaborate

When multiple digital employees operate in the same environment, collaboration takes shape through standard communication channels, without predefined coordination rules. One employee sends an email requesting data; another picks it up in the next cycle, uses its memory to process it, and responds. This exchange mirrors real workplace communication.

There is no shared internal state between agents. Coordination occurs entirely through email and Microsoft Teams, the same channels many workers use. Over time, these independent exchanges form recognizable organizational patterns. Some agents take on leadership roles; others provide support; shared documents become the connective tissue.

When a communication path breaks, such as an email delivery error, agents reroute messages through alternate channels to keep work moving. The result is a digital organization that behaves like a real one without being explicitly programmed to do so.
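The collaboration pattern described here, messages only and a fallback when one channel fails, can be sketched in a few lines. The `Channel` class and `send_with_fallback` function are hypothetical stand-ins; they do not call any real email or Microsoft Teams API.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, str]  # (sender, recipient, body)


class Channel:
    """An in-memory stand-in for a communication channel such as email."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available
        self.inboxes: Dict[str, List[Message]] = {}

    def deliver(self, msg: Message) -> bool:
        if not self.available:
            return False  # simulated delivery error
        self.inboxes.setdefault(msg[1], []).append(msg)
        return True

    def fetch(self, recipient: str) -> List[Message]:
        # Agents read their own inbox on their next cycle; nothing else is shared.
        return self.inboxes.pop(recipient, [])


def send_with_fallback(channels: List[Channel], msg: Message) -> str:
    """Try each channel in order and return the name of the one that worked."""
    for channel in channels:
        if channel.deliver(msg):
            return channel.name
    raise RuntimeError("no channel could deliver the message")


email = Channel("email", available=False)  # simulate an email outage
teams = Channel("teams")
used = send_with_fallback([email, teams], ("analyst", "manager", "Need Q3 data"))
print(used, teams.fetch("manager"))
```

Because each agent sees only its own inbox, any coordination that emerges has to come from the messages themselves, which matches the no-shared-state design described above.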

Evaluating CORPGEN

We evaluated CORPGEN on a multi-task benchmark that combined up to 46 tasks into a single six-hour session. Three findings stood out.

Baselines degrade as load increases; CORPGEN does not. All three baseline agent systems showed steady performance declines as task load rose. CORPGEN, by contrast, maintained or improved its completion rates at higher loads. At 46 tasks, CORPGEN completed 15.2% of tasks, compared with 4.3% for the baselines, roughly 3.5 times more.
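The “roughly 3.5 times” figure follows directly from the two completion rates quoted above. A quick check, using only those two numbers:

```python
# Completion rates at 46 concurrent tasks, as quoted in the text (percent).
corpgen_rate = 15.2
baseline_rate = 4.3

# Relative improvement of CORPGEN over the baselines.
improvement = corpgen_rate / baseline_rate
print(f"{improvement:.2f}x")  # prints 3.53x
```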

Experiential learning drives the largest gains. We introduced CORPGEN’s components sequentially: first the orchestration layer, then cognitive tools, and finally experiential learning. The first two produced moderate improvements. Experiential learning, in which agents store records of completed tasks and reuse them when they encounter structurally similar work, produced the largest increase, raising completion rates from 8.7% to 15.2%.
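One way to picture experiential learning is as a store of past trajectories queried by structural similarity. The sketch below is an assumption-heavy illustration: the `ExperienceStore` class, the hand-written feature tags, and the use of Jaccard similarity with a 0.5 threshold are all choices made for this example, not the retrieval method described in the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Experience:
    task: str
    features: FrozenSet[str]  # hypothetical structural tags (app, operation, output)
    trajectory: List[str]     # the steps that succeeded last time


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ExperienceStore:
    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self._items: List[Experience] = []

    def record(self, experience: Experience) -> None:
        self._items.append(experience)

    def retrieve(self, features: FrozenSet[str]) -> Optional[Experience]:
        """Return the most similar stored experience, if it is similar enough."""
        best = max(self._items, key=lambda e: jaccard(e.features, features), default=None)
        if best is None or jaccard(best.features, features) < self.threshold:
            return None
        return best


store = ExperienceStore()
store.record(Experience(
    task="Q2 budget chart",
    features=frozenset({"excel", "chart", "export"}),
    trajectory=["open workbook", "insert chart", "export png"],
))
match = store.retrieve(frozenset({"excel", "chart", "email"}))
print(match.trajectory if match else "no reusable experience")
```

A new task that shares enough structure with a stored one gets a known-good trajectory as a starting point, while an unrelated task falls back to planning from scratch.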

Evaluation methodology changes the picture. When we inspected the actual output files produced by agents, the results agreed with human judgments roughly 90% of the time. Evaluation based on screenshots and action logs agreed only about 40% of the time. This gap suggests that common evaluation approaches may underestimate what agents actually accomplish in practice.
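Agreement with human judgment can be computed as the fraction of tasks on which an automated evaluator and a human reach the same pass or fail verdict. The sketch below shows that calculation; the labels are invented toy data chosen for illustration and are not results from the paper.

```python
from typing import Sequence


def agreement_rate(evaluator: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of tasks where the evaluator's verdict matches the human's."""
    if len(evaluator) != len(human):
        raise ValueError("verdict lists must have the same length")
    if not human:
        raise ValueError("at least one verdict is required")
    matches = sum(e == h for e, h in zip(evaluator, human))
    return matches / len(human)


# Toy pass/fail verdicts for five tasks (illustrative only).
human_verdicts = [True, False, True, True, False]
output_file_verdicts = [True, False, True, True, True]
screenshot_log_verdicts = [False, False, False, True, True]

print(agreement_rate(output_file_verdicts, human_verdicts))
print(agreement_rate(screenshot_log_verdicts, human_verdicts))
```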

Implications and looking forward

The results suggest that memory and retrieval, not just raw model capability, may be a key bottleneck in getting agents to work in the real world. The largest gains came from experiential learning. Agents that learn from prior successes and apply those patterns to structurally similar tasks build an advantage over systems that respond to each task in isolation.

CORPGEN also opens a new lens on how AI agents collaborate. Next steps include testing whether agents can maintain memory across multiple workdays and how they coordinate when working in teams. We are also exploring ways to make agents faster and more reliable by combining different methods of interacting with software.

Acknowledgments

This work is a result of a collaboration between the Office of the CTO at Microsoft and the Microsoft AI Development Acceleration Program (MAIDAP). We would like to thank the Microsoft Security Research team for providing resources that supported this research. We also thank the members of the Microsoft UFO2 team and the Mem0 project for their open-source contributions, which enabled key components of the CORPGEN architecture, and the OSWorld team for the benchmark that served as the foundation for our multi-task evaluation.

Finally, we thank the many contributors to this research: Charlotte Siska, Manuel Raúl Meléndez Luján, Anthony Twum-Barimah, and Mauricio Velazco.
