Today, we’re announcing the public preview of AWS DevOps Agent, a frontier agent that helps you respond to incidents, identify root causes, and prevent future issues through systematic analysis of past incidents and operational patterns.
Frontier agents represent a new class of AI agents that are autonomous, massively scalable, and work for hours or days without constant intervention.
When production incidents occur, on-call engineers face significant pressure to quickly identify root causes while managing stakeholder communications. They must analyze data across multiple monitoring tools, review recent deployments, and coordinate response teams. After service restoration, teams often lack the bandwidth to transform incident learnings into systematic improvements.
AWS DevOps Agent is your always-on, autonomous on-call engineer. When issues arise, it automatically correlates data across your operational toolchain, from metrics and logs to recent code deployments in GitHub or GitLab. It identifies probable root causes and recommends targeted mitigations, helping reduce mean time to resolution. The agent also manages incident coordination, using Slack channels for stakeholder updates and maintaining detailed investigation timelines.
To get started, you connect AWS DevOps Agent to your existing tools through the AWS Management Console. The agent works with popular services such as Amazon CloudWatch, Datadog, Dynatrace, New Relic, and Splunk for observability data, while integrating with GitHub Actions and GitLab CI/CD to track deployments and their impact on your cloud resources. Through the bring your own (BYO) Model Context Protocol (MCP) server capability, you can also integrate additional tools into your investigations, such as your organization’s custom tools, specialized platforms, or open source observability solutions like Grafana and Prometheus.
The agent acts as a virtual team member and can be configured to automatically respond to incidents from your ticketing systems. It includes built-in support for ServiceNow and, through configurable webhooks, can respond to events from other incident management tools like PagerDuty. As investigations progress, the agent updates tickets and relevant Slack channels with its findings. All of this is powered by an intelligent application topology that the agent builds: a comprehensive map of your system components and their interactions, including deployment history that helps identify potential deployment-related causes during investigations.
Let me show you how it works
To show you how it works, I deployed a straightforward AWS Lambda function that intentionally generates errors when invoked. I deployed it in an AWS CloudFormation stack.
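The function's code isn't shown here. As a minimal sketch, assuming a Python runtime, a handler that fails on every invocation could look like the following. The exception class name and message are hypothetical, chosen only for illustration.

```python
class IntentionalDemoError(Exception):
    """Hypothetical exception type, raised on purpose for the demo."""


def lambda_handler(event, context):
    # An unhandled exception makes Lambda record the invocation as failed,
    # which increments the function's Errors metric in CloudWatch.
    raise IntentionalDemoError("Intentional test exception for the demo")
```

Because the exception is never caught, every invocation counts as an error, which is enough to trip an alarm on the function's error count.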
Step 1: Create an Agent Space
An Agent Space defines the scope of what AWS DevOps Agent can access as it performs tasks.
You can organize Agent Spaces based on your operational model. Some teams align an Agent Space with a single application, others create one per on-call team managing multiple services, and some organizations use a centralized approach. For this demonstration, I’ll show you how to create an Agent Space for a single application. This setup helps isolate investigations and resources for that specific application, making it easier to track and analyze incidents within its context.
In the AWS DevOps Agent section of the AWS Management Console, I select Create Agent Space, enter a name for this space, and create the AWS Identity and Access Management (IAM) roles it uses to introspect AWS resources in my or others’ AWS accounts.
For this demo, I choose to enable the AWS DevOps Agent web app; more about this later. This can also be done at a later stage.
When ready, I choose Create.
After it has been created, I choose the Topology tab.
This view shows the key resources, entities, and relationships AWS DevOps Agent has selected as a foundation for performing its tasks efficiently. It doesn’t represent everything AWS DevOps Agent can access or see, only what the agent considers most relevant right now. By default, the Topology includes the AWS resources that are contained in my account. As your agent completes more tasks, it will discover and add new resources to this list.
Step 2: Configure the AWS DevOps Agent web app for the operators
The AWS DevOps Agent web app provides a web interface for on-call engineers to manually trigger investigations, view investigation details including relevant topology elements, steer investigations, and ask questions about an investigation.
I can access the web app directly from my Agent Space in the AWS console by choosing the Operator access link. Alternatively, I can use AWS IAM Identity Center to configure user access for my team. IAM Identity Center lets me manage users and groups directly or connect to an identity provider (IdP), providing a centralized way to control who can access the AWS DevOps Agent web app.
At this stage, I have an Agent Space all set up to focus investigations and resources for this particular application, and I’ve enabled the DevOps team to initiate investigations using the web app.
Now that the one-time setup for this application is done, I start invoking the faulty Lambda function. It generates errors at each invocation. The CloudWatch alarm associated with the Lambda errors count switches to the ALARM state. In real life, you might receive an alert from external services, such as ServiceNow. You can configure AWS DevOps Agent to automatically start investigations when receiving such alerts.
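The alarm definition itself isn't shown here. As a rough sketch, an alarm on a function's error count could be described with parameters for CloudWatch's PutMetricAlarm API, as below. The `AWS/Lambda` namespace, `Errors` metric, and `FunctionName` dimension are the standard Lambda metric identifiers. The alarm name, threshold, period, and helper function names are assumptions for illustration, not the values used in this demo.

```python
def lambda_errors_alarm_params(function_name, threshold=1.0, period_seconds=60):
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_seconds,  # assumed evaluation window, in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,  # assumed: alarm on the first error
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no invocations means no alarm
    }


def create_alarm(function_name):
    """Create the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without it

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**lambda_errors_alarm_params(function_name))
```

With these assumed settings, a single failed invocation within a one-minute period is enough to move the alarm to the ALARM state.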
For this demo, I manually start the investigation by selecting Start Investigation.
You can also choose from several preconfigured starting points to quickly begin your investigation: Latest alarm to investigate your most recent triggered alarm and analyze the underlying metrics and logs to determine the root cause, High CPU usage to investigate high CPU utilization metrics across your compute resources and identify which processes or services are consuming excessive resources, or Error rate spike to investigate the recent increase in application error rates by analyzing metrics and application logs and identifying the source of failures.
I enter some information, such as Investigation details, Investigation starting point, the Date and time of the incident, and the AWS Account ID for the incident.
In the AWS DevOps Agent web app, you can watch the investigation unfold in real time. The agent identifies the application stack. It correlates metrics from CloudWatch, examines logs from CloudWatch Logs or external sources, such as Splunk, reviews recent code changes from GitHub, and analyzes traces from AWS X-Ray.
It identifies the error patterns and provides a detailed investigation summary. In the context of this demo, the investigation reveals that these are intentional test exceptions, shows the timeline of function invocations leading to the alarm, and even suggests monitoring improvements for error handling.
The agent uses a dedicated incident channel in Slack, notifies on-call teams if needed, and provides real-time status updates to stakeholders. Through the investigation chat interface, you can interact directly with the agent by asking clarifying questions such as “which logs did you analyze?” or steering the investigation by providing additional context, such as “focus on these specific log groups and rerun your analysis.” If you need expert assistance, you can create an AWS Support case with a single click, automatically populating it with the agent’s findings, and engage with AWS Support experts directly through the investigation chat window.
For this demo, AWS DevOps Agent correctly identified manual actions in the Lambda console to invoke a function that intentionally triggers errors 😇.
Beyond incident response, AWS DevOps Agent analyzes my recent incidents to identify high-impact improvements that prevent future issues.
During active incidents, the agent provides immediate mitigation plans through its incident mitigations tab to help restore service quickly. Mitigation plans consist of specs that provide detailed implementation guidance for developers and agentic development tools like Kiro.
For longer-term resilience, it identifies potential improvements by examining gaps in observability, infrastructure configurations, and deployment pipelines. My straightforward demo that triggered intentional errors was not enough to generate relevant recommendations, though.
For example, it might detect that a critical service lacks multi-AZ deployment and comprehensive monitoring. The agent then creates detailed recommendations with implementation guidance, considering factors like operational impact and implementation complexity. In an upcoming quick follow-up release, the agent will expand its analysis to include code bugs and testing coverage improvements.
Availability
You can try AWS DevOps Agent today in the US East (N. Virginia) Region. Although the agent itself runs in US East (N. Virginia) (us-east-1), it can monitor applications deployed in any Region, across multiple AWS accounts.
During the preview period, you can use AWS DevOps Agent at no charge, but there will be a limit on the number of agent task hours per month.
As someone who has spent countless nights debugging production issues, I’m particularly excited about how AWS DevOps Agent combines deep operational insights with practical, actionable recommendations. The service helps teams move from reactive firefighting to proactive system improvement.
To learn more and sign up for the preview, visit AWS DevOps Agent. I look forward to hearing how AWS DevOps Agent helps improve your operational efficiency.