
(New Africa/Shutterstock)
Since the earliest days of big data, data engineers have been the unsung heroes doing the dirty work of moving, transforming, and prepping data so that highly paid data scientists and machine learning engineers can do their thing and get the glory. As the agentic AI era dawns, it opens up a host of new data engineering opportunities, as well as potentially catastrophic pitfalls.
Frank Weigel, the former Google and Microsoft executive who was recently hired by Matillion to be its new chief product officer, openly wondered to a reporter whether the agentic AI era was on a glideslope for disaster.
“Basically, we see there’s a big problem coming for data engineering teams,” Weigel said in an interview during the recent Snowflake Summit. “I’m not sure everybody is fully aware of it.”
Here’s the issue, as Weigel explained it:
The explosion of source data is one side of the problem. Data engineers who are accustomed to working with structured data are now being asked to manage, prep, and transform unstructured data, which is harder to work with but which ultimately is the fuel for most AI (i.e., the words and pictures processed by neural networks).
Data engineers are already overworked. Weigel cited a study indicating that 80% of data engineering teams are already overloaded. When you add AI and unstructured data to the mix, the workload challenge becomes even more acute.
Agentic AI offers a potential solution. It’s natural that overworked data engineering teams will turn to AI for help. There’s a bevy of vendors building copilots and swarms of AI agents that, ostensibly, can build, deploy, and monitor data pipelines, and fix them when they break. We’re already seeing agentic AI have real impacts on data engineering teams, as well as on the downstream data analysts who ultimately are the ones requesting the data in the first place.
But according to Weigel, if we implement agentic AI for data engineering the wrong way, we’re potentially setting ourselves a trap that will be tough to get out of.
The problem he foresees stems from AI agents that access source data on their own. If an analyst can kick off an agentic AI workflow that ultimately involves the AI agent writing SQL to obtain a piece of data from some upstream system, what happens when something goes wrong with the data pipeline? AI agents may be able to fix basic problems, but what about serious ones that demand human attention?
“You’ll have autonomous AI agents that run entire business functions,” Weigel said. “But equally, they start to have a huge need for data. And so if the data team already was overloaded before, well, it’s now going to be like looking down the abyss and saying, ‘How on earth can we do anything? How am I going to have a human data engineer answer a question from an AI agent?’”
Once human data engineers are out of the loop, bad things can start happening, Weigel said. They potentially face a situation where the volume of data requests, which initially were served by human data engineers but now are being served by AI agents, is beyond their capacity to keep up with.
The accuracy of data can also suffer, he said. If every AI agent writes its own SQL and pulls data straight out of the source, the odds of getting the wrong answer go up considerably.
“We’re now back in the dark ages, where we were 10 years ago [when we wondered] why we need data warehouses,” he said. “I know that if individuals A, B, and C ask a question, and previously they wrote their own queries, they got different results. Right now, we ask the same agent the same question, and because they’re non-deterministic, they will actually create different queries every time you ask it. And as a result, you now have the different business functions all getting different answers, each insisting, of course, that it’s right.
“You’ve lost all the governance and control of why you established a central data team,” Weigel continued. “And for me, that’s the angle that I think a lot of data orgs haven’t really thought about. When I get a demo of an AI agent, they never talk about that. They just have the agent access the data directly. And sure, it can. But the problem is, it shouldn’t, really.”
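Weigel’s point about non-deterministic query generation can be illustrated with a toy example (the table, columns, and queries below are hypothetical, not drawn from any real system): two agents asked for the same month’s revenue might emit semantically different SQL against the same raw source and return different numbers, each one defensible on its face.

```python
import sqlite3

# Hypothetical raw source table; in Weigel's scenario, agents would query
# an upstream operational system like this directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2025-05-03", 100.0, "complete"),
        ("2025-05-10", 250.0, "complete"),
        ("2025-05-21", 75.0, "cancelled"),
    ],
)

# Two plausible SQL renderings of the same question, "What was May revenue?"
# A non-deterministic agent could emit either one on different runs.
query_a = "SELECT SUM(amount) FROM orders WHERE order_date LIKE '2025-05%'"
query_b = ("SELECT SUM(amount) FROM orders "
           "WHERE order_date LIKE '2025-05%' AND status != 'cancelled'")

revenue_a = conn.execute(query_a).fetchone()[0]  # counts cancelled orders
revenue_b = conn.execute(query_b).fetchone()[0]  # excludes them
print(revenue_a, revenue_b)  # 425.0 vs. 350.0
```

A vetted data warehouse sidesteps this by encoding one agreed-upon definition of “revenue” once, upstream of whoever (or whatever) asks the question.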
The answer to this dilemma, according to Weigel, is twofold. First, it’s essential to keep data warehouses, since they serve as a repository for data that has been vetted, checked, and standardized.
It’s also critical to keep humans in the loop, according to Weigel. And to keep humans in the loop, human data engineers must somehow be prevented from becoming completely overwhelmed by the unstructured data requests and the new AI workflows. To accomplish that, he said, they essentially must become superhuman data engineers, augmented with AI.
Matillion is building its agentic AI offerings around this strategy. Instead of setting AI agents loose to write their own SQL against source data systems, Matillion is using AI agents as supporting cast members whose goal is to assist the human data engineer in getting the work done.
This on-demand team of virtual data engineers is dubbed Maia, which the company announced earlier this month. The agents, which run in the Matillion Data Productivity Cloud (DPC), are able to assist data engineers with a range of tasks, including creating data connectors, building data pipelines, documenting changes, testing pipelines, and analyzing failures.
“We need to supercharge the data engineering function, and we need to enable them to match the AI capabilities,” he said. “Instead of just a copilot concept, it has become a selection of different data engineers that have different tasks. They can do different things.”
Maia acts as the lead agent that controls various sub-agents. The company has three or four such data engineering sub-agents today, Weigel said, and it will have more in the future. Maia, which is built using a collection of large language models (LLMs), including Anthropic’s Claude, can even correct itself when it does something wrong.
“It’s really fascinating,” Weigel said. “When you see it work, it will break down the problem into steps. Then it will start doing it. It will look at the data and figure out whether it’s on the right track. It will roll back. ‘That wasn’t quite right.’ And so it really is like a data engineer in its task and thinking, including looking at the data. It will ask the human at certain points if it wants input.”
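The loop Weigel describes can be sketched in a few lines. This is an illustrative skeleton under stated assumptions, not Matillion’s actual implementation; every function name here is hypothetical.

```python
# Hedged sketch of the plan-execute-check-rollback loop described above.
# plan_steps, execute, check, and ask_human are caller-supplied stand-ins
# for whatever the real agent does at each stage.

def run_agent_task(task, plan_steps, execute, check, ask_human):
    """Break a task into steps, verify each against the data, escalate on failure."""
    completed = []
    for step in plan_steps(task):
        result = execute(step)
        if check(step, result):  # look at the data: is it on the right track?
            completed.append((step, result))
        else:
            # "That wasn't quite right": roll back and ask the human for input
            completed.clear()
            if ask_human(f"Step {step!r} failed its check; continue?") == "abort":
                return None
    return completed

# Toy run with stub callbacks standing in for real pipeline operations.
outcome = run_agent_task(
    "build_orders_pipeline",
    plan_steps=lambda task: ["extract", "transform", "load"],
    execute=lambda step: f"{step}:ok",
    check=lambda step, result: result.endswith(":ok"),
    ask_human=lambda prompt: "abort",
)
print(outcome)
```

The key design point is the `ask_human` hook: the agent can plan and self-correct, but a failing check routes control back to a person rather than letting the agent improvise.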
Despite the potential for agentic autonomy, that isn’t part of the Matillion plan, as the company sees the human engineer as a critical backstop that can’t be removed from the equation.
Another important backstop that could help Matillion customers avoid agentic AI pitfalls: no AI generation of SQL.
While LLMs like Claude have gotten really, really good at writing SQL, Matillion will not hand the reins over to AI for this critical component. The ETL vendor has been automatically generating SQL as part of its data pipeline solution for Snowflake, Databricks, and other cloud data warehouses for years, and it’s not about to start from scratch.
“The secret in Matillion is we’ve abstracted that layer so we’re much closer to the user intent,” Weigel said. “So the user is building that data pipeline intent with predefined building blocks that ultimately write SQL. But it’s Matillion that writes SQL, not the user.”
This approach also avoids the problem of spaghetti SQL code that can’t be updated and changed over time, which is a real possibility with AI-generated code.
“We have this abstraction, this intermediate representation of these components that in turn issues SQL,” Weigel said. “And so our agent doesn’t have to generate whatever code you need. Instead, it’s about picking the right component, configuring the right component, and then sequencing them together.”
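The idea of components as an intermediate representation can be sketched as follows. The component names and fields are hypothetical, invented for illustration; they are not Matillion’s actual building blocks. The point is that the agent only selects and configures components, while the SQL text comes deterministically from each component’s template.

```python
from dataclasses import dataclass

# Illustrative sketch: components form an intermediate representation, and
# SQL is emitted by fixed templates rather than free-formed by an LLM.

@dataclass
class Filter:
    table: str
    condition: str

    def to_sql(self) -> str:
        return f"SELECT * FROM {self.table} WHERE {self.condition}"

@dataclass
class Aggregate:
    source_sql: str
    column: str

    def to_sql(self) -> str:
        return f"SELECT SUM({self.column}) FROM ({self.source_sql}) AS src"

# The agent's job reduces to choosing this sequence and these parameters;
# rendering the same configuration always yields byte-identical SQL.
pipeline = Aggregate(
    source_sql=Filter(table="orders", condition="status = 'complete'").to_sql(),
    column="amount",
)
print(pipeline.to_sql())
```

Because the templates are fixed, the non-determinism of the LLM is confined to component choice and configuration, both of which a human engineer can review and govern.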
It’s easy to get mesmerized by “shiny object” syndrome in the tech world. With all the advances in generative AI, it’s tempting to let these shiny new copilots loose to try to replicate the job of the overworked, under-appreciated data engineer, at a fraction of her cost.
But if replacing data engineers with AI also means losing much of the governance and control the data engineer brings, that could spell disaster for companies. “I think data engineering teams maybe aren’t fully aware of the potential doom that’s there,” Weigel said.
Instead, companies should be looking to supercharge those overworked data engineers with AI, which Weigel said is the best hope for surviving the AI data deluge.
Related Items:
Are We Putting the Agentic Cart Before the LLM Horse?



