Why Founders Need to Rethink Data Engineering for AI-First Products
As LLMs and AI agents become first-class consumers of enterprise data, the role of the data engineer is undergoing a subtle but significant shift.
In a recent live recording of the LangTalks podcast, I joined our friends Lee Twito (Generative AI Lead at Lemonade) and Gal Peretz (Head of AI at Carbyne) to discuss how this transition is changing what it means to design, expose, and maintain data systems in an AI-driven world.
Rather than focusing on tools or frameworks, the conversation centered on responsibilities: how data engineers adapt when their primary “users” are no longer just humans or deterministic software, but probabilistic systems that reason about data.
“We are moving from building systems for humans to building systems that humans and AI need to agree on.”
Guy Fighel
A New Type of Data Consumer
Traditional data platforms were built for predictable access patterns: dashboards and BI tools, backend services with fixed schemas, and queries written by humans with contextual understanding.
LLMs and agents behave differently. They infer meaning rather than follow structure. They operate probabilistically rather than deterministically. They rely on context (not just schemas) to behave correctly. As the panel emphasized, this creates a new challenge: data can be technically correct and still unusable for AI systems.
“A query can be technically correct and still completely wrong for an AI agent.”
Lee Twito
Moving from Schemas to Semantics
One of the core themes of our discussion was that schemas describe structure, but not meaning. Many production databases rely on implicit knowledge:
- Column names encode assumptions
- Business logic lives outside the data layer
- Definitions drift over time
Humans resolve these ambiguities intuitively. AI systems cannot. This pushes data engineers toward a new responsibility: making business meaning explicit. That includes documenting relationships, clarifying intent, and designing data models that reflect how the business actually operates, not just how tables are organized.
“We have relied on tribal knowledge for years, and AI exposes how fragile that actually is.”
Guy Fighel
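One way to make that tribal knowledge explicit is to treat business definitions as data in their own right. The sketch below is purely illustrative (the table, column names, and caveats are hypothetical), but it shows the idea: meaning and caveats live next to the schema in a machine-readable form that can be fed to an AI agent as context.

```python
from dataclasses import dataclass, field

# Illustrative sketch: capturing tribal knowledge as machine-readable
# column definitions. All table, column, and caveat values are hypothetical.

@dataclass(frozen=True)
class ColumnMeaning:
    name: str
    definition: str                # business meaning, not just a data type
    caveats: list = field(default_factory=list)

ORDERS_GLOSSARY = {
    "status": ColumnMeaning(
        name="status",
        definition="Current fulfillment state of the order.",
        caveats=["'closed' includes both delivered and cancelled orders"],
    ),
    "amount": ColumnMeaning(
        name="amount",
        definition="Order total in USD cents, pre-tax.",
        caveats=["Refunds appear as separate negative rows, not updates"],
    ),
}

def describe(table: str, column: str, glossary: dict) -> str:
    """Render a column's meaning as prompt context for an AI agent."""
    meta = glossary[column]
    lines = [f"{table}.{column}: {meta.definition}"]
    lines += [f"  caveat: {c}" for c in meta.caveats]
    return "\n".join(lines)
```

The payoff is that an agent asked about order totals can be handed the `amount` caveats explicitly, instead of inferring (wrongly) that the column is in dollars or that refunds mutate existing rows.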
Semantic Layers as an AI Primitive
We discussed how semantic layers (long used in BI) are becoming increasingly important for AI systems. In this context, semantic layers are not about abstraction, but translation. Think about defining entities and relationships clearly, encoding assumptions and constraints, and providing AI agents with a stable conceptual model of the domain.
This shift positions data engineers as curators of meaning, not just operators of pipelines.
“Semantic layers are becoming a translation layer, not an abstraction layer.”
Guy Fighel
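A minimal sketch of translation in this sense: the agent asks for a business concept, and the semantic layer resolves it to physical storage. The entity, metric, and table names below are illustrative assumptions, not a real schema.

```python
# Illustrative sketch of a semantic layer as a translation layer: agents
# refer to stable business concepts; the layer resolves them to physical
# tables and tested SQL. All names here are hypothetical.

SEMANTIC_MODEL = {
    "Customer": {
        "table": "crm.customers_v3",   # the physical location may change...
        "key": "customer_id",          # ...but the concept stays stable
        "metrics": {
            "active_count": "COUNT(*) FILTER (WHERE churned_at IS NULL)",
        },
    },
}

def compile_metric(entity: str, metric: str) -> str:
    """Translate an (entity, metric) request into executable SQL."""
    model = SEMANTIC_MODEL[entity]
    return f"SELECT {model['metrics'][metric]} AS {metric} FROM {model['table']}"
```

The agent never sees `customers_v3` or the churn predicate; it reasons about "active customers," and the curated definition does the rest.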
Making Databases LLM-Friendly
Rather than replacing existing databases, our discussion focused on how data is exposed to AI systems. Emerging patterns include:
- Natural-language interfaces over structured data
- Controlled APIs or Model Context Protocol (MCP) servers that govern how LLMs access data
- Architectures where LLMs reason about requests, while traditional systems execute them
In these designs, data engineers serve as the bridge ensuring that AI systems have access to the right data, under clear constraints, and with appropriate context.
“The goal is not to let LLMs touch your database directly, but to control how they reason about it.”
Gal Peretz
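The "reason, don't touch" pattern can be sketched as a deterministic gate between the agent and the database: the LLM proposes a query, and conventional code decides whether it runs. The allowlist and rules below are illustrative assumptions, not a production policy.

```python
import re

# Illustrative sketch of "LLMs reason about requests, traditional systems
# execute them": the agent proposes SQL, a deterministic gate approves it.
# Table names and rules are hypothetical; real guards would use a SQL
# parser rather than regexes.

ALLOWED_TABLES = {"orders", "customers"}
FORBIDDEN = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER)\b", re.IGNORECASE)

def gate_query(sql: str) -> bool:
    """Return True only for read-only queries confined to allowed tables."""
    if FORBIDDEN.search(sql):
        return False
    referenced = set(t.lower() for t in re.findall(r"\bFROM\s+(\w+)", sql, re.IGNORECASE))
    return bool(referenced) and referenced <= ALLOWED_TABLES
```

The key design choice is that the probabilistic component never holds credentials or executes anything; its output is just a proposal that deterministic infrastructure can accept, reject, or log.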
Offline and Online AI Pipelines
Another key takeaway was the growing use of hybrid pipelines that combine offline and online AI processing. Examples include offline LLM jobs that enrich or annotate data, AI-generated signals stored back into structured systems, and online agents consuming this enriched context during real-time interactions.
This mirrors earlier shifts in data engineering, such as batch versus streaming, but with intelligence itself becoming part of the pipeline.
“Offline enrichment plus online reasoning is quickly becoming a standard pattern.”
Lee Twito
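That pattern can be sketched in a few lines: a batch job calls a model to annotate records and persists the signals, and the online path simply reads them back. The `classify` function below is a stub standing in for an LLM call, and all field names are hypothetical.

```python
# Illustrative sketch of offline enrichment plus online reasoning. The
# classify() stub stands in for an offline LLM call; STORE stands in for
# a structured table. All record fields and values are hypothetical.

STORE = {}  # enriched records, keyed by id

def classify(text: str) -> str:
    """Offline stub for an LLM enrichment call (topic, sentiment, etc.)."""
    return "complaint" if "broken" in text.lower() else "question"

def offline_enrich(tickets: list[dict]) -> None:
    """Batch job: annotate raw records and persist the AI-generated signals."""
    for t in tickets:
        STORE[t["id"]] = {**t, "category": classify(t["text"])}

def online_context(ticket_id: str) -> dict:
    """Real-time path: agents read precomputed signals, no LLM call needed."""
    return STORE[ticket_id]
```

As with batch versus streaming, the expensive work happens offline on the platform's schedule, while the online path stays fast and deterministic because it only does a lookup.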
Serving Multiple Audiences
A modern data platform now serves three audiences simultaneously: the business users analyzing outcomes, the developers building applications, and the AI agents acting autonomously. Each consumes data differently, but all depend on trust.
As noted by the LangTalks panel, data engineers increasingly operate at the intersection of these needs, ensuring consistency, correctness, and interpretability across all consumers.
“Each of those consumers needs the same truth, but presented very differently.”
Gal Peretz
How the Skill Set Is Evolving
Our session made clear that this transition doesn’t diminish the data engineer’s role, but expands it. Skills gaining prominence include:
- Semantic modeling and ontology design
- Context engineering for AI systems
- Designing AI-facing data contracts
- Understanding LLM behavior and failure modes
The work shifts away from query tuning toward reasoning support.
“This is not about becoming an ML engineer, it is about becoming a curator of meaning.”
Guy Fighel
A Familiar Pattern Repeating
As I noted during the conversation, this evolution follows a familiar arc. Analytics only scaled once data pipelines matured. Machine learning only scaled once feature engineering was systematized. Agentic AI will only scale once semantic infrastructure becomes standard. Each generation pushes data engineering closer to the heart of how systems think.
The takeaway from our LangTalks session was pragmatic and clear: there is no single tool that defines the future data engineer. What will define the role is the ability to make data understandable, not just accessible, to an increasingly diverse set of consumers, including AI systems that reason rather than query.
In the AI era, data engineering remains foundational, but its center of gravity is shifting - from structure to meaning.
“Agentic systems will not scale without semantic infrastructure, just like ML did not scale without features.”
Guy Fighel