Data engineering is not new, but with the growing adoption of large language models (LLMs) as a core tech strategy for enterprises and startups alike, it’s all anyone can talk about now. How will enterprises ready themselves for the coming shift in the way their teams use, store and secure data? How will startup founders step up to build for the next advancement in the field?
In this article, we explore three insights drawn from a recent survey conducted among data practitioners in the MLOps Community:
- Enterprises want to start using LLMs in production.
- There is a new shift left: from AI researchers to ‘data citizens’ (engineers or developers), which requires a change in the data stack and development tools.
- The data stack will be drastically affected by the growing dependence on data.
For context, the survey was sent to a broad range of practitioners in the MLOps Community, a global community with more than 15,000 active users spanning roles from MLOps engineers to data executives to startup founders. The survey’s main purpose was to understand the real usage rates and adoption of LLMs among organizations and practitioners who are integrating generative language models into their daily work.
Enterprises want to use LLMs in production
The survey’s results reflect a broad shift toward embracing LLMs within enterprises, particularly in production.
About 63% of the respondents are using LLMs in their organizations, the vast majority of whom are employees of large organizations (1,000 employees or more):
One third of the respondents use multiple LLM platforms at their organizations. The breakdown of their usage across platforms is as follows:
LLMs such as GPT-3 have significantly advanced natural language processing (NLP) capabilities. They offer enterprises tools to automate their language-related tasks, opening up a wide range of applications such as chatbots, language translation, sentiment analysis, content generation and more. These improved capabilities make LLMs attractive to enterprises that seek to automate and enhance their language-related tasks and workflows.
Accordingly, the respondents’ answers cover a wide range of LLM use cases. We identify the five primary ones below:
- Natural language interfaces and domain-specific content creation: Such as creating interfaces for interacting with systems using natural language and generating content in specific domains.
- Relevance and context: Retrieving relevant information and generating text content.
- Coding-related tasks: For example, translating code between programming languages and generating SQL queries from natural language.
- LLM as an idea generator: There is a wide range of uses for this within content, HR and coding.
- Chatbots: For example, developing better-tuned chatbots and deploying them on customer service platforms.
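To make the coding-related use case concrete, here is a minimal sketch of how an application might wrap a natural-language question in a prompt for an LLM to translate into SQL. The table schema, prompt wording and `build_sql_prompt` helper are illustrative assumptions, not from the survey:

```python
# Minimal sketch of natural-language-to-SQL prompting.
# The schema and instructions below are hypothetical examples.

SCHEMA = """\
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    total_usd REAL,
    created_at TEXT
);"""

def build_sql_prompt(question: str) -> str:
    """Combine the table schema and a user question into an
    instruction an LLM can answer with a single SQL query."""
    return (
        "You are a SQL assistant. Given this schema:\n"
        f"{SCHEMA}\n\n"
        f"Write one SQLite query that answers: {question}\n"
        "Return only the SQL, no explanation."
    )

prompt = build_sql_prompt("What was total revenue last month?")
print(prompt)
```

In practice, the resulting prompt would be sent to an LLM API (for example, a chat-completion endpoint), and the returned SQL should be validated before it is ever executed against production data.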
The Shift Left of Data Engineering
AI research and development rely on large volumes of high-quality data. As AI models have grown in scale, complexity and breadth of application, the need for diverse and reliable data has become a priority. Without proper data engineering practices, the potential of enterprises' AI models is limited. The shift toward data engineering signifies an acknowledgment of the critical role that data plays in AI development. By embracing data engineering, AI researchers can leverage high-quality data to enhance the performance and applicability of their AI models and systems.
As a result, large companies will need fewer data scientists and more data engineers and enthusiasts who stitch code together. This is to say there will be a shift in hiring toward more senior Python engineers, for example. Yet the shift left toward data citizens cannot happen without addressing some of the main challenges of LLM usage raised by the survey respondents. Their answers about the usage and difficulties of LLM platforms reflect the need for better LLM management tools that ease adoption in large enterprises.
Here’s a breakdown of which enterprise roles are using LLMs, according to survey respondents:
Catering to the data stack required for this shift
At Hetz, we predict that, given the trends described in the previous section, enterprises will require much more control, management and security guardrails against data leakage. This growing need for enterprise-level data management and security will require cost management solutions (just as happened in the cloud space) that combine external models with internal open source models, lowering the barrier to entry for enterprises. Finally, Hetz also predicts that enterprises will require additional visibility into their overall LLM stack across departments as these models become more heavily utilized and relied upon.
Hetz Ventures has launched the Hetz Data Program - nicknamed SPARQL - to seek out, guide and support the next wave of data engineering and AI startups tackling the industry-wide data shift. Learn more about the Hetz Ventures data program for startups, what founders can expect, and the global network of advisors actively participating in the program, including data leaders from Salesforce, Twilio, Snowflake, Shopify and others. Contact us.
Thanks to Mika Kaplan for her research and support in producing this article.