Unlocking the secrets to data engineering success
We interviewed Erica and Guy at the recent Hetz Ventures Data Engineering Breakfast in Tel Aviv. The interview has been paraphrased and lightly edited for readability. Topics covered include:
- Data has been around a while: why now?
- As investors, what are you staying away from?
- Open source and data
- Bottom-up approach in the data space
- Using AI to gain insights from unstructured data
- Generative AI: the day after tomorrow
- Goody bag: One founder takeaway
Meet our data engineering/AI experts: Erica Brescia & Guy Fighel
Erica Brescia is a Managing Director at Redpoint Ventures, specializing in enterprise software, infrastructure, data, and open source. Erica has extensive experience across the data stack and has been involved in investments in a range of companies, including ClickHouse, Snowflake, Cockroach Labs, and more. Before becoming an investor, Erica was the COO of GitHub, and before that she co-founded Bitnami, which was later sold to VMware.
Guy Fighel is New Relic’s SVP & GM of Data Platform Engineering & AI, and an investor in data, MLOps and AI startups. Guy is a technical founder who started out in the cybersecurity world. That background gives him a unique perspective on data, a field he has been deeply involved in since 2013. Previously, Guy co-founded SignifAI, which he later sold to New Relic.
Data has been around a while: why now?
Data has obviously been around a long time, but it sounds like everybody's talking about it more today than they were before. What’s changed? What are you looking forward to?
Erica: It's getting to the point where people just don't know what to do with it anymore. That’s what led to the rise of Snowflake and tools like ClickHouse and MotherDuck. It started with SaaS collecting so much more data around everything we do. At the same time, there's been a move toward the consumerization of software, with businesses expected to empower employees to do a lot of things themselves. You have insane amounts of data, plus teams trying to get access to it to do interesting things, and naturally it has become more unmanageable. Of course, we can’t have this conversation about data without AI. The data space is growing even faster than it was before, because everybody's looking at how to get the best datasets for their models. That's only going to continue, so it's a good time to build.
Guy: The change we're seeing today regarding data is primarily driven by advancements in technology, compute power, and algorithms. Extensive research over time has contributed to the maturation of these factors, particularly in the past few years. Additionally, regardless of the industry or vertical, data has become a top priority for CEOs, whether they lead startups or Fortune 500 companies. This applies to sectors like cybersecurity, where new avenues are emerging by leveraging data.
When people ask me about investment strategies or what type of stack to build, it becomes evident to me that data will be the fundamental infrastructure upon which applications are built. These applications can span cybersecurity, consumer-based services, marketing, as well as core AI algorithms and transformations. To enable this, investment in a new data stack and significant infrastructure work is required, involving tasks such as data translation, formatting, and transformation. This shift is happening now because it took considerable time for companies to accumulate the necessary information and lay the groundwork for building data-driven businesses.
Erica: When we're looking at investments from a venture perspective, it's ultimately all about the TAM, right? That's one of the primary reasons we end up passing on deals. Even assuming the founders are great and they've built a great product, there may just not be an opportunity to build a big enough company. Our early-stage fund at Redpoint is $650M. So every time I invest at a Series A, I have to be able to tell a story: how can this $15M investment have the potential to return the entire $650M fund? You're always looking for a fund returner. The great thing about data is that it's going to be the foundation of the future, so there is a huge market opportunity. In most cases it is much easier to justify the TAM around data infrastructure and ‘data empowerment’.
As investors, what are you staying away from?
What are you staying away from as far as investing in the data engineering/AI space?
Guy: That's one of my favorite topics, because I see more and more startups building on top of ‘X’: OpenAI, GPT, etc. I'm like, no, no, no. There are tons of opportunities around what's happening, but I'm really trying to stay away from applications built on top of those platforms.
I remember Facebook: at some point they opened up the platform to games, and every week there was a new startup building games on top of Facebook. It was booming - there was Zynga - and then all of a sudden those startups crashed, because Facebook understood the power it had and said: either you pay us a big portion, or we close some of the APIs and do it on our own.
I don't think OpenAI will let those startups eat its lunch for long. If it doesn't happen today, it will happen next month, next year, or two years from now. So anything that is heavily tied to these specific platforms, I'm staying away from.
Erica: One thing we're seeing is a ton of founders saying, “look what we built in three weeks.” If you built it in three weeks, I guarantee other people can build it in three weeks too. What is defensible about this business?
We saw this in cloud too: a ton of companies were building small management solutions for AWS, but AWS itself got better over time. If you look at the data market map, it’s exploding. So many companies are doing almost the same thing. You need to be building something that's really a step change. I don't need to see a tenth user-friendly BI platform.
The other thing is that we look across the entire landscape of companies building in a space. If you see two dozen companies, especially at the early stage, they're not all going to have stellar teams, but I bet two or three of them have pretty darn good teams. You're trying to pick a winner and then understand: is this a space that's going to be winner-take-all, or one that will stay fragmented for the long term because of user preferences or anything else? In that second case, it gets really hard to write the check believing you can get the really big outcome.
Layering open source with data
Is there anything in particular about open source in the data space that you're excited about or looking forward to, or that you're not seeing yet?
Guy: One of the challenges with open source projects in the data space is that, a lot of the time, when I code and build something I'm super excited about, the target persona is typically the developer or the data scientist. Those personas have zero budget, so there's no direct relationship between the open source project, the problems I'm trying to solve, and how I grow a big company from that. Yet that's the number one target I keep seeing over and over, especially in the data space.
My biggest recommendation would be to shift from the data scientist to the developer and product person for value creation, and then have a path to the decision maker - the CTO, VP Engineering, Chief Data Officer, with something that provides big value - cutting cost or improving shipping time or tapping into privacy or security - something they really have to spend money on.
It will never be enough just to excite data scientists. You see the stars going up, you see the community getting excited, but you will not see the dollars in the bank account. It's just not good enough. So for me, open source is definitely a great starting point, but don't just build for data scientists. Build for a broader set of developers and product people who can tap into it and create value for the decision makers.
Erica: I like to distinguish between infrastructure and tools for interacting with data. There are so many success stories of data infrastructure being open source. Looking at databases specifically - when you're using something to run your product in production, you're going to pay for it. There's a very clear model there.
In my experience, it's really hard to build a successful business around other kinds of open source, for instance apps or tooling. When you go back to when open source was starting, there were so many companies trying to do it, like Zenoss and SugarCRM. There was an open source BI company that had a pretty good exit, one of the better ones, but successes like that are very few and far between. What these companies learned is that it's really hard to engage developers around something like a CRM; take SugarCRM as the example. It's not real open source. If you're going to build something that's more of a tool, you really need to understand why it should be open source, and what your strategy is there.
I'll give you a great example: we invested in a company that makes an open source data labeling platform for machine learning. The reason they're open source is that they need to be able to process and label all different types of data, and it makes a ton of sense to have people adding new formats to the project. Being open source makes sense in that context. But in a lot of other contexts, you just don't need it.
Bottom-up approach in the data space
What are your thoughts on bottom-up products in the data field? How do you think about funding for bottom-up companies?
Guy: I think the bottom-up motion has nothing to do with open source. It's a totally legitimate motion that is very relevant in the data space, again because of data scientists, developers, and ‘data citizens’, as we call them: basically, users who can pick up a platform or solution themselves.
So I definitely believe the bottom-up approach is very healthy in the data space, but it has nothing to do with open source. It's something I would consider depending on the solution. Who should be the buyer? Who's the decision maker? Who's the early adopter?
But I wouldn't make a direct connection to whether it should be open source or not.
Erica: I'm also thinking of cases where you're going to have your traditional enterprise buyer. I still think having a frictionless evaluation process is really important there, but that's not necessarily bottom-up either, right? Somebody can come, play around and do a demo without having to talk to a salesperson. That's important for almost every product out there these days, but in that case the buyer is going to jump straight from the evaluation to buying. Bottom-up is when we see an account grow over time.
It's hard for me to imagine, at least from my GitHub experience, us buying a BI tool in a bottom-up way. It's not like some little team is going to adopt this one tool on its own, especially because with a lot of data tooling, by definition, you're hooking into really sensitive data. So you need to go through a procurement process.
If I were a founder, I would be very attracted to the idea of building a bottom-up company, just because of the mechanics of how those work. Wherever you have an opportunity to go bottom-up, it’s almost always very attractive, because you can start getting feedback and testing things in the market, and it allows for faster iteration than enterprise sales with a really long sales cycle. But it's not right for everything, and you have to choose the right model.
To your second question, about how we think about funding a bottom-up or product-led growth (PLG) motion: what we're really looking for, and it's different at Seed and Series A of course, is momentum and traction. Basically, speed. The very best bottom-up companies sometimes grow around 50% month-over-month in their early days. That drops to about 20% as they get a little bigger. If we see that kind of monthly growth, it's a good indication that you've built something people want.
Using AI to gain insights from unstructured data
AI has been around a long time, but one of the biggest changes today is the ability to use these new models to really gain insights from unstructured data, which was very hard previously. Have you seen a change in the companies you’re looking at, or in your own perspective?
Guy: Definitely. I have observed changes both in companies and in my own perspective because of this advancement.
In the past 10 to 15 years, we have seen different groups doing various things with different algorithms. What has made a significant impact, however, is the ability to work with unstructured data at the level of pipeline development, real-time synchronization, data volume, and transformations. These advancements make it possible to build new applications on top of unstructured data and to translate between unstructured and structured data.
As a community, we can now build on this progress, and many different industries and verticals are already benefiting. For example, in areas focused on text analysis, converting unstructured data into a structured format has enabled automation of tasks that were previously complex and time-consuming.
The emergence of models like GPT has brought significant momentum to the industry, attracting talented individuals, particularly from the younger generation, to learn about the engineering and algorithm aspects of AI. This influx of talent has further advanced the field. Although there may be fluctuations in the industry, I believe AI is here to stay and businesses will continue to adapt and evolve over time.
Erica: First of all, I have a 10-year-old who's getting into hacking and learning Python. He's already using ChatGPT: he Googled it, signed himself up, and has been playing around with it.
We're climbing the first part of the hype cycle. The biggest thing about ChatGPT has been watching the incremental improvements, and then the massive improvement with GPT-4 was a real ‘aha’ moment for a lot of people.
When we launched Copilot while I was at GitHub, it was the only time I have been involved in a product launch where the product turned out better than we expected. That does not usually happen. Right out of the gate it was predicting what people wanted. You might not love it yet, but in repos that have Copilot enabled, it writes 40% of the code. The point is that people's expectations of what they can get are changing very quickly.
As people get used to this, we are going to see much faster adoption of these products and UX changes than we've seen with almost anything else.
Generative AI: The day after tomorrow
Do you have concerns or thoughts about what’s been opened up with generative AI?
Erica: The exciting thing is that the pace of this evolution is like nothing I've seen in my entire professional career: just how quickly things are getting better. I think that will continue, and I also think there's an opportunity to create a lot of very interesting tooling to help people understand it.
For example, one of the companies we looked at recently helps you test and understand the quality of different models, along with their performance, cost, and so on. There will be a lot of scaffolding built around the core models and tools that come out, and it will help you mitigate issues.
But I also definitely think people are going to get into a lot of trouble, in particular because they are adopting this stuff so fast. It can help so much that they're going to do a lot of, quite frankly, stupid, stupid things. That happens any time you have a fundamentally new technology, and we always end up iterating through it.
Guy: You know, this isn't something that will happen in the future; it's already happening. The minute OpenAI offered the $20-per-month ChatGPT Plus plan, every single developer in the organization could swipe their own card and get reimbursed, and you have zero control. So if you are in a heavily regulated industry and need to file regulatory reports as a public company, I don't know how, as a compliance officer, you can sign off that you're in control. No one really knows what happens in those AI prompts in a completely unmonitored, unregulated environment.
We've seen this kind of thing in the past as well, going all the way back to the ’90s, when firewalls weren't a thing yet. Everything was open. As the technology evolved, different solutions emerged, and now it is obvious to everyone that a firewall is basically a commodity. The same will happen with AI.
Goody bag: One founder takeaway
If there were one takeaway that everyone here could grab from you for building their companies, what would it be?
Erica: I'll talk about fundraising, since I am an investor. Deal processes aren't happening as fast as they were, and that's good for both sides. When you're choosing who to partner with from an investment perspective, it's really important to understand who they really are, how they view your business, and whether you're aligned (on values, outcomes, how big to get, and understanding of your space).
The biggest mistake I see founders make is telling themselves a story about TAM that just doesn't make sense. On every slide, it's a $10B market. Be honest with yourself about how big the company could get, and then choose the right type of funding. The sooner you get clear on that, the better your experience raising funding will be, if you choose to.
Guy: Remember: a design partner is not a customer, and a customer is not a paying customer. Design partner, customer, paying customer: founders, especially at the beginning of their journey, mix up that terminology. One of the biggest mistakes I see is founders speaking with three or four design partners, usually from their own network, without understanding the competition. All of a sudden, as a founder, I'm like, “woo-hoo, I can do it much better and cheaper, and I have a potential customer who just told me that.” No, you don't. It's not the right design partner, and the homework isn’t finished. Very often, VPs at big companies just don't know the available tool sets or other markets, so take the time to do the due diligence yourself. Play devil's advocate for your own ideas. Find at least five other companies that could bury you. Understand that, and then find alternatives if you still believe in the market.