As a companion to the 2024 MAD (ML, AI & Data) Landscape (blog post, PDF, interactive website), my colleague Aman and I had a fun chat about some key trends we see in data and AI.
This is our tenth annual landscape and “state of the union” of the data, analytics, machine learning and AI ecosystem.
In 10+ years covering the space, things have never been as exciting and promising as they are today. All trends and subtrends we described over the years are coalescing: data has been digitized, in massive amounts; it can be stored, processed and analyzed fast and cheaply with modern tools; and most importantly, it can be fed to ever more capable ML/AI models which can make sense of it, recognize patterns, make predictions based on it, and now generate text, code, images, sounds and videos.
The MAD (ML, AI & Data) ecosystem has gone from niche and technical, to mainstream. The paradigm shift seems to be accelerating with implications that go far beyond technical or even business matters, and impact society, geopolitics and perhaps the human condition.
There are still many chapters to write in the multi-decade megatrend, however. As every year, this post is an attempt at making sense of where we are currently, across products, companies and industry trends.
Our team this year was Aman Kabeer and Katie Mills (FirstMark), Jonathan Grana (Go Fractional) and Paolo Campos; major thanks to all. And a big thank you as well to CB Insights for providing the card data appearing in the interactive version.
This annual state of the union post is organized in three parts:
(This post is part of my “This Week in AI” series, which is general off-the-cuff market commentary. I’m not an investor in either MosaicML or Databricks.)
$21M per employee. That’s the price Databricks is paying for MosaicML — a total of $1.3B for 62 employees (in Databricks stock, and also includes employee retention packages).
One thing is clear – if you’re going to be aggressively acquiring Generative AI startups, you’re going to have to pay up.
But it may turn out to be cheap in the long term given the size of the opportunity.
That’s because, beyond any Generative AI capabilities, Databricks’ move needs to be understood in the broader context of its fierce rivalry with Snowflake.
Note: those are quick thoughts on some of last week’s most interesting news in AI. I may, or may not (!) do this on a regular basis.
This last week in AI: Adobe killed all the Generative AI design startups with Firefly, and then Microsoft killed all the other Generative AI startups with its plugins and Fabric releases.
I’m sort of kidding, but sort of not. Kidding because, again and again, founders and startups find a way. But sort of not, because the speed of deployment of AI by the Big Tech incumbents is truly something to behold. Companies like Adobe and Microsoft have, of course, massive distribution advantages. It is undeniably problematic for startups to see Adobe deploying Firefly in Photoshop and Microsoft deploying AI copilots across, well, just about every single one of its products for consumers, businesses and developers (see the dizzying list of announcements at Microsoft’s Build conference this week).
I don’t think, however, that the world wants an AI world dominated by Microsoft and Google. The best version of the future for the Generative AI landscape is to be “polyglot” with a variety of tools and companies. Open source is going to play a huge role and it’s comforting to see so much energy there. And I have faith startups will build the best specialized tools and vertical solutions. It’s going to be a fun ride ahead.
I recently got an opportunity to chat with Prateek Joshi on Infinite Machine Learning, his excellent podcast.
It was a wide-ranging conversation about Generative AI (which I would recommend listening to at 1.25x speed or more; it makes me sound a lot more articulate). We covered a range of topics including:
AI going mainstream with ChatGPT
The opportunity for Generative AI in the enterprise
Defensibility and moats of Generative AI companies
A mental model for thinking about what AI is best suited for, in terms of startup opportunities
Desirable characteristics of AI startup founding teams
Rapid fire: favorite books, favorite questions to ask when interviewing a candidate, why VC is a craft business
Every year, as part of our MAD project, we do a presentation at Data Driven NYC about the top trends we see across data and ML/AI (here’s the 2022 version for reference).
The presentation, done this year with my FirstMark colleague Kevin Zhang, is a whirlwind tour of top trends, as opposed to anything particularly in-depth, as we tried to keep it short. But hopefully it should provide a good overview of what’s been happening in those spaces, for anyone interested in a recap.
Software Daily (aka Software Engineering Daily) has been on my podcast rotation for a while, so it was fun to get a chance to be a part of it – thanks to Jocelyn Houle, who moonlights as podcast host on top of her day job at Securiti. While this was done in connection with the publication of the MAD 2023, we ended up talking a lot about venture capital and entrepreneurship in general, including some personal stories.
The video is below, and here’s the audio-only podcast: Apple, Spotify.
One of the cool parts of publishing the MAD landscape every year is the conversations that come with it. Here’s a fun chat I did recently with Joe Reis and Matthew Housley, co-founders of data consulting company Ternary Data and co-authors of the O’Reilly book, Fundamentals of Data Engineering (see their recent talk at Data Driven NYC). We covered a lot of things, check it out!
It has been less than 18 months since we published our last MAD landscape, and it has been full of drama.
When we left, the data world was booming in the wake of the gigantic Snowflake IPO, with a whole ecosystem of startups organizing around it.
Since then, of course, public markets crashed, a recessionary economy appeared and VC funding dried up. A whole generation of data/AI startups has had to adapt to a new reality.
Meanwhile, the last few months saw the unmistakable, exponential acceleration of Generative AI, with arguably the formation of a new mini-bubble. Beyond technological progress, it feels that AI has gone mainstream, with a broad group of non-technical people around the world now getting to experience its power firsthand.
The rise of data, ML and AI is one of the most fundamental trends in our generation. Its importance goes well beyond the purely technical, with a deep impact on society, politics, geopolitics and ethics.
“It’s been crazy out there. Venture capital has been deployed at unprecedented pace, surging 157% year-on-year globally […]. Ever higher valuations led to the creation of 136 newly-minted unicorns […] and the IPO window has been wide open, with public financings up +687%”
Well, that was…last year. Or more precisely, 15 months ago, in the MAD 2021 post, written pretty much at the top of the market, in September 2021.
Since then, of course, the long-anticipated market downturn did occur, driven by geopolitical shocks and rising inflation. Central banks started increasing interest rates, which sucked the air out of an entire world of over-inflated assets, from speculative crypto to tech stocks. Public markets tanked, the IPO window shut down, and bit by bit, the malaise trickled down to private markets – first at the growth stage, then progressively to the venture and seed markets.
We’ll talk about this new 2023 reality in the following order:
(note: this is part III of the 2023 MAD Landscape. The landscape PDF is here, and the interactive version is here)
In the hyper-frothy environment of 2019-2021, the world of data infrastructure (née Big Data) was one of the hottest areas for both founders and VCs.
It was dizzying and fun at the same time, and perhaps a little weird to see so much market enthusiasm for products and companies that are ultimately very technical in nature.
Regardless, as the market has cooled down, that moment is over. While good companies will continue to be created in any market cycle, and “hot” market segments will continue to pop up, the bar has certainly risen dramatically in terms of differentiation and quality for any new data infrastructure startup to get real interest from potential customers and investors.
Here is our take on some of the key trends in the data infra market in 2023.
(note: this is part IV of the 2023 MAD Landscape. The landscape PDF is here, and the interactive version is here)
The excitement! The drama! The action!
Everybody is talking breathlessly about AI all of a sudden. OpenAI gets a $10B investment. Google is in Code Red. Sergey is coding again. Bill Gates says what’s been happening in AI in the last 12 months is “every bit as important as the PC or the internet” (here). Brand new startups are popping up (20 Generative AI companies just in the Winter ’23 YC batch). VCs are back to chasing pre-revenue startups at multi-billion-dollar valuations.
So what does it all mean? Is this one of those breakthrough moments that only happen every few decades? Or just the logical continuation of work that has been happening for many years? Are we in the early days of a true exponential acceleration? Or in the early days of a hype cycle and mini financing bubble, as many in tech are desperate for the next big platform shift, after social and mobile, and the crypto headfake?
For anyone interested in a quick overview of our long-form 2021 Machine Learning, AI and Data (MAD) Landscape, here are the Cliffs Notes! My co-author John and I did a presentation at our most recent Data Driven NYC, focused on the top 10 trends in this year’s landscape.
As a preview, here they are:
Every company is a data company
The big unlock: data warehouses and lakehouses
Consolidation vs data mesh: the future is hybrid
An explosive funding environment
A busy year in DataOps
It’s time for real time
The action moves to the right side of the warehouse
The rise of AI generated content
From MLOps to ModelOps
The continued emergence of a separate Chinese AI stack
Below is the video from the event, and below that, the transcript.
Full resolution version of the landscape image here
It’s been a hot, hot year in the world of data, machine learning and AI.
Just when you thought it couldn’t grow any more explosively, the data/AI landscape just did: rapid pace of company creation, exciting new product and project launches, a deluge of VC financings, unicorn creation, IPOs, etc.
It has also been a year of multiple threads and stories intertwining.
One story has been the maturation of the ecosystem, with market leaders reaching large scale and ramping up their ambitions for global market domination, in particular through increasingly broad product offerings. Some of those companies, such as Snowflake, have been thriving in public markets (see our MAD Public Company Index), and a number of others (Databricks, Dataiku, DataRobot, etc.) have raised very large (or in the case of Databricks, gigantic) rounds at multi-billion-dollar valuations and are knocking on the IPO door (see our Emerging MAD company Index – both indexes will be updated soon).
But at the other end of the spectrum, this year has also seen the rapid emergence of a whole new generation of data and ML startups. Whether they were founded a few years or a few months ago, many experienced a growth spurt in the last year or so. As we will discuss, part of it is due to a rabid VC funding environment and part of it, more fundamentally, is due to inflection points in the market.
In the last year, there’s been less headline-grabbing discussion of futuristic applications of AI (self-driving vehicles, etc.), and a bit less AI hype as a result. Regardless, data and ML/AI-driven application companies have continued to thrive, particularly those focused on enterprise use cases. Meanwhile, a lot of the action has been happening behind the scenes on the data and ML infrastructure side, with entire new categories (data observability, reverse ETL, metrics stores, etc.) appearing and/or drastically accelerating.
To keep track of this evolution, this is our eighth annual landscape and “state of the union” of the data and AI ecosystem – co-authored this year with my FirstMark colleague John Wu. (For anyone interested, here are the prior versions: 2012, 2014, 2016, 2017, 2018, 2019 (Part I and Part II) and 2020.)
For those who have remarked over the years how insanely busy the chart is, you’ll love our new acronym – Machine learning, Artificial intelligence and Data (MAD) – this is now officially the MAD landscape!