The 2023 MAD (Machine Learning, Artificial Intelligence & Data) Landscape

It has been less than 18 months since we published our last MAD landscape, and those months have been full of drama.

When we left off, the data world was booming in the wake of the gigantic Snowflake IPO, with a whole ecosystem of startups organizing around it.

Since then, of course, public markets crashed, a recessionary economy appeared and VC funding dried up. A whole generation of data/AI startups has had to adapt to a new reality.

Meanwhile, the last few months saw the unmistakable, exponential acceleration of Generative AI, with arguably the formation of a new mini-bubble. Beyond technological progress, it feels like AI has gone mainstream, with a broad group of non-technical people around the world now getting to experience its power firsthand.

The rise of data, ML and AI is one of the most fundamental trends in our generation. Its importance goes well beyond the purely technical, with a deep impact on society, politics, geopolitics and ethics.

Yet it is a complicated, technical, and rapidly evolving world that is often confusing even for practitioners in the space. There’s a jungle of acronyms, technologies, products and companies out there that are hard to keep track of, let alone master.

The annual MAD (Machine Learning, Artificial Intelligence and Data) landscape is our attempt at making sense of this vibrant space.  Its general philosophy, much like our event series Data Driven NYC, has been to open source work that we would do anyway, and start a conversation with the community.

So, here we are again, in 2023. This is our ninth annual landscape and “state of the union” of the data and AI ecosystem. Here are the prior versions: 2012, 2014, 2016, 2017, 2018, 2019 (Part I and Part II), 2020 and 2021.

This annual state of the union post is organized in four parts.

MAD 2023, PART I: THE LANDSCAPE

After much research and effort, we are proud to present the 2023 version of the MAD landscape. When I say “we”, I mean a little group, whose nights will be haunted for months to come by memories of moving tiny logos in and out of crowded little boxes on a PDF: Katie Mills, Kevin Zhang and Paolo Campos. Immense thanks to them. And yes, I meant it when I told them at the outset “oh, it’s a light project, maybe a day or two, it’ll be fun, please sign here”.

So, here it is (cue drum roll, smoke machine). The MAD landscape comes in two modes of consumption this year:

PDF (static) version:

<<<<<<<< CLICK HERE FOR PDF VERSION >>>>>>>>

(yes, it’s all very high resolution, and you can easily zoom on both desktop and mobile)

<New!> Interactive version:

In addition, this year for the first time, we are jumping head first into what the youngsters call the “World Wide Web”, with a fully interactive version of the MAD Landscape that should make it fun to explore the various categories.  

 <<<<<<<< CLICK HERE FOR THE INTERACTIVE VERSION >>>>>>>>

Notes on the interactive version:

  • Each logo is clickable – when you click one, a pop-up appears in the bottom right corner
  • There is a “landscape” and a “card” view (see top right corner)… and also, a night mode!
  • This is a first version, and we’ll add more functionality ASAP (search, filtering, etc.)
  • For this interactive version, we partnered with Gotta Go Fast for the app build and CB Insights for the data that appears in the cards.  Many thanks to both for their partnership. 

For all questions and comments, please email MAD2023@firstmarkcap.com 

General approach

First, we’ve made the decision this year again to keep both data infrastructure and ML/AI on the same landscape. One could argue that those two worlds are increasingly distinct. However, we continue to believe that there is an essential symbiotic relationship between those areas. Data feeds ML/AI models. The distinction between a data engineer and a machine learning engineer is often pretty fluid. Enterprises need to have a solid data infrastructure in place before they can properly leverage ML/AI.

The landscape is built more or less on the same structure as every annual landscape since our first version in 2012. The loose logic is to follow the flow of data, from left to right – from storing and processing to analyzing to feeding ML/AI models and building user-facing, AI-driven or data-driven applications.

This year again, we’ve kept a separate “open source” section. It’s always been a bit of an awkward organization as we effectively separate commercial companies from the open source project they’re often the main sponsor of. But equally, we want to capture the reality that for one open source project (for example, Kafka), you have many commercial companies and/or distributions (for Kafka – Confluent, Amazon, Aiven, etc.). Also, some open source projects appearing in the box are not fully commercial companies yet.

The vast majority of the organizations appearing on the MAD landscape are unique companies, with a very large number of VC-backed startups. A number of others are products (such as products offered by cloud vendors) or open source projects.

Company selection

This year, we have a total of 1,416 logos appearing on the landscape.   For comparison, there were 139 in our first version in 2012.

Each year we say we can’t possibly fit more companies on the landscape and each year, somehow, we need to. This comes with the territory of covering one of the most explosive areas of technology.

However, this year in particular, we’ve had to take a more editorial, opinionated approach to deciding which companies make it to the landscape. Given the surging number of companies in the category, we’re long past the stage where we can fit nearly everyone, so we have had to make choices.

In prior years, we tended to give disproportionate representation to growth-stage companies, based on funding stage (typically Series B-C or later) and ARR (when available), in addition to all the large incumbents. However, this year, particularly given the explosion of brand new areas like Generative AI where most companies are 1 or 2 years old, we’ve made the editorial decision to feature many more very young startups on the landscape.

A couple of disclaimers:

  • We’re VCs, so we have a bias towards startups, although hopefully we’ve done a good job covering larger companies, cloud vendor offerings, open source and occasional bootstrapped companies.
  • We’re based in the US, so we probably over-emphasize US startups. We do have strong representation of European and Israeli startups on the MAD landscape. However, while we have a few Chinese companies, we probably under-emphasize the Asian market as well as Latin America and Africa (which just had an impressive data/AI startup success with the acquisition of Tunisia-born InstaDeep by BioNTech for $650M).

Categorization

One of the harder parts of the process is categorization – in particular, what to do when a company’s product offering straddles two or more areas. It’s becoming a more salient issue every year, as many startups progressively expand their offering, a trend we discuss in “Part III – Data Infrastructure”.

Equally, it would be just untenable to put every startup in multiple boxes in this already overcrowded landscape.

Therefore, our general approach has been to categorize a company based on its core offering, or what it’s mostly known for.  As a result, startups generally appear in only one box, even if they do more than just one thing.

We make exceptions for the cloud hyperscalers (many AWS, Azure and GCP products across the various boxes), as well as some public companies (e.g., Datadog) or very large private companies (e.g., Databricks).
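For the programmatically inclined, the one-box rule and its exceptions can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the tooling we actually used: the `Company` record, the `MULTI_BOX_EXCEPTIONS` set, the `assign_boxes` helper and the box assignments in the usage example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of organizations allowed to appear in several boxes
# (hyperscalers, some public companies, very large private companies).
MULTI_BOX_EXCEPTIONS = {"AWS", "Azure", "GCP", "Datadog", "Databricks"}


@dataclass
class Company:
    name: str
    core_box: str  # the box for what the company is mostly known for
    other_boxes: List[str] = field(default_factory=list)  # secondary areas


def assign_boxes(company: Company) -> List[str]:
    """Return the boxes a company appears in under the one-box rule."""
    if company.name in MULTI_BOX_EXCEPTIONS:
        # Exceptions may appear in their core box and in secondary boxes.
        return [company.core_box, *company.other_boxes]
    # Everyone else appears only in the box of their core offering.
    return [company.core_box]


# Usage: a made-up startup that straddles two areas still gets one box.
startup = Company("ExampleCo", "MLOps", ["Vector Databases"])
print(assign_boxes(startup))
```

A startup that does more than one thing still resolves to a single box, while a listed exception keeps its secondary boxes.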

What’s new this year

Main changes in “Infrastructure”:

  • We (finally) killed the Hadoop box, to reflect the gradual disappearance of the OG Big Data technology – the end of an era! We had decided to keep it one last time in the MAD 2021 landscape to reflect the existing footprint. Hadoop is actually not dead, and parts of the Hadoop ecosystem are still being actively used (e.g., Hive) – see The Hadoop Conversation Is Now About What’s Next. But it has declined enough that we decided to merge the various vendors and products supporting Hadoop into Data Lakes (and kept Hadoop and other related projects in our Open Source category).
  • Speaking of data lakes, we rebranded that box to “Data Lakes / Lakehouses” to reflect the lakehouse trend (which we had discussed in the 2021 MAD landscape)
  • In the ever evolving world of databases, we created three new subcategories:
    • “GPU-accelerated Databases” (used for streaming data and real-time machine learning)
    • “Vector Databases” (used for unstructured data to power AI applications, see What is a Vector Database?)
    • “Database Abstraction”, a somewhat amorphous term meant to capture the emergence of a new group of serverless databases that abstract away a lot of the complexity involved in managing and configuring a database. For more, here’s a good overview: 2023 State of Databases for Serverless & Edge (mentions a number of vendors, more than we could fit in the box)
  • We considered adding an “Embedded Database” category with DuckDB for OLAP, KuzuDB for Graph, SQLite for RDBMS and Chroma for search but had to make hard choices given limited real estate – maybe next year.
  • We added a “Data Orchestration” box to reflect the rise of several commercial vendors in that space (we already had a “Data Orchestration” box in “Open Source” in MAD 2021)
  • We merged two subcategories, “Data Observability” and “Data Quality”, into just one box, to reflect the fact that companies in the space, while sometimes coming from different angles, are increasingly overlapping – a signal that the category may be ripe for consolidation.
  • We created a new “Fully Managed” data infrastructure subcategory. This reflects the emergence of startups that abstract away the complexity of stitching together a chain of data products (see our thoughts on the Modern Data Stack in Part III), saving their customers time, not just on the technical front, but also on contract negotiation, payments, etc.

Main changes in “Analytics”:

  • For now, we killed the “Metrics Store” subcategory we had created in the 2021 MAD landscape. The idea was that there was a missing piece in the modern data stack. The need for the functionality certainly remains, but it’s unclear whether there’s enough there for a separate subcategory.  Early entrants in the space rapidly evolved: Supergrain pivoted, Trace* built a whole layer of analytics on top of its metrics store, and Transform was recently acquired by dbt Labs. 
  • We created a “Customer Data Platform” box, as this subcategory, long in the making, has been heating up.
  • At the risk of being “very 2022”, we created a “Crypto/web3 Analytics” box — we continue to believe there are opportunities to build important companies in the space.

Main changes in “Machine Learning / Artificial Intelligence”:

  • In our 2021 MAD landscape, we had broken down “MLOps” into multiple subcategories – “Model Building”, “Feature Stores” and “Deployment and Production”. In this year’s MAD, we’ve merged everything back into one big MLOps box. This reflects the reality that many vendors’ offerings in the space are now significantly overlapping – another category that’s ripe for consolidation.
  • We almost created a new “LLMOps” category next to MLOps to reflect the emergence of a new group of startups focused on the specific infrastructure needs for large language models. But the number of companies there (at least that we are aware of) is still too small and those companies literally just got started. 
  • We renamed “Horizontal AI” to “Horizontal AI / AGI” to reflect the emergence of a whole new group of research-oriented outfits, many of which openly state Artificial General Intelligence as their ultimate goal.
  • We created a “Closed Source Models” box, to reflect the unmistakable explosion of new models over the last year, especially in the field of Generative AI. We’ve also added a new box in “Open Source” to capture the open source models.
  • We added an “Edge AI” category – not a new topic, but there seems to be acceleration in the space

Main changes in “Applications”:

  • We created a new “Applications/Horizontal” category, with subcategories such as code, text, image, video, etc. The new box captures the explosion of new Generative AI startups over the last few months. Of course, many of those companies are thin layers on top of GPT and may or may not be around in the next few years, but we believe it’s a fundamentally new and important category and wanted to reflect it on the 2023 MAD landscape. Note that there are a few Generative AI startups mentioned in “Applications/Enterprise” as well.
  • In order to make room for this new category:
    • We deleted the “Security” box in “Applications/Enterprise”. We made this editorial decision because, at this point, just about every one of the thousands of security startups out there uses ML/AI, and we could devote an entire landscape to them.
    • We trimmed down the “Applications/Industry” box. In particular, as many larger companies in spaces like finance, health or industrial have built some level of ML/AI into their product offering, we’ve made the editorial decision to focus mostly on “AI-first” companies in those areas.

Other noteworthy changes:

  • We added a new ESG data subcategory to “Data Sources & APIs” at the bottom, to reflect its growing (if sometimes controversial) importance.
  • We considerably expanded our “Data Services” category and rebranded it “Data & AI Consulting”, to reflect the growing importance of consulting services to help customers facing a complex ecosystem, as well as the fact that some pure-play consulting shops are starting to reach early scale.

READ NEXT: MAD 2023, PART II: FINANCINGS, M&A AND IPOs 

10 thoughts on “The 2023 MAD (Machine Learning, Artificial Intelligence & Data) Landscape”

  1. Hey Matt,
    I’ve just come across this page and am blown away by the detail. Kudos!

    … but being of an SAP background I see that your current release does not accurately reflect the latest SAP Data/BI/Analytics suite.

    I’d be happy to provide input if you’d like?

  2. Ahh it would be good to include profitable bootstrapped startups that have proven market fit, big name customers and thought leadership in the data industry 😉 Like us — Sequentum! Data ingestion, data transformation, data enrichment, data structure and data delivery. All with governance, quality monitoring every step of the way, and low code efficiency. Maybe next year!

  3. Ab Initio (abinitio.com) is a leader in a great many of these spaces. And many of the largest international organisations are using Ab Initio.

    On first glance it is notably missing from:
    – GraphDBs
    – ETL/ELT/Data Transformation
    – Data Integration
    – Data Governance & Catalog
    – Data Quality & Observability
    and arguably others

    Please let me know if you’d like to learn more.
