Introducing the *Emerging* MAD (Machine Learning, AI, Data) Index

A few weeks ago, my colleague John Wu and I introduced the MAD Index, a new public market index to track the progress of “pure play” machine learning, AI and data public companies. It started as a group of 13 companies and has since grown to 14, following the UiPath IPO.

Today, we’re introducing the Emerging MAD Index, a companion to the public MAD index. The idea is to track a group of private companies that show high potential to join the MAD Index in the future.


Just like the Public MAD Index, our goal is to capture “pure play” machine learning, AI and data companies.

In practice, that generally means infrastructure companies offering tools to store, process and analyze data, create and manage machine learning models, and/or automate core processes deep in the stack – broadly horizontal companies serving a variety of business needs across departments, industries and geographies.

More specifically, we used the following criteria for inclusion:

  • Private companies (non-public, non-exited), generally, but not necessarily, venture-backed
  • Software or infrastructure as a service only
  • Data/artificial intelligence/machine learning product accounts for the majority (>50%) of the company’s revenues
  • Product offering can be applied generically across a range of cross-industry use cases
  • Since this is a list of companies poised to go public in the next few years, preference for late-stage, “unicorn” type companies (latest valuation known to be >$1B)

Conversely, we decided not to include the following types of companies:

  • Applications that heavily leverage AI but for the benefit of specific business users in the enterprise, such as sales (Gong, etc.) or customer support (our portfolio company Ada) – there are many very impressive companies in that category, but arguably those are less “pure play”
  • Applications that heavily leverage AI to target specific verticals (insurance, genomics, etc.)
  • Hardware vendors (GPUs, etc.)
  • Data brokers
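
For readers who think in code, the inclusion and exclusion rules above can be sketched roughly as a screening function. This is only an illustration, not how we actually built the list: the `Candidate` record, its field names, the `excluded_type` labels and the "review" outcome are all hypothetical, and in practice several of these calls were judgment calls rather than hard thresholds.

```python
from dataclasses import dataclass
from typing import Optional

# Valuation bar for "unicorn" status, in $B (from the criteria above).
UNICORN_BAR_USD_B = 1.0


@dataclass
class Candidate:
    """Hypothetical record for one company being screened."""
    name: str
    is_private: bool                  # non-public, non-exited
    is_saas_or_iaas: bool             # software or infrastructure as a service only
    mad_revenue_share: float          # fraction of revenue from the data/AI/ML product
    is_horizontal: bool               # applies generically across industries
    valuation_usd_b: Optional[float]  # latest known valuation in $B, None if unknown
    # Hypothetical labels for the excluded categories, e.g.
    # "functional_app", "vertical_app", "hardware", "data_broker".
    excluded_type: Optional[str] = None


def screen(c: Candidate) -> str:
    """Return "include", "review" or "exclude" for a candidate."""
    # Any company in one of the excluded categories is out.
    if c.excluded_type is not None:
        return "exclude"

    # Criteria treated as hard requirements in this sketch.
    passes_hard_criteria = (
        c.is_private
        and c.is_saas_or_iaas
        and c.mad_revenue_share > 0.5
        and c.is_horizontal
    )
    if not passes_hard_criteria:
        return "exclude"

    # The >$1B bar is a preference, not a rule, so companies below it
    # (or with unknown valuation) are flagged for a manual decision.
    if c.valuation_usd_b is not None and c.valuation_usd_b > UNICORN_BAR_USD_B:
        return "include"
    return "review"
```

For example, `screen(Candidate("Example Co", True, True, 0.9, True, 0.4))` returns `"review"`: the company passes every hard criterion but sits below the $1B bar, so it would need a manual decision.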

As with all lists, deciding which companies to include or exclude was a difficult task, and we agonized over where to draw the line. Specifically, we heavily debated whether to include robotic process automation (RPA) and players like Automation Anywhere and Workato. Given that ML and AI are increasingly key to their core use cases (process mining, handling tasks with computer vision, etc.), we ultimately decided to include them based on their usage of AI across a broad range of use cases within the enterprise, and across a broad range of enterprises. Once we decided to add those RPA players, it made sense to also include the native ML/AI players in that broad space, specifically Hyperscience and Instabase.

The List

Some comments on the list:

  • Some of those companies are about to go public, and join the public MAD Index – specifically, Couchbase and Confluent
  • Although this is mostly a unicorn list, we selectively included a couple of companies that we believe are currently below that $1B valuation bar, based on the ubiquity of their open-source projects, specifically: Fishtown Analytics and Hugging Face
  • While it includes companies with European roots (Celonis, Collibra, Dataiku), this is a US-heavy index, as it’s meant to be a list of companies likely to go public on a US market (NYSE or NASDAQ). Obviously, incredible companies are being built around the world, including in China.
  • As a disclaimer, our firm FirstMark is an investor in Cockroach Labs, Dataiku and Hyperscience


One surprise to us, as we were putting the index together: it’s a pretty short list. Particularly compared to the broad software world, it’s still early days in the ML/AI/Data world. The bulk of exciting startups are early stage in funding and years away from an IPO.

Some quick numbers from the Emerging MAD companies:

  • Combined, the companies have raised a whopping $12.9B in venture capital
  • Total private valuation is just under $119B
  • The average Emerging MAD total funding raised is $462M, while the median is $347M
  • The average Emerging MAD company valuation is $3.2B, while the median is $2.1B
  • The average company is 9 years old and was founded in 2012, while the median company is 8 years old and was founded in 2013
  • The youngest company on this list is Starburst Data (founded in October 2017) and the oldest is Automation Anywhere (founded in 2003)

Funding for companies in the MAD stacks has rapidly increased over the last 5 years. The bulk of funding for companies in the Emerging MAD Index occurred recently, with 67% of all funding occurring in the last 3 years. Funding for the first 4 months of 2021 has already outpaced that of the entirety of 2020, headlined by the recent $1B Series G raise from Databricks. The increasing number of rounds is unlikely to slow down any time soon, as venture investors have been highly active across the data and machine learning stacks in data lineage, data quality, labeling, orchestration, and more.

The number of investment rounds for companies in this index has also steadily increased over the last few years, with 21 total rounds accounted for last year, up slightly from 20 the year prior. 

Following the broader market trend of larger round sizes, the average round size for Emerging MAD companies has grown in each successive 3-year period.

Did we miss any companies that you feel should be included, or do you have suggestions for either of the MAD indexes? Please let us know in the comments below (or on LinkedIn or on Twitter at @mattturck and @john_d_wu).

2 thoughts on “Introducing the *Emerging* MAD (Machine Learning, AI, Data) Index”

  1. Consider Postman, Alation and Denodo in the private category, and Informatica & Qlik, which were public before and will likely be again.

    1. Thanks, good suggestions — as this is a pre-IPO list, we’ve used private market valuations (unicorn etc) as an imperfect measure for traction and stage, with a couple of exceptions. But we’ll consider those for the next version (I hosted H2), Alation and Qlik at my Data Driven NYC event over the years).
