Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape

It’s been a hot, hot year in the world of data, machine learning and AI. 

Just when you thought it couldn’t grow any more explosively, the data/AI landscape just did: rapid pace of company creation, exciting new product and project launches, a deluge of VC financings, unicorn creation, IPOs, etc.  

It has also been a year of multiple threads and stories intertwining.

One story has been the maturation of the ecosystem, with market leaders reaching large scale and ramping up their ambitions for global market domination, in particular through increasingly broad product offerings. Some of those companies, such as Snowflake, have been thriving in public markets (see our MAD Public Company Index), and a number of others (Databricks, Dataiku, DataRobot, etc.) have raised very large (or in the case of Databricks, gigantic) rounds at multibillion-dollar valuations and are knocking on the IPO door (see our Emerging MAD Company Index – both indexes will be updated soon).

But at the other end of the spectrum, this year has also seen the rapid emergence of a whole new generation of data and ML startups.  Whether they were founded a few years or a few months ago, many experienced a growth spurt in the last year or so.  As we will discuss, part of it is due to a rabid VC funding environment and part of it, more fundamentally, is due to inflection points in the market.

In the last year, there’s been less headline-grabbing discussion of futuristic applications of AI (self-driving vehicles, etc.), and a bit less AI hype as a result. Regardless, data and ML/AI-driven application companies have continued to thrive, particularly those focused on enterprise use cases. Meanwhile, a lot of the action has been happening behind the scenes on the data and ML infrastructure side, with entire new categories (data observability, reverse ETL, metrics stores, etc.) appearing and/or drastically accelerating.

To keep track of this evolution, here is our eighth annual landscape and “state of the union” of the data and AI ecosystem – co-authored this year with my FirstMark colleague John Wu. (For anyone interested, here are the prior versions: 2012, 2014, 2016, 2017, 2018, 2019 (Part I and Part II) and 2020.)

For those who have remarked over the years how insanely busy the chart is, you’ll love our new acronym – Machine learning, Artificial intelligence and Data (MAD) – this is now officially the MAD landscape!

Dataiku’s Series E: Ushering the Era of Everyday AI

Today, Dataiku is announcing a major new financing – a total of $400M at a $4.6B valuation, led by Tiger Global (which had also invested in the company’s Series D), alongside a great group of existing and new investors.

While financings are ultimately just milestones, this is certainly a testament to the remarkable progress the company has been making towards becoming a major global software player, as it has scaled to hundreds of customers around the world and some 750 employees (and yes, hiring a lot more).

Beyond the headlines and high-fives, what is the story? Here’s a quick industry backgrounder and reminder for anyone new to the company.

A huge part of the data world has historically focused on business intelligence, with both historical players (Tableau, Microsoft’s Power BI, Google’s Looker) and newer players (Sisense, Mode, etc.). Business intelligence tools enable you to analyze the past and the present of your business: “which region performed best last quarter?”, “who are our best salespeople?”, etc. This is sometimes referred to as descriptive analytics.

Dataiku is a leader in another part of the data world, which goes by different names: data science, enterprise AI (for artificial intelligence), enterprise machine learning. Beyond the semantics, the core idea is to make it possible to answer questions about the future of your business, based on the analysis of historical data: “which customers are most likely to buy this product?”, “which customers are most likely to churn?”, “which transaction is most likely to be fraudulent?”, “which region is most likely to show strong demand this month?”. This area is sometimes referred to as predictive analytics.
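The difference between the two kinds of questions can be made concrete with a minimal Python sketch. All the data and names in it are hypothetical and invented for illustration: SALES, HISTORY, best_region, train_logistic, churn_probability, and the two churn features (months inactive, support tickets). The descriptive half answers a question about the past by aggregating stored records. The predictive half fits a plain logistic regression by batch gradient descent on historical outcomes and then scores a customer. This is not how Dataiku or any BI tool works internally. It only shows the smallest possible version of each question type.

```python
import math
from collections import defaultdict

# Hypothetical toy sales records: (region, quarter, revenue)
SALES = [
    ("East", "Q1", 120.0),
    ("West", "Q1", 150.0),
    ("East", "Q1", 80.0),
    ("West", "Q1", 30.0),
    ("West", "Q2", 500.0),
]

def best_region(rows, quarter):
    """Descriptive analytics: which region performed best in a past quarter?"""
    totals = defaultdict(float)
    for region, q, revenue in rows:
        if q == quarter:
            totals[region] += revenue
    return max(totals, key=totals.get)

# Hypothetical churn history: ([months_inactive, support_tickets], churned 0/1)
HISTORY = [
    ([0, 0], 0), ([1, 0], 0), ([0, 1], 0), ([1, 1], 0),
    ([4, 3], 1), ([5, 2], 1), ([6, 4], 1), ([3, 3], 1),
]

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(data, lr=0.1, epochs=5000):
    """Predictive analytics: fit logistic regression with batch gradient descent."""
    n_features = len(data[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            # Gradient of the cross-entropy loss is (prediction - label) * input
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(data) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def churn_probability(model, x):
    """Estimated probability that a customer with features x will churn."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

model = train_logistic(HISTORY)
print(best_region(SALES, "Q1"))           # past: East (200.0 vs 180.0)
print(churn_probability(model, [5, 3]))   # future: estimated churn risk
```

Logistic regression is chosen here only because it is one of the simplest predictive models. Treat the sketch as an illustration of the question type, not of any product's internals.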

Congrats, Sketchfab!

This morning, Sketchfab announced that it was joining the Epic Games family.

From inception, Sketchfab has been a visionary company in the creator economy, pioneering the emergence of 3D as a key format on the web. It built the best 3D viewer on the market, and leveraged it to build a remarkable community of 3D creators and enthusiasts all around the world. It navigated the ups and downs of the “VR Winter” and, through entrepreneurial grit and great execution, emerged on the other side a stronger, profitable company – a journey that CEO Alban Denoyel documented with remarkable transparency.

Today, Sketchfab is the center of the 3D world, addressing the needs of individual creators and companies alike: a 5M user community, a huge library of models, a marketplace to buy and sell 3D models and a fast-growing enterprise business.

Epic is the perfect partner for Sketchfab. It has epic (yes) plans for building the Metaverse (see Matthew Ball’s excellent essays on the Metaverse here and Epic here). The Metaverse will be a heavy consumer of 3D, AR and VR content, and Sketchfab fits perfectly within that vision. Sketchfab will continue operating largely as an independently branded service, and will be able to access Epic’s resources and distribution capabilities.

In Conversation with Elementl (Dagster), Meroxa and Superconductive (Great Expectations)

This last year has seen tremendous levels of activity for early-stage startups in the data infrastructure ecosystem. At our most recent Data Driven NYC, we featured some of the rising stars:

  • Nick Schrock, Founder & CEO, Elementl (Dagster) | Elementl is building the next generation of open source data tools including Dagster, the open-source data orchestrator for machine learning, analytics, and ETL.
  • DeVaris Brown, Founder & CEO, Meroxa | Meroxa is a real-time data platform that gives data teams the tools they need to build real-time infrastructure in minutes.
  • Abe Gong, Founder & CEO, Superconductive (Great Expectations) | Superconductive is the team behind Great Expectations, the leading open source tool for defeating pipeline debt through data testing, documentation, and profiling. The company’s mission is to revolutionize the speed and integrity of data collaboration.

Quick S-1 Teardown: Confluent

A member of our Emerging MAD Index of companies on their path to an IPO, Confluent is a very interesting company in a strategic part of the data space, providing infrastructure for real-time data streaming – what it nicely calls “data in motion”, in contrast to the world of batch processing or “data at rest”.
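The contrast between “data at rest” and “data in motion” can be shown with a toy Python sketch. It is not Confluent's or Kafka's API. A plain list stands in for a stored dataset and an iterator stands in for an event stream, both assumptions made for illustration, and the function names are hypothetical. The batch function gives its answer only once all the data has landed. The streaming function keeps an up-to-date answer after every event.

```python
from typing import Iterable, Iterator, List

def batch_total(stored_events: List[float]) -> float:
    """Data at rest: the complete dataset is stored first, then processed in one pass."""
    return sum(stored_events)

def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    """Data in motion: update the result incrementally as each event arrives."""
    running_total = 0.0
    for amount in events:
        running_total += amount
        yield running_total  # a fresh answer is available after every event

payments = [10.0, 5.0, 20.0]
print(batch_total(payments))                   # 35.0, available only at the end
print(list(streaming_totals(iter(payments))))  # [10.0, 15.0, 35.0]
```

In a real system, the iterator would be replaced by a consumer reading from a streaming platform such as Apache Kafka, the open source project Confluent is built around. The design point is that state is updated per event rather than recomputed over the full dataset.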

I had the pleasure of hosting the company’s co-founder and then-CTO, Neha Narkhede, at Data Driven NYC back in 2016, and her great talk remains entirely relevant for understanding the premise behind the company and its core technical foundation.

Confluent recently released its full S-1, and will trade under the stock ticker CFLT on the NASDAQ.

In the same vein as previous “Quick S-1 teardowns” (see Palantir, Snowflake, nCino), here are some high-level thoughts and quick highlights from my colleague John Wu and me.

In Conversation with Ali Ghodsi, CEO, Databricks

Databricks is an enterprise software giant in the making. Most recently valued at $28B in a $1B fundraise announced in February 2021, the company has global ambitions in the data and AI space.

Databricks is an unlikely story: a company started by seven co-founders, most of them academics, and built around the Spark open source project. It is now heading towards a monster IPO that will accelerate its rivalry with its chief competitor, Snowflake.

I had a chance to interview co-founder and then-CEO Ion Stoica at Data Driven NYC back in 2015, when VCs were courting Databricks very aggressively but the company was still very early in its commercial traction.

It was a real treat to catch up with Ali Ghodsi, who took over as CEO in 2015.

Below is the video and below that, the transcript.

Congratulations, Text IQ!

A couple of years ago, FirstMark led the Series A of Text IQ, an impressive AI startup focused on the management of unstructured data in the enterprise for legal, privacy and compliance purposes. The company was co-founded by Apoorv Agarwal (CEO) and Omar Haroun (COO), and had managed to grow both fast and profitably after raising a seed round from our friends at Floodgate.

At its core, Text IQ leverages unsupervised learning to identify sensitive information (privileged documents, PII, PHI, etc.) in large amounts of unstructured data – a challenge that AI is uniquely equipped to solve.

After its Series A, Text IQ continued to make strong progress, building a great team, evolving the product into an enterprise-grade platform and securing an impressive list of Fortune 1000 customers.

Not surprisingly, this attracted the attention not just of potential Series B investors, but also of acquirers.

Today, Text IQ is announcing its acquisition by Relativity, a leader in the discovery and compliance market, which just announced a large financing led by Silver Lake.

In Conversation with Victor Riparbelli (CEO) and Matthias Niessner (Co-Founder), Synthesia

One of the most exciting emerging areas for AI is content generation. Powered by anything from GANs to GPT-3, a new generation of tools and platforms enables the creation of highly customizable content at scale – whether text, images, audio or video – opening up a broad range of consumer and enterprise use cases.

At FirstMark, we recently announced that we had led the Series A in Synthesia, a startup providing impressive AI synthetic video generation capabilities to both creators and large enterprises.

As a follow-up to our investment announcement, we had the pleasure of hosting two of Synthesia’s co-founders, Victor Riparbelli (CEO) and Matthias Niessner (co-founder and a Professor of Computer Vision at the Technical University of Munich).

Some of the topics we covered:

  • The rise of Generative Adversarial Networks (GANs) in AI
  • Use cases for synthetic video in the enterprise
  • Synthetic videos vs deep fakes
  • What’s next in the space

Below is the video and below that, the transcript.

In conversation with Dev Ittycheria, CEO, MongoDB

MongoDB’s path from unlikely NYC enterprise tech startup to global category leader has been amazing to watch.

I’ve had the pleasure of hosting two of MongoDB’s co-founders over the years, first Dwight Merriman back in 2012 (here) and then CTO Eliot Horowitz in 2016 (here). So it was a real treat this time to get to chat with CEO Dev Ittycheria, who has been leading the company since 2014, and in particular has presided over the company’s remarkable ride in public markets since its 2017 IPO.

In addition to being a truly world-class CEO, Dev has had an outsized impact on the New York tech scene, playing a central role both at MongoDB and at Datadog, where he’s been a longtime board member (after leading the company’s Series B back in 2014).

We had a wide-ranging conversation where we covered:

  • Dev’s journey as a CEO and investor
  • The evolution of enterprise tech in New York
  • MongoDB’s database as a service offering, Atlas
  • Newest products and product roadmap
  • Open source
  • GTM strategies, bottoms up vs top down
  • Lessons in scaling the team
  • Being a student of the game rather than a master of the game

Introducing the *Emerging* MAD (Machine Learning, AI, Data) Index

A few weeks ago, my colleague John Wu and I introduced the MAD Index, a new public market index to track the progress of “pure play” machine learning, AI and data public companies. This was an initial group of 13 companies, which has since grown to 14 following the UiPath IPO.

Today, we’re introducing the Emerging MAD Index, a companion to the public MAD index. The idea is to track a group of private companies that show high potential to join the MAD Index in the future.

Criteria

Just like the Public MAD Index, our goal is to capture “pure play” machine learning, AI and data companies.

In practice, that generally means infrastructure companies offering tools to store, process and analyze data, create and manage machine learning models, and/or automate core processes deep in the stack – broadly horizontal companies serving a variety of business needs across departments, industries and geographies.

In Conversation with Florian Douetteau, CEO, Dataiku

Dataiku (in which I’m a proud investor and board member) has had an impressive ride over the last few years. An early entrant in the enterprise Data Science and Machine Learning platform category, the company successfully expanded from its French/European roots to build a very strong presence in the US (where the company is now headquartered) and, increasingly, Asia.

Along the way, Dataiku:

  • became a unicorn, most recently raising a $100M Series D in 2020
  • was named a “Leader” in Gartner’s Magic Quadrant for Data Science and ML Platforms in both 2020 and 2021
  • collected many accolades, such as CB Insights’ “AI 100” and several Forbes lists: “Cloud 100”, “AI 50” and “America’s best startup employers in 2021”

It was really fun to host CEO Florian Douetteau at Data Driven NYC once again, after previous appearances in 2016 (here) and 2018 (here). We covered a bunch of different topics, including:

  • What enterprise AI is about: not flying cars, but optimizing hundreds of business processes
  • Why enterprises need to move past their fear of data and AI
  • The key principles behind the design of the Dataiku platform: handling the entire data lifecycle, and democratizing data/AI across teams
  • Dataiku’s partnership with Snowflake
  • The upcoming launch of their starter / SMB self-serve product, Dataiku Online

Below is the video and below that, the transcript.

In Conversation with Bindu Reddy, CEO, Abacus

At our most recent Data Driven NYC, we had the great pleasure of hosting Bindu Reddy, CEO and co-founder of Abacus AI, formerly GM & creator of AI verticals at AWS, and an ex-Googler. Bindu also has a very witty and entertaining Twitter account (@bindureddy), where she talks about all things machine learning and AI.

This was a very educational and approachable conversation, where we covered:

  • Some key definitions: neural networks, weights and biases, supervised vs unsupervised learning, feature store
  • Applying neural networks to structured, tabular data
  • Abacus’ vision around “autonomous AI”
  • How companies wait too long to start experimenting in ML/AI

Below is the video and below that, the transcript.

New Investment: Synthesia or the Rise of “Video as Code”

We all have an insatiable appetite for video, both in our personal and professional lives. Time and again, video is shown to capture our attention better than any other medium. This is increasingly how we learn, explore, collaborate and get entertained.

However, especially in an enterprise context, creating professional-quality video remains a complex and costly endeavor. For all the capabilities of smartphones, most companies still need studio-level resources to produce enterprise-grade videos: cameras, sound equipment, actors, post-production editing. The process is time-consuming and not very scalable. Shooting a video in multiple languages, for example, requires multiple actors or dubbing. Any update requires everyone to go back to the studio.

But what if video could be just… code? What if it could be infinitely flexible and customizable at scale, as simple as an API call?

Today we’re excited to announce that FirstMark led a $12.5M Series A investment in Synthesia – a fast-growing startup that offers exactly that.

Synthesia makes creating a business video as simple as writing an email or putting together a PowerPoint presentation – a compelling “text to video” experience.

Introducing the MAD (ML, AI, Data) Public Company Index

Today, we are previewing a new public market index – the MAD (for machine learning, AI and data) index.

Readers of this blog know that we have been tracking the data ecosystem since 2012, through annual landscapes (see the 2020 Data & AI Landscape).

Over the last few years, a funny thing happened – some of the small startups we had started tracking grew up, did an IPO and became large public companies.

Not so long ago, public market investors used to say there was no good way of “playing” the Big Data and AI trends, due to the lack of public companies in the space. This is less true today.

However, there isn’t much out there in terms of looking at those public companies as a group. For example, see this Seeking Alpha piece, Top 3 Artificial Intelligence ETFs To Consider, where none of the companies listed are actually AI companies.

Hence the idea of the MAD Index. It’s still a small group of companies, but my colleague John Wu and I were curious to see how they fared in public markets, now and going forward.

This is just a start. We anticipate that a number of companies will join this group in the next year or two, and we’re excited to see how this index matures.

In Conversation with Dave Burgess, Head of Data Engineering, Pinterest

Pinterest is near and dear to our hearts at FirstMark because we had the good fortune of being the first institutional investor back in 2009 when the company was just getting started (fun fact: the founders were in New York for a brief moment in time before moving to the Bay Area). Pinterest has had a remarkable ride ever since, and it’s a $49B market cap public company at the time of writing.

So it was a particular pleasure to welcome Dave Burgess, Head of Data Engineering, to talk to the Data Driven NYC audience about all things data at Pinterest.

We covered a bunch of interesting topics, including:

  • Pinterest’s newly open-sourced project, Querybook
  • The stack Pinterest uses to manage its 400 petabytes of data
  • The use cases for data analytics and machine learning at Pinterest

Below is the video and below that, the transcript.