For anyone interested in a quick overview of our long-form 2021 Machine Learning, AI and Data (MAD) Landscape, here are the Cliffs Notes! My co-author John and I did a presentation at our most recent Data Driven NYC, focused on the top 10 trends in this year’s landscape.
As a preview, here they are:
Every company is a data company
The big unlock: data warehouses and lakehouses
Consolidation vs data mesh: the future is hybrid
An explosive funding environment
A busy year in DataOps
It’s time for real time
The action moves to the right side of the warehouse
The rise of AI generated content
From MLOps to ModelOps
The continued emergence of a separate Chinese AI stack
Below is the video from the event, and below that, the transcript.
Full resolution version of the landscape image here
It’s been a hot, hot year in the world of data, machine learning and AI.
Just when you thought it couldn’t grow any more explosively, the data/AI landscape just did: rapid pace of company creation, exciting new product and project launches, a deluge of VC financings, unicorn creation, IPOs, etc.
It has also been a year of multiple threads and stories intertwining.
One story has been the maturation of the ecosystem, with market leaders reaching large scale and ramping up their ambitions for global market domination, in particular through increasingly broad product offerings. Some of those companies, such as Snowflake, have been thriving in public markets (see our MAD Public Company Index), and a number of others (Databricks, Dataiku, DataRobot, etc.) have raised very large (or in the case of Databricks, gigantic) rounds at multi-billion-dollar valuations and are knocking on the IPO door (see our Emerging MAD company Index – both indexes will be updated soon).
But at the other end of the spectrum, this year has also seen the rapid emergence of a whole new generation of data and ML startups. Whether they were founded a few years or a few months ago, many experienced a growth spurt in the last year or so. As we will discuss, part of it is due to a rabid VC funding environment and part of it, more fundamentally, is due to inflection points in the market.
In the last year, there’s been less headline-grabbing discussion of futuristic applications of AI (self-driving vehicles, etc.), and a bit less AI hype as a result. Regardless, data and ML/AI-driven application companies have continued to thrive, particularly those focused on enterprise use cases. Meanwhile, a lot of the action has been happening behind the scenes on the data and ML infrastructure side, with entire new categories (data observability, reverse ETL, metrics stores, etc.) appearing and/or drastically accelerating.
To keep track of this evolution, this is our eighth annual landscape and “state of the union” of the data and AI ecosystem – co-authored this year with my FirstMark colleague John Wu. (For anyone interested, here are the prior versions: 2012, 2014, 2016, 2017, 2018, 2019 (Part I and Part II) and 2020.)
For those who have remarked over the years how insanely busy the chart is, you’ll love our new acronym – Machine learning, Artificial intelligence and Data (MAD) – this is now officially the MAD landscape!
Today, Dataiku is announcing a major new financing – a total of $400m at a $4.6B valuation, led by Tiger Global (which had also invested in the company’s Series D), alongside a great group of existing and new investors.
While financings are ultimately just milestones, this is certainly a testament to the remarkable progress the company has been making towards becoming a major global software player, as it has scaled to hundreds of customers around the world and some 750 employees (and yes, hiring a lot more).
Beyond the headlines and high-fives, what is the story? Here’s a quick industry backgrounder and reminder for anyone new to the company.
A huge part of the data world has been historically focused on business intelligence, with both historical players (Tableau, Microsoft’s Power BI, Google Looker) and newer players (SiSense, Mode, etc.). Business intelligence tools enable you to analyze the past and the present of your business: “which region performed best last quarter?”, “who are our best salespeople?” etc. This is sometimes referred to as descriptive analytics.
Dataiku is a leader in another part of the data world, which different people call different names: data science, enterprise AI (for artificial intelligence), enterprise machine learning. Beyond the semantics, the core idea is to make it possible to answer questions about the future of your business, based on the analysis of historical data: “which customers are most likely to buy this product?”, “which customers are most likely to churn?”, “which transaction is most likely to be fraudulent?”, “which region is most likely to show strong demand this month?”. This area is sometimes referred to as predictive analytics.
We all have an insatiable appetite for video, both in our personal and professional lives. Time and again, video is shown to capture our attention better than any other medium. This is increasingly how we learn, explore, collaborate and get entertained.
However, especially in an enterprise context, creating professional-quality video remains a complex and costly endeavor. For all the capabilities of smartphones, most companies still need studio-level equipment to produce enterprise-grade videos: cameras, sound equipment, actors, post-production editing. The process is time-consuming, and not very scalable. Shooting a video in multiple languages, for example, requires multiple actors or dubbing. Any update requires everyone to go back to the studio.
But what if video could be just… code? What if it could be infinitely flexible and customizable at scale, as simple as an API call?
Today we’re excited to announce that FirstMark led a $12.5M Series A investment in Synthesia – a fast-growing startup that offers exactly that.
Synthesia makes creating a business video as simple as writing an email or putting together a PowerPoint presentation – a compelling “text to video” experience.
Today, we are previewing a new public market index – the MAD (for machine learning, AI and data) index.
Readers of this blog know that we have been tracking the data ecosystem since 2012, through annual landscapes (see the 2020 Data & AI Landscape).
Over the last few years, a funny thing happened – some of the small startups we had started tracking grew up, did an IPO and became large public companies.
Not so long ago, public market investors used to say there was no good way of “playing” the Big Data and AI trends, due to the lack of public companies in the space. This is less true today.
However, there isn’t much out there in terms of looking at those public companies as a group. For example, see this Seeking Alpha piece, Top 3 Artificial Intelligence ETFs To Consider, where none of the companies listed are actually AI companies.
Hence the idea of the MAD Index. It’s still a small group of companies, but my colleague John Wu and I were curious to see how they fared in public markets, now and going forward.
This is just a start. We anticipate that a number of companies will join this group in the next year or two, and we’re excited to see how this index matures.
For anyone following the software industry, there’s been a little bit of snark about C3.ai (“C3”) over the years. Here’s a company that was founded by Silicon Valley royalty (Tom Siebel, who sold Siebel Systems to Oracle in 2006 for just shy of $6B), with seemingly limitless access to capital, that somehow seemed to be pivoting every few years to something new – from energy at first, to the Internet of Things, to Artificial Intelligence.
C3 also largely eschewed the startup echo chamber – funded personally by its founder at first, it didn’t raise money from the usual VC suspects, target well-known startups as its first customers, or open source any AI frameworks, working instead with a small group of Fortune 1000 and government customers. As a result, it didn’t build the kind of buzz that often precedes the most notable startups on their way to becoming public.
Lo and behold, what emerges in this IPO is a solid company by enterprise software IPO standards, with $157m in revenue, growing 71% yoy, a 75% gross margin and a $69m loss.
It will be interesting to see how the market reacts to this IPO.
On the one hand, C3 is not growing anywhere near as explosively as a Snowflake, and in fact seems to have just had a bad quarter of decelerating growth. There are also other concerns, including account concentration and a substantial loss (not as pronounced as a Snowflake or Palantir, but still on the higher range of the software market).
On the other hand, the tailwinds around the deployment of ML/AI in the enterprise are very strong, and C3 is clearly positioning itself as one of the very first enterprise AI companies to go public: its ticker symbol on the NYSE will be “AI”, and the term “machine learning” is mentioned 56 times in the S-1.
This IPO will be an interesting test for the continued appetite of financial markets for all things AI.
Here’s a quick analysis of the S-1 and main characteristics of the business, put together by my FirstMark colleague John Wu and me.
In a year like no other in recent memory, the data ecosystem is showing not just remarkable resilience but exciting vibrancy.
When COVID hit the world a few months ago, an extended period of gloom seemed all but inevitable. Yet, as per Satya Nadella, “two years of digital transformation [occurred] in two months”. Cloud and data technologies (data infrastructure, machine learning / artificial intelligence, data driven applications) are at the heart of digital transformation. As a result, many companies in the data ecosystem have not just survived, but in fact thrived, in an otherwise overall challenging political and economic context.
Perhaps most emblematic of this is the blockbuster IPO of Snowflake, a data warehouse provider, which took place a couple of weeks ago and catapulted Snowflake to a $69B market cap company, at the time of writing – the biggest software IPO ever (see our S-1 teardown). And Palantir, an often controversial data analytics platform focused on the financial and government sectors, became a public company via direct listing, reaching a market cap of $22B, at the time of writing (see our S-1 teardown).
Earlier this week, Forbes published a piece on ScaleFactor, a startup using AI to automate accounting, which shut down after raising $100m.
Here’s the heart of the issue covered in the story: “Instead of [AI] producing financial statements, dozens of accountants did most of it manually from ScaleFactor’s Austin headquarters or from an outsourcing office in the Philippines, according to former employees. Some customers say they received books filled with errors, and were forced to re-hire accountants, or clean up the mess themselves.”
While AI may seem like a futuristic goal for most companies around the world, Facebook has already been there for a while. “There’s pretty much a deep learning system in every single Facebook product and they are very much at the core of them” says our guest Jerome Pesenti, VP of AI at Facebook.
Jerome leads the development of artificial intelligence at Facebook, and oversees hundreds of scientists and engineers whose work shapes the company’s direction and impacts our world.
We had the pleasure of welcoming Jerome at Data Driven NYC in October 2017, in his prior role as CEO of BenevolentAI, and we chatted about using AI for drug discovery.
It was wonderful to welcome him back in his new capacity at our first **online** Data Driven NYC, courtesy of the coronavirus. It was a fascinating, in-depth conversation.
Below are: a) the video, b) some highlights and c) the full transcript.
By any measure, Datadog is an incredible entrepreneurial success story. The company went from a tiny startup in 2010 that had trouble raising money, to a public company that, at the time of writing, has a market capitalization of $12.5B. It was a pioneer in the category of DevOps and observability, and it’s now a clear leader. With revenues hovering around $350M, it has 1,300 employees across 31 locations around the world.
Perhaps improbably, the founders built the company out of New York, which many people over the years have thought of as a hub for adtech, media and commerce startups only. Along the way, they faced a lot of skepticism: “Whenever we pitched West Coast investors it was sort of seen as a form of mental deficiency to be based in New York and doing infrastructure”, says Olivier. I wrote a few months ago about the significance of the Datadog IPO for the ecosystem and beyond. Ironically, out of the three top public tech companies in New York today, two are infrastructure software companies (Datadog and MongoDB).
Not one for gratuitous self-aggrandizing, Olivier has given surprisingly few interviews over the years, and it was a real treat to sit down with him for a fireside chat in front of a packed house of 350 attendees at our most recent Data Driven NYC.
We had an in-depth conversation and covered a lot of topics.
The first half of our conversation was focused on Datadog itself, starting with a high level overview of the observability and DevOps space to make the discussion approachable by people who don’t know the space.
The second half of the conversation was focused on all sorts of lessons learned along the way of building a major company: sales, marketing, fundraising, etc.
Below is the video. We have also provided a full written transcript to make the content easy to scan through (many thanks to Karissa Domondon for her help with this).
Our most recent VC guest at Data Driven NYC, Mike Volpi of Index, has had a pretty amazing last couple of years, with three of his venture investments going public: Zuora, Sonos and Elastic.
Before becoming a VC, Mike ran Cisco’s routing business where he managed a P&L in excess of $10 billion in revenues, and acquired over 70 companies (note: probably a pretty good way to make a lot of friends in Silicon Valley).
A partner at Index Ventures in San Francisco, Mike invests primarily in infrastructure, open-source and artificial intelligence companies, so he was a perfect guest to have at the event. In particular, he invested in two prior presenting companies: Confluent and Cockroach Labs (in which FirstMark is also an investor).
We had a really interesting conversation about open source, AI and venture capital. Here’s the video below, and I have jotted down a few notes as well, below the fold.
Best-selling author, Professor of Computer Science at the University of Washington, recent recipient of the prestigious IJCAI John McCarthy Award for excellence in artificial intelligence research (among other awards) and Head of the Machine Learning Research group at D.E. Shaw: Pedro Domingos has one of the most incredible resumes in the world of AI, and we were thrilled to host him for a fireside chat at our most recent Data Driven NYC.
We covered a bunch of things, including why finance is a killer app for machine learning, his much-lauded book, ‘The Master Algorithm’ and what’s truly scary about AI (hint: not the Terminator).
Should we be worried about the prospect of AI superintelligence taking over the world?
“In the real world, current-day robots struggle to turn doorknobs, and Teslas driven in ‘Autopilot’ mode keep rear-ending parked emergency vehicles […]. It’s as if people in the fourteenth century were worrying about traffic accidents, where good hygiene might have been a whole lot more helpful”.
This is one of my favorite quotes from “Rebooting AI: Building Artificial Intelligence We Can Trust,” a new book by Gary Marcus – scientist, NYU professor, New York Times bestselling author, entrepreneur – and his co-author Ernest Davis, Professor of Computer Science at the Courant Institute, NYU.
Gary did us a big honor recently: he chose to speak at Data Driven NYC on the evening of the publication of the book. He also signed a few copies. Our first book launch party!
Particularly if you’re trying to make sense of the still-ongoing hype around AI, including predictions of global gloom, Gary’s book is a fantastic read: a lucid, no-nonsense and occasionally provocative take on the current state of AI, that distills complex concepts into simple ideas, and includes plenty of interesting and often funny anecdotes.
In its largest acquisition since Oculus in 2014, Facebook announced last night that it acquired CTRL-labs, a four-year-old startup based in New York, for a reported $500M-$1B.
Coincidentally, CTRL-labs CEO, Thomas Reardon (who goes by Reardon) was our guest at Data Driven NYC just a couple of weeks ago. Reardon is a particularly compelling entrepreneur, and this was a fascinating fireside chat, where we dove into machine learning, neuroscience, VR and all sorts of cool topics.
CTRL-labs builds what it calls “neural interface technology”: algorithms that decode the activity of individual motor neurons and turn that into control over machines, thereby completely redefining the interaction between humans and machines. Because the technology captures your intentions without requiring any physical movement, you can do things that you could never do by moving, and you can start “imaging experiences where you would have 20 fingers… or 8 arms or legs”.
The video (below) is well worth a watch in its entirety, including the audience Q&A at the end, and I’ve jotted down a few notes as well, for a quick review.
Part I of the 2019 Data & AI Landscape covered issues around the societal impact of data and AI, and included the landscape chart itself. In this Part II, we’re going to dive into some of the main industry trends in data and AI.
The data and AI ecosystem continues to be one of the most exciting areas of technology. Not only does it have its own explosive momentum, but it also powers and accelerates innovation in many other areas (consumer applications, gaming, transportation, etc). As such, its overall impact is immense, and goes much beyond the technical discussions below.
Of course, no meaningful trend unfolds over the course of just one year, and many of the following have been years in the making. We’ll focus the discussion on trends that we have seen particularly accelerating in 2019, or gaining rapid prominence in industry conversations.
We will loosely follow the order of the landscape, from left to right: infrastructure, analytics and applications.