It has been another intense year in the world of data, full of excitement but also complexity.
As more of the world gets online, the “datafication” of everything continues to accelerate. This mega-trend keeps gathering steam, powered by the intersection of separate advances in infrastructure, cloud computing, artificial intelligence, open source and the overall digitalization of our economies and lives.
A few years ago, the discussion around “Big Data” was mostly a technical one, centered around the emergence of a new generation of tools to collect, process and analyze massive amounts of data. Many of those technologies are now well understood, and deployed at scale. In addition, over the last couple of years in particular, we’ve started adding layers of intelligence through data science, machine learning and AI into many applications, which are now increasingly running in production in all sorts of consumer and B2B products.
As those technologies continue to both improve and spread beyond the initial group of early adopters (FAANG and startups) into the broader economy and world, the discussion is shifting from the purely technical into a necessary conversation around impact on our economies, societies and lives.
We’re just starting to truly get a sense of the nature of the disruption ahead. In a world where data-driven automation becomes the rule (automated products, automated cars, automated enterprises), what is the new nature of work? How do we handle the social impact? How do we think about privacy, security, freedom?
Meanwhile, the underlying technologies continue to evolve at a rapid pace, with an ever vibrant ecosystem of startups, products and projects, heralding perhaps even more profound changes ahead. In that ecosystem, the year was characterized by the early innings of a long-expected consolidation, and perhaps a changing of the guard from one era to another as early technologies start to give way to the next generation.
To try and make sense of it all, this is our sixth landscape and “state of the union” of the data and AI ecosystem. For anyone interested in tracking the evolution, here are the prior versions: 2012, 2014, 2016, 2017 and 2018.
Worth noting: as the term “Big Data” has now entered the museum of once-hot buzzwords, this year the chart will just be the “Data & AI Landscape”.
Also, to make the reading more digestible, we’ll break down the post into two parts:
Part I (this post) will include a few introductory thoughts on the rapidly evolving context around data privacy and regulation, which will have a profound impact on what can/cannot be done with data technologies; it will also include the landscape itself.
Data, AI and society: The tide is shifting
In 2018, we noted how the data world had started to reveal some darker, scarier undertones, in the wake of the Cambridge Analytica scandal in particular.
This trend continued to develop in 2019. There were more data breaches, more privacy scandals. More stories of the surveillance state in China (including this report on a Muslim town in Northwest China). More freaky examples of AI deepfakes, for which we are very unprepared.
As a result, the tide has started to shift in earnest.
Certainly, the debate around the dangers of AI, with all its sci-fi connotations, had captured imaginations already, and this year has seen more initiatives around thinking through those issues, such as the launch of Fei Fei Li’s Institute for Human-Centered Artificial Intelligence.
But up until recently, questions around data ownership, privacy and security were met, for almost everyone but a vocal minority, with a resounding yawn.
Perhaps more than ever, privacy issues jumped to the forefront of public debate in 2019 and are now front and center. The fact that many of those issues were related to Facebook, a service known to billions, probably played an important role in sensitizing a much broader group of people around the world to the severity of the issues.
The data privacy landscape is also shifting, as governments are increasingly getting involved.
Regulation is certainly spreading in full force:
- GDPR, the European data protection and privacy regulation, came into effect in May 2018, and since then a few high-profile fines have been announced, including a €50 million fine issued to Google in January 2019 by the French data protection regulator and a £500,000 fine issued to Facebook in October 2018 by the UK’s Information Commissioner’s Office.
- The California Consumer Privacy Act (CCPA) will become effective on January 1, 2020.
- New York’s privacy bill is “even bolder” than California’s.
- San Francisco just voted to ban the use of facial recognition by city agencies.
- Illinois moved against video bots for hiring interviews.
Yet harsher government actions could take place. For starters, Facebook is likely to be fined up to $5B by the FTC over privacy issues. Perhaps most importantly, there have been increasing calls to break up the largest Internet franchises: too much power, too much data and not enough privacy. The clearest target has been Facebook (see this well-publicized opinion piece by one of its founders, Chris Hughes), but the discussion has included others as well (a proposal from presidential candidate Elizabeth Warren targets Google and Amazon).
Big Tech was already under pressure from within its own ranks. Employees at Google, Amazon and Microsoft protested against the commercialization of their companies’ face recognition technology. Google relented. Amazon did not – some activist shareholders and employees tried to put a ban into effect, but were defeated.
For the FAANGs, privacy has become a new battleground, forcing their leaders to take much more of a public stance on the issue:
- Tim Cook, CEO of Apple, warned us about the “weaponization of data” which is leading us into a “data industrial complex.”
- Sundar Pichai, CEO of Google, took a public stand on the issue in the NY Times.
- Mark Zuckerberg, CEO of Facebook, vowed to turn Facebook into a privacy-focused messaging and social networking platform.
To what extent such statements should be taken at face value, of course, is anyone’s guess, and probably depends on the specific company and leader.
In Facebook’s case, the launch of Libra, a global cryptocurrency, could arguably be considered as a way to continue making money in a “post-data”, privacy-first world where the company would be less reliant on a pure advertising model based on user data – or as a way to collect even more personal data.
The debate around the impact of data and AI on privacy and society is obviously hugely important, and it is fundamentally healthy that it has become much more central over the last year or so.
Yet it is a complex discussion, which involves many nuances.
Our relationship to privacy continues to be a complicated one, full of mixed signals. People say they care about privacy, but continue to purchase all sorts of connected devices that have uncertain privacy protection. They say they are outraged by Facebook’s privacy breaches, yet Facebook continues to add users and beat estimates (both in Q4 2018 and Q1 2019).
In the same vein, how we decide to handle AI involves many trade-offs. Like all technologies, AI is intrinsically neutral, and whether it creates good or bad for society is ultimately a human decision. Take face recognition for example: it can be a tool for state surveillance, but it can also help locate victims of sex trafficking. Deciding how to regulate or curb AI, to the extent such a thing is even possible, would involve all sorts of second-order consequences that are hard to predict. For example, if you regulate AI in the West, do you end up losing long-term competitive advantage against China, which has a different set of rules (leaving aside any discussion on values)?
Data technologies: A vibrant but evolving landscape
While it is impossible in 2019 to ignore the broader questions of privacy, security and regulation around data and AI, the ecosystem of data technologies and products is as exciting (and full!) as ever.
The ecosystem is also evolving in some interesting ways, as some pioneering technologies such as Hadoop may be on their way out, replaced by cloud computing and Kubernetes, and entire segments, such as Business Intelligence, seem to be rapidly consolidating.
We’ll dig into those various trends in some detail, but first, here’s our 2019 Data & AI Landscape:
Some key resources:
- View in full size: click here
- Underlying list: despite how busy the landscape is, we cannot possibly fit in every interesting company on the chart itself. As a result, we have a whole spreadsheet that not only lists all the companies in the landscape, but also hundreds more – click here.
A few additional comments:
- Yes, you can zoom! The image and all logos are very high-res, so you can navigate the landscape in detail by zooming. Works very well on mobile, too!
- This year, my FirstMark colleague Lisa Xu provided immense help with the landscape.
- We’ve detailed some of our methodology in the notes at the end of this post.
- Thoughts and suggestions welcome – please use the comment section to this post. We’ll probably publish two or three revisions of the chart until it’s fully final.
Who’s in, who’s out?
The last year (since our 2018 landscape) has been active from an exit perspective.
Several companies on the landscape went public. Crowdstrike (NASDAQ:CRWD) and Elastic (NYSE:ESTC) reached big valuations at IPO time – $7B and $5B, respectively. Other IPOs included PagerDuty ($1.8B), Anaplan ($1.8B), and Domo ($500M).
Some very large acquisitions occurred in the last year, including Qualtrics (acquired by SAP for $8B), Medidata (acquired post-IPO by Dassault for $5.8B), Hortonworks ($5.2B merger with Cloudera), Imperva (acquired by Thoma Bravo for $2.1B), AppNexus (acquired by AT&T for up to $2B), Cylance (acquired by BlackBerry for $1.4B), Datorama (acquired by Salesforce for $800M), Treasure Data (acquired by Arm for $600M), Attunity (acquired post-IPO by Qlik for $560M), Dynamic Yield (acquired by McDonald’s for $300M), and Figure Eight (acquired by Appen for $300M).
Notably, there has been a wave of consolidation in business intelligence in just the last quarter: Tableau (acquired by Salesforce for $15.7B), Looker (acquired by Google for $2.6B), Periscope Data (acquired by Sisense for $100M), ClearStory Data (acquired by Alteryx for $20M), and Zoomdata (acquired by Logi Analytics).
Many other companies on the 2018 landscape were acquired for smaller amounts: Alooma (Google), Bonsai (Microsoft), Euclid Analytics (WeWork), Sailthru (Campaign Monitor), Data Artisans (Alibaba), GRIDSMART (Cubic), Drawbridge (LinkedIn), Citus Data (Microsoft), Quandl (NASDAQ), Connotate (import.io), Datafox (Oracle), Market Track (Vista Equity Partners), Lattice Engines (Dun & Bradstreet), Blue Yonder (JDA Software), SimpleReach (Nativo).
Also worth noting, the AI acqui-hire by large Internet companies, a fixture of 2016-2017, is not completely dead: Twitter acquired Fabula AI to strengthen its machine learning expertise, for example.
On the investment front, Big Data and AI startups continued to see big financing rounds. Investments in China were not quite as oversized as last year, when there were multiple companies that raised over a billion dollars. Chinese companies that raised large rounds this year included facial recognition company Face++ ($750M Series D), AI chip maker Horizon Robotics ($600M Series B), fleet management company G7 ($320M Series F), and online tutoring platform Yuanfudao ($300M Series F).
In the US, huge investments went into autonomous vehicle companies, including Cruise ($1.9B across 2 rounds in 2018 and 2019), Nuro ($940M Series B), and Aurora ($600M Series B). RPA companies also saw massive rounds: UiPath ($800M across 2 rounds in 2018 and 2019) and Automation Anywhere ($550M across 2 rounds in 2018).
Other major rounds for US companies on the landscape include Verily Life Sciences ($1B private equity round), Cambridge Mobile Telematics ($500M), Clover Health ($500M Series E), Veeam Software ($500M), Snowflake Computing ($450M Series F), Compass ($400M Series F), Zymergen ($400M Series C), Dataminr ($392M Series E), Lemonade ($400M Series D), Rubrik ($260M Series E), Databricks ($250M Series E), and MediaMath ($225M Series D).
End of Part I.