Dataiku’s Series E: Ushering the Era of Everyday AI

Today, Dataiku is announcing a major new financing – a total of $400M at a $4.6B valuation, led by Tiger Global (which had also invested in the company’s Series D), alongside a great group of existing and new investors.

While financings are ultimately just milestones, this is certainly a testament to the remarkable progress the company has been making towards becoming a major global software player, as it has scaled to hundreds of customers around the world and some 750 employees (and yes, hiring a lot more).

Beyond the headlines and high-fives, what is the story? Here’s a quick industry backgrounder and reminder for anyone new to the company.

A huge part of the data world has historically focused on business intelligence, with both established players (Tableau, Microsoft’s Power BI, Google Looker) and newer players (SiSense, Mode, etc.). Business intelligence tools enable you to analyze the past and the present of your business: “which region performed best last quarter?”, “who are our best salespeople?” etc. This is sometimes referred to as descriptive analytics.

Dataiku is a leader in another part of the data world, which different people call different names: data science, enterprise AI (for artificial intelligence), enterprise machine learning. Beyond the semantics, the core idea is to make it possible to answer questions about the future of your business, based on the analysis of historical data: “which customers are most likely to buy this product?”, “which customers are most likely to churn?”, “which transaction is most likely to be fraudulent?”, “which region is most likely to show strong demand this month?”. This area is sometimes referred to as predictive analytics.
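To make the descriptive versus predictive distinction concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the records, the field layout and the function names are hypothetical, and the "model" is just a historical churn rate per plan, not how Dataiku or any real product works.

```python
from collections import defaultdict

# Hypothetical historical records: (customer_id, region, plan, revenue, churned)
HISTORY = [
    ("c1", "EMEA", "basic", 120.0, True),
    ("c2", "EMEA", "pro", 480.0, False),
    ("c3", "AMER", "basic", 100.0, True),
    ("c4", "AMER", "pro", 530.0, False),
    ("c5", "AMER", "basic", 90.0, False),
    ("c6", "APAC", "pro", 410.0, False),
]


def best_region(history):
    """Descriptive: which region generated the most revenue in the past?"""
    totals = defaultdict(float)
    for _, region, _, revenue, _ in history:
        totals[region] += revenue
    return max(totals, key=totals.get)


def churn_rate_by_plan(history):
    """Estimate a churn likelihood per plan from past outcomes (a naive stand-in for a model)."""
    counts = defaultdict(lambda: [0, 0])  # plan -> [churned, total]
    for _, _, plan, _, churned in history:
        counts[plan][0] += int(churned)
        counts[plan][1] += 1
    return {plan: churned / total for plan, (churned, total) in counts.items()}


def rank_by_churn_risk(customers, rates):
    """Predictive: order current (customer_id, plan) pairs by estimated churn likelihood."""
    return sorted(customers, key=lambda c: rates.get(c[1], 0.0), reverse=True)


rates = churn_rate_by_plan(HISTORY)
at_risk = rank_by_churn_risk([("n1", "pro"), ("n2", "basic")], rates)
```

The first function only summarizes what already happened. The other two use the same history to score customers whose outcome is not yet known, which is the shift the paragraph above describes.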


Introducing the MAD (ML, AI, Data) Public Company Index

Today, we are previewing a new public market index – the MAD (for machine learning, AI and data) index.

Readers of this blog know that we have been tracking the data ecosystem since 2012, through annual landscapes (see the 2020 Data & AI Landscape).

Over the last few years, a funny thing happened – some of the small startups we had started tracking grew up, did an IPO and became large public companies.

Not so long ago, public market investors used to say there was no good way of “playing” the Big Data and AI trends, due to the lack of public companies in the space. This is less true today.

However, there isn’t much out there in terms of looking at those public companies as a group. For example, see this Seeking Alpha piece, Top 3 Artificial Intelligence ETFs To Consider, where none of the companies listed are actually AI companies.

Hence the idea of the MAD Index. It’s still a small group of companies, but my colleague John Wu and I were curious to see how they fared in public markets, now and going forward.

This is just a start. We anticipate that a number of companies will join this group in the next year or two, and we’re excited to see how this index matures.


Data Observability and Pipelines: OpenLineage and Marquez

There’s an inherent tension at the heart of modern data infrastructure. On the one hand, it’s becoming more mission-critical every day, as companies around the world rely on it to run their business. On the other hand, it’s more complex, and potentially brittle, than ever, an “assembly chain” involving multiple tools and repositories.

This tension has led to the emergence of DataOps as a distinct and very active segment. One particularly important area is known as “data lineage”. The concept is basically to monitor data pipelines and understand the journey of data through its various transformations and usages. This makes it possible to fix any issues that happen along the way, and go to the root of data quality, and potentially fairness, issues.
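As a rough illustration of the lineage idea, the sketch below records which inputs each pipeline job read and which output it wrote, then walks the graph upstream from a suspect dataset. The class, method and dataset names are all hypothetical. This is not the OpenLineage specification or the Marquez API, just a minimal in-memory model of the concept.

```python
from collections import defaultdict


class LineageGraph:
    """Minimal in-memory lineage store (hypothetical, for illustration only)."""

    def __init__(self):
        self.parents = defaultdict(set)  # dataset -> direct upstream datasets
        self.produced_by = {}            # dataset -> job that wrote it

    def record_run(self, job, inputs, output):
        """Register that `job` read `inputs` and wrote `output`."""
        self.produced_by[output] = job
        self.parents[output].update(inputs)

    def upstream(self, dataset):
        """Return every dataset that `dataset` depends on, directly or indirectly."""
        seen, stack = set(), [dataset]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def root_sources(self, dataset):
        """Upstream datasets that no recorded job produced, i.e. raw inputs."""
        return {d for d in self.upstream(dataset) if d not in self.produced_by}


graph = LineageGraph()
graph.record_run("clean_orders", ["raw_orders"], "orders_clean")
graph.record_run("enrich_orders", ["orders_clean", "raw_customers"], "orders_enriched")
graph.record_run("build_report", ["orders_enriched"], "revenue_report")

suspects = graph.upstream("revenue_report")
roots = graph.root_sources("revenue_report")
```

If `revenue_report` looks wrong, `suspects` lists every dataset worth checking and `roots` narrows that to the raw sources, which is the "go to the root" step described above.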

Because data lineage involves many different tools, platforms and companies, it makes sense for those different parts of the ecosystem to collaborate around standard definitions. This is the concept behind OpenLineage, a cross-industry effort involving creators and contributors from key data projects (DBT, Spark, Pandas, etc.), gathered together at the initiative of the founders of Datakin, an SF startup behind the open source data lineage project Marquez (originally started at WeWork).

At our most recent Data Driven NYC, we had the pleasure of hosting Julien Le Dem, CTO of Datakin. His talk (video below) is very approachable and educational.


Resilience and Vibrancy: The 2020 Data & AI Landscape


In a year like no other in recent memory, the data ecosystem is showing not just remarkable resilience but exciting vibrancy.

When COVID hit the world a few months ago, an extended period of gloom seemed all but inevitable.   Yet, as per Satya Nadella, “two years of digital transformation [occurred] in two months”.  Cloud and data technologies (data infrastructure, machine learning / artificial intelligence, data driven applications) are at the heart of digital transformation.  As a result, many companies in the data ecosystem have not just survived, but in fact thrived, in an otherwise overall challenging political and economic context. 

Perhaps most emblematic of this is the blockbuster IPO of Snowflake, a data warehouse provider, which took place a couple of weeks ago and catapulted Snowflake to a $69B market cap company, at the time of writing – the biggest software IPO ever (see our S-1 teardown). And Palantir, an often controversial data analytics platform focused on the financial and government sectors, became a public company via direct listing, reaching a market cap of $22B, at the time of writing (see our S-1 teardown).


Building a $12B Public Company: In Conversation with Olivier Pomel, CEO, Datadog

By any measure, Datadog is an incredible entrepreneurial success story. The company went from a tiny startup in 2010 that had trouble raising money, to a public company that, at the time of writing, has a market capitalization of $12.5B. It was a pioneer in the category of DevOps and observability, and it’s now a clear leader. With revenues hovering around $350M, it has 1,300 employees across 31 locations around the world.

Perhaps improbably, the founders built the company out of New York, which many people over the years have thought of as a hub for adtech, media and commerce startups only. Along the way, they faced a lot of skepticism: “Whenever we pitched West Coast investors it was sort of seen as a form of mental deficiency to be based in New York and doing infrastructure”, says Olivier. I wrote a few months ago about the significance of the Datadog IPO for the ecosystem and beyond. Ironically, out of the three top public tech companies in New York today, two are infrastructure software companies (Datadog and MongoDB).

Not one for gratuitous self-aggrandizing, Olivier has given surprisingly few interviews over the years, and it was a real treat to sit down with him for a fireside chat in front of a packed house of 350 attendees at our most recent Data Driven NYC.

We had an in-depth conversation and covered a lot of topics.

The first half of our conversation was focused on Datadog itself, starting with a high level overview of the observability and DevOps space to make the discussion approachable by people who don’t know the space.

The second half of the conversation was focused on all sorts of lessons learned along the way of building a major company: sales, marketing, fundraising, etc.

Below is the video. We have also provided a full written transcript to make the content easy to scan through (many thanks to Karissa Domondon for her help with this).


The Power of Open Source: In conversation with Mike Volpi, General Partner, Index Ventures

Our most recent VC guest at Data Driven NYC, Mike Volpi of Index, has had a pretty amazing last couple of years, with three of his venture investments going public:  Zuora, Sonos and Elastic. 

Before becoming a VC, Mike ran Cisco’s routing business where he managed a P&L in excess of $10 billion in revenues, and acquired over 70 companies (note: probably a pretty good way to make a lot of friends in Silicon Valley).

A partner at Index Ventures in San Francisco, Mike invests primarily in infrastructure, open-source and artificial intelligence companies, so he was a perfect guest to have at the event.  In particular, he invested in two prior presenting companies: Confluent and Cockroach Labs (in which FirstMark is also an investor). 

We had a really interesting conversation about open source, AI and venture capital. Here’s the video below, and I have jotted down a few notes as well, below the fold.

Notes from the chat:


AI’s Trust Problem: In Conversation with Gary Marcus (Video + Book Notes)

Should we be worried about the prospect of AI superintelligence taking over the world?

“In the real world, current-day robots struggle to turn doorknobs, and Teslas driven in ‘Autopilot’ mode keep rear-ending parked emergency vehicles […].   It’s as if people in the fourteenth century were worrying about traffic accidents, where good hygiene might have been a whole lot more helpful”.

This is one of my favorite quotes from “Rebooting AI: Building Artificial Intelligence We Can Trust,” a new book by Gary Marcus – scientist, NYU professor, New York Times bestselling author, entrepreneur – and his co-author Ernest Davis, Professor of Computer Science at the Courant Institute, NYU.

Gary did us a big honor recently: he chose to speak at Data Driven NYC on the evening of the publication of the book.  He also signed a few copies. Our first book launch party!

Particularly if you’re trying to make sense of the still-ongoing hype around AI, including predictions of global gloom, Gary’s book is a fantastic read: a lucid, no-nonsense and occasionally provocative take on the current state of AI, that distills complex concepts into simple ideas, and includes plenty of interesting and often funny anecdotes.

The book builds on Gary’s earlier assessment of deep learning (see Deep Learning: A Critical Appraisal), and advocates for a hybrid approach to AI.

Below is the video of his talk at the event, plus a few notes I derived from both the talk and the book. I’ll keep those brief as the book is worth reading in its entirety.


Part II: Major Trends in the 2019 Data & AI Landscape

Part I of the 2019 Data & AI Landscape covered issues around the societal impact of data and AI, and included the landscape chart itself. In this Part II, we’re going to dive into some of the main industry trends in data and AI. 

The data and AI ecosystem continues to be one of the most exciting areas of technology. Not only does it have its own explosive momentum, but it also powers and accelerates innovation in many other areas (consumer applications, gaming, transportation, etc).  As such, its overall impact is immense, and goes much beyond the technical discussions below.

Of course, no meaningful trend unfolds over the course of just one year, and many of the following have been years in the making. We’ll focus the discussion on trends that we have seen particularly accelerating in 2019, or gaining rapid prominence in industry conversations.

We will loosely follow the order of the landscape, from left to right: infrastructure, analytics and applications.


A Turbulent Year: The 2019 Data & AI Landscape

It has been another intense year in the world of data, full of excitement but also complexity. 

As more of the world gets online, the “datafication” of everything continues to accelerate.  This mega-trend keeps gathering steam, powered by the intersection of separate advances in infrastructure, cloud computing, artificial intelligence, open source and the overall digitalization of our economies and lives. 

A few years ago, the discussion around “Big Data” was mostly a technical one, centered around the emergence of a new generation of tools to collect, process and analyze massive amounts of data. Many of those technologies are now well understood, and deployed at scale. In addition, over the last couple of years in particular, we’ve started adding layers of intelligence through data science, machine learning and AI into many applications, which are now increasingly running in production in all sorts of consumer and B2B products.  

As those technologies continue to both improve and spread beyond the initial group of early adopters (FAANG and startups) into the broader economy and world, the discussion is shifting from the purely technical into a necessary conversation around impact on our economies, societies and lives.

We’re just starting to truly get a sense of the nature of the disruption ahead. In a world where data-driven automation becomes the rule (automated products, automated cars, automated enterprises), what is the new nature of work? How do we handle the social impact? How do we think about privacy, security, freedom? 

Meanwhile, the underlying technologies continue to evolve at a rapid pace, with an ever vibrant ecosystem of startups, products and projects, heralding perhaps even more profound changes ahead. In that ecosystem, the year was characterized by the early innings of a long expected consolidation, and perhaps a changing of the guard from one era to another as early technologies are starting to give way to the next generation.


Data, AI & Hedge Funds: In Conversation with Matt Ober, Chief Data Scientist at Third Point

 

The hedge fund world has been evolving dramatically over the last few years.

Just like in other industries, software, data and AI/ML have been playing an increasingly important, and disruptive, role.  Many hedge funds have been scrambling to embrace this evolution – not just to gain an edge, but also to avoid becoming extinct.

Certainly, quantitative hedge funds have been making heavy use of software and data for a while now.  The “quant” funds rely upon algorithmic or systematic strategies for their trades – meaning that they generally employ  automated trading rules rather than discretionary (human) ones, and they will trade tens or hundreds of assets simultaneously.

But another big part of the industry, the “fundamental” hedge funds, had been operating very differently. Those funds will perform a bottom-up analysis on individual securities to value them in the marketplace and assess whether they are “undervalued” or “overvalued” assets. They’ll often have a much more concentrated portfolio.

In part because the entire hedge fund industry has been performing generally poorly recently (years of performance trailing the stock market), there’s been mounting pressure on hedge funds to evolve rapidly, particularly fundamental ones.

A couple of years ago, Third Point made a big splash when they hired Matt Ober, who was 32 at the time, to become their Chief Data Scientist. Dan Loeb, the billionaire founder of Third Point, was a prime example of a fund manager who had reached tremendous success through a fundamental approach. His efforts to hire Matt away from his previous employer and make him Third Point’s head quant were widely viewed as a sign of the times.

AI & Blockchain: An Introduction

 

At the kind invitation of Rob May and the Botchain team, I had the opportunity recently to keynote Brains and Chains, an interesting conference in New York exploring  the intersection of artificial intelligence and blockchain.

This is both an exciting and challenging topic, and the goal of my talk was to provide a broad introduction to kick things off, and frame the discussion for the rest of the day: discuss why the topic matters in the first place, and highlight the work of some interesting companies in the space.

Below is the presentation, with some added commentary when relevant. Scroll to the very bottom for a SlideShare widget, if you’d like to flip through the slides.


Great Power, Great Responsibility: The 2018 Big Data & AI Landscape

 

It’s been an exciting but complex year in the data world.

Just as last year, the data tech ecosystem has continued to “fire on all cylinders”.  If nothing else, data is probably even more front and center in 2018, in both business and personal conversations.  Some of the reasons, however, have changed.

On the one hand, data technologies (Big Data, data science, machine learning, AI) continue their march forward, becoming ever more efficient, and also more widely adopted in businesses around the world. It is no accident that one of the key themes in the corporate world in 2018 so far has been “digital transformation”. The term may feel quaint to some (“isn’t that what’s been happening for the last 25 years?”), but it reflects that many of the more traditional industries and companies are now fully engaged in their journey to become truly data-driven.

On the other hand, a much broader cross-section of the public has become aware of the pitfalls of data. Whether it is through the very public debate over the risks of AI, the Cambridge Analytica scandal, the massive Equifax data breach, GDPR-related privacy discussions or reports of growing government surveillance in China, the data world has started revealing some darker, scarier undertones.


Frontier AI: How far are we from artificial “general” intelligence, really?

Some call it “strong” AI, others “real” AI, “true” AI or artificial “general” intelligence (AGI)… whatever the term (and important nuances), there are few questions of greater importance than whether we are collectively in the process of developing generalized AI that can truly think like a human — possibly even at a superhuman intelligence level, with unpredictable, uncontrollable consequences.


Firing on All Cylinders: The 2017 Big Data Landscape

 

It feels good to be a data geek in 2017.

Last year, we asked “Is Big Data Still a Thing?”, observing that since Big Data is largely “plumbing”, it has been subject to enterprise adoption cycles that are much slower than the hype cycle.  As a result, it took several years for Big Data to evolve from cool new technologies to core enterprise systems actually deployed in production.

In 2017, we’re now well into this deployment phase. The term “Big Data” continues to gradually fade away, but the Big Data space itself is booming. Everywhere, we’re seeing anecdotal evidence pointing to more mature products, more substantial adoption in Fortune 1000 companies, and rapid revenue growth for many startups.

Meanwhile, the froth has indisputably moved to the machine learning and artificial intelligence side of the ecosystem. In the last few months, AI experienced a “Big Bang” in collective consciousness not entirely dissimilar to the excitement around Big Data a few years ago, except with even more velocity.

2017 is also shaping up to be an exciting year from another perspective: long-awaited IPOs.  The first few months of this year have seen a burst of activity for Big Data startups on that front, with warm reception from the public markets.

All in all, in 2017 the data ecosystem is firing on all cylinders.  As every year, we’ll use the annual revision of our Big Data Landscape to do a long-form, “State of the Union” roundup of the key trends we’re seeing in the industry.

Let’s dig in.


The New Gold Rush? Wall Street Wants your Data

 


 
A few months ago, Foursquare achieved an impressive feat by predicting, ahead of official company results, that Chipotle’s Q1 2016 sales would be down nearly 30%. Because it captures geo-location data from both check-ins and visits through its apps, Foursquare was able to extrapolate foot-traffic stats that turned out to be very accurate predictors of financial performance.
 
That a social media company could be building a data asset of immense value to Wall Street is part of an accelerating trend known as “alternative data”. As just about everything in our lives is getting sensed and captured by technology, financial services firms have been turning their attention to startups, with the hope of mining their data to extract the type of gold nuggets that will enable them to beat the market.
 
Could working with Wall Street be a business model for you?
 
The opportunity is open to a wide range of startups.  Many tech companies these days generate an interesting “data exhaust” as a by-product of their core activity.  If your company offers a payment solution, you may have interesting data on what people buy. A mobile app may accumulate geo-location data on where people shop or how often they go to the movies.  A connected health device may know who gets sick when and where.  A commerce company may have data on trends and consumer preferences. A SaaS provider may know what corporations purchase, or how many employees they hire, in which region. And so on and so forth.
 
At the same time, this is a tricky topic, with a lot of misunderstandings. The hedge fund world is very different from the startup world, and a lot gets lost in translation. Rumors about hedge funds paying “millions” for data sets abound, which has created a distorted perception of the size of the financial opportunity. A fair number of startups I speak with do incorporate the idea of selling data to Wall Street into their business plan and VC pitches, but how that would work exactly remains generally very fuzzy.
 
If you’re one of the many startups sitting on a growing data asset and trying to figure out whether you can make money selling it to Wall Street, this post is for you: a deep dive to provide context, clarify concepts and offer some practical tips.
 
