In a tech startup industry that loves its shiny new objects, the term “Big Data” is in the unenviable position of sounding increasingly “3 years ago”. While Hadoop was created in 2006, interest in the concept of “Big Data” reached fever pitch sometime between 2011 and 2014. This was the period when, at least in the press and on industry panels, Big Data was the new “black”, “gold” or “oil”. However, at least in my conversations with people in the industry, there’s an increasing sense of having reached some kind of plateau. 2015 was probably the year when the cool kids in the data world (to the extent there is such a thing) moved on to obsessing over AI and its many related concepts and flavors: machine intelligence, deep learning, etc.
Beyond semantics and the inevitable hype cycle, our fourth annual “Big Data Landscape” (scroll down) is a great opportunity to take a step back, reflect on what’s happened over the last year or so and ponder the future of this industry.
In 2016, is Big Data still a “thing”? Let’s dig in.
In the furiously competitive world of tech startups, where good entrepreneurs tend to think of comparable ideas around the same time and “hot spaces” get crowded quickly with well-funded hopefuls, competitive moats matter more than ever. Ideally, as your startup scales, you want to not only be able to defend yourself against competitors, but actually find it increasingly easy to break away from them, making your business more and more unassailable and leading to a “winner take all” dynamic. This sounds simple enough, but in reality many growing startups, including some well-known ones, experience exactly the reverse (higher customer acquisition costs resulting from increased competition, core technology that gets replicated and improved upon by competitors that started later and learned from your early mistakes, etc.).
While there are various types of competitive moats, such as a powerful brand (Apple) or economies of scale (Oracle), network effects are particularly effective at creating this “winner take all” dynamic, and have been associated with some of the biggest success stories in the history of the Internet industry.
Network effects come in different flavors, and today I want to talk about a specific type that has been very much at the core of my personal investment thesis as a VC, resulting from my profound interest in the world of data and machine learning: data network effects.
A few days ago, I was invited to speak at a Yale Entrepreneurship Breakfast about one of my favorite areas of interest, Artificial Intelligence. Here are the slides from the talk, a primer on how AI rose from the ashes to become a fascinating category for startup founders and venture capitalists. Very much a companion to my earlier post about our investment in x.ai. Many thanks to my colleague Jim Hao, who worked with me on this presentation.
AI is experiencing an astounding resurrection. After so many broken promises, the term “artificial intelligence” had become almost a dirty word in technology circles. The field is now rising from the ashes. Researchers who had been toiling away in semi-obscurity over the last few decades have suddenly become superstars and have been aggressively recruited by the largest Internet companies: Yann LeCun (see his recent talk at our Data Driven NYC event here) by Facebook; Geoff Hinton by Google; Andrew Ng by Baidu. Google spent over $400 million to acquire DeepMind, a secretive two-year-old UK AI startup. The press and social media are awash with thoughts on AI. Elon Musk cautions us against its perils.
What’s different this time? As Irving Wladawsky-Berger pointed out in a Wall Street Journal article, “a different AI paradigm emerged. Instead of trying to program computers to act intelligently–an approach that hadn’t worked because we don’t really know what intelligence is– AI now embraced a statistical, brute force approach based on analyzing vast amounts of information with powerful computers and sophisticated algorithms.” In other words, the resurgence of AI is partly a child of Big Data, as better algorithms (in particular, what’s known as “deep learning”, pioneered by LeCun and others) have been enabled by larger than ever datasets and the ability to process those datasets at scale at reasonable cost.
It’s been almost two years since I took a first stab at charting the booming Big Data ecosystem, and it’s been a period of incredible activity in the space. An updated chart was long overdue, and here it is:
A few thoughts on this revised chart, and the Big Data market in general, largely from a VC perspective:
In the eyes of some entrepreneurs and venture capitalists, the Bloomberg terminal is a bit of an anomaly, perhaps even an anachronism. In the era of free information on the Internet and open source Big Data tools, here’s a business that makes billions every year charging its users to access data that it generally obtains from third parties, as well as the tools to analyze it. You’ll hear the occasional jab at its interface as reminiscent of the 1980s. And at a time of accelerating “unbundling” across many industries, including financial services, the Bloomberg terminal is the ultimate “bundling” play: one product, one price, which means that the average user uses only a small percentage of the terminal’s 30,000+ functions. Yet, 320,000 people around the world pay about $20,000 a year to use it.
The field of bioinformatics is having its “big bang” moment. Of course, bioinformatics is not a new discipline and it has seen various waves of innovations since the 1970s and 1980s, with its fair share of both exciting moments and disappointments (particularly in terms of linking DNA analysis to clinical outcomes). But there is something special happening to the industry right now, accelerated by several factors:
Thomson Reuters CTO James Powell runs a great series of podcasts where he interviews people in the technology world about topics of relevance to his organization. I was fortunate to be invited to speak with James about the Internet of Things and Big Data, and it was a lot of fun. Below is the podcast, uploaded on SoundCloud. Thanks to James Powell and Dan Cost for the opportunity.
1) A little while ago, I changed the name of the data event I’ve been organizing from “NYC Data Business Meetup” to “Data Driven NYC”. I originally started the event mostly as an experiment, and didn’t give much thought to branding (so yeah, that was a terrible name). The event has now grown quite a bit (over 3,700 members as I write this), so it was time for a better name; also at this stage, it feels more like a community than “just” a meetup, so I wanted a name that reflected this reality.
2) Back in June, I launched a new community called “Hardwired NYC”. It covers startups, technologies and products at the intersection of the physical and digital worlds, including topics like 3D printing, Internet of Things, wearable computing, etc. I developed a strong interest in those areas through my involvement in the Big Data world – the Internet of Things, in particular, is deeply intertwined with Big Data (the proliferation of sensors has been contributing to the Big Data “problem”; equally the Internet of Things will be highly dependent on Big Data technologies if it is to deliver on its promise).
3) As Hardwired NYC is taking off fast (more than 700 members after just two events), I figured that both events/communities should have their own website with full video libraries, including for people who don’t live in New York and are interested in the content. So, with the great help of my FirstMark colleague Dan Kozikowski, this week I’m launching www.datadrivennyc.com and www.hardwirednyc.com. Both sites have a “Watch” section where, from now on, I will post pictures and videos of events (as opposed to this blog).
A few weeks ago, I was invited to do a couple of guest lectures at NYU (as part of the excellent “Ready, Fire, Aim” entrepreneurship class that Lawrence Lenihan, now my partner at FirstMark, has been doing for a while there) and at The New School (as part of a Big Data course organized by Debra Anderson and Greta Knutzen). Thought I’d share the slide deck I had prepared for those classes. Very much a Big Data 101 class for a college-level audience that had had little or no exposure to the key concepts prior to the class.
Our February NYC Data Business Meetup was focused on the intersection of data and finance (both market and consumer finance). Quantopian, Plaid and ZestFinance presented.
We also had a great panel presenting the customer perspective on Big Data (hype vs. reality), from the viewpoint of financial institutions, with the following speakers: Mike Simone (Global Head of CitiData Platform Engineering), Emile Werr (Head of Enterprise Data Architecture, NYSE Euronext) and Raj Patil (up until recently Data innovation CTO at UBS, now an entrepreneur). Unfortunately, due to standard policy at some of those institutions, we can’t publicly post the video of the panel.
The December NYC Data Business Meetup was focused on big data infrastructure companies, with the co-founders of Sqrrl, Infochimps and MemSQL presenting to a full house. We started the evening with a presentation by prominent data scientist Joseph Turian.
The November NYC Data Business Meetup was focused on “vertical-specific” applications of big data – startups leveraging the big data stack to offer new solutions to specific industries, such as finance and government (Recorded Future), the legal industry (Lex Machina), energy (DataMarket, although it offers data sets for other industries as well) and sports (numberFire).