Note: This post appeared on VentureBeat, here.
It’s been almost two years since I took a first stab at charting the booming Big Data ecosystem, and it’s been a period of incredible activity in the space. An updated chart was long overdue, and here it is:
A few thoughts on this revised chart, and the Big Data market in general, largely from a VC perspective:
The field of bioinformatics is having its “big bang” moment. Of course, bioinformatics is not a new discipline and it has seen various waves of innovations since the 1970s and 1980s, with its fair share of both exciting moments and disappointments (particularly in terms of linking DNA analysis to clinical outcomes). But there is something special happening to the industry right now, accelerated by several factors:
Thomson Reuters CTO James Powell runs a great series of podcasts where he interviews people in the technology world about topics of relevance to his organization. I was fortunate to be invited to speak with James about the Internet of Things and Big Data, and it was a lot of fun. Below is the podcast, uploaded on SoundCloud. Thanks to James Powell and Dan Cost for the opportunity.
Some updates on the event/community front:
1) A little while ago, I changed the name of the data event I’ve been organizing from “NYC Data Business Meetup” to “Data Driven NYC”. I originally started the event mostly as an experiment, and didn’t give much thought to branding (so yeah, that was a terrible name). The event has now grown quite a bit (over 3,700 members as I write this), so it was time for a better name; also, at this stage, it feels more like a community than “just” a meetup, so I wanted a name that reflected this reality.
2) Back in June, I launched a new community called “Hardwired NYC”. It covers startups, technologies and products at the intersection of the physical and digital worlds, including topics like 3D printing, Internet of Things, wearable computing, etc. I developed a strong interest in those areas through my involvement in the Big Data world – the Internet of Things, in particular, is deeply intertwined with Big Data (the proliferation of sensors has been contributing to the Big Data “problem”; equally the Internet of Things will be highly dependent on Big Data technologies if it is to deliver on its promise).
3) As Hardwired NYC is taking off fast (more than 700 members after just two events), I figured that both events/communities should have their own website with full video libraries, including for people who don’t live in New York and are interested in the content. So, with the great help of my FirstMark colleague Dan Kozikowski, I’m launching www.datadrivennyc.com and www.hardwirednyc.com this week. Both sites have a “Watch” section where, from now on, I will post pictures and videos of events (as opposed to this blog).
A few weeks ago, I was invited to do a couple of guest lectures at NYU (as part of the excellent “Ready, Fire, Aim” entrepreneurship class that Lawrence Lenihan, now my partner at FirstMark, has been doing for a while there) and at The New School (as part of a Big Data course organized by Debra Anderson and Greta Knutzen). Thought I’d share the slide deck I had prepared for those classes. Very much a Big Data 101 class for a college-level audience that had had little or no exposure to the key concepts prior to the class.
Our February NYC Data Business Meetup was focused on the intersection of data and finance (both market and consumer finance). Quantopian, Plaid and ZestFinance presented.
We also had a great panel presenting the customer perspective on Big Data (hype vs. reality), from the viewpoint of financial institutions, with the following speakers: Mike Simone (Global Head of CitiData Platform Engineering), Emile Werr (Head of Enterprise Data Architecture, NYSE EuroNext) and Raj Patil (up until recently Data innovation CTO at UBS, now an entrepreneur). Unfortunately, due to standard policy at some of those institutions, we can’t publicly post the video of the panel.
Here are the videos, in order of appearance:
Bloomberg App Portal:
The December NYC Data Business Meetup was focused on big data infrastructure companies, with the co-founders of Sqrrl, Infochimps and MemSQL presenting to a full house. We started the evening with a presentation by prominent data scientist Joseph Turian.
Here are the videos:
Joseph Turian, “How to do AI in 2013”
Oren A. Falkowitz, Co-Founder & CEO, Sqrrl
Dhruv Bansal, Co-Founder & Chief Science Officer, Infochimps
Eric Frenkiel, Co-Founder & CEO, MemSQL
And here are a few pics (photo credit: Shivon Zilis):
The November NYC Data Business Meetup was focused on “vertical-specific” applications of big data – startups leveraging the big data stack to offer new solutions to specific industries, such as finance and government (Recorded Future), the legal industry (Lex Machina), energy (DataMarket, although it offers data sets for other industries as well) and sports (numberFire).
Here are the videos:
Christopher Ahlberg, CEO, Recorded Future:
Josh Becker, CEO, Lex Machina:
Hjálmar Gíslason, CEO, DataMarket:
Nik Bonaddio, CEO, numberFire:
Here are the videos from the NYC Data Business Meetup that was held on October 23, 2012, in order of appearance:
Jeff Carr, COO, Precog
Max Yankelevich, co-founder, CrowdComputing Systems
Roger Ehrenberg, Founder and Managing Partner, IA Ventures; Ping Li, General Partner, Accel Partners; Matt Ocko, Co-Founder and Partner, Data Collective (from left to right):
So here we are again. My colleague Shivon and I had made a first attempt at making sense of the rapidly evolving big data ecosystem back in June. Based on some very helpful feedback from readers of this blog and others, a number of additional meetings with interesting startups and more in-depth research, we’ve come up with this second version.
- It’s still a work in progress (and will presumably always be, that’s the nature of the beast)
- It’s even more crowded than the first time around, which reflects the incredible vitality of the big data space
- We’ve created some new subcategories such as NoSQL/NewSQL and analytics services (reflecting the reality that, for the time being, the last mile of data analysis is very much performed by humans)
- We have the occasional company that appears in different categories (Infochimps or Autonomy for example)
- We have learned more about companies that were already on the first version of the chart, and have positioned them differently. For example, Metamarkets now falls in the “Cross Infrastructure/Analytics” category as they offer a stack that includes a data store (Druid), predictive analytics and visualization. Another example is Collective[i] – they have built an entire proprietary big data stack from the ground up that includes infrastructure, analytics and applications – making the company a rare example of an “Application Service Provider”.
Our goal is to continue updating this chart from time to time, and perhaps make it evolve visually, as we’ve probably reached the limits of what we can reasonably fit on one slide. It was suggested that we try to visually distinguish on-premise offerings vs. cloud-based solutions, which we may try to do.
Comments, thoughts, questions? Please add to the comments section.
Here are the videos and some pictures (scroll down) of the NYC Data Business Meetup held on September 25, 2012.
In order of appearance:
1) Rick Smolan told us about his fascinating new project, the “Human Face of Big Data” – see the NY Times coverage here: http://nyti.ms/TO5MDd.
2) Mortar (presenter: K Young, CEO). Mortar (www.mortardata.com) provides a platform-as-a-service for Hadoop. They take care of all of the necessary infrastructure (via AWS) and allow any software engineer to run jobs on Hadoop using Apache Pig and Python without special training.
3) Datadog (presenter: Alexis Le Quoc, co-founder). Datadog (www.datadoghq.com) is a service for IT, Operations and Development teams who write and run applications at scale, and want to turn the massive amounts of data produced by their apps, tools and services into actionable insight. Datadog helps software developers and web ops understand their IT Data by putting it all in context.
4) We finished with a fireside chat with Dwight Merriman, CEO and co-founder, 10Gen. 10Gen (www.10gen.com) develops MongoDB, and offers production support, training, and consulting for the open source database. Dwight is one of the original authors of MongoDB. In 1995, Dwight co-founded DoubleClick (acquired by Google for $3.1 billion) and served as its CTO for ten years. Dwight was the architect of the DoubleClick ad serving infrastructure, DART, which serves tens of billions of ads per day. Dwight is co-founder, Chairman, and the original architect of Panther Express (now part of CDNetworks), a content distribution network (CDN) technology that serves hundreds of thousands of objects per second. Dwight is also a co-founder and investor in BusinessInsider.com and Gilt Groupe.
Here are some videos, slides and pics from the most recent NYC Data Business Meetup. The videos are unfortunately not of the greatest quality, but are good enough to watch.
Also, note to self: make sure that our audience of 200+ sits closer to the stage, so that the room doesn’t look tragically empty on camera (rookie mistake)!
In order of appearance:
1) Todd Papaioannou, CEO, Continuuity, a stealth big data startup based in Palo Alto, CA and backed by Andreessen Horowitz, Battery Ventures, Data Collective and a number of high-profile angels. Todd was previously Chief Cloud Architect for Yahoo.
2) Neil Capel, CEO, and Daniel Krasner, Chief Data Scientist, Sailthru, a New York based startup backed by RRE, AOL Ventures, Lerer Ventures, DFJ Gotham, Thrive Capital, Metamorphic, etc. Sailthru provides fully automated, 1:1 email and onsite recommendations using a unique behavioral targeting platform. Sailthru helps brands cut through the clutter and build trust with their customers by recognizing and acting upon their individual interests. Sailthru’s technology creates individual user profiles associated with each person’s email address and online behavior. Sailthru’s algorithms gauge each individual user’s intent and match appropriate content and frequency of email communications such that every email is tailored to the unique user. That means they send as many permutations of an email as there are recipients. All simultaneously, all automated and all in real time.
3) Dennis R. Mortensen, CEO, and Jeroen Janssens, Data Scientist, Visual Revenue, a New York-based startup backed by Lerer Ventures, SV Angel, IA Ventures and Softbank. Visual Revenue increases front page performance for online media organizations. Their platform provides editors with actionable, real-time recommendations on what content to place in what position right now and for how long. Visual Revenue’s predictive analytics technology allows media organizations to proactively manage the cost of exposing a piece of content on a front page, whilst maximizing the return they expect from promoting it.
4) Panel discussion and Q&A with the audience