The State Of Big Data in 2014: a Chart

Note: This post appeared on VentureBeat, here.

It’s been almost two years since I took a first stab at charting the booming Big Data ecosystem, and it’s been a period of incredible activity in the space. An updated chart was long overdue, and here it is:


A few thoughts on this revised chart, and the Big Data market in general, largely from a VC perspective:

Getting crowded: Entrepreneurs have flocked to the space, VCs have poured money into promising startups, and as a result, the market is starting to get crowded. Certain categories like databases (whether NoSQL or NewSQL) or social media analytics feel ripe for consolidation or some sort of shakeout (which may have already started in social analytics with Twitter’s acquisitions of BlueFin and GNIP). While there will always be room for great new startups, it seems that a lot of the early bets in the broader infrastructure and analytics segments have been made at this stage, and the bar for success is getting higher – which doesn’t mean that VC money will stop pouring in. In terms of this specific industry chart, we’ve clearly reached the limit of how many companies we can fit on one page. I’m sure there are a number of great companies we either missed or didn’t have enough space to include – apologies in advance to those, and I’d love to hear people’s thoughts and suggestions in the comments section about who else should be included.

Still early: Overall, we’re still in the early innings of this market. Over the last couple of years, some promising companies failed (for example: Drawn to Scale), a number saw early exits (for example: Precog, Prior Knowledge, Lucky Sort, Rapleaf, Nodeable, Karmasphere, etc.), and a handful saw more meaningful outcomes (for example: Infochimps, Causata, Streambase, ParAccel, Aspera, GNIP, BlueFin Labs, BlueKai). Meanwhile, some companies seem to be reaching significant scale, and have raised spectacular amounts of money (for example, MongoDB has now raised over $230M, Palantir almost $900M and Cloudera $1B). But overall, we’re still early in the curve in terms of successful IPOs (Splunk or Tableau notwithstanding) and large exits, although the big companies are getting more acquisitive in the space (Oracle with BlueKai, IBM with Cloudant). In many segments, startups and large companies are jockeying for position and no obvious leader has emerged.

Hype, meet reality: A few years into a period of incredible hype, is Big Data still a thing? While less press-worthy, the next couple of years are going to be hugely important for this market, as corporations start moving Big Data projects from experimentation to full production. While those deployments will lead to rapidly increasing revenues for some Big Data vendors, they will also test whether Big Data can truly deliver on its promise. Meanwhile, the fundamental need for Big Data technology keeps increasing, as the deluge of data keeps accelerating, powered in part by the rapidly emerging Internet of Things industry.

Infrastructure: Hadoop seems to have solidified its position as the cornerstone of the entire ecosystem, but there are still a number of competing distributions – this will probably need to evolve. Spark, another open source framework that builds on top of the Hadoop Distributed File System, is getting a lot of buzz right now because it promises to fill in the places where Hadoop has been weak, namely interactive speeds and good programming interfaces (and early signs seem to point to fulfilling that promise). Some themes (for example, in-memory or real-time) continue to be top of mind; others are appearing (for example, there’s a whole new generation of data transformation/munging/wrangling tools, including Trifacta, Paxata and DataTamer). Another key discussion is whether enterprise data will truly move to the cloud (public or private), and if so, how quickly. Many will argue that Fortune 500 companies will keep their data (and the software to process it) on premises for years to come; a generation of Hadoop-in-the-cloud startups (Qubole, Mortar, etc.) will argue that all data is moving to the cloud long term.

Analytics: This has been a particularly active segment of the Big Data ecosystem in terms of startup and VC activity. From spreadsheet-type interfaces to timeline animations and 3D visualizations, startups offer all sorts of different analytical tools and interfaces, and the reality is that different customers will have different types of preferences, so there’s probably room for a number of vendors. Go-to-market strategies differ as well – some startups focus on selling tools to data scientists, a group that is still small but growing in numbers and budget. Others adopt the opposite approach and sell automated solutions targeting business users, bypassing data scientists altogether.

Applications: As predicted, the action has been slowly but surely moving to the application layer of Big Data. The chart highlights a number of exciting startups that are fundamentally powered by Big Data tools and techniques (certainly not an exhaustive list). Some offer horizontal applications – for example, Big Data-powered marketing, CRM tools or fraud detection solutions. Others use Big Data in vertical-specific applications. Finance and ad tech were always early leaders in adopting Big Data, years before it was even called Big Data. Gradually, the use of Big Data is spreading to more industries, such as healthcare and biotech (particularly in genomics) or education. This is only the beginning.

Many thanks to my FirstMark colleague Sutian Dong for doing a lot of the heavy lifting on this chart. My former colleague Shivon Zilis of Bloomberg Beta contributed immensely to prior versions of this chart.


15 thoughts on “The State Of Big Data in 2014: a Chart”

  1. Matt,
    This is great and certainly more inclusive than some other ones out there. It would be helpful to add some service providers in the Analytics space to the chart. There are a great many big data/analytics services companies, and they are shaping the adoption of the technology.

    Great effort! Thanks.

  2. Matt, great update. HP Autonomy is missing in Social Analytics (HP Explore, e.g. multi-channel sentiment analysis), Unstructured Data, Location/People/Events, and Machine Learning (HP IDOL, which ingests/indexes unstructured data from multiple data repositories in multiple file formats, and has eduction/entity extraction, geospatial, people profiling, and machine learning through Bayesian and Shannon Information Theory algorithms). Also, depending on how you define Log Analysis, Statistical Computing, and Data Visualization, IDOL has major functions that can play a significant part in all those areas. Finally, one or two areas that you are missing, depending on how you define them, are Video and Audio Analytics (multimedia), which IDOL powers (HP Broadcast Monitoring, HP Surveillance, and IDOL’s audio processing capabilities: speech-to-text, sentiment analysis, speaker identification, etc.).

    1. Ok thanks Bill, will be sure to follow your guidance and put an Autonomy logo in every single category in the next version 🙂 (or maybe just one giant Autonomy logo covering the entire chart)

  3. Hello Matt

    I am doing a Porter’s Five Forces analysis of the Big Data landscape as part of a university seminar, focusing on Infrastructure, Analytics and Applications. Do you have any thoughts on that? How would you rank the profitability of investing in these three areas on a scale from low to high?

    I know your time is precious but it would be great to get some expert insights.

    Kind regards,

  4. Love your work on the Big Data Landscape. Wanted to suggest a different placement for Ayasdi, since we don’t fit in the Data Visualization bucket. It would be more appropriate to put us in a category of “Deep Analytics”, based on the types of deals we’re closing today and the machine learning technology that we offer. This company was founded by a Stanford mathematics professor and PhD students. We offer a new technology known as “Topological Data Analysis”, which leverages a branch of mathematics known as topology that has never been applied to analytic problems previously by any company. I hesitate to categorize us as purely “Machine Learning”, because we provide end-to-end analytic application solutions for healthcare, pharma, energy and financial services that cover everything from data ingest, through deep analysis, to results visualization.

  5. Matt,
    I love what you are doing here and am currently using your big data landscape for research purposes for my internship. I’m trying to wrap my head around the logic you used when creating the landscape, and whether there was a particular organization method you used. Any guidance on how you organized it would be much appreciated. Thank you for your time.

  6. Great research and synthesis of the huge amount of information on the Big Data Landscape. Is there a more comprehensive report that I can purchase?

  7. Love this! A dimension that can be really helpful to some is which vendors cover multiple categories, especially in an integrated fashion. There are massive synergies and economies to be had in a cohesive/federated Big Data solution. For example, Oracle is a very heavy player in every category here, and all components of the stack, from infrastructure to analytics to applications are designed to work together.
