A chart of the big data ecosystem

My colleague Shivon Zilis has been obsessed with the Terry Kawaja chart of the advertising ecosystem for a while, and a few weeks ago she came up with the great idea of creating a similar one for the big data ecosystem.  Initially, we were going to do this as an internal exercise to make sure we understood every part of the ecosystem, but we figured it would be fun to “open source” the project and get people’s thoughts and input.

So here is our first attempt.

A few things became apparent very quickly:

1) Many companies don’t fall neatly into a specific category

2) There are only so many companies we can fit on the chart. Subcategories such as NoSQL or advertising applications, for example, would almost deserve their own chart.

3) The ecosystem is evolving so quickly that we’re going to need to update the chart often: companies evolve (e.g., Infochimps), and large vendors make aggressive moves in the space (VMware with Serengeti and the Cetas acquisition).

What do you think?

52 thoughts on “A chart of the big data ecosystem”

  1. Yes! That was badly needed! I would add SAP to the cross infrastructure/analytics category (in this context, especially because of their HANA solution: real-time, big data).

    1. HANA isn’t truly a Big Data offering, since it is in-memory and limited to only 1TB as a result.

  2. Upon first glance, you may consider adding Pervasive Software, Cirro, and Kitenga to Analytics Solutions, FeedZai and ParStream to Real-Time, IBM Infosphere BigInsights and Greenplum HD/MR to Hadoop Related, Actuate and Quantum 4D to Data Visualization. Will suggest more later.


  3. Hi Matt & Shivon, Dave Feinleib did something similar for Forbes recently (http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/), but yours is far more comprehensive. Well done. Two things:
    1) I found Todd P’s breakdown of the Big Data Landscape quite interesting: Infrastructure/Plumbing, Dev/Mgmt Tools, Analytics & Apps. Some of the Mgmt Tools are under Infrastructure in your schema.
    2) Search or Information Access seems to be missing. We hope you’ll add Q-Sensei in that box. Thanks!

    1. 1) Ah, that’s true, Todd Papaioannou did come up with that breakdown… mmm, let’s see if we can fit that in, space-wise.
      2) As to search, who else would you put in that category, that’s specific enough to Big Data? Elastic Search?
      As to the Forbes chart, yes, I know… we had been working on this for weeks on and off, but Dave beat us to it!

  4. Great start to the ecosystem. I would add the following: cross-channel marketing providers like Acxiom, Epsilon, Experian, Responsys, CheetahMail, ExactTarget, Alterian, etc. They store marketing data such as transactional, loyalty, web, and social data. The data is modeled and used to execute marketing programs. I would also include DMPs (BlueKai, Aggregate Knowledge, Turn, etc.) and web analytics (Adobe, IBM/Coremetrics, etc.).

      1. Definitely data sources. They also build and host pretty large databases for B2C marketing companies so they could also fall under Applications/Marketing.

  5. Good stuff — charts like these are immensely helpful even if you sometimes can’t fit everyone in their right place. I know I swear by the Lumascape (and it sometimes haunts my dreams).

    You’re missing SAS in the analytics, publisher tools (with the aiMatch acquisition), and cross infrastructure categories. SAS rolled out high performance analytics and visual analytics for exploration of big data sets, amongst other products.

  6. Thanks for putting this together. With such a broad landscape it’s difficult to capture all the key players. MarkLogic is missing from the infrastructure group. We’re an enterprise software company powering over 500 of the world’s most critical Big Data Applications.

    1. Thanks Denise, yes, that’s an oversight – where would you put MarkLogic, though? NoSQL? But it existed long before NoSQL companies appeared, right?

      1. No worries, with so many players having recently entered the Big Data Landscape it’s gotten to be a very crowded sector, as your chart clearly shows. You are correct that MarkLogic was a NoSQL database solving Big Data issues for clients long before the term was popular.

  7. Hi Matt,
    Thanks to BV, Shivon and you for doing this.
    Companies I don’t see that come to mind (some of these might actually be a big, maybe huge, stretch or not fit your wiser criteria) are:

    Magnetic – looking to go public just three years out of the blocks
    C3 Metrics – very powerful attribution models cutting through mountains of well accepted myth.
    VisibleMeasures – I can see why VM wouldn’t seem like big data, but video on the internet is big, and very few people actually understand the punch, breadth and impact of VisibleMeasures’ capabilities.
    GE Software’s Silicon Valley Industrial Internet
    MyCityWay – I’m biased to anyone that produces accurate meaningful subway realtime info. They’re improving.
    Ensequence – interactive TV will tip the scales, IMHO
    SAP HANA
    Dtex Systems – when Dtex looks at big data, people get fired.
    Glue Networks
    Lookingglass – these guys looked at big data and found very bad guys hidden within good guy domains

    Best & cheers,

    1. Thanks a lot Sean – not sure if we can fit all of these in the next iteration, but that’s very helpful feedback. There are a couple of companies in there that hadn’t come on my radar.

  8. Also missing, beyond SAP’s HANA DB, is a different subcategory altogether: eDiscovery, or what I deem forensic analytics. This is the ability to data-mine 3 million emails and legal, court, and brief documents in the law industry. It’s changing the way legal discovery is conducted.

    1. Yes, nice one — eDiscovery is definitely big data. The Bloomberg Vault product (compliance/eDiscovery solution) contains… 56 billion emails.

  9. This is great, Matt. Thanks! The only suggestion I had was adding a vertical focus somehow, to indicate the specific industry sectors addressed by these companies.

    1. If you are to answer the Grids for each industry vertical, you must reach out to experts within that sector who already understand the lay of the land. My experience, and my company’s focus, is the Architecture-Engineering-Construction (AEC) industry. There’s a paucity of analytics in the industry, because it’s stuck in the legacy past.

  10. Matt,

    Great landscape. Putting these together is always hard.

    For the MPP Database layer, please add Calpont InfiniDB.

    InfiniDB is a “pure” MPP column-store, so it’s significantly faster and more scalable than most of the other MPP technologies on the slide.

  11. Do you have access to the latest Gartner Magic Quadrants for BI and DWDMS? If not I could give you access. Contact me via email.

  12. While you have Vertica, you are missing a big part of HP’s big data solutions, e.g. Autonomy. http://www.autonomy.com/content/News/Releases/2012/0604a.en.html
    IDOL 10 (Intelligent Data Operating Layer) is a single processing layer that enables organizations to extract meaning and act on all forms of information, including audio, video, social media, email and web content, as well as structured data such as customer transaction logs and machine-based sensor data (http://idol.autonomy.com/). It provides the platform for solutions across Information Management, Information Governance, Web Commerce, Customer Interaction, Optimization and Marketing.

    1. Thanks… that’s one of the challenges of putting this chart together: there are a few companies like Autonomy that were around a number of years before anyone started talking about “big data”, and it’s not that easy to know where to draw the line. Let us figure out how/where we could include Autonomy in the next version. Others have suggested search and/or eDiscovery as missing pieces, maybe that could be an appropriate spot, assuming we can somehow fit all of it in on just one page…

      1. It is more than search/eDiscovery; it really encompasses intelligent information processing to extract meaning from data, automate business processes and achieve whatever business results one can envision. All the “solutions” are really just “packaged” interfaces with business logic to achieve specific business objectives; however, the IDOL platform can be integrated into any information-intensive application or business process to create additional insight and automation. You really need to think of it as an information platform, but unlike other core infrastructure providers, IDOL has connectivity to all repositories (500+) and can actually manage information in place (e.g., leave it in SharePoint or on the Z: drive, but gain insight and automate processes from its existence in those “systems of record”).

  13. Hi Matt, Terracotta should be included in this graphic as well. They are a leading in-memory data solution (just acquired by Software AG) and would fit in the cross infrastructure/analytics category. We are the only leading in-memory data management solution that can scale linearly to terabytes of capacity with predictable low latency.

      1. Hey Matt, Thanks for all the work and responses to all the folks who are weighing in… Just wanted to make sure that you reference Terracotta — not Teradata 🙂 This is getting to be a big, deep exercise!
