A chart of the big data ecosystem, take 2

So here we are again.  My colleague Shivon and I made a first attempt at making sense of the rapidly evolving big data ecosystem back in June.  Based on some very helpful feedback from readers of this blog and others, a number of additional meetings with interesting startups and more in-depth research, we’ve come up with this second version.

Some thoughts:

  • It’s still a work in progress (and presumably always will be; that’s the nature of the beast)
  • It’s even more crowded than the first time around, which reflects the incredible vitality of the big data space
  • We’ve created some new subcategories such as NoSQL/NewSQL and analytics services (reflecting the reality that, for the time being, the last mile of data analysis is very much performed by humans)
  • We have the occasional company that appears in different categories (Infochimps or Autonomy for example)
  • We have learned more about companies that were already on the first version of the chart, and have positioned them differently.  For example, Metamarkets now falls in the “Cross Infrastructure/Analytics” category, as they offer a stack that includes a data store (Druid), predictive analytics and visualization.  Another example is Collective[i] – they have built an entire proprietary big data stack from the ground up that includes infrastructure, analytics and applications – making the company a rare example of an “Application Service Provider”.

Our goal is to continue updating this chart from time to time, and perhaps make it evolve visually, as we’ve probably reached the limits of what we can reasonably fit on one slide.  It was suggested that we try to visually distinguish on-premise offerings from cloud-based solutions, which we may try to do.

To enlarge, click on the arrows at the bottom right of the chart.

Comments, thoughts, questions? Please add to the comments section.

Comments

  1. Ely says:

    Great chart. Just a few edits. sqrrl should actually be under NoSQL databases. You could also add Apache Accumulo to open source database projects.

  2. Add Tibco Spotfire in the “Data Visualization”? (my favorite tool).

    Excellent chart, though. It’s great to understand the categories and scope of the landscape.

  3. sraspa says:

    Agreed. Great chart. Would love to see IKANOW’s open source platform, Infinit.e on the chart too (www.ikanow.com).

  4. Denise Brown says:

    Thank you for following up on your promise to include MarkLogic. Since June when you published V1 of the chart, we added new tools for faster application development, powerful analytics and visualization widgets for greater insight, plus a lot of other cool enterprise features.

  5. Matt, you did not put Xignite as a Data Source or Marketplace. I hope it is not a subconscious omission ;) We are accumulating vast amounts of exchange data (kinda big) on our platform in the cloud….

  6. Andrea Gallego says:

    Hey, you may also want to throw in IBM MANY EYES…great data visualization tool. Maybe just IBM with an image of eyes, nice and small..could squeeze it in :)

  7. Johan Bager says:

    I love the work you guys have done here. I would love a large hardcopy. Is it worth looking into demand and process to actualize?

  8. Nik says:

    This is quite an ambitious project you have taken on here – kudos!

    As you say, the ecosystem is evolving very quickly. As the infrastructure providers are gaining traction and the solutions are being implemented more widely, there seem to be more companies emerging with great tools that help understand the data and allow users to act upon it.

    I would argue that “Monitoring” falls under that category as well. Would it make more sense to have it under “Analytics” rather than “Infrastructure”? The objective of monitoring is to provide some clarity and insights into the vast amounts of data being collected.

    It’s close to my heart as we provide a monitoring service that leverages big data. Our data store is optimized for time series data and we provide visualization and alert services. It would be great if you could add us (https://librato.com) to the chart.

    Thanks!

  9. Congrats on this work. Excellent chart.
    With respect to Acunu (in Cluster Services), I would put the company both under Analytics Realtime and under NoSQL database (Acunu has a committer and several contributors to Apache Cassandra).

  10. Jodie Gilmore says:

    I would suggest adding Guavus (real-time streaming analytics, http://www.guavus.com) to the Analytics section — maybe under ‘Real-time’.

  11. John Held says:

    Kudos for taking on such an ambitious project. Always harder to decide what NOT to do with these landscapes than it is to decide what *to* do.

    On the infrastructure side, I tend to think about analytic platforms in terms of whether they are appliance databases, cloud or on-premise, if they are column-oriented versus relational, and to distinguish databases very good at something unique (graph databases, or document databases).

    Nonetheless, the only area that really seemed to be missing is real-time stream processing platforms such as StreamBase, Coral8, InfoSphere Streams, etc. that address the V = velocity aspect of “Big Data”

  12. Navin Ganeshan says:

    Great job, just about the most comprehensive I’ve seen. One quibble – Visualization vendors are really two discrete groups that are worth treating separately…
    1) BI/Visualization – dashboarding, charting and other tools for communicating, delivering analytics – such as JackBe, Chart.Io, Birst, Visual.Ly etc
    2) Visual Analytics – tools for visual pattern-based discovery – Ayasdi, Centrifuge, ClearStory etc

    thanks!

  13. Jim Kaskade says:

    Infochimps (www.infochimps.com) is in the “Cross Infrastructure / Analytics” category. Powering Fortune 1000 with three cloud services: 1) real-time / in-stream (Cloud::Streams), 2) ad hoc / interactive analytics (Cloud::Queries), and 3) batch analytics (Cloud::Hadoop).

    The “data marketplace” or “data source” business is gone. ;-)

Trackbacks

  1. [...] Here is a new attempt at a chart of the Big Data Ecosystem, from Matt Turck and Shivon Zilis and the Big Data Meetup.  Double click to enlarge.  For the source, click here. [...]
