A chart of the big data ecosystem, take 2

So here we are again. My colleague Shivon and I made a first attempt at making sense of the rapidly evolving big data ecosystem back in June. Based on some very helpful feedback from readers of this blog and others, a number of additional meetings with interesting startups, and more in-depth research, we’ve come up with this second version.

Some thoughts:

  • It’s still a work in progress (and will presumably always be, that’s the nature of the beast)
  • It’s even more crowded than the first time around, which reflects the incredible vitality of the big data space
  • We’ve created some new subcategories such as NoSQL/NewSQL and analytics services (reflecting the reality that, for the time being, the last mile of data analysis is very much performed by humans)
  • Occasionally a company appears in more than one category (Infochimps or Autonomy, for example)
  • We have learned more about companies that were already on the first version of the chart, and have repositioned some of them. For example, Metamarkets now falls in the “Cross Infrastructure/Analytics” category, as they offer a stack that includes a data store (Druid), predictive analytics and visualization. Another example is Collective[i]: they have built an entire proprietary big data stack from the ground up that includes infrastructure, analytics and applications, making the company a rare example of an “Application Service Provider”.

Our goal is to continue updating this chart from time to time, and perhaps make it evolve visually, as we’ve probably reached the limits of what we can reasonably fit on one slide. It was suggested that we try to visually distinguish on-premise offerings vs. cloud-based solutions, which we may try to do.

Comments, thoughts, questions? Please add to the comments section.

21 thoughts on “A chart of the big data ecosystem, take 2”

  1. Great chart. Just a few edits. sqrrl should actually be under NoSQL databases. You could also add Apache Accumulo to open source database projects.

  2. Thank you for following up on your promise to include MarkLogic. Since June when you published V1 of the chart, we added new tools for faster application development, powerful analytics and visualization widgets for greater insight, plus a lot of other cool enterprise features.

  3. Matt, you did not put Xignite under Data Sources or Marketplaces. I hope it is not a subconscious omission 😉 We are accumulating vast amounts of exchange data (kinda big) on our platform in the cloud…

  4. Hey, you may also want to throw in IBM Many Eyes… great data visualization tool. Maybe just IBM with an image of eyes, nice and small, could squeeze it in 🙂

  5. This is quite an ambitious project you have taken on here – kudos!

    As you say, the ecosystem is evolving very quickly. As the infrastructure providers are gaining traction and the solutions are being implemented more widely, there seem to be more companies emerging with great tools that help understand the data and allow users to act upon it.

    I would argue that “Monitoring” falls under that category as well. Would it make more sense to have it under “Analytics” rather than “Infrastructure”? The objective of monitoring is to provide some clarity and insights into the vast amounts of data being collected.

    It’s close to my heart as we provide a monitoring service that leverages big data. Our data store is optimized for time series data and we provide visualization and alert services. It would be great if you could add us (https://librato.com) to the chart.

    Thanks!

  6. Kudos for taking on such an ambitious project. It is always harder to decide what NOT to do with these landscapes than it is to decide what *to* do.

    On the infrastructure side, I tend to think about analytic platforms in terms of whether they are appliance databases, cloud or on-premise, and whether they are column-oriented or row-oriented relational, and I distinguish databases that are very good at something unique (graph databases or document databases).

    Nonetheless, the only area that really seemed to be missing is real-time stream processing platforms such as StreamBase, Coral8, InfoSphere Streams, etc., which address the velocity “V” of “Big Data”.

  7. Great job, just about the most comprehensive I’ve seen. One quibble – Visualization vendors are really two discrete groups that are worth treating separately…
    1) BI/Visualization: dashboarding, charting and other tools for communicating and delivering analytics, such as JackBe, Chart.io, Birst, Visual.ly etc.
    2) Visual Analytics: tools for visual pattern-based discovery, such as Ayasdi, Centrifuge, ClearStory etc.

    thanks!

  8. Infochimps (www.infochimps.com) is in the “Cross Infrastructure / Analytics” category, powering the Fortune 1000 with three cloud services: 1) real-time / in-stream (Cloud::Streams), 2) ad hoc / interactive analytics (Cloud::Queries), and 3) batch analytics (Cloud::Hadoop).

    The “data marketplace” or “data source” business is gone. 😉

  9. Dear Sir

    I am an editor from Liwen Culture & Chuliu Book Company in Taiwan.

    We are going to publish a textbook for college students about sociology and Taiwan in September. One chapter of this book will discuss how technology impacts society.

    We found this big data landscape picture useful for helping students understand big data.

    We would appreciate it if you could kindly allow us to use this picture in the textbook free of charge. We will list the source link.

    We look forward to receiving your reply.

    Faithfully yours,

    Ray Wang

    Our website: http://www.liwen.com.tw/