A chart of the big data ecosystem

My colleague Shivon Zilis has been obsessed with the Terry Kawaja chart of the advertising ecosystem for a while, and a few weeks ago she came up with the great idea of creating a similar one for the big data ecosystem. Initially, we were going to do this as an internal exercise to make sure we understood every part of the ecosystem, but we figured it would be fun to “open source” the project and get people’s thoughts and input.

So here is our first attempt.

A few things became apparent very quickly:

1) Many companies don’t fall neatly into a specific category

2) There are only so many companies we can fit on the chart; subcategories such as NoSQL or advertising applications, for example, would almost deserve their own chart.

3) The ecosystem is evolving so quickly that we’re going to need to update the chart often: companies evolve (e.g., Infochimps), and large vendors make aggressive moves in the space (VMware with Serengeti and the Cetas acquisition).

What do you think?

52 thoughts on “A chart of the big data ecosystem”

Upendra Shardanand says:

June 29, 2012 at 3:19 pm

Hi Matt – I’d add Daylife under Applications / publishers tools — Big Data x Big Content. IMHO 🙂

1. mattturck says:
  
  July 2, 2012 at 1:30 pm
  
  That is very interesting Upendra. Had missed the Big Data angle to Daylife — in what way(s) are you a big data company?
  
Francois says:

June 29, 2012 at 3:29 pm

Yes! That was badly needed! I would add SAP in the cross infrastructure / analytics category (in this context, especially because of their HANA solution: real-time big data).

1. mattturck says:
  
  June 29, 2012 at 6:16 pm
  
  Yes, great point, will do
  
2. dataAngel says:
  
  July 2, 2012 at 5:10 pm
  
  HANA isn’t truly a Big Data offering, since it is in-memory and limited to only 1TB as a result.
  
Sam says:

June 29, 2012 at 3:43 pm

I’d suggest adding Python / scikit-learn under the open source stat packages.

1. mattturck says:
  
  June 29, 2012 at 6:17 pm
  
  Yes, thanks a lot for taking the time Sam
  
Sam says:

June 29, 2012 at 3:44 pm

Great idea / graphic otherwise.

Cathy says:

June 29, 2012 at 3:44 pm

Upon first glance, you may consider adding Pervasive Software, Cirro, and Kitenga to Analytics Solutions, FeedZai and ParStream to Real-Time, IBM Infosphere BigInsights and Greenplum HD/MR to Hadoop Related, Actuate and Quantum 4D to Data Visualization. Will suggest more later.

Cathy

1. mattturck says:
  
  June 29, 2012 at 9:24 pm
  
  Thanks Cathy, very helpful. We’re going to need to figure out a way to make room for all of these on just one page! 🙂
  
Catherine Colton says:

June 29, 2012 at 4:14 pm

Hi Matt & Shivon, Dave Feinleib for Forbes did something similar recently http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/ but yours is by far more comprehensive. Well done. Two things:
1) I found Todd P’s breakdown of the Big Data Landscape quite interesting: Infrastructure/Plumbing, Dev/Mgmt Tools, Analytics & Apps. Some of the Mgmt Tools are under Infrastructure in your schema.
2) Search or Information Access seems to be missing. We hope you’ll add Q-Sensei in that box. Thanks!

1. mattturck says:
  
  June 29, 2012 at 9:32 pm
  
  1) Ah, that’s true, Todd Papaioannou did come up with that breakdown… mmm, let’s see if we can fit that in, space-wise.
  2) As to search, who else would you put in that category that’s specific enough to Big Data? Elasticsearch?
  As to the Forbes chart, yes, I know… we had been working on this for weeks on and off, but Dave beat us to it!
  
Josh says:

June 29, 2012 at 5:24 pm

Great start to the ecosystem. I would add the following: cross-channel marketing providers like Acxiom, Epsilon, Experian, Responsys, CheetahMail, ExactTarget, Alterian, etc. They store marketing data (transactional, loyalty, web, social, etc.). The data is modeled and used to execute marketing programs. I would also include DMPs (BlueKai, Aggregate Knowledge, Turn, etc.) and web analytics (Adobe, IBM/Coremetrics, etc.).

1. mattturck says:
  
  June 29, 2012 at 9:23 pm
  
  Thanks Josh. We thought about the Acxioms and Experians of the world. Where would you put them? In the “Data Source” category?
  
  1. Josh says:
    
    July 3, 2012 at 9:21 pm
    
    Definitely data sources. They also build and host pretty large databases for B2C marketing companies so they could also fall under Applications/Marketing.
    
Ana says:

June 29, 2012 at 6:28 pm

Good stuff — charts like these are immensely helpful even if you sometimes can’t fit everyone in their right place. I know I swear by the Lumascape (and it sometimes haunts my dreams).

You’re missing SAS in the analytics, publisher tools (with the aiMatch acquisition), and cross infrastructure categories. SAS rolled out high performance analytics and visual analytics for exploration of big data sets, amongst other products.

1. mattturck says:
  
  July 2, 2012 at 1:32 pm
  
  Thanks Ana, will add SAS in the next iteration
  
2. Dena Lawless (@iamlawless) says:
  
  February 12, 2013 at 3:04 pm
  
  SAS also with Data Visualization.
  
Denise Brown says:

June 29, 2012 at 7:07 pm

Thanks for putting this together. With such a broad landscape it’s difficult to capture all the key players. MarkLogic is missing from the infrastructure group. We’re an enterprise software company powering over 500 of the world’s most critical Big Data Applications.

1. mattturck says:
  
  June 29, 2012 at 9:20 pm
  
  Thanks Denise, yes, that’s an oversight – where would you put MarkLogic, though? NoSQL? But it existed long before NoSQL companies appeared, right?
  
  1. Denise Brown says:
    
    June 29, 2012 at 10:25 pm
    
    No worries, with so many players having recently entered the Big Data Landscape it’s gotten to be a very crowded sector, as your chart clearly shows. You are correct that MarkLogic was a NoSQL database solving Big Data issues for clients long before the term was popular.
    
Sean Hallahan says:

June 29, 2012 at 9:53 pm

Hi Matt,
Thanks to BV, Shivon and you for doing this.
Companies I don’t see (some of these might actually be a big, maybe huge, stretch or not fit your wiser criteria) that come to mind are:

Magnetic – looking to go public just three years out of the blocks
C3 Metrics – very powerful attribution models cutting through mountains of well-accepted myth.
VisibleMeasures – I can see why VM wouldn’t seem like big data, but video on the internet is big and very few people actually understand the punch, breadth and impact of VisibleMeasures’ capabilities.
GE Software’s Silicon Valley Industrial Internet
Medialets
MyCityWay – I’m biased toward anyone who produces accurate, meaningful real-time subway info. They’re improving.
Ensequence – interactive TV will tip scales imho
Altruik
SAP HANA
Brilig
Dtex Systems – when Dtex looks at big data, people get fired.
Adaptivity
Glue Networks
Lookingglass – these guys looked at big data and found very bad guys hidden within good guy domains

Best & cheers,
Sean

1. mattturck says:
  
  July 2, 2012 at 1:37 pm
  
  Thanks a lot Sean – not sure if we can fit all of these in the next iteration, but that’s very helpful feedback. There are a couple of companies in there that hadn’t come on my radar.
  
James Grundvig says:

June 30, 2012 at 10:57 am

Also missing, beyond SAP’s HANA DB, is a different subcategory altogether: eDiscovery, or what I deem forensic analytics. That is the ability to data-mine 3 million emails, plus legal, court, and brief docs, in the law industry. It’s changing the way legal discovery is conducted.

1. mattturck says:
  
  July 2, 2012 at 1:43 pm
  
  Yes, nice one — eDiscovery is definitely big data. The Bloomberg Vault product (compliance/eDiscovery solution) contains… 56 billion emails.
  
Kunal Vaed says:

July 1, 2012 at 12:50 am

This is great Matt. Thanks! The only suggestion I have is to add a vertical focus somehow, to indicate the specific industry sectors addressed by these companies.

1. James Grundvig says:
  
  July 1, 2012 at 12:15 pm
  
  If you are to answer the Grids for each industry vertical, you must reach out to experts within that sector who already understand the lay of the land. My experience, and my company’s focus, is the Architecture-Engineering-Construction (AEC) industry. There’s a paucity of analytics in the industry, because it’s stuck in the legacy past.
  
Aki Balogh (@akibalogh) says:

July 2, 2012 at 5:30 pm

Matt,

Great landscape. Putting these together is always hard.

For the MPP Database layer, please add Calpont InfiniDB.

InfiniDB is a “pure” MPP column-store, so it’s significantly faster and more scalable than most of the other MPP technologies on the slide.

1. Shivon Zilis says:
  
  July 2, 2012 at 5:45 pm
  
  Thanks, Aki! We’re working on v2 now so really appreciate the feedback.
  
dataAngel says:

July 2, 2012 at 5:57 pm

Do you have access to the latest Gartner Magic Quadrants for BI and DWDMS? If not I could give you access. Contact me via email.

William Butler says:

July 3, 2012 at 5:12 pm

While you have Vertica, you are missing a big part of HP’s big data solutions, e.g. Autonomy. http://www.autonomy.com/content/News/Releases/2012/0604a.en.html
IDOL 10 (Intelligent Data Operating Layer) is a single processing layer that enables organizations to extract meaning and act on all forms of information, including audio, video, social media, email and web content, as well as structured data such as customer transaction logs and machine-based sensor data (http://idol.autonomy.com/). It provides the platform for solutions across Information Management, Information Governance, Web Commerce, Customer Interaction, Optimization and Marketing.

1. mattturck says:
  
  July 3, 2012 at 6:17 pm
  
  Thanks… that’s one of the challenges of putting this chart together: there are a few companies like Autonomy that were around a number of years before anyone started talking about “big data”, and it’s not that easy to know where to draw the line. Let us figure out how/where we could include Autonomy in the next version. Others have suggested search and/or eDiscovery as missing pieces, maybe that could be an appropriate spot, assuming we can somehow fit all of it in on just one page…
  
  1. William Butler says:
    
    July 11, 2012 at 3:37 pm
    
    It is more than Search/eDiscovery; it really encompasses intelligent information processing to extract meaning from data, automate business processes and achieve whatever business results one can envision. All the “solutions” are really just “packaged” interfaces with business logic to achieve specific business objectives. However, the IDOL platform can be integrated with any information-intensive application or business process to create additional insight and automation. You really need to think of it as an information platform, but unlike other Core Infrastructure providers, IDOL has connectivity to all repositories (500+) and can actually manage information in place (e.g., leave it in SharePoint or on the Z: drive, but gain insight and automate processes from its existence in those “systems of record”).
    
AB says:

January 21, 2013 at 6:31 am

Dear Matt, We would like to have your authorization to republish this image at http://www.BigDataQ.com

Thank you very much
Kind Regards
Big Data Q

1. mattturck says:
  
  January 28, 2013 at 9:59 am
  
  Sure, as long as you link back to the original post.
  
Allison says:

January 23, 2013 at 4:07 pm

Hi Matt, Terracotta should be included in this graphic as well… they are a leading in-memory data core solution (just acquired by Software AG) and would fit in the cross-infrastructure / analytics category. We are the only leading in-memory data management solution that can linearly scale to terabytes of capacity, with predictable low latency.

1. mattturck says:
  
  January 23, 2013 at 4:52 pm
  
  Thanks for the input Allison. Btw, there’s a more recent version of the chart, see http://mattturck.com/2012/10/15/a-chart-of-the-big-data-ecosystem-take-2/
  
  1. Allison says:
    
    January 25, 2013 at 3:52 pm
    
    Hey Matt, Thanks for all the work and responses to all the folks who are weighing in… Just wanted to make sure that you reference Terracotta — not Teradata 🙂 This is getting to be a big, deep exercise!
    Thanks!
    
