A chart of the big data ecosystem

My colleague Shivon Zilis has been obsessed with the Terry Kawaja chart of the advertising ecosystem for a while, and a few weeks ago she came up with the great idea of creating a similar one for the big data ecosystem.  Initially, we were going to do this as an internal exercise to make sure we understood every part of the ecosystem, but we figured it would be fun to “open source” the project and get people’s thoughts and input.

So here is our first attempt.

A few things became apparent very quickly:

1) Many companies don’t fall neatly into a specific category

2) There are only so many companies we can fit on the chart; subcategories such as NoSQL or advertising applications, for example, would almost deserve their own chart.

3) The ecosystem is evolving so quickly that we’re going to need to update the chart often – companies evolve (e.g., Infochimps), and large vendors make aggressive moves in the space (VMware with Serengeti and the Cetas acquisition).

What do you think?

There are many roads to success: The Buddy Media example

It’s been a few days now since their acquisition was formally announced, and I continue to be fascinated by the Buddy Media story. But what fascinates me is less the company itself, and all the things that make it great, than the fact that its success tests the conventional wisdom of what makes a venture successful. Rightly or wrongly, investors, prospective employees, the press, and anyone who tries to predict the highly unpredictable fate of startups tend to default to some common assumptions about what’s going to work and what isn’t. The Buddy Media story challenges that conventional wisdom in some interesting ways:

1.  NYC is not a good place to start an enterprise software company

It’s a bit ironic that, for all the talk about NYC being a media and eCommerce hub, the largest acquisition in five years would be an enterprise software company.

2.  It takes forever to build a successful enterprise software company.

It took Buddy Media less than 5 years from start to success, including an initial pivot.  

3.  To build a successful enterprise software company, you need technical co-founders, or at least a technical CEO

Buddy Media’s CEO is a serial entrepreneur with two degrees in journalism.  Buddy Media’s COO is a serial entrepreneur with a background in business development and marketing and a degree in economics.  The other co-founder and Chief Strategy Officer is a digital branding and marketing expert with a degree in Broadcasting and Mass Media.  

4.  Selling to marketers and advertisers is a really tough business.

Fortune 500 marketers and advertising agencies are indeed a tough audience – long sales cycles, often low budgets, a preference for homegrown solutions, a reluctance to buy what others in the industry purchase: not easy.   But the Buddy Media success shows that it can be done, with the right execution: build the best product in your category, focus on sales, make friends in the right places, hire some key people from agencies, and work really hard.

5.  Be really careful with strategic money

Buddy Media took a strategic investment from advertising leader WPP, which ended up substantially accelerating their business.

6. Service companies can’t become product companies

After an initial pivot, Buddy Media had to turn themselves into a service company to survive the 2008 economic recession. James Altucher has a really interesting post on TechCrunch that describes this phase. Somehow, they were able to gradually build a product offering.

7.  The best founders are young and single

Two of the co-founders of Buddy Media are married. On top of that, they have three children. While there are famous examples of home runs started by married founders (Cisco, VMware, etc.), in my experience, behind closed doors most investors think it’s a terribly risky idea. The Buddy Media story shows that where there is a will, there is a way: founders with family obligations can still endure the rollercoaster lifestyle of the startup world.

 

VoltDB, DataStax, RJMetrics and Custora at the NYDBM

Here are the presentations from the NYC Data Business Meetup on May 21:

VoltDB – Presenter:  Scott Jarr, co-founder

DataStax – Presenter: Matt Pfeil, co-founder

RJMetrics – Presenter: Robert J Moore, co-founder and CEO

Custora (presentation coming soon) – Presenters: Corey Pierson, co-founder, and Aaron Goodman, data scientist.

And here are a few pics!

Hope to add videos soon.

Thoughts, feedback, questions? Topics you’d like to discuss at the next NYDBM, or data-related stuff you’d like to discuss regardless of whether you attend the NYDBM? Feel free to opine in the comments section.

“The business of data” panel with IA Ventures, Klout & Quid

Just got a copy of the video of a panel we did a few months ago at the Bloomberg Link Empowered Entrepreneur conference.  It features Roger Ehrenberg of IA Ventures, Joe Fernandez of Klout and Sean Gourley of Quid.  The speakers are terrific and it’s a solid introduction to the topic — since this panel was part of a broader entrepreneurial conference, it is slightly higher level than panel conversations you’d hear in specialized Big Data conferences.

The thriving data ecosystem in NYC

There’s a lot of interest in data-related businesses and products everywhere these days, but it’s been particularly fun to see things accelerating in New York (where I’m based). Some purely anecdotal evidence: We had 50 very qualified data scientists show up at the recent hackathon we organized (as part of Big Data Week), despite the ungodly start time of 8am on a Saturday. The Data Meetup I host monthly went from 0 to almost 1,300 members in barely 5 months. General Assembly is starting a 10-week intensive program in data science. Microsoft just announced it chose to locate its new research lab in NYC, which includes plenty of data science brainpower (including machine learning specialist John Langford and Jake Hofman, formerly of Yahoo Research).

NYC is becoming a real “hub” for data startups.  In fact, in my opinion data startups are becoming the next “layer” of the NYC tech scene — the way content and advertising startups (24/7, Doubleclick, Silicon Alley Reporter, etc.) were the foundational layer of “Silicon Alley” from 1995 to 2005, and the way social and e-commerce startups (Tumblr, Gilt, Foursquare, Etsy, Warby Parker, Rent the Runway, etc.) became the next building block that led to where we are today.

Due to their often intensely technical nature, data startups represent an interesting opportunity for NYC to develop more of a scientific and engineering-focused startup culture.

NYC has the key components of a thriving data startup ecosystem, including:

1) Customer demand: For those startups that sell to enterprises rather than consumers, NYC is where many of the key buyers are located – specifically, Wall Street and Madison Avenue, which have been among the most voracious and sophisticated users of data.  It’s no accident that some of the key conferences in the space, such as GigaOm’s Structure:Data or Strata, take place in NYC (or have an NYC event in addition to their CA event) – there’s no better place for emerging vendors to show off what they’ve built to potential purchasers.

2) A relevant talent pool: in addition to solid engineering talent, data-driven startups need data scientists, who come in various flavors: statisticians, mathematicians, machine learning experts, programmers, etc. In part because there has been demand for this type of profile for a while in financial services, there’s a fair concentration of them in NYC, and I’m seeing an increasing number of them making the jump to startup land. NYC has a number of prominent data scientists, including (but certainly not limited to) Drew Conway and Jake Porway (both of whom are co-founders of DataKind, f/k/a Data without Borders), Max Shron, Cathy O’Neil (who left D.E. Shaw for a startup, Intent Media), Gilad Lotan, etc. And of course, we have our very own emerging media star (deservedly so) in the person of Hilary Mason, most recently profiled here.

3) A data community:  Whether it’s Data Drinks or meetups, there’s clearly appetite for data nerds to get together and geek out. Both the NYC Predictive Analytics meetup (organized by Alex Lin) and the NYC Machine Learning meetup (organized by Paul Dix and Max Khesin) have over 2,000 members, while the New York Open Statistical Programming Meetup has 1,700 members.

4) Investors with a deep interest in the space: As far as I know, IA Ventures is the only VC firm in the country that has an exclusive focus on data as an investment thesis (Accel’s big data fund is a little different, in that it’s a dedicated pool out of a much larger fund). Roger Ehrenberg and his talented team (Brad Gillespie, Ben Siscovick, Jesse Beyroutey) are having a tremendous impact on the data world in general, and in NYC in particular (about half of their portfolio is NYC-based). RTP Ventures is a new but very promising NYC investor in the space, with a focus on the infrastructure part of the big data world. Many of the main NYC investors are also “data friendly”, and have interesting data plays in their portfolios, as part of a broader focus: Union Square Ventures, Betaworks (see John Borthwick’s “data is the new plastic”), RRE, Lerer Ventures, Thrive Capital, and kbs+ Ventures, though I’m sure I’m forgetting a number of others.

5) Universities that are willing to get involved: The key machine learning centers in the country may be Carnegie Mellon, MIT and Stanford, but Columbia is strong as well, and most importantly, there are some terrific professors who are both academically prominent and deeply involved in the NYC tech scene – in particular Chris Wiggins (in addition to being a prominent machine-learning expert, Chris is also the co-founder of HackNY and has mentored many of the data scientists currently employed in NYC startups) and Tony Jebara (who runs the Columbia Machine Learning Laboratory and has also founded and advised several startups including Sense Networks and Bookt). NYU has some leading authorities in the data-intensive field of physical computing and the Internet of Things: Tom Igoe and Dan O’Sullivan. Medium term, Cornell may be able to bring some additional academic expertise to NYC (for example, it is home to Thorsten Joachims, who is arguably one of the top SVM researchers).

6) A crop of promising data startups:

  • A growing number of NYC-based startups offer data and predictive analytics solutions – starting perhaps with Opera Solutions, which very few people in the NYC tech scene had heard about until it raised a whopping $84 million in September 2011 from Silver Lake and Accel-KKR (Opera Solutions employs some 150 data scientists, out of 400 employees). In addition, NYC startups have been building all sorts of interesting data and analytics products for social media (Bitly, SocialFlow, Kno.des), news (Visual Revenue), finance (Dataminr), music (NextBigSound, which is moving to NYC), sports (Numberfire, and our own Bloomberg Sports) and of course advertising and marketing (Sailthru, Collective[i], Custora, PlaceIQ, YieldBot, Mediamath, m6d, 33across, Clickable, Buddy Media, etc.).
  • While we’re nowhere near Silicon Valley on this front, it’s great to see more big data infrastructure companies in NYC – some like 1010Data largely predate the whole big data craze; others have been appearing more recently, including FluidInfo, CrowdControl, Mortar Data (which is moving to NYC), Datadog, and of course 10gen, whose MongoDB NoSQL database is quickly becoming a must-have for a number of data-driven companies.
  • Finally, several exciting NYC startups are focused on the application of data to create disruptive products in various industries, such as education (Knewton) or consumer finance (Billguard, Bundle).
  • The fact that NYC recently saw a couple of acquisitions of data startups – Chris Dixon’s Hunch and Jordan Cooper’s Hyperpublic – doesn’t hurt either.

7) A data-centric business culture: perhaps it is because some of the key historical entrepreneurial successes in NYC were data companies (Bloomberg LP, Nielsen); or perhaps it is a reflection of the demand of East Coast investors who arguably tend to be very focused on metrics and business models (as opposed to pure vision)… but somehow, as far as I can tell, there’s always been a real culture around data and analytics in NYC.  Now increasingly, I hear CEOs of NYC startups present their companies as data companies, even those you wouldn’t necessarily suspect (recent examples include Dennis Crowley of Foursquare and Yaron Galai of Outbrain).  In addition, NYC startups have been quick to build data science teams, including many that don’t explicitly position “data” as a key part of their value proposition: Etsy, Gilt, The Ladders, GetGlue, Foursquare, Tumblr all have data scientists on board.

All of this is just a start, and I’m excited to see how it all progresses in the next few months and years.

Working with big companies: 10 practical tips for startup founders

If you’re a startup founder, a day will come, generally sooner rather than later, when you have to deal with a big company – whether that’s a Fortune 1000 company you’re trying to sell something to, a large advertising agency you hope will drive some advertising dollars your way, or a media company you want to strike a partnership with. You may have been successful “selling” your vision to a group of angels or VCs and raising some money, but selling to a large company (and in “selling”, I include all forms of partnerships and other business development efforts) is a different animal altogether. For the last three years or so, I’ve been on the big company side and have heard many startups pitch a business relationship. The following tips – some trivial and slightly tongue in cheek, others more substantial – are based on patterns and issues I’ve seen over and over (presented in no particular order of priority).

1.  Read a few sales books. Our startup culture very much celebrates product people (those who, like Steve Jobs, are able to create “stuff that people want”), and tech talent (the people who actually build the product). Some iconic founders of the previous decades may have been sales people (Larry Ellison being one example), but sales seems to have fallen out of favor as a key part of any founder’s skillset – you’ll hardly ever hear any VC express a strong preference for CEOs who have a strong sales background, the way they rave about technical CEOs. One consequence of this is that many startup founders (especially the younger ones, but not only) come equipped with incredible product savvy, but seem to have spent very little time learning how to sell. Selling to a large company is not just a question of presentation, it’s also about navigating a complex organization and being able to qualify an opportunity – I find that startup founders often fail to ask some basic questions about budgeting cycles, decision process, etc., and end up unable to judge whether they have a real opportunity or not. The good news is, there’s tons of literature out there. Sure, many sales books have cringe-worthy titles, and none of them are perfect, but after reading a few of those, a number of important principles emerge. It’s worth investing some time reading a few.

2.  Hire a sales person. This flows from the previous point. No question, founders should be the initial sales people at their startups: there’s no better way to get market feedback and fine tune your value proposition. But I’m often surprised by how long startups wait before bringing their first sales (or “biz dev”) person on board. I’m all in favor of building viral buzz about your product through blogs and social media, and using other innovative techniques to increase your inbound leads. But in many cases, all those techniques only take you so far, and at some point, particularly if you sell to large companies, you’re going to have to go through a series of in-person sales meetings. As with most things in life, experience helps, and in my opinion, a combination of an experienced sales (or biz dev) person and a product/tech founder proves very effective in a meeting with a big company.

3.  Play down the startup vibe. Sure, Zuckerberg wears sandals in business meetings (at least in the movie), but for almost everyone else, there’s very little upside to showing up in a hoodie and sneakers at a meeting where most people are going to wear ties and business suits. The startup tech world may have its own, idiosyncratic rules and codes, but when it comes to selling to a large company, it’s back to reality. Those people don’t read TechCrunch every day (if ever), and they will most likely have no idea who your hotshot investors are. Do yourself a favor and don’t show up looking like a college kid, because in large companies, recent college graduates are the people at the very bottom of the totem pole, not the people you invest a significant amount of time and resources doing business with.

4.  Do not play with your phone during the meeting.  Ever.  Can’t tell you how often this happens. It typically goes down something like this.  Three founders or team members of the startup show up at the meeting.  One ends up being the lead presenter and fielding most of the questions.  Another one says something every 10 minutes or so, and looks bored and disengaged the rest of the time.  The third one spends the entire meeting playing with their iPhone (and I’m not talking about taking meeting notes).  Guys, seriously? Remember, people will rarely call you on it, but that doesn’t mean you got away with it.

5.  Come prepared.  This should be obvious, but it’s not, and if you are truly well prepared for a meeting, it can be a major differentiator playing in your favor. Unlike an investor pitch, a pitch to a big company is not about you and how awesome your product is.  It’s about how you can help the big company solve a specific problem. You should come in knowing everything you can read about the big company, and have ideas about where and how you could help.  If you show slides, insert screenshots of the big company’s products in your deck. Your demo should be using examples relevant to the big company, or at a minimum, to whatever industry the big company operates in.  If you truly want to make an impression, build a simple demo just for the meeting.  Regardless, your demo should work – send the link to your host in advance to make sure the demo is going to work from within the firewall.  If this is a key meeting for you, you should spend days, not hours, prepping for it.

6.  A sales meeting is not a mentoring session. Again, something I’ve seen a few times – your big company hosts may be really friendly, they may show genuine interest in hearing your story, and they may have experience that is very relevant to your needs and challenges. Listen to their feedback and ideas if they offer them, but remember that you’re in a sales meeting. Resist the temptation to ask questions about how they were able to solve specific problems, or what they’d do if they were you, etc. Overall, one of the things they’re trying to establish on their end is that, while you’re a startup, you’re already pretty solid and it’s generally safe for them to work with you. Asking these types of questions is not going to help.

7. Be patient.  Getting stuff done with a big company takes time.  Most people understand that, but they often attribute it to the fact that large companies have more processes, approval layers, lawyers involved, etc.  That is certainly true.  But the more fundamental (and scarier) reason for it is that large companies typically care about whatever business deal you’re discussing a lot less than you do.  Entering into a business relationship with a startup rarely moves the needle for them, at least not in a substantial and immediate way.  They can have a genuine interest in working with you, but they rarely feel true urgency.  Of course, from the startup’s perspective, it is frustrating because that deal can be a make or break moment.  Most techniques you may use to create a sense of urgency are likely to come across as a gimmick, and should be used with caution. Recognize that reality and plan accordingly.

8. Don’t beat a dead horse.  Being patient doesn’t mean that you should wait around forever.  For some reason, large companies, just like some VCs, rarely actually say “no”.  If there’s a fit, it should be fairly obvious after a couple of meetings.  If you’re going from meeting to meeting with different people each time, if getting people to return your calls feels like pulling teeth, or if they mention how working with you may make sense for a big project that they’re going to start next year, it’s unlikely anything substantial is going to happen for you anytime soon.  Learn to recognize when the opportunity is not going anywhere, and move on.

9.  Don’t forget the biz dev guy (or gal). You’ve probably heard (or experienced) how things work before: your first point of contact at a large company will typically be their business development people; while it’s perfectly fine to start with them, you’ll want to quickly find an internal champion, preferably someone who has P&L responsibility and real decision power. The twist I would add is this: don’t drop the biz dev person once you’ve found your champion. The person you thought was going to be your champion may not turn out to be the right person; they may get too busy, change jobs or lose interest. The biz dev person is your long-term ally within the large company, because their performance is often measured internally based on the number of opportunities they bring in that end up with an actual deal – so their interests are aligned with yours and they’ll want to make you successful.

10.  Be careful with big company politics.  Like any human organization, large companies are fraught with politics and as an outsider, it’s unlikely you’re going to be able to play them right.  So stay away as much as you can. Be transparent, keep everyone in the loop, don’t make comments about people, don’t take sides.  Don’t shop around from department to department in the hope you’ll find someone who’ll bite, unbeknownst to all the other people you’ve met with previously.  Don’t suggest strange things like making intros between employees of the same company (in most companies, they can pick up their phone).   Invest time in getting to know your contacts at the big company, and make them look good in front of their bosses.

Calling all data scientists! Participate in the first global data science hackathon

Are you a smart data scientist? In connection with Big Data Week and Data Science London, we’re helping organize a global data science hackathon that will simultaneously take place in various locations around the world (including London, Sydney, and San Francisco). We will host the NYC event at the Bloomberg Ventures office in the West Village. The aim of the hackathon is to promote data science and show the world what is possible today combining data science with open source, Hadoop, machine learning, and data mining tools.  The event will run from Saturday April 28 at 8am EST to Sunday April 29 at 8am EST.  We’ll provide copious amounts of pizza, caffeine and some prizes.  Interested? Please register here

The three waves of opportunities in big data

As the interest in all things big data continues to increase, I’ve had a few chats recently with executives and entrepreneurs looking to learn more about the space, where I was asked about the trends and opportunities I see.  Wide topic obviously, but I figured I’d jot down a few notes about what I’ve been hearing, reading and thinking.

I see opportunities in the data space unfolding in several “waves.”  Of course, reality often resists attempts at this type of categorization, and it is unlikely those waves will happen in a neatly organized sequential order; elements of each wave already exist, and it’s possible all of this will happen more or less at the same time.  However, I still find it helpful to have this type of framework in mind to understand an industry that is rapidly changing.

First wave: Big data infrastructure

Right now, the whole “big data” discussion is very much about core technology. Look up the agendas for big data conferences like Strata or Structure:Data and you’ll see – it’s all about software and data science, fascinating stuff but very technical and generally hard to understand for anyone who’s not deeply versed in those topics. Core big data technologies may have originated from consumer internet companies, but at this stage there’s not much that feels “consumery” about big data.

The reason for this is that we’re still early in building the big data infrastructure, and there’s a lot to figure out, before much else can happen.  If the fundamental premise of big data – that all current solutions break past a certain volume (or velocity or variety) of data – holds true, then a whole new ecosystem needs to be reinvented.  We’ve made a lot of progress in the last few years, but there are still a lot of nuts to crack, for example: How do you process big data in real time?  How do you clean up large data sets at scale? How do you transfer large volumes of data to the cloud and process it there? How do you simplify big data tools to make them approachable by a larger number of software engineers and business users?

As a result, much of the innovation has been happening at the infrastructure level. Note that I mean “infrastructure” in the broadest sense – basically all the pieces of “plumbing” necessary to process big data and derive insights from it. That includes infrastructure per se (for example, the Clouderas and Hortonworks of the world, the various NoSQL companies, etc.), but also analytics (Platfora, Continuuity, DataSift, etc.), data marketplaces (Factual, Datamarket, etc.), crowdsourcing players (Kaggle, CrowdControl, etc.) and even devices (sensors, personal data capture devices).

This is a time of tremendous opportunities for new entrants.  Large technology vendors are going to struggle with big data, in part because the underlying technologies are very different, and in part because they’ve been making a lot of money so far selling expensive solutions to process comparatively smaller data sets – some of the new entrants claim to be up to an order of magnitude cheaper than the Oracles of the world.  Large companies have made some interesting moves (Oracle partnering with Cloudera, Microsoft announcing support for Hadoop) but presumably, they will delay the inevitable for the most part, and this will lead to plenty of attractive acquisition opportunities for startups and their investors over the next few years.

Equally, it is also a time of confusion for anyone trying to figure out who the real success stories will be:

  • There’s a lot of noise, and this is only going to accelerate as VC money continues to pour into the industry. Also, the fact that older, larger companies seem to be racing to rebrand as big data companies doesn’t help.
  • There’s a fair number of “science projects” out there – companies that, at least for now, seem to be focused on solving an engineering issue but haven’t quite thought through their commercial applicability. At our recent NYC Data Business Meetup, Kirill Sheynkman of RTP Ventures made a powerful case that big data for big data’s sake does not a company make (“Big data… so what?”).
  • It is going to take a while for winners to emerge – unlike consumer internet startups that can experience hockey stick growth from inception, software startups go through generally slower adoption cycles (consumerization of IT notwithstanding).  Also, the abundantly documented (but presumably temporary) shortage of Hadoop engineers and data scientists may somewhat slow down the widespread adoption of those technologies.
  • The surge in interest about all things big data will inevitably lead to some level of disillusionment (I assume Gartner has a nice hype cycle chart describing this), as projects turn out to be harder and more time consuming than expected, and sometimes underwhelm their sponsors. Startups will have to struggle through that phase, which may slow things down as well.

Sooner or later, of course, winners will emerge, and what seems to us like daunting technical challenges will become something that any qualified software engineer will be able to handle, equipped with reasonably simple and cheap tools. There’s always a slight irony to underlying technologies: their ultimate sign of success is that at some point they become a given, a starting point, a simple enabler. In a recent talk organized by the NYC Media Lab and held at Bloomberg, Hilary Mason mentioned that the future of data visualization is “boring”, meaning that it will eventually become commoditized and a simple tool. I believe this will probably be true eventually of the entire big data technology stack.

Second wave: “Big data enabled” applications and features

As core infrastructure issues are gradually being resolved, the next logical step is to focus on expanding the benefits of big data to a broader, non-technical audience within the enterprise, and to more consumers online.

Within the enterprise, we should see a lot of innovation around business applications. Enterprise software has always been to a large extent about enabling business end users to access and manipulate large amounts of data. “Big data enabled” enterprise applications will take this to the next level, offering business users unprecedented data mining and analysis opportunities, using larger volumes of internal data, in real time or close, and sometimes augmenting it with external data sets available through data marketplaces. This will happen across many different enterprise functions (finance, sales, marketing, HR, etc.) and across industries, from retail to healthcare to financial services.

The possibilities are intriguing: for example, what will a CRM application look like, when you can mine in real time all of your customer base, the interactions of your sales force with them, and combine the results with external data sets on industry and company news, geographic and demographic patterns, to determine which prospects are the most likely to buy in the next quarter? Enterprise marketing software is also likely to be profoundly impacted by big data.

On the downside, things may take a little while longer than one would like here as well.   In enterprise software, it’s not just the quality of the software that counts.  Business end users need to accept the new product, learn how to use it, and integrate it in their daily process and workflow.  Big data applications will be no exception to this.

One thing big data vendors can do to speed up the adoption cycle is to focus on the simplicity of the end user experience. From that perspective, startups like Splunk and Datadog are showing the way in the IT data space: Splunk enables end users to search large amounts of data through a Google-like interface; Datadog enables users to monitor data through an experience that’s very reminiscent of the Facebook newsfeed.

On the consumer internet front, data-driven features should become commonplace on many websites. Internet startups led the way, in particular with their recommendation engines (Amazon, LinkedIn, Netflix, Facebook, iTunes). But so far those features have required having first-rate data scientists on board, and an ad hoc infrastructure. I would expect all of this to democratize considerably in the near future, as the infrastructure evolution mentioned above takes place. Retailers, financial services companies, and health care providers will all use data-driven features to customize and personalize their users’ online experience, and accelerate their core business. As over time any company with a web presence will want to offer data-driven features, there is an interesting market opportunity for startups that could provide easy-to-use, out-of-the-box tools to do this (“big data out of the box”).

Third wave:  The emergence of “big data enabled” startups

The democratization of big data infrastructure tools will also open wide the opportunity for entrepreneurs, including those without a deep tech background, to dream up entire new businesses (and business models) based on data.

Just the way we were talking about “web enabled” businesses a few years ago, we’re likely to see more and more “big data enabled” businesses appearing.  By that I mean companies that have the ability to process large amounts of data as their core DNA, and use it to deliver a product or service that could not exist otherwise.

Of course, there are already a number of startups that live and breathe data. I believe that Klout, for all the controversy around it, is a category-defining startup, and a great example of a company computing large amounts of data to come up with a unique product.  Billguard is a very interesting play that combines big data and crowdsourcing to deliver real value to consumers. Foursquare also comes to mind: a key insight of Dennis Crowley’s interview at SxSW a few days ago was how much he thinks of his company as a data play (gamification being “just an onboarding mechanism”).

This is only the beginning.  As always, there are a number of tricky issues to deal with (privacy being one of them), but it’s going to be a lot of fun to see what ideas we all come up with.  As an example, I’m fascinated by “big data enabled” startups that empower consumers, such as:

  • Personal data companies:  as the number of inputs of personal data increases (social network activity, personal health devices like Fitbit and Jawbone, etc.), I believe there are going to be exciting opportunities for startups that can aggregate and analyze one’s personal data, visualize it and compare it to peers in a simple and visually attractive way.  Think of what Stephen Wolfram has been doing for years, but as a consumer friendly product available to all: self-quantification gone mainstream.
  • “Consumer to business” (C2B) companies: Individual data capture will give people more power when it comes to obtaining customized treatment from businesses.  Startups like Milesense capture your driving behavior through your iPhone so that you can obtain better insurance premiums if you’re a safer driver.  Similarly, if you can capture your health and diet habits, and you are healthy, you should obtain better prices for health and life insurance. What else can the consumer obtain, once she is empowered with her own data?

Bitly, Splunk, Gnip, RTP Ventures: Videos

Videos of the presentations at the NYC Data Business Meetup on February 23, 2012 (I can’t seem to get a better quality image; suggestions are welcome):

Hilary Mason, Chief Scientist, Bitly:

Stephen Sorkin, VP of Engineering, Splunk:

Chris Moody, President and COO, Gnip:

Kirill Sheynkman, Senior Managing Director, RTP Ventures:

My (long) intro remarks:

Bitly, Splunk and Gnip: Presentations

Here are the presentations from the recent NYC Data Business Meetup  – thanks to our terrific speakers: Hilary Mason, Chief Scientist, bitly; Stephen Sorkin, VP of Engineering, Splunk; Chris Moody, President & COO, Gnip; and Kirill Sheynkman, Senior Managing Director, RTP Ventures (no presentation slides).

Bitly – NYC Data Business Meetup – February 23, 2012

Gnip – NYC Data Business Meetup – February 23, 2012

Splunk – NYC Data Business Meetup – February 23, 2012

Hyperpublic acquired by Groupon

Like many in the NYC tech community, I was excited to hear the news last Friday night that data startup Hyperpublic was acquired by Groupon.

In addition to the fact that he was the first  speaker we ever had at our  NYC Data Business Meetup (which should put him in a very special place in history, right there… right?), Jordan Cooper, Hyperpublic’s co-founder and CEO, is a terrific entrepreneur, very thoughtful about the data space, and well liked and respected by all who know him in the tech community. He and his co-founder Doug Petkanics built an awesome team (including scoring a big win when they recruited Jeff Weinstein), got some great seed investors (including Lerer Ventures and Thrive Capital, which also had another quick win with GroupMe a few months ago), and it’s always nice to see good things happening to good people.

This is also good news for the budding big data and enterprise tech community in NYC. Hyperpublic was very much part of an emerging ecosystem of interesting, tech-heavy companies in NYC that includes 10gen (MongoDB), Enterproid, Nodejitsu, Datadog, Opani, Mortar Data, etc., and seeing a young company successfully go through the cycle of product creation, seed funding and exit is encouraging for aspiring entrepreneurs in that space.

Hyperpublic was tackling some difficult engineering problems. As a geo-local data company, they pulled data from many different sources (through both APIs and web crawling) and then organized and structured it in a way that developers could use to build location based applications.  The difficulty of this type of effort comes from ensuring that the data has enough (i) breadth (defined by the number of geographic regions where they have a critical mass of relevant data – they got to 50 cities), (ii) depth (how detailed the data about each point of interest is) and (iii) freshness (for example being able  to pick up on location openings and closings).  All of this requires an ability to gather, normalize and cross-reference enormous amounts of unstructured data at rapid intervals, and doing it with high levels of certainty is not a simple task.

Hyperpublic made very solid inroads in terms of addressing those challenges, due in large part to the quality of the engineering team and culture that Jordan and Doug had managed to put together.  Both from a technology and team perspective, the acquisition makes perfect sense for Groupon, as they ramp up their effort in the mobile local space.

From an industry watch perspective, however, the acquisition highlights the difficulty of building long term, standalone businesses based purely on gathering data and making it available to developers, including in the geo-local space.  SimpleGeo tried this as well and ended up being acquired by Urban Airship for $3.5 million. Urban Airship announced quickly afterwards that it was shutting down SimpleGeo services altogether, leaving developers that were relying on it pretty much stranded.  Hyperpublic presumably fared a lot better, but Groupon acquired it for its own use (as a needed element of its infrastructure), and will also stop providing the data to developers, effective March 2.  It will be interesting to see how companies like Factual (which covers more than geo-local data) and PlaceIQ (which has a slightly different twist on geo-local) evolve in that context.

Startups that are in the business of gathering data and providing it to developers (as opposed to selling it to the enterprise) can get rapid user adoption, but typically face a revenue model challenge.  In addition, particularly in the geo-local data space, there is uncertainty around how to work in the long term with the larger companies that capture a lot of the geo-local data as a by-product of their core activity.  Are they friends or foes? Many of the large consumer internet companies have a geo-local API (Facebook Graph API, Foursquare Venue Database, Google Places API, Microsoft Bing Maps Location API, Yahoo Local API, Yelp API), which the data companies have used as one of their sources.  While those large consumer internet companies have been happy so far to cooperate with the data companies, how long will they continue to do so, if it turns out that the data becomes a real potential source of revenue?

From Hyperpublic’s perspective, therefore, the timing of the acquisition makes perfect sense: they built the team and the technology, and took the company to the stage where a natural acquirer like Groupon could step in.  Taking it to the next level, while certainly possible, would have required a significant amount of additional time and money, and possibly figuring out a consumer-facing product to start capturing their own geo-local data, in a context where uncertainty about the revenue model and the competition could have made things tricky.

As a side note, it is exciting to see recently-IPO’ed Groupon getting into acquisition mode quickly (see also Kima Labs), and one can only imagine how acquisitive Facebook is going to be after it goes public.

Congrats to Jordan, Doug, Jeff and the Hyperpublic team!