matt turck

A chart of the big data ecosystem, take 2

So here we are again. My colleague Shivon and I made a first attempt at making sense of the rapidly evolving big data ecosystem back in June. Based on some very helpful feedback from readers of this blog and others, a number of additional meetings with interesting startups and more in-depth research, we’ve come up with this second version.

Some thoughts:

It’s still a work in progress (and will presumably always be, that’s the nature of the beast)
It’s even more crowded than the first time around, which reflects the incredible vitality of the big data space
We’ve created some new subcategories such as NoSQL/NewSQL and analytics services (reflecting the reality that, for the time being, the last mile of data analysis is very much performed by humans)
We have the occasional company that appears in different categories (Infochimps or Autonomy for example)
We have learned more about companies that were already on the first version of the chart, and have positioned them differently. For example, Metamarkets now falls in the “Cross Infrastructure/Analytics” category as they offer a stack that includes a data store (Druid), predictive analytics and visualization. Another example is Collective[i] – they have built an entire proprietary big data stack from the ground up, that includes infrastructure, analytics and applications – making the company a rare example of an “Application Service Provider”.

Our goal is to continue updating this chart from time to time, and perhaps make it evolve visually, as we’ve probably reached the limits of what we can reasonably fit on one slide. It was suggested that we try to visually distinguish on-premise offerings vs. cloud-based solutions, which we may try to do.

To enlarge, click on the arrows at the bottom right of the chart.

Comments, thoughts, questions? Please add to the comments section.

10Gen, Mortar, Datadog & Rick Smolan at the NYC Data Meetup

Here are the videos and some pictures (scroll down) of the NYC Data Business Meetup held on September 25, 2012.

In order of appearance:

1) Rick Smolan told us about his fascinating new project, the “Human Face of Big Data” – see the NY Times coverage here: http://nyti.ms/TO5MDd.

2) Mortar (presenter: K Young, CEO). Mortar (www.mortardata.com) provides a platform-as-a-service for Hadoop. They take care of all of the necessary infrastructure (via AWS) and allow any software engineer to run jobs on Hadoop using Apache Pig and Python without special training.

3) Datadog (presenter: Alexis Le Quoc, co-founder). Datadog (www.datadoghq.com) is a service for IT, Operations and Development teams who write and run applications at scale, and want to turn the massive amounts of data produced by their apps, tools and services into actionable insight. Datadog helps software developers and web ops understand their IT Data by putting it all in context.

4) We finished with a fireside chat with Dwight Merriman, CEO and co-founder, 10Gen. 10Gen (www.10gen.com) develops MongoDB, and offers production support, training, and consulting for the open source database. Dwight is one of the original authors of MongoDB. In 1995, Dwight co-founded DoubleClick (acquired by Google for $3.1 billion) and served as its CTO for ten years. Dwight was the architect of the DoubleClick ad serving infrastructure, DART, which serves tens of billions of ads per day. Dwight is co-founder, Chairman, and the original architect of Panther Express (now part of CDNetworks), a content distribution network (CDN) technology that serves hundreds of thousands of objects per second. Dwight is also a co-founder and investor in BusinessInsider.com and Gilt Groupe.

Enterprise Tech Panel in NYC

Mark Birch has a good summary of a recent panel organized by the NYC Enterprise Tech Meetup (which also has a video of the panel on its site, unfortunately with poor audio quality). In addition to Mark, the panel featured David Aronoff (General Partner, Flybridge Capital Partners), Jeanne Sullivan (General Partner, StarVest Partners), Raju Rishi (Venture Partner, Sigma Partners) and myself. Many thanks to Jonathan Lehr, the organizer of the event, for putting it together. Couple of pics below and also one here (I know! Panel pics are just so exciting!).

One key takeaway for me is that the NYC area used to have a pretty vibrant enterprise tech scene (with Computer Associates, etc.) in the eighties and up until the mid-nineties (before my time), which makes the relative dearth of enterprise tech startups in NYC over the last dozen years somewhat odd. I’m excited to see a whole new wave of NYC startups rising to prominence, including 10Gen, Opera Solutions, Enterproid, Nodejitsu, AppFirst, Datadog, Mortar, etc.

Facebook’s troubles are everyone’s problem

Like many people in our industry, I have been watching with growing discomfort Facebook getting beaten up on the public markets, including yesterday’s new $17.729 low. Regardless of the reason (the company remained private during its phase of hypergrowth, nobody reading the S-1, etc.) and leaving aside the borderline silly hunt for a culprit (Zuck is immature, the CFO badly overestimated demand, NASDAQ botched the IPO, etc.), the point is that the darling of an entire industry, the most prominent of all emerging internet franchises, the one that epitomized social media at its most powerful, ended up colliding with, and so far losing to, the cold cynicism of public markets.

The ripple effects are starting to be felt throughout the industry, as they trickle down the ecosystem. Evernote for example announced last week that it was going to delay its IPO until at least 2015. Further down the chain, anecdotal evidence around me seems to indicate that VCs are being increasingly cautious and picky when evaluating consumer internet plays (broadly defined: social, local, mobile, etc.), particularly when there is not a very apparent business model in place or in the works. I’m not sure if it’s the end of an era just yet, but the Facebook IPO that was supposed to create a brand new virtuous circle (Facebook buying a bunch of startups, and Facebook millionaires angel investing in hundreds of others) is so far proving to have the exact opposite effect on the industry.

While nobody is thrilled about what is happening, I hear two types of “rationalizations” around me:

It’s not entirely bad that things cool off a little bit – there are just too many startups being created in those consumer areas, too much angel and VC money floating around, valuations that don’t make sense, not enough technical talent to support the whole thing, etc.
As an industry, we’ll all be fine because things have been heating up on the enterprise tech side. Public markets have been much more accepting of the enterprise tech plays (Splunk, ServiceNow, Palo Alto Networks all did very well in their IPOs). For every Instagram, there seems to be a Nicira type acquisition. Box.net and its young CEO are the object of the type of hype (and investor funding) typically reserved for successful consumer plays. The combination of the cloud and big data trends has many commentators excited. Some see the beginning of a 20-year cycle of innovation in enterprise IT.

The first point is complex and would deserve its own post (hopefully sometime soon). As to the second point, as much as I’m a big fan of enterprise tech and agree that it’s probably where the action is going to be in the next few years, the above fails to reassure me, as I just don’t see an enterprise tech boom developing independently from a strong consumer internet sector, long term:

Enterprise tech does not have the gravitational pull of the consumer internet. Because you can touch it, feel it, experience it, everyone can relate to the consumer internet. And because it is very visible, the young entrepreneurs who succeeded at it not only made fortunes in a short amount of time, but became pop culture icons in the process (complete with movies, Gap ads, etc.). Rightly or wrongly, this has created all the excitement around tech that we now take for granted. Some of it perhaps led to unwanted attention (not sure that, as an industry, we need Justin Bieber to be angel investing, as much as I’m a fan…), but arguably this has drawn into the industry a lot of talent and money that has lifted all boats. What happens when this interest subsides? Enterprise tech sorely lacks sex appeal: it is complicated, obscure, behind the scenes. You pretty much can’t be an outsider to the tech industry and come up with a good idea. If exciting consumer tech projects stop being funded, will Wall Street techies still continue to migrate to startups? Will hundreds of thousands of people still feel an urge to learn to code? Will the broader public still care about tech?
Consumer internet companies have been driving, or at least influencing, innovation in enterprise IT over the last few years – whether it is actual technology (Amazon pioneering cloud computing, Google/Yahoo/Facebook/LinkedIn being driving forces in big data, etc.), the way enterprise tech is consumed by employees (social enterprise, Bring Your Own Device, etc.) or the way it has been sold to enterprises (freemium plays that bypass the CIO). What happens to this phenomenon, long term, if consumer internet is no longer driving innovation?
Consumer internet companies have proven to be great early customers of enterprise tech startups. Of course, you could argue that this “startups selling to other startups” does not make sense because at the end of the day it’s all funded by VC money. But the reality is that every enterprise tech startup needs early customers, and many consumer internet startups have proven to be more willing to use new, bleeding edge technologies – the hope being that, once an enterprise tech startup has a few success stories with startups under its belt, it makes it easier to “graduate” to Fortune 500 companies. When the internet bubble burst in the early 2000s, consumer tech companies went down first, but enterprise tech startups soon followed, because many of them essentially lost a chunk of their customer base. Could the same phenomenon occur today?

The stabilization of the Facebook stock price is hugely important for our entire industry. Tech blogs, true to form, are overall wildly optimistic and blame it on Wall Street “not getting it” (Techcrunch covered yesterday’s new low as a buying opportunity), but smart entrepreneurs, VCs and involved parties seem to be doing everything they can to stop the bloodbath (see the post by Fab’s Jason Goldberg here, for example), as they fully measure the severity of the issue.

Continuuity, Sailthru & Visual Revenue at the NYC Data Meetup

Here are some videos, slides and pics from the most recent NYC Data Business Meetup. The videos are unfortunately not of the greatest quality, but are good enough to watch.

Also, note to self: make sure that our audience of 200+ sits closer to the stage, so that the room doesn’t look tragically empty on camera (rookie mistake)!

In order of appearance:

1) Todd Papaioannou, CEO, Continuuity, a stealth big data startup, based in Palo Alto, CA and backed by Andreessen Horowitz, Battery Ventures, Data Collective and a number of high profile angels. Todd was previously Chief Cloud Architect for Yahoo.

2) Neil Capel, CEO, and Daniel Krasner, Chief Data Scientist, Sailthru, a New York based startup backed by RRE, AOL Ventures, Lerer Ventures, DFJ Gotham, Thrive Capital, Metamorphic, etc. Sailthru provides fully automated, 1:1 email and onsite recommendations using a unique behavioral targeting platform. Sailthru helps brands cut through the clutter and build trust with their customers by recognizing and acting upon their individual interests. Sailthru’s technology creates individual user profiles associated with each person’s email address and online behavior. Sailthru’s algorithms gauge each individual user’s intent and match appropriate content and frequency of email communications such that every email is tailored to the unique user. That means they send as many permutations of an email as there are recipients. All simultaneously, all automated and all in real time.

Sailthru’s slides (PDF)

3) Dennis R. Mortensen, CEO and Jeroen Janssens, Data Scientist, Visual Revenue, a New York based startup backed by Lerer Ventures, SV Angel, IA Ventures and Softbank. Visual Revenue increases front page performance for online media organizations. Their platform provides Editors with actionable, real-time recommendations on what content to place in what position right now and for how long. Visual Revenue’s predictive analytics technology allows media organizations to proactively manage the cost of exposing a piece of content on a front page, whilst maximizing the return they expect from promoting it.

Visual Revenue’s slides

4) Panel discussion and Q&A with the audience

Data-driven venture capital

I have been very intrigued by the recent emergence of “data driven” firms, aiming to use data to reinvent venture capital.

While they certainly review various data points and metrics before deciding to invest in a startup, as of today venture capital investors largely operate based on “pattern recognition” – the general idea being that, once you’ve heard thousands of pitches, sat on many boards and carefully studied industries for years, you become better than most at predicting who will make a strong founder/CEO, what business model will work and eventually, which startup will end up being a home run. The trouble is, the model doesn’t always work, far from it, and many VCs end up making the wrong bets, resulting in disappointing overall industry results. Could VCs be just like the baseball scouts described in Moneyball, who think they can spot future superstars because they’ve seen so many of them before, but end up being beaten by a cold, objective, statistics-based approach?

Enter several firms trying to do things differently:

Google Ventures has created various data-driven algorithms that inform their investment decisions – see the team discussing the concept at last year’s Web 2.0 Summit here.
Correlation Ventures raised $165M earlier this year for its first fund, which was reportedly oversubscribed (a rarity for a new fund). Correlation says it has built the “world’s largest, most comprehensive database of U.S. venture capital financings”, which covers “the vast majority of venture financings that took place over the past two decades, tracking everything from key financing terms, investors, boards of directors, management backgrounds, industry sector dynamics and outcomes”. Based on this data, Correlation has developed predictive analytics models which it uses to guide its investment decisions – as a result, it can make decisions very quickly (less than two weeks) and doesn’t require additional due diligence.
Just earlier this week, E.ventures (which results from the relaunch of BV Capital) also emphasized its own data-driven approach to investment decisions

Since I’m a big fan of anything data-driven (decisions, product, companies), the concept resonates strongly with me. Predictive analytics have been successfully used in various industries, from retail to insurance to consumer finance. Other asset classes are highly data driven – fundamental and technical analysis drive billions of dollars of trade; hedge fund quants spend their lives building complex models to price and trade securities; high-frequency trading bypasses human decision making altogether and invests gigantic amounts of money based solely on data. In this world where everything gets quantified, why should venture capital be an exception?

However, as much as I like the idea, I believe venture capital doesn’t lend itself very well to a model-heavy, quasi “black box” approach. The creation of a reliable, systematic predictive model is a particularly challenging task when you consider the following obstacles:

A relatively sparse data set: while by definition there’s not much data about early stage startups, you could argue that that amount is constantly increasing, as everything is moving online, and everything online can be measured. You could also argue that, if you could have access to all historical data from all VC firms in the country, and efficiently normalize it, you would end up with a lot of data. But still that amount of data would pale in comparison to what’s available to public market investors – Bloomberg processes up to 45 billion “ticks” (change in the price of a security)… daily.
Limited intermediary feedback points: Before getting to a final outcome (game lost or won), baseball is full of small binary outcomes (a player hits the ball or he doesn’t). Similarly, in market finance, the eventual success of a strategy can typically be broken down into many different points with binary outcomes (you make money or you don’t). In venture capital, before getting to a final outcome (a startup has a liquidity event), it’s unclear how many of those intermediary, measurable points you get, that can enable you to build models – perhaps a few (the startup’s next round is an “up round” or a “down/flat round”) but certainly nothing compared to the above examples.
Extended time horizon: in baseball, the rules of the game do not change from game to game, or season to season. In venture capital, the “game” can last for years, because investments are highly illiquid. During that time, pretty much anything can change – regulatory framework, unforeseen disruptive forces in the industry, etc.

In addition, it would be interesting to see how startups react in the long run to investors who are interested in them mostly because they scored well on a model, as opposed to spending extended time getting to know them. Unlike public stock markets, venture capital fundraising is a two-way dance, and startups often pick their investors as much as their investors pick them.

However, while I have my doubts about using data models as valid predictors of the overall success of an early stage startup, my guess is that there are still plenty of interesting insights to be gleaned from the data, and that forward-thinking VC firms could gain a competitive advantage by actively crunching it – my sense is that very few firms have done so at this stage.

Interestingly, there are some good data sources and emerging technologies out there that could be leveraged as a first step, without engaging in a massive data gathering or technology development effort:

Public (and/or free) sources: Crunchbase is a great source of data. There are many directions you could go with mining it – as an example, see what Opani (an early stage NYC big data company) came up with here. I bumped into Semgel, a web app that has taken a stab at instantly gathering and analyzing Crunchbase data. The Crunchbase data could be augmented with data from marketplaces such as Factual. See also this intriguing article about how pre-money valuations of startups (typically not information that’s disclosed) could possibly be mined from publicly available Delaware certificates of incorporation and similar documents in other states.
Private Databases: There are a few interesting databases that collect and organize more complex information flows around private companies, such as CB Insights (which also offers a data-driven tracking tool called Mosaic).
Technologies: In addition to the various open-source big data tools, there are some technologies/companies that could be leveraged to mine VC industry data, including for example Quid, co-founded by the talented Sean Gourley – “understanding co-investment relationships and deriving investment strategies” is one of the challenges they address.

If anyone is aware of other efforts around crunching data relevant to VCs, or other ways VCs have used a heavily data-driven approach, I’d love to hear about it in the comments.

Some thoughts on Brewster

Three days in, Brewster, the new personalized address book, has become an instant classic for me. Perhaps I lucked out, but I didn’t experience much of the delay in processing my contacts that many others reported – I had to wait about 90 minutes which, while not ideal, was fine. Everything since then seems to have been working like a charm – the de-duplication and reconciliation of contacts across social networks, in particular, was beautifully done, and that’s not a trivial data problem.

I have always liked the concept of a personalized, always current address book. In a way, it is sort of like the old Plaxo idea, which was probably before its time. There were various startups that tried to fix the address book, including Sensobi (that eventually was acquired by GroupMe). The next iteration of the social concept that I’m aware of is Everyme – at least in the initial vision the founders had for it when they were at Y Combinator in the Summer of 2011. I was a bit bummed when it pivoted (or evolved) to become a private social network.

I really like that Brewster came out of the gate very “feature-rich”. While I’m all for MVPs and generally agree that “if you are not embarrassed by the first version of your product, you’ve launched too late”, for something like this, I think the founder(s) made the right call to wait until the product was ready before launching. At this stage of the game, anything that sounds like yet another hyped up app, and asks me to connect all my social networks when I first log in, etc. had better deliver some real value quickly for me to give it a real chance, and that was the case here. As the founder Steve Greenwood has apparently been mulling over this concept for many years, the temptation to release early must have been strong, particularly as it sounds like several startups are working on related concepts, including for example FullContact, but from my user’s standpoint, it was well worth it.

A few other aspects of the product (and its launch) that I like:

– I like that Brewster was clearly thought through as a data product – while the “Favorites” tab has an emotional and aesthetically pleasing aspect to it (depending on how attractive one’s friends are, at least…), the rest of the app is very data-centric: the “Lists” tab has some interesting automatic categorization (I have 171 friends who are “Managing Director”, apparently, does that mean I’m old?), while the “Search” tab is awesome, with good suggested searches and the ability to uncover all sorts of interesting common interests across my contact list.

– While everything is automated, I like the fact that the product made me work manually to create my list of “favorites”. That actually increased my personal investment into the product, and makes me less likely to discard it.

– I really like that Brewster did not use any of the tired “virality” tricks that have become so commonplace. No automatic posting on my Facebook newsfeed; no “Sent using my Brewster address book” tag line in emails, etc.

– I was impressed with the email I got to announce that my account was ready, personalized with pictures of some of my key contacts – great way of delivering a unique experience before I even started using the product in earnest.

The data privacy issue (and the fairly dramatic reactions to it) is of course a concern. I’m actually surprised that I don’t care more about it, personally – I guess I have gone pretty far down the path of accepting some privacy risk (as long as it’s not banking information), in return for getting a lot of value from the product, which I feel is the case here. But obviously many people will feel differently, and this could sink the company entirely, if not properly addressed.

One functionality that I don’t find as impressive, at least as of now, is the “Updates” section — what it has surfaced so far (birthdays essentially) is not particularly interesting. What would be really cool, eventually, would be an integration with Newsle, to get news about your friends. Oh wait, add to this an integration with Cue, as well. All built in natively into my iPhone address book and calendar. Ok, so, maybe that’s a bit much to ask. In the meantime, Brewster is already one of the most interesting apps I have seen in a long time.

A chart of the big data ecosystem

My colleague Shivon Zilis has been obsessed with the Terry Kawaja chart of the advertising ecosystem for a while, and a few weeks ago she came up with the great idea of creating a similar one for the big data ecosystem. Initially, we were going to do this as an internal exercise to make sure we understood every part of the ecosystem, but we figured it would be fun to “open source” the project and get people’s thoughts and input.

So here is our first attempt.

A few things became apparent very quickly:

1) Many companies don’t fall neatly into a specific category

2) There are only so many companies we can fit on the chart – subcategories such as NoSQL or advertising applications, for example, would almost deserve their own chart.

3) The ecosystem is evolving so quickly that we’re going to need to update the chart often – companies evolve (e.g., Infochimps), large vendors make aggressive moves in the space (VMware with Serengeti and the Cetas acquisition)

What do you think? (click on the bottom right to expand)

There are many roads to success: The Buddy Media example

It’s been a few days now since their acquisition was formally announced, and I continue to be fascinated by the Buddy Media story. But what fascinates me is less the company itself and all the things that make it great than the fact that its success tests the conventional wisdom of what makes a venture successful. Rightly or wrongly, investors, prospective employees, the press, and anyone who tries to predict the highly unpredictable fate of startups, tend to default to some common assumptions about what’s going to work and what isn’t. The Buddy Media story challenges that conventional wisdom in some interesting ways:

1. NYC is not a good place to start an enterprise software company

It’s a bit ironic that, for all the talk about NYC being a media and eCommerce hub, the largest acquisition in five years would be an enterprise software company.

2. It takes forever to build a successful enterprise software company.

It took Buddy Media less than 5 years from start to success, including an initial pivot.

3. To build a successful enterprise software company, you need technical co-founders, or at least a technical CEO

Buddy Media’s CEO is a serial entrepreneur with two degrees in journalism. Buddy Media’s COO is a serial entrepreneur with a background in business development and marketing and a degree in economics. The other co-founder and Chief Strategy Officer is a digital branding and marketing expert with a degree in Broadcasting and Mass Media.

4. Selling to marketers and advertisers is a really tough business.

Fortune 500 marketers and advertising agencies are indeed a tough audience – long sales cycles, often low budgets, a preference for homegrown solutions, a reluctance to buy what others in the industry purchase: not easy. But the Buddy Media success shows that it can be done, with the right execution: build the best product in your category, focus on sales, make friends in the right places, hire some key people from agencies, and work really hard.

5. Be really careful with strategic money

Buddy Media took a strategic investment from advertising leader WPP, which ended up substantially accelerating their business.

6. Service companies can’t become product companies

After an initial pivot, Buddy Media had to turn themselves into a service company to survive the 2008 economic recession. James Altucher has a really interesting post on Techcrunch that describes this phase. Somehow, they were able to gradually build a product offering.

7. The best founders are young and single

Two of the co-founders of Buddy Media are married. On top of that, they have three children. While there are famous examples of home runs started by married founders (Cisco, VMware, etc.), in my experience, behind closed doors most investors think it’s a terribly risky idea. The Buddy Media story shows that where there is a will there is a way: founders with family obligations can still endure the rollercoaster lifestyle of the startup world.

VoltDB, Datastax, RJ Metrics and Custora at the NYDBM

Here are the presentations from the NYC Data Business Meetup on May 21:

VoltDB – Presenter: Scott Jarr, co-founder

Datastax – Presenter: Matt Pfeil, co-founder

RJMetrics – Presenter: Robert J Moore, co-founder and CEO

Custora (presentation coming soon) – Presenters: Corey Pierson, co-founder, and Aaron Goodman, data scientist.

And here are a few pics!

Hope to add videos soon.

Thoughts, feedback, questions? Topics you’d like to discuss at the next NYDBM? (or data-related stuff you’d like to discuss, regardless of whether you attend the NYDBM or not?). Feel free to opine in the comments section.

“The business of data” panel with IA Ventures, Klout & Quid

Just got a copy of the video of a panel we did a few months ago at the Bloomberg Link Empowered Entrepreneur conference. It features Roger Ehrenberg of IA Ventures, Joe Fernandez of Klout and Sean Gourley of Quid. The speakers are terrific and it’s a solid introduction to the topic — since this panel was part of a broader entrepreneurial conference, it is slightly higher level than panel conversations you’d hear in specialized Big Data conferences.

Data Science Hackathon: Pictures

In connection with Big Data Week and Data Science London, we helped organize and host a global data science hackathon that took place simultaneously in various locations around the world (including London, Sydney, and San Francisco) on April 28, 2012.

Knewton, Bundle, Next Big Sound & Bloomberg: Pictures

Pictures of the NYC Data Business Meetup #5 (April 23, 2012). Presenters: Alex White, Co-Founder and CEO, Next Big Sound; David Liu, COO, and Jesse St Charles, Data Scientist, Knewton; Shawn Edwards, CTO and Andrew Paprocki, Senior Engineer, Bloomberg LP; Phil Kim, CTO, Bundle.

The thriving data ecosystem in NYC

There’s a lot of interest in data-related businesses and products everywhere these days, but it’s been particularly fun to see things accelerating in New York (where I’m based). Some purely anecdotal evidence: We had 50 very qualified data scientists show up at the recent hackathon we organized (as part of Big Data Week), despite the ungodly start time of 8am on a Saturday. The Data Meetup I host monthly went from 0 to almost 1,300 members in barely 5 months. General Assembly is starting a 10-week intensive program in data science. Microsoft just announced it chose to locate its new research lab in NYC, which includes plenty of data science brainpower (including machine learning specialist John Langford and Jake Hofman, formerly of Yahoo Research).

NYC is becoming a real “hub” for data startups. In fact, in my opinion data startups are becoming the next “layer” of the NYC tech scene — the way content and advertising startups (24/7, Doubleclick, Silicon Alley Reporter, etc.) were the foundational layer of “Silicon Alley” from 1995 to 2005, and the way social and e-commerce startups (Tumblr, Gilt, Foursquare, Etsy, Warby Parker, Rent the Runway, etc.) became the next building block that led to where we are today.

Due to their often intensely technical nature, data startups represent an interesting opportunity for NYC to develop more of a scientific and engineering-focused startup culture.

NYC has the key components of a thriving data startup ecosystem, including:

1) Customer demand: For those startups that sell to enterprises rather than consumers, NYC is where many of the key buyers are located – specifically, Wall Street and Madison Avenue, which have been among the most voracious and sophisticated users of data. It’s no accident that some of the key conferences in the space, such as GigaOm’s Structure:Data or Strata, take place in NYC (or have an NYC event in addition to their CA event) – there’s no better place for emerging vendors to show off what they’ve built to potential purchasers.

2) A relevant talent pool: in addition to solid engineering talent, data-driven startups need data scientists, who come in various flavors: statisticians, mathematicians, machine learning experts, programmers, etc. In part because there has been demand for this type of profile for a while in financial services, there’s a fair concentration of them in NYC, and I’m seeing an increasing number of them making the jump to startup land. NYC has a number of prominent data scientists, including (but certainly not limited to) Drew Conway and Jake Porway (both of whom are co-founders of Datakind, f/k/a Data without Borders), Max Shron, Cathy O’Neil (who left D.E. Shaw for a startup, Intent Media), Gilad Lotan, etc. And of course, we have our very own emerging media star (deservedly so) in the person of Hilary Mason, most recently profiled here.

3) A data community: Whether it’s Data Drinks or meetups, there’s clearly appetite for data nerds to get together and geek out. Both the NYC Predictive Analytics meetup (organized by Alex Lin) and the NYC Machine Learning meetup (organized by Paul Dix and Max Khesin) have over 2,000 members, while the New York Open Statistical Programming Meetup has 1,700 members.

4) Investors with a deep interest in the space: As far as I know, IA Ventures is the only VC firm in the country that has an exclusive focus on data as an investment thesis (Accel’s big data fund is a little different, in that it’s a dedicated pool out of a much larger fund). Roger Ehrenberg and his talented team (Brad Gillespie, Ben Siscovick, Jesse Beyroutey) are having a tremendous impact on the data world in general, and in NYC in particular (about half of their portfolio is NYC-based). RTP Ventures is a new but very promising NYC investor in the space, with a focus on the infrastructure part of the big data world. Many of the main NYC investors are also “data friendly”, and have interesting data plays in their portfolio, as part of a broader focus: Union Square Ventures, Betaworks (see John Borthwick’s “data is the new plastic”), RRE, Lerer Ventures, Thrive Capital, kbs+ Ventures, but I’m sure I’m forgetting a number of others.

5) Universities that are willing to get involved: The key machine learning centers in the country may be Carnegie Mellon, MIT and Stanford, but Columbia is strong as well, and most importantly, there are some terrific professors who are both academically prominent and deeply involved in the NYC tech scene – in particular Chris Wiggins (in addition to being a prominent machine-learning expert, Chris is also the co-founder of HackNY and has mentored many of the data scientists currently employed in NYC startups) and Tony Jebara (who runs the Columbia Machine Learning Laboratory and has also founded and advised several startups including Sense Networks and Bookt). NYU has some leading authorities in the data-intensive field of physical computing and Internet of Things: Tom Igoe and Dan O’Sullivan. Medium term, Cornell may be able to bring some additional academic expertise to NYC (for example, it is home to Thorsten Joachims, who is arguably one of the top SVM researchers).

6) A crop of promising data startups:

A growing number of NYC-based startups offer data and predictive analytics solutions – starting perhaps with Opera Solutions, which very few people in the NYC tech scene had heard about until it raised a whopping $84 million in September 2011 from Silver Lake and Accel KKR (Opera Solutions employs some 150 data scientists, out of 400 employees). In addition, NYC startups have been building all sorts of interesting data and analytics products for social media (Bitly, SocialFlow, Kno.des), news (Visual Revenue), finance (Dataminr), music (NextBigSound, which is moving to NYC), sports (Numberfire, and our own Bloomberg Sports) and of course advertising and marketing (Sailthru, Collective[i], Custora, PlaceIQ, YieldBot, Mediamath, m6d, 33across, Clickable, Buddy Media, etc.).
While we’re nowhere near Silicon Valley on this front, it’s great to see more big data infrastructure companies in NYC – some like 1010Data largely predate the whole big data craze; others have been appearing more recently, including FluidInfo, CrowdControl, Mortar Data (which is moving to NYC), Datadog, and of course 10Gen, whose MongoDB NoSQL database is quickly becoming a must-have for a number of data-driven companies.
Finally, several exciting NYC startups are focused on the application of data to create disruptive products in various industries, such as education (Knewton) or consumer finance (Billguard, Bundle).
The fact that NYC recently saw a couple of acquisitions of data startups – Chris Dixon’s Hunch and Jordan Cooper’s Hyperpublic – doesn’t hurt either.

7) A data-centric business culture: perhaps it is because some of the key historical entrepreneurial successes in NYC were data companies (Bloomberg LP, Nielsen); or perhaps it is a reflection of the demand of East Coast investors who arguably tend to be very focused on metrics and business models (as opposed to pure vision)… but somehow, as far as I can tell, there’s always been a real culture around data and analytics in NYC. Now increasingly, I hear CEOs of NYC startups present their companies as data companies, even those you wouldn’t necessarily suspect (recent examples include Dennis Crowley of Foursquare and Yaron Galai of Outbrain). In addition, NYC startups have been quick to build data science teams, including many that don’t explicitly position “data” as a key part of their value proposition: Etsy, Gilt, The Ladders, GetGlue, Foursquare, Tumblr all have data scientists on board.

All of this is just a start, and I’m excited to see how it all progresses in the next few months and years.

Working with big companies: 10 practical tips for startup founders

If you’re a startup founder, a day will come, generally sooner rather than later, when you have to deal with a big company – whether that’s a Fortune 1000 company you’re trying to sell something to, a large advertising agency you hope will drive some advertising dollars your way, or a media company you want to strike a partnership with. You may have been successful “selling” your vision to a group of angels or VCs and raising some money, but selling to a large company (and in “selling”, I include all forms of partnerships and other business development efforts) is a different animal altogether. For the last three years or so, I’ve been on the big company side and have heard many startups pitch a business relationship. The following tips – some trivial and slightly tongue in cheek, others more substantial – are based on patterns and issues I’ve seen over and over (presented in no particular order of priority).

1. Read a few sales books. Our startup culture very much celebrates product people (those who, like Steve Jobs, are able to create “stuff that people want”), and tech talent (the people who actually build the product). Some iconic founders of the previous decades may have been sales people (Larry Ellison being one example), but sales seems to have fallen out of favor as a key part of any founder’s skillset – you’ll hardly ever hear any VC express a strong preference for CEOs who have a strong sales background, the way they rave about technical CEOs. One consequence of this is that many startup founders (especially the younger ones, but not only) come equipped with incredible product savvy, but seem to have spent very little time learning how to sell. Selling to a large company is not just a question of presentation, it’s also about navigating a complex organization and being able to qualify an opportunity – I find that startup founders often fail to ask some basic questions about budgeting cycles, decision process, etc., and end up not being able to truly appreciate whether they have a real opportunity or not. The good news is, there’s tons of literature out there. Sure, many sales books have cringe-worthy titles, and none of them is perfect, but after reading a few of those, a number of important principles emerge. It’s worth investing some time reading a few.

2. Hire a sales person. This flows from the previous point. No question, founders should be the initial sales people at their startups: there’s no better way to get market feedback and fine tune your value proposition. But I’m often surprised by how long startups wait before bringing their first sales (or “biz dev”) person on board. I’m all in favor of building viral buzz about your product through blogs and social media, and using other innovative techniques to increase your inbound leads. But in many cases, all those techniques only take you so far, and at some point, particularly if you sell to large companies, you’re going to have to go through a series of in-person sales meetings. As with most things in life, experience helps, and in my opinion, a combination of an experienced sales (or biz dev) person and a product/tech founder proves very effective in a meeting with a big company.

3. Play down the startup vibe. Sure, Zuckerberg wears sandals in business meetings (at least in the movie), but for almost everyone else, there’s very little upside to showing up in a hoodie and sneakers at a meeting where most people are going to wear ties and business suits. The startup tech world may have its own, idiosyncratic rules and codes, but when it comes to selling to a large company, it’s back to reality. Those people don’t read TechCrunch every day (if ever), and they will most likely have no idea who your hotshot investors are. Do yourself a favor and don’t show up looking like a college kid, because in large companies, recent college graduates are the people at the very bottom of the totem pole, not the people you invest a significant amount of time and resources doing business with.

4. Do not play with your phone during the meeting. Ever. Can’t tell you how often this happens. It typically goes down something like this. Three founders or team members of the startup show up at the meeting. One ends up being the lead presenter and fielding most of the questions. Another one says something every 10 minutes or so, and looks bored and disengaged the rest of the time. The third one spends the entire meeting playing with their iPhone (and I’m not talking about taking meeting notes). Guys, seriously? Remember, people will rarely call you on it, but that doesn’t mean you got away with it.

5. Come prepared. This should be obvious, but it’s not, and if you are truly well prepared for a meeting, it can be a major differentiator playing in your favor. Unlike an investor pitch, a pitch to a big company is not about you and how awesome your product is. It’s about how you can help the big company solve a specific problem. You should come in knowing everything you can read about the big company, and have ideas about where and how you could help. If you show slides, insert screenshots of the big company’s products in your deck. Your demo should be using examples relevant to the big company, or at a minimum, to whatever industry the big company operates in. If you truly want to make an impression, build a simple demo just for the meeting. Regardless, your demo should work – send the link to your host in advance to make sure the demo is going to work from within the firewall. If this is a key meeting for you, you should spend days, not hours, prepping for it.

6. A sales meeting is not a mentoring session. Again, something I’ve seen a few times – your big company hosts may be really friendly, they may show genuine interest in hearing your story, and they may have experience that is very relevant to your needs and challenges. Listen to their feedback and ideas if they offer them, but remember that you’re in a sales meeting. Resist the temptation to ask questions about how they were able to solve specific problems, or what they’d do if they were you, etc. Overall, one of the things they’re trying to establish on their end is that, while you’re a startup, you’re already pretty solid and it’s generally safe for them to work with you. Asking this type of question is not going to help.

7. Be patient. Getting stuff done with a big company takes time. Most people understand that, but they often attribute it to the fact that large companies have more processes, approval layers, lawyers involved, etc. That is certainly true. But the more fundamental (and scarier) reason for it is that large companies typically care about whatever business deal you’re discussing a lot less than you do. Entering into a business relationship with a startup rarely moves the needle for them, at least not in a substantial and immediate way. They can have a genuine interest in working with you, but they rarely feel true urgency. Of course, from the startup’s perspective, it is frustrating because that deal can be a make or break moment. Most techniques you may use to create a sense of urgency are likely to come across as a gimmick, and should be used with caution. Recognize that reality and plan accordingly.

8. Don’t beat a dead horse. Being patient doesn’t mean that you should wait around forever. For some reason, large companies, just like some VCs, rarely actually say “no”. If there’s a fit, it should be fairly obvious after a couple of meetings. If you’re going from meeting to meeting with different people each time, if getting people to return your calls feels like pulling teeth, or if they mention how working with you may make sense for a big project that they’re going to start next year, it’s unlikely anything substantial is going to happen for you anytime soon. Learn to recognize when the opportunity is not going anywhere, and move on.

9. Don’t forget the biz dev guy (or gal). You’ve probably heard (or experienced) how things work before: your first point of contact at a large company will typically be their business development people; while it’s perfectly fine to start with them, you’ll want to quickly find an internal champion, preferably someone who has P&L responsibility and real decision power. The twist I would add is this: don’t drop the biz dev person once you’ve found your champion. The person you thought was going to be your champion may not turn out to be the right person, they may get too busy, change jobs or lose interest. The biz dev person is your long-term ally within the large company, because their performance is often measured internally based on the number of opportunities they bring in that end up with an actual deal – so their interests are aligned with yours and they’ll want to make you successful.

10. Be careful with big company politics. Like any human organization, large companies are fraught with politics and as an outsider, it’s unlikely you’re going to be able to play them right. So stay away as much as you can. Be transparent, keep everyone in the loop, don’t make comments about people, don’t take sides. Don’t shop around from department to department in the hope you’ll find someone who’ll bite, unbeknownst to all the other people you’ve met with previously. Don’t suggest strange things like making intros between employees of the same company (in most companies, they can pick up their phone). Invest time in getting to know your contacts at the big company, and make them look good in front of their bosses.