Data-driven venture capital

I have been very intrigued by the recent emergence of “data-driven” firms aiming to use data to reinvent venture capital.

While they certainly review various data points and metrics before deciding to invest in a startup, venture capital investors today largely operate on “pattern recognition”. The general idea is that once you’ve heard thousands of pitches, sat on many boards and carefully studied industries for years, you become better than most at predicting who will make a strong founder/CEO, which business model will work and, ultimately, which startup will end up being a home run. The trouble is that the model often doesn’t work, and many VCs end up making the wrong bets, which leads to disappointing results for the industry overall. Could VCs be just like the baseball scouts described in Moneyball, who think they can spot future superstars because they’ve seen so many of them before, but end up being beaten by a cold, objective, statistics-based approach?

Enter several firms trying to do things differently:

  • Google Ventures has created various data-driven algorithms that inform its investment decisions – see the team discussing the concept at last year’s Web 2.0 Summit here.
  • Correlation Ventures raised $165M earlier this year for its first fund, which was reportedly oversubscribed (a rarity for a new fund). Correlation says it has built the “world’s largest, most comprehensive database of U.S. venture capital financings”, which covers “the vast majority of venture financings that took place over the past two decades, tracking everything from key financing terms, investors, boards of directors, management backgrounds, industry sector dynamics and outcomes”. Based on this data, Correlation has developed predictive analytics models that it uses to guide its investment decisions; as a result, it can make decisions very quickly (in less than two weeks) and doesn’t require additional due diligence.
  • Just earlier this week, E.ventures (the firm that emerged from the relaunch of BV Capital) also emphasized its own data-driven approach to investment decisions.

Since I’m a big fan of anything data-driven (decisions, products, companies), the concept resonates strongly with me. Predictive analytics have been used successfully in various industries, from retail to insurance to consumer finance. Other asset classes are highly data-driven: fundamental and technical analysis drive billions of dollars of trades; hedge fund quants spend their lives building complex models to price and trade securities; high-frequency trading bypasses human decision-making altogether and invests gigantic amounts of money based solely on data. In a world where everything gets quantified, why should venture capital be an exception?

However, as much as I like the idea, I believe venture capital doesn’t lend itself very well to a model-heavy, quasi-“black box” approach. Building a reliable, systematic predictive model is particularly challenging when you consider the following obstacles:

  • A relatively sparse data set: by definition, there isn’t much data about early stage startups. You could argue that the amount is constantly increasing, as everything is moving online and everything online can be measured. You could also argue that if you had access to all historical data from all VC firms in the country, and could efficiently normalize it, you would end up with a lot of data. Still, that amount would pale in comparison to what’s available to public market investors: Bloomberg processes up to 45 billion “ticks” (changes in the price of a security)… daily.
  • Limited intermediary feedback points: before getting to a final outcome (game lost or won), baseball is full of small binary outcomes (a player hits the ball or he doesn’t). Similarly, in market finance, the eventual success of a strategy can typically be broken down into many individual points with binary outcomes (you make money or you don’t). In venture capital, before getting to a final outcome (a startup has a liquidity event), it’s unclear how many such intermediate, measurable points you get on which to build models: perhaps a few (the startup’s next round is an “up round” or a “down/flat round”), but certainly nothing comparable to the examples above.
  • Extended time horizon: in baseball, the rules of the game do not change from game to game, or season to season.  In venture capital, the “game” can last for years, because investments are highly illiquid.  During that time, pretty much anything can change – regulatory framework, unforeseen disruptive forces in the industry, etc.

In addition, it would be interesting to see how startups react in the long run to investors who are interested in them mostly because they scored well on a model, as opposed to spending extended time getting to know them.  Unlike public stock markets, venture capital fundraising is a two-way dance, and startups often pick their investors as much as their investors pick them.

That said, while I doubt that data models can reliably predict the overall success of an early stage startup, my guess is that there are still plenty of interesting insights to be gleaned from the data, and that forward-thinking VC firms could gain a competitive advantage by actively crunching it. My sense is that very few firms have done so at this stage.

Interestingly, there are some good data sources and emerging technologies out there that could be leveraged as a first step, without engaging in a massive data-gathering or technology development effort:

  • Public (and/or free) sources: Crunchbase is a great source of data. There are many directions you could go with mining it; as an example, see what Opani (an early stage NYC big data company) came up with here. I also came across Semgel, a web app that has taken a stab at instantly gathering and analyzing Crunchbase data. The Crunchbase data could be augmented with data from marketplaces such as Factual. See also this intriguing article about how pre-money valuations of startups (typically not disclosed) could possibly be mined from publicly available Delaware certificates of incorporation and similar documents filed in other states.
  • Private databases: there are a few interesting databases that collect and organize more complex information flows around private companies, such as CB Insights (which also offers a data-driven tracking tool called Mosaic).
  • Technologies: in addition to the various open-source big data tools, there are some technologies/companies that could be leveraged to mine VC industry data, for example Quid, co-founded by the talented Sean Gourley; “understanding co-investment relationships and deriving investment strategies” is one of the challenges they address.
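To make the co-investment idea a little more concrete, here is a minimal Python sketch that counts how often each pair of investors shows up in the same financing round. It assumes round-level data has already been gathered from a source such as Crunchbase and reduced to simple records; the record format, the company and fund names, and the `co_investment_counts` function are all hypothetical illustrations, not Quid’s method or any data provider’s actual schema or API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-made records standing in for real financing rounds.
# Each record lists the investors that participated in one round.
ROUNDS = [
    {"company": "StartupA", "round": "Series A", "investors": ["Fund X", "Fund Y"]},
    {"company": "StartupB", "round": "Seed", "investors": ["Fund X", "Fund Y", "Fund Z"]},
    {"company": "StartupC", "round": "Series A", "investors": ["Fund Y", "Fund Z"]},
]


def co_investment_counts(rounds):
    """Count how many rounds each pair of investors participated in together."""
    pairs = Counter()
    for r in rounds:
        # Deduplicate and sort so each pair has a single canonical (a, b) order.
        investors = sorted(set(r["investors"]))
        for a, b in combinations(investors, 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    # Print investor pairs, most frequent co-investors first.
    for (a, b), n in co_investment_counts(ROUNDS).most_common():
        print(f"{a} + {b}: {n} shared round(s)")
```

Pair counts like these are only a starting point: they could serve as edge weights in a co-investment graph, in which clusters of firms that frequently syndicate together become visible.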

If anyone is aware of other efforts to crunch data relevant to VCs, or of other ways VCs have used a heavily data-driven approach, I’d love to hear about it in the comments.