Today our portfolio company HyperScience is coming out of stealth and talking a bit more about what they’ve been working on for the last couple of years. We have been involved for a little while already as lead Series A investors, and we are excited to be joined today by our friends at Felicis, a great addition to a strong syndicate from both coasts that also includes Shana Fisher (Third Kind) who led the seed, AME Cloud Ventures, Slow Ventures, Acequia, Box Group and Scott Belsky. The company is announcing today a total of $18M in Series A investment.
HyperScience offers AI solutions targeting Global 2000 corporations and government institutions. Their products enable those customers to automate or accelerate a lot of dusty back office processes, particularly those that involve the manipulation and triage of large amounts of documents and images.
Over the last few months, the usual debate around unicorns and bubbles seems to have been put on hold a bit, as fears of a major crash have thankfully not materialized, at least for now.
Instead another discussion has emerged, one that’s actually probably more fundamental. What’s next in tech? Which areas will produce the Googles and Facebooks of the next decade?
What’s prompting the discussion is a general feeling that we’re on the tail end of the most recent big wave of innovation, one that was propelled by social, mobile and cloud. A lot of great companies emerged from that wave, and the concern is whether there’s room for a lot more “category-defining” startups to appear. Does the world need another Snapchat? (see Josh Elman’s great thoughts here). Or another marketplace, on-demand company, food startup, peer to peer lending platform? Isn’t there a SaaS company in just about every segment now? And so on and so forth.
One alternative seems to be “frontier tech”: a seemingly heterogeneous group that includes artificial intelligence, the Internet of Things, augmented reality, virtual reality, drones, robotics, autonomous vehicles, space, genomics, neuroscience, and perhaps the blockchain, depending on who you ask.
As we are perhaps reaching the end of a cycle of innovation in tech – the one that resulted from the simultaneous emergence of social, mobile and cloud – and collectively pondering what’s next, one of the areas I’ve found particularly exciting recently is the intersection of Big Data and life sciences.
A little over two years ago, in connection with my investment in Recombine, a genomics startup, I wrote (here) about another powerful combination of trends: the sharp drop in the cost of sequencing the human genome, the maturation of Big Data technologies, and the increasing commoditization of wet lab work.
The fundamental premise was, and still very much is, as follows:
In a tech startup industry that loves its shiny new objects, the term “Big Data” is in the unenviable position of sounding increasingly “3 years ago”. While Hadoop was created in 2006, interest in the concept of “Big Data” reached fever pitch sometime between 2011 and 2014. This was the period when, at least in the press and on industry panels, Big Data was the new “black”, “gold” or “oil”. However, at least in my conversations with people in the industry, there’s an increasing sense of having reached some kind of plateau. 2015 was probably the year when the cool kids in the data world (to the extent there is such a thing) moved on to obsessing over AI and its many related concepts and flavors: machine intelligence, deep learning, etc.
Beyond semantics and the inevitable hype cycle, our fourth annual “Big Data Landscape” (scroll down) is a great opportunity to take a step back, reflect on what’s happened over the last year or so and ponder the future of this industry.
In 2016, is Big Data still a “thing”? Let’s dig in.
In the furiously competitive world of tech startups, where good entrepreneurs tend to think of comparable ideas around the same time and “hot spaces” get crowded quickly with well-funded hopefuls, competitive moats matter more than ever. Ideally, as your startup scales, you want to not only be able to defend yourself against competitors, but actually find it increasingly easy to break away from them, making your business more and more unassailable and leading to a “winner take all” dynamic. This sounds simple enough, but in reality many growing startups, including some well-known ones, experience exactly the reverse (higher customer acquisition costs resulting from increased competition, core technology that gets replicated and improved upon by competitors that started later and learned from your early mistakes, etc.).
While there are various types of competitive moats, such as a powerful brand (Apple) or economies of scale (Oracle), network effects are particularly effective at creating this “winner take all” dynamic, and have been associated with some of the biggest success stories in the history of the Internet industry.
Network effects come in different flavors, and today I want to talk about a specific type that has been very much at the core of my personal investment thesis as a VC, resulting from my profound interest in the world of data and machine learning: data network effects.
A few days ago, I was invited to speak at a Yale Entrepreneurship Breakfast about one of my favorite areas of interest, Artificial Intelligence. Here are the slides from the talk: a primer on how AI rose from the ashes to become a fascinating category for startup founders and venture capitalists. Very much a companion to my earliest post about our investment in x.ai. Many thanks to my colleague Jim Hao, who worked with me on this presentation.
Note: This post appeared on VentureBeat, here.
It’s been almost two years since I took a first stab at charting the booming Big Data ecosystem, and it’s been a period of incredible activity in the space. An updated chart was long overdue, and here it is:
(click on the arrows at the bottom right of the screen to expand)
A few thoughts on this revised chart, and the Big Data market in general, largely from a VC perspective:
Getting crowded: Entrepreneurs have flocked to the space, VCs have poured money into promising startups, and as a result, the market is starting to get crowded. Certain categories like databases (whether NoSQL or NewSQL) or social media analytics feel ripe for consolidation or some sort of shakeout (which may have already started in social analytics with Twitter’s acquisitions of BlueFin and GNIP). While there will always be room for great new startups, it seems that a lot of the early bets in the broader infrastructure and analytics segments have been made at this stage, and the bar for success is getting higher – which doesn’t mean that VC money will stop pouring in. In terms of this specific industry chart, we’ve clearly reached the limit of how many companies we can fit on one page. I’m sure there are a number of great companies we either missed or didn’t have enough space to include – apologies in advance to those, and I’d love to hear people’s thoughts and suggestions in the comments section about who else should be included.
Still early: Overall, we’re still in the early innings of this market. Over the last couple of years, some promising companies failed (for example: Drawn to Scale), a number saw early exits (for example: Precog, Prior Knowledge, Lucky Sort, Rapleaf, Nodeable, Karmasphere, etc.), and a handful saw more meaningful outcomes (for example: Infochimps, Causata, Streambase, ParAccel, Aspera, GNIP, BlueFin labs, BlueKai). Meanwhile, some companies seem to be reaching significant scale, and have raised spectacular amounts of money (for example, MongoDB has now raised over $230M, Palantir almost $900M and Cloudera $1B). But overall, we’re still early in the curve in terms of successful IPOs (Splunk or Tableau notwithstanding) and large exits, although the big companies are getting more acquisitive in the space (Oracle with BlueKai, IBM with Cloudant). In many segments, startups and large companies are jockeying for position and no obvious leader has emerged.
Hype, meet reality: A few years into a period of incredible hype, is Big Data still a thing? While less press worthy, the next couple of years are going to be hugely important for this market, as corporations start moving Big Data projects from experimentation to full production. While they will lead to rapidly increasing revenues for some Big Data vendors, those deployments will also test whether Big Data can truly deliver on its promise. Meanwhile, the fundamental need for Big Data technology keeps increasing, as the deluge of data keeps accelerating, powered in part by the rapidly emerging Internet of Things industry.
Infrastructure: Hadoop seems to have solidified its position as the cornerstone of the entire ecosystem, but there are still a number of competing distributions – this will probably need to evolve. Spark, another open source framework that builds on top of the Hadoop Distributed File System, is getting a lot of buzz right now because it promises to fill in the places where Hadoop has been weak, namely interactive speeds and good programming interfaces (and early signs seem to point to fulfilling that promise). Some themes (for example, in memory or real time) continue to be top of mind; others are appearing (for example, there’s a whole new generation of data transformation/munging/wrangling tools, including Trifacta, Paxata and DataTamer). Another key discussion is whether enterprise data will truly move to the cloud (public or private), and if so, how quickly. Many will argue that Fortune 500 companies will keep their data (and the software to process it) on premise for years to come; a generation of Hadoop-in-the-cloud startups (Qubole, Mortar, etc.) will argue that all data is moving to the cloud long term.
Analytics: This has been a particularly active segment of the Big Data ecosystem in terms of startup and VC activity. From spreadsheet type interfaces to timeline animations and 3D visualizations, startups offer all sorts of different analytical tools and interfaces, and the reality is that different customers will have different types of preferences, so there’s probably room for a number of vendors. Go to market strategies differ as well – some startups focus on selling tools to data scientists, a group that is still small but growing in numbers and budget. Others adopt the opposite approach and sell automated solutions targeting business users, bypassing data scientists altogether.
Applications: As predicted, the action has been slowly but surely moving to the application layer of Big Data. The chart highlights a number of exciting startups that are fundamentally powered by Big Data tools and techniques (certainly not an exhaustive list). Some offer horizontal applications – for example, Big Data powered marketing, CRM tools or fraud detection solutions. Others use Big Data in vertical specific applications. Finance and ad tech were always early leaders in adopting Big Data, years before it was even called Big Data. Gradually, the use of Big Data is spreading to more industries, such as healthcare and biotech (particularly in genomics) or education. This is only the beginning.
Many thanks to my FirstMark colleague Sutian Dong for doing a lot of the heavy lifting on this chart. My former colleague Shivon Zilis of Bloomberg Beta contributed immensely to prior versions of this chart.
3. Crowdsourced data. From Estimize (which crowdsources analyst estimates) to Premise (which crowdsources macroeconomic data through an army of people around the world equipped with mobile phones), a whole new way of capturing financial data has emerged. Quandl, a financial data search engine, has aggregated over 8 million financial and economic datasets through both web crawling and crowdsourced, community contributions. Once such a data platform has been built, could third party developers add analytic and visualization tools on top, essentially resulting in a crowdsourced “terminal” of sorts that would be reliable enough, at least for non mission critical, non real time use cases?
The field of bioinformatics is having its “big bang” moment. Of course, bioinformatics is not a new discipline and it has seen various waves of innovations since the 1970s and 1980s, with its fair share of both exciting moments and disappointments (particularly in terms of linking DNA analysis to clinical outcomes). But there is something special happening to the industry right now, accelerated by several factors:
• The cost of full genome sequencing has been dropping precipitously, in fact a lot faster than Moore’s law would have suggested. Illumina just released brand new machines that make the $1,000 full genome sequencing a realistic possibility. As a result, an extraordinary amount of data is going to become available at reasonable cost (5.5TB or 6.3 Billion bases… per patient).
• Big Data technology has had its own, separate evolution, and there is now an arsenal of tools to process and analyze massive amounts of data, at a comparatively cheap cost.
• Wet lab work has become a more standardized and increasingly automated process, considerably reducing the “friction” involved in collecting and processing physical samples. The cost of setting up biology labs, while still high, is starting to decrease, and molecular techniques are no longer the limiting step in genomic analysis.
As a result of the above, biology is rapidly evolving from being predominantly driven by traditional life sciences research to being largely driven by software and Big Data. This evolution considerably reduces the capital required to build a successful venture in the space. It also opens up the field to a new generation of startups run by interdisciplinary teams that have at least as much of a software and data science background as a biology background. A whole new world of bio-hackers is also emerging. From synthetic biology to personalized medicine, the possibilities are immense and the impact on our lives potentially unparalleled. It is entirely possible that the next generation of great entrepreneurs will be building “biology 2.0” companies, rather than mobile apps.
This opportunity has not been lost on entrepreneurs and the last 3 years or so have seen a rapid acceleration of startup creation, in a wide range of areas from diagnostics (Counsyl) to cloud platforms (DNANexus) to lab automation (Benchling, Transcriptic). Interestingly but not surprisingly considering the above, most of those startups are funded by technology, rather than life sciences, venture capital firms.
Today I’m excited to announce that FirstMark is partnering with Recombine, a New York based startup that very much operates at this intersection between software, Big Data and biology, as its lead Series A investor. Recombine’s CEO, Alex Bisignano, symbolizes this new generation of entrepreneurs who have deep knowledge in multiple technical fields. He has built around him a great, multidisciplinary team, and benefits from the deep industry knowledge and expertise of co-founder Dr. Santiago Munne, the owner of Reprogenetics and pioneer in pre-implantation genetic diagnosis.
Recombine’s core focus is the field of fertility and reproductive genetics, and it has had a spectacular early start with CarrierMap, its first product, generating a profitable multi-million dollar business with a comparatively small seed investment. The CarrierMap test is the most comprehensive, cost-effective carrier screen on the market, and has already helped thousands of couples to identify and mitigate the risk of passing on serious illnesses to their children. CarrierMap is sold exclusively through doctors and clinics; it is not a Direct to Consumer product (and therefore falls in a different category than 23andMe).
Beyond this initial focus, Recombine has ambitious plans to fully leverage Big Data technology to help decode the myriad aspects of our genome that are still not well understood. They have already obtained Institutional Review Board (IRB) approval for their first large-scale study, and the company is currently assembling a crack team of data scientists in New York City. If you have deep expertise in the data science field, this is an opportunity to help bring about a revolution in personalized medicine. Come join us!
Thomson Reuters CTO James Powell runs a great series of podcasts where he interviews people in the technology world about topics of relevance to his organization. I was fortunate to be invited to speak with James about the Internet of Things and Big Data, and it was a lot of fun. Below is the podcast, uploaded on SoundCloud. Thanks to James Powell and Dan Cost for the opportunity.
Some updates on the event/community front:
1) A little while ago, I changed the name of the data event I’ve been organizing from “NYC Data Business Meetup” to “Data Driven NYC”. I originally started the event mostly as an experiment, and didn’t give much thought to branding (so yeah, that was a terrible name). The event has now grown quite a bit (over 3,700 members as I write this), so it was time for a better name; also at this stage, it feels more like a community than “just” a meetup, so I wanted a name that reflected this reality.
2) Back in June, I launched a new community called “Hardwired NYC”. It covers startups, technologies and products at the intersection of the physical and digital worlds, including topics like 3D printing, Internet of Things, wearable computing, etc. I developed a strong interest in those areas through my involvement in the Big Data world – the Internet of Things, in particular, is deeply intertwined with Big Data (the proliferation of sensors has been contributing to the Big Data “problem”; equally the Internet of Things will be highly dependent on Big Data technologies if it is to deliver on its promise).
3) As Hardwired NYC is taking off fast (more than 700 members after just two events), I figured that both events/communities should have their own website with full video libraries, including for people who don’t live in New York and are interested in the content. So, with the great help of my FirstMark colleague Dan Kozikowski, this week I’m launching www.datadrivennyc.com and www.hardwirednyc.com. Both sites have a “Watch” section where, from now on, I will post pictures and videos of events (instead of posting them on this blog).
A few weeks ago, I was invited to do a couple of guest lectures at NYU (as part of the excellent “Ready, Fire, Aim” entrepreneurship class that Lawrence Lenihan, now my partner at FirstMark, has been doing for a while there) and at The New School (as part of a Big Data course organized by Debra Anderson and Greta Knutzen). Thought I’d share the slide deck I had prepared for those classes. Very much a Big Data 101 class for a college-level audience that had had little or no exposure to the key concepts prior to the class.