x.ai and the emergence of the AI-powered application

AI is experiencing an astounding resurrection. After so many broken promises, the term “artificial intelligence” had become almost a dirty word in technology circles. The field is now rising from the ashes. Researchers who had been toiling away in semi-obscurity over the last few decades have suddenly become superstars and have been aggressively recruited by the largest Internet companies: Yann LeCun (see his recent talk at our Data Driven NYC event here) by Facebook; Geoff Hinton by Google; Andrew Ng by Baidu. Google spent over $400 million to acquire DeepMind, a secretive two-year-old UK AI startup. The press and social media are awash with thoughts on AI. Elon Musk cautions us against its perils.
What’s different this time? As Irving Wladawsky-Berger pointed out in a Wall Street Journal article, “a different AI paradigm emerged. Instead of trying to program computers to act intelligently–an approach that hadn’t worked because we don’t really know what intelligence is– AI now embraced a statistical, brute force approach based on analyzing vast amounts of information with powerful computers and sophisticated algorithms.”  In other words, the resurgence of AI is partly a child of Big Data, as better algorithms (in particular, what’s known as “deep learning”, pioneered by LeCun and others) have been enabled by larger than ever datasets and the ability to process those datasets at scale at reasonable cost.

Continue reading “x.ai and the emergence of the AI-powered application”

The State Of Big Data in 2014: a Chart

Note: This post appeared on VentureBeat, here.

It’s been almost two years since I took a first stab at charting the booming Big Data ecosystem, and it’s been a period of incredible activity in the space. An updated chart was long overdue, and here it is:

A few thoughts on this revised chart, and the Big Data market in general, largely from a VC perspective:

Continue reading “The State Of Big Data in 2014: a Chart”

Can the Bloomberg Terminal be “Toppled”?

In the eyes of some entrepreneurs and venture capitalists, the Bloomberg terminal is a bit of an anomaly, perhaps even an anachronism. In the era of free information on the Internet and open source Big Data tools, here’s a business that makes billions every year charging its users to access data that it generally obtains from third parties, as well as the tools to analyze it. You’ll hear the occasional jab at its interface as reminiscent of the 1980s. And at a time of accelerating “unbundling” across many industries, including financial services, the Bloomberg terminal is the ultimate “bundling” play: one product, one price, which means that the average user uses only a small percentage of the terminal’s 30,000+ functions. Yet, 320,000 people around the world pay about $20,000 a year to use it.

Continue reading “Can the Bloomberg Terminal be “Toppled”?”

Recombine

The field of bioinformatics is having its “big bang” moment.   Of course, bioinformatics is not a new discipline and it has seen various waves of innovations since the 1970s and 1980s, with its fair share of both exciting moments and disappointments (particularly in terms of linking DNA analysis to clinical outcomes).  But there is something special happening to the industry right now, accelerated by several factors:

Continue reading “Recombine”

Thomson Reuters CTO Series (Podcast)

Thomson Reuters CTO James Powell runs a great series of podcasts where he interviews people in the technology world about topics of relevance to his organization.  I was fortunate to be invited to speak with James about the Internet of Things and Big Data, and it was a lot of fun.   Below is the podcast, uploaded on SoundCloud.  Thanks to James Powell and Dan Cost for the opportunity.

Launching New Sites for Data Driven NYC and Hardwired NYC

Some updates on the event/community front:

1) A little while ago, I changed the name of the data event I’ve been organizing from “NYC Data Business Meetup” to “Data Driven NYC”. I originally started the event mostly as an experiment, and didn’t give much thought to branding (so yeah, that was a terrible name). The event has now grown quite a bit (over 3,700 members as I write this), so it was time for a better name; also at this stage, it feels more like a community than “just” a meetup, so I wanted a name that reflected this reality.

2) Back in June, I launched a new community called “Hardwired NYC”. It covers startups, technologies and products at the intersection of the physical and digital worlds, including topics like 3D printing, Internet of Things, wearable computing, etc. I developed a strong interest in those areas through my involvement in the Big Data world – the Internet of Things, in particular, is deeply intertwined with Big Data (the proliferation of sensors has been contributing to the Big Data “problem”; equally, the Internet of Things will be highly dependent on Big Data technologies if it is to deliver on its promise).

3) As Hardwired NYC is taking off fast (more than 700 members after just two events), I figured that both events/communities should have their own website with full video libraries, including for people who don’t live in New York and are interested in the content. So, with the great help of my FirstMark colleague Dan Kozikowski, this week I’m launching www.datadrivennyc.com and www.hardwirednyc.com. Both sites have a “Watch” section where, from now on, I will post pictures and videos of events (as opposed to this blog).

Big Data 101 Presentation

A few weeks ago, I was invited to do a couple of guest lectures at NYU (as part of the excellent “Ready, Fire, Aim” entrepreneurship class that Lawrence Lenihan, now my partner at FirstMark, has been doing for a while there) and at The New School (as part of a Big Data course organized by Debra Anderson and Greta Knutzen).  Thought I’d share the slide deck I had prepared for those classes.  Very much a Big Data 101 class for a college-level audience that had had little or no exposure to the key concepts prior to the class.

Quantopian, Plaid and ZestFinance

Our February NYC Data Business Meetup was focused on the intersection of data and finance (both market and consumer finance).  Quantopian, Plaid and ZestFinance presented.

We also had a great panel presenting the customer perspective on Big Data (hype vs. reality), from the financial institutions’ viewpoint, with the following speakers: Mike Simone (Global Head of CitiData Platform Engineering), Emile Werr (Head of Enterprise Data Architecture, NYSE Euronext) and Raj Patil (up until recently Data innovation CTO at UBS, now an entrepreneur). Unfortunately, due to standard policy at some of those institutions, we can’t publicly post the video of the panel.

The slides are here: 

Quantopian

Plaid

ZestFinance

Here are the videos, in order of appearance:

Bloomberg App Portal:

Quantopian:

Plaid:

ZestFinance:

Panel:

Joseph Turian, Sqrrl, Infochimps and MemSQL

The December NYC Data Business Meetup was focused on big data infrastructure companies, with the co-founders of Sqrrl, Infochimps and MemSQL presenting to a full house.  We started the evening with a presentation by prominent data scientist Joseph Turian.

The slides are here: Joseph Turian, Sqrrl, Infochimps and MemSQL.

Here are the videos:

Intro

Joseph Turian, “How to do AI in 2013”

Oren A. Falkowitz, Co-Founder & CEO, Sqrrl

Dhruv Bansal, Co-Founder & Chief Science Officer, Infochimps

Eric Frenkiel, Co-Founder & CEO, MemSQL

And here are a few pics (photo credit: Shivon Zilis):

Recorded Future, Lex Machina, DataMarket and numberFire

The November NYC Data Business Meetup was focused on “vertical-specific” applications of big data – startups leveraging the big data stack to offer new solutions to specific industries, such as finance and government (Recorded Future), the legal industry (Lex Machina), energy (DataMarket, although it offers data sets for other industries as well) and sports (numberFire).

The slides are here: Recorded Future, Lex Machina, DataMarket and numberFire.

Here are the videos:

Christopher Ahlberg, CEO, Recorded Future:

Josh Becker, CEO, Lex Machina:

Hjálmar Gíslason, CEO, DataMarket:

Nik Bonaddio, CEO, numberFire:

Panel discussion:

Some pics:

IA Ventures, Accel, Data Collective, Precog and CCS at the NYC Data Business Meetup

Here are the videos from the NYC Data Business Meetup that was held on October 23, 2012, in order of appearance:

Jeff Carr, COO, Precog

Max Yankelevich, co-founder, CrowdComputing Systems

Roger Ehrenberg, Founder and Managing Partner, IA Ventures; Ping Li, General Partner, Accel Partners; Matt Ocko, Co-Founder and Partner, Data Collective (from left to right):

A chart of the big data ecosystem, take 2

So here we are again. My colleague Shivon and I had made a first attempt at making sense of the rapidly evolving big data ecosystem back in June. Based on some very helpful feedback from readers of this blog and others, a number of additional meetings with interesting startups and more in-depth research, we’ve come up with this second version.

Some thoughts:

  • It’s still a work in progress (and will presumably always be, that’s the nature of the beast)
  • It’s even more crowded than the first time around, which reflects the incredible vitality of the big data space
  • We’ve created some new subcategories such as NoSQL/NewSQL and analytics services (reflecting the reality that, for the time being, the last mile of data analysis is very much performed by humans)
  • We have the occasional company that appears in different categories (Infochimps or Autonomy for example)
  • We have learned more about companies that were already on the first version of the chart, and have positioned them differently.  For example, Metamarkets now falls in the “Cross Infrastructure/Analytics” category as they offer a stack that includes a data store (Druid), predictive analytics and visualization.  Another example is Collective[i] – they have built  an entire proprietary big data stack from the ground up, that includes infrastructure, analytics and applications – making the company a rare example of an “Application Service Provider”.

Our goal is to continue updating this chart from time to time, and perhaps make it evolve visually, as we’ve probably reached the limits of what we can reasonably fit on one slide. It was suggested that we try to visually distinguish on-premise offerings vs. cloud-based solutions, which we may try to do.

Comments, thoughts, questions? Please add to the comments section.

10Gen, Mortar, Datadog & Rick Smolan at the NYC Data Meetup

Here are the videos and some pictures (scroll down) of the NYC Data Business Meetup held on September 25, 2012

In order of appearance:

1) Rick Smolan told us about his fascinating new project, the “Human Face of Big Data” – see the NY Times coverage here: http://nyti.ms/TO5MDd.

2) Mortar (presenter: K Young, CEO). Mortar (www.mortardata.com) provides a platform-as-a-service for Hadoop. They take care of all of the necessary infrastructure (via AWS) and allow any software engineer to run jobs on Hadoop using Apache Pig and Python without special training.

3) Datadog (presenter: Alexis Le Quoc, co-founder). Datadog (www.datadoghq.com) is a service for IT, Operations and Development teams who write and run applications at scale, and want to turn the massive amounts of data produced by their apps, tools and services into actionable insight. Datadog helps software developers and web ops understand their IT Data by putting it all in context.

4) We finished with a fireside chat with Dwight Merriman, CEO and co-founder, 10Gen. 10Gen (www.10gen.com) develops MongoDB, and offers production support, training, and consulting for the open source database. Dwight is one of the original authors of MongoDB. In 1995, Dwight co-founded DoubleClick (acquired by Google for $3.1 billion) and served as its CTO for ten years. Dwight was the architect of the DoubleClick ad serving infrastructure, DART, which serves tens of billions of ads per day. Dwight is co-founder, Chairman, and the original architect of Panther Express (now part of CDNetworks), a content distribution network (CDN) technology that serves hundreds of thousands of objects per second. Dwight is also a co-founder and investor in BusinessInsider.com and Gilt Groupe.

Continuuity, Sailthru & Visual Revenue at the NYC Data Meetup

Here are some videos, slides and pics from the most recent NYC Data Business Meetup. The videos are unfortunately not of the greatest quality, but are good enough to watch.

Also, note to self: make sure that our audience of 200+ sits closer to the stage, so that the room doesn’t look tragically empty on camera (rookie mistake)!

In order of appearance:

1) Todd Papaioannou, CEO, Continuuity, a stealth big data startup, based in Palo Alto, CA and backed by Andreessen Horowitz, Battery Ventures, Data Collective and a number of high profile angels. Todd was previously Chief Cloud Architect for Yahoo.

2) Neil Capel, CEO, and Daniel Krasner, Chief Data Scientist, Sailthru, a New York based startup backed by RRE, AOL Ventures, Lerer Ventures, DFJ Gotham, Thrive Capital, Metamorphic, etc.  Sailthru provides fully automated, 1:1 email and onsite recommendations using a unique behavioral targeting platform. Sailthru helps brands cut through the clutter and build trust with their customers by recognizing and acting upon their individual interests. Sailthru’s technology creates individual user profiles associated with each person’s email address and online behavior. Sailthru’s algorithms gauge each individual user’s intent and match appropriate content and frequency of email communications such that every email is tailored to the unique user. That means they send as many permutations of an email as there are recipients. All simultaneously, all automated and all in real time.

Sailthru’s slides (PDF)

3) Dennis R. Mortensen, CEO, and Jeroen Janssens, Data Scientist, Visual Revenue, a New York based startup backed by Lerer Ventures, SV Angel, IA Ventures and Softbank. Visual Revenue increases front page performance for online media organizations. Their platform provides Editors with actionable, real-time recommendations on what content to place in what position right now and for how long. Visual Revenue’s predictive analytics technology allows media organizations to proactively manage the cost of exposing a piece of content on a front page, whilst maximizing the return they expect from promoting it.

Visual Revenue’s slides

4) Panel discussion and Q&A with the audience