Introducing the *Emerging* MAD (Machine Learning, AI, Data) Index

A few weeks ago, my colleague John Wu and I introduced the MAD Index, a new public market index to track the progress of “pure play” machine learning, AI and data public companies. This was an initial group of 13 companies, which has since grown to 14 following the UiPath IPO.

Today, we’re introducing the Emerging MAD Index, a companion to the public MAD index. The idea is to track a group of private companies that show high potential to join the MAD Index in the future.

Criteria

Just like the Public MAD Index, our goal is to capture “pure play” machine learning, AI and data companies.

In practice, that generally means infrastructure companies offering tools to store, process and analyze data, create and manage machine learning models, and/or automate core processes deep in the stack – broadly horizontal companies serving a variety of business needs across departments, industries and geographies.

Continue reading “Introducing the *Emerging* MAD (Machine Learning, AI, Data) Index”

In Conversation with Florian Douetteau, CEO, Dataiku

Dataiku (in which I’m a proud investor and board member) has had an impressive ride over the last few years. An early entrant in the enterprise Data Science and Machine Learning platform category, the company successfully expanded from its French/European roots to build a very strong presence in the US (where the company is now headquartered) and, increasingly, Asia.

Along the way, Dataiku:

  • became a unicorn, most recently raising a $100M Series D in 2020
  • was named a “Leader” in Gartner’s Magic Quadrant for Data Science and ML Platforms in both 2020 and 2021
  • collected many accolades, such as CB Insights’ “AI 100” and several Forbes lists: “Cloud 100”, “AI 50” and “America’s best startup employers in 2021”

It was really fun to host CEO Florian Douetteau at Data Driven NYC once again, after previous appearances in 2016 (here) and 2018 (here). We covered a bunch of different topics, including:

  • What enterprise AI is about: not flying cars, but optimizing hundreds of business processes
  • Why enterprises need to move past their fear of data and AI
  • The key principles behind the design of the Dataiku platform: handling the entire data lifecycle, and democratizing data/AI across teams
  • Dataiku’s partnership with Snowflake
  • The upcoming launch of their starter / SMB self-serve product, Dataiku Online

Below is the video and below that, the transcript.

Continue reading “In Conversation with Florian Douetteau, CEO, Dataiku”

In Conversation with Bindu Reddy, CEO, Abacus

At our most recent Data Driven NYC, we had the great pleasure of hosting Bindu Reddy, CEO and co-founder of Abacus AI, formerly GM & creator of AI verticals at AWS, and an ex-Googler. Bindu also has a very witty and entertaining Twitter account (@bindureddy), where she talks about all things machine learning and AI.

This was a very educational and approachable conversation, where we covered:

  • Some key definitions: neural networks, weights and biases, supervised vs unsupervised learning, feature store
  • Applying neural networks to structured, tabular data
  • Abacus’ vision around “autonomous AI”
  • How companies wait too long to start experimenting in ML/AI

Below is the video and below that, the transcript.

Continue reading “In Conversation with Bindu Reddy, CEO, Abacus”

New Investment: Synthesia or the Rise of “Video as Code”

We all have an insatiable appetite for video, both in our personal and professional lives. Time and again, video is shown to capture our attention better than any other medium. This is increasingly how we learn, explore, collaborate and get entertained.

However, especially in an enterprise context, creating professional-quality video remains a complex and costly endeavor. For all the capabilities of smartphones, most companies still need studio-level equipment to produce enterprise-grade videos: cameras, sound equipment, actors, post-production editing. The process is time-consuming, and not very scalable. Shooting a video in multiple languages, for example, requires multiple actors or dubbing. Any update requires everyone to go back to the studio.

But what if video could be just… code? What if it could be infinitely flexible and customizable at scale, as simple as an API call?

Today we’re excited to announce that FirstMark led a $12.5M Series A investment in Synthesia – a fast-growing startup that offers exactly that.

Synthesia makes creating a business video as simple as writing an email or putting together a PowerPoint presentation – a compelling “text to video” experience.

Continue reading “New Investment: Synthesia or the Rise of “Video as Code””

Introducing the MAD (ML, AI, Data) Public Company Index

Today, we are previewing a new public market index – the MAD (for machine learning, AI and data) index.

Readers of this blog know that we have been tracking the data ecosystem since 2012, through annual landscapes (see the 2020 Data & AI Landscape).

Over the last few years, a funny thing happened – some of the small startups we had started tracking grew up, did an IPO and became large public companies.

Not so long ago, public market investors used to say there was no good way of “playing” the Big Data and AI trends, due to the lack of public companies in the space. This is less true today.

However, there isn’t much out there in terms of looking at those public companies as a group. For example, see this Seeking Alpha piece, Top 3 Artificial Intelligence ETFs To Consider, where none of the companies listed are actually AI companies.

Hence the idea of the MAD Index. It’s still a small group of companies, but my colleague John Wu and I were curious to see how they fared in public markets, now and going forward.

This is just a start. We anticipate that a number of companies will join this group in the next year or two, and we’re excited to see how this index matures.

Continue reading “Introducing the MAD (ML, AI, Data) Public Company Index”

In Conversation with Dave Burgess, Head of Data Engineering, Pinterest

Pinterest is near and dear to our hearts at FirstMark because we had the good fortune of being the first institutional investor back in 2009 when the company was just getting started (fun fact: the founders were in New York for a brief moment in time before moving to the Bay Area). Pinterest has had a remarkable ride ever since, and it’s a $49B market cap public company at the time of writing.

So it was a particular pleasure to welcome Dave Burgess, Head of Data Engineering, to come and talk to the Data Driven NYC audience about all things data at Pinterest.

We covered a bunch of interesting topics, including:

  • Pinterest’s newly open sourced project, QueryBook
  • The stack Pinterest uses to manage its 400 petabytes of data
  • The use cases for data analytics and machine learning at Pinterest

Below is the video and below that, the transcript.

Continue reading “In Conversation with Dave Burgess, Head of Data Engineering, Pinterest”

In conversation with Arjun Narayan, CEO, Materialize

Real-time data streaming is an increasingly crucial part of the data ecosystem. While financial services (trading) initially represented the bulk of the demand for streaming, the emergence of more mature technology in the space has unlocked more use cases, which in turn created more demand for better technology.

At a recent Data Driven NYC, we had a very interesting conversation with Arjun Narayan, CEO of Materialize, “the only true SQL streaming database for building internal tools, interactive dashboards, and customer-facing experiences”. Materialize is headquartered in New York and has raised $40M in venture capital money (with a new round rumored to be announced soon, at the time of writing).

This was a very educational discussion, where we covered the following topics:

  • What is streaming? What is Kafka?
  • Why is there a need for a streaming database for analytics?
  • Why is SQL underrated?
  • What is Materialize?
  • Partnering with DBT to make streaming ubiquitous
  • Materialize’s roadmap

Below is the video and below that, the transcript.

Continue reading “In conversation with Arjun Narayan, CEO, Materialize”

In conversation with Chip Huyen, Writer and Computer Scientist

At our most recent Data Driven, we had the great pleasure of hosting Chip Huyen, a writer and computer scientist who also teaches machine learning design at Stanford, for a fascinating and fun conversation.

We covered a range of topics, including:

  • What is machine learning design?
  • The MLOps landscape, and how it’s both over-developed and under-developed
  • What is online machine learning?
  • The divergence between East and West for machine learning and data infrastructure
  • A couple of book recommendations

Below is the video and below that, the transcript.

Continue reading “In conversation with Chip Huyen, Writer and Computer Scientist”

In Conversation with Jack Hanlon, VP Data, Reddit

While it’s been around for 15+ years, Reddit has been on a tear lately: a $367M Series E round announced a few weeks ago, rumors of an IPO, and plenty of Internet action with r/wallstreetbets in particular.

Interestingly, there was a major gap for many years between the central role Reddit has been playing on the Internet and its relatively small team size. While companies like Facebook are largely AI companies (see our conversation with Jerome Pesenti, Head of AI, Facebook), Reddit’s data team was tiny.

Enter Jack Hanlon, VP Data at Reddit and our guest at our most recent Data Driven NYC event. Jack has been tasked with leading the data team into rapid growth, and we had a really interesting conversation, in particular around the following points:

  • How is the data team at Reddit organized? (preview: data science, data platform, machine learning, search)
  • What’s the data stack? (preview: switch from AWS to GCP, Kafka, Airflow, Colab, Amundsen, Great Expectations, Druid/Imply…)
  • What are the key use cases for data science and machine learning at Reddit?
  • A book recommendation: “Invisible Women: Data Bias in a World Designed for Men”

Anecdotally, Jack is our second speaker in recent memory who was a regular attendee in the early years of Data Driven NYC, before ascending to leadership responsibilities in a major Internet company! (the other being Alok Gupta, who spoke about leading data at DoorDash).

Below is the video and below that, the transcript.

Continue reading “In Conversation with Jack Hanlon, VP Data, Reddit”

In conversation with Guy Podjarny, Founder & President, Snyk

In just a few years of hyper growth, Snyk has become a $2.7B unicorn, most recently raising $200M in September 2020. A developer-first security company, it has also helped usher in the “DevSecOps” category.

At our most recent Data Driven NYC, we had the pleasure of hosting its Founder & President, Guy Podjarny, zooming in late at night from Israel.

We covered many interesting topics, including:

  • What does DevSecOps mean?
  • How did Snyk initially get developers to care, and how did they expand horizontally from there?
  • What is infrastructure as code?
  • Thoughts on Snyk Code and Snyk’s vulnerability database
  • The nuances of combining a bottoms-up, freemium motion focused on developers, with an enterprise motion focused on economic buyers of Snyk’s products.

Below is the video and below that, the transcript.

Continue reading “In conversation with Guy Podjarny, Founder & President, Snyk”

Introducing Kedro: Yetunde Dada, Principal Product Manager at QuantumBlack

If you follow the various talks at Data Driven NYC, and the data ecosystem in general, it’s plenty apparent that the overall tooling for data, data science and machine learning is still in its infancy, particularly compared to the software stack.

While this may feel ironic (yes, I really do think) given the billions in venture capital money that have been poured into the space, it’s worth remembering that the data stack (at least in its “big data” phase) is relatively recent (10-15 years), while the software stack has had several decades of evolution.

In many organizations, the data science and machine learning stack looks like a collection of various tools, some open source, some proprietary, glued together with one-off scripts. Teams started experimenting with one tool, then another, then created ad hoc pathways to make it all work together over time, and before you knew it, you ended up with complex environments that are painful to manage.

In response to this situation, various machine learning frameworks have emerged to abstract away the complexity. Several of those frameworks were developed internally at large tech companies to solve their own problems, and then open sourced.

Kedro is one such example. It was developed and is maintained by QuantumBlack, an analytics consultancy acquired by McKinsey in 2015. It’s McKinsey’s first open-source product.

Kedro is somewhat hard to categorize. If it had its own category, it might be considered a Machine Learning Engineering Framework. What React did for front-end engineering code is what Kedro does for machine learning code. It allows you to build “design systems” of reusable machine learning code.
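
To make the idea of reusable machine learning code a bit more concrete, here is a minimal sketch in plain Python. It deliberately does not use Kedro’s actual API: the `Node` class, the `run_pipeline` function and the dataset names are all hypothetical, chosen only to illustrate small named steps wired together through a shared data catalog instead of one-off glue scripts.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """A reusable step: a plain function plus the names of its inputs and output."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in order, reading inputs from and writing outputs to a catalog."""
    data = dict(catalog)  # copy so the caller's catalog is left untouched
    for node in nodes:
        args = [data[name] for name in node.inputs]
        data[node.output] = node.func(*args)
    return data


# Hypothetical steps; each one is an ordinary function that can be tested alone.
def drop_missing(rows):
    return [r for r in rows if r is not None]


def mean(values):
    return sum(values) / len(values)


def center(values, average):
    return [v - average for v in values]


pipeline = [
    Node(drop_missing, ["raw"], "clean"),
    Node(mean, ["clean"], "clean_mean"),
    Node(center, ["clean", "clean_mean"], "centered"),
]

result = run_pipeline(pipeline, {"raw": [1.0, None, 2.0, 3.0]})
print(result["centered"])
```

Because each step only declares what it reads and writes, the same functions can be reused in another pipeline by changing the wiring rather than the code.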

At our most recent Data Driven NYC, we had the great pleasure of hosting Yetunde Dada, a Principal Product Manager at QuantumBlack, who has been the key driving force behind Kedro.

Below is the video and below that, the transcript.

Continue reading “Introducing Kedro: Yetunde Dada, Principal Product Manager at QuantumBlack”

Congratulations, Timber!

Although this was never publicly announced, for the last 2+ years, FirstMark was the lead investor in an exciting seed-stage company called Timber, alongside a great group of folks and firms including NextView, Notation, Addition, Lux Capital, Nat Turner, Zach Weinberg and Zach Perret.

Over time, Timber developed Vector, a very interesting open source project focused on observability data. Vector is effectively a “routing layer”, a high-performance observability data pipeline that enables customers to collect, transform, and route all their logs, metrics, and traces.
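
As a rough illustration of what a “routing layer” for observability data does, here is a small sketch in plain Python. It is not Vector’s code or configuration format: the `transform` and `route` functions, the `ROUTES` table and the destination names are hypothetical, and only show the collect, transform and route steps in miniature.

```python
from collections import defaultdict

# Hypothetical routing table: event type -> destination name.
ROUTES = {"log": "log_store", "metric": "metrics_db", "trace": "tracing_backend"}


def transform(event):
    """Normalize an event: drop a noisy field and lower-case the log level."""
    cleaned = {k: v for k, v in event.items() if k != "debug_payload"}
    if "level" in cleaned:
        cleaned["level"] = cleaned["level"].lower()
    return cleaned


def route(events):
    """Send each transformed event to the destination matching its type."""
    sinks = defaultdict(list)
    for event in events:
        destination = ROUTES.get(event["type"], "dead_letter")
        sinks[destination].append(transform(event))
    return dict(sinks)


# Collected events from different (made-up) sources.
events = [
    {"type": "log", "level": "ERROR", "message": "disk full", "debug_payload": "..."},
    {"type": "metric", "name": "cpu", "value": 0.93},
    {"type": "trace", "span": "checkout", "duration_ms": 120},
]

sinks = route(events)
for destination, routed in sinks.items():
    print(destination, routed)
```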

Continue reading “Congratulations, Timber!”

New Investment: Nextmv or the Democratization of Decision Science

Billions of dollars have been invested in the rise of data science and machine learning as mainstream disciplines in the world of business, one of the most exciting tech trends of the last (and next) decade.

In the enterprise, many of the applications of data science and machine learning ultimately produce a prediction: which customers are the most likely to buy? Or churn? Which transactions are most likely to be fraudulent? What part of town is likely to place the most food deliveries tomorrow afternoon?

However, powerful though it may be, there is one thing machine learning generally doesn’t tell you: once you have a prediction, what do you do with it? For example, once you have predicted high demand for food delivery in a certain part of town, how do you decide which delivery team member to dispatch where and when, to optimize for efficiency and maximize revenue and customer satisfaction?

Enter decision science. While the term has not crossed over to mainstream consciousness like its data science cousin, decision science has been around for decades. Also often known as Operations Research, it encompasses a variety of advanced analytical methods and quantitative models to help with decision-making and efficiency, including simulation, mathematical optimization, queuing theory, etc.  
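
To illustrate the gap between a prediction and a decision, here is a toy sketch in Python of the dispatch question above. The couriers, orders and travel times are made up, and the brute-force search is only workable for tiny inputs; real decision science tools rely on far more scalable optimization methods. The point is simply that, given estimated costs, choosing who goes where is its own optimization problem.

```python
from itertools import permutations

# Hypothetical estimated travel times in minutes: travel_minutes[courier][order].
travel_minutes = {
    "courier_a": {"order_1": 4, "order_2": 9, "order_3": 7},
    "courier_b": {"order_1": 6, "order_2": 3, "order_3": 8},
    "courier_c": {"order_1": 5, "order_2": 7, "order_3": 2},
}


def best_assignment(costs):
    """Try every one-to-one courier/order pairing and keep the cheapest one."""
    couriers = list(costs)
    orders = list(next(iter(costs.values())))
    best_total, best_plan = None, None
    for perm in permutations(orders):
        total = sum(costs[c][o] for c, o in zip(couriers, perm))
        if best_total is None or total < best_total:
            best_total, best_plan = total, dict(zip(couriers, perm))
    return best_plan, best_total


plan, total = best_assignment(travel_minutes)
print(plan, total)
```

With three couriers there are only six possible plans to compare, but the number of plans grows factorially with team size, which is why dedicated optimization techniques matter in practice.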

Continue reading “New Investment: Nextmv or the Democratization of Decision Science”

In conversation with Wes McKinney, CEO, Ursa Computing

For anyone in the data analysis community, Wes McKinney is a very well-known figure. In addition to literally writing the book on the topic (“Python for Data Analysis”), he’s played a leading role in several key open source projects: he created Python Pandas, he’s a PMC member for Apache Parquet, and he’s also the co-creator of Apache Arrow, his current development focus.

He’s also a serial entrepreneur, having co-founded DataPad (acquired by Cloudera) and now Ursa.

So it was a real pleasure hosting Wes for a chat at our most recent Data Driven NYC. As always, we tried to position the conversation to be approachable by everyone (with high level definitions) while being interesting for technical folks and industry experts.

Watch the video below (or read the transcript copied below the video) to learn:

  • What is pandas? What is a dataframe?
  • What is Arrow? What is its history and why is it a big deal?
  • What is Ursa Computing?

Continue reading “In conversation with Wes McKinney, CEO, Ursa Computing”

In conversation with Alok Gupta, Head Of Data Science & Machine Learning at DoorDash

Hosting Alok Gupta at our most recent Data Driven NYC was special for a couple of reasons.

First, because Alok is the very talented head of data science and machine learning in a company that has all sorts of really interesting use cases for AI and just had a phenomenal IPO, valuing it at $60B at the time of writing.

Second, because it was a homecoming of sorts for Alok, whose journey in the field of data science was inspired in part by Data Driven NYC – as he puts it:

This also feels like it nicely completes my journey starting 8 years ago when I was working on Wall Street in 2013 and started coming to your monthly evening talks at the Bloomberg building to learn more about ‘Data Science’. That was really a launching point for me to switch from trading to DS, and I’m grateful to be able to give back in a small way :).

One of those stories that brings joy to the heart of the organizers of this community!

Here is the video, as well as a full transcript for easy perusal:

Continue reading “In conversation with Alok Gupta, Head Of Data Science & Machine Learning at DoorDash”