Data Driven NYC

In Conversation with Florian Douetteau, Co-Founder & CEO, Dataiku (podcast + video)

An overnight success 10 years in the making, Dataiku, the leading enterprise AI platform targeting Global 2000 companies, was named “Partner of the Year” for data science, machine learning and AI by BOTH Snowflake and Databricks at their respective annual summits a few days ago.

The company, in which we’ve been proud investors since leading the Series A in 2016, has scaled impressively over the years, reaching $200M in ARR at the end of 2022, with a team of over 1,200 people.

It was a pleasure welcoming back CEO Florian Douetteau, for a conversation where we covered:

* Dataiku’s centralized approach to enterprise AI

* Emerging use cases for Generative AI in the enterprise

* Some leadership lessons learned along the way

Here are the links to the podcast (subscribe! give us 5 stars! etc), and the YouTube video:

MAD 2023: Top 10 Trends

Every year, as part of our MAD project, we do a presentation at Data Driven NYC about the top trends we see across data and ML/AI. (here’s the 2022 version for reference).

The presentation, done this year with my FirstMark colleague Kevin Zhang, is a whirlwind tour of top trends, as opposed to anything particularly in-depth, as we tried to keep it short. But hopefully it should provide a good overview of what’s been happening in those spaces, for anyone interested in a recap.

See below for:

the video (20’53”)
the list of top trends for easy perusal
the slides

In Conversation with Spencer Kimball, CEO, Cockroach Labs

Cockroach Labs, the ambitious database company with a funny name, has gone from strength to strength over the last few years. Started by three ex-Googlers in 2014, it successfully navigated, in its early years, the perilous waters of being a young database company that customers need to trust for mission-critical applications. Over time, it’s gained tremendous momentum with a now long list of marquee customers, and was most recently valued at $5B.

In part because we at FirstMark are proud investors in the company, we’ve featured Cockroach Labs several times at Data Driven NYC over the years: in 2014 (video), 2018 (video) and 2020 (video), and it’s been really fun to see their tremendous progress.

It was great to host CEO Spencer Kimball once again and check in on the latest, as well as lessons learned building a successful open source enterprise software company.

We covered a bunch of really interesting things, including:

The origins of the company
The evolution of the database market from SQL to NoSQL to NewSQL to cloud
The current opportunity around serverless
Open source license questions
Go to market: community-led, bottom-up, top-down?
Who’s the perfect first sales hire for an enterprise software company

Video and transcript below!

In Conversation with Benn Stancil, Co-Founder, Mode [Video + Transcript]

In addition to his role as co-founder and Chief Analytics Officer of Mode, a leading collaborative data platform, Benn Stancil is a prolific and thought-provoking writer about the broad data space. Over the last couple of years in particular, he’s produced a series of insightful and entertaining posts on his newsletter: https://benn.substack.com/

We had welcomed Benn at Data Driven NYC back in 2019 to talk about Mode (see the video, “The case for hiring more data analysts”), and it was great to have him back for a wide-ranging conversation where he addressed some of the “sacred cows” of the data world.

One of the most interesting conversations on the space we’ve had recently, and a highly recommended watch!

Video and transcript below

Estuary: Building Real-Time Data Pipelines (Data Driven NYC talk)

In a world where everything moves ever faster, it seems inevitable that data infrastructure will need to move sooner or later to a predominantly real-time paradigm. Yet the infrastructure for real-time data is still trailing far behind its batch processing cousin.

Enter Estuary, a real-time data ops platform, in which my firm FirstMark led a large seed round last year. Estuary enables you to synchronize your data products across all your systems (whether databases, SaaS, pub/sub, etc) in real time, and also to aggregate, join, or otherwise act on your data while in motion. Estuary is not a database – instead it makes your databases real time. It abstracts away the complexity of building real-time, data-intensive applications at scale.

It was a lot of fun to host Estuary’s co-founder and CTO Johnny Graettinger at Data Driven NYC for an approachable and educational talk about the company, its product and the real-time data world.

For more talks like this, subscribe to our channel!

In Conversation with Tristan Handy, CEO, dbt Labs

In the world of data infrastructure, dbt Labs has undoubtedly been one of the most exciting startups to watch. The company is the creator and maintainer of dbt, a data transformation tool that enables data analysts and engineers to transform, test and document data in the cloud data warehouse. Beyond this, the company is empowering a new generation of data analysts and enabling them to create and disseminate organizational knowledge.

dbt’s CEO, Tristan Handy, is also one of the most thoughtful and interesting CEOs in the space, having played a pivotal role in the emergence of what’s often referred to as the “Modern Data Stack”, a suite of tools and processes that leverage the power of cloud data warehouses to bring data processing to the modern era.

We had the pleasure of hosting Tristan once during the pandemic in 2021 for a great online chat with Jeremiah Lowin, CEO of Prefect. It was a particular treat to welcome back Tristan, this time for our first in-person event since 2020!

Below is the video and full transcript. As always, please subscribe to our YouTube channel to be notified when new videos are released, and give your favorite videos a “like”! Also, if you’re in New York or come visit from time to time, please join the meetup group!

In Conversation with Krishna Gade, CEO, Fiddler

As enterprises around the world deploy machine learning and AI in actual production, it’s becoming increasingly critical that AI can be trusted to produce not just accurate, but also fair and ethical results. An interesting market opportunity has opened up to equip enterprises with the tools to address those issues.

At our most recent Data Driven NYC, we had a great chat with Krishna Gade, co-founder and CEO of Fiddler, a platform to “monitor, observe, analyze and explain your machine learning models in production with an overall mission to make AI trustworthy for all enterprises”. Fiddler has raised $45 million in venture capital to date, most recently a $32 million Series B just last year, in 2021.

We got a chance to cover some great topics, including:

What does “explainability” mean, in the context of ML/AI? What is “bias detection”?
What are some examples of business impact of “models gone bad”?
A dive into the Fiddler product and how it addresses the above
Where are we in the cycle of actually deploying ML/AI in the enterprise? What’s the actual state of the market?

Below is the video and full transcript. As always, please subscribe to our YouTube channel to be notified when new videos are released, and give your favorite videos a “like”!

In Conversation with Barry McCardel, CEO, Hex

In the ever vibrant world of the “Modern Data Stack” (an ecosystem of mostly young tech startups that represent the rising generation of data software vendors, and integrate well with one another), Hex has been getting increasing visibility and momentum. At its core, Hex is a collaborative data platform where teams can explore, analyze, and share. It aims to bring together the best of notebooks, BI & docs into a seamless, collaborative UI.

The company was founded in 2019 and has raised a total of $73.5 million in venture capital to date, including most recently a $52 million Series B.

CEO Barry McCardel joined us at Data Driven NYC for a deep dive into the product, the company, the data space and his journey from doing “unholy things in Excel” as a young consultant to building a great startup.

Below is the video and full transcript.

In Conversation with Barr Moses, CEO, Monte Carlo

As more and more companies around the world rely on data for competitive advantage and mission-critical needs, the stakes have increased tremendously, and data infrastructure needs to be utterly reliable.

In the applications world, the need to monitor and maintain infrastructure gave rise to an entire industry, and iconic leaders like Datadog. Who will be the Datadog of the data infrastructure world? A handful of data startups have thrown their hat in the ring, and Monte Carlo is certainly one of the most notable companies in that group.

Monte Carlo presents itself as an end-to-end data observability platform that aims to increase trust in data by eliminating data downtime, so engineers innovate more and fix less. Started in 2019, the company has already raised $101M in venture capital, most recently in a Series C announced in August 2021.

It was a real pleasure to welcome Monte Carlo’s co-founder and CEO, Barr Moses, for a fun and educational conversation about data observability and the data infrastructure world in general.

Below is the video and full transcript.

In Conversation with Carolyn Mooney, CEO, Nextmv

Our business lives are full of optimization problems – scheduling, time management, resource planning, pricing, routing, risk management, network optimization, financial engineering, etc. Simply defined, optimization is the science of making the best decision possible, given a set of constraints.

Historically, optimization has been the province of PhDs with deep backgrounds in mathematics, using a generation of software that was developed for academia and large defense contractors.

Enter Nextmv (pronounced “Next Move”), a company in which I’m a proud investor. Nextmv is reinventing the space for the cloud era, making optimization and simulation technologies available to every developer.

It was great to welcome Nextmv’s CEO, Carolyn Mooney, at our most recent Data Driven NYC to talk about the space and the company.

We covered:

What is decision intelligence, and how does it differ from business intelligence and data science?
What is the overlap with the area known as “operations research”?
How decision intelligence is a broadly horizontal area
How Nextmv is democratizing decision intelligence with its cloud product
Bonus: Nextmv’s policy of radical transparency on team compensation

Below is the video and full transcript.

In conversation with Felix Van de Maele, CEO, Collibra

The world of data governance is not the most visible part of the data revolution, yet it is of critical importance. As more and more data flows into the enterprise, and its role is ever more mission critical, one needs to be in full control of it – understand where data resides, who can have access to it, which datasets can be trusted or not, etc.

Enter Collibra, a startup that has had a long march towards success, as it was founded in 2008. Collibra has now become an impressive industry leader and raised a $250 million Series G at a post money valuation of $5.25 billion last year.

We had the chance to host Stan Christiaens, the co-founder and CTO of Collibra, at Data Driven NYC in 2017 (video here), and this time we got a chance to chat with the company’s CEO, Felix Van de Maele.

We had a great conversation, starting with a round of definitions that should be interesting to anyone curious to better understand that side of the data world.

Below is the video and full transcript.

In Conversation with Emil Eifrem, Founder and CEO, Neo4j

The last couple of years have seen a dramatic acceleration in the adoption of graph databases, a category of databases that stores nodes and relationships instead of tables or documents. That acceleration has clearly benefited Neo4j, which had a banner year in 2021, surpassing $100M in ARR and closing a $325M Series F financing round at a valuation of over $2B, which it calls “the largest funding round in database history”.

That would make Neo4j an overnight success, except for the fact that Neo4j started in 2007, pioneered the space and literally coined the term “graph database”.

Neo4j’s CEO, Emil Eifrem, had spoken at Data Driven NYC back in 2015 (the same night as the CEO of Snowflake and the CEO of Airtable, a pretty stacked line up considering those three startups combined went on to represent many billions of market cap/valuations).

So it was particularly fun to have Emil back at the event and exciting to hear about the major progress the company has experienced over the last few years. Emil spoke from Sweden at around midnight his time, bringing impressive energy despite the late hour and it was a great conversation.

Below is the video and full transcript.

In conversation with Richard Craib, Founder, Numerai

I’ve been interested in the intersection of AI and crypto for a while (see AI & Blockchain: An introduction), and Numerai is one of the most exciting companies I came across in that world. Numerai is a new kind of crowdsourced quant hedge fund, which provides data for free and enables any data scientist around the planet to contribute models they believe will beat the stock market. Numerai offers its own token, called Numeraire, to incentivize participants.

As it turns out, this model delivers exciting results, and Numerai announced a few months ago that it had outperformed market neutral hedge funds by 29%.

It was a real pleasure welcoming Richard Craib, founder of Numerai, to Data Driven NYC to talk about the very exciting work Numerai has been doing.

Below is the video and full transcript.

Celebrating 10 years of Data Driven NYC

Sometime in the Fall of 2011, I was looking for a community in New York where a non-technical person like me could learn about “Big Data”.

I couldn’t find any. So, on a whim on a November Sunday night, I logged on to Meetup.com and created a group.

After thinking about it for about 30 seconds, I came up with an oh-so-catchy name for it: the New York Data Business Meetup.

And so started a 10-year journey of community and network building, and an immensely fun and rewarding chapter of my professional life.

2021 MAD Landscape: The Top 10 Trends

For anyone interested in a quick overview of our long-form 2021 Machine Learning, AI and Data (MAD) Landscape, here are the Cliffs Notes! My co-author John and I did a presentation at our most recent Data Driven NYC, focused on the top 10 trends in this year’s landscape.

As a preview, here they are:

Every company is a data company
The big unlock: data warehouses and lakehouses
Consolidation vs data mesh: the future is hybrid
An explosive funding environment
A busy year in DataOps
It’s time for real time
The action moves to the right side of the warehouse
The rise of AI generated content
From MLOps to ModelOps
The continued emergence of a separate Chinese AI stack

Below is the video from the event, and below that, the transcript.