In Conversation with Barr Moses, CEO, Monte Carlo

As more and more companies around the world rely on data for competitive advantage and mission-critical needs, the stakes have increased tremendously, and data infrastructure needs to be utterly reliable.

In the applications world, the need to monitor and maintain infrastructure gave rise to an entire industry, and iconic leaders like Datadog. Who will be the Datadog of the data infrastructure world? A handful of data startups have thrown their hat in the ring, and Monte Carlo is certainly one of the most notable companies in that group.

Monte Carlo presents itself as an end-to-end data observability platform that aims to increase trust in data by eliminating data downtime, so engineers innovate more and fix less. Started in 2019, the company has already raised $101M in venture capital, most recently in a Series C announced in August 2021.

It was a real pleasure to welcome Monte Carlo’s co-founder and CEO, Barr Moses, for a fun and educational conversation about data observability and the data infrastructure world in general.

Below is the video and full transcript.

Continue reading “In Conversation with Barr Moses, CEO, Monte Carlo”

In Conversation with Carolyn Mooney, CEO, Nextmv

Our business lives are full of optimization problems – scheduling, time management, resource planning, pricing, routing, risk management, network optimization, financial engineering, etc. Simply defined, optimization is the science of making the best decision possible, given a set of constraints.

Historically, optimization has been the province of PhDs with deep backgrounds in mathematics, using a generation of software that was developed for academia and large defense contractors.

Enter Nextmv (pronounced “Next Move”), a company in which I’m a proud investor. Nextmv is reinventing the space for the cloud era, making optimization and simulation technologies available to every developer.

It was great to welcome Nextmv’s CEO, Carolyn Mooney, at our most recent Data Driven NYC to talk about the space and the company.

We covered:

  • What is decision intelligence, and how does it differ from business intelligence and data science?
  • What is the overlap with the area known as “operations research”?
  • How decision intelligence is a broadly horizontal area
  • How Nextmv is democratizing decision intelligence with its cloud product
  • Bonus: Nextmv’s policy of radical transparency on team compensation

Below is the video and full transcript.

Continue reading “In Conversation with Carolyn Mooney, CEO, Nextmv”

In conversation with Felix Van de Maele, CEO, Collibra

The world of data governance is not the most visible part of the data revolution, yet it is of critical importance. As more and more data flows into the enterprise, and its role is ever more mission-critical, one needs to be in full control of it – understand where data resides, who can have access to it, which datasets can be trusted or not, etc.

Enter Collibra, a startup that has had a long march towards success, as it was founded in 2008. Collibra has now become an impressive industry leader and raised a $250 million Series G at a post-money valuation of $5.25 billion last year.

We had the chance to host Stan Christiaens, the co-founder and CTO of Collibra, at Data Driven NYC in 2017 (video here), and this time we got a chance to chat with the company’s CEO, Felix Van de Maele.

We had a great conversation, starting with a round of definitions that should be interesting to anyone curious to better understand that side of the data world.

Below is the video and full transcript.

Continue reading “In conversation with Felix Van de Maele, CEO, Collibra”

In Conversation with Emil Eifrem, Founder and CEO, Neo4j

The last couple of years have seen a dramatic acceleration in the adoption of graph databases, a category of databases that stores nodes and relationships instead of tables or documents.  That acceleration has clearly benefited Neo4j, which had a banner year in 2021, surpassing $100M in ARR and closing a $325M Series F financing round at over $2B valuation, which it calls “the largest funding round in database history”.

That would make Neo4j an overnight success, except for the fact that Neo4j started in 2007, pioneered the space and literally coined the term “graph database”.

Neo4j’s CEO, Emil Eifrem, had spoken at Data Driven NYC back in 2015 (the same night as the CEO of Snowflake and the CEO of Airtable, a pretty stacked line up considering those three startups combined went on to represent many billions of market cap/valuations).

So it was particularly fun to have Emil back at the event and exciting to hear about the major progress the company has experienced over the last few years. Emil spoke from Sweden at around midnight his time, bringing impressive energy despite the late hour and it was a great conversation.

Below is the video and full transcript.

Continue reading “In Conversation with Emil Eifrem, Founder and CEO, Neo4j”

In conversation with Richard Craib, Founder, Numerai

I’ve been interested in the intersection of AI and crypto for a while (see AI & Blockchain: An introduction), and Numerai is one of the most exciting companies I came across in that world. Numerai is a new kind of crowdsourced quant hedge fund, which provides data for free and enables any data scientist around the planet to contribute models they believe will beat the stock market. Numerai offers its own token, called Numeraire, to incentivize participants.

As it turns out, this model delivers exciting results, and Numerai announced a few months ago that it had outperformed market neutral hedge funds by 29%.

It was a real pleasure welcoming Richard Craib, founder of Numerai, to Data Driven NYC to talk about the very exciting work Numerai has been doing.

Below is the video and full transcript.

Continue reading “In conversation with Richard Craib, Founder, Numerai”

Celebrating 10 years of Data Driven NYC

Sometime in the Fall of 2011, I was looking for a community in New York where a non-technical person like me could learn about “Big Data”.

I couldn’t find any.  So, on a whim on a November Sunday night, I logged on to Meetup.com and created a group.

After thinking about it for about 30 seconds, I came up with an oh-so-catchy name for it:  the New York Data Business Meetup.

And so started a 10-year journey of community and network building, and an immensely fun and rewarding chapter of my professional life.

Continue reading “Celebrating 10 years of Data Driven NYC”

2021 MAD Landscape: The Top 10 Trends

For anyone interested in a quick overview of our long-form 2021 Machine Learning, AI and Data (MAD) Landscape, here are the Cliffs Notes! My co-author John and I did a presentation at our most recent Data Driven NYC, focused on the top 10 trends in this year’s landscape.

As a preview, here they are:

  • Every company is a data company
  • The big unlock: data warehouses and lakehouses
  • Consolidation vs data mesh: the future is hybrid
  • An explosive funding environment
  • A busy year in DataOps
  • It’s time for real time
  • The action moves to the right side of the warehouse
  • The rise of AI generated content
  • From MLOps to ModelOps
  • The continued emergence of a separate Chinese AI stack

Below is the video from the event, and below that, the transcript.

Continue reading “2021 MAD Landscape: The Top 10 Trends”

The Data Mesh: In Conversation with Zhamak Dehghani

In the admittedly small world of people who obsess over data technologies, one of the hottest topics of the last year has been the “data mesh”.

Created by Zhamak Dehghani of ThoughtWorks, the concept struck a chord and made the rounds in countless conversations on Twitter and elsewhere.

As I highlighted in the 2021 MAD Landscape, the data mesh concept is both a technological and organizational idea.  A standard approach to building data infrastructure and teams so far has been centralization: one big platform, managed by one data team, that serves the needs of business users.  This has advantages, but can also create a number of issues (bottlenecks, etc.).  The general concept of the data mesh is decentralization – create independent data teams that are responsible for their own domain and provide data “as a product” to others within the organization.  Conceptually, this is not entirely different from the concept of micro-services that has become familiar in software engineering, but applied to the data domain.

It was a real treat to get to chat with Zhamak at our most recent Data Driven NYC.

Below is the video and below that, the transcript.

Continue reading “The Data Mesh: In Conversation with Zhamak Dehghani”

In Conversation with Bindu Reddy, CEO, Abacus

At our most recent Data Driven NYC, we had the great pleasure of hosting Bindu Reddy, CEO and co-founder of Abacus AI, formerly GM & creator of AI verticals at AWS, and an ex-Googler. Bindu also has a very witty and entertaining Twitter account (@bindureddy), where she talks about all things machine learning and AI.

This was a very educational and approachable conversation, where we covered:

  • Some key definitions: neural networks, weights and biases, supervised vs unsupervised learning, feature store
  • Applying neural networks to structured, tabular data
  • Abacus’ vision around “autonomous AI”
  • How companies wait too long to start experimenting in ML/AI

Below is the video and below that, the transcript.

Continue reading “In Conversation with Bindu Reddy, CEO, Abacus”

In Conversation with Dave Burgess, Head of Data Engineering, Pinterest

Pinterest is near and dear to our hearts at FirstMark because we had the good fortune of being the first institutional investor back in 2009 when the company was just getting started (fun fact: the founders were in New York for a brief moment in time before moving to the Bay Area). Pinterest has had a remarkable ride ever since, and it’s a $49B market cap public company at the time of writing.

So it was a particular pleasure to welcome Dave Burgess, Head of Data Engineering, to come and talk to the Data Driven NYC audience about all things data at Pinterest.

We covered a bunch of interesting topics, including:

  • Pinterest’s newly open sourced project, QueryBook
  • The stack Pinterest uses to manage its 400 petabytes of data
  • The use cases for data analytics and machine learning at Pinterest

Below is the video and below that, the transcript.

Continue reading “In Conversation with Dave Burgess, Head of Data Engineering, Pinterest”

In conversation with Arjun Narayan, CEO, Materialize

Real-time data streaming is an increasingly crucial part of the data ecosystem. While financial services (trading) initially represented the bulk of the demand for streaming, the emergence of more mature technology in the space has unlocked more use cases, which in turn created more demand for better technology.

At a recent Data Driven NYC, we had a very interesting conversation with Arjun Narayan, CEO of Materialize, “the only true SQL streaming database for building internal tools, interactive dashboards, and customer-facing experiences”. Materialize is headquartered in New York and has raised $40M in venture capital money (with a new round rumored to be announced soon, at the time of writing).

This was a very educational discussion, where we covered the following topics:

  • What is streaming? What is Kafka?
  • Why is there a need for a streaming database for analytics?
  • Why is SQL underrated?
  • What is Materialize?
  • Partnering with DBT to make streaming ubiquitous
  • Materialize’s roadmap

Below is the video and below that, the transcript.

Continue reading “In conversation with Arjun Narayan, CEO, Materialize”

In conversation with Chip Huyen, Writer and Computer Scientist

At our most recent Data Driven NYC, we had the great pleasure of hosting Chip Huyen, a writer and computer scientist who also teaches machine learning design at Stanford, for a fascinating and fun conversation.

We covered a range of topics, including:

  • What is machine learning design?
  • The MLOps landscape, and how it’s both over-developed and under-developed
  • What is online machine learning?
  • The divergence between East and West for machine learning and data infrastructure
  • A couple of book recommendations

Below is the video and below that, the transcript.

Continue reading “In conversation with Chip Huyen, Writer and Computer Scientist”

In Conversation with Jack Hanlon, VP Data, Reddit

While it’s been around for 15+ years, Reddit has been on a tear lately: a $367M Series E round announced a few weeks ago, rumors of an IPO, and plenty of Internet action with r/wallstreetbets in particular.

Interestingly, there was a major gap for many years between the central role Reddit has been playing on the Internet and its relatively small team size. While companies like Facebook are largely AI companies (see our conversation with Jerome Pesenti, Head of AI, Facebook), Reddit’s data team was tiny.

Enter Jack Hanlon, VP Data at Reddit and our guest at our most recent Data Driven NYC event. Jack has been tasked with leading the data team into rapid growth, and we had a really interesting conversation, in particular around the following points:

  • How is the data team at Reddit organized? (preview: data science, data platform, machine learning, search)
  • What’s the data stack? (preview: switch from AWS to GCP, Kafka, Airflow, Colab, Amundsen, Great Expectations, Druid/Imply…)
  • What are the key use cases for data science and machine learning at Reddit?
  • A book recommendation: “Invisible Women: Data Bias in a World Designed for Men”

Anecdotally, Jack is our second speaker in recent memory who was a regular attendee in the early years of Data Driven NYC, before ascending to leadership responsibilities in a major Internet company! (the other being Alok Gupta, who spoke about leading data at DoorDash).

Below is the video and below that, the transcript.

Continue reading “In Conversation with Jack Hanlon, VP Data, Reddit”

In conversation with Guy Podjarny, Founder & President, Snyk

In just a few years of hyper growth, Snyk has become a $2.7B unicorn, most recently raising $200M in September 2020. A developer-first security company, it has also helped usher in the “DevSecOps” category.

At our most recent Data Driven NYC, we had the pleasure of hosting its Founder & President, Guy Podjarny, zooming in late at night from Israel.

We covered many interesting topics, including:

  • What does DevSecOps mean?
  • How did Snyk initially get developers to care, and how did they expand horizontally from there?
  • What is infrastructure as code?
  • Thoughts on Snyk Code and Snyk’s vulnerability database
  • The nuances of combining a bottoms-up, freemium motion focused on developers, with an enterprise motion focused on economic buyers of Snyk’s products.

Below is the video and below that, the transcript.

Continue reading “In conversation with Guy Podjarny, Founder & President, Snyk”

Introducing Kedro: Yetunde Dada, Principal Product Manager at QuantumBlack

If you follow the various talks at Data Driven NYC, and the data ecosystem in general, it’s plenty apparent that the overall tooling for data, data science and machine learning is still in its infancy, particularly compared to the software stack.

While this may feel ironic (yes, I really do think) given the billions in venture capital money that have been poured into the space, it’s worth remembering that the data stack (at least in its “big data” phase) is relatively recent (10-15 years), while the software stack has had several decades of evolution.

In many organizations, the data science and machine learning stack looks like a collection of various tools, some open source, some proprietary, glued together with one-off scripts. Teams started experimenting with one tool, then another, then created ad hoc pathways to make it all work together over time, and before you knew it, you ended up with complex environments that are painful to manage.

In response to this situation, various machine learning frameworks have emerged to abstract away the complexity. Several of those frameworks were developed internally at large tech companies to solve their own problems, and then open sourced.

Kedro is one such example. It was developed and maintained by QuantumBlack, an analytics consultancy acquired by McKinsey in 2015. It’s McKinsey’s first open-source product.

Kedro is somewhat hard to categorize. If it had its own category, it might be considered a Machine Learning Engineering Framework.  What React did for front-end engineering code is what Kedro does for machine learning code. It allows you to build “design systems” of reusable machine learning code.

At our most recent Data Driven NYC, we had the great pleasure of hosting Yetunde Dada, a Principal Product Manager at QuantumBlack, who has been the key driving force behind Kedro.

Below is the video and below that, the transcript.

Continue reading “Introducing Kedro: Yetunde Dada, Principal Product Manager at QuantumBlack”