In Conversation with Kyle Samani, Managing Partner, Multicoin

Multicoin Capital is one of the top crypto-native funds, and we’ve had the pleasure of working with them at Helium, a shared investment. The firm was founded in 2017 and they raised their second venture fund ($100M) in 2021, with another one (reportedly) to be announced soon.

At our Crypto Driven event, we hosted co-founder and Managing Partner Kyle Samani, who is widely recognized in the crypto ecosystem for his writing and system-level analysis.

We covered a bunch of interesting topics including:

  • Multicoin’s portfolio strategy and preferred investment structure
  • Their big bet on Solana
  • Why composability is important
  • Why they’re excited about Web3 infrastructure as a key investment theme

Below is the video and full transcript.

Continue reading “In Conversation with Kyle Samani, Managing Partner, Multicoin”

In Conversation with Azeem Azhar, Author, The Exponential Age

For all the excitement about the explosive pace of progress in AI and technology that many readers of this blog will share, there’s an undeniable feeling of uneasiness: things are perhaps moving too fast and having second order effects across society that we are just beginning to truly appreciate.

The Exponential Age is one of the best books I’ve read in a while. It’s a bold exploration and call-to-arms over the widening gap between AI, automation, big data and other emerging technologies, on the one hand, and our ability to deal with their impact, on the other hand. Those technologies are growing at an exponential pace but our society is not. This “exponential gap” explains many problems of our time – from political polarization to ballooning inequality to unchecked corporate power.

It was a real pleasure to host at our most recent Data Driven event its excellent author, Azeem Azhar, an entrepreneur, investor, renowned technology analyst and host of the global tech podcast Exponential View.

Below is the video and full transcript.

Continue reading “In Conversation with Azeem Azhar, Author, The Exponential Age”

In Conversation with Emil Eifrem, Founder and CEO, Neo4j

The last couple of years have seen a dramatic acceleration in the adoption of graph databases, a category of databases that stores nodes and relationships instead of tables or documents.  That acceleration has clearly benefited Neo4j, which had a banner year in 2021, surpassing $100M in ARR and closing a $325M Series F financing round at a valuation of over $2B, which it calls “the largest funding round in database history”.

That would make Neo4j an overnight success, except for the fact that Neo4j started in 2007, pioneered the space and literally coined the term “graph database”.

Neo4j’s CEO, Emil Eifrem, had spoken at Data Driven NYC back in 2015 (the same night as the CEO of Snowflake and the CEO of Airtable, a pretty stacked lineup considering those three startups combined went on to represent many billions of market cap/valuations).

So it was particularly fun to have Emil back at the event and exciting to hear about the major progress the company has experienced over the last few years. Emil spoke from Sweden at around midnight his time, bringing impressive energy despite the late hour and it was a great conversation.

Below is the video and full transcript.

Continue reading “In Conversation with Emil Eifrem, Founder and CEO, Neo4j”

In conversation with Richard Craib, Founder, Numerai

I’ve been interested in the intersection of AI and crypto for a while (see AI & Blockchain: An introduction), and Numerai is one of the most exciting companies I came across in that world. Numerai is a new kind of crowdsourced quant hedge fund, which provides data for free and enables any data scientist around the planet to contribute models they believe will beat the stock market. Numerai offers its own token, called Numeraire, to incentivize participants.

As it turns out, this model delivers exciting results, and Numerai announced a few months ago that it had outperformed market neutral hedge funds by 29%.

It was a real pleasure welcoming Richard Craib, founder of Numerai, to Data Driven NYC to talk about the very exciting work Numerai has been doing.

Below is the video and full transcript.

Continue reading “In conversation with Richard Craib, Founder, Numerai”

New Investment: Softr

At FirstMark, we believe that every company is going to become not just a software company, but a data company.

For that to happen, it is essential that technologies that leverage data be democratized. For the foreseeable future, the global demand for digital innovation will continue to vastly outweigh the number of developers, engineers and scientists. Therefore, some of the technical complexity must be abstracted away to enable a broader group of people to build data-driven products and companies.

Continue reading “New Investment: Softr”

My VC resolutions for 2022

1. Be a mentor: give interns the chance to ghost-write my meme tweets and Web3 thought leadership posts

2. Improve work/life balance: minimize non-public effort, such as board work, due diligence

3. Show humility: use the word “humbled” more often

4. Think ahead: e.g., take pics with early stage founders for future “how it started, how it’s going” IPO posts

5. Be proactive: e.g., remove names of unsuccessful investments from all my social profiles

6. Listen more: speaking 90% of the time in meetings is too much – 80% is plenty

7. Share knowledge: be better at telling founders what to do based on my own experience as a founder 17 years ago in a different industry

8. Add value: reply to every single founder question with 🚀🚀🚀

Happy new year everyone, let’s go!

Celebrating 10 years of Data Driven NYC

Sometime in the Fall of 2011, I was looking for a community in New York where a non-technical person like me could learn about “Big Data”.

I couldn’t find any.  So, on a whim on a November Sunday night, I logged on to Meetup.com and created a group.

After thinking about it for about 30 seconds, I came up with an oh-so-catchy name for it:  the New York Data Business Meetup.

And so started a 10-year journey of community and network building, and an immensely fun and rewarding chapter of my professional life.

Continue reading “Celebrating 10 years of Data Driven NYC”

I am a VC. Here’s my daily routine.

I am a venture capitalist. Here’s my daily routine.

8am: Wake up hungover from a crypto dinner.

While in bed, tweet how refreshed I feel from a great night on my Eightsleep and my 1-hour morning meditation.

9am: Look at my reflection in the mirror and say “you are *not* getting disrupted by Tiger”. Repeat 10x, increasingly loudly.

10am: Look at the list of deal announcements on VC newsletters. Feel vaguely nauseous. “Should have done that one”. Then “oh, that one, too. And probably that one…”

11am: Haven’t tweeted in a while. Time for some thought leadership. What would Naval say?

11:30am: Debate how to reach out to a founder to tell them I “heard good things”. Email? Too cheugy. Text? Creepy. Telegram? Bit desperate. Signal? This job is so hard.

Continue reading “I am a VC. Here’s my daily routine.”

In Conversation with Mark Grover, CEO, Stemma

As the volume of data in the enterprise continues to explode, with ever larger amounts stored in data warehouses and data lakes, the problem of data discovery has become an increasingly painful one. How do data analysts, data scientists and business people find not just data, but the right data for the problem they need to solve? How do they know how it was produced, how recently it was updated and whether that’s the right dataset they need to use? In addition, from an organization’s perspective, there’s a question of data governance – how to manage access in a way that preserves data security and privacy, and ensures compliance with data protection regulations (GDPR, CCPA, etc.).

Data catalogs have been a powerful response to those problems, and that category has seen renewed activity in the last couple of years with a whole new group of startup entrants.

At our most recent Data Driven NYC, we got a chance to chat with Mark Grover, co-founder and CEO of Stemma and the co-creator of Amundsen, the leading open source data discovery and metadata engine. Mark built Amundsen while he was a product manager at Lyft and started Stemma to offer a fully managed Amundsen.

It was a fun conversation about the space. Below is the video and below that, the transcript.

Continue reading “In Conversation with Mark Grover, CEO, Stemma”

In Conversation with Aaron Katz, Co-Founder & CEO, ClickHouse

Ask anyone who spends time in the data ecosystem, and the name “ClickHouse” is one that has come up countless times in conversations over the last few years.

ClickHouse is a real-time OLAP (meaning, analytical) database that is known for its performance and scalability, and has a wide footprint of users around the world.

ClickHouse started its life at Yandex, the Russian search giant. It was originally created as an internal web analytics tool called Metrica, which evolved around 2009 into “Clickstream Data Warehouse” or ClickHouse for short.

The product was open sourced in 2016 and became a very popular project, with adoption at impressive scale by a number of companies including Yandex (10s of trillions of rows), Uber, eBay, Cloudflare, Spotify, Deutsche Bank, and more.

ClickHouse was spun out in early 2021 into ClickHouse, Inc., a commercial company co-founded by Aaron Katz, Alexey Milovidov (ClickHouse’s creator), and Yury Izrailevsky (ex-Google VP Engineering), with a focus on bringing ClickHouse to all types of companies via a managed version.

ClickHouse Inc raised a $50M Series A announced in September, followed closely by a $250M Series B last month, in which my firm, FirstMark, participated.

It was a treat to welcome Aaron Katz, the Co-Founder and CEO of ClickHouse, Inc. to Data Driven NYC. Prior to co-founding ClickHouse, Aaron had extensive experience as a world-class sales leader, most recently as the Chief Revenue Officer at Elastic and the Senior Vice President of Enterprise Sales at Salesforce.

Below is the video and below that, the transcript.

Continue reading “In Conversation with Aaron Katz, Co-Founder & CEO, ClickHouse”

2021 MAD Landscape: The Top 10 Trends

For anyone interested in a quick overview of our long-form 2021 Machine Learning, AI and Data (MAD) Landscape, here are the Cliffs Notes! My co-author John and I did a presentation at our most recent Data Driven NYC, focused on the top 10 trends in this year’s landscape.

As a preview, here they are:

  • Every company is a data company
  • The big unlock: data warehouses and lakehouses
  • Consolidation vs data mesh: the future is hybrid
  • An explosive funding environment
  • A busy year in DataOps
  • It’s time for real time
  • The action moves to the right side of the warehouse
  • The rise of AI generated content
  • From MLOps to ModelOps
  • The continued emergence of a separate Chinese AI stack

Below is the video from the event, and below that, the transcript.

Continue reading “2021 MAD Landscape: The Top 10 Trends”

A guide to understanding founder/VC fundraising conversations

VCs:

“Let’s take it from the top” = I have not read your deck

“There’s a lot to unpack here” = I have no idea what you just said

“We’re a very collaborative VC firm” = who else is in?

“Before VC, I was an operator” = I was a product manager for 9 months at YouTube, 10 years after it was acquired

“We can move aggressively” = we’ll take our time unless you’re also talking to Tiger?

Continue reading “A guide to understanding founder/VC fundraising conversations”

The Data Mesh: In Conversation with Zhamak Dehghani

In the admittedly small world of people who obsess over data technologies, one of the hottest topics of the last year has been the “data mesh”.

Created by Zhamak Dehghani of ThoughtWorks, the concept struck a chord and made the rounds in countless conversations on Twitter and elsewhere.

As I highlighted in the 2021 MAD Landscape, the data mesh concept is both a technological and organizational idea.  A standard approach to building data infrastructure and teams so far has been centralization: one big platform, managed by one data team, that serves the needs of business users.  This has advantages, but also can create a number of issues (bottlenecks, etc).  The general concept of the data mesh is decentralization – create independent data teams that are responsible for their own domain and provide data “as a product” to others within the organization.  Conceptually, this is not entirely different from the concept of micro-services that has become familiar in software engineering, but applied to the data domain.

It was a real treat to get to chat with Zhamak at our most recent Data Driven NYC.

Below is the video and below that, the transcript.

Continue reading “The Data Mesh: In Conversation with Zhamak Dehghani”

Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape

Full resolution version of the landscape image here

It’s been a hot, hot year in the world of data, machine learning and AI. 

Just when you thought it couldn’t grow any more explosively, the data/AI landscape just did: rapid pace of company creation, exciting new product and project launches, a deluge of VC financings, unicorn creation, IPOs, etc.  

It has also been a year of multiple threads and stories intertwining.

One story has been the maturation of the ecosystem, with market leaders reaching large scale and ramping up their ambitions for global market domination, in particular through increasingly broad product offerings.  Some of those companies, such as Snowflake, have been thriving in public markets (see our MAD Public Company Index), and a number of others (Databricks, Dataiku, Datarobot, etc.) have raised very large (or in the case of Databricks, gigantic) rounds at multi-billion valuations and are knocking on the IPO door (see our Emerging MAD company Index – both indexes will be updated soon).

But at the other end of the spectrum, this year has also seen the rapid emergence of a whole new generation of data and ML startups.  Whether they were founded a few years or a few months ago, many experienced a growth spurt in the last year or so.  As we will discuss, part of it is due to a rabid VC funding environment and part of it, more fundamentally, is due to inflection points in the market.

In the last year, there’s been less headline-grabbing discussion of futuristic applications of AI (self-driving vehicles, etc.), and a bit less AI hype as a result.  Regardless, data and ML/AI-driven application companies have continued to thrive, particularly those focused on enterprise use cases.  Meanwhile, a lot of the action has been happening behind the scenes on the data and ML infrastructure side, with entire new categories (data observability, reverse ETL, metrics stores, etc.) appearing and/or drastically accelerating.

To keep track of this evolution, this is our eighth annual landscape and “state of the union” of the data and AI ecosystem – co-authored this year with my FirstMark colleague John Wu.  (For anyone interested, here are the prior versions: 2012, 2014, 2016, 2017, 2018, 2019 (Part I and Part II) and 2020.)

For those who have remarked over the years how insanely busy the chart is, you’ll love our new acronym – Machine learning, Artificial intelligence and Data (MAD) – this is now officially the MAD landscape!

Continue reading “Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape”

Dataiku’s Series E: Ushering the Era of Everyday AI

Today, Dataiku is announcing a major new financing – a total of $400M at a $4.6B valuation, led by Tiger Global (which had also invested in the company’s Series D), alongside a great group of existing and new investors.

While financings are ultimately just milestones, this is certainly a testament to the remarkable progress the company has been making towards becoming a major global software player, as it has scaled to hundreds of customers around the world and some 750 employees (and yes, hiring a lot more).

Beyond the headlines and high-fives, what is the story? Here’s a quick industry backgrounder and reminder for anyone new to the company.

A huge part of the data world has been historically focused on business intelligence, with both historical players (Tableau, Microsoft’s Power BI, Google Looker) and newer players (SiSense, Mode, etc.). Business intelligence tools enable you to analyze the past and the present of your business: “which region performed best last quarter?”, “who are our best salespeople?” etc. This is sometimes referred to as descriptive analytics.

Dataiku is a leader in another part of the data world, which different people call different names: data science, enterprise AI (for artificial intelligence), enterprise machine learning. Beyond the semantics, the core idea is to make it possible to answer questions about the future of your business, based on the analysis of historical data: “which customers are most likely to buy this product?”, “which customers are most likely to churn?”, “which transaction is most likely to be fraudulent?”, “which region is most likely to show strong demand this month?”. This area is sometimes referred to as predictive analytics.

Continue reading “Dataiku’s Series E: Ushering the Era of Everyday AI”