Quick S-1 Teardown: Confluent

A member of our Emerging MAD Index of companies on their path to an IPO, Confluent is a very interesting company in a strategic part of the data space, providing infrastructure for real-time data streaming – what it nicely calls “data in motion”, in contrast to the world of batch processing or “data at rest”.

I had the pleasure of hosting the company’s co-founder and then CTO, Neha Narkhede, at Data Driven NYC back in 2016, and her great talk remains entirely relevant to understanding the premise behind the company and its core technical foundation.

Confluent recently released its full S-1, and will trade under the stock ticker CFLT on the NASDAQ.

In the same vein as previous “Quick S-1 teardowns” (see Palantir, Snowflake, nCino), here are some high-level thoughts and quick highlights, from my colleague John Wu and me.


The streaming opportunity and beyond

  • Real-time data processing has been a hot topic since the early days of the Big Data era, 10-15 years ago – notably, processing speed was a key advantage that precipitated the success of Spark (a micro-batching framework) over Hadoop MapReduce
  • However, for years, real-time data streaming was the market segment that was always “about to explode” but never quite did. Some industry observers argued that the number of applications for real-time data is, perhaps counter-intuitively, fairly limited, revolving around a finite set of use cases such as online fraud detection, online advertising, Netflix-style content recommendations or cybersecurity.
  • Confluent has certainly proved the naysayers wrong, as it’s clearly built an impressive, fast-growing business
  • The bull case for Confluent in public markets is that the shift to real-time data processing is just starting in earnest:
    • Online machine learning, IoT and microservices are drastically increasing the need for real time, as the company notes
    • The range of use cases will continue to expand, not just for real-time data processing but also, increasingly, for real-time data analytics – powered by emerging stacks such as Apache Kafka + ksqlDB + ClickHouse (or Druid, etc.) + Superset (or Looker, Mode, etc.).
  • Confluent’s ambitions go beyond real-time data:
    • Confluent wants to gradually conquer “data at rest” use cases as well. ksqlDB, its native event streaming database, “unifies the processing of data in motion and data at rest”. Confluent believes this will lead to significant displacement of batch data processing on traditional databases and a corresponding shift in spend to data in motion technologies.
    • As a result, Confluent views itself as becoming a foundational data platform in the enterprise for all use cases: “the central nervous system of an organization, allowing data to be captured and processed as it is generated around the whole organization, enabling organizations to react intelligently in real-time.”
  • Will it get there?
    • There’s no lack of other companies in the ecosystem that also view themselves as the heart of the enterprise’s data infrastructure (e.g., Snowflake, Databricks). In addition, the streaming world itself is increasingly competitive with both large players (Azure Event Hubs, AWS Kinesis and DynamoDB Streams, Google Dataflow, TIBCO Streaming, etc) and emerging startups (for example, see our recent fireside chat with Materialize)
    • This is certainly an enormous market, with room for a number of successful public companies, as we are in the early innings of the deployment of data infrastructure as a core foundation for any company around the world 
    • The key question is whether Confluent’s impressive performance in the data streaming world can become a beachhead for a broader domination of that enormous market, where it would, over time, replace a lot of other key data repositories.

Open Source and the Power of Kafka

  • Confluent is another example of an exciting modern enterprise software company built on top of open source – Apache Kafka
  • This is also an example of the now classic enterprise startup creation story that makes VCs swoon: smart engineers (Jay Kreps, Neha Narkhede, Jun Rao) work at a big Internet company (LinkedIn) and are confronted with technical problems that the rest of the world has not experienced yet. To solve those problems, they create a new framework (Kafka). They open source said framework (in 2011), which rapidly gains in popularity. A few years later (2014), they leave the big Internet company to create a startup (Confluent), which will build the commercial version of the open source software. VC money pours in, and the company is off to the races.
  • It’s interesting to note that Confluent to this day remains pretty single-threaded on Kafka, particularly if you compare it to a company like Databricks, which rose to prominence on the back of Spark but has since launched multiple open source products (MLflow, Koalas, Delta Lake, etc.)
  • Kafka is a big part of Confluent’s bottom-up, developer-first go-to-market motion
  • Kafka is also a big part of Confluent’s moat
    • There are real challengers to Kafka these days, in particular the combination of Apache Pulsar and Apache Flink, which many argue offers higher performance. 
    • However, Kafka is hard to displace because of the size of its developer community (“more than 60,000 meet-up members across over 200 global meetup groups, estimated to have been used by over 70% of the Fortune 500”) and the overall maturity of its ecosystem (partners, connectors to hundreds of sources, etc.)
  • Like other open source companies (Redis Labs, MongoDB, Cockroach Labs, Elastic), Confluent introduced some restrictions on its open source model back in 2018. This was to protect itself against the big cloud providers, such as Amazon, which had just announced its own fully managed version of Kafka. Confluent didn’t make any changes to the Kafka license itself. Instead, it created a new Confluent Community License that is “substantially differentiated from Apache Kafka and was fundamentally re-architected to operate at cloud-scale, while being interoperable with existing Apache Kafka systems”.
  • Confluent followed the same path to monetization as other successful open source companies, going from an open source project to an enterprise self-managed offering (Confluent Platform) to a fully managed offering (Confluent Cloud). The company is clearly betting that Confluent Cloud will become to Confluent what Atlas has become to MongoDB (where it went from a new product in late 2016 to representing 46% of MongoDB’s revenue in 2021, see our fireside chat with Dev Ittycheria, CEO of MongoDB). While Confluent Cloud is the fastest growing part of Confluent (see below), it is still early days as it represents “only” 18% of Confluent’s overall revenues.


  • FY20 ended with revenue at $236.6M
  • FY20’s net loss was high at ($229M)
  • $280M of cash on the balance sheet as of March 31, 2021 – not that much given the size of the net loss
  • 1,473 employees operating across 20 countries


  • Founded in September 2014, the company grew extremely quickly through most of its existence:
    • Broke the $100M annual recurring revenue mark in April 2019, just under 5 years after its founding
    • From 2018 to 2020, total revenue grew from $65.2M to $236.6M
  • However, the pace of growth has slowed down significantly, albeit on a larger base: +130% YoY from 2018 to 2019 but only +58% YoY from 2019 to 2020
  • This still makes Confluent a high-growth company, but it is far below some best-in-class growers at IPO like Datadog or Snowflake (Snowflake, for example, grew 174% to $264.7M in the full fiscal year before its IPO)
  • According to the management discussion, COVID-19 represented only a modest adverse impact on certain parts of the business
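
The growth arithmetic above can be sanity-checked with a minimal Python sketch. It uses only figures quoted in this post ($65.2M in 2018, $236.6M in 2020, +58% YoY in 2020). The 2019 revenue and the two-year compound growth rate are values we derive from those rounded inputs, not numbers taken from the S-1, and the variable names are ours.

```python
# Revenue figures quoted above, in $M.
revenue_2018 = 65.2
revenue_2020 = 236.6
growth_2020 = 0.58  # +58% YoY from 2019 to 2020, as quoted (rounded)

# Back out the 2019 revenue implied by the rounded 2020 growth rate.
# This is a derived approximation, not a figure quoted from the filing.
implied_revenue_2019 = revenue_2020 / (1 + growth_2020)

# Growth from 2018 to the implied 2019 figure (quoted above as +130%).
implied_growth_2019 = implied_revenue_2019 / revenue_2018 - 1

# Compound annual growth rate over the two years from 2018 to 2020.
cagr_2018_2020 = (revenue_2020 / revenue_2018) ** 0.5 - 1

print(f"Implied 2019 revenue: ${implied_revenue_2019:.1f}M")
print(f"Implied 2018-2019 growth: {implied_growth_2019:.0%}")
print(f"2018-2020 CAGR: {cagr_2018_2020:.0%}")
```

The point of the exercise is simply that the two quoted growth rates are consistent with the 2018 and 2020 revenue figures, and that the blended two-year rate sits well above the most recent year's.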

Lines of Business:

  • Two broad product offerings: 1) Confluent Cloud – fully-managed cloud-native SaaS offering, and 2) Confluent Platform – enterprise-ready, self-managed software offering that is cloud-agnostic and multicloud, and can be deployed on premise, private cloud, or in the public cloud
  • Subscription revenue (Confluent Cloud, Confluent Platform licenses, and Confluent Platform post-contract support, maintenance, and upgrades) is growing faster than services: it made up 88% of revenue in 2020, up from 87% in 2019, while services made up the remaining 12%
  • Confluent Platform growth was largely driven by PCS (post-contract support, maintenance, and upgrades) rather than licensing: PCS revenue grew 63%, from $78.7M in 2019 to $128.2M in 2020
  • Licenses for Confluent Platform grew just 32%, from $37.1M in 2019 to $49M in 2020
  • Confluent Cloud, first launched in 2017, is the fastest growing Confluent product, and is likely to be the major future growth driver.
    • It grew 117% YoY from 2019 to 2020, vs. +53% YoY for Confluent Platform
    • However, it is still nascent, contributing only 13% ($31.4M) of Confluent’s revenue in 2020, up from just 10% in 2019. For the first quarter of 2021, this figure was 18%
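
The revenue mix above can be reconstructed from the line items quoted in this post. The following is a rough Python sketch under that assumption: the dollar figures are the ones cited above, while the grouping into "platform" and "subscription" totals and all variable names are ours.

```python
# Figures quoted above, in $M.
total_revenue_2020 = 236.6
platform_pcs = {"2019": 78.7, "2020": 128.2}       # post-contract support
platform_licenses = {"2019": 37.1, "2020": 49.0}
cloud_2020 = 31.4

# Confluent Platform = PCS + licenses.
platform_2019 = platform_pcs["2019"] + platform_licenses["2019"]
platform_2020 = platform_pcs["2020"] + platform_licenses["2020"]
platform_growth = platform_2020 / platform_2019 - 1

# Subscription = Confluent Platform + Confluent Cloud; the rest is services.
subscription_2020 = platform_2020 + cloud_2020
subscription_share = subscription_2020 / total_revenue_2020
services_share = 1 - subscription_share
cloud_share = cloud_2020 / total_revenue_2020

print(f"Confluent Platform growth: {platform_growth:.0%}")
print(f"Subscription share of revenue: {subscription_share:.0%}")
print(f"Services share of revenue: {services_share:.0%}")
print(f"Confluent Cloud share of revenue: {cloud_share:.0%}")
```

Summing the pieces this way is a quick check that the quoted percentages (53% Platform growth, 88% subscription, 13% Cloud) hang together.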


  • The company talks a lot in the S-1 about its ability to “land and expand” at customers and grow over time, but its Net Revenue Retention (which measures how much revenue from existing customers grows, net of churn and downgrades) is decreasing
  • Dollar-based net retention for 2020 was 125%, down sharply from a best-in-class 177% in 2018 and 134% in 2019.
  • The decline seems primarily driven by:
    • the impact of existing customers becoming a larger portion of both the overall customer base and ARR,
    • large initial deal sizes that incorporate potential growth,
    • the impact of the COVID-19 pandemic, and
    • the initial impact of existing customers transitioning to Confluent Cloud
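
For readers less familiar with the metric, here is a minimal Python sketch of one common way to compute dollar-based net retention: ARR today from the customers that existed a year ago, divided by those same customers' ARR a year ago. This is a generic formulation and not necessarily the exact methodology Confluent uses in its S-1. The function name, customer names and ARR values are all hypothetical.

```python
def dollar_based_net_retention(arr_year_ago, arr_today):
    """Return today's ARR from the year-ago cohort divided by its year-ago ARR.

    Customers acquired since then are ignored; churned customers count as 0.
    """
    base = sum(arr_year_ago.values())
    if base == 0:
        raise ValueError("year-ago cohort has no ARR")
    retained = sum(arr_today.get(customer, 0.0) for customer in arr_year_ago)
    return retained / base


# Hypothetical ARR by customer, in $K (illustrative only, not Confluent data).
arr_year_ago = {"acme": 100.0, "globex": 200.0, "initech": 50.0}
# "initech" churned and "hooli" is a new customer, so neither adds to the numerator.
arr_today = {"acme": 150.0, "globex": 220.0, "hooli": 80.0}

nrr = dollar_based_net_retention(arr_year_ago, arr_today)
print(f"Dollar-based net retention: {nrr:.0%}")
```

A value above 100% means expansion at existing customers more than offsets churn and downgrades, which is why a decline from 177% to 125% still describes a business that is expanding within its installed base, just more slowly.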

Gross and Net Margins:

  • Gross margin decreased from 75% in 2018 to 67% in 2019, before improving slightly to 68% in 2020
  • This decrease was driven by declining margins across both subscription (down from 83% in 2018 to 76% in 2020) and services (down from 19% in 2018 to 6% in 2020)
  • Growth in operating expenses from 2019 to 2020 (+99% YoY, driven by a spike in general & administrative expenses) outpaced revenue growth (+58% YoY)
    • Confluent’s net margin in 2020 was -97%, down from -64% in 2018 and -63% in 2019. This decrease can be attributed to increasing OpEx across all functions over the two-year period, with ballooning R&D (+157% YoY) and S&M (+112% YoY) spend in 2019, followed by G&A in 2020 (+397% YoY)
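
The margin figures above follow from numbers quoted earlier in this post. As a rough Python sketch, assuming the $236.6M revenue, $229M net loss, 68% gross margin, and the +99% OpEx and +58% revenue growth rates cited above (the implied gross profit and the OpEx intensity change are our derived approximations, and the variable names are ours):

```python
# Figures quoted in this post, in $M unless noted.
revenue_2020 = 236.6
net_loss_2020 = 229.0
gross_margin_2020 = 0.68   # rounded, as quoted
opex_growth_2020 = 0.99    # +99% YoY, as quoted
revenue_growth_2020 = 0.58 # +58% YoY, as quoted

# Net margin = net income / revenue (negative because of the net loss).
net_margin_2020 = -net_loss_2020 / revenue_2020

# Gross profit implied by the rounded gross margin (approximate).
implied_gross_profit_2020 = gross_margin_2020 * revenue_2020

# When OpEx grows faster than revenue, OpEx as a share of revenue
# rises by this factor (minus one) year over year.
opex_intensity_change = (1 + opex_growth_2020) / (1 + revenue_growth_2020) - 1

print(f"2020 net margin: {net_margin_2020:.0%}")
print(f"Implied 2020 gross profit: ${implied_gross_profit_2020:.1f}M")
print(f"Change in OpEx as a share of revenue: {opex_intensity_change:+.0%}")
```

The last line is the mechanical reason the net margin deteriorated: with OpEx nearly doubling while revenue grew 58%, each dollar of revenue carried roughly a quarter more operating expense than the year before.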


  • As of March 2021, Confluent had 2,540 customers (up from 820 in December 2019), including 136 of the Fortune 500
  • Confluent has 513 customers paying over $100K in annual recurring revenue, and 60 paying over $1M in ARR
  • Notable customers span across industries, including Goldman Sachs and Morgan Stanley in financial services, NASA JPL and the Centers for Medicare & Medicaid Services in government, Michelin and SunPower in manufacturing, Advance Auto Parts, Dick’s Sporting Goods, and Domino’s Pizza in retail, and Instacart, ServiceNow, and Grab in technology


  • While Confluent’s absolute growth is largely driven by increasing US sales (+$54M from 2019 to 2020), Confluent’s international presence is expanding at a faster rate. 
  • In 2019, 32% of Confluent’s revenue was from outside of the United States. For 2020, that figure was 34%, and for the first quarter of 2021, it was 36%
  • Confluent’s international revenue grew 67% YoY from 2019 to 2020, compared to just 53% for its US revenue
  • No country other than the United States represented more than 10% of Confluent’s revenue during any period
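
The geographic split above can be approximately reconstructed from the percentages quoted in this post. The sketch below is a back-of-the-envelope check in Python: it assumes the $236.6M 2020 revenue and +58% growth quoted earlier, and derives the 2019 total from them, so the outputs are approximations built on rounded inputs rather than figures from the S-1. Variable names are ours.

```python
# Figures quoted in this post.
revenue_2020 = 236.6                           # $M
revenue_growth_2020 = 0.58                     # +58% YoY, rounded
intl_share = {"2019": 0.32, "2020": 0.34}      # share of revenue outside the US

# 2019 total revenue implied by the rounded growth rate (derived, not quoted).
revenue_2019 = revenue_2020 / (1 + revenue_growth_2020)

# Split each year's revenue into international and US portions.
intl_2019 = intl_share["2019"] * revenue_2019
intl_2020 = intl_share["2020"] * revenue_2020
us_2019 = revenue_2019 - intl_2019
us_2020 = revenue_2020 - intl_2020

us_increase = us_2020 - us_2019
us_growth = us_2020 / us_2019 - 1
intl_growth = intl_2020 / intl_2019 - 1

print(f"US revenue increase: ${us_increase:.0f}M")
print(f"US growth: {us_growth:.0%}")
print(f"International growth: {intl_growth:.0%}")
```

Because every input is rounded, the results should land within about a point of the reported figures rather than match them exactly. The takeaway is the same either way: the US adds more absolute dollars, while international grows faster from a smaller base.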

Funding & Ownership

  • According to CB Insights, Confluent has raised $455.9M across 5 rounds, most recently raising a $250M Series E led by Coatue, valuing the company at $4.5B. Other investors include Sequoia Capital, Index Ventures, Benchmark, Franklin Templeton, Altimeter Capital, Data Collective, and LinkedIn.
  • Investor ownership: Benchmark owns the largest stake at 15.3%, followed by Index with 13%, and Sequoia Capital with 9.3%.
  • Founder ownership: At IPO time, Jay Kreps owns 12.6% of outstanding shares, Jun Rao owns 10.6%, and Neha Narkhede and family trusts affiliated with her own an aggregate of 7.5%.
    • Note that those percentages reflect recent secondary sales: all 3 founders sold some of their shares in a secondary transaction in July 2020 (Jay sold $39M, Neha sold $39M and Jun sold $30M) in conjunction with the company’s Series E. Subsequently, Neha sold another $77.8M in secondary shares in September 2020, for a total of 5,334,779 shares, or about 2.33%.

DISCLAIMER: none of this is investment advice — for venture capitalists like us, going through S-1s is a routine learning exercise, and we’re just “open sourcing” our effort in case it might be interesting to others.
