Introducing Kedro: Yetunde Dada, Principal Product Manager at QuantumBlack

If you follow the various talks at Data Driven NYC, and the data ecosystem in general, it’s plenty apparent that the overall tooling for data, data science and machine learning is still in its infancy, particularly compared to the software stack.

While this may feel ironic (yes, I really do think so) given the billions in venture capital money that have been poured into the space, it’s worth remembering that the data stack (at least in its “big data” phase) is relatively recent (10-15 years), while the software stack has had several decades of evolution.

In many organizations, the data science and machine learning stack looks like a collection of various tools, some open source, some proprietary, glued together with one-off scripts. Teams started experimenting with one tool, then another, then created ad hoc pathways to make it all work together over time, and before you knew it, you ended up with complex environments that are painful to manage.

In response to this situation, various machine learning frameworks have emerged to abstract away the complexity. Several of those frameworks were developed internally at large tech companies to solve their own problems, and then open sourced.

Kedro is one such example. It was developed and maintained by QuantumBlack, an analytics consultancy acquired by McKinsey in 2015. It’s McKinsey’s first open-source product.

Kedro is somewhat hard to categorize. If it had its own category, it might be considered a Machine Learning Engineering Framework.  What React did for front-end engineering code is what Kedro does for machine learning code. It allows you to build “design systems” of reusable machine learning code.

At our most recent Data Driven NYC, we had the great pleasure of hosting Yetunde Dada, a Principal Product Manager at QuantumBlack, who has been the key driving force behind Kedro.

Below is the video and below that, the transcript.

Continue reading “Introducing Kedro: Yetunde Dada, Principal Product Manager at QuantumBlack”

Congratulations, Timber!

Although this was never publicly announced, for the last 2+ years, FirstMark was the lead investor in an exciting seed-stage company called Timber, alongside a great group of folks and firms including NextView, Notation, Addition, Lux Capital, Nat Turner, Zach Weinberg and Zach Perret.

Over time, Timber developed Vector, a very interesting open source project focused on observability data. Vector is effectively a “routing layer”, a high-performance observability data pipeline that enables customers to collect, transform, and route all their logs, metrics, and traces.
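To make the “collect, transform, route” idea concrete, here is a minimal sketch in plain Python. It is not Vector’s actual API or configuration format; the event shapes, the `transform` and `route` functions and the `sinks` dictionary are all hypothetical names chosen for illustration.

```python
# Hypothetical events; a real pipeline would read these from files, sockets or agents.
events = [
    {"kind": "log", "message": "  user login ok  "},
    {"kind": "metric", "name": "cpu", "value": 0.42},
    {"kind": "log", "message": "disk almost full"},
]


def transform(event):
    """Normalize an event before routing (here: trim whitespace in log messages)."""
    if event["kind"] == "log":
        return {**event, "message": event["message"].strip()}
    return event


# Each sink is a plain list standing in for a destination such as a log store.
sinks = {"log": [], "metric": []}


def route(event):
    """Send the event to the sink that matches its kind."""
    sinks[event["kind"]].append(event)


for raw in events:          # collect
    route(transform(raw))   # transform, then route
```

After the loop, the two log events sit in `sinks["log"]` with their messages trimmed, and the metric sits in `sinks["metric"]`.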

Continue reading “Congratulations, Timber!”

nextmv or the Democratization of Decision Science

Billions of dollars have been invested in the rise of data science and machine learning as mainstream disciplines in the world of business, one of the most exciting tech trends of the last (and next) decade.

In the enterprise, many of the applications of data science and machine learning ultimately produce a prediction: which customers are the most likely to buy? Or churn? Which transactions are most likely to be fraudulent? What part of town is likely to place the most food deliveries tomorrow afternoon?

However, powerful though it may be, there is one thing machine learning generally doesn’t tell you: once you have a prediction, what do you do with it? For example, once you have predicted high demand for food delivery in a certain part of town, how do you decide which delivery team member to dispatch where and when, to optimize for efficiency and maximize revenue and customer satisfaction?

Enter decision science. While the term has not crossed over to mainstream consciousness like its data science cousin, decision science has been around for decades. Also often known as Operations Research, it encompasses a variety of advanced analytical methods and quantitative models to help with decision-making and efficiency, including simulation, mathematical optimization, queuing theory, etc.  
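As a toy illustration of the dispatch question above, the sketch below treats it as a small assignment problem: given travel times between couriers and orders, find the one-to-one pairing with the lowest total. The numbers and the `best_assignment` helper are made up for this example, and the brute-force search is only practical for a handful of couriers; real systems rely on dedicated optimization solvers.

```python
from itertools import permutations

# Hypothetical travel times in minutes: rows are couriers, columns are orders.
travel_minutes = [
    [12, 7, 15],
    [9, 14, 6],
    [11, 8, 10],
]


def best_assignment(costs):
    """Try every one-to-one courier/order pairing and keep the cheapest."""
    n = len(costs)
    best_total, best_plan = None, None
    for plan in permutations(range(n)):
        # plan[courier] is the order index given to that courier.
        total = sum(costs[courier][order] for courier, order in enumerate(plan))
        if best_total is None or total < best_total:
            best_total, best_plan = total, plan
    return best_total, best_plan


total, plan = best_assignment(travel_minutes)
```

With these numbers, the cheapest plan sends courier 0 to order 1, courier 1 to order 2 and courier 2 to order 0, for 24 minutes in total.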

Continue reading “nextmv or the Democratization of Decision Science”

In conversation with Wes McKinney, CEO, Ursa Computing

For anyone in the data analysis community, Wes McKinney is a very well-known figure. In addition to literally writing the book on the topic (“Python for Data Analysis”), he’s played a leading role in several key open source projects: he created Python Pandas, he’s a PMC member for Apache Parquet, and he’s also the co-creator of Apache Arrow, his current development focus.

He’s also a serial entrepreneur, having co-founded DataPad (acquired by Cloudera) and now Ursa.

So it was a real pleasure hosting Wes for a chat at our most recent Data Driven NYC. As always, we tried to position the conversation to be approachable by everyone (with high level definitions) while being interesting for technical folks and industry experts.

Watch the video below (or read the transcript copied below the video) to learn:

  • What is pandas? What is a dataframe?
  • What is Arrow? What is its history and why is it a big deal?
  • What is Ursa Computing?

Continue reading “In conversation with Wes McKinney, CEO, Ursa Computing”

In conversation with Alok Gupta, Head Of Data Science & Machine Learning at DoorDash

Hosting Alok Gupta at our most recent Data Driven NYC was special for a couple of reasons.

First, because Alok is the very talented head of data science and machine learning in a company that has all sorts of really interesting use cases for AI and just had a phenomenal IPO, valuing it at $60B at the time of writing.

Second, because it was a homecoming of sorts for Alok, whose journey in the field of data science was inspired in part by Data Driven NYC – as he puts it:

This also feels like it nicely completes my journey starting 8 years ago when I was working on Wall Street in 2013 and started coming to your monthly evening talks at the Bloomberg building to learn more about ‘Data Science’. That was really a launching point for me to switch from trading to DS, and I’m grateful to be able to give back in a small way :).

One of those stories that brings joy to the heart of the organizers of this community!

Here is the video, as well as a full transcript for easy perusal:

Continue reading “In conversation with Alok Gupta, Head Of Data Science & Machine Learning at DoorDash”

Data Observability and Pipelines: OpenLineage and Marquez

There’s an inherent tension at the heart of modern data infrastructure. On the one hand, it’s becoming more mission-critical every day, as companies around the world rely on it to run their business. On the other hand, it’s more complex, and potentially brittle, than ever, an “assembly chain” involving multiple tools and repositories.

This tension has led to the emergence of DataOps as a distinct and very active segment. One particularly important area is known as “data lineage”. The concept is basically to monitor data pipelines and understand the journey of data through its various transformations and usages. This makes it possible to fix any issues that happen along the way, and go to the root of data quality, and potentially fairness, issues.
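A minimal sketch of the idea, in plain Python: each job records which datasets it read and which one it wrote, and a traversal then answers “what does this dataset depend on?”. This is not the OpenLineage specification or the Marquez API; the dataset names and the `record_job` and `upstream` helpers are hypothetical.

```python
from collections import defaultdict

# Maps each dataset to the datasets it was directly derived from.
parents = defaultdict(set)


def record_job(inputs, output):
    """Register that a job read `inputs` and produced `output`."""
    for name in inputs:
        parents[output].add(name)


def upstream(dataset):
    """Return every dataset that feeds into `dataset`, directly or indirectly."""
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


record_job(["raw_orders", "raw_customers"], "clean_orders")
record_job(["clean_orders"], "daily_revenue")
record_job(["daily_revenue", "fx_rates"], "revenue_report")
```

If `revenue_report` looks wrong, `upstream("revenue_report")` lists every dataset worth inspecting, all the way back to `raw_orders` and `raw_customers`.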

Because data lineage involves many different tools, platforms and companies, it makes sense for those different parts of the ecosystem to collaborate around standard definitions. This is the concept behind OpenLineage, a cross-industry effort involving creators and contributors from key data projects (DBT, Spark, Pandas, etc.), gathered together at the initiative of the founders of Datakin, an SF startup behind the open source data lineage project Marquez (originally started at WeWork).

At our most recent Data Driven NYC, we had the pleasure of hosting Julien Le Dem, CTO of Datakin. His talk (video below) is very approachable and educational.

Continue reading “Data Observability and Pipelines: OpenLineage and Marquez”

What are the Best Startup Name Changes in History?

The tech industry has a rich history of startups that started with a pretty awkward name, and rebranded over time to the big brands we have come to know. Some of those changes are plain fun to remember.

A few days ago, I tweeted this, and it led to a cool thread with plenty of examples and suggestions (Twitter at its best), so I thought I’d compile the results here for easy reference.

Continue reading “What are the Best Startup Name Changes in History?”

In Conversation with Tristan Handy (Fishtown/DBT) and Jeremiah Lowin (Prefect)

As we close an incredibly active year in the world of data infrastructure, it was a particular treat to host at Data Driven NYC two of the most thoughtful founders in the space, for an in-depth conversation about key trends.

Tristan Handy is the Founder & CEO of Fishtown Analytics, makers of DBT. DBT is one of the most popular open-source command-line tools that enable data analysts and engineers to transform data in their warehouse more effectively. Based in Philadelphia, the company raised both a $12.9M Series A and a $29.5M Series B, back to back in 2020. Tristan also does a great weekly newsletter, The Data Science Roundup.

Jeremiah Lowin is the Founder & CEO of Prefect. Prefect is the new standard in dataflow automation, trusted to build, run, and monitor millions of data workflows and pipelines. As another leader in the open-source world, Prefect powers data management for some of the most influential companies in the world.

We had a wide-ranging conversation, covering lots of topics: the modern data stack, data lake vs data warehouse, empowering data analysts, workflow automation, etc.

Video and full transcript below!

Continue reading “In Conversation with Tristan Handy (Fishtown/DBT) and Jeremiah Lowin (Prefect)”

New investment: Pigment

Business planning is, of course, one of the vital functions in the enterprise: it’s hard to run a successful company beyond a certain size without a clear sense of objectives and resources.

Yet, to this day, business planning is often a cumbersome, rigid and time-intensive process. Typically led by the finance team, it is largely done through email, Excel spreadsheets and meetings. In large companies involving multiple business units and geographies, the process can take several months. As a result, business planning tends to effectively happen once a year.

Continue reading “New investment: Pigment”

In Conversation with Amit Bendov, CEO, Gong

“It wasn’t a walk in the park. Today, Gong is a super hot company. But at that time, we got a lot of no’s, by not stupid people. There were a lot of objections, like salespeople are going to hate it as a big brother, and Google and Amazon will compete with you”, says Amit Bendov, the CEO of Gong.

From those early days of facing skepticism, Gong has indeed become a hot startup loved by customers and ushering in its own category, revenue intelligence. It’s also had tremendous fundraising success with VCs, raising $305M in less than 18 months, including a $200M round at a $2.2B valuation, announced in August 2020.

We were thrilled to welcome Amit back to Data Driven NYC, where he had spoken a few years ago, when he was CEO of Sisense.

Continue reading “In Conversation with Amit Bendov, CEO, Gong”

Quick S-1 Teardown: C3.ai

For anyone following the software industry, there’s been a little bit of snark about C3.ai (“C3”) over the years.  Here’s a company that was founded by Silicon Valley royalty (Tom Siebel, who sold Siebel Systems to Oracle in 2006 for just shy of $6B), with seemingly limitless access to capital, that somehow seemed to be pivoting every few years to something new – from energy at first, to the Internet of Things, to Artificial Intelligence. 

C3 also largely eschewed the startup echo chamber – funded personally by its founder at first, it didn’t raise money from the usual VC suspects, target well-known startups as its first customers, or open source any AI frameworks, working instead with a small group of Fortune 1000 and government customers. As a result, it didn’t build the kind of buzz that often precedes the most notable startups on their way to becoming public.

Lo and behold, what emerges in this IPO is a solid company by enterprise software IPO standards, with $157m in revenue, growing 71% yoy, a 75% gross margin and a $69m loss. 

It will be interesting to see how the market reacts to this IPO.

On the one hand, C3 is not growing anywhere near as explosively as a Snowflake, and in fact seems to have just had a bad quarter of decelerating growth. There are also other concerns, including account concentration and a substantial loss (not as pronounced as a Snowflake or Palantir, but still on the higher range of the software market).

On the other hand, the tailwinds around the deployment of ML/AI in the enterprise are very strong, and C3 is clearly positioning itself as one of the very first enterprise AI companies to go public: its ticker symbol on the NYSE will be “AI”, and the term “machine learning” is mentioned 56 times in the S-1.

This IPO will be an interesting test for the continued appetite of financial markets for all things AI.

Here’s a quick analysis of the S-1 and main characteristics of the business, put together by my FirstMark colleague John Wu and me.

Continue reading “Quick S-1 Teardown: C3.ai”

In Conversation with Ashley Kramer, CPO/CMO, Sisense

Sisense is a fast-growing business intelligence startup that was ranked #31 in this year’s Forbes Cloud 100, and reached unicorn status at the beginning of 2020 through a $100M Series D led by Insight Partners.

We’ve had Sisense speak twice at Data Driven NYC over the years, first CEO Amit Bendov (now CEO of Gong) (video of the talk here) and then new CEO Amir Orad (video of the talk here).

With all the recent progress, we were particularly excited to hear the update and welcome Ashley Kramer, who recently joined Sisense as Chief Product and Marketing Officer, after a very impressive run at Amazon, Tableau and Alteryx.

We covered a bunch of topics, including:

  • What does “Business Intelligence” actually mean?
  • The convergence of BI and data science
  • How Sisense positions itself in the context of the consolidation of the BI industry (hint: multi-cloud and focus on different personas, including business users, data analysts and more technical folks)
  • Where Sisense sits in the modern data stack
  • How Sisense has been building data network effects with its knowledge graph
  • Dashboards are great, but embedded analytics are better

As always, Data Driven NYC is a team effort – many thanks to Jack Cohen for co-organizing, Diego Guttierez for the video work and to Karissa Domondon for the transcript!

Continue reading “In Conversation with Ashley Kramer, CPO/CMO, Sisense”

Resilience and Vibrancy: The 2020 Data & AI Landscape

2020 Data and AI Landscape

In a year like no other in recent memory, the data ecosystem is showing not just remarkable resilience but exciting vibrancy.

When COVID hit the world a few months ago, an extended period of gloom seemed all but inevitable.   Yet, as per Satya Nadella, “two years of digital transformation [occurred] in two months”.  Cloud and data technologies (data infrastructure, machine learning / artificial intelligence, data driven applications) are at the heart of digital transformation.  As a result, many companies in the data ecosystem have not just survived, but in fact thrived, in an otherwise overall challenging political and economic context. 

Perhaps most emblematic of this is the blockbuster IPO of Snowflake, a data warehouse provider, which took place a couple of weeks ago and catapulted Snowflake to a $69B market cap company, at the time of writing – the biggest software IPO ever (see our S-1 teardown).  And Palantir, an often controversial data analytics platform focused on the financial and government sector, became a public company via direct listing, reaching a market cap of $22B, at the time of writing (see our S-1 teardown).

Continue reading “Resilience and Vibrancy: The 2020 Data & AI Landscape”

In Conversation with David Cancel, CEO, Drift

David Cancel puts the “serial” in serial entrepreneur. David has founded a total of five software companies over the years, which he says makes him “certifiable”. The list includes Performable, which was acquired by Hubspot, where David subsequently spent three years as Chief Product Officer.

In 2015, David left Hubspot to start Drift, a Boston-based conversational AI platform for marketing and sales. The company has grown very rapidly and now has a whopping 50,000 customers. Drift has raised a total of $107M from a number of venture firms including Sequoia, General Catalyst and CRV. The company has also been recognized as a Forbes Cloud 100 company.

David also has built a very strong presence and brand in the entrepreneurial community. He writes a popular newsletter, ‘The One Thing’ and hosts a long-running podcast, ‘Seeking Wisdom’. He’s very involved in a number of startups as advisor and angel investor. He’s also an Entrepreneur-in-Residence at Harvard Business School.

David and I had a really interesting, wide-ranging conversation at our most recent Data Driven NYC event, where we covered a range of topics including:

  • Building a global SaaS brand with 50,000 customers in an astonishingly short amount of time
  • How Drift was founded to take advantage of a fundamental paradigm shift
  • Creating a new type of CRM, driven by conversational data, with automation at the core

Continue reading “In Conversation with David Cancel, CEO, Drift”

In Conversation with George Fraser, CEO, Fivetran

One of the biggest recent trends in the data world has been the rapid emergence of the “modern data stack”.

This stack is largely centered around the cloud data warehouse, with its massive scalability and elasticity capabilities. Snowflake’s blockbuster IPO this week, and the underlying performance of the company, demonstrate the level of excitement from both customers and investors about the data warehouse.

But the modern data stack is more than just the data warehouse: there’s a whole pipeline involving other technologies, where data gets collected, stored and analyzed. Downstream from the data warehouse, you find business intelligence solutions, as well as some machine learning platforms, to analyze the data. Upstream from it, you find solutions that focus on extracting data from various sources and loading it into the data warehouse (ETL/ELT).
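For readers new to the ELT pattern, here is a rough sketch of the flow using Python’s built-in `sqlite3` module as a stand-in for a cloud data warehouse. The source rows, table names and columns are invented for illustration, and this is not how any particular vendor’s connectors are implemented: the point is only that raw data is loaded first and transformed afterwards, with SQL, inside the warehouse.

```python
import sqlite3

# Hypothetical source records, standing in for rows pulled from a SaaS API.
source_rows = [
    ("2020-09-14", "widget", 3, 9.99),
    ("2020-09-14", "gadget", 1, 24.50),
    ("2020-09-15", "widget", 2, 9.99),
]

warehouse = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# Extract + Load: copy the raw rows into the warehouse unchanged.
warehouse.execute(
    "CREATE TABLE raw_sales "
    "(sale_date TEXT, product TEXT, quantity INTEGER, unit_price REAL)"
)
warehouse.executemany("INSERT INTO raw_sales VALUES (?, ?, ?, ?)", source_rows)

# Transform: build an analysis-ready table with SQL inside the warehouse.
warehouse.execute(
    """
    CREATE TABLE daily_revenue AS
    SELECT sale_date, ROUND(SUM(quantity * unit_price), 2) AS revenue
    FROM raw_sales
    GROUP BY sale_date
    """
)

# Downstream tools (BI, machine learning) would query the transformed table.
daily_revenue = warehouse.execute(
    "SELECT sale_date, revenue FROM daily_revenue ORDER BY sale_date"
).fetchall()
```

Keeping `raw_sales` untouched means the transformation can be rewritten and re-run later without going back to the source system.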

This is where Fivetran comes in. A fast-growing company with unicorn status, it automates data integration from source to destination, through a large library of connectors.

It was very fun to host Fivetran’s CEO, George Fraser, at our most recent Data Driven NYC event. We had a great conversation, both very approachable for a non-technical audience and interesting for more technical folks.

Continue reading “In Conversation with George Fraser, CEO, Fivetran”