Dataiku (in which I’m a proud investor and board member) has had an impressive ride over the last few years. An early entrant in the enterprise Data Science and Machine Learning platform category, the company successfully expanded from its French/European roots to build a very strong presence in the US (where the company is now headquartered) and, increasingly, Asia.
Along the way, Dataiku:
- became a unicorn, most recently raising a $100M Series D in 2020
- was named a “Leader” in Gartner’s Magic Quadrant for Data Science and ML Platforms in both 2020 and 2021
- collected many accolades, such as CB Insights’ “AI 100” and several Forbes lists: “Cloud 100”, “AI 50” and “America’s best startup employers in 2021”
It was really fun to host CEO Florian Douetteau at Data Driven NYC once again, after previous appearances in 2016 (here) and 2018 (here). We covered a bunch of different topics, including:
- What enterprise AI is about: not flying cars, but optimizing hundreds of business processes
- Why enterprises need to move past their fear of data and AI
- The key principles behind the design of the Dataiku platform: handling the entire data lifecycle, and democratizing data/AI across teams
- Dataiku’s partnership with Snowflake
- The upcoming launch of their starter / SMB self-serve product, Dataiku Online
Below is the video and below that, the transcript.
(As always, Data Driven NYC is a team effort – many thanks to my FirstMark colleagues Jack Cohen and Katie Chiou for co-organizing, Diego Guttierez for the video work and to Karissa Domondon for the transcript!)
VIDEO:
TRANSCRIPT (edited for clarity and brevity)
[Matt Turck] Dataiku has been growing incredibly fast and it is getting better known in the US now, but it started in Europe. Maybe tell us a little bit of the history.
[Florian Douetteau] We started as a small team in France a few years ago. And yes, we probably kept a low profile as we were moving to the US, to New York as a first step. But recently, probably in the last 18 months or so, lots of things accelerated a lot: new funding, reaching unicorn status, getting many Fortune 100 customers in a row. So, lots of traction, I think driven also by the fact that the data and AI market was maturing a lot. And so right now, we are growing fast. We are 650 employees and doubling year over year. So, everything is going very fast in the world of data and AI these days, which is probably a feeling that is also shared by most of the participants you have today. Everything is going quickly.
The company is headquartered in New York, with multiple locations throughout the US?
Correct. Headquartered in NYC, with multiple locations in the US. We don’t have a full San Francisco office, but that will happen soon.
Dataiku is a platform for the deployment of data science, machine learning and AI in the enterprise. What does it actually mean to deploy machine learning and AI in the enterprise, and what are some of the key use cases?
Data and AI in the enterprise is mostly not about a magical product, or a flying machine driven by AI. It’s mostly about the business processes, probably hundreds of them, that you have inside the company. Most companies operate like clockwork, meaning you’ve got many, many business processes that work together in order to create value. Possibly for any decent-sized company, 500-1000 of them. And data and AI is mostly about optimizing each of them step-by-step to make them more efficient, and more automated.
And that’s why it’s so hard: data and AI in the enterprise is mostly about this very long transformation that most enterprises will have to go through. It’s probably a 20-25 year journey, and we are one third into it. And at the end of the journey, you have a completely new way to work, with data and AI being very pervasive.
And so it’s about use cases as different as, in finance, optimizing your cash management by better predicting where you will be, or understanding what defects you’ll get from your providers. It’s about better targeting your customers using data. It’s about, if you’re in healthcare or pharmaceuticals, better understanding where the choke points are in your logistics journey.
Many of those use cases, taken individually, can be mundane, but if you add them up, that’s tremendous value for the enterprise. And that’s most of it, where the value of enterprise AI is, the sum of all of those use cases.
Could you paint a picture of the FAANG vs. the remaining 99.5% of the world? What does that mean in terms of companies’ abilities to deploy machine learning and AI?
There is a perspective that the FAANGs are ahead in terms of data and AI, meaning that they are actually essentially AI companies, whereas the other companies of the world are not.
But I think it’s the wrong picture in terms of where the world should be at the end. [Non-FAANG] companies should move away from this position of fear. Today, most enterprises fear data and fear AI in a sense. They have this fear of being late. They have the fear of not having the skills inside the organization. They’ve got the fear of the complexity of data and AI, just because of the number of things, and number of systems you need to put together.
Enterprises need to build the kind of systems, platforms, and processes to get things done at a larger scale and make data and AI the new normal. [That’s the lesson from] Facebook and Google and other new digital companies: they just consider data and AI as one of the key aspects of their day-to-day operations. It’s like the blood in [their] veins. That’s where most other companies need to get to.
In order to get there, the Googles and the Facebooks of the world built open source projects. They hired lots of data scientists and data engineers and so forth. It was very, very costly. Possibly hundreds of millions in order to build those.
But for most other companies on planet Earth, the journey won’t be the same, meaning they won’t spend $200M or $500M in order to build a data and AI platform from scratch, just because it’s not realistic. They’ve got better things to do, actually.
And in order to do that, in order to actually make it normal for enterprises to get to data and AI, we built Dataiku – so they don’t have to go through that, all of that complexity or fears of getting things done in terms of data.
Could you talk about the fundamental ideas behind the design of the Dataiku platform?
[Our vision] is all about making AI simple and normal for the enterprise. Meaning there is no way that in the long term, 10 years from now, enterprises will have five, six, seven, eight different tools to manage their data lifecycle. It would make no sense.
In order to make things simple and usable, you need to have mostly one single platform for the entire data lifecycle, starting from the data as it is, which is usually ugly, dirty data as it is produced by systems at the beginning of the cycle, and business value at the end of the cycle. And in between, you have a few logical steps: you need to clean the data, you need to merge data together, you need to understand the business problem, you need to build a predictive model. Sometimes you need more metrics. You need to apply machine learning. You need to move that in production and upload and load it. So you’ve got AutoML, MLOps, data preparation, all of those keywords. But ultimately AutoML, data preparation and MLOps, from the perspective of the business users, are just keywords; they don’t bring value by themselves.
What brings value by itself is actually the ability to get from data as it is to the answer to a business problem. And that’s the purpose and principle of Dataiku: we try to actually get the users through this full journey.
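[SIDE NOTE: the steps described above (clean, merge, model, answer) can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions: the data, the function names and the per-segment average used as a "model" are all hypothetical and are not Dataiku's API.]

```python
from statistics import mean

# Hypothetical raw data "as it is": a missing value and inconsistent naming.
raw_orders = [
    {"customer": "ACME ", "amount": "120.0"},
    {"customer": "acme", "amount": None},
    {"customer": "Globex", "amount": "80.5"},
    {"customer": "globex", "amount": "99.5"},
]
raw_segments = [
    {"customer": "acme", "segment": "enterprise"},
    {"customer": "globex", "segment": "smb"},
]


def clean(orders):
    """Drop rows with a missing amount; normalize customer names and types."""
    return [
        {"customer": o["customer"].strip().lower(), "amount": float(o["amount"])}
        for o in orders
        if o["amount"] is not None
    ]


def merge(orders, segments):
    """Join each order to the segment of its customer."""
    by_customer = {s["customer"]: s["segment"] for s in segments}
    return [{**o, "segment": by_customer[o["customer"]]} for o in orders]


def fit(rows):
    """Stand-in for a predictive model: average order amount per segment."""
    segments = {r["segment"] for r in rows}
    return {s: mean(r["amount"] for r in rows if r["segment"] == s) for s in segments}


# From data as it is to a business answer: the expected order size per segment.
model = fit(merge(clean(raw_orders), raw_segments))
print(model)
```

The point of the sketch is only that each step feeds the next, so a break anywhere in the chain means no business answer at the end.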
Our other key design principle is that, in order to do data, you can’t be alone. Meaning there is a need for collective awareness on data: there is no single hero on data, the team is usually the hero.
And so you need to empower business analysts, and actually most business folks around data, just because there are not enough data scientists in the world to solve all the data problems we have. So it’s about bringing together the data scientist, and the data analyst, and the business analysts, and making them work together on those large complex problems that should bring value. We also built a platform with this crazy idea that it could be a platform for data analysts. So very visual, but also appealing for coders, meaning people doing Python or R on an everyday basis. And that’s the other principle of the platform.
It seems very much like the saying, “It takes a village.” There’s a very strong collaboration layer across Dataiku, and that has been the case from the beginning, so that data engineers, data scientists, data analysts, but also business folks can all work together on one platform. It keeps a history of previous experiments and projects, so it acts as a system of record. And very practically, if you’re working on a data project and you have folks in Chicago, Texas, and London, that’s the one platform where you can collaborate with everyone.
Yes. It’s especially important these days, just because collaboration has never been harder or easier, depending on your perspective. And so indeed you have this need. Essentially we built Dataiku out of frustration with the lack of collaboration between business stakeholders, data analysts, and data scientists, teams that were operating in silos instead of collaborating. Like, I give you an order, you send me an email, and I’ll try to understand what you mean by this email. Will you be imperial or metric? It’s not a proper way to collaborate. It’s actually a way to destroy value instead of creating value. And that’s why we built the platform, where essentially you can centralize, see this as teamwork, and build a pipeline together step-by-step as a team. And this way you can much better articulate and understand what’s happening.
That’s also the other big pain in data these days, is that most people don’t actually understand what is happening, where the data is coming from, what this particular data transformation machine learning model meant. And it’s also a big challenge for the data space these days, because it’s getting out of control, if you don’t actually have the good practice in place.
Help us paint the picture of this space: compared to other players like Databricks or DataRobot, where does Dataiku fit in?
I guess having data at the beginning of the name is very common among data companies [laughter]. Dataiku is the one with the bold perspective of having AI in the middle of the name. I guess a very bold positioning.
[SIDE NOTE ON THE COMPANY NAME: Dataiku is a portmanteau of “data” and “haiku”. A haiku is a small Japanese poem in three verses. It’s typically very small but carries a lot of meaning.]
Some companies in the data space, such as the Databricks of the world, are mostly about infrastructure, and helping you put data in a given location. Other companies like DataRobot are mostly about automation, especially of the ML part of the process, meaning the AutoML part of the process.
In contrast, Dataiku is mostly about boosting the creativity and the inventiveness of as many people as possible inside the organization.
Because I think that’s where the challenge actually is: you need to use data in order to empower people, and not just with reports, or new charts, or whatever, but by putting people in charge so that they can use data to build something new. And I think this creativity part of data is possibly what will make, and what will keep making, this field interesting in the next few years.
Today happens to be a very timely conversation, because you just made an important announcement this afternoon about the relationship with Snowflake. Do you want to expand on that?
Yes, we started working with Snowflake a few years ago, building integrations with their product. But then we had deeper conversations with them starting a few months ago, especially as they announced Snowpark, which is a way within Snowflake to further expand the capability of the platform by enabling not just SQL, but also Java or other programming languages inside the platform.
It really helped us to actually think about new ways to leverage Snowflake and Dataiku together. As I explained, Dataiku is essentially about end to end – meaning, you can do everything end to end in the Dataiku platform from data preparation, to machine learning, to moving things in production. And so working with the teams at Snowflake with those new evolutions of the platform, we were able to build new capabilities together, and also build further our plans in the future. And we are now also working on a deeper partnership with Snowflake as a platform.
And it’s definitely an interesting step for us. As one example, Dataiku is accessible directly from Snowflake and their partner portal. You can very easily click from your Snowflake instance and get a Dataiku instance, meaning a data science environment. From there you can do lots of things that were very costly or complicated to do in the past. Meaning a few years ago, doing all of that heavy lifting to be able to do ML on a few gigabytes of data could take any seasoned data engineer a few days.
Today you can actually do the same kind of achievement in a few clicks, and I think this kind of flexibility is key for the enterprise in order to actually get to the next steps, and actually deliver on valuable data in the future.
And also, there’s a new “starter” version of Dataiku in the works [note: currently in beta], which will be particularly easy to use and affordable, in particular for smaller companies?
Yes, we’re releasing this new product, Dataiku Online. It’s fully self-service, and very easy to access, and leverage for folks. And it starts as something that you can use as an individual, then as a small team. And we meant it as a way actually for anyone to start discovering, delivering value from data science. Even if you’re a team, or a company of 10, 15, or 20 people, you can actually start doing things quicker in terms of data science, thanks to this product.
Before switching to audience questions, I’d love you to take a step back and talk about what you have learned. You’ve been running this company for eight years now, and you’ve had first-hand experience in deploying all these projects in some pretty large and impressive companies.
What we learned over time is that it takes lots of energy and will to fill the gap between technology and business objectives. Meaning you’ve got technologies such as a statistical concept, or a machine learning concept, like what-if analysis. It’s been in any textbook for 10, 15, or 30 years. It’s well understood and so forth, but it’s just a statistical concept. But at the end of the day, if you want to actually bring this technology into an enterprise, you need to carefully build user interfaces so that business users understand that it’s a way for them to check whether AI is doing something that makes sense for them.
And so building this bridge between the business intent of: “I have knowledge. I want to embed AI into my decision making, but I need to be able to use that technology in order to check that AI is delivering value” — that’s a gap you need to bridge with the user interface and technology. And it takes time and willpower to actually bring people there.
And so in this Dataiku Online product for example, we built specific interfaces for that, to enable business users to check and better test models. We also added, for instance, capabilities such as machine learning assertions, which is a very old idea. It’s the idea that you have clear business knowledge, an understanding of what’s happening in your company, that you want to be able to use when building a machine learning model. It’s a very simple idea. As far back as I remember, 20 years ago, it was something that we were already discussing.
But in order to make it happen, you need to make it easy for business users to just write those rules, so that when a data scientist is building a machine learning model, those rules are actually checked. That’s where the value actually comes from, because if you don’t build the process into the platform as the work happens, it never happens in practice. And so with Dataiku Online we built on the behaviors we observed and all the things we learned from delivering data science projects, in order to fill that gap between the technology and its use by business stakeholders, so they can voice their opinion and also use their business knowledge to help data science in a positive way.
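[SIDE NOTE: the idea of business users writing rules that are then checked against a model can be illustrated with a small plain-Python sketch. It is a hypothetical toy, not Dataiku's implementation: `BusinessRule`, `check_rules`, the churn model and the sample rows are all made up for the example.]

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


@dataclass
class BusinessRule:
    """On rows matching `condition`, at least `min_share` of the model's
    predictions are expected to equal `expected`."""
    name: str
    condition: Callable[[Row], bool]
    expected: int
    min_share: float


def check_rules(
    model: Callable[[Row], int], rows: List[Row], rules: List[BusinessRule]
) -> Dict[str, Optional[bool]]:
    """Return, per rule, True/False, or None if no row matched the rule."""
    results: Dict[str, Optional[bool]] = {}
    for rule in rules:
        matching = [r for r in rows if rule.condition(r)]
        if not matching:
            results[rule.name] = None
            continue
        hits = sum(1 for r in matching if model(r) == rule.expected)
        results[rule.name] = hits / len(matching) >= rule.min_share
    return results


def toy_model(row: Row) -> int:
    """Hypothetical churn model: predicts churn (1) when usage is low."""
    return 1 if row["monthly_logins"] < 3 else 0


rows = [
    {"monthly_logins": 0, "tenure_years": 1},
    {"monthly_logins": 1, "tenure_years": 6},
    {"monthly_logins": 10, "tenure_years": 7},
    {"monthly_logins": 20, "tenure_years": 2},
]

rules = [
    BusinessRule("active users rarely churn",
                 lambda r: r["monthly_logins"] >= 10, expected=0, min_share=0.9),
    BusinessRule("long-tenure users rarely churn",
                 lambda r: r["tenure_years"] >= 5, expected=0, min_share=0.9),
]

report = check_rules(toy_model, rows, rules)
print(report)
```

In this toy run the first rule holds and the second fails, which is the kind of signal a business stakeholder could use to challenge a model before it is deployed.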
Let’s take a couple of questions from the group. One from Matt, “How does Dataiku think about sensitive data? For example, in financial services or healthcare where data privacy and regulation are critical, does it integrate with tools such as encryption or differential privacy?”
Yes, we do integrate with a few of those tools, first natively, but also through specific integrations. Especially in terms of differential privacy, we work with a couple of amazing companies in the sector, and optimizing tools. Because indeed, I think that it’s a part of the answer in order to deliver value in those sectors.
From what we observed in some use cases, it’s not machine learning per se, or the performance of machine learning models, that is the choke point in order to add value. It’s the ability to actually meet the regulatory constraints. Meaning literally you spend more time checking models, and auditing models, than actually deploying models. And indeed, we think that our platform helps, and will further help in the future, to get more productivity in those sectors.
Another question, from Allen Smith: “Interested in your growth journey. The decisions you took about growing, choices about the move to the U.S., and how you matched your original vision for your software to the markets you are chasing.” So indeed, probably a long story, but any highlights that come to mind?
I think that it’s when you try… At least the way I see the journey of a company, and a technology company, it’s mostly about trying to indeed deliver the value and solving the problem you intended to solve in the first place. And in our case, the problem is about democratizing AI, further helping people to use their ingenuity in order to build more things with data and AI, making it normal.
And literally speaking, where are the treasure troves of data and AI? Where are the people having the will to do more with data and AI? I must admit, and it will be painful, but it’s in the U.S., and I say that with a little bit of my original accent. And so, as a technology company, it’s a natural journey to get to the markets where you’ve got the traction, but also the ambitions from your customers that help you build a better product.
Okay. Well, that feels like a wonderful place to end. Thank you so much for coming back and telling us this story. Congratulations on all the progress. It’s been thrilling to be a part of it as an investor. So really appreciate it. And we hope to have you back at these events sometime soon.
Thank you Matt.