In the early days of Big Data (call it 2009 to 2014), much of the activity was experimentation and discovery. Early enterprise adopters would play around with Hadoop, the then-new open source framework with a funny name, trying to figure out where the technology fit in the broader landscape of databases and data warehouses. People would also try to figure out what a “data scientist” was – a statistician who can code? An engineer who knows some math? It was a time of hype, immature products, and trial and error.
Fast forward to today, and the Big Data world has entered a phase of early maturity. To be sure, for many enterprises around the world, Big Data is still a very new experiment. But by now the early adopters have deployed Big Data technologies in production and at scale for a variety of use cases, and are starting to see results.
At the same time, the Big Data world is probably more complex than it was a few years ago.
First, it has seen a rapid proliferation of new technologies and frameworks, for example focused on real-time and streaming (Kafka, Spark, Flink, etc.) or machine learning and AI (scikit, TensorFlow, etc.). The space has also experienced a Cambrian explosion of new startups, resulting in a particularly crowded landscape.
Second, as mentioned in a blog post earlier this year, it has become very clear that success with Big Data is not about deploying one or several technologies, but rather about putting together an assembly line of technologies, people and processes. You need to capture, store, clean, query, analyze, and visualize data. Some of this will be done by products, and some of it will be done by humans.
In that very heterogeneous and complex environment, the need for tools that can “make it all work together” has become increasingly obvious. It should be possible to clean (wrangle, blend) the data, develop and apply analytics and machine learning algorithms to it, and deploy the results in production – all without herculean efforts. It should be possible for everyone working with data within the enterprise – data scientists, data analysts, data engineers and business users alike – to work together.
Enter Dataiku. For the last several years, Dataiku has been building a product called “Data Science Studio” which in fact does a lot more than data science, and addresses a much broader audience within the enterprise than “just” data scientists. It is a powerful data collaboration layer that enables all the components (data repositories, algorithms, models, etc.) and all the people (data scientists, data analysts, data engineers, etc.) that are part of the Big Data assembly line to work together, at scale, in production and with all the necessary attention to security and data governance.
Today, I’m excited to announce that FirstMark is leading a $14M Series A in Dataiku. The company has been growing both profitably and explosively (300% last year), and this cash infusion will enable it to focus aggressively on the US market, in addition to continuing its development in its native Europe.
Congrats to the Dataiku founders (Florian, Marc, Clement and Thomas) – we at FirstMark are thrilled to now be part of the journey!