In Conversation with Barr Moses, CEO, Monte Carlo

As more and more companies around the world rely on data for competitive advantage and mission-critical needs, the stakes have increased tremendously, and data infrastructure needs to be utterly reliable.

In the applications world, the need to monitor and maintain infrastructure gave rise to an entire industry, and iconic leaders like Datadog. Who will be the Datadog of the data infrastructure world? A handful of data startups have thrown their hat in the ring, and Monte Carlo is certainly one of the most notable companies in that group.

Monte Carlo presents itself as an end-to-end data observability platform that aims to increase trust in data by eliminating data downtime, so engineers innovate more and fix less. Started in 2019, the company has already raised $101M in venture capital, most recently in a Series C announced in August 2021.

It was a real pleasure to welcome Monte Carlo’s co-founder and CEO, Barr Moses, for a fun and educational conversation about data observability and the data infrastructure world in general.

Below is the video and full transcript.

(As always, Data Driven NYC is a team effort – many thanks to my FirstMark colleagues Jack Cohen, Karissa Domondon and Diego Guttierez)


TRANSCRIPT [edited for clarity and brevity]:

[Matt Turck] Welcome, Barr. You are the CEO and co-founder of Monte Carlo, the data reliability company, described as the industry’s first end-to-end data observability platform. You guys started in 2019?

[Barr Moses] That’s right. Summer 2019.

Summer 2019. So it’s ultimately a very young company, but you’ve had a remarkable level of success, from everything I understand, in general but also in the venture market. You have raised a little over $100 million in a pretty rapid succession of back-to-back rounds, Monte Carlo being very much a hot company in the space, which was very impressive to watch.

I thought a fun way to start the conversation would actually be with your Twitter handle, which is @bm_datadowntime. BM is obviously your initials, but data downtime is really interesting. And I’d love for you to start with that: what is data downtime and why does it matter?

So actually, fun fact: I’m not an early adopter of technologies. I don’t know if you’d call joining Twitter being an early adopter, but before starting Monte Carlo, I actually didn’t have Twitter. And my phone up until not too long ago was from 2013. We got a security team and they were unhappy with that, so I had to upgrade my phone, understandably so. But when we started Monte Carlo, I also caved in and joined Twitter. So that’s the explanation for that. When we started the company, the concepts of data observability and data downtime were honestly very foreign and unfamiliar, right? It’s not something that folks understood. We’re still very much in the early days of that category. We started the company by thinking through, what is the biggest problem that data teams face today?

I spent a good couple of months and hundreds of conversations with data teams, from large companies like Uber and Netflix and Facebook to small startups, and basically asked them, “What’s keeping you up at night?” And I got a wide variety of answers. But if there was one thing that made people start to sweat on the call and shift uncomfortably, it was when they talked about what we later came to call data downtime. It’s something that really anyone in data encounters: there’s some data product, maybe a report or a dataset or data on your website, basically some data that’s being used by a data consumer. That could be an executive, maybe the CMO; it could be a team, for example your sales team; or it could actually be your customers who are using your website.

Those downstream consumers of data often encounter wrong data. It could be wrong because the data is not up to date. It could be wrong because something was changed upstream that wasn’t reflected downstream. It could be wrong for millions of users. But basically, data downtime is periods of time when the data is wrong, inaccurate or otherwise erroneous. And that gets people going. People are really upset about data downtime, and rightfully so. It’s really frustrating: we have so much data, we’ve collected so much of it, and we’re so eager to actually act on it, and yet the data is often wrong.

Do you have any anecdotal examples where having wrong data was not just annoying, but led to very serious consequences?

Yeah, for sure, and happy to give some specific examples. They range from companies that report numbers to the street and accidentally report, or are about to report, the wrong numbers. That happens more often than you’d like to know, probably, Matt. Or for example, one of our customers is Fox. Fox streams major events like the Super Bowl. As you can imagine, they’re tracking lots of information about those events: how many users, where are users spending time, on which content and which devices? And so the integrity of that data is incredibly important, because decisions are made in real time based on that data.

Another example would be Vimeo, a great customer of ours, a video platform and streaming company. They have over 200 million users on their platform. They have used data throughout COVID-19 to identify new revenue streams, and also to make real-time decisions about their users, for example if a particular user needs more bandwidth at the moment. If you don’t have the right data at hand, it’s actually very difficult to give the right experience that you’d like for your customers. Ranging from making the wrong internal decision, to putting your company at risk due to financial errors, to actually shipping data products out in the wild that are inaccurate, all of those have a material impact on the business. We oftentimes hear from customers and others that one such incident could put millions of dollars at risk for a business.

Those are great examples. So the concept of data downtime leads to the concept of data observability. Do you want to explain what that is?

Starting from the top, organizations and data teams have invested a lot in their data infrastructure. We’re seeing that in the rise of data infrastructure companies: BigQuery with $1.5 billion in revenue, Snowflake with a billion dollars in revenue, Databricks with 800 million and accelerating. And so organizations are investing a lot in building best-in-class data infrastructure with the best data warehouse, data lake, the best ETL, the best BI, the best ML. And there are full teams, including data engineers, data analysts and data scientists, that are responsible for actually delivering data products. Those data products could be a report like we talked about, a specific dataset that’s used in production, or a variety of different things.

And so the responsibility of those teams is actually to deliver those data products in a reliable, trusted way. That’s really hard to do, and the data is wrong often. So in order to solve that, one approach is to look at how this is solved in software engineering, because software engineers have a similar role in making sure that the infrastructure, web apps and other software products that they build and design are in fact reliable and are not down, so to speak. To support that, there’s been development in DevOps around observability in software. There are plenty of off-the-shelf solutions, such as Splunk, Datadog, AppDynamics and New Relic, which have over the years helped software engineers make sure that their products are reliable, secure and easy to access.

So you take that concept and say, “Okay, what would that look like in the world of data? What if we took those concepts and applied them to data?” This is what we call the “good pipelines, bad data” problem: you have the best pipelines, but the data is still inaccurate. What if you took some of the concepts that worked in software engineering and applied them to data engineering? That’s how the term data observability was born. The idea of observability is to infer the health of a system based on its outputs. In software observability, there’s a set of metrics that we track; there are best practices; there are SLAs and availability; there’s the definition of five nines and how many nines you need to track. We’re taking all that good stuff and adopting it in data as part of this concept of data observability.

So that’s it in a nutshell. Often the question that we get is, “Well, what does observability actually tactically mean? What should we really track and measure?” In software observability that’s pretty well established, but in data observability it hasn’t been. So we’ve actually put pen to paper to define this framework of five pillars of data observability, to really explain what a data team should look to automate, instrument, monitor and analyze so that you can have that trust in your data.

Let’s get into this. What are the five pillars?

I wanted to leave you hanging, Matt. The five pillars are at the core of what it means to actually operationalize trust in your data. That’s really what we’re about. I know there are lots of buzzwords in one sentence, but I think it’s actually core to understanding what purpose data observability serves. You’re not implementing data observability just because it’s the cool hot word. It actually serves something, and that’s to operationalize trust. There are basically three core parts to that. The first is detection: actually understanding when data breaks and being the first to know about it. The second is resolution: once there’s an issue, how quickly can I resolve it? And the third is prevention: we believe that by instituting these best practices, you’re actually able to reduce the number of data downtime incidents you have to begin with.

That’s what you call the data reliability life cycle?

Yes, that’s right. Exactly. That’s how we’ve developed the life cycle. Under the detection part, data observability helps us understand the different ways in which we can actually detect these issues. And this is where the five pillars come in. Again, these five pillars were based on hundreds of conversations with folks about the common reasons why data breaks. We basically consolidated those. This doesn’t capture everything, but it captures 80% of it, which helps customers meaningfully on day one. So without further ado, the first is freshness, which relates to the freshness of the data. For example, we talked about media companies, but you can also think about eCommerce companies or even a fintech company that relies on thousands of data sources arriving, let’s say, two to three times a day. How do you keep track and make sure that thousands of those data sources are actually arriving on time?

There has to be some automatic way to do that, because that’s a common reason why data breaks. So freshness is one. The second is volume, which is pretty straightforward: you’d expect some volume of data to arrive from that data source, so has it arrived or not? The third is distribution, which refers to the data at the field level. Let’s say there’s a credit card field or a social security number field that gets updated, and suddenly it has letters instead of numbers. That would obviously indicate something is incorrect. So you actually need tests for that at the field level.

The fourth is schema. Schema changes are a big culprit for data downtime. Oftentimes there are engineers or other team members making changes to the schema. Maybe they’re adding a table, changing a field, changing a field type, and the folks downstream have no idea that’s happening, and suddenly everything is broken. That happens all the time. And so automatically keeping track of schema changes is the fourth pillar.

And then the fifth, my favorite, is lineage. We actually just released a blog post on how we did field-level lineage and table-level lineage. The idea is: can you automatically infer all the downstream and upstream dependencies of a particular table, say in a data warehouse, and use that to understand the impact of a particular data quality issue? Let’s say a particular table has not received any data, but there are no downstream users of that data. Then who cares? Maybe it doesn’t matter. But let’s say there are 30 reports that use that data every day, or maybe that data is being used in a marketing campaign to determine pricing or discounts, in which case it’s actually important to fix that problem.

And vice versa, lineage also helps us understand the root cause of a particular issue. If, for example, there’s a table that is not receiving data, or there’s a problem with it, and there’s a schema change somewhere upstream, I’d want to know about that event happening in close time proximity to the data downtime incident, so that I can actually infer the root cause and the impact of that issue. So yeah, those are the famous five pillars.
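[Editor’s note: to make the five pillars concrete, here is a minimal, editorial sketch of each one as an automated check. All the metadata here is hypothetical and held in memory; a real observability tool would derive it from warehouse query logs and information-schema tables. This is not Monte Carlo’s implementation, just an illustration of the categories Barr describes.]

```python
from datetime import datetime, timedelta

def check_freshness(last_updated, max_staleness, now):
    """Freshness: has the table been updated recently enough?"""
    return (now - last_updated) <= max_staleness

def check_volume(row_count, expected_min):
    """Volume: did roughly the expected amount of data arrive?"""
    return row_count >= expected_min

def check_distribution(values, validator):
    """Distribution: do field-level values match expectations?"""
    return all(validator(v) for v in values if v is not None)

def check_schema(current_schema, expected_schema):
    """Schema: detect added, removed or retyped fields."""
    return current_schema == expected_schema

def downstream_impact(table, lineage):
    """Lineage: which downstream tables/reports depend on this one?"""
    impacted, stack = set(), [table]
    while stack:
        for child in lineage.get(stack.pop(), []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted

now = datetime(2021, 9, 1, 12, 0)
# A table that last updated five hours ago fails a one-hour freshness SLA.
assert not check_freshness(now - timedelta(hours=5), timedelta(hours=1), now)
assert check_volume(row_count=10_500, expected_min=9_000)
# Letters appearing in a numeric field: the distribution check fires.
assert not check_distribution(["1234", "ABCD"], str.isdigit)
lineage = {"raw.events": ["analytics.sessions"],
           "analytics.sessions": ["bi.weekly_report"]}
assert downstream_impact("raw.events", lineage) == {"analytics.sessions", "bi.weekly_report"}
```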

Great. Well, thank you very much. While we’re on the topic, a question from the group: “Does data observability mean different things for different applications, for different modes of data, structured versus unstructured, real time versus historical, or does it cover everything?”

Yeah, I think in general our goal with the term data observability is to apply it to data everywhere. And obviously it has different meanings for different types of data, especially if you think about unstructured versus structured data. We’re also seeing more and more streaming. So definitely there are lots of changes happening in the data stack and in how folks think about making sense of their data and taking action on it. Our belief is that you need to be able to trust your data wherever it is and whatever type of data it is.

With most of the companies that we work with, we spend a lot of time on the data warehouse and BI, which is where we started. We’re seeing more and more folks move to different technologies. Our thinking is that in order to build strong data observability practices, it has to include a concept that we call end to end, meaning wherever your data is, all the way from ingestion to consumption. Historically, a lot of effort has gone into figuring out data quality in one particular place in the stack, let’s say just upon ingestion, or for a small number of datasets. I actually think that approach no longer works. The nature of data is that it changes, it flows; pipelines are added every day by new team members. And so making sure that your data is accurate at only one point of the pipeline is just no longer sufficient.

If you’re really thinking about strong data observability practices, it does have to go end to end. It’s also frustrating and hard to get that accurate or right from the start. And so I actually wouldn’t recommend starting by trying to do everything end to end; that’s likely bound to fail. But that is a vision that I think data teams should be moving to, and are moving to. And I think it’ll get easier as we standardize on what data observability means for different parts of the stack and different types of data over time.

Speaking of team members, how do you think about the human and social aspect of data observability? Who owns this? Is that engineers, is that business people? How do you think about it in the context of the emerging data mesh, which is something that I believe you spend a good amount of time thinking about?

Data mesh, I think, is a very controversial topic. I love controversial topics because they generate a lot of pro and con discussions. For folks not familiar with the data mesh, at a very high level it’s a concept that’s taking the data industry by storm. Love it or hate it, it’s very much a huge topic of discussion.

We had Zhamak speak at the event, but just to define it: it’s basically this concept of decentralization of ownership of data, having different teams own the full data experience and provide what they’re doing as a service to others. So the finance team owns a whole data stack and offers it as a service to the rest of the organization, for example. Is that fair?

Yes, that’s exactly spot on. Credit goes to Zhamak for coining the term and for popularizing it. I think she’s actually just releasing a book about it too, which I’m excited to read. So yes, that’s exactly right. That’s the concept. As part of that move to decentralization, which by the way we see in waves across some companies, oftentimes folks will start decentralized, move to centralized and go back to decentralized, but generally the idea of making data decentralized and self-serve is something that we see a lot. That has to happen as part of data becoming widespread in the organization. In the past, if you had only two or three people working with data, you could make it centralized, big deal. You could work with the data, check it, and you’re good to go, more or less.

Today you have hundreds of people working with the data. It does not make sense anymore for one team to hold the keys to it; it really just ends up as a bottleneck. A customer once told me, “If I want to get something done with my data team, I basically have to wait a year for them to get through all of their priorities.” That’s a reality for lots of data teams. They have to wait months or years to get something done, which just doesn’t make sense for an organization that wants to make data accessible to a large number of teams.

You asked a little bit about where people are involved. Oftentimes we see a data platform team. Within it there might be a data product manager, someone who acts as the voice of the customer as it relates to data. There might be data engineers, and then there are data analysts or data scientists who are consuming the data. And then there’s actually everyone else in the company consuming the data as well, ranging from sales, marketing, customer success, product, EPD, et cetera.

Where I think the data mesh is helpful is in introducing this concept of self-serve, which is actually really powerful, because in that concept the data platform team is responsible for building things that can be used by all of these teams versus being a bottleneck. When it comes to ownership, which is a very heated topic, again, in the context of downtime and in the context of data mesh, I think the data mesh introduced some concepts that make it easier, because self-serve basically means there’s a shared accountability, if you will. Actually, one thing that we talk a lot about is a RACI matrix, spelled R-A-C-I: responsible, accountable, consulted and informed. There’s no single silver bullet that fits everyone, but data teams can actually put pen to paper: Okay, who’s responsible for data quality? Who’s responsible for dashboards? Who’s responsible for data governance? Who’s responsible for each different item? And actually lay out how teams work together.

So I think generally the themes we see are a move to a decentralized motion, and self-serve picking up speed, but I can’t tell you that the ownership question has been solved. Most often people ask me, “Can I talk with someone who figured it out?” And honestly, there are very few people who’ve actually figured it out. Most folks are somewhere on the journey, maybe a couple steps ahead of you or a couple steps behind you. But I rarely see folks who have said, “I got this, I figured it out. We know what to do when it comes to ownership.”

Out of curiosity, how does that translate for Monte Carlo into selling? Like, who’s your buyer? Who buys a platform like you guys?

Our mission is to accelerate the world’s adoption of data by reducing, or helping to eliminate, data downtime. And so that means that we work with data teams to help them reduce data downtime. Oftentimes the folks that we work with most closely are data engineers and data analysts, because they are mostly the folks who are responsible for data pipelines or for making sure that the data is actually accurate. Their consumers include data scientists or different teams, like marketing teams or analytics teams embedded within business units, who might consume the data. So in that case, for example, someone on the marketing team might have a question like, “Which dataset should I use, or which report should I use, and is it reliable?” You could use Monte Carlo to answer that question, but the primary users for us are the data engineers and data analysts, oftentimes part of a data platform group or not, depending on the structure of the company.

I’d love to do a little bit of a product tour in some level of detail, if you can. Maybe taking it bit by bit. Let’s start with how you connect to the various data sources or the parts of the data stack, so that you’re able to do observability. I read somewhere you have data collectors, how does that work?

Yeah, for sure. As I mentioned, we very much believe in end-to-end observability. Actually, the cool thing about all these concepts we talked about is that it’s not just marketing speak. It’s not just stuff that we say on a podcast; our product is actually built around it. So if you log into our product, you’ll see these concepts in real life, which I find amazing.

I didn’t realize that happened.

Yeah, exactly, me neither, but yeah. Our product is built around these concepts. The first is end-to-end visibility into your stack. I mentioned we very much believe in having observability across your stack. We started with cloud data warehouses, data lakes and BI solutions. We’re actually the only product in the market that you can connect today to those different systems and automatically, out of the box, get an overview of what the health of your data looks like, and observability for your data on the metrics or the variables that we talked about before.

That’s the first thing: you connect, you give presumably read-only access to your data warehouse or your data lake to Monte Carlo as a first step?

Yeah, exactly. That’s right. Our system is API-based. We do not ingest or process the data ourselves. We basically need read-only access to, let’s say, Snowflake and Looker, for example. And then what we do is start collecting metadata and statistics about your data. So for example, we collect metadata like how often a particular table is updated. Let’s say it’s updated three times an hour; we collect the timestamps of that table. We collect metadata on the table, like who’s actually querying it, how often it’s being used, and what reports in the BI layer rely on it. We also start collecting statistics about the data. For example, for the distribution of a field, we might look at the percentage of null values in a particular field in a particular table.

The last thing is we reconstruct the lineage. Without any input, we parse the query logs to reconstruct, at the table level, all the upstream and downstream dependencies. We do that not only within a particular system, like within Snowflake, but across your BI as well, so we can go from Snowflake to Looker, for example. Then we overlay that information with the health of your data, so we can bring it together in one view where we can say, “Something changed upstream, which resulted in a table in Snowflake that now does not have accurate data, which impacts all these tables downstream, and here are the problems. Which results in these views in Looker that now have wrong data as well.” So you can have that end-to-end view.
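[Editor’s note: here is a deliberately simplified sketch of the lineage-from-query-logs idea Barr describes. Production systems use a full SQL parser rather than a regex, and the log format and table names below are invented for illustration; this is not Monte Carlo’s actual approach.]

```python
import re

# Hypothetical warehouse query log: each entry writes one table from another.
QUERY_LOG = [
    "INSERT INTO analytics.sessions SELECT * FROM raw.events",
    "CREATE TABLE bi.weekly_report AS SELECT day, count(*) FROM analytics.sessions GROUP BY day",
]

def extract_lineage(queries):
    """Map each upstream table to the set of downstream tables built from it."""
    lineage = {}
    pattern = re.compile(
        r"(?:INSERT INTO|CREATE TABLE)\s+([\w.]+).*?\bFROM\s+([\w.]+)",
        re.IGNORECASE | re.DOTALL,
    )
    for q in queries:
        m = pattern.search(q)
        if m:
            target, source = m.group(1), m.group(2)
            lineage.setdefault(source, set()).add(target)
    return lineage

lineage = extract_lineage(QUERY_LOG)
# raw.events feeds analytics.sessions, which in turn feeds bi.weekly_report.
```

Walking this map transitively from any table gives the downstream blast radius of an incident; walking it in reverse points at upstream root-cause candidates.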

So, you integrate with the data warehouses and data lakes, the BI systems, presumably DBT as well. Is that part of the integration?

We actually just released our first DBT integration not too long ago. That is, again, part of connecting to ETL, transformation and orchestration. We’re also working on an Airflow integration as well.

It sounds like for now you’re very modern data stack centric. Is part of the idea to just go into other parts of the stack, in particular the machine learning stack, the feature stores and also the real time, the Kafka part of the world?

Yeah, definitely. Like I mentioned, observability doesn’t discriminate in that sense, right? Data needs to be accurate everywhere, regardless of stack, regardless of what you’re using. So yes, we started with cloud and what you would call the modern data stack, another buzzword, but the problem exists with legacy stacks and with machine learning models as well, 100%. Looking 3, 5, 10 years ahead, I think the problem will actually be exacerbated across all of those dimensions, not just one, because folks are using their data more and more. There are higher demands on the data, there are more people making those demands, and there’s stronger adoption of all of that. So definitely the problem permeates across all those levels.

So you connect to all the key systems, you get data output, you run statistics on it. How do you determine if there’s an issue or not an issue?

We actually use machine learning for that. We infer what a healthy baseline looks like and make assumptions based on historical data. We collect historical data points and use them to infer and project what the future should, or might, look like for you, and then use that to let you know when something is off. I’ll give you a freshness example, because it’s the easiest one. Let’s say we observe over a period of a week that there’s a particular table that is used by your CEO every morning at 6:00 a.m., and that table gets updated twice an hour during the day, but not during the weekend. And then on Tuesday it suddenly stops updating. Because we’ve learned that the table should get updated twice an hour every weekday, if it is not updated by Tuesday at noon, for example, then we assume that there might be a problem, or at the very least you’d want to know about it.

Oftentimes, the interesting thing we find is that even if a change is not what you’d call data downtime, not actually something wrong, data teams still want to know about it, because it’s a deviation from what they’d expect or from what they want. Sometimes that change is actually intended, but the data team wants to know about it and wants to confirm that the intended change they made was actually successful, for example. So detection is incredibly important, but it’s just the tip of the spear, if you will. There’s actually a lot more that goes into improving communication about data downtime: okay, there’s an issue, but what is the impact of that issue? Do I care about it? Who owns this? Who should start solving it? How do I know what the root cause is? And how do I actually prevent this to begin with, right? So if we instill that visibility and empower people to see these things and to make changes with this context in mind, you can actually reduce these incidents to begin with.
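[Editor’s note: the freshness example above can be sketched as a toy anomaly detector: learn a table’s typical update cadence from historical gaps between updates, then flag the table when the time since its last update is far outside that baseline. The z-score threshold and one-minute floor are illustrative assumptions; Monte Carlo’s actual models are not public.]

```python
from statistics import mean, stdev

def is_freshness_anomaly(update_gaps_minutes, minutes_since_last, z=3.0):
    """Flag a table whose silence exceeds its learned update cadence.

    update_gaps_minutes: historical gaps (minutes) between consecutive updates.
    minutes_since_last:  how long the table has now gone without an update.
    """
    mu, sigma = mean(update_gaps_minutes), stdev(update_gaps_minutes)
    # Floor sigma at 1 minute so perfectly regular tables don't alert on tiny jitter.
    return minutes_since_last > mu + z * max(sigma, 1.0)

# A table that historically updates roughly every 30 minutes on weekdays:
history = [29, 31, 30, 32, 28, 30, 31]
assert not is_freshness_anomaly(history, minutes_since_last=31)   # on schedule
assert is_freshness_anomaly(history, minutes_since_last=240)      # hours late: alert
```

As the interview notes, this kind of statistical detection is only the tip of the spear; the business rule that everything must be fresh by 5:50 a.m. for the CEO’s 6:00 a.m. report still has to come from a human.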

It’s very interesting that you use machine learning for this. I had Olivier Pomel from Datadog at this event a couple of years ago, and he was talking about how at Datadog they started using machine learning very late in the game, and deliberately so; it was very much rules-based. Part of the issue being the noisiness of machine learning, potentially leading to alert creep. How do you think about this, giving people control over the type of emergency alert they get versus something that’s predicted by the machine? As we know, machine learning is wonderful, but ultimately it’s a somewhat imperfect science.

Generally, we have to be thankful for the advances of the last few years; we’ve come a long way. I think there’s a balance between automation and input. Historically we’ve leaned into 100% input, where folks literally had to manually draw lineage on their whiteboard. Some companies still do it; some companies actually get in a room and everyone literally writes out what their lineage looks like. We don’t believe in that. There are ways to automate that. But in some areas, a customer would be the only person to know. For example, we talked about the CEO who looks at a report at 6:00 a.m. That means that at 5:50 everything needs to be up to date, for example.

That’s a business rule that a machine would never have, and we would never be able to automate that business context. So I think it’s a balance. I do think that teams and organizations today, and me having been in those shoes prior to starting Monte Carlo, don’t have a lot of patience. People don’t have months to get started and see value from a product. So I think the bar for products is very high; I think you have a matter of hours to see value, actually. Not days, not months, not years. And with that in mind, information can go a long way. Of course, we want to make sure that every alert that we send is really meaningful. But if you think about an alert in a very small context, just sending an alert, it’s way easier to honestly inundate people and create fatigue.

But if you think about the concept of: here’s an alert, here’s everyone that’s impacted by this alert, here are other correlated events that happened at the same time, then the chance of that alert meaning more for the organization is so much higher. If you’re just looking at changes in the data over time and at metrics, it’s a lot easier to hit a lot of noise, if you will. But if you’re actually asking, “Hey, are we operationalizing this? Are we taking a detection and doing something meaningful with it? Are we routing that alert to the right team? Are we routing it at the right time, with the right context?” then it makes those alerts a lot more rich and actionable. So for us, that’s a lot of what we’ve invested in: how do we make sure that every single alert is truly meaningful and can drive action? Just getting a lot of alerts, without anything beyond that, is honestly not sufficient. We have to go way beyond that to make the lives of data teams truly easier, not just give them more and more information.

How does the resolve part of the equation work? Is that why you’re integrating with Airflow so that you can run the data jobs automatically?

That’s a good question. It’s part of it. There’s also a lot of context that you can get from solutions like Airflow, DBT and others, like what pipelines are running, which helps with understanding the root cause as well. But in general, resolution is an area where I think there’s a lot more to do. We’ve done a lot in detection, the first part; we’ve done some work in resolution and prevention, and both of those are areas that we’re investing a lot more in.

Great. I want to be conscious of time; at the same time, it’s such an interesting product and space in general. Just to finish the product tour: you have a data catalog as well. Where does that fit in the whole discussion? By the same token, you also have an insights product that sounded really cool. So maybe address both of those? Obviously they’re different parts, but address them together if you can.

Going back to what’s most important to the teams and people that we work with, it’s being able to know that you can trust the data that you’re using. Part of that is knowing when data breaks, and part of that is actually preventing data from breaking. When you think about the kind of information that we have about your system and how it’s being used, that can lead to many insights. We actually released Insights as a way to help data teams better understand the landscape and better understand their data systems. It’s actually not uncommon for me to get on a call with a customer and someone will say, “I just joined the company. I honestly don’t understand anything about our data ecosystem. There were two engineers who knew everything and they left. I really just don’t know, I don’t understand at all what’s going on. I just need to understand our lineage and the health of our data: where does the data come from, where is the important data, and what are the key assets, for example.”

One of the first things that we actually worked on is called key assets, where we help data teams know what their top data assets are. So what are the top tables or top reports that are used most, that are queried most, that have the most dependencies on them? That’s an example of an insight. The idea is, how can you generate insights based on all the great information that we have, to make it easier for data teams to enable these data products that they’re building?
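As an illustration only – the scoring weights, table names, and usage numbers below are invented, not Monte Carlo's method – a key-assets ranking could boil down to scoring each table by how often it is queried and how many other assets depend on it:

```python
# Hypothetical usage metadata, e.g. parsed from warehouse query logs.
tables = [
    {"name": "analytics.orders", "queries_30d": 4200, "dependencies": 31},
    {"name": "analytics.events", "queries_30d": 9800, "dependencies": 12},
    {"name": "staging.tmp_load", "queries_30d": 40,   "dependencies": 0},
]

def key_assets(tables, top_n=2):
    """Rank tables by a combined usage score: raw query volume plus
    a weight for each downstream dependency (weights are illustrative)."""
    def score(t):
        return t["queries_30d"] + 100 * t["dependencies"]
    return [t["name"] for t in sorted(tables, key=score, reverse=True)[:top_n]]
```

A ranking like this is what lets a newcomer – or a monitoring system deciding where to focus – find the handful of tables that actually matter without tribal knowledge.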

There are plenty of different examples of insights that we’re driving, and we’re investing a lot in that – again, with the goal of actually preventing these issues to begin with. That’s the second part of your question. On the first part of your question, around the role of catalogs: we actually wrote a blog post not too long ago called “Data Catalogs Are Dead, Long Live Data Discovery” – obviously a controversial topic, or title. The idea there is that data discovery – an automated way to understand where data lives and what data you should access – is a problem that more and more data teams are facing. Folks ask themselves, “Okay, I’m starting to work with the data. How do I know which data I should use? What data can I actually trust? Where is this data coming from?”

Those are a lot of questions that folks are asking themselves, and they’re actually really hard to answer – unless you have that engineer who left a few weeks ago and knows all the answers. And so, really getting a sense of what are better ways for us to discover data, and what are better ways to make it easier for folks to actually access the data, is one of the areas that I think is really top of mind for lots of data teams. I hope that clarifies those two.

Just to finish, a rapid fire of questions from the group. Actually, a question from Carolyn Mooney from Nextmv, the prior speaker: “How do you think about supporting different integrations?” From Carolyn’s perspective in decision automation, she said, “Observability is super interesting. For example, we think about alerting on the value output for decisions – for example, a percentage went up significantly in the last hour. So how does one integrate with Monte Carlo?”

That’s a great question. We should probably figure it out – I don’t know the answer. But Carolyn, we should probably sync offline and work through it. Generally we have lots of folks integrating with Monte Carlo, and we very much welcome that. So I would love to figure out the details and see what we can make work. Thank you, Carolyn, for the question.

Question from Jason. “How do you think about observability and insights without semantic knowledge of the data? Do you see limitations to looking at data without this additional information?”

I probably need a little more detail from Jason about what he means, but I’m guessing the question goes back a little to what we talked about earlier, which is: how can you infer whether data is wrong without having the business knowledge and the context that you might not have coming in? I will just start by saying, I don’t think that’s possible to solve. I don’t think a machine can actually infer something without knowing that business knowledge; it’s not possible. That’s also not what we are attempting to do at Monte Carlo. I do believe that there is a certain level of automation that we can and should introduce that we have not introduced to date, and that by introducing that level of automation, we can reduce our customers’ teams’ work from 80% manual work to 20% manual work.

With automation we can actually cover 80% of the reasons why data downtime incidents happen, and allow data teams to reserve their work for the top few percent of issues that only they will know about. So we’re not here to replace data teams or to understand the business context. We don’t attempt to do that. We’re really attempting to make data teams’ lives easier. In today’s world, most data teams actually spend a lot of time writing manual tests on things that can be automated – on a lot of the known unknowns, if you will. If you know what tests to write, if you know what to check for, then you can write a test for it. But there are so many instances where it’s an unknown unknown, in which case automation and broad coverage can actually help eliminate those cases. So just to wrap up, I think it’s a balance. I think we’ve historically underinvested in automation, which is why we lead with that first. But we definitely need the business context. We’re not going to get very far without that.

The last question of the evening, from Balaji. Balaji has two good questions; I’ll just pick one, because I’m curious about it as well: “I’d love to understand the team’s core differentiation and durable advantage relative to competitors. Is it the suite of integrations, proprietary time series models, CXL domain focus or something else?” Because it is a little bit of a hot space in general, with a number of aspiring entrants.

Sorry, is the question differentiation in terms of…?

Relative to competitors?

So first I would say it’s our honor to pioneer the data observability category and to lead it. I think it’s a great time for this category, and I’m excited for its future too, for sure. In terms of differentiation, the things that we focus on in particular – that I think are important for a strong data observability platform, whether it be Monte Carlo or another one – are a couple of the things that we actually talked about today, so it’s probably a good summary. The first is end-to-end coverage of your stack. I think that’s critically important, because data observability doesn’t start or stop in a particular place.

The second is thinking about the five key pillars and the automation of that – actually thinking through, how do I have a platform that gives me the most bang for my buck, if you will, leaning on automation? I think the third is the combination and intersection of data quality and data lineage. Those are things that are incredibly important that we see, and they’re what makes data observability actually actionable. Then the last point is around alert fatigue, which we touched on as well. I think making alerts meaningful, making them ones that your team can actually act on, is something that’s very hard to do and that we’ve invested a lot in. So I would say, if I were you, Balaji, I would be thinking about those core capabilities for any data observability solution.

All right, wonderful. That feels like a great spot to end. I really appreciate it. Thanks, and congratulations on everything you’ve built and the momentum – it is really impressive to watch, and really exciting to see how the company is thriving in such a short period of time. So thanks for coming and telling us all about data observability. I’m also very proud of myself for being able to say observability. I practiced a lot right before this. So thanks. Thanks to everyone who attended. If you enjoyed this event, please do tell your friends. You can also subscribe to the channel on YouTube – just search for Data Driven NYC and you’ll have access to the whole library of videos. And we’ll see you at the next one. Thanks so much, everyone. Bye.
