In the ever vibrant world of the “Modern Data Stack” (an ecosystem of mostly young tech startups that represent the rising generation of data software vendors, and integrate well with one another), Hex has been getting increasing visibility and momentum. At its core, Hex is a collaborative data platform where teams can explore, analyze, and share. It aims to bring together the best of notebooks, BI & docs into a seamless, collaborative UI.
The company was founded in 2019 and has raised a total of $73.5 million in venture capital to date, including most recently a $52 million Series B.
CEO Barry McCardel joined us at Data Driven NYC for a deep dive into the product, the company, the data space and his journey from doing “unholy things in Excel” as a young consultant to building a great startup.
Below is the video and full transcript.
(As always, Data Driven NYC is a team effort – many thanks to my FirstMark colleagues Jack Cohen, Karissa Domondon and Diego Guttierez)
TRANSCRIPT [edited for clarity and brevity]:
[Matt Turck] Why does the world need a collaborative workspace for data teams? What’s the big problem that you’re working on solving?
I’ve been working in data effectively my whole career. I started as an undergrad, really, doing a bunch of stuff, just writing R scripts and all this research stuff before data science was even a thing, really. And then was doing unholy things in Excel as a consultant. Then I was at Palantir for five years, where I got exposure all across a bunch of different technical things. Then I worked really closely with our data team at my last company. And all along the way I basically observed the same set of problems. In essence, Hex is meant to solve those.
So the first thing we really set out on, and the thing that was a really acute problem that we wanted to solve, was around the ability to share work. It’s this very common thing, and we saw it up close at the last company we were at: you have data analysts and data scientists and just people working with data all over the business doing really interesting, awesome things. They’re going in. They’re asking and answering questions. They’re driving insight.
Then the actual ability to share and publish that throughout the organization is awful. It’s really a disaster. You have people screenshotting charts out of Jupyter Notebook and pasting them in Google Docs. You have people exporting a CSV from a BI tool so they can build the right fit in this other thing and then put that in a deck. Then you have people hacking together scripts to try to build a pipeline to put the forecast in the warehouse so you can look at it in the BI tool.
It was this huge mess. So we started really focusing on that problem. The initial thing we were focused on was, how can you help data scientists who are working on something like a Jupyter Notebook, take that and share it with others in a way that’s interactive and useful and usable? As we started getting into that, we realized the pain was really much deeper than that. It was actually like people were just frustrated with the whole stack. You had individuals jumping around between tools depending on whether they’re using SQL or Python or no code. You’ve got teams really unable to collaborate. The whole versioning and real-time collaboration for all this is just a mess. It’s very regressive compared to tools in other spaces, like Figma or Google Docs.
Then there’s just an amount of overhead and pain in getting these tools up and running that is actually really hard. There’s a very classic experience where a new data scientist will join and the first two weeks are really just about getting all the right packages installed locally in their Jupyter environment and then making sure that’s synced up. You wind up with this overhead that is both very frustrating for the people who are doing these workflows and also prevents a lot of people from accessing them.
So back to your question, Hex is really meant to be three big things. It’s an amazing, collaborative environment for being able to do analysis and data science. It’s got a notebook UI that is just absolutely magical. I’ll show that to you in a bit. It’s very, very easy to take your work and share it and publish it as an interactive data app that anyone can use. That work is then kept and organized in what we call a Knowledge Library, which makes it very easy for anyone else in the organization to discover and benefit from the work that the data team has been doing. So mission wise, that’s really what we’re about and we built a product that really addresses that end to end.
Great. And it’s, by definition, meant to be very inclusive, right? So it’s data scientists. It’s data analysts. It’s business people as well. You have an expression that I read somewhere, which I really liked, which was the “analytically technical.”
Analytically technical, yeah. It’s interesting because you think about some of the big changes that have happened in the last few years. You see this explosion in people who are data literate. They’re even, I would call them, somewhat technical. And there’s more people who know Python, sure. There’s a lot more people who know SQL. And a lot of people have either learned SQL on the job or come in out of undergrad with that skillset. There’s also this much bigger population of people that, I would argue, are technical in their own way. If you’re an Excel power user and you’re writing deeply nested functions or VBA or even just some pivot tables or IFs, you are basically writing code. I would argue you are writing code. You are technical in some way.
And I think traditional data science and analytics tools have actually been a high tower. They’re difficult for these people to access. And so one of the things that’s really interesting for us, what we see in our customers, is we have a lot of users. In fact, most of our customers, most of the users are mostly writing SQL. And that is very different than what you might think of when you think of a notebook environment, which is traditionally very associated with Python and, quote-unquote, “data science.” But Hex makes it very easy to go back and forth between SQL and Python. You can collaborate between these. And so it’s very inclusive.
It’s very cool for us to see that our customers will start with a very small number of data scientists, a couple people who are migrating their workflows over from Jupyter but then will explode to where you see all sorts of people using Hex to ask and answer questions. That’s something we’re very excited about. I feel like we’re just still at the tip of the iceberg. And we think of it as building a platform that has a low floor and a high ceiling. We want to have a platform that anyone can come in and ask and answer questions. But it doesn’t arbitrarily top out.
And I think that’s a big difference between the last generation of tools, which is like, “Okay. This is a no-code thing. It’s got a low floor and a low ceiling.” But the second you want to do something more complex, you’ve topped out. And now you have a UX SqlRunner. Medium floor, medium ceiling. And then, “Okay. Now I’m over in my Jupyter Notebook,” high floor, high ceiling. I challenge why this needs to be three fragmented things. And I think we’ve done a great job so far being able to bring some of those more together.
So to take us a bit further and drill into the next level, the core is a notebook. We talked about showing the product. So I’m excited for a product demo. But just at a high level before we jump into the demo, and maybe to make it inclusive for everyone, can you give just a 10-second definition of what a notebook actually is?
Yeah, sure. So notebooks have been around for a long time. As legend has it, they were first pioneered at Mathematica. And the most common one now is a project called Jupyter. It used to be called IPython.
That was in the ’80s, right?
Yeah. Well, I mean, Mathematica is a real OG. IPython’s a little bit newer. And then it was rebranded as Jupyter, I think, in 2015, something like that. But anyway, the notebook format is basically you’ll have cells which have code traditionally. And then those cells show the output of that code. And those cells can be evaluated individually. This is different from a script. A script is one file. And the script is usually evaluated, the whole thing, top to bottom.
And this breaking it up into cells makes it really great for iterative and exploratory analysis. So you can say, “I just want to run this little chunk. And, oh. I want to do the aggregation a little bit different. I want to do this.” And this is all an expression of a thing called literate programming. I will not go in the deep end on this. But basically, it’s this idea that you can see your logic and then the outputs in one place. It’s a very, very popular format. I mean, millions of people use notebooks. But we think that it’s actually a format that a lot more people should be using. We’re very happy to see that with our user base and customers.
Yeah, and just even at a higher level, a notebook is a place where data scientists and data analysts work together. And it’s a combination of code and explanation. So it’s like a work space.
That’s right. And it’s really the thing. Well, if you talk to a lot of the data scientists, especially, it’s the thing they use all day. It’s the thing where they’re going and writing code. And they’re iterating on something. Now, notebooks also traditionally have a lot of issues. There’s a famous talk called “I Don’t Like Notebooks” that this guy Joel Grus gave at JupyterCon. It was very much showing up in the wrong place to give that talk. But he was right. There’s all these issues.
It was like four years ago or something like this.
Yeah. It was 2018, I think. But it was all these issues. Part of what we’re doing at Hex is, “Well, notebooks are great. They have some issues.” I think there’s a camp of people that are like, “Because of those issues, everyone should be doing something like writing scripts or whatever.” I think we’re trying to find that synthesis of, “Well, what if we just fix those issues with notebooks and made them awesome and made them accessible to 100 times the people? I think this actually could go somewhere.” And that, in a really simplistic way, is what we’ve been up to on a lot of things.
Part of what you were describing was one of the key issues – just to paraphrase and make sure I understood correctly, one big issue of notebooks is that you can have different definitions of a variable in a notebook.
Yeah. We call this a state issue. So I would break out some issues with notebooks. I would say the first thing is accessibility. I was getting at this earlier, but most people working with data in most places have never used a notebook because step one is learning computers. You have to figure out how to set up a local Python environment and install Jupyter. And most people are not going to do that. Thing two is state. That’s what you’re getting at. And the short version of this is notebooks traditionally run in what’s called a kernel. It’s basically memory space where you run something like, “X equals 1.” Now in memory, X equals 1.
But because you can run cells out of order, you can get into these weird state issues where you can’t actually know what state things are in. You have one cell that’s X equals 1. And another cell is X equals 20. If you ran X equals 1 before X equals 20, well, now it’s 20, and vice versa. So it gets really complicated. For those who aren’t familiar, we have a whole blog post about it. But the short version is this is a pain in the ass for people who have been using notebooks a long time, like me.
But it’s really painful for people who are new to it, who are like, “What’s going on?” You lose a lot of people. And we think of this as one of the many things that are in that low floor, high ceiling of, how do we make notebooks awesome, better for those power users? But also, how do you make it more accessible and usable and welcoming for this bigger population of people that we think deserve great tools?
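A minimal sketch of the state issue described above, assuming a toy "kernel" modeled as a plain Python dictionary. The cell names and the `run` helper are hypothetical, and this is an illustration only, not how Jupyter or Hex is implemented:

```python
# A toy "kernel": one shared namespace that every cell reads and writes.
kernel = {}

# Three hypothetical notebook cells, stored as source strings.
cells = {
    "cell_1": "x = 1",
    "cell_2": "x = 20",
    "cell_3": "y = x + 1",
}


def run(cell_name):
    """Execute one cell against the shared kernel namespace."""
    exec(cells[cell_name], kernel)


# Run top to bottom: cell_2 overwrites x, so y is computed from 20.
for name in ("cell_1", "cell_2", "cell_3"):
    run(name)
top_to_bottom = (kernel["x"], kernel["y"])

# Re-run only cell_1, as someone exploring the notebook might do.
run("cell_1")
after_rerun = (kernel["x"], kernel["y"])

print(top_to_bottom)  # (20, 21)
print(after_rerun)    # (1, 21): y is now stale relative to x
```

After the out-of-order re-run, nothing on the page says that `y` was computed from an older value of `x`, which is the confusion the speaker is pointing at.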
There’s other issues with notebooks we’re working on too. But that state issue… we launched a feature last October. We called it Hex 2.0. But it was this reactive compute engine we had. And I’ll show it off in a minute. But it’s effectively saying, “What if notebooks worked a little bit more like a spreadsheet where cells have this sense of provenance between them?” When you update one thing, it automatically updates downstream cells. And the state is in a much better state. State is in a better state.
And this is great. This is better for those power users who are like, “Man, this is the way I always wished this worked.” And it’s great for novice users, who a lot of them have never used a notebook before. They’re not even aware there’s a state issue. They just know they don’t have that in Hex. So it’s all good. And that was the goal of that feature for us.
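The spreadsheet-like behavior described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Hex's actual engine: the `ReactiveNotebook` class and `names` helper are hypothetical, dependencies are inferred from plain variable names, and cells are assumed to depend only on cells that appear earlier in the notebook.

```python
import ast


def names(source):
    """Return (assigned, read) variable names found in a cell's source."""
    assigned, read = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            else:
                read.add(node.id)
    return assigned, read


class ReactiveNotebook:
    def __init__(self):
        self.cells = []     # cell sources, in notebook order
        self.state = {}     # shared namespace, like a kernel
        self.run_log = []   # indices of executed cells, for inspection

    def add(self, source):
        """Append a new cell and run it."""
        self.cells.append(source)
        self._run_from(len(self.cells) - 1)

    def update(self, index, source):
        """Replace a cell, then re-run it and everything downstream of it."""
        self.cells[index] = source
        self._run_from(index)

    def _run_from(self, index):
        # Run the changed cell, then any later cell that reads a name
        # written by a cell that was re-run in this pass.
        dirty = set()
        for i in range(index, len(self.cells)):
            assigned, read = names(self.cells[i])
            if i == index or read & dirty:
                exec(self.cells[i], self.state)
                self.run_log.append(i)
                dirty |= assigned


nb = ReactiveNotebook()
nb.add("x = 1")
nb.add("y = x + 1")
nb.add("label = 'unrelated'")
nb.add("z = y * 2")

nb.run_log.clear()
nb.update(0, "x = 20")
print(nb.state["y"], nb.state["z"])  # 21 42
print(nb.run_log)                    # [0, 1, 3]
```

Changing the first cell re-runs only the cells that depend on it, directly or indirectly, and skips the unrelated one, so the namespace never holds a value computed from a stale input.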
As just one last question on notebooks before we jump into the demo – part of the value proposition as well is that you can do data science in the Python world. But you can also do SQL and databases. And I think you can do that in Jupyter as well by installing packages. But it all comes out of the box. Or is that not correct?
Well, you can. I mean, this was the thing when we started out when I was like… when you’re starting a company, you get an idea, you’re pitching people the idea. And it’s not uncommon for people to be like, “Well, that’s already possible.” I’d be like, “Well, what if it worked like this?” People are like, “Well, Barry, that’s already possible.” “Oh. Really? Have I missed something?” “Well, if you install these three packages and then you’re willing to, if you have the environment variables all set up correctly and then you roll your own connection with SQLAlchemy and then write your [inaudible], yeah, you could totally write SQL in notebooks.” That’s an awful experience. And not only do I hate doing it as someone who’s technically capable of doing it but what about all these people who are not going to fight through all of that pain or don’t have the ability to do that?
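For a sense of the hand-rolled plumbing being described, here is a rough sketch of running SQL from Python and then continuing the analysis in Python. It swaps in the standard-library `sqlite3` module for SQLAlchemy and a warehouse driver so that it stays self-contained; the `EXAMPLE_DB_PATH` variable, the `orders` table, and its rows are all hypothetical.

```python
import os
import sqlite3

# Hypothetical environment variable; a real setup would hold warehouse
# credentials here instead of a SQLite path.
db_path = os.environ.get("EXAMPLE_DB_PATH", ":memory:")

# Hand-rolled connection setup that every user would have to repeat.
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row

# Made-up sample data standing in for a warehouse table.
conn.executescript(
    """
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('east', 10.0), ('east', 5.0), ('west', 7.5);
    """
)

# SQL step: aggregate in the database.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()

# Python step: keep working with the query result.
totals = {row["region"]: row["total"] for row in rows}
top_region = max(totals, key=totals.get)

print(totals)      # {'east': 15.0, 'west': 7.5}
print(top_region)  # east
conn.close()
```

Even in this toy form, the connection setup, configuration lookup, and result conversion are all boilerplate that sits between the analyst and the question they actually wanted to ask.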
And I think it’s the same with the sharing thing. I was like, “Well, what if it was really easy to publish your notebook in a way that anyone could use?” And it’s like, “Well, that’s possible.” There’s these three open source packages that if you install them in your JupyterHub instance and everyone’s using the right version of JupyterLab and they’re all up to date. And, oh. Well, these extensions are incompatible. But ignore that. And if you do this all right and then Mercury is aligned with Jupiter the right way, then you can do it. And by the way, you’re going to need a full-time person to manage all of this.
This is the type of shit that’s only accessible to these really technical users and turns a lot of people off from these workflows. And we don’t think it has to be this way. So whether it’s the SQL stuff or reactivity or beautiful no-code charts, we’re just making it really freaking easy to share your work with anyone. We think that there’s a way to make this more accessible without dumbing it down. Our power users love this stuff too. That’s where, I think, there’s this false dichotomy sometimes of, are you building for low-end users or building for power users? We think there’s a lot of smart people. We think there’s a lot of people that, given the right tools, will engage with these data workflows. And we’re all about building for that population.
Awesome. Love it. All right. Let’s jump into the demo.
So switching tacks a little bit, you guys seem to have done a really nice job partnering with a lot of companies in the ecosystem, including a lot of companies we’ve had at this event over the years, including, in your round recently, I saw that both Databricks and Snowflake invested in the company. But before that, you had announcements with metric store companies and other companies, like dbt.
dbt is a big partner. Yeah.
Yeah. So is that a go-to-market strategy? Is that a product? How do you think about it?
Well, it’s both. I think the partnerships with Snowflake and Databricks are very interesting in that… I didn’t talk about this earlier but we’re really building a product to embrace what we think of as the cloud data era, which is you have data that’s at a massive scale, stored in cloud data warehouses. And those cloud data warehouses are not just there for storage. Databricks and Snowflake and other companies are also building very powerful compute primitives, whether it’s just being able to push a query down to different warehouse sizes or even being able to push Python code down. We think they’re doing a great job with that. We think that they’re going to continue to do a great job with that. And we want to partner really closely with them on that.
So the partnership makes a ton of sense because when people are using Hex, they’re going to be asking and answering questions on more data. They’re going to be pushing more workloads down to those data warehouses, which is great for them. And those data warehouses also provide a really great scale and data story for us. We actually have to do less on our end to build out a whole compute infrastructure and ecosystem ourselves if they’re doing a great job of that. So we think that partnership makes a ton of sense. We see our customers really pulling us on, how are we integrating very closely with those technologies that they’re already investing in?
And then dbt? dbt is the idea that you’re building some transformation in the notebook?
You certainly could. There’s a couple interesting angles. We actually just published a blog post by one of our first analytics engineers, who uses dbt and Hex all day. She’s got a very cool workflow where she’ll develop a lot of stuff in Hex, then bring it over to dbt, and the post on our site covers how she uses them together, which is very cool. But going a little bit deeper than that, I think, when you look at what dbt is doing or what companies like Transform are doing at the metrics layer, they’re really almost unbundling BI in this really interesting way where they’re saying, “Hey, it’s not just about transforming data as normalized tables in your warehouse. It’s about how you’re then actually turning them into metrics and measures and semantics that are accessible to the BI and analytics layer.” And so we’re very excited about what’s happening there.
We think it’s very much in its infancy. But as it matures, we think there’s a really cool opportunity to bring that more into Hex, where instead of having to write a ton of SQL in Hex, maybe people are able to write something much more concise or maybe something more UI driven, where they can just pick a metric they want, get a data frame back and then start working against that. So we have a ton of shared business with dbt today. But I think with where they’re going and where we’re going, there’s a lot more that we’re going to be doing together, and with others in that space.
Great. So maybe to close, I’d love to spend two or three minutes on go-to-market sales. Who do you sell to? Who’s a great customer? Maybe, who are some existing customers? That aspect of the business.
Yeah. So we’re used by over… I think the last count was over 150 teams globally now, data teams paying us for Hex, which we’re extremely proud of. We support really big public companies, like Persion Pharmaceuticals, as an example. They use Hex enterprise wide to support their research efforts.
We’re also used by small startups. I think the one consistent thing across our customer base and where we’re really resonating, they’re making investments in data infrastructure and data. And we just talked about Snowflake and Databricks and dbt.
If companies are adopting those technologies, they’re often then coming to Hex for, “Great. I’ve got all this data now in my warehouse. I’m transforming it with dbt. Now I actually want to be able to ask and answer questions of it. I want to be able to do more with it than I’m able to in legacy tools or Jupyter Notebooks or SQL Scratchpads.”
Hex is a super good complement to companies that have invested in that stack. And what we’re seeing is any company that’s hiring folks in roles like data science, analytics, analytics engineering really needs and wants and gets a ton of value out of Hex. So, yeah. From a customer and target perspective, that’s where we’re at right now.
Very cool. Congratulations on all of this and this journey. The company is still pretty young. I mean, it seems like you guys are executing incredibly well and very fast.
Yeah. We started in late, late 2019. So really, not been at it too long. I was very, very fortunate to start a company with two folks that I had worked with at Palantir, Caitlin and Glen. And we’ve had a lot of fun and a little bit of luck the last couple years building this out. So we’re looking to continue that streak for a little bit longer and keep going on. We’re having a good time.
Very cool. Well, thank you so much, Barry, for coming to this event, telling us your story. Best of luck for the future. I hope you come back in a couple of years.
Yeah. Ironically, I’m actually in New York this week, not that far from you. But we’re still doing this virtual. So maybe next time I come back, we’ll be able to do it in person.
Do it in person. Yes. I cannot wait for that. Okay. Cool. Thank you so much, Barry.