Data Science at Massive Scale: In Conversation with Solmaz Shahalizadeh, Shopify

Shopify (NYSE:SHOP) is one of those unlikely success stories that entrepreneurial dreams are made of.

In 2006, co-founder and CEO Tobias Lutke was a 24-year-old German autodidact programmer who had followed his girlfriend to Ottawa, Canada. He partnered with an older entrepreneur, Scott Lake, to start Snowdevil, an e-commerce business selling snowboards. Realizing there was no decent out-of-the-box framework for building an e-commerce store at the time, Tobi built the Snowdevil store from scratch, using the then-nascent Ruby on Rails. Word spread within the community about the quality of his work, and the duo decided to focus on the software platform rather than the snowboard store. A world away from Silicon Valley, Shopify was born.

Fast forward to today, with many steps along the way, including a Series A round of financing in which our firm FirstMark invested: Shopify is a ~$34B public company that has grown extremely fast in recent years and helps SMBs outfit their stores with a variety of essential tools. Shopify powers the online stores of more than 800,000 merchants in over 175 countries.

As tends to be the case for all major Internet franchises, Shopify recognized early the transformational power of harnessing and using data. Data science and machine learning were used in one product, then the next and over the years have become a cornerstone of the company.

The person leading this crucial effort has been Solmaz Shahalizadeh, VP of Data Science and Engineering. Growing up in Iran and inspired by her father (university professor) and mother (statistician), Solmaz started her career in the world of bioinformatics, helping inform cancer research using deep learning (before deep learning became cool again!). A few roles later, Solmaz found her way into Shopify through a hackathon—in fact, she never even submitted a resume. Since then, she’s worked her way up through the company (starting as the first financial data analyst) and has grown the data science team from 20 to 200 people.

A couple of weeks ago, I had the pleasure of sitting down with her to discuss the rapidly changing world of data science, machine learning, and much more at June’s Data Driven NYC event.

Here is the video of our fireside chat, and below the fold are some notes from our conversation:

The data science team at Shopify:

·       Data scientists are embedded in different business units or product areas. They work closely with the product managers, UX researchers and development teams.

·       The individual teams all report to one central unit and have an open data policy. Because a merchant lives through a spectrum of services, there shouldn’t be a disconnect in the products or data. As soon as consistency goes away, so does the trust in the data.

·       The most value comes from data scientists understanding the context of the problem they are trying to solve. By fully understanding the problem, you can use data science, machine learning and data engineering as tools to better solve the problems of the merchants.

Building the first product using data science

·       The first product where we deployed data science was Shopify Capital. The idea is to give cash to merchants to help them grow their business and only take money back when the merchant has made money. In order for the business unit to be successful, the team needed to be able to predict how much money a merchant would make, and in what time frame.

·       We thought this product would be a great opportunity to test using machine learning.

·       In the beginning, we only used machine learning for a portion of the portfolio.

·       After the first iterations we became comfortable, and for the last two years Shopify Capital has been entirely machine learning–driven.

·       Using machine learning, Shopify is able to project both a merchant’s sales and the velocity of those sales with good accuracy.

·       The success of this product was a pivotal moment for Shopify: “Before that, we were using data to make data-informed decisions about what to build, where to invest and all of those things. But I think that was a moment where we said, okay, we can actually use data to create experiences.”
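To make the "how much, and in what time frame" question concrete, here is a minimal arithmetic sketch. It assumes a merchant remits a fixed share of forecast daily sales until a total amount is collected. The function name, the remittance rate and all the numbers are hypothetical illustrations, not Shopify Capital's actual terms or model.

```python
import math


def days_to_remit(total_owed: float, forecast_daily_sales: float,
                  remittance_rate: float) -> int:
    """Estimate how many days it takes to collect `total_owed` when a fixed
    share (`remittance_rate`) of forecast daily sales is remitted each day.

    All parameters are hypothetical; a real system would use a sales forecast
    with uncertainty rather than a single flat daily figure.
    """
    if total_owed < 0:
        raise ValueError("total_owed must be non-negative")
    if forecast_daily_sales <= 0:
        raise ValueError("forecast_daily_sales must be positive")
    if not 0 < remittance_rate <= 1:
        raise ValueError("remittance_rate must be in (0, 1]")
    daily_remittance = forecast_daily_sales * remittance_rate
    return math.ceil(total_owed / daily_remittance)


# Same amount owed, two different sales velocities (made-up numbers).
print(days_to_remit(11_300, 400.0, 0.25))  # 113
print(days_to_remit(11_300, 200.0, 0.25))  # 226
```

Halving the forecast sales velocity doubles the time frame, which is why the sales forecast and its velocity both matter for a product like this.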

On developing a fraud detection program using machine learning

·       The next product to use machine learning was a tool that would predict how likely a purchase is to be fraudulent, so that the merchant can decide to fulfill it or not. The analysis had to be run in real-time.

·       The first experience of building the product focused on very simple logistic regression.

·       “We focused on product metrics, things like: what’s the trust of the merchant? How often do we tell them to cancel something and they actually listen to us? We spent a lot of time thinking about what happens before and after machine learning models are deployed, and I think that allowed us to bring other products to market faster.”

·       The first time you go from any rule-based or non-machine-learning process to a machine learning process is when you get the highest lift.
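As a rough illustration of how a very simple logistic regression turns order features into a fraud probability, consider the sketch below. The feature names, weights, bias and threshold are all hypothetical; a real model would learn its weights from labelled historical orders, and nothing here reflects Shopify's actual features.

```python
import math
from typing import Mapping

# Hypothetical weights for illustration only; in practice these are learned.
WEIGHTS = {
    "billing_shipping_mismatch": 1.8,  # 1.0 if the addresses differ, else 0.0
    "first_time_customer": 0.6,        # 1.0 if no prior orders, else 0.0
    "order_value_zscore": 0.9,         # order value relative to the shop's norm
}
BIAS = -3.0


def fraud_probability(features: Mapping[str, float]) -> float:
    """Logistic regression score: sigmoid of a weighted sum of features."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def recommendation(probability: float, threshold: float = 0.5) -> str:
    """Turn the score into advice; the merchant still makes the final call."""
    return "review before fulfilling" if probability >= threshold else "fulfill"


typical_order = {}  # every feature at zero
risky_order = {
    "billing_shipping_mismatch": 1.0,
    "first_time_customer": 1.0,
    "order_value_zscore": 2.0,
}

for order in (typical_order, risky_order):
    p = fraud_probability(order)
    print(f"{p:.3f} -> {recommendation(p)}")
```

A scoring function like this is a single weighted sum, so it is cheap enough to run in real time, and each weight can be explained to a merchant.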

How to determine when to use machine learning

·       Everything should be product and/or merchant-focused. Look at the problem that needs to be solved for the merchant. If machine learning can be used to accurately and quickly solve the problem, then production begins.

·       “Machine learning is very powerful. But at the end of the day it’s a tool. If you use it properly, it’s useful. If you don’t, it’s not.”

·       Shopify uses a two-week cycle to put out new features, look at the lift and do a lot of back-testing. They run the model looking at what would have happened in production over the last six months if they had run the product. If it’s not going to make the merchants’ experience better or it’s going to cause confusion, they don’t ship it.
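The back-testing idea can be sketched as replaying historical orders through a candidate model and comparing what it would have flagged against what the existing process actually flagged. The record fields, threshold and toy data below are assumptions made for illustration, not Shopify's pipeline.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class HistoricalOrder:
    risk_score: float       # score the candidate model would have produced
    flagged_by_rules: bool  # what the existing process did at the time
    was_fraud: bool         # outcome learned afterwards


def precision_recall(flags: Sequence[bool],
                     outcomes: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of a set of flags against the known outcomes."""
    true_positives = sum(f and o for f, o in zip(flags, outcomes))
    flagged, actual = sum(flags), sum(outcomes)
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def backtest(orders: List[HistoricalOrder], threshold: float) -> dict:
    """Compare the existing rules with the candidate model on the same history."""
    outcomes = [o.was_fraud for o in orders]
    return {
        "baseline": precision_recall([o.flagged_by_rules for o in orders], outcomes),
        "candidate": precision_recall([o.risk_score >= threshold for o in orders], outcomes),
    }


# Toy history; a real back-test would replay months of production data.
history = [
    HistoricalOrder(0.9, True, True),
    HistoricalOrder(0.8, False, True),
    HistoricalOrder(0.7, True, False),
    HistoricalOrder(0.2, True, False),
    HistoricalOrder(0.1, False, False),
    HistoricalOrder(0.6, False, True),
]

result = backtest(history, threshold=0.5)
print(result["baseline"], result["candidate"])
```

If the candidate does not clearly beat the baseline on the replayed history, that is a signal not to ship it.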

On the data science hierarchy of needs pyramid

·       Monica Rogati created this great framework called the data science hierarchy of needs. First you have to acquire data, but only collect the data you need.

·       Then, think about how you are going to move it from one location to another. Do your operational systems and API endpoints feed into your data lake? Are there pipelines for that movement? And how resilient are they?

·       Then you can have reports and dashboards to look at what happened in the business.

·       On top of that, you can have things that are closer to causal inference, where you apply statistical analysis. Then you can do A/B testing, then machine learning and AI.

On recruiting and building a machine learning team

·       “I recruited machine learning people by not going after machine learning people. Shopify looks at people working in bioinformatics, astrophysics and economics.”

·       People who are very successful in product organizations are people who can fall in love with the domain and use machine learning as a tool for solving problems.

·       Look for people that have used machine learning (or other aspects of data science) to solve a problem either in their personal life or in their last job.

·       Successful people have a sense of curiosity about data and the technical chops to follow.

Shahalizadeh’s three predictions for the future of data science

·       Seeing research and work around fair, accountable, transparent algorithms becoming more commonplace. Users are becoming more and more educated about their data and how these data products impact their life.

·       Stop thinking in terms of human versus AI. Build tools that combine human intelligence with artificial intelligence, and we’ll be even more successful.

·       Taking a step back and looking at causal inference. “I hope over the next few years people invest in and learn from that.”
