Facebook as an AI company: In conversation with Jerome Pesenti, VP of AI, Facebook

As always, many thanks to the Data Driven NYC team, Jack Cohen and Diego Gutierrez, as well as Karissa Domondon for the transcript.

FULL TRANSCRIPT (lightly edited for brevity and clarity):

Matt

I’d love to start with what AI means at Facebook, from an organizational perspective. Who’s the team, what do people do, how many offices, how many researchers?

Jerome

AI at Facebook is a pretty big endeavor. You’re thinking of thousands of engineers and researchers who spend the majority of their time working on ML systems. My team is the core AI team, whose role is really to advance AI, and we do three different things.

One is, we work with products to make some big product advances. Things like, ‘Hey, how do you automatically moderate what’s happening on Facebook? How do you do ranking and recommendation for both ads and content?’ When you fire up your Instagram app or your Facebook app, what you see is basically determined by a pretty sophisticated deep learning algorithm. And then we’re also looking at the future: things like augmented reality, and new endeavors like commerce and remote presence, which are two big ones. So that’s a big product pillar.

The second thing we do, and hopefully a lot of people have heard about it, is to develop new systems for ourselves and for the community. We have a big open source approach. We are the developers of PyTorch and we think of it as a community program. So not only do we develop it, but we try to engage the community as much as possible. The goal for us is really to rethink the way software engineering is done in the age of machine learning. So it’s PyTorch, it’s all the models around it, but also all the systems around PyTorch. There’s a lot of work to be done there and we can talk about it if you’re interested.

Then the third pillar, which many people maybe know more about, is the research pillar. The AI team at Facebook was launched six years ago with Yann LeCun joining the team, recruited by Mark. Yann is still on the team as Chief AI Scientist, he’s still very involved, and the team is really trying to advance the state of the art.

That’s FAIR, right? (note: Facebook AI Research)

That’s the FAIR piece. We also have an applied research team, so there’s a fundamental research team which is FAIR that Yann created, and also an applied research team. We’re really trying to move the state of the art both in the science piece and the applied piece.

So you have this lab piece of it, you also have embedded machine learning and AI people that work with product teams in the various units, is that how it works?

That’s right. There are a lot of people on my team who share goals with product teams, but there are also a lot of ML engineers who are in the product teams themselves. There are more ML engineers outside of my team than within. But most of the research in AI is done inside my team for sure.

To give us a sense of scale, I think I read somewhere that FAIR, just the FAIR part, had 300 researchers across eight offices. Is that still accurate?

We don’t share exact numbers, but that ballpark is right. If you look at AI at Facebook overall, it’s thousands of engineers and researchers, and a good fraction of that is in the AI team.

To this point, and to anchor this conversation: one key idea I’ve heard is that Facebook ultimately is an AI company. Meaning that there are certain companies that use a little bit of AI on top or on the side, for features, to do different things. But at this stage, Facebook is fundamentally an AI company and specifically, a deep learning company. Is that correct?

There’s pretty much a deep learning system in every single Facebook product, and they are very much at the core of them. The most obvious ones are the Facebook app and the Instagram app. They have AI at the core. The whole experience is driven by algorithms, and as I mentioned, moderation is driven by that. I think what’s more recent, and coming more to the surface, is what we call AI experiences: things that are driven by AI in a more visible way. That’s the kind of thing you’re going to see a lot more of in the near future, and we’ll talk about it today. I think AI was long behind the scenes. At companies like Facebook or Google, AI has been behind the scenes and really at the core of the engine of the products for many years now. But in the past year you’ve started seeing AI at the forefront, driving the experience as well.

So much to talk about – new products, new AI features, new open source projects, even a public Twitter beef with Elon Musk that you had about AGI. We’ll talk about that at the end – a fascinating topic. Maybe let’s start with moderation, which is a debate that’s very much on top of everyone’s mind. To start, can you walk us through the history of the arms race between Facebook trying to moderate content and people trying to outwit Facebook again and again over the last few years?

Sure. Moderation is always a partnership between humans and computers, right? We have 30,000 moderators who try to really understand whether the content published on the platform satisfies our policies. But 30,000, given the amount of content – and you’re talking about billions of pieces of content every day – is not enough. So in great part, the identification, flagging, and action is done by AI today, and it’s a fantastic AI problem because it’s extremely complicated to understand deeply what’s happening. And it’s also an arms race because there are some aspects that [inaudible, at 44:57]. So you can really think of it as us throwing everything we have at it; we use the most advanced AI possible. Take a system like XLM-R: we are trying to create a really large language model, leveraging transformers, that is multilingual at the source. We try to learn a hundred languages at the same time in a single model using a transformer architecture. That gives you a base understanding of language, and then on top of that you create tasks to figure out whether something is hate speech, offensive content, or bullying. So we try to have a very refined understanding of language, and that’s one axis.
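To make the idea concrete, here is a minimal sketch – not Facebook’s production system – of fine-tuning a pretrained XLM-R model for a policy-classification task, using the open source Hugging Face transformers library. The two-class label set and the example posts are purely illustrative assumptions:

```python
# Minimal sketch: fine-tune XLM-R for a content-policy classification task.
# Labels and example posts are hypothetical; this is not a production system.
import torch
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRobertaForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2  # e.g., benign vs. policy-violating
)

# The same model handles many languages, because XLM-R was pretrained on a
# multilingual corpus with a shared subword vocabulary.
batch = tokenizer(
    ["an example post", "un exemple de publication"],
    padding=True, return_tensors="pt",
)
labels = torch.tensor([0, 0])  # hypothetical ground-truth labels

outputs = model(**batch, labels=labels)
loss = outputs.loss   # task-specific fine-tuning objective
loss.backward()       # one gradient step of supervised fine-tuning
```

Because the base model is multilingual, a classifier fine-tuned this way can transfer across languages, which is the point of learning a hundred languages in one model.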

The second is that we look holistically at what’s happening on the platform, what we call a “multimodal” approach. When you look at a post, you have the person posting it, you have all the metadata attached to it, you may have a video attached with some text. More and more – and I think I sent you a slide on this – we have this view of trying to bring all these signals into a single network that tries to identify whether something satisfies our policies or not. So that’s what we call multimodal, which is becoming much more in fashion in AI, but it’s especially relevant for this problem of automated moderation. So that’s the second approach. The third thing we’re doing is trying to engage the community.

Just to make sure I understand this multimodal point: multimodal means you could have an image that says one thing, but then the text around the image actually ends up meaning exactly the opposite of the image.

That’s the slide here. The first idea is that you bring all of these signals into a fusion model: the image, the video, the text, the interactions, the actor, the metadata, right? So that’s the approach. Now there’s a very interesting application, and it’s a good transition to what I was about to talk about, which is that we are trying to engage the community in these problems. Last year we launched what’s called the Deepfake Detection Challenge, trying to get people to really tackle this complex task we have seen coming, where people create content that’s synthetic and sometimes misleading. How do you identify that? We just launched another challenge, the Hateful Memes Challenge. Hateful memes are exactly the example you mentioned: you may have text that seems completely innocuous, you may have an image that seems completely innocuous, and when you put the two together it’s extremely offensive and actually policy-violating. So this is one of the challenges, right? You need to have the full context to understand whether that meme really satisfies policy or means something horrifying that you don’t want on the platform.
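To illustrate the fusion idea, here is a minimal, hypothetical sketch in PyTorch: embed each modality separately, concatenate the embeddings, and classify jointly, so the model can catch combinations that are only offensive together. The dimensions and the two-layer head are illustrative assumptions, not Facebook’s actual architecture:

```python
# Minimal sketch of "late fusion": separate per-modality embeddings are
# concatenated and classified jointly. Dimensions are placeholder values.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, hidden=512, num_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(image_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, image_emb, text_emb):
        # The joint representation lets the classifier catch cases where the
        # image and the text are individually innocuous but offensive together.
        fused = torch.cat([image_emb, text_emb], dim=-1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))  # batch of 4 posts
```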

You did some really interesting work, partnering with third parties as you mentioned, around the virus, around COVID, in terms of trying to detect what is legitimate information versus not. Can you expand on that?

That’s another thing that’s quite important for us in trying to moderate the content on our platform in an open way. We are trying to make sure that people can have free speech, that they can discuss their views, but when we identify information that’s completely misleading, we try to, if you want, ‘fingerprint’ it, figure out what claims are made, and then find all the content that’s actually very similar to it. We often work with fact-checking organizations, and when they flag content that should at least be shown as misleading, we have a very advanced similarity algorithm that looks at an embedding of the content itself – again, it can be an image or something multimodal – and keeps surfacing all the other content that’s very similar, so that we can take much faster action on our platform. If we identify one piece of content, we can act on all the content that’s very similar, even if it has been modified five different ways or it’s another version of the same claims.
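A minimal sketch of the idea, assuming some content encoder produces fixed-size embeddings: once a fact-checker flags one item, compare its embedding against the rest and surface anything above a similarity threshold. The encoder and the 0.9 threshold are placeholder assumptions:

```python
# Minimal sketch of embedding-based near-duplicate matching for flagged
# content. The embeddings and threshold here are placeholder assumptions.
import torch
import torch.nn.functional as F

def find_near_duplicates(flagged_emb, candidate_embs, threshold=0.9):
    """Return indices of candidates whose cosine similarity to the
    fact-checked item exceeds the threshold."""
    sims = F.cosine_similarity(flagged_emb.unsqueeze(0), candidate_embs, dim=-1)
    return (sims > threshold).nonzero(as_tuple=True)[0]

flagged = torch.randn(256)             # embedding of the fact-checked content
candidates = torch.randn(10_000, 256)  # embeddings of other platform content
matches = find_near_duplicates(flagged, candidates)  # items to act on
```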

So you partner with organizations to find a version of the truth and then you use machine learning and AI to find other documents that correspond to that truth.

More like content that should be flagged as being misleading in this case.

Deepfakes are an absolutely fascinating problem. Your approach is to partner with developers around the world, presumably because it’s such a complex problem. I saw that you ran a prize with this organization called DrivenData – so not Data Driven, but DrivenData, the org. Any thoughts on where this whole thing is going, whether it’s going to be possible to detect these, or what’s your vision of the future of deepfakes?

It’s a really good question. On one hand, there’s no question that tampered content like this will become more prominent on the platform. We’re trying to be really ahead of the game, again, by creating this competition and trying to figure out how we can detect this. We’re going to announce the results of the deepfake challenge in the coming weeks. Through the challenge we saw tremendous progress. People are working to figure out ways to detect this, and in just a few months you could see progress through the leaderboard. It’s never going to be a solved problem, right? It’s always going to be an arms race. I’m confident that we can identify part of it. I think we can act on it, we can label it in cases where it matters to users, but I’m also confident that it’s going to be an ongoing problem; it’s not going to be fixed tomorrow. In some way, this is a problem that has always been around. Even with the printing press, how do you detect that people are making misleading claims? So this is part of our mandate: to make sure we can identify as much as possible, act on it, label it, and especially make people conscious of what they’re seeing and keep them informed.

From a technical standpoint, any promising approaches? Is it a brute force exercise around data, where the more videos you have the smarter the algorithms become, or are there different ways you can do things?

I actually haven’t seen the error analysis at the end, so I don’t know. There are multiple approaches you can take. You can try to find the original content that has been modified – that’s one approach, and it’s back to what I was mentioning earlier, where we try to find similarities. The other is that you have a pretty good understanding of the artifacts of the algorithms that are used today. When a deepfake algorithm generates something new, it leaves some kind of signature; it has side effects on the content that you can try to identify. So there are a lot of aspects like this. It’s pretty clear that the winning approach will be an ensemble technique that combines all these views: understanding what’s in the content, understanding the algorithm, and also looking at the multimodality aspect, right? Because you also have: who posted the video? What’s the behavior? What’s the context around it?
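As a toy illustration of the ensemble idea – not Facebook’s detector – the per-view scores (similarity to known originals, generation artifacts, posting context) could be combined with a simple weighted vote; the weights and threshold here are arbitrary assumptions:

```python
# Toy sketch of combining several detector "views" into one decision.
# Component scores are assumed to be in [0, 1]; weights are arbitrary.
def deepfake_score(similarity_score, artifact_score, context_score,
                   weights=(0.3, 0.5, 0.2)):
    """Weighted combination of per-view deepfake detector scores."""
    w_sim, w_art, w_ctx = weights
    return w_sim * similarity_score + w_art * artifact_score + w_ctx * context_score

score = deepfake_score(0.2, 0.9, 0.6)  # here the artifact detector dominates
flag_for_review = score > 0.5          # placeholder decision threshold
```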

Plenty of new products at Facebook around AI. Shops was a big announcement recently. Do you want to talk about the AI part of this? The universal product understanding model is one of the notes I took – what does it do and how does it work?

Yeah, I would say being at Facebook during COVID has been quite interesting. I’ve been here three months, and we basically launched two offerings in the middle of it, because of it, even though we all had to learn to work remotely at the same time. So it has been quite an interesting adventure, I’ve got to say. I’ve never seen such drive before. So you’re right.

Just to make sure I understand: the launch was accelerated by COVID?

Oh yeah. This was not in the plan before. This is quite amazing. The way we look at it, the company is really trying to prioritize three things right now. One is all the COVID threads, some of the things you mentioned. Part of it is making sure the information on our platform is actually correct – we can moderate this. The other part is trying to help with COVID itself. I’m not sure if you’ve heard of this thing called the symptom survey: we partnered with Carnegie Mellon to really understand the symptoms that people are self-reporting and to get a model to policymakers to understand what’s happening. Another part is that FAIR, our research team, is really pushing to create predictive models on top of that information, and on top of morbidity data as well, to try to really understand what’s happening with COVID. As we are reopening, what are the policies that will push the R factor above one, or keep it under one? So we have a big thread around COVID.

Another big thread in the company is that as people have been at home, the usage of our platform has gone up like crazy. Not only are we all at home, we have to deal with people using our services a lot more, which puts a lot of strain on our systems. Basically, the growth that happened in just a matter of a month was what we predicted would have happened over more than a year. So it really took us by surprise, but it’s good. One area that was really remarkable was remote presence. With video conferencing – obviously everybody has seen the really big pickup of Zoom – it’s something that we noticed as well. That made us react and really prioritize. These are things that we had in the pipeline, but they weren’t planned for anytime soon. To accelerate our effort around a really robust cross-platform video conferencing system, we launched this idea of Facebook Rooms, which you can connect to even without a Facebook account, from your computer or from your phone. That was a really, really interesting thing. I’m based in New York, as you know, and most of the company is out in California. So I’ve been a big proponent of leveraging remote presence a lot more, and I’m really happy that we are investing a lot there.

The coolest thing in there is that for remote presence, to make the experience really awesome, you need to use a lot of AI. It may not be as visible, but AI can make the experience interesting. I sent you an interesting video, maybe we can share it. One of the features we launched, which I think is pretty cool: with Zoom, people know about virtual backgrounds – you’re using one right now. One thing the company has invested a lot in is this: what’s behind me looks real, but it’s actually a completely artificial background. It’s using our augmented reality platform to make it look like you’re in a place. The cool thing about it is that as you move around, it’s not a static background – it makes it look like you are in an environment.

Internally we even deployed one with the Facebook office, for nostalgia, because none of us has been in the Facebook office, so it can make it look like you’re actually walking around the office. It’s a pretty good feature and it shows you all the things potentially to come at the intersection of remote presence, AI, and AR. This is actually a step towards virtual reality and augmented reality. Imagine, for example, if everybody in that video call had the same background – then it looks like you’re all in the same room. You can tweak the sound so it seems like the sound is coming from the person; there’s a sense of space in that room. So that’s a big direction. The other cool stuff is making the experience a lot more pleasurable: removing distracting background sounds, making sure that when people are on mute they know about it, auto-mute, reactions. One thing we’re working on is that you do a thumbs up and the system actually recognizes that gesture – it’s something we worked on before and are pushing. There’s a whole range of features we are planning to push this year. We launched Rooms just a few weeks ago. It has backgrounds like this on iOS right now; we’re working on it for Android and desktop. But it’s very exciting.

The last one is what you mentioned, Shops. The vision, which came from Mark directly, was: this is a really tough time for SMBs, so how can we make it easier for them to reach customers? That was the idea behind allowing people to set up a shopping experience very, very quickly. One aspect of that experience is that any picture can work. If my customers post something with my product in it, the system will immediately recognize that and tag the product. So we’ve created this pretty advanced model that looks at very detailed features and takes care of occlusion. Because when a product is in a picture, people usually aren’t paying attention to it – they aren’t trying to photograph it directly – so part of it may be hidden by another object. So we have a pretty advanced system for image segmentation and video segmentation, and then image recognition. We push it to the next level by looking not just at objects and the type of object, but also the specific product. We’re spending a lot of time on fashion – it’s obviously a big thing on Instagram. We’re also looking at furniture, because people can present things in their own home and see how it would play out, and obviously people are spending a lot more time at home. So that’s the idea: pretty much any image out there can be part of the image inventory of small and medium businesses, so that they can point to it, we can tag their product, and people can understand what the product looks like in that context.
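For a feel of what such a pipeline involves, here is a minimal sketch using an off-the-shelf torchvision instance-segmentation model to find product candidates in a photo; the catalog-matching step is left as a hypothetical placeholder, and none of this is Facebook’s actual production model:

```python
# Minimal sketch: instance segmentation finds (possibly occluded) objects,
# then each confident crop would be matched against a product catalog.
# The catalog lookup is a hypothetical placeholder.
import torch
import torchvision

segmenter = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
segmenter.eval()

image = torch.rand(3, 480, 640)  # stand-in for a user photo (values in [0, 1])
with torch.no_grad():
    detections = segmenter([image])[0]  # per-instance boxes, masks, scores

for box, score in zip(detections["boxes"], detections["scores"]):
    if score > 0.8:  # keep confident detections only
        x0, y0, x1, y1 = box.int().tolist()
        crop = image[:, y0:y1, x0:x1]
        # product_id = catalog_index.nearest(embed(crop))  # hypothetical step
```

Instance masks are what make occlusion tractable: the model localizes the visible part of each object, which a recognition model can then match against specific products.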

Amazing. So that’s out and that’s functioning. Or are you building those features?

It’s functioning; we launched. The background I mentioned – you can see it in the slide there – and the shopping features will come in the coming months.

So we can all go play with it after this if we haven’t already done so. Another theme we alluded to: let’s chat a little bit about open source. I think what Facebook has done around open source – a lot of the innovation you came up with – has been quite remarkable. So maybe walk us through the general philosophy behind so much openness.

Facebook is one of the top two companies out there leveraging AI. But we’re not selling AI, we’re not a vendor, we don’t have an AI offering and no plans for one, and we have major problems to solve. I mentioned hateful memes, I mentioned deepfakes, I mentioned moderating our platform, and we’re also trying to go towards augmented reality. So for us it’s really essential that AI makes progress: it will really help us and will help our business. The strategy I’ve taken – and I have support from Schroepfer, the CTO, my boss, and his boss, Mark – is to go all in on openness and open source. Obviously PyTorch is one of our flagship projects at the moment, and we’re really trying to build a community around it. But not only that, we’re trying to incentivize people to share their models. We’ve taken the approach of being very open: we try to put all our research models, and often even the base of our production models – machine translation, language models, speech models, vision systems – out there so that people can build on top of them and keep improving them, and we can reap the benefit of those improvements. We really believe this helps the community improve.

The second thing that I feel very passionate about – something I was very passionate about before joining Facebook, and at Facebook I joined people like Yann or Joelle Pineau who are very passionate about it too – is this idea of reproducibility in AI. There’s so much buzz about AI, and some of it is justified. I mean, we’re not going to go through any AI winter anytime soon; the use of AI is exploding, it’s real. But people make a lot of claims that it’s human level. There are a lot of companies out there that make this claim, and we don’t want to do that. We really want to make it very easy for people to say: okay, this is what my system does, and this is how you can reproduce it. We believe it’s good for two things. One is to cut through the buzz – people can verify for themselves what the performance is – and it also helps people build on top of what others have done. So we have a really big push around being open source, putting not only the systems out there but the models, and making it really easy for people to reproduce the results of the whole community on their own infrastructure. We’ve made a lot of announcements around this, we’re doing acquisitions where we can, we are pushing very hard on this, and we’re partnering with conferences around the reproducibility checklist.

Sounds like you’re partnering with AWS on the PyTorch release?

Oh, and that’s another aspect: we are really trying to create a community, including partnering with Google, with Amazon, and with Microsoft, and making sure that all their platforms can integrate the PyTorch systems very well. They also adapt their hardware to it – for example, when we announced PyTorch working on TPUs. As you mentioned, TorchServe came from AWS. Our view is that we should not be the only contributor to the community. We want to welcome a lot of people to contribute and make it as thriving as possible.

One recent open source release that looked super interesting is BlenderBot. Do you want to talk about this?

BlenderBot is absolutely one of my favorites, actually. I sent you a few interactions, so I can tell you a lot about it if you want to share that.

BlenderBot is a chatbot, right?

BlenderBot is a chatbot, and it’s a combination of two things. It leverages a lot of the research that we have done in our team, led by the group that Jason Weston put together, around understanding what makes dialogue engaging and what pitfalls dialogue falls into. Often a chatbot tends to go off topic quickly, or gives you the same response over and over, or repeats itself. There are a lot of pitfalls, and the team has really studied them hard and created specialized datasets and training sets to train a bot. On top of that, there have been a lot of advances around very large language models and transformers. The jumps in the two gave us BlenderBot, and it’s a pretty interesting bot: you can literally have a half-hour conversation with it without getting bored. The system has huge knowledge; it’s the largest chat model out there, at 9 billion parameters. It integrates a lot of knowledge, and it doesn’t have any rules. When you interact with something like Alexa or Google Home today, the system basically does intent recognition and then follows a very scripted dialogue. It’s very limited because you can have no more than one or two turns. BlenderBot, on the contrary, is completely trained from scratch.

This is the interaction, if you can read it: I asked the bot to talk to me about Ancient Greece. It knows some interesting things about that, around schools of philosophy and so on. Then after that I asked it about tea, and it knows about different types of tea. But none of that was actually injected into the system; it just learned. In this case it learned from some specific data we put together and from Reddit. It’s not ready for prime consumption, let’s be clear – these bots have lots of interesting side effects. You cannot really control what they’re going to say, but they’re entertaining, fun, engaging, and a lot more human-like than you’d expect. It still doesn’t mean they understand full reasoning. They don’t have a lot of common sense; they don’t have a lot of memory at the moment. But they are really, really engaging and entertaining. So that was the latest. We think there’s a lot more work that needs to be done for this to be useful, and we think there’s a whole new market of things that can be created – that’s why we released it completely open source, and the model is available for anybody to try and build on. It’s not ready for prime time, it’s not ready for products, but it’s definitely really interesting for research.
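Since the model was open-sourced, you can try a smaller, distilled BlenderBot checkpoint yourself through the Hugging Face transformers library; note this loads a 400M-parameter release, not the 9-billion-parameter model discussed above:

```python
# Minimal sketch: one conversational turn with an open-sourced BlenderBot
# checkpoint (the small distilled release, not the 9B-parameter model).
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

name = "facebook/blenderbot-400M-distill"
tokenizer = BlenderbotTokenizer.from_pretrained(name)
model = BlenderbotForConditionalGeneration.from_pretrained(name)

inputs = tokenizer("Tell me about Ancient Greece.", return_tensors="pt")
reply_ids = model.generate(**inputs, max_length=60)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```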

Very cool. You mentioned CPU – I’d actually love to talk about hardware a little bit. Watching various videos of yours online as I was preparing for this, you made a point that many people may not completely realize: the sheer cost – in terms of hardware, cloud consumption, and all the rest – of running those super high-profile, brute force exercises to beat games and that type of thing that gets a lot of press is becoming incredible. One of the barriers to the progress of that type of method is actually very much cost-related. Can you talk a little bit about the cost, the hardware, and the evolution towards using maybe more CPU as opposed to GPU?

Yeah, definitely what’s interesting about deep learning today, which is still surprising, is that scale works. You can take a system, and as you scale it up, the performance goes up. It’s actually quite surprising to everybody – including to me and many experts – that we still see a lot of gains from scale. I mentioned BlenderBot: BlenderBot is 9 billion parameters, which a few years back would have been hard to conceive. That is one of the [inaudible, 1:10:16]; actually, if you take the top models out there, they tend to increase in terms of size and hardware consumption by 10x a year. And if you look at utilization within Facebook, the use of resources for ML is definitely exponential right now. Now, there’s this little law out there, which is that all exponential laws stop at some point – you cannot continue, because eventually you’re going to consume all the atoms in the universe – and that includes this one. So there are different directions this needs to go. One direction – you mentioned CPU – we’re definitely using more than CPUs: we’re using GPUs, and we are looking into dedicated hardware as well. But you also need to really improve the consumption of these models. That’s a growing field: you need to make the models more efficient, and we need to start looking at the trade-off between accuracy and efficiency. A lot of the field, including my team, has looked mostly at one aspect, which is improving performance from an accuracy perspective, but we need to look at the efficiency of these models as well, and at the trade-off between the two. So that’s a big effort.

The second is leveraging hardware, and different types of hardware. CPU is obviously the most common one, but we use GPUs, building super clusters of GPUs [check: 1:11:34]. That’s something we spend a lot of time on, and at times we leverage dedicated hardware as well. We are actually growing on all three fronts. And the third is that we’re still researching larger models. As much as we are constrained, I am still pushing my team to ask: what is the next advance we can get by scaling this up, and how are we going to scale it up? So we’re still pushing hard, but it’s not going to be 10x per year – that’s just not sustainable. That’s why I was quoted at some point saying we’re going to hit a wall. I was just stating the obvious: 10x is not going to last forever; at some point the growth is going to diminish, and we have to look at the other aspects I mentioned – hardware acceleration and making the systems more efficient.
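One concrete example of the accuracy-versus-efficiency lever described here is post-training quantization. A minimal PyTorch sketch, with a toy model standing in for a real one: dynamic quantization converts linear layers to int8, making inference cheaper at a usually small cost in accuracy.

```python
# Minimal sketch: post-training dynamic quantization in PyTorch.
# The toy model is illustrative; real systems weigh the accuracy loss.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # quantize the linear layers to int8
)

x = torch.randn(1, 768)
out = quantized(x)  # cheaper inference, slightly different numerics
```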

To put this in people’s minds – obviously not talking about anything specific to Facebook – the cost of running these models at scale, just one run, can be in the millions. Is that correct?

Well, not one [inaudible, 1:12:50] from one user. What has really increased lately is the training cost. Actually, to be clear, the most costly thing we do in ML is still inference. Because when you post a piece of content on Facebook, it runs through hundreds of different ML-based algorithms, all in parallel, using a huge number of machines – every time you post something. But it’s not a million per post. Where it does become millions is in training runs. Some of the training runs of the most advanced systems, whether they come from our company or others, are starting to be extremely expensive. One run can be on the scale you’re mentioning. That’s not sustainable, including for companies like Facebook.

We talked about hardware progress and making models more efficient, so let’s turn a little bit to the public beef with Elon on Twitter. What’s your sense of where things are going in terms of AGI, meaning general intelligence? What’s realistic and what’s not? As one of the ultimate experts in the field, what’s your current sentiment about what’s doable and what’s not doable?

When I talk about that, I try to have a balanced view. I’m concerned that people make claims out there that give a distorted view of reality. I’m trying to communicate three things. The first is that, look, we are nowhere near human-level intelligence. These systems we’re creating – I talked about BlenderBot, it’s a really fun system, and it’s amazing where we are – but as you interact with the system, you quickly realize this thing doesn’t have a lot of common sense. The systems we have today are really limited, and I don’t believe anybody has a good view as to when we’ll match human intelligence. But it’s not going to happen in the next decade; it’s not going to happen in the next two or three decades; it’s going to take much longer. How long, I don’t know, but let’s not give the impression that it’s around the corner. I think that would be a disservice to everybody. That’s my first point.

The second is, I’m trying to wage a bit of a crusade against the term AGI. I think it’s a very misleading term. I don’t even know what it stands for – well, it stands for Artificial General Intelligence. But what people don’t realize is that human intelligence is not general. It’s actually very, very specific to our world. It’s very biased; it’s really customized to survival, to humanity, and to our planet. So the concept of general intelligence – yes, humans tend to have a more general intelligence than computers today, but there’s no such thing as a general intelligence. And it doesn’t even follow that if you reach human-level intelligence, you will get to something like the singularity, or something that keeps improving itself. People throw a lot into that term. Sometimes they mean human intelligence, which is not a good term for it, because human intelligence is not general. Sometimes they’re trying to imply that it’s the end-all, that solving general intelligence will solve every problem. I find the term quite misleading, and I’m trying to get people to stop adopting it. My point in raising these two things is that I do believe AI today has some really, really critical issues. You don’t need to project yourself into a future where you have the singularity or human-level intelligence to face these problems. Today we need to create systems that have fewer biases. We need to create systems that are more inclusive, that work as well for everybody on the planet, regardless of your skin color or your race or where you come from.

We need to create systems that are robust, and it’s quite difficult to create an AI system that’s robust. You can’t have a car that’s going to run into the side of the road all of a sudden. Systems need to be transparent, so that the people you give them to have enough control over them. You can project this problem 30 years out, when AI supposedly reaches AGI and computers take over the world – but you don’t need to think of a computer taking over the world to start addressing these problems now. That’s what bugs me in this whole debate: by projecting to a future that’s very far away and not real today, we’re forgetting that there are real problems today. When we put AI in a car, we need to make sure that AI is going to be robust before we put it on the road. When we put AI to work on moderation, we need to make sure it does that without too many biases, and fairly for everybody. These are really, really difficult problems, and these are the problems we should talk about now. That’s the real debate, and it will also lead to protecting us against potential side effects in the far future. But there are real problems today.

In terms of today – and you have a great background as an entrepreneur – what’s interesting for startups to work on? Is it AI ops, machine learning infrastructure, AI applications? What are some of the interesting areas?

I think we are at a moment, in my opinion, where the applications of AI are still way behind its capabilities. So where do I encourage startups to go? On the application side. There are actually two fields that I’m very passionate about. I’m an entrepreneur, so I always ask myself: if I were to start a company today, what would I do? One goes back to the previous company I was at, Benevolent, which is AI and science. I do think that AI-assisted science has a huge future ahead of it. Chemistry, biology, physics – leveraging AI systems to make better discoveries is something that’s going to really break through in the coming years. Even my team came up with results just leveraging transformer models for proteins, and it’s amazing how well it works. My team also applied a transformer model to mathematics, doing things like partial differential equations and integration, and showing that it works as well as or better than systems like Mathematica built on hundred-page-long algorithms. So there are really interesting things happening around AI and scientific discovery.

The second one I’m very passionate about is creativity. Can AI help people be a lot more creative? The way I talk about it is: can AI help people become artists without having to have the technique, without spending the 10,000 hours mastering a technique? Can it make them really creative, really fast, without the barrier to entry that art often has, while also opening new realms of creativity? Think of multimodal stories like you see on Instagram or TikTok – this new form of creative expression. I’m really bullish about this. If I were starting a company right now, I would really go to the application side, because even if you assume no new research will happen in the next 10 years, there’s still so much that can be done with today’s technology. And it keeps improving; it keeps giving new advances that we don’t quite know how to leverage yet.

Fascinating. All right. Thank you so much. Let’s switch to some of those questions. 

How important is it as a company to have data literacy across the organization before implementing advanced AI projects? How do you spread data literacy or AI literacy across Facebook? Any tips and tricks there?

It’s a really good question. I have a standard presentation where I always start like this, especially when I advise companies: before you talk about AI, you should talk about data and data science. I always refer to the number of data scientists within Facebook. To give you an idea, there are as many data scientists at Facebook as there are product managers. To answer your second question: I didn’t have to do anything, because this was the state of the company when I arrived. It’s an extremely data-driven organization that places huge value on data and on data-driven decision making. My advice to everybody is: yes, you need to get to that structure, sometimes well before actually adopting AI. I think the biggest bottleneck to adopting AI is not having a data-driven organization – keeping the data, validating it, keeping it clean and useful, and then making decisions. That’s the hardest part, right? Making decisions based on data. At a place like Facebook, you cannot come up with a recommendation without supporting it with data, either from user research or from analysis of what’s happening in the system. That’s the first step; that’s really the groundwork that makes an organization ready for AI.

I agree combating bias in AI is a big issue. Earlier you mentioned how BlenderBot was trained on sources like Reddit in an unsupervised way. How do you prevent a bot like BlenderBot from reflecting the biases one might find in Reddit threads?

Yeah, that’s a good question. I would not advise training a production bot on Reddit – it definitely gives it some flavor. I’d refer back to the team’s papers, because they are actually spending a lot of time on this. Part of it is creating specific objectives around de-biasing the data. There’s data around gender, for example. So you try to change the objective function, especially on some sub-tasks, to make sure that the system optimizes for being the least biased possible. Bias is a really complicated subject, because intelligence itself is by nature biased, right? It’s our biases that let us make really quick inferences and understand the world without having all the necessary information. You cannot just say bias in itself is a bad thing; you need to figure out which biases are acceptable and which are not. We know that many biases are unacceptable, and you need to make sure you can control, check, and monitor the systems, and also train them to optimize for de-biasing themselves.
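As an illustration of changing the objective function to penalize bias – a generic technique, not the team’s actual formulation – one can add a term that penalizes the gap in average predicted score between two groups; the group labels and the 0.1 weight below are placeholder assumptions:

```python
# Sketch: a task loss plus a penalty on the score gap between two groups.
# Assumes both groups are present in the batch; weight is a placeholder.
import torch
import torch.nn.functional as F

def debiased_loss(logits, labels, group, bias_weight=0.1):
    task_loss = F.cross_entropy(logits, labels)
    scores = logits.softmax(dim=-1)[:, 1]          # P(positive class)
    gap = (scores[group == 0].mean() - scores[group == 1].mean()).abs()
    return task_loss + bias_weight * gap           # trade accuracy for parity

logits = torch.randn(8, 2, requires_grad=True)     # toy batch
labels = torch.randint(0, 2, (8,))
group = torch.randint(0, 2, (8,))                  # hypothetical group labels
loss = debiased_loss(logits, labels, group)
```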

There was a brilliant writeup by Michael Jordan – not The Last Dance one, the one from UC Berkeley – on why he thinks AI terminology is abused. He believes that what we have is at best IA, intelligent automation, not AI. Given all that, why do we still have widespread usage of the AI terminology, even by experts in the field? What are your thoughts?

I disagree with this one. It’s the standard paradox, right? Every time AI solves a problem, then it’s no longer AI, because it’s no longer intelligent once it seems simple. If you interact with something like BlenderBot, or if you look at the advances we’ve made in image recognition and image segmentation – I will call it AI. I have no problem with this, so I kind of disagree. It’s definitely a level of performance we were not expecting 10 years ago. Games are the same: the fact that we can beat the best master at Go came much faster than expected. So I think that’s part of intelligence; it’s not the whole of intelligence. What people underestimate is the complexity of human intelligence and all its dimensions. I would completely agree that we are still very far from human intelligence; we have some pieces of it. As we say at Facebook, we’re really one percent done. We are just looking at a sliver of intelligence, but I would qualify it as intelligence. I do think these systems qualify.

Other firms have published a number of AI data sets for large scale learning models. What is Facebook’s position on this?

It’s actually surprisingly difficult to do well. First, I think it’s really important, but it needs to be done well, and it’s really hard to do well. You cannot just publish a pointer to some data and let people go download it – then you realize: wait, do I actually have the rights to that data? I cannot tell you how much effort we put into the deepfake dataset or the hateful memes one, because initially we thought, ‘Oh okay, we have access to all this data at Facebook, let’s leverage it.’ Well, did we ask users for their permission to use it? And would that be okay? We actually ended up generating many of the datasets from scratch. So I do think it’s quite important for companies to come up with datasets, but it’s a lot more complicated than people think it is, and you have to be a lot more responsible about it. You cannot just say, ‘Oh, this user put their videos out there, I can use them for whatever I want.’ It doesn’t work like that. You need to be very responsible, very careful. For us, that ended up meaning we invested in recreating datasets specifically for the research community, from scratch.

How do you detect drift and track the ongoing performance of your AI models?

Oh, that’s a really, really good question. My little trick for this one is that I have a very, very good data science team as part of my overall team, and they are actually building systems for this. My little joke in my organization is that sometimes people who are good ML engineers, practitioners, or AI scientists are not very good data scientists – and this is a very good data scientist question. So I have a very strong data science team that builds tooling to detect this drift, and you want to do it on a live system. It’s much more convenient to evaluate offline, but we always try to connect offline performance to online performance, and to link the performance of the AI system – like a change in the model – over time to your online data and the live system. It’s a very, very interesting problem and not easy at all to do. And it’s another good reason to recruit a lot of good data scientists, even before you recruit your ML scientists and engineers.
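A generic sketch of the kind of drift check such tooling might run – not Facebook’s internal system – is to compare the live distribution of model scores against a reference window with a two-sample Kolmogorov-Smirnov test; the alert threshold is a placeholder:

```python
# Sketch: flag drift when live model scores diverge from a reference window.
# The distributions and alpha threshold below are placeholder assumptions.
import numpy as np
from scipy.stats import ks_2samp

def check_drift(reference_scores, live_scores, alpha=0.01):
    """Two-sample KS test between the reference (e.g., at deployment time)
    and the live score distributions; returns (drifted?, test statistic)."""
    stat, p_value = ks_2samp(reference_scores, live_scores)
    return p_value < alpha, stat

reference = np.random.beta(2, 5, size=10_000)  # scores at deployment time
live = np.random.beta(2, 3, size=10_000)       # scores observed this week
drifted, stat = check_drift(reference, live)
```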

I had one question, I believe from Joshua Bloom, a former speaker at Data Driven back when he was the founder of wise.io; he then went to GE Digital and is a professor at UC Berkeley. On deepfakes: one danger is the blending of an individual’s image with another. This has been shown to strongly sway preference in the context of political candidate selection, for example. Does Facebook ban the personalization of advertisements along this axis?

I don’t think I can answer that. It’s a really tricky question, I don’t have all the policies in mind, and I wouldn’t venture to give one. We have a very strong view on political ads, but there are limits as to what can be done, and some things are banned. For example, if an ad says something that would be dangerous for people to follow, or would be a threat to their health, or something like that. There are limits, and I think statements were also made around how much you can use modified content for that. But I don’t have all the details, and I’m not sure I understood the question in enough detail. Happy to follow up offline.

One question about the most interesting techniques you may be experimenting with. In which category are you expecting the next GAN-like step forward in text and visual modalities? Is it going to be reinforcement learning, maybe few-shot methods, or unsupervised learning, or something else?

The one thing right now – and I’ve started to be pretty vocal about it, and obviously it has been pushed in AI for the past few years – is that self-supervised learning is really showing promise everywhere. That’s the technique behind large language models, but my team has come out with new papers applying it to vision. We believe these techniques will take over the field in pretty much every area. So self-supervision really is the way to go. That doesn’t mean it can’t be combined with some level of supervision, but pre-training with self-supervision is really the future in pretty much every area.
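A minimal illustration of the self-supervised idea, via masked language modeling: hide part of the input and have the model predict it, with no human labels required. The model choice and the sentence are illustrative:

```python
# Sketch: masked language modeling, the self-supervised objective behind
# large language models. Model and example sentence are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

text = f"Self-supervised learning is the {tokenizer.mask_token} to go."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the masked position and decode the model's top prediction for it.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted = tokenizer.decode([logits[0, mask_pos].argmax().item()])
```

The supervision signal comes entirely from the text itself, which is why the same recipe scales to any domain with abundant unlabeled data.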

Actually, one interesting question I think about a lot as well: do you have any advice on deploying AI and ML capabilities for non-technical stakeholders in the B2B enterprise space? It’s basically the question of the democratization of AI, beyond the chosen few like the Facebooks and the Googles that have all the data and access to all the top researchers. How do people deploy AI and ML?

Yeah, good question. AI is still a fairly technical field, but I do believe that many people can use and deploy AI today. It’s not a question of having huge resources and data. There are small companies like HuggingFace making all of these models available to everybody, really reducing the barrier to entry for people to use these systems. So I don’t buy the idea that people cannot use this, but it’s still a fairly technical space. I would not go down the path – if you’re old enough, you’ll remember – that 20-30 years ago people said ‘programming will be accessible to business users in five years, in ten years.’ It never happened. I don’t buy that AI will be accessible to people who don’t have a programming background anytime soon. But machine learning will be part of the toolset of any programmer. It will still be the developer types who build these systems, rather than a point-and-click experience, but I do believe that any developer today can leverage machine learning pretty efficiently as long as they understand the base concepts.

A couple of questions on rare events, which is an interesting discussion. There have been a lot of write-ups about how COVID broke machine learning models because the current situation was not reflected in prior datasets. So one question: what are your thoughts on the potential for AI to identify and classify rare events or objects when that’s required? And how do you specialize in language or expertise [1:33:00], which is a variation of that?

The first thing I want to say is that you hear things like, ‘Oh, AI failed because it didn’t solve X.’ Nobody ever said AI could solve every problem. We are at a stage right now where AI applications are expanding, exploding, but it’s still a fraction of what humanity can do, so I wouldn’t set that expectation. I do agree with you that rare events, given that AI today learns from prior and historical data, are not best suited for AI at the moment. We are trying to create systems that can react to them and learn from little data, but that’s still pretty exploratory, I would say. So AI in many ways is a system that reproduces the past at the moment; that’s its nature, and it’s a known limitation. But that’s okay – it’s a known limitation. I’m not claiming you can solve every problem with AI. That’s not at all the case.

A question from Morgan: can you comment on Facebook’s vision for ONNX? Is the goal to make it an industry-standard neural network format?

We have multiple approaches. We’re definitely going all in on PyTorch, end to end. Initially, when we launched the ONNX strategy, it was more of a multi-framework world, and we actually had two [1:34:20] working internally, between PyTorch and Caffe2. But we’re still supporting ONNX, as there are many partners that leverage it. Microsoft, first of all, is a big proponent of ONNX, and they have their own serving platform based on it. It’s a useful translation layer, and it’s an option that we offer as part of PyTorch – it’s not a necessary thing. You can also go from research to production with just PyTorch. We support both paths at the moment.
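Exporting a PyTorch model through the ONNX path is a single call to the built-in torch.onnx.export; here is a minimal sketch with a toy model and an illustrative file name:

```python
# Sketch: export a PyTorch model to the ONNX format so ONNX-compatible
# runtimes (e.g., ONNX Runtime) can serve it. The toy model is illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

dummy_input = torch.randn(1, 16)  # example input used to trace the graph
torch.onnx.export(
    model, dummy_input, "model.onnx",
    input_names=["features"], output_names=["logits"],
)
```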

How do you balance explainable AI against AI that performs better but is more difficult to explain? What are the implications of your position, and does an organization become more or less AI/ML driven based on the position it takes?

I think explaining AI is an interesting topic. I tend to prefer the term transparent to explainable. I don’t think it’s necessary to have a system you can explain all the way down. It’s like the programs we use every day: even if you could read the code, you’re not going to go in there. But systems that are transparent are quite important. You want to make sure people understand, for example, what the system is optimizing for, the data it can use, the criteria, the robustness it has. It’s really an emerging field – I think we don’t understand it very well yet, and there’s a lot of user research that needs to be done. How do you give people trust in AI systems? What does it mean for them? Again, people trust other people, right? They interact with other people, and other people are not explainable. They’re not very transparent either. So what is the expectation for AI systems? I definitely believe we need to put a lot more research and effort into that area, at least to understand what a system is optimized for, what data it used, what its constraints are, and how it works. But it’s still very much a field of research.

How does Facebook think about build versus buy decisions related to AI? What’s Facebook willing to outsource versus build themselves?

It’s a good question. In some ways there’s no constraint. We tend to leverage a lot of what the community develops. I’d say we tend not to buy a lot of AI systems out there, because we’re at the edge, and it’s hard to find something that satisfies our requirements. But we do leverage a lot of external research, bring it in, compare it, and sometimes integrate it. So we don’t buy a lot, we don’t license a lot, but we do look at a lot of companies out there, and we have done quite a few acquisitions. I would say we try to be completely at the edge of AI in everything we do.
