The excitement! The drama! The action!
Everybody is talking breathlessly about AI all of a sudden. OpenAI gets a $10B investment. Google is in Code Red. Sergey is coding again. Bill Gates says what’s been happening in AI in the last 12 months is “every bit as important as the PC or the internet” (here). Brand new startups are popping up (20 Generative AI companies just in the Winter ’23 YC batch). VCs are back to chasing pre-revenue startups at billions of valuation.
So what does it all mean? Is this one of those breakthrough moments that only happen every few decades? Or just the logical continuation of work that has been happening for many years? Are we in the early days of a true exponential acceleration? Or in the early days of a hype cycle and mini financing bubble, as many in tech are desperate for the next big platform shift, after social and mobile, and the crypto headfake?
The answer to all those questions is… yes.
We’ll dig in in the following order:
- AI goes mainstream
- The exponential acceleration of Generative AI
- The inevitable backlash
- The business of Generative AI: Big Tech has a head start over startups
AI goes mainstream
It had been a wild ride in the world of AI throughout 2022, but what truly took things to a fever pitch was, of course, the public release of OpenAI’s conversational bot, ChatGPT, on November 30, 2022. ChatGPT, a chatbot with an uncanny ability to mimic a human conversationalist, quickly became the fastest growing product, well, ever.
For whoever was around then, the experience of first interacting with ChatGPT was reminiscent of the first time they interacted with Google in the late nineties. Wait, is it really that good? And that fast? How is this even possible? Or the iPhone when it first came out. Basically, a first glimpse into what feels like an exponential future.
In Silicon Valley, Wall Street and around the world, ChatGPT immediately took over every business meeting, conversation, dinner, and most of all, every bit of social media. Screenshots of smart, amusing and occasionally wrong replies by ChatGPT became ubiquitous on Twitter.
By January, ChatGPT had reached 100M users.
A whole industry of overnight experts emerged on social media, with a never-ending bombardment of explainer threads and ambitious TikTokers teaching us the ways of prompt engineering, meaning providing the kind of input that would elicit the best response from ChatGPT.
ChatGPT didn’t come out of nowhere. AI circles had been buzzing about GPT-3 since its release in June 2020, raving about a quality of text output that was so high that it was difficult to determine whether or not it was written by a human. But GPT-3 was provided as an API targeting developers, not the broad public.
The release of ChatGPT (based on GPT 3.5) feels like the moment AI truly went mainstream in the collective consciousness.
We are all routinely exposed to AI prowess in our everyday lives through voice assistants, auto-categorization of photos, using our faces to unlock our cell phones, or receiving calls from our banks after an AI system detected possible financial fraud. But, beyond the fact that most people don’t realize that AI powers all of those capabilities and more, arguably those feel like one-trick ponies.
With ChatGPT, suddenly you had the experience of interacting with something that felt like an all-encompassing, general purpose intelligence.
The hype around ChatGPT is not just fun to talk about. It’s very consequential in many ways, including because it has forced everyone in the industry to react aggressively to it, unleashing, among other things, an epic battle for Internet search.
The Exponential Acceleration of Generative AI
But, of course, it’s not just ChatGPT. For anyone who was paying attention, the last few months saw a dizzying succession of groundbreaking announcements, seemingly every day. With AI, you could now create audio, code, images, text and videos.
What was at some point called synthetic media (a category in the 2021 MAD landscape) became widely known as Generative AI – a term still so new that it does not have an entry in Wikipedia, at the time of writing.
The rise of Generative AI has been several years in the making. Depending on how you look at it, it traces its roots back to deep learning (which is several decades old but dramatically accelerated after 2012) and the advent of Generative Adversarial Networks (GAN) in 2014, led by Ian Goodfellow, under the supervision of his professor and Turing Award recipient, Yoshua Bengio.
Its seminal moment, however, came barely five years ago, with the publication of the Transformer (the “T” in GPT) architecture in 2017, by Google – see the post by Google Research, and the now famous paper “Attention is all you need.”
Coupled with rapid progress in data infrastructure, powerful hardware and a fundamentally collaborative, open source approach to research, the Transformer architecture gave rise to the Large Language Model (LLM) phenomenon.
The concept of a language model itself is not particularly new. A language model’s core function is to predict the next word in a sentence.
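To make "predict the next word" concrete, here is a minimal sketch of a bigram model, a far older and simpler technique than a Transformer. It is not how GPT works internally; it only illustrates the core prediction task. The corpus and the function names (train_bigram_model, predict_next_word) are invented for this illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus, made up for this example.
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept ."
model = train_bigram_model(corpus)
print(predict_next_word(model, "the"))
```

In this toy corpus "the" is followed by "cat" more often than by any other word, so that is the prediction. Large language models replace the raw counts with a neural network that conditions on much longer contexts, but the objective is the same.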
However, Transformers brought a multimodal dimension to language models. There used to be separate architectures for computer vision, text and audio. With Transformers, one general architecture can now gobble up all sorts of data, leading to an overall convergence in AI.
In addition, the big change has been the ability to massively scale those models.
OpenAI’s GPT models are a flavor of Transformers that it trained on the Internet, starting in 2018. GPT-3, their third generation LLM, is one of the most powerful models currently available. It can be fine-tuned for a wide range of tasks – language translation, text summarization, and more. GPT-4 is expected to be released sometime in 2024, and rumored to be even more mind-blowing. (ChatGPT is based on GPT 3.5, a variant of GPT-3.)
OpenAI also played a driving role in AI image generation. In early 2021, it released CLIP, an open source, multimodal, zero-shot model. Given an image and text descriptions, the model can predict the most relevant text description for that image, without optimizing for a particular task.
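The matching step described above can be sketched as a similarity search: embed the image and each candidate description as vectors, then pick the description whose vector is closest to the image's. The sketch below assumes cosine similarity as the score and uses hand-written three-dimensional vectors. They are hypothetical stand-ins, not real CLIP outputs, and the function names are invented for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_embedding, caption_embeddings):
    """Return the caption whose embedding is most similar to the image embedding."""
    return max(
        caption_embeddings,
        key=lambda caption: cosine_similarity(image_embedding, caption_embeddings[caption]),
    )


# Hypothetical embeddings written by hand; a real system would obtain them
# from trained image and text encoders.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.2],
    "a photo of a car": [0.1, 0.2, 0.9],
}
print(best_caption(image_embedding, caption_embeddings))
```

Because the candidate descriptions are supplied at query time, the same procedure works for labels the model was never explicitly trained to classify, which is what "zero-shot" refers to.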
OpenAI doubled down with DALL-E, an AI system that can create realistic images and art from a description in natural language. The particularly impressive second version, DALL-E 2, was broadly released to the public at the end of September 2022.
There are already multiple contenders vying to be the best text-to-image model. Midjourney entered open beta in July 2022 (it’s currently only accessible through their Discord*). Stable Diffusion, another impressive model, was released in August 2022. It originated through the collaboration of several entities, in particular Stability AI, CompVis LMU, and Runway ML. It offers the distinction of being open source, which DALL-E 2 and Midjourney are not.
But those are only a small part of the exponential acceleration of AI releases that has occurred since the middle of 2022.
In September 2022, OpenAI released Whisper, an automatic speech recognition (ASR) system that enables transcription in multiple languages as well as translation from those languages into English.
Also in September 2022, MetaAI released Make-A-Video, an AI system that generates videos from text.
In October 2022, CSM (Common Sense Machines) released CommonSim-1, a model to create 3D worlds.
In November 2022, MetaAI released CICERO, the first AI to play the strategy game Diplomacy at a human level, described as “a step forward in human-AI interactions with AI that can engage and compete with people in gameplay using strategic reasoning and natural language.”
In January 2023, Google Research announced MusicLM, “a model generating high-fidelity music from text descriptions such as ‘a calming violin melody backed by a distorted guitar riff.’”
Another particularly fertile area for Generative AI has been the creation of code.
In 2021, OpenAI released Codex, a model that translates natural language into code. You can use Codex for tasks like “turning comments into code, rewriting code for efficiency, or completing your next line in context.” Codex is based on GPT-3, and was also trained on 54 million GitHub repositories. In turn, GitHub Copilot uses Codex to suggest code right from the editor.
Text, image, code… Generative AI can also produce incredible avatars (here, created with Synthesia*):
The inevitable backlash
The exponential acceleration in AI progress over the last few months has taken most people by surprise. It is a clear case where technology is way ahead of where we are as humans in terms of society, politics, legal framework and ethics. For all the excitement, it was received with horror by some and we are just in the early days of figuring out how to handle this massive burst of innovation and its consequences.
ChatGPT was pretty much immediately banned by some schools, AI conferences (the irony!) and programmer websites. Stable Diffusion was misused to create an NSFW porn generator, Unstable Diffusion, later shut down on Kickstarter. There are allegations of exploitation of Kenyan workers involved in the data labeling process. Microsoft/GitHub is getting sued for IP violation when training Copilot, accused of killing open source communities. Stability AI is getting sued by Getty for copyright infringement. Midjourney might be next (Meta is partnering with Shutterstock to avoid this issue). When an A.I.-generated work, “Théâtre d’Opéra Spatial,” took first place in the digital category at the Colorado State Fair, artists around the world were up in arms.
AI and jobs
A lot of people’s reaction when confronted with the power of Generative AI is that it will kill jobs. The common wisdom in years past was that AI would gradually automate the most boring and repetitive jobs. AI would kill creative jobs last, because creativity is the most quintessentially human trait. But here we are, with Generative AI going straight after creative pursuits.
Artists are learning to co-create with AI (podcast with Karen K Chang). Many are realizing that there’s a different kind of skill involved. Jason Allen, the creator of Théâtre d’Opéra Spatial (see above), explains that he spent 80 hours and created 900 images before getting to the perfect combination.
Similarly, coders are figuring out how to work alongside Copilot. AI leader Andrej Karpathy says Copilot already writes 80% of his code. Early research seems to indicate significant improvements in developer productivity and happiness.
It seems that we’re evolving towards a co-working model where AI models work alongside humans as “pair programmers” or “pair artists.”
Perhaps AI will lead to the creation of new jobs. There’s already a marketplace for selling high quality text prompts – Promptbase.
A serious strike against Generative AI is that it is biased and possibly toxic. Given that AI reflects its training dataset, and considering GPT and others were trained on the highly biased and toxic Internet, it’s no surprise that this would happen.
Early research has found that image generation models like Stable Diffusion and DALL-E not only perpetuate but also amplify demographic stereotypes.
At the time of writing, there is a controversy in conservative circles that ChatGPT is painfully woke.
Another inevitable question is all the nefarious things that can be done with such a powerful new tool.
New research shows AI’s ability to simulate reactions from particular human groups, which could unleash another level in information warfare.
Gary Marcus warns us about AI’s Jurassic Park moment – how disinformation networks would take advantage of ChatGPT, “attacking social media and crafting fake websites at a volume we have never seen before.”
AI platforms are moving promptly to help fight back, in particular by detecting what was written by a human vs. what was written by an AI. OpenAI just launched a new classifier to do that, which is beating the state of the art in detecting AI-generated text.
Is AI content just… boring?
Another strike against Generative AI is that it could be mostly underwhelming.
Some commentators worry about an avalanche of uninteresting, formulaic content meant to help with SEO or demonstrate shallow expertise, not unlike what content farms (à la Demand Media) used to do (What are the new AI chatbots for? Nothing good).
As Jack Clark puts it in his Import AI newsletter: “Are we building these models to enrich our own experience, or will these models ultimately be used to slice and dice up human creativity and repackage and commoditize it? Will these models ultimately enforce a kind of cultural homogeneity acting as an anchor forever stuck in the past? Or could these models play their own part in a new kind of sampling and remix culture for music?”
Finally, perhaps the biggest strike against Generative AI is that it is, often, just wrong.
ChatGPT in particular is known for “hallucinating”, meaning making up facts, while conveying them with utter self-confidence in its answers.
Leaders in AI have been very explicit about it, like OpenAI’s CEO Sam Altman here:
The big tech companies have been well aware of the risk.
MetaAI introduced Galactica, a model designed to assist scientists, in November 2022, but pulled it after three days. The model generated both convincing scientific content and convincing (and occasionally racist) nonsense.
Perhaps due to the Duplex backlash in 2018, Google kept LaMDA, the powerful conversation model it launched in 2021, very private, available to only a small group of people through AI Test Kitchen, an experimental app. See Jeff Dean about reputational risk here.
The genius of Microsoft working with OpenAI as an outsourced research arm was that OpenAI, as a startup, could take risks that Microsoft could not. One can assume that Microsoft was still reeling from the Tay disaster in 2016.
However, Microsoft was forced by competition (or perhaps could not resist the temptation) to open Pandora’s box and add GPT very publicly to its Bing search engine.
Under pressure from OpenAI and Microsoft, Google also rushed to market its own ChatGPT competitor, the interestingly named Bard.
This did not go well either, and Google lost $100B in market capitalization after Bard made factual errors in its first demo (Bard is still available only to a small group of beta users, at the time of writing).
The business of AI: Big Tech has a head start over startups
The question on everyone’s minds in venture and startup circles: what is the business opportunity? The recent history of technology has seen a major platform shift every 15 years or so for the last few decades: the mainframe, the PC, the Internet, mobile. Many thought crypto and the blockchain architecture were the next big shift but, at a minimum, the jury is out on that one for now. Is Generative AI that once-every-15-years kind of generational opportunity that is about to unleash a massive new wave of startups (and funding opportunities for VCs)? Let’s look into some of the key questions.
Will incumbents own the market?
The success story in Silicon Valley lore goes something like this: big incumbent owns a large market but gets entitled and lazy; little startup comes up with a 10x better technology; against the odds and through great execution (and judicious advice from the VCs on the board, of course), little startup hits hyper-growth, becomes big and overtakes the big incumbent.
The issue in AI is that little startups are facing a very specific type of incumbent – the world’s biggest technology companies, including Alphabet/Google, Microsoft, Meta/Facebook and Amazon/AWS.
Not only are those incumbents not “lazy”, but in many ways they’ve been leading the charge in innovation in AI. Google thought of itself as an AI company from the very beginning (“Artificial intelligence would be the ultimate version of Google… that is basically what we work on”, said Larry Page in 2000). As mentioned, Transformers came from Google, but that’s just one of the many innovations that the company has released over the years, alongside TensorFlow and the Tensor Processing Units (TPU). Meta/Facebook created PyTorch, one of the most important and widely used machine learning frameworks. Amazon, Apple, Microsoft and Netflix have all produced groundbreaking work.
Incumbents also have some of the very best research labs, experienced machine learning engineers, massive amounts of data, tremendous processing power, enormous distribution and branding power.
And finally, AI is likely to become even more of a top priority, as it is becoming a major battleground.
As mentioned above, Google and Microsoft are now engaged in an epic battle in search, with Microsoft viewing GPT as an opportunity to breathe new life into Bing, and Google considering it a potentially life-threatening alert.
Meta/Facebook has made a huge bet in a very different area – the metaverse. That bet continues to prove to be very controversial. Meanwhile, it’s sitting on some of the best AI talent and technology in the world. How long until it reverses course and starts doubling or tripling down on AI?
Amazon/AWS has certainly been very active in ML/AI over the years, with a suite of tools that cuts across many categories of the MAD landscape. As its business largely targets developers, it has been less immediately present in the Generative AI debate of the last few months, however. We expect the company to keep making moves in the space, along the lines of its just announced partnership with Hugging Face.
Is AI just a feature?
Beyond Bing, Microsoft quickly rolled out GPT in Teams. Notion launched NotionAI, a new GPT-3-powered writing assistant. Canva launched its own AI tools. Quora launched Poe, its own AI chatbot. Customer service leaders Intercom and Ada* announced GPT-powered features.
How quickly, and seemingly easily, companies are rolling out AI-powered features seems to indicate that AI is going to be everywhere, soon.
In prior platform shifts, a big part of the story was that every company out there adopted the new platform – businesses became internet-enabled, everyone built a mobile app, etc.
We don’t expect anything different to happen here. We’ve long argued in prior posts that the success of data and AI technologies is that they eventually will become ubiquitous, and disappear in the background. It is the price of success for enabling technologies to become invisible.
What are the opportunities for startups?
However, as history has shown time and again, don’t discount startups. Give them a technology breakthrough, and entrepreneurs will find a way to build great companies.
Yes, when mobile appeared, all companies became mobile-enabled. However, founders built great startups that could not have existed without the mobile platform shift – Uber being the most obvious example.
Who will be the Uber of Generative AI?
The new generation of AI Labs are perhaps building the AWS, rather than Uber, of Generative AI. OpenAI, Anthropic, Stability AI, Adept, Midjourney and others are building broad horizontal platforms, upon which many applications are already being created. It is an expensive business, as building large language models is extremely resource intensive – although perhaps costs are going to drop rapidly (Training Stable Diffusion from Scratch Costs <$160k (Mosaic blog)). The business model of those platforms is still being worked out. OpenAI launched ChatGPT Plus, a paying premium version of ChatGPT. Stability AI plans on monetizing its platform by charging for customer-specific versions.
There’s been an explosion of new startups leveraging GPT in particular for all sorts of generative tasks, from creating code to marketing copy to videos. Many are derided as being a “thin layer” on top of GPT. There’s some truth to that, and their defensibility is unclear. But perhaps that’s the wrong question to ask. Perhaps those companies are just the next generation of software, rather than AI, companies. As they build more functionality around things like workflow and collaboration on top of the core AI engine, they will be no more, but also no less, defensible than your average SaaS company.
We believe that there are many opportunities to build great companies:
- vertical-specific or task-specific companies that will intelligently leverage Generative AI for what it is good at.
- AI-first companies that will develop their own models for tasks that are not generative in nature.
- LLM-Ops companies that will provide the necessary infrastructure.
And many more. This next wave is just getting started, and we can’t wait to see what happens.