Generative AI: A paradigm shift in enterprise and startup opportunities
Embeddings proved immensely successful as a representation of language and fueled an exploration of new, more powerful neural net architectures. One of the most important of these, the “transformer,” was introduced in 2017. The transformer is a neural network architecture designed to process sequential input data, such as natural language, and perform tasks like text summarization or translation. Notably, the transformer incorporates a “self-attention” mechanism, which lets the model focus on different parts of the input sequence as needed to capture complex, context-sensitive relationships between words. The model thus learns to weigh the importance of each part of the input differently for each context. For example, in the phrase “the dog didn’t jump the fence because it was too tired,” the model processes each word together with its position in the sentence. Then, for the word “it,” self-attention scores every word in the sentence by its relevance to that word. As a result, the model can associate “it” with “dog” rather than with “fence.”
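To make the self-attention idea concrete, here is a minimal sketch in plain Python, not a production transformer. It rests on several assumptions: the three-dimensional embeddings are hand-picked, hypothetical values chosen only for illustration; queries, keys, and values are the embeddings themselves, whereas a real transformer first applies learned projection matrices; and positional information is left out entirely. The names `softmax`, `self_attention`, `TOKENS`, and `EMBEDDINGS` are invented for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings, query_index):
    """Return (weights, context) for the token at query_index.

    Simplification (assumption): the embeddings serve directly as
    queries, keys, and values; no learned projections are applied.
    """
    query = embeddings[query_index]
    dim = len(query)
    scale = math.sqrt(dim)  # scaled dot-product attention
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in embeddings
    ]
    weights = softmax(scores)
    # The context vector is the attention-weighted average of all embeddings.
    context = [
        sum(w * vec[d] for w, vec in zip(weights, embeddings))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical embeddings; the three dimensions loosely stand for
# "animate thing", "inanimate thing", and "function word".
TOKENS = ["the", "dog", "didn't", "jump", "the", "fence",
          "because", "it", "was", "too", "tired"]
EMBEDDINGS = [
    [0.0, 0.0, 1.0],  # the
    [2.0, 0.0, 0.0],  # dog
    [0.0, 0.0, 1.0],  # didn't
    [0.5, 0.5, 0.0],  # jump
    [0.0, 0.0, 1.0],  # the
    [0.0, 2.0, 0.0],  # fence
    [0.0, 0.0, 1.0],  # because
    [1.0, 0.2, 0.2],  # it
    [0.0, 0.0, 1.0],  # was
    [0.0, 0.0, 1.0],  # too
    [1.0, 0.0, 0.0],  # tired
]

weights, context = self_attention(EMBEDDINGS, TOKENS.index("it"))
for token, weight in zip(TOKENS, weights):
    print(f"{token:>8}  {weight:.3f}")
```

With these hand-picked vectors, “dog” receives the largest attention weight for the query “it,” and “fence” receives a much smaller one. In a trained model the same effect would come from learned embeddings and projections rather than from values chosen by hand.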
Progress in deep learning architectures, efficient distributed computation, and training algorithms and methodologies has made it possible to train bigger models. As of the time of writing this article, one of the largest models with a publicly disclosed size is OpenAI’s GPT-3, which consists of 175 billion parameters; parameter information for GPT-4 is not yet available. GPT-3 is also noteworthy because it has “absorbed” one of the largest publicly known quantities of text, reportedly about 45TB of data, in the form of examples of text, much of the public text content of the internet, and other forms of human expression.
While the combined use of techniques like transfer learning, embeddings, and transformers for Generative AI is evolutionary, the impact on how AI systems are built and on adoption by the enterprise is revolutionary. As a result, the race for dominance in foundation models, such as the popular Large Language Models (LLMs), is on, with incumbent companies and startups vying for a winner-take-all or winner-take-most position.
While the capital requirements for foundation models are high (read: billions of dollars), favoring large technology incumbents or extremely well-funded startups, opportunities for disruption by Generative AI are deep and wide across the enterprise.
Understanding the technology stack
To effectively leverage the potential of generative AI, enterprises and entrepreneurs should understand how its technology layers are categorized and what implications each layer has for value creation.
The most basic way to understand the technologies around generative AI is to organize them in a three-layer technology “stack.” At the bottom of this stack are the foundation models, which represent a transformational wave in technology analogous to personal computing or the web. This layer will be dominated by entrenched incumbents such as Microsoft, Google, and Meta, rather than new startup entrants, much as we saw with the mobile revolution and cloud computing. There are two critical reasons for this. First, the scale at which these companies operate and the size of their balance sheets give them a significant advantage. Second, today’s incumbents have cornered the primary resources that fuel foundation models: compute and data.