N.B. This was a difficult essay to write. Not only is it long, but the issues are complex and many of the technical concepts are difficult to explain in accessible terms. I labored to add anecdotes, illustrations, and metaphors to lessen the cognitive load. You, dear reader, are the ultimate judge of my success.
Saying the current artificial intelligence (AI) mania is excessive is a bit like saying the Pacific Ocean is vast and deep. Both statements are vacuously true; the challenge lies in separating the hype from the very real capabilities, the opportunities, and the challenges – whether they be social, economic, or geopolitical.
Most of today’s hype concerns generative AI (computing systems that can create text, images, videos, and even software by generalizing from patterns learned from large amounts of data). Deep learning and generative AI are subsets of the broader field of AI that exploit very large artificial neural networks, systems that crudely mimic the neurons of the brain. Of these, the most popular and successful are those based on transformers, which power systems such as ChatGPT.
Aside: One of my academic colleagues once rather ruefully noted that AI seemed to be defined by the things we did not know how to do. Once a solution was found, it was promptly declared to be an algorithm, not AI. Deep learning may prove to be the exception.
How does generative AI really work? What made it possible? What are its limitations? How is it affecting our present and how will it shape our future? Should we be worried or optimistic?
The answers to some of these hard questions are easy, while the answers to some of the easy questions are quite hard. This essay is a meditation on the landscape of AI – the questions and some of the answers – in eleven parts. It’s just a fancy ramble across philosophy, biology, mathematics, computing, ethics, and geopolitics. As Morpheus says in The Matrix:
You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit hole goes.
Though each of the following parts of this long (twenty-five page) essay can be read individually, they collectively span a range of philosophical, technical, and economic issues that define the current context for AI. As a complement, I offer a pdf (Download HPCDan AI Meditations) should you prefer to read offline and a reading list (Download HPCDan AI Reading List) for additional background.
- Learning and Intelligence
- Cogito, Ergo Sum (I Think, Therefore I Am)
- Strong AI
- Machine Learning Nomenclature
- Generative AI 101
- An Intuitive Rationale for Neural Network Training
- Bigger Is Different: Technology Confluence
- Lessons from Biology
- Culture, Business, Science, and Geopolitics
- AI Hype and Reality
- Maxims for the Future
Aside: To appreciate how deep neural networks and generative AI work in detail, it really helps to have a working knowledge of calculus (particularly derivatives and the chain rule) and linear algebra (i.e., operations on vectors and matrices). If not, you may want to just skim the section entitled Generative AI 101. If you want a quick review, this tutorial may be of value.
The most important thing to know about the current AI revolution is that it involves three things – lots and lots of digital data (images, text, and video) from the web, so-called big data; huge configurations of hardware (fast CPUs and GPUs), mostly operated by cloud providers such as Google and Microsoft; and lots of linear algebra to train and use neural networks. The results are AI systems that can recognize written and spoken questions and output human-like responses composed of text, images, and video. Conceptually, today’s generative AI really is that simple – statistical predictions by really big computers, based on lots of data.
Let’s start with a bit of history and philosophy. Then, we will deconstruct some of the buzzwords surrounding AI, discuss the generative AI revolution enabled by deep learning transformers, and dive deeper into the rabbit hole and the dichotomy between hype and reality.
1. Learning and Intelligence
Anyone who has ever observed a young child exploring his or her surroundings is alternately amused, chagrined, and amazed. Children are innately curious, uninhibited by cultural norms and experience. They learn via a wide variety of mechanisms, both formal and informal – observation, imitation, repetition (e.g., practicing primary school multiplication tables), and experimentation. Children are also deeply dependent on socialization and adult supervision for training and protection from dangers their experience has not conditioned them to avoid.
Although developmental psychology and neuroscience tell us much about the emotional, intellectual, and neural development of young minds, there is even more we do not know. For example, we know that neuroplasticity is high in childhood, allowing children to learn new languages easily, then it declines with age, though it does not entirely disappear as once thought. Nor do we fully understand the effects of environment on learning abilities or the reasons why different children learn in different ways. The nature versus nurture debate (it’s a mix of both) continues to rage.
Nowhere is our limited understanding of brain functions more obvious than in our crude treatments for mental illness and dysfunction. From invasive and damaging surgeries through electrical shocks to drugs whose functions and side effects are poorly understood, we treat the symptoms with only limited efficacy. For all our advances, it is still a crude and limited practice.
Meanwhile, advanced imaging techniques, controlled experiments, and dissection studies, including mapping the connectome of small organisms, are illuminating much about the biochemical and electrical properties of brains, their gross morphology, and the microstructures of neurons and dendrites. Here questions of perceptual models, neural pathways, and the neuroscience of free will intersect. With each advance, our incomplete understanding grows, leaving us still uncertain about fundamental brain functions, albeit confused on a higher intellectual plane than before.
Aside: Illustrating the power of generative AI, the image at right of a neuron and its dendrites was generated by the Stable Diffusion generative AI, using the prompt, “Generate a biological image of several neurons connected by their dendrites and axons, with the neurons in the forefront of the image.”
The European Union spent roughly €600 million and ten years on the recently completed Human Brain Project, which sought to understand brain structure at multiple levels – chemical, electrical, structural, and cognitive – and to build complex computational models. With an initial and wildly overambitious goal of simulating the entire human brain at the cellular level, the project was fraught with controversy over scope and priorities. Not only were our computing capabilities inadequate to the task, so too was our fundamental knowledge.
Perhaps the greatest success of the European Union project was the Human Brain Atlas, which now includes multilevel data, spanning cellular and molecular systems up through functional models and connectivity. The U.S. launched a similar project, the BRAIN Initiative, albeit with more modest goals, as have China, Australia, and South Korea.
Despite this expanding wealth of knowledge, we still lack a first principles understanding of language acquisition, face recognition and image identification, knowledge acquisition and representation, locomotion and fine motor skills, abstract reasoning, problem solving, and many other hallmarks of biological intelligence.
Much like early attempts to create heavier than air flying machines, we have biological existence proofs of intelligence, but we lack the fundamental understanding that would inform either creation or replication of alternate designs. Nor is this particularly surprising. Although birds and insects rely on the same aerodynamic principles as airplanes, their mechanisms – muscles and flapping wings – are not the same as the engines of modern aircraft.
As the Wright brothers demonstrated, an understanding of the basic physics principles of aerodynamics and lift is necessary but insufficient; one also needs an understanding of flight control surfaces and a high power-to-weight engine. Even over a century later, our understanding of fluid flow and turbulence is incomplete, despite steady advances in computational fluid dynamics.
Put another way, successful heavier than air flight was dependent on both theoretical insights and practical engineering. Leonardo da Vinci, for all his genius, was unlikely to have developed an internal combustion engine capable of supporting flight with the engineering capabilities of his day, nor did he have the theoretical understanding of lift and fluid flow needed to design viable flying machines. Newton’s and Leibniz’s discovery of the calculus was still two centuries in the future.
Undoubtedly, we will eventually understand the underlying first principles of biological intelligence, but until then we build models based on our current knowledge and insights. We then continually refine and test those scientific models, comparing them to human experiences and capabilities. Today, some of our best models are simplified mathematical representations of biological neurons and their network of dendrite connections. As simplified models they are necessarily approximations, wrong in many cases, but extraordinarily useful in others.
As the statistician George Box once remarked, “All models are wrong; some models are useful.” As with all models, it is critical to understand their underlying assumptions and their domain of applicability. Absent that, the old aphorism, “garbage in, garbage out” applies.
2. Cogito, Ergo Sum (I Think, Therefore I Am)
Despite the hype and the impressive capabilities of deep learning systems, it is important to realize we are still far from artificial general intelligence – strong AI – a system capable of the entire range of intellectual tasks an educated human adult can perform. Today’s AI systems are a bit like precocious children, alternately impressive, dazzling with their intellectual prowess, and frustratingly limited, suffering at times from a lack of common sense knowledge. Like a child seeking to impress, they also can exaggerate and fabricate, falsely making claims not supported by data; in the argot of AI, these are euphemistically called hallucinations.
What does it mean to be intelligent, to be sentient, to be self-aware, to have a theory of mind (i.e., ascribing mental states to others), to have volition and free will? There are many philosophical, religious, and technical perspectives on existence and thinking, as well as what would constitute an artificial general intelligence, or even how one might unequivocally identify such behavior. For a wonderful romp through computability, intelligence, art, and music, I highly recommend Douglas Hofstadter’s Pulitzer Prize-winning 1979 book, Gödel, Escher, Bach: An Eternal Golden Braid. It will make you think about the connectedness of many things.
Looking much earlier in Western thought and history, René Descartes’ 1637 Discourse on the Method of Rightly Conducting One’s Reason and of Seeking Truth in the Sciences touched on these issues, offering the now famous phrase, cogito, ergo sum (I think, therefore I am). Perspicaciously, Descartes also contemplated animal and machine intelligence, claiming that intelligent language response was beyond the reach of any machine:
For we can easily understand a machine's being constituted so that it can utter words, and even emit some responses to action on it of a corporeal kind, which brings about a change in its organs; for instance, if touched in a particular part it may ask what we wish to say to it; if in another part it may exclaim that it is being hurt, and so on. But it never happens that it arranges its speech in various ways, in order to reply appropriately to everything that may be said in its presence, as even the lowest type of man can do.
Descartes was wrong, as today’s deep learning networks have shown. They can surprise – and on occasion, even dazzle – with their locution and inventive responses.
Since Descartes’ early observations, generations of philosophers and theologians have contemplated the nature of thought. With the World War II rise of modern digital computing systems, mathematicians, engineers, and computing researchers turned their own minds to the formal capabilities of computing machines and operational definitions of thought.
The foundational work by Church, Gödel, and Turing elucidated the mathematical formalisms for computing that still define the field and established the theoretical limits on machine computation. (See The Magic Behind the Curtain: Hardware and Software.) Then, in his seminal 1950 paper, Computing Machinery and Intelligence, Alan Turing wrote
I propose to consider the question ‘Can machines think?’ This should begin with definitions of the meaning of the terms ‘machine’ and ‘think.’
Turing then goes on to describe the original Turing test, an “imitation game” that relies on a human interrogator and a series of questions and responses. If that human interrogator could not distinguish the written answers of a machine from those of a human, then the machine could reasonably be said to think. Alas, this test, though intuitively appealing, has proven too limiting in practice.
Today’s generative AI large language models, such as OpenAI’s GPT-4 or Google’s Bard, can readily pass such a Turing test but would not meet most people’s notion of general intelligence. Conversely, some cynics might note that, though inadequate as a true test of general intelligence, more than a few humans could not pass the Turing test either.
Some may rightly object on philosophical grounds, noting that theory of mind – ascribing mental states and motivations to others via a model of their thinking – is widely viewed as an inherent attribute of intelligent entities. This is a version of Searle’s “Chinese room” argument against functionalism and computationalism.
Functionalism is the thesis that all mental states are constituted solely by their functional roles – their causal relations to other mental states, sensory inputs, and behavioral outputs. Inherent in this thesis is the notion of multiple realizability, that these functional roles could be realized in biology or via other mechanisms (e.g., computers). Turing also addressed several concerns about functionalism in his original paper, and his operational notion is still apt. He also rightly noted that true intelligence is not just a set of task-specific skills but the ability to acquire new skills via learning.
John von Neumann, the namesake of von Neumann computer architecture, contemplated a different angle, the notion of machines capable of reproducing themselves, now called von Neumann replicators. Still others have long debated the theory of embodied cognition, that many attributes of cognition are shaped by aspects of the organism’s body. As a biological matter, this is demonstrably true. Whether a disembodied intelligence, one absent physical manifestation and associated experiences, can exist remains an open question.
Others have debated the meaning of free will and sought to measure its existence. Neuroscience experiments show that there can be intent to act before self-awareness of the action. Is our brain simply building a model of reality? The rabbit hole goes even deeper if one considers the criticality of the observer in many quantum experiments, where the act of observing is required to cause a superposition to collapse.
3. Strong AI
The passion to build an artificial general intelligence – a so-called strong AI capable of any intellectual task that humans can perform – has long been the holy grail of AI. As computer scientist Danny Hillis, then at Thinking Machines, once opined, echoing the dreams of many, “I want to build a machine that will be proud of me.” That captivating dream continues to motivate the work of thousands of computing scholars and the investment of hundreds of billions of dollars.
Nor is it surprising.
Generations of scientists and engineers have spent their professional careers making real the imaginary science fiction artifacts of their childhood – at least those not forbidden by the laws of physics. Due in no small part to that inspiration, today we have cellular telephones, global communication networks, high-definition television, electric vehicles, supercomputers, reusable rockets, and a host of other products that were once real only in the fecund imagination of science fiction writers.
The yearning – and the fear – of strong AI is equally deep and prevalent in popular culture, where the societal implications of such a strong AI have been the subject of countless books and movies. From Asimov’s I, Robot series through Forbidden Planet (arguably one of the best science fiction movies ever made, and a retelling of Shakespeare’s The Tempest), HAL, the emotionally troubled computer in 2001: A Space Odyssey, Colossus: The Forbin Project, The Day the Earth Stood Still, Blade Runner, and, of course, Skynet and the Terminator, the notion of AI has alternately fascinated and terrified generations of movie goers.
More recent movies, such as The Matrix, Moon, Her, and Ex Machina, have explored human relationships with sentient AIs and robots. Many of these cultural speculations center on variations of superintelligence and artificial consciousness, elements of the strong AI hypothesis, namely that an intelligent system with human capabilities would possess consciousness and a mind. (Think of Data on Star Trek.)
Like Turing, I believe in operationalization, the definition of a quantity via measurable, rather than philosophical, attributes, though I realize not all subscribe to such a view. Operationally, a machine whose intellectual capabilities, including appropriate responses to other intelligent entities, are indistinguishable from those of a human is, in every practical and operational sense, intelligent.
Put more colloquially, if it quacks like a duck, walks like a duck, and acts like a duck, then ipso facto, it is a duck. As AI researchers and developers, we are trying to build some smart ducks. If we succeed, we will see if the smart ducks share Hillis’s hoped-for pride in us.
4. Machine Learning Nomenclature
Both the popular press and the research literature are filled with AI buzzwords, and the web contains a variety of introductory tutorials, though most assume a working knowledge of linear algebra, elementary probability and statistics, and calculus.
It is impossible to fully explain generative AI, machine learning, deep learning and artificial neural networks (ANNs) in just a few paragraphs, but I am foolishly going to attempt an act of expository jujutsu by providing a bit of definitional context and rudimentary understanding. We will begin with learning nomenclature, followed by a description of the broad types of artificial neural networks, and conclude with some simple examples.
Broadly, machine learning systems are typically classified as supervised, unsupervised, or reinforcement learning, and it is worth noting that many successful deep learning systems use several of these techniques (and others) together. Let’s start with definitions and intuitions of supervised, unsupervised and reinforcement learning, then we will explain the use of large artificial neural networks and generative AI.
In supervised learning, the answers are known, and the AI system is trained to recognize the correct answers. For example, if one were training an AI system to recognize dogs and cats, it would be trained with a set of images that are labeled as dogs and cats, and its output (the classification) would be one of these categories.
The learning is supervised – with feedback on mistakes – just as a teacher might gently correct a young child learning to recognize animals in books and photographs. In addition to classification (i.e., categorization), supervised learning can also produce a numerical value (e.g., predicting the likely distances to objects in photographs). This is called regression and can be viewed as a form of function approximation.
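For the programmatically inclined, here is a minimal Python sketch of the classification versus regression distinction, using scikit-learn. The features, labels, and distances are made-up placeholders, not a real dogs-and-cats or photographic pipeline.

```python
# A minimal sketch of supervised learning with scikit-learn.
# The "dog vs. cat" features below are purely hypothetical stand-ins
# for whatever a real system would extract from labeled images.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: labeled examples (features -> category).
X_train = np.array([[0.9, 0.2], [0.8, 0.3], [0.1, 0.9], [0.2, 0.8]])  # made-up features
y_train = np.array([0, 0, 1, 1])                 # 0 = "dog", 1 = "cat" (the known answers)
classifier = LogisticRegression().fit(X_train, y_train)
print(classifier.predict([[0.85, 0.25]]))        # -> predicted category

# Regression: labeled examples (features -> numeric value).
X_size = np.array([[1.0], [2.0], [3.0], [4.0]])  # e.g., apparent size of an object
y_dist = np.array([10.0, 5.0, 3.3, 2.5])         # e.g., the known distance to it
regressor = LinearRegression().fit(X_size, y_dist)
print(regressor.predict([[2.5]]))                # -> predicted distance (function approximation)
```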
Conversely, in unsupervised learning, the answers are not necessarily known, and the AI system must learn the patterns (labels) from unlabeled data. Learning this way is obviously harder, just as it would be for humans asked to identify important patterns in large volumes of data. There is no immediate feedback on correctness or even if the correct categories have been identified. Continuing our analogy, the important things might be the trees, the color of the birds, or whether the dogs and cats are standing or not.
Aside. As a philosophical aside, inductive reasoning (i.e., learning a function or a physical law from examples) is one of the ways science works. It is the crux of the experimental method. The other, of course, is deductive reasoning, where one derives potential (testable) hypotheses from known laws. Intuitively, machine learning (inductive) and algorithms (deductive) are the two computing manifestations of these ideas.
When AI methods first began to gain traction in science, there was great resistance in the scientific community, one long trained in deductive reasoning and the power of computational models, themselves based on tested theories. However, as the value of deep neural networks became increasingly obvious – by allowing exploration of parameter spaces via reduced, more efficient models – and by solving heretofore intractable problems (e.g., protein folding), sentiment began to change. Such is the nature of scientific revolutions – a paradigm shift has now occurred.
Finally, in reinforcement learning, feedback rewards correct responses as the system is trained to learn the best outcomes. For example, when learning to play a game such as chess or Go, there is not always a single “correct” move, just moves that are better or worse. During reinforcement learning, the AI system adjusts its metrics repeatedly to recognize when moves yield good or better outcomes, much as we might reward a child for winning a game or give a dog a treat for learning to roll over on command. Robots are trained this way, and Google’s AlphaGo and its successors (AlphaZero and MuZero) used variants of this approach to defeat the world’s best Go players.
All this is in striking contrast to the custom-built Deep Blue hardware and evaluation algorithms like alpha-beta pruning used by IBM to defeat chess grandmaster Garry Kasparov. Similar, though much simpler, algorithmic approaches were first used by Samuel in the 1950s to play the game of checkers. Most impressively, systems such as MuZero learned entirely via self-play and had no access to the rules of the games, or even tables of known opening moves or endgames; nothing was “hardwired.” After being trained, AlphaGo and its successors were essentially unbeatable, either by other chess computers or Go systems, or by humans. Never again will humans defeat such machines.
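To make the reward-driven idea concrete, here is a toy, tabular Q-learning sketch in Python. It is a vastly simplified cousin of what AlphaGo and MuZero actually do (they combine deep networks with tree search), offered only to illustrate how repeated rewards gradually shape which actions look best; the tiny environment is hypothetical.

```python
# Toy tabular Q-learning: rewards gradually shape the action preferences in Q.
import random

n_states, n_actions = 5, 2                  # a tiny, hypothetical "game"
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2       # learning rate, discount, exploration rate

def step(state, action):
    """Hypothetical environment: action 1 moves toward the goal state, which pays a reward."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(500):
    state = 0
    while state != n_states - 1:
        if random.random() < epsilon:       # explore occasionally
            action = random.randrange(n_actions)
        else:                               # otherwise exploit the current best estimate
            best = max(Q[state])
            action = random.choice([a for a in range(n_actions) if Q[state][a] == best])
        next_state, reward = step(state, action)
        # Reinforcement: nudge the estimate toward reward + discounted future value.
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print(Q)   # action 1 should now look better than action 0 in most states
```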
The increasing difficulty of supervised, unsupervised, and reinforcement learning would come as no surprise to any human teacher who has crafted lesson plans to emphasize context and relationships or to any adult who has fearfully tried to translate driving rules into practical guidance for a teenager eager to drive.
As any autodidact knows, it is easier to learn with a teacher, no matter how much of a committed library denizen one might be. Libraries are a cornucopia of information, but they often lack context and evidentiary interdependence. On this I speak from personal experience, exploring some topics in great depth, but lacking even the awareness of others (See Libraries: Arms Too Short to Feed the Mind.)
Aside: One of my favorite cartoons shows a teacher speaking to a born digital student, books in hand. In the cartoon, the teacher patiently explains, “It’s called reading. It’s how you upload new software into your brain.”
5. Generative AI 101
As noted at the outset, machine learning is a subfield of artificial intelligence that uses algorithms that learn to solve problems by analyzing data and identifying patterns. In turn, deep learning is (in part) a subset of machine learning that leverages artificial neural networks (ANNs) and big data.
Artificial neural networks (ANNs) are modeled on biological neurons, albeit ones simplified in function and with fewer connections. Typically, an ANN consists of an input layer of neurons, multiple “hidden” layers of interconnected neurons, and an output layer. Beginning with the first (input) layer, the nodes (neurons) in each layer transform their inputs and send their outputs to the next layer, culminating with the result of the neural network at the output layer.
The characteristics of an artificial neural network are defined by the features of the artificial neuron (node) – typically the type of transformation applied (the activation function), and the weights associated with the neurons. Activation functions are usually simple non-linear scaling functions of the node input – sigmoid, hyperbolic tangent (tanh), ridge (e.g., ReLU rectifier), radial (e.g., Gaussian), or fold functions.
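For concreteness, here is a short NumPy sketch of a few of these activation functions; the sample inputs are arbitrary.

```python
# Common activation functions, as described above (a minimal NumPy sketch).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes input to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes input to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # "ridge" rectifier: zero below 0, linear above

def gaussian(x):
    return np.exp(-x ** 2)            # a radial activation

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, gaussian):
    print(f.__name__, f(x))
```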
Artificial neural networks with large numbers of layers and nodes – often billions or trillions of parameters – are called deep learning networks. As the name suggests, large language models include very large numbers of parameters and are tailored to recognize and generate natural languages, often using transformers (more on transformers later).
Multilayer Perceptrons. An artificial neural network can be of several types. The simplest is a strictly feedforward neural network, where the flow of information is unidirectional from the input nodes through one or more hidden layers to the output node(s), much like a combinational circuit. These are often called multilayer perceptrons. Conceptually, a perceptron is a very simplified model of a biological neuron. ANNs are often used for pattern recognition or classification.
As an example, the figure above shows a simple feedforward artificial neural network with three inputs, two hidden layers, each with four neurons, and two outputs. In this example, the initial inputs – the input layer – are [x1, x2, x3] (e.g., a linearized vector of pixels from an image if this were an image classifier).
Each node in the hidden layers (the nodes in green) takes the inputs from the previous layer, computes a dot product of those inputs and the node’s weights (plus a bias), then applies the activation function (in this case a sigmoid function) to normalize the value of this dot product to the range [0,1] as the node output. This repeats, layer by layer, until the accuracy of the entire network is computed over the outputs [y1, y2] via a loss function; this simple example uses the mean squared error (MSE). If this network were a classifier, it might instead use a softmax function to identify the probability of each possible output.
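For the programmatically inclined, the forward pass just described might look like the following minimal NumPy sketch. The layer sizes match the example; the weights, biases, inputs, and target outputs are random placeholders, not a trained network.

```python
# Forward pass of the example network: 3 inputs -> two hidden layers of 4 -> 2 outputs.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.random(3)                           # input layer [x1, x2, x3]
W1, b1 = rng.random((4, 3)), rng.random(4)  # hidden layer 1: weights and biases
W2, b2 = rng.random((4, 4)), rng.random(4)  # hidden layer 2
W3, b3 = rng.random((2, 4)), rng.random(2)  # output layer

h1 = sigmoid(W1 @ x + b1)   # each node: dot product of inputs and weights, plus bias
h2 = sigmoid(W2 @ h1 + b2)
y = sigmoid(W3 @ h2 + b3)   # network outputs [y1, y2]

target = np.array([1.0, 0.0])        # desired outputs (hypothetical labels)
mse = np.mean((y - target) ** 2)     # loss function: mean squared error
print(y, mse)
```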
In the simplest case, artificial neural network training consists of repeatedly adjusting the node weights to minimize the loss function – a measure of the difference between the desired neural network output and the actual output. The network is trained with some data, the errors are computed, the weights are adjusted to reduce the error, and the process repeats.
Conceptually, neural network training can be viewed as iterative, stochastic gradient descent (i.e., following the derivative of the loss function) using backpropagation to compute the gradient on each iteration. This involves use of the chain rule from calculus, as the derivatives at the last layer are a function of the derivatives at the previous layer, which continues recursively back to the initial layer (i.e., it propagates backward, hence backpropagation).
Aside: Remember that the chain rule expresses the derivative of composed functions in terms of the derivatives of the functions, meaning the derivative of f(g(x)) is f'(g(x))g'(x).
Finally, if you would like to play with a real, albeit toy, feedforward neural network to see how the weights and errors evolve, I highly recommend this interactive neural network solver. It is a toy network, but it illustrates how real neural network training iteratively adjusts network weights.
Recurrent Neural Networks. Unlike a strictly feedforward ANN, a recurrent neural network (RNN) has feedback (cyclic) information flows, where the output of some nodes can affect subsequent inputs to those same nodes, which means recurrent neural networks have some memory. RNNs are often used for handwriting recognition and speech recognition, where some context (memory) is required to recognize the next character or utterance.
Intuitively, knowing and remembering the previous letters in a word or the previous words in a sentence make it easier to predict the next ones. However, anyone who has sent text messages knows that single word prediction and error correction based on the immediately preceding word is fraught with error. Reducing such errors requires more context (i.e., recognizing more than just the previous word or two), which is the motivation for deeper RNNs and, more recently, attention-driven transformers.
Recurrent neural networks use exactly this idea, with the output of some neurons looping back to become inputs to those same neurons. In this way, RNNs incorporate the notion of memory. As the figure below shows, one can view an RNN as unrolling the behavior of neurons in time to create an even bigger ANN, where the unrolling represents how long something is remembered.
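Here is a minimal NumPy sketch of that recurrence; the weights and the input sequence are random placeholders, but the feedback of the hidden state from one step to the next is the essential idea.

```python
# A minimal recurrent step: the hidden state h carries "memory" from one input
# to the next. Unrolling this loop over time yields the larger feedforward
# network described above. All weights and inputs are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 8, 16

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden -> hidden (the loop)
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                       # initial memory: nothing remembered yet
sequence = [rng.standard_normal(input_size) for _ in range(5)]  # e.g., stand-ins for word embeddings

for x_t in sequence:
    # The previous hidden state feeds back in, so earlier inputs influence later outputs.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h[:4])   # the final hidden state summarizes the whole sequence
```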
The astute reader will have immediately realized that this makes training the neural network even more complicated. Fortunately, the conceptual idea is simple, even if the implementation is more complex. The solution is called backpropagation through time. One simply applies the same chain rule idea, while also including the notion of time, recognizing that the function space is now even more complex, with the greater possibility of multiple local minima and both exploding and vanishing gradients (i.e., there is no obvious and immediate adjustment that would reduce the error of the loss function).
As with ANNs, there are many variants, each RNN variant designed either to address a shortcoming or to accommodate some particular domain. Long short-term memory (LSTM) is one particular type of RNN that attempts to manage the vanishing gradient problem by increasing how long an RNN remembers. Intuitively, LSTMs provide more history and context (e.g., relating later words in a long sentence to words earlier in the sentence).
Convolutional Neural Networks. Finally, convolutional neural networks (CNNs) also exploit context, but in a different way, and they are often used for image classification and natural language processing (e.g., speech recognition and translation). After all, an image is not a random collection of black and white or colored pixels, nor is a sentence a random collection of words; both have structure and objects. There are shapes and edges, as well as light and dark regions, and there are nouns and verbs. CNNs recognize and exploit this structure using, as the name suggests, convolutions – sliding one function (a filter) across another.
A convolution layer in a CNN might use an edge detector (e.g., a Sobel operator) to reduce the size of the image – and by extension the size of the neural network – before passing the reduced image to the next neural network layer. Note that unlike a fully connected ANN, not all neurons are connected in a CNN. Commonly, a CNN would include multiple convolution layers and some pooling layers that combine the outputs of groups of neurons at one layer into a single neuron in the next layer (i.e., pooling the results). At the end, the results are passed to one or more fully connected output layers.
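As a small illustration, here is a NumPy sketch of a single convolution with a Sobel kernel followed by 2x2 max pooling, applied to a tiny synthetic image of my own invention.

```python
# Convolving a tiny "image" with a Sobel edge-detection kernel, then 2x2 max
# pooling -- the two CNN building blocks described above.
import numpy as np

image = np.zeros((6, 6))
image[:, 3:] = 1.0                      # a simple vertical edge

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)   # responds strongly to vertical edges

# Valid convolution (really cross-correlation, as in most CNN frameworks).
out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * sobel_x)

# 2x2 max pooling: keep the strongest response in each neighborhood.
pooled = out.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(out)
print(pooled)   # a smaller feature map, highlighting where the edge is
```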
Intuitively, a CNN seeks to raise the abstraction layer from pixels or words to features and objects. The human visual system does the same; it discards the overwhelming volume of image data, focusing attention on features in the visual field. We also recognize the important words in sentences and their relationships. The important features, of course, depend on context.
During training, a neural network may identify the “wrong” features as important, including recognizing pixel patterns that are invisible to the human eye. There is an old story of one system trained to recognize automobiles, only to fail badly when tested with new images of automobiles. Many of the training set images included clouds in the image backgrounds, and the AI system had identified the clouds as the key feature in the data set. It later failed because the images of automobiles had no clouds.
As even this cursory tutorial suggests, there are seemingly endless variations of ANNs, RNNs, and CNNs, specialized to particular functions or tasks. They differ primarily in the shape of the network, the activation functions, and training methods. Two of the most popular variants of late have been generative adversarial networks (GANs) and transformers.
Generative Adversarial Networks. Generative adversarial networks (GANs) frame machine learning as unsupervised (self-supervised) learning with two learning sub-models. A generative model is trained to generate new outputs (e.g., synthetic images of dogs), while a discriminative model seeks to classify the outputs of the generative model as fake (not realistic) or credible (e.g., look like dogs). The two models are trained concurrently until the discriminative model cannot reliably distinguish real from fake.
GANs have been widely used to synthesize artificial faces or videos that could pass as actual photographs or videos. One of their more pernicious uses has been to create deep fakes, mapping real human faces or bodies into non-existent contexts. As such, they have been highlighted for their potential dangers, including generating fake news, whether text, imagery, or video, using the persona of public figures, or creating deepfake pornography using the faces of famous individuals.
Transformers Are Revolutionary. It is no exaggeration to say that the 2017 Google paper, Attention Is All You Need, revolutionized deep learning. The transformer architecture is quickly replacing CNNs and RNNs in many (though not all) contexts, bringing the advantages of both while ameliorating some of their disadvantages. Reflecting the pervasive influence of transformers, ChatGPT is an acronym for Chat Generative Pre-Trained Transformer.
Originally, transformers were designed for natural language translation. As such, they consisted of two parts. The first was an encoder that mapped an input sequence in the input language into a sequence of vector embeddings that represented encoded content of the input message. Following this, a decoder used the encoded representation to generate the translation in the output language. Attention mechanisms are a critical component of both the encoder and the decoder, measuring the relative importance of each part of the input sentence to every other part.
Based on the notion of attention, transformers make connections between separated elements of input data (e.g., words in a sentence that are logically connected) in ways difficult for other neural networks. This attention makes it possible, for example, to distinguish the two different references of “it” in this sentence pair, something humans do intuitively:
The dog ate the meat because it was hungry.
The dog ate the meat because it was tasty.
In the first case, “it” refers to the dog, whereas in the second, “it” refers to the meat. In both of these cases, “it” is separated from its backward reference, albeit by varying distances.
N.B. I am grateful to my friend Dennis Gannon for portions of the transformer description below.
The attention mechanism output can be viewed as a matrix, where each row represents a word in the input and the entries in that row represent the relative importance of every other word to that word. To capture the greatest possible context about the words in the input sentence, the sentence is fed to several (four or more) copies of the attention mechanism in parallel, so-called multiheaded attention. When optimally trained, the copies will each capture different short- and long-range dependencies in the input terms. The outputs of the attention blocks are then concatenated. Finally, an encoder is built from a stack of six or more multiheaded attention blocks.
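A minimal, single-head sketch of scaled dot-product attention (in NumPy, with random placeholder embeddings and projection matrices standing in for learned values) looks like this:

```python
# Scaled dot-product attention (single head), following "Attention Is All You Need".
# Embeddings and projection matrices are random placeholders, not trained values.
import numpy as np

rng = np.random.default_rng(2)
n_words, d_model = 8, 16        # e.g., the eight words of "The dog ate the meat because it was hungry"

X = rng.standard_normal((n_words, d_model))          # embeddings of the input words
W_q = rng.standard_normal((d_model, d_model)) * 0.1  # query, key, value projections
W_k = rng.standard_normal((d_model, d_model)) * 0.1
W_v = rng.standard_normal((d_model, d_model)) * 0.1

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_model)      # how much each word attends to every other word

# Softmax over each row: this is the attention matrix described above.
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

output = weights @ V                     # each word's new, context-aware representation
print(weights.shape, output.shape)       # (8, 8) attention matrix, (8, 16) outputs
```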
With a stack of multiheaded attention blocks, a decoder is similar to an encoder, but with one modification. The attention matrix is masked so that each word is only compared to the words that precede it, and the decoder produces the probability of the next word in the output. The figure below, from the original transformers paper, shows how all these pieces fit together.
For many language generation and classification tasks it is not necessary to have both an encoder and decoder. BERT, the first major large language model, contained only an encoder. The most recent large language models (e.g., GPT-3, GPT-4, PaLM, and LLaMA) are all based on a decoder-only architecture. These models also exhibit “zero-shot” learning on different tasks, meaning they do not need to be retrained to perform a new skill – it is all in the prompt.
Like the late night infomercial, wait, there’s more! Transformers are also an enabling part of stable diffusion, a technique that exploits the language processing capabilities of transformers to translate the text into a numeric representation. This then feeds a diffusion model that iteratively denoises a random input to create a photorealistic image of what was described by the textual input. It seems like magic, but it is not; give it a try here.
It's Linear Algebra All the Way Down. A careful examination of how neural network weights are computed shows that they largely involve computing dot products, producing a scalar by multiplying two vectors – the inputs from nodes in the previous layer and the node weights. Stepping back even further shows that computing all the outputs of a layer actually involves matrix multiplications, each of which consists of many dot products of extraordinarily large, sparse matrices – large because of the billions of parameters, the neural network weights. Moreover, these matrices are getting even bigger, with multi-trillion parameter models now being trained.
Wonderful you say, we have 64-bit IEEE floating point vector instructions in all of today’s microprocessors, and we have GPUs designed for matrix-matrix operations, originally for the shading and graphics vector transforms needed for computer gaming and more recently used to accelerate scientific computations. Yes, that’s good, but we can do even better – compute them faster and more energy efficiently – if we examine the numerical properties of the matrices.
In practice, these operations do not need either a great deal of numerical precision (i.e., lots of bits in the floating point number mantissa) or lots of range in the floating point exponent. In fact, today’s GPUs designed for machine learning, and many of the custom AI accelerators, use only 16- or 8-bit floating point numbers – so-called bfloat16 and FP8 formats.
As an aside, this also has important implications for scientific computing, which has long depended on 64 and 128 bit floating point arithmetic. If all the money invested in research and development for AI shifts the hardware landscape toward lower precision – and it already has based on the number of GPUs and AI accelerators installed – it will require a rethinking of many numerical algorithms and an assessment of their numerical stability in this new regime.
And, It’s Expensive. Do we have a deep and complete theoretical understanding of why these neural network approaches work? Simply put, no. We have only experimental evidence, some rules of thumb, and some validated intuition. The lack of such a theory is part of the reason the pace of experimentation is so rapid, and the competition is so intense. More broadly, a deeper model would illuminate explainable AI (i.e., how deep neural networks generate predictions and their veracity and validity.)
Alas, no exponential is forever, except in mathematics textbooks. There are practical economic limits on how large these large models can grow. Today’s scale, with trillions of parameters, months of training, and billions of dollars of infrastructure cannot increase indefinitely. Though there is still headroom, research is already turning to more energy efficient architectures (e.g., analog, neuromorphic chips that can implement training and inference at much lower power) and to smaller language models that are trained on a higher quality corpus of data than that available solely on the web. As two notable examples, IBM has pioneered fabrication of multiple generations of neuromorphic chips, and Microsoft has released two research versions of its phi natural language and common sense reasoning systems.
Make no mistake, building deep learning systems that are more capable than a primary school student still requires months of training on some of the fastest computers in the world, supported by billions of dollars in computer hardware, with deep social, economic, and geopolitical implications for the future. It is not easy; otherwise, enterprising secondary school students would be building a HAL 9000 as their science fair project. Nevertheless, there is hope that technical advances will change this reality.
6. An Intuitive Rationale for Neural Network Training
Dan, you say, you have just used a bunch of fancy words, invented by Ph.D.s to describe obscure and arcane concepts. You’d be right. We need a cool, refreshing example. How exactly does one adjust all those neural network weights to minimize the error of the predictions?
Suppose you are standing on a hill on the edge of a large city, one beside the ocean. You cannot see the ocean, but you want to walk to the shore, which means – wait for it – you need to go downhill. So, what do you do? You pick the steepest downhill slope and start walking. At each street intersection, you again pick the steepest downhill slope and turn that way. By always going downhill via the steepest route, you will get there as fast as possible, as long as there are no intervening hills. A gradient is a fancy name for the slope – the derivative – of a function in multiple dimensions.
Gradient descent for backpropagation works exactly like walking downhill to find the ocean. It iteratively computes new weights for the neural network, starting from the right and computing backward (left) until it reaches the inputs. Its goal is to update the weights in ways that reduce the error of the loss function (i.e., the computed error in the output) and improve the predictions. To minimize a multivariate function (and any useful neural network has large numbers of variables – the weights), gradient descent uses the slope of the function, its derivative, and changes the function parameters (the neural network weights) accordingly to reduce the loss function.
Anyone who has tried to walk downhill in a city knows there are lots of choices. How long do you go in one direction before pausing to reconsider the path? In machine learning, this is called the step length or learning rate. What if it looks flat in every direction and you can no longer follow a downhill path and you have to guess? This is called the vanishing gradient problem in machine learning. What if it is uphill in all directions and you know you have to climb some hills before you can go down again? In this case, the solution space is non-convex (i.e., the function may have multiple local minima).
For functions with trillions of parameters, computing the function’s derivative exactly is – no surprise – computationally expensive. To simplify the problem, we approximate the gradient; this is the stochastic element. Continuing the city and ocean analogy, when you look at the streets and pick the steepest slope, you are relying on a visual estimate, an eyeball guess, not a strict survey measurement.
Finally, backpropagation is the mechanism for computing the estimated derivative. It relies on the Leibniz chain rule from calculus, where the approximate derivative (the gradient) is iteratively computed backward, from the output layer through the hidden layers to the input layer. The astute reader will have noted that true gradient descent assumes the function is convex (i.e., it has a single global minimum).
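Putting the pieces together, here is a minimal gradient descent and backpropagation loop in Python for a tiny one-hidden-layer network. The toy task (learning XOR), the layer sizes, and the learning rate are illustrative choices of mine, not a recipe.

```python
# A minimal gradient-descent/backpropagation loop for a tiny one-hidden-layer
# network, trained on a toy problem (learning XOR).
import numpy as np

rng = np.random.default_rng(3)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR targets

W1, b1 = rng.standard_normal((2, 8)), np.zeros(8)     # hidden layer of 8 nodes
W2, b2 = rng.standard_normal((8, 1)), np.zeros(1)     # single output node
lr = 2.0                                              # the "step length" (learning rate)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(20000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    loss = np.mean((out - y) ** 2)                    # the "altitude" we walk downhill on

    # Backward pass: the chain rule, applied layer by layer from output to input.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # Take a step downhill, against the gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(loss, out.ravel())   # loss should be small; outputs should approach [0, 1, 1, 0]
```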
Let’s recap. Scale makes a quantitative and qualitative difference. CNNs help with image classification by reducing images to a progressively smaller number of higher-level features. RNNs introduce memory, allowing them to exploit word context rather than just word-by-word processing. Transformers generalize this idea by attending to more distant word context and looking at groups of words together. CNNs, RNNs, and transformers all depend on deep neural networks, built on the basic multilayer perceptron.
7. Bigger Is Different: Technology Confluence
The thoughtful reader may rightly be asking what triggered the current AI revolution. After all, neural networks are not a new idea, dating in various forms from the 1950s, beginning with Rosenblatt’s perceptron, an early, single layer neural network with linear activation functions.
Exorbitant claims of perceptron capabilities in vision, speech, natural language processing and translation, and even consciousness, were soon dashed. Then, Minsky and Papert’s 1969 book, Perceptrons, famously and controversially showed that single layer perceptrons could not compute (learn) some common functions. In retrospect, this result, though true, was overly negative. Nevertheless, the response helped trigger the first AI winter, a rapid decline in research funding and interest in AI.
Since then, a variety of universal approximation theorems have shown that artificial neural networks with even one hidden layer (i.e., multilayer perceptrons (MLPs)) with enough hidden units can approximate any continuous function for inputs within a specific range. This is a very powerful result, which provides the theoretical basis for the generality and usefulness of today’s large artificial neural networks.
A second surge of AI interest in the 1970s and 1980s, built on qualitative reasoning and expert systems, ended in a second AI winter, triggered by over-promised capabilities, the failure of DARPA investments to deliver the promised military capabilities, and the related demise of the U.S. Strategic Computing Initiative and the Japanese Fifth Generation computing project, both of which had again promised human-like capabilities.
After over a half century of struggles, frustration, and despair, what has suddenly made machine learning via neural networks all the rage? In a word, scale, massive scale.
That scale includes massive amounts of imagery, video, and text on the web, itself a result of 1980s and 1990s government investments in supercomputing, markup languages, search engines, and the information superhighway, together with the emergence of powerful, yet inexpensive workstations and PCs. When combined with the truly unprecedented size and economics of commercial clouds, it was finally possible to build and train extraordinarily large neural networks for the first time.
Recognizing the trillion dollar economic opportunities, the leading cloud providers (Amazon, Microsoft, and Google) have each invested tens of billions of dollars in hardware to support AI training (i.e., determining the weights for the neural network that minimize the error) and inference (i.e., application of the trained neural network to generate outputs in response to queries). With market capitalizations near to or in excess of one trillion dollars (U.S.) and large amounts of cash on hand, these companies dominate the AI ecosystem.
The training hardware consists of massive CPU clusters, large numbers of GPU accelerators (typically from NVIDIA), and custom-designed AI accelerator hardware (e.g., Google’s Tensor Processing Units (TPUs) or Amazon’s Inferentia). It has been estimated that training GPT-4, with its hundreds of billions of parameters, consumed a substantial fraction of a calendar year, while the underlying hardware operated at (lower precision) exascale (i.e., 10^18 operations per second), a performance level only recently achieved in scientific computing. For more background on the rapidly growing costs and scale of AI training, I highly recommend this arXiv paper.
It is almost impossible to over-estimate the volume of data used in deep neural network training. A big and increasing part of the human knowledge base is digital and now available on the web, and it is being scraped and used for neural network training. Imagine using billions of images and more digital text than all of humanity could read in a lifetime. All that big data – and more – is input to the insatiable maw of machine learning. In fact, there are some worries that these training systems may exhaust the world’s supply of digital data.
Aside: Having spent most of my professional career working in supercomputing and computational science, as a researcher, as director of the National Center for Supercomputing Applications (NCSA), and as a science and technology policy advisor, I always viewed myself as a “big iron” guy. When I moved to Microsoft, I realized I was actually just a “fast iron” guy; the scope and scale of the cloud now dwarfed that of government-funded supercomputing. With today’s investments in massive computing infrastructure, the cloud and AI vendors are now both big iron and fast iron players.
So, what is the advantage of massive scale? After all, computers and networks have been getting faster and cheaper and data has been growing larger for decades. Interestingly, it turns out that bigger artificial networks are not just quantitatively bigger, they are also qualitatively better. Together with algorithmic advances (e.g., transformers, GANs, autoencoders, and a host of other software approaches), once artificial neural networks cross a certain size threshold, their performance on a range of human tasks jumps dramatically.
Aside: I have used ChatGPT to write code for user interfaces using GTK, create multiclass queueing network models, develop PDE solvers, and critique my own writing. Is it perfect? No, but the time saved more than compensates for the occasional bug. Self-referentially, ChatGPT will also help you generate machine learning code using Keras. I am an amateur astronomer and astrophotographer, so I asked ChatGPT to generate code to recognize images of spiral galaxies. In response, ChatGPT quickly generated a parameterized CNN with hints about how to complete the parametrization.
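For illustration, a parameterized Keras CNN skeleton of the sort ChatGPT suggests might look like the sketch below; the input shape, filter counts, and other hyperparameters are placeholders of my own, not ChatGPT’s actual output, and real use would require a labeled set of galaxy images.

```python
# A parameterized Keras CNN skeleton for image classification (e.g., spiral
# galaxy vs. not). Hyperparameters are placeholders, not recommendations.
from tensorflow import keras
from tensorflow.keras import layers

def build_classifier(input_shape=(128, 128, 3), num_filters=(32, 64, 128), num_classes=2):
    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    for filters in num_filters:
        model.add(layers.Conv2D(filters, kernel_size=3, activation="relu", padding="same"))
        model.add(layers.MaxPooling2D(pool_size=2))      # pooling: shrink the feature maps
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))   # class probabilities
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_classifier()
model.summary()
# model.fit(train_images, train_labels, epochs=10)   # requires a labeled image dataset
```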
In many other cases, deep learning tools substantially exceed human capabilities in speech recognition, handwriting recognition, reading comprehension, language understanding, computer vision, music composition, and game playing (e.g., chess and Go). Deep learning systems have also proven adept at common professional examinations; they perform as well as or better than humans on licensure tests in law and medicine, and they score well on college and graduate school admissions examinations. Finally, at the time of this writing (September 2023), the Museum of Modern Art is hosting an interactive exhibit of a neural network trained on the museum’s collections. Unsupervised, the generative art composition of Refik Anadol is both impressive and mesmerizing.
Do these systems meet the definition of artificial general intelligence? Absolutely not! They can be brittle, with sometimes laughable outputs outside their training domains. At times they also exhibit hallucinations – confidently asserting statements not supported by facts. However, for the first time, AI systems are exhibiting impressive and practical capabilities in a wide range of intellectually and economically valuable domains.
8. Lessons from Biology
The connectome – the map of nervous system connections – of the small worm Caenorhabditis elegans (C. elegans), a frequent subject of laboratory studies, was manually assembled in 1986 in Brenner’s laboratory; see this classic paper for details. Creating the connectome required identifying individual neurons in electron microscope images of microtome cross-sections and then manually and laboriously connecting them to create a logical map of the approximately 7,000 connections among the 302 neurons of the adult hermaphrodite.
The 302 neurons of C. elegans are either sensory, motor (controlling muscles) or interneurons that connect to the other two. Leveraging the connectome, an active community of modelers, OpenWorm, is now seeking to simulate all 959 cells of hermaphrodite C. elegans. This work has initially focused on simulating the 95 muscle cells and the 302 neurons of the connectome. Put another way, they are seeking to bring a digital model of the worm to life, all within a computer.
In 2023, a team mapped the complete connectome of the larval brain of Drosophila melanogaster, the common fruit fly. Like C. elegans, Drosophila has long been used as a model organism for biological studies, but it is a much more complex organism than the simple worm. The larval Drosophila brain contains 3,016 neurons and ~548,000 synaptic sites, while the adult fly brain has 125,000-150,000 neurons and tens of millions of synapses. By comparison, the human brain is estimated to contain roughly 86 billion neurons and approximately 100 trillion synapses.
Unlike the manual connectome mapping of C. elegans, the Drosophila larval connectome mapping exploited deep learning models to identify neurons and connections, greatly accelerating the process.
Aside: One can explore the Drosophila connectome via the Virtual Fly Brain website.
Although the Drosophila connectome is small compared to that of a human, the fly is capable of a wide range of complex behaviors, including learning. Its vision system can detect and avoid obstacles, and it can walk and fly. Like all biological organisms, it feeds, and it reproduces. In short, it is an autonomous, goal-seeking entity.
Using these connectomes, multiple groups have built computer models of the worm and fruit fly nervous system. Unlike the simplified neurons in artificial neural networks, accurate models of biological neurons are much more complex, with neurons transmitting information only when the membrane potential reaches a specific threshold value.
Neuron models that fire when a threshold is reached are called spiking neuron models, usually a leaky integrate-and-fire model or some variant. Unlike the linear algebra underlying the neurons in artificial neural networks, these activations are best modeled as differential equations.
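A minimal leaky integrate-and-fire sketch, integrated with forward Euler in Python, illustrates the idea: the membrane potential integrates input current, leaks toward rest, and emits a spike (then resets) when it crosses threshold. The parameter values below are illustrative only.

```python
# Leaky integrate-and-fire neuron, integrated with forward Euler.
# Parameter values are illustrative, not fitted to any real neuron.
dt = 0.1          # ms, time step
tau = 10.0        # ms, membrane time constant (the "leak")
v_rest, v_reset, v_threshold = -65.0, -65.0, -50.0   # mV
R = 10.0          # membrane resistance (arbitrary units)

v = v_rest
spikes = []
for step in range(2000):
    t = step * dt
    I = 2.0 if 50.0 <= t <= 150.0 else 0.0     # inject current for 100 ms
    # dv/dt = (-(v - v_rest) + R*I) / tau
    v += dt * (-(v - v_rest) + R * I) / tau
    if v >= v_threshold:                        # threshold crossed: spike and reset
        spikes.append(t)
        v = v_reset

print(f"{len(spikes)} spikes, first at t = {spikes[0] if spikes else None} ms")
```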
The early results of these computational models (see Shiu et al, for example) are fascinating, because they show it is possible to generate testable hypotheses about neural network behavior. Put another way, stimulating the simulated network with sensory inputs shows activation in areas of the neural network associated with physiological functions. For example, stimulating sugar sensing neurons in Drosophila triggers the neurons that would respond to taste. This is extraordinary, as it shows we are edging closer to understanding how biological neural networks generate complex behaviors.
What do these natural neural networks and their computational models teach us? First, although they are capable of complex behavior and response, biological neural networks operate at extremely low power. A human brain consumes roughly 20 watts, which, though tiny compared to the megawatts required to train artificial neural networks, is still a significant portion of the body’s total energy demand. A worm or fly brain’s energy consumption is vanishingly smaller.
Second, natural neurons, like artificial ones, are highly connected, but natural neurons are not connected in clearly defined layers. For example, human neurons may have a couple of thousand connections, not all of which are local. Finally, dependent on electrical and biochemical reactions, natural neurons are analog, rather than digital, and fire only a few times per second, at most.
As we noted when discussing the evolution of heavier than air flight, there is no obvious requirement that the engineering design of a comparable artificial neural network need follow these same implementation principles. They do suggest, however, that we are missing some fundamental insights that would enable us to design more energy efficient, yet more powerful reasoning systems than today’s generative AI systems. This realization has triggered work on so-called neuromorphic computing chips that model aspects of biological neurons in silicon. Many of these design questions are deeply intertwined with the future of semiconductor designs.
9. Culture, Business, Science, and Geopolitics
Generative AI is changing how we conduct science and engineering research, educate and train the next generation of students, pursue business and commerce, and even how we consider our defense and national security posture. Below, I outline just a few examples.
Science and Engineering. Just as computational science – the use of advanced computing to model the world around us – joined theory and experiment as an essential element of the scientific process, machine learning is rapidly becoming another essential element of science and engineering. For much of its history, science was data limited, with each new advance requiring a repeated cycle of careful experiments and manual data analysis.
With massive new scientific instruments – telescopes, genetic sequencing systems, environmental monitors, and a host of others – producing ever larger volumes of experimental data, it is now impossible to process that data deluge manually. In an era of big data, automated data analysis is de rigueur, and the same machine learning techniques used by businesses and social media are being exploited to identify galaxies in whole sky surveys, detect signals from high energy physics particle accelerators, compare genetic data and identify biological relationships, and analyze environmental data.
Similarly, AI models are advancing computational science. Building on deep learning techniques from the same lab that mastered chess and Go, DeepMind’s AlphaFold accurately predicts the three-dimensional structure of proteins from their amino acid sequences, a protein folding problem that had defied solution via traditional molecular modeling. The resulting database of protein structures is now reshaping many areas of biological research.
Increasingly, computational scientists are also building hybrid models that replace the core partial differential equation (PDE) solver with a trained machine learning model that produces equivalent outputs. There are now thousands of such examples across the breadth and depth of science and engineering, and new ones appear daily. For instance, Nature recently published a neural network approach to numerical weather prediction that is more accurate than today’s operational forecasting models.
Because the trained model is usually much faster (10,000X) than a traditional PDE solver, the hybrid model can be used to explore the parameter space much more broadly. As always – caveat emptor – one must ensure the model is not stressed beyond its domain of applicability. (See HPC, Big Data, and the Peloponnesian War.)
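To illustrate the surrogate idea, here is a minimal sketch, assuming a toy analytic stand-in for the expensive solver and an arbitrary small Keras architecture; a production surrogate would be trained on actual solver runs and validated against held-out simulations before being trusted anywhere near the edge of its training domain.

```python
import numpy as np
from tensorflow import keras

# Toy stand-in for an expensive PDE solver: maps a parameter vector to a
# scalar quantity of interest. A real solver run would go here instead.
def expensive_solver(params):
    x, y = params
    return np.sin(3 * x) * np.cos(2 * y) + 0.1 * x * y

# Build a training set by sampling the parameter space and running the "solver."
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(5000, 2))
y = np.array([expensive_solver(p) for p in X])

# Small fully connected surrogate; the architecture is an arbitrary illustrative choice.
surrogate = keras.Sequential([
    keras.layers.Dense(64, activation="tanh", input_shape=(2,)),
    keras.layers.Dense(64, activation="tanh"),
    keras.layers.Dense(1),
])
surrogate.compile(optimizer="adam", loss="mse")
surrogate.fit(X, y, epochs=20, batch_size=64, verbose=0)

# Once trained, the surrogate evaluates almost instantly, enabling broad
# parameter sweeps -- provided the inputs stay inside the sampled domain.
print(surrogate.predict(np.array([[0.2, -0.4]])))
```

The speedup comes from replacing an iterative numerical solve with a single forward pass; the caveat emptor above applies because the network knows nothing about parameter regions it never saw during training.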
Aside: Thirty years ago, I was developing fuzzy logic models for performance optimization and computational steering of parallel scientific computing codes. Today, I would undoubtedly be using trained neural networks for system tuning and performance analysis.
Higher Education. Higher education has publicly wrung its hands over generative AI, focusing most often on concerns about students cheating, rather than, in my judgment, asking the more important intellectual questions. Namely, how will this promising new technology expand access and improve the quality of public education, while also empowering new approaches to research and innovation?
Instead, I believe we should embrace what Steve Jobs called the “bicycle for the mind,” using AI to amplify human intellectual capabilities and help a larger fraction of the population overcome the limitations of our inadequate primary and secondary education system. (See A {Racing} Bicycle for the Mind: Helmet Recommended.) This means each institution finding a niche that exploits its character, whether leveraging existing AI technology, tailoring it, or building its own.
If you once believed learning Latin in high school was an effective way to teach logic, culture, English word origins, and language structure, then surely basic data literacy and familiarity with the rudiments of machine learning need to be part of everyone’s 21st century education. This means understanding how AI can best reduce or eliminate intellectual drudgery and complement the creative process, and it means educating students about AI’s strengths and limitations. As an early experiment, one on which I am reserving judgment, the Khan Academy has introduced Khanmigo, an AI assistant based on GPT-4.
As the philosopher Plutarch once wrote, “A mind is a fire to be lit, not a vessel to be filled.” (See Academia: Who We Are and Why It Matters.) I believe AI offers a fresh opportunity to create customized and patient tutors that can serve as teacher surrogates, while bringing excitement to student discovery and education. Much like the early computing vision of the University of Illinois’ PLATO computer-aided instruction system, we can build new ways of expanding access to education. (See Mind to Mind: Building Innovation.)
Aside: I recently told a group of academic deans that generative AI models such as ChatGPT now write better than most of our undergraduates ever will, and that an increasing fraction of day-to-day research activities (e.g., research paper summarization and data characterization) are being subsumed as well. Yes, I used a bit of hyperbole, but the reality of AI advances cannot be ignored.
Economics and National Competitiveness. If there is any lesson to be learned from the history of technological change, it is that in the near term we tend to overestimate the impact of change, while underestimating its long term effects. (See Predicting Our Technological Future.) Despite the AI hype, the economic and national competitiveness impacts are real and growing.
Although pundits differ on the precise numbers, there is little doubt that AI will affect a substantial fraction of 21st century jobs in some way, transforming some, eliminating others, and creating new ones. For example, McKinsey estimates the economic potential of generative AI at between $2.6 trillion and $4.4 trillion annually across a selected set of use cases. That’s trillion with a T, not billion with a B.
We once smugly assumed the ability of white-collar workers to complete non-routine cognitive tasks made them impervious to automation. It is increasingly clear that we were wrong; many of those jobs are at risk of being reshaped at a minimum and, in many cases, of being eliminated. These changes place an even greater premium on rare or unique cognitive skills, just as manufacturing automation eliminated low-level jobs while rewarding boutique craft skills.
Meanwhile, the software base for deep learning is rich and diverse, and it is growing daily, creating new economic and intellectual opportunities for those with the skills to exploit them. It is now possible for a high school student to solve problems, with less than a page of Python and Keras code (itself a library built atop TensorFlow), that once bedeviled the best artificial intelligence researchers on the planet. Sites such as GitHub and Hugging Face now host large numbers of training data sets and models as well.
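To give a flavor of how little code that takes, the sketch below trains a handwritten digit classifier in Keras, roughly the kind of problem that once consumed entire research careers; the layer sizes and epoch count are illustrative defaults rather than tuned choices.

```python
from tensorflow import keras

# Load the MNIST handwritten digit dataset (bundled with Keras) and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small fully connected network; layer sizes are arbitrary illustrative choices.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)
print(model.evaluate(x_test, y_test, verbose=0))  # typically ~98% test accuracy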
From text-based services such as ChatGPT, Bing, and Bard, through image and music synthesis services such as DALL-E 2, Midjourney, Stable Diffusion, and Jukebox, to technical services such as GitHub Copilot (software generation), Runway (video generation), WriteSonic (writing), automated journalism, and Jasper and copy.ai (marketing), plus a seemingly infinite list of domain-specific AI tools, every business domain is seeing new instances of AI-infused services and technologies. AI is even invading entertainment, including standup comedy. Concurrently, a new discipline of prompt engineering is growing rapidly, in which practitioners assign roles and frame context before posing questions to generative AIs.
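To illustrate what framing context means in practice, here is a minimal sketch of the role-plus-context pattern prompt engineers use; the message structure mirrors common chat-style APIs, and send_to_model is a hypothetical placeholder rather than any real library call.

```python
# A typical prompt-engineering pattern: assign the model a role, supply
# context and constraints, then ask the actual question.
# send_to_model() is a hypothetical placeholder, not a real API.
prompt = [
    {"role": "system",
     "content": "You are a patient tutor for first-year calculus students. "
                "Explain ideas step by step and avoid giving final answers outright."},
    {"role": "user",
     "content": "Context: the student has seen limits but not derivatives.\n"
                "Task: introduce the derivative of f(x) = x**2 using the limit definition."},
]

def send_to_model(messages):
    """Hypothetical stand-in for a call to a generative AI service."""
    raise NotImplementedError("Replace with the API client of your chosen provider.")

# response = send_to_model(prompt)
```

The value lies in the framing: the same question posed without the role and context typically yields a generic answer, which is precisely why this kind of prompt design has become a marketable skill.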
Aside: In addition to the tens of billions of dollars being spent by the hyperscalers (i.e., the big cloud computing vendors), the startup market is churning. That swooshing sound you hear is the rush of venture capitalist investments into AI startups of all kinds. Much like the earlier software as a service (SaaS) and cryptocurrency booms, the smart (and the not so smart) money is rushing into AI startups – software and tools, applications, and hardware – and the AI unicorns are real. Notable examples of AI infrastructure startups include Cerebras (wafer-scale AI hardware), Graphcore, Groq, and SambaNova.
Ethics and Explainability. We lack a true model of biological intelligence, and we can rarely predict with absolute certainty how organisms will respond except under limited and controlled circumstances. For example, we do not know how teenagers learn to drive automobiles, but we have made an uneasy cultural, legal, and financial peace with that reality. Via laws, courts, and cultural norms, we define the range of desired, tolerable, and unacceptable behaviors.
Explainable AI is the analogous problem in artificial intelligence. Can we explain how an AI system generated its outputs? Conceptually yes, but in practice it is extraordinarily difficult given its scale. For example, one could trace the optimization path for all weights in a deep neural network and correlate that with the training data. However, the volume of data and the complexity of the neural network make this impossible for all but the most trivial networks.
If the AI system is proprietary, and most are, the owners will likely be reluctant to share enough information for white box testing. Instead, as with many biological learning tests, only non-invasive black box testing is possible. In this case, one can only hope to infer the internal process from a combination of stimuli and responses.
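As a simplified illustration of black-box probing, the sketch below perturbs one input feature at a time and measures how much an opaque model's output shifts; predict_fn stands in for whatever model endpoint is available, and the toy model at the end is assumed purely for illustration.

```python
import numpy as np

def black_box_sensitivity(predict_fn, x, noise=0.1, trials=100, seed=0):
    """Probe an opaque model: perturb one feature at a time and record how
    much the prediction moves. Larger average shifts suggest greater influence.
    predict_fn is any callable mapping a feature vector to a scalar output."""
    rng = np.random.default_rng(seed)
    baseline = predict_fn(x)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        shifts = []
        for _ in range(trials):
            x_perturbed = x.copy()
            x_perturbed[i] += rng.normal(0.0, noise)
            shifts.append(abs(predict_fn(x_perturbed) - baseline))
        scores[i] = np.mean(shifts)
    return scores  # one influence score per input feature

# Toy stand-in model: feature 0 matters far more than feature 1.
toy_model = lambda v: 3.0 * v[0] + 0.1 * v[1]
print(black_box_sensitivity(toy_model, np.array([1.0, 1.0])))
```

Such probes can reveal which inputs a model is sensitive to, but they only ever infer behavior from the outside; they cannot explain why the internal weights produce it.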
This lack of verifiability, together with concerns about algorithmic bias, rightly worries many ethicists and policymakers, particularly as AI systems assume larger roles in society. How safe does an autonomous vehicle need to be, and how do we balance the risks against the benefits? For example, how do we weigh the benefits of greater mobility and social access that autonomous vehicles could provide the aged and infirm against the risks they face when driving themselves or being unable to drive at all? Similar questions arise for other high-risk driving categories – inexperienced teenagers, tired and drowsy drivers, and driving under the influence of drugs or alcohol.
Aside: Interpreting quantum theory has long been a subject of conversation in the physics community, as quantum mechanics often defies our macroscopic intuitions. From the Copenhagen interpretation through Everett’s many-worlds interpretation, physicists have long grappled with explainability. This once led Cornell physicist David Mermin to quip that the standard approach to understanding quantum mechanics was to “shut up and calculate.”
Other debates concern intellectual property rights. Does training on publicly available images, text, and data to create new content constitute intellectual property theft, or is it analogous to a widely read and educated individual drawing on a lifetime of education and experience to create new intellectual property? These are just a few of the many reasons why thoughtful and informed debate about the role of AI systems in society is so important.
The European Union recently passed the EU AI Act, with a goal of ensuring that AI systems are safe, transparent, traceable, non-discriminatory, and environmentally friendly. The U.S. Congress is also exploring what aspects of AI to regulate. Meanwhile, multiple media organizations are seeking to block the use of their public content for AI training. The tension between encouraging innovation via limited regulation and protecting consumers and citizens is very real.
Defense and National Security. National security ultimately flows from economic security, the ability to manufacture the goods and create the services that shape the global economy. Trade tensions surrounding semiconductor manufacturing and the race to secure domestic semiconductor supply chains are but the latest examples, inextricably connected to generative AI and access to the GPUs and AI accelerators needed for training and inference. For this reason, U.S. Secretary of State Antony Blinken has highlighted semiconductor manufacturing and AI as two of the technologies critical to national security. Leaders in the European Union and China have expressed similar perspectives.
The rise of flexible AI is also raising the prospect of semiautonomous and autonomous warfighting machines. Although not sentient as in The Terminator, the age of intelligent warfighting machines is increasingly near. It’s the battlefield version of the debate about autonomous cars and trucks. What are the rules of engagement? Who or what bears ultimate responsibility? These are complex ethical issues whose answers are not at all clear. (See Reluctant Revolutionaries, the Trolley Paradox, and Ender’s Game.)
Whatever one believes, AI competition is a global race to the future, one that will shape education and training, jobs and economic competitiveness, scientific discovery and innovation, and defense and national security. It is a race the U.S. cannot afford to lose, and it is deeply connected to semiconductor design and fabrication capabilities, themselves the subject of a global competition over domestic manufacturing capacity, national and international supply chains, immigration and trade policy, and STEM education.
10. AI Hype and Reality
I lived through the second AI winter in the late 1980s and early 1990s, when enthusiasm waned and funding was scarce. I also remember the first, in the 1970s. As many have learned, developing AI systems with human-level capabilities is not easy.
Illustrating this, there is an old joke about the state of early machine translation, which, during the Cold War, targeted translation between Russian and English. As the story goes, the experts tasked the system with translating the phrase, “The spirit is willing, but the flesh is weak” into Russian and then back into English. The result was “The vodka is strong, but the meat is rotten.”
The point is that social and historical context – connotation – matters as much as denotation, and humans have broad context for interpreting sounds, images, writing, and culture. There is so much excitement today because, for the first time, deep neural networks have broken through, albeit imperfectly, capturing enough connotation and denotation via training to rival human capabilities in key domains.
Are we headed for a third AI winter? I rather doubt it, as the demonstrable success of generative AI at practical and important tasks is too valuable for the icy winds to completely chill the field again. Having said that, I do expect some corrections, though not as deep as what followed former Federal Reserve chair Alan Greenspan’s proclamation of “irrational exuberance” at the height of the dot-com stock market boom.
What’s different this time? As noted earlier, deep learning systems, despite their limitations, are now demonstrably better than humans at a wide range of important and useful tasks. They can also craft poetry, create realistic images and art, generate code given high level written descriptions, and conduct extensive question and answer sessions on diverse topics. Rational exuberance is entirely appropriate, the Gartner hype cycle notwithstanding.
11. Maxims for the Future
Do I know how generative AI and its intellectual descendants will shape our future? No, my crystal ball is as cloudy as that of any other putative pundit. However, as a prognosticator, I do know that certain principles always apply. (See Predicting Potentialities, Reifying Futures.) Herewith are a few potential words of wisdom:
- Maxim One: The AI revolution is just beginning, with societal effects potentially as profound as those of the Industrial Revolution. It is likely that there will be major economic disruptions, and concomitant social disruptions, with an increasing number of both routine and non-routine cognitive tasks being automated. The arc of technological innovation, whether in medicine and health care, economic welfare, or communication, has long been one of improved quality of human life, with new jobs and new social and economic opportunities, albeit with profound intermediate disruptions and inequalities. The challenges lie in respecting the dignity of individuals during the transition. (See The Simple Things Matter, Most of All Now.)
- Maxim Two: AI and semiconductors are fueling a new global “arms race,” one that will, at least in part, determine the shape of the 21st century geopolitical landscape. Emerging AI systems are a critical element of national defense, enabling both implicit behavioral shaping via deepfakes, election interference, and sentiment analysis, and explicit action via cyberattacks and autonomous military vehicles – drones and other systems. They will also shape which geopolitical systems achieve or maintain economic and political dominance; simply put, the values that shape the future. (See Upgrading the Future: House Science Committee Testimony.)
- Maxim Three: For all the impressive capabilities of today’s generative AI systems, there are clear limits on continued exponential increases in the hardware scale and energy consumption needed for training. We need conceptually new approaches to hardware-software co-design that includes new materials, biology-inspired architectures, and novel, low precision learning algorithms. We also need continued research to develop new AI approaches that reflect increasing knowledge of learning and biology. (See Computing Futures: Technical, Economic, and Geopolitical Challenges Ahead.)
- Maxim Four: Deeper understanding of biological intelligence can fuel further advances in AI. Future breakthroughs in AI will depend in part on understanding the underlying principles of natural intelligence. Where those ideas illuminate and where they misdirect can only be understood via continued advances in imaging and modeling, testing model hypotheses against biological observations. As always, the science is connected, obvious in retrospect but rarely in prospect.
- Maxim Five: STEM education matters more than ever before. We need a larger cadre of trained workers who understand the underlying technologies and who can adapt to the changing nature of the workplace, both as developers of AI technologies and as workers who use AI technologies as part of their jobs. This means rethinking many elements of how we approach education as well, viewing AI as a partner rather than as a disruptive technology to be feared. (See Public Higher Education: A New Social Compact for Innovation.)
Most importantly, it is critical that we consider the interplay of these complex social, economic, ethical, and geopolitical issues. As any strategist or gameplayer will tell you, you must see the whole board.
12. Parting Thoughts
I asked ChatGPT to speculate on the future of AI. Its response is as apt as any I might offer:
Predicting the future of AI involves both exciting possibilities and challenges. As AI continues to advance, we can expect more sophisticated natural language processing, improved machine learning techniques, and enhanced problem-solving capabilities. However, ethical considerations and potential job displacement will remain important topics of discussion. The integration of AI into various industries and its potential impact on society will likely shape the direction AI takes in the coming years.