Reality and Artificial Intelligence

Given the nature of the current artificial intelligence or AI landscape, any attempt to discuss the technology is like trying to describe an ocean wave racing towards you. You might posit that the wave's amplitude is high and its velocity likewise, but even as you attempt to describe the phenomenon further, the wave breaks, and another takes its place.

Among AI models, the large language model, or LLM, currently reigns supreme. LLMs excel at parsing and generating sequential data, which makes them particularly good at working with language. They interact with humans using "prompts": the human prompts the LLM with a query, and the LLM generates a response. A conversation can be had in this manner. Like a human conversationalist, the LLM considers prior exchanges so that its response remains in context of the overall conversation. The fact that LLMs are such shockingly good conversationalists has fueled mass speculation that they will soon meet or surpass humans in intelligence. This prospect has been so thoroughly covered in academia, literature, film, the press, and think tanks, and through so many different lenses, from philosophical to technical to artistic, with the coverage increasingly produced by the LLMs themselves, that it feels almost presumptuous to add to it. The conversation, however, is nowhere close to being over. It is only getting started.

In this essay, I will explore whether it is possible for human-like intelligence to emerge from computer systems that are so unlike our own. By "unlike", I don't mean the differences that exist on the molecular level, the differing underlying "substrates" on which the two systems are built. Humans, like all other Earth life, are carbon-based, while computers are not. That is clearly relevant when discussing things like phenomenal consciousness, or the "what it is like" quality of experience. For the purposes of this essay, however, I am not talking about the first-person, internally experienced "what it is like" quality of phenomenal consciousness, but of observable capabilities: intelligence.

Intelligence is another word whose meaning varies widely. I will therefore define it for my purpose here as the ability to learn how to perform modern, average-level human tasks. A human teenager, for example, learns to drive a car after only a handful of lessons. An even younger human child, albeit reluctantly, is easily trained to do chores around the house, loading and unloading the dishwasher, laying the dinner table, folding and putting away the clothes from the dryer, and so on. Might an LLM, paired with a robotic body, learn to perform these tasks while also holding a conversation, like a human child can?

First, a summary of how LLMs built on the currently dominant transformer architecture work. The human interlocutor begins the conversation by inputting a "prompt". The LLM breaks this prompt into "tokens", where a word, a subword, or even a single letter can be a token. Each token, in effect, evaluates the importance of the other tokens in relation to itself, assigning an "attention score" to every other token in the prompt sequence. In GPT-style models this attention is causal: each new token introduced into the sequence attends to all the tokens that precede it, and the scores are recomputed as the sequence grows. The tokens themselves are represented as vectors, because there are multiple layers of information bundled into each one, including, for example, the token's position within the text. The attention scores allow the transformer to consider context from the entire conversation when formulating its response. How does the model know, from the start, which words to weigh more heavily than others? That is the result of its training: large-scale pretraining on text, increasingly followed by a human-guided refinement called reinforcement learning from human feedback, or RLHF. Many modern models are also architected as modular "mixture of experts" systems, with several specialized modules working in concert. We are talking here about a model that has already been trained. [1]
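
To make the tokenization step concrete, here is a minimal sketch in Python. The six-entry vocabulary and the greedy longest-match rule are invented for illustration; production LLMs learn subword vocabularies with schemes such as byte-pair encoding.

```python
# A toy tokenizer with an invented six-entry vocabulary; real LLMs learn
# subword vocabularies of tens of thousands of entries.
toy_vocab = {"the": 0, "cat": 1, "sat": 2, "un": 3, "happy": 4, "ly": 5}

def tokenize(text: str) -> list[int]:
    """Greedily split each word into the longest subwords the vocabulary knows."""
    ids = []
    for word in text.lower().split():
        while word:
            for end in range(len(word), 0, -1):   # try the longest match first
                if word[:end] in toy_vocab:
                    ids.append(toy_vocab[word[:end]])
                    word = word[end:]
                    break
            else:
                raise ValueError(f"cannot tokenize {word!r}")
    return ids

print(tokenize("the unhappy cat sat"))   # [0, 3, 4, 1, 2]
```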

For each token in the prompt, the model generates three vectors: Q (query), K (key), and V (value); stacked across tokens, these form three matrices. A token's query represents what it is looking for, its key represents what it offers for matching, and the attention score between two tokens is computed by multiplying one token's query against the other's key. The value vector carries the token's actual content, and the attention scores determine how heavily each token's value is mixed into the result. Armed with scores for all tokens in the query, the model produces a context-dependent representation of each token, one that "understands" what the important bits of the query are, because the important bits are more heavily weighted. From that representation, the model generates its response one token at a time, each new token drawn from a probability distribution computed by its learned weights. As should be apparent, there are a whole lot of matrices being multiplied at any moment, so many in fact that Google has built a specialized piece of hardware, the tensor processing unit, dedicated to the task.
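
Below is a minimal sketch of this "scaled dot-product" attention, in Python with NumPy. The three-token prompt, the four-dimensional embeddings, and the random projection matrices are all invented for illustration; a real model learns these weights during training and uses far larger dimensions.

```python
# A sketch of scaled dot-product attention over a 3-token prompt with toy
# 4-dimensional embeddings. The random matrices stand in for learned weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))               # one embedding vector per token

W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v       # queries, keys, values per token

scores = Q @ K.T / np.sqrt(K.shape[-1])   # query-key match, scaled
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
# Decoder-style LLMs would also mask `scores` so each token only attends
# to the tokens that precede it (causal attention).
output = weights @ V                      # context-dependent token representations

print(weights.round(2))                   # each row sums to 1.0
```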

Foundation LLMs, the models trained from scratch, used to run on computers fitted with arrays of postage-stamp-sized chips called graphics processing units, or GPUs. Today's LLMs like ChatGPT run on far more formidable "superchips" that integrate multiple functions, including processing and memory, as used in Nvidia's DGX GB200 system, for example.

However, the most basic functional unit of any electronic device, including the most powerful superchips, remains the transistor. A transistor can amplify a signal, but in a digital circuit it acts as a switch: it either allows current to flow or blocks it. Both the input and output of such a transistor can be expressed as current/no current. This is not an analog signal that constantly fluctuates, but a discrete signal that takes one of two values: on or off.[2]
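
To see how such on/off switches compose into computation, here is a minimal sketch that treats an idealized transistor pair as a NAND gate and builds other Boolean operations from it. The functions are illustrative abstractions, not a hardware-accurate model.

```python
# Idealized on/off switches composed into logic.

def nand(a: int, b: int) -> int:
    """Output is off only when both inputs are on: the NAND gate."""
    return 0 if (a and b) else 1

# NAND is functionally complete: every other Boolean operation can be
# built from it, which is why it is a workhorse of chip design.
def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))

for a in (0, 1):
    for b in (0, 1):
        print(f"{a} AND {b} = {and_(a, b)}   {a} OR {b} = {or_(a, b)}")
```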

It seems hardly conceivable that the capabilities of LLMs are rooted in components whose only function is to switch between binary on/off, 1/0 states. Yet trillions of these basic electronic components, working in concert, are responsible for all the extraordinary things that AI systems can do. And there are parallels to how biological systems function.

Consider the laws of inheritance. Each human inherits 23 chromosomes from each parent. Within each chromosome are hundreds of genes. As the father of genetics, Gregor Mendel, observed with pea plants in the mid-nineteenth century, inheritance involves dominant and recessive versions of a gene, called alleles. There is no intermingling between the two: each allele is either expressed or it is not. Heredity, one of the fundamental processes of life, can be viewed as a digital process.
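
A short sketch makes the discreteness plain. The cross below, between two plants that each carry a dominant (A) and a recessive (a) allele, is the textbook Punnett square, written in Python for illustration rather than as a model of molecular genetics.

```python
# Crossing two hybrid pea plants: one allele drawn from each parent.
from collections import Counter
from itertools import product

def cross(parent1: str, parent2: str) -> Counter:
    """Enumerate every offspring genotype from the two parents' alleles."""
    return Counter("".join(sorted(pair)) for pair in product(parent1, parent2))

offspring = cross("Aa", "Aa")
print(offspring)    # Counter({'Aa': 2, 'AA': 1, 'aa': 1})

# The dominant trait appears whenever at least one 'A' is present: a 3:1
# ratio, with no intermediate blending -- heredity behaving digitally.
dominant = sum(n for genotype, n in offspring.items() if "A" in genotype)
print(f"{dominant}:{offspring['aa']} dominant to recessive")
```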

It is no accident that "neural network" is the umbrella term under which AIs like LLMs fall. They were initially modeled on the human nervous system, a network composed of billions of specialized cells called neurons. The shapes of neurons differ according to their type, but in general they comprise three main parts. On one end are the branching dendrites that reach out into the junctions between neurons called synapses. The dendrites attach to the squat central processing unit of the cell, the soma, on the other side of which is the tail-like axon.

The dendrites gauge the levels of chemicals called neurotransmitters present within synapses. This information is transduced into a fluctuating voltage, which is transmitted to the soma for processing. If the signal meets a threshold, the soma "fires" an electrical impulse called an action potential down the axon. When the action potential reaches the end of the axon, the axon releases neurotransmitters into the adjoining synapse, where they are in turn picked up by the dendrites of surrounding neurons.

The action potential is a digital signal; the neuron either fires or it doesn't. But unlike a transistor, what the neuron takes as input and what it releases as output is a chemical, analog-type signal, one that constantly fluctuates. Thus, unlike a transistor, a neuron processes both analog and digital information.
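
The contrast can be made concrete with the classic "leaky integrate-and-fire" toy model of a neuron, sketched below. The constants are illustrative rather than physiologically calibrated; the point is that a smoothly fluctuating analog input produces an all-or-nothing digital output.

```python
# A leaky integrate-and-fire neuron: analog in, digital out.
import math

def simulate(currents, threshold=1.0, leak=0.9):
    """Integrate a fluctuating input; emit an all-or-nothing spike at threshold."""
    potential, spikes = 0.0, []
    for current in currents:
        potential = potential * leak + current   # analog accumulation with decay
        if potential >= threshold:
            spikes.append(1)                     # fire: the digital event
            potential = 0.0                      # reset after the spike
        else:
            spikes.append(0)
    return spikes

# A smoothly varying analog input yields a discrete, binary spike train.
inputs = [0.3 + 0.3 * math.sin(t / 2) for t in range(20)]
print(simulate(inputs))
```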

Does it matter that on the most basic component level, LLMs are unable to process analog signals? Will this make the difference between machine and human intelligence? To create an AI that will have human-like intelligence, will it need a neuron-like component that can process both analog and digital signals?[3]

To answer this question, we need first to ask whether there is anything fundamentally different about analog and digital information. And to answer that, we need to talk about what we mean by the word "information". Which in turn leads us to the mathematician Claude Shannon, the father of information theory. Shannon, while working at Bell Labs in the 1940s and 50s, was instrumental in converting the abstract notion of information into something that can be described in physical terms: the sort of thing that can be measured, compressed, transmitted, and received.

Writes Shannon at the beginning of his seminal 1948 paper, "A Mathematical Theory of Communication": "The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is, they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will be chosen since this is unknown at the time of design."[4]

The paper was revolutionary in many ways. One of the key attributes of information is its fungibility: the physical signal carrying the information can be separated from the semantic content of the information. Consider, for example, how your body is processing the information in this sentence. Photoreceptor cells in the eye's retina transduce the light into electrical and chemical signals. These signals are passed to the optic nerve for transport to the brain's visual cortex. The "meaning" of the signal is interpreted, or decrypted, in the brain. Decrypted, because if the sentence had been written in an unknown script, the same physical signal would arrive at your brain, but you would be unable to extract meaning from it: you lack the key to convert the little squiggles on your screen into something you understand.

Now consider the scenario where, instead of reading the sentence, you have someone read it out loud to you. The resulting sound wave enters your ear canal and causes your eardrum to vibrate, which in turn causes three little bones in your middle ear to vibrate, which creates a pressure wave in the fluid of the inner ear, which stimulates hair cells in the cochlea to transduce this motion into an electrical and chemical signal, which travels up the auditory nerve to the brain. The brain decrypts this signal to extract meaning. Note that the information has taken several forms on its journey to your brain even as the meaning was retained throughout.

One of Shannon's many revolutionary insights was to see that this fungibility of information allowed for its quantification. It is worth noting here that it was Shannon's master's thesis that showed that any operation in Boolean logic could be implemented as a combination of electrical switches (the relays of his day, the transistors of ours): anything that can be described logically can be implemented with circuits. Not only did Shannon propose a theory of information, he also demonstrated how to implement it.
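
Shannon's measure of information is easy to state in code. The sketch below computes the entropy of a message, in bits per symbol, from the frequencies of its symbols; the example strings are arbitrary.

```python
# Shannon entropy of a message, in bits per symbol, from symbol frequencies.
from collections import Counter
from math import log2

def entropy_bits(message: str) -> float:
    """H = -sum(p * log2(p)) over the message's symbol probabilities."""
    counts = Counter(message)
    total = len(message)
    return -sum(n / total * log2(n / total) for n in counts.values())

# A repetitive message carries less information per symbol than a varied one,
# regardless of what the symbols "mean" -- semantics are irrelevant, exactly
# as Shannon stipulated.
print(entropy_bits("aaaaaaab"))   # ~0.54 bits per symbol
print(entropy_bits("abcdefgh"))   # 3.0 bits per symbol
```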

Based on Shannon's theories, we have built ubiquitous electronic systems that convert information from one type to another. Modern analog-to-digital converters, or ADCs, used to record music for example, can deliver very high fidelity. The ADC achieves conversion by sampling the sound wave at a given time interval, the "sampling rate", and storing each sample at a "bit depth", or resolution, which is the number of bits used to represent it. To capture an arbitrary signal with perfect fidelity, an ADC would need an infinite sampling rate and resolution, which is impossible. (The Nyquist-Shannon sampling theorem does show that a band-limited signal can be reconstructed exactly from samples taken at more than twice its highest frequency, but the quantization error introduced by finite bit depth remains.)
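
Here is a minimal sketch of that conversion, with a sampling rate and bit depth chosen for readability rather than realism; CD audio uses a 44.1 kHz sampling rate and 16-bit depth.

```python
# Sampling and quantizing a sine wave at a toy rate and bit depth.
import math

def adc(signal, sample_rate=8, bit_depth=3, duration=1.0):
    """Sample a function of time, rounding each sample to 2**bit_depth levels."""
    levels = 2 ** bit_depth
    samples = []
    for i in range(int(sample_rate * duration)):
        value = signal(i / sample_rate)                        # analog value in [-1, 1]
        samples.append(round((value + 1) / 2 * (levels - 1)))  # nearest level
    return samples

def wave(t):
    return math.sin(2 * math.pi * t)   # a 1 Hz sine wave

print(adc(wave))   # eight 3-bit codes approximating one cycle of the wave
```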

What if there were other ways to digitally capture the kind of subtlety that analog signals can capture? Quantum computing, for example, uses the principles of quantum mechanics to store data, not in traditional bits that take either 1 or 0 values, but in qubits that can exist in states of 1, 0, or any combination (superposition) of both at the same time. Quantum computing has made impressive gains in recent years. IBM's Condor processor, unveiled alongside its Quantum System Two in late 2023, surpassed the 1,000-qubit barrier. A thousand qubits allow a system to represent 2^1000, or on the order of 10^301, states simultaneously; for comparison, there are an estimated 10^82 atoms in the observable universe. A thousand qubits has also often been cited as a milestone on the road to "quantum supremacy", the point at which a quantum computer can perform specific tasks far more efficiently than any classical computer. In the context of LLMs, such tasks could perhaps include the fabled capability of "generalization": the ability to perform well on tasks and data the model has not previously been trained on.
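
The mathematics of a single qubit fits in a few lines. The sketch below uses plain NumPy rather than a quantum SDK to prepare an equal superposition and simulate its measurement; the thousand-shot experiment is illustrative.

```python
# A qubit as a two-component state vector: prepare an equal superposition
# with a Hadamard gate, then simulate measurement via the Born rule.
import numpy as np

rng = np.random.default_rng(0)

ket0 = np.array([1, 0], dtype=complex)         # the |0> basis state
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate

state = H @ ket0                               # amplitudes ~(0.707, 0.707)
probs = np.abs(state) ** 2                     # Born rule: (0.5, 0.5)

# Measurement collapses the superposition; repeated runs give ~50/50 outcomes.
outcomes = rng.choice([0, 1], size=1000, p=probs)
print(probs, np.bincount(outcomes))
```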

Quantum computing comes with its own set of formidable problems. Error correction, for example, is particularly challenging for quantum systems. A qubit is inherently noisy, highly susceptible to the environmental interference that leads to loss of coherence. Measurement itself collapses a qubit's state. And because an unknown quantum state cannot be cloned, error correction must instead encode each logical qubit redundantly across many physical qubits, reducing the computational power available. [5]

Quantum computing is based on the theory of Quantum Mechanics, often called the most successful theory that physics has ever produced: it reproduces the confirmed predictions of classical physics, and every novel testable prediction it has made has so far been verified. The standard-bearer for the theory is the Schrödinger Wave Equation, formulated by the physicist Erwin Schrödinger in his 1926 work, "Quantization as an Eigenvalue Problem". Like Newton's Second Law in Classical Mechanics, the Schrödinger Equation predicts how a physical system evolves over time, given initial conditions. Unlike Newton's Second Law, the solution to the Schrödinger Equation is itself a mathematical construct, the Wave Function, denoted Ψ, or psi. The squared magnitude of the wave function, |Ψ(x)|², represents the probability density of finding a particle at position x.
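
As a concrete example, the sketch below evaluates |Ψ(x)|² for the textbook ground state of a particle in a one-dimensional box and checks that the total probability integrates to one. The system and its parameters are standard textbook choices, not drawn from Schrödinger's paper.

```python
# |psi(x)|^2 for the ground state of a particle in a box of length L.
import numpy as np

L = 1.0
x = np.linspace(0, L, 10_001)
psi = np.sqrt(2 / L) * np.sin(np.pi * x / L)   # ground-state wave function
density = np.abs(psi) ** 2                     # probability density |psi|^2

print(float(density.sum() * (x[1] - x[0])))    # ~1.0: the particle is somewhere
print(float(x[np.argmax(density)]))            # 0.5: most likely position
```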

There are analogies between Ψ(x) and the qubit. Both represent the quantum state of a system. Both represent superpositions that collapse when measured. However, while Ψ(x) is a probability amplitude over a continuous spacetime formulation, a qubit represents the discrete state of a two-level quantum system.

Before wave mechanics, there was a mathematically equivalent formulation called Matrix Mechanics, developed by Werner Heisenberg, Max Born, and Pascual Jordan in 1925. Instead of a continuous representation of a physical system evolving over time, as in the wave equation, matrix mechanics represents physical quantities such as energy, momentum, and position each as a matrix that evolves over time. The equations of motion involve matrix algebra.
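
The flavor of the formulation can be shown with the simplest quantum system, a spin-1/2 particle, sketched below. That the two observable matrices fail to commute, so that the order of operations matters, is a hallmark of matrix mechanics; the example is illustrative.

```python
# Observables as matrices: spin along x and along z for a spin-1/2 particle,
# represented by Pauli matrices.
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

commutator = sigma_x @ sigma_z - sigma_z @ sigma_x
print(commutator)   # nonzero: unlike ordinary numbers, order matters
```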

Matrix Mechanics represented the time-dependent behavior of a physical system as the superposition of all its possible states. A qubit's state is likewise written in this linear algebra: a superposition of discrete quantum states, evolved through matrix operations. A qubit, however, represents information, not matter. That distinction was of interest to the physicist John Wheeler, who posited a radical idea about the fundamental nature of reality in his 1989 work, "Information, physics, quantum: the search for links", part of the Proceedings of the III International Symposium on Foundations of Quantum Mechanics in Tokyo, Japan.

Wheeler, already famous for his monumental contributions to nuclear fission among other things, suggests that the universe is "participatory", that is to say, the universe owes its existence to the act of observation: "No element in the description of physics shows itself as closer to primordial than the elementary quantum phenomenon, that is, the elementary device-intermediated act of posing a yes-no physical question and eliciting an answer or, in brief, the elementary act of observer-participancy. Otherwise stated, every physical quantity, every it, derives its ultimate significance from bits, binary yes-or-no indications, a conclusion which we epitomize in the phrase, it from bit."

The universe, posited Wheeler, is composed not of matter, but instead is woven out of information. Matter, at its very bottom, could not just be represented by information, but was information. And not the continuous analog type of information either, but digital, 1/0 type of information.[6]

While it may seem obvious to those in the field why achieving artificial general intelligence is the ultimate computational prize, it may not be so to everyone. After all, computational systems already far outdo humans at an ever-growing number of game-playing tasks, including games once thought uniquely suited to human cognition, like chess and Go, at which they now thoroughly dominate our best players. Specialized computational systems are being designed and trained at a furious pace in almost every field from medicine to astronomy. What is it about building a machine that can learn like we can that makes it singularly exciting?

There are practical reasons. A machine that can "watch" and quickly learn to perform new tasks, without the need for copious retraining on additional data sets, could solve long-standing problems. For example, the physically or mentally impaired among us, those who are unable to care for themselves, could receive round-the-clock care from a machine with human-like capabilities. Unpleasant or dangerous tasks, fire or water rescues for example, that require learning-while-doing to respond to new exigencies, could be handed over to machines. The weakest and most vulnerable among us, along with the ambitious, the talented, the dreamers, the makers, the caregivers, individuals, families, and communities, all could gain access to new and bold opportunities that had hitherto been out of reach.

In addition to the numerous practical applications that would make life immeasurably better for untold numbers, there is also the sense of awe that an accomplishment of this magnitude would inspire. Arguably, the last time humanity achieved something comparably spectacular was on July 20th, 1969, when American astronauts Neil Armstrong and Buzz Aldrin walked on the moon. That audacious feat continues to inspire billions. Building a machine with the capacity for human-like adaptability and learning could likewise spur optimism. Robotic missions to space, for example, could slam open the door that was inched open decades ago. Machines can be sent ahead to build the support systems necessary for human survival in other environs. Humanity became a seafaring species; with the help of machines, we can become a spacefaring one as well.

There is also a further consideration. If Wheeler is right and the fabric of reality can be represented as some version of the qubit, then artificial general intelligence running on quantum computational systems could usher in a new era, one where our biggest questions are answered and our biggest problems are solved. Some have called an event of this magnitude the "singularity", as it metaphorically represents the mathematical notion of infinity. Although infinity is often thought of as an abstraction, physicists postulate that the density at the center of a certain kind of collapsed star approaches infinity. Fittingly, it was John Wheeler who popularized the name for this kind of mysterious cosmic object, the "black hole", so called because not even light can escape the gravity well circumscribed by its "event horizon". Likewise, the "singularity" for computational systems is posited as a system whose intelligence gallops up asymptotically towards infinity once a threshold, conceptually akin to the black hole's event horizon, has been breached. Whether such a threshold exists, and what it would mean if a computational system crossed it, has been the subject of thorough speculation for decades.[7] What we should all be able to agree on, however, is that we are hurtling towards an exciting future.

[1] https://armanasq.github.io/nlp/self-attention/

[2] https://resources.nvidia.com/en-us-dgx-platform/nvidia-dgx-platform-solution-overview-web-us

https://armanasq.github.io/nlp/self-attention/

https://www.theregister.com/2024/03/18/nvidia_turns_up_the_ai/

[3] https://www.science.org/doi/10.1126/science.aaj1497

"Brain Communication: What calcium channels remember," eLife (elifesciences.org)

https://mcgovern.mit.edu/2019/02/28/ask-the-brain-how-do-neurons-communicate/

[4] https://pure.mpg.de/rest/items/item_2383164/component/file_2383163/content

https://dspace.mit.edu/bitstream/handle/1721.1/11174/34541447-MIT.pdf

https://www.cs.virginia.edu/~evans/greatworks/shannon38.pdf

http://neilsloane.com/doc/shannonbib.html

[5] "IBM Quantum System Two: the era of quantum utility is here," IBM Quantum Computing Blog

https://www.ibm.com/roadmaps/quantum/2027/

https://arxiv.org/abs/2309.15642

"A topic-aware classifier based on a hybrid quantum-classical model," Neural Computing and Applications (springer.com)

https://arxiv.org/abs/2307.08191

"A framework for demonstrating practical quantum advantage: comparing quantum against classical generative models" (nature.com)

[6] https://philarchive.org/rec/WHEIPQ

https://philpapers.org/archive/WHEIPQ.pdf

[7] https://singularitynet.io/
