The previous chapter showed that very simple machine learning methods can assist users by helping with basic programming tasks, conserving and enhancing the valuable resource of human attention as a result. The overall argument of this book is that the global “AI” business of machine learning, which is currently focused on extracting as much data from users as possible for the minimum cost, while also consuming their attention, is a tragically ineffective way of contributing to human lives and welfare. Instead, we need a moral alternative, in which the ability to instruct and configure computers is extended far more widely, delivering the capabilities of programming, through Moral Codes.
Before more detailed analysis of these opportunities, it’s important to be clear about the difference between programming languages and machine learning. To a software expert, this might seem like a stupid or naïve question, although apparently silly questions can of course be profound and important sometimes. According to current everyday conventions of the software industry, and in the textbooks and courses that prepare students for that industry, the two technical topics of programming languages and machine learning are completely distinct, with so many differences between the things being taught that it might seem hard to know where to start. A typical degree program in software engineering or computer science, of the kind I have directed myself, will probably include something like “Introduction to Machine Learning” on one hand and “Programming in Python” on the other. The two are so different that the title of this chapter might seem like asking “why is flower arranging not like brain surgery”, or “why is riding a bicycle not like designing a factory”. Those look like the start of a joke, not a serious invitation to start listing differences. But let’s consider the broad context of what students learn, and how, in each of these areas.
An introductory class in programming involves learning to code. Students will be taught the syntax and keywords of some programming language notation, perhaps Python or Java, and will learn how to translate an algorithm into the conventional idioms of that particular programming language. Subsequent programming courses will also try to prepare students for more complex projects, in which a particular algorithm only solves one part of the problem, meaning that students have to learn to break a larger problem down into individual pieces that they can see how to code. This second aspect of programming is a lot harder, perhaps not even mentioned in an introductory class, and takes years or decades to become good at. Although it’s often claimed, when promoting courses in basic coding, that you can make any software system out of algorithms and code, this is a bit like saying you can make any house out of nails and pieces of wood. Most children can bang a nail into a piece of wood. And many introductions to coding are accessible to children from an early age. Sadly, these well-intentioned initiatives don’t even start to get you a software career. A child who loves banging nails into wood may well become an architect or engineer one day, but only after another 15 years of education.
Introductions to machine learning are not yet quite as familiar or popular as learn-to-code initiatives, but they are equally accessible to children – I have a friend who teaches machine learning methods to eight-year-olds in an after-school club. The current popularity of machine learning, now the most widely-taught approach to AI, dates back only to about 2010, when public interest was captured by a sudden increase in the performance of “deep neural network algorithms”, especially after those algorithms were trained using very large quantities of free content that had been scraped from the Internet[1]. The next chapter will say more about the implications and problems of that approach. Although there were many other approaches to AI in the past, including those I was trained in myself as a researcher in the 1980s, researchers now have the patronising (or perhaps ironically self-effacing) habit of calling those older methods “GOFAI”, for Good Old-Fashioned AI.
Modern machine learning methods are really a branch of statistics, using algorithms that try to detect patterns in large amounts of data. The people who teach and do research in machine learning are mathematicians and statisticians, often just as happy to refer to their work as “data science” rather than “artificial intelligence”. The underlying logic of machine learning is similar in many ways to concepts that will be familiar from school statistics – collating a table of numbers, calculating an average, testing a correlation and so on. The recent habit of describing statistical “learning”[2] as “AI” emerged only when the amount of data became so large that it is hard to relate it intuitively to those high-school concepts.
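To see how close that underlying logic is to school statistics, here is a minimal sketch in Python. The numbers are invented purely for illustration: it calculates an average, a correlation and a line of best fit using nothing but arithmetic, which is the kind of calculation that sits underneath far grander-sounding “learning” algorithms.

    # A minimal sketch: the "learning" in statistical machine learning is built
    # from calculations like these. The numbers are made up for illustration.

    hours_studied = [1, 2, 3, 4, 5]       # hypothetical input variable
    exam_scores   = [52, 55, 61, 64, 70]  # hypothetical output variable

    n = len(hours_studied)
    mean_x = sum(hours_studied) / n
    mean_y = sum(exam_scores) / n

    # Pearson correlation: how consistently the two columns vary together
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours_studied, exam_scores))
    var_x = sum((x - mean_x) ** 2 for x in hours_studied)
    var_y = sum((y - mean_y) ** 2 for y in exam_scores)
    correlation = cov / (var_x ** 0.5 * var_y ** 0.5)

    # A "trained model" in the simplest sense: the line of best fit
    slope = cov / var_x
    intercept = mean_y - slope * mean_x

    print(f"correlation = {correlation:.2f}")
    print(f"prediction for 6 hours of study: {slope * 6 + intercept:.1f}")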
The impression that statistical patterns represent intelligence is particularly strong when the numbers being processed are the pixels of a digital photograph, or the words and sentences of a web page. Machine learning systems for statistical analysis of photographs are described as “computer vision”, and systems for statistical analysis of words and sentences are described as “natural language processing”. Although a single web page may have thousands of words, and a single photograph millions of pixels, the most dramatic advances have come from scraping many millions of photographs and web pages off the internet, and creating statistical algorithms that are able to detect consistent patterns across millions of pictures containing trillions of pixels, or millions of pages sharing a dictionary of thousands of different words. The algorithms all use statistics, but researchers have given them grander names such as “support vector machines”, “random forests”, “neural networks”, “transformers” and other jargon. For readers who would like a better idea of how such technical terminology arises and is used, Adrian Mackenzie, a specialist in the sociology of science and technology, has described his experiences of acquiring these statistical learning skills and terminology in his book Machine Learners[3].
I started this section by asking what is the difference between machine learning and programming, not the difference between AI and programming. The reason is that if we lose sight of the technical reality that machine learning algorithms are simply doing different kinds of statistics, we might start to interpret some of the technical jargon from the field as if it were directly related to human intelligence. We might think that a “vision” algorithm is “seeing” things the way a human does, that a “language” algorithm is “understanding” words, that a “neural” algorithm is like the neurons in a brain, or even that a “learning” algorithm has something to do with how children learn. Unfortunately, much of the public debate about AI is based on exactly those kinds of misinterpretation. Commentators on AI who have never actually used these tools (and many have not – Adrian Mackenzie is a prominent exception, who did the necessary work to use them extensively in his own research) may not understand the statistical calculations that are really being done, and run the risk of treating the sometimes fanciful names given to the algorithms (whether neuron, forest, seeing, understanding, learning) as if the machines were doing the same things that people do. If we use such words carelessly, they become a conceptual “suitcase”, bringing with them all the baggage of implications that the same words would carry in an everyday conversation about a person.
I’ve emphasised how naïve current discussion of these topics can be, when people don’t think carefully about what words mean. Of course, researchers do not give these names to their algorithms at random, or as jokes (at least, not often). Many are inspired by real research into the human brain, or visual anatomy, or linguistics. Some have even worked in neuroscience laboratories, perhaps measuring electrical impulses from the neuron of a fly or mouse, or measuring patterns of blood flow in human brain scans. These scientific ambitions help to motivate people doing this kind of work, but if you question them closely, they will admit that there is only the vaguest analogy between the “neuron” of a neural network (which has simply become a jargon term for a relatively simple mathematical function) and the complex electrochemical activity that would be observed in even a single animal neuron.
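As a minimal illustration of how vague that analogy is, here is what the “neuron” of a neural network amounts to in code: a weighted sum of its inputs passed through a simple squashing function. The weights in this sketch are arbitrary numbers chosen for illustration; in a real network they would be adjusted automatically to fit statistical patterns in training data.

    # A minimal sketch of the "neuron" of a neural network: a weighted sum of
    # its inputs passed through a squashing function. The weights here are
    # arbitrary illustrative numbers, not values learned from any real data.

    import math

    def artificial_neuron(inputs, weights, bias):
        weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1 / (1 + math.exp(-weighted_sum))   # the "sigmoid" squashing function

    # Three inputs, three weights, one bias: the whole "neuron" is one number out.
    print(artificial_neuron([0.5, 0.2, 0.9], [1.2, -0.7, 0.3], bias=0.1))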
New algorithms do get inspired by observations of animals, just as Leonardo da Vinci was inspired by birds’ wings (among other things) to design flying machines. However, lessons from the history of technology tell us that, although observing nature can be a source of inspiration, successful engineering usually works differently. Airplanes don’t flap their wings. When words like “neuron” get carried over directly from observation of nature to implementation of algorithms, it can be hard to remember which parts were scientifically observed statistical principles, and which parts were creative inspiration. And as I explained in the first chapter of this book, there are people who profit from avoiding proper definitions, blurring the distinction between objective statistical problems and subjective human interpretations.
So, to return to the initial question, “machine learning” currently refers to a variety of statistical algorithms that process numerical data (sometimes extracted from images or texts) to find patterns in that data. From this perspective, programming (or coding) has little direct relevance to AI – other than the mundane need to specify machine learning algorithms in a language that the computer will then discard once it starts on the supposedly “intelligent” part of using the data patterns.
If statistical machine “learning” were the only possible definition of how a machine could “learn”, then this would answer my initial question about the difference between programming and machine learning as follows: one of those things (programming) is the way we define all kinds of algorithms, while the other (statistical machine learning) is a particular kind of algorithm. However, there are other approaches to machine learning, which are not statistical, are not numerical, and don’t use neural networks. Some of those were widely understood in the era of Good Old-Fashioned AI, although as that name suggests, they are no longer fashionable, and might not even be mentioned in the introductory classes about statistical machine learning that are commonly taught as the basis of AI today.
The old-fashioned approach to machine learning, as with much GOFAI, followed a far closer analogy to human learning. In GOFAI, pieces of code recognisably resembled concepts that humans themselves might explicitly be taught, consciously perceive or talk about, rather than the statistical approach of attempting to break down raw data by some kind of poetic analogy to patterns of neurons in the brain. Writing the code of AI, in this old-fashioned approach, could be considered rather more like teaching a human. If the concepts are described using words, then teacher and student can communicate about what is being learnt. Human teachers don’t spend much time explaining how to recognise perceptual patterns of edges and shadows in order to read, or how to properly fire their neurons to do arithmetic, because this is not the right level of abstraction at which to discuss concepts.
The origins of GOFAI, much like the origins of computer science at the time of Turing, were more concerned with describing conceptual and logical principles than statistical ones. GOFAI programs worked with recognisable symbols like elephant and canary, and with logical relations between them such as (color elephant gray) or (color tweety yellow)[4]. These logical relationships do look more like the kind of things that might be taught in a classroom, and we can easily imagine teaching a computer to repeat such statements on command, as if we were teaching a child to pass an exam by memorising facts and logical relations. In the GOFAI era of AI research, from the 1950s until the end of the 20th century, the main challenge of machine learning was to create algorithms that could compute logical relations by combining coded and stored (we might say ‘memorised’, if we were being rather too literal) symbols. At a trivial level, we can imagine an algorithm that, after being taught (has cat fur) and (has dog fur), could then use those individual facts to learn a general concept, by working out for itself that cats and dogs have something in common. This kind of algorithm is also a lot closer to what a philosopher might conclude by introspection about how we ourselves go about learning. Indeed, many GOFAI researchers did work by introspection, thinking carefully about the logical implications of statements like these, and about the reasoning processes by which it is possible to draw general conclusions from combinations of facts and rules.
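To make this concrete, here is a small sketch in Python of that kind of symbolic generalisation. The representation is my own simplification for illustration, not the notation of the 1980s textbook cited above, but the principle is the same: facts are stored as explicit symbols, and a trivial rule works out what two concepts have in common.

    # A sketch of GOFAI-style symbolic "learning": facts are stored as explicit
    # symbols, and a trivial rule finds what two concepts have in common. The
    # facts and representation are simplified for illustration.

    facts = [
        ("has", "cat", "fur"),
        ("has", "dog", "fur"),
        ("color", "elephant", "gray"),
        ("color", "tweety", "yellow"),
    ]

    def properties_of(thing):
        return {(relation, value) for relation, subject, value in facts if subject == thing}

    def in_common(thing_a, thing_b):
        return properties_of(thing_a) & properties_of(thing_b)

    # The program "works out for itself" that cats and dogs share a property.
    print(in_common("cat", "dog"))   # {('has', 'fur')}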
As a result, in the GOFAI era, the everyday work of the AI researcher did look much the same as the everyday work of any other kind of programmer – writing down symbols, coding rules that related the symbols to each other, and coding logical (at that time, not statistical) algorithms that worked out further facts from the combinations of rules and symbols collected so far. This approach to coding knowledge also seems to resemble ideas about formal schooling for humans – memorising facts, and relations between them, and being able to make deductions. To the extent that computers store such encoded knowledge reliably, and also execute the code of the specified deductive processes accurately, we might even describe GOFAI computers as superior learners to humans, simply because they do those things more reliably, once they have been coded to do so.
However, despite the fact that anyone can “teach” a computer to remember and perform tasks perfectly by writing the appropriate program code, we are reluctant to say that the computer is “learning” when we program it so directly. School teaching might sometimes be mechanical, but learning is not supposed to be. Now that we routinely turn to computers for so many data processing tasks that previously required human labour, society has become increasingly uncomfortable about the value of rote learning, whether teaching children to recite historical dates that can be found on Wikipedia, to mechanically apply algorithms that are trivial with a calculator, or to turn informal speech into grammatical sentences that could have been generated by a language model. Although memorisation, grammar and arithmetic were highly valued in earlier centuries, training people in mechanical repetition is less popular today. Indeed, we use the word “programming” to refer to kinds of teaching that are morally objectionable, and talk about “deprogramming” when we rescue people from religious cults or other sites of repetitious indoctrination.
The greatest commercial boom in that earlier generation of explicitly coded, symbolic GOFAI was driven by the promise of “expert systems”. An expert system programmer or “knowledge engineer” attempted to capture all the specialist facts from a particular field of knowledge such as geology or cardiology, together with all the necessary rules for combining and applying those facts, and encode them into algorithms that might be able to replicate or substitute for expert judgments.
Unfortunately, expert systems of that time never turned out to be quite as useful as promised in the sales pitches. The most successful were able to operate like automated textbooks or reference manuals, correctly combining symbols that reflected the specialist terminology of the experts who had been consulted when coding them. The problem came when such systems were used by people who did not use the right formulation of specialist terms to describe what they wanted, but instead tried to relate the “expert” reasoning of the system to their own ordinary non-expert understanding of the situation. Expert systems often failed in brittle ways through lack of common sense. For example, such a system might prescribe medicine for a patient who was dead, because although the programmers had coded thousands of facts about medicine, they could easily forget to code things that weren’t mentioned, such as the common sense fact that the dead won’t benefit from medicine.
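A sketch of how that brittleness arises might look like the following. The rule, symptoms and prescription are invented for illustration: the point is that the system encodes exactly the specialist facts it was given, and nothing else.

    # A sketch of expert-system brittleness. The rule and drug are invented for
    # illustration. The system encodes a specialist fact but no common sense:
    # nothing tells it that a dead patient will not benefit from medicine.

    def prescribe(patient):
        # Rule coded by the "knowledge engineer": fever plus infection -> antibiotic
        if "fever" in patient["symptoms"] and "infection" in patient["symptoms"]:
            return "prescribe antibiotic"
        return "no prescription"

    patient = {
        "symptoms": ["fever", "infection"],
        "alive": False,   # the fact nobody thought to write a rule about
    }

    print(prescribe(patient))   # "prescribe antibiotic" – confidently, and uselessly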
From this historical background, it’s easy to see why statistical machine learning started to look attractive, and why GOFAI has become a historical curiosity with a funny name, no longer a priority for university teaching and research. The meaning of the word “learning” has changed in AI research – it is no longer related to the way children might learn in schools, or to the knowledge that experts might write about or extract from reference books. Statistical machine learning systems only look for patterns in numbers, and have not been coded to know anything about symbols like cat and dog unless they happen to see those sequences of letters in collections of text training data.
In the early days of the deep learning revolution, AI researchers collected many statistical examples of pixels from photographs that had furry texture patterns in them, and also noted the statistical frequency of the letters C, A, T when people uploaded particular kinds of furry pattern (this example is often a good starting point, because it turns out that there are plenty of photographs of C-A-Ts on the Internet). Programming was still involved. A programmer wrote the code that downloaded the photographs from the Internet, and the code that scanned the Instagram comments, Wikipedia pages, Facebook posts or whatever to find the letters C-A-T. This kind of learning looks a bit more like common sense, since so many people are apparently interested in cats, but the results seem to be a particularly trivial kind of learning – even more primitive than the GOFAI era – not so much like primary school, but more like an infant who can point at a picture and say CAT after being trained to do so. A child who did not progress beyond this stage would not be socially competent.
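A toy sketch of this statistical kind of “learning” might look like the following. Instead of real photographs, each image is reduced to a single invented number – imagine a “furry texture” score computed from its pixels – and the label records whether the letters C-A-T appeared in the caption scraped alongside it. “Training” is then just the school statistics described earlier.

    # A toy sketch of the statistical version of "learning about cats". The
    # texture scores and labels are invented; real systems use millions of
    # images and far more elaborate statistics, but the principle is the same.

    images = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]          # hypothetical texture scores
    captions_say_cat = [True, True, True, False, False, False]

    # "Training" is just statistics: the average score of each group, and a
    # threshold halfway between them.
    cat_scores = [x for x, label in zip(images, captions_say_cat) if label]
    other_scores = [x for x, label in zip(images, captions_say_cat) if not label]
    threshold = (sum(cat_scores) / len(cat_scores) + sum(other_scores) / len(other_scores)) / 2

    def looks_like_cat(texture_score):
        return texture_score > threshold

    print(looks_like_cat(0.85))   # True  – pattern matched
    print(looks_like_cat(0.15))   # False – pattern not matched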
Being able to say the right kind of things about a CAT – that a robot ought to stroke the cat gently, or that cats, while made of meat, should not be cooked and eaten – involves knowledge that is routinely acquired by embodied children, but cannot be extracted from a photograph alone. Social competence is not a matter of direct perception, but of systems of moral discourse through which we learn and describe what ought to be done. In the GOFAI era, these kinds of rules of behaviour were explicitly encoded as conceptual symbols. In today’s large language models, moral discourse comes by association – words like “stroking” are very likely to be associated with cats, while “is a cat meat” raises problematic questions (having just tested this, the model initially answered no, before admitting that cats are constituted of meat, then preferring to discuss humans eating pet food). It’s unlikely that robots will either be stroking or eating cats in the near future, but as with small children learning about cats, there are times when we would like computers to do what we tell them. This is the great challenge for machine learning: how the simple correlations between things that one might have seen on the Internet can become a useful basis for telling a machine what to do differently in future.
We do already have technologies optimised for telling a computer what to do – this is the whole point of programming languages. The GOFAI era saw dramatic advances in programming languages, and in the variety and sophistication of concepts and relationships that could be symbolically described. Unfortunately, practical deployment of expert systems resulted in so many disappointments through their brittle failures of common sense that a common saying among AI researchers was “if it works, it isn’t AI”. Nevertheless, many of the advances made in programming languages and algorithms through that period are now commonly taught as basic parts of the computer science curriculum. The flip side of defining AI as things that don’t work is that, once AI algorithms do work reliably, we don’t consider them AI any longer. If AI is the branch of computer science dedicated to imagination, and to making computers work the way they do in the movies, then boring products that just work are not exciting enough to be defined as AI. Most of the things that we do with computers today (including me typing these words right now) would have seemed like magic 100 years ago. The technical advances that actually allow us to instruct computers for practical purposes might have seemed like AI once, but now they are just coding.
Explaining this carefully, and perhaps at greater length than a computer scientist would have wanted, makes it clear that although there is a huge difference between the technical methods described as “machine learning” and “programming”, many of the things that we might have hoped could be achieved through “machine learning” – including most of those that have moral implications – actually require “programming”.
Machine learning algorithms can help us with the more mundane aspects of programming, and I’ll be discussing those opportunities in far more detail in chapter [XX 11 XX]. Using AI to help us write code can reduce the costs of attention investment, tipping the balance in our favour. However, we have to contrast these useful applications with AI systems that consume our attention without offering any of the benefits of Moral Codes. The next chapter will look more closely at how the attention-consuming systems work.
[1] Alan F. Blackwell, "The Age of ImageNet’s Discovery," in A cat, a dog, a microwave ... Cultural Practices and Politics of Image Datasets, ed. Nicolas Malevé and Ioanna Zouli. (London: The Photographers’ Gallery, 2023). See also Couldry and Mejias, The costs of connection.
[2] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning. (New York: Springer, 2013).
[3] Adrian Mackenzie, Machine learners: Archaeology of a data practice. (Cambridge, MA: MIT Press, 2017).
[4] These are actual lines of code from the textbook used in my first AI class: Eugene Charniak and Drew McDermott, Introduction to Artificial Intelligence. (Reading, MA: Addison Wesley, 1986), pages 16-20.