Chapter 8: Explanation and transparency: beyond no-code / low-code
Sumit Gulwani, who now leads an AI-assisted programming team at Microsoft, tells the story of a plane flight when his neighbour couldn’t believe her luck to be seated next to one of the best qualified young computer scientists at Microsoft. She brought out her laptop to show him how she had been struggling with one of those annoying repetitive tasks described in chapter 2. Sumit loves helping people, but was disappointed there was no obvious way in Excel to automate the many rows of data formatting she needed to do. His own research had been on the topic of program synthesis - how to automatically create a program that transforms one set of data into another - so his neighbour’s problem was exactly the kind of thing he believed should be automated.
Sumit’s job at Microsoft Research was not to help Excel users with repetitive tasks. As with most experts in programming language research, he had been assigned to a group responsible for high-end software engineering tools, and was working on performance of the company’s core database products. But in a kind of personal skunkworks[1], he got obsessed with reading online forums where people shared their problems getting jobs done in Excel. He suspected that many problems faced by the woman on his flight, and others like her, could be automated using program synthesis. A typical Excel forum post shows a few data cells, followed by an appeal for help. Contributors respond with a formula that solves the problem based on the data supplied, and it seemed like a program synthesis algorithm should be able to do the same thing. It took Sumit 3 months reading forums to work out the sweet spot where a particular kind of cell data could be used to automatically create an appropriate formula. But after that intensive search for the right problem, it took only 3 weeks working at home to create the first prototype of the Excel feature now famous as FlashFill.
Sumit is now a computer science celebrity, his group is thriving, and FlashFill is often cited to illustrate the next generation of AI-assisted tools for end-user programming. But just as with celebrity actors or pop stars, this apparently sudden success actually followed years of hard work and early talent. Sumit had already made other prize-winning research contributions in program synthesis before he even started work on FlashFill. But it was the months immersed in user forums that led to the breakthrough he is most famous for. His team now includes experts in HCI and design, not just machine learning and programming languages. And he is clear about how important it was to place a relentless focus on the best possible user experience, when he started to work on FlashFill with the Excel product team. In contrast to the often denigrated office assistant Clippy, whose supposedly helpful suggestions were either trivial or off-beam, the Excel team insisted that FlashFill must respond with immediately useful advice. The mathematics of program synthesis mean that three or four data examples increase the likelihood of a correct solution. But the team told Sumit that FlashFill must work correctly with even a single example[2]. The lessons from chapter 2, explaining the basis of user decision making in attention investment, make it clear why this has been the right choice, and has been critical to making FlashFill such a successful product.
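To make concrete why extra examples help, here is a minimal sketch of programming by example. It is not FlashFill's algorithm: the tiny language (split on a delimiter, pick a token, optionally change case) and the names `DELIMITERS`, `CASES`, `run` and `synthesise` are all assumptions invented for illustration. The sketch simply enumerates every program in that language and keeps those consistent with the examples supplied.

```python
from itertools import product

# Hypothetical mini-language: split the text on a delimiter, pick the
# token at a given index, then optionally change its case.
DELIMITERS = [" ", ",", "-", "@", "."]
CASES = {
    "same": lambda s: s,
    "upper": str.upper,
    "lower": str.lower,
    "title": str.title,
}


def run(program, text):
    """Apply a (delimiter, index, case) program to one input string."""
    delim, index, case = program
    tokens = text.split(delim)
    if index >= len(tokens):
        return None  # this program does not apply to this input
    return CASES[case](tokens[index])


def synthesise(examples, max_index=4):
    """Return every program that reproduces all (input, output) examples."""
    candidates = product(DELIMITERS, range(max_index), CASES)
    return [p for p in candidates
            if all(run(p, given) == wanted for given, wanted in examples)]


# One example leaves two candidate programs; a second example removes one.
one = synthesise([("Ada Lovelace", "Lovelace")])
two = synthesise([("Ada Lovelace", "Lovelace"), ("GRACE HOPPER", "Hopper")])
print(len(one), len(two))
```

With a single example the sketch cannot tell whether the user wants the surname copied as typed or converted to title case. A system that has to answer from one example needs some way of ranking the surviving candidates, which this toy version does not attempt.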
Sumit Gulwani is especially pleased that FlashFill recently appeared in middle school textbooks in his native India, used to illustrate how Excel can be used for the data cleaning and reorganisation tasks (often called “data wrangling”) that are central to a desirable career in data science. He’s passionate about education and technology, following his own early experiences. Despite scoring in the top 200 of a million applicants for the prestigious Indian Institutes of Technology, he dropped out of their engineering programme, retaking the exams to gain one of the very few computer science seats. He now welcomes students from India as members of his PROSE team[3], and the team are working on AI-assisted tutoring that could help provide more relevant feedback to kids learning to code who do not have easy access to learning from good teachers.
Gulwani’s PROSE group continues to push the boundaries of AI-assisted programming in many ways, and is perhaps the most established team in the world dedicated to realising the agenda promoted in this book. Current areas of research include FlashFill++, which shows users the rules learned by the system, potentially to be debugged, modified, or shared. Members of the group are also working on synthesising code snippets for larger-scale software projects, and tools that assist the programmer by suggesting code edits. They describe their approach as “neuro-symbolic,” combining logical reasoning about the code with neural network models of programmers’ preferences. They are also alert to the problems of stochastic parrots, in part because Microsoft is so diligent about not recording data from spreadsheets for model training, and also because of the legal challenges if models were trained to replicate copyrighted code - a problem that I discuss further in chapter 11.
Improving control and transparency with More Open Representations
The potential of program synthesis methods like FlashFill has encouraged a trend with the catchphrase “no-code / low-code,” imagining software development without conventional programming languages. This is related to my own recommendations, although I think we would do better to ask what kind of code is needed, rather than to hide or remove the code, which is a design strategy that raises doubts about transparency and control. Indeed, the ambition of “no-code” repeats the historic mistakes I already discussed in the last chapter, when researchers tried to reject text altogether in the style of Jonathan Swift’s philosophers of Lagado.
Sumit Gulwani suggests that we need to integrate different kinds of code, including the spreadsheet table itself, explanation of the synthesised code in FlashFill++, perhaps code recommended from a natural language description, or even visualisations like those from my student Maria Gorinova, who used Gulwani’s PROSE tools to create a playful data wrangling tool called Data Noodles[4]. Sumit calls these designs “multi-modal”, for the way they integrate visual, linguistic and mathematical styles of interaction, and I think they do demonstrate the need for More Open Representations, as a central element of Moral Codes. Indeed, many of the approaches now being taken in no-code projects are descended from the end-user programming innovations of Allen Cypher and his collaborators that I described in chapter 2, or from long-term programmes of research such as the demonstrational interfaces of prolific user interface and end-user programming innovator Brad Myers and his team at CMU[5].
Some recent low-code initiatives have replaced textual programming languages with alternative notations of the kind that were previously studied as “visual programming”. This combination of textual and diagrammatic elements is the same design strategy that I see as the basis for Moral Codes. However, the catchphrase “no-code” really appeals to the longstanding ambitions of first-generation HCI researchers such as Ben Shneiderman, who first described the windows, icons, menus and pointer (WIMP) style of direct manipulation as “a step beyond programming languages.”[6]
Do we need code or not? Where personal computer operating systems had originally presented their users with a command line interface for typing program-like text instructions, the WIMP interface transformed the visual focus of the display, no longer focusing on the code-based world of the computer scientist, and instead illustrating the user’s own documents and activities. By hiding away the code of the command line, users could concentrate on what interested them - their own data. But the problem with the no-code agenda, as with AI, is that it can be difficult, without a sufficiently powerful language, to explain what you want the computer to do, especially if this is something that wasn’t already anticipated by the designer[7].
Representing code by representing data
The purpose of a computer is of course to manipulate data. In most of the ways computer software affects our lives, the data might come from us, but the algorithms come from somewhere else. Whether a bank, a school, or a government, our only opportunity to control what happens next is by deciding what data we give them. The actual algorithms that decide whether we pass an exam or have our mortgage approved are seldom visible, and certainly out of our control. We can control the data - in fact we are the data, as far as the algorithm is concerned. To a digital system, there is no further reality to you as a person, outside of the particular set of data that it knows about you. Fortunately, most of us do have actual physical lives outside of computer data, although some are more virtual than others, if too much of their personal identity has been constructed within the databases of Instagram or YouTube.
If we are going to gain more control over our virtual lives, there needs to be a way for users to move from displaying what we already knew anyway (the data about our own lives and livelihoods) toward representing what is hidden (the algorithms that control our lives by processing our data). AI researchers increasingly describe this as a problem of explanation - telling the user why a particular decision was taken, or transparency - being clear about what is being done with the data. Both explanation and transparency could become more difficult to achieve in the “no code” world, if the algorithm was defined in such a way that there was literally no code to see. A Moral Code (whether textual, pictorial, diagrammatic, or some combination of these) should offer both visibility and control.
The rest of this chapter describes a variety of design approaches to novel codes that expose the relationship between data and algorithms. In a way, the suggestions I am making are the opposite of the business model of surveillance capitalism as it has been described by Shoshana Zuboff[8]. In surveillance capitalism, companies routinely collect data about your life, usually without drawing attention to what they are doing, other than an initial invitation to read a lengthy end-user licence agreement that few people ever look at. Although legislation is being introduced that forces companies to reveal (only on request) what they know about you, and to (trivially and repeatedly) ask for your consent, the vast majority of this data - the actual substance of your own virtual lives - is hidden from you, and processed by algorithms that are completely secret.
Machine learning technologies seem to have worsened the exploitative nature of surveillance capitalism, because observations of your own behaviour can be sold back to you, having been used to train the algorithm behind some AI service or other. The most profitable companies are careful to ensure that you are not paid for your data. Of course you are charged for the stored products of your own intelligence when they are sold back to you, and critical algorithm scholar Nick Seaver explains how the products themselves are delivered via algorithms for captivation - AI recommendations and playlists that trap your attention, training you to expect more of the same[9].
The spreadsheet: humble data table, or radically Moral Code?
As Sumit Gulwani recognised in his own “aha” moment on that aeroplane, there is one class of product that has established a quiet revolution in letting people control and manipulate their own data - the spreadsheet. Expert spreadsheet users will know that the spreadsheet really is a kind of programming language. In fact, researchers have calculated that there are more spreadsheet programmers than there are programmers in all other programming languages combined[10].
At first sight, a spreadsheet does not look very much like a programming language, but a central message of this book is that Moral Codes will probably not look like familiar languages of the past. Even the GUI started life as an experiment in making programming languages more accessible, despite the fact that it quickly evolved beyond those programming elements, to the extent that some GUIs today focus only on data collection to sell services, rather than supporting user control as they should do. While spreadsheets and GUIs are often described as a starting point for low-code and no-code alternatives to programming, the message of this book is that these visual displays are all representational codes, and that hiding the code instead would be a surrender to hidden AI mechanisms.
However, spreadsheets are generally marketed to businesses, not their customers. Every business relies on algorithmic accounting procedures, and spreadsheets became popular because they provided such an accessible way to define financial processes. Nevertheless, although marketing and perception of spreadsheets assumes these business tools won’t be relevant to ordinary people’s lives, the design principles offer important lessons for how everybody could gain further control over algorithms by starting from data.
The brilliant innovation of the spreadsheet was to make data more visible than code. Until then, most programming languages showed only code, with data never seen until it was time to test the program. It’s easy to see how this became an obstacle to learning and popularising programming. It is our data that is directly relevant to our lives, while algorithms are typically created and discussed by experts we never meet. In old-style programming languages, the user interface of the programming tools hides the part that you understand (your data) and shows the part that you don’t (the code).
The spreadsheet reverses this convention, showing all of your data, and putting the code away in formulas and macros that you must specifically ask for. In one way, this is the same design insight that led to the graphical user interface and the desktop metaphor, where the screen shows the things that interest you (in the case of the GUI, your files and applications) rather than a command line window containing technical jargon. In the original GUIs, it was still possible to access the command line to operate on your data algorithmically, but those capabilities are increasingly hidden away, and are not available at all on most tablet devices and smartphones.
By contrast, in a spreadsheet, every piece of your data can be processed algorithmically by creating formulas. An expert spreadsheet programmer can build complex applications by combining many formulas across the various cells and tabs of a large spreadsheet. Fortunately, this power also comes with a gentle beginner’s slope, which allows even early learners to do simple sums and create databases, typing one or two equations and arranging their information in relevant rows and columns. The functionality of Gulwani’s FlashFill is right on this gentle slope, although the ambitions of his research team go much further.
The fact that your data is always visible in a spreadsheet is both reassuring and helpful. Because you understand how the data relates to your life, you can review the tables at any time to confirm that they look sensible, and to read off any value you might be looking for. But a trade-off comes with this simplicity, that complicated programs are harder to create and manage. Unlike a regular programming language, where pages of code explain what you want the computer to do, in a spreadsheet you only see the instructions one line at a time, and it’s difficult to keep the whole structure clear in your head while putting many invisible pieces together. Even worse, if one of the formulas has a mistake in it, that mistake will be hidden away up until the time that it causes something to go obviously wrong in some of the visible data. As a result, spreadsheets often have hidden errors in them[11], and famous business disasters have been caused by code errors in a spreadsheet formula.
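The trade-off can be sketched with a toy sheet. Everything here is a hypothetical simplification: the `sheet` dictionary, the `("SUM", [...])` formula form, and the helper names are assumptions for illustration, not how any real spreadsheet stores its cells. The data view shows a plausible total, while the mistake (a cell left out of the range) is only visible if you ask to see the formulas.

```python
# Hypothetical toy sheet: a cell holds either a number or a formula,
# written here as ("SUM", [cells to add up]).
sheet = {
    "A1": 120,
    "A2": 80,
    "A3": 45,
    "A4": 200,
    "A5": ("SUM", ["A1", "A2", "A3"]),  # intended total, but A4 was left out
}


def value(cell):
    """Evaluate a cell, following formula references recursively."""
    content = sheet[cell]
    if isinstance(content, tuple):
        _, refs = content
        return sum(value(ref) for ref in refs)
    return content


def data_view():
    """What the user normally sees: values only, formulas hidden."""
    return {cell: value(cell) for cell in sheet}


def formula_view():
    """What the user must specifically ask for: the code behind the values."""
    return {cell: c for cell, c in sheet.items() if isinstance(c, tuple)}


def unreferenced_inputs():
    """A simple audit: data cells that no formula ever reads."""
    used = {ref for c in sheet.values() if isinstance(c, tuple) for ref in c[1]}
    return sorted(cell for cell, c in sheet.items()
                  if not isinstance(c, tuple) and cell not in used)


print(data_view()["A5"])      # a total that looks sensible on its own
print(unreferenced_inputs())  # the audit points at the forgotten cell
```

The audit function is one small example of making hidden structure visible. It does not prove the formula is wrong, but it draws attention to a place worth checking.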
So spreadsheets are easy to use (because they display your data rather than the algorithm), but also dangerous because of the design trade-offs resulting from hiding the code. Fortunately there are alternative trade-off choices, and the next section considers the underlying principles that might help us repeat the design innovations of the spreadsheet and the GUI, with new inventions that can make the power of algorithms more accessible.
Moral Codes as visual explanations
We urgently need improved access to algorithms in the era of machine learning systems, because until now these systems have been especially poor, hiding both your data and the algorithm, so that you have no way either to control or to understand what they are doing. The goal of “explainable AI” is to improve this situation, and the design approaches described here are going to become the most practical way to achieve genuinely explainable and controllable AI. As always, these approaches can be considered a kind of programming language, though once again they do not look like conventional code.
The different technical terms for this field include “interactive visualisations”, “intelligent user interfaces”, “visual programming languages”, “business modelling languages”, “visual formalisms” or “notational systems”. Sumit Gulwani describes them as “multimodal interaction”. But what they all have in common is that they use the display to communicate structure. In some cases they display the structure of an algorithm (for example, like a flowchart with decision points and loops), while in other cases they display the structure of your data (the structure of rows and columns in a spreadsheet is especially clear).
These notational systems offer both explanation and control, which is not true of all recent approaches to explainable AI. You can imagine why a surveillance capitalism business might prefer to collect (and hide) your data, process it with a (hidden) algorithm, and then offer to explain what it has done afterward, rather than offering you any freedom to make the system work differently. Many older textbooks and courses in data visualisation do exactly this - they explain how to draw beautiful graphs illustrating what has already happened, but not how to let the user modify or control the data. Regular users of Excel spreadsheets will know how easy it is to create a bar chart based on values in a table, while it is not at all easy in Excel to create a display that allows the user to modify or explore code and data by clicking and dragging the elements of a visualisation around.
It might seem overly idealistic to suggest that people could ever control algorithms as easily as they “control” those aspects of their everyday lives that generate data, for example when buying a coffee or taking a bus. But let’s consider what the design opportunities might be, to make things work differently. I’ve pointed out that mobile phones are easy to use because they show your data (contacts, messages, photos, whatever) while hiding the algorithms that process it. Spreadsheets offer a clear visualisation of data structure, and offer some access to create algorithms, but they hide the code in a way that invites errors. Conventional programming language code doesn’t show any data at all, until it’s time to run the program. All of these alternatives represent trade-offs and design decisions, in which different choices have been made about what kinds of structure to show on the display.
There is limited space on a typical screen (especially on a phone), so it is inevitable that software will be designed to hide some structures and reveal others. But it’s not necessary to show only code structure, or only data structure. Many systems are able to combine code and data, using visual design elements to integrate them in meaningful ways. If you look closely, every movie poster, bus ticket, till receipt, instruction manual, festival programme, train timetable, and even a page in this book, is designed to organise and reveal particular aspects of data structure and the relationships between the parts while hiding others. If the display is on an interactive computer screen, it might include visual clues to the algorithms that are involved, for example highlighting or rearranging things in response to your actions, or using animations to show you how there might be causal interactions underneath the surface.
Spreadsheets have been around for decades, and programming languages even longer. They both date from an era when it was unusual to see pictures, curved lines, or even different colours on the screen of a computer. In a sense, it is only conservatism that has resulted in us still being stuck with such a small range of options for code today, although tables and indented paragraphs are certainly helpful. The original designers of Sketchpad, Smalltalk and the GUI were not thinking within these constraints when they started building the first graphics-based personal computers. Their early prototypes included graphical programming languages that combined handwritten symbols and text to control new algorithms, not simply collecting data to be processed by somebody else’s algorithm. However the huge success of the GUI, and the growth of the surveillance capitalism market, has helped to discourage the idea that new visualisations could allow people to observe and control algorithms as well as data.
Machine learning systems, until now, have not made things much better. Current enthusiasm for LLM-based chatbots will not help at all. But there are ways to design new interactive visual displays that provide the potential benefits of machine learning already explained in previous chapters, including access to both data and algorithms, in combinations that are appropriate to particular kinds of user trying to achieve particular kinds of things.
Inventing Moral Codes as design hybrids
One way to experiment with new design opportunities, exploring the space of possible trade-offs between control of code and visibility of data, is to build prototypes that combine the capabilities of familiar products in unfamiliar ways. The resulting prototypes are hybrids that help us think about possible futures in which users could have more control over their software. Hybrid prototypes should always try to explore interesting new combinations, but will not necessarily be useful (or at least, not useful in immediate or obvious ways). More often, they are useful, but perhaps only to a small group of people, or for an unusual specialist task, or perhaps for a neglected community who would not otherwise receive the attention of software designers.
My own research group works in all these ways, reporting the results of our explorations to specialist research communities such as international conferences on Theory and Application of Diagrams, Visual Languages and Human-Centric Computing, or the Psychology of Programming Interest Group. The experimental software applications that we create sometimes address business problems, sometimes critical healthcare issues such as the control of bleeding after open heart surgery[12], and sometimes non-commercial applications as diverse as teaching mathematics to the children of traditional hunter-gatherers in the Kalahari Desert[13], or improvising electronic dance music in a nightclub “algorave”[14].
In all cases, our software experiments include some elements that are recognisable from previous programming languages, interactive visualisations, or demonstrations of machine learning, while also opening up, hiding or exposing different aspects that draw attention to new opportunities. The research objective is to better understand ways of encoding in relation to human needs and lives, making the world a better place through new languages and representations - indeed, the Moral Codes of this book’s title.
The previous paragraphs have offered a rather abstract description of why we do our research, but no detail on what the results look like. This is in part because there are too many different projects to include in a single book. Many of them have been described at book-length in PhD theses, as well as hundreds of academic publications. While it’s useful to tie together the overall logic of these many projects across decades of work, it’s also helpful to get a flavour of what we do and why, through a few examples that illustrate this research strategy.
Mariana Mărășoiu is a Romanian-British computer scientist who I first met when she was working as an intern for Google. She still teaches interaction design in my department in Cambridge, although she spends much of her time in the Scottish Highlands, where she is re-wilding 30 acres of forest. Her research has subverted the conventional approaches to data science and visualisation by creating tools that reverse the workflows and expose the hidden algorithms of familiar spreadsheet tools. One of her projects, which she calls Self-Raising Data[15], counters the deterministic logic by which data is harvested and turned into business visualisations.
In her work with a research sponsor, she learned that professional data scientists are often asked to contribute to problems where there is not actually (yet) any data. All of their communication tools had been designed on the assumption that the data comes before the visualisation, meaning that the data scientist had no way to contribute to important business problems. Mariana built an award-winning prototype (fig 8.1) allowing data scientists and business people to collaborate on visualisations of hypothetical or imaginary data, generating the data to illustrate alternative models.
In another of Mariana’s projects, Cuscus[17], she addressed the problem that users have so little control over how quantitative data is visualised. Users of spreadsheets and scientific data visualisation packages get a choice between bar charts, line graphs, scatter plots, pie charts, and very little else. Although very useful indeed, and familiar to most of us since school mathematics, this standard set of data visualisation choices has hardly changed since they were invented by William Playfair in the late 18th century for his Commercial and Political Atlas[18]. It is really a remarkable indictment of the lack of originality in the software industry and modern business, that practically all of their data visualisations were invented by one man over two centuries ago.
Graphic artists and designers can, of course, render quantitative data in many other ways. Some, such as Otto Neurath’s Isotype[19], were brilliant design innovations that advanced democracy by giving far wider access to understanding national economies and business. Data journalists today create infographics that throw light on complex problems as well as deploying powerful visual rhetoric to advance political or humanitarian causes. My bookshelf includes many beautiful collections of such work, including reference collections such as those by Edward Tufte that are popular purchases for computer scientists wondering how they might create more appealing graphics themselves[20].
Skilled programmers have technical tools to generate a wider range of graphics, beyond those provided in business tools like Excel and Powerpoint, using web graphics libraries such as the popular d3.js[21]. But these are tools for experts, demanding years of training to produce a new kind of visualisation, making such innovations rare outside financial “dashboards” built by wealthy businesses, or global scientific initiatives like Julian Allwood’s Sankey diagrams of energy consumption[22].
Mariana’s Cuscus system (fig x.x) offered more democratic access to alternative data visualisation by exposing the code that draws the graphics, and presenting this code as a familiar spreadsheet rather than in a web scripting language. Users can create flowers, mountains, trees, or Isotype-style human figures by combining any kind of geometric elements. The colours, sizes, positions and relationships of those elements are all represented by numbers in the accompanying spreadsheet. When combined with a data spreadsheet, the correspondence between “visual variables” (as they were described by graphic design theorist Jacques Bertin[25]) and the original data can be defined by pasting cell formulas, just like an ordinary spreadsheet. The result allows any spreadsheet user to invent a completely new data visualisation, no longer constrained to bars, pies and line graphs. In some ways, Cuscus is less convenient than graphics programming languages like Processing that are popular with designers and visual artists[26], but in other ways Cuscus is much easier, for example when debugging by looking directly at the coordinates of the shape to see where a problem might be.
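The idea of tying visual variables to data through cell formulas can be sketched as follows. This is not the Cuscus implementation: the `data` table, the `visual_formulas` dictionary and the SVG output are assumptions chosen to keep the example small. Each visual variable is a column whose "formula" is computed from the matching data row, so the shape table can be inspected as plain numbers.

```python
import math

# Hypothetical data table, one row per item to be drawn.
data = [
    {"drink": "Tea", "cups": 36},
    {"drink": "Coffee", "cups": 64},
    {"drink": "Cocoa", "cups": 16},
]

# One "formula" per visual variable, computed from a data row and its index.
visual_formulas = {
    "cx": lambda row, i: 50 + 100 * i,             # horizontal position
    "cy": lambda row, i: 60,                       # vertical position
    "r": lambda row, i: 5 * math.sqrt(row["cups"]),  # area tracks the data
}


def shape_table(rows, formulas):
    """Fill in the shape spreadsheet: one row of numbers per shape."""
    return [{name: f(row, i) for name, f in formulas.items()}
            for i, row in enumerate(rows)]


def to_svg(shapes):
    """Draw each row of the shape table as a circle."""
    circles = "".join(
        f'<circle cx="{s["cx"]}" cy="{s["cy"]}" r="{s["r"]}"/>' for s in shapes
    )
    return ('<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="300" height="120">{circles}</svg>')


shapes = shape_table(data, visual_formulas)
for row in shapes:
    print(row)  # debugging by reading coordinates directly
svg = to_svg(shapes)
```

Changing one formula, for instance making `cy` depend on the data, changes every shape at once, which is the spreadsheet-style behaviour the paragraph describes.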
The starting point for projects like Cuscus and Self-Raising Data was to ask whether the conventional business workflow of collecting data, making graphs in Excel, and pasting them into Powerpoint presentations, could become more creative and varied, changing the design priorities of those products to make different elements of their algorithms visible and controllable by users. One of my own research prototypes, an interactive visual tool called Palimpsest, explored whether photographs and paintings could be algorithmically controlled using internal computations related to those in Excel[27].
Palimpsest applied methods from early AI research to create a more powerful version of the popular art and design package Photoshop[28]. Expert users of Photoshop will be familiar with the way that an image gets built up in layers, for example in the notorious process of “Photoshopping” - using a digital airbrush to remove blemishes or parts of a scene. A skilled user will import the original photograph as a reference source, then draw on top of this in a second layer, as if using tracing paper or the part-transparent animation cels of early movie animators. Multiple edit layers can be added and removed, hidden, or made more or less transparent in order to blend multiple effects.
In my Palimpsest system, I played around with this idea of layering, turning it into a new code resource. (The word “palimpsest” originally refers to an old manuscript in which a parchment has been re-used, with new layers of text on top of an older document, where excavation can reveal hidden meanings or poetic re-interpretations.) In Photoshop, the data might be organised into layers, but the user can’t change the algorithm - their only option is to create new layers, adjusting the appearance and amount of transparency. In Palimpsest, the pixels on each layer are diagrammatic code that can cause things to change on other layers, resulting in animations or even new visual languages and algorithms. The inspiration for how this works came (again) from the spreadsheet, where data entered in one cell can cause other cells to change. Each layer in Palimpsest works like a cell in a spreadsheet, making the overall system an experimental hybrid of Photoshop and Excel. The relationships can become complex, and I adapted methods from AI programming languages such as “constraint propagation” to make the resulting system more powerful. Products like Excel use similar methods, but within an agenda to make a mathematical tool that is as simple and obvious to use as possible. The Palimpsest model is not (yet) simple and obvious, but it is certainly different. I’ve used it in some stage performances and art projects, and it continues to inspire different ways of thinking about code and image culture, as well as including new interaction techniques that might pop up in other end-user programming systems or graphics packages.
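The "layer as spreadsheet cell" idea can be sketched in a few lines. This is an illustrative assumption, not Palimpsest's code: the `Layers` class and the `invert` and `average` operations are hypothetical, and the sketch uses simple one-directional recomputation, which is far weaker than the constraint propagation mentioned above.

```python
class Layers:
    """Toy layer stack in which a layer is either pixels or a formula."""

    def __init__(self):
        self.pixels = {}  # layer name -> grid of grey values (0 to 255)
        self.rules = {}   # layer name -> (function, names of input layers)

    def paint(self, name, grid):
        """Set a layer's pixels directly, like typing a value into a cell."""
        self.pixels[name] = grid

    def define(self, name, fn, *inputs):
        """Make a layer depend on other layers, like entering a formula."""
        self.rules[name] = (fn, inputs)

    def get(self, name):
        """Return a layer's pixels, recomputing derived layers on demand."""
        if name in self.rules:
            fn, inputs = self.rules[name]
            return fn(*(self.get(layer) for layer in inputs))
        return self.pixels[name]


def invert(grid):
    return [[255 - p for p in row] for row in grid]


def average(a, b):
    return [[(p + q) // 2 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


stack = Layers()
stack.paint("photo", [[0, 255], [128, 64]])
stack.define("negative", invert, "photo")
stack.define("blend", average, "photo", "negative")

before = stack.get("negative")
stack.paint("photo", [[10, 10], [10, 10]])  # repainting one layer...
after = stack.get("negative")               # ...changes the layers that read it
```

Here dependencies only flow from inputs to outputs, as in a basic spreadsheet recalculation. A constraint-based system could also let relationships be solved in more than one direction.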
One last project from my group, to illustrate this hybrid design philosophy, is the work of Advait Sarkar, who was inspired as a student by Sumit Gulwani’s work on program synthesis, and by the way that FlashFill made these capabilities available to so many people. Like Sumit, Advait grew up in India, where his parents were technology journalists in Bengaluru.
Among the many imaginative prototypes Advait has created is a system called BrainCel (Fig 8.4) that quite literally combined the user interface approaches of spreadsheets and machine learning algorithms to improve their accessibility through interactive visualisations[30]. In BrainCel, users can still see their data as in any spreadsheet, but can also see how patterns in the data are being extracted by a machine learning algorithm. The relationship between labels and data features is the critical element of all such algorithms, and these algorithms increasingly have a substantial impact on our lives when organisations use this data to make decisions in ways that are not transparent.
BrainCel is just one example of a tool that presents an alternative relationship between the work of labelling training cases for supervised machine learning, which is usually done by unskilled workers, and the interpretation of the clusters or correlations that might be extracted from the labelled features, usually done by corporate data scientists. By making these things visible, and by supporting workflows that build on the diversity of human expertise rather than isolating mundane labour from moral judgement, tools like BrainCel point the way toward forms of machine learning that are both more ethical and better informed.
Imagining future programming tools
These examples of work from my own group are typical of research currently being done in academic communities concerned with Intelligent User Interfaces (IUI), Visual Languages and Human-Centric Computing (VL/HCC), or Systems, Programming, Languages, and Applications: Software for Humanity (SPLASH). It is not unheard of for many different kinds of visualisation, analysis, and interactive code editors to be combined into sets of tools for professional programmers that are known as Integrated Development Environments or IDEs. When new kinds of visualisation or interaction are invented in research groups like mine, they often become available after a decade or two for general use as new features in the mainstream IDE products.
Spreadsheets like Excel, and business data processing tools like PowerBI from Microsoft or the Tableau visual analytics platform, are becoming more like the IDEs that were previously used only by professional programmers. All such tools focus attention on visualising the user’s own business data, rather than screens full of programming language code as in an IDE. But multiple interactive representations offer new potential to all these kinds of user, including ways to automate coding through recommendations from machine learning models. It is easy to see the potential for these combined advances in machine learning, notation design and interactive visualisation to support Moral Codes: More Open Representations, Access to Learning, and Control Over Digital Expression. Business data analytics tools can be considered as “high-end” alternatives to the spreadsheet, but we can also imagine future hybrids of the GUI as “low-end” alternatives for regular people who would benefit from openness, learning, and control, but without needing the full facilities of a spreadsheet.
Why hasn’t this happened earlier? The case studies I presented in this chapter have focused on business problems, because businesses pay the salaries and scholarships that a PhD student needs to live. When I started my own research career, the main international funder of AI research was the US military. When national research policies focus so heavily on technology for business or defence, it is rare to find original work in computer science that explores the questions of human wellbeing and values at the core of this book. Just as with the software itself, we only make progress by attending to one priority while being committed to another. The next chapter investigates in more depth how commercial and research trends can obscure the most interesting and empowering opportunities.
[1] This term refers to a product innovation within a company that has not been explicitly approved by senior management. Many case studies document how critical these can be to the long-term success of leading manufacturers. Perhaps the most famous example in the computer industry is documented in Tracy Kidder’s Pulitzer Prize-winning book The Soul of a New Machine (Boston, MA: Little, Brown, 1981).
[2] As a result, the deployed product pushed the state-of-the-art in program synthesis in two ways: (a) introducing efficiency in the synthesis process via goal-directed top-down synthesis based on symbolic backpropagation, (b) dealing with ambiguous specifications by incorporating program ranking techniques.
[3] Details of that program can be found at https://aka.ms/prose-research-fellowship
[4] Maria I. Gorinova, Advait Sarkar, Alan F. Blackwell, and Karl Prince. "Transforming spreadsheets with data noodles." In Proc. 2016 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (2016), 236-237.
[5] Brad A. Myers, "Demonstrational interfaces: A step beyond direct manipulation". Computer 25, no. 8 (1992): 61-73.
[6] Ben Shneiderman, “Direct Manipulation: A Step Beyond Programming Languages,” Computer 16, no. 8 (Aug. 1983): 57-69.
[7] Of course, even with the help of a powerful language, a complex problem may remain fundamentally complex.
[10] Christopher Scaffidi, Mary Shaw, and Brad Myers, "Estimating the numbers of end users and end user programmers." In Proc. 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05), 207-214.
[11] Raymond R. Panko, "What we know about spreadsheet errors." Journal of Organizational and End User Computing (JOEUC) 10, no. 2 (1998): 15-21.
[12] Diana Robinson, Luke Church, Alan F. Blackwell, Alain Vuylsteke, Kenton O'Hara and Martin Besser. “Investigating Uncertainty in Postoperative Bleeding Management: Design Principles for Decision Support,” in Proceedings of the British HCI conference (2022).
[13] Alan F. Blackwell, Nicola J. Bidwell, Helen L. Arnold, Charlie Nqeisji, Kun Kunta, and Martin Mabeifam Ujakpa. "Visualising Bayesian Probability in the Kalahari," in Proceedings of the 32nd Annual Workshop of the Psychology of Programming Interest Group (PPIG 2021).
[14] Samuel Aaron and Alan F. Blackwell, "From Sonic Pi to Overtone: creative musical experiences with domain-specific and functional languages," in Proceedings of the first ACM SIGPLAN workshop on Functional art, music, modeling & design (2013), 35-46. See also Alan F. Blackwell, Emma Cocker, Geoff Cox, Alex McLean, and Thor Magnusson. Live coding: a user's manual. (Cambridge, MA: MIT Press, 2022).
[15] Mariana Mărășoiu, Alan F. Blackwell, Advait Sarkar, and Martin Spott. "Clarifying hypotheses by sketching data." In Proceedings of the Eurographics/IEEE VGTC Conference on Visualization (2016), 125-129.
[16] Mărășoiu: Clarifying hypotheses by sketching data.
[17] Mariana Mărășoiu, Detlef Nauck, and Alan F. Blackwell, "Cuscus: An end user programming tool for data visualisation," in Proceedings of End-User Development: 7th International Symposium, IS-EUD 2019, Hatfield, UK, July 10–12, 2019, 115-131.
[18] William Playfair, Commercial and Political Atlas (London: Corry, 1786). See also James R. Beniger and Dorothy L. Robyn, “Quantitative Graphics in Statistics: A Brief History.” The American Statistician 32, no. 1 (1978): 1–11. JSTOR, https://doi.org/10.2307/2683467. Accessed 9 Aug. 2022.
[19] Otto Neurath, Isotype: international picture language. (London: Kegan Paul, 1936). See also James McElvenny, Otto Neurath’s Isotype and his philosophy of language, 2013, Web publication/site, History and Philosophy of the Language Sciences. https://hiphilangsci.net/2013/08/21/otto-neuraths-isotype-and-his-philosophy-of-language/
[20] e.g. Edward R. Tufte, The Visual Display of Quantitative Information (Cheshire, CT: Graphics Press, 1983).
[21] Mike Bostock, D3: Data-Driven Documents (website) (2011), https://github.com/d3/d3/wiki, last accessed 9 Aug 2022.
[22] Richard C. Lupton and Julian M. Allwood, "Hybrid Sankey diagrams: Visual analysis of multidimensional data for understanding resource use." Resources, Conservation and Recycling 124 (2017): 141-151.
[25] Jacques Bertin, Semiology of graphics. (Madison, WI: University of Wisconsin Press, 1983).
[26] Casey Reas and Ben Fry, "Processing: programming for the media arts," AI & Society 20 no. 4 (2006): 526-538.
[27] Alan F. Blackwell, "Palimpsest: A layered language for exploratory image processing," Journal of Visual Languages and Computing 25 no. 5 (2014): 545-571.
[28] For professional artists who can’t afford the licence costs of Photoshop, including me, the powerful open-source package GIMP is a valuable alternative that has been created by a volunteer community.
[29] For further detail of the operating principles of Palimpsest, including worked examples of how this kind of image composition is created, see Blackwell, Palimpsest.
[30] Advait Sarkar, Mateja Jamnik, Alan F. Blackwell, and Martin Spott, "Interactive visual machine learning in spreadsheets," in Proceedings of the 2015 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (2015), 159-163.