I swear I’ll let up on the machine learning diatribes soon, but for this newsletter I wanted to share my recent essay for Spike’s “Field Guide to AI.”
Documentation of ImageNet Roulette (2019) usually presents photos of people framed by green boxes and captioned with green-on-black, sans-serif phrases like “newsreader” or “nonsmoker.” However, such images are not quite the work itself, which Kate Crawford and Trevor Paglen, respectively an AI researcher and an artist, first presented in their Fondazione Prada exhibition “Training Humans.” Rather, ImageNet Roulette is a piece of software, which, like most software, functions obscurely in order to function for us at all.
In Milan, two monitors on freestanding metal brackets featured cameras capturing visitors; elsewhere, online visitors – users – were invited to upload photos to a simple web interface. If faces were detected, ImageNet Roulette would scour the “human” section of the eponymous ImageNet – a “canonical training set” consisting of millions of categorized photos – for the most likely tags. The dataset’s underlying logic is borrowed from WordNet, a Borgesian endeavor from the 1980s that intended to map the entire English language. For ImageNet, researchers led by computer scientist Fei-Fei Li contracted workers on Amazon’s crowdsourcing marketplace Mechanical Turk to assign terms to images ripped from across the web. “These datasets shape the epistemic boundaries governing how AI systems operate,” the artists write in an accompanying essay (“Excavating AI”), “and thus are an essential part of understanding socially significant questions about AI.”
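(For the technically curious: stripped to a sketch, the pipeline described above amounts to a face-detection gate followed by picking the most probable label from ImageNet’s person categories. The Python below is my own illustration, not Crawford and Paglen’s code; the detector, the scoring function, and the four-label vocabulary are hypothetical stand-ins.)

```python
# A minimal conceptual sketch of the ImageNet Roulette pipeline described above.
# Not the artists' software: the face detector, classifier, and label list here
# are hypothetical stand-ins, reduced to pure Python for illustration.

from typing import Dict, Optional

# A tiny stand-in for the thousands of WordNet "person" synsets ImageNet inherits.
PERSON_LABELS = ["newsreader", "nonsmoker", "enchantress", "wrongdoer, offender"]


def detect_face(image_bytes: bytes) -> bool:
    """Stand-in gate for a face detector; a real one would return bounding boxes."""
    return len(image_bytes) > 0  # placeholder check


def score_person_labels(image_bytes: bytes) -> Dict[str, float]:
    """Stand-in for a model trained on ImageNet's person categories.

    A real classifier would return a probability per label; this fakes the scores.
    """
    return {label: 1.0 / (i + 2) for i, label in enumerate(PERSON_LABELS)}


def roulette(image_bytes: bytes) -> Optional[str]:
    """If a face is found, return the single most probable person tag."""
    if not detect_face(image_bytes):
        return None
    scores = score_person_labels(image_bytes)
    return max(scores, key=scores.get)  # the "most likely tag" shown to the visitor


if __name__ == "__main__":
    print(roulette(b"visitor-photo-bytes"))  # e.g. "newsreader"
```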
ImageNet Roulette went viral on social media and received attention beyond arts rags. Users were variously amused or outraged by the weird and offensive identifiers. “Not bad,” tweeted a male doctor labeled a nurse. “Here’s a fun website that will both entertain and frighten you to no end,” declared a journalist at Nerdist. “Yes I am so flattered!” remarked a white woman labeled an “enchantress.” “I don’t think this is particularly funny,” posted Tabong Kima, a Black man labeled “wrongdoer, offender.” The artwork’s disquieting novelty was a sleight of hand: A press release from Fondazione Prada quotes Crawford explaining, “We wanted to engage with the materiality of AI, and to take those everyday images seriously as a part of a rapidly evolving machinic visual culture. That required us to open up the black boxes and look at how these ‘engines of seeing’ currently operate.” The web interface carried a warning: “ImageNet Roulette regularly returns racist, misogynistic, and cruel results.”
—
Writing on AI images in 2018, the artist Hito Steyerl argued that “‘recognition’ creates subjects and subjection, knowledge, authority, and as [Jacques] Rancière adds, neatly stacked categories of people. Pattern recognition is, besides many other things, also a fundamentally political operation.” Entering a computer vision-equipped building, for instance, might become akin to Russian roulette – armed guards or a weaponized drone could permit you entry, or, if the system recognizes you as (similar to) a threat, spill your gray matter on the floor. “[C]ameras based on brain functions provide dubious testimony,” writes Steyerl of neural net-generated imagery. “Likeness collapses into probability.”
And probability is the basis of prediction. As the marketing copy for Palantir, a predictive policing corporation backed by the CIA, once claimed, “good data and the right technology institutions … can change the world for the better.” But ImageNet Roulette asks us to consider what “good data” might be – and if such data can exist. Does “good data” mean images hand-labeled with archaic racial categories and explicit slurs; with jobs; with oddly specific terms such as “trollop” or “theosophist” or “failure” or “Bolshevik” (as with Barack Obama photoshopped into a Nazi uniform; in a blue suit, he was an “anti-semite”)? “Good data,” as ImageNet Roulette shows, often stands for the accurate reproduction of extant social biases. Just as PredPol data doesn’t evince patterns of crime but patterns of policing, facial categorizations, whether by humans or their machines, become what the artists term “latter-day calipers.” (Indeed, in an all-too-literal example, IBM’s “Diversity in Faces” dataset employed “skull shape” as a category.) ImageNet, write Crawford and Paglen, “is an object lesson, if you will, in what happens when people are categorized like objects.”
This taxonomic fixation is not novel to digital image culture; big tech has simply made it vastly more efficient and rhetorically polished. In his 1986 essay “The Body and the Archive,” the photographer and theorist Allan Sekula described the “juridical photographic realism” of 19th-century Paris. The emergent representational technology permitted not only the documentation of criminals but also the composite analysis of the alleged common characteristics of the “dangerous classes.” Nearly two centuries later, a machine can declare someone “accused” or a “closet queen” or a “good person”; data becomes the grounds for its own truth. In Roulette’s first five days online, ImageNet’s maintainers culled 600,000 portraits from the dataset. It seems unlikely any human eyes could further filter the supposedly unbiased photos the cleaners left behind. On September 27, 2019, two weeks after its launch, the artists removed web access, saying that ImageNet Roulette had “achieved its aims.”
—
Sekula recounts a song popular in London following the public announcement of the daguerreotype:
O Mister Daguerre! Sure you’re not aware
Of half the impressions you’re making,
By the sun’s potent rays you’ll set Thames in a blaze,
While the National Gallery’s breaking
As Sekula summarizes the sardonic refrain, “photography threatens to overwhelm the citadels of high culture.” The lyricists imagine the daguerreotype’s reproductive output ballooning the National Gallery’s collection beyond the scope of a building expanded just fifteen years prior. The museum survived photography and, despite handwringing to the contrary, it will likely survive AI, too. However, Crawford and Paglen’s piece has little to do with the generative bent of much of the recent machine-learning art that stirs apocalyptic pronouncements. Or, if it does, it is because it returns us to an earlier step that disappears behind the gauze of generated pictures. Roulette, along with contemporaneous pieces like Anna Ridler’s Myriad (Tulips) (2018), Martine Syms’s MythiccBeing and Threat Model (both 2018), American Artist’s 2015 (2019), and Steyerl’s 2016 essay “A Sea of Data,” not to mention experiments by nonprofits like the American Civil Liberties Union, insisted we look beyond neural nets’ supposed objectivity in connecting language and pictures. ImageNet Roulette “disrupted” the assumption underlying machine-learning training that an image could be so stable as to be constrained by a paltry few words and, by turning the cameras onto us, reckoned with the algorithmically empowered structures of visibility – to cultural institutions, researchers, governments, cops, corporations, micro-wage enterprises, apps, and ourselves – beginning to inflect daily life. And they have come to inflect daily life: Filters adjust features to broadcast-ready perfection; photo apps auto-organize friends; I recently stepped on a plane using just my face, without a ticket, without an employee, and without consent. Of course, these are the uses we can see: most, by design, we can’t.
In its third verse, the 19th-century tune continues:
The new Police Act will take down each fact
That occurs in its wide jurisdiction
And each beggar and thief in the boldest relief
Will be giving a color to fiction
Not only will those surveilled and marginalized under this speculative Police Act be rendered insistently visible by the new technology; their visibility – and its apparent objectivity – will also allow all sorts of tales to be spun, remaking reality with the device’s vision as an excuse. Then as now, carceral epistemologies and machine metaphysics go hand in hand. ImageNet Roulette asks: Can art – itself largely a game of labels, as any museum shows – expose and undermine the dominant assumptions conditioning our life among and as images?
Crawford and Paglen’s essay contrasts René Magritte’s disjunctive captioning (Ceci n’est pas …) with the labeling of both early physiognomy and AI training sets. But in the post-DeepDream casino-contemporary of pictures produced from stochastic noise by adversarial neural nets, even surreality isn’t safe. It’s increasingly the case that images – stored and shifted between cloud servers and swept into training sets – are not destined for human eyes, but for machine interpretation. This condition is quite distinct from the analogue era of Daguerre. As Paglen put it in his 2016 essay “Invisible Images (Your Pictures Are Looking at You)”: “The fact that digital images are fundamentally machine-readable regardless of a human subject has enormous implications. It allows for the automation of vision on an enormous scale and, along with it, the exercise of power on dramatically larger and smaller scales than have ever been possible.” The black box gets ever darker.