My favorite moment in M3gan comes when Allison Williams says to the title character: “I know you think you’re maximizing your objective function.” It was fun to hear an explicit nod to the language of AI alignment; someone who worked on this movie reads LessWrong, or talked to people who do.
M3gan could serve as a great prequel to an AI apocalypse movie (though I think that scenario is unlikely). But thinking over those movies – Terminator, Oblivion, The Matrix – they’re mostly action movies that come down to a test of strength. The robots basically think and act like humans, just harder to destroy.
Meanwhile, generative pretrained models are more and more a part of our lives, and what sticks out to me about them is their weirdness. Kevin Roose’s chat with Sydney makes her seem vaguely dangerous but mostly just unhinged. AlphaGo’s Move 37 in a championship Go game was weird enough to induce a 15-minute break in the game. An ugly t-shirt can utterly confound facial recognition software and render its wearer invisible.
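That t-shirt is a physical adversarial example. The digital version is easy to sketch: nudge an image’s pixels a tiny amount in whatever direction most increases a classifier’s loss, and the model’s prediction can flip even though the image looks unchanged to us. Here’s a minimal sketch of the fast gradient sign method in PyTorch; the `model`, `images`, and `labels` in the usage comment are hypothetical placeholders, not any particular deployed system.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Fast gradient sign method: take one small step in the pixel
    direction that most increases the classifier's loss on (x, y)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # A step this small is imperceptible to a human,
    # but it can flip the model's predicted label.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep pixels in a valid range

# Hypothetical usage (model, images, labels are placeholders):
# x_adv = fgsm_perturb(model, images, labels)
# model(x_adv).argmax(dim=1) often disagrees with model(images).argmax(dim=1)
```

The unsettling part isn’t the math, which is one gradient step; it’s that the failure mode has no human analogue. No amount of staring at the perturbed image would tell you why the model changed its mind.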
Let’s forecast these developments 10, 20, 50 years in the future. Imagine that AI has gotten a lot smarter and more dangerous, but basically followed an evolutionary path from today’s machine learning algorithms.1 Imagine further that it inherited the quirks of its ancestors, the way we retain our appendixes and our proclivity to identify every startling noise as a tiger in the brush. How would you outwit such an adversary? What sorts of maneuvers would you pursue? And what clever, weird things would work against a computer but not against a human, the way Kellin Pelrine distracted KataGo?
A malevolent, omni-rational AI with access to nanotech would be an insurmountable opponent. But a bounded thing with an evolutionary heritage is something we could deceive, negotiate with, or fight. I’d love to see someone up to date on ML research take a crack at writing or visualizing what this might look like.
Some experts dispute that this is likely or even possible.↩︎