Are there any humans left on the internet?

Clippy, the unhelpful animated paperclip, frustrated Microsoft users in the early 2000s. Now, Clippy is everywhere
June 24, 2024

In the late 1990s and early 2000s, Microsoft was convinced that its customers would benefit from an “intelligent assistant”. An animated paperclip, Clippit—which users renamed “Clippy”—appeared whenever Microsoft’s AI believed a program feature could automate a task. Clippy usually wasn’t very helpful, and dismissing the unwanted paperclip forced frustrated users through several extra steps.

In 2024, Clippy is everywhere. You may have noticed a button on Facebook, Ask Meta AI, under your friends’ posts—offering to explain something a friend has posted about. Or you might have seen search results from Google that include “AI Overviews” designed to summarise search results.

These AI tools can turn up bizarre results. Google’s AI Overviews have recommended that users eat a small rock every day, or use non-toxic glue to ensure cheese sticks to their pizza. In case it’s not clear: you should not do either of these things, nor should Google be encouraging anyone to.

These new AI assistants are large language models, designed to produce text that seems like it was written by a human. They do this by extrapolating from billions of examples written by humans. But humans don’t always give accurate answers. The advice to eat a small rock every day comes from the humour site the Onion, and the culinary recommendation involving glue from Reddit, where sincere advice competes with sarcasm and shitposting.

AI models are voraciously hungry for text to learn from. But “high quality” text—by which the architects of these systems mean “lots of complete, grammatically correct sentences”—includes fiction, parody, humour and other features that are difficult for AIs to process. Large language models are plausibility engines, and often the reason that humour sites are funny is that they offer information which is plausible but, to a human, obviously wrong.

Google and Meta know the story of Clippy’s catastrophic failure well. So why introduce buggy tools that users aren’t asking for?

We are in a hype cycle, where enormous sums of money are being poured into AI startups and labs within massive technology companies. To remain part of the conversation around AI innovation, Google and Meta both feel compelled to release AI features, even if they are not ready for prime time. 

But there’s another factor at work: a sense that the tools we’ve been using to navigate the web aren’t working anymore. As commerce has moved online, there are increasingly potent economic incentives to promote a webpage. Search engine optimisation (SEO) is a shadowy field in which consultants create huge masses of text designed to fool search engines into thinking a particular webpage is a great answer to a popular user query—eg, “bike tours in Belgium”. They churn out thousands of pages, written to be read by machines rather than humans, all pointing to the target page—and now your travel company’s page is a top result.

Rampant SEO means that search is often broken now: we don’t actually get good answers via weblinks, just spammy content promoted for profit. Search engines such as Google are responding by offering an alternative to search: an AI-generated answer to your question, rather than web links of questionable utility.

Scholars Judith Donath and Bruce Schneier predict a new form of manipulation—LLMO, or large language model optimisation, where content is created to persuade a large language model that a webpage or product is the best answer to a query. Generative AI will almost certainly produce the content the LLM will train on. It could be “the end of the web as we know it”, a system in which AIs write content solely to be read by other AIs. Consider Clippy, the virtual paperclip, as an ouroboros, a snake swallowing its own tail.

Perhaps what’s becoming truly scarce on the web is… actual humans. The Dead Internet conspiracy theory gained popularity in 2021, positing that the internet “died” in 2015 or 2016 and is now populated by AIs posting content. Like any compelling conspiracy, it combines elements of the real with the imaginary.

Savvy web searchers have tried for years to avoid spam content and get answers from actual humans by adding “reddit.com” to their searches. But Reddit is now one of the most popular sources for training large language models. And it’s not clear that humans are winning on Reddit anymore. AI startup Junia promises to “generate human-like, authentic Reddit post comments”, promoting your site or brand within discussion threads.

Darker still, Facebook users have recently reported that Meta’s AI has injected itself into conversations, identifying itself as a parent of a gifted and disabled child in a private parenting group, and recommending New York City schools based on that experience. When human participants in the conversation wondered if they had stumbled into an episode of Black Mirror, Meta deleted the comments.

In other words, conversation in these spaces increasingly involves humans; machines pretending to be humans to promote products and services; and machines pretending to be humans in order to keep us using these services. What could possibly go wrong?

It’s worth remembering that the internet—in all its inconsistency, awfulness and wonder—is a creation of hundreds of millions of humans. It is possible that AI will at some point be able to create content and train itself on it—indeed, every prediction of superhuman intelligence relies on the assumption that AI can get infinitely smart by playing chess with itself. But the huge leaps we’ve recently seen in AI’s abilities have come from ingesting human-created content. If we exterminate humans from the internet, that progress may slow or stop.

So where are the humans? It won’t be a surprise to readers of this column to find that they are in the smaller spaces. Metafilter, which started in 1999 and operates on a model, ancient in internet terms, of people recommending cool websites to one another, hosts healthy human conversations on its main page and on a lively discussion board. It is aided by moderation and a one-time $5 account creation fee that seems to radically cut down on spam.

Other human-centred spaces rose in popularity during Covid isolation. Text and voice chat site Discord boomed with an influx of younger users during the pandemic. It’s become the “centre of the universe” for gamers, but is increasingly the place to go for authentic interactions that are too fast, too ephemeral for spammers to conquer.

While humans seem to be harder to find online, I’m finding hope in the long tail of online video on YouTube and TikTok. My lab at UMass has been studying random samples of videos on these platforms. The vast majority have fewer than 100 views. Videos like this are invisible to the commercial interests that drive most activity on these platforms. Often, they are intended for friends and family, little glimpses of life shared in public that are really private forms of communication.

OpenAI has already taken steps to colonise this space. The New York Times revealed that OpenAI’s Whisper transcription software was created to transcribe YouTube videos, feeding the insatiable maw of its AI models. Facebook has begun asking EU users for permission to train AIs on their comments and conversations, which is both creepy and significantly better than what OpenAI did.

As industrialisation swept across the US in the late 19th century, conservationists proposed a national park system to preserve natural wonders, starting with Yellowstone. Conserving land meant forgoing opportunities for financial gain in order to preserve something more important. Perhaps it is time to fence off human-only spaces on the internet, leaving them free of roaming AIs. Perhaps we will someday look at the unvarnished human expression in a space like Metafilter as we do at the Grand Canyon or thousand-year-old redwoods.