For several hours a week, I write for a technology company worth billions of dollars. Alongside me are published novelists, rising academics and several other freelance journalists. The workload is flexible, the pay better than we are used to, and the assignments never run out. But what we write will never be read by anyone outside the company.
That’s because we aren’t even writing for people. We are writing for an AI.
Large language models (LLMs) such as ChatGPT have made it possible to automate huge swaths of linguistic life, from summarising any amount of text to drafting emails, essays and even entire novels. These tools appear so good at writing that they have become synonymous with the very idea of artificial intelligence.
But before they ever risk leading to a godlike superintelligence or devastating mass unemployment, they first need training. Instead of using these grandiloquent chatbots to automate us out of our livelihoods, tech companies are contracting us to help train their models.
The core part of the job is writing pretend responses to hypothetical chatbot questions. This is the training data that the model needs to be fed. The “AI” needs an example of what “good” looks like before it can try to produce “good” writing.
As well as providing our model with such “gold standard” material, we are also helping it attempt to avoid “hallucinating” – a poetic term for telling lies. We do so by feeding it examples that use a search engine and cite sources. Without seeing writing that does this, it cannot learn to do so by itself.
Without better language data, these language models simply cannot improve. Their world is our word.
Hold on. Aren’t these machines trained on billions and billions of words and sentences? What would they need us fleshy scribes for?
Well, for starters, the internet is finite. And so too is the sum of every word on every page of every book ever written. So what happens when the last pamphlet, papyrus and prolegomenon have been digitised and the model is still not perfect? What happens when we run out of words?
The date for that linguistic apocalypse has already been set. Researchers announced in June that we can expect this to take place between 2026 and 2032 “if current LLM development trends continue”. At that point, “Models will be trained on datasets roughly equal in size to the available stock of public human text data.”
Note the word human. Large language models do little but produce prose, much of which is already being published on the internet. So couldn’t we train these models on their own output (so-called synthetic data)? Our cyborg internet – co-authored by us and our word machines – could then swell ad infinitum.
No such luck. Training our current large language models on their own output doesn’t work. “Indiscriminately learning from data produced by other models causes ‘model collapse’ – a degenerative process whereby, over time, models forget the true underlying data distribution,” write Ilia Shumailov and colleagues in Nature. In other words, they go off the rails and tend towards producing nonsense. Feeding something its own effluvia leads to atrophy. Who would have thought?
Shumailov explained to me that each time a model is trained on synthetic data, it loses awareness of the long tail of “minority data” that it was originally trained on (rare words, unusual facts etc). Because LLMs are at their core sophisticated text-prediction machines, the breadth of knowledge is eroded and replaced by only the most likely datapoints. So when your original, digital data is already biased – very English language-heavy, largely US-centric, and full of unreliable forum posts – this bias will only be repeated.
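For readers who like to see the mechanism, the loss of the long tail can be imitated with a toy simulation. This is only a sketch under loud assumptions, not the method used in the Nature paper: the “model” here is nothing more than a table of word frequencies, the starting corpus is invented, and the names `next_generation` and `simulate` are hypothetical. Each generation is “trained” on the previous generation’s output and then writes a new corpus of the same size.

```python
import random
from collections import Counter


def next_generation(corpus, rng):
    """Fit a toy 'model' (plain word frequencies) to the corpus,
    then sample a new corpus of the same size from that model."""
    counts = Counter(corpus)
    words = list(counts)
    weights = [counts[w] for w in words]
    return rng.choices(words, weights=weights, k=len(corpus))


def simulate(generations=20, seed=0):
    """Return the number of distinct words in each generation's corpus."""
    rng = random.Random(seed)
    # Invented "human" corpus: three common words plus a long tail
    # of 50 rare words that each appear exactly once.
    corpus = (
        ["the"] * 500
        + ["model"] * 300
        + ["data"] * 150
        + [f"rare_{i}" for i in range(50)]
    )
    vocab_sizes = [len(set(corpus))]
    for _ in range(generations):
        # Each generation learns only from the previous generation's output.
        corpus = next_generation(corpus, rng)
        vocab_sizes.append(len(set(corpus)))
    return vocab_sizes


if __name__ == "__main__":
    for generation, size in enumerate(simulate()):
        print(f"generation {generation}: {size} distinct words")
```

In this toy setup, a word that fails to appear in one generation has a frequency of zero in the next, so it can never come back. The count of distinct words can only stay level or fall, and it is the rare words that vanish first. Real language models are vastly more complicated, but the sketch shows why the tail is the first thing to go.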
If synthetic, AI-produced data is insufficient to help improve the models, then they will need something else. This is especially true as concerns spread that the much-vaunted models will stop being able to improve before they’ve ever become that useful. Leading startup investment firm Sequoia has shown that AI firms will need to fill a $500bn revenue gap by the end of this year to keep investors satisfied. The word machines might be hungry; the capital behind them also has an appetite.
OpenAI, the trillion-dollar Microsoft protectorate behind ChatGPT, recently signed licensing agreements – potentially worth hundreds of millions of dollars – with many of the world’s main media organisations, from News Corp to the Financial Times.
But it’s not just a question of accumulating more original words. These companies need the sort of writing that the model will seek to emulate, not merely absorb.
That’s where human annotators come in.
***
In Fritz Lang’s classic 1927 film Metropolis, the ancient Canaanite deity Moloch is reincarnated as an insatiable industrial machine. It is a technology that works us, as opposed to working for us. Factory workers respond to its ever-growing demands by lunging at its dials and pulling at its levers. But they cannot keep up. The machine hisses and explodes. We then see the workers forgoing the act of feeding and walking straight into the furnace mouth of Moloch themselves.
When I first took the role as an AI annotator, or more precisely as a “senior data quality specialist”, I was very aware of the irony of my situation. Large language models were supposed to automate writers’ jobs. The better they became through our work, the quicker our careers would decline. And so there I was, feeding our very own Moloch.
Indeed, if there is anything these models can achieve quite well, it is the sort of digital copywriting that many freelance writers perform to pay the bills. Writing an SEO blog about the “internet of things” might not take much research, pride or skill; but it usually pays far better than poetry.
Working for an AI company as a writer was therefore a little like being told you were going to be paid a visit by Dracula, and instead of running for the hills, you stayed in and laid the table. But our destroyer is generous, the pay sufficient to justify the alienation. If our sector was going up in smoke, we might as well get high off the fumes.
And therein lies the ultimate irony. Here is a new economic phenomenon that rewards writing, that encourages it, that truly values it; all while simultaneously deeming it an encumbrance, a problem to be solved, an inefficiency to be automated away. It is like being paid to write in sand, to whisper secrets into a slab of butter. Even if our words could make a dent, we wouldn’t ever be able to recognise it.
But perhaps it is foolish to be precious about so prosaic a craft. How many people deserve to make a real dent, after all?
François Chollet, a bestselling computer science textbook author and the creator of the Keras training library (which provides building blocks for researchers to create their own deep learning models), told me he estimates there are “probably about 20,000 people employed full-time just creating annotated data to train large language models”. Without manual human work, he says the models’ output would be “really, really bad”.
The goal of the annotation work that I and others perform is to provide gold-standard examples for the model to learn from and emulate. It’s a step up from the sorts of annotation work we’ve all done in the past, even unknowingly. If ever you’ve been faced with a “captcha” problem asking you to prove you aren’t a robot – eg “select all the tiles with pictures of a traffic light” – you were actually doing unpaid work for a machine, by helping to teach it to “see”.
When I was a student I remember repeating words like “left” and “right” into my laptop for a couple of hours straight, in order to help the developers of a self-driving car. After a few hours being paid per satisfactory vocal delivery, and not even coming close to minimum wage, I gave up.
Today’s roles are different and are a crucial part of LLM development. Alex Manthey, head of data at Contextual AI, is one of the people hiring writers to improve their models. She told the Observer that the practice is “mission critical”, as you “need humans in the loop to make sure [the model’s output] is palatable to the end user”. The human touch pays off. There’s a “reason why every company is spending so much time and unbelievable amounts of money making this happen,” she says.
According to both Chollet and Manthey, hiring in the sector has recently shifted away from controversial, low-paid work in developing countries towards more specialised, high-paid roles. As models get better at writing, the quality of training data they need rises. Higher salaries follow. Several remote annotation roles will pay writers upwards of £30 an hour. Third-party annotation vendors such as Scale AI (valued at $14bn) are also capitalising on this scarcity of high-quality training data.
Snippets from current UK job ads for AI annotation work give a clue as to the range of tasks involved: “create responses that will form the ‘voice’ of future AI”; “provide feedback to teach AI models to become more helpful, accurate, and safe”; “write clear, concise, factually and grammatically correct responses”; “coach an AI model by assessing the quality of AI-generated writing, reviewing the work of fellow writing evaluators, and crafting original responses to prompts”. If chatbots can pretend to write like humans, we can also pretend to write like chatbots.
But will this process continue? Will humans just forever write the words that AI models need to be able to do human jobs? Doesn’t that defeat the purpose of the whole enterprise? While one of the core methods underpinning the models is known as RLHF (reinforcement learning from human feedback), it’s unclear how many outside the field understand that the “secret sauce” behind these celebrated models relies on plain old human work.
If technology companies can throw huge amounts of money at hiring writers to create better training data, it does slightly call into question just how “artificial” current AIs really are.
The big technology companies have not been “that explicit at all” about this process, says Chollet, who expects investment in AI (and therefore annotation budgets) to “correct” in the near future. Manthey suggests that investors will probably question the “huge line item” taken up by “hefty data budgets”, which cover licensing and human annotation alike.
If the current models can’t risk running out of new words to train on, then perhaps we as writers will never run out of work. But technology fidgets. Better models, with different techniques and more efficient training needs, might appear. The next generation of annotators will need to be better than the AI at whatever skill it needs to master next: theoretical physics, maybe? Medical diagnosis?
Cracking language is at best an intermediary goal. Our words will be but temporary fuel.