How many people had their data harvested by Cambridge Analytica?

Mark Zuckerberg testifies before the committees of the US Congress. ‘Many, many other entities – academic, commercial, governmental – could have harvested the data of users.’ Photograph: Chip Somodevilla/Getty

Statistics are a staple of journalistic accuracy issues, but rarely is a number so big, consequential and hard to verify as the number of Facebook users directly affected by the still emerging Cambridge Analytica story. Is it no more than 30 million, as Cambridge Analytica says? Fifty million, as estimated by the Observer and Guardian journalists who have done so much to disclose the issue? Or 87 million, as Facebook has ventured? Facebook’s estimate has a fine-print caveat: “We do not know precisely what data the app shared with Cambridge Analytica or exactly how many people were impacted. Using as expansive a methodology as possible, this is our best estimate of the maximum number of unique accounts that directly installed the thisisyourdigitallife app as well as those whose data may have been shared with the app by their friends.”

The numbers seem to be calculated by multiplying the number of people known as “seeders” by the average number of Facebook friends seeders are thought to have. A seeder was a Facebook user who installed certain apps that permitted the apps’ controllers to harvest data from the user and the seeder’s (unknowing) Facebook friends. The wide variation in the estimates of people affected results partly from different estimates of seeders – 185,000, 275,000, 300,000 – and different average-number-of-friends figures – 160, 180, 250, 340.
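
To see how far such a calculation can swing, the multiplication can be run over every combination of the figures quoted above. This is only a rough sketch: the published estimates do not say which seeder count was paired with which friend average, so trying all pairings is an assumption, as is the simplification that no two seeders share a friend. The function name is made up for illustration.

```python
from itertools import product

# Figures quoted in the article. Which seeder count belongs with which
# friend average is not stated there, so every pairing is tried (assumption).
SEEDER_ESTIMATES = [185_000, 275_000, 300_000]
AVG_FRIENDS_ESTIMATES = [160, 180, 250, 340]


def estimate_affected(seeders: int, avg_friends: int) -> int:
    """Naive product of seeders and average friends.

    Hypothetical simplification: friends shared between seeders are
    counted more than once, so this overstates unique accounts.
    """
    return seeders * avg_friends


# Map each (seeders, avg_friends) pairing to its estimated total.
estimates = {
    (s, f): estimate_affected(s, f)
    for s, f in product(SEEDER_ESTIMATES, AVG_FRIENDS_ESTIMATES)
}

lowest = min(estimates.values())
highest = max(estimates.values())

# Print the pairings from smallest to largest estimate.
for (s, f), total in sorted(estimates.items(), key=lambda kv: kv[1]):
    print(f"{s:>7,} seeders x {f:>3} friends = {total / 1e6:5.1f} million")
```

On these inputs the products run from 29.6 million (185,000 seeders with 160 friends each) to 102 million (300,000 seeders with 340 friends each), a range wide enough to contain all three published figures.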

Does the exact figure matter, now that it is evident that many, many other entities – academic, commercial, governmental – could have harvested the data of users under previous Facebook policies? Mark Zuckerberg, the company’s ethically callow controller, apologised for those policies before committees of the US Congress last week, without apparent loss of face.

A sense of perspective was given by the Harvard professor Jonathan Zittrain, a sophisticated observer of the social and democratic impacts of digital technologies: “The Cambridge Analytica dataset from Facebook is itself but a lake within an ocean, a clarifying example of a pervasive but invisible ecosystem where thousands of firms possess billions of data points across hundreds of millions of people – and are able to do lots with it under the public radar.”

Incrementally since 2015, journalism organisations including the Guardian, the Observer, the New York Times, Politico and the Intercept have shown that a Cambridge Analytica-related entity paid a company called Global Science Research (GSR) for the use of Facebook data that GSR harvested. It appears that the data was to be matched with other datasets of voters’ personal information to which Cambridge Analytica-related entities had access. Broadly, the purpose was to refine profiles of voters to permit more precise messaging.

Cambridge Analytica, which did work for the presidential campaigns of Ted Cruz and Donald Trump, denies that it used “GSR data or any derivatives of this data in the US presidential election”.

GSR’s co-directors were Aleksandr Kogan and Joseph Chancellor. GSR, in effect, commercialised a technique for psychological profiling using big datasets, on which Kogan and several other Cambridge University academics worked.

Chancellor joined Facebook as a “quantitative social psychologist on the user experience research team”. Is it unreasonable to wonder whether the potential dataset for the team’s work is 2 billion, the total number of Facebook users?

• Paul Chadwick is the Guardian’s readers’ editor


