After reports at the end of 2022 that hackers were selling data stolen from 400 million Twitter users, researchers now say that a widely circulated trove of email addresses linked to about 200 million users is likely a refined version of the larger trove with duplicate entries removed. The social network has not yet commented on the massive exposure, but the cache of data clarifies the severity of the leak and who may be most at risk as a result of it.
From June 2021 until January 2022, there was a bug in a Twitter application programming interface, or API, that allowed attackers to submit contact information like email addresses and receive the associated Twitter account, if any, in return. Before the bug was patched, attackers exploited it to “scrape” data from the social network. The bug didn’t allow hackers to access passwords or other sensitive information like direct messages (DMs). It did, however, expose the link between Twitter accounts, which are often pseudonymous, and the email addresses and phone numbers associated with them, potentially revealing users’ identities.
While it was live, the vulnerability was seemingly exploited by multiple actors to build different collections of data. One collection, which has been circulating in criminal forums since the summer, included the email addresses and phone numbers of about 5.4 million Twitter users. The massive, newly surfaced trove appears to contain only email addresses. Even so, its widespread circulation creates the risk that it will fuel phishing attacks, identity theft attempts, and other targeting of individuals.
Twitter did not reply to WIRED’s requests for comment. The company wrote about the API vulnerability in an August disclosure: “When we learned about this, we immediately investigated and fixed it. At that time, we had no evidence to suggest someone had taken advantage of the vulnerability.” This suggests that Twitter’s telemetry was insufficient to detect the malicious scraping.
Twitter is far from the first platform to expose data to mass scraping through an API flaw, and such incidents often leave confusion about how many distinct troves of data their malicious exploitation actually produced. These incidents are still significant, though, because they add new connections to, and help validate, the massive body of stolen user data that already exists in the criminal ecosystem.
“Obviously, there are multiple people who were aware of this API vulnerability and multiple people who scraped it. Did different people scrape different things? How many troves are there? It kind of doesn’t matter,” says Troy Hunt, founder of the breach-tracking site HaveIBeenPwned. Hunt ingested the Twitter data set into HaveIBeenPwned and says that it contained information about more than 200 million accounts. Ninety-eight percent of the email addresses had already been exposed in past breaches recorded by HaveIBeenPwned. Hunt says he sent notification emails to nearly 1,064,000 of his service’s 4.4 million email subscribers.
“It’s the first time I’ve sent a seven-figure email,” he says. “Almost a quarter of my entire corpus of subscribers is really significant. But because so much of this was already out there, I don’t think this is going to be an incident that has a long tail in terms of impact. But it may de-anonymize people. The thing I’m more worried about is those individuals who wanted to maintain their privacy.”
Twitter wrote in August that it shared this concern about the potential for users’ pseudonymous accounts to be linked to their real identities as a result of the API vulnerability.
“If you operate a pseudonymous Twitter account, we understand the risks an incident like this can introduce and deeply regret that this happened,” the company wrote. “To keep your identity as veiled as possible, we recommend not adding a publicly known phone number or email address to your Twitter account.”
For users who hadn’t already linked their Twitter handles to burner email accounts at the time of the scraping, though, the advice comes too late. In August, the social network said it was notifying potentially affected individuals about the situation. The company has not said whether it will send further notifications in light of the hundreds of millions of exposed records.
Ireland’s Data Protection Commission said last month that it is investigating the incident that produced the trove of 5.4 million users’ email addresses and phone numbers. Twitter is also currently under investigation by the US Federal Trade Commission over whether the company violated a “consent decree” that obligated Twitter to improve its user privacy and data protection measures.
This story originally appeared on wired.com.