
Image by AppsHunter.io, from Unsplash
Discord Privacy Concerns Grow After 2 Billion Messages Go Public
Brazilian researchers scraped 2 billion public Discord messages for academic research, raising privacy concerns despite claims of ethical collection and anonymization.
In a rush? Here are the quick facts:
Researchers scraped 2 billion Discord messages from 3,167 public servers.
Data spans 2015–2024 and includes 4.7 million users.
The database is now public, weighing over 118GB.
A Brazilian research team has released a massive dataset of more than 2 billion Discord messages, sparking major privacy concerns despite the team's claims of ethical conduct, as first spotted by 404 Media.
The 15-member research team from the Federal University of Minas Gerais used the platform’s public API to collect messages from 3,167 public Discord servers, which represent 10% of all discoverable Discord communities.
The messages span nearly a decade, from 2015 to 2024, and were gathered as part of a study meant to help with mental health, political discourse, and AI chatbot research.
“Throughout every step of our data collection process, we prioritized adherence to ethical standards,” the researchers wrote. “All data was sourced from groups that are explicitly considered public according to Discord’s terms of use […] The data was anonymized.”
They say they removed usernames, changed user IDs, and took other steps to ensure privacy. The database is available online as a set of JSON files. Even a compressed sample is 6.2GB, while the full archive weighs in at 118GB.
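To make the anonymization steps described above more concrete, here is a minimal sketch of how usernames might be dropped and user IDs replaced in a single message record. It is not the researchers' actual method, which is not detailed here. The field names (`username`, `user_id`, `content`), the `pseudonymize` helper, and the keyed-hash approach are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize(message: dict, secret: bytes) -> dict:
    """Return a copy of a message with the username dropped and the user ID replaced.

    Hypothetical record layout: {"username": ..., "user_id": ..., "content": ...}.
    """
    # Drop the username field entirely (assumed field name).
    cleaned = {key: value for key, value in message.items() if key != "username"}

    # Replace the user ID with a keyed hash, so the same user always maps to the
    # same pseudonym while the original ID cannot be recomputed without the secret.
    digest = hmac.new(secret, str(message["user_id"]).encode("utf-8"), hashlib.sha256)
    cleaned["user_id"] = digest.hexdigest()[:16]
    return cleaned


if __name__ == "__main__":
    secret = b"example-secret-kept-private"  # illustrative value only
    record = {"username": "alice", "user_id": 12345, "content": "hello"}
    print(pseudonymize(record, secret))
```

A step like this only changes identifiers. It does not touch the message text, which can still contain names or other personal details, and that is part of why anonymized datasets can still raise privacy concerns.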
Despite these efforts, many Discord users are alarmed. 404 Media argues that users tend to consider their Discord conversations private, even when the servers are public, because the platform operates differently from Twitter or Reddit.
The collection method raises concerns because many users, including teenagers, remain unaware that their messages could end up in research datasets.
The scraping may also violate Discord’s own rules. Its Developer Policy clearly states: “Do not mine or scrape any data… through Discord services,” as noted by 404 Media.
This incident follows earlier scraping controversies, including Spy.pet, which collected data from private servers, as noted by 404 Media. Unlike Spy.pet, however, the researchers insist they followed all API rules and scraped only public data.