Social Media Researchers Must Demand More Transparent Data Access

By Amelia Acker

October 8, 2021 at 5:00 am ET

This week, Frances Haugen testified before Congress about how Facebook products have harmed democracy, public safety and children. Haugen, a former product manager who researched misinformation on Facebook’s civic integrity team, disclosed thousands of internal documents to The Wall Street Journal and the Securities and Exchange Commission. The documents reveal that Facebook’s internal research teams had evidence of harmful impact, but the company did not share these findings with the public or give external researchers access to platform data to confirm them. Representatives from Facebook have since come forward and disputed their own internal research that Haugen leaked, but there’s no way for the public to know the truth without transparent data access.

Data access for external research has been impeded by Facebook in a number of different ways that are well-known to investigative journalists and social media researchers. Last week, the House Science Committee convened an investigation into researcher access to social media data. Earlier this month, reporting from The New York Times revealed that datasets given to researchers from a Facebook research partnership were incomplete and inaccurate.

This isn’t the first time that data has conspicuously disappeared from Facebook’s data transparency and access tools, thwarting researchers and journalists who rely on the platform’s data to investigate current events and the spread of disinformation and misinformation online. In August, Politico broke news about thousands of posts from the Jan. 6 insurrection that went missing from CrowdTangle, a Facebook-owned analytics tool for monitoring trends across platforms.

For years, Facebook has created hurdles for researchers, journalists and civil society groups attempting to access data via more traditional web-scraping methods and the company’s application programming interfaces. Facebook has blocked investigative tools that journalists and researchers use to track ad targeting by disabling the functionality of browser plugins, disallowed the sharing of datasets among researchers and rolled back access for research projects. Time and again, Facebook has shown its transparency and access efforts to support independent research to be perfunctory at best. Increasingly, the company’s careless approach to data access for accountability is harming the research and reporting communities that study the impacts of social media on society.

Social media platforms have a vested interest in researchers examining the impact of their products and algorithmic technologies in the world. Many have created formal programs through which academics and graduate students can apply for special research grants and internal competitions for data access, like the WhatsApp Research Awards or the Snap Research Fellowships. Other companies give data to researchers and students even more openly, such as the Yelp Academic Dataset or Twitter’s new Academic Research API.

At first blush, Facebook’s research partnerships with Social Science One that grant special access to teams of researchers, the Ad Library and other kinds of platform data gifting programs are a boon to researchers and journalists with limited computing resources. But such data philanthropy creates what sociologists of knowledge call the “agnotology” of social media platforms: By circulating massive datasets with missing or flawed evidence, Facebook deliberately creates space for ignorance to grow.

To be sure, Facebook could have made an honest mistake in providing datasets with gaping holes; it wouldn’t be the first time. But Haugen’s revelations of internal research, combined with other well-documented instances of stalling external research via data access, suggest that this could just be how the platform wants to usurp scholarly inquiry and accurate reporting about data flows across its platform. Whatever the case, there’s no meaningful incentive, policy or market logic for Facebook to share its authentic internal data with social scientists, civil society organizations or journalists.

Researchers, especially academics in universities, are bound to codes of ethics and peer-reviewed research methods, and are frequently vetted by their institutional review boards. These interlocking systems of review build trust, accountability and credence in our claims. So, at a certain point, researchers (including me) must recognize that these data gifts are often in bad faith, most likely corrupted or selectively curated in ways that we can never confidently verify and authenticate as independent outsiders. Facebook’s corporate data philanthropy functions as a fig leaf — generating positive publicity when platforms are under scrutiny, but eventually putting empirical research at risk of being retracted and scholarly reputations on the line.

Computational social scientists, misinformation scholars and social media researchers have warned academics against relying exclusively on data supplied by the platforms themselves. Some have even encouraged pursuing less sanctioned methods such as brute force web scraping and violating a platform’s terms of service. Other options are available as well: We can refuse to review and publish findings based on corporate datasets if that data cannot be easily archived, referenced and shared for replicability studies. We need to rebuff invitations to participate in platform partnerships that woo researchers with catered lunches, closed-door convenings and special access. We should convene our professional organizations explicitly to disengage from platform tools until they’ve been independently reviewed according to scientific standards.

Until now, researchers have been able to access only what Facebook wants them to see. Haugen believes that the only way to fix Facebook is through regulation that ensures transparent access to data from the platform. Until laws and policies exist for regulated platform data transparency, researchers need to be able to scrape, access, preserve and share data independently. It’s up to researchers themselves to repudiate Facebook’s feeble transparency efforts and attempts to court researchers. We must create a critical distance from this platform data and develop mixed methods that combine new sources of evidence with both qualitative and quantitative techniques.

Amelia Acker, Ph.D., a 2021 Public Voices Fellow with the Op-Ed Project, is an assistant professor at the School of Information, The University of Texas at Austin, where she directs the Critical Data Studies Lab.
