University pairs with Meta to build social media data archive


Despite the massive amount of information created by social media, social scientists have been unable to reliably access the underlying data, undermining attempts to understand the impact of social media on society.

As someone who studies how people use social media to organize, discuss and enact social change, Libby Hemphill knows this struggle firsthand.

“Some researchers studying social media have to do everything themselves, from getting approval to accessing data to writing programs to storing and analyzing data on their own,” said Hemphill, associate professor of information in the School of Information, research associate professor at the Institute for Social Research, and associate professor of the Digital Studies Institute in LSA.

And that’s not all.

“They might need to know how to work with data in different formats. Or they may need to pay for a service geared toward market research, which carries its own limitations on how they can download and use data,” she said.

That is a daunting undertaking for anyone trying to access and understand social media data, Hemphill said.

For several years, Hemphill has been working to establish a social media archive at the Inter-university Consortium for Political and Social Research at ISR. She says her work on the archive so far was made possible by a 2021 Propelling Original Data Science grant from the Michigan Institute for Data Science, called “Ensuring FAIRness in Social Media Archives.”

Now, Meta, the parent company of Facebook, Instagram and WhatsApp, has partnered with U-M to support the Social Media Archive, or SOMAR.

“From the emotional well-being of local youth to the outcomes of global political processes, social media play a critical but poorly understood role,” said ISR Director Kathleen Cagney. “At ISR and ICPSR, it is our imperative to shed light on these processes.”

Led by Hemphill and housed at ICPSR, the SOMAR project will provide access to some of the most consequential information in contemporary society.

The $1.3 million gift from Meta supports the vision of SOMAR and will help build the archive so that it can sustain research for years to come.

“In order to help advance the world’s understanding of key social issues, we have provided a gift to support ICPSR’s creation of a social media archive at the University of Michigan,” said Pratiti Raychoudhury, vice president and head of research at Meta. “This effort is part of our longstanding commitment to find the right ways to share data for the purpose of academic research.”

ISR Development Director Henry Jewell said SOMAR will provide “the foundation to address thousands of important research questions.”

“Once the data are made available to the research community, insights will begin to emerge immediately,” he said. “Within a decade, we anticipate SOMAR will yield findings on election integrity in the time of social media, the way advertisers leverage social media data to influence consumers and other critical issues. This is just the beginning.”

SOMAR’s home, ICPSR, has a long history of handling data with the utmost confidentiality and privacy. Stringent protections are in place for securing and distributing sensitive data.

This attention to ethical data use is irreplaceable when it comes to the data of millions of social media users, said ICPSR Director Margaret Levenstein.

“In an increasingly data-driven world, ICPSR seeks to make data more accessible, more useful and more understandable,” she said.

While the new social media archive is still in its early stages, existing social media data held at ICPSR will be cross-listed when SOMAR is up and running. The datasets include:

• “#MeToo Tweet IDs, October 15-28, 2017 (ICPSR 37447),” a collection of tweet IDs pertaining to the first two weeks of the #MeToo hashtag campaign in October 2017.

• “Appealing to the Base or to the Moveable Middle? Incumbents’ Partisan Messaging Before the 2016 U.S. Congressional Elections,” which contains weekly measures of partisanship for verified official U.S. Congress Twitter accounts for September-November 2016.

• “What Social Media Platforms Miss About White Supremacist Speech,” which includes 274,668 posts scraped from Stormfront and 509,982 comments collected from the Reddit API.

Students and scholars around the world will use SOMAR data to conduct research about the phenomenon of social media use; its impacts on social, political and psychological processes; and the views and behaviors of social media users. With services to support the analysis of this new kind of data, SOMAR will catalyze a new field of research, spurring potentially transformative discoveries.

In addition to removing data-access hurdles, SOMAR will offer training and outreach to help researchers and community members learn how to leverage social media data to form usable insights.

“A resource like SOMAR will lower persistent barriers to data access for researchers and is desperately needed,” Levenstein said. “The future of our society depends on it.”

