Challenges of Data Quality

January 25, 2019

Marcela Morales and Joseph Lynch

Topic Overview

Big data (social media) has been used during crisis situations with differing levels of effectiveness. Across the three chapters, the prevalence of big data, the veracity of big data, and the usability of big data in crisis situations are explored.

Chapter Summaries

CH2: The Rise of Big Crisis Data (pp. 31-47)

This chapter discusses in detail the rise in the use of big data in business and humanitarian efforts. Meier asks, “If governments and humanitarian organizations do not actively or explicitly create demand for relevant, findable, and high-quality social media content during disasters, then why should supply of high-quality data follow?” (Meier, 2015, p. 31). During Typhoon Pablo, the Philippine government’s own use of a designated hashtag (#PabloPH) helped it provide important updates. Standardizing hashtags during crisis events can help accurate, high-quality information flow both to the public and from the public.

What about false data? It must be considered whenever data is drawn from social media feeds. False information could cause urgent humanitarian aid to be sent to the wrong area, wasting time and resources or even costing a life. Access to information during disasters is as important as access to food, and false information is like poisoned or rotten food: nobody wants it. Even The New York Times, which many consider the gold standard of high-quality journalism, makes 7,000 corrections to its articles every year (Meier, 2015). Collecting information from both traditional and nontraditional sources can help create a reasonably accurate picture of the situation.

On Twitter, the number of tweets coming from one location compared with another can reveal a lot about conditions during a crisis. If an area’s tweet volume drops below its usual level, we can potentially infer mass casualties or a loss of electricity or internet access. If the volume is the same as or higher than average, we can assume the area was not hit as hard.

Lastly, what is our responsibility with big data, and what responsibility do we have to protect information that could create a safety risk? For example, a suburban New York newspaper published the names and addresses of handgun permit holders, including police officers, prison guards, and people in other sensitive occupations.
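The tweet-volume reasoning from the Chapter 2 summary can be made concrete with a short sketch. This is not Meier’s method; it is a minimal illustration that assumes hourly tweet counts are available for each area both before and during the crisis. The class `AreaActivity`, the function `classify_area`, the area names, and the 0.5 drop threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class AreaActivity:
    """Tweet volume for one area (hypothetical structure for illustration)."""
    name: str
    baseline_tweets: float  # assumed typical tweets per hour before the crisis
    current_tweets: float   # tweets per hour observed during the crisis


def classify_area(area: AreaActivity, drop_threshold: float = 0.5) -> str:
    """Label an area by comparing its crisis tweet volume with its own baseline.

    drop_threshold is an assumed cut-off: volume below this fraction of the
    baseline is flagged for follow-up, not treated as proof of damage.
    """
    if area.baseline_tweets <= 0:
        # No normal activity to compare against, so silence tells us nothing.
        return "no baseline"
    ratio = area.current_tweets / area.baseline_tweets
    if ratio < drop_threshold:
        # Sharp drop: possible casualties or loss of power or internet.
        return "possible severe impact"
    if ratio >= 1.0:
        # Same as or above the usual level: area likely not hit as hard.
        return "likely less affected"
    # Somewhere in between: keep monitoring.
    return "reduced activity"


if __name__ == "__main__":
    areas = [
        AreaActivity("Area A", baseline_tweets=200, current_tweets=20),
        AreaActivity("Area B", baseline_tweets=150, current_tweets=180),
        AreaActivity("Area C", baseline_tweets=100, current_tweets=70),
    ]
    for area in areas:
        print(f"{area.name}: {classify_area(area)}")
```

Comparing each area against its own baseline, rather than against other areas, matters because, as noted later in this summary, a large portion of the world’s population does not use social media. A quiet area may simply have been quiet before the crisis, so a low ratio should prompt follow-up through other sources rather than be read as evidence of damage.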

CH7: Verifying Big Crisis Data via Crowd Computing

This chapter focuses on verifying the authenticity of big data during a crisis. For example, volunteers supporting humanitarian efforts in Libya were screened to ensure they were not trying to sabotage that work. In the screening survey, applicants were asked to provide professional or academic email addresses. Their Twitter handles and Facebook pages were then used to review past tweets and posts and confirm each applicant’s authenticity. Individually, these pieces of data did not mean much, but taken together in context they provided a clearer picture of the volunteer being vetted. According to the text, the quality of crowdsourced information simply mirrors the reliability of society: if we have low confidence in crowdsourced information, that is a diagnosis of society rather than of the crowdsourcing tool itself. An investigative strategy is also essential when reviewing the information; such strategies include asking sources pointed questions and triangulating content to build a clear picture. One must also consider the platform being used and the type of information being shared. For example, Reddit is probably best suited to sharing pictures rather than information about a developing crisis. Reddit does not encourage analyzing a situation in order to discredit rumors.

CH8: Verifying Big Crisis Data via Artificial Intelligence

“No technology can automatically verify a piece of User-generated Content with 100% certainty. The human eye or traditional investigations aren’t enough either. It is the combination of the two” (Meier, 2015, p. 145). If everyone had perfect information during a crisis, it really would not be a crisis. While bad information can have far-reaching effects, we also need to weigh those effects against the effects of having no information at all. Verifying data is very important, but we must accept that we will have to deal with bad information as well and use the tools at our disposal to extract the accurate information. All information is good information, even when it is bad.

Key Takeaways (for Yellowdig)

Overview

Chapters 2, 7, and 8 focused on the use of data in crisis or disaster situations, how to verify that information, and what tools can be used to verify it. Some of the larger questions to review are:

How can big data be made usable in disasters?

“If governments and humanitarian organizations do not actively or explicitly create demand for relevant, findable, and high-quality social media content during disasters, then why should supply of high-quality data follow?” (Meier, 2015, p. 31)

The share of relevant data on social media varies greatly from one disaster to another. During the Joplin tornado (2011), 10% of the data shared on social media was relevant, compared with 65% during the Australian bushfires (2009) and 0.001% during Hurricane Sandy (2012).

Can the Data be Trusted?

During the 2010 Chilean earthquake, emergency services responded to requests on social media that included fake information. The false data sent emergency response teams on wild goose chases instead of to real emergencies. False data appeared again in Haiti, during Hurricane Sandy, and when a hacked Associated Press (AP) social media account claimed the White House had been attacked, briefly wiping out $130 billion in stock market value.

How Much Data is Right?

When responding to social media reports during a disaster, is one tweet as valuable as 1,000 tweets? Does quantity add to validity? If social media is being used to plan a disaster response, do areas without social media receive the same level of service? A large portion of the world’s population does not use social media.

Can crowdsourcing be used to verify data?

Volunteers sourcing data for humanitarian efforts in Libya were screened: their identification and social media accounts were vetted to ensure the information was not sabotaged. In Russia, the crowdsourced election-reporting function of social media was turned off during the election after massive numbers of election violations were reported; the government did not want this information shared.

Can Artificial Intelligence verify Big Data?

No technology can verify information on its own. Blending human “cognition” with technical processes delivers the most accurate results.

To make the best use of data in a crisis situation, being “right is more important than being first.”

Synopsis

Discussion Questions

Does the potential value of big data (social media and crowdsourcing) in crisis situations outweigh the risks of false data and misdirected emergency response?
Does the variability of usable information (0.001% to 65%) cause you to limit your expectations for the viability of crowdsourced data?

References

Meier, P. (2015). Digital humanitarians: How big data is changing the face of humanitarian response. Routledge. Ch. 2, The rise of big crisis data (pp. 31-47).
Meier, P. (2015). Digital humanitarians: How big data is changing the face of humanitarian response. Routledge. Ch. 7, Verifying big crisis data via crowd computing.
Meier, P. (2015). Digital humanitarians: How big data is changing the face of humanitarian response. Routledge. Ch. 8, Verifying big crisis data via artificial intelligence.