Guide to Building Data-Driven Organizations in the Public Sector

Open Data

Team 1: Robert Lott, James Hogue, & Kirsten VanDeventer

Topic Overview

Data management carries significant ethical and legal issues, as this week's readings show. Several questions consistently need to be addressed: How should data be regulated? What is the data being used for? How can a company maintain the anonymity of the clients it serves? As the readings “Data as a Public Good” and “Reality Mining” make clear, there is no shortage of data in the world; the true issues lie in how it is managed.

(James H.) Data is everywhere around us. It is collected in many ways and can answer many questions, but the biggest question is which questions need to be answered. A common theme I am seeing is that data provides insight into topics that would otherwise be impossible to measure. There are problems, and people have ideas about possible solutions, but only once they look at the math can they truly make predictions. In the material covered so far, I am seeing that there is no limit to the magnitude of projects that can be tackled with the appropriate amount of information. All over the US we rely on organizations such as the Farmers' Almanac and the National Weather Service for weather predictions, and in many cases those organizations save lives! The overview for starting this class tells me one thing: big pools of data are not just a window into American life; they are the only way for us to truly understand the world we live in.

Chapter Summaries

Social Physics, Ch. 5: Data as a Public Good, pp. 148-179, 179-188 (James H.)

Let's talk about DJ Patil and some of the history the US Government has had with open data. Back in 1996, while attending the University of Maryland, DJ began hacking the Department of Commerce servers just to see what kind of data he could find and what he could do with it. Fast forward to 2014, when President Obama asked him to come to the White House as its “Data Scientist”. DJ had a strong background in data and understood its power and what that could mean for finding better uses for the data the US Government collects. The US Department of Commerce does far more than its name suggests (it could as easily be called a department of science and technology): it collects and makes sense of all of the country's economic statistics, which are instrumental in understanding our culture and in building a data-driven approach to decisions, and which are windows into American life. The data that the Department of Commerce has collected amounts to years of statistical information from every area, and just one of those areas is weather. That data, gathered with billions of dollars of devices and research, is what the National Oceanic and Atmospheric Administration (NOAA), the National Weather Service, the Census Bureau, and many other organizations, both private and public, draw on, store, and share. Many people have used government data for purposes such as starting private businesses that help farmers, creating private weather reporting organizations such as AccuWeather, and spotting patterns within our own government so that its programs and overall operations can become more efficient. The transparency of this data has been protected for a long time, but the President and the heads of the individual departments control what is released to the public.
As President Trump took office, the transparency of government data started to disappear: many of the reports that citizens used to be able to view either contained much less information or disappeared completely. That is where the story of our country's first chief data scientist, and of all that data can offer, starts to come to an end.

Reality Mining: Census, Mobile Phones, and Internet Giants

Data in many formats is publicly available. Examples include the Census and the World Bank’s International Survey. Google uses World Bank data to integrate simple visualizations into its search results. Other sources include call data records (CDRs), which cell phone service providers keep much closer to the chest. This information can be used to identify the time and general location at which a call or text was made. Models of human movement and social networks can be developed from CDR data, and they have become increasingly popular with researchers. The chapter notes that gender, birthdate, and ZIP code, coupled with Census information, can uniquely identify an individual 63-87% of the time. AT&T is developing an approach called WHERE (Work and Home Extracted Regions), intended to protect the privacy of AT&T’s clients. It creates synthetic CDRs, and the chapter notes that “The main benefit of these synthetic CDRs is that they maintain the privacy of individuals because no data from real individuals are used” (Eagle & Greene, 2014).
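
The re-identification risk described above can be illustrated with a short calculation. The following is only a minimal sketch in Python, not the method used in the chapter: the records are made up for illustration, and the function name `unique_fraction` and the field names are assumptions. It counts how many records are the only one with their particular combination of gender, birthdate, and ZIP code.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Return the share of records whose combination of `keys` appears exactly once."""
    if not records:
        return 0.0
    # Build one tuple of quasi-identifier values per record.
    combos = [tuple(record[k] for k in keys) for record in records]
    counts = Counter(combos)
    # A record is "unique" if nobody else shares its combination.
    unique = sum(1 for combo in combos if counts[combo] == 1)
    return unique / len(records)


# Hypothetical toy records; real data would come from a much larger table.
people = [
    {"gender": "F", "birthdate": "1980-03-14", "zip": "40202"},
    {"gender": "F", "birthdate": "1980-03-14", "zip": "40202"},
    {"gender": "M", "birthdate": "1975-11-02", "zip": "40202"},
    {"gender": "F", "birthdate": "1991-07-30", "zip": "40205"},
]

QUASI_IDENTIFIERS = ("gender", "birthdate", "zip")
share = unique_fraction(people, QUASI_IDENTIFIERS)
print(f"{share:.0%} of records are unique on {QUASI_IDENTIFIERS}")
```

Running the same function with fewer fields (for example, gender alone) shows how quickly uniqueness rises as more attributes are combined, which is why seemingly harmless fields can identify a person once they are joined together.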

Google and Facebook use their data to power advertising networks. “Both companies also provide general statistics such as the number of emails or profiles your ad appears in. It’s this information that can be mined to determine overall sentiment toward a product across demographics…Facebook also offers an API that allows programmers to build applications that plug into Facebook users’ profiles. From these applications, a programmer can get access to as much information as a Facebook customer allows in his or her privacy settings, including phone numbers, contact lists, status updates, and any other identifying information” (Eagle & Greene, 2014).

Twitter’s data is largely public. The company offers an API for researchers to tap into called the “Twitter Firehose”. This, however, presents many issues, as tweets are often fragmented with hashtags and the like. Given that all this data is available, it is important to focus on how it can best be used. Many keepers of CDRs are beginning to realize that the information can be used effectively in large-scale circumstances.

NYT The Age of Big Data

The value of big data has arrived in commerce. From supply chain management to professional sports teams, the need to “crunch” numbers is real. Large corporations are in desperate need of staff who can work with these numbers to help trim excess and reduce production times. The cost savings associated with big data are real, and big data appears to be here to stay. It is not only the companies we would expect, like Google and Facebook, that use big data; other arenas, such as politics, where mathematics does not have a long history, are also taking advantage of its availability. Data analytics has also arrived on the big screen, with Brad Pitt’s movie “Moneyball” highlighting the recent success of baseball’s Oakland Athletics. Match.com also uses big data to assist in… well, matching potential suitors with one another. Love needs math. Retailers Kohl’s and Walmart have also used data to understand the economics of their stores and how their customers shop. So from Brad Pitt to Bentonville, Ark., the use and value of big data has helped big businesses succeed.

Big data as we know it is intrinsically valuable for large corporations that dominate their fields. But how can this kind of data help the average individual? Will that time come, or is the field just too overwhelming to implement on a small scale? Big data has the capability to become much larger. Where do we foresee the end of computing numbers? Is there a ceiling to this growing trend?

Key Take-Aways:

Discussion Questions

  1. Since our government (including individual departments) collects mass data and raw information, should all of it be given to us, as tax-paying citizens, without restriction and for us to use as we see fit?
  2. Should programs that run entirely on government-collected data (e.g., NOAA) have to fear competition from private companies? Should services that protect the safety and security of our people, and that are paid for by tax dollars, be able to be displaced by private corporations doing the same thing for profit?
  3. What are the responsibilities of the government (and each department) when it comes to how they monitor and distribute the data collected?
  4. What are some of the ramifications of utilizing internet giants for open data sources? At what point is the privacy of the individual no longer protected?
  5. How can APIs be regulated to protect anonymity? Who or what agency/organization should be responsible?
  6. What is the most effective way to mine data so that it is quickly actionable for those who need it? What are the implications of law and policy not catching up to the data that is mined?
  7. Big data as we know it is intrinsically valuable for large corporations that dominate their fields. But how can this kind of data help the average individual? Will that time come, or is the field just too overwhelming to implement on a small scale?
  8. Big data has the capability to become much larger. Where do we foresee the end of computing numbers? Is there a ceiling to this growing trend?
  9. Big data has tremendous value to larger companies. But how do we envision a mom-and-pop establishment using this kind of information in the future? Is this something that we legitimately see them using to turn additional profits? Yes? No?

References