Guide to Building Data-Driven Organizations in the Public Sector

Processing Satellite Data

Team 6 - Tommy Kolwicz and Dennis S Stockwell

Topic Overview

The three references take us through current best practices for collecting and analyzing different types of imagery and show how the public can help process large volumes of data. It is not humanly possible for imagery experts and scientists to analyze the massive amount of satellite, and soon UAV, imagery collected on a daily basis. Crowdsourcing the detection and flagging of whatever the item of interest happens to be is the most efficient method in place today. Remarkably, no matter what the topic of interest is, the public wants to help. Humanitarian, scientific, wildlife preservation, and archeological projects have all found a following.

Only machines have been shown to do a better job than humans at analyzing imagery. The second reading takes us to the role AI and machine learning are playing in the processing of data and how we are teaching AI, through crowdsourced training sets, to process data so that interpretation can be done faster and at a larger scale. Time is the element that we are continually attempting to harness. The last reading asks us to consider the implications of massive data collection campaigns. Clearly this is a technological capability that we currently possess. The podcast poses the question, “Should we?” It also ponders whether people truly understand how the data will be used and what privacy we may be giving up for the proposed positive outcomes associated with the data collection.

Chapter Summaries

Crowd Computing Satellite and Aerial Imagery

Digital Humanitarians by Patrick Meier (Chapter 4)

Big Data Fusion, the integrated use of multiple sensor systems and web tools to produce a coherent operational picture, is the future of crowdsourced digital humanitarian work. The crowdsourcing itself relies on microtasking: splitting a large analysis job into many small tasks that volunteers can complete quickly. Recent history has shown that a number of sensor systems can collect the imagery needed for analysis in a crisis.

On one end of the spectrum is extensive, and expensive, satellite imagery. In 2014, satellite imagery was provided to digital humanitarians online to help in the search for Malaysia Airlines Flight 370. In just four days, 8 million volunteers had combed over 400,000 square miles of ocean and land. Crowdsourced imagery analysis has also been used to search for the grave of Genghis Khan, search for Steve Fossett’s missing airplane, count shelters in Somalia, and tag galaxies for astrophysicists.

At much lower cost, digital humanitarians can buy Unmanned Aerial Vehicles (UAVs). These are becoming less expensive by the day and can be used to collect images of large areas or many images of focused areas. Either way, UAVs fly below the clouds and so avoid the cloud cover and other atmospheric effects that can cause trouble for satellite images. On the very inexpensive end of the spectrum are kites and balloons. In 2010, balloons and kites that could produce pseudo-satellite images were used to document the BP oil spill and the magnitude of the destruction.

No matter the method of collection, the crowdsourcing of analysis is what makes this methodology so powerful. Some organizations are now even experimenting with crowdsourcing the crowdsourcing: the most accurate and efficient digital humanitarians are assigned to review the work of others. The most important images are then sent to professional analysts to review and eventually disseminate to the stakeholders on the ground.
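As a rough illustration of how this kind of review pipeline might work, the sketch below combines volunteer tags by majority vote and then scores each volunteer against the consensus. It is not the method used by any organization named in the chapter. The function names, the vote format, and the thresholds (`min_votes`, `min_agreement`) are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def aggregate_tags(votes, min_votes=3, min_agreement=0.7):
    """Combine volunteer tags per image tile by majority vote.

    votes: iterable of (tile_id, volunteer_id, label) tuples (hypothetical format).
    Returns (accepted, needs_review): a dict of tile_id -> consensus label,
    and a list of tiles with too few votes or too little agreement.
    """
    by_tile = defaultdict(list)
    for tile_id, _volunteer, label in votes:
        by_tile[tile_id].append(label)

    accepted, needs_review = {}, []
    for tile_id, labels in by_tile.items():
        label, count = Counter(labels).most_common(1)[0]
        if len(labels) >= min_votes and count / len(labels) >= min_agreement:
            accepted[tile_id] = label
        else:
            needs_review.append(tile_id)  # escalate to a trusted reviewer
    return accepted, needs_review


def volunteer_accuracy(votes, accepted):
    """Share of each volunteer's tags that match the accepted consensus."""
    hits, totals = Counter(), Counter()
    for tile_id, volunteer, label in votes:
        if tile_id in accepted:
            totals[volunteer] += 1
            hits[volunteer] += label == accepted[tile_id]
    return {v: hits[v] / totals[v] for v in totals}


votes = [
    ("t1", "ana", "debris"), ("t1", "ben", "debris"), ("t1", "cy", "debris"),
    ("t2", "ana", "debris"), ("t2", "ben", "nothing"), ("t2", "cy", "nothing"),
]
accepted, needs_review = aggregate_tags(votes)
scores = volunteer_accuracy(votes, accepted)
```

Under these assumptions, tiles where volunteers disagree are the ones passed to the most accurate reviewers, and only the tiles that matter most after that step would reach professional analysts.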

Just as texts and images have been used to create crisis maps that aid emergency response during natural disasters and search and rescue operations, aerial selfies taken with small personal UAVs are beginning to join the world of crisis mapping. Aerial views of points of interest to digital humanitarians can now be added to a crisis map and combined with tweets and pictures from Image Clickers. The result: Big Data Fusion.

Artificial Intelligence in the Sky

Digital Humanitarians by Patrick Meier (Chapter 6)

When it comes to satellite imagery, the sheer quantity of images is now outpacing even the capacity of crowdsourced microtasking. “Microtasking alone can’t actually keep up with 1.5 million square miles of new satellite imagery produced every day – a figure that will increase substantially within just a few years.” Enter the machines. That’s right, just like something out of an 80s movie with Arnold Schwarzenegger, one solution being researched is machine learning and artificial intelligence. Machine learning has shown a lot of capability in organizing and prioritizing pictures from disaster response, using many of the same techniques that are applied to text. Automated imagery analysis has been shown to be extremely effective. In Haiti, the European Commission’s Joint Research Centre (JRC) showed a 92% accuracy rate in automatically identifying rubble-filled areas. Unfortunately, “image-based machine-learning classifiers do not ‘port’ well.” In other words, a classifier built for one area does not necessarily work for a disaster response in another area.

Satellite imagery has also been shown to be useful in following refugee migration, estimating building size, and supporting other aspects of disaster relief, using image characteristics such as shadows. The hard part is obtaining satellite imagery within the first 24 hours, the window in which it can still have an impact.

Organizations like Galaxy Zoo have demonstrated that by using crowdsourced, human-classified imagery to “train” the machines, they can achieve classification accuracy greater than 90%. Eventually, at least for Galaxy Zoo, the machines began to outperform the humans and put the volunteers on the sidelines. However, volunteer microtaskers are still needed to create new machine training sets.

UAV imagery is about to become as much of a “big data problem” as satellite imagery. However, the creation of training sets that let machine learning make sense of UAV images is already underway via crowdsourcing and microtasking. The University of Maryland’s Institute for Advanced Computer Studies has already developed specialized software that uses pattern-recognition algorithms to automatically identify rhino poachers and even the type of weapon they are carrying. These techniques could readily be applied to humanitarian disasters.

Eye in the Sky

Radiolab Podcast https://www.wnycstudios.org/story/eye-sky

Radiolab Podcast: Eye in the Sky was produced by Radiolab, a radio program on WNYC, a public radio station in New York City. The podcast included Manoush Zomorodi and Alex Goldmark from the podcast “Note to Self” and Mr. Ross McNutt, the CEO of Persistent Surveillance Systems. Mr. McNutt, a 20-year Air Force veteran, was part of an Air Force team looking at how to reduce deaths from IED attacks in Iraq. McNutt’s team developed a system (Project Angel Fire) that photographed a large area at a rate of one photo per second, creating a record that can be reviewed later. Once an incident has occurred or has been reported, those photos can be viewed in reverse, starting from the incident and stepping back in 1-second intervals in an effort to discover when the device was planted. If the vehicle responsible for the IED can be identified, the technology can also help answer multiple other questions. Where did the car come from? Did it pick anyone up? Where did it start? The photographs can also be played forward post-event to see where the vehicle went after planting the device. Based on the review of the photographs, security forces can be sent to the most likely location of the terror cell.
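The reverse playback described above can be sketched in a few lines. This is only an illustration, not a description of how Project Angel Fire works internally: it assumes vehicle positions have already been extracted from each one-second frame as (x, y) points, and it follows a vehicle backward by always picking the nearest detection in the previous frame. The function name `trace_back` and the `max_step` limit are hypothetical.

```python
import math


def trace_back(frames, incident_index, start_xy, max_step=30.0):
    """Follow a vehicle backward in time from an incident.

    frames: list with one entry per second; each entry is a list of
            (x, y) vehicle detections (assumed to exist already).
    incident_index: index of the frame in which the incident occurred.
    start_xy: position of the vehicle of interest in that frame.
    max_step: largest distance a vehicle may plausibly move in one second.
    Returns the path from the incident back to the earliest matched frame.
    """
    path = [start_xy]
    current = start_xy
    # Step through earlier frames in reverse order, one second at a time.
    for i in range(incident_index - 1, -1, -1):
        candidates = frames[i]
        if not candidates:
            break  # nothing detected in this frame; the trail is lost
        nearest = min(candidates, key=lambda p: math.dist(p, current))
        if math.dist(nearest, current) > max_step:
            break  # closest detection is too far away to be the same vehicle
        path.append(nearest)
        current = nearest
    return path


frames = [
    [(0, 0), (100, 100)],   # t = 0 s
    [(10, 0), (100, 90)],   # t = 1 s
    [(20, 0), (100, 80)],   # t = 2 s (incident)
]
path = trace_back(frames, incident_index=2, start_xy=(20, 0))
```

Playing the photographs forward after the event would be the same loop run in the other direction. A real system would need far more robust tracking than a nearest-point match.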

The podcast looks at the Project Angel Fire concept and examines the civilian impact of this technology, primarily law enforcement applications and the tradeoff between security and privacy. Could this technology be used in the future by divorce lawyers, by real estate agents to monitor property, and so on? The podcast recognized the technology’s significant potential to support law enforcement. A prime example is the city of Juarez, Mexico, after the killing of a police officer. The technology was used to identify the vehicles involved, who they came in contact with, and where they went. That data was used to map the locations these vehicles had in common so that law enforcement could make arrests at these mapped locations. This ultimately took down a drug cell that had been linked to multiple murders.
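The idea of mapping the locations that several vehicles had in common amounts to counting how many distinct vehicles visited each place. The sketch below shows that counting step only. The vehicle names, location names, and the `min_vehicles` threshold are made up for the example and are not taken from the podcast.

```python
from collections import Counter


def shared_locations(visits, min_vehicles=2):
    """Return locations visited by at least `min_vehicles` distinct vehicles.

    visits: dict mapping a vehicle id to the locations it was seen at
            (hypothetical input; real tracks would need to be geocoded first).
    """
    counts = Counter()
    for locations in visits.values():
        counts.update(set(locations))  # count each vehicle once per location
    return sorted(loc for loc, n in counts.items() if n >= min_vehicles)


visits = {
    "car_a": ["garage", "house_1", "garage"],
    "car_b": ["garage", "shop"],
    "car_c": ["house_1", "garage"],
}
common = shared_locations(visits)
```

Raising `min_vehicles` narrows the result to the places most of the tracked vehicles share.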

The podcast also discusses the experience of the city of Dayton, Ohio. The city attempted to implement this program and held an open forum to allow residents to discuss their concerns. While many supported the program, those who did not were the loudest and prevented implementation. The podcast did recognize that the program might be implemented in the future if the city can develop a better communication strategy that explains the program and its limitations and reassures the community about what the program will be used for.

The podcast participants clearly recognize the potential positive impacts this technology brings to the table, but raise questions about our government and data. Can we trust our government to use the data as they say they will? Just because we can, should we? In my opinion, this technology will eventually be used in those cities that can afford it. Programs like this are always better run in the open under legal oversight than in the shadows as intelligence programs, where they may lack the oversight required to ensure they do not exceed their guidance.

Key Take-Aways (for Yellowdig)

https://www.youtube.com/watch?v=fVL3TC9wC5Q&feature=youtu.be

Discussion Questions

  1. Chapter 4 - Have you heard the terms microtasking or crowdsourcing? Have you been an internet volunteer or a digital humanitarian? Maybe you were and you didn’t even know it.
  2. Chapter 6 - What do you think about the development of AI (artificial intelligence) to help humans interpret big data? Do you trust AI to make decisions for us?
  3. Eye in the Sky - Because we can, should we? Do we trust those with the data to use it as it was intended? What are the privacy concerns, and can we protect the data once it enters the public domain? Although the information is being used for one purpose now, will there be future purposes for it that we haven’t even thought of yet?

References