Data Science Resources

Resources for those who want to use data science tools for work in government and nonprofits.



US Digital Services Overview

  • Inside Obama’s Stealth Startup [ link ]
  • Why I Joined the US Digital Services [ link ]
  • Five Examples of How Federal Agencies Use Big Data [ link ]

Examples of Good Local Government Portals

Artificial Intelligence’s Impact on Government

  • AI to Transform Government [ link ]
  • AI for the American People [ link ]
  • Delivering Artificial Intelligence in Government: Challenges and Opportunities [ IBM report ]
  • Brookings Center Report on Automation [ link ]
  • Developing AI for Federal Government [ link ]




Open Data

Overview

  • Background on the Open Data Movement [ link ]
  • Ben Wellington’s TED Talk on Open Data in NYC [ link ]

Impact of Open Data

  • I Quant NY [ link ]
  • Realizing the Promise of Big Data: IBM Center for Gov.  [ link ]
  • Data Used in 2017 Public Policy Dissertations [ link ]

DATA Act

  • The Data Transparency Act [ overview ] [ link ] [ link ] [ link ]
  • Keynote Speech on Importance of DATA Act [ link ]
  • Progress Tracker on Federal Open Data Compliance [ link ]

Guides & Best Practices

  • Project Open Data [ link ]  [ principles ]
  • Open North standards [ link ]
  • Sunlight Foundation’s Open Data Guidelines [ link ]
  • Global Impact of Open Data Book: GovLab / O’Reilly [ link ]
  • The Hidden Cost (and Benefits) of Open Data [ link ]

Awesome Public Datasets

Great starting points for finding topical open data:

Awesome Public Datasets

Government Portals and Resources

  • 40 Brilliant Open Data Projects for Smart Cities [ link ]
  • US Cities Open Data Census [ link ]
  • How to Make Government Data Sites Better [ link ] [ link ]
  • Statewide Portal Tested in California [ link ]
  • Five Largest Cities Now Have Open Data Policies [ link ]

Data-Driven Journalism Project Portals

Data journalists are making their stories transparent by posting the data and code used for their work so that it can be easily replicated or the work can be extended.

Useful Data APIs




Moneyball for Government



Predictive Analytics Models

  • Food Inspection Forecasting: case study on predictive analytics for food violations in Chicago [ link ]
  • Optimizing Infrastructure Repair [ measurement ] [ model ] [ news ]
  • Pretrial Criminal Risk Assessment for Judges [ link ]
  • Predicting Fire Hazards [ link ] [ model ]
  • Why the Bronx Really Burned - Predictive Analytics Fail [ link ]
  • Use Machine Learning to Predict Infrastructure Failure [ link ]
  • Using Prediction to Prioritize Water Infrastructure Maintenance [ link ] 
  • Using RFIDs to Regulate Marijuana Distribution in Colorado [ link ]
  • Crowd-Sourced Solutions [ about DrivenData ] [ current competitions ]
  • State and National Presidential Poll Aggregation [ link ]

Open Innovation

  • The Data-Driven Justice Initiative [ link ]
  • Next Stage in the Open Data Movement [ link ]
  • Challenge.gov: Using Competitions to Spur Innovation [ link ]
  • Data for Democracy [ link ]

Data Science Vignettes

Some cool applications of open data + open source tools.

  • Spatial Analysis (GIS) for Urban Policy [ link ]
  • Network Analysis [ link ]
  • Child Welfare [ link ]
  • Social Media Data [ link ]
  • Moneyball for Infrastructure [ measurement ] [ model ]
  • Target Food Safety Inspections with Open Data [ link ]




Collaboration Tools

GitHub

Working in groups is hard. Most work is done in groups. As a result, project management is a non-trivial task that should not be approached in an ad-hoc fashion. The field of data science has inherited many great collaboration tools that were developed to manage large teams of software engineers, but are being used for many other creative purposes:



Data-Driven Documents

For the purpose of transparency and reproducibility, as well as simple convenience, there is high demand for documents that combine the typical elements of publications and reports (text, tables, graphs, and images) with the code that was used to create the analysis presented in the text. These efforts have largely converged on Markdown as a simple publishing language, and on derivatives like R Markdown to incorporate output from models into documents.
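To make the idea of mixing text and code concrete, here is a minimal sketch that writes a small R Markdown file from R. The title, heading, file name, and the use of the built-in `cars` data are assumptions chosen for illustration, not part of any particular course template.

```r
# Build the lines of a minimal R Markdown file (all content here is illustrative).
fence <- strrep("`", 3)  # three backticks open and close a code chunk

rmd <- c(
  "---",
  "title: \"Example Report\"",
  "output: html_document",
  "---",
  "",
  "# Stopping Distances",
  "",
  "The table below summarizes the built-in `cars` data.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Save to a temporary folder so nothing in the working directory is touched.
path <- file.path(tempdir(), "example-report.Rmd")
writeLines(rmd, path)

# With the rmarkdown package installed, this would knit the file to HTML:
# rmarkdown::render(path)
```

The header between the two `---` lines holds settings, the plain lines are ordinary Markdown text, and the chunk between the backtick fences is R code whose output is inserted into the finished document.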

Markdown is a simple set of rules used to format text and images. Formatting is accomplished by adding tags to text.

# H1
## H2
### H3


The basics are very easy to master by referencing a basic Cheat Sheet.

But don’t let the simplicity fool you. Markdown documents are extremely versatile and powerful. Using the same text and code in a document, minor changes can be made to select a variety of document outputs that best meet the needs of the client or team. For example, check out the diversity of formats available in the R Markdown Gallery.

RStudio makes it easy to create R Markdown documents, and you can select the format by changing the output type. Perhaps you have a regular report created as an HTML page:

--- 
output: html_document
---

And you want to re-organize the material into a dashboard. Simply change the output type:

---
output: flexdashboard::flex_dashboard
---

Then add a few page dividers, and your analysis will now be organized something like this StoryBoard.
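The same switch can be sketched in code. The helper below, `set_output_format()`, is a hypothetical function written for this example (it is not part of rmarkdown); it rewrites the `output:` line in a document's header and leaves the rest untouched. The sample report lines are also made up.

```r
# Hypothetical helper: replace the `output:` field in an R Markdown header.
set_output_format <- function(rmd_lines, new_format) {
  # The header sits between the first two lines that contain only "---".
  delims <- which(trimws(rmd_lines) == "---")
  if (length(delims) < 2 || delims[2] - delims[1] < 2) {
    stop("no usable YAML header found")
  }
  header_idx <- seq(delims[1] + 1, delims[2] - 1)
  is_output  <- grepl("^output:", rmd_lines[header_idx])
  rmd_lines[header_idx[is_output]] <- paste("output:", new_format)
  rmd_lines
}

# A made-up report that currently renders as an HTML page.
report <- c(
  "---",
  "title: \"Monthly Report\"",
  "output: html_document",
  "---",
  "",
  "## Summary"
)

# The same text and code, now targeted at a dashboard layout.
dashboard <- set_output_format(report, "flexdashboard::flex_dashboard")

# Alternatively, with the rmarkdown package installed, the format can be
# chosen at render time without editing the file:
# rmarkdown::render("report.Rmd", output_format = "pdf_document")
```

Only the header line changes; the body of the analysis is reused as is, which is the point of keeping text and code in one source document.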

Markdown is used on GitHub, Stack Overflow, and in R Markdown documents. Familiarity with the basics offers a lot of power in controlling how your analysis is presented to your audience.

You can see some advanced R Markdown features HERE.

Download an R Markdown Template for labs HERE.

Check out some NICE THEMES for R Markdown documents.




Methods

Visualization

Compendium of Clean Graphs in R: [ link ]
The Data Viz Project [ link ]
Gallery of ggplot geoms [ link ]
Creating More Effective Graphs [ book ] [ gallery ]
Data + Design: Ebook On Data [ pdf ]
An Economist’s Guide to Visualizing Data [ pdf ]
Visuals for Teaching Statistics [ link ] [ link ]
Bl.ocks.org Graphics Gallery [ link ]
Help Me Viz Graphics Gallery [ link ]
What Makes a Map Beautiful? [ link ]
Tableau: Which chart or graph is right for you. [ link ]
Flowing Data [ link ]
Graphics in R Tutorial: [ FlowingData ]
ChartsNThings: A Blog by the NYT Graphics Dept [ link ]
Data Viz Syllabus by Quealy & Carter [ link ]
Junk Charts: Blog on Making Graphics Better [ link ]
Primer on Making Great Graphs in R [ download ]
10 Tips for Making R Graphics Look Good [ link ]
Data USA [ link ]
CityBike Data Visualized [ link ]
Arms Sales Visualized [ link ]
Pedestrian & Routes in US Cities Visualized [ link ] & Europe [ link ]
Winners of Infographic Awards [ link ]
Visual Essays [ link ]

Bad Graphs

How to Display Data Badly [ link ]
Clowns [ link ]
Worst of 2017 [ link ]
More Worst [ link ]
Calling Bullshit [ Misleading Axes ] [ Proportional Ink ]
Label Your Axes [ link ]
Pie Charts [ link ] [ link ]
Foreign Aid as Missile Attacks [ link ]

Dashboard Design

R Shiny Showcase [ link ]
R Shiny Widgets Gallery [ link ]
Nonprofit Dashboard Design [ webinar ] [ slides ]
Tableau: 6 Best Practices of Effective Dashboards [ download ]

Dashboard Examples

Pittsburgh Building Permits [ link ]
Government Performance in Chattanooga [ link ]
Fundraising Dashboard in R [ link ]
DataUSA [ link ]
Census Reporter [ link ]
Teacher Dashboard on Student Performance [ link ]
Vehicle collisions in Edinburgh [ link ]
Traffic accidents in London [ link ]
Life Expectancy Charts [ link ] [ link ]
Rise of Inequality [ link ]
World Development Indicators [ link ]
Demographics in Catalonia, Spain [ link ]
Tableau Gallery [ link ]

Text Analysis

  • Quanteda [ link ]
  • Who Wrote the Anonymous Op-Ed? [ link ] [ link ]




Style Guides

Coding style is the handwriting of the coding world. Some people write clean, readable code; others write sloppy code that is hard to read. Consider the readability of this:

y<-cut(rank(x),breaks=seq(from=0,to=100,by=10),labels=paste("X",1:10,sep=""))

Versus:

y <- cut( rank( x ), breaks=seq( from=0, to=100, by=10 ), labels=paste( "X", 1:10, sep="" ) )

Do yourself and all of your future collaborators a favor and try to develop a consistent coding style. There are two popular style guides for R:

Think of these suggestions as good habits that will make your life easier and will improve your ability to collaborate with others. And remember, your most important collaborator is yourself two months from now!
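As one illustration of the readability point above, the decile-coding step can also be split into named pieces. This is a sketch rather than a prescribed style: the variable names and the simulated `x` are assumptions, and the breaks start at 0 so that the ten intervals line up with the ten labels.

```r
# Simulated input for illustration only: 100 random values.
set.seed(42)
x <- rnorm(100)

# Step 1: convert raw values to ranks (1 = smallest, 100 = largest).
x_rank <- rank(x)

# Step 2: define ten equal-width rank bins and a label for each one.
decile_breaks <- seq(from = 0, to = 100, by = 10)  # 11 cut points, 10 bins
decile_labels <- paste0("X", 1:10)

# Step 3: assign each observation to its bin.
y <- cut(x_rank, breaks = decile_breaks, labels = decile_labels)

# Count how many observations landed in each bin.
table(y)
```

Naming the intermediate objects costs a few extra lines, but each step can be checked on its own, which is what the future collaborator (or future you) will need.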