Data
Open Data
Everyone enjoys discovering interesting datasets, but useful datasets are even better. The problem is that the open data movement has been too successful by some measures.
We have gone from a relatively data-poor environment in government and nonprofits to an amazing array of open data repositories, including the open government data portals built on Socrata and CKAN, the US federal portal at www.data.gov, and over 80,000 academic datasets posted on Dataverse.
Data is more useful when it comes with a vignette that provides some information about the nature of the data and potential uses. We will work to update the site with some helpful datasets for public affairs programs in the near future.
In the meantime, enjoy some of these resources:
OPEN DATA
Overview
- Background on the Open Data Movement [ link ]
- Ben Wellington’s TED Talk on Open Data in NYC [ link ]
DATA Act
- The Digital Accountability and Transparency (DATA) Act [ overview ] [ link ] [ link ] [ link ]
- Keynote Speech on Importance of DATA Act [ link ]
- Progress Tracker on Federal Open Data Compliance [ link ]
Impact of Open Data
- I Quant NY [ budget error ] [ metro fares ] [ parking tickets ]
- Realizing the Promise of Big Data: IBM Center for Gov. [ link ]
- Data Used in 2017 Public Policy Dissertations [ link ] [ broken link ]
Guides & Best Practices
- Project Open Data [ link ] [ principles ]
- Open North standards [ link ]
- Sunlight Foundation’s Open Data Guidelines [ link ]
- Global Impact of Open Data Book: GovLab / O’Reilly [ link ]
- The Hidden Cost (and Benefits) of Open Data [ link ]
- Data Maturity Framework [ link ]
- How to Share Data for Collaboration [ link ]
Government Portal and Resources
- How to Make Government Data Sites Better [ link ] [ link ]
- US Cities Open Data Census [ link ]
- Statewide Portal Tested in California [ link ]
- Five Largest Cities Now Have Open Data Policies [ link ]
- 40 Brilliant Open Data Projects for Smart Cities [ link ]
Machine Learning Training Data
- Top Sources for Machine Learning Datasets [ link ]
Useful Data Sources
APIs
- Awesome Public Datasets Page [ GitHub ]
- Quandl API (many data sources) [ link ] [ r package ]
- Census Data API [ acs package ] [ census api ]
- twitteR Package API [ link ]
- 19 Free Public Datasets (Springboard blog) [ link ]
- ckanr [ github ] [ vignette ]
- RSocrata [ github ]
- censusapi Package [ github ] [ slides ] [ tutorial ]
- @unitedstates [ about ] [ github ]
- Data USA [ link ] [ documentation ]
- Data Science Toolkit [ link ] [ r package ]
- Federal Government APIs [ link ]
- Strava GPS Data of Athletes by City [ blog ]
- rtimes Package: NYTimes API for government data [ link ]
- rsunlight Package: Wrapper for the Open Congress and Open States APIs [ link ]
Data for Teaching
- A Little Stats [ link ]
- Fun Data for Teaching [ link ]
- Forbes: 35 Open Data Sources of Note [ link ]
- 100 Interesting Datasets [ link ]
- Data and Story Library [ link ]
Open Data for the Nonprofit Sector
- Urban Institute’s NCCS Data [ link ]
- Nonprofit Open Data Collective [ link ]
- Bureau of Labor Statistics Employment Data by Sector [ link ]
- Association of Religion Data Archives Congregation Data by County 1950-2010 [ link ]
Poverty Action Lab Catalog of Administrative Data
A guide to data sources that have been used as the basis of sampling frames for randomized controlled trials (RCTs) in the US.
Data-Driven Journalism Project Portals
Data journalists are making their stories transparent by posting the data and code behind their work so that it can be easily replicated or extended.
- BBC creates graphics cookbook [ link ] [ cookbook ]
- Buzzfeed [ all projects on GitHub ]
- LA Times [ datadesk on GitHub ]
- Washington Post [ projects on GitHub ]
- Associated Press [ GitHub ] [ project template ] [ example ]
- The Economist [ GitHub ]
- Center for Public Integrity [ GitHub ] [ Workplace Discrimination Story ]
Disaster Management
SHELDUS Database [ link ]
Police Data Initiative
The Police Data Initiative is a law enforcement community of practice that includes leading law enforcement agencies, technologists, and researchers committed to engaging their communities in a partnership to improve public safety that is built on a foundation of trust, accountability, and innovation. The PDI represents the work and leadership of more than 130 law enforcement agencies that have released more than 200 datasets to date. It originated from several recommendations of the Task Force on 21st Century Policing that focused on technology and transparency.
Data Blogs
- Data is Plural by Jeremy Singer-Vine [ archive ]
“Awesome Data” Catalog
This is a sample of datasets relevant to the public and nonprofit sectors, drawn from the larger catalog of open public sources curated and managed by AwesomeData.
Agriculture
- Hyperspectral benchmark dataset on soil moisture
- U.S. Department of Agriculture’s Nutrient Database
- U.S. Department of Agriculture’s PLANTS Database
Climate+Weather
- Actuaries Climate Index
- Australian Weather
- Aviation Weather Center - Consistent, timely and accurate weather […]
- Brazilian Weather - Historical data (In Portuguese) - Data related to […]
- Canadian Meteorological Centre
- Climate Data from UEA (updated monthly)
- European Climate Assessment & Dataset
- Global Climate Data Since 1929
- NASA Global Imagery Browse Services
- NOAA Bering Sea Climate
- NOAA Climate Datasets
- NOAA Realtime Weather Models
- NOAA SURFRAD Meteorology and Radiation Datasets
- The World Bank Open Data Resources for Climate Change
- UEA Climatic Research Unit
- WU Historical Weather Worldwide
- WorldClim - Global Climate Data
ComplexNetworks
DataChallenges
Economics
- American Economic Association (AEA)
- EconData from UMD
- Economic Freedom of the World Data
- Historical MacroEconomic Statistics
- INFORUM - Interindustry Forecasting at the University of Maryland
- International Economics Database
- International Trade Statistics
- Internet Product Code Database
- Joint External Debt Data Hub
- Jon Haveman International Trade Data Links
- OpenCorporates Database of Companies in the World
- Our World in Data
- SciencesPo World Trade Gravity Datasets
- The Atlas of Economic Complexity
- The Center for International Data
- The Observatory of Economic Complexity
- UN Commodity Trade Statistics
- UN Human Development Reports
Education
Foundations
- International Aid Transparency Initiative (IATI) [ database of grants ]
- Ford Foundation Grants [ database ]
- Hewlett Foundation Grants [ database ]
GIS
- ArcGIS Open Data portal
- Cambridge, MA, US, GIS data on GitHub
- Factual Global Location Data
- IEEE Geoscience and Remote Sensing Society DASE Website
- Geo Maps - High Quality GeoJSON maps programmatically generated
- Geo Spatial Data from ASU
- Geo Wiki Project - Citizen-driven Environmental Monitoring
- GeoFabrik - OSM data extracted to a variety of formats and areas
- GeoNames Worldwide
- Global Administrative Areas Database (GADM) - Geospatial data organized […]
- Homeland Infrastructure Foundation-Level Data
- Landsat 8 on AWS
- List of all countries in all languages
- National Weather Service GIS Data Portal
- Natural Earth - vectors and rasters of the world
- OpenAddresses
- OpenStreetMap (OSM)
- Pleiades - Gazetteer and graph of ancient places
- Reverse Geocoder using OSM data
- Robin Wilson - Free GIS Datasets
- TIGER/Line - U.S. boundaries and roads
- TZ Timezones shapefiles
- TwoFishes - Foursquare’s coarse geocoder
- UN Environmental Data
- World boundaries from the U.S. Department of State
- World countries in multiple formats
Government
- Alberta, Province of Canada
- Antwerp, Belgium
- Argentina (non official)
- Datos Argentina - Portal de datos abiertos de la República Argentina. […]
- Austin, TX, US
- Australia (abs.gov.au)
- Australia (data.gov.au)
- Austria (data.gv.at)
- Baton Rouge, LA, US
- Belgium
- Brazil
- Buenos Aires, Argentina
- Calgary, AB, Canada
- Cambridge, MA, US
- Canada
- Chicago
- Chile
- China
- Dallas Open Data
- DataBC - data from the Province of British Columbia
- Denver Open Data
- Durham, NC Open Data
- Edmonton, AB, Canada
- England LGInform
- EuroStat
- EveryPolitician - Ongoing project collating and sharing data on every […]
- Federal Committee on Statistical Methodology (FCSM) (formerly FedStats)
- Finland
- France
- Fredericton, NB, Canada
- Gatineau, QC, Canada
- Germany
- Ghent, Belgium
- Glasgow, Scotland, UK
- Greece
- Guardian world governments
- Halifax, NS, Canada
- Helsinki Region, Finland
- Hong Kong, China
- Houston, TX, US
- Indian Government Data
- Indonesian Data Portal
- Ireland’s Open Data Portal
- Italy - Il Portale dati.gov.it è il catalogo nazionale dei metadati […]
- Japan
- Laval, QC, Canada
- Lexington, KY
- London Datastore, UK
- London, ON, Canada
- Los Angeles Open Data
- Luxembourg - Luxembourgish Open Data Portal
- MassGIS, Massachusetts, U.S.
- Metropolitan Transportation Commission (MTC), California, US
- Mexico
- Mississauga, ON, Canada
- Moldova
- Moncton, NB, Canada
- Montreal, QC, Canada
- Mountain View, California, US (GIS)
- NYC Open Data
- NYC betanyc
- Netherlands
- New Zealand
- OECD
- Oakland, California, US
- Oklahoma
- Open Data for Africa
- Open Government Data (OGD) Platform India
- OpenDataSoft’s list of 1,600 open data
- Oregon
- Ottawa, ON, Canada
- Palo Alto, California, US
- OpenDataPhilly - OpenDataPhilly is a catalog of open data in the […]
- Portland, Oregon
- Portugal - Pordata organization
- Puerto Rico Government
- Quebec City, QC, Canada
- Quebec Province of Canada
- Regina SK, Canada
- Rio de Janeiro, Brazil
- Romania
- Russia
- San Diego, CA
- San Antonio, TX - Community Information Now - CI:Now is a nonprofit […]
- San Francisco Data sets
- San Jose, California, US
- San Mateo County, California, US
- Saskatchewan, Province of Canada
- Seattle
- Singapore Government Data
- South Africa Trade Statistics
- South Africa
- State of Utah, US
- Switzerland
- Taiwan gov
- Taiwan
- Tel-Aviv Open Data
- Texas Open Data
- The World Bank
- Toronto, ON, Canada
- Tunisia
- U.K. Government Data
- U.S. American Community Survey
- U.S. CDC Public Health datasets
- U.S. Census Bureau
- U.S. Department of Housing and Urban Development (HUD)
- U.S. Federal Government Agencies
- U.S. Federal Government Data Catalog
- U.S. Food and Drug Administration (FDA)
- U.S. National Center for Education Statistics (NCES)
- U.S. Open Government
- UK 2011 Census Open Atlas Project
- U.S. Patent and Trademark Office (USPTO) Bulk Data Products
- Uganda Bureau of Statistics
- Ukraine
- United Nations
- Uruguay
- Valley Transportation Authority (VTA), California, US
- Vancouver, BC Open Data Catalog
- Victoria, BC, Canada
- Vienna, Austria
- U.S. Congressional Research Service (CRS) Reports
Healthcare
- Composition of Foods Raw, Processed, Prepared USDA National Nutrient Database for Standard […]
- EHDP Large Health Data Sets
- GDC - GDC supports several cancer genome programs for CCG, TCGA, TARGET etc.
- Gapminder World demographic databases
- MeSH, the vocabulary thesaurus used for indexing articles for PubMed
- Medicare Coverage Database (MCD), U.S.
- Medicare Data Engine of medicare.gov Data
- Medicare Data File
- Number of Ebola Cases and Deaths in Affected Countries (2014)
- Open-ODS (structure of the UK NHS)
- OpenPaymentsData, Healthcare financial relationship data
- PhysioBank Databases - A large and growing archive of physiological data.
- The Cancer Imaging Archive (TCIA)
- The Cancer Genome Atlas project (TCGA)
- World Health Organization Global Health Observatory
- Informatics for Integrating Biology & the Bedside
PublicDomains
- Amazon
- Archive.org Datasets
- Archive-it from Internet Archive
- CMU JASA data archive
- CMU StatLab collections
- Data.World
- Data360
- Enigma Public
- Grand Comics Database - The Grand Comics Database (GCD) is a nonprofit, […]
- Infochimps
- KDNuggets Data Collections
- Microsoft Azure Data Market Free DataSets
- Microsoft Data Science for Research
- Microsoft Research Open Data
- Numbray
- Open Library Data Dumps
- Reddit Datasets
- RevolutionAnalytics Collection
- Sample R data sets
- StatSci.org
- Stats4Stem R data sets (archived)
- The Washington Post List
- UCLA SOCR data collection
- UFO Reports
- Wikileaks 911 pager intercepts
- Yahoo Webscope
SearchEngines
- Academic Torrents of data sharing from UMB
- DataMarket (Qlik)
- Datahub.io
- Harvard Dataverse Network of scientific data
- ICPSR (UMICH)
- Institute of Education Sciences
- National Technical Reports Library
- Open Data Certificates (beta)
- OpenDataNetwork - A search engine of all Socrata powered data portals
- Statista.com - Statistics and studies
- Zenodo - An open dependable home for the long-tail of science
SocialNetworks
- 72 hours #gamergate Twitter Scrape
- Ancestry.com Forum Dataset over 10 years
- CMU Enron Email of 150 users
- Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape
- EDRM Enron EMail of 151 users, hosted on S3
- Facebook Data Scrape (2005)
- Facebook Social Networks from LAW (since 2007)
- Foursquare from UMN/Sarwat (2013)
- GitHub Collaboration Archive
- Google Scholar citation relations
- High-Resolution Contact Networks from Wearable Sensors
- Indie Map: social graph and crawl of top IndieWeb sites
- Mobile Social Networks from UMASS
- Network Twitter Data
- Reddit Comments
- Skytrax’ Air Travel Reviews Dataset
- Social Twitter Data
- SourceForge.net Research Data
- Twitter Data for Online Reputation Management
- Twitter Data for Sentiment Analysis
- Twitter Graph of entire Twitter site
- Twitter Scrape Calufa May 2011
- UNIMI/LAW Social Network Datasets
- Yahoo! Graph and Social Data
- YouTube Video Social Graph in 2007, 2008
SocialSciences
- ACLED (Armed Conflict Location & Event Data Project)
- Canadian Legal Information Institute
- Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc
- Correlates of War Project
- Cryptome Conspiracy Theory Items
- Datacards
- European Social Survey
- FBI Hate Crime 2013 - aggregated data
- Fragile States Index
- GDELT Global Events Database
- General Social Survey (GSS) since 1972
- German Social Survey
- Global Religious Futures Project
- Gun Violence Data - A comprehensive, accessible database that contains […]
- Humanitarian Data Exchange
- INFORM Index for Risk Management
- Institute for Demographic Studies
- International Networks Archive
- International Social Survey Program ISSP
- International Studies Compendium Project
- James McGuire Cross National Data
- MIT Reality Mining Dataset
- MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste
- Minnesota Population Center
- Notre Dame Global Adaptation Index (ND-GAIN)
- Open Crime and Policing Data in England, Wales and Northern Ireland
- OpenSanctions - A global database of persons and companies of political, […]
- Paul Hensel General International Data Page
- PewResearch Internet Survey Project
- PewResearch Society Data Collection
- Political Polarity Data
- StackExchange Data Explorer
- Terrorism Research and Analysis Consortium
- Texas Inmates Executed Since 1984
- Titanic Survival Data Set
- UCB’s Archive of Social Science Data (D-Lab)
- UCLA Social Sciences Data Archive
- UN Civil Society Database
- UPJOHN for Labor Employment Research
- Universities Worldwide
- Uppsala Conflict Data Program
- World Bank Open Data
- WorldPop project - Worldwide human population distributions
TimeSeries
- Databanks International Cross National Time Series Data Archive
- Hard Drive Failure Rates
- Heart Rate Time Series from MIT
- Time Series Data Library (TSDL) from MU
- UC Riverside Time Series Dataset
Transportation
- Airlines OD Data 1987-2008
- Ford GoBike Data (formerly Bay Area Bike Share Data)
- Bike Share Systems (BSS) collection
- GeoLife GPS Trajectory from Microsoft Research
- German train system by Deutsche Bahn
- Hubway Million Rides in MA
- Montreal BIXI Bike Share
- NYC Taxi Trip Data 2009-
- NYC Taxi Trip Data 2013 (FOIA/FOILed)
- NYC Uber trip data April 2014 to September 2014
- Open Traffic collection
- OpenFlights - airport, airline and route data
- Philadelphia Bike Share Stations (JSON)
- Plane Crash Database, since 1920
- RITA Airline On-Time Performance data
- RITA/BTS transport data collection (TranStat)
- Renfe (Spanish National Railway Network) dataset
- Toronto Bike Share Stations (JSON and GBFS files)
- Transport for London (TFL)
- Travel Tracker Survey (TTS) for Chicago
- U.S. Bureau of Transportation Statistics (BTS)
- U.S. Domestic Flights 1990 to 2009
- U.S. Freight Analysis Framework since 2007