Data Programming for Social Scientists

The Data Science Toolkit

You will need three tools to manage your data science projects: a data programming language (R), a project management interface (R Studio), and a way to create data-driven documents (R Markdown).

Core R [ CH-01 ]

R Studio [ CH-02 ]

Data-Driven Docs [ CH-03 ]

Markdown [ CH-04 ]

Getting Started

R as a Calculator [ CH-05 ]

Functions [ CH-06 ]

The Learning Curve [ CH-07 ]

Getting Help [ CH-08 ]

Starting to Code

One-Dimensional Datasets

Intro to Vectors [ CH-09 ]

Identifying Groups within Data [ CH-10 ]

Two-Dimensional Datasets

Dataframes

Matrices and Lists

Data IO

Getting Data into R [ data import ]

Saving Data [ exporting datasets ]

APIs [ using APIs in R ] [ Demo with DataUSA API ]

Data Wrangling (dplyr)

Data wrangling is the process of preparing data for analysis, which includes reading data into R from a variety of formats, cleaning data, tidying datasets, creating subsets and filters, transforming variables, grouping data, and joining multiple datasets.

The goal of data wrangling is to create a rodeo dataset (clean and well-structured) that is ready for the big show (modeling and visualization)!
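As a rough illustration of the steps listed above, the sketch below chains cleaning, transforming, grouping, summarizing, and joining with dplyr. The `people` and `cities` data frames and all of their values are invented for this example and do not come from the course materials.

```r
library(dplyr)

# Hypothetical person-level data, invented for illustration
people <- data.frame(
  id     = 1:6,
  city   = c("Phoenix", "Phoenix", "Tucson", "Tucson", "Mesa", "Mesa"),
  income = c(42000, 58000, NA, 36000, 61000, 47000)
)

# Hypothetical lookup table to join onto the summary
cities <- data.frame(
  city   = c("Phoenix", "Tucson", "Mesa"),
  county = c("Maricopa", "Pima", "Maricopa")
)

city_summary <- people %>%
  filter(!is.na(income)) %>%             # clean: drop rows with missing income
  mutate(income_k = income / 1000) %>%   # transform: income in thousands
  group_by(city) %>%                     # group: one group per city
  summarize(
    n = n(),                             # respondents per city
    mean_income_k = mean(income_k)       # average income per city
  ) %>%
  left_join(cities, by = "city")         # join: attach county to each city

print(city_summary)
```

Each verb takes a data frame and returns a data frame, which is why the steps can be piped together in the order you would describe them in plain language.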

Slicing Datasets – Base R and dplyr [ CH-11 ]

Wrangling Recipes [ CH-12 ]

Combining Datasets [ CH-13 ]

Explore and Describe

Group Structure [ CH-14 ]

Summarizing Vectors

Summarizing Groups of Vectors

Visualize

Principles of Visual Communication [ Intro to Data Viz ]

Core Graphics Engine [ Core ] [ Custom ]

Advanced Graphics

ggplot2 [ Intro to the Grammar of Graphics ]

Make Dynamic

R Shiny [ overview ] [ tutorial ]

flexdashboards [ overview ] [ demo RMD ]