Part 3 Dot Density and Dot Density Change Maps

The Setup

  • load libraries

    library(sf)
    library(tidyverse)
    library(tigris)
    library(tidycensus)
    library(ggrepel)
    options(tigris_use_cache=TRUE)
    options(tigris_class="sf")
census_api_key("Your Key here")

Bring in 2012 Census Data

  • Bring in variable related to educational attainment

    acs12 <- get_acs("tract", table = "B15003", cache_table = TRUE,
                 geometry = FALSE, state = "AZ", county = "Maricopa County",
                 year = 2012, output = "tidy")
  • As before, need to collapse educational attainment groups into 3 categories

    acs12 <- acs12 %>%
      mutate(
    id = str_extract(variable, "[0-9]{3}$") %>% as.integer
      ) %>%
      # variable 1 is the "total", which is just the sum of the others
      filter(id > 1) %>%
      mutate(education =case_when(
    id %>% between(2, 16) ~ "No HS diploma",
    id %>% between(17, 21) ~ "HS, no Bachelors",
    id > 21 ~ "At Least a Bachelors"
      )) %>% 
      group_by(GEOID, education) %>% 
      summarise(estimate2012 = sum(estimate))

MERGE 2012 and 2017 5-year ACS estimats

```r
acs <- merge(acs,acs12, by.all="GEOID", all.x=TRUE) # all.x=TRUE makes sure that the merged dataframe keeps all counties in CenDF even if missing in CenDF2012
```

Calculate Change

```r
acs <- acs %>%
  mutate( pct_change = 100 * (`estimate` - `estimate2012`) / `estimate2012`,
      change = estimate - estimate2012,
      abs.change = abs(change),
      change.type = ifelse( change >= 0, "POS", "NEG" ) )
```

GENERATING DOTS

  • Dot-density maps work by placing dots randomly within the appropriate geographic boundaries, to approximate the overall distribution of people in space. The function sf::st_sample samples points from within polygons.

  • We will split the data by education level, and then run the sampling function on each level for each block group, rbinding them back together at the end.

    acs_split <- acs %>%
      filter(estimate > 50) %>%
      split(.$education)
    
    generate_samples <- function(data) 
      suppressMessages(st_sample(data, size = round(data$estimate / 100)))
    
    points <- map(acs_split, generate_samples)
    points <- imap(points, 
               ~st_sf(data_frame(education = rep(.y, length(.x))),
                      geometry = .x))
    points <- do.call(rbind, points)
    • At this point, I’ve generated individual points to be plotted. sf can group and summarize geometry, in this case I group by education level and then summarize. I have just 3 layers of multipoints. I’ll also re-code education level to an ordered factor, to make plotting easier.
    points <- points %>% group_by(education) %>% summarise()
    points <- points %>%
      mutate(education = factor(
    education,
    levels = c("No HS diploma", "HS, no Bachelors",
               "At Least a Bachelors")))
    # view how many points are in each layer
    points %>% mutate(n_points = map_int(geometry, nrow))
    ## Simple feature collection with 3 features and 2 fields
    ## geometry type:  MULTIPOINT
    ## dimension:      XY
    ## bbox:           xmin: -113.3239 ymin: 32.65786 xmax: -111.0936 ymax: 34.04087
    ## epsg (SRID):    4269
    ## proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
    ## # A tibble: 3 x 3
    ##   education                                                    geometry n_points
    ## * <fct>                                                <MULTIPOINT [°]>    <int>
    ## 1 At Least a Bac… (-113.2919 33.28612, -113.2531 33.89895, -113.1802 3…     8589
    ## 2 HS, no Bachelo… (-113.3239 33.134, -113.3134 32.93492, -113.3123 33.…    15189
    ## 3 No HS diploma   (-113.2988 32.94418, -113.2777 33.9625, -113.2615 33…     3519

Plot

```r
theme_set( theme_minimal() +
        theme(panel.grid.major = element_line(size = 0),
              plot.background = element_rect( fill = "#fdfdfd",
                                              colour = NA ),
              axis.title = element_blank(),
              axis.text = element_blank(),
              legend.position = "bottom"))

plot7<-ggplot() + 
  geom_sf(data = points, 
      aes(colour = education,
          fill = education),
      size = .1) + 
  scale_color_brewer( type = 'qual', palette = 2 ) + 
  scale_fill_brewer( type = 'qual', palette = 2 )+
  ggtitle("Distribution of educational attainment in Maricopa County",
      "1 dot equals 100 people")

plot7
```

<img src="MapVis2_files/figure-html/unnamed-chunk-13-1.png" width="672" />
  • Not very informative plot other than the heavy concentration of purple dots, denotiing individuals with at least college education. We will work further to make this map more legible later.

Dot Change Maps

  • Can we get an idea about how changes in demographics of people with at least college education coming and leaving census tracts?

  • This time filter out sample of only bachelors and determine whether increase or decrease over time.

    acs_bachelors <-
      acs %>% 
      filter( education == 'At Least a Bachelors' )
    
    acs.split.bachelors <- acs %>%
      # filter( estimate > 50) %>%
      split( .$change.type )
    
    ##Generate Samples function
    
    generate_samples <- function(data)
    {
      suppressMessages( 
       st_sample( data, size = round(data$abs.change / 100 ) ) 
      )
    } 
    
    pointsChange <- map( acs.split.bachelors, generate_samples )
    pointsChange <- imap( pointsChange, 
               ~st_sf(data_frame( change.type = rep(.y, length(.x))),
                      geometry = .x))
    pointsChange <- do.call( rbind, pointsChange )
  • Assign positive changes (increase in educated population) steel blue, otherwise assign firebrick if negative change (decrease in educated population)

    pointsChange$col.code <- ifelse( pointsChange$change.type == "POS", "steelblue", "firebrick" )
    
    plot8<-ggplot() + 
      geom_sf( data = pointsChange, 
          aes(colour = change.type,
              fill = change.type ),
          size = .1)+
      scale_color_brewer(type = "div", palette ='Set1', direction = -1) + 
      scale_fill_brewer(type = "div", palette = 'Set1',direction = -1)+
      ggtitle("Distribution of Change in educational attainment in Maricopa County",
          "1 dot equals 100 people")
    
    
    plot8

  • This map also not very intuitive given most tracts experience net postive changes in educated population. A better illustration would require us to go back and split our change variable into additionl categories, i.e. change from negative/postive binary to several groups such as: negative, small positive change, medium positive change, and high positive change. We would want to focus on high positive change to show significant and rapid change in demogrpahics of residing population, an indiciation of gentrification

Adding Layers

  • The tigris package makes it easy to get various geography layers from TIGER, I’ll add water, major roads, and label the towns in the county. I’ll also pull down the outline of Maricopa County:

    water <- tigris::area_water("Arizona", "Maricopa County")
    towns <- tigris::county_subdivisions("Arizona", county = "Maricopa County")
    roads <- tigris::roads("Arizona", "Maricopa County")
    Az_county <- tigris::counties(state = "Arizona")
    Maricopa <- Az_county %>% filter(COUNTYFP == "Maricopa County")
    
    # create town labels by finding the centroid of each town
    # ggplot's label functions work better with X/Y dataframes rather 
    # than sf objects
    town_labels <- towns %>% select(NAME) %>%
      mutate(center = st_centroid(geometry)) %>%
      as.tibble %>%
      mutate(center = map(center, ~st_coordinates(.) %>%
                        as_data_frame)) %>%
      select(NAME, center) %>% unnest()

Plot

    #Dot Density Map
Plot9<-ggplot() +
  geom_sf(data = Maricopa, size = .1, fill = NA) +
  geom_sf(data = water, colour = "#eef7fa", size = .1,
          fill = "#e6f3f7") +
  geom_sf(data = points,
          aes(colour = education, fill = education),
          size = .1) +
  geom_sf(data = roads %>% filter(RTTYP %in% c("I", "S")),
          size = .2, colour = "gray40") +
  ggrepel::geom_label_repel(
    data = town_labels,
    aes(x = X, y = Y, label = NAME),
    size = 3, 
    label.padding = unit(.1, "lines"), alpha = .7) +
  ggtitle("Distribution of educational attainment in Maricopa County",
          "1 dot equals 100 people")
Plot9

# Dot Change Map
Plot10<-ggplot() +
  geom_sf(data = Maricopa, size = .1, fill = NA) +
  geom_sf(data = water, colour = "#eef7fa", size = .1,
          fill = "#e6f3f7") +
  geom_sf(data = pointsChange,
          aes(colour = change.type, fill = change.type),
          size = .1) +
  geom_sf(data = roads %>% filter(RTTYP %in% c("I", "S")),
          size = .2, colour = "gray40") +
  ggrepel::geom_label_repel(
    data = town_labels,
    aes(x = X, y = Y, label = NAME),
    size = 3, 
    label.padding = unit(.1, "lines"), alpha = .7) +
  ggtitle("Distribution of change in educational attainment in Maricopa County",
          "1 dot equals 100 people")

Plot10

Tmap

  • Lastly, use Tmap to visualize educational attainment changes

    ##TMAP
    library( tmap )
    library( sp )
    library( maptools )
    
    
    pointsChange2 <- st_transform( pointsChange, "+init=epsg:3395" )
    acs2 <- st_transform( acs, "+init=epsg:3395" )
    
    bb <- st_bbox( c( xmin = -12519146, xmax = -12421368, 
                  ymax = 3965924, ymin = 3899074 ), 
               crs = st_crs("+init=epsg:3395"))
    
    tmap_mode("plot")
    tm_shape( acs2, bbox=bb ) +
      tm_borders( "grey50", alpha = 0.3, lwd = 0.1 ) +
      tm_shape( pointsChange2 ) +
      tm_dots(  shape=19, col = "col.code", size=0.15, alpha=0.4 ) 

On your own

  1. In part 1, you transformed HHInc_HousePrice_Ratio from continuous variable to factor variable with 4 levels (i.e. quartiles), and created new cloropleth maps. Looking at grid.arrange to compare your new cloropleth maps (i.e. Plot 1, Plot3, and Plot 5), answer the following questions:

    1a. Which map do you think tells the most (and least) accurate representation of the data?

    Answer:

    1b. Do you think its best to let R automatically cut the varyiable for you (i.e. Plot 1) or better to adjust the variable cut yourself?

    Answer:

  2. In Step 3 of Part 2, what do you notice when looking at the descriptive statistics for the house price to income ratio during the two different time periods (i.e. is there an average increase, decrease or no change over time)? What is the range of the change variable and interpet the meaning?

    Answer:

  3. In Step 5 of Part 2, create at least 2 new cloropleth maps based on alternative cut values for the change in house price to income ratio. Comment on what you did and what map provides the most accurate depiction of the underlying data.

  1. In Part 3, we looked at the change in college educated population using dot change plots. Go back and look at the change in relatively uneducated population by looking at changes in population without high school degree.

    Hint Replace education grouping from above (acs_bachelors <- acs %>% filter( education == 'At Least a Bachelors' )) with No HS diploma. Then, re-run the code and show below 3 new maps from plot 8, plot 10, and tmap above. Write down any information that you can glean from these maps, along with criticisms or how to improve the map?

    #Edit me