CPP 526: Foundations of Data Science I




This lab offers practice with logical statements used to create groups from your data.

I have provided you with a LAB-02 RMD template:




Functions

You will use the following functions for this lab:



Data

This lab uses city tax parcel data from Syracuse, NY. See the Data Dictionary for variable definitions.



Loading Data Into R

You can load the dataset by including the following code chunk in your file:

Load the Syracuse Parcel Map

You will need the following packages for this lab:

NOTE: Do not include install.packages() commands in your RMD chunks. Trying to install packages while knitting can cause errors.

Load the map file:

For this lab you will construct a group by translating the question into a logical statement, then show a map of the group by adapting the code provided:

How many parcels are larger than one acre?

## [1] 1199
## [1] 0.02889017
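As a minimal sketch of how a question like this becomes a logical statement, the example below uses a small made-up data frame. The `toy` object, its values, and the `acres` column name are assumptions for illustration, not the Syracuse data. Because R treats TRUE as 1 and FALSE as 0, `sum()` counts the group members and `mean()` gives their proportion.

```r
# Toy stand-in for the parcel data (hypothetical values;
# 'acres' is an assumed column name)
toy <- data.frame(
  acres    = c(0.10, 2.50, 0.25, 1.20, 0.08),
  land_use = c("Single Family", "Parks", "Single Family",
               "Commercial", "Two Family")
)

# Logical statement: TRUE for every parcel larger than one acre
these <- toy$acres > 1

n.large    <- sum(these)    # TRUE counts as 1, so this counts group members
prop.large <- mean(these)   # share of all parcels that are in the group

n.large
prop.large
```

The same pattern should carry over to the real dataset by swapping in its data frame and variable names.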

To show the location of this group on a map:

What proportion of tax parcels are single family homes?

## [1] 0.5877307

Plot single family homes on a map:
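A rough sketch of the mapping idea, using toy point coordinates rather than the Syracuse parcel map: the logical statement is converted into a vector of colors, one per parcel, and that vector is passed to `plot()`. The `toy` data frame, its `x` and `y` columns, and the color choices are all hypothetical.

```r
# Toy parcels: hypothetical coordinates and land uses,
# not the Syracuse map file
toy <- data.frame(
  x        = c(1, 2, 3, 4, 5),
  y        = c(2, 4, 1, 5, 3),
  land_use = c("Single Family", "Commercial", "Single Family",
               "Parks", "Two Family")
)

# Logical statement defining the group
these <- toy$land_use == "Single Family"

# One color per parcel: highlight group members, gray out the rest
group.colors <- ifelse(these, "firebrick", "gray80")

# Draw every parcel as a point, colored by group membership
plot(toy$x, toy$y, col = group.colors, pch = 19, cex = 2,
     xlab = "", ylab = "", main = "Single Family Homes")
```

With a real map object, the color vector would be built the same way and supplied to the map's plotting call.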

Land Use

To define group membership using character vectors or factors, we need to know the current group names. We can find these with the table() or unique() functions.

##  [1] "Vacant Land"        "Single Family"      "Commercial"        
##  [4] "Parking"            "Two Family"         "Three Family"      
##  [7] "Apartment"          "Schools"            "Parks"             
## [10] "Multiple Residence" "Cemetery"           "Religious"         
## [13] "Recreation"         "Community Services" "Utilities"         
## [16] "Industrial"
## 
##          Apartment           Cemetery         Commercial 
##               1228                 35               2601 
## Community Services         Industrial Multiple Residence 
##                138                102                217 
##            Parking              Parks         Recreation 
##                437                 98                 55 
##          Religious            Schools      Single Family 
##                174                106              24392 
##       Three Family         Two Family          Utilities 
##                825               7259                103 
##        Vacant Land 
##               3732
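For comparison, here is a small sketch of the two functions on a made-up character vector (the values are hypothetical, chosen only to keep the output short). `unique()` lists each category once in order of first appearance, while `table()` counts the cases in each category and sorts the names alphabetically.

```r
# Hypothetical land use values for six parcels
land_use <- c("Apartment", "Single Family", "Parks",
              "Single Family", "Apartment", "Single Family")

# Each category listed once, in order of first appearance
group.names <- unique(land_use)

# Number of parcels in each category, names sorted alphabetically
group.counts <- table(land_use)

group.names
group.counts
```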

Note that the spelling has to be identical for the statement to work.

## [1] 1228
## [1] 0
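A short sketch of why spelling matters, using a made-up vector: the `==` operator compares strings exactly, including capitalization, so a lowercase label matches nothing and the count is zero rather than an error.

```r
# Hypothetical land use values for four parcels
land_use <- c("Apartment", "Single Family", "Apartment", "Parks")

# Exact match on the label as it appears in the data
n.exact <- sum(land_use == "Apartment")

# Same word with a lowercase 'a': no element matches
n.lower <- sum(land_use == "apartment")

n.exact
n.lower
```

A count of zero for a category you expect to exist is usually a sign that the label is misspelled.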

We often want to create a new group from sets of old groups:

## [1] 0.007204472
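One way to combine existing groups, sketched on a made-up vector: join several exact-match statements with the OR operator `|`, or use `%in%` to test membership in a set of labels. The choice of "Parks" and "Cemetery" here is arbitrary and only for illustration.

```r
# Hypothetical land use values for six parcels
land_use <- c("Schools", "Parks", "Single Family",
              "Cemetery", "Commercial", "Parks")

# Option 1: chain exact-match statements with OR
these.or <- land_use == "Parks" | land_use == "Cemetery"

# Option 2: test membership in a set of labels
these.in <- land_use %in% c("Parks", "Cemetery")

# Both statements define the same group
same.group <- identical(these.or, these.in)

# Proportion of parcels in the combined group
prop.group <- mean(these.in)

same.group
prop.group
```

The `%in%` form tends to be easier to read once a group combines more than two categories.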



Lab Instructions

Answer the following questions using the Syracuse parcels dataset and the functions listed.

Your solution should include a written response to the question, as well as the code used to generate the result.


1. How many Single Family homes are in Syracuse? Map your results.

sum() function and land_use variable


2. Where are the majority of commercial properties located in the city? Map your results.

land_use variable


3. Where is new housing stock being built?

Calculate the proportion of single family homes built since 1980. Map them.

count single family homes, and single family homes built since 1980: land_use and yearbuilt variables


4. How many parcels contain multi-family units? Map your results.

use the sum() function with the land_use variable;
create a group that includes parcels with apartments and two- and three-family homes


5. How many single family homes are worth more than $200,000? Map your results.

sum() with the assessedva and land_use variables


6. What proportion of parcels have delinquent tax payments owed?

mean() with the amtdelinqu variable


7. Does tax delinquency vary by land use?

table() with amtdelinqu and land_use variables



Submission Instructions

When you have completed your assignment, knit your RMD file to generate your rendered HTML file.

Log in to Canvas at http://canvas.asu.edu and navigate to the assignments tab in the course repository. Upload your HTML and RMD files to the appropriate lab submission link.

Platforms like BlackBoard and Canvas sometimes prevent you from submitting HTML files that contain embedded computer code. If this happens, create a zipped folder with both the RMD and HTML files and submit that instead.

Remember to:

See Google’s R Style Guide for examples.



Markdown Trouble?

If you are having problems with your RMD file, visit the RMD File Styles and Knitting Tips manual.

Notes on Knitting

When you knit a file, R starts from a blank slate. Packages you have loaded and datasets active in your local session are not carried over, so code chunks that run fine interactively can fail during knitting with errors that functions cannot be located or datasets don’t exist. Be sure your RMD file includes chunks that load the packages and data it needs.

Your RMD file will not knit if you have errors in your code. If you get stuck on a question, just add eval=F to the code chunk and it will be ignored when you knit your file. That way I can give you credit for attempting the question and provide guidance on fixing the problem.