CPP 526: Foundations of Data Science I




This lab offers practice with some basic R functions for summarizing vectors.

I have provided you with a LAB-01 RMD template to get you started.




Functions

You will use the following functions for this lab: dim(), nrow(), sum(), length(), and table().



Data

This lab uses city tax parcel data from Syracuse, NY. [ Data Dictionary ]



Loading Data Into R

You can load the dataset by including the following code chunk in your file:

Note that referencing variables in R requires both the dataset name and the variable name, separated by the $ operator, in the form dataset$variable.

Unlike other stats programs, R lets you have several datasets loaded at the same time. They will often contain variables with the same name (if you create a subset, for example, and save it as a new object, you will have two datasets with identical variable names). To avoid conflicts, R requires the dataset$variable convention so that it is always clear which dataset a variable comes from.
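As a minimal sketch of the dataset$variable convention, the chunk below builds two small made-up data frames. The object names parcels and exempt and all values are hypothetical and are not taken from the Syracuse data; only the variable names acres and tax.exempt come from this lab.

```r
# Made-up data frame for illustration only (not the Syracuse parcels)
parcels <- data.frame(
  acres      = c(0.10, 0.25, 0.50, 1.20),
  tax.exempt = c(FALSE, TRUE, FALSE, TRUE)
)

# A subset saved as a new object keeps the same variable names
exempt <- parcels[ parcels$tax.exempt, ]

# The dataset name tells R which 'acres' vector you mean
parcels$acres   # acres for all four parcels
exempt$acres    # acres for the two tax-exempt parcels only
```

Both objects contain a variable called acres, so the prefix before the $ is what distinguishes them.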



Lab Instructions

Answer the following questions using the Syracuse parcels dataset and the functions listed.

Your solution should include a written response to the question, as well as the code used to generate the result.


1. How many tax parcels are in Syracuse?

dataset dimensions: dim() or nrow()
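
A quick sketch of how dim() and nrow() behave, using a made-up data frame called toy (a hypothetical name with invented values, not the lab dataset):

```r
# Toy data frame with 5 rows and 2 columns (values are invented)
toy <- data.frame(
  id    = 1:5,
  acres = c(0.1, 0.2, 0.3, 0.4, 0.5)
)

dim( toy )    # number of rows, then number of columns: 5 2
nrow( toy )   # number of rows only: 5
```

When each row of a dataset represents one observation, the row count is the number of observations.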


2. How many acres of land are in Syracuse?

sum() over the numeric acres vector


3. How many vacant BUILDINGS are there in Syracuse?

sum() over the vacantbuil logical vector


4. What proportion of parcels are tax exempt?

sum() plus length() functions with the logical tax.exempt vector
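
One way to sketch the proportion calculation, assuming a short made-up logical vector (the values below are invented and only reuse the name tax.exempt for readability):

```r
# Invented logical vector: TRUE means the parcel is tax exempt
tax.exempt <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

n.exempt <- sum( tax.exempt )      # counts the TRUEs: 2
n.total  <- length( tax.exempt )   # counts all elements: 5

n.exempt / n.total                 # proportion of TRUEs: 0.4
```

The count of TRUEs divided by the number of elements gives the share of cases where the condition holds.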


5. Which neighborhood contains the most tax parcels?

table() with the neighborhood variable
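
As an illustration of table() on a single categorical variable, here is a sketch with a made-up vector. The labels "A", "B", and "C" are placeholders, not real Syracuse neighborhoods.

```r
# Invented category labels, one per parcel
neighborhood <- c("A", "B", "A", "C", "A", "B")

counts <- table( neighborhood )
counts                              # A = 3, B = 2, C = 1

# Sorting puts the largest group first
sort( counts, decreasing = TRUE )

# Name of the most frequent category
names( which.max( counts ) )        # "A"
```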


6. Which neighborhood contains the most vacant LOTS?

table() with the neighborhood and land_use variables
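
When table() is given two variables it returns a cross-tabulation. The sketch below uses invented values; the land-use labels are assumptions for illustration, so check the Data Dictionary for the categories that actually appear in the parcel data.

```r
# Invented vectors of equal length, one element per parcel
neighborhood <- c("A", "B", "A", "C", "A", "B")
land_use     <- c("Vacant Land", "Single Family", "Vacant Land",
                  "Vacant Land", "Single Family", "Vacant Land")

# Rows are neighborhoods, columns are land-use categories
two.way <- table( neighborhood, land_use )
two.way

# Pull out one column to compare neighborhoods on a single category
two.way[ , "Vacant Land" ]    # A = 2, B = 1, C = 1
```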



HELPFUL HINTS:

When you apply a sum() function to a numeric vector it returns the sum of all elements in the vector.

When you apply a sum() function to a logical vector, it will count all of the TRUEs:
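
A small illustration of both cases, using made-up vectors (the values are not from the parcel data):

```r
# Numeric vector: sum() adds the elements
acres <- c(0.5, 1.25, 2)
sum( acres )     # 3.75

# Logical vector: TRUE counts as 1 and FALSE as 0
vacant <- c(TRUE, FALSE, TRUE, TRUE)
sum( vacant )    # 3
```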

R wants to make sure you are aware of missing values, so summary functions such as sum() return NA (not available) when the vector contains any missing values.

Add the ‘NA remove’ argument (na.rm=TRUE) to functions to ignore missing values:
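
A minimal sketch with an invented vector that contains one missing value:

```r
# Made-up numeric vector with a missing value
x <- c(1, 2, NA, 4)

sum( x )                  # NA, because one element is missing
sum( x, na.rm = TRUE )    # 7, the NA is dropped before summing
```

Keep in mind that na.rm=TRUE silently drops the missing cases, so it is worth checking how many there are before relying on the result.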



Submission Instructions

When you have completed your assignment, knit your RMD file to generate your rendered HTML file. Platforms like Blackboard and Canvas often do not allow you to submit HTML files that contain embedded computer code, so create a zipped folder with both the RMD and HTML files.

Log in to Canvas at http://canvas.asu.edu and navigate to the assignments tab in the course repository. Upload your zipped folder to the appropriate lab submission link.

Remember to:

See Google’s R Style Guide for examples.



Markdown Trouble?

If you are having problems with your RMD file, visit the RMD File Styles and Knitting Tips manual.

Notes on Knitting

Note that when you knit a file, it starts from a blank slate. You might have packages loaded or datasets active on your local machine, so you can run code chunks fine. But when you knit you might get errors that functions cannot be located or datasets don’t exist. Be sure that you have included chunks to load these in your RMD file.

Your RMD file will not knit if you have errors in your code. If you get stuck on a question, just add eval=F to the header of that code chunk (for example, {r, eval=F}) and the chunk will be displayed but not run when you knit your file. That way I can give you credit for attempting the question and provide guidance on fixing the problem.