CPP 526: Foundations of Data Science I


This lab introduces core plotting functions in order to create customized graphics in R.

You can create a new RMarkdown file, or download the LAB-03 RMD template:




Replicating NYT Graphics

For this lab you will replicate the following NYT Graphic.



Functions

You will use the following functions:



Data

The data comes from the Lahman baseball data package. The Teams dataset contains season statistics for each baseball team from the late 1800s onward. The graph reports average strike-outs per game, which is calculated as ave.so below:
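As a minimal sketch of that calculation, the chunk below divides season strike-outs by games played. It assumes the Teams columns are named SO (strike-outs), G (games) and yearID, and it uses a small made-up data frame called teams (a hypothetical stand-in) so that it runs without the Lahman package installed.

```r
# Toy stand-in for the Lahman Teams table; the values are made up for illustration.
# With the package installed, the real table would be loaded with:
#   library( Lahman )
#   data( Teams )
teams <- data.frame(
  yearID = c( 1900, 1950, 2000 ),
  G      = c( 140, 154, 162 ),     # games played in the season
  SO     = c( 350, 600, 1050 )     # strike-outs recorded in the season
)

# Average strike-outs per game: season strike-outs divided by games played
ave.so <- teams$SO / teams$G
ave.so
```

Swapping teams for the real Teams data frame leaves the last two lines unchanged.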


You will need only two variables, the average strike-outs per game and the year:
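A hedged sketch of pulling out the two vectors and checking them with summary() is shown here. The data frame teams is again a made-up stand-in with assumed column names (yearID, G, SO); one strike-out value is set to NA to show where the NA's column in a summary comes from. The summary figures printed in this lab come from the full Teams data, not from this toy example.

```r
# Toy stand-in for the Lahman Teams table (made-up values, one missing SO)
teams <- data.frame(
  yearID = c( 1890, 1900, 1950, 2000 ),
  G      = c( 130, 140, 154, 162 ),
  SO     = c( NA, 350, 600, 1050 )
)

# The only two vectors the graphic needs
ave.so <- teams$SO / teams$G    # NA in SO carries through to ave.so
year   <- teams$yearID

# summary() reports the five-number summary, the mean, and a count of NA's
summary( ave.so )
```

Points with a missing ave.so are simply skipped when plotted, so the NA values do not need to be removed first.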


   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  0.000   3.476   4.951   4.811   6.089   9.525     120


Note that you don’t have to understand baseball to make the graphic.



Lab Instructions

Your task is to replicate as closely as possible the graphic published by the NYT.

1. Plot average strike-outs by year.

Use 1900 as the starting year for the graph and 2025 as the end point using the xlim=c() argument in the plot.window() function.

4. Create grid lines for your y-axis using the abline() function. The argument h=c() allows you to specify the location of your horizontal lines.

7. Use the text() and segments() functions to reproduce at least two of the narrative texts from the graphic (“US enters World War I”, etc.). Note that a line break within text is created by including "\n" in your string.

For example:
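The chunk below is one possible sketch of a two-line annotation with a connecting line, not the exact code behind the NYT graphic. The coordinates, colors and text size are assumptions chosen by eye; only the year 1917 (when the US entered World War I) is anchored to a real value.

```r
# Open an empty plotting region with the lab's x-axis range
plot.new()
plot.window( xlim = c( 1900, 2025 ), ylim = c( 0, 10 ) )
axis( side = 1 )
axis( side = 2 )

# "\n" inside the string starts a new line in the plotted label
label <- "US enters\nWorld War I"

# Place the label near the bottom of the plot (y position is a guess)
text( x = 1917, y = 1.2, labels = label, cex = 0.8, col = "gray40" )

# Draw a thin vertical line from just above the label up toward the data
segments( x0 = 1917, y0 = 2, x1 = 1917, y1 = 3.5, col = "gray60" )
```

The x and y arguments of text() give the center of the label by default, so the segment starts slightly above it to avoid overlapping the words.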

Your final plot should be as similar as possible to the NYT graphic!
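To make the plotting-window and grid-line steps concrete, here is a minimal sketch that sets the x-axis range with plot.window(), draws horizontal grid lines with abline( h=... ), and adds semi-transparent gray points. The year and ave.so vectors are made-up placeholders, and the y-axis range, grid spacing and colors are assumptions rather than the values used in the NYT graphic.

```r
# Placeholder data: made-up values that rise from 3 to 7 strike-outs per game
year   <- seq( 1900, 2020, by = 5 )
ave.so <- 3 + ( year - 1900 ) / 30

# Start a blank plot and set the coordinate system
plot.new()
plot.window( xlim = c( 1900, 2025 ), ylim = c( 0, 10 ) )

# Horizontal grid lines at the y-values listed in h
abline( h = seq( 0, 10, by = 2 ), lty = 3, col = "gray80" )

# Semi-transparent gray points, drawn on top of the grid
points( year, ave.so, pch = 19, col = gray( 0.5, alpha = 0.5 ) )

# Axes are added separately because plot.new() draws none
axis( side = 1, at = seq( 1900, 2020, by = 20 ) )
axis( side = 2, las = 1 )
```

Drawing the grid before the points keeps the lines behind the data; reversing the order would draw them across the points.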



Hints

If you need help looking up arguments remember these two helpful functions:
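Two base R functions that fit this description are args() and help(); treating them as the pair meant here is an assumption. The sketch below applies both to abline().

```r
# args() prints a function's arguments and their default values
args( abline )

# help() opens the full documentation page; ?abline is a shortcut for the same page
help( abline )
```

args() is the quicker check when you only need an argument name, such as h for horizontal lines.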



Submission Instructions

Log in to Canvas at http://canvas.asu.edu and navigate to the assignments tab in the course repository. Upload your HTML file and RMD file to the appropriate lab submission link.

Remember to: