Data-Driven Docs
CONTENTS
- What Are Data-Driven Docs?
- How Do Data-Driven Docs Work?
- What is Markdown?
- Knitting R Markdown Files
- Output Types
- Installation
What Are Data-Driven Docs?
Data-driven documents are file formats that combine text and analysis (data + code) in a single document.
In doing so, they promote transparency and reproducibility. For any given table, figure, or model in the document, you should be able to easily discern how it was created, from what data, and with what analysis.
Popular formats include things like R Markdown documents and Jupyter notebooks.
How Do Data-Driven Docs Work?
All of these document formats build on a simple text-formatting convention called markdown.
To create an R Markdown document, you need three things:
- A header to specify the document type.
- Some text (formatted in markdown).
- Some code (inside a “code chunk”).
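The three pieces listed above can be seen together in one small file. The sketch below writes such a file from R. The file name example.Rmd, the title, and the use of the built-in cars data set are illustrative assumptions, not the contents of the sample template.

```r
# The chunk delimiter is three backticks; build it with strrep() for clarity.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                               # 1. header: document type and metadata
  "title: \"Example Report\"",
  "output: html_document",
  "---",
  "",
  "## Stopping distances",             # 2. text, formatted in markdown
  "",
  "The **cars** data set records speed and stopping distance.",
  "",
  paste0(fence, "{r}"),                # 3. code, inside a code chunk
  "summary(cars)",
  fence
)

# Hypothetical file name, written to a temporary directory to avoid clutter.
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)
```

Opening the resulting file in a text editor shows the header between the two lines of dashes, then the markdown text, then the chunk.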
You can download a sample template HERE.
What is Markdown?
Markdown is a simple set of rules used to format text. It has been adopted broadly by the data science community and is used on GitHub, Stack Overflow, and now in RStudio.
To give just a couple of examples of how it works:
Unordered Lists
* First item
* Second item
* Third item
    * First nested item
    * Second nested item
- First item
- Second item
- Third item
    - First nested item
    - Second nested item
Hyperlinks
Create links by wrapping the link text in square brackets [ ], and the URL in adjacent parentheses ( ).
I find that [Google News](https://news.google.com) over-curates my media diet.
I find that Google News over-curates my media diet.
Tables
| Title 1 | Title 2 |
|------------------|------------------|
| First entry | Second entry |
| Third entry | Fourth entry |
| Fifth entry | Sixth entry |
Title 1 | Title 2 |
---|---|
First entry | Second entry |
Third entry | Fourth entry |
Fifth entry | Sixth entry |
You can see a full list of markdown rules HERE.
Knitting R Markdown Files
Code is placed inside “chunks” in the document. Each chunk opens with three backticks followed by {r} and closes with three backticks.
When you “knit” a file, RStudio runs all of the code, embeds the output into your document, and then converts the file to whichever type you have specified in the file header.
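Knitting can also be started from the R console with rmarkdown::render(). The following is a minimal sketch, assuming the rmarkdown package and Pandoc are installed. The file name knit-demo.Rmd and its contents are hypothetical.

```r
# Write a tiny R Markdown file to knit (hypothetical name and contents).
fence <- strrep("`", 3)
rmd_path <- file.path(tempdir(), "knit-demo.Rmd")
writeLines(c(
  "---",
  "output: html_document",
  "---",
  "",
  "The average speed in the cars data set:",
  "",
  paste0(fence, "{r}"),
  "mean(cars$speed)",
  fence
), rmd_path)

# Knit only if the required tools are present; otherwise out_file stays NA.
out_file <- NA_character_
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # render() runs every chunk, embeds the results, and converts the file
  # to the type named in the header. It returns the path of the output file.
  out_file <- rmarkdown::render(rmd_path, quiet = TRUE)
}
```

If the knit succeeds, out_file holds the path to an HTML file saved next to the source file.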
Output Types
You can select from many different document types, including HTML pages, Microsoft Word, presentation formats, or dashboards.
Check out these examples:
R Markdown Formats
R Markdown Gallery
HTML Pages
---
output: html_document
---
Dashboards
---
output: flexdashboard::flex_dashboard
---
[ dashboard example ] [ source code ] [ blog about the tracker ]
PDFs
---
output: pdf_document
---
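The header is not the only place to choose an output type. The output_format argument of rmarkdown::render() overrides whatever the header specifies, which is handy when the same file should be produced in several formats. The sketch below assumes rmarkdown and Pandoc are installed. The file name and the choice of the two formats are illustrative.

```r
# A small source file whose header asks for HTML (hypothetical contents).
fence <- strrep("`", 3)
rmd_path <- file.path(tempdir(), "formats-demo.Rmd")
writeLines(c(
  "---",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "head(cars)",
  fence
), rmd_path)

# Formats to produce from the same source; the second overrides the header.
formats <- c("html_document", "word_document")

outputs <- character(0)
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  for (fmt in formats) {
    # Store each output path under the name of the format that produced it.
    outputs[fmt] <- rmarkdown::render(rmd_path, output_format = fmt, quiet = TRUE)
  }
}
```

PDF is left out of this sketch because it needs the extra TeX installation described under Installation.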
Installation
You will need the following programs to generate data-driven documents in R:
- Base R installation (CRAN)
- RStudio (download page)
- Pandoc (comes with RStudio by default)
When you first try to knit a file, you might get a message that you need the following packages:
- rmarkdown
- knitr
These can be installed in the usual manner:
install.packages( "rmarkdown" )
install.packages( "knitr" )
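To avoid reinstalling packages that are already present, the check can be scripted. This is a sketch only: the interactive() guard is an added assumption so that a script never installs packages unattended, and the Pandoc check relies on rmarkdown::pandoc_available().

```r
# Packages needed for knitting, as listed above.
needed <- c("rmarkdown", "knitr")

# requireNamespace() reports whether a package can be loaded, without attaching it.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

# Install whatever is missing (needs an internet connection and a CRAN mirror).
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}

# Pandoc can only be checked through rmarkdown, so test for the package first.
has_pandoc <- is_installed[["rmarkdown"]] && rmarkdown::pandoc_available()
```

After running this, missing_pkgs lists anything still to be installed and has_pandoc indicates whether R can find Pandoc.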
PDFs:
If you would like to knit to PDF, you need one additional program: a TeX distribution, which creates publication-quality PDF files. MiKTeX is a free, open-source distribution (download page).
If you have problems, you can find some nice tutorials like this one: https://www.reed.edu/data-at-reed/software/R/r_studio_pc.html
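One quick diagnostic when PDF knitting fails is to ask R whether it can find a LaTeX engine on the system path. The sketch below assumes pdflatex is the engine in use, which is the default for pdf_document. A different engine would need a different name here.

```r
# Sys.which() returns the full path to a program, or "" if it is not found.
latex_path <- unname(Sys.which("pdflatex"))
has_latex <- nzchar(latex_path)

if (has_latex) {
  message("Found pdflatex at: ", latex_path)
} else {
  message("pdflatex was not found; install a TeX distribution and restart R.")
}
```

If the program was installed while R was already open, restarting R is usually needed before it is detected.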
Specialized packages:
Some document output formats require specific R packages. For example:
- journal templates
- dashboards
- R websites
- books in bookdown
You can find many of these packages on the R Markdown templates page.