Welcome to Intro to Data Science for the Social Sector. This is a broad overview course designed to expose you to common and useful open source tools in the field.
A few notes about the semester:
The course can be completed entirely online, but I will also offer weekly face-to-face review sessions if you prefer to ask questions or discuss content in a classroom.
Learning a new skill like R programming is never painless, but you can follow a few simple rules to succeed in this course:
Your grade consists of the following:
Labs will give you practice with the key concepts and functions of the week. They are graded pass-fail, and you get to drop one lab (chances are you will have one rough week this semester). To pass, you need to answer at least half of the questions correctly and demonstrate that you understand the material.
This grading system is designed to let you focus on the big picture each week without worrying if a specific function or model is not working properly. Computer languages can be unforgiving: a single minor error can prevent a program from running.
Learning a programming language is a lot like learning a natural language in that it is easy to become conversant enough to find your way around a city and order some food at a restaurant, but much harder to become fluent. The goal is for you to become conversant in R by the end of the semester so that you can begin using tutorials and discussion forums to further your journey.
Discussion boards will be used to reflect on case studies of data science tools in practice. The final project will require you to practice skills from the semester by building a basic dashboard.
You are not required to purchase a textbook. The syllabus lists several useful reference texts, many of which are available free online.
The text on this website is a set of lecture notes written as Markdown documents, so you can easily copy and paste code from the examples or borrow document formats that you like. All of the raw files are available on GitHub and can be accessed by clicking the “edit” link on each page.
It is always helpful to motivate a topic with an example. I like this case study of the changes that occurred when the Philadelphia Police Department began shifting from management based on the instincts of senior officers to using data to identify areas of high need and the most effective practices.
“A data scientist is a person who should be able to leverage existing data sources, and create new ones as needed in order to extract meaningful information and actionable insights. These insights can be used to drive business decisions and changes intended to achieve business goals… ‘The Perfect Data Scientist’ is the individual who is equally strong in business, programming, statistics, and communication.”
FROM: What Is Data Science, and What Does a Data Scientist Do?
“Universities can hardly turn out data scientists fast enough. To meet demand from employers, the United States will need to increase the number of graduates with skills handling large amounts of data by as much as 60 percent.”
FROM: Data Science: The Numbers of Our Lives, The New York Times
“Data Scientists identify complex business problems whilst leveraging data value. They work as part of a multidisciplinary team with data architects, data engineers, analysts and others.”
FROM: Data Scientist Career Pathways in Government
This course is organized around learning the foundations of programming in R. Although there are several good languages that specialize in data analysis, R has several advantages:
For some background information on R, read the New York Times story: Data Analysts Captivated by R’s Power. Or check out this one-minute explainer video:
You will find this background information helpful as you start using R:
See the Resources tab for some additional information about R and RStudio.
PART I
Install R and RStudio on your machine. See the Resources tab for tips.
Install Pandoc if you would like to create PDF reports.
PART II - Create a Markdown Bio
Create a short bio for the class. Each question requires a different type of text or graphic. Use Markdown to format each answer correctly.
The full lab instructions are available HERE.
Aug 26-Sept 2
By the end of this unit, you will be able to:
Course text, CH2: Functions
Course text, CH3: Data Structures
For additional material, check out:
Teetor, P. (2011). R cookbook: Proven recipes for data analysis, statistics, and graphics. O’Reilly Media, Inc. Ch. 5.2-5.5.
You have one lab and one discussion topic this week:
Lab 02 - Functions and Vectors
Note that the due date for this lab has been changed to Sunday, September 2nd.
Aug 31-Sept 7
By the end of this unit, you will be able to:
Course text, CH4: Logical Statements
You have one lab and one discussion topic this week, due Friday, Sept 7th by 11:59pm.