Intro to Data Science for the Social Sector

This course introduces students to the field of data science and its applications in the public and nonprofit sectors. Modern performance management and evaluation processes require strong data literacy and the ability to combine and analyze data from a variety of sources to inform managerial processes. We offer a practical, tools-based approach designed to build strong foundations for people who want to work as analysts, data-driven managers, or data-driven journalists. We will cover data programming fundamentals, visualization, text analysis, automated reporting, and dynamic reporting using dashboards. The course is analytically rigorous, but no prior programming experience is assumed.


Course Info

Program Title: MS in Program Evaluation and Data Analytics
Course Title: Introduction to Data Science for the Social Sector
Course Number: CPP 591
Course Level: Graduate
Course Start-End: Aug 16th - Dec 5th, 2018
Class Timings: Online; In-Person Review Sessions Wed 6-7pm & Thurs 1-2pm
Class Location: Online; Review sessions in UCENT 580A (Wed) & 480A (Th)

Course Instructors

Jesse Lecy, Professor

Office Hours

Jesse Lecy: Flexible, by appointment; UCENT 517 or online office hours


Reference Texts

Paul Teetor: Not Required
Wickham, H., & Grolemund, G.: Free Online
Peng, R. D., & Matsui, E.: Free Online
Chester Ismay & Albert Y. Kim: Free Online
Smith, Smith, and Johnson: Not Required

I. Course Description, Course Goal and Course Learning Objectives:

Data is an essential consideration of modern organizations. Those that want to embrace evidence-based approaches to management need to develop processes for gathering data, linking organizational data to external sources, running analysis, and communicating results with stakeholders. The ability to collect, organize and analyze data is a desirable skill set for professional knowledge workers, high-level management, and evaluators.

The course introduces students to the R data programming language, an open source platform that has become an industry standard because of its flexibility and power. It was designed to allow people to quickly develop and share new statistical tools. It has evolved into a more general data analytics platform that can be used for analytics, customized visualizations, GIS applications, text analysis, building web applications, and much more. It has a large and active user community that has developed thousands of free custom programs.

Typically only 10-20% of a project is spent analyzing data. The other 80-90% consists of merging data sources, cleaning data, defining new variables, and arranging data into the proper format. These steps require knowledge of data wrangling as well as general project management processes. The Foundations of Data Science sequence teaches both data programming fundamentals and project management skills to ensure that analysis is transparent, error-free, and reproducible.

Intro to Data Science will cover the fundamentals of data programming: building unique datasets using APIs and custom tools, importing data from the cloud, linking multiple data sources, and wrangling processes to clean, transform, and reshape datasets. Advanced topics such as writing functions, running simulations, writing packages for R, and debugging techniques will also be introduced. We will spend roughly a third of the units on graphing procedures and reporting packages.

This course will cover the building blocks of data programming in R. We will learn about variables, operators, functions, dataset construction, group structure in data, visualization, and simulation. Students will also be introduced to markdown documents and automated reporting.

The six main learning objectives for the course are:

  • Mastery of functions as the building blocks of all R programs, including arguments and scope
  • Knowledge of variable types and data structures in R, including construction and manipulation of data sets
  • Use of logical statements to create and analyze groups within data
  • Basic programming control structures and simulation techniques
  • Ability to build custom visualizations through the base R graphics package
  • Creation of dynamic graphics and data dashboards using R Shiny tools

Course Schedule

Week 1 (Aug 17 - Aug 24): Intro to Markdown, Overview of Data Science
Week 2 (Aug 24 - Aug 31): Functions and Data Structures
Week 3 (Aug 31 - Sept 7): Expressive Logic
Week 4 (Sept 7 - Sept 14): Joining Data
Week 5 (Sept 14 - Sept 21): Descriptive Analysis
Week 6 (Sept 21 - Sept 28): Visualization I
Week 7 (Sept 28 - Oct 5): Visualization II
Week 8 (Oct 5 - Oct 12): Data Ingestion
Week 9 (Oct 12 - Oct 19): Advanced Graphics
Week 10 (Oct 19 - Oct 26): Data Dashboards
Week 11 (Oct 26 - Nov 2): Collaboration Tools
Week 12 (Nov 2 - Nov 9): Programming Principles and Control
Week 13 (Nov 9 - Nov 16): Text Analysis
Week 14 (Nov 16 - Nov 23): Final Project
Week 15 (Nov 23 - Nov 30): Thanksgiving
Week 16 (Nov 30 - Dec 7): Final Project

Course Prerequisites:

There are no prerequisites, and we do not assume any prior background in computer programming or statistics. Students should, however, install R and RStudio and work through a basic RStudio tutorial before the course begins.

II. Assessment of Student Learning Performance & Proficiency: Keys to Student Success

Assessment of student performance in this course is based on indications that the course learning objectives stated above have been achieved. Several areas of measurement will be used to produce a final student performance rating. These areas of performance assessment include the following:

  • Translating from plain English business cases to logical statements in R using logical operators and analytical techniques applied to groups.
  • Communicating information by developing custom visualizations and graphics.
  • Using markdown documents to generate data-driven reports and data dashboards.

Students will demonstrate competency in understanding, producing and communicating results of their analyses through the following assignments:

  • Weekly labs that provide opportunities to consolidate and apply material from the lectures.
  • A final project that integrates several components of the learning objectives above.
  • Discussion boards that highlight case studies and applications.

Assigned work, including the course final project, and the quality of active participation in the regular online discussion sessions that are a critical part of the course learning strategy are the tools the instructors will use to measure comprehension and skill; the student’s course grade is a direct reflection of demonstrated performance. Students should take stated expectations seriously regarding preparation, conduct, and academic honesty in order to receive a grade reflective of outstanding performance.

III. Course Structure and Operations; Performance Expectations

A. Format and Pedagogical Theory

Mastering advanced analytical techniques and data programming is like learning a language. You start by mastering basic vocabulary that is specific to statistics and data science. Through your coursework you will become conversant in the domains of regression analysis, research design, and data science. Progress might be slow at first as you work to master core concepts, integrate the building blocks into a coherent mental model of real-world problems, learn to translate technical results into clear narratives for non-technical audiences, and become comfortable with data programming skills. Over time you will find that your thought processes change as you approach problem-solving in a more structured and evidence-based manner, you apply counter-factual reasoning to performance problems, and you start reading the news and viewing scientific evidence differently. You begin to think and speak like a program evaluator.

By the end of this course you will be conversant in data programming. Fluency takes time and will be developed through professional experience. It requires you to practice these skills to develop muscle memory. You can do this through participating in projects on the job and gaining experience building and cleaning data sets from scratch. Understand, though, that this degree focuses on building foundations for your career. Don’t be nervous if it feels like it’s impossible to master all of the material in a semester or a few courses – it is impossible to learn everything in this field in a couple of years. The goal is to build solid foundations.

Similar to immersion in a language, the best way to learn the material is to be consistent in doing course work each day. The more frequently you revisit concepts and practice data programming, the more you will absorb. The curriculum has been designed around this approach. Lectures are split into small units, and each unit includes questions to test your understanding of the material. Weekly labs allow you to spend some time applying the material to a specific problem. The final project at the end of the semester is designed to help you make connections between concepts and consolidate knowledge. You will be much better off spending a small amount of time each day on the material instead of trying to cram everything into a couple of days a week.

Online discussion boards are designed for students to engage with the material together. The purpose of the online discussion sessions is threefold: (1) they allow students to interact with their peers and share ideas and interpretations of the assigned material, (2) peer-to-peer discussion helps build professional relationships with potential future colleagues in the field, and (3) the discussions permit the instructor to assess student engagement with the assigned material.

The online discussions are explicitly intended to meet the objectives stated above. They are not intended as another form of “lecture” where the instructor provides commentary and students simply react to that. Rather, the discussions are a chance for peer-to-peer interaction and proactive engagement by each individual student.

The purpose of assigned written work is to give students the opportunity to demonstrate substantive understanding of the materials covered in course readings, lectures, and online discussion.

B. Assigned Reading Materials

There are no required textbooks for this course, but the following serve as good references:

  • Wickham, H., & Grolemund, G. (2016). R for Data Science. O’Reilly Media. (free online)
  • Teetor, P. (2011). R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics. O’Reilly Media.
  • Sanchez, G. (2013). Handling and Processing Strings in R. Berkeley: Trowchez Editions. (free online)
  • Peng, R. D., & Matsui, E. (2015). The Art of Data Science: A Guide for Anyone Who Works with Data. Skybrude Consulting.

The instructor will supplement the assigned unit readings with various journal articles, policy reports, or other related material. These will be made available in the course shell in Canvas.

C. Course Grading System for Assigned Work, including Final Project:

Letter grades comport with a traditional set of intervals:

  • 100 – 98% = A+
  • 97 – 94% = A
  • 93 – 90% = A-
  • 89 – 87% = B+
  • 86 – 84% = B
  • 83 – 80% = B-
  • Below 80% = C, D, F
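
The intervals above can be read as a simple lookup from a percentage to a letter grade. The following is a minimal sketch in Python, used here only for illustration (the course itself teaches R). The function name `letter_grade` is hypothetical, and the handling of fractional scores is an assumption: the syllabus lists whole-number intervals only, so a score such as 93.5% is placed in the lower band here.

```python
def letter_grade(percent: float) -> str:
    """Return the letter grade for a course percentage (0-100)."""
    # Lower bound of each interval listed in the syllabus, highest first.
    cutoffs = [
        (98, "A+"),
        (94, "A"),
        (90, "A-"),
        (87, "B+"),
        (84, "B"),
        (80, "B-"),
    ]
    for lower_bound, grade in cutoffs:
        # Assumption: a score belongs to the highest band whose lower
        # bound it reaches, so fractional scores fall into the lower band.
        if percent >= lower_bound:
            return grade
    # The syllabus groups everything below 80% without separate cutoffs.
    return "C, D, or F"


print(letter_grade(95))  # prints A
```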

The assigned work for the term comes in the form of three elements, described below:

Weekly Labs (60%): Each week you will receive a short lab that will help you synthesize the lectures from the week through exercises that involve data, analysis, and important formulas from the lectures. These labs contain exercises that are similar in form or difficulty to what will be required in the final project. They are graded pass/fail by the instructors based on an assessment of whether you have sincerely attempted the lab and answered over half of the questions correctly. This is designed to hold you accountable for the material without creating anxiety about perfection.

Final Project (20%): This course will close with a final project that requires you to analyze some data and present your results as a data dashboard. It is designed to give you practice integrating material that we have covered throughout the course.

Discussion Sessions (20%): Each student in the course will be given the opportunity to participate in discussion boards designed to reflect on and explore real-world case studies that involve organizations using data science tools to solve problems. The intent is to show how the topics covered in class might be used to deepen impact in organizations, or to raise important ethical dimensions of the work.

D. General Grading Rubric for Written Work

In general, any submitted written work (assignments and/or exams) is assessed on these evaluative criteria:

  • Assignment completeness – all elements of the assignment are addressed
  • Quality of analysis – substantively rigorous in addressing the assignment
  • Demonstrated synthesis of core concepts from lecture notes and ability to apply to new problems

Assignments are distributed with an accompanying specific assessment rubric.

E. Late and Missing Assignments

This course is based on students reading course material, participating in discussion with colleagues, and producing labs and a final project. There are a total of 13 labs throughout the semester, which are graded pass/fail. You are only required to complete 12 labs by the end of the semester, giving you a buffer if you need to prioritize other work one week.
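
The component weights in section C (labs 60%, final project 20%, discussion 20%) and the 12-of-13 lab requirement above can be combined into one course percentage. The sketch below is in Python for illustration only (the course itself teaches R). The names `WEIGHTS`, `REQUIRED_LABS`, and `course_percent` are hypothetical, and the way passed labs convert to a lab score is an assumption that the syllabus does not state: each passed lab counts equally, and credit is capped at the 12 required labs.

```python
# Component weights taken from section C of the syllabus.
WEIGHTS = {"labs": 0.60, "final_project": 0.20, "discussion": 0.20}
REQUIRED_LABS = 12  # 12 of the 13 labs must be completed


def course_percent(labs_passed: int, final_project: float, discussion: float) -> float:
    """Combine component scores (each on a 0-100 scale) into a course percentage."""
    # Assumption (not stated in the syllabus): the lab component is the share
    # of the 12 required labs that were passed, capped at 100%.
    lab_score = 100 * min(labs_passed, REQUIRED_LABS) / REQUIRED_LABS
    return (
        WEIGHTS["labs"] * lab_score
        + WEIGHTS["final_project"] * final_project
        + WEIGHTS["discussion"] * discussion
    )


# A student who passes 11 labs, scores 90 on the project and 100 on discussion:
print(round(course_percent(11, 90, 100), 1))  # prints 93.0
```

Under this assumption, passing all 13 labs earns the same lab credit as passing 12, which matches the one-lab buffer described above.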

F. Course Communications and Instructor Feedback:

For communication regarding the class please email the instructor or schedule office hours via the contact information provided above.

The course instructor will attempt to respond to any course-related email as quickly as possible. Students are asked to allow between 24 and 48 hours for replies to direct instructor emails as a reasonable time to respond to questions or other issues. Additionally, the general timeline for instructor grading or other feedback on assignments, either written work or online discussion work, is between 5 and 10 work days.

G. Student Conduct: Expectation of Professional Behavior:

Respectful conversation and tolerance of others’ opinions will be strictly enforced. Any inappropriate language or threatening, harassing, or otherwise inappropriate behavior during discussion could result in the student(s) being administratively dropped from the course with no refund, per ASU policy USI 201-10. Students are required to adhere to the behavior standards listed in the Arizona Board of Regents Policy Manual, Chapter V, Campus and Student Affairs.

H. Academic Integrity and Honesty

ASU expects the highest standards of academic integrity. Violations of academic integrity include but are not limited to cheating, plagiarism, fabrication, etc. or facilitating any of these activities. This course relies heavily on writing and original critical thought. Any student who is suspected of not producing his or her own original work will be reported to the College of Public Programs for investigation. Plagiarism will not be tolerated. Any student who plagiarizes or otherwise fabricates his or her work will receive no credit for that assignment. It will be recorded as zero points—and the student will risk a failing grade for the course. For more information, refer to http://provost.asu.edu/academicintegrity.

I. Student Learning Environment: Accommodations

Disability Accommodations: Arizona State University, the MS in Program Evaluation and Data Analytics program, and all program course instructors are committed to providing reasonable accommodation and access to programs and services to persons with disabilities. Students with disabilities who wish to seek academic accommodations must contact the ASU Disability Resources Center directly. Information on the Center’s procedures, resources, and how to contact its staff can be found here: https://eoss.asu.edu/drc/. The Disability Resources Center is responsible for reviewing any student’s requests; once that review has taken place, the Center will provide the student with appropriate information on academic accommodations, which in turn will be provided to the course instructor.

Religious Accommodations: Students will not be penalized for missing an assignment due solely to a religious holiday/observance, but as this class operates with a fairly flexible schedule, all efforts should be made to complete work within the required timeframe. If this is not possible, students must notify the instructor as far in advance as possible in order to make an alternative arrangement.

Military Accommodations: A student who is a member of the National Guard, Reserve, or other branch of the armed forces and is unable to complete classes because of military activation may request complete or partial unrestricted administrative withdrawals or incompletes depending on the timing of the activation. For more information see ASU policy USI 201-18.