ESOC 2014 Introduction to Data Science
Fall 2020
2020-12-07
Module 1 Syllabus
The University of Arizona sits on the homelands of the Tohono O’odham and Pascua Yaqui, whose care and keeping of these lands allows us to be here today. Territory acknowledgements are one small part of disrupting and dismantling colonial structures.
This syllabus is subject to change if need arises.
There are two sections of this course:
Tuesday & Thursday 12:30pm - 1:45pm
- Final Exam Date: December 16 (Wednesday) 1:00pm - 3:00pm
Tuesday & Thursday 2:00pm - 3:15pm
- Final Exam Date: December 14 (Monday) 3:30pm - 5:30pm
Office Hours/Free help session/Work time
- Tuesday 9:30am - 11:00am & Wednesday 1:00pm - 2:30pm
1.1 Course Description
This course provides an introduction to the various skills and considerations required for data management and analysis in business, education, and science. Particular attention will be given to learning how to use the free and open-source computing environment R, focusing on the tidyverse collection of packages for data science. This course is designed to be interactive and hands-on.
1.2 Course Objectives
This course aims at providing students with an understanding of the various steps in the data science workflow. Students will engage in data wrangling and exploration to provide answers to questions about the data, using the R programming language. During the semester students will work on an individual data science project to be presented to the class.
1.3 Learning Outcomes
At the end of this course, students will be able to:
1. Apply the different steps of data science as a process to derive knowledge from data
1.1. form the question to be answered
1.2. acquire the data to answer the question
1.3. transform and tidy data so that data analysis is possible
1.4. explore data with understanding as the goal, which includes data visualization
1.5. communicate data analysis results
2. Demonstrate proficiency in steps 1.3 - 1.5 above in the R programming language and R Markdown
3. Identify and apply professional standards regarding all aspects of data ethics and privacy, including how data are stored, used, managed, analyzed, and presented
4. Demonstrate knowledge of what a data scientist is and what a career in data science requires in terms of education, and set goals and make plans in case they want to pursue data science beyond the completion of this course
Please refer to the department’s undergraduate student competencies to find out how this course’s learning outcomes fit into your broad education goals.
1.4 A Few Words on R and Coding
This course will be based around the programming language R, which we will use within the integrated development environment (IDE) RStudio. For many of you this will be your first time programming, AND THAT’S OK! This course is intended for beginners, and we will actively focus on building up your R skills over the course of the semester. Of course, there will still be challenges along the way, but you will rapidly figure out how to solve your own problems as well as apply your current knowledge to new and exciting questions. If you are struggling, I highly encourage you to take advantage of my free help sessions (see times above). Of course, Google is always a super helpful way to get insight into coding problems. Our class Slack channel will also be there so you can help each other out. You might want to watch Roger Peng’s video on how to get help, which contains guidelines on what information to provide when asking a question in a public forum.
I also want to note that I highly encourage you to help each other, as data scientists are rarely working in isolation. This does not mean you can directly share code associated with an assignment (this is a violation of UA’s Code of Academic Integrity). What it does mean is that it is helpful to talk to each other about problems you encountered, resources you found, or provide helpful tips.
Learning to code nowadays is much easier, since a simple Google search will result in a huge amount of code that can solve any number of problems. You may use online resources (e.g., StackOverflow), but we will go over the syntax needed to solve all assignments in class. If you do use any external resources, you must explicitly cite where the code was obtained in your comments (add a direct link to the resource). I’ll be checking for recycled code, and any code you re-used without a proper citation will be treated as plagiarism.
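As a rough sketch of what such a citation could look like in an R script, the comment block below records the source, a link, and what was changed. The comment layout, the placeholder link, and the helper `column_means()` are illustrative assumptions, not a required course format.

```r
# Source: adapted from a StackOverflow answer (hypothetical example)
# Link:   <paste the direct URL to the resource here>
# Change: restricted the calculation to numeric columns and ignored NAs

# column_means() is a hypothetical helper used only to illustrate the citation.
column_means <- function(df) {
  numeric_cols <- vapply(df, is.numeric, logical(1))  # TRUE for numeric columns
  vapply(df[numeric_cols], mean, numeric(1), na.rm = TRUE)
}

# Made-up data: one numeric column with a missing value, one text column
scores <- data.frame(quiz = c(8, NA, 10), name = c("a", "b", "c"))
print(column_means(scores))
```

The point is that anyone reading the script can follow the link and see exactly which part of the code came from elsewhere.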
1.5 A Few Words on Technology
YOU MUST HAVE ACCESS TO A COMPUTER YOU CAN CODE WITH IN EVERY CLASS! We will be actively coding in R on a daily basis, and not being able to follow along will severely hamper your learning. If you do not have a laptop, or yours has trouble at some point during the semester, the library offers fast and free rentals of both Macs and PCs: https://new.library.arizona.edu/tech/borrow. You can also take advantage of the multiple computer labs on campus: https://it.arizona.edu/service/oscr-computer-labs
You will have access to and will be required to retrieve all course materials from the course page on GitHub.
You will need to have R and RStudio installed and functioning by the second day of class. We will go over what these programs are and how to install them in the first week of class.
Slack participation is critical! If you are having a coding issue, first try and solve it on your own. If you’re still struggling, then post it to our Slack. Essentially, if you are about to email me with a homework/class/coding question, post it to Slack first. I’m not doing this to save me time, but rather because virtually all programmers/coders solve problems by helping each other, and thus I want you to do the same! Please register for our Slack channel.
1.6 Readings
There is no required textbook for this class. A few times we will use the book “R for Data Science” by Hadley Wickham and Garrett Grolemund. This book covers how to create full data science pipelines in R (more than we’ll be doing here) and is available free here: https://r4ds.had.co.nz/.
Aside from this book, there will be other required readings. I will link these readings for you on this bookdown. Some come from academic journals, and others are news articles that appear in many of the newspapers you read in print and online. For each reading, a word count and an approximate reading time will be provided. Please adjust these approximations to your own reading time, so you can plan accordingly.
It is crucial that you read all assigned readings to do well in this class. Anyone who has not done the reading will simply not be able to participate.
1.7 Assignments with Grade Breakdown
Activity | Total Percent | Unit Percent | Description |
---|---|---|---|
Final Project | 30% | 5% Project Proposal, 15% Write-up, 10% Oral presentation | This will be a full data science project, complete with formal write-up and presentation to the class |
Midterm | 20% | 20% | |
Sharing Code during Zoom sections (5) | 10% | 2% | |
Data Challenges (9) | 28% | 3.5% | Lowest will be dropped. All assignments must be completed by the date and time provided in the assignment instructions |
Class Participation | 10% | | Participation includes both in-class and message board questions and engagement. To get full credit I should see your name or hear you in class once a week. |
Intro and exit surveys | 2% | 1% | |
Assignments submitted up to 24 hours after the due date and time will receive a 20% grade penalty. Assignments submitted more than 24 hours after the due date and time will not receive any credit.
If you are unable to complete an assignment on the due date due to an illness or another personal problem, please contact me as soon as possible so we can talk about ways to help you complete that assignment.
Any work turned in for this class needs to be distinctly developed for this class, and not work turned in for other classes.
Grade Distribution:
90-100% = A “exemplary, far beyond reqs/expectations”
80-89% = B “exceeds requirements/expectations”
70-79% = C “meets requirements/expectations”
60-69% = D “falls short of requirements/expectations”
< 60% = E “repeat of course needed”
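To see how the weights in the grade breakdown combine with the letter scale above, here is a minimal sketch in R. The student scores are made up, and the assumption that percentages are not rounded before the letter is assigned is mine, not a stated course policy.

```r
# Weights taken from the grade breakdown table (percent of the final grade)
weights <- c(final_project = 30, midterm = 20, sharing_code = 10,
             data_challenges = 28, participation = 10, surveys = 2)
stopifnot(sum(weights) == 100)

# Hypothetical data challenge scores (0-1 scale); the lowest of the 9 is dropped,
# which is why 8 x 3.5% = 28% rather than 9 x 3.5%
challenge_scores <- c(1, 1, 1, 0.9, 0.8, 0.8, 0.7, 0.6, 0.4)
kept <- sort(challenge_scores)[-1]   # remove the single lowest score

# Hypothetical scores for one student, each on a 0-1 scale
scores <- c(final_project = 0.90, midterm = 0.80, sharing_code = 1.00,
            data_challenges = mean(kept), participation = 1.00, surveys = 1.00)

final_percent <- sum(weights * scores[names(weights)])

# Map a percentage to the letter scale (assumes no rounding before the cut-offs)
letter_grade <- function(pct) {
  as.character(cut(pct, breaks = c(-Inf, 60, 70, 80, 90, Inf),
                   labels = c("E", "D", "C", "B", "A"), right = FALSE))
}

print(final_percent)
print(letter_grade(final_percent))
```

Changing the values in `scores` shows how much each activity can move the final percentage.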
A Note About Final Grades
I do not modify final grades. I have designed this course to be highly passable for the new learner, assuming they do the modest homework assignments, come to class, and participate. I’m not a difficult grader, and I build in extensive opportunities for ‘easy points.’ Given all this, please do not ask for a higher grade when the end of the semester rolls around.
1.8 Requirements for the Course
To succeed in this course, 2-3 hours of study time per hour of formal class time (or per unit) are required. This means that in addition to our three hours of formal class meeting time, 6-9 hours a week of study time are needed in order to meet course expectations. These hours should be spent on reading texts, working on your data challenges, researching for new information, or thinking about course content.
It’s important to mention that each lesson builds upon the previous one, and thus staying on top of the material is critical to your success. As mentioned before, this class is built specifically for beginners, and plenty of students who have never coded before have done extremely well. But the reason they did is that they came to class consistently, asked questions when they had an issue, and completed their data challenges. If you miss a class, come to office hours to make up what you missed. I will do everything possible to make sure you succeed, assuming you’re willing to put in the work!
1.9 Course Schedule
Here is the tentative course schedule. Data challenges are always due before the start of class on the associated due date. There will sometimes be other short readings and assignments. These will be posted on D2L directly after the class period in which they are assigned.
Week | Date | Goals | Assignment |
---|---|---|---|
Week 01 | 2020-08-25 | Introductions; Syllabus | |
Week 01 | 2020-08-27 | Intro to Data Science; Data Science workflow | Reading: What’s data science? (20 min); YouTube video Angry Hiring Manager Panel (6.5 min); Survey 1 (10 min) |
Week 02 | 2020-09-01 | What’s Data?; What does data analysis look like?; IDE overview; How and Why to Start a Project; Basics of R | Reading: Data Science examples (8 min), Data Intake (12 min); Install R and RStudio |
Week 02 | 2020-09-03 | Basics of R: basic operations, objects, data types | |
Week 03 | 2020-09-08 | Basics of R: data frames, inspecting data, slicing your data | Read A Million Lines of Bad Code (5 min); What is Statistics Good For? (3 min) |
Week 03 | 2020-09-10 | Submitting assignments through GitHub | Join our GitHub classroom |
Week 04 | 2020-09-15 | Installing R Packages; Intro to Tidyverse | Read Advice to Young (and Old) Programmers: A Conversation with Hadley Wickham (10 min); Submit test assignment |
Week 04 | 2020-09-17 | Tidyverse | |
Week 05 | 2020-09-22 | Data Wrangling | Data Challenge 1 |
Week 05 | 2020-09-24 | Data Wrangling | |
Week 06 | 2020-09-29 | Intro to Data Visualization | Data Challenge 2 |
Week 06 | 2020-10-01 | Data Visualization | |
Week 07 | 2020-10-06 | Data Visualization | Data Challenge 3 |
Week 07 | 2020-10-08 | Data Visualization | |
Week 08 | 2020-10-13 | Data analysis case study 1 | Data Challenge 4 |
Week 08 | 2020-10-15 | Data analysis case study 1 | |
Week 09 | 2020-10-20 | MIDTERM - Study Guide on D2L | Data Challenge 5 |
Week 09 | 2020-10-22 | Data analysis case study 2 | |
Week 10 | 2020-10-27 | Data analysis case study 2 | |
Week 10 | 2020-10-29 | Getting Data | Data Challenge 6 |
Week 11 | 2020-11-03 | Data analysis case study 3 | Deadline to meet about final project |
Week 11 | 2020-11-05 | Data analysis case study 3 | Project Proposal |
Week 12 | 2020-11-10 | Markdown | Data Challenge 7 |
Week 12 | 2020-11-12 | Markdown | |
Week 13 | 2020-11-17 | Full data analysis case study 4 | Data Challenge 8 |
Week 13 | 2020-11-19 | Full data analysis case study 4 | |
Week 14 | 2020-11-24 | Happy Thanksgiving! 🌽🦃🏡 | Data Challenge 9 |
Week 14 | 2020-11-26 | Happy Thanksgiving! 🌽🦃🏡 | |
Week 15 | 2020-12-01 | Written and Oral Communication in Data Science | |
Week 15 | 2020-12-03 | Written and Oral Communication in Data Science | |
Week 16 | 2020-12-08 | Preparing for Final Presentations; Wrap-up | Final Project is due December 16 (Wednesday) at 3:00pm; Survey 2 is also due December 16 (Wednesday) at 3:00pm |
For more information about dates including holidays, check UArizona’s Academic Calendar.
Why am I using YYYY-MM-DD date format?
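One common argument for this format (ISO 8601), offered here as an assumption about the motivation rather than the instructor's stated reason, is that YYYY-MM-DD dates sort chronologically even when they are treated as plain text. A minimal sketch in base R, using made-up dates:

```r
# The same kind of dates written in two formats (values are made up)
iso_dates <- c("2020-12-08", "2020-08-25", "2020-10-20")
us_dates  <- c("12/08/2020", "08/25/2020", "10/20/2019")

# Sorting ISO strings as text gives chronological order
sorted_iso_text <- sort(iso_dates)
print(sorted_iso_text)

# Sorting MM/DD/YYYY strings as text puts the 2019 date in the middle
sorted_us_text <- sort(us_dates)
print(sorted_us_text)

# ISO strings are also parsed by as.Date() without a format argument
sorted_as_dates <- sort(as.Date(iso_dates))
print(sorted_as_dates)
```

File names that start with an ISO date get the same benefit: an alphabetical listing is also a chronological one.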
1.10 Final Project
There is a final project in place of a final exam for this class. You will find your own dataset that helps you answer a question that you’re interested in. You’ll bring these data into R, explore them, clean them, make features, and run an analysis that allows you to answer your question. You will be graded on the completed R script as well as your presentation of the data.
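The steps named above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the built-in `mtcars` dataset stands in for a dataset you would find yourself, the question ("do cars with more cylinders get fewer miles per gallon?") is made up, and a real project would likely use the tidyverse tools covered in class.

```r
# 1. Bring the data into R (a built-in dataset stands in for your own file)
cars <- mtcars

# 2. Explore: look at the structure and at the variable of interest
str(cars)
print(summary(cars$mpg))

# 3. Clean: keep only rows without missing values
cars <- cars[complete.cases(cars), ]

# 4. Make a feature: weight in kilograms (wt is recorded in units of 1000 lbs)
cars$wt_kg <- cars$wt * 1000 * 0.4536

# 5. Analyze: average miles per gallon for each number of cylinders
mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
print(mpg_by_cyl)
```

Each numbered comment corresponds to one stage of the project, so a script organized this way is also easy to present.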
The presentation will last 3-4 minutes, and will take place on the day of the final exam (in place of the exam). University policy on final examinations can be found here: https://www.registrar.arizona.edu/courses/final-examination-regulations-and-information
1.11 Honors Students’ Requirements
Students wishing to take this course for Honors Credit should email me to set up an appointment to discuss the terms of the contract and to sign the Honors Course Contract Request Form. The form is available at https://honors.arizona.edu/academics/honors-contracts. Students earning credit with the University of Arizona Honors College will be held to the following enhancements:
Honors students will be required to create an academic poster based on their final project, and then present this poster at the iSchool’s iShowcase at the end of the semester. Creating a poster will require extra work to ensure clarity of logic, having a well-defined question and approach, and the creation of quality visuals. Guidelines on how to create an engaging academic poster can be found here: https://guides.nyu.edu/posters. Note: The iShowcase is at the end of the semester, but before finals when the regular class will have the project due. Thus, you will have to be ahead of schedule a bit to meet your honors requirement.
Honors students will also be expected to informally ‘journal’ about the course each week. Each week, that is, students will be required to write a five-sentence paragraph reflecting on some issue or moment that has arisen in our readings or discussions (e.g., the problem with particular terms or some philosophical or practical dilemma). Ultimately, if offering a paragraph each week, honors students will have written roughly 15 reflective paragraphs for the semester. This must be emailed directly to me by Sunday 5pm each week.
1.12 Student Accommodations
It is the University’s goal that learning experiences be as accessible as possible. If you anticipate or experience physical or academic barriers based on disability or pregnancy, please let me know immediately so that we can discuss options. You are also welcome to contact Disability Resources (520-621-3268) to establish reasonable accommodations. For additional information on Disability Resources and reasonable accommodations, please visit http://drc.arizona.edu/.
1.13 Attendance, Due Dates, and Missing Work
- Missed class assignments or exams cannot be made up without a well-documented, verifiable excuse (for example, a physician’s medical excuse). Indeed, due dates are firm, and late work will be accepted only with a verifiable and valid excuse.
- Absences for any sincerely held religious belief, observance, or practice will be accommodated where reasonable; see the UA policy: http://policy.arizona.edu/human-resources/religious-accommodation-policy.
- Absences pre-approved by the UA Dean of Students (or Dean designee) will be honored. https://deanofstudents.arizona.edu/absences
- Arriving late and leaving early is extremely disruptive to others in the class. Please avoid this kind of disruption.
- The UA’s policy concerning Class Attendance and Administrative Drops is available at: https://catalog.arizona.edu/policy/class-attendance-participation-and-administrative-drop
1.14 Course Conduct and Campus Policies
It’s important to be familiar with all campus policies.
Students are encouraged to share intellectual views and discuss freely the principles and applications of course materials. However, graded work/exercises must be the product of independent effort unless otherwise instructed. Students are expected to adhere to the UA Code of Academic Integrity as described in the UA General Catalog. See: http://deanofstudents.arizona.edu/academic-integrity/students/academic-integrity.
The UA Threatening Behavior by Students Policy prohibits threats of physical harm to any member of the University community, including to oneself. See http://policy.arizona.edu/education-and-student-affairs/threatening-behavior-students.
All student records will be managed and held confidentially. http://www.registrar.arizona.edu/personal-information/family-educational-rights-and-privacy-act-1974-ferpa?topic=ferpa
The University is committed to creating and maintaining an environment free of discrimination; see http://policy.arizona.edu/human-resources/nondiscrimination-and-anti-harassment-policy.
Information contained in this syllabus, other than the grade and absence policy, may be subject to change without advance notice as deemed appropriate by the instructor.
1.15 Code of Conduct
This code of conduct is based on GitHub Community Guidelines. One of the goals of this course is to get you familiar with the data science community, and how people work and learn better together. This is a community we build together, and we need everybody’s help to make it better each day.
1.15.1 Our Pledge
In the interest of fostering an open and welcoming environment, we as instructor and students pledge to making participation in our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
Be welcoming and open-minded. Although this is an intro course, like in any other learning setting, we have people at different levels of experience. Other people may not have the same experience level or background as you, but that doesn’t mean they don’t have good ideas to contribute. I encourage you to be welcoming to everyone, from more advanced coders to those just getting started. We can all learn from each other.
Respect each other. Nothing sabotages healthy conversation like rudeness. Be civil and professional, and don’t post or say anything that a reasonable person would consider offensive, abusive, or hate speech. Don’t harass or grief anyone. Treat each other with dignity and consideration in all interactions.
You may wish to respond to something by disagreeing with it. That’s fine. But remember to criticize ideas, not people. Avoid name-calling, ad hominem attacks, responding to a post’s tone instead of its actual content, and knee-jerk reactions. Instead, provide reasoned counter-arguments that improve the conversation.
Communicate with empathy. Disagreements or differences of opinion are a fact of life. Being part of a community means interacting with people from a variety of backgrounds and perspectives (and we are all better because of this variety), many of which may not be your own. If you disagree with someone, try to understand and share their feelings before you address them. This will promote a respectful and friendly atmosphere where people feel comfortable asking questions, participating in discussions, and making contributions.
Be clear and stay on topic. The goal of this course is to learn about data science and how to do data science with R. Off-topic comments are a distraction (sometimes welcome, but usually not) from getting work done and being productive. Staying on topic helps produce positive and productive discussions.
Additionally, as this class will be conducted online, you might not have met each other in person. Communicating on the internet can be awkward, even when you already know people. It’s hard to convey or read tone, and sarcasm is frequently misunderstood. Try to use clear language, and think about how it will be received by the other person.
1.15.2 Our Standards
Examples of behavior that contributes to creating a positive environment include:
Using welcoming and inclusive language
Being respectful of differing viewpoints and experiences
Gracefully accepting constructive criticism
Focusing on what is best for the community
Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
The use of sexualized language or imagery and unwelcome sexual attention or advances
Trolling, insulting/derogatory comments, and personal or political attacks
Public or private harassment
Publishing others’ private information, such as a physical or electronic address, without explicit permission
Other conduct which could reasonably be considered inappropriate in a professional setting
1.15.3 Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting your instructor at adrianaps@email.arizona. Your instructor will review and investigate all complaints, and will respond in a way deemed appropriate to the circumstances. Your instructor is obligated to maintain confidentiality with regard to the reporter of an incident.
1.15.4 Attribution
This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at http://contributor-covenant.org/version/1/4
1.16 How to Ask For Help
We’ll see in this course that a key skill you should develop as a data scientist is the ability to find solutions to problems. Knowing how to get help is part of that skill.
1.16.1 Before you ask for help
Check for typos. One of the most common causes of errors is a typo. A misspelled function name usually throws an error such as Error in _____ : could not find function "_____".
Check loaded packages. You also get errors like Error in data %>% summary() : could not find function "%>%" when you have not loaded the package that provides the function.
Read the error message. Don’t ignore what R is telling you. Be aware that red text in your console is not always an indication of an error. Sometimes it’s just a warning.
Google is your friend. Copy and paste the exact error message into a Google search. (This step also includes reading the documentation of the package you’re trying to use.)
If you are still stuck, you can always try rubber duck debugging. Describe the problem aloud, explaining it line-by-line, to a rubber duck or another person (who might not have any experience with programming or data science). This is also a good preparation step to asking other people for help (next section).
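The first three checks can be tried out safely in a fresh R session. The sketch below uses `tryCatch()` to capture each message instead of stopping the script. The misspelled `sumary()` and the input `"abc"` are made-up examples, and the second case assumes that no package providing `%>%` (such as dplyr or magrittr) has been loaded yet.

```r
# 1. Typo: summary() misspelled as sumary()
typo_msg <- tryCatch(
  sumary(c(1, 2, 3)),
  error = function(e) conditionMessage(e)
)
print(typo_msg)

# 2. Missing package: %>% is used before a package that provides it is loaded.
#    Running library(dplyr) (or library(magrittr)) first would fix this.
pipe_msg <- tryCatch(
  c(1, 2, 3) %>% summary(),
  error = function(e) conditionMessage(e)
)
print(pipe_msg)

# 3. A warning is not an error: the code still runs and returns a value (NA here)
warn_msg <- tryCatch(
  as.numeric("abc"),
  warning = function(w) conditionMessage(w)
)
print(warn_msg)
na_value <- suppressWarnings(as.numeric("abc"))
print(na_value)
```

In the first two cases the message names the exact function R could not find, which is usually the fastest clue to what went wrong.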
1.16.2 Ask other people for help
As mentioned before, you should ask your peers for help before you ask your instructor. Relying on a single person to solve all of your problems is dangerous, because that person won’t be available throughout your career as a data scientist.
Check our Slack to see if someone else has asked a question similar to yours, and whether there’s a solution posted for it.
Be precise and informative. The more context you can provide about what you’re trying to do and what errors you’re getting, the better. Also describe the steps you took to try to solve the problem yourself.
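One way to be precise is to post a small, self-contained example that others can run. The sketch below shows what such a post might contain; the data, the variable names, and the problem itself are made up for illustration.

```r
# Small, made-up data that reproduces the problem (no need to share a full dataset)
survey <- data.frame(
  id    = 1:4,
  hours = c("2", "3.5", "four", "1"),
  stringsAsFactors = FALSE
)

# What I tried: convert hours to numbers and take the mean
hours_num <- suppressWarnings(as.numeric(survey$hours))
result <- mean(hours_num)
print(result)   # I expected a number here but got NA

# What I found so far: this entry cannot be converted to a number
bad_entries <- survey$hours[is.na(hours_num)]
print(bad_entries)

# Version information that helps others reproduce the issue
print(R.version.string)
```

A post like this states the goal, shows the exact code and data, and documents what has already been tried, so helpers do not have to guess.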