DSC 80 – The Practice and Application of Data Science
This Week
TF-IDF
Final Project Released! 📊
The Final Project is now available! It is worth 10% of your overall grade (double the weight of a regular project) and is the culmination of everything you've learned this quarter.
This project is structured differently from previous projects, so read the instructions carefully. In particular, there are two checkpoint assignments, and the project is not autograded.
Important Dates:
- Checkpoint 1: Thursday, November 20th, 2025 at 11:59PM
- Checkpoint 2: Tuesday, December 2nd, 2025 at 11:59PM (Note: Due on a Tuesday because of Thanksgiving break)
- Final Submission: Thursday, December 11th, 2025 at 11:59PM
You'll choose one of three datasets (Recipes and Ratings, League of Legends, or Power Outages) and conduct an open-ended investigation. Your deliverables include a public-facing website and a Jupyter Notebook PDF.
Start early and build something you're proud of – this is a great portfolio piece for your resume!
Lecture 13 — TF-IDF and Features
- 📖 Reading Chapter 9 - Introduction to Regression
- 📓 Lecture Slides (pdf)
past weeks
Week 6
Parsing HTML and RegEx
Exam 01 on Thursday, Nov 06
Lecture 12 — Text Features
- 📖 Reading Chapter 9 - Introduction to Features
- 📓 Lecture Slides (pdf)
Week 5
HTTP and HTML
Lecture 10 — HTML and Scraping
- 📖 Reading Chapter 7 - Collecting Data
- 📓 Lecture Slides (pdf)
Lecture 11 — Regular Expressions
- 📖 Reading Chapter 8 - Information Extraction
- 📓 Lecture Slides (pdf)
Week 4
Imputation
Lecture 8 — Missingness and Imputation
- 📖 Reading Chapter 6 - Missing Data
- 📓 Lecture Slides (pdf)
Lecture 9 — APIs and Scraping
- 📖 Reading Chapter 7 - Collecting Data
- 📓 Lecture Slides (pdf)
Week 3
Missingness
Lecture 6 — Cleaning Data, Hypothesis Testing
- 📖 Reading Chapter 4 - Introduction to Data Science
- 📓 Lecture Slides (pdf)
Lecture 7 — Permutation Tests, Missing Data
- 📖 Reading Chapter 6 - Missing Data
- 📓 Lecture Slides (pdf)
Week 2
Hypothesis Testing and Data Granularity
Lecture 4 — Pivot Tables, Simpson's Paradox
- 📖 Reading Chapters 2 and 3 - Introduction to Data Science
- 📓 Lecture Slides (pdf)
Lecture 5 — Merging and Cleaning Data
- 📖 Reading Chapter 4 - Introduction to Data Science
- 📓 Lecture Slides (pdf)
Week 1
Tables and Messy Data
Lecture 2 — Pandas
- 📖 Reading Chapters 2 and 3 - Introduction to Data Science
- 📓 Lecture Slides (pdf)
Lecture 3 — Aggregating
- 📖 Reading Chapters 2 and 3 - Introduction to Data Science
- 📓 Lecture Slides (pdf)
Week 0
Introduction
Welcome to DSC 80!
Here's how to get started:
- Read the syllabus.
- Join our Campuswire message board and Gradescope with the email invitations you received earlier this week. If you didn't receive an email, you can use access code 42BY42 for Gradescope and access code 7629 for Campuswire.
- Read the Tech Support page and set up your development environment.
- The first lectures will be on Thursday, September 25 at:
  - 12:30 PM in CENTR 212.
  - 3:30 PM in PODEM 1A22.
- There are no discussions for this class. Instead, we'll post optional, ungraded weekly exam study guides and host additional office hours.
See you in lecture!
Lecture 1 — Introduction
- 📖 Reading Chapter 1 - Introduction to Data Science
- 📓 Lecture Slides (pdf)
future weeks
Week 8
Features and sklearn
Week 9
Thanksgiving Week
Week 10
Model Evaluation and Fairness
Exam 02 on Saturday, Dec 06