Lecture 20 – Modeling and Linear Regression

DSC 80, Winter 2023

📣 Announcements

RSVP to the capstone showcase on Wednesday, March 15th!

The senior capstone showcase is on Wednesday, March 15th in the Price Center East Ballroom. The DSC seniors will be presenting posters on their capstone projects. Come and ask them questions; if you're a DSC major, this will be you one day!

The session is broken into two blocks:

Look at the list of topics and RSVP here!

There will be no live DSC 80 lecture on the day of the showcase – instead, lecture will be pre-recorded!




So far this quarter, we've learned how to:


Goals of modeling

  1. To make accurate predictions regarding unseen data drawn from the data generating process.
    • Given this dataset of past UCSD data science students' salaries, can we predict your future salary? (regression)
    • Given this dataset of images, can we predict if this new image is of a dog, cat, or zebra? (classification)
  2. To make inferences about the structure of the data generating process, i.e. to understand complex phenomena.
    • Is there a linear relationship between the heights of children and the heights of their biological mothers?
    • The weights of babies born to smoking and non-smoking mothers are different in my sample – how confident am I that this difference exists in the population?
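
To make the two goals concrete, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the data is synthetic (generated with made-up parameters, not drawn from any course dataset), and the helper names `fit_line`, `predict`, and `bootstrap_diff_ci` are hypothetical. The first half fits a least-squares line and predicts at an unseen input (goal 1); the second half uses a percentile bootstrap to gauge how confident we can be in a difference between two group means (goal 2).

```python
import random
import statistics

random.seed(80)  # fixed seed so the sketch is reproducible

# --- Goal 1: prediction -------------------------------------------------
# Synthetic data (an assumption): the true relationship is y = 3 + 2x + noise.
x = [random.uniform(0, 10) for _ in range(200)]
y = [3.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = fit_line(x, y)

def predict(new_x):
    """Predict y for an input that was not in the training data."""
    return intercept + slope * new_x

# --- Goal 2: inference --------------------------------------------------
# Two synthetic samples (an assumption) whose population means differ by 5.
group_a = [random.gauss(120, 15) for _ in range(150)]
group_b = [random.gauss(115, 15) for _ in range(150)]

def bootstrap_diff_ci(a, b, n_boot=2000, level=0.95):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    diffs = []
    for _ in range(n_boot):
        resample_a = random.choices(a, k=len(a))  # sample with replacement
        resample_b = random.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]

ci_low, ci_high = bootstrap_diff_ci(group_a, group_b)

print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
print(f"prediction at x = 5: {predict(5.0):.2f}")
print(f"95% bootstrap CI for the difference in means: [{ci_low:.2f}, {ci_high:.2f}]")
```

In the prediction half we only care how close `predict` gets on new inputs. In the inference half we care about what the interval says about the population: if it excludes 0, that is evidence the difference is not just an artifact of this particular sample.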


Example: Restaurant tips 🧑‍🍳

About the data

What features does the dataset contain?

Predicting tips

Exploratory data analysis (EDA)

Visualizing distributions