Lecture 20 – Modeling and Linear Regression

So far this quarter, we've learned how to:


Goals of modeling

  1. To make accurate predictions regarding unseen data drawn from the data generating process.
    • Given this dataset of past UCSD data science students' salaries, can we predict your future salary? (regression)
    • Given this dataset of images, can we predict if this new image is of a dog, cat, or zebra? (classification)
  2. To make inferences about the structure of the data generating process, i.e. to understand complex phenomena.
    • Is there a linear relationship between the heights of children and the heights of their biological mothers?
    • The weights of babies born to smoking and non-smoking mothers differ in my sample. How confident am I that this difference exists in the population?
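
To make the two goals concrete, here is a minimal sketch that uses one fitted line for both prediction and inference. Everything in it is an assumption made for illustration: the heights are synthetic (generated from a made-up linear relationship, not taken from any course dataset), and the names `mother`, `child`, `w0`, and `w1` are hypothetical. It fits a least-squares line with `np.polyfit`, predicts for a new input, and bootstraps the slope to gauge how confident we can be that a linear relationship exists.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic, made-up data (an assumption for illustration only):
# mothers' heights in inches, and children's heights that depend on them linearly plus noise.
n = 200
mother = rng.normal(64, 2.5, size=n)
child = 30 + 0.55 * mother + rng.normal(0, 2, size=n)

# Fit child ≈ w0 + w1 * mother by least squares.
# np.polyfit returns coefficients from highest degree to lowest.
w1, w0 = np.polyfit(mother, child, deg=1)

# Goal 1 (prediction): estimate the height of a child whose mother is 66 inches tall.
predicted_child = w0 + w1 * 66.0

# Goal 2 (inference): bootstrap the slope to see whether 0 is a plausible value.
boot_slopes = np.empty(1000)
for i in range(1000):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    slope, intercept = np.polyfit(mother[idx], child[idx], deg=1)
    boot_slopes[i] = slope

# 95% bootstrap percentile interval for the slope.
ci_low, ci_high = np.percentile(boot_slopes, [2.5, 97.5])

print(f"fitted line: child ≈ {w0:.2f} + {w1:.2f} * mother")
print(f"predicted child height for a 66-inch mother: {predicted_child:.2f}")
print(f"95% bootstrap interval for the slope: [{ci_low:.2f}, {ci_high:.2f}]")
```

If the interval for the slope excludes 0, that is evidence of a linear relationship in the population, which answers the inference question. The prediction question only needs the fitted `w0` and `w1`.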


Example: Restaurant tips 🧑‍🍳

About the data

What features does the dataset contain?

Predicting tips

Exploratory data analysis (EDA)

Visualizing distributions