In [1]:

```
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
pd.options.plotting.backend = 'plotly'
TEMPLATE = 'seaborn'
import warnings
warnings.simplefilter('ignore')
```

- Lab 9 (pipelines) is due on Monday, March 13th at 11:59PM.
- Project 5 (prediction) is due on Thursday, March 23rd at 11:59PM (no slip days allowed)!
- practice.dsc80.com now contains 3 past finals. Start reviewing!
- Prioritize the Spring 2022 final.

- There is no live lecture next Wednesday or Friday; videos will be posted instead.

- Cross-validation.
- Example: Decision trees 🌲.
- Grid search.
- Multicollinearity.

- Suppose we've decided to fit a polynomial regressor on a dataset $\{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\}$, but we're unsure of what degree of polynomial to use (1, 2, 3, ..., 25).
- Note that polynomial degree is a **hyperparameter** – it is something we can control *before* our model is fit to the training data.
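To make this concrete, here's a minimal sketch of how choosing a degree before fitting might look in scikit-learn. The data here is synthetic and for illustration only; the course may set this up differently.

```
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy data (illustration only): y is roughly quadratic in x.
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=100).reshape(-1, 1)
y = 2 * x.ravel() ** 2 - x.ravel() + rng.normal(0, 1, size=100)

# The degree is chosen BEFORE calling .fit -- that's what makes it a hyperparameter.
degree = 2
model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
model.fit(x, y)
```

Changing `degree` changes the model class itself, not its fitted coefficients – which is exactly why we need a principled way to choose it.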
- Or, suppose we have a dataset of restaurant tips, bills, table sizes, days of the week, and so on, and want to decide which features to use in a linear model that predicts tips.

- Remember, more complicated models (that is, models with more features) don't necessarily **generalize** well to **unseen data**!

**Goal**: Find the best hyperparameter, or best choice of features, so that our fit model **generalizes well to unseen data**.

Instead of relying on a single validation set, we can create $k$ validation sets, where $k$ is some positive integer (5 in the example below).

Since each data point is used for training $k-1$ times and validation once, the (averaged) validation performance should be a good metric of a model's ability to generalize to unseen data.
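We can verify this bookkeeping directly with `sklearn.model_selection.KFold` (a sketch, using 5 folds to match the example above):

```
import numpy as np
from sklearn.model_selection import KFold

n = 20
train_counts = np.zeros(n, dtype=int)
val_counts = np.zeros(n, dtype=int)

# Split 20 points into k = 5 disjoint folds.
kf = KFold(n_splits=5, shuffle=True, random_state=23)
for train_idx, val_idx in kf.split(np.arange(n)):
    train_counts[train_idx] += 1
    val_counts[val_idx] += 1

# Every point lands in the training set k - 1 = 4 times
# and in the validation set exactly once.
print(train_counts, val_counts)
```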

$k$-fold cross-validation (or simply "cross-validation") is **the** technique we will use for finding hyperparameters.

First, **shuffle** the dataset randomly and **split** it into $k$ disjoint groups. Then:

- For each hyperparameter:
    - For each unique group:
        - Let the unique group be the "validation set".
        - Let all other groups be the "training set".
        - Train a model using the selected hyperparameter on the training set.
        - Evaluate the model on the validation set.
    - Compute the **average** validation score (e.g. RMSE) for the particular hyperparameter.
- Choose the hyperparameter with the best average validation score.
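The steps above translate almost directly to code. Here's a hedged sketch using `cross_val_score` on synthetic data (scikit-learn maximizes scores, so we use negative RMSE and flip the sign):

```
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy data (illustration only): y is roughly cubic in x.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=150).reshape(-1, 1)
y = x.ravel() ** 3 - 4 * x.ravel() + rng.normal(0, 2, size=150)

avg_rmses = {}
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # cv=5: five folds; each fold serves as the validation set exactly once.
    scores = cross_val_score(model, x, y, cv=5,
                             scoring='neg_root_mean_squared_error')
    avg_rmses[degree] = -scores.mean()  # average validation RMSE

# Best hyperparameter = lowest average validation RMSE.
best_degree = min(avg_rmses, key=avg_rmses.get)
```

Note that `cross_val_score` handles the inner loop (splitting, training, and evaluating) for us; the outer loop over hyperparameters is what grid search automates.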

As a reminder, here's what "sample 1" looks like.

In [2]:

```
sample_1 = pd.read_csv(os.path.join('data', 'sample-1.csv'))
px.scatter(x=sample_1['x'], y=sample_1['y'], template=TEMPLATE)
```