import pandas as pd
import numpy as np
import os

import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
pd.options.plotting.backend = 'plotly'
TEMPLATE = 'seaborn'

import warnings
warnings.simplefilter('ignore')
Instead of relying on a single validation set, we can create $k$ validation sets, where $k$ is some positive integer (5 in the example below).
Since each data point is used for training $k-1$ times and validation once, the (averaged) validation performance should be a good metric of a model's ability to generalize to unseen data.
$k$-fold cross-validation (or simply "cross-validation") is the technique we will use for finding hyperparameters.
First, shuffle the dataset randomly and split it into $k$ disjoint groups. Then:
- For each group:
    - Hold that group out as the validation set, and train the model on the remaining $k-1$ groups.
    - Compute the model's error on the held-out group.
- Average the $k$ resulting validation errors; this average is the model's cross-validation error.
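The procedure can be sketched in code. This is a minimal illustration using scikit-learn's `KFold` with a synthetic dataset and a linear model (both chosen here just for demonstration, not taken from the example above):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data for illustration: y is roughly linear in x, plus noise.
rng = np.random.default_rng(23)
X = rng.uniform(-5, 5, size=(100, 1))
y = 2 * X[:, 0] + rng.normal(size=100)

# shuffle=True randomizes the order before splitting into k=5 disjoint groups.
kf = KFold(n_splits=5, shuffle=True, random_state=23)
errors = []
for train_idx, val_idx in kf.split(X):
    # Train on the other 4 folds, validate on the held-out fold.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    errors.append(mean_squared_error(y[val_idx], pred))

# The cross-validation error is the average of the 5 validation errors.
cv_error = np.mean(errors)
```

In practice, `sklearn.model_selection.cross_val_score` wraps this loop in a single call; the explicit loop above is just to make the fold structure visible.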
As a reminder, here's what "sample 1" looks like.
sample_1 = pd.read_csv(os.path.join('data', 'sample-1.csv'))
px.scatter(x=sample_1['x'], y=sample_1['y'], template=TEMPLATE)