import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
pd.options.plotting.backend = 'plotly'
TEMPLATE = 'seaborn'
import warnings
warnings.simplefilter('ignore')
Instead of relying on a single validation set, we can create $k$ validation sets, where $k$ is some positive integer (5 in the example below).
Since each data point is used for training $k-1$ times and validation once, the (averaged) validation performance should be a good metric of a model's ability to generalize to unseen data.
$k$-fold cross-validation (or simply "cross-validation") is the technique we will use for finding hyperparameters.
First, shuffle the dataset randomly and split it into $k$ disjoint groups. Then:
1. For each fold $i = 1, 2, \dots, k$: train the model on the other $k-1$ folds and compute its error on fold $i$ (the held-out validation fold).
2. Average the $k$ validation errors to get a single estimate of the model's performance on unseen data.
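The procedure above can be sketched directly with NumPy. This is a minimal illustration, not the implementation used later in this notebook; the `fit` and `error` callables are hypothetical placeholders for whatever model-fitting and error-measuring functions you have.

```python
import numpy as np

def k_fold_cv(X, y, fit, error, k=5, seed=0):
    """Estimate generalization error via k-fold cross-validation.

    fit(X_train, y_train) -> model, and error(model, X_val, y_val) -> float,
    are placeholders supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(X))      # shuffle the dataset randomly
    folds = np.array_split(shuffled, k)     # split into k disjoint groups
    errs = []
    for i in range(k):
        val = folds[i]                      # fold i is held out for validation
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])     # train on the other k-1 folds
        errs.append(error(model, X[val], y[val]))
    return np.mean(errs)                    # average the k validation errors
```

Note that each index lands in exactly one fold, so every data point is validated on once and trained on $k-1$ times, matching the description above.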
As a reminder, here's what "sample 1" looks like.
sample_1 = pd.read_csv(os.path.join('data', 'sample-1.csv'))
px.scatter(x=sample_1['x'], y=sample_1['y'], template=TEMPLATE)