Creating a bootstrap set

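Configure GenomEn, load the dataset, and create the train, validation, and test splits.

import genomen.utils as utils
from genomen.data import DataSet, split, bootstrap
from genomen.model import GenomenModel

utils.set_config_path("config.yml")

dataset = DataSet()
train_set, test_set = split(dataset, test_size=0.2)
# Assumption: split() can be applied again to hold out a validation set from the training data
train_set, val_set = split(train_set, test_size=0.2)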

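With the data split, draw a bootstrap sample from the training set. Bootstrapping draws examples with replacement until the resample is the same size as the original set, so some examples appear more than once and others are left out. The variation between such resamples approximates sampling variability.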

# Create a bootstrap-resampled training set
bootstrapped_set = bootstrap(train_set)
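
To make the resampling step concrete, here is a minimal NumPy sketch of drawing indices with replacement. It is independent of GenomEn and does not describe how `bootstrap` is implemented internally; the helper name `bootstrap_indices` and the sample count of 1000 are hypothetical, chosen only for illustration.

```python
import numpy as np


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices uniformly at random, with replacement."""
    return rng.integers(0, n_samples, size=n_samples)


# Hypothetical example: resample a set of 1000 rows
rng = np.random.default_rng(seed=0)
indices = bootstrap_indices(1000, rng)

# The resample has the original size, but only part of the rows appear in it
unique_fraction = np.unique(indices).size / indices.size
```

For large sets, a bootstrap resample contains on average about 63% of the distinct original examples (1 - 1/e); the remaining positions are filled by repeats.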

Train on the bootstrapped training set, validate on the original validation split, and predict on the held-out test set.

model = GenomenModel()
model.fit(bootstrapped_set, val_set)
geno_preds, covar_preds, preds = model.predict(test_set)

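Optionally, repeat the process several times, drawing a fresh bootstrap sample for each run and keeping the predictions, so that metrics can be aggregated or uncertainty estimated across runs (e.g., mean and confidence intervals).

all_preds = []
for _ in range(10):
    # Resample the training set and train a new model on each run
    bs_train = bootstrap(train_set)
    model = GenomenModel()
    model.fit(bs_train, val_set)
    # Keep the predictions so they can be scored and aggregated afterwards
    _, _, preds = model.predict(test_set)
    all_preds.append(preds)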
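
As a sketch of the aggregation step, the snippet below summarizes one score per run as a mean and a percentile interval. It assumes you have already computed a scalar test metric for each bootstrap run; the `summarize` helper and the `run_scores` values are hypothetical placeholders, not GenomEn functions or real results.

```python
import numpy as np


def summarize(scores, confidence=0.95):
    """Return the mean and a percentile interval of per-run scores."""
    scores = np.asarray(scores, dtype=float)
    alpha = (1.0 - confidence) / 2.0
    lower, upper = np.quantile(scores, [alpha, 1.0 - alpha])
    return scores.mean(), lower, upper


# Placeholder metrics, one per bootstrap run (illustrative values only)
run_scores = [0.71, 0.74, 0.69, 0.73, 0.72, 0.70, 0.75, 0.72, 0.71, 0.73]
mean, lower, upper = summarize(run_scores)
print(f"mean={mean:.3f}, 95% interval=[{lower:.3f}, {upper:.3f}]")
```

With only 10 runs the percentile interval is coarse; a larger number of bootstrap replicates gives a more stable estimate.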