Creating a bootstrap set
Configure GenomEn, load the dataset, and create train/test splits.
import genomen.utils as utils
from genomen.data import DataSet, split, bootstrap
from genomen.model import GenomenModel
utils.set_config_path("config.yml")
dataset = DataSet()
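train_set, test_set = split(dataset, test_size=0.2)
# Hold out a validation split for model.fit below.
# Assumption: split() can also be applied to an already-split training set.
train_set, val_set = split(train_set, test_size=0.2)

With the data split, draw a bootstrap sample from the training set. Bootstrapping draws examples with replacement until the sample is the same size as the training set, which approximates the variability of drawing a new training set from the same population.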
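To make "with replacement, same size" concrete, here is a minimal standard-library sketch. It is independent of GenomEn and does not claim to show how genomen.data.bootstrap is implemented; bootstrap_sample is a hypothetical helper introduced only for illustration.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows uniformly at random, with replacement."""
    return rng.choices(rows, k=len(rows))

rng = random.Random(0)             # fixed seed so the sketch is reproducible
rows = list(range(10))             # stand-in for 10 training examples
resampled = bootstrap_sample(rows, rng)

# Same size as the input; some rows may repeat and others may be left out.
n_distinct = len(set(resampled))
```

For large training sets, roughly 63% of the original examples are expected to appear in any one bootstrap sample; the rest are left out of that replicate.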
# Create a bootstrap-resampled training set
bootstrapped_set = bootstrap(train_set)

Train on the bootstrapped training set, validate on the original validation split, and predict on the held-out test set.
model = GenomenModel()
model.fit(bootstrapped_set, val_set)
geno_preds, covar_preds, preds = model.predict(test_set)

Optionally, repeat the process multiple times to aggregate metrics or estimate uncertainty (e.g., mean and confidence intervals).
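all_preds = []  # one entry per bootstrap replicate
for i in range(10):
    # Draw a fresh bootstrap sample and train a new model on it
    bs_train = bootstrap(train_set)
    model = GenomenModel()
    model.fit(bs_train, val_set)
    # predict() returns three outputs, as above; keep the final preds output
    _, _, preds = model.predict(test_set)
    all_preds.append(preds)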
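Once each replicate has been scored, the per-replicate metrics can be summarized. The sketch below is one possible approach, not part of GenomEn: it assumes a plain list of scalar scores (the values are made-up placeholders, not real results) and uses a hypothetical helper, summarize, to compute the mean and a simple percentile interval. How a score is derived from the predictions depends on the task and is not shown.

```python
import statistics

def summarize(scores, alpha=0.05):
    """Return (mean, lower, upper) using a percentile interval over scores."""
    ordered = sorted(scores)
    n = len(ordered)
    # Simple index-based approximation of the alpha/2 and 1 - alpha/2 quantiles
    lo_idx = max(0, int((alpha / 2) * n))
    hi_idx = min(n - 1, int((1 - alpha / 2) * n))
    return statistics.mean(ordered), ordered[lo_idx], ordered[hi_idx]

# Placeholder scores, one per bootstrap replicate (illustrative values only)
scores = [0.71, 0.69, 0.74, 0.70, 0.72, 0.68, 0.73, 0.71, 0.70, 0.72]
mean, lower, upper = summarize(scores)
```

With only 10 replicates this interval collapses to the minimum and maximum score, so percentile intervals are typically computed from a much larger number of replicates.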