chemicalchecker.tool.targetmate.tmsetup.TargetMateRegressorSetup

class TargetMateRegressorSetup(models_path, tmp_path=None, cc_root=None, is_classic=False, classic_dataset='A1.001', classic_cctype='sign0', prestacked_dataset=None, overwrite=True, n_jobs=None, n_jobs_hpc=8, max_train_samples=10000, max_train_ensemble=10, train_sample_chance=0.95, standardize=False, is_cv=False, is_stratified=True, n_splits=3, test_size_hyperopt=0.2, scaffold_split=False, outofuniverse_split=False, outofuniverse_datasets=['A1.001'], outofuniverse_cctype='sign1', conformity=True, hpc=False, do_init=True, search_n_iter=25, train_timeout=7200, shuffle=False, log='INFO', use_stacked_signature=False, is_tmp_bases=True, is_tmp_signatures=True, is_tmp_predictions=True, use_cc=True, **kwargs)

Bases: TargetMateSetup

Set up a TargetMate regressor.

Basic setup of the TargetMate.

Parameters:
  • models_path (str) – Directory where models will be stored.

  • tmp_path (str) – Directory where temporary data will be stored (relevant at predict time) (default=None).

  • cc_root (str) – CC root folder (default=None).

  • is_classic (bool) – Use a classical chemical fingerprint, instead of CC signatures (default=False).

  • classic_dataset (str) – Dataset code for the classic fingerprint (default='A1.001').

  • classic_cctype (str) – Signature type for the classic dataset (default='sign0').

  • prestacked_dataset (str) – Prestacked dataset signature (default=None).

  • overwrite (bool) – Clean models_path directory (default=True).

  • n_jobs (int) – Number of CPUs to use, all by default (default=None).

  • n_jobs_hpc (int) – Number of CPUs to use in HPC (default=8).

  • max_train_samples (int) – Maximum number of training samples to use (default=10000).

  • max_train_ensemble (int) – Maximum size of an ensemble (important when many samples are available) (default=10).

  • train_sample_chance (float) – Chance of visiting a sample (default=0.95).

  • standardize (bool) – Standardize small molecule structures (default=False).

  • is_cv (bool) – In hyper-parameter optimization, do cross-validation (default=False).

  • is_stratified (bool) – In hyper-parameter optimization, do stratified split (default=True).

  • n_splits (int) – If hyper-parameter optimization is done, number of splits (default=3).

  • test_size_hyperopt (float) – If hyper-parameter optimization is done, size of the test set (default=0.2).

  • scaffold_split (bool) – Model should be evaluated with scaffold splits (default=False).

  • outofuniverse_split (bool) – Model should be evaluated with out-of-universe splits (default=False).

  • outofuniverse_datasets (list) – Datasets to consider as part of the universe in the out-of-universe split (default=['A1.001']).

  • outofuniverse_cctype (str) – Signature type of the datasets considered to be part of the out-of-universe split (default='sign1').

  • conformity (bool) – Do cross-conformal prediction (default=True).

  • hpc (bool) – Use HPC (default=False).

  • search_n_iter (int) – Number of iterations in a search for hyperparameters (default=25).

  • train_timeout (int) – Maximum time in seconds for training a model; applies to autosklearn (default=7200).

  • use_cc (bool) – Use pre-computed CC signatures (default=True).
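
A minimal sketch of how the parameters above might be passed, assuming the chemicalchecker package is installed and importable from the module path given at the top of this page. The directory names and the chosen values are hypothetical. The import is deferred into a helper so the snippet can be read and run without the package. Because overwrite=True (the default) cleans models_path, the sketch sets it to False.

```python
# Keyword arguments drawn from the documented signature.
# Paths and values are illustrative assumptions, not recommendations.
setup_kwargs = {
    "models_path": "models/my_regressor",  # hypothetical directory
    "tmp_path": "tmp/my_regressor",        # hypothetical directory
    "overwrite": False,                    # do not clean models_path
    "n_jobs": 4,
    "max_train_samples": 5000,
    "is_cv": True,
    "n_splits": 3,
    "test_size_hyperopt": 0.2,             # documented default is 0.2
    "conformity": True,
    "hpc": False,
}


def build_setup(kwargs):
    """Return a TargetMateRegressorSetup, or None if the package is missing."""
    try:
        from chemicalchecker.tool.targetmate.tmsetup import (
            TargetMateRegressorSetup,
        )
    except ImportError:
        return None
    return TargetMateRegressorSetup(**kwargs)
```

Calling build_setup(setup_kwargs) would then return the configured instance when the package is available. Any parameter left out of the dictionary keeps the default shown in the signature.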

Methods

compress_models

Store model in compressed format for persistence

cpu_count

create_models_path

directory_tree

func_hpc

Execute any method on the configured HPC.

load

Load previously stored TargetMate instance.

load_base_model

Load a base model

load_data

read_data

repath_bases_by_fold

Redefine path of a TargetMate instance.

repath_predictions_by_fold

Redefine path of a TargetMate instance.

repath_predictions_by_fold_and_set

repath_predictions_by_set

Redefine path of a TargetMate instance.

reset_path_bases

reset_path_predictions

Reset predictions path

save

Save TargetMate instance

save_data

waiter

Wait for jobs to finish

wipe

Delete temporary data

compress_models()

Store model in compressed format for persistence

func_hpc(func_name, *args, **kwargs)

Execute any method on the configured HPC.

Parameters:
  • args (tuple) – arguments for the method being executed.

  • kwargs (dict) – arguments for the HPC method.
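
The calling convention can be illustrated without an HPC cluster. The sketch below is an assumption about the general shape of a name-based dispatcher, not the library's implementation: LocalRunner, its add method, func_local and the cpu option are all hypothetical, and the method runs in the current process instead of being submitted as a job.

```python
class LocalRunner:
    """Hypothetical stand-in that resolves a method by name and calls it."""

    def add(self, a, b):
        return a + b

    def func_local(self, func_name, *args, **kwargs):
        # Look the method up by its name, as a name-based dispatcher would.
        method = getattr(self, func_name, None)
        if not callable(method):
            raise AttributeError(f"no method named {func_name!r}")
        # kwargs are reserved for the runner and are not forwarded to the
        # method, mirroring the documented split between args (for the
        # method) and kwargs (for the HPC).
        return method(*args)


runner = LocalRunner()
result = runner.func_local("add", 2, 3, cpu=1)  # cpu is a made-up option
```

The point of the sketch is the separation of concerns: positional arguments reach the named method, while keyword arguments configure how it is run.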

static load(models_path)

Load previously stored TargetMate instance.

load_base_model(destination_dir, append_pipe=False)

Load a base model

repath_bases_by_fold(fold_number, is_tmp=True, reset=True, only_train=False)

Redefine path of a TargetMate instance. Used by the Validation class.

repath_predictions_by_fold(fold_number, is_tmp=True, reset=True)

Redefine path of a TargetMate instance. Used by the Validation class.

repath_predictions_by_set(is_train, is_tmp=True, reset=True)

Redefine path of a TargetMate instance. Used by the Validation class.

reset_path_predictions(is_tmp=True)

Reset predictions path

save()

Save TargetMate instance
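
save() and the static load(models_path) form a round trip: an instance is written under its models directory and later rebuilt from that directory alone. The sketch below illustrates the pattern only. TinySetup, the setup.json file name and the use of JSON are hypothetical; the serialization format actually used by TargetMate is not stated on this page.

```python
import json
import os
import tempfile


class TinySetup:
    """Hypothetical object that persists its state inside models_path."""

    FILENAME = "setup.json"  # made-up file name

    def __init__(self, models_path, n_splits=3):
        self.models_path = os.path.abspath(models_path)
        self.n_splits = n_splits
        os.makedirs(self.models_path, exist_ok=True)

    def save(self):
        # Write the instance state next to the models it describes.
        state = {"models_path": self.models_path, "n_splits": self.n_splits}
        with open(os.path.join(self.models_path, self.FILENAME), "w") as f:
            json.dump(state, f)

    @staticmethod
    def load(models_path):
        # Rebuild an instance from the directory alone.
        with open(os.path.join(models_path, TinySetup.FILENAME)) as f:
            state = json.load(f)
        return TinySetup(state["models_path"], n_splits=state["n_splits"])


with tempfile.TemporaryDirectory() as tmp:
    original = TinySetup(os.path.join(tmp, "models"), n_splits=5)
    original.save()
    restored = TinySetup.load(original.models_path)
    restored_n_splits = restored.n_splits
```

Making load static means the caller needs only the directory path, not an existing instance, to recover a stored setup.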

waiter(jobs, secs=3)

Wait for jobs to finish
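
The polling pattern suggested by waiter(jobs, secs=3) can be sketched as follows. This is an assumption-based illustration, not the library's code: the jobs here are concurrent.futures.Future objects, whereas the real method presumably receives HPC job handles, and the helper names wait_for and slow_square are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_for(jobs, secs=3):
    """Block until every job reports done, checking every `secs` seconds."""
    while not all(job.done() for job in jobs):
        time.sleep(secs)


def slow_square(x):
    # Simulate a short-running job.
    time.sleep(0.05)
    return x * x


with ThreadPoolExecutor(max_workers=2) as pool:
    jobs = [pool.submit(slow_square, n) for n in (2, 3)]
    wait_for(jobs, secs=0.01)
    results = [job.result() for job in jobs]
```

A larger secs value reduces how often the job status is queried, at the cost of noticing completion later.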

wipe()

Delete temporary data