chemicalchecker.tool.targetmate.tmsetup.TargetMateSetup
- class TargetMateSetup(models_path, tmp_path=None, cc_root=None, is_classic=False, classic_dataset='A1.001', classic_cctype='sign0', prestacked_dataset=None, overwrite=True, n_jobs=None, n_jobs_hpc=8, max_train_samples=10000, max_train_ensemble=10, train_sample_chance=0.95, standardize=False, is_cv=False, is_stratified=True, n_splits=3, test_size_hyperopt=0.2, scaffold_split=False, outofuniverse_split=False, outofuniverse_datasets=['A1.001'], outofuniverse_cctype='sign1', conformity=True, hpc=False, do_init=True, search_n_iter=25, train_timeout=7200, shuffle=False, log='INFO', use_stacked_signature=False, is_tmp_bases=True, is_tmp_signatures=True, is_tmp_predictions=True, use_cc=True, **kwargs)
Bases: HPCUtils
Set up the base TargetMate class.
- Parameters:
models_path (str) – Directory where models will be stored.
tmp_path (str) – Directory where temporary data will be stored (relevant at predict time) (default=None).
cc_root (str) – CC root folder (default=None).
is_classic (bool) – Use a classical chemical fingerprint, instead of CC signatures (default=False).
classic_dataset (str) – Dataset code for the classic fingerprint (default='A1.001').
classic_cctype (str) – Signature type for the classic dataset (default='sign0').
prestacked_dataset (str) – Prestacked dataset signature (default=None).
overwrite (bool) – Clean models_path directory (default=True).
n_jobs (int) – Number of CPUs to use, all by default (default=None).
n_jobs_hpc (int) – Number of CPUs to use in HPC (default=8).
max_train_samples (int) – Maximum number of training samples to use (default=10000).
max_train_ensemble (int) – Maximum size of an ensemble (important when many samples are available) (default=10).
train_sample_chance (float) – Chance of visiting a sample (default=0.95).
standardize (bool) – Standardize small molecule structures (default=False).
is_cv (bool) – In hyper-parameter optimization, do cross-validation (default=False).
is_stratified (bool) – In hyper-parameter optimization, do stratified split (default=True).
n_splits (int) – If hyper-parameter optimization is done, number of splits (default=3).
test_size_hyperopt (float) – If hyper-parameter optimization is done, size of the test set (default=0.2).
scaffold_split (bool) – Model should be evaluated with scaffold splits (default=False).
outofuniverse_split (bool) – Model should be evaluated with out-of-universe splits (default=False).
outofuniverse_datasets (list) – Datasets to consider as part of the universe in the out-of-universe split (default=['A1.001']).
outofuniverse_cctype (str) – Signature type of the datasets considered to be part of the out-of-universe split (default='sign1').
conformity (bool) – Do cross-conformal prediction (default=True).
hpc (bool) – Use HPC (default=False).
search_n_iter (int) – Number of iterations in a search for hyperparameters (default=25).
train_timeout (int) – Maximum time in seconds for training a classifier; applies to autosklearn (default=7200).
use_cc (bool) – Use pre-computed CC signatures (default=True).
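As a rough illustration of how these parameters fit together, the sketch below collects a subset of the documented defaults in a dictionary and passes them to the constructor. The import path follows the module name at the top of this page. The helper check_setup_kwargs and the value ranges it enforces are assumptions made for this example and are not part of the library.

```python
# Keyword arguments copied from the defaults in the documented signature.
SETUP_KWARGS = {
    "tmp_path": None,
    "cc_root": None,
    "is_classic": False,
    "overwrite": True,
    "n_jobs": None,
    "n_jobs_hpc": 8,
    "max_train_samples": 10000,
    "max_train_ensemble": 10,
    "train_sample_chance": 0.95,
    "standardize": False,
    "is_cv": False,
    "is_stratified": True,
    "n_splits": 3,
    "test_size_hyperopt": 0.2,
    "conformity": True,
    "hpc": False,
}


def check_setup_kwargs(kwargs):
    """Hypothetical helper (not part of the library): basic sanity checks.

    The accepted ranges are assumptions inferred from the default values.
    """
    problems = []
    if not 0.0 < kwargs["test_size_hyperopt"] < 1.0:
        problems.append("test_size_hyperopt should be a fraction in (0, 1)")
    if not 0.0 < kwargs["train_sample_chance"] < 1.0:
        problems.append("train_sample_chance should be a probability in (0, 1)")
    if kwargs["n_splits"] < 2:
        problems.append("n_splits should be at least 2")
    return problems


def build_setup(models_path, **overrides):
    """Instantiate TargetMateSetup; requires the chemicalchecker package."""
    # Imported lazily so the rest of the sketch runs without the package.
    from chemicalchecker.tool.targetmate.tmsetup import TargetMateSetup

    kwargs = {**SETUP_KWARGS, **overrides}
    problems = check_setup_kwargs(kwargs)
    if problems:
        raise ValueError("; ".join(problems))
    return TargetMateSetup(models_path, **kwargs)
```

Keeping the defaults in one dictionary makes it easy to override a single value per experiment, for example build_setup("models", is_cv=True), while leaving the rest as documented.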
Methods
Store model in compressed format for persistence
cpu_count
create_models_path
directory_tree
func_hpc – Execute any method on the configured HPC.
Load previously stored TargetMate instance.
Load a base model
load_data
read_data
repath_bases_by_fold – Redefine path of a TargetMate instance.
repath_predictions_by_fold – Redefine path of a TargetMate instance.
repath_predictions_by_fold_and_set
repath_predictions_by_set – Redefine path of a TargetMate instance.
reset_path_bases
Reset predictions path
Save TargetMate instance
save_data
waiter – Wait for jobs to finish
Delete temporary data
- func_hpc(func_name, *args, **kwargs)
Execute any method on the configured HPC.
- Parameters:
args (tuple) – Arguments for the method being executed.
kwargs (dict) – Arguments for the HPC method.
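The split between args and kwargs can be pictured with a small name-based dispatcher. This is only a local stand-in: the class LocalRunner, its method run_by_name, and the cpu option are hypothetical, and nothing here is submitted to a cluster. It shows positional arguments going to the named method while keyword arguments are kept aside as job options.

```python
class LocalRunner:
    """Hypothetical stand-in that runs a method locally, not on an HPC."""

    def __init__(self):
        self.last_job_options = None

    def double(self, x):
        # Example method to be looked up by name.
        return 2 * x

    def run_by_name(self, func_name, *args, **kwargs):
        # Resolve the method from its name, as func_name suggests.
        method = getattr(self, func_name, None)
        if not callable(method):
            raise AttributeError(f"no method named {func_name!r}")
        # Keyword arguments are treated as job options and are not
        # forwarded to the method itself.
        self.last_job_options = dict(kwargs)
        return method(*args)


runner = LocalRunner()
result = runner.run_by_name("double", 21, cpu=4)
```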
- repath_bases_by_fold(fold_number, is_tmp=True, reset=True, only_train=False)
Redefine path of a TargetMate instance. Used by the Validation class.
- repath_predictions_by_fold(fold_number, is_tmp=True, reset=True)
Redefine path of a TargetMate instance. Used by the Validation class.
- repath_predictions_by_set(is_train, is_tmp=True, reset=True)
Redefine path of a TargetMate instance. Used by the Validation class.
- waiter(jobs, secs=3)
Wait for jobs to finish
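A waiting step of this kind is commonly written as a polling loop. The sketch below is a generic version and not the library's implementation: it assumes each job exposes a status() method that eventually returns "done", and FakeJob is a hypothetical test double used only to exercise the loop.

```python
import time


def wait_for_jobs(jobs, secs=3, is_done=lambda job: job.status() == "done"):
    """Block until every job reports completion, polling every `secs` seconds.

    Returns the number of times the loop had to sleep.
    """
    polls = 0
    while not all(is_done(job) for job in jobs):
        time.sleep(secs)
        polls += 1
    return polls


class FakeJob:
    """Hypothetical job that reports 'done' after a fixed number of checks."""

    def __init__(self, checks_needed):
        self.remaining = checks_needed

    def status(self):
        if self.remaining > 0:
            self.remaining -= 1
            return "running"
        return "done"
```

An empty job list returns immediately, since there is nothing to wait for. In practice a timeout or a maximum number of polls is usually worth adding so that a stuck job cannot block the caller forever.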