chemicalchecker.tool.targetmate.universes.univs.Universe

class Universe(cc_root=None, molrepo=None, k=None, model_path=None, tmp_path='/tmp/tm/tmp_universe', min_actives_oneclass=10, max_actives_oneclass=1000, representative_mols_per_cluster=10, trials=1000000, only_bioactive=False)

Bases: object

Initialize the Universe class.

Parameters:
  • cc_root (str) – Chemical Checker root directory (default=None).

  • molrepo (str) – Molrepo to use; ChEMBL is used if not specified (default=None).

  • k (int) – Number of partitions for the k-Means clustering (default=sqrt(N/2)).

  • model_path (str) – Folder where the universe should be stored (default=., the current directory).

  • tmp_path (str) – Temporary directory (default=/tmp/tm/tmp_universe).

  • min_actives_oneclass (int) – Minimum number of actives to use in the OneClassSVM (default=10).

  • max_actives_oneclass (int) – Maximum number of actives to use in the OneClassSVM (default=1000).

  • representative_mols_per_cluster (int) – Number of molecules to sample from each cluster (default=10).

  • trials (int) – Number of sampling trials to attempt before giving up (default=1000000).

  • only_bioactive (bool) – Only include known bioactive compounds in the chemical space, i.e. those compounds found in the Chemical Checker (default=False).
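The default for k above is given as sqrt(N/2), where N is presumably the number of molecules in the universe. The following is a minimal sketch of that rule of thumb, not the library's own code: the helper name default_k, the rounding to the nearest integer, and the lower bound of 1 are all assumptions made for illustration.

```python
import math


def default_k(n_molecules):
    """Rule-of-thumb number of k-Means partitions: sqrt(N / 2).

    Hypothetical helper; rounding and the floor of 1 are assumptions.
    """
    if n_molecules < 1:
        raise ValueError("n_molecules must be a positive integer")
    # round() returns an int in Python 3 when called with one argument.
    return max(1, round(math.sqrt(n_molecules / 2)))


print(default_k(20000))  # prints 100
```

With this rule, a universe of 20,000 molecules would be split into 100 clusters, which at the default representative_mols_per_cluster=10 would correspond to roughly 1,000 representative molecules.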

Methods

calculate_arena

cluster

clusters_dict

fetch_molecules

fit

fit_oneclass_svm

load_universe

predict

representative_smiles

save

smiles

predict(actives, inactives, inactives_per_active=100, min_actives=10, naive=False, biased_universe=0, maximum_potential_actives=5, random_state=None)
Parameters:
  • actives (list or set) – Should include (smiles, id, inchikey).

  • inactives (list or set) – Should include (smiles, id, inchikey).

  • inactives_per_active (int) – Number of inactives to sample from the universe for each active. Can be None (default=100).

  • min_actives (int) – Minimum number of actives (default=10).

  • naive (bool) – Sample naively (randomly), without using the OneClassSVM (default=False).

  • biased_universe (float) – Proportion of closer molecules to sample as putative inactives (default=0).

  • maximum_potential_actives (int) – Maximum number of representative molecules allowed within the active hyperplane before the cluster is discarded; used for the biased universe (default=5).
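To make the naive=True option more concrete, the sketch below draws putative inactives at random from a pool of molecules, skipping the OneClassSVM entirely. It is an illustration of the idea only, not the implementation behind predict: the function name sample_naive_inactives, the universe argument, and the choice to exclude already-labelled molecules by InChIKey are assumptions, and the inactives_per_active=None case is not handled.

```python
import random


def sample_naive_inactives(actives, inactives, universe,
                           inactives_per_active=100, random_state=None):
    """Randomly draw putative inactives from a pool of molecules.

    Every molecule is a (smiles, id, inchikey) triple, as in the
    parameter descriptions above. Hypothetical helper for illustration.
    """
    # Assumption: molecules that are already labelled are excluded,
    # and identity is decided by the InChIKey (third element).
    known = {mol[2] for mol in actives} | {mol[2] for mol in inactives}
    # Sort so the draw is reproducible even if `universe` is a set.
    candidates = sorted(mol for mol in universe if mol[2] not in known)
    # Never ask for more molecules than are available.
    n_wanted = min(len(candidates), inactives_per_active * len(actives))
    rng = random.Random(random_state)
    return rng.sample(candidates, n_wanted)
```

Passing the same random_state twice returns the same sample, which is the usual reason for exposing a seed in a sampling routine.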