chemicalchecker.core.signature_base.BaseSignature
- class BaseSignature(signature_path, dataset, readyfile='fit.ready', **params)[source]
Bases: object
BaseSignature class.
Initialize a BaseSignature instance.
Methods
This signature data is available.
Return the background distances according to the selected metric.
Remove everything from this signature.
Remove everything from this signature for both reference and full.
diagnosis
Fit a model.
Conclude fit method.
Execute the fit method on the configured HPC.
Execute any method on the configured HPC.
Return the CC where the signature is present
Return the intersection between two signatures.
Return a signature from a different molset
Return the neighbors signature, given a signature
Return the non redundant intersection between two signatures.
Return the signature type for current dataset
get_status_stack
is_fit
mark_ready
Use the fitted models to predict.
Map the non redundant signature in explicit full molset.
Save a non redundant signature in reference molset.
Write smiles to h5.
update_status
Perform validations.
Attributes
Signature qualified name (e.g. ‘B1.001-sign1-full’).
status
- background_distances(metric, limit_inks=None, name=None)[source]
Return the background distances according to the selected metric.
- Parameters:
metric (str) – the metric name (cosine or euclidean).
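As a rough illustration of what a background distance distribution is, the sketch below samples random pairs of vectors, computes their distance under the chosen metric, and reports the distance found at a few empirical quantiles. This is not the library's implementation: the helper names, the number of sampled pairs, and the quantile levels are all assumptions made for the example.

```python
import math
import random


def pair_distance(a, b, metric):
    """Distance between two vectors for the two metrics named above."""
    if metric == "euclidean":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if metric == "cosine":
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norm
    raise ValueError(f"unsupported metric: {metric!r}")


def background_distances(vectors, metric, n_pairs=2000, seed=42):
    """Hypothetical helper: empirical distance quantiles over random pairs."""
    rng = random.Random(seed)
    # Each sample draws two distinct rows, so a vector is never paired with itself.
    dists = sorted(
        pair_distance(*rng.sample(vectors, 2), metric) for _ in range(n_pairs)
    )
    # Quantile levels are illustrative, not the ones used by the library.
    levels = [0.01, 0.05, 0.1, 0.5]
    return {p: dists[int(p * (len(dists) - 1))] for p in levels}


# Toy data standing in for signature vectors.
_rng = random.Random(0)
vectors = [[_rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
bg_cosine = background_distances(vectors, "cosine")
bg_euclidean = background_distances(vectors, "euclidean")
```

A distance below the value stored for a small quantile can then be read as unusually close compared with random pairs.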
- fit_end(**kwargs)[source]
Conclude fit method.
Compute background distances, run validations (including diagnostics), and finally mark the signature as ready.
- fit_hpc(*args, **kwargs)[source]
Execute the fit method on the configured HPC.
- Parameters:
args (tuple) – the arguments for the fit method.
kwargs (dict) – arguments for the HPC method.
- func_hpc(func_name, *args, **kwargs)[source]
Execute any method on the configured HPC.
- Parameters:
args (tuple) – the arguments for the method named by func_name.
kwargs (dict) – arguments for the HPC method.
- get_non_redundant_intersection(sign)[source]
Return the non redundant intersection between two signatures.
That is, the keys and vectors that are common to both signatures. N.B.: to maximize overlap, it is better to use signatures of type ‘full’. N.B.: near duplicates are found in the first signature.
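The idea can be sketched with plain dictionaries mapping keys to vectors. The example below is an assumption-laden illustration, not the library's method: it keeps the keys present in both signatures, then drops keys whose vector in the first signature duplicates one already seen. Rounding to a fixed number of decimals is used here only as a simple stand-in for near-duplicate detection; the function name and the `decimals` parameter are hypothetical.

```python
def non_redundant_intersection(sign_a, sign_b, decimals=3):
    """Hypothetical helper: common keys, de-duplicated on the first signature."""
    common = sorted(set(sign_a) & set(sign_b))
    seen = set()
    keys = []
    for key in common:
        # Near duplicates are looked for in the first signature only.
        fingerprint = tuple(round(x, decimals) for x in sign_a[key])
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        keys.append(key)
    return keys, [sign_a[k] for k in keys], [sign_b[k] for k in keys]


# Placeholder keys; real signatures are keyed by InChIKey.
sign_a = {"K1": [1.0, 0.0], "K2": [1.0001, 0.0], "K3": [0.0, 1.0], "K4": [0.5, 0.5]}
sign_b = {"K1": [0.2, 0.1], "K2": [0.3, 0.3], "K3": [0.9, 0.9], "K5": [0.7, 0.1]}

keys, vectors_a, vectors_b = non_redundant_intersection(sign_a, sign_b)
print(keys)  # ['K1', 'K3']: K2 is a near duplicate of K1, K4 and K5 are not shared
```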
- property qualified_name
Signature qualified name (e.g. ‘B1.001-sign1-full’).
- save_full(overwrite=False)[source]
Map the non redundant signature in explicit full molset.
It generates a new signature in the full folders.
- Parameters:
overwrite (bool) – Overwrite existing (default=False).
- save_reference(cpu=4, overwrite=False)[source]
Save a non redundant signature in reference molset.
It generates a new signature in the references folders.
- Parameters:
cpu (int) – Number of CPUs (default=4).
overwrite (bool) – Overwrite existing (default=False).
- to_csv(filename, smiles=None)[source]
Write smiles to h5.
At the moment this is done by querying the Structure table for the inchikey to inchi mapping and then converting via Converter.
- validate(apply_mappings=True, metric='cosine', diagnostics=False)[source]
Perform validations.
A validation file is an external resource listing pairs of molecules and whether or not they share a given property (i.e. the file format is inchikey inchikey 0/1). Current tests are performed on MoA (Mode of Action) and ATC (Anatomical Therapeutic Chemical), corresponding to the B1.001 and E1.001 datasets.
- Parameters:
apply_mappings (bool) – Whether to use mappings to compute validation. Signatures that have been redundancy-reduced (i.e. reference) have fewer molecules. The mapping keys are molecules from the full signature and the values are molecules from the reference set.
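To make the file format and the role of the mappings concrete, here is a minimal sketch. It assumes the three fields are whitespace-separated and that a pair is dropped when either molecule has no mapping. Both are assumptions for illustration, as are the function names and the placeholder keys.

```python
import io


def read_validation_pairs(handle):
    """Parse lines of the form 'inchikey inchikey 0/1' (separator assumed)."""
    pairs = []
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        key_a, key_b, label = fields
        pairs.append((key_a, key_b, int(label)))
    return pairs


def map_pairs_to_reference(pairs, mappings):
    """Translate full-set keys to reference-set keys, dropping unmapped pairs."""
    mapped = []
    for key_a, key_b, label in pairs:
        if key_a in mappings and key_b in mappings:
            mapped.append((mappings[key_a], mappings[key_b], label))
    return mapped


# Placeholder keys; a real file would hold InChIKeys.
validation_text = "KEYA KEYB 1\nKEYA KEYC 0\nKEYD KEYE 1\n"
pairs = read_validation_pairs(io.StringIO(validation_text))

# Keys come from the full signature, values from the reference set.
# Here KEYC was made redundant with KEYB; KEYD and KEYE are absent.
mappings = {"KEYA": "KEYA", "KEYB": "KEYB", "KEYC": "KEYB"}
mapped = map_pairs_to_reference(pairs, mappings)
```

With the pairs expressed in reference keys, distances under the chosen metric can be compared between pairs labelled 1 and pairs labelled 0.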