chemicalchecker.core.signature_base.BaseSignature

class BaseSignature(signature_path, dataset, readyfile='fit.ready', **params)[source]

Bases: object

BaseSignature class.

Initialize a BaseSignature instance.

Methods

available

This signature data is available.

background_distances

Return the background distances according to the selected metric.

clear

Remove everything from this signature.

clear_all

Remove everything from this signature for both reference and full.

diagnosis

fit

Fit a model.

fit_end

Conclude fit method.

fit_hpc

Execute the fit method on the configured HPC.

func_hpc

Execute any method on the configured HPC.

get_cc

Return the CC where the signature is present

get_intersection

Return the intersection between two signatures.

get_molset

Return a signature from a different molset

get_neig

Return the neighbors signature, given a signature

get_non_redundant_intersection

Return the non redundant intersection between two signatures.

get_sign

Return the signature type for current dataset

get_status_stack

is_fit

mark_ready

predict

Use the fitted models to predict.

save_full

Map the non redundant signature in explicit full molset.

save_reference

Save a non redundant signature in reference molset.

to_csv

Write smiles to h5.

update_status

validate

Perform validations.

Attributes

qualified_name

Signature qualified name (e.g. ‘B1.001-sign1-full’).

status

__repr__()[source]

String representing the signature.

available()[source]

This signature data is available.

background_distances(metric, limit_inks=None, name=None)[source]

Return the background distances according to the selected metric.

Parameters:

metric (str) – the metric name (cosine or euclidean).
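As an illustration only, a background distribution can be pictured as the distances between randomly sampled pairs of signature vectors under the chosen metric. The sketch below rests on that assumption and is not the library's implementation: the function name `sample_background_distances` and the parameters `n_pairs` and `seed` are hypothetical, and only the two metric names come from the documentation above.

```python
import math
import random


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# The two metric names mentioned in the documentation.
METRICS = {"cosine": cosine_distance, "euclidean": euclidean_distance}


def sample_background_distances(vectors, metric, n_pairs=1000, seed=42):
    """Hypothetical helper: sorted distances between random pairs of vectors."""
    if metric not in METRICS:
        raise ValueError("metric must be 'cosine' or 'euclidean'")
    dist = METRICS[metric]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    distances = []
    for _ in range(n_pairs):
        # Draw two distinct row indices, then measure their distance.
        i, j = rng.sample(range(len(vectors)), 2)
        distances.append(dist(vectors[i], vectors[j]))
    return sorted(distances)


toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
background = sample_background_distances(toy_vectors, "cosine", n_pairs=10)
```

A sorted list makes it easy to read off percentiles, which is one plausible way such a background could be used to judge whether an observed distance is unusually small.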

clear()[source]

Remove everything from this signature.

clear_all()[source]

Remove everything from this signature for both reference and full.

abstract fit(**kwargs)[source]

Fit a model.

fit_end(**kwargs)[source]

Conclude fit method.

It computes background distances, runs validations (including diagnostics), and finally marks the signature as ready.
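The relationship between the abstract `fit` and `fit_end` can be sketched with a toy class. This is a hypothetical stand-in, not the real `BaseSignature`: the names `ToySignature`, `MySignature`, and the `steps` and `ready` attributes are invented for illustration, and it is an assumption that a concrete `fit` calls `fit_end` as its last step.

```python
from abc import ABC, abstractmethod


class ToySignature(ABC):
    """Hypothetical stand-in that only records the order of operations."""

    def __init__(self):
        self.steps = []
        self.ready = False

    @abstractmethod
    def fit(self, **kwargs):
        """Subclasses provide the actual model fitting."""

    def fit_end(self, **kwargs):
        # Mirrors the documented order: background distances,
        # validations, then marking the signature as ready.
        self.steps.append("background_distances")
        self.steps.append("validate")
        self.ready = True  # stands in for mark_ready()
        self.steps.append("mark_ready")


class MySignature(ToySignature):
    def fit(self, **kwargs):
        self.steps.append("fit")  # model-specific work would go here
        self.fit_end(**kwargs)    # assumed: conclude with the shared steps


sign = MySignature()
sign.fit()
```

Because `fit` is abstract, `ToySignature` itself cannot be instantiated; only subclasses that implement it can.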

fit_hpc(*args, **kwargs)[source]

Execute the fit method on the configured HPC.

Parameters:
  • args (tuple) – the arguments for the fit method

  • kwargs (dict) – arguments for the HPC method.

func_hpc(func_name, *args, **kwargs)[source]

Execute any method on the configured HPC.

Parameters:
  • args (tuple) – the arguments for the fit method

  • kwargs (dict) – arguments for the HPC method.

get_cc(cc_root=None)[source]

Return the CC where the signature is present

get_intersection(sign)[source]

Return the intersection between two signatures.

get_molset(molset)[source]

Return a signature from a different molset

get_neig()[source]

Return the neighbors signature, given a signature

get_non_redundant_intersection(sign)[source]

Return the non redundant intersection between two signatures.

That is, the keys and vectors that are common to both signatures. N.B.: to maximize overlap, it is better to use signatures of type ‘full’. N.B.: near duplicates are found in the first signature.
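The idea of a non-redundant intersection can be sketched with plain dictionaries mapping keys to vectors. This is a simplified illustration under stated assumptions, not the library's code: the function name and return shape are hypothetical, and exact duplicate vectors stand in for the near-duplicate detection mentioned above.

```python
def non_redundant_intersection(sign_a, sign_b):
    """Hypothetical sketch: shared keys, deduplicated on the first signature."""
    # Keys present in both signatures, in a stable order.
    common = sorted(set(sign_a) & set(sign_b))

    seen = set()
    kept = []
    for key in common:
        # Duplicates are looked up in the FIRST signature only.
        # Simplification: exact matches stand in for near duplicates.
        fingerprint = tuple(sign_a[key])
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        kept.append(key)

    vectors_a = [sign_a[k] for k in kept]
    vectors_b = [sign_b[k] for k in kept]
    return kept, vectors_a, vectors_b


first = {"K1": [0.1, 0.2], "K2": [0.1, 0.2], "K3": [0.9, 0.4]}
second = {"K1": [1.0, 1.0], "K2": [2.0, 2.0], "K4": [3.0, 3.0]}
keys, vecs_first, vecs_second = non_redundant_intersection(first, second)
```

Here `K3` and `K4` drop out because they are not shared, and `K2` drops out because its vector in the first signature duplicates that of `K1`.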

get_sign(sign_type)[source]

Return the signature type for current dataset

abstract predict()[source]

Use the fitted models to predict.

property qualified_name

Signature qualified name (e.g. ‘B1.001-sign1-full’).
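Judging only from the example above, the qualified name appears to join three hyphen-separated parts. The sketch below assumes they are the dataset code, the signature type, and the molset; the helper `split_qualified_name` is hypothetical and not part of the library.

```python
def split_qualified_name(name):
    """Hypothetical helper: split 'dataset-signtype-molset' into its parts."""
    parts = name.split("-")
    if len(parts) != 3:
        raise ValueError("expected three hyphen-separated parts: %r" % name)
    dataset, sign_type, molset = parts
    return dataset, sign_type, molset


dataset, sign_type, molset = split_qualified_name("B1.001-sign1-full")
```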

save_full(overwrite=False)[source]

Map the non redundant signature in explicit full molset.

It generates a new signature in the full folders.

Parameters:

overwrite (bool) – Overwrite existing (default=False).

save_reference(cpu=4, overwrite=False)[source]

Save a non redundant signature in reference molset.

It generates a new signature in the references folders.

Parameters:
  • cpu (int) – Number of CPUs (default=4),

  • overwrite (bool) – Overwrite existing (default=False).

to_csv(filename, smiles=None)[source]

Write smiles to h5.

At the moment this is done by querying the Structure table for the inchikey-to-inchi mapping and then converting via Converter.

validate(apply_mappings=True, metric='cosine', diagnostics=False)[source]

Perform validations.

A validation file is an external resource that lists pairs of molecules and whether or not they share a given property (i.e. the file format is inchikey inchikey 0/1). Current tests are performed on MOA (Mode Of Action) and ATC (Anatomical Therapeutic Chemical), corresponding to the B1.001 and E1.001 datasets.

Parameters:

apply_mappings (bool) – Whether to use mappings to compute validation. Signatures that have been redundancy-reduced (i.e. reference) have fewer molecules. The keys are molecules from the full signature and the values are molecules from the reference set.
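The validation file format and the role of the mappings can be sketched as follows. This is a minimal illustration, not the library's validation code: the function names are hypothetical, the inchikeys are made-up placeholders, and whitespace-separated columns are an assumption about the `inchikey inchikey 0/1` layout.

```python
import io


def read_validation_pairs(handle):
    """Hypothetical parser for lines of the form 'inchikey inchikey 0/1'."""
    pairs = []
    for line in handle:
        fields = line.split()  # assumed: whitespace-separated columns
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        key_a, key_b, label = fields
        pairs.append((key_a, key_b, int(label)))
    return pairs


def map_pairs_to_reference(pairs, mappings):
    """Replace full-molset keys with their reference-molset representative."""
    mapped = []
    for key_a, key_b, label in pairs:
        # Keys without a mapping entry are kept unchanged.
        mapped.append((mappings.get(key_a, key_a),
                       mappings.get(key_b, key_b),
                       label))
    return mapped


# Placeholder keys, not real inchikeys.
validation_text = "AAA BBB 1\nAAA CCC 0\n"
pairs = read_validation_pairs(io.StringIO(validation_text))

# Keys come from the full signature, values from the reference set.
mappings = {"BBB": "AAA", "CCC": "CCC"}
reference_pairs = map_pairs_to_reference(pairs, mappings)
```

Mapping first lets pairs that mention a molecule absent from the reduced signature still be evaluated through its reference representative.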