chemicalchecker.core.diagnostics.Diagnosis
- class Diagnosis(sign, ref_cc=None, ref_cctype='sign0', save=True, plot=True, overwrite=False, load=True, n=10000, seed=42, cpu=4)[source]
Bases: object
Diagnosis class.
Initialize a Diagnosis instance.
- Parameters:
ref_cc (ChemicalChecker) – A CC instance used as reference.
sign (CC signature) – The CC signature object to be diagnosed.
save (bool) – Whether to save results in the diags folder of the signature. (default=True)
plot (bool) – Whether to save plots in the diags folder of the signature. (default=True)
overwrite (bool) – Whether to overwrite the results of the diagnosis. (default=False)
n (int) – Number of molecules to sample. (default=10000)
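A minimal usage sketch, based only on the constructor signature above. The helper name `build_diagnosis` and its `diagnosis_cls` argument are hypothetical; the class is passed in rather than imported so the snippet runs without the package installed. In real use `diagnosis_cls` would be `chemicalchecker.core.diagnostics.Diagnosis` and `sign` an existing CC signature object.

```python
def build_diagnosis(sign, diagnosis_cls, ref_cc=None):
    """Create a Diagnosis-like object that writes no results or plots.

    `diagnosis_cls` is expected to be
    chemicalchecker.core.diagnostics.Diagnosis; passing it as an argument
    is an assumption of this sketch, not part of the library.
    """
    return diagnosis_cls(
        sign,
        ref_cc=ref_cc,    # reference CC instance, optional
        save=False,       # do not save results in the diags folder
        plot=False,       # do not save plots in the diags folder
        overwrite=False,  # do not overwrite existing diagnosis results
        n=1000,           # sample fewer molecules than the default 10000
    )
```

Only keyword arguments listed in the signature are used; what the resulting object's methods return is not covered by this sketch.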
Methods
across_coverage – Check coverage against a collection of other CC signatures.
across_roc – Check ROC against a collection of other CC signatures.
atc_roc
available
canvas
canvas_hpc – Run HPC jobs.
canvas_large
canvas_medium
canvas_small
Remove all diagnostic data.
cluster_sizes
clusters_projection
confidences
confidences_projection
cosine_distances – Cosine distance distribution.
cross_coverage – Intersection of coverages.
cross_roc – Perform validations.
custom_comparative_vertical
diagnostics_hpc – Run HPC jobs.
dimensions – Get dimensions of the signature and compare to other signatures.
euclidean_distances – Euclidean distance distribution.
features_bins
features_iqr
global_ranks_agreement – Sample-specific global accuracy.
global_ranks_agreement_projection
image
intensities
intensities_projection
key_coverage
key_coverage_projection
keys_bins
keys_iqr
moa_roc
neigh_roc – Check ROC against another signature at different NN levels.
orthogonality
outliers – Computes anomaly score of the input samples.
pr
projection – TSNE projection of CC signatures.
Sample-specific accuracy.
ranks_agreement_projection
redundancy
roc
values
Attributes
V
keys
- across_coverage(*args, datasets=None, exemplary=True, ref_cctype='sign1', **kwargs)[source]
Check coverage against a collection of other CC signatures.
- Parameters:
datasets (list) – List of datasets. If None, all available are used. (default=None)
exemplary (bool) – Whether to use only exemplary datasets (recommended). (default=True)
ref_cctype (str) – CC signature type. (default='sign1')
molset (str) – Molecule set to use. Full is recommended. (default=None)
kwargs (dict) – Parameters of the cross_coverage method.
- across_roc(*args, datasets=None, exemplary=True, ref_cctype=None, redo=False, include_datasets=None, **kwargs)[source]
Check ROC against a collection of other CC signatures.
- Parameters:
datasets (list) – List of datasets. If None, all available are used. (default=None).
exemplary (bool) – Whether to use only exemplary datasets (recommended). (default=True)
ref_cctype (str) – CC signature type. (default=’sign0’)
redo (bool) – Whether to redo the plot. (default=False)
include_datasets (list) – Specific datasets to add when exemplary is set to True. (default=None)
kwargs (dict) – Parameters of the cross_roc method.
- canvas_hpc(tmpdir, **kwargs)[source]
Run HPC jobs.
- Parameters:
tmpdir (str) – Folder (usually in scratch) where the job directory is generated.
cc_root – CC root path.
cctype – CC type (sign0, sign1, sign2, sign3) on which the method is applied.
molset – 'full' or 'reference'.
dss – Datasets to run the diagnostics on.
cc_reference – Another version of CC to use as diagnostic reference.
- cosine_distances(*args, n_pairs=10000, **kwargs)[source]
Cosine distance distribution.
- Parameters:
n_pairs (int) – Number of pairs to sample. (default=10000)
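As an illustration of what a sampled cosine distance distribution looks like, the sketch below draws `n_pairs` random pairs of rows from a small matrix and computes their cosine distances. It is not the library's implementation; the function names, the `seed` argument, and the toy matrix `V` are assumptions made for this example.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sample_cosine_distances(vectors, n_pairs=10000, seed=42):
    """Sample n_pairs random pairs of distinct rows and return their distances."""
    rng = random.Random(seed)
    distances = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(vectors)), 2)  # two different row indices
        distances.append(cosine_distance(vectors[i], vectors[j]))
    return distances


# Toy signature matrix: 4 molecules, 3 features (made-up values).
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]
dists = sample_cosine_distances(V, n_pairs=100)
```

Cosine distance lies between 0 (same direction) and 2 (opposite direction), so a histogram of `dists` shows how spread out the sampled pairs are.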
- cross_coverage(dataset, *args, ref_cctype='sign1', molset='full', try_conn_layer=False, redo=False, **kwargs)[source]
Intersection of coverages.
- Parameters:
sign (signature) – A CC signature object to check against.
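The idea of an intersection of coverages can be sketched with plain sets of InChIKeys. This is a hedged illustration rather than the library's code: the function name `coverage_intersection` and its return value are assumptions, and the `try_conn_layer` branch assumes that matching on the connectivity layer means comparing the first 14-character block of each InChIKey.

```python
def coverage_intersection(keys, other_keys, try_conn_layer=False):
    """Return (n_shared, fraction of `keys` that also occur in `other_keys`)."""
    if try_conn_layer:
        # The first 14-character block of an InChIKey encodes connectivity,
        # so stereoisomers of one compound collapse to the same entry.
        keys = {k.split("-")[0] for k in keys}
        other_keys = {k.split("-")[0] for k in other_keys}
    else:
        keys, other_keys = set(keys), set(other_keys)
    shared = keys & other_keys
    fraction = len(shared) / len(keys) if keys else 0.0
    return len(shared), fraction


# Example keys; the second entry differs between the lists only in the
# block after the first hyphen.
mine = ["BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "HEFNNWSXXWATRW-UHFFFAOYSA-N"]
theirs = ["BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "HEFNNWSXXWATRW-JTQLQIEISA-N"]

exact = coverage_intersection(mine, theirs)
by_connectivity = coverage_intersection(mine, theirs, try_conn_layer=True)
```

With exact matching only one key is shared; matching on the connectivity block recovers the second one.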
- cross_roc(sign, *args, n_samples=10000, n_neighbors=5, neg_pos_ratio=1, apply_mappings=False, try_conn_layer=False, metric='cosine', redo=False, val_type='roc', **kwargs)[source]
Perform validations.
- Parameters:
sign (signature) – A CC signature object to validate against.
n_samples (int) – Number of samples.
apply_mappings (bool) – Whether to use mappings to compute validation. Signatures that have been redundancy-reduced (i.e. reference) have fewer molecules. The keys are molecules from the full signature and the values are molecules from the reference set.
try_conn_layer (bool) – Try with the inchikey connectivity layer. (default=False)
metric (str) – 'cosine' or 'euclidean'. (default='cosine')
val_type (str) – 'roc' or 'pr'. (default='roc')
save (bool) – Method-specific save parameter. If not specified, the global setting is used. (default=None)
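For `val_type='roc'`, the quantity being summarized is a ROC AUC. The sketch below is a generic illustration of that metric, not the library's validation code: it computes the AUC as the probability that a positive pair scores higher than a negative pair. How `cross_roc` actually builds positive and negative pairs is not documented here, so the two score lists are made-up values and the function name `roc_auc` is hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive pair outscores a random negative pair.

    Ties count as one half (the Mann-Whitney U formulation of ROC AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Similarities (e.g. 1 - cosine distance) for pairs assumed to be true
# neighbours (positives) and for random pairs (negatives). Values are
# made up; equal list sizes correspond to neg_pos_ratio=1.
positives = [0.9, 0.8, 0.6, 0.4]
negatives = [0.7, 0.3, 0.2, 0.1]
auc = roc_auc(positives, negatives)  # 14 of 16 comparisons won -> 0.875
```

An AUC of 0.5 means the signature separates positives from negatives no better than chance; 1.0 means perfect separation.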
- static diagnostics_hpc(tmpdir, cc_root, cctype, molset, dss, cc_reference, **kwargs)[source]
Run HPC jobs.
- Parameters:
tmpdir (str) – Folder (usually in scratch) where the job directory is generated.
cc_root – CC root path.
cctype – CC type (sign0, sign1, sign2, sign3) on which the method is applied.
molset – 'full' or 'reference'.
dss – Datasets to run the diagnostics on.
cc_reference – Another version of CC to use as diagnostic reference.
- dimensions(*args, datasets=None, exemplary=True, ref_cctype='sign1', molset='full', **kwargs)[source]
Get dimensions of the signature and compare to other signatures.
- euclidean_distances(*arg, n_pairs=10000, **kwargs)[source]
Euclidean distance distribution.
- Parameters:
n_pairs (int) – Number of pairs to sample. (default=10000)
- global_ranks_agreement(*args, n_neighbors=100, min_shared=100, metric='minkowski', p=0.9, ref_cctype=None, **kwargs)[source]
Sample-specific global accuracy.
Estimated as general agreement with the rest of the CC, based on a Z-global ranking.
- neigh_roc(ds, *args, ref_cctype=None, n_neighbors=[1, 5, 10, 50, 100], **kwargs)[source]
Check ROC against another signature at different NN levels.
- Parameters:
ds – Dataset against which to run ROC analysis.
ref_cctype (str) – CC signature type.
n_neighbors (list) – List of nearest-neighbor (NN) counts for which to compute ROC. (default=[1, 5, 10, 50, 100])
molset (str) – Molecule set to use. Full is recommended. (default='full')
kwargs (dict) – Parameters of the cross_coverage method.
- outliers(*args, n_estimators=1000, **kwargs)[source]
Computes anomaly score of the input samples.
The lower, the more abnormal. Negative scores represent outliers, positive scores represent inliers.
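The sign convention above can be applied as follows. This is a small sketch under stated assumptions: the function name `split_by_anomaly_score` is hypothetical, the scores are made-up values, and a score of exactly zero is treated as an inlier, which the description does not specify.

```python
def split_by_anomaly_score(keys, scores):
    """Partition keys into (outliers, inliers) using the sign of the score."""
    ranked = sorted(zip(scores, keys))  # most abnormal (lowest score) first
    outliers = [k for s, k in ranked if s < 0]
    # Assumption of this sketch: a score of exactly 0 counts as an inlier.
    inliers = [k for s, k in ranked if s >= 0]
    return outliers, inliers


keys = ["mol_a", "mol_b", "mol_c", "mol_d"]   # placeholder molecule keys
scores = [0.12, -0.30, 0.05, -0.02]           # made-up anomaly scores
outliers, inliers = split_by_anomaly_score(keys, scores)
```

Both lists come back ordered from most to least abnormal, so the first outlier is the sample with the lowest score.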
- projection(*args, keys=None, focus_keys=None, max_keys=10000, perplexity=None, max_pca=100, redo=False, **kwargs)[source]
TSNE projection of CC signatures.
- Parameters:
keys (list) – Keys to be projected. If None specified, keys are randomly sampled. (default=None)
focus_keys (list) – Keys to be highlighted in the projection. (default=None).
max_keys (int) – Maximum number of keys to include in the projection. (default=10000)