chemicalchecker.core.chemcheck

Chemical Checker entry-point.

The most common starting point in a CC project is the ChemicalChecker:

from chemicalchecker import ChemicalChecker
cc = ChemicalChecker()

When initializing a CC instance we usually want to provide a root directory. If, as in the example above, we don’t specify anything, the default path is assumed (that is, the CC_ROOT variable in the Config).

If the specified CC_ROOT directory is already populated, we have successfully initialized the CC instance and will have access to its signatures.

If the CC_ROOT directory is empty, we proceed to generate the CC directory structure and we’ll have an empty CC instance, suitable for handling our own signatures.
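The distinction between a populated and an empty root can be pictured with a minimal sketch. The helper name root_is_populated is hypothetical and is not part of the chemicalchecker API; the library's own check may differ. It only illustrates the idea of "does the directory already contain something".

```python
import tempfile
from pathlib import Path


def root_is_populated(cc_root):
    """Hypothetical helper (not the library's API): True if the
    directory exists and contains at least one entry."""
    root = Path(cc_root)
    return root.is_dir() and any(root.iterdir())


# Demonstrate on a throwaway directory standing in for CC_ROOT.
with tempfile.TemporaryDirectory() as tmp:
    empty_before = root_is_populated(tmp)
    # Create one dataset directory following the molset/dataset layout.
    (Path(tmp) / "full" / "A" / "A1" / "A1.001").mkdir(parents=True)
    populated_after = root_is_populated(tmp)
```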

The organization of signatures under the CC_ROOT follows a hierarchy of molset/dataset/signature.

  • The molset is mostly for internal usage, and its expected values are either “full” or “reference”. In some steps of the pipeline it is convenient to work with the non-redundant set of signatures (“reference”), while at the end we want to map back to the “full” set of molecules.

  • The dataset is the bioactivity space of interest and is described by the _level_ (e.g. A), the _sublevel_ (e.g. 1) and a _code_ for each input dataset, starting from .001. The directory structure follows this hierarchy (e.g. /root/full/A/A1/A1.001).

  • The signature is one of the possible types of signatures (see Signaturization), and the final path is something like /root/full/A/A1/A1.001/sign2.
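The molset/dataset/signature hierarchy described above can be sketched as a small path builder. The function signature_path is a hypothetical illustration and is not taken from the chemicalchecker API; it assumes dataset codes of the form shown in the examples (level letter, sublevel digit, then a dot and the code, as in A1.001).

```python
from pathlib import PurePosixPath


def signature_path(root, molset, dataset, sign_type):
    """Hypothetical helper: compose root/molset/level/sublevel/dataset/signature."""
    if molset not in ("full", "reference"):
        raise ValueError("molset must be 'full' or 'reference'")
    level = dataset[0]      # e.g. "A"
    sublevel = dataset[:2]  # e.g. "A1"
    return PurePosixPath(root) / molset / level / sublevel / dataset / sign_type


path = signature_path("/root", "full", "A1.001", "sign2")
print(path)  # /root/full/A/A1/A1.001/sign2
```

Restricting molset to the two expected values keeps a mistyped molset from silently producing a path outside the documented layout.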

The main goals of this class are:
  1. Check and enforce the directory structure behind a CC instance.

  2. Serve signatures to users or pipelines.

Classes

ChemicalChecker

ChemicalChecker class.