chemicalchecker.core.preprocess

Data preprocessing.

Given the diversity of formats and data sources, the signaturization process starts with tailored pre-process scripts (available in the package scripts folder). The fit method invokes the pre-process script with a fit argument, in which case the features to consider are learned. The predict method allows deriving signatures without altering the feature set. It can also be used to map different entities (i.e. not only compounds) to a bioactive space.

Examples of raw data formats:

categorical: “C0015230,C0016436…”, which translates into an array of 0s and 1s.

discrete: “GO:0006897(8),GO:0006796(3),…”, which translates into an array of integers.

continuous: “0.515,1.690,0.996”, which is an array of floats.
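The three raw formats above, and the difference between learning a feature set (fit) and reusing it (predict), can be illustrated with a minimal sketch. It is not the Preprocess implementation: the helper names parse_categorical, parse_discrete, parse_continuous, learn_features and to_vector are hypothetical, and the sketch assumes that features are ordered alphabetically and that features unseen at fit time are simply ignored at predict time.

```python
import re

import numpy as np

# Hypothetical helpers for illustration only; not part of chemicalchecker.

def parse_categorical(raw):
    """Map 'C0015230,C0016436' to {'C0015230': 1, 'C0016436': 1}."""
    return {key: 1 for key in raw.split(",") if key}


# A discrete item is assumed to look like '<feature>(<integer count>)'.
_DISCRETE_ITEM = re.compile(r"^(.+)\((\d+)\)$")


def parse_discrete(raw):
    """Map 'GO:0006897(8),GO:0006796(3)' to {'GO:0006897': 8, 'GO:0006796': 3}."""
    record = {}
    for item in raw.split(","):
        if not item:
            continue
        match = _DISCRETE_ITEM.match(item)
        if match is None:
            raise ValueError(f"Malformed discrete item: {item!r}")
        record[match.group(1)] = int(match.group(2))
    return record


def parse_continuous(raw):
    """Map '0.515,1.690,0.996' to a float array."""
    return np.array([float(value) for value in raw.split(",")], dtype=float)


def learn_features(records):
    """Fit-like step: collect the sorted union of all feature names."""
    return sorted({key for record in records for key in record})


def to_vector(record, features):
    """Predict-like step: project a record onto a fixed feature list.

    Features missing from the record become 0; features that are not in
    the learned list are dropped, so the feature set is never altered.
    """
    return np.array([record.get(name, 0) for name in features], dtype=int)


if __name__ == "__main__":
    training = [
        parse_categorical("C0015230,C0016436"),
        parse_categorical("C0016436"),
    ]
    features = learn_features(training)
    print(features)
    # 'C9999999' was not seen when the features were learned, so it is ignored.
    print(to_vector(parse_categorical("C0016436,C9999999"), features))
    print(parse_continuous("0.515,1.690,0.996"))
```

Keeping the learned feature list separate from the projection step mirrors the fit/predict split described above: new inputs are always expressed in the columns learned at fit time, whatever additional features they contain.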

Classes

Preprocess

Preprocess class.