chemicalchecker.util.sanitize.sanitizer

Simple sanitization of input matrices.

Garbage in, garbage out: before data can be used, it needs to be cleaned. This includes:

  • Removing rarely occurring features (columns)

  • Removing molecules (rows) with too little data

  • Handling missing data (NaNs or infs)

  • Trimming less informative features if too many are provided
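The cleaning steps above could be combined roughly as follows. This is a minimal NumPy sketch, not the Sanitizer class or its API. The function name `sanitize_matrix`, its parameters, zero-filling of NaN/inf values, counting non-zero entries as "occurrences", and variance-based trimming are all hypothetical choices made for illustration.

```python
import numpy as np


def sanitize_matrix(X, min_feature_count=2, min_row_count=1, max_features=None):
    """Hypothetical sketch of the cleaning steps; NOT the Sanitizer API.

    Assumes a non-zero entry means "feature present" and that missing values
    can be treated as absent (0). Returns the cleaned matrix and the indices
    of the kept rows and columns in the original matrix.
    """
    X = np.asarray(X, dtype=float)
    # Missing data: replace NaN and +/-inf with 0 ("feature absent").
    X = np.where(np.isfinite(X), X, 0.0)
    # Rare features: keep columns non-zero in at least min_feature_count rows.
    keep_cols = np.flatnonzero(np.count_nonzero(X, axis=0) >= min_feature_count)
    X = X[:, keep_cols]
    # Too many features: keep the max_features columns with highest variance,
    # preserving their original column order.
    if max_features is not None and X.shape[1] > max_features:
        top = np.sort(np.argsort(X.var(axis=0))[::-1][:max_features])
        keep_cols = keep_cols[top]
        X = X[:, top]
    # Sparse molecules: keep rows with at least min_row_count non-zero features.
    keep_rows = np.flatnonzero(np.count_nonzero(X, axis=1) >= min_row_count)
    X = X[keep_rows]
    return X, keep_rows, keep_cols


# Usage: a toy 4x4 matrix with one NaN, one inf, a rare column and an empty row.
X = np.array([
    [1.0, 0.0, np.nan, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [np.inf, 0.0, 0.0, 0.0],
    [1.0, 5.0, 1.0, 1.0],
])
clean, rows, cols = sanitize_matrix(X, min_feature_count=2, min_row_count=1)
print(rows.tolist())   # [0, 1, 3]  (row 2 is empty once inf becomes 0)
print(cols.tolist())   # [0, 2, 3]  (column 1 occurs in only one row)
print(clean.tolist())  # [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [1.0, 1.0, 1.0]]
```

Filtering rows last means no molecule is left without features after the column filters. The actual Sanitizer may apply these steps in a different order or with different criteria.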

Classes

Sanitizer

Sanitizer class.