Chemical Checker Manifesto

In Ersilia, to establish the relationships that sustain the city’s life, the inhabitants stretch strings from the corners of the houses, white or black or gray or black-and-white according to whether they mark a relationship of blood, of trade, authority, agency. When the strings become so numerous that you can no longer pass among them, the inhabitants leave: the houses are dismantled; only the strings and their supports remain.

Invisible Cities, Italo Calvino

Chemistry first

Chemical matter has been largely overlooked in the omics era in favor of sequence-based molecules such as DNA and proteins. It is time to champion small molecules and bring them back to the forefront of biomedical research.

The small-molecule similarity principle simply makes sense

Similar molecules (tend to) have similar bioactivities. This principle has been validated repeatedly, and it is the cornerstone of the Chemical Checker (CC).

All molecules are equally interesting

There is an acute research bias towards popular compounds such as approved drugs, yet chemical space is far larger than this. Understudied molecules need to enter computational drug discovery pipelines.

Infinite data, finite types of data

The CC is restricted to a 5x5 organization of data types, and it will stay that way. Data are constantly released to the public domain, and it is our duty to keep up with the work of others. Despite the rather fixed organization of the CC, there is room for an unlimited number of datasets. A dataset is data of a certain type, belonging to one (or more) particular resources, and processed in a well-defined way.

Others are better curators than us

The SB&NB is not a data curation group, and the CC is not an integrative database in the strict sense of the word. We trust the work of others.

It is time to go predictive

The omics revolution has produced an overwhelming number of molecule/phenotype correlations but, more often than not, these have not been useful for making prospective predictions. In the CC, we sacrifice precise interpretation in favor of more abstract representations of the data (signatures), which are easy to plug into machine learning algorithms.

We don’t need another chemoinformatics library

There are tons of chemoinformatics libraries out there, and some of them are truly awesome. The CC is mostly about generating biological signatures, not about reading the many small-molecule file formats or rendering images of molecules.