chemicalchecker.util.sampler.triplets.TripletSampler

class TripletSampler(cc, sign0, max_sampled_keys=10000, save=True)[source]

Bases: object

TripletSampler class.

Initialize a TripletSampler instance.

Methods

choice

Choose from a list of candidates

map_triplets_to_reference

Map triplets from full to reference indices

sample

Sample triplets from multiple exemplary datasets of the CC.

sample_triplets

Sample triplets from multiple exemplary datasets of the CC.

sample_triplets_from_dataset

save_triplets

Save triplets

choice(row, fmref_idxs, num_samp)[source]

Choose from a list of candidates

map_triplets_to_reference(triplets)[source]

Map triplets from full to reference indices
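The index remapping described above can be pictured with a small NumPy sketch. This is only an illustration and not the Chemical Checker implementation: the function name `map_to_reference_sketch` and the lookup array `full_to_ref` are hypothetical, and it assumes each full-set row has exactly one reference index.

```python
import numpy as np


def map_to_reference_sketch(triplets, full_to_ref):
    """Illustrative only: re-express full-set indices as reference-set indices.

    full_to_ref[i] is the reference index assigned to full-set row i
    (a hypothetical lookup array, not an attribute of TripletSampler).
    """
    triplets = np.asarray(triplets)
    full_to_ref = np.asarray(full_to_ref)
    # Fancy indexing maps every entry and preserves the (n, 3) shape.
    return full_to_ref[triplets]


# Toy data: full rows 0 and 1 share reference 0, rows 3 and 4 share reference 2.
full_to_ref = np.array([0, 0, 1, 2, 2])
triplets = np.array([[0, 1, 3], [4, 3, 2]])
mapped = map_to_reference_sketch(triplets, full_to_ref)
print(mapped)  # rows: [0 0 2] and [2 2 1]
```

Note that after such a mapping an anchor and its positive can collapse onto the same reference index, as in the first row here; whether and how the real method handles that case is not stated in this page.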

sample(datasets=None, num_triplets=1000000, p_pos=0.001, p_neg=0.1, min_pos=10, max_pos=100, max_neg=1000, max_rounds=3, **kwargs)[source]

Sample triplets from multiple exemplary datasets of the CC.

Parameters:
  • datasets (list) – Datasets to be used for the triplet sampling. If none are specified, all exemplary datasets are used (default=None).

  • num_triplets (int) – Number of triplets to sample (default=1000000).

  • p_pos (float) – P-value for positive cases (default=0.001).

  • p_neg (float) – P-value for negative cases. To provide ‘hard’ cases, a relatively low p-value is recommended (default=0.1).

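  • min_pos (int) – Minimum number of neighbors considered to be positives (default=10).

  • max_pos (int) – Maximum number of neighbors considered to be positives (default=100).

  • max_neg (int) – Maximum number of neighbors considered to be negatives (default=1000).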

  • max_rounds (int) – Triplets may be sampled redundantly. Maximum number of sampling rounds to attempt before giving up on reaching num_triplets (default=3).
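To illustrate how neighbor cutoffs and max_rounds can interact, here is a minimal, self-contained sketch. It is not the Chemical Checker code: it replaces the p-value thresholds (p_pos, p_neg) with plain rank cutoffs, and the names `sample_triplets_sketch`, `neighbors`, `n_pos` and `n_neg` are assumptions made for this example. Redundant draws are discarded, and sampling is retried for at most max_rounds rounds.

```python
import numpy as np


def sample_triplets_sketch(neighbors, num_triplets, n_pos, n_neg,
                           max_rounds=3, seed=0):
    """Illustrative only: draw unique (anchor, positive, negative) triplets.

    neighbors: int array (n_anchors, k); each row lists neighbor indices
               sorted by increasing distance (hypothetical input).
    n_pos:     the first n_pos neighbors of an anchor count as positives.
    n_neg:     neighbors ranked n_pos .. n_neg-1 count as negatives.
    """
    rng = np.random.default_rng(seed)
    n_anchors, k = neighbors.shape
    if not 0 < n_pos < n_neg <= k:
        raise ValueError("need 0 < n_pos < n_neg <= k")
    unique = set()
    for _ in range(max_rounds):
        missing = num_triplets - len(unique)
        if missing <= 0:
            break
        # Draw only as many candidates as are still missing.
        anchors = rng.integers(0, n_anchors, size=missing)
        pos = neighbors[anchors, rng.integers(0, n_pos, size=missing)]
        neg = neighbors[anchors, rng.integers(n_pos, n_neg, size=missing)]
        # The set silently drops triplets that were already sampled.
        unique.update(zip(anchors.tolist(), pos.tolist(), neg.tolist()))
    return np.array(sorted(unique), dtype=int).reshape(-1, 3)


# Toy neighbor table: 50 anchors with 20 ranked neighbors each.
toy_rng = np.random.default_rng(1)
neighbors = np.array([toy_rng.permutation(50)[:20] for _ in range(50)])
triplets = sample_triplets_sketch(neighbors, num_triplets=200, n_pos=5, n_neg=20)
print(triplets.shape)  # at most 200 rows, always 3 columns
```

In this sketch, a lower n_neg keeps negatives closer to the anchor, which mirrors the recommendation above to use a relatively low p_neg for ‘hard’ cases. If duplicates persist after max_rounds rounds, fewer than num_triplets triplets are returned.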

sample_triplets(datasets, num_triplets, p_pos, p_neg, min_pos, max_pos, max_neg, max_rounds)[source]

Sample triplets from multiple exemplary datasets of the CC.

save_triplets(triplets, fn)[source]

Save triplets