chemicalchecker.util.splitter.neighbortriplet.OldTripletSampler

class OldTripletSampler(*args, **kwargs)

Bases: BaseTripletSampler

Formerly the monstrous NeighborTripletTraintest. Performs well on large spaces, less well on smaller ones.

Methods

  • generate_triplets – Sample triplets using an approach suited for large spaces.

  • get_split_indeces – Get random indexes for different splits.

  • save_triplets – Save sampled triplets to file.

generate_triplets(f_per=0.1, t_per=0.01, mean_center_x=True, shuffle=True, check_distances=True, split_names=['train', 'test'], split_fractions=[0.8, 0.2], suffix='eval', x_dtype=<class 'numpy.float32'>, y_dtype=<class 'numpy.float32'>, num_triplets=1000000.0, limit=100000, cpu=1)

Sample triplets using an approach suited for large spaces.

Parameters:

num_triplets (int) – Total number of triplets to generate.

get_split_indeces(rows, fractions)

Get random indexes for different splits.
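The page does not describe how the indexes are drawn, so the following is only a minimal sketch of the general idea: shuffle the row indexes, then cut them according to the fractions. The function name split_indices, the seed argument, and the rounding rule are assumptions made for illustration, not the library's implementation. The sketch uses only the standard library, whereas the real method presumably works on NumPy arrays.

```python
import random


def split_indices(rows, fractions, seed=None):
    """Hypothetical stand-in: shuffle range(rows) and cut it by `fractions`."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    idxs = list(range(rows))
    rng.shuffle(idxs)

    splits = []
    start = 0
    cumulative = 0.0
    for i, frac in enumerate(fractions):
        cumulative += frac
        # The last split takes whatever remains, so no row is lost to rounding.
        is_last = i == len(fractions) - 1
        end = rows if is_last else int(round(cumulative * rows))
        splits.append(idxs[start:end])
        start = end
    return splits


# Ten rows divided 80/20, mirroring the default split_fractions=[0.8, 0.2].
train_idx, test_idx = split_indices(10, [0.8, 0.2], seed=0)
```

Each row index lands in exactly one split, so the splits are disjoint and together cover all rows.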

save_triplets(triplets, mean_center_x=True, shuffle=True, split_names=['train', 'test'], split_fractions=[0.8, 0.2], suffix='eval', cpu=1, x_dtype=<class 'numpy.float32'>, y_dtype=<class 'numpy.float32'>)

Save sampled triplets to file.

Saves the triplets, performing the train/test split, shuffling, and normalization.

Parameters:
  • triplets (array) – Indexes of anchor, positive and negative for each triplet.

  • mean_center_x (bool) – Normalize data column-wise.

  • shuffle (bool) – Shuffle the order of the triplets.

  • split_names (list of str) – Names of the splits.

  • split_fractions (list of float) – Fraction of the triplets assigned to each split.

  • suffix (str) – Suffix of the generated scaler.
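The three steps named above (normalization, shuffling, and splitting) can be illustrated with a small sketch. It is not the library's code. The function prepare_triplets, the feature matrix X, and the seed argument are hypothetical, and the sketch assumes that normalization means subtracting each column's mean computed over the whole matrix. It returns the splits in memory rather than writing a file, and it omits the scaler, the dtypes, and the cpu option.

```python
import random


def column_means(X):
    """Mean of each column of a list-of-rows matrix."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def prepare_triplets(triplets, X, split_names=("train", "test"),
                     split_fractions=(0.8, 0.2), mean_center_x=True,
                     shuffle=True, seed=None):
    """Hypothetical sketch: center X, shuffle the triplets, split by fraction."""
    if mean_center_x:
        # Assumption: center every column on its mean over all rows.
        means = column_means(X)
        X = [[v - m for v, m in zip(row, means)] for row in X]

    triplets = list(triplets)
    if shuffle:
        random.Random(seed).shuffle(triplets)

    out = {}
    n = len(triplets)
    start = 0
    cumulative = 0.0
    for i, (name, frac) in enumerate(zip(split_names, split_fractions)):
        cumulative += frac
        is_last = i == len(split_names) - 1
        end = n if is_last else int(round(cumulative * n))
        # Replace each (anchor, positive, negative) index by its feature row.
        out[name] = [tuple(X[j] for j in t) for t in triplets[start:end]]
        start = end
    return out


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
triplets = [(0, 1, 3), (1, 0, 2), (2, 3, 0), (3, 2, 1), (0, 2, 3)]
splits = prepare_triplets(triplets, X, seed=0)
ordered = prepare_triplets(triplets, X, shuffle=False)
```

With five triplets and the default fractions, four go to "train" and one to "test".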