chemicalchecker.util.splitter.neighbortriplet.OldTripletSampler
- class OldTripletSampler(*args, **kwargs)
Bases: BaseTripletSampler
Formerly the monolithic NeighborTripletTraintest. Performs well on large spaces, less well on smaller ones.
Methods
generate_triplets – Sample triplets using an approach suited for large spaces.
get_split_indeces – Get random indexes for different splits.
save_triplets – Save sampled triplets to file.
- generate_triplets(f_per=0.1, t_per=0.01, mean_center_x=True, shuffle=True, check_distances=True, split_names=['train', 'test'], split_fractions=[0.8, 0.2], suffix='eval', x_dtype=<class 'numpy.float32'>, y_dtype=<class 'numpy.float32'>, num_triplets=1000000.0, limit=100000, cpu=1)
Sample triplets using an approach suited for large spaces.
- Parameters:
num_triplets (int) – Total number of triplets to generate.
- get_split_indeces(rows, fractions)
Get random indexes for different splits.
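The idea of drawing random indexes for each split can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `split_indices` and its `seed` argument are hypothetical, and it assumes that `rows` is the number of rows to partition and that `fractions` sum to 1.

```python
import numpy as np


def split_indices(rows, fractions, seed=None):
    """Hypothetical helper: randomly partition range(rows) by fractions."""
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    rng = np.random.default_rng(seed)
    # Random ordering of all row indexes.
    shuffled = rng.permutation(rows)
    # Cut points between consecutive splits, rounded to whole rows.
    bounds = np.round(np.cumsum(fractions)[:-1] * rows).astype(int)
    # One index array per split, in the order of `fractions`.
    return np.split(shuffled, bounds)


train_idx, test_idx = split_indices(10, [0.8, 0.2], seed=0)
```

Every row index lands in exactly one split, so the splits are disjoint and together cover all rows.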
- save_triplets(triplets, mean_center_x=True, shuffle=True, split_names=['train', 'test'], split_fractions=[0.8, 0.2], suffix='eval', cpu=1, x_dtype=<class 'numpy.float32'>, y_dtype=<class 'numpy.float32'>)
Save sampled triplets to file.
This function saves the triplets after performing the train/test split, shuffling, and normalization.
- Parameters:
triplets (array) – Indexes of anchor, positive and negative for each triplet.
mean_center_x (bool) – Normalize data column-wise.
shuffle (bool) – Shuffle the order of the triplets.
split_names (list of str) – Names of the splits.
split_fractions (list of float) – Fraction of each split.
suffix (str) – Suffix of the generated scaler.
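The three preprocessing steps named above (normalization, shuffling, train/test split) can be sketched in NumPy as below. This is an illustration under stated assumptions, not the method's actual code: the function `prepare_triplets`, the feature matrix `X`, and the `seed` argument are hypothetical. It also assumes that `mean_center_x` subtracts the per-column mean and that the split is applied to the triplet rows. Writing to file and the scaler are omitted.

```python
import numpy as np


def prepare_triplets(X, triplets, mean_center_x=True, shuffle=True,
                     split_fractions=(0.8, 0.2), seed=0):
    """Hypothetical sketch: center X, shuffle triplets, split them."""
    rng = np.random.default_rng(seed)
    triplets = np.asarray(triplets)
    if mean_center_x:
        # Assumption: column-wise mean centering of the feature matrix.
        X = X - X.mean(axis=0)
    if shuffle:
        # Reorder the triplet rows (anchor, positive, negative) at random.
        triplets = triplets[rng.permutation(len(triplets))]
    # Cut points between splits, rounded to whole triplets.
    bounds = np.round(
        np.cumsum(split_fractions)[:-1] * len(triplets)).astype(int)
    return X, np.split(triplets, bounds)


# Toy data: 4 samples with 3 features, 10 triplets of row indexes.
X = np.arange(12, dtype=float).reshape(4, 3)
triplets = np.arange(30).reshape(10, 3) % 4
X_centered, (train, test) = prepare_triplets(X, triplets)
```

With the default `split_fractions`, the 10 toy triplets are divided into 8 for the first split and 2 for the second.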