chemicalchecker.util.remove_near_duplicates.remove_near_duplicates.RNDuplicates
- class RNDuplicates(nbits=128, only_duplicates=False, cpu=1)[source]
Bases: object
RNDuplicates class.
Initialize an RNDuplicates instance.
- Parameters:
nbits (int) – Number of bits to use to quantize.
only_duplicates (boolean) – Remove only exact duplicates.
cpu (int) – Number of cores to use.
Methods
Remove redundancy from data.
Save non-redundant data.
- remove(data, keys=None, save_dest=None, just_mappings=False)[source]
Remove redundancy from data.
- Parameters:
data (array) – The data to remove duplicates from. It can be a numpy array or a path to an HDF5 file with a dataset named V.
keys (array) – Array of keys for the input data. If None, keys are taken from the HDF5 dataset named keys.
save_dest (str) – Path of the file in which to save the result, if it needs to be saved. (default: None)
just_mappings (bool) – Just return the mappings. Only applies if save_dest is None. (default: False)
- Returns:
keys (array), data (array), mappings (dictionary)
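Example
The following is a minimal sketch of the kind of result remove describes, restricted to exact duplicates (the only_duplicates=True case). It is not the library's implementation: the helper drop_exact_duplicates is hypothetical, it ignores nbits and quantization entirely, and the meaning assumed for mappings (each input key points to the key of the row that was kept) is an assumption, since the reference above does not spell it out.

```python
import numpy as np


def drop_exact_duplicates(data, keys):
    """Hypothetical helper, NOT part of chemicalchecker.

    Keeps the first occurrence of every distinct row and returns
    (keys, data, mappings), mirroring the documented return order.
    """
    data = np.asarray(data)
    key_list = np.asarray(keys).tolist()  # plain Python objects as dict keys
    if len(data) != len(key_list):
        raise ValueError("data and keys must have the same length")

    first_seen = {}  # row content (bytes) -> key of its first occurrence
    mappings = {}    # every input key -> key of the retained row (assumed meaning)
    keep = []        # indices of the rows that are retained

    for i, row in enumerate(data):
        signature = row.tobytes()
        if signature not in first_seen:
            first_seen[signature] = key_list[i]
            keep.append(i)
        mappings[key_list[i]] = first_seen[signature]

    kept_keys = np.asarray(key_list)[keep]
    return kept_keys, data[keep], mappings


# Toy input: rows A/C are identical, and so are rows B/D.
data = np.array(
    [[1, 0, 1],
     [0, 1, 1],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]],
    dtype=np.int8,
)
keys = np.array(["A", "B", "C", "D", "E"])

kept_keys, kept_data, mappings = drop_exact_duplicates(data, keys)
print(kept_keys.tolist())
print(mappings)
```

Going by the signatures documented above, the corresponding call on the real class would presumably look like RNDuplicates(only_duplicates=True).remove(data, keys=keys), with save_dest left as None so that the result is returned rather than written to a file. Check the return values against the source before relying on this.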