chemicalchecker.util.splitter.ae_siam_traintest.AE_SiameseTraintest

class AE_SiameseTraintest(hdf5_file, split, replace_nan=None)[source]

Bases: object

AE_SiameseTraintest class.

Initialize an AE_SiameseTraintest instance.

We assume the file contains different splits, e.g. “x_train”, “y_train”, “x_test”, …

Methods

close

Close the HDF5.

generator_fn

Return the generator function that we can query for batches.

get_split_indeces

Get random indices for different splits.

get_split_names

Return the name of the splits.

get_sw

Get a batch of sample weights.

get_x

Get a batch of X.

get_x_shapes

Return the shapes of X.

get_xy

Get a batch of X and Y.

get_xy_shapes

Return the shapes of X and Y.

open

Open the HDF5.

split_h5_blocks

Create the HDF5 file with validation splits from an input file.

Attributes

sw_name_right

available_splits = self.get_split_names()
if split not in available_splits:
    raise Exception("Split '%s' not found in %s!" % (split, str(available_splits)))

close()[source]

Close the HDF5.

static generator_fn(file_name, split, batch_size=None, only_x=False, sample_weights=False, shuffle=True, return_on_epoch=False)[source]

Return the generator function that we can query for batches.
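As a rough illustration of the "generator function" pattern, the sketch below returns a callable that yields batches sliced by `beg_idx` and `end_idx`. It works on in-memory lists rather than an HDF5 file, covers a single epoch only, and the name `sketch_generator_fn` and the `seed` argument are hypothetical; the real `generator_fn` may behave differently (for example with `sample_weights` or `return_on_epoch`).

```python
import random


def sketch_generator_fn(x, y, batch_size, only_x=False, shuffle=True, seed=None):
    """Hypothetical stand-in: return a function that yields one epoch of batches."""
    rng = random.Random(seed)

    def generator():
        n = len(x)
        # Batch start offsets; whole batches (not single rows) are shuffled.
        starts = list(range(0, n, batch_size))
        if shuffle:
            rng.shuffle(starts)
        for beg_idx in starts:
            end_idx = min(beg_idx + batch_size, n)
            if only_x:
                yield x[beg_idx:end_idx]
            else:
                yield x[beg_idx:end_idx], y[beg_idx:end_idx]

    return generator


x = [[i, i + 1] for i in range(5)]
y = [i % 2 for i in range(5)]
gen = sketch_generator_fn(x, y, batch_size=2, shuffle=False)
batches = list(gen())
print(len(batches))
```

Returning a function instead of a generator object lets the caller restart iteration for each epoch by calling it again.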

static get_split_indeces(rows, fractions)[source]

Get random indices for different splits.
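One way to picture splitting row indices by fractions is to shuffle `range(rows)` and cut it into consecutive chunks. The sketch below does that with the standard library only; the helper name `sketch_split_indices`, the `seed` argument, and the rounding rule are assumptions made for illustration, not the behavior of `get_split_indeces` itself.

```python
import random


def sketch_split_indices(rows, fractions, seed=None):
    """Hypothetical helper: shuffle row indices and cut them by fraction."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    idxs = list(range(rows))
    rng.shuffle(idxs)
    splits = []
    start = 0
    cumulative = 0.0
    for i, frac in enumerate(fractions):
        cumulative += frac
        # The last split takes every remaining row to avoid rounding gaps.
        end = rows if i == len(fractions) - 1 else int(round(cumulative * rows))
        # Sorted indices are friendlier for sequential reads from a file.
        splits.append(sorted(idxs[start:end]))
        start = end
    return splits


train, test, validation = sketch_split_indices(10, [0.8, 0.1, 0.1], seed=0)
print(len(train), len(test), len(validation))
```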

get_split_names()[source]

Return the name of the splits.
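Given the naming convention described for the class (“x_train”, “y_train”, “x_test”, …), split names could plausibly be derived from the dataset keys, and the requested split validated against them. The sketch below is an assumption about that logic: `sketch_split_names` and `check_split` are hypothetical helpers operating on a plain list of key names, not on an open HDF5 file.

```python
def sketch_split_names(dataset_names):
    """Hypothetical: derive split names from keys such as 'x_train'."""
    # Keep the part after the first underscore: 'x_train' -> 'train'.
    names = {name.split("_", 1)[1] for name in dataset_names if "_" in name}
    return sorted(names)


def check_split(split, dataset_names):
    """Raise if the requested split is not among the available ones."""
    available_splits = sketch_split_names(dataset_names)
    if split not in available_splits:
        raise Exception("Split '%s' not found in %s!" %
                        (split, str(available_splits)))
    return split


keys = ["x_train", "y_train", "x_test", "y_test"]
print(sketch_split_names(keys))
```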

get_sw(beg_idx, end_idx)[source]

Get a batch of sample weights.

get_x(beg_idx, end_idx)[source]

Get a batch of X.

get_x_shapes()[source]

Return the shapes of X.

get_xy(beg_idx, end_idx, shuffle)[source]

Get a batch of X and Y.

get_xy_shapes()[source]

Return the shapes of X and Y.

open()[source]

Open the HDF5.

static split_h5_blocks(in_file, out_file, split_names=['train', 'test', 'validation'], split_fractions=[0.8, 0.1, 0.1], block_size=1000, input_datasets=None)[source]

Create the HDF5 file with validation splits from an input file.

Parameters:
  • in_file (str) – path of the h5 file to read from.

  • out_file (str) – path of the h5 file to write.

  • split_names (list(str)) – names for the split of data.

  • split_fractions (list(float)) – fraction of data in each split.

  • block_size (int) – size of the block to be used.

  • input_datasets (list) – only split the given datasets and ignore others.
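To make the role of `block_size` and `split_fractions` more concrete, the sketch below assigns row indices to splits one block at a time, so that only one block ever needs to be held in memory. It is an assumption about how block-wise splitting can work, not the implementation of `split_h5_blocks`: `sketch_block_split` and its `seed` argument are hypothetical, and no HDF5 file is read or written.

```python
import random


def sketch_block_split(n_rows, split_names, split_fractions, block_size, seed=None):
    """Hypothetical sketch: assign rows to splits one block at a time."""
    rng = random.Random(seed)
    assignment = {name: [] for name in split_names}
    for beg in range(0, n_rows, block_size):
        end = min(beg + block_size, n_rows)
        block = list(range(beg, end))
        rng.shuffle(block)
        start = 0
        cumulative = 0.0
        for i, (name, frac) in enumerate(zip(split_names, split_fractions)):
            cumulative += frac
            last = i == len(split_names) - 1
            # The last split absorbs whatever rounding left over in this block.
            stop = len(block) if last else int(round(cumulative * len(block)))
            assignment[name].extend(sorted(block[start:stop]))
            start = stop
    return assignment


splits = sketch_block_split(
    20, ["train", "test", "validation"], [0.8, 0.1, 0.1], block_size=10, seed=1
)
print({name: len(rows) for name, rows in splits.items()})
```

Because fractions are applied per block, small or partial blocks can leave a split with no rows from that block, which is one reason to keep `block_size` well above the number of splits.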

sw_name_right

available_splits = self.get_split_names()
if split not in available_splits:
    raise Exception("Split '%s' not found in %s!" % (split, str(available_splits)))