chemicalchecker.tool.node2vec.node2vec.Node2Vec

class Node2Vec(executable='node2vec', **kwargs)

Bases: object

Wrapper to run SNAP’s node2vec.

On instantiation, checks that the executable can be found.

Methods

  • emb_to_h5 – Convert from native node2vec format to familiar h5.

  • heuristic_max_degree – Return the maximum node degree that keeps node2vec within the memory budget.

  • merge_edgelists – Merge several edgelist files into a single edgelist.

  • run – Call the external executable with the given parameters.

  • split_edgelist – Split a graph into train and test sets.

  • to_edgelist – Convert nearest neighbors to an edgelist.

emb_to_h5(keys, in_file, out_file)

Convert from native node2vec format to familiar h5.

The node ids taken from sign1 are mapped back to InChIKeys, and the result is sorted.
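The mapping and sorting step can be sketched as follows. This is a minimal illustration, not the wrapper's code: it assumes the native file uses the word2vec-style text layout (a header line with node count and dimensions, then one `node_id x1 ... xd` line per node) and that `keys[node_id]` gives the InChIKey of each node. The function name `read_node2vec_embedding` is hypothetical.

```python
def read_node2vec_embedding(lines, keys):
    """Parse node2vec text output and return (sorted_keys, rows).

    lines: iterable of text lines; a "n_nodes n_dims" header followed by
           one "node_id x1 ... xd" line per node (assumed layout).
    keys:  sequence where keys[node_id] is the identifier of that node
           (an assumption about how ids map back to keys).
    """
    it = iter(lines)
    n_nodes, n_dims = (int(tok) for tok in next(it).split())
    by_key = {}
    for line in it:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        vector = [float(tok) for tok in fields[1:]]
        if len(vector) != n_dims:
            raise ValueError("unexpected vector length for node " + fields[0])
        by_key[keys[int(fields[0])]] = vector
    if len(by_key) != n_nodes:
        raise ValueError("header announces a different number of nodes")
    # Sort rows by key so the output order no longer depends on node2vec.
    sorted_keys = sorted(by_key)
    return sorted_keys, [by_key[key] for key in sorted_keys]
```

The sorted keys and rows could then be stored as HDF5 datasets, for example with h5py; the dataset layout is left open here because this page does not describe it.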

static heuristic_max_degree(nr_nodes, max_mbytes=6000.0)

Return the maximum node degree that keeps node2vec within the memory budget.

The limiting factor is node2vec’s memory footprint: to precompute transition probabilities, node2vec needs 12 bytes for each node-neighbor-neighbor triplet.

Parameters:
  • nr_nodes (int) – Number of nodes the network will contain.

  • max_mbytes (float) – Maximum RAM consumption to allow (in MBs).
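The arithmetic behind such a heuristic can be sketched as below. This is one plausible reading of the description above, not necessarily the method's exact formula: it assumes every node has the same degree d, so the graph holds roughly nr_nodes * d**2 triplets, and that 1 MB equals 10**6 bytes. The name `sketch_max_degree` and the `bytes_per_triplet` argument are hypothetical.

```python
import math


def sketch_max_degree(nr_nodes, max_mbytes=6000.0, bytes_per_triplet=12):
    """Largest uniform degree whose triplet table fits in max_mbytes.

    Assumptions (not confirmed by the source): all nodes share the same
    degree d, giving about nr_nodes * d**2 triplets, and 1 MB = 10**6 bytes.
    """
    if nr_nodes <= 0:
        raise ValueError("nr_nodes must be positive")
    budget_bytes = max_mbytes * 1_000_000
    # Solve bytes_per_triplet * nr_nodes * d**2 <= budget_bytes for d.
    return int(math.floor(math.sqrt(budget_bytes / (bytes_per_triplet * nr_nodes))))
```

Under these assumptions, a network of one million nodes with the default 6000 MB budget allows a degree of about 22, since 12 * 10**6 * 22**2 bytes is just under 6 * 10**9 bytes.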

merge_edgelists(signs, edgefiles, out_file, **params)

Merge several edgelist files into a single edgelist.

Parameters:
  • signs (list) – List of signature objects.

  • edgefiles (list) – List of edge files.

  • out_file (str) – Destination file.

  • params (dict) – Parameters defining similarity network.

run(i, o, **kwargs)

Call external exe with given parameters.

Parameters:
  • i – Input graph path (default: 'graph/karate.edgelist').

  • o – Output path for the embeddings (default: 'emb/karate.emb').

  • d – Number of dimensions (default: 128).

  • l – Length of walk per source (default: 80).

  • r – Number of walks per source (default: 10).

  • k – Context size for optimization (default: 10).

  • e – Number of epochs in SGD (default: 1).

  • p – Return hyperparameter (default: 1).

  • q – In-out hyperparameter (default: 1).

  • v – Verbose output.

  • dr – Graph is directed.

  • w – Graph is weighted.

  • ow – Output random walks instead of embeddings.
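To make the parameter list concrete, here is a rough sketch of how such options could be turned into a command line. SNAP's node2vec takes options in `-name:value` form and boolean switches as bare flags; how this wrapper actually assembles its call is not shown on this page, so the helper `build_node2vec_args` is purely illustrative.

```python
def build_node2vec_args(executable, i, o, **kwargs):
    """Assemble an argument list in SNAP's "-name:value" style (sketch)."""
    # Boolean switches are passed without a value.
    flags = {"v", "dr", "w", "ow"}
    args = [executable, f"-i:{i}", f"-o:{o}"]
    for name, value in kwargs.items():
        if name in flags:
            if value:
                args.append(f"-{name}")
        else:
            args.append(f"-{name}:{value}")
    return args


cmd = build_node2vec_args(
    "node2vec",
    "graph/karate.edgelist",
    "emb/karate.emb",
    d=128,
    p=0.5,
    w=True,
    dr=False,
)
```

A list built this way could be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file paths.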

split_edgelist(full_graph, train, test, train_fraction=0.8)

Split a graph into train and test sets.

Given a Network object, split it into two sets, train and test, so that train holds train_fraction of the edges of each node.
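The per-node split can be illustrated with plain tuples instead of a Network object. This sketch assumes edges are `(source, target, weight)` tuples grouped by their source node and rounds the training share up; the real method may group, round, or sample differently. The function name `split_edges_per_node` and the `seed` argument are hypothetical.

```python
import math
import random
from collections import defaultdict


def split_edges_per_node(edges, train_fraction=0.8, seed=0):
    """Split (source, target, weight) edges so that each source node keeps
    about train_fraction of its edges in the training set (sketch)."""
    by_source = defaultdict(list)
    for edge in edges:
        by_source[edge[0]].append(edge)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for source in sorted(by_source):
        node_edges = by_source[source]
        rng.shuffle(node_edges)
        # Round up so every node keeps at least one training edge.
        n_train = math.ceil(train_fraction * len(node_edges))
        train.extend(node_edges[:n_train])
        test.extend(node_edges[n_train:])
    return train, test
```

Splitting per node rather than over the whole edge set keeps every node represented in the training graph, which matters because node2vec can only embed nodes it sees during the walks.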

to_edgelist(sign1, neig1, out_file, **params)

Convert Nearest-neighbor to an edgelist.

Parameters:
  • sign1 (str) – Signature 1, h5 file path.

  • neig1 (str) – Nearest neighbors 1, h5 file path.

  • out_file (str) – Destination file.

  • params (dict) – Parameters defining similarity network.