chemicalchecker.tool.node2vec.node2vec.Node2Vec
- class Node2Vec(executable='node2vec', **kwargs)
Bases: object
Wrapper to run SNAP’s node2vec.
Check if executable is found.
Methods
emb_to_h5 – Convert from native node2vec format to familiar h5.
heuristic_max_degree – Return the maximum degree.
merge_edgelists – Convert Nearest-neighbor to an edgelist.
run – Call external exe with given parameters.
Split a graph in train and test.
Convert Nearest-neighbor to an edgelist.
- emb_to_h5(keys, in_file, out_file)
Convert from native node2vec format to familiar h5.
We need to map back from sign1 ids to inchikeys and sort.
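The mapping step described above can be sketched roughly as follows. This is not the library's implementation: the helper name `parse_node2vec_embeddings` is hypothetical, the sketch assumes the native file is a text file with a "count dimensions" header followed by one "id value value ..." row per node, and it assumes each id is a zero-based position in `keys`. Writing the result to an h5 file is left out.

```python
def parse_node2vec_embeddings(lines, keys):
    """Hypothetical helper: map embedding rows back to keys, sorted by key.

    Assumes `lines` starts with a "<nr_nodes> <dims>" header and that each
    following row is "<node_id> <v1> ... <vD>", where node_id is a
    zero-based index into `keys` (an assumption, not confirmed by the docs).
    """
    rows = iter(lines)
    _nr_nodes, dims = (int(x) for x in next(rows).split())
    by_key = {}
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        node_id = int(fields[0])
        vector = [float(x) for x in fields[1:]]
        if len(vector) != dims:
            raise ValueError("row for node %d has %d values, expected %d"
                             % (node_id, len(vector), dims))
        by_key[keys[node_id]] = vector
    sorted_keys = sorted(by_key)
    return sorted_keys, [by_key[k] for k in sorted_keys]


# Usage with made-up placeholder keys.
example_lines = ["2 3", "1 0.1 0.2 0.3", "0 1.0 2.0 3.0"]
example_keys = ["KEY_B", "KEY_A"]
out_keys, out_vectors = parse_node2vec_embeddings(example_lines, example_keys)
```

Sorting by key after the mapping keeps the rows of the output aligned with a sorted key list, which is what the description above asks for.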
- static heuristic_max_degree(nr_nodes, max_mbytes=6000.0)
Return the maximum degree.
The heuristic is driven by node2vec’s memory footprint, which is the limiting factor. To precompute transition probabilities, node2vec needs 12 bytes for each node-neighbor-neighbor triplet.
- Parameters:
nr_nodes (int) – Number of nodes the network will contain.
max_mbytes (float) – Maximum RAM consumption to allow (in MBs).
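One way to read this heuristic, as a rough sketch rather than the library's actual code: if every node has the same degree d, each node contributes about d² node-neighbor-neighbor triplets, so the footprint is roughly 12 · nr_nodes · d² bytes. Solving for d under a memory budget gives the bound below. The function name `estimate_max_degree`, the uniform-degree assumption, and the 1 MB = 10⁶ bytes conversion are all assumptions made for illustration.

```python
import math

BYTES_PER_TRIPLET = 12  # taken from the description above


def estimate_max_degree(nr_nodes, max_mbytes=6000.0):
    """Hypothetical estimate of the largest uniform degree that fits in RAM.

    Assumes memory ~= 12 * nr_nodes * degree**2 bytes and 1 MB = 10**6 bytes.
    """
    if nr_nodes <= 0:
        raise ValueError("nr_nodes must be positive")
    budget_bytes = max_mbytes * 1_000_000
    return math.floor(math.sqrt(budget_bytes / (nr_nodes * BYTES_PER_TRIPLET)))


# With one million nodes and the default 6000 MB budget:
# 6e9 / (1e6 * 12) = 500, and floor(sqrt(500)) = 22.
degree_for_million_nodes = estimate_max_degree(1_000_000)
```

Because the footprint grows with the square of the degree, doubling the memory budget raises the allowed degree only by a factor of about 1.4.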
- merge_edgelists(signs, edgefiles, out_file, **params)
Convert Nearest-neighbor to an edgelist.
- Parameters:
signs (list) – List of signature objects.
edgefiles (list) – List of edge files.
out_file (str) – Destination file.
params (dict) – Parameters defining similarity network.
- run(i, o, **kwargs)
Call external exe with given parameters.
- Parameters:
i – Input graph path (default: 'graph/karate.edgelist').
o – Output graph path (default: 'emb/karate.emb').
d – Number of dimensions (default: 128).
l – Length of walk per source (default: 80).
r – Number of walks per source (default: 10).
k – Context size for optimization (default: 10).
e – Number of epochs in SGD (default: 1).
p – Return hyperparameter (default: 1).
q – Inout hyperparameter (default: 1).
v – Verbose output.
dr – Graph is directed.
w – Graph is weighted.
ow – Output random walks instead of embeddings.
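The parameters above mirror the command-line options of SNAP's node2vec executable, which takes valued options as `-name:value` and switches as bare `-name`. How this wrapper assembles the call internally is not shown here, so the following is only a sketch under that assumption; the helper name `build_node2vec_command` is hypothetical and the command is built but not executed.

```python
def build_node2vec_command(executable, i, o, **kwargs):
    """Hypothetical helper: assemble a SNAP-style node2vec argument list.

    Valued options become "-name:value"; boolean switches become "-name"
    only when set to a true value. Unknown keyword arguments are ignored.
    """
    valued_options = ("d", "l", "r", "k", "e", "p", "q")
    switches = ("v", "dr", "w", "ow")
    cmd = [executable, "-i:%s" % i, "-o:%s" % o]
    for name in valued_options:
        if name in kwargs:
            cmd.append("-%s:%s" % (name, kwargs[name]))
    for name in switches:
        if kwargs.get(name):
            cmd.append("-%s" % name)
    return cmd


# Usage: a 64-dimensional embedding of a directed, weighted graph.
example_cmd = build_node2vec_command(
    "node2vec", "graph/karate.edgelist", "emb/karate.emb",
    d=64, dr=True, w=True, v=False)
```

An argument list like this could be handed to `subprocess.run` without a shell, which avoids quoting problems in file paths.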