#network-science/graph-embedding
#network-science/random-walks
node2vec is a direct application of word2vec (skip-gram optimized with negative sampling) to random walks on a network.
There are many node2vec repositories, but they do not all produce consistent results, and some appear to have bugs. The choice of implementation does matter.
For example, there is a node2vec package on PyPI. However, this implementation often underperforms because of a bad default hyperparameter configuration. Furthermore, it is extremely slow and memory-demanding, and it does not scale to large networks. All packages except PyTorch Geometric are built on top of the word2vec implementation in gensim.
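As a rough illustration of how these gensim-based packages work, here is a minimal sketch that generates plain (unbiased, p = q = 1) random walks with networkx and feeds them to gensim's word2vec as sentences. The function name generate_walks, the example graph, and the parameter values are illustrative assumptions, not the code of any specific package; parameter names follow gensim 4.x.

```python
# Minimal sketch of a gensim-based node2vec-style pipeline (assumes networkx and gensim 4.x).
# For simplicity, the walks here are unbiased (p = q = 1); full node2vec biases them with p and q.
import random

import networkx as nx
from gensim.models import Word2Vec


def generate_walks(G, num_walks=10, walk_length=80, seed=42):
    """Generate num_walks random walks of length walk_length starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(G.nodes())
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(G.neighbors(walk[-1]))
                if not neighbors:  # dangling node: the walk stops early
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(v) for v in walk])  # gensim expects string tokens
    return walks


G = nx.karate_club_graph()  # small example graph
walks = generate_walks(G)

# Skip-gram with negative sampling (sg=1, negative>0), the setting node2vec relies on.
model = Word2Vec(
    sentences=walks,
    vector_size=64,     # embedding dimension
    window=10,          # context window length
    sg=1,               # skip-gram
    negative=5,         # number of negative samples
    ns_exponent=0.75,   # exponent of the frequency distribution for negative sampling
    batch_words=10000,  # words per training batch (batch_walk / batch_size in some wrappers)
    min_count=1,
    workers=4,
    epochs=1,
)

embedding = {v: model.wv[str(v)] for v in G.nodes()}
```

A full node2vec implementation would additionally bias each step of the walk with p and q, as described below.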
node2vec has several parameters that define the random walks on the network, together with the parameters passed on to gensim's word2vec:
- num_walks specifies the number of walkers starting from each node. Larger is better at the expense of computation time and memory. A good value ranges between 10 and 20. If the network is directed, set a larger value such as 30 or 40.
- p is inversely proportional to the probability of backtracking to the previously visited node: a walker is less likely to backtrack if p is large.
- q is inversely proportional to the probability of moving to a node that is not a neighbor of the previously visited node: a walker is less likely to wander away from the previously visited node, and more likely to stay among its neighbors, if q is large (a sketch of this biased step appears below).
- context (or window_length) defines the length of the context window. It controls the resolution of the structure preserved in the embedding, i.e., a smaller window preserves more local structure. Set window_length = 10 if you have no preference.
- batch_walk (or batch_size) is the number of data samples with which to calculate a gradient. Larger is better. Set batch_walk = 10000 if you have no preference.
- workers is the number of CPU cores used to train word2vec.
- epochs (or iter in gensim 3.x and earlier) is the number of passes over the given sentences. Larger is better, but epochs = 1 works well enough in many cases.
- ns_exponent is the exponent applied to the word frequency distribution used to generate negative samples. Set ns_exponent = 0.75 or 1 if you have no preference.

Tips:
- Spend the tuning budget on num_walks and epochs rather than on p and q [paper]: increase num_walks or epochs as much as possible. For reference, num_walks should be at least 20 for networks of 10,000 nodes; increase it further when training on a larger network.
- Increase num_walks when embedding directed networks. node2vec is known to perform poorly for directed networks. I found that this is because a random walker stops walking when it hits a dangling node, producing fewer walks with which to train word2vec.
- node2vec is closely related to matrix-factorization (spectral) embeddings in special cases, e.g., with ns_exponent = 1 [paper], when the context (or window_length) is one [paper], or when ns_exponent = 0 or the given network is a regular graph.
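For concreteness, here is a minimal sketch of how p and q bias a single step of the walk, following the second-order transition rule of the node2vec paper: stepping back to the previous node is weighted 1/p, stepping to a neighbor of the previous node is weighted 1, and stepping anywhere else is weighted 1/q. The example graph, the function name biased_step, and the unweighted-graph assumption are illustrative, not part of any particular package; the first step of a walk (which has no previous node) is omitted.

```python
# Sketch of one biased (second-order) node2vec step; assumes an unweighted networkx graph.
import random

import networkx as nx


def biased_step(G, prev, curr, p=1.0, q=1.0, rng=random):
    """Choose the next node given the previous and current nodes of the walk."""
    neighbors = list(G.neighbors(curr))
    if not neighbors:
        return None  # dangling node: the walk stops here
    weights = []
    for x in neighbors:
        if x == prev:              # backtrack to the previous node
            weights.append(1.0 / p)
        elif G.has_edge(prev, x):  # neighbor of both prev and curr: stay close
            weights.append(1.0)
        else:                      # move away from the previous node
            weights.append(1.0 / q)
    return rng.choices(neighbors, weights=weights, k=1)[0]


G = nx.karate_club_graph()
print(biased_step(G, prev=0, curr=1, p=0.5, q=2.0))
```

With a large p the backtracking weight 1/p shrinks, and with a large q the walker rarely leaves the neighborhood of the previous node, which is why the walk becomes more local.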