Missing link prediction

updated: 2023-04-01
#network-science/link-prediction

Link prediction is a task that seeks to identify relationships that have not yet been observed in a network. It can be used for a variety of purposes, including building recommendation systems, detecting fraud, and discovering knowledge. It can also be used to test theories about how networks form, with successful prediction offering insights into the processes that shape their structure.

Link prediction benchmark

A link prediction algorithm is evaluated on its ability to predict missing edges, i.e., edges that are present but not observed in a given network. The widely accepted benchmark works as follows:

An algorithm is given a training network with missing edges and must provide a prediction score for each pair of nodes $i$ and $j$. The higher the score, the more likely an edge exists between the two nodes.
Because networks are sparse, unconnected node pairs vastly outnumber missing edges. The evaluation therefore uses a subset of unconnected node pairs, sampled uniformly at random, of the same size as the set of missing edges.
An algorithm predicts links successfully if it assigns higher scores to the missing edges (green in the figure below) than to the unconnected node pairs (red).
The overlap between the score distributions of the missing edges and the unconnected node pairs is commonly quantified by the area under the curve (AUC) of the receiver operating characteristic (ROC; right panel).
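The benchmark steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: the network is undirected with nodes labeled `0..n_nodes-1`, it is sparse enough that rejection sampling quickly finds unconnected pairs, and the names `benchmark_split` and `auc` are hypothetical. The AUC is computed through its probabilistic interpretation (the chance that a randomly chosen missing edge outscores a randomly chosen unconnected pair, with ties counting one half), which equals the area under the ROC curve.

```python
import random


def benchmark_split(edges, n_nodes, test_frac=0.1, seed=0):
    """Split an undirected edge list into a training network and held-out
    "missing" edges, and sample an equal number of unconnected node pairs.

    Assumes nodes are labeled 0..n_nodes-1, there are no self-loops, and the
    network is sparse enough for rejection sampling of unconnected pairs.
    """
    rng = random.Random(seed)
    edges = sorted({(min(i, j), max(i, j)) for i, j in edges})  # deduplicate
    rng.shuffle(edges)
    n_missing = max(1, int(len(edges) * test_frac))
    missing, train = edges[:n_missing], edges[n_missing:]

    # Unconnected pairs are drawn from pairs absent from the FULL network,
    # so that held-out missing edges are never labeled as negatives.
    edge_set = set(edges)
    negatives = set()
    while len(negatives) < n_missing:
        i, j = rng.sample(range(n_nodes), 2)
        pair = (min(i, j), max(i, j))
        if pair not in edge_set:
            negatives.add(pair)
    return train, missing, sorted(negatives)


def auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a random missing edge scores higher
    than a random unconnected pair (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

To evaluate a predictor, score each pair in `missing` and `negatives` using only `train`, then pass the two score lists to `auc`. The double loop is quadratic in the sample size, which is fine for a sketch; large evaluations would typically use a rank-based formula instead.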

Link prediction algorithms

Quest for predictive structural variables

The key to successful link prediction is finding structural variables that strongly predict whether an edge exists. Perhaps the simplest variable is the edge density, which predicts that an edge appears between any pair of nodes with the same probability; this, however, is a crude prediction.
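For an undirected network with $N$ nodes and $M$ observed edges (notation assumed here), the edge-density predictor assigns every node pair the same score:

$$
s_{ij} = \rho = \frac{2M}{N(N-1)} \quad \text{for all } i \neq j.
$$

Because every missing edge and every unconnected pair receive the same score, all comparisons in the benchmark are ties, so the AUC is 0.5 when ties count as one half, no better than random guessing.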

Degree

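A more predictive variable is node degree, i.e., the number of neighbors of a node in a given network. For a given node pair $(i, j)$, a prediction score is given by the product of their degrees, $s_{ij} = k_i k_j$, where $k_i$ is the degree of node $i$.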

Despite its simplicity, degree is surprisingly predictive of edges. It sometimes even outperforms more complex link prediction algorithms such as graph convolutional networks and graph attention networks.
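Assuming the degree-based score is the product of the two degrees, $s_{ij} = k_i k_j$ (often called the preferential attachment score), it takes only a few lines to compute. A minimal sketch, assuming the training network is an undirected edge list; `degree_product_scores` is a hypothetical name:

```python
from collections import defaultdict


def degree_product_scores(train_edges, pairs):
    """Score each candidate pair (i, j) by k_i * k_j, where k_i is the degree of
    node i in the training network (nodes absent from it have degree 0)."""
    degree = defaultdict(int)
    for i, j in train_edges:
        degree[i] += 1
        degree[j] += 1
    return [degree[i] * degree[j] for i, j in pairs]


# Example: node 0 is the highest-degree node (degree 3); node 4 is isolated.
edges = [(0, 1), (0, 2), (0, 3), (2, 3)]
print(degree_product_scores(edges, [(1, 2), (1, 3), (0, 4)]))  # [2, 2, 0]
```

Because the score depends only on the two degrees, it ranks pairs involving hubs highly regardless of whether the two nodes are close in the network.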

Residual2Vec: Debiasing graph embedding with random graphs

Degree is often a strong determinant of many other structural variables and thus may account for part of their predictive power.

Common neighbors

Common neighbors is a pairwise variable that counts how many neighbors two nodes share, reflecting the local relationship between them. This is a major departure from degree: rather than scoring each node separately, it focuses on the connections that the two nodes have in common.
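Writing $\Gamma(i)$ for the set of neighbors of node $i$ and $A$ for the adjacency matrix of an undirected, unweighted network (notation assumed here), the common-neighbor count can be written as

$$
s^{\mathrm{CN}}_{ij} = |\Gamma(i) \cap \Gamma(j)| = \sum_{\ell} A_{i\ell} A_{\ell j} = (A^2)_{ij},
$$

since $A_{i\ell} A_{\ell j} = 1$ exactly when $\ell$ is a neighbor of both $i$ and $j$. The matrix form also shows that the common-neighbor count is the number of paths of length two between $i$ and $j$.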

The following structural variables are all based on common neighbors; they differ in how they normalize the count, either by the degrees of the common neighbors or by the number of neighbors of the two nodes.

Adamic-Adar:
- Friends and neighbors on the Web, Social Networks (2003)
Jaccard index:
- The link prediction problem for social networks, Proceedings of the Twelfth International Conference on Information and Knowledge Management (2003)
Resource allocation:
- Original paper: Bipartite network projection and personal recommendation, Phys. Rev. E 76, 046115 (2007)
- Application to link prediction: Predicting missing links via local information, Eur. Phys. J. B (2009)
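The common-neighbor variables above can all be computed from neighbor sets. A minimal sketch, assuming an undirected, unweighted network and the standard definitions: Jaccard index $|\Gamma(i)\cap\Gamma(j)| / |\Gamma(i)\cup\Gamma(j)|$, Adamic-Adar $\sum_{z \in \Gamma(i)\cap\Gamma(j)} 1/\log k_z$, and resource allocation $\sum_{z \in \Gamma(i)\cap\Gamma(j)} 1/k_z$, where $\Gamma(i)$ is the neighbor set of $i$ and $k_z$ the degree of $z$. The function names are hypothetical.

```python
import math
from collections import defaultdict


def neighbor_sets(edges):
    """Build the neighbor set of every node from an undirected edge list."""
    nbrs = defaultdict(set)
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs


def common_neighbor_indices(nbrs, i, j):
    """Common-neighbor count and three normalized variants for a pair i != j."""
    common = nbrs[i] & nbrs[j]
    union = nbrs[i] | nbrs[j]
    return {
        "common_neighbors": len(common),
        # Jaccard: normalize by the number of distinct neighbors of i and j.
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: each shared neighbor z contributes 1 / log(k_z);
        # a shared neighbor has k_z >= 2, so log(k_z) > 0.
        "adamic_adar": sum(1.0 / math.log(len(nbrs[z])) for z in common),
        # Resource allocation: each shared neighbor z contributes 1 / k_z.
        "resource_allocation": sum(1.0 / len(nbrs[z]) for z in common),
    }
```

Resource allocation penalizes high-degree common neighbors more strongly than Adamic-Adar, since $1/k_z$ is smaller than $1/\log k_z$ for every $k_z \geq 2$.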

Path

Path-based variables measure the proximity of two nodes by looking at the paths between them. They generalize common neighbors: two nodes that share a neighbor are connected by a path of length two, which is the shortest possible distance between two nodes that are not directly connected.

Katz:
- The weighted sum of the number of paths of different lengths, with longer paths weighted less
- About the variable: A new status index derived from sociometric analysis, Psychometrika (1953)
- Application to link prediction: The link prediction problem for social networks, Proceedings of the Twelfth International Conference on Information and Knowledge Management (2003)
Random walk with restart:
- The probability that a random walker starting from one node is found at the other, where the walker randomly jumps back to its starting node.
- Rooted in PageRank: The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems (1998)
- Application to link prediction: The link prediction problem for social networks, Proceedings of the Twelfth International Conference on Information and Knowledge Management (2003)
SimRank:
- SimRank, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002)
- (Interpretation is still not clear)
Local path index:
- A simplified version of the Katz index
- Similarity index based on local paths for link prediction of complex networks, Phys. Rev. E 80, 046122 (2009)
Local random walk:
- The sum of the transition probabilities of random walks of different lengths
- Link prediction based on local random walk, EPL (2010)
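Several of the path-based indices above can be sketched with walk counting and power iteration in plain Python. This is a minimal sketch under stated assumptions: the network is undirected, has no isolated nodes, and is stored as a dict mapping each node to its neighbor set; all function names and default parameters (`beta`, `max_len`, `eps`, `restart`) are hypothetical choices for illustration. Two simplifications are worth flagging: the Katz sum is truncated at `max_len`, and, like powers of the adjacency matrix, it counts walks (which may revisit nodes) rather than simple paths. The untruncated Katz sum converges only when `beta` is smaller than the inverse of the largest eigenvalue of the adjacency matrix. The symmetrized score $q_i(j) + q_j(i)$ is one common choice for random walk with restart. SimRank and the local random walk are not shown.

```python
from collections import defaultdict


def walk_counts(nbrs, source, max_len):
    """counts[l][j] = number of walks of length l from source to j, i.e. (A^l)[source, j]."""
    counts = [{source: 1}]
    for _ in range(max_len):
        nxt = defaultdict(int)
        for node, c in counts[-1].items():
            for nb in nbrs[node]:
                nxt[nb] += c
        counts.append(dict(nxt))
    return counts


def katz_score(nbrs, i, j, beta=0.05, max_len=10):
    """Truncated Katz index: sum over l = 1..max_len of beta^l * (A^l)[i, j]."""
    counts = walk_counts(nbrs, i, max_len)
    return sum(beta ** l * counts[l].get(j, 0) for l in range(1, max_len + 1))


def local_path_score(nbrs, i, j, eps=0.01):
    """Local path index: (A^2)[i, j] + eps * (A^3)[i, j]."""
    counts = walk_counts(nbrs, i, 3)
    return counts[2].get(j, 0) + eps * counts[3].get(j, 0)


def rwr_probabilities(nbrs, source, restart=0.15, n_iter=100):
    """Steady-state probabilities of a walker that, at each step, jumps back to
    `source` with probability `restart` and otherwise moves to a random neighbor.
    Computed by power iteration; assumes every node has at least one neighbor."""
    p = {v: 0.0 for v in nbrs}
    p[source] = 1.0
    for _ in range(n_iter):
        nxt = {v: 0.0 for v in nbrs}
        for v, pv in p.items():
            share = (1 - restart) * pv / len(nbrs[v])
            for nb in nbrs[v]:
                nxt[nb] += share
        nxt[source] += restart
        p = nxt
    return p


def rwr_score(nbrs, i, j, restart=0.15):
    """Symmetrized random-walk-with-restart score q_i(j) + q_j(i)."""
    return (rwr_probabilities(nbrs, i, restart)[j]
            + rwr_probabilities(nbrs, j, restart)[i])
```

On the path graph `{0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}`, the local path score of the two endpoints is `0.01`: they share no neighbor, and a single walk of length three connects them.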

Matrix factorization

SVD
PCA
Laplacian eigenmap

- Learning spectral graph transformations for link prediction, Proceedings of the 26th Annual International Conference on Machine Learning (2009)
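One simple instance of matrix factorization for link prediction is a truncated SVD: factorize the training adjacency matrix and score a pair by the corresponding entry of the rank-$k$ reconstruction, so that unconnected pairs with large reconstructed values are predicted as missing edges. A minimal sketch, assuming a small dense 0/1 adjacency matrix given as a list of rows; the singular triplets are found by power iteration on $A^\top A$ in plain Python only to keep the example self-contained (a real pipeline would call a numerical library's sparse SVD), and the function names and `k` are hypothetical. PCA and Laplacian eigenmaps are not shown.

```python
import math
import random


def matvec(M, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def top_singular_triplets(A, k, n_iter=200, seed=0):
    """Approximate the top-k singular triplets (sigma, u, v) of a square matrix A
    by power iteration on A^T A, orthogonalizing against vectors already found."""
    n = len(A)
    At = [list(col) for col in zip(*A)]
    rng = random.Random(seed)
    triplets = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(n_iter):
            v = matvec(At, matvec(A, v))
            # Gram-Schmidt step: remove components along earlier right singular vectors.
            for _, _, w in triplets:
                dot = sum(a * b for a, b in zip(v, w))
                v = [a - dot * b for a, b in zip(v, w)]
            norm = math.sqrt(sum(a * a for a in v))
            if norm < 1e-12:  # no remaining nonzero singular value
                v = [0.0] * n
                break
            v = [a / norm for a in v]
        Av = matvec(A, v)
        sigma = math.sqrt(sum(a * a for a in Av))
        u = [a / sigma for a in Av] if sigma > 0 else [0.0] * n
        triplets.append((sigma, u, v))
    return triplets


def low_rank_score(triplets, i, j):
    """Entry (i, j) of the rank-k reconstruction sum_r sigma_r * u_r * v_r^T."""
    return sum(sigma * u[i] * v[j] for sigma, u, v in triplets)
```

The rank `k` acts as a regularizer: a small `k` keeps only the dominant structure, which is what lets the reconstruction assign nonzero scores to unobserved pairs.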

Model-based

Graph embedding

- Self-Explainable Graph Neural Networks for Link Prediction

Metadata

- Disentangling Node Attributes from Graph Topology for Improved Generalizability in Link Prediction

Evaluation metric

Ranking vs classification
Classification
Ranking
- The link prediction problem for social networks, Proceedings of the Twelfth International Conference on Information and Knowledge Management (2003)
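Unlike AUC, which compares missing edges with a random sample of unconnected pairs, a ranking evaluation scores all candidate pairs and asks how many of the top-ranked ones are missing edges. Precision at $k$ is one common choice; it is used here as an illustrative assumption, not necessarily the metric of the reference above, and `precision_at_k` is a hypothetical name.

```python
def precision_at_k(scores, missing_edges, k):
    """Fraction of the k top-scored candidate pairs that are missing edges.

    scores: dict mapping each candidate pair (i, j), i < j, that is unconnected
    in the training network to its prediction score. Ties keep the dict's
    insertion order, because Python's sort is stable even with reverse=True.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    missing = set(missing_edges)
    return sum(pair in missing for pair in ranked) / k
```

Because it looks only at the top of the ranking, precision at $k$ rewards predictors that put a few confident predictions first, whereas AUC weighs all comparisons equally.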

References

Benchmark design

Prediction methods

Network features:
- Link prediction based on local random walk, EPL (2010)
Ensemble learning:
- Stacking models for nearly optimal link prediction in complex networks, PNAS (2020)
Graph embedding:
- GraphGAN: Graph Representation Learning with Generative Adversarial Nets, arXiv:1711.08267
- Link Prediction Based on Graph Neural Networks, NeurIPS (2018)