updated: 2022-11-21
#network-science/random-networks
The configuration model is widely used as a baseline model to assess the significance of patterns observed in empirical networks. It is a model of random networks that preserves a critical characteristic of networks, i.e., node degree.
An often underappreciated fact is that there are multiple specifications of the configuration model and that the decision to choose which specification results in a profound impact on the final results.
The simplest form of the configuration model is the Chung-Lu model:
Connected Components in Random Graphs with Given Expected Degree Sequences | SpringerLink
which samples two nodes independently with probability proportional to the degree and places a link between them. It allows multiple edges and self-loops, but the likelihood of the events is very negligible for large networks.
Although the chances are small, sometimes we want to prevent self-loops and multiple edges. Furthermore, the Chung-Lu model is not maximally random. A more rigorous model preserving the degree sequence is the enhanced configuration model:
The enhanced configuration model is computationally expensive, which is one big reason people like the Chung-Lu model over the enhanced configuration model. A follow-up paper mitigates the computation cost by using an efficient quadratic programming solver:
Yet, in my opinion, the computational cost is fundamentally the same, which limits the use of the enhanced configuration model for large networks (although I like the idea).
A naive extension of the configuration model to weighted networks is to regard an edge with weight
The enhanced configuration model can preserve the node strengths.
One can decouple edge placement and weight distribution process and yield a more computationally efficient algorithms, which is an idea at the heart of the conditional maximum-entropy reconstruction models (CREM):
See also my note @parisiFasterHorseSafer2020.
Directed acyclic graph is a network without loops. In DAG, each edge is directed and always pointing from a higher to lower levels of a hirarchy. A configuration model that repsects this constraint is proposed by
Correlation networks are a type of weighted networks, where weights represent correlations. Correlation networks should be treated with caution, because correlations are not independent of each other. Two nodes are strongly correlated even without any dydiac relationships if they are correlated with other nodes in a system. The enhanced configuration model does not respect the mutual correlations of correlations, and its naive application to correlation networks may result in poor understanding and prediction.
The configuration model for correlation data builds on the principle of maximum entropy; it conside a maximally random correlation data constrained on the weighted degree of each node in the given correlation network.
There are many other null models for correlation networks, apart from the configuration model:
Simulating the configuration model appears to be expensive, as it is a collection of
If multiedges are allowed, an easy and cheap method is the Chung-Lu model.
The enhanced configuration model can handle weighted and directed networks. A fast library for the enhanced configuration model is proposed by
If the network is DAG, we can utilize the ordering of nodes to efficiently generate the configuration model: