Configuration model

updated: 2022-11-21
#network-science/random-networks

The configuration model is widely used as a baseline model to assess the significance of patterns observed in empirical networks. It is a model of random networks that preserves a critical characteristic of networks, i.e., node degree.

Specification of configuration models

An often underappreciated fact is that there are multiple specifications of the configuration model and that the decision to choose which specification results in a profound impact on the final results.

Unweighted networks

The simplest form of the configuration model is the Chung-Lu model:

Connected Components in Random Graphs with Given Expected Degree Sequences | SpringerLink

which samples two nodes independently with probability proportional to the degree and places a link between them. It allows multiple edges and self-loops, but the likelihood of the events is very negligible for large networks.

Although the chances are small, sometimes we want to prevent self-loops and multiple edges. Furthermore, the Chung-Lu model is not maximally random. A more rigorous model preserving the degree sequence is the enhanced configuration model:

The enhanced configuration model is computationally expensive, which is one big reason people like the Chung-Lu model over the enhanced configuration model. A follow-up paper mitigates the computation cost by using an efficient quadratic programming solver:

Yet, in my opinion, the computational cost is fundamentally the same, which limits the use of the enhanced configuration model for large networks (although I like the idea).

Weighted networks

A naive extension of the configuration model to weighted networks is to regard an edge with weight as edges of weight one. This heuristics is cheap and preserved the weighted degree (or strength). However, it does not preserve the number of neighbors of each node, or simply degree.

The enhanced configuration model can preserve the node strengths.

One can decouple edge placement and weight distribution process and yield a more computationally efficient algorithms, which is an idea at the heart of the conditional maximum-entropy reconstruction models (CREM):

A faster horse on a safer trail: generalized inference for the efficient reconstruction of weighted networks - IOPscience

See also my note @parisiFasterHorseSafer2020.

Directed acyclic graph (DAG)

Directed acyclic graph is a network without loops. In DAG, each edge is directed and always pointing from a higher to lower levels of a hirarchy. A configuration model that repsects this constraint is proposed by

Correlation networks

Correlation networks are a type of weighted networks, where weights represent correlations. Correlation networks should be treated with caution, because correlations are not independent of each other. Two nodes are strongly correlated even without any dydiac relationships if they are correlated with other nodes in a system. The enhanced configuration model does not respect the mutual correlations of correlations, and its naive application to correlation networks may result in poor understanding and prediction.

The configuration model for correlation data builds on the principle of maximum entropy; it conside a maximally random correlation data constrained on the weighted degree of each node in the given correlation network.

There are many other null models for correlation networks, apart from the configuration model:

Algorithms

Unweighted networks

Simulating the configuration model appears to be expensive, as it is a collection of binary variables representing the presence of edges. Luckly, we have an efficient algorithm that runs linear with respect to the number of edges.

If multiedges are allowed, an easy and cheap method is the Chung-Lu model.

Weighted and directed networks

The enhanced configuration model can handle weighted and directed networks. A fast library for the enhanced configuration model is proposed by

Directed Acyclic Graph

If the network is DAG, we can utilize the ordering of nodes to efficiently generate the configuration model:

Properties: