updated: 2022-11-25
#machine-learning/metrics
How similar are two rankings? Answering this question leads to a variety of ranking similarity metrics.
We can measure the similarity between two rankings using correlation. If each entity has two variables representing its positions in the two rankings, the correlation between these position variables quantifies how similar the rankings are; see the sketch below.
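For instance, the Spearman and Kendall correlations (linked below) apply directly to position vectors. A minimal sketch, assuming `scipy` is available and that both rankings cover the same five entities:

```python
# Quantify ranking similarity as correlation between position vectors.
from scipy.stats import kendalltau, spearmanr

# Positions (1 = top) of the same five entities in two rankings.
rank_a = [1, 2, 3, 4, 5]
rank_b = [2, 1, 3, 5, 4]

rho, _ = spearmanr(rank_a, rank_b)   # Spearman rank-order correlation
tau, _ = kendalltau(rank_a, rank_b)  # Kendall tau correlation
print(rho, tau)  # both near 1 for similar rankings
```

Note that swapping ranks 1 and 2 costs exactly as much here as swapping ranks 4 and 5, which motivates the top-weighted metric below.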
Rank biased overlap (RBO) is a similarity metric that is more sensitive to differences among the top-ranked entities, whereas correlation-based metrics treat all positions equally. This is often preferable because, in practice, the top entities matter more than the bottom ones.
RBO is the foundation of a family of top-weighted overlap measures. Let's consider two ranked lists $S$ and $T$, and let the agreement at depth $d$ be the fraction of entities their top-$d$ prefixes share, $A_d = |S_{1:d} \cap T_{1:d}| / d$. RBO is the weighted average of the agreements,

$$\mathrm{RBO}(S, T, p) = (1 - p) \sum_{d=1}^{\infty} p^{d-1} A_d.$$

The parameter $p \in (0, 1)$ controls how top-heavy the weighting is: the smaller $p$ is, the more the weight concentrates on the top of the lists.

Implementing RBO is tricky since the sum is extended over infinitely many depths $d$.
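As a quick sanity check on the definition (a one-line derivation, shown here for convenience), the geometric weights sum to one, so the full RBO always lies in $[0, 1]$:

$$(1 - p) \sum_{d=1}^{\infty} p^{d-1} = (1 - p) \cdot \frac{1}{1 - p} = 1, \qquad 0 \le A_d \le 1 \implies 0 \le \mathrm{RBO} \le 1.$$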
The original RBO takes the sum over depths beyond the observed rankings. Since the entities outside of the rankings are unknown, the original paper introduces an extrapolation method. But this adds complexity and extra assumptions about the input data, so I personally prefer the truncated version, where the summation runs only over the finite rankings of length $k$. Since the summation is truncated, the maximum "truncated" RBO is less than one, which makes interpretation difficult. To amend this, I rescale the truncated RBO by dividing it by the maximum truncated sum, which by the geometric series is $(1 - p) \sum_{d=1}^{k} p^{d-1} = 1 - p^k$, so that identical rankings score exactly one.
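Below is a minimal sketch of the rescaled truncated RBO described above. It is an illustration rather than the linked implementation: the name `truncated_rbo` and its signature are my own, and the rankings are assumed to be sequences of unique, hashable items.

```python
def truncated_rbo(s, t, p=0.9):
    """Rescaled truncated rank-biased overlap of rankings s and t.

    Assumes s and t are sequences of unique, hashable items.
    """
    k = min(len(s), len(t))
    set_s, set_t = set(), set()  # items seen so far in each ranking
    overlap = 0                  # |s[:d] & t[:d]|, updated incrementally
    score = 0.0
    for d in range(1, k + 1):
        x, y = s[d - 1], t[d - 1]
        if x == y:
            overlap += 1  # the same item enters both prefixes at depth d
        else:
            # booleans count as 0/1: each match with an earlier item
            # in the other list adds one shared entity
            overlap += (x in set_t) + (y in set_s)
        set_s.add(x)
        set_t.add(y)
        score += p ** (d - 1) * overlap / d  # weighted agreement A_d
    # Divide by the maximum truncated sum, 1 - p**k, so that
    # identical rankings score exactly 1.
    return (1 - p) * score / (1 - p ** k)
```

For identical rankings this returns 1.0, and with a smaller $p$ a disagreement near the top lowers the score much more than the same disagreement near the bottom.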
My implementation is available here:
Mean Squared Error
Normalized Discounted Cumulative Gain
Spearman Rank-Order Correlation
Kendall Tau Correlation
Average Precision at K
Normalized Mutual Information
Jaccard Similarity
Chu-Liu/Edmonds Algorithm
List of metrics for rankings
The mathematical link between MAP and DCG
Metrics for evaluating ranking algorithms - Cross Validated