Quantifying and mitigating AI bias

updated: 2023-01-09

#ai-bias

Understanding and quantifying the bias manifold

Understanding the bias manifold in embeddings is instrumental in designing debiasing algorithms. This seminal work demonstrates that word embeddings have a linear axis accounting for gender bias, i.e., a bias dimension.
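
A common way to recover such an axis is to take the top principal component of difference vectors between definitional gendered pairs (in the spirit of Bolukbasi et al., 2016, simplified here). A minimal sketch, assuming a hypothetical dict `emb` mapping words to numpy vectors:

```python
import numpy as np

def bias_direction(emb, pairs):
    """Leading singular vector of stacked gendered difference vectors.

    A simplification of the pair-centered PCA in Bolukbasi et al.;
    `emb` is an assumed word -> vector mapping, not a specific library.
    """
    diffs = np.array([emb[a] - emb[b] for a, b in pairs])
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]  # unit-norm bias axis

pairs = [("he", "she"), ("man", "woman"), ("king", "queen"),
         ("father", "mother"), ("male", "female")]
# g = bias_direction(emb, pairs)
# projection of any word onto the bias axis:
# score = emb["doctor"] @ g
```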

Some word embedding models, including GloVe and SGNS word2vec, embed bias into a linear subspace because word associations are captured by frequency ratios.

The linearity of the bias manifold has led to metrics that quantify the degree of bias in embeddings.

WEAT (Word Embedding Association Test) statistics are common metrics, although they tend to overestimate the bias and cannot capture non-linear bias:
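
A minimal sketch of the WEAT effect size (Caliskan et al., 2017), again assuming the hypothetical `emb` mapping; the word sets X, Y (targets) and A, B (attributes) are chosen by the analyst:

```python
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B, emb):
    """Differential association of word w with attribute sets A and B."""
    return (np.mean([cos(emb[w], emb[a]) for a in A])
            - np.mean([cos(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    """Normalized difference of mean associations between target sets."""
    sx = [assoc(x, A, B, emb) for x in X]
    sy = [assoc(y, A, B, emb) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)
```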

The picture may be more complex than initially thought, as the bias manifold can be affected by seemingly inconsequential factors such as word popularity.

One can quantify the local bias manifold spanned by a word's k-nearest neighbors:
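
One simplified way to operationalize this, a sketch rather than a method from the literature: average the projection of a word's k nearest neighbors onto the bias axis `g` from the earlier sketch, under the same `emb` assumption:

```python
import numpy as np

def local_bias(word, emb, g, k=10):
    """Mean projection onto bias axis g over the k nearest neighbors."""
    words = list(emb)
    M = np.array([emb[w] for w in words])
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    q = emb[word] / np.linalg.norm(emb[word])
    sims = M @ q
    nn = np.argsort(-sims)[1:k + 1]  # skip the word itself
    return float(np.mean(M[nn] @ g))
```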

A word takes a distribution, instead of a point in space, when the embedding is contextual, as a word's representation varies depending on the surrounding words. This calls for a new framework to probe the bias manifold:
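
A sketch of how one might collect that distribution with a contextual model, assuming Hugging Face `transformers`; the model choice, probe sentences, and subword mean-pooling are illustrative assumptions, not a specific framework:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def word_vectors(word, sentences):
    """One vector per occurrence of `word`, mean-pooled over subwords."""
    vecs = []
    sub_ids = tok(word, add_special_tokens=False)["input_ids"]
    for s in sentences:
        ids = tok(s, return_tensors="pt")
        with torch.no_grad():
            h = model(**ids).last_hidden_state[0]  # (seq_len, dim)
        seq = ids["input_ids"][0].tolist()
        for i in range(len(seq) - len(sub_ids) + 1):
            if seq[i:i + len(sub_ids)] == sub_ids:
                vecs.append(h[i:i + len(sub_ids)].mean(0))
    return torch.stack(vecs)  # a point cloud, not a single point

# V = word_vectors("nurse", ["The nurse said hello.", "A nurse was hired."])
# V.std(0) summarizes how much the representation moves with context.
```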

Identifying bias

Perturbation analysis can reveal harmful data samples that induce bias:
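
One crude form of such an analysis, sketched under stated assumptions rather than taken from a specific paper: hold out one document at a time, retrain a small embedding, and track how the bias score moves. Assumes gensim and the `weat_effect_size` sketch above; retraining per document is expensive and only practical for small corpora:

```python
from gensim.models import Word2Vec

def bias_impact(corpus, X, Y, A, B):
    """Change in WEAT effect size attributable to each document.

    `corpus` is a list of tokenized documents (lists of tokens);
    hyperparameters are placeholders.
    """
    def score(docs):
        m = Word2Vec(docs, vector_size=100, min_count=1, seed=0, workers=1)
        return weat_effect_size(X, Y, A, B, m.wv)
    base = score(corpus)
    # positive value: removing document i reduces the measured bias
    return [base - score(corpus[:i] + corpus[i + 1:])
            for i in range(len(corpus))]
```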

Alternatively, one can identify the bias a posteriori from the embedding:
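
For example, one can rank the vocabulary by the magnitude of its projection onto the bias axis `g` from the earlier sketch; words at the top are the most bias-loaded in the embedding. A minimal sketch:

```python
import numpy as np

def most_biased(emb, g, top=20):
    """Words with the largest |projection| onto the bias axis g."""
    scores = {w: float(v @ g) for w, v in emb.items()}
    return sorted(scores.items(),
                  key=lambda kv: abs(kv[1]), reverse=True)[:top]
```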

Debiasing
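
Given a linear bias axis as above, the simplest mitigation is the "neutralize" step of hard debiasing (Bolukbasi et al.): project the bias direction out of words that should be neutral. A sketch; the complementary "equalize" step for definitional pairs is omitted:

```python
import numpy as np

def neutralize(v, g):
    """Remove the component of v along the (unit-norm) bias axis g."""
    v = v - (v @ g) * g           # project out the bias direction
    return v / np.linalg.norm(v)  # re-normalize
```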

Debiasing graph embedding