Equality Of Odds

A Visual Introduction to Measuring and Mitigating Bias in Machine Learning

Mia Mayer & Jared Wilber, April 2023

Machine Learning models learn to make predictions by looking at data with the help of algorithms, both of which can potentially be biased against different groups of people. Unwanted bias in machine learning can inadvertently harm and negatively stereotype underrepresented or (historically and otherwise) disfavored groups. Therefore, it is crucial to evaluate and control data and model predictions not only for general machine learning performance but also for bias.

Defining Equalized Odds

In this article, we will review a well-known fairness criterion called 'Equalized Odds' (EO). EO aims to equalize the errors a model makes when predicting categorical outcomes for different groups of people; here, we consider two groups, which we label a and b.

EO takes the merit of different groups of people into account by considering the underlying ground-truth distribution of the labels. This ensures that the errors across outcomes and groups are similar, i.e. fair.

For example, in a hiring scenario, the errors EO compares are 'wrong rejection' and 'wrong acceptance'. We could simply count the number of wrong rejections and acceptances, but because groups generally differ in size, we should use error rates instead, as these are scale invariant. Useful error rates to consider are the False Negative Rate (FNR) and False Positive Rate (FPR) of a classifier, or a combination of both.
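To make these error rates concrete, here is a minimal Python sketch (the function name and the 1 = accepted / 0 = rejected encoding are our own assumptions, not from the article) that computes FNR and FPR from ground-truth labels and predictions:

```python
import numpy as np

def error_rates(y_true, y_pred):
    """Return (FNR, FPR) for binary labels, where 1 = accepted and 0 = rejected."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fn = np.sum((y_true == 1) & (y_pred == 0))  # wrong rejections
    fp = np.sum((y_true == 0) & (y_pred == 1))  # wrong acceptances
    positives = np.sum(y_true == 1)             # everyone who should be accepted
    negatives = np.sum(y_true == 0)             # everyone who should be rejected
    fnr = fn / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return fnr, fpr
```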

According to EO, a model is fair if its predictions have the same TPR and FPR across all groups in the dataset. Formally, this can be written as:

\mathbb{P}(\hat{Y}=1 \mid Y=y, A=a) = \mathbb{P}(\hat{Y}=1 \mid Y=y, A=b), \quad y \in \{0,1\}


where \hat{Y} denotes the model's prediction (here: \hat{Y}=1, the positive outcome), A refers to group membership, and Y represents the ground truth.

Equalized odds aims to match TPR and FPR for different groups, penalizing models that perform well for only one group. Unfortunately, this can be very hard to achieve in practice, so it makes sense to relax the EO criterion and consider a modified version of the EO equation with only y=1, which equalizes TPR (known as equal opportunity), or only y=0, which equalizes FPR.
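As a sketch of how this criterion could be checked in code, the snippet below (reusing the hypothetical error_rates helper from above, and assuming y_true, y_pred, and group are NumPy arrays with exactly two group values) reports the TPR and FPR gaps between the two groups; both gaps being close to 0 means EO is approximately satisfied:

```python
import numpy as np

def equalized_odds_gaps(y_true, y_pred, group):
    """Absolute TPR and FPR differences between two groups (0 for both means EO holds)."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        fnr, fpr = error_rates(y_true[mask], y_pred[mask])  # helper from the sketch above
        rates[g] = {"TPR": 1.0 - fnr, "FPR": fpr}
    a, b = list(rates)[:2]
    return {
        "TPR_gap": abs(rates[a]["TPR"] - rates[b]["TPR"]),
        "FPR_gap": abs(rates[a]["FPR"] - rates[b]["FPR"]),
    }
```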

Equalized Odds to measure fairness

Using the EO equation, we can derive different metrics to measure the fairness of a model. For example, we can look at:


False Positive Error Rate (FPR) Balance


To calculate FPR balance, we work out the FPR per group and take the difference:

\textrm{FPR}_{a} - \textrm{FPR}_{b}


The resulting value will be in the range [-1, 1]; the closer it is to 0, the more predictive equality the model achieves, satisfying the EO equation with y=0. Analogously, we can compute an FNR balance by taking the difference in FNR per group; a value close to 0 means the model satisfies equal opportunity, i.e. the EO equation with y=1.


The metrics above show how fair (or unfair) the model is by measuring either the FPR or the FNR difference; but according to EO, both values need to be the same across groups (known as Conditional Procedure Accuracy Equality), while the model also achieves a certain predictive performance.

Have a look at the beeswarm plot below. It shows how the predictions of a model change when the probability threshold (the slider) is moved.

Try to find a probability threshold that results in an FPR difference and an FNR difference of 0 at the same time; is it even possible? Also observe what happens to the model's performance (here: group-wise accuracy) as you move the slider.

[Interactive beeswarm plot: a probability-threshold slider (0 to 1) classifies individuals as Accepted or Rejected, with a companion chart counting outcomes by truth vs. prediction for each group.]

Note that as you drag the slider, you might find some so-called lazy solutions where everyone gets rejected or accepted. These are solutions where the FPR or FNR difference is indeed 0. However, while these solutions technically meet the relaxed version of EO, they make little sense from a general ML performance perspective (check out the accuracy values of the model).
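The same experiment can be run outside the visualization. The sketch below (our own illustration, assuming scores holds the model's predicted probabilities and reusing the error_rates helper from above) sweeps a shared threshold and reports, for each value, the FPR and FNR differences between the two groups along with their group-wise accuracies; the all-accepted and all-rejected ends of the grid are exactly the lazy solutions described above:

```python
import numpy as np

def sweep_threshold(scores, y_true, group, grid=np.linspace(0.0, 1.0, 101)):
    """For each shared threshold, report group differences in FPR/FNR and group accuracies."""
    a, b = np.unique(group)[:2]
    rows = []
    for t in grid:
        pred = (scores >= t).astype(int)
        fnr_a, fpr_a = error_rates(y_true[group == a], pred[group == a])
        fnr_b, fpr_b = error_rates(y_true[group == b], pred[group == b])
        acc_a = np.mean(pred[group == a] == y_true[group == a])
        acc_b = np.mean(pred[group == b] == y_true[group == b])
        rows.append({"threshold": t, "FPR_diff": fpr_a - fpr_b,
                     "FNR_diff": fnr_a - fnr_b, "acc_a": acc_a, "acc_b": acc_b})
    return rows
```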

You can also verify that there is no probability threshold where FPR and FNR are the same for both groups by comparing the values in the chart below:



[Chart: FNR and FPR for each group as a function of the probability threshold.]

Equalized Odds to achieve fairness

Using EO, we can also influence the predictions a model makes to achieve a fairer outcome. We will look at two ways of doing this: constraining the model during training, and introducing group-wise probability thresholds for an already trained model.


Constrained Optimization during Training


To implement EO during model training, we can constrain the possible set of parameters, \theta, over which the so-called loss function, L(\theta), is minimized. The constraint can be written as:

\begin{aligned}
\min_{\theta} \quad & L(\theta) \\
\textrm{subject to} \quad & \mathbb{P}(\hat{Y} \neq Y, A=a) - \mathbb{P}(\hat{Y} \neq Y, A=b) \leq \epsilon \\
& \mathbb{P}(\hat{Y} \neq Y, A=a) - \mathbb{P}(\hat{Y} \neq Y, A=b) \geq -\epsilon
\end{aligned}

where \epsilon \in \mathbb{R}^{+}. The smaller \epsilon, the fairer the decision boundary.

Notice that, compared to the EO equation, the constraint is 'relaxed': we only require the parameters to yield a solution where the difference in group-wise error rates is no larger than \epsilon in absolute value.
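In practice, the hard \epsilon-constraint above is often approximated during training. The sketch below is one such approximation (our own, not the article's method): a NumPy logistic regression whose log-loss is augmented with a soft penalty on the gap in expected error between the two groups; theta, X, y, and group are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def penalized_loss(theta, X, y, group, lam=10.0):
    """Logistic log-loss plus a soft penalty on the group gap in expected error."""
    p = sigmoid(X @ theta)
    log_loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    # Differentiable stand-in for P(Y_hat != Y, A=g): probability mass on the wrong class.
    soft_error = y * (1.0 - p) + (1.0 - y) * p
    gap = soft_error[group == 0].mean() - soft_error[group == 1].mean()
    return log_loss + lam * gap ** 2

# Hypothetical usage, once X (features), y (labels), and group (0/1 membership) exist:
# theta_hat = minimize(penalized_loss, x0=np.zeros(X.shape[1]),
#                      args=(X, y, group), method="L-BFGS-B").x
```

Larger values of lam push the two groups' error rates closer together, playing a role similar to a smaller \epsilon, typically at the cost of a somewhat higher overall loss.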


To visualize the search for probability thresholds that meet EO, we can look at the so-called ROC curves of both groups, a and b. For most probability thresholds, the TPR and FPR values differ between the groups. For the dataset shown below, there is only one point, excluding lazy solutions, where TPR and FPR are equal for both groups; this is where the EO criterion is satisfied.


[ROC curves per group, plotting TPR against FPR across probability thresholds. At one point, TPR and FPR match for both groups with neither equal to 0, satisfying EO; the regions where TPR = 1 for both groups or FPR = 0 for both groups correspond to lazy solutions.]
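For the post-processing route, a brute-force version of this search can be written directly in code. The sketch below is our own simplification (not the randomized-threshold construction of Hardt et al.); it reuses the error_rates helper and assumes scores, y_true, and group are NumPy arrays. It grid-searches one threshold per group and keeps the pair whose (FPR, TPR) points lie closest together, skipping the lazy regions highlighted in the chart:

```python
import numpy as np

def pick_group_thresholds(scores, y_true, group, grid=np.linspace(0.01, 0.99, 99)):
    """Pick one threshold per group so the groups' (FPR, TPR) points (nearly) coincide."""
    def roc_point(s, y, t):
        pred = (s >= t).astype(int)
        fnr, fpr = error_rates(y, pred)  # helper from the earlier sketch
        return fpr, 1.0 - fnr
    a, b = np.unique(group)[:2]
    best, best_dist = None, np.inf
    for ta in grid:
        fpr_a, tpr_a = roc_point(scores[group == a], y_true[group == a], ta)
        for tb in grid:
            fpr_b, tpr_b = roc_point(scores[group == b], y_true[group == b], tb)
            # Skip the lazy regions where TPR = 1 or FPR = 0 for both groups.
            if (tpr_a == 1 and tpr_b == 1) or (fpr_a == 0 and fpr_b == 0):
                continue
            dist = abs(fpr_a - fpr_b) + abs(tpr_a - tpr_b)
            if dist < best_dist:
                best, best_dist = (ta, tb), dist
    return best
```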

The End

While machine learning algorithms have the potential to revolutionize decision-making, we have to ensure that a fairness criterion is used to measure any potential bias in addition to general machine learning metrics. Depending on the outcome of the bias evaluation, we should also apply bias mitigation. Equalized Odds (EO) offers a promising approach to mitigating bias and is a method that can be used in different ways (even during post-processing, with access only to the predictions). However, before using EO for evaluation or bias mitigation, we should carefully consider the context and the potential trade-offs between competing objectives.





References + Open Source

This article is a product of the following resources + the awesome people who made (and contributed to) them:


[1] Fairness and Machine Learning
(Solon Barocas, Moritz Hardt, Arvind Narayanan).

[2] Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment
(Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, Krishna P. Gummadi, 2016).

[3] Equality of Opportunity in Supervised Learning
(Moritz Hardt, Eric Price, Nathan Srebro, 2016).

D3.js
(Mike Bostock & Philippe Rivière)

KaTeX
(Emily Eisenberg & Sophie Alpert)

Svelte
(Rich Harris)