Algorithmic Fairness

    Topic area: Machine Learning Ethics
    Prerequisites: Machine Learning, Supervised Learning, Classification


    Overview

    Algorithmic fairness is the study of how to design, evaluate, and modify automated decision systems so that their outputs do not produce systematic, unjustified disadvantage for individuals or demographic groups. It sits at the intersection of Machine Learning, statistics, law, and moral philosophy, and has become a central concern as predictive models are deployed in lending, hiring, criminal justice, healthcare, and content moderation. The field provides formal definitions of what it means for a classifier or scoring rule to be "fair", quantitative metrics for measuring deviations from those definitions, and algorithmic interventions for reducing them.

    The motivation is empirical as well as normative. Audits of deployed models have repeatedly shown that systems trained on historical data can replicate or amplify patterns of disparity present in the data: a face recognition model with markedly higher error rates on darker-skinned women, a recidivism risk score that produces more false positives for Black defendants than for white defendants at the same true risk, a resume screener that downweights graduates of women's colleges. Algorithmic fairness asks two distinct questions. First, descriptively: how do we detect and quantify such disparities? Second, prescriptively: which disparities are unjust, and what should be done about them? The first question admits mostly statistical answers; the second is irreducibly normative and contested.

    Sources of Unfairness

    Disparities in model outputs arise from multiple, often compounding sources. Historical bias is encoded in the labels themselves: if past hiring decisions were biased, a model trained to predict "would this person be hired" learns to reproduce that bias even when no protected attribute is used as a feature. Representation bias arises when subgroups are under-sampled relative to the population on which the model will act, so the learned function is less accurate where data is sparse. Measurement bias occurs when a chosen target is a flawed proxy for the underlying construct of interest, for example using arrest rates as a proxy for criminal offending when policing intensity differs across neighborhoods.

    Aggregation bias appears when a single model is fit to a population that is in fact a mixture, so coefficients reflect a compromise that fits no subgroup well. Evaluation bias arises when the benchmark used to declare a model "good" is itself unrepresentative. Finally, deployment bias emerges when the operational use of a model differs from the conditions under which it was trained or evaluated, for example when humans selectively override low-stakes predictions but defer to high-stakes ones. None of these sources is fixed by removing the protected attribute from the feature set, since correlated proxies (zip code, name, browsing history) typically remain.

    Group Fairness Definitions

    Most quantitative work concentrates on group fairness: statistical parity properties that compare outcomes across groups defined by a protected attribute $ A \in \{0, 1\} $. Let $ Y \in \{0, 1\} $ denote the true label and $ \hat{Y} $ the model's prediction. Three families of criteria dominate the literature.

    Demographic parity (also called statistical parity or independence) requires the prediction to be independent of the protected attribute: $ P(\hat{Y} = 1 \mid A = 0) = P(\hat{Y} = 1 \mid A = 1). $ A relaxation, the disparate impact ratio, replaces equality with a tolerance such as the U.S. Equal Employment Opportunity Commission's "four-fifths rule".
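
    The sketch below shows one way these two quantities might be estimated from binary predictions and a binary protected attribute using NumPy; the function names and the toy data are illustrative, not part of any standard API.

```python
import numpy as np

def demographic_parity_gap(y_pred, a):
    """Difference in positive prediction rates, P(Yhat=1 | A=1) - P(Yhat=1 | A=0)."""
    return y_pred[a == 1].mean() - y_pred[a == 0].mean()

def disparate_impact_ratio(y_pred, a):
    """Ratio of the smaller positive rate to the larger one; < 0.8 fails the four-fifths rule."""
    rates = sorted([y_pred[a == 0].mean(), y_pred[a == 1].mean()])
    return rates[0] / rates[1]

# Toy data: binary predictions and a binary protected attribute.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_pred, a))   # 0.25 - 0.75 = -0.5
print(disparate_impact_ratio(y_pred, a))   # 0.25 / 0.75 ≈ 0.33, fails the rule
```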

    Equalized odds (separation) requires equal true positive and false positive rates across groups: $ P(\hat{Y} = 1 \mid Y = y, A = 0) = P(\hat{Y} = 1 \mid Y = y, A = 1) \quad \text{for } y \in \{0, 1\}. $ Equal opportunity is the relaxation that requires only equality of true positive rates.
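
    A comparable estimation sketch for the separation criteria is shown below; it assumes every group contains both positive and negative examples, and is illustrative rather than a reference implementation.

```python
import numpy as np

def tpr_fpr(y_true, y_pred, a, group):
    """Empirical P(Yhat=1 | Y=1, A=group) and P(Yhat=1 | Y=0, A=group)."""
    mask = (a == group)
    yt, yp = y_true[mask], y_pred[mask]
    return yp[yt == 1].mean(), yp[yt == 0].mean()

def equalized_odds_gaps(y_true, y_pred, a):
    """Absolute TPR and FPR gaps between the two groups.

    Equal opportunity requires only the TPR gap to be zero; equalized odds
    requires both gaps to be zero."""
    tpr0, fpr0 = tpr_fpr(y_true, y_pred, a, 0)
    tpr1, fpr1 = tpr_fpr(y_true, y_pred, a, 1)
    return abs(tpr1 - tpr0), abs(fpr1 - fpr0)
```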

    Calibration within groups (sufficiency) requires that, conditional on the model's score $ S $, the outcome is independent of the protected attribute: $ P(Y = 1 \mid S = s, A = 0) = P(Y = 1 \mid S = s, A = 1) \quad \text{for all } s. $ A score is well-calibrated within groups when "70 percent risk" means the same empirical thing for both groups.
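
    In practice, calibration within groups is audited by binning the score and comparing empirical outcome rates across groups bin by bin. The sketch below assumes scores in $ [0, 1] $ and is purely illustrative.

```python
import numpy as np

def calibration_table(y_true, scores, a, n_bins=10):
    """Empirical P(Y=1 | S in bin, A=g): one row of bin averages per group."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    table = {}
    for g in np.unique(a):
        mask = (a == g)
        bin_idx = np.digitize(scores[mask], edges[1:-1])  # bin index 0 .. n_bins-1
        table[g] = [
            y_true[mask][bin_idx == b].mean() if np.any(bin_idx == b) else float("nan")
            for b in range(n_bins)
        ]
    return table  # sufficiency holds when the rows agree bin by bin
```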

    Individual Fairness

    A complementary tradition argues that group statistics are too coarse and that fairness should bind at the level of individuals. The canonical formulation, due to Dwork and colleagues, is that "similar individuals should be treated similarly": for a task-specific metric $ d $ on individuals and a metric $ D $ on output distributions, $ D(M(x), M(x')) \leq L \cdot d(x, x'), $ where $ M $ is the model and $ L $ a Lipschitz constant. The theoretical appeal of this Lipschitz condition is offset by the difficulty of specifying $ d $: the metric must encode which differences between individuals are morally relevant to the decision, which is precisely the contested question. In practice, individual fairness is often approximated by counterfactual fairness, which asks whether a prediction would change if the protected attribute and its descendants in a causal model were intervened upon.
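
    A direct, if brute-force, audit of the Lipschitz condition checks all pairs of individuals against a supplied metric. In the sketch below the default metric is Euclidean distance on the feature vectors, which is only a stand-in: supplying the task-specific metric $ d $ is exactly the hard part noted above.

```python
import numpy as np
from itertools import combinations

def lipschitz_violations(X, scores, L=1.0, d=None):
    """Pairs (i, j) whose score difference exceeds L * d(x_i, x_j).

    The default d is Euclidean distance on the feature vectors, a placeholder
    for the task-specific similarity metric the definition actually requires."""
    if d is None:
        d = lambda u, v: float(np.linalg.norm(u - v))
    return [
        (i, j)
        for i, j in combinations(range(len(X)), 2)
        if abs(scores[i] - scores[j]) > L * d(X[i], X[j])
    ]
```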

    Impossibility Results

    A celebrated set of results shows that the major group fairness criteria are mutually incompatible except in degenerate cases. If base rates differ across groups, $ P(Y = 1 \mid A = 0) \neq P(Y = 1 \mid A = 1) $, then no non-trivial classifier can simultaneously satisfy calibration within groups and equalized odds. Versions of this result appear in Chouldechova's analysis of the COMPAS recidivism tool and in Kleinberg, Mullainathan, and Raghavan's broader inherent trade-offs theorem.[1][2] The implication is that a designer must choose which property to enforce, since enforcing one will violate the others whenever populations differ in their underlying outcome rates.
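
    The arithmetic behind the incompatibility can be made concrete. From the identity $ \mathrm{FPR} = \frac{p}{1-p} \cdot \frac{1 - \mathrm{PPV}}{\mathrm{PPV}} \cdot \mathrm{TPR} $, where $ p $ is a group's base rate, fixing equal true positive rates and equal positive predictive values across groups forces the false positive rates apart whenever base rates differ. The numbers below are illustrative only.

```python
def implied_fpr(base_rate, tpr, ppv):
    """False positive rate forced by the identity
    FPR = base_rate/(1 - base_rate) * (1 - PPV)/PPV * TPR."""
    return (base_rate / (1.0 - base_rate)) * ((1.0 - ppv) / ppv) * tpr

# Equal TPR and equal PPV across groups, but different base rates:
tpr, ppv = 0.7, 0.6
print(implied_fpr(base_rate=0.3, tpr=tpr, ppv=ppv))  # 0.20
print(implied_fpr(base_rate=0.5, tpr=tpr, ppv=ppv))  # ≈ 0.47
# The false positive rates cannot match, so equalized odds must fail.
```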

    This result has been read both as a technical curiosity and as a fundamental constraint. It does not say that fair classification is impossible; it says that "fairness" is not a single thing, and that statements like "the model is fair" must be relativized to a specific criterion chosen for specific reasons.

    Mitigation Techniques

    Algorithmic interventions are usually grouped by where in the pipeline they act. Pre-processing methods reweight or transform the training data so that learned correlations between protected attributes and labels are reduced; reweighing, fair representations, and disparate impact remover fall in this class. In-processing methods modify the training objective itself, adding a fairness regularizer or imposing the fairness criterion as a constraint; adversarial debiasing trains an adversary to predict the protected attribute from the model's representations and the main model to defeat it. Post-processing methods leave the trained scorer untouched and adjust group-specific decision thresholds to satisfy a chosen criterion; the Hardt, Price, and Srebro construction for equalized odds is the canonical example.[3]
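
    As one concrete illustration of the pre-processing family, the sketch below computes reweighing-style instance weights in the spirit of Kamiran and Calders; it assumes binary labels and groups with every (group, label) cell observed at least once, and it is a simplification rather than a reference implementation.

```python
import numpy as np

def reweighing_weights(y, a):
    """Instance weights w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y).

    In the reweighted training distribution Y and A are statistically
    independent, so a downstream learner sees no label-group correlation.
    Assumes every (group, label) cell is non-empty."""
    weights = np.empty(len(y), dtype=float)
    for g in np.unique(a):
        for label in np.unique(y):
            cell = (a == g) & (y == label)
            weights[cell] = (np.mean(a == g) * np.mean(y == label)) / cell.mean()
    return weights  # pass as sample_weight to any downstream estimator
```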

    Each location in the pipeline involves trade-offs. Pre-processing is portable across downstream models but loses information that may be useful for prediction. In-processing can yield the best accuracy-fairness frontier but requires retraining. Post-processing is cheap and auditable but requires the protected attribute at decision time, which may itself be illegal or undesirable.

    Limitations and Critiques

    The formal apparatus of algorithmic fairness has been criticized along several axes. The most basic critique is that statistical parity criteria treat the protected attribute as a fixed, observable category, when in fact race, gender, and disability are socially constructed, contextually performed, and unstably measured. A more structural critique notes that any criterion that compares outcomes across groups while taking the prediction task itself as given will leave untouched the larger question of whether the prediction task should exist; "fair" pretrial detention scoring, for example, may still entrench mass detention.

    The field has also been criticized for an excessive focus on binary classification with two protected groups, neglecting intersectional subgroups (where worst-case disparities are typically worse than any single-axis analysis suggests), regression, ranking, and generative models. Recent work on multi-calibration and multi-accuracy generalizes calibration to a rich set of overlapping subgroups, and fairness in large language models has emerged as a domain in its own right.
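
    A rough sketch of what a multi-calibration audit might check is given below; it only flags violations over a supplied collection of possibly overlapping subgroups and does not implement the iterative correction procedure from the literature. The subgroup masks and tolerance $ \alpha $ are assumptions of this example.

```python
import numpy as np

def multicalibration_violations(y_true, scores, subgroups, n_bins=10, alpha=0.05):
    """(subgroup, bin, gap) triples where the mean outcome drifts from the mean score.

    `subgroups` maps a name to a boolean mask over the data; masks may overlap,
    e.g. intersections of several protected attributes."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    violations = []
    for name, mask in subgroups.items():
        bin_idx = np.digitize(scores[mask], edges[1:-1])
        for b in range(n_bins):
            cell = (bin_idx == b)
            if not cell.any():
                continue
            gap = abs(y_true[mask][cell].mean() - scores[mask][cell].mean())
            if gap > alpha:
                violations.append((name, b, gap))
    return violations
```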

    Relation to Related Fields

    Algorithmic fairness is adjacent to but distinct from privacy, robustness, and interpretability. Differential Privacy gives formal guarantees about what an adversary can learn from a model's outputs, and can interact non-trivially with fairness: noise added for privacy can disproportionately degrade accuracy for small subgroups. Robustness to distribution shift is related because fairness can be reframed as performance parity across subpopulations defined by the protected attribute. Interpretability is often invoked as a pathway to fairness — a transparent model is auditable — but transparency is neither necessary nor sufficient for fair outcomes.

    References

    1. Chouldechova, A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments, Big Data, 2017.
    2. Kleinberg, J., Mullainathan, S., Raghavan, M. Inherent Trade-Offs in the Fair Determination of Risk Scores, arXiv preprint, 2016.
    3. Hardt, M., Price, E., Srebro, N. Equality of Opportunity in Supervised Learning, arXiv preprint, 2016.