In The Lancet Digital Health, Hannah Knight and colleagues¹ highlight stages in the data science pipeline that are affected by, and can lead to, racism. Data linkage is a further stage in which ethnic bias can be encoded into datasets. Ethnic bias occurs when linkage error (false or missed matches) is more likely to occur for particular ethnic groups. The problem of ethnic bias in health data linkage is well described in the literature² and is concerning because health data are widely used for monitoring, service planning, research, evaluation, and policy. Systematic biases in data linkage lead to misestimation of health needs for ethnic minority groups and further entrench existing disadvantages.
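The definition of ethnic bias given above can be made concrete with a minimal sketch that compares linkage error rates between groups. It assumes a gold-standard sample in which the true match status of each candidate record pair is known, which is not something described in this letter. The function name `linkage_error_by_group`, the group labels "A" and "B", and the toy records are all hypothetical and purely illustrative; they are not real data.

```python
from collections import defaultdict


def linkage_error_by_group(pairs):
    """Return missed-match and false-match rates for each group.

    `pairs` is an iterable of (group, is_true_match, was_linked) tuples.
    Assumption: true match status comes from a gold-standard sample.
    """
    counts = defaultdict(
        lambda: {"true_matches": 0, "missed": 0, "links": 0, "false": 0}
    )
    for group, is_true_match, was_linked in pairs:
        c = counts[group]
        if is_true_match:
            c["true_matches"] += 1
            if not was_linked:
                c["missed"] += 1  # a true match the linkage failed to make
        if was_linked:
            c["links"] += 1
            if not is_true_match:
                c["false"] += 1  # a link between records of different people

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # Share of true matches that were not linked.
            "missed_match_rate": (
                c["missed"] / c["true_matches"] if c["true_matches"] else None
            ),
            # Share of links made that are not true matches.
            "false_match_rate": (
                c["false"] / c["links"] if c["links"] else None
            ),
        }
    return rates


# Made-up toy records: (group, is_true_match, was_linked).
toy_pairs = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", True, True),
    ("A", False, False),
    ("B", True, True), ("B", True, True), ("B", True, False), ("B", True, False),
    ("B", False, True),
]

rates = linkage_error_by_group(toy_pairs)
for group in sorted(rates):
    print(group, rates[group])
```

In this toy sample, group "B" has higher missed-match and false-match rates than group "A"; a gap of that kind between groups is what the text describes as ethnic bias in linkage.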
The Lancet Digital Health, Volume 3, June 2021,