Study explores face detection bias in race and gender
13 February 2018 16:04 GMT

A new study has found that face recognition software built with machine learning often assigns the wrong gender to faces, a failure the authors trace to racial bias in the software's training datasets.

The study, by researchers at MIT and Microsoft, examines how poorly facial-recognition software performs at identifying darker faces, especially when those faces belong to women. In a test to identify the gender of people from their faces, the software was accurate more than 99% of the time for light-skinned men. For darker-skinned women, the software could be wrong as frequently as one-third of the time.

The results shed more light on a known problem—how limited data sets can impact the effectiveness of artificial intelligence, which might in turn heighten bias against individuals as AI becomes more widespread.

In the paper, Joy Buolamwini of the MIT Media Lab and Timnit Gebru of Microsoft Research discussed the results of a software evaluation carried out in April and May of last year. For it, they gathered a database of 1,270 faces, drawn from images of lawmakers in countries with high percentages of women in power. Three of the countries were in Africa, while three were Nordic countries.

The researchers then ran the images against facial recognition software from three providers—IBM, Microsoft, and China's Megvii (identified in the paper by the name of its software, Face++)—to assess how accurately each recognized the gender of the person pictured. (The researchers said they worked with binary gender classifications because of limitations with the data they were working with.)

The team discovered that all three companies were more likely to correctly identify a subject as male or female if the subject had pale skin.

Past research has also shown that the accuracy of face recognition systems used by US-based law enforcement is systematically lower for people labeled female, Black, or aged between 18 and 30 than for other demographic cohorts.

"Inclusive benchmark datasets and subgroup accuracy reports will be necessary to increase transparency and accountability in artificial intelligence," write the authors. 
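A "subgroup accuracy report" of the kind the authors call for simply means breaking a model's overall accuracy down by demographic group rather than reporting a single aggregate number. A minimal sketch of that idea, with entirely made-up toy data and illustrative group labels (none of this is the paper's code or dataset):

```python
from collections import defaultdict

def subgroup_accuracy(records):
    """Compute classification accuracy separately for each subgroup.

    `records` is a list of (subgroup, true_label, predicted_label)
    tuples; the structure is illustrative, not taken from the study.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, pred in records:
        total[group] += 1
        if pred == truth:
            correct[group] += 1
    # Per-group accuracy: fraction of correct predictions in each group
    return {group: correct[group] / total[group] for group in total}

# Toy data illustrating the kind of gap the study reports: a model
# that is perfect on one subgroup but errs on another.
records = [
    ("lighter_male", "male", "male"),
    ("lighter_male", "male", "male"),
    ("darker_female", "female", "male"),   # misclassification
    ("darker_female", "female", "female"),
    ("darker_female", "female", "female"),
]
print(subgroup_accuracy(records))
# An aggregate accuracy (4/5 here) would hide that the error is
# concentrated in a single subgroup.
```

Reporting the per-group breakdown is what makes a disparity like the study's 99%-versus-two-thirds gap visible at all.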

"The lack of datasets that are labeled by ethnicity limits the generalizability of research exploring the impact of ethnicity on gender classification accuracy," they add.