How to detect gender bias in your algorithm ...
Several research papers point to biases in algorithms and Machine Learning. In the broader sense of AI, this is a huge and challenging topic. In the narrow sense, we can detect potential biases in several algorithms such as Logistic Regression (a classification algorithm).
Once the algorithm has been trained, the coefficients (the learned feature weights) of the Logistic Regression can alert us to potential biases.
For example, let's say we get the following coefficients:
Age (normalized): 2.56
Female: 0.05
Male: -0.15
In the coefficients, we can clearly see that the feature "age" has a large absolute value and is therefore a strong indicator for classification. The sign only tells us the direction of the effect; the magnitude tells us its strength. Of course, in many services, age might be a legitimate factor. For example, I would not be surprised to see that Snapchat users churn more as age increases.
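To make this reading of the coefficients concrete, here is a minimal sketch in Python. It ranks the example coefficients by absolute value and converts each one into an odds ratio. It assumes the coefficients have already been read from a trained model; the variable names are illustrative only.

```python
import math

# Example coefficients from the text (feature name -> coefficient).
coefficients = {
    "Age (normalized)": 2.56,
    "Female": 0.05,
    "Male": -0.15,
}

# Rank features by the absolute size of their coefficient: the sign gives
# the direction of the effect, the magnitude gives its strength.
ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)

# In logistic regression, exp(coefficient) is the factor by which the odds
# change when the feature increases by one unit.
odds_ratios = {name: math.exp(value) for name, value in coefficients.items()}

for name, value in ranked:
    print(f"{name:<18} coef={value:+.2f}  odds ratio={odds_ratios[name]:.2f}")
```

Comparing magnitudes like this is only meaningful when the features are on comparable scales, which is why the age feature is normalized here.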
In some situations, algorithmic bias can become a problem. Here, we want to see if the algorithm favors men or women.
With coefficients close to zero for both male and female, it seems that the algorithm does not favor one sex over the other.
But what should we do if we do discover bias in the algorithm? One potential way is to visualize it. Take an example in which the algorithm has a bias towards age and gender (male). We can set a threshold and color-code the coefficients.
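The threshold and color-coding step could be sketched as follows. The function name `color_code`, the threshold of 0.5, and the red/green labels are all assumptions made for illustration; a sensible threshold depends on your data and on how the features are scaled.

```python
def color_code(coefficients, threshold):
    """Map each feature to 'red' if |coefficient| exceeds the threshold, else 'green'."""
    return {
        name: "red" if abs(value) > threshold else "green"
        for name, value in coefficients.items()
    }


# Example coefficients from the text; the threshold is an arbitrary choice.
coefficients = {"Age (normalized)": 2.56, "Female": 0.05, "Male": -0.15}
colors = color_code(coefficients, threshold=0.5)

for name, color in colors.items():
    print(f"{name:<18} {coefficients[name]:+.2f}  {color}")
```

The resulting labels can be passed as bar colors to a plotting library of your choice, so that every coefficient above the threshold stands out in the chart.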
The visualization combined with the color-coding shows that there's a potential problem with the algorithm. It can help us discuss the issue internally and decide how to proceed. From a Data Science perspective, we could additionally compare the accuracy of the algorithm with and without the biased features in order to make better data-driven decisions.
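One way to run such a comparison is sketched below with scikit-learn. The data is synthetic and generated only for illustration, and the helper `holdout_accuracy` is a hypothetical name. The sketch trains one model with the gender columns and one without, then compares their accuracy on the same held-out split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): the label depends
# strongly on age and only slightly on the "male" indicator.
rng = np.random.default_rng(0)
n = 1000
age = rng.normal(size=n)
female = rng.integers(0, 2, size=n)
male = 1 - female
logits = 2.5 * age - 0.2 * male
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

X = np.column_stack([age, female, male])  # columns: age, female, male


def holdout_accuracy(X, y, columns):
    """Train on the selected columns and return accuracy on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X[:, columns], y, test_size=0.3, random_state=0
    )
    model = LogisticRegression().fit(X_train, y_train)
    return model.score(X_test, y_test)


acc_with_gender = holdout_accuracy(X, y, [0, 1, 2])
acc_without_gender = holdout_accuracy(X, y, [0])

print(f"accuracy with gender features:    {acc_with_gender:.3f}")
print(f"accuracy without gender features: {acc_without_gender:.3f}")
```

If the two numbers are close, dropping the gender features costs little predictive power, which is a useful input for the internal discussion.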