Exploring Gender Bias in Machine Learning Algorithms: A Linguistic Examination
Keywords:
gender bias; machine learning; natural language processing; word embeddings; algorithmic fairness.

Abstract
This study examines how gender bias manifests in language-based AI systems through two approaches: word embeddings (a natural language processing technique that represents words as numerical vectors) and machine translation. Drawing on sociolinguistics and algorithmic fairness (the principle that automated decisions should be equitable, transparent, and socially responsible), it asks whether these systems reproduce or amplify pre-existing gender stereotypes. The word embedding analysis reveals clear patterns: male-associated words cluster with career, science, and power, while female-associated words cluster with family, appearance, and emotion. Machine translation shows a strong tendency toward masculine renderings; for example, of 1,000 career-related translations in STEM (science, technology, engineering, and mathematics) contexts, only 4% used female subject pronouns while 72% used male subject pronouns. Overall, the results indicate that both word embeddings and AI-based translation exhibit gender bias, often more pronounced than the corresponding gender trends in the real world. The study highlights three key findings: (1) word embeddings reinforce subtle gender stereotyping, (2) translation systems default to masculine forms, and (3) both technologies reinforce representations of existing social inequality. The authors recommend combining interdisciplinary perspectives with quantitative techniques and incorporating more diverse data to better understand how language encodes social hierarchies. By detecting these biases, the research aims to support the design of more transparent and less biased AI systems.
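As a minimal illustration of the kind of embedding association test described above (a sketch, not the study's own code), the following Python snippet scores words by their relative cosine similarity to male versus female anchor words in pretrained GloVe vectors via gensim. The model name, anchor lists, and probe words are illustrative assumptions, not the word lists used in the study.

```python
import gensim.downloader as api
import numpy as np

# Load small pretrained GloVe vectors from gensim-data
# (illustrative choice; the study's embedding model is not specified here).
model = api.load("glove-wiki-gigaword-50")

# Gender anchor words (assumed lists for demonstration purposes).
male_anchors = ["he", "man", "his", "male"]
female_anchors = ["she", "woman", "her", "female"]

def gender_association(word: str) -> float:
    """Mean cosine similarity to male anchors minus mean similarity
    to female anchors. Positive = male-leaning, negative = female-leaning."""
    male_sim = np.mean([model.similarity(word, a) for a in male_anchors])
    female_sim = np.mean([model.similarity(word, a) for a in female_anchors])
    return male_sim - female_sim

# Probe words spanning career/science and family/appearance domains
# (illustrative, not the study's curated sets).
for w in ["engineer", "scientist", "executive", "nurse", "family", "beautiful"]:
    print(f"{w:>10s}: {gender_association(w):+.3f}")
```

In published analyses such as WEAT, raw similarity differences of this kind are aggregated over curated word sets and tested for statistical significance rather than read off individual words.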