
Project 5: Crowd-Audit Platforms to Quantify Biases in ML-based Systems

Description: Several open-source libraries have been developed to evaluate biases in recommendations and to build trustworthy machine learning (ML) based systems [MMS+21]. However, these fair-AI platforms evaluate traditional fairness notions, such as individual and group fairness, that rely on knowledge of diverse factors (e.g., contextually sensitive private attributes, or distance metrics that quantify similarity between people), which makes them difficult to apply to practical applications [MZP21]. On the other hand, such ML-based systems cannot be evaluated by human auditors alone due to their sheer complexity [SHK19]. Therefore, it is necessary to develop an audit platform that leverages human-system teaming (e.g., a crowd-audit framework). In our preliminary work [TN21], we proposed a novel fairness notion based on the principles of non-comparative justice, where an auditor evaluates the classifier's outcome by comparing it to an intrinsically desired outcome. We showed that any AI system can be deemed fair in terms of any traditional fairness notion if it is non-comparatively fair with respect to a fair auditor. Therefore, in order to identify fair auditors, there is a need to estimate auditor biases and to identify fair and reliable auditors.

Specifically, the goal of this project is to design and develop a reliable and sustainable crowd-audit platform. Students will first develop a web-application-based crowd-auditing platform that evaluates classifier biases using crowd-auditors' judgments of input tuples and the corresponding output labels generated by the classifier. Furthermore, each auditor's performance will be tracked, and a reputation badge can be designed based on the estimated bias of that auditor in order to promote the participation of unbiased auditors. By developing such a platform, students will learn how established crowdsourcing platforms such as Amazon Mechanical Turk work, in addition to developing novel features for evaluating biases in both classifiers and auditors. Students will also learn how the privacy of human auditors can be preserved by protecting any sensitive information inferred from their evaluation data.
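As a rough illustration of the auditor-tracking idea above, the following minimal Python sketch estimates an auditor's bias and reliability from benchmark items with known desired outcomes and assigns a reputation badge. All names, thresholds, and the simple rate-gap bias score are our own assumptions for illustration, not the platform's actual design.

from dataclasses import dataclass

@dataclass
class AuditorRecord:
    # Hypothetical per-auditor record: judgments on benchmark items with known desired outcomes.
    auditor_id: str
    judgments: list[int]  # auditor's labels (0/1) on benchmark items
    desired: list[int]    # intrinsically desired outcomes for the same items
    group: list[int]      # protected-group membership (0 = unprivileged, 1 = privileged) per item

def estimate_bias(rec: AuditorRecord) -> float:
    # Toy bias score: gap in the auditor's positive-label rate across protected groups.
    # A score near 0 suggests an unbiased auditor under this assumed definition.
    def pos_rate(g: int) -> float:
        labels = [j for j, grp in zip(rec.judgments, rec.group) if grp == g]
        return sum(labels) / max(1, len(labels))
    return pos_rate(1) - pos_rate(0)

def estimate_reliability(rec: AuditorRecord) -> float:
    # Fraction of benchmark items on which the auditor matches the desired outcome.
    return sum(j == d for j, d in zip(rec.judgments, rec.desired)) / len(rec.judgments)

def reputation_badge(bias: float, threshold: float = 0.1) -> str:
    # Assumed badge rule: auditors whose |bias| falls below the threshold earn a "trusted" badge.
    return "trusted" if abs(bias) < threshold else "unverified"

rec = AuditorRecord("a1", judgments=[1, 0, 1, 1], desired=[1, 0, 1, 0], group=[1, 0, 0, 1])
print(estimate_bias(rec), estimate_reliability(rec), reputation_badge(estimate_bias(rec)))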

Sample Design Experiment: Performance evaluation of the crowd-audit platform's sustainability, evaluation of auditors, and quantification of biases in ML-based systems.

Purpose: To let students learn how crowd-auditing platforms work via (i) building a web-application-based platform from scratch, (ii) aggregating crowd-auditors' feedback to improve inference performance while simultaneously neutralizing each auditor's intrinsic biases, and (iii) quantifying individual auditors' biases in the evaluation of various recommender systems and/or data-driven interventions.
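For item (ii), one simple stand-in for a principled aggregation scheme (e.g., Dawid-Skene style estimators) is a reliability-weighted majority vote, sketched below under assumed inputs; the weights here are hypothetical reliability estimates, not the project's actual aggregation rule.

import numpy as np

def aggregate_judgments(judgments: np.ndarray, reliabilities: np.ndarray) -> np.ndarray:
    # Reliability-weighted majority vote over binary judgments.
    # judgments: (n_auditors, n_items) matrix of 0/1 labels.
    # reliabilities: per-auditor weights in [0, 1], e.g., agreement with benchmark items.
    # Returns one aggregated 0/1 label per item.
    weights = reliabilities / reliabilities.sum()
    scores = weights @ judgments  # weighted fraction voting "1" for each item
    return (scores >= 0.5).astype(int)

# Toy usage: three auditors, four items.
J = np.array([[1, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 0, 1, 1]])
R = np.array([0.9, 0.8, 0.4])  # assumed per-auditor reliability estimates
print(aggregate_judgments(J, R))  # -> [1 0 1 1]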

Method: Students will first learn about various algorithmic fairness notions and how to implement a crowd-audit platform using cloud services such as AWS SageMaker Clarify and established toolkits such as AI Fairness 360, and will learn their limitations in evaluating bias in different application contexts. Then, different types of auditors will be evaluated on benchmark datasets to estimate their biases, after which the same auditors will be asked to evaluate an unknown classifier. In both cases, students will have the opportunity to qualify auditors in terms of their biases with respect to diverse protected attributes and to learn their contextual significance.
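For instance, a group-fairness metric such as statistical parity difference can be computed with AI Fairness 360 roughly as follows; the toy DataFrame and the protected attribute name used here are illustrative only, and the actual benchmark datasets and attributes will differ.

import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy data: classifier decisions ('label') and a binary protected attribute ('sex').
df = pd.DataFrame({
    "sex":   [0, 0, 0, 1, 1, 1],
    "score": [0.2, 0.6, 0.4, 0.7, 0.9, 0.8],
    "label": [0, 1, 0, 1, 1, 1],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["label"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"sex": 1}],
    unprivileged_groups=[{"sex": 0}],
)

# Statistical parity difference: P(favorable | unprivileged) - P(favorable | privileged).
print("Statistical parity difference:", metric.statistical_parity_difference())
print("Disparate impact:", metric.disparate_impact())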

Input Parameters: Data tuples, outcome labels from an unverified classifier, and crowd-auditors’ judgments.

Output Parameters: A measure of bias in the given classifier.
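A minimal sketch of how these inputs and outputs might be represented in the platform's code is given below; the field names and the disagreement-based output measure are assumptions for illustration, tying the classifier's labels to the aggregated auditor judgments that serve as the intrinsically desired outcomes.

from dataclasses import dataclass

@dataclass
class AuditBatch:
    # Assumed input schema: one entry per data tuple.
    features: list[dict]          # data tuples
    classifier_labels: list[int]  # outcome labels from the unverified classifier
    auditor_labels: list[int]     # aggregated crowd-auditor judgments (desired outcomes)
    group: list[int]              # protected-group membership of each tuple

def classifier_bias(batch: AuditBatch) -> dict[int, float]:
    # Toy output measure: per-group rate at which the classifier's label disagrees with the
    # aggregated auditor judgment. Large gaps across groups suggest a biased classifier.
    out = {}
    for g in set(batch.group):
        idx = [i for i, grp in enumerate(batch.group) if grp == g]
        out[g] = sum(batch.classifier_labels[i] != batch.auditor_labels[i] for i in idx) / len(idx)
    return out

batch = AuditBatch(
    features=[{}] * 4,
    classifier_labels=[1, 0, 1, 0],
    auditor_labels=[1, 1, 1, 0],
    group=[0, 0, 1, 1],
)
print(classifier_bias(batch))  # -> {0: 0.5, 1: 0.0}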

Project Deliverables: Experimental results, modifications to algorithms and their implementations, and possible publications.
