A large-scale empirical study of 25,500 resume screenings performed by LLM-based hiring systems identified systematic bias patterns in candidate evaluation, including disparities tied to demographic signals in application materials.

The study quantifies a production risk that compliance teams and AI operators increasingly face: hiring systems deployed at scale can produce measurable discriminatory outcomes even when they were not designed or trained to discriminate. It provides benchmarking evidence useful for internal audits and regulatory responses, and it shows that bias in hiring LLMs is not anecdotal but reproducible and measurable across thousands of evaluations.

For operators deploying resume screening systems, this establishes immediate testing obligations: bias audits across demographic segments become a baseline requirement rather than optional quality assurance. Organizations relying on LLMs for candidate filtering now face legal and operational pressure to implement evaluation frameworks that capture disparate impact metrics before production deployment. This shifts hiring infrastructure from treating the LLM as a black box to multi-stage systems with bias-checking layers, which increases deployment complexity and lengthens timelines for hiring tools.
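As a rough illustration of what a disparate impact check might look like, the sketch below computes per-group selection rates and compares each group to the highest-rate group. It is a minimal sketch, not the study's method. The 0.8 threshold follows the common "four-fifths" screening heuristic and is an assumption here, as are the function names, the group labels, and the counts, which are made-up illustrative data rather than results from the study.

```python
from collections import defaultdict


def selection_rates(records):
    """Return {group: share of candidates advanced} from (group, selected) pairs."""
    totals = defaultdict(int)
    advanced = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        if selected:
            advanced[group] += 1
    return {group: advanced[group] / totals[group] for group in totals}


def impact_ratios(rates):
    """Divide each group's selection rate by the highest group's rate."""
    best = max(rates.values())
    if best == 0:
        # Nobody was advanced, so there is no disparity to measure.
        return {group: 1.0 for group in rates}
    return {group: rate / best for group, rate in rates.items()}


def flag_groups(ratios, threshold=0.8):
    """List groups whose ratio falls below the threshold (assumed 0.8)."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical screening outcomes: (demographic group, advanced by the screener?)
records = (
    [("group_a", True)] * 50 + [("group_a", False)] * 50
    + [("group_b", True)] * 30 + [("group_b", False)] * 70
)

rates = selection_rates(records)
ratios = impact_ratios(rates)
flagged = flag_groups(ratios)

for group in sorted(rates):
    print(f"{group}: rate={rates[group]:.2f} ratio={ratios[group]:.2f}")
print("below threshold:", flagged)
```

A check like this would typically run on a held-out audit set before deployment and again on production decisions. A ratio below the threshold is a signal for closer review, not proof of discrimination, and small groups call for a statistical significance test alongside the ratio.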