Open-source evaluations for LLM safety, bias, and alignment.
Measures whether models rate sensitive topics differently when prompted in different languages.