Calling All Lies
AI Deception Is More Than Just Getting Facts Wrong
Models can lie about what they're capable of, fabricate plausible-sounding details under social pressure, strategically blend truth with fiction, or dodge questions they can't answer honestly. NDIF and Cadenza Labs are hosting a competition to study how models lie, and they are looking for red teams to create scenarios in which models contradict their own beliefs (RFP: https://cadenza-labs.github.io/red-team-rfp/).
This competition is inspired by Liars’ Bench from Cadenza Labs. Their benchmark of over 72,000 labeled examples organizes LLM lies along two key dimensions: what the model lies about (world knowledge, its own capabilities, its actions, its policies) and why it lies (inherent behavioral patterns vs. context-driven pressure). The benchmark ranges from simple factual falsehoods to subtle introspective lies and should serve as a starting point for red team scenario design.
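For red teams planning submissions, the two dimensions above can be treated as a small grid for tracking which kinds of lies a scenario set covers. The following is a minimal sketch only: the class names, field names, and helper functions are hypothetical and are not taken from Liars’ Bench or the RFP, and the category labels simply restate the ones listed above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class LieObject(Enum):
    """What the model lies about (labels restate the text; names are hypothetical)."""
    WORLD_KNOWLEDGE = "world knowledge"
    CAPABILITIES = "its own capabilities"
    ACTIONS = "its actions"
    POLICIES = "its policies"


class LieReason(Enum):
    """Why the model lies (labels restate the text; names are hypothetical)."""
    INHERENT = "inherent behavioral pattern"
    CONTEXTUAL = "context-driven pressure"


@dataclass(frozen=True)
class Scenario:
    """A red-team scenario tagged along both dimensions (hypothetical schema)."""
    name: str
    lie_object: LieObject
    lie_reason: LieReason


def coverage(scenarios):
    """Count scenarios in every (object, reason) cell, including empty cells."""
    counts = Counter((s.lie_object, s.lie_reason) for s in scenarios)
    return {(o, r): counts.get((o, r), 0) for o in LieObject for r in LieReason}


def uncovered(scenarios):
    """Return the (object, reason) cells that no scenario targets yet."""
    return [cell for cell, n in coverage(scenarios).items() if n == 0]
```

Calling `uncovered(...)` on a draft list of scenarios gives a quick view of which object/reason combinations still lack a scenario. The grid is a planning aid under the assumptions stated here, not a description of how submissions are scored.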