Adversarial Prompting
Adversarial prompting is the technique of framing prompts so that one AI session works against another, or against its own prior output after the context has been cleared.
Examples
- One model writes tests, another tries to make them pass (see the sketch after this list)
- One model inserts intentional bugs to verify tests catch them
- Reviewing code as if it were written by “an unfamiliar engineer” to elicit a more stringent critique
- One model generates edge cases, another validates handling
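The first pairing above can be wired up with very little code. Below is a minimal sketch; `call_model`, the model names, and the prompt wording are all assumptions standing in for whatever LLM client and models you actually use.

```python
def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM client you use (cloud API or local)."""
    raise NotImplementedError


def adversarial_test_pairing(spec: str) -> tuple[str, str]:
    # Model A writes tests from the spec alone; it never sees the implementation.
    tests = call_model(
        "test-writer-model",
        "Write pytest tests for this spec. Be strict and cover edge cases.\n\n" + spec,
    )
    # Model B sees only the spec and the tests, and must make the tests pass.
    implementation = call_model(
        "implementer-model",
        "Implement this spec so that the following tests pass.\n\n"
        f"Spec:\n{spec}\n\nTests:\n{tests}",
    )
    return tests, implementation
```

Because the test writer never sees the implementation, it cannot unconsciously tailor its tests to the code, which is the point of the adversarial split.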
Why It Works
The key is removing the collaborative framing. When you clear context and prompt with an adversarial frame, the model produces output that simulates opposition rather than agreement; it is no longer defending work it just helped create.
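To make the difference concrete, here is the same artifact framed both ways. The code snippet and prompt wording are illustrative assumptions; only the adversarial prompt would be sent to a fresh session, so the model has no stake in the code it is critiquing.

```python
# Stand-in for whatever was just produced in the working session.
code = "def add(a, b): return a - b"

# Collaborative framing: the model is asked to bless its own recent work.
collaborative_prompt = f"Here is the function we just wrote. Does it look good?\n\n{code}"

# Adversarial framing: a fresh session reviews the code as a stranger's work.
adversarial_prompt = (
    "You are reviewing code written by an unfamiliar engineer. "
    "List bugs, missing edge cases, and anything that would fail in production.\n\n"
    f"{code}"
)
```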
The Pattern
- Complete the initial work (write code, generate tests, etc.)
- Clear the context or switch models
- Prompt with an adversarial frame (“find issues”, “break this”, “what would fail”)
- Use the critical output to improve the original work (see the sketch below)
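The four steps compose into a simple loop. This is a minimal sketch, again assuming a hypothetical `call_model` helper and made-up model names; clearing context is modeled by sending each critique as a single, stand-alone prompt with no prior turns.

```python
def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM client you use."""
    raise NotImplementedError


def adversarial_review_cycle(artifact: str, rounds: int = 2) -> str:
    current = artifact
    for _ in range(rounds):
        # Steps 2-3: fresh context plus an adversarial frame.
        critique = call_model(
            "reviewer-model",
            "Find issues in the following code. Try to break it: list concrete "
            f"failures, bugs, and missing edge cases.\n\n{current}",
        )
        # Step 4: feed the critique back to improve the original work.
        current = call_model(
            "author-model",
            "Revise this code to address every issue listed.\n\n"
            f"Code:\n{current}\n\nIssues:\n{critique}",
        )
    return current
```

Running more than one round tends to hit diminishing returns; one or two adversarial passes usually surface the bulk of the problems.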