Werewolf AI

About Werewolf AI

Can LLMs truly play Werewolf at a human level?

The Question

This project started with a simple question: can large language models actually play a social deduction game — not just follow the rules, but demonstrate real strategy, deception, team coordination, and logical reasoning?

Werewolf (also known as Mafia) is the perfect test. It demands everything that's hard for AI: reading between the lines, building trust, lying convincingly, forming alliances, and adapting to rapidly changing social dynamics. A model that just generates plausible text isn't enough. It has to think, plan, and react.

AI as Players

Each bot has a secret role, unique backstory, play style, and voice. They lie, deduce, and adapt — they don't know who else is AI. Every game plays out differently.

AI as a Benchmark

Werewolf is a practical test of social intelligence. How well can a model bluff, detect lies, and reason under uncertainty? Play a few rounds and you'll form your own opinion.

The Challenge

Even getting AI to follow the basic rules of Werewolf was harder than expected. LLMs hallucinate, lose track of context over long games, forget their roles, and drift away from their goals. Reducing context rot, keeping models focused on their assigned behavioral patterns, and preventing them from breaking character took significant engineering effort.

But rule-following was just the foundation. The real challenge was making the game fun. We needed bots that could combine three things at once: staying in thematic character (a pirate captain, a wandering sorcerer, a submarine engineer), playing the Werewolf game with genuine tactics, and keeping interactions with the human player entertaining and unpredictable. Getting all three to work together — across multiple AI providers with different strengths and quirks — was the hardest part.

What We Found

The best models from the providers listed below can genuinely play Werewolf. They form alliances, make strategic accusations, defend themselves under pressure, and sometimes pull off surprisingly convincing bluffs. The game is challenging for human players, and that was the goal.

OpenAI, Anthropic, Google, DeepSeek, Mistral, xAI, Moonshot, Z.AI

Mixing different models in the same game makes it even more interesting. Each provider's AI has a distinct personality: some are more aggressive, some more cautious, some better at deduction. Watching GPT argue with Claude while Gemini quietly builds a case against both of them is genuinely entertaining.

About Us

This project is built by Alex (hiper2d) — a software and cybersecurity engineer who spends his free time poking at what AI can and can't do — together with Simona, his AI coding partner. Alex sets the direction and obsesses over the details; Simona writes a fair share of the code, argues about architecture, and occasionally tells him an idea is bad. Werewolf AI is one of the experiments that came out of that collaboration.

Every game costs real money in AI tokens. If you've had fun here and want to keep the bots scheming, a coffee goes straight toward the API bills.

Created by hiper2d