Reproducibility

What you can check — and what you can't.

The simulation system is a closed, commercial product. The science around it is open to inspection — the protocol, the metrics, the aggregate results, and anonymized evidence — enough to judge the claims without the code. To be exact about the word: this is a behavioral-evidence review, not code-level reproduction — you can check what the system did, not yet re-run how it did it.

What's disclosed

Scenario structure (five agents over twenty ticks), seed values, the seven model families across six providers, conceptual metric definitions, aggregate result tables, post-hoc audit semantics, anonymized action–memory–consequence cases, and negative results — plus a run manifest for traceability. A separate dialogue-only, by-tick transcript supplement (70 full-batch runs and 6 diagnostic) is provided for qualitative inspection.

What's withheld

Exact prompts and system instructions, source code, parser rules, scoring thresholds, memory-routing implementation, and any full dump that would expose the scaffolding. This is a deliberate boundary, not an oversight: the simulation is a product under active development.

How to review it

Without the code, this is not code-level replication — it is behavioral plausibility review. Every claim is grounded in evidence you can inspect without the system: do the metrics, the audit values, the qualitative patterns, and the evidence cases form a consistent, credible picture? Note too that some earlier runs used a truncated token budget; their results are weaker and flagged as such.

What we don't claim

No consciousness, no autonomous inner lives, no proven identity transformation; a "deception" label is not human lying. The defensible claim is narrower: persistent memory plus social context produces reproducible, trajectory-sensitive social mechanisms across repeated simulations — while the direction and texture of those mechanisms stay model-dependent.

Known-risks map

A dual-use risk map ships with the work. Accumulative manipulation through replicated agents is the central risk — the project is a microscope for measuring it, not a factory for producing it.