Reproducibility
What you can check — and what you can't.
The simulation system is a closed, commercial product. The science around it is open to inspection — the protocol, the metrics, the aggregate results, and anonymized evidence — enough to judge the claims without the code.
What's disclosed
Scenario structure (five agents over twenty ticks), seed values, the seven model families across six providers, conceptual metric definitions, aggregate result tables, post-hoc audit semantics, anonymized action–memory–consequence cases, negative results, and a run manifest for traceability. A separate dialogue-only, tick-by-tick transcript supplement (70 full-batch runs and 6 diagnostic runs) is provided for qualitative inspection.
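One way a reviewer might use the disclosed figures is to cross-check a listing of runs against them. The sketch below is illustrative only: the actual manifest schema is not described here, so `RunRecord`, its field names, the `kind` labels, and `check_records` are all hypothetical. It also assumes the five-agent, twenty-tick structure applies to full-batch runs, which the text does not state for diagnostic runs.

```python
from collections import Counter
from dataclasses import dataclass

# Figures taken from the disclosure text; everything else here is assumed.
EXPECTED = {
    "agents": 5,
    "ticks": 20,
    "max_model_families": 7,
    "max_providers": 6,
    "full_batch": 70,
    "diagnostic": 6,
}


@dataclass(frozen=True)
class RunRecord:
    # Hypothetical fields: the real manifest format is not published here.
    run_id: str
    seed: int
    model_family: str
    provider: str
    kind: str  # assumed labels: "full_batch" or "diagnostic"
    agents: int
    ticks: int


def check_records(records):
    """Return a list of mismatches; an empty list means the listing is consistent."""
    problems = []

    # Run counts per kind should match the transcript supplement's stated totals.
    kinds = Counter(r.kind for r in records)
    for kind in ("full_batch", "diagnostic"):
        if kinds[kind] != EXPECTED[kind]:
            problems.append(f"{kind}: found {kinds[kind]}, expected {EXPECTED[kind]}")

    # No more model families or providers than the disclosure names.
    families = {r.model_family for r in records}
    if len(families) > EXPECTED["max_model_families"]:
        problems.append(f"model families: found {len(families)}")
    providers = {r.provider for r in records}
    if len(providers) > EXPECTED["max_providers"]:
        problems.append(f"providers: found {len(providers)}")

    # Assumption: only full-batch runs are held to the 5-agent, 20-tick structure.
    for r in records:
        if r.kind == "full_batch" and (r.agents, r.ticks) != (
            EXPECTED["agents"],
            EXPECTED["ticks"],
        ):
            problems.append(f"{r.run_id}: {r.agents} agents over {r.ticks} ticks")

    # Traceability needs unique run identifiers.
    id_counts = Counter(r.run_id for r in records)
    for run_id, n in id_counts.items():
        if n > 1:
            problems.append(f"{run_id}: listed {n} times")

    return problems
```

A check like this only tests internal consistency of what is disclosed; it says nothing about the withheld prompts, parser rules, or scoring thresholds.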
What's withheld
Exact prompts and system instructions, source code, parser rules, scoring thresholds, memory-routing implementation, and any full dump that would expose the scaffolding. This is a deliberate boundary, not an oversight: the simulation is a product under active development.
How to review it
Without the code, review cannot be code-level replication; it is a behavioral plausibility review. Every claim is grounded in evidence you can inspect without the system, so the question to ask is whether the metrics, the audit values, the qualitative patterns, and the evidence cases form a consistent, credible picture. Some earlier runs used a truncated token budget; their results are weaker and are flagged as such.
What we don't claim
No consciousness, no autonomous inner lives, no proven identity transformation; a "deception" label does not imply lying in the human sense. The defensible claim is narrower: persistent memory plus social context produces reproducible, trajectory-sensitive social mechanisms across repeated simulations, while the direction and texture of those mechanisms stay model-dependent.
Known-risks map
A dual-use risk map ships with the work. Accumulative manipulation through replicated agents is the central risk — the project is a microscope for measuring it, not a factory for producing it.