Research-first repository for mechanistic interpretability, residual-state analysis, sparse autoencoders, runtime observability, and intervention-aware analysis.
Current claims are scoped to the active GPT-2 Small setup. Cross-model generalization is not yet established. Read-only observer traces and write-back interventions are treated as distinct evidence classes.
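The distinction between the two evidence classes can be illustrated with a minimal sketch. It uses a toy residual state in plain Python, not GPT-2 Small and not this repository's code. Every name in it (`toy_forward`, `Trace`, `make_observer`, `make_zero_ablation`) is hypothetical and introduced only for illustration. The assumption is that an observer hook records the state and returns nothing, while an intervention hook returns a modified state that is written back into the forward pass.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Vector = List[float]
# A hook sees (layer index, copy of the state) and may return a replacement state.
Hook = Callable[[int, Vector], Optional[Vector]]


def toy_forward(x: Vector, n_layers: int, hook: Optional[Hook] = None) -> Vector:
    """Toy stand-in for a residual stream: each layer adds 1.0 to every coordinate."""
    state = list(x)
    for layer in range(n_layers):
        state = [v + 1.0 for v in state]
        if hook is not None:
            replacement = hook(layer, list(state))  # the hook only gets a copy
            if replacement is not None:
                state = list(replacement)  # write-back path
    return state


@dataclass
class Trace:
    evidence_class: str  # "observer" or "intervention" (hypothetical labels)
    records: list = field(default_factory=list)


def make_observer(trace: Trace) -> Hook:
    """Read-only hook: records the state and never returns a replacement."""
    def hook(layer: int, state: Vector) -> Optional[Vector]:
        trace.records.append((layer, state))
        return None
    return hook


def make_zero_ablation(trace: Trace, target_layer: int, index: int) -> Hook:
    """Write-back hook: zeroes one coordinate at one layer and returns the patched state."""
    def hook(layer: int, state: Vector) -> Optional[Vector]:
        trace.records.append((layer, state))
        if layer != target_layer:
            return None
        patched = list(state)
        patched[index] = 0.0
        return patched
    return hook


baseline = toy_forward([0.0, 0.0], n_layers=3)

observer_trace = Trace("observer")
observed = toy_forward([0.0, 0.0], 3, make_observer(observer_trace))

intervention_trace = Trace("intervention")
intervened = toy_forward([0.0, 0.0], 3, make_zero_ablation(intervention_trace, 1, 0))

print(observed == baseline)    # the observer leaves the output unchanged
print(intervened == baseline)  # the intervention changes the output
```

Tagging each trace with its evidence class at creation time keeps the two kinds of result from being pooled later: an observer trace describes what the unmodified model computed, while an intervention trace describes a run that the analysis itself altered.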
Base76 Research Lab (Sweden)