| layer_output | Full layer contribution to residual stream | Overall layer importance |
| attention | Aggregated output of the attention block | Which token-to-token relationships are active |
| attention_conv | Mamba-style convolution in hybrid attention | Local sequential pattern matching |
| attention_qkv | Query/Key/Value projections | How attention queries, keys, and values are formed |
| attention_gate | Attention gating mechanism | Selective attention filtering |
| attention_proj | Attention output projections | Compressed attention output |
| mlp | Feed-forward network block | Knowledge recall and reasoning |
| mlp_gate_up | MLP gated projection (SwiGLU) | Which knowledge neurons fired |
| mlp_down | MLP output down-projection | Compressed reasoning output |
| layernorm_input | Pre-attention normalization | Signal scale into attention |
| layernorm_post | Post-attention normalization | Signal scale into MLP |
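
The component names in the table can be compared against each other once activations have been captured. The following is a minimal sketch, not the tool's actual API: the `COMPONENT_GROUPS` mapping is inferred from the name prefixes in the table, the helpers `l2_norm`, `rank_components`, and `group_of` are hypothetical, and the activation values are made up for illustration. It uses the L2 norm as a rough proxy for how much signal each component carries, which is an assumption rather than a method the table prescribes.

```python
import math

# Component names come from the table; grouping them under parent blocks is an
# assumption based on the name prefixes, not a documented API.
COMPONENT_GROUPS = {
    "attention": ["attention_conv", "attention_qkv", "attention_gate", "attention_proj"],
    "mlp": ["mlp_gate_up", "mlp_down"],
    "layernorm": ["layernorm_input", "layernorm_post"],
}


def l2_norm(values):
    """Euclidean norm of a flat list of activation values."""
    return math.sqrt(sum(v * v for v in values))


def rank_components(activations):
    """Return (name, norm) pairs sorted from largest to smallest norm."""
    norms = {name: l2_norm(vals) for name, vals in activations.items()}
    return sorted(norms.items(), key=lambda kv: kv[1], reverse=True)


def group_of(name):
    """Map a sub-component to its parent block; top-level names map to themselves."""
    for group, members in COMPONENT_GROUPS.items():
        if name in members:
            return group
    return name


# Hypothetical activations for a single token (made-up numbers).
example = {
    "attention_proj": [3.0, 4.0],
    "mlp_down": [6.0, 8.0],
    "layernorm_input": [1.0, 0.0],
}
ranking = rank_components(example)
```

Here `ranking` lists the components from strongest to weakest activation norm, and `group_of("mlp_down")` returns `"mlp"`, which makes it easier to roll sub-component readings up to the block level.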