============================================================
NNsight vs PyTorch Hooks Performance Benchmark
============================================================
Model: openai-community/gpt2
Device: cuda
Warmup runs: 3
Benchmark runs: 10
Layer options: [1, 2, 4, 6, 8, 10, 12]
Token options: [1, 5, 10, 20]
Module types: ['attn', 'mlp', 'both']

Loading models on cuda...

Loading weights:  95%|█████████▍| 140/148 [00:00<00:00, 6181.47it/s, Materializing param=transformer.h.11.ln_2.weight]
Loading weights:  95%|█████████▍| 140/148 [00:00<00:00, 6170.88it/s, Materializing param=transformer.h.11.ln_2.weight]
Loading weights:  95%|█████████▌| 141/148 [00:00<00:00, 6199.78it/s, Materializing param=transformer.h.11.mlp.c_fc.bias]
Loading weights:  95%|█████████▌| 141/148 [00:00<00:00, 6190.95it/s, Materializing param=transformer.h.11.mlp.c_fc.bias]
Loading weights:  96%|█████████▌| 142/148 [00:00<00:00, 6219.30it/s, Materializing param=transformer.h.11.mlp.c_fc.weight]
Loading weights:  96%|█████████▌| 142/148 [00:00<00:00, 6210.61it/s, Materializing param=transformer.h.11.mlp.c_fc.weight]
Loading weights:  97%|█████████▋| 143/148 [00:00<00:00, 6239.25it/s, Materializing param=transformer.h.11.mlp.c_proj.bias]
Loading weights:  97%|█████████▋| 143/148 [00:00<00:00, 6230.44it/s, Materializing param=transformer.h.11.mlp.c_proj.bias]
Loading weights:  97%|█████████▋| 144/148 [00:00<00:00, 6258.79it/s, Materializing param=transformer.h.11.mlp.c_proj.weight]
Loading weights:  97%|█████████▋| 144/148 [00:00<00:00, 6249.79it/s, Materializing param=transformer.h.11.mlp.c_proj.weight]
Loading weights:  98%|█████████▊| 145/148 [00:00<00:00, 6278.25it/s, Materializing param=transformer.ln_f.bias]             
Loading weights:  98%|█████████▊| 145/148 [00:00<00:00, 6269.38it/s, Materializing param=transformer.ln_f.bias]
Loading weights:  99%|█████████▊| 146/148 [00:00<00:00, 6297.95it/s, Materializing param=transformer.ln_f.weight]
Loading weights:  99%|█████████▊| 146/148 [00:00<00:00, 6288.89it/s, Materializing param=transformer.ln_f.weight]
Loading weights:  99%|█████████▉| 147/148 [00:00<00:00, 6317.89it/s, Materializing param=transformer.wpe.weight] 
Loading weights:  99%|█████████▉| 147/148 [00:00<00:00, 6308.90it/s, Materializing param=transformer.wpe.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 6336.78it/s, Materializing param=transformer.wte.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 6328.31it/s, Materializing param=transformer.wte.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 6306.78it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: openai-community/gpt2
Key                  | Status     |  | 
---------------------+------------+--+-
h.{0...11}.attn.bias | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.

Loading weights:  83%|████████▎ | 123/148 [00:00<00:00, 2317.62it/s, Materializing param=transformer.h.10.attn.c_proj.bias]
Loading weights:  84%|████████▍ | 124/148 [00:00<00:00, 2328.97it/s, Materializing param=transformer.h.10.attn.c_proj.weight]
Loading weights:  84%|████████▍ | 124/148 [00:00<00:00, 2326.07it/s, Materializing param=transformer.h.10.attn.c_proj.weight]
Loading weights:  84%|████████▍ | 125/148 [00:00<00:00, 2332.06it/s, Materializing param=transformer.h.10.ln_1.bias]         
Loading weights:  84%|████████▍ | 125/148 [00:00<00:00, 2326.35it/s, Materializing param=transformer.h.10.ln_1.bias]
Loading weights:  85%|████████▌ | 126/148 [00:00<00:00, 2340.84it/s, Materializing param=transformer.h.10.ln_1.weight]
Loading weights:  85%|████████▌ | 126/148 [00:00<00:00, 2335.15it/s, Materializing param=transformer.h.10.ln_1.weight]
Loading weights:  86%|████████▌ | 127/148 [00:00<00:00, 2349.72it/s, Materializing param=transformer.h.10.ln_2.bias]  
Loading weights:  86%|████████▌ | 127/148 [00:00<00:00, 2348.07it/s, Materializing param=transformer.h.10.ln_2.bias]
Loading weights:  86%|████████▋ | 128/148 [00:00<00:00, 2363.82it/s, Materializing param=transformer.h.10.ln_2.weight]
Loading weights:  86%|████████▋ | 128/148 [00:00<00:00, 2362.21it/s, Materializing param=transformer.h.10.ln_2.weight]
Loading weights:  87%|████████▋ | 129/148 [00:00<00:00, 2378.01it/s, Materializing param=transformer.h.10.mlp.c_fc.bias]
Loading weights:  87%|████████▋ | 129/148 [00:00<00:00, 2375.97it/s, Materializing param=transformer.h.10.mlp.c_fc.bias]
Loading weights:  88%|████████▊ | 130/148 [00:00<00:00, 2391.73it/s, Materializing param=transformer.h.10.mlp.c_fc.weight]
Loading weights:  88%|████████▊ | 130/148 [00:00<00:00, 2390.27it/s, Materializing param=transformer.h.10.mlp.c_fc.weight]
Loading weights:  89%|████████▊ | 131/148 [00:00<00:00, 2264.36it/s, Materializing param=transformer.h.10.mlp.c_proj.bias]
Loading weights:  89%|████████▊ | 131/148 [00:00<00:00, 2262.72it/s, Materializing param=transformer.h.10.mlp.c_proj.bias]
Loading weights:  89%|████████▉ | 132/148 [00:00<00:00, 2277.62it/s, Materializing param=transformer.h.10.mlp.c_proj.weight]
Loading weights:  89%|████████▉ | 132/148 [00:00<00:00, 2276.31it/s, Materializing param=transformer.h.10.mlp.c_proj.weight]
Loading weights:  90%|████████▉ | 133/148 [00:00<00:00, 2291.34it/s, Materializing param=transformer.h.11.attn.c_attn.bias] 
Loading weights:  90%|████████▉ | 133/148 [00:00<00:00, 2289.97it/s, Materializing param=transformer.h.11.attn.c_attn.bias]
Loading weights:  91%|█████████ | 134/148 [00:00<00:00, 2304.85it/s, Materializing param=transformer.h.11.attn.c_attn.weight]
Loading weights:  91%|█████████ | 134/148 [00:00<00:00, 2303.53it/s, Materializing param=transformer.h.11.attn.c_attn.weight]
Loading weights:  91%|█████████ | 135/148 [00:00<00:00, 2279.81it/s, Materializing param=transformer.h.11.attn.c_proj.bias]  
Loading weights:  91%|█████████ | 135/148 [00:00<00:00, 2277.60it/s, Materializing param=transformer.h.11.attn.c_proj.bias]
Loading weights:  92%|█████████▏| 136/148 [00:00<00:00, 2292.05it/s, Materializing param=transformer.h.11.attn.c_proj.weight]
Loading weights:  92%|█████████▏| 136/148 [00:00<00:00, 2290.73it/s, Materializing param=transformer.h.11.attn.c_proj.weight]
Loading weights:  93%|█████████▎| 137/148 [00:00<00:00, 2297.12it/s, Materializing param=transformer.h.11.ln_1.bias]         
Loading weights:  93%|█████████▎| 137/148 [00:00<00:00, 2292.66it/s, Materializing param=transformer.h.11.ln_1.bias]
Loading weights:  93%|█████████▎| 138/148 [00:00<00:00, 2305.92it/s, Materializing param=transformer.h.11.ln_1.weight]
Loading weights:  93%|█████████▎| 138/148 [00:00<00:00, 2303.67it/s, Materializing param=transformer.h.11.ln_1.weight]
Loading weights:  94%|█████████▍| 139/148 [00:00<00:00, 2315.66it/s, Materializing param=transformer.h.11.ln_2.bias]  
Loading weights:  94%|█████████▍| 139/148 [00:00<00:00, 2312.83it/s, Materializing param=transformer.h.11.ln_2.bias]
Loading weights:  95%|█████████▍| 140/148 [00:00<00:00, 2325.60it/s, Materializing param=transformer.h.11.ln_2.weight]
Loading weights:  95%|█████████▍| 140/148 [00:00<00:00, 2324.15it/s, Materializing param=transformer.h.11.ln_2.weight]
Loading weights:  95%|█████████▌| 141/148 [00:00<00:00, 2338.25it/s, Materializing param=transformer.h.11.mlp.c_fc.bias]
Loading weights:  95%|█████████▌| 141/148 [00:00<00:00, 2336.84it/s, Materializing param=transformer.h.11.mlp.c_fc.bias]
Loading weights:  96%|█████████▌| 142/148 [00:00<00:00, 2308.62it/s, Materializing param=transformer.h.11.mlp.c_fc.weight]
Loading weights:  96%|█████████▌| 142/148 [00:00<00:00, 2307.04it/s, Materializing param=transformer.h.11.mlp.c_fc.weight]
Loading weights:  97%|█████████▋| 143/148 [00:00<00:00, 2320.84it/s, Materializing param=transformer.h.11.mlp.c_proj.bias]
Loading weights:  97%|█████████▋| 143/148 [00:00<00:00, 2318.84it/s, Materializing param=transformer.h.11.mlp.c_proj.bias]
Loading weights:  97%|█████████▋| 144/148 [00:00<00:00, 2281.62it/s, Materializing param=transformer.h.11.mlp.c_proj.weight]
Loading weights:  97%|█████████▋| 144/148 [00:00<00:00, 2280.24it/s, Materializing param=transformer.h.11.mlp.c_proj.weight]
Loading weights:  98%|█████████▊| 145/148 [00:00<00:00, 2293.70it/s, Materializing param=transformer.ln_f.bias]             
Loading weights:  98%|█████████▊| 145/148 [00:00<00:00, 2291.76it/s, Materializing param=transformer.ln_f.bias]
Loading weights:  99%|█████████▊| 146/148 [00:00<00:00, 2305.13it/s, Materializing param=transformer.ln_f.weight]
Loading weights:  99%|█████████▊| 146/148 [00:00<00:00, 2303.56it/s, Materializing param=transformer.ln_f.weight]
Loading weights:  99%|█████████▉| 147/148 [00:00<00:00, 2317.15it/s, Materializing param=transformer.wpe.weight] 
Loading weights:  99%|█████████▉| 147/148 [00:00<00:00, 2315.89it/s, Materializing param=transformer.wpe.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 2312.16it/s, Materializing param=transformer.wte.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 2310.96it/s, Materializing param=transformer.wte.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 1672.18it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: openai-community/gpt2
Key                  | Status     |  | 
---------------------+------------+--+-
h.{0...11}.attn.bias | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
You have set `compile_config`, but we are unable to meet the criteria for compilation. Compilation will be skipped.
