I think the truss CLI has a regression when it comes to builds and moe_expert_parallel_option.
Michael Feil  [10:50 PM]
Latest truss?
[10:51 PM] I'm not sure why truss is so picky all of a sudden.
Dhruv Singal  [10:52 PM]
yeah, I downgraded, but once I get GLM-4.7 working again I'll upgrade and share a config.yaml to repro
Sid Shanker  [5:53 AM]
what's the error?
Michael Feil  [7:02 AM]
We used to push the truss with the "exclude unset" option. Now all default settings are written to the YAML as well.
[7:03 AM] The result: if you introduce a single new field and then read the truss inside engine builder / Briton, it gets very messy.
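A minimal sketch of the difference Michael describes, using plain dicts instead of the real truss pydantic models. The field names and default values are illustrative assumptions (moe_expert_parallel_option: -1 is the value reported later in this thread); in pydantic v2 the corresponding switch is model_dump(exclude_unset=True).

```python
# Hypothetical defaults known to a newer truss CLI; values are illustrative only.
NEWER_CLI_DEFAULTS = {
    "max_batch_size": 256,
    "moe_expert_parallel_option": -1,  # field added in a newer truss release
}


def dump_settings(user_settings: dict, exclude_unset: bool) -> dict:
    """Serialize build settings roughly the way a config model would before upload."""
    if exclude_unset:
        # Only keys the user actually wrote in config.yaml are emitted.
        return dict(user_settings)
    # Every default is materialized, including fields an older reader has never seen.
    return {**NEWER_CLI_DEFAULTS, **user_settings}


user_yaml = {"max_batch_size": 64}  # the user never set moe_expert_parallel_option
print(dump_settings(user_yaml, exclude_unset=True))   # {'max_batch_size': 64}
print(dump_settings(user_yaml, exclude_unset=False))  # {'max_batch_size': 64, 'moe_expert_parallel_option': -1}
```

With exclude_unset the uploaded YAML stays identical to what the user wrote; without it, every new field a CLI release adds leaks into the upload, whether or not the server in the image knows about it.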
Jimmy Whitaker  [1:40 PM]
2026-01-29T21:23:00.097535Z  INFO ldr.block_until_download_complete: model_cache: Fetch took 1.41 seconds, of which 0.00 seconds were spent blocking.
2026-01-29T21:23:00.107133Z  WARN trt_llm_config.validate_patch_kwargs: trt_llm.runtime.patch_kwargs is a preview feature. Fields may change in the future.
2026-01-29T21:23:00.107180Z ERROR trt_llm_config.validate_patch_kwargs: runtime.config_kwargs contains the key 'enable_chunked_prefill'. This is already a field in the TRTLLMRuntimeConfigurationV2. Please use the appropriate field in the TRTLLMRuntimeConfigurationV2.
[1:40 PM] Hey, I'm getting the same issue. Here's the error log.
[1:43 PM] Here's a link to my deployment. Note: this model was working for Bland; I was just trying to update it to work with MDN.
https://app.baseten.co/models/6wgjgggw/logs/w674z2g
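The ERROR line in Jimmy's log comes from a check that rejects passthrough kwargs which shadow a typed runtime field. A minimal sketch of that kind of check, assuming the typed field names are available as a set; the field list and both function names are hypothetical and not truss's actual implementation.

```python
# Hypothetical subset of typed fields on a runtime configuration model.
TYPED_RUNTIME_FIELDS = {"enable_chunked_prefill", "kv_cache_free_gpu_mem_fraction"}


def find_shadowed_kwargs(config_kwargs: dict) -> list:
    """Return passthrough keys that duplicate a typed field, sorted for stable messages."""
    return sorted(set(config_kwargs) & TYPED_RUNTIME_FIELDS)


def validate_config_kwargs(config_kwargs: dict) -> None:
    """Raise if any passthrough key should instead be set via its dedicated field."""
    clashes = find_shadowed_kwargs(config_kwargs)
    if clashes:
        raise ValueError(
            f"config_kwargs contains {clashes}; use the dedicated runtime fields instead"
        )
```

A key like enable_chunked_prefill passed through config_kwargs would be rejected, while unrelated passthrough keys are accepted unchanged.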
Sid Shanker  [1:44 PM]
@Nikhil Narayen or @Deepak Nagaraj would one of you mind helping here? I'm not sure what changed in truss that's causing these older trusses to break.
Nikhil Narayen  [2:15 PM]
sorry, yes, I dropped this. Looking now!
Nikhil Narayen  [2:23 PM]
I'm trying to figure out exactly which truss version introduced the regression; that will help us narrow down what changed. I'm going to try the above config for w674z2g on H100s, unless folks know that won't work.
[2:24 PM] I guess 0.12.8 to 0.12.11 is already a pretty tight bound.
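Narrowing the regression to a single release is a plain bisection over the candidate versions. A sketch, assuming you can write a check (the hypothetical is_bad callback) that returns True when pushing with a given CLI version reproduces the failure, and that once a release is bad every later one is bad too; the caller supplies the ordered version list.

```python
from typing import Callable, Optional, Sequence


def first_bad_version(versions: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Binary-search an ordered list of releases for the first one where is_bad() holds.

    Assumes monotonicity: once a release is bad, every later release is bad too.
    Returns None if no release in the list is bad.
    """
    lo, hi = 0, len(versions)  # the answer lies in versions[lo:hi], or nowhere
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the first bad release is at mid or earlier
        else:
            lo = mid + 1  # the first bad release is after mid
    return versions[lo] if lo < len(versions) else None
```

With a four-release window this needs at most three pushes, which matters when each is_bad call is a full build and deploy.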
Jimmy Whitaker  [2:27 PM]
FYI: we fixed the Bland issue by updating the config to a newer image and changing some other settings (@Dhruv Singal helped me with this), so it's not a blocker. Just surfacing it in case you need a repro.
Michael Feil  [2:27 PM]
In truss 0.12.9 we added moe_expert_parallel_option for Briton.

Jimmy tried to bring up a model locally with, e.g., truss 0.12.11. His YAML never sets moe_expert_parallel_option.
Yet 0.12.11, even though the field is unset, dumps it into the config.yaml that gets uploaded. (Option 1: fix this by not dumping unset fields.)

Meanwhile, the v2 stack ships truss 0.11.X (the exact version varies; it is baked in together with the TRT version etc.). The server tries to read the full YAML and fails. (Option 2: fix this by reading with unknown fields dropped.)

Hope this helps. Historically, we did not have truss read the config at all. (edited)
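Option 2 amounts to having the reader in the image drop keys it does not know instead of failing on them. A minimal sketch with a hypothetical set of fields known to an older image; in pydantic v2 the analogous switch is model_config = ConfigDict(extra="ignore") versus extra="forbid".

```python
# Fields an older image's reader knows about (hypothetical subset).
KNOWN_FIELDS = {"max_batch_size", "quantization_type"}


def read_settings(uploaded: dict, drop_unknown: bool) -> dict:
    """Read uploaded build settings, either strictly or dropping unknown keys."""
    unknown = set(uploaded) - KNOWN_FIELDS
    if unknown and not drop_unknown:
        # Strict reading: a field added by a newer CLI breaks an older server.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Lenient reading: keep only what this version understands.
    return {k: v for k, v in uploaded.items() if k in KNOWN_FIELDS}
```

The trade-off: lenient reading also silently ignores a field the user set on purpose, which is why it pairs well with Option 1, so that only explicitly set fields reach the server in the first place.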
[2:27 PM] https://basetenlabs.slack.com/archives/C060ZMW6QLU/p1769725612899989
@Michael Feil Is this a recent regression in the newest version of truss for inference_stack_v2?

I see this in the config in billip:
moe_expert_parallel_option: -1

My config.yaml does not specify it at all

Thread in model-performance | Jan 29th
Tess Lipsky  [2:28 PM]
Thank you @Jimmy Whitaker @Dhruv Singal! What was the fix here? I've been stuck on this for the rox deployment on H100s.
Fred Liu  [2:29 PM]
Let's continue in the #model-performance thread, per Michael's request.
Michael Feil  [2:29 PM]
I got tagged by 3 of you at the same time.
[2:30 PM] @Nikhil Narayen, if you have time, I think a sync would be useful.
[2:30 PM] There is no fix, apart from using exactly the truss version baked into the v2 image in your CLI.
Nikhil Narayen  [2:31 PM]
I see. I can sync in a bit, but my understanding is that there's now a truss version coupled into the V2 image, and it needs to match the truss version that starts the push.

So we have a workaround, and maybe there's a medium/long-term ask for how to make this better?
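Until the coupling is removed, a push-time guard could at least fail fast when the CLI's truss version differs from the one baked into the image. A sketch, assuming both versions are already available as plain strings (how the image version would be discovered is left open); the function name is hypothetical.

```python
def check_truss_versions(cli_version: str, image_version: str) -> None:
    """Fail fast if the CLI and image truss versions differ (exact match required)."""
    # Normalize surrounding whitespace and a leading "v", as in "v0.12.11" vs "0.12.11".
    cli = cli_version.strip().lstrip("v")
    image = image_version.strip().lstrip("v")
    if cli != image:
        raise RuntimeError(
            f"truss CLI {cli} does not match truss {image} baked into the v2 image; "
            f"install truss=={image} before pushing"
        )
```

Exact string equality is deliberate here: per the thread, even neighbouring patch releases can disagree on which fields exist.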
Dhruv Singal  [2:31 PM]
I was able to get through with truss 0.12.11 and a version override image from here: https://basetenlabs.slack.com/archives/C09AX56E6HY/p1769672666465119?thread_ts=1769542097.928019&cid=C09AX56E6HY
I'm not sure it will work for others, but it's worth trying while Nikhil and Michael work on a fix? (edited)
model_name: GLM-4.7
resources:
  accelerator: B200:8
  cpu: "1"
  memory: 10Gi
From a thread in internal-opencode | Jan 28th
Michael Feil  [2:32 PM]
Basically, changing the truss version while also varying v2_llm_version is not good at the moment; we're working on fixing it.
Jojo Ortiz  [2:33 PM]
cc: @Ervin Wang
Michael Feil  [2:33 PM]
No, we cannot install a custom truss version. I'd rather duck-read (or read the pydantic model inside truss) than do this.
[2:33 PM] It's also triple-coupled with engine-builder, which again needs the same version.
[2:35 PM] The short-term fix: remove v2_llm_version and use truss==v0.12.11 (edited)
Fred Liu  [2:36 PM]
I think truss==0.9.11 might be too young to use MDN if you're trying to combine it with inference stack v2.
Jojo Ortiz  [2:36 PM]
MDN needs 0.12.8+
Michael Feil  [2:36 PM]
Too young, explain? (Edit: I meant to write v0.12.11.) It's the latest. (edited)
Jimmy Whitaker  [2:37 PM]
https://github.com/basetenlabs/truss/releases
Fred Liu  [2:37 PM]
Using truss==0.9.11 will not let you use MDN, which relies on the new weights field.
Michael Feil  [2:41 PM]
Edit: I meant v0.12.11.
Fred Liu  [2:46 PM]
Yup, got a successful deployment with Michael and Dhruv's short-term fix, thank you!
Bola Malek  [2:46 PM]
Anything after 0.12.6 is fine for MDN, but the later the version, the better the upstream support.
Sid Shanker  [3:56 PM]
I see, so the two truss versions don't agree (the truss version in the CLI vs. the truss version in the inference stack image)?
Michael Feil  [3:56 PM]
yes (3), especially when paired with overrides. (edited) 
Sid Shanker  [3:57 PM]
Where do these come from?
    v2_llm_version: trtllm-gpu-b10-1.2.0rc5.1-528d7b8c1-c11e1396c9
Michael Feil  [4:00 PM]
These mix:
- the TRT version
- the Dynamo version
- other things: tool call parser behaviour, truss behaviour
All of these are baked in and not flexible. As a result, trtllm-gpu-b10-1.2.0rc5.1-528d7b8c1-c11e1396c9 has to do its best to accept both newer and, sadly, older truss versions. This does not go well with strict validation.
Dhruv Singal  [12:37 AM]
What if we allowed v2 to be buildless? (Feel free to tell me you hate it.) I mostly use v2 for its tool call parsers and reasoning parsers these days anyway.
Michael Feil  [8:08 AM]
We already have a skip_build_job flag. But this config explicitly wants and specifies quantization.
Michael Feil  [9:01 AM]
But this does not skip the build job in a traditional sense (it could, if we wanted to)
Dhruv Singal  [2:49 PM]
Is anyone working on this by any chance? I wanted to push a v2 version override from mapi for gpt-oss and I'm running into this. Does this need a change in truss or in engine builder?

error code:
2026-02-05T22:49:46.824571Z  WARN trt_llm_config.validate_patch_kwargs: trt_llm.runtime.patch_kwargs is a preview feature. Fields may change in the future.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/workspace/trtllm/standalone/run.py", line 174, in main
    asyncio.run(worker())
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamo/runtime/__init__.py", line 46, in wrapper
    await func(runtime, *args, **kwargs)
  File "/workspace/trtllm/standalone/run.py", line 109, in worker
    trt_config_path, truss_trt_llm_config = load_config_trt_llm_truss()
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/trtllm/standalone/config_conversion.py", line 123, in load_config_trt_llm_truss
    truss_config = try_load_truss_config(path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/trtllm/standalone/config_conversion.py", line 72, in try_load_truss_config
    config = TRTLLMConfigurationV2(**trt_llm)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pydantic/main.py", line 253, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/truss/base/trt_llm_config.py", line 595, in validate_inference_stack_v2
    and build_settings[field] != build_settings_reference[field]
                                 ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: 'moe_expert_parallel_option'
Registered shutdown handler.
🛑 Cleaning up…
2026-02-05 22:49:48,708 WARN exited: model-server (exit status 0; not expected)
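The KeyError comes from indexing build_settings_reference with a key it lacks. A minimal reproduction and one defensive alternative, using plain dicts; the variable names mirror the traceback, but the rest of validate_inference_stack_v2 is not shown in the thread, so the comparison loop and the changed_fields helper below are assumptions.

```python
def changed_fields(build_settings: dict, reference: dict) -> list:
    """List fields whose value differs from the reference, tolerating missing keys."""
    changed = []
    for field, value in build_settings.items():
        if field not in reference:
            # Unknown to this (older) reference: skip instead of raising KeyError.
            continue
        if value != reference[field]:
            changed.append(field)
    return changed


uploaded = {"max_batch_size": 64, "moe_expert_parallel_option": -1}
reference = {"max_batch_size": 256}  # produced by an older truss, no MoE field

# Direct indexing reproduces the failure from the traceback.
try:
    [f for f in uploaded if uploaded[f] != reference[f]]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'moe_expert_parallel_option'

print(changed_fields(uploaded, reference))  # ['max_batch_size']
```

Whether an unknown field should be skipped silently, logged, or rejected is a policy choice; skipping matches the "read with drop allowed" option discussed earlier in the thread.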
