LLM nondeterminism arises when GPU/TPU kernels are not batch-invariant: floating-point arithmetic rounds differently under different reduction orders, and those orders change with batch size
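A minimal sketch (plain NumPy, not from the original write-up) of why reduction order matters: floating-point addition is not associative, so the same numbers summed in a different grouping can round differently, which is exactly the kind of change a kernel makes when batch size alters its tiling or reduction strategy.

```python
import numpy as np

# Scalar case: the grouping alone changes the answer.
print((0.1 + 1e20) - 1e20)   # 0.0 -- the 0.1 is absorbed and lost
print(0.1 + (1e20 - 1e20))   # 0.1

# Array case: one flat reduction vs. a two-level (split) reduction over the
# same values, analogous to a kernel switching strategies with batch size.
x = np.random.default_rng(0).standard_normal(2**16).astype(np.float32)
flat  = x.sum(dtype=np.float32)
split = x.reshape(256, 256).sum(axis=1, dtype=np.float32).sum(dtype=np.float32)
print(flat, split, flat == split)   # usually differs in the low-order bits
```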
Nondeterministic outputs undermine reproducibility, RLHF stability, debugging, alignment, and scientific audits
Batch-invariant kernels plus a deterministic inference mode produce identical outputs regardless of batch size
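An illustrative sketch of the batch-invariance idea, not any library's actual kernel: reduce each row with a fixed chunk size, so the summation order for one row never depends on how many other rows share the batch. The function name and chunk size below are assumptions for illustration.

```python
import numpy as np

def batch_invariant_rowsum(x: np.ndarray, chunk: int = 256) -> np.ndarray:
    """Sum each row of x with a fixed chunking, independent of batch size."""
    batch, n = x.shape
    pad = (-n) % chunk                     # pad n up to a multiple of chunk
    xp = np.pad(x, ((0, 0), (0, pad)))     # zero padding leaves the sums unchanged
    parts = xp.reshape(batch, -1, chunk).sum(axis=2, dtype=np.float32)  # fixed inner order
    return parts.sum(axis=1, dtype=np.float32)                          # fixed outer order

x = np.random.default_rng(0).standard_normal((8, 1000)).astype(np.float32)
alone   = batch_invariant_rowsum(x[:1])   # row 0 processed in a batch of 1
batched = batch_invariant_rowsum(x)       # row 0 processed in a batch of 8
print(alone[0] == batched[0])             # True: same bits regardless of batch size
```

The real fixes target attention and matmul kernels, but the principle is the same: the per-row reduction order is pinned down once, rather than chosen per launch based on batch size.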
With default settings, 1,000 runs of Qwen produced ~80 distinct outputs; in deterministic mode, all 1,000 completions were identical
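A sketch of the kind of repeatability check behind numbers like these; the server URL, model id, and prompt are placeholders, not the original setup. It sends one prompt many times at temperature 0 and counts distinct completions.

```python
from collections import Counter
from openai import OpenAI   # any OpenAI-compatible inference endpoint

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder server

completions = []
for _ in range(1000):
    resp = client.chat.completions.create(
        model="qwen-placeholder",                                        # placeholder model id
        messages=[{"role": "user", "content": "Describe a famous physicist."}],
        temperature=0.0,
        max_tokens=200,
    )
    completions.append(resp.choices[0].message.content)

print(f"{len(Counter(completions))} distinct completions out of {len(completions)}")
```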