Temperature isn’t a cure
A reflexive objection from practitioners familiar with LLM configuration holds that raising the sampling temperature would attenuate these distributional biases by flattening the probability distribution from which characters are drawn. Irregular’s empirical results unambiguously refute this intuition. Testing at temperature 1.0, the maximum setting on Claude, produces no statistically significant improvement in effective entropy. The character-position biases are encoded in the model weights, not in the sampling parameters, and temperature modulation operates downstream of those weight-instantiated distributions.
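To make "effective entropy" concrete, here is a minimal sketch of how such a measurement can be taken: estimate per-character Shannon entropy from the observed character frequencies in a corpus of generated passwords and compare it against the uniform ceiling of log2(94) bits for the 94 printable ASCII characters. The password strings below are toy data, not the study's actual samples, and pooling counts across positions is one simple estimator choice, not necessarily Irregular's methodology.

```python
import math
from collections import Counter

def effective_entropy_per_char(passwords: list[str]) -> float:
    """Per-character Shannon entropy estimated from observed character
    frequencies, pooled across all positions in the corpus."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A uniform draw over the 94 printable ASCII characters is the ceiling:
uniform_bits = math.log2(94)  # ~6.55 bits per character

# A biased generator that overuses a few characters scores well below it.
biased = ["xxppLLaa", "xpxpLaLa", "ppxxaaLL"]
print(effective_entropy_per_char(biased), "<", uniform_bits)
```

If temperature changes truly restored randomness, this statistic would climb toward the ceiling as temperature rises; the reported finding is that it does not.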
Separately, Kaspersky’s Data Science Team Lead Alexey Antonov conducted a complementary investigation, analyzing 1,000 passwords generated by ChatGPT, Meta’s Llama, and DeepSeek. The character-frequency histograms revealed pronounced non-uniformity across all three models: ChatGPT shows a systematic preference for the characters x, p, and L; Llama for the hash symbol and the letter p; DeepSeek for t and w. At temperature 0.0, Claude produces the identical string on every invocation. These findings are consistent across different model families and measurement methodologies, corroborating the structural rather than incidental nature of the vulnerability.
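The non-uniformity in such histograms can be quantified with a simple chi-square statistic against a uniform draw over the full character set. This is a generic sketch of that test on toy data, not Antonov's actual analysis code; the `alphabet_size` of 94 printable ASCII characters is an assumption for illustration.

```python
from collections import Counter

def chi_square_uniformity(passwords: list[str], alphabet_size: int = 94) -> float:
    """Chi-square statistic of observed character counts against a uniform
    distribution over `alphabet_size` symbols; larger means stronger bias."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    expected = total / alphabet_size
    # Characters the model never emits still contribute (0 - expected)^2 / expected.
    observed = list(counts.values()) + [0] * (alphabet_size - len(counts))
    return sum((o - expected) ** 2 / expected for o in observed)

# A corpus dominated by a few favored characters scores far higher than a
# corpus that spreads its characters out.
print(chi_square_uniformity(["xxpp", "xpLp"]) > chi_square_uniformity(["aq3!", "Zk8%"]))
```

A password generator worth the name should produce a statistic consistent with uniformity; the models tested did not.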
The practical corollary is that an adversary who has identified the LLM used to generate a target credential need not attempt an exhaustive brute-force search of a 94^16 keyspace. They can build a model-specific attack dictionary, ordering candidates by their empirical generation frequency, and run a probabilistically optimized search over a keyspace several orders of magnitude smaller. Kaspersky’s cracking tests found that 88% of DeepSeek passwords and 87% of Llama passwords fell to such a targeted attack, as did 33% of ChatGPT passwords, all on standard GPU hardware.



