The security risks posed by the Pickle format have once again come to the fore with the discovery of a new “hybrid machine learning (ML) model exploitation technique” dubbed Sleepy Pickle.
The attack method, per Trail of Bits, weaponizes the ubiquitous format used to package and distribute machine learning (ML) models to corrupt the model itself, posing a severe supply chain risk to an organization’s downstream customers.
“Sleepy Pickle is a stealthy and novel attack technique that targets the ML model itself rather than the underlying system,” security researcher Boyan Milanov said.
While pickle is a serialization format widely used by ML libraries like PyTorch, it can be used to carry out arbitrary code execution attacks simply by loading a pickle file (i.e., during deserialization).
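To make the deserialization risk concrete, here is a minimal and deliberately harmless Python sketch. The class name `Demo` and the arithmetic expression are illustrative assumptions, not taken from the Trail of Bits research. The point is only that `pickle.loads` calls whatever callable the file names instead of simply rebuilding data.

```python
import pickle


class Demo:
    """Hypothetical object whose pickled form asks the loader to call eval()."""

    def __reduce__(self):
        # pickle stores "call eval with this argument" rather than the object.
        # The expression here is harmless arithmetic, chosen for illustration.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Demo())

# Loading the bytes runs the embedded call. No Demo instance comes back;
# the caller receives whatever the callable returned.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

A real payload would replace the arithmetic with something far less benign, which is why loading alone is enough to compromise a process.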
“We suggest loading models from users and organizations you trust, relying on signed commits, and/or loading models from [TensorFlow] or Jax formats with the from_tf=True auto-conversion mechanism,” Hugging Face points out in its documentation.
Sleepy Pickle works by inserting a payload into a pickle file using open-source tools like Fickling, and then delivering it to a target host by one of four techniques: an adversary-in-the-middle (AitM) attack, phishing, supply chain compromise, or the exploitation of a system weakness.
“When the file is deserialized on the victim’s system, the payload is executed and modifies the contained model in-place to insert backdoors, control outputs, or tamper with processed data before returning it to the user,” Milanov said.
Put differently, the payload injected into the pickle file containing the serialized ML model can be abused to alter model behavior by tampering with the model weights, or by tampering with the input and output data processed by the model.
In a hypothetical attack scenario, the approach could be used to generate harmful outputs or misinformation that can have disastrous consequences for user safety (e.g., drink bleach to cure flu), steal user data when certain conditions are met, and attack users indirectly by generating manipulated summaries of news articles with links pointing to a phishing page.
Trail of Bits said that Sleepy Pickle can be weaponized by threat actors to maintain surreptitious access to ML systems in a manner that evades detection, given that the model is compromised when the pickle file is loaded in the Python process.
This is also more effective than directly uploading a malicious model to Hugging Face, as it can modify model behavior or output dynamically without attackers having to entice their targets into downloading and running it.
“With Sleepy Pickle attackers can create pickle files that aren’t ML models but can still corrupt local models if loaded together,” Milanov said. “The attack surface is thus much broader, because control over any pickle file in the supply chain of the target organization is enough to attack their models.”
“Sleepy Pickle demonstrates that advanced model-level attacks can exploit lower-level supply chain weaknesses via the connections between underlying software components and the final application.”
From Sleepy Pickle to Sticky Pickle
Sleepy Pickle is not the only attack demonstrated by Trail of Bits. The cybersecurity firm said it could be improved to achieve persistence in a compromised model and ultimately evade detection, a technique referred to as Sticky Pickle.
This variant “incorporates a self-replicating mechanism that propagates its malicious payload into successive versions of the compromised model,” Milanov said. “Additionally, Sticky Pickle uses obfuscation to disguise the malicious code to prevent detection by pickle file scanners.”
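For context, a pickle file scanner typically walks the opcode stream without executing it. The following is a rough sketch using Python's standard `pickletools` module. The function name, the opcode list, and the sample `Payload` class are assumptions made for illustration, and a check this naive is exactly the kind that obfuscation aims to slip past.

```python
import pickle
import pickletools

# Opcodes that import a named global or call a callable during loading.
# This list is an illustrative assumption, not an exhaustive rule set.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST", "REDUCE"}


def list_suspicious_opcodes(data):
    """Return (position, opcode name, argument) for each flagged opcode.

    pickletools.genops only parses the byte stream; nothing is executed.
    """
    findings = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            findings.append((pos, opcode.name, arg))
    return findings


class Payload:
    """Hypothetical stand-in for an injected object; it is never loaded here."""

    def __reduce__(self):
        return (print, ("payload ran",))


plain = pickle.dumps({"weights": [0.1, 0.2]})
tainted = pickle.dumps(Payload())

print(list_suspicious_opcodes(plain))    # plain containers need no imports or calls
print(list_suspicious_opcodes(tainted))  # flags the global lookup and the call
```

Legitimate model files also reference globals to rebuild their classes, so a practical scanner has to decide which names are acceptable and cannot flag every import.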
In doing so, the exploit remains persistent even if a user modifies a compromised model and redistributes it using a new pickle file that’s beyond the attacker’s control.
To guard against Sleepy Pickle and other supply chain attacks, it is advised to avoid using pickle files to distribute serialized models, to use only models from trusted organizations, and to rely on safer file formats like SafeTensors.
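Where a pickle file has to be loaded anyway, one possible stopgap is the "restricting globals" pattern described in Python's own pickle documentation. The sketch below is an assumption-laden illustration, not a recommendation from Trail of Bits: the allowlist contents and the names `AllowlistUnpickler`, `restricted_loads`, and `Payload` are hypothetical, and real model files would need a far longer, carefully reviewed allowlist.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist: only these (module, name) pairs may be resolved.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global by name.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data):
    """Deserialize bytes, refusing any global outside the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Payload:
    """Harmless stand-in for an injected object that asks for eval()."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


# Plain data built from an allowlisted type loads normally.
print(restricted_loads(pickle.dumps(OrderedDict(layer=[0.1, 0.2]))))

# The stand-in payload is refused before eval is ever resolved.
try:
    restricted_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

An allowlist only narrows what a pickle may reference. Formats such as SafeTensors avoid the problem by storing tensors as data, with no callables to resolve.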