To ban or not to ban, that’s the pickle
While Hugging Face supports machine learning (ML) models in various formats, Pickle is among the most prevalent because of the popularity of PyTorch, a widely used ML library written in Python that uses Pickle serialization and deserialization for models. Pickle is an official Python module for object serialization, which in programming means turning an object into a byte stream. The reverse process is called deserialization. In Python terminology, the two are known as pickling and unpickling.
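As a minimal sketch of that round trip, the snippet below pickles a small dictionary and unpickles it again using the standard `pickle` module. The `weights` dictionary is a made-up stand-in for model data, not an actual PyTorch checkpoint.

```python
import pickle

# Illustrative stand-in for model data (an assumption, not a real checkpoint).
weights = {"layer1": [0.1, 0.2, 0.3], "bias": 0.5}

# Pickling: turn the object into a byte stream.
blob = pickle.dumps(weights)

# Unpickling: rebuild an equal, newly created object from the byte stream.
restored = pickle.loads(blob)
```

Here `restored` compares equal to `weights` but is a separate object, which is what allows a model to be saved on one machine and loaded on another.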
The process of serialization and deserialization, especially of input from untrusted sources, has been the cause of many remote code execution vulnerabilities in a variety of programming languages. Likewise, the Python documentation for Pickle carries a big red warning: “It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.”
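To illustrate why that warning exists, here is a deliberately harmless sketch. Pickle lets an object define `__reduce__`, which names a callable to invoke when the object is rebuilt. The `Payload` class and the arithmetic expression are hypothetical; a real attacker would substitute something far less benign.

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild this object: by calling
    # eval("6 * 7") at load time. The expression is a harmless placeholder.
    def __reduce__(self):
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())

# The victim only calls loads(); the embedded expression runs anyway.
result = pickle.loads(blob)  # result is 42, not a Payload instance
```

Note that the loading side never needs the `Payload` class. The byte stream itself carries the instruction to call `eval`, so simply opening the file is enough to run the embedded code.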
That poses a problem for an open platform like Hugging Face, where users openly share model data and have to unpickle it. On one hand, this opens the door to abuse by ill-intentioned individuals who upload poisoned models. On the other, banning the format would be too restrictive given PyTorch’s popularity. So Hugging Face chose the middle road, which is to attempt to scan for and detect malicious Pickle files.
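The general idea behind scanning can be sketched with the standard `pickletools` module, which walks a pickle's opcodes without executing them. This is not Hugging Face's actual scanner. The `SUSPICIOUS` denylist and the helper names `find_imports` and `is_suspicious` are assumptions made for illustration, and the way `STACK_GLOBAL` is resolved (taking the two most recent string arguments) is a simplifying heuristic.

```python
import pickle
import pickletools

# Hypothetical denylist for illustration; a real policy would be much broader.
SUSPICIOUS = {
    ("builtins", "eval"),
    ("builtins", "exec"),
    ("os", "system"),
    ("posix", "system"),
    ("nt", "system"),
    ("subprocess", "Popen"),
}

def find_imports(data: bytes):
    """List (module, name) pairs the pickle would import, without loading it."""
    found = []
    strings = []  # recent string arguments, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols store "module name" as one space-separated argument.
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Newer protocols push module and name as two strings first (heuristic).
            if len(strings) >= 2:
                found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found

def is_suspicious(data: bytes) -> bool:
    """Flag a pickle if any import it references is on the denylist."""
    return any(pair in SUSPICIOUS for pair in find_imports(data))

class Payload:
    # Harmless placeholder payload that calls eval() when unpickled.
    def __reduce__(self):
        return (eval, ("6 * 7",))

malicious = pickle.dumps(Payload())
benign = pickle.dumps({"layer1": [0.1, 0.2]})
```

With these inputs, `is_suspicious(malicious)` is true and `is_suspicious(benign)` is false. A denylist like this can only catch what it already knows about, which is why scanning is a compromise and not a guarantee.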