Recently, I worked on a Python project that required the whole codebase to be protected using Cython. Although protecting Python sources from reverse engineering seems like a futile task at first, cythonizing all the code leads to a reasonable amount of security (the binary is very difficult to disassemble, but it's still possible to e.g. monkey patch parts of the program).
This security comes at a price, though - the primary use case for Cython is writing compiled extensions that can easily interface with Python code. Therefore, the support for non-trivial module/package structures is rather limited and we have to do some extra work to achieve the desired results.
The first obstacle we had to overcome was that it's difficult to compile a whole Python package (as in "directory containing an __init__.py file") with Cython.
Imagine the following structure:
src
├── mypkg
│   ├── bar.py
│   ├── foo.py
│   └── __init__.py
├── mypkg2
│   ├── bar.py
│   ├── foo.py
│   └── __init__.py
└── setup.py
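For concreteness, assume the modules contain something like this (hypothetical contents; the package-relative import in bar.py will matter later):
# mypkg/foo.py (hypothetical contents)
def foo():
    return 40


# mypkg/bar.py (hypothetical contents) - note the package-relative import
from .foo import foo

def bar():
    return foo() + 2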
The recommended way of cythonizing this is to use a setup.py file such as this:
from setuptools import setup
from setuptools.extension import Extension
from Cython.Build import cythonize
from Cython.Distutils import build_ext

setup(
    name="mypkg",
    ext_modules=cythonize(
        [
            Extension("mypkg.*", ["mypkg/*.py"]),
            Extension("mypkg2.*", ["mypkg2/*.py"])
        ],
        build_dir="build",
        compiler_directives=dict(
            always_allow_keywords=True
        )),
    cmdclass=dict(
        build_ext=build_ext
    ),
    packages=["mypkg", "mypkg2"]
)
The setup.py is more or less what you would expect from a project that uses Cython. There are two things to note, though. First, the always_allow_keywords directive makes Flask view functions work correctly: it disables an optimization under which functions taking only a few parameters accept them positionally but not as keywords, while Flask passes URL parameters to view functions as keyword arguments. Second, we do not use the ext_package parameter suggested by some guides, which would put the cythonized code into another package. By omitting it, the compiled code is kept in the same place as the original modules.
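To make the first point concrete, here is a minimal, hypothetical Flask view illustrating what the directive protects against:
# Hypothetical Flask app: Flask passes URL variables to view functions as
# *keyword* arguments.
from flask import Flask

app = Flask(__name__)

@app.route("/users/<username>")
def show_user(username):
    # Without always_allow_keywords=True, Cython may compile this one-argument
    # function so that it only accepts positional arguments; Flask's call
    # show_user(username=...) would then fail with a TypeError at request time.
    return "Hello, " + username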
However, after we build our project with python setup.py build_ext, we notice that the resulting package cannot be imported - it is missing an __init__.py file. There is an __init__.so that can be imported from Python, but that isn't enough to make a directory a package in Python's eyes. Not being able to import the package is not the only problem - code inside it cannot perform package-relative imports (e.g. from .foo import foo) either, which breaks its functionality.
To remedy this problem, we can copy the __init__.py files from our source tree after the rest of the project is built. A good way to do so is to override the build_ext class in setup.py:
# ...
from pathlib import Path
import shutil
# ...

class MyBuildExt(build_ext):
    def run(self):
        # Build the extension modules as usual, then copy over the plain
        # Python files that must remain in the package.
        build_ext.run(self)

        build_dir = Path(self.build_lib)
        root_dir = Path(__file__).parent
        target_dir = build_dir if not self.inplace else root_dir

        self.copy_source_file(Path('mypkg') / '__init__.py', root_dir, target_dir)
        self.copy_source_file(Path('mypkg2') / '__init__.py', root_dir, target_dir)

        # __main__.py is copied as well so that `python -m mypkg` keeps working.
        self.copy_source_file(Path('mypkg') / '__main__.py', root_dir, target_dir)
        self.copy_source_file(Path('mypkg2') / '__main__.py', root_dir, target_dir)

    # Named copy_source_file (not copy_file) so it does not shadow the
    # copy_file method that distutils/setuptools commands already provide and
    # call internally (e.g. during in-place builds).
    def copy_source_file(self, path, source_dir, destination_dir):
        if not (source_dir / path).exists():
            return

        shutil.copyfile(str(source_dir / path), str(destination_dir / path))


setup(
    # ...
    cmdclass=dict(
        build_ext=MyBuildExt
    ),
    # ...
)
We have successfully created Python packages that can be imported from. They reside under build/lib.linux-x86_64-3.6 or something similar. Sadly, this is not enough for distributing our package. Ideally, we'd like to have an installable package that contains only compiled code. The current standard for Python archives is the wheel format (.whl), which aims to replace the .egg format. So, let's try to create a wheel with python setup.py bdist_wheel!
After the command finishes, there should be a dist folder that contains a wheel file. Unpacking it yields something like this:
.
├── mypkg
│   ├── bar.cpython-36m-x86_64-linux-gnu.so
│   ├── bar.py
│   ├── foo.cpython-36m-x86_64-linux-gnu.so
│   ├── foo.py
│   ├── __init__.cpython-36m-x86_64-linux-gnu.so
│   └── __init__.py
├── mypkg-0.0.0.dist-info
│   ├── DESCRIPTION.rst
│   ├── METADATA
│   ├── metadata.json
│   ├── RECORD
│   ├── top_level.txt
│   └── WHEEL
└── mypkg2
    ├── bar.cpython-36m-x86_64-linux-gnu.so
    ├── bar.py
    ├── foo.cpython-36m-x86_64-linux-gnu.so
    ├── foo.py
    ├── __init__.cpython-36m-x86_64-linux-gnu.so
    └── __init__.py
Apparently, the archive contains not only compiled code, but also the sources. There is a way to fix this, however counter-intuitive it might seem. We need to remove our packages from the packages argument of the setup call. This way, the extensions will still be built and included in the wheel, but the source code will not.
setup(
    # ...
    packages=[]
)
The content of the built wheel should then look like this:
.
├── mypkg
│   ├── bar.cpython-36m-x86_64-linux-gnu.so
│   ├── foo.cpython-36m-x86_64-linux-gnu.so
│   ├── __init__.cpython-36m-x86_64-linux-gnu.so
│   └── __init__.py
├── mypkg-0.0.0.dist-info
│   ├── DESCRIPTION.rst
│   ├── METADATA
│   ├── metadata.json
│   ├── RECORD
│   ├── top_level.txt
│   └── WHEEL
└── mypkg2
    ├── bar.cpython-36m-x86_64-linux-gnu.so
    ├── foo.cpython-36m-x86_64-linux-gnu.so
    ├── __init__.cpython-36m-x86_64-linux-gnu.so
    └── __init__.py
The wheel can be installed with pip install dist/*.whl. If we do not need to inspect the wheel or distribute it manually, we can just run pip install . in the project directory, which builds the wheel and installs it for us.
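As a quick sanity check, we can confirm that the installed package is served from the compiled extensions (module names follow the example project; exact paths will vary):
# Run this from outside the project directory, so the local source tree does
# not shadow the installed package.
import mypkg.foo

# Should print a path ending in something like
# .../site-packages/mypkg/foo.cpython-36m-x86_64-linux-gnu.so
print(mypkg.foo.__file__)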
It is also possible to strip Python source code from .egg archives, but it involves overriding the bdist_egg command from setuptools. I won't cover that here, but if you're interested, check out the --exclude-source-files option and the zap_pyfiles method of the aforementioned command class.
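For reference, a rough and untested sketch of what such an override could look like, assuming the exclude_source_files option and zap_pyfiles method behave as described above:
from setuptools.command.bdist_egg import bdist_egg

class BDistEggStripped(bdist_egg):
    """Always strip .py files from the egg, as --exclude-source-files would."""

    def finalize_options(self):
        bdist_egg.finalize_options(self)
        # With this flag set, bdist_egg calls its zap_pyfiles() method, which
        # removes the Python source files from the egg build directory.
        self.exclude_source_files = True

# Then pass bdist_egg=BDistEggStripped in the cmdclass dict of setup().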
By following this guide, you should be able to cythonize a Python codebase with a non-trivial package/module structure, thus making it difficult for evil hackers to reverse-engineer it and steal your programming know-how.