Recently, I worked on a Python project that required the whole codebase to be protected using Cython. Although protecting Python sources from reverse engineering seems like a futile task at first, cythonizing all the code leads to a reasonable amount of security (the binary is very difficult to disassemble, but it's still possible to e.g. monkey patch parts of the program).
This security comes with a price though - the primary use case for Cython is writing compiled extensions that can easily interface with Python code. Therefore, the support for non-trivial module/package structures is rather limited and we have to do some extra work to achieve the desired results.
The first obstacle we had to overcome was that it's difficult to compile a whole
Python package (as in "directory containing an
__init__.py file") with Cython.
Imagine the following structure:
src ├── mypkg │ ├── bar.py │ ├── foo.py │ └── __init__.py ├── mypkg2 │ ├── bar.py │ ├── foo.py │ └── __init__.py └── setup.py
The recommended way of cythonizing this would be using a
setup.py file such as
from setuptools import setup from setuptools.extension import Extension from Cython.Build import cythonize from Cython.Distutils import build_ext setup( name="mypkg", ext_modules=cythonize( [ Extension("mypkg.*", ["mypkg/*.py"]), Extension("mypkg2.*", ["mypkg2/*.py"]) ], build_dir="build", compiler_directives=dict( always_allow_keywords=True )), cmdclass=dict( build_ext=build_ext ), packages=["mypkg", "mypkg2"] )
setup.py is more or less what you would expect from a project that uses
Cython. There are two things to be noted though. First, the
always_allow_keywords directive makes it possible for Flask view functions to
work correctly by disabling an optimization that only allows keyword arguments
for functions with a lot of parameters. Second, we do not use the
parameter suggested by some guides, which would put the cythonized code into
another package. By omitting it, the compiled code is kept in the same place.
However, after we build our project with
python setup.py build_ext, we notice
that the resulting package cannot be imported - it is missing an
file. There is
__init__.so that can be imported from Python, but that isn't
enough to make a directory a package in Python's eyes. Not being able to import
the package is not the only problem - code inside it cannot perform
package-relative imports (e.g.
from .foo import foo) either, which breaks its
To remedy this problem, we can copy the
__init__.py file from our source tree
after the rest of the project is built. A good way to do so is overriding the
build_ext class in
# ... from pathlib import Path import shutil # ... class MyBuildExt(build_ext): def run(self): build_ext.run(self) build_dir = Path(self.build_lib) root_dir = Path(__file__).parent target_dir = build_dir if not self.inplace else root_dir self.copy_file(Path('mypkg') / '__init__.py', root_dir, target_dir) self.copy_file(Path('mypkg2') / '__init__.py', root_dir, target_dir) self.copy_file(Path('mypkg') / '__main__.py', root_dir, target_dir) self.copy_file(Path('mypkg2') / '__main__.py', root_dir, target_dir) def copy_file(self, path, source_dir, destination_dir): if not (source_dir / path).exists(): return shutil.copyfile(str(source_dir / path), str(destination_dir / path)) setup( # ... cmdclass=dict( build_ext=MyBuildExt ), # ... )
We have successfully created Python packages that can be imported from. They
build/lib.linux-x86_64-3.6 or something similar. Sadly, this is
not enough for the distribution of our package. Ideally, we'd like to have an
installable package that only contains compiled code. The current standard for
Python archives is the wheel format (.whl), which aims to replace the .egg
format. So, let's try to create a wheel with
python setup.py bdist_wheel!
After the command finishes, there should be a
dist folder that contains a
wheel file. Unpacking it yields something like this:
. ├── mypkg │ ├── bar.cpython-36m-x86_64-linux-gnu.so │ ├── bar.py │ ├── foo.cpython-36m-x86_64-linux-gnu.so │ ├── foo.py │ ├── __init__.cpython-36m-x86_64-linux-gnu.so │ └── __init__.py ├── mypkg-0.0.0.dist-info │ ├── DESCRIPTION.rst │ ├── METADATA │ ├── metadata.json │ ├── RECORD │ ├── top_level.txt │ └── WHEEL └── mypkg2 ├── bar.cpython-36m-x86_64-linux-gnu.so ├── bar.py ├── foo.cpython-36m-x86_64-linux-gnu.so ├── foo.py ├── __init__.cpython-36m-x86_64-linux-gnu.so └── __init__.py
Apparently, the archive contains not only compiled code, but also the sources.
There is a way to fix this, however counter-intuitive it might seem. We need to
remove our packages from the
packages argument of the
setup call. This way,
the extensions will still be built and included in the wheel, but the source
code will not.
setup( # ... packages= )
The content of the built wheel should then look like this:
dist/ ├── mypkg │ ├── bar.cpython-36m-x86_64-linux-gnu.so │ ├── foo.cpython-36m-x86_64-linux-gnu.so │ ├── __init__.cpython-36m-x86_64-linux-gnu.so │ └── __init__.py ├── mypkg-0.0.0.dist-info │ ├── DESCRIPTION.rst │ ├── METADATA │ ├── metadata.json │ ├── RECORD │ ├── top_level.txt │ └── WHEEL └── mypkg2 ├── bar.cpython-36m-x86_64-linux-gnu.so ├── foo.cpython-36m-x86_64-linux-gnu.so ├── __init__.cpython-36m-x86_64-linux-gnu.so └── __init__.py
The wheel can be installed with
pip install dist/*.whl. If we do not need to
inspect the wheel or distribute it manually, we can just run
pip install . in
the project directory, which builds the wheel and installs it for us.
It is also possible to strip Python source code from .egg archives, but it
involves overriding the
bdist_egg command from
setuptools. I won't cover
that here, but if you're interested, check out the
option and the
zap_pyfiles method of aforementioned command class.
By following this guide, you should be able to cythonize a Python codebase with non-trivial package/module structure, thus making it difficult for evil hackers to reverse engineer it and steal your programming know-how.