Recently, I worked on a Python project that required the whole codebase to be protected using Cython. Although protecting Python sources from reverse engineering seems like a futile task at first, cythonizing all the code leads to a reasonable amount of security (the binary is very difficult to disassemble, but it's still possible to e.g. monkey patch parts of the program).
This security comes at a price, though - the primary use case for Cython is writing compiled extensions that can easily interface with Python code. Therefore, the support for non-trivial module/package structures is rather limited and we have to do some extra work to achieve the desired results.
The first obstacle we had to overcome was that it's difficult to compile a whole Python package (as in "directory containing an __init__.py file") with Cython.
Imagine the following structure:
src
├── mypkg
│   ├── bar.py
│   ├── foo.py
│   └── __init__.py
├── mypkg2
│   ├── bar.py
│   ├── foo.py
│   └── __init__.py
└── setup.py
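For concreteness, assume the modules contain something like this (hypothetical contents; the package-relative import in bar.py will matter later):
# mypkg/foo.py (hypothetical contents)
def foo():
    return 40


# mypkg/bar.py (hypothetical contents) - note the package-relative import
from .foo import foo

def bar():
    return foo() + 2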
The recommended way of cythonizing this is to use a setup.py file such as this:
from setuptools import setup
from setuptools.extension import Extension
from Cython.Build import cythonize
from Cython.Distutils import build_ext

setup(
    name="mypkg",
    ext_modules=cythonize(
        [
            Extension("mypkg.*", ["mypkg/*.py"]),
            Extension("mypkg2.*", ["mypkg2/*.py"])
        ],
        build_dir="build",
        compiler_directives=dict(
            always_allow_keywords=True
        )),
    cmdclass=dict(
        build_ext=build_ext
    ),
    packages=["mypkg", "mypkg2"]
)
The setup.py is more or less what you would expect from a project that uses Cython. There are two things to note, though. First, the always_allow_keywords directive makes Flask view functions work correctly: it disables an optimization under which functions taking only a few parameters accept them positionally but not as keywords, while Flask passes URL parameters to view functions as keyword arguments. Second, we do not use the ext_package parameter suggested by some guides, which would put the cythonized code into another package. By omitting it, the compiled code is kept in the same place as the original modules.
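To make the first point concrete, here is a minimal, hypothetical Flask view illustrating what the directive protects against:
# Hypothetical Flask app: Flask passes URL variables to view functions as
# *keyword* arguments.
from flask import Flask

app = Flask(__name__)

@app.route("/users/<username>")
def show_user(username):
    # Without always_allow_keywords=True, Cython may compile this one-argument
    # function so that it only accepts positional arguments; Flask's call
    # show_user(username=...) would then fail with a TypeError at request time.
    return "Hello, " + username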
However, after we build our project with python setup.py build_ext, we notice that the resulting package cannot be imported - it is missing an __init__.py file. There is an __init__.so that can be imported from Python, but that isn't enough to make a directory a package in Python's eyes. Not being able to import the package is not the only problem - code inside it cannot perform package-relative imports (e.g. from .foo import foo) either, which breaks its functionality.
To remedy this problem, we can copy the __init__.py files from our source tree after the rest of the project is built. A good way to do so is to override the build_ext class in setup.py:
# ...
from pathlib import Path
import shutil
# ...

class MyBuildExt(build_ext):
    def run(self):
        # Build the extension modules as usual, then copy over the plain
        # Python files that must remain in the package.
        build_ext.run(self)

        build_dir = Path(self.build_lib)
        root_dir = Path(__file__).parent
        target_dir = build_dir if not self.inplace else root_dir

        self.copy_source_file(Path('mypkg') / '__init__.py', root_dir, target_dir)
        self.copy_source_file(Path('mypkg2') / '__init__.py', root_dir, target_dir)

        # __main__.py is copied as well so that `python -m mypkg` keeps working.
        self.copy_source_file(Path('mypkg') / '__main__.py', root_dir, target_dir)
        self.copy_source_file(Path('mypkg2') / '__main__.py', root_dir, target_dir)

    # Named copy_source_file (not copy_file) so it does not shadow the
    # copy_file method that distutils/setuptools commands already provide and
    # call internally (e.g. during in-place builds).
    def copy_source_file(self, path, source_dir, destination_dir):
        if not (source_dir / path).exists():
            return

        shutil.copyfile(str(source_dir / path), str(destination_dir / path))


setup(
    # ...
    cmdclass=dict(
        build_ext=MyBuildExt
    ),
    # ...
)
We have successfully created Python packages that can be imported from. They reside under build/lib.linux-x86_64-3.6 or something similar. Sadly, this is not enough for distributing our package. Ideally, we'd like to have an installable package that contains only compiled code. The current standard for Python archives is the wheel format (.whl), which aims to replace the .egg format. So, let's try to create a wheel with python setup.py bdist_wheel!
After the command finishes, there should be a dist folder that contains a wheel file. Unpacking it yields something like this:
.
├── mypkg
│   ├── bar.cpython-36m-x86_64-linux-gnu.so
│   ├── bar.py
│   ├── foo.cpython-36m-x86_64-linux-gnu.so
│   ├── foo.py
│   ├── __init__.cpython-36m-x86_64-linux-gnu.so
│   └── __init__.py
├── mypkg-0.0.0.dist-info
│   ├── DESCRIPTION.rst
│   ├── METADATA
│   ├── metadata.json
│   ├── RECORD
│   ├── top_level.txt
│   └── WHEEL
└── mypkg2
    ├── bar.cpython-36m-x86_64-linux-gnu.so
    ├── bar.py
    ├── foo.cpython-36m-x86_64-linux-gnu.so
    ├── foo.py
    ├── __init__.cpython-36m-x86_64-linux-gnu.so
    └── __init__.py
Apparently, the archive contains not only compiled code, but also the sources. There is a way to fix this, however counter-intuitive it might seem. We need to remove our packages from the packages argument of the setup call. This way, the extensions will still be built and included in the wheel, but the source code will not.
setup(
    # ...
    packages=[]
)
The content of the built wheel should then look like this:
.
├── mypkg
│   ├── bar.cpython-36m-x86_64-linux-gnu.so
│   ├── foo.cpython-36m-x86_64-linux-gnu.so
│   ├── __init__.cpython-36m-x86_64-linux-gnu.so
│   └── __init__.py
├── mypkg-0.0.0.dist-info
│   ├── DESCRIPTION.rst
│   ├── METADATA
│   ├── metadata.json
│   ├── RECORD
│   ├── top_level.txt
│   └── WHEEL
└── mypkg2
    ├── bar.cpython-36m-x86_64-linux-gnu.so
    ├── foo.cpython-36m-x86_64-linux-gnu.so
    ├── __init__.cpython-36m-x86_64-linux-gnu.so
    └── __init__.py
The wheel can be installed with pip install dist/*.whl. If we do not need to inspect the wheel or distribute it manually, we can just run pip install . in the project directory, which builds the wheel and installs it for us.
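As a quick sanity check, we can confirm that the installed package is served from the compiled extensions (module names follow the example project; exact paths will vary):
# Run this from outside the project directory, so the local source tree does
# not shadow the installed package.
import mypkg.foo

# Should print a path ending in something like
# .../site-packages/mypkg/foo.cpython-36m-x86_64-linux-gnu.so
print(mypkg.foo.__file__)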
It is also possible to strip Python source code from .egg archives, but it involves overriding the bdist_egg command from setuptools. I won't cover that here, but if you're interested, check out the --exclude-source-files option and the zap_pyfiles method of the aforementioned command class.
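For reference, a rough and untested sketch of what such an override could look like, assuming the exclude_source_files option and zap_pyfiles method behave as described above:
from setuptools.command.bdist_egg import bdist_egg

class BDistEggStripped(bdist_egg):
    """Always strip .py files from the egg, as --exclude-source-files would."""

    def finalize_options(self):
        bdist_egg.finalize_options(self)
        # With this flag set, bdist_egg calls its zap_pyfiles() method, which
        # removes the Python source files from the egg build directory.
        self.exclude_source_files = True

# Then pass bdist_egg=BDistEggStripped in the cmdclass dict of setup().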
By following this guide, you should be able to cythonize a Python codebase with a non-trivial package/module structure, thus making it difficult for evil hackers to reverse-engineer it and steal your programming know-how.