#"abcd" - an example C shared library used by both Python/C and Cython Python extensions

This is the result of research I did to break a monolithic hand-written Python/C extension into components, including new components written in Cython.

I decided to move the core C functions into one shared library, move the old Python/C into its own Python extension, and develop the new components in their own extensions, where all the extensions depend on the core shared library.

I couldn't find documentation on how that might work, so I developed this example to explore the topic.

NOTE: I am far from knowledgeable about this topic. What you read here is the rough equivalent of taking a hammer and beating on things until they work. If there's a better solution, please let me know.

#Overview

The file "abcd.c" implements three functions:

  1. const char *abcd_get_name(void) - get a byte string indicated by an internal value;

  2. void abcd_set_name(int value) - modify that internal value (changes the returned string);

  3. int abcd_add_me(int i, int j) - return the sum i+j and set the internal value to that sum.

I want these to be in the Python module "abcd" as functions with the same name.

The setup.py creates three Python extensions:

  1. abcd.libabcd_core contains the core shared library, plus an empty module definition for Python;

  2. abcd.name uses the raw Python/C API to access abcd_get_name()/abcd_set_name() as the Python functions get_name()/set_name();

  3. abcd.add uses Cython to call abcd_add_me() as add_me().

I chose this design to check that calling add_me() in Cython modifies the same internal value used by get_name() in the Python/C API.

#Creating the "abcd.libabcd_core" shared library

Before getting started, a note: my initial version of this abcd package put the C extension code into abcd.core instead of abcd.libabcd_core. That worked fine on macOS, but not on Linux-based OSes, where I had to make the shared library dependency explicit with the -l flag (see below). That in turn required that my core extension name start with "lib" and use a prefix unlikely to collide with other packages. Now, back to the narrative, which has been updated to reflect my new knowledge.

I started by making an abcd.libabcd_core containing only the C code, compiled like this in setup.py:

module1 = Extension(
    "abcd.libabcd_core",
    sources = ["abcd.c"],
    depends = ["abcd.h"],
    )

I also had a bare abcd/__init__.py file that did nothing.

The pip install compiled and installed abcd/libabcd_core.cpython-39-darwin.so without a problem, but it wasn't importable:

>>> import abcd.libabcd_core
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dynamic module does not define module export function (PyInit_libabcd_core)

I decided the best option was to have abcd.libabcd_core define an empty Python module, also in abcd.c:

/* Define an empty Python module */

static PyMethodDef libabcd_core_methods[] = {
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef libabcd_core_module = {
    PyModuleDef_HEAD_INIT,
    "abcd.libabcd_core",
    "Store the core C functionality",
    -1,
    libabcd_core_methods
};

PyMODINIT_FUNC
PyInit_libabcd_core(void) {
  return PyModule_Create(&libabcd_core_module);
}

This lets Python handle the mechanics of loading the extension module, which happens to bring my core C functions along with it.
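
After reinstalling, a quick sanity check that the module now imports, and that the docstring from the PyModuleDef above comes through:

>>> import abcd.libabcd_core
>>> abcd.libabcd_core.__doc__
'Store the core C functionality'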

#Creating "abcd.name" Python/C extension

The next step is to check that a Python/C extension can use the shared library, so I created abcd_name.c, which uses the traditional Python/C API to define the following Python functions:

/* Define the Python function "get_name()" */
static PyObject *
get_name(PyObject *self, PyObject *args) {
  const char *name = abcd_get_name();
  return PyUnicode_FromString(name);
}

/* Define the Python function "set_name(i)" */
static PyObject *
set_name(PyObject *self, PyObject *args) {
  int value;
  if (!PyArg_ParseTuple(args, "i:set_name", &value)) {
    return NULL;
  }
  abcd_set_name(value);
  Py_RETURN_NONE;
}

/* Define the available methods */
static PyMethodDef abcd_name_methods[] = {
  {"get_name", get_name, METH_NOARGS, "Get the name"},
  {"set_name", set_name, METH_VARARGS, "Set the name"},
  {NULL, NULL, 0, NULL}
};

The setup.py is pretty simple:

module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    # extra_link_args will go here, in a bit
    )

and pip is able to build and install it.

#Problem importing "abcd.name" on macOS

Does it work?

Not quite.

>>> import abcd.name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dlopen(/Users/dalke/cvses/abcd/abcd/name.cpython-39-darwin.so, 2): Symbol not found: _abcd_get_name
  Referenced from: /Users/dalke/cvses/abcd/abcd/name.cpython-39-darwin.so
  Expected in: flat namespace
  in /Users/dalke/cvses/abcd/abcd/name.cpython-39-darwin.so

I noticed that I can make it work by first importing abcd.libabcd_core and then importing abcd.name:

>>> import abcd.libabcd_core
>>> import abcd.name
>>> abcd.name.get_name()
'Andrew'
>>> abcd.name.set_name(1)
>>> abcd.name.get_name()
'Dalke'

To ensure this always happens, I added the following to the top of abcd/__init__.py:

from . import libabcd_core as _core

My best interpretation is that the macOS shared library loader resolves missing symbols by looking for symbols already loaded into the process namespace.

#Problem importing "abcd.name" on Linux-based OSes

I primarily develop under macOS. Months after I thought I had a successful solution, I found it didn't work on Linux-based OSes (and likely not on other Unix-based OSes either). Here's an example using the CentOS release 6.10 Docker image from manylinux2010:

>>> import abcd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/__init__.py", line 6, in <module>
    from .name import get_name, set_name
ImportError: /opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so: undefined symbol: abcd_set_name

That's not good.

The problem seems to be that on GNU systems (or perhaps it's a SYS V thing? Or how Python is compiled on those platforms? I mentioned I don't really know what I'm doing), the dynamic library loader doesn't consider the symbol table of the current process when it loads a new extension. Instead, the symbols must be available from the listed dependencies.

Running ldd shows that abcd/name.so (or rather, name.cpython-37m-x86_64-linux-gnu.so, which includes the "SOABI" string Python uses to distinguish between shared libraries meant for different platform ABIs) doesn't list libabcd_core as a dependency:

% ldd /opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so
	linux-vdso.so.1 =>  (0x00007ffc2137a000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6f4e3b5000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f6f4e021000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f6f4e7d4000)

To verify that's the issue, I'll use patchelf to add libabcd_core as a shared library dependency of name.so:

% patchelf --add-needed \
  /opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/libabcd_core.cpython-37m-x86_64-linux-gnu.so \
  /opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so

Then verify that it's added:

# ldd /opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so
	linux-vdso.so.1 =>  (0x00007fffccdfa000)
	/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/libabcd_core.cpython-37m-x86_64-linux-gnu.so (0x00007fe16d253000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe16d036000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fe16cca2000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe16d658000)

Then verify that it imports:

>>> import abcd.name
>>> abcd.name.get_name()
'Andrew'
>>> abcd.name.set_name(2)
>>> abcd.name.get_name()
'Scientific'

I don't want to manually patch the ELF file after it's created, so what can I do in setup.py?

#Linking "abcd.name" to "libabcd_core" on Linux-based OSes

(Short version: add the library as a -l option, add the build directory as a -L option, and add $ORIGIN to the library's rpath as "-Wl,-rpath,$ORIGIN".)

The traditional way to add a shared library dependency is to use the -l flag during the linker step. I can use sysconfig to get the value of "SOABI",

import sysconfig
# A value like 'cpython-37m-x86_64-linux-gnu' or 'cpython-39-darwin'
soabi = sysconfig.get_config_var("SOABI")

but as I learned, the generated extension doesn't always include the ABI tag (for example, on my FreeBSD box). Instead, I need to get "EXT_SUFFIX", which includes the ".so" suffix, and then strip that off to get the infix (for lack of a better word) term:

# A value like '.cpython-39-x86_64-linux-gnu.so'
# or  '.cpython-39-darwin.so' or '.so'
# Used to determine the shared library filename.
EXT_SUFFIX = sysconfig.get_config_var("EXT_SUFFIX")
EXT_INFIX = EXT_SUFFIX.rpartition(".")[0]

and modify module2's Extension definition to include the library:

module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    extra_link_args = [f"-labcd_core{EXT_INFIX}"],
    )
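
For reference, on the CentOS image used above those sysconfig values work out like this (they'll differ on other platforms and Python builds), so the -l option expands to the full library name:

>>> import sysconfig
>>> sysconfig.get_config_var("EXT_SUFFIX")
'.cpython-37m-x86_64-linux-gnu.so'
>>> EXT_INFIX = sysconfig.get_config_var("EXT_SUFFIX").rpartition(".")[0]
>>> f"-labcd_core{EXT_INFIX}"
'-labcd_core.cpython-37m-x86_64-linux-gnu'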

Compilation fails with the following:

    building 'abcd.libabcd_core' extension
    creating build/temp.linux-x86_64-3.7
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/opt/python/cp37-cp37m/include/python3.7m -c abcd.c -o build/temp.linux-x86_64-3.7/abcd.o
    gcc -pthread -shared build/temp.linux-x86_64-3.7/abcd.o -o build/lib.linux-x86_64-3.7/abcd/libabcd_core.cpython-37m-x86_64-linux-gnu.so
    building 'abcd.name' extension
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/opt/python/cp37-cp37m/include/python3.7m -c abcd_name.c -o build/temp.linux-x86_64-3.7/abcd_name.o
    gcc -pthread -shared build/temp.linux-x86_64-3.7/abcd_name.o -o build/lib.linux-x86_64-3.7/abcd/name.cpython-37m-x86_64-linux-gnu.so -labcd_core.cpython-37m-x86_64-linux-gnu
    /opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8/ld: cannot find -labcd_core.cpython-37m-x86_64-linux-gnu
    collect2: error: ld returned 1 exit status
    error: command 'gcc' failed with exit status 1

This says the linker couldn't find libabcd_core.cpython-37m-x86_64-linux-gnu.so, which makes sense as the library isn't on the library search path. The output shows that the library has been compiled and is in build/lib.linux-x86_64-3.7, so I need to add "-Lbuild/lib.linux-x86_64-3.7" as a linker argument.

The platform specifier string "linux-x86_64-3.7" is not the same as the SOABI. With a bit of research, I figured out how to construct it using the following:

# A value like 'linux-x86_64' or 'macosx-10.14-x86_64'
platform = sysconfig.get_platform()

# The default location of the build directory.
# (This assumes the directory was not specified on the command-line.)
plat_specifier = "%s-%d.%d" % (platform, *sys.version_info[:2])
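
On the same CentOS image, that gives the directory name seen in the compiler output above (again, this will differ per platform and Python version):

>>> import sys, sysconfig
>>> "%s-%d.%d" % (sysconfig.get_platform(), *sys.version_info[:2])
'linux-x86_64-3.7'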

I'll add these to module2's Extension definition:

module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    extra_link_args = [f"-Lbuild/lib.{plat_specifier}/abcd", f"-labcd_core{EXT_INFIX}"],
    )

This compiles and installs! But does it work?

>>> import abcd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/__init__.py", line 6, in <module>
    from .name import get_name, set_name
ImportError: libabcd_core.cpython-37m-x86_64-linux-gnu.so: cannot open shared object file: No such file or directory

That's still not what I want. What does ldd say?

% ldd /opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so
	linux-vdso.so.1 =>  (0x00007ffd73ccf000)
	libabcd_core.cpython-37m-x86_64-linux-gnu.so => not found
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f54f9019000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f54f8c85000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f54f9438000)

Hmm. libabcd_core.so is now listed as a dependency, but it can't be found. It appears the dynamic library loader doesn't look in the directory containing the dependent library. I'll check if that's the case by adding the appropriate directory to LD_LIBRARY_PATH:

% env LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/  \
        /opt/python/cp37-cp37m/bin/python
Python 3.7.9 (default, Jan 11 2021, 19:12:33)
[GCC 8.3.1 20190311 (Red Hat 8.3.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import abcd
>>> abcd.get_name()
'Andrew'
>>>

That fixed it! Now how do I tell the dynamic library loader to look in the same directory for the dependency?

There's a special linker value called rpath (though I've read that's deprecated and you should use runpath instead?). If it contains $ORIGIN (or alternatively ${ORIGIN}), then the loader will look relative to the directory containing the library. This is designed so libraries can be moved around without having to change the relative paths each time.

I'll configure this in module2's Extension definition:

module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    extra_link_args = [f"-Lbuild/lib.{plat_specifier}/abcd", f"-labcd_core{EXT_INFIX}",
                       "-Wl,-rpath,$ORIGIN"],
    )

After a re-installation ... it works:

>>> import abcd
>>>
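
If you want to confirm the rpath actually made it into the installed extension, a small check like the following should show a RUNPATH (or RPATH) entry containing $ORIGIN. This is only a sketch: it assumes GNU readelf is on the PATH and that the path is adjusted to wherever abcd was installed.

import subprocess

# Dump the dynamic section of the installed extension and keep the
# RPATH/RUNPATH lines; one of them should contain [$ORIGIN].
so_path = "/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so"
out = subprocess.run(["readelf", "-d", so_path],
                     capture_output=True, text=True, check=True).stdout
print([line for line in out.splitlines()
       if "RPATH" in line or "RUNPATH" in line])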

#Re-adding macOS support

Problem is, the fix I did for CentOS breaks macOS support. If I try to pip install the current code, I get:

    building 'abcd.libabcd_core' extension
    creating build/temp.macosx-10.14-x86_64-3.9
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Users/dalke/venvs/py39-2021-4/include -I/Users/dalke/local/include/python3.9 -c abcd.c -o build/temp.macosx-10.14-x86_64-3.9/abcd.o
    gcc -bundle -undefined dynamic_lookup build/temp.macosx-10.14-x86_64-3.9/abcd.o -o build/lib.macosx-10.14-x86_64-3.9/abcd/libabcd_core.cpython-39-darwin.so
    building 'abcd.name' extension
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Users/dalke/venvs/py39-2021-4/include -I/Users/dalke/local/include/python3.9 -c abcd_name.c -o build/temp.macosx-10.14-x86_64-3.9/abcd_name.o
    gcc -bundle -undefined dynamic_lookup build/temp.macosx-10.14-x86_64-3.9/abcd_name.o -o build/lib.macosx-10.14-x86_64-3.9/abcd/name.cpython-39-darwin.so -Lbuild/lib.macosx-10.14-x86_64-3.9/abcd -labcd_core.cpython-39-darwin -Wl,-rpath,$ORIGIN
    ld: can't link with bundle (MH_BUNDLE) only dylibs (MH_DYLIB) file 'build/lib.macosx-10.14-x86_64-3.9/abcd/libabcd_core.cpython-39-darwin.so'
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    error: command '/usr/bin/gcc' failed with exit code 1

Apparently it's because Python compiles extension modules with -bundle instead of -dynamiclib? I didn't bother to figure it out, since I knew I could support macOS without the extra linker arguments.

My solution instead was to only include the extra linker args when not compiling for macOS (a.k.a. "darwin"):

import sys
if sys.platform == "darwin":
    libcore_link_args = []
else:
    libcore_link_args = [f"-Lbuild/lib.{plat_specifier}/abcd", f"-labcd_core{EXT_INFIX}", "-Wl,-rpath,$ORIGIN"]

module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    extra_link_args = libcore_link_args,
    )

With that, I'm able to support CentOS and macOS.

#Creating "abcd.add" Cython extension

Whew. That was a long detour.

Finally, I want a Cython extension which uses add_me() so I can check that the two extensions are changing the same private value in the core shared library.

The interface in abcd_add.pyx is ridiculously simple, at least for someone like me who is used to the Python/C API:

# Use a Cython extension to call abcd_add_me() 

cdef extern from "abcd.h" nogil:
    int abcd_add_me(int i, int j)

def add_me(int i, int j):
    return abcd_add_me(i, j)

and the setup.py gains a new Extension definition, with the same extra_link_args as module2:

# A Cython extension
module3 = Extension(
    "abcd.add",
    sources = ["abcd_add.pyx"],
    depends = ["abcd.h"],
    extra_link_args = libcore_link_args,
    )

along with a cythonize() call in setup.py wrapped around the Extension() definition (see the next section).

After a pip install, check if it works:

>>> from abcd import add
>>> add.add_me(2, 7)
9
>>> add.add_me(3, -2)
1

Now for the real test - are both extensions using the same memory space?

>>> from abcd import name
>>> name.get_name()
'Dalke'
>>> add.add_me(0, 0)
0
>>> name.get_name()
'Andrew'

Yes! This shows that the Cython add_me() shares the same internal value as the Python/C get_name().

You can also try the small test suite in the tests/ subdirectory.

% cd tests/
% python test_abcd.py
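
I won't reproduce test_abcd.py here, but the checks are along these lines. This is only a sketch based on the interactive sessions above, not the actual test file:

import unittest
from abcd import get_name, set_name, add_me

class TestSharedState(unittest.TestCase):
    def test_add_me(self):
        self.assertEqual(add_me(2, 7), 9)

    def test_shared_internal_value(self):
        # set_name() and get_name() come from the Python/C extension ...
        set_name(1)
        self.assertEqual(get_name(), "Dalke")
        # ... while add_me() comes from the Cython extension, yet it
        # updates the same internal value used by get_name().
        self.assertEqual(add_me(0, 0), 0)
        self.assertEqual(get_name(), "Andrew")

if __name__ == "__main__":
    unittest.main()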

#The final setup.py

The final setup.py is not that simple, but it does work on macOS, CentOS, and FreeBSD, with Python 3.7 and Python 3.9.

import sys
from setuptools import setup, Extension
import sysconfig
from Cython.Build import cythonize

####  System configuration information

# A value like '.cpython-39-x86_64-linux-gnu.so'
# or  '.cpython-39-darwin.so' or '.so'
# Used to determine the shared library filename.
EXT_SUFFIX = sysconfig.get_config_var("EXT_SUFFIX")
EXT_INFIX = EXT_SUFFIX.rpartition(".")[0]

# A value like 'linux-x86_64', 'macosx-10.14-x86_64', or
# 'freebsd-12.2-RELEASE-p9-amd64'
platform = sysconfig.get_platform()

# The default location of the build directory.
# (This assumes the directory was not specified on the command-line.)
plat_specifier = "%s-%d.%d" % (platform, *sys.version_info[:2])

# Figure out the platform-specific linker arguments so the new
# extensions can access the C functions in the core library.
if sys.platform == "darwin":
    libcore_link_args = []
else:
    # The compiler needs the -L to find libcore in the build directory.
    # The run-time loader needs an '$ORIGIN' rpath to find libcore in the install directory.
    libcore_link_args = [f"-Lbuild/lib.{plat_specifier}/abcd", f"-labcd_core{EXT_INFIX}", "-Wl,-rpath,$ORIGIN"]

####  Extension configuration information
    
# This starts with 'lib' so I can use the '-l' flag.
module1 = Extension(
    "abcd.libabcd_core",
    sources = ["abcd.c"],
    depends = ["abcd.h"],
    )

# A hand-written Python/C extension.
module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    extra_link_args = libcore_link_args,
    )

# A Cython extension
module3 = Extension(
    "abcd.add",
    sources = ["abcd_add.pyx"],
    depends = ["abcd.h"],
    extra_link_args = libcore_link_args,
    )

####  setup()

setup(
    name = "abcd",
    version = "1.0",
    description = "example of a shared library used by C/Python and Cython extensions",
    author = "Andrew Dalke",
    packages = ["abcd"],
    ext_modules = [
        module1,
        module2,
        ] + cythonize([
            module3,
        ]),
    )

#The final __init__.py

The abcd/__init__.py file first imports abcd.libabcd_core (to ensure the C shared library is available on macOS) and then imports the functions from the Python/C and Cython extensions.

# Load the C shared library before loading any extensions

from . import libabcd_core as _core

# Import the Python/C extension
from .name import get_name, set_name

# Import the Cython extension
from .add import add_me

This package was written in 2021 by Andrew Dalke <dalke@dalkescientific.com>.

To the extent possible under law, the author has dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.

You should have received a copy of the CC0 Public Domain Dedication along with this software. If not, see http://creativecommons.org/publicdomain/zero/1.0/.

#Release history

Version 1.1 (2021-09-20): Added support for CentOS and FreeBSD.

Version 1.0 (2021-08-13): Initial public release. Only supports macOS.