Added explicit license information to setup.py.
Switched from static determination of the library options to using build_ext.get_ext_fullpath().
Removed a dead comment.
This is the result of research I did to break a monolithic hand-written Python/C extension into components, including new components written in Cython.
I decided to move the core C functions into one shared library, move the old Python/C code into its own Python extension, and develop the new components in their own extensions, with all of the extensions depending on the core shared library.
I couldn't find documentation on how that might work, so I developed this example to explore the topic.
NOTE: I am far from knowledgeable about this topic. What you read here is the rough equivalent of taking a hammer and beating things until it works. If there's a better solution, please let me know.
The file "abcd.c" implements three functions:
const char *abcd_get_name(void)
- get a byte string indicated by an internal value;
void abcd_set_name(int value)
- modify that internal value (changes the returned string);
int abcd_add_me(int i, int j)
- return the sum i+j and set the internal value to that sum.
I want these to be in the Python module "abcd" as functions with the same name.
The setup.py creates three Python extensions:
abcd.libabcd_core
contains the core shared library, plus an empty module definition for Python;
abcd.name
uses the raw Python/C API to access abcd_get_name()/abcd_set_name() as the Python functions get_name()/set_name();
abcd.add
uses Cython to call abcd_add_me() as add_me().
I chose this design to check that calling add_me() in Cython modifies the same internal value used by get_name() in the Python/C API.
Before getting started: my initial version of this abcd package put the C extension code into abcd.core instead of abcd.libabcd_core. The result worked fine on macOS, but not on Linux-based OSes, where I had to make the shared library dependency explicit with the -l flag (see below). That in turn required that my core extension's name start with lib and use a prefix that likely isn't shared with other packages. Now, back to the narrative, which has been updated to reflect my new knowledge.
I started by making an abcd.libabcd_core containing only the C code, compiled like this in setup.py:
module1 = Extension(
    "abcd.libabcd_core",
    sources = ["abcd.c"],
    depends = ["abcd.h"],
)
I also had a bare abcd/__init__.py package file that did nothing.
The pip install compiled and installed abcd/libabcd_core.cpython-39-darwin.so without a problem, but it wasn't importable:
>>> import abcd.libabcd_core
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: dynamic module does not define module export function (PyInit_libabcd_core)
I decided the best option was to have abcd.libabcd_core define an empty Python module, also in abcd.c:
/* Define an empty Python library */
static PyMethodDef libabcd_core_methods[] = {
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef libabcd_core_module = {
    PyModuleDef_HEAD_INIT,
    "abcd.libabcd_core",
    "Store the core C functionality",
    -1,
    libabcd_core_methods
};

PyMODINIT_FUNC
PyInit_libabcd_core(void) {
    return PyModule_Create(&libabcd_core_module);
}
This lets Python handle the mechanics of loading the shared library, which happens to bring along my C shared library.
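A quick interactive check that the import now works; the docstring shown is the one set in the PyModuleDef above:
>>> import abcd.libabcd_core
>>> abcd.libabcd_core.__doc__
'Store the core C functionality'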
The next step is to check if a Python/C extension can use the shared library, so I created abcd_name.c using the traditional Python/C API, which defines the following Python functions:
/* Define the Python function "get_name()" */
static PyObject *
get_name(PyObject *self, PyObject *args) {
    const char *name = abcd_get_name();
    return PyUnicode_FromString(name);
}

/* Define the Python function "set_name(i)" */
static PyObject *
set_name(PyObject *self, PyObject *args) {
    int value;
    if (!PyArg_ParseTuple(args, "i:set_name", &value)) {
        return NULL;
    }
    abcd_set_name(value);
    Py_RETURN_NONE;
}

/* Define the available methods */
static PyMethodDef abcd_name_methods[] = {
    {"get_name", get_name, METH_NOARGS, "Get the name"},
    {"set_name", set_name, METH_VARARGS, "Set the name"},
    {NULL, NULL, 0, NULL}
};
The setup.py is pretty simple:
module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    # extra_link_args will go here, in a bit
)
and pip is able to build and install it.
Does it work?
Not quite.
>>> import abcd.name
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: dlopen(/Users/dalke/cvses/abcd/abcd/name.cpython-39-darwin.so, 2): Symbol not found: _abcd_get_name
Referenced from: /Users/dalke/cvses/abcd/abcd/name.cpython-39-darwin.so
Expected in: flat namespace
in /Users/dalke/cvses/abcd/abcd/name.cpython-39-darwin.so
I noticed that I can make it work by first importing abcd.libabcd_core and then importing abcd.name:
>>> import abcd.libabcd_core
>>> import abcd.name
>>> abcd.name.get_name()
'Andrew'
>>> abcd.name.set_name(1)
>>> abcd.name.get_name()
'Dalke'
To ensure this always happens, I added the following to the top of abcd/__init__.py:
from . import libabcd_core as _core
My best interpretation is that the macOS shared library loader resolves missing symbols by looking for symbols already loaded into the process namespace.
I primarily develop under macOS. Months after I thought I had a successful solution, I found it didn't work on Linux-based OSes (and likely not other Unix-based OSes either). Here's an example using the CentOS release 6.10 Docker image from manylinux-2010:
>>> import abcd
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/__init__.py", line 6, in <module>
from .name import get_name, set_name
ImportError: /opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so: undefined symbol: abcd_set_name
That's not good.
The problem seems to be that on GNU systems (or perhaps it's a SysV thing? Or how Python is compiled on those platforms? I mentioned I don't really know what I'm doing), the dynamic library loader doesn't consider the symbol table of the current process when it loads a new extension. Instead, the symbols must be available from the listed dependencies.
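As an aside, one way to confirm it's a symbol-visibility issue (and a workaround some projects use instead of the explicit linking I describe below) is to have Python dlopen() extension modules with RTLD_GLOBAL, so the core library's symbols land in the global namespace. A rough sketch, not what I actually do:
import os, sys

# Sketch of an alternative workaround (not the approach taken below):
# make CPython dlopen() extension modules with RTLD_GLOBAL, so the symbols
# exported by abcd.libabcd_core are visible when abcd.name is loaded later.
sys.setdlopenflags(os.RTLD_NOW | os.RTLD_GLOBAL)

import abcd   # __init__.py loads libabcd_core first, so name's symbols now resolve
Using RTLD_GLOBAL for every extension risks symbol clashes, which is one reason to prefer making the dependency explicit instead.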
Running ldd shows that abcd/name.so (or rather, name.cpython-37m-x86_64-linux-gnu.so, which includes the "SOABI" string Python uses to distinguish between shared libraries meant for different platform ABIs) doesn't include libabcd_core:
% ldd /opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffc2137a000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6f4e3b5000)
libc.so.6 => /lib64/libc.so.6 (0x00007f6f4e021000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6f4e7d4000)
To verify that's the issue, I'll use patchelf to add libabcd_core as a shared library dependency of name.so:
% patchelf --add-needed \
/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/libabcd_core.cpython-37m-x86_64-linux-gnu.so \
/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so
Then verify that it's added:
# ldd /opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007fffccdfa000)
/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/libabcd_core.cpython-37m-x86_64-linux-gnu.so (0x00007fe16d253000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe16d036000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe16cca2000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe16d658000)
Then verify that it imports:
>>> import abcd.name
>>> abcd.name.get_name()
'Andrew'
>>> abcd.name.set_name(2)
>>> abcd.name.get_name()
'Scientific'
I don't want to manually patch the ELF file after it's created, so what can I do in setup.py?
(Short version: add the library as a -l option, add the build directory as a -L option, and add $ORIGIN to the library's rpath with "-Wl,-rpath,$ORIGIN".)
The traditional way to add a shared library dependency is to use the -l flag during the linker step. I can use sysconfig to get the value of "SOABI",
import sysconfig
# A value like 'cpython-37m-x86_64-linux-gnu' or 'cpython-39-darwin'
soabi = sysconfig.get_config_var("SOABI")
but as I learned, the generated extension doesn't always include the ABI suffix (for example, on my FreeBSD box). Instead, I need to get "EXT_SUFFIX", which includes the ".so" suffix, which I then need to strip off to get the infix (for lack of a better word) term:
# A value like '.cpython-39-x86_64-linux-gnu.so'
# or '.cpython-39-darwin.so' or '.so'
# Used to determine the shared library filename.
EXT_SUFFIX = sysconfig.get_config_var("EXT_SUFFIX")
EXT_INFIX = EXT_SUFFIX.rpartition(".")[0]
and modify module2's Extension definition to include the library:
module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    extra_link_args = [f"-labcd_core{EXT_INFIX}"],
)
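As a sanity check on that -l value: the linker turns "-lNAME" into a search for "libNAME.so", so the argument should be the built filename minus the leading "lib" and the trailing ".so". A small sketch to double-check the naming:
import sysconfig

EXT_SUFFIX = sysconfig.get_config_var("EXT_SUFFIX")   # e.g. '.cpython-37m-x86_64-linux-gnu.so'
EXT_INFIX = EXT_SUFFIX.rpartition(".")[0]             # e.g. '.cpython-37m-x86_64-linux-gnu'

built_filename = "libabcd_core" + EXT_SUFFIX          # the file built for abcd.libabcd_core
linker_lookup = "lib" + ("abcd_core" + EXT_INFIX) + ".so"   # what "-labcd_core<EXT_INFIX>" asks for
assert built_filename == linker_lookup
With the naming confirmed, on to the build.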
Compilation fails, with the following:
building 'abcd.libabcd_core' extension
creating build/temp.linux-x86_64-3.7
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/opt/python/cp37-cp37m/include/python3.7m -c abcd.c -o build/temp.linux-x86_64-3.7/abcd.o
gcc -pthread -shared build/temp.linux-x86_64-3.7/abcd.o -o build/lib.linux-x86_64-3.7/abcd/libabcd_core.cpython-37m-x86_64-linux-gnu.so
building 'abcd.name' extension
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/opt/python/cp37-cp37m/include/python3.7m -c abcd_name.c -o build/temp.linux-x86_64-3.7/abcd_name.o
gcc -pthread -shared build/temp.linux-x86_64-3.7/abcd_name.o -o build/lib.linux-x86_64-3.7/abcd/name.cpython-37m-x86_64-linux-gnu.so -labcd_core.cpython-37m-x86_64-linux-gnu
/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8/ld: cannot find -labcd_core.cpython-37m-x86_64-linux-gnu
collect2: error: ld returned 1 exit status
error: command 'gcc' failed with exit status 1
This says the linker couldn't find libabcd_core.cpython-37m-x86_64-linux-gnu.so, which makes sense as the library isn't on the library search path. The output shows that the library has been compiled, and is in build/lib.linux-x86_64-3.7/abcd, so I need to add "-Lbuild/lib.linux-x86_64-3.7/abcd" as a linker argument.
The platform specifier string "linux-x86_64-3.7" is not the same as the SOABI. With a bit of research, I figured out how to make it, using the following:
# A value like 'linux-x86_64' or 'macosx-10.14-x86_64'
platform = sysconfig.get_platform()
# The default location of the build directory.
# (This assumes the directory was not specified on the command-line.)
plat_specifier = "%s-%d.%d" % (platform, *sys.version_info[:2])
which I'll add to module2's Extension definition:
module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    extra_link_args = [f"-Lbuild/lib.{plat_specifier}/abcd", f"-labcd_core{EXT_INFIX}"],
)
This compiles and installs! But does it work?
>>> import abcd
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/__init__.py", line 6, in <module>
from .name import get_name, set_name
ImportError: libabcd_core.cpython-37m-x86_64-linux-gnu.so: cannot open shared object file: No such file or directory
That's still not what I want. What does ldd say?
% ldd /opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/name.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffd73ccf000)
libabcd_core.cpython-37m-x86_64-linux-gnu.so => not found
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f54f9019000)
libc.so.6 => /lib64/libc.so.6 (0x00007f54f8c85000)
/lib64/ld-linux-x86-64.so.2 (0x00007f54f9438000)
Hmm. libabcd_core.so is now listed as a dependency, but it can't be found. It appears the dynamic library loader doesn't look at other libraries in the same directory. I'll check if that's the case by adding the appropriate directory to LD_LIBRARY_PATH:
% env LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/python/cp37-cp37m/lib/python3.7/site-packages/abcd/ \
/opt/python/cp37-cp37m/bin/python
Python 3.7.9 (default, Jan 11 2021, 19:12:33)
[GCC 8.3.1 20190311 (Red Hat 8.3.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import abcd
>>> abcd.get_name()
'Andrew'
>>>
That fixed it! Now how do I tell the dynamic library loader to look in the same directory for the dependency?
There's a special linker value called rpath (though I've read that it's deprecated and you should use runpath instead?). If it contains $ORIGIN (or alternatively ${ORIGIN}), then the loader will look relative to the directory containing the library. This is designed so libraries can be moved around without having to change the relative paths each time.
I'll configure this in module2's Extension definition:
module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    extra_link_args = [f"-Lbuild/lib.{plat_specifier}/abcd", f"-labcd_core{EXT_INFIX}",
                       "-Wl,-rpath,$ORIGIN"],
)
After a re-installation ... it works!:
>>> import abcd
>>>
The problem is, the fix I made for CentOS breaks macOS support. If I try to pip install the current code, I get:
building 'abcd.libabcd_core' extension
creating build/temp.macosx-10.14-x86_64-3.9
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Users/dalke/venvs/py39-2021-4/include -I/Users/dalke/local/include/python3.9 -c abcd.c -o build/temp.macosx-10.14-x86_64-3.9/abcd.o
gcc -bundle -undefined dynamic_lookup build/temp.macosx-10.14-x86_64-3.9/abcd.o -o build/lib.macosx-10.14-x86_64-3.9/abcd/libabcd_core.cpython-39-darwin.so
building 'abcd.name' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Users/dalke/venvs/py39-2021-4/include -I/Users/dalke/local/include/python3.9 -c abcd_name.c -o build/temp.macosx-10.14-x86_64-3.9/abcd_name.o
gcc -bundle -undefined dynamic_lookup build/temp.macosx-10.14-x86_64-3.9/abcd_name.o -o build/lib.macosx-10.14-x86_64-3.9/abcd/name.cpython-39-darwin.so -Lbuild/lib.macosx-10.14-x86_64-3.9/abcd -labcd_core.cpython-39-darwin -Wl,-rpath,$ORIGIN
ld: can't link with bundle (MH_BUNDLE) only dylibs (MH_DYLIB) file 'build/lib.macosx-10.14-x86_64-3.9/abcd/libabcd_core.cpython-39-darwin.so'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command '/usr/bin/gcc' failed with exit code 1
Apparently it's because Python compiles extensions with -bundle instead of -dynamiclib? I didn't bother to figure it out, since I knew that I could support macOS without making any changes.
My solution instead was to only include the extra linker args when not compiling for macOS (a.k.a. "darwin"):
import sys

if sys.platform == "darwin":
    libcore_link_args = []
else:
    libcore_link_args = [f"-Lbuild/lib.{plat_specifier}/abcd", f"-labcd_core{EXT_INFIX}", "-Wl,-rpath,$ORIGIN"]

module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    extra_link_args = libcore_link_args,
)
With that, I'm able to support CentOS and macOS.
Whew. That was a long detour.
Finally, I want a Cython extension which wraps abcd_add_me() as add_me(), so I can check that the two extensions are changing the same private value in the core shared library.
The interface in abcd_add.pyx is ridiculously simple for someone like me who is used to the Python/C API:
# Use a Cython extension to call abcd_add_me()

cdef extern from "abcd.h" nogil:
    int abcd_add_me(int i, int j)

def add_me(int i, int j):
    return abcd_add_me(i, j)
and the setup.py gains a new Extension definition, with the same extra_link_args as module2:
# A Cython extension
module3 = Extension(
    "abcd.add",
    sources = ["abcd_add.pyx"],
    depends = ["abcd.h"],
    extra_link_args = libcore_link_args,
)
along with a cythonize() in the corresponding setup.py around the Extension() definition (see the next section).
After a pip install, check if it works:
>>> from abcd import add
>>> add.add_me(2, 7)
9
>>> add.add_me(3, -2)
1
Now for the real test - are both extensions using the same internal state?
>>> from abcd import name
>>> name.get_name()
'Dalke'
>>> add.add_me(0, 0)
0
>>> name.get_name()
'Andrew'
Yes! This shows that the Cython add_me() shares the same internal value as the Python/C get_name().
You can also try the small test suite in the tests/ subdirectory.
% cd tests/
% python test_abcd.py
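The tests themselves aren't reproduced here, but a minimal sketch along the lines of the sessions above would look something like this (the structure and assertions are my own, not necessarily what tests/test_abcd.py contains):
# A sketch of a test of the cross-extension state sharing shown above
import unittest
from abcd import add_me, get_name

class TestSharedState(unittest.TestCase):
    def test_add_me(self):
        self.assertEqual(add_me(2, 7), 9)

    def test_add_me_changes_name(self):
        # add_me() stores the sum as the internal value that get_name() reads;
        # a sum of 0 selects the default name seen in the sessions above.
        add_me(0, 0)
        self.assertEqual(get_name(), "Andrew")

if __name__ == "__main__":
    unittest.main()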
The final setup.py is not that simple, but it does work on macOS, CentOS, and FreeBSD, with Python 3.7 and Python 3.9.
import sys
from setuptools import setup, Extension
import sysconfig
from Cython.Build import cythonize

#### System configuration information

# A value like '.cpython-39-x86_64-linux-gnu.so'
# or '.cpython-39-darwin.so' or '.so'
# Used to determine the shared library filename.
EXT_SUFFIX = sysconfig.get_config_var("EXT_SUFFIX")
EXT_INFIX = EXT_SUFFIX.rpartition(".")[0]

# A value like 'linux-x86_64', 'macosx-10.14-x86_64', or
# 'freebsd-12.2-RELEASE-p9-amd64'
platform = sysconfig.get_platform()

# The default location of the build directory.
# (This assumes the directory was not specified on the command-line.)
plat_specifier = "%s-%d.%d" % (platform, *sys.version_info[:2])

# Figure out the platform-specific linker arguments so the new
# extensions can access the C functions in the core library.
if sys.platform == "darwin":
    libcore_link_args = []
else:
    # The compiler needs the -L to find libcore in the build directory.
    # The run-time loader needs an '$ORIGIN' rpath to find libcore in the install directory.
    libcore_link_args = [f"-Lbuild/lib.{plat_specifier}/abcd", f"-labcd_core{EXT_INFIX}", "-Wl,-rpath,$ORIGIN"]

#### Extension configuration information

# This starts with 'lib' so I can use the '-l' flag.
module1 = Extension(
    "abcd.libabcd_core",
    sources = ["abcd.c"],
    depends = ["abcd.h"],
)

# A hand-written Python/C extension.
module2 = Extension(
    "abcd.name",
    sources = ["abcd_name.c"],
    depends = ["abcd.h"],
    extra_link_args = libcore_link_args,
)

# A Cython extension
module3 = Extension(
    "abcd.add",
    sources = ["abcd_add.pyx"],
    depends = ["abcd.h"],
    extra_link_args = libcore_link_args,
)

#### setup()

setup(
    name = "abcd",
    version = "1.0",
    description = "example of a shared library used by C/Python and Cython extensions",
    author = "Andrew Dalke",
    packages = ["abcd"],
    ext_modules = [
        module1,
        module2,
    ] + cythonize([
        module3,
    ]),
)
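As the change notes at the top mention, a later revision switches from this static determination of the build directory to build_ext.get_ext_fullpath(). A rough sketch of that idea (the subclass name and where the -L flag gets injected are illustrative, not necessarily what the final package does):
import os
import sys
from setuptools.command.build_ext import build_ext

class build_ext_against_core(build_ext):
    # Ask build_ext where it will place libabcd_core, instead of
    # reconstructing "build/lib.<platform>-<version>" by hand.
    def build_extension(self, ext):
        if sys.platform != "darwin" and ext.name != "abcd.libabcd_core":
            core_path = self.get_ext_fullpath("abcd.libabcd_core")
            ext.extra_link_args = list(ext.extra_link_args) + [
                "-L" + os.path.dirname(core_path)]
        super().build_extension(ext)

# registered via setup(..., cmdclass = {"build_ext": build_ext_against_core})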
__init__.py
The abcd/__init__.py file first loads abcd.libabcd_core (to ensure the C shared library is available on macOS), then loads the functions from the Python/C and Cython extensions.
# Load the C shared library before loading any extensions
from . import libabcd_core as _core
# Import the Python/C extension
from .name import get_name, set_name
# Import the Cython extension
from .add import add_me
This package was written in 2021 by Andrew Dalke <dalke@dalkescientific.com>.
To the extent possible under law, the author has dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.
You should have received a copy of the CC0 Public Domain Dedication along with this software. If not, see http://creativecommons.org/publicdomain/zero/1.0/.
Version 1.1 (2021-09-20): Added support for CentOS and FreeBSD.
Version 1.0 (2021-08-13): Initial public release. Only supports macOS.