don't know what happened with branching. Hope this works.
bringing in the support too used to generate the Hilbert (and perhaps Peano) curves from simple rule defs.
updated with the corrected Hilbert classic timing numbers
updated with the correct timings for Hilbert classic
fixed typo in error message, commented out unneeded code
fixed timing error that caused classic Hilbert to actually report classic Morton
added reference to a paper describing this specific method for Hilbert generation
change USE_PEXT to RECTFILLCURVE_USE_PEXT in the migration to supplying a macro version
switch from Makefile to cmake
refer to header files via <rectfillcurve/*.h>
moving C code to src/
moved header files to include/rectfillcurve/
Added tag rectfillcurve-1.0 for changeset 9e5a6bdd73ba
A standalone C library and Python extension module for generating space-filling curves for arbitrary-sized rectangles through an iterator API. (See Background for details.)
This package implements several curves:
"row" - for each row, visit each column in forward order;
"zigzag" - for each row, visit each column in alternating order;
"hilbert_classic" - the classic Hilbert curve, which requires a square with lengths which are a power of 2 in size (or 0x0).
"hilbert" - an implementation of Jakub Červený's Generalized Hilbert space-filling curve algorithm for 2D;
"morton_classic" - the classic Morton/Z-order curve, which requires a square with lengths which are a power of 2 in size (or 0x0).
"morton" - an implementation of Perdacher et al's generalized Morton/Z-order method;
"mlcg" - use a multiplicative linear congruential generator to visit the coordinates in a random-like order.
The rectangles are described by (width, height), with coordinates (i, j) where 0 <= i < width and 0 <=j < height.
There is also an option to flip the coordinates. The flipped "row" and "zigzag" curves processes by columns instead of by row. The flipped "hilbert" curve uses the less preferred direction. The flipped "morton" curve uses a И-curve (also called an N-curve) instead of a Z-curve. The flipped "mlcg" curve swaps the calculation used to convert the internal seqence to an (i, j) coordinate.
The first position for all curves is (0, 0).
If you really need the performance, include the curve generation code in your main loop, rather than use an iterator from this package.
The function-based iterator API adds significant overhead compared to a basic for-loop. If we normalize the for-loop time to "1 second", then the "row" and "zigzag" iterators take about 4.5 seconds, and the "morton_classic" and "hilbert_classic" methods take about 6 seconds and 9 seconds, respectively.
The generalized "morton" iterator isn't much slower, at 6.3 seconds, but the generalized "hilbert" iterate takes significantly longer, at 27 seconds.
The "mlcg" iterator ranges from 16 seconds to 64 seconds, depending on
value of area = w*h
; it's particularly slow if the area is a power
of two and if the area is larger than 2³²-6.
See Performance Analysis, below, for plots.
I'm disappointed that the generalized Hilbert curve is so slow, at about 1/5th the performance of the "classic" Hilbert curve. I suspect there are faster ways to generate this curve for rectangles.
The "mlcg" curve is even slower. I want a way to visit every point in a random-like pattern and thought it would be fast because it's only a multiplication and a modulo. In practice, I have to discard about half the points, and the multiplication can require 128-bit values.
Can you point me to a better implementation or approach of either of these?
First, clone the Mercurial repository:
% hg clone https://hg.sr.ht/~dalke/rectfillcurve
Now you have two options. You can compile the executables and shared library, or your can build and install the Python extension.
Running make
with no options creates the binaries rectfillcurve
and time_rectfillcurve
, and a the shared library
librectfillcurve.so
:
% make
If you are compiling on x86-64 and know that your processor supports
the PEXT instruction then you may need to let your compiler know that
it should enable the _pext_u64
intrinsic. This makes the classic
Morton algorithm about 5% faster overall. One way to do this with gcc
is:
% make CC="gcc -mbmi2"
There is no make install
or equivalent, but there is a make clean
.
Here is the --help
for rectfillcurve
:
% ./rectfillcurve --help
Usage: ./rectfillcurve [--row|--zigzag|--hilbert|--morton|--mlcg] [--flip] [--output FILENAME] [W [H]]
Use a space-filling curve to visit each coordinate of a WxH rectangle.
Options:
--row For each row, visit each column, in increasing column order.
--zigzag For each row, visit each column, in alternating column order.
--hilbert Use the generalized Hilbert method (the default).
--morton Use the generalized Morton/Z-curve method.
--mlcg Use the random-like multiplicative linear congruential generator.
--flip Swap the row and column visit orders.
--output / -o Write the output to the named file (default: stdout).
Each output line contains a visit coordinate, in visit order, as (i, j) pairs,
written as two space-separated integers. The i coordinates - the columns - range
from 0 <= i < W. The j coordinates - the rows - range from 0 <= j < H.
For example, the following generates the Morton/Z-curve coordinates for a 5x3 rectangle:
% ./rectfillcurve --morton 5 3
0 0
0 1
1 0
1 1
0 2
1 2
2 0
2 1
3 0
3 1
2 2
3 2
4 0
4 1
4 2
See below for how to use the C API.
The distribution includes the program plot_curve2d.py
, styled along
the line's of Jakub Červený's
plotpath.m
to use matplotlib to plot the 2D curve:
% ./rectfillcurve --morton 5 3 | python plot_curve2d.py -o morton.png
Saved to 'morton.png'
The graph is in math coordinates, with i
increasing on the x-axis
and j
increasing on the y-axis. Rotate the graph 90° clockwise if
your view of a table has the origin in the upper-left with i
as rows
and j
a columns.
You can install the package as a Python extension. The following will
install the Python modules _rectfillcurve
and rectfillcurve
, and
to install the script entry point rectfillcurve
:
% python -m pip install .
(It is possible to set the the CC environment variable to have the compiler use the BMI2 intrinsics, but the speedup isn't really visible given Python's iterator overhead.)
This step requires the cffi package to generate the Python/C bindings. It's configured as a dependency so if PyPI is available it should be installed automatically.
The installed script works in a similar fashion to the rectfillcurve
executable of the same name:
% rectfillcurve --help
usage: rectfillcurve [-h] [--generic] [--row] [--zigzag] [--hilbert]
[--hilbert-static] [--morton] [--mlcg] [--flip] [--version] [w] [h]
Generate rectangle-filling curve coordinates.
positional arguments:
w The width of the rectangle (0<=i<w) (default: 5)
h The height of the rectangle (i<=j<h) (default: w)
options:
-h, --help show this help message and exit
--generic Use the generic API instead of the curve-specific API.
--row Iterate in row order, starting from j=0 for each row.
--zigzag Iterate in row order, going back and forth for each row.
--hilbert Use the generalized Hilbert order. (default curve)
--hilbert-static Use the generalized Hilbert order with static memory allocation.
--morton Use Morton/Z-order.
--mlcg Use a random-like multiplicative linear congruential generator.
--flip Use column major order instead of row major.
--version show program's version number and exit
See below for how to use the Python API.
While the Makefile will generate a librectfillcurve.so
, I don't much
experience using it and I don't have an installer for it because for
my purposes I'll be compiling the source as part of the package rather
than as a library. I will use that library for these examples.
There are two APIs: a generic one which lets you specify the curve as a parameter, and a lower-level API specific for each curve. For the Hilbert and Morton options, the generic version will use the "classic" implementations if the size is a square with a length which is a power of two.
The generic API looks like:
// Call this 'hilbert_2_3.c' and compile using:
//
// cc hilbert_2_3.c -I. -L. -lrectfillcurve -o hilbert_2_3
#include <stdio.h>
#include <rectfillcurve.h>
int main(void) {
rectfillcurve curve;
int i, j;
if (rectfillcurve_init(&curve, RECTFILLCURVE_HILBERT, 2, 3, 0)) {
fprintf(stderr, "Unexpected error!\n");
return 0;
}
while (rectfillcurve_next(&curve, &i, &j)) {
printf("%d %d\n", i, j);
}
// free internal data allocated by rectfillcurve_init().
rectfillcurve_free(&curve);
return 0;
}
Running the output program gives:
% ./hilbert_2_3
0 0
1 0
1 1
1 2
0 2
0 1
The rectfillcurve_free()
isn't strictly needed as the program will
exit so it doesn't matter if the program leaks memory.
If you know the curve you want to use, and don't need the flexibility of choosing the curve at run-time, then you can use the curve-specific API. In general the functions are:
rectfillcurve_{curve}_clear
-- clear the data structurerectfillcurve_{curve}_init
-- initialize to a given width, height, and fliprectfillcurve_{curve}_next
-- get the next (i, j) coordinateThe following typedef structures use fixed-memory space and can be allocated on the stack stack, with no malloc calls:
rectfillcurve_row
rectfillcurve_zigzag
rectfillcurve_hilbert_static
rectfillcurve_hilbert_classic
rectfillcurve_morton_classic
rectfillcurve_morton
rectfillcurve_mlcg
The rectfillcurve_hilbert_static
type reserves enough space to
handle the full ranges of sizes, up to 2³¹-1 x 2³¹-1. Alternatively,
the rectfillcurve_hilbert
type will malloc space based on the sizes
given. You will need to use rectfillcurve_hilbert_free
to free this
space.
The following example uses the API for the Morton curve.
// Call this 'morton_2_3.c' and compile using:
//
// cc morton_2_3.c -I. -L. -lrectfillcurve -o morton_2_3
#include <stdio.h>
#include <rectfillcurve.h>
int main(void) {
rectfillcurve_morton curve;
int i, j;
if (rectfillcurve_morton_init(&curve, 2, 3, 0)) {
fprintf(stderr, "Unexpected error!\n");
return 0;
}
while (rectfillcurve_morton_next(&curve, &i, &j)) {
printf("%d %d\n", i, j);
}
return 0;
}
The rectfillcurve
module contains the user-facing API, plus the "main"
entry-point for the installed script.
The functions hilbert
, hilbert_classic
, hilbert_static
, mlcg
,
morton
, morton_classic
, row
, and zigzag
all take a width, and
height and optional value for flip
, and return the appropriate
iterator over that space:
>>> import rectfillcurve
>>> rectfillcurve.morton(5, 3)
<generator object morton.<locals>.morton at 0x10d773060>
>>> list(rectfillcurve.morton(5, 3))
[(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (1, 2), (2, 0), (2, 1), (3, 0), (3, 1), (2, 2), (3, 2), (4, 0), (4, 1), (4, 2)]
>>> list(rectfillcurve.morton(5, 3, flip=1))
[(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 0), (3, 0), (2, 1), (3, 1), (4, 0), (4, 1), (2, 2), (3, 2), (4, 2)]
The function rectfillcurve.rectfillcurve()
takes an additional
curve_type
parameter and determines the most appropriate
implementation. For example, if you specify "morton" and the size is
128x128 then it will use the "morton classic" implementation rather
than the generic morton implementation for rectangles.
>>> import rectfillcurve
>>> rectfillcurve.rectfillcurve (3, 2, curve_type="zigzag", flip=1)
<generator object rectfillcurve.<locals>.rectfillcurve at 0x10bc3f220>
>>> list(rectfillcurve.rectfillcurve(3, 2, curve_type="zigzag", flip=1))
[(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
To run the unit tests, first build the Python extension then do:
% python test_rectfillcurve.py
...................................................ss.ss.............
.............s.sss.ss.......................................
----------------------------------------------------------------------
Ran 129 tests in 0.442s
The skipped tests are used either for benchmarking or to test that
internal variables support 64-bit integers. These tests take minutes
to run. One of the tests checks that each MLCG generators handles the
entire space. No one has the patience for the full test, which would
generate some 2**64
points and take about a century!
The program time_rectfillcurve
times curve generation across a range
of sizes.
By default it uses a series of rectangles where the width is about π times larger than the height and where each rectangle is about sqrt(10) larger than the previous rectangle. There are also options to use squares, and to use squares where each length is a power of two, as well as to use a user-specified size.
By default it times all of the curves which are appropriate for the specified size(s). You can instead specify which curves to use.
By default it repeats the timings 3 times. Use --repeat N
to change
that value.
Here is the full --help
:
Usage: ./time_rectfillcurve [--repeat N] [curve options] [size options | [W [H]]
Run space-filling curve timings on different WxH sizes.
Options:
--repeat N repeat up to N times (default: 3)
--version print the version and exit
--help print this help and exit
Curve options (if not specified, use all that are relevant to the size(s)):
--for-loop classic for loop (no iterator overhead)
--row row-major iterator
--zigzag row-major iterator, alternating direction each row
--hilbert-classic Hilbert curve (only if w = h and is a power of 2)
--hilbert generalized Hilbert curve
--morton-classic Morton/Z-order curve (only if w = h and is a power of 2)
--morton generalized Morton/Z-order curve
--mlcg random-like visits based on an MLCH
Size options:
--user user-specified size
--powers-of-two 19 squares with lengths which are a power of 2
--squares 20 squares with area ~10**(1/2) larger than the previous
--rectangles 37 rectangles with area ~10**(1/4) larger than the previous
W [H] specify the size (automatically sets --user)
The --for-loop
curve is the basic row-major for-loop, with an extra
test to help prevent the compiler from optimizing the code away.
long long benchmark_for_loop(int w, int h, int bad_i, int bad_j) {
long long count = 0;
for (int i=0; i<w; i++) {
for (int j=0; j<h; j++) {
if (i == bad_i || j == bad_j) {
return INVALID;
}
count++;
}
}
return count;
}
It's there to give an idea of the call overhead for the iterator API.
The curve implementations are structurally identical, and created with the following macro:
#define BENCHMARK(NAME, TYPENAME) \
long long benchmark_##NAME(int w, int h, int bad_i, int bad_j) {\
int status, i, j; \
long long count = 0; \
TYPENAME iter; \
\
status = TYPENAME##_init(&iter, w, h, 0); \
if (status != RECTFILLCURVE_OK) { \
return INVALID; \
} \
while (TYPENAME##_next(&iter, &i, &j)) { \
if ((i == bad_i) || (j == bad_j)) { \
return INVALID; \
} \
count++; \
} \
if (count != ((long long) w) * ((long long) h)) { \
return INVALID; \
} \
return count; \
}
BENCHMARK(row, rectfillcurve_row)
BENCHMARK(zigzag, rectfillcurve_zigzag)
BENCHMARK(hilbert_classic, rectfillcurve_morton_classic)
BENCHMARK(hilbert, rectfillcurve_hilbert_static)
BENCHMARK(morton_classic, rectfillcurve_morton_classic)
BENCHMARK(morton, rectfillcurve_morton)
BENCHMARK(mlcg, rectfillcurve_mlcg)
The timer uses CLOCK_MONOTONIC to measure the elapsed time before and after each benchmark call, then figures out the change in time:
clock_gettime(CLOCK_MONOTONIC, &start_time);
long long count = method->benchmark(w, h, w, h);
clock_gettime(CLOCK_MONOTONIC, &end_time);
double dt = elapsed_time(start_time, end_time);
The output is a space-separated set of columns, with a header row followed by data rows. The column header names are:
"w" for width
"h" for height
"method" for the name of the method being timed
"time" for the total time in seconds
"million/sec" for the number of iteration points visited per second, measured in millions per second. (This can be "inf" for small cases taking less than the time quantum of the monotonic clock.)
The timing should be linear in the area of the rectangle, so "million/sec" should be constant.
Here's an example of the output for a 32768x32768 square:
% ./time_rectfillcurve --repeat 1 32768
w h method time million/sec
32768 32768 for_loop 0.526962 2037.61
32768 32768 row 2.287110 469.48
32768 32768 zigzag 2.358536 455.26
32768 32768 hilbert_classic 4.696979 228.60
32768 32768 hilbert 13.610767 78.89
32768 32768 morton_classic 3.010689 356.64
32768 32768 morton 3.160907 339.69
32768 32768 mlcg 18.119807 59.26
These numbers are generated on my MacBook Pro (13-inch, M1, 2020) laptop, and available in "timings/". The "milllion/sec" plots for the three data sets are there as well, and shown here:
The "for_loop" can process about 2.1 billion points per second.
"row" and "zigzag" can iterate over about 470 million points per second.
"morton_classic" can iterate about 350 million points per second, and "morton" can iterate about 335 million points per second.
"hilbert_classic" can iterate about 230 million points per second, while "hilbert" is only about 1/3rd that speed, at 78 million points per second.
"mlcg" varies between 33 and 130 million points per second. It's
slowest for powers of two (likely because about half the points are be
discarded), and when the total area (w*h
) is larger than 2³²-6
(those size require emulating 128-bit integers).
There are many ways to visit each point on a grid once. The simplest is to visit all of the columns of the first row, then the second row, and so on. For some tasks - the standard example is matrix multiplication - the jump from one row to the next ends up loading a lot of data from relatively slow RAM, rather than being able to re-use data from the faster memory cache.
If one traversal order is more effective at using memory cache, causing the program to be faster than another traversal order, then we say the first order has better memory locality than the second.
Space-filling curves are frequently used to generate curves with better memory locality. The three most common ones are the Z-order curve (also called Morton order and Lebesgue curve), the Peano curve, and the Hilbert curve.
The simplest implementations of these three curves only works on squares of exactly the right size. For example, the Hilbert and the Morton/Z-curves are easy to construct on a square with sizes which are a power of 2 in length.
It's trickier to handle arbitrary-sized grids, with fewer resources which describe how to handle them. See below for the sources for the "hilbert" and "morton" implementations.
Space-filling curves don't need to be as complicated as those. The "zigzag" curve (I got the name from Aldo Cortesi's scurve project) simply snakes back and forth in a boustrophedon pattern. This sort of curve might be more appropriate for the outer-join of two vectors, so don't discount its simplicity out-of-hand.
The "mlcg curve" isn't really a curve. It uses a multiplicative linear congruential generator to visit the points in a random-like pattern, always starting with (0, 0). It's meant to have bad memory locality, as a way to check if memory locality is an issue.
The source has been compiled on the Apple-provided clang for macOS and with gcc-11 and gcc-12, on both M1 and x86-64 CPUs. It should be portable to other platforms without much difficulty.
The file morton.c
uses __builtin_clz
if available, otherwise it
falls back to an included version from Hacker's Delight.
The file rectfillcurve_pext.h
will use the compiler's _pext_u64
intrinsic, if available, otherwise use a de-interleaving method from
Hacker's Delight.
The file mlcg.c uses the compiler's __uint128_t
extension if
detected, and if not falls back to its own implementation of 64-bit x
64-bit multiplication. By default it will use Montgomery modular
multiplication
but there is a compile-time option to use the compiler's __uint128_t
modulo operator instead. I found it was a few percent slower.
Use the --version
option to see the compiled-time values used for
pext and int128. For example:
% rectfillcurve --version
rectfillcurve 1.0 (pext_u64 in C, __uint128_t + Montgomery)
The integer values are available in the C API as
rectfillcurve_get_use_pext()
and rectfillcurve_get_use_int128()
and from the Python API as rectfillcurve.get_use_pext()
and
rectfillcurve.get_use_int128()
.
The "hilbert" curve is the generalized Hilbert 2D curve developed by Jakub Červený. That implemention is in Python and uses recursive iterators.
I have translated it to C, and implemented recursion using a manual
"stack". The generic API and rectfillcurve_hilbert
internally
allocate that space on the heap using malloc. The
rectfillcurve_hilbert_static
API reserves enough space to handle the
full range of supported sizes.
Červený's page notes two examples of previous work by Lutz Tautenhahn, Draw a Space-Filling Curve of Arbitrary Size and Zhang, Kamata and Ueshige, A Pseudo-Hilbert Scan Algorithm for Arbitrarily-Sized Rectangle Region.
I chose to use Gilbert2D because I could start from working code, rather than an algorithm description.
The "morton" curve is from the paper by Perdacher, Plant, and Böhm, Improved Data Locality Using Morton-order Curve on the Example of LU Decomposition.
If you use this curve in academic works, please cite:
Martin Perdacher, Claudia Plant, Christian Böhm
2020 IEEE International Conference on Big Data (Big Data)
DOI:10.1109/BigData50022.2020.9378385
https://ieeexplore.ieee.org/document/9378385
The algorithm description is in:
Algorithm 2 Microcell placement. Implemented as a preprocessor macro.
Details missing from the description were filled in from the data tables and the reference ZORDLOOP_START and ZORDLOOP_END macro definitions as part of the MortonLU package.
Please note that the paper encourages people to prefer the macro approach over the iterator approach used here in rectfillcurve. My timings here bear that out! Perhaps a future release will include helpful macros.
The MLCG implementation uses a multiplicative linear congruential generator to iterate over the cells in a random-like order. This is meant to stress test memory locality.
The basic generator is of the form:
x_{i+1} = x_i * a % m
where the values of a
and m
are judiciously chosen to generate
then full cycle of values 1<= x_i < m
, which are then converted to
coordinates.
The values of a
and m
are selected from Table 2 of:
Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure
Pierre L’Ecuyer,
Mathematics of Computation, 68, 225 (1999), 249–260.
https://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-00996-5/S0025-5718-99-00996-5.pdf
DOI: https://doi.org/10.1090/S0025-5718-99-00996-5
Only the first 'a' is used for each entry, which has the best M_8(m, a) figure of merit.
In short, the value w*h
is used to select the smallest appropriate
m
and associated a
. x_0
is set to 1, giving i,j = (0,0)
as the
first generated coordinate.
For each iteration, if x_i
is in range then use x_i-1
to compute
the coordinates based divison and modulo using either the width or
height (depending on the flip choice).
These are full cycle generators, so if the value gets back to 1 again then the iterator is done.
Unfortunately, the m
values from Table 2 are slightly below powers
of two, which means if the area (w*h
) is a power of two then this
method uses the next largest m
, which is almost twice as large as
needed, and about half the points get discarded.
In addition, if the area is over 2³²-6 then the m
used is over
32-bits in length, causing the code to use a code path with
intermediate 128-bit numbers.
Andrew Dalke <dalke@dalkescientific.com>, Andrew Dalke
Scientific AB holds the copyright to most of the source code, and
distributes it under the terms of the GNU General Public License
vertion 3 or (at your option) any later version. A copy is available
from the file LICENSE
.
Jakub Červený holds the copyright to portions of the file gilbert.c
,
and licenses under the terms of the BSD 2-Clause License. The full
license text is available in that file.
Martin Perdacher, Claudia Plant, and/or Christian Böhm may hold
copyright to parts of morton.c
. Martin Perdacher writes that
distributing their source as part of this package is "fine with me.
Feel free to use the code in anykind of way. If you make a publication
out of it, just cite our paper. That's it."
A small amount of source in morton.c
comes from Christopher
Swenson's sort.h, most of which in
turn derives from the book "Hacker’s Delight" by Henry S. Warren, Jr.
(more specifically, section 5–3, "Counting Leading 0’s", in the second
edition). I do not believe this source is covered under copyright, but
if there is, Swenson distributes it under the terms of the MIT
License.
There is an unused implementation of pext_u64
in
rectfillcurve_pext.h
which derives from Matthew Fioravante's
N3864 - A constexpr bitwise operations library for C++.
It is included in the generated binary only if by manually editing a
#if 0
, which I did for timing purposes. I do not believe this use is
covered under copyright. Even if I'm wrong, Fioravante distributes it
under the terms of the MIT
License.
The rectfillcurve_pext.h
also includes a de-interleaving function
from the book "Hacker’s Delight" by Henry S. Warren, Jr. It is the
standard parallel reduction method found in various sources, and I do
not believe its use here is under copyright protection.
Space-filling curves are a popular topic. See also:
generalized Hilbert curve by Jakub Červený, in Python, and the source of generalized Hilbert curve used here.
Libmorton by Jeroen Baert, a C++ header-only library to encode/decode Morton coordinates.
The book "Hacker’s Delight" by Henry S. Warren, Jr.
The book "Space-Filling Curves: An Introduction with Applications in Scientific Computing" by Michael Bader.