Generate space-filling curves for arbitrary-sized rectangles.
don't know what happened with branching. Hope this works.
bringing in the support too used to generate the Hilbert (and perhaps Peano) curves from simple rule defs.
updated with the corrected Hilbert classic timing numbers
updated with the correct timings for Hilbert classic
fixed typo in error message, commented out unneeded code
fixed timing error that caused classic Hilbert to actually report classic Morton
added reference to a paper describing this specific method for Hilbert generation
change USE_PEXT to RECTFILLCURVE_USE_PEXT in the migration to supplying a macro version
switch from Makefile to cmake
refer to header files via <rectfillcurve/*.h>
moving C code to src/
moved header files to include/rectfillcurve/
Added tag rectfillcurve-1.0 for changeset 9e5a6bdd73ba


browse log
browse .tar.gz



#rectfillcurve 1.0

A standalone C library and Python extension module for generating space-filling curves for arbitrary-sized rectangles through an iterator API. (See Background for details.)

graphical depiction of the available curves

This package implements several curves:

  • "row" - for each row, visit each column in forward order;

  • "zigzag" - for each row, visit each column in alternating order;

  • "hilbert_classic" - the classic Hilbert curve, which requires a square with lengths which are a power of 2 in size (or 0x0).

  • "hilbert" - an implementation of Jakub Červený's Generalized Hilbert space-filling curve algorithm for 2D;

  • "morton_classic" - the classic Morton/Z-order curve, which requires a square with lengths which are a power of 2 in size (or 0x0).

  • "morton" - an implementation of Perdacher et al's generalized Morton/Z-order method;

  • "mlcg" - use a multiplicative linear congruential generator to visit the coordinates in a random-like order.

The rectangles are described by (width, height), with coordinates (i, j) where 0 <= i < width and 0 <=j < height.

There is also an option to flip the coordinates. The flipped "row" and "zigzag" curves processes by columns instead of by row. The flipped "hilbert" curve uses the less preferred direction. The flipped "morton" curve uses a И-curve (also called an N-curve) instead of a Z-curve. The flipped "mlcg" curve swaps the calculation used to convert the internal seqence to an (i, j) coordinate.

The first position for all curves is (0, 0).


If you really need the performance, include the curve generation code in your main loop, rather than use an iterator from this package.

The function-based iterator API adds significant overhead compared to a basic for-loop. If we normalize the for-loop time to "1 second", then the "row" and "zigzag" iterators take about 4.5 seconds, and the "morton_classic" and "hilbert_classic" methods take about 6 seconds and 9 seconds, respectively.

The generalized "morton" iterator isn't much slower, at 6.3 seconds, but the generalized "hilbert" iterate takes significantly longer, at 27 seconds.

The "mlcg" iterator ranges from 16 seconds to 64 seconds, depending on value of area = w*h; it's particularly slow if the area is a power of two and if the area is larger than 2³²-6.

See Performance Analysis, below, for plots.

#Can you help?

I'm disappointed that the generalized Hilbert curve is so slow, at about 1/5th the performance of the "classic" Hilbert curve. I suspect there are faster ways to generate this curve for rectangles.

The "mlcg" curve is even slower. I want a way to visit every point in a random-like pattern and thought it would be fast because it's only a multiplication and a modulo. In practice, I have to discard about half the points, and the multiplication can require 128-bit values.

Can you point me to a better implementation or approach of either of these?


First, clone the Mercurial repository:

% hg clone

Now you have two options. You can compile the executables and shared library, or your can build and install the Python extension.

#Executables and shared library

Running make with no options creates the binaries rectfillcurve and time_rectfillcurve, and a the shared library

% make

If you are compiling on x86-64 and know that your processor supports the PEXT instruction then you may need to let your compiler know that it should enable the _pext_u64 intrinsic. This makes the classic Morton algorithm about 5% faster overall. One way to do this with gcc is:

% make CC="gcc -mbmi2"

There is no make install or equivalent, but there is a make clean.

Here is the --help for rectfillcurve:

% ./rectfillcurve --help
Usage: ./rectfillcurve [--row|--zigzag|--hilbert|--morton|--mlcg] [--flip] [--output FILENAME] [W [H]]

  Use a space-filling curve to visit each coordinate of a WxH rectangle.

  --row           For each row, visit each column, in increasing column order.
  --zigzag        For each row, visit each column, in alternating column order.
  --hilbert       Use the generalized Hilbert method (the default).
  --morton        Use the generalized Morton/Z-curve method.
  --mlcg          Use the random-like multiplicative linear congruential generator.
  --flip          Swap the row and column visit orders.
  --output / -o   Write the output to the named file (default: stdout).

Each output line contains a visit coordinate, in visit order, as (i, j) pairs,
written as two space-separated integers. The i coordinates - the columns - range
from 0 <= i < W. The j coordinates - the rows - range from 0 <= j < H.

For example, the following generates the Morton/Z-curve coordinates for a 5x3 rectangle:

% ./rectfillcurve --morton 5 3
0 0
0 1
1 0
1 1
0 2
1 2
2 0
2 1
3 0
3 1
2 2
3 2
4 0
4 1
4 2

See below for how to use the C API.


The distribution includes the program, styled along the line's of Jakub Červený's plotpath.m to use matplotlib to plot the 2D curve:

% ./rectfillcurve --morton 5 3 | python -o morton.png
Saved to 'morton.png'

The graph is in math coordinates, with i increasing on the x-axis and j increasing on the y-axis. Rotate the graph 90° clockwise if your view of a table has the origin in the upper-left with i as rows and j a columns.

#Python extension

You can install the package as a Python extension. The following will install the Python modules _rectfillcurve and rectfillcurve, and to install the script entry point rectfillcurve:

% python -m pip install .

(It is possible to set the the CC environment variable to have the compiler use the BMI2 intrinsics, but the speedup isn't really visible given Python's iterator overhead.)

This step requires the cffi package to generate the Python/C bindings. It's configured as a dependency so if PyPI is available it should be installed automatically.

The installed script works in a similar fashion to the rectfillcurve executable of the same name:

% rectfillcurve --help
usage: rectfillcurve [-h] [--generic] [--row] [--zigzag] [--hilbert]
  [--hilbert-static] [--morton] [--mlcg] [--flip] [--version] [w] [h]

Generate rectangle-filling curve coordinates.

positional arguments:
  w                 The width of the rectangle (0<=i<w) (default: 5)
  h                 The height of the rectangle (i<=j<h) (default: w)

  -h, --help        show this help message and exit
  --generic         Use the generic API instead of the curve-specific API.
  --row             Iterate in row order, starting from j=0 for each row.
  --zigzag          Iterate in row order, going back and forth for each row.
  --hilbert         Use the generalized Hilbert order. (default curve)
  --hilbert-static  Use the generalized Hilbert order with static memory allocation.
  --morton          Use Morton/Z-order.
  --mlcg            Use a random-like multiplicative linear congruential generator.
  --flip            Use column major order instead of row major.
  --version         show program's version number and exit

See below for how to use the Python API.


While the Makefile will generate a, I don't much experience using it and I don't have an installer for it because for my purposes I'll be compiling the source as part of the package rather than as a library. I will use that library for these examples.

There are two APIs: a generic one which lets you specify the curve as a parameter, and a lower-level API specific for each curve. For the Hilbert and Morton options, the generic version will use the "classic" implementations if the size is a square with a length which is a power of two.

#generic API

The generic API looks like:

// Call this 'hilbert_2_3.c' and compile using:
//   cc hilbert_2_3.c -I. -L. -lrectfillcurve -o hilbert_2_3

#include <stdio.h>
#include <rectfillcurve.h>

int main(void) {
  rectfillcurve curve;
  int i, j;
  if (rectfillcurve_init(&curve, RECTFILLCURVE_HILBERT, 2, 3, 0)) {
    fprintf(stderr, "Unexpected error!\n");
    return 0;
  while (rectfillcurve_next(&curve, &i, &j)) {
    printf("%d %d\n", i, j);
  // free internal data allocated by rectfillcurve_init().
  return 0;

Running the output program gives:

% ./hilbert_2_3
0 0
1 0
1 1
1 2
0 2
0 1

The rectfillcurve_free() isn't strictly needed as the program will exit so it doesn't matter if the program leaks memory.

#curve-specific API

If you know the curve you want to use, and don't need the flexibility of choosing the curve at run-time, then you can use the curve-specific API. In general the functions are:

  • rectfillcurve_{curve}_clear -- clear the data structure
  • rectfillcurve_{curve}_init -- initialize to a given width, height, and flip
  • rectfillcurve_{curve}_next -- get the next (i, j) coordinate

The following typedef structures use fixed-memory space and can be allocated on the stack stack, with no malloc calls:

  • rectfillcurve_row
  • rectfillcurve_zigzag
  • rectfillcurve_hilbert_static
  • rectfillcurve_hilbert_classic
  • rectfillcurve_morton_classic
  • rectfillcurve_morton
  • rectfillcurve_mlcg

The rectfillcurve_hilbert_static type reserves enough space to handle the full ranges of sizes, up to 2³¹-1 x 2³¹-1. Alternatively, the rectfillcurve_hilbert type will malloc space based on the sizes given. You will need to use rectfillcurve_hilbert_free to free this space.

The following example uses the API for the Morton curve.

// Call this 'morton_2_3.c' and compile using:
//   cc morton_2_3.c -I. -L. -lrectfillcurve -o morton_2_3

#include <stdio.h>
#include <rectfillcurve.h>

int main(void) {
  rectfillcurve_morton curve;
  int i, j;
  if (rectfillcurve_morton_init(&curve, 2, 3, 0)) {
    fprintf(stderr, "Unexpected error!\n");
    return 0;
  while (rectfillcurve_morton_next(&curve, &i, &j)) {
    printf("%d %d\n", i, j);
  return 0;

#Python API

The rectfillcurve module contains the user-facing API, plus the "main" entry-point for the installed script.

The functions hilbert, hilbert_classic, hilbert_static, mlcg, morton, morton_classic, row, and zigzag all take a width, and height and optional value for flip, and return the appropriate iterator over that space:

>>> import rectfillcurve
>>> rectfillcurve.morton(5, 3)
<generator object morton.<locals>.morton at 0x10d773060>
>>> list(rectfillcurve.morton(5, 3))
[(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (1, 2), (2, 0), (2, 1), (3, 0), (3, 1), (2, 2), (3, 2), (4, 0), (4, 1), (4, 2)]
>>> list(rectfillcurve.morton(5, 3, flip=1))
[(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 0), (3, 0), (2, 1), (3, 1), (4, 0), (4, 1), (2, 2), (3, 2), (4, 2)]

The function rectfillcurve.rectfillcurve() takes an additional curve_type parameter and determines the most appropriate implementation. For example, if you specify "morton" and the size is 128x128 then it will use the "morton classic" implementation rather than the generic morton implementation for rectangles.

>>> import rectfillcurve
>>> rectfillcurve.rectfillcurve (3, 2, curve_type="zigzag", flip=1)
<generator object rectfillcurve.<locals>.rectfillcurve at 0x10bc3f220>
>>> list(rectfillcurve.rectfillcurve(3, 2, curve_type="zigzag", flip=1))
[(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]


To run the unit tests, first build the Python extension then do:

% python
Ran 129 tests in 0.442s

The skipped tests are used either for benchmarking or to test that internal variables support 64-bit integers. These tests take minutes to run. One of the tests checks that each MLCG generators handles the entire space. No one has the patience for the full test, which would generate some 2**64 points and take about a century!


The program time_rectfillcurve times curve generation across a range of sizes.

By default it uses a series of rectangles where the width is about π times larger than the height and where each rectangle is about sqrt(10) larger than the previous rectangle. There are also options to use squares, and to use squares where each length is a power of two, as well as to use a user-specified size.

By default it times all of the curves which are appropriate for the specified size(s). You can instead specify which curves to use.

By default it repeats the timings 3 times. Use --repeat N to change that value.

Here is the full --help:

Usage: ./time_rectfillcurve [--repeat N] [curve options] [size options | [W [H]]

  Run space-filling curve timings on different WxH sizes.

  --repeat N         repeat up to N times (default: 3)
  --version          print the version and exit
  --help             print this help and exit

Curve options (if not specified, use all that are relevant to the size(s)):
  --for-loop         classic for loop (no iterator overhead)
  --row              row-major iterator
  --zigzag           row-major iterator, alternating direction each row
  --hilbert-classic  Hilbert curve (only if w = h and is a power of 2)
  --hilbert          generalized Hilbert curve
  --morton-classic   Morton/Z-order curve (only if w = h and is a power of 2)
  --morton           generalized Morton/Z-order curve
  --mlcg             random-like visits based on an MLCH

Size options:
  --user             user-specified size
  --powers-of-two    19 squares with lengths which are a power of 2
  --squares          20 squares with area ~10**(1/2) larger than the previous
  --rectangles       37 rectangles with area ~10**(1/4) larger than the previous
  W [H]              specify the size (automatically sets --user)

#What is actually timed?

The --for-loop curve is the basic row-major for-loop, with an extra test to help prevent the compiler from optimizing the code away.

long long benchmark_for_loop(int w, int h, int bad_i, int bad_j) {
  long long count = 0;
  for (int i=0; i<w; i++) {
    for (int j=0; j<h; j++) {
      if (i == bad_i || j == bad_j) {
        return INVALID;
  return count;

It's there to give an idea of the call overhead for the iterator API.

The curve implementations are structurally identical, and created with the following macro:

#define BENCHMARK(NAME, TYPENAME)                               \
long long benchmark_##NAME(int w, int h, int bad_i, int bad_j) {\
  int status, i, j;                                             \
  long long count = 0;                                          \
  TYPENAME iter;                                                \
  status = TYPENAME##_init(&iter, w, h, 0);                     \
  if (status != RECTFILLCURVE_OK) {                             \
    return INVALID;                                             \
  }                                                             \
  while (TYPENAME##_next(&iter, &i, &j)) {                      \
    if ((i == bad_i) || (j == bad_j)) {                         \
      return INVALID;                                           \
    }                                                           \
    count++;                                                    \
  }                                                             \
  if (count != ((long long) w) * ((long long) h)) {             \
      return INVALID;                                           \
  }                                                             \
  return count;                                                 \

BENCHMARK(row, rectfillcurve_row)
BENCHMARK(zigzag, rectfillcurve_zigzag)
BENCHMARK(hilbert_classic, rectfillcurve_morton_classic)
BENCHMARK(hilbert, rectfillcurve_hilbert_static)
BENCHMARK(morton_classic, rectfillcurve_morton_classic)
BENCHMARK(morton, rectfillcurve_morton)
BENCHMARK(mlcg, rectfillcurve_mlcg)

The timer uses CLOCK_MONOTONIC to measure the elapsed time before and after each benchmark call, then figures out the change in time:

        clock_gettime(CLOCK_MONOTONIC, &start_time);
        long long count = method->benchmark(w, h, w, h);
        clock_gettime(CLOCK_MONOTONIC, &end_time);
        double dt = elapsed_time(start_time, end_time);

#Output from the timing program

The output is a space-separated set of columns, with a header row followed by data rows. The column header names are:

  • "w" for width

  • "h" for height

  • "method" for the name of the method being timed

  • "time" for the total time in seconds

  • "million/sec" for the number of iteration points visited per second, measured in millions per second. (This can be "inf" for small cases taking less than the time quantum of the monotonic clock.)

The timing should be linear in the area of the rectangle, so "million/sec" should be constant.

Here's an example of the output for a 32768x32768 square:

% ./time_rectfillcurve --repeat 1 32768
w h method time million/sec
32768 32768 for_loop 0.526962 2037.61
32768 32768 row 2.287110 469.48
32768 32768 zigzag 2.358536 455.26
32768 32768 hilbert_classic 4.696979 228.60
32768 32768 hilbert 13.610767 78.89
32768 32768 morton_classic 3.010689 356.64
32768 32768 morton 3.160907 339.69
32768 32768 mlcg 18.119807 59.26

#Performance Analysis

These numbers are generated on my MacBook Pro (13-inch, M1, 2020) laptop, and available in "timings/". The "milllion/sec" plots for the three data sets are there as well, and shown here:

Scaling for squares with sides that are a power of 2

Scaling for squares with sides that are a power of sqrt(10)

Scaling for rectangles

The "for_loop" can process about 2.1 billion points per second.

"row" and "zigzag" can iterate over about 470 million points per second.

"morton_classic" can iterate about 350 million points per second, and "morton" can iterate about 335 million points per second.

"hilbert_classic" can iterate about 230 million points per second, while "hilbert" is only about 1/3rd that speed, at 78 million points per second.

"mlcg" varies between 33 and 130 million points per second. It's slowest for powers of two (likely because about half the points are be discarded), and when the total area (w*h) is larger than 2³²-6 (those size require emulating 128-bit integers).


There are many ways to visit each point on a grid once. The simplest is to visit all of the columns of the first row, then the second row, and so on. For some tasks - the standard example is matrix multiplication - the jump from one row to the next ends up loading a lot of data from relatively slow RAM, rather than being able to re-use data from the faster memory cache.

If one traversal order is more effective at using memory cache, causing the program to be faster than another traversal order, then we say the first order has better memory locality than the second.

Space-filling curves are frequently used to generate curves with better memory locality. The three most common ones are the Z-order curve (also called Morton order and Lebesgue curve), the Peano curve, and the Hilbert curve.

The simplest implementations of these three curves only works on squares of exactly the right size. For example, the Hilbert and the Morton/Z-curves are easy to construct on a square with sizes which are a power of 2 in length.

It's trickier to handle arbitrary-sized grids, with fewer resources which describe how to handle them. See below for the sources for the "hilbert" and "morton" implementations.

Space-filling curves don't need to be as complicated as those. The "zigzag" curve (I got the name from Aldo Cortesi's scurve project) simply snakes back and forth in a boustrophedon pattern. This sort of curve might be more appropriate for the outer-join of two vectors, so don't discount its simplicity out-of-hand.

The "mlcg curve" isn't really a curve. It uses a multiplicative linear congruential generator to visit the points in a random-like pattern, always starting with (0, 0). It's meant to have bad memory locality, as a way to check if memory locality is an issue.


The source has been compiled on the Apple-provided clang for macOS and with gcc-11 and gcc-12, on both M1 and x86-64 CPUs. It should be portable to other platforms without much difficulty.

The file morton.c uses __builtin_clz if available, otherwise it falls back to an included version from Hacker's Delight.

The file rectfillcurve_pext.h will use the compiler's _pext_u64 intrinsic, if available, otherwise use a de-interleaving method from Hacker's Delight.

The file mlcg.c uses the compiler's __uint128_t extension if detected, and if not falls back to its own implementation of 64-bit x 64-bit multiplication. By default it will use Montgomery modular multiplication but there is a compile-time option to use the compiler's __uint128_t modulo operator instead. I found it was a few percent slower.

Use the --version option to see the compiled-time values used for pext and int128. For example:

% rectfillcurve --version
rectfillcurve 1.0 (pext_u64 in C, __uint128_t + Montgomery)

The integer values are available in the C API as rectfillcurve_get_use_pext() and rectfillcurve_get_use_int128() and from the Python API as rectfillcurve.get_use_pext() and rectfillcurve.get_use_int128().

#Generalized Hilbert algorithm

The "hilbert" curve is the generalized Hilbert 2D curve developed by Jakub Červený. That implemention is in Python and uses recursive iterators.

I have translated it to C, and implemented recursion using a manual "stack". The generic API and rectfillcurve_hilbert internally allocate that space on the heap using malloc. The rectfillcurve_hilbert_static API reserves enough space to handle the full range of supported sizes.

Červený's page notes two examples of previous work by Lutz Tautenhahn, Draw a Space-Filling Curve of Arbitrary Size and Zhang, Kamata and Ueshige, A Pseudo-Hilbert Scan Algorithm for Arbitrarily-Sized Rectangle Region.

I chose to use Gilbert2D because I could start from working code, rather than an algorithm description.

#Generalized Morton algorithm

The "morton" curve is from the paper by Perdacher, Plant, and Böhm, Improved Data Locality Using Morton-order Curve on the Example of LU Decomposition.

If you use this curve in academic works, please cite:

Martin Perdacher, Claudia Plant, Christian Böhm
2020 IEEE International Conference on Big Data (Big Data)

The algorithm description is in:

Algorithm 2 Microcell placement. Implemented as a preprocessor macro.

Details missing from the description were filled in from the data tables and the reference ZORDLOOP_START and ZORDLOOP_END macro definitions as part of the MortonLU package.

Please note that the paper encourages people to prefer the macro approach over the iterator approach used here in rectfillcurve. My timings here bear that out! Perhaps a future release will include helpful macros.

#MLCG algorithm

The MLCG implementation uses a multiplicative linear congruential generator to iterate over the cells in a random-like order. This is meant to stress test memory locality.

The basic generator is of the form:

  x_{i+1} = x_i * a % m

where the values of a and m are judiciously chosen to generate then full cycle of values 1<= x_i < m, which are then converted to coordinates.

The values of a and m are selected from Table 2 of:

Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure
Pierre L’Ecuyer,
Mathematics of Computation, 68, 225 (1999), 249–260.

Only the first 'a' is used for each entry, which has the best M_8(m, a) figure of merit.

In short, the value w*h is used to select the smallest appropriate m and associated a. x_0 is set to 1, giving i,j = (0,0) as the first generated coordinate.

For each iteration, if x_i is in range then use x_i-1 to compute the coordinates based divison and modulo using either the width or height (depending on the flip choice).

These are full cycle generators, so if the value gets back to 1 again then the iterator is done.

Unfortunately, the m values from Table 2 are slightly below powers of two, which means if the area (w*h) is a power of two then this method uses the next largest m, which is almost twice as large as needed, and about half the points get discarded.

In addition, if the area is over 2³²-6 then the m used is over 32-bits in length, causing the code to use a code path with intermediate 128-bit numbers.

Andrew Dalke <>, Andrew Dalke Scientific AB holds the copyright to most of the source code, and distributes it under the terms of the GNU General Public License vertion 3 or (at your option) any later version. A copy is available from the file LICENSE.

Jakub Červený holds the copyright to portions of the file gilbert.c, and licenses under the terms of the BSD 2-Clause License. The full license text is available in that file.

Martin Perdacher, Claudia Plant, and/or Christian Böhm may hold copyright to parts of morton.c. Martin Perdacher writes that distributing their source as part of this package is "fine with me. Feel free to use the code in anykind of way. If you make a publication out of it, just cite our paper. That's it."

A small amount of source in morton.c comes from Christopher Swenson's sort.h, most of which in turn derives from the book "Hacker’s Delight" by Henry S. Warren, Jr. (more specifically, section 5–3, "Counting Leading 0’s", in the second edition). I do not believe this source is covered under copyright, but if there is, Swenson distributes it under the terms of the MIT License.

There is an unused implementation of pext_u64 in rectfillcurve_pext.h which derives from Matthew Fioravante's N3864 - A constexpr bitwise operations library for C++. It is included in the generated binary only if by manually editing a #if 0, which I did for timing purposes. I do not believe this use is covered under copyright. Even if I'm wrong, Fioravante distributes it under the terms of the MIT License.

The rectfillcurve_pext.h also includes a de-interleaving function from the book "Hacker’s Delight" by Henry S. Warren, Jr. It is the standard parallel reduction method found in various sources, and I do not believe its use here is under copyright protection.

#Other resources

Space-filling curves are a popular topic. See also:

  • generalized Hilbert curve by Jakub Červený, in Python, and the source of generalized Hilbert curve used here.

  • Libmorton by Jeroen Baert, a C++ header-only library to encode/decode Morton coordinates.

  • The book "Hacker’s Delight" by Henry S. Warren, Jr.

  • The book "Space-Filling Curves: An Introduction with Applications in Scientific Computing" by Michael Bader.