Wrapping C code for Python

Back to Research

Part of my PhD project involves simulating gravitational wave signals. The code that I use to do this has very kindly been lent to me by Zhoujian Cao and Wen-Biao Han, 2017, and it’s written in C. The software package that I use to perform inference on data using these waveforms is written in Python.

C is pretty fast, so it’s easy to see why many large-scale simulations are written in it (e.g., COMPAS). It’s not as fast as its lower-level grandfather, FORTRAN, which is still used to construct even more computationally vast programs like NBODY6. What C doesn’t have in speed, it makes up for in readability. But it’s not as readable as its fashionable, highly abstracted younger sister, Python.

We’ve chosen to write our inference software in Python because we want it to be user-friendly, with clear intent and functionality. However, much of the hard work is done under-the-hood by different modules that are written in C and ‘wrapped’ to make them callable from a Python script.

When I first started using the C waveform package, I was scared to tinker with it at all. Instead, I used the subprocess Python package to compile and build the C into an executable, which I could pass parameters to via Python to get a nice data file as output. I could then read in this output file, and delete it via subprocess to avoid clogging up my file system. This, however, was extremely slow. When I ran hundreds of jobs in parallel on the LIGO cluster at CIT, it wasn’t long before I’d completely overloaded the backend file server for my home directory, effectively causing my jobs to back up into an impassable virtual traffic jam.

Clearly I was going to have to get my hands dirty with the C code. I still wanted to avoid rewriting any of the existing code, because I’m not a waveforms expert and I didn’t want to introduce errors. So, I made a seperate module with a single function that does exactly what the executable does, but with a few tweaks that let me read the data from memory, rather than writing to a file.

The name of the waveform approximant I’m using is SEOBNRE, which stands for Spinning Effective-One-Body-Numerical-Relativity with Eccentricity. Catchy.

I pass pointers, hplusData and hcrossData, to the function that generates the SEOBNRE waveform, which has the catchy name XLALSimInspiralChooseTDWaveform. The pointers point to some spaces in the computer’s memory that I have allocated for holding the data produced by this function.

The function looks like this:

#include "seobnre.h"

void do_waveform_generation(REAL8 **hplusData, REAL8 **hcrossData,
                            REAL8 phiRef, REAL8 deltaT, REAL8 m1, REAL8 m2, 
                            REAL8 s1z, REAL8 s2z, REAL8 f_min, REAL8 e0, 
                            REAL8 distance, REAL8 inclination) {

    REAL8TimeSeries *hplus = NULL;
    REAL8TimeSeries *hcross = NULL;
    // convert solar masses and Mpc to kg and m
    REAL8 m1_kg = m1 * LAL_MSUN_SI;
    REAL8 m2_kg = m2 * LAL_MSUN_SI;
    REAL8 distance_m = distance * 1e6 * LAL_PC_SI;
    // in Panyi_elip.c
    XLALSimInspiralChooseTDWaveform(&hplus, &hcross, phiRef,
                    deltaT, m1_kg, m2_kg, 0.0, 0.0, s1z, 0.0, 0.0,
                    s2z, f_min, e0, distance_m, inclination
                    );
    *hplusData = hplus->data->data;
    *hcrossData = hcross->data->data;
}

The header file looks like this:

#include <Python.h>
#include <stdlib.h>
#include "Panyidatatypes.h"
#include "PanyiLALConstants.h"
#include "Panyi_elip.h"

typedef struct gravitationalWaveStrain {
    REAL8TimeSeries * hplus, * hcross;
}
Strain;

void do_waveform_generation(REAL8 **hplusData, REAL8 **hcrossData, 
                            REAL8 phiRef, REAL8 deltaT, REAL8 m1, REAL8 m2, 
                            REAL8 s1z, REAL8 s2z, REAL8 f_min, REAL8 e0, 
                            REAL8 distance, REAL8 inclination);

In order to make these callable by Python, I have to wrap the code. I do this using SWIG. In order to use SWIG, I need to set up a file that tells SWIG what’s important for my code. I do this using a SWIG interface file. It looks like this:

/* file : seobnre.i */

/* name of module to use*/
%module seobnre

%{
    /* Every thing in this file is being copied in
     wrapper file. We include the C header file necessary
     to compile the interface */
     #define SWIG_FILE_WITH_INIT
     #include "seobnre.h"

%}

/* if we want to interface all functions then we can simply
   include header file like this -
*/
%include "seobnre.h"

Here, I’m telling SWIG which module my code directly depends on. The SWIG_FILE_WITH_INIT line tells it to instantiate my module, seobnre, with an __init__.py file, so that I can use it as a module straight away.

I then run the command

swig -python -py3 seobnre.i

This enables SWIG to produce the file seobnre_wrap.c, which contains the wrapper code. This now needs to be compiled, along with all of the other .c files that my one function depends on.

So, I run the command

gcc -m64 -fPIC -c -O2 -I/home/$USER/anaconda3/include/python3.6m -I/usr/local/include Panyi_elip.c Panyicomm_elip.c PNwaveformPRD544813.c seobnre.c seobnre_wrap.c

This includes a few flags: -m64 tells the compiler, gcc, that I want the compiler code to be compatible with my 64-bit system architecture; -fPIC ensures that the compiled code will be position-independent, suitable for dynamic linking; -c specifies that I’m using C (rather than, say, C++ or C#); -O2 turns on a bunch of optimisation flags; -I tells the compiler where to look for files to include.

Once the .c files are compiled, I need to create a shared object. I do this with the command

gcc -m64 -shared -lpython3.6m Panyi_elip.o Panyicomm_elip.o PNwaveformPRD544813.o seobnre.o seobnre_wrap.o -L/home/$USER/anaconda3/lib -L/usr/local/lib -I/home/$USER/anaconda3/include/python3.6m -I/usr/local/include -o _seobnre.so -lgsl -lgslcblas -fPIC```

The new flags here are: -shared tells the compiler I want my compiled code to be a shared object; -lpython3.6m directs the compiler to my python 3.6 library; -L tells the compiler which libraries to use; -o _seobnre.so gives the compiler the name I want it to give to the shared object; -lgsl and -lgslcblas tell the compiler where my GSL libraries live.

Now I have a shared object. In order to use it, I need to set up my new module. So I create a file called setup.py, which can do this for me. It looks like this:

from distutils.core import setup, Extension

# name of module
name = "seobnre"

# version of module
version = "0.1"

# specify the name of the extension and source files
# required to compile this
ext_modules = Extension(
      name='_seobnre',
      sources=[
            "seobnre.i",
            "seobnre.c",
      ],
)

setup(name=name,
      version=version,
      ext_modules=[ext_modules])

Finally, to build the extension, I run the command

python setup.py build_ext --inplace -I/home/$USER/anaconda3/include -I/usr/local/include -L/home/$USER/anaconda3/lib -L/usr/local/lib -lgsl -lgslcblas -lpython3.6m

The new flag I’ve used here is: —inplace, which tells the compiler to put compiled code into the source directory.

Now I’ve wrapped my C code into a Python-useable module. Great! But it doesn’t stop there. I need the C code to understand the Python data types that I’m passing it if I want it to produce meaningful output. This is where the Python module ctypes comes in.

The code that I’ve written generates a gravitational waveform. In Python, I usually set up a dictionary of input parameters that describe the waveform, and I need to convert these into a format that C will understand. Equally, I need to make sure that the output of the C function is understood by my Python code.

Below is a script that I use to generate a waveform. It calls the function seobnre_bbh_with_spin_and_eccentricity. This code looks like pure Python from the outset.

"""
This script simply plots an SEOBNRE waveform in the time domain.
"""
import pycentricity.waveform as wf
import matplotlib.pyplot as plt

sampling_frequency = 4096.
minimum_frequency = 10.

injection_parameters = dict(
    mass_1=30,
    mass_2=35,
    eccentricity=0.1,
    luminosity_distance=440.0,
    theta_jn=0.4,
    psi=0.1,
    phase=1.2,
    geocent_time=0.0,
    ra=45,
    dec=5.73,
    chi_1=0.0,
    chi_2=0.0,
)

seobnre_waveform = wf.seobnre_bbh_with_spin_and_eccentricity(
    injection_parameters, sampling_frequency, minimum_frequency
)

fig = plt.figure()
plt.plot(seobnre_waveform['plus'], label='plus')
plt.plot(seobnre_waveform['cross'], label='cross')
plt.xlabel('time [$s$]')
plt.ylabel('strain')
plt.legend()
plt.savefig('seobnre_waveform', bbox_inches='tight')
plt.clf()

That function, seobnre_bbh_with_spin_and_eccentricity, is defined as follows in the waveform.py file inside the pycentricity folder.

def seobnre_bbh_with_spin_and_eccentricity(
    parameters, sampling_frequency, minimum_frequency
):
    """
    Return the  waveform polarisations.
    :param parameters: dict
        dictionary of waveform parameters
    :param sampling_frequency: int
        frequency with which to 'sample' the waveform
    :param minimum_frequency: int
        minimum frequency to contain in the signal
    :return:
        seobnre: dict
            time-domain waveform polarisations
    """
    deltaT = 1 / sampling_frequency
    hp, hc = generate_seobnre_waveform_from_c_code(parameters, deltaT, minimum_frequency)
    seobnre_waveform = {
        "plus": new_waveform_list(hp),
        "cross": new_waveform_list(hc)
    }
    return seobnre_waveform

Again, pure Python. But this function calls two other functions. The first is generate_seobnre_waveform_from_c_code, which is defined (within the same file) as:

def generate_seobnre_waveform_from_c_code(parameters, deltaT, minimum_frequency):
    """
    Return the waveform polarisations simulated by the SEOBNRe c-code.
    :param parameters: dict
        dictionary of waveform parameters
    :param deltaT: float
        time step
    :param minimum_frequency: int
        minimum frequency to contain in the signal
    :return:
        hp, hc: c_double[]
            time-domain plus and cross waveform polarisations
    """
    seobnre = cdll.LoadLibrary(
                      os.path.dirname(os.path.realpath(__file__)) + '/_seobnre.so'
                  )
    if parameters['chi_1'] > 0.6 or parameters['chi_2'] > 0.6:
        print('WARNING:: Dimensionless component spins larger than 0.6.')
        print('WARNING:: Not attempting to generate waveform. ')
        return None, None
    else:
        c_injection_parameters = deepcopy(parameters)
        for key in parameters:
            c_injection_parameters[key] = c_double(c_injection_parameters[key])
        c_deltaT = c_double(deltaT)
        c_minimum_frequency = c_double(minimum_frequency)
        hp = POINTER(c_double)()
        hc = POINTER(c_double)()
        seobnre.do_waveform_generation.argtypes = [
            POINTER(POINTER(c_double)),
            POINTER(POINTER(c_double)),
            c_double,
            c_double,
            c_double,
            c_double,
            c_double,
            c_double,
            c_double,
            c_double,
            c_double,
            c_double,
        ]
        seobnre.do_waveform_generation(
            pointer(hp),
            pointer(hc),
            c_injection_parameters['phase'],
            c_deltaT,
            c_injection_parameters['mass_1'],
            c_injection_parameters['mass_2'],
            c_injection_parameters['chi_1'],
            c_injection_parameters['chi_2'],
            c_minimum_frequency,
            c_injection_parameters['eccentricity'],
            c_injection_parameters['luminosity_distance'],
            c_injection_parameters['theta_jn']
        )
        return hp, hc

cdll comes from the module ctypes. The function LoadLibrary loads in the shared object _seobnre.so, which I made earlier with SWIG. Once I’ve loaded that shared object, I can use seobnre as I might use any other module.

ctypes is able to convert Python types to C types. For every one of my waveform parameters, as well as the variables deltaT and minimum_frequency, I use c_double() to convert from Python floats to C doubles.

I also use POINTER to declare two variables, hp and hc, as pointers to data of the type c_double. This means that they point to locations in the memory where c_double data is held. This is really convenient. It means that I can use the C function to populate those specific locations with data, and then simply check what’s held in that bit of memory with Python.

I do that with the second function: new_waveform_list.

def new_waveform_list(old_c_list):
    """
    Transform a c-type list to a python list.
    :param old_c_list: list of c_double
    :return:
        new_list: list
            python list
    """
    new_list = []
    if old_c_list is not None:
        for element in old_c_list:
            if element == 0.0:
                break
            new_list.append(element)
    return new_list

Since I know that my pointers are pointing to lists held in memory, I can construct new Python lists containing the data. The check to see if the element is ‘0.0’ is essential to avoid segmentation faults, because otherwise I’m trying to read from parts of the memory that haven’t been filled with data.

The time taken in setting up this wrapper has been regained many times over in the speed-ups achieved by cutting out file I/O. I hope this post helps you if you’re looking to speed up your Python by introducing some efficient hidden C code!

Back to Research

Isobel Marguarethe Romero-Shaw