On the integration of command line tools


Many workflows rely on compiled programs or scripts that cannot be called directly through signac’s Python interface.

There are four ways to integrate such tools:

  1. use signac’s native command line interface,
  2. generate a shell script for execution,
  3. use subprocess forking, or
  4. use signac-flow.

We will use the standard ideal gas example, but assume that we need to interface with the idg program instead of a Python script. The idg program takes the system size N, the thermal energy kT, and the pressure p, in that order, and calculates the volume V according to the ideal gas law, V = N kT / p. For example:

$ idg 1000 2.0 1.0
2000.0
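
To try the examples without the compiled program, a stand-in for idg is easy to write. The following is only a minimal sketch, assuming that idg does nothing more than evaluate V = N kT / p and print the result; the function names are hypothetical and not part of signac or of the real idg program.

```python
import sys


def ideal_gas_volume(N, kT, p):
    """Return V = N * kT / p, the ideal gas law solved for the volume."""
    return N * kT / p


def main(argv):
    # Same positional order as the example call above: N, kT, p.
    N, kT, p = int(argv[0]), float(argv[1]), float(argv[2])
    print(ideal_gas_volume(N, kT, p))


if __name__ == "__main__":
    # Fall back to the example arguments when exactly three are not given.
    args = sys.argv[1:] if len(sys.argv) == 4 else ["1000", "2.0", "1.0"]
    main(args)  # with the fallback arguments this prints 2000.0
```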

Each of the following demonstrations implements the same workflow.

signac’s CLI

N=1000
kT=1.0
for p in 0.1 1.0 10.0; do
WS=$(cat << EOF | signac job -cw
  {"p": ${p}, "N": ${N}, "kT": ${kT}}
EOF
)
./idg ${N} ${kT} ${p} > ${WS}/V.txt
done

Here we use the heredoc syntax to specify the state point in place, which avoids the awkward escaping of quotes that would otherwise be needed.
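
For comparison, the JSON document that the heredoc spells out by hand can also be produced from Python, where json.dumps takes care of the quoting. This is a minimal sketch that uses only the standard library; the helper name state_point_json is hypothetical.

```python
import json


def state_point_json(N, kT, p):
    """Serialize a state point to the JSON text used in the heredoc."""
    return json.dumps({"p": p, "N": N, "kT": kT})


# One JSON document per pressure, matching the shell loop above.
for p in (0.1, 1.0, 10.0):
    print(state_point_json(1000, 1.0, p))
```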

Generate a shell script

Alternatively, we can use a mixed Python-shell approach, in which a Python script uses signac’s Python interface to generate a shell script:

import signac

IDG = './idg {job.sp.N} {job.sp.kT} {job.sp.p} > {job.ws}/V.txt'

project = signac.get_project()

for p in 0.1, 1.0, 10.0:
    sp = {'N': 1000, 'kT': 1.0, 'p': p}
    job = project.open_job(sp)
    job.init()
    print(IDG.format(job=job))

Executing this script (run.py in the commands below) prints the necessary commands:

./idg 1000 1.0 0.1 > /home/johndoe/my_project/workspace/5a6c687f7655319db24de59a2336eff8/V.txt
./idg 1000 1.0 1.0 > /home/johndoe/my_project/workspace/ee617ad585a90809947709a7a45dda9a/V.txt
./idg 1000 1.0 10.0 > /home/johndoe/my_project/workspace/5a456c131b0c5897804a4af8e77df5aa/V.txt

We can execute these commands by piping them into a shell of our choosing, e.g., bash:

$ python run.py | /bin/bash

or into a script, which we submit to an HPC cluster scheduler:

$ python run.py > submit.sh
$ qsub submit.sh

In the latter case, we would need to add the necessary PBS instructions to the script’s header.
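
One way to add such a header is to let Python prepend it to the generated commands. The following is a minimal sketch, not a recommended submission setup: the helper name build_submit_script is hypothetical, and the job name and walltime are placeholder assumptions that must be adapted to the requirements of the actual cluster.

```python
def build_submit_script(commands, job_name="idg", walltime="00:10:00"):
    """Prepend a PBS header to a list of shell commands."""
    header = [
        "#!/bin/bash",
        "#PBS -N {}".format(job_name),            # placeholder job name
        "#PBS -l walltime={}".format(walltime),   # placeholder time limit
    ]
    return "\n".join(header + list(commands)) + "\n"


# Placeholder command; in practice these are the lines printed by run.py.
commands = ["./idg 1000 1.0 0.1 > V.txt"]
print(build_submit_script(commands), end="")
```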

Use process forking

This approach is very similar to the previous one, but instead of printing the commands we launch the required processes immediately with the subprocess module:

import signac
from subprocess import run

IDG = './idg {job.sp.N} {job.sp.kT} {job.sp.p} > {job.ws}/V.txt'

project = signac.get_project()

for p in 0.1, 1.0, 10.0:
    sp = {'N': 1000, 'kT': 1.0, 'p': p}
    job = project.open_job(sp)
    job.init()
    run(IDG.format(job=job), shell=True)

The subprocess.run() function was introduced in Python 3.5; with earlier versions, use subprocess.call() or similar.
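
If we prefer not to go through a shell, the redirection can be handled in Python by passing an open file as stdout. The following is a minimal sketch of that variant; the helper name run_to_file is hypothetical, and a Python one-liner stands in for the idg program so that the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(args, out_path):
    """Run a command given as an argument list and write its stdout to out_path."""
    with open(out_path, "w") as outfile:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(args, stdout=outfile, check=True)


# Stand-in for ['./idg', '1000', '1.0', '10.0'].
demo_args = [sys.executable, "-c", "print(1000 * 1.0 / 10.0)"]

with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "V.txt")
    run_to_file(demo_args, out)
    with open(out) as f:
        print(f.read().strip())  # prints 100.0
```

With signac, out_path would be the V.txt file inside the job's workspace, and args would be built from job.sp instead of a formatted command string.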

Use signac-flow

Finally, if we already use signac-flow for our workflow implementation, we just add the command as a regular operation:

# project.py
from flow import FlowProject
# import flow.environments  # uncomment to use default environments


class Project(FlowProject):

    def __init__(self, *args, **kwargs):
        super(Project, self).__init__(*args, **kwargs)

        self.add_operation(
            name='calc-volume',
            cmd='idg {job.sp.N} {job.sp.kT} {job.sp.p} > {job.ws}/V.txt')


if __name__ == '__main__':
    Project().main()

This workflow could be executed with

$ python project.py run

or submitted to an HPC cluster scheduler with

$ python project.py submit
