How to read and write tiff stacks in Python
Jul 24, 2019
Allard Hendriksen
4 minute read

During a PhD in the computational imaging group, you will not escape reading and writing copious amounts of tiff files. In this blog post, I will run through the most basic (and most helpful!) Python libraries and snippets to help you out.

pathlib

Pathlib is a library designed to make handling file paths easier in Python. You do not even have to install it: it is part of the Python standard library! You can convert any valid string path to a `Path` object as follows:

from pathlib import Path
print(Path("~/projects/blog"))
~/projects/blog
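
A `Path` is more than a wrapper around a string. As a small sketch (the directory and file names below are made up for illustration), the `/` operator joins path components, and attributes such as `name`, `stem`, `suffix`, and `parent` pick the path apart again. The example uses `PurePosixPath`, which behaves like `Path` for these operations but always uses forward slashes, so the output is the same on every platform:

```python
from pathlib import PurePosixPath

# Hypothetical directory and file name, for illustration only.
scan_dir = PurePosixPath("/data/scans")
img_path = scan_dir / "beads_0001.tif"

print(img_path)          # /data/scans/beads_0001.tif
print(img_path.name)     # beads_0001.tif
print(img_path.stem)     # beads_0001
print(img_path.suffix)   # .tif
print(img_path.parent)   # /data/scans
```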

You might already have found out that most libraries will not read a path that contains a tilde (~). Therefore, you may expand the tilde using expanduser. Moreover, if your path contains symbolic links, or it is a relative path, you can use resolve to obtain an absolute file path:

from pathlib import Path

print(Path("."))
print(Path("~"))
print(Path("~/projects").expanduser())
print(Path("~/projects").expanduser().resolve())
print(Path("../").resolve())
.
~
/ufs/hendriks/projects
/export/scratch1/hendriks/projects
/export/scratch1/hendriks/projects/blog

The most useful feature of pathlib is the easy file globbing, which allows you to quickly select multiple files, for instance:

from pathlib import Path

dataset_dir = Path("~/datasets/SophiaBeads_1024_averaged/").expanduser().resolve()
bead_tiffs = dataset_dir.glob("*.tif")
print("Unordered:")
print([p.name for p in bead_tiffs][:2])

# Note that glob does not return the files in any particular
# order. You must SORT THEM first:
bead_tiffs = sorted(dataset_dir.glob("*.tif"))
print("\nOrdered:")
print([p.name for p in bead_tiffs][:2])
Unordered:
['SophiaBeads_1024_averaged_0846.tif', 'SophiaBeads_1024_averaged_0468.tif']

Ordered:
['SophiaBeads_1024_averaged_0001.tif', 'SophiaBeads_1024_averaged_0002.tif']
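
Plain `sorted` compares file names as strings. That matches the numeric order here only because the file numbers are zero-padded. If your files are numbered without padding, a natural-sort key is one way around this. The sketch below uses a hypothetical helper, `natural_key`, and made-up file names; it is not part of pathlib:

```python
import re
from pathlib import Path


def natural_key(path):
    """Split a file name into text and integer chunks for sorting."""
    # re.split with a capturing group alternates text and digit chunks:
    # "scan_10.tif" -> ["scan_", "10", ".tif"], so odd indices are numbers.
    chunks = re.split(r"(\d+)", Path(path).name)
    return [int(c) if i % 2 else c for i, c in enumerate(chunks)]


names = [Path("scan_10.tif"), Path("scan_2.tif"), Path("scan_1.tif")]

plain = [p.name for p in sorted(names)]
natural = [p.name for p in sorted(names, key=natural_key)]

print(plain)    # ['scan_1.tif', 'scan_10.tif', 'scan_2.tif']
print(natural)  # ['scan_1.tif', 'scan_2.tif', 'scan_10.tif']
```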

Finally, you can easily create directories using pathlib:

from pathlib import Path
# Make a directory. By default, mkdir raises an error if the
# directory already exists and it does not create parent directories
# recursively. You can change both behaviours as follows:
Path("/tmp/test_dir/sub_dir").mkdir(exist_ok=True, parents=True)
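
To see what the two flags do, here is a small sketch that works inside a throwaway directory created with `tempfile` (an addition for this example, so that nothing is left behind on disk):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "test_dir" / "sub_dir"

    # Without parents=True, the missing "test_dir" is an error.
    try:
        target.mkdir()
    except FileNotFoundError:
        print("parent directory is missing")

    target.mkdir(parents=True)

    # Without exist_ok=True, creating it a second time is an error.
    try:
        target.mkdir()
    except FileExistsError:
        print("directory already exists")

    # With both flags set, the call is safe to repeat.
    target.mkdir(exist_ok=True, parents=True)
    created = target.is_dir()

print(created)  # True
```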

tifffile

To read tiff files, I always use the library that Daan uses: tifffile. It is easy to install using Conda:

conda install -c conda-forge tifffile

Reading and writing files is really easy:

import tifffile
from pathlib import Path
import numpy as np

tmp_dir = Path("/tmp/tmp_tiff_dir/")
tmp_dir.mkdir(exist_ok=True)

img = np.zeros((10, 10), dtype=np.float32)

img_path = tmp_dir / "img.tif"

# Convert the `Path` to a string to be safe: older versions of tifffile
# do not accept a `Path` as argument. Also, make sure to convert the
# image to float32. By default, numpy creates float64 arrays. This not
# only wastes disk space, but is also annoying because some tiff viewers
# (such as ImageJ) do not support float64.
# Note: `imwrite` is the current name of this function. Older versions
# of tifffile call it `imsave`.
tifffile.imwrite(str(img_path), img.astype(np.float32))

# This loads an image:
img_loaded = tifffile.imread(str(img_path))

assert np.all(img_loaded == img), "images do not match"
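
The size difference between float64 and float32 is easy to check with numpy alone. A minimal sketch:

```python
import numpy as np

img64 = np.zeros((10, 10))           # numpy defaults to float64
img32 = img64.astype(np.float32)     # convert before saving

print(img64.dtype, img64.nbytes)     # float64 800
print(img32.dtype, img32.nbytes)     # float32 400
```

The same factor of two applies to the pixel data that ends up in an uncompressed tiff file.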

tqdm: progress bars

Some things just take a long time, and you might want to know if they are progressing fast enough. Instead of printing something to the terminal, you may want to see a progress bar. This is really easy in Python. Just use tqdm. Install with Conda:

conda install tqdm

Use as follows:

from tqdm import tqdm
import time

for i in tqdm(range(10)):
    # Do something long-winded:
    time.sleep(0.5)

Check out the tqdm GitHub page for more information and some spiffy animations.
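
One caveat: tqdm can only draw a bar and estimate the remaining time when it knows how many items to expect. Generators and `enumerate` objects have no length, so in that case you can pass the count yourself with `total`. A small sketch, with a made-up generator (`squares`) standing in for real work:

```python
from tqdm import tqdm


def squares(n):
    """Hypothetical generator standing in for a slow computation."""
    for i in range(n):
        yield i * i


n = 10
# A generator has no len(), so tell tqdm how many items to expect.
results = [s for s in tqdm(squares(n), total=n, desc="squares")]

# When you need the index as well, wrap the iterable itself in tqdm
# and put enumerate on the outside.
indexed = [(i, c) for i, c in enumerate(tqdm("abc"))]
```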

enumerate: smarter looping

Looping over things in Python is really easy using "for .. in ..":

for c in "hello":
    print(c)
h
e
l
l
o

Sometimes, you also want to know the index of the thing you are currently looping over. This is what the enumerate function is for. It makes it easy to loop over things and get the index. Look at this, for instance:

import numpy as np

img = np.random.normal(size=(3, 10))

print("Initial version:")
i = 0
for row in img:
    print(i, len(row), np.mean(row))
    i += 1

print("Idiomatic:")
for i, row in enumerate(img):
    print(i, len(row), np.mean(row))


Initial version:
0 10 -0.19616953681937158
1 10 -0.10755020911455213
2 10 -0.4219351202831644
Idiomatic:
0 10 -0.19616953681937158
1 10 -0.10755020911455213
2 10 -0.4219351202831644
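
`enumerate` also accepts a `start` argument, which is handy when your numbering begins at 1 rather than 0 (as in the SophiaBeads file names earlier in this post). A short sketch with made-up data:

```python
rows = ["first", "second", "third"]

# Count from 1 instead of 0.
for i, row in enumerate(rows, start=1):
    print(f"{i:04d}", row)
# 0001 first
# 0002 second
# 0003 third

numbered = list(enumerate(rows, start=1))
```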

Tying it all together

Using pathlib and tifffile, you can quickly define some very useful utility functions.

Loading a tiff stack is a task that you will often perform. This function does exactly that:

from pathlib import Path
import tifffile
from tqdm import tqdm
import numpy as np


def load_stack(path, *, skip=1, squeeze=False):
    """Load a stack of tiff files.

    Make sure that the tiff files are sorted *alphabetically*,
    otherwise it is not going to look pretty..

    :param path: path to directory containing tiff files
    :param skip: read every `skip`-th image
    :param squeeze: whether to remove any empty dimensions from image
    :returns: an np.array containing the values in the tiff files
    :rtype: np.array

    """
    path = Path(path).expanduser().resolve()

    # Only read every `skip`-th image:
    img_paths = sorted(path.glob("*.tif"))[::skip]
    # Make a list containing the loaded images:
    if squeeze:
        imgs = [tifffile.imread(str(p)).squeeze() for p in tqdm(img_paths)]
    else:
        imgs = [tifffile.imread(str(p)) for p in tqdm(img_paths)]

    return np.array(imgs)
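
The `[::skip]` slice is what implements the `skip` parameter. To make its effect concrete without touching the disk, here is a sketch on made-up, zero-padded file names:

```python
# Ten hypothetical, zero-padded file names.
names = sorted(f"img_{i:05d}.tif" for i in range(10))

every_third = names[::3]
print(every_third)
# ['img_00000.tif', 'img_00003.tif', 'img_00006.tif', 'img_00009.tif']

# skip=1 (the default) keeps every file.
print(len(names[::1]))  # 10
```

With `skip=3`, `load_stack` would therefore read four of these ten files.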

Saving a tiff stack is also quite easy:

from pathlib import Path
import tifffile
from tqdm import tqdm


def save_stack(path, data, *, prefix="output", exist_ok=False, parents=False):
    path = Path(path).expanduser().resolve()
    path.mkdir(exist_ok=exist_ok, parents=parents)

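    # Wrap `data` itself in tqdm (not the enumerate object), so that
    # tqdm can see its length and draw a progress bar.
    for i, d in enumerate(tqdm(data, mininterval=1.0)):
        output_path = path / f"{prefix}_{i:05d}.tif"
        # `imwrite` is the current name of `imsave` in tifffile.
        tifffile.imwrite(str(output_path), d)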

Displaying tiff stacks

A great way to display volumetric and projection data is using pyqtgraph. This package can be installed using Conda:

conda install pyqtgraph

to be continued..