During a PhD in the computational imaging group, you will not escape reading and writing copious amounts of tiff files. In this blog post, I will run through the most basic (and most helpful!) Python libraries and snippets to help you out.
pathlib
Pathlib is a library designed to make handling file paths in Python easier. You do not even have to install it: it is part of the Python standard library! You can convert any string to a Path object as follows:
from pathlib import Path
print(Path("~/projects/blog"))
~/projects/blog
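Because a Path is an object rather than a plain string, the usual pieces of a file name are available as attributes. The snippet below is a small sketch using a made-up file name; it also shows the `/` operator for joining paths, which the tifffile examples further down rely on.

```python
from pathlib import Path

# A made-up file path, only used for illustration.
p = Path("~/datasets/scan_0001.tif")

print(p.name)    # file name with extension
print(p.stem)    # file name without the extension
print(p.suffix)  # the extension, including the dot
print(p.parent)  # the directory containing the file

# with_suffix swaps the extension; the / operator joins path components.
print(p.with_suffix(".npy"))
print(p.parent / "reconstructions" / "recon_0001.tif")
```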
You might already have found out that most libraries will not read a path that contains a tilde (~): expanding it to your home directory is something your shell does, not Python. Therefore, expand the tilde yourself using expanduser. Moreover, if your path contains symbolic links, or it is a relative path, you can use resolve to obtain an absolute file path with all links followed:
from pathlib import Path
print(Path("~"))
print(Path("~/projects").expanduser())
print(Path("~/projects").expanduser().resolve())
print(Path("../").resolve())
~
/ufs/hendriks/projects
/export/scratch1/hendriks/projects
/export/scratch1/hendriks/projects/blog
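To see exactly what resolve does with a symbolic link, the sketch below builds a throwaway directory with tempfile, adds a link to it, and compares the resolved paths. The directory names are made up, and creating symlinks may need extra privileges on Windows.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    real_dir = root / "data"
    real_dir.mkdir()

    # A symbolic link called "shortcut" that points at real_dir
    # (creating symlinks may need extra privileges on Windows).
    link = root / "shortcut"
    link.symlink_to(real_dir, target_is_directory=True)

    print(link)            # the path as written, still ending in "shortcut"
    print(link.resolve())  # absolute path with the link followed
    resolved_link = link.resolve()
    resolved_real = real_dir.resolve()
```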
The most useful feature of pathlib is the easy file globbing, which allows you to quickly select multiple files, for instance:
from pathlib import Path
dataset_dir = Path("~/datasets/SophiaBeads_1024_averaged/").expanduser().resolve()
bead_tiffs = dataset_dir.glob("*.tif")
print("Unordered:")
print([p.name for p in bead_tiffs][:2])
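# Note that glob returns the files in arbitrary (file-system dependent)
# order, so you must SORT THEM first. Plain sorting compares names
# character by character; it gives numeric order here only because the
# indices in the file names are zero-padded: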
bead_tiffs = sorted(dataset_dir.glob("*.tif"))
print("\nOrdered:")
print([p.name for p in bead_tiffs][:2])
Unordered:
['SophiaBeads_1024_averaged_0846.tif', 'SophiaBeads_1024_averaged_0468.tif']
Ordered:
['SophiaBeads_1024_averaged_0001.tif', 'SophiaBeads_1024_averaged_0002.tif']
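If your files are named without zero-padding (scan_1, scan_2, ..., scan_10), plain sorted() puts scan_10 before scan_2. A numeric sort key fixes that. This is a sketch with a hypothetical helper, numeric_key, and made-up file names:

```python
import re
from pathlib import Path


def numeric_key(path):
    """Hypothetical sort key: compare digit runs in the file name as integers.

    re.split with a capturing group keeps the digit runs, so
    "scan_10.tif" becomes ["scan_", "10", ".tif"]; the digit runs always
    sit at the odd positions.
    """
    parts = re.split(r"(\d+)", Path(path).name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


names = ["scan_10.tif", "scan_2.tif", "scan_1.tif"]  # made-up file names
print(sorted(names))                   # lexicographic: 1, 10, 2
print(sorted(names, key=numeric_key))  # numeric: 1, 2, 10
```

With pathlib, this becomes `sorted(dataset_dir.glob("*.tif"), key=numeric_key)`.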
Finally, you can easily create directories using pathlib:
from pathlib import Path
# Make a directory. By default, mkdir raises FileExistsError if the
# directory already exists, and FileNotFoundError if a parent directory
# is missing (it does not create directories recursively). You can
# enable both behaviours as follows:
Path("/tmp/test_dir/sub_dir").mkdir(exist_ok=True, parents=True)
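To see the two default behaviours of mkdir in action, here is a small experiment in a temporary directory (the directory names are arbitrary):

```python
import tempfile
from pathlib import Path

errors = []
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sub_dir" / "sub_sub_dir"

    # Default: a missing parent directory is an error.
    try:
        target.mkdir()
    except FileNotFoundError as err:
        errors.append(type(err).__name__)

    target.mkdir(parents=True)

    # Default: creating an existing directory again is an error.
    try:
        target.mkdir(parents=True)
    except FileExistsError as err:
        errors.append(type(err).__name__)

    # With both flags set, the call can safely be repeated.
    target.mkdir(parents=True, exist_ok=True)
    created = target.is_dir()

print(errors)
```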
tifffile
To read tiff files, I always use the library that Daan uses: tifffile. This library is easily installable using Conda:
conda install -c conda-forge tifffile
Reading and writing files is really easy:
import tifffile
from pathlib import Path
import numpy as np
tmp_dir = Path("/tmp/tmp_tiff_dir/")
tmp_dir.mkdir(exist_ok=True)
img = np.zeros((10, 10), dtype=np.float32)
img_path = tmp_dir / "img.tif"
# Converting the Path to a string keeps this working with older tifffile
# versions that only accepted string file names. Also, make sure to store
# the image as float32: numpy creates float64 arrays by default, which
# wastes disk space and is annoying because some tiff viewers (such as
# ImageJ) do not support float64.
# imwrite is the current name of the write function; the older imsave
# is deprecated.
tifffile.imwrite(str(img_path), img.astype(np.float32))
# This loads an image:
img_loaded = tifffile.imread(str(img_path))
assert np.all(img_loaded == img), "images do not match"
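tifffile is not limited to single 2D images: a 3D array can be written to, and read back from, a single multi-page tiff. Whether you prefer one file per slice or one file per volume is up to you. The sketch below only shows the round trip on made-up random data:

```python
import tempfile
from pathlib import Path

import numpy as np
import tifffile

# A small made-up volume: 5 slices of 16 x 16 pixels, stored as float32.
volume = np.random.default_rng(0).random((5, 16, 16)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    stack_path = Path(tmp) / "volume.tif"
    # A 3D array is stored as a multi-page tiff, one page per slice.
    tifffile.imwrite(str(stack_path), volume)
    loaded = tifffile.imread(str(stack_path))

print(loaded.shape, loaded.dtype)
```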
tqdm: progress bars
Some things just take a long time, and you might want to know if they are progressing fast enough. Instead of printing something to the terminal, you may want to see a progress bar. This is really easy in Python: just use tqdm. Install it with Conda:
conda install tqdm
Use as follows:
from tqdm import tqdm
import time
for i in tqdm(range(10)):
    # Do something long-winded:
    time.sleep(0.5)
Check out the tqdm GitHub page for more information and some spiffy animations.
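tqdm can also be driven by hand, which is useful when the work does not map onto a single for loop over an iterable. Here is a minimal sketch using the total and desc arguments and the update method. The label and step count are arbitrary:

```python
import time

from tqdm import tqdm

n_steps = 20
# Create the bar yourself and advance it manually:
# `total` sets the expected number of steps, `desc` the label.
with tqdm(total=n_steps, desc="reconstructing") as pbar:
    for _ in range(n_steps):
        time.sleep(0.05)  # stand-in for real work
        pbar.update(1)

finished = pbar.n  # number of steps the bar has counted
```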
enumerate: smarter looping
Looping over things in Python is really easy using "for ... in ...":
for c in "hello":
    print(c)
h
e
l
l
o
Sometimes, you also want to know the index of the item you are currently looping over. This is what the enumerate function is for: it makes it easy to loop over things and get the index at the same time. Look at this, for instance:
import numpy as np
img = np.random.normal(size=(3, 10))
print("Initial version:")
i = 0
for row in img:
    print(i, len(row), np.mean(row))
    i += 1
print("Idiomatic:")
for i, row in enumerate(img):
    print(i, len(row), np.mean(row))
Initial version:
0 10 -0.19616953681937158
1 10 -0.10755020911455213
2 10 -0.4219351202831644
Idiomatic:
0 10 -0.19616953681937158
1 10 -0.10755020911455213
2 10 -0.4219351202831644
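enumerate also accepts a start argument, which is handy when you want 1-based numbering, for example in log messages. A tiny example with made-up slice labels:

```python
# Made-up slice labels, only used for illustration.
slices = ["top", "middle", "bottom"]

# start=1 makes the counter begin at 1 instead of 0.
for number, label in enumerate(slices, start=1):
    print(f"slice {number}: {label}")

pairs = list(enumerate(slices, start=1))
```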
tying it all together
Using pathlib and tifffile, you can quickly define some very useful utility functions.
Loading a tiff stack is a task that you will often perform. This function does exactly that:
from pathlib import Path
import tifffile
from tqdm import tqdm
import numpy as np
def load_stack(path, *, skip=1, squeeze=False):
    """Load a stack of tiff files.

    The files are read in alphabetical order of their names, so make
    sure that order matches the slice order (e.g. by zero-padding the
    slice index in the file names), otherwise it is not going to look pretty.

    :param path: path to directory containing tiff files
    :param skip: read only every `skip`-th image
    :param squeeze: whether to remove singleton (size-1) dimensions from each image
    :returns: an np.array containing the values in the tiff files
    :rtype: np.array
    """
    path = Path(path).expanduser().resolve()
    # Only read every `skip`-th image:
    img_paths = sorted(path.glob("*.tif"))[::skip]
    # Read the images into a list, showing a progress bar:
    if squeeze:
        imgs = [tifffile.imread(str(p)).squeeze() for p in tqdm(img_paths)]
    else:
        imgs = [tifffile.imread(str(p)) for p in tqdm(img_paths)]
    # Stack the images into one array, with the slice index first:
    return np.array(imgs)
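load_stack first builds a Python list of arrays and then copies it into one array with np.array, so for a moment the stack is held in memory twice. For large volumes, a variant that preallocates the output may help. The sketch below is a hypothetical load_stack_prealloc (not part of any library). It assumes every file has the same shape and dtype as the first one, and it ends with a tiny round trip on made-up data:

```python
import tempfile
from pathlib import Path

import numpy as np
import tifffile
from tqdm import tqdm


def load_stack_prealloc(path, *, skip=1):
    """Hypothetical variant of load_stack that preallocates the stack.

    Assumes all images share the shape and dtype of the first one.
    """
    path = Path(path).expanduser().resolve()
    img_paths = sorted(path.glob("*.tif"))[::skip]
    if not img_paths:
        raise FileNotFoundError(f"no .tif files found in {path}")

    # Read the first image to learn the shape and dtype of the stack.
    first = tifffile.imread(str(img_paths[0]))
    stack = np.empty((len(img_paths), *first.shape), dtype=first.dtype)
    stack[0] = first

    # Fill the remaining slices in place; no intermediate list is kept.
    for i, p in enumerate(tqdm(img_paths[1:]), start=1):
        stack[i] = tifffile.imread(str(p))
    return stack


# Round trip: 4 made-up 8 x 8 slices, slice i filled with the value i.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(4):
        img = np.full((8, 8), i, dtype=np.float32)
        tifffile.imwrite(str(Path(tmp) / f"slice_{i:05d}.tif"), img)
    demo = load_stack_prealloc(tmp, skip=2)  # reads slices 0 and 2
```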
Saving a tiff stack is also quite easy:
from pathlib import Path
import tifffile
from tqdm import tqdm
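def save_stack(path, data, *, prefix="output", exist_ok=False, parents=False):
    """Save each slice of `data` (along the first axis) as a separate tiff file."""
    path = Path(path).expanduser().resolve()
    path.mkdir(exist_ok=exist_ok, parents=parents)
    # Wrap `data` itself in tqdm (not the enumerate object) so the
    # progress bar knows the total number of slices:
    for i, d in enumerate(tqdm(data, mininterval=1.0)):
        # Zero-padded index, e.g. output_00042.tif, so the files sort correctly:
        output_path = path / f"{prefix}_{i:05d}.tif"
        tifffile.imwrite(str(output_path), d)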
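The `{i:05d}` format in save_stack zero-pads the index to five digits, which is what makes the plain sorted() in load_stack return the slices in the right order. The round trip below on made-up data shows the difference. It writes the files directly with tifffile.imwrite, using the same naming scheme, instead of calling save_stack, so it runs on its own:

```python
import tempfile
from pathlib import Path

import numpy as np
import tifffile

# 12 made-up slices; slice i is filled with the value i.
data = np.stack([np.full((4, 4), i, dtype=np.float32) for i in range(12)])

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    # Same naming scheme as save_stack: prefix plus 5-digit zero-padded index.
    for i, d in enumerate(data):
        tifffile.imwrite(str(out_dir / f"output_{i:05d}.tif"), d)
    padded = [p.name for p in sorted(out_dir.glob("*.tif"))]
    reloaded = np.stack([tifffile.imread(str(out_dir / name)) for name in padded])

# Without padding, plain sorting would put slice 10 before slice 2:
unpadded = sorted(f"output_{i}.tif" for i in range(12))

print(padded[:3])
print(unpadded[:3])
```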
Displaying tiff stacks
A great way to display volumetric and projection data is using pyqtgraph. This package can be installed using Conda:
conda install pyqtgraph
to be continued..