---
blogpost: true
date: Aug 9, 2019
title: Dask and ITK for large scale image analysis
author: John Kirkham, Matthew Rocklin, Matthew McCormick
tags: imaging
---

## Executive Summary

This post explores using the [ITK](https://www.itk.org) suite of image processing utilities in parallel with Dask Array.

We cover ...

1.  A simple but common example of applying deconvolution across a stack of 3d images
2.  Tips on how to make these two libraries work well together
3.  Challenges that we ran into and opportunities for future improvements.

## A Worked Example

Let's start with a full example applying Richardson Lucy deconvolution to a
stack of light sheet microscopy data. This is the same data that we showed how
to load in our [last blogpost on image loading](https://blog.dask.org/2019/06/20/load-image-data).
You can [access the data as tiff files from google drive here](https://drive.google.com/drive/folders/13mpIfqspKTIINkfoWbFsVtFF8D7jbTqJ), and the access the [corresponding point spread function images here](https://drive.google.com/drive/folders/13udO-h9epItG5MNWBp0VxBkKCllYBLQF).

```python
# Load our data from last time¶
import dask.array as da
imgs = da.from_zarr("AOLLSMData_m4_raw.zarr/", "data")
```

<table>
<tr>
<td>
<table>  <thead>    <tr><td> </td><th> Array </th><th> Chunk </th></tr>
  </thead>
  <tbody>
    <tr><th> Bytes </th><td> 188.74 GB </td> <td> 316.15 MB </td></tr>
    <tr><th> Shape </th><td> (3, 199, 201, 1024, 768) </td> <td> (1, 1, 201, 1024, 768) </td></tr>
    <tr><th> Count </th><td> 598 Tasks </td><td> 597 Chunks </td></tr>
    <tr><th> Type </th><td> uint16 </td><td> numpy.ndarray </td></tr>
  </tbody></table>
</td>
<td>
<svg width="404" height="206" style="stroke:rgb(0,0,0);stroke-width:1" >

  <!-- Horizontal lines -->
  <line x1="0" y1="0" x2="45" y2="0" style="stroke-width:2" />
  <line x1="0" y1="9" x2="45" y2="9" />
  <line x1="0" y1="18" x2="45" y2="18" />
  <line x1="0" y1="27" x2="45" y2="27" style="stroke-width:2" />

  <!-- Vertical lines -->
  <line x1="0" y1="0" x2="0" y2="27" style="stroke-width:2" />
  <line x1="0" y1="0" x2="0" y2="27" />
  <line x1="0" y1="0" x2="0" y2="27" />
  <line x1="0" y1="0" x2="0" y2="27" />
  <line x1="0" y1="0" x2="0" y2="27" />
  <line x1="1" y1="0" x2="1" y2="27" />
  <line x1="1" y1="0" x2="1" y2="27" />
  <line x1="1" y1="0" x2="1" y2="27" />
  <line x1="1" y1="0" x2="1" y2="27" />
  <line x1="2" y1="0" x2="2" y2="27" />
  <line x1="2" y1="0" x2="2" y2="27" />
  <line x1="2" y1="0" x2="2" y2="27" />
  <line x1="2" y1="0" x2="2" y2="27" />
  <line x1="2" y1="0" x2="2" y2="27" />
  <line x1="3" y1="0" x2="3" y2="27" />
  <line x1="3" y1="0" x2="3" y2="27" />
  <line x1="3" y1="0" x2="3" y2="27" />
  <line x1="3" y1="0" x2="3" y2="27" />
  <line x1="4" y1="0" x2="4" y2="27" />
  <line x1="4" y1="0" x2="4" y2="27" />
  <line x1="4" y1="0" x2="4" y2="27" />
  <line x1="4" y1="0" x2="4" y2="27" />
  <line x1="5" y1="0" x2="5" y2="27" />
  <line x1="5" y1="0" x2="5" y2="27" />
  <line x1="5" y1="0" x2="5" y2="27" />
  <line x1="5" y1="0" x2="5" y2="27" />
  <line x1="5" y1="0" x2="5" y2="27" />
  <line x1="6" y1="0" x2="6" y2="27" />
  <line x1="6" y1="0" x2="6" y2="27" />
  <line x1="6" y1="0" x2="6" y2="27" />
  <line x1="6" y1="0" x2="6" y2="27" />
  <line x1="7" y1="0" x2="7" y2="27" />
  <line x1="7" y1="0" x2="7" y2="27" />
  <line x1="7" y1="0" x2="7" y2="27" />
  <line x1="7" y1="0" x2="7" y2="27" />
  <line x1="7" y1="0" x2="7" y2="27" />
  <line x1="8" y1="0" x2="8" y2="27" />
  <line x1="8" y1="0" x2="8" y2="27" />
  <line x1="8" y1="0" x2="8" y2="27" />
  <line x1="8" y1="0" x2="8" y2="27" />
  <line x1="9" y1="0" x2="9" y2="27" />
  <line x1="9" y1="0" x2="9" y2="27" />
  <line x1="9" y1="0" x2="9" y2="27" />
  <line x1="9" y1="0" x2="9" y2="27" />
  <line x1="10" y1="0" x2="10" y2="27" />
  <line x1="10" y1="0" x2="10" y2="27" />
  <line x1="10" y1="0" x2="10" y2="27" />
  <line x1="10" y1="0" x2="10" y2="27" />
  <line x1="10" y1="0" x2="10" y2="27" />
  <line x1="11" y1="0" x2="11" y2="27" />
  <line x1="11" y1="0" x2="11" y2="27" />
  <line x1="11" y1="0" x2="11" y2="27" />
  <line x1="11" y1="0" x2="11" y2="27" />
  <line x1="12" y1="0" x2="12" y2="27" />
  <line x1="12" y1="0" x2="12" y2="27" />
  <line x1="12" y1="0" x2="12" y2="27" />
  <line x1="12" y1="0" x2="12" y2="27" />
  <line x1="12" y1="0" x2="12" y2="27" />
  <line x1="13" y1="0" x2="13" y2="27" />
  <line x1="13" y1="0" x2="13" y2="27" />
  <line x1="13" y1="0" x2="13" y2="27" />
  <line x1="13" y1="0" x2="13" y2="27" />
  <line x1="14" y1="0" x2="14" y2="27" />
  <line x1="14" y1="0" x2="14" y2="27" />
  <line x1="14" y1="0" x2="14" y2="27" />
  <line x1="14" y1="0" x2="14" y2="27" />
  <line x1="15" y1="0" x2="15" y2="27" />
  <line x1="15" y1="0" x2="15" y2="27" />
  <line x1="15" y1="0" x2="15" y2="27" />
  <line x1="15" y1="0" x2="15" y2="27" />
  <line x1="15" y1="0" x2="15" y2="27" />
  <line x1="16" y1="0" x2="16" y2="27" />
  <line x1="16" y1="0" x2="16" y2="27" />
  <line x1="16" y1="0" x2="16" y2="27" />
  <line x1="16" y1="0" x2="16" y2="27" />
  <line x1="17" y1="0" x2="17" y2="27" />
  <line x1="17" y1="0" x2="17" y2="27" />
  <line x1="17" y1="0" x2="17" y2="27" />
  <line x1="17" y1="0" x2="17" y2="27" />
  <line x1="18" y1="0" x2="18" y2="27" />
  <line x1="18" y1="0" x2="18" y2="27" />
  <line x1="18" y1="0" x2="18" y2="27" />
  <line x1="18" y1="0" x2="18" y2="27" />
  <line x1="18" y1="0" x2="18" y2="27" />
  <line x1="19" y1="0" x2="19" y2="27" />
  <line x1="19" y1="0" x2="19" y2="27" />
  <line x1="19" y1="0" x2="19" y2="27" />
  <line x1="19" y1="0" x2="19" y2="27" />
  <line x1="20" y1="0" x2="20" y2="27" />
  <line x1="20" y1="0" x2="20" y2="27" />
  <line x1="20" y1="0" x2="20" y2="27" />
  <line x1="20" y1="0" x2="20" y2="27" />
  <line x1="20" y1="0" x2="20" y2="27" />
  <line x1="21" y1="0" x2="21" y2="27" />
  <line x1="21" y1="0" x2="21" y2="27" />
  <line x1="21" y1="0" x2="21" y2="27" />
  <line x1="21" y1="0" x2="21" y2="27" />
  <line x1="22" y1="0" x2="22" y2="27" />
  <line x1="22" y1="0" x2="22" y2="27" />
  <line x1="22" y1="0" x2="22" y2="27" />
  <line x1="22" y1="0" x2="22" y2="27" />
  <line x1="23" y1="0" x2="23" y2="27" />
  <line x1="23" y1="0" x2="23" y2="27" />
  <line x1="23" y1="0" x2="23" y2="27" />
  <line x1="23" y1="0" x2="23" y2="27" />
  <line x1="23" y1="0" x2="23" y2="27" />
  <line x1="24" y1="0" x2="24" y2="27" />
  <line x1="24" y1="0" x2="24" y2="27" />
  <line x1="24" y1="0" x2="24" y2="27" />
  <line x1="24" y1="0" x2="24" y2="27" />
  <line x1="25" y1="0" x2="25" y2="27" />
  <line x1="25" y1="0" x2="25" y2="27" />
  <line x1="25" y1="0" x2="25" y2="27" />
  <line x1="25" y1="0" x2="25" y2="27" />
  <line x1="25" y1="0" x2="25" y2="27" />
  <line x1="26" y1="0" x2="26" y2="27" />
  <line x1="26" y1="0" x2="26" y2="27" />
  <line x1="26" y1="0" x2="26" y2="27" />
  <line x1="26" y1="0" x2="26" y2="27" />
  <line x1="27" y1="0" x2="27" y2="27" />
  <line x1="27" y1="0" x2="27" y2="27" />
  <line x1="27" y1="0" x2="27" y2="27" />
  <line x1="27" y1="0" x2="27" y2="27" />
  <line x1="28" y1="0" x2="28" y2="27" />
  <line x1="28" y1="0" x2="28" y2="27" />
  <line x1="28" y1="0" x2="28" y2="27" />
  <line x1="28" y1="0" x2="28" y2="27" />
  <line x1="28" y1="0" x2="28" y2="27" />
  <line x1="29" y1="0" x2="29" y2="27" />
  <line x1="29" y1="0" x2="29" y2="27" />
  <line x1="29" y1="0" x2="29" y2="27" />
  <line x1="29" y1="0" x2="29" y2="27" />
  <line x1="30" y1="0" x2="30" y2="27" />
  <line x1="30" y1="0" x2="30" y2="27" />
  <line x1="30" y1="0" x2="30" y2="27" />
  <line x1="30" y1="0" x2="30" y2="27" />
  <line x1="31" y1="0" x2="31" y2="27" />
  <line x1="31" y1="0" x2="31" y2="27" />
  <line x1="31" y1="0" x2="31" y2="27" />
  <line x1="31" y1="0" x2="31" y2="27" />
  <line x1="31" y1="0" x2="31" y2="27" />
  <line x1="32" y1="0" x2="32" y2="27" />
  <line x1="32" y1="0" x2="32" y2="27" />
  <line x1="32" y1="0" x2="32" y2="27" />
  <line x1="32" y1="0" x2="32" y2="27" />
  <line x1="33" y1="0" x2="33" y2="27" />
  <line x1="33" y1="0" x2="33" y2="27" />
  <line x1="33" y1="0" x2="33" y2="27" />
  <line x1="33" y1="0" x2="33" y2="27" />
  <line x1="33" y1="0" x2="33" y2="27" />
  <line x1="34" y1="0" x2="34" y2="27" />
  <line x1="34" y1="0" x2="34" y2="27" />
  <line x1="34" y1="0" x2="34" y2="27" />
  <line x1="34" y1="0" x2="34" y2="27" />
  <line x1="35" y1="0" x2="35" y2="27" />
  <line x1="35" y1="0" x2="35" y2="27" />
  <line x1="35" y1="0" x2="35" y2="27" />
  <line x1="35" y1="0" x2="35" y2="27" />
  <line x1="36" y1="0" x2="36" y2="27" />
  <line x1="36" y1="0" x2="36" y2="27" />
  <line x1="36" y1="0" x2="36" y2="27" />
  <line x1="36" y1="0" x2="36" y2="27" />
  <line x1="36" y1="0" x2="36" y2="27" />
  <line x1="37" y1="0" x2="37" y2="27" />
  <line x1="37" y1="0" x2="37" y2="27" />
  <line x1="37" y1="0" x2="37" y2="27" />
  <line x1="37" y1="0" x2="37" y2="27" />
  <line x1="38" y1="0" x2="38" y2="27" />
  <line x1="38" y1="0" x2="38" y2="27" />
  <line x1="38" y1="0" x2="38" y2="27" />
  <line x1="38" y1="0" x2="38" y2="27" />
  <line x1="38" y1="0" x2="38" y2="27" />
  <line x1="39" y1="0" x2="39" y2="27" />
  <line x1="39" y1="0" x2="39" y2="27" />
  <line x1="39" y1="0" x2="39" y2="27" />
  <line x1="39" y1="0" x2="39" y2="27" />
  <line x1="40" y1="0" x2="40" y2="27" />
  <line x1="40" y1="0" x2="40" y2="27" />
  <line x1="40" y1="0" x2="40" y2="27" />
  <line x1="40" y1="0" x2="40" y2="27" />
  <line x1="41" y1="0" x2="41" y2="27" />
  <line x1="41" y1="0" x2="41" y2="27" />
  <line x1="41" y1="0" x2="41" y2="27" />
  <line x1="41" y1="0" x2="41" y2="27" />
  <line x1="41" y1="0" x2="41" y2="27" />
  <line x1="42" y1="0" x2="42" y2="27" />
  <line x1="42" y1="0" x2="42" y2="27" />
  <line x1="42" y1="0" x2="42" y2="27" />
  <line x1="42" y1="0" x2="42" y2="27" />
  <line x1="43" y1="0" x2="43" y2="27" />
  <line x1="43" y1="0" x2="43" y2="27" />
  <line x1="43" y1="0" x2="43" y2="27" />
  <line x1="43" y1="0" x2="43" y2="27" />
  <line x1="44" y1="0" x2="44" y2="27" />
  <line x1="44" y1="0" x2="44" y2="27" />
  <line x1="44" y1="0" x2="44" y2="27" />
  <line x1="44" y1="0" x2="44" y2="27" />
  <line x1="44" y1="0" x2="44" y2="27" />
  <line x1="45" y1="0" x2="45" y2="27" />
  <line x1="45" y1="0" x2="45" y2="27" style="stroke-width:2" />

  <!-- Colored Rectangle -->
  <polygon points="0.000000,0.000000 45.378219,0.000000 45.378219,27.530335 0.000000,27.530335" style="fill:#ECB172A0;stroke-width:0"/>

  <!-- Text -->

<text x="22.689110" y="47.530335" font-size="1.0rem" font-weight="100" text-anchor="middle" >199</text>
<text x="65.378219" y="13.765167" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,65.378219,13.765167)">3</text>

  <!-- Horizontal lines -->
  <line x1="115" y1="0" x2="141" y2="26" style="stroke-width:2" />
  <line x1="115" y1="130" x2="141" y2="156" style="stroke-width:2" />

  <!-- Vertical lines -->
  <line x1="115" y1="0" x2="115" y2="130" style="stroke-width:2" />
  <line x1="141" y1="26" x2="141" y2="156" style="stroke-width:2" />

  <!-- Colored Rectangle -->
  <polygon points="115.000000,0.000000 141.720328,26.720328 141.720328,156.720328 115.000000,130.000000" style="fill:#ECB172A0;stroke-width:0"/>

  <!-- Horizontal lines -->
  <line x1="115" y1="0" x2="212" y2="0" style="stroke-width:2" />
  <line x1="141" y1="26" x2="239" y2="26" style="stroke-width:2" />

  <!-- Vertical lines -->
  <line x1="115" y1="0" x2="141" y2="26" style="stroke-width:2" />
  <line x1="212" y1="0" x2="239" y2="26" style="stroke-width:2" />

  <!-- Colored Rectangle -->
  <polygon points="115.000000,0.000000 212.500000,0.000000 239.220328,26.720328 141.720328,26.720328" style="fill:#ECB172A0;stroke-width:0"/>

  <!-- Horizontal lines -->
  <line x1="141" y1="26" x2="239" y2="26" style="stroke-width:2" />
  <line x1="141" y1="156" x2="239" y2="156" style="stroke-width:2" />

  <!-- Vertical lines -->
  <line x1="141" y1="26" x2="141" y2="156" style="stroke-width:2" />
  <line x1="239" y1="26" x2="239" y2="156" style="stroke-width:2" />

  <!-- Colored Rectangle -->
  <polygon points="141.720328,26.720328 239.220328,26.720328 239.220328,156.720328 141.720328,156.720328" style="fill:#ECB172A0;stroke-width:0"/>

  <!-- Text -->

<text x="190.470328" y="176.720328" font-size="1.0rem" font-weight="100" text-anchor="middle" >768</text>
<text x="259.220328" y="91.720328" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,259.220328,91.720328)">1024</text>
<text x="118.360164" y="163.360164" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,118.360164,163.360164)">201</text>
</svg>

</td>
</tr>
</table>

This dataset has shape `(3, 199, 201, 1024, 768)`:

- 3 fluorescence color channels,
- 199 time points,
- 201 z-slices,
- 1024 pixels in the y dimension, and
- 768 pixels in the x dimension.

```python
# Load our Point Spread Function (PSF)
import dask.array.image
psf = dask.array.image.imread("AOLLSMData/m4/psfs_z0p1/*.tif")[:, None, ...]
```

<table>
<tr>
<td>
<table>  <thead>    <tr><td> </td><th> Array </th><th> Chunk </th></tr>
  </thead>
  <tbody>
    <tr><th> Bytes </th><td> 2.48 MB </td> <td> 827.39 kB </td></tr>
    <tr><th> Shape </th><td> (3, 1, 101, 64, 64) </td> <td> (1, 1, 101, 64, 64) </td></tr>
    <tr><th> Count </th><td> 6 Tasks </td><td> 3 Chunks </td></tr>
    <tr><th> Type </th><td> uint16 </td><td> numpy.ndarray </td></tr>
  </tbody></table>
</td>
<td>
<svg width="402" height="208" style="stroke:rgb(0,0,0);stroke-width:1" >

  <!-- Horizontal lines -->
  <line x1="0" y1="0" x2="27" y2="0" style="stroke-width:2" />
  <line x1="0" y1="11" x2="27" y2="11" />
  <line x1="0" y1="22" x2="27" y2="22" />
  <line x1="0" y1="33" x2="27" y2="33" style="stroke-width:2" />

  <!-- Vertical lines -->
  <line x1="0" y1="0" x2="0" y2="33" style="stroke-width:2" />
  <line x1="27" y1="0" x2="27" y2="33" style="stroke-width:2" />

  <!-- Colored Rectangle -->
  <polygon points="0.000000,0.000000 27.530335,0.000000 27.530335,33.941765 0.000000,33.941765" style="fill:#ECB172A0;stroke-width:0"/>

  <!-- Text -->

<text x="13.765167" y="53.941765" font-size="1.0rem" font-weight="100" text-anchor="middle" >1</text>
<text x="47.530335" y="16.970882" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,47.530335,16.970882)">3</text>

  <!-- Horizontal lines -->
  <line x1="97" y1="0" x2="173" y2="76" style="stroke-width:2" />
  <line x1="97" y1="82" x2="173" y2="158" style="stroke-width:2" />

  <!-- Vertical lines -->
  <line x1="97" y1="0" x2="97" y2="82" style="stroke-width:2" />
  <line x1="173" y1="76" x2="173" y2="158" style="stroke-width:2" />

  <!-- Colored Rectangle -->
  <polygon points="97.000000,0.000000 173.470588,76.470588 173.470588,158.846826 97.000000,82.376238" style="fill:#ECB172A0;stroke-width:0"/>

  <!-- Horizontal lines -->
  <line x1="97" y1="0" x2="179" y2="0" style="stroke-width:2" />
  <line x1="173" y1="76" x2="255" y2="76" style="stroke-width:2" />

  <!-- Vertical lines -->
  <line x1="97" y1="0" x2="173" y2="76" style="stroke-width:2" />
  <line x1="179" y1="0" x2="255" y2="76" style="stroke-width:2" />

  <!-- Colored Rectangle -->
  <polygon points="97.000000,0.000000 179.376238,0.000000 255.846826,76.470588 173.470588,76.470588" style="fill:#ECB172A0;stroke-width:0"/>

  <!-- Horizontal lines -->
  <line x1="173" y1="76" x2="255" y2="76" style="stroke-width:2" />
  <line x1="173" y1="158" x2="255" y2="158" style="stroke-width:2" />

  <!-- Vertical lines -->
  <line x1="173" y1="76" x2="173" y2="158" style="stroke-width:2" />
  <line x1="255" y1="76" x2="255" y2="158" style="stroke-width:2" />

  <!-- Colored Rectangle -->
  <polygon points="173.470588,76.470588 255.846826,76.470588 255.846826,158.846826 173.470588,158.846826" style="fill:#ECB172A0;stroke-width:0"/>

  <!-- Text -->

<text x="214.658707" y="178.846826" font-size="1.0rem" font-weight="100" text-anchor="middle" >64</text>
<text x="275.846826" y="117.658707" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,275.846826,117.658707)">64</text>
<text x="125.235294" y="140.611532" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,125.235294,140.611532)">101</text>
</svg>

</td>
</tr>
</table>

```python
# Convert data to float32 for computation¶
import numpy as np
imgs = imgs.astype(np.float32)
# Note: the psf needs to be sampled with a voxel spacing
# consistent with the image's sampling
psf = psf.astype(np.float32)
```

```python
# Apply Richardson-Lucy Deconvolution¶
def richardson_lucy_deconvolution(img, psf, iterations=1):
    """ Apply deconvolution to a single chunk of data """
    import itk

    img = img[0, 0, ...]  # remove leading two length-one dimensions
    psf = psf[0, 0, ...]  # remove leading two length-one dimensions

    image = itk.image_view_from_array(img)   # Convert to ITK object
    kernel = itk.image_view_from_array(psf)  # Convert to ITK object

    deconvolved = itk.richardson_lucy_deconvolution_image_filter(
        image,
        kernel_image=kernel,
        number_of_iterations=iterations
    )

    result = itk.array_from_image(deconvolved)  # Convert back to Numpy array
    result = result[None, None, ...]  # Add back the leading length-one dimensions

    return result

out = da.map_blocks(richardson_lucy_deconvolution, imgs, psf, dtype=np.float32)
```

```python
# Create a local cluster of dask worker processes
# (this could also point to a distributed cluster if you have it)
from dask.distributed import LocalCluster, Client
cluster = LocalCluster(n_workers=20, threads_per_process=1)
client = Client(cluster)  # now dask operations use this cluster by default

# Trigger computation and store
out.to_zarr("AOLLSMData_m4_raw.zarr", "deconvolved", overwrite=True)
```

So in the example above we ...

1.  Load data both from Zarr and TIFF files into multi-chunked Dask arrays
2.  Construct a function to apply an ITK routine onto each chunk
3.  Apply that function across the dask array with the [dask.array.map_blocks](https://docs.dask.org/en/latest/array-api.html#dask.array.core.map_blocks) function.
4.  Store the result back into Zarr format

From the perspective of an imaging scientist,
the new piece of technology here is the
[dask.array.map_blocks](https://docs.dask.org/en/latest/array-api.html#dask.array.core.map_blocks) function.
Given a Dask array composed of many NumPy arrays and a function, `map_blocks` applies that function across each block in parallel, returning a Dask array as a result.
It's a great tool whenever you want to apply an operation across many blocks in a simple fashion.
Because Dask arrays are just made out of Numpy arrays it's an easy way to
compose Dask with the rest of the Scientific Python ecosystem.

## Building the right function

However in this case there are a few challenges to constructing the right Numpy
-> Numpy function, due to both idiosyncrasies in ITK and Dask Array. Let's
look at our function again:

```python
def richardson_lucy_deconvolution(img, psf, iterations=1):
    """ Apply deconvolution to a single chunk of data """
    import itk

    img = img[0, 0, ...]  # remove leading two length-one dimensions
    psf = psf[0, 0, ...]  # remove leading two length-one dimensions

    image = itk.image_view_from_array(img)   # Convert to ITK object
    kernel = itk.image_view_from_array(psf)  # Convert to ITK object

    deconvolved = itk.richardson_lucy_deconvolution_image_filter(
        image,
        kernel_image=kernel,
        number_of_iterations=iterations
    )

    result = itk.array_from_image(deconvolved)  # Convert back to Numpy array
    result = result[None, None, ...]  # Add back the leading length-one dimensions

    return result

out = da.map_blocks(richardson_lucy_deconvolution, imgs, psf, dtype=np.float32)
```

This is longer than we would like.
Instead, we would have preferred to just use the `itk` function directly,
without all of the steps before and after.

```python
deconvolved = da.map_blocks(itk.richardson_lucy_deconvolution_image_filter, imgs, psf)
```

What were the extra steps in our function and why were they necessary?

1.  **Convert to and from ITK Image objects**: ITK functions don't consume and
    produce Numpy arrays, they consume and produce their own `Image` data
    structure. There are convenient functions to convert back and forth,
    so handling this is straightforward, but it does need to be handled each
    time. See [ITK #1136](https://github.com/InsightSoftwareConsortium/ITK/issues/1136) for a
    feature request that would remove the need for this step.

2.  **Unpack and pack singleton dimensions**: Our Dask arrays have shapes like
    the following:

    ```
    Array Shape: (3, 199, 201, 1024, 768)
    Chunk Shape: (1,   1, 201, 1024, 768)
    ```

    So our `map_blocks` function gets NumPy arrays of the chunk size,
    `(1, 1, 201, 1024, 768)`.
    However, our ITK functions are meant to work on 3d arrays, not 5d arrays,
    so we need to remove those first two dimensions.

    ```python
    img = img[0, 0, ...]  # remove leading two length-one dimensions
    psf = psf[0, 0, ...]  # remove leading two length-one dimensions
    ```

    And then when we're done, Dask expects to get back 5d arrays like what it
    provided, so we add these singleton dimensions back in

    ```python
    result = result[None, None, ...]  # Add back the leading length-one dimensions
    ```

    Again, this is straightforward for users who are accustomed to NumPy
    slicing syntax, but does need to be done each time.
    This adds some friction to our development process,
    and is another step that can confuse users.

But if you're comfortable working around things like this,
then ITK and `map_blocks` can be a powerful combination
if you want to parallelize out ITK operations across a cluster.

## Defining a Dask Cluster

Above we used `dask.distributed.LocalCluster` to set up 20 single-threaded
workers on our local workstation:

```python
from dask.distributed import LocalCluster, Client
cluster = LocalCluster(n_workers=20, threads_per_process=1)
client = Client(cluster)  # now dask operations use this cluster by default
```

If you had a distributed resource, this is where you would connect it.
You would swap out `LocalCluster` with one of
[Dask's other deployment options](https://docs.dask.org/en/latest/setup.html).

Also, we found that we needed to use many single-threaded processes rather than
one multi-threaded process because ITK functions seem to still hold onto the
GIL. This is fine, we just need to be aware of it so that we set up our Dask
workers appropriately with one thread per process for maximum efficiency.
See [ITK #1134](https://github.com/InsightSoftwareConsortium/ITK/issues/1134)
for an active Github issue on this topic.

## Serialization

We had some difficulty when using the ITK library across multiple processes,
because the library itself didn't serialize well. (If you don't understand
what that means, don't worry). We solved a bit of this in
[ITK #1090](https://github.com/InsightSoftwareConsortium/ITK/pull/1090),
but some issues still remain.

We got around this by including the import in the function rather than outside
of it.

```python
def richardson_lucy_deconvolution(img, psf, iterations=1):
    import itk   # <--- we work around serialization issues by importing within the function
```

That way each task imports itk individually, and we sidestep this issue.

## Trying Scikit-Image

We also tried out the Richardson Lucy deconvolution operation in
[Scikit-Image](https://scikit-image.org/). Scikit-Image is known for being
more Scipy/Numpy native, but not always as fast as ITK. Our experience
confirmed this perception.

First, we were glad to see that the scikit-image function worked with
`map_blocks` immediately without any packing/unpacking, dimensionality, or
serialization issues:

```python
import skimage.restoration

out = da.map_blocks(skimage.restoration.richardson_lucy, imgs, psf)  # just works
```

So all of that converting to and from image objects or removing and adding
singleton dimensions isn't necessary here.

In terms of performance we were also happy to see that Scikit-Image released
the GIL, so we were able to get very high reported CPU utilization when using a
small number of multi-threaded processes. However, even though CPU utilization
was high, our parallel performance was poor enough that we stuck with the ITK
solution, warts and all. More information about this is available in
Github issue [scikit-image #4083](https://github.com/scikit-image/scikit-image/issues/4083).

_Note: sequentially on a single chunk, ITK ran in around 2 minutes while
scikit-image ran in 3 minutes. It was only once we started parallelizing that
things became slow._

Regardless, our goal in this experiment was to see how well ITK and Dask
array played together. It was nice to see what smooth integration looks like,
if only to motivate future development in ITK+Dask relations.

## Numba GUFuncs

An alternative to `da.map_blocks` are Generalized Universal Functions (gufuncs)
These are functions that have many magical properties, one of which is that
they operate equally well on both NumPy and Dask arrays. If libraries like
ITK or Scikit-Image make their functions into gufuncs then they work without
users having to do anything special.

The easiest way to implement gufuncs today is with Numba. I did this on our
wrapped `richardson_lucy` function, just to show how it could work, in case
other libraries want to take this on in the future.

```python
import numba

@numba.guvectorize(
    ["float32[:,:,:], float32[:,:,:], float32[:,:,:]"],  # we have to specify types
    "(i,j,k),(a,b,c)->(i,j,k)",                          # and dimensionality explicitly
    forceobj=True,
)
def richardson_lucy_deconvolution(img, psf, out):
    # <---- no dimension unpacking!
    iterations = 1
    image = itk.image_view_from_array(np.ascontiguousarray(img))
    kernel = itk.image_view_from_array(np.ascontiguousarray(psf))

    deconvolved = itk.richardson_lucy_deconvolution_image_filter(
        image, kernel_image=kernel, number_of_iterations=iterations
    )
    out[:] = itk.array_from_image(deconvolved)

# Now this function works natively on either NumPy or Dask arrays
out = richardson_lucy_deconvolution(imgs, psf)  # <-- no map_blocks call!
```

Note that we've both lost the dimension unpacking and the `map_blocks` call.
Our function now knows enough information about how it can broadcast that Dask
can do the parallelization without being told what to do explicitly.

This adds some burden onto library maintainers,
but makes the user experience much more smooth.

## GPU Acceleration

When doing some user research on image processing and Dask, almost everyone we
interviewed said that they wanted faster deconvolution. This seemed to be a
major pain point. Now we know why. It's both very common, and _very slow_.

Running deconvolution on a single chunk of this size takes around 2-4 minutes,
and we have hundreds of chunks in a single dataset. Multi-core parallelism can
help a bit here, but this problem may also be ripe for GPU acceleration.
Similar operations typically have 100x speedups on GPUs. This might be a more
pragmatic solution than scaling out to large distributed clusters.

## What's next?

This experiment both ...

- **Gives us an example** that other imaging scientists
  can copy and modify to be effective with Dask and ITK together.
- **Highlights areas of improvement** where developers from the different
  libraries can work to remove some of these rough interactions spots in the
  future.

  It's worth noting that Dask has done this with lots of libraries within the
  Scipy ecosystem, including Pandas, Scikit-Image, Scikit-Learn, and others.

We're also going to continue with our imaging experiment, while these technical
issues get worked out in the background. Next up, segmentation!
