<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://blog.dask.org</id>
  <title>Dask Working Notes - Posts by John Kirkham (NVIDIA) and Ben Zaitlen (NVIDIA)</title>
  <updated>2026-03-05T15:05:19.886830+00:00</updated>
  <link href="https://blog.dask.org"/>
  <link href="https://blog.dask.org/blog/author/john-kirkham-nvidia-and-ben-zaitlen-nvidia/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.io/" version="0.11.12">ABlog</generator>
  <entry>
    <id>https://blog.dask.org/2020/11/12/deconvolution/</id>
    <title>Image Analysis Redux</title>
    <updated>2020-11-12T00:00:00+00:00</updated>
    <author>
      <name>John Kirkham (NVIDIA) and Ben Zaitlen (NVIDIA)</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2020/11/12/deconvolution.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="summary"&gt;

&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2019/08/09/image-itk"&gt;Last year&lt;/a&gt; we experimented with
Dask/ITK/Scikit-Image to perform large-scale image analysis on a stack of 3D
images. Specifically, we looked at deconvolution, a common method to &lt;em&gt;deblur&lt;/em&gt;
images. Now, a year later, we return to these experiments with a better
understanding of how Dask and CuPy can interact, enhanced serialization
methods, and support from the open-source community. This post looks at the
following:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Implementing a common deconvolution method for CPU + GPU&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leveraging Dask to perform deconvolution on a larger dataset&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Exploring the results with the Napari image viewer&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="image-analysis-redux"&gt;
&lt;h1&gt;Image Analysis Redux&lt;/h1&gt;
&lt;p&gt;Previously we used the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Richardson%E2%80%93Lucy_deconvolution"&gt;Richardson-Lucy
(RL)&lt;/a&gt;
deconvolution algorithm from ITK and
&lt;a class="reference external" href="https://github.com/scikit-image/scikit-image/blob/master/skimage/restoration/deconvolution.py#L329"&gt;Scikit-Image&lt;/a&gt;.
We left off theorizing about how GPUs could help accelerate these
workflows. Starting with Scikit-Image’s implementation, we naively tried
replacing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scipy.signal.convolve&lt;/span&gt;&lt;/code&gt; calls with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cupyx.scipy.ndimage.convolve&lt;/span&gt;&lt;/code&gt;,
and while performance improved, it did not improve &lt;em&gt;significantly&lt;/em&gt;: we
did not get the 100x speedup we were looking for.&lt;/p&gt;
&lt;p&gt;As often happens in mathematics, a problem that is inefficient to solve
in one representation can be made more efficient by transforming the data
beforehand. In the new representation we can solve the same problem
(convolution in this case) more easily before transforming the result back into
a more familiar representation. For convolution, the transformation we apply is
the Fourier transform, computed with the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Fast_Fourier_transform"&gt;Fast Fourier Transform
(FFT)&lt;/a&gt;. Once it is
applied, convolving two arrays reduces to a simple element-wise multiplication.&lt;/p&gt;
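&lt;p&gt;To make this concrete, here is a minimal NumPy sketch (not code from the
original experiments) that computes a circular convolution directly from its
definition and then again by multiplying FFTs. The function names and the
64-sample random inputs are illustrative assumptions:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;import numpy as np


def circular_convolve_direct(signal, kernel):
    """Circular convolution straight from the definition (quadratic cost)."""
    n = signal.size
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            out[i] += signal[j] * kernel[(i - j) % n]
    return out


def circular_convolve_fft(signal, kernel):
    """Same result via the convolution theorem: multiply in Fourier space."""
    product = np.fft.rfft(signal) * np.fft.rfft(kernel)
    return np.fft.irfft(product, n=signal.size)


rng = np.random.default_rng(0)
signal = rng.random(64)
kernel = rng.random(64)

direct = circular_convolve_direct(signal, kernel)
via_fft = circular_convolve_fft(signal, kernel)
print(np.allclose(direct, via_fft))
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The direct version needs a multiplication for every pair of samples, while
the FFT version needs two forward transforms, one element-wise product, and one
inverse transform.&lt;/p&gt;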
&lt;p&gt;The FFT is extremely fast on both CPUs and GPUs, so an algorithm written in
terms of FFTs is accelerated as well. This is a commonly used technique in the
image processing field to speed up convolutions. Despite the added step of
doing FFTs, the cost of the transforms plus the cost of the algorithm in
Fourier space is still lower than performing the original algorithm in real
space. We (and others before us) found this was the case for Richardson-Lucy
(on both CPUs and GPUs), and performance continued to improve when we
parallelized with Dask over multiple GPUs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="help-from-open-source"&gt;
&lt;h1&gt;Help from Open-Source&lt;/h1&gt;
&lt;p&gt;An FFT-based RL equivalent has been around for some time, and the good folks at the
&lt;a class="reference external" href="https://sdo.gsfc.nasa.gov/mission/instruments.php"&gt;Solar Dynamics Observatory&lt;/a&gt;
built and shared a NumPy/CuPy implementation as part of the &lt;a class="reference external" href="https://aiapy.readthedocs.io/en/v0.2.0/_modules/aiapy/psf/deconvolve.html"&gt;Atmospheric Imaging
Assembly&lt;/a&gt;
Python package (aiapy). We slightly modified their implementation to handle 3D
as well as 2D &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Point_spread_function"&gt;Point Spread
Functions&lt;/a&gt; and to take
advantage of
&lt;a class="reference external" href="https://numpy.org/neps/nep-0018-array-function-protocol.html"&gt;NEP-18&lt;/a&gt; for
convenient dispatching of NumPy and CuPy arrays to the matching NumPy and CuPy functions:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;deconvolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Pad PSF with zeros to match image shape&lt;/span&gt;
    &lt;span class="n"&gt;pad_l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pad_r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;divmod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pad_r&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;pad_l&lt;/span&gt;
    &lt;span class="n"&gt;psf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pad_l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pad_r&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;constant&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;constant_values&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Recenter PSF at the origin&lt;/span&gt;
    &lt;span class="c1"&gt;# Needed to ensure PSF doesn&amp;#39;t introduce an offset when&lt;/span&gt;
    &lt;span class="c1"&gt;# convolving with image&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;psf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;roll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Convolution requires FFT of the PSF&lt;/span&gt;
    &lt;span class="n"&gt;psf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rfftn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Perform deconvolution in-place on a copy of the image&lt;/span&gt;
    &lt;span class="c1"&gt;# (avoids changing the original)&lt;/span&gt;
    &lt;span class="n"&gt;img_decon&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;irfftn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rfftn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_decon&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;img_decon&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;irfftn&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rfftn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conj&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conj&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;img_decon&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
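&lt;p&gt;The padding and recentering steps at the top of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;deconvolve&lt;/span&gt;&lt;/code&gt; are easy to
check in isolation. The sketch below repeats just those two steps on a tiny
array. The 8x8 image shape and the 4x4 single-peak PSF are hypothetical sizes
chosen for illustration (both even, like the shapes used in this post), and the
peak is assumed to sit at &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;psf.shape&lt;/span&gt; &lt;span class="pre"&gt;//&lt;/span&gt; &lt;span class="pre"&gt;2&lt;/span&gt;&lt;/code&gt; along each axis:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;import numpy as np

# Hypothetical sizes for illustration: an 8x8 "image" and a 4x4 PSF
# whose single peak sits at index (2, 2).
img_shape = (8, 8)
psf = np.zeros((4, 4))
psf[2, 2] = 1.0

# Same padding arithmetic as in deconvolve: split the size difference
# between the left and right side of every axis.
pad_l, pad_r = np.divmod(np.array(img_shape) - np.array(psf.shape), 2)
pad_r += pad_l
padded = np.pad(psf, tuple(zip(pad_l, pad_r)), "constant", constant_values=0)

# Same recentering loop as in deconvolve: roll every axis by half its length.
for i in range(padded.ndim):
    padded = np.roll(padded, padded.shape[i] // 2, axis=i)

peak = tuple(int(k) for k in np.unravel_index(np.argmax(padded), padded.shape))
print(padded.shape, peak)  # prints (8, 8) (0, 0)
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;With the peak at the origin, multiplying by the FFT of the PSF blurs the
image without shifting it. Other size parities are not checked in this sketch.&lt;/p&gt;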
&lt;p&gt;For a 1.3 GB image we measured the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CuPy ~3 seconds for 20 iterations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NumPy ~36 seconds for 2 iterations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
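&lt;p&gt;CuPy finishes in roughly one twelfth of the time while running 10 times as
many iterations. That works out to about 0.15 seconds per iteration versus 18
seconds, or roughly a 120x per-iteration speedup, right around our desired
100x! Let’s explore how this implementation performs with real biological data
and Dask…&lt;/p&gt;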
&lt;/section&gt;
&lt;section id="define-a-dask-cluster-and-load-the-data"&gt;
&lt;h1&gt;Define a Dask Cluster and Load the Data&lt;/h1&gt;
&lt;p&gt;We were provided sample data from &lt;a class="reference external" href="https://www.nibib.nih.gov/about-nibib/staff/hari-shroff"&gt;Prof.
Shroff’s&lt;/a&gt; lab at the
NIH. The data was originally provided as a 3D TIFF file, which we subsequently
converted to Zarr with a shape of (950, 2048, 2048).&lt;/p&gt;
&lt;p&gt;We start by creating a Dask cluster on a DGX2 (16 GPUs in a single node) and
loading the image stored in Zarr:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://gist.github.com/quasiben/3a638bb9a4f075ac9041bf66974ebb45"&gt;Example Notebook&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cuda&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LocalCUDACluster&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;rmm&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cupy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cp&lt;/span&gt;

&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LocalCUDACluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;local_directory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/tmp/bzaitlen&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;enable_nvlink&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rmm_pool_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;26GB&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_allocator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rmm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rmm_cupy_allocator&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;imgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_zarr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/public/NVMICROSCOPY/y1z1_C1_A.zarr/&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 7.97 GB &lt;/td&gt; &lt;td&gt; 8.39 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (950, 2048, 2048) &lt;/td&gt; &lt;td&gt; (1, 2048, 2048) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 951 Tasks &lt;/td&gt;&lt;td&gt; 950 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="212" height="202" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="42" y2="32" style="stroke-width:2" /&gt;
  &lt;line x1="10" y1="120" x2="42" y2="152" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="10" y2="120" style="stroke-width:2" /&gt;
  &lt;line x1="12" y1="2" x2="12" y2="122" /&gt;
  &lt;line x1="14" y1="4" x2="14" y2="124" /&gt;
  &lt;line x1="16" y1="6" x2="16" y2="126" /&gt;
  &lt;line x1="18" y1="8" x2="18" y2="128" /&gt;
  &lt;line x1="20" y1="10" x2="20" y2="130" /&gt;
  &lt;line x1="22" y1="12" x2="22" y2="132" /&gt;
  &lt;line x1="24" y1="14" x2="24" y2="134" /&gt;
  &lt;line x1="26" y1="16" x2="26" y2="136" /&gt;
  &lt;line x1="28" y1="18" x2="28" y2="138" /&gt;
  &lt;line x1="30" y1="20" x2="30" y2="140" /&gt;
  &lt;line x1="32" y1="22" x2="32" y2="142" /&gt;
  &lt;line x1="34" y1="24" x2="34" y2="144" /&gt;
  &lt;line x1="36" y1="26" x2="36" y2="146" /&gt;
  &lt;line x1="38" y1="28" x2="38" y2="148" /&gt;
  &lt;line x1="41" y1="31" x2="41" y2="151" /&gt;
  &lt;line x1="42" y1="32" x2="42" y2="152" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 42.743566,32.743566 42.743566,152.743566 10.000000,120.000000" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="130" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="12" y1="2" x2="132" y2="2" /&gt;
  &lt;line x1="14" y1="4" x2="134" y2="4" /&gt;
  &lt;line x1="16" y1="6" x2="136" y2="6" /&gt;
  &lt;line x1="18" y1="8" x2="138" y2="8" /&gt;
  &lt;line x1="20" y1="10" x2="140" y2="10" /&gt;
  &lt;line x1="22" y1="12" x2="142" y2="12" /&gt;
  &lt;line x1="24" y1="14" x2="144" y2="14" /&gt;
  &lt;line x1="26" y1="16" x2="146" y2="16" /&gt;
  &lt;line x1="28" y1="18" x2="148" y2="18" /&gt;
  &lt;line x1="30" y1="20" x2="150" y2="20" /&gt;
  &lt;line x1="32" y1="22" x2="152" y2="22" /&gt;
  &lt;line x1="34" y1="24" x2="154" y2="24" /&gt;
  &lt;line x1="36" y1="26" x2="156" y2="26" /&gt;
  &lt;line x1="38" y1="28" x2="158" y2="28" /&gt;
  &lt;line x1="41" y1="31" x2="161" y2="31" /&gt;
  &lt;line x1="42" y1="32" x2="162" y2="32" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="42" y2="32" style="stroke-width:2" /&gt;
  &lt;line x1="130" y1="0" x2="162" y2="32" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 130.000000,0.000000 162.743566,32.743566 42.743566,32.743566" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="42" y1="32" x2="162" y2="32" style="stroke-width:2" /&gt;
  &lt;line x1="42" y1="152" x2="162" y2="152" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="42" y1="32" x2="42" y2="152" style="stroke-width:2" /&gt;
  &lt;line x1="162" y1="32" x2="162" y2="152" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="42.743566,32.743566 162.743566,32.743566 162.743566,152.743566 42.743566,152.743566" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="102.743566" y="172.743566" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;2048&lt;/text&gt;
&lt;text x="182.743566" y="92.743566" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,182.743566,92.743566)"&gt;2048&lt;/text&gt;
&lt;text x="16.371783" y="156.371783" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,16.371783,156.371783)"&gt;950&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;From the Dask output above you can see the data is a z-stack of 950 images
where each slice is 2048x2048. For this data set, we can improve GPU
performance if we operate on larger chunks. Additionally, we need to ensure
the chunks are at least as big as the PSF, which in this case is (128, 128,
128). As we did our work on a DGX2, which has 16 GPUs, we can comfortably fit
the data and perform deconvolution on each GPU if we &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;rechunk&lt;/span&gt;&lt;/code&gt; the data
accordingly:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# chunk with respect to PSF shape (128, 128, 128)&lt;/span&gt;
&lt;span class="n"&gt;imgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rechunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;190&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;imgs&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 7.97 GB &lt;/td&gt; &lt;td&gt; 99.61 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (950, 2048, 2048) &lt;/td&gt; &lt;td&gt; (190, 512, 512) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 967 Tasks &lt;/td&gt;&lt;td&gt; 80 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
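&lt;p&gt;The chunk counts and sizes reported above follow from simple arithmetic.
The sketch below reproduces them with the standard library only. It assumes
decimal megabytes (1e6 bytes), which appears to match the summary, and it also
computes the chunk size for 4-byte &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;float32&lt;/span&gt;&lt;/code&gt; values for comparison. The
variable names are illustrative:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;import math

shape = (950, 2048, 2048)    # full z-stack, from the array summary
chunks = (190, 512, 512)     # chunk shape passed to rechunk
psf_shape = (128, 128, 128)  # PSF shape quoted in the text

# Number of blocks along each axis and in total
blocks_per_axis = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_blocks = math.prod(blocks_per_axis)

# Size of one chunk in decimal megabytes for 2-byte and 4-byte elements
chunk_mb_uint16 = math.prod(chunks) * 2 / 1e6
chunk_mb_float32 = math.prod(chunks) * 4 / 1e6

# Every chunk dimension should be at least as large as the PSF dimension
chunk_covers_psf = all(max(c, p) == c for c, p in zip(chunks, psf_shape))

print(blocks_per_axis, n_blocks)  # prints (5, 4, 4) 80
print(round(chunk_mb_uint16, 2), round(chunk_mb_float32, 2), chunk_covers_psf)
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;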
&lt;p&gt;Next, we convert to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;float32&lt;/span&gt;&lt;/code&gt;, as the data may not already be of a floating-point
type (here it is stored as uint16). Also, 32-bit arithmetic is a bit faster than
64-bit and uses half the memory. Below we map &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cupy.asarray&lt;/span&gt;&lt;/code&gt; onto each block of data. &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cupy.asarray&lt;/span&gt;&lt;/code&gt;
moves the data from host memory (NumPy) to the device/GPU (CuPy).&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;imgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;c_imgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 15.94 GB &lt;/td&gt; &lt;td&gt; 199.23 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (950, 2048, 2048) &lt;/td&gt; &lt;td&gt; (190, 512, 512) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 80 Tasks &lt;/td&gt;&lt;td&gt; 80 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; float32 &lt;/td&gt;&lt;td&gt; cupy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="212" height="202" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="42" y2="32" style="stroke-width:2" /&gt;
  &lt;line x1="10" y1="30" x2="42" y2="62" /&gt;
  &lt;line x1="10" y1="60" x2="42" y2="92" /&gt;
  &lt;line x1="10" y1="90" x2="42" y2="122" /&gt;
  &lt;line x1="10" y1="120" x2="42" y2="152" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="10" y2="120" style="stroke-width:2" /&gt;
  &lt;line x1="16" y1="6" x2="16" y2="126" /&gt;
  &lt;line x1="23" y1="13" x2="23" y2="133" /&gt;
  &lt;line x1="29" y1="19" x2="29" y2="139" /&gt;
  &lt;line x1="36" y1="26" x2="36" y2="146" /&gt;
  &lt;line x1="42" y1="32" x2="42" y2="152" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.0,0.0 42.74356617647059,32.74356617647059 42.74356617647059,152.74356617647058 10.0,120.0" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="130" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="16" y1="6" x2="136" y2="6" /&gt;
  &lt;line x1="23" y1="13" x2="143" y2="13" /&gt;
  &lt;line x1="29" y1="19" x2="149" y2="19" /&gt;
  &lt;line x1="36" y1="26" x2="156" y2="26" /&gt;
  &lt;line x1="42" y1="32" x2="162" y2="32" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="42" y2="32" style="stroke-width:2" /&gt;
  &lt;line x1="40" y1="0" x2="72" y2="32" /&gt;
  &lt;line x1="70" y1="0" x2="102" y2="32" /&gt;
  &lt;line x1="100" y1="0" x2="132" y2="32" /&gt;
  &lt;line x1="130" y1="0" x2="162" y2="32" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.0,0.0 130.0,0.0 162.74356617647058,32.74356617647059 42.74356617647059,32.74356617647059" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="42" y1="32" x2="162" y2="32" style="stroke-width:2" /&gt;
  &lt;line x1="42" y1="62" x2="162" y2="62" /&gt;
  &lt;line x1="42" y1="92" x2="162" y2="92" /&gt;
  &lt;line x1="42" y1="122" x2="162" y2="122" /&gt;
  &lt;line x1="42" y1="152" x2="162" y2="152" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="42" y1="32" x2="42" y2="152" style="stroke-width:2" /&gt;
  &lt;line x1="72" y1="32" x2="72" y2="152" /&gt;
  &lt;line x1="102" y1="32" x2="102" y2="152" /&gt;
  &lt;line x1="132" y1="32" x2="132" y2="152" /&gt;
  &lt;line x1="162" y1="32" x2="162" y2="152" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="42.74356617647059,32.74356617647059 162.74356617647058,32.74356617647059 162.74356617647058,152.74356617647058 42.74356617647059,152.74356617647058" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="102.743566" y="172.743566" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;2048&lt;/text&gt;
&lt;text x="182.743566" y="92.743566" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,182.743566,92.743566)"&gt;2048&lt;/text&gt;
&lt;text x="16.371783" y="156.371783" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,16.371783,156.371783)"&gt;950&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;What we now have is a Dask array composed of 80 CuPy blocks of data (5 x 4 x 4
along the three axes). Notice how Dask provides nice typing information in its
HTML output. Now that we have moved from NumPy to CuPy, the summary above
displays &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Type:&lt;/span&gt; &lt;span class="pre"&gt;cupy.ndarray&lt;/span&gt;&lt;/code&gt;, which is a nice sanity check.&lt;/p&gt;
&lt;p&gt;The last piece we need before running the deconvolution is the PSF, which
should also be loaded onto the GPU:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;skimage.io&lt;/span&gt;

&lt;span class="n"&gt;psf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skimage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/public/NVMICROSCOPY/PSF.tif&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;c_psf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Lastly, we use &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt; to apply the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;deconvolve&lt;/span&gt;&lt;/code&gt; function across the Dask
array, overlapping neighboring blocks by half the PSF size along each axis:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_overlap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;deconvolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;c_imgs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;c_psf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;c_imgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c_psf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;boundary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;periodic&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;a href="/images/deconvolve.png"&gt;
    &lt;img src="/images/deconvolve.png" width="100%"&gt;&lt;/a&gt;
&lt;p&gt;The image above is taken from a mouse intestine.&lt;/p&gt;
&lt;p&gt;With Dask and multiple GPUs, we measured deconvolution of a 16 GB image in ~30
seconds! But this is just the first step towards accelerated image science.&lt;/p&gt;
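&lt;p&gt;To make the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;depth&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;boundary&lt;/span&gt;&lt;/code&gt; arguments used above more concrete, here is a minimal NumPy-only sketch of a periodic halo on a single array. It is an illustration under stated assumptions, not Dask's implementation: the PSF shape, the toy array, and the helper names &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pad_periodic&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;trim&lt;/span&gt;&lt;/code&gt; are all hypothetical.&lt;/p&gt;

```python
import numpy as np

# Hypothetical PSF shape for illustration; the real one comes from PSF.tif.
psf_shape = (5, 7, 7)

# Same rule as in the map_overlap call: half the PSF extent along each axis.
depth = tuple(int(d) for d in np.array(psf_shape) // 2)


def pad_periodic(block, depth):
    """Add a wrap-around halo of `depth` elements on both sides of each axis."""
    pad_width = [(d, d) for d in depth]
    return np.pad(block, pad_width, mode="wrap")


def trim(block, depth):
    """Remove the halo again, keeping only the interior region."""
    slices = tuple(slice(d, block.shape[i] - d) for i, d in enumerate(depth))
    return block[slices]


# Toy stand-in for an image volume.
block = np.arange(4 * 8 * 8, dtype=np.float32).reshape(4, 8, 8)

padded = pad_periodic(block, depth)   # a function would run on this padded array
restored = trim(padded, depth)        # the halo is discarded afterwards
```

&lt;p&gt;In this sketch the halo is filled from the opposite edge of the same array. With a chunked Dask array, interior halos would come from neighboring blocks instead, and only the outer edges of the full array wrap around.&lt;/p&gt;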
&lt;/section&gt;
&lt;section id="napari"&gt;
&lt;h1&gt;Napari&lt;/h1&gt;
&lt;p&gt;Deconvolution is just one of the operations an image scientist or
microscopist will need. They will need other tools as they study the
underlying biology. Before getting to those next steps, they will need tools
to visualize the data. &lt;a class="reference external" href="https://napari.org/"&gt;Napari&lt;/a&gt;, a multi-dimensional image
viewer used in the PyData Bio ecosystem, is a good tool for visualizing this
data. As an experiment, we ran the same workflow on a local workstation with 2
Quadro RTX 8000 GPUs connected with NVLink (&lt;a class="reference external" href="https://gist.github.com/quasiben/02b3dabba8fb3415e40e685b3cb2ca4a"&gt;example
notebook&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;By adding a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; call to our array, we can move our data &lt;em&gt;back&lt;/em&gt; from
GPU to CPU (device to host).&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;cupy_to_numpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cupy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cp&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asnumpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

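&lt;span class="c1"&gt;# The blocks become NumPy arrays, so describe the output with NumPy metadata.&lt;/span&gt;
&lt;span class="n"&gt;np_out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cupy_to_numpy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;((),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;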
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;a href="/images/napari-deconv.png"&gt;
    &lt;img src="/images/napari-deconv.png" width="100%"&gt;&lt;/a&gt;
&lt;p&gt;When the user moves the slider in the Napari UI, we are instructing Dask to do the
following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Load the data from disk onto the GPU (CuPy)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute the deconvolution&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move back to the host (NumPy)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Render with Napari&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This has about a second of latency, which is great for a naive implementation! We
can improve this by adding caching, improving communications with
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt;, and optimizing the deconvolution kernel.&lt;/p&gt;
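&lt;p&gt;As one possible reading of the caching idea, the sketch below memoizes per-slice results so that revisiting a slider position does not recompute it. This is only an illustration using the standard library: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;expensive_slice&lt;/span&gt;&lt;/code&gt; is a hypothetical stand-in for the load, deconvolve, and copy-to-host steps, and the cache size is an arbitrary assumption.&lt;/p&gt;

```python
from functools import lru_cache

import numpy as np

# Counts how often the expensive path actually runs.
calls = {"n": 0}


def expensive_slice(index):
    """Hypothetical stand-in for load, deconvolve, and copy-to-host of one slice."""
    calls["n"] += 1
    rng = np.random.default_rng(index)
    return rng.random((4, 4))


@lru_cache(maxsize=8)  # arbitrary size: keep the 8 most recently used slices
def cached_slice(index):
    return expensive_slice(index)


first = cached_slice(3)   # computed
again = cached_slice(3)   # served from the cache, no recomputation
other = cached_slice(4)   # a new slider position, computed once
```

&lt;p&gt;The trade-off is memory: each cached slice stays on the host until it is evicted, so the cache size should be chosen with the slice size in mind.&lt;/p&gt;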
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;We have now shown how one can perform Richardson-Lucy deconvolution with
Dask + CuPy. This required a minimal amount of code. Combining this with an
image viewer (Napari), we were able to inspect the data and our result. All of
this performed reasonably well by assembling PyData libraries: Dask, CuPy,
Zarr, and Napari with a new deconvolution kernel. Hopefully this provides you
with a good template to get started analyzing your own data and demonstrates the
richness and easy expression of custom workflows. If you run into any
challenges, please reach out on &lt;a class="reference external" href="https://github.com/dask/dask/issues"&gt;the Dask issue
tracker&lt;/a&gt; and we would be happy to engage
with you :)&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2020/11/12/deconvolution/"/>
    <summary>Richardson-Lucy deconvolution of a large image stack with Dask and CuPy, explored interactively in Napari.</summary>
    <published>2020-11-12T00:00:00+00:00</published>
  </entry>
</feed>
