<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://blog.dask.org</id>
  <title>Dask Working Notes - Posted in 2019</title>
  <updated>2026-03-05T15:05:21.627599+00:00</updated>
  <link href="https://blog.dask.org"/>
  <link href="https://blog.dask.org/blog/2019/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.io/" version="0.11.12">ABlog</generator>
  <entry>
    <id>https://blog.dask.org/2019/11/01/deployment-updates/</id>
    <title>Dask Deployment Updates</title>
    <updated>2019-11-01T00:00:00+00:00</updated>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/11/01/deployment-updates.md&lt;/span&gt;, line 7)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="summary"&gt;

&lt;p&gt;Over the last six months many Dask developers have worked on making Dask easier
to deploy in a wide variety of situations. This post summarizes those
efforts, and provides links to ongoing work.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/11/01/deployment-updates.md&lt;/span&gt;, line 13)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="what-we-mean-by-deployment"&gt;
&lt;h1&gt;What we mean by Deployment&lt;/h1&gt;
&lt;p&gt;To run Dask on a cluster, you need to set up a scheduler on one
machine:&lt;/p&gt;
&lt;div class="highlight-console notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;dask-scheduler
&lt;span class="go"&gt;Scheduler running at tcp://192.168.0.1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And then start Dask workers on many other machines:&lt;/p&gt;
&lt;div class="highlight-console notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;dask-worker&lt;span class="w"&gt; &lt;/span&gt;tcp://192.168.0.1
&lt;span class="go"&gt;Waiting to connect to:       tcp://scheduler:8786&lt;/span&gt;

&lt;span class="gp"&gt;$ &lt;/span&gt;dask-worker&lt;span class="w"&gt; &lt;/span&gt;tcp://192.168.0.1
&lt;span class="go"&gt;Waiting to connect to:       tcp://scheduler:8786&lt;/span&gt;

&lt;span class="gp"&gt;$ &lt;/span&gt;dask-worker&lt;span class="w"&gt; &lt;/span&gt;tcp://192.168.0.1
&lt;span class="go"&gt;Waiting to connect to:       tcp://scheduler:8786&lt;/span&gt;

&lt;span class="gp"&gt;$ &lt;/span&gt;dask-worker&lt;span class="w"&gt; &lt;/span&gt;tcp://192.168.0.1
&lt;span class="go"&gt;Waiting to connect to:       tcp://scheduler:8786&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
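With the scheduler and workers running, any machine on the network can connect a client to the scheduler's address. A minimal sketch, using the example address from above (8786 is the default scheduler port); this assumes the scheduler is actually reachable, so it won't run as-is:

```python
from dask.distributed import Client

# Connect to the manually started scheduler; 8786 is the default port.
client = Client("tcp://192.168.0.1:8786")

# Work submitted through the client now runs on the connected workers.
future = client.submit(sum, [1, 2, 3])
print(future.result())
```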
&lt;p&gt;For informal clusters people might do this manually, by logging into each
machine and running these commands themselves. However, it’s much more common
to use a cluster resource manager such as Kubernetes, Yarn (Hadoop/Spark),
HPC batch schedulers (SGE, PBS, SLURM, LSF …), a cloud service, or a custom system.&lt;/p&gt;
&lt;p&gt;As Dask is used by more institutions, and used more broadly within those
institutions, making deployment smooth and natural becomes increasingly
important. This is so important, in fact, that a few different groups have
made seven separate efforts to improve deployment in one regard or
another.&lt;/p&gt;
&lt;p&gt;We’ll briefly summarize and link to this work below, and then we’ll finish up
by talking about some internal design that helped to make this work more
consistent.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/11/01/deployment-updates.md&lt;/span&gt;, line 54)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="dask-ssh"&gt;
&lt;h1&gt;Dask-SSH&lt;/h1&gt;
&lt;p&gt;According to our user survey, the most common deployment mechanism was still
SSH. Dask has long had a &lt;a class="reference external" href="https://docs.dask.org/en/latest/setup/ssh.html#command-line"&gt;command line dask-ssh
tool&lt;/a&gt; to make it
easier to deploy with SSH. We recently updated this tool to also
include a Python interface, which provides more control.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SSHCluster&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SSHCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;host1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;host2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;host3&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;host4&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="n"&gt;connect_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;known_hosts&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="n"&gt;worker_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;nthreads&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="n"&gt;scheduler_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;port&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;dashboard_address&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;:8797&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This isn’t what we recommend for large institutions, but it can be helpful for
more informal groups who are just getting started.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/11/01/deployment-updates.md&lt;/span&gt;, line 76)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="dask-jobqueue-and-dask-kubernetes-rewrite"&gt;
&lt;h1&gt;Dask-Jobqueue and Dask-Kubernetes Rewrite&lt;/h1&gt;
&lt;p&gt;We’ve rewritten both Dask-Kubernetes and Dask-Jobqueue, which targets the
SLURM/PBS/LSF/SGE cluster managers typically found in HPC centers. These now
share a common codebase with Dask-SSH, and so are much more consistent and,
hopefully, less buggy.&lt;/p&gt;
&lt;p&gt;Ideally users shouldn’t notice much difference with existing workloads,
but new features like asynchronous operation, integration with the Dask
JupyterLab extension, and so on are more consistently available. Also, we’ve
been able to unify development and reduce our maintenance burden considerably.&lt;/p&gt;
&lt;p&gt;The new version of Dask Jobqueue where these changes take place is 0.7.0, and
the work was done in &lt;a class="reference external" href="https://github.com/dask/dask-jobqueue/pull/307"&gt;dask/dask-jobqueue #307&lt;/a&gt;.
The new version of Dask Kubernetes is 0.10.0 and the work was done in
&lt;a class="reference external" href="https://github.com/dask/dask-kubernetes/pull/162"&gt;dask/dask-kubernetes #162&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/11/01/deployment-updates.md&lt;/span&gt;, line 92)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="dask-cloudprovider"&gt;
&lt;h1&gt;Dask-CloudProvider&lt;/h1&gt;
&lt;p&gt;For cloud deployments we generally recommend using a hosted Kubernetes or Yarn
service, and then using Dask-Kubernetes or Dask-Yarn on top of these.&lt;/p&gt;
&lt;p&gt;However, some institutions have made decisions or commitments to use
certain vendor-specific technologies, and for them it’s more convenient to use
APIs native to their particular cloud. The new package &lt;a class="reference external" href="https://cloudprovider.dask.org"&gt;Dask
Cloudprovider&lt;/a&gt; handles this today for Amazon’s
ECS API, which has been around for a long while and is widely
adopted.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cloudprovider&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ECSCluster&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ECSCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster_arn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;arn:aws:ecs:&amp;lt;region&amp;gt;:&amp;lt;acctid&amp;gt;:cluster/&amp;lt;clustername&amp;gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cloudprovider&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FargateCluster&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FargateCluster&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/11/01/deployment-updates.md&lt;/span&gt;, line 112)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="dask-gateway"&gt;
&lt;h1&gt;Dask-Gateway&lt;/h1&gt;
&lt;p&gt;In some cases users may not have access to the cluster manager. For example,
an institution may not give all of its data science users access to the Yarn
or Kubernetes cluster. In this case the &lt;a class="reference external" href="https://gateway.dask.org"&gt;Dask-Gateway&lt;/a&gt;
project may be useful.
It can launch and manage Dask jobs,
and provide a proxy connection to these jobs if necessary.
It is typically deployed with elevated permissions but managed directly by IT,
giving them a point of greater control.&lt;/p&gt;
&lt;img src="https://gateway.dask.org/_images/architecture.svg" width="50%"&gt;
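From the user's side, connecting through a gateway looks roughly like the following sketch. The gateway URL is a hypothetical placeholder, and this requires a running Dask-Gateway server, so it won't run as-is; see the Dask-Gateway documentation for details:

```python
from dask_gateway import Gateway

# The address of the institution's gateway server (placeholder URL).
gateway = Gateway("http://gateway.example.com")

# The gateway launches and manages the cluster on the user's behalf,
# so the user never needs direct access to Kubernetes, Yarn, or the
# batch system.
cluster = gateway.new_cluster()
cluster.scale(4)
client = cluster.get_client()
```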
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/11/01/deployment-updates.md&lt;/span&gt;, line 125)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="gpus-and-dask-cuda"&gt;
&lt;h1&gt;GPUs and Dask-CUDA&lt;/h1&gt;
&lt;p&gt;While using Dask for multi-GPU deployments, the &lt;a class="reference external" href="https://rapids.ai"&gt;NVIDIA
RAPIDS&lt;/a&gt; team has needed the ability to specify increasingly
complex Dask worker setups. They recommend the following deployment
strategy:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;One Dask-worker per GPU on a machine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Specify the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;/code&gt; environment variable to pin that worker
to that GPU&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If your machine has multiple network interfaces then choose the network interface that has the best connection to that GPU&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If your machine has multiple CPUs then set thread affinities to use the closest CPU&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;… and more&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For this reason we wanted to specify these configurations in code, like the
following:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;specification&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;worker-0&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;cls&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;distributed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Nanny&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;options&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;nthreads&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;env&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;CUDA_VISIBLE_DEVICES&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;0,1,2,3&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ib0&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;worker-1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;cls&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;distributed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Nanny&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;options&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;nthreads&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;env&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;CUDA_VISIBLE_DEVICES&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;1,2,3,0&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ib0&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;worker-2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;cls&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;distributed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Nanny&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;options&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;nthreads&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;env&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;CUDA_VISIBLE_DEVICES&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;2,3,0,1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ib1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;worker-2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;cls&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;distributed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Nanny&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;options&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;nthreads&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;env&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;CUDA_VISIBLE_DEVICES&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;3,0,1,2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ib1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And then use the new SpecCluster class to deploy these workers:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SpecCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;specification&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We’ve used this technique in the
&lt;a class="reference external" href="https://github.com/rapidsai/dask-cuda"&gt;Dask-CUDA&lt;/a&gt; project to provide
convenient functions for deployment on multi-GPU systems.&lt;/p&gt;
&lt;p&gt;This class was generic enough that it ended up forming the base of the SSH,
Jobqueue, and Kubernetes solutions as well.&lt;/p&gt;
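For the common case of one machine with several GPUs, Dask-CUDA wraps this specification machinery in a convenience class. A sketch, assuming a machine with NVIDIA GPUs and the dask-cuda package installed (so it won't run as-is on a machine without them):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Starts one worker per visible GPU, with CUDA_VISIBLE_DEVICES
# pinned appropriately for each worker.
cluster = LocalCUDACluster()
client = Client(cluster)
```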
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/11/01/deployment-updates.md&lt;/span&gt;, line 176)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="standards-and-conventions"&gt;
&lt;h1&gt;Standards and Conventions&lt;/h1&gt;
&lt;p&gt;The solutions above are built by different teams that work at different companies.
This is great because those teams have hands-on experience with these
cluster managers in the wild, but it has historically made it challenging to
standardize the user experience. This is particularly problematic when we build
other tools, like IPython widgets or the Dask JupyterLab extension, which want
to interoperate with all of the Dask deployment solutions.&lt;/p&gt;
&lt;p&gt;The recent rewrites of Dask-SSH, Dask-Jobqueue, and Dask-Kubernetes, along with the new
Dask-Cloudprovider and Dask-CUDA libraries, place them
all under the same &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.distributed.SpecCluster&lt;/span&gt;&lt;/code&gt; superclass, so we can expect a high degree of
uniformity from them. Additionally, all of these classes now match the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.distributed.Cluster&lt;/span&gt;&lt;/code&gt; interface, which standardizes things like
adaptivity, IPython widgets, logs, and some basic reporting.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Cluster&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SpecCluster&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Kubernetes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;JobQueue (PBS/SLURM/LSF/SGE/Torque/Condor/Moab/OAR)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SSH&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CloudProvider (ECS)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CUDA (LocalCUDACluster, DGX)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LocalCluster&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Yarn&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gateway&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
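Because each of these classes implements the same Cluster interface, code written against one cluster manager tends to work against the others. A sketch using LocalCluster, the easiest subclass to try on a single machine (the constructor arguments here are just one reasonable configuration):

```python
from dask.distributed import LocalCluster

# LocalCluster is the simplest Cluster subclass to try locally.
cluster = LocalCluster(n_workers=1, processes=False, dashboard_address=None)

# These methods belong to the shared Cluster interface, so the same
# calls work on the Kubernetes, SSH, JobQueue, and other subclasses.
cluster.scale(2)                     # ask for two workers
cluster.adapt(minimum=0, maximum=4)  # or scale automatically with load

addr = cluster.scheduler_address
print(addr)
cluster.close()
```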
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/11/01/deployment-updates.md&lt;/span&gt;, line 203)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="future-work"&gt;
&lt;h1&gt;Future Work&lt;/h1&gt;
&lt;p&gt;There is still plenty to do. Here are some of the themes we’ve seen among
current development:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Move the Scheduler off to a separate job/pod/container in the network,
which is often helpful for complex networking situations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improve proxying of the dashboard in these situations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optionally separate the lifecycle of the cluster from that of the
Python process that requested it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write up best practices for composing GPU support with all of the cluster managers&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/11/01/deployment-updates/"/>
    <summary>Over the last six months many Dask developers have worked on making Dask easier to deploy in a wide variety of situations. This post summarizes those efforts, and provides links to ongoing work.</summary>
    <published>2019-11-01T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/10/08/df-groupby/</id>
    <title>DataFrame Groupby Aggregations</title>
    <updated>2019-10-08T00:00:00+00:00</updated>
    <author>
      <name>Benjamin Zaitlen &amp; James Bourbeau</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/10/08/df-groupby.md&lt;/span&gt;, line 10)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="groupby-aggregations-with-dask"&gt;

&lt;p&gt;In this post we’ll dive into how Dask computes groupby aggregations. These are commonly used operations for ETL and analysis in which we split data into groups, apply a function to each group independently, and then combine the results back together. In the PyData/R world this is often referred to as the split-apply-combine strategy (first coined by &lt;a class="reference external" href="https://www.jstatsoft.org/article/view/v040i01"&gt;Hadley Wickham&lt;/a&gt;) and is used widely throughout the &lt;a class="reference external" href="https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html"&gt;Pandas ecosystem&lt;/a&gt;.&lt;/p&gt;
&lt;div align="center"&gt;
  &lt;a href="/images/split-apply-combine.png"&gt;
    &lt;img src="/images/split-apply-combine.png" width="80%" align="center"&gt;
  &lt;/a&gt;
  &lt;p align="center"&gt;&lt;i&gt;Image courtesy of swcarpentry.github.io&lt;/i&gt;&lt;/p&gt;
&lt;/div&gt;
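As a quick refresher, here is split-apply-combine in plain Pandas: rows are split into groups by a key, a function (here, a sum) is applied to each group independently, and the per-group results are combined into one output. The tiny DataFrame below is just for illustration:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b", "a"],
                   "value": [1, 2, 3, 4, 5]})

# Split on "key", apply sum to each group, combine into one Series.
result = df.groupby("key")["value"].sum()
print(result.to_dict())  # {'a': 9, 'b': 6}
```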
&lt;p&gt;Dask leverages this idea using a similarly catchy name: apply-concat-apply or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;aca&lt;/span&gt;&lt;/code&gt; for short. Here we’ll explore the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;aca&lt;/span&gt;&lt;/code&gt; strategy in both simple and complex operations.&lt;/p&gt;
&lt;p&gt;First, recall that a Dask DataFrame is a &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe-design.html#internal-design"&gt;collection&lt;/a&gt; of DataFrame objects (i.e., each &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe-design.html#partitions"&gt;partition&lt;/a&gt; of a Dask DataFrame is a Pandas DataFrame). For example, let’s say we have the following Pandas DataFrame:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pandas&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pd&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;                       &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;                       &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;
&lt;span class="go"&gt;     a   b   c&lt;/span&gt;
&lt;span class="go"&gt;0    1   1   2&lt;/span&gt;
&lt;span class="go"&gt;1    1   3   4&lt;/span&gt;
&lt;span class="go"&gt;2    2  10   5&lt;/span&gt;
&lt;span class="go"&gt;3    3   3   2&lt;/span&gt;
&lt;span class="go"&gt;4    3   2   3&lt;/span&gt;
&lt;span class="go"&gt;5    1   1   5&lt;/span&gt;
&lt;span class="go"&gt;6    1   3   2&lt;/span&gt;
&lt;span class="go"&gt;7    2  10   3&lt;/span&gt;
&lt;span class="go"&gt;8    3   3   9&lt;/span&gt;
&lt;span class="go"&gt;9    3   3   2&lt;/span&gt;
&lt;span class="go"&gt;10  99  12  44&lt;/span&gt;
&lt;span class="go"&gt;11  10   0  33&lt;/span&gt;
&lt;span class="go"&gt;12   1   9   2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To create a Dask DataFrame with three partitions from this data, we could partition &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;df&lt;/span&gt;&lt;/code&gt; between the indices (0, 4), (5, 9), and (10, 12). We can perform this partitioning with Dask by using the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;from_pandas&lt;/span&gt;&lt;/code&gt; function with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;npartitions=3&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pandas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;npartitions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The 3 partitions are simply 3 individual Pandas DataFrames:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;partitions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="go"&gt;   a   b  c&lt;/span&gt;
&lt;span class="go"&gt;0  1   1  2&lt;/span&gt;
&lt;span class="go"&gt;1  1   3  4&lt;/span&gt;
&lt;span class="go"&gt;2  2  10  5&lt;/span&gt;
&lt;span class="go"&gt;3  3   3  2&lt;/span&gt;
&lt;span class="go"&gt;4  3   2  3&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="apply-concat-apply"&gt;
&lt;h1&gt;Apply-concat-apply&lt;/h1&gt;
&lt;p&gt;When Dask applies a function or algorithm (e.g. &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sum&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mean&lt;/span&gt;&lt;/code&gt;, etc.) to a Dask DataFrame, it does so by applying that operation to all the constituent partitions independently, collecting (or concatenating) the outputs into intermediary results, and then applying the operation again to the intermediary results to produce a final result. Dask reuses this same apply-concat-apply methodology for many of its internal DataFrame calculations.&lt;/p&gt;
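The pattern can be sketched in plain pandas (a simplified illustration only, not Dask's internal implementation; the `aca` helper name here is made up for this sketch):

```python
import pandas as pd

def aca(partitions, chunk, aggregate):
    """Toy apply-concat-apply: apply `chunk` to each partition,
    concatenate the intermediate results, then `aggregate` them."""
    intermediates = [chunk(p) for p in partitions]   # apply
    combined = pd.concat(intermediates)              # concat
    return aggregate(combined)                       # apply again

df = pd.DataFrame({"a": [1, 1, 2, 3, 3, 1], "c": [2, 4, 5, 2, 3, 5]})
parts = [df[:3], df[3:]]
result = aca(
    parts,
    chunk=lambda p: p.groupby("a").c.sum(),
    aggregate=lambda s: s.groupby(level=0).sum(),
)
# result matches df.groupby("a").c.sum()
```

The same three-step shape reappears in every reduction discussed below; only the `chunk` and `aggregate` functions change.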
&lt;p&gt;Let’s break down how Dask computes &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ddf.groupby(['a',&lt;/span&gt; &lt;span class="pre"&gt;'b']).c.sum()&lt;/span&gt;&lt;/code&gt; by going through each step in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;aca&lt;/span&gt;&lt;/code&gt; process. We’ll begin by splitting our &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;df&lt;/span&gt;&lt;/code&gt; Pandas DataFrame into three partitions:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;df_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;df_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;df_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;section id="apply"&gt;
&lt;h2&gt;Apply&lt;/h2&gt;
&lt;p&gt;Next we perform the same &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;groupby(['a',&lt;/span&gt; &lt;span class="pre"&gt;'b']).c.sum()&lt;/span&gt;&lt;/code&gt; operation on each of our three partitions:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;sr1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;sr2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;sr3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;These operations each produce a Series with a &lt;a class="reference external" href="https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html"&gt;MultiIndex&lt;/a&gt;:&lt;/p&gt;
&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;
      &lt;pre&gt;
&gt;&gt;&gt; sr1
a  b
1  1     2
   3     4
2  10    5
3  2     3
   3     2
Name: c, dtype: int64
      &lt;/pre&gt;
    &lt;/th&gt;
    &lt;th&gt;
      &lt;pre&gt;
&gt;&gt;&gt; sr2
a  b
1  1      5
   3      2
2  10     3
3  3     11
Name: c, dtype: int64
      &lt;/pre&gt;
    &lt;/th&gt;
    &lt;th&gt;
      &lt;pre&gt;
&gt;&gt;&gt; sr3
a   b
1   9      2
10  0     33
99  12    44
Name: c, dtype: int64
      &lt;/pre&gt;
    &lt;/th&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;/section&gt;
&lt;section id="the-concat"&gt;
&lt;h2&gt;The Concat&lt;/h2&gt;
&lt;p&gt;After the first &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;apply&lt;/span&gt;&lt;/code&gt;, the next step is to concatenate the intermediate &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sr1&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sr2&lt;/span&gt;&lt;/code&gt;, and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sr3&lt;/span&gt;&lt;/code&gt; results. This is fairly straightforward to do using the Pandas &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;concat&lt;/span&gt;&lt;/code&gt; function:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;sr_concat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;sr1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;sr_concat&lt;/span&gt;
&lt;span class="go"&gt;a   b&lt;/span&gt;
&lt;span class="go"&gt;1   1      2&lt;/span&gt;
&lt;span class="go"&gt;    3      4&lt;/span&gt;
&lt;span class="go"&gt;2   10     5&lt;/span&gt;
&lt;span class="go"&gt;3   2      3&lt;/span&gt;
&lt;span class="go"&gt;    3      2&lt;/span&gt;
&lt;span class="go"&gt;1   1      5&lt;/span&gt;
&lt;span class="go"&gt;    3      2&lt;/span&gt;
&lt;span class="go"&gt;2   10     3&lt;/span&gt;
&lt;span class="go"&gt;3   3     11&lt;/span&gt;
&lt;span class="go"&gt;1   9      2&lt;/span&gt;
&lt;span class="go"&gt;10  0     33&lt;/span&gt;
&lt;span class="go"&gt;99  12    44&lt;/span&gt;
&lt;span class="go"&gt;Name: c, dtype: int64&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="apply-redux"&gt;
&lt;h2&gt;Apply Redux&lt;/h2&gt;
&lt;p&gt;Our final step is to apply the same &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;groupby(['a',&lt;/span&gt; &lt;span class="pre"&gt;'b']).c.sum()&lt;/span&gt;&lt;/code&gt; operation again on the concatenated &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sr_concat&lt;/span&gt;&lt;/code&gt; Series. However, we no longer have columns &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;a&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;b&lt;/span&gt;&lt;/code&gt;, so how should we proceed?&lt;/p&gt;
&lt;p&gt;Zooming out a bit, our goal is to add together the values that share the same index. For example, there are two rows with the index &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;(1,&lt;/span&gt; &lt;span class="pre"&gt;1)&lt;/span&gt;&lt;/code&gt;, with corresponding values 2 and 5. So how can we group by indices with the same value? A MultiIndex uses &lt;a class="reference external" href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html#pandas.MultiIndex"&gt;levels&lt;/a&gt; to define what the value is at a given index. Dask &lt;a class="reference external" href="https://github.com/dask/dask/blob/973c6e1b2e38c2d9d6e8c75fb9b4ab7a0d07e6a7/dask/dataframe/groupby.py#L69-L75"&gt;determines&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/dask/dask/blob/973c6e1b2e38c2d9d6e8c75fb9b4ab7a0d07e6a7/dask/dataframe/groupby.py#L1065"&gt;uses these levels&lt;/a&gt; in the final apply step of the apply-concat-apply calculation. In our case the level is &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;[0,&lt;/span&gt; &lt;span class="pre"&gt;1]&lt;/span&gt;&lt;/code&gt;; that is, we want both the index at the 0th level and the index at the 1st level, and grouping by both effectively groups the same indices together:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sr_concat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;
      &lt;pre&gt;
&gt;&gt;&gt; total
a   b
1   1      7
    3      6
    9      2
2   10     8
3   2      3
    3     13
10  0     33
99  12    44
Name: c, dtype: int64
      &lt;/pre&gt;
    &lt;/th&gt;
    &lt;th&gt;
      &lt;pre&gt;
&gt;&gt;&gt; ddf.groupby(['a', 'b']).c.sum().compute()
a   b
1   1      7
    3      6
2   10     8
3   2      3
    3     13
1   9      2
10  0     33
99  12    44
Name: c, dtype: int64
      &lt;/pre&gt;
    &lt;/th&gt;
    &lt;th&gt;
      &lt;pre&gt;
&gt;&gt;&gt; df.groupby(['a', 'b']).c.sum()
a   b
1   1      7
    3      6
    9      2
2   10     8
3   2      3
    3     13
10  0     33
99  12    44
Name: c, dtype: int64
      &lt;/pre&gt;
    &lt;/th&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;Additionally, we can easily examine the steps of this apply-concat-apply calculation by &lt;a class="reference external" href="https://docs.dask.org/en/latest/graphviz.html"&gt;visualizing the task graph&lt;/a&gt; for the computation:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;a href="/images/sum.svg"&gt;
  &lt;img src="/images/sum.svg" width="80%"&gt;
&lt;/a&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sum&lt;/span&gt;&lt;/code&gt; is a rather straightforward calculation. What about something a bit more complex, like &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mean&lt;/span&gt;&lt;/code&gt;?&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;a href="/images/mean.svg"&gt;
  &lt;img src="/images/mean.svg" width="80%"&gt;
&lt;/a&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Mean&lt;/span&gt;&lt;/code&gt; is a good example of an operation which doesn’t directly fit the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;aca&lt;/span&gt;&lt;/code&gt; model – concatenating &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mean&lt;/span&gt;&lt;/code&gt; values and taking the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mean&lt;/span&gt;&lt;/code&gt; again will yield incorrect results. As with any style of computation (vectorization, MapReduce, etc.), we sometimes need to fit the calculation creatively to the model. In the case of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;aca&lt;/span&gt;&lt;/code&gt; we can often break the calculation down into constituent parts. For &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mean&lt;/span&gt;&lt;/code&gt;, these are &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sum&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;count&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;div class="math notranslate nohighlight"&gt;
\[ \bar{x} = \frac{x_1+x_2+\cdots +x_n}{n}\]&lt;/div&gt;
&lt;p&gt;From the task graph above, we can see that there are two independent tasks for each partition: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;series-groupby-count-chunk&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;series-groupby-sum-chunk&lt;/span&gt;&lt;/code&gt;. The results are then aggregated into two final nodes, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;series-groupby-count-agg&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;series-groupby-sum-agg&lt;/span&gt;&lt;/code&gt;, and finally the mean is calculated as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;total&lt;/span&gt; &lt;span class="pre"&gt;sum&lt;/span&gt; &lt;span class="pre"&gt;/&lt;/span&gt; &lt;span class="pre"&gt;total&lt;/span&gt; &lt;span class="pre"&gt;count&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
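The sum-and-count decomposition can be reproduced by hand in pandas (a simplified sketch of the idea, not Dask's actual task graph or code):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 1, 2, 2],
                   "c": [2.0, 4.0, 5.0, 6.0, 3.0, 1.0]})
parts = [df[:3], df[3:]]

# chunk step: per-partition sums and counts
sums = pd.concat([p.groupby("a").c.sum() for p in parts])
counts = pd.concat([p.groupby("a").c.count() for p in parts])

# aggregate step: combine the partial results, then divide once at the end
total_sum = sums.groupby(level=0).sum()
total_count = counts.groupby(level=0).sum()
mean = total_sum / total_count
# mean matches df.groupby("a").c.mean()
```

Because sums and counts each combine correctly across partitions, dividing only at the very end gives the exact mean, which is what concatenating per-partition means would fail to do.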
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/10/08/df-groupby/"/>
    <category term="dask" label="dask"/>
    <category term="dataframe" label="dataframe"/>
    <published>2019-10-08T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/09/30/dask-hyperparam-opt/</id>
    <title>Better and faster hyperparameter optimization with Dask</title>
    <updated>2019-09-30T00:00:00+00:00</updated>
    <author>
      <name>&lt;a href="http://stsievert.com"&gt;Scott Sievert&lt;/a&gt;</name>
    </author>
    <content type="html">&lt;p&gt;&lt;em&gt;&lt;a class="reference external" href="https://stsievert.com"&gt;Scott Sievert&lt;/a&gt; wrote this post. The original post lives at
&lt;a class="reference external" href="https://stsievert.com/blog/2019/09/27/dask-hyperparam-opt/"&gt;https://stsievert.com/blog/2019/09/27/dask-hyperparam-opt/&lt;/a&gt; with better
styling. This work is supported by Anaconda, Inc.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://dask.org"&gt;Dask&lt;/a&gt;’s machine learning package, &lt;a class="reference external" href="https://ml.dask.org/"&gt;Dask-ML&lt;/a&gt;, now implements Hyperband, an
advanced “hyperparameter optimization” algorithm that performs rather well.
This post will&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;describe “hyperparameter optimization”, a common problem in machine learning&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;describe Hyperband’s benefits and why it works&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;show how to use Hyperband via example alongside performance comparisons&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this post, I’ll walk through a practical example and highlight key portions
of the paper “&lt;a class="reference external" href="http://conference.scipy.org/proceedings/scipy2019/pdfs/scott_sievert.pdf"&gt;Better and faster hyperparameter optimization with Dask&lt;/a&gt;”, which is also
summarized in a &lt;a class="reference external" href="https://www.youtube.com/watch?v=x67K9FiPFBQ"&gt;~25 minute SciPy 2019 talk&lt;/a&gt;.&lt;/p&gt;
&lt;!--More--&gt;
&lt;section id="problem"&gt;

&lt;p&gt;Machine learning requires data, an untrained model and “hyperparameters”: parameters that are chosen before training begins
and that help the model fit the data. The user needs to specify values
for these hyperparameters in order to use the model. A good example is
adapting ridge regression or LASSO to the amount of noise in the
data with the regularization parameter.&lt;a class="footnote-reference brackets" href="#alpha" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Model performance strongly depends on the hyperparameters provided. A fairly complex example is the
visualization tool &lt;a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html"&gt;t-SNE&lt;/a&gt;. This tool requires (at least) three
hyperparameters, and performance depends radically on them. In fact, the first section in “&lt;a class="reference external" href="https://distill.pub/2016/misread-tsne/"&gt;How to Use t-SNE
Effectively&lt;/a&gt;” is titled “Those hyperparameters really matter”.&lt;/p&gt;
&lt;p&gt;Finding good values for these hyperparameters is critical and has an entire
Scikit-learn documentation page, “&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/grid_search.html"&gt;Tuning the hyperparameters of an
estimator&lt;/a&gt;.” Briefly, finding decent values of hyperparameters
is difficult and requires guessing or searching.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How can these hyperparameters be found quickly and efficiently with an
advanced task scheduler like Dask?&lt;/strong&gt; Parallelism will pose some challenges, but
the Dask architecture enables some advanced algorithms.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note: this post presumes knowledge of Dask basics. This material is covered in
Dask’s documentation on &lt;a class="reference external" href="https://docs.dask.org/en/latest/why.html"&gt;Why Dask?&lt;/a&gt;, a ~15 minute &lt;a class="reference external" href="https://www.youtube.com/watch?v=ods97a5Pzw0"&gt;video introduction to
Dask&lt;/a&gt;, a &lt;a class="reference external" href="https://www.youtube.com/watch?v=tQBovBvSDvA"&gt;video introduction to Dask-ML&lt;/a&gt; and &lt;a class="reference external" href="https://stsievert.com/blog/2016/09/09/dask-cluster/"&gt;a
blog post I wrote&lt;/a&gt; on my first use of Dask.&lt;/em&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="contributions"&gt;
&lt;h1&gt;Contributions&lt;/h1&gt;
&lt;p&gt;Dask-ML can quickly find high-performing hyperparameters. I will back this
claim with intuition and experimental evidence.&lt;/p&gt;
&lt;p&gt;Specifically, this is because
Dask-ML now
implements an algorithm introduced by Li et. al. in “&lt;a class="reference external" href="https://arxiv.org/pdf/1603.06560.pdf"&gt;Hyperband: A novel
bandit-based approach to hyperparameter optimization&lt;/a&gt;”.
The pairing of Dask and Hyperband enables some exciting new performance opportunities,
especially because Hyperband has a simple implementation and Dask is an
advanced task scheduler.&lt;a class="footnote-reference brackets" href="#first" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Let’s go
through the basics of Hyperband then illustrate its use and performance with
an example.
This will highlight some key points of &lt;a class="reference external" href="http://conference.scipy.org/proceedings/scipy2019/pdfs/scott_sievert.pdf"&gt;the corresponding paper&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="hyperband-basics"&gt;
&lt;h1&gt;Hyperband basics&lt;/h1&gt;
&lt;p&gt;The motivation for Hyperband is to find high-performing hyperparameters with minimal
training. Given this goal, it makes sense to spend more time training high-performing
models – why spend more time training a model if it has performed poorly in the past?&lt;/p&gt;
&lt;p&gt;One method to spend more time on high-performing models is to initialize many
models, start training all of them, and then stop training low-performing models
before training is finished. That’s what Hyperband does. At the most basic
level, Hyperband is a (principled) early-stopping scheme for
&lt;a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html"&gt;RandomizedSearchCV&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Deciding when to stop the training of models depends on how strongly
the training data affect the score. There are two extremes:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;when only the training data matter&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;i.e., when the hyperparameters don’t influence the score at all&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;when only the hyperparameters matter&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;i.e., when the training data don’t influence the score at all&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Hyperband balances these two extremes by sweeping over how frequently
models are stopped. This sweep allows a mathematical proof that Hyperband
will find the best model possible with minimal &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;partial_fit&lt;/span&gt;&lt;/code&gt;
calls&lt;a class="footnote-reference brackets" href="#qual" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Hyperband has significant parallelism because it has two “embarrassingly
parallel” for-loops – Dask can exploit this. Hyperband has been implemented
in Dask, specifically in Dask’s machine learning library Dask-ML.&lt;/p&gt;
&lt;p&gt;How well does it perform? Let’s illustrate via example. Some setup is required
before the performance comparison in &lt;em&gt;&lt;a class="reference internal" href="#performance"&gt;&lt;span class="xref myst"&gt;Performance&lt;/span&gt;&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="example"&gt;
&lt;h1&gt;Example&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;Note: want to try &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; out yourself? Dask has &lt;a class="reference external" href="https://examples.dask.org/machine-learning/hyperparam-opt.html"&gt;an example use&lt;/a&gt;.
It can even be run in-browser!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I’ll illustrate with a synthetic example. Let’s build a dataset with 4 classes:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;experiment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_circles&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_circles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_features&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_informative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img src="/images/2019-hyperband/synthetic/dataset.png"
style="max-width: 100%;"
width="200px" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note: this content is pulled from
&lt;a class="reference external" href="https://github.com/stsievert/dask-hyperband-comparison"&gt;stsievert/dask-hyperband-comparison&lt;/a&gt;, or makes slight modifications.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Let’s build a fully connected neural net with 24 neurons for classification:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.neural_network&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MLPClassifier&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MLPClassifier&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Building the neural net with PyTorch is also possible&lt;a class="footnote-reference brackets" href="#skorch" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; (and what I used in development).&lt;/p&gt;
&lt;p&gt;This neural net’s behavior is dictated by 6 hyperparameters. Only one controls
the architecture of the optimal model (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hidden_layer_sizes&lt;/span&gt;&lt;/code&gt;, the number of
neurons in each layer). The rest control the search for the best model of that
architecture. Details on the hyperparameters are in the
&lt;em&gt;&lt;a class="reference internal" href="#appendix"&gt;&lt;span class="xref myst"&gt;Appendix&lt;/span&gt;&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;  &lt;span class="c1"&gt;# details in appendix&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="go"&gt;dict_keys([&amp;#39;hidden_layer_sizes&amp;#39;, &amp;#39;alpha&amp;#39;, &amp;#39;batch_size&amp;#39;, &amp;#39;learning_rate&amp;#39;&lt;/span&gt;
&lt;span class="go"&gt;           &amp;#39;learning_rate_init&amp;#39;, &amp;#39;power_t&amp;#39;, &amp;#39;momentum&amp;#39;])&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;hidden_layer_sizes&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# always 24 neurons&lt;/span&gt;
&lt;span class="go"&gt;[(24, ), (12, 12), (6, 6, 6, 6), (4, 4, 4, 4, 4, 4), (12, 6, 3, 3)]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;I chose these hyperparameters to have a complex search space that mimics the
searches performed for most neural networks. These searches typically involve
hyperparameters like “dropout”, “learning rate”, “momentum” and “weight
decay”.&lt;a class="footnote-reference brackets" href="#user-facing" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;
End users don’t care about hyperparameters like these; they don’t change the
model architecture, they only affect finding the best model of a particular architecture.&lt;/p&gt;
&lt;p&gt;How can high performing hyperparameter values be found quickly?&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/09/30/dask-hyperparam-opt.md&lt;/span&gt;, line 205)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="finding-the-best-parameters"&gt;
&lt;h1&gt;Finding the best parameters&lt;/h1&gt;
&lt;p&gt;First, let’s look at the parameters required for Dask-ML’s implementation
of Hyperband (which is in the class &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt;).&lt;/p&gt;
&lt;section id="hyperband-parameters-rule-of-thumb"&gt;
&lt;h2&gt;Hyperband parameters: rule-of-thumb&lt;/h2&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; has two inputs:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;max_iter&lt;/span&gt;&lt;/code&gt;, which determines how many times to call &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;partial_fit&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the chunk size of the Dask array, which determines how much data each
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;partial_fit&lt;/span&gt;&lt;/code&gt; call receives.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These fall out pretty naturally once it’s known how long to train the best
model and very approximately how many parameters to sample:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;n_examples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 50 passes through dataset for best model&lt;/span&gt;
&lt;span class="n"&gt;n_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;299&lt;/span&gt;  &lt;span class="c1"&gt;# sample about 300 parameters&lt;/span&gt;

&lt;span class="c1"&gt;# inputs to hyperband&lt;/span&gt;
&lt;span class="n"&gt;max_iter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n_params&lt;/span&gt;
&lt;span class="n"&gt;chunk_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n_examples&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;n_params&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The inputs to this rule-of-thumb are exactly what the user cares about:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;a measure of how complex the search space is (via &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;n_params&lt;/span&gt;&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;how long to train the best model (via &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;n_examples&lt;/span&gt;&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Notably, there’s no tradeoff between &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;n_examples&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;n_params&lt;/span&gt;&lt;/code&gt; like with
Scikit-learn’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;RandomizedSearchCV&lt;/span&gt;&lt;/code&gt; because &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;n_examples&lt;/span&gt;&lt;/code&gt; is only for &lt;em&gt;some&lt;/em&gt;
models, not for &lt;em&gt;all&lt;/em&gt; models. There’s more details on this
rule-of-thumb in the “Notes” section of the &lt;a class="reference external" href="https://ml.dask.org/modules/generated/dask_ml.model_selection.HyperbandSearchCV.html#dask_ml.model_selection.HyperbandSearchCV"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt;
docs&lt;/a&gt;.&lt;/p&gt;
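As a concrete sketch of this rule-of-thumb, assume a hypothetical training set of 60,000 examples (this dataset size is illustrative, not the one from this post's experiment):

```python
# Hedged sketch of the rule-of-thumb above; the dataset size here is
# hypothetical, not the one used in this post's experiment.
n_train = 60_000                       # stand-in for len(X_train)
n_examples = 50 * n_train              # 50 passes through the data for the best model
n_params = 299                         # sample roughly 300 hyperparameter values

max_iter = n_params                    # input to HyperbandSearchCV
chunk_size = n_examples // n_params    # Dask array chunk size

print(max_iter, chunk_size)            # 299 10033
```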
&lt;p&gt;With these inputs a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; object can easily be created.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="finding-the-best-performing-hyperparameters"&gt;
&lt;h2&gt;Finding the best performing hyperparameters&lt;/h2&gt;
&lt;p&gt;The Hyperband model selection algorithm is implemented in the class
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt;. Let’s create an instance of that class:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_ml.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HyperbandSearchCV&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HyperbandSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="n"&gt;est&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aggressiveness&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;aggressiveness&lt;/span&gt;&lt;/code&gt; defaults to 3. &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;aggressiveness=4&lt;/span&gt;&lt;/code&gt; is chosen because this is an
&lt;em&gt;initial&lt;/em&gt; search; I know nothing about this search space, so the search
should be more aggressive in culling off bad models.&lt;/p&gt;
&lt;p&gt;Hyperband hides some details from the user (which enables the mathematical
guarantees), specifically the details on the amount of training and
the number of models created. These details are available in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;metadata&lt;/span&gt;&lt;/code&gt;
attribute:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;n_models&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="go"&gt;378&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;partial_fit_calls&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="go"&gt;5721&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Now that we have some idea on how long the computation will take, let’s ask it
to find the best set of hyperparameters:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_ml.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rechunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rechunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The dashboard will be active during this time&lt;a class="footnote-reference brackets" href="#dashboard" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;
&lt;video width="600" style="max-width: 100%;" autoplay loop controls &gt;
  &lt;source src="/images/2019-hyperband/dashboard-compress.mp4" type="video/mp4" &gt;
  Your browser does not support the video tag.
&lt;/video&gt;
&lt;/p&gt;
&lt;p&gt;How well do these hyperparameters perform?&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score_&lt;/span&gt;
&lt;span class="go"&gt;0.9019221418447483&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; mirrors Scikit-learn’s API for &lt;a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html"&gt;RandomizedSearchCV&lt;/a&gt;, so it
has access to all the expected attributes and methods:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params_&lt;/span&gt;
&lt;span class="go"&gt;{&amp;quot;batch_size&amp;quot;: 64, &amp;quot;hidden_layer_sizes&amp;quot;: [6, 6, 6, 6], ...}&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;0.8989070100111217&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_model_&lt;/span&gt;
&lt;span class="go"&gt;MLPClassifier(...)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Details on the attributes and methods are in the &lt;a class="reference external" href="https://ml.dask.org/modules/generated/dask_ml.model_selection.HyperbandSearchCV.html"&gt;HyperbandSearchCV
documentation&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/09/30/dask-hyperparam-opt.md&lt;/span&gt;, line 322)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="performance"&gt;
&lt;h1&gt;Performance&lt;/h1&gt;
&lt;!--
Plot 1: how well does it do?
Plot 2: how does this scale?
Plot 3: what opportunities does Dask enable?
--&gt;
&lt;p&gt;I ran this 200 times on my personal laptop with 4 cores.
Let’s look at the distribution of final validation scores:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/2019-hyperband/synthetic/final-acc.svg"
style="max-width: 100%;"
 width="400px"/&gt;&lt;/p&gt;
&lt;p&gt;The “passive” comparison is really &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;RandomizedSearchCV&lt;/span&gt;&lt;/code&gt; configured so it takes
an equal amount of work as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt;. Let’s see how this does over
time:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/2019-hyperband/synthetic/val-acc.svg"
style="max-width: 100%;"
 width="400px"/&gt;&lt;/p&gt;
&lt;p&gt;This graph shows the mean score over the 200 runs with the solid line, and the
shaded region represents the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Interquartile_range"&gt;interquartile range&lt;/a&gt;. The dotted green
line indicates the data required to train 4 models to completion.
“Passes through the dataset” is a good proxy
for “time to solution” because there are only 4 workers.&lt;/p&gt;
&lt;p&gt;This graph shows that &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; will find parameters at least 3 times
quicker than &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;RandomizedSearchCV&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;section id="dask-opportunities"&gt;
&lt;h2&gt;Dask opportunities&lt;/h2&gt;
&lt;p&gt;What opportunities does combining Hyperband and Dask create?
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; has a lot of internal parallelism and Dask is an advanced task
scheduler.&lt;/p&gt;
&lt;p&gt;The most obvious opportunity involves job prioritization. Hyperband fits many
models in parallel, and Dask might not have that many
workers available. This means some jobs have to wait for other jobs
to finish. Of course, Dask can prioritize jobs&lt;a class="footnote-reference brackets" href="#prior" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and choose which models
to fit first.&lt;/p&gt;
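As a toy illustration of the “high scores” scheme (pure Python, not Dask’s scheduler or Dask-ML’s internal code), a priority queue keyed on each model’s most recent score pops the best-scoring model first; the model names and scores below are made up:

```python
import heapq

# Toy illustration of the "high scores" prioritization scheme; the model
# names and scores are invented. Dask's real scheduler handles this through
# its priority mechanism, not a user-visible heap.
recent_scores = {"model-a": 0.72, "model-b": 0.90, "model-c": 0.55}

# heapq is a min-heap, so push negated scores to pop the highest score first.
queue = [(-score, name) for name, score in recent_scores.items()]
heapq.heapify(queue)

order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
print(order)  # ['model-b', 'model-a', 'model-c']
```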
&lt;p&gt;Let’s assign the priority for fitting a certain model to be the model’s most
recent score. How does this prioritization scheme influence the score? Let’s
compare the prioritization schemes in
a single run of the 200 above:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/2019-hyperband/synthetic/priority.svg"
style="max-width: 100%;"
     width="400px" /&gt;&lt;/p&gt;
&lt;p&gt;These two lines are the same in every way except for
the prioritization scheme.
This graph compares the “high scores” prioritization scheme with Dask’s
default prioritization scheme (“fifo”).&lt;/p&gt;
&lt;p&gt;This graph is certainly helped by the fact that it is run with only 4 workers.
Job priority does not matter if every job can be run right away (there’s
nothing to assign priority to!).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="amenability-to-parallelism"&gt;
&lt;h2&gt;Amenability to parallelism&lt;/h2&gt;
&lt;p&gt;How does Hyperband scale with the number of workers?&lt;/p&gt;
&lt;p&gt;I ran another separate experiment to measure this. This experiment is described more in the &lt;a class="reference external" href="http://conference.scipy.org/proceedings/scipy2019/pdfs/scott_sievert.pdf"&gt;corresponding
paper&lt;/a&gt;, but the relevant difference is that a &lt;a class="reference external" href="https://pytorch.org/"&gt;PyTorch&lt;/a&gt; neural network is used
through &lt;a class="reference external" href="https://skorch.readthedocs.io/en/stable/"&gt;skorch&lt;/a&gt; instead of Scikit-learn’s MLPClassifier.&lt;/p&gt;
&lt;p&gt;I ran the &lt;em&gt;same&lt;/em&gt; experiment with a different number of Dask
workers.&lt;a class="footnote-reference brackets" href="#same" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; Here’s how &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; scales:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/2019-hyperband/image-denoising/scaling-patience.svg" width="400px"
style="max-width: 100%;"
/&gt;&lt;/p&gt;
&lt;p&gt;Training one model to completion requires 243 seconds (which is marked by the
white line). This is a comparison with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;patience&lt;/span&gt;&lt;/code&gt;, which stops training models
if their scores aren’t increasing enough. Functionally, this is very useful
because the user might accidentally specify &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;n_examples&lt;/span&gt;&lt;/code&gt; to be too large.&lt;/p&gt;
&lt;p&gt;It looks like the speedups start to saturate somewhere
between 16 and 24 workers, at least for this example.
Of course, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;patience&lt;/span&gt;&lt;/code&gt; doesn’t work as well for a large number of
workers.&lt;a class="footnote-reference brackets" href="#scale-worker" id="id9" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;9&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/09/30/dask-hyperparam-opt.md&lt;/span&gt;, line 421)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="future-work"&gt;
&lt;h1&gt;Future work&lt;/h1&gt;
&lt;p&gt;There are some ongoing pull requests to improve &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt;. The most
significant of these involves tweaking some Hyperband internals so &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt;
works better with initial or very exploratory searches (&lt;a class="reference external" href="https://github.com/dask/dask-ml/pull/532"&gt;dask/dask-ml #532&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The biggest improvement I see is treating &lt;em&gt;dataset size&lt;/em&gt; as the scarce resource
that needs to be preserved instead of &lt;em&gt;training time&lt;/em&gt;. This would allow
Hyperband to work with any model, instead of only models that implement
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;partial_fit&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Serialization is an important part of the distributed Hyperband implementation
in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt;. Scikit-learn and PyTorch can easily handle this because
they support the Pickle protocol&lt;a class="footnote-reference brackets" href="#pickle-post" id="id10" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;10&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, but
Keras/Tensorflow/MXNet present challenges. The use of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; could
be increased by resolving this issue.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/09/30/dask-hyperparam-opt.md&lt;/span&gt;, line 444)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="appendix"&gt;
&lt;h1&gt;Appendix&lt;/h1&gt;
&lt;p&gt;I chose to tune 7 hyperparameters:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hidden_layer_sizes&lt;/span&gt;&lt;/code&gt;, which controls the activation function used at each
neuron&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;alpha&lt;/span&gt;&lt;/code&gt;, which controls the amount of regularization&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;More hyperparameters control finding the best neural network:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;batch_size&lt;/span&gt;&lt;/code&gt;, which controls the number of examples the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;optimizer&lt;/span&gt;&lt;/code&gt; uses to
approximate the gradient&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;learning_rate&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;learning_rate_init&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;power_t&lt;/span&gt;&lt;/code&gt;, which control some basic
hyperparameters for the SGD optimizer I’ll be using&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;momentum&lt;/span&gt;&lt;/code&gt;, a more advanced hyperparameter for SGD with Nesterov’s momentum.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
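A search space along these lines might be written as follows. This is a hedged sketch: the candidate values are invented for illustration and are not the exact ones sampled in this post’s experiment.

```python
# Illustrative search space for MLPClassifier; the candidate values are
# made up for this sketch, not the exact ones from the post's experiment.
params = {
    "hidden_layer_sizes": [(24,), (12, 12), (6, 6, 6, 6)],  # always 24 neurons
    "alpha": [1e-6, 1e-5, 1e-4, 1e-3],                      # regularization strength
    "batch_size": [32, 64, 128],
    "learning_rate": ["constant", "invscaling", "adaptive"],
    "learning_rate_init": [1e-4, 1e-3, 1e-2],
    "power_t": [0.1, 0.25, 0.5],
    "momentum": [0.0, 0.9, 0.99],
}

assert len(params) == 7
assert all(sum(sizes) == 24 for sizes in params["hidden_layer_sizes"])
```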
&lt;/section&gt;
&lt;hr class="footnotes docutils" /&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="alpha" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Which amounts to choosing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;alpha&lt;/span&gt;&lt;/code&gt; in Scikit-learn’s &lt;a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html"&gt;Ridge&lt;/a&gt; or &lt;a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html"&gt;LASSO&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="first" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;To the best of my knowledge, this is the first implementation of Hyperband with an advanced task scheduler&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="qual" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;More accurately, Hyperband will find close to the best model possible with &lt;span class="math notranslate nohighlight"&gt;\(N\)&lt;/span&gt; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;partial_fit&lt;/span&gt;&lt;/code&gt; calls in expected score with high probability, where “close” means “within log terms of the upper bound on score”. For details, see Corollary 1 of the &lt;a class="reference external" href="http://conference.scipy.org/proceedings/scipy2019/pdfs/scott_sievert.pdf"&gt;corresponding paper&lt;/a&gt; or Theorem 5 of &lt;a class="reference external" href="https://arxiv.org/pdf/1603.06560.pdf"&gt;Hyperband’s paper&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="skorch" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id4"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;through the Scikit-learn API wrapper &lt;a class="reference external" href="https://skorch.readthedocs.io/en/stable/"&gt;skorch&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="user-facing" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id5"&gt;5&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;There’s less tuning for adaptive step size methods like &lt;a class="reference external" href="https://arxiv.org/abs/1412.6980"&gt;Adam&lt;/a&gt; or &lt;a class="reference external" href="http://jmlr.org/papers/v12/duchi11a.html"&gt;Adagrad&lt;/a&gt;, but they might under-perform on the test data (see “&lt;a class="reference external" href="https://arxiv.org/abs/1705.08292"&gt;The Marginal Value of Adaptive Gradient Methods for Machine Learning&lt;/a&gt;”)&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="dashboard" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id6"&gt;6&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;But it probably won’t be this fast: the video is sped up by a factor of 3.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="prior" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id7"&gt;7&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;See Dask’s documentation on &lt;a class="reference external" href="https://distributed.dask.org/en/latest/priority.html"&gt;Prioritizing Work&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="same" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id8"&gt;8&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Everything is the same between different runs: the hyperparameters sampled, the model’s internal random state, the data passed for fitting. Only the number of workers varies.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="scale-worker" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id9"&gt;9&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;There’s no time benefit to stopping jobs early if there are infinite workers; there’s never a queue of jobs waiting to be run&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="pickle-post" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id10"&gt;10&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;“&lt;a class="reference external" href="http://matthewrocklin.com/blog/work/2018/07/23/protocols-pickle"&gt;Pickle isn’t slow, it’s a protocol&lt;/a&gt;” by Matthew Rocklin&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="regularization" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;11&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Performance comparison: Scikit-learn’s visualization of tuning a Support Vector Machine’s (SVM) regularization parameter: &lt;a class="reference external" href="https://scikit-learn.org/stable/auto_examples/svm/plot_svm_scale_c.html"&gt;Scaling the regularization parameter for SVMs&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="new" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;12&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;It’s been around since 2016… and some call that “old news.”&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
</content>
    <link href="https://blog.dask.org/2019/09/30/dask-hyperparam-opt/"/>
    <summary>Scott Sievert wrote this post. The original post lives at
https://stsievert.com/blog/2019/09/27/dask-hyperparam-opt/ with better
styling. This work is supported by Anaconda, Inc.</summary>
    <category term="dask-ml" label="dask-ml"/>
    <category term="machine-learning" label="machine-learning"/>
    <published>2019-09-30T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/09/13/jupyter-on-dask/</id>
    <title>Co-locating a Jupyter Server and Dask Scheduler</title>
    <updated>2019-09-13T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">&lt;p&gt;If you want, you can have Dask set up a Jupyter notebook server for you,
co-located with the Dask scheduler. There are many ways to do this, but this
blog post lists two.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/09/13/jupyter-on-dask.md&lt;/span&gt;, line 13)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="first-why-would-you-do-this"&gt;

&lt;p&gt;Sometimes people inside of large institutions have complex deployment pains.
It takes them a while to stand up a process running on a machine in their
cluster, with all of the appropriate networking ports open and such.
In that situation, it can sometimes be nice to do this just once, say for Dask,
rather than twice, say for Dask and for Jupyter.&lt;/p&gt;
&lt;p&gt;Probably in these cases people should invest in a long term solution like
&lt;a class="reference external" href="https://jupyter.org/hub"&gt;JupyterHub&lt;/a&gt;,
or one of its enterprise variants,
but this blogpost gives a couple of hacks in the meantime.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/09/13/jupyter-on-dask.md&lt;/span&gt;, line 26)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="hack-1-create-a-jupyter-server-from-a-python-function-call"&gt;
&lt;h1&gt;Hack 1: Create a Jupyter server from a Python function call&lt;/h1&gt;
&lt;p&gt;If your Dask scheduler is already running, connect to it with a Client and run
a Python function that starts up a Jupyter server.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;scheduler-address:8786&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;start_juptyer_server&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;notebook.notebookapp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NotebookApp&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NotebookApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;([])&lt;/span&gt;  &lt;span class="c1"&gt;# add command line args here if you want&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_on_scheduler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_jupyter_server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you have a complex networking setup (maybe you’re on the cloud or HPC and
had to open up a port explicitly) then you might want to install
&lt;a class="reference external" href="https://jupyter-server-proxy.readthedocs.io/en/latest/"&gt;jupyter-server-proxy&lt;/a&gt;
(which Dask also uses by default if installed), and then go to
&lt;a class="reference external" href="https://example.com"&gt;http://scheduler-address:8787/proxy/8888&lt;/a&gt; . The Dask dashboard can route your
connection to Jupyter (Jupyter is also kind enough to do the same for Dask if
it is the main service).&lt;/p&gt;
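&lt;p&gt;To make that proxy path concrete, here is a small sketch of the URL being constructed. The host name is a placeholder, and the port numbers are just the Dask dashboard and Jupyter defaults:&lt;/p&gt;

```python
# Sketch of the URL pattern jupyter-server-proxy serves: the Dask dashboard's
# port, then /proxy/, then the port of the service being proxied.
scheduler_host = "scheduler-address"  # placeholder; use your scheduler's host
dashboard_port = 8787                 # Dask dashboard default
jupyter_port = 8888                   # Jupyter default

url = f"http://{scheduler_host}:{dashboard_port}/proxy/{jupyter_port}"
print(url)  # http://scheduler-address:8787/proxy/8888
```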
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/09/13/jupyter-on-dask.md&lt;/span&gt;, line 52)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="hack-2-preload-script"&gt;
&lt;h1&gt;Hack 2: Preload script&lt;/h1&gt;
&lt;p&gt;This is also a great opportunity to learn about the various ways of &lt;a class="reference external" href="https://docs.dask.org/en/latest/setup/custom-startup.html"&gt;adding
custom startup and teardown&lt;/a&gt;.
One such way is a preload script like the following:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# jupyter-preload.py&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;notebook.notebookapp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NotebookApp&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;dask_setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NotebookApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;([])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-bash notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;dask-scheduler&lt;span class="w"&gt; &lt;/span&gt;--preload&lt;span class="w"&gt; &lt;/span&gt;jupyter-preload.py
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;That script will run at an appropriate time during scheduler startup. You can
also put this into your configuration:&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nt"&gt;distributed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;preload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;/path/to/jupyter-preload.py&amp;quot;&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/09/13/jupyter-on-dask.md&lt;/span&gt;, line 80)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="really-though-you-should-use-something-else"&gt;
&lt;h1&gt;Really though, you should use something else&lt;/h1&gt;
&lt;p&gt;This is mostly a hack. If you’re at an institution then you should ask for
something like &lt;a class="reference external" href="https://jupyter.org/hub"&gt;JupyterHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You might also want to run this in a separate subprocess, so that Jupyter
and the Dask scheduler don’t collide with each other. This shouldn’t be
much of a problem (they’re both pretty lightweight), but isolating them
probably makes sense.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/09/13/jupyter-on-dask.md&lt;/span&gt;, line 90)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="thanks-nick"&gt;
&lt;h1&gt;Thanks Nick!&lt;/h1&gt;
&lt;p&gt;Thanks to &lt;a class="reference external" href="https://github.com/bollwyvl"&gt;Nick Bollweg&lt;/a&gt;, who answered a &lt;a class="reference external" href="https://github.com/jupyter/notebook/issues/4873"&gt;question on this topic&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/09/13/jupyter-on-dask/"/>
    <summary>If you want, you can have Dask set up a Jupyter notebook server for you,
co-located with the Dask scheduler. There are many ways to do this, but this
blog post lists two.</summary>
    <category term="HPC" label="HPC"/>
    <published>2019-09-13T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/08/28/dask-on-summit/</id>
    <title>Dask on HPC: a case study</title>
    <updated>2019-08-28T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">&lt;p&gt;Dask is deployed on traditional HPC machines with increasing frequency.
In the past week I’ve personally helped four different groups get set up.
This is a surprisingly individual process,
because every HPC machine has its own idiosyncrasies.
Each machine uses a job scheduler like SLURM/PBS/SGE/LSF/…, a network file
system, and fast interconnect, but each of those sub-systems has slightly
different policies on a machine-by-machine basis, which is where things get tricky.&lt;/p&gt;
&lt;p&gt;Typically we can solve these problems in about 30 minutes if we have both:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Someone familiar with the machine, like a power-user or an IT administrator&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Someone familiar with setting up Dask&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These systems span a wide range of scales. This week, at the two ends of
that range, I’ve seen both:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A small in-house 24-node SLURM cluster for research work inside of a
bio-imaging lab&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Summit, the world’s most powerful supercomputer&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this post I’m going to share a few notes of what I went through in dealing
with Summit, which was particularly troublesome. Hopefully this gives a sense
for the kinds of situations that arise. These tips likely don’t apply to your
particular system, but hopefully they give a flavor of what can go wrong,
and the processes by which we track things down.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 35)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="power-architecture"&gt;

&lt;p&gt;First, Summit is an IBM PowerPC machine, meaning that packages compiled on
normal Intel chips won’t work. Fortunately, Anaconda maintains a download of
their distribution that works well with the Power architecture, so that gave me
a good starting point.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://www.anaconda.com/distribution/#linux"&gt;https://www.anaconda.com/distribution/#linux&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Packages do seem to be a few months older than in the normal distribution, but
I can live with that.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 47)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="install-dask-jobqueue-and-configure-basic-information"&gt;
&lt;h1&gt;Install Dask-Jobqueue and configure basic information&lt;/h1&gt;
&lt;p&gt;We need to tell Dask how many cores and how much memory is on each machine.
This process is fairly straightforward: it is well documented at
&lt;a class="reference external" href="https://jobqueue.dask.org"&gt;jobqueue.dask.org&lt;/a&gt; with an informative screencast,
and the error messages themselves point you toward the right keywords.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_jobqueue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PBSCluster&lt;/span&gt;
&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PBSCluster&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="ne"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;must&lt;/span&gt; &lt;span class="n"&gt;specify&lt;/span&gt; &lt;span class="n"&gt;how&lt;/span&gt; &lt;span class="n"&gt;many&lt;/span&gt; &lt;span class="n"&gt;cores&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;per&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="n"&gt;like&lt;/span&gt; &lt;span class="err"&gt;``&lt;/span&gt;&lt;span class="n"&gt;cores&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="err"&gt;``&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
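&lt;p&gt;The cores= keyword takes an integer and memory= takes a human-readable string, which Dask parses along the lines of dask.utils.parse_bytes. A toy stdlib sketch of that parsing, just to show the convention (here GB means 10**9 bytes, matching Dask's reading of "GB"):&lt;/p&gt;

```python
import re

def parse_memory(text):
    """Toy sketch of parsing a memory string like "600 GB" into bytes.
    Dask's real parser is dask.utils.parse_bytes; this illustration only
    handles decimal units (GB = 10**9 bytes)."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMGT]?)B?\s*", text, re.IGNORECASE)
    number, unit = match.groups()
    factor = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}[unit.upper()]
    return int(float(number) * factor)

parse_memory("600 GB")   # 600_000_000_000 bytes
```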
&lt;p&gt;I’m going to skip this section for now because, generally, novice users are
able to handle this. For more information, consider watching this YouTube
video (30m).&lt;/p&gt;
&lt;iframe width="560" height="315"
        src="https://www.youtube.com/embed/FXsgmwpRExM?rel=0"
        frameborder="0" allow="autoplay; encrypted-media"
        allowfullscreen&gt;&lt;/iframe&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 69)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="invalid-operations-in-the-job-script"&gt;
&lt;h1&gt;Invalid operations in the job script&lt;/h1&gt;
&lt;p&gt;So we make a cluster object with all of our information, we call &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.scale&lt;/span&gt;&lt;/code&gt; and
we get some error message from the job scheduler.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_jobqueue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LSFCluster&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LSFCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cores&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;600 GB&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;GEN119&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;walltime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;00:30&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ask for three nodes&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;Command:
bsub /tmp/tmp4874eufw.sh
stdout:

Typical usage:
  bsub [LSF arguments] jobscript
  bsub [LSF arguments] -Is $SHELL
  bsub -h[elp] [options]
  bsub -V

NOTES:
 * All jobs must specify a walltime (-W) and project id (-P)
 * Standard jobs must specify a node count (-nnodes) or -ln_slots. These jobs cannot specify a resource string (-R).
 * Expert mode jobs (-csm y) must specify a resource string and cannot specify -nnodes or -ln_slots.

stderr:
ERROR: Resource strings (-R) are not supported in easy mode. Please resubmit without a resource string.
ERROR: -n is no longer supported. Please request nodes with -nnodes.
ERROR: No nodes requested. Please request nodes with -nnodes.
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Dask-Jobqueue tried to generate a sensible job script from the inputs that you
provided, but the resource manager that you’re using may have additional
policies that are unique to that cluster. We debug this by looking at the
generated script, and comparing against scripts that are known to work on the
HPC machine.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;job_script&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-bash notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="ch"&gt;#!/usr/bin/env bash&lt;/span&gt;

&lt;span class="c1"&gt;#BSUB -J dask-worker&lt;/span&gt;
&lt;span class="c1"&gt;#BSUB -P GEN119&lt;/span&gt;
&lt;span class="c1"&gt;#BSUB -n 128&lt;/span&gt;
&lt;span class="c1"&gt;#BSUB -R &amp;quot;span[hosts=1]&amp;quot;&lt;/span&gt;
&lt;span class="c1"&gt;#BSUB -M 600000&lt;/span&gt;
&lt;span class="c1"&gt;#BSUB -W 00:30&lt;/span&gt;
&lt;span class="nv"&gt;JOB_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LSB_JOBID&lt;/span&gt;&lt;span class="p"&gt;%.*&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;

/ccs/home/mrocklin/anaconda/bin/python&lt;span class="w"&gt; &lt;/span&gt;-m&lt;span class="w"&gt; &lt;/span&gt;distributed.cli.dask_worker&lt;span class="w"&gt; &lt;/span&gt;tcp://scheduler:8786&lt;span class="w"&gt; &lt;/span&gt;--nthreads&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--nprocs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--memory-limit&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;75&lt;/span&gt;.00GB&lt;span class="w"&gt; &lt;/span&gt;--name&lt;span class="w"&gt; &lt;/span&gt;name&lt;span class="w"&gt; &lt;/span&gt;--nanny&lt;span class="w"&gt; &lt;/span&gt;--death-timeout&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;60&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--interface&lt;span class="w"&gt; &lt;/span&gt;ib0
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
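&lt;p&gt;As an aside, the --nprocs 8 --nthreads 16 --memory-limit 75.00GB flags above follow from cores=128 and memory="600 GB": the job's cores are split into a few worker processes, each with an equal share of the threads and memory. A rough sketch of that arithmetic (the power-of-two square-root heuristic here is an assumption for illustration, not Dask-Jobqueue's exact rule):&lt;/p&gt;

```python
import math

def split_resources(cores, memory_gb):
    # Split one job's cores into worker processes and threads, keeping
    # processes * threads == cores. The square-root heuristic below is
    # illustrative only; Dask-Jobqueue's actual rule differs.
    processes = 2 ** int(math.log2(math.sqrt(cores)))
    threads = cores // processes
    memory_limit_gb = memory_gb / processes   # each process gets an equal share
    return processes, threads, memory_limit_gb

split_resources(128, 600)   # (8, 16, 75.0), matching the flags above
```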
&lt;p&gt;After comparing notes with existing scripts that we know to work on Summit,
we modify keywords to add and remove certain lines in the header.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LSFCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cores&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;500 GB&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;GEN119&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;walltime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;00:30&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;job_extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;-nnodes 1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;          &lt;span class="c1"&gt;# &amp;lt;--- new!&lt;/span&gt;
    &lt;span class="n"&gt;header_skip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;-R&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;-n &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;-M&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;--- new!&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And when we call scale this seems to make LSF happy. It no longer dumps out
large error messages.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# things seem to pass&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 153)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="workers-don-t-connect-to-the-scheduler"&gt;
&lt;h1&gt;Workers don’t connect to the Scheduler&lt;/h1&gt;
&lt;p&gt;So things seem fine from LSF’s perspective, but when we connect up a client to
our cluster we don’t see anything arriving.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;Client: scheduler=&amp;#39;tcp://10.41.0.34:41107&amp;#39; processes=0 cores=0&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Two things to check. First, have the jobs actually made it through the queue?
Typically we use a resource manager command like &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;qstat&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;squeue&lt;/span&gt;&lt;/code&gt;, or
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;bjobs&lt;/span&gt;&lt;/code&gt; for this. Maybe our jobs are trapped in the queue?&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ bash
JOBID   USER       STAT   SLOTS    QUEUE       START_TIME    FINISH_TIME   JOB_NAME
600785  mrocklin   RUN    43       batch       Aug 26 13:11  Aug 26 13:41  dask-worker
600786  mrocklin   RUN    43       batch       Aug 26 13:11  Aug 26 13:41  dask-worker
600784  mrocklin   RUN    43       batch       Aug 26 13:11  Aug 26 13:41  dask-worker
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Nope, it looks like they’re in a running state. Now we go and look at their
logs. It can sometimes be tricky to track down the log files from your jobs,
but your IT administrator should know where they are. Often they’re where you
ran your job from, and have the Job ID in the filename.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ cat dask-worker.600784.err
distributed.worker - INFO -       Start worker at: tcp://128.219.134.81:44053
distributed.worker - INFO -          Listening to: tcp://128.219.134.81:44053
distributed.worker - INFO -          dashboard at:       128.219.134.81:34583
distributed.worker - INFO - Waiting to connect to: tcp://128.219.134.74:34153
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                         16
distributed.worker - INFO -                Memory:                   75.00 GB
distributed.worker - INFO -       Local Directory: /autofs/nccs-svm1_home1/mrocklin/worker-ybnhk4ib
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Waiting to connect to: tcp://128.219.134.74:34153
distributed.worker - INFO - Waiting to connect to: tcp://128.219.134.74:34153
distributed.worker - INFO - Waiting to connect to: tcp://128.219.134.74:34153
distributed.worker - INFO - Waiting to connect to: tcp://128.219.134.74:34153
distributed.worker - INFO - Waiting to connect to: tcp://128.219.134.74:34153
distributed.worker - INFO - Waiting to connect to: tcp://128.219.134.74:34153
...
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;So the worker processes have started, but they’re having difficulty connecting
to the scheduler. When we ask an IT administrator, they identify the address
here as being on the wrong network interface:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="mf"&gt;128.219.134.74&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;---&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;accessible&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;So we run &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ifconfig&lt;/span&gt;&lt;/code&gt;, and find the infiniband network interface, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ib0&lt;/span&gt;&lt;/code&gt;, which
is more broadly accessible.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LSFCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cores&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;500 GB&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;GEN119&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;walltime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;00:30&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;job_extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;-nnodes 1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;header_skip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;-R&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;-n &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;-M&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ib0&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                    &lt;span class="c1"&gt;# &amp;lt;--- new!&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
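&lt;p&gt;If you would rather discover candidate interface names from Python than from ifconfig, the standard library can list them (on Linux):&lt;/p&gt;

```python
import socket

# List this machine's network interface names (Linux); on an HPC node you
# would look for an InfiniBand interface such as ib0 and pass it to
# LSFCluster(..., interface="ib0").
interfaces = [name for index, name in socket.if_nameindex()]
print(interfaces)
```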
&lt;p&gt;We try this out and still, no luck :(&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 227)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="interactive-nodes"&gt;
&lt;h1&gt;Interactive nodes&lt;/h1&gt;
&lt;p&gt;The expert user then says “Oh, our login nodes are pretty locked down, let’s try
this from an interactive compute node. Things tend to work better there.” We
run some arcane bash command (I’ve never seen two of these that look alike so
I’m going to omit it here), and things magically start working. Hooray!&lt;/p&gt;
&lt;p&gt;We run a tiny Dask computation just to prove that we can do some work.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="go"&gt;11&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Actually, it turns out that we were eventually able to get things running from
the login nodes on Summit using a slightly different &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;bsub&lt;/span&gt;&lt;/code&gt; command in LSF, but
I’m going to omit details here because we’re fixing this in Dask and it’s
unlikely to affect future users (I hope?). Locked-down login nodes remain a
common cause of failed connections across a variety of systems, affecting
something like 30% of the systems that I interact with.&lt;/p&gt;
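As a quick sanity check on a new system, a small socket probe can tell you whether a node is allowed to open outbound TCP connections to the scheduler at all. This is a stdlib-only sketch, not part of the original post; the host and port in the comment are illustrative:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False

# Probe the scheduler port (8786 is Dask's default) from a login node:
# can_connect("192.168.0.1", 8786)
```

If this returns `False` from the login node but `True` from a compute node, you are in the locked-down situation described above.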
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 249)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="ssh-tunneling"&gt;
&lt;h1&gt;SSH Tunneling&lt;/h1&gt;
&lt;p&gt;It’s important to get the dashboard up and running so that you can see what’s
going on. Typically we do this with SSH tunneling. Most HPC people know how
to do this and it’s covered in the YouTube screencast above, so I’m going to
skip it here.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 256)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="jupyter-lab"&gt;
&lt;h1&gt;Jupyter Lab&lt;/h1&gt;
&lt;p&gt;Many interactive Dask users on HPC today are moving towards using JupyterLab.
This choice gives them a notebook, terminals, file browser, and Dask’s
dashboard all in a single web tab. This greatly reduces the number of times
they have to SSH in, and, with the magic of web proxies, means that they only
need to tunnel once.&lt;/p&gt;
&lt;p&gt;I conda installed JupyterLab and a proxy library, and then tried to
&lt;a class="reference external" href="https://github.com/dask/dask-labextension#installation"&gt;set up the Dask JupyterLab extension&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;conda&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;jupyterlab&lt;/span&gt;
&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;jupyter&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;proxy&lt;/span&gt;  &lt;span class="c1"&gt;# to route dashboard through Jupyter&amp;#39;s port&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Next, we’re going to install the
&lt;a class="reference external" href="https://github.com/dask/dask-labextension"&gt;Dask Labextension&lt;/a&gt; into JupyterLab
in order to get the Dask Dashboard directly into our Jupyter session.
For that, we need &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nodejs&lt;/span&gt;&lt;/code&gt; in order to install things into JupyterLab.
I thought that this was going to be a pain, given the Power architecture, but
amazingly, this also seems to be in Anaconda’s default Power channel.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;mrocklin@login2.summit $ conda install nodejs  # Thanks conda packaging devs!
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then I install Dask-Labextension, which is both a Python and a JavaScript
package:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;dask_labextension&lt;/span&gt;
&lt;span class="n"&gt;jupyter&lt;/span&gt; &lt;span class="n"&gt;labextension&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;labextension&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then I set up a password for my Jupyter sessions&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;jupyter&lt;/span&gt; &lt;span class="n"&gt;notebook&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And run JupyterLab in a network-friendly way&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;mrocklin@login2.summit $ jupyter lab --no-browser --ip=&amp;quot;login2&amp;quot;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And set up a single SSH tunnel from my home machine to the login node&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# Be sure to match the login node&amp;#39;s hostname and the Jupyter port below

mrocklin@my-laptop $ ssh -L 8888:login2:8888 summit.olcf.ornl.gov
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;I can now connect to Jupyter from my laptop by navigating to
&lt;a class="reference external" href="http://localhost:8888"&gt;http://localhost:8888&lt;/a&gt;, run the cluster commands above in a notebook, and
things work great. Additionally, thanks to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;jupyter-server-proxy&lt;/span&gt;&lt;/code&gt;, Dask’s
dashboard is also available at &lt;a class="reference external" href="http://localhost:8888/proxy/####/status"&gt;http://localhost:8888/proxy/####/status&lt;/a&gt;, where
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;####&lt;/span&gt;&lt;/code&gt; is the port currently hosting Dask’s dashboard. You can find
this by looking at &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cluster.dashboard_link&lt;/span&gt;&lt;/code&gt;. It defaults to 8787, but if
you’ve started several Dask schedulers on the system recently that port may
already be taken, in which case Dask falls back to a random port.&lt;/p&gt;
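The port is easy to pull out of the dashboard link programmatically. A small stdlib-only sketch (the URL shown is illustrative; the helper name is not part of Dask):

```python
from urllib.parse import urlparse

def dashboard_port(dashboard_link: str, default: int = 8787) -> int:
    """Pull the port out of a dashboard URL such as 'http://10.41.0.34:8787/status'."""
    port = urlparse(dashboard_link).port
    return port if port is not None else default

print(dashboard_port("http://10.41.0.34:8787/status"))  # prints 8787
```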
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 320)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="configuration-files"&gt;
&lt;h1&gt;Configuration files&lt;/h1&gt;
&lt;p&gt;I don’t want to keep typing all of these commands, so now I put things into a
single configuration file, and plop that file into &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;~/.config/dask/summit.yaml&lt;/span&gt;&lt;/code&gt;
(any filename that ends in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.yaml&lt;/span&gt;&lt;/code&gt; will do).&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nt"&gt;jobqueue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;lsf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;cores&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;128&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;processes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;8&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;500 GB&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;job-extra&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;-nnodes&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ib0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;header-skip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;-R&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;-n&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;-M&amp;quot;&lt;/span&gt;

&lt;span class="nt"&gt;labextension&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;module&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dask_jobqueue&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;class&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;LSFCluster&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;project&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;your-project-id&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 349)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="slow-worker-startup"&gt;
&lt;h1&gt;Slow worker startup&lt;/h1&gt;
&lt;p&gt;Now that things are easier to use I find myself using the system more, and some
other problems arise.&lt;/p&gt;
&lt;p&gt;I notice that it takes a long time to start up a worker. It seems to hang
intermittently during startup, so I add a few lines to
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;distributed/__init__.py&lt;/span&gt;&lt;/code&gt; to print out the state of the main Python thread
every second, to see where this is happening:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;threading&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;

&lt;span class="n"&gt;main_thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_ident&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_current_frames&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="n"&gt;main_thread&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daemon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;thraed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
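The same watchdog can be written with only the standard library, using `traceback` in place of Dask's internal `profile` module. This is a sketch rather than the code from the post; either version periodically prints the main thread's stack so you can see where startup is stuck:

```python
import sys
import threading
import traceback

def watch_main_thread(interval: float = 1.0) -> threading.Event:
    """Print the main thread's stack every `interval` seconds until the
    returned Event is set. Useful for spotting where startup hangs."""
    main_ident = threading.main_thread().ident
    stop = threading.Event()

    def dump() -> None:
        # Event.wait doubles as the sleep and the exit condition
        while not stop.wait(interval):
            frame = sys._current_frames().get(main_ident)
            if frame is not None:
                traceback.print_stack(frame)  # writes to sys.stderr

    threading.Thread(target=dump, daemon=True).start()
    return stop
```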
&lt;p&gt;This prints out a traceback that brings us to this code in Dask:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_locking_enabled&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dir_path&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;DIR_LOCK_EXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Locking &lt;/span&gt;&lt;span class="si"&gt;%r&lt;/span&gt;&lt;span class="s2"&gt;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Avoid a race condition before locking the file&lt;/span&gt;
        &lt;span class="c1"&gt;# by taking the global lock&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_global_lock&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;locket&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lock_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock_file&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;acquire&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;It looks like Dask is trying to use a file-based lock.
Unfortunately some NFS systems don’t like file-based locks, or handle them very
slowly. In the case of Summit, the home directory is actually mounted
read-only from the compute nodes, so a file-based lock will simply fail.
Looking up the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;is_locking_enabled&lt;/span&gt;&lt;/code&gt; function we see that it checks a
configuration value.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;is_locking_enabled&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;distributed.worker.use-file-locking&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
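For intuition, `dask.config.get` resolves a dotted key against the merged configuration dictionaries, roughly like the stdlib-only sketch below (illustrative only; Dask's real implementation also handles defaults, environment variables, and key normalization):

```python
def config_get(config: dict, key: str, default=None):
    """Resolve a dotted key like 'distributed.worker.use-file-locking'
    against a nested configuration dictionary."""
    node = config
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

cfg = {"distributed": {"worker": {"use-file-locking": False}}}
print(config_get(cfg, "distributed.worker.use-file-locking"))  # prints False
```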
&lt;p&gt;So we add that to our config file. At the same time I switch from the
forkserver to the spawn multiprocessing method (I thought that this might
help; it didn’t, but the change is relatively harmless).&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;distributed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;multiprocessing&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;spawn&lt;/span&gt;
    &lt;span class="n"&gt;use&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;locking&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;

&lt;span class="n"&gt;jobqueue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;lsf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;cores&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;
    &lt;span class="n"&gt;processes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="n"&gt;GB&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;-nnodes 1&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ib0&lt;/span&gt;
    &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;skip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;-R&amp;quot;&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;-n &amp;quot;&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;-M&amp;quot;&lt;/span&gt;

&lt;span class="n"&gt;labextension&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
     &lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;dask_jobqueue&amp;#39;&lt;/span&gt;
     &lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;LSFCluster&amp;#39;&lt;/span&gt;
     &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
     &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 435)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;This post outlines many issues that I ran into when getting Dask to run on
one specific HPC system. These problems aren’t universal, so you may not run
into them, but they’re also not super-rare. Mostly my objective in writing
this up is to give people a sense of the sorts of problems that arise when
Dask and an HPC system interact.&lt;/p&gt;
&lt;p&gt;None of the problems above are that serious. They’ve all happened before and
they all have solutions that can be written down in a configuration file.
Finding the problem, though, can be challenging, and often requires the
combined expertise of individuals that are experienced with Dask and with that
particular HPC system.&lt;/p&gt;
&lt;p&gt;There are a few configuration files posted at
&lt;a class="reference external" href="https://jobqueue.dask.org/en/latest/configurations.html"&gt;jobqueue.dask.org/en/latest/configurations.html&lt;/a&gt;, which may be informative. The &lt;a class="reference external" href="https://github.com/dask/dask-jobqueue/issues"&gt;Dask Jobqueue issue tracker&lt;/a&gt; is also a fairly friendly place, full of both IT professionals and Dask experts.&lt;/p&gt;
&lt;p&gt;Also, as a reminder, you don’t need to have an HPC machine in order to use
Dask. Dask is conveniently deployable on Cloud, Hadoop, and local
systems. See the &lt;a class="reference external" href="https://docs.dask.org/en/latest/setup.html"&gt;Dask setup
documentation&lt;/a&gt; for more
information.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 458)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="future-work-gpus"&gt;
&lt;h1&gt;Future work: GPUs&lt;/h1&gt;
&lt;p&gt;Summit is fast because it has a ton of GPUs. I’m going to work on that next,
but that will probably cover enough content to fill up a whole other blogpost :)&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/28/dask-on-summit.md&lt;/span&gt;, line 463)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="branches"&gt;
&lt;h1&gt;Branches&lt;/h1&gt;
&lt;p&gt;For anyone playing along at home (or on Summit), I’m operating from the
following development branches:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="github reference external" href="https://github.com/dask/distributed&amp;#64;master"&gt;dask/distributed&amp;#64;master&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="github reference external" href="https://github.com/mrocklin/dask-jobqueue&amp;#64;spec-rewrite"&gt;mrocklin/dask-jobqueue&amp;#64;spec-rewrite&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hopefully, within a month of writing this article, everything will be
in a nicely released state.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/08/28/dask-on-summit/"/>
    <summary>Dask is deployed on traditional HPC machines with increasing frequency.
In the past week I’ve personally helped four different groups get set up.
This is a surprisingly individual process,
because every HPC machine has its own idiosyncrasies.
Each machine uses a job scheduler like SLURM/PBS/SGE/LSF/…, a network file
system, and fast interconnect, but each of those sub-systems have slightly
different policies on a machine-by-machine basis, which is where things get tricky.</summary>
    <category term="HPC" label="HPC"/>
    <published>2019-08-28T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/08/09/image-itk/</id>
    <title>Dask and ITK for large scale image analysis</title>
    <updated>2019-08-09T00:00:00+00:00</updated>
    <author>
      <name>Matthew McCormick</name>
    </author>
<content type="html">
&lt;section id="executive-summary"&gt;

&lt;p&gt;This post explores using the &lt;a class="reference external" href="https://www.itk.org"&gt;ITK&lt;/a&gt; suite of image processing utilities in parallel with Dask Array.&lt;/p&gt;
&lt;p&gt;We cover …&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;A simple but common example of applying deconvolution across a stack of 3d images&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tips on how to make these two libraries work well together&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Challenges that we ran into and opportunities for future improvements.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/09/image-itk.md&lt;/span&gt;, line 19)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="a-worked-example"&gt;
&lt;h1&gt;A Worked Example&lt;/h1&gt;
&lt;p&gt;Let’s start with a full example applying Richardson Lucy deconvolution to a
stack of light sheet microscopy data. This is the same data that we showed how
to load in our &lt;a class="reference external" href="https://blog.dask.org/2019/06/20/load-image-data"&gt;last blogpost on image loading&lt;/a&gt;.
You can &lt;a class="reference external" href="https://drive.google.com/drive/folders/13mpIfqspKTIINkfoWbFsVtFF8D7jbTqJ"&gt;access the data as TIFF files from Google Drive here&lt;/a&gt;, and access the &lt;a class="reference external" href="https://drive.google.com/drive/folders/13udO-h9epItG5MNWBp0VxBkKCllYBLQF"&gt;corresponding point spread function images here&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Load our data from last time¶&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="n"&gt;imgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_zarr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AOLLSMData_m4_raw.zarr/&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;data&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;  &lt;thead&gt;    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 188.74 GB &lt;/td&gt; &lt;td&gt; 316.15 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (3, 199, 201, 1024, 768) &lt;/td&gt; &lt;td&gt; (1, 1, 201, 1024, 768) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 598 Tasks &lt;/td&gt;&lt;td&gt; 597 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="404" height="206" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="0" y1="0" x2="45" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="0" y1="9" x2="45" y2="9" /&gt;
  &lt;line x1="0" y1="18" x2="45" y2="18" /&gt;
  &lt;line x1="0" y1="27" x2="45" y2="27" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="0" y1="0" x2="0" y2="27" style="stroke-width:2" /&gt;
  &lt;line x1="0" y1="0" x2="0" y2="27" /&gt;
  &lt;line x1="0" y1="0" x2="0" y2="27" /&gt;
  &lt;line x1="0" y1="0" x2="0" y2="27" /&gt;
  &lt;line x1="0" y1="0" x2="0" y2="27" /&gt;
  &lt;line x1="1" y1="0" x2="1" y2="27" /&gt;
  &lt;line x1="1" y1="0" x2="1" y2="27" /&gt;
  &lt;line x1="1" y1="0" x2="1" y2="27" /&gt;
  &lt;line x1="1" y1="0" x2="1" y2="27" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="27" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="27" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="27" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="27" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="27" /&gt;
  &lt;line x1="3" y1="0" x2="3" y2="27" /&gt;
  &lt;line x1="3" y1="0" x2="3" y2="27" /&gt;
  &lt;line x1="3" y1="0" x2="3" y2="27" /&gt;
  &lt;line x1="3" y1="0" x2="3" y2="27" /&gt;
  &lt;line x1="4" y1="0" x2="4" y2="27" /&gt;
  &lt;line x1="4" y1="0" x2="4" y2="27" /&gt;
  &lt;line x1="4" y1="0" x2="4" y2="27" /&gt;
  &lt;line x1="4" y1="0" x2="4" y2="27" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="27" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="27" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="27" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="27" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="27" /&gt;
  &lt;line x1="6" y1="0" x2="6" y2="27" /&gt;
  &lt;line x1="6" y1="0" x2="6" y2="27" /&gt;
  &lt;line x1="6" y1="0" x2="6" y2="27" /&gt;
  &lt;line x1="6" y1="0" x2="6" y2="27" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="27" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="27" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="27" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="27" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="27" /&gt;
  &lt;line x1="8" y1="0" x2="8" y2="27" /&gt;
  &lt;line x1="8" y1="0" x2="8" y2="27" /&gt;
  &lt;line x1="8" y1="0" x2="8" y2="27" /&gt;
  &lt;line x1="8" y1="0" x2="8" y2="27" /&gt;
  &lt;line x1="9" y1="0" x2="9" y2="27" /&gt;
  &lt;line x1="9" y1="0" x2="9" y2="27" /&gt;
  &lt;line x1="9" y1="0" x2="9" y2="27" /&gt;
  &lt;line x1="9" y1="0" x2="9" y2="27" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="27" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="27" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="27" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="27" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="27" /&gt;
  &lt;line x1="11" y1="0" x2="11" y2="27" /&gt;
  &lt;line x1="11" y1="0" x2="11" y2="27" /&gt;
  &lt;line x1="11" y1="0" x2="11" y2="27" /&gt;
  &lt;line x1="11" y1="0" x2="11" y2="27" /&gt;
  &lt;line x1="12" y1="0" x2="12" y2="27" /&gt;
  &lt;line x1="12" y1="0" x2="12" y2="27" /&gt;
  &lt;line x1="12" y1="0" x2="12" y2="27" /&gt;
  &lt;line x1="12" y1="0" x2="12" y2="27" /&gt;
  &lt;line x1="12" y1="0" x2="12" y2="27" /&gt;
  &lt;line x1="13" y1="0" x2="13" y2="27" /&gt;
  &lt;line x1="13" y1="0" x2="13" y2="27" /&gt;
  &lt;line x1="13" y1="0" x2="13" y2="27" /&gt;
  &lt;line x1="13" y1="0" x2="13" y2="27" /&gt;
  &lt;line x1="14" y1="0" x2="14" y2="27" /&gt;
  &lt;line x1="14" y1="0" x2="14" y2="27" /&gt;
  &lt;line x1="14" y1="0" x2="14" y2="27" /&gt;
  &lt;line x1="14" y1="0" x2="14" y2="27" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="27" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="27" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="27" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="27" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="27" /&gt;
  &lt;line x1="16" y1="0" x2="16" y2="27" /&gt;
  &lt;line x1="16" y1="0" x2="16" y2="27" /&gt;
  &lt;line x1="16" y1="0" x2="16" y2="27" /&gt;
  &lt;line x1="16" y1="0" x2="16" y2="27" /&gt;
  &lt;line x1="17" y1="0" x2="17" y2="27" /&gt;
  &lt;line x1="17" y1="0" x2="17" y2="27" /&gt;
  &lt;line x1="17" y1="0" x2="17" y2="27" /&gt;
  &lt;line x1="17" y1="0" x2="17" y2="27" /&gt;
  &lt;line x1="18" y1="0" x2="18" y2="27" /&gt;
  &lt;line x1="18" y1="0" x2="18" y2="27" /&gt;
  &lt;line x1="18" y1="0" x2="18" y2="27" /&gt;
  &lt;line x1="18" y1="0" x2="18" y2="27" /&gt;
  &lt;line x1="18" y1="0" x2="18" y2="27" /&gt;
  &lt;line x1="19" y1="0" x2="19" y2="27" /&gt;
  &lt;line x1="19" y1="0" x2="19" y2="27" /&gt;
  &lt;line x1="19" y1="0" x2="19" y2="27" /&gt;
  &lt;line x1="19" y1="0" x2="19" y2="27" /&gt;
  &lt;line x1="20" y1="0" x2="20" y2="27" /&gt;
  &lt;line x1="20" y1="0" x2="20" y2="27" /&gt;
  &lt;line x1="20" y1="0" x2="20" y2="27" /&gt;
  &lt;line x1="20" y1="0" x2="20" y2="27" /&gt;
  &lt;line x1="20" y1="0" x2="20" y2="27" /&gt;
  &lt;line x1="21" y1="0" x2="21" y2="27" /&gt;
  &lt;line x1="21" y1="0" x2="21" y2="27" /&gt;
  &lt;line x1="21" y1="0" x2="21" y2="27" /&gt;
  &lt;line x1="21" y1="0" x2="21" y2="27" /&gt;
  &lt;line x1="22" y1="0" x2="22" y2="27" /&gt;
  &lt;line x1="22" y1="0" x2="22" y2="27" /&gt;
  &lt;line x1="22" y1="0" x2="22" y2="27" /&gt;
  &lt;line x1="22" y1="0" x2="22" y2="27" /&gt;
  &lt;line x1="23" y1="0" x2="23" y2="27" /&gt;
  &lt;line x1="23" y1="0" x2="23" y2="27" /&gt;
  &lt;line x1="23" y1="0" x2="23" y2="27" /&gt;
  &lt;line x1="23" y1="0" x2="23" y2="27" /&gt;
  &lt;line x1="23" y1="0" x2="23" y2="27" /&gt;
  &lt;line x1="24" y1="0" x2="24" y2="27" /&gt;
  &lt;line x1="24" y1="0" x2="24" y2="27" /&gt;
  &lt;line x1="24" y1="0" x2="24" y2="27" /&gt;
  &lt;line x1="24" y1="0" x2="24" y2="27" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="27" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="27" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="27" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="27" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="27" /&gt;
  &lt;line x1="26" y1="0" x2="26" y2="27" /&gt;
  &lt;line x1="26" y1="0" x2="26" y2="27" /&gt;
  &lt;line x1="26" y1="0" x2="26" y2="27" /&gt;
  &lt;line x1="26" y1="0" x2="26" y2="27" /&gt;
  &lt;line x1="27" y1="0" x2="27" y2="27" /&gt;
  &lt;line x1="27" y1="0" x2="27" y2="27" /&gt;
  &lt;line x1="27" y1="0" x2="27" y2="27" /&gt;
  &lt;line x1="27" y1="0" x2="27" y2="27" /&gt;
  &lt;line x1="28" y1="0" x2="28" y2="27" /&gt;
  &lt;line x1="28" y1="0" x2="28" y2="27" /&gt;
  &lt;line x1="28" y1="0" x2="28" y2="27" /&gt;
  &lt;line x1="28" y1="0" x2="28" y2="27" /&gt;
  &lt;line x1="28" y1="0" x2="28" y2="27" /&gt;
  &lt;line x1="29" y1="0" x2="29" y2="27" /&gt;
  &lt;line x1="29" y1="0" x2="29" y2="27" /&gt;
  &lt;line x1="29" y1="0" x2="29" y2="27" /&gt;
  &lt;line x1="29" y1="0" x2="29" y2="27" /&gt;
  &lt;line x1="30" y1="0" x2="30" y2="27" /&gt;
  &lt;line x1="30" y1="0" x2="30" y2="27" /&gt;
  &lt;line x1="30" y1="0" x2="30" y2="27" /&gt;
  &lt;line x1="30" y1="0" x2="30" y2="27" /&gt;
  &lt;line x1="31" y1="0" x2="31" y2="27" /&gt;
  &lt;line x1="31" y1="0" x2="31" y2="27" /&gt;
  &lt;line x1="31" y1="0" x2="31" y2="27" /&gt;
  &lt;line x1="31" y1="0" x2="31" y2="27" /&gt;
  &lt;line x1="31" y1="0" x2="31" y2="27" /&gt;
  &lt;line x1="32" y1="0" x2="32" y2="27" /&gt;
  &lt;line x1="32" y1="0" x2="32" y2="27" /&gt;
  &lt;line x1="32" y1="0" x2="32" y2="27" /&gt;
  &lt;line x1="32" y1="0" x2="32" y2="27" /&gt;
  &lt;line x1="33" y1="0" x2="33" y2="27" /&gt;
  &lt;line x1="33" y1="0" x2="33" y2="27" /&gt;
  &lt;line x1="33" y1="0" x2="33" y2="27" /&gt;
  &lt;line x1="33" y1="0" x2="33" y2="27" /&gt;
  &lt;line x1="33" y1="0" x2="33" y2="27" /&gt;
  &lt;line x1="34" y1="0" x2="34" y2="27" /&gt;
  &lt;line x1="34" y1="0" x2="34" y2="27" /&gt;
  &lt;line x1="34" y1="0" x2="34" y2="27" /&gt;
  &lt;line x1="34" y1="0" x2="34" y2="27" /&gt;
  &lt;line x1="35" y1="0" x2="35" y2="27" /&gt;
  &lt;line x1="35" y1="0" x2="35" y2="27" /&gt;
  &lt;line x1="35" y1="0" x2="35" y2="27" /&gt;
  &lt;line x1="35" y1="0" x2="35" y2="27" /&gt;
  &lt;line x1="36" y1="0" x2="36" y2="27" /&gt;
  &lt;line x1="36" y1="0" x2="36" y2="27" /&gt;
  &lt;line x1="36" y1="0" x2="36" y2="27" /&gt;
  &lt;line x1="36" y1="0" x2="36" y2="27" /&gt;
  &lt;line x1="36" y1="0" x2="36" y2="27" /&gt;
  &lt;line x1="37" y1="0" x2="37" y2="27" /&gt;
  &lt;line x1="37" y1="0" x2="37" y2="27" /&gt;
  &lt;line x1="37" y1="0" x2="37" y2="27" /&gt;
  &lt;line x1="37" y1="0" x2="37" y2="27" /&gt;
  &lt;line x1="38" y1="0" x2="38" y2="27" /&gt;
  &lt;line x1="38" y1="0" x2="38" y2="27" /&gt;
  &lt;line x1="38" y1="0" x2="38" y2="27" /&gt;
  &lt;line x1="38" y1="0" x2="38" y2="27" /&gt;
  &lt;line x1="38" y1="0" x2="38" y2="27" /&gt;
  &lt;line x1="39" y1="0" x2="39" y2="27" /&gt;
  &lt;line x1="39" y1="0" x2="39" y2="27" /&gt;
  &lt;line x1="39" y1="0" x2="39" y2="27" /&gt;
  &lt;line x1="39" y1="0" x2="39" y2="27" /&gt;
  &lt;line x1="40" y1="0" x2="40" y2="27" /&gt;
  &lt;line x1="40" y1="0" x2="40" y2="27" /&gt;
  &lt;line x1="40" y1="0" x2="40" y2="27" /&gt;
  &lt;line x1="40" y1="0" x2="40" y2="27" /&gt;
  &lt;line x1="41" y1="0" x2="41" y2="27" /&gt;
  &lt;line x1="41" y1="0" x2="41" y2="27" /&gt;
  &lt;line x1="41" y1="0" x2="41" y2="27" /&gt;
  &lt;line x1="41" y1="0" x2="41" y2="27" /&gt;
  &lt;line x1="41" y1="0" x2="41" y2="27" /&gt;
  &lt;line x1="42" y1="0" x2="42" y2="27" /&gt;
  &lt;line x1="42" y1="0" x2="42" y2="27" /&gt;
  &lt;line x1="42" y1="0" x2="42" y2="27" /&gt;
  &lt;line x1="42" y1="0" x2="42" y2="27" /&gt;
  &lt;line x1="43" y1="0" x2="43" y2="27" /&gt;
  &lt;line x1="43" y1="0" x2="43" y2="27" /&gt;
  &lt;line x1="43" y1="0" x2="43" y2="27" /&gt;
  &lt;line x1="43" y1="0" x2="43" y2="27" /&gt;
  &lt;line x1="44" y1="0" x2="44" y2="27" /&gt;
  &lt;line x1="44" y1="0" x2="44" y2="27" /&gt;
  &lt;line x1="44" y1="0" x2="44" y2="27" /&gt;
  &lt;line x1="44" y1="0" x2="44" y2="27" /&gt;
  &lt;line x1="44" y1="0" x2="44" y2="27" /&gt;
  &lt;line x1="45" y1="0" x2="45" y2="27" /&gt;
  &lt;line x1="45" y1="0" x2="45" y2="27" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="0.000000,0.000000 45.378219,0.000000 45.378219,27.530335 0.000000,27.530335" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="22.689110" y="47.530335" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;199&lt;/text&gt;
&lt;text x="65.378219" y="13.765167" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,65.378219,13.765167)"&gt;3&lt;/text&gt;&lt;/p&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="115" y1="0" x2="141" y2="26" style="stroke-width:2" /&gt;
  &lt;line x1="115" y1="130" x2="141" y2="156" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="115" y1="0" x2="115" y2="130" style="stroke-width:2" /&gt;
  &lt;line x1="141" y1="26" x2="141" y2="156" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="115.000000,0.000000 141.720328,26.720328 141.720328,156.720328 115.000000,130.000000" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="115" y1="0" x2="212" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="141" y1="26" x2="239" y2="26" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="115" y1="0" x2="141" y2="26" style="stroke-width:2" /&gt;
  &lt;line x1="212" y1="0" x2="239" y2="26" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="115.000000,0.000000 212.500000,0.000000 239.220328,26.720328 141.720328,26.720328" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="141" y1="26" x2="239" y2="26" style="stroke-width:2" /&gt;
  &lt;line x1="141" y1="156" x2="239" y2="156" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="141" y1="26" x2="141" y2="156" style="stroke-width:2" /&gt;
  &lt;line x1="239" y1="26" x2="239" y2="156" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="141.720328,26.720328 239.220328,26.720328 239.220328,156.720328 141.720328,156.720328" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="190.470328" y="176.720328" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;768&lt;/text&gt;
&lt;text x="259.220328" y="91.720328" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,259.220328,91.720328)"&gt;1024&lt;/text&gt;
&lt;text x="118.360164" y="163.360164" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,118.360164,163.360164)"&gt;201&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;This dataset has shape &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;(3,&lt;/span&gt; &lt;span class="pre"&gt;199,&lt;/span&gt; &lt;span class="pre"&gt;201,&lt;/span&gt; &lt;span class="pre"&gt;1024,&lt;/span&gt; &lt;span class="pre"&gt;768)&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;3 fluorescence color channels,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;199 time points,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;201 z-slices,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1024 pixels in the y dimension, and&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;768 pixels in the x dimension.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
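&lt;p&gt;As a quick sanity check on the Bytes row of the table above (not part of the original pipeline), the array size follows directly from the shape and dtype, since &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;uint16&lt;/span&gt;&lt;/code&gt; stores two bytes per element:&lt;/p&gt;

```python
import numpy as np

# Sanity check of the "Bytes" row above:
# uint16 is 2 bytes per element
shape = (3, 199, 201, 1024, 768)
nbytes = int(np.prod(shape)) * np.dtype("uint16").itemsize
# 188738961408 bytes, i.e. roughly 188.74 GB
```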
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Load our Point Spread Function (PSF)&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array.image&lt;/span&gt;
&lt;span class="n"&gt;psf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AOLLSMData/m4/psfs_z0p1/*.tif&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;  &lt;thead&gt;    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 2.48 MB &lt;/td&gt; &lt;td&gt; 827.39 kB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (3, 1, 101, 64, 64) &lt;/td&gt; &lt;td&gt; (1, 1, 101, 64, 64) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 6 Tasks &lt;/td&gt;&lt;td&gt; 3 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="402" height="208" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="0" y1="0" x2="27" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="0" y1="11" x2="27" y2="11" /&gt;
  &lt;line x1="0" y1="22" x2="27" y2="22" /&gt;
  &lt;line x1="0" y1="33" x2="27" y2="33" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="0" y1="0" x2="0" y2="33" style="stroke-width:2" /&gt;
  &lt;line x1="27" y1="0" x2="27" y2="33" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="0.000000,0.000000 27.530335,0.000000 27.530335,33.941765 0.000000,33.941765" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="13.765167" y="53.941765" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;1&lt;/text&gt;
&lt;text x="47.530335" y="16.970882" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,47.530335,16.970882)"&gt;3&lt;/text&gt;&lt;/p&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="97" y1="0" x2="173" y2="76" style="stroke-width:2" /&gt;
  &lt;line x1="97" y1="82" x2="173" y2="158" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="97" y1="0" x2="97" y2="82" style="stroke-width:2" /&gt;
  &lt;line x1="173" y1="76" x2="173" y2="158" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="97.000000,0.000000 173.470588,76.470588 173.470588,158.846826 97.000000,82.376238" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="97" y1="0" x2="179" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="173" y1="76" x2="255" y2="76" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="97" y1="0" x2="173" y2="76" style="stroke-width:2" /&gt;
  &lt;line x1="179" y1="0" x2="255" y2="76" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="97.000000,0.000000 179.376238,0.000000 255.846826,76.470588 173.470588,76.470588" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="173" y1="76" x2="255" y2="76" style="stroke-width:2" /&gt;
  &lt;line x1="173" y1="158" x2="255" y2="158" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="173" y1="76" x2="173" y2="158" style="stroke-width:2" /&gt;
  &lt;line x1="255" y1="76" x2="255" y2="158" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="173.470588,76.470588 255.846826,76.470588 255.846826,158.846826 173.470588,158.846826" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="214.658707" y="178.846826" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;64&lt;/text&gt;
&lt;text x="275.846826" y="117.658707" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,275.846826,117.658707)"&gt;64&lt;/text&gt;
&lt;text x="125.235294" y="140.611532" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,125.235294,140.611532)"&gt;101&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Convert data to float32 for computation¶&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="n"&gt;imgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Note: the psf needs to be sampled with a voxel spacing&lt;/span&gt;
&lt;span class="c1"&gt;# consistent with the image&amp;#39;s sampling&lt;/span&gt;
&lt;span class="n"&gt;psf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
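&lt;p&gt;Note that &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;astype&lt;/span&gt;&lt;/code&gt; on a Dask array is lazy: it records the cast for each chunk without loading any data. A minimal sketch on toy data (not the original images):&lt;/p&gt;

```python
import numpy as np
import dask.array as da

# A minimal sketch on toy data: astype records the cast per chunk
# without computing anything until we ask for a result
x = da.ones((4, 4), chunks=(2, 2), dtype="uint16")
y = x.astype(np.float32)          # still lazy
total = float(y.sum().compute())  # computation happens here
```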
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Apply Richardson-Lucy Deconvolution¶&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;richardson_lucy_deconvolution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; Apply deconvolution to a single chunk of data &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;itk&lt;/span&gt;

    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# remove leading two length-one dimensions&lt;/span&gt;
    &lt;span class="n"&gt;psf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# remove leading two length-one dimensions&lt;/span&gt;

    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_view_from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# Convert to ITK object&lt;/span&gt;
    &lt;span class="n"&gt;kernel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_view_from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Convert to ITK object&lt;/span&gt;

    &lt;span class="n"&gt;deconvolved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;richardson_lucy_deconvolution_image_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;kernel_image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;number_of_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array_from_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deconvolved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Convert back to Numpy array&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Add back the leading length-one dimensions&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;richardson_lucy_deconvolution&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Create a local cluster of dask worker processes&lt;/span&gt;
&lt;span class="c1"&gt;# (this could also point to a distributed cluster if you have it)&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threads_per_process&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# now dask operations use this cluster by default&lt;/span&gt;

&lt;span class="c1"&gt;# Trigger computation and store&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_zarr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AOLLSMData_m4_raw.zarr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;deconvolved&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overwrite&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;So in the example above we …&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Load data both from Zarr and TIFF files into multi-chunked Dask arrays&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Construct a function to apply an ITK routine onto each chunk&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply that function across the dask array with the &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html#dask.array.core.map_blocks"&gt;dask.array.map_blocks&lt;/a&gt; function.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Store the result back into Zarr format&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;From the perspective of an imaging scientist,
the new piece of technology here is the
&lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html#dask.array.core.map_blocks"&gt;dask.array.map_blocks&lt;/a&gt; function.
Given a Dask array composed of many NumPy arrays and a function, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; applies that function across each block in parallel, returning a Dask array as a result.
It’s a great tool whenever you want to apply an operation across many blocks in a simple fashion.
Because Dask arrays are just made out of Numpy arrays, it’s an easy way to
compose Dask with the rest of the scientific Python ecosystem.&lt;/p&gt;
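&lt;p&gt;For intuition, here is a tiny self-contained example of the same pattern on toy data, with a doubling function standing in for the deconvolution step:&lt;/p&gt;

```python
import numpy as np
import dask.array as da

# A toy stand-in for the deconvolution step: map_blocks applies a
# NumPy-in, NumPy-out function to every chunk in parallel
def double(block):
    return block * 2

x = da.ones((4, 4), chunks=(2, 2))           # four 2x2 NumPy blocks
y = da.map_blocks(double, x, dtype=x.dtype)  # lazy, chunk-wise
result = y.compute()                         # each block doubled
```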
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/09/image-itk.md&lt;/span&gt;, line 459)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="building-the-right-function"&gt;
&lt;h1&gt;Building the right function&lt;/h1&gt;
&lt;p&gt;However, in this case there are a few challenges to constructing the right Numpy
-&amp;gt; Numpy function, due to idiosyncrasies in both ITK and Dask Array. Let’s
look at our function again:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;richardson_lucy_deconvolution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; Apply deconvolution to a single chunk of data &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;itk&lt;/span&gt;

    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# remove leading two length-one dimensions&lt;/span&gt;
    &lt;span class="n"&gt;psf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# remove leading two length-one dimensions&lt;/span&gt;

    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_view_from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# Convert to ITK object&lt;/span&gt;
    &lt;span class="n"&gt;kernel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_view_from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Convert to ITK object&lt;/span&gt;

    &lt;span class="n"&gt;deconvolved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;richardson_lucy_deconvolution_image_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;kernel_image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;number_of_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array_from_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deconvolved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Convert back to Numpy array&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Add back the leading length-one dimensions&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;richardson_lucy_deconvolution&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is longer than we would like.
Instead, we would have preferred to just use the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;itk&lt;/span&gt;&lt;/code&gt; function directly,
without all of the steps before and after.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;deconvolved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;richardson_lucy_deconvolution_image_filter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;What were the extra steps in our function and why were they necessary?&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Convert to and from ITK Image objects&lt;/strong&gt;: ITK functions don’t consume and
produce Numpy arrays; they consume and produce their own &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Image&lt;/span&gt;&lt;/code&gt; data
structure. There are convenient functions to convert back and forth,
so handling this is straightforward, but it does need to be handled each
time. See &lt;a class="reference external" href="https://github.com/InsightSoftwareConsortium/ITK/issues/1136"&gt;ITK #1136&lt;/a&gt; for a
feature request that would remove the need for this step.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unpack and pack singleton dimensions&lt;/strong&gt;: Our Dask arrays have shapes like
the following:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;Array&lt;/span&gt; &lt;span class="n"&gt;Shape&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;199&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Chunk&lt;/span&gt; &lt;span class="n"&gt;Shape&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;So our &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; function gets NumPy arrays of the chunk size,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;(1,&lt;/span&gt; &lt;span class="pre"&gt;1,&lt;/span&gt; &lt;span class="pre"&gt;201,&lt;/span&gt; &lt;span class="pre"&gt;1024,&lt;/span&gt; &lt;span class="pre"&gt;768)&lt;/span&gt;&lt;/code&gt;.
However, our ITK functions are meant to work on 3d arrays, not 5d arrays,
so we need to remove those first two dimensions.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# remove leading two length-one dimensions&lt;/span&gt;
&lt;span class="n"&gt;psf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# remove leading two length-one dimensions&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And then when we’re done, Dask expects to get back 5d arrays like what it
provided, so we add these singleton dimensions back in&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Add back the leading length-one dimensions&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Again, this is straightforward for users who are accustomed to NumPy
slicing syntax, but does need to be done each time.
This adds some friction to our development process,
and is another step that can confuse users.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
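&lt;p&gt;The unpacking and repacking is plain NumPy slicing and is easy to check in isolation. A minimal sketch with a dummy chunk (the shapes here are illustrative, not from the dataset above):&lt;/p&gt;

```python
import numpy as np

# A dummy chunk shaped like what map_blocks hands our function:
# two leading length-one axes, then a small 3d image
chunk = np.zeros((1, 1, 8, 16, 16), dtype=np.float32)

core = chunk[0, 0, ...]           # drop the two leading length-one axes -> 3d
restored = core[None, None, ...]  # add them back so the output matches the input
```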
&lt;p&gt;But if you’re comfortable working around issues like these,
then ITK and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; make a powerful combination
for parallelizing ITK operations across a cluster.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/09/image-itk.md&lt;/span&gt;, line 541)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="defining-a-dask-cluster"&gt;
&lt;h1&gt;Defining a Dask Cluster&lt;/h1&gt;
&lt;p&gt;Above we used &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.distributed.LocalCluster&lt;/span&gt;&lt;/code&gt; to set up 20 single-threaded
workers on our local workstation:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threads_per_process&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# now dask operations use this cluster by default&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you had a distributed resource, this is where you would connect it.
You would swap out &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LocalCluster&lt;/span&gt;&lt;/code&gt; with one of
&lt;a class="reference external" href="https://docs.dask.org/en/latest/setup.html"&gt;Dask’s other deployment options&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Also, we found that we needed to use many single-threaded processes rather than
one multi-threaded process because ITK functions seem to still hold onto the
GIL. This is fine; we just need to be aware of it so that we set up our Dask
workers appropriately with one thread per process for maximum efficiency.
See &lt;a class="reference external" href="https://github.com/InsightSoftwareConsortium/ITK/issues/1134"&gt;ITK #1134&lt;/a&gt;
for an active Github issue on this topic.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/09/image-itk.md&lt;/span&gt;, line 563)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="serialization"&gt;
&lt;h1&gt;Serialization&lt;/h1&gt;
&lt;p&gt;We had some difficulty when using the ITK library across multiple processes,
because the library itself didn’t serialize well. (If you don’t understand
what that means, don’t worry). We solved a bit of this in
&lt;a class="reference external" href="https://github.com/InsightSoftwareConsortium/ITK/pull/1090"&gt;ITK #1090&lt;/a&gt;,
but some issues still remain.&lt;/p&gt;
&lt;p&gt;We got around this by including the import in the function rather than outside
of it.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;richardson_lucy_deconvolution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;itk&lt;/span&gt;   &lt;span class="c1"&gt;# &amp;lt;--- we work around serialization issues by importing within the function&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;That way each task imports itk individually, and we sidestep this issue.&lt;/p&gt;
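&lt;p&gt;The pattern itself is simple to sketch. Below, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;math&lt;/span&gt;&lt;/code&gt; stands in for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;itk&lt;/span&gt;&lt;/code&gt;; the point is that the import happens inside the function body, so the function that Dask ships to workers never captures the (potentially unserializable) module object:&lt;/p&gt;

```python
def process_chunk(x):
    # Import inside the function rather than at module scope: the task only
    # carries the function's code, and each worker performs the import
    # locally when the task actually runs.
    import math  # stand-in for `import itk`
    return math.sqrt(x)

result = process_chunk(4.0)
```

&lt;p&gt;Dask serializes functions with cloudpickle rather than plain pickle, but the principle is the same: keep hard-to-serialize objects out of the function’s captured state.&lt;/p&gt;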
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/09/image-itk.md&lt;/span&gt;, line 581)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="trying-scikit-image"&gt;
&lt;h1&gt;Trying Scikit-Image&lt;/h1&gt;
&lt;p&gt;We also tried out the Richardson Lucy deconvolution operation in
&lt;a class="reference external" href="https://scikit-image.org/"&gt;Scikit-Image&lt;/a&gt;. Scikit-Image is known for being
more Scipy/Numpy native, but not always as fast as ITK. Our experience
confirmed this perception.&lt;/p&gt;
&lt;p&gt;First, we were glad to see that the scikit-image function worked with
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; immediately without any packing/unpacking, dimensionality, or
serialization issues:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;skimage.restoration&lt;/span&gt;

&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skimage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;restoration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;richardson_lucy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# just works&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;So all of that converting to and from image objects or removing and adding
singleton dimensions isn’t necessary here.&lt;/p&gt;
&lt;p&gt;In terms of performance we were also happy to see that Scikit-Image released
the GIL, so we were able to get very high reported CPU utilization when using a
small number of multi-threaded processes. However, even though CPU utilization
was high, our parallel performance was poor enough that we stuck with the ITK
solution, warts and all. More information about this is available in
Github issue &lt;a class="reference external" href="https://github.com/scikit-image/scikit-image/issues/4083"&gt;scikit-image #4083&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note: sequentially on a single chunk, ITK ran in around 2 minutes while
scikit-image ran in 3 minutes. It was only once we started parallelizing that
things became slow.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Regardless, our goal in this experiment was to see how well ITK and Dask
array played together. It was nice to see what smooth integration looks like,
if only to motivate future development in ITK+Dask relations.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/09/image-itk.md&lt;/span&gt;, line 616)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="numba-gufuncs"&gt;
&lt;h1&gt;Numba GUFuncs&lt;/h1&gt;
&lt;p&gt;An alternative to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;da.map_blocks&lt;/span&gt;&lt;/code&gt; is the Generalized Universal Function (gufunc).
These are functions that have many magical properties, one of which is that
they operate equally well on both NumPy and Dask arrays. If libraries like
ITK or Scikit-Image make their functions into gufuncs then they work without
users having to do anything special.&lt;/p&gt;
&lt;p&gt;The easiest way to implement gufuncs today is with Numba. I did this on our
wrapped &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;richardson_lucy&lt;/span&gt;&lt;/code&gt; function, just to show how it could work, in case
other libraries want to take this on in the future.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numba&lt;/span&gt;

&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;guvectorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;float32[:,:,:], float32[:,:,:], float32[:,:,:]&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# we have to specify types&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;(i,j,k),(a,b,c)-&amp;gt;(i,j,k)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                          &lt;span class="c1"&gt;# and dimensionality explicitly&lt;/span&gt;
    &lt;span class="n"&gt;forceobj&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;richardson_lucy_deconvolution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# &amp;lt;---- no dimension unpacking!&lt;/span&gt;
    &lt;span class="n"&gt;iterations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_view_from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ascontiguousarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;kernel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_view_from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ascontiguousarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;deconvolved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;richardson_lucy_deconvolution_image_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kernel_image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number_of_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array_from_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deconvolved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Now this function works natively on either NumPy or Dask arrays&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;richardson_lucy_deconvolution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;-- no map_blocks call!&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Note that we’ve lost both the dimension unpacking and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; call.
Our function now knows enough information about how it can broadcast that Dask
can do the parallelization without being told what to do explicitly.&lt;/p&gt;
&lt;p&gt;This adds some burden onto library maintainers,
but makes the user experience much smoother.&lt;/p&gt;
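&lt;p&gt;The core idea can also be sketched without Numba: NumPy’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;np.vectorize&lt;/span&gt;&lt;/code&gt; accepts a gufunc-style &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;signature=&lt;/span&gt;&lt;/code&gt; declaring the 3d core dimensions, and then loops over any leading dimensions automatically. The filter body here is a do-nothing placeholder, not real deconvolution, and the shapes are made up for illustration:&lt;/p&gt;

```python
import numpy as np

def _filter(img, psf):
    # placeholder body: a real version would deconvolve img with psf
    return img.astype(np.float32)

# Declare the 3d core dimensions; leading dimensions are looped over
filter_gufunc = np.vectorize(_filter, signature="(i,j,k),(a,b,c)->(i,j,k)")

imgs = np.ones((2, 3, 4, 5, 6), dtype=np.float32)  # two leading loop dims
psf = np.ones((4, 5, 6), dtype=np.float32)

out = filter_gufunc(imgs, psf)  # no manual unpacking of the leading dims
```

&lt;p&gt;Unlike Numba, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;np.vectorize&lt;/span&gt;&lt;/code&gt; loops in Python, so this illustrates the contract rather than the performance; for Dask arrays specifically, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;da.apply_gufunc&lt;/span&gt;&lt;/code&gt; applies signature-annotated functions chunk-wise.&lt;/p&gt;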
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/09/image-itk.md&lt;/span&gt;, line 658)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="gpu-acceleration"&gt;
&lt;h1&gt;GPU Acceleration&lt;/h1&gt;
&lt;p&gt;When doing some user research on image processing and Dask, almost everyone we
interviewed said that they wanted faster deconvolution. This seemed to be a
major pain point. Now we know why. It’s both very common, and &lt;em&gt;very slow&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Running deconvolution on a single chunk of this size takes around 2-4 minutes,
and we have hundreds of chunks in a single dataset. Multi-core parallelism can
help a bit here, but this problem may also be ripe for GPU acceleration.
Similar operations typically have 100x speedups on GPUs. This might be a more
pragmatic solution than scaling out to large distributed clusters.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/09/image-itk.md&lt;/span&gt;, line 670)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="what-s-next"&gt;
&lt;h1&gt;What’s next?&lt;/h1&gt;
&lt;p&gt;This experiment both …&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gives us an example&lt;/strong&gt; that other imaging scientists
can copy and modify to be effective with Dask and ITK together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Highlights areas of improvement&lt;/strong&gt; where developers from the different
libraries can work to remove some of these rough interactions spots in the
future.&lt;/p&gt;
&lt;p&gt;It’s worth noting that Dask has done this with lots of libraries within the
Scipy ecosystem, including Pandas, Scikit-Image, Scikit-Learn, and others.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We’re also going to continue with our imaging experiment, while these technical
issues get worked out in the background. Next up, segmentation!&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/08/09/image-itk/"/>
    <category term="imaging" label="imaging"/>
    <published>2019-08-09T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/08/05/user-survey/</id>
    <title>2019 Dask User Survey</title>
    <updated>2019-08-05T00:00:00+00:00</updated>
    <author>
      <name>Tom Augspurger</name>
    </author>
    <content type="html">&lt;style type="text/css"&gt;
table td {
    background: none;
}

table tr.even td {
    background: none;
}

table {
    text-shadow: none;
}

&lt;/style&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/05/user-survey.md&lt;/span&gt;, line 25)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="dask-user-survey-results"&gt;

&lt;p&gt;This notebook presents the results of the 2019 Dask User Survey,
which ran earlier this summer. Thanks to everyone who took the time to fill out the survey!
These results help us better understand the Dask community and will guide future development efforts.&lt;/p&gt;
&lt;p&gt;The raw data, as well as the start of an analysis, can be found in this binder:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://mybinder.org/v2/gh/dask/dask-examples/main?urlpath=%2Ftree%2Fsurveys%2F2019.ipynb"&gt;&lt;img alt="Binder" src="https://mybinder.org/badge_logo.svg" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Let us know if you find anything in the data.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/05/user-survey.md&lt;/span&gt;, line 37)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="highlights"&gt;
&lt;h1&gt;Highlights&lt;/h1&gt;
&lt;p&gt;We had 259 responses to the survey. Overall, we found that the survey respondents really care about improved documentation, ease of use (including ease of deployment), and scaling. While Dask brings together many different communities (big arrays versus big dataframes, traditional HPC users versus cloud-native resource managers), there was general agreement about what is most important for Dask.&lt;/p&gt;
&lt;p&gt;Now we’ll go through some individual questions, highlighting particularly interesting results.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/05/user-survey.md&lt;/span&gt;, line 43)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="how-do-you-use-dask"&gt;
&lt;h1&gt;How do you use Dask?&lt;/h1&gt;
&lt;p&gt;For learning resources, almost every respondent uses the documentation.&lt;/p&gt;
&lt;p&gt;&lt;img alt="svg" src="https://blog.dask.org/_images/analyze_4_0.svg" /&gt;&lt;/p&gt;
&lt;p&gt;Most respondents use Dask at least occasionally. Fortunately we had a decent number of respondents who are just looking into Dask, yet still spent the time to take the survey.&lt;/p&gt;
&lt;p&gt;&lt;img alt="svg" src="https://blog.dask.org/_images/analyze_6_0.svg" /&gt;&lt;/p&gt;
&lt;p&gt;I’m curious about how learning resource usage changes as users become more experienced. We might expect those just looking into Dask to start with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;examples.dask.org&lt;/span&gt;&lt;/code&gt;, where they can try out Dask without installing anything.&lt;/p&gt;
&lt;p&gt;&lt;img alt="svg" src="https://blog.dask.org/_images/analyze_8_0.svg" /&gt;&lt;/p&gt;
&lt;p&gt;Overall, documentation is still the leader across user groups.&lt;/p&gt;
&lt;p&gt;The usage of the &lt;a class="reference external" href="https://github.com/dask/dask-tutorial"&gt;Dask tutorial&lt;/a&gt; and the &lt;a class="reference internal" href="#examples.dask.org"&gt;&lt;span class="xref myst"&gt;dask examples&lt;/span&gt;&lt;/a&gt; are relatively consistent across groups. The primary difference between regular and new users is that regular users are more likely to engage on GitHub.&lt;/p&gt;
&lt;p&gt;From StackOverflow questions and GitHub issues, we have a vague idea about which parts of the library are used.
The survey shows that (for our respondents at least) DataFrame and Delayed are the most commonly used APIs.&lt;/p&gt;
&lt;p&gt;&lt;img alt="svg" src="https://blog.dask.org/_images/analyze_10_0.svg" /&gt;&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;About 65.49% of our respondests are using Dask on a Cluster.
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;But the majority of respondents &lt;em&gt;also&lt;/em&gt; use Dask on their laptop.
This highlights the importance of Dask scaling down, either for
prototyping with a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LocalCluster&lt;/span&gt;&lt;/code&gt;, or for out-of-core analysis
using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LocalCluster&lt;/span&gt;&lt;/code&gt; or one of the single-machine schedulers.&lt;/p&gt;
&lt;p&gt;&lt;img alt="svg" src="https://blog.dask.org/_images/analyze_13_0.svg" /&gt;&lt;/p&gt;
&lt;p&gt;Most respondents use Dask interactively, at least some of the time.&lt;/p&gt;
&lt;p&gt;&lt;img alt="svg" src="https://blog.dask.org/_images/analyze_15_0.svg" /&gt;&lt;/p&gt;
&lt;p&gt;Most respondents thought that more documentation and examples would be the most valuable improvements to the project. This is especially pronounced among new users. But even among those using Dask every day, more people thought that “More examples” is more valuable than “New features” or “Performance improvements”.&lt;/p&gt;
&lt;style  type="text/css" &gt;
    #T_820ef326_b488_11e9_ad41_186590cd1c87row0_col0 {
            background-color:  #3b92c1;
            color:  #000000;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row0_col1 {
            background-color:  #b4c4df;
            color:  #000000;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row0_col2 {
            background-color:  #dad9ea;
            color:  #000000;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row0_col3 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row0_col4 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row1_col0 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row1_col1 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row1_col2 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row1_col3 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row1_col4 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row2_col0 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row2_col1 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row2_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row2_col3 {
            background-color:  #1b7eb7;
            color:  #000000;
        }    #T_820ef326_b488_11e9_ad41_186590cd1c87row2_col4 {
            background-color:  #589ec8;
            color:  #000000;
        }&lt;/style&gt;&lt;table id="T_820ef326_b488_11e9_ad41_186590cd1c87" &gt;&lt;caption&gt;Normalized by row. Darker means that a higher proportion of users with that usage frequency prefer that priority.&lt;/caption&gt;&lt;thead&gt;    &lt;tr&gt;        &lt;th class="index_name level0" &gt;Which would help you most right now?&lt;/th&gt;        &lt;th class="col_heading level0 col0" &gt;Bug fixes&lt;/th&gt;        &lt;th class="col_heading level0 col1" &gt;More documentation&lt;/th&gt;        &lt;th class="col_heading level0 col2" &gt;More examples in my field&lt;/th&gt;        &lt;th class="col_heading level0 col3" &gt;New features&lt;/th&gt;        &lt;th class="col_heading level0 col4" &gt;Performance improvements&lt;/th&gt;    &lt;/tr&gt;    &lt;tr&gt;        &lt;th class="index_name level0" &gt;How often do you use Dask?&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;    &lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;            &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87level0_row0&amp;quot; class=&amp;quot;row_heading level0 row0&amp;quot; &amp;gt;Every day&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row0_col0&amp;quot; class=&amp;quot;data row0 col0&amp;quot; &amp;gt;9&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row0_col1&amp;quot; class=&amp;quot;data row0 col1&amp;quot; &amp;gt;11&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row0_col2&amp;quot; class=&amp;quot;data row0 col2&amp;quot; &amp;gt;25&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row0_col3&amp;quot; class=&amp;quot;data row0 col3&amp;quot; &amp;gt;22&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row0_col4&amp;quot; class=&amp;quot;data row0 col4&amp;quot; &amp;gt;23&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87level0_row1&amp;quot; class=&amp;quot;row_heading level0 row1&amp;quot; &amp;gt;Just looking for now&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row1_col0&amp;quot; class=&amp;quot;data row1 col0&amp;quot; &amp;gt;1&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row1_col1&amp;quot; class=&amp;quot;data row1 col1&amp;quot; &amp;gt;3&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row1_col2&amp;quot; class=&amp;quot;data row1 col2&amp;quot; &amp;gt;18&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row1_col3&amp;quot; class=&amp;quot;data row1 col3&amp;quot; &amp;gt;9&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row1_col4&amp;quot; class=&amp;quot;data row1 col4&amp;quot; &amp;gt;5&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87level0_row2&amp;quot; class=&amp;quot;row_heading level0 row2&amp;quot; &amp;gt;Occasionally&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row2_col0&amp;quot; class=&amp;quot;data row2 col0&amp;quot; &amp;gt;14&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row2_col1&amp;quot; class=&amp;quot;data row2 col1&amp;quot; &amp;gt;27&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row2_col2&amp;quot; class=&amp;quot;data row2 col2&amp;quot; &amp;gt;52&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row2_col3&amp;quot; class=&amp;quot;data row2 col3&amp;quot; &amp;gt;18&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_820ef326_b488_11e9_ad41_186590cd1c87row2_col4&amp;quot; class=&amp;quot;data row2 col4&amp;quot; &amp;gt;15&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;&amp;lt;/table&amp;gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Perhaps users of certain Dask APIs feel differently from the group as a whole? We perform a similar analysis grouped by API use, rather than frequency of use.&lt;/p&gt;
&lt;style  type="text/css" &gt;
    #T_821479f4_b488_11e9_ad41_186590cd1c87row0_col0 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row0_col1 {
            background-color:  #cacee5;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row0_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row0_col3 {
            background-color:  #f1ebf4;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row0_col4 {
            background-color:  #c4cbe3;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row1_col0 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row1_col1 {
            background-color:  #3b92c1;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row1_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row1_col3 {
            background-color:  #62a2cb;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row1_col4 {
            background-color:  #bdc8e1;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row2_col0 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row2_col1 {
            background-color:  #c2cbe2;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row2_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row2_col3 {
            background-color:  #94b6d7;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row2_col4 {
            background-color:  #e0dded;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row3_col0 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row3_col1 {
            background-color:  #e6e2ef;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row3_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row3_col3 {
            background-color:  #ced0e6;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row3_col4 {
            background-color:  #c5cce3;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row4_col0 {
            background-color:  #dedcec;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row4_col1 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row4_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row4_col3 {
            background-color:  #1c7fb8;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row4_col4 {
            background-color:  #73a9cf;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row5_col0 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row5_col1 {
            background-color:  #b4c4df;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row5_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row5_col3 {
            background-color:  #b4c4df;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row5_col4 {
            background-color:  #eee9f3;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row6_col0 {
            background-color:  #faf2f8;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row6_col1 {
            background-color:  #e7e3f0;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row6_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row6_col3 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_821479f4_b488_11e9_ad41_186590cd1c87row6_col4 {
            background-color:  #f4eef6;
            color:  #000000;
        }&lt;/style&gt;&lt;table id="T_821479f4_b488_11e9_ad41_186590cd1c87" &gt;&lt;caption&gt;Normalized by row. Darker means that a higher proporiton of users of that API prefer that priority.&lt;/caption&gt;&lt;thead&gt;    &lt;tr&gt;        &lt;th class="index_name level0" &gt;Which would help you most right now?&lt;/th&gt;        &lt;th class="col_heading level0 col0" &gt;Bug fixes&lt;/th&gt;        &lt;th class="col_heading level0 col1" &gt;More documentation&lt;/th&gt;        &lt;th class="col_heading level0 col2" &gt;More examples in my field&lt;/th&gt;        &lt;th class="col_heading level0 col3" &gt;New features&lt;/th&gt;        &lt;th class="col_heading level0 col4" &gt;Performance improvements&lt;/th&gt;    &lt;/tr&gt;    &lt;tr&gt;        &lt;th class="index_name level0" &gt;Dask APIs&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;    &lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;            &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87level0_row0&amp;quot; class=&amp;quot;row_heading level0 row0&amp;quot; &amp;gt;Array&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row0_col0&amp;quot; class=&amp;quot;data row0 col0&amp;quot; &amp;gt;10&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row0_col1&amp;quot; class=&amp;quot;data row0 col1&amp;quot; &amp;gt;24&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row0_col2&amp;quot; class=&amp;quot;data row0 col2&amp;quot; &amp;gt;62&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row0_col3&amp;quot; class=&amp;quot;data row0 col3&amp;quot; &amp;gt;15&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row0_col4&amp;quot; class=&amp;quot;data row0 col4&amp;quot; &amp;gt;25&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87level0_row1&amp;quot; class=&amp;quot;row_heading level0 row1&amp;quot; &amp;gt;Bag&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row1_col0&amp;quot; class=&amp;quot;data row1 col0&amp;quot; &amp;gt;3&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row1_col1&amp;quot; class=&amp;quot;data row1 col1&amp;quot; &amp;gt;11&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row1_col2&amp;quot; class=&amp;quot;data row1 col2&amp;quot; &amp;gt;16&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row1_col3&amp;quot; class=&amp;quot;data row1 col3&amp;quot; &amp;gt;10&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row1_col4&amp;quot; class=&amp;quot;data row1 col4&amp;quot; &amp;gt;7&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87level0_row2&amp;quot; class=&amp;quot;row_heading level0 row2&amp;quot; &amp;gt;DataFrame&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row2_col0&amp;quot; class=&amp;quot;data row2 col0&amp;quot; &amp;gt;16&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row2_col1&amp;quot; class=&amp;quot;data row2 col1&amp;quot; &amp;gt;32&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row2_col2&amp;quot; class=&amp;quot;data row2 col2&amp;quot; &amp;gt;71&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row2_col3&amp;quot; class=&amp;quot;data row2 col3&amp;quot; &amp;gt;39&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row2_col4&amp;quot; class=&amp;quot;data row2 col4&amp;quot; &amp;gt;26&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87level0_row3&amp;quot; class=&amp;quot;row_heading level0 row3&amp;quot; &amp;gt;Delayed&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row3_col0&amp;quot; class=&amp;quot;data row3 col0&amp;quot; &amp;gt;16&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row3_col1&amp;quot; class=&amp;quot;data row3 col1&amp;quot; &amp;gt;22&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row3_col2&amp;quot; class=&amp;quot;data row3 col2&amp;quot; &amp;gt;55&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row3_col3&amp;quot; class=&amp;quot;data row3 col3&amp;quot; &amp;gt;26&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row3_col4&amp;quot; class=&amp;quot;data row3 col4&amp;quot; &amp;gt;27&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87level0_row4&amp;quot; class=&amp;quot;row_heading level0 row4&amp;quot; &amp;gt;Futures&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row4_col0&amp;quot; class=&amp;quot;data row4 col0&amp;quot; &amp;gt;12&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row4_col1&amp;quot; class=&amp;quot;data row4 col1&amp;quot; &amp;gt;9&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row4_col2&amp;quot; class=&amp;quot;data row4 col2&amp;quot; &amp;gt;25&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row4_col3&amp;quot; class=&amp;quot;data row4 col3&amp;quot; &amp;gt;20&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row4_col4&amp;quot; class=&amp;quot;data row4 col4&amp;quot; &amp;gt;17&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87level0_row5&amp;quot; class=&amp;quot;row_heading level0 row5&amp;quot; &amp;gt;ML&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row5_col0&amp;quot; class=&amp;quot;data row5 col0&amp;quot; &amp;gt;5&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row5_col1&amp;quot; class=&amp;quot;data row5 col1&amp;quot; &amp;gt;11&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row5_col2&amp;quot; class=&amp;quot;data row5 col2&amp;quot; &amp;gt;23&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row5_col3&amp;quot; class=&amp;quot;data row5 col3&amp;quot; &amp;gt;11&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row5_col4&amp;quot; class=&amp;quot;data row5 col4&amp;quot; &amp;gt;7&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87level0_row6&amp;quot; class=&amp;quot;row_heading level0 row6&amp;quot; &amp;gt;Xarray&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row6_col0&amp;quot; class=&amp;quot;data row6 col0&amp;quot; &amp;gt;8&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row6_col1&amp;quot; class=&amp;quot;data row6 col1&amp;quot; &amp;gt;11&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row6_col2&amp;quot; class=&amp;quot;data row6 col2&amp;quot; &amp;gt;34&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row6_col3&amp;quot; class=&amp;quot;data row6 col3&amp;quot; &amp;gt;7&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_821479f4_b488_11e9_ad41_186590cd1c87row6_col4&amp;quot; class=&amp;quot;data row6 col4&amp;quot; &amp;gt;9&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;&amp;lt;/table&amp;gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Nothing really stands out. The “futures” users (whom we expect to be relatively advanced) may prioritize features and performance over documentation. But everyone agrees that more examples are the highest priority.&lt;/p&gt;
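The row-normalized tables above come from cross-tabulating two survey columns. A minimal sketch of that computation, using invented responses (the column names mirror the survey questions, but the data here is made up):

```python
import pandas as pd

# Invented responses; each row is one (hypothetical) survey respondent.
df = pd.DataFrame({
    "How often do you use Dask?": [
        "Every day", "Occasionally", "Every day",
        "Occasionally", "Just looking for now",
    ],
    "Which would help you most right now?": [
        "More examples in my field", "More documentation",
        "New features", "More examples in my field",
        "More examples in my field",
    ],
})

# Count respondents for each (frequency, priority) pair.
counts = pd.crosstab(
    df["How often do you use Dask?"],
    df["Which would help you most right now?"],
)

# Normalize so each row sums to 1: darker cells in the tables above
# correspond to larger shares within a row.
normalized = counts.div(counts.sum(axis=1), axis=0)

# With matplotlib installed, pandas can shade cells much like the tables here:
# normalized.style.background_gradient(cmap="PuBu", axis=1)
```

Normalizing by row (rather than by column) is what lets us compare a small group like daily users against a much larger group like occasional users.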
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/05/user-survey.md&lt;/span&gt;, line 325)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="common-feature-requests"&gt;
&lt;h1&gt;Common Feature Requests&lt;/h1&gt;
&lt;p&gt;For specific features, we made a list of things that we (as developers) thought might be important.&lt;/p&gt;
&lt;p&gt;&lt;img alt="svg" src="https://blog.dask.org/_images/analyze_22_0.svg" /&gt;&lt;/p&gt;
&lt;p&gt;The clearest standout is how many people thought “Better NumPy/Pandas support” was “most critical”. In hindsight, it’d be good to have a followup fill-in field to understand what each respondent meant by that. The parsimonious interpretation is “cover more of the NumPy / pandas API”.&lt;/p&gt;
&lt;p&gt;“Ease of deployment” had a high proportion of “critical to me”. Again in hindsight, I notice a bit of ambiguity. Does this mean people want Dask to be easier to deploy? Or does this mean that Dask, which they currently find easy to deploy, is critically important? Regardless, we can prioritize simplicity in deployment.&lt;/p&gt;
&lt;p&gt;Relatively few respondents care about things like “Managing many users”, though we expect that this would be relatively popular among system administrators, who are a smaller population.&lt;/p&gt;
&lt;p&gt;And of course, we have people pushing Dask to its limits for whom “Improving scaling” is critically important.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/05/user-survey.md&lt;/span&gt;, line 339)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="what-other-systems-do-you-use"&gt;
&lt;h1&gt;What other systems do you use?&lt;/h1&gt;
&lt;p&gt;A relatively high proportion of respondents use Python 3 (97% compared to 84% in the most recent &lt;a class="reference external" href="https://www.jetbrains.com/research/python-developers-survey-2018/"&gt;Python Developers Survey&lt;/a&gt;).&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;3    97.29%
2     2.71%
Name: Python 2 or 3?, dtype: object
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
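The percentage breakdown above is a normalized value count. A minimal sketch with made-up answers (the real survey split was 97.29% / 2.71%):

```python
import pandas as pd

# Made-up answers to "Python 2 or 3?"; the real survey split was 97.29% / 2.71%.
answers = pd.Series(["3", "3", "3", "2", "3"], name="Python 2 or 3?")

# Share of each answer as a percentage, formatted like the output above.
shares = answers.value_counts(normalize=True).mul(100)
formatted = shares.map("{:.2f}%".format)
```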
&lt;p&gt;We were a bit surprised to see that SSH is the most popular “cluster resource manager”.&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;SSH                                                       98
Kubernetes                                                73
HPC resource manager (SLURM, PBS, SGE, LSF or similar)    61
My workplace has a custom solution for this               23
I don&amp;#39;t know, someone else does this for me               16
Hadoop / Yarn / EMR                                       14
Name: If you use a cluster, how do you launch Dask? , dtype: int64
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;How does the choice of cluster resource manager compare with API usage?&lt;/p&gt;
&lt;style  type="text/css" &gt;
    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col0 {
            background-color:  #056faf;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col1 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col3 {
            background-color:  #034e7b;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col4 {
            background-color:  #2685bb;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col5 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col6 {
            background-color:  #f2ecf5;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col0 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col1 {
            background-color:  #f7f0f7;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col2 {
            background-color:  #0771b1;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col3 {
            background-color:  #0771b1;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col4 {
            background-color:  #c5cce3;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col5 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col6 {
            background-color:  #79abd0;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col0 {
            background-color:  #8bb2d4;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col1 {
            background-color:  #b4c4df;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col3 {
            background-color:  #589ec8;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col4 {
            background-color:  #eee9f3;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col5 {
            background-color:  #8bb2d4;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col6 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col0 {
            background-color:  #4c99c5;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col1 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col3 {
            background-color:  #056dac;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col4 {
            background-color:  #73a9cf;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col5 {
            background-color:  #d9d8ea;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col6 {
            background-color:  #f3edf5;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col0 {
            background-color:  #056ba9;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col1 {
            background-color:  #fff7fb;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col2 {
            background-color:  #023858;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col3 {
            background-color:  #1379b5;
            color:  #f1f1f1;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col4 {
            background-color:  #dfddec;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col5 {
            background-color:  #e8e4f0;
            color:  #000000;
        }    #T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col6 {
            background-color:  #f9f2f8;
            color:  #000000;
        }&lt;/style&gt;&lt;table id="T_8326d0f8_b488_11e9_ad41_186590cd1c87" &gt;&lt;thead&gt;    &lt;tr&gt;        &lt;th class="index_name level0" &gt;Dask APIs&lt;/th&gt;        &lt;th class="col_heading level0 col0" &gt;Array&lt;/th&gt;        &lt;th class="col_heading level0 col1" &gt;Bag&lt;/th&gt;        &lt;th class="col_heading level0 col2" &gt;DataFrame&lt;/th&gt;        &lt;th class="col_heading level0 col3" &gt;Delayed&lt;/th&gt;        &lt;th class="col_heading level0 col4" &gt;Futures&lt;/th&gt;        &lt;th class="col_heading level0 col5" &gt;ML&lt;/th&gt;        &lt;th class="col_heading level0 col6" &gt;Xarray&lt;/th&gt;    &lt;/tr&gt;    &lt;tr&gt;        &lt;th class="index_name level0" &gt;If you use a cluster, how do you launch Dask? &lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;        &lt;th class="blank" &gt;&lt;/th&gt;    &lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;            &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87level0_row0&amp;quot; class=&amp;quot;row_heading level0 row0&amp;quot; &amp;gt;Custom&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col0&amp;quot; class=&amp;quot;data row0 col0&amp;quot; &amp;gt;15&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col1&amp;quot; class=&amp;quot;data row0 col1&amp;quot; &amp;gt;6&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col2&amp;quot; class=&amp;quot;data row0 col2&amp;quot; &amp;gt;18&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col3&amp;quot; class=&amp;quot;data row0 col3&amp;quot; &amp;gt;17&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col4&amp;quot; class=&amp;quot;data row0 col4&amp;quot; &amp;gt;14&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col5&amp;quot; class=&amp;quot;data row0 col5&amp;quot; &amp;gt;6&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row0_col6&amp;quot; class=&amp;quot;data row0 col6&amp;quot; &amp;gt;7&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87level0_row1&amp;quot; class=&amp;quot;row_heading level0 row1&amp;quot; &amp;gt;HPC&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col0&amp;quot; class=&amp;quot;data row1 col0&amp;quot; &amp;gt;50&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col1&amp;quot; class=&amp;quot;data row1 col1&amp;quot; &amp;gt;13&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col2&amp;quot; class=&amp;quot;data row1 col2&amp;quot; &amp;gt;40&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col3&amp;quot; class=&amp;quot;data row1 col3&amp;quot; &amp;gt;40&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col4&amp;quot; class=&amp;quot;data row1 col4&amp;quot; &amp;gt;22&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col5&amp;quot; class=&amp;quot;data row1 col5&amp;quot; &amp;gt;11&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row1_col6&amp;quot; class=&amp;quot;data row1 col6&amp;quot; &amp;gt;30&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87level0_row2&amp;quot; class=&amp;quot;row_heading level0 row2&amp;quot; &amp;gt;Hadoop / Yarn / EMR&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col0&amp;quot; class=&amp;quot;data row2 col0&amp;quot; &amp;gt;7&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col1&amp;quot; class=&amp;quot;data row2 col1&amp;quot; &amp;gt;6&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col2&amp;quot; class=&amp;quot;data row2 col2&amp;quot; &amp;gt;12&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col3&amp;quot; class=&amp;quot;data row2 col3&amp;quot; &amp;gt;8&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col4&amp;quot; class=&amp;quot;data row2 col4&amp;quot; &amp;gt;4&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col5&amp;quot; class=&amp;quot;data row2 col5&amp;quot; &amp;gt;7&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row2_col6&amp;quot; class=&amp;quot;data row2 col6&amp;quot; &amp;gt;3&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87level0_row3&amp;quot; class=&amp;quot;row_heading level0 row3&amp;quot; &amp;gt;Kubernetes&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col0&amp;quot; class=&amp;quot;data row3 col0&amp;quot; &amp;gt;40&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col1&amp;quot; class=&amp;quot;data row3 col1&amp;quot; &amp;gt;18&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col2&amp;quot; class=&amp;quot;data row3 col2&amp;quot; &amp;gt;56&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col3&amp;quot; class=&amp;quot;data row3 col3&amp;quot; &amp;gt;47&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col4&amp;quot; class=&amp;quot;data row3 col4&amp;quot; &amp;gt;37&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col5&amp;quot; class=&amp;quot;data row3 col5&amp;quot; &amp;gt;26&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row3_col6&amp;quot; class=&amp;quot;data row3 col6&amp;quot; &amp;gt;21&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
        &amp;lt;tr&amp;gt;
                    &amp;lt;th id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87level0_row4&amp;quot; class=&amp;quot;row_heading level0 row4&amp;quot; &amp;gt;SSH&amp;lt;/th&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col0&amp;quot; class=&amp;quot;data row4 col0&amp;quot; &amp;gt;61&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col1&amp;quot; class=&amp;quot;data row4 col1&amp;quot; &amp;gt;23&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col2&amp;quot; class=&amp;quot;data row4 col2&amp;quot; &amp;gt;72&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col3&amp;quot; class=&amp;quot;data row4 col3&amp;quot; &amp;gt;58&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col4&amp;quot; class=&amp;quot;data row4 col4&amp;quot; &amp;gt;32&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col5&amp;quot; class=&amp;quot;data row4 col5&amp;quot; &amp;gt;30&amp;lt;/td&amp;gt;
                    &amp;lt;td id=&amp;quot;T_8326d0f8_b488_11e9_ad41_186590cd1c87row4_col6&amp;quot; class=&amp;quot;data row4 col6&amp;quot; &amp;gt;25&amp;lt;/td&amp;gt;
        &amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;&amp;lt;/table&amp;gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;HPC users are relatively heavy users of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.array&lt;/span&gt;&lt;/code&gt; and xarray.&lt;/p&gt;
&lt;p&gt;Somewhat surprisingly, Dask’s heaviest users find Dask stable enough. Perhaps they’ve pushed past the bugs and found workarounds (percentages are normalized by row).&lt;/p&gt;
&lt;p&gt;&lt;img alt="svg" src="https://blog.dask.org/_images/analyze_32_0.svg" /&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/05/user-survey.md&lt;/span&gt;, line 525)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="takeaways"&gt;
&lt;h1&gt;Takeaways&lt;/h1&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;We should prioritize improving and expanding our documentation and examples. This may be
accomplished by Dask maintainers seeking examples from the community. Many of the examples
on &lt;a class="reference external" href="https://examples.dask.org"&gt;https://examples.dask.org&lt;/a&gt; were developed by domain specialists who use Dask.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved scaling to larger problems is important, but we shouldn’t
sacrifice the single-machine use case to get there.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Both interactive and batch workflows are important.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask’s various sub-communities are more similar than they are different.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Thanks again to all the respondents. We look forward to repeating this process to identify trends over time.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/08/05/user-survey/"/>
    <summary>Results and takeaways from the 2019 Dask user survey.</summary>
    <category term="UserSurvey" label="User Survey"/>
    <published>2019-08-05T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/08/02/dask-2.2/</id>
    <title>Dask Release 2.2.0</title>
    <updated>2019-08-02T00:00:00+00:00</updated>
    <content type="html">&lt;p&gt;I’m pleased to announce the release of Dask version 2.2.
This is a significant release with bug fixes and new features.
The last blogged release was 2.0 on 2019-06-22.
This blogpost outlines notable changes since the last post.&lt;/p&gt;
&lt;p&gt;You can conda install Dask:&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;conda install dask
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;or pip install from PyPI:&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;pip install dask[complete] --upgrade
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Full changelogs are available here:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/blob/master/docs/source/changelog.rst"&gt;dask/dask&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/distributed/blob/master/docs/source/changelog.rst"&gt;dask/distributed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/02/dask-2.2.md&lt;/span&gt;, line 26)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="notable-changes"&gt;

&lt;p&gt;As always, there are too many changes to list here;
instead we’ll highlight a few that readers may find interesting,
or that break old behavior.
In particular we discuss the following:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Parquet rewrite&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nicer HTML output for Clients and Logs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hyper-parameter selection with Hyperband in Dask-ML&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move bytes I/O handling out of Dask to FSSpec&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;async/await everywhere, and cleaner setup for developers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A new SSH deployment solution&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/02/dask-2.2.md&lt;/span&gt;, line 40)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="parquet-rewrite"&gt;
&lt;h1&gt;1 - Parquet Rewrite&lt;/h1&gt;
&lt;p&gt;Today Dask DataFrame can read and write Parquet data using either
&lt;a class="reference external" href="https://fastparquet.readthedocs.io"&gt;fastparquet&lt;/a&gt; or
&lt;a class="reference external" href="https://arrow.apache.org/"&gt;Apache Arrow&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/path/to/mydata.parquet&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;arrow&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# or&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/path/to/mydata.parquet&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fastparquet&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Supporting both libraries within Dask has been helpful for
users, but introduced some maintenance burden, especially given that each
library co-evolved with Dask DataFrame over the years. The contract between
Dask DataFrame and these libraries was convoluted, making it difficult to
evolve swiftly.&lt;/p&gt;
&lt;p&gt;To address this we’ve formalized what Dask expects of Parquet readers and writers
into an explicit Parquet Engine contract. This keeps maintenance costs
low, enables independent development for each project, and allows new
engines to emerge.&lt;/p&gt;
&lt;p&gt;A GPU-accelerated Parquet reader is already available in a PR on the &lt;a class="reference external" href="https://github.com/rapidsai/cudf/pull/2368"&gt;RAPIDS
cuDF&lt;/a&gt; library.&lt;/p&gt;
&lt;p&gt;As a result, we’ve also been able to fix a number of long-standing bugs, and
improve the functionality with both engines.&lt;/p&gt;
&lt;p&gt;Some fun quotes from &lt;a class="reference external" href="https://github.com/birdsarah"&gt;Sarah Bird&lt;/a&gt; during development&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;I am currently testing this. So far so good. I can load my dataset in a few seconds with 1800 partitions. Game changing!&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;and&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;I am now successfully working on a dataset with 74,000 partitions and no metadata.
Opening dataset and df.head() takes 7 - 30s. (Presumably depending on whether s3fs cache is cold or not). THIS IS HUGE! This was literally impossible before.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;The API remains the same, but functionality should be smoother.&lt;/p&gt;
&lt;p&gt;Thanks to &lt;a class="reference external" href="https://github.com/rjzamora"&gt;Rick Zamora&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/martindurant"&gt;Martin
Durant&lt;/a&gt; for doing most of the work here and to
&lt;a class="reference external" href="https://github.com/birdsarah"&gt;Sarah Bird&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/wesm"&gt;Wes
McKinney&lt;/a&gt;, and &lt;a class="reference external" href="https://github.com/mmccarty"&gt;Mike
McCarty&lt;/a&gt; for providing guidance and review.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/02/dask-2.2.md&lt;/span&gt;, line 88)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="nicer-html-output-for-clients-and-logs"&gt;
&lt;h1&gt;2 - Nicer HTML output for Clients and Logs&lt;/h1&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table style="border: 2px solid white;"&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 0px solid white"&gt;
&lt;h3 style="text-align: left;"&gt;Client&lt;/h3&gt;
&lt;ul style="text-align: left; list-style: none; margin: 0; padding: 0;"&gt;
  &lt;li&gt;&lt;b&gt;Scheduler: &lt;/b&gt;tcp://127.0.0.1:60275&lt;/li&gt;
  &lt;li&gt;&lt;b&gt;Dashboard: &lt;/b&gt;&lt;a href='http://127.0.0.1:8787/status' target='_blank'&gt;http://127.0.0.1:8787/status&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 0px solid white"&gt;
&lt;h3 style="text-align: left;"&gt;Cluster&lt;/h3&gt;
&lt;ul style="text-align: left; list-style:none; margin: 0; padding: 0;"&gt;
  &lt;li&gt;&lt;b&gt;Workers: &lt;/b&gt;4&lt;/li&gt;
  &lt;li&gt;&lt;b&gt;Cores: &lt;/b&gt;12&lt;/li&gt;
  &lt;li&gt;&lt;b&gt;Memory: &lt;/b&gt;17.18 GB&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div markdown="0"&gt;
&lt;details&gt;
&lt;summary&gt;Scheduler&lt;/summary&gt;
&lt;pre&gt;&lt;code&gt;distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:     tcp://127.0.0.1:60275
distributed.scheduler - INFO -   dashboard at:            127.0.0.1:8787
distributed.scheduler - INFO - Register tcp://127.0.0.1:60281
distributed.scheduler - INFO - Register tcp://127.0.0.1:60282
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:60281
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:60282
distributed.scheduler - INFO - Register tcp://127.0.0.1:60285
distributed.scheduler - INFO - Register tcp://127.0.0.1:60286
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:60285
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:60286
distributed.scheduler - INFO - Receive client connection: Client-6b6ba1d0-b3bd-11e9-9bd0-acde48001122&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;details&gt;
&lt;summary&gt;tcp://127.0.0.1:60281&lt;/summary&gt;
&lt;pre&gt;&lt;code&gt;distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:60281
distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:60281
distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:60275
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          3
distributed.worker - INFO -                Memory:                    4.29 GB
distributed.worker - INFO -       Local Directory: /Users/mrocklin/workspace/dask/dask-worker-space/worker-c4_44fym
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:60275
distributed.worker - INFO - -------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;details&gt;
&lt;summary&gt;tcp://127.0.0.1:60282&lt;/summary&gt;
&lt;pre&gt;&lt;code&gt;distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:60282
distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:60282
distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:60275
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          3
distributed.worker - INFO -                Memory:                    4.29 GB
distributed.worker - INFO -       Local Directory: /Users/mrocklin/workspace/dask/dask-worker-space/worker-quu4taje
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:60275
distributed.worker - INFO - -------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;details&gt;
&lt;summary&gt;tcp://127.0.0.1:60285&lt;/summary&gt;
&lt;pre&gt;&lt;code&gt;distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:60285
distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:60285
distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:60275
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          3
distributed.worker - INFO -                Memory:                    4.29 GB
distributed.worker - INFO -       Local Directory: /Users/mrocklin/workspace/dask/dask-worker-space/worker-ll4cozug
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:60275
distributed.worker - INFO - -------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;details&gt;&lt;summary&gt;tcp://127.0.0.1:60286&lt;/summary&gt;&lt;pre&gt;&lt;code&gt;distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:60286
distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:60286
distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:60275
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          3
distributed.worker - INFO -                Memory:                    4.29 GB
distributed.worker - INFO -       Local Directory: /Users/mrocklin/workspace/dask/dask-worker-space/worker-lpbkkzj6
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:60275
distributed.worker - INFO - -------------------------------------------------&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Note: this looks better under any browser other than IE and Edge&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Thanks to &lt;a class="reference external" href="https://github.com/jacobtomlinson"&gt;Jacob Tomlinson&lt;/a&gt; for this work.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/02/dask-2.2.md&lt;/span&gt;, line 191)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="hyperparameter-selection-with-hyperband"&gt;
&lt;h1&gt;3 - Hyperparameter selection with HyperBand&lt;/h1&gt;
&lt;p&gt;Dask-ML 1.0 has been released with a new &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; meta-estimator for
hyper-parameter optimization. It can be used as an alternative to
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;RandomizedSearchCV&lt;/span&gt;&lt;/code&gt;, finding comparable hyper-parameters in less
time by quickly abandoning unpromising candidates.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_ml.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HyperbandSearchCV&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_ml.datasets&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_classification&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.linear_model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SGDClassifier&lt;/span&gt;

&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_classification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;est&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SGDClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;param_dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;alpha&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;              &lt;span class="s1"&gt;&amp;#39;loss&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;hinge&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;log&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;modified_huber&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;squared_hinge&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;              &lt;span class="s1"&gt;&amp;#39;average&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HyperbandSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;est&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;param_dist&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params_&lt;/span&gt;
&lt;span class="go"&gt;{&amp;#39;loss&amp;#39;: &amp;#39;log&amp;#39;, &amp;#39;average&amp;#39;: False, &amp;#39;alpha&amp;#39;: 0.0080502}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Thanks to &lt;a class="reference external" href="http://github.com/stsievert"&gt;Scott Sievert&lt;/a&gt;.
You can see Scott talk about this topic in greater depth by watching his
&lt;a class="reference external" href="https://youtu.be/x67K9FiPFBQ"&gt;SciPy 2019 talk&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/02/dask-2.2.md&lt;/span&gt;, line 220)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="move-bytes-i-o-handling-out-of-dask-to-fsspec"&gt;
&lt;h1&gt;4 - Move bytes I/O handling out of Dask to FSSpec&lt;/h1&gt;
&lt;p&gt;We’ve spun Dask’s internal code for reading and writing raw data on different
storage systems out into a separate project, &lt;a class="reference external" href="https://fsspec.readthedocs.io"&gt;fsspec&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here is a small example:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;fsspec&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;fsspec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;https://github.com/dask/dask/edit/master/README.rst&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;fsspec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;s3://bucket/myfile.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;fsspec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;hdfs:///path/to/myfile.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;fsspec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;gcs://bucket/myfile.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Dask’s I/O infrastructure to read and write bytes from systems like
HDFS, S3, GCS, Azure, and other remote storage systems is arguably the most
uniform and comprehensive in Python today. Through tooling like
&lt;a class="reference external" href="https://s3fs.readthedocs.io"&gt;s3fs&lt;/a&gt;, &lt;a class="reference external" href="https://gcsfs.readthedocs.io"&gt;gcsfs&lt;/a&gt;,
and &lt;del&gt;hdfs3&lt;/del&gt; &lt;a class="reference external" href="https://arrow.apache.org/docs/python/filesystems.html"&gt;pyarrow.hdfs&lt;/a&gt;,
it’s easy to read and write data in a Pythonic way to a
variety of remote storage systems.&lt;/p&gt;
&lt;p&gt;Early on we decided that these filesystem interfaces should live outside the
mainline Dask codebase, which is why they are independent projects.
This choice allowed other libraries, like Pandas and Zarr, to benefit
from this work without a strict dependency on Dask.
However, there was still code within Dask that helped to unify them.&lt;/p&gt;
We’ve moved this code out to an external project,
&lt;a class="reference external" href="https://filesystem-spec.readthedocs.io/en/latest"&gt;fsspec&lt;/a&gt; which includes all
of the centralization code that Dask used to provide, as well as a formal
specification for what a remote data system should look like in order to be
compatible. This also helps to unify efforts with other projects like Arrow.&lt;/p&gt;
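The essence of such a specification is a registry that maps URL schemes to filesystem implementations, so a single open call works across backends. A stdlib-only sketch of that dispatch idea; the names here (FILESYSTEMS, open_url, MemoryFS) are hypothetical, not fsspec's real API.

```python
import io
from urllib.parse import urlparse

class MemoryFile(io.StringIO):
    """An in-memory file that persists its contents to a store on close."""

    def __init__(self, store, path):
        super().__init__()
        self._store, self._path = store, path

    def close(self):
        self._store[self._path] = self.getvalue()  # persist before closing
        super().close()

class MemoryFS:
    store = {}

    def open(self, path, mode="r"):
        if "w" in mode:
            return MemoryFile(self.store, path)
        return io.StringIO(self.store[path])

# The registry: scheme name -> filesystem implementation.
FILESYSTEMS = {"memory": MemoryFS}

def open_url(url, mode="r"):
    parsed = urlparse(url)
    fs = FILESYSTEMS[parsed.scheme]()  # dispatch on the URL scheme
    return fs.open(parsed.path, mode)

with open_url("memory:///data.csv", "w") as f:
    f.write("a,b\n1,2\n")
with open_url("memory:///data.csv") as f:
    assert f.read() == "a,b\n1,2\n"
```

Adding a new backend is then just registering another class that implements the same open interface, which is the contract fsspec formalizes for real storage systems.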
&lt;p&gt;Special thanks to &lt;a class="reference external" href="https://github.com/martindurant"&gt;Martin Durant&lt;/a&gt; for
shepherding Dask’s I/O infrastructure over the years, and for doing the more
immediate work of splitting out &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fsspec&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You can read more about FSSpec and its transition out of Dask
&lt;a class="reference external" href="https://blog.dask.org/2019/07/23/extracting-fsspec-from-dask"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/08/02/dask-2.2.md&lt;/span&gt;, line 269)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="async-await-everywhere-and-cleaner-setup-for-developers"&gt;
&lt;h1&gt;5 - Async/Await everywhere, and cleaner setup for developers&lt;/h1&gt;
&lt;p&gt;In Dask 2.0 we dropped Python 2 support and now support only Python 3.5 and
above.
This allows us to adopt async and await syntax for concurrent execution rather
than an older coroutine based approach with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;yield&lt;/span&gt;&lt;/code&gt;. The differences here
started out as largely aesthetic, but triggered a number of substantive
improvements as we walked through the codebase cleaning things up. Starting
and stopping internal Scheduler, Worker, Nanny, and Client objects is now far
more uniform, reducing the presence of subtle bugs.&lt;/p&gt;
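The machinery underneath this uniformity is the standard async context-manager protocol: objects that start in __aenter__ and stop in __aexit__ compose cleanly. A stdlib-only sketch of that pattern, where Server is a hypothetical stand-in rather than any real Dask class:

```python
import asyncio

class Server:
    """Toy object with an async start/stop lifecycle."""

    def __init__(self, name):
        self.name = name
        self.running = False

    async def start(self):
        await asyncio.sleep(0)  # stand-in for real startup I/O
        self.running = True

    async def stop(self):
        await asyncio.sleep(0)  # stand-in for real shutdown I/O
        self.running = False

    async def __aenter__(self):
        await self.start()
        return self

    async def __aexit__(self, *exc):
        await self.stop()

async def main():
    # Nested lifecycles compose: both servers start on entry and are
    # stopped in reverse order on exit, even if the body raises.
    async with Server("scheduler") as s, Server("worker") as w:
        assert s.running and w.running
    assert not s.running and not w.running

asyncio.run(main())
```

Writing Scheduler, Worker, Nanny, and Client against one lifecycle protocol is what lets them be started, nested, and torn down the same way everywhere.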
&lt;p&gt;This is discussed in more detail in the &lt;a class="reference external" href="https://docs.dask.org/en/latest/setup/python-advanced.html"&gt;Python API setup
documentation&lt;/a&gt; and
is encapsulated in this code example from those docs:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;asyncio&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Scheduler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;Scheduler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;w1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;w2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asynchronous&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;
                &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_until_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;As a result of this and other internal cleanup, intermittent test failures in
our CI have disappeared, and developer mood is high :)&lt;/p&gt;
&lt;/section&gt;
&lt;section id="a-new-sshcluster"&gt;
&lt;h1&gt;6 - A new SSHCluster&lt;/h1&gt;
&lt;p&gt;We’ve added a second SSH cluster deployment solution. It looks like this:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;distributed.deploy.ssh2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SSHCluster&lt;/span&gt;  &lt;span class="c1"&gt;# this will move in future releases&lt;/span&gt;

&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SSHCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;hosts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;host1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;host2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;host3&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;host4&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="c1"&gt;# hosts=[&amp;quot;localhost&amp;quot;] * 4  # if you want to try this out locally,&lt;/span&gt;
    &lt;span class="n"&gt;worker_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;nthreads&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;scheduler_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="n"&gt;connect_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;known_hosts&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Note that this object is experimental, and subject to change without notice&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We worked on this for two reasons:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Our user survey showed that a surprising number of people were deploying
Dask with SSH. Anecdotally they seem to be just SSHing into machines and
then using Dask’s normal &lt;a class="reference external" href="https://docs.dask.org/en/latest/setup/cli.html"&gt;Dask Command Line
Interface&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We wanted a solution that was easier than this.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We’ve been trying to unify the code in the various deployment solutions
(like Kubernetes, SLURM, Yarn/Hadoop) to a central codebase, and having a
simple SSHCluster as a test case has proven valuable for testing and
experimentation.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;em&gt;Also note that Dask already has a more mature
&lt;a class="reference external" href="https://docs.dask.org/en/latest/setup/ssh.html"&gt;dask-ssh&lt;/a&gt; solution today&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We expect that unification of deployment will be a central theme for the next
few months of development.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="acknowledgements"&gt;
&lt;h1&gt;Acknowledgements&lt;/h1&gt;
&lt;p&gt;There have been two releases since the last time we had a release blogpost.
The following people contributed to the following repositories since the 2.0
release on June 30th:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask"&gt;dask/dask&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Brett Naul&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daniel Saxton&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;David Brochart&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Davis Bennett&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elliott Sales de Andrade&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GALI PREM SAGAR&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;James Bourbeau&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jim Crist&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Loïc Estève&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Martin Durant&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthias Bussonnier&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Natalya Rapstine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nick Becker&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Peter Andreas Entschev&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ralf Gommers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Richard (Rick) Zamora&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sarah Bird&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sean McKenna&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tom Augspurger&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Willi Rath&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Xavier Holt&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;andrethrill&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;asmith26&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;msbrown47&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tshatrov&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/distributed"&gt;dask/distributed&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Christian Hudon&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gabriel Sailer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jacob Tomlinson&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;James Bourbeau&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jim Crist&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Martin Durant&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pierre Glaser&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Russ Bubley&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tjb900&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-jobqueue"&gt;dask/dask-jobqueue&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Guillaume Eynard-Bontemps&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leo Singer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Loïc Estève&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stuart Berg&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-examples"&gt;dask/dask-examples&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Chris White&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ian Rose&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-mpi"&gt;dask/dask-mpi&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Anderson Banihirwe&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kevin Paul&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-kubernetes"&gt;dask/dask-kubernetes&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tom Augspurger&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-ml"&gt;dask/dask-ml&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Roman Yurchak&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tom Augspurger&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-yarn"&gt;dask/dask-yarn&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Al Johri&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jim Crist&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/08/02/dask-2.2/"/>
    <summary>I’m pleased to announce the release of Dask version 2.2.
This is a significant release with bug fixes and new features.
The last blogged release was 2.0 on 2019-06-22.
This blogpost outlines notable changes since the last post.</summary>
    <category term="release" label="release"/>
    <published>2019-08-02T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/07/23/extracting-fsspec-from-dask/</id>
    <title>Extracting fsspec from Dask</title>
    <updated>2019-07-23T00:00:00+00:00</updated>
    <author>
      <name>Martin Durant</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/07/23/extracting-fsspec-from-dask.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="tl-dr"&gt;

&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fsspec&lt;/span&gt;&lt;/code&gt;, the new base for file system operations in Dask, Intake, s3fs, gcsfs and others,
is now available as a stand-alone interface and central place to develop new backends
and file operations. Although it was developed as part of Dask, you no longer need Dask
to use this functionality.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="introduction"&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Over the past few years, Dask’s IO capability has grown gradually and organically, to
include a number of file-formats, and the ability to access data seamlessly on various
remote/cloud data systems. This has been achieved through a number of sister packages
for viewing cloud resources as file systems, and dedicated code in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.bytes&lt;/span&gt;&lt;/code&gt;.
Some of the storage backends, particularly &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;s3fs&lt;/span&gt;&lt;/code&gt;, became immediately useful outside of
Dask too, and were picked up as optional dependencies by &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pandas&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;xarray&lt;/span&gt;&lt;/code&gt; and others.&lt;/p&gt;
&lt;p&gt;For the sake of consolidating the behaviours of the
various backends, providing a single reference specification for any new backends,
and to make this set of file system operations available even without Dask, I
created &lt;a class="reference external" href="https://filesystem-spec.readthedocs.io/en/latest/"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fsspec&lt;/span&gt;&lt;/code&gt;&lt;/a&gt;.
This last week, Dask changed to use &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fsspec&lt;/span&gt;&lt;/code&gt; directly for its
IO needs, and I would like to describe in detail here the benefits of this change.&lt;/p&gt;
&lt;p&gt;Although this was done initially to ease the maintenance burden, the important takeaway
is that we want to make file system operations easily available to the whole PyData ecosystem,
with or without Dask.&lt;/p&gt;
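To make that concrete, here is a minimal sketch of using fsspec on its own, without Dask (assuming fsspec is installed). The in-memory backend keeps the example self-contained; the same file-like API applies to any other backend.

```python
import fsspec

# Instantiate one of fsspec's built-in backends; "memory" keeps
# everything in-process, so this sketch needs no external services.
fs = fsspec.filesystem("memory")

# The same file-like API applies to any backend (s3, gcs, ftp, ssh, ...).
with fs.open("/demo/hello.txt", "wb") as f:
    f.write(b"hello fsspec")

data = fs.cat("/demo/hello.txt")
print(data)  # prints b'hello fsspec'
```

Swapping `"memory"` for `"s3"` or `"gcs"` (with the relevant sister package installed) leaves the rest of the code unchanged.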
&lt;/section&gt;
&lt;section id="history"&gt;
&lt;h1&gt;History&lt;/h1&gt;
&lt;p&gt;The first file system I wrote was &lt;a class="reference external" href="https://github.com/dask/hdfs3"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hdfs3&lt;/span&gt;&lt;/code&gt;&lt;/a&gt;, a thin wrapper
around the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;libhdfs3&lt;/span&gt;&lt;/code&gt; C library. At the time, Dask had acquired the ability to run on a
distributed cluster, and HDFS was the most popular storage solution for these (in the
commercial world, at least), so a solution was required. The python API closely matched
the C one, which in turn followed the Java API and posix standards. Fortunately, python already
has a &lt;a class="reference external" href="https://docs.python.org/3/library/io.html#i-o-base-classes"&gt;file-like standard&lt;/a&gt;, so
providing objects that implemented that was enough to make remote bytes available to many
packages.&lt;/p&gt;
&lt;p&gt;Pretty soon, it became apparent that cloud resources would be at least as important as in-cluster
file systems, and so followed &lt;a class="reference external" href="https://github.com/dask/s3fs"&gt;s3fs&lt;/a&gt;,
&lt;a class="reference external" href="https://github.com/Azure/azure-data-lake-store-python"&gt;adlfs&lt;/a&gt;, and &lt;a class="reference external" href="https://github.com/dask/gcsfs"&gt;gcsfs&lt;/a&gt;.
Each followed the same pattern, but with some specific code for the given interface, and
improvements based on the experience of the previous interfaces. During this time, Dask’s
needs also evolved, due to more complex file formats such as parquet. Code to interface to
the different backends and adapt their methods ended up in the Dask repository.&lt;/p&gt;
&lt;p&gt;In the meantime, other file system interfaces arrived, particularly
&lt;a class="reference external" href="https://arrow.apache.org/docs/python/filesystems.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pyarrow&lt;/span&gt;&lt;/code&gt;’s&lt;/a&gt;, which had its own HDFS
implementation and direct parquet reading. But we would like all of the tools in
the ecosystem to work together well, so that Dask can read parquet using either
engine from any of the storage backends.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="code-duplication"&gt;
&lt;h1&gt;Code duplication&lt;/h1&gt;
&lt;p&gt;Copying an interface, adapting it and releasing it, as I did with each iteration of the file system,
is certainly a quick way to get a job done. However, when you then want to change the behaviour, or
add new functionality, it turns out you need to repeat the work in each place
(violating the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself"&gt;DRY&lt;/a&gt; principle) or have
the interfaces diverge slowly. Good examples of this were &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;glob&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;walk&lt;/span&gt;&lt;/code&gt;: the former supported different
options across backends, and the latter returned different things (a list from some backends, a
dir/files iterator from others).&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LocalFileSystem&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;fs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/home/path/&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;iterator of tuples&amp;gt;&lt;/span&gt;


&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3fs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;S3FileSystme&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;fs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;bucket/path&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;[list of filenames]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We found that, for Dask’s needs, we needed to build small wrapper
classes to ensure compatible APIs to all backends, as well as a class for operating on the local
file system with the same interface, and finally a registry for all of these with various helper
functions. Very little of this was specific to Dask, with only a couple of
functions concerning themselves with building graphs and deferred execution. It did, however,
raise the important issue that file systems should be serializable and that there should
be a way to specify a file to be opened, which is also serializable (and ideally supports
transparent text and compression).&lt;/p&gt;
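That serializable "open a file" specification became fsspec's `OpenFile`. A minimal sketch, again using the in-memory backend so it runs anywhere fsspec is installed:

```python
import pickle

import fsspec

# An OpenFile is a serializable recipe for opening a file; nothing is
# actually opened until it is used as a context manager. Text mode and
# compression are handled transparently.
of = fsspec.open("memory://data.txt.gz", mode="wt", compression="gzip")
with of as f:
    f.write("some text")

# The recipe itself survives pickling, e.g. for shipping to workers.
of2 = pickle.loads(pickle.dumps(of))

with fsspec.open("memory://data.txt.gz", mode="rt", compression="gzip") as f:
    text = f.read()
print(text)  # prints: some text
```

The file is gzip-compressed on disk (well, in memory here), yet the code reads and writes plain text.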
&lt;/section&gt;
&lt;section id="new-file-systems"&gt;
&lt;h1&gt;New file systems&lt;/h1&gt;
&lt;p&gt;I already mentioned the effort to make a local file system class which met the same interface as
the other ones which already existed. But there are more options that Dask users (and others)
might want, such as ssh, ftp, http, in-memory, and so on. Following requests from users to handle these options,
we started to write more file system interfaces, which all lived within &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.bytes&lt;/span&gt;&lt;/code&gt;; but it was unclear
whether they should only support very minimal functionality, just enough to get something done from
Dask, or a full set of file operations.&lt;/p&gt;
&lt;p&gt;The in-memory file system, in particular, existed in an extremely long-lived PR - it’s not
clear how useful such a thing is to Dask, when each worker has its own memory, and so sees
a different state of the “file system”.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="consolidation"&gt;
&lt;h1&gt;Consolidation&lt;/h1&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/intake/filesystem_spec"&gt;file system Spec&lt;/a&gt;, later &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fsspec&lt;/span&gt;&lt;/code&gt;, was born out of a desire
to codify and consolidate the behaviours of the storage backends, reduce duplication, and provide the
same functionality to all backends. In the process, it became much easier to write new implementation
classes: see the &lt;a class="reference external" href="https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations"&gt;built-in implementations&lt;/a&gt;,
which include interesting and highly experimental options such as the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;CachingFileSystem&lt;/span&gt;&lt;/code&gt;, which
makes a local copy of every remote read, for faster access the second time around. More
mainstream implementations also took shape, such as FTP, SSH, Memory and webHDFS
(the latter being the best bet for accessing HDFS from outside the cluster, given all the
problems building and authenticating with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hdfs3&lt;/span&gt;&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Furthermore, the new repository gave the opportunity to implement new features, which would then have
further-reaching applicability than if they had been done in just selected repositories. Examples include
FUSE mounting, dictionary-style key-value views on file systems
(such as used by &lt;a class="reference external" href="https://zarr.readthedocs.io/en/stable/"&gt;zarr&lt;/a&gt;), and transactional writing of
files. All file systems are serializable and pyarrow-compliant.&lt;/p&gt;
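The key-value view is exposed through `fsspec.get_mapper`, which presents a location on any file system as a mutable mapping of bytes. A quick sketch with the in-memory backend:

```python
import fsspec

# get_mapper presents any file system location as a mutable mapping
# from string keys to bytes values (the interface zarr consumes).
m = fsspec.get_mapper("memory://store")
m["alpha"] = b"1"
m["beta/gamma"] = b"2"

print(sorted(m))   # keys are paths relative to the mapper's root
print(m["alpha"])
```

Because the mapping works identically over s3, gcs, or a local directory, libraries like zarr can store chunked arrays on any backend without backend-specific code.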
&lt;/section&gt;
&lt;section id="usefulness"&gt;
&lt;h1&gt;Usefulness&lt;/h1&gt;
&lt;p&gt;Eventually it dawned on me that the operations offered by the file system classes are very useful
for people not using Dask too. Indeed, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;s3fs&lt;/span&gt;&lt;/code&gt;, for example, sees plenty of use stand-alone, or in
conjunction with packages such as fastparquet and pandas, which can accept file-system objects in
their methods.&lt;/p&gt;
&lt;p&gt;So it seemed to make sense to have a particular repo to write out the spec that a Dask-compliant
file system should adhere to, and I found that I could factor out a lot of common behaviour from
the existing implementations, provide functionality that had existed in only some to all, and
generally improve every implementation along the way.&lt;/p&gt;
&lt;p&gt;However, it was when considering &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fsspec&lt;/span&gt;&lt;/code&gt; in conjunction with &lt;a class="reference external" href="https://github.com/intake/intake/pull/381"&gt;Intake&lt;/a&gt;
that I realised how generally useful a stand-alone file system package can be: the PR
implemented a generalised file selector that can browse files in any file system that we
have available, even being able, for instance, to view a remote zip-file on S3 as a
browseable file system. Note that, similar to the general thrust of this blog, the
file selector itself need not live in the Intake repo and will eventually become either
its own thing, or an optional feature of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fsspec&lt;/span&gt;&lt;/code&gt;. You shouldn’t need Intake either just
to get generalised file system operations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="final-thoughts"&gt;
&lt;h1&gt;Final Thoughts&lt;/h1&gt;
&lt;p&gt;This work is not quite on the level of “protocol standards” such as the well-known Python buffer
protocol, but I think it is a useful step in making data in various storage services available
to people, since you can operate on each with the same API, expect the same behaviour, and
create real python file-like objects to pass to other functions. Having a single central repo
like this offers an obvious place to discuss and amend the spec, and build extra functionality
onto it.&lt;/p&gt;
&lt;p&gt;Many improvements remain to be done, such as support for globstrings in more functions, or
a single file system which can dispatch to the various backends depending on the form of the
URL provided; but there is now an obvious place for all of this to happen.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/07/23/extracting-fsspec-from-dask/"/>
    <summary>fsspec, the new base for file system operations in Dask, Intake, s3fs, gcsfs and others, is now available as a stand-alone interface and central place to develop new backends and file operations.</summary>
    <category term="IO" label="IO"/>
    <published>2019-07-23T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/06/22/dask-2.0/</id>
    <title>Dask Release 2.0</title>
    <updated>2019-06-22T00:00:00+00:00</updated>
    <author>
      <name>the Dask Maintainers</name>
    </author>
    <content type="html">&lt;p&gt;&lt;em&gt;Please take the &lt;a class="reference external" href="https://t.co/OGrIjTLC2G"&gt;Dask User Survey for 2019&lt;/a&gt;.&lt;/em&gt;
&lt;em&gt;Your response helps to prioritize future work.&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;We are pleased to announce the release of Dask version 2.0.
This is a major release with bug fixes and new features.&lt;/p&gt;
&lt;p&gt;Most major version changes of software signal many new and exciting features.
That is not the case with this release.
Instead, we’re bumping the major version number because
we’ve broken a few APIs to improve maintainability,
and because we decided to drop support for Python 2.&lt;/p&gt;
&lt;p&gt;This blogpost outlines these changes.&lt;/p&gt;
&lt;section id="install"&gt;

&lt;p&gt;As always, you can conda install Dask:&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;conda install dask
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;or pip install from PyPI:&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;pip install &amp;quot;dask[complete]&amp;quot; --upgrade
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Full changelogs are available here:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/blob/master/docs/source/changelog.rst"&gt;dask/dask&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/distributed/blob/master/docs/source/changelog.rst"&gt;dask/distributed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="drop-support-for-python-2"&gt;
&lt;h1&gt;Drop support for Python 2&lt;/h1&gt;
&lt;p&gt;Python 2 reaches end of life in 2020, just six months away. Most major PyData
projects are dropping Python 2 support around now. See the &lt;a class="reference external" href="https://python3statement.org/"&gt;Python 3
Statement&lt;/a&gt; for more details about some of your
favorite projects.&lt;/p&gt;
&lt;p&gt;Python 2 users can continue to use older versions of Dask, which are in
widespread use today. Institutions looking for long term support of Dask in
Python 2 may wish to reach out to for-profit consulting companies, like
&lt;a class="reference external" href="https://www.quansight.com/"&gt;Quansight&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Dropping Python 2 will allow maintainers to spend more of their time fixing
bugs and developing new features. It will also allow the project to adopt more
modern development practices going forward.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/22/dask-2.0.md&lt;/span&gt;, line 57)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="small-breaking-changes"&gt;
&lt;h1&gt;Small breaking changes&lt;/h1&gt;
&lt;p&gt;Below is a brief description of most of the breaking changes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The distributed.bokeh module has moved to distributed.dashboard&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Various &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ncores&lt;/span&gt;&lt;/code&gt; keywords have been moved to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nthreads&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Client.map/gather/scatter no longer accept iterators and Python queue
objects. Users can handle this themselves with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;submit&lt;/span&gt;&lt;/code&gt;/&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;as_completed&lt;/span&gt;&lt;/code&gt; or
can use the &lt;a class="reference external" href="https://github.com/python-streamz/streamz"&gt;Streamz&lt;/a&gt; library.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The worker &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/main&lt;/span&gt;&lt;/code&gt; route has moved to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/status&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cluster.workers is now a dictionary mapping worker name to worker, rather
than a list as it was before&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
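For example, code that previously passed an iterator to Client.map can be rewritten with submit and as_completed. A minimal sketch, assuming an in-process cluster for illustration (real deployments would connect to a running scheduler):

```python
from dask.distributed import Client, as_completed

def double(x):
    return 2 * x

# Start a small in-process cluster for demonstration purposes
client = Client(processes=False, n_workers=1, threads_per_worker=1,
                dashboard_address=None)

# Instead of Client.map over an iterator, submit tasks one by one
futures = [client.submit(double, i) for i in range(4)]

# as_completed yields futures as they finish, in completion order
results = [f.result() for f in as_completed(futures)]
print(sorted(results))  # [0, 2, 4, 6]

client.close()
```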
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/22/dask-2.0.md&lt;/span&gt;, line 70)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="some-larger-fun-changes"&gt;
&lt;h1&gt;Some larger fun changes&lt;/h1&gt;
&lt;p&gt;We didn’t only break things. We also added some new things :)&lt;/p&gt;
&lt;section id="array-metadata"&gt;
&lt;h2&gt;Array metadata&lt;/h2&gt;
&lt;p&gt;Previously Dask Arrays were defined by their shape, chunk shape, and datatype
(float, int, and so on).&lt;/p&gt;
&lt;p&gt;Now, Dask Arrays also know the type of their chunks. Historically this was
almost always a NumPy array, so there was little need to store it, but now that
Dask Arrays are frequently used with sparse array chunks and GPU
array chunks, we maintain this information in a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;._meta&lt;/span&gt;&lt;/code&gt; attribute.
This is already how Dask DataFrames work, so it should be familiar to advanced
users of that module.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eye&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_meta&lt;/span&gt;
&lt;span class="go"&gt;array([], shape=(0, 0), dtype=float64)&lt;/span&gt;

&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sparse&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sparse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_numpy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_meta&lt;/span&gt;
&lt;span class="go"&gt;&amp;lt;COO: shape=(0, 0), dtype=float64, nnz=0, fill_value=0.0&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
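When Dask cannot infer the chunk type itself, it can be supplied explicitly via the meta= keyword accepted by functions such as map_blocks. A minimal sketch with plain NumPy chunks:

```python
import numpy as np
import dask.array as da

x = da.ones((4, 4), chunks=(2, 2))

# Pass meta= so Dask records the chunk type directly instead of
# inferring it by calling the function on an empty sample array
y = x.map_blocks(lambda block: 2 * block,
                 meta=np.empty((0, 0), dtype=np.float64))

print(type(y._meta))      # <class 'numpy.ndarray'>
print(y.sum().compute())  # 32.0
```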
&lt;p&gt;This work was largely done by &lt;a class="reference external" href="https://github.com/pentschev"&gt;Peter Entschev&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="array-html-output"&gt;
&lt;h2&gt;Array HTML output&lt;/h2&gt;
&lt;p&gt;Dask arrays now print themselves nicely in Jupyter notebooks, showing a table
of information about their size and chunk size, and also a visual diagram of
their chunk structure.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;  &lt;thead&gt;    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 80.00 GB &lt;/td&gt; &lt;td&gt; 125.00 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (10000, 1000, 1000) &lt;/td&gt; &lt;td&gt; (250, 250, 250) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 640 Tasks &lt;/td&gt;&lt;td&gt; 640 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; float64 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="241" height="231" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="127" y2="117" style="stroke-width:2" /&gt;
  &lt;line x1="10" y1="16" x2="127" y2="133" /&gt;
  &lt;line x1="10" y1="32" x2="127" y2="149" /&gt;
  &lt;line x1="10" y1="48" x2="127" y2="165" /&gt;
  &lt;line x1="10" y1="64" x2="127" y2="181" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="10" y2="64" style="stroke-width:2" /&gt;
  &lt;line x1="12" y1="2" x2="12" y2="67" /&gt;
  &lt;line x1="15" y1="5" x2="15" y2="70" /&gt;
  &lt;line x1="18" y1="8" x2="18" y2="73" /&gt;
  &lt;line x1="21" y1="11" x2="21" y2="76" /&gt;
  &lt;line x1="24" y1="14" x2="24" y2="79" /&gt;
  &lt;line x1="27" y1="17" x2="27" y2="81" /&gt;
  &lt;line x1="30" y1="20" x2="30" y2="84" /&gt;
  &lt;line x1="33" y1="23" x2="33" y2="87" /&gt;
  &lt;line x1="36" y1="26" x2="36" y2="90" /&gt;
  &lt;line x1="39" y1="29" x2="39" y2="93" /&gt;
  &lt;line x1="42" y1="32" x2="42" y2="96" /&gt;
  &lt;line x1="45" y1="35" x2="45" y2="99" /&gt;
  &lt;line x1="48" y1="38" x2="48" y2="102" /&gt;
  &lt;line x1="51" y1="41" x2="51" y2="105" /&gt;
  &lt;line x1="54" y1="44" x2="54" y2="108" /&gt;
  &lt;line x1="57" y1="47" x2="57" y2="111" /&gt;
  &lt;line x1="60" y1="50" x2="60" y2="114" /&gt;
  &lt;line x1="62" y1="52" x2="62" y2="117" /&gt;
  &lt;line x1="65" y1="55" x2="65" y2="120" /&gt;
  &lt;line x1="68" y1="58" x2="68" y2="123" /&gt;
  &lt;line x1="71" y1="61" x2="71" y2="126" /&gt;
  &lt;line x1="74" y1="64" x2="74" y2="129" /&gt;
  &lt;line x1="77" y1="67" x2="77" y2="131" /&gt;
  &lt;line x1="80" y1="70" x2="80" y2="134" /&gt;
  &lt;line x1="83" y1="73" x2="83" y2="137" /&gt;
  &lt;line x1="86" y1="76" x2="86" y2="140" /&gt;
  &lt;line x1="89" y1="79" x2="89" y2="143" /&gt;
  &lt;line x1="92" y1="82" x2="92" y2="146" /&gt;
  &lt;line x1="95" y1="85" x2="95" y2="149" /&gt;
  &lt;line x1="98" y1="88" x2="98" y2="152" /&gt;
  &lt;line x1="101" y1="91" x2="101" y2="155" /&gt;
  &lt;line x1="104" y1="94" x2="104" y2="158" /&gt;
  &lt;line x1="107" y1="97" x2="107" y2="161" /&gt;
  &lt;line x1="110" y1="100" x2="110" y2="164" /&gt;
  &lt;line x1="112" y1="102" x2="112" y2="167" /&gt;
  &lt;line x1="115" y1="105" x2="115" y2="170" /&gt;
  &lt;line x1="118" y1="108" x2="118" y2="173" /&gt;
  &lt;line x1="121" y1="111" x2="121" y2="176" /&gt;
  &lt;line x1="124" y1="114" x2="124" y2="179" /&gt;
  &lt;line x1="127" y1="117" x2="127" y2="181" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 127.647059,117.647059 127.647059,181.975164 10.000000,64.328105" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="74" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="12" y1="2" x2="77" y2="2" /&gt;
  &lt;line x1="15" y1="5" x2="80" y2="5" /&gt;
  &lt;line x1="18" y1="8" x2="83" y2="8" /&gt;
  &lt;line x1="21" y1="11" x2="86" y2="11" /&gt;
  &lt;line x1="24" y1="14" x2="89" y2="14" /&gt;
  &lt;line x1="27" y1="17" x2="91" y2="17" /&gt;
  &lt;line x1="30" y1="20" x2="94" y2="20" /&gt;
  &lt;line x1="33" y1="23" x2="97" y2="23" /&gt;
  &lt;line x1="36" y1="26" x2="100" y2="26" /&gt;
  &lt;line x1="39" y1="29" x2="103" y2="29" /&gt;
  &lt;line x1="42" y1="32" x2="106" y2="32" /&gt;
  &lt;line x1="45" y1="35" x2="109" y2="35" /&gt;
  &lt;line x1="48" y1="38" x2="112" y2="38" /&gt;
  &lt;line x1="51" y1="41" x2="115" y2="41" /&gt;
  &lt;line x1="54" y1="44" x2="118" y2="44" /&gt;
  &lt;line x1="57" y1="47" x2="121" y2="47" /&gt;
  &lt;line x1="60" y1="50" x2="124" y2="50" /&gt;
  &lt;line x1="62" y1="52" x2="127" y2="52" /&gt;
  &lt;line x1="65" y1="55" x2="130" y2="55" /&gt;
  &lt;line x1="68" y1="58" x2="133" y2="58" /&gt;
  &lt;line x1="71" y1="61" x2="136" y2="61" /&gt;
  &lt;line x1="74" y1="64" x2="139" y2="64" /&gt;
  &lt;line x1="77" y1="67" x2="141" y2="67" /&gt;
  &lt;line x1="80" y1="70" x2="144" y2="70" /&gt;
  &lt;line x1="83" y1="73" x2="147" y2="73" /&gt;
  &lt;line x1="86" y1="76" x2="150" y2="76" /&gt;
  &lt;line x1="89" y1="79" x2="153" y2="79" /&gt;
  &lt;line x1="92" y1="82" x2="156" y2="82" /&gt;
  &lt;line x1="95" y1="85" x2="159" y2="85" /&gt;
  &lt;line x1="98" y1="88" x2="162" y2="88" /&gt;
  &lt;line x1="101" y1="91" x2="165" y2="91" /&gt;
  &lt;line x1="104" y1="94" x2="168" y2="94" /&gt;
  &lt;line x1="107" y1="97" x2="171" y2="97" /&gt;
  &lt;line x1="110" y1="100" x2="174" y2="100" /&gt;
  &lt;line x1="112" y1="102" x2="177" y2="102" /&gt;
  &lt;line x1="115" y1="105" x2="180" y2="105" /&gt;
  &lt;line x1="118" y1="108" x2="183" y2="108" /&gt;
  &lt;line x1="121" y1="111" x2="186" y2="111" /&gt;
  &lt;line x1="124" y1="114" x2="189" y2="114" /&gt;
  &lt;line x1="127" y1="117" x2="191" y2="117" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="127" y2="117" style="stroke-width:2" /&gt;
  &lt;line x1="26" y1="0" x2="143" y2="117" /&gt;
  &lt;line x1="42" y1="0" x2="159" y2="117" /&gt;
  &lt;line x1="58" y1="0" x2="175" y2="117" /&gt;
  &lt;line x1="74" y1="0" x2="191" y2="117" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 74.328105,0.000000 191.975164,117.647059 127.647059,117.647059" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="127" y1="117" x2="191" y2="117" style="stroke-width:2" /&gt;
  &lt;line x1="127" y1="133" x2="191" y2="133" /&gt;
  &lt;line x1="127" y1="149" x2="191" y2="149" /&gt;
  &lt;line x1="127" y1="165" x2="191" y2="165" /&gt;
  &lt;line x1="127" y1="181" x2="191" y2="181" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="127" y1="117" x2="127" y2="181" style="stroke-width:2" /&gt;
  &lt;line x1="143" y1="117" x2="143" y2="181" /&gt;
  &lt;line x1="159" y1="117" x2="159" y2="181" /&gt;
  &lt;line x1="175" y1="117" x2="175" y2="181" /&gt;
  &lt;line x1="191" y1="117" x2="191" y2="181" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="127.647059,117.647059 191.975164,117.647059 191.975164,181.975164 127.647059,181.975164" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="159.811111" y="201.975164" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;1000&lt;/text&gt;
&lt;text x="211.975164" y="149.811111" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,211.975164,149.811111)"&gt;1000&lt;/text&gt;
&lt;text x="58.823529" y="143.151634" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,58.823529,143.151634)"&gt;10000&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;/section&gt;
&lt;section id="proxy-worker-dashboards-from-the-scheduler-dashboard"&gt;
&lt;h2&gt;Proxy Worker dashboards from the Scheduler dashboard&lt;/h2&gt;
&lt;p&gt;If you’ve used Dask.distributed then you’re probably familiar with Dask’s
scheduler dashboard, which shows the state of computations on the cluster with
a real-time interactive &lt;a class="reference external" href="https://bokeh.org"&gt;Bokeh&lt;/a&gt; dashboard. However, you may
not be aware that Dask workers also have their own dashboards, which show a
completely separate set of plots for the state of each individual worker.&lt;/p&gt;
&lt;p&gt;Historically these worker dashboards haven’t been commonly used because they
are hard to connect to: users often don’t know their addresses, or network
rules block direct web connections. Fortunately, the scheduler dashboard is
now able to proxy a connection from the user to the worker dashboard.&lt;/p&gt;
&lt;p&gt;You can access this by clicking on the “Info” tab and then selecting the
“dashboard” link next to any of the workers. You will also need to install
&lt;a class="reference external" href="https://github.com/jupyterhub/jupyter-server-proxy"&gt;jupyter-server-proxy&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;pip install jupyter-server-proxy
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Thanks to &lt;a class="reference external" href="https://github.com/quasiben"&gt;Ben Zaitlen&lt;/a&gt; for this fun addition.
We hope that making these plots more visible will encourage people to invest
more in developing them.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="black-everywhere"&gt;
&lt;h2&gt;Black everywhere&lt;/h2&gt;
&lt;p&gt;We now use the &lt;a class="reference external" href="https://black.readthedocs.io/"&gt;Black&lt;/a&gt; code formatter throughout
most Dask repositories. These repositories include pre-commit hooks, which we
recommend installing when developing on the project.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;
&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;checkout&lt;/span&gt; &lt;span class="n"&gt;master&lt;/span&gt;
&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;pull&lt;/span&gt; &lt;span class="n"&gt;upstream&lt;/span&gt; &lt;span class="n"&gt;master&lt;/span&gt;

&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;pre&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;commit&lt;/span&gt;
&lt;span class="n"&gt;pre&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;commit&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Git will then call &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;black&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;flake8&lt;/span&gt;&lt;/code&gt; whenever you attempt to commit code.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/22/dask-2.0.md&lt;/span&gt;, line 300)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dask-gateway"&gt;
&lt;h1&gt;Dask Gateway&lt;/h1&gt;
&lt;p&gt;We would also like to inform readers about the relatively new &lt;a class="reference external" href="https://github.com/jcrist/dask-gateway"&gt;Dask
Gateway&lt;/a&gt; project, which enables
institutions and IT departments to manage many Dask clusters for a variety of users.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://jcrist.github.io/dask-gateway/_images/architecture.svg"
     width="70%"
     alt="Dask Gateway"&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/22/dask-2.0.md&lt;/span&gt;, line 310)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="acknowledgements"&gt;
&lt;h1&gt;Acknowledgements&lt;/h1&gt;
&lt;p&gt;There have been several releases since the last time we had a release blogpost.
The following people contributed to the following repositories since the 1.1.0
release on January 23rd:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask"&gt;dask/dask&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;(Rick) Richard J Zamora&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Abhinav Ralhan&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Adam Beberg&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Alistair Miles&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Álvaro Abella Bascarán&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Anderson Banihirwe&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Aploium&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bart Broere&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Benjamin Zaitlen&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bouwe Andela&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brett Naul&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brian Chu&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bruce Merry&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Christian Hudon&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cody Johnson&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dan O’Donovan&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daniel Saxton&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daniel Severo&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Danilo Horta&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dimplexion&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elliott Sales de Andrade&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Endre Mark Borza&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Genevieve Buckley&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;George Sakkis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Guillaume Lemaitre&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HSR05&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hameer Abbasi&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Henrique Ribeiro&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Henry Pinkard&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hugo&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ian Bolliger&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ian Rose&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Isaiah Norton&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;James Bourbeau&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Janne Vuorela&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;John Kirkham&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jim Crist&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Joe Corbett&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jorge Pessoa&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Julia Signell&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;JulianWgs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Justin Poehnelt&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Justin Waugh&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ksenia Bobrova&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lijo Jose&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Marco Neumann&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mark Bell&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Martin Durant&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Michael Eaton&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Michał Jastrzębski&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nathan Matare&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nick Becker&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Paweł Kordek&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Peter Andreas Entschev&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Philipp Rudiger&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Philipp S. Sommer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Roma Sokolov&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ross Petchler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scott Sievert&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shyam Saladi&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Søren Fuglede Jørgensen&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Thomas Zilio&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tom Augspurger&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Yu Feng&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;aaronfowles&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;amerkel2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;asmith26&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;btw08&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;gregrf&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;mbarkhau&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;mcsoini&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;severo&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tpanza&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/distributed"&gt;dask/distributed&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Adam Beberg&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Benjamin Zaitlen&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brett Jurman&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brett Randall&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brian Chu&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caleb&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chris White&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daniel Farrell&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elliott Sales de Andrade&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;George Sakkis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;James Bourbeau&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jim Crist&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;John Kirkham&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;K.-Michael Aye&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Loïc Estève&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Magnus Nord&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Manuel Garrido&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Marco Neumann&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Martin Durant&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mathieu Dugré&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matt Nicolls&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Michael Delgado&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Michael Spiegel&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Muammar El Khatib&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nikos Tsaousis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Olivier Grisel&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Peter Andreas Entschev&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sam Grayson&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scott Sievert&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tom Augspurger&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Torsten Wörtwein&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;amerkel2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;condoratberlin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;deepthirajagopalan7&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;jukent&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;plbertrand&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-ml"&gt;dask/dask-ml&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Alejandro&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Florian Rohrer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;James Bourbeau&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Julien Jerphanion&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nathan Henrie&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Paul Vecchio&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ryan McCormick&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Saadullah Amin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scott Sievert&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sriharsha Atyam&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tom Augspurger&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-jobqueue"&gt;dask/dask-jobqueue&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Andrea Zonca&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Guillaume Eynard-Bontemps&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kyle Husmann&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Levi Naden&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Loïc Estève&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matyas Selmeci&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ocaisa&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-kubernetes"&gt;dask/dask-kubernetes&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Brian Phillips&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jacob Tomlinson&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jim Crist&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Joe Hamman&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Joseph Hamman&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tom Augspurger&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Yuvi Panda&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;adam&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-examples"&gt;dask/dask-examples&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Christoph Deil&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Genevieve Buckley&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ian Rose&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Martin Durant&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthias Bussonnier&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Robert Sare&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tom Augspurger&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Willi Rath&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-labextension"&gt;dask/dask-labextension&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Daniel Bast&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ian Rose&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Yuvi Panda&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/06/22/dask-2.0/"/>
    <summary>Please take the Dask User Survey for 2019.
Your response helps to prioritize future work.</summary>
    <category term="release" label="release"/>
    <published>2019-06-22T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/06/20/load-image-data/</id>
    <title>Load Large Image Data with Dask Array</title>
    <updated>2019-06-20T00:00:00+00:00</updated>
    <author>
      <name>John Kirkham</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/20/load-image-data.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="executive-summary"&gt;

&lt;p&gt;This post explores simple workflows to load large stacks of image data with Dask array.&lt;/p&gt;
&lt;p&gt;In particular, we start with a &lt;a class="reference external" href="https://drive.google.com/drive/folders/13mpIfqspKTIINkfoWbFsVtFF8D7jbTqJ"&gt;directory full of TIFF
files&lt;/a&gt;
of images like the following:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ $ ls raw/ | head
ex6-2_CamA_ch1_CAM1_stack0000_560nm_0000000msec_0001291795msecAbs_000x_000y_000z_0000t.tif
ex6-2_CamA_ch1_CAM1_stack0001_560nm_0043748msec_0001335543msecAbs_000x_000y_000z_0000t.tif
ex6-2_CamA_ch1_CAM1_stack0002_560nm_0087497msec_0001379292msecAbs_000x_000y_000z_0000t.tif
ex6-2_CamA_ch1_CAM1_stack0003_560nm_0131245msec_0001423040msecAbs_000x_000y_000z_0000t.tif
ex6-2_CamA_ch1_CAM1_stack0004_560nm_0174993msec_0001466788msecAbs_000x_000y_000z_0000t.tif
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;and show how to stitch these together into large lazy arrays
using the &lt;a class="reference external" href="https://image.dask.org/en/latest/"&gt;dask-image&lt;/a&gt; library:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask_image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;raw/*.tif&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;or by writing your own Dask delayed image reader function.&lt;/p&gt;
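Such a reader can be sketched with dask.delayed and da.from_delayed, provided each image's shape and dtype are known up front. Here read_one is a hypothetical stand-in for a real reader such as skimage.io.imread, and the paths are made-up examples:

```python
import numpy as np
import dask
import dask.array as da

@dask.delayed
def read_one(path):
    # Hypothetical reader: a real implementation would open the TIFF at
    # `path`; here we return an empty frame of the right shape and dtype.
    return np.zeros((1024, 768), dtype=np.uint16)

paths = ["raw/stack0000.tif", "raw/stack0001.tif", "raw/stack0002.tif"]

# Each delayed call becomes one lazy chunk; shape and dtype must be declared
frames = [da.from_delayed(read_one(p), shape=(1024, 768), dtype=np.uint16)
          for p in paths]
stack = da.stack(frames, axis=0)
print(stack.shape)  # (3, 1024, 768)
```

Nothing is read until you call compute(), so the stack stays lazy no matter how many files it covers.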
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;  &lt;thead&gt;    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 3.16 GB &lt;/td&gt; &lt;td&gt; 316.15 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (2010, 1024, 768) &lt;/td&gt; &lt;td&gt; (201, 1024, 768) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 30 Tasks &lt;/td&gt;&lt;td&gt; 10 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="176" height="181" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="80" y2="70" style="stroke-width:2" /&gt;
  &lt;line x1="10" y1="61" x2="80" y2="131" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="10" y2="61" style="stroke-width:2" /&gt;
  &lt;line x1="17" y1="7" x2="17" y2="68" /&gt;
  &lt;line x1="24" y1="14" x2="24" y2="75" /&gt;
  &lt;line x1="31" y1="21" x2="31" y2="82" /&gt;
  &lt;line x1="38" y1="28" x2="38" y2="89" /&gt;
  &lt;line x1="45" y1="35" x2="45" y2="96" /&gt;
  &lt;line x1="52" y1="42" x2="52" y2="103" /&gt;
  &lt;line x1="59" y1="49" x2="59" y2="110" /&gt;
  &lt;line x1="66" y1="56" x2="66" y2="117" /&gt;
  &lt;line x1="73" y1="63" x2="73" y2="124" /&gt;
  &lt;line x1="80" y1="70" x2="80" y2="131" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 80.588235,70.588235 80.588235,131.722564 10.000000,61.134328" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="55" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="17" y1="7" x2="62" y2="7" /&gt;
  &lt;line x1="24" y1="14" x2="69" y2="14" /&gt;
  &lt;line x1="31" y1="21" x2="77" y2="21" /&gt;
  &lt;line x1="38" y1="28" x2="84" y2="28" /&gt;
  &lt;line x1="45" y1="35" x2="91" y2="35" /&gt;
  &lt;line x1="52" y1="42" x2="98" y2="42" /&gt;
  &lt;line x1="59" y1="49" x2="105" y2="49" /&gt;
  &lt;line x1="66" y1="56" x2="112" y2="56" /&gt;
  &lt;line x1="73" y1="63" x2="119" y2="63" /&gt;
  &lt;line x1="80" y1="70" x2="126" y2="70" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="80" y2="70" style="stroke-width:2" /&gt;
  &lt;line x1="55" y1="0" x2="126" y2="70" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 55.850746,0.000000 126.438982,70.588235 80.588235,70.588235" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="80" y1="70" x2="126" y2="70" style="stroke-width:2" /&gt;
  &lt;line x1="80" y1="131" x2="126" y2="131" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="80" y1="70" x2="80" y2="131" style="stroke-width:2" /&gt;
  &lt;line x1="126" y1="70" x2="126" y2="131" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="80.588235,70.588235 126.438982,70.588235 126.438982,131.722564 80.588235,131.722564" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="103.513608" y="151.722564" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;768&lt;/text&gt;
&lt;text x="146.438982" y="101.155399" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,146.438982,101.155399)"&gt;1024&lt;/text&gt;
&lt;text x="35.294118" y="116.428446" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,35.294118,116.428446)"&gt;2010&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;Eventually we’ll be able to perform complex calculations on this Dask array.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/mrocklin/raw-host/gh-pages/images/aollsm-index-1.jpg"
     width="45%"
     alt="Light Microscopy data rendered with NVidia IndeX"&gt;
&lt;img src="https://raw.githubusercontent.com/mrocklin/raw-host/gh-pages/images/aollsm-index-2.jpg"
     width="45%"
     alt="Light Microscopy data rendered with NVidia IndeX"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Disclaimer: we’re not going to produce rendered images like the above in this
post. These were created with &lt;a class="reference external" href="https://developer.nvidia.com/index"&gt;NVidia
IndeX&lt;/a&gt;, a completely separate tool chain
from what is being discussed here. This post covers the first step of image
loading.&lt;/em&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/20/load-image-data.md&lt;/span&gt;, line 128)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="series-overview"&gt;
&lt;h1&gt;Series Overview&lt;/h1&gt;
&lt;p&gt;A common practice in fields that acquire large amounts of imaging data is to write
out smaller acquisitions as many small files. These files can tile a larger
space, sub-sample a longer time period, and may contain multiple channels.
The acquisition techniques themselves are often state of the art, constantly
pushing the envelope in terms of how large a field of view can be acquired, at
what resolution, and at what quality.&lt;/p&gt;
&lt;p&gt;Once acquired, this data presents a number of challenges. Algorithms are often
designed and tested on very small pieces of the data and then need to be scaled
up to work on the full dataset. It may not be clear at the outset what will
actually work, so exploration still plays a very big part in the whole
process.&lt;/p&gt;
&lt;p&gt;Historically this analytical process has involved a lot of custom code. Often
it is stitched together from a series of scripts, possibly in
several different languages, that write various intermediate results to disk.
Thanks to advances in modern tooling, these processes can be significantly
improved. In this series of blog posts, we will outline ways for image
scientists to leverage different tools to move towards a high-level, friendly,
cohesive, interactive analytical pipeline.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/20/load-image-data.md&lt;/span&gt;, line 151)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="post-overview"&gt;
&lt;h1&gt;Post Overview&lt;/h1&gt;
&lt;p&gt;This post in particular focuses on loading and managing large stacks of image
data in parallel from Python.&lt;/p&gt;
&lt;p&gt;Loading large image data can be a complex and often unique problem. Different
groups may choose to store this across many files on disk, a commodity or
custom database solution, or they may opt to store it in the cloud. Not all
datasets within the same group may be treated the same for a variety of
reasons. In short, this means loading data is a hard and expensive problem.&lt;/p&gt;
&lt;p&gt;Despite data being stored in many different ways, often groups want to reapply
the same analytical pipeline to these datasets. However if the data pipeline is
tightly coupled to a particular way of loading the data for later analytical
steps, it may be very difficult if not impossible to reuse an existing
pipeline. In other words, there is friction between the loading and analysis
steps, which frustrates efforts to make things reusable.&lt;/p&gt;
&lt;p&gt;Having a modular and general way to load data makes it easy to present data
stored differently in a standard way. Further, having a standard way to present
data to analytical pipelines allows that part of the pipeline to focus on what
it does best: analysis! In general, this should decouple these two components in
a way that improves the experience of users involved in all parts of the
pipeline.&lt;/p&gt;
&lt;p&gt;We will use
&lt;a class="reference external" href="https://drive.google.com/drive/folders/13mpIfqspKTIINkfoWbFsVtFF8D7jbTqJ"&gt;image data&lt;/a&gt;
generously provided by
&lt;a class="reference external" href="https://scholar.google.com/citations?user=nxwNAEgAAAAJ&amp;amp;amp;hl=en"&gt;Gokul Upadhyayula&lt;/a&gt;
at the
&lt;a class="reference external" href="http://microscopy.berkeley.edu/"&gt;Advanced Bioimaging Center&lt;/a&gt;
at UC Berkeley and discussed in
&lt;a class="reference external" href="https://science.sciencemag.org/content/360/6386/eaaq1392"&gt;this paper&lt;/a&gt;
(&lt;a class="reference external" href="https://www.biorxiv.org/content/10.1101/243352v2"&gt;preprint&lt;/a&gt;),
though the workloads presented here should work for any kind of imaging data,
or array data generally.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/20/load-image-data.md&lt;/span&gt;, line 188)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="load-image-data-with-dask"&gt;
&lt;h1&gt;Load image data with Dask&lt;/h1&gt;
&lt;p&gt;Let’s start again with our image data from the top of the post:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ $ ls /path/to/files/raw/ | head
ex6-2_CamA_ch1_CAM1_stack0000_560nm_0000000msec_0001291795msecAbs_000x_000y_000z_0000t.tif
ex6-2_CamA_ch1_CAM1_stack0001_560nm_0043748msec_0001335543msecAbs_000x_000y_000z_0000t.tif
ex6-2_CamA_ch1_CAM1_stack0002_560nm_0087497msec_0001379292msecAbs_000x_000y_000z_0000t.tif
ex6-2_CamA_ch1_CAM1_stack0003_560nm_0131245msec_0001423040msecAbs_000x_000y_000z_0000t.tif
ex6-2_CamA_ch1_CAM1_stack0004_560nm_0174993msec_0001466788msecAbs_000x_000y_000z_0000t.tif
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;section id="load-a-single-sample-image-with-scikit-image"&gt;
&lt;h2&gt;Load a single sample image with Scikit-Image&lt;/h2&gt;
&lt;p&gt;To load a single image, we use &lt;a class="reference external" href="https://imageio.readthedocs.io/"&gt;imageio&lt;/a&gt;; below we display slices with &lt;a class="reference external" href="https://scikit-image.org/"&gt;Scikit-Image&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;glob&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;filenames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/path/to/files/raw/*.tif&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filenames&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;597&lt;/span&gt;

&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;imageio&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imageio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filenames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;span class="go"&gt;(201, 1024, 768)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Each filename corresponds to some 3d chunk of a larger image. We can look at a
few 2d slices of this single 3d chunk to get some context.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;skimage.io&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;skimage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imshow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/mrocklin/raw-host/gh-pages/images/aollsm-sample-1.png"
     width="60%"&gt;&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;skimage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imshow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/mrocklin/raw-host/gh-pages/images/aollsm-sample-2.png"
     width="60%"&gt;&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;skimage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imshow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="p"&gt;:])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/mrocklin/raw-host/gh-pages/images/aollsm-sample-3.png"
     width="60%"&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="investigate-filename-structure"&gt;
&lt;h2&gt;Investigate Filename Structure&lt;/h2&gt;
&lt;p&gt;These are slices from only one chunk of a much larger aggregate image.
Our interest here is in combining the pieces into a large image stack.
It is common to see a naming structure in the filenames. Each
filename may encode a channel, time step, and spatial location, with each
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;&amp;lt;i&amp;gt;&lt;/span&gt;&lt;/code&gt; placeholder standing for some numeric value (possibly with units). Individual filenames may
contain more or less information and may notate it differently than shown here.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;mydata_ch&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;t_&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;x_&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;y_&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tif&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
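&lt;p&gt;&lt;em&gt;As a sketch (the regular expression below matches the generic template above,
not the real filenames from this dataset), such fields can be pulled out of a
filename like so:&lt;/em&gt;&lt;/p&gt;

```python
import re

# Hypothetical pattern for the generic template above:
#   mydata_ch<i>_<j>t_<k>x_<l>y_<m>z.tif
pattern = re.compile(
    r"mydata_ch(?P<channel>\d+)"
    r"_(?P<t>\d+)t_(?P<x>\d+)x_(?P<y>\d+)y_(?P<z>\d+)z\.tif"
)

def parse_filename(fn):
    """Return a dict of integer indices parsed from one filename."""
    match = pattern.match(fn)
    if match is None:
        raise ValueError("unrecognized filename: " + fn)
    return {key: int(value) for key, value in match.groupdict().items()}

parse_filename("mydata_ch1_0003t_000x_001y_002z.tif")
# {'channel': 1, 't': 3, 'x': 0, 'y': 1, 'z': 2}
```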
&lt;p&gt;In principle with NumPy we might allocate a giant array and then iteratively
load images and place them into the giant array.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;full_array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;filenames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imageio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_location_from_filename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# We need to write this function&lt;/span&gt;
    &lt;span class="n"&gt;full_array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;However, if our data is large then we can’t load it all at once into a single
NumPy array in memory, and instead we need to be a bit more clever to
handle it efficiently. One approach is to use &lt;a class="reference external" href="https://dask.org"&gt;Dask&lt;/a&gt;,
which handles larger-than-memory workloads easily.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="lazily-load-images-with-dask-array"&gt;
&lt;h2&gt;Lazily load images with Dask Array&lt;/h2&gt;
&lt;p&gt;Now we learn how to lazily load and stitch together image data with Dask array.
We’ll start with simple examples first and then move onto the full example with
this more complex dataset afterwards.&lt;/p&gt;
&lt;p&gt;We can delay the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;imageio.imread&lt;/span&gt;&lt;/code&gt; calls with &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;Dask
Delayed&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;lazy_arrays&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delayed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imageio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;filenames&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;lazy_arrays&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_delayed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
               &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lazy_arrays&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Note: here we’re assuming that all of the images have the same shape and dtype
as the sample file that we loaded above. This is not always the case. See the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask_image&lt;/span&gt;&lt;/code&gt; note below in the Future Work section for an alternative.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We haven’t yet stitched these together. We have hundreds of single-chunk Dask
arrays, each of which lazily loads a single 3d chunk of data from disk. Let’s look at a single array.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;lazy_arrays&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;  &lt;thead&gt;    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 316.15 MB &lt;/td&gt; &lt;td&gt; 316.15 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (201, 1024, 768) &lt;/td&gt; &lt;td&gt; (201, 1024, 768) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 2 Tasks &lt;/td&gt;&lt;td&gt; 1 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="174" height="194" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="34" y2="24" style="stroke-width:2" /&gt;
  &lt;line x1="10" y1="120" x2="34" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="10" y2="120" style="stroke-width:2" /&gt;
  &lt;line x1="34" y1="24" x2="34" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 34.664918,24.664918 34.664918,144.664918 10.000000,120.000000" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="100" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="34" y1="24" x2="124" y2="24" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="34" y2="24" style="stroke-width:2" /&gt;
  &lt;line x1="100" y1="0" x2="124" y2="24" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 100.000000,0.000000 124.664918,24.664918 34.664918,24.664918" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="34" y1="24" x2="124" y2="24" style="stroke-width:2" /&gt;
  &lt;line x1="34" y1="144" x2="124" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="34" y1="24" x2="34" y2="144" style="stroke-width:2" /&gt;
  &lt;line x1="124" y1="24" x2="124" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="34.664918,24.664918 124.664918,24.664918 124.664918,144.664918 34.664918,144.664918" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="79.664918" y="164.664918" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;768&lt;/text&gt;
&lt;text x="144.664918" y="84.664918" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,144.664918,84.664918)"&gt;1024&lt;/text&gt;
&lt;text x="12.332459" y="152.332459" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,12.332459,152.332459)"&gt;201&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;This is a lazy 3-dimensional Dask array of a &lt;em&gt;single&lt;/em&gt; 300MB chunk of data.
That chunk is created by loading in a particular TIFF file. Normally Dask
arrays are composed of &lt;em&gt;many&lt;/em&gt; chunks. We can concatenate many of these
single-chunked Dask arrays into a multi-chunked Dask array with functions like
&lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html#dask.array.concatenate"&gt;da.concatenate&lt;/a&gt;
and
&lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html#dask.array.stack"&gt;da.stack&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here we concatenate the first ten Dask arrays along a few axes, to get an
easier-to-understand picture of how this looks. Take a look at how the
shape changes as we change the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;axis=&lt;/span&gt;&lt;/code&gt; parameter, both in the table on the left
and in the image on the right.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lazy_arrays&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;  &lt;thead&gt;    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 3.16 GB &lt;/td&gt; &lt;td&gt; 316.15 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (2010, 1024, 768) &lt;/td&gt; &lt;td&gt; (201, 1024, 768) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 30 Tasks &lt;/td&gt;&lt;td&gt; 10 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="176" height="181" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="80" y2="70" style="stroke-width:2" /&gt;
  &lt;line x1="10" y1="61" x2="80" y2="131" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="10" y2="61" style="stroke-width:2" /&gt;
  &lt;line x1="17" y1="7" x2="17" y2="68" /&gt;
  &lt;line x1="24" y1="14" x2="24" y2="75" /&gt;
  &lt;line x1="31" y1="21" x2="31" y2="82" /&gt;
  &lt;line x1="38" y1="28" x2="38" y2="89" /&gt;
  &lt;line x1="45" y1="35" x2="45" y2="96" /&gt;
  &lt;line x1="52" y1="42" x2="52" y2="103" /&gt;
  &lt;line x1="59" y1="49" x2="59" y2="110" /&gt;
  &lt;line x1="66" y1="56" x2="66" y2="117" /&gt;
  &lt;line x1="73" y1="63" x2="73" y2="124" /&gt;
  &lt;line x1="80" y1="70" x2="80" y2="131" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 80.588235,70.588235 80.588235,131.722564 10.000000,61.134328" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="55" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="17" y1="7" x2="62" y2="7" /&gt;
  &lt;line x1="24" y1="14" x2="69" y2="14" /&gt;
  &lt;line x1="31" y1="21" x2="77" y2="21" /&gt;
  &lt;line x1="38" y1="28" x2="84" y2="28" /&gt;
  &lt;line x1="45" y1="35" x2="91" y2="35" /&gt;
  &lt;line x1="52" y1="42" x2="98" y2="42" /&gt;
  &lt;line x1="59" y1="49" x2="105" y2="49" /&gt;
  &lt;line x1="66" y1="56" x2="112" y2="56" /&gt;
  &lt;line x1="73" y1="63" x2="119" y2="63" /&gt;
  &lt;line x1="80" y1="70" x2="126" y2="70" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="80" y2="70" style="stroke-width:2" /&gt;
  &lt;line x1="55" y1="0" x2="126" y2="70" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 55.850746,0.000000 126.438982,70.588235 80.588235,70.588235" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="80" y1="70" x2="126" y2="70" style="stroke-width:2" /&gt;
  &lt;line x1="80" y1="131" x2="126" y2="131" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="80" y1="70" x2="80" y2="131" style="stroke-width:2" /&gt;
  &lt;line x1="126" y1="70" x2="126" y2="131" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="80.588235,70.588235 126.438982,70.588235 126.438982,131.722564 80.588235,131.722564" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="103.513608" y="151.722564" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;768&lt;/text&gt;
&lt;text x="146.438982" y="101.155399" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,146.438982,101.155399)"&gt;1024&lt;/text&gt;
&lt;text x="35.294118" y="116.428446" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,35.294118,116.428446)"&gt;2010&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lazy_arrays&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;  &lt;thead&gt;    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 3.16 GB &lt;/td&gt; &lt;td&gt; 316.15 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (201, 10240, 768) &lt;/td&gt; &lt;td&gt; (201, 1024, 768) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 30 Tasks &lt;/td&gt;&lt;td&gt; 10 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="113" height="187" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="27" y2="17" style="stroke-width:2" /&gt;
  &lt;line x1="10" y1="12" x2="27" y2="29" /&gt;
  &lt;line x1="10" y1="24" x2="27" y2="41" /&gt;
  &lt;line x1="10" y1="36" x2="27" y2="53" /&gt;
  &lt;line x1="10" y1="48" x2="27" y2="65" /&gt;
  &lt;line x1="10" y1="60" x2="27" y2="77" /&gt;
  &lt;line x1="10" y1="72" x2="27" y2="89" /&gt;
  &lt;line x1="10" y1="84" x2="27" y2="101" /&gt;
  &lt;line x1="10" y1="96" x2="27" y2="113" /&gt;
  &lt;line x1="10" y1="108" x2="27" y2="125" /&gt;
  &lt;line x1="10" y1="120" x2="27" y2="137" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="10" y2="120" style="stroke-width:2" /&gt;
  &lt;line x1="27" y1="17" x2="27" y2="137" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 27.014952,17.014952 27.014952,137.014952 10.000000,120.000000" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="46" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="27" y1="17" x2="63" y2="17" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="27" y2="17" style="stroke-width:2" /&gt;
  &lt;line x1="46" y1="0" x2="63" y2="17" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 46.948234,0.000000 63.963186,17.014952 27.014952,17.014952" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="27" y1="17" x2="63" y2="17" style="stroke-width:2" /&gt;
  &lt;line x1="27" y1="29" x2="63" y2="29" /&gt;
  &lt;line x1="27" y1="41" x2="63" y2="41" /&gt;
  &lt;line x1="27" y1="53" x2="63" y2="53" /&gt;
  &lt;line x1="27" y1="65" x2="63" y2="65" /&gt;
  &lt;line x1="27" y1="77" x2="63" y2="77" /&gt;
  &lt;line x1="27" y1="89" x2="63" y2="89" /&gt;
  &lt;line x1="27" y1="101" x2="63" y2="101" /&gt;
  &lt;line x1="27" y1="113" x2="63" y2="113" /&gt;
  &lt;line x1="27" y1="125" x2="63" y2="125" /&gt;
  &lt;line x1="27" y1="137" x2="63" y2="137" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="27" y1="17" x2="27" y2="137" style="stroke-width:2" /&gt;
  &lt;line x1="63" y1="17" x2="63" y2="137" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="27.014952,17.014952 63.963186,17.014952 63.963186,137.014952 27.014952,137.014952" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="45.489069" y="157.014952" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;768&lt;/text&gt;
&lt;text x="83.963186" y="77.014952" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,83.963186,77.014952)"&gt;10240&lt;/text&gt;
&lt;text x="8.507476" y="148.507476" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,8.507476,148.507476)"&gt;201&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lazy_arrays&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;  &lt;thead&gt;    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 3.16 GB &lt;/td&gt; &lt;td&gt; 316.15 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (201, 1024, 7680) &lt;/td&gt; &lt;td&gt; (201, 1024, 768) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 30 Tasks &lt;/td&gt;&lt;td&gt; 10 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="197" height="108" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="27" y2="17" style="stroke-width:2" /&gt;
  &lt;line x1="10" y1="40" x2="27" y2="58" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="10" y2="40" style="stroke-width:2" /&gt;
  &lt;line x1="27" y1="17" x2="27" y2="58" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 27.988258,17.988258 27.988258,58.112379 10.000000,40.124121" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="10" y1="0" x2="130" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="27" y1="17" x2="147" y2="17" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="10" y1="0" x2="27" y2="17" style="stroke-width:2" /&gt;
  &lt;line x1="22" y1="0" x2="39" y2="17" /&gt;
  &lt;line x1="34" y1="0" x2="51" y2="17" /&gt;
  &lt;line x1="46" y1="0" x2="63" y2="17" /&gt;
  &lt;line x1="58" y1="0" x2="75" y2="17" /&gt;
  &lt;line x1="70" y1="0" x2="87" y2="17" /&gt;
  &lt;line x1="82" y1="0" x2="99" y2="17" /&gt;
  &lt;line x1="94" y1="0" x2="111" y2="17" /&gt;
  &lt;line x1="106" y1="0" x2="123" y2="17" /&gt;
  &lt;line x1="118" y1="0" x2="135" y2="17" /&gt;
  &lt;line x1="130" y1="0" x2="147" y2="17" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="10.000000,0.000000 130.000000,0.000000 147.988258,17.988258 27.988258,17.988258" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="27" y1="17" x2="147" y2="17" style="stroke-width:2" /&gt;
  &lt;line x1="27" y1="58" x2="147" y2="58" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="27" y1="17" x2="27" y2="58" style="stroke-width:2" /&gt;
  &lt;line x1="39" y1="17" x2="39" y2="58" /&gt;
  &lt;line x1="51" y1="17" x2="51" y2="58" /&gt;
  &lt;line x1="63" y1="17" x2="63" y2="58" /&gt;
  &lt;line x1="75" y1="17" x2="75" y2="58" /&gt;
  &lt;line x1="87" y1="17" x2="87" y2="58" /&gt;
  &lt;line x1="99" y1="17" x2="99" y2="58" /&gt;
  &lt;line x1="111" y1="17" x2="111" y2="58" /&gt;
  &lt;line x1="123" y1="17" x2="123" y2="58" /&gt;
  &lt;line x1="135" y1="17" x2="135" y2="58" /&gt;
  &lt;line x1="147" y1="17" x2="147" y2="58" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="27.988258,17.988258 147.988258,17.988258 147.988258,58.112379 27.988258,58.112379" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="87.988258" y="78.112379" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;7680&lt;/text&gt;
&lt;text x="167.988258" y="38.050318" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,167.988258,38.050318)"&gt;1024&lt;/text&gt;
&lt;text x="8.994129" y="69.118250" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,8.994129,69.118250)"&gt;201&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;Or, if we wanted to add a new dimension, we would use &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;da.stack&lt;/span&gt;&lt;/code&gt;. In this
case we’ve run out of easily visible dimensions, so pay more attention to the
shape listed in the table on the left than to the picture on the right.
Notice that we’ve stacked these 3d images into a 4d image.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lazy_arrays&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;  &lt;thead&gt;    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 3.16 GB &lt;/td&gt; &lt;td&gt; 316.15 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (10, 201, 1024, 768) &lt;/td&gt; &lt;td&gt; (1, 201, 1024, 768) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 30 Tasks &lt;/td&gt;&lt;td&gt; 10 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="354" height="194" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="0" y1="0" x2="25" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="0" y1="25" x2="25" y2="25" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="0" y1="0" x2="0" y2="25" style="stroke-width:2" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="25" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="25" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="25" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="25" /&gt;
  &lt;line x1="12" y1="0" x2="12" y2="25" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="25" /&gt;
  &lt;line x1="17" y1="0" x2="17" y2="25" /&gt;
  &lt;line x1="20" y1="0" x2="20" y2="25" /&gt;
  &lt;line x1="22" y1="0" x2="22" y2="25" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="25" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="0.000000,0.000000 25.412617,0.000000 25.412617,25.412617 0.000000,25.412617" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="12.706308" y="45.412617" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;10&lt;/text&gt;
&lt;text x="45.412617" y="12.706308" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,45.412617,12.706308)"&gt;1&lt;/text&gt;&lt;/p&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="95" y1="0" x2="119" y2="24" style="stroke-width:2" /&gt;
  &lt;line x1="95" y1="120" x2="119" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="95" y1="0" x2="95" y2="120" style="stroke-width:2" /&gt;
  &lt;line x1="119" y1="24" x2="119" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="95.000000,0.000000 119.664918,24.664918 119.664918,144.664918 95.000000,120.000000" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="95" y1="0" x2="185" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="119" y1="24" x2="209" y2="24" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="95" y1="0" x2="119" y2="24" style="stroke-width:2" /&gt;
  &lt;line x1="185" y1="0" x2="209" y2="24" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="95.000000,0.000000 185.000000,0.000000 209.664918,24.664918 119.664918,24.664918" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="119" y1="24" x2="209" y2="24" style="stroke-width:2" /&gt;
  &lt;line x1="119" y1="144" x2="209" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="119" y1="24" x2="119" y2="144" style="stroke-width:2" /&gt;
  &lt;line x1="209" y1="24" x2="209" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="119.664918,24.664918 209.664918,24.664918 209.664918,144.664918 119.664918,144.664918" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="164.664918" y="164.664918" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;768&lt;/text&gt;
&lt;text x="229.664918" y="84.664918" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,229.664918,84.664918)"&gt;1024&lt;/text&gt;
&lt;text x="97.332459" y="152.332459" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,97.332459,152.332459)"&gt;201&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;These are the common cases, where you have a single axis along which
you want to stitch images together.&lt;/p&gt;
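As a minimal sketch of the shape difference between these two operations (using small in-memory NumPy-backed Dask arrays in place of the lazy TIFF-backed arrays above):

```python
import dask.array as da

# Ten single-chunk arrays standing in for the lazy image arrays above
arrays = [da.ones((201, 1024, 768), chunks=(201, 1024, 768), dtype="uint16")
          for _ in range(10)]

# concatenate joins along an existing axis ...
a = da.concatenate(arrays, axis=1)
print(a.shape)   # (201, 10240, 768)

# ... while stack creates a new leading axis
b = da.stack(arrays)
print(b.shape)   # (10, 201, 1024, 768)
```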
&lt;/section&gt;
&lt;section id="full-example"&gt;
&lt;h2&gt;Full example&lt;/h2&gt;
&lt;p&gt;This works fine for combining along a single axis. However, if we need to
combine across multiple axes, we need to perform multiple concatenate steps.
Fortunately there is a simpler option, &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html#dask.array.block"&gt;da.block&lt;/a&gt;, which can
concatenate along multiple axes at once if you give it a nested list of dask
arrays.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="n"&gt;laxy_array_00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lazy_array_01&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
              &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lazy_array_10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lazy_array_11&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
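To see how the nesting maps onto axes, here is a small self-contained sketch (the `lazy_array_*` names above are placeholders, so this uses tiny NumPy-backed arrays instead):

```python
import dask.array as da

# Four single-chunk blocks arranged in a 2x2 grid
block = lambda: da.ones((201, 1024, 768), chunks=-1, dtype="uint16")

a = da.block([[block(), block()],
              [block(), block()]])

# The innermost lists concatenate along the last axis and outer lists along
# the next axis out, so a 2x2 grid doubles the last two dimensions
print(a.shape)       # (201, 2048, 1536)
print(a.numblocks)   # (1, 2, 2)
```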
&lt;p&gt;We now do the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Parse each filename to learn where it should live in the larger array&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;See how many files are in each of our relevant dimensions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Allocate a NumPy object-dtype array of the appropriate size, where each
element of this array will hold a single-chunk Dask array&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Go through our filenames and insert the proper Dask array into the right
position&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Call &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;da.block&lt;/span&gt;&lt;/code&gt; on the result&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This code is a bit complex, but it shows what this looks like in a real-world
setting:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Get various dimensions&lt;/span&gt;

&lt;span class="n"&gt;fn_comp_sets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;filenames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;comp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;splitext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;_&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;fn_comp_sets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setdefault&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;fn_comp_sets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;comp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fn_comp_sets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn_comp_sets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;

&lt;span class="n"&gt;remap_comps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;reversed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn_comp_sets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]))),&lt;/span&gt;
    &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;reversed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn_comp_sets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;])))&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Create an empty object array to organize each chunk that loads a TIFF&lt;/span&gt;
&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remap_comps&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filenames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lazy_arrays&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;_ch&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;_&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;_stack&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;_&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;

&lt;span class="c1"&gt;# Stitch together the many blocks into a single array&lt;/span&gt;
&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;table&gt;  &lt;thead&gt;    &lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;th&gt; Array &lt;/th&gt;&lt;th&gt; Chunk &lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;th&gt; Bytes &lt;/th&gt;&lt;td&gt; 188.74 GB &lt;/td&gt; &lt;td&gt; 316.15 MB &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Shape &lt;/th&gt;&lt;td&gt; (3, 199, 201, 1024, 768) &lt;/td&gt; &lt;td&gt; (1, 1, 201, 1024, 768) &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Count &lt;/th&gt;&lt;td&gt; 2985 Tasks &lt;/td&gt;&lt;td&gt; 597 Chunks &lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt; Type &lt;/th&gt;&lt;td&gt; uint16 &lt;/td&gt;&lt;td&gt; numpy.ndarray &lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;svg width="386" height="194" style="stroke:rgb(0,0,0);stroke-width:1" &gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="0" y1="0" x2="41" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="0" y1="8" x2="41" y2="8" /&gt;
  &lt;line x1="0" y1="16" x2="41" y2="16" /&gt;
  &lt;line x1="0" y1="25" x2="41" y2="25" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="0" y1="0" x2="0" y2="25" style="stroke-width:2" /&gt;
  &lt;line x1="0" y1="0" x2="0" y2="25" /&gt;
  &lt;line x1="0" y1="0" x2="0" y2="25" /&gt;
  &lt;line x1="0" y1="0" x2="0" y2="25" /&gt;
  &lt;line x1="0" y1="0" x2="0" y2="25" /&gt;
  &lt;line x1="1" y1="0" x2="1" y2="25" /&gt;
  &lt;line x1="1" y1="0" x2="1" y2="25" /&gt;
  &lt;line x1="1" y1="0" x2="1" y2="25" /&gt;
  &lt;line x1="1" y1="0" x2="1" y2="25" /&gt;
  &lt;line x1="1" y1="0" x2="1" y2="25" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="25" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="25" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="25" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="25" /&gt;
  &lt;line x1="2" y1="0" x2="2" y2="25" /&gt;
  &lt;line x1="3" y1="0" x2="3" y2="25" /&gt;
  &lt;line x1="3" y1="0" x2="3" y2="25" /&gt;
  &lt;line x1="3" y1="0" x2="3" y2="25" /&gt;
  &lt;line x1="3" y1="0" x2="3" y2="25" /&gt;
  &lt;line x1="3" y1="0" x2="3" y2="25" /&gt;
  &lt;line x1="4" y1="0" x2="4" y2="25" /&gt;
  &lt;line x1="4" y1="0" x2="4" y2="25" /&gt;
  &lt;line x1="4" y1="0" x2="4" y2="25" /&gt;
  &lt;line x1="4" y1="0" x2="4" y2="25" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="25" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="25" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="25" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="25" /&gt;
  &lt;line x1="5" y1="0" x2="5" y2="25" /&gt;
  &lt;line x1="6" y1="0" x2="6" y2="25" /&gt;
  &lt;line x1="6" y1="0" x2="6" y2="25" /&gt;
  &lt;line x1="6" y1="0" x2="6" y2="25" /&gt;
  &lt;line x1="6" y1="0" x2="6" y2="25" /&gt;
  &lt;line x1="6" y1="0" x2="6" y2="25" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="25" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="25" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="25" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="25" /&gt;
  &lt;line x1="7" y1="0" x2="7" y2="25" /&gt;
  &lt;line x1="8" y1="0" x2="8" y2="25" /&gt;
  &lt;line x1="8" y1="0" x2="8" y2="25" /&gt;
  &lt;line x1="8" y1="0" x2="8" y2="25" /&gt;
  &lt;line x1="8" y1="0" x2="8" y2="25" /&gt;
  &lt;line x1="9" y1="0" x2="9" y2="25" /&gt;
  &lt;line x1="9" y1="0" x2="9" y2="25" /&gt;
  &lt;line x1="9" y1="0" x2="9" y2="25" /&gt;
  &lt;line x1="9" y1="0" x2="9" y2="25" /&gt;
  &lt;line x1="9" y1="0" x2="9" y2="25" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="25" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="25" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="25" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="25" /&gt;
  &lt;line x1="10" y1="0" x2="10" y2="25" /&gt;
  &lt;line x1="11" y1="0" x2="11" y2="25" /&gt;
  &lt;line x1="11" y1="0" x2="11" y2="25" /&gt;
  &lt;line x1="11" y1="0" x2="11" y2="25" /&gt;
  &lt;line x1="11" y1="0" x2="11" y2="25" /&gt;
  &lt;line x1="11" y1="0" x2="11" y2="25" /&gt;
  &lt;line x1="12" y1="0" x2="12" y2="25" /&gt;
  &lt;line x1="12" y1="0" x2="12" y2="25" /&gt;
  &lt;line x1="12" y1="0" x2="12" y2="25" /&gt;
  &lt;line x1="12" y1="0" x2="12" y2="25" /&gt;
  &lt;line x1="13" y1="0" x2="13" y2="25" /&gt;
  &lt;line x1="13" y1="0" x2="13" y2="25" /&gt;
  &lt;line x1="13" y1="0" x2="13" y2="25" /&gt;
  &lt;line x1="13" y1="0" x2="13" y2="25" /&gt;
  &lt;line x1="13" y1="0" x2="13" y2="25" /&gt;
  &lt;line x1="14" y1="0" x2="14" y2="25" /&gt;
  &lt;line x1="14" y1="0" x2="14" y2="25" /&gt;
  &lt;line x1="14" y1="0" x2="14" y2="25" /&gt;
  &lt;line x1="14" y1="0" x2="14" y2="25" /&gt;
  &lt;line x1="14" y1="0" x2="14" y2="25" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="25" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="25" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="25" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="25" /&gt;
  &lt;line x1="15" y1="0" x2="15" y2="25" /&gt;
  &lt;line x1="16" y1="0" x2="16" y2="25" /&gt;
  &lt;line x1="16" y1="0" x2="16" y2="25" /&gt;
  &lt;line x1="16" y1="0" x2="16" y2="25" /&gt;
  &lt;line x1="16" y1="0" x2="16" y2="25" /&gt;
  &lt;line x1="17" y1="0" x2="17" y2="25" /&gt;
  &lt;line x1="17" y1="0" x2="17" y2="25" /&gt;
  &lt;line x1="17" y1="0" x2="17" y2="25" /&gt;
  &lt;line x1="17" y1="0" x2="17" y2="25" /&gt;
  &lt;line x1="17" y1="0" x2="17" y2="25" /&gt;
  &lt;line x1="18" y1="0" x2="18" y2="25" /&gt;
  &lt;line x1="18" y1="0" x2="18" y2="25" /&gt;
  &lt;line x1="18" y1="0" x2="18" y2="25" /&gt;
  &lt;line x1="18" y1="0" x2="18" y2="25" /&gt;
  &lt;line x1="18" y1="0" x2="18" y2="25" /&gt;
  &lt;line x1="19" y1="0" x2="19" y2="25" /&gt;
  &lt;line x1="19" y1="0" x2="19" y2="25" /&gt;
  &lt;line x1="19" y1="0" x2="19" y2="25" /&gt;
  &lt;line x1="19" y1="0" x2="19" y2="25" /&gt;
  &lt;line x1="19" y1="0" x2="19" y2="25" /&gt;
  &lt;line x1="20" y1="0" x2="20" y2="25" /&gt;
  &lt;line x1="20" y1="0" x2="20" y2="25" /&gt;
  &lt;line x1="20" y1="0" x2="20" y2="25" /&gt;
  &lt;line x1="20" y1="0" x2="20" y2="25" /&gt;
  &lt;line x1="21" y1="0" x2="21" y2="25" /&gt;
  &lt;line x1="21" y1="0" x2="21" y2="25" /&gt;
  &lt;line x1="21" y1="0" x2="21" y2="25" /&gt;
  &lt;line x1="21" y1="0" x2="21" y2="25" /&gt;
  &lt;line x1="21" y1="0" x2="21" y2="25" /&gt;
  &lt;line x1="22" y1="0" x2="22" y2="25" /&gt;
  &lt;line x1="22" y1="0" x2="22" y2="25" /&gt;
  &lt;line x1="22" y1="0" x2="22" y2="25" /&gt;
  &lt;line x1="22" y1="0" x2="22" y2="25" /&gt;
  &lt;line x1="22" y1="0" x2="22" y2="25" /&gt;
  &lt;line x1="23" y1="0" x2="23" y2="25" /&gt;
  &lt;line x1="23" y1="0" x2="23" y2="25" /&gt;
  &lt;line x1="23" y1="0" x2="23" y2="25" /&gt;
  &lt;line x1="23" y1="0" x2="23" y2="25" /&gt;
  &lt;line x1="23" y1="0" x2="23" y2="25" /&gt;
  &lt;line x1="24" y1="0" x2="24" y2="25" /&gt;
  &lt;line x1="24" y1="0" x2="24" y2="25" /&gt;
  &lt;line x1="24" y1="0" x2="24" y2="25" /&gt;
  &lt;line x1="24" y1="0" x2="24" y2="25" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="25" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="25" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="25" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="25" /&gt;
  &lt;line x1="25" y1="0" x2="25" y2="25" /&gt;
  &lt;line x1="26" y1="0" x2="26" y2="25" /&gt;
  &lt;line x1="26" y1="0" x2="26" y2="25" /&gt;
  &lt;line x1="26" y1="0" x2="26" y2="25" /&gt;
  &lt;line x1="26" y1="0" x2="26" y2="25" /&gt;
  &lt;line x1="26" y1="0" x2="26" y2="25" /&gt;
  &lt;line x1="27" y1="0" x2="27" y2="25" /&gt;
  &lt;line x1="27" y1="0" x2="27" y2="25" /&gt;
  &lt;line x1="27" y1="0" x2="27" y2="25" /&gt;
  &lt;line x1="27" y1="0" x2="27" y2="25" /&gt;
  &lt;line x1="27" y1="0" x2="27" y2="25" /&gt;
  &lt;line x1="28" y1="0" x2="28" y2="25" /&gt;
  &lt;line x1="28" y1="0" x2="28" y2="25" /&gt;
  &lt;line x1="28" y1="0" x2="28" y2="25" /&gt;
  &lt;line x1="28" y1="0" x2="28" y2="25" /&gt;
  &lt;line x1="29" y1="0" x2="29" y2="25" /&gt;
  &lt;line x1="29" y1="0" x2="29" y2="25" /&gt;
  &lt;line x1="29" y1="0" x2="29" y2="25" /&gt;
  &lt;line x1="29" y1="0" x2="29" y2="25" /&gt;
  &lt;line x1="29" y1="0" x2="29" y2="25" /&gt;
  &lt;line x1="30" y1="0" x2="30" y2="25" /&gt;
  &lt;line x1="30" y1="0" x2="30" y2="25" /&gt;
  &lt;line x1="30" y1="0" x2="30" y2="25" /&gt;
  &lt;line x1="30" y1="0" x2="30" y2="25" /&gt;
  &lt;line x1="30" y1="0" x2="30" y2="25" /&gt;
  &lt;line x1="31" y1="0" x2="31" y2="25" /&gt;
  &lt;line x1="31" y1="0" x2="31" y2="25" /&gt;
  &lt;line x1="31" y1="0" x2="31" y2="25" /&gt;
  &lt;line x1="31" y1="0" x2="31" y2="25" /&gt;
  &lt;line x1="31" y1="0" x2="31" y2="25" /&gt;
  &lt;line x1="32" y1="0" x2="32" y2="25" /&gt;
  &lt;line x1="32" y1="0" x2="32" y2="25" /&gt;
  &lt;line x1="32" y1="0" x2="32" y2="25" /&gt;
  &lt;line x1="32" y1="0" x2="32" y2="25" /&gt;
  &lt;line x1="33" y1="0" x2="33" y2="25" /&gt;
  &lt;line x1="33" y1="0" x2="33" y2="25" /&gt;
  &lt;line x1="33" y1="0" x2="33" y2="25" /&gt;
  &lt;line x1="33" y1="0" x2="33" y2="25" /&gt;
  &lt;line x1="33" y1="0" x2="33" y2="25" /&gt;
  &lt;line x1="34" y1="0" x2="34" y2="25" /&gt;
  &lt;line x1="34" y1="0" x2="34" y2="25" /&gt;
  &lt;line x1="34" y1="0" x2="34" y2="25" /&gt;
  &lt;line x1="34" y1="0" x2="34" y2="25" /&gt;
  &lt;line x1="34" y1="0" x2="34" y2="25" /&gt;
  &lt;line x1="35" y1="0" x2="35" y2="25" /&gt;
  &lt;line x1="35" y1="0" x2="35" y2="25" /&gt;
  &lt;line x1="35" y1="0" x2="35" y2="25" /&gt;
  &lt;line x1="35" y1="0" x2="35" y2="25" /&gt;
  &lt;line x1="35" y1="0" x2="35" y2="25" /&gt;
  &lt;line x1="36" y1="0" x2="36" y2="25" /&gt;
  &lt;line x1="36" y1="0" x2="36" y2="25" /&gt;
  &lt;line x1="36" y1="0" x2="36" y2="25" /&gt;
  &lt;line x1="36" y1="0" x2="36" y2="25" /&gt;
  &lt;line x1="37" y1="0" x2="37" y2="25" /&gt;
  &lt;line x1="37" y1="0" x2="37" y2="25" /&gt;
  &lt;line x1="37" y1="0" x2="37" y2="25" /&gt;
  &lt;line x1="37" y1="0" x2="37" y2="25" /&gt;
  &lt;line x1="37" y1="0" x2="37" y2="25" /&gt;
  &lt;line x1="38" y1="0" x2="38" y2="25" /&gt;
  &lt;line x1="38" y1="0" x2="38" y2="25" /&gt;
  &lt;line x1="38" y1="0" x2="38" y2="25" /&gt;
  &lt;line x1="38" y1="0" x2="38" y2="25" /&gt;
  &lt;line x1="38" y1="0" x2="38" y2="25" /&gt;
  &lt;line x1="39" y1="0" x2="39" y2="25" /&gt;
  &lt;line x1="39" y1="0" x2="39" y2="25" /&gt;
  &lt;line x1="39" y1="0" x2="39" y2="25" /&gt;
  &lt;line x1="39" y1="0" x2="39" y2="25" /&gt;
  &lt;line x1="39" y1="0" x2="39" y2="25" /&gt;
  &lt;line x1="40" y1="0" x2="40" y2="25" /&gt;
  &lt;line x1="40" y1="0" x2="40" y2="25" /&gt;
  &lt;line x1="40" y1="0" x2="40" y2="25" /&gt;
  &lt;line x1="40" y1="0" x2="40" y2="25" /&gt;
  &lt;line x1="41" y1="0" x2="41" y2="25" /&gt;
  &lt;line x1="41" y1="0" x2="41" y2="25" /&gt;
  &lt;line x1="41" y1="0" x2="41" y2="25" /&gt;
  &lt;line x1="41" y1="0" x2="41" y2="25" /&gt;
  &lt;line x1="41" y1="0" x2="41" y2="25" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="0.000000,0.000000 41.887587,0.000000 41.887587,25.412617 0.000000,25.412617" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="20.943793" y="45.412617" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;199&lt;/text&gt;
&lt;text x="61.887587" y="12.706308" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(0,61.887587,12.706308)"&gt;3&lt;/text&gt;&lt;/p&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="111" y1="0" x2="135" y2="24" style="stroke-width:2" /&gt;
  &lt;line x1="111" y1="120" x2="135" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="111" y1="0" x2="111" y2="120" style="stroke-width:2" /&gt;
  &lt;line x1="135" y1="24" x2="135" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="111.000000,0.000000 135.664918,24.664918 135.664918,144.664918 111.000000,120.000000" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="111" y1="0" x2="201" y2="0" style="stroke-width:2" /&gt;
  &lt;line x1="135" y1="24" x2="225" y2="24" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="111" y1="0" x2="135" y2="24" style="stroke-width:2" /&gt;
  &lt;line x1="201" y1="0" x2="225" y2="24" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="111.000000,0.000000 201.000000,0.000000 225.664918,24.664918 135.664918,24.664918" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Horizontal lines --&gt;
  &lt;line x1="135" y1="24" x2="225" y2="24" style="stroke-width:2" /&gt;
  &lt;line x1="135" y1="144" x2="225" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Vertical lines --&gt;
  &lt;line x1="135" y1="24" x2="135" y2="144" style="stroke-width:2" /&gt;
  &lt;line x1="225" y1="24" x2="225" y2="144" style="stroke-width:2" /&gt;
  &lt;!-- Colored Rectangle --&gt;
  &lt;polygon points="135.664918,24.664918 225.664918,24.664918 225.664918,144.664918 135.664918,144.664918" style="fill:#ECB172A0;stroke-width:0"/&gt;
  &lt;!-- Text --&gt;
&lt;p&gt;&lt;text x="180.664918" y="164.664918" font-size="1.0rem" font-weight="100" text-anchor="middle" &gt;768&lt;/text&gt;
&lt;text x="245.664918" y="84.664918" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(-90,245.664918,84.664918)"&gt;1024&lt;/text&gt;
&lt;text x="113.332459" y="152.332459" font-size="1.0rem" font-weight="100" text-anchor="middle" transform="rotate(45,113.332459,152.332459)"&gt;201&lt;/text&gt;
&lt;/svg&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;That’s a 180 GB logical array, composed of around 600 chunks, each of size 300
MB. We can now do normal NumPy-like computations on this array using &lt;a class="reference external" href="https://docs.dask.org/en/latest/array.html"&gt;Dask
Array&lt;/a&gt;, but we’ll save that for a
future post.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="c1"&gt;# array computations would work fine, and would run in low memory&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="c1"&gt;# but we&amp;#39;ll save actual computation for future posts&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/20/load-image-data.md&lt;/span&gt;, line 1056)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="save-data"&gt;
&lt;h1&gt;Save Data&lt;/h1&gt;
&lt;p&gt;To simplify data loading in the future, we store this in a large chunked
array format like &lt;a class="reference external" href="https://zarr.readthedocs.io/"&gt;Zarr&lt;/a&gt; using the &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html#dask.array.Array.to_zarr"&gt;to_zarr&lt;/a&gt;
method.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_zarr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;mydata.zarr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We may add additional information about the image data as &lt;a class="reference external" href="https://zarr.readthedocs.io/en/stable/tutorial.html#user-attributes"&gt;attributes&lt;/a&gt;. This
makes things both simpler for future users (they can read the full dataset with
a single line using &lt;a class="reference external" href="http://docs.dask.org/en/latest/array-api.html#dask.array.from_zarr"&gt;da.from_zarr&lt;/a&gt;) and
faster, because Zarr is an &lt;em&gt;analysis-ready format&lt;/em&gt; that is efficiently
encoded for computation.&lt;/p&gt;
&lt;p&gt;Zarr uses the &lt;a class="reference external" href="http://blosc.org/"&gt;Blosc&lt;/a&gt; library for compression by default.
For scientific imaging data, we can optionally pass compression options that
provide a good tradeoff between compression ratio and speed.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numcodecs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Blosc&lt;/span&gt;
&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_zarr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;mydata.zarr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compressor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Blosc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;zstd&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clevel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Blosc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BITSHUFFLE&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/20/load-image-data.md&lt;/span&gt;, line 1082)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="future-work"&gt;
&lt;h1&gt;Future Work&lt;/h1&gt;
&lt;p&gt;The workload above is generic and straightforward. It works well in simple
cases and also extends well to more complex cases, provided you’re willing to
write some for-loops and parsing code around your custom logic. It works on a
single small-scale laptop as well as on a large HPC or Cloud cluster. If you have
a function that turns a filename into a NumPy array, you can generate a large
lazy Dask array using that function, &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;Dask
Delayed&lt;/a&gt;, and &lt;a class="reference external" href="https://docs.dask.org/en/latest/array.html"&gt;Dask
Array&lt;/a&gt;.&lt;/p&gt;
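&lt;p&gt;That pattern can be sketched in a few lines (a minimal illustration, assuming Dask is installed; &lt;code&gt;read&lt;/code&gt; here is a hypothetical stand-in for any function that turns a filename into a NumPy array, such as &lt;code&gt;skimage.io.imread&lt;/code&gt;):&lt;/p&gt;

```python
import numpy as np
import dask
import dask.array as da

# Hypothetical reader: any function mapping a filename to a
# NumPy array works here (e.g. skimage.io.imread for TIFFs)
def read(filename):
    return np.ones((4, 5), dtype=np.uint8)

filenames = ["raw/0.tif", "raw/1.tif", "raw/2.tif"]

# Wrap each call so it becomes a lazy task instead of running now
lazy_values = [dask.delayed(read)(fn) for fn in filenames]

# Tell Dask the shape and dtype that each task will produce
arrays = [da.from_delayed(v, shape=(4, 5), dtype=np.uint8)
          for v in lazy_values]

# Stack the per-file arrays into one logical 3-d Dask array;
# no file is actually read until we call .compute()
a = da.stack(arrays, axis=0)
```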
&lt;section id="dask-image"&gt;
&lt;h2&gt;Dask Image&lt;/h2&gt;
&lt;p&gt;However, we can make things a bit easier for users if we specialize a bit. For
example the &lt;a class="reference external" href="https://image.dask.org/en/latest/"&gt;Dask Image&lt;/a&gt; library has a
parallel image reader function, which automates much of our work above in the
simple case.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask_image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;raw/*.tif&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Similarly libraries like &lt;a class="reference external" href="https://xarray.pydata.org/en/stable/"&gt;Xarray&lt;/a&gt; have
readers for other file formats, like GeoTIFF.&lt;/p&gt;
&lt;p&gt;As practitioners in a domain do more and more work like what we did above, they
tend to capture common patterns in domain-specific libraries, which increases the
accessibility and user base of these tools.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="gpus"&gt;
&lt;h2&gt;GPUs&lt;/h2&gt;
&lt;p&gt;If we have special hardware lying around, like a few GPUs, we can move the data
over to it and perform computations with a library like CuPy, which mimics
NumPy very closely. We then benefit from the same operations listed above, but
with the added performance of GPUs behind them.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cupy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cp&lt;/span&gt;
&lt;span class="n"&gt;a_gpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="computation"&gt;
&lt;h2&gt;Computation&lt;/h2&gt;
&lt;p&gt;Finally, in future blogposts we plan to talk about how to compute on our large
Dask arrays using common image-processing workloads like overlapping stencil
functions, segmentation and deconvolution, and integrating with other libraries
like ITK.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/06/20/load-image-data/"/>
    <category term="dask-image" label="dask-image"/>
    <category term="python" label="python"/>
    <category term="scikit-image" label="scikit-image"/>
    <category term="scipy" label="scipy"/>
    <published>2019-06-20T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/06/19/python-gpus-status-update/</id>
    <title>Python and GPUs: A Status Update</title>
    <updated>2019-06-19T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">&lt;p&gt;&lt;em&gt;This blogpost was delivered in talk form at the recent &lt;a class="reference external" href="https://pasc19.pasc-conference.org/"&gt;PASC
2019&lt;/a&gt; conference.
&lt;a class="reference external" href="https://docs.google.com/presentation/d/e/2PACX-1vSajAH6FzgQH4OwOJD5y-t9mjF9tTKEeljguEsfcjavp18pL4LkpABy4lW2uMykIUvP2dC-1AmhCq6l/pub?start=false&amp;amp;amp;loop=false&amp;amp;amp;delayms=60000"&gt;Slides for that talk are
here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/19/python-gpus-status-update.md&lt;/span&gt;, line 14)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="executive-summary"&gt;
&lt;h1&gt;Executive Summary&lt;/h1&gt;
&lt;p&gt;We’re improving the state of scalable GPU computing in Python.&lt;/p&gt;
&lt;p&gt;This post lays out the current status, and describes future work.
It also summarizes and links to several other blogposts from recent months that drill down into different topics for the interested reader.&lt;/p&gt;
&lt;p&gt;Briefly, we cover the following categories:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Python libraries written in CUDA like CuPy and RAPIDS&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Python-CUDA compilers, specifically Numba&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scaling these libraries out with Dask&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Network communication with UCX&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Packaging with Conda&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/19/python-gpus-status-update.md&lt;/span&gt;, line 29)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="performance-of-gpu-accelerated-python-libraries"&gt;
&lt;h1&gt;Performance of GPU accelerated Python Libraries&lt;/h1&gt;
&lt;p&gt;Probably the easiest way for a Python programmer to get access to GPU
performance is to use a GPU-accelerated Python library. These provide a set of
common operations that are well tuned and integrate well together.&lt;/p&gt;
&lt;p&gt;Many users know libraries for deep learning like PyTorch and TensorFlow, but
there are several others for more general purpose computing. These tend to copy
the APIs of popular Python projects:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Numpy on the GPU: &lt;a class="reference external" href="https://cupy.chainer.org/"&gt;CuPy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Numpy on the GPU (again): &lt;a class="reference external" href="https://github.com/google/jax"&gt;Jax&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pandas on the GPU: &lt;a class="reference external" href="https://docs.rapids.ai/api/cudf/nightly/"&gt;RAPIDS cuDF&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scikit-Learn on the GPU: &lt;a class="reference external" href="https://docs.rapids.ai/api/cuml/nightly/"&gt;RAPIDS cuML&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These libraries build GPU accelerated variants of popular Python
libraries like NumPy, Pandas, and Scikit-Learn. In order to better understand
the relative performance differences,
&lt;a class="reference external" href="https://github.com/pentschev"&gt;Peter Entschev&lt;/a&gt; recently put together a
&lt;a class="reference external" href="https://github.com/pentschev/pybench"&gt;benchmark suite&lt;/a&gt; to help with comparisons.
He has produced the following image showing the relative speedup between GPU
and CPU:&lt;/p&gt;
&lt;style&gt;
.vega-actions a {
    margin-right: 12px;
    color: #757575;
    font-weight: normal;
    font-size: 13px;
}
.error {
    color: red;
}
&lt;/style&gt;
&lt;script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega@5"&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-lite@3.3.0"&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-embed@4"&gt;&lt;/script&gt;
&lt;div id="vis"&gt;&lt;/div&gt;
&lt;p&gt;There are lots of interesting results there.
Peter goes into more depth on this in &lt;a class="reference external" href="https://blog.dask.org/2019/06/27/single-gpu-cupy-benchmarks"&gt;his blogpost&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;More broadly though, we see that there is variability in performance.
Our mental model for what is fast and slow on the CPU doesn’t necessarily
carry over to the GPU. Fortunately though, due to consistent APIs, users that are
familiar with Python can easily experiment with GPU acceleration without
learning CUDA.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/19/python-gpus-status-update.md&lt;/span&gt;, line 78)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="numba-compiling-python-to-cuda"&gt;
&lt;h1&gt;Numba: Compiling Python to CUDA&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;See also this &lt;a class="reference external" href="https://blog.dask.org/2019/04/09/numba-stencil"&gt;recent blogpost about Numba
stencils&lt;/a&gt; and the attached &lt;a class="reference external" href="https://gist.github.com/mrocklin/9272bf84a8faffdbbe2cd44b4bc4ce3c"&gt;GPU
notebook&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The built-in operations in GPU libraries like CuPy and RAPIDS cover most common
operations. However, in real-world settings we often find messy situations
that require writing a little bit of custom code. Switching down to C/C++/CUDA
in these cases can be challenging, especially for users that are primarily
Python developers. This is where Numba can come in.&lt;/p&gt;
&lt;p&gt;Python has this same problem on the CPU as well. Users often couldn’t be
bothered to learn C/C++ to write fast custom code. To address this there are
tools like Cython or Numba, which let Python programmers write fast numeric
code without learning much beyond the Python language.&lt;/p&gt;
&lt;p&gt;For example, Numba accelerates the for-loop style code below about 500x on the
CPU, from slow Python speeds up to fast C/Fortran speeds.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numba&lt;/span&gt;  &lt;span class="c1"&gt;# We added these two lines for a 500x speedup&lt;/span&gt;

&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jit&lt;/span&gt;    &lt;span class="c1"&gt;# We added these two lines for a 500x speedup&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The ability to drop down to low-level performant code without context switching
out of Python is useful, particularly if you don’t already know C/C++ or
have a compiler chain set up for you (which is the case for most Python users
today).&lt;/p&gt;
&lt;p&gt;This benefit is even more pronounced on the GPU. While many Python programmers
know a little bit of C, very few of them know CUDA. Even if they did, they
would probably have difficulty in setting up the compiler tools and development
environment.&lt;/p&gt;
&lt;p&gt;Enter &lt;a class="reference external" href="https://numba.pydata.org/numba-doc/dev/cuda/index.html"&gt;numba.cuda.jit&lt;/a&gt;,
Numba’s backend for CUDA. Numba.cuda.jit allows Python users to author,
compile, and run CUDA code, written in Python, interactively without leaving a
Python session. Here is an image of writing a stencil computation that
smooths a 2d image, all from within a Jupyter Notebook:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/numba.cuda.jit.png"
     width="100%"
     alt="Numba.cuda.jit in a Jupyter Notebook"&gt;&lt;/p&gt;
&lt;p&gt;Here is a simplified comparison of Numba CPU/GPU code to compare programming
style.
The GPU code gets a 200x speed improvement over a single CPU core.&lt;/p&gt;
&lt;section id="cpu-600-ms"&gt;
&lt;h2&gt;CPU – 600 ms&lt;/h2&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jit&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;empty_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;or if we use the fancy numba.stencil decorator …&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stencil&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
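&lt;p&gt;For reference, the same 3x3 smoothing can also be written with plain NumPy slicing. This is a vectorized sketch (assuming only NumPy) of the computation that the compiled versions here are performing:&lt;/p&gt;

```python
import numpy as np

def smooth_numpy(x):
    # Sum the 3x3 neighborhood of every interior point, then divide by 9,
    # matching the stencil versions above (border cells are left as zero)
    out = np.zeros_like(x)
    out[1:-1, 1:-1] = (
        x[:-2, :-2]  + x[:-2, 1:-1]  + x[:-2, 2:]  +
        x[1:-1, :-2] + x[1:-1, 1:-1] + x[1:-1, 2:] +
        x[2:, :-2]   + x[2:, 1:-1]   + x[2:, 2:]
    ) // 9
    return out

x = np.arange(25, dtype=np.int64).reshape(5, 5)
y = smooth_numpy(x)
```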
&lt;/section&gt;
&lt;section id="gpu-3-ms"&gt;
&lt;h2&gt;GPU – 3 ms&lt;/h2&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jit&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;smooth_gpu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                     &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                     &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Numba.cuda.jit has been out in the wild for years.
It’s accessible, mature, and fun to play with.
If you have a machine with a GPU in it and some curiosity
then we strongly recommend that you try it out.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;conda&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;numba&lt;/span&gt;
&lt;span class="c1"&gt;# or&lt;/span&gt;
&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;numba&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numba.cuda&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;/section&gt;
&lt;/section&gt;
&lt;section id="scaling-with-dask"&gt;
&lt;h1&gt;Scaling with Dask&lt;/h1&gt;
&lt;p&gt;As mentioned in previous blogposts
(
&lt;a class="reference external" href="https://blog.dask.org/2019/01/03/dask-array-gpus-first-steps"&gt;1&lt;/a&gt;,
&lt;a class="reference external" href="https://blog.dask.org/2019/01/13/dask-cudf-first-steps"&gt;2&lt;/a&gt;,
&lt;a class="reference external" href="https://blog.dask.org/2019/03/04/building-gpu-groupbys"&gt;3&lt;/a&gt;,
&lt;a class="reference external" href="https://blog.dask.org/2019/03/18/dask-nep18"&gt;4&lt;/a&gt;
)
we’ve been generalizing &lt;a class="reference external" href="https://dask.org"&gt;Dask&lt;/a&gt; to operate not just with
Numpy arrays and Pandas dataframes, but with anything that looks enough like
Numpy (like &lt;a class="reference external" href="https://cupy.chainer.org/"&gt;CuPy&lt;/a&gt; or
&lt;a class="reference external" href="https://sparse.pydata.org/en/latest/"&gt;Sparse&lt;/a&gt; or
&lt;a class="reference external" href="https://github.com/google/jax"&gt;Jax&lt;/a&gt;) or enough like Pandas (like &lt;a class="reference external" href="https://docs.rapids.ai/api/cudf/nightly/"&gt;RAPIDS
cuDF&lt;/a&gt;)
to scale those libraries out too. This is working out well. Here is a brief
video showing Dask Array computing an SVD in parallel, and what happens
when we swap out the Numpy library for CuPy.&lt;/p&gt;
&lt;iframe width="560"
        height="315"
        src="https://www.youtube.com/embed/QyyxpzNPuIE?start=1046"
        frameborder="0"
        allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
        allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;We see about a 10x speed improvement on the computation. Most
importantly, we were able to switch between a CPU implementation and a GPU
implementation with a small one-line change, while continuing to use the
sophisticated algorithms in Dask Array, like its parallel SVD
implementation.&lt;/p&gt;
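&lt;p&gt;That one-line change amounts to swapping which array module backs the data. Here is a minimal in-memory sketch of the idea (assuming only NumPy is installed; on a GPU machine you would swap in CuPy, and Dask Array applies the same trick per chunk):&lt;/p&gt;

```python
import numpy as np

xp = np  # on a GPU machine: `import cupy; xp = cupy`

rng = np.random.default_rng(0)
x = xp.asarray(rng.standard_normal((500, 50)))

# Everything below is identical for either backend
u, s, vt = xp.linalg.svd(x, full_matrices=False)
err = float(abs(u @ (s[:, None] * vt) - x).max())
```

&lt;p&gt;With NumPy this runs on the CPU; pointing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;xp&lt;/span&gt;&lt;/code&gt; at CuPy runs the same lines on the GPU.&lt;/p&gt;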
&lt;p&gt;We also saw a relative slowdown in communication. In general almost all
non-trivial Dask + GPU work today is becoming communication-bound. We’ve
gotten fast enough at computation that the relative importance of communication
has grown significantly. We’re working to resolve this with our next topic,
UCX.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="communication-with-ucx"&gt;
&lt;h1&gt;Communication with UCX&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;See &lt;a class="reference external" href="https://developer.download.nvidia.com/video/gputechconf/gtc/2019/video/S9679/s9679-ucx-python-a-flexible-communication-library-for-python-applications.mp4"&gt;this talk&lt;/a&gt; by &lt;a class="reference external" href="https://github.com/Akshay-Venkatesh"&gt;Akshay
Venkatesh&lt;/a&gt; or view &lt;a class="reference external" href="https://www.slideshare.net/MatthewRocklin/ucxpython-a-flexible-communication-library-for-python-applications"&gt;the
slides&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Also see &lt;a class="reference external" href="https://blog.dask.org/2019/06/09/ucx-dgx"&gt;this recent blogpost about UCX and
Dask&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We’ve been integrating the &lt;a class="reference external" href="https://openucx.org"&gt;OpenUCX&lt;/a&gt; library into Python
with &lt;a class="reference external" href="https://github.com/rapidsai/ucx-py"&gt;UCX-Py&lt;/a&gt;. UCX provides uniform access
to transports like TCP, InfiniBand, shared memory, and NVLink. UCX-Py is the
first time that access to many of these transports has been easily accessible
from the Python language.&lt;/p&gt;
&lt;p&gt;Using UCX and Dask together we’re able to get significant speedups. Here is a
trace of the SVD computation from earlier, both before and after adding UCX:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Before UCX&lt;/strong&gt;:&lt;/p&gt;
&lt;iframe src="https://matthewrocklin.com/raw-host/task_stream_lcc_dgx16.html" width="100%" height="200"&gt;&lt;/iframe&gt;
&lt;p&gt;&lt;strong&gt;After UCX&lt;/strong&gt;:&lt;/p&gt;
&lt;iframe src="https://matthewrocklin.com/raw-host/task_stream_dgx_dgx16.html" width="100%" height="200"&gt;&lt;/iframe&gt;
&lt;p&gt;There is still a great deal to do here though (the blogpost linked above has
several items in the Future Work section).&lt;/p&gt;
&lt;p&gt;People can try out UCX and UCX-Py with highly experimental conda packages:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;conda&lt;/span&gt; &lt;span class="n"&gt;create&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="n"&gt;ucx&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="n"&gt;conda&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;forge&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="n"&gt;jakirkham&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ucx&lt;/span&gt; &lt;span class="n"&gt;cudatoolkit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;9.2&lt;/span&gt; &lt;span class="n"&gt;ucx&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;proc&lt;/span&gt;&lt;span class="o"&gt;=*=&lt;/span&gt;&lt;span class="n"&gt;gpu&lt;/span&gt; &lt;span class="n"&gt;ucx&lt;/span&gt; &lt;span class="n"&gt;ucx&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.7&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
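&lt;p&gt;Once installed, UCX can be selected as the communication protocol when launching the scheduler and workers. The session below is illustrative only; the exact flags and addresses depend on your version and network:&lt;/p&gt;

```console
$ dask-scheduler --protocol ucx
Scheduler running at ucx://10.0.0.1:8786

$ dask-worker ucx://10.0.0.1:8786
```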
&lt;p&gt;We hope that this work will also benefit non-GPU users on HPC systems with
InfiniBand, or even users on consumer hardware, thanks to the easy access to
shared memory communication.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="packaging"&gt;
&lt;h1&gt;Packaging&lt;/h1&gt;
&lt;p&gt;In an &lt;a class="reference external" href="https://matthewrocklin.com/blog/work/2018/12/17/gpu-python-challenges"&gt;earlier blogpost&lt;/a&gt;
we discussed the challenges around installing the wrong versions of CUDA
enabled packages that don’t match the CUDA driver installed on the system.
Fortunately due to recent work from &lt;a class="reference external" href="https://github.com/seibert"&gt;Stan Seibert&lt;/a&gt;
and &lt;a class="reference external" href="https://github.com/msarahan"&gt;Michael Sarahan&lt;/a&gt; at Anaconda, Conda 4.7 now
has a special &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cuda&lt;/span&gt;&lt;/code&gt; meta-package that is set to the version of the installed
driver. This should make it much easier for users in the future to install the
correct package.&lt;/p&gt;
&lt;p&gt;Conda 4.7 was just released, and comes with many new features beyond the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cuda&lt;/span&gt;&lt;/code&gt; meta-package. You can read more about it &lt;a class="reference external" href="https://www.anaconda.com/how-we-made-conda-faster-4-7/"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;conda&lt;/span&gt; &lt;span class="n"&gt;update&lt;/span&gt; &lt;span class="n"&gt;conda&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;There is still plenty of work to do in the packaging space today.
Everyone who builds conda packages does it their own way,
resulting in headache and heterogeneity.
This is largely due to not having centralized infrastructure
to build and test CUDA enabled packages,
like we have in &lt;a class="reference external" href="https://conda-forge.org"&gt;Conda Forge&lt;/a&gt;.
Fortunately, the Conda Forge community is working together with Anaconda and
NVIDIA to help resolve this, though that will likely take some time.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="summary"&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;This post gave an update on the status of some of the efforts behind GPU
computing in Python. It also provided a variety of links for further reading.
We include them below if you would like to learn more:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.google.com/presentation/d/e/2PACX-1vSajAH6FzgQH4OwOJD5y-t9mjF9tTKEeljguEsfcjavp18pL4LkpABy4lW2uMykIUvP2dC-1AmhCq6l/pub?start=false&amp;amp;amp;loop=false&amp;amp;amp;delayms=60000"&gt;Slides&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Numpy on the GPU: &lt;a class="reference external" href="https://cupy.chainer.org/"&gt;CuPy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Numpy on the GPU (again): &lt;a class="reference external" href="https://github.com/google/jax"&gt;Jax&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pandas on the GPU: &lt;a class="reference external" href="https://docs.rapids.ai/api/cudf/nightly/"&gt;RAPIDS cuDF&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scikit-Learn on the GPU: &lt;a class="reference external" href="https://docs.rapids.ai/api/cuml/nightly/"&gt;RAPIDS cuML&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/pentschev/pybench"&gt;Benchmark suite&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://gist.github.com/mrocklin/9272bf84a8faffdbbe2cd44b4bc4ce3c"&gt;Numba CUDA JIT notebook&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://developer.download.nvidia.com/video/gputechconf/gtc/2019/video/S9679/s9679-ucx-python-a-flexible-communication-library-for-python-applications.mp4"&gt;A talk on UCX&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2019/06/09/ucx-dgx"&gt;A blogpost on UCX and Dask&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.anaconda.com/how-we-made-conda-faster-4-7/"&gt;Conda 4.7&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;script&gt;
  var spec = {
  "config": {
    "view": {
      "width": 300,
      "height": 200
    },
    "mark": {
      "tooltip": null
    },
    "axis": {
      "grid": false,
      "labelColor": "#666666",
      "labelFontSize": 16,
      "titleColor": "#666666",
      "titleFontSize": 20
    },
    "axisX": {
      "labelAngle": -30,
      "labelColor": "#666666",
      "labelFontSize": 0,
      "titleColor": "#666666",
      "titleFontSize": 0
    },
    "header": {
      "labelAngle": -20,
      "labelColor": "#666666",
      "labelFontSize": 16,
      "titleColor": "#666666",
      "titleFontSize": 20
    },
    "legend": {
      "fillColor": "#fefefe",
      "labelColor": "#666666",
      "labelFontSize": 18,
      "padding": 10,
      "strokeColor": "gray",
      "titleColor": "#666666",
      "titleFontSize": 18
    }
  },
  "data": {
    "name": "data-4957f64f65957150f8029f7df2e6936f"
  },
  "facet": {
    "column": {
      "type": "nominal",
      "field": "operation",
      "sort": {
        "field": "speedup",
        "op": "sum",
        "order": "descending"
      },
      "title": "Operation"
    }
  },
  "spec": {
    "layer": [
      {
        "mark": {
          "type": "bar",
          "fontSize": 18,
          "opacity": 1.0
        },
        "encoding": {
          "color": {
            "type": "nominal",
            "field": "size",
            "scale": {
              "domain": [
                "800MB",
                "8MB"
              ],
              "range": [
                "#7306ff",
                "#36c9dd"
              ]
            },
            "title": "Array Size"
          },
          "x": {
            "type": "nominal",
            "field": "size"
          },
          "y": {
            "type": "quantitative",
            "axis": {
              "title": "GPU Speedup Over CPU"
            },
            "field": "speedup",
            "scale": {
              "domain": [
                0,
                1000
              ],
              "type": "symlog"
            },
            "stack": null
          }
        },
        "height": 300,
        "width": 50
      },
      {
        "layer": [
          {
            "mark": {
              "type": "text",
              "dy": -5
            },
            "encoding": {
              "color": {
                "type": "nominal",
                "field": "size",
                "scale": {
                  "domain": [
                    "800MB",
                    "8MB"
                  ],
                  "range": [
                    "#7306ff",
                    "#36c9dd"
                  ]
                },
                "title": "Array Size"
              },
              "text": {
                "type": "quantitative",
                "field": "speedup"
              },
              "x": {
                "type": "nominal",
                "field": "size"
              },
              "y": {
                "type": "quantitative",
                "axis": {
                  "title": "GPU Speedup Over CPU"
                },
                "field": "speedup",
                "scale": {
                  "domain": [
                    0,
                    1000
                  ],
                  "type": "symlog"
                },
                "stack": null
              }
            },
            "height": 300,
            "width": 50
          },
          {
            "mark": {
              "type": "text",
              "dy": 7
            },
            "encoding": {
              "color": {
                "type": "nominal",
                "field": "size",
                "scale": {
                  "domain": [
                    "800MB",
                    "8MB"
                  ],
                  "range": [
                    "#7306ff",
                    "#36c9dd"
                  ]
                },
                "title": "Array Size"
              },
              "text": {
                "type": "quantitative",
                "field": "speedup"
              },
              "x": {
                "type": "nominal",
                "field": "size"
              },
              "y": {
                "type": "quantitative",
                "axis": {
                  "title": "GPU Speedup Over CPU"
                },
                "field": "speedup",
                "scale": {
                  "domain": [
                    0,
                    1000
                  ],
                  "type": "symlog"
                },
                "stack": null
              }
            },
            "height": 300,
            "width": 50
          }
        ]
      }
    ]
  },
  "$schema": "https://vega.github.io/schema/vega-lite/v3.3.0.json",
  "datasets": {
    "data-4957f64f65957150f8029f7df2e6936f": [
      {
        "operation": "FFT",
        "speedup": 5.3,
        "shape0": 1000,
        "shape1": 1000,
        "shape": "1000x1000",
        "size": "8MB"
      },
      {
        "operation": "FFT",
        "speedup": 210.0,
        "shape0": 10000,
        "shape1": 10000,
        "shape": "10000x10000",
        "size": "800MB"
      },
      {
        "operation": "Sum",
        "speedup": 8.3,
        "shape0": 1000,
        "shape1": 1000,
        "shape": "1000x1000",
        "size": "8MB"
      },
      {
        "operation": "Sum",
        "speedup": 66.0,
        "shape0": 10000,
        "shape1": 10000,
        "shape": "10000x10000",
        "size": "800MB"
      },
      {
        "operation": "Standard Deviation",
        "speedup": 1.1,
        "shape0": 1000,
        "shape1": 1000,
        "shape": "1000x1000",
        "size": "8MB"
      },
      {
        "operation": "Standard Deviation",
        "speedup": 3.5,
        "shape0": 10000,
        "shape1": 10000,
        "shape": "10000x10000",
        "size": "800MB"
      },
      {
        "operation": "Elementwise",
        "speedup": 150.0,
        "shape0": 1000,
        "shape1": 1000,
        "shape": "1000x1000",
        "size": "8MB"
      },
      {
        "operation": "Elementwise",
        "speedup": 270.0,
        "shape0": 10000,
        "shape1": 10000,
        "shape": "10000x10000",
        "size": "800MB"
      },
      {
        "operation": "Matrix Multiplication",
        "speedup": 18.0,
        "shape0": 1000,
        "shape1": 1000,
        "shape": "1000x1000",
        "size": "8MB"
      },
      {
        "operation": "Matrix Multiplication",
        "speedup": 11.0,
        "shape0": 10000,
        "shape1": 10000,
        "shape": "10000x10000",
        "size": "800MB"
      },
      {
        "operation": "Array Slicing",
        "speedup": 3.6,
        "shape0": 1000,
        "shape1": 1000,
        "shape": "1000x1000",
        "size": "8MB"
      },
      {
        "operation": "Array Slicing",
        "speedup": 190.0,
        "shape0": 10000,
        "shape1": 10000,
        "shape": "10000x10000",
        "size": "800MB"
      },
      {
        "operation": "SVD",
        "speedup": 1.5,
        "shape0": 1000,
        "shape1": 1000,
        "shape": "1000x1000",
        "size": "8MB"
      },
      {
        "operation": "SVD",
        "speedup": 17.0,
        "shape0": 10000,
        "shape1": 1000,
        "shape": "10000x1000",
        "size": "800MB"
      },
      {
        "operation": "Stencil",
        "speedup": 5.1,
        "shape0": 1000,
        "shape1": 1000,
        "shape": "1000x1000",
        "size": "8MB"
      },
      {
        "operation": "Stencil",
        "speedup": 150.0,
        "shape0": 10000,
        "shape1": 10000,
        "shape": "10000x10000",
        "size": "800MB"
      }
    ]
  }
};

  var embedOpt = {"mode": "vega-lite"};

  function showError(el, error){
      el.innerHTML = ('&lt;div class="error" style="color:red;"&gt;'
                      + '&lt;p&gt;JavaScript Error: ' + error.message + '&lt;/p&gt;'
                      + "&lt;p&gt;This usually means there's a typo in your chart specification. "
                      + "See the javascript console for the full traceback.&lt;/p&gt;"
                      + '&lt;/div&gt;');
      throw error;
  }
  vegaEmbed("#vis", spec, embedOpt)
    .catch(error =&gt; showError(document.getElementById("vis"), error));
&lt;/script&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/06/19/python-gpus-status-update/"/>
    <summary>This blogpost was delivered in talk form at the recent PASC
2019 conference.
Slides for that talk are
here.</summary>
    <category term="python" label="python"/>
    <category term="scipy" label="scipy"/>
    <published>2019-06-19T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/06/12/dask-on-hpc/</id>
    <title>Dask on HPC</title>
    <updated>2019-06-12T00:00:00+00:00</updated>
    <author>
      <name>Joe Hamman (NCAR)</name>
    </author>
    <content type="html">&lt;p&gt;We analyze large datasets on HPC systems with Dask, a parallel computing
library that integrates well with the existing Python software ecosystem, and
works comfortably with native HPC hardware.&lt;/p&gt;
&lt;p&gt;This article explains why this approach makes sense for us.
Our motivation is to share our experiences with our colleagues,
and to highlight opportunities for future work.&lt;/p&gt;
&lt;p&gt;We start with six reasons why we use Dask,
followed by seven issues that affect us today.&lt;/p&gt;
&lt;section id="reasons-why-we-use-dask"&gt;

&lt;section id="ease-of-use"&gt;
&lt;h2&gt;1. Ease of use&lt;/h2&gt;
&lt;p&gt;Dask extends libraries like Numpy, Pandas, and Scikit-learn, which are well-known APIs for scientists and engineers. It also extends simpler APIs for
multi-node multiprocessing. This makes it easy for our existing user base to
get up to speed.&lt;/p&gt;
&lt;p&gt;By abstracting the parallelism away from the user/developer, our analysis tools can be written by non-experts in computer science, such as the scientists
themselves, which lets our software engineers take on a supporting role rather than a leadership role.
Experience has shown that, with tools like Dask and Jupyter, scientists spend less time coding and more time thinking about science, as they should.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="smooth-hpc-integration"&gt;
&lt;h2&gt;2. Smooth HPC integration&lt;/h2&gt;
&lt;p&gt;With tools like &lt;a class="reference external" href="https://jobqueue.dask.org"&gt;Dask Jobqueue&lt;/a&gt; and &lt;a class="reference external" href="https://mpi.dask.org"&gt;Dask MPI&lt;/a&gt; there is no need for the boilerplate shell scripts commonly required by job queueing systems.&lt;/p&gt;
&lt;p&gt;Dask interacts natively with our existing job schedulers (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;SLURM&lt;/span&gt;&lt;/code&gt;/&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;SGE&lt;/span&gt;&lt;/code&gt;/&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LSF&lt;/span&gt;&lt;/code&gt;/&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;PBS&lt;/span&gt;&lt;/code&gt;/…)
so there is no additional system to set up and manage between users and IT.
All the infrastructure that we need is already in place.&lt;/p&gt;
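&lt;p&gt;For example, on a SLURM system Dask Jobqueue composes and submits batch scripts roughly like the one below on our behalf. This is an illustrative sketch; the directives and worker options depend entirely on site configuration:&lt;/p&gt;

```bash
#!/usr/bin/env bash
#SBATCH -J dask-worker
#SBATCH -p regular
#SBATCH -n 1
#SBATCH --cpus-per-task=24
#SBATCH --mem=100G
#SBATCH -t 01:00:00

# Each job starts one Dask worker that connects back to the scheduler
dask-worker tcp://192.168.0.1:8786 --nthreads 24 --memory-limit 100GB
```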
&lt;p&gt;Interactive analysis at scale is powerful, and lets
us use our existing infrastructure in new ways.
Auto scaling improves our occupancy and helps with acceptance by HPC operators / owners.
Dask’s resilience against the death of all or part of its workers offers new ways of leveraging job-preemption when co-locating classical HPC workloads with analytics jobs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="aimed-for-scientific-processing"&gt;
&lt;h2&gt;3. Aimed for Scientific Processing&lt;/h2&gt;
&lt;p&gt;In addition to being integrated with the Scipy and PyData software ecosystems,
Dask is compatible with scientific data formats like HDF5, NetCDF, Parquet, and
so on. This is because Dask works with other libraries within the Python
ecosystem, like Xarray, which already have strong support for scientific data
formats and processing, and with the C/C++/Fortran codes that commonly underlie Python libraries.&lt;/p&gt;
&lt;p&gt;This native support is one of the major advantages that we’ve seen of Dask over Apache Spark.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="versatility-of-apis"&gt;
&lt;h2&gt;4. Versatility of APIs&lt;/h2&gt;
&lt;p&gt;And yet Dask is not designed for any particular workflow, but instead can
provide infrastructure to cover a variety of different problems within an
institution. Many different kinds of workloads are possible:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;You can handle NumPy arrays or Pandas DataFrames at scale, for numerical work or data analysis and cleaning,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can handle collections of arbitrary objects, like JSON records, text, or log files,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can express more arbitrary task or job scheduling workloads with Dask Delayed, or real time and reactive processing with Dask Futures.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
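&lt;p&gt;As a minimal sketch of the task-scheduling style (with toy stand-in functions, not a real workload), a pipeline written with Dask Delayed looks like ordinary Python: each call is recorded lazily and the resulting task graph runs in parallel on &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compute()&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;

```python
import dask

# Toy stand-ins for load / clean / summarize stages; each decorated call
# builds a node in the task graph instead of running immediately.
@dask.delayed
def load(i):
    return list(range(i))

@dask.delayed
def clean(data):
    return [x for x in data if x % 2 == 0]

@dask.delayed
def summarize(parts):
    return sum(len(p) for p in parts)

parts = [clean(load(i)) for i in range(5)]
total = summarize(parts)          # nothing has run yet
result = total.compute()          # executes the whole graph
print(result)                     # 6
```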
&lt;p&gt;Dask covers and simplifies many of the HPC workflows we’ve seen over the years. Workflows that were previously implemented with job arrays, simple MPI (e.g. mpi4py), or plain bash scripts are often easier for our users to express with Dask.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="versatility-of-infrastructure"&gt;
&lt;h2&gt;5. Versatility of Infrastructure&lt;/h2&gt;
&lt;p&gt;Dask is compatible with laptops, servers, HPC systems, and cloud computing. Moving between these environments requires very little code adaptation, which reduces the burden of rewriting analyses as they migrate between systems, such as from a laptop to a supercomputer, or from a supercomputer to the cloud.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Local machines&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# HPC Job Schedulers&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_jobqueue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SLURMCluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PBSCluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SGECluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SLURMCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;default&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ABCD1234&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Hadoop/Spark clusters&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_yarn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;YARNCluster&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;YarnCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;environment.tar.gz&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worker_vcores&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Cloud/Kubernetes clusters&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_kubernetes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KubeCluster&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;KubeCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pod_spec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Dask is more than just a tool to us; it is a gateway to thinking about a completely different way of providing computing infrastructure to our users. Dask opens up the door to cloud computing technologies (such as elastic scaling and object storage) and makes us rethink what an HPC center should really look like.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="cost-and-collaboration"&gt;
&lt;h2&gt;6. Cost and Collaboration&lt;/h2&gt;
&lt;p&gt;Dask is free and open source, which means we do not have to rebalance our budget and staff to meet the immediate need for data analysis tools.
We don’t have to pay for licenses, and we can change the code when necessary. The HPC community has good representation among Dask developers, so it’s easy for us to participate and our concerns are well understood.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/12/dask-on-hpc.md&lt;/span&gt;, line 100)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="what-needs-work"&gt;
&lt;h1&gt;What needs work&lt;/h1&gt;
&lt;section id="heterogeneous-resources-handling"&gt;
&lt;h2&gt;1. Heterogeneous resources handling&lt;/h2&gt;
&lt;p&gt;Often we want to include different kinds of HPC nodes in the same deployment.
This includes situations like the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Workers with low or high memory,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Workers with GPUs,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Workers from different node pools.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Dask provides some support for this heterogeneity already, but not enough.
We see two major opportunities for improvement.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Tools like Dask-Jobqueue should make it easier to manage multiple worker
pools within the same cluster. Currently the deployment solution assumes
homogeneity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It should be easier for users to specify which parts of a computation
require different hardware. The solution today works, but requires more
detail from the user than is ideal.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
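&lt;p&gt;The works-but-verbose mechanism referred to above is Dask’s abstract worker resources. A sketch, here on an in-process cluster for illustration; on a real deployment each worker would instead be started with something like &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-worker&lt;/span&gt; &lt;span class="pre"&gt;--resources&lt;/span&gt; &lt;span class="pre"&gt;"GPU=1"&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;

```python
from dask.distributed import Client, LocalCluster

# Declare an abstract "GPU" resource on a local, in-process cluster.
cluster = LocalCluster(n_workers=1, processes=False, resources={"GPU": 1})
client = Client(cluster)

def square(x):
    return x ** 2

# Only workers advertising a GPU resource may run this task; the user
# must annotate every such task explicitly, which is the verbosity we
# would like to reduce.
future = client.submit(square, 4, resources={"GPU": 1})
result = future.result()
print(result)                     # 16
client.close()
cluster.close()
```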
&lt;/section&gt;
&lt;section id="coarse-grained-diagnostics-and-history"&gt;
&lt;h2&gt;2. Coarse-Grained Diagnostics and History&lt;/h2&gt;
&lt;p&gt;Dask provides a number of profiling tools that deliver real-time diagnostics at the individual task-level, but there is no way today to analyze or profile your Dask application at a coarse-grained level, and no built-in way to track performance over long periods of time.&lt;/p&gt;
&lt;p&gt;Having more tools to analyze bulk performance would be helpful when making design decisions and future architecture choices.&lt;/p&gt;
&lt;p&gt;Having the ability to persist the history of computations (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compute()&lt;/span&gt;&lt;/code&gt; calls)
and of the tasks executed on a scheduler would also be helpful for tracking problems and identifying potential performance improvements.&lt;/p&gt;
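&lt;p&gt;One possible building block already exists for the single-machine schedulers: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.diagnostics.Profiler&lt;/span&gt;&lt;/code&gt; records per-task timings, which could be persisted after each &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compute()&lt;/span&gt;&lt;/code&gt; call. A sketch (the JSON log format below is our own invention, not a Dask convention):&lt;/p&gt;

```python
import json
import dask.array as da
from dask.diagnostics import Profiler

# Run a small computation under the profiler on the local scheduler.
x = da.ones((1000, 1000), chunks=(250, 250))
with Profiler() as prof:
    total = x.sum().compute()

# prof.results holds one record per executed task:
# (key, task, start_time, end_time, worker_id).
history = [
    {"task": str(r.key), "start": r.start_time, "stop": r.end_time}
    for r in prof.results
]
# e.g. append this to a run log for later coarse-grained analysis
log_line = json.dumps({"ntasks": len(history)})
print(total, len(history))
```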
&lt;/section&gt;
&lt;section id="scheduler-performance-on-large-graphs"&gt;
&lt;h2&gt;3. Scheduler Performance on Large Graphs&lt;/h2&gt;
&lt;p&gt;HPC users want to analyze Petabyte datasets on clusters of thousands of large nodes.&lt;/p&gt;
&lt;p&gt;While Dask can theoretically handle this scale, it does tend to slow down,
reducing the pleasure of interactive large-scale computing. Handling millions of tasks can lead to tens of seconds of latency before a computation actually starts. This is perfectly fine for our Dask batch jobs, but frustrates interactive Jupyter users.&lt;/p&gt;
&lt;p&gt;Much of this slowdown is due to task-graph construction time and centralized scheduling, both of which can be accelerated through a variety of means. We expect that, with some cleverness, we can increase the scale at which Dask continues to run smoothly by another order of magnitude.&lt;/p&gt;
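&lt;p&gt;A rough way to see where that latency comes from is to count tasks before anything runs; graph construction and scheduling costs grow with this number, so choosing larger chunks is the usual first mitigation (the sizes below are arbitrary):&lt;/p&gt;

```python
import time
import dask.array as da

start = time.perf_counter()
x = da.ones((20000, 20000), chunks=(500, 500))   # 40 x 40 = 1600 chunks
y = (x + x.T).mean()                             # still lazy: no data touched
n_tasks = len(y.__dask_graph__())                # tasks the scheduler must track
elapsed = time.perf_counter() - start
print(n_tasks, round(elapsed, 3))
```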
&lt;/section&gt;
&lt;section id="launch-batch-jobs-with-mpi"&gt;
&lt;h2&gt;4. &lt;del&gt;Launch Batch Jobs with MPI&lt;/del&gt;&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;This issue was resolved while we prepared this blogpost.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Most Dask workflows today are interactive. People log into a Jupyter notebook, import Dask, and then Dask asks the job scheduler (like SLURM, PBS, …) for resources dynamically. This is great because Dask can fit into small gaps in the schedule and release workers when they’re not needed, giving users a pleasant interactive experience while lessening the load on the cluster.&lt;/p&gt;
&lt;p&gt;However not all jobs are interactive. Often scientists want to submit a large job similar to how they submit MPI jobs. They submit a single job script with the necessary resources, walk away, and the resource manager runs that job when those resources become available (which may be many hours from now). While not as novel as the interactive workloads, these workloads are critical to common processes, and important to support.&lt;/p&gt;
&lt;p&gt;This point was raised by Kevin Paul at NCAR during discussion of this blogpost. Between when we started planning and when we released this blogpost, Kevin had already solved the problem by providing &lt;a class="reference external" href="https://dask-mpi.readthedocs.org"&gt;dask-mpi&lt;/a&gt;, a project that launches Dask with normal &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mpirun&lt;/span&gt;&lt;/code&gt; or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mpiexec&lt;/span&gt;&lt;/code&gt; commands, making it easy to deploy Dask anywhere that MPI is available.&lt;/p&gt;
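&lt;p&gt;A batch deployment with dask-mpi then looks roughly like the sketch below: the script is submitted once through the normal MPI launcher, one rank becomes the scheduler, one runs the client code, and the remaining ranks become workers (see the dask-mpi documentation for current details).&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# batch_job.py
from dask_mpi import initialize
from dask.distributed import Client

initialize()        # splits MPI ranks into scheduler, client, and workers
client = Client()   # connects to the scheduler started above

# ... ordinary Dask code here ...
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;launched inside an ordinary job script with something like &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mpirun&lt;/span&gt; &lt;span class="pre"&gt;-np&lt;/span&gt; &lt;span class="pre"&gt;36&lt;/span&gt; &lt;span class="pre"&gt;python&lt;/span&gt; &lt;span class="pre"&gt;batch_job.py&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;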
&lt;/section&gt;
&lt;section id="more-data-formats"&gt;
&lt;h2&gt;5. More Data Formats&lt;/h2&gt;
&lt;p&gt;Dask works well today with bread-and-butter scientific data formats like HDF5, Grib, and NetCDF, as well as common data science formats like CSV, JSON, Parquet, ORC, and so on.&lt;/p&gt;
&lt;p&gt;However, the space of data formats is vast and Dask users find themselves struggling a little, or even solving the data ingestion problem manually for a number of common formats in different domains:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Remote sensing datasets: GeoTIFF, Jpeg2000,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Astronomical data: FITS, VOTable,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;… and so on&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Supporting these isn’t hard (indeed many of us have built our own support for them in Dask), but it would be handy to have a high quality centralized solution.&lt;/p&gt;
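&lt;p&gt;The manual pattern mentioned above usually amounts to wrapping a per-file reader in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.delayed&lt;/span&gt;&lt;/code&gt; and stitching the pieces into one array. A sketch, where &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;read_tile&lt;/span&gt;&lt;/code&gt; is a stand-in for a real GeoTIFF or FITS reader:&lt;/p&gt;

```python
import numpy as np
import dask
import dask.array as da

def read_tile(i):
    # Stand-in for a real format reader returning one tile of known
    # shape and dtype (e.g. one GeoTIFF window or one FITS HDU).
    return np.full((100, 100), i, dtype="float64")

tiles = [
    da.from_delayed(dask.delayed(read_tile)(i), shape=(100, 100), dtype="float64")
    for i in range(4)
]
stack = da.stack(tiles)           # lazy (4, 100, 100) array, one task per tile
print(stack.shape, float(stack.mean().compute()))
```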
&lt;/section&gt;
&lt;section id="link-with-deep-learning"&gt;
&lt;h2&gt;6. Link with Deep Learning&lt;/h2&gt;
&lt;p&gt;Many of our institutions are excited to leverage recent advances in deep learning and integrate powerful tools like Keras, TensorFlow, and PyTorch and powerful hardware like GPUs into our workflows.&lt;/p&gt;
&lt;p&gt;However, we often find that our data and architecture look a bit different from what we find in standard deep learning tutorials. We like using Dask for data ingestion, cleanup, and pre-processing, but would like to establish better practices and smooth tooling to transition from scientific workflows on HPC using Dask to deep learning as efficiently as possible.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;For more information, see &lt;a class="reference external" href="https://github.com/pangeo-data/pangeo/issues/567"&gt;this github
issue&lt;/a&gt; for an example topic.&lt;/em&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="more-calculation-guidelines"&gt;
&lt;h2&gt;7. More calculation guidelines&lt;/h2&gt;
&lt;p&gt;While there are tools to analyse and diagnose computations interactively, and
a decent set of examples for common Dask calculations, trial and error is still the norm before a big HPC computation settles into an optimized workflow.&lt;/p&gt;
&lt;p&gt;We should develop more guidelines and strategies for performing large-scale computation, and continue to foster the community around Dask, as projects such as Pangeo already do. Note that these guidelines may be infrastructure dependent.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/06/12/dask-on-hpc/"/>
    <summary>We analyze large datasets on HPC systems with Dask, a parallel computing
library that integrates well with the existing Python software ecosystem, and
works comfortably with native HPC hardware.</summary>
    <category term="Programming" label="Programming"/>
    <category term="Python" label="Python"/>
    <category term="dask" label="dask"/>
    <category term="scipy" label="scipy"/>
    <published>2019-06-12T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/06/09/ucx-dgx/</id>
    <title>Experiments in High Performance Networking with UCX and DGX</title>
    <updated>2019-06-09T00:00:00+00:00</updated>
    <author>
      <name>Rick Zamora</name>
    </author>
    <content type="html">&lt;p&gt;&lt;em&gt;This post is about experimental and rapidly changing software.
Code examples in this post should not be relied upon to work in the future.&lt;/em&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/09/ucx-dgx.md&lt;/span&gt;, line 12)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="executive-summary"&gt;

&lt;p&gt;This post talks about connecting UCX, a high performance networking library, to
Dask, a parallel Python library, to accelerate communication-heavy workloads,
particularly when using GPUs.&lt;/p&gt;
&lt;p&gt;Additionally, we do this work on a DGX, a high-end multi-CPU multi-GPU machine
with a complex internal network. Working in this context forced
improvements in setting up Dask in heterogeneous situations, targeting
different network cards, CPU sockets, GPUs, and so on.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/09/ucx-dgx.md&lt;/span&gt;, line 23)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="motivation"&gt;
&lt;h1&gt;Motivation&lt;/h1&gt;
&lt;p&gt;Many distributed computing workloads are communication-bound.
This is common in cases like the following:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Dataframe joins&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Machine learning algorithms&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Complex array computations&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Communication becomes a bigger bottleneck as we accelerate our computation,
such as when we use GPUs for computing.&lt;/p&gt;
&lt;p&gt;Historically, high performance communication was only available using MPI, or
with custom solutions. This post describes an effort to get close to the
communication bandwidth of MPI while still maintaining the ease of
programmability and accessibility of a dynamic system like Dask.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/09/ucx-dgx.md&lt;/span&gt;, line 40)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="ucx-python-and-dask"&gt;
&lt;h1&gt;UCX, Python, and Dask&lt;/h1&gt;
&lt;p&gt;To get high performance networking in Dask, we wrapped UCX with Python and
then connected that to Dask.&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="http://www.openucx.org/"&gt;OpenUCX&lt;/a&gt; project provides a uniform API around
various high performance networking libraries like InfiniBand, traditional
networking protocols like TCP/shared memory, and GPU-specific protocols like
NVLink. It is a layer beneath something like OpenMPI (the main user of OpenUCX
today) that figures out which networking system to use.&lt;/p&gt;
&lt;a href="http://www.openucx.org/wp-content/uploads/2015/07/ucx-architecture-1024x505.jpg"&gt;
&lt;img src="http://www.openucx.org/wp-content/uploads/2015/07/ucx-architecture-1024x505.jpg"
     width="100%" /&gt;&lt;/a&gt;
&lt;p&gt;Python users today don’t have much access to these network libraries, except
through MPI, which is sometimes not ideal. (&lt;a class="reference external" href="https://pypi.org/search/?q=infiniband"&gt;Try searching for “infiniband” on
PyPI.&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;This led us to create &lt;a class="reference external" href="https://github.com/rapidsai/ucx-py/"&gt;UCX-Py&lt;/a&gt;.
UCX-Py is a Python wrapper around the UCX C library, which provides a Pythonic
API, both with blocking syntax appropriate for traditional HPC programs, as
well as a non-blocking &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;async/await&lt;/span&gt;&lt;/code&gt; syntax for more concurrent programs (like
Dask).
For more information on UCX I recommend watching Akshay’s &lt;a class="reference external" href="https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9679-ucx-python%3a+a+flexible+communication+library+for+python+applications"&gt;UCX
talk&lt;/a&gt;
from the GPU Technology Conference 2019.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note: UCX-Py was primarily developed by &lt;a class="reference external" href="https://github.com/Akshay-Venkatesh/"&gt;Akshay Venkatesh&lt;/a&gt; (UCX, NVIDIA),
&lt;a class="reference external" href="https://tomaugspurger.github.io/"&gt;Tom Augspurger&lt;/a&gt; (Dask, Pandas, Anaconda),
and &lt;a class="reference external" href="https://github.com/quasiben/"&gt;Ben Zaitlen&lt;/a&gt; (NVIDIA, RAPIDS, Dask).&lt;/em&gt;&lt;/p&gt;
&lt;video width="560" height="315" controls&gt;
    &lt;source src="https://developer.download.nvidia.com/video/gputechconf/gtc/2019/video/S9679/s9679-ucx-python-a-flexible-communication-library-for-python-applications.mp4"
            type="video/mp4"&gt;
&lt;/video&gt;
&lt;p&gt;We then &lt;a class="reference external" href="https://github.com/dask/distributed/blob/master/distributed/comm/ucx.py"&gt;extended Dask communications to optionally use UCX&lt;/a&gt;.
If you have UCX and UCX-Py installed, then you can use the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ucx://&lt;/span&gt;&lt;/code&gt; protocol in
addresses or the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--protocol&lt;/span&gt; &lt;span class="pre"&gt;ucx&lt;/span&gt;&lt;/code&gt; flag when starting things up, something like
this.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ dask-scheduler --protocol ucx
Scheduler started at ucx://127.0.0.1:8786

$ dask-worker ucx://127.0.0.1:8786
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ucx://127.0.0.1:8786&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/09/ucx-dgx.md&lt;/span&gt;, line 95)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="experiment"&gt;
&lt;h1&gt;Experiment&lt;/h1&gt;
&lt;p&gt;We modified our &lt;a class="reference external" href="https://github.com/mrocklin/dask-gpu-benchmarks/blob/master/cupy-svd.ipynb"&gt;SVD with Dask and CuPy
benchmark&lt;/a&gt;
to use the UCX protocol for inter-process communication and ran it on
half of a DGX machine, using four GPUs. Here is a minimal implementation of the
UCX-enabled code:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cupy&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cuda&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DGX&lt;/span&gt;

&lt;span class="c1"&gt;# Define DGX cluster and client&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DGX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create random data&lt;/span&gt;
&lt;span class="n"&gt;rs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Perform distributed SVD&lt;/span&gt;
&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;By using UCX the overall communication times are reduced by an order of
magnitude. To produce the task-stream figures below, the benchmark was run on a
DGX-1 with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;CUDA_VISIBLE_DEVICES=[0,1,2,3]&lt;/span&gt;&lt;/code&gt;. It is clear that the red task
bars, corresponding to inter-process communication, are significantly
compressed. Communications that were taking 500ms-1s before now take around 20ms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Before UCX&lt;/strong&gt;:&lt;/p&gt;
&lt;iframe src="https://matthewrocklin.com/raw-host/task_stream_lcc_dgx16.html" width="100%" height="200"&gt;&lt;/iframe&gt;
&lt;p&gt;&lt;strong&gt;After UCX&lt;/strong&gt;:&lt;/p&gt;
&lt;iframe src="https://matthewrocklin.com/raw-host/task_stream_dgx_dgx16.html" width="100%" height="200"&gt;&lt;/iframe&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/09/ucx-dgx.md&lt;/span&gt;, line 139)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="diving-into-the-details"&gt;
&lt;h1&gt;Diving into the Details&lt;/h1&gt;
&lt;p&gt;On a GPU using NVLink we can get somewhere between 5-10 GB/s throughput between
pairs of GPUs. On a CPU this drops down to 1-2 GB/s (which seems well below
optimal).
These speeds can affect all Dask workloads (array, dataframe, xarray, ML, …),
but when the proper hardware is present, other bottlenecks may occur,
such as serialization when dealing with text or JSON-like data.&lt;/p&gt;
&lt;p&gt;This, of course, depends on this fancy networking hardware being present.
In the GPU example above we’re mostly relying on NVLink, but we would also get
improved performance on an HPC InfiniBand network, or even on a single laptop
using shared memory transports.&lt;/p&gt;
&lt;p&gt;The example above was run on a DGX machine, which includes all of these
transports and more (as well as numerous GPUs).&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/09/ucx-dgx.md&lt;/span&gt;, line 156)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="dgx"&gt;
&lt;h1&gt;DGX&lt;/h1&gt;
&lt;p&gt;The test machine used above was a
&lt;a class="reference external" href="https://www.nvidia.com/en-us/data-center/dgx-1/"&gt;DGX-1&lt;/a&gt;, which has eight GPUs,
two CPU sockets, four Infiniband network cards, and a complex NVLink
arrangement. This is a good example of non-uniform hardware. Certain CPUs
are closer to certain GPUs and network cards, and understanding this proximity
has an order-of-magnitude effect on performance. This situation isn’t unique
to DGX machines. The same situation arises when we have …&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Multiple workers in one node, with several nodes in a cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple nodes in one rack, with several racks in a data center&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple data centers, such as is the case with hybrid cloud&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Working with the DGX was interesting because it forced us to start thinking
about heterogeneity, and making it easier to specify complex deployment scenarios
with Dask.&lt;/p&gt;
&lt;p&gt;Here is a diagram showing how the GPUs, CPUs, and Infiniband
cards are connected to each other in a DGX-1:&lt;/p&gt;
&lt;a href="https://docs.nvidia.com/dgx/bp-dgx/index.html#networking"&gt;
  &lt;img src="https://docs.nvidia.com/dgx/bp-dgx/graphics/networks.png"
         width="100%" /&gt;
&lt;/a&gt;
&lt;p&gt;And here the output of nvidia-smi showing the NVLink, networking, and CPU affinity
structure (this is mostly orthogonal to the structure displayed above).&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ nvidia-smi  topo -m
     GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7   ib0   ib1   ib2   ib3
GPU0   X    NV1   NV1   NV2   NV2   SYS   SYS   SYS   PIX   SYS   PHB   SYS
GPU1  NV1    X    NV2   NV1   SYS   NV2   SYS   SYS   PIX   SYS   PHB   SYS
GPU2  NV1   NV2    X    NV2   SYS   SYS   NV1   SYS   PHB   SYS   PIX   SYS
GPU3  NV2   NV1   NV2    X    SYS   SYS   SYS   NV1   PHB   SYS   PIX   SYS
GPU4  NV2   SYS   SYS   SYS    X    NV1   NV1   NV2   SYS   PIX   SYS   PHB
GPU5  SYS   NV2   SYS   SYS   NV1    X    NV2   NV1   SYS   PIX   SYS   PHB
GPU6  SYS   SYS   NV1   SYS   NV1   NV2    X    NV2   SYS   PHB   SYS   PIX
GPU7  SYS   SYS   SYS   NV1   NV2   NV1   NV2    X    SYS   PHB   SYS   PIX
ib0   PIX   PIX   PHB   PHB   SYS   SYS   SYS   SYS    X    SYS   PHB   SYS
ib1   SYS   SYS   SYS   SYS   PIX   PIX   PHB   PHB   SYS    X    SYS   PHB
ib2   PHB   PHB   PIX   PIX   SYS   SYS   SYS   SYS   PHB   SYS    X    SYS
ib3   SYS   SYS   SYS   SYS   PHB   PHB   PIX   PIX   SYS   PHB   SYS    X

    CPU Affinity
GPU0  0-19,40-59
GPU1  0-19,40-59
GPU2  0-19,40-59
GPU3  0-19,40-59
GPU4  20-39,60-79
GPU5  20-39,60-79
GPU6  20-39,60-79
GPU7  20-39,60-79

Legend:

  X    = Self
  SYS  = Traverse PCIe as well as the SMP interconnect between NUMA nodes
  NODE = Traverse PCIe as well as the interconnect between PCIe Host Bridges
  PHB  = Traverse PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Traverse multiple PCIe switches (without PCIe Host Bridge)
  PIX  = Traverse a single PCIe switch
  NV#  = Traverse a bonded set of # NVLinks
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
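The regular pattern in the output above (GPUs 0-3 sit on the first CPU socket, GPUs 4-7 on the second, and GPUs are paired up on network cards) can be captured in a couple of small helpers. This is an illustrative sketch based on the table, not code from dask-cuda; the `gpu // 2` interface convention follows the DGX function later in this post:

```python
def cpu_affinity(gpu: int) -> list:
    """CPU cores closest to a DGX-1 GPU, per ``nvidia-smi topo -m``.

    GPUs 0-3 hang off the first CPU socket (cores 0-19 plus their
    hyperthread siblings 40-59); GPUs 4-7 hang off the second.
    """
    if gpu < 4:
        return list(range(0, 20)) + list(range(40, 60))
    return list(range(20, 40)) + list(range(60, 80))


def infiniband_interface(gpu: int, prefix: str = "ib") -> str:
    """Network interface paired with a GPU (two GPUs per card)."""
    return prefix + str(gpu // 2)
```

For example, `infiniband_interface(5)` gives `"ib2"`, and `cpu_affinity(7)` includes core 79.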
&lt;p&gt;The DGX was originally designed for deep learning
applications. The complex network infrastructure above is used effectively by
specialized NVIDIA networking libraries like
&lt;a class="reference external" href="https://developer.nvidia.com/nccl"&gt;NCCL&lt;/a&gt;, which know how to route
traffic correctly, but it is a challenge for other more general purpose
systems like Dask to adapt to.&lt;/p&gt;
&lt;p&gt;Fortunately, in meeting this challenge we were able to clean up a number of
related issues in Dask. In particular we can now:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Specify a more heterogeneous worker configuration when starting up a local cluster
&lt;a class="reference external" href="https://github.com/dask/distributed/pull/2675"&gt;dask/distributed #2675&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Learn bandwidth over time
&lt;a class="reference external" href="https://github.com/dask/distributed/pull/2658"&gt;dask/distributed #2658&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add Worker plugins to help handle things like CPU affinity (though this is
quite general)
&lt;a class="reference external" href="https://github.com/dask/distributed/pull/2453"&gt;dask/distributed #2453&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
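Worker plugins like those in point 3 are duck-typed: any object with a `setup` method can be passed to a worker, which calls `setup` once at startup. Here is a minimal hypothetical plugin (names are invented for illustration, not part of dask-cuda) that sets environment variables; the `CPUAffinity` class in the code below follows the same pattern:

```python
import os


class EnvSetter:
    """A hypothetical worker plugin that sets environment variables.

    Worker plugins are duck-typed: the worker simply calls
    ``plugin.setup(worker)`` once when it starts up.
    """

    def __init__(self, env):
        self.env = dict(env)

    def setup(self, worker=None):
        os.environ.update(self.env)


plugin = EnvSetter({"MY_FLAG": "1"})
plugin.setup()  # on a real cluster the worker calls this for you
```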
&lt;p&gt;With these changes we’re now able to describe most of the DGX structure as
configuration in the Python function below:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Nanny&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SpecCluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Scheduler&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;distributed.worker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TOTAL_MEMORY&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cuda.local_cuda_cluster&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cuda_visible_devices&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;CPUAffinity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; A Worker plugin to pin CPU affinity &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cores&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cores&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sched_setaffinity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;affinity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  &lt;span class="c1"&gt;# See nvidia-smi topo -m&lt;/span&gt;
    &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;79&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;79&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;79&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;79&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;DGX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ib&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dashboard_address&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;:8787&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;threads_per_worker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;silence_logs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; A Local Cluster for a DGX 1 machine&lt;/span&gt;

&lt;span class="sd"&gt;    NVIDIA&amp;#39;s DGX-1 machine has a complex architecture mapping CPUs,&lt;/span&gt;
&lt;span class="sd"&gt;    GPUs, and network hardware.  This function creates a local cluster&lt;/span&gt;
&lt;span class="sd"&gt;    that tries to respect this hardware as much as possible.&lt;/span&gt;

&lt;span class="sd"&gt;    It creates one Dask worker process per GPU, and assigns each worker&lt;/span&gt;
&lt;span class="sd"&gt;    process the correct CPU cores and Network interface cards to&lt;/span&gt;
&lt;span class="sd"&gt;    maximize performance.&lt;/span&gt;

&lt;span class="sd"&gt;    That being said, things aren&amp;#39;t perfect.  Today a DGX has very high&lt;/span&gt;
&lt;span class="sd"&gt;    performance between certain sets of GPUs and not others.  A Dask DGX&lt;/span&gt;
&lt;span class="sd"&gt;    cluster that uses only certain tightly coupled parts of the computer&lt;/span&gt;
&lt;span class="sd"&gt;    will have significantly higher bandwidth than a deployment on the&lt;/span&gt;
&lt;span class="sd"&gt;    entire thing.&lt;/span&gt;

&lt;span class="sd"&gt;    Parameters&lt;/span&gt;
&lt;span class="sd"&gt;    ----------&lt;/span&gt;
&lt;span class="sd"&gt;    interface: str&lt;/span&gt;
&lt;span class="sd"&gt;        The interface prefix for the infiniband networking cards.  This is&lt;/span&gt;
&lt;span class="sd"&gt;        often &amp;quot;ib&amp;quot;` or &amp;quot;bond&amp;quot;.  We will add the numeric suffix 0,1,2,3 as&lt;/span&gt;
&lt;span class="sd"&gt;        appropriate.  Defaults to &amp;quot;ib&amp;quot;.&lt;/span&gt;
&lt;span class="sd"&gt;    dashboard_address: str&lt;/span&gt;
&lt;span class="sd"&gt;        The address for the scheduler dashboard.  Defaults to &amp;quot;:8787&amp;quot;.&lt;/span&gt;
&lt;span class="sd"&gt;    CUDA_VISIBLE_DEVICES: str&lt;/span&gt;
&lt;span class="sd"&gt;        String like ``&amp;quot;0,1,2,3&amp;quot;`` or ``[0, 1, 2, 3]`` to restrict&lt;/span&gt;
&lt;span class="sd"&gt;        activity to different GPUs&lt;/span&gt;

&lt;span class="sd"&gt;    Examples&lt;/span&gt;
&lt;span class="sd"&gt;    --------&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;gt;&amp;gt;&amp;gt; from dask_cuda import DGX&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;gt;&amp;gt;&amp;gt; from dask.distributed import Client&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;gt;&amp;gt;&amp;gt; cluster = DGX(interface=&amp;#39;ib&amp;#39;)&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;gt;&amp;gt;&amp;gt; client = Client(cluster)&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;CUDA_VISIBLE_DEVICES&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;0,1,2,3,4,5,6,7&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;,&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;memory_limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOTAL_MEMORY&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;

    &lt;span class="n"&gt;spec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;cls&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Nanny&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;options&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="s2"&gt;&amp;quot;env&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="s2"&gt;&amp;quot;CUDA_VISIBLE_DEVICES&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cuda_visible_devices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;ii&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="s2"&gt;&amp;quot;UCX_TLS&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rc,cuda_copy,cuda_ipc&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="s2"&gt;&amp;quot;interface&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="s2"&gt;&amp;quot;protocol&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;ucx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s2"&gt;&amp;quot;ncores&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;threads_per_worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s2"&gt;&amp;quot;data&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s2"&gt;&amp;quot;preload&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;dask_cuda.initialize_context&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="s2"&gt;&amp;quot;dashboard_address&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;:0&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s2"&gt;&amp;quot;plugins&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CPUAffinity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;affinity&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])],&lt;/span&gt;
                &lt;span class="s2"&gt;&amp;quot;silence_logs&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;silence_logs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s2"&gt;&amp;quot;memory_limit&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;memory_limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ii&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;scheduler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;cls&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Scheduler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;options&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;interface&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;protocol&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;ucx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;dashboard_address&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dashboard_address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;SpecCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;silence_logs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;silence_logs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
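The `cuda_visible_devices` helper imported above gives each worker its own device ordering, rotating the list so that worker *i*'s GPU comes first. A rough pure-Python sketch of that behavior (consult the dask-cuda source for the real implementation):

```python
def cuda_visible_devices(i, visible):
    """Rotate ``visible`` so entry ``i`` comes first, as a comma-separated string.

    Sketch of the helper dask_cuda uses to build each worker's
    CUDA_VISIBLE_DEVICES value; not the actual dask_cuda code.
    """
    rotated = visible[i:] + visible[:i]
    return ",".join(map(str, rotated))
```

For example, `cuda_visible_devices(2, [0, 1, 2, 3])` returns `"2,3,0,1"`, so the worker responsible for GPU 2 sees that GPU as device 0.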
&lt;p&gt;However, we never got the NVLink structure down. The Dask scheduler currently
still assumes uniform bandwidths between workers. We’ve started to take small
steps towards changing this, but we’re not there yet (this will also be useful
for people who want to think about in-rack or cross-data-center
deployments).&lt;/p&gt;
&lt;p&gt;As usual, in solving a highly specific problem, we were able to implement a number
of lingering general features, which then made our specific problem easy to
write down.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/09/ucx-dgx.md&lt;/span&gt;, line 373)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="future-work"&gt;
&lt;h1&gt;Future Work&lt;/h1&gt;
&lt;p&gt;There has been significant effort over the last few months to make everything
above work. In particular we …&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Modified UCX to support client-server workloads&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wrapped UCX with UCX-Py and designed a Python async-await friendly interface&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wrapped UCX-Py with Dask&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hooked everything together to make generic workloads function well&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result is quite nice, especially for communication-heavy workloads.
However, there is still plenty to do. This section details what we’re thinking
about now to continue this work.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Routing within complex networks&lt;/strong&gt;:
If you restrict yourself to four of the eight GPUs in a DGX, you can get 5-12 GB/s
between pairs of GPUs. For some workloads this can be significant. It
makes the system feel much more like a single unit than a bunch of isolated
machines.&lt;/p&gt;
&lt;p&gt;However we still can’t get great performance across the whole DGX because
there are many GPU-pairs that are not connected by NVLink, and so we get 10x
slower speeds. These dominate communication costs if you naively try to use
the full DGX.&lt;/p&gt;
&lt;p&gt;This might be solved either by:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Teaching Dask to avoid these communications&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Teaching UCX to route communications like these through a chain of
multiple NVLink connections&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Avoiding complex networks altogether. Newer systems like the DGX-2 use
NVSwitch, which provides uniform connectivity, with each GPU connected
to every other GPU.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;em&gt;Edit&lt;/em&gt;: I’ve since learned that UCX should be able to handle this. We
should still get PCIe speeds (around 4-7 GB/s) even when we don’t have
NVLink once an upstream bug gets fixed. Hooray!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CPU:&lt;/strong&gt; We can get 1-2 GB/s across InfiniBand, which isn’t bad, but also
isn’t the full 5-8 GB/s that we were hoping for. This deserves more serious
profiling to determine what is going wrong. The current guess is that this
has to do with memory allocations.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000000000&lt;/span&gt;  &lt;span class="c1"&gt;# 1 GB&lt;/span&gt;
&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mi"&gt;248&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;223&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;472&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;470&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;   &lt;span class="c1"&gt;# &amp;lt;&amp;lt;----- Around 2 GB/s.  Slower than I expected&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Probably we’re just doing something dumb here.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Package UCX:&lt;/strong&gt; Currently I’m building the UCX and UCX-Py libraries from
source (see appendix below for instructions). Ideally these would become
conda packages. &lt;a class="reference external" href="https://github.com/jakirkham"&gt;John Kirkham&lt;/a&gt; (Conda Forge,
NVIDIA, Dask) is taking a look at this along with the UCX developers from
Mellanox.&lt;/p&gt;
&lt;p&gt;See &lt;a class="reference external" href="https://github.com/rapidsai/ucx-py/issues/65"&gt;ucx-py #65&lt;/a&gt; for
more information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Learn Heterogeneous Bandwidths:&lt;/strong&gt; In order to make good scheduling
decisions Dask needs to estimate how long it will take to move data between
machines. This question is now becoming much more complex, and depends on
both the source and destination machines (the network topology), the data
type (NumPy array, GPU array, Pandas Dataframe with text), and more. In
complex situations our bandwidths can span a 100x range (100 MB/s to 10
GB/s).&lt;/p&gt;
&lt;p&gt;Dask will have to develop more complex models for bandwidth, and
learn these over time.&lt;/p&gt;
&lt;p&gt;See &lt;a class="reference external" href="https://github.com/dask/distributed/issues/2743"&gt;dask/distributed
#2743&lt;/a&gt; for more
information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Support other GPU libraries:&lt;/strong&gt; To send GPU data around we need to teach
Dask how to serialize Python objects into GPU buffers. There is code in
the dask/distributed repository to do this for Numba, CuPy, and RAPIDS cuDF
objects, but we’ve really only tested CuPy seriously. We should expand
this by some of the following steps:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Try a distributed Dask cuDF join computation&lt;/p&gt;
&lt;p&gt;See &lt;a class="reference external" href="https://github.com/dask/distributed/pull/2746"&gt;dask/distributed #2746&lt;/a&gt; for initial work here.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Teach Dask to serialize array GPU libraries, like PyTorch and
TensorFlow, or possibly anything that supports the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__cuda_array_interface__&lt;/span&gt;&lt;/code&gt; protocol.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Track down communication failures:&lt;/strong&gt; We still occasionally get
unexplained communication failures. We should stress test this system to
discover rough corners.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TCP&lt;/strong&gt;: Groups with high performing TCP networks can’t yet make use of UCX+Dask (though they can use either one individually).&lt;/p&gt;
&lt;p&gt;Using UCX in a client-server mode, as we do with
Dask, currently requires access to RDMA libraries, which are often missing on systems
without specialized networking hardware like InfiniBand.&lt;/p&gt;
&lt;p&gt;This is in progress at &lt;a class="reference external" href="https://github.com/openucx/ucx/pull/3570"&gt;openucx/ucx #3570&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Commodity Hardware&lt;/strong&gt;: Currently this code is only really useful on
high-performance Linux systems that have InfiniBand or NVLink. However,
it would be nice to also use this on more commodity systems, including
personal laptop computers using TCP and shared memory.&lt;/p&gt;
&lt;p&gt;Currently Dask uses TCP for inter-process communication on a single machine.
Using UCX on a personal computer would give us access to shared memory
speeds, which tend to be an order of magnitude faster.&lt;/p&gt;
&lt;p&gt;See &lt;a class="reference external" href="https://github.com/openucx/ucx/issues/3663"&gt;openucx/ucx #3663&lt;/a&gt; for more
information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tune Performance:&lt;/strong&gt; The 5-10 GB/s bandwidths that we see with NVLink
today are sub-optimal. With UCX-Py alone we’re able to get something like
15 GB/s on large message transfers. We should benchmark and tune our
implementation to see what is taking up the extra time. Until things work
more robustly though, this is a secondary priority.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
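&lt;p&gt;&lt;em&gt;To make the serialization item above concrete, here is a minimal, illustrative sketch of the dictionary that the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__cuda_array_interface__&lt;/span&gt;&lt;/code&gt; protocol exposes. The FakeDeviceArray class and its pointer value are made up for illustration; real implementations such as CuPy, Numba, and cuDF return an actual device pointer in the data field.&lt;/em&gt;&lt;/p&gt;

```python
# Illustrative only: a stand-in object exposing the
# __cuda_array_interface__ dictionary that CuPy, Numba, and cuDF
# implement for real device memory.  The pointer here is fake.
class FakeDeviceArray:
    def __init__(self, shape, ptr=0):
        self._shape = shape
        self._ptr = ptr  # a real implementation stores a device pointer

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self._shape,        # tuple of ints
            "typestr": "|u1",            # unsigned bytes; no byte-order prefix
            "data": (self._ptr, False),  # (device pointer, read-only flag)
            "version": 2,
        }


arr = FakeDeviceArray((4, 4))
iface = arr.__cuda_array_interface__
print(sorted(iface.keys()))
# prints ['data', 'shape', 'typestr', 'version']
```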
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/06/09/ucx-dgx.md&lt;/span&gt;, line 493)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="appendix-setup"&gt;
&lt;h1&gt;Appendix: Setup&lt;/h1&gt;
&lt;p&gt;Performing these experiments currently depends on development branches in a few
repositories. This section includes my current setup.&lt;/p&gt;
&lt;section id="create-conda-environment"&gt;
&lt;h2&gt;Create Conda Environment&lt;/h2&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;conda&lt;/span&gt; &lt;span class="n"&gt;create&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="n"&gt;ucx&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.7&lt;/span&gt; &lt;span class="n"&gt;libtool&lt;/span&gt; &lt;span class="n"&gt;cmake&lt;/span&gt; &lt;span class="n"&gt;automake&lt;/span&gt; &lt;span class="n"&gt;autoconf&lt;/span&gt; &lt;span class="n"&gt;cython&lt;/span&gt; &lt;span class="n"&gt;bokeh&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt; &lt;span class="n"&gt;pkg&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="n"&gt;ipython&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt; &lt;span class="n"&gt;numba&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Note: for some reason using conda-forge makes the autogen step below fail.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="set-up-ucx"&gt;
&lt;h2&gt;Set up UCX&lt;/h2&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# Clone UCX repository and get branch
git clone https://github.com/openucx/ucx
cd ucx
git remote add Akshay-Venkatesh git@github.com:Akshay-Venkatesh/ucx.git
git remote update Akshay-Venkatesh
git checkout ucx-cuda

# Build
git clean -xfd
export CUDA_HOME=/usr/local/cuda-9.2/
./autogen.sh
mkdir build
cd build
../configure --prefix=$CONDA_PREFIX --enable-debug --with-cuda=$CUDA_HOME --enable-mt --disable-cma CPPFLAGS=&amp;quot;-I/usr/local/cuda-9.2/include&amp;quot;
make -j install

# Verify
ucx_info -d
which ucx_info  # verify that this is in the conda environment

# Verify that we see NVLink speeds
ucx_perftest -t tag_bw -m cuda -s 1048576 -n 1000 &amp;amp; ucx_perftest dgx15 -t tag_bw -m cuda -s 1048576 -n 1000
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="set-up-ucx-py"&gt;
&lt;h2&gt;Set up UCX-Py&lt;/h2&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;git clone git@github.com:rapidsai/ucx-py
cd ucx-py

export UCX_PATH=$CONDA_PREFIX
make install
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="set-up-dask"&gt;
&lt;h2&gt;Set up Dask&lt;/h2&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;clone&lt;/span&gt; &lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="nd"&gt;@github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;
&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;
&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;clone&lt;/span&gt; &lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="nd"&gt;@github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;distributed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;
&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="n"&gt;distributed&lt;/span&gt;
&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="optionally-set-up-cupy"&gt;
&lt;h2&gt;Optionally set up cupy&lt;/h2&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cuda92&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="optionally-set-up-cudf"&gt;
&lt;h2&gt;Optionally set up cudf&lt;/h2&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;conda&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="n"&gt;rapidsai&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;nightly&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="n"&gt;conda&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;forge&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="n"&gt;numba&lt;/span&gt; &lt;span class="n"&gt;cudf&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cudf&lt;/span&gt; &lt;span class="n"&gt;cudatoolkit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;9.2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="optionally-set-up-jupyterlab"&gt;
&lt;h2&gt;Optionally set up JupyterLab&lt;/h2&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;conda&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;ipykernel&lt;/span&gt; &lt;span class="n"&gt;jupyterlab&lt;/span&gt; &lt;span class="n"&gt;nb_conda_kernels&lt;/span&gt; &lt;span class="n"&gt;nodejs&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For the Dask dashboard:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;dask_labextension&lt;/span&gt;
&lt;span class="n"&gt;jupyter&lt;/span&gt; &lt;span class="n"&gt;labextension&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;labextension&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="my-benchmark"&gt;
&lt;h2&gt;My Benchmark&lt;/h2&gt;
&lt;p&gt;I’ve been using the following benchmark to test communication. It allocates a
chunked Dask array, and then adds it to its transpose, which forces a lot of
communication, but not much computation.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;collections&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pprint&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pprint&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cupy&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;distributed.utils&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;format_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;format_bytes&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;

    &lt;span class="c1"&gt;# Set up workers on the local machine&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;DGX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asynchronous&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;silence_logs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asynchronous&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

            &lt;span class="c1"&gt;# Create a simple random array&lt;/span&gt;
            &lt;span class="n"&gt;rs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;40000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;128 MiB&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;npartitions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;chunks&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Add X to its transpose, forcing computation&lt;/span&gt;
            &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Collect, aggregate, and print peer-to-peer bandwidths&lt;/span&gt;
            &lt;span class="n"&gt;incoming_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;dask_worker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dask_worker&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;incoming_transfer_log&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;bandwidths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;incoming_logs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;total&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;bandwidths&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;who&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;bandwidth&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;bandwidths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;w1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;w2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;format_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/s&amp;#39;&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;])]&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bandwidths&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="n"&gt;pprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bandwidths&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;__main__&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_until_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Note: most of this example is just getting back diagnostics, which can be
easily ignored. Also, you can drop the async/await code if you like. I think
that there should probably be more examples in the world using Dask with
async/await syntax, so I decided to leave it in.&lt;/em&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/06/09/ucx-dgx/"/>
    <summary>This post is about experimental and rapidly changing software.
Code examples in this post should not be relied upon to work in the future.</summary>
    <category term="python" label="python"/>
    <category term="scipy" label="scipy"/>
    <published>2019-06-09T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/04/09/numba-stencil/</id>
    <title>Composing Dask Array with Numba Stencils</title>
    <updated>2019-04-09T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">&lt;p&gt;In this post we explore four array computing technologies, and how they
work together to achieve powerful results.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Numba’s stencil decorator to craft localized compute kernels&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Numba’s Just-In-Time (JIT) compiler for array computing in Python&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask Array for parallelizing array computations across many chunks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NumPy’s Generalized Universal Functions (gufuncs) to tie everything
together&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In the end we’ll show how a novice developer can write a small amount of Python
to efficiently run localized computations on large amounts of data. In
particular we’ll write a simple function to smooth images and apply that in
parallel across a large stack of images.&lt;/p&gt;
&lt;p&gt;Here is the full code; we’ll dive into it piece by piece below.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numba&lt;/span&gt;

&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stencil&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;


&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;guvectorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:])],&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;(n, m) -&amp;gt; (n, m)&amp;#39;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# If you want fake data&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;auto&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;int8&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# If you have actual data&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask_image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/path/to/*.png&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# dask.array&amp;lt;transpose, shape=(1000000, 1000, 1000), dtype=int8, chunksize=(125, 1000, 1000)&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Note: the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;smooth&lt;/span&gt;&lt;/code&gt; function above is more commonly referred to as the 2D mean filter in the image processing community.&lt;/p&gt;
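&lt;p&gt;&lt;em&gt;As a toy illustration (pure Python, no Numba, not meant for real workloads), the mean filter computes the following; the zeroed border mirrors numba.stencil’s default of filling boundary outputs with zero:&lt;/em&gt;&lt;/p&gt;

```python
# Pure-Python illustration of the 3x3 mean filter that _smooth computes.
# Border cells stay zero, mirroring numba.stencil's default boundary
# handling.  For real workloads use the compiled version above.
def mean_filter_3x3(x):
    n, m = len(x), len(x[0])
    out = [[0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            total = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    total += x[i + di][j + dj]
            out[i][j] = total // 9  # integer division, matching the int8 kernel
    return out


image = [[9] * 4 for _ in range(4)]
print(mean_filter_3x3(image))
# prints [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
```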
&lt;p&gt;Now, let’s break this down a bit.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/04/09/numba-stencil.md&lt;/span&gt;, line 59)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="numba-stencils"&gt;

&lt;p&gt;&lt;strong&gt;Docs:&lt;/strong&gt;: https://numba.pydata.org/numba-doc/dev/user/stencil.html&lt;/p&gt;
&lt;p&gt;Many array computing functions operate only on a local region of the array.
This is common in image processing, signals processing, simulation, the
solution of differential equations, anomaly detection, time series analysis,
and more. Typically we write code that looks like the following:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;empty_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Or something similar. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;numba.stencil&lt;/span&gt;&lt;/code&gt; decorator makes this easier to
write: you just describe what happens at each element, and Numba handles the
rest.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stencil&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/04/09/numba-stencil.md&lt;/span&gt;, line 92)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="numba-jit"&gt;
&lt;h1&gt;Numba JIT&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Docs:&lt;/strong&gt; http://numba.pydata.org/&lt;/p&gt;
&lt;p&gt;When we run this function on a NumPy array, we find that it is slow, operating
at Python speeds.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;timeit&lt;/span&gt; &lt;span class="n"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;527&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="err"&gt;±&lt;/span&gt; &lt;span class="mf"&gt;44.1&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="n"&gt;per&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="err"&gt;±&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="n"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;But if we JIT compile this function with Numba, then it runs more quickly.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;njit&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt; &lt;span class="n"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mf"&gt;70.8&lt;/span&gt; &lt;span class="n"&gt;µs&lt;/span&gt; &lt;span class="err"&gt;±&lt;/span&gt; &lt;span class="mf"&gt;6.38&lt;/span&gt; &lt;span class="n"&gt;µs&lt;/span&gt; &lt;span class="n"&gt;per&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="err"&gt;±&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="n"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For those counting, that’s roughly 7,000x faster!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note: this function already exists as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scipy.ndimage.uniform_filter&lt;/span&gt;&lt;/code&gt;, which
operates at the same speed.&lt;/em&gt;&lt;/p&gt;
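As a point of reference, the same 3x3 mean can also be written with plain NumPy slicing, no compilation required. This is a sketch, not code from the post; like the Numba stencil, it leaves a zero border of one pixel:

```python
import numpy as np

def smooth_numpy(x):
    # Vectorized 3x3 mean filter: add up the nine shifted views of the
    # interior, then integer-divide by 9, leaving a zero border.
    out = np.zeros_like(x)
    out[1:-1, 1:-1] = (x[:-2, :-2] + x[:-2, 1:-1] + x[:-2, 2:] +
                       x[1:-1, :-2] + x[1:-1, 1:-1] + x[1:-1, 2:] +
                       x[2:, :-2] + x[2:, 1:-1] + x[2:, 2:]) // 9
    return out

x = np.ones((100, 100), dtype='int8')
y = smooth_numpy(x)
```

This runs at NumPy speed rather than Python speed, though the nine temporary arrays cost more memory than the compiled stencil does.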
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/04/09/numba-stencil.md&lt;/span&gt;, line 121)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="dask-array"&gt;
&lt;h1&gt;Dask Array&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Docs:&lt;/strong&gt; https://docs.dask.org/en/latest/array.html&lt;/p&gt;
&lt;p&gt;In these applications people often have many such arrays, and they want to apply
this function to all of them. In principle they could do this with a for
loop.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;glob&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;skimage.io&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/path/to/*.png&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skimage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;skimage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imsave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;.png&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;.out.png&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If they then wanted to run this in parallel, they might use the
multiprocessing or concurrent.futures modules. If they wanted to run it
across a cluster, they could rewrite their code with PySpark or some other
system.&lt;/p&gt;
&lt;p&gt;Or, they could use Dask array, which will handle both the pipelining and the
parallelism (single machine or on a cluster) all while still looking mostly
like a NumPy array.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask_image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/path/to/*.png&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# a large lazy array of all of our images&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;int8&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And then, because each chunk of a Dask array is just a NumPy array, we
can use the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; function to apply this function across all of our
images, and then save them out.&lt;/p&gt;
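Here is a minimal runnable sketch of that pattern, using a plain-NumPy stand-in for the compiled `smooth`. Note that `map_blocks` applies the function to each chunk independently, so chunk edges never see their neighbors; `da.map_overlap` exists for stencils that need a halo of neighboring data:

```python
import numpy as np
import dask.array as da

def smooth(x):
    # Plain-NumPy 3x3 mean with a zero border, standing in for the
    # Numba-compiled version from the post.
    out = np.zeros_like(x)
    out[1:-1, 1:-1] = sum(x[1 + i:x.shape[0] - 1 + i, 1 + j:x.shape[1] - 1 + j]
                          for i in (-1, 0, 1) for j in (-1, 0, 1)) // 9
    return out

x = da.ones((2000, 2000), chunks=(1000, 1000), dtype='int8')
y = x.map_blocks(smooth, dtype='int8')  # lazy: nothing computed yet
result = y.compute()                    # runs smooth on each chunk in parallel
```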
&lt;p&gt;This is fine, but let’s go a bit further and discuss generalized universal
functions from NumPy.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/04/09/numba-stencil.md&lt;/span&gt;, line 161)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="generalized-universal-functions"&gt;
&lt;h1&gt;Generalized Universal Functions&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Numba Docs:&lt;/strong&gt; https://numba.pydata.org/numba-doc/dev/user/vectorize.html&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NumPy Docs:&lt;/strong&gt; https://docs.scipy.org/doc/numpy-1.16.0/reference/c-api.generalized-ufuncs.html&lt;/p&gt;
&lt;p&gt;A generalized universal function (gufunc) is a normal function that has been
annotated with typing and dimension information. For example we can redefine
our &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;smooth&lt;/span&gt;&lt;/code&gt; function as a gufunc as follows:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;guvectorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:])],&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;(n, m) -&amp;gt; (n, m)&amp;#39;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This function knows that it consumes a 2d array of int8’s and produces a 2d
array of int8’s of the same dimensions.&lt;/p&gt;
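NumPy itself offers the same dimension-contract idea through `np.vectorize` with a `signature` argument. It runs at Python speed rather than compiled speed, but it shows what the `(n, m) -> (n, m)` annotation buys you. This sketch uses a plain-NumPy `_smooth` stand-in rather than the Numba one:

```python
import numpy as np

def _smooth(x):
    # Plain-NumPy 3x3 mean with a zero border, standing in for the stencil.
    out = np.zeros_like(x)
    out[1:-1, 1:-1] = sum(x[1 + i:x.shape[0] - 1 + i, 1 + j:x.shape[1] - 1 + j]
                          for i in (-1, 0, 1) for j in (-1, 0, 1)) // 9
    return out

# Declares: consume an (n, m) array, produce an (n, m) array.  Any extra
# leading dimensions are looped over automatically.
smooth = np.vectorize(_smooth, signature='(n,m)->(n,m)')

stack = np.ones((4, 10, 10), dtype='int8')  # four 10x10 "images"
result = smooth(stack)                      # applied image by image
```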
&lt;p&gt;This sort of annotation is a small change, but it gives other systems like Dask
enough information to use it intelligently. Rather than call functions like
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt;, we can just use the function directly, as if our Dask Array was
just a very large NumPy array.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Before gufuncs&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;int8&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After gufuncs&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is nice. If you write library code with gufunc semantics then that code
just works with systems like Dask, without you having to build in explicit
support for parallel computing. This makes the lives of users much easier.&lt;/p&gt;
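A quick way to see this kind of dispatch in action without Numba: plain NumPy ufuncs already route through the same protocol machinery, so applying one to a Dask array returns another lazy Dask array instead of computing eagerly (a sketch):

```python
import numpy as np
import dask.array as da

x = da.ones((1000, 1000), chunks=(250, 250))
y = np.negative(x)          # a NumPy ufunc, dispatched to Dask -- still lazy

result = y.compute()        # only now does any arithmetic happen
```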
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/04/09/numba-stencil.md&lt;/span&gt;, line 200)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="finished-result"&gt;
&lt;h1&gt;Finished result&lt;/h1&gt;
&lt;p&gt;Let’s see the full example one more time.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numba&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stencil&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;


&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;guvectorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:])],&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;(n, m) -&amp;gt; (n, m)&amp;#39;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;auto&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;int8&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This code is decently approachable by novice users. They may not understand
the internal details of gufuncs or Dask arrays or JIT compilation, but they can
probably copy-paste-and-modify the example above to suit their needs.&lt;/p&gt;
&lt;p&gt;The parts they do want to change, like the stencil computation or how they
construct an array from their own data, are easy to change.&lt;/p&gt;
&lt;p&gt;This workflow is efficient and scalable, using low-level compiled code and
potentially clusters of thousands of computers.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/04/09/numba-stencil.md&lt;/span&gt;, line 236)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="what-could-be-better"&gt;
&lt;h1&gt;What could be better&lt;/h1&gt;
&lt;p&gt;There are a few things that could make this workflow nicer.&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;It would be nice not to have to specify dtypes in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;guvectorize&lt;/span&gt;&lt;/code&gt;, but
instead specialize to types as they arrive.
&lt;a class="reference external" href="https://github.com/numba/numba/issues/2979"&gt;numba/numba #2979&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support GPU accelerators for the stencil computations using
&lt;a class="reference external" href="https://numba.pydata.org/numba-doc/dev/cuda/index.html"&gt;numba.cuda.jit&lt;/a&gt;.
Stencil computations are obvious candidates for GPU acceleration, and this
is a good accessible point where novice users can specify what they want in
a way that is sufficiently constrained for automated systems to rewrite it
as CUDA somewhat easily.
&lt;a class="reference external" href="https://github.com/numba/numba/issues/3915"&gt;numba/numba 3915&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It would have been nicer to be able to apply the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;&amp;#64;guvectorize&lt;/span&gt;&lt;/code&gt; decorator
directly on top of the stencil function like this.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;guvectorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stencil&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;average&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Rather than have two functions.
&lt;a class="reference external" href="https://github.com/numba/numba/issues/3914"&gt;numba/numba #3914&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You may have noticed that our guvectorize function had to assign its result into an
out parameter.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;guvectorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:])],&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;(n, m) -&amp;gt; (n, m)&amp;#39;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;It would have been nicer, perhaps, to just return the output&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/numba/numba/issues/3916"&gt;numba/numba #3916&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The dask-image library could use an &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;imsave&lt;/span&gt;&lt;/code&gt; function&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-image/issues/110"&gt;dask/dask-image #110&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/04/09/numba-stencil.md&lt;/span&gt;, line 290)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="aspirational-result"&gt;
&lt;h1&gt;Aspirational Result&lt;/h1&gt;
&lt;p&gt;With all of these, we might then be able to write the code above as follows&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# This is aspirational&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numba&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image&lt;/span&gt;

&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;guvectorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:])],&lt;/span&gt;
    &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;(n, m) -&amp;gt; (n, m)&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;gpu&amp;#39;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@numba&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stencil&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask_image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/path/to/*.png&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smooth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dask_image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imsave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/path/to/out/*.png&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/04/09/numba-stencil.md&lt;/span&gt;, line 316)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="update-now-with-gpus"&gt;
&lt;h1&gt;Update: Now with GPUs!&lt;/h1&gt;
&lt;p&gt;After writing this blogpost I did a small update where I used
&lt;a class="reference external" href="https://numba.pydata.org/numba-doc/dev/cuda/index.html"&gt;numba.cuda.jit&lt;/a&gt;
to implement the same smooth function on a GPU to achieve a 200x speedup with
only a modest increase in code complexity.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://gist.github.com/mrocklin/9272bf84a8faffdbbe2cd44b4bc4ce3c"&gt;That notebook is here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/04/09/numba-stencil/"/>
    <summary>In this post we explore four array computing technologies, and how they
work together to achieve powerful results.</summary>
    <category term="dask" label="dask"/>
    <category term="numba" label="numba"/>
    <published>2019-04-09T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/03/27/dask-cuml/</id>
    <title>cuML and Dask hyperparameter optimization</title>
    <updated>2019-03-27T00:00:00+00:00</updated>
    <author>
      <name>Benjamin Zaitlen</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/03/27/dask-cuml.md&lt;/span&gt;, line 10)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="setup"&gt;

&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;DGX-1 Workstation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Host Memory: 512 GB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GPU Tesla V100 x 8&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cudf 0.6&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cuml 0.6&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;dask 1.1.4&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://gist.github.com/quasiben/a96ce952b7eb54356f7f8390319473e4"&gt;Jupyter notebook&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;TLDR; Hyper-parameter Optimization is functional but slow with cuML&lt;/strong&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/03/27/dask-cuml.md&lt;/span&gt;, line 22)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="cuml-and-dask-hyper-parameter-optimization"&gt;
&lt;h1&gt;cuML and Dask Hyper-parameter Optimization&lt;/h1&gt;
&lt;p&gt;cuML is an open source GPU accelerated machine learning library primarily
developed at NVIDIA which mirrors the &lt;a class="reference external" href="https://scikit-learn.org/"&gt;Scikit-Learn API&lt;/a&gt;.
The current suite of algorithms includes GLMs, Kalman Filtering, clustering,
and dimensionality reduction. Many of these machine learning algorithms use
hyper-parameters. These are parameters used during the model training process
but are not “learned” during training. Often these parameters are
coefficients or penalty thresholds, and finding the “best” hyper-parameter can be
computationally costly. In the PyData community, we often reach for Scikit-Learn’s
&lt;a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html"&gt;GridSearchCV&lt;/a&gt;
or
&lt;a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV"&gt;RandomizedSearchCV&lt;/a&gt;
for easy definition of the search space for hyper-parameters – this is called hyper-parameter
optimization. Within the Dask community, &lt;a class="reference external" href="https://dask-ml.readthedocs.io/en/latest/"&gt;Dask-ML&lt;/a&gt; has incrementally improved the efficiency of hyper-parameter optimization by leveraging both Scikit-Learn and Dask to use multi-core and
distributed schedulers: &lt;a class="reference external" href="https://dask-ml.readthedocs.io/en/latest/hyper-parameter-search.html"&gt;Grid and RandomizedSearch with DaskML&lt;/a&gt;.&lt;/p&gt;
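&lt;p&gt;Stripped of the machinery, grid search itself is a small loop: fit one model per candidate parameter value and keep the best scorer. A dependency-free sketch over a single hypothetical alpha parameter (real GridSearchCV also cross-validates, and Dask-ML runs the fits in parallel):&lt;/p&gt;

```python
def grid_search(make_model, alphas, X, y):
    """Fit one model per candidate alpha; return the best (alpha, score).

    A toy sketch: real implementations score on held-out folds rather
    than the training data, and parallelize the independent fits.
    """
    best_score, best_alpha = float("-inf"), None
    for alpha in alphas:
        model = make_model(alpha=alpha).fit(X, y)
        score = model.score(X, y)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_alpha, best_score
```

&lt;p&gt;Because every fit is independent, this loop parallelizes trivially, which is exactly what Dask-ML exploits.&lt;/p&gt;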
&lt;p&gt;With the newly created drop-in replacement for Scikit-Learn, cuML, we experimented with Dask’s GridSearchCV. In the upcoming 0.6 release of cuML, the estimators are serializable and are functional within the Scikit-Learn/dask-ml framework, but slow compared with Scikit-Learn estimators. And while speeds are slow now, we know how to boost performance, have filed several issues, and hope to show performance gains in future releases.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;All code and timing measurements can be found in this &lt;a class="reference external" href="https://gist.github.com/quasiben/a96ce952b7eb54356f7f8390319473e4"&gt;Jupyter notebook&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/03/27/dask-cuml.md&lt;/span&gt;, line 43)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="fast-fitting"&gt;
&lt;h1&gt;Fast Fitting!&lt;/h1&gt;
&lt;p&gt;cuML is fast! But finding that speed requires developing a bit of GPU knowledge and some
intuition. For example, there is a non-zero cost to moving data from host to GPU, and when data is “small” there is little to no performance gain. “Small” currently means roughly less than 100 MB.&lt;/p&gt;
&lt;p&gt;In the following example we use the &lt;a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html"&gt;diabetes data&lt;/a&gt;
set provided by sklearn and linearly fit the data with &lt;a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html"&gt;RidgeRegression&lt;/a&gt;&lt;/p&gt;
&lt;div class="math notranslate nohighlight"&gt;
\[ \min\limits_w ||y - Xw||^2_2 + \alpha ||w||^2_2\]&lt;/div&gt;
&lt;p&gt;&lt;a class="reference external" href="https://scikit-learn.org/stable/auto_examples/linear_model/plot_ridge_path.html"&gt;&lt;strong&gt;alpha&lt;/strong&gt;&lt;/a&gt; is the hyper-parameter and we initially set to 1.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cuml&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Ridge&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cumlRidge&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_ml.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dcv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linear_model&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.externals.joblib&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;parallel_backend&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GridSearchCV&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diabetes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;diabetes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fit_intercept&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;
&lt;span class="n"&gt;normalize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;
&lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;ridge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;linear_model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ridge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;cholesky&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cu_ridge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cumlRidge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;eig&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ridge&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cu_ridge&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The above ran with a single timing measurement of:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Scikit-Learn Ridge: 28 ms&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cuML Ridge: 1.12 s&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But the data is quite small, ~28 KB. Increasing the size to ~2.8 GB and re-running, we see significant gains:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;dup_ridge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;linear_model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ridge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;cholesky&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dup_cu_ridge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cumlRidge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;eig&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# move data from host to device&lt;/span&gt;
&lt;span class="n"&gt;record_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;fea&lt;/span&gt;&lt;span class="si"&gt;%d&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dup_data&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dup_data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;span class="n"&gt;gdf_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cudf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;gdf_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cudf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dup_train&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;#sklearn&lt;/span&gt;
&lt;span class="n"&gt;dup_ridge&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dup_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dup_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# cuml&lt;/span&gt;
&lt;span class="n"&gt;dup_cu_ridge&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gdf_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gdf_train&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;With new timing measurements of:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Scikit-Learn Ridge: 4.82 s ± 694 ms&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cuML Ridge: 450 ms ± 47.6 ms&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With more data we clearly see faster fitting times, but the time to move data to the GPU (through cuDF)
was 19.7 s. This cost of data movement is one of the reasons RAPIDS/cuDF was developed: keep data
on the GPU and avoid moving it back and forth.&lt;/p&gt;
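&lt;p&gt;For reference, the ridge objective shown earlier has a closed-form solution via the normal equations, w = (XᵀX + αI)⁻¹ Xᵀy. A plain NumPy sketch, independent of the Cholesky and eigendecomposition solvers that the Scikit-Learn and cuML estimators above were configured to use (intercept handling omitted for brevity):&lt;/p&gt;

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    # Minimize ||y - Xw||^2 + alpha * ||w||^2 by solving the
    # normal equations: (X^T X + alpha * I) w = X^T y
    # (intercept handling omitted for brevity)
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)
```

&lt;p&gt;Larger alpha shrinks the coefficients toward zero, which is why searching over alpha, as in the experiments below, matters.&lt;/p&gt;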
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/03/27/dask-cuml.md&lt;/span&gt;, line 108)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="hyper-parameter-optimization-experiments"&gt;
&lt;h1&gt;Hyper-Parameter Optimization Experiments&lt;/h1&gt;
&lt;p&gt;So moving data to the GPU can be costly, but once it is there, larger data sizes yield significant performance
gains. Naively, we thought, “well, we have GPU machine learning, and we have distributed hyper-parameter optimization…
we &lt;em&gt;should&lt;/em&gt; have distributed, GPU-accelerated hyper-parameter optimization!”&lt;/p&gt;
&lt;p&gt;Scikit-Learn assumes a specific but well-defined API for estimators over which it will perform hyper-parameter
optimization. Most estimators/classifiers in Scikit-Learn look like the following:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;DummyEstimator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseEstimator&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=...&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get_params&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;set_params&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
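&lt;p&gt;Any object that provides those methods will work; the interface is duck-typed. A minimal pure-Python toy (a mean predictor with one made-up shift hyper-parameter, not a real cuML or Scikit-Learn class) shows how small the contract is:&lt;/p&gt;

```python
class MeanEstimator:
    """Toy estimator exposing the Scikit-Learn-style interface."""

    def __init__(self, shift=0.0):
        self.shift = shift  # a made-up hyper-parameter

    def fit(self, X, y=None):
        self.mean_ = sum(y) / len(y) + self.shift
        return self  # fit returns self by convention

    def predict(self, X):
        return [self.mean_ for _ in X]

    def score(self, X, y=None):
        # Negative mean squared error, so higher is better.
        errors = [(p - t) ** 2 for p, t in zip(self.predict(X), y)]
        return -sum(errors) / len(errors)

    def get_params(self, deep=True):
        return {"shift": self.shift}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self
```

&lt;p&gt;Search tools clone and reconfigure estimators through get_params and set_params, which is why plugging cuML into the search machinery mostly came down to filling gaps in exactly those methods.&lt;/p&gt;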
&lt;p&gt;When we started experimenting with hyper-parameter optimization, we found a few holes in the API. These were
resolved, mostly by matching argument structures and adding various getters/setters.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;get_params and set_params (&lt;a class="reference external" href="https://github.com/rapidsai/cuml/pull/271"&gt;#271&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;fix/clf-solver (&lt;a class="reference external" href="https://github.com/rapidsai/cuml/pull/318"&gt;#318&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;map fit_transform to sklearn implementation (&lt;a class="reference external" href="https://github.com/rapidsai/cuml/pull/330"&gt;#330&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fea get params small changes (&lt;a class="reference external" href="https://github.com/rapidsai/cuml/pull/322"&gt;#322&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With the holes plugged, we tested again. Using the same diabetes data set, we now perform hyper-parameter optimization
and searching over many alpha parameters for the best &lt;em&gt;scoring&lt;/em&gt; alpha.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;alpha&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;linear_model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ridge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;cholesky&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cu_clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cumlRidge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;eig&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GridSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;r2&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cu_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GridSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cu_clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;r2&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cu_grid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Again, reminding ourselves that the data is small (~28 KB), we don’t expect cuML to perform faster than Scikit-Learn; instead, we
want to demonstrate functionality. We also tried swapping in Dask-ML’s implementation of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;GridSearchCV&lt;/span&gt;&lt;/code&gt;
(which adheres to the same API as Scikit-Learn) to use all of the GPUs we have available in parallel.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;alpha&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;linear_model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ridge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;cholesky&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cu_clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cumlRidge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;eig&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dcv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GridSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;r2&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cu_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dcv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GridSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cu_clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;r2&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cu_grid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Timing Measurements:&lt;/p&gt;
&lt;div class="pst-scrollable-table-container"&gt;&lt;table class="table"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;GridSearchCV&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;sklearn-Ridge&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;cuml-ridge&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;&lt;strong&gt;Scikit-Learn&lt;/strong&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;88.4 ms ± 6.11 ms&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;6.51 s ± 132 ms&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;&lt;strong&gt;Dask-ML&lt;/strong&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;873 ms ± 347 ms&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;740 ms ± 142 ms&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;Unsurprisingly, we see that GridSearchCV and Ridge regression from Scikit-Learn are the fastest in this context.
There is a cost to distributing work and data and, as we previously mentioned, to moving data from host to device.&lt;/p&gt;
&lt;section id="how-does-performance-scale-as-we-scale-data"&gt;
&lt;h2&gt;How does performance scale as we scale data?&lt;/h2&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;two_dup_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vstack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1e2&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;two_dup_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hstack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1e2&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;three_dup_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vstack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1e3&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;three_dup_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hstack&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1e3&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="n"&gt;cu_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dcv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GridSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cu_clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;r2&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cu_grid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;two_dup_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;two_dup_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cu_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dcv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GridSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cu_clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;r2&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cu_grid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;three_dup_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;three_dup_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dcv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GridSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;r2&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;three_dup_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;three_dup_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Timing Measurements:&lt;/p&gt;
&lt;div class="pst-scrollable-table-container"&gt;&lt;table class="table"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Data (MB)&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;cuML+Dask-ML&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;sklearn+Dask-ML&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;2.8 MB&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;13.8s&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;28 MB&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;1min 17s&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;4.87 s&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;cuML + dask-ml (Distributed GridSearchCV) does significantly &lt;em&gt;worse&lt;/em&gt; as data sizes increase! Why? Primarily, two reasons:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Non-optimized movement of data between host and device, compounded by N devices and the size of
the parameter space&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scoring methods are not yet implemented in cuML&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Below is the Dask graph for the grid search:&lt;/p&gt;
&lt;p&gt;
  &lt;a href="/images/cuml_grid.svg"&gt;
    &lt;img src="/images/cuml_grid.svg" width="90%"&gt;
  &lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;There are 50 (cv=5 times 10 parameters for alpha) instances of chunking up our test data set and scoring performance. That means we move data back and forth between host and device 50 times for fitting and 50 times for scoring. That’s not great, but it’s also very solvable: build scoring functions for GPUs!&lt;/p&gt;
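&lt;p&gt;Because r² is just array arithmetic, a GPU-resident scorer is conceptually straightforward. A minimal sketch (shown here with NumPy; with CuPy installed, swapping the import keeps the computation on the device):&lt;/p&gt;

```python
import numpy as np  # with CuPy installed, `import cupy as np` runs the same code on the GPU

def r2_score(y_true, y_pred):
    """Coefficient of determination written as pure array arithmetic,
    so the data never needs to leave the device it already lives on."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    ss_res = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - ss_res / ss_tot

score = r2_score([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
```

&lt;p&gt;A scorer like this would avoid the host round-trips described above, because both fitting and scoring would stay on the device.&lt;/p&gt;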
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/03/27/dask-cuml.md&lt;/span&gt;, line 230)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="immediate-future-work"&gt;
&lt;h1&gt;Immediate Future Work&lt;/h1&gt;
&lt;p&gt;We know the problems; GitHub issues have been filed, and we are working on them. Come help!&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Built In Scorers (&lt;a class="reference external" href="https://github.com/rapidsai/cuml/issues/242"&gt;#242&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DeviceNDArray as input data (&lt;a class="reference external" href="https://github.com/rapidsai/cuml/issues/369"&gt;#369&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Communication with UCX (&lt;a class="reference external" href="https://github.com/dask/distributed/issues/2344"&gt;#2344&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/03/27/dask-cuml/"/>
    <summary>Pairing cuML GPU estimators with Scikit-Learn and Dask-ML GridSearchCV for hyper-parameter optimization</summary>
    <category term="GPU" label="GPU"/>
    <category term="RAPIDS" label="RAPIDS"/>
    <category term="dask" label="dask"/>
    <published>2019-03-27T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/03/18/dask-nep18/</id>
    <title>Dask and the __array_function__ protocol</title>
    <updated>2019-03-18T00:00:00+00:00</updated>
    <author>
      <name>Peter Andreas Entschev</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/03/18/dask-nep18.md&lt;/span&gt;, line 10)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="summary"&gt;

&lt;p&gt;Dask is a versatile tool for parallel analytics, but one issue still limits
its reach: allowing it to transparently work with
&lt;a class="reference external" href="https://www.numpy.org/"&gt;NumPy&lt;/a&gt;-like libraries. We have previously discussed
how to work with
&lt;a class="reference external" href="http://blog.dask.org/2019/01/03/dask-array-gpus-first-steps"&gt;GPU Dask Arrays&lt;/a&gt;,
but that was limited to the array’s member methods that share a NumPy-like
interface, for example the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.sum()&lt;/span&gt;&lt;/code&gt; method; calling general functionality
from the NumPy library was still not possible. NumPy recently addressed this issue
in &lt;a class="reference external" href="https://www.numpy.org/neps/nep-0018-array-function-protocol.html"&gt;NEP-18&lt;/a&gt;
with the introduction of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol. In short, the
protocol lets a NumPy function call dispatch to the appropriate NumPy-like
library implementation, depending on the array type given as input. Dask can
thus remain agnostic of such libraries, internally calling just the NumPy
function, which automatically dispatches to the appropriate library
implementation, for example
&lt;a class="reference external" href="https://cupy.chainer.org/"&gt;CuPy&lt;/a&gt; or &lt;a class="reference external" href="https://sparse.pydata.org/"&gt;Sparse&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To understand the end goal of this change, consider the following
example:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Now suppose we want to speed up the SVD computation of a Dask array by offloading
that work to a CUDA-capable GPU. Ultimately, we want to simply replace the NumPy
array &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;x&lt;/span&gt;&lt;/code&gt; with a CuPy array and let NumPy do its magic via the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol, dispatching the appropriate CuPy linear algebra
operations under the hood:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cupy&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We could do the same for a Sparse array, or any other NumPy-like array that
supports the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol and the computation that we are
trying to perform. In the next section, we will take a look at the potential
performance benefits that the protocol helps us leverage.&lt;/p&gt;
&lt;p&gt;Note that the features described in this post are still experimental, and some are
still under development and review. For a more detailed discussion on the
actual progress of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt;, please refer to the &lt;a class="reference internal" href="#issues"&gt;&lt;span class="xref myst"&gt;Issues section&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/03/18/dask-nep18.md&lt;/span&gt;, line 70)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="performance"&gt;
&lt;h1&gt;Performance&lt;/h1&gt;
&lt;p&gt;Before going any further, note that the following hardware and software were used for all
performance results described in this post:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CPU: 6-core (12-threads) Intel Core i7-7800X &amp;#64; 3.50GHz&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Main memory: 16 GB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GPU: NVIDIA Quadro GV100&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenBLAS 0.2.18&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cuBLAS 9.2.174&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cuSOLVER 9.2.148&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s now look at an example of the potential performance benefits of the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol with Dask when using CuPy as a backend. We
start by creating all the arrays that we will later use for computing an SVD.
Note that the focus here is compute performance, so this example ignores the
time spent creating the arrays or copying them between CPU and GPU.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cupy&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;asarray&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;As seen above, we have four arrays:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;x&lt;/span&gt;&lt;/code&gt;: a NumPy array in main memory;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;y&lt;/span&gt;&lt;/code&gt;: a CuPy array in GPU memory;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dx&lt;/span&gt;&lt;/code&gt;: a NumPy array wrapped in a Dask array;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dy&lt;/span&gt;&lt;/code&gt;: a &lt;em&gt;copy&lt;/em&gt; of a CuPy array wrapped in a Dask array; wrapping a CuPy
array in a Dask array as a view (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;asarray=True&lt;/span&gt;&lt;/code&gt;) is not supported yet.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="compute-svd-on-a-numpy-array"&gt;
&lt;h2&gt;Compute SVD on a NumPy array&lt;/h2&gt;
&lt;p&gt;We start by computing the SVD of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;x&lt;/span&gt;&lt;/code&gt; using NumPy; it is
processed on the CPU in a single thread:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The timing information I obtained after that looks like the following:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;347&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Over 3 minutes seems a bit too slow, so now the question is: Can we do better,
and more importantly, without having to change our entire code?&lt;/p&gt;
&lt;p&gt;The answer to this question is: Yes, we can.&lt;/p&gt;
&lt;p&gt;Let’s look now at other results.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="compute-svd-on-the-numpy-array-wrapped-in-dask-array"&gt;
&lt;h2&gt;Compute SVD on the NumPy array wrapped in Dask array&lt;/h2&gt;
&lt;p&gt;First of all, this is what you had to do &lt;em&gt;before&lt;/em&gt; the introduction of the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The code above could be prohibitive for many projects, since one needs to
call the proper library’s dispatcher in addition to passing the correct
array. In other words, one would need to find all NumPy calls in the code and
replace them with the correct library’s function calls, depending on the input
array type. With &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt;, the same NumPy function can be
called, using the Dask array &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dx&lt;/span&gt;&lt;/code&gt; as input:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Note: Dask defers computation until results are consumed, so we need to
call the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.compute()&lt;/span&gt;&lt;/code&gt; function on the result arrays to actually compute
them.&lt;/p&gt;
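This laziness is easy to see on a toy array. The following is a small sketch (not the benchmark above; it assumes Dask is installed and uses a much smaller array) showing that the NumPy call only builds a task graph until `dask.compute()` is invoked:

```python
import numpy as np
import dask
import dask.array as da

x = np.random.random((1000, 100))
# Chunk along the first axis only ("tall-and-skinny"), which Dask's
# SVD implementation requires.
dx = da.from_array(x, chunks=(500, 100))

# The NumPy function dispatches to Dask and returns lazy Dask arrays.
u, s, v = np.linalg.svd(dx)
print(type(u))  # still a dask array; nothing has been computed yet

# Computation happens only when we explicitly ask for concrete results.
u, s, v = dask.compute(u, s, v)
```

After `dask.compute()`, `u`, `s`, and `v` are plain NumPy arrays that reconstruct `x`.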
&lt;p&gt;Let’s now take a look at the timing information:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;460&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Now, without changing any code besides wrapping the NumPy array as a
Dask array, we see a speedup of more than 2x. Not too bad. But let’s go back to
our previous question: Can we do better?&lt;/p&gt;
&lt;/section&gt;
&lt;section id="compute-svd-on-the-cupy-array"&gt;
&lt;h2&gt;Compute SVD on the CuPy array&lt;/h2&gt;
&lt;p&gt;We can do the same as for the Dask array now and simply call NumPy’s SVD
function on the CuPy array &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;y&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The timing information we get now is the following:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mf"&gt;17.3&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.81&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;19.1&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;19.1&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We now see a 4-5x speedup with no change in internal calls whatsoever! This is
exactly the sort of benefit that we expect to leverage with the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol, speeding up existing code, for free!&lt;/p&gt;
&lt;p&gt;Let’s go back to our original question one last time: Can we do better?&lt;/p&gt;
&lt;/section&gt;
&lt;section id="compute-svd-on-the-cupy-array-wrapped-in-dask-array"&gt;
&lt;h2&gt;Compute SVD on the CuPy array wrapped in Dask array&lt;/h2&gt;
&lt;p&gt;We can now take advantage of Dask’s data chunk splitting &lt;em&gt;and&lt;/em&gt;
the CuPy GPU implementation, in an attempt to keep our GPU as busy as
possible. This remains as simple as:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;svd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For which we get the following timing:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mf"&gt;8.97&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;653&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;9.62&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;9.45&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This gives us another 2x speedup over the single-threaded CuPy SVD computation.&lt;/p&gt;
&lt;p&gt;To conclude, we started from over 3 minutes and are now down to under 10
seconds by simply dispatching the work on a different array.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/03/18/dask-nep18.md&lt;/span&gt;, line 214)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="application"&gt;
&lt;h1&gt;Application&lt;/h1&gt;
&lt;p&gt;We will now talk a bit about potential applications of the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol. For this, we will discuss the
&lt;a class="reference external" href="https://dask-glm.readthedocs.io/"&gt;Dask-GLM&lt;/a&gt; library, used for fitting
Generalized Linear Models on large datasets. It’s built on top of Dask and
offers an API compatible with &lt;a class="reference external" href="https://scikit-learn.org/"&gt;scikit-learn&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Before the introduction of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol, we would have
needed to rewrite most of its internal implementation for each and every
NumPy-like library that we wished to use as a backend: one specialization of
the implementation for Dask, another for CuPy, and yet another for Sparse.
Now, thanks to the functionality these libraries share through a compatible
interface, we don’t have to change the implementation at all; we simply pass a
different array type as input.&lt;/p&gt;
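As a concrete, hypothetical illustration of this style (our own example, not Dask-GLM's actual code), a function written purely against the NumPy API stays backend-agnostic, because every `np.*` call dispatches to the library that owns the input:

```python
import numpy as np

def standardize(X):
    """Zero-mean, unit-variance columns. Only NumPy API calls are used,
    so __array_function__ routes each call to X's own library
    (NumPy, Dask, CuPy, Sparse, ...)."""
    return (X - np.mean(X, axis=0)) / np.std(X, axis=0)

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(standardize(X))
```

Calling `standardize` with a Dask or CuPy array instead of `X` would run the same code on that backend, with no changes to the function body.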
&lt;section id="example-with-scikit-learn"&gt;
&lt;h2&gt;Example with scikit-learn&lt;/h2&gt;
&lt;p&gt;To demonstrate this new capability, let’s consider the following
scikit-learn example (based on the original example
&lt;a class="reference external" href="https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py"&gt;here&lt;/a&gt;):&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.linear_model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;

&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;

&lt;span class="c1"&gt;# x from 0 to N&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;40000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# y = a*x + b with noise&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# create a linear regression model&lt;/span&gt;
&lt;span class="n"&gt;est&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We can then fit the model,&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;est&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;and obtain its time measurements:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mf"&gt;3.4&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;ns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;3.4&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;2.3&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We can then use it for prediction on some test data,&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# predict y from the data&lt;/span&gt;
&lt;span class="n"&gt;x_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;est&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_new&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newaxis&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And also check its time measurements:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mf"&gt;1.16&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;680&lt;/span&gt; &lt;span class="n"&gt;µs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.84&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.58&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And finally plot the results:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# plot the results&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_new&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_new&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;black&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_facecolor&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.42&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;tight&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/dask-nep18-linreg.png"&gt;
&lt;/section&gt;
&lt;section id="example-with-dask-glm"&gt;
&lt;h2&gt;Example with Dask-GLM&lt;/h2&gt;
&lt;p&gt;The only thing we have to change from the code before is the first block, where
we import libraries and create arrays:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_glm.estimators&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;

&lt;span class="c1"&gt;# x from 0 to N&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;40000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# y = a*x + b with noise&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# create a linear regression model&lt;/span&gt;
&lt;span class="n"&gt;est&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;lbfgs&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The rest of the code, as well as the resulting plot, looks just like the
previous scikit-learn example, so we omit it here for brevity. Note also that
we could have called &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LinearRegression()&lt;/span&gt;&lt;/code&gt; without any arguments, but for this example
we chose the
&lt;a class="reference external" href="https://docs.scipy.org/doc/scipy/reference/optimize.minimize-lbfgsb.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;lbfgs&lt;/span&gt;&lt;/code&gt;&lt;/a&gt;
solver, which converges reasonably fast.&lt;/p&gt;
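For intuition on what an LBFGS-style solver does, here is a small standalone sketch (using SciPy's L-BFGS-B as a stand-in; this is an assumption for illustration, not Dask-GLM's internal code) that fits the same line by minimizing the least-squares loss:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
x = 1000 * rng.random((400, 1))
y = 0.5 * x + 1.0 + rng.normal(size=x.shape)

# Least-squares loss over theta = [slope, intercept].
def loss(theta):
    a, b = theta
    return np.sum((y - (a * x + b)) ** 2)

# Analytic gradient of the loss, for faster and more precise convergence.
def grad(theta):
    a, b = theta
    r = y - (a * x + b)
    return np.array([-2 * np.sum(r * x), -2 * np.sum(r)])

# L-BFGS is a quasi-Newton method; on this smooth convex loss it
# converges in a handful of iterations.
res = minimize(loss, x0=np.zeros(2), jac=grad, method="L-BFGS-B")
print(res.x)  # approximately [0.5, 1.0]
```

The recovered slope and intercept match the `a` and `b` used to generate the noisy data.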
&lt;p&gt;We can also have a look at the timing results for fitting, followed by those
for predicting with Dask-GLM:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Fitting&lt;/span&gt;
&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mf"&gt;9.66&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;116&lt;/span&gt; &lt;span class="n"&gt;µs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;9.78&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;8.94&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;

&lt;span class="c1"&gt;# Predicting&lt;/span&gt;
&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mi"&gt;130&lt;/span&gt; &lt;span class="n"&gt;µs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;327&lt;/span&gt; &lt;span class="n"&gt;µs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;457&lt;/span&gt; &lt;span class="n"&gt;µs&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.06&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If instead we want CuPy to do the computing, we have to change only three
lines: importing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cupy&lt;/span&gt;&lt;/code&gt; instead of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;numpy&lt;/span&gt;&lt;/code&gt;, and the two lines where we create the
random arrays, replacing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;np.random&lt;/span&gt;&lt;/code&gt; with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cupy.random&lt;/span&gt;&lt;/code&gt;. The
block should then look like this:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cupy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_glm.estimators&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;

&lt;span class="c1"&gt;# x from 0 to N&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;40000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# y = a*x + b with noise&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# create a linear regression model&lt;/span&gt;
&lt;span class="n"&gt;est&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;lbfgs&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And the timing results we obtain in this scenario are:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Fitting&lt;/span&gt;
&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mi"&gt;151&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;40.7&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;191&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;190&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;

&lt;span class="c1"&gt;# Predicting&lt;/span&gt;
&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="mf"&gt;1.91&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;778&lt;/span&gt; &lt;span class="n"&gt;µs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;2.69&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;Wall&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.37&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
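&lt;p&gt;The timings above come from IPython’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;%time&lt;/span&gt;&lt;/code&gt; magic. As a rough, self-contained sketch of how such a measurement could be reproduced in plain Python (using a NumPy least-squares solve as a stand-in for the Dask-GLM estimator, so the numbers will differ):&lt;/p&gt;

```python
import time
import numpy as np

# Stand-in data shaped like the post's example: y = 0.5*x + 1.0 + noise
rng = np.random.default_rng(0)
x = 1000 * rng.random((40000, 1))
y = 0.5 * x[:, 0] + 1.0 + rng.normal(size=40000)

# Fit a line with ordinary least squares, timing just the solve
X = np.hstack([x, np.ones((40000, 1))])  # add an intercept column
t0 = time.perf_counter()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
elapsed = time.perf_counter() - t0

print(f"slope={coef[0]:.3f} intercept={coef[1]:.3f} ({elapsed * 1000:.1f} ms)")
```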
&lt;p&gt;For the simple example chosen for this post, scikit-learn outperforms Dask-GLM
with both NumPy and CuPy arrays. Several factors may contribute to this,
and while we did not dive deeply into understanding their exact
causes and extent, some likely possibilities are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;scikit-learn may be using solvers that converge faster;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask-GLM is entirely built on top of Dask, while scikit-learn may be
heavily optimized internally;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Too many synchronization steps and data transfers may occur for small
datasets with CuPy.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-for-different-dask-glm-solvers"&gt;
&lt;h2&gt;Performance for different Dask-GLM solvers&lt;/h2&gt;
&lt;p&gt;To verify how Dask-GLM with NumPy arrays compares against CuPy arrays, we
also benchmarked logistic regression with the different Dask-GLM solvers. The results
below were obtained from training datasets of 10&lt;sup&gt;2&lt;/sup&gt;,
10&lt;sup&gt;3&lt;/sup&gt;, …, 10&lt;sup&gt;6&lt;/sup&gt; samples with 100 dimensions each, and a matching
number of test samples.&lt;/p&gt;
&lt;p&gt;Note: we are intentionally omitting results for Dask arrays, as we have
identified a &lt;a class="reference external" href="https://github.com/dask/dask-glm/issues/78"&gt;potential bug&lt;/a&gt; that
causes Dask arrays not to converge.&lt;/p&gt;
&lt;img src="/images/dask-nep18-fitting.png"&gt;
&lt;p&gt;From the results in the graphs above we can see that CuPy can be one
order of magnitude faster than NumPy for fitting with any of the Dask-GLM
solvers. Note that both axes use a logarithmic scale for
easier visualization.&lt;/p&gt;
&lt;p&gt;Another interesting effect is that convergence may take longer
for small numbers of samples. However, as we would normally hope, the compute
time required to converge scales linearly with the number of samples.&lt;/p&gt;
&lt;img src="/images/dask-nep18-prediction.png"&gt;
&lt;p&gt;Prediction with CuPy, as seen above, stays mostly constant across all
solvers and is around 3-4 orders of magnitude faster than NumPy.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="issues"&gt;
&lt;h1&gt;Issues&lt;/h1&gt;
&lt;p&gt;In this section we describe the work, completed and still ongoing
since February 2019, towards enabling the features described in the previous
sections. If you are not interested in the details, feel free to
skip this section.&lt;/p&gt;
&lt;section id="fixed-issues"&gt;
&lt;h2&gt;Fixed Issues&lt;/h2&gt;
&lt;p&gt;Since early February 2019, substantial progress has been made towards deeper
support of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol in the different projects.
This trend is ongoing and will continue through March. Below is a list
of issues that have been fixed or are in the process of review:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol dependencies fixed in
&lt;a class="reference external" href="https://github.com/cupy/cupy/issues/2029"&gt;CuPy PR #2029&lt;/a&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask issues using CuPy backend with mean() and moment()
&lt;a class="reference external" href="https://github.com/dask/dask/issues/4481"&gt;Dask Issue #4481&lt;/a&gt;, fixed in
&lt;a class="reference external" href="https://github.com/dask/dask/pull/4513"&gt;Dask PR #4513&lt;/a&gt; and
&lt;a class="reference external" href="https://github.com/dask/dask/pull/4519"&gt;Dask PR #4519&lt;/a&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Replace aliased NumPy functions in SciPy that may not be available in
libraries like CuPy, fixed in
&lt;a class="reference external" href="https://github.com/scipy/scipy/pull/9888"&gt;SciPy PR #9888&lt;/a&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Allow creation of arbitrary shaped arrays, using the input array as
reference for the new array to be created, under review in
&lt;a class="reference external" href="https://github.com/numpy/numpy/issues/13043"&gt;NumPy PR #13043&lt;/a&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multithreading with CuPy first identified in
&lt;a class="reference external" href="https://github.com/dask/dask/issues/4487"&gt;Dask Issue #4487&lt;/a&gt;,
&lt;a class="reference external" href="https://github.com/cupy/cupy/issues/2045"&gt;CuPy Issue #2045&lt;/a&gt; and
&lt;a class="reference external" href="https://github.com/cupy/cupy/issues/1109"&gt;CuPy Issue #1109&lt;/a&gt;, now under
review in &lt;a class="reference external" href="https://github.com/cupy/cupy/pull/2053"&gt;CuPy PR #2053&lt;/a&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Calling Dask’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;flatnonzero()&lt;/span&gt;&lt;/code&gt; on CuPy array missing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cupy.compress()&lt;/span&gt;&lt;/code&gt;,
first identified in
&lt;a class="reference external" href="https://github.com/dask/dask/issues/4497"&gt;Dask Issue #4497&lt;/a&gt;, under review
in &lt;a class="reference external" href="https://github.com/dask/dask/pull/4548"&gt;Dask PR #4548&lt;/a&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask support for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt;, under review in
&lt;a class="reference external" href="https://github.com/dask/dask/pull/4567"&gt;Dask PR #4567&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
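&lt;p&gt;Most of the issues above revolve around NEP-18’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt; protocol. For readers unfamiliar with it, here is a minimal sketch of how a NumPy-like container can intercept NumPy functions (the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;MyArray&lt;/span&gt;&lt;/code&gt; class is purely illustrative, not from any of the projects above):&lt;/p&gt;

```python
import numpy as np

class MyArray:
    """Tiny NEP-18 sketch: intercept NumPy functions via __array_function__."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap MyArray arguments, call the NumPy implementation,
        # and re-wrap the result so the backend type is preserved.
        unwrapped = [a.data if isinstance(a, MyArray) else a for a in args]
        return MyArray(func(*unwrapped, **kwargs))

# np.sum dispatches to MyArray.__array_function__ (NumPy >= 1.17)
result = np.sum(MyArray([1, 2, 3]))
print(type(result).__name__, result.data)  # MyArray 6
```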
&lt;/section&gt;
&lt;section id="known-issues"&gt;
&lt;h2&gt;Known Issues&lt;/h2&gt;
&lt;p&gt;Currently, one of the biggest issues we are tackling relates to the
&lt;a class="reference external" href="https://github.com/dask/dask/issues/4490"&gt;Dask issue #4490&lt;/a&gt; we first
identified when calling Dask’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;diag()&lt;/span&gt;&lt;/code&gt; on a CuPy array. This requires some
change on the Dask &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Array&lt;/span&gt;&lt;/code&gt; class, and subsequent changes throughout large
parts of the Dask codebase. I will not go into too much detail here, but the
way we are handling this issue is by adding a new attribute &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;_meta&lt;/span&gt;&lt;/code&gt; to Dask
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Array&lt;/span&gt;&lt;/code&gt;, replacing the simple &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dtype&lt;/span&gt;&lt;/code&gt; that exists today. This
new attribute holds not only the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dtype&lt;/span&gt;&lt;/code&gt; information, but also an empty
array of the backend type used to create the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Array&lt;/span&gt;&lt;/code&gt; in the first place,
allowing us to reconstruct arrays of that backend type internally, without
having to know explicitly whether it is a NumPy, CuPy, Sparse, or any other
NumPy-like array. For additional details, please refer to &lt;a class="reference external" href="https://github.com/dask/dask/issues/2977"&gt;Dask Issue
#2977&lt;/a&gt;.&lt;/p&gt;
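&lt;p&gt;The idea can be sketched as follows (a simplified illustration; the class and method names here are not Dask’s actual internals): keeping a zero-length “meta” array around lets us create new arrays of the same backend type through NumPy’s own API, which dispatches to the backend via &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;

```python
import numpy as np

class TinyArray:
    """Simplified sketch of the _meta idea (not Dask's real Array class)."""

    def __init__(self, data):
        self.data = data
        # _meta: an empty array of the same backend type and dtype
        self._meta = data[:0]

    def new_empty(self, shape):
        # Reconstruct an array of the original backend type without
        # checking explicitly for NumPy, CuPy, Sparse, etc.; with a CuPy
        # meta, np.empty_like would return a CuPy array via NEP-18.
        return np.empty_like(self._meta, shape=shape)

a = TinyArray(np.arange(6, dtype="float64"))
out = a.new_empty((2, 3))
print(type(out).__name__, out.shape, out.dtype)  # ndarray (2, 3) float64
```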
&lt;p&gt;We have identified some more issues with ongoing discussions:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Using Sparse as a Dask backend, discussed in
&lt;a class="reference external" href="https://github.com/dask/dask/issues/4523"&gt;Dask Issue #4523&lt;/a&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Calling Dask’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fix()&lt;/span&gt;&lt;/code&gt; on CuPy array depends on &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_wrap__&lt;/span&gt;&lt;/code&gt;,
discussed in &lt;a class="reference external" href="https://github.com/dask/dask/issues/4496"&gt;Dask Issue #4496&lt;/a&gt;
and &lt;a class="reference external" href="https://github.com/cupy/cupy/issues/589"&gt;CuPy Issue #589&lt;/a&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Allow coercing of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__array_function__&lt;/span&gt;&lt;/code&gt;, discussed in
&lt;a class="reference external" href="https://github.com/numpy/numpy/issues/12974"&gt;NumPy Issue #12974&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="future-work"&gt;
&lt;h1&gt;Future Work&lt;/h1&gt;
&lt;p&gt;There are several possibilities for a richer experience with Dask; some that
could be particularly interesting in the short and mid term are:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Use &lt;a class="reference external" href="https://github.com/rapidsai/dask-cudf"&gt;Dask-cuDF&lt;/a&gt; alongside
Dask-GLM to present interesting, realistic applications of the whole
ecosystem;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;More comprehensive examples and benchmarks for Dask-GLM;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for &lt;a class="reference external" href="https://scikit-learn.org/stable/modules/linear_model.html"&gt;more models in
Dask-GLM&lt;/a&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A deeper look into the Dask-GLM versus scikit-learn performance;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Profile CuPy’s performance on matrix-matrix multiplication operations
(GEMM) compared to matrix-vector multiplication operations (GEMV) for
distributed Dask operation.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/03/18/dask-nep18/"/>
    <category term="CuPy" label="CuPy"/>
    <category term="Dask" label="Dask"/>
    <category term="Dask-GLM" label="Dask-GLM"/>
    <category term="Sparse" label="Sparse"/>
    <published>2019-03-18T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/03/04/building-gpu-groupbys/</id>
    <title>Building GPU Groupby-Aggregations for Dask</title>
    <updated>2019-03-04T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/03/04/building-gpu-groupbys.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="summary"&gt;

&lt;p&gt;We’ve sufficiently aligned Dask DataFrame and cuDF to get groupby aggregations
like the following to work well.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This post describes the kind of work we had to do as a model for future
development.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="plan"&gt;
&lt;h1&gt;Plan&lt;/h1&gt;
&lt;p&gt;As outlined in a previous post, &lt;a class="reference internal" href="../../../2019/01/13/dask-cudf-first-steps.html"&gt;&lt;span class="xref myst"&gt;Dask, Pandas, and GPUs: first
steps&lt;/span&gt;&lt;/a&gt;, our plan to produce
distributed GPU dataframes was to combine &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe.html"&gt;Dask
DataFrame&lt;/a&gt; with
&lt;a class="reference external" href="https://rapids.ai"&gt;cudf&lt;/a&gt;. In particular, we had to&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;change Dask DataFrame so that it would parallelize not just around the
Pandas DataFrames that it works with today, but around anything that looked
enough like a Pandas DataFrame&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;change cuDF so that it would look enough like a Pandas DataFrame to fit
within the algorithms in Dask DataFrame&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="changes"&gt;
&lt;h1&gt;Changes&lt;/h1&gt;
&lt;p&gt;On the Dask side this mostly meant:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Replacing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;isinstance(df,&lt;/span&gt; &lt;span class="pre"&gt;pd.DataFrame)&lt;/span&gt;&lt;/code&gt; checks with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;is_dataframe_like(df)&lt;/span&gt;&lt;/code&gt;
checks (after defining a suitable
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;is_dataframe_like&lt;/span&gt;&lt;/code&gt;/&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;is_series_like&lt;/span&gt;&lt;/code&gt;/&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;is_index_like&lt;/span&gt;&lt;/code&gt; functions)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Avoiding some more exotic functionality in Pandas, and instead trying to
use more common functionality that we can expect to be in most DataFrame
implementations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
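&lt;p&gt;The duck-typing check can be sketched roughly as follows (a simplified illustration; Dask’s actual &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;is_dataframe_like&lt;/span&gt;&lt;/code&gt; checks a different set of attributes):&lt;/p&gt;

```python
import pandas as pd

def is_dataframe_like(df):
    # Simplified sketch: accept anything that quacks like a DataFrame,
    # rather than checking isinstance(df, pd.DataFrame).
    typ = type(df)
    looks_like_df = all(
        hasattr(typ, name) for name in ("groupby", "head", "merge", "mean")
    ) and all(hasattr(df, name) for name in ("dtypes", "columns"))
    # Exclude Series-like objects, which also define groupby/head/mean
    looks_like_series = hasattr(typ, "dtype") and hasattr(typ, "name")
    return looks_like_df and not looks_like_series

print(is_dataframe_like(pd.DataFrame({"x": [1, 2]})))  # True
print(is_dataframe_like(pd.Series([1, 2])))            # False
```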
&lt;p&gt;On the cuDF side this means making dozens of tiny changes to align the cuDF API
to the Pandas API, and to add in missing features.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dask Changes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/4359"&gt;Remove explicit pandas checks and provide cudf lazy registration #4359&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/4375"&gt;Replace isinstance(…, pandas) with is_dataframe_like #4375&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/4395"&gt;Add has_parallel_type&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/4396"&gt;Lazily register more cudf functions and move to backends file #4396&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/4418"&gt;Avoid checking against types in is_dataframe_like #4418&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/4470"&gt;Replace cudf-specific code with dask-cudf import #4470&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/4482"&gt;Avoid groupby.agg(callable) in groupby-var #4482&lt;/a&gt; – this one is notable in that by simplifying our Pandas usage we actually got a significant speedup on the Pandas side.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;cuDF Changes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/529"&gt;Build DataFrames from CUDA array libraries #529&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/534"&gt;Groupby AttributeError&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/556"&gt;Support comparison operations on Indexes #556&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/568"&gt;Support byte ranges in read_csv (and other formats) #568&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/824"&gt;Allow “df.index = some_index” #824&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/828"&gt;Support indexing on groupby objects #828&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/831"&gt;Support df.reset_index(drop=True) #831&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/879"&gt;Add Series.groupby #879&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/880"&gt;Support Dataframe/Series groupby level=0 #880&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/900"&gt;Implement division on DataFrame objects #900&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/934"&gt;Groupby objects aren’t indexable by column names #934&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/937"&gt;Support comparisons on index operations #937&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/944"&gt;Add DataFrame.rename #944&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/967"&gt;Set the index of a dataframe/series #967&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/968"&gt;Support concat(…, axis=1) #968&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/969"&gt;Support indexing with a pandas index from columns #969&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/970"&gt;Support indexing a dataframe with another boolean dataframe #970&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I don’t really expect anyone to go through all of those issues, but my hope is
that by skimming over the issue titles people will get a sense for the kinds of
changes we’re making here. It’s a large number of small things.&lt;/p&gt;
&lt;p&gt;Also, kudos to &lt;a class="reference external" href="https://github.com/thomcom"&gt;Thomson Comer&lt;/a&gt; who solved most of
the cuDF issues above.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="there-are-still-some-pending-issues"&gt;
&lt;h1&gt;There are still some pending issues&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/1055"&gt;Square Root #1055&lt;/a&gt;, needed for groupby-std&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/483"&gt;cuDF needs multi-index support for columns #483&lt;/a&gt;, needed for:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;gropuby&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;sum&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;], &amp;#39;&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;: [&amp;#39;&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;, &amp;#39;&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;]})&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="but-things-mostly-work"&gt;
&lt;h1&gt;But things mostly work&lt;/h1&gt;
&lt;p&gt;Generally, though, things work pretty well today:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cudf&lt;/span&gt;

&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask_cudf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;yellow_tripdata_2016-*.csv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;passenger_count&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trip_distance&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;Out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;cudf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_pandas&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;Out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
&lt;span class="mi"&gt;0&lt;/span&gt;    &lt;span class="mf"&gt;0.625424&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt;    &lt;span class="mf"&gt;4.976895&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;    &lt;span class="mf"&gt;4.470014&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;    &lt;span class="mf"&gt;5.955262&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt;    &lt;span class="mf"&gt;4.328076&lt;/span&gt;
&lt;span class="mi"&gt;5&lt;/span&gt;    &lt;span class="mf"&gt;3.079661&lt;/span&gt;
&lt;span class="mi"&gt;6&lt;/span&gt;    &lt;span class="mf"&gt;2.998077&lt;/span&gt;
&lt;span class="mi"&gt;7&lt;/span&gt;    &lt;span class="mf"&gt;3.147452&lt;/span&gt;
&lt;span class="mi"&gt;8&lt;/span&gt;    &lt;span class="mf"&gt;5.165570&lt;/span&gt;
&lt;span class="mi"&gt;9&lt;/span&gt;    &lt;span class="mf"&gt;5.916169&lt;/span&gt;
&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;float64&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="experience"&gt;
&lt;h1&gt;Experience&lt;/h1&gt;
&lt;p&gt;First, most of this work was handled by the cuDF developers (as may be
evident from the relative lengths of the issue lists above). When we started
this process it felt like a never-ending stream of tiny issues: we weren’t
able to see the next set of issues until we had finished the current set.
Fortunately, most of them were pretty easy to fix, and the work seemed to
get a bit easier as we went on.&lt;/p&gt;
&lt;p&gt;Additionally, many things beyond groupby-aggregations now work as a result
of the changes above. From the perspective of someone accustomed to Pandas,
the cuDF library is starting to feel more reliable: we hit missing
functionality less frequently when using cuDF for other operations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-s-next"&gt;
&lt;h1&gt;What’s next?&lt;/h1&gt;
&lt;p&gt;More recently we’ve been working on the various join/merge operations in Dask
DataFrame, like indexed joins on a sorted column, joins between large and small
dataframes (a common special case), and so on. Getting these algorithms from
the mainline Dask DataFrame codebase to work with cuDF is producing a
similar set of issues to what we saw above with groupby-aggregations, but so
far the list is much smaller. We hope this trend continues as we move on
to other sets of functionality, like I/O, time-series
operations, rolling windows, and so on.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/03/04/building-gpu-groupbys/"/>
    <category term="GPU" label="GPU"/>
    <category term="RAPIDS" label="RAPIDS"/>
    <category term="dataframe" label="dataframe"/>
    <published>2019-03-04T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/01/31/dask-mpi-experiment/</id>
    <title>Running Dask and MPI programs together</title>
    <updated>2019-01-31T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">
&lt;section id="executive-summary"&gt;

&lt;p&gt;We present an experiment on how to pass data from a loosely coupled parallel
computing system like Dask to a tightly coupled parallel computing system like
MPI.&lt;/p&gt;
&lt;p&gt;We give motivation and a complete, digestible example.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://gist.github.com/mrocklin/193a9671f1536b9d13524214798da4a8"&gt;Here is a gist of the code and results&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="motivation"&gt;
&lt;h1&gt;Motivation&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;Disclaimer: Nothing in this post is polished or production ready. This is an
experiment designed to start conversation. No long-term support is offered.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We often get the following question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;How do I use Dask to pre-process my data,
but then pass those results to a traditional MPI application?&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;You might want to do this because you’re supporting legacy code written
in MPI, or because your computation requires tightly coupled parallelism of the
sort that only MPI can deliver.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="first-solution-write-to-disk"&gt;
&lt;h1&gt;First solution: Write to disk&lt;/h1&gt;
&lt;p&gt;The simplest approach, of course, is to write your Dask results to disk and
then load them back from disk with MPI. Depending on the cost of your
computation relative to data loading, this might be a great choice.&lt;/p&gt;
&lt;p&gt;For the rest of this blogpost we’re going to assume that it’s not.&lt;/p&gt;
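Still, as a rough sketch of this write-to-disk approach (the array, chunk sizes, and paths here are illustrative, not from the post), the hand-off might look like:

```python
# Illustrative sketch of the write-to-disk handoff; sizes and paths are
# made up for this example.
import os

import dask.array as da

# Dask side: preprocess, then write one .npy file per chunk to shared storage
os.makedirs("shared/results", exist_ok=True)
x = da.random.random(1_000_000, chunks=(100_000,))
x = (x * 2).persist()
da.to_npy_stack("shared/results", x)

# MPI side (pseudocode): each rank reads its own chunk back from disk
#   rank = MPI.COMM_WORLD.Get_rank()
#   chunk = numpy.load("shared/results/%d.npy" % rank)
```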
&lt;/section&gt;
&lt;section id="second-solution"&gt;
&lt;h1&gt;Second solution&lt;/h1&gt;
&lt;p&gt;We have a trivial MPI library written with &lt;a class="reference external" href="https://mpi4py.readthedocs.io/en/stable/"&gt;MPI4Py&lt;/a&gt;
in which each rank just prints out all the data it was given. In principle,
though, it could call into C++ code and do arbitrary MPI things.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# my_mpi_lib.py&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;mpi4py&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MPI&lt;/span&gt;

&lt;span class="n"&gt;comm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MPI&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COMM_WORLD&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;print_data_and_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; Fake function that mocks out how an MPI function should operate&lt;/span&gt;

&lt;span class="sd"&gt;    -   It takes in a list of chunks of data that are present on this machine&lt;/span&gt;
&lt;span class="sd"&gt;    -   It does whatever it wants to with this data and MPI&lt;/span&gt;
&lt;span class="sd"&gt;        Here for simplicity we just print the data and print the rank&lt;/span&gt;
&lt;span class="sd"&gt;    -   Maybe it returns something&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get_rank&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;on rank:&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In our dask program we’re going to use Dask normally to load in data, do some
preprocessing, and then hand off all of that data to each MPI rank, which will
call the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;print_data_and_rank&lt;/span&gt;&lt;/code&gt; function above to initialize the MPI
computation.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# my_dask_script.py&lt;/span&gt;

&lt;span class="c1"&gt;# Set up Dask workers from within an MPI job using the dask_mpi project&lt;/span&gt;
&lt;span class="c1"&gt;# See https://dask-mpi.readthedocs.io/en/latest/&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_mpi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;initialize&lt;/span&gt;
&lt;span class="n"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;futures_of&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Use Dask Array to &amp;quot;load&amp;quot; data (actually just create random data here)&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Find out where data is on each worker&lt;/span&gt;
&lt;span class="c1"&gt;# TODO: This could be improved on the Dask side to reduce boiler plate&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;toolz&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;first&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;collections&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;
&lt;span class="n"&gt;key_to_part_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;futures_of&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;span class="n"&gt;who_has&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;who_has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;worker_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workers&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;who_has&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;worker_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key_to_part_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;


&lt;span class="c1"&gt;# Call an MPI-enabled function on the list of data present on each worker&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;my_mpi_lib&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;print_data_and_rank&lt;/span&gt;

&lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;print_data_and_rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;list_of_parts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;list_of_parts&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;worker_map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;

&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then we can call this mix of Dask and an MPI program using normal &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mpirun&lt;/span&gt;&lt;/code&gt; or
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mpiexec&lt;/span&gt;&lt;/code&gt; commands.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;mpirun&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;my_dask_script&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="what-just-happened"&gt;
&lt;h1&gt;What just happened&lt;/h1&gt;
&lt;p&gt;So MPI started up and ran our script.
The &lt;a class="reference external" href="https://dask-mpi.readthedocs.io/en/latest/"&gt;dask-mpi&lt;/a&gt; project sets up a Dask
scheduler on rank 0, runs our client code on rank 1, and runs Dask workers on ranks 2+.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Rank 0: Runs a Dask scheduler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rank 1: Runs our script&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ranks 2+: Run Dask workers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our script then created a Dask array; in practice it would presumably read in
data from some source and do more complex Dask manipulations before continuing.&lt;/p&gt;
&lt;p&gt;We then wait until all of the Dask work has finished and the cluster is in a
quiet state, and query the state in the scheduler to find out where all of that
data lives. That’s this code here:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Find out where data is on each worker&lt;/span&gt;
&lt;span class="c1"&gt;# TODO: This could be improved on the Dask side to reduce boiler plate&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;toolz&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;first&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;collections&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;
&lt;span class="n"&gt;key_to_part_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;futures_of&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;span class="n"&gt;who_has&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;who_has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;worker_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workers&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;who_has&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;worker_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key_to_part_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Admittedly, this code is gross, and not particularly friendly or obvious to
non-Dask experts (or even to Dask experts themselves; I had to steal this from the
&lt;a class="reference external" href="http://ml.dask.org/xgboost.html"&gt;Dask XGBoost project&lt;/a&gt;, which does
the same trick).&lt;/p&gt;
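That boilerplate could be wrapped in a small helper. The function below is a hypothetical sketch, not part of Dask’s public API:

```python
# Hypothetical helper wrapping the worker-mapping boilerplate shown above.
# Not part of Dask's public API; a sketch only.
from collections import defaultdict


def parts_by_worker(client, futures):
    """Group a list of futures by the worker address that holds each one."""
    key_to_part = {str(f.key): f for f in futures}
    worker_map = defaultdict(list)
    for key, workers in client.who_has(futures).items():
        worker_map[workers[0]].append(key_to_part[key])
    return dict(worker_map)


# Usage with the script above (x is the persisted Dask array):
#   futures = [client.submit(print_data_and_rank, parts, workers=worker)
#              for worker, parts in parts_by_worker(client, futures_of(x)).items()]
```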
&lt;p&gt;But after that we just call our MPI library’s initialization function,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;print_data_and_rank&lt;/span&gt;&lt;/code&gt;, on all of our data using Dask’s
&lt;a class="reference external" href="http://docs.dask.org/en/latest/futures.html"&gt;Futures interface&lt;/a&gt;.
That function gets the data directly from local memory (the Dask workers and
MPI ranks live in the same processes) and does whatever the MPI application
wants.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="future-work"&gt;
&lt;h1&gt;Future work&lt;/h1&gt;
&lt;p&gt;This could be improved in a few ways:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;The “gross” code referred to above could probably be placed into some
library code to make this pattern easier for people to use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ideally the Dask part of the computation wouldn’t also have to be managed
by MPI, but could maybe start up MPI on its own.&lt;/p&gt;
&lt;p&gt;You could imagine Dask running on something like Kubernetes doing highly
dynamic work, scaling up and down as necessary. Then it would get to a
point where it needed to run some MPI code so it would, itself, start up
MPI on its worker processes and run the MPI application on its data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We haven’t really said anything about resilience here. My guess is that
this isn’t hard to do (Dask has plenty of mechanisms to build complex
inter-task relationships) but I didn’t solve it above.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;a class="reference external" href="https://gist.github.com/mrocklin/193a9671f1536b9d13524214798da4a8"&gt;Here is a gist of the code and results&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/01/31/dask-mpi-experiment/"/>
    <category term="MPI" label="MPI"/>
    <published>2019-01-31T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/01/29/cudf-joins/</id>
    <title>Single-Node Multi-GPU Dataframe Joins</title>
    <updated>2019-01-29T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">
&lt;section id="summary"&gt;

&lt;p&gt;We experiment with single-node multi-GPU joins using cuDF and Dask. We find
that the in-GPU computation is faster than communication. We also present
context and plans for near-future work, including improving high performance
communication in Dask with UCX.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://gist.github.com/mrocklin/6e2c33c33b32bc324e3965212f202f66"&gt;Here is a notebook of the experiment in this post&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="introduction"&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;In a recent post we showed how Dask + cuDF could accelerate reading CSV files
using multiple GPUs in parallel. That operation quickly became bound by the
speed of our disk after we added a few GPUs. Now we try a very different kind
of operation, multi-GPU joins.&lt;/p&gt;
&lt;p&gt;This workload can be communication-heavy, especially if the column on which we
are joining is not sorted nicely, and so provides a good example at the other
extreme from parsing CSV.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="benchmark"&gt;
&lt;h1&gt;Benchmark&lt;/h1&gt;
&lt;section id="construct-random-data-using-the-cpu"&gt;
&lt;h2&gt;Construct random data using the CPU&lt;/h2&gt;
&lt;p&gt;Here we use Dask array and Dask dataframe to construct two random tables with a
shared &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;id&lt;/span&gt;&lt;/code&gt; column. We can play with the number of rows of each table and the
number of keys to make the join challenging in a variety of ways.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;

&lt;span class="n"&gt;n_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000000000&lt;/span&gt;
&lt;span class="n"&gt;n_keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5000000&lt;/span&gt;

&lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_dask_dataframe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_keys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_dask_dataframe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;n_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000000&lt;/span&gt;

&lt;span class="n"&gt;right&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_dask_dataframe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_keys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_dask_dataframe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="send-to-the-gpus"&gt;
&lt;h2&gt;Send to the GPUs&lt;/h2&gt;
&lt;p&gt;We have two Dask dataframes composed of many Pandas dataframes of our random
data. We now map the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cudf.from_pandas&lt;/span&gt;&lt;/code&gt; function across these to make a Dask
dataframe of cuDF dataframes.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cudf&lt;/span&gt;

&lt;span class="n"&gt;gleft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_partitions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cudf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pandas&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;gright&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_partitions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cudf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pandas&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;gleft&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gright&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gleft&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gright&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# persist data in device memory&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;What’s nice here is that there wasn’t any special
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask_pandas_dataframe_to_dask_cudf_dataframe&lt;/span&gt;&lt;/code&gt; function. Dask composed nicely
with cuDF. We didn’t need to do anything special to support it.&lt;/p&gt;
&lt;p&gt;We also persisted the data in device memory.&lt;/p&gt;
&lt;p&gt;After this, simple operations are easy and fast and use our eight GPUs.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;gleft&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# this takes 250ms&lt;/span&gt;
&lt;span class="go"&gt;500004719.254711&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="join"&gt;
&lt;h2&gt;Join&lt;/h2&gt;
&lt;p&gt;We’ll use standard Pandas syntax to merge the datasets, persist the result in
RAM, and then wait for the computation to finish&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gleft&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gright&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# this is lazy&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="profile-and-analyze-results"&gt;
&lt;h1&gt;Profile and analyze results&lt;/h1&gt;
&lt;p&gt;We now look at the Dask diagnostic plots for this computation.&lt;/p&gt;
&lt;section id="task-stream-and-communication"&gt;
&lt;h2&gt;Task stream and communication&lt;/h2&gt;
&lt;p&gt;When we look at Dask’s task stream plot we see that each of our eight threads
(each of which manages a single GPU) spent most of its time in communication
(red is communication time). The actual merge and concat tasks are quite fast
relative to the data transfer time.&lt;/p&gt;
&lt;iframe src="https://matthewrocklin.com/raw-host/dask-cudf-joins.html"
        width="800"
        height="400"&gt;&lt;/iframe&gt;
&lt;p&gt;That’s not too surprising. For this computation I’ve turned off any attempt to
communicate between devices (more on this below) so the data is being moved
from the GPU to the CPU memory, then serialized and put onto a TCP socket.
We’re moving tens of GB on a single machine, so we’re seeing about 1GB/s total
throughput of the system, which is typical for TCP-on-localhost in Python.&lt;/p&gt;
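As a back-of-envelope check, the transfer time implied by that throughput can be sketched as follows (the data volume here is a hypothetical stand-in for the post’s “tens of GB”, not a measured figure):

```python
# rough cost model for the localhost TCP transfer described above;
# data_gb is an assumed figure standing in for "tens of GB"
data_gb = 20
throughput_gb_per_s = 1.0  # typical for TCP-on-localhost in Python
transfer_seconds = data_gb / throughput_gb_per_s
print(transfer_seconds)  # tens of seconds spent just moving bytes
```

This is why the task stream above is dominated by red: communication, not the merge itself, sets the wall-clock time.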
&lt;/section&gt;
&lt;section id="flamegraph-of-computation"&gt;
&lt;h2&gt;Flamegraph of computation&lt;/h2&gt;
&lt;p&gt;We can also look more deeply at the computational costs in Dask’s
flamegraph-style plot. This shows which lines of our functions were taking up
the most time (down to the Python level at least).&lt;/p&gt;
&lt;iframe src="http://matthewrocklin.com/raw-host/dask-cudf-join-profile.html"
        width="800"
        height="400"&gt;&lt;/iframe&gt;
&lt;p&gt;This &lt;a class="reference external" href="http://www.brendangregg.com/flamegraphs.html"&gt;Flame graph&lt;/a&gt; shows which
lines of cudf code we spent time on while computing (excluding the main
communication costs mentioned above). It may be interesting for those trying
to further optimize performance. It shows that most of our costs are in memory
allocation. Like communication, this has actually also been fixed in RAPIDS’
optional memory management pool, it just isn’t default yet, so I didn’t use it
here.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="plans-for-efficient-communication"&gt;
&lt;h1&gt;Plans for efficient communication&lt;/h1&gt;
&lt;p&gt;The cuDF library actually has a decent approach to single-node multi-GPU
communication that I’ve intentionally turned off for this experiment. That
approach cleverly used Dask to communicate device pointer information using
Dask’s normal channels (this is small and fast) and then used that information
to initiate a side-channel communication for the bulk of the data. This
approach was effective, but somewhat fragile. I’m inclined to move on from it
in favor of …&lt;/p&gt;
&lt;p&gt;UCX. The &lt;a class="reference external" href="http://www.openucx.org/"&gt;UCX&lt;/a&gt; project provides a single API that
wraps around several transports like TCP, Infiniband, shared memory, and also
GPU-specific transports. UCX claims to find the best way to communicate data
between two points given the hardware available. If Dask were able to use this
for communication then it would provide both efficient GPU-to-GPU communication
on a single machine, and also efficient inter-machine communication when
efficient networking hardware like Infiniband was present, even outside the
context of GPUs.&lt;/p&gt;
&lt;p&gt;There is some work we need to do here:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;We need to make a Python wrapper around UCX&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We need to make an optional &lt;a class="reference external" href="https://distributed.dask.org/en/latest/communications.html"&gt;Dask Comm&lt;/a&gt;
around this ucx-py library that allows users to specify endpoints like
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ucx://path-to-scheduler&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We need to make Python memoryview-like objects that refer to device memory&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;…&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
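Dask already selects its communication backend from the address scheme (for example tcp:// or inproc://), so once a UCX comm exists, choosing it would be a matter of the URL prefix. A minimal sketch of that scheme-based dispatch idea, where the ucx entry represents the planned, not-yet-released backend:

```python
# sketch of scheme-based comm selection; "ucx" is the hypothetical
# backend discussed above, alongside Dask's existing transports
def comm_backend(address: str) -> str:
    scheme, _, _ = address.partition("://")
    known = {"tcp", "inproc", "ucx"}
    if scheme not in known:
        raise ValueError(f"unknown comm scheme: {scheme}")
    return scheme

print(comm_backend("ucx://path-to-scheduler"))  # -> ucx
```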
&lt;p&gt;This work is already in progress by &lt;a class="reference external" href="https://github.com/Akshay-Venkatesh"&gt;Akshay
Venkatesh&lt;/a&gt;, who works on UCX, and &lt;a class="reference external" href="https://tomaugspurger.github.io/"&gt;Tom
Augspurger&lt;/a&gt;, a core Dask/Pandas developer. I
suspect that they’ll write about it soon. I’m looking forward to seeing what
comes of it, both for Dask and for high performance Python generally.&lt;/p&gt;
&lt;p&gt;It’s worth pointing out that this effort won’t just help GPU users. It should
help anyone on advanced networking hardware, including the mainstream
scientific HPC community.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="id1"&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;Single-node multi-GPU joins have a lot of promise. In fact, earlier RAPIDS
developers got this running much faster than I was able to do above through the
clever communication tricks I briefly mentioned. The main purpose of this post
is to provide a benchmark for joins that we can use in the future, and to
highlight when communication can be essential in parallel computing.&lt;/p&gt;
&lt;p&gt;Now that GPUs have accelerated the computation time of each of our chunks of
work we increasingly find that other systems become the bottleneck. We didn’t
care as much about communication before because computational costs were
comparable. Now that computation is an order of magnitude cheaper, other
aspects of our stack become much more important.&lt;/p&gt;
&lt;p&gt;I’m looking forward to seeing where this goes.&lt;/p&gt;
&lt;section id="come-help"&gt;
&lt;h2&gt;Come help!&lt;/h2&gt;
&lt;p&gt;If the work above sounds interesting to you then come help!
There is a lot of low-hanging, high-impact work to do.&lt;/p&gt;
&lt;p&gt;If you’re interested in being paid to focus more on these topics, then consider
applying for a job. NVIDIA’s RAPIDS team is looking to hire engineers for Dask
development with GPUs and other data analytics library development projects.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-TX-Austin/Senior-Library-Software-Engineer---RAPIDS_JR1919608-1"&gt;Senior Library Software Engineer - RAPIDS&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/01/29/cudf-joins/"/>
    <summary>We benchmark multi-GPU joins on a single machine with Dask and cuDF, profile the results, and outline plans for efficient GPU-to-GPU communication with UCX.</summary>
    <category term="GPU" label="GPU"/>
    <category term="dataframe" label="dataframe"/>
    <published>2019-01-29T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/01/23/dask-1.1.0/</id>
    <title>Dask Release 1.1.0</title>
    <updated>2019-01-23T00:00:00+00:00</updated>
    <content type="html">&lt;p&gt;I’m pleased to announce the release of Dask version 1.1.0. This is a major
release with bug fixes and new features. The last release was 1.0.0 on
2018-11-29.
This blogpost outlines notable changes since the last release.&lt;/p&gt;
&lt;p&gt;You can conda install Dask:&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;conda install dask
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;or pip install from PyPI:&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;pip install dask[complete] --upgrade
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Full changelogs are available here:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/blob/master/docs/source/changelog.rst"&gt;dask/dask&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/distributed/blob/master/docs/source/changelog.rst"&gt;dask/distributed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="notable-changes"&gt;

&lt;p&gt;A lot of work has happened over the last couple months, and we encourage people
to look through the changelog to get a sense of the kinds of incremental
changes that developers are working on.&lt;/p&gt;
&lt;p&gt;There are also a few notable changes in this release that we’ll highlight here:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Support for the recent Numpy 1.16 and Pandas 0.24 releases&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for Pandas Extension Arrays (see &lt;a class="reference internal" href="../../2019/01/22/dask-extension-arrays/"&gt;&lt;span class="doc std std-doc"&gt;Tom Augspurger’s post on the topic&lt;/span&gt;&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;High level graph in Dask dataframe and operator fusion in simple cases&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Increased support for other libraries that look enough like Numpy and Pandas
to work within Dask Array/Dataframe&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="support-for-numpy-1-16-and-pandas-0-24"&gt;
&lt;h1&gt;Support for Numpy 1.16 and Pandas 0.24&lt;/h1&gt;
&lt;p&gt;Both Numpy and Pandas have been evolving quickly over the last few months.
We’re excited about the changes to extensibility arriving in both libraries.
The Dask array/dataframe submodules have been updated to work well with these
recent changes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="pandas-extension-arrays"&gt;
&lt;h1&gt;Pandas Extension Arrays&lt;/h1&gt;
&lt;p&gt;In particular, Dask Dataframe supports Pandas Extension arrays,
meaning that it’s easier to use third-party Pandas packages like CyberPandas or
Fletcher in parallel with Dask Dataframe.&lt;/p&gt;
&lt;p&gt;For more information see &lt;a class="reference internal" href="../../2019/01/22/dask-extension-arrays/"&gt;&lt;span class="doc std std-doc"&gt;Tom Augspurger’s post&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="high-level-graphs-in-dask-dataframe"&gt;
&lt;h1&gt;High Level Graphs in Dask Dataframe&lt;/h1&gt;
&lt;p&gt;For a while Dask array has had some high level graphs for “atop” operations
(elementwise, broadcasting, transpose, tensordot, reductions), which allow for
reduced overhead and task fusion on computations within this class.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;  &lt;span class="c1"&gt;# These operations get fused to a single task&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We’ve renamed &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;atop&lt;/span&gt;&lt;/code&gt; to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;blockwise&lt;/span&gt;&lt;/code&gt; to be a bit more generic, and have also
started applying it to Dask Dataframe, which helps to reduce overhead
substantially when doing computations with many simple operations.&lt;/p&gt;
&lt;p&gt;This still needs to be improved to increase the class of cases where it works,
but we’re already seeing nice speedups on previously unseen workloads.&lt;/p&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;da.atop&lt;/span&gt;&lt;/code&gt; function has been deprecated in favor of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;da.blockwise&lt;/span&gt;&lt;/code&gt;. There
is now also a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dd.blockwise&lt;/span&gt;&lt;/code&gt; which shares a common code path.&lt;/p&gt;
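A small example of the renamed API (assuming a dask installation; the doubling function is just an illustration, not part of the release notes):

```python
import numpy as np
import dask.array as da

x = da.arange(10, chunks=5)
# blockwise maps a function over corresponding blocks; "i" names the
# single index shared by the input and output arrays
y = da.blockwise(lambda block: 2 * block, "i", x, "i", dtype=x.dtype)
assert (y.compute() == 2 * np.arange(10)).all()
```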
&lt;/section&gt;
&lt;section id="non-pandas-dataframe-and-non-numpy-array-types"&gt;
&lt;h1&gt;Non-Pandas dataframe and Non-Numpy array types&lt;/h1&gt;
&lt;p&gt;We’re working to make Dask a bit more agnostic to the types of in-memory array
and dataframe objects that it can manipulate. Rather than having Dask Array be
a grid of Numpy arrays and Dask Dataframe be a sequence of Pandas dataframes,
we’re relaxing that constraint to a grid of &lt;em&gt;Numpy-like&lt;/em&gt; arrays and a sequence
of &lt;em&gt;Pandas-like&lt;/em&gt; dataframes.&lt;/p&gt;
&lt;p&gt;This is an ongoing effort that has targeted alternate backends like
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scipy.sparse&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pydata/sparse&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cupy&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cudf&lt;/span&gt;&lt;/code&gt; and other systems.&lt;/p&gt;
&lt;p&gt;There were some recent posts on
&lt;a class="reference internal" href="../../2019/01/03/dask-array-gpus-first-steps/"&gt;&lt;span class="doc std std-doc"&gt;arrays&lt;/span&gt;&lt;/a&gt; and
&lt;a class="reference internal" href="../../2019/01/13/dask-cudf-first-steps/"&gt;&lt;span class="doc std std-doc"&gt;dataframes&lt;/span&gt;&lt;/a&gt; that show proofs of
concept for this with GPUs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="acknowledgements"&gt;
&lt;h1&gt;Acknowledgements&lt;/h1&gt;
&lt;p&gt;There have been several releases since the last time we had a release blogpost.
The following people contributed to the dask/dask repository since the 0.19.0
release on September 5th:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Anderson Banihirwe&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Antonino Ingargiola&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Armin Berres&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bart Broere&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Carlos Valiente&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daniel Li&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daniel Saxton&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;David Hoese&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Diane Trout&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Damien Garaud&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elliott Sales de Andrade&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Eric Wolak&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gábor Lipták&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Guido Imperiale&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Guillaume Eynard-Bontemps&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Itamar Turner-Trauring&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;James Bourbeau&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jan Koch&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Javad&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jendrik Jördening&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jim Crist&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jonathan Fraine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;John Kirkham&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Johnnie Gray&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Julia Signell&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Justin Dennison&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;M. Farrajota&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Marco Neumann&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mark Harfouche&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Markus Gonser&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Martin Durant&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthias Bussonnier&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mina Farid&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Paul Vecchio&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prabakaran Kumaresshan&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rahul Vaidya&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stephan Hoyer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stuart Berg&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TakaakiFuruse&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Takahiro Kojima&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tom Augspurger&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Yu Feng&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Zhenqing Li&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&amp;#64;milesial&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&amp;#64;samc0de&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&amp;#64;slnguyen&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following people contributed to the dask/distributed repository since the 0.19.0
release on September 5th:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Adam Klein&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brett Naul&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daniel Farrell&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Diane Trout&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dirk Petersen&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Eric Ma&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jim Crist&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;John Kirkham&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gaurav Sheni&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Guillaume Eynard-Bontemps&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Loïc Estève&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Marius van Niekerk&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matthew Rocklin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Michael Wheeler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MikeG&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NotSqrt&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Peter Killick&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Roy Wedge&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Russ Bubley&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stephan Hoyer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&amp;#64;tjb900&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tom Rochette&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&amp;#64;fjetter&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/01/23/dask-1.1.0/"/>
    <summary>I’m pleased to announce the release of Dask version 1.1.0. This is a major
release with bug fixes and new features. The last release was 1.0.0 on
2018-11-29.
This blogpost outlines notable changes since the last release.</summary>
    <category term="release" label="release"/>
    <published>2019-01-23T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/01/22/dask-extension-arrays/</id>
    <title>Extension Arrays in Dask DataFrame</title>
    <updated>2019-01-22T00:00:00+00:00</updated>
    <author>
      <name>Tom Augspurger</name>
    </author>
    <content type="html">&lt;p&gt;&lt;em&gt;This work is supported by &lt;a class="reference external" href="http://anaconda.com"&gt;Anaconda Inc&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;section id="summary"&gt;

&lt;p&gt;Dask DataFrame works well with pandas’ new Extension Array interface, including
third-party extension arrays. This lets Dask&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;easily support pandas’ new extension arrays, like their new &lt;a class="reference external" href="http://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support"&gt;nullable integer
array&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;support third-party extension arrays, like &lt;a class="reference external" href="https://cyberpandas.readthedocs.io"&gt;cyberpandas’s&lt;/a&gt;
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;IPArray&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
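For example, pandas’ nullable integer dtype keeps missing values without the classic silent cast to float (assuming pandas &gt;= 0.24 is installed):

```python
import pandas as pd

# "Int64" (capital I) is the nullable extension dtype; plain "int64" is NumPy's
s = pd.Series([1, 2, None], dtype="Int64")
print(s.dtype)         # Int64
print(s.isna().sum())  # 1
```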
&lt;/section&gt;
&lt;section id="background"&gt;
&lt;h1&gt;Background&lt;/h1&gt;
&lt;p&gt;Pandas 0.23 introduced the &lt;a class="reference external" href="http://pandas.pydata.org/pandas-docs/version/0.24/extending.html#extension-types"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtensionArray&lt;/span&gt;&lt;/code&gt;&lt;/a&gt;, a way to store things other
than a simple NumPy array in a DataFrame or Series. Internally pandas uses this
for data types that aren’t handled natively by NumPy like datetimes with
timezones, Categorical, or (the new!) nullable integer arrays.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;date_range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;2000&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;periods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tz&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;US/Central&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="go"&gt;0   2000-01-01 00:00:00-06:00&lt;/span&gt;
&lt;span class="go"&gt;1   2000-01-02 00:00:00-06:00&lt;/span&gt;
&lt;span class="go"&gt;2   2000-01-03 00:00:00-06:00&lt;/span&gt;
&lt;span class="go"&gt;3   2000-01-04 00:00:00-06:00&lt;/span&gt;
&lt;span class="go"&gt;dtype: datetime64[ns, US/Central]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.dataframe&lt;/span&gt;&lt;/code&gt; has always supported the extension types that pandas defines.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pandas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;npartitions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;Dask Series Structure:&lt;/span&gt;
&lt;span class="go"&gt;npartitions=2&lt;/span&gt;
&lt;span class="go"&gt;0    datetime64[ns, US/Central]&lt;/span&gt;
&lt;span class="go"&gt;2                           ...&lt;/span&gt;
&lt;span class="go"&gt;3                           ...&lt;/span&gt;
&lt;span class="go"&gt;dtype: datetime64[ns, US/Central]&lt;/span&gt;
&lt;span class="go"&gt;Dask Name: from_pandas, 2 tasks&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="the-challenge"&gt;
&lt;h1&gt;The Challenge&lt;/h1&gt;
&lt;p&gt;Newer versions of pandas allow third-party libraries to write custom extension
arrays. These arrays can be placed inside a DataFrame or Series, and work
just as well as any extension array defined within pandas itself. However,
third-party extension arrays provide a slight challenge for Dask.&lt;/p&gt;
&lt;p&gt;Recall: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.dataframe&lt;/span&gt;&lt;/code&gt; is lazy. We use a familiar pandas-like API to build up
a task graph, rather than executing immediately. But if Dask DataFrame is lazy,
then how do things like the following work?&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;B&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pandas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;npartitions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;B&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;
&lt;span class="go"&gt;Index([&amp;#39;B&amp;#39;], dtype=&amp;#39;object&amp;#39;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ddf[['B']]&lt;/span&gt;&lt;/code&gt; (lazily) selects the column &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;'B'&lt;/span&gt;&lt;/code&gt; from the dataframe. But accessing
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.columns&lt;/span&gt;&lt;/code&gt; &lt;em&gt;immediately&lt;/em&gt; returns a pandas Index object with just the selected
columns.&lt;/p&gt;
&lt;p&gt;No real computation has happened (you could just as easily swap out the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;from_pandas&lt;/span&gt;&lt;/code&gt; for a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dd.read_parquet&lt;/span&gt;&lt;/code&gt; on a larger-than-memory dataset, and the
behavior would be the same). Dask is able to do these kinds of “metadata-only”
computations, where the output depends only on the columns and the dtypes,
without executing the task graph. Internally, Dask does this by keeping a pair
of dummy pandas DataFrames on each Dask DataFrame.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_meta&lt;/span&gt;
&lt;span class="go"&gt;Empty DataFrame&lt;/span&gt;
&lt;span class="go"&gt;Columns: [A, B]&lt;/span&gt;
&lt;span class="go"&gt;Index: []&lt;/span&gt;

&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_meta_nonempty&lt;/span&gt;
&lt;span class="go"&gt;ddf._meta_nonempty&lt;/span&gt;
&lt;span class="go"&gt;   A  B&lt;/span&gt;
&lt;span class="go"&gt;0  1  1&lt;/span&gt;
&lt;span class="go"&gt;1  1  1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We need the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;_meta_nonempty&lt;/span&gt;&lt;/code&gt; because some operations in pandas behave differently
on an empty DataFrame than on a non-empty one (either by design or,
occasionally, because of a bug in pandas).&lt;/p&gt;
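To make the purpose of the non-empty dummy concrete, here is a small illustration (not Dask's internal code) of how running an operation on a tiny stand-in frame reveals the output dtype without touching any real data:

```python
import pandas as pd

# A tiny stand-in frame, analogous in spirit to Dask's ``_meta_nonempty``
# (an illustration only, not Dask's actual implementation).
meta = pd.DataFrame({"A": [1], "B": [1.0]})

# Running the operation on the dummy data reveals the output dtype
# without ever computing on the full dataset.
result = meta["A"] + meta["B"]
print(result.dtype)  # float64
```

The same trick generalizes: column selection, arithmetic, and many reductions can all be "dry-run" on the dummy frame to predict the metadata of the real result.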
&lt;p&gt;The issue with third-party extension arrays is that Dask doesn’t know what
values to put in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;_meta_nonempty&lt;/span&gt;&lt;/code&gt;. We’re quite happy to do it for each NumPy
dtype and each of pandas’ own extension dtypes. But any third-party library
could create an ExtensionArray for any type, and Dask would have no way of
knowing what’s a valid value for it.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/22/dask-extension-arrays.md&lt;/span&gt;, line 104)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="the-solution"&gt;
&lt;h1&gt;The Solution&lt;/h1&gt;
&lt;p&gt;Rather than Dask guessing what values to use for the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;_meta_nonempty&lt;/span&gt;&lt;/code&gt;, extension
array authors (or users) can register their extension dtype with Dask. Once
registered, Dask will be able to generate the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;_meta_nonempty&lt;/span&gt;&lt;/code&gt;, and things
should work fine from there. For example, we can register the dummy &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;DecimalArray&lt;/span&gt;&lt;/code&gt;
that pandas uses for testing (this isn’t part of pandas’ public API) with Dask.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;decimal&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pandas.tests.extension.decimal&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecimalArray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DecimalDtype&lt;/span&gt;

&lt;span class="c1"&gt;# The actual registration that would be done in the 3rd-party library&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe.extensions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_array_nonempty&lt;/span&gt;


&lt;span class="nd"&gt;@make_array_nonempty&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DecimalDtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;DecimalArray&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_from_sequence&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;NaN&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
                                       &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Now users of that extension type can place those arrays inside a Dask DataFrame
or Series.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DecimalArray&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;1.0&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;2.0&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;                                      &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;3.0&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)])})&lt;/span&gt;

&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pandas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt;
&lt;span class="go"&gt;Dask DataFrame Structure:&lt;/span&gt;
&lt;span class="go"&gt;                     A&lt;/span&gt;
&lt;span class="go"&gt;npartitions=1&lt;/span&gt;
&lt;span class="go"&gt;0              decimal&lt;/span&gt;
&lt;span class="go"&gt;2                  ...&lt;/span&gt;
&lt;span class="go"&gt;Dask Name: from_pandas, 1 tasks&lt;/span&gt;

&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtypes&lt;/span&gt;
&lt;span class="go"&gt;A    decimal&lt;/span&gt;
&lt;span class="go"&gt;dtype: object&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And from there, the usual operations work just as they would in pandas.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;random&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;choices&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;A&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DecimalArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;1.0&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;                                              &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;2.0&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;                                             &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;                   &lt;span class="s2"&gt;&amp;quot;B&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,))})&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;ddf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pandas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;In [35]: ddf.groupby(&amp;quot;A&amp;quot;).B.mean().compute()&lt;/span&gt;
&lt;span class="go"&gt;Out[35]:&lt;/span&gt;
&lt;span class="go"&gt;A&lt;/span&gt;
&lt;span class="go"&gt;1.0    1.50&lt;/span&gt;
&lt;span class="go"&gt;2.0    1.48&lt;/span&gt;
&lt;span class="go"&gt;Name: B, dtype: float64&lt;/span&gt;

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/22/dask-extension-arrays.md&lt;/span&gt;, line 165)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="the-real-lesson"&gt;
&lt;h1&gt;The Real Lesson&lt;/h1&gt;
&lt;p&gt;It’s neat that Dask now supports extension arrays. But to me, the exciting thing
is just how little work this took. The
&lt;a class="reference external" href="https://github.com/dask/dask/pull/4379/files"&gt;PR&lt;/a&gt; implementing support for
third-party extension arrays is quite short: it just defines the object that
third parties register with, and uses it to generate the data when the dtype is
detected. Supporting the three new extension arrays in pandas 0.24.0
(&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;IntegerArray&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;PeriodArray&lt;/span&gt;&lt;/code&gt;, and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;IntervalArray&lt;/span&gt;&lt;/code&gt;) takes a handful of lines
of code:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@make_array_nonempty&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Interval&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;IntervalArray&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_breaks&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;closed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;closed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@make_array_nonempty&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Period&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;period_array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2001&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;freq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;freq&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@make_array_nonempty&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_IntegerDtype&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;integer_array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Dask benefits directly from improvements made to pandas. Dask didn’t have to
build out a new parallel extension array interface, and reimplement all the new
extension arrays using the parallel interface. We just re-used what pandas
already did, and it fits into the existing Dask structure.&lt;/p&gt;
&lt;p&gt;For third-party extension array authors, like &lt;a class="reference external" href="https://cyberpandas.readthedocs.io"&gt;cyberpandas&lt;/a&gt;, the
work is similarly minimal. They don’t need to re-implement everything from the
ground up, just to play well with Dask.&lt;/p&gt;
&lt;p&gt;This highlights the importance of one of the Dask project’s core values: working
with the community. If you visit &lt;a class="reference external" href="https://dask.org"&gt;dask.org&lt;/a&gt;, you’ll see
phrases like&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Integrates with existing projects&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;and&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Built with the broader community&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;At the start of Dask, the developers &lt;em&gt;could&lt;/em&gt; have gone off and re-written pandas
or NumPy from scratch to be parallel friendly (though we’d probably still be
working on that part today, since that’s such a massive undertaking). Instead,
the Dask developers worked with the community, occasionally nudging it in
directions that would help out Dask. For example, many places in pandas &lt;a class="reference external" href="http://matthewrocklin.com/blog/work/2015/03/10/PyData-GIL"&gt;held
the GIL&lt;/a&gt;, preventing
thread-based parallelism. Rather than abandoning pandas, the Dask and pandas
developers worked together to release the GIL where possible when it was a
bottleneck for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.dataframe&lt;/span&gt;&lt;/code&gt;. This benefited Dask and anyone else trying to
do thread-based parallelism with pandas DataFrames.&lt;/p&gt;
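As a rough sketch of what those GIL releases enable (illustrative code, not from the original post): once pandas operations release the GIL, per-chunk work mapped across a thread pool can genuinely overlap on multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Four pandas chunks, standing in for the partitions of a larger dataframe.
chunks = [pd.DataFrame({"x": range(i, i + 1000)}) for i in range(0, 4000, 1000)]

# Because many pandas operations release the GIL, these per-chunk sums
# can run concurrently in threads rather than serializing on the interpreter.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(lambda df: df["x"].sum(), chunks))

total = sum(partial_sums)  # same answer as summing the whole column at once
```

This is essentially the pattern Dask's threaded scheduler relies on when it runs `dask.dataframe` tasks in threads.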
&lt;p&gt;And now, when pandas introduces new features like nullable integers,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.dataframe&lt;/span&gt;&lt;/code&gt; just needs to register it as an extension type and immediately
benefits from it. And third-party extension array authors can do the same for
their extension arrays.&lt;/p&gt;
&lt;p&gt;If you’re writing an ExtensionArray, make sure to add it to the &lt;a class="reference external" href="http://pandas.pydata.org/pandas-docs/version/0.24/ecosystem.html#extension-data-types"&gt;pandas
ecosystem&lt;/a&gt; page, and register it with Dask!&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/01/22/dask-extension-arrays/"/>
    <summary>This work is supported by Anaconda Inc</summary>
    <category term="dask" label="dask"/>
    <category term="dataframe" label="dataframe"/>
    <published>2019-01-22T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/01/13/dask-cudf-first-steps/</id>
    <title>Dask, Pandas, and GPUs: first steps</title>
    <updated>2019-01-13T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/13/dask-cudf-first-steps.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="executive-summary"&gt;

&lt;p&gt;We’re building a distributed GPU Pandas dataframe out of
&lt;a class="reference external" href="https://github.com/rapidsai/cudf"&gt;cuDF&lt;/a&gt; and
&lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe.html"&gt;Dask Dataframe&lt;/a&gt;.
This effort is young.&lt;/p&gt;
&lt;p&gt;This post describes the current situation,
our general approach,
and gives examples of what does and doesn’t work today.
We end with some notes on scaling performance.&lt;/p&gt;
&lt;p&gt;You can also view the experiment in this post as
&lt;a class="reference external" href="https://gist.github.com/mrocklin/4b1b80d1ae07ec73f75b2a19c8e90e2e"&gt;a notebook&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And here is a table of results:&lt;/p&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
  &lt;tr&gt;
    &lt;th&gt;Architecture&lt;/th&gt;
    &lt;th&gt;Time&lt;/th&gt;
    &lt;th&gt;Bandwidth&lt;/th&gt;
  &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt; Single CPU Core &lt;/th&gt;
      &lt;td&gt; 3min 14s &lt;/td&gt;
      &lt;td&gt; 50 MB/s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Eight CPU Cores &lt;/th&gt;
      &lt;td&gt; 58s &lt;/td&gt;
      &lt;td&gt; 170 MB/s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Forty CPU Cores &lt;/th&gt;
      &lt;td&gt; 35s &lt;/td&gt;
      &lt;td&gt; 285 MB/s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; One GPU &lt;/th&gt;
      &lt;td&gt; 11s &lt;/td&gt;
      &lt;td&gt; 900 MB/s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Eight GPUs &lt;/th&gt;
      &lt;td&gt; 5s &lt;/td&gt;
      &lt;td&gt; 2000 MB/s &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/13/dask-cudf-first-steps.md&lt;/span&gt;, line 63)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="building-blocks-cudf-and-dask"&gt;
&lt;h1&gt;Building Blocks: cuDF and Dask&lt;/h1&gt;
&lt;p&gt;Building a distributed GPU-backed dataframe is a large endeavor.
Fortunately we’re starting on a good foundation and
can assemble much of this system from existing components:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;The &lt;a class="reference external" href="https://github.com/rapidsai/cudf"&gt;cuDF&lt;/a&gt; library aims to implement the
Pandas API on the GPU. It gets good speedups on standard operations like
reading CSV files, filtering and aggregating columns, joins, and so on.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cudf&lt;/span&gt;  &lt;span class="c1"&gt;# looks and feels like Pandas, but runs on the GPU&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cudf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;myfile.csv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Alice&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;cuDF is part of the growing &lt;a class="reference external" href="https://rapids.ai"&gt;RAPIDS initiative&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe.html"&gt;Dask Dataframe&lt;/a&gt;
library provides parallel algorithms around the Pandas API. It composes
large operations like distributed groupbys or distributed joins from a task
graph of many smaller single-node groupbys or joins (and many
&lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe-api.html"&gt;other operations&lt;/a&gt;).&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;  &lt;span class="c1"&gt;# looks and feels like Pandas, but runs in parallel&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;myfile.*.csv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Alice&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a class="reference external" href="https://distributed.dask.org"&gt;Dask distributed task scheduler&lt;/a&gt;
provides general-purpose parallel execution given complex task graphs.
It’s good for adding multi-node computing into an existing codebase.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Given these building blocks,
our approach is to make the cuDF API close enough to Pandas that
we can reuse the Dask Dataframe algorithms.&lt;/p&gt;
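A toy sketch of what "reuse the algorithms" means in practice (illustrative names, pandas-only here, not Dask's internal code): the parallel algorithms only call through the pandas API surface, so any partition type implementing that surface, whether a pandas or a cuDF DataFrame, can flow through by duck typing.

```python
import pandas as pd

# A partition-wise mean written purely against the pandas API surface.
# A cuDF DataFrame exposing the same methods could be substituted unchanged.
def filtered_mean(partitions, name):
    filtered = [p[p["name"] == name] for p in partitions]
    total = sum(f["value"].sum() for f in filtered)   # combine partial sums
    count = sum(len(f) for f in filtered)             # and partial counts
    return total / count

partitions = [
    pd.DataFrame({"name": ["Alice", "Bob"], "value": [1.0, 2.0]}),
    pd.DataFrame({"name": ["Alice"], "value": [3.0]}),
]
filtered_mean(partitions, "Alice")  # 2.0
```

The closer cuDF's API tracks pandas, the more of Dask Dataframe's existing algorithms work this way without modification.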
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/13/dask-cudf-first-steps.md&lt;/span&gt;, line 105)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="benefits-and-challenges-to-this-approach"&gt;
&lt;h1&gt;Benefits and Challenges to this approach&lt;/h1&gt;
&lt;p&gt;This approach has a few benefits:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;We get to reuse the parallel algorithms found in Dask Dataframe originally designed for Pandas.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It consolidates the development effort within a single codebase so that
future effort spent on CPU Dataframes benefits GPU Dataframes and vice
versa. Maintenance costs are shared.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;By building code that works equally with two DataFrame implementations
(CPU and GPU) we establish conventions and protocols that will
make it easier for other projects to do the same, either with these two
Pandas-like libraries, or with future Pandas-like libraries.&lt;/p&gt;
&lt;p&gt;This approach also aims to demonstrate that the ecosystem should support Pandas-like
libraries, rather than just Pandas. For example, if
(when?) the Arrow library develops a computational system then we’ll be in
a better place to roll that in as well.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When doing any refactor we tend to clean up existing code.&lt;/p&gt;
&lt;p&gt;For example, to make dask dataframe ready for a new GPU Parquet reader
we end up &lt;a class="reference external" href="https://github.com/dask/dask/pull/4336"&gt;refactoring and simplifying our Parquet I/O logic&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The approach also has some drawbacks. Namely, it places API pressure on cuDF to match Pandas, which means:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Slight differences in API now cause larger problems, such as these:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/251"&gt;Join column ordering differs rapidsai/cudf #251&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/483#issuecomment-453218151"&gt;Groupby aggregation column ordering differs rapidsai/cudf #483&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cuDF has some pressure on it to repeat what some believe to be mistakes in
the Pandas API.&lt;/p&gt;
&lt;p&gt;For example, cuDF today supports missing values arguably more sensibly than
Pandas. Should cuDF have to revert to the old way of doing things
just to match Pandas semantics? Dask Dataframe will probably need
to be more flexible in order to handle evolution and small differences
in semantics.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/13/dask-cudf-first-steps.md&lt;/span&gt;, line 146)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h1&gt;Alternatives&lt;/h1&gt;
&lt;p&gt;We could also write a new dask-dataframe-style project around cuDF that deviates
from the Pandas/Dask Dataframe API. Until recently this
has actually been the approach, and the
&lt;a class="reference external" href="https://github.com/rapidsai/dask-cudf"&gt;dask-cudf&lt;/a&gt; project did exactly this.
This was probably a good choice early on to get started and prototype things.
The project was able to implement a wide range of functionality including
groupby-aggregations, joins, and so on using
&lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;dask delayed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We’re redoing this now on top of dask dataframe though, which means that we’re
losing some functionality that dask-cudf already had, but hopefully the
functionality that we add now will be more stable and established on a firmer
base.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/13/dask-cudf-first-steps.md&lt;/span&gt;, line 162)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="status-today"&gt;
&lt;h1&gt;Status Today&lt;/h1&gt;
&lt;p&gt;Today very little works, but what does is decently smooth.&lt;/p&gt;
&lt;p&gt;Here is a simple example that reads some data from many CSV files,
picks out a column,
and does some aggregations.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cuda&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LocalCUDACluster&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cudf&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LocalCUDACluster&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# runs on eight local GPUs&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;gdf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask_cudf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;data/nyc/many/*.csv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# wrap around many CSV files&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;gdf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;passenger_count&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="mi"&gt;184464740&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Also note, NYC Taxi ridership is significantly less than it was a few years ago&lt;/em&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/13/dask-cudf-first-steps.md&lt;/span&gt;, line 186)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="what-i-m-excited-about-in-the-example-above"&gt;
&lt;h1&gt;What I’m excited about in the example above&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;All of the infrastructure surrounding the cuDF code, like the cluster setup,
diagnostics, JupyterLab environment, and so on, came for free, like any
other new Dask project.&lt;/p&gt;
&lt;p&gt;Here is an image of my JupyterLab setup&lt;/p&gt;
&lt;a href="https://matthewrocklin.com/blog/images/dask-cudf-environment.png"&gt;
  &lt;img src="https://matthewrocklin.com/blog/images/dask-cudf-environment.png"
       alt="Dask + CUDA + cuDF JupyterLab environment"
       width="70%"&gt;
&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Our &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;gdf&lt;/span&gt;&lt;/code&gt; object is actually just a normal Dask DataFrame. We didn’t have to
write new &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__repr__&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__add__&lt;/span&gt;&lt;/code&gt;, or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.sum()&lt;/span&gt;&lt;/code&gt; implementations, and many
functions we didn’t explicitly consider probably work well today (though many
others don’t).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We’re more tightly integrated with other systems. For example, if
we wanted to convert our dask-cudf dataframe to a dask-pandas dataframe then
we would just use the cuDF &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;to_pandas&lt;/span&gt;&lt;/code&gt; function:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;gdf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gdf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_partitions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cudf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_pandas&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We don’t have to write anything special like a separate &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.to_dask_dataframe&lt;/span&gt;&lt;/code&gt;
method or handle other special cases.&lt;/p&gt;
&lt;p&gt;Dask parallelism is orthogonal to the choice of CPU or GPU.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It’s easy to switch hardware. By avoiding separate &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-cudf&lt;/span&gt;&lt;/code&gt; code paths
it’s easier to add cuDF to an existing Dask+Pandas codebase to run on GPUs,
or to remove cuDF and use Pandas if we want our code to be runnable without GPUs.&lt;/p&gt;
&lt;p&gt;There are more examples of this in the scaling section below.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/13/dask-cudf-first-steps.md&lt;/span&gt;, line 224)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="what-s-wrong-with-the-example-above"&gt;
&lt;h1&gt;What’s wrong with the example above&lt;/h1&gt;
&lt;p&gt;In general the answer is &lt;strong&gt;many small things&lt;/strong&gt;.&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cudf.read_csv&lt;/span&gt;&lt;/code&gt; function doesn’t yet support reading chunks from a
single CSV file, and so doesn’t work well with very large CSV files. We
had to split our large CSV files into many smaller CSV files first with
normal Dask+Pandas:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;few-large/*.csv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repartition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;npartitions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;many-small/*.csv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;(See &lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/568"&gt;rapidsai/cudf #568&lt;/a&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Many operations that used to work in dask-cudf like groupby-aggregations
and joins no longer work. We’re going to need to slightly modify many cuDF
APIs over the next couple of months to more closely match their Pandas
equivalents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I ran the timing cell twice because it currently takes a few seconds to
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;import&lt;/span&gt; &lt;span class="pre"&gt;cudf&lt;/span&gt;&lt;/code&gt; today.
&lt;a class="reference external" href="https://github.com/rapidsai/cudf/issues/627"&gt;rapidsai/cudf #627&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We had to make Dask Dataframe a bit more flexible and assume less about its
constituent dataframes being exactly Pandas dataframes. (see
&lt;a class="reference external" href="https://github.com/dask/dask/pull/4359"&gt;dask/dask #4359&lt;/a&gt; and
&lt;a class="reference external" href="https://github.com/dask/dask/pull/4375"&gt;dask/dask #4375&lt;/a&gt; for examples).
I suspect that there will be many more small changes like
these necessary in the future.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These problems are representative of dozens more similar issues. They are
all fixable and indeed, many are actively being fixed today by the &lt;a class="reference external" href="https://github.com/rapidsai/cudf/graphs/contributors"&gt;good folks
working on RAPIDS&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/13/dask-cudf-first-steps.md&lt;/span&gt;, line 262)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="near-term-schedule"&gt;
&lt;h1&gt;Near Term Schedule&lt;/h1&gt;
&lt;p&gt;The RAPIDS group is currently busy working to release 0.5, which includes some
of the fixes necessary to run the example above, and also many unrelated
stability improvements. This will probably keep them busy for a week or two
during which I don’t expect to see much Dask + cuDF work going on other than
planning.&lt;/p&gt;
&lt;p&gt;After that, Dask parallelism support will be a top priority, so
I look forward to seeing some rapid progress here.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/13/dask-cudf-first-steps.md&lt;/span&gt;, line 273)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="scaling-results"&gt;
&lt;h1&gt;Scaling Results&lt;/h1&gt;
&lt;p&gt;In &lt;a class="reference internal" href="../../2019/01/03/dask-array-gpus-first-steps/"&gt;&lt;span class="doc std std-doc"&gt;my last post about combining Dask Array with CuPy&lt;/span&gt;&lt;/a&gt;,
a GPU-accelerated Numpy,
we saw impressive speedups from using many GPUs on a simple problem that
manipulated some simple random data.&lt;/p&gt;
&lt;section id="dask-array-cupy-on-random-data"&gt;
&lt;h2&gt;Dask Array + CuPy on Random Data&lt;/h2&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
  &lt;tr&gt;
    &lt;th&gt;Architecture&lt;/th&gt;
    &lt;th&gt;Time&lt;/th&gt;
  &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt; Single CPU Core &lt;/th&gt;
      &lt;td&gt; 2hr 39min &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Forty CPU Cores &lt;/th&gt;
      &lt;td&gt; 11min 30s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; One GPU &lt;/th&gt;
      &lt;td&gt; 1 min 37s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Eight GPUs &lt;/th&gt;
      &lt;td&gt; 19s &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;That exercise was easy to scale because it was almost entirely bound by the
computation of creating random data.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="dask-dataframe-cudf-on-csv-data"&gt;
&lt;h2&gt;Dask DataFrame + cuDF on CSV data&lt;/h2&gt;
&lt;p&gt;We did a similar study on the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;read_csv&lt;/span&gt;&lt;/code&gt; example above, which is bound mostly
by reading CSV data from disk and then parsing it. You can see a notebook
available
&lt;a class="reference external" href="https://gist.github.com/mrocklin/4b1b80d1ae07ec73f75b2a19c8e90e2e"&gt;here&lt;/a&gt;. We
have similar (though less impressive) numbers to present.&lt;/p&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
  &lt;tr&gt;
    &lt;th&gt;Architecture&lt;/th&gt;
    &lt;th&gt;Time&lt;/th&gt;
    &lt;th&gt;Bandwidth&lt;/th&gt;
  &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt; Single CPU Core &lt;/th&gt;
      &lt;td&gt; 3min 14s &lt;/td&gt;
      &lt;td&gt; 50 MB/s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Eight CPU Cores &lt;/th&gt;
      &lt;td&gt; 58s &lt;/td&gt;
      &lt;td&gt; 170 MB/s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Forty CPU Cores &lt;/th&gt;
      &lt;td&gt; 35s &lt;/td&gt;
      &lt;td&gt; 285 MB/s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; One GPU &lt;/th&gt;
      &lt;td&gt; 11s &lt;/td&gt;
      &lt;td&gt; 900 MB/s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Eight GPUs &lt;/th&gt;
      &lt;td&gt; 5s &lt;/td&gt;
      &lt;td&gt; 2000 MB/s &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;em&gt;The bandwidth numbers were computed by noting that the data was around 10 GB on disk&lt;/em&gt;&lt;/p&gt;
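&lt;p&gt;As a quick sanity check on the table, the bandwidth column is just that approximate on-disk size divided by the elapsed time. A small sketch of the arithmetic (the 10 GB figure is the rough estimate quoted above):&lt;/p&gt;

```python
# Bandwidth implied by the CSV-reading timings: size on disk / time.
data_mb = 10_000  # roughly 10 GB of CSV data, expressed in MB

times_s = {
    "Single CPU Core": 3 * 60 + 14,
    "Eight CPU Cores": 58,
    "Forty CPU Cores": 35,
    "One GPU": 11,
    "Eight GPUs": 5,
}

# Each entry comes out close to the (rounded) value in the table.
bandwidth = {name: data_mb / t for name, t in times_s.items()}
for name, mbps in bandwidth.items():
    print(name, round(mbps), "MB/s")
```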
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/13/dask-cudf-first-steps.md&lt;/span&gt;, line 359)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="analysis"&gt;
&lt;h1&gt;Analysis&lt;/h1&gt;
&lt;p&gt;First, I want to emphasize again that it’s easy to test a wide variety of
architectures using this setup because of the Pandas API compatibility between
all of the different projects. We’re seeing a wide range of performance (40x
span) across a variety of different hardware with a wide range of cost points.&lt;/p&gt;
&lt;p&gt;Second, note that this problem scales less well than our
&lt;a class="reference internal" href="../../2019/01/03/dask-array-gpus-first-steps/"&gt;&lt;span class="doc std std-doc"&gt;previous example with CuPy&lt;/span&gt;&lt;/a&gt;,
both on CPU and GPU.
I suspect that this is because this example is also bound by I/O and not just
number-crunching. While the jump from single-CPU to single-GPU is large, the
jump from single-CPU to many-CPU or single-GPU to many-GPU is not as large as
we would have liked. For GPUs, for example, we got only around a 2x speedup when we
used 8x as many GPUs.&lt;/p&gt;
&lt;p&gt;At first one might think that this is because we’re saturating disk read speeds.
However two pieces of evidence go against that guess:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;NVIDIA folks familiar with my current hardware inform me that they’re able to get
much higher I/O throughput when they’re careful&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The CPU scaling is similarly poor, despite the fact that it’s obviously not
reaching full I/O bandwidth&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Instead, it’s likely that we’re just not treating our disks and I/O pipelines
carefully enough.&lt;/p&gt;
&lt;p&gt;We might think more carefully about data locality within a
single machine. Alternatively, we might just choose to use a smaller machine,
or many smaller machines. My team has been asking me to start playing with
some systems cheaper than a DGX, so I may experiment with those soon. It may be
that for data-loading and pre-processing workloads the previous wisdom of “pack
as much computation as you can into a single box” no longer holds
(without more work on our part, that is).&lt;/p&gt;
&lt;section id="come-help"&gt;
&lt;h2&gt;Come help&lt;/h2&gt;
&lt;p&gt;If the work above sounds interesting to you then come help!
There is a lot of low-hanging, high-impact work to do.&lt;/p&gt;
&lt;p&gt;If you’re interested in being paid to focus more on these topics, then consider
applying for a job. NVIDIA’s RAPIDS team is looking to hire engineers for Dask
development with GPUs and other data analytics library development projects.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-TX-Austin/Senior-Library-Software-Engineer---RAPIDS_JR1919608-1"&gt;Senior Library Software Engineer - RAPIDS&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/01/13/dask-cudf-first-steps/"/>
    <category term="GPU" label="GPU"/>
    <category term="Pandas" label="Pandas"/>
    <published>2019-01-13T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2019/01/03/dask-array-gpus-first-steps/</id>
    <title>GPU Dask Arrays, first steps</title>
    <updated>2019-01-03T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">&lt;p&gt;The following code creates and manipulates 2 TB of randomly generated data.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;rs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;threads&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;On a single CPU, this computation takes two hours.&lt;/p&gt;
&lt;p&gt;On an eight-GPU single-node system this computation takes nineteen seconds.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/03/dask-array-gpus-first-steps.md&lt;/span&gt;, line 24)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="combine-dask-array-with-cupy"&gt;

&lt;p&gt;Actually this computation isn’t that impressive.
It’s a simple workload,
for which most of the time is spent creating and destroying random data.
The computation and communication patterns are simple,
reflecting the simplicity commonly found in data processing workloads.&lt;/p&gt;
&lt;p&gt;What &lt;em&gt;is&lt;/em&gt; impressive is that we were able to create a distributed parallel GPU
array quickly by composing these four existing libraries:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://cupy.chainer.org/"&gt;CuPy&lt;/a&gt; provides a partial implementation of
Numpy on the GPU.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.dask.org/en/latest/array.html"&gt;Dask Array&lt;/a&gt; provides chunked
algorithms on top of Numpy-like libraries like Numpy and CuPy.&lt;/p&gt;
&lt;p&gt;This enables us to operate on more data than we could fit in memory
by operating on that data in chunks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a class="reference external" href="https://distributed.dask.org"&gt;Dask distributed&lt;/a&gt; task scheduler runs
those algorithms in parallel, easily coordinating work across many CPU
cores.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a class="reference external" href="https://github.com/rapidsai/dask-cuda"&gt;Dask CUDA&lt;/a&gt; to extend Dask
distributed with GPU support.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These tools already exist. We had to connect them together with a small amount
of glue code and minor modifications. By mashing these tools together we can
quickly build and switch between different architectures to explore what is
best for our application.&lt;/p&gt;
&lt;p&gt;For this example we relied on the following changes upstream:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/cupy/cupy/pull/1689"&gt;cupy/cupy #1689: Support Numpy arrays as seeds in RandomState&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/4041"&gt;dask/dask #4041 Make da.RandomState accessible to other modules&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/distributed/pull/2432"&gt;dask/distributed #2432: Add LocalCUDACluster&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/03/dask-array-gpus-first-steps.md&lt;/span&gt;, line 62)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="comparison-among-single-multi-cpu-gpu"&gt;
&lt;h1&gt;Comparison among single/multi CPU/GPU&lt;/h1&gt;
&lt;p&gt;We can now easily run some experiments on different architectures. This is
easy because …&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We can switch between CPU and GPU by switching between Numpy and CuPy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We can switch between single/multi-CPU-core and single/multi-GPU
by switching between Dask’s different task schedulers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These libraries allow us to quickly judge the costs of this computation for
the following hardware choices:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Single-threaded CPU&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multi-threaded CPU with 40 cores (80 hyperthreads)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Single-GPU&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multi-GPU on a single machine with 8 GPUs&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We present code for these four choices below,
but first,
we present a table of results.&lt;/p&gt;
&lt;section id="results"&gt;
&lt;h2&gt;Results&lt;/h2&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
  &lt;tr&gt;
    &lt;th&gt;Architecture&lt;/th&gt;
    &lt;th&gt;Time&lt;/th&gt;
  &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt; Single CPU Core &lt;/th&gt;
      &lt;td&gt; 2hr 39min &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Forty CPU Cores &lt;/th&gt;
      &lt;td&gt; 11min 30s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; One GPU &lt;/th&gt;
      &lt;td&gt; 1 min 37s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Eight GPUs &lt;/th&gt;
      &lt;td&gt; 19s &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/section&gt;
&lt;section id="setup"&gt;
&lt;h2&gt;Setup&lt;/h2&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cupy&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="c1"&gt;# generate chunked dask arrays of mamy numpy random arrays&lt;/span&gt;
&lt;span class="n"&gt;rs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nbytes&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1e9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 2 TB&lt;/span&gt;
&lt;span class="c1"&gt;# 2000.0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="cpu-timing"&gt;
&lt;h2&gt;CPU timing&lt;/h2&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;single-threaded&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;threads&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="single-gpu-timing"&gt;
&lt;h2&gt;Single GPU timing&lt;/h2&gt;
&lt;p&gt;We switch from CPU to GPU by changing our data source to generate CuPy arrays
rather than NumPy arrays. Everything else should more or less work the same
without special handling for CuPy.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(This actually isn’t true yet, many things in dask.array will break for
non-NumPy arrays, but we’re working on it actively both within Dask, within
NumPy, and within the GPU array libraries. Regardless, everything in this
example works fine.)&lt;/em&gt;&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# generate chunked dask arrays of mamy cupy random arrays&lt;/span&gt;
&lt;span class="n"&gt;rs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;-- we specify cupy here&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;single-threaded&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="multi-gpu-timing"&gt;
&lt;h2&gt;Multi GPU timing&lt;/h2&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cuda&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LocalCUDACluster&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LocalCUDACluster&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And again, here are the results:&lt;/p&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
  &lt;tr&gt;
    &lt;th&gt;Architecture&lt;/th&gt;
    &lt;th&gt;Time&lt;/th&gt;
  &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt; Single CPU Core &lt;/th&gt;
      &lt;td&gt; 2hr 39min &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Forty CPU Cores &lt;/th&gt;
      &lt;td&gt; 11min 30s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; One GPU &lt;/th&gt;
      &lt;td&gt; 1 min 37s &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt; Eight GPUs &lt;/th&gt;
      &lt;td&gt; 19s &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;First, this was my first time playing with a 40-core system. I was surprised
to see that many cores, and pleased to see that Dask’s normal threaded
scheduler happily saturates them.&lt;/p&gt;
&lt;img src="https://matthewrocklin.com/blog/images/python-gil-8000-percent.png" width="100%"&gt;
&lt;p&gt;Later on, though, utilization dipped to around 5000-6000%, and if you do the
math you’ll see that we’re not getting a full 40x speedup. My &lt;em&gt;guess&lt;/em&gt; is that
performance would improve if we were to play with some mixture of threads and
processes, like having ten processes with eight threads each.&lt;/p&gt;
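&lt;p&gt;A thread/process mixture like that can be sketched with the distributed scheduler’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LocalCluster&lt;/span&gt;&lt;/code&gt;. The worker and thread counts here are hypothetical and worth benchmarking rather than a recommendation, and a small deterministic array stands in for the 500,000 x 500,000 one above:&lt;/p&gt;

```python
# A sketch of the guessed-at thread/process mixture; the worker and thread
# counts are hypothetical, not a tuned recommendation.
import dask.array as da
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    # Ten worker processes with eight threads each, rather than one big
    # 40-thread pool in a single process
    cluster = LocalCluster(n_workers=10, threads_per_worker=8)
    client = Client(cluster)

    # A small stand-in for the 500,000 x 500,000 array in the post; with
    # ones, the exact answer is 2 * 4000 * 4000 = 32,000,000
    x = da.ones((8000, 8000), chunks=(1000, 1000))
    total = (x + 1)[::2, ::2].sum().compute()
    print(int(total))

    client.close()
    cluster.close()
```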
&lt;p&gt;The jump from the biggest multi-core CPU to a single GPU is still an order of
magnitude though. The jump to multi-GPU is another order of magnitude, and
brings the computation down to 19s, which is short enough that I’m willing to
wait for it to finish before walking away from my computer.&lt;/p&gt;
&lt;p&gt;Actually, it’s quite fun to watch on the dashboard (especially after you’ve
been waiting for three hours for the sequential solution to run):&lt;/p&gt;
&lt;blockquote class="imgur-embed-pub"
            lang="en"
            data-id="a/6hkPPwA"&gt;
&lt;a href="//imgur.com/6hkPPwA"&gt;&lt;/a&gt;
&lt;/blockquote&gt;
&lt;script async src="//s.imgur.com/min/embed.js" charset="utf-8"&gt;&lt;/script&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/03/dask-array-gpus-first-steps.md&lt;/span&gt;, line 221)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;This computation was simple, but the range in architecture just explored was
extensive. We swapped out the underlying architecture from CPU to GPU (which
had an entirely different codebase) and tried both multi-core CPU parallelism
as well as multi-GPU many-core parallelism.&lt;/p&gt;
&lt;p&gt;We did this in less than twenty lines of code, making this experiment something
that an undergraduate student or other novice could perform at home.
We’re approaching a point where experimenting with multi-GPU systems is
accessible to non-experts (at least for array computing).&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://gist.github.com/mrocklin/57be0ca4143974e6015732d0baacc1cb"&gt;Here is a notebook for the experiment above&lt;/a&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2019/01/03/dask-array-gpus-first-steps.md&lt;/span&gt;, line 235)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="room-for-improvement"&gt;
&lt;h1&gt;Room for improvement&lt;/h1&gt;
&lt;p&gt;We can work to expand the computation above in a variety of directions.
There is a ton of work we still have to do to make this reliable.&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use more complex array computing workloads&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The Dask Array algorithms were designed first around Numpy. We’ve only
recently started making them more generic to other kinds of arrays (like
GPU arrays, sparse arrays, and so on). As a result there are still many
bugs when exploring these non-Numpy workloads.&lt;/p&gt;
&lt;p&gt;For example, if you were to switch &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sum&lt;/span&gt;&lt;/code&gt; for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mean&lt;/span&gt;&lt;/code&gt; in the computation above
you would get an error, because our &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mean&lt;/span&gt;&lt;/code&gt; implementation contains an
easy-to-fix bug that assumes Numpy arrays specifically.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use Pandas and cuDF instead of Numpy and CuPy&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The cuDF library aims to reimplement the Pandas API on the GPU,
much like how CuPy reimplements the NumPy API.
Using Dask DataFrame with cuDF will require some work on both sides,
but is quite doable.&lt;/p&gt;
&lt;p&gt;I believe that there is plenty of low-hanging fruit here.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Improve and move LocalCUDACluster&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LocalCUDACluster&lt;/span&gt;&lt;/code&gt; class used above is an experimental &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Cluster&lt;/span&gt;&lt;/code&gt; type
that creates as many workers locally as you have GPUs, and assigns each
worker to prefer a different GPU. This makes it easy for people to load
balance across GPUs on a single-node system without thinking too much about
it. This appears to be a common pain-point in the ecosystem today.&lt;/p&gt;
&lt;p&gt;However, the LocalCUDACluster probably shouldn’t live in the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask/distributed&lt;/span&gt;&lt;/code&gt; repository (it seems too CUDA-specific), so it will probably
move to a separate dask-cuda repository. Additionally, there are still many
questions about how to handle concurrency on top of GPUs, balancing between
CPU cores and GPU cores, and so on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-node computation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There’s no reason that we couldn’t accelerate computations like these
further by using multiple multi-GPU nodes. This is doable today with
manual setup, but we should also improve the existing deployment solutions
&lt;a class="reference external" href="https://kubernetes.dask.org"&gt;dask-kubernetes&lt;/a&gt;,
&lt;a class="reference external" href="https://yarn.dask.org"&gt;dask-yarn&lt;/a&gt;, and
&lt;a class="reference external" href="https://jobqueue.dask.org"&gt;dask-jobqueue&lt;/a&gt;, to make this easier for
non-experts who want to use a cluster of multi-GPU resources.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Expense&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The machine I ran this on is expensive. Well, it’s nowhere near as
expensive to own and operate as the traditional cluster you would need
for these kinds of results, but it’s still well beyond the price point of a
hobbyist or student.&lt;/p&gt;
&lt;p&gt;It would be useful to run this on a more modest system to get a sense of
the tradeoffs at more reasonable price points. I should probably also
learn more about provisioning GPUs on the cloud.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
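&lt;p&gt;The device-assignment trick that LocalCUDACluster automates (item 3 above) can be sketched in plain Python. The helper below is hypothetical, but the idea, rotating &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;/code&gt; so each worker lists its own GPU first, is roughly the mechanism the experimental implementation uses:&lt;/p&gt;

```python
# Hypothetical helper sketching what LocalCUDACluster automates: one worker
# per GPU, each pointed at a different preferred device by rotating
# CUDA_VISIBLE_DEVICES so the worker's own GPU appears first in the list.
def visible_devices(worker_index, n_gpus):
    """Return a CUDA_VISIBLE_DEVICES string for one worker."""
    order = [(worker_index + i) % n_gpus for i in range(n_gpus)]
    return ",".join(str(device) for device in order)

# On an eight-GPU machine, worker 0 prefers GPU 0, worker 1 prefers GPU 1, ...
for worker in range(8):
    print(worker, visible_devices(worker, 8))
```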
&lt;section id="come-help"&gt;
&lt;h2&gt;Come help!&lt;/h2&gt;
&lt;p&gt;If the work above sounds interesting to you then come help!
There is a lot of low-hanging, high-impact work to do.&lt;/p&gt;
&lt;p&gt;If you’re interested in being paid to focus more on these topics, then consider
applying for a job. The NVIDIA corporation is hiring around the use of Dask
with GPUs.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-TX-Austin/Senior-Library-Software-Engineer---RAPIDS_JR1919608-1"&gt;Senior Library Software Engineer - RAPIDS&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That’s a fairly generic posting. If you’re interested but the posting doesn’t
seem to fit, then please apply anyway and we’ll tweak things.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2019/01/03/dask-array-gpus-first-steps/"/>
    <summary>The following code creates and manipulates 2 TB of randomly generated data.</summary>
    <category term="GPU" label="GPU"/>
    <category term="array" label="array"/>
    <category term="cupy" label="cupy"/>
    <published>2019-01-03T00:00:00+00:00</published>
  </entry>
</feed>
