<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://blog.dask.org</id>
  <title>Dask Working Notes - Posted in 2021</title>
  <updated>2026-03-05T15:05:21.247541+00:00</updated>
  <link href="https://blog.dask.org"/>
  <link href="https://blog.dask.org/blog/2021/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.io/" version="0.11.12">ABlog</generator>
  <entry>
    <id>https://blog.dask.org/2021/12/15/dask-fellow-reflections/</id>
    <title>Reflections on one year as the Dask life science fellow</title>
    <updated>2021-12-15T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">&lt;section id="summary"&gt;

&lt;p&gt;&lt;a class="reference external" href="https://github.com/GenevieveBuckley/"&gt;Genevieve Buckley&lt;/a&gt; was hired as a Dask Life Science Fellow in 2021 &lt;a class="reference external" href="https://chanzuckerberg.com/eoss/proposals/"&gt;funded by CZI&lt;/a&gt;. The goal was to improve Dask, with a &lt;a class="reference external" href="https://blog.dask.org/2021/03/04/the-life-science-community"&gt;specific focus on the life science community&lt;/a&gt;. This blogpost contains another progress update, and some personal reflections looking back over this year.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="contents"&gt;
&lt;h1&gt;Contents&lt;/h1&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#progress-update"&gt;&lt;span class="xref myst"&gt;Progress update&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#personal-reflections"&gt;&lt;span class="xref myst"&gt;Personal reflections&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#highlights-from-this-year"&gt;&lt;span class="xref myst"&gt;Highlights from this year&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#what-worked-well"&gt;&lt;span class="xref myst"&gt;What worked well&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#what-didn-t-work-so-well"&gt;&lt;span class="xref myst"&gt;What didn’t work so well&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#what-s-next-in-dask"&gt;&lt;span class="xref myst"&gt;What’s next in Dask?&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="progress-update"&gt;
&lt;h1&gt;Progress update&lt;/h1&gt;
&lt;p&gt;A previous progress update for February to September 2021 is &lt;a class="reference external" href="https://blog.dask.org/2021/10/20/czi-eoss-update"&gt;available here&lt;/a&gt;. Read on for a progress update for the period September to December 2021.&lt;/p&gt;
&lt;p&gt;To summarize, between September and December 2021 inclusive, there were:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;32 merged pull requests across 7 repositories (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;distributed&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-tutorial&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ITK&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;napari&lt;/span&gt;&lt;/code&gt;, and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;napari.github.io&lt;/span&gt;&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;8 pending pull requests&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 new &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt; release&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 Dask tutorial presented, plus assistance with a second tutorial&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;4 new Dask blogposts published (five, if we count this one)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Read on for a more detailed description of special projects within this time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dask stale issues sprint&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In two weeks I was able to:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;close 117 stale issues, and&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;identify another 25 potential easy wins for the maintainer team to investigate further.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Many other contributors did maintenance work around the same time, including following up on old pull requests. Overall, the sprint was very successful.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dask user survey results analysis&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In September I analyzed the results from the 2021 Dask user survey.
This was a really fun task. Because we asked many more questions in 2021 (18 new questions, 43 questions in total), there was a lot more data to dig into than in previous years. You can read the &lt;a class="reference external" href="https://blog.dask.org/2021/09/15/user-survey"&gt;full details about it here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The biggest benefit from this work is that we can now use this data to prioritize improvements to the documentation and examples.
The top two user requests are for more documentation and for more examples from their own industry. This was the first year we asked respondents which industry they work in, so now we can target new narrative documentation to the areas that need it most (geoscience, life science, and finance).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ITK compatibility with Dask&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I implemented &lt;a class="reference external" href="https://github.com/InsightSoftwareConsortium/ITK/pull/2829/"&gt;pickle serialization for itk images (ITK PR #2829)&lt;/a&gt;. This should be one of the last major pieces of the puzzle needed to make ITK images compatible with Dask. It builds on earlier work by Matt McCormick and John Kirkham (you can read a blog post about their earlier work &lt;a class="reference external" href="https://blog.dask.org/2019/08/09/image-itk"&gt;here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Better cross-compatibility between Dask and other projects was a major goal of mine, so this is an important piece of work. I outline the next steps in the section &lt;a class="reference internal" href="#what-s-next-in-dask"&gt;&lt;span class="xref myst"&gt;What’s next in Dask?&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
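&lt;p&gt;Why does pickling matter here? The distributed scheduler moves objects between worker processes by serializing them, commonly with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pickle&lt;/span&gt;&lt;/code&gt;, and an image type that wraps native (C++) memory cannot be pickled by default. The sketch below illustrates the general pattern with a hypothetical &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;TinyImage&lt;/span&gt;&lt;/code&gt; class. It is not the ITK API and not the code from ITK PR #2829. It is an assumed stand-in whose &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__reduce__&lt;/span&gt;&lt;/code&gt; method reduces the object to a NumPy array plus spatial metadata.&lt;/p&gt;

```python
import pickle
import threading

import numpy as np


class TinyImage:
    """Hypothetical image wrapper (NOT the ITK API).

    `_native_handle` stands in for a pointer to C++ memory: like a real
    native handle, a lock object cannot be pickled.
    """

    def __init__(self, pixels, spacing=(1.0, 1.0), origin=(0.0, 0.0)):
        self.pixels = np.asarray(pixels)
        self.spacing = tuple(spacing)
        self.origin = tuple(origin)
        self._native_handle = threading.Lock()

    def __reduce__(self):
        # Serialize only plain data (array + metadata). On unpickling,
        # pickle calls TinyImage(pixels, spacing, origin), which also
        # creates a fresh native handle on the receiving side.
        return (TinyImage, (self.pixels, self.spacing, self.origin))


img = TinyImage(np.arange(6).reshape(2, 3), spacing=(0.5, 0.5))
restored = pickle.loads(pickle.dumps(img))
```

&lt;p&gt;Without &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__reduce__&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pickle.dumps(img)&lt;/span&gt;&lt;/code&gt; would fail on the lock. With it, the image round-trips with its pixels and metadata intact.&lt;/p&gt;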
&lt;p&gt;&lt;strong&gt;Improve rechunking&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I implemented &lt;a class="reference external" href="https://github.com/dask/dask/pull/8124"&gt;PR #8124&lt;/a&gt; to fix a bug where reshaping a Dask array could produce an output array with chunks far too large to fit in memory.
Feedback from the life science user survey indicates that improving Dask’s performance around rechunking is a priority, and this work helps to address that.&lt;/p&gt;
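&lt;p&gt;A quick way to catch the kind of problem fixed in PR #8124 is to check the size of the largest chunk after an operation such as a reshape. The helper below is a minimal sketch. The function names and the 256 MiB budget are illustrative choices, not Dask settings. It takes a Dask-style chunks tuple (for a real Dask array, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;x.chunks&lt;/span&gt;&lt;/code&gt;) together with the item size in bytes (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;x.dtype.itemsize&lt;/span&gt;&lt;/code&gt;).&lt;/p&gt;

```python
import math


def largest_chunk_nbytes(chunks, itemsize):
    """Size in bytes of the biggest block described by a Dask-style
    `chunks` tuple, e.g. ((1000, 1000), (500, 500, 200))."""
    # Blocks are the outer product of per-axis chunk lengths, so the
    # biggest block takes the largest chunk length along every axis.
    return math.prod(max(axis) for axis in chunks) * itemsize


def check_chunks(chunks, itemsize, budget_bytes=256 * 2**20):
    """Raise if any block would exceed an (illustrative) memory budget."""
    nbytes = largest_chunk_nbytes(chunks, itemsize)
    if nbytes > budget_bytes:
        raise MemoryError(
            f"largest chunk is {nbytes / 2**20:.0f} MiB, "
            f"budget is {budget_bytes / 2**20:.0f} MiB"
        )
    return nbytes
```

&lt;p&gt;For example, after &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;y = x.reshape(...)&lt;/span&gt;&lt;/code&gt; one could call &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;check_chunks(y.chunks, y.dtype.itemsize)&lt;/span&gt;&lt;/code&gt; before computing anything.&lt;/p&gt;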
&lt;p&gt;&lt;strong&gt;High level graph work&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A major piece of work earlier this year was introducing high level graphs for array slicing and array overlap operations. That is a big effort requiring a lot of ongoing work.
&lt;a class="reference external" href="https://github.com/dask/dask/pull/8467"&gt;PR #8467&lt;/a&gt; tackles one of the next steps for this work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Find objects function for dask-image&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I implemented a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;find_objects&lt;/span&gt;&lt;/code&gt; function for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt; in &lt;a class="reference external" href="https://github.com/dask/dask-image/pull/240"&gt;PR #240&lt;/a&gt;. Unlike the previous attempt, this implementation does not need to know the maximum label number ahead of time. This is an important step forward, because it removes a major blocker to introducing scikit-image-like &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;regionprops&lt;/span&gt;&lt;/code&gt; functionality.&lt;/p&gt;
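&lt;p&gt;To see why not needing the maximum label matters, consider computing bounding boxes chunk by chunk. Each chunk only discovers the labels it happens to contain, and the partial results must then be merged. Whereas &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scipy.ndimage.find_objects&lt;/span&gt;&lt;/code&gt; returns a list indexed by label number, the sketch below keys results by label in a dictionary, so no global maximum is required. This is an illustration under our own assumptions: the function names are hypothetical and label 0 is treated as background. It is not the dask-image implementation from PR #240.&lt;/p&gt;

```python
import numpy as np


def block_bounding_boxes(labels, offset):
    """Bounding boxes (tuples of slices) of every nonzero label in one block.

    `offset` is the block's position in the full image, so the returned
    slices are in global image coordinates.
    """
    boxes = {}
    for lab in np.unique(labels):
        if lab == 0:  # treat 0 as background
            continue
        coords = np.nonzero(labels == lab)
        boxes[int(lab)] = tuple(
            slice(int(c.min()) + o, int(c.max()) + 1 + o)
            for c, o in zip(coords, offset)
        )
    return boxes


def merge_boxes(a, b):
    """Combine two {label: slices} dicts, taking the union of the boxes."""
    merged = dict(a)
    for lab, box in b.items():
        if lab in merged:
            merged[lab] = tuple(
                slice(min(s1.start, s2.start), max(s1.stop, s2.stop))
                for s1, s2 in zip(merged[lab], box)
            )
        else:
            merged[lab] = box
    return merged


image = np.array([[1, 1, 0, 0],
                  [0, 0, 0, 2],
                  [0, 2, 2, 2]])
left = block_bounding_boxes(image[:, :2], offset=(0, 0))
right = block_bounding_boxes(image[:, 2:], offset=(0, 2))
boxes = merge_boxes(left, right)
```

&lt;p&gt;Here label 2 spans both column blocks, and merging recovers its full bounding box (rows 1 to 2, columns 1 to 3) even though neither block saw the whole object.&lt;/p&gt;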
&lt;p&gt;&lt;strong&gt;Blogposts&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Dask blogposts published between September and December 2021 include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes"&gt;Choosing good chunk sizes in Dask&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This blogpost addresses some very common concerns and questions about using Dask.
I’m very pleased with this article: thanks to several thoughtful reviewers, the final version is much stronger and more comprehensive than the &lt;a class="reference external" href="https://twitter.com/DataNerdery/status/1424953376043790341"&gt;Twitter thread&lt;/a&gt; that inspired it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It’s also high impact work. In the Dask survey the most common request is for more documentation, and this content helps to address that. Twitter analytics also show much higher engagement with this content than for other similar tweets, indicating a demand in the community for this type of explanation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/12/01/mosaic-fusion"&gt;Mosaic Image Fusion&lt;/a&gt; (co-authored with Volker Hilsenstein and Marvin Albert)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This blogpost was several months in the making (started in mid-August and published in December). It’s fantastic to have people sharing some of the very cool work they do with Dask on real world problems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/10/20/czi-eoss-update"&gt;CZI EOSS Update&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This blogpost shares with the community an interim progress update provided to CZI.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/09/15/user-survey"&gt;2021 Dask user survey results&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Discussed in more detail above, the analysis results from the Dask User Survey were published in September 2021.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Tutorials&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;I presented a Dask tutorial at the &lt;a class="reference external" href="https://resbaz.github.io/resbaz2021/sydney/"&gt;ResBaz Sydney online conference&lt;/a&gt; on the 25th of November 2021. Thanks to the ResBaz organisers and to David McFarlane, Svetlana Tkachenko, and Oksana Tkachenko for monitoring the chat for questions on the day.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Naty Clementi ran a Dask tutorial for the Women Who Code DC meetup on the 4th of November 2021. I assisted Naty, mostly by monitoring questions in the chat.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="personal-reflections"&gt;
&lt;h1&gt;Personal reflections&lt;/h1&gt;
&lt;p&gt;Reflecting back over the whole year, there were some things that worked well and some things that were less successful.&lt;/p&gt;
&lt;section id="highlights-from-this-year"&gt;
&lt;h2&gt;Highlights from this year&lt;/h2&gt;
&lt;p&gt;My personal highlights include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;ITK + Dask integration work (discussed in more detail above).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A find objects function for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt; (discussed in more detail above).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visualization work, because it’s very high impact. We’re solving issues raised by life science groups, but the improved tools benefit EVERYONE who uses Dask.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This bugfix from &lt;a class="reference external" href="https://github.com/dask/dask/pull/7391"&gt;dask PR #7391&lt;/a&gt;, because this single change fixed problems in four places at once (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scikit-image&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-ml&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;xgcm/xhistogram&lt;/span&gt;&lt;/code&gt;, and the cupy dask tests).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Community building, conferences, and engagement. Lots of effort went into events over this year, and it’s certainly paid dividends.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="what-worked-well"&gt;
&lt;h2&gt;What worked well&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Dask stale issues sprint&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This was useful for the project, as well as useful for me.
Sorting through old issues was an incredibly effective way to learn who the experts are on particular topics. It would have been even better if this had happened in the first few months of working on Dask, instead of the last few months.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It’s been suggested that one good way to gain familiarity is to spend 6 months full time managing the issue tracker. Maybe that’s true, but the much shorter stale issues sprint was a very efficient way of getting many of the same benefits. I’d recommend it for new maintainers or triage team members.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Community building events&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We had a very successful year in terms of community building and events. This included tutorials, workshops, conferences, and community outreach. Summary of major events:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Led a Dask tutorial at &lt;a class="reference external" href="https://resbaz.github.io/resbaz2021/sydney/"&gt;ResBaz Sydney 2021&lt;/a&gt; in November.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Co-led a half-day tutorial on napari and Dask at the &lt;a class="reference external" href="https://www.lmameeting.com.au/"&gt;Light Microscopy Australia Meeting&lt;/a&gt; in August.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SciPy 2021 presentation &lt;a class="reference external" href="https://www.youtube.com/watch?v=tY_lCGS1BMk&amp;amp;amp;t=60s"&gt;Scaling Science: leveraging Dask for life sciences&lt;/a&gt; in July.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Organized the &lt;a class="reference external" href="https://blog.dask.org/2021/05/24/life-science-summit-workshop"&gt;Dask Life Science workshop&lt;/a&gt; at the Dask Summit in May 2021. The life science workshop included 15 pre-recorded talks, and 3 interactive discussions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Co-organised the &lt;a class="reference external" href="https://blog.dask.org/2021/06/25/dask-down-under"&gt;Dask Down Under&lt;/a&gt; workshop for the Dask Summit in May 2021. Dask Down Under contained 5 talks, 2 tutorials, 1 panel discussion, and 1 meet and greet networking event.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Expert panelist at the &lt;a class="reference external" href="https://www.vis2021.com.au/"&gt;VIS2021 symposium&lt;/a&gt; in February.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Visualization work&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This has been very high impact work, and I’m pleased with what we’ve achieved. Improved visualization tools were requested by users in our survey of the life science community. We made this a high priority because improvements to visualization tools benefit EVERYONE who uses Dask.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-didn-t-work-so-well"&gt;
&lt;h2&gt;What didn’t work so well&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Technical resources&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We never really solved the problem of finding someone I could go to with technical questions. I did have people to ask about some specific projects, but in most cases I didn’t have a good way to direct questions to the right people. This is a challenging problem, especially because most Dask maintainers and contributors have full time jobs doing other things too. In my opinion, this negatively impacted the work and what we were able to achieve.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Being added to the &amp;#64;dask/maintenance team&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There’s no point getting notifications if you don’t have the GitHub permissions to do anything about them. In future, I think we should only add people with at least triage or write permissions to the GitHub teams.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real time interaction&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We tried out “Ask a maintainer” office hours for the life science community, but they were poorly attended, so we cancelled this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We added some “Dask social chat” events to the calendar, but they were not very well attended outside of the first few. Most often, zero people attended. (There is another social chat for the Americas/Europe time zones, which is at a more convenient time for most people and might be more popular.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Slack&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Slack works well to DM specific people to set up meeting times, etc, but the public channels didn’t end up being very useful for me personally.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lack of integration with other project teams&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You can only get so much done as a solo developer. We had hoped that I would naturally end up working with teams from several different projects, but this didn’t really end up being the case. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;napari&lt;/span&gt;&lt;/code&gt; project is an exception to this, and that relationship was well established before starting work for Dask. Perhaps there’s something more we could have done here to facilitate more interaction.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="what-s-next-for-genevieve"&gt;
&lt;h1&gt;What’s next for Genevieve?&lt;/h1&gt;
&lt;p&gt;Genevieve will be starting a new job next year. You can find her on GitHub at &lt;a class="reference external" href="https://github.com/GenevieveBuckley/"&gt;&amp;#64;GenevieveBuckley&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-s-next-in-dask"&gt;
&lt;h1&gt;What’s next in Dask?&lt;/h1&gt;
&lt;p&gt;A lot has happened in Dask this year, but there is still plenty left to do.
Here is a summary of the next steps for several projects. We’d love for new contributors to take up the torch on any of these projects.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ITK image compatibility with Dask&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The next steps for the ITK + Dask project require ITK release candidate 5.3rc3 or above to become available (likely early in 2022).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the release is available, the next step is to try to re-run the code from the original &lt;a class="reference external" href="https://blog.dask.org/2019/08/09/image-itk"&gt;ITK blogpost&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If there’s still work to be done we’ll need to open issues for the remaining blockers. And if it all works well, we’d like someone to write a second ITK + Dask blogpost to publicize the new functionality.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Improving performance around rechunking&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;More performance improvements related to rechunking are needed (see &lt;a class="reference external" href="https://github.com/dask/dask/pull/7950"&gt;#7950&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/dask/dask/pull/7980"&gt;#7980&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;High level graph work for arrays and slicing&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The high level graph work for slicing and overlapping arrays has a number of next steps.
Ian Rose has written &lt;a class="reference external" href="https://gist.github.com/ian-r-rose/4221ebf52f3423203640c498fb815f21"&gt;an excellent summary here&lt;/a&gt;. Briefly, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cull&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;get_output_keys&lt;/span&gt;&lt;/code&gt; methods must be implemented first; then low level fusion and optimizations can be done.&lt;/p&gt;
&lt;p&gt;Relevant links:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Implement &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cull&lt;/span&gt;&lt;/code&gt; method for ArrayOverlapLayer &lt;a class="reference external" href="https://github.com/dask/dask/issues/7789"&gt;#7789&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;get_output_keys&lt;/span&gt;&lt;/code&gt; method for ArrayOverlapLayer &lt;a class="reference external" href="https://github.com/dask/dask/issues/7791"&gt;#7791&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/7655"&gt;Array slicing HighLevelGraph layer #7655&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
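&lt;p&gt;For readers new to the idea, culling means discarding every task that the requested output keys do not depend on. The toy function below sketches that idea on a plain dictionary mapping each task key to the keys it depends on. It only illustrates the concept. It does not follow the signature of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cull&lt;/span&gt;&lt;/code&gt; method that &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ArrayOverlapLayer&lt;/span&gt;&lt;/code&gt; needs to implement, and the task names are made up.&lt;/p&gt;

```python
def cull(dependencies, outputs):
    """Return the sub-graph needed to compute `outputs`.

    `dependencies` maps each task key to the list of keys it depends on.
    """
    needed = set()
    stack = list(outputs)
    while stack:
        key = stack.pop()
        if key in needed:
            continue
        needed.add(key)
        stack.extend(dependencies[key])
    return {key: dependencies[key] for key in needed}


graph = {
    "load-0": [],
    "load-1": [],
    "overlap-0": ["load-0", "load-1"],
    "overlap-1": ["load-1"],
    "sum": ["overlap-0"],
}
culled = cull(graph, ["sum"])
```

&lt;p&gt;Asking only for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;"sum"&lt;/span&gt;&lt;/code&gt; drops &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;"overlap-1"&lt;/span&gt;&lt;/code&gt;. A high level layer that can do this without first materializing every task is what makes such optimizations cheap for large arrays.&lt;/p&gt;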
&lt;p&gt;&lt;strong&gt;Documentation&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Dask needs better documentation for high level graphs. Both &lt;a class="reference external" href="https://github.com/dask/dask/issues/7709"&gt;user documentation&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/dask/dask/issues/7755"&gt;developer documentation&lt;/a&gt; are needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;At some future point, it might be worthwhile integrating blogpost content from
&lt;a class="reference external" href="https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes"&gt;Choosing good chunk sizes in Dask&lt;/a&gt; into the main &lt;a class="reference external" href="https://docs.dask.org/en/latest/"&gt;Dask documentation&lt;/a&gt;, for better discoverability.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/12/15/dask-fellow-reflections/"/>
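    <summary>Genevieve Buckley was hired as a Dask Life Science Fellow in 2021, funded by CZI. This blogpost contains another progress update, and some personal reflections looking back over the year.</summary>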
    <category term="lifescience" label="life science"/>
    <published>2021-12-15T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/12/01/mosaic-fusion/</id>
    <title>Mosaic Image Fusion</title>
    <updated>2021-12-01T00:00:00+00:00</updated>
    <author>
      <name>Volker Hilsenstein, Marvin Albert, and Genevieve Buckley</name>
    </author>
    <content type="html">&lt;section id="executive-summary"&gt;

&lt;p&gt;This blogpost presents a case study in which a researcher uses Dask for mosaic image fusion.
Mosaic image fusion combines multiple smaller images, taken at known locations, by stitching them together into a single image with a very large field of view. Full code examples are available on GitHub in the &lt;a class="reference external" href="https://github.com/VolkerH/DaskFusion"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;DaskFusion&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; repository:
&lt;a class="github reference external" href="https://github.com/VolkerH/DaskFusion"&gt;VolkerH/DaskFusion&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="the-problem"&gt;
&lt;h1&gt;The problem&lt;/h1&gt;
&lt;section id="image-mosaicing-in-microscopy"&gt;
&lt;h2&gt;Image mosaicing in microscopy&lt;/h2&gt;
&lt;p&gt;In optical microscopy, a single field of view captured with a 20x objective typically
has a diagonal on the order of a few hundred μm (exact dimensions depend on other
parts of the optical system, including the size of the camera chip). A typical
sample slide measures 25 mm by 75 mm.
Therefore, imaging a whole slide requires acquiring hundreds of images, typically
with some overlap between individual tiles. As magnification increases,
the required number of images increases accordingly.&lt;/p&gt;
&lt;p&gt;To obtain an overview one has to fuse this large number of individual
image tiles into a large mosaic image. Here, we assume that the information required for
positioning and alignment of the individual image tiles is known. In the example presented here,
this information is available as metadata recorded by the microscope, namely the microscope stage
position and the pixel scale. Alternatively, this
information could also be derived from the image data directly, e.g. through a
registration step that matches corresponding image features in the areas where tiles overlap.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="the-solution"&gt;
&lt;h1&gt;The solution&lt;/h1&gt;
&lt;p&gt;The array holding the resulting mosaic image will often be too large
to fit in RAM, so we use Dask arrays and the &lt;a class="reference external" href="https://docs.dask.org/en/latest/generated/dask.array.map_blocks.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function to enable
out-of-core processing. The &lt;a class="reference external" href="https://docs.dask.org/en/latest/generated/dask.array.map_blocks.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt;&lt;/a&gt;
function processes smaller blocks (a.k.a. chunks) of the output array individually, eliminating the need to
hold the whole output array in memory. If sufficient resources are available, Dask will also distribute the processing of blocks across several workers,
so we also get parallel processing for free, which can help speed up the fusion process.&lt;/p&gt;
&lt;p&gt;Typically whenever we want to join Dask arrays, we use &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-stack.html"&gt;Stack, Concatenate, and Block&lt;/a&gt;. However, these are not good tools for mosaic image fusion, because:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The image tiles will overlap.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tiles may not be positioned on an exact grid and will typically also be slightly rotated, as the alignment of stage and camera is not perfect. In the most general case, for example in panoramic photo mosaics,
individual image tiles could be arbitrarily rotated or skewed.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The starting point for this mosaic prototype was some code that reads in the stage metadata for all tiles and calculates an affine transformation for each tile that places it at the correct location
in the output array.&lt;/p&gt;
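&lt;p&gt;As an illustration of that starting point, the sketch below builds a 3×3 homogeneous affine matrix for one tile from its stage position and the pixel size. Several details are assumptions made for this example: the function name, the (row, column) ordering, the micrometre units, and the optional small camera rotation. The actual DaskFusion code may organise this differently.&lt;/p&gt;

```python
import numpy as np


def tile_affine(stage_pos_um, origin_um, pixel_size_um, angle_rad=0.0):
    """Hypothetical helper: affine mapping tile pixel (row, col) to mosaic
    pixel (row, col), in homogeneous coordinates.

    stage_pos_um:  stage position of the tile's first pixel, in micrometres
    origin_um:     stage position that should become mosaic pixel (0, 0)
    pixel_size_um: size of one pixel in micrometres
    angle_rad:     small rotation between camera and stage axes
    """
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s, 0.0],
                         [s, c, 0.0],
                         [0.0, 0.0, 1.0]])
    translation = np.eye(3)
    offset_um = np.asarray(stage_pos_um, float) - np.asarray(origin_um, float)
    translation[:2, 2] = offset_um / pixel_size_um  # micrometres to pixels
    # Rotate about the tile's first pixel, then shift the tile into place.
    return translation @ rotation


A = tile_affine(stage_pos_um=(150.0, 300.0), origin_um=(100.0, 100.0),
                pixel_size_um=0.5)
```

&lt;p&gt;With no rotation, this tile lands 100 pixels down and 400 pixels across in the mosaic, so tile pixel (2, 3) maps to mosaic pixel (102, 403).&lt;/p&gt;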
&lt;p&gt;The image below shows preliminary work placing mosaic image tiles into the correct positions using the napari image viewer.
Shown here is a small example with 63 image tiles.&lt;/p&gt;
&lt;img src="/images/mosaic-fusion/NapariMosaics.png" alt="Mosaic fusion images in the napari image viewer" width="700" height="265"&gt;
&lt;p&gt;And here is an animation of placing the individual tiles.&lt;/p&gt;
&lt;img src="/images/mosaic-fusion/Lama_whole_slide.gif" alt="Animation of whole slide mosaic fusion images" width="700" height="361"&gt;
&lt;p&gt;To leverage processing with Dask, we created a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fuse&lt;/span&gt;&lt;/code&gt; function that generates a small block of the final mosaic and is invoked by &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; for each chunk of the output array.
On each invocation of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fuse&lt;/span&gt;&lt;/code&gt; function, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; passes it a dictionary (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;block_info&lt;/span&gt;&lt;/code&gt;) describing where that chunk is located. From the &lt;a class="reference external" href="https://docs.dask.org/en/latest/generated/dask.array.map_blocks.html?highlight=block_info#dask.array.map_blocks"&gt;Dask documentation&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Your block function gets information about where it is in the array by accepting a special &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;block_info&lt;/span&gt;&lt;/code&gt; or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;block_id&lt;/span&gt;&lt;/code&gt; keyword argument.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
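&lt;p&gt;Here is a minimal illustration of this mechanism. It is not code from DaskFusion. The hypothetical block function below reads &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;block_info[None]["array-location"]&lt;/span&gt;&lt;/code&gt;, which holds the (start, stop) range of the current output chunk along each axis. It then fills the chunk with a number encoding the chunk's top-left corner. The function name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tag_with_offset&lt;/span&gt;&lt;/code&gt; and the toy array sizes are assumptions made for this sketch.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np
import dask.array as da


def tag_with_offset(block, block_info=None):
    """Fill an output chunk with a number that encodes where the chunk starts.

    block_info[None] describes the output chunk being produced; its
    "array-location" entry holds one (start, stop) pair per axis.
    """
    (row_start, _), (col_start, _) = block_info[None]["array-location"]
    # e.g. a chunk starting at row 2, column 3 is filled with 2003
    return np.full(block.shape, row_start * 1000 + col_start, dtype=np.int64)


arr = da.zeros((4, 6), chunks=(2, 3))
offsets = arr.map_blocks(tag_with_offset, dtype=np.int64).compute()
print(offsets)
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With a 4 × 6 array split into 2 × 3 chunks, each chunk is filled as follows: top-left with 0, top-right with 3, bottom-left with 2000 and bottom-right with 2003. Passing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dtype&lt;/span&gt;&lt;/code&gt; explicitly tells &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; the output type up front.&lt;/p&gt;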
&lt;p&gt;The basic outline of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fuse&lt;/span&gt;&lt;/code&gt; function of the mosaic workflow is as follows.
For each chunk of the output array:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Determine which source image tiles intersect with the chunk.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Adjust the image tiles’ affine transformations to take the offset of the chunk within the array into account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Load all intersecting image tiles and apply their respective adjusted affine transformation to map them into the chunk.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Blend the tiles using a simple maximum projection.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Return the blended chunk.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
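&lt;p&gt;The five steps above can be sketched as a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; function. This is a simplified, hypothetical version, not the DaskFusion implementation, and it makes three assumptions:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Each tile is shifted only by an integer pixel offset, instead of a full affine transformation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The tiles are made-up constant images.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The tiles are kept in a module-level list called &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;TILES&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Steps 1 to 5 are marked in the comments.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np
import dask.array as da

# Hypothetical tiles: ((row, col) offset of the tile's top-left corner in the
# mosaic, pixel data). A real workflow derives a full affine transform per tile
# from the stage metadata; this sketch assumes pure integer translations.
TILES = [
    ((0, 0), np.full((4, 4), 1.0)),
    ((0, 3), np.full((4, 4), 2.0)),
    ((3, 0), np.full((4, 4), 3.0)),
]


def fuse(block, block_info=None):
    """Assemble one chunk of the mosaic from every tile that overlaps it."""
    (ch_r0, ch_r1), (ch_c0, ch_c1) = block_info[None]["array-location"]
    out = np.zeros(block.shape, dtype=block.dtype)
    for (t_r, t_c), tile in TILES:
        # 1. Intersect the tile footprint with this chunk (mosaic coordinates).
        r0, r1 = max(ch_r0, t_r), min(ch_r1, t_r + tile.shape[0])
        c0, c1 = max(ch_c0, t_c), min(ch_c1, t_c + tile.shape[1])
        if not range(r0, r1) or not range(c0, c1):
            continue  # an empty range on either axis means no overlap
        # 2. Re-express the overlap relative to the chunk and to the tile.
        dst = (slice(r0 - ch_r0, r1 - ch_r0), slice(c0 - ch_c0, c1 - ch_c0))
        src = (slice(r0 - t_r, r1 - t_r), slice(c0 - t_c, c1 - t_c))
        # 3 + 4. Copy the tile pixels in, blending overlaps by maximum projection.
        out[dst] = np.maximum(out[dst], tile[src])
    return out  # 5. the blended chunk


mosaic = da.zeros((7, 7), chunks=(3, 3), dtype=float)
fused = mosaic.map_blocks(fuse, dtype=float).compute()
print(fused)
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Where tiles overlap, the maximum projection keeps the largest value. For example, pixel (3, 3) is covered by all three tiles. Pixels not covered by any tile stay at zero. In a real workflow, step 3 would resample each tile with its adjusted affine transformation rather than copying a slice, for instance with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scipy.ndimage.affine_transform&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;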
&lt;p&gt;Using a maximum projection to blend areas with overlapping tiles can lead to artifacts such as ghost images and visible tile
seams, so you would typically want to use something more sophisticated in production.&lt;/p&gt;
&lt;section id="results"&gt;
&lt;h2&gt;Results&lt;/h2&gt;
&lt;p&gt;For datasets with many image tiles (~500-1000 tiles), we could speed up mosaic generation from several hours to tens of minutes using this Dask-based method
(compared to a previous workflow using ImageJ plugins running on the same workstation).
Because Dask can handle data out-of-core and works with chunked array storage such as Zarr, it is also possible to run the
fusion on hardware with limited RAM.&lt;/p&gt;
&lt;p&gt;Here is the final fused mosaic.&lt;/p&gt;
&lt;img src="/images/mosaic-fusion/final-mosaic-fusion-result.png" alt="Final mosaic fusion result" width="700" height="486"&gt;
&lt;/section&gt;
&lt;section id="code"&gt;
&lt;h2&gt;Code&lt;/h2&gt;
&lt;p&gt;Code relating to this mosaic image fusion project can be found in the &lt;a class="reference external" href="https://github.com/VolkerH/DaskFusion"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;DaskFusion&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; GitHub repository here:
&lt;a class="github reference external" href="https://github.com/VolkerH/DaskFusion"&gt;VolkerH/DaskFusion&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There is a self-contained example available in &lt;a class="reference external" href="https://github.com/VolkerH/DaskFusion/blob/main/DaskFusion_Example.ipynb"&gt;this notebook&lt;/a&gt;, which downloads reduced-size example data to demonstrate the process.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="what-s-next"&gt;
&lt;h1&gt;What’s next?&lt;/h1&gt;
&lt;p&gt;Currently, the DaskFusion code is a proof of concept, not production code. It handles single-channel 2D images and uses a simple maximum projection to blend tiles in overlapping areas.
However, the same principle can be used to fuse multi-channel image volumes,
such as light-sheet data, if the tile and chunk intersection calculation is extended to higher-dimensional arrays.
These even larger datasets stand to benefit even more from Dask,
as the processing can be distributed across multiple nodes of an HPC cluster using &lt;a class="reference external" href="http://jobqueue.dask.org/en/latest/"&gt;dask jobqueue&lt;/a&gt;.&lt;/p&gt;
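&lt;p&gt;The tile and chunk intersection test generalizes naturally to more dimensions if both tiles and chunks are described by axis-aligned bounding boxes. The helper below is a hypothetical sketch, not taken from DaskFusion. It intersects two boxes with any number of axes, such as (channel, z, y, x). Chunk boxes could come from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;block_info[None]["array-location"]&lt;/span&gt;&lt;/code&gt;, and tile boxes from the bounding box of each transformed tile. The example coordinates are made up.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
def intersect(box_a, box_b):
    """Overlap of two axis-aligned boxes, each a sequence of (start, stop) per axis.

    Works for any number of dimensions, e.g. (channel, z, y, x); both boxes are
    assumed to have the same number of axes. Returns the overlapping box as a
    list of (start, stop) pairs, or None if the boxes do not overlap on some axis.
    """
    overlap = []
    for (a_start, a_stop), (b_start, b_stop) in zip(box_a, box_b):
        start, stop = max(a_start, b_start), min(a_stop, b_stop)
        if not range(start, stop):  # empty range: no overlap along this axis
            return None
        overlap.append((start, stop))
    return overlap


chunk = [(0, 2), (0, 64), (0, 256), (0, 256)]      # a (c, z, y, x) chunk
tile = [(0, 2), (32, 96), (200, 712), (100, 612)]  # a tile's bounding box
print(intersect(chunk, tile))
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here the chunk and tile overlap in the region [(0, 2), (32, 64), (200, 256), (100, 256)]. Boxes that only touch at an edge, or are disjoint along any axis, give &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;None&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;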
&lt;section id="also-see"&gt;
&lt;h2&gt;Also see&lt;/h2&gt;
&lt;p&gt;Marvin’s lightning talk on multi-view image fusion:
&lt;a class="reference external" href="https://www.youtube.com/watch?v=YIblUvonMvo&amp;amp;list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0&amp;amp;index=10"&gt;15 minute video available here on YouTube&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The GitHub repository &lt;a class="reference external" href="https://github.com/m-albert/MVRegFus"&gt;MVRegFus&lt;/a&gt; that Marvin talks about in the video is available here:
&lt;a class="github reference external" href="https://github.com/m-albert/MVRegFus"&gt;m-albert/MVRegFus&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://github.com/manzt/napari-lazy-openslide"&gt;napari-lazy-openslide&lt;/a&gt; visualization plugin by &lt;a class="reference external" href="https://github.com/manzt"&gt;Trevor Manz&lt;/a&gt;: &lt;em&gt;“An experimental plugin to lazily load multiscale whole-slide tiff images with openslide and dask.”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;For further information on alternative approaches to image stitching:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;ASHLAR: Alignment by Simultaneous Harmonization of Layer / Adjacency Registration&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://labsyspharm.github.io/ashlar/"&gt;ASHLAR homepage&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/labsyspharm/ashlar"&gt;ASHLAR GitHub repository&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://doi.org/10.1101/2021.04.20.440625"&gt;ASHLAR biorxiv pre-print&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Microscopy Image Stitching Tool (MIST)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://pages.nist.gov/MIST/"&gt;MIST homepage&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/usnistgov/MIST"&gt;MIST GitHub repository&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://raw.githubusercontent.com/wiki/USNISTGOV/MIST/assets/mist-algorithm-documentation.pdf"&gt;MIST algorithm documentation (PDF)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a class="reference external" href="https://github.com/yfukai/m2stitch"&gt;m2stitch&lt;/a&gt; python package by &lt;a class="reference external" href="https://github.com/yfukai"&gt;Yohsuke T. Fukai&lt;/a&gt;: &lt;em&gt;“Provides robust stitching of tiled microscope images on a regular grid”&lt;/em&gt; (based on the MIST algorithm)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="acknowledgements"&gt;
&lt;h1&gt;Acknowledgements&lt;/h1&gt;
&lt;p&gt;This computational work was done by Volker Hilsenstein, in conjunction with Marvin Albert.
Volker Hilsenstein is a scientific software developer at &lt;a class="reference external" href="https://www.embl.org/groups/alexandrov/"&gt;EMBL in Theodore Alexandrov’s lab&lt;/a&gt; with a focus on spatial metabolomics and bio-image analysis.&lt;/p&gt;
&lt;p&gt;The sample images were prepared and imaged by Mohammed Shahraz from the Alexandrov lab at EMBL Heidelberg.&lt;/p&gt;
&lt;p&gt;Genevieve Buckley and Volker Hilsenstein wrote this blogpost.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/12/01/mosaic-fusion/"/>
    <category term="imageanalysis" label="image analysis"/>
    <category term="lifescience" label="life science"/>
    <published>2021-12-01T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes/</id>
    <title>Choosing good chunk sizes in Dask</title>
    <updated>2021-11-02T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">
&lt;section id="summary"&gt;

&lt;p&gt;Confused about choosing &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-best-practices.html#select-a-good-chunk-size"&gt;a good chunk size&lt;/a&gt; for Dask arrays?&lt;/p&gt;
&lt;p&gt;Array chunks can’t be too big (we’ll run out of memory), or too small (the overhead introduced by Dask becomes overwhelming). So how can we get it right?&lt;/p&gt;
&lt;p&gt;It’s a two-step process:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;First, start by choosing a chunk size similar to data you know can be processed entirely within memory (i.e. without Dask), using these &lt;a class="reference internal" href="#rough-rules-of-thumb"&gt;&lt;span class="xref myst"&gt;rough rules of thumb&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then, watch the Dask dashboard task stream and worker memory plots, and adjust if needed. &lt;a class="reference internal" href="#what-to-watch-for-on-the-dashboard"&gt;&lt;span class="xref myst"&gt;Here are the signs to watch out for&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="contents"&gt;
&lt;h1&gt;Contents&lt;/h1&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#what-are-dask-array-chunks"&gt;&lt;span class="xref myst"&gt;What are Dask array chunks?&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#too-small-is-a-problem"&gt;&lt;span class="xref myst"&gt;Too small is a problem&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#too-big-is-also-a-problem"&gt;&lt;span class="xref myst"&gt;Too big is also a problem&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#choosing-an-initial-chunk-size"&gt;&lt;span class="xref myst"&gt;Choosing an initial chunk size&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#rough-rules-of-thumb"&gt;&lt;span class="xref myst"&gt;Rough rules of thumb&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#chunks-should-be-aligned-with-array-storage-on-disk"&gt;&lt;span class="xref myst"&gt;Chunks should be aligned with array storage on disk&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#using-the-dask-dashboard"&gt;&lt;span class="xref myst"&gt;Using the Dask dashboard&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#what-to-watch-for-on-the-dashboard"&gt;&lt;span class="xref myst"&gt;What to watch for on the dashboard&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#rechunking-arrays"&gt;&lt;span class="xref myst"&gt;Rechunking arrays&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#unmanaged-memory"&gt;&lt;span class="xref myst"&gt;Unmanaged memory&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#thanks-for-reading"&gt;&lt;span class="xref myst"&gt;Thanks for reading&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="what-are-dask-array-chunks"&gt;
&lt;h1&gt;What are Dask array chunks?&lt;/h1&gt;
&lt;p&gt;Dask arrays are big structures, made out of many small chunks.
Typically, each small chunk is an individual &lt;a class="reference external" href="https://numpy.org/"&gt;numpy array&lt;/a&gt;, and they are arranged together to make a much larger Dask array.&lt;/p&gt;
&lt;img src="https://raw.githubusercontent.com/dask/dask/ac01ddc9074365e40d888f80f5bcd955ba01e872/docs/source/images/dask-array-black-text.svg" alt="Diagram: Dask array chunks" width="400" height="300" /&gt;
&lt;p&gt;You can find more information about Dask array chunks on this page of the documentation: &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-chunks.html"&gt;https://docs.dask.org/en/latest/array-chunks.html&lt;/a&gt;&lt;/p&gt;
&lt;section id="how-do-i-know-what-chunks-my-array-has"&gt;
&lt;h2&gt;How do I know what chunks my array has?&lt;/h2&gt;
&lt;p&gt;If you have a Dask array, you can use the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;chunksize&lt;/span&gt;&lt;/code&gt; or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;chunks&lt;/span&gt;&lt;/code&gt; attributes to see information about the chunks. You can also visualize this with the Dask array HTML representation.&lt;/p&gt;
&lt;img src="/images/choosing-good-chunk-sizes/examine-dask-array-chunks.png" alt="Visualizing Dask array chunks with the HTML repr" width="611" height="523" /&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;arr.chunksize&lt;/span&gt;&lt;/code&gt; shows the largest chunk size. For arrays where you expect roughly uniform chunk sizes, this is a good way to summarize chunk size information.&lt;/p&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;arr.chunks&lt;/span&gt;&lt;/code&gt; shows fully explicit sizes of all chunks along all dimensions within the Dask array (see &lt;a class="reference external" href="https://docs.dask.org/en/stable/array-chunks.html#specifying-chunk-shapes"&gt;item 3 here&lt;/a&gt;). This is more verbose, and is a good choice with arrays that have irregular chunks.&lt;/p&gt;
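&lt;p&gt;For example, here is a small array whose shape does not divide evenly into its chunk shape, so the chunks along the edges are smaller. The array size and chunk shape are arbitrary choices for illustration.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import dask.array as da

# A 10 x 7 array split into chunks of at most 4 x 3; the edge chunks are smaller.
arr = da.ones((10, 7), chunks=(4, 3))

print(arr.chunksize)  # (4, 3): the largest chunk shape
print(arr.chunks)     # ((4, 4, 2), (3, 3, 1)): every chunk size along each axis
print(arr.numblocks)  # (3, 3): the number of chunks along each axis
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;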
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="too-small-is-a-problem"&gt;
&lt;h1&gt;Too small is a problem&lt;/h1&gt;
&lt;p&gt;If array chunks are too small, it’s inefficient. Why is this?&lt;/p&gt;
&lt;p&gt;Using Dask introduces some amount of overhead for each task in your computation.
This overhead is why the Dask best practices advise you to &lt;a class="reference external" href="https://docs.dask.org/en/latest/best-practices.html#avoid-very-large-graphs"&gt;avoid too-large graphs&lt;/a&gt;.
If each task does only a tiny amount of actual work, overhead makes up a large share of the total time compared with useful work.&lt;/p&gt;
&lt;p&gt;Typically, the Dask scheduler takes about 1 millisecond to coordinate a single task. That means we want the computation time for each task to be comparatively larger, e.g. seconds instead of milliseconds.&lt;/p&gt;
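&lt;p&gt;A rough back-of-the-envelope model makes this concrete. It rests on two simplifications: a fixed cost of about 1 ms per task, and no overlap between scheduling and computation. Under those assumptions, the share of time spent on overhead is overhead / (overhead + work).&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
SCHEDULER_OVERHEAD_S = 0.001  # ~1 ms of scheduling per task (rough figure)


def overhead_fraction(task_seconds):
    """Share of each task's total time spent on scheduling, not useful work."""
    return SCHEDULER_OVERHEAD_S / (SCHEDULER_OVERHEAD_S + task_seconds)


for task_seconds in (0.001, 0.1, 1.0):
    print(f"{task_seconds} s of work per task: {overhead_fraction(task_seconds):.1%} overhead")
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Under this model, overhead takes about 50% of the time for tasks doing 1 ms of work, about 1% for tasks doing 0.1 s, and about 0.1% for tasks doing 1 s.&lt;/p&gt;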
&lt;p&gt;It might be hard to understand this intuitively, so here’s an analogy. Let’s imagine we’re building a house. It’s a pretty big job, and if there were only one worker it would take much too long to build.
So we have a team of workers and a site foreman. The site foreman is equivalent to the Dask scheduler: their job is to tell the workers what tasks they need to do.&lt;/p&gt;
&lt;p&gt;Say we have a big pile of bricks to build a wall, sitting in the corner of the building site.
If the foreman (the Dask scheduler) tells workers to go and fetch a single brick at a time, then bring each one to where the wall is being built, you can see how this is going to be very slow and inefficient! The workers are spending most of their time moving between the wall and the pile of bricks. Much less time is going towards doing the actual work of mortaring bricks onto the wall.&lt;/p&gt;
&lt;p&gt;Instead, we can do this in a smarter way. The foreman (Dask scheduler) can tell the workers to go and bring one full wheelbarrow load of bricks back each time. Now workers are spending much less time moving between the wall and the pile of bricks, and the wall will be finished much quicker.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="too-big-is-also-a-problem"&gt;
&lt;h1&gt;Too big is also a problem&lt;/h1&gt;
&lt;p&gt;Chunks that are too big are also a problem, because you are likely to run out of working memory.
You may see out-of-memory errors, or you might see performance decrease substantially as data spills to disk.&lt;/p&gt;
&lt;p&gt;When too much data is loaded into memory on too few workers, Dask will try to spill data to disk instead of crashing.
Spilling data to disk makes things run very slowly, because of all the extra read/write operations. Things don’t just get a little bit slower, they get a LOT slower, so it’s smart to watch out for this.&lt;/p&gt;
&lt;p&gt;To spot this, look at the &lt;strong&gt;worker memory plot&lt;/strong&gt; on the Dask dashboard.
Orange bars warn that you are close to the memory limit, and grey bars mean data is being spilled to disk (not good!).
For more tips, see the section on &lt;a class="reference internal" href="#using-the-dask-dashboard"&gt;&lt;span class="xref myst"&gt;using the Dask dashboard&lt;/span&gt;&lt;/a&gt; below.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="choosing-an-initial-chunk-size"&gt;
&lt;h1&gt;Choosing an initial chunk size&lt;/h1&gt;
&lt;section id="rough-rules-of-thumb"&gt;
&lt;h2&gt;Rough rules of thumb&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;If you have already built a prototype on a small subset of the data you intend to process (which may not involve Dask at all), you’ll have a clear idea of what size of data this workflow can process easily. You can use this knowledge to choose similar-sized chunks in Dask.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some people have observed that chunk sizes below 1MB are almost always bad. Chunk sizes between 100MB and 1GB are generally good. Going over 1 or 2GB only makes sense with a really big dataset and/or a lot of memory available per core.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upper bound: avoid very large task graphs. Somewhere between 10,000 and 100,000 chunks, performance may start to degrade.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lower bound: To get the advantage of parallelization, you need the number of chunks to at least equal the number of worker cores available (or better, the number of worker cores times 2). Otherwise, some workers will stay idle.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The time taken to compute each task should be much larger than the time needed to schedule the task. The Dask scheduler takes roughly 1 millisecond to coordinate a single task, so a good task computation time would be measured in seconds (not milliseconds).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
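&lt;p&gt;The rules of thumb above can be turned into a quick sanity check. The function below is a heuristic sketch, not part of Dask. Three things in it are assumptions:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;chunk_report&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The 100,000-chunk cut-off, taken from the upper end of the range above.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The core count you pass in.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The comparisons use &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;operator.lt&lt;/span&gt;&lt;/code&gt; ("less than") and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;operator.ge&lt;/span&gt;&lt;/code&gt; ("greater than or equal").&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
from operator import ge, lt

import dask.array as da
import numpy as np

MB = 10**6


def chunk_report(arr, n_cores):
    """Check a Dask array against the rough rules of thumb (heuristics only).

    Returns (largest chunk size in bytes, number of chunks, list of issues).
    lt(a, b) means "a is less than b"; ge(a, b) means "a is at least b".
    """
    chunk_bytes = int(np.prod(arr.chunksize)) * arr.dtype.itemsize
    n_chunks = arr.npartitions
    issues = []
    if lt(chunk_bytes, 1 * MB):
        issues.append("chunks under 1MB: scheduling overhead likely dominates")
    if ge(n_chunks, 100_000):
        issues.append("100,000+ chunks: the task graph may be too large")
    if lt(n_chunks, 2 * n_cores):
        issues.append("under 2 chunks per core: some workers may sit idle")
    return chunk_bytes, n_chunks, issues


big = da.zeros((10_000, 20_000), chunks=(2_500, 5_000))  # float64: 100MB chunks
tiny = da.zeros((1_000, 1_000), chunks=(10, 10))  # float64: 800-byte chunks
print(chunk_report(big, n_cores=4))
print(chunk_report(tiny, n_cores=4))
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first array has 16 chunks of 100MB each, so no rule is triggered on 4 cores. The second has 800-byte chunks and is flagged. Dask can also pick chunk sizes for you with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;chunks="auto"&lt;/span&gt;&lt;/code&gt;, which targets the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;array.chunk-size&lt;/span&gt;&lt;/code&gt; configuration value.&lt;/p&gt;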
&lt;/section&gt;
&lt;section id="chunks-should-be-aligned-with-array-storage-on-disk"&gt;
&lt;h2&gt;Chunks should be aligned with array storage on disk&lt;/h2&gt;
&lt;p&gt;If you are reading data from disk, the storage structure will inform what shape your Dask array chunks should be. For best performance, choose chunks that are well aligned with the way data is stored.&lt;/p&gt;
&lt;p&gt;From the Dask best practices on how to &lt;a class="reference external" href="https://docs.dask.org/en/stable/array-best-practices.html#orient-your-chunks"&gt;orient your chunks&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;When reading data you should align your chunks with your storage format. Most array storage formats store data in chunks themselves. If your Dask array chunks aren’t multiples of these chunk shapes then you will have to read the same data repeatedly, which can be expensive. Note though that often storage formats choose chunk sizes that are much smaller than is ideal for Dask, closer to 1MB than 100MB. In these cases you should choose a Dask chunk size that aligns with the storage chunk size and that every Dask chunk dimension is a multiple of the storage chunk dimension.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Some examples of data storage structures on disk include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;An HDF5 or &lt;a class="reference external" href="https://zarr.readthedocs.io/en/stable/api/core.html"&gt;Zarr array&lt;/a&gt;. The size and shape of chunks/blocks stored on disk should align well with the Dask array chunks you select.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A folder full of tiff files. You might decide that each tiff file should become a single chunk in the Dask array (or that multiple tiff files should be grouped into a single chunk).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
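&lt;p&gt;One way to check alignment is to confirm that every internal Dask chunk boundary falls on a storage chunk boundary. Equivalently, each Dask chunk dimension must be a multiple of the storage chunk dimension, apart from the final edge chunk. The helper &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;is_aligned&lt;/span&gt;&lt;/code&gt; below is a hypothetical sketch. Its &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;storage_chunks&lt;/span&gt;&lt;/code&gt; argument stands in for the chunk shape of your on-disk array; for a Zarr array, that is its &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.chunks&lt;/span&gt;&lt;/code&gt; attribute.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
from itertools import accumulate

import dask.array as da


def is_aligned(dask_chunks, storage_chunks):
    """Return True if every internal Dask chunk boundary is a storage chunk boundary.

    dask_chunks: the .chunks attribute of a Dask array (a tuple of tuples).
    storage_chunks: the chunk shape of the on-disk array, one int per axis.
    """
    for axis_chunks, step in zip(dask_chunks, storage_chunks):
        # Offsets where one Dask chunk ends and the next begins (drop the array end).
        boundaries = list(accumulate(axis_chunks))[:-1]
        if any(offset % step for offset in boundaries):
            return False
    return True


arr = da.zeros((1_000, 1_000), chunks=(250, 500))
print(is_aligned(arr.chunks, (125, 125)))  # True: 250 and 500 are multiples of 125
print(is_aligned(arr.chunks, (100, 100)))  # False: a boundary at 250 splits a stored chunk
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When reading Zarr data, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.array.from_zarr&lt;/span&gt;&lt;/code&gt; uses the stored chunk shape by default, which is aligned by construction. A check like this is mainly useful when you choose larger Dask chunks yourself.&lt;/p&gt;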
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="using-the-dask-dashboard"&gt;
&lt;h1&gt;Using the Dask dashboard&lt;/h1&gt;
&lt;p&gt;The second part of choosing a good chunk size is monitoring the Dask dashboard to see if you need to make any adjustments.&lt;/p&gt;
&lt;p&gt;If you’re not very familiar with the Dask dashboard, or you just sometimes forget where to find certain dashboard plots (like the worker memory plot), then you’ll probably enjoy these quick video tutorials:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.youtube.com/watch?v=N_GqzcuGLCY"&gt;Intro to the Dask dashboard (18 minute video)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.youtube.com/watch?v=EX_voquHdk0"&gt;Dask Jupyterlab extension (6 minute video)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://distributed.dask.org/en/latest/diagnosing-performance.html"&gt;Dask dashboard documentation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We recommend always having the dashboard up when you’re working with Dask.
It’s a fantastic way to get a sense of what’s working well, or poorly, so you can make adjustments.&lt;/p&gt;
&lt;section id="what-to-watch-for-on-the-dashboard"&gt;
&lt;h2&gt;What to watch for on the dashboard&lt;/h2&gt;
&lt;p&gt;Bad signs to watch out for include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Lots of white space in the task stream plot is a bad sign. White space means nothing is happening. Chunks may be too small.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lots and lots of red in the task stream plot is a bad sign. Red means worker communication. Dask workers need some communication, but if they are doing almost nothing except communication then there is not much productive work going on.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On the worker memory plot, watch out for orange bars which are a sign you are getting close to the memory limit. Chunks may be too big.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On the worker memory plot, watch out for grey bars which mean data is being spilled to disk. Chunks may be too big.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here is an example of the Dask dashboard during a good computation (&lt;a class="reference external" href="https://youtu.be/N_GqzcuGLCY?t=372"&gt;time 6:12 in this video&lt;/a&gt;).
&lt;img alt="Dask dashboard during a good computation" src="https://blog.dask.org/_images/good-dask-dashboard.png" /&gt;&lt;/p&gt;
&lt;p&gt;For comparison, here is an example of the Dask dashboard during a bad computation (&lt;a class="reference external" href="https://youtu.be/N_GqzcuGLCY?t=417"&gt;time 6:57 in this video&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;This computation is inefficient because the chunks are much too small, so we see a lot of white space and red worker communication in the task stream plot.
&lt;img alt="Dask dashboard during a bad computation, zoomed in on the task stream" src="https://blog.dask.org/_images/bad-dask-dashboard-zoomedin.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="rechunking-arrays"&gt;
&lt;h1&gt;Rechunking arrays&lt;/h1&gt;
&lt;p&gt;If you need to change the chunking of a Dask array in the middle of a computation, you can do that with the &lt;a class="reference external" href="https://docs.dask.org/en/latest/generated/dask.array.rechunk.html"&gt;rechunk&lt;/a&gt; method.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;rechunked_array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;original_array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rechunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
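&lt;p&gt;Here is a self-contained version of that call. The array size and the two chunk shapes are arbitrary choices for illustration. The original chunks span whole rows, while the rechunked array's chunks span whole columns, so each new chunk needs data from every old chunk. Comparing the number of tasks before and after shows the extra work that rechunking adds to the graph.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import dask.array as da

# Chunks spanning whole rows, e.g. matching how the data might be stored on disk.
original_array = da.ones((1_000, 1_000), chunks=(10, 1_000))

# Chunks spanning whole columns: every new chunk needs data from every old chunk.
rechunked_array = original_array.rechunk((1_000, 10))
print(original_array.chunksize, rechunked_array.chunksize)

# Rechunking adds tasks to the graph, which all have to run at compute time.
tasks_before = len(dict(original_array.__dask_graph__()))
tasks_after = len(dict(rechunked_array.__dask_graph__()))
extra_tasks = tasks_after - tasks_before
print(extra_tasks)
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;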
&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; Rechunking Dask arrays comes at a cost.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The Dask graph must be rearranged to accommodate the new chunk structure. This happens immediately, and will block any other interaction with Python until Dask has rearranged the task graph.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This also inserts new tasks into the Dask graph. At compute time, there are now more tasks to execute.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For these reasons, it is best to choose a good initial chunk size and avoid rechunking.&lt;/p&gt;
&lt;p&gt;However, sometimes the way data is stored on disk is poorly aligned with the chunks your computation needs, and rechunking may be necessary.
For an example of this, here is Draga Doncila Pop &lt;a class="reference external" href="https://youtu.be/10Ws59NGDaE?t=833"&gt;talking about chunk alignment&lt;/a&gt; with satellite image data.&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://rechunker.readthedocs.io/en/latest/"&gt;rechunker&lt;/a&gt; library can be useful in these situations:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Rechunker takes an input array (or group of arrays) stored in a persistent storage device (such as a filesystem or a cloud storage bucket) and writes out an array (or group of arrays) with the same data, but different chunking scheme, to a new location. Rechunker is designed to be used within a parallel execution framework such as Dask.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="unmanaged-memory"&gt;
&lt;h1&gt;Unmanaged memory&lt;/h1&gt;
&lt;p&gt;Last, remember that you need to consider not only the size of the array chunks in memory, but also the working memory consumed by your analysis functions. This is sometimes called “unmanaged memory” in Dask.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;“Unmanaged memory is RAM that the Dask scheduler is not directly aware of and which can cause workers to run out of memory and cause computations to hang and crash.” – Guido Imperiale&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Here are some tips for handling unmanaged memory:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://coiled.io/blog/tackling-unmanaged-memory-with-dask/"&gt;Tackling unmanaged memory with Dask (Coiled blogpost)&lt;/a&gt; by Guido Imperiale&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://youtu.be/nwR6iGR0mb0"&gt;Handle Unmanaged Memory in Dask (8 minute video)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="thanks-for-reading"&gt;
&lt;h1&gt;Thanks for reading&lt;/h1&gt;
&lt;p&gt;We hope this was helpful figuring out how to choose good chunk sizes for Dask. This blogpost was inspired by &lt;a class="reference external" href="https://twitter.com/DataNerdery/status/1424953376043790341"&gt;this twitter thread&lt;/a&gt;. If you’d like to follow Dask on Twitter, you can do that at &lt;a class="reference external" href="https://twitter.com/dask_dev"&gt;https://twitter.com/dask_dev&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes/"/>
    <category term="performance" label="performance"/>
    <published>2021-11-02T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/10/20/czi-eoss-update/</id>
    <title>CZI EOSS Update</title>
    <updated>2021-10-20T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">&lt;p&gt;Dask was awarded funding last year in round 2 of the &lt;a class="reference external" href="https://chanzuckerberg.com/eoss/proposals/"&gt;CZI Essential Open Source Software&lt;/a&gt; grant program.
That funding was used to hire &lt;a class="reference external" href="https://github.com/GenevieveBuckley/"&gt;Genevieve Buckley&lt;/a&gt; to work on Dask with a focus on &lt;a class="reference external" href="https://blog.dask.org/2021/03/04/the-life-science-community"&gt;life sciences&lt;/a&gt;.
Last month Dask submitted an interim progress report to CZI, covering the period from February to September 2021.
That progress update is published verbatim below, to share with the wider Dask community.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;section id="progress-overview"&gt;

&lt;section id="brief-summary"&gt;
&lt;h2&gt;Brief summary&lt;/h2&gt;
&lt;p&gt;The scope of work performed by the Dask fellow includes code contributions, conference presentations and tutorials, community engagement, and outreach including blogposts.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;The primary deliverable of this proposal is consistency and the success of neighboring software
projects&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Project work to date includes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;38 pull requests merged (plus 6 draft pull requests) across 5 different repositories.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;3 conferences (presentations and organising of specialist workshops)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 half-day workshop (plus another one upcoming)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Student supervision for Dask’s Google Summer of Code project&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;9 blogposts (plus 2 drafts for upcoming publication)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="code-contributions"&gt;
&lt;h2&gt;Code contributions&lt;/h2&gt;
&lt;p&gt;Code contributions are not limited to the main Dask repository; they also extend to neighbouring software projects that use Dask (like the &lt;a class="reference external" href="https://napari.org/"&gt;napari&lt;/a&gt; software project). The repositories involved are: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-examples&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;napari&lt;/span&gt;&lt;/code&gt;, &amp;amp; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;napari.github.io&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To date, across the five repositories named above, the Dask fellow has contributed:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;38 merged pull requests&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;6 draft pull requests&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;12 closed pull requests (not merged, discarded in favour of another approach)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Dask fellow is an official maintainer of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt; project, and additional milestones achieved for that project include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The maintainer team has grown by one (we welcome Marvin Albert to our ranks)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2 new dask-image releases in 2020&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="code-contribution-highlights"&gt;
&lt;h2&gt;Code contribution highlights&lt;/h2&gt;
&lt;p&gt;Highlights include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Bugfixes benefitting the broader community&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/7391"&gt;dask PR #7391&lt;/a&gt;: This PR fixed slicing the output from Dask’s bincount function. The impact of this fix was substantial, as it solved issues filed in four separate projects: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scikit-image&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-ml&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;xgcm/xhistogram&lt;/span&gt;&lt;/code&gt; and the cupy dask tests.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Expanded GPU support&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/6680"&gt;dask PR #6680&lt;/a&gt;: This PR provided support for different array types in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;*_like&lt;/span&gt;&lt;/code&gt; array creation functions. Now users can create &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cupy&lt;/span&gt;&lt;/code&gt; like Dask arrays for GPU processing, or indeed any other array type (eg: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sparse&lt;/span&gt;&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-image/pull/157"&gt;dask-image PR #157&lt;/a&gt;: This PR provided GPU support for binary morphological functions in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt; project.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visualization tools benefitting all Dask users&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/7716"&gt;dask PR #7716&lt;/a&gt;: This PR automatically displays the high level graph visualization in the jupyter notebook cell output (somthing already done automatically for low level graphs).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/7763"&gt;dask PR #7763&lt;/a&gt;: This PR introduced a HTML representation for Dask &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HighLevelGraph&lt;/span&gt;&lt;/code&gt; objects. This allows users and developers a much easier way to inspect the structure and status of HighLevelGraphs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Further developed on during the Dask Google Summer of Code project, full report available &lt;a class="reference external" href="https://blog.dask.org/2021/08/23/gsoc-2021-project"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;High Level Graphs&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/7595"&gt;dask PR #7595&lt;/a&gt;: This PR introduced a high level graph layer for array overlaps. High level graphs are a tool we can use to optimize Dask’s performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/7655"&gt;dask PR #7655&lt;/a&gt; (ongoing): This PR introduces a high level graph for Dask array slicing operations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Memory improvements (ongoing)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/8124"&gt;dask PR #8124&lt;/a&gt; (ongoing): This PR investigates improved automatic rechunking strategies for &lt;a class="reference external" href="https://github.com/dask/dask/issues/8110"&gt;memory problems&lt;/a&gt; caused by reshaping Dask arrays.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/7950"&gt;dask PR #7950&lt;/a&gt; (ongoing): This PR aims to improve memory and performance of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tensordot&lt;/span&gt;&lt;/code&gt; function with auto-rechunking of Dask arrays.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/pull/7980"&gt;dask PR #7980&lt;/a&gt; (ongoing): This PR aims to fix the unbounded memory use problem in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tensordot&lt;/span&gt;&lt;/code&gt;, reported &lt;a class="reference external" href="https://github.com/dask/dask/issues/6916"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
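&lt;p&gt;As a minimal sketch of the kind of operation &lt;a class="reference external" href="https://github.com/dask/dask/pull/7391"&gt;dask PR #7391&lt;/a&gt; concerned, the snippet below counts values with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;da.bincount&lt;/span&gt;&lt;/code&gt; and then slices the lazy result, checking it against NumPy. It assumes a recent Dask release. Passing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;minlength&lt;/span&gt;&lt;/code&gt; gives the output a known length, so its chunks are known and it can be sliced.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np
import dask.array as da

# Integer data split into three chunks of two elements each
values = np.array([0, 1, 1, 3, 2, 1])
x = da.from_array(values, chunks=2)

# minlength fixes the output length (4 bins), so the result has known chunks
counts = da.bincount(x, minlength=4)

# Slicing the lazy result is the operation the bugfix concerned
middle = counts[1:3].compute()

# Reference answer computed eagerly with NumPy
expected = np.bincount(values, minlength=4)[1:3]
print(middle)  # [3 1]
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;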
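&lt;p&gt;Overlapping computations like the ones targeted by &lt;a class="reference external" href="https://github.com/dask/dask/pull/7595"&gt;dask PR #7595&lt;/a&gt; are usually written with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt;. The sketch below computes &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;np.gradient&lt;/span&gt;&lt;/code&gt; across chunk boundaries, with each chunk borrowing one element from each neighbour. It then lists the layers of the resulting HighLevelGraph, which is the structure the HTML representation from &lt;a class="reference external" href="https://github.com/dask/dask/pull/7763"&gt;dask PR #7763&lt;/a&gt; summarises. The data here is made up, and the exact layer names vary between Dask versions.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np
import dask.array as da

# Made-up example data: squares of 0..9, split into two chunks of five
data = np.arange(10.0) ** 2
x = da.from_array(data, chunks=5)

# depth=1 shares one element with each neighbouring chunk, so central
# differences at chunk edges see the values they need.
# boundary="none" adds no padding at the outer edges of the whole array.
grad = x.map_overlap(np.gradient, depth=1, boundary="none", dtype=x.dtype)

result = grad.compute()
print(np.allclose(result, np.gradient(data)))  # True

# The lazy result is backed by a HighLevelGraph made of named layers
for name in grad.dask.layers:
    print(name)
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Without the overlap, each chunk would compute one-sided differences at its internal edges, and the result would differ from the NumPy answer at indices 4 and 5.&lt;/p&gt;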
&lt;/section&gt;
&lt;section id="conferences"&gt;
&lt;h2&gt;Conferences&lt;/h2&gt;
&lt;p&gt;Notable conference events in 2021 included the SciPy conference, the Dask Summit, and VIS2021.&lt;/p&gt;
&lt;section id="scipy-conference"&gt;
&lt;h3&gt;SciPy conference&lt;/h3&gt;
&lt;p&gt;The Dask fellow presented a talk titled &lt;em&gt;“Scaling Science: leveraging Dask for life sciences”&lt;/em&gt; at the 2021 SciPy conference. Full recording &lt;a class="reference external" href="https://www.youtube.com/watch?v=tY_lCGS1BMk&amp;amp;amp;t=60s"&gt;available here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="dask-summit"&gt;
&lt;h3&gt;Dask Summit&lt;/h3&gt;
&lt;p&gt;The Dask fellow organised two workshops at the 2021 &lt;a class="reference external" href="https://summit.dask.org/"&gt;Dask Summit&lt;/a&gt;:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Dask Down Under (co-organised with Nick Mortimer), and&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Dask life science workshop&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;section id="dask-down-under"&gt;
&lt;h4&gt;Dask Down Under&lt;/h4&gt;
&lt;p&gt;Dask Down Under was scoped as a mini-conference for Australian timezones rather than a typical workshop. It ran over two days and included:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;5 talks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2 tutorials&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 panel discussion&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 meet and greet networking event&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It was very well received by the community. A full report on the Dask Down Under events is available &lt;a class="reference external" href="https://blog.dask.org/2021/06/25/dask-down-under"&gt;here&lt;/a&gt;. A YouTube playlist of the Dask Down Under events is available &lt;a class="reference external" href="https://www.youtube.com/watch?v=10Ws59NGDaE&amp;amp;amp;list=PLJ0vO2F_f6OAXBfb_SAF2EbJve9k1vkQX"&gt;here on the Dask YouTube channel&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="dask-life-science-workshop"&gt;
&lt;h4&gt;Dask life science workshop&lt;/h4&gt;
&lt;p&gt;The Dask life science workshop involved:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;15 pre-recorded lightning talks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;3 interactive discussion sessions (accessible across timezones in Europe, Oceania, and the Americas)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Asynchronous text chat throughout the Dask Summit&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A full report on the Dask life science workshop is available &lt;a class="reference external" href="https://blog.dask.org/2021/05/24/life-science-summit-workshop"&gt;here&lt;/a&gt;. A YouTube playlist of all the Dask life science lightning talks is available &lt;a class="reference external" href="https://www.youtube.com/watch?v=6PerbQhcupM&amp;amp;amp;list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0"&gt;here on the Dask YouTube channel&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="vis2021-symposium"&gt;
&lt;h4&gt;VIS2021 symposium&lt;/h4&gt;
&lt;p&gt;The Dask fellow was an invited panellist at the &lt;a class="reference external" href="https://www.vis2021.com.au/"&gt;VIS2021 symposium&lt;/a&gt; in February 2021. The “Problem Solver” panel discussion covered practical problems in image analysis and how tools like Dask and napari can help solve them.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="tutorials-and-workshops"&gt;
&lt;h2&gt;Tutorials and workshops&lt;/h2&gt;
&lt;p&gt;The Dask fellow co-presented a half-day workshop (five hours) on bioimage analysis in Python with Juan Nunez-Iglesias at the 2021 &lt;a class="reference external" href="https://www.lmameeting.com.au/"&gt;Light Microscopy Australia Meeting&lt;/a&gt;. The workshop featured &lt;a class="reference external" href="https://napari.org/"&gt;napari&lt;/a&gt;, an open source multidimensional image viewer that supports Dask arrays, so images too large for memory can be explored lazily. Workshop content is available at this link: &lt;a class="github reference external" href="https://github.com/jni/lma-2021-bioimage-analysis-python/"&gt;jni/lma-2021-bioimage-analysis-python&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Upcoming workshop:&lt;/strong&gt;
The Dask fellow has been invited to deliver a workshop on &lt;a class="reference external" href="https://napari.org/"&gt;napari&lt;/a&gt; and big data using &lt;a class="reference external" href="https://dask.org/"&gt;Dask&lt;/a&gt; at an upcoming &lt;a class="reference external" href="http://eubias.org/NEUBIAS/training-schools/neubias-academy-home/"&gt;NEUBIAS Academy&lt;/a&gt;. Workshop content is available at this link: &lt;a class="github reference external" href="https://github.com/GenevieveBuckley/napari-big-data-training"&gt;GenevieveBuckley/napari-big-data-training&lt;/a&gt;&lt;/p&gt;
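&lt;p&gt;Big-data image workflows like these often build a lazy Dask array from a stack of image files, so that a viewer such as napari only loads the planes it needs. The sketch below shows the pattern with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.delayed&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;da.from_delayed&lt;/span&gt;&lt;/code&gt;. It is not taken from the workshop material: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fake_imread&lt;/span&gt;&lt;/code&gt; is a hypothetical stand-in for a real image reader, and it assumes every image has the same, known shape and dtype.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np
import dask
import dask.array as da

SHAPE = (64, 64)  # assumed: every image in the stack has this shape
DTYPE = np.uint16  # assumed: every image has this dtype


def fake_imread(index):
    # Hypothetical reader: a real workflow would read a file from disk here
    return np.full(SHAPE, index, dtype=DTYPE)


# Wrap each read in dask.delayed so nothing is loaded yet
lazy_reads = [dask.delayed(fake_imread)(i) for i in range(10)]

# Turn each delayed read into a single-chunk Dask array, then stack them
planes = [da.from_delayed(r, shape=SHAPE, dtype=DTYPE) for r in lazy_reads]
stack = da.stack(planes, axis=0)

print(stack.shape)  # (10, 64, 64)

# Slicing before computing means only the needed plane is read
plane_3 = stack[3].compute()
print(plane_3.max())  # 3
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because each file becomes its own chunk, scrolling through the stack in a viewer triggers one read per displayed plane rather than loading the whole dataset up front.&lt;/p&gt;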
&lt;/section&gt;
&lt;section id="google-summer-of-code"&gt;
&lt;h2&gt;Google Summer of Code&lt;/h2&gt;
&lt;p&gt;The Dask fellow supervised a Google Summer of Code student in 2021, with Martin Durant acting as a secondary supervisor. The project ran over a three month period and involved implementing a number of features to improve visualization of Dask graphs and objects. A full report on the Dask GSoC project is available &lt;a class="reference external" href="https://blog.dask.org/2021/08/23/gsoc-2021-project"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="blogposts"&gt;
&lt;h2&gt;Blogposts&lt;/h2&gt;
&lt;p&gt;We set a goal of one blogpost per month, and exceeded it. To date, nine blogposts have been published by the Dask fellow, with another two currently in draft status.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/03/04/the-life-science-community"&gt;Getting to know the life science community&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/03/29/apply-pretrained-pytorch-model"&gt;Dask with PyTorch for large scale image analysis&lt;/a&gt; (co-authored with Nick Sofreniew)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/05/07/skeleton-analysis"&gt;Skeleton analysis&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/05/24/life-science-summit-workshop"&gt;Life sciences at the 2021 Dask Summit&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/05/25/user-survey"&gt;The 2021 Dask User Survey is out now&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/06/25/dask-down-under"&gt;Dask Down Under&lt;/a&gt; (co-authored with Nick Mortimer)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/07/02/ragged-output"&gt;Ragged output, how to handle awkward shaped results&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/07/07/high-level-graphs"&gt;High Level Graphs update&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2021/08/23/gsoc-2021-project"&gt;Google Summer of Code 2021 - Dask Project&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Draft status, will be published soon:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-blog/pull/108"&gt;Mosaic Image Fusion&lt;/a&gt; (co-authored with Volker Hisenstein)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask-blog/pull/109"&gt;2021 Dask user survey results&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/10/20/czi-eoss-update/"/>
    <summary>Dask was awarded funding last year in round 2 of the CZI Essential Open Source Software grant program.
That funding was used to hire Genevieve Buckley to work on Dask with a focus on life sciences.
Last month Dask submitted an interim progress report to CZI, covering the period from February to September 2021.
That progress update is published verbatim below, to share with the wider Dask community.</summary>
    <category term="lifescience" label="life science"/>
    <published>2021-10-20T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/09/15/user-survey/</id>
    <title>2021 Dask User Survey</title>
    <updated>2021-09-15T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">&lt;p&gt;This post presents the results of the 2021 Dask User Survey, which ran earlier this year.
Thanks to everyone who took the time to fill out the survey!
These results help us better understand the Dask community and will guide future development efforts.&lt;/p&gt;
&lt;p&gt;The raw data, as well as the start of an analysis, can be found in this binder:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://mybinder.org/v2/gh/dask/dask-examples/main?urlpath=%2Ftree%2Fsurveys%2F2021.ipynb"&gt;&lt;img alt="Binder" src="https://mybinder.org/badge_logo.svg" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Let us know if you find anything interesting in the data.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/09/15/user-survey.md&lt;/span&gt;, line 19)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="contents"&gt;

&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#highlights"&gt;&lt;span class="xref myst"&gt;Highlights&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#who-are-dask-users"&gt;&lt;span class="xref myst"&gt;Who are Dask users?&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#how-people-like-to-use-dask"&gt;&lt;span class="xref myst"&gt;How people like to use Dask&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#diagnostics"&gt;&lt;span class="xref myst"&gt;Diagnostics&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="../../2021/05/21/stability/#stability"&gt;&lt;span class="std std-ref"&gt;Stability&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#user-satisfaction"&gt;&lt;span class="xref myst"&gt;User satisfaction, support, and documentation&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#suggestions-for-improvement"&gt;&lt;span class="xref myst"&gt;Suggestions for improvement&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#previous-survey-results"&gt;&lt;span class="xref myst"&gt;Previous survey results&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/09/15/user-survey.md&lt;/span&gt;, line 30)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="highlights"&gt;
&lt;h1&gt;Highlights &lt;a class="anchor" id="highlights"&gt;&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;We had 247 responses to the survey (roughly the same as last year, which had just under 240 responses). Overall, responses were similar to previous years.&lt;/p&gt;
&lt;p&gt;We asked 43 questions in the survey (an increase of 18 questions compared to the year before). We asked a bunch of new questions about the types of datasets people work with, the stability of Dask, and what kinds of industries people work in.&lt;/p&gt;
&lt;p&gt;Our community wants:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;More documentation and examples&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;More intermediate level documentation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved resiliency of Dask (i.e. do computations complete?)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Users also value these features:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Improved scaling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ease of deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better scikit-learn &amp;amp; machine learning support&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="the-typical-dask-user"&gt;
&lt;h2&gt;The typical Dask user&lt;/h2&gt;
&lt;p&gt;The survey shows us there is a lot of diversity in our community, and there is no one way to use Dask. That said, our hypothetical “typical” Dask user:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Works with gigabyte-sized datasets stored on a local filesystem&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Has been using Dask between 1 and 3 years&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uses Dask occasionally, not every day&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uses Dask interactively at least part of the time&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uses a compute cluster (probably)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Likes to view the Dask dashboard with a web browser&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finds Dask stable enough for most of their needs, but would benefit from improved resiliency&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uses the Dask DataFrame, Delayed, and maybe Dask Array APIs, alongside NumPy, pandas, and other Python libraries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Would be helped most by more documentation, and by more examples using Dask in their field&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Likely works in a scientific field (perhaps geoscience, life science, physics, or astronomy), or alternatively in accounting, finance, insurance, or as a tech worker&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can read the survey results from previous years here: &lt;a class="reference external" href="https://blog.dask.org/2020/09/22/user_survey"&gt;2020 survey results&lt;/a&gt;, &lt;a class="reference external" href="https://blog.dask.org/2019/08/05/user-survey"&gt;2019 survey results&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Let&amp;#39;s load in the survey data...&lt;/span&gt;
&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;matplotlib&lt;/span&gt; &lt;span class="n"&gt;inline&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pprint&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pprint&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pandas&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;seaborn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sns&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;textwrap&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;re&lt;/span&gt;


&lt;span class="n"&gt;df2019&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;data/2019-user-survey-results.csv.gz&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parse_dates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Timestamp&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;How often do you use Dask?&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;I use Dask all the time, even when I sleep&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Every day&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df2020&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;data/2020-user-survey-results.csv.gz&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Timestamp&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;%Y/%m/&lt;/span&gt;&lt;span class="si"&gt;%d&lt;/span&gt;&lt;span class="s2"&gt; %H:%M:%S %p %Z&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;datetime64[ns]&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;How often do you use Dask?&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;I use Dask all the time, even when I sleep&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Every day&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df2021&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;data/2021-user-survey-results.csv.gz&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Timestamp&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;datetime64[ns]&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;How often do you use Dask?&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;I use Dask all the time, even when I sleep&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Every day&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;common&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2019&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intersection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df2020&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intersection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;added&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;difference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df2020&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dropped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2020&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;difference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;df2019&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Year&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_index&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Year&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Timestamp&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
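&lt;p&gt;Because the set of questions changed between survey years, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pd.concat&lt;/span&gt;&lt;/code&gt; call above aligns the yearly frames by column name and fills in NaN wherever a year did not ask a question. Here is a minimal sketch of that behaviour. It uses two tiny hypothetical survey frames, and the answers are made up for illustration rather than taken from the survey data:&lt;/p&gt;

```python
import pandas as pd

# Two hypothetical survey years; 2021 adds a question that 2020 did not ask
df2020 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2020-07-01", "2020-07-02"]),
    "How often do you use Dask?": ["Occasionally", "Every day"],
})
df2021 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2021-07-01"]),
    "How often do you use Dask?": ["Occasionally"],
    "How long have you used Dask?": ["1 - 3 years"],
})

# Questions present in one year but not the other
added = df2021.columns.difference(df2020.columns)
dropped = df2020.columns.difference(df2021.columns)

# concat aligns on column names; 2020 rows get NaN for the new question
df = pd.concat([df2020, df2021])
df["Year"] = df.Timestamp.dt.year
df = df.set_index(["Year", "Timestamp"]).sort_index()

print(list(added))  # ['How long have you used Dask?']
print(df.loc[2020]["How long have you used Dask?"].isna().all())  # True
```

&lt;p&gt;Keeping every year in one frame with a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Year&lt;/span&gt;&lt;/code&gt; index level is what lets later plots compare years with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hue="Year"&lt;/span&gt;&lt;/code&gt;. Questions that are new in 2021 are still simplest to plot from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;df2021&lt;/span&gt;&lt;/code&gt; directly.&lt;/p&gt;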
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="who-are-dask-users"&gt;
&lt;h1&gt;Who are Dask users? &lt;a class="anchor" id="who-are-dask-users"&gt;&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Most people said they use Dask occasionally, while a smaller group use Dask every day. There is a wide variety in how long people have used Dask, with the most common response being between one and three years.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;How often do you use Dask?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_7_0.png" /&gt;&lt;/p&gt;
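&lt;p&gt;Most of the plotting code in this post passes answers through &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.dropna().str.split(";").explode()&lt;/span&gt;&lt;/code&gt; before counting. Splitting on semicolons suggests that multi-select answers are exported as a single semicolon-joined string per respondent, although that is an assumption about the export format. Here is a minimal sketch of what the chain does, using hypothetical answers:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical answers to a multi-select question: one string per respondent,
# with the selected options joined by semicolons (None = question skipped)
answers = pd.Series(
    ["Dataframe;Delayed", None, "Array", "Dataframe;Array;Delayed", "Dataframe"],
    name="Dask APIs",
)

# Drop blank responses, split each answer into a list of options, then give
# every selected option its own row (the original index label is repeated)
options = answers.dropna().str.split(";").explode()

# Counts of selections, and the same counts as a share of respondents
counts = options.value_counts()
share = counts / answers.notna().sum()
print(share)
```

&lt;p&gt;One respondent can pick several options, so the counts add up to more than the number of respondents. That is why the shares are divided by the number of non-blank answers and not by the number of exploded rows.&lt;/p&gt;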
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;How long have you used Dask?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;More than 3 years&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;1 - 3 years&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;3 months - 1 year&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Less than 3 months&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;I&amp;#39;ve never used Dask&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_8_0.png" /&gt;&lt;/p&gt;
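&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;order&lt;/span&gt;&lt;/code&gt; list above fixes the bar order in the plot. You can get the same fixed ordering in a table of counts by reindexing the value counts. This also keeps answers that nobody picked. Here is a minimal sketch, reusing the answer labels from the code above with hypothetical responses:&lt;/p&gt;

```python
import pandas as pd

# Answer labels in the order used by the plotting code above
order = ["More than 3 years", "1 - 3 years", "3 months - 1 year",
         "Less than 3 months", "I've never used Dask"]

# Hypothetical responses to "How long have you used Dask?"
responses = pd.Series(["1 - 3 years", "Less than 3 months", "1 - 3 years",
                       "More than 3 years"])

# Count each answer, then reindex so rows follow the survey order and
# answers that nobody picked still appear with a count of zero
counts = responses.value_counts().reindex(order, fill_value=0)
print(counts)
```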
&lt;p&gt;Just over half of respondents use Dask with other people (their team or organisation), and the rest use Dask on their own.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Do you use Dask as part of a larger group?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;I use Dask mostly on my own&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;My team or research group also use Dask&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;Beyond my group, many people throughout my institution use Dask&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_10_0.png" /&gt;&lt;/p&gt;
&lt;p&gt;In the last year, there has been an increase in the number of people who say that many people throughout their institution use Dask (32 people said this in 2021, compared to 19 in 2020). Between 2019 and 2020, there was a drop in the number of people who said their immediate team also uses Dask (121 people said this in 2019, compared to 94 in 2020). It’s not clear why we saw either of these changes, so it will be interesting to see what happens in future years.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Do you use Dask as part of a larger group?&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hue&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Year&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_12_0.png" /&gt;&lt;/p&gt;
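&lt;p&gt;The counts quoted above are raw numbers, so they also depend on how many people answered the survey in each year. One way to separate the two effects is to express each answer as a share of that year's responses. Below is a minimal sketch using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pd.crosstab&lt;/span&gt;&lt;/code&gt; with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;normalize="index"&lt;/span&gt;&lt;/code&gt;. The years, group labels, and responses are hypothetical and are not the survey results. The sketch ends with the relative change in the counts quoted in the text:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical long-format responses: one row per respondent, tagged by year
responses = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2020, 2021, 2021, 2021],
    "Group": ["Own", "Team", "Own", "Institution",
              "Institution", "Team", "Institution"],
})

# Raw counts per year and answer (what a hue="Year" countplot shows)
counts = pd.crosstab(responses["Year"], responses["Group"])

# The same table as fractions of each year's respondents (rows sum to 1)
shares = pd.crosstab(responses["Year"], responses["Group"], normalize="index")
print(shares.round(2))

# Relative change in the raw counts quoted in the text (counts, not shares)
institution_change = (32 - 19) / 19  # 2020 to 2021: about +68%
team_change = (94 - 121) / 121       # 2019 to 2020: about -22%
```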
&lt;section id="what-industry-do-you-work-in"&gt;
&lt;h2&gt;What industry do you work in?&lt;/h2&gt;
&lt;p&gt;There was a wide variety of industries represented in the survey.&lt;/p&gt;
&lt;p&gt;Almost half of responses were in an industry related to science, academia, or a government laboratory. Geoscience had the most responses, while life sciences, physics, and astronomy were also popular fields.&lt;/p&gt;
&lt;p&gt;Around 30 percent of responses were from people in business and tech. Of these, there was a roughly even split between people in accounting/finance/insurance and other tech workers.&lt;/p&gt;
&lt;p&gt;Around 10 percent of responses belonged to manufacturing, engineering, and other industry (energy, aerospace, etc.). The remaining responses were difficult to categorise.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;What industry do you work in?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_level_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_14_0.png" /&gt;&lt;/p&gt;
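&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;order&lt;/span&gt;&lt;/code&gt; expression above keeps only industries named by more than one respondent. The broad groupings described in the text could be produced with a hand-written mapping from individual answers to buckets. The answers, bucket names, and mapping below are hypothetical and are not the categorisation used for this post:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical free-text industry answers
industry = pd.Series(["Geoscience", "Finance", "Life sciences", "Geoscience",
                      "Software", "Aerospace", "Astronomy", "Retail"])

# Hypothetical mapping from individual answers to broad buckets
buckets = {
    "Geoscience": "Science, academia, government",
    "Life sciences": "Science, academia, government",
    "Astronomy": "Science, academia, government",
    "Finance": "Business and tech",
    "Software": "Business and tech",
    "Aerospace": "Manufacturing, engineering, industry",
}

# Answers missing from the mapping fall into "Other"
broad = industry.map(buckets).fillna("Other")

# Fraction of responses in each bucket
shares = broad.value_counts(normalize=True)
print(shares.round(2))

# Keep only industries named more than once, as the plotting code does
counts = industry.value_counts()
common = counts[counts.gt(1)].index.tolist()
```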
&lt;/section&gt;
&lt;section id="how-easy-is-it-for-you-to-upgrade-to-newer-versions-of-python-libraries"&gt;
&lt;h2&gt;How easy is it for you to upgrade to newer versions of Python libraries?&lt;/h2&gt;
&lt;p&gt;Most users can easily upgrade to newer versions of Python libraries when they want to.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;How easy is it for you to upgrade to newer versions of Python libraries&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Scale from 1 (Difficult) to 4 (Easy)&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_16_0.png" /&gt;&lt;/p&gt;
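&lt;p&gt;Responses to this question are on a scale from 1 (Difficult) to 4 (Easy). One way to make "most users" concrete is to compute the share of respondents at the easy end of the scale. Here is a minimal sketch, assuming that 3 and 4 count as easy, using hypothetical scores:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical responses on the 1 (Difficult) to 4 (Easy) scale; None = skipped
scores = pd.Series([4, 3, None, 2, 4, 1, 3, 4])

answered = scores.dropna()

# Assumption: treat 3 and 4 as "easy"; between() includes both endpoints
easy_share = answered.between(3, 4).mean()
print(f"{easy_share:.0%} of respondents who answered rated upgrading as easy")
```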
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="how-people-like-to-use-dask"&gt;
&lt;h1&gt;How people like to use Dask &lt;a class="anchor" id="how-people-like-to-use-dask"&gt;&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;People like to use Dask in conjunction with NumPy and pandas, along with a range of other Python libraries.
The most popular Dask APIs are &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe.html"&gt;Dask Dataframes&lt;/a&gt;, &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;Dask Delayed&lt;/a&gt;, and &lt;a class="reference external" href="https://docs.dask.org/en/latest/array.html"&gt;Dask Arrays&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The vast majority of people like to use Dask interactively with Jupyter or IPython at least part of the time, and most people view the &lt;a class="reference external" href="https://docs.dask.org/en/latest/diagnostics-distributed.html"&gt;Dask Dashboard&lt;/a&gt; with a web browser.&lt;/p&gt;
&lt;section id="what-are-some-other-libraries-that-you-often-use-with-dask"&gt;
&lt;h2&gt;What are some other libraries that you often use with Dask?&lt;/h2&gt;
&lt;p&gt;The ten most common libraries people use with Dask are: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;numpy&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pandas&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;xarray&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scikit-learn&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scipy&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;statsmodels&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;matplotlib&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;xgboost&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;numba&lt;/span&gt;&lt;/code&gt;, and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;joblib&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;What are some other libraries that you often use with Dask?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;, &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;  &lt;span class="c1"&gt;# Ten most common libraries&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_20_0.png" /&gt;&lt;/p&gt;
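&lt;p&gt;The library names come from a free-text field, which the code above lowercases and splits on ", " (a comma followed by one space). Answers typed with a bare comma, extra spaces, or a trailing comma would not be split cleanly. A slightly more forgiving sketch splits on a regular expression and strips any leftover whitespace. The answers below are hypothetical:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical free-text answers with inconsistent separators and casing
answers = pd.Series(["NumPy, pandas", "numpy,xarray", " Pandas , scikit-learn,",
                     None, "numpy"])

libraries = (
    answers.dropna()
    .str.lower()
    .str.split(r",\s*", regex=True)  # split on a comma plus any following spaces
    .explode()
    .str.strip()                     # remove leftover leading/trailing spaces
)
libraries = libraries[libraries.ne("")]  # drop empty entries from trailing commas

# Ten most common libraries
top10 = libraries.value_counts().head(10)
print(top10)
```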
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dask-apis"&gt;
&lt;h1&gt;Dask APIs&lt;/h1&gt;
&lt;p&gt;The three most popular Dask APIs people use are:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe.html"&gt;Dask Dataframes&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;Dask Delayed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.dask.org/en/latest/array.html"&gt;Dask Arrays&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In 2021, we saw a small increase in the number of people who use &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;Dask Delayed&lt;/a&gt;, compared with previous years. This might be a good sign: as people develop experience and confidence with Dask, they may be more likely to start using more advanced features such as &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;delayed&lt;/a&gt;. Apart from this change, preferences were broadly similar to the results from previous years.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;apis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Dask APIs&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;, &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;apis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apis&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;apis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Dask APIs&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;apis&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_22_0.png" /&gt;&lt;/p&gt;
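&lt;p&gt;The number of respondents changes from year to year, so Dask Delayed usage is easier to compare across years as a fraction of each year's respondents than as a raw count. Here is a minimal sketch using hypothetical data. It assumes the API options are stored as ", "-separated strings, as in the code above:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical multi-select answers to "Dask APIs", tagged with survey year
responses = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2020, 2021, 2021, 2021],
    "Dask APIs": ["Array, Dataframe", "Dataframe", "Delayed, Dataframe",
                  "Array", "Delayed", "Dataframe, Delayed", "Array"],
})

# True when a respondent's answer includes Delayed (plain substring match)
uses_delayed = responses["Dask APIs"].str.contains("Delayed", regex=False)

# Mean of a boolean column per year = fraction of that year's respondents
share_by_year = uses_delayed.groupby(responses["Year"]).mean()
print(share_by_year)
```

&lt;p&gt;A plain substring test like this would also match any other option whose name happens to contain "Delayed". Splitting and exploding the answers first, as in the code above, avoids that.&lt;/p&gt;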
&lt;section id="interactive-or-batch"&gt;
&lt;h2&gt;Interactive or Batch?&lt;/h2&gt;
&lt;p&gt;The vast majority of people like to use Dask interactively with Jupyter or IPython at least part of the time. Fewer than 15% of Dask users use Dask only in batch mode (submitting scripts that run in the future).&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Interactive or Batch?&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Interactive:  I use Dask with Jupyter or IPython when playing with data, Batch: I submit scripts that run in the future&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Interactive and Batch&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Interactive:  I use Dask with Jupyter or IPython when playing with data&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Interactive&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Batch: I submit scripts that run in the future&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Batch&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Interactive and Batch&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Interactive&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Batch&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_24_0.png" /&gt;&lt;/p&gt;
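&lt;p&gt;The chained &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;str.replace&lt;/span&gt;&lt;/code&gt; calls above work only because the combined "Interactive ..., Batch ..." answer is replaced first. In the reverse order, the shorter "Interactive" replacement would rewrite part of the combined answer and leave a half-replaced string. Mapping whole answers with a dictionary removes that ordering dependency. Here is a minimal sketch that reuses the answer strings from the code above (including their double space) with hypothetical responses:&lt;/p&gt;

```python
import pandas as pd

interactive = "Interactive:  I use Dask with Jupyter or IPython when playing with data"
batch = "Batch: I submit scripts that run in the future"

# Exact-match lookup: each full answer maps to one short label
labels = {
    f"{interactive}, {batch}": "Interactive and Batch",
    interactive: "Interactive",
    batch: "Batch",
}

# Hypothetical responses (None = question skipped)
answers = pd.Series([interactive, f"{interactive}, {batch}", batch, None, interactive])

# Series.map looks up whole values, so the order of dictionary entries
# does not matter; unmatched answers become NaN and are dropped
short = answers.map(labels).dropna()
print(short.value_counts())
```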
&lt;/section&gt;
&lt;section id="how-do-you-view-dask-s-dashboard"&gt;
&lt;h2&gt;How do you view Dask’s dashboard?&lt;/h2&gt;
&lt;p&gt;Most people look at the Dask dashboard using a web browser. A smaller group use the &lt;a class="reference external" href="https://github.com/dask/dask-labextension"&gt;Dask JupyterLab extension&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A few people are still not sure what the dashboard is all about. If that’s you too, you might like to watch &lt;a class="reference external" href="https://youtu.be/N_GqzcuGLCY"&gt;this 20-minute video&lt;/a&gt; that explains why the dashboard is super useful, or read the rest of the &lt;a class="reference external" href="https://docs.dask.org/en/latest/diagnostics-distributed.html"&gt;dashboard documentation&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;How do you view Dask&amp;#39;s dashboard?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;, &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_26_0.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="local-machine-or-cluster"&gt;
&lt;h2&gt;Local machine or Cluster?&lt;/h2&gt;
&lt;p&gt;Roughly two-thirds of respondents use a computing cluster at least part of the time, and this proportion has stayed fairly steady from 2019 to 2021 (between 63% and 67%).&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Local machine or Cluster?&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Cluster&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Year&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;Year
2019    0.654902
2020    0.666667
2021    0.630081
Name: Local machine or Cluster?, dtype: float64
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
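&lt;p&gt;Each yearly figure above is the mean of a 0/1 indicator: an answer scores 1 if it mentions a cluster option and 0 otherwise, so the mean is the fraction of respondents who use a cluster at least part of the time. The minimal sketch below reproduces that arithmetic with the Python standard library instead of pandas; the function name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cluster_share&lt;/span&gt;&lt;/code&gt; and the sample answers are hypothetical, not taken from the survey data.&lt;/p&gt;

```python
from statistics import mean


def cluster_share(answers):
    """Fraction of non-missing answers that mention "Cluster".

    Hypothetical helper mirroring the pandas expression
    df[q].dropna().str.contains("Cluster").astype(int).mean()
    for a single year of responses.
    """
    # Skip missing answers, like .dropna() does
    answered = [a for a in answers if a is not None]
    # 1 if the answer mentions any cluster option, else 0
    indicator = [int("Cluster" in a) for a in answered]
    return mean(indicator)


# Hypothetical responses: multi-select answers are joined with ", "
sample = [
    "Personal laptop",
    "Personal laptop, Cluster of 2-10 machines",
    None,
    "Large workstation, Cluster with 100+ machines",
]
print(round(cluster_share(sample), 3))  # 2 of the 3 answers mention a cluster
```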
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Local machine or Cluster?&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;Personal laptop&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;Large workstation&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;Cluster of 2-10 machines&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;Cluster with 10-100 machines&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;&amp;#39;Cluster with 100+ machines&amp;#39;&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;, &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_29_0.png" /&gt;&lt;/p&gt;
&lt;section id="if-you-use-a-cluster-how-do-you-launch-dask"&gt;
&lt;h3&gt;If you use a cluster, how do you launch Dask?&lt;/h3&gt;
&lt;p&gt;SSH is the most common way to launch Dask on a compute cluster, followed by an HPC resource manager (such as SLURM, PBS, SGE or LSF), then Kubernetes.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;If you use a cluster, how do you launch Dask? &amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# Strip the commas inside these two answers so the split on &amp;quot;, &amp;quot; below keeps each answer whole&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HPC resource manager (SLURM, PBS, SGE, LSF or similar)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;HPC resource manager (SLURM PBS SGE LSF or similar)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;I don&amp;#39;t know, someone else does this for me&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;I don&amp;#39;t know someone else does this for me&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;, &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_level_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_31_0.png" /&gt;&lt;/p&gt;
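&lt;p&gt;The two &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;str.replace&lt;/span&gt;&lt;/code&gt; calls in the code above are needed because each answer to this question is stored as a single comma-separated string, while some of the options contain commas themselves. Splitting on ", " first would break "HPC resource manager (SLURM, PBS, SGE, LSF or similar)" into several bogus answers. The sketch below shows the same idea in plain Python with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;collections.Counter&lt;/span&gt;&lt;/code&gt;; the helper &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;count_choices&lt;/span&gt;&lt;/code&gt; and the sample responses are hypothetical.&lt;/p&gt;

```python
from collections import Counter

# Options whose text contains ", " and would otherwise be split apart
OPTIONS_WITH_COMMAS = [
    "HPC resource manager (SLURM, PBS, SGE, LSF or similar)",
    "I don't know, someone else does this for me",
]


def count_choices(responses, protected=OPTIONS_WITH_COMMAS):
    """Count individual choices in multi-select answers joined by ", "."""
    counts = Counter()
    for response in responses:
        if not response:
            continue  # skip missing answers, like .dropna()
        # Remove the commas inside known options so they survive the split
        for option in protected:
            response = response.replace(option, option.replace(",", ""))
        counts.update(response.split(", "))
    return counts


# Hypothetical responses, in the same format as the survey export
responses = [
    "SSH, HPC resource manager (SLURM, PBS, SGE, LSF or similar)",
    "Kubernetes",
    "SSH",
    None,
]
for choice, n in count_choices(responses).most_common():
    print(n, choice)
```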
&lt;/section&gt;
&lt;section id="if-you-use-a-cluster-do-you-have-a-need-for-multiple-worker-types-in-the-same-cluster"&gt;
&lt;h3&gt;If you use a cluster, do you have a need for multiple worker types in the same cluster?&lt;/h3&gt;
&lt;p&gt;Of the people who use compute clusters, a little less than half need multiple worker types in the same cluster. For example, a cluster might mix workers with and without GPUs, or workers with low and high memory allocations.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;If you use a cluster, do you have a need for multiple worker / machine types (e.g. GPU / no GPU, low / high memory) in the same cluster?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Do you need multiple worker/machine types on a cluster?&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_33_0.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="datasets"&gt;
&lt;h2&gt;Datasets&lt;/h2&gt;
&lt;section id="how-large-are-your-datasets-typically"&gt;
&lt;h3&gt;How large are your datasets typically?&lt;/h3&gt;
&lt;p&gt;Dask users most commonly work with gigabyte-sized datasets. Very few work with petabyte-sized datasets.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;How large are your datasets typically?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;, &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_35_0.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="where-are-your-datasets-typically-stored"&gt;
&lt;h3&gt;Where are your datasets typically stored?&lt;/h3&gt;
&lt;p&gt;Most people store their data on a local filesystem.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Where are your datasets typically stored?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;, &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_level_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_37_0.png" /&gt;&lt;/p&gt;
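&lt;p&gt;In the code above, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;order&lt;/span&gt;&lt;/code&gt; keeps only the answers given by more than one respondent, sorted from most to least common (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;value_counts&lt;/span&gt;&lt;/code&gt; sorts in descending order), so one-off free-text answers are left out of the plot. A standard-library sketch of the same filtering is shown below; the helper &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;split_by_frequency&lt;/span&gt;&lt;/code&gt; and the answer labels are hypothetical and are not the actual survey options.&lt;/p&gt;

```python
from collections import Counter


def split_by_frequency(choices, min_count=2):
    """Split answers into (plotted, long_tail) lists.

    plotted: answers given at least min_count times, most common first,
    like value_counts()[value_counts() > 1] in the pandas code.
    long_tail: answers given fewer times (here, exactly once).
    """
    counts = Counter(choices)
    ranked = counts.most_common()  # sorted by count, descending
    plotted = [c for c, n in ranked if n >= min_count]
    long_tail = [c for c, n in ranked if min_count > n]
    return plotted, long_tail


# Hypothetical answers, already split into individual choices
choices = [
    "Local filesystem", "Local filesystem", "Cloud object storage",
    "Cloud object storage", "Local filesystem", "Tape archive",
]
plotted, long_tail = split_by_frequency(choices)
print(plotted)    # ['Local filesystem', 'Cloud object storage']
print(long_tail)  # ['Tape archive']
```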
&lt;/section&gt;
&lt;section id="what-file-formats-do-you-typically-work-with"&gt;
&lt;h3&gt;What file formats do you typically work with?&lt;/h3&gt;
&lt;p&gt;The two most common file formats (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;csv&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;parquet&lt;/span&gt;&lt;/code&gt;) are popular among Dask DataFrame users. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;JSON&lt;/span&gt;&lt;/code&gt; format, third on the list, is also widely used with Dask. The fourth and fifth most common formats (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HDF5&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;zarr&lt;/span&gt;&lt;/code&gt;) are popular among Dask Array users. This is consistent with Dask DataFrame being the most popular API, with Dask Array close behind.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;What file formats do you typically work with?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;, &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_level_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_39_0.png" /&gt;&lt;/p&gt;
&lt;p&gt;This survey question had a long tail: respondents reported a wide variety of specialized file formats, most of them used by only one or two people.&lt;/p&gt;
&lt;p&gt;Many of these specialized formats store image data specific to a particular field, such as astronomy, geoscience, or microscopy.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_level_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;[&amp;#39;proprietary measurement format&amp;#39;,
 &amp;#39;netCDF3&amp;#39;,
 &amp;#39;czi&amp;#39;,
 &amp;#39;specifically NetCDF4&amp;#39;,
 &amp;#39;grib2&amp;#39;,
 &amp;#39;in-house npy-like array format&amp;#39;,
 &amp;#39;jpeg2000&amp;#39;,
 &amp;#39;netCDF4 (based on HDF5)&amp;#39;,
 &amp;#39;proprietary microscopy file types. Often I convert to Zarr with a loss of metadata.&amp;#39;,
 &amp;#39;sas7bdat&amp;#39;,
 &amp;#39;npy&amp;#39;,
 &amp;#39;npy and pickle&amp;#39;,
 &amp;#39;root with uproot&amp;#39;,
 &amp;#39;root&amp;#39;,
 &amp;#39;regular GeoTiff&amp;#39;,
 &amp;#39;.npy&amp;#39;,
 &amp;#39;Text&amp;#39;,
 &amp;#39;VCF BAM CRAM&amp;#39;,
 &amp;#39;UM&amp;#39;,
 &amp;#39;CASA measurement sets&amp;#39;,
 &amp;#39;Casa Tables (Radio Astronomy specific)&amp;#39;,
 &amp;#39;Custom binary&amp;#39;,
 &amp;#39;FITS&amp;#39;,
 &amp;#39;FITS (astronomical images)&amp;#39;,
 &amp;#39;FITS and a custom semi-relational table specification that I want to kill and replace with something better&amp;#39;,
 &amp;#39;Feather (Arrow)&amp;#39;,
 &amp;#39;GPKG&amp;#39;,
 &amp;#39;GeoTIFF&amp;#39;,
 &amp;#39;NetCDF4&amp;#39;,
 &amp;#39;Netcdf&amp;#39;,
 &amp;#39;Netcdf4&amp;#39;,
 &amp;#39;PP&amp;#39;,
 &amp;#39;SQL&amp;#39;,
 &amp;#39;SQL query to remote DB&amp;#39;,
 &amp;#39;SQL to Dataframe&amp;#39;,
 &amp;#39;Seismic data (miniSEED)&amp;#39;,
 &amp;#39;TFRecords&amp;#39;,
 &amp;#39;TIFF&amp;#39;,
 &amp;#39;Testing with all file formats. Just want it as a replacement for spark. &amp;#39;,
 &amp;#39;.raw image files&amp;#39;,
 &amp;#39;ugh&amp;#39;]
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
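&lt;p&gt;&lt;img src="https://imgs.xkcd.com/comics/standards.png" alt="XKCD comic 927: Standards" /&gt;&lt;/p&gt;
&lt;p&gt;XKCD comic 927, “Standards”: &lt;a class="reference external" href="https://xkcd.com/927/"&gt;https://xkcd.com/927/&lt;/a&gt;&lt;/p&gt;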
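&lt;p&gt;Because &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;value_counts&lt;/span&gt;&lt;/code&gt; is case-sensitive, some of the single-mention answers in the list above are really the same format written differently: "NetCDF4" and "Netcdf4" each appear once. A minimal sketch, assuming that lowercasing is an acceptable normalization for this free-text question, is to count lowercased answers; the helper &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;count_case_insensitive&lt;/span&gt;&lt;/code&gt; is hypothetical. Differently worded answers such as "specifically NetCDF4" would still need manual cleanup.&lt;/p&gt;

```python
from collections import Counter


def count_case_insensitive(choices):
    """Count free-text answers, treating case variants as the same answer.

    Answers are stripped of surrounding whitespace and lowercased before
    counting, so "NetCDF4" and "Netcdf4" are merged. Differently worded
    answers such as "specifically NetCDF4" are still counted separately.
    """
    return Counter(choice.strip().lower() for choice in choices)


# A few of the single-mention answers listed above
answers = ["NetCDF4", "Netcdf4", "Netcdf", "FITS", "specifically NetCDF4"]
counts = count_case_insensitive(answers)
print(counts["netcdf4"])  # 2
```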
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="preferred-cloud"&gt;
&lt;h2&gt;Preferred Cloud?&lt;/h2&gt;
&lt;p&gt;The most popular cloud solution is Amazon Web Services (AWS), followed by Google Cloud Platform (GCP) and Microsoft Azure.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Preferred Cloud?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;Amazon Web Services (AWS)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;Google Cloud Platform (GCP)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;Microsoft Azure&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;Digital Ocean&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;, &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_44_0.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="do-you-use-dask-projects-to-deploy"&gt;
&lt;h2&gt;Do you use Dask projects to deploy?&lt;/h2&gt;
&lt;p&gt;Among respondents who use Dask projects to deploy, &lt;a class="reference external" href="https://github.com/dask/dask-jobqueue"&gt;dask-jobqueue&lt;/a&gt;
and the &lt;a class="reference external" href="https://github.com/dask/helm-chart"&gt;Dask Helm chart&lt;/a&gt; are the two most popular options,
though people reported using a wide variety of deployment projects.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Do you use Dask projects to deploy?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;dask-jobqueue&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;dask&amp;#39;s helm chart&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;dask-kubernetes&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;dask&amp;#39;s docker image at daskdev/dask&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;dask-gateway&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;dask-ssh&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;dask-cloudprovider&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;dask-yarn&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;qhub&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;dask-mpi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;, &amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_46_0.png" /&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/09/15/user-survey.md&lt;/span&gt;, line 450)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="diagnostics"&gt;
&lt;h1&gt;Diagnostics &lt;a class="anchor" id="diagnostics"&gt;&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;We saw earlier that most people like to view the Dask Dashboard using their web browser.&lt;/p&gt;
&lt;p&gt;In the dashboard, people said the most useful diagnostic plots were:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The task stream plot&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The progress plot, and&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The memory usage per worker plot&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Which Diagnostic plots are most useful?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;, &amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_48_0.png" /&gt;&lt;/p&gt;
&lt;p&gt;We also asked some new questions about diagnostics in 2021.&lt;/p&gt;
&lt;p&gt;We found that most people (65 percent) do not use &lt;a class="reference external" href="https://distributed.dask.org/en/latest/diagnosing-performance.html#performance-reports"&gt;Dask performance reports&lt;/a&gt;, which save the diagnostic dashboard plots to a static HTML file for later review.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Do you use Dask&amp;#39;s Performance reports?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Yes&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;No&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_50_0.png" /&gt;&lt;/p&gt;
&lt;p&gt;Very few people use Dask’s &lt;a class="reference external" href="https://docs.dask.org/en/latest/setup/prometheus.html"&gt;Prometheus metrics&lt;/a&gt;. Jacob Tomlinson has an excellent article on &lt;a class="reference external" href="https://medium.com/rapids-ai/monitoring-dask-rapids-with-prometheus-grafana-96eaf6b8f3a0"&gt;Monitoring Dask + RAPIDS with Prometheus + Grafana&lt;/a&gt;, if you’re interested in learning more about how to use this feature.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Do you use Dask&amp;#39;s Prometheus Metrics?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Yes&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;No&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_52_0.png" /&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/09/15/user-survey.md&lt;/span&gt;, line 490)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="stability"&gt;
&lt;h1&gt;Stability &lt;a class="anchor" id="stability"&gt;&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;We asked a number of questions around the stability of Dask, many of them new questions in 2021.&lt;/p&gt;
&lt;p&gt;The majority of people said Dask was resilient enough for them (e.g. computations complete).
However, this is an area where we could improve, as 36 percent of people are not satisfied.
This was a new question in 2021, so we can’t say how people’s opinion of Dask’s resilience has changed over time.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Is Dask resilient enough for you? (e.g. computations complete).&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# new question in 2021&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Yes&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;No&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Is Dask resilient enough for you?&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_54_0.png" /&gt;&lt;/p&gt;
&lt;p&gt;Most people say Dask in general is stable enough for them (e.g. between version releases). This is similar to the survey results from previous years.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Is Dask stable enough for you?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Yes&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;No&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_56_0.png" /&gt;&lt;/p&gt;
&lt;p&gt;People also say that Dask’s API is stable enough for them.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Is Dask&amp;#39;s API stable enough for you?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Yes&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;No&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_58_0.png" /&gt;&lt;/p&gt;
&lt;p&gt;The vast majority of people are satisfied with the current release frequency (roughly once every two weeks).&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;How is Dask&amp;#39;s release frequency?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_60_0.png" /&gt;&lt;/p&gt;
&lt;p&gt;Most people say they would pin their code to a long-term support (LTS) release, if one were available for Dask.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;If Dask had Long-term support (LTS) releases, would you pin your code to use them?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Yes&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;No&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Would you pin to a long term support release?&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_62_0.png" /&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/09/15/user-survey.md&lt;/span&gt;, line 546)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="user-satisfaction-support-and-documentation"&gt;
&lt;h1&gt;User satisfaction, support, and documentation &lt;a class="anchor" id="user-satisfaction"&gt;&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;We asked a bunch of new questions about user satisfaction in the 2021 survey.&lt;/p&gt;
&lt;section id="how-easy-is-dask-to-use"&gt;
&lt;h2&gt;How easy is Dask to use?&lt;/h2&gt;
&lt;p&gt;The majority of people say that Dask is moderately easy to use, the same as in previous surveys.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;On a scale of 1 - 5 (1 being hardest, 5 being easiest) how easy is Dask to use?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;1 = Difficult, 5 = Easy&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;How easy is Dask to use?&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_65_0.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="how-is-dask-s-documentation"&gt;
&lt;h2&gt;How is Dask’s documentation?&lt;/h2&gt;
&lt;p&gt;Most people think that Dask’s documentation is pretty good.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;How is Dask&amp;#39;s documentation?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;1 = Not good, 5 = Great&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_67_0.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="how-satisfied-are-you-with-maintainer-responsiveness-on-github"&gt;
&lt;h2&gt;How satisfied are you with maintainer responsiveness on GitHub?&lt;/h2&gt;
&lt;p&gt;Almost everybody who responded feels positively about Dask’s maintainer responsiveness on GitHub.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;How satisfied are you with maintainer responsiveness on GitHub?&amp;quot;&lt;/span&gt;  &lt;span class="c1"&gt;# New question in 2021&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;1 = Not satisfied, 5 = Thrilled&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_69_0.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-dask-resources-have-you-used-for-support-in-the-last-six-months"&gt;
&lt;h2&gt;What Dask resources have you used for support in the last six months?&lt;/h2&gt;
&lt;p&gt;The documentation at &lt;a class="reference external" href="https://dask.org/"&gt;dask.org&lt;/a&gt; is the first place most users look for help.&lt;/p&gt;
&lt;p&gt;The breakdown of responses to this question in 2021 was very similar to previous years, except that in 2019 no one seemed to know that the &lt;a class="reference external" href="https://www.youtube.com/c/Dask-dev/videos"&gt;Dask YouTube channel&lt;/a&gt; or Gitter chat existed.&lt;/p&gt;
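&lt;p&gt;Before these answers can be counted they need some cleanup: some responses use &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;;&lt;/span&gt;&lt;/code&gt; rather than a comma as the separator, and short labels such as “Tutorial” have to be mapped to their full names so that equivalent answers are counted together. The pandas code that follows does this with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;str.replace&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;explode&lt;/span&gt;&lt;/code&gt;, and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;resource_map&lt;/span&gt;&lt;/code&gt;, then keeps the eight most common answers. A rough plain-Python sketch of the same steps is shown here; the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;clean_resources&lt;/span&gt;&lt;/code&gt; helper and the answers are made up for illustration.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
from collections import Counter

# Same label mapping as resource_map in the plotting code.
RESOURCE_MAP = {
    "Tutorial": "Tutorial at tutorial.dask.org",
    "YouTube": "YouTube channel",
    "gitter": "Gitter chat",
}


def clean_resources(answers, top_n=8):
    """Normalize separators, map label variants, and keep the top_n most common."""
    items = []
    for answer in answers:
        if not answer:  # skip blank responses
            continue
        for item in answer.replace(";", ", ").split(", "):
            items.append(RESOURCE_MAP.get(item, item))  # exact-match relabel
    top = {label for label, _ in Counter(items).most_common(top_n)}
    return [item for item in items if item in top]


# Made-up answers, NOT survey data.
answers = ["Tutorial;gitter", "Gitter chat, YouTube", None, "Tutorial at tutorial.dask.org"]
print(Counter(clean_resources(answers, top_n=2)))
```
&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;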
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;What Dask resources have you used for support in the last six months?&amp;#39;&lt;/span&gt;

&lt;span class="n"&gt;resource_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;Tutorial&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Tutorial at tutorial.dask.org&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;YouTube&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;YouTube channel&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;gitter&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Gitter chat&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;, &amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Make separator values consistent&lt;/span&gt;
&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;, &amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resource_map&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()[:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;
&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hue&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Year&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_71_0.png" /&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/09/15/user-survey.md&lt;/span&gt;, line 613)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="suggestions-for-improvement"&gt;
&lt;h1&gt;Suggestions for improvement &lt;a class="anchor" id="suggestions-for-improvement"&gt;&lt;/a&gt;&lt;/h1&gt;
&lt;section id="which-would-help-you-most-right-now"&gt;
&lt;h2&gt;Which would help you most right now?&lt;/h2&gt;
&lt;p&gt;The two options people most often said would help them right now both relate to documentation: more documentation, and more examples in their field. Performance improvements were also a common answer.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Which would help you most right now?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;More documentation&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;More examples in my field&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;Performance improvements&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;New features&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;Bug fixes&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_74_0.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="how-can-dask-improve"&gt;
&lt;h2&gt;How can Dask improve?&lt;/h2&gt;
&lt;p&gt;We also invited free-text responses to the question “How can Dask improve?”&lt;/p&gt;
&lt;p&gt;Matt has previously written an &lt;a class="reference external" href="https://blog.dask.org/2021/06/18/early-survey"&gt;early anecdotes blogpost&lt;/a&gt;
that dives into the responses to this question in more detail.&lt;/p&gt;
&lt;p&gt;He found these recurring themes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Intermediate Documentation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation Organization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Functionality&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;High Level Optimization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Runtime Stability and Advanced Troubleshooting&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since more documentation and examples were the two most requested improvements, I’ll summarize some of the steps forward in that area here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Regarding more intermediate documentation, Matt says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;There is a lot of good potential material that advanced users have around performance and debugging that could be fun to publish.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matt points out that Dask has excellent &lt;em&gt;reference documentation&lt;/em&gt;, but is short on good &lt;em&gt;narrative documentation&lt;/em&gt;. To address this, Julia Signell is currently investigating how we could improve the organization of Dask’s documentation (you can subscribe to &lt;a class="reference external" href="https://github.com/dask/community/issues/170"&gt;this issue thread&lt;/a&gt; if you want to follow that discussion).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matt comments that it’s hard to have good &lt;em&gt;narrative documentation&lt;/em&gt; when there are so many different &lt;em&gt;user narratives&lt;/em&gt; (i.e., Dask is used by people in many different industries). This year, we added a new question to the survey asking which industry people work in. We added it because &lt;em&gt;“More examples in my field”&lt;/em&gt; has been one of the top two requests for the last three years. Now we can use that information to better target narrative documentation to the areas that need it most (geoscience, life science, and finance).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;What industry do you work in?&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df2021&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Which would help you most right now?&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;More examples in my field&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;, &amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_frame&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;What field do you want more documentation examples for?&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_76_0.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-common-feature-requests-do-you-care-about-most"&gt;
&lt;h2&gt;What common feature requests do you care about most?&lt;/h2&gt;
&lt;p&gt;Good support for NumPy and pandas is critical for most users.
Users also value:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Improved scaling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ease of deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Resiliency of Dask&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better scikit-learn &amp;amp; machine learning support&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Results for most feature requests are similar to previous years, although more people now say that better scikit-learn/ML support is critical to them. We also added a new question about Dask’s resiliency in 2021.&lt;/p&gt;
&lt;p&gt;In the figure below you can see how people rated the importance of each feature request, for each of the three years we’ve run this survey.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;common&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;What common feature&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)]]&lt;/span&gt;
          &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lstrip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;What common feature requests do you care about most?[&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rstrip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;]&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;common&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2019&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;level_0&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Question&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;level_1&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Importance&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Year&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2019&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;common&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;level_0&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Question&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;level_1&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Importance&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Year&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;common&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;level_0&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Question&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;level_1&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Importance&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Year&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ignore_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;common&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;level_2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Feature&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Importance&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Not relevant for me&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Somewhat useful&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;Critical to me&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;catplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Importance&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Feature&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Year&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sharex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="png" src="https://blog.dask.org/_images/2021_survey_78_0.png" /&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/09/15/user-survey.md&lt;/span&gt;, line 699)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="previous-survey-results"&gt;
&lt;h1&gt;Previous survey results &lt;a class="anchor" id="previous-survey-results"&gt;&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Thanks to everyone who took the survey!&lt;/p&gt;
&lt;p&gt;If you want to read more about the 2021 Dask survey, the blogpost on early anecdotes from the Dask 2021 survey &lt;a class="reference external" href="https://blog.dask.org/2021/06/18/early-survey"&gt;is available here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can read the survey results from previous years here:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2020/09/22/user_survey"&gt;2020 survey results&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2019/08/05/user-survey"&gt;2019 survey results&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/09/15/user-survey/"/>
    <summary>This post presents the results of the 2021 Dask User Survey, which ran earlier this year.
Thanks to everyone who took the time to fill out the survey!
These results help us better understand the Dask community and will guide future development efforts.</summary>
    <category term="UserSurvey" label="User Survey"/>
    <published>2021-09-15T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/08/23/gsoc-2021-project/</id>
    <title>Google Summer of Code 2021 - Dask Project</title>
    <updated>2021-08-23T00:00:00+00:00</updated>
    <author>
      <name>Freyam Mehta and Genevieve Buckley</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/08/23/gsoc-2021-project.md&lt;/span&gt;, line 8)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="overview"&gt;

&lt;p&gt;Here’s an update on new features related to visualizing Dask graphs and HTML representations. You can try these new features today with version &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;2021.08.1&lt;/span&gt;&lt;/code&gt; or above. This work was done by Freyam Mehta during the Google Summer of Code 2021. Dask took part in the program under the NumFOCUS umbrella organization.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/08/23/gsoc-2021-project.md&lt;/span&gt;, line 12)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="contents"&gt;
&lt;h1&gt;Contents&lt;/h1&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#visualizing-dask-graphs"&gt;&lt;span class="xref myst"&gt;Visualizing Dask graphs&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#graphviz-node-size-scaling"&gt;&lt;span class="xref myst"&gt;Graphviz node size scaling&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#new-tooltips"&gt;&lt;span class="xref myst"&gt;New tooltips&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#color-by-layer-type"&gt;&lt;span class="xref myst"&gt;Color by layer type&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#bugfix-in-visualize-method"&gt;&lt;span class="xref myst"&gt;Bugfix in visualize method&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#html-representations"&gt;&lt;span class="xref myst"&gt;HTML Representations&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#array-images-in-html-repr-for-high-level-graphs"&gt;&lt;span class="xref myst"&gt;Array images in HTML repr for high level graphs&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#new-html-repr-for-processinterface-class"&gt;&lt;span class="xref myst"&gt;New HTML repr for ProcessInterface class&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#new-html-repr-for-security-class"&gt;&lt;span class="xref myst"&gt;New HTML repr for Security class&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/08/23/gsoc-2021-project.md&lt;/span&gt;, line 24)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="visualizing-dask-graphs"&gt;
&lt;h1&gt;Visualizing Dask graphs&lt;/h1&gt;
&lt;p&gt;There are several new features involving Dask &lt;a class="reference external" href="https://docs.dask.org/en/latest/graphs.html"&gt;task graph&lt;/a&gt; visualization. Task graph visualizations show the order and dependencies of each individual task within a Dask computation. They are a very useful diagnostic tool and have been in use for a long time.&lt;/p&gt;
&lt;img src="/images/gsoc21/dask-simple.png" alt="An example task graph visualization." height=300&gt;
&lt;p&gt;Freyam worked on making these visualizations more illustrative, engaging, and informative. The &lt;a class="reference external" href="https://docs.dask.org/en/latest/graphviz.html"&gt;Graphviz&lt;/a&gt; library offers a rich set of attributes that can be modified to create more visually appealing output.&lt;/p&gt;
&lt;p&gt;These features primarily improve the Dask &lt;a class="reference external" href="https://docs.dask.org/en/latest/high-level-graphs.html"&gt;high level graph&lt;/a&gt; visualizations. Both low level and high level Dask graphs can be accessed with very similar methods:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Dask low level graph: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;result.visualize()&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask high level graph: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;result.dask.visualize()&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;…where &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;result&lt;/span&gt;&lt;/code&gt; is a dask object or collection.&lt;/p&gt;
&lt;section id="graphviz-node-size-scaling"&gt;
&lt;h2&gt;Graphviz node size scaling&lt;/h2&gt;
&lt;p&gt;The first change you may notice in Dask high level graphs is that node sizes now scale with the number of tasks in each layer, so layers with more tasks appear larger than the rest.&lt;/p&gt;
&lt;p&gt;This gives users a much more intuitive sense of where the bulk of their computation takes place.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Dask high level graph&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/gsoc21/7869.png" alt="Example: graphviz node size scaling, pull request #7869" height=414 width=736&gt;
&lt;p&gt;Note: this change only affects the graphviz output for Dask high level graphs. Low level graphs are left unchanged, because each visual node corresponds to one task.&lt;/p&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7869"&gt;Pull request #7869 by Freyam Mehta &lt;em&gt;“Add node size scaling to the Graphviz output for the high level graphs”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="new-tooltips"&gt;
&lt;h2&gt;New tooltips&lt;/h2&gt;
&lt;p&gt;Dask high level graphs now include hover tooltips that summarize more detailed information about each layer. To use the tooltips, generate a Dask high level graph (eg: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;result.dask.visualize()&lt;/span&gt;&lt;/code&gt;), then hover your mouse over the layer you are interested in.&lt;/p&gt;
&lt;img src="/images/gsoc21/7973.png" alt="Example: tooltips provide extra information, pull request #7973" height=414 width=736&gt;
&lt;p&gt;Tooltips provide information such as the layer type and the number of tasks associated with it. Additional information is provided for specific Dask collections, like Dask arrays and dataframes.&lt;/p&gt;
&lt;p&gt;Dask array tooltip information additionally includes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Array shape&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk size&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk type (eg: are the array chunks numpy, cupy, sparse, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data type (eg: are the array values float, integer, boolean, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Dask dataframe tooltip information additionally includes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Number of partitions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dataframe type&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dataframe columns&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Users have asked for a less overwhelming view into the dask task graph. We hope the high level graph view coupled with more detailed tooltip information can provide this middle ground, with enough information to be useful, but not so much as to become overwhelming (like the low level task graphs for large computations).&lt;/p&gt;
&lt;p&gt;Note: Tooltips are only available for SVG output. Other image formats, like &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.png&lt;/span&gt;&lt;/code&gt;, do not support tooltips.&lt;/p&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7973"&gt;Pull request #7973 by Freyam Mehta &lt;em&gt;“Add tooltips to graphviz”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="color-by-layer-type"&gt;
&lt;h2&gt;Color by layer type&lt;/h2&gt;
&lt;p&gt;There is also a new feature enabling users to color code a high level graph according to layer type. This option can be enabled by passing the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;color=&amp;quot;layer_type&amp;quot;&lt;/span&gt;&lt;/code&gt; keyword argument, eg: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;result.dask.visualize(color=&amp;quot;layer_type&amp;quot;)&lt;/span&gt;&lt;/code&gt;. This change is intended to make it easier for users to see which layer types predominate.&lt;/p&gt;
&lt;p&gt;While there are no hard and fast rules about what makes a Dask computation efficient, there are some general guidelines:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Dataframe shuffles are particularly expensive operations. You can &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe-best-practices.html#avoid-full-data-shuffling"&gt;read more about this here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reading and writing data to/from storage/network services is often high-latency and therefore a bottleneck.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Blockwise layers are generally efficient for computation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All layers are materialized during computation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See the &lt;a class="reference external" href="https://docs.dask.org/en/latest/best-practices.html"&gt;Dask best practices&lt;/a&gt; pages for more information on creating more efficient Dask computations.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeseries&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;df3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;df3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;layer_type&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Dask high level graph with colored nodes by layer type&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/gsoc21/7974.png" alt="Example: Dask graph colored by layer type, pull request #7974" height=414 width=736&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7974"&gt;Pull request #7974 by Freyam Mehta &lt;em&gt;“Add colors to represent high level layer types”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="bugfix-in-visualize-method"&gt;
&lt;h2&gt;Bugfix in visualize method&lt;/h2&gt;
&lt;p&gt;Freyam also fixed a bug which caused an error when users tried to call &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.visualize()&lt;/span&gt;&lt;/code&gt; with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;filename=None&lt;/span&gt;&lt;/code&gt; (issue &lt;a class="reference external" href="https://github.com/dask/dask/issues/7685"&gt;#7685&lt;/a&gt;, fixed by pull request &lt;a class="reference external" href="https://github.com/dask/dask/pull/7740"&gt;#7740&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The bug was fixed by adding an extra check before the error is reached: if the format is &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;None&lt;/span&gt;&lt;/code&gt;, Dask now uses the default &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;png&lt;/span&gt;&lt;/code&gt; format.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# success&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7740"&gt;Pull request #7740 by Freyam Mehta &lt;em&gt;“Fixing calling .visualize() with filename=None”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="html-representations"&gt;
&lt;h1&gt;HTML representations&lt;/h1&gt;
&lt;p&gt;Dask makes use of HTML representations in several places, for example in Dask collections like the Array and Dataframe classes (for background reading, see &lt;a class="reference external" href="https://matthewrocklin.com/blog/2019/07/04/html-repr"&gt;this blogpost&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;More recently, we’ve introduced HTML representations for high level graphs into Dask, and Jacob Tomlinson has implemented HTML representations in several places in the dask distributed library (for further reading, see &lt;a class="reference external" href="https://blog.dask.org/2021/07/07/high-level-graphs#visualization"&gt;this other blogpost&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;During Freyam’s Google Summer of Code project, he extended the HTML representations for Dask high level graphs to include images, and introduced two entirely new HTML representations to the dask distributed library.&lt;/p&gt;
&lt;section id="array-images-in-html-repr-for-high-level-graphs"&gt;
&lt;h2&gt;Array images in HTML repr for high level graphs&lt;/h2&gt;
&lt;p&gt;The HTML representation for dask high level graphs has been extended, and now includes SVG images of dask arrays at intermediate stages of computation.&lt;/p&gt;
&lt;p&gt;The motivation for this feature is similar to the motivation behind adding tooltips, discussed above. Users want easier ways to see how a Dask computation changes as it moves through each stage. We hope this improvement to the HTML representation for Dask high level graphs will provide an at-a-glance summary of array shape and chunk size at each stage.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;

&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;  &lt;span class="c1"&gt;# shows the HTML representation in Jupyter&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/gsoc21/7886.png" alt="Example: Array images now included in HTML representation of Dask high level graphs, pull request #7886" height=414 width=736&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7886"&gt;Pull request #7886 by Freyam Mehta &lt;em&gt;“Add dask.array SVG to the HTML Repr”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="new-html-repr-for-processinterface-class"&gt;
&lt;h2&gt;New HTML repr for ProcessInterface class&lt;/h2&gt;
&lt;p&gt;A new HTML representation has been created for the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ProcessInterface&lt;/span&gt;&lt;/code&gt; class in &lt;a class="reference external" href="https://github.com/dask/distributed/"&gt;dask distributed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The HTML representation displays the status, address, and external address of the process.&lt;/p&gt;
&lt;p&gt;There are three possible status options:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Process created, not yet running (blue icon)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Process is running (green icon)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Process closed (orange icon)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;img src="/images/gsoc21/5181-1.png" alt="Example: New HTML representation for distributed ProcessInterface class, pull request #5181" height=414 width=736&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ProcessInterface&lt;/span&gt;&lt;/code&gt; class is not intended to be used directly. Instead, this information will typically be accessed via its subclasses, such as the scheduler and workers of an SSH cluster.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SSHCluster&lt;/span&gt;

&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SSHCluster&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;127.0.0.1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;127.0.0.1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;127.0.0.1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;  &lt;span class="c1"&gt;# HTML representation for the SSH scheduler, shown in Jupyter&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;  &lt;span class="c1"&gt;# dict of all the workers&lt;/span&gt;
&lt;span class="c1"&gt;# or&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# HTML representation for the first SSH worker in the cluster&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/gsoc21/5181-2.png" alt="Example: New HTML representation for distributed ProcessInterface class, pull request #5181" height=414 width=736&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/distributed/pull/5181"&gt;Pull request #5181 by Freyam Mehta &lt;em&gt;“Add HTML Repr for ProcessInterface Class and all its subclasses”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="new-html-repr-for-security-class"&gt;
&lt;h2&gt;New HTML repr for Security class&lt;/h2&gt;
&lt;p&gt;Pull request &lt;a class="reference external" href="https://github.com/dask/distributed/pull/5178"&gt;#5178&lt;/a&gt; added a new HTML representation for the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Security&lt;/span&gt;&lt;/code&gt; class in the &lt;a class="reference external" href="https://github.com/dask/distributed/"&gt;dask distributed library&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Security&lt;/span&gt;&lt;/code&gt; HTML representation shows:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Whether encryption is required&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Whether the object instance was created using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Security.temporary()&lt;/span&gt;&lt;/code&gt; or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Security(**paths_to_keys)&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For temporary security objects, keys are generated dynamically and the only copy is kept in memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For security objects created using keys stored on disk, the HTML representation will show the full filepath to the relevant security certificates on disk.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example: temporary security object&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Security&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Security&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;temporary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;s&lt;/span&gt;  &lt;span class="c1"&gt;# shows the HTML representation in Jupyter&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Example: security object using certificates saved to disk&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Security&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Security&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;require_encryption&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tls_ca_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ca.pem&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tls_scheduler_cert&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;scert.pem&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;s&lt;/span&gt;  &lt;span class="c1"&gt;# shows the HTML representation in Jupyter&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/gsoc21/5178-2.png" alt="Example: New HTML representation for distributed Security class, pull request #5178" height=414 width=736&gt;
&lt;p&gt;In addition, the text representation has also been updated to reflect the same information shown in the HTML representation.&lt;/p&gt;
&lt;img src="/images/gsoc21/5178-1.png" alt="Example: New text representation for distributed Security class, pull request #5178" height=414 width=736&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/distributed/pull/5178/"&gt;Pull request #5178 by Freyam Mehta &lt;em&gt;“Add HTML Repr for Security Class”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/08/23/gsoc-2021-project/"/>
    <published>2021-08-23T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/07/07/high-level-graphs/</id>
    <title>High Level Graphs update</title>
    <updated>2021-07-07T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">&lt;section id="executive-summary"&gt;

&lt;p&gt;There is a lot of work happening in Dask right now on high level graphs. We’d like to share a snapshot of current work in this area. This post is for people interested in the technical details of behind-the-scenes work to improve performance in Dask. You don’t need to know anything about it in order to use Dask.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="contents"&gt;
&lt;h1&gt;Contents&lt;/h1&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#brief-background"&gt;&lt;span class="xref myst"&gt;Brief background&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#blockwise-layers-progress"&gt;&lt;span class="xref myst"&gt;Blockwise layers progress&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#a-high-level-graph-for-map-overlap"&gt;&lt;span class="xref myst"&gt;A high level graph for map overlap&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#slicing-and-high-level-graphs"&gt;&lt;span class="xref myst"&gt;Slicing and high level graphs&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#visualization"&gt;&lt;span class="xref myst"&gt;Visualization&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#documentation"&gt;&lt;span class="xref myst"&gt;Documentation&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="brief-background"&gt;
&lt;h1&gt;Brief background&lt;/h1&gt;
&lt;section id="what-are-high-level-graphs"&gt;
&lt;h2&gt;What are high level graphs?&lt;/h2&gt;
&lt;p&gt;High level graphs are a more compact representation of the instructions needed to generate the full low level task graph.
The documentation page on Dask high level graphs is here:
&lt;a class="reference external" href="https://docs.dask.org/en/latest/high-level-graphs.html"&gt;https://docs.dask.org/en/latest/high-level-graphs.html&lt;/a&gt;&lt;/p&gt;
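&lt;p&gt;As a rough illustration of how compact this is, the minimal sketch below (assuming a recent Dask release in which &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.dask&lt;/span&gt;&lt;/code&gt; on a Dask array returns a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HighLevelGraph&lt;/span&gt;&lt;/code&gt;) compares the number of high level layers in a small array computation with the number of low level tasks those layers expand into. The array sizes are arbitrary choices for the example.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import dask.array as da

# A 10 x 10 grid of chunks, so each operation below touches 100 blocks
x = da.ones((1000, 1000), chunks=(100, 100))
y = (x + 1).sum()

hlg = y.dask  # the HighLevelGraph behind the collection
n_layers = len(hlg.layers)  # one entry per high level operation
n_tasks = len(dict(hlg))  # materializes every low level task
print(n_layers, n_tasks)
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The exact numbers depend on the Dask version, but the layer count stays small while the task count grows with the number of chunks.&lt;/p&gt;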
&lt;/section&gt;
&lt;section id="why-are-they-useful"&gt;
&lt;h2&gt;Why are they useful?&lt;/h2&gt;
&lt;p&gt;High level graphs are useful for faster scheduling.
Instead of sending a very large low level task graph from the client to the scheduler, we can send the much smaller high level graph representation. Reducing the amount of data that needs to be passed around allows us to improve overall performance.&lt;/p&gt;
&lt;p&gt;You can read more about faster scheduling in &lt;a class="reference external" href="https://blog.dask.org/2020/07/21/faster-scheduling"&gt;our previous blogpost&lt;/a&gt;.
More recently, Adam Breindel has written about this over on the Coiled blog (&lt;a class="reference external" href="https://coiled.io/blog/dask-under-the-hood-scheduler-refactor/"&gt;link&lt;/a&gt;).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="do-i-need-to-change-my-code-to-use-them"&gt;
&lt;h2&gt;Do I need to change my code to use them?&lt;/h2&gt;
&lt;p&gt;No. This work is being done under the hood in Dask, and you should see some speed improvements without having to change anything in your code.&lt;/p&gt;
&lt;p&gt;In fact, you might already be benefitting from high level graphs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;“Starting with Dask 2021.05.0, Dask DataFrame computations will start sending HighLevelGraph’s directly from the client to the scheduler by default. Because of this, users should observe a much smaller delay between when they call .compute() and when the corresponding tasks begin running on workers for large DataFrame computations” https://coiled.io/blog/dask-heartbeat-by-coiled-2021-06-10/&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Read on for a snapshot of progress in other areas.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="blockwise-layers-progress"&gt;
&lt;h1&gt;Blockwise layers progress&lt;/h1&gt;
&lt;section id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Blockwise&lt;/span&gt;&lt;/code&gt; high level graph layer was introduced in the 2020.12.0 Dask release. Since then, a lot of effort has gone into using the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Blockwise&lt;/span&gt;&lt;/code&gt; high level graph layer wherever possible for improved performance, especially for IO operations. The following is a non-exhaustive list.&lt;/p&gt;
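&lt;p&gt;To check whether a given operation produces a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Blockwise&lt;/span&gt;&lt;/code&gt; layer, you can inspect the layers of its high level graph directly. The sketch below assumes the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.blockwise.Blockwise&lt;/span&gt;&lt;/code&gt; class and uses a simple elementwise array operation as the example; it is an illustration, not how the pull requests listed here were tested.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import dask.array as da
from dask.blockwise import Blockwise

x = da.ones((1000, 1000), chunks=(100, 100))
y = x + 1  # an elementwise operation

# Array layers are keyed by the name of the collection they produce
layer = y.dask.layers[y.name]
print(type(layer).__name__)
print(isinstance(layer, Blockwise))
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;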
&lt;/section&gt;
&lt;section id="work-to-date"&gt;
&lt;h2&gt;Work to date&lt;/h2&gt;
&lt;p&gt;Highlights include (in no particular order):&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Rick Zamora: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7415"&gt;Use Blockwise for DataFrame IO (parquet, csv, and orc) #7415&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Rick Zamora: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7625"&gt;Move read_hdf to Blockwise #7625&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Rick Zamora: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7615"&gt;Move timeseries and daily-stock to Blockwise #7615&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merged PR by John Kirkham: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7704"&gt;Rewrite da.fromfunction w/ da.blockwise #7704&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="ongoing-work"&gt;
&lt;h2&gt;Ongoing work&lt;/h2&gt;
&lt;p&gt;Lots of other work with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Blockwise&lt;/span&gt;&lt;/code&gt; is currently in progress:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Ian Rose: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7417"&gt;Blockwise array creation redux #7417&lt;/a&gt;. This PR creates blockwise implementations for the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;from_array&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;from_zarr&lt;/span&gt;&lt;/code&gt; functions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rick Zamora: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7628"&gt;Move DataFrame from_array and from_pandas to Blockwise #7628&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bruce Merry: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7686"&gt;Use BlockwiseDep for map_blocks with block_id or block_info #7686&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="a-high-level-graph-for-map-overlap"&gt;
&lt;h1&gt;A high level graph for map overlap&lt;/h1&gt;
&lt;section id="id1"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;Investigating a high level graph for Dask’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt; is a project driven by &lt;a class="reference external" href="https://github.com/dask/dask/discussions/7404"&gt;user needs&lt;/a&gt;. People have told us that the time taken just to generate the task graph (before any actual computation takes place) can sometimes be a big user experience problem. So, we’re looking into ways to improve it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="id2"&gt;
&lt;h2&gt;Work to date&lt;/h2&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Genevieve Buckley: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7595"&gt;A HighLevelGraph abstract layer for map_overlap #7595&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This PR defers much of the computation involved in creating the Dask task graph, but does not reduce the total end-to-end computation time. Further optimization is therefore required.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="id3"&gt;
&lt;h2&gt;Ongoing work&lt;/h2&gt;
&lt;p&gt;Follow-up work includes:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/issues/7788"&gt;Find number of tasks in overlap layer without materializing the layer #7788&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/dask/issues/7789"&gt;Implement cull method for ArrayOverlapLayer #7789&lt;/a&gt; (culling simplifies a Dask graph by removing tasks that are not needed to compute the requested outputs)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
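&lt;p&gt;The sketch below shows the culling idea on a graph stored as a plain dictionary. Dask’s own low-level culling lives in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.optimization.cull&lt;/span&gt;&lt;/code&gt;, but this is not that function. Both the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cull_graph&lt;/span&gt;&lt;/code&gt; helper and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;{key: [dependencies]}&lt;/span&gt;&lt;/code&gt; encoding are hypothetical and only illustrate the technique. Starting from the requested outputs, it walks backwards through the dependencies and keeps only the keys it reaches.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# Hypothetical sketch of culling; dependencies are stored explicitly as
# {key: [keys it depends on]} to keep the example simple.
def cull_graph(dependencies, outputs):
    needed = set()
    stack = list(outputs)
    while stack:
        key = stack.pop()
        if key in needed:
            continue
        needed.add(key)
        # Walk backwards to everything this key depends on
        stack.extend(dependencies[key])
    return {key: deps for key, deps in dependencies.items() if key in needed}


graph = {
    "load-0": [],
    "load-1": [],
    "overlap-0": ["load-0", "load-1"],
    "overlap-1": ["load-0", "load-1"],
    "result-0": ["overlap-0"],
    "result-1": ["overlap-1"],
}
# Asking only for result-0 drops result-1 and overlap-1
culled = cull_graph(graph, ["result-0"])
print(sorted(culled))  # ['load-0', 'load-1', 'overlap-0', 'result-0']
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;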
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="slicing-and-high-level-graphs"&gt;
&lt;h1&gt;Slicing and high level graphs&lt;/h1&gt;
&lt;section id="id4"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;Profiling &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt;, we saw that a lot of time was spent in slicing operations. So, slicing was a logical next step for investigating performance improvements with high level graphs.&lt;/p&gt;
&lt;p&gt;Meanwhile, Rick Zamora has been working on the dataframe side of Dask, using high level graphs to improve dataframe slicing/selections.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="id5"&gt;
&lt;h2&gt;Work to date&lt;/h2&gt;
&lt;p&gt;A couple of minor bugfixes/improvements:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Genevieve Buckley: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7787"&gt;SimpleShuffleLayer should compare parts_out with set(self.parts_out) #7787&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Genevieve Buckley: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7775"&gt;Make Layer get_output_keys officially an abstract method #7775&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="id6"&gt;
&lt;h2&gt;Ongoing work&lt;/h2&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Rick Zamora: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7663"&gt;[WIP] Add DataFrameGetitemLayer to simplify HLG Optimizations #7663&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Genevieve Buckley: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7655"&gt;Array slicing HighLevelGraph layer #7655&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
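&lt;p&gt;Part of what makes a slicing layer attractive is that the chunk sizes alone are enough to work out which input blocks a slice touches, before any tasks exist. The sketch below does that bookkeeping for a single axis. &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;blocks_for_slice&lt;/span&gt;&lt;/code&gt; is a hypothetical helper, not Dask’s internal slicing code. It assumes a step of 1 and non-negative start and stop values, whereas real array slicing also has to handle steps, negative indices and multiple axes.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# Hypothetical helper, not Dask's slicing code: one axis, step 1,
# and non-negative start/stop only.
def blocks_for_slice(chunks, start, stop):
    pieces = []
    block_start = 0
    for block_index, size in enumerate(chunks):
        block_stop = block_start + size
        # Global positions covered by both the slice and this block
        covered = range(max(start, block_start), min(stop, block_stop))
        if covered:  # an empty range means the slice misses this block
            local = slice(covered.start - block_start, covered.stop - block_start)
            pieces.append((block_index, local))
        block_start = block_stop
    return pieces


# An axis of length 12 split into three chunks of 4
print(blocks_for_slice((4, 4, 4), 3, 9))
# [(0, slice(3, 4, None)), (1, slice(0, 4, None)), (2, slice(0, 1, None))]
print(blocks_for_slice((4, 4, 4), 5, 7))
# [(1, slice(1, 3, None))]
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;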
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="visualization"&gt;
&lt;h1&gt;Visualization&lt;/h1&gt;
&lt;section id="id7"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;We’ve also put some work into making better visualizations for Dask objects (including high level graphs).&lt;/p&gt;
&lt;p&gt;Defining a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;_repr_html_&lt;/span&gt;&lt;/code&gt; method for your classes is a great way to get nice HTML output when you’re working with Jupyter notebooks. You can read &lt;a class="reference external" href="http://matthewrocklin.com/blog/2019/07/04/html-repr"&gt;this post&lt;/a&gt; to see more neat HTML representations in other scientific Python libraries.&lt;/p&gt;
&lt;p&gt;Dask already uses HTML representations in lots of places (like the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Array&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;DataFrame&lt;/span&gt;&lt;/code&gt; classes). We now have new HTML representations for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HighLevelGraph&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Layer&lt;/span&gt;&lt;/code&gt; objects, as well as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Scheduler&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Client&lt;/span&gt;&lt;/code&gt; objects in Dask distributed.&lt;/p&gt;
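&lt;p&gt;Here is a minimal sketch of the pattern. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LayerSummary&lt;/span&gt;&lt;/code&gt; class and its values are hypothetical and not part of Dask. The class defines both a plain-text &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;__repr__&lt;/span&gt;&lt;/code&gt; and a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;_repr_html_&lt;/span&gt;&lt;/code&gt; method. Jupyter renders the HTML when it is available and otherwise falls back to the text. The table is built with the standard library’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;xml.etree.ElementTree&lt;/span&gt;&lt;/code&gt;, so any special characters in names are escaped automatically.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import xml.etree.ElementTree as ET


class LayerSummary:
    # Hypothetical class standing in for a graph layer
    def __init__(self, name, n_tasks):
        self.name = name
        self.n_tasks = n_tasks

    def __repr__(self):
        # Plain-text representation, used outside notebooks
        return f"LayerSummary(name={self.name!r}, n_tasks={self.n_tasks})"

    def _repr_html_(self):
        # Jupyter calls this method, if present, and renders the returned HTML
        table = ET.Element("table")
        for label, value in [("Layer", self.name), ("Tasks", self.n_tasks)]:
            row = ET.SubElement(table, "tr")
            ET.SubElement(row, "th").text = label
            ET.SubElement(row, "td").text = str(value)
        # ElementTree escapes special characters in the text for us
        return ET.tostring(table, encoding="unicode")


layer = LayerSummary("make-timeseries", 30)
print(repr(layer))          # plain-text representation
print(layer._repr_html_())  # the HTML a notebook would render
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;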
&lt;/section&gt;
&lt;section id="id8"&gt;
&lt;h2&gt;Work to date&lt;/h2&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Jacob Tomlinson: &lt;a class="reference external" href="https://github.com/dask/distributed/pull/4857"&gt;Add HTML repr to scheduler_info and incorporate into client and cluster reprs #4857&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Jacob Tomlinson: &lt;a class="reference external" href="https://github.com/dask/distributed/pull/4853"&gt;HTML reprs Client.who_has &amp;amp; Client.has_what #4853&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Genevieve Buckley: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7763"&gt;Implementation of HTML repr for HighLevelGraph layers #7763&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Genevieve Buckley: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7716"&gt;Automatically show graph visualization in jupyter notebooks #7716&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merged PR by Genevieve Buckley: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7309"&gt;Adding chunks and type information to dask high level graphs #7309&lt;/a&gt;. This PR inserts extra information into the high level graph, so that richer visualizations can be built from this extra context later on.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="example"&gt;
&lt;h2&gt;Example&lt;/h2&gt;
&lt;section id="before"&gt;
&lt;h3&gt;Before:&lt;/h3&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;highlevelgraph&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HighLevelGraph&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="mh"&gt;0x7f9851b7e4f0&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="after-html-representation"&gt;
&lt;h3&gt;After (HTML representation):&lt;/h3&gt;
&lt;img src="/images/2021-highlevelgraph-html-repr.png" alt="HTML representation for a Dask high level graph" width="700" height="470"&gt;
&lt;/section&gt;
&lt;section id="after-text-only-representation"&gt;
&lt;h3&gt;After (text-only representation):&lt;/h3&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.datasets&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;timeseries&lt;/span&gt;

&lt;span class="n"&gt;ddf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timeseries&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;tasks&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;HighLevelGraph&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;highlevelgraph&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HighLevelGraph&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="mh"&gt;0x7fc259015b80&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
 &lt;span class="mf"&gt;0.&lt;/span&gt; &lt;span class="n"&gt;make&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;timeseries&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;94&lt;/span&gt;&lt;span class="n"&gt;aab6e7236cbd9828bcbfb35fe6caee&lt;/span&gt;
 &lt;span class="mf"&gt;1.&lt;/span&gt; &lt;span class="n"&gt;simple&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cd01443e43b7a6eb9810ad67992c40b6&lt;/span&gt;
 &lt;span class="mf"&gt;2.&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;simple&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cd01443e43b7a6eb9810ad67992c40b6&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This gives us a much more meaningful representation, and is already being used by developers working on high level graphs.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="documentation"&gt;
&lt;h1&gt;Documentation&lt;/h1&gt;
&lt;p&gt;Finally, the documentation around high level graphs is sparse. This is because they’re relatively new, and have also been undergoing quite a bit of change. However, this makes it difficult for people to learn how to use them or contribute to them. We’re planning to improve the documentation, for both users and developers of Dask.&lt;/p&gt;
&lt;p&gt;If you’d like to follow these discussions, or help out, you can subscribe to the issues:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;For Dask users: &lt;a class="reference external" href="https://github.com/dask/dask/issues/7709"&gt;Update HighLevelGraph documentation #7709&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For Dask developers: &lt;a class="reference external" href="https://github.com/dask/dask/issues/7755"&gt;Document dev process around high level graphs #7755&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/07/07/high-level-graphs/"/>
    <summary>Document headings start at H2, not H1 [myst.header]</summary>
    <published>2021-07-07T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/07/02/ragged-output/</id>
    <title>Ragged output, how to handle awkward shaped results</title>
    <updated>2021-07-02T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">
&lt;section id="executive-summary"&gt;

&lt;p&gt;This blogpost explains some of the difficulties associated with distributed computation and ragged or irregularly shaped outputs. We present a recommended method for using Dask in these circumstances.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="background"&gt;
&lt;h1&gt;Background&lt;/h1&gt;
&lt;p&gt;Often, we come across workflows where analyzing the data involves searching for features (which may or may not be present) then computing some results from those features.
Because we don’t know ahead of time how many features will be found, we can expect the processing output size to vary.&lt;/p&gt;
&lt;p&gt;For distributed workloads, we need to split up the data, process it, and then recombine the results. That means ragged output can cause problems (like broadcasting errors) when Dask combines the output.&lt;/p&gt;
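&lt;p&gt;A small NumPy illustration of the problem (a stand-alone example, not Dask internals): per-chunk results of different lengths cannot be combined element-wise, but one-dimensional results can still be joined end to end. The approach in this post relies on that second property.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import numpy as np

# Results from two chunks that found different numbers of features
outputs = [np.arange(3), np.arange(5)]

# Element-wise combination assumes compatible shapes, so it fails
try:
    outputs[0] + outputs[1]
except ValueError as err:
    print("Broadcasting error:", err)

# Joining 1D results end to end works regardless of their lengths
combined = np.concatenate(outputs)
print(combined.shape)  # (8,)
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;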
&lt;/section&gt;
&lt;section id="problem-constraints"&gt;
&lt;h1&gt;Problem constraints&lt;/h1&gt;
&lt;p&gt;In this blogpost, we’ll look at an example with the following constraints:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Input array data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A processing function requiring overlap between chunks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output of varying (ragged) size returned by the processing function&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="solution"&gt;
&lt;h1&gt;Solution&lt;/h1&gt;
&lt;p&gt;The simplest strategy is a two-step process:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Expand the array chunks using the &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html?#dask.array.overlap.overlap"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overlap&lt;/span&gt;&lt;/code&gt; function&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html#dask.array.map_blocks"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; with the &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html#dask.array.map_blocks"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;drop_axis&lt;/span&gt;&lt;/code&gt; keyword argument&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
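&lt;p&gt;To see what these two steps do to the data, here is a NumPy-only emulation. It is a simplified sketch, not Dask’s implementation, and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overlap_then_map&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;large_first_column&lt;/span&gt;&lt;/code&gt; are hypothetical names. It chunks along the first axis only, and it uses &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;np.pad&lt;/span&gt;&lt;/code&gt; with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mode="symmetric"&lt;/span&gt;&lt;/code&gt; as a stand-in boundary condition. That stand-in is an assumption and is not claimed to match Dask’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;boundary="reflect"&lt;/span&gt;&lt;/code&gt; exactly. Each expanded chunk goes through a processing function whose output length varies, and the one-dimensional results are concatenated.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import numpy as np


def overlap_then_map(arr, chunk, depth, func):
    # Simplified stand-in for overlap + map_blocks(drop_axis=1):
    # chunks only along axis 0, with symmetric padding at the array edges
    padded = np.pad(arr, ((depth, depth), (0, 0)), mode="symmetric")
    outputs = []
    for start in range(0, arr.shape[0], chunk):
        # Rows start .. start + chunk - 1 of arr, plus `depth` extra rows on each side
        expanded = padded[start : start + chunk + 2 * depth]
        outputs.append(func(expanded))
    # Each call may return a different number of values; 1D results concatenate
    return np.concatenate(outputs)


def large_first_column(block):
    # Hypothetical ragged processing function: keep first-column values above 10
    first_column = block[:, 0]
    return first_column[np.greater(first_column, 10)]


arr = np.arange(20).reshape(10, 2)
result = overlap_then_map(arr, chunk=5, depth=1, func=large_first_column)
print(result)  # [12 14 16 18 18]
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Values that fall in padded or overlapping rows can be reported more than once. Here 18 appears twice because the last row is mirrored by the padding, and a real processing function may need to account for this.&lt;/p&gt;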
&lt;/section&gt;
&lt;section id="example-code"&gt;
&lt;h1&gt;Example code&lt;/h1&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# example input data&lt;/span&gt;
&lt;span class="n"&gt;expanded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;overlap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;overlap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;boundary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;reflect&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# processing_func must return 1D output; see the example functions in the next section&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;expanded&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processing_func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drop_axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="multiple-output-types-supported"&gt;
&lt;h1&gt;Multiple output types supported&lt;/h1&gt;
&lt;p&gt;This pattern supports multiple types of output from the processing function, including:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;numpy arrays&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;pandas Series&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;pandas DataFrames&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can try this for yourself with any of the example processing functions below, each of which returns dummy output of random length. Or, you can try out a function of your own.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Random length, 1D output returned&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pandas&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# function returns numpy array&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;processing_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;random_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# function returns pandas series&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;processing_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;random_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_series&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_series&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# function returns pandas dataframe&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;processing_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;random_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;x_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;y_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;random_length&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;y_data&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="why-can-t-i-use-map-overlap-or-reduction"&gt;
&lt;h1&gt;Why can’t I use &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt; or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;reduction&lt;/span&gt;&lt;/code&gt;?&lt;/h1&gt;
&lt;p&gt;Ragged output sizes can cause &lt;a class="reference external" href="https://numpy.org/doc/stable/user/basics.broadcasting.html"&gt;broadcasting&lt;/a&gt; errors when the outputs are combined for some Dask functions.&lt;/p&gt;
&lt;p&gt;However, if ragged output sizes aren’t a constraint for your particular programming problem, then you can continue to use the Dask &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html?#dask.array.overlap.map_overlap"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; and &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html?#dask.array.reduction"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;reduction&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; functions as much as you like.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternative-solution"&gt;
&lt;h1&gt;Alternative solution&lt;/h1&gt;
&lt;section id="dask-delayed"&gt;
&lt;h2&gt;Dask delayed&lt;/h2&gt;
&lt;p&gt;As an alternative solution, you can use &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;Dask delayed&lt;/a&gt; (a tutorial is &lt;a class="reference external" href="https://tutorial.dask.org/01_dask.delayed.html"&gt;available here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Advantages:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Your processing function can have any type of output (it is not restricted to NumPy or pandas objects).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There is more flexibility in the ways you can use Dask delayed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Disadvantages:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;You will have to handle combining the outputs yourself.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You will have to be more careful about performance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For example, because the code below calls &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dd.from_delayed&lt;/span&gt;&lt;/code&gt; on each delayed block in a list comprehension, it’s very important for performance that we pass in the expected metadata; otherwise Dask computes each block up front just to infer its columns and dtypes. Fortunately, Dask has a &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.utils.make_meta"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;make_meta&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function available.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can read more about performance considerations for Dask delayed and &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed-best-practices.html"&gt;best practices here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
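&lt;p&gt;Because the metadata must describe what the processing function actually returns, it can help to check it against one small local call before building the Dask graph. The sketch below uses only NumPy and pandas. It assumes a hypothetical, non-delayed stand-in named &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;processing_func_local&lt;/span&gt;&lt;/code&gt; that mimics the dummy function in the example code. The hand-built empty DataFrame is the same kind of object that &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;make_meta&lt;/span&gt;&lt;/code&gt; returns for a list of (name, dtype) pairs.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np
import pandas as pd


def processing_func_local(block, seed=0):
    """Hypothetical stand-in for processing_func: a ragged DataFrame output."""
    rng = np.random.default_rng(seed)
    n_rows = int(rng.integers(1, 10))  # 1 to 9 rows, like randint(1, 10)
    return pd.DataFrame({"x": np.arange(n_rows, dtype=np.int64),
                         "y": rng.random(n_rows)})  # random floats in [0, 1)


# Empty DataFrame with the expected column names and dtypes.
meta = pd.DataFrame({"x": pd.Series(dtype="int64"),
                     "y": pd.Series(dtype="float64")})

sample = processing_func_local(np.ones((10, 10)))
print(sample.dtypes)

# Column names and dtypes must line up: "y" holds floats, not integers.
columns_match = list(sample.columns) == list(meta.columns)
dtypes_match = list(sample.dtypes) == list(meta.dtypes)
print(columns_match, dtypes_match)
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Since the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;y&lt;/span&gt;&lt;/code&gt; column is filled with random floats, its metadata dtype should be float64 rather than int64. A check like this catches that kind of mismatch early.&lt;/p&gt;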
&lt;p&gt;Example code:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pandas&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask&lt;/span&gt;

&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nd"&gt;@dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delayed&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;processing_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# returns dummy dataframe output&lt;/span&gt;
    &lt;span class="n"&gt;random_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_length&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                         &lt;span class="s1"&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_length&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

&lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make_meta&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="n"&gt;expanded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;overlap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;overlap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;boundary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;reflect&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;blocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;expanded&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_delayed&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_delayed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processing_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ddf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
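&lt;p&gt;For intuition about what the example computes, here is a rough NumPy and pandas sketch of the same pattern without a scheduler. It cuts the array into overlapping blocks, runs a ragged-output function on each block, and then concatenates the results. It is an approximation under stated assumptions. &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;np.pad&lt;/span&gt;&lt;/code&gt;’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;"symmetric"&lt;/span&gt;&lt;/code&gt; mode stands in for Dask’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;"reflect"&lt;/span&gt;&lt;/code&gt; boundary and may not match it exactly. &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;processing_func_local&lt;/span&gt;&lt;/code&gt; is a hypothetical stand-in for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;processing_func&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np
import pandas as pd


def processing_func_local(block, seed):
    """Hypothetical stand-in for processing_func: ragged DataFrame output."""
    rng = np.random.default_rng(seed)
    n_rows = int(rng.integers(1, 10))
    return pd.DataFrame({"x": np.arange(n_rows, dtype=np.int64),
                         "y": rng.random(n_rows)})


arr = np.ones((20, 10))
chunk, depth = 10, 2

# Pad the whole array once, then cut each 10x10 chunk plus a 2-element
# halo on every side (approximating da.overlap.overlap with depth=2).
padded = np.pad(arr, depth, mode="symmetric")
blocks = [
    padded[start:start + chunk + 2 * depth, :]
    for start in range(0, arr.shape[0], chunk)
]

# Each block yields a DataFrame of a different length; pd.concat does not care.
results = [processing_func_local(b, seed=i) for i, b in enumerate(blocks)]
df = pd.concat(results, ignore_index=True)
print([b.shape for b in blocks], len(df))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Passing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ignore_index=True&lt;/span&gt;&lt;/code&gt; gives the combined frame a fresh 0 to n-1 index. Without it, each block’s index would restart at 0.&lt;/p&gt;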
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/07/02/ragged-output.md&lt;/span&gt;, line 129)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="summing-up"&gt;
&lt;h1&gt;Summing up&lt;/h1&gt;
&lt;p&gt;That’s it! We’ve learned how to avoid common errors when working with processing functions that return ragged outputs. The method recommended here works well with multiple output types, including NumPy arrays, pandas Series, and pandas DataFrames.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/07/02/ragged-output/"/>
    <summary>How to avoid common errors when working with processing functions that return ragged outputs.</summary>
    <published>2021-07-02T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/06/25/dask-down-under/</id>
    <title>Dask Down Under</title>
    <updated>2021-06-25T00:00:00+00:00</updated>
    <author>
      <name>Nick Mortimer</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/25/dask-down-under.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="executive-summary"&gt;

&lt;p&gt;Dask Down Under was a special event held for the first time last month during the 2021 Dask Summit.
It featured talks, tutorials, and events tailored specifically for an Australian (and wider Oceania) audience.&lt;/p&gt;
&lt;p&gt;To get involved in the new Pangeo Oceania community group,
&lt;a class="reference external" href="https://confirmsubscription.com/h/j/E30A9F4EAC96EA73"&gt;register your interest here&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/25/dask-down-under.md&lt;/span&gt;, line 17)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="contents"&gt;
&lt;h1&gt;Contents&lt;/h1&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#what-is-dask-down-under"&gt;&lt;span class="xref myst"&gt;What is Dask Down Under?&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#who-came"&gt;&lt;span class="xref myst"&gt;Who came?&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#watch-the-talks"&gt;&lt;span class="xref myst"&gt;Watch the talks&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#whats-next"&gt;&lt;span class="xref myst"&gt;What’s next? Here’s how to get involved!&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/25/dask-down-under.md&lt;/span&gt;, line 24)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="what-is-dask-down-under"&gt;
&lt;h1&gt;What is Dask Down Under?&lt;/h1&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Dask Down Under is a chance for everyone in Oceania to forge links and build community here in our backyard. At Dask Down Under we feature talks, tutorials and panel discussions on using Dask to accelerate research. All levels from beginner to expert are encouraged to attend.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Dask Down Under involved two days of events:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;5 talks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2 tutorials&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 panel discussion&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;1 meet and greet networking event&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/25/dask-down-under.md&lt;/span&gt;, line 35)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="who-came"&gt;
&lt;h1&gt;Who came?&lt;/h1&gt;
&lt;p&gt;There was a strong geoscience theme across Dask Down Under, which reflects the strong geoscience research community in our region. People came from government organisations, universities, and industry.&lt;/p&gt;
&lt;p&gt;We expected most attendees would be based in the Asia-Pacific region, since those were the timezones targeted by these events.&lt;/p&gt;
&lt;p&gt;Unexpectedly, we also saw a lot of extra traffic at the talks on day one, likely from US timezones. Publicity from Dask Summit emails and tweets mentioning Dask Down Under resulted in a lot of people stopping by to watch. This more than doubled our live attendance during the first event. It was great to see so much interest coming from other parts of the world, too.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/25/dask-down-under.md&lt;/span&gt;, line 43)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="watch-the-talks"&gt;
&lt;h1&gt;Watch the talks&lt;/h1&gt;
&lt;p&gt;You can watch the talks and tutorials from the Dask Down Under workshop on the Dask YouTube channel.
The &lt;a class="reference external" href="https://www.youtube.com/playlist?list=PLJ0vO2F_f6OAXBfb_SAF2EbJve9k1vkQX"&gt;full playlist for the workshop is available here&lt;/a&gt;.&lt;/p&gt;
&lt;section id="panel-discussion"&gt;
&lt;h2&gt;Panel discussion&lt;/h2&gt;
&lt;p&gt;A panel discussion was held, bringing together a diverse group of users from novice to expert, academic to commercial. We hope this discussion will start a conversation about using Dask in Australia, how we build our community, contribute and stay in touch with the rest of the world. You can &lt;a class="reference external" href="https://www.youtube.com/watch?v=WlSw7rhwGrA"&gt;watch it here&lt;/a&gt;:&lt;/p&gt;
&lt;iframe width="900" height="506" src="https://www.youtube.com/watch?v=WlSw7rhwGrA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;Moderator: Draga Doncila Pop&lt;br /&gt;
Panelists:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Ben Leighton, CSIRO&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tisham Dhar, Geoscience Australia&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Genevieve Buckley, Dask life science fellow&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hugo Bowne-Anderson, Coiled&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="invited-talks"&gt;
&lt;h2&gt;Invited talks&lt;/h2&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://www.youtube.com/playlist?list=PLJ0vO2F_f6OAXBfb_SAF2EbJve9k1vkQX"&gt;full playlist for the workshop is available here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Featured talks include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Draga Doncila Pop: &lt;a class="reference external" href="https://www.youtube.com/watch?v=10Ws59NGDaE&amp;amp;list=PLJ0vO2F_f6OAXBfb_SAF2EbJve9k1vkQX&amp;amp;index=2"&gt;Interactive visualization and near real-time analysis on out-of-core satellite images&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tisham Dhar: &lt;a class="reference external" href="https://www.youtube.com/watch?v=MderTABZvyA&amp;amp;list=PLJ0vO2F_f6OAXBfb_SAF2EbJve9k1vkQX&amp;amp;index=3"&gt;Dask DevOps for Remote Sensing&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kirill Kouzoubov: &lt;a class="reference external" href="https://www.youtube.com/watch?v=9-zBmUSk29Q&amp;amp;list=PLJ0vO2F_f6OAXBfb_SAF2EbJve9k1vkQX&amp;amp;index=4"&gt;Patterns for large scale temporal processing of geo-spatial data using Dask&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ben Leighton and Kim Opie: &lt;a class="reference external" href="https://www.youtube.com/watch?v=Fbh07T1K_IE&amp;amp;list=PLJ0vO2F_f6OAXBfb_SAF2EbJve9k1vkQX&amp;amp;index=6"&gt;Image Processing Using Dask - Using dask and skimage to identify vegetation morphology across the Australian landscape&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nick Mortimer: &lt;a class="reference external" href="https://www.youtube.com/watch?v=YF_GNJdQRQ4&amp;amp;list=PLJ0vO2F_f6OAXBfb_SAF2EbJve9k1vkQX&amp;amp;index=7"&gt;Making the most of your schedule: From HPC to Local Cluster&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/25/dask-down-under.md&lt;/span&gt;, line 74)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="what-s-next"&gt;
&lt;h1&gt;What’s next?&lt;/h1&gt;
&lt;p&gt;Here’s how you can get involved:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Several people have discussed setting up a new Pangeo Oceania group. You can
&lt;a class="reference external" href="https://confirmsubscription.com/h/j/E30A9F4EAC96EA73"&gt;register your interest here&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-none notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&amp;gt; Soon we&amp;#39;ll start holding regular Pangeo Oceania meetups for sharing information, support, training, and workflow advocacy across our region.  We look forward to you helping to shape the Pangeo Oceania community. And if you have a friend or colleague that should be here too, please share this sign-up link: http://bit.ly/Pangeo_email_signup
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Python for Atmosphere and Ocean Science (PyAOS) website provides information and resources to the user community: &lt;a class="reference external" href="https://pyaos.github.io/"&gt;https://pyaos.github.io/&lt;/a&gt;.
To keep the site up to date, the first-ever PyAOS census is being conducted. It would be great if Python users in the atmosphere and/or ocean science community could take a few minutes to fill out the survey:
&lt;a class="reference external" href="https://forms.gle/L84W7bsxmP86G3Ji9"&gt;https://forms.gle/L84W7bsxmP86G3Ji9&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/06/25/dask-down-under/"/>
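    <summary>Dask Down Under was a special event held during the 2021 Dask Summit, with talks, tutorials, and events tailored for an Australian (and wider Oceania) audience.</summary>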
    <category term="Australia" label="Australia"/>
    <category term="DaskSummit" label="Dask Summit"/>
    <category term="geoscience" label="geoscience"/>
    <published>2021-06-25T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/06/18/early-survey/</id>
    <title>Dask Survey 2021, early anecdotes</title>
    <updated>2021-06-18T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">&lt;p&gt;The annual Dask user survey is under way and currently accepting responses at &lt;a class="reference external" href="https://dask.org/survey"&gt;dask.org/survey&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This post provides a preview into early results, focusing on anecdotal responses.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/18/early-survey.md&lt;/span&gt;, line 12)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="motivation"&gt;

&lt;p&gt;The Dask user survey helps developers focus and prioritize our larger efforts.  It’s also a fascinating and rewarding dataset of anecdotal use cases of how people use Dask today.  Thank you to everyone who has participated so far; you make a difference.&lt;/p&gt;
&lt;p&gt;The survey is still open, and I encourage people to speak up about their experience.  This blogpost is intended to encourage participation by giving you a sense for how it affects development, and by sharing user stories provided within the survey.&lt;/p&gt;
&lt;p&gt;This article skips all of the quantitative data that we collect, and focuses in on direct feedback listed in the final comments.  For a more quantitative analysis see the posts from previous years by Tom at &lt;a class="reference external" href="https://blog.dask.org/2020/09/22/user_survey"&gt;2020 Dask User Survey Results&lt;/a&gt; and  &lt;a class="reference external" href="https://blog.dask.org/2019/08/05/user-survey"&gt;2019 Dask User Survey Results&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/18/early-survey.md&lt;/span&gt;, line 20)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="how-can-dask-improve"&gt;
&lt;h1&gt;How can Dask Improve?&lt;/h1&gt;
&lt;p&gt;In this post we’re going to look at answers to this one question. This was a long-form response field asking &lt;em&gt;“How can Dask Improve?”&lt;/em&gt;. Looking through some of the responses we see that a few of them fall into some common themes. I’ve grouped them here.&lt;/p&gt;
&lt;p&gt;In each section we’ll include raw responses, followed by a few comments from me.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/18/early-survey.md&lt;/span&gt;, line 26)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="intermediate-documentation"&gt;
&lt;h1&gt;Intermediate Documentation&lt;/h1&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;More long-form content about the internals of Dask to understand when things don’t work and why. The “Hacking Dask” tutorial in the Dask 2021 summit was precisely the kind of content I really need, because 90% of my time with Dask is spent not understanding why I’m running out of memory and I feel like I’ve ready all the documentation pages 5 times already (although sometimes I also stumble upon a useful page I’ve never seen before).&lt;/p&gt;
&lt;p&gt;There’s also a dearth of documentation of intermediate topics like blockwise in dask.array. (I think I ended up reverse engineering how it worked from docs, GitHub issue comments, reading the code, and black-box reverse engineering with different functions before I finally “got it”.)&lt;/p&gt;
&lt;p&gt;Improve documentation and error messages to cover more of the 2nd-level problems that people run into beyond the first-level tutorial examples.&lt;/p&gt;
&lt;p&gt;more examples for complex concepts (passing metadata to custom functions, for example). more examples/support for using dask arrays and cupy.&lt;/p&gt;
&lt;p&gt;I think the hardest thing about Dask is debugging performance issues with dask delayed and complex mixing of other libraries and not knowing when things are being pickled or not. I am getting better at reading the performance reports, but I think that better documentation and tutorials surrounding understanding the reports would help me greater than new features. For example, make a tutorial that does some non-trivial dask-delayed work (ie not just computing a mean) that is written against best practices and show how the performance improves with each adopted best practice/explain why things were slow with each step. I think there could also be improvements to the performance reports to point out the slowest 5 parts of your code and what lines they are, and possibly relevant docs links.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;section id="response"&gt;
&lt;h2&gt;Response&lt;/h2&gt;
&lt;p&gt;I really like this theme.  We now have a solid community of intermediate-advanced Dask users that we should empower.  We usually write materials that target the broad base of beginning users, but maybe we should rethink this a bit.  There is a lot of good potential material that advanced users have around performance and debugging that could be fun to publish.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/18/early-survey.md&lt;/span&gt;, line 42)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="documentation-organization"&gt;
&lt;h1&gt;Documentation Organization&lt;/h1&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Documentation website is sometimes confusing to navigate, better separation of API and examples would help. Maybe this can inspire: &lt;a class="reference external" href="https://documentation.divio.com/"&gt;https://documentation.divio.com/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I actually think Dask’s documentation is pretty good. But the docs could use some reorganizing – it is often difficult to find the relevant APIs. And there is an incredible amount of HPC insider knowledge that is required to launch a typical workflow - right now much of this knowledge is hidden in the github issues (which is great! but more of it could be pushed into the FAQs to make it more accessible).&lt;/p&gt;
&lt;p&gt;More detailed documentation and examples. Start to finish examples that do not assume I know very much (about Dask, command line tools, Cloud technologies, Kubernetes, etc.).&lt;/p&gt;
&lt;p&gt;I think an easier introduction to delayed/bags and additional examples for more complex use-cases could be helpful.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;section id="id1"&gt;
&lt;h2&gt;Response&lt;/h2&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: INFO/1 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/18/early-survey.md&lt;/span&gt;, line 52); &lt;em&gt;&lt;a href="#id1"&gt;backlink&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Duplicate implicit target name: “response”.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;We get alternating praise and scorn for our documentation.  We have what I would call excellent &lt;em&gt;reference documentation&lt;/em&gt;.  In fact, if anyone wants to build a dynamic distributed task scheduler today I’m going to claim that distributed.dask.org is probably the most comprehensive reference out there.&lt;/p&gt;
&lt;p&gt;However, we lack good &lt;em&gt;narrative documentation&lt;/em&gt;, which is the concern raised by most of these comments. This is hard to do because Dask is used in so many &lt;em&gt;different user narratives&lt;/em&gt;.  It’s challenging to orient the Dask documentation around all of them simultaneously.&lt;/p&gt;
&lt;p&gt;I appreciated the first comment’s direct reference to a website that describes a documentation framework.  In general I’d love to talk to people who lay out documentation semi-professionally and learn more.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/18/early-survey.md&lt;/span&gt;, line 60)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="functionality"&gt;
&lt;h1&gt;Functionality&lt;/h1&gt;
&lt;p&gt;Here is a soup of various feature requests. There are a few themes among them:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Have a better pandas support (like multi-index), which can help me migrate my existing code to Dask.&lt;/p&gt;
&lt;p&gt;I’d like to see better support for actors. I think having a remote object is a common use case.&lt;/p&gt;
&lt;p&gt;Improve Dataframes - multi index!! More feature parity with Pandas API.&lt;/p&gt;
&lt;p&gt;Maybe a little less machine learning, more “classical” big data applications (CDF, PDEs, particle physics etc.). Not everything is map-reducable.&lt;/p&gt;
&lt;p&gt;Better database integration. Re-writing an SQL query in SQL Alchemy can be very impractical. Would also be great if there were better ways to ensure the process didn’t die from misjudging how much memory was needed per chunk.&lt;/p&gt;
&lt;p&gt;Better diagnostic tools; what operations are bottlenecking a task graph? Support for multiindex.&lt;/p&gt;
&lt;p&gt;I do work that regularly requires sorting a DataFrame by multiple columns. Pandas can do this single-core; H2O and Spark can do this multicore and distributed. But dask cannot sort_values() on multiple columns at all (such as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;df.sort_values([&lt;/span&gt; &lt;span class="pre"&gt;&amp;quot;col1&amp;quot;,&lt;/span&gt; &lt;span class="pre"&gt;&amp;quot;col2&amp;quot;&lt;/span&gt; &lt;span class="pre"&gt;,&amp;quot;col3&amp;quot;&lt;/span&gt; &lt;span class="pre"&gt;],&lt;/span&gt; &lt;span class="pre"&gt;ascending=False)&lt;/span&gt;&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Type-hints! It is very tedious using Dask in a huge ML-Application without even having the option to do some static type-checking.&lt;/p&gt;
&lt;p&gt;Additionally it is very frustrating that Dask tries to mimic Pandas API, but then 40% of the API doesn’t work (isn’t implemented), or deviates so far from the Pandas API that some parameters aren’t implemented. Only way to find out about that is to read the docs. With some typehints one could mitigate much of this trial-and-error process when switching from Pandas to Dask.&lt;/p&gt;
&lt;p&gt;It’s hard to track everything around dask!!! Actors are a bit unloved, but I find them super useful&lt;/p&gt;
&lt;p&gt;Type annotations for all methods for better IDE (VSCode) support&lt;/p&gt;
&lt;p&gt;I think the Actor model could use a little love&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;section id="id2"&gt;
&lt;h2&gt;Response&lt;/h2&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: INFO/1 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/18/early-survey.md&lt;/span&gt;, line 88); &lt;em&gt;&lt;a href="#id2"&gt;backlink&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Duplicate implicit target name: “response”.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;There are some interesting trends here, most of which I would not have expected:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;MultiIndex (well, this was expected)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Actors&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Type hinting for IDE support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQL access&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/06/18/early-survey.md&lt;/span&gt;, line 97)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="high-level-optimization"&gt;
&lt;h1&gt;High Level Optimization&lt;/h1&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Needs better physical data independence. Manual data chunking, memory management, query optimization are all a big hassle. Automate those more.&lt;/p&gt;
&lt;p&gt;Dask makes it easy for users with no parallel computing experience to scale up quickly (me), but we have no sense of how to judge our resource needs. It’d be great if Dask had some tools or tutorials that helped me judge the size of my problem (e.g. memory usage). These may already exist, but examples of how to do it may be hard to find.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="runtime-stability-and-advanced-troubleshooting"&gt;
&lt;h1&gt;Runtime Stability and Advanced Troubleshooting&lt;/h1&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Stability is the most important factor&lt;/p&gt;
&lt;p&gt;I have answered no to the Long Term Support version of dask, but often the really great opportunities are those that are on demand. The problem is that when these fixes are released, they’re not well advertised and something under the hood has changed. So it ends up breaking something else, or my particular knowledge of the workings is no longer correct. Dask maintainers have a bit of a weird clique, and as a newbie or a learner it can feel like you’re talked down to, or, in reality, that they don’t have the time to help someone. So they should probably have some more maintainers answering some of the more mundane questions via the blog or via some other method: things we have seen people do wrong or have difficulty with. A bit of basic, a bit of intermediate and a bit of advanced. If the underlying dask API has changed, then these should be updated with new posts explaining what has changed. Show a breakdown of doing it the hard way, so people can see what is done step by step with standard workflows that work, then compare that with dask, with less boilerplate and/or a speed improvement. If there are places where speed isn’t improved, show the difference: where it doesn’t work alongside the workflow where it might.&lt;/p&gt;
&lt;p&gt;We have long deployed dask clusters (weeks to months) and have noticed that they sometimes go into a wonky state. We’ve been unable to identify root cause(s). Redeployment is simple and easy when it does occur, but slightly annoying nonetheless.&lt;/p&gt;
&lt;p&gt;My biggest pain point is the scheduler, as I tend to spend time writing infrastructure to manage the scheduler and breaking apart / rewriting tasks graphs to minimize impact on the scheduler.&lt;/p&gt;
&lt;p&gt;As my answers make clear (and from previous conversations with Matt, James, and Genevieve) the biggest improvement I’d like to see is stable releases. Stable from both a runtime point of view (i.e. rock solid Dask distributed), and from an API point of view (so I don’t have to fix my code every couple of weeks). So a big +1 to LTS releases.&lt;/p&gt;
&lt;p&gt;Better error handling/descriptions of errors, better interoperability between (slightly) different versions&lt;/p&gt;
&lt;p&gt;If something goes wrong (in Dask, the batch system, or the interaction between Dask and the batch system), the problem is very opaque and difficult to diagnose. Dask needs significant additional documentation, and probably additional features, to make debugging easier and more transparent.&lt;/p&gt;
&lt;p&gt;Better ways of getting out logs of worker memory usage, especially after dask crashes/failures. Ways of getting performance reports written to log files, rather than html files which don’t write if the dask client process fails.&lt;/p&gt;
&lt;p&gt;Two big problems for me when dask fails are determining what went wrong and how to fix it.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;section id="id3"&gt;
&lt;h2&gt;Response&lt;/h2&gt;
&lt;p&gt;Stability definitely took a dive last December.  I’m feeling good right now though.  There is a lot of good work that should be merged in and released in the next few weeks that I think will significantly improve many of the common pain points.&lt;/p&gt;
&lt;p&gt;However, there are still many significant improvements yet to be made.  I particularly like the theme above of reporting and logging when things fail.  We’re ok at this today, but there is a lot of room for growth.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="what-s-next"&gt;
&lt;h1&gt;What’s Next?&lt;/h1&gt;
&lt;p&gt;Do the views above fully express your thoughts on where Dask should go, or is there something missing?&lt;/p&gt;
&lt;p&gt;Share your perspective at &lt;a class="reference external" href="https://dask.org/survey"&gt;&lt;strong&gt;dask.org/survey&lt;/strong&gt;&lt;/a&gt;.  The whole process should take less than five minutes.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/06/18/early-survey/"/>
    <summary>The annual Dask user survey is under way and currently accepting responses at dask.org/survey.</summary>
    <published>2021-06-18T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/06/01/dask-distributed-user-journey/</id>
    <title>The evolution of a Dask Distributed user</title>
    <updated>2021-06-01T00:00:00+00:00</updated>
    <author>
      <name>Jacob Tomlinson (NVIDIA)</name>
    </author>
    <content type="html">&lt;p&gt;This week was the 2021 Dask Summit and &lt;a class="reference external" href="https://summit.dask.org/schedule/presentation/20/deploying-dask/"&gt;one of the workshops&lt;/a&gt; that we ran covered many deployment options for Dask Distributed.&lt;/p&gt;
&lt;p&gt;We covered local deployments, SSH, Hadoop, Kubernetes, the Cloud and managed services, but one question that came up a few times was “where do I start?”.&lt;/p&gt;
&lt;p&gt;I wanted to share the journey that I’ve seen many Dask users take, in the hope that you may recognize yourself somewhere along this path and that it may show you where to look next.&lt;/p&gt;
&lt;section id="in-the-beginning"&gt;
&lt;h1&gt;In the beginning&lt;/h1&gt;
&lt;p&gt;As a user who is new to Dask you’re likely working your way through &lt;a class="reference external" href="https://docs.dask.org/en/latest/index.html"&gt;the documentation&lt;/a&gt; or perhaps &lt;a class="reference external" href="https://github.com/dask/dask-tutorial"&gt;a tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We often introduce the concept of the distributed scheduler early on, but you don’t need it to get initial benefits from Dask. Switching from Pandas to Dask for larger than memory datasets is a common entry point and performs perfectly well using the default threaded scheduler.&lt;/p&gt;
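&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Switching from this (pandas does not expand globs, so read each file and concatenate)&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;glob&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pandas&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pd&lt;/span&gt;
&lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/data/.../2018-*-*.csv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;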

&lt;span class="c1"&gt;# To this&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/data/.../2018-*-*.csv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;But by the time you’re a few pages into the documentation you’re already being encouraged to create &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Client()&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LocalCluster()&lt;/span&gt;&lt;/code&gt; objects.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note&lt;/strong&gt;: When you create a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Client()&lt;/span&gt;&lt;/code&gt; with no arguments or configuration set, Dask will launch a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LocalCluster()&lt;/span&gt;&lt;/code&gt; object for you under the hood. So in the common case &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Client()&lt;/span&gt;&lt;/code&gt; is equivalent to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Client(LocalCluster())&lt;/span&gt;&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Many users stick around at this stage: they launch a local distributed scheduler and do their work, making the most of the resources on their local machine.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/data/.../2018-*-*.csv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="breaking-free-from-your-machine"&gt;
&lt;h1&gt;Breaking free from your machine&lt;/h1&gt;
&lt;p&gt;Once you get used to task graphs and work scheduling you may begin thinking about how you can expand your computation beyond your local machine.&lt;/p&gt;
&lt;p&gt;Our code doesn’t really need to change much. We are already connecting a client and doing Dask work; all we need are more networked machines with the same user environments, data, etc.&lt;/p&gt;
&lt;p&gt;Personally I used to work in an organisation where every researcher was given a Linux desktop under their desk. These machines were on a LAN and had Active Directory and user home directories stored on a storage server. This meant you could sit down at any desk and log in and have a consistent experience. This also meant you could SSH to another machine on the network and your home directory would be there with all your files including your data and conda environments.&lt;/p&gt;
&lt;p&gt;This is a common setup in many organisations, and it can be tempting to SSH onto the machines of folks who may not be fully utilising them and run your work there. And I’m sure you ask first, right?&lt;/p&gt;
&lt;p&gt;Organisations may also have servers in racks designated for computational use and the setup will be similar. You can SSH onto them and home directories and data are available via network storage.&lt;/p&gt;
&lt;p&gt;With Dask Distributed you can start to expand your workload onto these machines using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;SSHCluster&lt;/span&gt;&lt;/code&gt;. All you need is your SSH keys set up so you can log into those machines without a password.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SSHCluster&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;

&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SSHCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;localhost&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;alices-desktop.lan&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;bobs-desktop.lan&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;&amp;quot;team-server.lan&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/data/.../2018-*-*.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
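&lt;p&gt;Now the same workload can run on all of the CPUs in our little ad-hoc cluster, using all the memory and pulling data from the same shared storage. Note that &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;SSHCluster&lt;/span&gt;&lt;/code&gt; starts the scheduler on the first host in the list and workers on the remaining hosts, so here &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;localhost&lt;/span&gt;&lt;/code&gt; only runs the scheduler; list it a second time if you also want a worker on your own machine.&lt;/p&gt;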
&lt;/section&gt;
&lt;section id="moving-to-a-compute-platform"&gt;
&lt;h1&gt;Moving to a compute platform&lt;/h1&gt;
&lt;p&gt;Using (and abusing) hardware like desktops and shared servers will get you reasonably far, but probably to the dismay of your IT team.&lt;/p&gt;
&lt;p&gt;Organisations who have many users trying to perform large compute workloads will probably be thinking about or already have some kind of platform that is designated for running this work.&lt;/p&gt;
&lt;p&gt;The platforms your organisation has will be the result of many somewhat arbitrary technology choices. What programming languages does your company use? What deals did vendors offer at the time of procurement? What skills do the current IT staff have? What did your CTO have for breakfast the day they chose a vendor?&lt;/p&gt;
&lt;p&gt;I’m not saying these decisions are made thoughtlessly, but the criteria that are considered are often orthogonal to how the resource will ultimately be used by you. At Dask, we support whatever platform decisions your organisation makes. We try to build deployment tools for as many popular platforms as we can, including:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Hadoop via &lt;a class="reference external" href="https://github.com/dask/dask-yarn"&gt;dask-yarn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kubernetes via &lt;a class="reference external" href="https://github.com/dask/dask-kubernetes"&gt;dask-kubernetes&lt;/a&gt; and the &lt;a class="reference external" href="https://github.com/dask/helm-chart"&gt;helm chart&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HPC (with schedulers like SLURM, PBS and SGE) via &lt;a class="reference external" href="https://github.com/dask/dask-jobqueue"&gt;dask-jobqueue&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cloud platforms (including AWS, Azure and GCP) with &lt;a class="reference external" href="https://github.com/dask/dask-cloudprovider"&gt;dask-cloudprovider&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a user within an organisation, you may have been onboarded to one of these platforms. You’ve probably been given some credentials and a little training on how to launch jobs on it.&lt;/p&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-foo&lt;/span&gt;&lt;/code&gt; tools listed above are designed to sit on top of those platforms and submit jobs on your behalf as if they were individual compute jobs. But instead of submitting a Python script to the platform we submit Dask schedulers and workers and then connect to them to leverage the provisioned resource. Clusters on top of clusters.&lt;/p&gt;
&lt;p&gt;With this approach your IT team has full control over the compute resource. They can ensure folks get their fair share with quotas and queues. But you, as a user, get the same Dask experience you are used to on your local machine.&lt;/p&gt;
&lt;p&gt;Your data may be in a slightly different place on these platforms, though. Perhaps you are on the cloud and your data is in object storage, for example. Thankfully, with tools built on &lt;a class="reference external" href="https://filesystem-spec.readthedocs.io/en/latest/"&gt;fsspec&lt;/a&gt; like &lt;a class="reference external" href="https://github.com/dask/s3fs"&gt;s3fs&lt;/a&gt; or &lt;a class="reference external" href="https://pypi.org/project/adlfs/"&gt;adlfs&lt;/a&gt;, we can read this data in pretty much the same way, so there is still not much change to your workflow.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cloudprovider.azure&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AzureVMCluster&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;

&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AzureVMCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resource_group&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;lt;resource group&amp;gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;vnet&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;lt;vnet&amp;gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;security_group&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&amp;lt;security group&amp;gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;n_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;adl://.../2018-*-*.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="centralizing-your-dask-resources"&gt;
&lt;h1&gt;Centralizing your Dask resources&lt;/h1&gt;
&lt;p&gt;When your organisation gets enough folks adopting and using Dask, it may be time for your IT team to step in and provide you with a managed service. Having many users submitting many ad-hoc clusters in a myriad of ways is likely to be less efficient than a centrally managed and, more importantly, ordained service from IT.&lt;/p&gt;
&lt;p&gt;The motivation to move to a managed service is often driven at the organisational level rather than by individuals. Once you’ve reached this stage of Dask usage you’re probably quite comfortable with your workflows, and it may be inconvenient to change them. However, the level of Dask deployment knowledge you’ve acquired to reach this stage is probably quite large, and as Dask usage at your organisation grows it’s not practical to expect everyone to reach the same level of competency.&lt;/p&gt;
&lt;p&gt;At the end of the day being an expert in deploying distributed systems is probably not listed in your job description and you probably have something more important to be getting on with like data science, finance, physics, biology or whatever it is Dask is helping you do.&lt;/p&gt;
&lt;p&gt;You may also be feeling some pressure from IT. You are running clusters on top of clusters, and to them your Dask cluster is a black box, which can make them uncomfortable as they are the ones responsible for this hardware. It is common to feel constrained by your IT team; I know because I’ve been a sysadmin and used to constrain folks. But the motivations of your IT team are good ones: they are trying to save the organisation money, make the best use of limited resources and ultimately get the IT out of your way so that you can get on with your job. So lean into this, engage with them, share your Dask knowledge and offer to become a pilot user for whatever solution they end up building.&lt;/p&gt;
&lt;p&gt;One approach you could recommend they take is to deploy &lt;a class="reference external" href="https://gateway.dask.org/"&gt;Dask Gateway&lt;/a&gt;. This can be deployed by an administrator and provides a central hub which launches Dask clusters on behalf of users. It supports many types of authentication so it can hook into whatever your organisation uses and supports many of the same backend compute platforms that the standalone tools do, including Kubernetes, Hadoop and HPC.&lt;/p&gt;
&lt;p&gt;This will allow them to ensure security settings are correct and consistent across clusters. If you are using containers, they probably want you to use some official images which are regularly updated and vulnerability scanned. It may also give them more insight into what types of workloads folks are running, helping them plan future systems more accurately. Using Dask Gateway puts the control of, and responsibility for, these things on their side of the fence.&lt;/p&gt;
&lt;p&gt;Users will need to authenticate with the gateway, but then can launch Dask clusters in a platform agnostic way.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_gateway&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Gateway&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;

&lt;span class="n"&gt;gateway&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Gateway&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;http://daskgateway.myorg.com&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;kerberos&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
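&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gateway&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new_cluster&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# Gateway clusters start with no workers by default, so request some before computing&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;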
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/data/.../2018-*-*.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Again, reading your data requires some knowledge of how it is stored on the underlying compute platform the gateway is using, but the changes required are minimal.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="managed-services"&gt;
&lt;h1&gt;Managed services&lt;/h1&gt;
&lt;p&gt;If your organisation is too small to have an IT team to manage this for you, or you just have a preference for managed services, there are startups popping up to provide this to you as a service including &lt;a class="reference external" href="https://coiled.io/"&gt;Coiled&lt;/a&gt; and &lt;a class="reference external" href="https://www.saturncloud.io/s/home/"&gt;Saturn Cloud&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="future-platforms"&gt;
&lt;h1&gt;Future platforms&lt;/h1&gt;
&lt;p&gt;Today the large cloud vendors have managed data science platforms including &lt;a class="reference external" href="https://aws.amazon.com/sagemaker/"&gt;AWS SageMaker&lt;/a&gt;, &lt;a class="reference external" href="https://azure.microsoft.com/en-gb/services/machine-learning/"&gt;Azure Machine Learning&lt;/a&gt; and &lt;a class="reference external" href="https://cloud.google.com/vertex-ai"&gt;Google Cloud AI Platform&lt;/a&gt;. However, none of these include Dask as a service.&lt;/p&gt;
&lt;p&gt;These platforms are focussed on batch processing and machine learning today, but the same clouds also offer managed Spark and other compute cluster services. With Dask’s increasing popularity, it wouldn’t surprise me if these cloud vendors release managed Dask services in the coming years.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="summary"&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;One of the most powerful features of Dask is that your code can stay pretty much the same regardless of how big or complex the distributed compute cluster is. It scales from a single machine to thousands of servers with ease.&lt;/p&gt;
&lt;p&gt;But scaling up requires both user and organisational growth, and folks already seem to be treading a common path on that journey.&lt;/p&gt;
&lt;p&gt;Hopefully this post will give you an idea of where you are on that path and where to jump to next. Whether you’re new to the community and discovering the power of multi-core computing or an old hand who is trying to wrangle hundreds of users who all love Dask, good luck!&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/06/01/dask-distributed-user-journey/"/>
    <summary>This week was the 2021 Dask Summit and one of the workshops that we ran covered many deployment options for Dask Distributed.</summary>
    <category term="Distributed" label="Distributed"/>
    <category term="Organisations" label="Organisations"/>
    <category term="Tools" label="Tools"/>
    <published>2021-06-01T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/05/25/user-survey/</id>
    <title>The 2021 Dask User Survey is out now</title>
    <updated>2021-05-25T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">&lt;p&gt;The Dask User Survey is out again! Tell us how you use Dask, and help us make it better for everyone.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://forms.gle/F7QSGpSHwBWu8NCg8"&gt;Click this link to take the survey&lt;/a&gt;.&lt;/p&gt;
&lt;section id="why-take-the-survey"&gt;

&lt;p&gt;Feedback from users is very important. It helps give us a clear picture of who our users are and what is important to them. Your responses will inform prioritization for Dask development and improve the experience for the Dask community.&lt;/p&gt;
&lt;p&gt;We expect the survey to take no more than 5-10 minutes. It has the following short sections:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;How do you use Dask?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How could Dask improve?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What other tools do you use with Dask?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optional: What do you work on?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="survey-results-from-previous-years"&gt;
&lt;h1&gt;Survey results from previous years&lt;/h1&gt;
&lt;p&gt;We will also publish answers to non-sensitive questions in our annual survey review to help keep everyone informed.&lt;/p&gt;
&lt;p&gt;You can see the results from previous user surveys here:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2020/09/22/user_survey"&gt;2020 Dask User Survey Results&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blog.dask.org/2019/08/05/user-survey"&gt;2019 Dask User Survey Results&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/05/25/user-survey/"/>
    <summary>The Dask User Survey is out again! Tell us how you use Dask, and help us make it better for everyone.</summary>
    <category term="UserSurvey" label="User Survey"/>
    <published>2021-05-25T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/05/24/life-science-summit-workshop/</id>
    <title>Life sciences at the 2021 Dask Summit</title>
    <updated>2021-05-24T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/24/life-science-summit-workshop.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="executive-summary"&gt;

&lt;p&gt;The Dask life science workshop ran as part of the 2021 Dask Summit. Lightning talks from this workshop are &lt;a class="reference external" href="https://www.youtube.com/playlist?list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0"&gt;available here&lt;/a&gt;, and you can read on for a summary of the event.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-is-the-dask-life-science-workshop"&gt;
&lt;h1&gt;What is the Dask life science workshop?&lt;/h1&gt;
&lt;p&gt;The Dask life science workshop ran as part of the 2021 Dask Summit. Currently many people in life sciences use Dask, but individual groups are relatively isolated from one another. This workshop gave us an opportunity to learn from each other, as well as opportunities to identify common frustrations and areas for improvement.&lt;/p&gt;
&lt;p&gt;The workshop involved:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Pre-recorded lightning talks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interactive discussion times (accessible across timezones in Europe, Oceania, and the Americas)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Asynchronous text chat throughout the Dask Summit&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="if-i-missed-it-how-can-i-catch-up"&gt;
&lt;h1&gt;If I missed it, how can I catch up?&lt;/h1&gt;
&lt;p&gt;If you missed the Dask Summit, you can catch up on YouTube.
There is a playlist of all the life science lightning talks &lt;a class="reference external" href="https://www.youtube.com/playlist?list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0"&gt;available here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can also join our &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;#life-science&lt;/span&gt;&lt;/code&gt; channel on Slack:
&lt;a class="reference external" href="https://join.slack.com/t/dask/shared_invite/zt-mfmh7quc-nIrXL6ocgiUH2haLYA914g"&gt;Click here for an invitation link&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="who-came"&gt;
&lt;h1&gt;Who came?&lt;/h1&gt;
&lt;p&gt;We invited attendees at the life science workshop to do a short Q&amp;amp;A about their work with Dask. This is only a small subset of the people who joined us; many people came to the conference without doing a Q&amp;amp;A.&lt;/p&gt;
&lt;p&gt;The responses give us an overview of the diversity of work people in the community are doing. In no particular order, here are some of those Q&amp;amp;As:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Tom White&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; EU/UK&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; Statistical genetics&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask?&lt;/strong&gt; Run per-row linear regressions at scale.&lt;br /&gt;
&lt;strong&gt;What do you want to do next with Dask?&lt;/strong&gt; Collaborative optimization of a public workflow (GWAS).&lt;br /&gt;
&lt;strong&gt;Lightning talk:&lt;/strong&gt; &lt;a class="reference external" href="https://www.youtube.com/watch?v=qt6YsHoPpZs&amp;amp;amp;list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0&amp;amp;amp;index=2"&gt;click here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Giovanni Palla&lt;br /&gt;
&lt;strong&gt;Affiliation:&lt;/strong&gt; Helmholtz Center Munich&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; Europe&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; Computational Biology and Spatial transcriptomics&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask?&lt;/strong&gt; &lt;a class="reference external" href="http://image.dask.org/en/latest/"&gt;dask-image&lt;/a&gt; for image processing.&lt;br /&gt;
&lt;strong&gt;What do you want to do next with Dask?&lt;/strong&gt; Further integration with &lt;a class="reference external" href="https://squidpy.readthedocs.io/en/latest/"&gt;Squidpy&lt;/a&gt;.&lt;br /&gt;
&lt;strong&gt;Lightning talk:&lt;/strong&gt; &lt;a class="reference external" href="https://www.youtube.com/watch?v=sGr7O8spfvE&amp;amp;amp;list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0&amp;amp;amp;index=8"&gt;click here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Isaac Virshup&lt;br /&gt;
&lt;strong&gt;Affiliation:&lt;/strong&gt; University of Melbourne; open source projects Scanpy and AnnData&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; AEST&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; Single cell omics data.&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask?&lt;/strong&gt;&lt;br /&gt;
I’ve used dask for some nested embarrassingly parallel calculations. Having an intelligent scheduler with good monitoring made this task as easy as it should be, especially compared with multiprocessing or joblib.&lt;br /&gt;
&lt;strong&gt;What do you want to do next with Dask?&lt;/strong&gt;&lt;br /&gt;
I would love to get AnnData, a container for working with single cell assays, integrated with Dask. Dataset sizes in this field are constantly increasing, and it would be good to be able to work with the coolest new dataset regardless of available RAM.&lt;br /&gt;
Since we rely heavily on sparse arrays, a key step towards this will be getting better sparse array support (CSC and CSR especially) inside dask. After all, it’s not great if our strategy for scaling out requires many times the total memory! As a maintainer, I’m interested in hearing people’s experience with distributing tools that integrate well with dask.&lt;br /&gt;
&lt;strong&gt;Lightning talk:&lt;/strong&gt; &lt;a class="reference external" href="https://www.youtube.com/watch?v=e8pWpRo5Ars&amp;amp;amp;list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0&amp;amp;amp;index=14"&gt;click here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Anna Kreshuk&lt;br /&gt;
&lt;strong&gt;Affiliation:&lt;/strong&gt; European Molecular Biology Laboratory&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; CEST (GMT+2)&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; Machine learning for microscopy image analysis.&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask?&lt;/strong&gt; We run a lot of image processing workflows and want to see how Dask can be exploited in this context.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Beth Cimini&lt;br /&gt;
&lt;strong&gt;Affiliation:&lt;/strong&gt; Broad Institute&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; US-East&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; User friendly image analysis tools for microscopy imaging.&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask?&lt;/strong&gt; Making Dask work in CellProfiler, to make it easy to analyze big images in high throughput!&lt;br /&gt;
&lt;strong&gt;Lightning talk:&lt;/strong&gt; &lt;a class="reference external" href="https://www.youtube.com/playlist?list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0"&gt;click here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Volker Hilsenstein&lt;br /&gt;
&lt;strong&gt;Affiliation:&lt;/strong&gt; EMBL / Alexandrov lab&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; Central European Summer Time&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; Spatial Metabolomics, combining microscopy and mass spectrometry.&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask?&lt;/strong&gt; Fusing large mosaics of individual images or image volumes for which affine transformations into a joint coordinate system are available.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Marvin Albert&lt;br /&gt;
&lt;strong&gt;Affiliation:&lt;/strong&gt; University of Zurich&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; UTC/GMT +2&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; Life sciences / image analysis&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask? What do you want to do next with Dask?&lt;/strong&gt; Parallelise / reduce the memory footprint of image processing tasks and define workflows that can run on different compute environments.&lt;br /&gt;
&lt;strong&gt;Lightning talk:&lt;/strong&gt; &lt;a class="reference external" href="https://www.youtube.com/watch?v=YIblUvonMvo&amp;amp;amp;list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0&amp;amp;amp;index=9"&gt;click here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Jordao Bragantini&lt;br /&gt;
&lt;strong&gt;Affiliation:&lt;/strong&gt; CZ Biohub&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; Pacific Daylight Time (UTC -7)&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; Light-sheet microscopy&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask?&lt;/strong&gt; Image processing of very large data.&lt;br /&gt;
&lt;strong&gt;What do you want to do next with Dask?&lt;/strong&gt; Implement algorithms for cell segmentation.&lt;br /&gt;
&lt;strong&gt;Lightning talk:&lt;/strong&gt; &lt;a class="reference external" href="https://www.youtube.com/watch?v=xadb-oXMFKI&amp;amp;amp;list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0&amp;amp;amp;index=3"&gt;click here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Josh Moore&lt;br /&gt;
&lt;strong&gt;Affiliation:&lt;/strong&gt; Open Microscopy Environment (OME)&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; CEST&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; Bioimaging (infrastructure for RDM)&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask?&lt;/strong&gt; Accessing large image (Zarr) volumes over HTTP, primarily.&lt;br /&gt;
&lt;strong&gt;What do you want to do next with Dask?&lt;/strong&gt; Improve pre-fetching for typical usage patterns, possibly integrating multiscale data (e.g. Google Maps-style zooming).&lt;br /&gt;
&lt;strong&gt;Lightning talk:&lt;/strong&gt; &lt;a class="reference external" href="https://www.youtube.com/watch?v=6PerbQhcupM&amp;amp;amp;list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0&amp;amp;amp;index=1"&gt;click here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Jackson Maxfield Brown&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; PST&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; Cell biology, specifically microscopy and computational biology.&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask?&lt;/strong&gt; Built a metadata-aware microscopy image reading library that uses Dask to read images of any size, chunked according to the dimension information in the metadata. Also built TB-scale image processing pipelines using Dask + Prefect.&lt;br /&gt;
&lt;strong&gt;What do you want to do next with Dask?&lt;/strong&gt; Tighter integration with other libraries. I see cuCIM from the RAPIDS team and would love to extend work with them to have a more general “bio-image-spec” so we can all play nicely together.&lt;br /&gt;
&lt;strong&gt;Lightning talk:&lt;/strong&gt; &lt;a class="reference external" href="https://www.youtube.com/watch?v=LNa_gGpSnvc&amp;amp;amp;list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0&amp;amp;amp;index=8"&gt;click here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Gregory R. Lee&lt;br /&gt;
&lt;strong&gt;Affiliation:&lt;/strong&gt; Quansight&lt;br /&gt;
&lt;strong&gt;Timezone:&lt;/strong&gt; EST (UTC-5)&lt;br /&gt;
&lt;strong&gt;What kind of science do you work on?&lt;/strong&gt; Scientific software development (with a background doing research in magnetic resonance imaging).&lt;br /&gt;
&lt;strong&gt;Something you’ve tried (or would like to try) with Dask?&lt;/strong&gt;&lt;br /&gt;
In past research work, I used Dask primarily in two scenarios, both on a single workstation:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;To achieve multi-threading by processing image blocks in parallel on the CPU (e.g. as in dask-image)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Serial blockwise processing of large volumetric data on the GPU (i.e. CuPy arrays of 10-100 GB in size) to reduce peak memory requirements.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;What do you want to do next with Dask?&lt;/strong&gt;&lt;br /&gt;
Audit scikit-image functions to determine which can easily be accelerated using block-wise approaches as in dask-image. Ideally, a subset of functions would work directly with Dask arrays as inputs, rather than requiring users to learn about Dask’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt; etc. to use this feature.&lt;br /&gt;
&lt;strong&gt;Lightning talk:&lt;/strong&gt; &lt;a class="reference external" href="https://www.youtube.com/watch?v=vPorCnEhM6g&amp;amp;amp;list=PLJ0vO2F_f6OBAY6hjRHM_mIQ9yh32mWr0&amp;amp;amp;index=16"&gt;click here&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-s-next"&gt;
&lt;h1&gt;What’s next?&lt;/h1&gt;
&lt;p&gt;Dask is now considering holding “office hours” for the life science community. If we can find enough maintainers able to host one-hour Q&amp;amp;A sessions, then we’ll trial this for a short period of time.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/05/24/life-science-summit-workshop/"/>
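    <summary>The Dask life science workshop ran as part of the 2021 Dask Summit. Lightning talks from this workshop are available here, and you can read on for a summary of the event.</summary>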
    <category term="DaskSummit" label="Dask Summit"/>
    <category term="lifescience" label="life science"/>
    <published>2021-05-24T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/05/21/stability/</id>
    <title>Stability of the Dask library</title>
    <updated>2021-05-21T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">&lt;p&gt;Dask is moving fast these days. Sometimes we break things as a result.&lt;/p&gt;
&lt;p&gt;Historically this hasn’t been a problem: according to our survey last year,
most users were fairly happy with Dask’s stability.&lt;/p&gt;
&lt;img src="/images/2020_survey/2020_27_0.png"&gt;
&lt;p&gt;However, the last year has seen a lot of evolution in the project,
which in turn causes code churn.
This can cause friction for downstream users today,
but it also makes more-than-incremental changes possible in the future.
We’ve optimized a little for long-term growth over short-term stability.&lt;/p&gt;
&lt;section id="motivation-for-change"&gt;

&lt;p&gt;There are two structural things driving some of these changes:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;An increase in computational scale&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An increase in organizational scale&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="computational-scale"&gt;
&lt;h1&gt;Computational Scale&lt;/h1&gt;
&lt;p&gt;Dask today is used across a wider range of problems,
a more diverse set of hardware,
and at larger scales more routinely than before.&lt;/p&gt;
&lt;p&gt;Addressing this increase in scale across many dimensions has caused us to
redesign Dask’s internal infrastructure in several ways.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;We’ve changed how Dask graphs are represented and communicated to the scheduler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We’ve pulled out Dask’s internal state machines and formalized them&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We’ve rewritten large chunks of the scheduler in Cython&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We’ve overhauled how we serialize messages that go between all Dask servers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We’re now tracking memory with much finer granularity than we did before&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;… and more&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We’ve been making all of these internal changes with minimal impact on the
myriad downstream user communities (Xarray, Prefect, RAPIDS, XGBoost, …).
This is largely due to those downstream developer communities,
who help to identify, isolate, and work through the subtle tremors that occur
on the surface when we make these subsurface shifts.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="organizational-scale"&gt;
&lt;h1&gt;Organizational scale&lt;/h1&gt;
&lt;p&gt;Historically Dask’s core was maintained by a relatively small set of people,
mostly at Anaconda.
There were dozens of developers that worked on various &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-foo&lt;/span&gt;&lt;/code&gt; projects, but
only a small group that thought about things like serialization, state
machines, and so on.
In particular, I personally tracked every issue and knew the entire project.
Whenever a potential conflict arose I was usually able to identify it early.&lt;/p&gt;
&lt;p&gt;This has all changed dramatically.&lt;/p&gt;
&lt;p&gt;First, there are now several multi-company teams working on different parts of
Dask internals.&lt;/p&gt;
&lt;p&gt;Second, we’ve taken some time to redesign parts of Dask internals to make them more maintainable.
Dask scheduling is like a finely made clock.
Historically parts of that clock were built and designed by individuals with a craftsman-like approach.
Now we’re redesigning things with more of a group mindset.
This results in more maintainable designs,
but it also means that we’re taking apart the clock and putting it back together.
It takes a little while to find all of the missing parts :)&lt;/p&gt;
&lt;/section&gt;
&lt;section id="how-this-affects-you-today"&gt;
&lt;h1&gt;How this affects you today&lt;/h1&gt;
&lt;p&gt;This all started around the time we switched to Calendar Versioning at the end of last year
(Dask version &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;2.30.1&lt;/span&gt;&lt;/code&gt; rolled over into &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;2020.12.0&lt;/span&gt;&lt;/code&gt; last December). You may
have noticed:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;an increased sensitivity to version mismatches (as we change the Dask
protocol, different versions of Dask can no longer talk to each other well)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;releases with stability issues (2020.12 was particularly rough)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/dask/community/issues/155"&gt;tighter pinning&lt;/a&gt; between dask and distributed versions during releases&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="how-this-will-affect-you"&gt;
&lt;h1&gt;How this will affect you&lt;/h1&gt;
&lt;p&gt;We’ve merged in a &lt;a class="reference external" href="https://github.com/dask/dask/pull/7620"&gt;PR&lt;/a&gt;
to change the default behavior when moving &lt;a class="reference external" href="https://docs.dask.org/en/latest/high-level-graphs.html"&gt;high level graphs&lt;/a&gt;
to the scheduler for Dask Dataframes. This should result in much
less delay when submitting large computations and almost no delay in
optimization. It also opens up a conduit for us to send &lt;em&gt;a lot&lt;/em&gt; more semantic
information to the scheduler about your computation, which can result in new
visualizations and smarter scheduling in the future.&lt;/p&gt;
&lt;p&gt;It will also probably break some things.&lt;/p&gt;
&lt;p&gt;To be clear, all tests pass across Dask, Distributed, Xarray, Prefect, RAPIDS,
and other downstream projects. We’ve done our homework here, but we’ve almost certainly missed something.&lt;/p&gt;
&lt;p&gt;This is only one of several larger changes happening in the coming months.
We appreciate your patience and your engagement as we make some of these larger shifts.
For better or worse, end users are the final test suite :)&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/05/21/stability/"/>
    <summary>Dask is moving fast these days. Sometimes we break things as a result.</summary>
    <published>2021-05-21T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/05/07/skeleton-analysis/</id>
    <title>Skeleton analysis</title>
    <updated>2021-05-07T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="executive-summary"&gt;

&lt;p&gt;In this blogpost, we show how to modify a skeleton network analysis with Dask to work with constrained RAM (e.g. on your laptop). This makes it more accessible: it can run on a small laptop, instead of requiring access to a supercomputing cluster. Example code is also &lt;a class="reference external" href="https://github.com/GenevieveBuckley/distributed-skeleton-analysis/blob/main/distributed-skeleton-analysis-with-dask.ipynb"&gt;provided here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="contents"&gt;
&lt;h1&gt;Contents&lt;/h1&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#skeleton-structures-are-everywhere"&gt;&lt;span class="xref myst"&gt;Skeleton structures are everywhere&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#the-scientific-problem"&gt;&lt;span class="xref myst"&gt;The scientific problem&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#the-compute-problem"&gt;&lt;span class="xref myst"&gt;The compute problem&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#our-approach"&gt;&lt;span class="xref myst"&gt;Our approach&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#results"&gt;&lt;span class="xref myst"&gt;Results&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#limitations"&gt;&lt;span class="xref myst"&gt;Limitations&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#problems-encountered"&gt;&lt;span class="xref myst"&gt;Problems encountered&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#how-we-solved-them"&gt;&lt;span class="xref myst"&gt;How we solved them&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#problem-1-the-skeletonize-function-from-scikit-image-crashes-due-to-lack-of-ram"&gt;&lt;span class="xref myst"&gt;Problem 1: The skeletonize function from scikit-image crashes due to lack of RAM&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#problem-2-ragged-or-non-uniform-output-from-dask-array-chunks"&gt;&lt;span class="xref myst"&gt;Problem 2: Ragged or non-uniform output from Dask array chunks&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#problem-3-grabbing-the-image-chunks-with-an-overlap"&gt;&lt;span class="xref myst"&gt;Problem 3: Grabbing the image chunks with an overlap&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#problem-4-summary-statistics-with-skan"&gt;&lt;span class="xref myst"&gt;Problem 4: Summary statistics with skan&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#what's-next"&gt;&lt;span class="xref myst"&gt;What’s next&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#how-you-can-help"&gt;&lt;span class="xref myst"&gt;How you can help&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 30)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="skeleton-structures-are-everywhere"&gt;
&lt;h1&gt;Skeleton structures are everywhere&lt;/h1&gt;
&lt;p&gt;Lots of biological structures have a skeleton or network-like shape. We see these in all kinds of places, including:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;blood vessel branching&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the branching of airways&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;neuron networks in the brain&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the root structure of plants&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the capillaries in leaves&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;… and many more&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Analysing the structure of these skeletons can give us important information about the biology of that system.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 43)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="the-scientific-problem"&gt;
&lt;h1&gt;The scientific problem&lt;/h1&gt;
&lt;p&gt;For this blogpost, we will look at the blood vessels inside a lung. This data was shared with us by &lt;a class="reference external" href="https://research.monash.edu/en/persons/marcus-kitchen"&gt;Marcus Kitchen&lt;/a&gt;, &lt;a class="reference external" href="https://hudson.org.au/researcher-profile/andrew-stainsby/"&gt;Andrew Stainsby&lt;/a&gt;, and their team of collaborators.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Skeleton network of blood vessels within a healthy lung" src="https://blog.dask.org/_images/skeleton-screenshot-crop.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;This research group focusses on lung development.
We want to compare the blood vessels in a healthy lung against those in a lung from a hernia model. In the hernia model, the lung is underdeveloped, squashed, and smaller.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 52)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="the-compute-problem"&gt;
&lt;h1&gt;The compute problem&lt;/h1&gt;
&lt;p&gt;These image volumes have a shape of roughly 1000x1000x1000 pixels.
That doesn’t seem huge, but the analysis consumes so much RAM that it crashes when run on a laptop.&lt;/p&gt;
&lt;p&gt;If you’re running out of RAM, there are two possible approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Get more RAM. Run things on a bigger computer, or move things to a supercomputing cluster. This has the advantage that you don’t need to rewrite your code, but it does require access to more powerful computer hardware.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Manage the RAM you’ve got. Dask is good for this. If we use Dask with a reasonable chunking of our arrays, we can manage things so that we never hit the RAM ceiling and crash. This has the advantage that you don’t need to buy more computer hardware, but it will require rewriting some code.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
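&lt;p&gt;As a rough sketch of the memory arithmetic behind the second approach, the example below assumes the volume is stored as 8-bit integers and uses a chunk shape of 200 x 200 x 200 pixels. Both values are illustrative assumptions, not the settings used in this analysis. Creating the Dask array is lazy, so none of this memory is allocated until a computation runs.&lt;/p&gt;

```python
import numpy as np
import dask.array as da

# Illustrative volume: 1000 x 1000 x 1000 pixels of 8-bit data (an assumption;
# the real data type depends on how the images were saved).
shape = (1000, 1000, 1000)
chunk_shape = (200, 200, 200)  # hypothetical chunk size, chosen for illustration

# da.zeros is lazy: nothing is allocated until .compute() is called.
volume = da.zeros(shape, dtype=np.uint8, chunks=chunk_shape)

whole_array_gb = volume.nbytes / 1e9
one_chunk_mb = np.prod(chunk_shape) * volume.dtype.itemsize / 1e6

print(f"whole array: {whole_array_gb:.1f} GB")    # whole array: 1.0 GB
print(f"one chunk: {one_chunk_mb:.1f} MB")        # one chunk: 8.0 MB
print(f"number of chunks: {volume.npartitions}")  # number of chunks: 125
```

&lt;p&gt;Intermediate results can be much larger than the raw image: a float64 array needs eight bytes per pixel instead of one, and algorithms such as skeletonize need working memory on top of their input. Keeping each chunk small bounds the memory that each task needs.&lt;/p&gt;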
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 63)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="our-approach"&gt;
&lt;h1&gt;Our approach&lt;/h1&gt;
&lt;p&gt;We took the second approach, using Dask so we can run our analysis on a small laptop with constrained RAM without crashing. This makes the analysis accessible to more people.&lt;/p&gt;
&lt;p&gt;All the image pre-processing steps are done with &lt;a class="reference external" href="http://image.dask.org/en/latest/"&gt;dask-image&lt;/a&gt;, and the skeletonization is done with the &lt;a class="reference external" href="https://scikit-image.org/docs/dev/auto_examples/edges/plot_skeleton.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;skeletonize&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function from &lt;a class="reference external" href="https://scikit-image.org/"&gt;scikit-image&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We use &lt;a class="reference external" href="https://jni.github.io/skan/"&gt;skan&lt;/a&gt; as the backbone of our analysis pipeline. &lt;a class="reference external" href="https://jni.github.io/skan/"&gt;skan&lt;/a&gt; is a library for skeleton image analysis. Given a skeleton image, it can describe statistics of the branches. To make it fast, the library is accelerated with &lt;a class="reference external" href="https://numba.pydata.org/"&gt;numba&lt;/a&gt; (if you’re curious, you can hear more about that in &lt;a class="reference external" href="https://www.youtube.com/watch?v=0pUPNMglnaE"&gt;this talk&lt;/a&gt; and its &lt;a class="reference external" href="https://github.com/jni/skan-talk-scipy-2019"&gt;related notebook&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;There is an example notebook containing the full details of the skeleton analysis &lt;a class="reference external" href="https://github.com/GenevieveBuckley/distributed-skeleton-analysis/blob/main/distributed-skeleton-analysis-with-dask.ipynb"&gt;available here&lt;/a&gt;. You can read on to hear just the highlights.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 73)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="results"&gt;
&lt;h1&gt;Results&lt;/h1&gt;
&lt;p&gt;The statistics from the blood vessel branches in the healthy and herniated lungs show clear differences between the two.&lt;/p&gt;
&lt;p&gt;Most striking is the difference in the number of blood vessel branches.
The herniated lung has fewer than 40% as many blood vessel branches as the healthy lung.&lt;/p&gt;
&lt;p&gt;There are also quantitative differences in the sizes of the blood vessel branches.
Here is a violin plot showing the distribution of the distances between the start and end points of each blood vessel branch. We can see that, overall, the blood vessel branches start and end closer together in the herniated lung. This is consistent with what we might expect, since the healthy lung is better developed than the lung from the hernia model, and the hernia has compressed that lung into a smaller overall space.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Violin plot comparing the euclidean distance between the start and end points of blood vessel branches in a healthy and a herniated lung" src="https://blog.dask.org/_images/compare-euclidean-distance.png" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;EDIT: This blogpost previously described the euclidean distance violin plot as measuring the thickness of the blood vessels. This is incorrect, and the mistake was not caught in the review process before publication. This post has been updated to correctly describe the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;euclidean-distance&lt;/span&gt;&lt;/code&gt; measurement as the distance between the start and end of each branch, as if you pulled a string taut between those points. An alternative measurement, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;branch-length&lt;/span&gt;&lt;/code&gt;, describes the total branch length, including any winding twists and turns.&lt;/em&gt;&lt;/p&gt;
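&lt;p&gt;To make the difference between the two measurements concrete, here is a small worked sketch using NumPy on a hypothetical branch described by three ordered pixel coordinates. It follows the definitions in the note above and is not skan’s own implementation.&lt;/p&gt;

```python
import numpy as np

# A hypothetical winding branch, given as ordered (z, y, x) pixel coordinates.
branch = np.array([
    [0, 0, 0],
    [0, 3, 4],  # first segment has length 5
    [0, 6, 0],  # second segment has length 5
])

# Euclidean distance: straight line between the two endpoints ("string pulled taut").
euclidean_distance = np.linalg.norm(branch[-1] - branch[0])

# Branch length: sum of the lengths of every segment along the path.
segment_lengths = np.linalg.norm(np.diff(branch, axis=0), axis=1)
branch_length = segment_lengths.sum()

print(euclidean_distance)  # 6.0
print(branch_length)       # 10.0
```

&lt;p&gt;The winding path is 10 pixels long, but its endpoints are only 6 pixels apart, so the two measurements can tell quite different stories about the same branch.&lt;/p&gt;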
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 87)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="limitations"&gt;
&lt;h1&gt;Limitations&lt;/h1&gt;
&lt;p&gt;We rely on one big assumption: once skeletonized, the reduced set of non-zero pixels will fit into memory. While this holds true for datasets of this size (the cropped rabbit lung datasets are roughly 1000 x 1000 x 1000 pixels), it may not hold true for much larger data.&lt;/p&gt;
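&lt;p&gt;A minimal sketch of why this assumption tends to hold: a skeleton is sparse, so storing only the coordinates of its non-zero pixels takes far less memory than the dense volume. The synthetic single-line “skeleton” below is an assumption for illustration; real vessel skeletons are branched and contain many more pixels.&lt;/p&gt;

```python
import numpy as np

# Synthetic stand-in for a skeleton: a single straight line of pixels
# running through a 200 x 200 x 200 volume (real skeletons are branched).
skeleton = np.zeros((200, 200, 200), dtype=bool)
skeleton[:, 100, 100] = True

# Dense storage: one byte per pixel, whether or not it is part of the skeleton.
dense_bytes = skeleton.nbytes

# Sparse storage: only the (z, y, x) coordinates of the non-zero pixels.
coords = np.argwhere(skeleton)  # shape (n_nonzero, 3), integer dtype
sparse_bytes = coords.nbytes

print(dense_bytes)   # 8000000
print(coords.shape)  # (200, 3)
print(sparse_bytes)  # 4800 on platforms where argwhere returns 64-bit integers
```

&lt;p&gt;The coordinate representation grows with the number of skeleton pixels rather than with the volume, which is why the assumption can break down for much larger or denser networks.&lt;/p&gt;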
&lt;p&gt;Dask computation is also triggered at a few points throughout our prototype workflow. Ideally, all computation would be delayed until the very final stage.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 93)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="problems-encountered"&gt;
&lt;h1&gt;Problems encountered&lt;/h1&gt;
&lt;p&gt;This project was originally intended to be a quick &amp;amp; easy one. Famous last words!&lt;/p&gt;
&lt;p&gt;What I wanted to do was to put the image data in a Dask array, and then use the &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-overlap.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function to do the image filtering, thresholding, skeletonizing, and skeleton analysis. What I soon found was that although the image filtering, thresholding, and skeletonization worked well, the skeleton analysis step had some problems:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Dask’s map_overlap function doesn’t handle ragged or non-uniformly shaped results from different image chunks very well, and…&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Internal functions in the skan library were written in a way that was incompatible with distributed computation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 103)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="how-we-solved-them"&gt;
&lt;h1&gt;How we solved them&lt;/h1&gt;
&lt;section id="problem-1-the-skeletonize-function-from-scikit-image-crashes-due-to-lack-of-ram"&gt;
&lt;h2&gt;Problem 1: The skeletonize function from scikit-image crashes due to lack of RAM&lt;/h2&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://scikit-image.org/docs/dev/auto_examples/edges/plot_skeleton.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;skeletonize&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function of &lt;a class="reference external" href="https://scikit-image.org/"&gt;scikit-image&lt;/a&gt; is very memory intensive, and was crashing on a laptop with 16GB RAM.&lt;/p&gt;
&lt;p&gt;We solved this by:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Putting our image data into a Dask array with &lt;a class="reference external" href="http://image.dask.org/en/latest/dask_image.imread.html"&gt;dask-image &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;imread&lt;/span&gt;&lt;/code&gt;&lt;/a&gt;,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.dask.org/en/latest/array-chunks.html?highlight=rechunk#rechunking"&gt;Rechunking&lt;/a&gt; the Dask array. We need to change the chunk shapes from 2D slices to small cuboid volumes, so the next step in the computation is efficient. We can choose the overall size of the chunks so that we can stay under the memory threshold needed for skeletonize.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, we run the &lt;a class="reference external" href="https://scikit-image.org/docs/dev/auto_examples/edges/plot_skeleton.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;skeletonize&lt;/span&gt;&lt;/code&gt; function&lt;/a&gt; on the Dask array chunks using the &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-overlap.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt; function&lt;/a&gt;. By limiting the size of the array chunks, we stay under our memory threshold!&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
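&lt;p&gt;The three steps above can be sketched as follows. This is a minimal, self-contained illustration rather than the code from the example notebook: it builds a small synthetic volume in memory and uses &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;da.from_array&lt;/span&gt;&lt;/code&gt; with one chunk per 2D slice to mimic what reading an image stack produces. The 32 x 32 x 32 chunk shape, the hypothetical helper &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;skeletonize_block&lt;/span&gt;&lt;/code&gt;, and the choice of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;method="lee"&lt;/span&gt;&lt;/code&gt; are assumptions made for this sketch.&lt;/p&gt;

```python
import numpy as np
import dask.array as da
from skimage.morphology import skeletonize

# Synthetic binary volume standing in for a thresholded lung image:
# a thick tube running along the first (z) axis.
volume = np.zeros((64, 64, 64), dtype=bool)
volume[:, 28:36, 28:36] = True

# Mimic image-stack loading: one chunk per 2D slice along z.
# (In the real workflow this step was done with dask-image's imread.)
images = da.from_array(volume, chunks=(1, 64, 64))

# Rechunk from thin 2D slices into small cuboids. The size here is
# illustrative; pick it so one chunk fits comfortably in memory.
cubes = images.rechunk((32, 32, 32))


def skeletonize_block(block):
    # Hypothetical helper. method="lee" handles 3D input; cast to bool so
    # every block returns the same dtype regardless of scikit-image version.
    return skeletonize(block, method="lee").astype(bool)


# depth=1 gives each chunk a one-pixel border from its neighbours;
# boundary="none" means no padding is added at the outer edges of the array.
skeleton = da.map_overlap(
    skeletonize_block, cubes, depth=1, boundary="none", dtype=bool
)

result = skeleton.compute()
print(result.shape, result.dtype, result.sum())
```

&lt;p&gt;Only one chunk plus its one-pixel border needs to be skeletonized at a time, so the chunk shape is the knob that keeps peak memory under control.&lt;/p&gt;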
&lt;/section&gt;
&lt;section id="problem-2-ragged-or-non-uniform-output-from-dask-array-chunks"&gt;
&lt;h2&gt;Problem 2: Ragged or non-uniform output from Dask array chunks&lt;/h2&gt;
&lt;p&gt;The skeleton analysis functions return results of ragged, or non-uniform, length for each image chunk. This is unsurprising, because different chunks will have different numbers of non-zero pixels in our skeleton shape.&lt;/p&gt;
&lt;p&gt;When working with Dask arrays, there are two very commonly used functions: &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html#dask.array.map_blocks"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; and &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-overlap.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt;&lt;/a&gt;. Here’s what happens when we try a function with ragged outputs with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; versus &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# our dummy analysis function&lt;/span&gt;
    &lt;span class="n"&gt;random_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;With &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt;, everything works well:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drop_axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# this works well&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;But if we need some overlap for function &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;foo&lt;/span&gt;&lt;/code&gt; to work correctly, then we run into problems:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_overlap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drop_axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# incorrect results&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Here, the first and last element of the results from foo are trimmed off before the results are concatenated, which we don’t want! Setting the keyword argument &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;trim=False&lt;/span&gt;&lt;/code&gt; would help avoid this problem, except then we get an error:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_overlap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drop_axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# ValueError&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Unfortunately for us, it’s really important to have a 1 pixel overlap in our array chunks, so that we can tell if a skeleton branch is ending or continuing on into the next chunk.&lt;/p&gt;
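&lt;p&gt;Here is a small sketch of what a one-pixel overlap does to the chunks, using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.array.overlap.overlap&lt;/span&gt;&lt;/code&gt;, the lower-level function that builds overlapping blocks. The 1D array of 30 elements is an illustrative assumption; the image chunks in this post are 3D, but each axis behaves the same way.&lt;/p&gt;

```python
import dask.array as da
from dask.array.overlap import overlap

# A 1D array of 30 elements split into three chunks of 10.
x = da.arange(30, chunks=10)

# Add a one-element overlap between neighbouring chunks, with no padding
# at the outer edges of the array (boundary="none").
padded = overlap(x, depth=1, boundary="none")

print(x.chunks)       # ((10, 10, 10),)
print(padded.chunks)  # ((11, 12, 11),)

# Each block now also contains the neighbouring element(s) across its edges.
for i in range(padded.numblocks[0]):
    values = padded.blocks[i].compute()
    print(i, values[0], values[-1])
# prints: 0 0 10, then 1 9 20, then 2 19 29
```

&lt;p&gt;Interior chunks grow by one element on each side, while chunks at the edge of the array only grow inwards, because &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;boundary="none"&lt;/span&gt;&lt;/code&gt; adds no padding there. That extra element is what lets the code for one chunk check whether a skeleton branch stops at the chunk edge or continues into its neighbour.&lt;/p&gt;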
&lt;p&gt;There’s some complexity in the way &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt; results are concatenated back together, so rather than diving into that, a more straightforward solution is to use &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;Dask delayed&lt;/a&gt; instead. &lt;a class="reference external" href="https://github.com/chrisroat"&gt;Chris Roat&lt;/a&gt; shows a nice example of how we can use &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;Dask delayed&lt;/a&gt; in a list comprehension that is then concatenated with Dask (&lt;a class="reference external" href="https://github.com/dask/dask/issues/7589"&gt;link to original discussion&lt;/a&gt;).&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pandas&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pd&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nd"&gt;@dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delayed&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Make each dataframe a different size&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                         &lt;span class="s1"&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

&lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make_meta&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="n"&gt;blocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_delayed&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# no overlap&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_delayed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ddf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; It’s very important to pass in a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;meta&lt;/span&gt;&lt;/code&gt; keyword argument to the function &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;from_delayed&lt;/span&gt;&lt;/code&gt;. Without it, things will be extremely inefficient!&lt;/p&gt;
&lt;p&gt;If the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;meta&lt;/span&gt;&lt;/code&gt; keyword argument is not given, Dask will try and work out what it should be. Ordinarily that might be a good thing, but inside a list comprehension that means those tasks are computed slowly and sequentially before the main computation even begins, which is horribly inefficient. Since we know ahead of time what kinds of results we expect from our analysis function (we just don’t know the length of each set of results), we can use the &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.utils.make_meta"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;utils.make_meta&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function to help us here.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="problem-3-grabbing-the-image-chunks-with-an-overlap"&gt;
&lt;h2&gt;Problem 3: Grabbing the image chunks with an overlap&lt;/h2&gt;
&lt;p&gt;Now that we’re using &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;Dask delayed&lt;/a&gt; to piece together our skeleton analysis results, it’s up to us to handle the array chunks overlap ourselves.&lt;/p&gt;
&lt;p&gt;We’ll do that by modifying Dask’s &lt;a class="reference external" href="https://github.com/dask/dask/blob/21aaf44d4d25bdba05951b85f3f2d943b823e82d/dask/array/core.py#L209-L225"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.array.core.slices_from_chunks&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function into something that can handle an overlap. Some special handling is required at the boundaries of the Dask array, so that we don’t try to slice past the edge of the array.&lt;/p&gt;
&lt;p&gt;Here’s what that looks like (&lt;a class="reference external" href="https://gist.github.com/GenevieveBuckley/decd23c22ee3417f7d78e87f791bc081"&gt;gist&lt;/a&gt;):&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;itertools&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array.slicing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cached_cumsum&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;slices_from_chunks_overlap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;array_shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cumdims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cached_cumsum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;initial_zero&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bds&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;slices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;starts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shapes&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cumdims&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;inner_slices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maxshape&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;starts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shapes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;array_shape&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;slice_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
            &lt;span class="n"&gt;slice_stop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;slice_start&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;slice_start&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;slice_stop&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;maxshape&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;slice_stop&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;
            &lt;span class="n"&gt;inner_slices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slice_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slice_stop&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;slices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inner_slices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;slices&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
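&lt;p&gt;As a quick sanity check, here is a standalone sketch of the same slicing idea that runs without Dask. It assumes &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;itertools.accumulate&lt;/span&gt;&lt;/code&gt; is an acceptable stand-in for Dask’s internal &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cached_cumsum&lt;/span&gt;&lt;/code&gt; helper, and it clips every grown slice to the range from 0 to the array length on each axis. The example uses a small 4 x 3 array split into two chunks along the first axis.&lt;/p&gt;

```python
from itertools import accumulate, product


def slices_from_chunks_overlap(chunks, array_shape, depth=1):
    """Return one tuple of slices per chunk, grown by `depth` on each side."""
    slices = []
    for dim_chunks, maxshape in zip(chunks, array_shape):
        # Start offset of every chunk along this axis, e.g. (2, 2) -> [0, 2]
        starts = [0, *accumulate(dim_chunks)][:-1]
        inner_slices = []
        for start, size in zip(starts, dim_chunks):
            # Clip at 0 and at the array edge so we never slice out of bounds
            inner_slices.append(
                slice(max(start - depth, 0), min(start + size + depth, maxshape))
            )
        slices.append(inner_slices)
    # One tuple of slices per chunk, in the same order as np.ndindex(*numblocks)
    return list(product(*slices))


# A 4 x 3 array split into two 2 x 3 chunks along the first axis
for block_slices in slices_from_chunks_overlap(((2, 2), (3,)), (4, 3), depth=1):
    print(block_slices)
# (slice(0, 3, None), slice(0, 3, None))
# (slice(1, 4, None), slice(0, 3, None))
```

&lt;p&gt;Only the interior faces between chunks are extended; the outer edges of the array are left as they are, which is the boundary handling described above.&lt;/p&gt;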
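&lt;p&gt;Now that we can slice an image chunk plus an extra pixel of overlap, all we need is a way to do that for every chunk in the array. Drawing inspiration from this &lt;a class="reference external" href="https://github.com/dask/dask-image/blob/63543bf2f6553a8150f45289492bf614e1945ac0/dask_image/ndmeasure/__init__.py#L299-L303"&gt;block iteration&lt;/a&gt; in dask-image, we build a similar iterator that pairs each block index with its overlapping slice of the image, then pass each block to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;skeleton_graph_func&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;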
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;block_iter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndindex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;numblocks&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;functools&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;partial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getitem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;slices_from_chunks_overlap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make_meta&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;row&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;col&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;data&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="n"&gt;intermediate_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_delayed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skeleton_graph_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;block_iter&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;intermediate_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop_duplicates&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# we need to drop duplicates because it counts pixels in the overlapping region twice&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
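&lt;p&gt;To illustrate why the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;drop_duplicates&lt;/span&gt;&lt;/code&gt; step matters, here is a toy sketch in plain pandas with made-up numbers. It assumes, for illustration only, that each chunk reports its edges as (row, col, data) rows using global pixel ids. Because neighbouring chunks share a one-pixel overlap, an edge that crosses their boundary is reported by both, so the concatenated result contains it twice until duplicates are dropped.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical per-chunk edge lists (row, col, data) in global pixel ids.
# The edge 2 -> 3 crosses the chunk boundary, so both chunks report it.
chunk_a = pd.DataFrame({"row": [1, 2], "col": [2, 3], "data": [1.0, 1.0]})
chunk_b = pd.DataFrame({"row": [2, 3], "col": [3, 4], "data": [1.0, np.sqrt(2)]})

combined = pd.concat([chunk_a, chunk_b], ignore_index=True)
deduplicated = combined.drop_duplicates()  # keeps the first copy of each row

print(len(combined))      # 4
print(len(deduplicated))  # 3
```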
&lt;p&gt;With these results, we’re able to create the sparse skeleton graph.&lt;/p&gt;
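&lt;p&gt;For readers unfamiliar with the format, a sparse graph like this is commonly assembled from a (row, col, data) edge list with SciPy. The sketch below shows one such way under stated assumptions, not necessarily the exact code used in this analysis: the edge list is a small made-up pandas DataFrame (a Dask DataFrame such as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;results&lt;/span&gt;&lt;/code&gt; would first need &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.compute()&lt;/span&gt;&lt;/code&gt;), and the matrix is sized from the largest node id. Any further conventions skan’s graph may rely on, such as whether both edge directions are stored, are not covered here.&lt;/p&gt;

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical deduplicated edge list: each row links two pixel ids,
# with the distance between them stored in "data"
edges = pd.DataFrame({
    "row": [1, 2, 3],
    "col": [2, 3, 4],
    "data": [1.0, 1.0, np.sqrt(2)],
})

# Size the square matrix so the largest pixel id is a valid index
n_nodes = int(max(edges["row"].max(), edges["col"].max())) + 1

graph = sparse.coo_matrix(
    (edges["data"].to_numpy(), (edges["row"].to_numpy(), edges["col"].to_numpy())),
    shape=(n_nodes, n_nodes),
).tocsr()  # CSR format is efficient for row-wise graph traversal

print(graph.shape)  # (5, 5)
print(graph.nnz)    # 3
```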
&lt;/section&gt;
&lt;section id="problem-4-summary-statistics-with-skan"&gt;
&lt;h2&gt;Problem 4: Summary statistics with skan&lt;/h2&gt;
&lt;p&gt;Skeleton branch statistics can be calculated with the &lt;a class="reference external" href="https://jni.github.io/skan/api/skan.csr.html#skan.csr.summarize"&gt;skan &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;summarize&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function. The problem is that this function expects a &lt;a class="reference external" href="https://jni.github.io/skan/api/skan.csr.html#skan.csr.Skeleton"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Skeleton&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; object instance, and initializing a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Skeleton&lt;/span&gt;&lt;/code&gt; object calls methods that are not compatible with distributed analysis.&lt;/p&gt;
&lt;p&gt;We’ll solve this problem by first initializing a &lt;a class="reference external" href="https://jni.github.io/skan/api/skan.csr.html#skan.csr.Skeleton"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Skeleton&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; object instance with a tiny dummy dataset, then overwriting the attributes of the skeleton object with our real results. This is a hack, but it lets us achieve our goal: summary branch statistics for our large dataset.&lt;/p&gt;
&lt;p&gt;First we make a Skeleton object instance with dummy data:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;skan._testdata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;skeleton0&lt;/span&gt;

&lt;span class="n"&gt;skeleton_object&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Skeleton&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skeleton0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# initialize with dummy data&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then we overwrite the attributes with the previously calculated results:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;skeleton_object&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;skeleton_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;skeleton_object&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;skeleton_object&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;coordinates&lt;/span&gt;
&lt;span class="n"&gt;skeleton_object&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;degrees&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;skeleton_object&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;distances&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Finally, we can calculate the summary branch statistics:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;skan&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;summarize&lt;/span&gt;

&lt;span class="n"&gt;statistics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel_obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="pst-scrollable-table-container"&gt;&lt;table class="table"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head text-right"&gt;&lt;p&gt;&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;skeleton-id&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;node-id-src&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;node-id-dst&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;branch-distance&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;branch-type&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;mean-pixel-value&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;stdev-pixel-value&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-src-0&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-src-1&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-src-2&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-dst-0&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-dst-1&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-dst-2&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-src-0&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-src-1&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-src-2&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-dst-0&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-dst-1&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-dst-2&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;euclidean-distance&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td class="text-right"&gt;&lt;p&gt;0&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.474584&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.00262514&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;22&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;400&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;595&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;22&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;400&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;596&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;22&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;400&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;595&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;22&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;400&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;596&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td class="text-right"&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;3&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;9&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;8.19615&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.464662&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.00299629&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;37&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;400&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;622&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;43&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;392&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;590&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;37&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;400&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;622&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;43&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;392&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;590&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;33.5261&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;3&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;10&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;11&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.483393&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.00771038&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;49&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;391&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;589&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;50&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;391&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;589&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;49&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;391&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;589&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;50&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;391&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;589&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td class="text-right"&gt;&lt;p&gt;3&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;13&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;19&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;6.82843&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.464325&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.0139064&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;52&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;389&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;588&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;55&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;385&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;588&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;52&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;389&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;588&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;55&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;385&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;588&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;5&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td class="text-right"&gt;&lt;p&gt;4&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;7&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;21&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;23&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.45862&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.0104024&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;57&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;382&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;587&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;58&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;380&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;586&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;57&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;382&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;587&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;58&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;380&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;586&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2.44949&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="pst-scrollable-table-container"&gt;&lt;table class="table"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head text-left"&gt;&lt;p&gt;&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;skeleton-id&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;node-id-src&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;node-id-dst&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;branch-distance&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;branch-type&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;mean-pixel-value&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;stdev-pixel-value&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-src-0&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-src-1&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-src-2&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-dst-0&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-dst-1&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;image-coord-dst-2&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-src-0&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-src-1&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-src-2&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-dst-0&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-dst-1&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;coord-dst-2&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-right"&gt;&lt;p&gt;euclidean-distance&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td class="text-left"&gt;&lt;p&gt;count&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1095&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td class="text-left"&gt;&lt;p&gt;mean&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2089.38&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;11520.1&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;11608.6&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;22.9079&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2.00091&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.663422&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.0418607&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;591.939&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;430.303&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;377.409&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;594.325&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;436.596&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;373.419&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;591.939&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;430.303&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;377.409&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;594.325&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;436.596&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;373.419&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;190.13&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td class="text-left"&gt;&lt;p&gt;std&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;636.377&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;6057.61&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;6061.18&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;24.2646&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.0302199&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.242828&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.0559064&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;174.04&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;194.499&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;97.0219&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;173.353&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;188.708&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;96.8276&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;174.04&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;194.499&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;97.0219&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;173.353&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;188.708&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;96.8276&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;151.171&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td class="text-left"&gt;&lt;p&gt;min&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.414659&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;6.79493e-06&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;22&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;39&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;116&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;22&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;39&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;114&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;22&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;39&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;116&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;22&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;39&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;114&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td class="text-left"&gt;&lt;p&gt;25%&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1586&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;6215.5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;6429.5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1.73205&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.482&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.00710439&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;468.5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;278.5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;313&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;475&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;299.5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;307&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;468.5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;278.5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;313&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;475&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;299.5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;307&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;72.6946&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td class="text-left"&gt;&lt;p&gt;50%&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2431&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;11977&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;12010&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;16.6814&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.552626&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.0189069&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;626&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;405&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;388&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;627&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;410&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;381&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;626&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;405&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;388&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;627&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;410&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;381&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;161.059&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td class="text-left"&gt;&lt;p&gt;75%&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2542.5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;16526.5&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;16583&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;35.0433&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.768359&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.0528814&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;732&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;579&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;434&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;734&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;590&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;432&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;732&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;579&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;434&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;734&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;590&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;432&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;265.948&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td class="text-left"&gt;&lt;p&gt;max&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;8034&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;26820&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;26822&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;197.147&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;3&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;1.29687&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;0.357193&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;976&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;833&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;622&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;976&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;841&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;606&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;976&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;833&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;622&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;976&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;841&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;606&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-right"&gt;&lt;p&gt;737.835&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;Success!&lt;/p&gt;
&lt;p&gt;We’ve achieved distributed skeleton analysis with Dask.
You can see the example notebook containing the full details of the skeleton analysis &lt;a class="reference external" href="https://github.com/GenevieveBuckley/distributed-skeleton-analysis/blob/main/distributed-skeleton-analysis-with-dask.ipynb"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 294)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="what-s-next"&gt;
&lt;h1&gt;What’s next?&lt;/h1&gt;
&lt;p&gt;A good next step is modifying the &lt;a class="reference external" href="https://github.com/jni/skan"&gt;skan&lt;/a&gt; library code so that it directly supports distributed skeleton analysis.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/05/07/skeleton-analysis.md&lt;/span&gt;, line 298)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="how-you-can-help"&gt;
&lt;h1&gt;How you can help&lt;/h1&gt;
&lt;p&gt;If you’d like to get involved, there are a couple of options:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Try a similar analysis on your own data. The notebook with the full example code is &lt;a class="reference external" href="https://github.com/GenevieveBuckley/distributed-skeleton-analysis/blob/main/distributed-skeleton-analysis-with-dask.ipynb"&gt;available here&lt;/a&gt;. You can share your results or ask questions in the &lt;a class="reference external" href="https://join.slack.com/t/dask/shared_invite/zt-mfmh7quc-nIrXL6ocgiUH2haLYA914g"&gt;Dask Slack&lt;/a&gt; or &lt;a class="reference external" href="https://twitter.com/dask_dev"&gt;on Twitter&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Help add support for distributed skeleton analysis to skan. Head on over to the &lt;a class="reference external" href="https://github.com/jni/skan/issues/"&gt;skan issues page&lt;/a&gt; and leave a comment if you’d like to join in.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/05/07/skeleton-analysis/"/>
    <summary>Distributed skeleton analysis of image data with Dask and skan.</summary>
    <category term="imaging" label="imaging"/>
    <category term="lifescience" label="life science"/>
    <category term="skan" label="skan"/>
    <category term="skeletonanalysis" label="skeleton analysis"/>
    <published>2021-05-07T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/03/29/apply-pretrained-pytorch-model/</id>
    <title>Dask with PyTorch for large scale image analysis</title>
    <updated>2021-03-29T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/29/apply-pretrained-pytorch-model.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="executive-summary"&gt;

&lt;p&gt;This post explores applying a pre-trained &lt;a class="reference external" href="https://pytorch.org/"&gt;PyTorch&lt;/a&gt; model in parallel with Dask Array.&lt;/p&gt;
&lt;p&gt;We cover a simple example applying a pre-trained UNet to a stack of images to generate features for every pixel.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/29/apply-pretrained-pytorch-model.md&lt;/span&gt;, line 15)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="a-worked-example"&gt;
&lt;h1&gt;A Worked Example&lt;/h1&gt;
&lt;p&gt;Let’s start with an example applying a pre-trained &lt;a class="reference external" href="https://arxiv.org/abs/1505.04597"&gt;UNet&lt;/a&gt; to a stack of light sheet microscopy data.&lt;/p&gt;
&lt;p&gt;In this example, we:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Load the image data from Zarr into a multi-chunked Dask array&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Load a pre-trained PyTorch model that featurizes images&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Construct a function to apply the model onto each chunk&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply that function across the Dask array with the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.array.map_blocks&lt;/span&gt;&lt;/code&gt; function&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Store the result back into Zarr format&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;section id="step-1-load-the-image-data"&gt;
&lt;h2&gt;Step 1. Load the image data&lt;/h2&gt;
&lt;p&gt;First, we load the image data into a Dask array.&lt;/p&gt;
&lt;p&gt;The example dataset we’re using here is lattice lightsheet microscopy of the tail region of a zebrafish embryo. It is described in &lt;a class="reference external" href="http://dx.doi.org/10.1126/science.aaq1392"&gt;this Science paper&lt;/a&gt; (see Figure 4), and provided with permission from Srigokul Upadhyayula.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Liu &lt;em&gt;et al.&lt;/em&gt; 2018 “Observing the cell in its native state: Imaging subcellular dynamics in multicellular organisms” &lt;em&gt;Science&lt;/em&gt;, Vol. 360, Issue 6386, eaaq1392 DOI: 10.1126/science.aaq1392 (&lt;a class="reference external" href="http://dx.doi.org/10.1126/science.aaq1392"&gt;link&lt;/a&gt;)&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;This is the same data that we analysed in our earlier &lt;a class="reference external" href="https://blog.dask.org/2019/08/09/image-itk"&gt;blogpost on Dask and ITK&lt;/a&gt;. Note the similarities to that workflow, even though we are now using new libraries and performing different analyses.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/Users/nicholassofroniew/Github/image-demos/data/LLSM&amp;#39;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Load our data&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="n"&gt;imgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_zarr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AOLLSM_m4_560nm.zarr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;imgs&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;zarr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;199&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunksize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
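&lt;p&gt;As a quick sanity check on the chunking, the sketch below works out, from the shape, chunk size and dtype printed above, how many chunks the array has and how much memory each one needs. It is plain Python arithmetic rather than a Dask call; on the Dask array itself the same information is available from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;imgs.numblocks&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;imgs.nbytes&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;

```python
import math

# Shape and chunk size reported by Dask for the zebrafish dataset above.
shape = (20, 199, 768, 1024)   # (time, z-slice, y, x)
chunks = (1, 1, 768, 1024)     # one 2D image per chunk
itemsize = 4                   # float32 uses 4 bytes per pixel

# Number of chunks along each axis, and in total.
blocks_per_axis = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_chunks = math.prod(blocks_per_axis)

# Size of a single chunk (MiB) and of the whole array (GiB).
chunk_mib = math.prod(chunks) * itemsize / 2**20
total_gib = math.prod(shape) * itemsize / 2**30

print(blocks_per_axis)      # (20, 199, 1, 1)
print(n_chunks)             # 3980
print(chunk_mib)            # 3.0
print(round(total_gib, 2))  # 11.66
```

&lt;p&gt;Each of these chunks later becomes one call to the featurization function in Step 4, so the chunking chosen here sets how many model calls Dask schedules and how much data each call holds in memory.&lt;/p&gt;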
&lt;/section&gt;
&lt;section id="step-2-load-a-pre-trained-pytorch-model"&gt;
&lt;h2&gt;Step 2. Load a pre-trained PyTorch model&lt;/h2&gt;
&lt;p&gt;Next, we load our pre-trained UNet model.&lt;/p&gt;
&lt;p&gt;This UNet model takes a 2D image as input and returns an array with one extra axis of length 16: each pixel is now associated with a feature vector of length 16.&lt;/p&gt;
&lt;p&gt;We thank Mars Huang for training this particular UNet on a corpus of biological images to produce biologically relevant feature vectors, as part of his work on &lt;a class="reference external" href="https://github.com/transformify-plugins/segmentify"&gt;interactive bio-image segmentation&lt;/a&gt;. These features can then be used for downstream image processing tasks such as image segmentation.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Load our pretrained UNet¶&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;segmentify.model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;UNet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;layers&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;load_unet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;Load a pretrained UNet model.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;

    &lt;span class="c1"&gt;# load in saved model&lt;/span&gt;
    &lt;span class="n"&gt;pth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pth&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;model_args&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;model_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pth&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;model_state&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UNet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;model_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_state_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# remove last layer and activation&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;segment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Identity&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;activate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Identity&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;load_unet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;HPA_3.pth&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="step-3-construct-a-function-to-apply-the-model-to-each-chunk"&gt;
&lt;h2&gt;Step 3. Construct a function to apply the model to each chunk&lt;/h2&gt;
&lt;p&gt;We make a function to apply our pre-trained UNet model to each chunk of the Dask array.&lt;/p&gt;
&lt;p&gt;Each chunk of a Dask array is a NumPy array, which is easily converted to a PyTorch tensor. This means we can run the model over every chunk and apply machine learning at scale.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Apply UNet featurization&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;unet_featurize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;Featurize pixels in an image using pretrained UNet model.&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract the 2D image data from the Dask array&lt;/span&gt;
    &lt;span class="c1"&gt;# Original Dask array dimensions were (time, z-slice, y, x)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Put the data into a shape PyTorch expects&lt;/span&gt;
    &lt;span class="c1"&gt;# Expected dimensions are (Batch x Channel x Width x Height)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# convert image to torch Tensor&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# pass image through model&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# generate feature vectors (w,h,f)&lt;/span&gt;
    &lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Add back the leading length-one dimensions&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Note: Very observant readers might notice that the steps for extracting the 2D image data and then putting it into the shape PyTorch expects appear to be redundant. They are redundant in this particular example, but that is not true in general.&lt;/p&gt;
&lt;p&gt;To explain this in more detail, the UNet expects 4D input, with dimensions &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;(Batch&lt;/span&gt; &lt;span class="pre"&gt;x&lt;/span&gt; &lt;span class="pre"&gt;Channel&lt;/span&gt; &lt;span class="pre"&gt;x&lt;/span&gt; &lt;span class="pre"&gt;Height&lt;/span&gt; &lt;span class="pre"&gt;x&lt;/span&gt; &lt;span class="pre"&gt;Width)&lt;/span&gt;&lt;/code&gt;. The original Dask array dimensions were &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;(time,&lt;/span&gt; &lt;span class="pre"&gt;z-slice,&lt;/span&gt; &lt;span class="pre"&gt;y,&lt;/span&gt; &lt;span class="pre"&gt;x)&lt;/span&gt;&lt;/code&gt;. Because each chunk holds a single image, with shape &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;(1,&lt;/span&gt; &lt;span class="pre"&gt;1,&lt;/span&gt; &lt;span class="pre"&gt;y,&lt;/span&gt; &lt;span class="pre"&gt;x)&lt;/span&gt;&lt;/code&gt;, its two leading length-one dimensions line up with the batch and channel dimensions, so removing and then re-adding them changes nothing. Depending on the shape of the original Dask array and its chunks, this might not have been true.&lt;/p&gt;
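&lt;p&gt;The shape bookkeeping is easier to see on a small example. The sketch below uses plain NumPy with made-up sizes (a 4 x 6 image instead of 768 x 1024) and an array of zeros as a hypothetical stand-in for the UNet output, so no model is needed. It only mirrors the reshaping steps inside &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;unet_featurize&lt;/span&gt;&lt;/code&gt;, and shows one chunk shape for which the round trip would not be a no-op.&lt;/p&gt;

```python
import numpy as np

# A small stand-in chunk with the same layout as the real ones: (time, z-slice, y, x).
# The 4 x 6 image size is arbitrary; the real chunks are 768 x 1024.
chunk = np.zeros((1, 1, 4, 6), dtype=np.float32)

img = chunk[0, 0, ...]           # 2D image, shape (y, x) = (4, 6)
batch = img[None, None, ...]     # PyTorch layout (N, C, H, W) = (1, 1, 4, 6)
print(batch.shape == chunk.shape)  # True: the round trip is a no-op here

# Hypothetical placeholder for the model output: 16 feature maps, (N, 16, H, W).
features = np.zeros((1, 16, 4, 6), dtype=np.float32)

per_pixel = np.transpose(features, (0, 2, 3, 1))[0]  # (H, W, 16)
result = per_pixel[None, None, ...]                  # (1, 1, H, W, 16)
print(per_pixel.shape)  # (4, 6, 16)
print(result.shape)     # (1, 1, 4, 6, 16)

# If a chunk held two z-slices, extracting [0, 0, ...] would silently drop one.
two_slices = np.zeros((1, 2, 4, 6), dtype=np.float32)
print(two_slices[0, 0, ...][None, None, ...].shape)  # (1, 1, 4, 6)
```
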
&lt;/section&gt;
&lt;section id="step-4-apply-that-function-across-the-dask-array"&gt;
&lt;h2&gt;Step 4. Apply that function across the Dask array&lt;/h2&gt;
&lt;p&gt;Now we apply that function to the data in our Dask array using &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html?highlight=map_blocks#dask.array.map_blocks"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.array.map_blocks&lt;/span&gt;&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Apply UNet featurization&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unet_featurize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;imgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;new_axis&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;unet_featurize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;199&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunksize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
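&lt;p&gt;To make clear what &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt; is doing for us here, the sketch below reproduces the idea serially with plain NumPy: loop over the chunk grid, call a function on each block, and write each result into a preallocated output that has a new trailing feature axis. It is a conceptual illustration only; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks_sketch&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;toy_featurize&lt;/span&gt;&lt;/code&gt; are hypothetical names, and Dask itself builds a lazy task graph and can run the blocks in parallel instead of looping.&lt;/p&gt;

```python
import numpy as np


def map_blocks_sketch(func, arr, block_shape, n_features):
    """Serially apply func to each block of arr and reassemble the results.

    Assumes block_shape divides arr.shape exactly and that func returns a
    block with the same leading shape plus one trailing axis of n_features.
    """
    assert all(s % b == 0 for s, b in zip(arr.shape, block_shape)), \
        "block_shape must divide arr.shape"
    grid = tuple(s // b for s, b in zip(arr.shape, block_shape))
    out = np.empty(arr.shape + (n_features,), dtype=np.float32)
    for index in np.ndindex(*grid):
        # Slices selecting the block at this position of the chunk grid.
        sl = tuple(slice(i * b, (i + 1) * b) for i, b in zip(index, block_shape))
        out[sl] = func(arr[sl])
    return out


def toy_featurize(block):
    # Hypothetical stand-in for unet_featurize: 16 "features" per pixel,
    # here simply the pixel value multiplied by 0, 1, ..., 15.
    return block[..., None] * np.arange(16, dtype=np.float32)


imgs_small = np.random.default_rng(0).random((2, 3, 4, 5), dtype=np.float32)
out_small = map_blocks_sketch(toy_featurize, imgs_small, (1, 1, 4, 5), 16)
print(out_small.shape)  # (2, 3, 4, 5, 16)
```

&lt;p&gt;Because each call only sees one chunk, the work for a single task is bounded by the size of one input chunk plus its 16-feature output, regardless of how large the full array is.&lt;/p&gt;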
&lt;/section&gt;
&lt;section id="step-5-store-the-result-back-into-zarr-format"&gt;
&lt;h2&gt;Step 5. Store the result back into Zarr format&lt;/h2&gt;
&lt;p&gt;Finally, we store the result of the UNet featurization as a Zarr array.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Trigger computation and store&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_zarr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;AOLLSM_featurized.zarr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overwrite&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Now that we’ve saved our output, these features can be used for downstream image processing tasks such as image segmentation.&lt;/p&gt;
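&lt;p&gt;It can be worth estimating how much data this step writes before running it. The arithmetic below (plain Python, using the output shape and chunk size reported in Step 4) gives the size before compression: each 3 MiB float32 input image becomes a 48 MiB block of features, so the featurized array is 16 times larger than the input. The actual size on disk will depend on the compressor Zarr uses.&lt;/p&gt;

```python
import math

itemsize = 4  # float32 uses 4 bytes per value

# Input and output chunk sizes, and the output shape, as reported by Dask above.
in_chunks = (1, 1, 768, 1024)
out_chunks = (1, 1, 768, 1024, 16)
out_shape = (20, 199, 768, 1024, 16)

in_chunk_mib = math.prod(in_chunks) * itemsize / 2**20
out_chunk_mib = math.prod(out_chunks) * itemsize / 2**20
out_total_gib = math.prod(out_shape) * itemsize / 2**30

print(out_chunk_mib)                 # 48.0
print(out_chunk_mib / in_chunk_mib)  # 16.0
print(out_total_gib)                 # 186.5625
```
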
&lt;/section&gt;
&lt;section id="summing-up"&gt;
&lt;h2&gt;Summing up&lt;/h2&gt;
&lt;p&gt;Here we’ve shown how to apply a pre-trained PyTorch model to a Dask array of image data.&lt;/p&gt;
&lt;p&gt;Because our Dask array chunks are NumPy arrays, they can be easily converted to PyTorch tensors. This is what lets us apply machine learning at scale.&lt;/p&gt;
&lt;p&gt;This workflow was very similar to &lt;a class="reference external" href="https://blog.dask.org/2019/08/09/image-itk"&gt;our example&lt;/a&gt; using the dask.array.map_blocks function with ITK to perform image deconvolution. This shows you can easily adapt the same type of workflow to achieve many different types of analysis with Dask.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/03/29/apply-pretrained-pytorch-model/"/>
    <summary>This post explores applying a pre-trained PyTorch model in parallel with Dask Array.</summary>
    <category term="PyTorch" label="PyTorch"/>
    <category term="deeplearning" label="deep learning"/>
    <category term="imaging" label="imaging"/>
    <published>2021-03-29T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/03/19/image-segmentation/</id>
    <title>Image segmentation with Dask</title>
    <updated>2021-03-19T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/19/image-segmentation.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="executive-summary"&gt;

&lt;p&gt;We look at how to create a basic image segmentation pipeline, using the &lt;a class="reference external" href="http://image.dask.org/en/latest/"&gt;dask-image&lt;/a&gt; library.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/19/image-segmentation.md&lt;/span&gt;, line 13)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="contents"&gt;
&lt;h1&gt;Contents&lt;/h1&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#just-show-me-the-code"&gt;&lt;span class="xref myst"&gt;Just show me the code&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#image-segmentation-pipeline"&gt;&lt;span class="xref myst"&gt;Image segmentation pipeline&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#set-up-your-python-environment"&gt;&lt;span class="xref myst"&gt;Set up your python environment&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#download-the-example-data"&gt;&lt;span class="xref myst"&gt;Download the example data&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#step-1-reading-in-data"&gt;&lt;span class="xref myst"&gt;Step 1: Reading in data&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#step-2-siltering-images"&gt;&lt;span class="xref myst"&gt;Step 2: Filtering images&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#step-3-segmenting-objects"&gt;&lt;span class="xref myst"&gt;Step 3: Segmenting objects&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#step-4-morphological-operations"&gt;&lt;span class="xref myst"&gt;Step 4: Morphological operations&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#step-5-measuring-objects"&gt;&lt;span class="xref myst"&gt;Step 5: Measuring objects&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#custom-functions"&gt;&lt;span class="xref myst"&gt;Custom functions&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#dask-map_overlap-and-map_blocks"&gt;&lt;span class="xref myst"&gt;Dask map_overlap and map_blocks&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#dask-delayed"&gt;&lt;span class="xref myst"&gt;Dask delayed decorator&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#scikit-image-apply_parallel-function"&gt;&lt;span class="xref myst"&gt;scikit-image apply_parallel&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#scaling-up-computation"&gt;&lt;span class="xref myst"&gt;Scaling up computation&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#bonus-content-using-arrays-on-gpu"&gt;&lt;span class="xref myst"&gt;Bonus content: using arrays on GPU&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#how-you-can-get-involved"&gt;&lt;span class="xref myst"&gt;How you can get involved&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The content of this blog post originally appeared as &lt;a class="reference external" href="https://github.com/genevieveBuckley/dask-image-talk-2020"&gt;a conference talk in 2020&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="just-show-me-the-code"&gt;
&lt;h1&gt;Just show me the code&lt;/h1&gt;
&lt;p&gt;If you want to run this yourself, you’ll need to download the example data from the Broad Bioimage Benchmark Collection: &lt;a class="reference external" href="https://bbbc.broadinstitute.org/BBBC039"&gt;https://bbbc.broadinstitute.org/BBBC039&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And install these requirements:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;dask-image&amp;gt;=0.4.0&amp;quot;&lt;/span&gt; &lt;span class="n"&gt;tifffile&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Here’s our full pipeline:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image.imread&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;imread&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ndfilters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ndmorph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ndmeasure&lt;/span&gt;

&lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;data/BBBC039/images/*.tif&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;smoothed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndfilters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gaussian_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;thresh&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndfilters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threshold_local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;smoothed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunksize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;threshold_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smoothed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;thresh&lt;/span&gt;
&lt;span class="c1"&gt;# 2D cross sandwiched between zero layers, so each frame is processed independently&lt;/span&gt;
&lt;span class="n"&gt;structuring_element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]])&lt;/span&gt;
&lt;span class="n"&gt;binary_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndmorph&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;binary_opening&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threshold_images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;structure&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;structuring_element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;label_images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndmeasure&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;binary_images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;structuring_element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_features&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# label IDs 1..num_features (0 is background)&lt;/span&gt;
&lt;span class="n"&gt;area&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndmeasure&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;area&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label_images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mean_intensity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndmeasure&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label_images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can keep reading for a step-by-step walkthrough of this image segmentation pipeline, or you can skip ahead to the sections on &lt;a class="reference internal" href="#custom-functions"&gt;&lt;span class="xref myst"&gt;custom functions&lt;/span&gt;&lt;/a&gt;, &lt;a class="reference internal" href="#scaling-up-computation"&gt;&lt;span class="xref myst"&gt;scaling up computation&lt;/span&gt;&lt;/a&gt;, or &lt;a class="reference internal" href="#bonus-content-using-arrays-on-gpu"&gt;&lt;span class="xref myst"&gt;GPU acceleration&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="image-segmentation-pipeline"&gt;
&lt;h1&gt;Image segmentation pipeline&lt;/h1&gt;
&lt;p&gt;Our basic image segmentation pipeline has five steps:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Reading in data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Filtering images&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Segmenting objects&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Morphological operations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Measuring objects&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;section id="set-up-your-python-environment"&gt;
&lt;h2&gt;Set up your python environment&lt;/h2&gt;
&lt;p&gt;Before we begin, we’ll need to set up our Python virtual environment.&lt;/p&gt;
&lt;p&gt;At a minimum, you’ll need:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;dask-image&amp;gt;=0.4.0&amp;quot;&lt;/span&gt; &lt;span class="n"&gt;tifffile&lt;/span&gt; &lt;span class="n"&gt;matplotlib&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Optionally, you can also install the &lt;a class="reference external" href="https://napari.org/"&gt;napari&lt;/a&gt; image viewer to visualize the image segmentation.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;napari[all]&amp;quot;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To use napari from IPython or Jupyter, run the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;%gui&lt;/span&gt; &lt;span class="pre"&gt;qt&lt;/span&gt;&lt;/code&gt; magic in a cell before calling napari. See the &lt;a class="reference external" href="https://napari.org/"&gt;napari documentation&lt;/a&gt; for more details.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="download-the-example-data"&gt;
&lt;h2&gt;Download the example data&lt;/h2&gt;
&lt;p&gt;We’ll use the publicly available image dataset &lt;a class="reference external" href="https://bbbc.broadinstitute.org/BBBC039"&gt;BBBC039&lt;/a&gt; (Caicedo et al., 2018), available from the Broad Bioimage Benchmark Collection (&lt;a class="reference external" href="http://dx.doi.org/10.1038/nmeth.2083"&gt;Ljosa et al., Nature Methods, 2012&lt;/a&gt;). You can download the dataset here: &lt;a class="reference external" href="https://bbbc.broadinstitute.org/BBBC039"&gt;https://bbbc.broadinstitute.org/BBBC039&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Example image from the BBBC039 dataset, Broad Bioimage Benchmark Collection" src="https://blog.dask.org/_images/BBBC039-example-image.png" /&gt;&lt;/p&gt;
&lt;p&gt;These are fluorescence microscopy images, where we see the nuclei in individual cells.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="step-1-reading-in-data"&gt;
&lt;h2&gt;Step 1: Reading in data&lt;/h2&gt;
&lt;p&gt;Step one in our image segmentation pipeline is to read in the image data. We can do that with the &lt;a class="reference external" href="http://image.dask.org/en/latest/dask_image.imread.html"&gt;dask-image imread function&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We pass a glob pattern that matches all the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;*.tif&lt;/span&gt;&lt;/code&gt; images in our example data folder.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image.imread&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;imread&lt;/span&gt;

&lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;data/BBBC039/images/*.tif&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="HTML representation of a Dask array" src="https://blog.dask.org/_images/dask-array-html-repr.png" /&gt;&lt;/p&gt;
&lt;p&gt;By default, each individual &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.tif&lt;/span&gt;&lt;/code&gt; file on disk has become one chunk in our Dask array.&lt;/p&gt;
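&lt;p&gt;To make the chunk layout concrete, here is a small NumPy-only sketch (no Dask and no real files are involved; the three tiny 4x5 frames are made up). It stacks one 2D frame per “file” into a 3D array and splits it along the first axis, one frame per chunk. For single-page 2D TIFFs this should match the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;(1,&lt;/span&gt; &lt;span class="pre"&gt;height,&lt;/span&gt; &lt;span class="pre"&gt;width)&lt;/span&gt;&lt;/code&gt; shape that &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;images.chunksize&lt;/span&gt;&lt;/code&gt; reports.&lt;/p&gt;

```python
import numpy as np

# Made-up stand-ins for three single-page 2D .tif files (real BBBC039 frames are larger).
frames = [np.full((4, 5), fill_value=i, dtype=np.uint16) for i in range(3)]

# Stacking gives a 3D array of shape (n_frames, height, width).
stack = np.stack(frames)

# One file per chunk: each chunk spans exactly one frame along axis 0.
chunk_shape = (1,) + frames[0].shape
chunks = [stack[i : i + 1] for i in range(stack.shape[0])]

print(stack.shape)                # (3, 4, 5)
print(chunk_shape)                # (1, 4, 5)
print([c.shape for c in chunks])  # [(1, 4, 5), (1, 4, 5), (1, 4, 5)]
```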
&lt;/section&gt;
&lt;section id="step-2-filtering-images"&gt;
&lt;h2&gt;Step 2: Filtering images&lt;/h2&gt;
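&lt;p&gt;Denoising images with a small amount of blur can improve segmentation later on, and it is a common first step in many image segmentation pipelines. We can do this with the dask-image &lt;a class="reference external" href="http://image.dask.org/en/latest/dask_image.ndfilters.html#dask_image.ndfilters.gaussian_filter"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;gaussian_filter&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sigma&lt;/span&gt;&lt;/code&gt; value is given per axis: the first axis indexes the image frames, so a sigma of 0 there means we only blur within each 2D frame, never across neighbouring frames.&lt;/p&gt;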
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ndfilters&lt;/span&gt;

&lt;span class="n"&gt;smoothed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndfilters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gaussian_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
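&lt;p&gt;The effect of a zero sigma along the frame axis can be checked with a NumPy-only approximation of a separable Gaussian blur. The helpers &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;gaussian_kernel_1d&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;separable_gaussian&lt;/span&gt;&lt;/code&gt; are hypothetical, and this sketch pads with zeros at the edges, whereas scipy and dask-image reflect by default. It shows that changing one frame leaves the blurred neighbouring frames untouched.&lt;/p&gt;

```python
import numpy as np


def gaussian_kernel_1d(sigma, truncate=4.0):
    """Normalised 1D Gaussian kernel; sigma == 0 gives the identity kernel [1.0]."""
    if sigma == 0:
        return np.array([1.0])
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def separable_gaussian(volume, sigma):
    """Blur each axis in turn with a 1D Gaussian (zero padding at the edges)."""
    out = volume.astype(float)
    for axis, s in enumerate(sigma):
        kernel = gaussian_kernel_1d(s)
        out = np.apply_along_axis(np.convolve, axis, out, kernel, mode="same")
    return out


rng = np.random.default_rng(0)
stack = rng.random((3, 16, 16))  # three made-up 16x16 frames

smoothed = separable_gaussian(stack, sigma=[0, 1, 1])

# Change only frame 1 and blur again: with sigma=0 along axis 0,
# frames 0 and 2 are not mixed with frame 1, so they stay identical.
modified = stack.copy()
modified[1] += 100.0
smoothed_modified = separable_gaussian(modified, sigma=[0, 1, 1])

print(np.allclose(smoothed[0], smoothed_modified[0]))  # True
print(np.allclose(smoothed[2], smoothed_modified[2]))  # True
```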
&lt;/section&gt;
&lt;section id="step-3-segmenting-objects"&gt;
&lt;h2&gt;Step 3: Segmenting objects&lt;/h2&gt;
&lt;p&gt;Next, we want to separate the objects in our images from the background. There are many ways we could do this. Because we have fluorescence microscopy images, where bright objects sit on a dark background, we’ll use a thresholding method.&lt;/p&gt;
&lt;section id="absolute-threshold"&gt;
&lt;h3&gt;Absolute threshold&lt;/h3&gt;
&lt;p&gt;We could set an absolute threshold value, where we’d consider pixels with intensity values below this threshold to be part of the background.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;absolute_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smoothed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;160&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Let’s have a look at these images with the napari image viewer. First we’ll need to use the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;%gui&lt;/span&gt; &lt;span class="pre"&gt;qt&lt;/span&gt;&lt;/code&gt; magic:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;gui&lt;/span&gt; &lt;span class="n"&gt;qt&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And now we can look at the images with napari:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;napari&lt;/span&gt;

&lt;span class="n"&gt;viewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;napari&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Viewer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;viewer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;absolute_threshold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;viewer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contrast_limits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/2021-image-segmentation/napari-absolute-threshold.png" alt="Absolute threshold napari screenshot" width="700" height="476"&gt;
&lt;p&gt;But there’s a problem here.&lt;/p&gt;
&lt;p&gt;When we look at the results for different image frames, it becomes clear that there is no “one size fits all” value we can use as an absolute threshold. Some images in the dataset have quite bright backgrounds, while others have dim fluorescent nuclei. We’ll have to try a different kind of thresholding method.&lt;/p&gt;
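&lt;p&gt;A toy NumPy illustration of the problem, using two made-up 8x8 frames rather than the real data: the same absolute threshold of 160 picks out the nucleus in a frame with a dim background, but marks every pixel as foreground in a frame with a bright background.&lt;/p&gt;

```python
import numpy as np

# Two made-up 8x8 frames, each with one bright 3x3 "nucleus".
dim_background = np.full((8, 8), 100.0)
dim_background[2:5, 2:5] = 400.0

bright_background = np.full((8, 8), 200.0)
bright_background[2:5, 2:5] = 500.0

frames = np.stack([dim_background, bright_background])
true_nucleus_pixels = 9  # the 3x3 square in each frame

# One absolute threshold applied to every frame.
absolute_mask = frames > 160

for i, mask in enumerate(absolute_mask):
    print(f"frame {i}: {mask.sum()} foreground pixels (expected {true_nucleus_pixels})")
# frame 0: 9 foreground pixels (expected 9)
# frame 1: 64 foreground pixels (expected 9)
```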
&lt;/section&gt;
&lt;section id="local-threshold"&gt;
&lt;h3&gt;Local threshold&lt;/h3&gt;
&lt;p&gt;We can improve the segmentation using a local thresholding method.&lt;/p&gt;
&lt;p&gt;If we calculate a threshold value independently for each image frame then we can avoid the problem caused by fluctuating background intensity between frames.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;thresh&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndfilters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threshold_local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;smoothed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunksize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;threshold_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smoothed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;thresh&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Let&amp;#39;s take a look at the images with napari&lt;/span&gt;
&lt;span class="n"&gt;viewer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threshold_images&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/2021-image-segmentation/napari-local-threshold.png" alt="Local threshold napari screenshot" width="700" height="476"&gt;
&lt;p&gt;The results look much better: the nuclei are cleanly separated from the background, and the segmentation looks good for all the image frames.&lt;/p&gt;
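&lt;p&gt;For comparison, here is a toy per-frame version on the same kind of made-up frames. As a deliberate simplification, the sketch uses each frame’s mean intensity as that frame’s threshold; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;threshold_local&lt;/span&gt;&lt;/code&gt; instead computes a threshold for each pixel from its local neighbourhood, so treat this only as an illustration of why adapting the threshold to each frame helps.&lt;/p&gt;

```python
import numpy as np

# Made-up frames: one dim background, one bright background, same 3x3 nucleus.
dim_background = np.full((8, 8), 100.0)
dim_background[2:5, 2:5] = 400.0
bright_background = np.full((8, 8), 200.0)
bright_background[2:5, 2:5] = 500.0
frames = np.stack([dim_background, bright_background])

# One threshold per frame: here simply each frame's mean intensity.
# keepdims=True keeps shape (n_frames, 1, 1) so it broadcasts over each frame.
per_frame_thresh = frames.mean(axis=(1, 2), keepdims=True)
per_frame_mask = frames > per_frame_thresh

for i, mask in enumerate(per_frame_mask):
    print(f"frame {i}: threshold {per_frame_thresh[i, 0, 0]:.1f}, {mask.sum()} foreground pixels")
# frame 0: threshold 142.2, 9 foreground pixels
# frame 1: threshold 242.2, 9 foreground pixels
```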
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="step-4-morphological-operations"&gt;
&lt;h2&gt;Step 4: Morphological operations&lt;/h2&gt;
&lt;p&gt;Now that we have a binary mask from our threshold, we can clean it up a bit with some morphological operations.&lt;/p&gt;
&lt;p&gt;Morphological operations are changes we make to the shape of structures in a binary image. We’ll briefly describe some of the basic concepts here, but for a more detailed reference you can look at &lt;a class="reference external" href="https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html"&gt;this excellent page of the OpenCV documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Erosion&lt;/strong&gt; is an operation where the edges of structures in a binary image are eaten away, or eroded.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Example: Erosion of a binary image" src="https://blog.dask.org/_images/erosion.png" /&gt;&lt;/p&gt;
&lt;p&gt;Image credit: &lt;a class="reference external" href="https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html"&gt;OpenCV documentation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dilation&lt;/strong&gt; is the opposite of an erosion. With dilation, the edges of structures in a binary image are expanded.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Example: Dilation of a binary image" src="https://blog.dask.org/_images/dilation.png" /&gt;&lt;/p&gt;
&lt;p&gt;Image credit: &lt;a class="reference external" href="https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html"&gt;OpenCV documentation&lt;/a&gt;&lt;/p&gt;
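&lt;p&gt;Here is a minimal NumPy-only sketch of both operations on a made-up 7x7 mask, using a 3x3 cross-shaped structuring element. The helpers &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;shifted_views&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;erode&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dilate&lt;/span&gt;&lt;/code&gt; are hypothetical names for illustration, and pixels beyond the image border are treated as background.&lt;/p&gt;

```python
import numpy as np

# 3x3 cross-shaped structuring element.
CROSS = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=bool)


def shifted_views(mask, structure):
    """Yield the mask shifted by every offset where the structuring element is 1.

    Pixels outside the image are treated as background (False).
    """
    pad = structure.shape[0] // 2
    padded = np.pad(mask, pad, constant_values=False)
    rows, cols = mask.shape
    for dy, dx in zip(*np.nonzero(structure)):
        yield padded[dy : dy + rows, dx : dx + cols]


def erode(mask, structure=CROSS):
    # A pixel survives only if every pixel under the element is foreground.
    return np.logical_and.reduce(list(shifted_views(mask, structure)))


def dilate(mask, structure=CROSS):
    # A pixel becomes foreground if any pixel under the element is foreground.
    return np.logical_or.reduce(list(shifted_views(mask, structure)))


square = np.zeros((7, 7), dtype=bool)
square[2:5, 2:5] = True  # a 3x3 square, 9 pixels

print(erode(square).sum())   # 1  (only the centre pixel survives)
print(dilate(square).sum())  # 21 (the square grows by one pixel on each side)
```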
&lt;p&gt;We can combine morphological operations in different ways to get useful effects.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;morphological opening&lt;/strong&gt; operation is an erosion, followed by a dilation.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Example: Morphological opening of a binary image" src="https://blog.dask.org/_images/opening.png" /&gt;&lt;/p&gt;
&lt;p&gt;Image credit: &lt;a class="reference external" href="https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html"&gt;OpenCV documentation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In the example image above, we can see the left-hand side has a noisy, speckled background. If the structuring element used for the morphological operations is larger than the noisy speckles, they will disappear completely in the first (erosion) step. Then, when it is time for the second (dilation) step, there’s nothing left of the background noise to dilate. So we have removed the noise in the background, while the major structures we are interested in (in this example, the J shape) are restored almost perfectly.&lt;/p&gt;
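&lt;p&gt;The same idea as a toy NumPy sketch (hypothetical helper names, a made-up 12x12 mask and a 3x3 cross structuring element): two single-pixel speckles vanish in the erosion, and the dilation restores the 6x6 square apart from its four corners, which a cross-shaped element cannot fill back in.&lt;/p&gt;

```python
import numpy as np

CROSS = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=bool)


def _neighbourhood(mask, structure):
    # Stack of the mask shifted to every offset where the element is 1
    # (pixels beyond the border count as background).
    pad = structure.shape[0] // 2
    padded = np.pad(mask, pad, constant_values=False)
    rows, cols = mask.shape
    return np.stack([padded[dy : dy + rows, dx : dx + cols]
                     for dy, dx in zip(*np.nonzero(structure))])


def binary_open(mask, structure=CROSS):
    eroded = _neighbourhood(mask, structure).all(axis=0)  # erosion first...
    return _neighbourhood(eroded, structure).any(axis=0)  # ...then dilation


mask = np.zeros((12, 12), dtype=bool)
mask[3:9, 3:9] = True  # a 6x6 "nucleus" (36 pixels)
mask[0, 11] = True     # isolated single-pixel speckles of noise
mask[11, 0] = True

opened = binary_open(mask)
print(opened[0, 11], opened[11, 0])  # False False: the speckles are gone
print(opened[3:9, 3:9].sum())        # 32: the square minus its 4 corners
```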
&lt;p&gt;Let’s use this morphological opening trick to clean up the binary images in our segmentation pipeline.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ndmorph&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;structuring_element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]])&lt;/span&gt;
&lt;span class="n"&gt;binary_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndmorph&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;binary_opening&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threshold_images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;structure&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;structuring_element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You’ll notice here that we need to be a little careful about the structuring element. All our image frames are combined in a single 3D Dask array, but we want to apply the morphological operation to each frame independently.
To do this, we sandwich the default 2D structuring element between two layers of zeros. This means the neighbouring image frames have no effect on the result.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Default 2D structuring element&lt;/span&gt;

&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
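&lt;p&gt;To see what the zero layers buy us, this NumPy-only sketch (the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dilate_3d&lt;/span&gt;&lt;/code&gt; helper is hypothetical) dilates a single foreground pixel in the middle frame of a made-up three-frame volume. With the sandwiched element the result stays within that frame; with a full 3D cross it leaks into the frames before and after.&lt;/p&gt;

```python
import numpy as np

# 2D cross sandwiched between two all-zero layers, as in the pipeline.
sandwich = np.array([[[0, 0, 0], [0, 0, 0], [0, 0, 0]],
                     [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
                     [[0, 0, 0], [0, 0, 0], [0, 0, 0]]], dtype=bool)

# A full 3D cross for comparison: it also connects each pixel to the
# same position in the frames directly before and after it.
full_3d_cross = sandwich.copy()
full_3d_cross[0, 1, 1] = True
full_3d_cross[2, 1, 1] = True


def dilate_3d(volume, structure):
    """Binary dilation of a (frames, rows, cols) volume by a symmetric 3x3x3 element."""
    padded = np.pad(volume, 1, constant_values=False)
    f, r, c = volume.shape
    shifts = [padded[dz : dz + f, dy : dy + r, dx : dx + c]
              for dz, dy, dx in zip(*np.nonzero(structure))]
    return np.logical_or.reduce(shifts)


volume = np.zeros((3, 5, 5), dtype=bool)
volume[1, 2, 2] = True  # one foreground pixel in the middle frame only

for name, element in [("sandwich", sandwich), ("full 3D cross", full_3d_cross)]:
    dilated = dilate_3d(volume, element)
    print(name, "-> foreground pixels per frame:", dilated.sum(axis=(1, 2)).tolist())
# sandwich -> foreground pixels per frame: [0, 5, 0]
# full 3D cross -> foreground pixels per frame: [1, 5, 1]
```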
&lt;/section&gt;
&lt;section id="step-5-measuring-objects"&gt;
&lt;h2&gt;Step 5: Measuring objects&lt;/h2&gt;
&lt;p&gt;The last step in our image processing pipeline is to make some kind of measurement. We’ll turn our binary mask into a label image, and then measure the intensity and size of those objects.&lt;/p&gt;
&lt;p&gt;For the sake of keeping the computation time in this tutorial nice and quick, we’ll measure only a small subset of the data. Let’s measure all the objects in the first three image frames (roughly 300 nuclei).&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ndmeasure&lt;/span&gt;

&lt;span class="c1"&gt;# Create labelled mask&lt;/span&gt;
&lt;span class="n"&gt;label_images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndmeasure&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;binary_images&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;structuring_element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_features&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="c1"&gt;# [1, 2, 3, ...num_features]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Here’s a screenshot of the label image generated from our mask.&lt;/p&gt;
&lt;img src="/images/2021-image-segmentation/napari-label-image.png" alt="Label image napari screenshot" width="700" height="476"&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Number of nuclei:&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_features&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="go"&gt;Number of nuclei: 271&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;section id="measure-objects-in-images"&gt;
&lt;h3&gt;Measure objects in images&lt;/h3&gt;
&lt;p&gt;The dask-image &lt;a class="reference external" href="http://image.dask.org/en/latest/dask_image.ndmeasure.html"&gt;ndmeasure subpackage&lt;/a&gt; includes a number of different measurement functions. In this example, we’ll choose to measure:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The area in pixels of each object, and&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The average intensity of each object.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;area&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndmeasure&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;area&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;label_images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mean_intensity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ndmeasure&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;label_images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
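&lt;p&gt;To see what these per-object measurements mean, here is a small sketch on a made-up 3 x 4 image using NumPy and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scipy.ndimage&lt;/span&gt;&lt;/code&gt;, whose label-and-index conventions dask-image’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ndmeasure&lt;/span&gt;&lt;/code&gt; largely follows. Label 0 is the background, so the objects are numbered 1 to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;num_features&lt;/span&gt;&lt;/code&gt;. Here the area is computed with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;np.bincount&lt;/span&gt;&lt;/code&gt; rather than with dask-image.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import numpy as np
from scipy import ndimage

# Made-up 3x4 intensity image with two bright objects on a zero background
image = np.array([[10, 10, 0, 0],
                  [10, 20, 0, 5],
                  [0, 0, 0, 5]], dtype=float)

# Label connected foreground pixels; background stays 0
label_image, num_features = ndimage.label(np.greater(image, 0))
index = np.arange(1, num_features + 1)  # [1, 2]

# Area: number of pixels carrying each label
area = np.bincount(label_image.ravel())[index]

# Mean intensity: average image value over each labelled object
mean_intensity = ndimage.mean(image, label_image, index)

# area -&amp;gt; [4, 2], mean intensity -&amp;gt; [12.5, 5.0]
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;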
&lt;/section&gt;
&lt;section id="run-computation-and-plot-results"&gt;
&lt;h3&gt;Run computation and plot results&lt;/h3&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;area&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean_intensity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gca&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Area vs mean intensity&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Area (pixels)&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Mean intensity&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img alt="Matplotlib graph of dask-image measurement results: " src="https://blog.dask.org/_images/dask-image-matplotlib-output.png" /&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/19/image-segmentation.md&lt;/span&gt;, line 285)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="custom-functions"&gt;
&lt;h1&gt;Custom functions&lt;/h1&gt;
&lt;p&gt;What if you want to do something that isn’t included in the dask-image API? There are several options we can use to write custom functions.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;dask &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-overlap.html?highlight=map_overlap#dask.array.map_overlap"&gt;map_overlap&lt;/a&gt; / &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html?highlight=map_blocks#dask.array.map_blocks"&gt;map_blocks&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;dask &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;delayed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;scikit-image &lt;a class="reference external" href="https://scikit-image.org/docs/dev/api/skimage.util.html#skimage.util.apply_parallel"&gt;apply_parallel()&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="dask-map-overlap-and-map-blocks"&gt;
&lt;h2&gt;Dask map_overlap and map_blocks&lt;/h2&gt;
&lt;p&gt;The Dask array functions &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-overlap.html#dask.array.map_overlap"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; and &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-api.html#dask.array.map_blocks"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; are used to build most of the functions in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt;, and you can use them yourself too. Both apply a function to each chunk of a Dask array; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt; also gives each chunk a border of pixels from its neighbours, which neighbourhood operations such as filters need in order to be correct near chunk edges.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;my_custom_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# ... does something really cool&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_overlap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_custom_function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;my_dask_array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can read more about &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-overlap.html"&gt;overlapping computations here&lt;/a&gt;.&lt;/p&gt;
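&lt;p&gt;A small sketch of the difference between the two, assuming a random 64 x 64 NumPy image split into 16 x 16 chunks (arbitrary sizes chosen for illustration): a pointwise function only needs &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_blocks&lt;/span&gt;&lt;/code&gt;, while a 3 x 3 neighbourhood filter from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scipy.ndimage&lt;/span&gt;&lt;/code&gt; needs &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt; with a depth of one pixel to match the result of filtering the whole image at once.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import numpy as np
import dask.array as da
from scipy import ndimage

image = np.random.default_rng(0).random((64, 64))
darr = da.from_array(image, chunks=16)

# Pointwise operation: each output pixel depends only on the same input pixel,
# so map_blocks (no overlap between chunks) is enough
rooted = darr.map_blocks(np.sqrt)

# Neighbourhood operation: a 3x3 uniform filter reads one pixel past each
# chunk edge, so map_overlap shares depth=1 pixels with neighbouring chunks
# (extra keyword arguments such as size=3 are passed on to the function)
smoothed = darr.map_overlap(
    ndimage.uniform_filter, depth=1, boundary="reflect", size=3, dtype=image.dtype
)

# Compare against filtering the whole in-memory image at once
expected = ndimage.uniform_filter(image, size=3, mode="reflect")
print(np.allclose(smoothed.compute(), expected))  # True
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;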
&lt;/section&gt;
&lt;section id="dask-delayed"&gt;
&lt;h2&gt;Dask delayed&lt;/h2&gt;
&lt;p&gt;If you want more flexibility and fine-grained control over your computation, then you can use &lt;a class="reference external" href="https://docs.dask.org/en/latest/delayed.html"&gt;Dask delayed&lt;/a&gt;. You can get started &lt;a class="reference external" href="https://tutorial.dask.org/01_dask.delayed.html"&gt;with the Dask delayed tutorial here&lt;/a&gt;.&lt;/p&gt;
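&lt;p&gt;As a rough sketch of Dask delayed applied to images, the example below wraps two ordinary Python functions with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.delayed&lt;/span&gt;&lt;/code&gt;, so that each frame becomes an independent task in a graph that only runs when &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compute()&lt;/span&gt;&lt;/code&gt; is called. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;load_frame&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;count_objects&lt;/span&gt;&lt;/code&gt; functions are hypothetical: the loader generates random data in place of reading a file, and the 0.9 threshold is arbitrary.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import dask
import numpy as np
from scipy import ndimage


@dask.delayed
def load_frame(i):
    # Hypothetical loader: stands in for reading image file number i from disk
    return np.random.default_rng(i).random((32, 32))


@dask.delayed
def count_objects(frame):
    # Label bright pixels (arbitrary 0.9 threshold) and return the object count
    _, num_features = ndimage.label(np.greater(frame, 0.9))
    return num_features


# Nothing runs yet: these calls only build a task graph
counts = [count_objects(load_frame(i)) for i in range(4)]
total = dask.delayed(sum)(counts)

# compute() runs the graph; independent frames can be processed in parallel
result = total.compute()
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;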
&lt;/section&gt;
&lt;section id="scikit-image-apply-parallel-function"&gt;
&lt;h2&gt;scikit-image apply_parallel function&lt;/h2&gt;
&lt;p&gt;If you do a lot of image processing in Python, one tool you’re likely already using is &lt;a class="reference external" href="https://scikit-image.org/"&gt;scikit-image&lt;/a&gt;. I’d like to draw your attention to its &lt;a class="reference external" href="https://scikit-image.org/docs/dev/api/skimage.util.html?highlight=apply_parallel#skimage.util.apply_parallel"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;apply_parallel&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function. It uses Dask’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;map_overlap&lt;/span&gt;&lt;/code&gt; to apply a function to an image chunk by chunk, sharing a border of pixels between neighbouring chunks.&lt;/p&gt;
&lt;p&gt;It’s useful not only when you have big data, but also when your data fits into memory and the computation you want to apply to it is memory intensive. Processing the whole image at once might exceed the available RAM, and &lt;a class="reference external" href="https://scikit-image.org/docs/dev/api/skimage.util.html?highlight=apply_parallel#skimage.util.apply_parallel"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;apply_parallel&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; is great for these situations too.&lt;/p&gt;
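&lt;p&gt;Here is a minimal sketch of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;apply_parallel&lt;/span&gt;&lt;/code&gt;, assuming both scikit-image and Dask are installed. It applies &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;skimage.filters.gaussian&lt;/span&gt;&lt;/code&gt; with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sigma=2&lt;/span&gt;&lt;/code&gt; to a random 512 x 512 image in 128 x 128 chunks. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;depth=8&lt;/span&gt;&lt;/code&gt; overlap is chosen to cover the filter’s radius (scikit-image truncates the Gaussian at 4 standard deviations by default); with too small a depth, seams would appear at chunk boundaries.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import numpy as np
from skimage import filters
from skimage.util import apply_parallel

image = np.random.default_rng(0).random((512, 512))

# Filter the image chunk by chunk; depth=8 gives each chunk enough
# neighbouring pixels for a Gaussian with sigma=2 (radius 4 * sigma = 8)
smoothed = apply_parallel(
    filters.gaussian,
    image,
    chunks=(128, 128),
    depth=8,
    extra_keywords={"sigma": 2},
)

# Away from the outer image border, the result matches filtering in one go
# (the outermost pixels can differ because the two approaches pad differently)
reference = filters.gaussian(image, sigma=2)
print(np.allclose(smoothed[8:-8, 8:-8], reference[8:-8, 8:-8]))  # True
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;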
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/19/image-segmentation.md&lt;/span&gt;, line 318)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="scaling-up-computation"&gt;
&lt;h1&gt;Scaling up computation&lt;/h1&gt;
&lt;p&gt;When you want to scale up from a laptop onto a supercomputing cluster, you can use &lt;a class="reference external" href="https://distributed.dask.org/en/latest/"&gt;dask-distributed&lt;/a&gt; to handle that.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="c1"&gt;# Setup a local cluster&lt;/span&gt;
&lt;span class="c1"&gt;# By default this sets up 1 worker per core&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;See the &lt;a class="reference external" href="https://distributed.dask.org/en/latest/"&gt;documentation here&lt;/a&gt; to get set up for your system.&lt;/p&gt;
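&lt;p&gt;If you want more control over the size of the local cluster, you can create a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;LocalCluster&lt;/span&gt;&lt;/code&gt; yourself and pass it to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Client&lt;/span&gt;&lt;/code&gt;. The sizing below (two workers with two threads each, running inside the current process with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;processes=False&lt;/span&gt;&lt;/code&gt;) is only an illustration; on real hardware you would pick &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;n_workers&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;threads_per_worker&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;memory_limit&lt;/span&gt;&lt;/code&gt; to suit your machine and your tasks.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import dask.array as da
from dask.distributed import Client, LocalCluster

# Illustrative sizing only: 2 workers x 2 threads, all inside this process
cluster = LocalCluster(n_workers=2, threads_per_worker=2, processes=False)
client = Client(cluster)
client.wait_for_workers(2)

# Any Dask computation now runs on the cluster's workers
x = da.ones((1000, 1000), chunks=250)
total = x.sum().compute()

n_workers = len(client.scheduler_info()["workers"])

client.close()
cluster.close()
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;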
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/19/image-segmentation.md&lt;/span&gt;, line 334)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="bonus-content-using-arrays-on-gpu"&gt;
&lt;h1&gt;Bonus content: using arrays on GPU&lt;/h1&gt;
&lt;p&gt;We’ve recently been adding GPU support to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We’re able to add GPU support using a library called &lt;a class="reference external" href="https://cupy.dev/"&gt;CuPy&lt;/a&gt;. CuPy is an array library with a NumPy-like API, accelerated with NVIDIA CUDA. Instead of Dask arrays containing NumPy chunks, we can have Dask arrays containing CuPy chunks. This &lt;a class="reference external" href="https://blog.dask.org/2019/01/03/dask-array-gpus-first-steps"&gt;blogpost&lt;/a&gt; explains the benefits of GPU acceleration and gives some benchmarks for computations on CPU, a single GPU, and multiple GPUs.&lt;/p&gt;
&lt;section id="gpu-support-available-in-dask-image"&gt;
&lt;h2&gt;GPU support available in dask-image&lt;/h2&gt;
&lt;p&gt;From &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt; version 0.6.0, there is GPU array support for four of the six subpackages:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;imread&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ndfilters&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ndinterp&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ndmorph&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Subpackages of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-image&lt;/span&gt;&lt;/code&gt; that do &lt;em&gt;not&lt;/em&gt; yet have GPU support are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;ndfourier&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ndmeasure&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We hope to add GPU support to these in the future.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="an-example"&gt;
&lt;h2&gt;An example&lt;/h2&gt;
&lt;p&gt;Here’s an example of an image convolution with Dask on the CPU:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# CPU example&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;numpy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image.ndfilters&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;convolve&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;convolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And here’s the same example of an image convolution with Dask on the GPU. The only thing necessary to change is the type of input arrays.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Same example moved to the GPU&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cupy&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;- import cupy instead of numpy (version &amp;gt;=7.7.0)&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image.ndfilters&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;convolve&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;- cupy dask array&lt;/span&gt;
&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cupy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;- cupy array&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;convolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can’t mix arrays on the CPU and arrays on the GPU in the same computation. This is why the weights &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;w&lt;/span&gt;&lt;/code&gt; must be a cupy array in the second example above.&lt;/p&gt;
&lt;p&gt;You can, however, transfer data between the CPU and GPU. This may be worthwhile when the GPU speedup outweighs the cost of transferring the data.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="reading-in-images-onto-the-gpu"&gt;
&lt;h2&gt;Reading in images onto the GPU&lt;/h2&gt;
&lt;p&gt;Generally, we want to start our image processing by reading in data from images stored on disk. We can use the &lt;a class="reference external" href="http://image.dask.org/en/latest/dask_image.imread.html"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;imread&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; function with the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;arraytype=&amp;quot;cupy&amp;quot;&lt;/span&gt;&lt;/code&gt; keyword argument to read images directly onto the GPU.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_image.imread&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;imread&lt;/span&gt;

&lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;data/BBBC039/images/*.tif&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;images_on_gpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;data/BBBC039/images/*.tif&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arraytype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cupy&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/19/image-segmentation.md&lt;/span&gt;, line 404)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="how-you-can-get-involved"&gt;
&lt;h1&gt;How you can get involved&lt;/h1&gt;
&lt;p&gt;Create and share your own segmentation or image processing workflows with Dask (&lt;a class="reference external" href="https://github.com/dask/dask-blog/issues/47"&gt;join the current discussion on segmentation&lt;/a&gt; or &lt;a class="reference external" href="https://github.com/dask/dask-blog/issues/new?assignees=&amp;amp;labels=%5B%22type%2Ffeature%22%2C+%22needs-triage%22%5D&amp;amp;template=feature-request.md"&gt;propose a new blogpost topic here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Contribute to adding GPU support to dask-image: &lt;a class="reference external" href="https://github.com/dask/dask-image/issues/133"&gt;https://github.com/dask/dask-image/issues/133&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/03/19/image-segmentation/"/>
    <summary>Document headings start at H2, not H1 [myst.header]</summary>
    <category term="imaging" label="imaging"/>
    <published>2021-03-19T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/03/11/dask_memory_usage/</id>
    <title>Measuring Dask memory usage with dask-memusage</title>
    <updated>2021-03-11T00:00:00+00:00</updated>
    <author>
      <name>&lt;a href="https://pythonspeed.com"&gt;Itamar Turner-Trauring&lt;/a&gt;</name>
    </author>
    <content type="html">&lt;p&gt;Using too much computing resources can get expensive when you’re scaling up in the cloud.&lt;/p&gt;
&lt;p&gt;To give a real example, I was working on the image processing pipeline for a spatial gene sequencing device, which could report not just which genes were being expressed but also where they were in a 3D volume of cells.
In order to get this information, a specialized microscope took snapshots of the cell culture or tissue, and the resulting data was run through a Dask pipeline.&lt;/p&gt;
&lt;p&gt;The pipeline was fairly slow, so I did some back-of-the-envelope math to figure out what our computing costs would be once we started running more data for customers.
&lt;strong&gt;It turned out that we’d be using 70% of our revenue just paying for cloud computing!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Clearly I needed to optimize this code.&lt;/p&gt;
&lt;p&gt;When we think about the bottlenecks in large-scale computation, we often focus on CPU: we want to use more CPU cores in order to get faster results.
Paying for all that CPU can be expensive, as in this case, and I did successfully reduce CPU usage by quite a lot.&lt;/p&gt;
&lt;p&gt;But high memory usage was also a problem, and fixing that problem led me to build a series of tools, tools that can also help you optimize and reduce your Dask memory usage.&lt;/p&gt;
&lt;p&gt;In the rest of this article you will learn:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#problem"&gt;&lt;span class="xref myst"&gt;How high memory usage can drive up your computing costs&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How a tool called &lt;a class="reference external" href="https://github.com/itamarst/dask-memusage/"&gt;dask-memusage&lt;/a&gt; can help you &lt;a class="reference internal" href="#dask-memusage"&gt;&lt;span class="xref myst"&gt;find peak memory usage of the tasks in your Dask execution graph&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to &lt;a class="reference internal" href="#fil"&gt;&lt;span class="xref myst"&gt;further pinpoint high memory usage&lt;/span&gt;&lt;/a&gt; using the &lt;a class="reference external" href="https://pythonspeed.com/fil"&gt;Fil memory profiler&lt;/a&gt;, so you can reduce memory usage.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/11/dask_memory_usage.md&lt;/span&gt;, line 30)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="the-problem-fixed-processing-chunks-and-a-high-memory-cpu-ratio-problem"&gt;
&lt;h1&gt;The problem: fixed processing chunks and a high memory/CPU ratio&lt;/h1&gt;
&lt;p&gt;As a reminder, I was working on a Dask pipeline that processed data from a specialized microscope.
The resulting data volume was quite large, and certain subsets of images had to be processed together as a unit.
From a computational standpoint, we effectively had a series of inputs X0, X1, X2, … that could be independently processed by a function &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f()&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The internal processing of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f()&lt;/span&gt;&lt;/code&gt; could not easily be parallelized further.
From a CPU scheduling perspective this was fine: given the large number of X inputs, it was still an embarrassingly parallel problem.&lt;/p&gt;
&lt;p&gt;For example, if I provisioned a virtual machine with 4 CPU cores, to process the data I could start four processes, and each would max out a single core.
If I had 12 inputs and each processing step took about the same time, they might run as follows:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CPU0: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X0)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X4)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X8)&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU1: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X1)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X5)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X9)&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU2: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X2)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X6)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X10)&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU3: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X3)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X7)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X11)&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If I could make &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f()&lt;/span&gt;&lt;/code&gt; faster, the pipeline as a whole would also run faster.&lt;/p&gt;
&lt;p&gt;CPU is not the only resource used in computation, however: RAM can also be a bottleneck.
For example, let’s say each call to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(Xi)&lt;/span&gt;&lt;/code&gt; took 12GB of RAM.
That means to fully utilize 4 CPUs, I would need 48GB of RAM—but what if my computer only has 16GB of RAM?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Even though my computer has 4 CPUs, with only 16GB of RAM I can use just one of them, because I don’t have enough RAM to run more than one task in parallel.&lt;/strong&gt;
In practice, these tasks ran in the cloud, where I could preserve the necessary RAM/core ratio by choosing the right pre-configured VM instances.
And on some clouds you can freely set the amount of RAM and number of CPU cores for each virtual machine you spin up.&lt;/p&gt;
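&lt;p&gt;The constraint above is a small piece of arithmetic: usable parallelism is the smaller of the CPU count and the number of peak-sized tasks that fit in RAM. Here is a sketch of that calculation, assuming one task per core and that all tasks may hit their peak at the same time; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;max_parallel_tasks()&lt;/span&gt;&lt;/code&gt; is a hypothetical helper for illustration.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;def max_parallel_tasks(n_cpus, ram_gb, peak_ram_per_task_gb):
    """How many tasks can run at once, limited by both CPU count and RAM.

    Assumes one task per CPU core and that every task may hit its peak
    memory usage at the same time.
    """
    fits_in_ram = int(ram_gb // peak_ram_per_task_gb)
    return min(n_cpus, fits_in_ram)


print(max_parallel_tasks(n_cpus=4, ram_gb=16, peak_ram_per_task_gb=12))  # 1
print(max_parallel_tasks(n_cpus=4, ram_gb=48, peak_ram_per_task_gb=12))  # 4
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Run in reverse, the same arithmetic tells you which RAM/core ratio to look for when choosing a VM: 12GB per core in this example.&lt;/p&gt;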
&lt;p&gt;However, I didn’t know exactly how much memory was used at peak, so I had to limit parallelism to reduce out-of-memory errors.
As a result, the default virtual machines we were using had half their CPUs sitting idle: resources we were paying for but not using.&lt;/p&gt;
&lt;p&gt;In order to provision hardware appropriately and max out all the CPUs, I needed to know how much peak memory each task was using.
And to do that, I created a new tool.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="measuring-peak-task-memory-usage-with-dask-memusage-dask-memusage"&gt;
&lt;h1&gt;Measuring peak task memory usage with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;&lt;/h1&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; is a tool for measuring peak memory usage for each task in the Dask execution graph.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Per &lt;em&gt;task&lt;/em&gt;, because Dask executes code as a graph of tasks, and the graph determines how much parallelism can be used.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Peak&lt;/em&gt; memory is important, because that is the bottleneck.
It doesn’t matter if average memory usage per task is 4GB: if two parallel tasks in the graph each need 12GB at the same time, you’re going to need 24GB of RAM to run both tasks on the same computer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
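&lt;p&gt;To make the average-versus-peak distinction concrete, here is a sketch comparing two RAM estimates for tasks running in parallel. The memory samples are hypothetical, chosen to match the 4GB average and 12GB peak above, and summing peaks is a conservative worst case that assumes the peaks can coincide.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;def ram_estimates(samples_per_task_gb):
    """Compare an average-based and a peak-based RAM estimate.

    samples_per_task_gb holds, for each task running in parallel, a list of
    memory samples (GB) taken while that task ran.
    """
    by_average = sum(sum(s) / len(s) for s in samples_per_task_gb)
    by_peak = sum(max(s) for s in samples_per_task_gb)
    return by_average, by_peak


# Hypothetical samples: the task averages 4GB but briefly peaks at 12GB.
task_samples = [1, 1, 2, 12]
print(ram_estimates([task_samples, task_samples]))  # (8.0, 24)
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Provisioning for the average (8GB) would lead to out-of-memory errors whenever both peaks overlap; provisioning for the peaks (24GB) would not.&lt;/p&gt;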
&lt;section id="using-dask-memusage"&gt;
&lt;h2&gt;Using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Since the gene sequencing code is proprietary and quite complex, let’s use a different example.
We’re going to count the occurrence of words in some text files, and then report the top-10 most common words in each file.
You can imagine combining the data later on, but we won’t bother with that in this simple example.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;gc&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sleep&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pathlib&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.bag&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;from_sequence&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;collections&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_memusage&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;calculate_top_10&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# See notes below&lt;/span&gt;

    &lt;span class="c1"&gt;# Load the file&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Count the words&lt;/span&gt;
    &lt;span class="n"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;.,&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\&amp;quot;&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# Choose the top 10:&lt;/span&gt;
    &lt;span class="n"&gt;by_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# See notes below&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;by_count&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:])&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Set up the calculation:&lt;/span&gt;

    &lt;span class="c1"&gt;# Create a 4-process cluster (running locally). Note only one thread&lt;/span&gt;
    &lt;span class="c1"&gt;# per-worker: because polling is per-process, you can&amp;#39;t run multiple&lt;/span&gt;
    &lt;span class="c1"&gt;# threads per worker, otherwise you&amp;#39;ll get results that combine memory&lt;/span&gt;
    &lt;span class="c1"&gt;# usage of multiple tasks.&lt;/span&gt;
    &lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threads_per_worker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                           &lt;span class="n"&gt;memory_limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Install dask-memusage:&lt;/span&gt;
    &lt;span class="n"&gt;dask_memusage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;memusage.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create the task graph:&lt;/span&gt;
    &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;from_sequence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterdir&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calculate_top_10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;example2.png&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rankdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;TD&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Run the calculations:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ... do something with results ...&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;__main__&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Here’s what the task graph looks like:&lt;/p&gt;
&lt;img src="/images/dask_memusage/example2.png" style="width: 75%; margin: 2em;"&gt;
&lt;p&gt;Plenty of parallelism!&lt;/p&gt;
&lt;p&gt;We can run the program on some files:&lt;/p&gt;
&lt;div class="highlight-shell-session notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;dask&lt;span class="o"&gt;[&lt;/span&gt;bag&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;dask_memusage
&lt;span class="gp"&gt;$ &lt;/span&gt;python&lt;span class="w"&gt; &lt;/span&gt;example2.py&lt;span class="w"&gt; &lt;/span&gt;files/
&lt;span class="gp gp-VirtualEnv"&gt;(&amp;#39;frankenstein.txt&amp;#39;, [(&amp;#39;that&amp;#39;, 1016)&lt;/span&gt;&lt;span class="go"&gt;, (&amp;#39;was&amp;#39;, 1021), (&amp;#39;in&amp;#39;, 1180), (&amp;#39;a&amp;#39;, 1438), (&amp;#39;my&amp;#39;, 1751), (&amp;#39;to&amp;#39;, 2164), (&amp;#39;i&amp;#39;, 2754), (&amp;#39;of&amp;#39;, 2761), (&amp;#39;and&amp;#39;, 3025), (&amp;#39;the&amp;#39;, 4339)])&lt;/span&gt;
&lt;span class="gp gp-VirtualEnv"&gt;(&amp;#39;pride_and_prejudice.txt&amp;#39;, [(&amp;#39;she&amp;#39;, 1660)&lt;/span&gt;&lt;span class="go"&gt;, (&amp;#39;i&amp;#39;, 1730), (&amp;#39;was&amp;#39;, 1832), (&amp;#39;in&amp;#39;, 1904), (&amp;#39;a&amp;#39;, 1981), (&amp;#39;her&amp;#39;, 2142), (&amp;#39;and&amp;#39;, 3503), (&amp;#39;of&amp;#39;, 3705), (&amp;#39;to&amp;#39;, 4188), (&amp;#39;the&amp;#39;, 4492)])&lt;/span&gt;
&lt;span class="gp gp-VirtualEnv"&gt;(&amp;#39;greatgatsby.txt&amp;#39;, [(&amp;#39;that&amp;#39;, 564)&lt;/span&gt;&lt;span class="go"&gt;, (&amp;#39;was&amp;#39;, 760), (&amp;#39;he&amp;#39;, 770), (&amp;#39;in&amp;#39;, 849), (&amp;#39;i&amp;#39;, 999), (&amp;#39;to&amp;#39;, 1197), (&amp;#39;of&amp;#39;, 1224), (&amp;#39;a&amp;#39;, 1440), (&amp;#39;and&amp;#39;, 1565), (&amp;#39;the&amp;#39;, 2543)])&lt;/span&gt;
&lt;span class="gp gp-VirtualEnv"&gt;(&amp;#39;big.txt&amp;#39;, [(&amp;#39;his&amp;#39;, 40032)&lt;/span&gt;&lt;span class="go"&gt;, (&amp;#39;was&amp;#39;, 45356), (&amp;#39;that&amp;#39;, 47924), (&amp;#39;he&amp;#39;, 48276), (&amp;#39;a&amp;#39;, 83228), (&amp;#39;in&amp;#39;, 86832), (&amp;#39;to&amp;#39;, 114184), (&amp;#39;and&amp;#39;, 152284), (&amp;#39;of&amp;#39;, 159888), (&amp;#39;the&amp;#39;, 314908)])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;As one would expect, the most common words are stop words such as “the” and “of”, but there is still some variation in order.&lt;/p&gt;
&lt;p&gt;Next, let’s look at the results from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="dask-memusage-output-and-how-it-works"&gt;
&lt;h2&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; output, and how it works&lt;/h2&gt;
&lt;p&gt;You’ll notice that the actual use of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; involves just one extra line, other than the import:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;dask_memusage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;memusage.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;What this does is poll the process’s memory usage at 10ms intervals and record the minimum and peak usage for each task.
In this case, here’s what &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;memusage.csv&lt;/span&gt;&lt;/code&gt; looks like:&lt;/p&gt;
&lt;div class="highlight-csv notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;task_key,min_memory_mb,max_memory_mb
&amp;quot;(&amp;#39;from_sequence-3637e6ff937ef8488894df60a80f62ed&amp;#39;, 3)&amp;quot;,51.2421875,51.2421875
&amp;quot;(&amp;#39;from_sequence-3637e6ff937ef8488894df60a80f62ed&amp;#39;, 0)&amp;quot;,51.70703125,51.70703125
&amp;quot;(&amp;#39;from_sequence-3637e6ff937ef8488894df60a80f62ed&amp;#39;, 1)&amp;quot;,51.28125,51.78515625
&amp;quot;(&amp;#39;from_sequence-3637e6ff937ef8488894df60a80f62ed&amp;#39;, 2)&amp;quot;,51.30859375,51.30859375
&amp;quot;(&amp;#39;calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca&amp;#39;, 2)&amp;quot;,56.19140625,56.19140625
&amp;quot;(&amp;#39;calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca&amp;#39;, 0)&amp;quot;,51.70703125,54.26953125
&amp;quot;(&amp;#39;calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca&amp;#39;, 1)&amp;quot;,52.30078125,52.30078125
&amp;quot;(&amp;#39;calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca&amp;#39;, 3)&amp;quot;,51.48046875,384.00390625
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For each task in the graph we are told minimum memory usage and peak memory usage, in MB.&lt;/p&gt;
&lt;p&gt;In more readable form:&lt;/p&gt;
&lt;div class="pst-scrollable-table-container"&gt;&lt;table class="table"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;task_key&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;min_memory_mb&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;max_memory_mb&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;“(‘from_sequence-3637e6ff937ef8488894df60a80f62ed’, 3)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.2421875&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.2421875&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;“(‘from_sequence-3637e6ff937ef8488894df60a80f62ed’, 0)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.70703125&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.70703125&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;“(‘from_sequence-3637e6ff937ef8488894df60a80f62ed’, 1)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.28125&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.78515625&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;“(‘from_sequence-3637e6ff937ef8488894df60a80f62ed’, 2)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.30859375&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.30859375&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;“(‘calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca’, 2)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;56.19140625&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;56.19140625&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;“(‘calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca’, 0)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.70703125&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;54.26953125&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;“(‘calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca’, 1)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;52.30078125&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;52.30078125&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;“(‘calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca’, 3)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.48046875&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;384.00390625&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;The bottom four rows are the interesting ones: all four start with a minimum memory usage of ~50MB of RAM, and memory may or may not increase as the code runs.
How much it increases presumably depends on the size of the file; most of the files are quite small, so memory usage doesn’t change much.
&lt;strong&gt;One task uses far more memory at peak than the others, 384MB of RAM; presumably it’s processing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;big.txt&lt;/span&gt;&lt;/code&gt;, which is 25MB, since the other files are all smaller than 1MB.&lt;/strong&gt;&lt;/p&gt;
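&lt;p&gt;With many tasks, spotting the heaviest one by eye gets tedious. The following sketch reads the CSV with Python’s standard library and reports the largest peak for each kind of task. It assumes the three-column layout shown above and that task keys are either tuple literals or plain strings ending in a hash; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;task_name()&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;peak_mb_by_task_name()&lt;/span&gt;&lt;/code&gt; are hypothetical helpers, not part of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;. The sample rows are copied from the output above.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import ast
import csv
import io

# A few rows of memusage.csv from the run above.
SAMPLE_CSV = """task_key,min_memory_mb,max_memory_mb
"('from_sequence-3637e6ff937ef8488894df60a80f62ed', 3)",51.2421875,51.2421875
"('calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca', 2)",56.19140625,56.19140625
"('calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca', 3)",51.48046875,384.00390625
"""


def task_name(task_key):
    """Turn a key like "('calculate_top_10-afc8...', 3)" into 'calculate_top_10'."""
    try:
        key = ast.literal_eval(task_key)
    except (ValueError, SyntaxError):
        key = task_key  # not a Python literal; treat it as a plain string key
    if isinstance(key, tuple):
        key = key[0]
    return str(key).rsplit("-", 1)[0]  # drop the trailing token hash


def peak_mb_by_task_name(csv_file):
    """Largest max_memory_mb seen for each kind of task."""
    peaks = {}
    for row in csv.DictReader(csv_file):
        name = task_name(row["task_key"])
        peaks[name] = max(peaks.get(name, 0.0), float(row["max_memory_mb"]))
    return peaks


print(peak_mb_by_task_name(io.StringIO(SAMPLE_CSV)))
# {'from_sequence': 51.2421875, 'calculate_top_10': 384.00390625}
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;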
&lt;p&gt;The mechanism used, polling peak process memory, has some limitations:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;You’ll notice there’s a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;gc.collect()&lt;/span&gt;&lt;/code&gt; at the top of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;calculate_top_10()&lt;/span&gt;&lt;/code&gt;; this ensures we don’t count memory from previous code that hasn’t been cleaned up yet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There’s also a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sleep()&lt;/span&gt;&lt;/code&gt; at the bottom of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;calculate_top_10()&lt;/span&gt;&lt;/code&gt;.
Because memory is only sampled every 10ms or so, tasks that run too quickly won’t get accurate information, so you want each task to sleep for at least 20ms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, because polling is per-process, you can’t run multiple threads per worker; otherwise you’ll get results that combine the memory usage of multiple tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
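&lt;p&gt;The reason for the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sleep()&lt;/span&gt;&lt;/code&gt; can be seen in a toy model of polling. This is not &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;’s actual implementation: it assumes samples land exactly at 0ms, 10ms, 20ms, and so on after the task starts, and the memory timelines are made up. A task that finishes before the second sample is only ever seen at its starting memory, while the same task sleeping for 20ms while still holding its memory gets sampled at its peak.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;def spike_timeline(baseline_mb, spike_mb, spike_start_ms, spike_len_ms, total_ms):
    """Memory usage (MB) at each millisecond of a hypothetical task."""
    spike = range(spike_start_ms, spike_start_ms + spike_len_ms)
    return [spike_mb if t in spike else baseline_mb for t in range(total_ms)]


def polled_peak(timeline_mb, interval_ms):
    """Peak memory a poller would see when sampling every interval_ms."""
    return max(timeline_mb[::interval_ms])


# An 8ms task holding 384MB from t=3ms until it returns: only t=0 is sampled.
quick = spike_timeline(50, 384, spike_start_ms=3, spike_len_ms=5, total_ms=8)
print(max(quick), polled_peak(quick, interval_ms=10))  # 384 50

# Same task, sleeping 20ms before returning while the memory is still held.
held = spike_timeline(50, 384, spike_start_ms=3, spike_len_ms=25, total_ms=28)
print(max(held), polled_peak(held, interval_ms=10))  # 384 384
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;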
&lt;/section&gt;
&lt;section id="interpreting-the-data"&gt;
&lt;h2&gt;Interpreting the data&lt;/h2&gt;
&lt;p&gt;What we’ve learned is that memory usage of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;calculate_top_10()&lt;/span&gt;&lt;/code&gt; grows with file size; this can be used to &lt;a class="reference external" href="https://pythonspeed.com/articles/estimating-memory-usage/"&gt;characterize the memory requirements for the workload&lt;/a&gt;.
That is, we can create a model that links data input sizes and required RAM, and then we can calculate the required RAM for any given level of parallelism.
And that can guide our choice of hardware, if we assume one task per CPU core.&lt;/p&gt;
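&lt;p&gt;One simple form such a model could take, assuming peak memory grows roughly linearly with input size (an assumption worth checking against your own measurements), is an ordinary least-squares line. The sketch below fits one and scales it by the number of parallel tasks; the function names and measurements are hypothetical.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;def fit_linear(sizes_mb, peaks_mb):
    """Ordinary least-squares fit: peak_mb ~= slope * size_mb + intercept."""
    n = len(sizes_mb)
    mean_x = sum(sizes_mb) / n
    mean_y = sum(peaks_mb) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_mb)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_mb, peaks_mb))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ram_for_parallelism(slope, intercept, input_size_mb, n_parallel):
    """RAM (MB) to run n_parallel tasks on inputs of input_size_mb at once."""
    return n_parallel * (slope * input_size_mb + intercept)


# Hypothetical measurements: input size (MB) and peak task memory (MB).
sizes = [0, 10, 20]
peaks = [50, 150, 250]
slope, intercept = fit_linear(sizes, peaks)
print(slope, intercept)  # 10.0 50.0
print(ram_for_parallelism(slope, intercept, input_size_mb=25, n_parallel=4))  # 1200.0
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;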
&lt;p&gt;Going back to my original motivating problem, the gene sequencing pipeline: using the data from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;, I was able to come up with a formula saying “for this size input, this much memory is necessary”.
Whenever we ran a batch job we could therefore set the parallelism as high as possible given the number of CPUs and the amount of RAM on the machine.&lt;/p&gt;
&lt;p&gt;While this allowed for more parallelism, it still wasn’t sufficient: processing was still using a huge amount of RAM, which we had to pay for either with time (by using fewer CPUs) or with money (by paying for more expensive virtual machines with more RAM).
So the next step was to reduce memory usage.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="reducing-memory-usage-with-fil-fil"&gt;
&lt;h1&gt;Reducing memory usage with Fil&lt;/h1&gt;
&lt;p&gt;If we look at the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; output for our word-counting example, the memory usage seems rather high: for a 25MB file, we’re using roughly 330MB of RAM on top of the ~50MB baseline just to count words.
Thinking through how an ideal version of this code might work, we ought to be able to process the file with much less memory (for example, we could redesign our code to process the file line by line).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;And that’s another way in which &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; can be helpful: it can point us at specific code that needs memory usage optimized, at the granularity of a task.&lt;/strong&gt;
A task can be a rather large chunk of code, though, so the next step is to use a memory profiler that can point to specific lines of code.&lt;/p&gt;
&lt;p&gt;When working on the gene sequencing tool I used the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;memory_profiler&lt;/span&gt;&lt;/code&gt; package, and while it worked and helped me reduce memory usage quite a bit, I found it quite difficult to use.
It turns out that for batch data processing, the typical use case for Dask, &lt;a class="reference external" href="https://pythonspeed.com/articles/memory-profiler-data-scientists/"&gt;you want a different kind of memory profiler&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So after I’d left that job, I created &lt;a class="reference external" href="https://pythonspeed.com/fil"&gt;a memory profiler called Fil&lt;/a&gt; that is expressly designed for finding peak memory usage.
Unlike &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;, which can be run on production workloads, Fil slows down your execution and has other limitations I’m currently working on (it doesn’t support multiple processes, as of March 2021), so for now it’s better used for manual profiling.&lt;/p&gt;
&lt;p&gt;We can write a little script that only runs on &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;big.txt&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pathlib&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;example2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculate_top_10&lt;/span&gt;

&lt;span class="n"&gt;calculate_top_10&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;files/big.txt&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Run it under Fil:&lt;/p&gt;
&lt;div class="highlight-shell-session notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;pip install filprofiler
&lt;span class="gp"&gt;$ &lt;/span&gt;fil-profile run example3.py
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And the result shows us where the bulk of the memory is being allocated:&lt;/p&gt;
&lt;iframe id="peak" src="/images/dask_memusage/peak-memory.svg" width="100%" height="300" scrolling="auto" frameborder="0"&gt;&lt;/iframe&gt;
&lt;p&gt;Reading in the file accounts for 8% of peak memory, but &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;data.split()&lt;/span&gt;&lt;/code&gt; is responsible for 84%.
Perhaps instead of loading the whole file into memory and splitting all of it into words, we should process the file line by line.
A good next step, if this were real code, would be to fix the way &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;calculate_top_10()&lt;/span&gt;&lt;/code&gt; is implemented.&lt;/p&gt;
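&lt;p&gt;As a sketch of what that fix might look like (a hypothetical rewrite, not code from the original pipeline), the word counting can iterate over the file one line at a time, so neither the full file contents nor the full list of words from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;data.split()&lt;/span&gt;&lt;/code&gt; is ever held in memory at once. It keeps the same word normalization as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;calculate_top_10()&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;from collections import Counter
from pathlib import Path


def count_words(lines):
    """Count normalized words from an iterable of lines.

    Only one line's words are held at a time, instead of the whole file's
    contents plus the full list returned by data.split().
    """
    counts = Counter()
    for line in lines:
        for word in line.split():
            counts[word.strip(".,'\"").lower()] += 1
    return counts


def calculate_top_10_streaming(file_path):
    """Hypothetical line-by-line variant of calculate_top_10()."""
    with open(file_path) as f:
        counts = count_words(f)  # iterating a file object yields one line at a time
    # most_common() lists the largest counts first, unlike the original's
    # ascending by_count[-10:] slice.
    return (Path(file_path).name, counts.most_common(10))


print(count_words(["The cat.", "the Cat, THE dog"]).most_common(2))
# [('the', 3), ('cat', 2)]
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Counter&lt;/span&gt;&lt;/code&gt; still grows with the number of distinct words, but no longer with the total size of the file; re-running under Fil and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; would show whether the change pays off in practice.&lt;/p&gt;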
&lt;/section&gt;
&lt;section id="next-steps"&gt;
&lt;h1&gt;Next steps&lt;/h1&gt;
&lt;p&gt;What should you do if your Dask workload is using too much memory?&lt;/p&gt;
&lt;p&gt;If you’re running Dask workloads with the Distributed backend, and you’re fine with only having one thread per worker, running with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; will give you real-world per-task memory usage on production workloads.
You can then use the resulting information in a variety of ways:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;As a starting point for optimizing memory usage.
Once you know which tasks use the most memory, you can then &lt;a class="reference external" href="https://pythonspeed.com/articles/memory-profiler-data-scientists/"&gt;use Fil to figure out which lines of code are responsible&lt;/a&gt; and then use &lt;a class="reference external" href="https://pythonspeed.com/articles/data-doesnt-fit-in-memory/"&gt;a variety of techniques to reduce memory usage&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When possible, you can fine-tune your chunk size; smaller chunks will use less memory.
If you’re using Dask Arrays you can &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-chunks.html"&gt;set the chunk size&lt;/a&gt;; with Dask Dataframes you can &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe-best-practices.html#repartition-to-reduce-overhead"&gt;ensure good partition sizes&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can fine-tune your hardware configuration, so you’re not wasting RAM or CPU cores.
For example, on AWS you can &lt;a class="reference external" href="https://instances.vantage.sh/"&gt;choose a variety of instance sizes&lt;/a&gt; with different RAM/CPU ratios, one of which may match your workload characteristics.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
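&lt;p&gt;For the chunking option, one rough approach is to work backwards from a per-chunk memory budget. The sketch below assumes 2-D &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;float64&lt;/span&gt;&lt;/code&gt; data (8 bytes per element) and square chunks; the helpers are hypothetical. A shape found this way could be passed as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;chunks=(1000, 1000)&lt;/span&gt;&lt;/code&gt; when creating a Dask Array, or to its &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;rechunk()&lt;/span&gt;&lt;/code&gt; method.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import math


def chunk_nbytes(chunk_shape, itemsize):
    """Bytes occupied by one chunk of the given shape and element size."""
    return math.prod(chunk_shape) * itemsize


def largest_square_chunk(budget_bytes, itemsize):
    """Side length of the largest 2-D square chunk that fits in budget_bytes."""
    return math.isqrt(budget_bytes // itemsize)


side = largest_square_chunk(8_000_000, itemsize=8)  # float64: 8 bytes per element
print(side, chunk_nbytes((side, side), itemsize=8))  # 1000 8000000
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Keep in mind that a task’s peak memory can be many times the size of its input, as the 25MB &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;big.txt&lt;/span&gt;&lt;/code&gt; pushing its task to 384MB showed, so it’s worth confirming a new chunk size with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; rather than relying on the arithmetic alone.&lt;/p&gt;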
&lt;p&gt;In my original use case, the gene sequencing pipeline, I was able to use a combination of lower memory use and lower CPU use to reduce costs to a much more modest level.
And when doing R&amp;amp;D, I was able to get faster results with the same hardware costs.&lt;/p&gt;
&lt;p&gt;You can &lt;a class="reference external" href="https://github.com/itamarst/dask-memusage/"&gt;learn more about &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; here&lt;/a&gt;, and &lt;a class="reference external" href="https://pythonspeed.com/fil"&gt;learn more about the Fil memory profiler here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/03/11/dask_memory_usage/"/>
    <summary>Using too much computing resources can get expensive when you’re scaling up in the cloud.</summary>
    <category term="dask" label="dask"/>
    <category term="distributed" label="distributed"/>
    <category term="memory" label="memory"/>
    <category term="profiling" label="profiling"/>
    <category term="ram" label="ram"/>
    <published>2021-03-11T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/03/04/the-life-science-community/</id>
    <title>Getting to know the life science community</title>
    <updated>2021-03-04T00:00:00+00:00</updated>
    <author>
      <name>Genevieve Buckley</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/04/the-life-science-community.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="executive-summary"&gt;

&lt;p&gt;Dask wants to better support the needs of life scientists. We’ve been getting to know the community in order to better understand:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Who is out there?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What kind of problems are they trying to solve?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We’ve learned that:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Many people want more examples tailored to their specific scientific domain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better integration of Dask into other software is considered very important.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Managing memory constraints when working with big data is a common pain point.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our strategic plan for this year involves three parallel streams:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#infrastructure"&gt;&lt;span class="xref myst"&gt;INFRASTRUCTURE&lt;/span&gt;&lt;/a&gt; (60%) - improvements to Dask, or to other software with many life science users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#outreach"&gt;&lt;span class="xref myst"&gt;OUTREACH&lt;/span&gt;&lt;/a&gt; (20%) - blogposts, talks, webinars, tutorials, and examples.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#applications"&gt;&lt;span class="xref myst"&gt;APPLICATIONS&lt;/span&gt;&lt;/a&gt; (20%) - the application of Dask to a specific life science problem, collaborating with individual labs or groups.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you still want to have your say, it’s not too late:
&lt;a class="reference external" href="https://t.co/0NeknSdrO9?amp=1"&gt;click this link to get in touch!&lt;/a&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/04/the-life-science-community.md&lt;/span&gt;, line 31)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="contents"&gt;
&lt;h1&gt;Contents&lt;/h1&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#background"&gt;&lt;span class="xref myst"&gt;Background&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#what-we-learned"&gt;&lt;span class="xref myst"&gt;What we learned&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#from-dask-users"&gt;&lt;span class="xref myst"&gt;From Dask users&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#from-other-software-libraries"&gt;&lt;span class="xref myst"&gt;From other software libraries&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#opportunities-we-see"&gt;&lt;span class="xref myst"&gt;Opportunities we see&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#strategic-plan"&gt;&lt;span class="xref myst"&gt;Strategic plan&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#limitations"&gt;&lt;span class="xref myst"&gt;Limitations&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#methods"&gt;&lt;span class="xref myst"&gt;Methods&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/04/the-life-science-community.md&lt;/span&gt;, line 42)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="background"&gt;
&lt;h1&gt;Background&lt;/h1&gt;
&lt;p&gt;Recently Dask &lt;a class="reference external" href="https://chanzuckerberg.com/eoss/proposals/"&gt;won some funding&lt;/a&gt; to hire a developer (&lt;a class="reference external" href="https://github.com/GenevieveBuckley/"&gt;Genevieve Buckley&lt;/a&gt;) to improve Dask specifically for life sciences.&lt;/p&gt;
&lt;p&gt;Working with scientists is a really great way to drive growth in open source projects. Both scientists and software developers benefit. Early on, Dask had a lot of success integrating with the geosciences community. It’d be great to see similar success for life sciences too.&lt;/p&gt;
&lt;p&gt;There are several areas of life science where we see Dask being used today:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Biological image processing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Single cell analysis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Statistical genetics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;…and many more&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We solicited feedback from the life science community to develop a strategic plan that will direct our effort over the next year.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/04/the-life-science-community.md&lt;/span&gt;, line 57)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="what-we-learned"&gt;
&lt;h1&gt;What we learned&lt;/h1&gt;
&lt;section id="from-dask-users"&gt;
&lt;h2&gt;From Dask users&lt;/h2&gt;
&lt;p&gt;When we talked to individual Dask users, we heard a lot of similar themes in their comments.&lt;/p&gt;
&lt;p&gt;People wanted:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Better documentation and examples&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better support for working with constrained resources&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better interoperability with other software tools&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The most common request was for better documentation with more examples. People across many different areas of life science said this could help them a lot. The corresponding challenge is that life science spans many distinct fields, each of which needs its own targeted documentation.&lt;/p&gt;
&lt;p&gt;GPU support was also commonly mentioned. Comments about GPUs fit into two of the categories above: GPU memory is often a constraint, and life scientists also want it to be easier to apply deep learning models to their data.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="from-other-software-libraries"&gt;
&lt;h2&gt;From other software libraries&lt;/h2&gt;
&lt;p&gt;We didn’t only talk with individual users of Dask. We also spoke to developers of scientific software projects.&lt;/p&gt;
&lt;section id="why-would-other-software-libraries-adopt-dask"&gt;
&lt;h3&gt;Why would other software libraries adopt Dask?&lt;/h3&gt;
&lt;p&gt;Software projects wanted to solve problems related to:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Easier deployment to distributed clusters&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Managing memory when processing large datasets&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parallelization of existing functionality&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Dask is well suited to these kinds of problems, so it may be a good fit for these projects.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="who-we-ve-talked-to"&gt;
&lt;h3&gt;Who we’ve talked to&lt;/h3&gt;
&lt;p&gt;Some of the software projects we spoke to include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://napari.org/"&gt;napari&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://pystatgen.github.io/sgkit/latest/"&gt;sgkit&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://scanpy.readthedocs.io/en/stable/"&gt;scanpy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://squidpy.readthedocs.io/en/latest/"&gt;squidpy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.ilastik.org/"&gt;ilastik&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://cellprofiler.org/"&gt;CellProfiler&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="current-status"&gt;
&lt;h3&gt;Current status&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://napari.org/"&gt;napari&lt;/a&gt; is a python based image viewer. Dask is already well-integrated with napari. Areas for opportunity here include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Improved documentation about how to work efficiently with Dask arrays in napari.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Smarter caching of neighbouring image chunks to avoid lag.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Guides for how to create plugins for napari, so the community can grow.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a class="reference external" href="https://pystatgen.github.io/sgkit/latest/"&gt;sgkit&lt;/a&gt; is a statistical genetics toolkit. Dask is already well-integrated with sgkit. The developers would like improved infrastructure in the main Dask repositories that they can benefit from. Wishlist items include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Better ways to understand how things like array chunks change as they move through a Dask computation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better high level graph visualizations. Graph visualizations showing all the low level operations can be overwhelming.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better ways to identify inefficient areas in Dask computations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stability when new versions of Dask are released.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Making it easier to run Dask in the cloud. They are currently using &lt;a class="reference external" href="https://github.com/dask/dask-cloudprovider"&gt;dask-cloudprovider&lt;/a&gt; and finding that very useful.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a class="reference external" href="https://scanpy.readthedocs.io/en/stable/"&gt;scanpy&lt;/a&gt; is a library for single cell analysis in Python. It is built together with &lt;a class="reference external" href="https://anndata.readthedocs.io/en/latest/"&gt;anndata&lt;/a&gt;, an annotated data structure.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Data size is less of an issue for scanpy users, although anndata developers do think adding Dask support would be useful.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for sparse arrays is very important for these communities.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a class="reference external" href="https://squidpy.readthedocs.io/en/latest/"&gt;squidpy&lt;/a&gt; is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata. Because squidpy involves large imaging data on top of what we’d normally see for datasets in scanpy/anndata, this is a project with a large area of opportunity for Dask.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Integrating Dask with the squidpy ImageContainer class is a good first step towards handling large image data within the available RAM constraints.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a class="reference external" href="https://www.ilastik.org/"&gt;ilastik&lt;/a&gt; does not currently use Dask at all. They are curious to see if Dask can make it easier to scale up from a single machine to a cluster.
Users generally train an ilastik model interactively, and then want to apply it to many images. This second step is often when people want an easy way to scale up the computing resources available.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://cellprofiler.org/"&gt;CellProfiler&lt;/a&gt; is a pipeline tool for image processing. They have briefly experimented with Dask before.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Primarily, they want to parallelize existing functionality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The most common pipelines fall into three major “user stories” where focussing effort would make the most impact:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Image processing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Object processing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Measurements&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/04/the-life-science-community.md&lt;/span&gt;, line 134)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="opportunities-we-see"&gt;
&lt;h1&gt;Opportunities we see&lt;/h1&gt;
&lt;p&gt;Because large scientific software projects have many users, improvements to them would be highly valuable to the scientific community. This is a huge area of opportunity. We plan to collaborate with these developer communities as much as possible to drive this forward.&lt;/p&gt;
&lt;p&gt;Another area of opportunity is improving infrastructure for &lt;a class="reference external" href="https://github.com/dask/dask/issues/7141"&gt;high level graph visualizations&lt;/a&gt;. Power users and novices alike would benefit from better tools for identifying inefficiencies in Dask computations.&lt;/p&gt;
&lt;p&gt;Finally, continuing to build support for Dask arrays with non-NumPy chunks is also a high impact area of opportunity. In particular, support for sparse arrays and for arrays on the GPU was highlighted as very important to the life science community.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/04/the-life-science-community.md&lt;/span&gt;, line 142)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="strategic-direction"&gt;
&lt;h1&gt;Strategic direction&lt;/h1&gt;
&lt;p&gt;We’re going to manage this project with three parallel streams:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#infrastructure"&gt;&lt;span class="xref myst"&gt;INFRASTRUCTURE&lt;/span&gt;&lt;/a&gt; (60%)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#outreach"&gt;&lt;span class="xref myst"&gt;OUTREACH&lt;/span&gt;&lt;/a&gt; (20%)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#applications"&gt;&lt;span class="xref myst"&gt;APPLICATIONS&lt;/span&gt;&lt;/a&gt; (20%)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each stream will likely have one primary project at any time, with many more queued. Within each stream, proposed projects will be ranked according to: level of impact, time commitment required, and the availability of other developer resources.&lt;/p&gt;
&lt;section id="infrastructure"&gt;
&lt;h2&gt;Infrastructure&lt;/h2&gt;
&lt;p&gt;Infrastructure projects are improvements to either:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Projects housed within the &lt;a class="reference external" href="https://github.com/dask/"&gt;Dask organisation&lt;/a&gt;, or&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other software projects involving Dask with large numbers of life science users&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We’ll aim to spend around 60% of project effort on infrastructure.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="outreach"&gt;
&lt;h2&gt;Outreach&lt;/h2&gt;
&lt;p&gt;Outreach activities include blogposts, talks, webinars, tutorials, and creating examples for documentation. We aim to spend around 20% of project effort on outreach.&lt;/p&gt;
&lt;p&gt;If you have outreach ideas you want to share (perhaps you run a student group or popular meetup) then you can &lt;a class="reference external" href="https://docs.google.com/forms/d/e/1FAIpQLScBi8YOx3gGkL9rz8TsRTIZYiRha9qYOvXu4EZx9qGLtjLGCw/viewform?usp=sf_link"&gt;get in touch with us here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="applications"&gt;
&lt;h2&gt;Applications&lt;/h2&gt;
&lt;p&gt;The final stream focusses on the application of Dask to a specific problem in life science.&lt;/p&gt;
&lt;p&gt;These projects generally involve collaborating with individual labs or groups, and have an end goal of summarizing their workflow in a blogpost. This feeds back into our outreach, so others in the community can learn from it.&lt;/p&gt;
&lt;p&gt;Ideally these are short term projects, so we can showcase many different applications of Dask. We aim to spend around 20% of project effort on applications.&lt;/p&gt;
&lt;p&gt;If you use Dask and have an example in mind you’d like to share, then you can &lt;a class="reference external" href="https://docs.google.com/forms/d/e/1FAIpQLScBi8YOx3gGkL9rz8TsRTIZYiRha9qYOvXu4EZx9qGLtjLGCw/viewform?usp=sf_link"&gt;get in touch with us here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="how-will-we-know-what-success-looks-like"&gt;
&lt;h2&gt;How will we know what success looks like?&lt;/h2&gt;
&lt;p&gt;The role of Dask Life Science Fellow has a very broad scope, so there are a lot of different ways we could be successful within this space.&lt;/p&gt;
&lt;p&gt;Some indicators of success are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Bugs being clearly described, or bottlenecks clearly identified&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bug fixes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improvements or new features made to Dask infrastructure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improvements or new features made in related project repositories&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better integration or support for Dask made in related project repositories for life sciences&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better documentation with examples tailored to specific areas of life science&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Blogposts written (ideally in collaboration with Dask users)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Talks given&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Webinars produced&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tutorials created&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We won’t have the time or resources to do everything, but we can make an impact by focussing on a subset.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/04/the-life-science-community.md&lt;/span&gt;, line 196)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="limitations"&gt;
&lt;h1&gt;Limitations&lt;/h1&gt;
&lt;p&gt;The information we discovered talking to the life science community is likely to be biased in a few different ways.&lt;/p&gt;
&lt;p&gt;My (Genevieve’s) network is strongest among imaging scientists, and among people in Australia. It’s much less strong for other fields in life science, as my original training is in physics.&lt;/p&gt;
&lt;p&gt;The Dask project has strong links to other open source Python projects, including scientific software. The Dask developer community also has strong links to companies including NVIDIA, Quansight, and others. People from these communities are likely to be over-represented among those we spoke to.&lt;/p&gt;
&lt;p&gt;It’s much harder to find people who aren’t using Dask yet but have problems that would be a good fit for it. These people are unlikely to be, say, following &lt;a class="reference external" href="https://twitter.com/dask_dev/"&gt;Dask on Twitter&lt;/a&gt;, and probably won’t know that we’re looking for them.&lt;/p&gt;
&lt;p&gt;I don’t think there are any perfect solutions to these problems.
We’ve tried to mitigate these effects by using loose second- and third-degree connections to spread awareness, and by posting in public science forums.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/04/the-life-science-community.md&lt;/span&gt;, line 209)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="methods"&gt;
&lt;h1&gt;Methods&lt;/h1&gt;
&lt;p&gt;We used a variety of approaches to gather feedback from the life science community.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A &lt;a class="reference external" href="https://t.co/0NeknSdrO9?amp=1"&gt;short survey&lt;/a&gt; was created to gather comments&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It was advertised by the &lt;a class="reference external" href="https://twitter.com/dask_dev/"&gt;&amp;#64;dask_dev&lt;/a&gt; Twitter account&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We asked related software projects to consider retweeting for reach (&lt;a class="reference external" href="https://twitter.com/napari_imaging/status/1360090299901505543"&gt;example&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We posted in scientific Slack groups and online public forums&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We emailed other life scientists in our network, asking them to let their networks know too&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We contacted a number of life science researchers directly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We contacted several other scientific software groups directly and spoke with the developers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/04/the-life-science-community.md&lt;/span&gt;, line 221)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="join-the-discussion"&gt;
&lt;h1&gt;Join the discussion&lt;/h1&gt;
&lt;p&gt;Come join us in the Dask Slack! We have a #life-science channel for discussing things relevant to the Dask life science community. You can &lt;a class="reference external" href="https://join.slack.com/t/dask/shared_invite/zt-mfmh7quc-nIrXL6ocgiUH2haLYA914g"&gt;request an invite to the Slack here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/03/04/the-life-science-community/"/>
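    <summary>Dask wants to better support the needs of life scientists. We’ve been getting to know the community to understand who is out there and what kind of problems they are trying to solve.</summary>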
    <category term="imaging" label="imaging"/>
    <published>2021-03-04T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2021/03/03/summit/</id>
    <title>Dask User Summit 2021</title>
    <updated>2021-03-03T00:00:00+00:00</updated>
    <author>
      <name>Matthew Rocklin</name>
    </author>
    <content type="html">&lt;p&gt;Dask is organizing a &lt;a class="reference external" href="https://summit.dask.org"&gt;user summit&lt;/a&gt; in mid-May.
This will be a remote event focused on bringing together developers and users of Dask and the distributed PyData stack in different domains.&lt;/p&gt;
&lt;p&gt;User Summits like this are particularly important for a project like Dask
which serves such a diverse set of use cases.
Dask’s user communities include industries like finance, government, health,
geoscience, imaging, machine learning, and more. These communities often have
very similar problems, but don’t often communicate with each other.&lt;/p&gt;
&lt;p&gt;User summits provide a venue for disparate domains to connect over shared
technology challenges. Often a solution designed for one domain is useful for
others. As technologists, we see this sharing as critical to promoting
consistent, high-quality software solutions across domains, rather than
siloed solutions.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/03/summit.md&lt;/span&gt;, line 23)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="history"&gt;

&lt;p&gt;We organized a summit a year ago, focusing mainly on developers.
This was a fantastic time and resulted in a surprising amount of consensus building and forward movement both in technological and domain-specific directions.&lt;/p&gt;
&lt;p&gt;For more on our summit last year, see &lt;a class="reference external" href="https://blog.dask.org/2020/04/28/dask-summit/"&gt;this post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://pbs.twimg.com/media/ERykEc9XUAEFq-L?format=jpg&amp;name=large"
     width="40%"&gt;
&lt;img src="https://pbs.twimg.com/media/ERzXhHnWAAE_zDA?format=jpg&amp;name=large"
    width="40%"&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/03/summit.md&lt;/span&gt;, line 35)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="organization"&gt;
&lt;h1&gt;Organization&lt;/h1&gt;
&lt;p&gt;We’ve asked &lt;a class="reference external" href="https://numfocus.org"&gt;NumFOCUS&lt;/a&gt; to organize this event for us.
NumFOCUS runs the highly successful and community oriented PyData conference
series, and had great success with their remote-first PyData Global conference
late last year.&lt;/p&gt;
&lt;p&gt;Tickets are intended to be reasonably priced on a sliding scale, with assistance given to any in need.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/03/summit.md&lt;/span&gt;, line 44)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="open-cfp"&gt;
&lt;h1&gt;Open CFP&lt;/h1&gt;
&lt;p&gt;I would like to encourage people to submit proposals to talk at &lt;a class="reference external" href="https://summit.dask.org"&gt;summit.dask.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I would like to especially extend an invitation to those who are new to
the Dask community, or new to speaking in general. This year we’re especially
trying to highlight use cases of Dask, rather than developers pushing the
technology forward (although these talks are of course welcome as well).&lt;/p&gt;
&lt;p&gt;If you have an idea for a talk then please submit something and we’ll work
together on making it fit. Alternatively, if you have a colleague who you
think would enjoy or grow from speaking, then please encourage them to submit
as well.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/03/summit.md&lt;/span&gt;, line 58)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="workshops"&gt;
&lt;h1&gt;Workshops&lt;/h1&gt;
&lt;p&gt;Finally, I’m excited about an experiment that we’re running this year with
&lt;em&gt;workshops&lt;/em&gt;. These are intended to be two-hour blocks of time dedicated to
a particular topic, organized by a specific community member (perhaps you?).
If you have a consistent theme for a set of 3-5 talks then this option gives
you the ability to curate and control a dedicated block of the conference. You
can invite your colleagues and collaborators. We’ll handle the conference
infrastructure while you handle the content.&lt;/p&gt;
&lt;p&gt;We stole this structure from workshops at larger academic conferences. We
think that it fits Dask well specifically because of the federated nature of
our community. We hope that it gives space for sub-communities to assemble and
better establish cohesive working groups.&lt;/p&gt;
&lt;p&gt;Themes in the past have included topics like Pangeo, RAPIDS, workflow
management, imaging, and performance.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/03/summit.md&lt;/span&gt;, line 76)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="apply-to-speak"&gt;
&lt;h1&gt;Apply to speak&lt;/h1&gt;
&lt;p&gt;Again, I encourage you and your colleagues to submit applications to speak this
year in May. The proposal page is at
&lt;a class="reference external" href="https://summit.dask.org/present/#guidelines"&gt;https://summit.dask.org/present/#guidelines&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/03/03/summit/"/>
    <summary>Dask is organizing a user summit in mid-May.
This will be a remote event focused on bringing together developers and users of Dask and the distributed PyData stack in different domains.</summary>
    <published>2021-03-03T00:00:00+00:00</published>
  </entry>
</feed>
