<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://blog.dask.org</id>
  <title>Dask Working Notes - Posts by Freyam Mehta and Genevieve Buckley</title>
  <updated>2026-03-05T15:05:19.535698+00:00</updated>
  <link href="https://blog.dask.org"/>
  <link href="https://blog.dask.org/blog/author/freyam-mehta-and-genevieve-buckley/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.io/" version="0.11.12">ABlog</generator>
  <entry>
    <id>https://blog.dask.org/2021/08/23/gsoc-2021-project/</id>
    <title>Google Summer of Code 2021 - Dask Project</title>
    <updated>2021-08-23T00:00:00+00:00</updated>
    <author>
      <name>Freyam Mehta and Genevieve Buckley</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/08/23/gsoc-2021-project.md&lt;/span&gt;, line 8)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="overview"&gt;

&lt;p&gt;Here’s an update on new features related to visualizing Dask graphs and HTML representations. You can try these new features today with version &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;2021.08.1&lt;/span&gt;&lt;/code&gt; or above. This work was done by Freyam Mehta during the Google Summer of Code 2021. Dask took part in the program under the NumFOCUS umbrella organization.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/08/23/gsoc-2021-project.md&lt;/span&gt;, line 12)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="contents"&gt;
&lt;h1&gt;Contents&lt;/h1&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#visualizing-dask-graphs"&gt;&lt;span class="xref myst"&gt;Visualizing Dask graphs&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#graphviz-node-size-scaling"&gt;&lt;span class="xref myst"&gt;Graphviz node size scaling&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#new-tooltips"&gt;&lt;span class="xref myst"&gt;New tooltips&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#color-by-layer-type"&gt;&lt;span class="xref myst"&gt;Color by layer type&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#bugfix-in-visualize-method"&gt;&lt;span class="xref myst"&gt;Bugfix in visualize method&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#html-representations"&gt;&lt;span class="xref myst"&gt;HTML Representations&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#array-images-in-html-repr-for-high-level-graphs"&gt;&lt;span class="xref myst"&gt;Array images in HTML repr for high level graphs&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#new-html-repr-for-processinterface-class"&gt;&lt;span class="xref myst"&gt;New HTML repr for ProcessInterface class&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#new-html-repr-for-security-class"&gt;&lt;span class="xref myst"&gt;New HTML repr for Security class&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/08/23/gsoc-2021-project.md&lt;/span&gt;, line 24)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="visualizing-dask-graphs"&gt;
&lt;h1&gt;Visualizing Dask graphs&lt;/h1&gt;
&lt;p&gt;There are several new features involving Dask &lt;a class="reference external" href="https://docs.dask.org/en/latest/graphs.html"&gt;task graph&lt;/a&gt; visualization. Task graphs are a visual representation of the order and dependencies of each individual task within a Dask computation. They are a very useful diagnostic tool and have been used for a long time.&lt;/p&gt;
&lt;img src="/images/gsoc21/dask-simple.png" alt="An example task graph visualization." height=300&gt;
&lt;p&gt;Freyam worked on making these visualizations more illustrative, engaging, and informative. The &lt;a class="reference external" href="https://docs.dask.org/en/latest/graphviz.html"&gt;Graphviz&lt;/a&gt; library offers a rich set of attributes that can be modified to create a more visually appealing output.&lt;/p&gt;
&lt;p&gt;These features primarily improve the Dask &lt;a class="reference external" href="https://docs.dask.org/en/latest/high-level-graphs.html"&gt;high level graph&lt;/a&gt; visualizations. Both low level and high level Dask graphs can be accessed with very similar methods:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Dask low level graph: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;result.visualize()&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask high level graph: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;result.dask.visualize()&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;…where &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;result&lt;/span&gt;&lt;/code&gt; is a dask object or collection.&lt;/p&gt;
&lt;section id="graphviz-node-size-scaling"&gt;
&lt;h2&gt;Graphviz node size scaling&lt;/h2&gt;
&lt;p&gt;The first change you may notice in the Dask high level graphs is that node sizes now scale with the number of tasks in each layer. Layers with more tasks appear larger than the rest.&lt;/p&gt;
&lt;p&gt;This is helpful because users can now get a much more intuitive sense of where the bulk of their computation takes place.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Dask high level graph&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/gsoc21/7869.png" alt="Example: graphviz node size scaling, pull request #7869" height=414 width=736&gt;
&lt;p&gt;Note: this change only affects the graphviz output for Dask high level graphs. Low level graphs are left unchanged, because each visual node corresponds to one task.&lt;/p&gt;
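&lt;p&gt;As a rough illustration of the idea (not the code from the pull request), node sizes can be scaled linearly between a minimum and a maximum according to the task count of each layer. The function name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scale_node_sizes&lt;/span&gt;&lt;/code&gt;, the size range, and the layer names and task counts below are all hypothetical:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;def scale_node_sizes(task_counts, min_size=20, max_size=40):
    """Map the task count of each layer to a node size in [min_size, max_size].

    Hypothetical sketch: linear scaling, not necessarily what Dask does.
    """
    if not task_counts:
        return {}
    fewest = min(task_counts.values())
    most = max(task_counts.values())
    if most == fewest:
        # All layers have the same number of tasks, so use one size for all.
        return {layer: min_size for layer in task_counts}
    span = most - fewest
    return {
        layer: min_size + (count - fewest) / span * (max_size - min_size)
        for layer, count in task_counts.items()
    }


# Made-up layer names and task counts, for illustration only.
sizes = scale_node_sizes({"random_sample": 2500, "transpose": 2500, "mean_agg": 50})
print(sizes)
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;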
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7869"&gt;Pull request #7869 by Freyam Mehta &lt;em&gt;“Add node size scaling to the Graphviz output for the high level graphs”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="new-tooltips"&gt;
&lt;h2&gt;New tooltips&lt;/h2&gt;
&lt;p&gt;Dask high level graphs now include hover tooltips that give a brief summary of more detailed information. To use the tooltips, generate a Dask high level graph (e.g. &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;result.dask.visualize()&lt;/span&gt;&lt;/code&gt;), then hover your mouse over the layer you are interested in.&lt;/p&gt;
&lt;img src="/images/gsoc21/7973.png" alt="Example: tooltips provide extra information, pull request #7973" height=414 width=736&gt;
&lt;p&gt;Tooltips provide information such as the layer type and number of tasks associated with it. There is additional information provided for specific dask collections, like dask arrays and dataframes.&lt;/p&gt;
&lt;p&gt;Dask array tooltip information additionally includes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Array shape&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk size&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk type (eg: are the array chunks numpy, cupy, sparse, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data type (eg: are the array values float, integer, boolean, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Dask dataframe tooltip information additionally includes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Number of partitions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dataframe type&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dataframe columns&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Users have asked for a less overwhelming view into the dask task graph. We hope the high level graph view coupled with more detailed tooltip information can provide this middle ground, with enough information to be useful, but not so much as to become overwhelming (like the low level task graphs for large computations).&lt;/p&gt;
&lt;p&gt;Note: This feature is available for SVG output only. Other image formats, such as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;.png&lt;/span&gt;&lt;/code&gt;, do not support tooltips.&lt;/p&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7973"&gt;Pull request #7973 by Freyam Mehta &lt;em&gt;“Add tooltips to graphviz”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="color-by-layer-type"&gt;
&lt;h2&gt;Color by layer type&lt;/h2&gt;
&lt;p&gt;There is also a new feature enabling users to color code a high level graph according to layer type. This option can be enabled by passing the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;color=&amp;quot;layer_type&amp;quot;&lt;/span&gt;&lt;/code&gt; keyword argument, eg: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;result.dask.visualize(color=&amp;quot;layer_type&amp;quot;)&lt;/span&gt;&lt;/code&gt;. This change is intended to make it easier for users to see which layer types predominate.&lt;/p&gt;
&lt;p&gt;While there are no hard and fast rules about what makes a Dask computation efficient, there are some general guidelines:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Dataframe shuffles are particularly expensive operations. You can &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe-best-practices.html#avoid-full-data-shuffling"&gt;read more about this here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reading and writing data to/from storage/network services is often high-latency and therefore a bottleneck.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Blockwise layers are generally efficient for computation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All layers are materialized during computation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See the &lt;a class="reference external" href="https://docs.dask.org/en/latest/best-practices.html"&gt;Dask best pracices&lt;/a&gt; pages for more information on creating more efficient Dask computations.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeseries&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;df3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;df3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;layer_type&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Dask high level graph with colored nodes by layer type&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/gsoc21/7974.png" alt="Example: Dask graph colored by layer type, pull request #7974" height=414 width=736&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7974"&gt;Pull request #7974 by Freyam Mehta &lt;em&gt;“Add colors to represent high level layer types”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="bugfix-in-visualize-method"&gt;
&lt;h2&gt;Bugfix in visualize method&lt;/h2&gt;
&lt;p&gt;Freyam also fixed a bug which caused an error when users tried to call &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask.visualize()&lt;/span&gt;&lt;/code&gt; with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;filename=None&lt;/span&gt;&lt;/code&gt; (issue &lt;a class="reference external" href="https://github.com/dask/dask/issues/7685"&gt;#7685&lt;/a&gt;, fixed by pull request &lt;a class="reference external" href="https://github.com/dask/dask/pull/7740"&gt;#7740&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The bug was fixed by adding an extra condition before the code reaches the point where the error was raised. If the format is &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;None&lt;/span&gt;&lt;/code&gt;, Dask now uses a default &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;png&lt;/span&gt;&lt;/code&gt; format.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# success&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
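&lt;p&gt;The fallback described above can be sketched roughly as follows. This is an illustrative sketch rather than the code from the pull request: the helper name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;resolve_format&lt;/span&gt;&lt;/code&gt;, its &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fmt&lt;/span&gt;&lt;/code&gt; parameter, and the set of recognized file extensions are assumptions made for the example.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;import os

# Assumed set of extensions that imply an output format (illustrative only).
KNOWN_EXTENSIONS = {".png", ".pdf", ".dot", ".svg", ".jpeg", ".jpg"}


def resolve_format(filename=None, fmt=None):
    """Return the output format, falling back to "png" when nothing specifies one."""
    if fmt is None and filename is not None:
        extension = os.path.splitext(filename)[1].lower()
        if extension in KNOWN_EXTENSIONS:
            # Infer the format from the file extension, e.g. "graph.svg" gives "svg".
            fmt = extension[1:]
    if fmt is None:
        # The extra condition: with no usable filename or format, default to png.
        fmt = "png"
    return fmt


print(resolve_format(filename=None))         # png
print(resolve_format(filename="graph.svg"))  # svg
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;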
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7740"&gt;Pull request #7740 by Freyam Mehta &lt;em&gt;“Fixing calling .visualize() with filename=None”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/08/23/gsoc-2021-project.md&lt;/span&gt;, line 135)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="html-representations"&gt;
&lt;h1&gt;HTML representations&lt;/h1&gt;
&lt;p&gt;Dask makes use of HTML representations in several places, for example in Dask collections like the Array and Dataframe classes (for background reading, see &lt;a class="reference external" href="https://matthewrocklin.com/blog/2019/07/04/html-repr"&gt;this blogpost&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;More recently, we’ve introduced HTML representations for high level graphs into Dask, and Jacob Tomlinson has implemented HTML representations in several places in the dask distributed library (for further reading, see &lt;a class="reference external" href="https://blog.dask.org/2021/07/07/high-level-graphs#visualization"&gt;this other blogpost&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;During Freyam’s Google Summer of Code project, he extended the HTML representations for Dask high level graphs to include images, and introduced two entirely new HTML representations to the dask distributed library.&lt;/p&gt;
&lt;section id="array-images-in-html-repr-for-high-level-graphs"&gt;
&lt;h2&gt;Array images in HTML repr for high level graphs&lt;/h2&gt;
&lt;p&gt;The HTML representation for dask high level graphs has been extended, and now includes SVG images of dask arrays at intermediate stages of computation.&lt;/p&gt;
&lt;p&gt;The motivation for this feature is similar to the motivation behind adding tooltips, discussed above. Users want easier ways to see how a Dask computation changes as it moves through each stage. We hope this improvement to the HTML representation for Dask high level graphs will provide an at-a-glance summary of array shape and chunk size at each stage.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.array&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;da&lt;/span&gt;

&lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;

&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dask&lt;/span&gt;  &lt;span class="c1"&gt;# shows the HTML representation in Jupyter&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/gsoc21/7886.png" alt="Example: Array images now included in HTML representation of Dask high level graphs, pull request #7886" height=414 width=736&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/dask/pull/7886"&gt;Pull request #7886 by Freyam Mehta &lt;em&gt;“Add dask.array SVG to the HTML Repr”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="new-html-repr-for-processinterface-class"&gt;
&lt;h2&gt;New HTML repr for ProcessInterface class&lt;/h2&gt;
&lt;p&gt;A new HTML representation has been created for the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ProcessInterface&lt;/span&gt;&lt;/code&gt; class in &lt;a class="reference external" href="https://github.com/dask/distributed/"&gt;dask distributed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The HTML representation displays the status, address, and external address of the process.&lt;/p&gt;
&lt;p&gt;There are three possible status options:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Process created, not yet running (blue icon)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Process is running (green icon)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Process closed (orange icon)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;img src="/images/gsoc21/5181-1.png" alt="Example: New HTML representation for distributed ProcessInterface class, pull request #5181" height=414 width=736&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ProcessInterface&lt;/span&gt;&lt;/code&gt; class is not intended to be used directly. Instead, more typically this information will be accessed via subclasses such as the SSH scheduler or workers.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SSHCluster&lt;/span&gt;

&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SSHCluster&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;127.0.0.1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;127.0.0.1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;127.0.0.1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;  &lt;span class="c1"&gt;# HTML representation for the SSH scheduler, shown in Jupyter&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;  &lt;span class="c1"&gt;# dict of all the workers&lt;/span&gt;
&lt;span class="c1"&gt;# or&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# HTML representation for the first SSH worker in the cluster&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/gsoc21/5181-2.png" alt="Example: New HTML representation for distributed ProcessInterface class, pull request #5181" height=414 width=736&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/distributed/pull/5181"&gt;Pull request #5181 by Freyam Mehta &lt;em&gt;“Add HTML Repr for ProcessInterface Class and all its subclasses”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="new-html-repr-for-security-class"&gt;
&lt;h2&gt;New HTML repr for Security class&lt;/h2&gt;
&lt;p&gt;Pull request &lt;a class="reference external" href="https://github.com/dask/distributed/pull/5178"&gt;#5178&lt;/a&gt; added a new HTML representation for the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Security&lt;/span&gt;&lt;/code&gt; class in the &lt;a class="reference external" href="https://github.com/dask/distributed/"&gt;dask distributed library&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Security&lt;/span&gt;&lt;/code&gt; HTML representation shows:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Whether encryption is required&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Whether the object instance was created using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Security.temporary()&lt;/span&gt;&lt;/code&gt; or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Security(**paths_to_keys)&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For temporary security objects, keys are generated dynamically and the only copy is kept in memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For security objects created using keys stored on disk, the HTML representation will show the full filepath to the relevant security certificates on disk.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example: temporary security object&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Security&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Security&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;temporary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;s&lt;/span&gt;  &lt;span class="c1"&gt;# shows the HTML representation in Jupyter&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Example: security object using certificates saved to disk&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Security&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Security&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;require_encryption&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tls_ca_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ca.pem&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tls_scheduler_cert&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;scert.pem&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;s&lt;/span&gt;  &lt;span class="c1"&gt;# shows the HTML representation in Jupyter&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/gsoc21/5178-2.png" alt="Example: New HTML representation for distributed Security class, pull request #5178" height=414 width=736&gt;
&lt;p&gt;In addition, the text representation has been updated to reflect the same information shown in the HTML representation.&lt;/p&gt;
&lt;img src="/images/gsoc21/5178-1.png" alt="Example: New text representation for distributed Security class, pull request #5178" height=414 width=736&gt;
&lt;p&gt;Reference: &lt;a class="reference external" href="https://github.com/dask/distributed/pull/5178/"&gt;Pull request #5178 by Freyam Mehta &lt;em&gt;“Add HTML Repr for Security Class”&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/08/23/gsoc-2021-project/"/>
    <summary>An update on new features for visualizing Dask graphs and HTML representations, developed by Freyam Mehta during Google Summer of Code 2021.</summary>
    <published>2021-08-23T00:00:00+00:00</published>
  </entry>
</feed>
