<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://blog.dask.org</id>
  <title>Dask Working Notes - Posts tagged profiling</title>
  <updated>2026-03-05T15:05:26.717517+00:00</updated>
  <link href="https://blog.dask.org"/>
  <link href="https://blog.dask.org/blog/tag/profiling/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.io/" version="0.11.12">ABlog</generator>
  <entry>
    <id>https://blog.dask.org/2021/03/11/dask_memory_usage/</id>
    <title>Measuring Dask memory usage with dask-memusage</title>
    <updated>2021-03-11T00:00:00+00:00</updated>
    <author>
      <name>&lt;a href="https://pythonspeed.com"&gt;Itamar Turner-Trauring&lt;/a&gt;</name>
    </author>
    <content type="html">&lt;p&gt;Using too much computing resources can get expensive when you’re scaling up in the cloud.&lt;/p&gt;
&lt;p&gt;To give a real example, I was working on the image processing pipeline for a spatial gene sequencing device, which could report not just which genes were being expressed but also where they were in a 3D volume of cells.
In order to get this information, a specialized microscope took snapshots of the cell culture or tissue, and the resulting data was run through a Dask pipeline.&lt;/p&gt;
&lt;p&gt;The pipeline was fairly slow, so I did some back-of-the-envelope math to figure out what our computing costs would be once we started running more data for customers.
&lt;strong&gt;It turned out that we’d be using 70% of our revenue just paying for cloud computing!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Clearly I needed to optimize this code.&lt;/p&gt;
&lt;p&gt;When we think about the bottlenecks in large-scale computation, we often focus on CPU: we want to use more CPU cores in order to get faster results.
Paying for all that CPU can be expensive, as in this case, and I did successfully reduce CPU usage by quite a lot.&lt;/p&gt;
&lt;p&gt;But high memory usage was also a problem, and fixing that problem led me to build a series of tools, tools that can also help you optimize and reduce your Dask memory usage.&lt;/p&gt;
&lt;p&gt;In the rest of this article you will learn:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#problem"&gt;&lt;span class="xref myst"&gt;How high memory usage can drive up your computing costs&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How a tool called &lt;a class="reference external" href="https://github.com/itamarst/dask-memusage/"&gt;dask-memusage&lt;/a&gt; can help you &lt;a class="reference internal" href="#dask-memusage"&gt;&lt;span class="xref myst"&gt;find peak memory usage of the tasks in your Dask execution graph&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to &lt;a class="reference internal" href="#fil"&gt;&lt;span class="xref myst"&gt;further pinpoint high memory usage&lt;/span&gt;&lt;/a&gt; using the &lt;a class="reference external" href="https://pythonspeed.com/fil"&gt;Fil memory profiler&lt;/a&gt;, so you can reduce memory usage.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/11/dask_memory_usage.md&lt;/span&gt;, line 30)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="the-problem-fixed-processing-chunks-and-a-high-memory-cpu-ratio-problem"&gt;

&lt;p&gt;As a reminder, I was working on a Dask pipeline that processed data from a specialized microscope.
The resulting data volume was quite large, and certain subsets of images had to be processed together as a unit.
From a computational standpoint, we effectively had a series of inputs X0, X1, X2, … that could be independently processed by a function &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f()&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The internal processing of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f()&lt;/span&gt;&lt;/code&gt; could not easily be parallelized further.
From a CPU scheduling perspective this was fine: it was still an embarrassingly parallel problem, given the large number of X inputs.&lt;/p&gt;
&lt;p&gt;For example, if I provisioned a virtual machine with 4 CPU cores, I could process the data by starting four processes, each maxing out a single core.
If I had 12 inputs and each processing step took about the same time, they might run as follows:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CPU0: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X0)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X4)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X8)&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU1: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X1)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X5)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X9)&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU2: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X2)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X6)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X10)&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU3: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X3)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X7)&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(X11)&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If I could make &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f()&lt;/span&gt;&lt;/code&gt; faster, the pipeline as a whole would also run faster.&lt;/p&gt;
&lt;p&gt;CPU is not the only resource used in computation, however: RAM can also be a bottleneck.
For example, let’s say each call to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;f(Xi)&lt;/span&gt;&lt;/code&gt; took 12GB of RAM.
That means to fully utilize 4 CPUs, I would need 48GB of RAM—but what if my computer only has 16GB of RAM?&lt;/p&gt;
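&lt;p&gt;The arithmetic above generalizes to a simple rule: parallelism is capped by whichever resource runs out first. A minimal sketch, using the illustrative numbers from the text:&lt;/p&gt;

```python
def max_parallel_tasks(cpu_cores, total_ram_gb, ram_per_task_gb):
    # Parallelism is capped by whichever resource runs out first:
    # CPU cores, or how many task-sized chunks of RAM fit in memory.
    return min(cpu_cores, int(total_ram_gb // ram_per_task_gb))

# 4 cores but only 16GB of RAM, with 12GB needed per task:
print(max_parallel_tasks(4, 16, 12))  # 1 -- RAM is the bottleneck
# With 48GB of RAM, all 4 cores can be kept busy:
print(max_parallel_tasks(4, 48, 12))  # 4
```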
&lt;p&gt;&lt;strong&gt;Even though my computer has 4 CPUs, I can only utilize one CPU on a computer with 16GB RAM, because I don’t have enough RAM to run more than one task in parallel.&lt;/strong&gt;
In practice, these tasks ran in the cloud, where I could ensure the necessary RAM/core ratio was preserved by choosing the right pre-configured VM instances.
And on some clouds you can freely set the amount of RAM and number of CPU cores for each virtual machine you spin up.&lt;/p&gt;
&lt;p&gt;However, I didn’t know exactly how much memory was used at peak, so I had to limit parallelism to avoid out-of-memory errors.
As a result, the default virtual machines we were using had half their CPUs sitting idle: resources we were paying for but not using.&lt;/p&gt;
&lt;p&gt;In order to provision hardware appropriately and max out all the CPUs, I needed to know how much peak memory each task was using.
And to do that, I created a new tool.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/11/dask_memory_usage.md&lt;/span&gt;, line 63)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="measuring-peak-task-memory-usage-with-dask-memusage-dask-memusage"&gt;
&lt;h1&gt;Measuring peak task memory usage with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;&lt;/h1&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; is a tool for measuring peak memory usage for each task in the Dask execution graph.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Per &lt;em&gt;task&lt;/em&gt; because Dask executes code as a graph of tasks, and the graph determines how much parallelism can be used.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Peak&lt;/em&gt; memory is important, because that is the bottleneck.
It doesn’t matter if average memory usage per task is 4GB: if two parallel tasks in the graph each need 12GB at the same time, you’re going to need 24GB of RAM to run both tasks on the same computer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
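&lt;p&gt;A tiny numeric sketch of the second point, with made-up memory samples for one task’s lifetime:&lt;/p&gt;

```python
# Hypothetical memory samples (GB) taken over one task's lifetime:
task_samples = [1, 2, 4, 12, 3]

average_gb = sum(task_samples) / len(task_samples)  # 4.4GB
peak_gb = max(task_samples)                         # 12GB

# Provisioning two parallel tasks by the average looks cheap, but
# if both tasks hit their peaks simultaneously, the machine
# actually needs twice the peak:
print(2 * average_gb)  # 8.8
print(2 * peak_gb)     # 24
```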
&lt;section id="using-dask-memusage"&gt;
&lt;h2&gt;Using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Since the gene sequencing code is proprietary and quite complex, let’s use a different example.
We’re going to count the occurrence of words in some text files, and then report the top-10 most common words in each file.
You can imagine combining the data later on, but we won’t bother with that in this simple example.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;gc&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sleep&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pathlib&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.bag&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;from_sequence&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;collections&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.distributed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_memusage&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;calculate_top_10&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# See notes below&lt;/span&gt;

    &lt;span class="c1"&gt;# Load the file&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Count the words&lt;/span&gt;
    &lt;span class="n"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;.,&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\&amp;quot;&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# Choose the top 10:&lt;/span&gt;
    &lt;span class="n"&gt;by_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# See notes below&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;by_count&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:])&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Setup the calculation:&lt;/span&gt;

    &lt;span class="c1"&gt;# Create a 4-process cluster (running locally). Note only one thread&lt;/span&gt;
    &lt;span class="c1"&gt;# per-worker: because polling is per-process, you can&amp;#39;t run multiple&lt;/span&gt;
    &lt;span class="c1"&gt;# threads per worker, otherwise you&amp;#39;ll get results that combine memory&lt;/span&gt;
    &lt;span class="c1"&gt;# usage of multiple tasks.&lt;/span&gt;
    &lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LocalCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threads_per_worker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                           &lt;span class="n"&gt;memory_limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Install dask-memusage:&lt;/span&gt;
    &lt;span class="n"&gt;dask_memusage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;memusage.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create the task graph:&lt;/span&gt;
    &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;from_sequence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterdir&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calculate_top_10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;example2.png&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rankdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;TD&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Run the calculations:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ... do something with results ...&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;__main__&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Here’s what the task graph looks like:&lt;/p&gt;
&lt;img src="/images/dask_memusage/example2.png" style="width: 75%; margin: 2em;"&gt;
&lt;p&gt;Plenty of parallelism!&lt;/p&gt;
&lt;p&gt;We can run the program on some files:&lt;/p&gt;
&lt;div class="highlight-shell-session notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;dask&lt;span class="o"&gt;[&lt;/span&gt;bag&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;dask_memusage
&lt;span class="gp"&gt;$ &lt;/span&gt;python&lt;span class="w"&gt; &lt;/span&gt;example2.py&lt;span class="w"&gt; &lt;/span&gt;files/
&lt;span class="gp gp-VirtualEnv"&gt;(&amp;#39;frankenstein.txt&amp;#39;, [(&amp;#39;that&amp;#39;, 1016)&lt;/span&gt;&lt;span class="go"&gt;, (&amp;#39;was&amp;#39;, 1021), (&amp;#39;in&amp;#39;, 1180), (&amp;#39;a&amp;#39;, 1438), (&amp;#39;my&amp;#39;, 1751), (&amp;#39;to&amp;#39;, 2164), (&amp;#39;i&amp;#39;, 2754), (&amp;#39;of&amp;#39;, 2761), (&amp;#39;and&amp;#39;, 3025), (&amp;#39;the&amp;#39;, 4339)])&lt;/span&gt;
&lt;span class="gp gp-VirtualEnv"&gt;(&amp;#39;pride_and_prejudice.txt&amp;#39;, [(&amp;#39;she&amp;#39;, 1660)&lt;/span&gt;&lt;span class="go"&gt;, (&amp;#39;i&amp;#39;, 1730), (&amp;#39;was&amp;#39;, 1832), (&amp;#39;in&amp;#39;, 1904), (&amp;#39;a&amp;#39;, 1981), (&amp;#39;her&amp;#39;, 2142), (&amp;#39;and&amp;#39;, 3503), (&amp;#39;of&amp;#39;, 3705), (&amp;#39;to&amp;#39;, 4188), (&amp;#39;the&amp;#39;, 4492)])&lt;/span&gt;
&lt;span class="gp gp-VirtualEnv"&gt;(&amp;#39;greatgatsby.txt&amp;#39;, [(&amp;#39;that&amp;#39;, 564)&lt;/span&gt;&lt;span class="go"&gt;, (&amp;#39;was&amp;#39;, 760), (&amp;#39;he&amp;#39;, 770), (&amp;#39;in&amp;#39;, 849), (&amp;#39;i&amp;#39;, 999), (&amp;#39;to&amp;#39;, 1197), (&amp;#39;of&amp;#39;, 1224), (&amp;#39;a&amp;#39;, 1440), (&amp;#39;and&amp;#39;, 1565), (&amp;#39;the&amp;#39;, 2543)])&lt;/span&gt;
&lt;span class="gp gp-VirtualEnv"&gt;(&amp;#39;big.txt&amp;#39;, [(&amp;#39;his&amp;#39;, 40032)&lt;/span&gt;&lt;span class="go"&gt;, (&amp;#39;was&amp;#39;, 45356), (&amp;#39;that&amp;#39;, 47924), (&amp;#39;he&amp;#39;, 48276), (&amp;#39;a&amp;#39;, 83228), (&amp;#39;in&amp;#39;, 86832), (&amp;#39;to&amp;#39;, 114184), (&amp;#39;and&amp;#39;, 152284), (&amp;#39;of&amp;#39;, 159888), (&amp;#39;the&amp;#39;, 314908)])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;As one would expect, the most common words are stop words, but there is still some variation in order.&lt;/p&gt;
&lt;p&gt;Next, let’s look at the results from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="dask-memusage-output-and-how-it-works"&gt;
&lt;h2&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; output, and how it works&lt;/h2&gt;
&lt;p&gt;You’ll notice that the actual use of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; involves just one extra line, other than the import:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;dask_memusage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;memusage.csv&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;What this does is poll each worker process’s memory usage at 10ms intervals, recording the minimum and peak broken down by task.
In this case, here’s what &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;memusage.csv&lt;/span&gt;&lt;/code&gt; looks like:&lt;/p&gt;
&lt;div class="highlight-csv notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;task_key,min_memory_mb,max_memory_mb
&amp;quot;(&amp;#39;from_sequence-3637e6ff937ef8488894df60a80f62ed&amp;#39;, 3)&amp;quot;,51.2421875,51.2421875
&amp;quot;(&amp;#39;from_sequence-3637e6ff937ef8488894df60a80f62ed&amp;#39;, 0)&amp;quot;,51.70703125,51.70703125
&amp;quot;(&amp;#39;from_sequence-3637e6ff937ef8488894df60a80f62ed&amp;#39;, 1)&amp;quot;,51.28125,51.78515625
&amp;quot;(&amp;#39;from_sequence-3637e6ff937ef8488894df60a80f62ed&amp;#39;, 2)&amp;quot;,51.30859375,51.30859375
&amp;quot;(&amp;#39;calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca&amp;#39;, 2)&amp;quot;,56.19140625,56.19140625
&amp;quot;(&amp;#39;calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca&amp;#39;, 0)&amp;quot;,51.70703125,54.26953125
&amp;quot;(&amp;#39;calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca&amp;#39;, 1)&amp;quot;,52.30078125,52.30078125
&amp;quot;(&amp;#39;calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca&amp;#39;, 3)&amp;quot;,51.48046875,384.00390625
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For each task in the graph we are told minimum memory usage and peak memory usage, in MB.&lt;/p&gt;
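&lt;p&gt;With many tasks the raw CSV gets unwieldy, so it helps to aggregate. A sketch that reports the worst-case peak per task function, assuming the column layout shown above (the helper name and key-parsing here are mine, not part of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;):&lt;/p&gt;

```python
import csv
from collections import defaultdict

def peak_by_function(csv_path):
    """Highest max_memory_mb seen for each task function.

    Task keys look like "('calculate_top_10-<hash>', 3)"; we group
    them by the function-name prefix before the hash.
    """
    peaks = defaultdict(float)
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            name = row["task_key"].split("'")[1].rsplit("-", 1)[0]
            peaks[name] = max(peaks[name], float(row["max_memory_mb"]))
    return dict(peaks)
```

&lt;p&gt;On the CSV above, this would report a peak of about 52MB for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;from_sequence&lt;/span&gt;&lt;/code&gt; and 384MB for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;calculate_top_10&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;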
&lt;p&gt;In more readable form:&lt;/p&gt;
&lt;div class="pst-scrollable-table-container"&gt;&lt;table class="table"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;task_key&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;min_memory_mb&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;max_memory_mb&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;“(‘from_sequence-3637e6ff937ef8488894df60a80f62ed’, 3)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.2421875&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.2421875&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;“(‘from_sequence-3637e6ff937ef8488894df60a80f62ed’, 0)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.70703125&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.70703125&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;“(‘from_sequence-3637e6ff937ef8488894df60a80f62ed’, 1)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.28125&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.78515625&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;“(‘from_sequence-3637e6ff937ef8488894df60a80f62ed’, 2)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.30859375&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.30859375&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;“(‘calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca’, 2)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;56.19140625&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;56.19140625&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;“(‘calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca’, 0)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.70703125&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;54.26953125&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;“(‘calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca’, 1)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;52.30078125&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;52.30078125&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;“(‘calculate_top_10-afc867e38c3bd0aac8c18bb00d3634ca’, 3)”&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;51.48046875&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;384.00390625&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;The bottom four lines are the interesting ones; all four start with a minimum memory usage of ~50MB RAM, and then memory may or may not increase as the code runs.
How much it increases presumably depends on the size of the files; most of them are quite small, so memory usage doesn’t change much.
&lt;strong&gt;One file uses much more maximum memory than the others, 384MB of RAM; presumably it’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;big.txt&lt;/span&gt;&lt;/code&gt; which is 25MB, since the other files are all smaller than 1MB.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The mechanism used, polling peak process memory, has some limitations:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;You’ll notice there’s a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;gc.collect()&lt;/span&gt;&lt;/code&gt; at the top of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;calculate_top_10()&lt;/span&gt;&lt;/code&gt;; this ensures we don’t count memory from previous code that hasn’t been cleaned up yet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There’s also a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;sleep()&lt;/span&gt;&lt;/code&gt; at the bottom of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;calculate_top_10()&lt;/span&gt;&lt;/code&gt;.
Because polling is used, tasks that run too quickly won’t get accurate information—the polling happens every 10ms or so, so you want to sleep at least 20ms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, because polling is per-process, you can’t run multiple threads per worker, otherwise you’ll get results that combine memory usage of multiple tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="interpreting-the-data"&gt;
&lt;h2&gt;Interpreting the data&lt;/h2&gt;
&lt;p&gt;What we’ve learned is that memory usage of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;calculate_top_10()&lt;/span&gt;&lt;/code&gt; grows with file size; this can be used to &lt;a class="reference external" href="https://pythonspeed.com/articles/estimating-memory-usage/"&gt;characterize the memory requirements for the workload&lt;/a&gt;.
That is, we can create a model that links data input sizes and required RAM, and then we can calculate the required RAM for any given level of parallelism.
And that can guide our choice of hardware, if we assume one task per CPU core.&lt;/p&gt;
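&lt;p&gt;As a sketch of such a model (the measurements here are invented for illustration): fit peak RAM as a linear function of input size, then derive a safe parallelism level from the fit:&lt;/p&gt;

```python
# (input size MB, measured peak RAM MB) -- invented example measurements:
measurements = [(1, 52), (5, 115), (25, 384)]

# Least-squares fit: peak_mb ~= a * input_mb + b
n = len(measurements)
sx = sum(x for x, _ in measurements)
sy = sum(y for _, y in measurements)
sxx = sum(x * x for x, _ in measurements)
sxy = sum(x * y for x, y in measurements)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

def safe_parallelism(input_mb, machine_ram_mb, cpu_cores):
    # One task per core, but never more tasks than fit in RAM at peak.
    per_task_mb = a * input_mb + b
    return min(cpu_cores, int(machine_ram_mb // per_task_mb))

# A 16GB machine with 4 cores can run four 25MB-input tasks at once:
print(safe_parallelism(25, 16384, 4))  # 4
```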
&lt;p&gt;Going back to my original motivating problem, the gene sequencing pipeline: using the data from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;, I was able to come up with a formula saying “for this size input, this much memory is necessary”.
Whenever we ran a batch job we could therefore set the parallelism as high as possible given the number of CPUs and RAM on the machine.&lt;/p&gt;
&lt;p&gt;While this allowed for more parallelism, it still wasn’t sufficient: processing was still using a huge amount of RAM, RAM we had to pay for either with time (by using fewer CPUs) or with money (by paying for more expensive virtual machines with more RAM).
So the next step was to reduce memory usage.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/11/dask_memory_usage.md&lt;/span&gt;, line 216)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="reducing-memory-usage-with-fil-fil"&gt;
&lt;h1&gt;Reducing memory usage with Fil&lt;/h1&gt;
&lt;p&gt;If we look at the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; output for our word-counting example, the memory usage seems rather high: for a 25MB file, peak memory grows by roughly 330MB over the ~51MB baseline just to count words.
Thinking through how an ideal version of this code might work, we ought to be able to process the file with much less memory (for example we could redesign our code to process the file line by line, reducing memory).&lt;/p&gt;
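&lt;p&gt;The line-by-line redesign mentioned above might look something like this (a sketch, not the post’s actual fix); peak memory is then bounded by the longest line plus the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Counter&lt;/span&gt;&lt;/code&gt;, rather than the whole file:&lt;/p&gt;

```python
from collections import Counter
from pathlib import Path

def calculate_top_10_streaming(file_path: Path):
    # Read one line at a time instead of loading the whole file.
    counts = Counter()
    with open(file_path) as f:
        for line in f:
            for word in line.split():
                counts[word.strip(".,'\"").lower()] += 1
    # most_common() returns the top entries largest-first, unlike
    # the original's ascending by_count[-10:] slice.
    return (file_path.name, counts.most_common(10))
```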
&lt;p&gt;&lt;strong&gt;And that’s another way in which &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; can be helpful: it can point us at specific code that needs memory usage optimized, at the granularity of a task.&lt;/strong&gt;
A task can be a rather large chunk of code, though, so the next step is to use a memory profiler that can point to specific lines of code.&lt;/p&gt;
&lt;p&gt;When working on the gene sequencing tool I used the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;memory_profiler&lt;/span&gt;&lt;/code&gt; package; while it worked, and I managed to reduce memory usage quite a bit, I found it quite difficult to use.
It turns out that for batch data processing, the typical use case for Dask, &lt;a class="reference external" href="https://pythonspeed.com/articles/memory-profiler-data-scientists/"&gt;you want a different kind of memory profiler&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So after I’d left that job, I created &lt;a class="reference external" href="https://pythonspeed.com/fil"&gt;a memory profiler called Fil&lt;/a&gt; that is expressly designed for finding peak memory usage.
Unlike &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt;, which can be run on production workloads, Fil slows down your execution and has other limitations I’m currently working on (it doesn’t support multiple processes, as of March 2021), so for now it’s better used for manual profiling.&lt;/p&gt;
&lt;p&gt;We can write a little script that only runs on &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;big.txt&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pathlib&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;example2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculate_top_10&lt;/span&gt;

&lt;span class="n"&gt;calculate_top_10&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;files/big.txt&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Run it under Fil:&lt;/p&gt;
&lt;div class="highlight-shell-session notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="go"&gt;pip install filprofiler&lt;/span&gt;
&lt;span class="go"&gt;fil-profile run example3.py&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And the result shows us where the bulk of the memory is being allocated:&lt;/p&gt;
&lt;iframe id="peak" src="/images/dask_memusage/peak-memory.svg" width="100%" height="300" scrolling="auto" frameborder="0"&gt;&lt;/iframe&gt;
&lt;p&gt;Reading in the file takes 8% of memory, but &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;data.split()&lt;/span&gt;&lt;/code&gt; is responsible for 84% of memory.
Perhaps instead of loading the whole file into memory and splitting it into words all at once, we should process the file line by line.
A good next step if this were real code would be to fix the way &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;calculate_top_10()&lt;/span&gt;&lt;/code&gt; is implemented.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2021/03/11/dask_memory_usage.md&lt;/span&gt;, line 254)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="next-steps"&gt;
&lt;h1&gt;Next steps&lt;/h1&gt;
&lt;p&gt;What should you do if your Dask workload is using too much memory?&lt;/p&gt;
&lt;p&gt;If you’re running Dask workloads with the Distributed backend, and you’re fine with only having one thread per worker, running with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; will give you real-world per-task memory usage on production workloads.
You can then use the resulting information in a variety of ways:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;As a starting point for optimizing memory usage.
Once you know which tasks use the most memory, you can then &lt;a class="reference external" href="https://pythonspeed.com/articles/memory-profiler-data-scientists/"&gt;use Fil to figure out which lines of code are responsible&lt;/a&gt; and then use &lt;a class="reference external" href="https://pythonspeed.com/articles/data-doesnt-fit-in-memory/"&gt;a variety of techniques to reduce memory usage&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When possible, you can fine-tune your chunk size; smaller chunks will use less memory.
If you’re using Dask Arrays you can &lt;a class="reference external" href="https://docs.dask.org/en/latest/array-chunks.html"&gt;set the chunk size&lt;/a&gt;; with Dask Dataframes you can &lt;a class="reference external" href="https://docs.dask.org/en/latest/dataframe-best-practices.html#repartition-to-reduce-overhead"&gt;ensure good partition sizes&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can fine-tune your hardware configuration so you’re not wasting RAM or CPU cores.
For example, on AWS you can &lt;a class="reference external" href="https://instances.vantage.sh/"&gt;choose a variety of instance sizes&lt;/a&gt; with different RAM/CPU ratios, one of which may match your workload characteristics.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
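&lt;p&gt;As a minimal illustration of the chunk-size knob (assuming Dask is installed; the sizes here are arbitrary), a million-element array split into chunks of 100,000 elements gives ten tasks that each materialize roughly 0.8MB of float64 data at a time, rather than the full 8MB at once:&lt;/p&gt;

```python
import dask.array as da

# One million float64 values, split into chunks of 100,000 elements:
# each task works on ~0.8 MB at a time instead of the whole ~8 MB array.
x = da.ones(1_000_000, chunks=100_000)
print(x.numblocks)        # number of chunks along each axis
print(x.sum().compute())  # chunk-wise partial sums, then a final combine
```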
&lt;p&gt;In my original use case, the gene sequencing pipeline, I was able to use a combination of lower memory use and lower CPU use to reduce costs to a much more modest level.
And when doing R&amp;amp;D, I was able to get faster results with the same hardware costs.&lt;/p&gt;
&lt;p&gt;You can &lt;a class="reference external" href="https://github.com/itamarst/dask-memusage/"&gt;learn more about &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-memusage&lt;/span&gt;&lt;/code&gt; here&lt;/a&gt;, and &lt;a class="reference external" href="https://pythonspeed.com/fil"&gt;learn more about the Fil memory profiler here&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2021/03/11/dask_memory_usage/"/>
    <summary>Using too much computing resources can get expensive when you’re scaling up in the cloud.</summary>
    <category term="dask" label="dask"/>
    <category term="distributed" label="distributed"/>
    <category term="memory" label="memory"/>
    <category term="profiling" label="profiling"/>
    <category term="ram" label="ram"/>
    <published>2021-03-11T00:00:00+00:00</published>
  </entry>
</feed>
