<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://blog.dask.org</id>
  <title>Dask Working Notes - Posts by Matthew Murray (NVIDIA)</title>
  <updated>2026-03-05T15:05:19.967548+00:00</updated>
  <link href="https://blog.dask.org"/>
  <link href="https://blog.dask.org/blog/author/matthew-murray-nvidia/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.io/" version="0.11.12">ABlog</generator>
  <entry>
    <id>https://blog.dask.org/2022/02/17/helm-multiple-worker-groups/</id>
    <title>How to run different worker types with the Dask Helm Chart</title>
    <updated>2022-02-17T00:00:00+00:00</updated>
    <author>
      <name>Matthew Murray (NVIDIA)</name>
    </author>
    <content type="html">&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2022/02/17/helm-multiple-worker-groups.md&lt;/span&gt;, line 9)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="introduction"&gt;

&lt;p&gt;Today, we’ll learn how to deploy &lt;a class="reference external" href="https://dask.org/"&gt;Dask&lt;/a&gt; on a &lt;a class="reference external" href="https://kubernetes.io/"&gt;Kubernetes&lt;/a&gt; cluster with the Dask Helm Chart and then run and scale different worker types with annotations.&lt;/p&gt;
&lt;section id="what-is-the-dask-helm-chart"&gt;
&lt;h2&gt;What is the Dask Helm Chart?&lt;/h2&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://github.com/dask/helm-chart"&gt;Dask Helm Chart&lt;/a&gt; is a convenient way of deploying Dask using &lt;a class="reference external" href="https://helm.sh/"&gt;Helm&lt;/a&gt;, a package manager for Kubernetes applications. After deploying Dask with the Dask Helm Chart, we can connect to our HelmCluster and begin scaling out workers.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-is-dask-kubernetes"&gt;
&lt;h2&gt;What is Dask Kubernetes?&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://kubernetes.dask.org/en/latest/"&gt;Dask Kubernetes&lt;/a&gt; allows you to deploy and manage your Dask deployment on a Kubernetes cluster. The Dask Kubernetes Python package has a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HelmCluster&lt;/span&gt;&lt;/code&gt; class (among other things) that will enable you to manage your cluster from Python. In this tutorial, we will use the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HelmCluster&lt;/span&gt;&lt;/code&gt; as our cluster manager.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="prerequisites"&gt;
&lt;h2&gt;Prerequisites&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;To have Helm installed and be able to run &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;helm&lt;/span&gt;&lt;/code&gt; commands&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To have a running Kubernetes cluster. It doesn’t matter whether you’re running Kubernetes locally with &lt;a class="reference external" href="https://minikube.sigs.k8s.io/docs/"&gt;MiniKube&lt;/a&gt; or &lt;a class="reference external" href="https://kind.sigs.k8s.io/"&gt;Kind&lt;/a&gt;, or on a cloud provider like AWS or GCP, but your cluster will need access to &lt;a class="reference external" href="https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/"&gt;GPU nodes&lt;/a&gt; to run GPU workers. You’ll also need to install &lt;a class="reference external" href="https://rapids.ai/"&gt;RAPIDS&lt;/a&gt; to run the GPU worker example.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To have &lt;a class="reference external" href="https://kubernetes.io/docs/tasks/tools/"&gt;kubectl&lt;/a&gt; installed. This is optional, but handy for inspecting the cluster.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That’s it, let’s get started!&lt;/p&gt;
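Before deploying, it can be worth confirming that your cluster actually exposes schedulable GPUs. A quick, optional check; this assumes the NVIDIA device plugin is installed, which is what publishes the `nvidia.com/gpu` resource on each node:

```shell
# List GPU capacity/allocatable entries across all nodes. If nothing is
# printed, the cluster has no schedulable GPUs and any GPU worker pods
# will stay Pending.
kubectl describe nodes | grep "nvidia.com/gpu"
```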
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2022/02/17/helm-multiple-worker-groups.md&lt;/span&gt;, line 29)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="install-dask-kubernetes"&gt;
&lt;h1&gt;Install Dask Kubernetes&lt;/h1&gt;
&lt;p&gt;From the &lt;a class="reference external" href="https://kubernetes.dask.org/en/latest/installing.html"&gt;documentation&lt;/a&gt;,&lt;/p&gt;
&lt;div class="highlight-console notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="go"&gt;pip install dask-kubernetes --upgrade&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;or&lt;/p&gt;
&lt;div class="highlight-console notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="go"&gt;conda install dask-kubernetes -c conda-forge&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2022/02/17/helm-multiple-worker-groups.md&lt;/span&gt;, line 43)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="install-the-dask-helm-chart"&gt;
&lt;h1&gt;Install the Dask Helm Chart&lt;/h1&gt;
&lt;p&gt;First, deploy Dask on Kubernetes with Helm:&lt;/p&gt;
&lt;div class="highlight-console notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="go"&gt;helm repo add dask https://helm.dask.org/&lt;/span&gt;
&lt;span class="go"&gt;helm repo update&lt;/span&gt;
&lt;span class="go"&gt;helm install my-dask dask/dask&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Now you should have Dask running on your Kubernetes cluster. If you have kubectl installed, you can run &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;kubectl&lt;/span&gt; &lt;span class="pre"&gt;get&lt;/span&gt; &lt;span class="pre"&gt;all&lt;/span&gt; &lt;span class="pre"&gt;-n&lt;/span&gt; &lt;span class="pre"&gt;default&lt;/span&gt;&lt;/code&gt; to inspect the resources:&lt;/p&gt;
&lt;img src="/images/default-dask-cluster.png" alt="Default Dask Cluster Installed with Helm" width="661" height="373"&gt;
&lt;p&gt;You can see that we’ve created a few resources! The main thing to know is that we start with three Dask workers.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2022/02/17/helm-multiple-worker-groups.md&lt;/span&gt;, line 59)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="add-gpu-worker-group-to-our-dask-deployment"&gt;
&lt;h1&gt;Add GPU worker group to our Dask Deployment&lt;/h1&gt;
&lt;p&gt;The Helm Chart ships with default values that it uses out of the box to deploy our Dask cluster on Kubernetes. Because we want to create some GPU workers, we need to override those defaults. To do this, we can create a copy of the current &lt;a class="reference external" href="https://github.com/dask/helm-chart/blob/main/dask/values.yaml"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;values.yaml&lt;/span&gt;&lt;/code&gt;&lt;/a&gt;, update it to add a GPU worker group, and then upgrade our Helm deployment.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;First, you can copy the contents of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;values.yaml&lt;/span&gt;&lt;/code&gt; file in the Dask Helm Chart and create a new file called &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;my-values.yaml&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next, we’re going to update the section in the file called &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;additional_worker_groups&lt;/span&gt;&lt;/code&gt;. The section looks like this:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nt"&gt;additional_worker_groups&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;# Additional groups of workers to create&lt;/span&gt;
&lt;span class="c1"&gt;# - name: high-mem-workers  # Dask worker group name.&lt;/span&gt;
&lt;span class="c1"&gt;#   resources:&lt;/span&gt;
&lt;span class="c1"&gt;#     limits:&lt;/span&gt;
&lt;span class="c1"&gt;#       memory: 32G&lt;/span&gt;
&lt;span class="c1"&gt;#     requests:&lt;/span&gt;
&lt;span class="c1"&gt;#       memory: 32G&lt;/span&gt;
&lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;span class="c1"&gt;# (Defaults will be taken from the primary worker configuration)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Now we’re going to edit the section to look like this:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nt"&gt;additional_worker_groups&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;# Additional groups of workers to create&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;gpu-workers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;# Dask worker group name.&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;replicas&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;rapidsai/rapidsai-core&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;21.12-cuda11.5-runtime-ubuntu20.04-py3.8&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;dask_worker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;dask-cuda-worker&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;extraArgs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;--resources&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;GPU=1&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;limits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Now we can update our deployment with our new values in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;my-values.yaml&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-console notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="go"&gt;helm upgrade -f my-values.yaml my-dask dask/dask&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Again, you can run &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;kubectl&lt;/span&gt; &lt;span class="pre"&gt;get&lt;/span&gt; &lt;span class="pre"&gt;all&lt;/span&gt; &lt;span class="pre"&gt;-n&lt;/span&gt; &lt;span class="pre"&gt;default&lt;/span&gt;&lt;/code&gt;, and you’ll see our new GPU worker pod running:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;img src="/images/gpu-worker-dask-cluster.png" alt="Dask Cluster Installed with Helm with a GPU worker" width="653" height="428"&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Now we can open up a Jupyter notebook or any editor to write some code.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2022/02/17/helm-multiple-worker-groups.md&lt;/span&gt;, line 108)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="scaling-the-workers-up-down"&gt;
&lt;h1&gt;Scaling the workers Up/Down&lt;/h1&gt;
&lt;p&gt;We’ll start by importing the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HelmCluster&lt;/span&gt;&lt;/code&gt; cluster manager from Dask Kubernetes. Next, we connect the cluster manager to our Dask cluster by passing the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;release_name&lt;/span&gt;&lt;/code&gt; of our Dask cluster as an argument. That’s it: the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HelmCluster&lt;/span&gt;&lt;/code&gt; automatically port-forwards the scheduler for us and gives us quick access to &lt;a class="reference external" href="https://kubernetes.dask.org/en/latest/helmcluster.html#dask_kubernetes.HelmCluster.get_logs"&gt;logs&lt;/a&gt;. Now we’re ready to scale our Dask cluster.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_kubernetes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HelmCluster&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HelmCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;release_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;my-dask&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/dask-cluster-four-workers.png" alt="Dask Cluster with four workers" width="1002" height="659"&gt;
&lt;p&gt;To scale our cluster, we need to provide our desired number of workers as an argument to the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HelmCluster&lt;/span&gt;&lt;/code&gt;’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scale&lt;/span&gt;&lt;/code&gt; method. By default, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;scale&lt;/span&gt;&lt;/code&gt; method scales our default worker group. You can see in the first example we scaled the default worker group from three to five workers, giving us six workers in total. In the second example, we use the handy &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;worker_group&lt;/span&gt;&lt;/code&gt; keyword argument to scale our GPU worker group from one to two workers, giving us seven workers in total.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# scale the default worker group from 3 to 5 workers&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/dask-cluster-six-workers.png" alt="Dask Cluster with six workers" width="1002" height="802"&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worker_group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;gpu-workers&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# scale the GPU worker group from 1 to 2 workers&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img src="/images/dask-cluster-seven-workers.png" alt="Dask Cluster with seven cluster" width="992" height="845"&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2022/02/17/helm-multiple-worker-groups.md&lt;/span&gt;, line 136)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="example-finding-the-average-new-york-city-taxi-trip-distance-in-april-2020"&gt;
&lt;h1&gt;Example: Finding the average New York City taxi trip distance in April 2020&lt;/h1&gt;
&lt;p&gt;This example will find the average distance traveled by a yellow taxi in New York City in April 2020 using the &lt;a class="reference external" href="https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page"&gt;NY Taxi Dataset&lt;/a&gt;. We’ll compute this distance in two different ways. The first way will employ our default dask workers, and the second way will utilize our GPU worker group. We’ll load the NY Taxi dataset as a data frame in both examples and compute the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mean&lt;/span&gt;&lt;/code&gt; of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;trip_distance&lt;/span&gt;&lt;/code&gt; column. The main difference is that we need to run our GPU-specific computations using our GPU worker group. We can do this by utilizing Dask annotations.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask.dataframe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask&lt;/span&gt;

&lt;span class="n"&gt;link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-04.csv&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;ddf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;assume_missing&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;avg_trip_distance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;trip_distance&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;In January 2021, the average trip distance for yellow taxis was &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;avg_trip_distance&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; miles.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;dask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;annotate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;GPU&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}):&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_cudf&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;cudf&lt;/span&gt;
    &lt;span class="n"&gt;dask_cdf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ddf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map_partitions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cudf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pandas&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;avg_trip_distance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dask_cdf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;trip_distance&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;In January 2021, the average trip distance for yellow taxis was &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;avg_trip_distance&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; miles.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
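Resource annotations aren’t specific to Kubernetes or GPUs, so you can try the routing mechanism above on your laptop. Here is a minimal local sketch (assuming only dask and distributed are installed) using an abstract "GPU" resource label, just like the one our dask-cuda workers advertise via --resources "GPU=1":

```python
import dask
from dask.distributed import Client, LocalCluster

# A single in-process worker that advertises one abstract "GPU" resource.
# No physical GPU is involved; "GPU" is purely a scheduling label here.
cluster = LocalCluster(
    n_workers=1, threads_per_worker=1, processes=False, resources={"GPU": 1}
)
client = Client(cluster)

# Tasks built inside the annotate block carry a {"GPU": 1} requirement and
# can only be scheduled on workers that advertise that resource.
with dask.annotate(resources={"GPU": 1}):
    total = dask.delayed(sum)([1, 2, 3])

print(total.compute())  # 6
client.close()
cluster.close()
```

On the Kubernetes cluster above, the same `dask.annotate` block steers the annotated tasks onto the gpu-workers group, because those are the only workers started with the "GPU=1" resource.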
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2022/02/17/helm-multiple-worker-groups.md&lt;/span&gt;, line 156)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="closing"&gt;
&lt;h1&gt;Closing&lt;/h1&gt;
&lt;p&gt;That’s it! We’ve deployed Dask with Helm, created an additional GPU worker type, and used our workers to run an example calculation using the NY Taxi dataset. We’ve learned several new things:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The Dask Helm Chart lets you create multiple worker groups with different worker types. We saw this when we made two different groups of Dask Workers: CPU and GPU workers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can run specific computations on your workers of choice with annotations. Our example computed the average taxi distance using the RAPIDS libraries &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cudf&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask_cudf&lt;/span&gt;&lt;/code&gt; on our GPU worker group.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HelmCluster&lt;/span&gt;&lt;/code&gt; cluster manager in Dask Kubernetes lets you scale your worker groups quickly from python. We scaled our GPU worker group by conveniently passing the worker group name as a keyword argument in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HelmCluster&lt;/span&gt;&lt;/code&gt; scale method.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2022/02/17/helm-multiple-worker-groups.md&lt;/span&gt;, line 164)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="future-work"&gt;
&lt;h1&gt;Future Work&lt;/h1&gt;
&lt;p&gt;We’re thinking a lot about the concept of worker groups in the Dask community. Until now, most Dask deployments have had homogeneous workers, but as users push Dask further, there is growing demand for heterogeneous clusters with special-purpose workers. So we want to add worker groups throughout Dask.&lt;/p&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2022/02/17/helm-multiple-worker-groups/"/>
    <summary>Deploy Dask on Kubernetes with the Dask Helm Chart, add a GPU worker group, and scale and target different worker types with the HelmCluster cluster manager and Dask annotations.</summary>
    <category term="Helm" label="Helm"/>
    <category term="Kubernetes" label="Kubernetes"/>
    <published>2022-02-17T00:00:00+00:00</published>
  </entry>
</feed>
