<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://blog.dask.org</id>
  <title>Dask Working Notes - Posts tagged Community</title>
  <updated>2026-03-05T15:05:22.836825+00:00</updated>
  <link href="https://blog.dask.org"/>
  <link href="https://blog.dask.org/blog/tag/community/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.io/" version="0.11.12">ABlog</generator>
  <entry>
    <id>https://blog.dask.org/2022/11/21/november-demo-day/</id>
    <title>Dask Demo Day November 2022</title>
    <updated>2022-11-21T00:00:00+00:00</updated>
    <author>
      <name>Richard Pelgrim (Coiled)</name>
    </author>
    <content type="html">&lt;p&gt;Once a month, the Dask Community team hosts Dask Demo Day: an informal and fun online hangout where folks can showcase new or lesser-known Dask features and the rest of us can learn about all the things we didn’t know Dask could do 😁&lt;/p&gt;
&lt;p&gt;November’s Dask Demo Day had five great demos. We learned about:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#visualization-at-lightning-speed"&gt;&lt;span class="xref myst"&gt;Visualizing 2 billion lightning flashes with Dask, RAPIDS and Datashader&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#the-new-dask-cli"&gt;&lt;span class="xref myst"&gt;The new Dask CLI&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#xgboost-hpo-with-dask-and-optuna"&gt;&lt;span class="xref myst"&gt;The Dask-Optuna integration for distributed hyperparameter optimization&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#dask-for-awkward-arrays"&gt;&lt;span class="xref myst"&gt;Dask-Awkward&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference internal" href="#profiling-dask-on-a-cluster-with-py-spy"&gt;&lt;span class="xref myst"&gt;Profiling your Dask code with Dask-PySpy&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This post gives you a quick overview of the five demos and shows how they might be useful to you. You can &lt;a class="reference external" href="https://www.youtube.com/embed/_x7oaSEJDjA"&gt;watch the full recording below&lt;/a&gt;.&lt;/p&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/_x7oaSEJDjA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;&lt;/iframe&gt;
&lt;section id="visualization-at-lightning-speed"&gt;
&lt;h1&gt;Visualization at Lightning Speed&lt;/h1&gt;
&lt;p&gt;Would it be possible to interactively visualize every lightning strike in his dataset? That is the question &lt;a class="reference external" href="https://www.albany.edu/daes/faculty/kevin-tyle"&gt;Kevin Tyle&lt;/a&gt; (University at Albany) recently asked himself. In this demo, Kevin shows you how he leveraged &lt;a class="reference external" href="https://developer.nvidia.com/cuda-zone"&gt;CUDA&lt;/a&gt;, &lt;a class="reference external" href="https://rapids.ai/"&gt;RAPIDS-AI&lt;/a&gt;, &lt;a class="reference external" href="https://www.dask.org/"&gt;Dask&lt;/a&gt; and &lt;a class="reference external" href="https://datashader.org/"&gt;Datashader&lt;/a&gt; to build a smooth interactive visualization of 8 years’ worth of lightning strikes. That’s over 2 billion rows of data.&lt;/p&gt;
&lt;p&gt;Kevin shows you how to fine-tune the performance of such a large-scale data processing workflow by:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Leveraging GPUs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using a Dask cluster to maximize hardware usage&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Making smart choices about file types&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;img alt="Heatmap of lightning strikes in the US" src="/images/2022-11-demo-day/lightning.png" style="max-width: 100%;" width="100%" /&gt;
&lt;p&gt;Watch the &lt;a class="reference external" href="https://youtu.be/_x7oaSEJDjA?t=167"&gt;full demo&lt;/a&gt; or read more about &lt;a class="reference external" href="https://www.coiled.io/blog/datashader-data-visualisation-performance"&gt;high-performance visualization strategies&lt;/a&gt; with Dask and Datashader.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="the-new-dask-cli"&gt;
&lt;h1&gt;The New Dask CLI&lt;/h1&gt;
&lt;p&gt;During the Dask Sprint at &lt;a class="reference external" href="https://conference.scipy.org/"&gt;SciPy&lt;/a&gt; this year, a group of Dask maintainers began work on an upgraded, high-level &lt;a class="reference external" href="https://docs.dask.org/en/stable/cli.html"&gt;Dask CLI&lt;/a&gt;. &lt;a class="reference external" href="https://ddavis.io/about/"&gt;Doug Davis&lt;/a&gt; (Anaconda) walks us through how the CLI works and all the things you can do with it. After installing Dask, you can access the CLI by typing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask&lt;/span&gt;&lt;/code&gt; into your terminal. The tool is designed to be easily extensible by anyone working on Dask. Doug shows you how to add your own components to the Dask CLI.&lt;/p&gt;
&lt;img alt="Screenshot of the new Dask CLI in action" src="/images/2022-11-demo-day/dask-cli.png" style="max-width: 100%;" width="100%" /&gt;
&lt;p&gt;Watch the &lt;a class="reference external" href="https://youtu.be/_x7oaSEJDjA?t=882"&gt;full demo&lt;/a&gt; or read the &lt;a class="reference external" href="https://docs.dask.org/en/stable/cli.html"&gt;Dask CLI documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="xgboost-hpo-with-dask-and-optuna"&gt;
&lt;h1&gt;XGBoost HPO with Dask and Optuna&lt;/h1&gt;
&lt;p&gt;Have you ever wanted to speed up your hyperparameter searches by running them in parallel? &lt;a class="reference external" href="https://www.jamesbourbeau.com/about/"&gt;James Bourbeau&lt;/a&gt; (Coiled) shows you how you can use the brand-new &lt;a class="reference external" href="https://jrbourbeau.github.io/dask-optuna/"&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-optuna&lt;/span&gt;&lt;/code&gt;&lt;/a&gt; integration to run hundreds of hyperparameter searches in parallel on a Dask cluster. Running your Optuna HPO searches on a Dask cluster requires only two changes to your existing Optuna code. After making those changes, we’re able to run 500 HPO iterations in parallel in 25 seconds.&lt;/p&gt;
&lt;img alt="Screenshot of Dask-Optuna running" src="/images/2022-11-demo-day/optuna-dask.png" style="max-width: 100%;" width="100%" /&gt;
&lt;p&gt;Watch the &lt;a class="reference external" href="https://youtu.be/_x7oaSEJDjA?t=1300"&gt;full demo&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="dask-for-awkward-arrays"&gt;
&lt;h1&gt;Dask for Awkward Arrays&lt;/h1&gt;
&lt;p&gt;The PyData ecosystem has historically focused on rectilinear data structures like DataFrames and regular arrays. &lt;a class="reference external" href="https://awkward-array.readthedocs.io/en/stable/"&gt;Awkward Arrays&lt;/a&gt; bring NumPy-like operations to non-rectilinear data structures, and &lt;a class="reference external" href="https://github.com/ContinuumIO/dask-awkward"&gt;dask-awkward&lt;/a&gt; enables you to work with Awkward Arrays in parallel on a distributed cluster. &lt;a class="reference external" href="https://ddavis.io/about/"&gt;Doug Davis&lt;/a&gt; (Anaconda) walks you through a quick demo of how to use &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dask-awkward&lt;/span&gt;&lt;/code&gt; on a local cluster. This is a helpful tool if you find yourself working with nested data structures at scale.&lt;/p&gt;
&lt;img alt="Screenshot of dask-awkward" src="/images/2022-11-demo-day/awkward.png" style="max-width: 100%;" width="100%" /&gt;
&lt;p&gt;Watch the &lt;a class="reference external" href="https://youtu.be/_x7oaSEJDjA?t=2033"&gt;full demo&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="profiling-dask-on-a-cluster-with-py-spy"&gt;
&lt;h1&gt;Profiling Dask on a Cluster with py-spy&lt;/h1&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/benfred/py-spy"&gt;py-spy&lt;/a&gt; is a Python profiler that lets you dig deeper into your code than just your Python functions. &lt;a class="reference external" href="https://github.com/gjoseph92"&gt;Gabe Joseph&lt;/a&gt; (Coiled) shows you how you can use &lt;a class="reference external" href="https://github.com/gjoseph92/dask-pyspy"&gt;dask-pyspy&lt;/a&gt; to profile code on a Dask cluster. By digging down into compiled code, dask-pyspy can surface valuable insights about why your Dask code might be running slowly and what you might be able to do to resolve this.&lt;/p&gt;
&lt;img alt="Screenshot of dask-pyspy in action" src="/images/2022-11-demo-day/pyspy.png" style="max-width: 100%;" width="100%" /&gt;
&lt;p&gt;Watch the &lt;a class="reference external" href="https://youtu.be/_x7oaSEJDjA?t=2758"&gt;full demo&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="join-us-for-the-next-demo-day"&gt;
&lt;h1&gt;Join us for the next Demo Day!&lt;/h1&gt;
&lt;p&gt;Dask Demo Day is a great opportunity to learn about the latest developments and features in Dask. It’s also a fun hangout where you can ask questions and interact with some of Dask’s core maintainers in an informal, casual online setting. We’d love to see you at the next Demo Day on December 15th!&lt;/p&gt;
&lt;p&gt;Curious how you can stay connected and find out about the latest Dask news and events?&lt;/p&gt;
&lt;p&gt;You can:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;follow us on Twitter &lt;a class="reference external" href="https://twitter.com/dask_dev"&gt;&amp;#64;dask_dev&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;subscribe to the Dask newsletter by sending a blank email to newsletter+subscribe&amp;#64;dask.org&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;subscribe to the &lt;a class="reference external" href="https://docs.dask.org/en/latest/support.html"&gt;Dask community calendar&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2022/11/21/november-demo-day/"/>
    <summary>Once a month, the Dask Community team hosts Dask Demo Day: an informal and fun online hangout where folks can showcase new or lesser-known Dask features and the rest of us can learn about all the things we didn’t know Dask could do 😁</summary>
    <category term="Community" label="Community"/>
    <published>2022-11-21T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2020/08/21/running-tutorials/</id>
    <title>Running tutorials</title>
    <updated>2020-08-21T00:00:00+00:00</updated>
    <author>
      <name>Jacob Tomlinson (NVIDIA)</name>
    </author>
    <content type="html">&lt;p&gt;For the last couple of months we’ve been running community tutorials every three weeks or so. The response from the community has been great and we’ve had 50-100 people at each 90 minute session.&lt;/p&gt;
&lt;section id="why-should-open-source-projects-run-tutorials"&gt;
&lt;h1&gt;Why should open source projects run tutorials?&lt;/h1&gt;
&lt;p&gt;The Dask team has historically run tutorials at conferences such as SciPy. With 2020 turning out the way that it has, much of this content is being presented virtually this year. As more people become accustomed to participating in virtual tutorials, we felt it would be a good service to our community to start running regular virtual tutorials independent of the conferences we may be attending or speaking at.&lt;/p&gt;
&lt;p&gt;Tutorials are great for open source projects as they appeal to multiple types of learner.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The tutorial material provides a great foundation for &lt;em&gt;written and visual learners&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using an interactive tool like Jupyter Notebooks allows &lt;em&gt;kinesthetic learners&lt;/em&gt; to follow along and take their own paths.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Having an instructor run through the material in real time provides a spoken source for &lt;em&gt;auditory learners&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It’s also just fun to have a bunch of people from around the world participate in a live event. There is a greater sense of community.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Many open source projects provide documentation, some also make instructional videos on YouTube, but you really can’t beat a tutorial for producing a single set of content that is valuable to many users.&lt;/p&gt;
&lt;p&gt;The more knowledge, information and skills we can share with users, the more they are going to use and engage with the project. Having a great source of learning material is critical for converting interested newcomers to users and users to contributors.&lt;/p&gt;
&lt;p&gt;It is great for the maintainers too. Dask is a large project made up of many open source repositories, all with different functions. Each maintainer tends to participate in their specialist areas, but does not engage with everything on a day-to-day basis. Having maintainers run tutorials encourages them to increase their knowledge of areas they rarely touch in order to deliver the material, and this benefits the project as a whole.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="how"&gt;
&lt;h1&gt;How&lt;/h1&gt;
&lt;p&gt;For the rest of this post we will discuss the preparation and logistics we have undertaken to provide our tutorials. Hopefully this will provide a blueprint for others wanting to run similar activities.&lt;/p&gt;
&lt;section id="writing-the-material"&gt;
&lt;h2&gt;Writing the material&lt;/h2&gt;
&lt;p&gt;When starting to compile material it is important to consider a few questions: “Who is this for?”, “How long should it be?” and “What already exists today?”.&lt;/p&gt;
&lt;p&gt;For the Dask tutorial we were targeting users who were either new to Dask, or had been using it for a while but wanted to learn more about the wider project. Dask is a large project after all and there are many features that you may not discover when trying to solve your specific challenges with it.&lt;/p&gt;
&lt;p&gt;At large conferences it is quite normal to run a three-hour tutorial. However, when trying to schedule a tutorial as part of a person’s normal working day, that is probably too much to ask of them. Folks are accustomed to scheduling work meetings that typically last 30-60 minutes, but that may not be enough to run a tutorial. So we settled on 90 minutes: enough to get through a good amount of content, but not so long that folks will be put off.&lt;/p&gt;
&lt;p&gt;We already have an &lt;a class="reference external" href="https://github.com/dask/dask-tutorial"&gt;“official” tutorial&lt;/a&gt; which is designed to fill the three hours of a SciPy tutorial. This tutorial is also designed as a “Dask from first principles” style tutorial where we explore how Dask works and eventually scale up to how Dask implements familiar APIs like NumPy and pandas. This is great for giving folks a thorough understanding of Dask, but given that we decided on 90 minutes we may not want to start with low-level code, as we may run out of time before getting to general usage.&lt;/p&gt;
&lt;p&gt;While researching what already exists I was pointed to the &lt;a class="reference external" href="https://github.com/adbreind/dask-mini-2019"&gt;Mini Dask 2019 tutorial&lt;/a&gt; which was created for an &lt;a class="reference external" href="https://www.oreilly.com/live-training/courses/scale-your-python-processing-with-dask/0636920319573/"&gt;O’Reilly event&lt;/a&gt;. This tutorial starts with familiar APIs such as dataframes and arrays and eventually digs down into Dask fundamentals. As tutorial content like this is often licensed as open source and made available on GitHub it’s great to be able to build upon the work of others.&lt;/p&gt;
&lt;p&gt;The result of combining the two tutorials was the &lt;a class="reference external" href="https://github.com/jacobtomlinson/dask-video-tutorial-2020"&gt;Dask Video Tutorial 2020&lt;/a&gt;. It follows the same structure as the mini tutorial starting with high level APIs and digging further down. It also includes some new content on deployment and distributed methods.&lt;/p&gt;
&lt;section id="structuring-content"&gt;
&lt;h3&gt;Structuring content&lt;/h3&gt;
&lt;p&gt;To ensure this content targets the different learner types that we discussed earlier we need to ensure our content has a few things.&lt;/p&gt;
&lt;p&gt;As a foundation we should put together a series of pages/documents with a written version of the information we are trying to communicate for &lt;em&gt;written learners&lt;/em&gt;. We should also endeavor to include diagrams and pictures to illustrate this information for &lt;em&gt;visual learners&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;As we are sharing knowledge on an open source software project we should also make things as interactive as possible. Using Jupyter Notebooks as our document format means we can include many code examples that serve as written examples and are also editable and executable, empowering &lt;em&gt;kinesthetic learners&lt;/em&gt; to feel how things work in practice.&lt;/p&gt;
&lt;p&gt;When the content is being delivered the instructor will be running through the content at the same time and narrating what they are doing for &lt;em&gt;auditory learners&lt;/em&gt;. It is important to try and structure things in a way where you explain each section of the content out loud, but without directly reading the text from the screen as that can be off-putting.&lt;/p&gt;
&lt;p&gt;We also want to ensure folks are taking things in, and labs are a great way to include small tests in the content. Leaving a section at the end of an example incomplete means that you can give the audience some time to try to figure things out for themselves. Some folks will be able to fill things in with no problems. Others will hit errors or make mistakes, which is good for teaching how to debug and troubleshoot your project. And those who are having awful flashbacks to pop quizzes can simply skip it without worrying that someone will check up on them.&lt;/p&gt;
&lt;p&gt;For each section of content you want to include in your tutorial I recommend you create a notebook with an explanation, an example and some things for the audience to figure out. If you do this for each section (the Dask tutorial had nine), the audience will quickly become familiar with the process and be able to anticipate what is coming next. This will make them feel comfortable.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="hosting-the-material"&gt;
&lt;h2&gt;Hosting the material&lt;/h2&gt;
&lt;p&gt;Once you have put your material together you need to share it with your attendees.&lt;/p&gt;
&lt;p&gt;GitHub is a great place to put things, especially if you include an open license with it. For narrative tutorial content, a Creative Commons license that requires modifications to also be shared is often used.&lt;/p&gt;
&lt;p&gt;As we have put our content together as Jupyter Notebooks we can use &lt;a class="reference external" href="https://mybinder.org/"&gt;Binder&lt;/a&gt; to make it possible for folks to run the material without having to download it locally or ensure their Python environment is set up correctly.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="choosing-a-video-platform"&gt;
&lt;h2&gt;Choosing a video platform&lt;/h2&gt;
&lt;p&gt;Next we have to decide how we will present the material. As this is a virtual tutorial we will want to use some kind of video conferencing or streaming software.&lt;/p&gt;
&lt;p&gt;These tools tend to fall into two categories: private meetings with a tool like Zoom, Hangouts or Teams, and public broadcasts on websites like YouTube or Twitch.&lt;/p&gt;
&lt;p&gt;Any of these options will likely be a good choice. They allow the presenter to share their video, audio and screen with participants, and participants can communicate back with a range of tools.&lt;/p&gt;
&lt;p&gt;The main decision you will have to make is whether you want to restrict numbers. The more interactivity you want to have in the tutorial, the more you will need a handle on numbers. For our initial tutorials we wanted to enable participants to ask questions at any time and get a quick response, so we opted to use Zoom and limit our numbers so that we would not get overwhelmed with questions. However, if you want to present to as many people as possible and accept that you may not be able to address them all individually, you may want to use a streaming platform instead.&lt;/p&gt;
&lt;p&gt;It is also possible to do both at the same time. Zoom can stream directly to YouTube for example. This can be useful if you want to open things to as many folks as possible, but also limit the interactivity to a select group (probably on a first-come-first-served basis). For the Dask tutorials we decided to not livestream and instead run multiple tutorials so that everyone gets an interactive experience, but we are fortunate to have the resources to do that.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="registering-attendees"&gt;
&lt;h2&gt;Registering attendees&lt;/h2&gt;
&lt;p&gt;There are a couple of reasons why you may wish to register attendees ahead of time.&lt;/p&gt;
&lt;p&gt;If you want to limit numbers you will certainly need some way to register people and put a cap on that number. But even if you are streaming generally you may want to get folks to register ahead of time as that allows you to send them reminder emails in the run up to the event, which likely will add more certainty to the attendance numbers.&lt;/p&gt;
&lt;p&gt;As our event was private we registered folks with &lt;a class="reference external" href="https://www.eventbrite.com/"&gt;Eventbrite&lt;/a&gt;. This allowed us to cap numbers and schedule automated emails that both act as reminders and share the details of the private Zoom meeting.&lt;/p&gt;
&lt;p&gt;When running the Dask tutorials we found that about 50% of the folks who registered actually turned up, so we accounted for this and set our limit to around double the number we wanted.&lt;/p&gt;
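&lt;p&gt;The doubling rule is simple arithmetic, and a minimal sketch makes it easy to redo for other turnout rates. The function name and the target of 75 attendees are hypothetical, chosen only for illustration; the 50% turnout is the figure we observed.&lt;/p&gt;
&lt;pre&gt;
```python
import math


def registration_cap(target_attendance, show_rate):
    """Tickets to release so that roughly `target_attendance` people turn up."""
    if not (1 >= show_rate > 0):
        raise ValueError("show_rate must be greater than 0 and at most 1")
    # Dividing by the turnout rate scales the target up; ceil avoids under-releasing.
    return math.ceil(target_attendance / show_rate)


# Hypothetical target of 75 people in the session, with roughly 50% turnout.
cap = registration_cap(target_attendance=75, show_rate=0.5)
print(cap)
```
&lt;/pre&gt;
&lt;p&gt;With those assumed numbers the cap comes out at 150 tickets. If your turnout differs, change the rate rather than the rule.&lt;/p&gt;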
&lt;p&gt;Here’s an example of the event details that we created:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;&lt;strong&gt;Event Title&lt;/strong&gt;: Dask Tutorial&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Organizer&lt;/strong&gt;: Presenter’s name&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Event Type&lt;/strong&gt;: Seminar or talk, Science and Technology, Online event&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tags&lt;/strong&gt;: dask, pydata, python, tutorial&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Location&lt;/strong&gt;: Online event&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date and time&lt;/strong&gt;: Single Event, add times&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;Come learn about Dask at this online free tutorial provided by the Dask maintainers.&lt;/p&gt;
&lt;p&gt;This ninety minute course will mix overview discussion and demonstration by a leader in the Dask community, as well as interactive exercises in live notebook sessions for attendees. The computing environment will be provided.&lt;/p&gt;
&lt;p&gt;If you want to get a sample of similar content, take a look at https://tutorial.dask.org (although this tutorial will cover different material appropriate for this shorter session).&lt;/p&gt;
&lt;p&gt;We look forward to seeing you there!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Image&lt;/strong&gt;: https://i.imgur.com/2i1tMNG.png&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Live video content&lt;/strong&gt;: NA&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Text and media&lt;/strong&gt;: NA&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Links to resources&lt;/strong&gt;:
Tutorial Content (Online Jupyter Notebooks)
https://github.com/jacobtomlinson/dask-video-tutorial-2020&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ticket Cost&lt;/strong&gt;: Free&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ticket Attendee limit&lt;/strong&gt;: 150 people&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="count-down-to-the-tutorial"&gt;
&lt;h2&gt;Count down to the tutorial&lt;/h2&gt;
&lt;p&gt;We also set up a series of automated emails. You can find this under &lt;strong&gt;Manage Attendees &amp;gt; Emails to Attendees&lt;/strong&gt; in the event management page.&lt;/p&gt;
&lt;p&gt;We scheduled emails for two days before, two hours before and 10 minutes before to let folks know where to go and another a few hours after to gather feedback. &lt;em&gt;We will discuss the feedback email shortly&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;You’ll need to ensure you have links to the materials and meeting location ready for this. In our case we pushed the content to GitHub and scheduled the Zoom call ahead of time.&lt;/p&gt;
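&lt;p&gt;The send times follow directly from the start time of the session. Here is a minimal sketch, assuming a 90-minute session and reading “a few hours after” as three hours after the session ends. The function name, the start time and the three-hour delay are all hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
from datetime import datetime, timedelta


def reminder_schedule(start, duration=timedelta(minutes=90), feedback_delay=timedelta(hours=3)):
    """Map each email label to the time it should be sent."""
    end = start + duration
    return {
        "two days before": start - timedelta(days=2),
        "two hours before": start - timedelta(hours=2),
        "ten minutes before": start - timedelta(minutes=10),
        # Assumption: the feedback email is timed from the end of the session.
        "feedback request": end + feedback_delay,
    }


# Hypothetical start time, for illustration only.
schedule = reminder_schedule(datetime(2020, 8, 21, 16, 0))
for label, send_at in schedule.items():
    print(f"{send_at:%Y-%m-%d %H:%M}  {label}")
```
&lt;/pre&gt;
&lt;p&gt;Working the times out once like this makes it easier to check them against what you enter in the event management page.&lt;/p&gt;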
&lt;p&gt;&lt;strong&gt;Two days and two hours before&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Hi Everyone!&lt;/p&gt;
&lt;p&gt;We look forward to seeing you &amp;lt;tomorrow|soon&amp;gt;. We wanted to share some important links with you to help you connect to the meeting.&lt;/p&gt;
&lt;p&gt;The materials for the course are available on GitHub at the link below:&lt;/p&gt;
&lt;p&gt;&amp;lt;Link to materials&amp;gt;&lt;/p&gt;
&lt;p&gt;This repository contains Jupyter notebooks that we’ll go through together as a group. You do not need to install anything before the tutorial. We will run the notebooks on the online service mybinder.org. All you need is a web connection.&lt;/p&gt;
&lt;p&gt;The meeting itself will be held by video call at the following Zoom link:&lt;/p&gt;
&lt;p&gt;&amp;lt;Zoom link and pin&amp;gt;&lt;/p&gt;
&lt;p&gt;We look forward to seeing you soon!&lt;/p&gt;
&lt;p&gt;&amp;lt;Organisers names&amp;gt;&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ten minutes before&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Hi Everyone!&lt;/p&gt;
&lt;p&gt;We are about to get started. Here’s a final reminder of the meeting details.&lt;/p&gt;
&lt;p&gt;&amp;lt;Zoom link and pin&amp;gt;&lt;/p&gt;
&lt;p&gt;See you in a minute!&lt;/p&gt;
&lt;p&gt;&amp;lt;Organisers names&amp;gt;&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Few hours after&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Hi Everyone!&lt;/p&gt;
&lt;p&gt;Thank you so much for attending the Dask tutorial. We really hope you found it valuable.&lt;/p&gt;
&lt;p&gt;We would really appreciate it if you could answer a couple of quick feedback questions to help us improve things for next time.&lt;/p&gt;
&lt;p&gt;&amp;lt;Google form link&amp;gt;&lt;/p&gt;
&lt;p&gt;Also we want to remind you that the tutorial materials are always available on GitHub and you can run through them any time or share them with others.&lt;/p&gt;
&lt;p&gt;&amp;lt;Link to materials&amp;gt;&lt;/p&gt;
&lt;p&gt;Thanks,&lt;/p&gt;
&lt;p&gt;&amp;lt;Organisers names&amp;gt;&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="getting-the-word-out"&gt;
&lt;h2&gt;Getting the word out&lt;/h2&gt;
&lt;p&gt;Now that we have an Eventbrite page we need to tell people about it.&lt;/p&gt;
&lt;p&gt;You may already have existing channels where you can contact your community. For Dask we have an active Twitter account with a good number of followers, so tweeting out the link to the event a couple of times in the week running up to the tutorial was enough to fill the spaces.&lt;/p&gt;
&lt;p&gt;If you have a mailing list or any other platform, you will probably want to share it there too.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="setting-up-the-call"&gt;
&lt;h2&gt;Setting up the call&lt;/h2&gt;
&lt;p&gt;Be sure to join the call ahead of the attendees. I would make sure this is at least before the final reminder email goes out. Personally I join 20 minutes or so beforehand. This allows you to ensure the call is being recorded and that attendees are muted when they join.&lt;/p&gt;
&lt;p&gt;Consider the experience of the users here. They will have signed up for an event online, received a few emails with Zoom call details and then they will join the call. If there is no indication that they are in the right place within a few seconds they may become anxious.&lt;/p&gt;
&lt;p&gt;To combat this I tend to show some graphic which lets people know they are in the right place. You could either use a tool like &lt;a class="reference external" href="https://jacobtomlinson.dev/posts/2020/how-to-use-obs-studio-with-zoom-hangouts-teams-and-more-on-macos/"&gt;OBS with Zoom&lt;/a&gt; to create a custom scene or just share your screen with a simple slide saying something like “The Dask tutorial will start soon”.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The only downside to sharing your screen is you can’t continue to use your computer in the run up to the tutorial.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;When we ran our first few tutorials we were also running our Dask user survey, so we also included a link to that on the waiting screen to give folks something to do.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="greeting-and-getting-folks-set-up"&gt;
&lt;h2&gt;Greeting and getting folks set up&lt;/h2&gt;
&lt;p&gt;Say hi on the hour and welcome everyone to the tutorial. But as the event is virtual folks will be late, so don’t kick off until around five minutes in, otherwise you’ll just get a flood of questions asking what’s going on.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="interactivity"&gt;
&lt;h2&gt;Interactivity&lt;/h2&gt;
&lt;p&gt;A fun thing to do during this waiting period is to get everyone to introduce themselves in the chat. Say something like “Please say hi in the chat and give your name and where you are joining from”.&lt;/p&gt;
&lt;p&gt;This is nice feedback for you as the instructor to see where folks are joining from, but it also gives the attendees a sense of being in a room full of people. One of the benefits of an event like this is that it is interactive, so be sure to say hi back to people.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;I’m awful at pronouncing names correctly so I tend to list the places they said they are from instead. It still makes them feel like their message has been seen.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Once you’re ready to start, introduce yourself and give a general overview of the tutorial content. Then make use of any interaction tools you may have in your chat application. In Zoom there are buttons that participants can click with labels like “go faster”, “go slower”, “yes” and “no”. These are great for getting feedback from the audience when running the tutorial, but it’s good to make sure everyone knows where they are and has a go at using them. I tend to explain where the buttons are and then ask questions like “have you managed to launch the binder?”, “have you used Dask before?” or “are you a Pandas user?”. You learn a little about your audience and they get familiar with the controls.&lt;/p&gt;
&lt;p&gt;Being interactive means you can also respond to user questions. In Dask tutorials we mute everyone by default and encourage folks to type in the text chat. We also have an additional instructor who is not delivering the material who is able to watch the chat and answer questions in real time. If they feel like a question/answer would be beneficial to the whole group they can unmute and interrupt the presenter in order to bubble it up. Be prepared for a wide range of questions from the chat, including topics that are not being actively covered in the tutorial. This is often the only time that attendees have real-time access to core maintainers.&lt;/p&gt;
&lt;p&gt;You may not have the resources to have two instructors for every tutorial (Dask is fortunate to have a strong maintainer team), so instead you may want to allocate breaks at the end of each section to answer questions. The labs can also be a good time to go back and review any questions.&lt;/p&gt;
&lt;p&gt;Interactivity is one of the big benefits a live tutorial has over a video.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="run-through-the-material"&gt;
&lt;h2&gt;Run through the material&lt;/h2&gt;
&lt;p&gt;Once you’re all set up and everyone is in, it’s time to run through the material. Given the amount of preparation we did beforehand to construct the material, this is relatively straightforward. Everything is laid out in front of us and we just need to go through the motions of talking through it.&lt;/p&gt;
&lt;p&gt;I find it very helpful to have a list of the sections with timings written down that I can refer to in order to pace things.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Overview of Dask with Dask Dataframe (10 mins)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Introductory Lab (10 mins) and results (5 mins)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask GUI and dashboards (10 mins)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask Array (10 mins)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask ML with lab (10 mins) and results (5 mins)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bags and Futures (10 mins)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Distributed (10 mins)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wrapup and close (5 mins)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;As we have another instructor answering questions I tend to ignore the chat and run through each section as slowly as I can without going over time. Personally my default is to go too fast, so forcing myself to be slow but having some timings to keep me on track seems to work well. But you should do whatever works for you.&lt;/p&gt;
&lt;p&gt;During the labs I tend to mute my microphone and join in with answering questions on the chat.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="wrapping-things-up"&gt;
&lt;h2&gt;Wrapping things up&lt;/h2&gt;
&lt;p&gt;When you’re nearing the end it’s good to have some time for any final questions. People may want to ask things that they didn’t get a chance to earlier or have questions which haven’t fit in with any particular area.&lt;/p&gt;
&lt;p&gt;If you get complex questions or want to go into depth you may want to offer to stay after and continue talking, but your attendees will appreciate you finishing at the scheduled time as they may have other things booked immediately after.&lt;/p&gt;
&lt;p&gt;It’s always good to leave folks with some extra resources, whether that is links to the documentation, community places they can learn more like a Gitter chat, etc.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="sharing-the-content-later"&gt;
&lt;h2&gt;Sharing the content later&lt;/h2&gt;
&lt;p&gt;Once you’re done it is also beneficial to upload a recording of the tutorial to YouTube. If you’ve livestreamed then this may happen automatically. If you used a tool like Zoom you’ll need to upload it yourself.&lt;/p&gt;
&lt;p&gt;Anyone watching in the future won’t get the benefit of the interactivity, but should still be able to get much of the benefit from following through the material.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="gathering-feedback-and-planning-for-next-time"&gt;
&lt;h2&gt;Gathering feedback and planning for next time&lt;/h2&gt;
&lt;p&gt;The last thing for you to do is plan for next time. The Dask team have decided to run tutorials every month or so but rotate around timezones to try and cover as many users as possible. We’ve also discussed having special deep dive tutorials which follow the same length and format but dive into one topic in particular.&lt;/p&gt;
&lt;p&gt;To help you plan for future events you will likely want feedback from your participants. You can use tools like Google Forms to create a short questionnaire which you can send out to participants afterwards. In our experience about 20% of participants will fill in a survey that is 10 questions long.&lt;/p&gt;
&lt;p&gt;This feedback can be very helpful for making changes to the content or format. For example, in our first tutorial we used OBS for both the intro screen and screen sharing throughout. However, Zoom limits webcams to 720p and adds heavy compression, so the quality for participants was not good and 50% of the surveys mentioned poor video. In later tutorials we only used OBS for the intro screen and then used the built-in screen sharing utility in Zoom, which provided a better experience, and no user reported any audio/video issues in the survey.&lt;/p&gt;
&lt;p&gt;Here are some examples of questions we asked and how they were answered for our tutorial.&lt;/p&gt;
&lt;section id="have-you-used-dask-before"&gt;
&lt;h3&gt;Have you used Dask before?&lt;/h3&gt;
&lt;p&gt;When writing our material we said we were “targeting users who were either new to Dask, or had been using it for a while but wanted to learn more about the wider project”. Our feedback results confirm that we are hitting these groups.&lt;/p&gt;
&lt;p&gt;We could’ve been more specific and asked folks to rank their ability. But the more complex the questions, the less likely folks are to fill them out, so it’s a balancing act.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Forms response chart. Question title: Have you used Dask before? 39% no, 61% yes." src="https://i.imgur.com/T1loyeb.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="did-we-cover-all-the-topics-you-were-expecting-and-if-not-what-was-missing"&gt;
&lt;h3&gt;Did we cover all the topics you were expecting? And if not, what was missing?&lt;/h3&gt;
&lt;p&gt;Depending on the complexity of your project you may have to make compromises on what you can cover in the time you have. Dask is a large project and so we couldn’t cover everything, so we wanted to check we had covered the basics.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Forms response chart. Question title: Did we cover all the topics you expected? 22% no, 78% yes." src="https://i.imgur.com/la3dqrA.png" /&gt;&lt;/p&gt;
&lt;p&gt;Most of the feedback we had from folks who answered no asked about advanced topics like Kubernetes, Google Cloud deployments, deep dives into internal workings, etc. I’m satisfied that these shouldn’t have been in this tutorial, but it adds weight to our plans to run deep dives in the future.&lt;/p&gt;
&lt;p&gt;One useful bit of feedback we had here was “When should I use Dask and when should I stick with Pandas?”. This is something which definitely should be covered by an intro tutorial, so our material is clearly lacking here. As a result we can go back and make modifications and improve the content.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="how-was-the-pace"&gt;
&lt;h3&gt;How was the pace?&lt;/h3&gt;
&lt;p&gt;Setting the pace is hard. If you’re targeting a range of abilities then it’s easy to go too fast or slow for a big chunk of the attendees.&lt;/p&gt;
&lt;p&gt;Our feedback shows that folks were generally happy, but we are erring on the side of being too fast. Given that we are filling our allocated time this probably indicates that we should cut a little content in order to slow things down.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Forms response chart. Question title: How was the pace? 70% Just right, 26% Too fast, 4% Too slow." src="https://i.imgur.com/mHPNmwp.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="which-sections-did-you-find-more-informative"&gt;
&lt;h3&gt;Which sections did you find more informative?&lt;/h3&gt;
&lt;p&gt;By asking what sections were most informative we can identify things to cut in future if we do need to slow things down. It also shows areas where we may want to spend more time and add more content.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Forms response chart. Question title: Which sections did you find more informative?" src="https://i.imgur.com/XLzSEw4.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-would-be-your-preferred-platform-for-a-tutorial-like-this"&gt;
&lt;h3&gt;What would be your preferred platform for a tutorial like this?&lt;/h3&gt;
&lt;p&gt;We had to make a decision on which video platform to use based on the criteria we discussed earlier. For our tutorials we chose Zoom. By doing a user survey we were able to check that this worked for people and also see if there is an alternative that folks prefer.&lt;/p&gt;
&lt;p&gt;Our results confirmed that folks were happy with Zoom. These results may be a little biased given that we used Zoom, but I’m confident that we can keep using it and folks will have a good experience.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Forms response chart. Question title: What would be your preferred platform for a tutorial like this? 70% Zoom, &amp;lt;5% for options including YouTube, Twitch, Jitsi, and No preference" src="https://i.imgur.com/fMxTZOK.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="would-you-recommend-the-tutorial-to-a-colleague"&gt;
&lt;h3&gt;Would you recommend the tutorial to a colleague?&lt;/h3&gt;
&lt;p&gt;The last thing to check is that folks had a good time. It gives you great pleasure as an instructor to see 100% of folks say they would recommend it to a colleague.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;These results may be biased because if folks wouldn’t recommend it they probably wouldn’t bother to fill out a survey. But hey, I’ll take it!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Forms response chart. Question title: Would you recommend the tutorial to a colleague? 100% Yes." src="https://i.imgur.com/RzrXvfn.png" /&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="wrap-up"&gt;
&lt;h1&gt;Wrap up&lt;/h1&gt;
&lt;p&gt;In this post we have covered why and how you can run community tutorials for open source projects.&lt;/p&gt;
&lt;p&gt;In summary you should run tutorials because:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;You can share knowledge with a range of people with different learning styles&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can give back to your community&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can grow your community&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can improve maintainers’ knowledge of the whole project&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And you can run a tutorial by following these steps:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Break your project into sections&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write up interactive documents on each section with tools like Jupyter notebooks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Give people access to this content with services like Binder&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Manage attendees with services like Eventbrite&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Advertise your tutorial on social media&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get everyone in a video meeting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make use of the interactive tools&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deliver your material&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gather feedback&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2020/08/21/running-tutorials/"/>
    <summary>For the last couple of months we’ve been running community tutorials every three weeks or so. The response from the community has been great and we’ve had 50-100 people at each 90 minute session.</summary>
    <category term="Community" label="Community"/>
    <category term="Tutorials" label="Tutorials"/>
    <published>2020-08-21T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://blog.dask.org/2020/07/17/scipy-2020-maintainers-track/</id>
    <title>Last Year in Review</title>
    <updated>2020-07-17T00:00:00+00:00</updated>
    <author>
      <name>Jacob Tomlinson (NVIDIA)</name>
    </author>
    <content type="html">&lt;p&gt;We recently enjoyed the 2020 SciPy conference from the comfort of our own homes. The 19th annual Scientific Computing with Python conference was held virtually this year due to the global pandemic. The annual SciPy Conference brought together over 1500 participants from industry, academia, and government to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.&lt;/p&gt;
&lt;p&gt;As part of the maintainers track we presented an update on Dask.&lt;/p&gt;
&lt;section id="video"&gt;
&lt;h1&gt;Video&lt;/h1&gt;
&lt;p&gt;You can find the video on the SciPy YouTube channel. The Dask update runs from 0:00-19:30.&lt;/p&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/XC0M76CmzHg" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;&lt;/iframe&gt;
&lt;/section&gt;
&lt;section id="slides"&gt;
&lt;h1&gt;Slides&lt;/h1&gt;
&lt;script async class="speakerdeck-embed" data-id="ae0f04df5b7341eaa3e2989221be1889" data-ratio="1.77777777777778" src="//speakerdeck.com/assets/embed.js"&gt;&lt;/script&gt;
&lt;/section&gt;
&lt;section id="talk-summary"&gt;
&lt;h1&gt;Talk Summary&lt;/h1&gt;
&lt;p&gt;Here’s a summary of the main topics covered in the talk. You can also check out the &lt;a class="reference external" href="https://threadreaderapp.com/thread/1280885850914553856.html"&gt;original thread on Twitter&lt;/a&gt;.&lt;/p&gt;
&lt;section id="community-overview"&gt;
&lt;h2&gt;Community overview&lt;/h2&gt;
&lt;p&gt;We’ve been trying to gauge the size of our community lately. The best proxy we have right now is the number of weekly visitors to the &lt;a class="reference external" href="https://docs.dask.org/en/latest/"&gt;Dask documentation&lt;/a&gt;, which currently stands at around 10,000.&lt;/p&gt;
&lt;img alt="Dask documentation analytics showing growth to 10,000 weekly users over the last four years" src="https://pbs.twimg.com/media/EcaS9DpWkAEBaB4.jpg" style="width: 100%;" /&gt;
&lt;p&gt;Dask also came up in the &lt;a class="reference external" href="https://www.jetbrains.com/lp/devecosystem-2020/python/"&gt;JetBrains Python developer survey&lt;/a&gt;. We were excited to see that 5% of all the Python developers who filled out the survey said they use Dask, which shows health in the PyData community as well as Dask.&lt;/p&gt;
&lt;img alt="JetBrains survey results showing Dask used by 5% of Python users, beaten only by the Spark/Hadoop ecosystem" src="https://pbs.twimg.com/media/EcaTTuiX0AIT2KB.jpg" style="width: 100%;" /&gt;
&lt;p&gt;We are running &lt;a class="reference external" href="https://dask.org/survey"&gt;our own survey&lt;/a&gt; at the moment. If you are a Dask user please take a few minutes to fill it out. We would really appreciate it.&lt;/p&gt;
&lt;img alt="Link to the Dask survey" src="https://pbs.twimg.com/media/EcaTlITXYAAVs-y.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="community-events"&gt;
&lt;h2&gt;Community events&lt;/h2&gt;
&lt;p&gt;In February we had an in-person &lt;a class="reference external" href="https://blog.dask.org/2020/04/28/dask-summit"&gt;Dask Summit&lt;/a&gt; where a mixture of OSS maintainers and institutional users met. We had talks and workshops to help figure out our challenges and set our direction.&lt;/p&gt;
&lt;img alt="A room of attendees at the Dask summit" src="https://pbs.twimg.com/media/EcaUbHLXQAAHckq.jpg" style="width: 100%;" /&gt;
&lt;p&gt;The Dask community also has a &lt;a class="reference external" href="https://docs.dask.org/en/latest/support.html"&gt;monthly meeting&lt;/a&gt;! It is held on the first Thursday of the month at 10:00 US Central Time. If you’re a Dask user you are welcome to come to hear updates from maintainers and share what you’re working on.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="community-projects"&gt;
&lt;h2&gt;Community projects&lt;/h2&gt;
&lt;p&gt;There are many projects built on Dask. The preliminary results from the 2020 Dask survey show some that are especially popular.&lt;/p&gt;
&lt;img alt="Graph showing the most popular projects built on Dask; Xarray, RAPIDS, XGBoost, Prefect and Iris" src="https://pbs.twimg.com/media/EcaVSHpX0AAMDYs.png" style="width: 100%;" /&gt;
&lt;p&gt;Let’s take a look at each of those.&lt;/p&gt;
&lt;section id="xarray"&gt;
&lt;h3&gt;Xarray&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://xarray.pydata.org/en/stable/"&gt;Xarray&lt;/a&gt; allows you to work on multi-dimensional datasets that have supporting metadata arrays in a Pandas-like way.&lt;/p&gt;
&lt;img alt="Slide showing xarray code example" src="https://pbs.twimg.com/media/EcaVbOaXkAMQ4SU.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="rapids"&gt;
&lt;h3&gt;RAPIDS&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://rapids.ai/"&gt;RAPIDS&lt;/a&gt; is an open-source suite of GPU accelerated Python libraries. Using these tools you can execute end-to-end data science and analytics pipelines entirely on GPUs, all using familiar PyData APIs.&lt;/p&gt;
&lt;img alt="Slide showing RAPIDS dataframe code example" src="https://pbs.twimg.com/media/EcaWFfDXkAEX4B_.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="blazingsql"&gt;
&lt;h3&gt;BlazingSQL&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blazingsql.com"&gt;BlazingSQL&lt;/a&gt; builds on RAPIDS and Dask to provide an open-source distributed, GPU accelerated SQL engine.&lt;/p&gt;
&lt;img alt="Slide showing BlazingSQL code example" src="https://pbs.twimg.com/media/EcaWW_CXsAM7XP7.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="xgboost"&gt;
&lt;h3&gt;XGBoost&lt;/h3&gt;
&lt;p&gt;While &lt;a class="reference external" href="https://examples.dask.org/machine-learning/xgboost.html"&gt;XGBoost&lt;/a&gt; has been around for a long time, you can now prepare your data on your Dask cluster, bootstrap your XGBoost cluster on top of Dask, and hand the distributed dataframes straight over.&lt;/p&gt;
&lt;img alt="Slide showing XGBoost code example" src="https://pbs.twimg.com/media/EcaXKlRWsAAjLYe.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="prefect"&gt;
&lt;h3&gt;Prefect&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://www.prefect.io/"&gt;Prefect&lt;/a&gt; is a workflow manager which is built on top of Dask’s scheduling engine. “Users organize Tasks into Flows, and Prefect takes care of the rest.”&lt;/p&gt;
&lt;img alt="Slide showing Prefect code example" src="https://pbs.twimg.com/media/EcaXlf-XYAEPY-Z.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="iris"&gt;
&lt;h3&gt;Iris&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://scitools.org.uk/iris/docs/latest/"&gt;Iris&lt;/a&gt;, part of the &lt;a class="reference external" href="https://scitools.org.uk"&gt;SciTools&lt;/a&gt; suite of tools, uses the CF data model giving you a format-agnostic interface for working with your data. It excels when working with multi-dimensional Earth Science data, where tabular representations become unwieldy and inefficient.&lt;/p&gt;
&lt;img alt="Slide showing Iris code example" src="https://pbs.twimg.com/media/EcaX3S9XsAAU-Sm.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="more-tools"&gt;
&lt;h3&gt;More tools&lt;/h3&gt;
&lt;p&gt;These are the tools our community have told us they like so far. But if you use something which didn’t make the list then head to &lt;a class="reference external" href="https://dask.org/survey"&gt;our survey&lt;/a&gt; and let us know! According to PyPI there are many more out there.&lt;/p&gt;
&lt;img alt="Screenshot of PyPI showing 239 packages with Dask in their name" src="https://pbs.twimg.com/media/EcaYZmPWoAANYhr.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="user-groups"&gt;
&lt;h2&gt;User groups&lt;/h2&gt;
&lt;p&gt;Many user groups use Dask, spanning everything from life sciences, geophysical sciences and beamline facilities to finance, retail and logistics. Check out the great &lt;a class="reference external" href="https://youtu.be/t_GRK4L-bnw"&gt;“Who uses Dask?” talk&lt;/a&gt; from &lt;a class="reference external" href="https://twitter.com/mrocklin"&gt;Matthew Rocklin&lt;/a&gt; for more info.&lt;/p&gt;
&lt;img alt="Screenshot of the 'Who uses Dask?' YouTube video" src="https://pbs.twimg.com/media/EcaYj2JXQAEvgV3.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="for-profit-companies"&gt;
&lt;h2&gt;For-profit companies&lt;/h2&gt;
&lt;p&gt;There has been an increase in for-profit companies building tools with Dask, including &lt;a class="reference external" href="https://coiled.io/"&gt;Coiled Computing&lt;/a&gt;, &lt;a class="reference external" href="https://www.prefect.io/"&gt;Prefect&lt;/a&gt; and &lt;a class="reference external" href="https://www.saturncloud.io/s/"&gt;Saturn Cloud&lt;/a&gt;.&lt;/p&gt;
&lt;img alt="Slide describing the for-profit companies Coiled, Prefect and Saturn Cloud" src="https://pbs.twimg.com/media/EcaZOqgX0AABFpQ.jpg" style="width: 100%;" /&gt;
&lt;p&gt;We’ve also seen contributions from large companies, such as Microsoft’s &lt;a class="reference external" href="https://azure.microsoft.com/en-gb/services/machine-learning/"&gt;Azure ML&lt;/a&gt; team adding a cluster manager to &lt;a class="reference external" href="https://cloudprovider.dask.org/en/latest/#azure"&gt;Dask Cloudprovider&lt;/a&gt;. This helps folks get up and running with Dask on Azure ML more quickly and easily.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="recent-improvements"&gt;
&lt;h2&gt;Recent improvements&lt;/h2&gt;
&lt;section id="communications"&gt;
&lt;h3&gt;Communications&lt;/h3&gt;
&lt;p&gt;Moving on to recent improvements, there has been a lot of work to get &lt;a class="reference external" href="https://www.openucx.org/"&gt;Open UCX&lt;/a&gt; supported as a protocol in Dask. This allows worker-worker communication to be accelerated vastly on hardware that supports &lt;a class="reference external" href="https://en.wikipedia.org/wiki/InfiniBand"&gt;InfiniBand&lt;/a&gt; or &lt;a class="reference external" href="https://en.wikipedia.org/wiki/NVLink"&gt;NVLink&lt;/a&gt;.&lt;/p&gt;
&lt;img alt="Slide showing worker communication comparison between UCX/Infiniband and TCP with UCX being much faster" src="https://pbs.twimg.com/media/EcaaTxiXQAE4TD0.jpg" style="width: 100%;" /&gt;
&lt;p&gt;There have also been some &lt;a class="reference external" href="https://blogs.nvidia.com/blog/2020/06/22/big-data-analytics-tpcx-bb/"&gt;recent announcements&lt;/a&gt; around NVIDIA blowing away the TPCx-BB benchmark by outperforming the current leader by 20x. This is a huge success for all the open-source projects that were involved, including Dask.&lt;/p&gt;
&lt;img alt="Slide showing TPCx-BB benchmark results" src="https://pbs.twimg.com/media/EcabNUVWoAQGy8e.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="dask-gateway"&gt;
&lt;h3&gt;Dask Gateway&lt;/h3&gt;
&lt;p&gt;We’ve seen increased adoption of &lt;a class="reference external" href="https://gateway.dask.org"&gt;Dask Gateway&lt;/a&gt;. Many institutions are using it as a way to provide their staff with on-demand Dask clusters.&lt;/p&gt;
&lt;img alt="Slide showing Dask Gateway overview" src="https://pbs.twimg.com/media/EcabpirWkAYtx-W.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="cluster-map-plot-aka-pew-pew-pew"&gt;
&lt;h3&gt;Cluster map plot (aka ‘pew pew pew’)&lt;/h3&gt;
&lt;p&gt;The update that got the most 👏 feedback from the SciPy 2020 attendees was the Cluster Map Plot (known to maintainers as the “pew pew pew” plot). This plot shows a high-level overview of your Dask cluster scheduler and workers and the communication between them.&lt;/p&gt;
&lt;p&gt;&lt;video autoplay="" loop="" controls="" poster="https://pbs.twimg.com/tweet_video_thumb/EcacHRcXkAE53eI.jpg"&gt;&lt;source src="https://video.twimg.com/tweet_video/EcacHRcXkAE53eI.mp4" type="video/mp4"&gt;&lt;img alt="" src="https://pbs.twimg.com/tweet_video_thumb/EcacHRcXkAE53eI.jpg"&gt;&lt;/video&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="next-steps"&gt;
&lt;h2&gt;Next steps&lt;/h2&gt;
&lt;section id="high-level-graph-optimization"&gt;
&lt;h3&gt;High-level graph optimization&lt;/h3&gt;
&lt;p&gt;To wrap up, here is what Dask is going to be doing next. We are going to continue working on high-level graph optimization.&lt;/p&gt;
&lt;img alt="Slide showing High Level Graph documentation page" src="https://pbs.twimg.com/media/EcacZvfWsAIfqTz.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;section id="scheduler-performance"&gt;
&lt;h3&gt;Scheduler performance&lt;/h3&gt;
&lt;p&gt;With feedback from our community we are also going to be focussing on making the &lt;a class="reference external" href="https://github.com/dask/distributed/issues/3783"&gt;Dask scheduler more performant&lt;/a&gt;. There are a few things happening including a Rust implementation of the scheduler, dynamic task creation and ongoing benchmarking.&lt;/p&gt;
&lt;img alt="Scheduler performance tasks including a Rust implementation, benchmarking, dynamic tasks and Cython, PyPy and C experiments" src="https://pbs.twimg.com/media/Ecacr6pWoAEd4Tx.jpg" style="width: 100%;" /&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="chan-zuckerberg-foundation-maintainer-post"&gt;
&lt;h2&gt;Chan Zuckerberg Initiative maintainer post&lt;/h2&gt;
&lt;p&gt;Lastly, I’m excited to share that with funding from the &lt;a class="reference external" href="https://chanzuckerberg.com/eoss/proposals/scaling-python-with-dask/"&gt;Chan Zuckerberg Initiative&lt;/a&gt;, Dask will be hiring a maintainer who will focus on growing usage in the biological sciences field. If that is of interest to you, keep an eye on &lt;a class="reference external" href="https://twitter.com/dask_dev"&gt;our Twitter account&lt;/a&gt; for more announcements.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.dask.org/2020/07/17/scipy-2020-maintainers-track/"/>
    <summary>We recently enjoyed the 2020 SciPy conference from the comfort of our own homes. The 19th annual Scientific Computing with Python conference was held virtually this year due to the global pandemic. The annual SciPy Conference brought together over 1500 participants from industry, academia, and government to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.</summary>
    <category term="Community" label="Community"/>
    <category term="SciPy" label="SciPy"/>
    <category term="Talk" label="Talk"/>
    <published>2020-07-17T00:00:00+00:00</published>
  </entry>
</feed>
