Dask Blog


Tags

  • Australia
  • Community
  • CuPy
  • Dask
  • Dask Gateway
  • Dask Summit
  • Dask-GLM
  • Deployment
  • Distributed
  • Flyte
  • GPU
  • HPC
  • Helm
  • IO
  • Kubernetes
  • MPI
  • Organisations
  • Pandas
  • Programming
  • PyTorch
  • Python
  • RAPIDS
  • SciPy
  • Sparse
  • Talk
  • Tools
  • Tutorials
  • User Survey
  • array
  • clusters
  • config
  • cupy
  • dask
  • dask array
  • dask-image
  • dask-kubernetes
  • dask-ml
  • dataframe
  • deep learning
  • deployment
  • distributed
  • ecosystem
  • geoscience
  • image analysis
  • imaging
  • jobqueue
  • kubernetes
  • life science
  • machine-learning
  • memory
  • numba
  • p2p
  • pangeo
  • performance
  • profiling
  • pydata
  • python
  • query optimizer
  • ram
  • ray
  • release
  • scikit-image
  • scipy
  • shuffling
  • skan
  • skeleton analysis
  • xarray

Archives

  • 2024 (2)
  • 2023 (7)
  • 2022 (7)
  • 2021 (20)
  • 2020 (12)
  • 2019 (24)
  • 2018 (20)
  • 2017 (28)
  • 2016 (14)
  • 2015 (13)
  • 2014 (2)

Posted in 2024

Nov 21, 2024 - Improving GroupBy.map with Dask and Xarray

May 30, 2024 - Dask DataFrame is Fast Now

Posted in 2023

Aug 25, 2023 - High Level Query Optimization in Dask

Apr 18, 2023 - Upstream testing in Dask

Apr 14, 2023 - Do you need consistent environments between the client, scheduler and workers?

Apr 12, 2023 - Deep Dive into creating a Dask DataFrame Collection with from_map

Mar 15, 2023 - Shuffling large data at constant memory in Dask

Feb 13, 2023 - Managing dask workloads with Flyte

Feb 02, 2023 - Easy CPU/GPU Arrays and Dataframes

Posted in 2022

Nov 21, 2022 - Dask Demo Day November 2022

Nov 15, 2022 - Reducing memory usage in Dask workloads by 80%

Nov 09, 2022 - Dask Kubernetes Operator

Aug 09, 2022 - Understanding Dask’s meta keyword argument

Jul 19, 2022 - Data Proximate Computation on a Dask Cluster Distributed Between Data Centres

Jul 15, 2022 - Documentation Framework

Feb 17, 2022 - How to run different worker types with the Dask Helm Chart

Posted in 2021

Dec 15, 2021 - Reflections on one year as the Dask life science fellow

Dec 01, 2021 - Mosaic Image Fusion

Nov 02, 2021 - Choosing good chunk sizes in Dask

Oct 20, 2021 - CZI EOSS Update

Sep 15, 2021 - 2021 Dask User Survey

Aug 23, 2021 - Google Summer of Code 2021 - Dask Project

Jul 07, 2021 - High Level Graphs update

Jul 02, 2021 - Ragged output, how to handle awkward shaped results

Jun 25, 2021 - Dask Down Under

Jun 18, 2021 - Dask Survey 2021, early anecdotes

Jun 01, 2021 - The evolution of a Dask Distributed user

May 25, 2021 - The 2021 Dask User Survey is out now

May 24, 2021 - Life sciences at the 2021 Dask Summit

May 21, 2021 - Stability of the Dask library

May 07, 2021 - Skeleton analysis

Mar 29, 2021 - Dask with PyTorch for large scale image analysis

Mar 19, 2021 - Image segmentation with Dask

Mar 11, 2021 - Measuring Dask memory usage with dask-memusage

Mar 04, 2021 - Getting to know the life science community

Mar 03, 2021 - Dask User Summit 2021

Posted in 2020

Nov 12, 2020 - Image Analysis Redux

Sep 22, 2020 - 2020 Dask User Survey

Aug 31, 2020 - Announcing the DaskHub Helm Chart

Aug 21, 2020 - Running tutorials

Aug 06, 2020 - Comparing Dask-ML and Ray Tune's Model Selection Algorithms

Jul 30, 2020 - Configuring a Distributed Dask Cluster

Jul 23, 2020 - The current state of distributed Dask clusters

Jul 21, 2020 - Faster Scheduling

Jul 17, 2020 - Last Year in Review

May 13, 2020 - Large SVDs

Apr 28, 2020 - Dask Summit

Jan 14, 2020 - Estimating Users

Posted in 2019

Nov 01, 2019 - Dask Deployment Updates

Oct 08, 2019 - DataFrame Groupby Aggregations

Sep 30, 2019 - Better and faster hyperparameter optimization with Dask

Sep 13, 2019 - Co-locating a Jupyter Server and Dask Scheduler

Aug 28, 2019 - Dask on HPC: a case study

Aug 09, 2019 - Dask and ITK for large scale image analysis

Aug 05, 2019 - 2019 Dask User Survey

Aug 02, 2019 - Dask Release 2.2.0

Jul 23, 2019 - Extracting fsspec from Dask

Jun 22, 2019 - Dask Release 2.0

Jun 20, 2019 - Load Large Image Data with Dask Array

Jun 19, 2019 - Python and GPUs: A Status Update

Jun 12, 2019 - Dask on HPC

Jun 09, 2019 - Experiments in High Performance Networking with UCX and DGX

Apr 09, 2019 - Composing Dask Array with Numba Stencils

Mar 27, 2019 - cuML and Dask hyperparameter optimization

Mar 18, 2019 - Dask and the __array_function__ protocol

Mar 04, 2019 - Building GPU Groupby-Aggregations for Dask

Jan 31, 2019 - Running Dask and MPI programs together

Jan 29, 2019 - Single-Node Multi-GPU Dataframe Joins

Jan 23, 2019 - Dask Release 1.1.0

Jan 22, 2019 - Extension Arrays in Dask DataFrame

Jan 13, 2019 - Dask, Pandas, and GPUs: first steps

Jan 03, 2019 - GPU Dask Arrays, first steps

Posted in 2018

Nov 29, 2018 - Dask Version 1.0

Oct 08, 2018 - Dask-jobqueue

Sep 27, 2018 - Refactor Documentation

Sep 17, 2018 - Dask Development Log

Sep 05, 2018 - Dask Release 0.19.0

Aug 28, 2018 - High level performance of Pandas, Dask, Spark, and Arrow

Aug 07, 2018 - Building SAGA optimization for Dask arrays

Aug 02, 2018 - Dask Development Log

Jul 23, 2018 - Pickle isn't slow, it's a protocol

Jul 17, 2018 - Dask Development Log, Scipy 2018

Jul 16, 2018 - Who uses Dask?

Jul 08, 2018 - Dask Development Log

Jun 26, 2018 - Dask Scaling Limits

Jun 14, 2018 - Dask Release 0.18.0

May 27, 2018 - Beyond Numpy Arrays in Python

Mar 21, 2018 - Dask Release 0.17.2

Feb 28, 2018 - Craft Minimal Bug Reports

Feb 12, 2018 - Dask Release 0.17.0

Feb 09, 2018 - Credit Modeling with Dask

Jan 22, 2018 - Pangeo: JupyterHub, Dask, and XArray on the Cloud

Posted in 2017

Dec 06, 2017 - Dask Development Log

Nov 21, 2017 - Dask Release 0.16.0

Nov 03, 2017 - Optimizing Data Structure Access in Python

Oct 16, 2017 - Streaming Dataframes

Oct 10, 2017 - Notes on Kafka in Python

Sep 24, 2017 - Dask Release 0.15.3

Sep 21, 2017 - Fast GeoSpatial Analysis in Python

Sep 18, 2017 - Dask on HPC - Initial Work

Aug 30, 2017 - Dask Release 0.15.2

Jul 03, 2017 - Dask Benchmarks

Jun 28, 2017 - Use Apache Parquet

Jun 15, 2017 - Dask Release 0.15.0

May 08, 2017 - Dask Release 0.14.3

Apr 28, 2017 - Dask Development Log

Apr 19, 2017 - Asynchronous Optimization Algorithms with Dask

Mar 28, 2017 - Dask and Pandas and XGBoost

Mar 23, 2017 - Dask Release 0.14.1

Mar 22, 2017 - Developing Convex Optimization Algorithms in Dask

Feb 27, 2017 - Dask Release 0.14.0

Feb 20, 2017 - Dask Development Log

Feb 11, 2017 - Experiment with Dask and TensorFlow

Feb 07, 2017 - Two Easy Ways to Use Scikit Learn and Dask

Jan 30, 2017 - Dask Development Log

Jan 24, 2017 - Custom Parallel Algorithms on a Cluster with Dask

Jan 18, 2017 - Dask Development Log

Jan 17, 2017 - Distributed NumPy on a Cluster with Dask Arrays

Jan 12, 2017 - Distributed Pandas on a Cluster with Dask DataFrames

Jan 03, 2017 - Dask Release 0.13.0

Posted in 2016

Dec 24, 2016 - Dask Development Log

Dec 18, 2016 - Dask Development Log

Dec 12, 2016 - Dask Development Log

Dec 05, 2016 - Dask Development Log

Sep 22, 2016 - Dask Cluster Deployments

Sep 13, 2016 - Dask and Celery

Sep 12, 2016 - Dask Distributed Release 1.13.0

Aug 16, 2016 - Dask for Institutions

Jul 12, 2016 - Dask and Scikit-Learn -- Model Parallelism

Apr 20, 2016 - Ad Hoc Distributed Random Forests

Apr 14, 2016 - Fast Message Serialization

Feb 26, 2016 - Distributed Dask Arrays

Feb 22, 2016 - Pandas on HDFS with Dask Dataframes

Feb 17, 2016 - Introducing Dask distributed

Posted in 2015

Dec 21, 2015 - Dask is one year old

Oct 09, 2015 - Distributed Prototype

Aug 03, 2015 - Caching

Jul 23, 2015 - Custom Parallel Workflows

Jun 26, 2015 - Write Complex Parallel Algorithms

Jun 23, 2015 - Distributed Scheduling

May 19, 2015 - State of Dask

Mar 11, 2015 - Towards Out-of-core DataFrames

Feb 17, 2015 - Towards Out-of-core ND-Arrays -- Dask + Toolz = Bag

Feb 13, 2015 - Towards Out-of-core ND-Arrays -- Slicing and Stacking

Jan 16, 2015 - Towards Out-of-core ND-Arrays -- Spilling to Disk

Jan 14, 2015 - Towards Out-of-core ND-Arrays -- Benchmark MatMul

Jan 06, 2015 - Towards Out-of-core ND-Arrays -- Multi-core Scheduling

Posted in 2014

Dec 30, 2014 - Towards Out-of-core ND-Arrays -- Frontend

Dec 27, 2014 - Towards Out-of-core ND-Arrays

By Dask Contributors

© Copyright 2014-2026, Dask Contributors.

Copyright © 2014-2026, Dask Contributors New-BSD Licensed.