Dask Blog

Posted in 2024

Improving GroupBy.map with Dask and Xarray

  • Nov 21, 2024
  • Patrick Hoefler
  • dask array xarray

This post was originally published on the Xarray blog.

Read more ...


Dask DataFrame is Fast Now

  • May 30, 2024
  • Patrick Hoefler
  • dask query optimizer dataframe

This work was engineered and supported by Coiled and NVIDIA. Thanks to Patrick Hoefler and Rick Zamora, in particular. The original version of this post appears on docs.coiled.io.

Read more ...


By Dask Contributors

© Copyright 2014-2026, Dask Contributors. New-BSD Licensed.