All posts — Dask Working Notes

Posted in 2024

Nov 21, 2024 - Improving GroupBy.map with Dask and Xarray

May 30, 2024 - Dask DataFrame is Fast Now

Posted in 2023

Aug 25, 2023 - High Level Query Optimization in Dask

Apr 18, 2023 - Upstream testing in Dask

Apr 14, 2023 - Do you need consistent environments between the client, scheduler and workers?

Apr 12, 2023 - Deep Dive into creating a Dask DataFrame Collection with from_map

Mar 15, 2023 - Shuffling large data at constant memory in Dask

Feb 13, 2023 - Managing dask workloads with Flyte

Feb 02, 2023 - Easy CPU/GPU Arrays and Dataframes

Posted in 2022

Nov 21, 2022 - Dask Demo Day November 2022

Nov 15, 2022 - Reducing memory usage in Dask workloads by 80%

Nov 09, 2022 - Dask Kubernetes Operator

Aug 09, 2022 - Understanding Dask’s meta keyword argument

Jul 19, 2022 - Data Proximate Computation on a Dask Cluster Distributed Between Data Centres

Jul 15, 2022 - Documentation Framework

Feb 17, 2022 - How to run different worker types with the Dask Helm Chart

Posted in 2021

Dec 15, 2021 - Reflections on one year as the Dask life science fellow

Dec 01, 2021 - Mosaic Image Fusion

Nov 02, 2021 - Choosing good chunk sizes in Dask

Oct 20, 2021 - CZI EOSS Update

Sep 15, 2021 - 2021 Dask User Survey

Aug 23, 2021 - Google Summer of Code 2021 - Dask Project

Jul 07, 2021 - High Level Graphs update

Jul 02, 2021 - Ragged output, how to handle awkward shaped results

Jun 25, 2021 - Dask Down Under

Jun 18, 2021 - Dask Survey 2021, early anecdotes

Jun 01, 2021 - The evolution of a Dask Distributed user

May 25, 2021 - The 2021 Dask User Survey is out now

May 24, 2021 - Life sciences at the 2021 Dask Summit

May 21, 2021 - Stability of the Dask library

May 07, 2021 - Skeleton analysis

Mar 29, 2021 - Dask with PyTorch for large scale image analysis

Mar 19, 2021 - Image segmentation with Dask

Mar 11, 2021 - Measuring Dask memory usage with dask-memusage

Mar 04, 2021 - Getting to know the life science community

Mar 03, 2021 - Dask User Summit 2021

Posted in 2020

Nov 12, 2020 - Image Analysis Redux

Sep 22, 2020 - 2020 Dask User Survey

Aug 31, 2020 - Announcing the DaskHub Helm Chart

Aug 21, 2020 - Running tutorials

Aug 06, 2020 - Comparing Dask-ML and Ray Tune's Model Selection Algorithms

Jul 30, 2020 - Configuring a Distributed Dask Cluster

Jul 23, 2020 - The current state of distributed Dask clusters

Jul 21, 2020 - Faster Scheduling

Jul 17, 2020 - Last Year in Review

May 13, 2020 - Large SVDs

Apr 28, 2020 - Dask Summit

Jan 14, 2020 - Estimating Users

Posted in 2019

Nov 01, 2019 - Dask Deployment Updates

Oct 08, 2019 - DataFrame Groupby Aggregations

Sep 30, 2019 - Better and faster hyperparameter optimization with Dask

Sep 13, 2019 - Co-locating a Jupyter Server and Dask Scheduler

Aug 28, 2019 - Dask on HPC: a case study

Aug 09, 2019 - Dask and ITK for large scale image analysis

Aug 05, 2019 - 2019 Dask User Survey

Aug 02, 2019 - Dask Release 2.2.0

Jul 23, 2019 - Extracting fsspec from Dask

Jun 22, 2019 - Dask Release 2.0

Jun 20, 2019 - Load Large Image Data with Dask Array

Jun 19, 2019 - Python and GPUs: A Status Update

Jun 12, 2019 - Dask on HPC

Jun 09, 2019 - Experiments in High Performance Networking with UCX and DGX

Apr 09, 2019 - Composing Dask Array with Numba Stencils

Mar 27, 2019 - cuML and Dask hyperparameter optimization

Mar 18, 2019 - Dask and the __array_function__ protocol

Mar 04, 2019 - Building GPU Groupby-Aggregations for Dask

Jan 31, 2019 - Running Dask and MPI programs together

Jan 29, 2019 - Single-Node Multi-GPU Dataframe Joins

Jan 23, 2019 - Dask Release 1.1.0

Jan 22, 2019 - Extension Arrays in Dask DataFrame

Jan 13, 2019 - Dask, Pandas, and GPUs: first steps

Jan 03, 2019 - GPU Dask Arrays, first steps

Posted in 2018

Nov 29, 2018 - Dask Version 1.0

Oct 08, 2018 - Dask-jobqueue

Sep 27, 2018 - Refactor Documentation

Sep 17, 2018 - Dask Development Log

Sep 05, 2018 - Dask Release 0.19.0

Aug 28, 2018 - High level performance of Pandas, Dask, Spark, and Arrow

Aug 07, 2018 - Building SAGA optimization for Dask arrays

Aug 02, 2018 - Dask Development Log

Jul 23, 2018 - Pickle isn't slow, it's a protocol

Jul 17, 2018 - Dask Development Log, Scipy 2018

Jul 16, 2018 - Who uses Dask?

Jul 08, 2018 - Dask Development Log

Jun 26, 2018 - Dask Scaling Limits

Jun 14, 2018 - Dask Release 0.18.0

May 27, 2018 - Beyond Numpy Arrays in Python

Mar 21, 2018 - Dask Release 0.17.2

Feb 28, 2018 - Craft Minimal Bug Reports

Feb 12, 2018 - Dask Release 0.17.0

Feb 09, 2018 - Credit Modeling with Dask

Jan 22, 2018 - Pangeo: JupyterHub, Dask, and XArray on the Cloud

Posted in 2017

Dec 06, 2017 - Dask Development Log

Nov 21, 2017 - Dask Release 0.16.0

Nov 03, 2017 - Optimizing Data Structure Access in Python

Oct 16, 2017 - Streaming Dataframes

Oct 10, 2017 - Notes on Kafka in Python

Sep 24, 2017 - Dask Release 0.15.3

Sep 21, 2017 - Fast GeoSpatial Analysis in Python

Sep 18, 2017 - Dask on HPC - Initial Work

Aug 30, 2017 - Dask Release 0.15.2

Jul 03, 2017 - Dask Benchmarks

Jun 28, 2017 - Use Apache Parquet

Jun 15, 2017 - Dask Release 0.15.0

May 08, 2017 - Dask Release 0.14.3

Apr 28, 2017 - Dask Development Log

Apr 19, 2017 - Asynchronous Optimization Algorithms with Dask

Mar 28, 2017 - Dask and Pandas and XGBoost

Mar 23, 2017 - Dask Release 0.14.1

Mar 22, 2017 - Developing Convex Optimization Algorithms in Dask

Feb 27, 2017 - Dask Release 0.14.0

Feb 20, 2017 - Dask Development Log

Feb 11, 2017 - Experiment with Dask and TensorFlow

Feb 07, 2017 - Two Easy Ways to Use Scikit Learn and Dask

Jan 30, 2017 - Dask Development Log

Jan 24, 2017 - Custom Parallel Algorithms on a Cluster with Dask

Jan 18, 2017 - Dask Development Log

Jan 17, 2017 - Distributed NumPy on a Cluster with Dask Arrays

Jan 12, 2017 - Distributed Pandas on a Cluster with Dask DataFrames

Jan 03, 2017 - Dask Release 0.13.0

Posted in 2016

Dec 24, 2016 - Dask Development Log

Dec 18, 2016 - Dask Development Log

Dec 12, 2016 - Dask Development Log

Dec 05, 2016 - Dask Development Log

Sep 22, 2016 - Dask Cluster Deployments

Sep 13, 2016 - Dask and Celery

Sep 12, 2016 - Dask Distributed Release 1.13.0

Aug 16, 2016 - Dask for Institutions

Jul 12, 2016 - Dask and Scikit-Learn -- Model Parallelism

Apr 20, 2016 - Ad Hoc Distributed Random Forests

Apr 14, 2016 - Fast Message Serialization

Feb 26, 2016 - Distributed Dask Arrays

Feb 22, 2016 - Pandas on HDFS with Dask Dataframes

Feb 17, 2016 - Introducing Dask distributed

Posted in 2015

Dec 21, 2015 - Dask is one year old

Oct 09, 2015 - Distributed Prototype

Aug 03, 2015 - Caching

Jul 23, 2015 - Custom Parallel Workflows

Jun 26, 2015 - Write Complex Parallel Algorithms

Jun 23, 2015 - Distributed Scheduling

May 19, 2015 - State of Dask

Mar 11, 2015 - Towards Out-of-core DataFrames

Feb 17, 2015 - Towards Out-of-core ND-Arrays -- Dask + Toolz = Bag

Feb 13, 2015 - Towards Out-of-core ND-Arrays -- Slicing and Stacking

Jan 16, 2015 - Towards Out-of-core ND-Arrays -- Spilling to Disk

Jan 14, 2015 - Towards Out-of-core ND-Arrays -- Benchmark MatMul

Jan 06, 2015 - Towards Out-of-core ND-Arrays -- Multi-core Scheduling

Posted in 2014

Dec 30, 2014 - Towards Out-of-core ND-Arrays -- Frontend

Dec 27, 2014 - Towards Out-of-core ND-Arrays