Posted in 2023

High Level Query Optimization in Dask

Aug 25, 2023

This work was engineered and supported by Coiled and NVIDIA. Thanks in particular to Patrick Hoefler and Rick Zamora. The original version of this post appears on blog.coiled.io.


Upstream testing in Dask

Apr 18, 2023

The original version of this post appears on blog.coiled.io.


Do you need consistent environments between the client, scheduler and workers?

Apr 14, 2023

Update May 3rd 2023: Clarify GPU recommendations.


Deep Dive into creating a Dask DataFrame Collection with from_map

Apr 12, 2023

Dask DataFrame provides dedicated IO functions for several popular tabular-data formats, like CSV and Parquet. If you are working with a supported format, then the corresponding function (e.g. read_csv) is likely to be the most reliable way to create a new Dask DataFrame collection. For other workflows, from_map offers a convenient way to define a DataFrame collection as an arbitrary function mapping. These kinds of workflows have historically required users to adopt the Dask Delayed API; from_map makes custom collection creation both easier and more performant.


Shuffling large data at constant memory in Dask

Mar 15, 2023

This work was engineered and supported by Coiled. In particular, thanks to Florian Jetter, Gabe Joseph, Hendrik Makait, and Matt Rocklin. The original version of this post appears on blog.coiled.io.


Managing dask workloads with Flyte

Feb 13, 2023

It is now possible to manage Dask workloads using Flyte 🎉!


Easy CPU/GPU Arrays and Dataframes

Feb 02, 2023

This article was originally posted on the RAPIDS blog.
