Posts by Rick Zamora

Do you need consistent environments between the client, scheduler and workers?

Update May 3rd 2023: Clarify GPU recommendations.

Read more ...


Deep Dive into creating a Dask DataFrame Collection with from_map

Dask DataFrame provides dedicated IO functions for several popular tabular-data formats, like CSV and Parquet. If you are working with a supported format, then the corresponding function (e.g. read_csv) is likely to be the most reliable way to create a new Dask DataFrame collection. For other workflows, from_map offers a convenient way to define a DataFrame collection by mapping an arbitrary function over a set of inputs, with each input producing one partition. These kinds of workflows have historically required users to adopt the Dask Delayed API; from_map makes custom collection creation both easier and more performant.

Read more ...


Experiments in High Performance Networking with UCX and DGX

This post is about experimental and rapidly changing software. Code examples in this post should not be relied upon to work in the future.

Read more ...