Posts tagged IO

Do you need consistent environments between the client, scheduler and workers?

Update May 3rd 2023: Clarify GPU recommendations.

Read more ...


Deep Dive into creating a Dask DataFrame Collection with from_map

Dask DataFrame provides dedicated IO functions for several popular tabular-data formats, like CSV and Parquet. If you are working with a supported format, then the corresponding function (e.g. read_csv) is likely to be the most reliable way to create a new Dask DataFrame collection. For other workflows, from_map now offers a convenient way to define a DataFrame collection as an arbitrary function mapping. These kinds of workflows have historically required users to adopt the Dask Delayed API, but from_map makes custom collection creation both easier and more performant.

Read more ...


Extracting fsspec from Dask


Read more ...