Posts tagged IO
Do you need consistent environments between the client, scheduler and workers?
- Apr 14, 2023
Update May 3rd 2023: Clarify GPU recommendations.
Deep Dive into creating a Dask DataFrame Collection with from_map
- Apr 12, 2023
Dask DataFrame provides dedicated IO functions for several popular tabular-data formats, like CSV and Parquet. If you are working with a supported format, then the corresponding function (e.g. read_csv) is likely to be the most reliable way to create a new Dask DataFrame collection. For other workflows, from_map offers a convenient way to define a DataFrame collection as an arbitrary function mapping. These workflows have historically required users to adopt the Dask Delayed API; from_map makes custom collection creation both easier and more performant.