I’m pleased to announce the release of Dask version 1.1.0. This is a major release with bug fixes and new features. The last release was 1.0.0 on 2018-11-29. This blogpost outlines notable changes since the last release.
You can conda install Dask:
conda install dask
or pip install from PyPI:
pip install dask[complete] --upgrade
Full changelogs are available here:
Notable Changes#
A lot of work has happened over the last couple of months, and we encourage people to look through the changelog to get a sense of the kinds of incremental changes that developers are working on.
There are also a few notable changes in this release that we’ll highlight here:
Support for the recent Numpy 1.16 and Pandas 0.24 releases
Support for Pandas Extension Arrays (see Tom Augspurger’s post on the topic)
High level graphs in Dask Dataframe, with operator fusion in simple cases
Increased support for other libraries that look enough like Numpy and Pandas to work within Dask Array/Dataframe
Support for Numpy 1.16 and Pandas 0.24#
Both Numpy and Pandas have been evolving quickly over the last few months. We’re excited about the changes to extensibility arriving in both libraries. The Dask array/dataframe submodules have been updated to work well with these recent changes.
Pandas Extension Arrays#
In particular, Dask Dataframe supports Pandas Extension Arrays, which makes it easier to use third-party Pandas packages like CyberPandas or Fletcher in parallel with Dask Dataframe.
For more information, see Tom Augspurger’s post.
High Level Graphs in Dask Dataframe#
For a while, Dask Array has used high level graphs for “atop” operations (elementwise, broadcasting, transpose, tensordot, reductions), which reduce overhead and allow task fusion for computations within this class.
y = da.exp(x + 1).T # These operations get fused to a single task
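The line above assumes that `x` is already a Dask array. A minimal runnable version, assuming an arbitrary array of ones with a 2x2 grid of chunks (the shape and chunk sizes are chosen only for illustration), might look like this:

```python
import numpy as np
import dask.array as da

# Hypothetical input: a 4x6 array of ones split into a 2x2 grid of blocks.
x = da.ones((4, 6), chunks=(2, 3))

# Addition, exp, and transpose are all blockwise operations,
# so they are candidates for fusion when the graph is optimized.
y = da.exp(x + 1).T

# Nothing is computed until we ask for the result.
result = y.compute()
```

How many tasks end up in the optimized graph depends on the Dask version, so no task counts are shown here.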
We’ve renamed atop to blockwise to be a bit more generic, and have also
started applying it to Dask Dataframe, which helps to reduce overhead
substantially when doing computations with many simple operations.
This still needs to be improved to increase the class of cases where it works, but we’re already seeing nice speedups on previously unseen workloads.
The da.atop function has been deprecated in favor of da.blockwise. There
is now also a dd.blockwise which shares a common code path.
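For readers who have not used this function directly, here is a small sketch of calling `da.blockwise` by hand. The index strings follow the same convention as `da.atop`: each array is labelled with the indices of its dimensions, and the output index string determines how blocks are matched up and arranged. The input arrays are made up for the example.

```python
import operator

import numpy as np
import dask.array as da

# Hypothetical inputs with matching shapes and chunks.
x = da.ones((4, 6), chunks=(2, 3))
y = da.full((4, 6), 2.0, chunks=(2, 3))

# Elementwise add: output block (i, j) is built from x block (i, j)
# and y block (i, j).
z = da.blockwise(operator.add, "ij", x, "ij", y, "ij", dtype=x.dtype)

# Transpose: output block (j, i) is the transpose of x block (i, j).
t = da.blockwise(np.transpose, "ji", x, "ij", dtype=x.dtype)

z_result = z.compute()
t_result = t.compute()
```

In everyday code you would normally write `x + y` and `x.T`; calling `da.blockwise` explicitly is mostly useful when wrapping your own block-level functions.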
Non-Pandas dataframe and Non-Numpy array types#
We’re working to make Dask a bit more agnostic to the types of in-memory array and dataframe objects that it can manipulate. Rather than having Dask Array be a grid of Numpy arrays and Dask Dataframe be a sequence of Pandas dataframes, we’re relaxing that constraint to a grid of Numpy-like arrays and a sequence of Pandas-like dataframes.
This is an ongoing effort that has targeted alternate backends like
scipy.sparse, pydata/sparse, cupy, cudf and other systems.
There were some recent posts on arrays and dataframes that show proofs of concept for this with GPUs.
Acknowledgements#
There have been several releases since the last time we had a release blogpost. The following people contributed to the dask/dask repository since the 0.19.0 release on September 5th:
Anderson Banihirwe
Antonino Ingargiola
Armin Berres
Bart Broere
Carlos Valiente
Daniel Li
Daniel Saxton
David Hoese
Diane Trout
Damien Garaud
Elliott Sales de Andrade
Eric Wolak
Gábor Lipták
Guido Imperiale
Guillaume Eynard-Bontemps
Itamar Turner-Trauring
James Bourbeau
Jan Koch
Javad
Jendrik Jördening
Jim Crist
Jonathan Fraine
John Kirkham
Johnnie Gray
Julia Signell
Justin Dennison
M. Farrajota
Marco Neumann
Mark Harfouche
Markus Gonser
Martin Durant
Matthew Rocklin
Matthias Bussonnier
Mina Farid
Paul Vecchio
Prabakaran Kumaresshan
Rahul Vaidya
Stephan Hoyer
Stuart Berg
TakaakiFuruse
Takahiro Kojima
Tom Augspurger
Yu Feng
Zhenqing Li
@milesial
@samc0de
@slnguyen
The following people contributed to the dask/distributed repository since the 0.19.0 release on September 5th:
Adam Klein
Brett Naul
Daniel Farrell
Diane Trout
Dirk Petersen
Eric Ma
Jim Crist
John Kirkham
Gaurav Sheni
Guillaume Eynard-Bontemps
Loïc Estève
Marius van Niekerk
Matthew Rocklin
Michael Wheeler
MikeG
NotSqrt
Peter Killick
Roy Wedge
Russ Bubley
Stephan Hoyer
@tjb900
Tom Rochette
@fjetter