Please take the Dask User Survey for 2019. Your response helps to prioritize future work.
We are pleased to announce the release of Dask version 2.0. This is a major release with bug fixes and new features.
Most major version changes of software signal many new and exciting features. That is not the case with this release. Instead, we’re bumping the major version number because we’ve broken a few APIs to improve maintainability, and because we decided to drop support for Python 2.
This blogpost outlines these changes.
Install#
As always, you can conda install Dask:
conda install dask
or pip install from PyPI:
pip install "dask[complete]" --upgrade
Full changelogs are available here:
Drop support for Python 2#
Python 2 reaches end of life in 2020, just six months away. Most major PyData projects are dropping Python 2 support around now. See the Python 3 Statement for more details about some of your favorite projects.
Python 2 users can continue to use older versions of Dask, which are in widespread use today. Institutions looking for long term support of Dask in Python 2 may wish to reach out to for-profit consulting companies, like Quansight.
Dropping Python 2 will allow maintainers to spend more of their time fixing bugs and developing new features. It will also allow the project to adopt more modern development practices going forward.
Small breaking changes#
We now include a list with a brief description of most of the breaking changes:
The distributed.bokeh module has moved to distributed.dashboard
Various ncores keywords have been moved to nthreads
Client.map/gather/scatter no longer accept iterators and Python queue objects. Users can handle this themselves with submit/as_completed or can use the Streamz library.
The worker /main route has moved to /status
Cluster.workers is now a dictionary mapping worker name to worker, rather than a list as it was before
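For the Client.map change, one way to handle a queue yourself is to drain it into individual submit calls and collect results with as_completed. Dask's futures interface follows the shape of the standard library's concurrent.futures, so this minimal sketch uses ThreadPoolExecutor to stay runnable without a cluster; with Dask you would presumably swap in Client.submit and dask.distributed.as_completed. The inc function, the drain helper, and the None sentinel are illustrative assumptions, not part of Dask.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, as_completed


def inc(x):
    """Toy task used for illustration."""
    return x + 1


def drain(q):
    """Yield items from a queue until a None sentinel arrives (hypothetical helper)."""
    while True:
        item = q.get()
        if item is None:
            break
        yield item


# Fill a queue with some work, followed by the sentinel.
q = queue.Queue()
for i in range(5):
    q.put(i)
q.put(None)

with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit one task per queue item instead of passing the queue to map.
    futures = [pool.submit(inc, item) for item in drain(q)]
    # as_completed yields futures in completion order, so sort for a stable result.
    results = sorted(f.result() for f in as_completed(futures))

print(results)
```

Submitting items one at a time keeps the queue-handling policy (sentinels, timeouts, batching) in user code, which is where this release leaves it.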
Some larger fun changes#
We didn’t only break things. We also added some new things :)
Array metadata#
Previously, Dask Arrays were defined by their shape, chunk shape, and datatype (float, int, and so on).
Now, Dask Arrays also know the type of their chunks. Historically this was
almost always a NumPy array, so it didn’t make sense to store, but now that
Dask Arrays are being used more frequently with sparse array chunks and GPU
array chunks we now maintain this information as well in a ._meta attribute.
This is already how Dask dataframes work, so it should be familiar to advanced
users of that module.
>>> import dask.array as da
>>> x = da.eye(1000000)
>>> x._meta
array([], shape=(0, 0), dtype=float64)
>>> import sparse
>>> s = x.map_blocks(sparse.COO.from_numpy)
>>> s._meta
<COO: shape=(0, 0), dtype=float64, nnz=0, fill_value=0.0>
This work was largely done by Peter Entschev.
Array HTML output#
Dask arrays now print themselves nicely in Jupyter notebooks, showing a table of information about their size and chunk size, and also a visual diagram of their chunk structure.
import dask.array as da
x = da.ones((10000, 1000, 1000))
Proxy Worker dashboards from the Scheduler dashboard#
If you’ve used Dask.distributed then you’re probably familiar with Dask’s scheduler dashboard, which shows the state of computations on the cluster with a real-time interactive Bokeh dashboard. However, you may not be aware that Dask workers also have their own dashboard, which shows a completely separate set of plots for the state of that individual worker.
Historically these worker dashboards haven’t been as commonly used because it’s hard to connect to them. Users don’t know their address, or network rules don’t enable direct web connections. Fortunately, the scheduler dashboard is now able to proxy a connection from the user to the worker dashboard.
You can access this by clicking on the “Info” tab and then selecting the “dashboard” link next to any of the workers. You will also need to install jupyter-server-proxy:
pip install jupyter-server-proxy
Thanks to Ben Zaitlen for this fun addition. We hope that now that these plots are more visible, people will invest more in developing plots for them.
Black everywhere#
We now use the Black code formatter throughout most Dask repositories. These repositories include pre-commit hooks, which we recommend when developing on the project.
cd /path/to/dask
git checkout master
git pull upstream master
pip install pre-commit
pre-commit install
Git will then call black and flake8 whenever you attempt to commit code.
Dask Gateway#
We would also like to inform readers about the somewhat new Dask Gateway project that enables institutions and IT to control many Dask clusters for a variety of users.
Acknowledgements#
There have been several releases since the last time we had a release blogpost. The following people contributed to the following repositories since the 1.1.0 release on January 23rd:
-
(Rick) Richard J Zamora
Abhinav Ralhan
Adam Beberg
Alistair Miles
Álvaro Abella Bascarán
Anderson Banihirwe
Aploium
Bart Broere
Benjamin Zaitlen
Bouwe Andela
Brett Naul
Brian Chu
Bruce Merry
Christian Hudon
Cody Johnson
Dan O’Donovan
Daniel Saxton
Daniel Severo
Danilo Horta
Dimplexion
Elliott Sales de Andrade
Endre Mark Borza
Genevieve Buckley
George Sakkis
Guillaume Lemaitre
HSR05
Hameer Abbasi
Henrique Ribeiro
Henry Pinkard
Hugo
Ian Bolliger
Ian Rose
Isaiah Norton
James Bourbeau
Janne Vuorela
John Kirkham
Jim Crist
Joe Corbett
Jorge Pessoa
Julia Signell
JulianWgs
Justin Poehnelt
Justin Waugh
Ksenia Bobrova
Lijo Jose
Marco Neumann
Mark Bell
Martin Durant
Matthew Rocklin
Michael Eaton
Michał Jastrzębski
Nathan Matare
Nick Becker
Paweł Kordek
Peter Andreas Entschev
Philipp Rudiger
Philipp S. Sommer
Roma Sokolov
Ross Petchler
Scott Sievert
Shyam Saladi
Søren Fuglede Jørgensen
Thomas Zilio
Tom Augspurger
Yu Feng
aaronfowles
amerkel2
asmith26
btw08
gregrf
mbarkhau
mcsoini
severo
tpanza
-
Adam Beberg
Benjamin Zaitlen
Brett Jurman
Brett Randall
Brian Chu
Caleb
Chris White
Daniel Farrell
Elliott Sales de Andrade
George Sakkis
James Bourbeau
Jim Crist
John Kirkham
K.-Michael Aye
Loïc Estève
Magnus Nord
Manuel Garrido
Marco Neumann
Martin Durant
Mathieu Dugré
Matt Nicolls
Matthew Rocklin
Michael Delgado
Michael Spiegel
Muammar El Khatib
Nikos Tsaousis
Olivier Grisel
Peter Andreas Entschev
Sam Grayson
Scott Sievert
Tom Augspurger
Torsten Wörtwein
amerkel2
condoratberlin
deepthirajagopalan7
jukent
plbertrand
-
Alejandro
Florian Rohrer
James Bourbeau
Julien Jerphanion
Matthew Rocklin
Nathan Henrie
Paul Vecchio
Ryan McCormick
Saadullah Amin
Scott Sievert
Sriharsha Atyam
Tom Augspurger
-
Andrea Zonca
Guillaume Eynard-Bontemps
Kyle Husmann
Levi Naden
Loïc Estève
Matthew Rocklin
Matyas Selmeci
ocaisa
-
Brian Phillips
Jacob Tomlinson
Jim Crist
Joe Hamman
Joseph Hamman
Matthew Rocklin
Tom Augspurger
Yuvi Panda
adam
-
Christoph Deil
Genevieve Buckley
Ian Rose
Martin Durant
Matthew Rocklin
Matthias Bussonnier
Robert Sare
Tom Augspurger
Willi Rath
-
Daniel Bast
Ian Rose
Matthew Rocklin
Yuvi Panda