Posts tagged dask

Dask DataFrame is Fast Now

This work was engineered and supported by Coiled and NVIDIA. Thanks to Patrick Hoefler and Rick Zamora, in particular. Original version of this post appears on docs.coiled.io

Read more ...


High Level Query Optimization in Dask

This work was engineered and supported by Coiled and NVIDIA. Thanks to Patrick Hoefler and Rick Zamora, in particular. Original version of this post appears on blog.coiled.io

Read more ...


Upstream testing in Dask

Original version of this post appears on blog.coiled.io

Read more ...


Shuffling large data at constant memory in Dask

This work was engineered and supported by Coiled. In particular, thanks to Florian Jetter, Gabe Joseph, Hendrik Makait, and Matt Rocklin. Original version of this post appears on blog.coiled.io

Read more ...


Managing dask workloads with Flyte

It is now possible to manage dask workloads using Flyte 🎉!

Read more ...


Measuring Dask memory usage with dask-memusage

Using too many computing resources can get expensive when you’re scaling up in the cloud.

Read more ...


Comparing Dask-ML and Ray Tune's Model Selection Algorithms

Hyperparameter optimization is the process of deducing model parameters that can’t be learned from data. This process is often time- and resource-consuming, especially in the context of deep learning. A good description of this process can be found at “Tuning the hyper-parameters of an estimator,” and the issues that arise are concisely summarized in Dask-ML’s documentation of “Hyper Parameter Searches.”

Read more ...


DataFrame Groupby Aggregations

Read more ...


Dask on HPC

We analyze large datasets on HPC systems with Dask, a parallel computing library that integrates well with the existing Python software ecosystem, and works comfortably with native HPC hardware.

Read more ...


Composing Dask Array with Numba Stencils

In this post we explore four array computing technologies, and how they work together to achieve powerful results.

Read more ...


cuML and Dask hyperparameter optimization

Read more ...


Extension Arrays in Dask DataFrame

This work is supported by Anaconda Inc

Read more ...


Dask Version 1.0

We are pleased to announce the release of Dask version 1.0.0!

Read more ...


Refactor Documentation

This work is supported by Anaconda Inc

Read more ...


Dask Development Log

This work is supported by Anaconda Inc

Read more ...


Dask Release 0.19.0

This work is supported by Anaconda Inc.

Read more ...


High level performance of Pandas, Dask, Spark, and Arrow

This work is supported by Anaconda Inc

Read more ...


Building SAGA optimization for Dask arrays

This work is supported by ETH Zurich, Anaconda Inc, and the Berkeley Institute for Data Science

Read more ...


Dask Development Log

This work is supported by Anaconda Inc

Read more ...


Pickle isn't slow, it's a protocol

This work is supported by Anaconda Inc

Read more ...


Dask Development Log, Scipy 2018

This work is supported by Anaconda Inc

Read more ...


Who uses Dask?

This work is supported by Anaconda Inc

Read more ...


Dask Development Log

This work is supported by Anaconda Inc

Read more ...


Dask Scaling Limits

This work is supported by Anaconda Inc.

Read more ...


Dask Release 0.18.0

This work is supported by Anaconda Inc.

Read more ...


Beyond Numpy Arrays in Python

Read more ...


Dask Release 0.17.2

This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

Read more ...


Dask Release 0.17.0

This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

Read more ...


Pangeo: JupyterHub, Dask, and XArray on the Cloud

This work is supported by Anaconda Inc, the NSF EarthCube program, and UC Berkeley BIDS

Read more ...


Dask Development Log

This work is supported by Anaconda Inc and the Data Driven Discovery Initiative from the Moore Foundation

Read more ...


Dask Release 0.16.0

This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

Read more ...


Optimizing Data Structure Access in Python

This work is supported by Anaconda Inc and the Data Driven Discovery Initiative from the Moore Foundation

Read more ...


Streaming Dataframes

This work is supported by Anaconda Inc and the Data Driven Discovery Initiative from the Moore Foundation

Read more ...


Notes on Kafka in Python

Read more ...


Dask Release 0.15.3

This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

Read more ...


Fast GeoSpatial Analysis in Python

This work is supported by Anaconda Inc., the Data Driven Discovery Initiative from the Moore Foundation, and NASA SBIR NNX16CG43P

Read more ...


Dask on HPC - Initial Work

This work is supported by Anaconda Inc. and the NSF EarthCube program.

Read more ...


Dask Release 0.15.2

This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

Read more ...


Dask Benchmarks

This work is supported by Continuum Analytics and the Data Driven Discovery Initiative from the Moore Foundation.

Read more ...


Use Apache Parquet

This work is supported by Continuum Analytics and the Data Driven Discovery Initiative from the Moore Foundation.

Read more ...


Dask Release 0.15.0

This work is supported by Continuum Analytics and the Data Driven Discovery Initiative from the Moore Foundation.

Read more ...


Dask Release 0.14.3

This work is supported by Continuum Analytics and the Data Driven Discovery Initiative from the Moore Foundation.

Read more ...


Dask Release 0.14.1

This work is supported by Continuum Analytics, the XDATA Program, and the Data Driven Discovery Initiative from the Moore Foundation.

Read more ...


Dask Distributed Release 1.13.0

I’m pleased to announce a release of Dask’s distributed scheduler, dask.distributed, version 1.13.0.

Read more ...


Dask for Institutions

Read more ...


Dask and Scikit-Learn -- Model Parallelism

This post was written by Jim Crist. The original post lives at http://jcrist.github.io/dask-sklearn-part-1.html (with better styling)

Read more ...


Ad Hoc Distributed Random Forests

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Fast Message Serialization

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Distributed Dask Arrays

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Pandas on HDFS with Dask Dataframes

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Introducing Dask distributed

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Dask is one year old

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Distributed Prototype

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Caching

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Custom Parallel Workflows

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Write Complex Parallel Algorithms

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Distributed Scheduling

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


State of Dask

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Towards Out-of-core DataFrames

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Towards Out-of-core ND-Arrays -- Dask + Toolz = Bag

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Towards Out-of-core ND-Arrays -- Slicing and Stacking

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Towards Out-of-core ND-Arrays -- Spilling to Disk

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Towards Out-of-core ND-Arrays -- Benchmark MatMul

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Towards Out-of-core ND-Arrays -- Multi-core Scheduling

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Towards Out-of-core ND-Arrays -- Frontend

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...


Towards Out-of-core ND-Arrays

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

Read more ...