<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://blog.dask.org</id>
  <title>Dask Working Notes - Posts by &lt;a href="https://stsievert.com"&gt;Scott Sievert&lt;/a&gt; (University of Wisconsin–Madison)</title>
  <updated>2026-03-05T15:05:19.410300+00:00</updated>
  <link href="https://blog.dask.org"/>
  <link href="https://blog.dask.org/blog/author/a-hrefhttpsstsievertcomscott-sieverta-university-of-wisconsinmadison/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.io/" version="0.11.12">ABlog</generator>
  <entry>
    <id>https://blog.dask.org/2020/08/06/ray-tune/</id>
    <title>Comparing Dask-ML and Ray Tune's Model Selection Algorithms</title>
    <updated>2020-08-06T00:00:00+00:00</updated>
    <author>
      <name>&lt;a href="https://stsievert.com"&gt;Scott Sievert&lt;/a&gt; (University of Wisconsin–Madison)</name>
    </author>
    <content type="html">&lt;p&gt;Hyperparameter optimization is the process of deducing model parameters that
can’t be learned from data. This process is often time- and resource-consuming,
especially in the context of deep learning. A good description of this process
can be found at “&lt;a class="reference external" href="https://scikit-learn.org/stable/modules/grid_search.html"&gt;Tuning the hyper-parameters of an estimator&lt;/a&gt;,” and
the issues that arise are concisely summarized in Dask-ML’s documentation of
“&lt;a class="reference external" href="https://ml.dask.org/hyper-parameter-search.html"&gt;Hyper Parameter Searches&lt;/a&gt;.”&lt;/p&gt;
&lt;p&gt;There’s a host of libraries and frameworks out there to address this problem.
&lt;a class="reference external" href="https://scikit-learn.org/stable/modules/grid_search.html"&gt;Scikit-Learn’s module&lt;/a&gt; has been mirrored &lt;a class="reference external" href="https://ml.dask.org/hyper-parameter-search.html"&gt;in Dask-ML&lt;/a&gt; and
&lt;a class="reference external" href="https://automl.github.io/auto-sklearn/master/"&gt;auto-sklearn&lt;/a&gt;, both of which offer advanced hyperparameter optimization
techniques. Other implementations that don’t follow the Scikit-Learn interface
include &lt;a class="reference external" href="https://docs.ray.io/en/master/tune.html"&gt;Ray Tune&lt;/a&gt;, &lt;a class="reference external" href="https://www.automl.org/"&gt;AutoML&lt;/a&gt; and &lt;a class="reference external" href="https://medium.com/optuna/optuna-supports-hyperband-93b0cae1a137"&gt;Optuna&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.ray.io"&gt;Ray&lt;/a&gt; recently provided a wrapper to &lt;a class="reference external" href="https://docs.ray.io/en/master/tune.html"&gt;Ray Tune&lt;/a&gt; that mirrors the Scikit-Learn
API called tune-sklearn (&lt;a class="reference external" href="https://docs.ray.io/en/master/tune/api_docs/sklearn.html"&gt;docs&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/ray-project/tune-sklearn"&gt;source&lt;/a&gt;). &lt;a class="reference external" href="https://medium.com/distributed-computing-with-ray/gridsearchcv-2-0-new-and-improved-ee56644cbabf"&gt;The introduction&lt;/a&gt; of this library
states the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Cutting edge hyperparameter tuning techniques (Bayesian optimization, early
stopping, distributed execution) can provide significant speedups over grid
search and random search.&lt;/p&gt;
&lt;p&gt;However, the machine learning ecosystem is missing a solution that provides
users with the ability to leverage these new algorithms while allowing users
to stay within the Scikit-Learn API. In this blog post, we introduce
tune-sklearn [Ray’s tuning library] to bridge this gap. Tune-sklearn is a
drop-in replacement for Scikit-Learn’s model selection module with
state-of-the-art optimization features.&lt;/p&gt;
&lt;p&gt;—&lt;a class="reference external" href="https://medium.com/distributed-computing-with-ray/gridsearchcv-2-0-new-and-improved-ee56644cbabf"&gt;GridSearchCV 2.0 — New and Improved&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;This claim is inaccurate: for over a year Dask-ML has provided access to
“cutting edge hyperparameter tuning techniques” with a Scikit-Learn compatible
API. To correct their statement, let’s look at each of the features that Ray’s
tune-sklearn provides, and compare them to Dask-ML:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Here’s what [Ray’s] tune-sklearn has to offer:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consistency with Scikit-Learn API&lt;/strong&gt; …&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Modern hyperparameter tuning techniques&lt;/strong&gt; …&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Framework support&lt;/strong&gt; …&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scale up&lt;/strong&gt; … [to] multiple cores and even multiple machines.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;[Ray’s] Tune-sklearn is also &lt;strong&gt;fast&lt;/strong&gt;.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Dask-ML’s model selection module has every one of these features:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consistency with Scikit-Learn API:&lt;/strong&gt; Dask-ML’s model selection API
mirrors the Scikit-Learn model selection API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Modern hyperparameter tuning techniques:&lt;/strong&gt; Dask-ML offers state-of-the-art
hyperparameter tuning techniques.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Framework support:&lt;/strong&gt; Dask-ML model selection supports many libraries
including Scikit-Learn, PyTorch, Keras, LightGBM and XGBoost.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scale up:&lt;/strong&gt; Dask-ML supports distributed tuning (how could it not?) and
larger-than-memory datasets.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Dask-ML is also &lt;strong&gt;fast.&lt;/strong&gt; In “&lt;a class="reference internal" href="#speed"&gt;&lt;span class="xref myst"&gt;Speed&lt;/span&gt;&lt;/a&gt;” we show a benchmark between
Dask-ML, Ray and Scikit-Learn:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/2020-model-selection/n_workers=8.png" width="450px"
 /&gt;&lt;/p&gt;
&lt;p&gt;Only time-to-solution is relevant; all of these methods produce similar model
scores. See “&lt;a class="reference internal" href="#speed"&gt;&lt;span class="xref myst"&gt;Speed&lt;/span&gt;&lt;/a&gt;” for details.&lt;/p&gt;
&lt;p&gt;Now, let’s walk through how Dask-ML provides each of the five features
above.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2020/08/06/ray-tune.md&lt;/span&gt;, line 95)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;section id="consistency-with-the-scikit-learn-api"&gt;

&lt;p&gt;&lt;em&gt;Dask-ML is consistent with the Scikit-Learn API.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Here’s how to use Scikit-Learn’s, Dask-ML’s and Ray’s tune-sklearn
hyperparameter optimization:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;## Trimmed example; see appendix for more detail&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomizedSearchCV&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RandomizedSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_ml.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HyperbandSearchCV&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HyperbandSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;tune_sklearn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TuneSearchCV&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TuneSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The definitions of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;model&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;params&lt;/span&gt;&lt;/code&gt; follow the normal Scikit-Learn
definitions as detailed in the &lt;a class="reference internal" href="#full-example-usage"&gt;&lt;span class="xref myst"&gt;appendix&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Clearly, both Dask-ML and Ray’s tune-sklearn are Scikit-Learn compatible. Now
let’s focus on how each search performs and how it’s configured.&lt;/p&gt;
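As a sketch of what the `model` and `params` definitions might look like, here is a runnable Scikit-Learn version. The `SGDClassifier` and the parameter ranges below are illustrative assumptions, not the exact definitions from this post’s appendix:

```python
# Illustrative sketch only: this model and these parameter ranges are
# assumptions, not the exact definitions from the appendix.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1_000, random_state=0)

model = SGDClassifier(penalty="elasticnet", tol=1e-3, random_state=0)
params = {
    "alpha": loguniform(1e-5, 1e-1),  # regularization strength
    "l1_ratio": [0.0, 0.5, 1.0],      # elastic-net mixing parameter
}

# The same model and params objects can be passed unchanged to Dask-ML's
# HyperbandSearchCV or Ray's TuneSearchCV, which is the point of the
# shared Scikit-Learn API.
search = RandomizedSearchCV(model, params, n_iter=5, cv=3, random_state=0)
search.fit(X, y)
print(sorted(search.best_params_))
```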
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2020/08/06/ray-tune.md&lt;/span&gt;, line 126)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="modern-hyperparameter-tuning-techniques"&gt;
&lt;h1&gt;Modern hyperparameter tuning techniques&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;Dask-ML offers state-of-the-art hyperparameter tuning techniques
in a Scikit-Learn interface.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://medium.com/distributed-computing-with-ray/gridsearchcv-2-0-new-and-improved-ee56644cbabf"&gt;The introduction&lt;/a&gt; of Ray’s tune-sklearn made this claim:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;tune-sklearn is the only
Scikit-Learn interface that allows you to easily leverage Bayesian
Optimization, HyperBand and other optimization techniques by simply toggling a few parameters.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;The state-of-the-art in hyperparameter optimization is currently
“&lt;a class="reference external" href="https://arxiv.org/pdf/1603.06560.pdf"&gt;Hyperband&lt;/a&gt;.” Hyperband reduces the amount of computation
required with a &lt;em&gt;principled&lt;/em&gt; early stopping scheme; past that, it’s the same as
Scikit-Learn’s popular &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;RandomizedSearchCV&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Hyperband &lt;em&gt;works.&lt;/em&gt; As such, it’s very popular. After the introduction of
Hyperband in 2016 by Li et al., &lt;a class="reference external" href="https://arxiv.org/pdf/1603.06560.pdf"&gt;the paper&lt;/a&gt; has been cited
&lt;a class="reference external" href="https://scholar.google.com/scholar?cites=10473284631669296057&amp;amp;as_sdt=5,39&amp;amp;sciodt=0,39&amp;amp;hl=en"&gt;over 470 times&lt;/a&gt; and has been implemented in many different libraries
including &lt;a class="reference external" href="https://ml.dask.org/modules/generated/dask_ml.model_selection.HyperbandSearchCV.html#dask_ml.model_selection.HyperbandSearchCV"&gt;Dask-ML&lt;/a&gt;, &lt;a class="reference external" href="https://docs.ray.io/en/master/tune/api_docs/schedulers.html#asha-tune-schedulers-ashascheduler"&gt;Ray Tune&lt;/a&gt;, &lt;a class="reference external" href="https://keras-team.github.io/keras-tuner/documentation/tuners/#hyperband-class"&gt;keras-tune&lt;/a&gt;, &lt;a class="reference external" href="https://medium.com/optuna/optuna-supports-hyperband-93b0cae1a137"&gt;Optuna&lt;/a&gt;,
&lt;a class="reference external" href="https://www.automl.org/"&gt;AutoML&lt;/a&gt;,&lt;a class="footnote-reference brackets" href="#automl" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and &lt;a class="reference external" href="https://nni.readthedocs.io/en/latest/Tuner/HyperbandAdvisor.html"&gt;Microsoft’s NNI&lt;/a&gt;. The original paper shows a
rather drastic improvement over all the relevant
implementations,&lt;a class="footnote-reference brackets" href="#hyperband-figs" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and this drastic improvement persists in
follow-up works.&lt;a class="footnote-reference brackets" href="#follow-up" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; Some illustrative results from Hyperband are
below:&lt;/p&gt;
&lt;p&gt;&lt;img width="80%" src="/images/2020-model-selection/hyperband-fig-7-8.png"
 style="display: block; margin-left: auto; margin-right: auto;" /&gt;&lt;/p&gt;
&lt;div style="max-width: 80%; word-wrap: break-word;" style="text-align: center;"&gt;
&lt;p&gt;&lt;sup&gt;All algorithms are configured to do the same amount of work except “random
2x,” which does twice as much work. “hyperband (finite)” is similar to Dask-ML’s
default implementation, and “bracket s=4” is similar to Ray’s default
implementation. “random” is a random search. SMAC,&lt;a class="footnote-reference brackets" href="#smac" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;
spearmint,&lt;a class="footnote-reference brackets" href="#spearmint" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and TPE&lt;a class="footnote-reference brackets" href="#tpe" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; are popular Bayesian algorithms. &lt;/sup&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Hyperband is undoubtedly a “cutting edge” hyperparameter optimization
technique. Dask-ML and Ray offer Scikit-Learn implementations of this algorithm
that rely on similar implementations, and Dask-ML’s implementation also has a
&lt;a class="reference external" href="https://ml.dask.org/hyper-parameter-search.html#hyperband-parameters-rule-of-thumb"&gt;rule of thumb&lt;/a&gt; for configuration. Both Dask-ML’s and Ray’s documentation
encourage the use of Hyperband.&lt;/p&gt;
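To make the rule of thumb concrete, here is a small sketch of the arithmetic as I read it from Dask-ML’s documentation; the concrete numbers below are illustrative assumptions, not values from this post:

```python
# Sketch of Dask-ML's documented "rule of thumb" for configuring
# HyperbandSearchCV. The example numbers are illustrative assumptions.
def hyperband_rule_of_thumb(n_examples, n_params):
    """Return (max_iter, chunk_size) for HyperbandSearchCV.

    n_examples: how many examples the best model should train on
                (e.g., 50 passes over the training set).
    n_params:   roughly how many hyperparameter combinations to sample.
    """
    max_iter = n_params                   # pass as max_iter=...
    chunk_size = n_examples // n_params   # examples per partial_fit call
    return max_iter, chunk_size

# Suppose the best model should see 50 passes over 10,000 examples,
# and we want to sample about 100 hyperparameter combinations:
max_iter, chunk_size = hyperband_rule_of_thumb(50 * 10_000, 100)
print(max_iter, chunk_size)  # 100 5000
```

The appeal of this rule of thumb is that both inputs have direct interpretations (compute budget for the best model, and breadth of the search), so no Hyperband-specific knobs need hand-tuning.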
&lt;p&gt;Ray does support layering its Hyperband implementation on top of a technique
called Bayesian sampling, which changes how hyperparameters are drawn for model
initialization and can be used in conjunction with Hyperband’s early stopping
scheme. Adding this option to Dask-ML’s Hyperband implementation is future work
for Dask-ML.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2020/08/06/ray-tune.md&lt;/span&gt;, line 222)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="framework-support"&gt;
&lt;h1&gt;Framework support&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;Dask-ML model selection supports many libraries including Scikit-Learn, PyTorch, Keras, LightGBM and XGBoost.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Ray’s tune-sklearn supports these frameworks:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;tune-sklearn is used primarily for tuning
Scikit-Learn models, but it also supports and provides examples for many
other frameworks with Scikit-Learn wrappers such as Skorch (Pytorch),
KerasClassifiers (Keras), and XGBoostClassifiers (XGBoost).&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Clearly, Dask-ML and Ray support many of the same libraries.&lt;/p&gt;
&lt;p&gt;However, both Dask-ML and Ray have some qualifications. Certain libraries don’t
offer an implementation of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;partial_fit&lt;/span&gt;&lt;/code&gt;,&lt;a class="footnote-reference brackets" href="#ray-pf" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; so not all of the modern
hyperparameter optimization techniques can be offered. Here’s a table comparing
different libraries and their support in Dask-ML’s model selection and Ray’s
tune-sklearn:&lt;/p&gt;
&lt;div class="pst-scrollable-table-container"&gt;&lt;table class="table"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head text-center"&gt;&lt;p&gt;Model Library&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-center"&gt;&lt;p&gt;Dask-ML support&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-center"&gt;&lt;p&gt;Ray support&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-center"&gt;&lt;p&gt;Dask-ML: early stopping?&lt;/p&gt;&lt;/th&gt;
&lt;th class="head text-center"&gt;&lt;p&gt;Ray: early stopping?&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td class="text-center"&gt;&lt;p&gt;&lt;a class="reference external" href="https://scikit-learn.org/"&gt;Scikit-Learn&lt;/a&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔*&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔*&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td class="text-center"&gt;&lt;p&gt;&lt;a class="reference external" href="https://pytorch.org/"&gt;PyTorch&lt;/a&gt; (via &lt;a class="reference external" href="https://skorch.readthedocs.io/"&gt;Skorch&lt;/a&gt;)&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td class="text-center"&gt;&lt;p&gt;&lt;a class="reference external" href="https://keras.io/"&gt;Keras&lt;/a&gt; (via &lt;a class="reference external" href="https://github.com/adriangb/scikeras"&gt;SciKeras&lt;/a&gt;)&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔**&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔**&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td class="text-center"&gt;&lt;p&gt;&lt;a class="reference external" href="https://lightgbm.readthedocs.io/"&gt;LightGBM&lt;/a&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;❌&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;❌&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td class="text-center"&gt;&lt;p&gt;&lt;a class="reference external" href="https://xgboost.ai/"&gt;XGBoost&lt;/a&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;✔&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;❌&lt;/p&gt;&lt;/td&gt;
&lt;td class="text-center"&gt;&lt;p&gt;❌&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;&lt;sup&gt;* Only for &lt;a class="reference external" href="https://scikit-learn.org/stable/modules/computing.html#incremental-learning"&gt;the models that implement &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;partial_fit&lt;/span&gt;&lt;/code&gt;&lt;/a&gt;.&lt;/sup&gt;&lt;br&gt;
&lt;sup&gt;** Thanks to work by the Dask developers around &lt;a class="reference external" href="https://github.com/adriangb/scikeras/issues/24"&gt;scikeras#24&lt;/a&gt;.&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;By this measure, Dask-ML and Ray model selection have the same level of
framework support. Of course, Dask has tangential integration with LightGBM and
XGBoost through &lt;a class="reference external" href="https://ml.dask.org/xgboost.html"&gt;Dask-ML’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;xgboost&lt;/span&gt;&lt;/code&gt; module&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/dask/dask-lightgbm"&gt;dask-lightgbm&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2020/08/06/ray-tune.md&lt;/span&gt;, line 272)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="scale-up"&gt;
&lt;h1&gt;Scale up&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;Dask-ML supports distributed tuning (how could it not?), i.e., parallelization
across multiple cores and machines. It also supports larger-than-memory
data.&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;[Ray’s] Tune-sklearn leverages Ray Tune, a library for distributed
hyperparameter tuning, to efficiently and transparently parallelize cross
validation on multiple cores and even multiple machines.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Naturally, Dask-ML also scales to multiple cores/machines because it relies on
Dask. Dask has wide support for &lt;a class="reference external" href="https://docs.dask.org/en/latest/setup.html"&gt;different deployment options&lt;/a&gt; that span
from your personal machine to supercomputers. Dask will very likely work on top
of any computing system you have available, including Kubernetes, SLURM, YARN
and Hadoop clusters as well as your personal machine.&lt;/p&gt;
&lt;p&gt;Dask-ML’s model selection also scales to larger-than-memory datasets, and is
thoroughly tested. Support for larger-than-memory data is untested in Ray, and
there are no examples detailing how to use Ray Tune with the distributed
dataset implementations in PyTorch/Keras.&lt;/p&gt;
&lt;p&gt;In addition, I have benchmarked Dask-ML’s model selection module to see how the
time-to-solution is affected by the number of Dask workers in “&lt;a class="reference external" href="https://blog.dask.org/2019/09/30/dask-hyperparam-opt"&gt;Better and
faster hyperparameter optimization with Dask&lt;/a&gt;.” That is, how does the
time to reach a particular accuracy scale with the number of workers &lt;span class="math notranslate nohighlight"&gt;\(P\)&lt;/span&gt;? At
first, it’ll scale like &lt;span class="math notranslate nohighlight"&gt;\(1/P\)&lt;/span&gt; but with large number of workers the serial
portion will dictate time to solution according to &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Amdahl%27s_law"&gt;Amdahl’s Law&lt;/a&gt;. Briefly, I
found Dask-ML’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; speedup started to saturate around 24
workers for a particular search.&lt;/p&gt;
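The shape of that saturation can be sketched with Amdahl’s Law. The serial fraction below is an illustrative assumption chosen so the curve flattens near 24 workers, not a measured value from the benchmark:

```python
# Amdahl's Law: speedup with n workers when a fraction p_serial of the
# work cannot be parallelized. The 4% serial fraction is an assumption
# for illustration, not a measurement of HyperbandSearchCV.
def amdahl_speedup(p_serial, n_workers):
    return 1.0 / (p_serial + (1.0 - p_serial) / n_workers)

p_serial = 0.04
for n in [1, 8, 24, 96]:
    print(n, round(amdahl_speedup(p_serial, n), 1))

# The speedup can never exceed 1 / p_serial (here, 25x), no matter how
# many workers are added, which is why time-to-solution saturates.
```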
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2020/08/06/ray-tune.md&lt;/span&gt;, line 311)&lt;/p&gt;
&lt;p&gt;Document headings start at H3, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="speed"&gt;
&lt;h1&gt;Speed&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;Both Dask-ML and Ray are much faster than Scikit-Learn.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Ray’s tune-sklearn runs some benchmarks in &lt;a class="reference external" href="https://medium.com/distributed-computing-with-ray/gridsearchcv-2-0-new-and-improved-ee56644cbabf"&gt;the introduction&lt;/a&gt; with the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;GridSearchCV&lt;/span&gt;&lt;/code&gt; class found in Scikit-Learn and Dask-ML. A more fair benchmark
would be use Dask-ML’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; because it is almost the same as the
algorithm in Ray’s tune-sklearn. To be specific, I’m interested in comparing
these methods:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Scikit-Learn’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;RandomizedSearchCV&lt;/span&gt;&lt;/code&gt;. This is a popular implementation, one
that I’ve bootstrapped myself with a custom model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dask-ML’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt;. This is an early stopping technique for
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;RandomizedSearchCV&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ray tune-sklearn’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;TuneSearchCV&lt;/span&gt;&lt;/code&gt;. This is a slightly different early
stopping technique than &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt;’s.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each search is configured to perform the same task: sample 100 parameters and
train for no longer than 100 “epochs” or passes through the
data.&lt;a class="footnote-reference brackets" href="#random-search" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; Each estimator is configured as their respective
documentation suggests. Each search uses 8 workers with a single cross
validation split, and a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;partial_fit&lt;/span&gt;&lt;/code&gt; call takes one second with 50,000
examples. The complete setup can be found in &lt;a class="reference internal" href="#appendix"&gt;&lt;span class="xref myst"&gt;the appendix&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
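As a rough sanity check on the scale of this setup, the arithmetic below uses only the numbers just stated; treating one pass through the data as a single one-second partial_fit call, and assuming ideal scaling with no scheduling overhead, are simplifying assumptions:

```python
# Back-of-the-envelope cost of a no-early-stopping random search under
# this benchmark's setup: 100 sampled parameters, up to 100 passes
# through the data, one second of compute per pass, 8 workers.
# Ideal scaling is assumed, so this is a lower bound on wall-clock time.
n_params = 100
n_epochs = 100
seconds_per_epoch = 1.0
n_workers = 8

total_compute = n_params * n_epochs * seconds_per_epoch  # serial seconds
ideal_wall_clock = total_compute / n_workers

print(total_compute)     # 10000.0
print(ideal_wall_clock)  # 1250.0
```

Early stopping techniques like Hyperband win by cutting most of that 10,000 seconds of compute: low-scoring models are stopped long before their 100th pass through the data.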
&lt;p&gt;Here’s how long each library takes to complete the same search:&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/2020-model-selection/n_workers=8.png" width="450px"
 /&gt;&lt;/p&gt;
&lt;p&gt;Notably, we didn’t modify the Dask-ML codebase for this benchmark; we ran the
code as it has stood for the last year.&lt;a class="footnote-reference brackets" href="#priority-impl" id="id9" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;9&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; Still, it’s possible that
artifacts from &lt;a class="reference external" href="http://matthewrocklin.com/blog/work/2017/03/09/biased-benchmarks"&gt;biased benchmarks&lt;/a&gt; crept into this one.&lt;/p&gt;
&lt;p&gt;Clearly, Ray and Dask-ML offer similar performance for 8 workers when compared
with Scikit-Learn. To Ray’s credit, their implementation is ~15% faster than
Dask-ML’s with 8 workers. We suspect that this performance boost comes from the
fact that Ray implements an asynchronous variant of Hyperband. We should
investigate this difference between Dask and Ray, and how each balances the
tradeoff between the number of FLOPs and time-to-solution. This will vary with the number
of workers: the asynchronous variant of Hyperband provides no benefit if used
with a single worker.&lt;/p&gt;
&lt;p&gt;Dask-ML reaches scores quickly in serial environments, or when the number of
workers is small. Dask-ML prioritizes fitting high-scoring models: if there are
100 models to fit and only 4 workers available, Dask-ML selects the models that
have the highest score. This is most relevant in serial
environments;&lt;a class="footnote-reference brackets" href="#priority" id="id10" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;10&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; see “&lt;a class="reference external" href="https://blog.dask.org/2019/09/30/dask-hyperparam-opt"&gt;Better and faster hyperparameter optimization
with Dask&lt;/a&gt;” for benchmarks. This feature is omitted from this
benchmark, which only focuses on time to solution.&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2020/08/06/ray-tune.md&lt;/span&gt;, line 377)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;Dask-ML and Ray offer the same features for model selection: state-of-the-art
hyperparameter tuning techniques behind a Scikit-Learn compatible API. Both
implementations have fairly wide support for different frameworks, and both
rely on backends that can scale to many machines.&lt;/p&gt;
&lt;p&gt;In addition, the Ray implementation has provided motivation for further
development, specifically on the following items:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Adding support for more libraries, including Keras&lt;/strong&gt; (&lt;a class="reference external" href="https://github.com/dask/dask-ml/issues/696"&gt;dask-ml#696&lt;/a&gt;,
&lt;a class="reference external" href="https://github.com/dask/dask-ml/pull/713"&gt;dask-ml#713&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/adriangb/scikeras/issues/24"&gt;scikeras#24&lt;/a&gt;). SciKeras is a Scikit-Learn wrapper for
Keras that (now) works with Dask-ML model selection because SciKeras models
implement the Scikit-Learn model API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Better documenting the models that Dask-ML supports&lt;/strong&gt;
(&lt;a class="reference external" href="https://github.com/dask/dask-ml/pull/699"&gt;dask-ml#699&lt;/a&gt;). Dask-ML supports any model that implement the
Scikit-Learn interface, and there are wrappers for Keras, PyTorch, LightGBM
and XGBoost. Now, &lt;a class="reference external" href="https://ml.dask.org"&gt;Dask-ML’s documentation&lt;/a&gt; prominently highlights this
fact.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The Ray implementation has also helped motivate and clarify future work.
Dask-ML should include the following implementations:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A Bayesian sampling scheme for the Hyperband implementation&lt;/strong&gt; that’s
similar to Ray’s and BOHB’s (&lt;a class="reference external" href="https://github.com/dask/dask-ml/issues/697"&gt;dask-ml#697&lt;/a&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A configuration of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HyperbandSearchCV&lt;/span&gt;&lt;/code&gt; that’s well-suited for
exploratory hyperparameter searches.&lt;/strong&gt; An initial implementation is in
&lt;a class="reference external" href="https://github.com/dask/dask-ml/pull/532"&gt;dask-ml#532&lt;/a&gt;, which should be benchmarked against Ray.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Luckily, all of these pieces of development are straightforward
modifications because Dask-ML’s model selection framework is fairly flexible.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thank you &lt;a class="reference external" href="https://github.com/TomAugspurger"&gt;Tom Augspurger&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/mrocklin"&gt;Matthew Rocklin&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/jsignell"&gt;Julia Signell&lt;/a&gt;, and &lt;a class="reference external" href="https://github.com/quasiben"&gt;Benjamin
Zaitlen&lt;/a&gt; for your feedback, suggestions and edits.&lt;/em&gt;&lt;/p&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2020/08/06/ray-tune.md&lt;/span&gt;, line 427)&lt;/p&gt;
&lt;p&gt;Document headings start at H2, not H1 [myst.header]&lt;/p&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="appendix"&gt;
&lt;h1&gt;Appendix&lt;/h1&gt;
&lt;section id="benchmark-setup"&gt;
&lt;h2&gt;Benchmark setup&lt;/h2&gt;
&lt;p&gt;This is the complete setup for the benchmark between Dask-ML, Scikit-Learn and
Ray. Complete details can be found at
&lt;a class="reference external" href="https://github.com/stsievert/dask-hyperband-comparison"&gt;stsievert/dask-hyperband-comparison&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let’s create a dummy model that takes 1 second for a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;partial_fit&lt;/span&gt;&lt;/code&gt; call with
50,000 examples. This is appropriate for this benchmark; we’re only interested
in the time required to finish the search, not how well the models do.
Scikit-Learn, Ray and Dask-ML have very similar methods of choosing
hyperparameters to evaluate; they differ in their early stopping techniques.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;scipy.stats&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uniform&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_classification&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;benchmark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConstantFunction&lt;/span&gt;  &lt;span class="c1"&gt;# custom module&lt;/span&gt;

&lt;span class="c1"&gt;# This model sleeps for `latency * len(X)` seconds before&lt;/span&gt;
&lt;span class="c1"&gt;# reporting a score of `value`.&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ConstantFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;50e3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;value&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;span class="c1"&gt;# This dummy dataset mirrors the MNIST dataset&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_classification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;60e3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;n_features&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;784&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This model will take 2 minutes to train for 100 epochs (aka passes through the
data). Details can be found at &lt;a class="reference external" href="https://github.com/stsievert/dask-hyperband-comparison"&gt;stsievert/dask-hyperband-comparison&lt;/a&gt;.&lt;/p&gt;
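The quoted training time follows directly from the latency defined above. As a
back-of-the-envelope sanity check (this sketch assumes each epoch touches all
60,000 examples):

```python
# Each partial_fit call sleeps for latency * len(X) seconds, with
# latency = 1 / 50e3 seconds per example (from the setup above).
latency = 1 / 50e3      # seconds per example
n_examples = int(60e3)  # size of the dummy MNIST-like dataset
epochs = 100            # passes through the data

seconds = latency * n_examples * epochs
print(f"{seconds / 60:.0f} minutes")  # → 2 minutes
```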
&lt;p&gt;Let’s configure our searches to use 8 workers with a single cross-validation
split:&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomizedSearchCV&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ShuffleSplit&lt;/span&gt;
&lt;span class="n"&gt;split&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ShuffleSplit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;kwargs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;refit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RandomizedSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 20.88 minutes&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_ml.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HyperbandSearchCV&lt;/span&gt;
&lt;span class="n"&gt;dask_search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HyperbandSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aggressiveness&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;tune_sklearn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TuneSearchCV&lt;/span&gt;
&lt;span class="n"&gt;ray_search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TuneSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;early_stopping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;dask_search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 2.93 minutes&lt;/span&gt;
&lt;span class="n"&gt;ray_search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 2.49 minutes&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
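Relative to the Scikit-Learn baseline, the wall-clock times in the comments
above work out to the following speedups (simple arithmetic on the reported
numbers):

```python
# Wall-clock fit times reported in the benchmark above, in minutes.
times = {"Scikit-Learn": 20.88, "Dask-ML": 2.93, "Ray": 2.49}

baseline = times["Scikit-Learn"]
for library, minutes in times.items():
    # Dask-ML and Ray both finish roughly 7-8x faster than the baseline.
    print(f"{library}: {baseline / minutes:.1f}x speedup over Scikit-Learn")
```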
&lt;/section&gt;
&lt;section id="full-example-usage"&gt;
&lt;h2&gt;Full example usage&lt;/h2&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.linear_model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SGDClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;scipy.stats&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loguniform&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.datasets&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_classification&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SGDClassifier&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;alpha&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;loguniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1e-5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;l1_ratio&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_classification&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomizedSearchCV&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RandomizedSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;dask_ml.model_selection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HyperbandSearchCV&lt;/span&gt;
&lt;span class="n"&gt;HyperbandSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;tune_sklearn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TuneSearchCV&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TuneSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;hr class="docutils" /&gt;
&lt;aside class="system-message"&gt;
&lt;p class="system-message-title"&gt;System Message: WARNING/2 (&lt;span class="docutils literal"&gt;/opt/build/repo/2020/08/06/ray-tune.md&lt;/span&gt;, line 40)&lt;/p&gt;
&lt;p&gt;Duplicate reference definition: TSNE [myst.duplicate_def]&lt;/p&gt;
&lt;/aside&gt;
&lt;hr class="footnotes docutils" /&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="automl" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Their implementation of Hyperband in &lt;a class="reference external" href="https://github.com/automl/HpBandSter"&gt;HpBandSter&lt;/a&gt; is included in &lt;a class="reference external" href="https://www.automl.org/wp-content/uploads/2018/09/chapter7-autonet.pdf"&gt;Auto-PyTorch&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/automl/BOAH"&gt;BOAH&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="hyperband-figs" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;See Figures 4, 7 and 8 in “&lt;a class="reference external" href="https://arxiv.org/pdf/1603.06560.pdf"&gt;Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization&lt;/a&gt;.”&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="follow-up" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;See Figure 1 of &lt;a class="reference external" href="http://proceedings.mlr.press/v80/falkner18a/falkner18a.pdf"&gt;the BOHB paper&lt;/a&gt; and &lt;a class="reference external" href="https://arxiv.org/pdf/1801.01596.pdf"&gt;a paper&lt;/a&gt; from an augmented reality company.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="smac" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id4"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;SMAC is described in “&lt;a class="reference external" href="https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf"&gt;Sequential Model-Based Optimization for General Algorithm Configuration&lt;/a&gt;,” and is available &lt;a class="reference external" href="https://www.automl.org/automated-algorithm-design/algorithm-configuration/smac/"&gt;in AutoML&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="spearmint" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id5"&gt;5&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Spearmint is described in “&lt;a class="reference external" href="https://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf"&gt;Practical Bayesian Optimization of Machine Learning Algorithms&lt;/a&gt;,” and is available in &lt;a class="reference external" href="https://github.com/HIPS/Spearmint"&gt;HIPS/spearmint&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="tpe" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id6"&gt;6&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;TPE is described in Section 4 of “&lt;a class="reference external" href="http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf"&gt;Algorithms for Hyperparameter Optimization&lt;/a&gt;,” and is available &lt;a class="reference external" href="http://hyperopt.github.io/hyperopt/"&gt;through Hyperopt&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="ray-pf" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id7"&gt;7&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;From &lt;a class="reference external" href="https://github.com/ray-project/tune-sklearn/blob/31f228e21ef632a89a74947252d8ad5323cbd043/README.md"&gt;Ray’s README.md&lt;/a&gt;: “If the estimator does not support &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;partial_fit&lt;/span&gt;&lt;/code&gt;, a warning will be shown saying early stopping cannot be done and it will simply run the cross-validation on Ray’s parallel back-end.”&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="random-search" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id8"&gt;8&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;I chose to benchmark random searches instead of grid searches because random searches produce better results: grid searches require estimating in advance how important each parameter is. For more detail, see “&lt;a class="reference external" href="http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf"&gt;Random Search for Hyperparameter Optimization&lt;/a&gt;” by Bergstra and Bengio.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="priority-impl" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id9"&gt;9&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Despite a relevant implementation in &lt;a class="reference external" href="https://github.com/dask/dask-ml/pull/527"&gt;dask-ml#527&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="priority" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id10"&gt;10&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Because priority is meaningless when there are infinitely many workers.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="bohb-exps" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;11&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Details are in “&lt;a class="reference external" href="http://proceedings.mlr.press/v80/falkner18a/falkner18a.pdf"&gt;BOHB: Robust and Efficient Hyperparameter Optimization at Scale&lt;/a&gt;.”&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="nlp-future" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;12&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Future work is combining this with the Dask-ML’s Hyperband implementation.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="openai" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;13&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Computing &lt;a class="reference external" href="https://en.wikipedia.org/wiki/N-gram"&gt;n-grams&lt;/a&gt; requires a ton of memory and computation. For OpenAI, NLP preprocessing took 8 GPU-months! (&lt;a class="reference external" href="https://openai.com/blog/language-unsupervised/#drawbacks"&gt;source&lt;/a&gt;)&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="stopping" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;14&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Hyperband’s theory answers “how many models should be stopped?” and “when should they be stopped?”&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="bohb-parallel" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;15&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;In Section 4.2 of &lt;a class="reference external" href="http://proceedings.mlr.press/v80/falkner18a/falkner18a.pdf"&gt;their paper&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
</content>
    <link href="https://blog.dask.org/2020/08/06/ray-tune/"/>
    <summary>Hyperparameter optimization is the process of deducing model parameters that
can’t be learned from data. This process is often time- and resource-consuming,
especially in the context of deep learning. A good description of this process
can be found at “Tuning the hyper-parameters of an estimator,” and
the issues that arise are concisely summarized in Dask-ML’s documentation of
“Hyper Parameter Searches.”</summary>
    <category term="dask" label="dask"/>
    <category term="dask-ml" label="dask-ml"/>
    <category term="machine-learning" label="machine-learning"/>
    <category term="ray" label="ray"/>
    <published>2020-08-06T00:00:00+00:00</published>
  </entry>
</feed>
