---
blogpost: true
date: Aug 6, 2020
title: Comparing Dask-ML and Ray Tune's Model Selection Algorithms
author: <a href="https://stsievert.com">Scott Sievert</a> (University of Wisconsin–Madison)
tags: machine-learning, dask-ml, dask, ray
description: Modern hyperparameter optimizations, Scikit-Learn support, framework support and scaling to many machines.
---

Hyperparameter optimization is the process of deducing model parameters that
can't be learned from data. This process is often time- and resource-consuming,
especially in the context of deep learning. A good description of this process
can be found at "[Tuning the hyper-parameters of an estimator][tuning]," and
the issues that arise are concisely summarized in Dask-ML's documentation of
"[Hyper Parameter Searches]."

[hyper parameter searches]: https://ml.dask.org/hyper-parameter-search.html

There's a host of libraries and frameworks out there to address this problem.
[Scikit-Learn's module][skl-ms] has been mirrored [in Dask-ML][dml-ms] and
[auto-sklearn], both of which offer advanced hyperparameter optimization
techniques. Other implementations that don't follow the Scikit-Learn interface
include [Ray Tune], [AutoML] and [Optuna].

[automl]: https://www.automl.org/
[auto-sklearn]: https://automl.github.io/auto-sklearn/master/

[Ray] recently provided a wrapper to [Ray Tune] that mirrors the Scikit-Learn
API called tune-sklearn ([docs][rts-docs], [source][rts-source]). [The introduction][1] of this library
states the following:

[rts-docs]: https://docs.ray.io/en/master/tune/api_docs/sklearn.html
[rts-source]: https://github.com/ray-project/tune-sklearn
[ray]: https://docs.ray.io
[ray tune]: https://docs.ray.io/en/master/tune.html
[dml-ms]: https://ml.dask.org/hyper-parameter-search.html
[skl-ms]: https://scikit-learn.org/stable/modules/grid_search.html
[tsne]: https://distill.pub/2016/misread-tsne/
[tuning]: https://scikit-learn.org/stable/modules/grid_search.html
[tsne]: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
[1]: https://medium.com/distributed-computing-with-ray/gridsearchcv-2-0-new-and-improved-ee56644cbabf

> Cutting edge hyperparameter tuning techniques (Bayesian optimization, early
> stopping, distributed execution) can provide significant speedups over grid
> search and random search.
>
> However, the machine learning ecosystem is missing a solution that provides
> users with the ability to leverage these new algorithms while allowing users
> to stay within the Scikit-Learn API. In this blog post, we introduce
> tune-sklearn [Ray's tuning library] to bridge this gap. Tune-sklearn is a
> drop-in replacement for Scikit-Learn's model selection module with
> state-of-the-art optimization features.
>
> —[GridSearchCV 2.0 — New and Improved][1]

This claim is inaccurate: for over a year Dask-ML has provided access to
"cutting edge hyperparameter tuning techniques" with a Scikit-Learn compatible
API. To correct their statement, let's look at each of the features that Ray's
tune-sklearn provides, and compare them to Dask-ML:

[dml-hod]: https://ml.dask.org/hyper-parameter-search.html

> Here's what [Ray's] tune-sklearn has to offer:
>
> 1. **Consistency with Scikit-Learn API** ...
> 2. **Modern hyperparameter tuning techniques** ...
> 3. **Framework support** ...
> 4. **Scale up** ... [to] multiple cores and even multiple machines.
>
> [Ray's] Tune-sklearn is also **fast**.

Dask-ML's model selection module has every one of the features:

- **Consistency with Scikit-Learn API:** Dask-ML's model selection API
  mirrors the Scikit-Learn model selection API.
- **Modern hyperparameter tuning techniques:** Dask-ML offers state-of-the-art
  hyperparameter tuning techniques.
- **Framework support:** Dask-ML model selection supports many libraries
  including Scikit-Learn, PyTorch, Keras, LightGBM and XGBoost.
- **Scale up:** Dask-ML supports distributed tuning (how could it not?) and
  larger-than-memory datasets.

Dask-ML is also **fast.** In "[Speed](#speed)" we show a benchmark between
Dask-ML, Ray and Scikit-Learn:

<img src="/images/2020-model-selection/n_workers=8.png" width="450px"
 />

Only time-to-solution is relevant; all of these methods produce similar model
scores. See "[Speed](#speed)" for details.

Now, let's walk through the details on how to use Dask-ML to obtain the 5
features above.

### Consistency with the Scikit-Learn API

_Dask-ML is consistent with the Scikit-Learn API._

Here's how to use Scikit-Learn's, Dask-ML's and Ray's tune-sklearn
hyperparameter optimization:

```python
## Trimmed example; see appendix for more detail
from sklearn.model_selection import RandomizedSearchCV
search = RandomizedSearchCV(model, params, ...)
search.fit(X, y)

from dask_ml.model_selection import HyperbandSearchCV
search = HyperbandSearchCV(model, params, ...)
search.fit(X, y, classes=[0, 1])

from tune_sklearn import TuneSearchCV
search = TuneSearchCV(model, params, ...)
search.fit(X, y, classes=[0, 1])
```

The definitions of `model` and `params` follow the normal Scikit-Learn
definitions as detailed in the [appendix](#full-example-usage).

Clearly, both Dask-ML and Ray's tune-sklearn are Scikit-Learn compatible. Now
let's focus on how each search performs and how it's configured.

[skl-eg]: https://scikit-learn.org/stable/auto_examples/model_selection/plot_randomized_search.html#sphx-glr-auto-examples-model-selection-plot-randomized-search-py
[sgd-eg]: https://github.com/ray-project/tune-sklearn/blob/31f228e21ef632a89a74947252d8ad5323cbd043/examples/sgd.py

### Modern hyperparameter tuning techniques

_Dask-ML offers state-of-the-art hyperparameter tuning techniques
in a Scikit-Learn interface._

[The introduction][1] of Ray's tune-sklearn made this claim:

> tune-sklearn is the only
> Scikit-Learn interface that allows you to easily leverage Bayesian
> Optimization, HyperBand and other optimization techniques by simply toggling a few parameters.

The state-of-the-art in hyperparameter optimization is currently
"[Hyperband][hyperband-paper]." Hyperband reduces the amount of computation
required with a _principled_ early stopping scheme; past that, it's the same as
Scikit-Learn's popular `RandomizedSearchCV`.

Hyperband _works._ As such, it's very popular. After the introduction of
Hyperband in 2016 by Li et. al, [the paper][hyperband-paper] has been cited
[over 470 times][470] and has been implemented in many different libraries
including [Dask-ML][hscv], [Ray Tune][rt-hb], [keras-tune], [Optuna],
[AutoML],[^automl] and [Microsoft's NNI][nni]. The original paper shows a
rather drastic improvement over all the relevant
implementations,[^hyperband-figs] and this drastic improvement persists in
follow-up works.[^follow-up] Some illustrative results from Hyperband are
below:

<img width="80%" src="/images/2020-model-selection/hyperband-fig-7-8.png"
 style="display: block; margin-left: auto; margin-right: auto;" />

<div style="max-width: 80%; word-wrap: break-word;" style="text-align: center;">

<sup>All algorithms are configured to do the same amount of work except "random
2x" which does twice as much work. "hyperband (finite)" is similar Dask-ML's
default implementation, and "bracket s=4" is similar to Ray's default
implementation. "random" is a random search. SMAC,[^smac]
spearmint,[^spearmint] and TPE[^tpe] are popular Bayesian algorithms. </sup>

</div>

Hyperband is undoubtedly a "cutting edge" hyperparameter optimization
technique. Dask-ML and Ray offer Scikit-Learn implementations of this algorithm
that rely on similar implementations, and Dask-ML's implementation also has a
[rule of thumb] for configuration. Both Dask-ML's and Ray's documentation
encourages use of Hyperband.

Ray does support using their Hyperband implementation on top of a technique
called Bayesian sampling. This changes the hyperparameter sampling scheme for
model initialization. This can be used in conjunction with Hyperband's early
stopping scheme. Adding this option to Dask-ML's Hyperband implementation is
future work for Dask-ML.

[^bohb-exps]: Details are in "[BOHB: Robust and Efficient Hyperparameter Optimization at Scale][bohb paper]."
[^nlp-future]: Future work is combining this with the Dask-ML's Hyperband implementation.

[rule of thumb]: https://ml.dask.org/hyper-parameter-search.html#hyperband-parameters-rule-of-thumb
[hbscv]: https://ml.dask.org/modules/generated/dask_ml.model_selection.HyperbandSearchCV.html
[tscv]: https://docs.ray.io/en/master/tune/api_docs/sklearn.html
[the conclusion]: #conclusion
[hyperband-paper]: https://arxiv.org/pdf/1603.06560.pdf
[470]: https://scholar.google.com/scholar?cites=10473284631669296057&as_sdt=5,39&sciodt=0,39&hl=en
[hscv]: https://ml.dask.org/modules/generated/dask_ml.model_selection.HyperbandSearchCV.html#dask_ml.model_selection.HyperbandSearchCV
[optuna]: https://medium.com/optuna/optuna-supports-hyperband-93b0cae1a137
[keras-tune]: https://keras-team.github.io/keras-tuner/documentation/tuners/#hyperband-class

[^automl]: Their implementation of Hyperband in [HpBandSter] is included in [Auto-PyTorch] and [BOAH].
[^follow-up]: See Figure 1 of [the BOHB paper][bohb paper] and [a paper][blippar] from an augmented reality company.
[^hyperband-figs]: See Figures 4, 7 and 8 in "[Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization][hyperband-paper]."
[^openai]: Computing [n-grams] requires a ton of memory and computation. For OpenAI, NLP preprocessing took 8 GPU-months! ([source][gpt])

[gpt]: https://openai.com/blog/language-unsupervised/#drawbacks
[avoiding repeated work]: https://ml.dask.org/hyper-parameter-search.html#avoid-repeated-work
[boah]: https://github.com/automl/BOAH
[auto-pytorch]: https://www.automl.org/wp-content/uploads/2018/09/chapter7-autonet.pdf
[hpbandster]: https://github.com/automl/HpBandSter
[blippar]: https://arxiv.org/pdf/1801.01596.pdf
[bohb paper]: http://proceedings.mlr.press/v80/falkner18a/falkner18a.pdf
[n-grams]: https://en.wikipedia.org/wiki/N-gram

[^smac]: SMAC is described in "[Sequential Model-Based Optimization forGeneral Algorithm Configuration][smac-paper]," and is available [in AutoML][smac-automl].
[^spearmint]: Spearmint is described in "[Practical Bayesian Optimization of MachineLearning Algorithms][pbo]," and is available in [HIPS/spearmint].
[^tpe]: TPE is described in Section 4 of "[Algorithms for Hyperparameter Optimization][aho]," and is available [through Hyperopt][hyperopt].

[smac-paper]: https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf
[smac-automl]: https://www.automl.org/automated-algorithm-design/algorithm-configuration/smac/
[spearmint]: https://github.com/HIPS/Spearmint
[aho]: http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
[pbo]: https://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf
[hips/spearmint]: https://github.com/HIPS/Spearmint
[hyperopt]: http://hyperopt.github.io/hyperopt/

[^stopping]: Hyperband's theory answers "how many models should be stopped?" and "when should they be stopped?"

[nni]: https://nni.readthedocs.io/en/latest/Tuner/HyperbandAdvisor.html
[rt-hb]: https://docs.ray.io/en/master/tune/api_docs/schedulers.html#asha-tune-schedulers-ashascheduler
[rts-hb]: https://docs.ray.io/en/master/tune/api_docs/sklearn.html

### Framework support

_Dask-ML model selection supports many libraries including Scikit-Learn, PyTorch, Keras, LightGBM and XGBoost._

Ray's tune-sklearn supports these frameworks:

> tune-sklearn is used primarily for tuning
> Scikit-Learn models, but it also supports and provides examples for many
> other frameworks with Scikit-Learn wrappers such as Skorch (Pytorch),
> KerasClassifiers (Keras), and XGBoostClassifiers (XGBoost).

Clearly, both Dask-ML and Ray support the many of the same libraries.

However, both Dask-ML and Ray have some qualifications. Certain libraries don't
offer an implementation of `partial_fit`,[^ray-pf] so not all of the modern
hyperparameter optimization techniques can be offered. Here's a table comparing
different libraries and their support in Dask-ML's model selection and Ray's
tune-sklearn:

|      Model Library       | Dask-ML support | Ray support | Dask-ML: early stopping? | Ray: early stopping? |
| :----------------------: | :-------------: | :---------: | :----------------------: | :------------------: |
|      [Scikit-Learn]      |        ✔        |      ✔      |           ✔\*            |         ✔\*          |
| [PyTorch] (via [Skorch]) |        ✔        |      ✔      |            ✔             |          ✔           |
| [Keras] (via [SciKeras]) |        ✔        |      ✔      |          ✔\*\*           |        ✔\*\*         |
|        [LightGBM]        |        ✔        |      ✔      |            ❌            |          ❌          |
|        [XGBoost]         |        ✔        |      ✔      |            ❌            |          ❌          |

<sup>\* Only for [the models that implement `partial_fit`][skl-il].</sup><br>
<sup>\*\* Thanks to work by the Dask developers around [scikeras#24].</sup>

[scikeras#24]: https://github.com/adriangb/scikeras/issues/24
[skl-il]: https://scikit-learn.org/stable/modules/computing.html#incremental-learning
[scikit-learn]: https://scikit-learn.org/
[xgboost]: https://xgboost.ai/
[lightgbm]: https://lightgbm.readthedocs.io/
[pytorch]: https://pytorch.org/
[keras]: https://keras.io/
[scikeras]: https://github.com/adriangb/scikeras
[skorch]: https://skorch.readthedocs.io/

By this measure, Dask-ML and Ray model selection have the same level of
framework support. Of course, Dask has tangential integration with LightGBM and
XGBoost through [Dask-ML's `xgboost` module][dmlxg] and [dask-lightgbm][dml-lg].

[^ray-pf]: From [Ray's README.md]: "If the estimator does not support `partial_fit`, a warning will be shown saying early stopping cannot be done and it will simply run the cross-validation on Ray's parallel back-end."

[dml-lg]: https://github.com/dask/dask-lightgbm
[dmlxg]: https://ml.dask.org/xgboost.html
[ray's readme.md]: https://github.com/ray-project/tune-sklearn/blob/31f228e21ef632a89a74947252d8ad5323cbd043/README.md

### Scale up

_Dask-ML supports distributed tuning (how could it not?), aka parallelization
across multiple machines/cores. In addition, it also supports
larger-than-memory data._

> [Ray's] Tune-sklearn leverages Ray Tune, a library for distributed
> hyperparameter tuning, to efficiently and transparently parallelize cross
> validation on multiple cores and even multiple machines.

Naturally, Dask-ML also scales to multiple cores/machines because it relies on
Dask. Dask has wide support for [different deployment options][ddo] that span
from your personal machine to supercomputers. Dask will very likely work on top
of any computing system you have available, including Kubernetes, SLURM, YARN
and Hadoop clusters as well as your personal machine.

Dask-ML's model selection also scales to larger-than-memory datasets, and is
thoroughly tested. Support for larger-than-memory data is untested in Ray, and
there are no examples detailing how to use Ray Tune with the distributed
dataset implementations in PyTorch/Keras.

[ray-trainable]: https://medium.com/rapids-ai/30x-faster-hyperparameter-search-with-raytune-and-rapids-403013fbefc5

In addition, I have benchmarked Dask-ML's model selection module to see how the
time-to-solution is affected by the number of Dask workers in "[Better and
faster hyperparameter optimization with Dask][db-bf]." That is, how does the
time to reach a particular accuracy scale with the number of workers $P$? At
first, it'll scale like $1/P$ but with large number of workers the serial
portion will dictate time to solution according to [Amdahl's Law]. Briefly, I
found Dask-ML's `HyperbandSearchCV` speedup started to saturate around 24
workers for a particular search.

[^bohb-parallel]: In Section 4.2 of [their paper](http://proceedings.mlr.press/v80/falkner18a/falkner18a.pdf).

[db-bf]: https://blog.dask.org/2019/09/30/dask-hyperparam-opt
[amdahl's law]: https://en.wikipedia.org/wiki/Amdahl%27s_law
[ddo]: https://docs.dask.org/en/latest/setup.html
[plg]: https://lightgbm.readthedocs.io/en/latest/Parallel-Learning-Guide.html

### Speed

_Both Dask-ML and Ray are much faster than Scikit-Learn._

Ray's tune-sklearn runs some benchmarks in [the introduction][1] with the
`GridSearchCV` class found in Scikit-Learn and Dask-ML. A more fair benchmark
would be use Dask-ML's `HyperbandSearchCV` because it is almost the same as the
algorithm in Ray's tune-sklearn. To be specific, I'm interested in comparing
these methods:

- Scikit-Learn's `RandomizedSearchCV`. This is a popular implementation, one
  that I've bootstrapped myself with a custom model.
- Dask-ML's `HyperbandSearchCV`. This is an early stopping technique for
  `RandomizedSearchCV`.
- Ray tune-sklearn's `TuneSearchCV`. This is a slightly different early
  stopping technique than `HyperbandSearchCV`'s.

Each search is configured to perform the same task: sample 100 parameters and
train for no longer than 100 "epochs" or passes through the
data.[^random-search] Each estimator is configured as their respective
documentation suggests. Each search uses 8 workers with a single cross
validation split, and a `partial_fit` call takes one second with 50,000
examples. The complete setup can be found in [the appendix](#appendix).

Here's how long each library takes to complete the same search:

<img src="/images/2020-model-selection/n_workers=8.png" width="450px"
 />

Notably, we didn't improve the Dask-ML codebase for this benchmark, and ran the
code as it's been for the last year.[^priority-impl] Regardless, it's possible that
other artifacts from [biased benchmarks][bb] crept into this benchmark.

[^priority-impl]: Despite a relevant implementation in [dask-ml#527].

[dask-ml#527]: https://github.com/dask/dask-ml/pull/527

Clearly, Ray and Dask-ML offer similar performance for 8 workers when compared
with Scikit-Learn. To Ray's credit, their implementation is ~15% faster than
Dask-ML's with 8 workers. We suspect that this performance boost comes from the
fact that Ray implements an asynchronous variant of Hyperband. We should
investigate this difference between Dask and Ray, and how each balances the
tradeoffs, number FLOPs vs. time-to-solution. This will vary with the number
of workers: the asynchronous variant of Hyperband provides no benefit if used
with a single worker.

Dask-ML reaches scores quickly in serial environments, or when the number of
workers is small. Dask-ML prioritizes fitting high scoring models: if there are
100 models to fit and only 4 workers available, Dask-ML selects the models that
have the highest score. This is most relevant in serial
environments;[^priority] see "[Better and faster hyperparameter optimization
with Dask][db-bf]" for benchmarks. This feature is omitted from this
benchmark, which only focuses on time to solution.

[bb]: http://matthewrocklin.com/blog/work/2017/03/09/biased-benchmarks
[biased-benchmarks]: http://matthewrocklin.com/blog/work/2017/03/09/biased-benchmarks

[^priority]: Because priority is meaningless if there are an infinite number of workers.

[sts-dhc]: https://github.com/stsievert/dask-hyperband-comparison

[^random-search]: I choose to benchmark random searches instead of grid searches because random searches produce better results because grid searches require estimating how important each parameter is; for more detail see "[Random Search for Hyperparameter Optimization][rsho]" by Bergstra and Bengio.

[rsho]: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
[summit]: https://en.wikipedia.org/wiki/Summit_(supercomputer)#Design

## Conclusion

Dask-ML and Ray offer the same features for model selection: state-of-the-art
features with a Scikit-Learn compatible API, and both implementations have
fairly wide support for different frameworks and rely on backends that can
scale to many machines.

In addition, the Ray implementation has provided motivation for further
development, specifically on the following items:

1. **Adding support for more libraries, including Keras** ([dask-ml#696][696],
   [dask-ml#713][713], [scikeras#24]). SciKeras is a Scikit-Learn wrapper for
   Keras that (now) works with Dask-ML model selection because SciKeras models
   implement the Scikit-Learn model API.
2. **Better documenting the models that Dask-ML supports**
   ([dask-ml#699][699]). Dask-ML supports any model that implement the
   Scikit-Learn interface, and there are wrappers for Keras, PyTorch, LightGBM
   and XGBoost. Now, [Dask-ML's documentation] prominently highlights this
   fact.

The Ray implementation has also helped motivate and clarify future work.
Dask-ML should include the following implementations:

1. **A Bayesian sampling scheme for the Hyperband implementation** that's
   similar to Ray's and BOHB's ([dask-ml#697][697]).
2. **A configuration of `HyperbandSearchCV` that's well-suited for
   exploratory hyperparameter searches.** An initial implementation is in
   [dask-ml#532][532], which should be benchmarked against Ray.

Luckily, all of these pieces of development are straightforward modifications
because the Dask-ML model selection framework is pretty flexible.

[dask-ml's documentation]: https://ml.dask.org
[699]: https://github.com/dask/dask-ml/pull/699
[scikit-learn#3299]: https://github.com/Scikit-Learn/Scikit-Learn/issues/3299
[tensorflow#39609]: https://github.com/tensorflow/tensorflow/pull/39609
[scikeras#17]: https://github.com/adriangb/scikeras/pull/17
[697]: https://github.com/dask/dask-ml/issues/697
[532]: https://github.com/dask/dask-ml/pull/532
[696]: https://github.com/dask/dask-ml/issues/696
[713]: https://github.com/dask/dask-ml/pull/713

_Thank you [Tom Augspurger], [Matthew Rocklin], [Julia Signell], and [Benjamin
Zaitlen] for your feedback, suggestions and edits._

[julia signell]: https://github.com/jsignell
[tom augspurger]: https://github.com/TomAugspurger
[matthew rocklin]: https://github.com/mrocklin
[benjamin zaitlen]: https://github.com/quasiben

## Appendix

### Benchmark setup

This is the complete setup for the benchmark between Dask-ML, Scikit-Learn and
Ray. Complete details can be found at
[stsievert/dask-hyperband-comparison][sts-dhc].

Let's create a dummy model that takes 1 second for a `partial_fit` call with
50,000 examples. This is appropriate for this benchmark; we're only interested
in the time required to finish the search, not how well the models do.
Scikit-learn, Ray and Dask-ML have have very similar methods of choosing
hyperparameters to evaluate; they differ in their early stopping techniques.

```python
from scipy.stats import uniform
from sklearn.model_selection import make_classification
from benchmark import ConstantFunction  # custom module

# This model sleeps for `latency * len(X)` seconds before
# reporting a score of `value`.
model = ConstantFunction(latency=1 / 50e3, max_iter=max_iter)

params = {"value": uniform(0, 1)}
# This dummy dataset mirrors the MNIST dataset
X_train, y_train = make_classification(n_samples=int(60e3), n_features=784)
```

This model will take 2 minutes to train for 100 epochs (aka passes through the
data). Details can be found at [stsievert/dask-hyperband-comparison][sts-dhc].

Let's configure our searches to use 8 workers with a single cross-validation
split:

```python
from sklearn.model_selection import RandomizedSearchCV, ShuffleSplit
split = ShuffleSplit(test_size=0.2, n_splits=1)
kwargs = dict(cv=split, refit=False)

search = RandomizedSearchCV(model, params, n_jobs=8, n_iter=n_params, **kwargs)
search.fit(X_train, y_train)  # 20.88 minutes

from dask_ml.model_selection import HyperbandSearchCV
dask_search = HyperbandSearchCV(
    model, params, test_size=0.2, max_iter=max_iter, aggressiveness=4
)

from tune_sklearn import TuneSearchCV
ray_search = TuneSearchCV(
    model, params, n_iter=n_params, max_iters=max_iter, early_stopping=True, **kwargs
)

dask_search.fit(X_train, y_train)  # 2.93 minutes
ray_search.fit(X_train, y_train)  # 2.49 minutes
```

### Full example usage

```python
from sklearn.linear_model import SGDClassifier
from scipy.stats import uniform, loguniform
from sklearn.datasets import make_classification
model = SGDClassifier()
params = {"alpha": loguniform(1e-5, 1e-3), "l1_ratio": uniform(0, 1)}
X, y = make_classification()

from sklearn.model_selection import RandomizedSearchCV
search = RandomizedSearchCV(model, params, ...)
search.fit(X, y)

from dask_ml.model_selection import HyperbandSearchCV
HyperbandSearchCV(model, params, ...)
search.fit(X, y, classes=[0, 1])

from tune_sklearn import TuneSearchCV
search = TuneSearchCV(model, params, ...)
search.fit(X, y, classes=[0, 1])
```

---