BayesianOptimizer

When a function is expensive to evaluate, or when gradients are not available, optimizing it requires more sophisticated methods than gradient descent. One such method is Bayesian optimization, which is closely related to active learning. In Bayesian optimization, instead of picking queries by maximizing the uncertainty of predictions, the function is evaluated at points where the promise of finding a better value is large. In modAL, these algorithms are implemented with the BayesianOptimizer class, which is a sibling of ActiveLearner. Both are children of the BaseLearner class and share the same interface, although they are used differently. In the following, we briefly review these differences.

Differences from ActiveLearner

Initializing a BayesianOptimizer is syntactically identical to initializing an ActiveLearner, although there are a few important differences.

from modAL.models import BayesianOptimizer
from modAL.acquisition import max_EI
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# a Gaussian process regressor serves as the surrogate model
kernel = Matern(length_scale=1.0)
regressor = GaussianProcessRegressor(kernel=kernel)

optimizer = BayesianOptimizer(
    estimator=regressor,
    query_strategy=max_EI  # expected improvement acquisition function
)

Most importantly, BayesianOptimizer works with a regressor. You can use it with a classifier if the labels are numerical, but the result will be meaningless. Bayesian optimization typically uses a Gaussian process regressor to maintain a hypothesis about the function being optimized and to estimate the expected gain from evaluating the function at a given point. The latter is the task of the acquisition function. (See below for details.)
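
To give an idea of what an acquisition function computes, here is a rough, purely illustrative sketch of expected improvement for a fitted Gaussian process regressor. The function name, the xi trade-off parameter and the use of scipy are assumptions made for illustration; modAL's max_EI handles this for you.

import numpy as np
from scipy.stats import norm

# illustrative sketch of expected improvement (not modAL's implementation)
def expected_improvement(gp, X, y_max, xi=0.01):
    # posterior mean and standard deviation of the surrogate model at the candidate points
    mean, std = gp.predict(X, return_std=True)
    # improvement over the best value observed so far, discounted by the trade-off xi
    improvement = mean - y_max - xi
    with np.errstate(divide='ignore', invalid='ignore'):
        z = improvement / std
        ei = improvement * norm.cdf(z) + std * norm.pdf(z)
        ei[std == 0.0] = 0.0  # no expected gain where the model is certain
    return ei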

The actual optimization loop is identical to the one you would use with the ActiveLearner.

# Bayesian optimization: func is to be optimized
for n_query in range(n_queries):
    query_idx, query_inst = optimizer.query(X)
    optimizer.teach(X[query_idx].reshape(1, -1), func(X[query_idx]).reshape(1, -1))
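
Putting the pieces together, a minimal runnable sketch might look like the one below; the one-dimensional func, the grid of candidate points and the randomly chosen initial evaluations are made up for illustration.

import numpy as np
from modAL.models import BayesianOptimizer
from modAL.acquisition import max_EI
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# hypothetical black-box function to be optimized
def func(X):
    return np.sin(X) * np.exp(-0.1 * X)

# candidate points on a one-dimensional grid
X = np.linspace(0, 10, 500).reshape(-1, 1)

# a few random evaluations to seed the surrogate model
initial_idx = np.random.choice(len(X), size=5, replace=False)
optimizer = BayesianOptimizer(
    estimator=GaussianProcessRegressor(kernel=Matern(length_scale=1.0)),
    query_strategy=max_EI,
    X_training=X[initial_idx], y_training=func(X[initial_idx])
)

# query and evaluate the most promising points
n_queries = 10
for n_query in range(n_queries):
    query_idx, query_inst = optimizer.query(X)
    optimizer.teach(X[query_idx].reshape(1, -1), func(X[query_idx]).reshape(1, -1))

# best point found so far
X_max, y_max = optimizer.get_max()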

Again, the bottleneck in Bayesian optimization is not necessarily the availability of labels: the function to be optimized can take a long time and a lot of money to evaluate. For instance, when optimizing the hyperparameters of a deep neural network, evaluating the accuracy of the model can take days of training. This is a case where Bayesian optimization is very useful. For more details, see for instance this paper.

To see the maximum value so far, use optimizer.get_max():

X_max, y_max = optimizer.get_max()

Acquisition functions

Currently, there are three built-in acquisition functions in the modAL.acquisition module: expected improvement, probability of improvement and upper confidence bound. You can find more information about them on the page Acquisition functions.
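
Swapping the acquisition function only requires passing a different query strategy at initialization. A minimal sketch, reusing the regressor setup from above and assuming the max_PI and max_UCB strategies exported by modAL.acquisition alongside max_EI, might look like this.

from modAL.acquisition import max_PI, max_UCB

# query by probability of improvement instead of expected improvement
pi_optimizer = BayesianOptimizer(
    estimator=GaussianProcessRegressor(kernel=Matern(length_scale=1.0)),
    query_strategy=max_PI
)

# query by upper confidence bound
ucb_optimizer = BayesianOptimizer(
    estimator=GaussianProcessRegressor(kernel=Matern(length_scale=1.0)),
    query_strategy=max_UCB
)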