modAL.models

class modAL.models.ActiveLearner(estimator: sklearn.base.BaseEstimator, query_strategy: Callable = <function uncertainty_sampling>, X_training: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix, None] = None, y_training: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix, None] = None, bootstrap_init: bool = False, **fit_kwargs)

This class is an abstract model of a general active learning algorithm.

Parameters:

estimator – The estimator to be used in the active learning loop.
query_strategy – Function providing the query strategy for the active learning loop, for instance, modAL.uncertainty.uncertainty_sampling.
X_training – Initial training samples, if available.
y_training – Initial training labels corresponding to initial training samples.
bootstrap_init – If initial training data is available, bootstrapping can be done during the first training. Useful when building Committee models with bagging.
**fit_kwargs – Keyword arguments to be passed to the fit method of the estimator.

estimator: The estimator to be used in the active learning loop.

query_strategy: Function providing the query strategy for the active learning loop.

X_training: If the model hasn’t been fitted yet it is None, otherwise it contains the samples which the model has been trained on. If provided, the method fit() of estimator is called during __init__().

y_training: The labels corresponding to X_training.

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import RandomForestClassifier
>>> from modAL.models import ActiveLearner
>>> iris = load_iris()
>>> # give initial training examples
>>> X_training = iris['data'][[0, 50, 100]]
>>> y_training = iris['target'][[0, 50, 100]]
>>>
>>> # initialize active learner
>>> learner = ActiveLearner(
...     estimator=RandomForestClassifier(),
...     X_training=X_training, y_training=y_training
... )
>>>
>>> # querying for labels
>>> query_idx, query_sample = learner.query(iris['data'])
>>>
>>> # ...obtaining new labels from the Oracle...
>>>
>>> # teaching newly labelled examples
>>> learner.teach(
...     X=iris['data'][query_idx].reshape(1, -1),
...     y=iris['target'][query_idx].reshape(1, )
... )
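
In a pool-based loop, the queried sample is usually removed from the pool after it has been taught, so that it is not selected again. The following is a minimal sketch of that bookkeeping using NumPy only; the arrays and the value of query_idx are hypothetical stand-ins for what learner.query() would return.

```python
import numpy as np

# Hypothetical stand-ins: a pool of 6 samples with 2 features, and their labels.
X_pool = np.arange(12, dtype=float).reshape(6, 2)
y_pool = np.array([0, 1, 0, 1, 0, 1])

# Pretend the query strategy selected this index (learner.query would supply it).
query_idx = np.array([2])

# The newly labelled samples that would be passed on to teach().
X_new, y_new = X_pool[query_idx], y_pool[query_idx]

# Drop the queried rows so the same sample is not selected again.
X_pool = np.delete(X_pool, query_idx, axis=0)
y_pool = np.delete(y_pool, query_idx, axis=0)
```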

fit(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], bootstrap: bool = False, **fit_kwargs) → modAL.models.base.BaseLearner

Interface for the fit method of the predictor. Fits the predictor to the supplied data, then stores it internally for the active learning loop.

Parameters:	X – The samples to be fitted. y – The corresponding labels. bootstrap – If true, trains the estimator on a set bootstrapped from X. Useful for building Committee models with bagging. **fit_kwargs – Keyword arguments to be passed to the fit method of the predictor.

Note

When using scikit-learn estimators, calling this method will make the ActiveLearner forget all training data it has seen!

Returns:	self
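
To make the bootstrap option concrete, here is a minimal sketch of drawing a bootstrapped training set with NumPy. It illustrates the general technique (sampling rows with replacement) under the assumption of a small toy dataset; the library's internal sampling may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: row i is [2*i, 2*i + 1].
X = np.arange(10, dtype=float).reshape(5, 2)
y = np.array([0, 1, 0, 1, 1])

# Draw len(X) row indices with replacement: some rows repeat, others are left out.
boot_idx = rng.choice(len(X), size=len(X), replace=True)
X_boot, y_boot = X[boot_idx], y[boot_idx]
```

The bootstrapped set has the same size as the original, which is why differently seeded resamples give committee members different views of the same data.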

predict(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_kwargs) → Any

Estimator predictions for X. Interface with the predict method of the estimator.

Parameters:	X – The samples to be predicted. **predict_kwargs – Keyword arguments to be passed to the predict method of the estimator.
Returns:	Estimator predictions for X.

predict_proba(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → Any

Class probabilities if the predictor is a classifier. Interface with the predict_proba method of the classifier.

Parameters:	X – The samples for which the class probabilities are to be predicted. **predict_proba_kwargs – Keyword arguments to be passed to the predict_proba method of the classifier.
Returns:	Class probabilities for X.

query()

Finds the n_instances most informative points in the data provided by calling the query_strategy function.

Parameters:	*query_args – The arguments for the query strategy. For instance, in the case of `uncertainty_sampling()`, it is the pool of samples from which the query strategy should choose instances to request labels. **query_kwargs – Keyword arguments for the query strategy function.
Returns:	Value of the query_strategy function. Should be the indices of the instances from the pool chosen to be labelled and the instances themselves. Can be different in other cases, for instance only the instance to be labelled upon query synthesis.
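
As an illustration of what an uncertainty-based query strategy computes, the sketch below ranks pool samples by least-confidence uncertainty and returns the indices and the instances, mirroring the return value described above. The probabilities and the pool are hypothetical, and this is not the library's own implementation.

```python
import numpy as np

# Hypothetical class probabilities for a pool of 4 samples and 3 classes,
# e.g. what predict_proba could return.
proba = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],
])
X_pool = np.array([[0.0], [1.0], [2.0], [3.0]])

# Least-confidence uncertainty: one minus the probability of the top class.
uncertainty = 1.0 - proba.max(axis=1)

n_instances = 2
# Indices of the n_instances most uncertain samples, most uncertain first.
query_idx = np.argsort(-uncertainty)[:n_instances]
query_instances = X_pool[query_idx]
```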

score(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **score_kwargs) → Any

Interface for the score method of the predictor.

Parameters:	X – The samples for which prediction accuracy is to be calculated. y – Ground truth labels for X. **score_kwargs – Keyword arguments to be passed to the .score() method of the predictor.
Returns:	The score of the predictor.

teach(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], bootstrap: bool = False, only_new: bool = False, **fit_kwargs) → None

Adds X and y to the known training data and retrains the predictor with the augmented dataset.

Parameters:

X – The new samples for which the labels are supplied by the expert.
y – Labels corresponding to the new instances in X.
bootstrap – If True, training is done on a bootstrapped dataset. Useful for building Committee models with bagging.
only_new – If True, the model is retrained using only X and y, ignoring the previously provided examples. Useful when working with models where the .fit() method doesn’t retrain the model from scratch (e.g. in TensorFlow or Keras).
**fit_kwargs – Keyword arguments to be passed to the fit method of the predictor.
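
The difference between teach() and fit() is easiest to see in terms of the stored training data: teach() appends, fit() replaces. The class below is a hypothetical stand-in (TrainingStoreSketch is not part of modAL) that tracks only the stored arrays and skips the actual model training.

```python
import numpy as np

class TrainingStoreSketch:
    """Hypothetical stand-in that only tracks the stored training data."""

    def __init__(self, X, y):
        self.X_training = np.asarray(X, dtype=float)
        self.y_training = np.asarray(y)

    def fit(self, X, y):
        # fit() replaces whatever was stored before.
        self.X_training = np.asarray(X, dtype=float)
        self.y_training = np.asarray(y)
        return self

    def teach(self, X, y):
        # teach() appends the new examples to the stored ones.
        self.X_training = np.vstack((self.X_training, X))
        self.y_training = np.concatenate((self.y_training, y))

store = TrainingStoreSketch([[0.0, 1.0], [2.0, 3.0]], [0, 1])
store.teach(np.array([[4.0, 5.0]]), np.array([1]))
n_after_teach = len(store.X_training)   # two initial rows plus one new row
store.fit(np.array([[6.0, 7.0]]), np.array([0]))
n_after_fit = len(store.X_training)     # only the row passed to fit() remains
```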

class modAL.models.BayesianOptimizer(estimator: sklearn.base.BaseEstimator, query_strategy: Callable = <function max_EI>, X_training: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix, None] = None, y_training: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix, None] = None, bootstrap_init: bool = False, **fit_kwargs)

This class is an abstract model of a Bayesian optimizer algorithm.

Parameters:

estimator – The estimator to be used in the Bayesian optimization. (For instance, a GaussianProcessRegressor.)
query_strategy – Function providing the query strategy for Bayesian optimization, for instance, modAL.acquisition.max_EI.
X_training – Initial training samples, if available.
y_training – Initial training labels corresponding to initial training samples.
bootstrap_init – If initial training data is available, bootstrapping can be done during the first training. Useful when building Committee models with bagging.
**fit_kwargs – Keyword arguments to be passed to the fit method of the estimator.

estimator: The estimator to be used in the Bayesian optimization.

query_strategy: Function providing the query strategy for Bayesian optimization.

X_training: If the model hasn’t been fitted yet it is None, otherwise it contains the samples which the model has been trained on.

y_training: The labels corresponding to X_training.

X_max: argmax of the function so far.

y_max: Max of the function so far.
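
The running maximum matters because improvement-based acquisition functions compare the estimator's predictions against it. As a sketch, the textbook expected improvement for a Gaussian prediction can be written with the standard library alone; the function name, its arguments, and the handling of zero standard deviation are assumptions made for illustration and may not match the library's max_EI in detail.

```python
import math

def expected_improvement(mean, std, y_max, tradeoff=0.0):
    """Expected improvement of a Gaussian prediction over the best value so far."""
    if std <= 0.0:
        # Assumption: with no predictive uncertainty, only a sure gain counts.
        return max(mean - y_max - tradeoff, 0.0)
    z = (mean - y_max - tradeoff) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (mean - y_max - tradeoff) * cdf + std * pdf

y_max = 1.0
ei_confident_below = expected_improvement(mean=0.5, std=0.01, y_max=y_max)
ei_uncertain_below = expected_improvement(mean=0.5, std=1.0, y_max=y_max)
ei_at_max = expected_improvement(mean=1.0, std=1.0, y_max=y_max)
```

A point predicted below the current maximum still has noticeable expected improvement when its predictive uncertainty is large, which is what drives exploration.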

Examples

>>> import numpy as np
>>> from functools import partial
>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import Matern
>>> from modAL.models import BayesianOptimizer
>>> from modAL.acquisition import optimizer_PI, optimizer_EI, optimizer_UCB, max_PI, max_EI, max_UCB
>>>
>>> # generating the data
>>> X = np.linspace(0, 20, 1000).reshape(-1, 1)
>>> y = np.sin(X)/2 - ((10 - X)**2)/50 + 2
>>>
>>> # assembling initial training set
>>> X_initial, y_initial = X[150].reshape(1, -1), y[150].reshape(1, -1)
>>>
>>> # defining the kernel for the Gaussian process
>>> kernel = Matern(length_scale=1.0)
>>>
>>> tr = 0.1
>>> PI_tr = partial(optimizer_PI, tradeoff=tr)
>>> PI_tr.__name__ = 'PI, tradeoff = %1.1f' % tr
>>> max_PI_tr = partial(max_PI, tradeoff=tr)
>>>
>>> acquisitions = zip(
...     [PI_tr, optimizer_EI, optimizer_UCB],
...     [max_PI_tr, max_EI, max_UCB],
... )
>>>
>>> for acquisition, query_strategy in acquisitions:
...     # initializing the optimizer
...     optimizer = BayesianOptimizer(
...         estimator=GaussianProcessRegressor(kernel=kernel),
...         X_training=X_initial, y_training=y_initial,
...         query_strategy=query_strategy
...     )
...
...     for n_query in range(5):
...         # query
...         query_idx, query_inst = optimizer.query(X)
...         optimizer.teach(X[query_idx].reshape(1, -1), y[query_idx].reshape(1, -1))

fit(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], bootstrap: bool = False, **fit_kwargs) → modAL.models.base.BaseLearner

Interface for the fit method of the predictor. Fits the predictor to the supplied data, then stores it internally for the active learning loop.

Parameters:	X – The samples to be fitted. y – The corresponding labels. bootstrap – If true, trains the estimator on a set bootstrapped from X. Useful for building Committee models with bagging. **fit_kwargs – Keyword arguments to be passed to the fit method of the predictor.

Note

When using scikit-learn estimators, calling this method will make the BayesianOptimizer forget all training data it has seen!

Returns:	self

predict(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_kwargs) → Any

Estimator predictions for X. Interface with the predict method of the estimator.

Parameters:	X – The samples to be predicted. **predict_kwargs – Keyword arguments to be passed to the predict method of the estimator.
Returns:	Estimator predictions for X.

predict_proba(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → Any

Class probabilities if the predictor is a classifier. Interface with the predict_proba method of the classifier.

Parameters:	X – The samples for which the class probabilities are to be predicted. **predict_proba_kwargs – Keyword arguments to be passed to the predict_proba method of the classifier.
Returns:	Class probabilities for X.

query()

Finds the n_instances most informative points in the data provided by calling the query_strategy function.

Parameters:	*query_args – The arguments for the query strategy. For instance, in the case of `uncertainty_sampling()`, it is the pool of samples from which the query strategy should choose instances to request labels. **query_kwargs – Keyword arguments for the query strategy function.
Returns:	Value of the query_strategy function. Should be the indices of the instances from the pool chosen to be labelled and the instances themselves. Can be different in other cases, for instance only the instance to be labelled upon query synthesis.

score(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **score_kwargs) → Any

Interface for the score method of the predictor.

Parameters:	X – The samples for which prediction accuracy is to be calculated. y – Ground truth labels for X. **score_kwargs – Keyword arguments to be passed to the .score() method of the predictor.
Returns:	The score of the predictor.

teach(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], bootstrap: bool = False, only_new: bool = False, **fit_kwargs) → None

Adds X and y to the known training data and retrains the predictor with the augmented dataset. This method also keeps track of the maximum value encountered in the training data.

Parameters:

X – The new samples for which the values are supplied.
y – Values corresponding to the new instances in X.
bootstrap – If True, training is done on a bootstrapped dataset. Useful for building Committee models with bagging. (Default value = False)
only_new – If True, the model is retrained using only X and y, ignoring the previously provided examples. Useful when working with models where the .fit() method doesn’t retrain the model from scratch (for example, in TensorFlow or Keras).
**fit_kwargs – Keyword arguments to be passed to the fit method of the predictor.
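
Keeping track of the maximum amounts to comparing each new batch of values against the best pair seen so far. The helper below is a hypothetical sketch of that bookkeeping (update_max is not a modAL function), using NumPy only.

```python
import numpy as np

def update_max(X_max, y_max, X_new, y_new):
    """Return the best (X, y) pair seen so far after observing a new batch."""
    y_flat = np.asarray(y_new, dtype=float).ravel()
    best = int(np.argmax(y_flat))
    # Replace the stored maximum only if the new batch contains a larger value.
    if y_max is None or y_flat[best] > y_max:
        return np.asarray(X_new)[best], float(y_flat[best])
    return X_max, y_max

# First batch: the best value is 0.7, observed at X = [1.0].
X_max, y_max = update_max(None, None, np.array([[0.0], [1.0]]), np.array([0.2, 0.7]))
# Second batch: 0.5 does not beat 0.7, so the stored maximum is unchanged.
X_max, y_max = update_max(X_max, y_max, np.array([[2.0]]), np.array([0.5]))
```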

class modAL.models.Committee(learner_list: List[modAL.models.learners.ActiveLearner], query_strategy: Callable = <function vote_entropy_sampling>)

This class is an abstract model of a committee-based active learning algorithm.

Parameters:	learner_list – A list of ActiveLearners forming the Committee. query_strategy – Query strategy function. Committee supports disagreement-based query strategies from `modAL.disagreement`, but uncertainty-based ones from `modAL.uncertainty` are also supported.

classes_: Class labels known by the Committee.

n_classes_: Number of classes known by the Committee.

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.ensemble import RandomForestClassifier
>>> from modAL.models import ActiveLearner, Committee
>>>
>>> iris = load_iris()
>>>
>>> # initialize ActiveLearners
>>> learner_1 = ActiveLearner(
...     estimator=RandomForestClassifier(),
...     X_training=iris['data'][[0, 50, 100]], y_training=iris['target'][[0, 50, 100]]
... )
>>> learner_2 = ActiveLearner(
...     estimator=KNeighborsClassifier(n_neighbors=3),
...     X_training=iris['data'][[1, 51, 101]], y_training=iris['target'][[1, 51, 101]]
... )
>>>
>>> # initialize the Committee
>>> committee = Committee(
...     learner_list=[learner_1, learner_2]
... )
>>>
>>> # querying for labels
>>> query_idx, query_sample = committee.query(iris['data'])
>>>
>>> # ...obtaining new labels from the Oracle...
>>>
>>> # teaching newly labelled examples
>>> committee.teach(
...     X=iris['data'][query_idx].reshape(1, -1),
...     y=iris['target'][query_idx].reshape(1, )
... )

fit(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **fit_kwargs) → modAL.models.base.BaseCommittee

Fits every learner to a subset sampled with replacement from X. Calling this method makes the learner forget the data it has seen up until this point and replaces it with X! If you would like to perform bootstrapping on each learner using the data it has seen, use the method .rebag()!

Parameters:	X – The samples to be fitted on. y – The corresponding labels. **fit_kwargs – Keyword arguments to be passed to the fit method of the predictor.

predict(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → Any

Predicts the class of the samples by picking the consensus prediction.

Parameters:	X – The samples to be predicted. **predict_proba_kwargs – Keyword arguments to be passed to the `predict_proba()` of the Committee.
Returns:	The predicted class labels for X.

predict_proba(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → Any

Consensus probabilities of the Committee.

Parameters:	X – The samples for which the class probabilities are to be predicted. **predict_proba_kwargs – Keyword arguments to be passed to the `predict_proba()` of the Committee.
Returns:	Class probabilities for X.
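
One way to picture the consensus is as an average of the learners' class probabilities, with the predicted label being the class of highest averaged probability. The sketch below assumes the per-learner probabilities are laid out as (n_samples, n_learners, n_classes); the numbers and class names are hypothetical.

```python
import numpy as np

# Hypothetical per-learner class probabilities for 2 samples, 2 learners and
# 3 classes, laid out as (n_samples, n_learners, n_classes).
vote_proba = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]],
    [[0.1, 0.1, 0.8], [0.3, 0.2, 0.5]],
])
classes = np.array(["setosa", "versicolor", "virginica"])

# Consensus: average the learners' probabilities for each sample.
consensus_proba = vote_proba.mean(axis=1)

# Predicted label: the class with the highest consensus probability.
consensus_label = classes[np.argmax(consensus_proba, axis=1)]
```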

query()

Finds the n_instances most informative points in the data provided by calling the query_strategy function.

Parameters:	*query_args – The arguments for the query strategy. For instance, in the case of `max_disagreement_sampling()`, it is the pool of samples from which the query strategy should choose instances to request labels. **query_kwargs – Keyword arguments for the query strategy function.
Returns:	Return value of the query_strategy function. Should be the indices of the instances from the pool chosen to be labelled and the instances themselves. Can be different in other cases, for instance only the instance to be labelled upon query synthesis.
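
For a disagreement-based strategy such as vote entropy, the informativeness of a sample is the entropy of the committee's votes on it. The following is a minimal NumPy sketch of that idea with hypothetical votes; it is not the library's own implementation.

```python
import numpy as np

# Hypothetical votes: rows are pool samples, columns are committee members.
votes = np.array([
    [0, 0, 0],   # full agreement
    [0, 1, 2],   # full disagreement
    [1, 1, 2],   # partial disagreement
])
classes = np.array([0, 1, 2])

# Fraction of the committee voting for each class, per sample.
vote_share = np.stack([(votes == c).mean(axis=1) for c in classes], axis=1)

# Shannon entropy of the vote distribution; 0 * log(0) is treated as 0.
safe_share = np.where(vote_share > 0, vote_share, 1.0)
vote_entropy = -(vote_share * np.log(safe_share)).sum(axis=1)

# The sample with the highest entropy is the one the committee disagrees on most.
query_idx = int(np.argmax(vote_entropy))
```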

rebag(**fit_kwargs) → None

Refits every learner with a dataset bootstrapped from its training instances. Contrary to .fit(), which resamples from the data passed to it, it bootstraps the training data for each learner based on its own examples.

Parameters:	**fit_kwargs – Keyword arguments to be passed to the fit method of the predictor.

score(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], sample_weight: List[float] = None) → Any

Returns the mean accuracy on the given test data and labels.

Parameters:	X – The samples to score. y – Ground truth labels corresponding to X. sample_weight – Sample weights.
Returns:	Mean accuracy of the classifiers.
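
As a small worked example of how sample weights enter an accuracy score, the sketch below computes unweighted and weighted accuracy with NumPy. The labels, predictions, and weights are hypothetical.

```python
import numpy as np

y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 1, 1, 2])            # hypothetical predictions
sample_weight = np.array([1.0, 1.0, 2.0, 1.0])

# 1.0 where the prediction matches the ground truth, 0.0 otherwise.
correct = (y_true == y_pred).astype(float)

accuracy = correct.mean()                                        # every sample counts equally
weighted_accuracy = np.average(correct, weights=sample_weight)   # misses on heavy samples cost more
```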

teach(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], bootstrap: bool = False, only_new: bool = False, **fit_kwargs) → None

Adds X and y to the known training data for each learner and retrains learners with the augmented dataset.

Parameters:

X – The new samples for which the labels are supplied by the expert.
y – Labels corresponding to the new instances in X.
bootstrap – If True, trains each learner on a bootstrapped set. Useful when building the ensemble by bagging.
only_new – If True, the model is retrained using only X and y, ignoring the previously provided examples.
**fit_kwargs – Keyword arguments to be passed to the fit method of the predictor.

vote(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_kwargs) → Any

Predicts the labels for the supplied data for each learner in the Committee.

Parameters:	X – The samples on which the learners cast their votes. **predict_kwargs – Keyword arguments to be passed to the `predict()` of the learners.
Returns:	The predicted class for each learner in the Committee and each sample in X.

vote_proba(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → Any

Predicts the probabilities of the classes for each sample and each learner.

Parameters:	X – The samples for which class probabilities are to be calculated. **predict_proba_kwargs – Keyword arguments for the `predict_proba()` of the learners.
Returns:	Probabilities of each class for each learner and each instance.

class modAL.models.CommitteeRegressor(learner_list: List[modAL.models.learners.ActiveLearner], query_strategy: Callable = <function max_std_sampling>)

This class is an abstract model of a committee-based active learning regression.

Parameters:	learner_list – A list of ActiveLearners forming the CommitteeRegressor. query_strategy – Query strategy function.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import WhiteKernel, RBF
>>> from modAL.models import ActiveLearner, CommitteeRegressor
>>>
>>> # generating the data
>>> X = np.concatenate((np.random.rand(100)-1, np.random.rand(100)))
>>> y = np.abs(X) + np.random.normal(scale=0.2, size=X.shape)
>>>
>>> # initializing the regressors
>>> n_initial = 10
>>> kernel = RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e3)) + WhiteKernel(noise_level=1, noise_level_bounds=(1e-10, 1e+1))
>>>
>>> initial_idx = list()
>>> initial_idx.append(np.random.choice(range(100), size=n_initial, replace=False))
>>> initial_idx.append(np.random.choice(range(100, 200), size=n_initial, replace=False))
>>> learner_list = [ActiveLearner(
...                         estimator=GaussianProcessRegressor(kernel),
...                         X_training=X[idx].reshape(-1, 1), y_training=y[idx].reshape(-1, 1)
...                 )
...                 for idx in initial_idx]
>>>
>>> # query strategy for regression
>>> def ensemble_regression_std(regressor, X):
...     _, std = regressor.predict(X, return_std=True)
...     query_idx = np.argmax(std)
...     return query_idx, X[query_idx]
>>>
>>> # initializing the CommitteeRegressor
>>> committee = CommitteeRegressor(
...     learner_list=learner_list,
...     query_strategy=ensemble_regression_std
... )
>>>
>>> # active regression
>>> n_queries = 10
>>> for idx in range(n_queries):
...     query_idx, query_instance = committee.query(X.reshape(-1, 1))
...     committee.teach(X[query_idx].reshape(-1, 1), y[query_idx].reshape(-1, 1))

fit(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **fit_kwargs) → modAL.models.base.BaseCommittee

Fits every learner to a subset sampled with replacement from X. Calling this method makes the learner forget the data it has seen up until this point and replaces it with X! If you would like to perform bootstrapping on each learner using the data it has seen, use the method .rebag()!

Parameters:	X – The samples to be fitted on. y – The corresponding labels. **fit_kwargs – Keyword arguments to be passed to the fit method of the predictor.

predict(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], return_std: bool = False, **predict_kwargs) → Any

Predicts the values of the samples by averaging the prediction of each regressor.

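Parameters:	X – The samples to be predicted. return_std – If True, the standard deviation of the regressors' predictions is returned together with the mean. **predict_kwargs – Keyword arguments to be passed to the `vote()` method of the CommitteeRegressor.
Returns:	The predicted values for X (and their standard deviation, if return_std is True).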
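
Averaging the regressors' votes, and using their spread as a disagreement measure, can be sketched in a few lines of NumPy. The votes below are hypothetical and assumed to be laid out as (n_samples, n_learners); the last step shows how a max-std style query would pick the point of largest disagreement.

```python
import numpy as np

# Hypothetical votes of 3 regressors on 4 samples: shape (n_samples, n_learners).
votes = np.array([
    [1.0, 1.0, 1.0],
    [0.0, 2.0, 4.0],
    [2.0, 2.5, 3.0],
    [5.0, 5.0, 6.5],
])

mean_prediction = votes.mean(axis=1)   # committee prediction per sample
std_prediction = votes.std(axis=1)     # disagreement between regressors per sample

# Querying the point of largest disagreement, in the spirit of max-std sampling.
query_idx = int(np.argmax(std_prediction))
```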

query()

Finds the n_instances most informative points in the data provided by calling the query_strategy function.

Parameters:	*query_args – The arguments for the query strategy. For instance, in the case of `max_disagreement_sampling()`, it is the pool of samples from which the query strategy should choose instances to request labels. **query_kwargs – Keyword arguments for the query strategy function.
Returns:	Return value of the query_strategy function. Should be the indices of the instances from the pool chosen to be labelled and the instances themselves. Can be different in other cases, for instance only the instance to be labelled upon query synthesis.

rebag(**fit_kwargs) → None

Refits every learner with a dataset bootstrapped from its training instances. Contrary to .fit(), which resamples from the data passed to it, it bootstraps the training data for each learner based on its own examples.

Parameters:	**fit_kwargs – Keyword arguments to be passed to the fit method of the predictor.

teach(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], y: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], bootstrap: bool = False, only_new: bool = False, **fit_kwargs) → None

Adds X and y to the known training data for each learner and retrains learners with the augmented dataset.

Parameters:

X – The new samples for which the labels are supplied by the expert.
y – Labels corresponding to the new instances in X.
bootstrap – If True, trains each learner on a bootstrapped set. Useful when building the ensemble by bagging.
only_new – If True, the model is retrained using only X and y, ignoring the previously provided examples.
**fit_kwargs – Keyword arguments to be passed to the fit method of the predictor.

vote(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_kwargs)

Predicts the values for the supplied data for each regressor in the CommitteeRegressor.

Parameters:	X – The samples on which the regressors cast their votes. **predict_kwargs – Keyword arguments to be passed to `predict()` of the learners.
Returns:	The predicted value for each regressor in the CommitteeRegressor and each sample in X.