modAL.uncertainty

Uncertainty measures and uncertainty-based sampling strategies for active learning models.

modAL.uncertainty.classifier_entropy(classifier: sklearn.base.BaseEstimator, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → numpy.ndarray

Entropy of the classifier's predictions for the provided samples.

Parameters:

classifier – The classifier for which the prediction entropy is to be measured.
X – The samples for which the prediction entropy is to be measured.
**predict_proba_kwargs – Keyword arguments to be passed to the `predict_proba()` method of the classifier.

Returns:

Entropy of the predicted class probabilities for each sample.
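As a rough illustration of the quantity described above, the following sketch computes the Shannon entropy of each row of a class-probability table that stands in for the output of `predict_proba()`. The helper name `row_entropy`, the example probabilities, and the choice of the natural logarithm are assumptions made for this example; it is not modAL's implementation.

```python
import math


def row_entropy(probabilities):
    """Shannon entropy (natural log) of each row of class probabilities."""
    entropies = []
    for row in probabilities:
        # Terms with p == 0 are skipped, following the convention 0 * log(0) = 0.
        entropies.append(-sum(p * math.log(p) for p in row if p > 0))
    return entropies


# Hypothetical predict_proba() output: three samples, two classes.
proba = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
entropies = row_entropy(proba)
```

The evenly split first row yields the largest entropy and the fully confident last row yields zero, which is why entropy can serve as an uncertainty score.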

modAL.uncertainty.classifier_margin(classifier: sklearn.base.BaseEstimator, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → numpy.ndarray

Classification margin uncertainty of the classifier for the provided samples. The margin is the difference between the probabilities of the most likely and second most likely predictions; a small margin means the classifier can barely separate its top two candidates.

Parameters:

classifier – The classifier for which the prediction margin is to be measured.
X – The samples for which the prediction margin of classification is to be measured.
**predict_proba_kwargs – Keyword arguments to be passed to the `predict_proba()` method of the classifier.

Returns:

Margin uncertainty, which is the difference between the probabilities of the first and second most likely predictions.
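The margin described above can be sketched in a few lines. The helper `row_margin` and the example probabilities are hypothetical and only illustrate the arithmetic; the sketch assumes every row has at least two classes and is not taken from modAL's source.

```python
def row_margin(probabilities):
    """Difference between the two largest class probabilities in each row."""
    margins = []
    for row in probabilities:
        # Sort descending so the two most likely classes come first.
        first, second = sorted(row, reverse=True)[:2]
        margins.append(first - second)
    return margins


# Hypothetical predict_proba() output: three samples, three classes.
proba = [[0.5, 0.3, 0.2], [0.9, 0.05, 0.05], [0.4, 0.4, 0.2]]
margins = row_margin(proba)
```

Note the direction of the measure: the third sample, whose top two classes are tied, has a margin of zero and is therefore the most uncertain one.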

modAL.uncertainty.classifier_uncertainty(classifier: sklearn.base.BaseEstimator, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → numpy.ndarray

Classification uncertainty of the classifier for the provided samples.

Parameters:

classifier – The classifier for which the uncertainty is to be measured.
X – The samples for which the uncertainty of classification is to be measured.
**predict_proba_kwargs – Keyword arguments to be passed to the `predict_proba()` method of the classifier.

Returns:

Classifier uncertainty, which is 1 - P(prediction is correct), where the probability is the classifier's own estimate for its most likely class.
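A minimal sketch of this measure, assuming that P(prediction is correct) is taken to be the largest predicted class probability for each sample. The helper `row_uncertainty` and the example probabilities are made up for illustration.

```python
def row_uncertainty(probabilities):
    """One minus the probability of the most likely class, per sample."""
    return [1.0 - max(row) for row in probabilities]


# Hypothetical predict_proba() output: three samples, three classes.
proba = [[0.4, 0.3, 0.3], [0.9, 0.05, 0.05], [1.0, 0.0, 0.0]]
uncertainties = row_uncertainty(proba)
```

Unlike the margin, this score only looks at the top class, so two samples with the same maximum probability get the same uncertainty regardless of how the remaining mass is distributed.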

modAL.uncertainty.entropy_sampling(classifier: sklearn.base.BaseEstimator, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], n_instances: int = 1, random_tie_break: bool = False, **uncertainty_measure_kwargs) → Tuple[numpy.ndarray, Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix]]

Entropy sampling query strategy. Selects the instances where the class probabilities have the largest entropy.

Parameters:

classifier – The classifier for which the labels are to be queried.
X – The pool of samples to query from.
n_instances – Number of samples to be queried.
random_tie_break – If True, shuffles utility scores to randomize the order. This can be used to break the tie when the highest utility score is not unique.
**uncertainty_measure_kwargs – Keyword arguments to be passed to the uncertainty measure function.

Returns:

The indices of the instances from X chosen to be labelled; the instances from X chosen to be labelled.
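The selection step of such a query strategy can be sketched as follows: given one utility score per pool sample (here, precomputed entropies), pick the `n_instances` highest-scoring samples and return both their indices and the samples themselves. The function `query_largest`, the pool, and the scores are hypothetical; this is a simplified sketch under those assumptions, not modAL's code.

```python
def query_largest(scores, X, n_instances=1):
    """Return (indices, instances) for the n_instances highest-scoring samples."""
    # Rank pool indices by descending score.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    chosen = ranked[:n_instances]
    return chosen, [X[i] for i in chosen]


# Hypothetical pool of four one-feature samples and their entropies.
X_pool = [[0.0], [1.0], [2.0], [3.0]]
pool_entropies = [0.10, 0.69, 0.33, 0.64]
indices, instances = query_largest(pool_entropies, X_pool, n_instances=2)
```

The two-part return value mirrors the documented return type: the first element identifies the chosen rows of the pool, the second contains those rows.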

modAL.uncertainty.margin_sampling(classifier: sklearn.base.BaseEstimator, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], n_instances: int = 1, random_tie_break: bool = False, **uncertainty_measure_kwargs) → Tuple[numpy.ndarray, Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix]]

Margin sampling query strategy. Selects the instances where the difference between the probabilities of the most likely and second most likely classes is the smallest.

Parameters:

classifier – The classifier for which the labels are to be queried.
X – The pool of samples to query from.
n_instances – Number of samples to be queried.
random_tie_break – If True, shuffles utility scores to randomize the order. This can be used to break the tie when the highest utility score is not unique.
**uncertainty_measure_kwargs – Keyword arguments to be passed to the uncertainty measure function.

Returns:

The indices of the instances from X chosen to be labelled; the instances from X chosen to be labelled.
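To show how the pieces of a margin-based query fit together, the sketch below runs the whole flow against a stub classifier: predict probabilities, compute the margin per sample, and select the samples with the smallest margins. `StubClassifier`, `margin_query`, and the probability table are hypothetical stand-ins introduced for this example; the sketch is not modAL's implementation and omits tie breaking.

```python
class StubClassifier:
    """Hypothetical stand-in that returns fixed probabilities per sample key."""

    def __init__(self, table):
        self._table = table

    def predict_proba(self, X):
        return [self._table[x] for x in X]


def margin_query(classifier, X, n_instances=1):
    """Return (indices, instances) for the samples with the smallest margins."""
    margins = []
    for row in classifier.predict_proba(X):
        first, second = sorted(row, reverse=True)[:2]
        margins.append(first - second)
    # Smallest margin first: these are the samples the classifier is least sure about.
    ranked = sorted(range(len(X)), key=lambda i: margins[i])
    chosen = ranked[:n_instances]
    return chosen, [X[i] for i in chosen]


clf = StubClassifier({"a": [0.9, 0.1], "b": [0.55, 0.45], "c": [0.7, 0.3]})
indices, instances = margin_query(clf, ["a", "b", "c"], n_instances=2)
```

Sample "b" is picked first because its two class probabilities are closest together, followed by "c".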

modAL.uncertainty.uncertainty_sampling(classifier: sklearn.base.BaseEstimator, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], n_instances: int = 1, random_tie_break: bool = False, **uncertainty_measure_kwargs) → Tuple[numpy.ndarray, Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix]]

Uncertainty sampling query strategy. Selects the least sure instances for labelling.

Parameters:

classifier – The classifier for which the labels are to be queried.
X – The pool of samples to query from.
n_instances – Number of samples to be queried.
random_tie_break – If True, shuffles utility scores to randomize the order. This can be used to break the tie when the highest utility score is not unique.
**uncertainty_measure_kwargs – Keyword arguments to be passed to the uncertainty measure function.

Returns:

The indices of the instances from X chosen to be labelled; the instances from X chosen to be labelled.
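The effect of `random_tie_break` can be illustrated with a small sketch: shuffle the candidate order first, then rank by score with a stable sort, so that samples with equal scores end up in random relative order. The function `top_with_random_ties` and its `seed` parameter are hypothetical, and this is only one possible way to randomize ties; it is not claimed to match modAL's internals.

```python
import random


def top_with_random_ties(scores, n_instances=1, seed=None):
    """Indices of the n_instances highest scores, with ties ordered at random."""
    rng = random.Random(seed)
    order = list(range(len(scores)))
    # Shuffling first decides which of several tied samples comes first.
    rng.shuffle(order)
    # list.sort is stable, so tied scores keep their shuffled relative order.
    order.sort(key=lambda i: -scores[i])
    return order[:n_instances]


# Hypothetical uncertainty scores with a three-way tie for the maximum.
scores = [0.5, 0.2, 0.5, 0.5]
first_picks = [top_with_random_ties(scores, seed=s)[0] for s in range(20)]
top_three = top_with_random_ties(scores, n_instances=3, seed=0)
```

Without the shuffle, the same tied sample would be returned on every call, which can bias labelling toward samples that happen to appear early in the pool.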