modAL.batch

Uncertainty measures that explicitly support batch-mode sampling for active learning models.

modAL.batch.ranked_batch(classifier: Union[modAL.models.base.BaseLearner, modAL.models.base.BaseCommittee], unlabeled: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], uncertainty_scores: numpy.ndarray, n_instances: int, metric: Union[str, Callable], n_jobs: Optional[int]) → numpy.ndarray[source]

Query our top n_instances records to request labels for.

Refer to Cardoso et al.’s “Ranked batch-mode active learning”:
https://www.sciencedirect.com/science/article/pii/S0020025516313949
Parameters:
  • classifier – One of modAL’s supported active learning models.
  • unlabeled – Set of records to be considered for our active learning model.
  • uncertainty_scores – Uncertainty scores our classifier assigned to each record in unlabeled.
  • n_instances – Limit on the number of records to query from our unlabeled set.
  • metric – This parameter is passed to pairwise_distances().
  • n_jobs – This parameter is passed to pairwise_distances().
Returns:

The indices of the top n_instances ranked unlabelled samples.
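The ranking mixes each candidate's uncertainty with its dissimilarity from the labeled set, shifting weight toward uncertainty as more labels accrue. A minimal numpy sketch of that scoring rule (the helper name ranked_batch_scores and the 1 / (1 + distance) similarity convention are assumptions for illustration, not part of modAL's public API):

```python
import numpy as np

def ranked_batch_scores(similarity, uncertainty, n_labeled, n_unlabeled):
    """Score unlabeled records per Cardoso et al.'s ranked batch-mode rule:
    a convex mix of dissimilarity and uncertainty.

    similarity: similarity of each candidate to its nearest labeled record,
        e.g. 1 / (1 + distance).
    """
    # alpha shifts weight from dissimilarity to uncertainty as labels accrue
    alpha = n_unlabeled / (n_unlabeled + n_labeled)
    return alpha * (1 - similarity) + (1 - alpha) * uncertainty
```

With many unlabeled records remaining, alpha is close to 1 and dissimilarity dominates; as the pool shrinks, uncertainty takes over.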

modAL.batch.select_cold_start_instance(X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], metric: Union[str, Callable], n_jobs: Optional[int]) → Tuple[int, Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix]][source]

Define what to do if our batch-mode sampling doesn’t have any labeled data – a cold start.

If our ranked batch sampling algorithm doesn’t have any labeled data to determine similarity among the uncertainty set, this function finds the element with the highest average similarity to the rest of the unlabeled set and uses it to cold-start the batch selection.

Refer to Cardoso et al.’s “Ranked batch-mode active learning”:
https://www.sciencedirect.com/science/article/pii/S0020025516313949
Parameters:
  • X – The set of unlabeled records.
  • metric – This parameter is passed to pairwise_distances().
  • n_jobs – This parameter is passed to pairwise_distances().
Returns:

Index of the best cold-start instance from X chosen to be labelled, together with the record itself.
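A numpy-only sketch of the cold-start rule: with no labels to compare against, pick the most central record, i.e. the one with the lowest mean distance (highest average similarity) to the rest. The helper name cold_start_index is hypothetical, and euclidean distance is hardcoded where modAL delegates to pairwise_distances():

```python
import numpy as np

def cold_start_index(X):
    """Return the index of the record with the highest average similarity
    (lowest mean euclidean distance) to all other records in X."""
    # full pairwise euclidean distance matrix via broadcasting
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # the most "central" record has the lowest mean distance to the rest
    return int(dists.mean(axis=1).argmin())
```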

modAL.batch.select_instance(X_training: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], X_pool: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], X_uncertainty: numpy.ndarray, mask: numpy.ndarray, metric: Union[str, Callable], n_jobs: Optional[int]) → Tuple[numpy.ndarray, Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], numpy.ndarray][source]

Core iteration strategy for selecting another record from our unlabeled records.

Given a set of labeled records (X_training) and unlabeled records (X_pool) with uncertainty scores (X_uncertainty), we’d like to identify the instance in X_pool that best balances uncertainty and dissimilarity to the labeled records.

Refer to Cardoso et al.’s “Ranked batch-mode active learning”:
https://www.sciencedirect.com/science/article/pii/S0020025516313949
Parameters:
  • X_training – Labeled records, plus any unlabeled records already selected in the current batch.
  • X_pool – Unlabeled records to be selected for labeling.
  • X_uncertainty – Uncertainty scores for unlabeled records to be selected for labeling.
  • mask – Mask to exclude previously selected instances from the pool.
  • metric – This parameter is passed to pairwise_distances().
  • n_jobs – This parameter is passed to pairwise_distances().
Returns:

Index of the instance from X_pool chosen to be labelled; the record itself, considered the best incremental addition to our query set; and the updated mask excluding that instance from future selection.
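One greedy selection step can be sketched in plain numpy (the helper name select_one is hypothetical, euclidean distance is hardcoded where modAL uses pairwise_distances(), and the 1 / (1 + distance) similarity and alpha weighting follow the ranked-batch formula from the paper):

```python
import numpy as np

def select_one(X_training, X_pool, uncertainty, mask):
    """Pick the pool record that best balances uncertainty and
    dissimilarity to the labeled set; mask marks still-eligible records."""
    n_labeled, n_unlabeled = len(X_training), int(mask.sum())
    alpha = n_unlabeled / (n_unlabeled + n_labeled)
    # distance from each eligible pool record to its nearest labeled record
    candidates = X_pool[mask]
    dists = np.sqrt(((candidates[:, None, :] - X_training[None, :, :]) ** 2)
                    .sum(axis=-1)).min(axis=1)
    similarity = 1.0 / (1.0 + dists)
    scores = alpha * (1 - similarity) + (1 - alpha) * uncertainty[mask]
    best = np.flatnonzero(mask)[scores.argmax()]
    new_mask = mask.copy()
    new_mask[best] = False  # exclude the pick from future rounds
    return best, X_pool[best], new_mask
```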

modAL.batch.uncertainty_batch_sampling(classifier: Union[modAL.models.base.BaseLearner, modAL.models.base.BaseCommittee], X: Union[numpy.ndarray, scipy.sparse.csr.csr_matrix], n_instances: int = 20, metric: Union[str, Callable] = 'euclidean', n_jobs: Optional[int] = None, **uncertainty_measure_kwargs) → Tuple[numpy.ndarray, Union[numpy.ndarray, scipy.sparse.csr.csr_matrix]][source]

Batch sampling query strategy. Selects the instances the classifier is least certain about for labelling.

This strategy differs from uncertainty_sampling(): although traditional active learning query strategies do accept n_instances > 1, they suffer from sub-optimal record selection in that case, because the most uncertain records are often highly similar to one another. This strategy extends interactive uncertainty query sampling to batch-mode uncertainty query sampling. Furthermore, it enforces a ranking – that is, it orders the records in the batch by how important they are to label.

Refer to Cardoso et al.’s “Ranked batch-mode active learning”:
https://www.sciencedirect.com/science/article/pii/S0020025516313949
Parameters:
  • classifier – One of modAL’s supported active learning models.
  • X – Set of records to be considered for our active learning model.
  • n_instances – Number of records to return for labeling from X.
  • metric – This parameter is passed to pairwise_distances().
  • n_jobs – If not set, pairwise_distances_argmin_min() is used for calculation of distances between samples. Otherwise it is passed to pairwise_distances().
  • **uncertainty_measure_kwargs – Keyword arguments to be passed to the predict_proba() method of the classifier.
Returns:

Indices of the instances from X chosen to be labelled; records from X chosen to be labelled.
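The whole strategy amounts to a greedy loop: score every remaining pool record, take the best one, treat it as if it were labeled, and repeat n_instances times. A numpy-only sketch under assumed conventions (hypothetical name ranked_batch_sketch, euclidean distance, similarity = 1 / (1 + distance)):

```python
import numpy as np

def ranked_batch_sketch(X_labeled, X_pool, uncertainty, n_instances):
    """Greedy ranked batch selection: each pick counts as labeled when
    scoring the next one, discouraging redundant near-duplicate picks."""
    labeled = list(X_labeled)
    mask = np.ones(len(X_pool), dtype=bool)
    picked = []
    for _ in range(n_instances):
        ref = np.asarray(labeled)
        n_lab, n_unlab = len(ref), int(mask.sum())
        alpha = n_unlab / (n_unlab + n_lab)
        cand = X_pool[mask]
        # distance from each eligible record to its nearest reference record
        d = np.sqrt(((cand[:, None, :] - ref[None, :, :]) ** 2).sum(-1)).min(1)
        scores = alpha * (1 - 1 / (1 + d)) + (1 - alpha) * uncertainty[mask]
        best = np.flatnonzero(mask)[scores.argmax()]
        picked.append(int(best))
        labeled.append(X_pool[best])  # a pick counts as labeled next round
        mask[best] = False
    return np.array(picked)
```

With modAL itself you would instead pass uncertainty_batch_sampling as the query_strategy of an ActiveLearner and call learner.query(X_pool), which returns both the chosen indices and the corresponding records.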