modAL.disagreement

Disagreement measures and disagreement based query strategies for the Committee model.

modAL.disagreement.KL_max_disagreement(committee: modAL.models.base.BaseCommittee, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → numpy.ndarray

Calculates the max disagreement for the Committee. First it computes the class probabilities of X for each learner in the Committee, then calculates the consensus probability distribution by averaging the class probabilities over the learners. Each learner’s class probabilities are then compared to the consensus distribution using the Kullback-Leibler divergence. The max disagreement for a given sample is the maximum of the learners’ KL divergences from the consensus distribution.

Parameters:
  • committee – The modAL.models.BaseCommittee instance for which the max disagreement is to be calculated.
  • X – The data for which the max disagreement is to be calculated.
  • **predict_proba_kwargs – Keyword arguments for the predict_proba() of the Committee.
Returns:

Max disagreement of the Committee for the samples in X.
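
The computation described above can be sketched with plain NumPy. This is an illustrative sketch, not the library's implementation: the function name kl_max_disagreement_sketch, the array layout (n_samples, n_learners, n_classes), the divergence direction KL(learner || consensus), and the use of the natural logarithm are all assumptions made for the example.

```python
import numpy as np

def kl_max_disagreement_sketch(proba):
    """proba: hypothetical array of shape (n_samples, n_learners, n_classes)."""
    proba = np.asarray(proba, dtype=float)
    # Consensus distribution: average the class probabilities over the learners.
    consensus = proba.mean(axis=1, keepdims=True)
    # KL(learner || consensus); terms with zero probability contribute 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(proba > 0, proba * np.log(proba / consensus), 0.0)
    kl = terms.sum(axis=2)      # shape (n_samples, n_learners)
    return kl.max(axis=1)       # largest divergence per sample

# Made-up probabilities: 2 samples, 2 learners, 2 classes.
proba = np.array([
    [[0.5, 0.5], [0.5, 0.5]],   # learners agree
    [[0.9, 0.1], [0.1, 0.9]],   # learners disagree
])
scores = kl_max_disagreement_sketch(proba)
print(scores)
```

The second sample, where the learners contradict each other, receives the larger score and would be preferred by a strategy that queries the most disputed samples.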

modAL.disagreement.consensus_entropy(committee: modAL.models.base.BaseCommittee, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → numpy.ndarray

Calculates the consensus entropy for the Committee. First it computes the class probabilities of X for each learner in the Committee, then calculates the consensus probability distribution by averaging the class probabilities over the learners. The entropy of the consensus probability distribution is the consensus entropy of the Committee, which is returned.

Parameters:
  • committee – The modAL.models.BaseCommittee instance for which the consensus entropy is to be calculated.
  • X – The data for which the consensus entropy is to be calculated.
  • **predict_proba_kwargs – Keyword arguments for the predict_proba() of the Committee.
Returns:

Consensus entropy of the Committee for the samples in X.
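
As a rough illustration of the description above, the following NumPy sketch averages per-learner class probabilities and takes the entropy of the result. The function name consensus_entropy_sketch, the input layout (n_samples, n_learners, n_classes), and the natural logarithm are assumptions for the example, not part of the modAL API.

```python
import numpy as np

def consensus_entropy_sketch(proba):
    """proba: hypothetical array of shape (n_samples, n_learners, n_classes)."""
    proba = np.asarray(proba, dtype=float)
    # Consensus distribution: average over the learner axis.
    consensus = proba.mean(axis=1)
    # Shannon entropy of the consensus; zero-probability classes contribute 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(consensus > 0, -consensus * np.log(consensus), 0.0)
    return terms.sum(axis=1)

# Made-up probabilities: 2 samples, 2 learners, 2 classes.
proba = np.array([
    [[1.0, 0.0], [1.0, 0.0]],   # both learners are certain of class 0
    [[1.0, 0.0], [0.0, 1.0]],   # learners are certain of different classes
])
entropies = consensus_entropy_sketch(proba)
print(entropies)
```

In the second sample the consensus is uniform over two classes, so its entropy is log(2), the largest possible value for two classes.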

modAL.disagreement.consensus_entropy_sampling(committee: modAL.models.base.BaseCommittee, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], n_instances: int = 1, random_tie_break=False, **disagreement_measure_kwargs) → Tuple[numpy.ndarray, Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix]]

Consensus entropy sampling strategy.

Parameters:
  • committee – The committee for which the labels are to be queried.
  • X – The pool of samples to query from.
  • n_instances – Number of samples to be queried.
  • random_tie_break – If True, shuffles utility scores to randomize the order. This can be used to break the tie when the highest utility score is not unique.
  • **disagreement_measure_kwargs – Keyword arguments to be passed for the disagreement measure function.
Returns:

The indices of the instances from X chosen to be labelled; the instances from X chosen to be labelled.
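
The sampling strategies in this module share the same shape: score every sample in the pool with a disagreement measure, then return the indices and instances with the highest scores. The sketch below illustrates that selection step only. The helper select_top_sketch and its rng parameter are hypothetical, and the ordering among tied scores may differ from what the library returns.

```python
import numpy as np

def select_top_sketch(scores, n_instances=1, random_tie_break=False, rng=None):
    """Return indices of the n_instances highest scores (illustrative only)."""
    scores = np.asarray(scores, dtype=float)
    order = np.arange(len(scores))
    if random_tie_break:
        # Shuffle the candidate order so that ties are broken at random.
        rng = np.random.default_rng() if rng is None else rng
        order = rng.permutation(len(scores))
    # Stable descending sort over the (possibly shuffled) candidates.
    ranked = order[np.argsort(-scores[order], kind="stable")]
    return ranked[:n_instances]

# Made-up pool of 4 samples and their disagreement scores (indices 1 and 2 tie).
pool = np.array([[0.0], [1.0], [2.0], [3.0]])
scores = np.array([0.1, 0.7, 0.7, 0.3])

query_idx = select_top_sketch(scores, n_instances=2)
query_instances = pool[query_idx]

# With random tie breaking, either of the tied samples may come first.
tied_pick = select_top_sketch(scores, n_instances=1, random_tie_break=True)
```

The pair (query_idx, query_instances) mirrors the two-element return value described above.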

modAL.disagreement.max_disagreement_sampling(committee: modAL.models.base.BaseCommittee, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], n_instances: int = 1, random_tie_break=False, **disagreement_measure_kwargs) → Tuple[numpy.ndarray, Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix]]

Maximum disagreement sampling strategy.

Parameters:
  • committee – The committee for which the labels are to be queried.
  • X – The pool of samples to query from.
  • n_instances – Number of samples to be queried.
  • random_tie_break – If True, shuffles utility scores to randomize the order. This can be used to break the tie when the highest utility score is not unique.
  • **disagreement_measure_kwargs – Keyword arguments to be passed for the disagreement measure function.
Returns:

The indices of the instances from X chosen to be labelled; the instances from X chosen to be labelled.

modAL.disagreement.max_std_sampling(regressor: sklearn.base.BaseEstimator, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], n_instances: int = 1, random_tie_break=False, **predict_kwargs) → Tuple[numpy.ndarray, Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix]]

Regressor standard deviation sampling strategy.

Parameters:
  • regressor – The regressor for which the labels are to be queried.
  • X – The pool of samples to query from.
  • n_instances – Number of samples to be queried.
  • random_tie_break – If True, shuffles utility scores to randomize the order. This can be used to break the tie when the highest utility score is not unique.
  • **predict_kwargs – Keyword arguments to be passed to predict() of the CommitteeRegressor.
Returns:

The indices of the instances from X chosen to be labelled; the instances from X chosen to be labelled.
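
The idea behind standard deviation sampling can be sketched as follows, assuming the per-sample standard deviation is taken across the predictions of several regressors. The predictions array is made up for the example, and the way the library actually obtains the standard deviation from the regressor is not shown here.

```python
import numpy as np

# Made-up predictions: 3 samples (rows) by 3 regressors (columns).
predictions = np.array([
    [1.0, 1.0, 1.0],   # regressors agree exactly
    [0.0, 2.0, 4.0],   # regressors disagree strongly
    [1.0, 1.5, 2.0],   # regressors disagree mildly
])

# Spread of the predictions for each sample.
std = predictions.std(axis=1)

# Query the sample(s) with the largest standard deviation.
n_instances = 1
query_idx = np.argsort(-std, kind="stable")[:n_instances]
print(query_idx, std)
```

The second sample has the widest spread of predictions, so it is the one selected for labelling.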

modAL.disagreement.vote_entropy(committee: modAL.models.base.BaseCommittee, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], **predict_proba_kwargs) → numpy.ndarray

Calculates the vote entropy for the Committee. First it computes the predictions of X for each learner in the Committee, then calculates the probability distribution of the votes. The entropy of this distribution is the vote entropy of the Committee, which is returned.

Parameters:
  • committee – The modAL.models.BaseCommittee instance for which the vote entropy is to be calculated.
  • X – The data for which the vote entropy is to be calculated.
  • **predict_proba_kwargs – Keyword arguments for the predict_proba() of the Committee.
Returns:

Vote entropy of the Committee for the samples in X.
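
A minimal NumPy sketch of the steps described above: count how the learners voted for each sample, turn the counts into a distribution, and take its entropy. The function name vote_entropy_sketch, the votes layout (n_samples, n_learners), and the natural logarithm are assumptions for the example.

```python
import numpy as np

def vote_entropy_sketch(votes, classes):
    """votes: hypothetical array of predicted labels, shape (n_samples, n_learners)."""
    votes = np.asarray(votes)
    # Fraction of learners voting for each class, per sample.
    p_vote = np.stack([(votes == c).mean(axis=1) for c in classes], axis=1)
    # Shannon entropy of the vote distribution; classes with no votes contribute 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p_vote > 0, -p_vote * np.log(p_vote), 0.0)
    return terms.sum(axis=1)

# Made-up votes of 3 learners on 3 samples, with class labels 0, 1 and 2.
votes = np.array([
    [0, 0, 0],   # unanimous
    [0, 1, 2],   # every learner votes differently
    [0, 0, 1],   # two against one
])
vote_entr = vote_entropy_sketch(votes, classes=[0, 1, 2])
print(vote_entr)
```

Unlike consensus entropy, this measure only looks at the predicted labels, so a learner that is barely confident counts the same as one that is certain.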

modAL.disagreement.vote_entropy_sampling(committee: modAL.models.base.BaseCommittee, X: Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix], n_instances: int = 1, random_tie_break=False, **disagreement_measure_kwargs) → Tuple[numpy.ndarray, Union[list, numpy.ndarray, scipy.sparse.csr.csr_matrix]]

Vote entropy sampling strategy.

Parameters:
  • committee – The committee for which the labels are to be queried.
  • X – The pool of samples to query from.
  • n_instances – Number of samples to be queried.
  • random_tie_break – If True, shuffles utility scores to randomize the order. This can be used to break the tie when the highest utility score is not unique.
  • **disagreement_measure_kwargs – Keyword arguments to be passed for the disagreement measure function.
Returns:

The indices of the instances from X chosen to be labelled; the instances from X chosen to be labelled.