yall.ActiveLearningModel

class yall.activelearning.ActiveLearningModel(classifier, query_strategy, eval_metric='auc', U_proportion=0.9, init_L='random', random_state=None)

Bases: object

Parameters:
  • classifier (sklearn.base.BaseEstimator) – Classifier to build the model.
  • query_strategy (QueryStrategy) – QueryStrategy instance to use.
  • eval_metric (str) – Evaluation metric: one of “auc” or “accuracy”.
  • U_proportion (float) – Proportion of the training data to assign to the unlabeled set U.
  • init_L (str) – How to initialize the labeled set L: “random” or “LDS”.
  • random_state (int) – Sets the random_state parameter of train_test_split.
partial_train(new_x, new_y)

Given a subset of training examples, updates the classifier incrementally by calling its partial_fit method.

Parameters:
  • new_x (numpy.ndarray) – Feature array.
  • new_y (numpy.ndarray) – Label array.
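partial_train only works with classifiers that support incremental updates through partial_fit. The sketch below is a minimal illustration of that protocol, assuming a toy incremental nearest-centroid classifier; IncrementalCentroid and the free function partial_train are hypothetical stand-ins, not yall code. With scikit-learn estimators such as SGDClassifier, the first partial_fit call must also receive the full list of classes.

```python
import numpy as np


class IncrementalCentroid:
    """Hypothetical toy classifier: keeps a running feature sum and count per class."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def partial_fit(self, X, y):
        # Accumulate per-class sums so centroids can be updated one batch
        # at a time, without revisiting earlier data.
        for x_row, label in zip(np.atleast_2d(X), np.atleast_1d(y)):
            self.sums[label] = self.sums.get(label, 0) + x_row
            self.counts[label] = self.counts.get(label, 0) + 1
        return self

    def predict(self, X):
        labels = list(self.sums)
        centroids = np.array([self.sums[c] / self.counts[c] for c in labels])
        # Euclidean distance from every row of X to every class centroid.
        dists = np.linalg.norm(
            np.atleast_2d(X)[:, None, :] - centroids[None, :, :], axis=2
        )
        return np.array(labels)[dists.argmin(axis=1)]


def partial_train(classifier, new_x, new_y):
    # Hypothetical free-function version of the documented method:
    # forward the new examples to the classifier's partial_fit.
    classifier.partial_fit(np.atleast_2d(new_x), np.atleast_1d(new_y))
    return classifier


clf = IncrementalCentroid()
partial_train(clf, np.array([[0.0, 0.0], [4.0, 4.0]]), np.array([0, 1]))
partial_train(clf, np.array([0.2, 0.0]), np.array(0))  # a single new example
print(clf.predict(np.array([[0.1, 0.1], [3.9, 4.2]])))  # [0 1]
```

Because each call only adds to running sums, the cost of an update depends on the size of the new batch, not on the size of L.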
prepare_data(train_X, test_X, train_y, test_y)

Splits the training data into unlabeled (U) and labeled (L) sets according to self.U_proportion; the test data forms the test set.

Parameters:
  • train_X (numpy.ndarray) – Training data features.
  • test_X (numpy.ndarray) – Test data features.
  • train_y (numpy.ndarray) – Training data labels.
  • test_y (numpy.ndarray) – Test data labels.
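The U/L split that prepare_data performs can be sketched as follows. This is a minimal illustration assuming a plain random shuffle of the training indices: split_U_L is a hypothetical helper, numpy's default_rng stands in for the train_test_split call the class uses, and the init_L="LDS" option is not modelled.

```python
import numpy as np


def split_U_L(train_X, train_y, U_proportion=0.9, random_state=None):
    """Hypothetical helper: assign U_proportion of the training rows to U, the rest to L."""
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(train_X))
    n_U = int(round(U_proportion * len(train_X)))
    U_idx, L_idx = idx[:n_U], idx[n_U:]
    # Only the training data is divided; the test data is left untouched.
    return (train_X[U_idx], train_y[U_idx]), (train_X[L_idx], train_y[L_idx])


train_X = np.arange(20, dtype=float).reshape(10, 2)
train_y = np.array([0, 1] * 5)
(U_X, U_y), (L_X, L_y) = split_U_L(train_X, train_y, U_proportion=0.9, random_state=0)
print(U_X.shape, L_X.shape)  # (9, 2) (1, 2)
```

With the default U_proportion=0.9, L starts very small, which is the usual setting for active learning: most labels are "hidden" in U and revealed one query at a time.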
run(train_X, test_X, train_y, test_y, ndraws=None, verbose=0)

Runs the active learning model, saving the AUC score for each sampling iteration.

Parameters:
  • train_X (numpy.ndarray) – Training data features.
  • test_X (numpy.ndarray) – Test data features.
  • train_y (numpy.ndarray) – Training data labels.
  • test_y (numpy.ndarray) – Test data labels.
  • ndraws (int) – Number of times to query the unlabeled set. If None, the entire unlabeled set is queried.
  • verbose (int) – If > 0, print information.
Returns: AUC scores for each sampling iteration.
Return type: numpy.ndarray, shape (ndraws,)
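The loop run describes (query U, move the chosen instance into L, retrain, record a score, repeat ndraws times) can be sketched as below. This is one plausible ordering under stated assumptions, not yall's implementation: the classifier is a toy nearest-centroid model, the query strategy is plain random sampling, and the per-iteration score is accuracy rather than AUC. run_loop, fit_centroids, and predict are hypothetical names.

```python
import numpy as np


def fit_centroids(X, y):
    # Hypothetical toy classifier: one mean vector per class present in L.
    labels = np.unique(y)
    return labels, np.array([X[y == c].mean(axis=0) for c in labels])


def predict(model, X):
    labels, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]


def run_loop(U_X, U_y, L_X, L_y, test_X, test_y, ndraws=None, seed=0):
    """Hypothetical sketch of run(): returns one score per sampling iteration."""
    rng = np.random.default_rng(seed)
    if ndraws is None:
        ndraws = len(U_X)  # query the entire unlabeled set
    scores = np.zeros(ndraws)
    for i in range(ndraws):
        # Random query strategy (an assumption): pick any remaining index in U.
        j = rng.integers(len(U_X))
        L_X = np.vstack([L_X, U_X[j]])
        L_y = np.append(L_y, U_y[j])
        U_X, U_y = np.delete(U_X, j, axis=0), np.delete(U_y, j)
        # Retrain on the enlarged L and record this iteration's score.
        model = fit_centroids(L_X, L_y)
        scores[i] = np.mean(predict(model, test_X) == test_y)
    return scores


X = np.array([[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]])
y = np.array([0, 0, 0, 1, 1, 1])
# L starts with one example per class; the other four rows form U.
scores = run_loop(X[[1, 2, 4, 5]], y[[1, 2, 4, 5]], X[[0, 3]], y[[0, 3]], X, y, ndraws=3)
print(scores.shape)  # (3,)
```

Swapping the random choice of j for an uncertainty-based choice is where a real query strategy would plug in; the rest of the loop stays the same.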

score()

Computes the performance of the current classifier according to self.eval_metric.

Returns: performance score
Return type: float
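score() dispatches on self.eval_metric. A self-contained sketch of the two documented metrics follows, assuming binary labels in {0, 1} and, for "auc", a continuous score for the positive class (for example the second column of predict_proba). AUC is computed here with the rank-pair (Mann-Whitney) formulation rather than scikit-learn's roc_auc_score, and compute_score is a hypothetical name.

```python
import numpy as np


def compute_score(eval_metric, y_true, y_pred=None, y_score=None):
    """Hypothetical stand-in for score(): 'accuracy' or 'auc' on binary labels."""
    y_true = np.asarray(y_true)
    if eval_metric == "accuracy":
        return float(np.mean(np.asarray(y_pred) == y_true))
    if eval_metric == "auc":
        s = np.asarray(y_score, dtype=float)
        pos, neg = s[y_true == 1], s[y_true == 0]
        # Fraction of (positive, negative) pairs ranked correctly; ties count half.
        greater = (pos[:, None] > neg[None, :]).sum()
        ties = (pos[:, None] == neg[None, :]).sum()
        return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
    raise ValueError(f"unknown eval_metric: {eval_metric!r}")


print(compute_score("auc", [0, 0, 1, 1], y_score=[0.1, 0.4, 0.35, 0.8]))  # 0.75
print(compute_score("accuracy", [0, 1, 1, 0], y_pred=[0, 1, 0, 0]))  # 0.75
```

Note that "auc" needs ranking scores while "accuracy" needs hard predictions, so a classifier used with eval_metric="auc" should expose some form of probability or decision score.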
train()

Fits the classifier on the labeled set L.

update_labels()

Gets the index of the next instance from the query strategy, moves the corresponding data point from U to L, and logs which instance was picked.

Returns: the chosen x and y, for use with partial_train()
Return type: tuple(numpy.ndarray, numpy.ndarray)
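The bookkeeping in update_labels amounts to moving one row from U to L. A minimal sketch, assuming U and L are stored as parallel numpy arrays and that the query strategy has already returned the chosen index; move_to_L is a hypothetical helper, and logging goes through the standard logging module.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def move_to_L(U_X, U_y, L_X, L_y, chosen):
    """Hypothetical helper: move row `chosen` of U into L and return the new arrays."""
    x, y = U_X[chosen], U_y[chosen]
    logger.info("picked index %d from U", chosen)
    L_X = np.vstack([L_X, x])
    L_y = np.append(L_y, y)
    U_X = np.delete(U_X, chosen, axis=0)
    U_y = np.delete(U_y, chosen)
    # (x, y) is what partial_train(new_x, new_y) would receive next.
    return U_X, U_y, L_X, L_y, (x, y)


U_X, U_y = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]), np.array([0, 1, 1])
L_X, L_y = np.array([[0.0, 0.0]]), np.array([0])
U_X, U_y, L_X, L_y, (x, y) = move_to_L(U_X, U_y, L_X, L_y, chosen=1)
print(U_X.shape, L_X.shape)  # (2, 2) (2, 2)
```

Returning the moved (x, y) pair lets a caller update an incremental classifier with partial_train instead of refitting on all of L.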