yall.ActiveLearningModel

class yall.activelearning.ActiveLearningModel(classifier, query_strategy, eval_metric='auc', U_proportion=0.9, init_L='random', random_state=None)

Bases: object

Parameters:

classifier (sklearn.base.BaseEstimator) – Classifier to build the model.
query_strategy (QueryStrategy) – QueryStrategy instance to use.
eval_metric (str) – One of “auc”, “accuracy”.
U_proportion (float) – Proportion of the training data to assign to the unlabeled set U.
init_L (str) – How to initialize the labeled set L: “random” or “LDS”.
random_state (int) – Sets the random_state parameter of train_test_split.

partial_train(new_x, new_y)

Calls partial_fit on the classifier with the given subset of training examples.

Parameters:

new_x (numpy.ndarray) – Feature array.
new_y (numpy.ndarray) – Label array.
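As an illustration of what incremental fitting means here, the sketch below defines a toy one-dimensional classifier whose partial_fit folds new examples into running class means instead of retraining from scratch. RunningMeanClassifier is a hypothetical stand-in, not part of yall or scikit-learn, and it uses plain lists rather than numpy arrays to stay dependency-free.

```python
class RunningMeanClassifier:
    """Hypothetical toy classifier: predicts the class with the nearest mean."""

    def __init__(self):
        self.sums = {}    # label -> sum of feature values seen so far
        self.counts = {}  # label -> number of examples seen so far

    def partial_fit(self, X, y):
        # Update the running statistics with the new examples only.
        for x, label in zip(X, y):
            self.sums[label] = self.sums.get(label, 0.0) + x
            self.counts[label] = self.counts.get(label, 0) + 1
        return self

    def predict(self, X):
        means = {label: self.sums[label] / self.counts[label] for label in self.sums}
        return [min(means, key=lambda label: abs(x - means[label])) for x in X]


clf = RunningMeanClassifier()
clf.partial_fit([0.0, 4.0], [0, 1])   # initial labeled examples
clf.partial_fit([1.0], [0])           # one newly labeled example
predictions = clf.predict([0.2, 3.0])
```

A classifier passed to ActiveLearningModel would presumably need its own partial_fit method for partial_train to work; in scikit-learn, estimators such as SGDClassifier provide one.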

prepare_data(train_X, test_X, train_y, test_y)

Splits data into unlabeled, labeled, and test sets according to self.U_proportion.

Parameters:

train_X (numpy.ndarray) – Training data features.
test_X (numpy.ndarray) – Test data features.
train_y (numpy.ndarray) – Training data labels.
test_y (numpy.ndarray) – Test data labels.
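The split into U and L can be pictured with the minimal sketch below. It is not the library's implementation: the documentation indicates that train_test_split is used, whereas this dependency-free version shuffles indices with the standard library. The helper name split_unlabeled_labeled is hypothetical, and only a random initialization of L is shown (the “LDS” option is not covered).

```python
import random


def split_unlabeled_labeled(n_samples, u_proportion=0.9, seed=None):
    """Return (unlabeled_idx, labeled_idx) for n_samples training rows."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_unlabeled = int(round(u_proportion * n_samples))
    # Assumption: keep at least one labeled example so training is possible.
    n_unlabeled = min(n_unlabeled, n_samples - 1)
    return sorted(indices[:n_unlabeled]), sorted(indices[n_unlabeled:])


# With 20 training rows and U_proportion=0.9, 18 rows go to U and 2 to L.
U_idx, L_idx = split_unlabeled_labeled(20, u_proportion=0.9, seed=0)
```

The test set is left untouched by this step; only the training rows are divided.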

run(train_X, test_X, train_y, test_y, ndraws=None, verbose=0)

Run the active learning model. Saves AUC scores for each sampling iteration.

Parameters:

train_X (numpy.ndarray) – Training data features.
test_X (numpy.ndarray) – Test data features.
train_y (numpy.ndarray) – Training data labels.
test_y (numpy.ndarray) – Test data labels.
ndraws (int) – Number of times to query the unlabeled set. If None, query the entire unlabeled set.
verbose (int) – If > 0, print information.

Returns:	AUC scores for each sampling iteration.
Return type:	numpy.ndarray(shape=(ndraws, ))
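The overall loop can be sketched as follows. This is a generic pool-based active learning loop under stated assumptions, not the library's code: the order of steps (train, score, query, move one item from U to L) is a guess, the function names are hypothetical, and the toy threshold classifier, accuracy scorer, and uncertainty-based query are stand-ins for the real classifier, eval_metric, and QueryStrategy.

```python
import random


def run_active_learning(train_fn, score_fn, query_fn, labeled, unlabeled, ndraws=None):
    """Train, score, then move one queried item from U to L, ndraws times."""
    if ndraws is None:
        ndraws = len(unlabeled)  # query the entire unlabeled pool
    scores = []
    for _ in range(ndraws):
        model = train_fn(labeled)
        scores.append(score_fn(model))
        chosen = query_fn(model, unlabeled)  # position within `unlabeled`
        labeled.append(unlabeled.pop(chosen))
    return scores


def train_threshold(labeled):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0


def make_accuracy_scorer(test):
    def score(threshold):
        hits = sum((x > threshold) == bool(y) for x, y in test)
        return hits / len(test)
    return score


def query_most_uncertain(threshold, unlabeled):
    """Pick the pool item closest to the decision threshold."""
    return min(range(len(unlabeled)), key=lambda i: abs(unlabeled[i][0] - threshold))


rng = random.Random(0)


def make_points(n):
    """Synthetic (x, y) pairs: class 0 centered at 0.0, class 1 at 3.0."""
    points = []
    for _ in range(n):
        y = rng.randint(0, 1)
        points.append((rng.gauss(3.0 * y, 1.0), y))
    return points


labeled = [(0.0, 0), (3.0, 1)]  # one seed example per class
pool = make_points(30)          # labels are known but treated as hidden
test = make_points(50)
scores = run_active_learning(
    train_threshold, make_accuracy_scorer(test), query_most_uncertain,
    labeled, pool, ndraws=10,
)
```

One score is recorded per draw, which matches the documented return shape of (ndraws, ).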

score()

Computes the performance of the current classifier according to self.eval_metric.

Returns:	performance score
Return type:	float
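For reference, the two supported metric names can be illustrated with the dependency-free sketch below. The library most likely delegates to an existing metrics implementation; these hand-written functions and the METRICS lookup table are assumptions for illustration only. AUC is computed here as the fraction of (positive, negative) pairs that the scores rank correctly, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, y_score):
    """ROC AUC via pairwise ranking of positive against negative scores."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical dispatch on a metric name such as eval_metric.
METRICS = {"auc": auc, "accuracy": accuracy}

y_true = [0, 0, 1, 1]
auc_value = METRICS["auc"](y_true, [0.1, 0.4, 0.35, 0.8])   # 3 of 4 pairs ranked correctly
acc_value = METRICS["accuracy"](y_true, [0, 1, 1, 1])       # 3 of 4 labels correct
```

Note that AUC takes continuous scores or probabilities, while accuracy takes hard class predictions.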

train()

Trains the classifier on L.

update_labels()

Gets the chosen index from the query strategy, adds the corresponding data point to L and removes it from U. Logs which instance is picked from U.

Returns:	chosen x and y, for use with partial_train()
Return type:	tuple(numpy.ndarray, numpy.ndarray)
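The bookkeeping described above amounts to moving one example from U to L and handing it back to the caller. A minimal sketch, assuming the chosen index is a position within U and using plain lists instead of numpy arrays (the function name move_to_labeled is hypothetical):

```python
def move_to_labeled(chosen, U_x, U_y, L_x, L_y):
    """Move the example at position `chosen` from U to L and return it."""
    x = U_x.pop(chosen)
    y = U_y.pop(chosen)
    L_x.append(x)
    L_y.append(y)
    return x, y


U_x, U_y = [[0.1], [0.7], [0.4]], [0, 1, 0]
L_x, L_y = [[0.9]], [1]

# Suppose the query strategy picked position 1 of the unlabeled set.
new_x, new_y = move_to_labeled(1, U_x, U_y, L_x, L_y)
```

The returned pair is what would then be passed on to an incremental update such as partial_train().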