Y’ALL: Yet another Active Learning Library
Prerequisites
A motivating example
Active learning can often discover a subset of the full data set that is sufficient to train a model that generalizes well to the test set. For example, consider the Iris data set:
>>> import numpy as np
>>> from yall import ActiveLearningModel
>>> from yall.querystrategies import Margin
>>> from yall.utils import plot_learning_curve
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression as LR
>>> from sklearn.base import clone
>>> np.random.seed(0)
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2)
>>> lr = LR(solver="liblinear", multi_class="auto")
>>> lr = lr.fit(train_X, train_y)
>>> print(lr.score(test_X, test_y))
0.967
Using the full training set, logistic regression achieves an accuracy of 0.967 on the test data.
>>> alm = ActiveLearningModel(clone(lr), Margin(),
... eval_metric="accuracy",
... U_proportion=0.95, random_state=0)
>>> accuracies, choices = alm.run(train_X, test_X, train_y, test_y)
>>> plot_learning_curve(accuracies, 0, len(accuracies),
... eval_metric="accuracy")
From the learning curve we see that only the first 25 or so queried data points are needed to achieve a perfect accuracy of 1.0 on the test data.
>>> lr_small = clone(lr)
>>> lr_small = lr_small.fit(alm.L.X[:25, ], alm.L.y[:25])
>>> print(lr_small.score(test_X, test_y))
1.0
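To illustrate the idea behind the example above, here is a minimal sketch of a generic pool-based active learning loop with margin sampling, written with only NumPy and scikit-learn. It is not Y'ALL's implementation: `margin_scores`, `run_margin_loop`, and the `n_per_class` seeding scheme are hypothetical names and choices made for this illustration, and the accuracies it produces will not necessarily match those reported above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def margin_scores(probs):
    """Gap between the two largest class probabilities in each row."""
    ordered = np.sort(probs, axis=1)
    return ordered[:, -1] - ordered[:, -2]


def run_margin_loop(train_X, train_y, test_X, test_y, n_per_class=2, seed=0):
    """Hypothetical helper: label one point per iteration, smallest margin first."""
    rng = np.random.RandomState(seed)
    # Assumption: seed the labeled set with a few random examples of each
    # class so the classifier can be fit from the first iteration.
    labeled = np.concatenate([
        rng.choice(np.flatnonzero(train_y == c), size=n_per_class, replace=False)
        for c in np.unique(train_y)
    ])
    unlabeled = np.setdiff1d(np.arange(len(train_y)), labeled)
    model = LogisticRegression(max_iter=1000)
    accuracies = []
    while True:
        # Refit on the current labeled set and record test accuracy.
        model.fit(train_X[labeled], train_y[labeled])
        accuracies.append(model.score(test_X, test_y))
        if len(unlabeled) == 0:
            break
        # Query the unlabeled point the model is least sure about.
        probs = model.predict_proba(train_X[unlabeled])
        pick = np.argmin(margin_scores(probs))
        labeled = np.append(labeled, unlabeled[pick])
        unlabeled = np.delete(unlabeled, pick)
    return np.array(accuracies), labeled


iris = load_iris()
train_X, test_X, train_y, test_y = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)
accuracies, order = run_margin_loop(train_X, train_y, test_X, test_y)
print(len(accuracies), accuracies[-1])
```

The returned `order` array lists training indices in the order they were labeled, so slicing its first entries gives the kind of small labeled subset discussed above.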
Supported query strategies
- Random Sampling (passive learning)
- Uncertainty Sampling
  - Entropy Sampling
  - Least Confidence
  - Least Confidence with Bias
  - Least Confidence with Dynamic Bias
  - Margin Sampling
  - Simple Margin Sampling
- Representative Sampling
  - Density Sampling
  - Distance to Center
  - MinMax Sampling
- Combined Sampling
  - Beta-weighted Combined Sampling
  - Lambda-weighted Combined Sampling
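For readers unfamiliar with the uncertainty-based strategies, the sketch below computes the textbook least confidence, margin, and entropy scores from a matrix of predicted class probabilities. These are the standard definitions written in plain NumPy, not Y'ALL's own code; the function names and the example `probs` matrix are assumptions made for illustration.

```python
import numpy as np


def least_confidence(probs):
    """1 minus the top class probability; higher means more uncertain."""
    return 1.0 - probs.max(axis=1)


def margin(probs):
    """Gap between the two most probable classes; lower means more uncertain."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def entropy(probs):
    """Shannon entropy of each row; higher means more uncertain."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


# Made-up predicted probabilities for three unlabeled examples.
probs = np.array([
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
    [0.90, 0.05, 0.05],
])

print(least_confidence(probs).argmax())  # 0
print(margin(probs).argmin())            # 1
print(entropy(probs).argmax())           # 0
```

On this toy input the strategies disagree: least confidence and entropy would query the first example, whose probability mass is spread over all three classes, while margin sampling would query the second, whose top two classes are nearly tied.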
Running Tests
First, install pytest-cov (for example, with pip install pytest-cov).
Then, from the project home directory, run:
py.test --cov=yall tests
Authors
- Jake Vasilakes - jvasilakes@gmail.com
Acknowledgements
This project grew out of a study of active learning methods for biomedical text classification. The paper associated with this study can be found at https://doi.org/10.1093/jamiaopen/ooy021