
Active Learning: Concepts, Workflow, Strategies, and Evaluation Metrics

Active learning addresses the high cost of labeling data by iteratively selecting the most informative unlabeled samples for annotation, reducing labeling effort while still reaching target model performance. This article explains its fundamentals, its relationship to supervised and semi‑supervised learning, common selection strategies, hybrid methods, and evaluation metrics.


Supervised learning requires large amounts of labeled data, which can be expensive and time‑consuming to obtain, especially in domains such as medical imaging where expert annotation incurs significant cost.

Because unlabeled data are abundant, active learning is introduced to reduce labeling cost by actively selecting the most valuable samples from the unlabeled pool for human annotation, thus improving model performance efficiently.

Active Learning Concepts and Basic Workflow

Concept: Active learning, also known as query learning or optimal experimental design, selects the most valuable unlabeled samples for labeling to achieve the desired model performance with minimal labeled data.

Basic Workflow: The process involves an unlabeled sample pool (U), a selection strategy (Q), annotators (S), a labeled set (L), and the target model (G). The model iteratively queries samples from U, obtains human labels, and retrains on the expanded L.
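The loop described above can be sketched generically. The function and argument names here (active_learning_loop, query_strategy, annotate, train) are illustrative, not from any particular library; they stand in for U, Q, S, L, and G from the text:

```python
def active_learning_loop(unlabeled_pool, query_strategy, annotate, train,
                         budget, batch_size=1):
    """Generic active-learning loop over the components named in the text:
    U (unlabeled pool), Q (query strategy), S (annotators), L (labeled set),
    and G (the model returned by train)."""
    labeled = []                      # L: grows each round
    pool = list(unlabeled_pool)       # U: shrinks each round
    model = None                      # G: no model before the first round
    while pool and len(labeled) < budget:
        # Q selects the next batch of samples to label
        queried = query_strategy(model, pool)[:batch_size]
        for x in queried:
            pool.remove(x)
            labeled.append((x, annotate(x)))   # S provides the label
        model = train(labeled)                 # retrain G on the expanded L
    return model, labeled
```

The loop terminates either when the labeling budget is exhausted or when the pool runs dry; in practice one would also stop once the model hits the target performance.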

Relationship with Supervised and Semi‑Supervised Learning

Active learning vs. supervised learning: By selecting high‑value samples for labeling, active learning reaches peak performance with fewer annotations.

Active learning vs. semi‑supervised learning: Both select informative samples from unlabeled data, but semi‑supervised learning relies on automatic labeling, whereas active learning requires human annotation.

Active Learning Strategies and Methods

Random Sampling: Selects samples randomly from the unlabeled pool without model interaction.

Uncertainty Strategies: Choose samples the model is least confident about, using metrics such as:

Least Confidence: 1 − max predicted probability; higher scores indicate less confident predictions.

Margin Sampling: Selects samples with the smallest difference between the two highest predicted probabilities.

Multi‑class Level Uncertainty: Applies the margin idea to multi‑class settings by comparing the best and second‑best class probabilities (P1 − P2); smaller values indicate higher uncertainty.

Maximum Entropy: Highest entropy of the predicted probability distribution.
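The scores above follow directly from their definitions. A minimal sketch over plain probability lists (the margin function also serves as the best‑versus‑second‑best P1 − P2 score in the multi‑class case):

```python
import math

def least_confidence(probs):
    # 1 - max predicted probability; higher = more uncertain
    return 1.0 - max(probs)

def margin(probs):
    # difference between the two highest probabilities; smaller = more uncertain
    p1, p2 = sorted(probs, reverse=True)[:2]
    return p1 - p2

def entropy(probs):
    # Shannon entropy of the predicted distribution; higher = more uncertain
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A selection strategy would then rank the unlabeled pool by one of these scores and query the most uncertain samples.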

Query by Committee (QBC): Builds a committee of multiple models, lets each predict unlabeled samples, and selects those with the most disagreement for labeling. Common disagreement measures include:

Vote Entropy: Entropy of the vote distribution across classes.

Average Kullback‑Leibler Divergence: Average KL divergence between each model’s prediction and the ensemble’s average distribution.
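Both disagreement measures can be sketched as follows: vote_entropy consumes each committee member's hard class vote, while avg_kl_divergence consumes each member's full predictive distribution (function names are illustrative):

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's vote distribution; votes is a list of
    predicted class labels, one per committee member. Higher = more
    disagreement."""
    c = len(votes)
    return -sum((n / c) * math.log(n / c) for n in Counter(votes).values())

def avg_kl_divergence(member_dists):
    """Average KL divergence between each member's predictive distribution
    and the committee's mean (consensus) distribution."""
    c = len(member_dists)
    k = len(member_dists[0])
    consensus = [sum(d[j] for d in member_dists) / c for j in range(k)]
    def kl(p, q):
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return sum(kl(d, consensus) for d in member_dists) / c
```

Samples with unanimous committees score zero under both measures; QBC queries the samples scoring highest.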

Hybrid approaches combine multiple basic strategies or integrate semi‑supervised learning (e.g., self‑training) to further improve efficiency.
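One round of such a hybrid might look like the sketch below. This is an assumed design, not a method from the article: model_predict (returning a class‑probability list), confident_thr, and query_k are hypothetical names and parameters:

```python
def hybrid_round(model_predict, pool, annotate, confident_thr=0.95, query_k=5):
    """One hybrid round combining self-training with uncertainty sampling:
    highly confident predictions are pseudo-labeled automatically, while the
    least confident samples go to human annotators."""
    pseudo, candidates = [], []
    for x in pool:
        probs = model_predict(x)
        conf = max(probs)
        if conf >= confident_thr:
            pseudo.append((x, probs.index(conf)))   # self-training label
        else:
            candidates.append((conf, x))
    candidates.sort(key=lambda t: t[0])             # least confident first
    queried = [(x, annotate(x)) for _, x in candidates[:query_k]]
    return pseudo, queried
```

The pseudo‑labeled and human‑labeled samples would then both be added to the labeled set before retraining, spending annotation effort only where the model is unsure.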

Evaluation Metrics for Active Learning

Key metrics include:

SavedRate: Proportion of labeling cost saved compared to annotating the full dataset, typically computed as (Full Samples − ExpertAnnotated) / Full Samples.

ExpertAnnotated: Number of samples annotated by experts when the model reaches the target performance.

Full Samples: Total number of available unlabeled samples.
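Putting the three quantities together, SavedRate can be computed as below, assuming the relation implied by the definitions above (the article does not give an explicit formula):

```python
def saved_rate(expert_annotated, full_samples):
    """Share of the full pool that never needed expert annotation,
    given the two counts defined in the text."""
    return (full_samples - expert_annotated) / full_samples
```

For example, reaching target performance after 200 expert labels out of a 1,000‑sample pool saves 80% of the labeling cost.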

Tags: Machine Learning · Active Learning · Labeling Cost Reduction · Query by Committee · Uncertainty Sampling
Written by

360 Smart Cloud

Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.
