Notes on Feasibility, Hoeffding Inequality, and VC Theory from Lin Xuantian's Machine Learning Foundations Course
These concise notes summarize key concepts from Professor Lin Xuantian's Machine Learning Foundations course: the feasibility of learning, the Hoeffding and multi-bin Hoeffding inequalities, the VC bound, dichotomies, the growth and bounding functions, the VC dimension, and their implications for model and sample complexity. They are based primarily on the professor's lecture materials, with additional references indicated.
Feasibility: Learning builds a model from training samples in order to predict well on unknown data. The goal decomposes into two requirements: the in-sample error E_in must be close to the out-of-sample error E_out (a statistical guarantee), and E_in must be made small (the algorithm's job). Only when both hold does reducing E_in lead to a small E_out, i.e., good learning performance.
Hoeffding Inequality: For a single fixed hypothesis, the deviation between the sample proportion ν and the true proportion μ satisfies P[|ν − μ| > ε] ≤ 2·exp(−2ε²N). When the sample size N is sufficiently large, ν is probably approximately correct (the PAC principle), so a low empirical error implies, with high probability, a low true error.
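The inequality above can be checked numerically. The sketch below (a simulation of my own, not from the course materials) compares the observed deviation frequency of a biased coin with the Hoeffding bound:

```python
import math
import random

def hoeffding_bound(eps, n):
    """Two-sided Hoeffding bound: P[|nu - mu| > eps] <= 2 exp(-2 eps^2 N)."""
    return 2 * math.exp(-2 * eps ** 2 * n)

def empirical_deviation_rate(mu, n, eps, trials=10_000, seed=0):
    """Estimate how often the sample mean of N Bernoulli(mu) draws
    deviates from mu by more than eps."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        nu = sum(rng.random() < mu for _ in range(n)) / n
        if abs(nu - mu) > eps:
            bad += 1
    return bad / trials

rate = empirical_deviation_rate(mu=0.5, n=100, eps=0.1)
bound = hoeffding_bound(0.1, 100)
print(f"empirical deviation rate = {rate:.4f}, Hoeffding bound = {bound:.4f}")
```

The bound is loose (it holds for any μ), so the observed rate sits well below it.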
Multi-bin Hoeffding: For a hypothesis set containing many hypotheses, the single-hypothesis Hoeffding result no longer applies directly, because the algorithm picks a hypothesis after seeing the data. By the union bound, the probability that the sample is "bad" for at least one hypothesis is at most 2M·exp(−2ε²N), which grows with the number of hypotheses M and shrinks with the sample size N. Thus, with finite M and large N, we can confidently select the hypothesis with minimal empirical error.
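The effect of M can be seen with the course's coin-flipping analogy: each hypothesis is a fair coin, and a "bad" sample is one where some coin's observed frequency strays far from 0.5. This simulation (my own illustration, with arbitrary parameter choices) shows the bad-sample rate rising with M:

```python
import math
import random

def union_hoeffding_bound(eps, n, m):
    """Bad-sample bound for M hypotheses via the union bound:
    P[bad for some h] <= 2 M exp(-2 eps^2 N)."""
    return 2 * m * math.exp(-2 * eps ** 2 * n)

def some_hypothesis_deviates(mu, n, m, eps, trials=1000, seed=1):
    """Estimate P[at least one of M 'coins' deviates from mu by > eps]."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        for _ in range(m):
            nu = sum(rng.random() < mu for _ in range(n)) / n
            if abs(nu - mu) > eps:
                bad += 1
                break  # one deviating hypothesis makes the sample bad
    return bad / trials

for m in (1, 10, 100):
    rate = some_hypothesis_deviates(0.5, n=50, m=m, eps=0.2)
    print(f"M={m:>3}: empirical {rate:.3f}, bound {union_hoeffding_bound(0.2, 50, m):.3f}")
```

For large M the union bound becomes vacuous (exceeds 1), which is exactly why the VC analysis replaces M with the growth function.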
VC Bound: To replace the possibly infinite hypothesis-set size M, the growth function m_H(N) is introduced and, whenever a break point k exists, bounded by the polynomial bounding function B(N, k). The resulting VC bound, P[∃h ∈ H: |E_in(h) − E_out(h)| > ε] ≤ 4·m_H(2N)·exp(−ε²N/8), shows that with a finite VC dimension and sufficiently large N, E_in ≈ E_out holds for every hypothesis in the set.
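The bounding function has the closed form B(N, k) = Σ_{i=0}^{k−1} C(N, i) (the Sauer-style bound), which is polynomial in N of degree k − 1. A quick sketch of the computation:

```python
from math import comb

def bounding_function(n, k):
    """B(N, k) = sum_{i=0}^{k-1} C(N, i): the maximum number of
    dichotomies on N points when k is a break point of H."""
    return sum(comb(n, i) for i in range(k))

# With break point k = 2 (e.g. positive rays), B(N, 2) = N + 1.
print([bounding_function(n, 2) for n in range(1, 6)])  # [2, 3, 4, 5, 6]

# Even with k = 4 the bound stays far below 2^N:
print(bounding_function(10, 4))  # 176, versus 2**10 = 1024
```

Because B(N, k) grows polynomially while the exponential term in the VC bound shrinks, the overall bound goes to zero as N grows.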
Key VC Concepts:
Dichotomy: For a sample D of N points, each possible labeling (○ or ×) defines a dichotomy; the dichotomy is realized if some hypothesis in H produces that labeling. All realizable dichotomies on D form the dichotomy set.
Growth Function m(N): The maximum size of the dichotomy set over all samples of size N.
Shatter: When the dichotomy set of some sample of size N contains all 2^N labelings (so m(N) = 2^N), H shatters that sample.
Break Point k: The smallest N for which m(N) < 2^N, i.e., the smallest sample size at which no sample can be shattered.
VC Dimension d: The largest N for which m(N) = 2^N, i.e., the largest sample size that H can shatter; d = k − 1.
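These definitions can be made concrete with the positive-ray hypothesis set h(x) = sign(x − a), a standard example from the course. A brute-force count of realizable dichotomies (my own sketch) recovers m(N) = N + 1, break point k = 2, and VC dimension 1:

```python
def growth_function_positive_rays(n):
    """Count the dichotomies realizable on N ordered points by positive
    rays h(x) = sign(x - a): the threshold a can fall in any of the
    N + 1 gaps, giving m(N) = N + 1."""
    points = list(range(1, n + 1))
    thresholds = [0.5 + i for i in range(n + 1)]  # one candidate per gap
    dichotomies = {tuple(1 if x > a else -1 for x in points)
                   for a in thresholds}
    return len(dichotomies)

for n in range(1, 6):
    print(n, growth_function_positive_rays(n), 2 ** n)
# m(1) = 2 = 2^1, so one point is shattered; m(2) = 3 < 4, so the
# break point is k = 2 and the VC dimension is d = k - 1 = 1.
```

Enumerating one threshold per gap suffices because any two thresholds in the same gap induce the same labeling.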
Relations: (1) For a perceptron, the VC dimension equals its number of free parameters: with d-dimensional inputs there are d + 1 parameters (d weights plus a bias), and the VC dimension is d + 1 (here d denotes the input dimension, not the VC dimension). The VC dimension thus measures the effective degrees of freedom of the hypothesis set, and a larger VC dimension indicates a more expressive class. (2) Substituting the polynomial bound on m_H(N) obtained from the break point (with VC dimension k − 1) into the VC bound and solving for ε yields a penalty term Ω(N, H, δ) that reflects the cost of model complexity.
The penalty Ω(N, H, δ) grows with the VC dimension, so a more powerful hypothesis set pays a higher generalization price: with probability at least 1 − δ, E_out ≤ E_in + Ω(N, H, δ). As a practical rule of thumb, the sample complexity is about N ≈ 10·d, where d is the VC dimension (the theoretical bound alone would demand a much larger constant).
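The penalty and the rule of thumb can be sketched numerically. The formula below is the standard VC generalization bound E_out ≤ E_in + √((8/N)·ln(4·(2N)^d_VC / δ)); the parameter values are arbitrary choices for illustration:

```python
import math

def vc_penalty(n, d_vc, delta):
    """Model-complexity penalty Omega(N, H, delta) from the VC bound:
    E_out <= E_in + sqrt((8 / N) * ln(4 * (2N)^d_vc / delta))."""
    return math.sqrt(8.0 / n * math.log(4 * (2 * n) ** d_vc / delta))

# How the penalty shrinks with N for a hypothesis set of VC dimension 3:
d_vc, delta = 3, 0.1
for n in (30, 100, 1000, 10_000):
    print(f"N = {n:>6}: Omega = {vc_penalty(n, d_vc, delta):.3f}")
```

Note that at small N the penalty exceeds 1 and the bound is vacuous, which is one reason the practical N ≈ 10·d rule is far more optimistic than the theory alone would suggest.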
Reference: https://juejin.cn/post/6844903541476163597