Errata

Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience

  1. In Footnote 1, the variance should be specified as \(1/H\) and not \(1/{\sqrt{H}}\).

  2. In Equation (2), the \(\forall l\) in the second line should be \(\exists l\). The corrected equation, with the change highlighted, is:

\[\require{color} \begin{align} &{\text{if }} \forall q < r, \forall l, \rho_{q,l}(\mathcal{W},\mathbf{x},y) > 0 \text{ then} \\ & \; \; \; Pr_{\mathcal{U} \sim \mathcal{N}(0,\sigma^2 I)} \Big[ { \colorbox{yellow}{$\exists l$} \; |\rho_{r,l}(\mathcal{W}+\mathcal{U},{\mathbf{x},y}) - \rho_{r,l}(\mathcal{W},{\mathbf{x},y}) | > \frac{\Delta_{r,l}(\sigma)}{2} } \; \; \text{ and } \\ & \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; { \forall q {<} r, \forall l \; \; |\rho_{q,l}(\mathcal{W}+\mathcal{U},{\mathbf{x},y}) - \rho_{q,l}(\mathcal{W},{\mathbf{x},y}) |{<} \frac{\Delta_{q,l}(\sigma)}{2} } \Big] \leq \frac{1}{(R+1)\sqrt{m}}. \end{align}\]

The same correction applies to the analogous equations in the proof of Theorem 3.1.

Uniform convergence may be unable to explain generalization

  1. Definition 3.1 uses the term “generalization error” where “generalization gap” is the appropriate term. “Generalization error” conventionally refers to a classifier's expected error rate on test data, whereas “generalization gap” refers to the difference between the test error and the training error. For interpolating models (where the training error is zero), the two quantities coincide numerically; in general, however, they are distinct concepts.
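
The distinction can be sketched in symbols. The notation here is an assumption made for illustration and need not match the paper's Definition 3.1: \(h\) is a classifier, \(\mathcal{D}\) the data distribution, \(S = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}\) a training set drawn from \(\mathcal{D}\), and the loss is taken to be the 0-1 error.

\[\begin{aligned} \text{generalization error:} \quad & \mathcal{L}_{\mathcal{D}}(h) = \Pr_{(\mathbf{x},y) \sim \mathcal{D}} \big[ h(\mathbf{x}) \neq y \big], \\ \text{training error:} \quad & \hat{\mathcal{L}}_{S}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\big[ h(\mathbf{x}_i) \neq y_i \big], \\ \text{generalization gap:} \quad & \mathcal{L}_{\mathcal{D}}(h) - \hat{\mathcal{L}}_{S}(h). \end{aligned}\]

Under these assumed definitions, an interpolating model has \(\hat{\mathcal{L}}_{S}(h) = 0\), so the gap reduces to \(\mathcal{L}_{\mathcal{D}}(h)\) and the two terms name the same number. Whenever \(\hat{\mathcal{L}}_{S}(h) > 0\), the gap is strictly smaller than the generalization error.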