This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives. We consider the setting where the number of features (weak learners) p scales with the sample size n, in an over-parametrized regime. Under a broad class of statistical models, we provide an exact analysis of the generalization error of boosting, when the algorithm interpolates the training data and maximizes the empirical L1-margin. The relation between the boosting test error and the optimal Bayes error is pinned down explicitly. In turn, these precise characterizations resolve several open questions raised in [15, 81] surrounding boosting. On the computational front, we provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1 margin. Furthermore, we discover that the larger the overparametrization ratio p/n, the smaller the proportion of active features (with zero initialization), and the faster the optimization reaches interpolation. At the heart of our theory lies an in-depth study of the maximum L1-margin, which can be accurately described by a new system of non-linear equations; we analyze this margin and the properties of this system, using Gaussian comparison techniques and a novel uniform deviation argument. Variants of AdaBoost corresponding to general Lq geometry, for q > 1, are also presented, together with an exact analysis of the high-dimensional generalization and optimization behavior of a class of these algorithms.

More Research From These Scholars

BFI Working Paper Oct 16, 2020

How Well Generative Adversarial Networks Learn Distributions

Tengyuan Liang
Topics:  Technology & Innovation
BFI Working Paper Oct 16, 2020

Estimating Certain Integral Probability Metrics (IPMs) Is as Hard as Estimating under the IPMs

Tengyuan Liang
Topics:  Technology & Innovation
BFI Working Paper Oct 16, 2020

Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks

Tengyuan Liang, Hai Tran-Bach
Topics:  Technology & Innovation