- Part I. History of Statistical Learning Theory. 1. In hindsight: Doklady Akademii nauk SSSR, 181(4), 1968 / Léon Bottou
- 2. On the uniform convergence of the frequencies of occurrence of events to their probabilities / Vladimir N. Vapnik and Alexey Ya. Chervonenkis
- 3. Early history of support vector machines / Alexey Ya. Chervonenkis
- Part II. Theory and Practice of Statistical Learning Theory. 4. Some remarks on the statistical analysis of SVMs and related methods / Ingo Steinwart
- 5. Explaining AdaBoost / Robert E. Schapire
- 6. On the relations and differences between Popper dimension, exclusion dimension and VC-dimension / Yevgeny Seldin and Bernhard Schölkopf
- 7. On learnability, complexity and stability / Silvia Villa, Lorenzo Rosasco, and Tomaso Poggio
- 8. Loss functions / Robert C. Williamson
- 9. Statistical learning theory in practice / Jason Weston
- 10. PAC-Bayesian theory / David McAllester and Takintayo Akinbiyi
- 11. Kernel ridge regression / Vladimir Vovk
- 12. Multi-task learning for computational biology: overview and outlook / Christian Widmer, Marius Kloft, and Gunnar Rätsch
- 13. Semi-supervised learning in causal and anticausal settings / Bernhard Schölkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, and Joris Mooij
- 14. Strong universal consistent estimate of the minimum mean squared error / Luc Devroye, Paola G. Ferrario, László Györfi, and Harro Walk
- 15. The median hypothesis / Ran Gilad-Bachrach and Chris J.C. Burges
- 16. Efficient transductive online learning via randomized rounding / Nicolò Cesa-Bianchi and Ohad Shamir
- 17. Pivotal estimation in high-dimensional regression via linear programming / Eric Gautier and Alexandre B. Tsybakov
- 18. On sparsity inducing regularization methods for machine learning / Andreas Argyriou, Luca Baldassarre, Charles A. Micchelli, and Massimiliano Pontil
- 19. Sharp oracle inequalities in low rank estimation / Vladimir Koltchinskii
- 20. On the consistency of the bootstrap approach for support vector machines and related kernel-based methods / Andreas Christmann and Robert Hable
- 21. Kernels, pre-images and optimization / John C. Snyder, Sebastian Mika, Kieron Burke, and Klaus-Robert Müller
- 22. Efficient learning of sparse ranking functions / Mark Stevens, Samy Bengio, and Yoram Singer
- 23. Direct approximation of divergences between probability distributions / Masashi Sugiyama

This book honours the outstanding contributions of Vladimir Vapnik, a rare example of a scientist for whom the following statements hold true simultaneously: his work led to the inception of a new field of research, the theory of statistical learning and empirical inference; he has lived to see the field blossom; and he is still as active as ever. He began analyzing learning algorithms in the 1960s and invented the first version of the generalized portrait algorithm. He later developed one of the most successful methods in machine learning, the support vector machine (SVM) - more than just an algorithm, this was a new approach to learning problems, pioneering the use of functional analysis and convex optimization in machine learning.

Part I of this book contains three chapters describing and witnessing some of Vladimir Vapnik's contributions to science. In the first chapter, Léon Bottou discusses the seminal paper published in 1968 by Vapnik and Chervonenkis that laid the foundations of statistical learning theory, and the second chapter is an English-language translation of that original paper. In the third chapter, Alexey Chervonenkis presents a first-hand account of the early history of SVMs and valuable insights into the first steps in the development of the SVM within the framework of the generalized portrait method.

The remaining chapters, by leading scientists in domains such as statistics, theoretical computer science, and mathematics, address substantial topics in the theory and practice of statistical learning theory, including SVMs and other kernel-based methods, boosting, PAC-Bayesian theory, online and transductive learning, loss functions, learnable function classes, notions of complexity for function classes, multi-task learning, and hypothesis selection. These contributions include historical and contextual notes, short surveys, and comments on future research directions.
This book will be of interest to researchers, engineers, and graduate students engaged with all aspects of statistical learning.