I do not have an office at NYU. I work from home. I come to NYU only to give lectures. If you need to contact me, please email me and/or arrange a Zoom or Google Hangouts call. Office hours are on Zoom.
In this course you will learn the theoretical foundations of machine learning. We will explain how a machine learning algorithm can make accurate predictions on unseen data; this property is called generalization. We develop a statistical theory of generalization for prediction problems. Along the way we analyze several popular machine learning methods (support vector machines, boosting, perceptron, stochastic gradient descent, decision trees, neural networks, k-nearest neighbor). Besides the generalization properties of machine learning algorithms, we will also look at their computational complexity and at the computational hardness of the underlying optimization problems.
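To give a one-formula preview (the notation below is the standard one from statistical learning theory and is added here only for orientation; it is defined properly in the first lectures): for a classifier $h$, a data distribution $\mathcal{D}$, and an i.i.d. sample $S = ((x_1, y_1), \ldots, (x_m, y_m))$, the generalization error and the empirical error are

$$
L_{\mathcal{D}}(h) = \Pr_{(x,y) \sim \mathcal{D}}\bigl[ h(x) \neq y \bigr],
\qquad
L_S(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\bigl[ h(x_i) \neq y_i \bigr].
$$

Much of the course is about when and why the second quantity, which we can compute from data, is a good estimate of the first, which we cannot.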
The class will be completely theoretical. In lectures, I will give definitions and prove theorems. In homeworks and exams, I will ask you to solve mathematical problems and prove theorems. You will NOT do any coding, you will NOT do any data analysis, and you will NOT build any machine learning models.
You must have taken courses in probability theory, linear algebra, and algorithms. You will benefit from prior exposure to machine learning (ML), e.g. an introductory ML course, a neural networks course, a statistical inference course, or practical experience. However, knowledge of ML is not a prerequisite. I will give a brief introduction to ML in the first lecture.
I expect you to know one-dimensional calculus (limits, infimum, supremum, continuity, derivatives and integrals), mathematical notation (sets, functions, logical formulas), and elementary combinatorics (combinations, permutations, binomial coefficients). You must be comfortable doing mathematical proofs.
Note that the total percentage adds up to 110%. That means you can make up lost points. If you want to do well in the course, make sure to submit solutions to all homework problems.
There is one 2.5-hour class meeting per week, divided by a 15-minute break. In accordance with NYU guidelines, you must wear a mask covering your nose and mouth whenever you are attending lecture.
Your solutions must be turned in both by email and as a printed hard copy in class by the due date. Send emails to david.pal@nyu.edu with the subject "Homework #?". Use LaTeX and submit your solution as a PDF file, using the template linked in the table below (a minimal sketch of such a file appears after the table).
Homework | LaTeX source | Handout date | Due date | Graded by | Solution PDF | Solution LaTeX source
---|---|---|---|---|---|---
Homework #0 | LaTeX source | January 23 | January 31 | February 5 | |
Homework #1 | | February 5 | February 19 | February 26 | |
Homework #2 | | February 19 | March 5 | March 12 | |
Homework #3 | | March 26 | April 9 | April 16 | |
Homework #4 | | April 9 | April 23 | April 30 | |
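For orientation, here is a minimal, hypothetical sketch of the kind of LaTeX file a homework submission looks like. It is only an illustration of the expected format, not the actual template; please use the LaTeX source linked in the table above.

```latex
% Hypothetical homework skeleton -- for illustration only.
% The actual template is the "LaTeX source" linked in the table above.
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm}

\title{Homework \#1}   % replace with the actual homework number
\author{Your Name}
\date{Due: February 19}

\begin{document}
\maketitle

\section*{Problem 1}
% Restate the problem, then give a complete proof.
\begin{proof}
Write your proof here.
\end{proof}

\end{document}
```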
I am hoping to cover the topics listed below. Additional topics (k-nearest neighbor, support vector machines, neural networks) might be added if time permits.
Week | Date | Topics | Reading
---|---|---|---
Week 1 | January 29 | Generalization error, test error, optimal Bayes classifier | Chapters 1, 2, 3
Week 2 | February 5 | Empirical Risk Minimization (ERM), overfitting, PAC model with a finite hypothesis class, concentration inequalities, Hoeffding's bound, no free lunch | Chapters 2, 3, 4; Appendix B
Week 3 | February 12 | Hoeffding's bound, PAC model, no-free-lunch theorem | Chapters 3, 4, 5; Appendix B
Week 4 | February 19 | VC dimension, Sauer's lemma, Radon's theorem, VC dimension of halfspaces, epsilon-nets, epsilon-approximations | Chapter 6
Week 5 | February 26 | Epsilon-nets, epsilon-approximations, sample complexity upper and lower bounds | Chapters 6, 28
Week 6 | March 5 | Measure theory, upper bound on the sample complexity in the agnostic PAC model | Chapters 6, 28
Week 7 | March 12 | Midterm exam |
Week 8 | March 19 | Spring break (no lectures) |
Week 9 | March 26 | Non-uniform learning, computational complexity of learning | Chapters 7, 8
Week 10 | April 2 | Online learning model, Halving algorithm, mistake bound, Perceptron, Winnow, online-to-batch conversion | Chapter 21
Week 11 | April 9 | Hedge, Perceptron, online-to-batch conversion, boosting | Chapters 21, 10
Week 12 | April 16 | Boosting, decision trees | Chapters 10, 18
Week 13 | April 23 | Least squares, logistic regression, convex learning problems | Chapters 9, 12
Week 14 | April 30 | Convex learning problems, stability, regularization | Chapters 12, 13
Week 15 | May 7 | Regularization and stability, online gradient descent, stochastic gradient descent | Chapters 13, 14
Week 16 | May 14 | Final exam |