Machine Learning

Statistics 561, M-W 10:15-11:45

Welcome to STA 561: Probabilistic Machine Learning, Spring 2024

Quick references:

  • Instructor: Eric Laber, eric.laber@duke.edu, laber-labs.com
  • Office hours: M 11:30AM-12:30PM, Sun 8PM, or by appointment; location: Zoom
  • TAs:
    • Jingan Zhou, jingan.zhou@duke.edu
    • Miles Martinez, miles.martinez@duke.edu
    • Yixin Zhang, yixin.zhang7@duke.edu
    • Sanjev Vishnu Thulasiraman, sanjevvishnu.thulasiraman@duke.edu
    • Jay Bagrecha, jay.bagrecha@duke.edu
    • Rihui Oh, rihui.ou@duke.edu
    • Yen-Chun Liu, yenchun.liu@duke.edu
  • TA office hours will be set in your lab sections
  • Course Sakai Page

Overview

The goal of this course is to introduce the statistical underpinnings needed to solve many modern statistical problems. Our focus will be on key ideas in prediction and decision making. Often we will try to find the simplest version of a problem/algorithm/idea that illustrates the salient features while leaving more complex, nuanced versions to homework or self-study. On a related note, this course is not a catalog of machine learning algorithms and all their variants; such a catalog would immediately be out-of-date, as new methods are constantly being introduced. (Furthermore, learning and using new methods becomes dramatically easier if one has a strong intuitive and theoretical understanding of the foundations of statistics/ML.) While much of our lecture time will be spent on proofs and derivations, the homework will involve putting these ideas into practice with simulation experiments or data analyses.

Pre-requisites

I will assume that students have a basic understanding of mathematical statistics, calculus, basic analysis, linear algebra, and computing. There are many excellent resources online for shoring up gaps in these areas; Coursera, EdX, Udemy, YouTube, etc., are great places to start. While I will do my best to review key ideas, I will take for granted that students know basic results such as the strong law of large numbers, the central limit theorem, and matrix decompositions. If there is sufficient interest, I will also hold several review sessions throughout the semester to help students prepare for the more technical material.
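
If you want a quick self-check on this background, here is a minimal Python sketch (my own illustration, not course material; the Exponential(1) distribution and the sample sizes are arbitrary choices) of two of the results I will take for granted: the central limit theorem, checked by simulation, and a matrix decomposition via NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Central limit theorem: standardized means of n Exponential(1) draws
# (mean 1, sd 1) should be approximately N(0, 1) for moderately large n.
n, reps = 500, 10_000
draws = rng.exponential(scale=1.0, size=(reps, n))
z = (draws.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))
print(f"mean of z (should be near 0): {z.mean():.3f}")
print(f"sd of z   (should be near 1): {z.std():.3f}")

# Matrix decompositions: eigendecomposition of a symmetric PSD matrix,
# then reconstruct it from its eigenvalues and eigenvectors.
A = rng.standard_normal((5, 5))
S = A @ A.T
eigvals, eigvecs = np.linalg.eigh(S)
recon = eigvecs @ np.diag(eigvals) @ eigvecs.T
print(f"reconstruction error (should be ~0): {np.linalg.norm(S - recon):.2e}")
```

If either half of this sketch feels unfamiliar, that is a good place to start shoring up.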

Syllabus (subject to change; roughly one topic per week)

  1. Linear regression review
  2. Linear regression, regularization, and noise addition (see the sketch after this list)
  3. Cross-validation and inference
  4. Post-selection inference
  5. Linear regression and online estimation
  6. Kernel methods
  7. Random forests
  8. Partial linear models
  9. Active learning (i.e., sequential experimental design)
  10. Large margin classifiers
  11. Nearest neighbor methods
  12. Batch decision problems (one-stage)
  13. Batch decision problems (multi-stage)
  14. Contextual bandits
  15. Reinforcement learning
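
To give a flavor of how the homework will put the early topics into practice, here is a minimal ridge regression sketch (the synthetic data and the lambda values are my own arbitrary choices; this is an illustration, not the course's canonical implementation). Ridge regression has the closed form b = (X'X + lambda * I)^{-1} X'y:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = X @ beta + noise, with only a few active coefficients.
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + rng.standard_normal(n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam * I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# lam = 0 recovers ordinary least squares; larger lam shrinks the estimate.
for lam in [0.0, 1.0, 100.0]:
    b = ridge(X, y, lam)
    print(f"lambda = {lam:6.1f}   ||beta_hat|| = {np.linalg.norm(b):.3f}")
```

The closed form makes the trade-off concrete: lambda = 0 is ordinary least squares, and increasing lambda shrinks the coefficient norm at the cost of bias.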

Notes on the notes below

I’ll be updating the notes as we go along. Some I’ll update a lot, some only a little. Read ahead of class at your own peril. But, be warned, you might learn something that’s not on the exam (gasp!). Also, there isn’t an exam (double gasp!).

References

We will primarily use slides and the (virtual) whiteboard for lectures. A list of references for background and/or further study will be provided with each topic. General references that you may find useful include:

  • Elements of Statistical Learning, Hastie, Tibshirani, and Friedman PDF
  • Reinforcement Learning: An Introduction, Sutton and Barto PDF
  • Pattern Classification, Duda, Hart, and Stork Amazon. (There are PDFs online from the authors, but they’ve asked others not to distribute them, so they are not linked here.)

Some references on classic linear models that may be useful for background include:

  • Linear Models with Python, Faraway Amazon
  • Transformation and Weighting in Regression, Carroll and Ruppert Amazon

Advice

The objective of this course is to develop your statistical thinking for prediction and decision problems. I strongly encourage you to work with your classmates on all homework and projects, and to focus on deep understanding rather than on your grades. Some of the problems will be ambiguous and open-ended. This is (mostly) intentional and is meant to give you practice making choices and assumptions when you approach an ill-defined problem. (In applications, problems are rarely cleanly posed when they first reach you.) I also encourage you to seek out other sources that explain the same material in another way or that explore issues we didn’t cover in class.

Grading

Grades will be based on homework (80%) and a project (20%). You can work with your classmates on everything; thus, by appealing to the wisdom of crowds, there is no reason for low homework scores. Late homework will not be accepted, but the lowest homework score will be dropped.

COVID

Due to COVID-19, the course will be partly online. This is unfortunate because meeting in person often gives me clues about which topics are causing confusion (or boredom!) and lets me adapt the material accordingly. Because I won’t be able to see your eyes glaze over or see the pained expressions on your faces, we will need to take extra steps to make sure that everyone is following along. Please let me or your TAs know if you are struggling with the material and what extra content might be helpful. We may also need to slow down to accommodate this new format. I would rather you learn 10 topics well than 15 topics poorly.

Once my student

Once my student, always my student. After this class is over, please don’t hesitate to reach out if there’s something you think I might be able to help you with. (Statistics-wise, life-wise, etc. However, I don’t want to support the GoFundMe for your novel; I read a few of the chapters you released online, and let’s be honest, they’re not very good. Your writing is terrible and we are all dumber for having read them. Now, that idea you had for a doggie-door that does automatic grooming…I might be able to get behind that.)

Lectures

  1. Review of linear regression
  2. Linear regression and regularization
  3. Scaling up
  4. Classification
  5. Decision making