INFO251 Spring 2021
(all readings and lecture recordings are available on bCourses)

 

January 19: Introduction

  • Introductions
  • Nuts and bolts of the class: structure, homework, policies, learning objectives

Required Readings (students are expected to already have this level of familiarity with Python):

January 20 (Lab): NO LAB TODAY

  • There is no lab section on January 20, so please do not show up!

January 21: Experimental Methods for Causal Inference

  • A/B testing, business experiments, and randomized controlled trials
  • Counterfactuals and Control Groups
  • Correlation and Causation
  • Experimental design and statistical power

Required Readings:

Optional Readings:

January 26: Impact Evaluation

  • Research designs for impact evaluation
  • Identifying assumptions
  • Difference-in-Differences

Required Readings:

Optional Readings:

January 27 (Lab): Python and Pandas

  • Programming paradigms
  • Working with data
  • Crash course in Python
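As a preview of the kind of data work this lab covers, here is a minimal pandas sketch; the DataFrame and column names are made up purely for illustration:

```python
import pandas as pd

# Hypothetical survey data, invented for this example.
df = pd.DataFrame({
    "village": ["A", "A", "B", "B"],
    "income": [120, 150, 90, 110],
    "treated": [1, 0, 1, 0],
})

# Split-apply-combine: mean income by village,
# the bread-and-butter pandas operation.
means = df.groupby("village")["income"].mean()
print(means["A"], means["B"])   # → 135.0 100.0
```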

January 28: Regression and Impact Evaluation

  • Regression and causal inference
  • Interactions and heterogeneity
  • Fixed and random effects

Required Readings:

Optional Readings:

  • A more systematic treatment: Gerber, A.S., Green, D.P., 2012. Field Experiments: Design, Analysis, and Interpretation. W. W. Norton & Company, New York.

February 2: Non-Experimental Methods for Causal Inference

  • Instrumental Variables

Required Readings:

Optional Readings:

February 3 (Lab): Regression and Hypothesis Testing

  • T-tests and regressions with Python
  • Dummy variables, interaction terms, and fixed effects
  • Instrumental variables
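The regression mechanics covered here can be previewed with plain NumPy. A minimal sketch: the data below are simulated from a known model (coefficients 1, 2, 3, 4) with no noise, so ordinary least squares recovers them exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n)        # dummy variable (e.g. treatment group)

# True model: y = 1 + 2x + 3d + 4(x*d); noiseless for clarity.
y = 1 + 2 * x + 3 * d + 4 * x * d

# Design matrix: intercept, x, dummy, interaction term.
X = np.column_stack([np.ones(n), x, d, x * d])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))   # → [1. 2. 3. 4.]
```

The interaction coefficient (4) is how "interactions and heterogeneity" show up in practice: the slope on x differs by 4 between the two groups.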

February 4: Non-Experimental Methods for Causal Inference, continued

  • Regression discontinuity

Required Readings:

Optional Readings:

February 9: Intro to Machine Learning

  • Supervised and unsupervised learning
  • Representation
  • Evaluation and baselines
  • Optimization
  • Generalization and overfitting
  • Training and test data
  • Cross-validation and bootstrapping
  • Features and feature selection

Required Readings:

Optional Readings:

February 10 (Lab): Computational Efficiency

  • Vectorized computation
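A small sketch of the lab's central point: replacing a Python loop with a vectorized NumPy operation computes the same quantity far faster (the data here are arbitrary random numbers):

```python
import time

import numpy as np

x = np.random.default_rng(1).normal(size=100_000)

# Loop version: Python-level iteration over every element.
t0 = time.perf_counter()
total = 0.0
for v in x:
    total += v * v
loop_time = time.perf_counter() - t0

# Vectorized version: one call into optimized C code.
t0 = time.perf_counter()
total_vec = float(x @ x)
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.6f}s")
```

The two totals agree (up to floating-point rounding order); only the time differs, typically by two orders of magnitude or more.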

February 11: Nearest Neighbors

  • Instance-based learning
  • Nearest neighbors
  • Curse of dimensionality
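The whole k-nearest-neighbors idea fits in a few lines of NumPy. A minimal sketch on two made-up, well-separated clusters:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    votes = y_train[nearest]
    return np.bincount(votes).argmax()            # most common label

# Two well-separated clusters, labels 0 and 1 (toy data for illustration).
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([0.5, 0.5])))   # → 0
print(knn_predict(X, y, np.array([5.5, 5.5])))   # → 1
```

Note there is no training step at all — that is what "instance-based learning" means. The curse of dimensionality enters because these distances become uninformative as the number of features grows.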

Required Readings:

Optional Readings:

February 16: Gradient Descent

  • Cost functions
  • Gradient descent
  • Convexity
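The three topics above connect directly: gradient descent repeatedly steps down the gradient of a cost function, and convexity guarantees it reaches the global minimum. A minimal sketch, fitting a noiseless line (true slope 3, intercept 1) by minimizing mean squared error:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3 * x + 1               # noiseless line, so the answer is exact

w, b = 0.0, 0.0             # parameters, initialized at zero
lr = 0.1                    # learning rate (step size)
for _ in range(500):
    y_hat = w * x + b
    # Gradients of the cost J = mean((y_hat - y)^2) w.r.t. w and b.
    grad_w = 2 * np.mean((y_hat - y) * x)
    grad_b = 2 * np.mean(y_hat - y)
    w -= lr * grad_w        # step downhill
    b -= lr * grad_b

print(round(w, 3), round(b, 3))   # converges to ≈ 3.0 and 1.0
```

Because this cost is convex in (w, b), any step size small enough to avoid divergence leads to the same minimum.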

Required Readings:

Optional Readings:

February 17 (Lab): ML Experiments in Python

  • Random numbers, training and test data
  • Built-in methods for cross validation
  • Comparing different measures of performance
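A sketch of the lab's workflow using scikit-learn's built-in methods; since the lab dataset isn't listed here, this example generates synthetic classification data instead:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data stands in for whatever dataset the lab actually uses.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out a test set; fixing random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on the training data only.
scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("CV accuracy:", scores.mean())

# Final evaluation on the untouched test set.
test_acc = model.fit(X_train, y_train).score(X_test, y_test)
print("Test accuracy:", test_acc)
```

Swapping `scoring="accuracy"` for `"f1"` or `"roc_auc"` is how the "comparing different measures of performance" bullet plays out in code.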

February 18: Regularization and Linear Models, part 1

  • Regularization
  • Ridge and Lasso
  • Logistic regression
  • Support vector machines
  • Kernel methods

Required Readings:

Optional Readings:

February 23: Regularization and Linear Models, part 2

  • Regularization
  • Ridge and Lasso
  • Logistic regression
  • Support vector machines
  • Kernel methods

Same readings as for part 1 (February 18)

February 24 (Lab): Linear Models and Regularization

  • Lasso vs. Ridge
  • Cross-validation to find optimal regularization parameter
  • Computational efficiency revisited
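The Lasso-vs-Ridge contrast the lab explores can be sketched on simulated data where only the first 3 of 20 features matter; Lasso's L1 penalty zeroes out noise features, while Ridge's L2 penalty only shrinks them:

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))

# Only the first 3 features carry signal; the other 17 are pure noise.
beta = np.zeros(p)
beta[:3] = [5, -4, 3]
y = X @ beta + rng.normal(scale=0.5, size=n)

# Both estimators pick their regularization strength by cross-validation.
lasso = LassoCV(cv=5).fit(X, y)
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)

lasso_zeros = int(np.sum(np.abs(lasso.coef_) < 1e-6))
ridge_zeros = int(np.sum(np.abs(ridge.coef_) < 1e-6))
print("coefficients set exactly to zero — lasso:", lasso_zeros,
      " ridge:", ridge_zeros)
```

That sparsity is why Lasso doubles as a feature-selection method, while Ridge keeps every feature with a small weight.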

February 25: Naive Bayes

  • Probability review: Bayes rule, independence, distributions
  • Generative models and Naive Bayes
  • Maximum likelihood estimation and smoothing
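Maximum likelihood estimation with Laplace smoothing fits in a few lines. A sketch on an invented spam-filtering example (the word counts below are made up):

```python
import numpy as np

# Toy word counts per class. Vocabulary: ["free", "meeting", "winner"].
spam_counts = np.array([30, 2, 25])
ham_counts = np.array([3, 40, 1])

def smoothed_probs(counts, alpha=1.0):
    """Laplace-smoothed MLE of P(word | class); alpha=1 is add-one smoothing."""
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

p_w_spam = smoothed_probs(spam_counts)
p_w_ham = smoothed_probs(ham_counts)

# Score a message containing "free" and "winner" under each class
# (equal priors), using log-probabilities to avoid numeric underflow.
msg = np.array([1, 0, 1])
log_spam = msg @ np.log(p_w_spam)
log_ham = msg @ np.log(p_w_ham)
print("spam" if log_spam > log_ham else "ham")   # → spam
```

Smoothing matters because without it, any word never seen in a class would make that class's likelihood exactly zero.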

Required Readings:

Optional Readings:

March 2: Mid-Semester Quiz

  • Quiz #1

March 3 (Lab): Gradient descent (continued)

  • Gradient descent
  • Naive Bayes

March 4: Decision Trees

  • Building decision trees
  • Information gain
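Information gain — the splitting criterion used when building decision trees — is just the reduction in entropy from a split. A minimal NumPy sketch comparing a perfect split against a useless one:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, split_mask):
    """Entropy reduction from splitting labels by a boolean mask."""
    n = len(labels)
    left, right = labels[split_mask], labels[~split_mask]
    weighted_child = (len(left) / n) * entropy(left) \
                   + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted_child

y = np.array([0, 0, 1, 1])
perfect = np.array([True, True, False, False])   # separates the classes
useless = np.array([True, False, True, False])   # doesn't help at all

print(information_gain(y, perfect))   # → 1.0 (one full bit gained)
print(information_gain(y, useless))   # → 0.0
```

Tree-building algorithms like ID3 greedily pick, at each node, the split with the highest information gain.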

Required Readings:

Optional Readings:

  • Chapter 9 (section 9.2) and Chapter 15 of Hastie, Tibshirani, Friedman, The Elements of Statistical Learning (2nd edition)

March 9: Random Forests

  • Regression Trees
  • Random Forests
  • Boosting
  • Feature Importance

Optional Readings:

March 10 (Lab): Neural networks

  • Intro to TensorFlow

March 11: Neural Networks, part 1

  • Biological underpinnings
  • The perceptron
  • Rosenblatt's algorithm
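Rosenblatt's learning rule is remarkably simple: update the weights only on mistakes. A minimal sketch learning the (linearly separable) AND function, with the bias folded in as an extra input:

```python
import numpy as np

# Inputs for AND, augmented with a constant 1 as a bias term.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([0, 0, 0, 1])   # AND labels

w = np.zeros(3)
for _ in range(20):                 # a few epochs suffice on this toy problem
    for xi, yi in zip(X, y):
        pred = int(xi @ w > 0)      # threshold activation
        w += (yi - pred) * xi       # Rosenblatt update: only on mistakes

preds = (X @ w > 0).astype(int)
print(preds)   # → [0 0 0 1]
```

The perceptron convergence theorem guarantees this loop terminates whenever the data are linearly separable — and the lecture's classic counterexample (XOR) shows what happens when they are not.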

Required Readings:

Optional Readings:

March 16: Neural Networks, part 2

  • Multilayer networks
  • Backpropagation

Required Readings:

Optional Readings:

  • Chapter 11 (sections 11.3-11.4) of Hastie, Tibshirani, Friedman, The Elements of Statistical Learning
  • We will review these videos by Grant Sanderson on backpropagation in class:
    • https://www.youtube.com/watch?v=Ilg3gGewQ5U
    • https://www.youtube.com/watch?v=tIeHLnjs5U8

March 17 (Lab): Deep Learning

  • Naive Bayes

March 18: Deep Learning, part 1

  • What is "deep" about deep learning?
  • Auto-encoders
  • Convolutional Neural Networks
  • RNNs / LSTM networks

Required Readings:

Optional Readings:

--- SPRING BREAK ---

March 30: Bias in ML

  • High-profile ML failures
  • Sources of bias
  • Notions of fairness

Required Readings:

Optional Readings:

  • Solon Barocas, Moritz Hardt, Arvind Narayanan. 2020. Fairness and Machine Learning: Limitations and Opportunities. https://fairmlbook.org (Chapters 1, 2, and 5)

March 31 (Lab): Fair ML

April 1: Fair ML 

  • Formalization
  • Identifying bias
  • Fairness constraints
  • Technical "solutions"
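One way bias gets formalized is as a gap in prediction rates across groups. A minimal sketch of the demographic-parity check (the predictions and group labels here are invented for illustration):

```python
import numpy as np

# Hypothetical classifier predictions and a binary group attribute.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Demographic parity compares positive-prediction rates across groups.
rate_0 = y_pred[group == 0].mean()
rate_1 = y_pred[group == 1].mean()
print(rate_0, rate_1)   # → 0.75 0.25
```

A large gap like this flags a potential fairness problem under the demographic-parity criterion — though, as the readings discuss, other fairness notions (e.g. equalized odds) can disagree with it.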

Required Readings:

April 6: Common practical issues

  • Bias-variance tradeoff
  • Feature engineering
  • Imbalanced data

Required Readings:

Optional Readings:

April 7 (Lab): Supervised learning practicalities

  • X

April 8: Common practical issues (Part 2)

  • Imbalanced data
  • Missing data
  • Multi-class classification
  • Model and feature selection

Required Reading:

  • He, H., Garcia, E.A., 2009. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering 21, 1263–1284. doi:10.1109/TKDE.2008.239

Optional Reading:

April 13: Supervised Learning Wrap-Up

  • Modelling Trade-Offs
  • Comparing classifiers
  • Guiding principles

Required Readings:

Optional Readings:

  • Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.-H., Steinbach, M., Hand, D.J., Steinberg, D., 2008. "Top 10 Algorithms in Data Mining". Knowledge and Information Systems 14, 1–37. doi:10.1007/s10115-007-0114-2

April 14 (Lab): TBD

  • X

April 15: Unsupervised learning

  • Cluster analysis
  • Dimensionality Reduction
  • Principal Component Analysis
  • Case study: Eigenfaces
  • Other methods for dimensionality reduction: SVD, NNMF, LDA

Required Readings:

Optional Readings:

April 20: Recommender Systems

  • The Netflix challenge
  • Content-based methods
  • Learning features and parameters
  • Nearest-neighbor collaborative filtering
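Nearest-neighbor collaborative filtering reduces to a similarity computation over the rating matrix. A tiny sketch with an invented user-item matrix (rows are users, columns are items, 0 means unrated):

```python
import numpy as np

# Toy rating matrix: users 0 and 1 like items 0-1; user 2 likes items 2-3.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Predict user 0's rating for item 2 from the most similar other user.
candidates = (1, 2)
sims = np.array([cosine(R[0], R[u]) for u in candidates])
neighbor = candidates[int(np.argmax(sims))]
print(neighbor, R[neighbor, 2])   # → 1 1.0
```

User 1, whose tastes match user 0's, is the nearest neighbor, so their low rating of item 2 becomes the prediction. Real systems average over many neighbors and handle the unrated entries more carefully.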

Recommended Readings:

Optional Readings:

April 21 (Lab): Unsupervised learning

  • k-Means clustering
  • Dimensionality reduction: PCA
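Both lab topics in one sketch, using scikit-learn on synthetic data (two Gaussian clusters in 5 dimensions, invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters in 5 dimensions.
A = rng.normal(loc=0, size=(50, 5))
B = rng.normal(loc=5, size=(50, 5))
X = np.vstack([A, B])

# k-means with k=2 should recover the two groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# PCA projects the 5-D points onto their 2 directions of largest variance.
Z = PCA(n_components=2).fit_transform(X)
print(Z.shape)   # → (100, 2)
```

Neither step uses labels — that is what makes both methods unsupervised. Plotting `Z` colored by `labels` is the standard way to eyeball whether the clustering found real structure.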

April 22: Machine learning and causal inference

  • ML for measurement
  • Inference after selection
  • Selecting among many controls
  • Selecting among many instruments
  • Machine learning for heterogeneous treatment effects

Required Readings:

Optional Readings:

April 27: Applied ML - start to finish

  • Data => Features
  • Training and cross-validation
  • Evaluating performance
  • Extensions

Optional Readings:

April 28 (Lab): No Lab

April 29: Summary 

  • Recap / summary
  • Quiz #2