INFO251 Spring 2021
(all readings and lecture recordings are available on bCourses)
January 19: Introduction
 Introductions
 Nuts and bolts of the class: structure, homework, policies, learning objectives
Required Readings (students are already expected to have this level of familiarity with Python):
 Chapters 3-5 and Chapter 9 of McKinney (2013): Python for Data Analysis. O’Reilly Media, Inc.
 Install Python, IPython, and the numerical analysis libraries on your laptop and bring it to class. I highly recommend you install the Anaconda distribution, but if you want to assemble the packages yourself, make sure you have Python, IPython Notebook, NumPy, SciPy, and matplotlib.
 Read and complete at least the "Introduction" to the following Python tutorial: http://interactivepython.org/courselib/static/pythonds/index.html
 Watch the 10-minute tour of pandas: https://www.youtube.com/watch?v=dcqPhpY7tWk
 Strongly recommended: Read and complete lessons 1-7 of Learn Pandas (https://bitbucket.org/hrojas/learnpandas)
January 20 (Lab): NO LAB TODAY
 There is no lab section on Jan 20, please do not show up!
January 21: Experimental Methods for Causal Inference
 A/B testing, Business Experiments, Randomized Control Trials
 Counterfactuals and Control Groups
 Correlation and Causation
 Experimental design and statistical power
Required Readings:
 Chapters 2-3 of Khandker et al. (2010), “Handbook on Impact Evaluation”
 Introduction (pp. 263-269) to: Bertrand et al. (2012) “What's advertising content worth? Evidence from a consumer credit marketing field experiment” Quarterly Journal of Economics, 125(11) pp. 263-269
Optional Readings:
 Pages 1-47 of: E. Duflo, M. Kremer and R. Glennerster (2006). "Using Randomization in Development Economics Research: A Toolkit"
 Athey & Imbens (2016) The Econometrics of Randomized Experiments
 Lin, M., Lucas, H.C., Shmueli, G., 2013. Research Commentary—Too Big to Fail: Large Samples and the p-Value Problem. Information Systems Research 24, 906–917. doi:10.1287/isre.2013.0480
 Anderson & Simester (2011). “A Step-By-Step Guide to Smart Business Experiments”, Harvard Business Review, pp. 99-105
 Ariely (2004). “Why Businesses Don’t Experiment”, Harvard Business Review, p. 34
 Kohavi, R., Longbotham, R., Sommerfield, D. and Henne, R. Controlled experiments on the Web: Survey and practical guide. Data Mining and Knowledge Discovery 18 (2009), 140–181.
 Reiley, D., Rao, J.M. & Lewis, R.A. (2011) Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising. WWW 2011.
January 26: Impact Evaluation
 Research designs for impact evaluation
 Identifying assumptions
 Differences-in-Difference
Required Readings:
 Sections 1-3 of Schultz: School subsidies for the poor
 Varian, H.R., 2016. Causal inference in economics and marketing. PNAS 113, 7310–7315. doi:10.1073/pnas.1510479113
Optional readings
 David Albouy: Lecture notes on Differences in Differences Estimation
 Lewis, R., Rao, J.M. & Reiley, D.H. (2015) Measuring the effects of advertising: The digital frontier. In: Economic Analysis of the Digital Economy. University of Chicago Press. pp. 191–218.
 Jensen, R., 2007. The Digital Provide: Information (Technology), Market Performance, and Welfare in the South Indian Fisheries Sector. The Quarterly Journal of Economics 122, 879–924.
January 27 (Lab): Python and Pandas
 Programming paradigms
 Working with data
 Crash course in python
January 28: Regression and Impact Evaluation
 Regression and causal inference
 Interactions and heterogeneity
 Fixed and random effects
Required Readings:
 Chapter 5 of Khandker et al. (2010), “Handbook on Impact Evaluation”
 Lecture notes on “Fixed Effects Models”
Optional Readings:
 A more systematic treatment: Gerber, A.S., Green, D.P., 2012. Field Experiments: Design, Analysis, and Interpretation. W. W. Norton & Company, New York.
February 2: Non-Experimental Methods for Causal Inference
 Instrumental Variables
Required Readings:
 Chapter 6 of Khandker (2010), “Handbook on Impact Evaluation”
Optional Readings
 Chapter 10 of Stock & Watson (2010) on “Instrumental Variables”
 Angrist, J.; Krueger, A. (2001). "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments". Journal of Economic Perspectives 15(4): 69–85.
 Duflo (2001). Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment
 A more systematic treatment: Kennedy, P., 2008. A Guide to Econometrics. 6th ed. Wiley-Blackwell, Malden, MA.
 Alexandre Belloni, Victor Chernozhukov, and Christian Hansen (2011): “LASSO Methods for Gaussian Instrumental Variables Models,” 2011 arXiv:[stat.ME], http://arxiv.org/abs/1012.1297.
 Jason Hartford, Greg Lewis, Kevin Leyton-Brown, Matt Taddy (2017): “Deep IV: A Flexible Approach for Counterfactual Prediction.” Proceedings of the 34th International Conference on Machine Learning, PMLR 70, 1414–1423
February 3 (Lab): Regression and Hypothesis Testing
 T-tests and regressions with Python
 Dummy variables, interactions, fixed effects
 Fixed effects
 Interaction terms
 Instrumental variables
February 4: Non-Experimental Methods for Causal Inference, continued
 Regression discontinuity
Required Readings:
 Chapter 7 of Khandker (2010), “Handbook on Impact Evaluation”
Optional Readings
 Read a simplified example RD analysis in Python
 Buddelmeyer & Skoufias (2004). An Evaluation of the Performance of Regression Discontinuity Design on PROGRESA.
 Solis, A., 2017. Credit Access and College Enrollment. Journal of Political Economy 125, 562–622. doi:10.1086/690829
February 9: Intro to Machine Learning
 Supervised and unsupervised learning
 Representation
 Evaluation
 Optimization
 Generalization and overfitting
 Training and test data
 Cross-validation and bootstrapping
 Evaluation and baselines
 Features and feature selection
Required Readings:
 Chapters 1 & 2 of Daume (in preparation). A course in machine learning
 Chapter 5 of Witten, Frank, Hall: Data Mining
Optional Readings:
 Mullainathan, S., Spiess, J., 2017. Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives 31, 87–106. https://doi.org/10.1257/jep.31.2.87
 Syed, A. (2011). A review of cross validation and adaptive model selection.
February 10 (Lab): Computational Efficiency
 Vectorized computation
February 11: Nearest Neighbors
 Instance-based learning
 Nearest neighbors
 Curse of dimensionality
Required Readings:
 Chapter 3 of Daume (in preparation). A course in machine learning
Optional Readings:
 Chapter 13 (sections 13.1-13.3) of Hastie, Tibshirani, Friedman, The Elements of Statistical Learning
 Chapter 6 of Provost & Fawcett: Data Science for Business
February 16: Gradient Descent
 Cost functions
 Gradient descent
 Convexity
Required Readings:
 Chapter 7 of Daume (in preparation). A course in machine learning
Optional Readings:
 Chapter 5 of Schutt & O’Neill (2013): Doing Data Science
February 17 (Lab): ML Experiments in Python
 Random numbers, training and test data
 Built-in methods for cross-validation
 Comparing different measures of performance
February 18: Regularization and Linear Models, part 1
 Regularization
 Ridge and Lasso
 Logistic regression
 Support vector machines
 Kernel methods
Required Readings:
 Chapter 7 of Daume (in preparation). A course in machine learning
Optional Readings:
 Chapter 6 (section 6.2) of James et al. (2016): Introduction to Statistical Learning
 This post on interpreting logistic regression results
 Chapter 3 (sections 3.3 and 3.4) of Hastie, Tibshirani, Friedman, The Elements of Statistical Learning
February 23: Regularization and Linear Models, part 2
 Regularization
 Ridge and Lasso
 Logistic regression
 Support vector machines
 Kernel methods
Same readings as above
February 24 (Lab): Linear models and Regularization
 Lasso vs. Ridge
 Cross-validation to find the optimal regularization parameter
 Computational efficiency revisited
February 25: Naive Bayes
 Probability review: Bayes rule, independence, distributions
 Generative models and Naive Bayes
 Maximum likelihood estimation and smoothing
Required Readings:
 Chapter 4 of Schutt & O’Neill (2013): Doing Data Science
 Reread section 4.2 of Witten, Frank, Hall: Data Mining
 Michael Collins's lecture notes on Naïve Bayes (especially pp. 1-4)
Optional Readings:
 Paul Graham (2002) on “Better Bayesian Filtering”.
 Kevin Murphy's example of Bayes' Rule for medical diagnosis
March 2: Mid-Semester Quiz
 Quiz #1
March 3 (Lab): Gradient descent (continued)
 Gradient descent
 Naive Bayes
March 4: Decision Trees
 Building decision trees
 Information gain
Required Readings:
 Chapter 8 of James et al. (2016): Introduction to Statistical Learning
 Chapters 1-3 of Daume (in preparation). A course in machine learning
Optional Readings:
 Chapter 9 (section 9.2) and Chapter 15 of Hastie, Tibshirani, Friedman, The Elements of Statistical Learning (10th edition)
March 9: Random Forests
 Regression Trees
 Random Forests
 Boosting
 Feature Importance
Optional Readings:
 Feature importance measures for random forest: blog post
 A Kaggle master explains gradient boosting
March 10 (Lab): Neural networks
 Intro to TensorFlow
March 11: Neural Networks, part 1
 Biological underpinnings
 The perceptron
 Rosenblatt's algorithm
Required Readings:
 Chapters 4 and 10 of Daume (in preparation). A course in machine learning
Optional Readings:
 Chapter 11 (sections 11.3-11.4) of Hastie, Tibshirani, Friedman, The Elements of Statistical Learning
March 16: Neural Networks, part 2
 Multilayer networks
 Backpropagation
Required Readings:
 Chapters 4 and 10 of Daume (in preparation). A course in machine learning
Optional Readings:
 Chapter 11 (sections 11.3-11.4) of Hastie, Tibshirani, Friedman, The Elements of Statistical Learning

We will review these videos by Grant Sanderson on backpropagation in class:
 https://www.youtube.com/watch?v=Ilg3gGewQ5U
 https://www.youtube.com/watch?v=tIeHLnjs5U8
March 17 (Lab): Deep Learning
 Naive Bayes
March 18: Deep Learning, part 1
 What is "deep" about deep learning?
 Autoencoders
 Convolutional Neural Networks
 RNNs / LSTM Networks
Required Readings:
 Andrew Ng's lecture notes on sparse autoencoders
 UFLDL's Deep Learning tutorial
Optional Readings:
 Single-Layer Neural Networks and Gradient Descent
 A step-by-step backpropagation tutorial
 Tutorial on ConvNets
 Understanding LSTM Networks
 Dean (2018). The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design
 SPRING BREAK 
March 30: Bias in ML
 High-profile ML failures
 Sources of bias
 Notions of fairness
Required Readings:
 Obermeyer, Powers, Vogeli and Mullainathan. 2019. Dissecting racial bias in an algorithm used to manage the health of populations. https://science.sciencemag.org/content/366/6464/447
Optional Readings
 Solon Barocas, Moritz Hardt, Arvind Narayanan. 2020. Fairness and machine learning: Limitations and Opportunities. https://fairmlbook.org (Chapters 1, 2, and 5)
March 31: Fair ML lab
 Download lab 1 here: https://colab.research.google.com/drive/1yYHoLqbM5in4T801mQ083XGpRqHwtFhm
 Download lab 2 here: https://colab.research.google.com/drive/1kMXYl7LPX1qTnBdBiueYx15umPzM24
April 1: Fair ML
 Formalization
 Identifying bias
 Fairness constraints
 Technical "solutions"
Required Readings:
 Reading: Mulligan, Kroll, Kohli & Wong. 2019. This Thing Called Fairness: Disciplinary Confusion Realizing a Value in Technology. https://dl.acm.org/doi/10.1145/3359221
April 6: Common practical issues
 Bias-variance tradeoff
 Feature engineering
 Imbalanced data
Required Readings:
 Chapters 5 & 6 of Daume (in preparation). A course in machine learning
Optional Readings
 A plain-English tutorial on the bias-variance tradeoff
 Chapters 1-3 of Mastering Feature Engineering (early release)
 Chapter 2 of James et al. (2017). An Introduction to Statistical Learning
 Andrew Gelman on Missing Data Imputation
 He, H., Garcia, E.A., 2009. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering 21, 1263–1284. doi:10.1109/TKDE.2008.239
 Lakkaraju, H., Kleinberg, J., Leskovec, J., Ludwig, J., Mullainathan, S., 2017. The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables. KDD 2017, 275–284. https://doi.org/10.1145/3097983.3098066
April 7 (Lab): Supervised learning practicalities
 X
April 8: Common practical issues (Part 2)
 Imbalanced data
 Missing data
 Multiclass classification
 Model and feature selection
Required reading
 He, H., Garcia, E.A., 2009. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering 21, 1263–1284. doi:10.1109/TKDE.2008.239
Optional reading:
 Python tutorial on Cost-Sensitive Decision Trees for Imbalanced Classification
 Python tutorial on softmax classification
 Multinomial response models, from Rodríguez, G. (2007). Lecture Notes on Generalized Linear Models.
April 13: Supervised Learning Wrap-Up
 Modelling Trade-Offs
 Comparing classifiers
 Guiding principles
Required Readings:
 Chapter 13 of Daume (in preparation). A course in machine learning
 Domingos, “A Few Useful Things to Know about Machine Learning.” Communications of the ACM, 55 (10), 78-87, 2012.
Optional Readings:
 Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.-H., Steinbach, M., Hand, D.J., Steinberg, D., 2008. “Top 10 Algorithms in Data Mining”. Knowledge and Information Systems 14, 1–37. doi:10.1007/s10115-007-0114-2
April 14 (Lab): TBD
 X
April 15: Unsupervised learning
 Cluster analysis
 Dimensionality Reduction
 Principal Component Analysis
 Case study: Eigenfaces
 Other methods for dimensionality reduction: SVD, NNMF, LDA
Required Readings
 Chapter 7 of Leskovec, Rajaraman, and Ullman (2014): Mining of Massive Datasets
Optional Readings
 Watch Pedro Domingos talk about the curse of dimensionality (segment 4 of week 4)
 Chapter 11 (sections 11.1 – 11.3) Leskovec, Rajaraman, and Ullman (2014): Mining of Massive Datasets.
 Chapter 15 of Daume (in preparation). A course in machine learning
 Justin Grimmer and Gary King. 2011. “General Purpose Computer-Assisted Clustering and Conceptualization.” Proceedings of the National Academy of Sciences. Copy at http://j.mp/2qzYYj2
 Chapter 6 of Provost & Fawcett: Data Science for Business
 Chapter 14 (sections 14.2, 14.5-14.10) of Hastie, Tibshirani, Friedman, The Elements of Statistical Learning (10th edition)
 Turk & Pentland (1991) “Eigenfaces for Recognition”
April 20: Recommender Systems
 The Netflix challenge
 Content-based methods
 Learning features and parameters
 Nearest-neighbor collaborative filtering
Recommended Readings:
 Chapter 8 of Schutt & O’Neill (2013): Doing Data Science
 Domingos, “A Few Useful Things to Know about Machine Learning.” Communications of the ACM, 55 (10), 78-87, 2012.
 Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.-H., Steinbach, M., Hand, D.J., Steinberg, D., 2008. “Top 10 Algorithms in Data Mining”. Knowledge and Information Systems 14, 1–37. doi:10.1007/s10115-007-0114-2
Optional Readings:
 The Guardian (2017), "How algorithms are pushing the tech giants into the danger zone"
 Chapter 9 of Leskovec, Rajaraman, and Ullman (2014): Mining of Massive Datasets.
 Yehuda Koren (2009) “The BellKor Solution to the Netflix Grand Prize”
 Resnick et al (1994) “GroupLens: an open architecture for collaborative filtering of netnews”, CSCW ’94, pp. 175-186
 RM Bell, Y Koren (2007) “Lessons from the Netflix prize challenge”, ACM SIGKDD Explorations Newsletter
April 21 (Lab): Unsupervised learning
 k-Means clustering
 Dimensionality reduction: PCA
April 22: Machine learning and causal inference
 ML for measurement
 Inference after selection
 Selecting among many controls
 Selecting among many instruments
 Machine learning heterogeneous treatment effects
Required Readings:
 Section 4 of: Athey, S., 2018. The impact of machine learning on economics, in: The Economics of Artificial Intelligence: An Agenda. University of Chicago Press, pp. 507–547.
 Athey, S., Imbens, G., 2019. Machine Learning Methods Economists Should Know About. arXiv:1903.10075.
Optional Readings:
 Athey, S., Imbens, G., 2016. Recursive partitioning for heterogeneous causal effects. PNAS 113, 7353–7360. https://doi.org/10.1073/pnas.1510489113
 Athey, S., M. Bayati, N. Doudchenko, G. Imbens, and K. Khosravi (2017) "Matrix Completion Methods for Causal Panel Data Models." http://arXiv.org/abs/1710.10251
 Belloni, A., Chernozhukov, V., Hansen, C., 2014. High-Dimensional Methods and Inference on Structural and Treatment Effects. Journal of Economic Perspectives 28, 29–50. https://doi.org/10.1257/jep.28.2.29
 Same authors (2011): “LASSO Methods for Gaussian Instrumental Variables Models,” 2011 arXiv:[stat.ME], http://arxiv.org/abs/1012.1297.
 Chernozhukov, V., Hansen, C., Spindler, M., 2015. Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach. Annual Review of Economics 7, 649–688. https://doi.org/10.1146/annurev-economics-012315-015826
 Künzel, S.R., Sekhon, J.S., Bickel, P.J., Yu, B., 2019. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences 116, 4156–4165.
 Sands and Gilchrist (Medium Post): Best of Both Worlds: An Applied Intro to ML For Causal Inference
 Taylor, J., Tibshirani, R.J., 2015. Statistical learning and selective inference. Proceedings of the National Academy of Sciences 112, 7629–7634.
 Wager, S., Du, W., Taylor, J., Tibshirani, R.J., 2016. High-dimensional regression adjustments in randomized experiments. PNAS 113, 12673–12678.
April 27: Applied ML - start to finish
 Data => Features
 Training and cross-validation
 Evaluating performance
 Extensions
Optional Readings:
 Blumenstock et al (2015): Predicting Poverty with Mobile Phone Metadata
 Aiken et al. (2020): Targeting Development Aid with Machine Learning and Mobile Phone Data: Evidence from an Anti-Poverty Intervention in Afghanistan
April 28 (Lab): No Lab
April 29: Summary
 Recap / summary
 Quiz #2