Machine Learning Classification

By attending the Machine Learning Classification workshop, participants will:

  • Describe the input and output of a classification model.
  • Tackle both binary and multiclass classification problems.
  • Implement a logistic regression model for large-scale classification.  
  • Create a non-linear model using decision trees.
  • Improve the performance of any model using boosting.
  • Scale your methods with stochastic gradient ascent.
  • Describe the underlying decision boundaries.  
  • Build a classification model to predict sentiment in a product review dataset.  
  • Analyze financial data to predict loan defaults.
  • Use techniques for handling missing data.
  • Evaluate your models using precision-recall metrics.
  • Implement these techniques in Python (or in the language of your choice, though Python is highly recommended).
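The precision-recall evaluation mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not course material; the example labels are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels in {0, 1}.

    Precision: fraction of positive predictions that are actually positive.
    Recall: fraction of positive data points predicted to be positive.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predictions from a sentiment classifier:
# 2 true positives, 1 false positive, 1 false negative.
p, r = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(p, r)  # both 2/3
```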

In the Machine Learning Classification training course, you will create classifiers that provide state-of-the-art performance on a variety of tasks. You will become familiar with the most successful and most widely used techniques in practice, including logistic regression, decision trees, and boosting. In addition, you will be able to design and implement the underlying algorithms that can learn these models at scale, using stochastic gradient ascent. You will apply these techniques to real-world, large-scale machine learning tasks. You will also address significant challenges that arise in real-world applications of ML, including handling missing data and measuring precision and recall to evaluate a classifier. This course is hands-on, action-packed, and full of visualizations and illustrations of how these techniques behave on real data.
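To give a flavor of the logistic regression model at the heart of the course, here is a minimal sketch in plain Python of the sigmoid link function and how it turns a linear score into a class probability. The coefficients below are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(score):
    """Logistic (sigmoid) link: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

def predict_proba(coefficients, features):
    """P(y = +1 | x) under a logistic regression model.

    coefficients[0] is the intercept; the remaining coefficients pair
    with the corresponding entries of `features`.
    """
    score = coefficients[0] + sum(
        w * x for w, x in zip(coefficients[1:], features))
    return sigmoid(score)

# Hypothetical coefficients for a two-feature model:
# score = 0.0 + 1.0 * 3.0 + (-2.0) * 1.0 = 1.0, so P(y=+1|x) ≈ 0.73.
print(predict_proba([0.0, 1.0, -2.0], [3.0, 1.0]))
```

The effect of coefficient values on the decision boundary (an agenda topic below) follows directly: the model predicts +1 exactly when the score is positive, i.e. when the probability exceeds 0.5.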

COURSE AGENDA

  • Linear classifiers: A motivating example
  • Intuition behind linear classifiers
  • Decision boundaries
  • Linear classifier model
  • Effect of coefficient values on decision boundary
  • Using features of the inputs
  • Predicting class probabilities
  • Review of basics of probabilities
  • Review of basics of conditional probabilities
  • Using probabilities in classification
  • Predicting class probabilities with (generalized) linear models
  • The sigmoid (or logistic) link function
  • Logistic regression model
  • Effect of coefficient values on predicted probabilities
  • Overview of learning logistic regression models
  • Encoding categorical inputs
  • Multiclass classification with one-versus-all
  • Recap of logistic regression classifier
  • Goal: Learning parameters of logistic regression
  • Intuition behind maximum likelihood estimation
  • Data likelihood
  • Finding best linear classifier with gradient ascent
  • Review of gradient ascent
  • Learning algorithm for logistic regression
  • Example of computing derivative for logistic regression
  • Interpreting derivative for logistic regression
  • Summary of gradient ascent for logistic regression
  • Choosing step size
  • Careful with step sizes that are too large
  • Rule of thumb for choosing step size
  • (VERY OPTIONAL) Deriving gradient of logistic regression: Log trick
  • (VERY OPTIONAL) Expressing the log-likelihood
  • (VERY OPTIONAL) Deriving probability y=-1 given x
  • (VERY OPTIONAL) Rewriting the log likelihood into a simpler form
  • (VERY OPTIONAL) Deriving gradient of log likelihood
  • Recap of learning logistic regression classifiers
  • Evaluating a classifier
  • Review of overfitting in regression
  • Overfitting in classification
  • Visualizing overfitting with high-degree polynomial features
  • Overfitting in classifiers leads to overconfident predictions
  • Visualizing overconfident predictions
  • (OPTIONAL) Another perspective on overfitting in logistic regression
  • Penalizing large coefficients to mitigate overfitting
  • L2 regularized logistic regression
  • Visualizing effect of L2 regularization in logistic regression
  • Learning L2 regularized logistic regression with gradient ascent
  • Sparse logistic regression with L1 regularization
  • Recap of overfitting & regularization in logistic regression
  • Predicting loan defaults with decision trees
  • Intuition behind decision trees
  • Task of learning decision trees from data
  • Recursive greedy algorithm
  • Learning a decision stump
  • Selecting best feature to split on
  • When to stop recursing
  • Making predictions with decision trees
  • Multiclass classification with decision trees
  • Threshold splits for continuous inputs
  • (OPTIONAL) Picking the best threshold to split on
  • Visualizing decision boundaries
  • Recap of decision trees
  • A review of overfitting
  • Overfitting in decision trees
  • Principle of Occam's razor: Learning simpler decision trees
  • Early stopping in learning decision trees
  • (OPTIONAL) Motivating pruning
  • (OPTIONAL) Pruning decision trees to avoid overfitting
  • (OPTIONAL) Tree pruning algorithm
  • Recap of overfitting and regularization in decision trees
  • Challenge of missing data
  • Strategy 1: Purification by skipping missing data
  • Strategy 2: Purification by imputing missing data
  • Modifying decision trees to handle missing data
  • Feature split selection with missing data
  • Recap of handling missing data
  • The boosting question
  • Ensemble classifiers
  • Boosting
  • AdaBoost overview
  • Weighted error
  • Computing coefficient of each ensemble component
  • Reweighting data to focus on mistakes
  • Normalizing weights
  • Example of AdaBoost in action
  • Learning boosted decision stumps with AdaBoost
  • The Boosting Theorem
  • Overfitting in boosting
  • Ensemble methods, impact of boosting & quick recap
  • Case-study where accuracy is not best metric for classification
  • What is good performance for a classifier?
  • Precision: Fraction of positive predictions that are actually positive
  • Recall: Fraction of positive data predicted to be positive
  • Precision-recall extremes
  • Trading off precision and recall
  • Precision-recall curve
  • Recap of precision-recall
  • Gradient ascent won't scale to today's huge datasets
  • Timeline of scalable machine learning & stochastic gradient
  • Why gradient ascent won't scale
  • Stochastic gradient: Learning one data point at a time
  • Comparing gradient to stochastic gradient
  • Why would stochastic gradient ever work?
  • Convergence paths
  • Shuffle data before running stochastic gradient
  • Choosing step size
  • Don't trust last coefficients
  • (OPTIONAL) Learning from batches of data
  • (OPTIONAL) Measuring convergence
  • (OPTIONAL) Adding regularization
  • The online learning task
  • Using stochastic gradient for online learning
  • Scaling to huge datasets through parallelization & module recap
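The scalability topics above (one data point at a time, shuffling before each pass, choosing a step size) can be sketched as a pure-Python stochastic gradient ascent learner for logistic regression. This is an illustrative sketch under simplifying assumptions (labels encoded as 0/1, a constant step size, a tiny hypothetical dataset), not the course's reference implementation.

```python
import math
import random

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

def sga_logistic(data, step_size=0.1, epochs=20, seed=0):
    """Stochastic gradient ascent for logistic regression.

    `data` is a list of (features, label) pairs with label in {0, 1};
    each feature list already includes a constant 1.0 for the intercept.
    One gradient step per data point, shuffling the data before each pass.
    """
    rng = random.Random(seed)
    data = list(data)
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        rng.shuffle(data)  # shuffle data before running stochastic gradient
        for x, y in data:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            # Per-point gradient of the log-likelihood: x_j * (y - p)
            for j in range(len(w)):
                w[j] += step_size * x[j] * (y - p)
    return w

# Tiny hypothetical dataset: label is 1 exactly when the feature is positive.
data = [([1.0, x], 1 if x > 0 else 0)
        for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
w = sga_logistic(data)
print(w)
```

Each update touches one data point rather than the full dataset, which is what lets the method scale to datasets where a full-gradient pass is too expensive; the agenda's cautions (step sizes that are too large, not trusting the last coefficients) apply to exactly this loop.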