Tutorial: Predictive Modeling with R and the caret Package
Max Kuhn, Pfizer Global R&D, USA
Abstract
This course provides an overview of using R for supervised
learning (also known as machine learning, pattern recognition,
predictive analytics, etc.). The session steps through the process
of building, visualizing, testing, and comparing models that are
focused on prediction. The goal of the course is to provide a
thorough workflow in R that can be used with many different
modeling techniques. A case study is used to illustrate
functionality.
Outline
Topics will include:
- Introduction (philosophy, case study)
- General Strategies (data splitting, resampling, model tuning)
- Data Pre-Processing (transformations, variable filtering)
- Conventions in R (OOP, function interfaces, consistency)
- Building and Tuning Models (performance metrics, trees,
  kernel methods, model comparisons)
- Other Topics, as time allows (parallel processing,
  variable importance)
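As a rough preview of the workflow these topics cover, the sketch below strings together data splitting, resampling, pre-processing, and model tuning with caret. It is only an illustration under stated assumptions: the built-in iris data stands in for the tutorial's case study (which is not reproduced here), the caret and rpart packages are assumed to be installed, and the object names (in_train, ctrl, fit, perf) are made up for this example.

```r
# Illustrative sketch only: iris is a stand-in for the tutorial's case study.
library(caret)

set.seed(1)

# Data splitting: stratified 75/25 split on the outcome
in_train <- createDataPartition(iris$Species, p = 0.75, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# Resampling: 10-fold cross-validation on the training set
ctrl <- trainControl(method = "cv", number = 10)

# Pre-processing and tuning: center and scale the predictors, then fit a
# classification tree, letting caret evaluate a small grid of cp values
fit <- train(Species ~ ., data = training,
             method = "rpart",
             preProcess = c("center", "scale"),
             tuneLength = 5,
             trControl = ctrl)

# Performance metrics on the held-out test set
pred <- predict(fit, newdata = testing)
perf <- postResample(pred = pred, obs = testing$Species)

print(fit$bestTune)  # cp value selected by cross-validation
print(perf)          # accuracy and kappa on the test set
```

Swapping the method argument (for example to a kernel method) should leave the rest of the code unchanged, which is the kind of consistent interface the outline refers to.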
The length of the tutorial is not conducive to hands-on
exercises, so laptops are not required. However, the
illustrative data sets and code will be available online if
participants would like to follow along.
Please check here for up-to-date tutorial resources.
Prerequisites
A basic understanding of R (matrices, data frames, functions,
etc.) is needed. Some basic understanding of regression
techniques is helpful.