Tutorial: Introductory Probability and Statistics Using R
G. Jay Kerns, Department of Mathematics and Statistics,
Youngstown State University, USA.
Abstract
The purpose of this tutorial is to introduce new users of R to
the basic environment and to share resources and tricks I have
learned while introducing R to my own students and colleagues.
Goals
Individuals walking out of this tutorial should be:
- familiar with the tools available for basic data import,
  export, and manipulation, including reshaping data,
  subsetting by groups, and formatting.
- competent with basic descriptive statistics, including
  summaries and data display; I plan to discuss all three of
  the base, lattice, and ggplot2 graphics engines.
- proficient in the standard point estimation and hypothesis
  testing topics from a first course in statistics.
- equipped to perform standard linear regression modeling and
  diagnostics, and other topics such as resampling and
  permutation tests as time permits.
- able to prepare randomized quizzes, exams, answer keys,
  and/or study materials for students or colleagues.
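To give a flavor of the first goal, here is a minimal sketch of
subsetting by groups and reshaping using base R only. The built-in
warpbreaks data set and the object names (woolA, means, wide) are
my own choices for illustration and are not necessarily what the
tutorial itself will use.

```r
# Illustration only: warpbreaks ships with base R and is an assumed
# example data set, not necessarily one used in the tutorial.
data(warpbreaks)

# Subset: keep only the observations for wool type "A"
woolA <- subset(warpbreaks, wool == "A")

# Summarize by groups: mean number of breaks per wool/tension pair
means <- aggregate(breaks ~ wool + tension, data = warpbreaks, FUN = mean)

# Reshape long to wide: one row per wool, one column per tension level
wide <- reshape(means, idvar = "wool", timevar = "tension",
                direction = "wide")
print(wide)
```

The same steps can be done with add-on packages; base R is used
here only so that the sketch runs on a fresh installation.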
Outline
Topics will include:
- Data import, export, and manipulation: reshaping,
  subsetting, display
- Descriptive statistics: graphical, numerical
- Probability and distributions: base distributions
  and the distr family of packages
- Point and interval estimation: maximum likelihood,
  confidence intervals
- Hypothesis testing: parametric, nonparametric
- Simple and multiple linear regression: fitting,
  prediction, diagnostics
- Resampling: bootstrap percentile confidence
  intervals, permutation tests
- Document creation: Sweave,
  odfWeave, HTML export, and more
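As an example of the resampling topic, the following is a minimal
sketch of a bootstrap percentile confidence interval for a mean,
written in base R. The simulated exponential sample, the seed, the
sample size, and the number of resamples B are all assumptions made
for illustration; they are not taken from the tutorial materials.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible

# Simulated sample; the exponential model and n = 50 are illustrative
x <- rexp(50, rate = 1/10)

# Resample with replacement B times and record each resample's mean
B <- 2000
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Percentile interval: the 2.5% and 97.5% quantiles of the bootstrap means
ci <- quantile(boot_means, probs = c(0.025, 0.975))
print(ci)
```

The percentile method is the simplest bootstrap interval; other
variants exist and may behave better for skewed statistics.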
Prerequisites
A person attending my tutorial should know how to turn on a
computer. It would also help if they have had at least one
semester of an upper-division undergraduate course in
statistics.
In the last third of the tutorial I will discuss creating
assessment materials (exams, quizzes, class notes). For that
part a passing familiarity with LaTeX would be helpful, though
even that is not strictly necessary: freely available tools
(LyX, GNU Emacs Org-mode + babel) now automate the LaTeX
process to a large extent.
Intended Audience
This tutorial is targeted at A) established
professionals from other fields
(physical/biological/environmental sciences, business,
economics) who are comfortable with statistics but are just
starting with the R language, B) computer scientists who
may be quite competent with the R language but feel tentative
about their basic statistical literacy and would like to learn
more, or C) individuals who expect to be teaching people
from groups A) or B) in the near future.
Workshop Materials
Attendees of my workshop should go here before the
tutorial to get up to speed. Other materials will be
distributed to the tutorial participants on-site. At this time
I do not believe people will need to bring anything except
themselves, but if this changes I will post instructions here.
Please check here for up-to-date tutorial resources.
Related Links
The IPSUR homepage is here. The homepage
for the IPSUR package is here, and the
homepage for the R Commander plugin package is here.
References
[1] G. Jay Kerns (2010).
Introduction to Probability and Statistics Using R. First
Edition.
[2] G. Jay Kerns (2010). IPSUR: Introduction to
Probability and Statistics Using R. R package version
1.1.
[3] G. Jay Kerns with contributions by Theophilius Boye, Tyler
Drombosky and adapted from the work of John Fox et al. (2010).
RcmdrPlugin.IPSUR: An IPSUR Plugin for the R
Commander. R package
version 0.1-7.