
Introduction to Big Data Analysis and Machine Learning with R

Name: Alexander Stanoyevitch
Nationality: United States
Academic Title: Professor
Home University (From): California State University-Dominguez Hills, Carson, California, USA

Undergraduate

Freshman, Sophomore, Junior, Senior, Postgraduate, Doctoral Student

English

Mathematics proficiency at the single-variable calculus level. Some experience with computer programming would be helpful but not required.

Lecturing in class

Lectures

(1) Attendance and participation: 20%
(2) Assignments and papers: 80%

2 credits

Alexander Stanoyevitch is currently a professor in the mathematics department at California State University-Dominguez Hills. He earned his PhD in mathematics at the University of Michigan. He is very interested in statistics and applied mathematics, and has made three extended visits to Stanford University’s statistics department, where he has taught classes in mathematical statistics.

This course will focus on data analysis using the (free) R software. R is fast becoming the software of choice for scientists and professionals in all fields that need to analyze data. It was created in 1995 by two professors at the University of Auckland, and one of the creators, Robert Gentleman, is now the Vice President of Research at Genentech, a major biotechnology firm. Because R is open source, over 10,000 add-on packages are available, containing powerful programs and data sets that cover a wide variety of subject areas. In a 2016 study, R was ranked the 12th most popular programming language, the highest ranking for any specialized computing environment; the first 11 were all general-purpose languages. R is extremely powerful but has a steep learning curve. The investment is well worth it, since proficiency with R is a valued, and increasingly often necessary, skill for many jobs and graduate programs throughout the developed world.

Unit 1: Introduction to the R software, with a focus on extracting information from large data sets.
Unit 2: Simulation
Unit 3: Using the R add-on “dplyr” package to efficiently obtain statistics from single and multiple large data sets.
Unit 4: A practically oriented introduction to linear (and multiple linear) regression, the prototypical machine learning algorithm.
Unit 5: Introduction to some other machine learning algorithms for supervised statistical learning problems, as time permits.
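
As a rough illustration of the kind of work Units 1, 2, and 4 describe, the following minimal R sketch simulates a small data set from a known linear model, summarizes it, and fits a regression with base R's lm(). It is not taken from the course notes; the variable names (sim, fit) and the chosen intercept, slope, and noise level are assumptions made only for this example.

```r
# Hypothetical illustration only; not taken from the course materials.
set.seed(42)                        # make the simulation reproducible

# Unit 2 flavor: simulate 200 observations from a known linear model
n <- 200
x <- runif(n, min = 0, max = 1)     # predictor drawn uniformly on [0, 1]
y <- 2 + 3 * x + rnorm(n, mean = 0, sd = 0.5)   # true intercept 2, true slope 3
sim <- data.frame(x = x, y = y)

# Unit 1 flavor: quick numerical summary of each column
print(summary(sim))

# Unit 4 flavor: fit a simple linear regression and inspect the estimates
fit <- lm(y ~ x, data = sim)
print(coef(fit))                    # estimates should land near 2 and 3
```

Because the data are generated from a model whose coefficients are known, the fitted estimates can be compared directly with the true values, which is one common way simulation is used to check a statistical method.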

No textbook will be required. Notes and slides will be provided.

An Introduction to Statistical Learning, by Gareth James, Daniela Witten, Trevor Hastie and Rob Tibshirani, Springer-Verlag, 1st ed. 2013.
Note: A pdf of this book is freely available:
Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies, by John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy, MIT Press, 2015
