

Big Data Machine Learning

2 Credits
BU.520.710.XX
[NOTE: Each section must have a separate syllabus.]
[Day & Time / ex: Monday, 6pm-9pm]
[Start & End Dates / ex: 3/24/17-5/12/17]
[Semester / ex: Fall 2016]
[Location / ex: Washington, DC]

Instructor

[Full Name]

Contact Information

[Email Address]

[Phone Number, ###-###-#### (Optional)]

Office Hours

[Please specify the day and time of the 2 hours that will be dedicated to office hours each week. For evening classes, faculty may wish to hold their office hours by phone or email. While faculty are permitted to state “and by appointment,” office hours should not be held exclusively by appointment.]

Required Texts and Learning Materials

Python Machine Learning – by Sebastian Raschka, 2015, Packt Publishing, Birmingham – Mumbai.

Hands-On Machine Learning with Scikit-Learn and TensorFlow – by Aurélien Géron, 2017, O’Reilly Media, Inc.

Software

We will program in Python 3.6. This software is freely available; please download it and install the version for your operating system on your computer. Make sure your IPython/Jupyter Notebook works properly.

Also, please make sure you update your scikit-learn package via Anaconda Explorer: open Extensions, then update the scikit-learn package.
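As a quick way to confirm the installation before the first class, the following minimal sketch prints the Python version and the versions of the packages named in this syllabus. The function name check_environment and the default package list are illustrative assumptions, not an official course script.

```python
import importlib
import sys


def check_environment(packages=("numpy", "pandas", "sklearn", "matplotlib")):
    """Return a dict mapping each name to its installed version, or None if missing.

    The default package list is an assumption based on the libraries
    mentioned in the course description.
    """
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Most scientific packages expose __version__; fall back if not.
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name}: {status}")
```

Running this in a Jupyter Notebook cell should list a version for every package; any line reading NOT INSTALLED points to a package that still needs to be added.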

Prior to the first class, you are required to watch “Up and Running with Python” on Lynda.com. (Note: we will use IPython/Jupyter Notebook for all our work, not the Aptana Studio 3 IDE shown in the video.)

Note: This is not a class on how to program in Python, but it will train students to the level of proficiency necessary for successful completion of the course. Additional tutorials are readily available on the web at several locations:

  • Codecademy
  • Python.org
  • learnpython.org
  • Pythonfiddle.com

Other good Python primer videos that students could watch prior to class are available on Lynda.com:

  • Introduction to Data Analysis
  • Python 3 Essential Training

Course Description

This course provides students with a firm understanding of the mathematical and statistical theories that underlie big data and machine learning. Students will solve real-world problems by directly applying their data science skills through the implementation of code and rigorous analysis of financial data sets. In particular, this course will highlight some of the challenges and limitations of applying machine learning algorithms. The focus will be on understanding the subtle differences between techniques. The course will be hands-on, with weekly homework assignments and a final presentation geared towards fully immersing students in the data science process. Students will program in Python (e.g., Pandas, NumPy, Scikit-Learn, Matplotlib, Pattern, NLTK, etc.). Topics that will be covered include: Principal Component Analysis, Multinomial Logistic Regression, Naïve Bayes, Perceptron, Support Vector Machines, Random Forest, Neural Networks, model evaluation (ROC/AUC, k-fold cross-validation), etc.
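To give a flavor of how several of the listed topics fit together, here is a minimal sketch that chains Principal Component Analysis with logistic regression and evaluates the result with k-fold cross-validation and AUC. The synthetic data set, the choice of 5 components, and the 5 folds are illustrative assumptions only; they are not taken from the course assignments.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data (an assumption, standing in for real data).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Standardize, reduce to 5 principal components, then fit a logistic regression.
model = make_pipeline(StandardScaler(), PCA(n_components=5), LogisticRegression())

# 5-fold stratified cross-validation, scored by area under the ROC curve.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print("AUC per fold:", auc_scores.round(3))
print("Mean AUC:", auc_scores.mean().round(3))
```

Putting the scaler and PCA inside the pipeline means they are refit on each training fold, which keeps information from the held-out fold from leaking into the model.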

Prerequisite(s)

BU.510.650 Data Analytics and BU.231.620 Corporate Finance

(If you have not completed Corporate Finance but would still like to (potentially) join this course, please contact Professor Liew directly.)

Learning Objectives

By the end of this course, students will be able to:

  1. Munge and understand data quickly, summarize and make proper inferences.
  2. Practice the data analysis procedure: set-up the research question; gather and clean data; determine the appropriate methodology to apply; and analyze results and reflect on any limitations/improvements.
  3. Implement machine learning algorithms at a level of proficiency suitable for a typical business-school data scientist.
  4. Comprehend the theories underlying the machine learning algorithms, including their limitations and embedded assumptions.
  5. Implement the tools learned in a hands-on, real-world project, then communicate the findings effectively.
  6. Think critically about the benefits and limitations of machine learning and the challenges of big data.
  7. Synthesize the different big data techniques in a creative manner.

To view the complete list of Carey Business School’s general learning goals and objectives, visit the Carey website.

Attendance
Attendance and class participation are part of each student’s course grade. Students are expected to attend all scheduled class sessions. Failure to attend class will result in an inability to achieve the objectives of the course. Excessive absence will result in loss of points for participation. Regular attendance and active participation are required for students to successfully complete the course.

Class participation is an important part of learning. If you have a question, it’s likely that others do as well. I encourage active participation, and course grades will take into account particularly strong contributions from students.

Assignments

Assignment / Learning Objectives / Weight
Attendance and participation in class discussion / 1–7 / 10%
Assignments / 1–7 / 30%
Final Presentation (one third of the 30% will be based on peer review) / 1–7 / 30%
Final Exam / 1–7 / 30%
Total / 100%
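The weights above can be combined into a single course score as sketched below. This is only an illustration of the arithmetic: it assumes every component is graded on a 0–100 scale and that the peer-review third of the final presentation is blended linearly with the instructor's score, neither of which is stated in this syllabus. The function and key names are hypothetical.

```python
# Component weights taken from the table above.
WEIGHTS = {
    "participation": 0.10,
    "assignments": 0.30,
    "final_presentation": 0.30,
    "final_exam": 0.30,
}


def presentation_score(instructor_score, peer_score, peer_share=1 / 3):
    """Blend instructor and peer scores; one third comes from peer review.

    The linear blend is an assumption made for illustration.
    """
    return (1 - peer_share) * instructor_score + peer_share * peer_score


def course_score(scores, weights=WEIGHTS):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    example = {
        "participation": 100,
        "assignments": 80,
        "final_presentation": presentation_score(instructor_score=90, peer_score=60),
        "final_exam": 90,
    }
    print(round(course_score(example), 2))
```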

Homework: Weekly individual homework assignments are due by the beginning of the next class day. All homework assignments should be submitted through the Blackboard links.

Group Projects: 2–4 students form a group and work on the project as a team. Groups will formulate hypotheses, collect data, and use techniques taught in class to study patterns in the data or to predict future outcomes. Students are required to write a 5- to 10-page project report and present their results in the seventh class using PowerPoint slides.

Final Exam: The final exam is an in-class, closed-book, individual written exam.

Late submissions, including assignments, projects, and exams, will not be accepted.

Study Group (not required, but highly recommended)

Many students learn better and faster when working in a group, so I encourage collaborative learning. You can work together in a study group of 3–5 students to discuss class materials, homework assignments, and projects on a weekly basis. However, you must write your homework assignments individually, in your own words; your text should reflect your own understanding of the materials. The study groups can be different from your project groups.

Grading

Effective Fall 2017: The grade of A is reserved for those who demonstrate extraordinarily excellent performance as determined by the instructor. The grade of A- is awarded only for excellent performance. The grades of B+, B, and B- are awarded for good performance. The grades of C+, C, and C- are awarded for adequate but substandard performance. The grades of D+, D, and D- are not awarded at the graduate level (undergraduate only). The grade of F indicates the student’s failure to satisfactorily complete the course work.

Please note that for Core and Foundation courses, a maximum of 25% of students may be awarded an A or A-; the grade point average of the class should not exceed 3.3. For Elective courses, a maximum of 35% of students may be awarded an A or A-; the grade point average of the class should not exceed 3.4. (For classes with 15 students or fewer, the class GPA cap is waived.)

Tentative Course Calendar
The instructors reserve the right to alter course content and/or adjust the pace to accommodate class progress. Students are responsible for keeping up with all adjustments to the course calendar.

Session / Objective/Topics / Assigned Readings / Assignment
0 / Python Pre-class Review / Lynda’s “Up and Running with Python” / Please review the Lynda course before the first class
1 / Python Exercises and Review of Classifiers / Python Exercises I and Chapters 1, 2, 3 / Assign HWK1
2 / Data Processing and Dimension Reduction – PCA / Python Exercises II and Chapters 4, 5 / Assign HWK2: “Equity Market Components”
3 / Model Evaluation and Ensemble Learning / Chapters 6, 7 / Assign HWK3: “Asset Class Market Timing”
4 / Sentiment Analysis / Chapter 8 / Assign HWK4: “Tweet Sentiments”
5 / Regression Analysis and Cluster Analysis / Chapters 9, 10, 11 / Assign HWK5
6 / Neural Network / Chapter 12 / Assign HWK6
7 / Project Presentation / Project write-ups and presentations
8 / Final Exam / In class

Carey Business School

Policies and General Information

Blackboard Site

A Blackboard course site is set up for this course. Each student is expected to check the site throughout the semester, as Blackboard will be the primary venue for communication between the instructors and the students outside the classroom. Support for Blackboard is available at 1-866-669-6138.

Course Evaluation

As a research and learning community, the Carey Business School is committed to continuous improvement. The faculty strongly encourages students to provide complete and honest feedback for this course. Please take this activity seriously; we depend on your feedback to help us improve. Information on how to complete the evaluation will be provided toward the end of the course.

Disability Support Services

All students with disabilities who require accommodations for this course should contact Disability Support Services at their earliest convenience to discuss their specific needs. If you have a documented disability, you must be registered with Disability Support Services (410-234-9243) to receive accommodations. For more information, please visit the Disability Support Services webpage.

Academic Ethics Policy

Carey expects graduates to be innovative business leaders and exemplary global citizens. The Carey community believes that honesty, integrity, and community responsibility are qualities inherent in an exemplary citizen. The objective of the Academic Ethics Policy (AEP) is to create an environment of trust and respect among all members of the Carey academic community and hold Carey students accountable to the highest standards of academic integrity and excellence.

It is the responsibility of every Carey student, faculty member, and staff member to familiarize themselves with the AEP and its procedures. Failure to become acquainted with this information will not excuse any student, faculty, or staff from the responsibility to abide by the AEP. Please contact the Student Services office if you have any questions. For the full policy, please visit the Academic Ethics Policy webpage.

Students are not allowed to use any electronic devices during in-class tests. Calculators will be provided if the instructor requires them for test taking. Students must seek permission from the instructor to leave the classroom during an in-class test. Test scripts must not be removed from the classroom during the test.

Student Conduct Code

The fundamental purpose of the Johns Hopkins University’s regulation of student conduct is to promote and to protect the health, safety, welfare, property, and rights of all members of the University community as well as to promote the orderly operation of the University and to safeguard its property and facilities. As members of the University community, students accept certain responsibilities which support the educational mission and create an environment in which all students are afforded the same opportunity to succeed academically. Please contact the Student Services office if you have any questions. For the full policy, please visit the Student Conduct Code webpage.

Student Success Center

The Student Success Center offers free online and in-person one-on-one and group coaching in writing, presenting, and quantitative courses. The center also offers a variety of workshops, exam study sessions, and instructor-led primer seminars to help prepare students for challenging course content, including statistics and accounting. For more information or to book an appointment, please visit the Student Success Center website.

Other Important Academic Policies and Services

Students are strongly encouraged to consult the Carey Business School’s Student Handbook and Academic Catalog and Student Resources for information regarding the following items:

  • Statement of Diversity and Inclusion
  • Inclement Weather Policy

Copyright Statement

Unless explicitly allowed by the instructor, course materials, class discussions, and examinations are created for and expected to be used by class participants only. The recording and rebroadcasting of such material, by any means, is forbidden. Violations are subject to sanctions under the Honor Code.

Appendix: Homework Rubric for Big Data Machine Learning Course

Assessment Criteria / Not Good Enough (0 ≤ score < 6) / Good (6 ≤ score < 9) / Very Good (9 ≤ score ≤ 10) / Score
Deep understanding of theory and its applications, using qualitative methods to answer business questions / Demonstrates inadequate understanding of important concepts, methods, or their applications, e.g., chooses wrong methods, conducts analysis inappropriately, or interprets results incorrectly. / Understands concepts and methods relatively well; analyzes data using acceptable, although not perfect, methods; is able to derive useful information for decision making. / Demonstrates sophisticated understanding of the concepts and methods; knows the exact scope and possible limitations of each method; shows capability of using data analytics skills to make the right business decision.
Implementation and interpretation of data analysis techniques / Uses wrong techniques to analyze data; presents inappropriate interpretations or conclusions. / Chooses acceptable methods to analyze data; interpretations are sensible; derives useful results. / Uses advanced techniques to conduct thorough and insightful analysis; interprets the results correctly; draws the right conclusions based on data analysis.
Ability to solve real-world problems using quantitative methods / Data is inadequate or unstructured. Uses inappropriate methods to analyze data; fails to retrieve useful information. Suggestions are not persuasive. / Collects and documents just enough data; employs appropriate techniques to retrieve insightful information from data; makes reasonable recommendations. / Gathers sufficient relevant data; conducts data analytics using scientific methods; makes appropriate and powerful connections between analysis and real-world problems; provides constructive guidance in decision making.
Writing and presenting, especially organization and communication / Report is inadequately written and poorly organized. Analysis is insufficient. Conclusions are unconvincing. / Report is concise and clearly written. Analyzes problems following scientific strategies; provides useful suggestions with detailed explanation. / Report is well organized and insightfully written and includes thorough and thoughtful details. Conclusions are convincing.
Total Score
Comments:

Appendix: Final Exam Rubric for Big Data Machine Learning Course

Assessment Criteria / Not Good Enough (0 ≤ score < 6) / Good (6 ≤ score < 9) / Very Good (9 ≤ score ≤ 10) / Score
Interpretation of data (qualitative) / Little or no attempt to interpret data; or there are significant errors; or some data are over- or under-interpreted. / Interprets most data correctly; part of the conclusions may be suspect; suggestions on future implementation are sound. / Data are completely and appropriately interpreted; there is no over- or under-interpretation; draws convincing conclusions.
Analysis (quantitative) / Methods are completely misapplied, or applied with significant errors or omissions. Chooses inappropriate methods and makes wrong predictions. / Most statistical methods are correctly applied, but more could have been done with the data. Predictions are sensible but may deviate widely from the true results. / Statistical methods are fully and correctly applied; demonstrates superior data analysis skills; mines the data deeply and obtains useful insights for decision making.
Critical evaluation of findings / Blindly accepts defective results; or recognizes defective results but does not know how to fix them. / Recognizes defective results and figures out the causes; understands the main sources of errors. / Shows deep understanding of the sources of errors; recognizes defective results and eliminates the causes.
Ability to draw proper conclusions and make effective suggestions / Draws no conclusions or draws incorrect conclusions; suggestions are not acceptable. / Draws correct conclusions; suggestions may have potential impact on the future business. / Demonstrates substantial understanding of the problem; conducts deep data analytics using correct methods; draws correct conclusions with sufficient explanation and elaboration.
Total Score
Comments: