BUS 211f(1) Spring 2017

BUS 211f(1) Analyzing Big Data I: Course Syllabus

Spring 2017, Tuesdays & Thursdays 5:00–6:20 pm

Sachar 116—International Hall

Prof. Robert Carver

781-775-5493 (mobile)

Office: Sachar 1C

Hours: Tu/Thu 3:00 to 4:45 pm and by appointment

TAs: Boxi Pang, Siyu Wang

Overview / This two-credit module examines the opportunities and industry disruption arising in an era of massive, high-velocity, unstructured data and new developments in data analytics. We treat some strategic, ethical, and technical dimensions of big data. The technical foci of the course include data structures, data warehousing, Structured Query Language (SQL), and high-impact visual displays. The principal objective of the course is to help students build an understanding of data as an essential competitive resource and acquire advanced computer skills through cases and hands-on applications. Assignments and classroom time will be devoted both to analysis of current developments in analytics and to gaining experience with current tools.
Required Reading /
  • There is a required on-line course pack available for purchase at the Harvard Business Publishing website at this URL:

This link is also available on LATTE. See the last page of the syllabus for course pack contents.
  • Other readings as posted on the LATTE site, including on-line technical reference works.

Learning Goals and Objectives / Upon successful completion of this module, students will:
  • Think of data as a strategic resource in business.
  • Understand the logic of complex data queries in the context of on-line business research sources.
  • Be familiar with current developments in Big Data, business intelligence, and competitive analytics.
  • Be able to design a relational database structure suited to a business enterprise.
  • Understand the relationships between human cognitive processes and effective informational visualization.

Prerequisites / Students should have prior background in accounting and statistical techniques at a level comparable to that provided by FIN 212a and ECON 210f.
Course Approach / Nobel Laureate Herbert Simon wrote “What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.” This was true when Simon wrote it in 1969 and it is all the more true in our current age of Big Data.
This course is designed to provide students with an understanding of some techniques for managing information abundance and for controlling the costs and benefits of information processing in decision contexts. As such, we start with typical decision-making situations in business settings and work towards making data-driven decisions. Readings focus on the theory of decision-making, data structure, and analytic models. In addition, articles and cases illustrate typical decision problems and the application of the techniques we will study.
Communications / We’ll make regular use of LATTE. All lecture notes, handouts, assignments, and supporting materials will be available via LATTE, and any late-breaking news will reach you via email. Please check your Brandeis email and the LATTE site regularly to keep apprised of important course-related announcements.
Technologies / Throughout the course you have the choice of using the public computer clusters at IBS and/or your personal laptops. Ideally, you will bring your laptop to class, but use it only for course-related software.
Nearly all of the software we will use runs on either Windows or Macs.
Software / In addition to software available on the IBS computer clusters, we will also use web-based resources, including some made available through the Teradata Student Network (TSN). This site, sponsored by Teradata, the Walton School of Business at the University of Arkansas, IBM, MicroStrategy, SAS, and others, is a gateway to articles, cases, software tools, and real corporate data. Details about use of TSN will be provided separately on our LATTE site.
The key tools that we will use this module are:
  • Teradata SQL Assistant: Teradata SQL (pronounced “sequel”) Assistant will provide exposure to writing Structured Query Language code to interrogate a large database compiled by Dillard’s Department Stores, a large US retail chain.
  • R: R is a free software environment for statistical computing and graphics, widely used in both academia and industry. In this module we will use R mainly for data visualization. One advantage of R is that it runs on both Windows and Mac OS. It ranked no. 1 in the KDnuggets 2013 poll of top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.
R (base):
RStudio:
  • GitHub is also a free environment; it facilitates (a) collaborative work and (b) version control for software projects under development. It is very widely used by data scientists to manage and share their work.
  • Lucidchart is a cloud-based tool that enables users to design a coherent data structure as part of the process of designing a database. We will use a free version.
  • Interactive Visualization Tools (further instructions to be distributed in class): These powerful applications can speed up some processes, but unlike R and SQL will not (a) mark you as a skilled data scientist or (b) permit the degree of customization and flexibility that R and SQL afford. They are, however, the state of the art.
  • Tableau: Windows and Mac versions available
  • Qlik Sense Desktop (for students with 64-bit Windows computers)

Student Contributions / Class participation is important in this course both as a means of developing understanding and as an indicator of student progress. Participation can take many forms, and each student is expected to contribute actively, freely, and effectively to the classroom experience by raising questions, demonstrating preparedness and proficiency in the analysis of problems and cases, and explaining the implications of particular analyses in context. Homework-based discussion and presentations are an important part of participation. To this end, regular class attendance is required, and students should use name cards. We meet only six times, so absence can become a serious problem. Even if you must arrive late or leave early, be here.
With assistance from the TAs, I will evaluate the quality of your contributions in class each evening, as well as the quality of your contributions via email, LATTE discussion, etc. These will all be factored together in determining your ultimate Contributions grade (see below). In general, absence from class reduces your contribution grade.
Written Assignments and Projects / Students will complete five written assignments during the course. Three of these will be brief analyses, requiring modest analysis and writing. These may be completed with one or two partners, and each student should expect to briefly discuss one of these analyses in class.
Two other written assignments will be “Projects” requiring more significant time and analysis. The projects will be prepared in teams of four students and will include written and computer-based elements. Owing to the size of the class this term, students will have only limited opportunities to present parts of their projects orally in the course.
All assignments should be submitted via LATTE upload prior to the start of class. Papers should be professional in appearance and use clear, grammatically correct business English. Analytical work (graphs, tables, and other output) should be incorporated seamlessly into the written document, showing readers exactly and only what you want them to see.
Workload Expectation / Success in this two-credit course is based on the expectation that students will spend a minimum of 9 hours of study time per week for six weeks in preparation for class (readings, papers, discussion sections, preparation for exams, etc.).
Evaluation / Your final grade in the course will be computed using these weights:
Contributions to Class Discussions / 15% / please note!
Brief analyses (3, averaged together) / 35%
Projects (2) / 50%
TOTAL / 100%
Academic Honesty /
  • You are expected to be honest in all of your academic work. Please consult Brandeis University Rights and Responsibilities for all policies and procedures related to academic integrity. Students may be required to submit work to TurnItIn.com software to verify originality. Allegations of academic dishonesty will be forwarded to the Director of Academic Integrity. Sanctions for academic dishonesty can include failing grades and/or suspension from the university. Citation and research assistance can be found at LTS - Library guides.

Disabilities / If you are a student with a documented disability on record at Brandeis and wish to have a reasonable accommodation made for you in this class, please see me immediately.
Study Groups / Working with one or two partners is an excellent way to gain understanding of this subject. I encourage small groups to work on assignments, with a few caveats:
  • Be sure that you are neither carrying nor being carried by the group; each member of the group is entitled to learn and is expected to contribute.
  • Even in the context of group work, each student is responsible for the quality and timeliness of the submitted work.
  • Each group member retains the right to “go it alone.” Joining a group is not a marriage. Similarly, teams are encouraged to dismiss underperforming members.

Course Outline

Note: for each session, you should complete the assigned reading before coming to class. See the list of deliverables below; detailed assignments will be posted each week, and all assignments and handouts will also be available on our LATTE site. Items marked HBP are in the on-line course pack.

Week/Date / Topics and Readings / Upload to LATTE before class

Week 1

17 January

/ Finding Business Value in a Sea of Data
READINGS: “Digital Ubiquity” – distributed by email
Thomas, Rob. “Transforming Farms with Data” blog post (LATTE)
“At Amadeus, Finding Data Science Talent…” (LATTE)
  1. Course introduction and objectives
  2. Competing on Analytics: Opportunities
/

(none)

19 January

/ First Look at R and the RStudio Environment
READINGS: McKinsey Report Exec Summary (LATTE)
Download Instructions on LATTE

Week 2

24 January

/ Data Visualization—Finding Patterns in Voluminous Data
CASE Assignment: Spotify: Face the Music (HBP)
Davenport HBR article “Competing on Analytics” (HBP)
READINGS: The Hidden Traps in Decision Making (HBR Classic)
GUEST SPEAKER: Begli Nursahedov, MAIEF 2009
Data Scientist, HubSpot
  1. Discussion of Spotify case
  2. Human cognition and effective displays
/

Analysis 1 due before class

26 January

/ Effective Data Visualizations
READINGS: Handouts on RStudio and Graphing in R (LATTE)
Princeton Intro to RStudio
Link to Peng's "Plotting Systems in R"
(reference) R and Data Mining (link on LATTE)
Graphing in R demo

Week 3

31 January

/ Organizing and Managing Data
READINGS: G. Russell reading on Databases (Link on LATTE)
ERD reading (posted on LATTE)
CASE READING: Mustang Music (A & B; HBP)
  1. Analysis 2 debrief
  2. Essential concepts: metadata, structure, data quality
  3. Structure of Relational Databases
  4. Entity-Relationship Diagramming
/

Analysis 2

2 February

/ Ethical Concerns
NOTE: On 2 Feb our class will be open to the community as part of ‘Deis Impact
READING: Barocas, S. and Nissenbaum, H. “Big Data’s End Run Around Anonymity and Consent” (LATTE)
Short cases distributed in class

Week 4

7 February

/ SQL Part-1
READING: Russell section on SQL (LATTE; Chap 3)
Dillard’s Dept Stores
a. Project 1 debriefing
b. SQL: one language, many dialects
c. Fundamental commands: SELECT, FROM, WHERE, JOIN, computation of a new column (AS), case sensitivity, relational operators
d. Dillard’s Demo /

Project 1

Complete your Teradata registration (instructions on LATTE)

9 February

/ SQL Part-2
READING: Russell section on SQL (LATTE; Chap 3)
Selected Teradata material about SQL
GUEST SPEAKER: Fei Wu, MBA 2015
Business Intelligence Analyst, Brandeis University
a. Teradata interface
b. Dillard’s queries

Week 5

14 February

/ SQL Part-3
a. Debrief Analysis 3
b. More SQL commands /

Analysis 3

16 February

/ Concepts of Data Cleaning
READING: Wickham, Hadley. “Tidy Data” (LATTE)
a. Need & Strategies for Data Cleaning: Another R package

21 & 23 February

/ Midterm Recess—No classes this week

Week 6

28 February

/ Visualization and Exploration
a. Interactivity in Visualizations
b. Visualization for Business Intelligence

2 March

/ More on Visualization
a. Individual Explorations of Tableau and MS Power BI
b. Preview of BUS 212

Thursday, March 9

/ No class this week
  • Final project due before this date in lieu of final exam
  • Early submissions appreciated
/

Project 2

Brief Description of Assignments (complete assignment details to be distributed in class):

Analysis 1 / Brief analysis of “Spotify: Face the Music”
Analysis 2 / Visualization Exercises
Analysis 3 / Working with SQL
Project 1 / Design a partial database schema for Mustang Music case, per instructions.
Project 2 / Design and code database queries using Structured Query Language (SQL) on a remote, large-scale database (Dillard’s Dept Stores).
Supplementary Readings (in the HBP on-line course pack):
“Competing on Analytics.” Davenport, Thomas H. In Harvard Business Review, Case No. R0601H-PDF-ENG. Published 01/01/2006, Harvard Business School Publishing (12 pages).
“Spotify: Face the Music.” Sastre, Isaac & Vroom, Govert. Published 12/2014, IESE Business School (20 pages).
“The Hidden Traps in Decision Making (HBR Classic).” Hammond, John S.; Keeney, Ralph L.; Raiffa, Howard. In Harvard Business Review, Case No. R0601K. Published 01/01/2006, Harvard Business School Publishing (11 pages).
“Mustang Music (A).” Neufeld, D. Case No. 910E09. Published 04/28/2010, Richard Ivey School of Business (9 pages).
“Mustang Music (B).” Neufeld, D. Case No. 910E10. Published 04/27/2010, Richard Ivey School of Business (2 pages).

Rev. 01/2017