Marcus, Eugene Wu
Data science! Big data! Statistics! Infographics! Buzzword!
We have added links to the lectures for each day in the course outline.
This is a lab-oriented course where you will learn the basics of how to:
- Take raw data (e.g., emails, logs)
- Extract meaningful information
- Use statistical tools
- Make visualizations
Each day will start with a 30-minute lecture followed by 2.5 hours of
lab. Students will work through labs and exercises during class; we
will post links to them before each lecture.
Please make sure to complete Day 0
before the first class!!
Day 0: Class organization and programming
environment setup. Set up for the course before you arrive. Email us with
Day 1: An end-to-end example getting you from a
dataset found online to several plots of campaign contributions.
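A core step in going from a raw contributions file to a plot is aggregating rows per candidate. The sketch below shows that step with only the standard library. The column names `candidate` and `amount` and the toy rows are hypothetical stand-ins, not the schema or contents of the dataset used in the lab.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for a downloaded contributions file. The column names
# "candidate" and "amount" are hypothetical, not the real dataset's schema.
RAW = """candidate,amount
Smith,250.00
Jones,100.00
Smith,50.00
Lee,500.00
"""

def total_by_candidate(fileobj):
    """Sum contribution amounts per candidate from a CSV file object."""
    totals = defaultdict(float)
    for row in csv.DictReader(fileobj):
        totals[row["candidate"]] += float(row["amount"])
    return dict(totals)

totals = total_by_candidate(io.StringIO(RAW))
# Print candidates from largest to smallest total.
for name, amount in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {amount:10.2f}")
```

The resulting `totals` dictionary is in a convenient shape for a bar chart, for example by passing its keys and values to matplotlib's `plt.bar`.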
Day 2: Lots of visualization examples, and practice going from data
to chart. Lecture
Day 3: Statistics basics, including t-tests, linear regression, and
statistical significance. We'll use campaign finance data and per-county
health rankings. Lecture
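As a rough illustration of two of the Day 3 tools, here is a minimal standard-library sketch of Welch's two-sample t statistic (which does not assume equal variances) and an ordinary least-squares line fit. It computes the statistic only; turning it into a p-value needs the t distribution, which `scipy.stats.ttest_ind(a, b, equal_var=False)` provides. The function names and toy inputs are illustrative, not taken from the lab.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic: difference of means over its standard error."""
    # variance() is the sample variance (divides by n - 1).
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def least_squares(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

t = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
slope, intercept = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(t, slope, intercept)
```

For these toy inputs the two samples differ in mean by 1 with a standard error of 1, so `t` is -1.0, and the points lie exactly on y = 2x + 1.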
Day 4: Text processing on a large corpus (the Enron email
dataset) using tf-idf and cosine similarity. Lecture
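To make the Day 4 ideas concrete, the sketch below builds tf-idf vectors for a few tokenized documents and compares them with cosine similarity. It assumes one common weighting, term frequency normalized by document length times idf = log(N / df); other tf-idf variants exist, and the lab may use a different one. The three short "emails" are made-up examples, not Enron data.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * idf} dict."""
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

emails = [
    "meeting about the energy deal".split(),
    "the energy deal closes friday".split(),
    "lunch on friday".split(),
]
vecs = tfidf_vectors(emails)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

With this weighting a term that appears in every document gets idf 0 and drops out, which is how tf-idf discounts very common words; the first two emails share "the", "energy", and "deal" and score above zero, while the first and third share no terms and score 0.0.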
Day 5: Scaling up to process large datasets using Hadoop/MapReduce
on a larger copy of the Enron dataset. Lecture
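The MapReduce model behind Day 5 can be simulated in a single process to show its three stages: map each input record to key/value pairs, shuffle (group) the pairs by key, then reduce each group. The word-count sketch below is only that simulation; the function names are hypothetical and this is not Hadoop's API. On Hadoop, the framework runs the mappers and reducers across machines and performs the shuffle itself (with Hadoop Streaming, for example, mappers and reducers can be scripts that read stdin and write stdout).

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Mapper: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)

def run_word_count(docs):
    """Run map, shuffle, and reduce over a {doc_id: text} dict."""
    intermediate = chain.from_iterable(map_phase(i, t) for i, t in docs.items())
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

counts = run_word_count({"msg1": "the deal is done", "msg2": "The deal is off"})
print(counts)
```

Because each mapper call sees only one document and each reducer call sees only one key's values, both stages can be spread across many machines, which is what lets the same job scale to the larger copy of the dataset.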
Day 6: You tell us! Get into groups or work on your own to analyze
a dataset of your choosing, and tell us a story!
We are grateful to our sponsors for speaking to students about their real-world
experience and for covering our expenses.
We wrote the lab pages using Markdown, a markup
language that compiles to HTML.
Days 1, 2, and 4 were written in Mou, an OS X
Markdown editor with nice default CSS files. The CSS file can be downloaded
Days 3 and 5 were written in a hacked version of pycco, which lets you
write Markdown as comments in your Python files and generates an HTML
file that includes the compiled Markdown interspersed with
syntax-highlighted Python code. The default pycco generates HTML with
the comments and the Python code side by side, on the left and right
halves of the screen. We hacked pycco to include the Python code in-line.
You can see the code on the
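The in-line layout described above can be sketched in two steps: split a Python file into sections of (comment text, code), then emit each section's rendered comment followed by its code. This is a hypothetical illustration of the idea, not pycco's actual implementation or the authors' hack; the functions `split_sections` and `render_inline` are invented for this example, and `to_html` stands in for a Markdown converter such as `markdown.markdown` from the Python-Markdown package. Syntax highlighting is omitted; the code is only HTML-escaped.

```python
import html

def split_sections(source):
    """Split Python source into (docs, code) pairs: a run of '#' comment
    lines followed by the code beneath it."""
    sections, docs, code = [], [], []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            if code:  # a comment after code starts a new section
                sections.append(("\n".join(docs), "\n".join(code)))
                docs, code = [], []
            # Drop one '#' and one following space, keeping Markdown like '## Heading'.
            text = stripped[1:]
            docs.append(text[1:] if text.startswith(" ") else text)
        else:
            code.append(line)
    if docs or code:
        sections.append(("\n".join(docs), "\n".join(code)))
    return sections

def render_inline(sections, to_html):
    """Emit each section's rendered Markdown immediately followed by its code."""
    parts = []
    for docs, code in sections:
        if docs:
            parts.append(to_html(docs))
        if code.strip():
            parts.append("<pre><code>" + html.escape(code) + "</code></pre>")
    return "\n".join(parts)

src = "# Load the data\nx = 1\n# Print it\nprint(x)\n"
page = render_inline(split_sections(src), lambda md: "<p>" + md + "</p>")
print(page)
```

A side-by-side layout would instead place each section's two parts in adjacent columns; emitting them one after the other is what produces the in-line reading order.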
During MIT IAP
Jan 9-12, 17-18, 11am - 2pm, Room 32-144
Please bring a laptop to every class.
Participants are requested to attend all sessions.
Prereq: 6.01 or prior Python experience
Sign up by Jan 8!