Data science! Big data! Statistics! Infographics! Buzzword!
Day 0: Class organization and programming environment setup. Set up for the course before you arrive. Email us with questions!
Day 1: An end-to-end example getting you from a dataset found online to several plots of campaign contributions. Lecture
Day 2: Lots of visualization examples, and practice going from data to chart. Lecture
Day 3: Statistics basics, including t-tests, linear regression, and statistical significance. We'll use campaign finance data and per-county health rankings. Lecture
Day 4: Text processing on a large text corpus (the Enron email dataset) using tf-idf and cosine similarity. Lecture
Day 5: Scaling up to process large datasets using Hadoop/MapReduce on a larger copy of the Enron dataset. Lecture
Day 6: You tell us! Get into groups or work on your own to analyze a dataset of your choosing, and tell us a story!
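For a taste of the Day 4 material, here is a minimal sketch of tf-idf weighting and cosine similarity in plain Python. It is not the lab code: the function names `tf_idf_vectors` and `cosine_similarity` are our own for illustration, the three toy "emails" are invented rather than taken from the Enron dataset, and it assumes one common tf-idf variant (relative term frequency times the log of inverse document frequency).

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy documents (hypothetical, not from the Enron corpus).
docs = [
    "please review the attached energy contract".split(),
    "the energy contract needs your review".split(),
    "lunch on friday at noon".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # two messages with shared terms
print(cosine_similarity(vecs[0], vecs[2]))  # two messages with no shared terms
```

The first two messages share several terms, so their similarity is positive; the third shares none with the first, so that similarity is zero.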
We are grateful to our sponsors for speaking to students about their real-world experience and for covering our expenses.
We wrote the lab pages using Markdown, a markup language that compiles to HTML.
Days 1, 2 and 4 were written in Mou, an OS X Markdown editor with nice default CSS files. The CSS file can be downloaded here.
Days 3 and 5 were written in a hacked version of pycco, which lets you write Markdown as comments in your Python files and generates an HTML file that intersperses the compiled Markdown with syntax-highlighted Python code. The default pycco generates HTML in which the comments and Python code are split between the left and right sides of the screen. We hacked pycco to include the Python code in-line. You can see the code on the GitHub page.
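To give a sense of what such a source file looks like, here is a small sketch of a Python file whose comments are written in Markdown. The file contents and the `word_counts` function are hypothetical, made up for illustration and not taken from the actual labs. A pycco-style tool would render the comment blocks as HTML and show the highlighted code alongside or between them.

```python
# ## Counting words
#
# Comments in this file are written in *Markdown*. A documentation
# generator such as pycco compiles them to HTML and pairs each
# comment block with the code that follows it.
from collections import Counter


# ### The helper
#
# `word_counts` lowercases a line, splits it on whitespace, and
# counts how often each word appears.
def word_counts(line):
    return Counter(line.lower().split())


# ### Trying it out
#
# The word "the" appears twice in this sentence.
counts = word_counts("The quick brown fox jumps over the lazy dog")
print(counts["the"])
```

The file is still ordinary Python, so it can be run directly as well as rendered.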
During MIT IAP
Jan 9-12, 17-18, 11am - 2pm, Room 32-144
Please bring a laptop to every class.
Participants are requested to attend all sessions.
Prereq: 6.01 or prior Python experience