Stanford CS345S:
Data-Intensive Systems for the Next 1000x
Autumn 2016

Instructor: Peter Bailis (p lastname at cs dot stanford dot edu; Office Hours: Tu 2:00-3:00PM, Gates 410, and by appointment)

CA: Feiran Wang

Time: Tu/Th 10:30-11:50AM

Location: 540-108 (Blume Earthquake Center)

Online Forum: link

Description: The last decade saw enormous shifts in the design of large-scale data-intensive systems due to the rise of Internet services, cloud computing, and Big Data processing. Where will we see the next 1000x increases in scale and data volume, and how should data-intensive systems evolve accordingly? This course will critically examine a range of trends, including the Internet of Things, drones, smart cities, and emerging hardware capabilities, through the lens of software systems research and design. Students will perform a comparative analysis by reading and discussing cutting-edge research while conducting their own original research.


9/27 Trends I
  Mobile ate the world (2016)
  Mary Meeker's Internet Trends (2016)
9/29 Trends II
  A view of cloud computing (2010)
  The swarm at the edge of the cloud (2014)
10/4 Dataflow
  Volcano: An extensible and parallel query evaluation system (1994)
  Dryad: Distributed data-parallel programs from sequential building blocks (2007)
10/6 Streaming
  TelegraphCQ: Continuous dataflow processing (2003)
  Naiad: A timely dataflow system (2013)
  Optional: Scalability! But at what COST? (2015)
10/11: Project Proposals due via Email (Proposal Info)
10/11 Training
  Towards a unified architecture for in-RDBMS analytics (2012)
  Scaling distributed machine learning with the parameter server (2014)
10/13 Consistency
  Managing update conflicts in Bayou, a weakly connected replicated storage system (1995)
  Replicated data consistency explained through baseball (2013)
10/18 Attention
  Bro: A system for detecting network intruders in real-time (1999)
  MacroBase: Analytic Monitoring for the Internet of Things (2016)
10/20: Homework Report due via Email (Homework Info)
10/20 Interaction
  Polaris: A system for query, analysis, and visualization of multidimensional relational databases (2002)
  Voyager: Exploratory analysis via faceted browsing of visualization recommendations (2016)
  Optional: Vega-Lite: A grammar of interactive graphics (2017)
10/25 Aggregation
  TAG: A tiny aggregation service for ad-hoc sensor networks (2002)
  SocialWeaver: Collaborative inference of human conversation networks using smartphones (2013)
10/27 Signals
  The Case for a Signal-Oriented Data Stream Management System (2007)
  ShopMiner: Mining customer shopping behavior in physical clothing stores with COTS RFID devices (2015)
11/1 Homes and Cities
  An Operating System for the Home (2012)
  BOSS: Building Operating System Services (2013)
  Optional: Internet of Things for smart cities (2014)
  Optional: Putting the 'Smarts' into the Smart Grid: A Grand Challenge for Artificial Intelligence (2012)
11/3 Keith Winstein Guest Lecture
  Reading via Email ?
11/8 Inference
  The missing piece in complex analytics: Low latency, scalable model management and serving with Velox (2015)
  Extracting Databases from Dark Data with DeepDive (2016)
11/10 Thinking Critically About Data
  Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon (2012)
  Optional, recommended: Thinking critically about and researching algorithms (2016)
  No questions for this class
11/13: Project Updates due via Email (Update Info)
11/15 Matei Zaharia Guest Lecture
  Reading via Email ?
11/17 Flight
  An interactive tool for designing quadrotor camera shots (2015)
  QuadCloud: A rapid response force with quadrotor teams (2016)
11/29 Edge
  Glimpse: Continuous, real-time object recognition on mobile devices (2015)
  Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (2016)
12/1 Drive
  Nericell: Rich monitoring of road and traffic conditions using mobile smartphones (2008)
  Three decades of driver assistance systems: Review and future perspectives (2014)
12/6 Cloud++ (with Brendan Burns)
  Design patterns for container-based distributed systems (2016)
  Serverless Computation with OpenLambda (2016)
12/8 Design and Discussion
  Skim, full read strongly recommended: Hints for computer system design (1983)
  Skim: "One size fits all": An idea whose time has come and gone (2005)
12/14: Project Presentations, 12:15-3:15PM, Gates 415 (Presentation Info)
12/15: Final Project Writeups due via Email (Writeup Info)

Anonymous Feedback

Link to Form

Paper Reading Advice


Grading: 35% class participation, 50% final project and write-up, 15% short assignment (each described below)

Class participation: This is a graduate-level research seminar, and participation is key. For each class, please respond to the questions corresponding to the reading and email your answers before class. In addition, please come prepared to discuss the following questions:

  1. What was the main idea behind this paper?
  2. What were this paper's strengths?
  3. What were this paper's weaknesses, and how would you improve upon them?
  4. Why do you think we're reading this paper?
  5. How might this paper's findings change over the next five years?

Final project: A great way to learn to do research is to do research. You will complete an open-ended research project in small teams of 2-3 students. I encourage students from outside core databases, machine learning, statistics, and computer science to find projects related to their own research areas.

I've listed a set of potential project ideas here.

Please start looking for project partners right away. It is your responsibility to form and manage groups. The course project will include an interim course project report, a short project presentation at the end of the quarter, and a final project report. The final project presentation will be in a workshop-like format.

Short assignment: You will complete one laboratory assignment whose goal is to give you hands-on exposure to the software development environment for one of the trends covered in class (even if the corresponding hardware is not available). You will implement a small application pertaining to that trend using a publicly available API or SDK of your choice and submit a brief, one-page experience report.