HDFS Scalability: The Limits to Growth
Summary of Shvachko's article presenting the Hadoop Distributed File System (HDFS) and its scalability limitations.
MapReduce: Simplified Data Processing on Large Clusters
Paper from Google engineers presenting MapReduce, a model providing a robust yet simple interface for processing large datasets in distributed environments.
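The paper illustrates the interface with a word-count job: the user writes only a map function that emits (word, 1) pairs and a reduce function that sums the counts per word. The sketch below runs that idea in a single process. The function names (`map_fn`, `reduce_fn`, `run_mapreduce`) are hypothetical, and the distribution, partitioning and fault tolerance that the real system provides are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_fn(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    # User-supplied map: emit an intermediate (word, 1) pair per word.
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    # User-supplied reduce: combine all values that share one key.
    return sum(counts)


def run_mapreduce(inputs: Dict[str, str]) -> Dict[str, int]:
    # Map phase, followed by a "shuffle" that groups values by key.
    grouped = defaultdict(list)
    for doc_id, text in inputs.items():
        for key, value in map_fn(doc_id, text):
            grouped[key].append(value)
    # Reduce phase: one reduce_fn call per distinct key, in sorted key order.
    return {key: reduce_fn(key, values) for key, values in sorted(grouped.items())}


print(run_mapreduce({"d1": "the quick fox", "d2": "The lazy dog"}))
```

The simplicity comes from the split: everything outside `map_fn` and `reduce_fn` is the framework's job, so the same two functions could in principle be run across many machines unchanged.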
Enterprise Data Analysis and Visualization: An Interview Study
This paper presents an overview of data analysts' competencies and discusses how the social and organizational context of their companies affects the outcome of their everyday work.
Improving Datacenter Performance and Robustness with Multipath TCP
Paper presenting Multipath TCP (MPTCP), which improves performance in datacenters by providing an alternative to single-path transport.
Modeling TCP Throughput: A Simple Model and its Empirical Validation
Summary of the 1998 paper describing a TCP throughput prediction model that takes timeouts into account.
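The model is usually quoted through a closed-form approximation in which the send rate, in packets per second, is limited either by the receiver window or by losses, and the loss term has a fast-retransmit part plus a timeout part. The sketch below evaluates that commonly cited approximation; the function and parameter names are ours, and `b = 2` (delayed ACKs) is an assumed default rather than something this summary states.

```python
import math


def pftk_throughput(p: float, rtt: float, t0: float, w_max: float, b: int = 2) -> float:
    """Approximate TCP send rate in packets per second (sketch, names are ours).

    p     -- loss probability, 0 < p < 1
    rtt   -- round-trip time in seconds
    t0    -- retransmission timeout in seconds
    w_max -- maximum (receiver-advertised) window in packets
    b     -- packets acknowledged by each ACK (assumed 2 for delayed ACKs)
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    # Ceiling imposed by the maximum window: at most w_max packets per RTT.
    window_limited = w_max / rtt
    # Loss recovery via triple-duplicate ACKs (the "square-root" part).
    fast_retransmit = rtt * math.sqrt(2 * b * p / 3)
    # Extra cost of retransmission timeouts, the term simpler models omit.
    timeouts = t0 * min(1.0, 3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    return min(window_limited, 1.0 / (fast_retransmit + timeouts))


print(pftk_throughput(p=0.01, rtt=0.1, t0=1.0, w_max=1000))
```

Setting `t0 = 0` drops the timeout term and leaves the simpler square-root formula, which makes it easy to see how much timeouts lower the predicted throughput as the loss rate grows.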