BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:A Spark in the Cloud: Iterative and Interactive Cluster Computing 
 - Mosharaf Chowdhury (UC Berkeley)
DTSTART:20100722T150000Z
DTEND:20100722T160000Z
UID:TALK25383@talks.cam.ac.uk
CONTACT:Eiko Yoneki
DESCRIPTION:MapReduce and its variants have been highly successful in impl
 ementing large-scale data-intensive applications on commodity clusters. Ho
 wever\, most of these systems are built around an acyclic data flow model 
 that is not suitable for a wide array of popular use cases including many 
 iterative machine learning algorithms. We present Spark\, a framework opti
 mized for iterative jobs\, where a dataset is reused across multiple paral
 lel operations without sacrificing the scalability and fault tolerance of 
 MapReduce. To achieve these goals\, Spark introduces an abstraction called
  resilient distributed datasets (RDDs) based on the concept of data lineag
 e. Spark provides a functional programming model similar to MapReduce\, bu
 t also lets users hint for data to be cached between iterations\, leading 
 to up to 10x better performance than Hadoop on some jobs. Spark also makes
  programming jobs easy by integrating cleanly into the Scala programming l
 anguage (a high-level language on the JVM). Finally\, the ability of Spark
  to load a dataset into memory and query it repeatedly makes it especially
  suitable for interactive analysis of big datasets. We have modified the S
 cala interpreter to make it possible to use Spark interactively in this ma
 nner\, providing a significantly more responsive experience than Hive and 
 Pig.\n \nBio: Mosharaf Chowdhury is a Ph.D. student working with Prof. Ion
  Stoica in the RADLab at UC Berkeley. He received his B.Sc. in Computer Sc
 ience and Engineering from Bangladesh University of Engineering and Techno
 logy and his M.Math in Computer Science from the University of Waterloo. H
 is research interests include large-scale data-parallel systems\, data
  center networks\, and network virtualization. \n \n
LOCATION:FW26\, Computer Laboratory\, William Gates Building
END:VEVENT
END:VCALENDAR
