BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Platforms and Applications for "Big and Fast" Data Analytics
  - Yanlei Diao\, UMass Amherst
DTSTART:20141203T100000Z
DTEND:20141203T110000Z
UID:TALK56281@talks.cam.ac.uk
CONTACT:Microsoft Research Cambridge Talks Admins
DESCRIPTION:Recently there has been significant interest in building big
  data systems that can handle not only "big data" but also "fast data" for
  analytics. Our work is strongly motivated by recent real-world case studi
 es that point to the need for a general\, unified data processing framewor
 k to support analytical queries with different latency requirements. Towar
 ds this goal\, our project is designed to transform the popular MapReduce 
 computation model\, originally proposed for batch processing\, into distri
 buted (near) real-time processing.\n\nIn this talk\, I start by examining 
 the widely used Hadoop system and presenting a thorough analysis to unders
 tand the causes of high latency in Hadoop. I then present a number of nece
 ssary architectural changes\, as well as new resource configuration and op
 timization techniques to meet user-specified latency requirements while ma
 ximizing throughput. Experiments using typical workloads in click stream a
 nalysis and Twitter feed analysis show that our techniques reduce the late
 ncy from tens or hundreds of seconds in Hadoop to sub-second in our system
 \, with a 2x-7x increase in throughput. Our system also outperforms
  state-of-the-art distributed stream systems\, Twitter Storm and Spark
  Streaming\, by a wide margin. Finally\, I will show some initial results
  and challenges of supporting big and fast data analytics in the
  emerging domain of genomics.\n
LOCATION:Auditorium\, Microsoft Research Ltd\, 21 Station Road\, Cambridge
 \, CB1 2FB
END:VEVENT
END:VCALENDAR
