BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Cosmic Rays Don't Strike Twice: Understanding the Nature of DRAM E
 rrors and the Implications for System Design - Ioan Stefanovici (Universit
 y of Toronto)
DTSTART:20140327T150000Z
DTEND:20140327T160000Z
UID:TALK50693@talks.cam.ac.uk
CONTACT:Eiko Yoneki
DESCRIPTION:Main memory is one of the leading hardware causes for machine 
 crashes in today's datacenters. Designing\, evaluating and modeling system
 s that are resilient against memory errors requires a good understanding o
 f the underlying characteristics of errors in DRAM in the field. While the
 re have recently been a few first studies on DRAM errors in production sys
 tems\, these have been too limited in either the size of the data set or t
 he granularity of the data to conclusively answer many of the open questio
 ns on DRAM errors. Such questions include\, for example\, the prevalence o
 f soft errors compared to hard errors\, or the analysis of typical pattern
 s of hard errors. \n\nIn this project\, we study data on DRAM errors colle
 cted on a diverse range of production systems in total covering nearly 300
  terabyte-years of main memory. As a first contribution\, we provide a det
 ailed analytical study of DRAM error characteristics\, including both hard
  and soft errors. We find that a large fraction of DRAM errors in the fiel
 d can be attributed to hard errors and we provide a detailed analytical st
 udy of their characteristics. As a second contribution\, we use the result
 s from the measurement study to identify a number of promising directions 
 for designing more resilient systems and evaluate the potential of differ
 ent protection mechanisms in light of realistic error patterns. One of our
  findings is that simple page retirement policies might be able to mask a 
 large number of DRAM errors in production systems\, while sacrificing only
  a negligible fraction of the total DRAM in the system.\n\nBio: Ioan Stefa
 novici is a PhD student in the Computer Systems and Networks Group at the 
 University of Toronto\, under the supervision of Prof. Bianca Schroeder. H
 is research has dealt primarily with improving the reliability and perform
 ance of large-scale computer systems\, studying the reliability of DRAM\, 
 and the impact of temperature on data centers. More recently\, he has been
  working at Microsoft Research on software-defined storage.\nhttp://www.cs
 .utoronto.ca/~ioan/\n\n
LOCATION:FW26\, Computer Laboratory\, William Gates Building
END:VEVENT
END:VCALENDAR
