BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Examining Raft's behaviour during partial network failures - Chris
  J. Jensen\, Computer Lab
DTSTART:20210429T140000Z
DTEND:20210429T150000Z
UID:TALK159961@talks.cam.ac.uk
CONTACT:Srinivasan Keshav
DESCRIPTION:State machine replication protocols such as Raft are widely us
 ed to\nbuild highly-available strongly-consistent services\, maintaining\n
 liveness even if a minority of servers crash. As these systems are\nimplem
 ented and optimised for production\, they accumulate many\ndivergences fro
 m the original specification. These divergences are\npoorly documented\, r
 esulting in operators having an incomplete model of\nthe system's characte
 ristics\, especially during failures. In this paper\,\nwe look at one such
  Raft model used to explain the November Cloudflare\noutage and show that 
 etcd's behaviour during the same failure differs.\nWe then show the
  specific optimisations in etcd causing this\ndifference and present a mor
 e complete model of the outage based on\netcd's behaviour in an emulated d
 eployment using reckon. Finally\, we\nhighlight the upcoming PreVote optim
 isation in etcd\, which might have\nprevented the outage from happening in
  the first place.\n\nBio:\n\nChris Jensen is a first-year PhD student in t
 he SRG\, focusing on\nbenchmarking and improving the availability of stron
 gly consistent\ndistributed databases. He previously completed his BSc in 
 Computer\nScience at the University of Cambridge.
LOCATION:https://meet.google.com/ehj-dwaz-rea
END:VEVENT
END:VCALENDAR
