BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:A Scalable Approach for Managing Unstructured Information - Kim Ke
 eton (HP Palo Alto)
DTSTART:20120309T110000Z
DTEND:20120309T120000Z
UID:TALK36699@talks.cam.ac.uk
CONTACT:Eiko Yoneki
DESCRIPTION:Digital data is being generated in mind-boggling amounts: 15 p
 etabytes -- more than 8X the information contained in all US libraries -- 
 is created daily.  The data landscape is shifting -- in addition to struct
 ured data in databases\, organizations are increasingly dealing with unstr
 uctured data such as email\, documents\, spreadsheets\, blogs\, Web pages 
 and media files.  Unstructured information comprises 80% of most organizat
 ions' information today\, and it is growing at an annual rate of 60%.  Use
 rs are demanding increasing sophistication in the level of information pro
 cessing that storage and information management systems provide.  In addit
 ion to the traditional challenges of storing the bytes and searching and c
 lassifying the content\, they need to leverage their information to provid
 e relevant and timely insights that improve the outcomes of the tasks that
  they undertake.\n\nIn this talk\, I will describe recent work at HP Labs 
 on unstructured information management\, including SCAN-lite\, an extensib
 le framework for gathering structured metadata from unstructured documents
 \, and LazyBase\, a scalable database system for ingesting\, storing and q
 uerying the resulting metadata.  Leveraging the high degree of replication
  present in the enterprise\, SCAN-lite uses a two-phase scanning policy (e
 .g.\, an initial phase to identify duplicate content and a second phase to
  do more complicated analysis) that considers client priority classes and 
 idle time to minimize the impact on client foreground workloads.  LazyBase
  is a scalable NoSQL database system that provides extremely high ingest r
 ates\, a strong consistency model (as contrasted with eventual consistency
 )\, and an explicit per-query tradeoff between freshness and query speed.\
 n\nBio: Dr. Kimberly Keeton is a Principal Researcher in the Storage and I
 nformation Management Platform group at HP Labs in Palo Alto\, CA\, USA.  
 Her research focuses on simplifying the management of enterprise informati
 on systems\, including system design and implementation\, modeling\, and o
 ptimization techniques to automatically design systems to meet users' goal
 s (e.g.\, dependability or information quality).\n\n
LOCATION:SS03\, Computer Lab\, William Gates Building
END:VEVENT
END:VCALENDAR
