BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Information theoretic model selection in clustering - Joachim M Bu
 hmann\, Department of Computer Science\, ETH Zurich
DTSTART:20091028T110000Z
DTEND:20091028T120000Z
UID:TALK21146@talks.cam.ac.uk
CONTACT:Peter Orbanz
DESCRIPTION:Partitioning of data sets into groups defines an important\npr
 eprocessing step for compression\, prototype extraction or outlier\nremova
 l. Various criteria of connectedness or proximity have been\nproposed to g
 roup data according to structural similarity but in\ngeneral it is unclear
  which method or model to use. In the spirit of\ninformation theory we pro
 pose a decision process to determine the\namount of extractable informatio
 n from data conditioned on a\nhypothesis class of partitions. A sender-rec
 eiver-scenario defines an\napproximation capacity for a clustering problem
  which quantizes the\nhypothesis class and\, thereby\, introduces sets of 
 statistically\nindistinguishable partitionings. The quality of a clusterin
 g model is\ndetermined by its ability to extract more "signal" bits from a
  data\nsource than a competing data interpretation. \n\nEmpirical evidence
  for this model selection concept is provided by\ncluster validation in co
 mputer security\, i.e.\, multilabel clustering\nof Boolean data for role-b
 ased access control\, but also in analysis of\nmicroarray data.\n
LOCATION:Engineering Department\, CBL Room 438
END:VEVENT
END:VCALENDAR