BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Learning based on Data Compression - Steven de Rooij\, University 
 of Cambridge
DTSTART:20090211T163000Z
DTEND:20090211T173000Z
UID:TALK16774@talks.cam.ac.uk
CONTACT:Richard Samworth
DESCRIPTION:Bayesian statistics are used all over the place\, but in the s
 hadows lurks\nanother inference method called the Minimum Description Leng
 th principle\n(MDL). Although MDL and Bayesian inference turn out to be qu
 ite closely\nrelated technically\, they are the product of very different 
 sequences of\nideas. In this presentation I will describe the background o
 f MDL and how it\nturns out to be related to Bayesian statistics. To this 
 end I will introduce\nKolmogorov complexity\, which is a measure of the am
 ount of information in\nobserved data that does not require probabilistic 
 assumptions. Although\nKolmogorov complexity\, which is itself based on Tu
 ring's notion of effective\ncomputation\, can theoretically be used to lea
 rn just about anything that you\nmight want to learn\, unfortunately it is
  itself uncomputable. MDL was\ndeveloped as a computable version of Kolmog
 orov complexity\, but later\nborrowed heavily from information theory and 
 statistics as well. The Kraft\ninequality provides the crucial connection 
 between methods based on data\ncompression and methods based on probabilit
 y theory.\n\nReading:\n\nThis presentation focuses on no single paper\
 , but below I've listed the\nrelevant references\, many of which are class
 ics. However\, for those\ninterested in compression-based learning I'd rec
 ommend looking at the\nfollowing two textbooks instead:\n\nM. Li and P. Vi
 tanyi 2008. "An Introduction to Kolmogorov Complexity and Its\nApplications
 ". Third edition\, Springer-Verlag. See\nhttp://homepages.cwi.nl/~paulv/ko
 lmogorov.html.\nP. Grunwald 2007. "The Minimum Description Length principl
 e". MIT Press. See\nhttp://mitpress.mit.edu/catalog/item/default.asp?ttype
 =2&tid=11155\n\nReferences:\n\nA.M. Turing 1936. "On Computable Numbers\, 
 with an Application to the\nEntscheidungsproblem". Proceedings of the Lond
 on Mathematical Society series\n2\, 42:230-265. Many versions appear onlin
 e\, for example\nhttp://www.thocp.net/biographies/papers/turing_oncomputab
 lenumbers_1936.pdf\n\nC.E. Shannon 1948. "A Mathematical Theory of Communi
 cation"\, Bell System\nTechnical Journal 27:379-423\,623-656. Available a
 t\nhttp://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf\n\nL.G. K
 raft 1949. "A Device for Quantizing\, Grouping and Coding\nAmplitude-Modul
 ated Pulses". Master's thesis\, MIT. Not available online.\n\nB. McMillan 
 1956. "Two Inequalities implied by Unique Decipherability". IRE\nTransact
 ions on Information Theory\, 2(4):115-116.\nAvailable at http://ieeexplore
 .ieee.org/stamp/stamp.jsp?arnumber=01056818\n\nR. Solomonoff 1964. "A Form
 al Theory of Inductive Inference" (Parts I\,II).\nInformation and Control\
 , 7(1):1-22 (Part I) and 7(2):224-254 (Part II).\nAvailable at\nhttp://wor
 ld.std.com/~rjs/1964pt1.pdf (Part I)\nhttp://world.std.com/~rjs/1964pt2.pd
 f (Part II)\n\nA.N. Kolmogorov 1965. "Three Approaches to the Quantitative
  Definition of\nInformation". Problems of Information Transmission 1(1):1-
 7.\nNot available online.\n\nJ. Rissanen 1978. "Modeling by the Shortest Da
 ta Description". Automatica\n14:465-471.\n\n
LOCATION:MR5\, CMS
END:VEVENT
END:VCALENDAR
