BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Statistical Significance Analysis of Motif Discovery - Patrick Ng
DTSTART:20101105T100000Z
DTEND:20101105T110000Z
UID:TALK27746@talks.cam.ac.uk
CONTACT:Microsoft Research Cambridge Talks Admins
DESCRIPTION:The identification of transcription factor binding sites\, and
  of cis-regulatory elements in general\, is an important step in understan
 ding the regulation of gene expression. To address this need\, many motif-
 finding tools have been described that can find short sequence motifs give
 n only an input set of sequences. In the first part of the talk\, I will d
 iscuss why a reliable significance evaluation should be considered an esse
 ntial component of any motif finder\, and then I will introduce a novel bi
 ologically realistic method to estimate the reported motif's statistical s
 ignificance based on a novel 3-Gamma approximation scheme. Furthermore\, I
  will show how its reliability can be further improved by incorporating lo
 cal base composition information. Finally\, I will present GIMSAN: a tool 
 for de novo motif finding that incorporates this novel significance evalua
 tion technique. \n\nIn the second part of my talk\, I will present ALICO (
 Alignment Constrained) null set generator: a framework to generate randomi
 zed versions of an input multiple sequence alignment that preserve some of
  its crucial features including its dependence structure. In particular\, 
 I will show that\, on average\, ALICO samples approximately preserve the P
 IDs (percent identities) between every pair of input sequences as well as 
 the average Markov model composition. I will demonstrate its utility in ph
 ylogenetic motif finders\, which leverage conservation information.\n
LOCATION:Small public lecture room\, Microsoft Research Ltd\, 7 J J Thomso
 n Avenue (Off Madingley Road)\, Cambridge
END:VEVENT
END:VCALENDAR
