BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Data Mining and Information Extraction for CiteSeerX and Friends -
  Dr. C. Lee Giles\, Pennsylvania State University
DTSTART:20120629T110000Z
DTEND:20120629T120000Z
UID:TALK38465@talks.cam.ac.uk
CONTACT:Ekaterina Kochmar
DESCRIPTION:Cyberinfrastructure or e-science has become crucial in many ar
 eas of\nscience where data access often defines scientific progress. Open 
 source (OS)\nsystems have greatly facilitated the design and implementat
 ion of supporting\ncyberinfrastructure\, permitting the design of special
 ized int
 egrated search\nengines and digital libraries which offer many opportuniti
 es for domain\nrelevant information and knowledge extraction\, such as cit
 ation extraction\,\nautomated indexing and ranking\, chemical formulae sea
 rch\, table indexing\, etc.\nWe describe the open source SeerSuite archite
 cture which is a modular\,\nextensible system built on successful OS proje
 cts such as Lucene/Solr and\ndiscuss issues in building domain specific en
 terprise search and\ncyberinfrastructure for the sciences and academia. Be
 cause of the large amount\nof information crawled and/or searched there
  are many scale problems in\ninformation extraction and data mining such
  as aut
 hor and entity\ndisambiguation\, data extraction and ranking\, etc. We hig
 hlight application\ndomains with examples from computer science (CiteSee
 rX) and chemistry\n(ChemXSeer)\, and related problem areas.\nBecause such
  en
 terprise systems require unique information extraction\napproaches\, sever
 al different machine learning methods\, such as conditional\nrandom fields
 \, support vector machines\, mutual information based feature\nselection\,
  sequence mining\, etc. are critical for performance. We draw lessons\nfor
  other e-science and cyberinfrastructure systems in terms of design\,\nimp
 lementation and research and discuss future directions\, systems and\nrese
 arch.
LOCATION:SW01\, Computer Laboratory
END:VEVENT
END:VCALENDAR
