Conservation Evidence
- 👤 Speaker: Shrey Biswas / Radhika Iyer / Kacper Michalik, University of Cambridge
- 📅 Date & Time: Friday 29 November 2024, 13:00 - 13:55
- 📍 Venue: FW11, William Gates Building. Zoom link: https://cl-cam-ac-uk.zoom.us/j/4361570789?pwd=Nkl2T3ZLaTZwRm05bzRTOUUxY3Q4QT09&from=addon
Abstract:
Grey literature is inherently difficult to work with. It is hard to discover, since it is typically buried deep within websites; hard to analyse, since it follows no standard file formats or structures; and hard to process, due to the sheer volume of existing and newly produced material. Together, these create a substantial cost and time burden for organisations whose work depends on such literature.
We design and implement a pipeline that uses Common Crawl internet archives to locate and scrape potential grey literature, then processes it for use in a multistage machine learning pipeline that classifies the material and outputs the relevant items.
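As a rough illustration of the first stage (locating candidate documents in the archive), here is a minimal sketch, not the speakers' implementation. It assumes the newline-delimited JSON returned by a Common Crawl URL index query (fields such as `url`, `mime`, `status`, `filename`, `offset`, `length`) and that individual records are fetched from `https://data.commoncrawl.org/` with an HTTP Range request. The MIME allow-list, the `Candidate` type and all helper names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator

# Assumed public host for Common Crawl WARC files; a single record is
# retrieved by requesting its byte range within the WARC file.
WARC_BASE = "https://data.commoncrawl.org/"

# Hypothetical allow-list of MIME types treated as likely grey-literature documents.
CANDIDATE_MIMES = {
    "application/pdf",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


@dataclass
class Candidate:
    """Hypothetical pointer to one archived capture inside a WARC file."""
    url: str
    warc_file: str
    offset: int
    length: int

    def warc_url(self) -> str:
        return WARC_BASE + self.warc_file

    def range_header(self) -> dict:
        # HTTP byte ranges are inclusive at both ends, hence the -1.
        return {"Range": f"bytes={self.offset}-{self.offset + self.length - 1}"}


def parse_index_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Parse newline-delimited JSON (one capture per line), skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def select_candidates(records: Iterable[dict]) -> list[Candidate]:
    """Keep successful (HTTP 200) captures whose reported MIME type is allow-listed."""
    return [
        Candidate(r["url"], r["filename"], int(r["offset"]), int(r["length"]))
        for r in records
        if str(r.get("status")) == "200" and r.get("mime") in CANDIDATE_MIMES
    ]


if __name__ == "__main__":
    # Made-up index records for illustration only.
    sample = [
        '{"url": "https://example.org/report.pdf", "mime": "application/pdf", "status": "200", '
        '"filename": "crawl-data/example.warc.gz", "offset": "100", "length": "50"}',
        '{"url": "https://example.org/index.html", "mime": "text/html", "status": "200", '
        '"filename": "crawl-data/example.warc.gz", "offset": "0", "length": "90"}',
    ]
    for c in select_candidates(parse_index_lines(sample)):
        print(c.url, c.warc_url(), c.range_header())
```

Only the PDF capture survives the filter. Later stages (fetching and extracting the text, then the machine learning classifiers) would operate on these candidates and are not sketched here.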
Bios:
Shrey Biswas is a second-year Computer Science student at Pembroke College.
Radhika Iyer is a second-year Computer Science student at Murray Edwards College.
Kacper Michalik is a second-year Computer Science student at Pembroke College.
Series: This talk is part of the Energy and Environment Group, Department of CST series.
Included in Lists
- All Talks (aka the CURE list)
- bld31
- Cambridge talks
- Department of Computer Science and Technology talks and seminars
- Energy and Environment Group, Department of CST
- FW11, William Gates Building. Zoom link: https://cl-cam-ac-uk.zoom.us/j/4361570789?pwd=Nkl2T3ZLaTZwRm05bzRTOUUxY3Q4QT09&from=addon
- Interested Talks
- School of Technology
- Trust & Technology Initiative - interesting events
- yk449
Note: Ex-directory lists are not shown.

