Can we automatically anonymize text documents?
- 👤 Speaker: Pierre Lison, Norwegian Computing Center
- 📅 Date & Time: Thursday 19 May 2022, 11:00 - 12:00
- 📍 Venue: https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
Abstract
Text documents often contain personal data in some form. To protect the privacy of the individuals referred to in those documents, it is often desirable (and, in many cases, mandatory) to edit those documents so as to conceal the identity of those individuals. This anonymization process remains a difficult task, at the intersection of NLP, law and data privacy. In this talk, I’ll give an overview of current approaches and outline a number of unsolved problems. Furthermore, I’ll present the Text Anonymization Benchmark (TAB), a new corpus and evaluation framework dedicated to this task. TAB contains 1268 court cases from the European Court of Human Rights, manually enriched with detailed annotations regarding the personal data expressed in each document. We hope this new benchmark will inspire NLP researchers to work on this challenging but important problem.
Series
This talk is part of the Language Technology Lab Seminars series.
Included in Lists
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- Guy Emerson's list
- https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
- Interested Talks
- Language Sciences for Graduate Students
- Language Technology Lab Seminars
- ndk22's list
- ob366-ai4er
- rp587
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449
Note: Ex-directory lists are not shown.