Clinical De-Identification and Semantic Relatedness
- đ¤ Speaker: Mohamed Abdalla, University of Toronto
- đ Date & Time: Thursday 25 November 2021, 15:00 - 16:00
- đ Venue: https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
Abstract
The first part of this talk will discuss the development of novel low-cost approaches to de-identifying clinical notes. The second part of the talk discuss the development of a new dataset of semantic relatedness for sentence pairs.. This dataset, STR -2021, has 5,500 English sentence pairs manually annotated for semantic relatedness using a comparative annotation framework. We show that the resulting scores have high reliability (repeat annotation correlation of 0.84). We use the dataset to explore a number of questions on what makes two sentences more semantically related. We also evaluate a suite of sentence representation methods on their ability to place pairs that are more related closer to each other in vector space.
Series This talk is part of the Language Technology Lab Seminars series.
Included in Lists
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- Guy Emerson's list
- https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
- Interested Talks
- Language Sciences for Graduate Students
- Language Technology Lab Seminars
- ndk22's list
- ob366-ai4er
- rp587
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449
Note: Ex-directory lists are not shown.
![[Talks.cam]](/static/images/talkslogosmall.gif)


Thursday 25 November 2021, 15:00-16:00