A modular architecture for Unicode text compression
- Speaker: Adam Gleave (University of Cambridge)
- Date & Time: Tuesday 14 June 2016, 15:00 - 15:15
- Venue: Cambridge University Engineering Department, CBL Seminar room BE4-38
Abstract
Unicode is now ubiquitous, with 87% of online content in the UTF-8 character encoding. Conventional compression techniques operate on individual bytes: this works well for ASCII, but poorly for UTF-8, where a character can span multiple bytes. Previous attempts at Unicode compression have invented new algorithms from scratch, with generally poor results. My approach is to extend existing data compression algorithms to operate over Unicode characters. I find this substantially improves compression effectiveness for Unicode text, with only a small overhead for ASCII and binary files.
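To illustrate the byte-versus-character distinction in the abstract, here is a minimal Python sketch. It is not the speaker's method: the sample string, the helper `order0_bits`, and the use of an idealised order-0 (frequency-only) cost are all assumptions chosen for illustration. It shows only that the same text yields a different symbol stream when read as UTF-8 bytes than when read as Unicode code points.

```python
from collections import Counter
from math import log2


def order0_bits(symbols):
    """Ideal coding cost in bits under an order-0 (frequency-only) model.

    Hypothetical helper for illustration; it ignores the cost of
    transmitting the model itself, which matters for short inputs.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(n * log2(n / total) for n in counts.values())


# Assumed sample: three Greek letters (2 bytes each in UTF-8),
# a space, and three ASCII letters (1 byte each).
text = "αβγ abc"
raw = text.encode("utf-8")

byte_symbols = list(raw)    # what a byte-oriented compressor sees
char_symbols = list(text)   # one symbol per Unicode code point

print(len(char_symbols), "code points vs", len(byte_symbols), "bytes")
print("order-0 cost over bytes:      ", round(order0_bits(byte_symbols), 2), "bits")
print("order-0 cost over code points:", round(order0_bits(char_symbols), 2), "bits")
```

On this toy input the byte view splits each Greek letter into a shared lead byte and a distinct trailing byte, so a byte-level model has to learn structure that a character-level model gets for free. Real compressors use far richer context models, so this comparison is indicative only.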
Please note the talk will last for 15 minutes, although I will be available afterwards for any further questions.
Series
This talk is part of the arg58's list series.
Included in Lists
- All Cavendish Laboratory Seminars
- All Talks (aka the CURE list)
- Biology
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge Neuroscience Seminars
- Cambridge talks
- Cambridge University Engineering Department, CBL Seminar room BE4-38
- CBL important
- Centre for Health Leadership and Enterprise
- Chris Davis' list
- Creating transparent intact animal organs for high-resolution 3D deep-tissue imaging
- custom
- dh539
- Featured lists
- Guy Emerson's list
- Hanchen DaDaDash
- Inference Group
- Inference Group Summary
- Information Engineering Division seminar list
- Interested Talks
- Joint Machine Learning Seminars
- Life Science
- Life Sciences
- Machine Learning
- Machine Learning @ CUED
- Machine Learning Summary
- ME Seminar
- ML
- ndk22's list
- Neurons, Fake News, DNA and your iPhone: The Mathematics of Information
- Neuroscience
- Neuroscience Seminars
- ob366-ai4er
- Required lists for MLG
- rp587
- School of Physical Sciences
- Seminar
- Simon Baker's List
- Stem Cells & Regenerative Medicine
- Thin Film Magnetic Talks
- Trust & Technology Initiative - interesting events
- yk373's list
- yk449
Note: Ex-directory lists are not shown.