Posted by scarfboy on 07/17/08


python strings accents diacritics



remove diacritics

Published in: Python

Useful when creating canonical, accent-insensitive forms of strings for indexing.
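A minimal sketch of that use, assuming Python 3. `canonical_key` and the sample words are hypothetical. The helper folds case and strips accents so that variant spellings land on the same index entry. It drops combining marks with `unicodedata.combining()` rather than with a fixed regex range. That is a swapped-in technique, not the snippet's own.

```python
import unicodedata
from collections import defaultdict

def canonical_key(s):
    """Hypothetical helper: accent- and case-insensitive key for indexing."""
    # NFD splits precomposed letters into base letter + combining marks
    decomposed = unicodedata.normalize('NFD', s)
    # Keep only characters whose canonical combining class is 0
    stripped = ''.join(c for c in decomposed if not unicodedata.combining(c))
    # casefold() is a stronger lower() intended for caseless matching
    return stripped.casefold()

# Hypothetical sample data: group words under their canonical key
index = defaultdict(list)
for word in ['Café', 'cafe', 'CAFÉ', 'naïve', 'naive']:
    index[canonical_key(word)].append(word)

print(dict(index))
# {'cafe': ['Café', 'cafe', 'CAFÉ'], 'naive': ['naïve', 'naive']}
```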

    import re
    import unicodedata
    # Combining Diacritical Marks (+ Supplement), Combining Marks for Symbols, Combining Half Marks
    reCombining = re.compile('[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]')

    def remove_diacritics(s):
        """Decompose the string (NFD), then remove combining characters."""
        return reCombining.sub('', unicodedata.normalize('NFD', str(s)))
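The following is a quick check of how the function behaves, assuming Python 3. The block repeats the function so that it runs on its own, and the sample strings are arbitrary. It shows what NFD does to a single accented letter. It also shows a limitation of the approach: NFD only splits off marks that Unicode defines as canonical decompositions. Letters such as `Ø` and `Ł` have no such decomposition, so they pass through unchanged.

```python
import re
import unicodedata

# Same character ranges as the snippet's reCombining
reCombining = re.compile('[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]')

def remove_diacritics(s):
    return reCombining.sub('', unicodedata.normalize('NFD', str(s)))

# NFD turns precomposed 'é' into 'e' followed by COMBINING ACUTE ACCENT
print([f'U+{ord(c):04X}' for c in unicodedata.normalize('NFD', 'é')])
# ['U+0065', 'U+0301']

print(remove_diacritics('Crème brûlée à la façon Ångström'))
# Creme brulee a la facon Angstrom

# Letters without a canonical decomposition are left alone
print(remove_diacritics('Øresund Łódź'))
# Øresund Łodz
```

If such letters matter for your index, they need an explicit mapping table. Using NFKD instead of NFD additionally splits compatibility characters such as the `ﬁ` ligature. However, it still leaves `Ø` and `Ł` untouched.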



Comments
Posted By: juj on June 22, 2010

just brilliant!
