Posted By

scarfboy on 07/17/08


Tagged

python strings accents diacritics



remove diacritics


Published in: Python
 

Useful when creating canonical forms of strings for indexing.

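    import re
    import unicodedata

    # Combining-mark blocks: U+0300-036F, U+1DC0-1DFF, U+20D0-20FF, U+FE20-FE2F
    reCombining = re.compile(u'[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]', re.U)

    def remove_diacritics(s):
        """Decompose the string (NFD), then remove combining characters."""
        # Python 2 code; on Python 3, use str(s) instead of unicode(s).
        return reCombining.sub(u'', unicodedata.normalize('NFD', unicode(s)))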
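
For Python 3, the same decompose-then-strip idea might look like the sketch below. It is not the original author's code: it swaps the hand-written regex ranges for unicodedata.combining(), which reports a nonzero class for combining marks, so the two versions can disagree on a few characters. The index_key helper and the sample names are hypothetical, added only to illustrate the indexing use case.

```python
import unicodedata


def remove_diacritics(s):
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", str(s))
    # unicodedata.combining() returns a nonzero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_key(s):
    """Hypothetical helper: canonical, case-insensitive key for indexing."""
    return remove_diacritics(s).casefold()


# Build a small lookup table keyed by canonical form (sample data is made up).
names = ["Café", "Ångström", "naïve"]
index = {index_key(name): name for name in names}
print(index["cafe"])      # Café
print(index["angstrom"])  # Ångström
```

Keying the table by the stripped form lets an unaccented query such as "cafe" find the accented original, while the stored value keeps its diacritics for display.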


Comments

Posted By: juj on June 22, 2010

just brilliant!
