remove diacritics


Published in: Python

Useful when creating canonical forms of strings for indexing.


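import re
import unicodedata

# Combining-mark blocks: Combining Diacritical Marks, its Supplement,
# Combining Marks for Symbols, and Combining Half Marks.
reCombining = re.compile(u'[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]', re.U)

def remove_diacritics(s):
    "Decomposes string, then removes combining characters"
    # Python 3: str(s). On Python 2, use unicode(s) instead.
    return reCombining.sub('', unicodedata.normalize('NFD', str(s)))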
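
A short usage sketch, assuming Python 3. It repeats the function so it runs on its own. The `index_key` helper is hypothetical and not part of the original snippet; it shows one way the result might feed a canonical form for indexing. Letters with no canonical decomposition, such as "ø", pass through unchanged, because NFD leaves them with no combining mark to strip.

```python
import re
import unicodedata

# Same character class as the snippet above.
reCombining = re.compile('[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]')

def remove_diacritics(s):
    "Decomposes string, then removes combining characters"
    return reCombining.sub('', unicodedata.normalize('NFD', str(s)))

def index_key(s):
    "Hypothetical helper: case-insensitive, diacritic-free key for indexing"
    return remove_diacritics(s).casefold()

print(remove_diacritics('caf\u00e9'))           # cafe
print(remove_diacritics('\u00c5ngstr\u00f6m'))  # Angstrom
print(index_key('Cr\u00e8me Br\u00fbl\u00e9e')) # creme brulee
```

Two strings that differ only in accents or case then map to the same key, which is usually what a lookup index wants.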
