python strings accents diacritics

remove diacritics

Published in: Python

Strips accents and other diacritics from a string. Useful when creating canonical forms of strings for indexing.

    import re
    import unicodedata

    # Combining-mark blocks: U+0300-036F, U+1DC0-1DFF, U+20D0-20FF, U+FE20-FE2F
    reCombining = re.compile(u'[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]', re.U)

    def remove_diacritics(s):
        "Decomposes string, then removes combining characters"
        return reCombining.sub('', unicodedata.normalize('NFD', unicode(s)))
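
The snippet relies on the Python 2 `unicode` builtin, so it will not run unchanged on Python 3. Below is a minimal sketch of the same decompose-then-strip approach for Python 3, assuming `str` takes the place of `unicode`; the name `_COMBINING` is only illustrative and is not part of the original snippet.

```python
import re
import unicodedata

# Same combining-mark ranges as the original snippet.
# (_COMBINING is an illustrative name, not from the original.)
_COMBINING = re.compile('[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]')

def remove_diacritics(s):
    """Decompose to NFD, then strip combining characters."""
    # Assumption: str() replaces the Python 2 unicode() call.
    return _COMBINING.sub('', unicodedata.normalize('NFD', str(s)))

print(remove_diacritics('caf\u00e9'))          # cafe
print(remove_diacritics('\u00c5ngstr\u00f6m')) # Angstrom
```

Letters that have no canonical decomposition, such as `ø` or `ł`, pass through unchanged, because NFD never splits them into a base letter plus a combining mark.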

Posted By: juj on June 22, 2010

just brilliant!

