Posted by scarfboy on 07/17/08

Tagged: python strings accents diacritics


remove diacritics


Published in: Python 


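Useful when creating canonical forms of strings for indexing: decomposing accented characters and dropping the combining marks lets "café" and "cafe" map to the same key.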
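Part of the reason a canonical form is needed is that Unicode allows the same accented text to be spelled as different code-point sequences. The following Python 3 illustration (the variable names are made up for this example) shows a precomposed and a decomposed "café" that look identical but compare unequal until both are normalized:

```python
import unicodedata

# Two spellings of "café": precomposed U+00E9 vs. "e" followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# They render the same but are different code-point sequences
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 4 5

# Normalizing both to the same form (NFD here) makes them compare equal
print(unicodedata.normalize("NFD", precomposed) == unicodedata.normalize("NFD", decomposed))  # True
```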

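```python
import re
import unicodedata

# Combining Diacritical Marks (+ Supplement), Combining Marks for Symbols, Combining Half Marks
reCombining = re.compile(u'[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]', re.U)

def remove_diacritics(s):
    """Decompose the string (NFD), then remove combining characters."""
    return reCombining.sub('', unicodedata.normalize('NFD', str(s)))
```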
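The regex above only removes marks in four hard-coded blocks. As an alternative technique (not the author's), here is a sketch that instead drops every character whose canonical combining class is nonzero, as reported by `unicodedata.combining()`. This should also catch combining marks outside those blocks, although the two approaches do not strip exactly the same set of characters. The function names `remove_diacritics_by_class` and `index_key` are hypothetical. Neither this sketch nor the regex version handles letters with no canonical decomposition, such as "ø" or "ł"; those pass through unchanged.

```python
import unicodedata

def remove_diacritics_by_class(s: str) -> str:
    """Hypothetical variant: NFD-decompose, then drop characters with a nonzero combining class."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def index_key(s: str) -> str:
    """Hypothetical index key: strip diacritics, then fold case for case-insensitive lookup."""
    return remove_diacritics_by_class(s).casefold()

print(remove_diacritics_by_class("Crème brûlée"))  # Creme brulee
print(remove_diacritics_by_class("Søren"))         # Søren (ø has no decomposition)
print(index_key("Crème Brûlée"))                   # creme brulee
```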
