Posted By

manatlan on 09/10/09


Tagged

html python text



decode html entities


Published in: Python
 

Example: decode_htmlentities("l&#39;eau") returns "l'eau"

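# Python 2 only: htmlentitydefs and unichr do not exist in Python 3.
from htmlentitydefs import name2codepoint as n2cp
import re

# Matches "&name;", "&#123;" and "&#x1F;".
entity_re = re.compile(r'&(#?)(x?)(\w+);')

def substitute_entity(match):
    ent = match.group(3)
    if match.group(1) == "#":
        # Numeric character reference: hexadecimal after "x", else decimal.
        base = 16 if match.group(2) == 'x' else 10
        try:
            return unichr(int(ent, base))
        except ValueError:
            # Not a valid number or code point: leave the text untouched.
            return match.group()
    # Named entity. The regex may have captured a leading "x" separately
    # (as in "&xi;"), so put it back before looking the name up.
    cp = n2cp.get(match.group(2) + ent)
    if cp:
        return unichr(cp)
    # Unknown name: leave the text untouched.
    return match.group()

def decode_htmlentities(string):
    return entity_re.sub(substitute_entity, string)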
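The snippet targets Python 2. On Python 3, htmlentitydefs became html.entities and unichr became chr, so the same regex-and-callback approach might look like the sketch below. The names decode_entities_py3, ENTITY_RE and _substitute are hypothetical, and the regex is rewritten slightly so that numeric and named entities are captured separately. It is not the author's code.

```python
import html
import re
from html.entities import name2codepoint

# Either a numeric reference ("&#39;", "&#x27;") or a named one ("&amp;").
ENTITY_RE = re.compile(r'&(?:#([xX]?)([0-9a-fA-F]+)|(\w+));')

def _substitute(match):
    hex_flag, digits, name = match.groups()
    if name is not None:
        # Named entity: look it up, keep the original text if unknown.
        cp = name2codepoint.get(name)
        return chr(cp) if cp is not None else match.group()
    try:
        # Numeric reference: base 16 when an "x" prefix was present.
        return chr(int(digits, 16 if hex_flag else 10))
    except (ValueError, OverflowError):
        # Bad digits for the base, or a code point out of range.
        return match.group()

def decode_entities_py3(text):
    """Replace HTML entities in text with the characters they stand for."""
    return ENTITY_RE.sub(_substitute, text)

if __name__ == "__main__":
    sample = "l&#39;eau &amp; caf&eacute;"
    print(decode_entities_py3(sample))
    # The standard library offers the same service directly:
    print(html.unescape(sample))
```

Both calls print l'eau & café for this sample. In new code, html.unescape (available since Python 3.4) is probably the better choice, since it also handles the HTML5 entity names and references missing the trailing semicolon.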
