Posted By

mrAlexGray on 04/10/11


regex FAQ list

Published in: Perl

#regex FAQ list

This is the FAQ list for #regex.

How do I parse HTML/XML?
Read "How to parse HTML..." and "Bring Me Your Regexs! I Will Create HTML To Break Them!".
Here is a short list of HTML parsers for popular languages. XML parsers seem to be easy enough to find.
El-Kabong HTML (C)
HTML Tidy (C)
HTML::Parser (Perl)
HTMLParser (Python)
GNU Wget wget/src/html-parse.c (C)
libxml2 (C)
Libwww - the W3C Protocol Library (C)
PHP: DOM Functions
PHP: Tidy Functions
PHP: SimpleXML
For UNIX command line tools (awk, sed, etc.), consider converting HTML to XHTML using Tidy and then to PYX using XMLStarlet.
## extract all hyperlinks
<bookmark.htm tidy -asxhtml 2>/dev/null | xmlstarlet pyx | sed '/^(a/,/^)a/!d;/^Ahref /!d;s///'
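
The same link extraction can be sketched with a real parser instead of a pipeline. This is a minimal illustration using the html.parser module from Python 3's standard library; the names LinkCollector and extract_links are made up for this example, and the sample markup is an assumption, not taken from any real page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names for us;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all hyperlink targets found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p><a href="http://example.com/">one</a> <A HREF=two.html>two</A></p>'
    print(extract_links(sample))
```

Mixed-case tags and unquoted attribute values, which tend to trip up hand-written patterns, are handled by the parser here.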

How do I match a URL?


How do I match text which doesn't match a pattern?
How do I negate a match?
Ideally, you'll want to use the features of your language or application software to do this. Here are some examples:
Perl:
$str !~ m/foo/

PHP:
if (!preg_match("/foo/", $string))

sed:
/foo/d

vi:
:v/foo/p

mod_rewrite (patterns take no delimiters; a leading ! negates the pattern):
!foo

grep:
grep -v foo

If you cannot use such a technique because your application (e.g. a text editor) does not allow that level of programmability, you may be able to get by with an expression such as:
/^(?!.*foo)/s
Note, however, that this may be much slower than negating a plain match for foo in code.
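
For comparison, here is a rough Python sketch of both approaches: negating an ordinary match in code, and the lookahead form for when only a single pattern is accepted. The helper name lacks and the sample strings are assumptions for illustration; re.DOTALL plays the role of the /s flag.

```python
import re


def lacks(pattern, text):
    """True when pattern does not occur anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is None


# Lookahead form: matches (with an empty match at the start) only when
# "foo" appears nowhere in the text. DOTALL lets .* cross newlines.
NO_FOO = re.compile(r"^(?!.*foo)", re.DOTALL)

if __name__ == "__main__":
    for text in ["foo bar", "baz", "line one\nfood"]:
        in_code = lacks(r"foo", text)
        by_lookahead = NO_FOO.search(text) is not None
        print(repr(text), in_code, by_lookahead)
```

Both tests should agree on every input; the first is usually the easier one to read and maintain.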

How do I match text which contains words in any order?
How do I match text which matches more than one pattern?
This is another of those situations where a single regular expression is not the best tool. The best way is to match the line against multiple patterns:
Perl:
if ($str =~ m/foo/ && $str =~ m/bar/)

PHP:
if (preg_match("/foo/", $string) && preg_match("/bar/", $string))

sed:
/foo/!d;/bar/!d

grep:
grep foo | grep bar

Again, if you cannot use such a technique, try:
/^(?=.*foo)(?=.*bar)/s
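
A Python sketch of the same idea, under the assumption that the patterns are simple and independent: test each pattern separately and combine the results, or fall back to stacked lookaheads when only one expression is allowed. The name matches_all is hypothetical.

```python
import re


def matches_all(patterns, text):
    """True when every pattern occurs somewhere in text, in any order."""
    return all(re.search(p, text) is not None for p in patterns)


# Single-expression fallback: each lookahead scans from the start of the
# text, so the order in which "foo" and "bar" appear does not matter.
BOTH = re.compile(r"^(?=.*foo)(?=.*bar)", re.DOTALL)

if __name__ == "__main__":
    for text in ["foo then bar", "bar then foo", "foo only"]:
        print(repr(text), matches_all(["foo", "bar"], text), BOTH.search(text) is not None)
```

The list-based version also extends to three or more patterns without the expression growing harder to read.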
