Posted on 09/06/08


Tagged

rss perl scraping getyourwaron


Get Your War On Scrape to RSS


Published in: Perl
 

URL: http://www.mnftiu.cc/mnftiu.cc/war.html

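Really old and busted Get Your War On scraper, but it still works, so there. It counts the numbered page links on the index to pick a comic page, then prints a feed with one item per comic image on that page.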

    #!/usr/bin/perl

    use strict;
    use warnings;
    use HTML::Entities;
    use LWP::Simple;

    my $base = "http://www.mnftiu.cc/mnftiu.cc/";

    # print a feed header
    print "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n".
    "<rdf:RDF\n".
    "xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n".
    "xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"\n".
    "xmlns=\"http://my.netscape.com/rdf/simple/0.9/\">\n".
    "<channel>\n".
    " <title>Get Your War On</title>\n".
    " <link>" . $base . "war.html</link>\n".
    " <description>A webcomic about our 9/11 epilogue.</description>\n".
    "</channel>\n\n";

    # fetch the index page; get() returns undef on failure
    my $html_string = get($base . "war.html")
        or die "Could not fetch the index page\n";

    # start at 2 and add one for every numbered page link on the index;
    # the result is the number of the page to scrape
    my $i = 2;

    while ($html_string =~ m/<a href="war\d{1,2}\.html">\d{1,2}<\/a>/g)
    {
        $i++;
    }

    my $url = $base . "war" . $i . ".html";

    $html_string = get($url)
        or die "Could not fetch $url\n";

    # print one item per comic image found on that page
    while ($html_string =~ m/<img src="images\/gywo\.(.*?)\.gif" border=0>/g)
    {
        # escape the captured name so it is safe inside the XML output
        my $name = encode_entities($1);

        print "<item>\n".
        "<title>" . $name . "</title>\n".
        "<link>" . $url . "</link>\n".
        "<description>&lt;img src=\"" . $base . "images/gywo." . $name . ".gif\"&gt;</description>\n".
        "</item>\n\n";
    }

    print "</rdf:RDF>\n";
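
The two pattern matches can be checked without hitting the site by pulling them into subroutines that take an HTML string. This is only a rough sketch: `latest_page_number`, `comic_names`, and the sample markup are made-up names and data for illustration, not part of the snippet, and it assumes the real pages still look like what the patterns expect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: start at 2 and add one per numbered page link,
# mirroring the counter in the script.
sub latest_page_number {
    my ($html) = @_;
    my $n = 2;
    $n++ while $html =~ m/<a href="war\d{1,2}\.html">\d{1,2}<\/a>/g;
    return $n;
}

# Hypothetical helper: return the <name> part of every
# images/gywo.<name>.gif image tag in the page.
sub comic_names {
    my ($html) = @_;
    my @names = $html =~ m/<img src="images\/gywo\.(.*?)\.gif" border=0>/g;
    return @names;
}

# Made-up sample markup, shaped like what the patterns expect.
my $index = '<a href="war2.html">2</a> <a href="war3.html">3</a>';
my $page  = '<img src="images/gywo.a.gif" border=0>'
          . '<img src="images/gywo.b.gif" border=0>';

print latest_page_number($index), "\n";
print join(",", comic_names($page)), "\n";
```

If either helper returns something unexpected for a saved copy of the real page, the site markup has probably changed and the patterns need updating.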
