Posted by iblis on 01/27/09


Tagged: html, download, web




Batch download code between <pre> tags from remote HTML pages


Published in: Perl

This won't work with Google Code pages: their content is generated by JavaScript, which WWW::Mechanize does not execute.

    #!/usr/bin/env perl
    #
    # grabcode.pl
    # Download the code between <pre> tags from remote HTML pages.
    # Usage: grabcode.pl URL [URL ...]

    use strict;
    use warnings;

    use WWW::Mechanize;
    use HTML::TreeBuilder::XPath;

    die "Usage: $0 URL [URL ...]\n" unless @ARGV;

    # content() already returns decoded text, so encode once on output.
    binmode STDOUT, ':encoding(UTF-8)';

    # autocheck => 0: report failed requests ourselves instead of dying,
    # so one bad URL does not abort the whole batch.
    my $browser = WWW::Mechanize->new( autocheck => 0 );
    $browser->agent_alias('Linux Mozilla');
    # Uncomment for pages behind HTTP authentication:
    #$browser->credentials('uname', 'passwd');

    foreach my $url (@ARGV) {
        my $response = $browser->get($url);
        unless ( $response->is_success ) {
            # Method calls are not interpolated inside strings, so concatenate.
            warn "Skipping $url: " . $response->status_line . "\n";
            next;
        }

        my $tree = HTML::TreeBuilder::XPath->new;
        $tree->parse( $browser->content );
        $tree->eof;    # signal end of input so the tree is complete

        # Print the text of every <pre> element, one block per element.
        for my $node ( $tree->findnodes('//pre') ) {
            print $node->as_text, "\n";
        }

        $tree->delete;    # free the tree before parsing the next page
    }
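To see what the script actually prints without fetching anything, the extraction step can be tried on an inline HTML string. This is a minimal sketch, assuming HTML::TreeBuilder::XPath is installed; the helper name `pre_blocks` and the sample markup are hypothetical and only mirror the parse, findnodes and as_text steps of grabcode.pl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use HTML::TreeBuilder::XPath;

# Hypothetical helper: return the text of every <pre> element in $html,
# using the same parse/eof/findnodes/as_text calls as grabcode.pl.
sub pre_blocks {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);
    $tree->eof;
    # findnodes in list context returns the matching element nodes.
    my @blocks = map { $_->as_text } $tree->findnodes('//pre');
    $tree->delete;
    return @blocks;
}

# Hypothetical sample page: the <p> text is ignored, only <pre> contents are kept.
my $html = '<html><body><p>Intro text</p>'
         . '<pre>print "hello";</pre>'
         . '<pre>my $x = 1;</pre></body></html>';

print "$_\n" for pre_blocks($html);
```

Against real pages, grabcode.pl applies the same steps to each URL given on the command line, so its output can simply be redirected to a file, e.g. `perl grabcode.pl URL1 URL2 > code.txt`.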


Comments

Posted By: Vordreller on March 18, 2009

I really have no idea what this does. I don't find the title to be very clear and the code is kinda confusing.
