<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>Snipplr - noah</title>
<link>http://snipplr.com/users/noah/tags/perl</link>
<description>Recent snippets posted on Snipplr.com</description>
<language>en-us</language>
<pubDate>Thu, 23 May 2013 05:42:42 GMT</pubDate>
<item>
<title>(Perl) Detect PHP files with trailing whitespace, using Perl</title>
<link>http://snipplr.com/view/46218/detect-php-files-with-trailing-whitespace-using-perl/</link>
<description><![CDATA[ <p>The following incantation returns a nonzero exit status when the terminating `?>` of a PHP file is followed by whitespace.</p> ]]></description>
<pubDate>Wed, 29 Dec 2010 08:25:03 GMT</pubDate>
<guid>http://snipplr.com/view/46218/detect-php-files-with-trailing-whitespace-using-perl/</guid>
</item>
<item>
<title>(Bash) kill zombie processes</title>
<link>http://snipplr.com/view/41095/kill-zombie-processes/</link>
<description><![CDATA[ <p>Thanks to [Kastner](http://metaatem.net/) for these snippets.</p> ]]></description>
<pubDate>Tue, 28 Sep 2010 07:14:38 GMT</pubDate>
<guid>http://snipplr.com/view/41095/kill-zombie-processes/</guid>
</item>
<item>
<title>(SVN) Howto list all the file extension types in an SVN log dump</title>
<link>http://snipplr.com/view/28195/howto-list-all-the-file-extension-types-in-an-svn-log-dump/</link>
<description><![CDATA[ <p>Note that on Windows you will want to double-quote the string argument to `perl -ne` rather than single-quote it. Apart from that, this also works on Windows (with Cygwin).</p> ]]></description>
<pubDate>Thu, 11 Feb 2010 14:59:31 GMT</pubDate>
<guid>http://snipplr.com/view/28195/howto-list-all-the-file-extension-types-in-an-svn-log-dump/</guid>
</item>
<item>
<title>(Perl) Search and Replace Across Multiple Files with Perl</title>
<link>http://snipplr.com/view/19732/search-and-replace-across-multiple-files-with-perl/</link>
<description><![CDATA[ <p>This brief script *replaces* the batch search-and-replace tool in your commercial text editor. If batch search-and-replace across files is the only reason you need an IDE, you can adopt this script and go back to using Notepad (or, better yet, `vi`).

Thanks to JPinyan, who taught me the pattern shown below back in 2001 on `beginners.perl.org`.

The regular expression flags used here are explained in excellent detail in the *best practices for regular expressions* chapter of **Perl Best Practices** by Damian Conway.</p> ]]></description>
<pubDate>Tue, 15 Sep 2009 00:18:30 GMT</pubDate>
<guid>http://snipplr.com/view/19732/search-and-replace-across-multiple-files-with-perl/</guid>
</item>
<item>
<title>(Perl) Perl POD embedded documentation example</title>
<link>http://snipplr.com/view/18611/perl-pod-embedded-documentation-example/</link>
<description><![CDATA[ <p>Start all your Perl scripts this way. Then, **instead** of writing documentation, *generate documentation automatically from comments embedded in the code.*

Steve Oualline has [another nice Perl POD example](http://my.safaribooksonline.com/073571228X/ch10lev1sec4).

### How to generate Perl POD

Use `pod2html myFile.pl` to generate HTML documentation (like JavaDoc, RDoc or YUIDoc).  

There's also `pod2latex`, which does just what you would think ;-)  

Or just use `pod2text` to generate plain text documentation.</p> ]]></description>
<pubDate>Thu, 20 Aug 2009 13:15:43 GMT</pubDate>
<guid>http://snipplr.com/view/18611/perl-pod-embedded-documentation-example/</guid>
</item>
<item>
<title>(Perl) check for broken links (shell one-liner)</title>
<link>http://snipplr.com/view/18344/check-for-broken-links-shell-oneliner/</link>
<description><![CDATA[ <p>Retrieves links from a remote HTML page, then checks the response code of each link. Duplicate links are checked only once, and anchors are ignored: `http://foo` and `http://foo#bar` are treated as the same URL, so `http://foo` is checked only once, even if both URLs occur on the page.

If the command produces too much output, you can filter it down to just the *broken* links (if any) by piping the output of the entire one-liner to `grep -v "200 OK"`.

## Dependencies

These must be installed on your system:

0. [Perl](http://perl.org)
0. [lwp-request](http://search.cpan.org/~gaas/libwww-perl-5.831/bin/lwp-request)
0. [sort](http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html 'GNU sort sorts lines in a text file')
0. [uniq](http://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html 'GNU uniq filters duplicate items from a list')

## Troubleshooting

**Check the quotes.** The command below wraps strings in *double* quotes ("), which is appropriate if you are using a *Windows* shell.

**If you are using a Mac or Linux shell**, change the double quotes around the strings in the command below to *single* quotes ('). Then everything should work.

It is always good to keep in mind that the Windows shell wants strings to be double-quoted, while Unix-ish shells want them single-quoted. Inside double quotes, a Unix shell substitutes its own values for `$variables` before Perl ever sees them.</p> ]]></description>
<pubDate>Sat, 15 Aug 2009 08:55:19 GMT</pubDate>
<guid>http://snipplr.com/view/18344/check-for-broken-links-shell-oneliner/</guid>
</item>
<item>
<title>(Perl) comparing the checksums of two files with Perl and cksum</title>
<link>http://snipplr.com/view/16660/comparing-the-checksums-of-two-files-with-perl-and-cksum/</link>
<description><![CDATA[ <p>This one-liner helps to determine if two or more files have the same
checksum.  It works by piping the output from `cksum` to Perl, which
takes note of the first checksum and compares each subsequent file's
checksum to that value.

Assume an example session with three identical files and two that
are different:
      
    >echo bart > bart
    >cp bart bart1
    >cp bart bart2
    >echo milhouse > mvh
    >echo lisa > lisa
      
Two files with the same checksum produce no output:
      
    >cksum bart bart1 | perl -ane '$x ||= $F[0]; warn if $x != $F[0];'
      
If there is a different checksum, Perl warns once per mismatching file. The
input line number in each warning is that file's position in the argument list:
      
    >cksum bart bart1 mvh bart2 lisa | perl -ane '$x ||= $F[0]; warn if $x != $F[0];'
    Warning: something's wrong at -e line 1, <> line 3.
    Warning: something's wrong at -e line 1, <> line 5.</p> ]]></description>
<pubDate>Sun, 05 Jul 2009 21:17:39 GMT</pubDate>
<guid>http://snipplr.com/view/16660/comparing-the-checksums-of-two-files-with-perl-and-cksum/</guid>
</item>
<item>
<title>(Perl) Fuzzy string matching with Perl</title>
<link>http://snipplr.com/view/16365/fuzzy-string-matching-with-perl/</link>
<description><![CDATA[ <p>Fuzzy string matches with Jarkko Hietaniemi's String::Approx module.

**Get approximate matches, close to what you want.** This is useful when filenames might contain misspellings, extra underscores or other typos. It also helps when searching for files in a project that uses several different naming conventions.

Mainly I am concerned with matching strings that have underscores inserted (or deleted) in arbitrary places. But the result I came up with here does a pretty good job of matching when there are all sorts of typos, without picking up *too* many false positives.</p> ]]></description>
<pubDate>Sat, 27 Jun 2009 01:09:08 GMT</pubDate>
<guid>http://snipplr.com/view/16365/fuzzy-string-matching-with-perl/</guid>
</item>
<item>
<title>(Perl) grep with Perl</title>
<link>http://snipplr.com/view/11878/grep-with-perl/</link>
<description><![CDATA[ <p>A combination of the instructions in the book _Minimal Perl_ and [this Perl one-liners page](http://sial.org/howto/perl/one-liner/).

The general form of the one-liner is:

    > perl -wnl -e '/REGEX/ and print $ARGV." $.: $_"; close ARGV if eof' 

The example below shows how to print the hex colors that are defined in a [Sass](http://haml.hamptoncatlin.com/docs/rdoc/classes/Sass.html) source tree.</p> ]]></description>
<pubDate>Thu, 05 Feb 2009 13:29:59 GMT</pubDate>
<guid>http://snipplr.com/view/11878/grep-with-perl/</guid>
</item>
<item>
<title>(SVN) Just print SVN log message bodies</title>
<link>http://snipplr.com/view/10791/just-print-svn-log-message-bodies/</link>
<description><![CDATA[ <p>Gets the SVN log and prints just the log messages, without any metadata. This exposes the narrative aspect of the software workflow and lets the actors recede into the background a bit.</p> ]]></description>
<pubDate>Wed, 31 Dec 2008 13:56:14 GMT</pubDate>
<guid>http://snipplr.com/view/10791/just-print-svn-log-message-bodies/</guid>
</item>
<item>
<title>(Perl) Generate XHTML on the command line with XML::API::XHTML</title>
<link>http://snipplr.com/view/4971/generate-xhtml-on-the-command-line-with-xmlapixhtml/</link>
<description><![CDATA[ <p>Build a simple XHTML document, format it with Tidy and write the result to `temp.html`.

You may first need to run `cpan XML::API::XHTML` and install Tidy using your favorite package manager, or just [download Tidy by itself](http://tidy.sourceforge.net/#binaries).

### If you can't install XML::API::XHTML

On my system, CPAN complained that "make test had returned bad status" during installation of XML::API::XHTML.  To solve this I started the CPAN shell (just type `cpan`) and then forced the install, like this:
    
    cpan> force install XML::API::XHTML</p> ]]></description>
<pubDate>Sun, 10 Feb 2008 02:21:23 GMT</pubDate>
<guid>http://snipplr.com/view/4971/generate-xhtml-on-the-command-line-with-xmlapixhtml/</guid>
</item>
<item>
<title>(Bash) Scrape Google from the command line</title>
<link>http://snipplr.com/view/4299/scrape-google-from-the-command-line/</link>
<description><![CDATA[ <p>This code is POC only -- actually using it would violate Google's TOS, which forbids scraping.  It is published here for educational value only.

Hypothetically, the following command should return a list of the top 500 or so hits in Google for onemorebug.com.

Each result will be prefixed with a number, followed by a dot and some whitespace (Lynx adds these).

_You must have Lynx and Wget installed on your system for this to work._

Keep in mind that *nix shells expand `$` variables and backticks inside double-quoted strings, so single quotes are usually the safer choice there; see the comments.</p> ]]></description>
<pubDate>Sun, 09 Dec 2007 21:16:58 GMT</pubDate>
<guid>http://snipplr.com/view/4299/scrape-google-from-the-command-line/</guid>
</item>
<item>
<title>(Bash) check linked pages for Tidy validation errors, on the command line</title>
<link>http://snipplr.com/view/4130/check-linked-pages-for-tidy-validation-errors-on-the-command-line/</link>
<description><![CDATA[ <p>Given a list of HTML links (for example, a Google results page that has been saved locally), check each link that points to a page on a given domain, and report whether Tidy complains that the page's DOCTYPE declaration is missing.

Besides DOCTYPE, other strings worth searching for include "discarding", "lacks value" and "Error:".

Remember to replace MY_DOMAIN with the actual domain you want to validate against.</p> ]]></description>
<pubDate>Tue, 13 Nov 2007 23:47:28 GMT</pubDate>
<guid>http://snipplr.com/view/4130/check-linked-pages-for-tidy-validation-errors-on-the-command-line/</guid>
</item>
<item>
<title>(Bash) Validate a list of Web pages with Tidy, on the command line</title>
<link>http://snipplr.com/view/4129/validate-a-list-of-web-pages-with-tidy-on-the-command-line/</link>
<description><![CDATA[ <p></p> ]]></description>
<pubDate>Tue, 13 Nov 2007 22:54:37 GMT</pubDate>
<guid>http://snipplr.com/view/4129/validate-a-list-of-web-pages-with-tidy-on-the-command-line/</guid>
</item>
<item>
<title>(Bash) Tail requests for HTML files</title>
<link>http://snipplr.com/view/4069/tail-requests-for-html-files/</link>
<description><![CDATA[ <p>This command will tail an Apache access log, but only print lines where an HTML file was requested.</p> ]]></description>
<pubDate>Mon, 05 Nov 2007 12:40:54 GMT</pubDate>
<guid>http://snipplr.com/view/4069/tail-requests-for-html-files/</guid>
</item>
<item>
<title>(Bash) Download linked JPEGs from a Web page, on the command line</title>
<link>http://snipplr.com/view/4063/download-linked-jpegs-from-a-web-page-on-the-command-line/</link>
<description><![CDATA[ <p>The following command downloads all the files with a JPG extension that are linked from http://flickr.com.

_Requires the LWP and HTML::Tree Perl modules.  You must also have Wget installed on your system for this to work._</p> ]]></description>
<pubDate>Fri, 02 Nov 2007 22:57:45 GMT</pubDate>
<guid>http://snipplr.com/view/4063/download-linked-jpegs-from-a-web-page-on-the-command-line/</guid>
</item>
<item>
<title>(Perl) Fix installing Perl modules on OS X</title>
<link>http://snipplr.com/view/4009/fix-installing-perl-modules-on-os-x/</link>
<description><![CDATA[ <p>On Mac OS X, typing this command into CPAN will fix the make error when installing a Perl module. Found on the CPAN forums.

Here is the error I was getting:

    -- NOT OK
    Running make test
    Can't run test witout successful make
    Running make install
    make had returned bad status, install seems impossible</p> ]]></description>
<pubDate>Fri, 26 Oct 2007 13:15:37 GMT</pubDate>
<guid>http://snipplr.com/view/4009/fix-installing-perl-modules-on-os-x/</guid>
</item>
<item>
<title>(Perl) How to start CPAN on Windows/ActivePerl</title>
<link>http://snipplr.com/view/3718/how-to-start-cpan-on-windowsactiveperl/</link>
<description><![CDATA[ <p>ActivePerl has a great package manager utility -- just type "ppm" at the prompt.

The problem is that ActivePerl's PPM points to ActivePerl's package repository, which doesn't contain every module listed on CPAN.

The solution is to invoke the cpan utility under Windows! Here's how.</p> ]]></description>
<pubDate>Mon, 17 Sep 2007 10:23:37 GMT</pubDate>
<guid>http://snipplr.com/view/3718/how-to-start-cpan-on-windowsactiveperl/</guid>
</item>
<item>
<title>(Perl) search and replace across multiple files with Perl</title>
<link>http://snipplr.com/view/3145/search-and-replace-across-multiple-files-with-perl/</link>
<description><![CDATA[ <p>A couple of useful snippets from an article I found at Perl.com.

**Perl search-and-replace on the command line.**

All of these should be usable under Cygwin as well. But remember that bash wants single-quoted strings, while the MS-DOS shell wants strings to be double-quoted.</p> ]]></description>
<pubDate>Wed, 04 Jul 2007 20:00:52 GMT</pubDate>
<guid>http://snipplr.com/view/3145/search-and-replace-across-multiple-files-with-perl/</guid>
</item>
<item>
<title>(Perl) Grab linked files from a list of web pages</title>
<link>http://snipplr.com/view/3126/grab-linked-files-from-a-list-of-web-pages/</link>
<description><![CDATA[ <p>## How to use

`perl grabit.pl urls_for_download.txt`

Expects as argument the name of a file containing a newline-delimited list of URLs:

    http://example.com/coolstuff
    http://example.com/coolstuff/fun
    http://example.com/videos/explosions

When invoked, the script prompts interactively for the type of file to download. It then downloads all files of that type that are linked from each of the listed Web pages.

Note that the download folder is hard-coded to `c:/windows/desktop/grabit/`, so you may want to change that before running the script.

This script is also [available on Github](http://github.com/textarcana/scrapers/blob/643e6e7cb349fa94cbc3fc88e1d55c7b6a262d11/grabit.pl)

## Wait! Do you know about WGet and Curl?

This script is legacy.  People seem to like it (hey, I still use it) but today I would probably not write my own tool to download multiple files off remote sites.

Instead, I would likely just use a command-line download tool such as [WGet](http://lifehacker.com/software/top/geek-to-live--mastering-wget-161202.php 'Gina Trapani of Lifehacker, on the way of the WGet ninja') or Curl. [LWP-Request would also do the trick](http://snipplr.com/view/4063/download-linked-jpegs-from-a-web-page-on-the-command-line/).

## Do not comment your code like this!

For a great explanation of the rather baroque commenting style I was using circa 2001, see [Steve Yegge's excellent article on code style: *Portrait of a n00b.*](http://steve-yegge.blogspot.com/2008/02/portrait-of-n00b.html)  

Of course, when I sit down to write a Perl script today, I [use POD](http://snipplr.com/view/18611/perl-pod-embedded-documentation-example/) to format and publish my comments.</p> ]]></description>
<pubDate>Tue, 03 Jul 2007 22:31:30 GMT</pubDate>
<guid>http://snipplr.com/view/3126/grab-linked-files-from-a-list-of-web-pages/</guid>
</item>
</channel>
</rss>