<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>Snipplr</title>
<link>http://snipplr.com/language/perl/tags/text</link>
<description>Recent snippets posted on Snipplr.com</description>
<language>en-us</language>
<pubDate>Sun, 26 May 2013 10:13:54 GMT</pubDate>
<item>
<title>(Perl) Process quoted text differently - deepsoul</title>
<link>http://snipplr.com/view/65684/process-quoted-text-differently/</link>
<description><![CDATA[ <p>This Perl snippet shows how to separate quoted parts from a text in order to process quoted and unquoted parts separately.  For example, you could expand variables or wildcards only in the unquoted part.  Then the different processed parts are put together again.
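
This is an illustration, not the snippet's actual code: a minimal sketch of the same split, process, and rejoin approach. It assumes double-quoted passages contain no escaped quotes and the input never contains "\x01". The helper `process_unquoted` and its `$NAME`-expansion step are hypothetical stand-ins for whatever processing you need:

```perl
use strict;
use warnings;

# Hypothetical helper: expand $NAME variables only outside "double quotes".
# Assumes quoted passages contain no escaped quotes and the input never
# contains the marker character "\x01".
sub process_unquoted {
    my ($text, $vars) = @_;
    my @quoted;

    # 1. Cut out each quoted passage, remember it, and leave a marker behind.
    $text =~ s/("[^"]*")/push @quoted, $1; "\x01"/ge;

    # 2. Process only the unquoted remainder; unknown names are left alone.
    $text =~ s/\$(\w+)/exists $vars->{$1} ? $vars->{$1} : "\$$1"/ge;

    # 3. Put the quoted passages back, in order, one per marker.
    $text =~ s/\x01/shift @quoted/ge;

    return $text;
}

print process_unquoted('echo $USER "$USER"', { USER => 'alice' }), "\n";
# prints: echo alice "$USER"
```

Step 3 does the reassembly. Each marker is replaced by the next saved passage, so the passages come back in their original order.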

Using the marker character "\x01" to record where the quoted passages were is a bit ugly, but on the other hand the single line of code that puts the text back together is very elegant and shows off what Perl can do!</p> ]]></description>
<pubDate>Mon, 18 Jun 2012 05:00:57 GMT</pubDate>
<guid>http://snipplr.com/view/65684/process-quoted-text-differently/</guid>
</item>
<item>
<title>(Perl) chomp text files and return regex matches by product - alfirth</title>
<link>http://snipplr.com/view/48080/chomp-text-files-and-return-regex-matches-by-product/</link>
<description><![CDATA[ <p></p> ]]></description>
<pubDate>Sat, 29 Jan 2011 09:02:00 GMT</pubDate>
<guid>http://snipplr.com/view/48080/chomp-text-files-and-return-regex-matches-by-product/</guid>
</item>
<item>
<title>(Perl) Changing text case utility - davidmoreen</title>
<link>http://snipplr.com/view/38738/changing-text-case-utility/</link>
<description><![CDATA[ <p>Just run it in your console window as ./tocase (or whatever you call the file). Be sure to give it execute permission first (chmod 755 tocase) :]</p> ]]></description>
<pubDate>Mon, 09 Aug 2010 08:20:18 GMT</pubDate>
<guid>http://snipplr.com/view/38738/changing-text-case-utility/</guid>
</item>
<item>
<title>(Perl) Convert a PDF to text with Perl - noah</title>
<link>http://snipplr.com/view/9963/convert-a-pdf-to-text-with-perl/</link>
<description><![CDATA[ <p>Converts the PDF 'example.pdf' to plain text.
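
Here is a minimal sketch of the same conversion that covers every page instead of only the first. It assumes the CPAN module CAM::PDF (which provides a `getPageContentTree` method) and its companion renderer CAM::PDF::PageText are installed. The function name `pdf_to_text` and the form-feed page separator are illustrative choices, not part of the original snippet:

```perl
use strict;
use warnings;

# Assumes CAM::PDF and CAM::PDF::PageText from CPAN; loaded at call time.
sub pdf_to_text {
    my ($path) = @_;
    require CAM::PDF;
    require CAM::PDF::PageText;

    my $pdf = CAM::PDF->new($path) or die "Cannot open $path\n";
    my @pages;
    # Pages are numbered from 1; loop over all of them, not just page 1.
    for my $n (1 .. $pdf->numPages()) {
        my $tree = $pdf->getPageContentTree($n) or next;
        push @pages, CAM::PDF::PageText->render($tree);
    }
    return join "\f", @pages;    # form feed between pages
}
```

Calling `print pdf_to_text('example.pdf');` would print the text of every page, with a form-feed character between pages.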

IIRC this only converts the _first_ page of the document, but that can be changed by modifying the argument to getPageContentTree on line 8.  Been a while since I've used this so ymmv.</p> ]]></description>
<pubDate>Fri, 21 Nov 2008 16:24:58 GMT</pubDate>
<guid>http://snipplr.com/view/9963/convert-a-pdf-to-text-with-perl/</guid>
</item>
<item>
<title>(Perl) Clean up Word documents that have been translated to HTML - noah</title>
<link>http://snipplr.com/view/9483/clean-up-word-documents-that-have-been-translated-to-html/</link>
<description><![CDATA[ <p>Haven't tried this with any recent versions of Word.  Yet.</p> ]]></description>
<pubDate>Tue, 04 Nov 2008 09:54:38 GMT</pubDate>
<guid>http://snipplr.com/view/9483/clean-up-word-documents-that-have-been-translated-to-html/</guid>
</item>
<item>
<title>(Perl) Remove newline characters from text - retry</title>
<link>http://snipplr.com/view/7571/remove-newline-characters-from-text/</link>
<description><![CDATA[ <p>This Perl one-liner removes all newline characters from a text file, replacing each one with a space. In the source, every newline in the file gear_list.xml is replaced with a space, and the original file is saved as gear_list.xml.bak.</p> ]]></description>
<pubDate>Tue, 29 Jul 2008 03:17:03 GMT</pubDate>
<guid>http://snipplr.com/view/7571/remove-newline-characters-from-text/</guid>
</item>
<item>
<title>(Perl) search and replace across multiple files with Perl - noah</title>
<link>http://snipplr.com/view/3145/search-and-replace-across-multiple-files-with-perl/</link>
<description><![CDATA[ <p>A couple of useful snippets from an article I found at Perl.com.

**Perl search-and-replace on the command line.**
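
The usual one-liner looks like `perl -pi.bak -e 's/foo/bar/g' *.html`, where `foo`, `bar`, and `*.html` are placeholders. Here `-p` loops over every line and prints it, `-i.bak` edits the files in place and keeps a `.bak` backup, and `-e` supplies the code. Below is a rough sketch of the same loop written inside a script, assuming you want a plain-text (not regex) replacement. `replace_in_files` is a hypothetical name:

```perl
use strict;
use warnings;

# Roughly what  perl -pi.bak -e 's/OLD/NEW/g' FILES  does, except that
# OLD is treated as literal text here (quotemeta), not as a regex.
sub replace_in_files {
    my ($from, $to, @files) = @_;
    return unless @files;           # an empty @ARGV would make <> read STDIN

    local $^I   = '.bak';           # same as -i.bak: keep FILE.bak copies
    local @ARGV = @files;           # <> iterates over these files
    while (my $line = <>) {
        $line =~ s/\Q$from\E/$to/g;
        print $line;                # written to the new FILE, not to STDOUT
    }
}
```

Like the one-liner, this overwrites each file in place, and the `.bak` copy is the only backup.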

All of these should also work under Cygwin. Remember, though, that bash wants single-quoted strings, while the MS-DOS shell wants strings to be double-quoted.</p> ]]></description>
<pubDate>Wed, 04 Jul 2007 20:00:52 GMT</pubDate>
<guid>http://snipplr.com/view/3145/search-and-replace-across-multiple-files-with-perl/</guid>
</item>
<item>
<title>(Perl) capitalize words - noah</title>
<link>http://snipplr.com/view/3134/capitalize-words/</link>
<description><![CDATA[ <p></p> ]]></description>
<pubDate>Tue, 03 Jul 2007 22:53:02 GMT</pubDate>
<guid>http://snipplr.com/view/3134/capitalize-words/</guid>
</item>
<item>
<title>(Perl) Grab linked files from a list of web pages - noah</title>
<link>http://snipplr.com/view/3126/grab-linked-files-from-a-list-of-web-pages/</link>
<description><![CDATA[ <p>## how to use

`perl grabit.pl urls_for_download.txt`

Expects as argument the name of a file containing a newline-delimited list of URLs:

    http://example.com/coolstuff
    http://example.com/coolstuff/fun
    http://example.com/videos/explosions

When invoked, it starts an interactive prompt that asks what type of file should be downloaded, then downloads all matching files that are linked from each of the listed Web pages.
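
The two non-network steps described above are reading the URL list and picking out links of the requested type. They could be sketched roughly as follows. Both function names are hypothetical and this is not the grabit.pl code. A regex over `href` attributes is only an approximation of real HTML parsing; a module such as HTML::LinkExtor is more robust. Fetching the pages is left out:

```perl
use strict;
use warnings;

# Read a newline-delimited list of page URLs, skipping blank lines.
sub read_url_list {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!\n";
    my @urls;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim, including the trailing newline
        push @urls, $line if length $line;
    }
    close $fh;
    return @urls;
}

# Collect href values that end in the requested extension (e.g. "mp4").
sub links_with_extension {
    my ($html, $ext) = @_;
    my @links;
    while ($html =~ /href\s*=\s*["']([^"']+)["']/gi) {
        my $href = $1;
        push @links, $href if $href =~ /\.\Q$ext\E$/i;
    }
    return @links;
}
```

Relative links such as `clips/video.mp4` would still need to be resolved against the page URL before downloading.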

Note that the location of the download folder is hard-coded to `c:/windows/desktop/grabit/`, so you may want to change that before trying the script.

This script is also [available on GitHub](http://github.com/textarcana/scrapers/blob/643e6e7cb349fa94cbc3fc88e1d55c7b6a262d11/grabit.pl).

## Wait! Do you know about WGet and Curl?

This script is legacy.  People seem to like it (hey, I still use it) but today I would probably not write my own tool to download multiple files off remote sites.

Instead I would likely just use a command-line download tool like [WGet](http://lifehacker.com/software/top/geek-to-live--mastering-wget-161202.php 'Gina Trapani of Lifehacker, on the way of the WGet ninja') or Curl. [LWP-Request would also do the trick](http://snipplr.com/view/4063/download-linked-jpegs-from-a-web-page-on-the-command-line/).

## do not comment your code like this!

For a great explanation of the rather baroque commenting style I was using circa 2001, see [Steve Yegge's excellent article on code style: *Portrait of a n00b.*](http://steve-yegge.blogspot.com/2008/02/portrait-of-n00b.html)

Of course, when I sit down to write a Perl script today, I [use POD](http://snipplr.com/view/18611/perl-pod-embedded-documentation-example/) to format and publish my comments.</p> ]]></description>
<pubDate>Tue, 03 Jul 2007 22:31:30 GMT</pubDate>
<guid>http://snipplr.com/view/3126/grab-linked-files-from-a-list-of-web-pages/</guid>
</item>
<item>
<title>(Perl) Remove duplicate lines from a text file with Perl - noah</title>
<link>http://snipplr.com/view/3124/remove-duplicate-lines-from-a-text-file-with-perl/</link>
<description><![CDATA[ <p>*[Found at Google Answers.](http://answers.google.com/answers/threadview?id=25196)*

Sometimes I get a big list of things, and some of the things occur multiple times in the same list.  To make the list easier to read, I want to *delete* the duplicate lines.
 
A good example is a list of files that have errors (maybe excerpted from an application server's log files). In that case you have a newline-delimited list of file paths, and depending on the situation, the same file path might be listed 4 or 5 times or more. Often it is useful to have a list of just the faulty files, which can be produced by deleting all the duplicate lines. This script is for filtering exactly that kind of list file.
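
The usual Perl idiom for this keeps a hash of lines already seen and passes each line through only the first time it appears. Below is a minimal sketch of it, not the exact script from Google Answers. Unlike `sort | uniq`, it keeps the lines in their original order. The name `unique_lines` is hypothetical:

```perl
use strict;
use warnings;

# Return the input lines with duplicates removed, keeping the first
# occurrence of each line and the original order.
sub unique_lines {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}
```

From the command line, the same idea is the one-liner `perl -ne 'print unless $seen{$_}++' errors.txt > unique.txt`, where the file names are placeholders.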

[Of course,](http://xkcd.com/378/ "emacs has a command for that") for **Emacs** users [there is a much easier way to remove duplicate lines](http://everything2.com/title/useful%2520emacs%2520lisp%2520functions "useful emacs lisp functions"), if you have `uniq` installed on your system:

`M-x sort-lines RET C-x h M-x shell-command-on-region RET uniq RET`</p> ]]></description>
<pubDate>Tue, 03 Jul 2007 22:06:34 GMT</pubDate>
<guid>http://snipplr.com/view/3124/remove-duplicate-lines-from-a-text-file-with-perl/</guid>
</item>
<item>
<title>(Perl) Examine CSS selectors in JSP files and report on whether they match some patterns - noah</title>
<link>http://snipplr.com/view/3119/examine-css-selectors-in-jsp-files-and-report-on-whether-they-match-some-patterns/</link>
<description><![CDATA[ <p>Processes a directory of JSP source files, looks for regex matches, and reports the number and location of the matches. Intended for detecting deprecated attributes in XHTML.
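
Here is a minimal sketch of the scanning half, using the core File::Find module. `scan_jsp` is a hypothetical name, this is not the original script, and the email step is omitted:

```perl
use strict;
use warnings;
use File::Find qw(find);

# Walk $dir, scan every *.jsp file line by line, and collect
# "path:line: text" for each line matching $pattern.
sub scan_jsp {
    my ($dir, $pattern) = @_;
    my @hits;
    find(sub {
        return unless -f $_ && /\.jsp$/i;
        open my $fh, '<', $_
            or do { warn "Cannot read $File::Find::name: $!\n"; return };
        while (my $line = <$fh>) {
            chomp $line;
            push @hits, sprintf('%s:%d: %s', $File::Find::name, $., $line)
                if $line =~ $pattern;
        }
        close $fh;
    }, $dir);
    return @hits;
}
```

For example, `scan_jsp('webapp', qr/\bbgcolor\s*=/i)` would list every line that sets the deprecated `bgcolor` attribute, and the number of returned entries is the match count. The directory name `webapp` is a placeholder.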

Uses Mail::Outlook to send an email notification. This requires you to click "OK" each time a notification is sent, but it gets around firewalls.</p> ]]></description>
<pubDate>Tue, 03 Jul 2007 21:27:59 GMT</pubDate>
<guid>http://snipplr.com/view/3119/examine-css-selectors-in-jsp-files-and-report-on-whether-they-match-some-patterns/</guid>
</item>
<item>
<title>(Perl) Text change across multiple files - noah</title>
<link>http://snipplr.com/view/2034/text-change-across-multiple-files/</link>
<description><![CDATA[ <p>Put this in a script called "update.pl" and call it with "ls *ext|xargs perl update.pl". Be sure to back up the working directory before playing with this, as it is very easy to wipe out the contents of a bunch of files at once this way.</p> ]]></description>
<pubDate>Sun, 28 Jan 2007 09:01:18 GMT</pubDate>
<guid>http://snipplr.com/view/2034/text-change-across-multiple-files/</guid>
</item>
</channel>
</rss>