Posted by softmechanics on 01/21/10

Tagged: regex, curl, rss, feed, Shell, Bash, linux, haskell, torrent, hsh, podcast

Simple Broadcatcher in Haskell/HSH


Published in: Haskell
 

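HSH is a cool Haskell library that lets you leverage your shell-scripting prowess in Haskell programs: shell commands and ordinary Haskell functions can be chained together in one pipeline. In this simple broadcatcher (a program that downloads the files linked from a feed), I use curl for HTTP GET requests and other standard Unix tools (sort, diff, sed, tee) to track download history, so we never fetch the same file twice. Feed parsing and title filtering are done in Haskell with the Text.Feed and Text.Regex libraries.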
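
The history check is done in the program's checkHistory command, which pipes the new links through sort, diff and sed so that only links missing from the history file get through (and are appended to it with tee -a). As a rough illustration of that idea, here is a minimal pure-Haskell sketch of the same set difference. The names newLinks and updateHistory are hypothetical, not part of HSH or the original script, and the sketch keeps feed order and drops repeated links, whereas the shell version emits the new links in sorted order.

```haskell
import Data.List (nub)

-- | Links from the current feed that are not yet in the download history.
-- Hypothetical pure stand-in for the sort/diff/sed pipeline in checkHistory:
-- keeps feed order and removes links repeated within the feed.
newLinks :: [String] -> [String] -> [String]
newLinks history = nub . filter (`notElem` history)

-- | The links to download plus the updated history
-- (the shell version appends the same lines with `tee -a`).
updateHistory :: [String] -> [String] -> ([String], [String])
updateHistory history links = (fresh, history ++ fresh)
  where
    fresh = newLinks history links
```

For example, with a history of ["a"] and a feed listing ["a", "b"], updateHistory returns (["b"], ["a", "b"]): only "b" is downloaded, and it is remembered for the next run.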

Note: if you decide to use this in real life, be sure to respect your feed's time to live (TTL) when you schedule it in your crontab.
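
In RSS 2.0 the optional `<ttl>` element gives, in minutes, how long a feed may be cached before it should be refreshed. The script does not read it, so the polling interval is up to your crontab. As a sketch, assuming a standard five-field cron that understands `*/N` step values (Vixie cron and cronie do), the hypothetical helper cronSchedule below turns a TTL into a schedule that never polls more often than the TTL, by rounding up to a step that divides the hour or the day evenly. TTLs longer than a day simply fall back to a daily run.

```haskell
-- | Smallest step in the list that is >= n, if there is one.
roundUpTo :: [Int] -> Int -> Maybe Int
roundUpTo steps n = case filter (>= n) steps of
  (s : _) -> Just s
  []      -> Nothing

-- | Hypothetical helper: the five time fields of a crontab entry that
-- runs at most once per `ttl` minutes (for TTLs up to one day).
cronSchedule :: Int -> String
cronSchedule ttl
  | ttl <= 1 = "* * * * *"
  | Just m <- roundUpTo minuteSteps ttl, m < 60 = "*/" ++ show m ++ " * * * *"
  | Just h <- roundUpTo hourSteps hours, h < 24 = "0 */" ++ show h ++ " * * *"
  | otherwise = "0 0 * * *" -- once a day
  where
    -- Steps must divide 60 (or 24) evenly, otherwise the gap that wraps
    -- around the hour (or midnight) would be shorter than the TTL.
    minuteSteps = [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60]
    hourSteps = [1, 2, 3, 4, 6, 8, 12, 24]
    hours = (ttl + 59) `div` 60 -- TTL rounded up to whole hours
```

For instance, a TTL of 7 minutes gives "*/10 * * * *" and a TTL of 90 minutes gives "0 */2 * * *"; append the command that runs the script to get the full crontab line.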

```haskell
#!/usr/bin/env runhaskell

import Data.Char (toLower)
import Data.Maybe (mapMaybe)
import HSH
import Text.Feed.Import (parseFeedString)
import Text.Feed.Query (getFeedItems, getItemLink, getItemTitle)
import Text.Regex.Posix ((=~))

-- CONFIGURATION --
dlDir = "/path/to/download/dir/"
historyFile = "/path/to/download/history.log"

-- Patterns are POSIX regexes matched against the lower-cased item title,
-- so write them in lower case.
any_patterns = ["some.*thing", "something.*else", "etc"]
all_patterns = ["every.*thing"]
none_patterns = ["some.*boring.*thing"]

feed_url = "http://my/feed.rss"

-- curl cli flags (see man curl)
curl_opts = ""

-- END CONFIGURATION --

curl = "curl -s " ++ curl_opts
-- The space keeps the URL separate from any flags in curl_opts.
fetchFeed = curl ++ " \"" ++ feed_url ++ "\""
-- curl's -O applies to a single URL, so xargs -n 1 runs one curl per link.
fetchFiles = "(cd " ++ dlDir ++ " && xargs -r -n 1 " ++ curl ++ " -O)"

-- One Bool per pattern: does the title match it?
matches :: [String] -> String -> [Bool]
matches patterns title = map (\p -> title =~ p :: Bool) patterns

match_any, match_all, match_none :: [String] -> String -> Bool
match_any ps = or . matches ps        -- at least one pattern matches
match_all ps = and . matches ps       -- every pattern matches
match_none ps = not . or . matches ps -- no pattern matches

filters = [match_any any_patterns, match_all all_patterns, match_none none_patterns]

-- True when every predicate in fs accepts x.
allPreds fs x = all ($ x) fs

-- Parse the feed, keep the items whose lower-cased title passes every
-- filter, and output their links, one per line.
-- (Written against the feed package of the time, where getItemTitle and
-- getItemLink return Maybe String; newer releases return Maybe Text.)
filterSubscriptions :: [String] -> [String]
filterSubscriptions input =
  case parseFeedString (unlines input) of
    Just feed -> map snd $ doFilter $ mapMaybe titleAndLink (getFeedItems feed)
    Nothing -> error "feed parse failed"
  where
    titleAndLink item = do
      title <- getItemTitle item
      link <- getItemLink item
      return (title, link)
    doFilter = filter (allPreds filters . map toLower . fst)

-- Pass through only links not yet in the history file, appending them to it.
-- bash (not sh) is needed for the <( ) process substitution.
checkHistory = "bash -c \"sort | diff <(sort " ++ historyFile ++ ") - | sed -n 's/^> //p' | tee -a " ++ historyFile ++ "\""

-- Run the filter against the copy of the feed saved by the last run of main.
test = runIO $ "cat /tmp/feed.xml" -|- filterSubscriptions
main = runIO $ fetchFeed -|- "tee /tmp/feed.xml" -|- filterSubscriptions -|- checkHistory -|- fetchFiles
```
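
To see how the three pattern lists combine, here is a small model of the filtering logic. To stay dependency-free it swaps the POSIX regex test (`=~`) for a plain substring check with `isInfixOf`, so regex patterns such as "every.*thing" would not behave the same here; the function names and example patterns are made up for illustration.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Substring stand-in for the regex test `title =~ p :: Bool`.
matches :: [String] -> String -> [Bool]
matches patterns title = map (`isInfixOf` title) patterns

matchAny, matchAll, matchNone :: [String] -> String -> Bool
matchAny ps = or . matches ps
matchAll ps = and . matches ps
matchNone ps = not . or . matches ps

-- An item is kept only if every predicate accepts its lower-cased title.
keep :: [String -> Bool] -> String -> Bool
keep preds title = all ($ map toLower title) preds

-- Hypothetical configuration in the shape of any/all/none_patterns.
exampleFilters :: [String -> Bool]
exampleFilters =
  [ matchAny ["haskell", "ocaml"]
  , matchAll ["podcast"]
  , matchNone ["rerun"]
  ]
```

With these filters, "Haskell Podcast #12" is kept, while "OCaml Podcast (Rerun)" and "Rust Podcast" are rejected. One consequence carries over to the real script: because `or []` is `False`, an empty any_patterns rejects every item, while empty all_patterns and none_patterns accept everything. To disable the "any" check, use a catch-all regex such as ".*" rather than an empty list.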
