Posted By

jacobian on 09/22/06


Tagged

python music scrape beautifulsoup



All Songs All Songs


Published in: Python
 

#!/usr/bin/env python

"""
Create a SMIL file of all the full tracks from this week's All Songs Considered
(http://www.npr.org/programs/asc/).
"""

import re
import sys
import urllib2
from BeautifulSoup import BeautifulSoup

RA_URL = "rtsp://real.npr.org:80/real.npr.na-central/%s.rm"

def allsongsallsongs(url):
    smil = ["<smil>", "<body>"]
    soup = BeautifulSoup(urllib2.urlopen(url))
    for songlink in soup.findAll("a", {"href" : re.compile("getStaticMedia")})[1:]:
        rafile = RA_URL % songlink["href"].split("'")[1]
        smil.append("<audio src='%s' />" % rafile)
    smil.extend(["</body>", "</smil>"])
    return "\n".join(smil)

if __name__ == '__main__':
    if len(sys.argv) > 1:
        url = sys.argv[1]
    else:
        url = "http://www.npr.org/programs/asc/"
    print allsongsallsongs(url)
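
The snippet above is written for Python 2 (urllib2, the print statement) and BeautifulSoup 3. For readers on Python 3, here is a rough sketch of the same link-to-SMIL step using only the standard library's html.parser. It is not the author's code: the names MediaLinkParser, build_smil and SAMPLE_HTML are made up for illustration, and the sample markup is an assumption about what a getStaticMedia link looks like, not a copy of the real page. Like the original, it skips the first matching link and takes the text between the first pair of single quotes in the href as the media path.

```python
from html.parser import HTMLParser

RA_URL = "rtsp://real.npr.org:80/real.npr.na-central/%s.rm"


class MediaLinkParser(HTMLParser):
    """Hypothetical helper: collect hrefs of <a> tags mentioning getStaticMedia."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        href = dict(attrs).get("href") or ""
        if "getStaticMedia" in href:
            self.hrefs.append(href)


def build_smil(html):
    """Return a SMIL document for every matching link except the first."""
    parser = MediaLinkParser()
    parser.feed(html)
    smil = ["<smil>", "<body>"]
    # Skip the first match, mirroring the [1:] slice in the original snippet.
    for href in parser.hrefs[1:]:
        # The media path is assumed to be the first single-quoted argument.
        media_path = href.split("'")[1]
        smil.append("<audio src='%s' />" % (RA_URL % media_path))
    smil.extend(["</body>", "</smil>"])
    return "\n".join(smil)


# Made-up markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<a href="javascript:getStaticMedia('asc/whole_show')">Whole show</a>
<a href="javascript:getStaticMedia('asc/track_one')">Track one</a>
<a href="/about">About</a>
<a href="javascript:getStaticMedia('asc/track_two')">Track two</a>
"""
```

Calling build_smil(SAMPLE_HTML) yields a SMIL body with one audio element per track link after the first. To run it against a live page, the HTML could be fetched with urllib.request.urlopen(url).read() and decoded to text before being passed in.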
