Posted By

tionazo on 12/26/14


Tagged

regex python web urllib2


Get all links from a website


Published in: Python
 

URL: http://www.pythonforbeginners.com/code/regular-expression-re-findall

Fetches a page with urllib2 and uses re.findall to collect every double-quoted http, https, ftp, or ftps URL found in its HTML. Taken from the tutorial at the URL above.

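  import urllib2
  import re

  # Page to scan (placeholder; replace with the site you want)
  url = "http://www.example.com"

  # Connect to the URL
  website = urllib2.urlopen(url)

  # Read the HTML code
  html = website.read()

  # Use re.findall to get all the links; the scheme group is
  # non-capturing so each result is a plain URL string, not a tuple
  links = re.findall(r'"((?:http|ftp)s?://.*?)"', html)

  print links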
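One detail worth knowing about re.findall: when a pattern contains more than one capturing group, it returns a tuple of groups for each match instead of the matched text. The following is a minimal sketch, assuming Python 3, of how a non-capturing group `(?:...)` keeps the results as plain strings. The helper name `extract_links` and the sample markup are hypothetical, and the sketch runs against an inline string rather than a live request.

```python
import re

# Non-capturing group (?:...) for the scheme, so findall returns plain strings.
LINK_PATTERN = re.compile(r'"((?:http|ftp)s?://.*?)"')


def extract_links(html):
    """Return every double-quoted http(s)/ftp(s) URL found in html."""
    return LINK_PATTERN.findall(html)


# Hypothetical sample markup standing in for a downloaded page.
sample_html = (
    '<a href="http://example.com/a">A</a> '
    '<a href="ftps://example.com/b">B</a> '
    '<a href="/local">C</a>'
)

# With two capturing groups, each result is a (url, scheme) tuple.
tuples = re.findall(r'"((http|ftp)s?://.*?)"', sample_html)

# With the non-capturing group, each result is just the URL.
links = extract_links(sample_html)

print(tuples)
print(links)
```

Relative links such as `/local` are skipped because the pattern only accepts quoted strings that start with a scheme. For anything beyond a quick scrape, an HTML parser is usually a sturdier choice than a regular expression.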
