Python - Capture all links <a href=


Published in: Python

import re
import sys

# Usage: python script.py file.html

# Matches the part of a line that contains an <a ... href=...> tag.
links = re.compile('[<].?[Aa].*[Hh][Rr][Ee][Ff].*=.*[\"\']?.*[\"\']?.?[>]')
# Matches an http: URL ending in "zip". The original pattern used '-*',
# which only matches literal hyphens; '.*' is assumed to be the intent.
links2 = re.compile('http:.*[Zz][Ii][Pp]')

with open(sys.argv[1], 'r') as f:
    # Iterate line by line; the loop stops cleanly at end of file,
    # so no manual byte counting with os.stat() is needed.
    for riga in f:
        comparazione = links.search(riga)
        if comparazione:
            output = comparazione.group(0)
            output2 = links2.search(output)
            if output2:
                print(output2.group(0))

print('DONE')
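
Regular expressions applied line by line can miss tags that span several lines, or can capture more than one tag at once. As a possible alternative, here is a minimal sketch that uses the standard library's html.parser module (Python 3) to collect href values. The names LinkCollector, extract_links, and sample are hypothetical and are not part of the original snippet.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links(html_text, suffix=None):
    """Return all href values; optionally keep only those ending in suffix."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    if suffix is None:
        return parser.hrefs
    # Case-insensitive suffix check, e.g. ".zip" also matches ".ZIP".
    return [h for h in parser.hrefs if h.lower().endswith(suffix.lower())]


# Hypothetical sample input, for illustration only.
sample = (
    '<p><A HREF="http://example.com/a.ZIP">zip</A> '
    "<a href='http://example.com/page.html'>page</a></p>"
)
print(extract_links(sample))
print(extract_links(sample, ".zip"))
```

To use it on a file as in the script above, read the whole file into a string and pass it to extract_links, for example with ".zip" as the suffix to keep only archive links.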
