Posted By

danfsmith on 07/19/11


Tagged

html metadata


Get Title and MetaData from HTML files


Published in: Windows PowerShell
 

Add-Type -Path c:\dan\tools\html-agility-pack\HtmlAgilityPack.dll
$files = Get-ChildItem -Filter *.htm -Path C:\Path\ -Recurse
$doc = New-Object HtmlAgilityPack.HtmlDocument
$result = $files | ForEach-Object {
    #Write-Host "Checking $_"
    # Map the local path to its web URL ("FILEPATH" and "WEBPATH" are placeholders)
    $name = $_.FullName.Replace("FILEPATH","WEBPATH").Replace("\", "/")
    # Get second folder of URL as "section"
    $sections = $name.Split("/")
    $section = $sections[3]
    # No folder at that position (segment missing, or it is the file name itself)
    if (-not $section -or $section.Contains(".htm"))
    {
        $section = ""
    }
    # Load() parses the file into $doc in place and returns nothing
    $doc.Load($_.FullName)
    $titlenode = $doc.DocumentNode.SelectSingleNode("//title")
    $descriptionnode = $doc.DocumentNode.SelectSingleNode("//meta[@name='description']")
    if ($descriptionnode) {
        $description = $descriptionnode.GetAttributeValue("content", "")
    }
    else {
        $description = ""
    }
    # Pages without a <title> element get an empty title
    if ($titlenode) {
        $title = $titlenode.InnerText
    }
    else {
        $title = ""
    }
    New-Object PsObject -Property @{ Name = $name; Section = $section; Title = $title; Description = $description } | Select-Object Name, Section, Title, Description
}
$result | Sort-Object Section, Name
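
The "section" lookup in the snippet can be tried on its own, without HtmlAgilityPack or any files on disk. The following is only a minimal sketch: the function name Get-Section, its -Index parameter, and the example.com URLs are hypothetical and not part of the original snippet. It assumes the web path has the form scheme://host/..., so that index 3 of the "/"-split URL is the first segment after the host.

```powershell
# Hypothetical helper (not in the original snippet): returns the URL segment
# at the given index, or "" when that segment is missing or is the page itself.
function Get-Section {
    param(
        [string]$Url,
        [int]$Index = 3   # assumption: "scheme://host/section/..." layout
    )
    $parts = $Url.Split('/')
    # Fewer segments than expected: there is no section
    if ($parts.Length -le $Index) { return '' }
    $candidate = $parts[$Index]
    # The segment is a file name such as page.htm, so the page sits at the root
    if ($candidate.Contains('.htm')) { return '' }
    return $candidate
}

Get-Section 'http://example.com/docs/page.htm'   # docs
Get-Section 'http://example.com/page.htm'        # (empty string)
```

If the real web path has more leading segments than scheme and host, the index would need to be adjusted to match.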
