I am trying to scrape my employers website to extract images from their Blog post en mass. I have started creating a scraping tool in Excel using VBA.
(We don't have access to the SQL database)
I have setup a work sheet that contains a list of post identifiers in column A and the URL of the post in column B.
My VBA script so far runs through the list of URL's in column B extracts the HTML from a Tag on the page by ID, using getElementById and pastes the resulting output as a string into column C.
I am now at the point where I am trying to figure out how to extract the src attribute from every image in the resulting output and paste it into the relevant columns. I can't for the life of me come up with an easy solution. I am not very familiar with RegEx and am struggling with Excel's built in string functions.
The end game is to get the macro to run through each image URL and save the image to disk with a filename format like "{Event No.}-{Image Number}".jpg
Any help would be much appreciated.
Sub Get_Image_SRC()
Dim sht As Worksheet
Dim LastRow As Long
Dim i As Integer
Dim url As String
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Dim Elements As IHTMLElementCollection
Dim Element As IHTMLElement
Set sht = ThisWorkbook.Worksheets("Sheet1")
'Ctrl + Shift + End
LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
For i = 2 To LastRow
url = Cells(i, "C").Value
MsgBox (url)
IE.navigate url
Application.StatusBar = url & " is loading..."
Do While IE.readyState = 4: DoEvents: Loop
Do Until IE.readyState = 4: DoEvents: Loop
Application.StatusBar = url & " Loaded"
If Cells(i, "B").Value = "WEBNEWS" Then
Cells(i, "D").Value = IE.document.getElementById("NewsDetail").outerHTML
Else
Cells(i, "D").Value = IE.document.getElementById("ReviewContainer").outerHTML
End If
Next i
Set IE = Nothing
Set objElement = Nothing
Set objCollection = Nothing
End Sub
Example resulting HTML:
<div id=""NewsDetail""><div class=""NewsDetailTitle"">Video: Race Face Behind the Scenes Tour</div><div class=""NewsDetailImage""><img alt=""HeadlinesThumbnail.jpg"" src=""/ImageHandler/6190/515/1000/0/""></div> <div class=""NewsDetailBody"">Pinkbike posted this video a while ago, if you missed it, its' definitely worth a watch.
Ken from Camp of Champions took a look at their New Westminster factory last year which gives a look at the production, people and culture of Race Face. The staff at Race Face are truly their greatest asset they had, best wishes to everyone!
<p><center><object width=""500"" height=""281""><param name=""allowFullScreen"" value=""true""><param name=""AllowScriptAccess"" value=""always""><param name=""movie"" value=""http://ift.tt/2dr45JB""><embed width=""500"" height=""281"" src=""http://ift.tt/2dr45JB"" type=""application/x-shockwave-flash"" allowscriptaccess=""always"" allowfullscreen=""true""></object></center><p></p>
</div><div class=""NewsDate"">Published Friday, 25 November 2011</div></div>"
Aucun commentaire:
Enregistrer un commentaire