Thursday, 25 May 2017

VBA web automation: scraping innerText from a table or from td tags

I have been trying to scrape data from the soccerstats website, specifically the results of the football team Arsenal (http://ift.tt/2r10bAd).

(There are several tables on the page; I am after the data in the biggest one.)

My current code scrapes the innerText of every td tag, which comes out as a mess:

'requires a reference to "Microsoft Internet Controls" (Tools > References)
Sub soccer_stats()

    'declare our variables
    Dim objIE As InternetExplorer 'object variable representing the IE browser
    Dim ele As Object             'the HTML element currently being examined
    Dim y As Long                 'row counter for the output sheet
    Dim Variable1 As String       'the search term typed by the user

    Variable1 = InputBox("put in what you are searching")

    'start a new instance of Internet Explorer and assign it to objIE
    Set objIE = New InternetExplorer

    'make the IE browser visible (False would let IE run in the background)
    objIE.Visible = True

    'navigate IE to the start page
    objIE.navigate "http://ift.tt/WwXLSl"

    'wait here until the browser has finished loading
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    'type the search term into the input named "searchstring"
    For Each ele In objIE.document.getElementsByTagName("input")
        If ele.Name = "searchstring" Then
            ele.Value = Variable1
        End If
    Next ele

    'click the submit button, then stop looping because the page is about to change
    For Each ele In objIE.document.getElementsByTagName("input")
        If ele.className = "submit" Then
            ele.Click
            Exit For
        End If
    Next ele

    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    'follow the link whose text matches the search term
    For Each ele In objIE.document.getElementsByTagName("a")
        If ele.innerText = Variable1 Then
            ele.Click
            Exit For
        End If
    Next ele

    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    'new bit: write the innerText of every td to Sheet2, column A, starting at row 2
    y = 2
    For Each ele In objIE.document.getElementsByTagName("td")
        Sheets("Sheet2").Range("A" & y).Value = ele.innerText
        y = y + 1
    Next ele

End Sub

Is there a way to write the inner text to columns A, B, C and D depending on whether it meets conditions i, ii, iii and iv?

The first column of the table has the HTML: <td height="18" align="right"> 14 Aug</td>

So could I change my code to something like: For Each ele In objIE.document.getElementsByTagName("td") AND height="18"?

And for the next column in the table, where the HTML has no height attribute, could I change it to: For Each ele In objIE.document.getElementsByTagName("td") AND height=null?

Or is there a better way to scrape the entire table? Thank you for your help.
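
For Each has no AND clause, so a condition like this has to be tested inside the loop. The following is only a minimal sketch of that idea, assuming the date cells are the only td elements with height="18". WriteDateCells is a hypothetical name, and the exact value getAttribute returns for a missing attribute (Null or an empty value) can depend on the IE document mode, so both cases are handled.

```vba
'Sketch: write only the td cells whose height attribute is "18" to column A.
'WriteDateCells is a hypothetical helper; pass it the loaded document.
Sub WriteDateCells(ByVal doc As Object)
    Dim ele As Object
    Dim h As Variant 'Variant so it can hold Null when the attribute is missing
    Dim y As Long

    y = 2
    For Each ele In doc.getElementsByTagName("td")
        h = ele.getAttribute("height")
        'VBA's And does not short-circuit, so the Null check is a separate If
        If Not IsNull(h) Then
            If CStr(h) = "18" Then
                Sheets("Sheet2").Range("A" & y).Value = Trim(ele.innerText)
                y = y + 1
            End If
        End If
    Next ele
End Sub
```

It could be called as WriteDateCells objIE.document after the last wait loop. Filtering on a presentational attribute is brittle, because the match breaks as soon as the page layout changes.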

Edit:

The HTML for each column on the page is as follows. Date column:

<td height="18" align="right"> 14 Aug</td>

home team column:

<td align="right"><b>Arsenal</b></td>

score column:

<td width="45" align="center">
<a class="tooltip2" href="#">
<font color="#0000aa">
<b>3 – 4</b>

away team column:

    <td align="left">
Liverpool
</td>
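
Since each result is one table row with the date, home team, score and away team in consecutive cells, one possible approach is to walk the table row by row instead of collecting every td on the page. The sketch below assumes that "biggest" means the table with the most rows, which may not hold if the page nests tables for layout. CopyBiggestTable is a hypothetical name, and the output location (Sheet2, starting at A2) simply follows the code above.

```vba
'Sketch: copy the table with the most rows to Sheet2, one table cell per sheet cell.
'CopyBiggestTable is a hypothetical helper; pass it the loaded document.
Sub CopyBiggestTable(ByVal doc As Object)
    Dim tbl As Object, biggest As Object
    Dim r As Long, c As Long

    'pick the table with the largest number of rows (assumption: that is the results table)
    For Each tbl In doc.getElementsByTagName("table")
        If biggest Is Nothing Then
            Set biggest = tbl
        ElseIf tbl.Rows.Length > biggest.Rows.Length Then
            Set biggest = tbl
        End If
    Next tbl
    If biggest Is Nothing Then Exit Sub 'no table found on the page

    'HTML rows and cells are zero-based; sheet output starts at row 2, column A
    For r = 0 To biggest.Rows.Length - 1
        For c = 0 To biggest.Rows(r).Cells.Length - 1
            Sheets("Sheet2").Cells(r + 2, c + 1).Value = _
                Trim(biggest.Rows(r).Cells(c).innerText)
        Next c
    Next r
End Sub
```

It could be called as CopyBiggestTable objIE.document once the page has finished loading. Because innerText returns only the text of a cell, the nested a, font and b tags in the score column need no special handling.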


