I’m new to Web Data Extraction, and I’m working on getting property information from the website http://ift.tt/1Af5DvY. It’s an appraisal district website with tax appraisals for all the properties in a county in Texas. It’s a project I’m helping a family member with. My plan was to use Excel and code in vba to extract the data to a spreadsheet(s). As I’ve researched online, I’ve found bits and pieces that have helped me get partially through the task. Here is the code I am using:
Sub DataExtract()
Dim IE As Object ' Internet Explorer Object
Dim URL As String ' URL string
Dim OwnerName As String ' Owner Names
Dim TxtInput As Variant ' input texbox
Dim ieDoc As Object ' the document retrieved
Dim IeTable As Object ' the data table
Dim ieCell As Object ' the data cells in the table
Dim ElementCol As MSHTML.IHTMLElementCollection ' to loop thru element collection
Dim btnInput As MSHTML.HTMLInputElement 'button input
Dim x As Integer, i As Integer, p As Integer 'counters used throughout code
' load names
OwnerName = "a"
' create the IE object
Set IE = CreateObject("InternetExplorer.Application")
' open WadTX
URL = "http://ift.tt/1Af5DvY"
With IE
.Navigate URL
.Visible = True
' loop until the page finishes loading
Do While .Busy
Loop
' enter our names in the Owner Name input box
Set TxtInput = .document.getElementsByName("propertySearchOptions:ownerName")
TxtInput(0).Value = OwnerName
' click 'Search' button
Set ElementCol = .document.getElementsByTagName("input")
For Each btnInput In ElementCol
If btnInput.Type = "submit" And btnInput.Name = "propertySearchOptions:search" Then
btnInput.Click
Exit For
End If
Next btnInput
' loop until the page finishes loading
Do While .Busy
Loop
Sheets("Sheet1").Activate
Range("A1").Select
End With
Set ieDoc = IE.document
'Loop through all the elements in the document via the 'all' property
For i = 0 To ieDoc.all.Length - 1
' find the table with the property details
If TypeName(ieDoc.all(i)) = "HTMLTable" And _
InStr(ieDoc.all(i).innerText, "Property") > 0 Then
Set IeTable = ieDoc.all(i)
' loop thru each row in our table skipping the header
For x = 1 To IeTable.Rows.Length - 1
' loop thru the cells in the row
For p = 0 To IeTable.Rows(x).Cells.Length - 1
Set ieCell = IeTable.Rows(x).Cells(p)
ActiveCell.Offset(x, p).Value = ieCell.innerText
Next
Next
Exit For
End If
Next i
' clean up
IE.Quit
Set IE = Nothing
End Sub
The idea is to use the Owner Name search box, and just type in each letter in the alphabet, one at a time, and ultimately compile a list of all the properties with the corresponding data in Excel. So far, what happens is, I open the URL, input “a” (to pull all owners with last name beginning with “a”), click search, and get the first page (of 97 total pages) of results. I can then pull the data from this table into Excel.
Where I’m stuck is, that this table is just a quick summary of the properties, and each record has a “View Details” link to another page with all of the details of the property (which is what I would like to get). So, the end goal would be to
- Loop through all letters in the alphabet on the Search page
- Loop through all summary pages of the properties under each letter
- Loop through each detail table for each property and copy the data/table to a spreadsheet.
My question is, can this be done, or are there problems that make it difficult/impossible. A side question I have is, is this the best method to accomplish my goal, or is there a better/more efficient way to do this? Would it be easier to extract to a database using some other coding method?
Thanks for your time. Please let me know if you have any questions or need any more info from me.
Aucun commentaire:
Enregistrer un commentaire