Upload
barnaby-obrien
View
227
Download
0
Embed Size (px)
Citation preview
Objective• Use VBA code to extract and store data from web
pages when the underlying database is not accessible.
• Step one – Accessing the web pages
• Step two – Extracting the data
Accessing the Web Pages
• Internet Explorer Object
• WinHttpRequest Object
• XMLHttpRequest object
• Document Object Model (DOM)
• Regular Expressions
Retrieving the Data
IE ObjectDim objIE As ObjectDim varTables As Variant
Set objIE = CreateObject("InternetExplorer.Application")URL = "http://google.com"objIE.Visible = FalseobjIE.navigate URLDo Until Not objIE.Busy DoEventsLoop
While objIE.Document.ReadyState <> "complete"Wend
Set varTables = objIE.Document.all.tags("TABLE")
Advantages• View web page in action• Credentials• Background tasks• DOM
Disadvantages• Hangs up• Can interfere with other browsers• More resources
IE Object
WinHttpRequest Object
Dim winHttpReq As ObjectSet winHttpReq = CreateObject("WinHttp.WinHttpRequest.5.1")
URL = "http://greensboro.usps.gov/Operations/SETIarea/SETI_ReProcessScans.cfm?requesttimeout=5000&SDate=" & ProcDate
winHttpReq.SetTimeouts 6000000, 6000000, 6000000, 6000000winHttpReq.Open "GET", URL, FalsewinHttpReq.SetCredentials “username", “password", HTTPREQUEST_SETCREDENTIALS_FOR_SERVER
winHttpReq.Send result = winHttpReq.responseText Set winHttpReq = Nothing
Advantages• More timeout control• Waits for webpage to complete
Disadvantages• Cannot see what is returned• May need Credentials• No DOM
WinHttpRequest Object
Time Trial
IE Object WinHttpRequest Object
470 497
452 691
414 259
Using two different versions of the Scan Error Tracking program, we ran three separate scans of 542 zip codes. Execution time is in seconds.
DOM vs RegEx• The DOM is preferred when parsing web pages
where the data is in uniform locations, using tables and rows.
• Regular Expressions work best when attempting to find data on a page where the location of the information, or the structure of the page is not known in advance.