Data Connections -- Options, Functionality, and Performance

Objective

• Use VBA code to extract and store data from web pages when the underlying database is not accessible.

• Step one – Accessing the web pages

• Step two – Extracting the data

• Document Object Model (DOM)

• Regular Expressions

Retrieving the Data

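IE Object

    Dim objIE As Object
    Dim varTables As Variant
    Dim URL As String

    Set objIE = CreateObject("InternetExplorer.Application")
    URL = "http://google.com"
    objIE.Visible = False
    objIE.navigate URL

    ' Wait for navigation to finish, yielding to other events while waiting
    Do Until Not objIE.Busy
        DoEvents
    Loop
    While objIE.Document.ReadyState <> "complete"
        DoEvents
    Wend

    ' Collect every TABLE element on the page through the DOM
    Set varTables = objIE.Document.all.tags("TABLE")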

Advantages
• View web page in action
• Credentials
• Background tasks
• DOM

Disadvantages
• Hangs up
• Can interfere with other browsers
• More resources
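One way to limit the "hangs up" problem is to put a deadline on the wait loop. The following is a minimal sketch, not part of the original program: the function name WaitForPage and the timeout value are hypothetical, and it assumes objIE is an InternetExplorer.Application object that has already been told to navigate.

```vba
Option Explicit

' Hypothetical helper: returns True if the page finished loading
' within timeoutSeconds, False if the deadline passed first.
Function WaitForPage(objIE As Object, timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)

    ' 4 = READYSTATE_COMPLETE for the InternetExplorer object
    Do While objIE.Busy Or objIE.ReadyState <> 4
        If Now > deadline Then Exit Function   ' give up; result stays False
        DoEvents
    Loop

    WaitForPage = True
End Function
```

A caller could then skip or retry a page instead of waiting indefinitely, for example: If Not WaitForPage(objIE, 30) Then objIE.Quit.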

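WinHttpRequest Object

    ' Late binding does not expose the WinHTTP constants, so define the one used here
    Const HTTPREQUEST_SETCREDENTIALS_FOR_SERVER As Long = 0

    Dim winHttpReq As Object
    Set winHttpReq = CreateObject("WinHttp.WinHttpRequest.5.1")

    ' ProcDate is assigned elsewhere in the program
    URL = "http://greensboro.usps.gov/Operations/SETIarea/SETI_ReProcessScans.cfm?requesttimeout=5000&SDate=" & ProcDate

    ' Resolve, connect, send, and receive timeouts, in milliseconds
    winHttpReq.SetTimeouts 6000000, 6000000, 6000000, 6000000
    winHttpReq.Open "GET", URL, False   ' False = synchronous request
    winHttpReq.SetCredentials "username", "password", HTTPREQUEST_SETCREDENTIALS_FOR_SERVER

    winHttpReq.Send
    result = winHttpReq.responseText
    Set winHttpReq = Nothing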

Advantages
• More timeout control
• Waits for webpage to complete

Disadvantages
• Cannot see what is returned
• May need Credentials
• No DOM
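Because the WinHttpRequest object never displays the page, it can help to log the HTTP status and write the response to a file for inspection. This is only a sketch under assumptions: the procedure name SaveResponseForReview and the file path argument are hypothetical, and it must be called after Send and before the object is released.

```vba
Option Explicit

' Hypothetical helper: logs the HTTP status and saves the raw response
' so the returned page can be opened and reviewed afterwards.
Sub SaveResponseForReview(winHttpReq As Object, filePath As String)
    Dim fileNum As Integer

    ' Status and StatusText report the server's reply, e.g. 200 and OK
    Debug.Print "HTTP status: " & winHttpReq.Status & " " & winHttpReq.StatusText

    fileNum = FreeFile
    Open filePath For Output As #fileNum
    Print #fileNum, winHttpReq.responseText
    Close #fileNum
End Sub
```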

Time Trial

IE Object    WinHttpRequest Object
470          497
452          691
414          259

Using two different versions of the Scan Error Tracking program, we ran three separate scans of 542 zip codes. Execution time is in seconds.
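The slides do not say how the execution times were measured. One common way to time a run in VBA is the built-in Timer function, sketched below; RunScan is a hypothetical stand-in for whichever version of the scan is being timed.

```vba
Option Explicit

' Hypothetical placeholder for the scan being measured
Sub RunScan()
    ' ... retrieve and parse each page here ...
End Sub

Sub TimeScan()
    Dim startTime As Single
    Dim elapsed As Single

    startTime = Timer              ' seconds elapsed since midnight
    RunScan
    elapsed = Timer - startTime

    ' Timer resets at midnight, so correct a run that crosses it
    If elapsed < 0 Then elapsed = elapsed + 86400

    Debug.Print "Execution time (seconds): " & Format(elapsed, "0")
End Sub
```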

DOM vs RegEx

• The DOM is preferred when parsing web pages where the data is in uniform locations, using tables and rows.

• Regular Expressions work best when attempting to find data on a page where the location of the information or the structure of the page is not known in advance.
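The contrast between the two approaches can be sketched as follows. This is an illustration under assumptions, not the presenters' code: SAMPLE_HTML is made-up markup, the procedure names are hypothetical, and a real program would parse the text returned by one of the retrieval methods instead. The DOM version uses a late-bound "htmlfile" document and the RegEx version uses the VBScript.RegExp object.

```vba
Option Explicit

' Made-up sample markup for illustration only
Private Const SAMPLE_HTML As String = _
    "<table><tr><td>27401</td><td>12</td></tr>" & _
    "<tr><td>27402</td><td>7</td></tr></table>"

' DOM approach: rely on the table structure and walk rows and cells
Sub ParseWithDom()
    Dim doc As Object, row As Object, cell As Object
    Dim rowText As String

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = SAMPLE_HTML

    For Each row In doc.getElementsByTagName("tr")
        rowText = ""
        For Each cell In row.Cells
            rowText = rowText & cell.innerText & vbTab
        Next cell
        Debug.Print rowText
    Next row
End Sub

' RegEx approach: search the raw text for a pattern,
' here any table cell that holds exactly five digits
Sub ParseWithRegEx()
    Dim re As Object, m As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True          ' find every match, not just the first
    re.IgnoreCase = True
    re.Pattern = "<td>(\d{5})</td>"

    For Each m In re.Execute(SAMPLE_HTML)
        Debug.Print m.SubMatches(0)   ' the captured five digits
    Next m
End Sub
```

The DOM version depends on the data sitting in predictable rows and cells, while the RegEx version depends only on what the value looks like, which is why it suits pages whose layout is not known in advance.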
