Data Connections -- Options, Functionality, and Performance


Objective

• Use VBA code to extract and store data from web pages when the underlying database is not accessible.

• Step one – Accessing the web pages

• Step two – Extracting the data


Retrieving the Data

• Document Object Model (DOM)

• Regular Expressions


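IE Object

Dim objIE As Object
Dim varTables As Variant
Dim URL As String

Set objIE = CreateObject("InternetExplorer.Application")
URL = "http://google.com"
objIE.Visible = False
objIE.navigate URL

' Wait until the browser is no longer busy
Do Until Not objIE.Busy
    DoEvents
Loop

' Wait until the document has finished loading; DoEvents keeps the host application responsive
While objIE.Document.ReadyState <> "complete"
    DoEvents
Wend

' Collect every TABLE element on the page
Set varTables = objIE.Document.all.tags("TABLE")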
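Once varTables holds the page's TABLE elements, the cells can be read through the DOM. The following is only a minimal sketch: DumpFirstTable is a hypothetical helper name, and it assumes the data of interest sits in the first table on the page.

```vba
' Hypothetical helper: prints the text of every cell in the first table.
' Assumes varTables is the collection returned by Document.all.tags("TABLE").
Sub DumpFirstTable(ByVal varTables As Object)
    Dim tbl As Object
    Dim r As Long, c As Long

    If varTables.Length = 0 Then Exit Sub      ' no tables on the page

    Set tbl = varTables.Item(0)                ' DOM collections are zero-based

    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows.Item(r).Cells.Length - 1
            ' Row index, column index, and trimmed cell text
            Debug.Print r, c, Trim$(tbl.Rows.Item(r).Cells.Item(c).innerText)
        Next c
    Next r
End Sub
```

Calling DumpFirstTable varTables after the page has loaded writes one line per cell to the Immediate window; in practice the Debug.Print line would likely be replaced by a write to a worksheet or database table.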


IE Object

Advantages
• View web page in action
• Credentials
• Background tasks
• DOM

Disadvantages
• Hangs up
• Can interfere with other browsers
• More resources


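WinHttpRequest Object

' Value of the WinHTTP constant; declared here because the object is late-bound
Const HTTPREQUEST_SETCREDENTIALS_FOR_SERVER = 0

Dim winHttpReq As Object
Dim URL As String
Dim result As String

Set winHttpReq = CreateObject("WinHttp.WinHttpRequest.5.1")

' ProcDate is assumed to be set earlier in the program
URL = "http://greensboro.usps.gov/Operations/SETIarea/SETI_ReProcessScans.cfm?requesttimeout=5000&SDate=" & ProcDate

' Resolve, connect, send, and receive timeouts, in milliseconds
winHttpReq.SetTimeouts 6000000, 6000000, 6000000, 6000000
winHttpReq.Open "GET", URL, False
winHttpReq.SetCredentials "username", "password", HTTPREQUEST_SETCREDENTIALS_FOR_SERVER

winHttpReq.Send
result = winHttpReq.responseText
Set winHttpReq = Nothing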
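Because the WinHttpRequest object returns nothing visible, it can help to check the HTTP status before trusting responseText. The sketch below is one possible way to wrap the request; FetchPage is a hypothetical function name and the timeout values are illustrative assumptions, not the ones used in the Scan Error Tracking program.

```vba
' Hypothetical wrapper: returns the page body, or "" when the status is not 200.
Function FetchPage(ByVal pageUrl As String) As String
    Dim req As Object
    Set req = CreateObject("WinHttp.WinHttpRequest.5.1")

    ' Resolve, connect, send, and receive timeouts in milliseconds (illustrative values)
    req.SetTimeouts 5000, 5000, 30000, 30000

    req.Open "GET", pageUrl, False      ' False = synchronous request
    req.Send                            ' raises a run-time error on timeout

    If req.Status = 200 Then FetchPage = req.responseText

    Set req = Nothing
End Function
```

A caller could then write result = FetchPage(URL) and treat an empty string as a failed request. Since Send raises a run-time error when a timeout expires, a production version would probably add an On Error handler around it.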


WinHttpRequest Object

Advantages
• More timeout control
• Waits for webpage to complete

Disadvantages
• Cannot see what is returned
• May need Credentials
• No DOM


Time Trial

Using two different versions of the Scan Error Tracking program, we ran three separate scans of 542 zip codes. Execution time is in seconds.

Scan    IE Object    WinHttpRequest Object
1       470          497
2       452          691
3       414          259


DOM vs RegEx

• The DOM is preferred when parsing web pages where the data is in uniform locations, using tables and rows.

• Regular Expressions work best when attempting to find data on a page where the location of the information, or the structure of the page, is not known in advance.
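To illustrate the Regular Expressions option, here is a minimal sketch that searches raw page text, such as the responseText returned by the WinHttpRequest object, without relying on page structure. FirstZip is a hypothetical helper, and the 5-digit pattern is an assumption chosen only because the scans are organized by zip code.

```vba
' Hypothetical helper: returns the first 5-digit number found in html, or "" if none.
Function FirstZip(ByVal html As String) As String
    Dim re As Object
    Dim matches As Object

    ' Late-bound VBScript regular expression engine
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\b(\d{5})\b"      ' exactly five digits bounded by non-word characters
    re.Global = False               ' stop after the first match

    Set matches = re.Execute(html)
    If matches.Count > 0 Then FirstZip = matches.Item(0).SubMatches(0)
End Function
```

For example, FirstZip("Scans for 27401 reprocessed") returns "27401". Setting Global to True and looping over the matches collection would collect every occurrence instead of only the first.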