DevTools to crawl Webpages.
DevTools
09.05.12 @chrschneider
… Apache … toolset of low level Java components focused on HTTP and associated protocols.“
● HttpComponents Core… is a set of low level HTTP transport components
● HttpComponents Client… provides reusable components for client-side ... HTTP connection management.
● HttpComponents AsyncClient (DEV)… ability to handle a great number of concurrent connections ... more ... performance in terms of a raw data throughput.
● Commons HttpClient (Legacy)… All users of Commons HttpClient 3.x are strongly encouraged to upgrade to HttpClient 4.1.
HttpComponents Client
Example Components
● Get, Post, Delete, … Request Objects
● Cookie Manager
● SSL
● Content Encoding Aware
● HTTP Authentication (Basic, Digest, ...)
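import org.apache.http.client.HttpClient;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;

public final static void main(final String[] args) throws Exception
{
    final HttpClient httpclient = new DefaultHttpClient();
    try
    {
        final HttpGet httpget = new HttpGet("http://www.google.com/");
        System.out.println("executing request " + httpget.getURI());

        // Create a response handler that returns the response body as a String
        final ResponseHandler<String> responseHandler = new BasicResponseHandler();
        final String responseBody = httpclient.execute(httpget, responseHandler);
        System.out.println("----------------------------------------");
        System.out.println(responseBody);
        System.out.println("----------------------------------------");
    }
    finally
    {
        // Release all connections held by the client
        httpclient.getConnectionManager().shutdown();
    }
}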
http://hc.apache.org/httpcomponents-client-ga/examples.html
HttpComponents Client Example
HttpComponents Client
Demo
… is an asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients.
See: http://netty.io/
… is a "GUI-Less browser for Java programs"
Features (excerpt):
● Support for the HTTP and HTTPS protocols
● Support for cookies
● Ability to specify whether failing responses from the server should throw exceptions or should be returned as pages of the appropriate type (based on content type)
● Ability to customize the request headers being sent to the server
● Support for HTML responses
● Support for submitting forms
● Support for clicking links
● Support for walking the DOM model of the HTML document
● JavaScript support
… is a "GUI-Less browser for Java programs"
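import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

@Test
public void homePage() throws Exception
{
    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");

    System.out.println(page.getTitleText());
    assertEquals("Welcome to HtmlUnit", page.getTitleText());

    // The page is available both as XML markup and as rendered text
    final String pageAsXml = page.asXml();
    assertTrue(pageAsXml.contains("<body class=\"composite\">"));

    final String pageAsText = page.asText();
    assertTrue(pageAsText.contains("Support for the HTTP and HTTPS protocols"));

    webClient.closeAllWindows();
}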
http://htmlunit.sourceforge.net/gettingStarted.html
… is a "GUI-Less browser for Java programs"
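import org.junit.Test;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlDivision;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

@Test
public void getElements() throws Exception
{
    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage("http://some_url");

    // Look up elements directly by id or by anchor name
    final HtmlDivision div = page.getHtmlElementById("some_div_id");
    final HtmlAnchor anchor = page.getAnchorByName("anchor_name");

    webClient.closeAllWindows();
}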
Luxury :)
http://htmlunit.sourceforge.net/gettingStarted.html
Note: HTML tables are supported as well. There are easy wrapper classes to walk through them. … Handy!
http://htmlunit.sourceforge.net/table-howto.html
… automates browsers. That's it.
Selenium-WebDriver supports the following browsers along with the operating systems these browsers are compatible with.
● Google Chrome 12.0.712.0+
● Internet Explorer 6, 7, 8, 9 - 32 and 64-bit where applicable
● Firefox 3.0, 3.5, 3.6, 4.0, 5.0, 6, 7
● Opera 11.5+
● HtmlUnit 2.9
● Android – 2.3+ for phones and tablets (devices & emulators)
● iOS 3+ for phones (devices & emulators) and 3.2+ for tablets (devices & emulators)
… automates browsers. That's it.
Selenium IDE
Selenium WebDriver
Selenium Grid
The Selenium Family
Also C#, Python, Ruby, ...
Also on Windows and Mac
… automates browsers. That's it.
Selenium IDE
Selenium WebDriver
Selenium Grid
The Selenium Family
… create quick bug reproduction scripts
… create scripts to aid in automation-aided exploratory testing
… create robust, browser-based regression automation
… scale and distribute scripts across many environments
http://seleniumhq.org/
Requirements for Selenium WebDriver with Firefox (and HtmlUnit)
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>2.21.0</version>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-htmlunit-driver</artifactId>
    <version>2.21.0</version>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-firefox-driver</artifactId>
    <version>2.21.0</version>
</dependency>
Dependencies, Browser Binaries
That's it.
Basic Selenium example
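import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

@Test
public void testSeleniumWithFirefox() throws InterruptedException
{
    final WebDriver webDriver = new FirefoxDriver();

    webDriver.get("http://www.majug.de");

    final WebElement veranstaltungenLink = webDriver.findElement(By.linkText("Veranstaltungen"));
    veranstaltungenLink.click();

    // Pause so the result stays visible, then close the browser
    Thread.sleep(5000);
    webDriver.quit();
}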
Selenium WebDriver Locator Strategies
It is also possible to call findElements(...) to get a List of WebElements:
List<WebElement> hits = webDriver.findElements(By.tagName("a"));
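A minimal sketch of the common locator strategies, assuming selenium-java and a local Firefox installation are available. The id, name, class and selector values below are hypothetical and only illustrate the shape of each call. findElements(...) is used throughout because it returns an empty list instead of throwing when nothing matches.

```java
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class LocatorSketch {

    public static void main(String[] args) {
        WebDriver webDriver = new FirefoxDriver();
        try {
            webDriver.get("http://www.majug.de");

            // One By instance per locator strategy; all values except the
            // link text are made up for illustration.
            By[] strategies = {
                By.id("content"),
                By.name("q"),
                By.className("menu"),
                By.tagName("a"),
                By.linkText("Veranstaltungen"),
                By.partialLinkText("Veranst"),
                By.cssSelector("div a"),
                By.xpath("//a[@href]")
            };

            for (By by : strategies) {
                List<WebElement> hits = webDriver.findElements(by);
                System.out.println(by + " -> " + hits.size() + " hit(s)");
            }
        } finally {
            // Always close the browser, even if a lookup fails
            webDriver.quit();
        }
    }
}
```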
Selenium WebDriver Interactions
Once you have a WebElement, you can ...
● webElement.click() it
● webElement.sendKeys(...) to it
● webElement.submit() on it.
It is also possible to perform "Actions" like DoubleClick, DragAndDrop, ClickAndHold, … with the "Actions" class.
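The interactions above could look roughly like this. It is a sketch only: the URL is a placeholder and the element names and ids ("q", "draggable", "droppable") are hypothetical, so it will not run against a real page without adapting them. It assumes selenium-java and a local Firefox installation.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.interactions.Actions;

public class InteractionSketch {

    public static void main(String[] args) {
        WebDriver webDriver = new FirefoxDriver();
        try {
            webDriver.get("http://some_url"); // placeholder URL

            // Simple interactions directly on a WebElement
            WebElement searchField = webDriver.findElement(By.name("q")); // hypothetical name
            searchField.sendKeys("selenium");
            searchField.submit();

            // Composite interactions are built with the Actions class
            // and only executed when perform() is called.
            WebElement source = webDriver.findElement(By.id("draggable")); // hypothetical id
            WebElement target = webDriver.findElement(By.id("droppable")); // hypothetical id

            new Actions(webDriver).doubleClick(source).perform();
            new Actions(webDriver).dragAndDrop(source, target).perform();
            new Actions(webDriver)
                    .clickAndHold(source)
                    .moveToElement(target)
                    .release()
                    .perform();
        } finally {
            webDriver.quit();
        }
    }
}
```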
Selenium WebDriver
Demo
Selenium WebDriver Pitfalls
Newbie Pitfalls:
● Selenium doesn't wait until the whole site is loaded (keyword: implicit wait).
● webElement.findElement(By.xpath("//...")) starts from the root of the DOM (use ".//..." instead).
● Google brings up "Selenium RC" solutions. This is the old Selenium project.
● A reference to a WebElement becomes invalid if the driver "moves" to another page.
● Firefox doesn't run on our CI because it is a headless system (try Xvfb).
● New XPath 2.0 functions (like ends-with(...)) fail, because Selenium uses the driver's native XPath engine. For Firefox this currently means XPath 1.0.
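The two XPath pitfalls can be reproduced without a browser. The sketch below uses the JDK's built-in javax.xml.xpath engine, which also implements XPath 1.0, as a stand-in for the browser's engine; it is not Selenium code. The sample markup, the class name and the helper names are made up for illustration. It shows that "//a" ignores the context element while ".//a" respects it, and one possible XPath 1.0 replacement for ends-with(...).

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathPitfalls {

    // Made-up sample markup: two links inside the menu, one in the footer.
    public static final String PAGE =
            "<html><body>"
            + "<div id='menu'><a href='a.html'>A</a><a href='b.pdf'>B</a></div>"
            + "<div id='footer'><a href='c.pdf'>C</a></div>"
            + "</body></html>";

    // XPath 1.0 replacement for the XPath 2.0 call ends-with(@href, '.pdf').
    public static final String ENDS_WITH_PDF =
            "//a[substring(@href, string-length(@href) - string-length('.pdf') + 1) = '.pdf']";

    /** Parses the sample markup into a DOM document. */
    public static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the menu div, which plays the role of the WebElement. */
    public static Node menu(Document doc) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("//div[@id='menu']", doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the nodes matched by the expression, evaluated relative to the context node. */
    public static int count(String expression, Node context) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse();
        Node menu = menu(doc);

        // "//a" searches the whole document even though the context is the menu
        System.out.println("//a from menu:  " + count("//a", menu));
        // ".//a" only searches below the context node
        System.out.println(".//a from menu: " + count(".//a", menu));
        // Links ending in .pdf, expressed with XPath 1.0 functions only
        System.out.println("pdf links:      " + count(ENDS_WITH_PDF, doc));
    }
}
```

With the sample markup, the first expression matches all three links, the second only the two inside the menu, and the third the two PDF links.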
Any questions? Thank you for your attention!