p27 Billig


  • 7/28/2019 p27 Billig


    Evaluation of Google Hacking

    Justin Billig
    Department of Computer Science
    Northern Kentucky University
    Highland Heights, KY 41099
    (859)[email protected]

    Yuri Danilchenko
    Department of Computer Science
    Northern Kentucky University
    Highland Heights, KY 41099
    (859)[email protected]

    Charles E. Frank
    Department of Computer Science
    Northern Kentucky University
    Highland Heights, KY 41099
    (859)[email protected]

    ABSTRACT
    Google Hacking uses the Google search engine to locate sensitive information or to find vulnerabilities that may be exploited. This paper evaluates how much effort it takes to get Google Hacking to work and how serious the threat of Google Hacking is. The paper also discusses the countermeasures that can be used against Google Hacking.

    Categories and Subject Descriptors
    K.6.5 [Management of Computing and Information Systems]: Security and Protection - authentication, unauthorized access.

    General Terms
    Security.

    Keywords
    Information security, web security, hacking, Google Hacking, information assurance.

    1. INTRODUCTION
    Wikipedia [7] defines Google Hacking as the art of creating complex search engine queries in order to filter through large amounts of search results for information related to computer security. In its malicious format it can be used to detect websites that are vulnerable to numerous exploits and vulnerabilities as well as locate private, sensitive information about others, such as credit card numbers, social security numbers, and passwords. This filtering is performed by using advanced Google operators.

    Attackers can use Google Hacking to uncover sensitive information about a company or to uncover potential security vulnerabilities. A security professional can use Google Hacking to determine if their websites are disclosing sensitive information.

    Northern Kentucky University is a regional state university with 15,000 students. We performed a Google Hacking security assessment of our university. In a few cases, we tried some of the Google Hacking techniques more widely on the Internet. This allowed us to determine whether various Google hacks actually work. Often, techniques that worked in the past no longer work, as vulnerabilities are patched.

    We tried to determine how much effort it took to perform various Google hacks. This was done purely for research purposes. We never intended to use any sensitive information or potential security vulnerabilities maliciously. We have disclosed potential issues to the security staff at our university.

    In this paper, we assess the seriousness of information disclosure using Google Hacking and make recommendations on what can be done to defend against Google hackers.

    2. BACKGROUND

    The definitive source for information about Google Hacking is Long [5]. This book provides background on Google queries and advanced operators. It has chapters on locating information on the Web in various types of documents, on locating exploit code and finding vulnerable targets, and on how to search for usernames, passwords, and social security numbers. This book is a must-read for security professionals wishing to protect their websites from disclosing information to Google hackers.

    A second important source is Johnny Long's website [3]. Its Google Hacking Database [2] contains a large number of Google searches organized by category. The categories include "Files containing passwords", "Pages containing login portals", and "Sensitive directories". A user can try a Google search in the database by simply clicking on a link.

    We were able to find only one paper on Google Hacking in the academic literature. Lancor and Workman [4] describe incorporating Google Hacking into a graduate course on web security. This paper serves as a good introduction to Google Hacking. It describes a series of exercises used to teach students how to use Google Hacking to test their own sites and how to defend against it.

    3. TECHNIQUES

    We mostly limited our Google Hacking activities to Northern Kentucky University. We occasionally extended our searches to other educational sites in the US, and a few network device searches required a broader domain still. Our main goal while performing the searches was to check which Google hacks actually work. We found these hacks on the Internet, in Johnny Long's book [5], and in his Google Hacking Database [2].

    This information is critical to understanding how vulnerable we really are to Google Hacking. Are websites protecting information against Google Hacking? Sadly, most of the examples of unprotected sensitive information were found within our own university and did not require a substantial amount of time to find.

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
    InfoSecCD Conference '08, September 26-27, 2008, Kennesaw, GA, USA. Copyright 2008 ACM 978-1-60558-333-4/00/0006 $5.00.

    Google Hacking turned out to be a very powerful and flexible hacking approach. Many of the most powerful hacks we found did not quite work as published. In most cases, however, if we spent enough time analyzing the target and understanding how the queries found information, we were able to tweak the original query by changing the parameters or the advanced operators and find information similar to what the original query targeted.

    We found it very helpful to use Google cached pages while performing Google hacks. Google crawls web pages and stores a copy of them on its own servers. We used Google cached pages to anonymously browse a target's site without sending a single packet to its server.

    Google stores most of the content of the pages it crawls, but omits images and some other space-consuming media. When we viewed Google cached pages by simply clicking on the "cached" link on the results page, we ended up connecting to the target's server to get the rest of the page content. This might reveal our Google Hacking to the target website. We added the &strip=1 parameter to the URL to tell Google to return only crawled content and not connect to the target's server to get any information.
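    As an illustration of the &strip=1 step described above, the following minimal Python sketch builds a text-only cache URL. Only the strip=1 parameter comes from our description; the endpoint and the "cache:" query prefix are assumptions about how cache links were formed, and the function name is hypothetical. The sketch only constructs a string and sends no request.

```python
from urllib.parse import urlencode

# Assumption: endpoint and "cache:" prefix are illustrative, not taken from the paper.
CACHE_ENDPOINT = "http://www.google.com/search"


def cached_text_only_url(page_url: str) -> str:
    """Build a cache URL for page_url with strip=1 appended.

    strip=1 asks for crawled text only, so that viewing the cached copy
    does not pull images or other media from the target's server.
    """
    query = urlencode({"q": "cache:" + page_url, "strip": "1"})
    return CACHE_ENDPOINT + "?" + query


if __name__ == "__main__":
    # www.example.edu is a placeholder host.
    print(cached_text_only_url("www.example.edu/index.html"))
```

    Keeping the parameter handling in one function makes it harder to forget strip=1 when checking many cached pages during an assessment.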

    A system administrator might decide to prevent access to a certain part of the site by moving it, protecting it with a password, or simply shutting down the server. What administrators often do not realize is that the information they are trying to protect may still exist on Google's servers and can be accessed through cached pages. This allowed us to view data that had been removed from websites [2, p. 88].

    4. GOOGLE HACKING

    According to Johnny Long's Google Hacking Database [2], there are roughly fourteen categories of Google hacks. This paper looks at five of them: Error Messages, Open Directories, Documents & Files, Network Devices, and Personal Information Gathering.

    4.1 Error Messages
    Error messages provide a wealth of information. Developers use these error messages to pinpoint where their code has gone wrong. Unfortunately for web administrators, error messages that are open to the world provide that same information to anyone who knows how to look for them. Database error messages can reveal information such as usernames, passwords, and server names.

    Here is an example of a query for MySQL error messages that reveal the username for a MySQL database to the searcher.

    "Warning: mysql_connect(): Access denied for user: '*@*" "on line" -help forum

    Here is another example, a query for error messages that expose SQL query information.

    "You have an error in your SQL syntax near" + inurl:.edu


    4.2 Open Directories

    Google's web bots crawl pages in a site that a web administrator may not want to be catalogued. Most sites stop users from browsing their directory structure, but not all websites are set up correctly.

    A simple Google search can provide a wealth of information. Directory browsing allows someone to see all the files you have on your web server. Much important company information is stored in server directories. Leaving those directories accessible to outsiders can compromise the entire company's line of defense and make hackers' lives far too easy.

    A search of "intitle:index of" returns a list of sites that allow directory browsing. Often this search reveals all kinds of information. Not only does it give a potential hacker access to all of your files, but index pages often also reveal information such as the operating system and web server software. This information gives a hacker a roadmap to the vulnerabilities you may have.

    A simple Google search like "intitle:index of + solutions" can potentially give students access to solutions. By adding a site search parameter (site:some_university.edu), we were able to obtain solution manuals for a science department, potentially allowing students to cheat on class assignments.

    In one of the results returned by an "intitle:index of" query, we found a directory listing that contained a screenshot of a university's financial management system. One of the most popular hacking techniques used with directory listings is the "directory traversal" technique. This technique refers to modifying parts of the originally found URL in order to access other directories on the server. These may not be accessible to direct Google searches. For example, if you found a relative URL /cs/accounting/admin/jerryb, you can start removing parts of the original URL in order to access parent directories such as admin or accounting, or you could replace some parts of the URL with likely directory names, such as hr [5, p. 109].
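    The "removing parts of the URL" step above can be sketched in a few lines of Python. This is a minimal illustration for testers auditing their own servers: the function name is hypothetical, and the sketch only computes candidate parent paths from a string without contacting any server.

```python
import posixpath


def parent_directories(url_path: str) -> list[str]:
    """List each ancestor directory of url_path, nearest first, ending at '/'."""
    parents = []
    current = url_path.rstrip("/")
    while current not in ("", "/"):
        # Drop the last path component to move one level up.
        current = posixpath.dirname(current)
        parents.append(current if current.endswith("/") else current + "/")
    return parents


if __name__ == "__main__":
    for candidate in parent_directories("/cs/accounting/admin/jerryb"):
        print(candidate)
```

    For the example path in the text, the candidates are /cs/accounting/admin/, /cs/accounting/, /cs/, and /. A tester would then check which of these return a directory listing on a server they are authorized to assess.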

    Starting from the financial management system documentation, we used the directory traversal technique to reach parent directories of the original search result. As we browsed through these directories, we found the complete documentation on managing and using that university's financial system. Screenshots contained some user IDs and potentially valid names of the university's funds. Such information might be used by hackers to attack the university. Penetration testers should use this technique to determine whether sensitive company information is being exposed on the web.

    4.3 Documents & Files
    4.3.1 Office Documents
    Website administrators do not always think about how a search engine will crawl their site when they build it. People will put sensitive files on their website without thinking. Word documents, Excel spreadsheets, and Access databases can hold a wealth of information.

    Companies may store sensitive information, such as financial reporting or human resources documentation, on their websites in spreadsheets. By searching Google using the simple query site:some_university.edu intitle:index.of .xls, we found several Microsoft Excel files stored within directory listings. We found the equipment spending master list of a university department. This file contained equipment purchases with vendor and price information. Another Excel file from the same department contained faculty salaries. This information should not be publicly obtainable through a simple Google search.

    4.3.2 WS_FTP Logs
    Another source of information is log files [6]. By default, WS_FTP creates a WS_FTP.log file on the web server. This file contains a wealth of sensitive information, such as usernames, file directories, file names, times of file uploads and downloads, and web server usage information. This information can save hackers a lot of time in their attempt to attack a company's website.

    The query site:some_university.edu index.of ".log" returned many results. Among these was a link to a WS_FTP.log file in the file directory of a university's physics and geology department, which listed dates and times of file uploads made with the WS_FTP client. This file disclosed usernames and names of file directories. WS_FTP.log files contain information about file transfers to and from FTP servers.

    4.3.3 Source Code
    The source code of a computer program can contain large amounts of sensitive information. Source code can show how the system was implemented and how the database is accessed. Code can contain passwords, server names, database table and field names, and directories.

    Many companies are still not using any version control or professional backup solutions for their source code. As a result, programmers back up their code by making copies of their files with extensions such as .bak, .bak2, or .bak3. Web servers may contain pages like MyCode.asp.bak. What programmers do not realize is that these code files may be retrieved from the web server. Web servers decide how to handle a page based on its file extension. The web server has no idea how to process these backup files, and will serve them as plain text. That means that all of the code is now exposed to the user, perhaps revealing sensitive information [5, p. 112].
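    One way an administrator might look for such backup copies before a search engine finds them is to scan the web root on the server itself. The following is a minimal sketch under stated assumptions: the function name and the example path are hypothetical, and the suffix list contains only the three extensions named above, so a real deployment would likely extend it.

```python
from pathlib import Path

# Only the extensions named in the text; extend as needed for your environment.
BACKUP_SUFFIXES = (".bak", ".bak2", ".bak3")


def find_backup_files(web_root: str) -> list[Path]:
    """Return files under web_root whose final extension marks a backup copy."""
    root = Path(web_root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in BACKUP_SUFFIXES
    )


if __name__ == "__main__":
    # "/var/www/html" is a placeholder; point this at your own web root.
    for path in find_backup_files("/var/www/html"):
        print(path)
```

    Because Path.suffix returns only the last extension, a file such as MyCode.asp.bak is matched by ".bak". Any file reported should be moved out of the served directory tree.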

    By using the simple query site:.edu index.of asp.bak, we found many such pages on university websites. These included backed-up ASP pages from the careers site of one university. Other domains can be searched by simply replacing .edu with another domain such as .com.

    4.4 Network Devices

    You can find much more than just documents on the Internet. There are also many types of devices, interactive environments, collaboration tools, and social networks. Devices accessible through the Internet are a very popular target for hackers. Being able to control printers, web cameras, and network routers can be useful in planning an attack on a company. It is important that penetration testers understand those threats and protect companies against them.


    For the convenience of their employees, companies may put hardware devices online. With the increase in telecommuting, this is happening more and more. There are countless devices online, and the Google Hacking Database [2] provides users with queries to find them.

    4.4.1 WebCams
    The first type of device that rookie Google hackers will attempt to find is webcams. Simple searches like

    camera linksys inurl:main.cgi

    reveal web pages that belong to Linksys web cameras. Other queries like

    inurl:"ViewerFrame?Mode=" + inurl:.edu

    allintitle: Axis 2.10 OR 2.12 OR 2.30 OR 2.31 OR 2.32 OR 2.33 OR 2.34 OR 2.40 OR 2.42 OR 2.43 "Network Camera "

    also provide users with information about cameras.

    Webcam information may not seem very interesting, considering that webcams themselves are designed to be shown on the web. Some webcam owners put their devices online but do not share the URL for the device, except with a certain set of people. This security through obscurity does not hold up very well against Google. The Google bots crawl all accessible pages indiscriminately. One specific webcam we found allowed the user to control the camera's direction, tilt, zoom, and display size. Another example that we found was a webcam at a construction site, which showed so much detail that we could read license plate numbers.

    4.4.2 Routers and Firewalls
    Routers and hardware firewalls are connected to the Internet to allow remote administration. These devices are almost always password protected by system administrators. Unfortunately, some companies keep the default login and password. The login pages of such devices are easily found by using these Google queries.

    intitle:"Main page - SmoothWall Express"

    intitle:"Smoothwall Express" inurl:cgi-bin "up * days"

    The queries rely on the information in the title of the SmoothWall Express firewall client to find the administrative login pages for the device. In Johnny Long's Google Hacking Database [2], the second query is listed as a query for finding the administrative login page for the device. We found that the second query does not return results.

    4.4.3 Network Printers
    Finally, network printers are also available online. Many of these are password protected, but often they are available to anyone.

    intext:"MaiLinX Alert (Notify)" -site:networkprinters.com


    4.5 Personal Information Gathering
    4.5.1 Email Address Harvesting
    A simple search such as site:nku.edu + @ will return all web pages in that domain that have the @ sign on the page. This query gives a spammer a legal means to gather countless email addresses.

    While the Google Terms of Service prohibit users from using tools that will automatically query websites, you can create a simple program that will use a simple Google query to return a list of pages that have email addresses. Using screen scraping and regular expressions, this kind of program can be written in no time. An example program that we wrote can be found at [1]. Once you have harvested the email addresses, you can run a simple telnet session and use the Gmail servers to validate them.

    telnet

    open gmail-smtp-in.l.google.com 25

    HELO test

    MAIL FROM:
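    The regular-expression step mentioned above can be sketched as follows. This is not the program at [1]; it is a minimal illustration of how an administrator might check the text of their own pages for exposed addresses. The pattern is a deliberately simple approximation of email syntax, and the function name is hypothetical.

```python
import re

# Simplified pattern: local part, "@", domain labels, and a top-level domain.
# It is an approximation and will miss some valid addresses.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_text: str) -> list[str]:
    """Return the unique email addresses in page_text, in order of first appearance."""
    seen = {}
    for match in EMAIL_PATTERN.findall(page_text):
        # Lowercase so the same address in different cases is reported once.
        seen.setdefault(match.lower(), None)
    return list(seen)


if __name__ == "__main__":
    sample = "Contact jdoe@example.edu or JDoe@example.edu for details."
    print(extract_emails(sample))
```

    Running this over the pages of your own site shows which addresses a harvester could collect, which helps decide where to replace plain addresses with contact forms.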

    4.5.2 Shipment Tracking Information
    In the past few years, online shipment tracking systems have become very popular. People enjoy checking the status of their shipments online in real time. But how secure is that information? We tried searching for UPS tracking information using the Google query site:ups.com intitle:"Ups Package tracking" intext:"1Z ### ### ## #### ### #" posted on Johnny Long's Google Hacking Database [2]. The original query no longer worked, but that does not mean that the information is not there.

    By simply going to the UPS website and opening the shipment tracking page, we found that the URL of the shipment tracking site had changed since the original query had been posted, and so had the format of the tracking number. By updating the URL and removing the tracking number format from the query, we obtained a cleaner and simpler query that works: "In Transit" site:wwwapps.ups.com. This query can be adjusted to filter down to the information you need. The new query returns a substantial number of pages with tracking information for UPS packages that are currently in transit. This information can be used to track all incoming UPS packages for a selected address, perhaps to steal a package. Surely, most people would not be happy that this kind of information is available through a simple Google query.


    5. PROTECTING AGAINST GOOGLE HACKING

    Google Hacking is well documented and easy to learn. It is very important for security professionals to protect their companies against Google Hacking. To protect your site against Google Hacking, you need to establish a solid security policy on what information can be put on the web. Security professionals should perform Google Hacking against their own websites to check for sensitive information disclosure. There is no 100% protection against Google Hacking, but strong policies and testing can improve the security of your site.

    Security professionals need to learn Google Hacking to provide a good level of protection for their sites. As you become more familiar with manual hacks, you can start using some of the automated Google Hacking tools. Automating your hacks helps you check every page within your site. Automated tools also allow for periodic security checks at a frequency that is simply impossible to achieve with manual hacks.

    There are different routes you can take with automated Google Hacking tools. You can use some of the pre-built automated tools, or take advantage of the Google API and build your own Google Hacking tool. Pre-built automated Google Hacking tools, such as Johnny Long's Gooscan [5, pp. 489-499], are very good for many common hacks and will save you time. If you need something more customized, you may need to implement your own tool using the Google API.
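    As a starting point for a custom tool of the kind described above, the following sketch generates a checklist of queries restricted to a single domain. The templates are adapted from queries used earlier in this paper by adding or substituting a site: restriction; the names QUERY_TEMPLATES and build_audit_queries are hypothetical. The sketch deliberately stops at producing query strings and makes no calls to any search API, whose interface we do not assume here.

```python
# Templates adapted from the queries in Section 4, restricted to one domain.
QUERY_TEMPLATES = [
    "site:{domain} intitle:index.of .xls",
    'site:{domain} index.of ".log"',
    "site:{domain} index.of asp.bak",
    'site:{domain} "You have an error in your SQL syntax near"',
]


def build_audit_queries(domain: str) -> list[str]:
    """Return one query per template, each limited to the given domain."""
    return [template.format(domain=domain) for template in QUERY_TEMPLATES]


if __name__ == "__main__":
    # some_university.edu is the placeholder domain used in the paper.
    for query in build_audit_queries("some_university.edu"):
        print(query)
```

    Keeping the templates in a plain list makes it easy to add new checks as new hacks are published, and the generated list can be run by hand or fed to whatever search interface your organization is permitted to use.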


    6. CONCLUSION

    While Google Hacking does not necessarily follow the standard definition of hacking, it can prove just as fruitful. By using Google, you can gain access to information that may otherwise be hidden. The information that you gather using these hacks may allow you to gain access to systems or devices.

    The hacks work because Google indiscriminately stores information when its web spiders crawl the Internet. By using the advanced operators, you can view this information. Google makes it extremely easy to find this information. Those with more computer knowledge will have a shorter learning curve, but it will not take long for even a novice Internet user to master these techniques.

    Security professionals can address the problem of Google Hacking in a manner similar to addressing other security issues. 1) They can use Google Hacking to test their Web sites for sensitive information disclosure. 2) They can educate employees concerning what information should not be put on the Internet. 3) They can also implement enforceable policies to ensure employee compliance.

    7. REFERENCES
    [1] Email Address Harvesting, http://www.nku.edu/~frank/FindEmailAddresses.htm.
    [2] Google Hacking Database Web Site, http://johnny.ihackstuff.com/ghdb.php.
    [3] Johnny Long's Web Site, http://johnny.ihackstuff.com/.
    [4] Lancor, L. and Workman, R., Using Google Hacking to Enhance Defense Strategies. SIGCSE Bull. 39, 1 (Mar. 2007), 491-495. DOI= http://doi.acm.org/10.1145/1227504.1227475.
    [5] Long, J., Google Hacking for Penetration Testers, Vol. 2, Syngress Press, 2008.
    [6] Neohapsis Archive ws_ftp.log, http://archives.neohapsis.com/archives/fulldisclosure/2004-08/0663.html.
    [7] Wikipedia Google Hacking Web Site, http://en.wikipedia.org/wiki/Google_Hacking.
