7/28/2019 p27 Billig
1/6
Evaluation of Google Hacking
Justin Billig
Department of Computer Science
Northern Kentucky University
Highland Heights, KY 41099
(859)[email protected]
Yuri Danilchenko
Department of Computer Science
Northern Kentucky University
Highland Heights, KY 41099
(859)[email protected]
Charles E. Frank
Department of Computer Science
Northern Kentucky University
Highland Heights, KY 41099
(859)[email protected]
ABSTRACT
Google Hacking uses the Google search engine to locate sensitive
information or to find vulnerabilities that may be exploited. This
paper evaluates how much effort it takes to get Google Hacking to
work and how serious the threat of Google Hacking is. The paper
also discusses the countermeasures that can be used against Google
Hacking.
Categories and Subject Descriptors
K.6.5 [Management of Computing and Information Systems]:
Security and Protection - authentication, unauthorized access.
General Terms
Security.
Keywords
Information security, web security, hacking, Google Hacking,
information assurance.
1. INTRODUCTION
Wikipedia [7] defines Google Hacking as the art of creating
complex search engine queries in order to filter through large
amounts of search results for information related to computer
security. In its malicious format it can be used to detect websites
that are vulnerable to numerous exploits and vulnerabilities as
well as locate private, sensitive information about others, such as
credit card numbers, social security numbers, and passwords. This
filtering is performed by using advanced Google operators.
Attackers can use Google Hacking to uncover sensitive
information about a company or to uncover potential security
vulnerabilities. A security professional can use Google Hacking
to determine if their websites are disclosing sensitive information.
Northern Kentucky University is a regional state university with
15,000 students. We performed a Google Hacking security assessment
of our university. In a few cases, we tried some of the Google
Hacking techniques more widely on the Internet. This allowed us
to determine whether various Google hacks actually work. Often,
techniques that worked in the past no longer work because the
vulnerabilities have been patched.
We tried to determine how much effort it took to perform various
Google hacks. This was done purely for research purposes. We
never intended to use any sensitive information or potential
security vulnerabilities maliciously. We have disclosed potential
issues to the security staff at our university.
In this paper, we assess the seriousness of information disclosure
through Google Hacking and recommend what can be done to defend
against Google hackers.
2. BACKGROUND
The definitive source for information about Google Hacking is
Long [5]. This book provides background on Google queries and
advanced operators. It has chapters on locating information on
the Web in various types of documents, locating exploit code and
finding vulnerable targets, and searching for usernames,
passwords, and social security numbers. This book is a must-read
for security professionals wishing to protect their websites from
disclosing information to Google hackers.
A second important source is Johnny Long's website [3]. Its
Google Hacking Database [2] contains a large number of Google
searches organized by category. The categories include "Files
containing passwords", "Pages containing login portals", and
"Sensitive directories". A user can try a Google search in the
database by simply clicking on a link.
We were only able to find one paper on Google Hacking in the
academic literature. Lancor and Workman [4] describe
incorporating Google Hacking into a graduate course on web
security. This paper serves as a good introduction to Google
Hacking. It describes a series of exercises used to teach students
how to use Google Hacking to test their own sites and how to
defend against it.
3. TECHNIQUES
We mostly limited our Google Hacking activities to Northern
Kentucky University. We sometimes extended the searches to other
educational sites in the US, and a few network device searches
required a still broader domain. Our main goal while
performing the searches was to check which Google hacks
actually work. We found these hacks on the Internet, in
Johnny Long's book [5], and in his Google Hacking Database [2].
This information is critical to understanding how vulnerable we
really are to Google Hacking. Are websites protecting information
against Google Hacking? Sadly, most of the examples of
unprotected sensitive information were found within our own
university and did not require a substantial amount of time to find.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
InfoSecCD Conference '08, September 26-27, 2008, Kennesaw, GA,
USA. Copyright 2008 ACM 978-1-60558-333-4/00/0006$5.00.
Google Hacking turned out to be a very powerful and flexible
approach. Many of the most powerful hacks we found did
not work as published. In most cases, however, if we spent enough
time analyzing the target and understanding how the queries found
information, we were able to tweak the original query by
changing the parameters or the advanced operators to find
information similar to that targeted by the original query.
We found it very helpful to use Google cached pages while
performing Google hacks. Google crawls web pages and stores a
copy of them on its own servers. We used Google cached pages
to browse a target's site anonymously without sending a single
packet to its server.
Google stores most of the content of the pages it crawls, but omits
images and some other space-consuming media. When we viewed Google
cached pages by simply clicking on the cached link on the
results page, our browser still connected to the target's server to
get the rest of the page content. This might reveal our Google
Hacking to the target website. To avoid this, we added the &strip=1
parameter to the URL, which tells Google to return only the crawled
content, so that nothing is fetched from the target's server.
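The URL change described above can be sketched in a few lines of Python. This is only an illustration: the host name cache.example and the function name add_strip_param are hypothetical, and the only detail taken from the text is the strip=1 query parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_strip_param(cached_url: str) -> str:
    """Return cached_url with strip=1 set exactly once in its query string."""
    parts = urlsplit(cached_url)
    # Keep every existing parameter except a previous "strip" value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "strip"]
    query.append(("strip", "1"))
    # safe=":/" keeps the cache:host/path value readable in the result.
    return urlunsplit(parts._replace(query=urlencode(query, safe=":/")))


if __name__ == "__main__":
    # Hypothetical cached-page link copied from a results page.
    link = "http://cache.example/search?q=cache:www.example.edu/page.html"
    print(add_strip_param(link))
```

Applying the function twice gives the same URL, so it can be run safely over links that already carry the parameter.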
A system administrator might decide to prevent access to a certain
part of the site by moving it, protecting it with a password, or
simply shutting down the server. What administrators often do
not realize is that the information they are trying to protect
may still exist on Google's servers and can be accessed through
cached pages. This allowed us to view data that had been removed
from websites [2, p. 88].
4. GOOGLE HACKING
According to Johnny Long's Google Hacking Database [2],
there are roughly fourteen categories of Google hacks. This paper
looks at five of them: Error Messages, Open Directories,
Documents & Files, Network Devices, and Personal Information
Gathering.
4.1 Error Messages
Error messages provide a wealth of information. Developers use
these error messages to pinpoint where their code has gone
wrong. Unfortunately for web administrators, error messages that
are open to the world provide that information to anyone who knows
how to look for them. Database error messages can reveal
information such as usernames, passwords, and server names.
Here is an example of a query for MySQL error messages that tell the
Googler the username for a MySQL database.
"Warning: mysql_connect(): Access denied for user: '*@*" "on line" -help forum
Here is another example, a query for error messages that reveal SQL
query information.
"You have an error in your SQL syntax near" + inurl:.edu
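An administrator can look for the same error signatures in the pages their own site serves before a crawler indexes them. The following is a minimal sketch: the two patterns come from the queries in Section 4.1, while the function name, the labels, and the sample text are assumptions made for illustration.

```python
import re

# Signatures taken from the two queries in Section 4.1.
ERROR_SIGNATURES = {
    "mysql_connect access denied": re.compile(
        r"Warning: mysql_connect\(\): Access denied for user", re.IGNORECASE),
    "sql syntax error": re.compile(
        r"You have an error in your SQL syntax", re.IGNORECASE),
}


def find_error_leaks(page_text: str) -> list:
    """Return the labels of all error signatures found in page_text."""
    return [label for label, pattern in ERROR_SIGNATURES.items()
            if pattern.search(page_text)]


if __name__ == "__main__":
    # Hypothetical page body containing a leaked database error.
    sample = ("Warning: mysql_connect(): Access denied for user: "
              "'web@localhost' on line 12")
    print(find_error_leaks(sample))
```

A check like this could be run over saved copies of a site's pages as part of a periodic review.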
4.2 Open Directories
Google's web-bots crawl pages in a site that a web administrator
may not want to be catalogued. Most sites stop users from
browsing their directory structure, but not all websites are set up
correctly.
A simple Google search can provide a wealth of information.
Directory browsing allows someone to see all the files you have
on your web server. Much important company information
is stored in server directories. Leaving those directories
accessible to outsiders can compromise the company's entire line
of defense and make hackers' lives far too easy.
A search of intitle:index of returns a list of sites that allow
directory browsing. Often this search reveals all kinds of
information. Not only does it give a potential hacker access to all
of your files, but index pages often reveal information such as the
operating system and web server software. This information gives
a hacker a roadmap to the vulnerabilities you may have.
A simple Google search like intitle:index of + solutions
can give students access to solutions. By adding a site
search parameter (site:some_university.edu), we were able to
obtain a solution manual for a science department, potentially
allowing students to cheat on class assignments.
In one of the results returned by an intitle:index of query,
we found a directory listing that contained a screen shot of a
university's financial management system. One of the most
popular hacking techniques used with directory listings is the
directory traversal technique. This technique refers to
modifying parts of the originally found URL in order to access
other directories on the server. These may not be reachable through
direct Google searches. For example, if you found the relative URL
/cs/accounting/admin/jerryb, you could remove trailing parts of
the URL to access parent directories such as
admin or accounting, or you could replace some parts of the URL
with likely directory names, such as hr [5, p. 109].
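The first half of the technique, walking up to parent directories, can be sketched as follows for a tester checking their own site. The host www.example.edu and the function name are hypothetical; the path is the example from the text.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directory_urls(url: str) -> list:
    """Return the URL of every parent directory of url, nearest first."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    parents = []
    # Drop one trailing segment at a time: /a/b/c -> /a/b/, /a/, /
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


if __name__ == "__main__":
    found = "http://www.example.edu/cs/accounting/admin/jerryb"
    for candidate in parent_directory_urls(found):
        print(candidate)
```

Each candidate would then be requested by hand or by a test script to see whether the server returns a directory listing.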
Starting from the financial management system documentation, we used
the directory traversal technique to get to parent directories of
the original search result. As we browsed through these
directories, we found the complete documentation on managing
and using that university's financial system. Screen shots
contained some user IDs and potentially valid names of
university funds. Such information might be used by hackers to
attack the university. Penetration testers should use this technique
to determine whether sensitive company
information is being exposed on the web.
4.3 Documents & Files
4.3.1 Office Documents
Website administrators do not always think about how a search
engine will crawl their site when they build it. People put
sensitive files on their websites without thinking. Word documents,
Excel spreadsheets, and Access databases can hold a wealth of
information.
Companies may store sensitive information, such as financial
reporting or human resources documentation, on their websites in
spreadsheets. By searching Google using this simple query
site:some_university.edu intitle:index.of .xls, we found several
Microsoft Excel files stored within directory listings. We found
the equipment spending master list of a university department.
This file contained equipment purchases with vendor and price
information. Another Excel file from the same department
contained faculty salaries. This information should not be
publicly obtainable through a simple Google search.
4.3.2 WS_FTP Logs
Another source of information is log files [6]. By default,
WS_FTP creates a WS_FTP.log file on the web server. This file
contains a wealth of sensitive information, such as usernames, file
directories, file names, times of file uploads and downloads, and web
server usage information. This information can save hackers a lot
of time in their attempt to attack a company's website.
The query site:some_university.edu index.of ".log" returned
many results. Among these was a link to a WS_FTP.log file
in the file directory of a university's physics and geology department.
It listed dates and times of file uploads made with the WS_FTP
client. This file disclosed usernames and names of file directories.
WS_FTP.log files contain information about file transfers to and
from FTP servers.
4.3.3 Source Code
The source code of a computer program can contain large amounts
of sensitive information. Source code can show how the system
was implemented and how the database is accessed. Code can
contain passwords, server names, database table and field names,
and directories.
Many companies still do not use version control or
professional backup solutions for their source code. As a result,
programmers back up their code by making copies of their files
with extensions such as .bak, .bak2, or .bak3. Web servers may
contain pages like MyCode.asp.bak. What programmers do not
realize is that these code files may be retrieved from the web
server. Web servers decide how to handle a file based on its
extension. The web server does not recognize these backup
extensions as scripts, so it returns the files as plain text. That
means that all of the code is now exposed to the user, perhaps
revealing sensitive information [5, p. 112].
By using the simple query site: .edu index.of asp.bak,
we found many such pages on university websites. These included
backed-up ASP pages from the careers site of one university. We can
search other domains by simply replacing .edu with another
domain such as .com.
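The same files can be caught before publication by scanning the web root on the server itself. The sketch below is one possible approach, not a method from this paper: the suffix and file name lists are drawn from the examples in Sections 4.3.2 and 4.3.3, and the function name and the /var/www/html path are assumptions.

```python
from pathlib import Path

# Backup suffixes and log file names mentioned in Sections 4.3.2 and 4.3.3.
RISKY_SUFFIXES = (".bak", ".bak2", ".bak3")
RISKY_NAMES = ("ws_ftp.log",)


def find_risky_files(web_root) -> list:
    """Return sorted paths, relative to web_root, of backup and log files."""
    hits = []
    for path in Path(web_root).rglob("*"):
        name = path.name.lower()
        if path.is_file() and (name.endswith(RISKY_SUFFIXES)
                               or name in RISKY_NAMES):
            hits.append(path.relative_to(web_root).as_posix())
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical document root; adjust to the server being audited.
    for hit in find_risky_files("/var/www/html"):
        print(hit)
```

Any file the scan reports is a candidate for removal from the published directory tree.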
4.4 Network Devices
You can find much more than just documents on the Internet.
There are also many types of devices, interactive environments,
collaboration tools, and social networks. Devices accessible
through the Internet are a very popular target for hackers. Being
able to control printers, web cameras, and network routers can be
useful in planning an attack on a company. It is important that
penetration testers understand those threats and protect companies
against them.
For the convenience of their employees, companies may put
hardware devices online. With the increase in telecommuting, this
is happening more and more. There are countless devices online,
and the Google Hacking Database [2] provides users with queries
to find them.
4.4.1 WebCams
The first type of device that rookie Google hackers attempt to
find is usually the webcam. Simple searches like
camera linksys inurl:main.cgi
reveal web pages that have Linksys web cameras. Other queries
like
inurl:"ViewerFrame?Mode=" + inurl:.edu
allintitle: Axis 2.10 OR 2.12 OR 2.30 OR 2.31 OR 2.32 OR 2.33 OR 2.34 OR 2.40 OR 2.42 OR 2.43 "Network Camera "
also provide users with information about cameras.
Webcam information may not seem very interesting, considering
that webcams themselves are designed to be shown on the web.
However, some webcam owners put their devices online but share
the URL only with a certain set of people. This
security through obscurity does not hold up well against
Google. The Google bots crawl all accessible pages
indiscriminately. One webcam we found allowed the user
to control the camera's direction, tilt, zoom, and display size.
Another webcam that we found, at a construction
website, showed so much detail that we could read license
plate numbers.
4.4.2 Routers and Firewalls
Routers and hardware firewalls are connected to the Internet
to allow remote administration. These devices are almost always
password protected by system administrators. Unfortunately,
some companies keep the default login and password. The
login pages for such devices are easily found with these Google queries.
intitle:"Main page - SmoothWall Express"
intitle:"Smoothwall Express" inurl:cgi-bin "up * days"
Google uses the information in the title of the SmoothWall
Express firewall client to find the administrative login pages for
the device. Johnny Long's Google Hacking Database [2] lists
the second query as a way to find the
administrative login page for the device. We found that the
second query does not return results.
4.4.3 Network Printers
Finally, network printers are also available online. Many of these
are password protected, but often they are available to anyone.
intext:"MaiLinX Alert (Notify)" -site:networkprinters.com
4.5 Personal Information Gathering
4.5.1 Email Address Harvesting
A simple search like site:nku.edu + @ will return all web pages
that have the @ sign on the page. This query gives a spammer a
legal means to gather countless email addresses.
While the Google Terms of Service prohibit users from using
tools that will automatically query websites, you can create a
simple program that uses a simple Google query to return a
list of pages that have email addresses. Using screen scraping and
regular expressions, this kind of program can be written in no time.
An example program that we wrote can be found at [1]. Once you
have harvested the email addresses, you can run a simple telnet
session and use the Gmail servers to validate them.
telnet
open gmail-smtp-in.l.google.com 25
HELO test
MAIL FROM:
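The regular-expression step mentioned above can be illustrated with a short Python sketch that an administrator could run on the text of their own pages to see which addresses are exposed. This is not the program referenced at [1]; the pattern is a common simplification rather than a full address grammar, and the function name and sample text are assumptions.

```python
import re

# Simplified address pattern; it does not cover every valid address form.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_text: str) -> list:
    """Return the unique email addresses in page_text, lowercased and sorted."""
    return sorted({match.lower() for match in EMAIL_PATTERN.findall(page_text)})


if __name__ == "__main__":
    # Hypothetical page content saved from one's own site.
    sample = ('Contact <a href="mailto:Jane.Doe@example.edu">Jane</a> '
              "or bob@example.edu. Price: 5 @ $3")
    print(extract_emails(sample))
```

The sketch works on text that has already been saved, so it sends no automated queries to any search engine.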
4.5.2 Shipment Tracking Information
In the past few years, online shipment tracking systems have
become very popular. People enjoy checking the status of their
shipments online in real time. But how secure is that information?
We tried searching for UPS tracking information using the
following Google query, posted in Johnny Long's Google Hacking
Database [2]:
site:ups.com intitle:"Ups Package tracking" intext:"1Z ### ### ## #### ### #"
The original query no longer worked, but that does not mean that the
information is not there.
By simply going to the UPS website and opening the shipment
tracking page, we found that the URL of the shipment
tracking site had changed since the original query was
posted, and so had the format of the tracking number. By updating the
URL and removing the tracking number format from the query, we
obtained a cleaner and simpler query that works:
"In Transit" site:wwwapps.ups.com
This query can be adjusted to filter down
to the information you need. The new query returns a substantial
number of pages with tracking information for UPS packages that
are currently in transit. This information can be used to track all
incoming UPS packages for a selected address, perhaps to steal a
package. Surely, most people would not be happy that
this kind of information is available through a simple Google
query.
5. PROTECTING AGAINST GOOGLE HACKING
Google Hacking is well documented and easy to learn. It is very
important for security professionals to protect their companies
against Google Hacking. To protect your site against Google
Hacking, you need to establish a solid security policy on what
information can be put on the web. Security professionals should
perform Google Hacking against their own websites to check for
sensitive information disclosure. There is no 100% protection
against Google Hacking, but strong policies and testing can
improve the security of your site.
Security professionals need to learn Google Hacking to provide a
good level of protection for their sites. As you become more
familiar with manual hacks, you can start using some of the
automated Google Hacking tools. These automate your hacks,
making it practical to check every page within your site.
Automated tools allow for periodic security checks with a
frequency that is simply impossible to achieve with manual hacks.
There are different routes you can take with automated
Google Hacking tools. You can use some of the pre-built
automated tools, or take advantage of the Google API and build your
own Google Hacking tool. Pre-built automated Google Hacking
tools, such as Johnny Long's Gooscan [5, pp. 489-499], are very
good for many common hacks and will save you time. If you need
something more customized, you may need to implement your
own tool using the Google API.
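As a starting point for such a custom tool, the queries used in this paper can be turned into a checklist restricted to one's own site. The sketch below only builds query strings and does not call any Google API; the template list is adapted from the searches in Section 4, and the function name and the domain example.edu are hypothetical.

```python
# Query templates adapted from the searches discussed in Section 4.
AUDIT_TEMPLATES = [
    'site:{site} intitle:index.of',
    'site:{site} intitle:index.of .xls',
    'site:{site} index.of ".log"',
    'site:{site} index.of asp.bak',
    'site:{site} "You have an error in your SQL syntax near"',
]


def build_audit_queries(site: str, templates=AUDIT_TEMPLATES) -> list:
    """Return one site-restricted query string per template."""
    site = site.strip().lower()
    if not site or any(ch.isspace() for ch in site):
        raise ValueError("site must be a single host or domain name")
    return [template.format(site=site) for template in templates]


if __name__ == "__main__":
    for query in build_audit_queries("example.edu"):
        print(query)
```

The resulting list can be worked through by hand or passed to whichever search interface the tester is permitted to automate.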
6. CONCLUSION
While Google Hacking does not necessarily fit the standard
definition of hacking, it can prove just as fruitful. By using
Google, you can gain access to information that may otherwise be
hidden. The information that you gather using these hacks may
allow you to gain access to systems or devices.
The hacks work because Google indiscriminately stores
information when its web spiders crawl the Internet. By using the
advanced operators, you can view this information. Google makes
it extremely easy to find this information. Those with more
computer knowledge will learn faster, but it
will not take long for even a novice Internet user to master
these techniques.
Security professionals can address the problem of Google
Hacking in a manner similar to addressing other security issues.
1) They can use Google Hacking to test their Web sites for
sensitive information disclosure. 2) They can educate employees
about what information should not be put on the Internet. 3)
They can implement enforceable policies to ensure employee
compliance.
7. REFERENCES
[1] Email Address Harvesting,
http://www.nku.edu/~frank/FindEmailAddresses.htm.
[2] Google Hacking Database Web Site,
http://johnny.ihackstuff.com/ghdb.php.
[3] Johnny Long's Web Site, http://johnny.ihackstuff.com/.
[4] Lancor, L. and Workman, R., Using Google Hacking to
Enhance Defense Strategies. SIGCSE Bull. 39, 1 (Mar.
2007), 491-495. DOI=
http://doi.acm.org/10.1145/1227504.1227475.
[5] Long, J., Google Hacking for Penetration Testers, Vol. 2,
Syngress Press, 2008.
[6] Neohapsis Archive ws_ftp.log,
http://archives.neohapsis.com/archives/fulldisclosure/2004-08/0663.html.
[7] Wikipedia Google Hacking Web Site,
http://en.wikipedia.org/wiki/Google_Hacking.