WHOIS Misuse Study

Draft report for public comment

26 NOVEMBER 2013

Nektarios Leontiadis

Nicolas Christin

Carnegie Mellon University


Table of Contents

1. Introduction
2. Background and overview of the study
  2.1. Descriptive study
  2.2. Experimental study
3. Study Samples
  3.1. Selecting a survey panel
  3.2. Creating a microcosm sample of the world’s registered gTLD domain names
    A proportional probability microcosm
    Registrant sample
    Registrar/Registry sample
4. Law Enforcement & Researchers survey
  4.1. Survey methodology and design details
  4.2. Analysis of responses
    Demographics
    Level of expertise
    Attack experiences
    Specific WHOIS misuse incidents
  4.3. Discussion
5. WHOIS misuse reported by Registrants
  5.1. Survey methodology and design details
    Methodology
    Survey translations
    Types of questions
  5.2. Response and error rates

    Characteristics of the participants
    Reported WHOIS misuse
    Adverse effects
    Countermeasures
  5.4. Discussion
6. Assessing Registrar/Registry anti-harvesting
  6.1. Survey methodology and design

    Demographics
    Employed anti-harvesting techniques
    Incidents of WHOIS misuse
    Incidents of WHOIS harvesting and their effect in deploying new countermeasures
  6.3. Testing of WHOIS query rate limiting techniques
  6.4. Discussion
7. Experimental Study
  7.1. Registrars
  7.2. Domain names
  7.3. Registrants associated with domains
    Names of Registrants
    Email addresses
    Physical addresses
    Phone numbers
  7.4. Registering domains
  7.5. Duration of the experiment
  7.6. Breakdown of the collected instances of misuse
    Postal address misuse
    Email address misuse
    Attempted malware delivery
    Phone number misuse
    Other types of misuse
  7.7. Overall experiment incidents of WHOIS misuse
  7.8. Discussion
8. Comparative result analysis
  8.1. Correlation between measured and reported incidence of misuse
  8.2. Domain characteristics affecting email address misuse
  8.3. Domain characteristics affecting phone number misuse
  8.1. Domain characteristics affecting postal address misuse
9. Discussion
10. Appendix A – Law Enforcement/Researcher survey
  10.1. Invitation to participate
  10.2. Consent form
  10.3. Survey questions
11. Appendix B – Registrant survey
  11.1. Invitation to participate
  11.2. Consent
  11.3. Survey questions
  11.4. Terms
12. Appendix C – Registrar and Registry Survey
  12.1. Invitation to Participate
  12.2. Consent form
  12.3. Survey questions
13. Bibliography


Executive summary

Does public access to WHOIS-published data lead to a measurable degree of misuse¹? This

study, sponsored by the Internet Corporation for Assigned Names and Numbers (ICANN) and

initiated by ICANN’s Generic Names Supporting Organization (GNSO, 2010), attempts to

answer this question, with a focus on the five most populous generic Top Level Domains

(gTLDs). To do so, we first surveyed experts, law enforcement agents, Registrants, Registrars,

and Registries, and collected their input on the prevalence of WHOIS misuse, thereby obtaining

a descriptive data set. We then complemented this descriptive portion of the study with a set of

experimental measurements of WHOIS misuse, which we obtained by registering 400 domains

in the top five gTLDs across 16 Registrars, associating unique, synthetic WHOIS contact

information with these domains, and monitoring incidents of misuse for a period of 6 months.

The main finding of the descriptive study is that there is a statistically significant occurrence of

WHOIS misuse affecting Registrants’ email addresses, postal addresses, and phone numbers,

published in WHOIS when registering domains in these gTLDs. Overall, we find that 44% of

Registrants experience one or more of these types of WHOIS misuse. Other types of WHOIS

misuse are reported, but at a smaller, non-significant rate. Among those, a handful of reported

cases appear to be highly elaborate attempts to achieve high attack impact.

As a caveat, most findings of the descriptive study are affected by low response rates from the

parties we surveyed. Most importantly, we are unable to draw meaningful conclusions about the

geographical aspects of WHOIS misuse. Indeed, the great majority of survey responses

originated from the US, even though we used a much more geographically diverse Registrant

population sample, and tried to survey Registrants in their native language.

The experimental study corroborates the findings of the descriptive study. In particular, it offers

quantitative insights regarding both the extent of WHOIS misuse, and the parameters affecting

WHOIS misuse. A limitation of the experimental study is that the impact of geographical location

1 In this study, WHOIS misuse refers to harmful acts that exploit contact information obtained from

WHOIS. Harmful acts may include generation of spam, abuse of personal data, intellectual property theft,

loss of reputation or identity theft, loss of data, phishing and other cybercrime related exploits,

harassment, stalking, or other activity with negative personal or economic consequences.


on postal address misuse could not be measured, due to the prohibitively expensive cost of

setting up postal boxes in countries without having an actual residence there.

Among the measurable factors analyzed by this experiment, we identify the gTLD as the sole

statistically-significant characteristic that affects the occurrence of the associated misuse of

phone numbers published in WHOIS. For example, the rates of WHOIS phone number misuse

are negatively correlated with .ORG domains (less misuse), but positively with .BIZ and .INFO

(more misuse).

Similarly, we find that the domain price is negatively correlated with the possibility of misuse of

email addresses published in WHOIS (i.e., experimental domains purchased at greater cost had

less email address misuse). We also discover that .COM, .NET, and .ORG domains are

associated with less email address misuse, while .BIZ domains are associated with more

misuse.

We also studied whether the composition of domain names themselves impacts the probability

of WHOIS misuse. We find that experimental domain names representing natural person names

appear to foster less email misuse, while for other experimental domain name categories (e.g.,

professional, randomly-generated, etc.), WHOIS misuse probability seems independent of the

domain name composition.

We find that WHOIS anti-harvesting techniques, applied at both the Registry and Registrar level,

have a statistically significant effect in reducing the possibility of WHOIS email address misuse. Overall, we

find that experimental WHOIS data registered with Registries/Registrars with no observable

anti-harvesting countermeasures was twice as likely to result in unwanted emails as in

cases where a countermeasure was deployed. We do not offer, however, a comparative

analysis of the effectiveness of specific anti-harvesting techniques against WHOIS misuse, as

any differences we could observe were not statistically significant.

Finally, we do not find other statistically significant correlations between specific Registrars used

to register experimental domains and measured rates of WHOIS misuse.


1. Introduction

WHOIS is an essential information service that primarily allows anyone to map domain names

to Registrants and their contact information. There is increasing anecdotal evidence of misuse

of the data made publicly available through the WHOIS service. For instance, some Registrants²

have reported that their publicly available WHOIS data was used by a third party to register a

domain name similar to the Registrant’s, while listing contact information identical to that

provided by the Registrant. The domain name registered with the fraudulently acquired

Registrant information was subsequently used to impersonate the owner of the original,

legitimate domain, for nefarious purposes. Other studies have concluded that WHOIS data

could be used for phishing attempts (SAC028, 2008), or even for sending spam email (SAC023,

2007).

The purpose of this WHOIS Misuse study is to provide a quantitative and qualitative

assessment of the types of WHOIS data misuse experienced by gTLD domain name

Registrants, the magnitude of these misuse cases, and characteristics such as anti-harvesting

measures that may impact misuse.

The study offers the following contributions:

• We test and validate the hypothesis that public access to WHOIS data leads to a measurable degree of misuse of certain kinds of gTLD domain name Registrant identity and contact information, via a combination of a descriptive study (surveys), and of an experimental study.

• We examine gTLD domain names, associated Registry and Registrar anti-harvesting characteristics, and their effect on WHOIS misuse.

• We describe the major types of misuse stemming from public WHOIS access to Registrant identity and contact data.

• We assess the effectiveness of anti-harvesting defenses against WHOIS misuse.

• We design and describe a large-scale experiment to empirically measure the type and extent of misuse of WHOIS information. This empirical work provides a framework for the design of similar future studies.

2 See http://www.eweek.com/c/a/Security/Whois-Abuse-Still-Out-of-Control/.


The rest of this report is organized as follows. Section 2 provides the background of the study

and its objectives, and section 3 characterizes the population samples we utilized for the

different components of this study. The following sections (4, 5, and 6) discuss the descriptive

part of the study; each section describes one of the three surveys we conducted, with

law enforcement and researchers, with Registrants, and with Registrars and Registries,

respectively. Section 7 discusses our experimental study, and includes a detailed presentation

of the experimental design and parameters. Section 8 provides an empirical analysis of data

collected from both the descriptive and the experimental parts of the study. Section 9 concludes with

an overall discussion of the study outcomes.


2. Background and overview of the study

Based on their operational agreement with ICANN (ICANN, 2013), all gTLD Registrars are

required to collect Registrant identification and contact information that is subsequently

published in each Registrar’s WHOIS directory. While the original purpose of WHOIS was to

provide the necessary information to get in contact with a Registrant for legitimate purposes

(e.g., abuse notifications or other operational reasons), uncontrolled public access to WHOIS

also allows the collection of the same information for nefarious purposes such as unsolicited

email or phone calls (i.e., spam). The Generic Names Supporting Organization (GNSO), which

is responsible for the development of gTLD domain name policies, including those pertaining to

WHOIS data, identified in a Task Force Report on WHOIS Services (GNSO, 2007) the

possibility of misuse of WHOIS data for phishing and identity theft, among others.

A later study by the ICANN Security and Stability Advisory Committee (SAC023, 2007) looked

into the potential of misuse of email information posted exclusively in WHOIS. During a three-

month measurement study, they registered an arbitrary number of randomly chosen domain

names, with and without the use of privacy and proxy services, and monitored the mailboxes for

spam email. The study found evidence that the public availability of WHOIS data contributes to

the frequency of spam email; and that protective services applied either to all WHOIS data (e.g.

rate limiting) or to WHOIS data associated with a single domain name (e.g. privacy and proxy

services), can deter WHOIS misuse.

This WHOIS misuse study builds on this previous work by providing updated results, and a

more comprehensive set of experiments. This study heavily draws upon the Terms of Reference

for WHOIS Misuse Studies (ICANN, 2009). This work was designed and conducted in response

to the GNSO’s decision to pursue WHOIS studies (GNSO, 2010); the goal of this study is to

provide empirical data to help ICANN determine whether there is substantial WHOIS misuse that

warrants further action. Therefore, this study is designed to address the following

objectives:

• Validate or invalidate the hypothesis that public access to gTLD WHOIS data leads to a measurable degree of misuse.

• If the hypothesis is validated, identify major types of misuses stemming from public access to gTLD WHOIS data.

• Determine which anti-harvesting measures appear to be most effective against gTLD WHOIS misuse.


We adopted a two-pronged approach – that is, we conducted both a descriptive study and a

complementary, experimental study. The descriptive study aims at collecting past instances of

misuse cases, through interviews and surveys of potential victims and Registrars/Registries. We

also surveyed law enforcement and cybercrime researchers and agencies that deal with

incidents of misuse, to better determine the nature and overall magnitude of WHOIS misuse.

We complemented the descriptive study with an experimental study. The goal was to acquire

controlled data on misuse events by setting up a representative environment attractive to those

who could be tempted to misuse WHOIS, and to measure the impact of anti-harvesting measures

that could affect the degree of misuse observed.

2.1. Descriptive study

Pursuant to the Terms of Reference, the descriptive study consists of a set of four surveys: a)

Registrant survey, b) Registrar/Registry survey, c) Cybercrime Researchers survey, and d)

Consumer Protection, Regulatory and Law Enforcement organizations survey. Because they

relied on identical questionnaires, we will subsequently consider surveys c) and d) as a single

survey.

The goals of each of the surveys are as follows.

A) Registrant survey. Gathered a representative sample of domain names registered in the

top five gTLDs, and surveyed experiences of specific harmful acts attributed to WHOIS

misuse.

B) Registrar and Registry surveys. Surveyed Registries and Registrars associated with the

registration of the domain name sample from survey (A), to identify WHOIS anti-

harvesting mechanisms employed, and collect aggregate information about known

WHOIS harvesting attacks.

C/D) Cybercrime researchers and law enforcement surveys. These surveys intend to further

broaden the study’s perspective of WHOIS misuse by contacting a representative set of

researchers and consumer protection, regulatory, and law enforcement organizations, to

gather examples and statistics on harmful acts in general, and more specifically those

attributed to WHOIS misuse.

Our goal for survey A was to obtain a representative sample by randomly selecting domain

names from the top five gTLDs, maintaining the population proportions, and to generate study

results with a 95% confidence interval. Owing to the much smaller populations involved, surveys

B and C/D, on the other hand, are intended to provide qualitative insights rather than

quantitative measurements.

2.2. Experimental study

The second facet of this work is an experimental study, which attempts to complement the

observations gained from the descriptive study by gathering a controlled set of network

measurements. The platform that is used for the measurements is a set of domain names,

registered as part of the study across the top five gTLDs through a representative sample of

Registrars, and associated with artificial Registrant identities. The goal is to measure the extent

of illegal or harmful Internet activity experienced by domain name Registrants that can be

exclusively attributed to WHOIS misuse, given that the experimental design eliminates any

extraneous variables that may correlate (positively or negatively) with the observed misuse.

In the surveys collected from the descriptive study, it is hard to completely eliminate external

plausible causes for illegal or harmful Internet activity to draw conclusions on WHOIS misuse

with certainty. For example, a Registrant might experience misuse of the personal phone

number used to register their domain name. However, if that same number is also

listed in the Registrant’s Facebook profile, and that profile has weak privacy controls,

then the misuse cannot be attributed to WHOIS with certainty. On the other hand, in the

experimental study, Registrant identities (a term defined in Section 7.3) are artificially

constructed and solely used for the purpose of this experiment.
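The attribution argument can be illustrated with a minimal sketch: if every contact point is published in exactly one WHOIS record and nowhere else, any message reaching it can be traced back to a single experimental domain. The helper names `build_contact_index` and `attribute`, and all domains, addresses, and phone numbers below, are hypothetical and are not taken from the study.

```python
def build_contact_index(identities):
    """Map each unique contact point (email, phone) back to its experimental domain."""
    index = {}
    for domain, contacts in identities.items():
        for contact in contacts:
            key = contact.strip().lower()
            if key in index:
                # A shared contact point would make attribution ambiguous.
                raise ValueError(f"contact reused across domains: {contact}")
            index[key] = domain
    return index


def attribute(index, recipient):
    """Return the domain whose WHOIS record published `recipient`, or None."""
    return index.get(recipient.strip().lower())


# Hypothetical identities, for illustration only.
IDENTITIES = {
    "example-one.test": ["alice@example-one.test", "+1-555-0100"],
    "example-two.test": ["bob@example-two.test", "+1-555-0101"],
}

index = build_contact_index(IDENTITIES)
print(attribute(index, "Bob@Example-Two.test"))
```

Rejecting reused contact points at index-build time mirrors the design requirement that each synthetic identity be used for one purpose only.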

The experimental study lasted six months, during which we collected emails, voicemails, and

postal mail received by the Registrants associated with the experimental domains. We

registered 400 domains with a geographically diverse set of 16 Registrars, distributed

proportionally across the top 5 gTLDs, with domain names that are classified in four categories

of interest plus one control category. Our analysis provides insights into the different degrees of

correlation between WHOIS misuse and gTLDs, types of misuse, types of domains, cost of

domains, and anti-harvesting techniques deployed. However, the experimental design did not

allow us to gain major insights into how regions and countries are affected by WHOIS misuse. In

particular, we were not able to set up postal boxes outside the United States, because mail

regulations in most countries require proof of residency, and “virtual office” solutions were

prohibitively expensive at the scale at which we needed to run the experiment.


3. Study Samples

In this section we discuss how we created domain name samples and selected invitees for the

different parts of the study. We first describe how we chose the invitees for the researcher and

law enforcement survey, before presenting the sampling process of the domain names and

resulting invitees of the Registrar/Registry and Registrant surveys.

3.1. Selecting a survey panel

As part of the Law Enforcement and Researchers survey, we assembled a geographically

diverse group of experts in the fields of security and privacy affiliated with research institutes,

academia, law enforcement agencies, Internet Service Providers (ISPs), and national data

protection commissions. The goal was to survey experts to whom WHOIS misuse incidents are

reported, to ultimately obtain a qualitative global overview of WHOIS misuse, rather than a mere

collection of individual misuse incidents.

Geographical region    Type of expertise
North America          Agencies to which security incidents are reported
South America          Large commercial vendor research labs
Europe                 Large Internet service providers
Africa                 Academic cybercrime research organizations
Asia / Pacific         Law enforcement agencies
                       Commercial cybercrime investigators
                       National Data Protection Commissioners

Table 1 Recruiting requirement in terms of geographical region and type of expertise

Our approach for recruiting participants was to build upon contacts established at Carnegie

Mellon University (CMU) with additional input from ICANN to fill coverage gaps. Once this

invitee list was completed, we identified remaining gaps and omissions in terms of the type of

expertise we were looking for and geographic coverage, and we successfully managed to

amend these deficiencies by researching online for additional invitees that would match our

requirements. Table 1 lists the coverage goals for this survey’s participants.

Toward the end of the time interval over which the survey was initially conducted, and despite

the high response rate for an email-based invitation (25%, corresponding to 29

responses out of 114 invitations at the time), an initial analysis of the responses informed us


that we had collected a small number of individual misuse incidents and that we were lacking

coverage for South America. We therefore extended the duration of the survey and invited a

broader population of law enforcement experts attending the Costa Rica ICANN meeting to

participate. The required level of expertise of the additional participants was verified by survey

questions specifically structured for that purpose. Ultimately, the survey was run between

September 2011 and April 2012, with answers provided by every eligible³ participant who

completed the study being included in survey results.

3.2. Creating a microcosm sample of the world’s registered gTLD domain names

Domain name registrations in the top 5 generic Top Level Domains (gTLDs) in the summer of

2011 exceeded 127 million (Table 2). As we aspired to draw conclusions on characteristics of

the gTLD population as a whole, we decided to take a representative sample of those domains

– a microcosm – and employ statistical inference techniques on that microcosm. A similar

technique was employed by the NORC study of the Accuracy of WHOIS Registrant Contact

Information (NORC, 2010), with the exception that this WHOIS misuse study did not attempt to

geographically stratify the sample. The microcosm was selected randomly in an unbiased

proportional way from the population of 127 million.

gTLD     Domains        Proportion
COM       95,185,529    74.54%
NET       14,078,829    11.03%
ORG        9,021,350     7.06%
INFO       7,486,088     5.86%
BIZ        2,127,857     1.67%
TOTAL    127,694,306   100%

Table 2 Number of domain registrations in the top 5 gTLDs in August 2011

We selected such a microcosm to investigate WHOIS misuse from a number of perspectives. At

the most basic level, we surveyed Registrants to learn about their experience of misuse of

personal or corporate information listed in WHOIS. We then surveyed the top 5 gTLD Registries

3 The eligibility was dependent on the participant being at least 18 years old, and on their explicit consent to participate. These criteria are defined by CMU’s Institutional Review Board (IRB).


and the Registrars associated with the sampled domains to understand how they are protecting

the Registrants’ information from WHOIS misuse. Finally, using a subset of the aforementioned

Registrars, we registered 400 test domains using artificial Registrant information, and we

monitored instances of WHOIS misuse experienced by those domains for six months. This

experiment enabled us to correlate domain name and directly-associated or observable

Registrar/Registry characteristics with WHOIS misuse (e.g., gTLD, cost, anti-harvesting).

A proportional probability microcosm

In November of 2011 we received from ICANN, at our request, a sample of 6,000 domains,

selected randomly from gTLD zone files with equal probability of selection.⁴ Of those 6,000

domains, 83 were not within the top 5 gTLDs to be studied and so were discarded. Additionally,

we were provided with the WHOIS records associated with 5,921 of the domains, obtained over

a period of 18 hours on the day following domain sample generation. We used a WHOIS record

parser internally developed at CMU to convert the loosely formatted WHOIS records into

structured information that allowed further automated processing.
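The parser developed at CMU is not described in this report. As a rough illustration only, the following Python sketch shows one way loosely formatted `Label: value` WHOIS text could be converted into a structured record. The function name `parse_whois_record`, the sample record, and its field labels are assumptions made for this example; real WHOIS output varies considerably between Registrars.

```python
import re
from collections import defaultdict

# Matches lines shaped like "Label: value"; the label ends at the first colon.
_FIELD = re.compile(r"^\s*([^:]+?)\s*:\s*(.*\S)\s*$")


def parse_whois_record(text):
    """Collect 'Label: value' lines into a dict mapping label -> list of values."""
    record = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines and common comment/notice prefixes.
        if not line.strip() or line.lstrip().startswith(("%", "#", ">>>")):
            continue
        match = _FIELD.match(line)
        if match:  # lines without a label and a non-empty value are ignored
            label, value = match.groups()
            record[label.lower()].append(value)
    return dict(record)


# Hypothetical record, for illustration only.
SAMPLE = """\
Domain Name: EXAMPLE.ORG
Registrant Name: Jane Doe
Registrant Email: jane@example.org
Name Server: NS1.EXAMPLE.ORG
Name Server: NS2.EXAMPLE.ORG
"""

parsed = parse_whois_record(SAMPLE)
print(parsed["registrant email"])
print(len(parsed["name server"]))
```

Storing a list per label keeps repeated fields such as name servers, which a plain one-value-per-key mapping would silently overwrite.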

With this set of structured WHOIS information, we created a proportional probability microcosm

of the 127 million domains, using the proportions in Table 2. In deciding the size of the

microcosm we used as a baseline the size of the microcosm in (NORC, 2010). In 2009 NORC

assembled a proportional probability sample of 2,400 domains. Taking into account the growth

in the population of domain names under the 5 gTLDs from 2009 to 2011, we created a

proportional probability microcosm of 2,905 domain names, which we used to draw a sample of

domain names for data collection.
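The report does not state how the per-gTLD counts of the 2,905-domain microcosm were rounded. The sketch below shows one possible approach, assuming largest-remainder rounding; the function name `allocate_proportionally` is hypothetical, and shares are recomputed from the per-gTLD domain counts listed in Table 2.

```python
# Domain counts per gTLD, as listed in Table 2 (August 2011).
COUNTS = {
    "COM": 95_185_529,
    "NET": 14_078_829,
    "ORG": 9_021_350,
    "INFO": 7_486_088,
    "BIZ": 2_127_857,
}


def allocate_proportionally(counts, size):
    """Split `size` slots across strata in proportion to `counts`.

    Assumption: largest-remainder rounding, so allocations sum exactly to `size`.
    """
    total = sum(counts.values())
    exact = {k: size * v / total for k, v in counts.items()}
    alloc = {k: int(x) for k, x in exact.items()}  # floor of each exact quota
    leftover = size - sum(alloc.values())
    # Hand the remaining slots to the strata with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_fraction[:leftover]:
        alloc[k] += 1
    return alloc


microcosm = allocate_proportionally(COUNTS, 2905)
print(microcosm, sum(microcosm.values()))
```

The same function could be applied to any other target size, for example when distributing a set of experimental domains across the same five gTLDs.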

Registrant sample

For the purpose of surveying domain Registrants, we needed a representative sample of the

microcosm of domain names, to identify their Registrants and invite them to participate. Our

sample design parameters are listed in Table 3. Because this is an equal probability sample, every domain

in the microcosm has an equal probability of being selected. As with similar studies, we adopted

a confidence interval (CI) of 95% and margin of error (ME) of 5%. With the microcosm of 2,905

domains we estimated that a sample size of 340 Registrants⁵ would provide the necessary

4 At one point, we considered duration of registration as a sample parameter, but eventually decided not to use it, due to the relative difficulty to properly assess this parameter.

5 $\text{Sample} \geq \dfrac{n}{1 + \frac{n - 1}{N}}$, where $N = 2905$, $n = \dfrac{1.96^2 \times SD}{0.05^2}$, $SD = p(1 - p)$, and $p = 0.5$.


insights for the given CI and ME. Additionally, given that survey participants would be invited

via email, we projected a 15%-25% response rate. We consequently drew a

sample of 1,619 domains from the microcosm, which, with a 21% response rate, would yield the

desired 340 Registrants. This sample did not explicitly exclude or include Proxy-registered

domain names.
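The sample-size arithmetic above can be checked with a short Python sketch. It assumes the standard sample-size formula for a proportion with a finite population correction and a z-score of 1.96 for the 95% level; the function name `required_sample_size` is ours and is used for illustration only.

```python
import math


def required_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion, with finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # size for an infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / population))


MICROCOSM = 2905
target = required_sample_size(MICROCOSM)

# Expected completed surveys if 1,619 invitations receive a 21% response rate.
expected_responses = 1619 * 0.21

print(target)                     # 340
print(round(expected_responses))  # 340
```

Under these assumptions the sketch reproduces both the 340-Registrant target and the yield expected from 1,619 invitations.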

Method of selection: Simple Random Sampling
Confidence interval: 95%
Margin of error: 5%
Expected response rate: 15%-25%

Table 3 Sample design parameters

Registrar/Registry sample

Before we provide the details about this sample, we need to clearly define the distinction

between Registrars and Registries. Registrars are entities that process individual domain name

registration requests. Each Registrar operates under agreement with at least one Registry – that

is, an organization responsible for maintaining an authoritative list of all domain names

registered in a given gTLD. For example, VeriSign is the Registry for all domain names

registered in the .COM gTLD; individual Registrars such as GoDaddy and Network Solutions

register .COM domain names under an agreement with VeriSign.

ICANN-accredited gTLD Registrars are responsible for collecting WHOIS information during

domain name registration, but WHOIS data storage and access varies across Registries. Thick WHOIS Registries maintain a central database of all WHOIS information associated with

registered domain names; they can respond directly to WHOIS queries with all available WHOIS

information. Thin WHOIS Registries maintain only basic WHOIS information centrally; they rely

on the Registrar for each domain name to store and supply all other available WHOIS

information.

In this study, we were concerned with the .BIZ, .INFO, and .ORG gTLD thick WHOIS Registries

and the .COM and .NET thin WHOIS Registries. Per the GNSO’s request for this study, we did

not attempt to study domain names registered under other smaller gTLDs or under ccTLDs.

The sample of Registrars and Registries that we surveyed as part of the Registrar and Registry

(R/R) survey is directly associated with the previously described sample of Registrants. We

built a sample of 111 Registrars and Registries by simply looking up the Registrars that

maintain the registration information of the 1,619 sampled domains, and the associated

Registries.

In the case of Registrar affiliates operating as resellers, the association between a domain

name and the Registrar that actually performed its registration cannot be identified in a

straightforward way. That is because WHOIS does not hold information about the Registrar-

Reseller relationship. So, for domains associated with known resellers, we used information in

WHOIS on domains’ name servers to identify some of the Registrars. This approach is based

on the assumption that in many cases domains use the DNS services of the Registrars with

which they are registered. We acknowledge that the method we described is problematic in

cases when (a) a domain has been registered with Registrar A, but the associated DNS server

is hosted by Registrar B, and (b) the Registrant delegates its domain name’s DNS services to a

company C that is not evidently associated with Registrar A. Nevertheless, we believe our

design choice provides a systematic and reproducible method of acquiring the required

information.
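The name-server heuristic described above can be sketched in Python as follows. The suffix-to-Registrar table is an illustrative assumption, not the lookup data used in the study, and the function name `guess_registrar` is hypothetical.

```python
# Illustrative mapping from name-server domain suffixes to Registrars.
# These entries are assumptions for this sketch, not the study's lookup table.
NS_SUFFIX_TO_REGISTRAR = {
    "domaincontrol.com": "GoDaddy",
    "worldnic.com": "Network Solutions",
}


def guess_registrar(name_servers):
    """Return the Registrar whose DNS suffix matches a name server, or None."""
    for ns in name_servers:
        host = ns.lower().rstrip(".")
        for suffix, registrar in NS_SUFFIX_TO_REGISTRAR.items():
            if host == suffix or host.endswith("." + suffix):
                return registrar
    return None  # DNS is hosted elsewhere, so the heuristic gives no answer


print(guess_registrar(["NS1.DOMAINCONTROL.COM", "NS2.DOMAINCONTROL.COM"]))
print(guess_registrar(["ns1.some-dns-host.example"]))
```

Returning `None` for unmatched name servers covers case (b) above. Case (a) cannot be detected this way, since the match would point to the wrong Registrar.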

4. Law Enforcement & Researchers survey

We ran an expert survey to gather examples and statistics on illegal or harmful Internet acts (as

defined by ICANN through the Terms of Reference for this and other WHOIS studies) in general,

and more specifically those attributed to WHOIS misuse, and to broaden our perspective of

WHOIS misuse. Survey invitees included a diverse set of researchers and consumer protection,

regulatory, and law enforcement organizations.

4.1. Survey methodology and design details

For the invitation process we built upon contacts established at Carnegie Mellon University and

we requested ICANN’s input in finalizing the list of parties invited to participate in the survey. We

made significant effort to build a geographically diverse set of experts that enabled us to capture

the impact and the extent of WHOIS misuse around the world. We were also able to achieve

diversity in the types of expertise of survey participants. (See Section 3.1 for a

description of the invitee list.)

We used email messages to invite individual experts to participate in the survey. The invitation

contained a short description of the study, information about the principal investigator, and links

to either participate in the survey or opt out from any future messages and reminders from us.

We also offered the option to download the questionnaire and email the responses to us. The

content of the invitation is available in Appendix A – Law Enforcement/Researcher survey:

Invitation to participate.

When participants clicked on the link to participate, they were presented with a consent form that

briefly describes the procedures, requirements, risks, benefits, associated compensation (none),

and privacy assurances we offered. The text is available in Appendix A – Law

Enforcement/Researcher survey: Consent form.

The survey lasted 8 months – from August 2011 until May 2012 – and collected responses from

101 participants. The survey was implemented with SurveyMonkey and all connections to this

service were protected with SSL.6 The survey questions are available in Appendix A – Law

Enforcement/Researcher survey: Survey questions. Invitees were assured that all responses

would be treated as confidential, with survey data published only in aggregate, anonymized

form.

6 Using SSL is just one of the measures we took to preserve the confidentiality of responses. In addition, only authorized personnel (researchers on our team) handled the survey responses. At the completion of the study all responses were removed from SurveyMonkey and kept at a secure location at Carnegie Mellon.

4.2. Analysis of responses

In the following sections we first describe the demographics of the participants, which establish

their level of expertise and geographical diversity, and then we delve into the WHOIS misuse-

specific responses. We then provide an overall summary of our findings from this survey.

Demographics

The participants were initially asked to self-classify their occupation (Figure 1) and the type of

employer they are working for (Figure 2). As expected, security researchers and

government/law enforcement agents constituted about 90% of the responses. Based on the

description of the respondents’ employers, it is evident that the government view is over-

represented in responses. However, assuming that government agencies have a more

extensive and clear awareness of the misuse incidents, this characteristic of our population

sample is an acceptable bias.

Figure 1 Occupation of participants.

Occupation (% of participants): Security consultant 25%; Researcher (industry) 20%; Law enforcement agent 20%; Researcher (academia) 12%; Government agency 10%; Other 7%; Manager 5%; Consumer protection agency 1%.

Figure 2 Description of employer.

Employer (% of participants): Governmental organization 32%; Security industry 23%; Academia 14%; Other IT industry 14%; Not-for-profit NGO 12%; Other 5%.

In terms of geographical coverage, the respondents mainly provided responses for the Americas and Europe (Figure 3). While we made significant effort to invite experts in Asia, Africa, and the Pacific region, participation from these regions was limited.

Figure 3 Reporting regions

Reporting region (% of participants): North America 37%; South America 32%; Europe 18%; Central America 6%; Africa 4%; Asia 1%; Oceania 1%.

Level of expertise

In the survey we included a set of questions that would inform us about the level and type of

expertise of the participants in the subject we are studying. Therefore we used a Likert scale (1:

low – 5: high) to rate the participants’ familiarity with the domain name registration process, the

requirement to provide personal information during that process, and the existence of the

WHOIS directory that makes this personal information available to the public, based on self-

reporting.

The results (Table 4) show that the majority of respondents are cognizant of the domain

registration process (mean: 4.1, std. dev.: 2.03) and of the requirement to submit personal

information (mean: 4.23, std. dev.: 2.06), and that almost 60% of participants rated themselves

as experts in the specifics of the WHOIS directory (mean: 4.35, std. dev.: 2.1).

We also included questions that would not only evaluate the participants’ understanding of two

domain-specific notions (WHOIS harvesting, WHOIS anti-harvesting techniques), but would also

provide us with insight into the level of expert awareness of WHOIS misuse, and the

techniques to thwart it.

Table 4 Familiarity with key domain registration concepts

Concept | 1 - Not familiar | 2 | 3 - Know the basics | 4 | 5 - Expert
Domain registration process | 2% | 0% | 23% | 33% | 41%
Requirement to supply contact information with domain registration | 1% | 1% | 16% | 35% | 46%
Availability of contact information on WHOIS directory | 3% | 1% | 10% | 30% | 56%
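As a consistency check, the mean ratings can be approximated from the response shares listed in Table 4. The Python sketch below is illustrative only: it works from rounded percentages, so its results can differ slightly from the reported means, and the name `likert_mean` is ours.

```python
# Response shares (%) for ratings 1..5, as listed in Table 4.
TABLE_4 = {
    "Domain registration process": [2, 0, 23, 33, 41],
    "Requirement to supply contact information": [1, 1, 16, 35, 46],
    "Availability of contact information on WHOIS": [3, 1, 10, 30, 56],
}


def likert_mean(shares):
    """Weighted mean rating; dividing by the share total absorbs rounding."""
    total = sum(shares)
    weighted = sum(rating * share for rating, share in enumerate(shares, start=1))
    return weighted / total


means = {item: round(likert_mean(shares), 2) for item, shares in TABLE_4.items()}
for item, mean in means.items():
    print(item, mean)
```

The resulting values fall within a few hundredths of the means reported in the text.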

81% of participants stated awareness of WHOIS harvesting, and 63% of WHOIS anti-harvesting

techniques. When the participants were asked to describe some anti-harvesting techniques,

most of them mentioned CAPTCHAs, port 43 rate limiting, and privacy or proxy registration

services.
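To make the rate-limiting idea concrete, the following Python sketch caps the number of WHOIS queries accepted from one client address within a sliding time window. It is a hypothetical illustration rather than a description of how any Registrar or Registry limits port 43 traffic; the class name, limit, and window length are assumptions.

```python
from collections import defaultdict, deque


class QueryRateLimiter:
    """Allow at most `limit` queries per client within `window` seconds."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self._history = defaultdict(deque)  # client IP -> recent query times

    def allow(self, client_ip, now):
        recent = self._history[client_ip]
        # Forget queries that have aged out of the sliding window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit: refuse this query
        recent.append(now)
        return True


limiter = QueryRateLimiter(limit=3, window=60.0)
decisions = [limiter.allow("192.0.2.1", t) for t in (0, 1, 2, 3, 61)]
print(decisions)  # [True, True, True, False, True]
```

A bulk harvester that issues many queries from one address is refused once it exceeds the limit, while occasional lookups are unaffected.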

Attack experiences

In this section of the survey we sought to collect information related to direct and indirect

(reported) experiences of security related attacks overall, before we considered the role of

WHOIS misuse. The combined measures show the prevalent types of attacks that Internet

users are faced with in general. Further on, we tried to look for relationships (if any) between

reported security incidents and WHOIS misuse.

Table 5 and Table 6 list a variety of types of security incidents that can be triggered by network

attacks; participants were asked to note the ones that they have directly (Table 5) and indirectly

(Table 6) observed. Not surprisingly, email spam is the most observed type of network attack in

both cases. It is noteworthy, though, that all types of attacks (e.g., postal spam and blackmail)

have a high rate of occurrence. Comparing the directly observed and reported security incidents,

we see a lower rate of reporting of email spam, email viruses, and postal spam. This could be

attributed to the widespread nature of these types of attacks, which could lead victims to deem

reporting such incidents unnecessary.

Table 5 Directly observed network attack experiences (overall, not specifically related to WHOIS misuse)

Type of attack (% of participants) | Yes | No
Email spam | 97% | 3%
Email virus | 82% | 18%
Malware installation/drive by downloads | 78% | 22%
Phishing | 77% | 23%
Unauthorized intrusion on servers | 58% | 42%
Postal spam | 55% | 45%
Denial of Service | 54% | 46%
Abuse of personal data or identity theft | 49% | 51%
Blackmail/ransom demands/intimidation | 36% | 64%
Have experienced attacks, but prefer not to divulge specifics | 26% | 74%
Vishing (voicemail phishing) | 20% | 80%

Table 6 Security incidents reported to the expert (overall, not specifically related to WHOIS misuse).

Type of incident (% of participants) | Yes | No
Email spam | 74% | 26%
Phishing | 69% | 31%
Malware installation/drive by downloads | 67% | 33%
Email virus | 67% | 33%
Abuse of personal data or identity theft | 67% | 33%
Unauthorized intrusion on servers | 62% | 38%
Denial of Service | 58% | 42%
Blackmail/ransom demands/intimidation | 50% | 50%
Postal spam | 47% | 53%
Vishing (voicemail phishing) | 39% | 61%
Have experienced attacks, but prefer not to divulge specifics | 30% | 70%

Only 40% of the respondents reported that they consider the possible contribution of WHOIS misuse when analyzing security incidents. Such an observation has two possible interpretations (or a combination of both): either misuse of WHOIS data is an attack vector that is being underestimated by security experts and thus is not considered a valuable aspect to analyze, or WHOIS misuse is found to be insignificant in examining security incidents. However, in a few cases, the experts reported that they were able to trace an attack back to the public availability of WHOIS information, as described next.

Specific WHOIS misuse incidents

In Figure 4 we show that 18 respondents (18%) were able to provide details in relation to 23 individual incidents involving suspected harvesting of WHOIS information.7 The experts directly experienced about half (45%) of those incidents, as they were the targets of the misuse. In most of the cases, the effect of the misuse was the receipt of electronic and postal spam mail containing marketing materials or bills for services that were not requested. However, a few of those incidents (4) show highly sophisticated planning to extract money, distribute malware, and, in one case, to poison DNS servers by deploying a phishing attack using WHOIS information. In another case, Registrant information was used to register numerous domains for illegal purposes.

7 The nature of the survey (expert survey) does not allow us to extrapolate this rate of WHOIS misuse occurrence; it is merely an illustration of the kinds of misuse of WHOIS reported on a global scale.

Figure 4 Portion of survey respondents reporting at least one incident of WHOIS misuse.

Respondents (% of participants): reporting experience of WHOIS misuse incidents 18%; not reporting experience of WHOIS misuse incidents 82%.

The type of personal information most often reported as misused was the email address (16 cases, or 70% of all 23 cases of misuse). However, there were many instances where Registrant name (6 cases, 26% of all 23 cases of misuse), postal address (6 cases, 26% of all 23 cases of misuse), and phone number (4 cases, 17% of all 23 cases of misuse) were misused as well, either individually or in combination with other personal details. Figure 5 summarizes these findings.

Figure 5 Breakdown of reported cases of WHOIS misuse, based on the type of personal information misused. Certain cases of misuse involved more than one type of information being misused, hence the total is greater than 100%.

Misused information (% of cases involving specific misuse): Email address 70%; Registrant name 26%; Postal address 26%; Phone number 17%.

Note that the percentages in Figure 5 correspond to the fraction of misuse cases; but recall that

only 18% of our respondents experienced any form of misuse at all. Furthermore, certain cases

involved multiple types of information being misused – and thus the percentages add to more

than 100%.
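The arithmetic behind these percentages can be reproduced with a few lines of Python. The case counts are those reported in the text; the variable names are ours.

```python
TOTAL_CASES = 23

# Number of misuse cases involving each type of information (from the text).
cases_by_field = {
    "Email address": 16,
    "Registrant name": 6,
    "Postal address": 6,
    "Phone number": 4,
}

shares = {field: round(100 * n / TOTAL_CASES) for field, n in cases_by_field.items()}
print(shares)
# {'Email address': 70, 'Registrant name': 26, 'Postal address': 26, 'Phone number': 17}

# A single case can involve several fields, so the shares exceed 100% in total.
print(sum(shares.values()))  # 139
```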

In 11 (48%) of the reported WHOIS misuse cases, experts reported taking no action to mitigate

the misuse (either its effects or a future recurrence). However, in 11 of the 12

remaining cases, in which anti-harvesting techniques were subsequently employed, WHOIS

misuse incidents were eradicated. A few examples of such techniques include CAPTCHA

challenges and IP blocking, and one less technical mechanism where the legal department of

the affected company identified the WHOIS harvesters and demanded that they destroy the

misused WHOIS data.

4.3. Discussion

We surveyed law enforcement and security research experts to comprehend the extent of

misuse of the publicly available WHOIS information globally. We succeeded in having a

geographically diverse sample with different types of expertise providing us with their insights on

WHOIS misuse. However, as this is an expert survey with a limited population sample, we do

not achieve statistical significance in our findings. (Note that this was not a goal, due to the

inherent nature of an expert survey.)

Overall, we found that, according to experts participating in this survey, WHOIS data misuse is

generally not considered when investigating security incidents, possibly because it is

underestimated as an attack vector. It is also noteworthy that, despite the wide net we cast in

this survey, we were able to collect only a moderate-sized list of WHOIS misuse incidents from

organizations that should have an extensive understanding of the matter. This could mean that

WHOIS misuse is either under-reported or not as prevalent as conjectured. The other parts of

this study attempt to provide a more definitive answer to this question.

We collected reports from a minority of the respondents that they had directly observed WHOIS

misuse incidents. The effects of these incidents range from simple spam, to a well-orchestrated

phishing attack with the purpose of DNS-poisoning. Additionally, the countermeasures deployed

in those cases (mainly CAPTCHA and IP blocking) were adequate in preventing future WHOIS

misuse incidents. Again, other parts of this study explore anti-harvesting measures more

empirically.

5. WHOIS misuse reported by Registrants

We surveyed a representative sample of top 5 gTLD domain name Registrants described in

Section 3.2 to gain a better understanding of their direct experiences with WHOIS misuse. In the

following sections we will first discuss the methodology and design details of the Registrant

survey. Later, we describe issues presented during the survey, which affected the

representativeness of our findings. We then present our discoveries related to the ways

Registrants experience misuse of their personal information as a consequence of its public

availability in WHOIS.

5.1. Survey methodology and design details

Methodology

We used email messages to invite Registrants to participate in the survey. We acquired the

contact information through the WHOIS entries associated with the domains in our sample. The

invitation contained a short description of the study, information about the principal investigator,

and links to either participate in the survey or opt out from any future messages and reminders

from us. Because this survey was designed to be taken by non-Internet-savvy Registrants, the

invitation briefly described domain registration and the role of WHOIS data in simplified

language, stated the sampled domain name included in our survey, and

suggested that invitees query that domain name to see data about them published in WHOIS.

We also offered the option to download the questionnaire and email the responses to us. The

content of the invitation is available in Appendix B – Registrant survey: Invitation to participate.

When participants clicked on the link to participate they were presented with a consent form that

describes briefly the procedures, requirements, risks, benefits, associated compensation (entry

into a random prize drawing), and privacy assurances we offered. The text is available in

Appendix B – Registrant survey: Consent .

Between May 2012 and August 2012 we ran two pilots of the survey, which guided us in making

adjustments that increased the observed response rate. The actual survey lasted three and a

half months, from September 2012 until December 2012. The invitations were sent out in stages,

and each group of invitees was offered a period of 5 weeks to complete the survey. We also

scheduled weekly reminders to non-respondents, which increased the response

rate. The survey was implemented with SurveyMonkey and all connections to the service were

protected with SSL.8 Invitees were assured that all responses would be treated as confidential,

with survey data published only in aggregate, anonymized form.

Survey translations

Because potential for WHOIS misuse is not restricted to English-speaking countries and this

survey was targeted at typical Internet users across the world, we developed translations of our

survey. We relied on native speakers of various languages from CMU for the translations. Our

translators all had a background in computer networking or computer security, which meant they

not only had the required technical background to produce meaningful translations, but they

were also able to integrate nuances of the different cultures, making the international invitee

more likely to understand the survey materials and therefore more willing to participate.

Our sample of 1,619 domain name Registrants covers 81 countries. Translating the survey into

every relevant language would have required disproportionate effort, as some languages would

serve only a handful of participants. In addition, the expected low response rate of the survey (15%) was a

good indicator that a number of translations would not be necessary, as the expected number of

responses for certain languages was close to zero, regardless of the language used. We

observed that 90% of our sample was located in just 18 countries, with the other 10% spread

across 63 countries. Hence, we decided to provide translations for the top 90% of the

participants (which includes English), and offer the English version of the survey to the other

10%. We offered the survey in the following languages: English, Chinese, French, Japanese,

Spanish, Italian, and Portuguese. We also intended to have German and Turkish translations,

but were not able to secure proper translations and ended up offering the English version of the

survey to participants from those two countries. This effectively reduced the portion of

participants surveyed in their expected native language to 84.9%.

As the expected response rate for the 10% of the invitees that belong to one of 63 countries is

close to zero, regardless of the language used in the survey, we do not expect that not providing

translations for this portion affected the outcome of the survey. Invitees from Germany and

Turkey represent 5% of the sample. Considering the expected response rate, and assuming

that none of the invitees from those countries have knowledge of English (which is certainly an

extremely conservative assumption), we estimate that the upper bound of the misrepresented

population is only 0.7%.

8 See footnote 6.

Types of questions

The survey was divided into three parts. The first set of questions was designed to collect data on

the demographics of the participants. The second part of the survey covered seven

different types of misuse of WHOIS (postal spam, email spam, voice spam, identity theft,

unauthorized intrusion to servers, denial of service, and Internet blackmailing), as well as any other type of

misuse a Registrant may have experienced. We requested that the participants optionally

provide a detailed description of their experiences in any of the previous categories. Due to the

length of the survey, which could take up to 30 minutes to complete, and could therefore lead to

participants abandoning the survey before completion, we randomized the sequence of

questions for different types of misuse, in an effort to avoid biases related to the design of the

survey. The third and final part of the survey collected information related to actions taken by

the participants in response to the WHOIS misuse. The survey questions are available in

Appendix B – Registrant survey: Survey questions. Through an online glossary, we also offered

definitions for key terms used in the survey questions, to accommodate typical Internet user

participants not familiar with the technical DNS and cybersecurity jargon. The terms are

available in Appendix B – Registrant survey: Terms.
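The randomization of the misuse sections can be sketched in Python as follows. This illustrates the idea only; the survey itself was implemented with SurveyMonkey, whose mechanism is not described here, and the function name `section_order` and the use of a participant identifier as a seed are assumptions.

```python
import random

MISUSE_SECTIONS = [
    "postal spam", "email spam", "voice spam", "identity theft",
    "unauthorized intrusion to servers", "denial of service",
    "Internet blackmailing",
]


def section_order(participant_id):
    """Return a per-participant random ordering of the misuse sections."""
    rng = random.Random(participant_id)  # seeded so the order is reproducible
    order = MISUSE_SECTIONS[:]           # copy, leaving the master list intact
    rng.shuffle(order)
    return order


print(section_order(1))
print(section_order(2))
```

Because each participant sees the sections in a different order, abandonment late in the survey does not systematically affect the same type of misuse.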

5.2. Response and error rates

Between May and August of 2012, we ran two pilots of the Registrant survey to assess possible

issues with the design and/or implementation of the survey. One pilot involved tech-savvy

colleagues at CMU with great experience in user surveys. This pilot helped us identify and fix a

number of design issues. The second pilot was targeted to a broader audience of randomly-

selected English speaking Registrants, and was intended to assess the expected response rate.

As shown in Table 3, we expected a response rate of 15%. However, in this second pilot, we did

not receive any responses out of the 48 invitations sent. We identified as a possible problem the

excessive length of the survey, which apparently discouraged participation. Therefore, we

attempted to remedy this by offering entry into a random prize drawing9 to participants that

would complete the survey in its entirety. Note there was no incentive to report having

encountered misuse; respondents were only required to complete survey sections that

pertained to their experiences.

9 The prizes were one Apple iPad 3 and four Apple iPod Shuffles, selected by random drawing among all participants who completed a survey.

Overall, we sent out 1,619 invitations and had 57 participants: 52 in English, 3 in Japanese, and

2 in Spanish, achieving a response rate of 3.6%. Out of these 57 participants, we had 41

complete responses. Such a low number of collected responses affects our targeted level of

precision, namely the error rate. The resulting error rate for the statistic we are measuring (is there observed WHOIS misuse?) is 12.7%. This means that, at the 95% confidence level, the

measured rate of misuse is expected to lie within 12.7 percentage points of the actual rate among

Registrants. In the remaining 5% of cases, the measured misuse can deviate from the actual value

by more than 12.7 percentage points (i.e., far more or far less misuse).

We should point out that inviting more Registrants was not expected to help us reach the goal of

5% error rate. If we were to invite every one of the 2,905 Registrants in the Registrant

microcosm, with an observed response rate of 3.6%, we would collect 105 responses. This

number of responses would result in a 9.4% error rate. This lower error rate would be

associated with a higher cost of running the survey, due to additional translations required.
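The projection above can be reproduced with a short Python sketch. The report does not state the exact formula behind its error rates; this sketch assumes the usual margin of error for a proportion at the 95% level (z = 1.96, p = 0.5) with a finite population correction, and the function name `margin_of_error` is ours.

```python
import math


def margin_of_error(responses, population, z=1.96, p=0.5):
    """Margin of error for a proportion, with finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / responses)
    fpc = math.sqrt((population - responses) / (population - 1))
    return z * standard_error * fpc


MICROCOSM = 2905
projected = math.ceil(MICROCOSM * 0.036)  # responses at a 3.6% response rate
error = margin_of_error(projected, MICROCOSM)

print(projected, round(100 * error, 1))  # 105 9.4
```

Applied to the 57 observed responses, the same function gives a value close to the 12.7% reported above; the exact figure depends on the population size assumed.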

5.3. Analysis of responses

We start the analysis of the collected responses by first giving an overview of the characteristics

of the sample in terms of the demographics as well as the knowledge reported about the

WHOIS directory. We then delve into the specific types of WHOIS misuse reported.

Characteristics of the participants

From a demographic standpoint, the participants are mainly from English speaking countries

(92%) even though we made efforts – as previously discussed – to include a wide geographical

range of participants. We collected responses from the following countries (in descending order

of number of participants): USA, Japan, United Arab Emirates, Australia, Canada, Switzerland,

Germany, Spain, UK, India, and Mexico (Figure 6). There were also respondents that did not

disclose their location.

Figure 6 Reported origin of participants.

Participants from a single country (values as extracted from the chart): USA 31; Japan 2; United Arab Emirates, Australia, Canada, Switzerland, Germany, Spain, UK, India, and Mexico 1 each.

Although each Registrant was surveyed just once, with regard to a single sampled domain name, the majority of the participants (60%) have more than 10 domains registered, with 9% of the participants operating a single domain. Additionally, the domains in our sample are mainly registered by self-described for-profit businesses or organizations (49%), followed by domains registered by individuals (33%) and domains registered by not-for-profit organizations (14%)10 (Figure 7). Moreover, respondents reported that the largest share of the domains (46.5%) in our sample are used for commercial activities. Finally, the great majority of the participants (93%) indicated they are aware that any personally identifiable information included in Registrant name and contact data can be accessed via the public WHOIS directory.

10 This survey asked Registrants to indicate whether a domain name was registered by an individual, for-profit business or organization, non-profit organization, informal group, or other. Unlike other WHOIS studies (NORC, 2010 and 2013), we did not attempt to verify these answers or to classify entities actually using domains for any stated purpose.

Figure 7 Self-reported use of surveyed domains.

Use of domains (% of domains): For-profit business or organization 48.8%; Individual use 32.6%; Non-profit organization 14.0%; Informal interest group 4.7%.

Comparing the self-reported demographics of our survey with the WHOIS-based findings of the WHOIS Registrant Identification Study (NORC, 2013), we see that the top two categories are occupied by similar entities in both studies, with individual/natural person Registrants appearing with roughly the same frequency (30% vs. 33%). In our study, the combined share of categories representing legal person Registrants is 62% compared to 39% in (NORC, 2013).

Reported WHOIS misuse

We now present our findings for each specific type of WHOIS misuse that we studied. In each set of questions, we first asked the participants to report whether they had experienced misuse of a specific type of information supplied when registering their domain. If the answer was yes, we then asked more specific questions about those misuse incidents.

25 of the respondents (43.9%) reported experiencing some kind of misuse of their WHOIS information. In Table 7 we provide a breakdown of the reported WHOIS misuse for the three types of information published in WHOIS that are reportedly subject to misuse: postal address, email address, and phone number.

Table 7 Breakdown of participants reporting misuse, based on the type of reported misuse

% of responses | Phone number | Email address | Postal address | Combined
Experienced misuse, and information was published in WHOIS only | 8.8% | 14% | 21.1% | 43.9%
Experienced misuse and attribute misuse to WHOIS | 12.3% | 29.8% | 29.8% | 43.9%
Experienced misuse | 22.8% | 43.9% | 38.6% | 73.7%

Postal address misuse

38.6% of surveyed Registrants (22) have received postal spam mailed to an address published in WHOIS, and 29.8% (17) believed the unsolicited mail resulted from misuse of their WHOIS postal address. In support of their suspicion, participants provided details of the unsolicited mail; it was either directly related to one of their domains, or it advertised web services. Moreover, 21.1% (12) of the participants reported that their WHOIS postal address was not published in any other public directory (e.g. phone book, website, etc.).

The largest group of respondents who have received postal spam (14% of total, 8) experience this a few times a year, with 11% (6) receiving postal spam a few times a month, and 5% (3) less than once a year. The reported subjects of the unsolicited correspondence were mainly related to fake domain name renewals and transfers, followed by messages related to website hosting and search-engine optimization (SEO) services.

Email address misuse

25 Registrants (43.9%) reported receiving spam email at an account associated with a WHOIS email address. 17 Registrants (29.8% of all respondents) associate the misuse of their email address with WHOIS because the topics of the spam emails specifically targeted domain name Registrants (e.g. domain name transfer offers, domain name SEO offers). 14% (8) of the Registrants stated they have not listed the misused email address in any other public directory.

The largest group of respondents (10%, 6 Registrants) identifying WHOIS data misuse as a

cause for email spam reported that they receive spam email at the email address published in

WHOIS a few times a day, followed by 9% of responses (5 Registrants) receiving unsolicited

email a few times a week. The topics of the unsolicited messages are similar to the ones

reported for postal spam.

Phone number misuse

22.8% (13) of Registrants reported receiving voicemail spam, with 12.3% (7) attributing the

spam to WHOIS misuse. They were able to associate the voicemails with WHOIS because the

caller either explicitly referred to a domain name under the Registrant’s control or they were

offering domain services. 9% (5) of the Registrants who claimed to have experienced the

misuse of their WHOIS phone number said they had not listed their number in any other public

directory.

Identity theft

Two of the participants reported that they have experienced identity theft but none could tie this

to WHOIS misuse.

Unauthorized intrusion to servers

In order to measure the extent of misuse of WHOIS information to gain unauthorized access to

servers, we first asked the participants if they are the system administrators of Internet servers

associated with one of their registered domains. The number of participants that have this role is

very small (7%, 4), with just one person experiencing unauthorized intrusion. That respondent

could not tie the intrusion to WHOIS misuse.

Blackmail

One participant reported being a victim of blackmail¹¹ as a result of their information being

published in the WHOIS directory. The Registrant was allegedly accused by a third-party

company of violating the terms of domain registration because of the name the Registrant chose

for the domain. The Registrant said he was asked to pay some amount to settle, but after

consulting with lawyers, the Registrant decided to not take any action. After a few months, and a

series of emails from the third party, the latter stopped communicating with the Registrant. The Registrant reported being adversely affected in terms of time (reading emails), and money (lawyer consultation).

11 We describe this incident as reported by the Registrant, but cannot know the veracity of this claim or whether the domain name dispute was founded.

Other

Although this survey gave Registrants an opportunity to describe WHOIS misuses not otherwise

covered, no participant claimed to have experienced any other type of WHOIS misuse.

Adverse effects

In Figure 8 we present the portion of Registrants that reported they were adversely affected by

the misuse of their information, reportedly caused by WHOIS. In all types of misuse the main

adverse effect is the frustration caused by the extra time the Registrants need to go through the

spam email, postal mail, and voicemail. Spam calls associated with WHOIS misuse, even

though they only occur a few times in a year, appear to cause the highest level of frustration

(12%), possibly because spammers directly interact with the person picking up the phone.

Spam postal mail causes the least frustration (5%): people are used to junk mail, and WHOIS

associated postal spam is relatively infrequent. WHOIS-related email spam, even though it is

the type of misuse most prevalent and frequent, adversely impacted 10.5% of the Registrants. A

plausible explanation for this discrepancy is that people in general, and Registrants in this case,

are used to receiving many unsolicited emails on a daily basis. Therefore the marginal cost of

deleting one more spam email originating due to WHOIS misuse may be considered negligible

by sampled Registrants.

Figure 8 Portion of participants adversely affected by the misuse of their information published in the WHOIS, broken down into the three main types of misuse.

Type of misuse | % adversely affected
Phone number | 12.3%
Email address | 10.5%
Postal address | 5.3%

Countermeasures

40% (8) of the 20 Registrant survey participants that have experienced at least one type of

WHOIS misuse reported having taken actions to protect themselves from additional WHOIS

misuse. On the other hand, 60% (12) of Registrants experiencing misuse did not take any

countermeasures. Registrants that took action reported utilizing a combination of the following:

Moving to a different Registrar (3).

Change misused portions of WHOIS information (4).

Change contact addresses and names with ones from a service provider (proxy services) (4).

Change contact addresses with forwarding addresses provided by a service provider (privacy services) (3).

Supply partially incorrect or incomplete information (2).

Apply spam filter or register with an identity theft protection service (5).

The last option attracted the most interest, even though it only deals with the consequences of

the misuse, rather than trying to remedy possible factors leading to the WHOIS misuse itself.

24.5% of participants (14) were aware of strategies used by their domains’ Registrars to deter

WHOIS misuse. Most of the responses indicated the availability of proxy and privacy services

as part of the Registrars’ strategies against WHOIS misuse; and the use of CAPTCHAs in web-

based WHOIS queries as part of the set of strategies.

5.4. Discussion

Getting Registrants to communicate their experiences in terms of the possible misuse of their

personally identifiable information listed in WHOIS proved to be a challenging task. Even with

an incentive to participate (a raffle at the end of the survey), we were only able to collect

responses from a small portion of invitees (57 out of 340, or 17%). However we were able to get

a clear insight into the prevalence of WHOIS misuse and the specific types of information that is

usually targeted.

Our study showed that 43.9% of Registrants claim to have experienced some type of WHOIS misuse. Given the 12.7% margin of error, this observation neither confirms nor disproves that WHOIS misuse is affecting the majority of Registrants. It does, however, confirm the hypothesis that public access to WHOIS data leads to a measurable and statistically significant degree of misuse.
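As a rough check of this reasoning, the sketch below takes the two reported figures (43.9% and a 12.7% margin of error) and assumes the margin applies symmetrically around the estimate. The variable names are illustrative only, and the report does not state how the margin itself was derived.

```python
# Reported figures from the Registrant survey.
estimate = 43.9          # % of Registrants reporting some WHOIS misuse
margin_of_error = 12.7   # percentage points, as stated in the report

# Assumption: the margin of error is symmetric around the estimate.
lower = estimate - margin_of_error
upper = estimate + margin_of_error

# The interval straddles 50%, so a majority can be neither confirmed nor
# ruled out, while the lower bound stays well above zero.
majority_undetermined = lower < 50.0 < upper
misuse_above_zero = lower > 0.0

print(f"interval: {lower:.1f}% to {upper:.1f}%")
print("majority undetermined:", majority_undetermined)
print("lower bound above zero:", misuse_above_zero)
```

Under that assumption the interval runs from about 31% to about 57%, which is why the majority question stays open.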

The email address is mostly targeted, followed closely by the postal address. Phone numbers

are also misused, but with a much smaller occurrence and higher adverse impact per incident.


In terms of certainty of whether the misuse is originating from WHOIS, postal address misuse

comes first.

Potential survey biases

We need to contemplate the biases the survey design introduced to evaluate the possibility of

over or under-reporting of WHOIS misuse. First, by not providing translated versions of the

survey to 15% of the sample, we may have missed some incidents of misuse experienced by

Registrants that do not speak English. However, given the observed response rate (3.6%), the

expected response rate of that portion of the sample (15%) is less than 1% (3.6% of 15%). In

other words, even if we had all the possible translations, we expect that we would not get a

statistically significant number of responses from this group.

Another possible bias is that Registrants may be more willing to report a harmful act (e.g.

experience with misuse) rather than a lack of harmful incidents, which could lead to over-

representation of the incidents. In addition, we did not attempt to verify or corroborate any

WHOIS misuse incident, which could lead to false representation of the extent of WHOIS

misuse. However, the strong economic incentive we provided (entry into a random prize

drawing) was given for completing the survey, regardless of the kind of responses entered, and

should mitigate this potential source of bias.

One may argue that as this is a survey with a fair amount of technical content, it is biased

towards tech-savvy participants. We attempt to mitigate this possibility by providing explanatory

links throughout the survey. Additionally, since the registration of a domain assumes some level

of technical understanding about the Internet, we believe that the technical complexity of this

survey should be within the technical understanding of most Registrants.

Finally, as described, the great majority of the survey participants originate from North

America. This fact affects our findings in the following ways. First, we are unable to analyze the

geographical distribution of misuse, as the survey suffers from coverage bias. Consequently,

findings are also descriptive of a narrower portion of the world population than we had wished.

As a result, the survey cannot accurately capture potential geographical diversity in the

occurrence of WHOIS misuse.


6. Assessing Registrar/Registry anti-harvesting

In this section we discuss the WHOIS anti-harvesting techniques offered by the Registrars and

Registries. We first present the results of a survey that collected information from Registrars and

Registries regarding their experiences in terms of WHOIS harvesting incidents and employed

countermeasures. Then, we empirically tested the Registrars’ infrastructures when faced with

WHOIS queries at high rates, and we present our findings here.

6.1. Survey methodology and design

This survey targeted the top five gTLD Registries and a globally diverse sample of Registrars to

collect information related to their experiences in terms of WHOIS misuse incidents, and their

efforts to counter such activity. We used email messages to invite a sample of Registrars and

Registries to participate in the survey. The invitation contained a short description of the study,

information about the principal investigator, and links to either participate in the survey or opt out

from any future messages and reminders from us. We also offered the option to download the

questionnaire and email the responses to us. The content of the invitation is available in

Appendix C – Registrar and Registry Survey: Invitation to Participate

When invitees click on the link to participate they are presented with a consent form that

describes briefly the procedures, requirements, risks, benefits, associated compensation (none),

and privacy assurances we offered. The text is available in Appendix C – Registrar and Registry

Survey: Consent form. In the consent form we offered assurances in terms of the confidentiality

of the reported results, in that no Registrar or Registry would be mentioned explicitly, and all the

results would be presented in aggregate form.

Before running the survey we ran a pilot with a small number of Registrars to evaluate the

quality of the questions and the related material. Some questions and part of the consent form

were modified to reflect pilot-reported sensitivity, particularly around disclosure of anti-

harvesting techniques.

The Registrar survey lasted 6 months – from March 2012 until September 2012 – and collected

in total 22 responses out of 111 invitees. For the invitation process, we used information

associated with the Registrant Survey sample, by identifying the Registrars and Registries that

collect and/or store WHOIS information for those sampled domains. Since our sample is

targeted based on the survey design, we do not make any claims of statistical significance in


terms of the overall gTLD Registrar and Registry population. However we do claim that we have

collected responses from 22 out of the 107 largest Registrars and, regrettably, despite

personalized invitations and multiple follow-up phone calls to Registry contacts in March 2013,

only one of the 4 top 5 gTLD Registries. The survey was implemented with SurveyMonkey and

all connections to the service were protected with SSL.¹²

6.2. Analysis of responses

We first describe the demographics of the Registrar/Registry survey participants in terms of their

location, the volume of domain registrations and WHOIS queries they process monthly. We then

provide an overall summary of our findings from this survey.

Demographics

The largest group of the 22 Registrars that participated in the survey was located in the United States

(5), with the rest distributed across the following countries: China, Germany, Spain, Poland,

Turkey, France, India, South Korea, and UK. About 64% of the Registrars handle under 1

million domain registrations each, and 14% handle between 1 and 10 million registrations each

(Figure 9).

Registrars reported that the most popular method of querying their WHOIS databases is by port

43, which 56% of the Registrars said was used for 100,000 to 10 million queries per month.

Figure 9 Number of domains registered with Registrars participating in survey

Number of domain registrations | % of Registrars
Exactly or under 100 000 | 50%
100 001 to 1 000 000 | 14%
1 000 001 to 10 000 000 | 14%
More than 10 000 000 | 0%

12 See footnote 6.

Table 8 WHOIS queries received by Registrars participating in survey. Note that not all participants answered all questions, so that the columns do not add to 100%.

Employed anti-harvesting techniques

57% of surveyed Registrars and Registries (13 of 23) implement at least one WHOIS anti-harvesting technique, and in Figure 10 we present a breakdown of the techniques implemented per Registrar/Registry. 39% (9 of 23) reportedly implement port 43 rate limiting. 56.5% (13 of 23) provide web forms for interactive WHOIS queries, and 39% (9) require an answer to a CAPTCHA type challenge to receive the WHOIS response. 30% of surveyed Registrars and Registries (7) reported that they use permanent IP/domain blacklisting when necessary, while 52% (12) temporarily blacklist abusers of the service for 5 to 10 minutes.
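To make the combination of rate limiting and temporary blacklisting concrete, here is a minimal sketch of how such a policy could work. It is not any surveyed Registrar's implementation: the class name `Port43RateLimiter`, the 40-query limit, and the one-minute window are hypothetical, and only the five-minute ban is taken from the range respondents reported.

```python
import time
from collections import defaultdict, deque


class Port43RateLimiter:
    """Hypothetical policy: allow `max_queries` per `window` seconds per IP,
    then refuse that IP for `ban_seconds` (a temporary blacklist)."""

    def __init__(self, max_queries=40, window=60.0, ban_seconds=300.0,
                 clock=time.monotonic):
        self.max_queries = max_queries
        self.window = window
        self.ban_seconds = ban_seconds
        self.clock = clock
        self._recent = defaultdict(deque)  # ip -> timestamps of recent queries
        self._banned_until = {}            # ip -> time at which the ban ends

    def allow(self, ip):
        now = self.clock()
        # Still inside a temporary ban: refuse the query.
        if self._banned_until.get(ip, 0.0) > now:
            return False
        recent = self._recent[ip]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_queries:
            # Limit exceeded: start a temporary blacklist period.
            self._banned_until[ip] = now + self.ban_seconds
            recent.clear()
            return False
        recent.append(now)
        return True
```

A permanent blacklist would correspond to a ban that never expires. The injectable `clock` argument exists only so the behavior can be exercised without waiting in real time.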

In addition to direct anti-harvesting measures designed to deter active harvesting, we also

asked Registrars and Registries about Privacy and Proxy services that make harvesting less

desirable. Only 22% of surveyed Registrars and Registries (5) said they offer privacy services

that shield contact details of the domain Registrant except for the Registrant’s name, and 9% (2)

said they offer proxy services that completely shield all contact details. However, Registrants

can also use privacy and proxy services offered by third parties that are not Registrars or

Registries. Interestingly, when looking at the Registrant survey responses for Registrants who

chose countermeasures other than privacy and proxy services, surveyed Registrants reported

only one instance where the Registrar did not offer a privacy/proxy service.

Volume per month | Port 43 WHOIS protocol query responses/month | Web form WHOIS query responses/month | Bulk WHOIS data purchase transactions/month
Do not know or do not measure | 18% | 27% | 27%
1 000 001 to 10 000 000 | 9% | 5% | 0%
100 001 to 1 000 000 | 32% | 14% | 9%
Exactly or under 100 000 | 14% | 27% | 32%

(Values are % of Registrars.)

Figure 10 Proportion of Registrars and Registries implementing a specific WHOIS anti-harvesting technique.

Incidents of WHOIS misuse

We inquired about harmful events associated with incidents of alleged WHOIS misuse that were

reported by any Registrant¹³. Table 9 shows the reported events in descending order of

prevalence. At the top of the list is email spam, which was reported to 39% (9 of 23) of the

Registrars. It is followed by phishing (22%, 5), postal spam (17%, 4), email virus (9%, 2), ID

theft (9%, 2), and various forms of blackmail (9%, 2). 26% of the Registrars and Registries (6 of 23) said they were able to verify that the reported harmful acts originated from misuse of the WHOIS information.

Incidents of WHOIS harvesting and their effect in deploying new countermeasures

30% (7) of the surveyed Registrars and Registries have reportedly experienced attempts of

automated harvesting of WHOIS information from their directories, but the respondents did not

classify any as successful. The same respondents also reported that they have adopted new

anti-harvesting techniques in the past 2 years, as a result of the observed attacks. The most

prominent additions to their defenses are permanent and temporary IP and domain blacklisting along with port 43 rate-limiting (4), privacy/proxy protection services (3), and CAPTCHA (2). Respondents were not asked to evaluate the perceived effectiveness of these measures.

13 We did not ask Registrars or Registries about specific incidents that were discussed in Registrant survey responses.

Figure 10 data:

Type of anti-harvesting technique implemented at Registrar/Registry | % of Registrars/Registries
Temporary blacklisting | 52%
Port 43 rate limiting | 39%
CAPTCHA type challenge | 39%
Permanent blacklisting | 30%
Private registration services | 22%
Registration via proxy | 9%

Many participants did not provide responses in this section. That can be attributed to the

sensitive nature of the information we requested. Even though we provided assurances for the

safe handling and aggregation/anonymization of any information collected by this survey,

Registrars and Registries appear to be hesitant about providing WHOIS misuse specifics.

Table 9 Registrars receiving reports related to suspected types of WHOIS misuse

6.3. Testing of WHOIS query rate limiting techniques

We complement our survey with an experimental validation of methods employed by Registrars

and Registries to combat WHOIS misuse. More precisely, we performed two types of tests on a

sample of Registrars and Registries to evaluate the availability and effectiveness of WHOIS

harvesting countermeasures. First, we performed rate-limiting tests on port 43 of Registrars and

Registries, the well-known network port used for the reception of WHOIS queries. Additionally we

carried out rate-limiting tests for interactive WHOIS query web forms provided by Registries.

Table 10 presents our findings related to the 3 thick Registries that are within the focus of this

study. Based on our test results, we observed that one Registry provides none of the tested

anti-harvesting mechanisms whatsoever; however the other two Registries employ a

combination of anti-harvesting techniques. For instance one Registry employs relatively strict

measures by enforcing the use of CAPTCHA, and it allows a very small number of queries to be

issued to port 43 before applying a temporary blacklist.

Table 9 data:

Type of reported misuse | % of participants
Email spam | 39%
Phishing | 22%
Postal spam | 17%
Email virus | 9%
Abuse of personal data or identity theft | 9%
Blackmail/ransom | 9%
Denial of Service | 4%
Vishing (voicemail phishing) | 4%
[…] servers | 4%
Registrants have reported experiencing harmful acts, but I prefer not to divulge specifics | 4%

Type of defense | Number of Registries | Details
Registries with no observed anti-harvesting techniques on port 43 | 1 |
Registries that limit number of queries, then blocking further requests | 2 | Range of allowed queries: 4 - 40; observed blacklisting duration: 1 minute¹⁴

Table 10 WHOIS query rate limiting at the thick Registries

In testing each “thin WHOIS” Registry’s port 43 rate limiting, we issued a number of WHOIS

queries (1000) targeted at a specific Registrar, by requesting WHOIS information about a

domain registered at that Registrar. We then measured how soon the Registrar would block

further WHOIS query requests.
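The test procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the script used in the study: the function names, the `looks_blocked` heuristic, and the ten-second timeout are all hypothetical. The only protocol detail relied upon is that a WHOIS query is a line of text sent over TCP port 43, after which the server replies and closes the connection.

```python
import socket
from typing import Callable, Optional


def whois_query(server: str, domain: str, timeout: float = 10.0) -> str:
    """Send one WHOIS query over TCP port 43 and return the raw response."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def looks_blocked(response: str) -> bool:
    # Assumption: an empty reply, or one mentioning a limit or blacklist,
    # signals that the server has stopped answering normally.
    text = response.lower()
    return not text.strip() or "limit" in text or "blacklist" in text


def queries_until_blocked(query: Callable[[str], str], domain: str,
                          max_queries: int = 1000) -> Optional[int]:
    """Return how many queries were answered before blocking began,
    or None if all `max_queries` queries were answered."""
    for answered in range(max_queries):
        try:
            response = query(domain)
        except OSError:  # refused connection, reset, or timeout
            return answered
        if looks_blocked(response):
            return answered
    return None
```

A run against a real server would pass something like `lambda d: whois_query("whois.example.net", d)` as `query`, where the host name is a placeholder. Treating a timeout as blocking mirrors the ambiguity noted in the text, where a slow server was initially classified as a temporary blacklisting.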

We tested Registrars’ rate limiting on port 43 in two stages. The first stage involved the 16

Registrars we used in the experimental study (Table 13). In this case we used domains that we

registered as part of the experiment in order to issue WHOIS queries.

In Table 11 we see that only half of the Registrars employ rate limiting as an anti-harvesting

technique, while for the others we observed no such measures. On average, they allow 83

queries before they stop responding to further queries. Just two Registrars in

this group provided information (as part of the WHOIS response message) related to the

duration of the temporary blacklisting, which in both cases was 30 minutes. One Registrar would

not provide responses in a timely manner, causing our testing script to identify this behavior as

a temporary blacklisting. By repeated testing with that Registrar we verified that the error was

not caused by a problem in the testing environment. It is unclear if this was an intended

behavior from the Registrar to prevent automated queries, or if it was just a temporary glitch in

their systems.

It is noteworthy that, when more than 100 queries originate from the same IP address, two of

the Registrars tested with port 43 queries would provide only the name of the gTLD domain

name Registrant and not any other WHOIS address details like email or postal address, and would invite interested parties to use a web form to acquire more information about domains. We did not perform any additional measurements using this form.

14 We did not test if the duration of the blacklisting would change, after attempting to harvest WHOIS data for longer periods of time. We also did not vary our query rate to explicitly trigger or bypass rate limits.

Type of defense | Number of Registrars | Details
Registrars with no observed anti-harvesting techniques on port 43. | 6 (37.5%) |
Registrars that limit number of queries, then block further requests. | 7 (44%) | Mean allowed queries: 83; standard deviation: 66
Registrars that deter automated queries, by delaying their responses by a few seconds. | 1 (6%) |
Registrars providing only name of Registrant when harvesting is detected.¹⁵ | 2 (12.5%) |

Table 11 Aggregate results of WHOIS query rate limiting at 16 Registrars used in the experimental study.

The second stage involved the remaining Registrars associated with the domains in the

Registrant sample. Here, we queried for domains from the Registrant sample. There were a few

occasions, though, where the domains from the Registrant sample associated with specific

Registrars had expired or moved to different Registrars. In those cases we used the name of

the Registrar itself to initiate WHOIS queries, assuming that a Registrar uses its own

infrastructure to register its own domain. This assumption was not always true in the cases of

reseller Registrars. Therefore cases where a single Registrar appears multiple times were

consolidated and each Registrar counted once.

Similar to Table 11, Table 12 presents our findings for the second part of the Registrar testing.

Registrars are mainly divided between those that do not exhibit any anti-harvesting capacity and

those limiting the number of queries, followed by temporary blocking. However, most of the

Registrars (10 vs. 6) used in the experimental study offer some type of WHOIS anti-harvesting

protection, contrary to the majority of the Registrars in the second set that do not offer any

protection (48 vs. 37).

15 When more than 100 queries originate from the same IP address.


This observed difference in proportions potentially means that, if we conclude that WHOIS anti-

harvesting measures deter WHOIS misuse, then the measured misuse in our experimental

study will represent a lower bound of the total misuse occurrence. This conclusion assumes that

we test a representative sample of Registrars; this assumption appears valid, as the set of

Registrars is itself derived from a representative set of Registrants, as described in Section 3.2.

Type of defense | Number of Registrars | Details
Registrars with no observed anti-harvesting techniques on port 43 | 48 (54%) |
Registrars that limit number of queries, then block further requests | 37 (42%) | Mean allowed queries: 92; standard deviation: 99; 3 Registrars imposed 24 hour ban; 2 Registrars imposed 20 min ban; 1 Registrar imposed 1 min ban; 2 Registrars limit max requests per time frame (day/minute) per IP address; 1 Registrar exponentially increases waiting period if not obeying wait time of 1 second after 15th query
Registrars that deter automated queries, by delaying their responses by a few seconds. | 3 (3%) |
Registrars providing only name of Registrant | 1 (1%) |

Table 12 Aggregate query rate limiting test results for 89 Registrars appearing in the Registrant sample, but not in the experimental study. We present the findings of this group of Registrars separately, because of the differences in the testing methodology.

6.4. Discussion

Possibly the most interesting finding of this part of the study is the hesitation of the Registrars

and Registries to provide insights on the reported and experienced incidence of WHOIS misuse,

making it difficult to draw representative conclusions. Considering the responses that were


answered by most of the participants, we see that WHOIS queries are mainly carried out

through port 43, followed by web forms, and then by bulk purchases. However the latter has the

potential for higher impact in misuse, as the number of WHOIS records exchanged is by

definition very large. Nevertheless, port 43 rate limiting appears to be the most widely-adopted

anti-harvesting technique.

It is more insightful to focus on the rate limiting tests that we undertook. The three thick

Registries represented 14.6%¹⁶ of the total domains registered in August of 2011, and the 92

Registrars used to test thin Registries have a combined 77.4% market share¹⁷. With .COM

and .NET domains representing 85.6% of total domains for the same period, the 92 Registrars

cover 66% of the total combined .COM and .NET domain population. Combining information

from Tables 11, 12, and 13 we conclude that 51.4% of tested¹⁸ Registrars and Registries do not

employ any port 43 rate limiting technique with the remaining 48.6% employing some type of

rate-limiting technique.
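The text does not spell out how these percentages were computed. One reading that reproduces them, offered here as an assumption, is to pool the Registrar counts from Tables 11 and 12 and to multiply the two reported shares for the .COM/.NET coverage figure:

```python
# Counts of Registrars with no observed anti-harvesting technique on port 43,
# taken from Table 11 (16 Registrars) and Table 12 (89 Registrars).
no_technique = 6 + 48
tested = 16 + 89

# Assumption: every other row of the two tables counts as "some type of
# rate-limiting technique".
share_without = 100 * no_technique / tested
share_with = 100 - share_without
print(f"{share_without:.1f}% without, {share_with:.1f}% with")

# Coverage of the combined .COM/.NET population, read as the product of the
# Registrars' market share and the .COM/.NET share of all domains.
market_share = 0.774
com_net_share = 0.856
coverage = 100 * market_share * com_net_share
print(f"{coverage:.0f}% of .COM/.NET domains covered")
```

Under this reading, 54 of 105 Registrars gives 51.4%, and 77.4% of 85.6% gives roughly 66%.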

The approach pursued by the two Registrars which only provided the name of the gTLD domain

name Registrant and none of the other WHOIS details – instead referring interested parties to

filling out a web form – appears to be an interesting compromise between protection of

personally identifiable information against port 43 harvesting and the contractually mandated

port 43 availability of WHOIS information imposed by ICANN.

In Sections 8.2 and 8.3 we study the correlation of the existence (or lack thereof) of anti-

harvesting mechanisms with the measured occurrence of WHOIS misuse from the experimental

study to evaluate the effectiveness of such measures.

16 BIZ, INFO, and ORG domains combined (Table 2).

17 Market share was estimated using the Registrars’ proportions in the Registrant sample, which is representative of the Internet.

18 Note that we did not test a representative set of Registrars and Registries, but rather a subset of Registrars and Registries used to register sample domain names.

7. Experimental Study

The experimental study attempts to complement the descriptive study by gathering a set of

controlled network measurements. The experimental study aims to capture directly incidents of

WHOIS misuse by registering 400 domain names with a variety of Registrars, using artificial

contact information, and then monitoring possible misuse of this publicly available information

that we did not publish or use anywhere else. The channels that are monitored to this end

include email, postal addresses and phone numbers.

To provide a sound basis for comparison, we built upon the framework laid out in the WHOIS

spam study (SAC023, 2007). The authors of the study set up domains in 3 gTLDs

(.info, .com, .org) and 1 ccTLD (.de), and used contact information they did not publish

anywhere else. They used four different types of registration: 1) with Registrars that provide

anti-harvesting features (e.g., port 43 rate-limiting, CAPTCHAs), 2) registration by proxy, 3)

using a combination of both methods, and 4) using no such method. They then measured the

amount of spam received in all different conditions over a 90-day period. They also provided a

simple data analysis of the different types of spam being received, distinguishing between the

different types of products advertised, and phishing scams.

Taking this study as our starting point, we expanded on it as follows. We used 16 of the most

popular Registrars identified by our Registrant Survey sample to register 400 domains (note that

this expands the study (SAC023, 2007) to NET and BIZ domains; on the other hand, we did not

register any ccTLD domain). For each domain we registered, we set up a WHOIS-published

Registrant contact email address in the form of contact@domain_name.TLD. We also set up

additional unpublished email addresses for each domain name in the form of a “catchall”

account that collects all emails sent to an email address in the form *@domain_name.TLD. The

only location where we published the Registrant email addresses was in WHOIS. In addition, we

set up incoming VoIP numbers (published as WHOIS Registrant contact information) to quantify

the amount of phone spam (and “vishing,” voicemail phishing) received. To reduce personnel

costs, the VoIP accounts were not actively monitored by an individual, but were instead

forwarded to a voicemail box that was periodically checked.
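The distinction between the published and unpublished mailboxes can be illustrated with a small classifier. This is a sketch of the idea only: the function name and return labels are hypothetical, and it assumes the recipient address of each incoming message is already known.

```python
def classify_recipient(address: str, registered_domains: set) -> str:
    """Label an incoming message by the mailbox it was addressed to."""
    local, _, domain = address.strip().lower().rpartition("@")
    if not local or domain not in registered_domains:
        return "not_ours"
    # Only contact@<domain> was published in WHOIS; any other local part
    # was delivered through the unpublished catch-all account.
    return "whois_published" if local == "contact" else "unpublished_catchall"


experimental = {"example.com"}  # placeholder for the experimental domains
print(classify_recipient("contact@example.com", experimental))
print(classify_recipient("sales@example.com", experimental))
```

Mail addressed to the published address is a candidate for WHOIS-attributed misuse, while mail reaching the catch-all points to addresses that were guessed or obtained some other way.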

We also set up 3 PO Box accounts in the United States to detect possible spam postal mail sent

to our experimental domain Registrant names and addresses published only in WHOIS.


In the following sections we describe the design of the different components of the experiment,

followed by the findings of the experiment.

7.1. Registrars

Out of the 107 Registrars associated with the domains in the Registrant sample, we selected a

small subset to register the domains for the experiment (Table 13). We selected Registrars

based on a number of study design parameters that we developed, namely:

Registrars were ordered based on their relative popularity (market share) in our sample

of Registrants, and the most popular ones that satisfied the rest of our design

parameters were selected. This way, we were able to create a test environment that

reflects the experiences of most Registrants.

Each Registrar selected for use in this experiment should allow registration of domain

names in all 5 gTLDs in the study’s scope. This way we would be able to effectively infer

if the measured misuse was affected by the Registrar or by other parameters of the

misused domains.

Each Registrar must allow registrations by individual natural persons. Thus, we did not

test Registrars that provide domain registration services just to businesses. Including

these Registrars would introduce bias to our findings.

Each Registrar selected must allow the purchase of a single domain name, without

requiring purchase of other services for that domain (e.g. hosting).

Each Registrar selected must allow the purchase of domains without us having to reveal

the actual identities of the researchers. We identified one Registrar that required a valid

photo ID of the Registrant; they were consequently omitted from this experiment

because we could not register test domains without having to disclose our identity and

possibly introducing result bias.

We identified three Registrars that only allow domain registration through their affiliated

resellers. These are Enom, Tucows, and Wild West Domains (WWD). In these cases we

tried to identify the resellers used by Registrants in our survey by looking at the name

server information from the domains’ WHOIS records. For example, the response of a

WHOIS query for the domain BEYONDWHOIS.COM¹⁹ states that Tucows is the domain’s Registrar. However, Tucows does not itself provide the actual domain registration services. By looking at the name servers in the WHOIS response, we can identify that the associated domain name server is theplanet.com, which indicates ThePlanet (now Hover) is likely to be the reseller that was used. Whenever this method did not reveal the reseller used to register domain names included in our survey, we randomly selected one of these Registrars’ resellers for use in this experiment.

19 This domain was not part of our surveyed domains and is only listed here for illustration purposes.
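The name-server heuristic for spotting a reseller can be sketched as below. WHOIS response formats vary between servers, so the field names matched here ("Name Server", "nserver") and the two-label shortcut for extracting a domain are assumptions, and the sample response is made up for illustration.

```python
def name_server_domains(whois_text: str) -> list:
    """Collect the domains of the name servers listed in a WHOIS response."""
    domains = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() not in ("name server", "nserver"):
            continue
        parts = value.strip().lower().split()
        if not parts:
            continue
        host = parts[0].rstrip(".")
        # Assumption: the last two labels identify the operator. This is
        # wrong for multi-label suffixes such as .co.uk.
        domain = ".".join(host.split(".")[-2:])
        if domain not in domains:
            domains.append(domain)
    return domains


# Hypothetical response fragment, not an actual WHOIS record.
sample = (
    "Registrar: TUCOWS\n"
    "Name Server: NS1.THEPLANET.COM\n"
    "Name Server: NS2.THEPLANET.COM\n"
)
print(name_server_domains(sample))
```

The resulting domain still has to be matched to a reseller by hand, as in the ThePlanet example.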

GoDaddy
Network Solutions
Dotster
Gandi
Namecheap (Enom)
Brinkster (WWD)
Hover (Tucows)
Tierra
1and1
Domain People
Xinnet
Name
Joker
Gandi
Onamae
DirectNIC

Table 13 List of 16 Registrars (and affiliates) used for experimental domain registration. Together, these Registrars and affiliates cover 77% of those that appeared in our Registrant Survey sample.

7.2. Domain names

As part of the experimental study, we studied the relationship between the type of domain

name and WHOIS misuse. We registered domains that could be associated with the following

categories:

Completely random domain names composed of 5 to 20 random letters and numbers (e.g. unvdazzihevqnky1das7.biz).

Synthetic domain names (meaning domain names generated simply for the purposes of this study and registered by us) intended to look like individual persons (e.g. Randall-Bilbo.com).

Synthetic domain names composed of two randomly selected words from the English vocabulary (e.g. neatlimbed.net).

Synthetic domain names intended to look like businesses within professional categories (e.g. hiphotels.biz).

In defining the characteristics of the last category, we selected a taxonomy of professional

categories that may lend themselves to spear-phishing and targeted spam. Additionally, we

hypothesized that domains that would be targeted could also be in the same categories as

domain name categories that were more likely to be abused. For instance, illicit online

pharmacies might prefer to register legitimate pharmacy domain names by harvesting related

WHOIS information. Thus, by registering pharmacy related domains, we hypothesized that we

might possibly observe higher rates of WHOIS misuse.


To this end, we consulted both APWG’s report on “Phishing Activity Trends” (APWG, 2011) and

the spam mailbox of this report’s authors. From the first source, we extracted the professional

categories that were most targeted by spam and phishing in the last quarter of 2010 with

percentages of more than 4% in total. More specifically these categories are: Financial services, Payment services, Gaming, Auction and Social networking. From the second

source, based on the kind of spam messages we usually receive (subject and sender) we

qualitatively decided to include professional categories related to medical services, medical equipment, hotels, traveling and delivery and shipping. We also defined three control

categories, which serve to verify whether the above categories are specifically targeted or whether they

are just general recipients of spam. The three categories are technology, education and

weapons.
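As a rough illustration of how the random and two-word categories described at the start of this section could be produced, the sketch below draws a label of 5 to 20 letters and digits and concatenates two randomly chosen words. The word list, the seed, and the function names are placeholders introduced here; the report does not describe the generator that was actually used.

```python
import random
import string

# Letters and digits; DNS labels are case-insensitive, so case is cosmetic.
ALPHABET = string.ascii_letters + string.digits


def random_label(rng, min_len=5, max_len=20):
    """Return a random label of min_len..max_len letters and digits."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def two_word_label(rng, words):
    """Concatenate two distinct, randomly selected words."""
    first, second = rng.sample(words, 2)
    return first + second


rng = random.Random(2012)  # fixed seed only to make the sketch repeatable
sample_words = ["neat", "limbed", "steamer", "traffic", "gazelle", "brown"]  # hypothetical list
print(random_label(rng) + ".biz")
print(two_word_label(rng, sample_words) + ".net")
```

A real run would draw from a full English dictionary and check each candidate for availability before registration.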

7.3. Registrants associated with domains

All registered gTLD domain names are associated with a Registrant Name and/or Registrant

Organization that links the domain name with its beneficial domain user, or with a legal proxy of

the beneficial domain user. This Registrant information appears publicly in WHOIS, and this

experimental study was designed to test whether the data associated with a registered gTLD

domain name is misused. Therefore, for the purpose of the experiment, we created artificial

Registrant Names, one for each domain name we registered. The ultimate goal was to be able

to associate an observation of misuse with a single domain, or with a specific set of domain

names under a specific gTLD, registered at a specific Registrar. A WHOIS record is comprised

of the following pieces of information: Registrant name, postal address, phone number, and

email address. In the following sections we discuss the design details in producing each one of

those.

Names of Registrants

To generate artificial names comparable to names of real persons, we randomly paired entries from an extensive list of common male and female first names with entries from an extensive list of common last names. No combination of first and last name is reused, so we generated 400 distinct names, each of which serves as a unique association between a domain and its Registrant.
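The name-generation step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the seed, and the placeholder name lists are assumptions; only the requirement of 400 distinct first/last combinations comes from the text.

```python
import random


def generate_registrant_names(first_names, last_names, count, seed=0):
    """Draw `count` distinct first/last name combinations at random."""
    if count > len(set(first_names)) * len(set(last_names)):
        raise ValueError("not enough distinct combinations available")
    rng = random.Random(seed)
    seen = set()
    names = []
    while len(names) < count:
        pair = (rng.choice(first_names), rng.choice(last_names))
        if pair not in seen:  # never reuse a combination
            seen.add(pair)
            names.append(" ".join(pair))
    return names


# Placeholder lists; the study used lists of common real names.
first_names = [f"First{i}" for i in range(30)]
last_names = [f"Last{i}" for i in range(30)]
registrants = generate_registrant_names(first_names, last_names, 400)
print(len(registrants), len(set(registrants)))
```

Because every domain receives its own name, any postal mail or voicemail that mentions a Registrant name can be traced back to exactly one test domain.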

Email addresses

For each domain that we registered, we set the DNS MX records to forward the requests,

through a mail proxy server, to an email server under our full control. The benefit of running our


own mail server is that we can completely control its behavior, disabling any spam filters that

would prevent us from collecting spam email. This email server acted as an aggregator for all

the domain names that we registered. For the purpose of anonymity we rented a virtual server

with Linode.com, which acted as a mail proxy to our email server. This mail proxy server

allowed us to conceal the fact that the mail server is running on a machine at CMU, aggregating

both solicited and unsolicited email sent to all test domains.

Physical addresses

We initially put a lot of effort into finding a service that would enable us to acquire a number of

residential addresses for use with each of the registered domains.

We looked into international, as well as various national postal forwarding services. However we

were unable to find a suitable service. First, in all the countries we surveyed (including the US)

these services often require identification prior to opening a mailbox (e.g., form 1583 in the US),

and limit the number of recipients that can receive mail at this mailbox. Moreover we were

hesitant to trust mail-forwarding services from privately owned service providers. The reason for

the mistrust is that the individuals or companies providing mail-forwarding services may

themselves misuse the postal addresses and therefore contaminate our experimental results.

We decided to register three PO boxes within the US; several domains shared the same PO

box (but had different “contact names”). PO boxes can use street addresses appearing as a

residential address instead of as a PO box. This service (called “street addressing”) typically

uses the street address of the post office branch in which the PO box is situated. We provided

these street addresses when supplying contact information for inclusion in WHOIS data.

PO boxes are typically bound to the name of the person who registered them. We performed

multiple tests on the functionality of the PO boxes to see if, in practice, this was enforced. We

sent letters addressed to random names using the standard PO box addressing format as well

as the street addressing format. The purpose was to see if we would receive the letters without

any problem since the postal mail addressee’s name would not be listed as one of the owners of

the PO box. The letters were received successfully, which was a good indicator that other letters

addressed to any of the artificially created names associated with these mailboxes will likely be

accepted by the post offices (provided the volume of such mail remains low). Interestingly, we

acquired two PO boxes in California, but the same tests that worked at the other locations failed

in CA, rendering them unsuitable for the study.


Each one of the three physical addresses was assigned to domains at random, with an equal probability of

selection. In other words, each address is used by about 33.3% of all these experimental domains.

Phone numbers

We used Skype Manager to produce phone numbers that were associated with the WHOIS

records of the experimental domains. We used a separate number for each group of domains

within the same Registrar, registered under the same gTLD. For example all COM domains

registered with GoDaddy shared the same phone number. With this design there was a risk that

we would be unable to associate a spam voice call with a single domain name. Indeed, a phone

spammer may not necessarily identify a domain name or a person that he or she is calling for.

In this case, since the person/domain name acts as a unique identifier, association with a single

domain would be impossible. However, we can still compare the level of misuse within the same

gTLD across different Registrars and across gTLDs within the same Registrar. Moreover, this

re-use keeps phone costs within reason; on the other hand, registering a separate phone

number for each registered domain would have almost doubled the platform setup cost of the

experiment.

The numbers associated with each Registrant had an area code that matched the location of

the associated PO box.

7.4. Registering domains

We registered in total 400 domain names across the top 5 gTLDs (.COM, .NET, .ORG, .BIZ,

and .INFO) and the 16 Registrars in Table 13. Before registering the domains, we generated

400 unique Registrant combinations for use with each of the domains. Whenever the Registrar

required the inclusion of an organization as part of the Registrant information, we used the

name of the domain’s Registrant, regardless of the category of the domain name being

registered (i.e. none of the domains in the synthetic business name category was registered with a synthetic

business name as its associated organization).

Given the parameters described above, each Registrar was assigned a group of 25 domains.

Each group is distributed evenly across the 5 types of domain names (person name, random

name, synthetic, professional category, control category) and the top 5 gTLDs. In other words,

with each Registrar we registered 5 domains under each gTLD, and the 5 domains consisted of

1 registration per category type. As all the domains under a single combination of gTLD and


Registrar are assigned the same phone number, we utilized 80 Skype numbers for the duration

of the experiment.
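The arithmetic of this design (16 Registrars x 5 gTLDs x 5 domain types = 400 domains, and 16 x 5 = 80 phone numbers) can be checked with a short sketch. The field names and the seed are assumptions made for illustration. Following the layout of Table 14, the sketch draws the PO box once per Registrar/gTLD group, which is one reading of the text rather than a documented rule.

```python
import itertools
import random

GTLDS = ["com", "net", "org", "biz", "info"]
CATEGORIES = ["person name", "random name", "synthetic",
              "professional category", "control category"]
REGISTRARS = [f"Registrar {i}" for i in range(1, 17)]  # anonymized labels
PO_BOXES = ["PO1", "PO2", "PO3"]


def build_plan(seed=0):
    """Return one record per planned domain registration."""
    rng = random.Random(seed)
    plan = []
    for registrar, gtld in itertools.product(REGISTRARS, GTLDS):
        po_box = rng.choice(PO_BOXES)     # each box equally likely (assumed per group)
        phone_id = f"{registrar}/{gtld}"  # one phone number per Registrar/gTLD group
        for category in CATEGORIES:
            plan.append({"registrar": registrar, "gtld": gtld,
                         "category": category, "phone_id": phone_id,
                         "po_box": po_box})
    return plan


plan = build_plan()
print(len(plan))                               # 400 domains
print(len({row["phone_id"] for row in plan}))  # 80 phone numbers
```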

For example, given a Registrar R, we registered five domains in each of the five categories of

domain names. Each set of five domains has one domain under each of the five gTLDs:

.COM, .NET, .ORG, .BIZ, and .INFO. We created a total of five phone numbers, one for

each of the five gTLDs, and reused them across the different domain category types. Also,

domains in each of the five gTLDs were associated randomly with one of the three PO box

addresses. Table 14 provides an example of the set of information required to complete the

registrations with one Registrar. In this experimental study, we used 16 blocks of information

similar to the one presented in Table 14. For each domain, we used the same Registrant

Names, postal/email addresses, and phone numbers for all types of WHOIS contacts (i.e.

Technical and Billing contacts).

The design of the experimental study differs from the design of the Registrant sample selection

in terms of the proportions used to select/register domains. In selecting the Registrant sample,

we utilized the method of proportional probability sampling, with proportions selected to be

equal to the ones on the Internet (see Table 2). On the other hand, the methodology we used to

register domains for the experimental study is similar to an equal probability sample, as there

was an equal number of domains registered in each gTLD. This design choice was motivated by

a desire to balance the costs of running the experimental study while retaining scientific

meaning. We will relate the rates of measured WHOIS misuse with the reported rates of WHOIS

misuse in Section 7.7.

7.5. Duration of the experiment

We started registering domains in the last week of June 2012, and we completed the

registrations four weeks later. The main difficulty that we faced was the time required to

manually register the 400 domains in different registration environments (Registrars); little to no

automation was available across such a range of Registrars. The experiment lasted six months,

ending in the last week of January 2013. All experimental domains were registered using

commercial services offered by Registrars (i.e., we did not use free solutions such as those

provided by DynDNS), and none of the experimental domains was suspended or deleted during

this test period.


Domain name                   Domain category  gTLD  Registrant name and organization  Phone number  PO box address  Contact email

theo-lovell                   person name      com   Yvonne Beverly                    pn1           PO1             [email protected]
farouk-head                   person name      net   Miek Luo                          pn2           PO1             [email protected]
neville-llewellyn             person name      org   Hilda Lucas                       pn3           PO3             [email protected]
sedat-brandon                 person name      info  Sidney Charizard                  pn4           PO2             [email protected]
hubert-germaine               person name      biz   Vivek Christian                   pn5           PO3             [email protected]
MoK8XlJ7BD                    random name      com   Izumi Brooke                      pn1           PO1             [email protected]
w6ilHlOhVy4PuO8s3gU8          random name      net   Colin Yushchenko                  pn2           PO1             contact@w6ilHlOhVy4PuO8s3gU8.net
X6fIq96VvTae                  random name      org   Kinch Dana                        pn3           PO3             [email protected]
frTIg6FfxOWZTe5DL9Xgu4        random name      info  Tyler Gill                        pn4           PO2             contact@frTIg6FfxOWZTe5DL9Xgu4.info
6TkOqIg                       random name      biz   Kirk Xuereb                       pn5           PO3             [email protected]
shescoundrel                  synthetic        com   Donna Langley                     pn1           PO1             [email protected]
screwturned                   synthetic        net   Sharon Gasparian                  pn2           PO1             [email protected]
lifethirsting                 synthetic        org   Bonnie Addison                    pn3           PO3             [email protected]
steamertraffic                synthetic        info  Trevor Ryan                       pn4           PO2             [email protected]
gazellebrown                  synthetic        biz   Alexis Chandler                   pn5           PO3             [email protected]
pediatrictherapyequipment     prof categories  com   Hein Clayden                      pn1           PO1             contact@pediatrictherapyequipment.com
hotelspell                    prof categories  net   Pandora Angelopoulou              pn2           PO1             [email protected]
chiropractictherapyequipment  prof categories  org   Chet Miyazaki                     pn3           PO3             contact@chiropractictherapyequipment.org
chattanoogatherapyequipment   prof categories  info  Barrio Bruce                      pn4           PO2             contact@chattanoogatherapyequipment.info
hiphotels                     prof categories  biz   Stevan Stratford                  pn5           PO3             [email protected]
techdaft                      control          com   Liyuan Thornton                   pn1           PO1             [email protected]
teachreel                     control          net   Molly Tattersall                  pn2           PO1             [email protected]
techyank                      control          org   Vicki Stoner                      pn3           PO3             [email protected]
weaponsmob                    control          info  Dewey Fermi                       pn4           PO2             [email protected]
fastweapons                   control          biz   Mechael Mereon                    pn5           PO3             [email protected]

Table 14 Example of domain registration details for a single Registrar. Identical information was used for all types of contacts (e.g. Technical and Billing)


7.6. Breakdown of the collected instances of misuse

We next report the level of WHOIS misuse we experienced. More specifically, we report the

amounts of postal, email, and voicemail spam we observed, and we try to characterize different

types of spam within each set. We also analyze the email spam we collected to characterize the

incidents of phishing and malware distribution.

Postal address misuse

As explained in previous sections, we operated three post office (PO) boxes in the state of

Pennsylvania, which we associated randomly with the artificial Registrant identities. The PO box

addresses used were not published in combination with our test domain name in any other

public directory, other than WHOIS. We monitored the contents of the PO boxes biweekly from

June 2012 until January 2013. We categorized the content either as generic spam or targeted spam. We placed mail in the first group if the receiver was not explicitly mentioned by name. A

common example in this category is mail addressed to the generic “PO Box holder.” Two out of

the three boxes would receive this kind of spam mail periodically, and this was observed with

every inspection of those boxes. In addition, there were cases where we would receive postal

mail addressed to a name that did not match any of the Registrant names associated with a

specific PO box. A reasonable explanation for these instances is that previous owners of the PO

boxes still had mail sent to that location. This kind of postal mail is still considered

generic spam, and was observed in one of the PO boxes.

TLD   Domain name category     Purpose of postal mail
COM   Professional (auctions)  SEO services
NET   Person name              SEO services
ORG   Person name              Product offer
INFO  Professional (auctions)  Shipping services

Table 15 Observed postal spam attributed to WHOIS misuse. The first two rows represent the same Registrar.

We received in total four pieces of postal mail that we classified as targeted WHOIS spam

(Table 15). Two out of four were from the same company; they were received in the same

collection period, and were both dated September 14th 2012. The purpose of both letters was to

sell advertising services for the domain names. The company collects a one-time fee of $85


USD, in exchange for submitting the domain names to search engines and performing search

engine optimization (SEO) on the domains.

Both domains subjected to this postal misuse were registered using the same Registrar. The

third piece of postal mail spam was received from a Registrar towards the end of the experiment

and targeted a domain registered with a different Registrar. The purpose of the letter was to

enroll the recipient in a membership program that provides easy means of sending postal mail

without the need to interact with the US post office. The fourth piece of postal mail spam was

received very close to the end of the experiment and offered a free product in exchange for a

website sign-up.

Surprisingly, the third PO box only received three pieces of generic spam throughout the

duration of the experiment.

Overall, the volume of targeted WHOIS postal spam is very low (4 pieces, 10%), compared to

the 34 pieces 20 of generic postal spam (90%). However, this may be due to the small

geographical diversity that we were able to achieve.

Email address misuse

Each of the 400 domains we registered for the purpose of this experiment has a set of published

and unpublished email addresses. A published email address is of the form

contact@domainname (e.g. [email protected]) and is listed only in the WHOIS record of

each domain. However, any email sent to a different recipient under the same domain (e.g.

[email protected]), will still be collected for later analysis; all such email addresses are

deemed “unpublished” addresses, since they are not advertised anywhere, including WHOIS.

By collecting unsolicited emails sent to both published and unpublished addresses, we are able

to provide a meaningful comparison of WHOIS-related spam, and generic (random) spam.

To classify incoming email either as solicited email or as unsolicited bulk email (spam), we used

the definition of spam offered by (Spamhaus.org, 2013). In short, an email is classified as spam

if it is unsolicited, and the recipient has not explicitly consented to receiving such email.

We adapted this definition to our experiment, by considering email originating from each

domain’s respective Registrar as not spam, while any other email is classified as spam. Indeed,

in many cases, the contract between Registrar and Registrant, which is established upon

20 (30 weeks x 1 piece of spam x 2 PO boxes) + 1 of spam from third PO box


registering a domain, gives permission to the Registrar to send informative emails. Since the

Registrant enters freely into this agreement (and can exit freely) by providing the published

email address to receive such notifications, we did not consider email received at the published

addresses from the Registrar as spam. We identified email originating from a domain’s

Registrar by looking at the email headers, extracting the domain part of the sender’s email

address, and comparing this string with the recipient’s respective Registrar.
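The header-based check described above might look like the following sketch. It is illustrative only: the function names, the example message, and the `registrar.example` domain are hypothetical, and matching subdomains of the Registrar's domain is an added assumption. Note that the From header can be forged, so this check identifies the claimed sender only.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_domain(raw_message):
    """Return the lower-cased domain part of the From address, or ''."""
    msg = message_from_string(raw_message)
    _, address = parseaddr(msg.get("From", ""))
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def is_from_registrar(raw_message, registrar_domains):
    """True if the sender domain is, or is a subdomain of, a Registrar domain."""
    domain = sender_domain(raw_message)
    return any(domain == d or domain.endswith("." + d)
               for d in registrar_domains)


raw = (
    "From: Example Registrar <support@registrar.example>\n"
    "To: contact@test-domain.example\n"
    "Subject: Renewal notice\n"
    "\n"
    "Your domain is due for renewal.\n"
)
label = "not spam" if is_from_registrar(raw, {"registrar.example"}) else "spam"
print(label)
```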

Throughout the experiment, published email addresses received 7,609 unsolicited emails out of

which 7,221 (95%) were classified as spam (Figure 11). Of the total 400 domains, 95%21

received unsolicited emails at their published addresses, with 71% of all domains receiving spam email

(Figure 12). Interestingly, 80% of spam emails collected during this experiment were addressed

to the 25 domains of a single Registrar (Registrar 13). As all the domains across all gTLDs

registered with the specific Registrar are equally affected by WHOIS misuse, this observation

does not affect the statistical validity of the results we present, except when it is explicitly stated

herein.

All 1,872 emails received at the unpublished addresses were classified as spam22, and they

were targeted to 15% of the domains23. This observation is a consequence of the definition of

spam we use; since the unpublished addresses are not listed in any public directory and not

shared in any way, all emails received are unsolicited, and therefore counted as spam. Out of

our 400 domains, two specific domains received a disproportionate amount of spam emails in

their unpublished mailboxes. We ascribed this to the possibility that 1) these domains had been

previously registered, and 2) previous owners of those domains were targeted, so that we

inherited the misuse along with the domains. It is thus highly plausible that the misuse

experienced there had a source different from WHOIS misuse. Looking at historical WHOIS

records24 confirmed that both domains had been previously registered (12 years prior, and 5

years prior, respectively) which lends further credence to our hypothesis.

21 All the received emails were unsolicited, and some of them were classified as spam, based on the definition. Therefore, this number is composed by the proportion of domains receiving unsolicited emails and the proportion of domains receiving spam.

22 None of those emails originated from a domain’s Registrar.

23 85% of the domains did not receive any emails at their unpublished email addresses.

24 http://www.domaintools.com/research/whois-history/


Figure 11 Breakdown of the emails collected across all domains, based on their classification. [Chart data: in published addresses, 95% of emails classified as spam and 5% not classified as spam; in unpublished addresses, 100% classified as spam and 0% not classified as spam.]

Figure 12 Breakdown of experimental domains based on the emails they receive. The difference between public and private addresses in receiving email spam is statistically significant. [Chart data: in published addresses, 24% of domains receiving emails not classified as spam, 71% receiving spam, and 5% not receiving any unsolicited email; in unpublished addresses, 0%, 15%, and 85%, respectively.]


Number of domains / Total domains (in category)

                         Published addresses                     Unpublished addresses
Domain grouping          Received email but no spam  With spam   Received email but no spam  With spam

gTLD
COM                      22/80                       48/80       0/80                        18/80
NET                      19/80                       52/80       0/80                        11/80
ORG                      16/80                       45/80       0/80                        11/80
INFO                     20/80                       62/80       0/80                        15/80
BIZ                      20/80                       75/80       0/80                        5/80

Domain name category
Control                  20/80                       58/80       0/80                        12/80
Synthetic                17/80                       56/80       0/80                        13/80
Person name              21/80                       51/80       0/80                        10/80
Random                   19/80                       57/80       0/80                        10/80
Professional categories  20/80                       60/80       0/80                        15/80

Registrars
Registrar 1              0/25                        23/25       0/25                        1/25
Registrar 2              0/25                        23/25       0/25                        1/25
Registrar 3              2/25                        4/25        0/25                        3/25
Registrar 4              0/25                        5/25        0/25                        3/25
Registrar 5              21/25                       24/25       0/25                        8/25
Registrar 6              0/25                        19/25       0/25                        16/25
Registrar 7              0/25                        25/25       0/25                        0/25
Registrar 8              4/25                        20/25       0/25                        1/25
Registrar 9              0/25                        18/25       0/25                        7/25
Registrar 10             0/25                        13/25       0/25                        0/25
Registrar 11             0/25                        23/25       0/25                        0/25
Registrar 12             22/25                       17/25       0/25                        6/25
Registrar 13             25/25                       25/25       0/25                        13/25
Registrar 14             22/25                       16/25       0/25                        0/25
Registrar 15             0/25                        11/25       0/25                        1/25
Registrar 16             1/25                        16/25       0/25                        0/25

Table 16 Breakdown of collected email based on the nature of the targeted email address (i.e. published or not), the gTLD, the type of the domain name, and the Registrar.


In Table 16 we provide a breakdown of the collected emails based on the domain gTLD, the

domain name category, and the Registrar of the domain.25 The middle column shows the

domains receiving email in their published mailboxes, and the right column domains receiving

email in their unpublished mailboxes. Each of the two columns is further divided between

domains receiving email other than spam, and domains that received spam. Obviously, none of

those categories are mutually exclusive. Each fraction in the table represents the number of

domains existing in a specific category, out of the possible total number of domains in the

category. For example, of the 80 .COM test domains, 48 received spam at a WHOIS published

email address and 18 received spam at an unpublished email address. 22 of those .COM test

domains received non-spam email at the WHOIS-published address.

As expected, looking at the unpublished mailboxes of all domains in all categories we only

observe non-WHOIS-misuse spam. Across all categories there is a seemingly higher

occurrence of spam email in the published mailboxes compared to the unpublished ones. Using

a chi-square test, we find that the difference in proportions of received spam between published

and unpublished addresses is statistically significant when considering the gTLD (p < 0.05) and

the Registrar (p < 0.001), but not the domain name category (p > 0.05). In other words, WHOIS

misuse is present at measurable, statistically-significant levels (as shown by the difference

between published and unpublished addresses receiving spam); domain name category does

not seem to impact the amount of misuse, while the choice of gTLD and Registrar can increase

the rate of WHOIS-attributed email misuse.
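To make the chi-square comparison concrete, the sketch below runs a Pearson chi-square test of independence on the "With spam" counts of Table 16. The report does not state which contingency table was tested, so the layout used here (one row per group, columns for published and unpublished addresses) is an assumption, as are the function names. The p-value uses the closed-form upper tail of the chi-square distribution, which holds only for an even number of degrees of freedom; that covers the gTLD and domain name category tables (4 degrees of freedom each) but not the Registrar table (15).

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_sf_even_dof(stat, dof):
    """Upper-tail chi-square probability; closed form valid for even dof only."""
    if dof % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(dof // 2))


# Domains with spam at (published, unpublished) addresses, from Table 16.
by_gtld = [[48, 18], [52, 11], [45, 11], [62, 15], [75, 5]]       # COM, NET, ORG, INFO, BIZ
by_category = [[58, 12], [56, 13], [51, 10], [57, 10], [60, 15]]  # Control .. Professional

for label, table in [("gTLD", by_gtld), ("domain name category", by_category)]:
    stat, dof = chi_square_independence(table)
    p_value = chi2_sf_even_dof(stat, dof)
    print(f"{label}: chi2={stat:.2f}, dof={dof}, p={p_value:.3f}")
```

Under this reading, the gTLD table falls below the 0.05 threshold and the category table does not, in line with the conclusions stated above. Because the same domain can appear in both columns, the counts are not strictly independent observations, so the sketch should be read as an illustration rather than a reproduction of the authors' analysis.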

In Section 8.2 we study in detail which parameters of a domain (e.g. price of registration, gTLD,

anti-harvesting techniques employed by the Registrar/Registry, etc.) affect the rates of WHOIS

attributed email misuse.

Attempted malware delivery

We used VirusTotal26 to scan all collected files received as email attachments during the first

four months of the experiment, and to detect malicious software. There is a great variety of

malicious software (malware) that can infect any computer, and which can place the infected

computer under the control of an attacker. For example, so-called “backdoors” can grant the

25 This experiment does not aim to identify specific Registrars, but to look for patterns affecting WHOIS misuse. Therefore, the name of the Registrar is not explicitly provided, and we instead offer anonymized identities.

26 https://www.virustotal.com/


attacker unrestricted remote access to the infected computer. The attacker may use the

backdoor, for example, to steal passwords or personally identifiable information. We followed, in

this respect, ICANN’s Terms of Reference for the various WHOIS studies (ICANN, 2009) in

which the existence of malware in email spam is associated with attempts for identity theft.

In total, we received 496 email messages with any type of attachments, with only 10 of those

targeting published email addresses. These attachments were sent to 10 distinct domains

registered with the same Registrar, and were sent by the same sender. However, all 10

attachments were innocuous, with the content being some form of newsletter.

Of the 486 attachments that were sent to the unpublished email addresses, 76 were found to

contain malware. The recipients of these infected emails were three of our experimental

domains. The analysis of the malware indicated that the 76 infected attachments were

associated with 12 well known families of malware, with 10 being different variants of Trojans.

However, none of the infected attachments targeted any of the published email addresses, and

as such we did not observe any WHOIS attributed malware delivery. This is in line with the

findings of the Registrant survey.

Phone number misuse

As we experience in our daily lives, we often receive phone calls that were intended for a

different recipient, either because of misdialing or because the caller has the wrong contact

information of the person they are trying to reach. These cases, while they represent unwanted

calls, can hardly be classified as WHOIS originating spam. On the other hand, if the call is

unsolicited and the caller offers Internet services (e.g. website development) or is starting a

discussion about a domain name, then we can, with reasonable assurance, associate the call

with WHOIS misuse. There are also instances where the call is unsolicited, and the caller offers

services unrelated to WHOIS. However it is unknown if the caller harvested the number from

WHOIS, or if it was obtained in some other way (e.g., exhaustive dialing of known families of

phone numbers). The experimental design did not involve registering additional unpublished

phone numbers (similar to the private email addresses in the previous section), and, therefore,

we cannot compare the findings of this section to a baseline voicemail spam rate.

In the context of this experiment, we define voicemail spam associated with WHOIS misuse as

any voicemail that has intelligible content and the content makes reference of a domain name or

Internet related services, or if the caller states that he found the number online. Voicemails that

do not fall in this category are either categorized as not spam (e.g. misdialing), or as possible


spam (i.e. spam not clearly associated with WHOIS). There is a special case where the caller

makes mention of the name of the person they are trying to reach. In this event, we cross

checked our database of experimental Registrant identities, and if there was a match, the

voicemail was automatically classified as spam, regardless of the content. Voicemails that had

no content or where the content was not comprehensible are shown below but classified

separately from voice spam and non-spam.

We present the overall classification of the received voicemails in Figure 13. We collected in

total 674 voicemails throughout the experiment, and we classified 6% (39) as spam, 15% (102)

as not spam, and 4% (28) as possible spam. An additional 38% (256) of voicemails contained a

recorded message inviting the recipient of the call to “press one to accept”. We started receiving

this type of voicemail on a daily basis, several times a day, starting during the second month of

the experiment. All these voice messages – but one – were directed to a single number we used

in the WHOIS records of the .NET domains we registered at one Registrar. Even though the

content was not adequate to characterize these messages, the persistency indicates that there

is no randomness and we therefore placed them in a special category: interactive spam. Finally,

37% (249) of voicemails were not classified due to the lack of content.
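The classification rules above can be summarized in a small decision function. This is a sketch under stated assumptions: the `Voicemail` fields stand for labels a human listener would assign, the order in which the rules are applied is our reading of the text, and the `is_unsolicited_offer` flag is a hypothetical stand-in for the judgment that separates possible spam from non-spam. The final lines check that the reported counts add up to the 674 collected voicemails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voicemail:
    transcript: str = ""                 # empty if blank or unintelligible
    named_person: Optional[str] = None   # person the caller asks for, if any
    mentions_domain_or_internet_services: bool = False
    says_found_number_online: bool = False
    is_press_one_recording: bool = False
    is_unsolicited_offer: bool = False   # listener's judgment (assumed field)


def classify(vm, registrant_names):
    """Map an annotated voicemail to one of the five classes of Figure 13."""
    if vm.is_press_one_recording:
        return "interactive spam"
    if not vm.transcript.strip():
        return "blank"
    if vm.named_person and vm.named_person in registrant_names:
        return "spam"  # name matches an experimental Registrant identity
    if vm.mentions_domain_or_internet_services or vm.says_found_number_online:
        return "spam"
    if vm.is_unsolicited_offer:
        return "possible spam"
    return "not spam"


reported = {"spam": 39, "not spam": 102, "possible spam": 28,
            "interactive spam": 256, "blank": 249}
total = sum(reported.values())
print(total)  # 674
for name, count in reported.items():
    print(f"{name}: {count / total:.0%}")
```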

Figure 13 Characterization of 674 collected voicemails, as a percentage of collected voicemails per voicemail class: interactive spam 38%, blank 37%, not spam 15%, spam 6%, possible spam 4%. The 2 categories on the right represent WHOIS-attributed phone number misuse.

Of the 39 pieces of voicemail spam, 77% (30) came from the same caller, a company selling website advertising services. This caller placed two phone calls to each

of the numbers, one as an initial contact and one as a follow-up. The caller targeted .BIZ

domains registered with five Registrars, .COM domains registered with four Registrars,

and .INFO domains registered with six Registrars. In total, domains registered with 11 out of the

16 Registrars used in the experiment, received this call.

The remaining spam voicemail targeted .BIZ domains registered with four Registrars, .COM

domains registered with three Registrars, and .INFO, .NET, and .ORG domains associated with

1 Registrar each. In one case we observed a particularly elaborate attempt to acquire personally

identifiable information.

In Figure 14 we present a breakdown of domains receiving voicemail spam based on the gTLD

of the domains. .COM, .INFO, and .BIZ received 93% of spam voicemail, with the other 6%

equally divided between .NET and .ORG domains. Overall, 30% of all domains, registered with

14 out of 16 Registrars were affected by WHOIS-originated voice spam misuse.

Figure 14 Breakdown of domains receiving voicemail spam per gTLD.

In Section 8.3 we study in detail which domain characteristics (gTLD, domain category,

Registrar, and domain price) affect the WHOIS attributed voicemail spam rates.

Other types of misuse

We next briefly discuss our findings concerning the other types of WHOIS misuse covered in

the Registrant survey.

Throughout the experimental period we did not detect any unauthorized intrusion or attempts at

Denial of Service attacks to the servers involved in the experiment. It is possible that using

artificial Registrant identities and a proxy to hide the real IP address of the servers acted as a

deterrent to such attempts. However, we cannot validate this hypothesis.

[Figure 14 chart data, % of domains receiving voicemail spam by gTLD: .COM 23%, .NET 3%, .ORG 3%, .INFO 33%, .BIZ 38%.]

We did not observe any blackmailing attempts through the inspection of voicemails, or postal

mail. On the other hand, we could not analyze the contents of each spam email, due to their

sheer volume, and the probabilistic nature of automated detection methods.

Similarly, we did not actively look for cases of identity theft, as this would have necessitated

significantly greater resources to discover.

7.7. Overall experiment incidents of WHOIS misuse

In Table 17 we present the proportions of experimental domains experiencing misuse based on

the harmful act, grouped by their respective gTLDs. The three measured harmful acts are listed

from left to right in a descending order of reported impact (see Figure 8). It is evident that email

WHOIS misuse impacts most of the domains. Voice spam (Phone number) WHOIS misuse

comes right after email misuse in terms of frequency, and postal spam (Postal address) WHOIS

misuse comes last.

Since, as stated in Section 7.4 above, each phone number used in the experimental study is

associated with the five domains registered under the same gTLD and Registrar, we cannot

directly derive the exact amount of phone number misuse each domain experiences. However,

we can get a lower bound on the amount of phone number misuse by considering that each

instance of misuse affects only a single domain; and an upper bound by

assuming that all five domains associated with that specific number are targeted. Since the true

value lies somewhere in between, we report both values in Table 17 and we use the average for

further analysis.
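The bounding argument above can be sketched in a few lines. This is a minimal illustration and not the study's analysis code: the function name is ours, and we assume 16 phone numbers per gTLD (one per Registrar), each shared by five domains. The input of 8 affected numbers is hypothetical.

```python
def phone_misuse_bounds(affected_numbers, total_numbers=16, domains_per_number=5):
    """Return (lower, upper, average) share of domains with phone number misuse."""
    total_domains = total_numbers * domains_per_number
    # Lower bound: each affected number corresponds to exactly one affected domain.
    lower = affected_numbers / total_domains
    # Upper bound: every domain sharing an affected number is affected.
    upper = affected_numbers * domains_per_number / total_domains
    return lower, upper, (lower + upper) / 2

# Hypothetical input: 8 affected phone numbers within one gTLD.
lower, upper, average = phone_misuse_bounds(8)
print(f"min {lower:.0%}, max {upper:.0%}, average {average:.0%}")
```

Under these assumptions, the upper bound is always five times the lower bound, and the reported average sits midway between the two.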

Test domains experiencing WHOIS misuse (by type of harmful act27)

TLD     % Domains experiencing     % Domains experiencing   % Domains experiencing
        phone # misuse             email misuse             postal misuse
BIZ     30% (max 50%, min 10%)     94%                      0%
COM     15% (max 25%, min 5%)      60%                      1%
INFO    23% (max 38%, min 8%)      78%                      1%
NET     4% (max 6%, min 1%)        65%                      1%
ORG     4% (max 6%, min 1%)        56%                      0%
Total   5%                         71%                      1%

Table 17 Portion of domains affected by WHOIS misuse based on TLD and type of harmful act. The harmful acts are ordered based on the reported impact (from the Registrant survey) in decreasing order from left to right.

27 None of the experimental domains experienced any incident of malware delivery or identity theft.

The proportions in the table above do not take into consideration the differences in gTLD

distribution on the Internet, presented in Table 2. Therefore, the table merely presents our findings

and does not allow for a meaningful comparison of the measured WHOIS misuse with the misuse

reported by the Registrants in their survey responses. In Section 8.1 we offer a comparative

analysis of the empirically measured WHOIS misuse, and the reported misuse from the

Registrant survey, weighting the empirically measured WHOIS misuse appropriately.

7.8. Discussion

We found evidence that WHOIS data publication contributes measurably to the misuse of

personal information of Registrants. Our experimental domain names’ postal addresses, email

addresses, and phone numbers listed in WHOIS were misused by third parties to advertise

unsolicited services. None of our experimental domain names experienced attempted malware

delivery or identity theft.

The amount of postal spam we received is too low to assess the significance of the parameters

associated with a domain name (such as professional category of domain name) in relation to

the possibility of receiving postal spam. However, the amount of email and voicemail spam

attributable to WHOIS misuse was notably higher than other (non-WHOIS-misuse) email and

voice spam measured for these domains. We were also able to infer that both the choice of

gTLD and the choice of Registrar were statistically significant in the rate of email spam

attributable to WHOIS misuse. In section 8 we perform a more in depth analysis of the extent to

which gTLDs, Registrars, anti-harvesting measures, and other domain parameters impact the

rates of WHOIS-related misuse.

Given the design of this experiment it would be possible to state that there is a causal

relationship between the public availability of personally identifiable information in WHOIS and a

Registrant’s experience of spam email and voicemail. However, there is one additional

explanation for the WHOIS misuse that cannot be controlled: Registrars may be providing

Registrant information collected during domain registration to third parties (e.g., through bulk

WHOIS access, possibly through private communication with resellers). Investigating this

possibility is out of the scope of this study.

8. Comparative result analysis

In this section we provide a holistic analysis, combining data from different sections, to assess

the hypothesis that there is a statistically significant relationship between WHOIS publication of

Registrant personal information and its apparent misuse by third parties.

8.1. Correlation between measured and reported incidence of misuse

In Section 5 we collected the experiences of Registrants related to the misuse of their personal

information that they attributed to WHOIS publication. Later in Section 7, we presented the

experimental findings that showed that there is indeed measurable misuse attributable to the

availability of such information in WHOIS. In this section we compare the reported rates of

misuse with the measured rates.

Figure 15 Comparison of measured vs. Registrant reported WHOIS misuse rates. The reported rates include error bars representing the 12.5% error rate.

[Figure 15 values]
                        Email address misuse   Phone number misuse   Postal address misuse
Measured misuse rate    62%                    14%                   1%
Reported misuse rate    14%                    9%                    21%

We did indeed measure instances of all three kinds of WHOIS misuse reported by the majority

of the Registrars: phone number, postal address, and email address misuse. In Figure 15, we

present the overall measured WHOIS misuse per type of harmful act, taking into consideration

the gTLD global shares as provided in Table 2. We contrast the measured rates with the

equivalent rates collected from the Registrant survey presented in Table 7. More specifically, we

consider the portion of Registrant responses that indicated having experienced WHOIS-attributed

misuse, while claiming they had not published the allegedly misused information

anywhere other than WHOIS. We decided to compare this portion of responses with the

measured misuse rates, since it matches our experimental condition: we too published the

misused information only in WHOIS and not anywhere else.

In the cases of email and voicemail spam, we observe that the measured experimental misuse

rates are higher than the Registrant-reported rates. However, we measured lower rates of

postal spam in our experiment than were reported by Registrants. The differences between the

measured and reported misuse rates are statistically significant in the cases of email and postal

address misuse, while in the case of phone number misuse, they are not.

There are a few caveats that should be noted as they may have introduced measurement

biases, possibly affecting the observed differences.

Overall, the low response rate of the Registrant survey, as explained elsewhere, may have led

to inaccurate levels of reported WHOIS misuse. The experimental study, on the other hand, was

conducted in a systematic way and on a larger scale than the final turnout of the Registrant

survey (400 domains vs. 57 domains). In addition, we can be certain that the Registrant

information we used to register the experimental domains has not been published in any other

directory. Similar statements made by Registrants could not be verified with the same level of

certainty.

More specifically, the significant difference in postal spam might be attributed to the limited

geographical diversity of the experimental PO boxes, in combination with the heavy reuse of

postal addresses by our artificial Registrant identities. The aforementioned experimental design

decisions may have driven the measured misuse to lower rates. Additionally, the possible

inability of some Registrants to distinguish WHOIS from non-WHOIS originating misuse may

have resulted in an overestimation in the reported rates. However, considering the frequency of

misuse (e.g. “A few times a year”) and the content of spam mail (e.g. SEO services), the

experimental findings are in line with the reported incidents. Moreover, the duration of the

experiment (6 months) is possibly not adequate to allow for extensive harvesting of postal

addresses. In support of this argument, it is noteworthy that the postal address of a domain we

registered in January of 2012, as part of a pilot of the experimental infrastructure, received its

first WHOIS-attributed spam postal mail 8 months after registering the domain.

Similarly, as discussed in Sections 7.3 and 7.7, the experimental design allows us to measure

occurrences of voicemail spam for groups of domains, registered under the same gTLD and

Registrar. Following up on the arguments offered in Section 7.7, we observe an average

weighted (based on gTLD proportions) voicemail spam rate of 14%, which is very close to the

reported rate. It is possible that the Registrants were especially accurate in reporting this type of

misuse, since they were immediately able to recognize it in terms of adverse effects.
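The gTLD weighting mentioned above can be illustrated with a short sketch. The per-gTLD rates are the average phone number misuse rates from Table 17; the gTLD shares below are hypothetical placeholders chosen for illustration, not the values from Table 2, and the function name is ours.

```python
def weighted_rate(rates, shares):
    """Weight per-gTLD misuse rates by each gTLD's share of registered domains."""
    total_share = sum(shares[tld] for tld in rates)
    return sum(rates[tld] * shares[tld] for tld in rates) / total_share

# Average phone number misuse rates per gTLD, as listed in Table 17.
rates = {"BIZ": 0.30, "COM": 0.15, "INFO": 0.23, "NET": 0.04, "ORG": 0.04}

# HYPOTHETICAL gTLD shares (assumption for illustration only, not Table 2).
shares = {"BIZ": 0.02, "COM": 0.76, "INFO": 0.06, "NET": 0.10, "ORG": 0.06}

print(f"weighted phone misuse rate: {weighted_rate(rates, shares):.1%}")
```

Because a gTLD with a large global share dominates the weighted figure, the weighted rate can differ noticeably from the unweighted per-domain totals of the experiment, in which every gTLD contributes the same number of domains.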

Looking at the incidents of email misuse, we observe a difference of 45 percentage points between the

measured and the reported rates. This difference can be attributed to the strict definition of

spam that we adopted in this study, which may have led to the overestimation of the measured

spam rates. Looking at the frequency of WHOIS-related spam, the largest group of Registrants (35%)

reported daily occurrences, with 30% reporting a frequency of a few times per week (the

difference of 5% is within the margin of error). Our measurements showed that most of the

spam-receiving domains received just one piece of spam, while most of those that received

more than one received spam every 10 days on average. This frequency is similar to the

one reported by the Registrants. However, the difference in frequency between the reported and

observed rates can be ascribed to the Registrants’ possible difficulty in identifying WHOIS-

originated email spam (and as such leading to Registrants’ reporting only of spam they believed

to be WHOIS-originated), and to different perceptions of what may constitute spam.

Regarding other types of WHOIS misuse (e.g. identity theft, server intrusion), the findings from

the Registrant survey are fully supported by the experimental study. There was no reported or

experimentally measured WHOIS-attributed misuse, other than the misuse types we already

discussed.

8.2. Domain characteristics affecting email address misuse

In Section 7.6 we showed that email address misuse is affected by the gTLD of the domain and

the choice of Registrar in a statistically significant way. This analysis did not consider possible

correlations between the independent variables, i.e. the domain price, the existence of anti-

harvesting techniques, the gTLD, the domain name category, and the Registrar. In this section

we try to disentangle the effect all five may have on the prevalence of misuse.

Price of purchasing an experimental domain name

The price ranges of the 400 domains we purchased for one year are presented in Figure 16.

These ranges are distributed across 35 price levels, and it is noteworthy that no Registrar

offered a domain in the range of $19.96 to $34.98.

Because the dependent variable is binary in the case of email address misuse we use a

multivariate logit regression. The results of the regression (Table 18) show that the domain price

is statistically significant and negatively correlated with the WHOIS misuse. This coefficient

means that each $1 increase in the price of an experimental domain28 corresponds to a 15%

decrease in the odds of the Registrants experiencing misuse of their email address.29 In other

words, the more expensive the registered domain is, the less email address misuse the

Registrant experiences. Note, however, that none of our experimental domains were in use at

the time we registered them; none required purchase from another Registrant at above-retail

prices.
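To make the interpretation of the coefficient concrete, the following sketch converts the logit coefficient into an odds ratio and then into a probability. It is an illustration under stated assumptions, not the study's code: the coefficient (-0.166) is taken from Table 18, while the 10% baseline probability and the $2 price difference are illustrative inputs, and the function name is ours.

```python
import math

def shifted_probability(base_prob, coefficient, delta):
    """Apply a logit coefficient over `delta` units to a baseline probability."""
    base_odds = base_prob / (1 - base_prob)           # probability -> odds
    new_odds = base_odds * math.exp(coefficient * delta)
    return new_odds / (1 + new_odds)                  # odds -> probability

odds_ratio = math.exp(-0.166)              # about 0.847 per $1 increase
p = shifted_probability(0.10, -0.166, 2)   # baseline 10%, domain costs $2 more
print(f"odds ratio per $1: {odds_ratio:.3f}")
print(f"probability after a $2 increase: {p:.3f}")
```

Converting through odds gives a probability of about 7.4%. Multiplying the probability itself by the odds ratio twice, a common shortcut, gives about 7.2%; the two figures stay close as long as the baseline probability is small.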

Figure 16 Observed minimum and maximum price, per TLD at the 16 Registrars we used for the experimental domain registrations. In total we observed 35 price levels.

[Figure 16 values: domain registration price]
                 COM      NET      ORG      INFO     BIZ      Overall
Minimum price    $9.47    $8.59    $7.17    $0.99    $2.99    $0.99
Maximum price    $35.00   $35.00   $38.51   $38.51   $38.51   $38.51

               Coefficient   Odds    Std. Err.   Significance
Domain price   -0.166        0.846   1.376       p < 0.001

Table 18 The logit regression shows that for every $1 increase in the price of a domain, there is a 15% decrease in the odds of experiencing WHOIS-attributed misuse of the Registrant’s email address.

28 Registered at the lowest possible retail prices offered by each Registrar.

29 For instance, if domain A costs $10 to register, and domain B costs $12 to register, and if domain A has a chance of experiencing email address misuse of 10%, domain B would be expected to have a chance of experiencing misuse roughly equal to 7.2%.

gTLD of experimental domain name

The gTLD, being a categorical variable in the regression, requires a different examination. Using

deviation coding we examined which gTLDs contribute in a statistically significant way to the

likelihood of receiving email spam attributed to WHOIS misuse. In Table 19 we present our

findings. Green highlighting represents negative correlation with email spam, while red

represents positive correlation. For the domain names included in our experiment, the BIZ gTLD

is highly correlated with spam, while domains under the COM, NET, and ORG gTLDs

experienced less spam.

gTLD   Correlation to email spam       Rate of change from mean
BIZ    Positive (p < 0.001)            21
COM    Negative (p < 0.001)            0.3
INFO   Not statistically significant   -
NET    Negative (p < 0.05)             0.44
ORG    Negative (p < 0.001)            0.32

Table 19 Overall correlation of TLD with email misuse. Green represents less misuse, while red represents more misuse. Domains under the COM, NET, and ORG gTLDs are less probable to be subject to email misuse, while BIZ domains are more susceptible.

Using dummy coding, we examine in Table 20 how the different gTLDs compare in rates of

observed email address misuse, all else being equal. The gTLDs appearing in the columns are

the point of reference with which gTLDs appearing in the rows are compared. Cell colors

represent the relative contribution of the point of reference gTLD to another gTLD, and the

contents show the level of significance.30 Green highlighting means that the column gTLD

correlates with less spam than the row gTLD, while red highlighting means that the column

gTLD correlates with more spam than the row gTLD. Grey cells represent statistically

insignificant comparisons. For the domain names included in our experiment, we found that BIZ

domains are correlated with higher instances of email misuse compared to all other gTLDs. The

INFO gTLD follows immediately after, and it exhibits lower potential for email address misuse

only when compared to the BIZ gTLD.

30 Only statistically significant comparisons are shown.

        COM               NET               ORG               INFO              BIZ
COM     -                 -                 -                 p < 0.05 (0.26)   p = 0 (0.85)
NET     -                 -                 -                 p < 0.1 (0.38)    p = 0 (0.02)
ORG     -                 -                 -                 p < 0.05 (0.28)   p = 0 (0.015)
INFO    p < 0.05 (3.8)    p < 0.1 (2.6)     p < 0.05 (3.52)   -                 p = 0 (0.05)
BIZ     p = 0 (71)        p = 0 (48.3)      p = 0 (65.4)      p = 0 (18)        -

Table 20 Comparison of gTLDs in terms of contribution to WHOIS-attributed email misuse. Columns are the reference TLDs and the color indicates if they contribute more (red) or less (green). P-value is shown where statistically significant. The numbers in parentheses show the rate of change of the conditional mean with respect to a specific value of the categorical variable at a corresponding row.
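The two coding schemes used for the gTLD variable differ only in how each level is turned into regression columns. The sketch below is illustrative and not the study's code: the function names are ours, and the choice of COM as the reference level and ORG as the omitted level is an arbitrary assumption.

```python
GTLDS = ["BIZ", "COM", "INFO", "NET", "ORG"]

def dummy_code(level, levels, reference):
    """Dummy (treatment) coding: one 0/1 indicator per non-reference level."""
    return [1 if level == other else 0 for other in levels if other != reference]

def deviation_code(level, levels, omitted):
    """Deviation (sum-to-zero) coding: the omitted level is -1 in every column."""
    if level == omitted:
        return [-1] * (len(levels) - 1)
    return [1 if level == other else 0 for other in levels if other != omitted]

for tld in GTLDS:
    print(tld,
          dummy_code(tld, GTLDS, reference="COM"),
          deviation_code(tld, GTLDS, omitted="ORG"))
```

With dummy coding, each coefficient compares one gTLD against the reference gTLD, which corresponds to the pairwise comparisons of Table 20. With deviation coding, each coefficient compares one gTLD against the unweighted mean over all gTLDs, which corresponds to the "from mean" comparisons of Table 19.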

Category of experimental domain name

Using deviation coding we identified one category with statistically significant correlation to email

misuse. Domains denoting a person name (like randall-bilbo.com) are negatively correlated to

misuse (p < 0.05) – that is, the possibility of experiencing email address misuse is 37% less

than if the domain name had a different format. Other categories examined in our experiment

(e.g., randomly-generated names, synthetic business names) do not appear to have a

statistically significant role.

This appears to be an important result. However, we point out that all the domain names we

registered with the aim of denoting individuals contain a hyphen, while none of the domain

names we used for the other categories do. It is unclear whether the statistical differences

observed are due to the domain names denoting a person, or because they contain a hyphen.

Anti-harvesting applied to experimental domain name

The existence of anti-harvesting techniques was encoded as a dichotomous categorical variable

that denotes the existence or not of any anti-harvesting techniques for each experimental

domain name. While the Registrars and Registries selected for this experiment employ a variety

of parameters in WHOIS port 43 and web form rate limiting, we chose this simple binary coding

for simplicity in the statistical interpretation.

Using a logistic regression we find that the existence of anti-harvesting techniques is statistically

significant in predicting the potential of email address misuse. Additionally, the possibility of experiencing email misuse without the existence of any anti-harvesting technique is 2.3 times higher than when an anti-harvesting technique is in place.

Registrar of experimental domain name

We encoded each experimental domain name’s Registrar as a 16-level categorical variable

using deviation coding to measure each Registrar’s deviation from the overall mean value. We

did not find any statistically significant contribution. In other words, the choice of Registrar alone

is not adequate to predict the possibility of email address misuse.

8.3. Domain characteristics affecting phone number misuse

We examine the factors that affect the possibility of a Registrant receiving a voicemail in the

three main classes: (a) spam, (b) possible spam, and (c) not spam. We are purposefully not

considering the other two classes (interactive and empty) as they do not present meaningful

outcomes. The factors measured in the experimental study that could affect the type of received

voicemail are the price of a domain, the gTLD, and the Registrar.31 A model that could allow us

to perform this analysis is the multinomial logistic regression. However, multinomial logistic

regressions require a large sample size (i.e. many observations) to calculate statistically significant

correlations, which, in the case of our experiment, is not available.

31 The experimental study design did not allow association between a received voicemail and the domain name category.

gTLD   Correlation to voice spam       Rate of change from mean
BIZ    Positive (p = 0.002)            7.39
COM    Not statistically significant   -
INFO   Positive (p = 0.003)            5.12
NET    Not statistically significant   -
ORG    Negative (p < 0.05)             0.1

Table 21 Correlation of gTLD with WHOIS-attributed phone number misuse.

Therefore we reverted to a basic logistic regression by transforming the multiple-response

dependent variable into a dichotomous one. We did this by conservatively transforming

observations of possible spam into observations of not spam. The independent variables, as in

Section 8.2, are the domain price, the gTLD, the existence of anti-harvesting techniques, and

the Registrar. The categorical variables were coded initially using deviation coding to identify

which had an overall statistical significance.
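The conservative recoding of the voicemail classes described above can be sketched as follows. The class labels follow the voicemail classes named in this report; the function name and the use of None for excluded classes are our own illustrative choices.

```python
def to_binary_outcome(voicemail_class):
    """Map a voicemail class to 1 (spam), 0 (not spam), or None (excluded)."""
    label = voicemail_class.strip().lower()
    if label == "spam":
        return 1
    if label in ("possible spam", "not spam"):
        return 0      # conservative choice: possible spam is counted as not spam
    return None       # interactive and blank voicemails are not considered

labels = ["Spam", "Possible spam", "Not spam", "Blank", "Interactive"]
print([to_binary_outcome(label) for label in labels])
```

Coding possible spam as 0 can only lower the number of positive observations, so any bias introduced by this step works against finding misuse.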

gTLD of experimental domain name

The gTLD was the only variable with statistical significance. Table 21 shows how the five gTLDs

are correlated with the measured WHOIS-attributed phone number misuse. Domains under the

BIZ and INFO gTLDs are correlated with higher misuse, while domains under the ORG gTLD

are correlated with lower misuse.

We also looked into how each gTLD affects the phenomenon of WHOIS-attributed phone

number misuse, in comparison to the other gTLDs. Table 22 presents our findings. Cell colors

represent the relative contribution of the point of reference gTLD to another gTLD, and the

contents show the level of significance.32 Green highlighting means that the column gTLD

correlates with less voicemail spam than the row gTLD, while red highlighting means that the

column gTLD correlates with more voicemail spam than the row gTLD. Grey cells represent

statistically insignificant comparisons. We found that BIZ and INFO domains are correlated with

higher instances of phone number misuse compared to all other gTLDs.

32 Only statistically significant comparisons are shown.

        COM                NET               ORG                INFO               BIZ
COM     -                  -                 -                  p = 0.01 (0.12)    p = 0.001 (0.09)
NET     -                  -                 -                  p < 0.05 (0.07)    p = 0.01 (0.05)
ORG     -                  -                 -                  p < 0.001 (0.02)   p = 0.001 (0.01)
INFO    p = 0.001 (7.9)    p < 0.05 (13.5)   p < 0.001 (47.9)   -                  -
BIZ     p = 0.001 (11.4)   p = 0.01 (19.4)   p = 0.001 (69.1)   -                  -

Table 22 Comparison of gTLDs in terms of contribution to WHOIS-attributed phone number misuse. Columns are the reference TLDs and the color indicates if they contribute more (red) or less (green). P-value is shown where statistically significant. The numbers in parentheses show the rate of change of the conditional mean with respect to a specific value of the categorical variable at a corresponding row.

Our decision to code possible voicemail spam as not spam may underestimate the extent of

misuse, and therefore the coefficients. However, we believe that this is a conservative approach

that prevents possible false positives from being considered in our model.

Anti-harvesting applied to experimental domain name

We did not find any statistically significant correlation between use or non-use of anti-harvesting

measures and the rate of phone number misuse observed for our experimental domain names.

Registrar of experimental domain name

We did not find any statistically significant correlation between Registrar and the rate of phone

number misuse observed for our experimental domain names.

Price of experimental domain name

We did not find any statistically significant correlation between retail price of an experimental

domain name and the rate of phone number misuse observed for those names.

8.4. Domain characteristics affecting postal address misuse

The level of postal address misuse that we observed during the experimental study

was minimal, as we have discussed in previous sections. Therefore, we cannot

provide any meaningful analysis regarding the domain name characteristics that affect this type

of misuse.

9. Discussion

In this work we undertook a combination of descriptive and experimental studies to examine the

hypothesis that WHOIS-published data leads to a measurable degree of misuse, and to cast

light on the experience of WHOIS misuse from the viewpoint of Registrants, Registrars,

Registries, experts, and law enforcement agencies.

We surveyed 101 experts and law enforcement agents, and the majority of participants (60%)

indicated that WHOIS misuse is usually not considered when investigating security incidents.

This, combined with the fact that WHOIS misuse is a real and measurable phenomenon,

reveals that WHOIS is an underestimated vulnerability that experts should consider more

consistently. However, the views of the experts may be affected by underreporting of incidents

of WHOIS misuse and inconsistencies between self-reported WHOIS misuse and actual

(experimentally-measured) WHOIS misuse.

In the few cases where experts were able to report on specific cases of WHOIS misuse (23

cases reported by 18% of participants), the adverse effects were similar to the cases that were

reported through the Registrant survey, and measured through the experimental study (e.g.

postal and email spam). However, a few targeted cases of WHOIS misuse (4 out of 23) had

potentially significant impact (e.g. fraud to extract money); due to their rare occurrence, we were

not able to observe or measure similar cases in the other parts of this study. Finally, the experts

stated that anti-harvesting techniques deployed subsequently did in fact deter reoccurrence of

WHOIS misuse in 11 out of 12 incidents. We made similar observations in the experimental part

of this study, where we identified a statistically significant effect of anti-harvesting techniques in

thwarting WHOIS-originated email spam.

Through our Registrant survey, we were able to gather information about experiences of

WHOIS misuse only from 57 Registrants (out of 1619 invitations to participate, with a target of

340 participants), despite our effort to attract participation by offering a chance to win attractive

prizes. This low response rate (3.6%) demonstrated how difficult such survey-based studies are

to run over the Internet. In addition it serves as a reminder that Internet-based surveys should

be very minimalistic in terms of extent and terminology used. Given the limited turnout of the

Registrar and Registrant surveys in this particular study, simply limiting ourselves to the expert

survey and the experiment would have sufficed to achieve the same level of statistical

significance as we obtained in our entire study.

In our limited sample, we found that Registrants experienced measurable and statistically significant WHOIS misuse. Specifically, the prevalent types of misuse are

associated with phone numbers, email addresses, and postal addresses published exclusively

in WHOIS. More specifically:

    29.8% of surveyed Registrants reported WHOIS email address misuse
    12.3% of surveyed Registrants reported WHOIS phone number misuse
    29.8% of surveyed Registrants reported WHOIS postal address misuse

No other type of misuse (e.g. identity theft) was reported or measured at a statistically

significant level.

Possibly the most interesting finding of the Registrar and Registry survey was their hesitation

to participate (22 participants out of 111 invitations). This could have been due to concerns over

possible consequences of public disclosure of confidential business practices.

Nevertheless, the survey provides insights on the reported and experienced incidence of

WHOIS misuse. Registrars and Registries reported that WHOIS queries are mainly carried out

through port 43, followed by web forms, and then by bulk purchases. However the latter has the

potential for higher impact in misuse, as the number of WHOIS records exchanged is by

definition very large. Nevertheless, port 43 rate limiting appears to be the most widely adopted

anti-harvesting technique.

We performed rate-limiting tests for the 92 “thin” Registrars in our sample (representing a

combined 77.4% market share in August 2011) and the three “thick” Registries. We found that

54% of the Registrars and Registries we tested do not employ any port 43 rate limiting

technique with the remaining 46% employing some type of rate limiting (e.g. IP blacklisting,

CAPTCHAs, combination of techniques).

Through the experimental study we found statistically significant evidence of WHOIS-originated

misuse targeting the email addresses of Registrants. 71% of the 400 experimental

domains experienced email address misuse. More specifically,

    94% of .BIZ domains,
    78% of .INFO domains,
    65% of .NET domains,
    60% of .COM domains, and
    56% of .ORG domains

were affected by email address misuse attributed to WHOIS.

The occurrence of email misuse can be empirically predicted by taking into account the cost of a

domain, the gTLD, and the existence of anti-harvesting mechanisms. When comparing the

contribution of the top 5 gTLDs in predicting the relative occurrence of WHOIS-originated email

misuse, the .BIZ domains rank first in being vulnerable to email address misuse.

Considering the misuse of Registrants’ phone numbers – measured in the experimental study

as voicemail spam – we found that 5% of the experimental domains were affected by phone

number misuse, with the following breakdown per gTLD:

    30% of .BIZ domains,
    23% of .INFO domains,
    15% of .COM domains,
    4% of .NET domains, and
    4% of .ORG domains.

There is a statistically significant correlation between the choice of gTLD and the WHOIS-

attributed phone number misuse. As with email spam, BIZ and INFO gTLDs are correlated with

more misuse, while ORG is correlated with less misuse. However, we found no relationship

between cost of a domain or existence of anti-harvesting mechanisms and Registrant phone

number misuse.

The type of the domain name (for the five types we studied) cannot adequately predict any type

of WHOIS misuse, with the exception of domain names denoting a person’s full name (in this

study, formatted as firstname-lastname). This domain name format resulted in a 37% reduction

in email address misuse, compared to the other types of domain names.

The volume of collected postal spam attributed to WHOIS misuse, even though it is non-zero, is

too low to allow any inferences. Overall 1% of experimental domains were subject to postal

address misuse.

Comparing the number of Registrants experiencing WHOIS-originated email misuse (17%) with

the measured WHOIS-originated email misuse in the experimental study (62% of domains) we

note that the difference is well beyond the margin of error. Therefore we believe that WHOIS-attributed

spam email occurrence is underreported, possibly because Registrants find email

spam to be less impactful than phone or postal spam.

In addition, the reported occurrence of WHOIS-originated postal address misuse (22%),

compared to the total of three instances of measured misuse (1% of domains), is an indication

that operating no more than three mailboxes as part of the experimental study

possibly hindered adequate measurement of postal address misuse. However, as

we observed through the pilot of the experimental study, it can take more than 6 months to

receive a piece of WHOIS-related spam postal mail. If this observation is generally true, then

the duration of this study may have contributed to the low representation of WHOIS-attributed

postal address misuse.

On the other hand, the measured occurrence of phone number misuse (14%) is very similar to

the reported misuse (13%), and within the margin of error, possibly because Registrants find

phone misuse more impactful, and they are therefore more prone to report it.

Revisiting the main hypothesis we set off to test in this WHOIS misuse study, namely, that

public access to WHOIS data leads to a measurable degree of misuse, we conclude

that this hypothesis is validated, in a statistically significant way, both via measurements and via

surveys. The main types of misuse we found are voice spam, email spam, and postal spam.

Although we found other types of misuse as well (e.g. malware and DDoS attacks), the surveys

and experiments did not yield enough such instances to be statistically significant for the

purposes of this study. Through our controlled measurement experiments, we found that

anti-harvesting mechanisms were a deterrent to misusing a WHOIS-published email address. On

the other hand, anti-harvesting does not appear to significantly impact the other types of misuse

considered in this study.

10. Appendix A – Law Enforcement/Researcher survey

10.1. Invitation to participate

Dear [Insert name here],

We are researchers at Carnegie Mellon University, conducting a study sponsored by ICANN on

misuse of gTLD WHOIS data – that is, harmful acts such as spam, phishing, identity theft, and

stalking which Registrants believe were sent using WHOIS-published contact information.

(Please see: http://blog.icann.org/2011/04/cylab-at-carnegie-mellon-university-selected-to-conduct-study-of-whois-misuse/comment-page-1/).

As part of this study, we are planning to interview and survey a

number of cyber security researchers, law enforcement agents, and consumer protection agencies

from various countries, about security incidents they have observed in the field. Given your

noted expertise in the field, we would be delighted to have the opportunity to interview you. We

are aiming to complete gathering answers to this survey by [closing date here].

Should you be interested in a phone or email interview, please let us

know by responding to this email or by contacting Nicolas Christin

at +1-412-268-4432. We have also made an online survey available at

[Insert link here].

The survey (or equivalently, the phone interview) should not take more

than 15 minutes of your time, and is a vital component of the study.

Note that, since the study is commissioned by ICANN, participation

presents a great opportunity to have an impact on policy making.

Thank you in advance for your time and consideration. We look forward to

your contribution.

Sincerely,

--

Nicolas Christin, Ph.D. and Nektarios Leontiadis

Carnegie Mellon University CyLab

10.2. Consent form

This survey is part of a research study conducted by Prof. Nicolas Christin at Carnegie Mellon

University.

The purpose of the research is to investigate the extent to which public availability of certain

information online leads to the information being misused by unauthorized parties.

Procedures

Participants are expected to answer a survey. The expected duration of participation is 15

minutes.

Participant Requirements

Participation in this study is limited to individuals age 18 and older.

Risks

The risks and discomfort associated with participation in this study are no greater than those

ordinarily encountered in daily life or during other online activities.

Benefits

There may be no personal benefit from your participation in the study but the knowledge

received may be of value to humanity.

Compensation & Costs

There is no compensation for participation in this study. There will be no cost to you if you

participate in this study.

Confidentiality

By participating in this research, you understand and agree that Carnegie Mellon may be

required to disclose your consent form, data and other personally identifiable information as

required by law, regulation, subpoena or court order. Otherwise, your confidentiality will be

maintained in the following manner:

Your data and consent form will be kept separate. Your consent form will be stored in a locked

location on Carnegie Mellon property and will not be disclosed to third parties. By participating,

you understand and agree that the data and information gathered during this study may be used

by Carnegie Mellon and published and/or disclosed by Carnegie Mellon to others outside of

Carnegie Mellon. However, your name, address, contact information and other direct personal

identifiers in your consent form will not be mentioned in any such publication or dissemination of

the research data and/or results by Carnegie Mellon.

Right to Ask Questions & Contact Information

If you have any questions about this study, you should feel free to ask them by contacting the

Principal Investigator now at

Dr. Nicolas Christin

Carnegie Mellon INI & CyLab

4720 Forbes Avenue, CIC Room 2108

Pittsburgh, PA 15217 USA

Phone: 412-268-4432

Email: [email protected]

If you have questions later, desire additional information, or wish to withdraw your participation

please contact the Principal Investigator by mail, phone or e-mail in accordance with the contact

information listed above.

If you have questions pertaining to your rights as a research participant; or to report objections

to this study, you should contact the Research Regulatory Compliance Office at Carnegie

Mellon University. Email: [email protected] . Phone: 412-268-1901 or 412-268-5460.

The Carnegie Mellon University Institutional Review Board (IRB) has approved the use of

human participants for this study.

Voluntary Participation

Your participation in this research is voluntary. You may discontinue participation at any time

during the research activity.

I am age 18 or older. Yes No

I have read and understand the information above. Yes No

I want to participate in this research and continue with the survey. Yes No

10.3. Survey questions

Thank you for agreeing to participate in a network security survey assembled by Carnegie Mellon CyLab. We appreciate your time filling out answers to the following questions thoroughly. If you have any questions about this survey, or the underlying study, please contact Dr. Nicolas Christin at <[email protected]>.

1. How would you best describe your occupation?

- Researcher (Academia)
- Researcher (Industry)
- Consultant
- Law enforcement agent
- Consumer protection agency
- Other (Please describe)

2. Which category best describes your employer:

- Academia
- Security industry
- Other IT industry
- Not-for-profit Non-Governmental Organization (NGO)
- Governmental organization
- Other (Please describe)

3. a) Which country are you based in?

(Drop down list)

b) If different from the country you are based in, which geographic area are you providing input about:

- Africa
- North America
- South America
- Asia
- Europe
- Oceania
- Same as 3a.

4. Are you familiar with the process of DNS name registration?

[1: Not familiar - 3: Know the basics - 5: Expert]

5. a) Are you familiar with how domain Registrants are required to supply contact information when registering a domain?

[1: Not familiar - 3: Know the basics - 5: Expert]

b) Are you familiar with how this Registrant contact information can be queried or obtained by third parties via WHOIS?

[1: Not familiar - 3: Know the basics - 5: Expert]

5. Do you know what WHOIS harvesting is?

[Yes/No]

If yes, provide a one-line summary of what it is: (Open ended field)

6. Are you aware of any WHOIS anti-harvesting techniques?

[Yes/No]

If yes, please describe: (Open ended field)

Page break.

Answers to the following set of questions may contain sensitive or private data. Let us reiterate that 1) you do not have to answer questions you do not feel comfortable answering, 2) should you decide to answer, your individual answers will be protected: only the research team at Carnegie Mellon will be able to view your individual answers (others would only have access to aggregate statistics or reports); data will be stored encrypted. In addition, except for members of the research team, your identity and the identity of your organization will not be tied to specific answers, unless you explicitly grant us permission to do so (see question 12).

7. In the course of your employment, have you directly experienced any of the following network security related attacks caused by outsiders?

[Yes/No for all questions]

- Denial of Service
- Phishing
- Vishing (voicemail phishing)
- Email spam
- Postal spam
- Email virus
- Abuse of personal data or identity theft
- Malware installation/drive by downloads
- Unauthorized intrusion on servers
- Blackmail/ransom demands/intimidation
- Have experienced attacks, but prefer not to divulge specifics
- Other (Please describe)
- Prefer not to answer

8. In the course of your employment, have any of the following network security-related attacks been reported to you or to your organization by a third-party?

[Yes/No for all questions]

- Denial of Service
- Phishing
- Vishing (voicemail phishing)
- Email spam
- Postal spam
- Email virus
- Malware installation/drive by downloads
- Unauthorized intrusion on servers
- Abuse of personal data or identity theft
- Blackmail/ransom demands/intimidation
- Have experienced attacks, but prefer not to divulge specifics
- Other (Please describe)
- Prefer not to answer

9. Can you or your organization supply aggregate reports or statistics on security incidents that you have collected?

- Yes
- No

9a. If yes, can you give an online pointer to the resources?

- Open ended field

10. Have you or your organization analyzed whether WHOIS contact data was analyzed or found to play a role in security incidents?

- Yes
- No

10b. If yes, can you give details about how WHOIS contact data was analyzed and to which extent (aggregate statistics are appropriate here):

- Open ended field

Specific incidents

11. Have you ever observed directly or indirectly individual incidents (as opposed to collecting aggregate data, per the previous questions) involving harvesting of WHOIS data? Please distinguish between each incident. For each incident you are aware of please answer the following questions:

- How did you become aware of the incident: (Experienced yourself, reported to you, reported to your organization, heard from third parties)
- Which elements of WHOIS Registrant contact information were misused?
- Are you aware of any measures that were taken to protect the Registrant's WHOIS contact information (“countermeasures”)?
- If yes, have you had any similar incident after the deployment of countermeasures
- Can you provide any other details?

As an example: Alice reported to me that her email was published as the WHOIS contact for ABC Corp. Alice subsequently received phishing emails containing details available through WHOIS but published in no other Internet location.

12. If applicable, can you disclose which organization you represent in your answers?

- [open ended field]
- Prefer not to answer

13. If needed, would you be available for follow-up discussion to clarify certain of your answers to this survey?

- yes
- no

11. Appendix B – Registrant survey

11.1. Invitation to participate

Dr. Nicolas Christin

Carnegie Mellon University – CyLab

4720 Forbes Avenue, CIC Rm 2108


http://www.andrew.cmu.edu/user/nicolasc/

Please click here to verify authenticity of this email:

http://dogo.ece.cmu.edu/whois-study/

Dear [FirstName], Sampled Domain Name: [CustomData]

Interested in winning the new Apple iPad 4G or an Apple iPod Shuffle? Read on.

We are computer security researchers in Carnegie Mellon University’s Cyber Security Lab

(CyLab) (http://www.cylab.cmu.edu). We are conducting a study that may help reduce

Internet-based crimes, and we need your help!

At some point – perhaps when you created a website or an email account – you registered a

domain name. During registration, you were asked to provide contact details (name, email,

phone number, address). These details are published in a public Internet directory called

"WHOIS." ANYONE, including us, can look up this directory to find out registration information.

By sharing your experience as a domain name Registrant, you can help us better understand

potential misuses of WHOIS registration data.

The results of this study will help the Internet community to fight various forms of online crime.

We will NOT collect your personal information, unless you specifically give us permission to

contact you to discuss this survey. Information about this option is available at the end of the

survey.

The survey should take about 30 minutes to complete, and will ask questions about the domain

name you have registered and your experience using it.

You can complete the survey in two ways:

- Complete and submit an on-line survey form by clicking [SurveyLink] (PREFERRED),

- Download survey questions from http://dogo.ece.cmu.edu/whois-study/WHOIS_Misuse_Survey_Registrant_Printable.pdf and email answers to whois-[email protected].

We aim to complete this survey by [closing date here]. Please click on the link below if you do

not wish to participate or receive further communication from us. You will not be contacted

further.

[RemoveLink]

If you fully complete the survey, you will be entered in a drawing for a chance to win one new

iPad (“iPad 3”) 16GB with 4G, or one of four 2GB iPod Shuffle.

Thank you very much for your time and consideration. We look forward to hearing from you.

Sincerely,

--

Nicolas Christin, Ph.D


11.2. Consent

This survey is part of a research study conducted by Prof. Nicolas Christin at Carnegie Mellon

University.



Procedures


minutes.



Risks



Benefits

There may be no personal benefit from your participation in the study, but the knowledge



By fully completing the survey, you will be entered in a drawing for a chance to win an Apple

iPad 4G, or one of four Apple iPod Shuffle. There will be no cost to you if you participate in this

study.

Confidentiality



















Phone: 412-268-4432
















11.3. Survey questions

Thank you very much for completing this survey conducted by Carnegie Mellon CyLab in the

United States (http://www.cylab.cmu.edu). We contacted you because you registered one or

more of the domain names that appear in a random sample being examined by this survey. By

sharing your experience as a domain name Registrant, you can help us make the Internet a

safer place! If you have any questions about this survey, or the underlying study, please contact

Dr. Nicolas Christin at <[email protected]>.

This survey is commissioned by ICANN, the Internet Corporation for Assigned Names and

Numbers. ICANN coordinates the assignment of domain names, and is in charge of the policies

governing the WHOIS directory. The results of this study will help ICANN to take steps to reduce

WHOIS misuse.

First, let us start with a brief explanation to help you complete this survey. A domain name

identifies an Internet resource like a website or email service (google.com, verizon.net, cmu.edu

etc.). At some point – perhaps when you created a website or an email account – you obtained

a domain name. That process is called “domain registration.” Companies that provide domain

registration services are known as “Registrars.” Examples include GoDaddy, Tucows and

ENOM. During registration, your Registrar asked you to provide contact details (name, email,

phone number, address). These details are published in a public Internet directory called

"WHOIS" and ANYONE (including us) can access it.

We value your privacy. We assure you that all of your survey answers will be treated as

confidential and we will use them only for aggregate statistical analysis. By this we mean that no

entity will be able to associate a specific answer to you. Your personal contact details or

individual answers will NOT be disclosed to anyone outside of our research team.

1. How many domain names have you currently registered?

- 1

- 2-10

- More than 10

2. Please list all of the domain names that you have registered. If you registered more than one name, please separate them with commas (,) – for example, “mycorp1.com, mycorp2.com.”

[Open ended]

2.1 Please tell us the “sampled domain name” that appears in your survey invitation letter.

[Open ended]

When answering questions that follow, please think about your experiences as the Registrant of this sampled domain and communication sent to addresses that you supplied when registering that domain. Before continuing, you may find it helpful to look up your own domain in WHOIS using http://whois.domaintools.com.

3. Thinking about why you registered this domain name and how you use it, please indicate

which of the following categories best describes you as this domain name’s Registrant:

- I registered the domain for my own use as an Individual

- I registered the domain for use by a For-profit business or organization

- I registered the domain for use by a Non-profit organization

- I registered the domain for use by an informal interest group (e.g., tennis club)

- Other (please specify)

3.1 Is this domain name used for any commercial activities – for example, to sell or advertise

goods or services or to collect donations?

- Yes

- No

- Not sure or prefer not to answer

4. Please indicate the country that you identified when you registered this domain name. Note:

WHOIS identifies several contacts for each domain name, including an administrative contact

(usually you) and a technical contact (may be your Internet service provider). Here, we are

interested in the country identified in YOUR contact details.

(Drop down list)

5. Please identify the Registrar (that is, the registration service provider) from whom you

obtained this domain name. If you do not know or recall, you may leave this blank.

[Open ended field]

6. Before taking this survey, did you know that the contact details which you provided during

domain registration would be publicly available on the Internet through “WHOIS”?

[Yes/No]

7. Since registering this domain name, have you ever received unsolicited postal mail at any of

the postal addresses that you specified in contact details during domain registration?

[YES/NO]

7.1 [If yes to Q7] Do you have reason to suspect that you received this unsolicited postal

mail because your postal address was published in WHOIS?

[YES/NO]

7.1.1 [If yes to Q7.1] Why do you think so?

[Open ended field]

7.1.2 [If yes to Q7.1] Is the postal address published in another public directory or

Internet source (for example, a phone book, a website, your email signature)?

[Yes/No]

7.1.3 [If yes to Q7.1] How often do you receive unsolicited postal mail at the

postal addresses published in WHOIS?

- A few times in a week

- A few times in a month

- A few times in a year

- Less than once in a year

7.1.4 [If yes to Q7.1] When was the last time that you experienced this?

- Within this week

- Within this month

- Within the past three months

- Within this year

- More than a year ago (please specify)

7.1.5 [If yes to Q7.1] Please describe reasons for which you were contacted in these cases (e.g., a domain name hosting services offer)

[Open ended]

7.1.6 [If yes to Q7.1] If you know or can recall who contacted you in a recent case, please tell us more about that entity (e.g., sender’s name, type of company)

[Open ended]

7.1.7 [If yes to Q7.1] Did this unsolicited postal mail have any adverse impact on you?

- Yes (describe)

- No

7.2 [If no to Q7.1] Could the postal address have been obtained from another public

directory or Internet source (for example, a phone book, a website, your email

signature)?

[Yes/No]

7.2.1[If no to Q7.2] How do you think your postal address was obtained?

[Open ended]

8. Since registering this domain name, have you ever received unsolicited electronic mail at any of the email addresses that you specified in contact details during domain registration?

[YES/NO]

8.1 [If yes to Q8] Do you have reason to suspect that you received those emails because

your email address was published in WHOIS?

[YES/NO]

8.1.1 [If yes to Q8.1] Please specify why you think so.

[Open ended field]

8.1.2 [If yes to Q8.1] Is the misused email address published in another public

directory or Internet source (for example, a website, your email signature,

Facebook, Twitter)?

[Yes/No]

8.1.3 [If yes to Q8.1] How often do you experience misuse of your email address published in WHOIS?

- A few times a day

- A few times in a week





- Within this week

- Within this month


- Within this year


8.1.5 [If yes to Q8.1] Please describe the reasons for which you were contacted in these cases (e.g., a domain name hosting services offer, targeted phishing email)

[Open ended]

8.1.6 [If yes to Q8.1] If you know or can recall who contacted you in a recent case, please tell us more about that entity (e.g., sender’s name, type of company)

[Open ended]

8.1.7 [If yes to Q8.1] Did this unsolicited email have any adverse impact on you?

- Yes (describe)

- No

8.2 [If no to Q8.1] Could the email address have been obtained from another public directory or Internet source (for example, a website, your email signature, Facebook, Twitter)?

[Yes/No]

8.2.1 [If no to Q8.2] How do you think your email address was obtained?

[Open ended]

9. Since registering this domain name, have you ever received unsolicited voice calls at the

phone number(s) that you specified in contact details during domain registration?

[YES/NO]

9.1 [If yes to Q9] Do you have reason to suspect that those unsolicited voice calls

happened because your phone number(s) are published in WHOIS?

[YES/NO]

9.1.1 [If yes to Q9.1] Please specify why you think so.

[open ended]

9.1.2 [If yes to Q9.1] Is the misused phone number(s) published in another public directory or Internet source (for example, a phone book, a website, your email signature)?

[Yes/No]

9.1.3 [If yes to Q9.1] How often do you experience misuse of your phone number(s) published in WHOIS?

- A few times a day

- A few times in a week





- Within this week

- Within this month


- Within this year


9.1.5 [If yes to Q9.1] Please describe the reasons for which you were contacted

in these cases (e.g., a domain name hosting services offer)

[Open ended]

9.1.6 [If yes to Q9.1] If you know or can recall who contacted you in a recent case, please tell us more about that entity (e.g., sender’s name, type of company).

[Open ended]

9.1.7 [If yes to Q9.1] Did these unsolicited calls have any adverse impact on you?

- Yes (describe)

- No

9.2 [If no to Q9.1] Could the phone number have been obtained from another public directory or Internet source (for example, a phone book, a website, your email signature)?

[Yes/No]

9.2.1 [If no to Q9.2] How do you think your phone number(s) was obtained?

[Open ended]

10. Since registering this domain name, have you ever had your identity (e.g. name, address,

phone number) abused or stolen? An example would be fraudulent use of your identity (without

your knowledge) to apply for a credit card or receive financial services.

[YES/NO]

10.1 [If yes to Q10] Was this identity specified in contact details during domain

registration?

[Yes/No]

10.1.1 [If yes to Q10.1] Do you have reason to suspect that the identity abuse happened because your identity details are published in WHOIS?

[YES/NO]

10.1.1.1 [If yes to Q10.1.1] Please specify why you think so.

[Open ended]

10.1.1.2 [If yes to Q10.1.1] Are the misused identity details published in another public directory or Internet source (for example, your email signature, a workplace directory, Facebook)?

[Yes/No]

10.1.1.3 [If yes to Q10.1.1] How many times has your identity published in WHOIS been abused or stolen?

- Once

- Twice

- Three times

- More than three times (please indicate)

10.1.1.4 [If yes to Q10.1.1] When was the last time that you experienced this?

- Within this week

- Within this month


- Within this year


10.1.1.5 [If yes to Q10.1.1] Please describe how your identity details were misused (e.g. issuing of a loan, credit card)

[Open ended]

10.1.1.6 [If yes to Q10.1.1] If you know or suspect who is responsible for this identity abuse/theft please tell us more about that entity (e.g., name, relationship to you if any).

[Open ended]

10.1.1.7 [If yes to Q10.1.1] Please describe the adverse impact of this identity abuse/theft on you. For example, would you rate the impact as minor, major, or severe?

[Open ended]

10.1.2 [If no to Q10.1.1] Could the identity details have been obtained from another public directory or Internet source (for example, your email signature, a workplace directory, Facebook)?

[Yes/No]

10.1.2.1 [If no to Q10.1.2] How do you think identity details were

obtained?

[Open ended]

11. Are there any Internet servers (web, email, etc.) now reachable using the domain name that

you registered?

[YES/NO]

11.1 [If yes to Q11] Are you the system administrator of these servers? That is, do you

own and operate the computer on which the server runs? (If your servers are hosted by

a web or email services provider, the answer to this question should be NO. If you’re not

sure about the answer, chances are good it should be NO.)

[YES/NO]

11.1.1 [If yes to Q11.1] Since registering this domain name, have you ever

experienced unauthorized intrusion into servers within this domain for which you

have administrative rights?

[YES/NO]

11.1.1.1 [If yes to Q11.1.1] Do you have reason to suspect that the

unauthorized intrusion(s) happened because your identity details are

published in WHOIS?

[YES/NO]

11.1.1.1.1 [If yes to Q11.1.1.1] Please specify why you think so.

[Open ended]

11.1.1.1.2 [If yes to Q11.1.1.1] Are the misused identity details published in another public directory or Internet source (for example, your email signature, a workplace directory, Facebook)?

[Yes/No]

11.1.1.1.3 [If yes to Q11.1.1.1] How many times have you

observed intrusions into your server(s) that you can relate to your

identity details published in WHOIS?

- Once

- Twice

- Three times


11.1.1.1.4 [If yes to Q11.1.1.1] When was the last time that you experienced this?

- Within this week

- Within this month


- Within this year


11.1.1.1.5 [If yes to Q11.1.1.1] Please describe the adverse effect

and severity of the unauthorized intrusion (e.g. web site

defacement)

[Open ended]

11.1.1.1.6 [If yes to Q11.1.1.1] If you know or suspect who was behind a recent intrusion, please tell us more about that entity (e.g., source IP address or domain name).

[Open ended]

11.1.2 [If yes to Q11.1] Have any of the servers in your domain(s) been a victim of denial of service (DoS) attack? (If unsure, the answer should be NO.)

[YES/NO]

11.1.2.1 [If yes to Q11.1.2] Do you think the DoS attack happened

because your identity details are published in WHOIS?

[YES/NO]

11.1.2.1.1 [If yes to Q11.1.2.1] Why do you think so?

[Open ended]

11.1.2.1.2 [If yes to Q11.1.2.1] Are the misused identity details published in another public directory or Internet source (for example, your email signature, a workplace directory, Facebook)?

[Yes/No]

11.1.2.1.3 [If yes to Q11.1.2.1] How many times have you experienced a DoS attack against one or more of the servers within this domain that you attribute to WHOIS misuse?

- Once

- Twice

- Three times


11.1.2.1.4 [If yes to Q11.1.2.1] When is the last time that you experienced this?

- Within this week

- Within this month


- Within this year


11.1.2.1.5 [If yes to Q11.1.2.1] Please describe the adverse impact of the attack (e.g., unable to provide services to customers)

[Open ended]

11.1.2.1.6 [If yes to Q11.1.2.1] If you know or suspect who was behind a recent attack, please tell us more about that entity (e.g., caller’s name, type of company)

[Open ended]

12. Since registering this domain name, have you ever been a victim of blackmail or intimidation?

[YES/NO]

12.1 [If yes to Q12] Was the identity (e.g., name, address, phone number, etc) that was the target of blackmail or intimidation specified in contact details during domain registration?

[Yes/No]

12.1.1 [If yes to Q12.1] Do you have reason to suspect that the blackmail or intimidation was related to the fact that your identity details are published in WHOIS?

[YES/NO]

12.1.1.1 [If yes to Q12.1.1] Please specify why you think so.

[Open ended]

12.1.1.2 [If yes to Q12.1.1] Are the misused identity details published in another public directory or Internet source (for example, email signature, workplace directory, Facebook)?

[Yes/No]

12.1.1.3 [If yes to Q12.1.1] How many times have you been blackmailed or intimidated using your identity details published in WHOIS?

- Once

- Twice

- Three times


12.1.1.4 [If yes to Q12.1.1] When was the last time that you experienced this?

- Within this week

- Within this month


- Within this year


12.1.1.5 [If yes to Q12.1.1] Please describe a recent incident (e.g., how you got blackmailed or intimidated).

[Open ended]

12.1.1.6 [If yes to Q12.1.1] If you know or suspect who was behind a recent incident, please tell us more about that entity (e.g., name, relationship to you if any)

[Open ended]

12.1.1.7 [If yes to Q12.1.1] Please describe the adverse impact this incident had on you. For example, would you rate the incident’s impact as minor, major, or severe?

[Open ended]

13. Have you received any other type of harmful Internet communication or experienced any

other harmful acts that you have reason to believe may represent WHOIS data misuse?

[Yes/No]

13.1 [If yes to 13] Please tell us what you experienced, why you believe WHOIS contact

details for this domain name might have played a role, and whether the contact details

misused in this incident were available from any other source.

[Open ended]

14. If you believe that the information you used for domain name registration has been misused

in any way, and you have indicated this in any one of the previous questions, did you

subsequently take any measures to avoid WHOIS misuse in the future?

[I have experienced misuse and taken measures/I have experienced misuse and not taken

measures/I have not experienced misuse]

14.1 [If yes to Q14] Please tell us about the measures that you took. Check all steps that

you tried and explain any additional strategies you tried that are not listed below:

- Cancelling your domain name’s registration or moving it to a different Registrar.

- Changing your email address or domain name or any other misused WHOIS data.

- Replacing your own WHOIS contact addresses with forwarding addresses supplied by

a service provider (such as your domain’s Registrar).

- Replacing your WHOIS contact names and addresses with the names and addresses

of a service provider (for example, someone registering the domain name on your

behalf).

- Supplying partially incorrect or incomplete information when re-registering the domain

name or updating its WHOIS contact details (e.g., using a fake street number with

everything else valid)

- Supplying completely fake information when re-registering the domain name or

updating its WHOIS contact details.

- Applying a spam filter or registering with an identity theft protection service or some

other step to deal with the consequences of WHOIS misuse (as opposed to reducing

misuse itself).

- Other (please describe)

[Important note: As previously stated, your individual answer to this question is completely confidential and will NOT be shared with your Registrar or ICANN.]

15. Are you aware of any strategies that your domain name’s current Registrar may be taking to

reduce or protect against WHOIS data misuse?

[YES/NO]

15.1 [If yes to Q15] Please describe: [open ended field]

16. Do you grant us permission to contact you further in case we need clarifications about your

answers to this survey?

[YES/NO]

16.1 [If yes to Q16] If yes, please enter your email here.

[Open ended]

11.4. Terms

Carnegie Mellon University

Definition of Terms - WHOIS Misuse Survey

The following are the descriptions for the technical terms used in the ICANN WHOIS Misuse

Study being conducted by CMU. These descriptions will help you understand both the general

meaning of a term and its specific meaning as applied in this study.

Identity Theft

Identity theft occurs when someone uses your personally identifiable information, like your name,

address, phone number, Social Security number (or national identification number), or credit

card number, without your permission, to commit fraud or other crimes. Some examples of

identity theft include renting an apartment, obtaining a credit card, or establishing a telephone

account in your name, without your permission.

Identity thieves steal information by going through trash looking for bills or other paper with your

personally identifiable information, soliciting your information by sending emails pretending to be

your bank (see also Phishing), calling your financial institution while pretending to be you, etc.

Thieves may also be able to get some personally identifiable information by searching WHOIS

for domain name contact names and addresses.

For additional information about Identity Theft and examples, see the United States Federal

Trade Commission website.

Blackmail

In common usage, blackmail is a crime involving threats to reveal substantially true and/or false

information about a person to the public, a family member, or associates unless a demand is

met. Blackmail can include coercion involving threats of physical harm, criminal prosecution, or

taking the person's money or property. In the context of WHOIS misuse, blackmailers may obtain

some personally identifiable information by searching WHOIS for domain name contact names

and addresses.

For additional information about Blackmail, see the Wikipedia website.

Email Spam

Spam email is an unsolicited mail message, sent to your email address without your permission.

The sender of spam is commonly called a "spammer." Spammers send the same email to a

large number of email addresses. They may obtain email addresses from many different


sources such as websites and chat forums. It is also possible for spammers to search WHOIS

for domain name contact email addresses.

Spam email is often used to advertise (or sell) legal and illegal products and to attempt to steal

sensitive information like credit card numbers (see also Phishing). Products commonly

advertised by spam include prescription drugs, herbal medications, replica watches, online

gambling and pornography.

For additional information about Email Spam, see Spamhaus.

Postal Spam

Postal spam is unsolicited postal mail sent to a residential or commercial postal mailbox or

another postal address, and is similar in concept to email spam (see Email Spam). Postal

spammers may obtain postal addresses from many different sources, both offline and on-line,

including searching WHOIS for domain name contact postal addresses.

Phishing

Phishing attacks attempt to steal your personally identifiable information (see also Identity Theft)

and financial account information. A common tactic used during phishing attacks is sending

spam emails that contain links to counterfeit websites (see also Email Spam). Phishing emails

may contain details about recipients, obtained from many different sources, including searching

WHOIS for domain name contact names, addresses and phone numbers.

The attacker can use techniques to hide the identity of the phishing message's true sender and

make the email look like someone else sent it. For example, a phishing email may appear to

come from a legitimate bank, but when you click on the link, you may be taken to a website

designed to look like the bank's website. This may trick you into divulging sensitive data such as

banking or other website account usernames and passwords.

Alternatively, when you click on a phishing email link, you may be taken to a website that

attempts to automatically install malicious software on your computer without your permission or

knowledge. For example, a key-logger program may be installed to send everything that you

type (e.g., passwords) to a remote attacker.

For additional information about Phishing, see this United States Federal Trade Commission

alert.

Vishing


Vishing attacks attempt to steal your personally identifiable information (see also Identity Theft)

and financial account information. Vishing attacks are similar to phishing attacks (see Phishing),

but are conducted using voice or telephone calls instead of email messages. The attacker can

use techniques to hide the vishing caller's true caller identification number and make the caller's

number appear to be another party's number. Vishing attack victims may be tricked into

revealing sensitive information.

For example, the attacker may call you, claiming to be a representative of a bank, and request

your banking information for administrative purposes. Alternatively, upon receiving a vishing call,

you may hear an automated voice message requesting you to immediately call a specified

number to verify account details. But that number reaches the attacker, not your bank.

For additional information about Vishing, see this United States Federal Bureau of Investigation

(FBI) consumer alert.

Email Virus

The most generic definition of an email virus is malicious software (also called "malware")

delivered as an email file attachment. When the recipient opens the attached file, the malicious

software is installed or otherwise activated. The malicious software may damage data or

services on the recipient's computer. It may also carry out harmful actions on behalf of the

attacker. Common examples include deleting files, sending spam emails (see Email Spam) on

the attacker's behalf, tracking the user's actions, and downloading and installing additional

malicious software. Mail messages that carry viruses may be sent to email addresses obtained

from many different sources, including searching WHOIS for domain name contact addresses.

For additional information about Email Viruses, see Carnegie Mellon My Secure Cyberspace.

Denial of Service (DoS)

In a denial-of-service attack, an attacker attempts to prevent legitimate users from accessing or

making use of information or services. By targeting your computer and its network connection,

or the computers and network of Internet sites that you are trying to use, an attacker may be

able to prevent you from accessing email, websites, online service provider accounts (banking

sites, etc.), or other services that rely on the computers or networks that are under DoS attack.

Not all disruptions to service are the result of a DoS attack. There may be technical problems

with a particular network, or system administrators may be performing maintenance. However,

the following symptoms could indicate a DoS attack:


- unusually slow performance when opening files or accessing websites,

- unavailability of a particular website,

- inability to access any website, or

- a dramatic increase in the amount of spam that you receive.

DoS attacks may be launched against targets identified in many different sources, including

searching WHOIS for domain name contact names and addresses.

For additional information about DoS, see the United States Computer Emergency Readiness Team

(US-CERT) website.

Unauthorized Intrusion

Unauthorized intrusion occurs when an attacker gains access to services or information on a

computer system without the owner's permission. It is also possible that the attacker is a

legitimate user of the computer system, but has managed to obtain a level of access

higher than the one she is authorized to use.

Unauthorized intrusion can happen in many ways. Some common techniques used by intruders

are sending malicious messages to the target's computer through the network, tricking the

administrator of the computer system into installing malicious software (see also Phishing), and

guessing the administrator's account username and password. Unauthorized intrusions may be

launched against targets identified in many different sources, including searching WHOIS for

domain name contact names and addresses.

For additional information about Unauthorized Intrusions, see Carnegie Mellon My Secure

Cyberspace.

Document Information

This document was prepared to help users completing surveys being conducted by computer

security researchers at Carnegie Mellon University - CyLab. This document is for research and

education purposes only, and is not for commercial or business purposes. Anyone can use this

document in part or whole by citing all the sources cited in this document, and adhering to the

terms of use specified by the sources cited in this document. All queries regarding this

document should be directed to [email protected].

Acknowledgement of sources


All sources used to create this document are specified below. Some sentences have been

quoted verbatim or with slight modifications to assist readers with limited knowledge of

computer terminology. Further, certain references to United States specific terminology (e.g.,

Social Security Number) have been reduced as this document is intended for use by an

international audience.

Identity Theft: http://www.ftc.gov/bcp/edu/microsites/idtheft/consumers/about-identity-theft.html#Whatisidentitytheft

Denial of Service: http://www.us-cert.gov/cas/tips/ST04-015.html

Phishing: http://www.icann.org/en/general/glossary.htm#P

Blackmail: http://en.wikipedia.org/wiki/Blackmail

Spamhaus: http://www.spamhaus.org/definition.html

Email Viruses: http://www.mysecurecyberspace.com/encyclopedia/index/intrusion.html

Phishing: http://www.ftc.gov/bcp/edu/pubs/consumer/alerts/alt127.shtm

Vishing: http://www.fbi.gov/news/stories/2010/november/cyber_112410

Unauthorized Intrusions:

http://www.mysecurecyberspace.com/encyclopedia/index/intrusion.html




12. Appendix C – Registrar and Registry Survey

12.1. Invitation to Participate

Carnegie Mellon University - CyLab


http://www.andrew.cmu.edu/user/nicolasc/

Email: Please click here to verify authenticity of this email

Dear [Firstname] [Lastname],

We are researchers at Carnegie Mellon University in the United States, conducting a study

commissioned by ICANN on the extent to which public WHOIS contact data for gTLD domains

is misused to commit harmful acts such as spam, phishing, identity theft, stalking, etc. One

survey will target Registrants who believe cases of misuse have originated from WHOIS-

published contact data. We are asking gTLD Registries and a geographically diverse set of

Registrars to participate in this second related survey to learn how WHOIS data for sampled

domain names could possibly have been obtained (e.g., supported query vectors, applied anti-

harvesting measures). Because your organization is a Registry or Registrar associated with a

domain name included in our study’s random sample, we would like to learn about your WHOIS

data access practices. Your participation in this survey presents a great opportunity to share

your insights as a Service Provider about how prevalent public WHOIS data misuse may or may

not be and about the approaches you think have been most effective in deterring WHOIS harvesting.

Please visit this link [URL] to view more information about this study, including important term

definitions that may be useful when you answer this survey and information about how we will

treat business-sensitive and personal information that you choose to share with us.

Should you be interested in participating in the survey, please do so in any of the following

ways:

- Complete and submit an on-line survey form by clicking [URL] (preferred),

- Download survey questions form [URL] and email answers to [email protected],

or

- Schedule a phone interview by responding to this email.



We are aiming to complete this survey by [closing date here]. Please note: If you do not wish to

participate and receive further communication from us, please click the link below, and you will

be automatically removed from our mailing list.

[RemoveLink]

The survey (or equivalently, the phone interview) should take about 25 minutes of your time,

and is a vital component of the study. If you explicitly permit us to do so, we may follow up with

you by phone or email in case we wish to clarify some of your answers.

Thank you in advance for your time and consideration. We look forward to your contribution.

Sincerely,

--

Nicolas Christin, Ph.D.


12.2. Consent form

This survey is part of a research study conducted by Prof. Nicolas Christin at Carnegie Mellon

University.



Procedures


minutes.




Risks



Benefits

There may be no personal benefit from your participation in the study but the knowledge



There is no compensation for participation in this study. There will be no cost to you if you

participate in this study.

Confidentiality




















Phone: 412-268-4432

















12.3. Survey questions

We are researchers at Carnegie Mellon University in the United States, conducting a study

commissioned by ICANN on the extent to which public WHOIS contact data for gTLD domains

is misused to commit harmful acts such as spam, phishing, identity theft, stalking, etc. One

survey will target Registrants who believe cases of misuse have originated from WHOIS-

published contact data. We are asking gTLD Registries and a geographically diverse set of

Registrars to participate in this second related survey to learn how WHOIS data for sampled

domain names could possibly have been obtained (e.g., supported query vectors, applied anti-

harvesting measures).

Because your organization is a Registry or Registrar associated with a domain name included in

our study’s random sample, we would like to learn about your WHOIS data access practices.

Your participation in this survey presents a great opportunity to share your insights as a Service

Provider about how prevalent public WHOIS data misuse may or may not be and about the approaches you

think have been most effective in deterring WHOIS harvesting.

Please visit this link [URL] to view more information about this study, including important term

definitions that may be useful when you answer this survey and information about how we will

treat business-sensitive and personal information that you choose to share with us. If you have

any questions about this survey, or the underlying study, please contact Dr. Nicolas Christin at

<[email protected]>.

This first set of questions is intended to capture general characteristics of the gTLD domain name Registrar or registration services that your organization provides.


0. Does your organization operate a gTLD Registry? If so, please list the generic top-level

domains for which your organization is responsible, separating values with commas (,).

[Open ended]

1. Does your organization operate as a gTLD domain name Registrar? If so, please list the

generic top-level domain(s) under which your organization offers registration services,

separating values with commas (,).

[Open ended]

2. In which country is your headquarters located?

(Drop down list)

3. Please indicate, by order of magnitude, the number of individual domain names for which your organization provides registration services (directly or indirectly), as of the response date.

- Exactly or under 100 000

- 100 001 to 1 000 000

- 1 000 001 to 10 000 000

- More than 10 000 000

4. Please indicate, by order of magnitude, the monthly number of WHOIS queries that you receive and respond to via any of the following means, without regard to the number of WHOIS records actually returned in those responses:

a) Port 43 WHOIS protocol query responses/month


- 100 001 to 1 000 000

- 1 000 001 to 10 000 000

- More than 10 000 000

- Do not know or do not measure

b) Web form WHOIS query responses/month


- 100 001 to 1 000 000

- 1 000 001 to 10 000 000


- More than 10 000 000


c) Bulk WHOIS data purchase transactions/month


- 100 001 to 1 000 000

- 1 000 001 to 10 000 000

- More than 10 000 000


d) Other WHOIS data request methods (please describe method and frequency)

[Open ended]

WHOIS anti-harvesting techniques

The following set of questions is intended to explore WHOIS anti-harvesting techniques that Registries and Registrars may have implemented to reduce WHOIS misuse, resist DoS attacks, or improve operating efficiency. Your responses to these questions will help us better understand the extent to which Registries and Registrars have already taken steps to deter WHOIS harvesting, and how.

5. Do you currently implement any techniques to deter WHOIS data harvesting?

[Yes/No]

[If no to Q5, next section]

[New page]


This section explores WHOIS anti-harvesting techniques that you may have implemented. For each technique, we will ask you to provide a short description of key parameters used to trigger or tune your implementation. This information will help us better understand both common and innovative techniques and assess their apparent impact on WHOIS misuse frequency.

Rate limiting

Here we ask you to describe two techniques used to limit WHOIS resolution request rates: limiting the requests sent to the well-known WHOIS port 43, and providing a web form for submitting WHOIS requests, which makes machine-automated request generation more difficult.

6. Do you implement Port 43 rate limiting?

[Yes/No]

6.1 [If yes to Q6] Please describe key parameters that affect your implementation, such as

thresholds used to activate the lock and the duration of the lock.

[Open ended]

7. Do you support WHOIS Query through a web form?

[Yes/No]

7.1 [If yes to Q7] Does submitting a WHOIS Query to your web form require answering a

CAPTCHA prompt?

[Yes/No]

7.2 [If yes to Q7] Do you apply any other (non-CAPTCHA) rate limiting to web form queries?


[Yes/No]

7.2.1 [If yes to Q7.2] Please describe key parameters that affect your web form rate limiting

implementation, such as thresholds used to activate the lock and the duration of the lock.

[Open ended]

[New page]

Blacklisting

Here we ask you to describe anti-harvesting methods that involve some kind of sender blacklisting, used to prevent suspected WHOIS data harvesters from performing an unreasonably large number of WHOIS resolution requests.

8. Do you implement Permanent IP or Domain Name-based Blacklisting of suspected WHOIS

data harvesters?

[Yes/No]

9. Do you implement Temporary IP or Domain Name-based Blacklisting of suspected WHOIS data harvesters?

[Yes/No]

10. If you implement any form of blacklisting (either listed above, or other), please provide details about the types of blacklisting you have implemented, the criteria used to identify suspected WHOIS data harvesters, and any thresholds and durations used to determine when to activate or remove the blacklist.

[Open ended]

Privacy or Proxy registration services


Here we ask you to identify any services that you may already offer to domain name Registrants to address their concerns about harvesting of data published in WHOIS.

11. Do you offer a service which provides alternate WHOIS contact information and mail

forwarding services while not actually shielding the Registered Name Holder’s identity?

- Yes

- No

- Unknown or prefer not to answer

12. Do you offer a service which registers domain names on a customer's behalf and then

licenses domain use so that your identity and contact information are published in WHOIS?

- Yes

- No

- Unknown or prefer not to answer

13. Other techniques. Please provide a detailed description of any other techniques you have

implemented to deter misuse of WHOIS data obtained by harvesters.

[Open ended]

[New page]

The following set of questions will help us understand to what extent you receive feedback from domain name Registrants regarding harmful acts which they believe were carried out using WHOIS contact information. We are particularly interested in understanding whether these incidents can in fact be correlated to the information available on-line only through WHOIS.


14. Have any of the following harmful acts ever been reported to your organization by domain

name Registrants who suspected they were experiencing WHOIS data misuse? Please refer to

[URL] for our description of the following terms.

[Check all that apply]

- Denial of Service

- Phishing

- Vishing (voice phishing)

- Email spam

- Postal spam

- Email virus

- Unauthorized intrusion on servers

- Abuse of personal data or identity theft

- Blackmail/ransom demands/intimidation

- Registrants have reported experiencing harmful acts, but I prefer not to divulge specifics

- Other (Please describe)

- Prefer not to answer

15. Was your organization able to identify whether WHOIS contact data was in fact misused or

found to play a role in any of the above-reported incidents?

- Yes

- No

15.1. [If yes to Q15] Please supply details about how WHOIS contact data was misused and to

what extent. You may describe particular incidents involving suspected misuse and/or

aggregate statistics about how often misuse was either confirmed or ruled out.

- Open ended field

[New page]


The following set of questions will help us understand how Registries and Registrars take steps to combat WHOIS harvesting if and when such activity is detected.

16. Have you experienced any known WHOIS data harvesting incidents (successful or otherwise)?

[Yes/No]

16.1 [If yes to Q16] Can you provide any statistics that might quantify how often your

organization experiences WHOIS harvesting attempts (e.g., frequency of WHOIS rate limit

triggering or blacklisting)?

[Open ended field]

16.2 Can you describe how successful WHOIS data harvesting attempts (if any) have been detected and investigated?

[Open ended field]

17. Were your WHOIS anti-harvesting techniques implemented or adapted in response to past harvesting incidents?

[Yes/No]

17.1 [If yes to Q17] Which of the following anti-harvesting techniques (if any) did you adopt within the last 2 years after you realized that you were being targeted by WHOIS data harvesters?

[Check all that apply]

- Port 43 rate limiting

- Query only through a web form

- CAPTCHA

- Permanent IP or Domain Name Blacklisting

- Temporary IP or Domain Name Blacklisting

- Privacy or Proxy registration services

- Other anti-harvesting technique (please explain)


17.2 [If yes to Q17] For each technique deployed or adapted as a countermeasure, please give a short description of your rationale and the extent to which it has proven effective as a deterrent.

[Open ended]

[New page]

Finally, we wish to consider other access paths that harvesters can use to obtain WHOIS data. Your answer may help us assess alternative ways that misused WHOIS data may have been obtained and possible impact of affiliates on WHOIS misuse frequency.

18. Please provide a list of key affiliates and partners that purchase domain name-related services from your organization (e.g., bulk purchase of domain names for resale, bulk access to WHOIS data).

[Open ended]

19. Do you grant us permission to contact you further in case we need clarifications about your

answers to this survey?

[YES/NO]

19.1 [If yes to Q19] Please enter your email here.

[Open ended]


13. Bibliography

APWG. (2011). Phishing Attack Trends Report - Q2 2010. Anti-Phishing Working Group.

GNSO. (2007). Retrieved from

http://gnso.icann.org/en/issues/whois-privacy/whois-services-final-tf-report-12mar07.htm

ICANN. (2009). Terms of reference for WHOIS misuse studies. Retrieved from

http://gnso.icann.org/issues/whois/tor-whois-misuse-studies-25sep09-en.pdf

NORC. (2010). Draft Report for the Study of the Accuracy of WHOIS Registrant Contact Information. University of Chicago.

SAC023. (2007). Is the WHOIS Service a Source for email Addresses for Spammers?

SAC028. (2008). SSAC Advisory on Registrar Impersonation – Phishing Attacks.
