Genealogy and Privacy in an Online World - Declan Greally

BSc (Hons) Information Technology

Genealogy and Privacy Issues in an Online World

Declan Greally

23/4/2015

Supervisor: Andrew Rae

Contents

ACKNOWLEDGEMENTS

ABSTRACT

INTRODUCTION

1 LITERATURE REVIEW

1.1 Genealogy

1.2 Genealogy and Information Technology

1.3 Privacy Issues

2 PRIMARY RESEARCH METHODOLOGY

2.1 Terms

2.2 Available Research Methodologies

2.3 Chosen Research Methodology

3 PRIMARY RESEARCH ANALYSIS

3.1 Demographics

3.2 Privacy

3.3 Genealogy

CONCLUSION

REFERENCES

APPENDIX

A: Preliminary Questionnaire

B: Final Questionnaire

C: Supervisor Meeting Agenda Example

D: Supervisor Meeting Minutes Example

E: Data from Preliminary Survey

F: Preliminary Survey Demographics Results

G: Final Survey Demographics Results

H: Genealogy Website Rankings Data

I: Genealogy Website Rankings Graph

Table of Figures

FIGURES

Figure 1.1 - Top 15 genealogy website traffic-rankings over a five-month period (lower is better) - Alexa

Figure 3.1 - Preliminary survey: Age vs. interest and research undertaken

Figure 3.2 - Preliminary survey: Age of researchers vs. when they started researching

Figure 3.3 - Final survey: The 40 countries that responded

Figure 3.4 - Final survey: Age vs. interest and research undertaken

Figure 3.5 - Final survey: Marital status and undertaking of research

Figure 3.6 - Comparison: Percentage of each age group that had undertaken research

Figure 3.7 - Comparison - Children: Percentage of participants who had researched

Figure 3.8 - Final survey: Age and experience vs. perception of genealogy as a social activity

Figure 3.9 - Preliminary survey: Why did you start researching your family tree?

TABLES

Table 1 - Preliminary survey: Marital status vs. interest and research undertaken

Table 2 - Preliminary survey: Number of children vs. interest and research undertaken

Table 3 - Final survey: Number of children vs. interest and research undertaken

Table 4 - Preliminary survey: Age vs. years of experience of researchers

Table 5 - Final survey: Age vs. feeling of experience of researchers

Table 6 - Final survey: Privacy matrix question

Table 7 - Final survey: Privacy matrix vs. age

Table 8 - Final survey: Privacy matrix vs. marital status

Table 9 - Final survey: What are you most worried about [] your genealogical data on the Internet?

Table 10 - Final survey: Social quantitative questions

Table 11 - Preliminary survey: Why did you start researching your family tree? - Data

Acknowledgements

The purpose of this page is to thank those who have helped me complete

this dissertation.

The first person that I would like to thank is my supervisor, Andrew Rae, who has proven to

be of immense help throughout the creation of this body of work. Andrew helped to keep

me on track and motivated, was a fantastic sounding board for ideas for the project, and

actually initially proposed the idea of a project on the progress of genealogy in relation to

information technology. I certainly do not believe that my work would be of the quality that

it is without the massive assistance you provided, Andrew.

I would also like to thank Mark Stansfield for agreeing to moderate this project. Although we

did not have much interaction in relation to this project, the lectures that you provided

throughout the year for the subject proved invaluable, allowing me to know exactly what

was required of the project. Alongside Mark, I also have to thank Carolyn Begg for stepping

in to moderate my presentation; I enjoyed answering the questions that you asked

during said presentation.

Finally, I would like to thank my family and friends who supported me through the

production of this dissertation. The help of my parents, who have provided everything that I

could have needed and supported me through my education throughout my life, cannot

be overstated; without them, I would not have been able to produce a piece of work of this

standard of quality.

Abstract

The purpose of this study is to analyse the human element of genealogy as information

technology has become an integral part of the field. The demographics of researchers are

identified and their fears about privacy are quantified using a two-survey, quantitative

approach, with correlations drawn between demographic data such as age,

marital status and number of children and the likelihood of having carried out research.

Conclusions are drawn on these through a comparison of the two surveys. The history and

future of genealogy are investigated through a combination of literature review and

qualitative survey approaches, predictions are made, and the growth of

genealogy on the Internet as a social venture is analysed.

Introduction

Genealogy is an age-old discipline that has seen a rapid evolution as the Internet and

computers in general have become more widespread, available and advanced. For many

years, the art of genealogy was carried out in a manual and laborious way, with the only way

to gather information being to physically travel to or otherwise correspond with archives and

other such genealogical repositories. It was not until early 1994, following the public

release of the World-Wide Web (hereafter referred to as the web) in 1991, that the

Internet became a viable resource in genealogical research, with Genserv becoming the first

publicly accessible genealogy website (Christian, 2014). That is not to say that there was

nothing available before Genserv: prior to the release of the web,

newsgroups were available for use, the first of these being net.roots (Christian, 2014).

With that being said, this paper is not written to discuss the history of genealogy on the

Internet, nor indeed of genealogy itself; but to analyse the effects that the introduction of

computers, the Internet and of information technology in general has had on genealogy as a

whole, with a special regard paid to the social consequences of this integration.

This paper aims to identify the demographics of those who undertake genealogical research

and analyse said demographics to reveal patterns and to investigate why people choose to

undertake genealogical research.

Privacy concerns continue to mount as the Internet becomes a larger part of genealogy,

especially since the appearance of DNA testing related to genealogy. A portion of this

paper is therefore dedicated to investigating how safe this data is, what the data is used for, how

accurate it is and what the future holds for genealogy as a whole.

This paper is intended to address an apparent lack of academic interest regarding genealogy,

especially as pertains to the growth and various positive effects that information technology

has brought to genealogical research.

1 Literature Review

Surprisingly, there has been little work of note carried out concerning the fusion of

genealogy and information technology and this has been noted by various authors (Bishop,

2008; Veale, 2004). As a result, the majority of this literature review focuses on different

aspects of genealogy, ranging from books discussing the history and future of genealogy as a

subject to the privacy policies published by the major companies in modern genealogy.

These works were chosen because, through the collective information they impart, they

help the paper fulfil all objectives of this study.

The following literature review is split into different categories, culminating in a concluding

section reflecting on the literature review as a whole.

The first section addresses the assorted works that discuss the history of genealogy, ranging

back to the very beginnings of genealogy up until modern day. Moving past the history of

genealogy, this section then discusses the motivations behind the mass undertaking of

genealogical research over the years.

The second section focuses on the merging of genealogy and information technology, first

explaining how this union occurred before moving on to examining the growth of genealogy

because of the Internet and discussing the future trends of genealogy on the Internet.

Finally, the literature review examines the core topic of this thesis, the underlying privacy

issues that have become a part of genealogy as it has evolved onto the Internet. It examines

who owns the genealogical data, how said data can be used in malicious ways and, finally,

the integrity of said data.

1.1 Genealogy

This section investigates genealogy in a historical context, through analysis of the historical

literature available on the subject, defining genealogy as a concept and revealing the origins

of genealogy. The progress of pre-modern genealogy will also be analysed, with the reasons

behind the popularity booms in genealogy being explained in the context of the era.

Definition of Genealogy

"(the study of) the history of the past and present members of a family"

The above is the definition given to genealogy by Oxford Dictionaries (2015) and it is

exactly that. Genealogists are the people who undertake research into their ancestors and

family tree to discover how they are all interlinked. Saar (2002) believed that genealogy is

a form of writing history, whereas Sharpe (2011) posits that purists draw a clear distinction

between the terms "genealogy" and "family history":

Genealogists set out to develop a pedigree - a family tree - based on identification

of births, marriages and deaths. Family history is wider in scope, aiming to fill out the

branches of this tree through investigation of other aspects of our ancestors' lives. It

involves genealogical, biographical and historical research

The word genealogy itself has Ancient Greek roots, stemming from the Greek words genea

(γενεά), meaning race, family or generation, and logia (-λογία), a suffix that denotes the

study of a subject (Oxford Dictionaries, 2015; Teknia, 2015).

History of Genealogy

Early Written Genealogy

Written genealogy has an incredibly long history but there are differing opinions on where it

originated, with no clear agreed-upon answer. Garrett (2010) claims that:

The tracing of genealogical lineages in Western Europe dates back, at least, to St.

Matthew's gospel which was first written in Greek.

Potter-Phillips (1999) holds a slightly different view; whilst examining the mentions of

genealogy within the Bible, the author points to the fact that these references may have

been born out of the Roman culture of the time.

Genealogy was practiced by the ancient Romans to distinguish between the

patrician class (those with proven noble ancestry) and plebians[sic] (commoners).

The author goes further, pointing out that the ancient Egyptians and Chinese both had

dynasties, which could both be construed as genealogies, with both pre-dating the New

Testament of the Bible.

As history progressed through the ages, the discipline of genealogy was primarily used to

settle disputes over titles, land and wealth, as much political importance was attached to

one's bloodline, with said power and wealth passed down through the generations. Whilst

not beyond question, these pedigrees were generally accurate, as Pine (2014) explains:

The truth was sometimes bent to suit some political end, but, on the

whole, medieval European records are genealogically valid. This is because they were

not primarily intended to supply genealogical information but to record land

transactions, taxation, and lawsuits.

Early Modern Genealogy

Unlike the origins of written genealogy, theories regarding the beginnings of modern

genealogy are relatively dispute free. It is generally agreed that modern genealogy

originated in the sixteenth century, arising in particular from a law passed by King Henry VIII

in 1538 which required that ministers keep records of christenings, baptisms, marriages

and burials (Potter-Phillips, 1999).

The following centuries were filled with upheaval within the nobility and society as a whole,

with literacy rates rising alongside an increased societal interest in history, leading to a major

growth in genealogical interest (Pine, 2014; Potter-Phillips, 1999). Various European

provinces soon began to keep records similar to British ones, with Potter-Phillips (1999)

suggesting that this was due to the influence of the Catholic Church. Interestingly, the author

also states, "In most countries, church parish registers pre-date any civil record keeping."

Little of note occurred for centuries, as genealogy remained a steadily studied discipline with stable

interest levels. Genealogy received a boost following the American Revolution as citizens of

the USA sought to establish links to the heroes of the Revolution and to those who originally

colonised the New World. This led to the creation of the first genealogical society in the

world, the New England Historic Genealogical Society, in Boston, Massachusetts in 1845

(New England Historic Genealogical Society, 2015; Potter-Phillips, 1999). It was not until

1911 that a similar society was set up in the UK: the Society of Genealogists, based in

London, England (Kennett, 2011; Society of Genealogists, 2015).

Pre-Internet Modern Genealogy

This author considers the start of modern genealogy, as we know it today, to coincide with

the turn of the twentieth century, because many major genealogical

societies were established during this time, such as the aforementioned Society of

Genealogists. It was also in this period, specifically in 1894, that the Church of Jesus Christ of Latter-day

Saints (hereafter referred to as the LDS Church), also known as the Mormon Church,

began collecting genealogical records and making them available.

The LDS Church is based in Salt Lake City, Utah and believes that members can save

deceased ancestors and baptise them, which led to its collection of genealogical records. To

help achieve this aim, it set up the Family History Library, which is the largest genealogical

library in the world (Garrett, 2010; Utah, 2015).

In the early twentieth century, standardisation became a concern with the rise in

genealogy's popularity, as genealogists sought to make their field academically relevant.

Garrett (2010) stated that:

The need for standards resulted from a growing disdain of historians, librarians, and

archivists who viewed genealogy as nonscholarly[sic], error-filled works of family

pride.

Scholars nevertheless viewed these attempts with disdain; O'Hare (2002) gives an example,

citing an article from a 1942 edition of the William & Mary Quarterly:

"[a]s a pleasant and harmless form of antiquarianism, the study of family history,

biography, and the tracing of genealogy are tolerantly humored but certainly not

seriously honored by historians and scientists."

Genealogy was seen as a quaint hobby but certainly not an academic subject, rightly or

wrongly. It continued to increase steadily in popularity following the end of World War II

(Garrett, 2010), before exploding in popularity following the release of Alex Haley's book

Roots: The Saga of an American Family in 1976 and its subsequent award-winning

mini-series in 1977. Pettinato (2014) revealed that:

Requests to the National Archives for genealogical material quadrupled the week

after the TV show ended.

The author also points out that the number of genealogical societies being inaugurated in

the USA increased drastically following the release of Roots. The release of Roots spurred a

worldwide interest in the subject and pushed the discipline to heights that it had not known

before.

In this chapter, the author defined genealogy before identifying the key factors that drove

the evolution of genealogy and studied the history of the genealogical subject as a whole,

starting at the beginning of written genealogies until the appearance of information

technology. The author identified genealogy as a steadily growing subject that struggles with

being academically accepted but has captured the imagination of the public and at times, of

the nobility and politicians.

1.2 Genealogy and Information Technology

This section of the paper is intended to review the available literature and evaluate

genealogy as a field since the introduction of information technology, and reveal how the

field has grown since said introduction, with special attention paid to the growth of

genealogy since the introduction of the Internet.

Beginnings

Genealogy appeared on computers in 1979 with the release of the first piece of genealogical software,

Genealogy: Compiling Roots and Branches, which was written by John J. Armstrong and cost

$250 ($813.19 adjusted for inflation, or £537.14) (Eastman, 2002; US Inflation Calculator,

2015; XE Currency Converter, 2015).

As genealogical software became more popular and widespread following the

pioneering work of John J. Armstrong in the creation of Genealogy: Compiling Roots and

Branches, there arose a need for a specification that allowed the transfer of

genealogical data between the various software packages on the market.

The Family History Department of the Church of Jesus Christ of Latter-day Saints recognised

this need and created the GEDCOM (GEnealogical Data COMmunication) standard in 1984

(Nurse, 1994). The initial problem to overcome was that only the program written by the

Church of Jesus Christ of Latter-day Saints could read this new standard, although over the

years, most programs came to read the .GED format utilised by the GEDCOM standard

(Eastman, 2014). Eastman (2014) further explains that GEDCOM requires the file to be a

plain text file with a set, structured format: a number precedes each piece of information to indicate

its position within the hierarchy, alongside a tag that identifies the

significance of the information.
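
To illustrate the structure that Eastman describes, the following minimal Python sketch splits GEDCOM-style lines into a level number, an optional cross-reference identifier, a tag and a value. The sample records are invented for illustration, the names GedcomLine and parse_gedcom_line are hypothetical, and the sketch deliberately ignores parts of the real specification such as character encodings and continuation lines.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # depth within the hierarchy (0 = top-level record)
    xref: Optional[str]   # optional identifier such as "@I1@"
    tag: str              # what the information means, e.g. NAME or DATE
    value: str            # the information itself (may be empty)


def parse_gedcom_line(line: str) -> GedcomLine:
    """Split one GEDCOM-style line into its parts (simplified sketch)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record lines carry an identifier between the level and the tag.
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        return GedcomLine(level, parts[1], rest[0], rest[1] if len(rest) > 1 else "")
    value = parts[2] if len(parts) > 2 else ""
    return GedcomLine(level, None, parts[1], value)


# Invented sample data: one individual with a name and a birth date.
SAMPLE = """0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1850"""

for raw in SAMPLE.splitlines():
    parsed = parse_gedcom_line(raw)
    # Indent by level to make the hierarchy visible.
    print("  " * parsed.level + parsed.tag, parsed.value)
```

The level number alone is enough to rebuild the hierarchy: the DATE line at level 2 belongs to the BIRT line at level 1, which in turn belongs to the individual record at level 0.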

In 1983, genealogy made its first appearance on the Internet with the creation of the

newsgroup net.roots (Isaacson, 1998) though it was replaced in September 1986 by soc.roots

(Christian, 2014). This was followed up by the creation of the first genealogical mailing list,

ROOTS-L, in December 1987 (ROOTS-L, 2010).

These were the early days of the Internet and it was not the most useful of resources to use

at the time, as Garrett (2010) explains:

genealogists could only access the services of their online service providers and

communicate with fellow genealogists who subscribed to the same provider.

However, these were only temporary issues, alleviated by the public introduction of the

World Wide Web on 6th August 1991 (CERN, 2015). Garrett (2010) goes further,

identifying other issues that were present at the time:

These early forays into Internet genealogical research were further limited by the

text-based technology of the time, which made it impossible to view digital facsimiles

of records.

The first family tree was put online shortly after by Alan Stanier in June 1993, followed by the

first directory of genealogical resources on the Internet, The Genealogy Home Page, in July

1994 (Christian, 2014).

Around the same time, Mosaic was released in March 1993 and is credited with

helping to popularise the World Wide Web. Mosaic was the first browser that was able to

display images on the same page as text, instead of having to click on a link to view the

image in a separate window (Boutell, 2006).

In one of the most significant events in the history of genealogy on the Internet,

Ancestry.com was launched in 1996 and rapidly became the highest-trafficked genealogy

site on the Internet, where it remains to this day (Alexa, 2015; Ancestry, 2015).

In a landmark move, Scots Origins was launched on 6th April 1998, becoming the first pay-

per-view site for UK public records and in August 2001, they became the first to add the

1881 and 1891 census indexes and images to their records collection (Christian, 2014).

By 22nd April 2006, all available UK censuses had been made available online with the addition of

the previously unavailable 1841 census. The collection was completed for England

and Wales in January 2009 when the 1911 census was added, with Scotland following suit in

April 2011 (Christian, 2014).

Growth of Genealogy on the Internet

In late 1999, McClure (1999) stated that "A search of the word genealogy on the Internet

results in over 5 million possible pages." On 5th January 2015, a Google search showed that

this figure stood at 92.1 million possible pages, with this number growing to 94.1

million results as of 22nd April 2015. This is an unsurprising amount of growth, given how

rapidly the Internet in and of itself has expanded over the years. To give a more recent example,

Kennett (2011) identified that in August 2010, the Office for National Statistics revealed that 10

million adults had never been online, making up 21% of the adult population within the

United Kingdom. In May 2014, the Office for National Statistics (2014) announced that the

number of adults that had never been online had dropped to 6.4 million, or 13% of the

British adult population.

Blogging about genealogy has also grown through the years: Hill (2011) identified

that at the end of 2010, GeneaBloggers, a website dedicated to blogging about genealogy,

had 1,535 bloggers regularly posting on its website, and by January 2015 that number had

risen to 3,056 (GeneaBloggers, 2015).

An article by GenealogyInTime Magazine (2012) included traffic rankings of the top 25

genealogical websites according to Alexa's traffic ranking statistics, with the average traffic

ranking of said websites being 23,606. The author of this study (Genealogy and Privacy Issues

in an Online World) carried out similar research on 19th November 2014 and found the

average traffic ranking at that date to be 19,168, an improvement of 23.15% (a lower ranking

number indicates more traffic). When directly compared to their previous rankings, websites

showed an average improvement of 23.3%, discounting

new entries to the rankings.
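
The 23.15% figure can be reproduced as a quick check. The Python sketch below assumes that the change is expressed relative to the 2014 average, which is the only base that yields 23.15% from the two averages quoted; the text does not state the base explicitly, and the function name rank_improvement is illustrative only.

```python
def rank_improvement(old_rank: float, new_rank: float, base: str = "new") -> float:
    """Percentage by which an average traffic ranking improved (lower is better).

    base="new" divides by the later ranking (assumed to be the thesis's method);
    base="old" divides by the earlier ranking, the more conventional choice.
    """
    change = old_rank - new_rank
    denominator = new_rank if base == "new" else old_rank
    return 100 * change / denominator


# Average Alexa rankings quoted in the text: 23,606 (2012) and 19,168 (2014).
print(round(rank_improvement(23606, 19168), 2))              # 23.15
print(round(rank_improvement(23606, 19168, base="old"), 2))  # 18.8
```

Measured against the 2012 average instead, the same change works out at roughly 18.8%, so the choice of base matters when comparing this figure with other sources.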

Genealogy on the Internet is also growing in a more subtle way, through the undeniably

ageing population of the world.

Archives.com data and broader industry analyses indicate that users of genealogy

websites tend to be female aged 45 or older. This age group constitutes 62% of

Archives.com members.

(Hill, 2011)

That alone does not show growth; however, statistics released by the UK government in

February 2012 indicate that the population of the UK is ageing (Rutherford, 2012). Analysis

of the statistics given in said paper predicts that the population within the age group

of 40 and over will overtake those younger than 40 by 2025, with the average age of the

population rising from 40 to 43 by 2035 (Rutherford, 2012). Further analysis shows that the

population of the age group of 45 and over, directly relevant to the data revealed by Hill

(2011), will grow by an average of 1.83% over the average population growth until at least

2035, where the available predicted statistics end (Rutherford, 2012). Josiam and Frazier

(2008) also stated that:

The more a person has used the Internet and the older they are, the more likely they

are to use the Internet for genealogy research.

This leads to the overall conclusion that as the population ages and becomes more

accustomed to the Internet, interest in genealogy on the Internet will rise.

The first recorded use of the term "genetic genealogy" appeared in the Dallas Morning News,

a newspaper in Dallas, Texas, in March 1989:

Of course, scientists have long known that we all carry a record of our roots in our

genes. It's just that the record in the rocks has been easier to read. Lately, though,

practitioners of genetic genealogy have found methods to search for the woman

from whom we all are descended.

(Siegfried, 1989)

As genealogy and technology increasingly merged, genetic

DNA tests became commercially available in 2000 following the launch of the companies Family Tree DNA

and Oxford Ancestors (Family Tree DNA, 2009; Oxford Ancestors, 2015). Surprisingly, the cost

of testing was not prohibitive when the technology first became

available, with Family Tree DNA offering mtDNA - mitochondrial DNA, DNA passed from

mothers to children (Phillips DNA Project, 2015) - tests for $219, with the equivalent test in

2015 costing $199 (Family Tree DNA, 2000; Family Tree DNA, 2015b).

Figure 1.1 shows the rapid growth of genealogy websites related to

DNA, in particular Family Tree DNA and 23andMe, which improved their Alexa traffic

rankings by 6,815 and 5,281 places respectively over the

five-month period between 19th November 2014 and 16th April 2015. It also shows that the

top websites within the genealogical world remain rather static, or show slow but steady

growth. Of particular interest is that there appears to be a slight drop in interest over

the Christmas season, although it recovers quickly going into March. A ranking of the top 25

websites with a comparison to 2012 can be found in Appendix H: Genealogy Website

Rankings - Data and Appendix I: Genealogy Website Rankings - Graph.

Figure 1.1 - Top 15 genealogy website traffic-rankings over a five-month period (lower is better) - Alexa

The size of Ancestry.com as a whole cannot be overstated, with seven of the top 15

websites shown in Figure 1.1 being owned by the Ancestry.com group, these being

Ancestry.ca, Ancestry.co.uk, Ancestry.com, Ancestry.com.au, Archives.com, Genealogy.com

and Find A Grave (Tester, 2014).

In summary, the field of genealogy has seen enormous growth since moving to a digital

format, and especially since the arrival of the Internet. The gradually increasing penetration of the Internet

has made genealogy more accessible to people on lower incomes and to older people, reducing

the need for expensive travel, or for any cost at all, as it is entirely possible to carry out genealogical

research for free.

1.3 Privacy Issues

This unit of the literature review is dedicated to reviewing the literature available on privacy

issues related to genealogy as a field, mainly evaluating the accuracy of data and the

databases that genealogical data is held on in the age of the Internet.

Integrity of Genealogical Data

Issues with the veracity of genealogical data are not new to the Internet age of

genealogy, but they have been carefully watched as the Internet has become a larger and larger

part of genealogy as a whole. As Howells (1998) posited when genealogy on the Internet was

under its highest levels of scrutiny:

In the future, we will continue to find published information which is based on

hearsay and poor research methods entirely lacking in any source citations.

The key word in this statement is "continue". There were also accuracy issues with

earlier genealogical records; for example, Durie (2009) notes that missing records are an

issue in early Scottish censuses, writing of the 1841 Scottish census:

Some parishes are known to be missing from the records. A lot of these are in Fife

because the records were lost overboard during their transit by boat to Edinburgh.

Even though people might have moved after census night, and therefore could be

counted twice, it was impossible to repeat the exercise for these fourteen Fife

parishes, which represented about 30% of Fife's census data, much to the fury of

genealogists ever since.

This loss was repeated in the 1851 Scottish census, with seven registration districts going

missing. Interestingly, in the aforementioned 1841 census, the ages of anyone over the age

of fifteen were rounded down to the nearest five, creating inaccuracies within dates of

birth (Durie, 2009).
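
The effect of this rounding on an estimated year of birth can be sketched in Python. The function name birth_year_range is hypothetical, and the sketch assumes that any recorded age of fifteen or more that is a multiple of five may conceal a true age up to four years higher; it also allows one extra year because a person may not yet have had their birthday when the census was taken.

```python
from typing import Tuple

CENSUS_YEAR = 1841


def birth_year_range(recorded_age: int, census_year: int = CENSUS_YEAR) -> Tuple[int, int]:
    """Return the (earliest, latest) possible birth year for a recorded census age."""
    if recorded_age >= 15 and recorded_age % 5 == 0:
        # Assumption: the age was rounded down, so the true age may be up to 4 years more.
        oldest_true_age = recorded_age + 4
    else:
        oldest_true_age = recorded_age
    # Subtract one more year in case the birthday fell after census night.
    earliest = census_year - oldest_true_age - 1
    latest = census_year - recorded_age
    return earliest, latest


print(birth_year_range(20))  # (1816, 1821)
print(birth_year_range(12))  # (1828, 1829)
```

Under these assumptions, a recorded age of 20 spans six possible birth years, whereas an unrounded age of 12 spans only two.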

Some authors have doubts about the authenticity of genealogical records on the Internet,

amongst them Kovacs (2001) who wrote:

One issue which should concern genealogists who find records on the Internet is the

authenticity of the documents. It is often difficult to ascertain whether or not primary

records have been altered (either inadvertently or intentionally) in the digitization

process.

There are also genealogical researchers who flatly reject online records as invalid, as

explained in a paper by Garrett (2010), referencing a book by E. P. Crowe entitled

Genealogy Online:

In fact, according to Crowe, many professional genealogists refuse to consider online

records as authoritative: [t]heir attitude is this: A source is not a primary source

unless you have held the original document in your hand. And a primary source is not

proof unless it is supported by at least one other original document you have held in

your hand.

There is also the constant issue that when data is on the Internet, it is there forever, despite

any questions over the accuracy of said data. Bishop (2008) carried out a study analysing

exactly why genealogists carry out their research, prompting them to keep a diary and

record their experiences. The study produced some interesting anecdotes about errors in

genealogical research, with one researcher noting that she had found errors in a book but

decided against notifying the author, thereby perpetuating a cycle of

erroneous information. Bishop (2008) writes:

One researcher shared her frustration at her inability to convince other researchers

to correct a piece of information about her great-great-grandmother. They have

[her] married 4 times and will not change their documentation. With each post to an

online message board devoted to the family, this myth perpetuates itself! she

wrote.

Finally in this study, another researcher noted problems within census taking in the early

twentieth century:

I was so disgusted with my grandmother and what she told the census taker on the

1930 census, said one researcher. Her grandmother said she was born in California

and that her parents were from France; both pieces of information turned out to be

incorrect.

Veale (2004) identified several issues with genealogy on the Internet in her paper, noting

that groups such as the Internet Genealogists for Quality have been set up in response to the

volume of false genealogical data available on the Internet. A possible reason for this

proliferation is given by Veale (2004):

Thus the many genealogies published on the Internet have given rise to the quickie

genealogist - those who go online to pursue their ancestry, and by using the work of

others, copy the information verbatim, disregarding basic genealogical methodology,

to regurgitate the material, mistakes and all, as their own

This all combines to give a negative perception according to Veale (2004):

Some of the negative perceptions include: concerns over information veracity and

quality; fears about intrusions into privacy and even the chance for identities to be

stolen; and the commercialisation of both amateur labour and previously free

information.

Even within genealogical software itself, there can be issues with data loss when exchanging

data between programs using dissimilar standards as Eastman (2014) explains:

Translating from one program's database to GEDCOM is sort of the same as

translating from one spoken language to another. The basics work, but subtleties and

details sometimes do not translate well. Then, when translating to the third language

(the receiving genealogy program's database), more translation losses creep in.

In terms of the security and integrity of genealogical databases and websites, there has been

little to worry about. The biggest of these, Ancestry.com, has only one incident that the

author can find online of its services being compromised: its servers

were subjected to a DDoS (Distributed Denial of Service) attack in June 2014, although no

data was compromised (Dobner, 2014).

There was, however, one incident in which the data of living people was accidentally

released by a genealogy website: IrishGenealogy.ie, which is run by the Irish government.

In July 2014, the website accidentally released the civil registry records of every citizen

who was born or married in the State (Edwards, 2014).

Aside from these two incidents, genealogy websites appear to be mostly free from security

issues. Whether this is due to good security practices by the services or simply a lack of

interest from malicious actors is unclear, and the author cannot prove it either way at

this point.

Who Owns Genealogical Data?

Hoffman (2011) states that genealogy falls under the purview of intellectual property laws.

This means that the genealogical data that you create is copyrighted to you; for example, if

you create a family tree online, that family tree is copyrighted to you. This is only the case,

however, if you add some creativity to the tree, such as some form of narration. In essence,

you own the data that you create in relation to genealogy, so long as you have added a

modicum of creativity to it (Hoffman, 2011).

One of the most contentious issues arising from recent genealogical advances is the

ownership of DNA information when a genealogist submits their DNA to a testing lab to

identify their heritage. Despite this, the author's reading of the privacy policies of DNA

testing companies such as Family Tree DNA, 23andMe and AncestryDNA revealed no real

issues in this regard within the largest companies. Interestingly, however, Family Tree DNA

does not seem to have put very much thought into its privacy policy (Family Tree DNA,

2015a). The author posits this because the policy states:

Family Tree DNA also adheres to the Genetic Genealogy Guidelines as proposed by

the The[sic] Genetic Genealogy Standards Committee in January 2015.

This is interesting as The Genetic Genealogy Standards Committee (2015) state at the start

of the second paragraph of their Genetic Genealogy Standards that:

These Standards are intentionally directed to genealogists, not to genetic genealogy

testing companies.

This chapter, more than anything, identified the sheer lack of literature regarding privacy in

genealogy. However, it also revealed the error-strewn past of genealogy, showing that

inaccuracies in genealogy are not solely a product of the Internet age.

In the first chapter, the author examined the history of genealogy before studying the effects

that information technology has had on genealogy as a whole in the second chapter. Finally,

the third chapter focused on the integrity and partially on the ownership of genealogical

data and the websites that hold this data.

There is a surprising lack of information about privacy issues in terms of genealogy and the

author has found that even the most basic of questions have yet to be answered within

genealogy as an academic field.

The material discussed within the past three chapters and the material that has been

reviewed as a whole throughout the entire process has led to the final question of:

Are genealogical researchers worried about privacy issues within genealogy in the age of

the Internet and, if so, what worries them the most?

2 Primary Research Methodology

This chapter is intended to outline the possible methodologies that can be used to carry out

the primary research required for a thorough undertaking of the project, discussing the

features and limitations of the various methodologies available for use.

The chapter also identifies the most suitable methods for a study such as this one and

discusses the reasons behind the choice of methodology. The

underlying philosophical terms relevant to modern methodologies are also explained in

detail, to give a fuller understanding of the subject matter.

2.1 Terms

When discussing research methodologies, it is important to understand some of the terms

behind the subject matter and this section aims to explain the various terms required for a

full understanding of the methodologies. There are two main philosophical approaches to

methodologies and these are positivism and interpretivism.

Positivism

Positivism is generally regarded as being a scientific approach to research, using an objective

and measurable approach to every study.

Punch (2013) defines positivism as:

the belief that objective accounts of the world can be given, and that the function

of science is to develop descriptions and explanations in the form of universal laws

Bryman (2012) identifies five key principles of positivism. The first three are:

Phenomenalism: the principle that only knowledge that can be perceived through the senses

can be identified as usable knowledge (Mastin, 2008).

Deductivism: the ability to create hypotheses that are testable and allow explanations to be

easily created (Bryman, 2012).

Inductivism: the principle in which "Knowledge is arrived at through the gathering of facts

that provide the basis for laws" (Bryman, 2012).

Alongside these three principles, the fourth and fifth principles of positivism require that a

study be free from bias (objective) and that a clear distinction is made between scientific

statements and subjective statements.

Interpretivism

Interpretivism is generally seen by researchers as the antithesis of positivism. Whereas

positivism focuses on objective and scientific approaches, Murphy (2014) states that

interpretivism encourages researchers to explore the data and to come to an understanding

of why the people involved in the study made the choices that they did. Punch (2013) gives

the definition of interpretivism as being:

the philosophical position that people bring meanings to situations, and use these

meanings to understand their world and influence their behaviour

This shows that interpretivism has to be approached with a measure of subjectivity, as it is

important to discover the reasoning behind answers. Interpretivism also requires the author

of a study not to structure the data in a format that follows the researcher's initial

assumptions (Murphy, 2014).

2.2 Available Research Methodologies

There are three main approaches to be considered when identifying an appropriate

methodology for the undertaking of primary research; these are given as qualitative,

quantitative and mixed methods (Creswell, 2003). This section is dedicated to exploring

these methods and aims to evaluate each method's strengths and weaknesses.

Quantitative

Quantitative research methods attempt to maximize objectivity, replicability, and

generalizibility[sic] of findings, and are typically interested in prediction. Integral to

this approach is the expectation that a researcher will set aside his or her experiences,

perceptions, and biases to ensure objectivity in the conduct of the study and the

conclusions that are drawn. Key features of many quantitative studies are the use of

instruments such as tests or surveys to collect data, and reliance on probability theory

to test statistical hypotheses that correspond to research questions of interest.

(Harwell, 2011)

Quantitative research is generally carried out in the form of surveys or through alternative

data gathering activities that have a focus on closed-ended questions (Creswell, 2003).

Creswell (2003) also identifies that quantitative research is post-positivist; that is, it is an

empirically scientific approach to a study, focusing on the objective facts throughout a study,

although not to the rigid standards set by positivism, allowing for some manner of

subjectivity to be applied.

To give an example of a quantitative approach related to the subject matter of this project, a

survey could be created and then distributed to a large number of people using the Internet

to gather demographic information, with questions ranging from asking respondents' age

to asking if they have ever undertaken genealogical research.

Advantages

There are multiple advantages to choosing to approach a study with a quantitative

methodology, key amongst them being that a quantitative approach allows a large amount

of data to be collected relatively quickly. This can be in the form of an online survey

distributed to many respondents, which, aside from the initial time spent designing the

survey, essentially runs itself and allows responses to be collected passively without much

effort on the part of the researcher.

The data that is gathered through a quantitative approach can generally be easily quantified

and analysed, as it comes from closed-ended questions, which allow responses to be tallied

into easily read sets of data. Large-scale analysis can be carried out using software such as

Microsoft Excel, and this analysis is relatively straightforward, with much of the work

carried out by the program automatically, although it can be time-consuming (Punch, 2013).

Quantitative research such as surveys also appeals to the natural human preference for

numbers as opposed to having to fill out text boxes for questions (Creswell, 2015).

Due to the closed-ended nature of questions asked, quantitative results are generally

regarded as objective data as the raw data is free from misinterpretation, as opposed to a

respondent responding in the form of text, where tone could be important. This leads to

quantitative data being largely accepted as unbiased and mostly free from subjectivity.

Quantitative studies have better replicability than qualitative studies, allowing multiple

researchers to carry out the survey, to further confirm or reject the original conclusions of

the study (Altermatt, 2008).

Disadvantages

A quantitative approach is not without its disadvantages, however, with one of the major

criticisms levelled at the methodology being its inherent inflexibility. In many cases, a

researcher cannot identify every single category of response to a question, leading to

inaccurate responses being gathered, due to the respondent not agreeing with any of the

closed-ended options (Altermatt, 2008).

Creswell (2015) also states that quantitative research "Is impersonal, dry" and that it "Does

not record the words of participants", potentially missing key information that would

otherwise have been gathered through a more personal, qualitative approach.

Qualitative

In contrast to quantitative research, Harwell (2011) suggests that the qualitative methodology

is characterised by:

discovering and understanding the experiences, perspectives, and thoughts of

participants; that is, qualitative research explores meaning, purpose, or reality

Altermatt (2008) concisely defines qualitative research:

qualitative research involves observations that are transformed into records based

on the observer's intuitive sense of what is important.

In essence, qualitative research is the antipode of quantitative research, focusing more on

the narrative of the information gathered as opposed to the facts produced (Creswell, 2015).

A qualitative approach is usually used to generate theories, as it gathers open-ended

answers from respondents, allowing researchers to generate hypotheses based on the

answers given (Punch, 2013).

In the context of this project, an example of a qualitative approach would be performing

in-depth interviews, preferably face-to-face, with genealogists from genealogical

societies, where possible. This would allow for an in-depth discussion on the

factors behind undertaking research and the future of genealogy as a whole.

Advantages

The most important advantage that qualitative research has over quantitative research is

that qualitative research can provide extremely detailed information on the subject in

question through a written description as opposed to purely numerical data. Data is not as

narrowly focused as quantitative data is, allowing for a more thorough analysis to be

performed (Altermatt, 2008).

Altermatt (2008) suggests that a qualitative approach allows a researcher with only

moderate knowledge of the subject to ask open questions which, when answered, give the

researcher a greater knowledge of the subject and allow them to approach the project with

a narrower focus.

Creswell (2015) states that a qualitative study "Is based on the views of participants, not of

the researcher", suggesting that a qualitative approach helps to mitigate the problems

associated with the pre-defined assumptions of the researcher designing the study.

Disadvantages

The biggest disadvantage of undertaking a qualitative study is that it becomes very difficult

to study a large group of people, as each individual participant requires a much larger

amount of time than in a quantitative study.

The nature of a qualitative study is such that it only provides soft data, which is highly

subjective and relies heavily on the participants of the study, reducing the ability of the

researcher to apply their expertise (Creswell, 2015).

Largely, the disadvantages of the qualitative methodology mirror the advantages of the

quantitative methodology. Altermatt (2008) indicates that qualitative studies are susceptible

to confirmation bias as:

observers' intuitions may lead them to seek out, notice, interpret, and remember

events that are consistent with their expectations

It is also much more difficult to analyse qualitative data as opposed to quantitative data as it

is generally in the form of words, instead of numerical data. It is still possible to quantify this

data, although it is extremely time-consuming, compared to quantitative data.

Qualitative studies are also difficult to replicate due to the in-depth, subjective nature that is

part of the methodology's core principles (Harwell, 2011).

Mixed Methods

As the name indicates, a mixed methods methodology combines aspects of both

quantitative and qualitative research methods. Creswell (2015) states that he believes mixed

methods research to be:

An approach to research in the social, behavioral, and health sciences in which the

investigator gathers both quantitative (closed-ended) and qualitative (open-ended)

data, integrates the two, and then draws interpretations based on the combined

strengths of both sets of data to understand research problems.

The mixed methods approach is generally considered a fairly new methodology, as indicated

by Harwell (2011), with its modern standards appearing during the early 1990s. No set,

widely accepted definition of the mixed methods approach exists, due to the contentious

nature of the methodology. The author agrees with the above definition, however: for mixed

methods to be a methodology of its own, the approach cannot simply attach the methods

together without any relevant linkage between them. Harwell (2011) posits that:

other authors say a mixed methods study must have a mixed methods question,

both qualitative and quantitative analyses, and integrated inferences

This definition meshes effectively with the definition given by Creswell, and these are the

definitions that the author agrees with. The opposite view is that a mixed methods

approach is "any study with both qualitative and quantitative data" (Harwell, 2011),

which, in the author's opinion, is simply a study using both qualitative and quantitative

methodologies as opposed to a new methodology in its own right.

Advantages

A mixed methods approach allows for the most in-depth and accurate data gathering of the

three given methods due to the combination of both qualitative and quantitative

approaches to form a complete picture (Creswell, 2015).

Disadvantages

The major disadvantage of a mixed methods study is that compared to both qualitative and

quantitative studies, it is incredibly time-consuming. This is in part due to the large amount

of planning that is required to effectively carry out a mixed methods study, and also due to

the complex data analysis required to successfully integrate the qualitative and quantitative

data (Punch, 2013).

2.3 Chosen Research Methodology

The chosen methodology for the purposes of this study was that of a quantitative approach,

with some elements of the qualitative methodology also. As the qualitative questions do not

directly link with the quantitative questions proposed, it is not a mixed methods survey.

A quantitative approach was chosen due to the importance of gathering the demographics

of genealogical researchers. This requires a large volume of respondents, which renders a

qualitative approach inadequate due to the time required to carry out large-scale

qualitative research. As two surveys were planned, a quantitative survey was also ideal due

to its high replicability, which allowed the author to carry out two surveys within the time

allotted. The first was a preliminary survey to gather a baseline result, to ensure that the

final survey's results were not badly skewed.

Owing to the importance of the gathering of demographics, a quantitative approach was

chosen, as the gathering of demographics requires a relatively large sample size. The

quantifiable nature of the data is also helpful for the data analysis required to identify

patterns in those who undertake research as it allows the data to be analysed quickly and

thoroughly using programs such as Microsoft Excel.

Whilst a qualitative survey would have allowed for a much more in-depth analysis of

sections of the final project, for example, for investigating why genealogists carry out their

research, it would have been far too time-consuming to carry out qualitative research to the

volume and ultimately quality required for an accurate demographics result.

The qualitative elements of the survey come in the form of open-ended questions at the end

of both surveys. Each survey was split into two separate pages, with multiple-choice

questions on the first page and text-based open-ended questions on the second page.

Participants only advanced to the second page if they indicated that they had undertaken

genealogical research at some point, as the questions on the second page require some

knowledge of genealogy. This decision was made because the author did not want

respondents closing the survey at the sight of text boxes, and it was reasoned that

respondents who indicated that they were interested in genealogy would be more likely to

answer said questions.

To summarise this chapter, the author has investigated the various methodologies available

to a researcher carrying out a study. Each of the three main approaches was evaluated,

with the advantages and disadvantages of each method revealed and contrasted. This

evaluation was used to select an appropriate method for the study in question, with the

reasons justified and outlined. The method chosen for the study was a quantitative

approach with elements of the qualitative method, as it allows the author to assess the

demographics of researchers with the appropriate volume.

3 Primary Research Analysis

This chapter is intended to evaluate the primary research that has been gathered

throughout the entire project timeline. The chosen format for the study was two separate

quantitative surveys, created using instant.ly, issued online through various social media

avenues such as Facebook and Reddit, and analysed using the inbuilt tools of instant.ly and

Microsoft Excel. Some data is also displayed using tools made available by info.gram.

There are three sections of data analysed in this chapter. The first section of data is the

quantitative data gathered to identify the demographics of those who undertake

genealogical research. The second section analyses both quantitative and qualitative data,

with a heavy focus on investigating the privacy issues associated with genealogy and how

participants feel about said privacy issues. Finally, the third section focuses mainly on the

qualitative data gathered and is dedicated to understanding the underlying

motivations and future of genealogy as a whole, with attention also paid to genealogy as a

social venture.

3.1 Demographics

This section reviews and analyses the demographic data acquired through the two surveys

issued for the purposes of this study. The author evaluates each survey in its own right

before comparing and contrasting them with each other in order to identify possible trends

within the genealogical community. As will become evident, the study received a large

contingent of responses from younger, single and childless participants. As such, the author

will not state overall shares per demographic, as these are skewed by the lack of a normal

distribution. Instead, only the percentage of each demographic that had undertaken

research will be evaluated, to prevent incorrect results.

Preliminary Survey

Introduction

The preliminary survey was taken by 310 respondents across 24 different countries. It was

distributed over the Internet using various social media such as Facebook and Reddit. The

survey tool used was instant.ly and select data was displayed in the form of an infographic,

using info.gram (see Appendix E: Data from Preliminary Survey). In total, 149 people

completed the entire survey, having answered yes to having undertaken genealogical

research and not leaving the survey prior to completion.

The survey overall comprised nine closed-ended/quantitative questions, with six being

multiple-choice radio buttons, two accepting numerical answers and one using a

drop-down menu. Two qualitative questions were also asked, both requiring a textual

answer. Seven of the quantitative questions were placed on the first page, with the other

two quantitative and two qualitative questions asked on page two. All questions in the

survey were mandatory.

The survey was intentionally designed to have nothing but multiple-choice questions on the

first page, to discourage participants from ending the survey without fully completing it.

Unfortunately, once greeted with the two qualitative questions on page two, 17.68% of

participants opted to leave the survey; however, this ultimately validated the author's

decision to split the questionnaire into two pages, as the demographics of those 32

respondents were still collected. A full transcript of the survey can be found in Appendix A:

Preliminary Questionnaire. The average time taken to complete this survey was 1 minute

and 48 seconds.

The aim of this survey was primarily to gather a baseline demographic for genealogical

researchers, to ensure that the final survey had something to compare to, in order to verify

that the final survey was not skewed towards any one demographic, and to be able to make

correct and valid correlations and conclusions based on the combined data. Appendix F:

Preliminary Survey Demographics Results contains the raw quantitative response data

from the preliminary survey, for reference purposes.

Age

The preliminary survey saw a large range of ages participating in the study, with participants

as young as 14 and as old as 70, with the overall median age of respondents being 26.5 years

old. For researchers, the minimum and maximum ages stayed the same, at 14 and 70 years

old respectively; however, the median age increased by three and a half years to 30.

Appendix F: Preliminary Survey Demographics Results identifies the shares that each age

group held in the survey.

Figure 3.1 - Preliminary survey: Age vs. interest and research undertaken

In Figure 3.1, above, the data shows a clear correlation between age and having undertaken

research, with a visible turning point being reached when participants were above the age of

24. There is a slightly similar correlation between interest in genealogy and age,

although it is far less pronounced than with actually carrying out research.

Gender

There was a fairly even split of genders amongst respondents to the initial survey, with

46.45% of respondents being male and 52.26% of respondents being female, with a further

1.29% of participants not identifying as either male or female. There does not appear to be

any relationship between gender and either being interested in genealogy or having

researched one's family tree, with there being a 1 percentage point difference between

males and females in interest (male: 85.42%, female: 86.42%) and a 2.4 percentage point

difference in having undertaken research (male: 60.42%, female: 58.08%). The author did

not feel that the data on those who did not identify as either male or female was sufficient

to produce a conclusive relationship, as only four respondents to the survey recorded

themselves as such.

Marital Status

There was a significant skew in favour of those who were single, although this was

unsurprising, given the aforementioned median age of 26.5. The percentage share per

marital status is as follows:

Single: 50.32%

Living with a partner: 16.77%

Married: 28.39%

Separated: 0.65%

Divorced: 3.23%

Widowed: 0.65%

As the data above shows, those who have never married comprise 67.1% of the survey

participants, with the remaining 32.9% having married at some point in their lifetime.

These are the two groups that the analysis will focus on, as the statuses within each group

carry similar views.

Table 1 - Preliminary survey: Marital status vs. interest and research undertaken

As Table 1 shows, there is no significant correlation between interest in genealogy and a

person's marital status. There is a trend between marital status and having carried out

research, however. Those who have been married at some point in their lives were found

to be 41.03% more likely to investigate their family genealogy than those who have never

married. This was the strongest correlation of the preliminary survey. It arises because

72.55% of those who have been wed indicated that they had undertaken research, in

contrast to 51.44% of those who recorded themselves as having never married.

Children

The prevalence of young, single participants led to a large skew towards respondents having

no children: 73.87% of respondents were childless, with only 26.13% having children. This

disparity led the author to collate all respondents with children into a single category, as

opposed to evaluating each separately. For reference, 10% had one child, 10% had two

children, 3.87% had three children, 0.97% had four children and 1.29% of participants had

five or more children.

Table 2 - Preliminary survey: Number of children vs. interest and research undertaken

Table 2 indicates no correlation between interest in genealogy and the number of children

that a person has, although there is certainly one with having undertaken research. In this

instance, Table 2 shows that 54.59% of those without children had carried out some form

of family research. When the responses of all participants with children were collated, it

was revealed that 69.14% of parents had researched their genealogy, a percentage increase

of 26.66%.
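
The relative increases quoted for marital status and for parenthood can be reproduced with a short calculation. The Python sketch below is illustrative only and is not part of the original analysis, which was carried out in Microsoft Excel; the function name relative_increase is hypothetical, and the inputs are the rounded percentages reported in the text. Because those inputs are already rounded, the printed values may differ from the quoted 41.03% and 26.66% in the second decimal place.

```python
def relative_increase(baseline_pct: float, comparison_pct: float) -> float:
    """Return how much larger comparison_pct is than baseline_pct,
    expressed as a percentage of the baseline."""
    return (comparison_pct - baseline_pct) / baseline_pct * 100


# Rounded research rates reported for the preliminary survey (assumed inputs).
never_married, ever_married = 51.44, 72.55
childless, parents = 54.59, 69.14

married_vs_never = relative_increase(never_married, ever_married)
parents_vs_childless = relative_increase(childless, parents)

print(f"Ever married vs. never married: {married_vs_never:.2f}% more likely")
print(f"Parents vs. childless: {parents_vs_childless:.2f}% more likely")

# The same comparisons expressed as percentage-point gaps, for contrast.
print(f"Percentage-point gaps: {ever_married - never_married:.2f} "
      f"and {parents - childless:.2f}")
```

The distinction matters when reading the results: a rise from 54.59% to 69.14% is a gap of 14.55 percentage points, but a relative increase of roughly 26.7% over the baseline.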

Experience

Participants in the survey were also asked when they had first begun to research their

genealogies, as well as how many years of experience they had of research; these questions

were only asked if they had indicated that they had carried out genealogical research at

some point in their lives. Two such similar questions were asked in order to identify

whether researchers judged their experience as dating from their first experience of

research or from continuous research.

The median experience of respondents was five and a half years when calculated from the

raw quantitative data, as the difference between a respondent's age and the age at which

they started their research. Many respondents answered that they had zero years of

experience, which is made clear in Figure 3.2 below, with the longest amount of experience

being 60 years. Unfortunately, not all respondents answered this question correctly, with

fourteen respondents giving unusable responses.

The median years of experience of respondents, based on the bandings of experience

given (0-1, 1-3, 3-5, 5-10, 10+), was three to five years. This does not differ much from the

previous result, as the difference can be explained by the absence of the fourteen responses.

Figure 3.2 - Preliminary survey: Age of researchers vs. when they started researching

Figure 3.2, above, shows a scatter graph of all respondents' ages versus when they started

researching. It is clear that many participants had little experience, with 49% of respondents

indicating that they had less than three years of experience. The largest contingent of

respondents indicated that they had their first taste of genealogical research before the age

of 18, with 34.81% of participants fitting into this section; this would indicate that many

researchers had their first experience of research whilst still in school, perhaps as part of a

school project.

Conclusion

The preliminary survey revealed that a given person was 26.66% more likely to carry out

research if they were a parent, 46.6% more likely to research their family tree if they were in

a serious relationship involving cohabitation and generally more likely to carry out research

as they age. It was found that the median amount of experience that researchers have is five

and a half years, with the largest segment of researchers having first undertaken research

before the age of 18. No significant correlation was found between gender and the chances

of having carried out research. Interestingly, it was also revealed that 70.48% of respondents

from the United States of America had carried out research but only 35.29% of participants

from the United Kingdom had done so, a large cultural difference. Canada also leaned

towards the American result, with 64.52% of its respondents indicating that they had

undertaken research.

Final Survey

Introduction

The final survey was taken by 420 unique people, across 40 different countries (outlined in

Figure 3.3, below). It was distributed through the same channels and created with the same

tools as the preliminary survey, to minimise differences in respondent demographics and

allow both studies to be compared adequately. The survey was created using instant.ly and

distributed through Facebook, Reddit, The Student Room, and through emailing genealogical

societies.

Figure 3.3 - Final survey: The 40 countries that responded

The final survey was more in-depth than the preliminary survey, with a total question count

of 16. There was a higher focus on qualitative questions throughout the survey, as it was

primarily intended for use in identifying the privacy issues that researchers feel are present

in genealogy since the advent of the Internet. Despite this, the survey also had a secondary

objective of confirming the results identified in the preliminary survey.

Following the successful completion rate of the preliminary survey, the author decided

that it would be appropriate to follow that example and limit qualitative questions to the

second page of the questionnaire. The first page of the survey remained nearly identical to

the preliminary survey, with the only change being the addition of age bandings to the

question of respondents' ages, instead of asking for their specific age. It was reasoned that

this change would increase participants' comfort levels with answering the survey, as it

decreases the amount of identifiable data without harming the data quality excessively.

Ten of the questions were multiple-choice questions with radio buttons, with a further

multiple-choice question with a dropdown box for an answer selector. One question

consisted of a matrix of six sub-questions, asking participants about their

levels of worry in relation to the categories. The other five questions were qualitative

comment-based questions, with two of these being optional. A full transcript of the survey

can be found in Appendix B: Final Questionnaire. The average time taken to complete the

survey was 2 minutes and 6 seconds, 18 seconds longer than the preliminary survey.

Unfortunately, the increased quantity of in-depth questions led to a much higher survey

termination rate, with 41.18% of participants opting to end the survey upon viewing the

second page of questions, a very high proportion. This led to the final survey having fewer

participants completing the entirety of the survey, with only 120 finishing the full survey, a

decrease of 29 from the initial survey, despite there being an increase of 110 in the number

of participants completing the first page of the survey.

Appendix G: Final Survey Demographics Results presents the raw demographic data gathered

over the lifetime of the survey's issuance, for reference.

Age

As this survey asked respondents for their age banding as opposed to their specific age, a

maximum and minimum age could not be acquired in this survey. However, the median age

banding for respondents in the survey was 25-34, with the median age banding for

researchers being the same.
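Because ages were collected in bands, the median can only be located to a band rather than to a specific age. As a minimal sketch of how a median band can be found from banded counts, the following Python snippet walks the cumulative totals until the middle respondent is reached. The counts, and the "Under 18" and "35-44" band labels, are hypothetical placeholders chosen for illustration and are not the survey's data.

```python
# Hypothetical band counts for illustration only; NOT the survey's data.
# Bands must be listed in ascending age order.
HYPOTHETICAL_COUNTS = [
    ("Under 18", 20),
    ("18-24", 180),
    ("25-34", 110),
    ("35-44", 50),
    ("45-54", 30),
    ("55-64", 20),
    ("65+", 10),
]


def median_band(band_counts):
    """Return the band containing the middle respondent.

    For an even number of respondents the lower of the two middle
    respondents is used (an assumption made for this sketch).
    """
    total = sum(count for _, count in band_counts)
    if total == 0:
        raise ValueError("no respondents")
    middle = (total + 1) // 2  # 1-based rank of the (lower) middle respondent
    running = 0
    for band, count in band_counts:
        running += count
        if running >= middle:
            return band


print(median_band(HYPOTHETICAL_COUNTS))  # prints "25-34"
```

The same function could be applied separately to the counts for researchers only, which is how a median band for researchers would be compared with that of all respondents.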

Figure 3.4 - Final survey: Age vs. interest and research undertaken

Figure 3.4, above, shows a clear correlation between age and having carried out research.

Interest levels, by contrast, follow a bell curve, although the author believes

this to be the case due to the small sample size of those aged 65+. The jump in respondents

having researched their family history once they reach the age of 25 is

remarkable, with a 111% increase in the likelihood of carrying out research when moving

from the age group 18-24 to 25-34.

Gender

There was a similar split in the final survey to the preliminary survey, with a slight increase in

favour of females, as 44.05% of respondents identified as male and 54.76% as female, with the

remaining 1.19% identifying as neither male nor female. Whilst there was no relationship

found between gender and being interested in genealogy, it was found that participants

were 16.73% more likely to carry out research if they were female. Again, as with the

preliminary survey, the author does not feel comfortable drawing conclusions about those

who do not identify as either gender as only five respondents out of 420 classified

themselves as such.

Marital Status

Appendix G: Final Survey Demographics Results indicates that there was a large skew

towards respondents classifying themselves as single, with 55.95% of participants identifying

as such. Following the example set in the preliminary survey, the author again splits the

demographics into two groups, those who have married (married, separated, divorced, and

widowed) and those who have never married (single, living with a partner). A relationship

was found between the likelihood of a given person being interested in genealogy based on

their marital status, with people who had married being 20.5% more likely to be interested

in genealogy than those who had never married, with the figures standing at 89.17% for

those who had married and 74% for those who had not.

Figure 3.5 - Final survey: Marital status and undertaking of research

Following the trend of genealogical interest, there was a very strong correlation

between marital status and the likelihood of having carried out research. Based on the

aforementioned groupings, it was found that those who had married were 97.37% more

likely to have carried out research than those who had never married. Figure 3.5, above,

shows the percentage of participants having undertaken research amongst all of the

individual marital statuses. When grouped into having married and never married, 75% of

those who had married had undertaken research, in contrast

to 38% of those who had never married.
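The "more likely" figures quoted in this section appear to be relative increases between two group percentages. As a minimal sketch of that arithmetic, assuming this is how the figures were derived, the Python snippet below reproduces two of the reported values from the percentages given in the text. The function name is an illustrative choice, not something taken from the study.

```python
def relative_increase(group_pct, baseline_pct):
    """Percentage by which group_pct exceeds baseline_pct, relative to the baseline."""
    if baseline_pct <= 0:
        raise ValueError("baseline percentage must be positive")
    return (group_pct / baseline_pct - 1) * 100


# Research undertaken: 75% of those who had married vs. 38% of those who had not.
print(f"{relative_increase(75.0, 38.0):.2f}%")   # prints "97.37%"

# Interest in genealogy: 89.17% of those who had married vs. 74% of those who had not.
print(f"{relative_increase(89.17, 74.0):.2f}%")  # prints "20.50%"
```

Note that a relative increase of this kind is distinct from the difference in percentage points between the two groups, which for the research figures would be 37 points.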

Children

As with the preliminary survey, there is a large skew towards childless respondents, with

78.57% of participants stating that they have no children, with every other option under

10%. Full listings of shares can be found in Appendix G: Final Survey Demographics

Results.

Table 3 - Final survey: Number of children vs. interest and research undertaken

There is a direct correlation between people being a parent and being interested in

genealogy, with a parent being 13.95% more likely to be interested in genealogy than a

childless respondent is. In addition, there is a clear relationship between parenthood and the

likelihood of a given person having carried out genealogical research, as shown in Table 3,

above. Amongst participants, 67.78% of parents indicated that they had undertaken

research, whilst only 43.33% of childless respondents had done so. This revealed that a

respondent was 56.41% more likely to research their family tree if they were a parent.

Conclusion

The final survey confirmed many of the conclusions of the preliminary survey. A direct

correlation was found between age and the likelihood of a given person carrying out

genealogical research, with a noticeably large jump of 111% when moving between the age

brackets 18-24 and 25-34. A relationship was found between gender and the percentage of

respondents undertaking research; it was found that females are 16.73% more likely to carry

out research than males. The final survey also found that a given person is 20.5% more likely

to be interested in genealogy if they have been married at one point in their lives, regardless

of whether they remain married or not. This cross-section of people is also 97.37% more likely to

have researched their family history than those who had never married, according to the

findings of this study. Finally, it was also found that a given participant was 13.95% more

likely to be interested in genealogy and 56.41% more likely to have carried out some form of

genealogical research if they were a parent.

Comparison and Conclusions

One conclusion that can be drawn from the surveys is that respondents were reluctant to

complete surveys that required them to answer in the form of a text box, justifying the

author's decision to split the surveys into two separate pages. The data supports this

hypothesis, with 17.68% of participants deciding to end the survey upon seeing the second

page of the preliminary survey, which included two comment-based questions, whilst the

final survey, with five qualitative questions, saw 41.18% of the respondents close the page

upon seeing the second page of the survey.

Figure 3.6 - Comparison: Percentage of each age group that had undertaken research

There is a very strong correlation between age and the chances of a given person

having undertaken genealogical research. This is consistent with the findings of Hill (2011),

who identified that the majority of Archives.com users were aged over 45 years old.

Both surveys show a peak in the age banding 45-54 before a drop in 55-64 in

Figure 3.6, above, more noticeably in the preliminary survey. The author

believes that this is due to the 1977 release of the TV mini-series Roots, based on Alex Haley's novel, when

the participants in question would have been between the ages of 6 and 16, school ages,

where they could have been asked to undertake research as part of a school project.

Table 4 - Preliminary survey: Age vs. years of experience of researchers

Table 4, above, and Table 5, below, certainly support this theory as there is a noticeable

jump in experience of those aged between 45 and 54 years old, indicating that they could

well have had their first taste of research in school, around the time of the release of Roots.

Table 5 - Final survey: Age vs. feeling of experience of researchers

The fact that there is also a marked drop-off among 45-54 year olds selecting the

"Very experienced" option also suggests that they undertook their research in school many

years ago and thus do not regard themselves as very experienced. The discrepancy between

the two data sets may be due to participants feeling that the original experience question in the

preliminary survey was based on when they first undertook research as opposed to

continuous research. This hypothesis is further supported by two participants in the age

bracket answering that they had carried out research as part of a school project and one

48-year-old respondent, who, when asked why they started researching their family tree,

answered:

I saw "Roots" and wanted to learn where my roots were.

Both surveys indicated that it was much more likely for a given person to have carried out

research if they were or had been married, with 72.55% in the preliminary survey and 75% in

the final survey of those who were or had been married having carried out research. This is in

sharp contrast to the 51.44% in the preliminary and 38% in the final surveys, respectively, of

those who had never been married. The final survey also indicated that those who had been

married were also 20.5% more likely to be interested in genealogy.

This indicated that a given person would be 41.03% and 97.37% more likely to have carried

out genealogical research, respectively, if they were or had been married. The author

believes that the discrepancy between the results of both surveys is due in large part to a

younger contingent of respondents answering the final survey, with 41% of respondents to

the preliminary survey being under 25 years of age, and 49% being of said age banding in the

final survey. This is significant due to the previously identified age vs. likelihood of having

carried out research correlation, which revealed that participants became more likely to

have researched their ancestors the older they became.

Figure 3.7 - Comparison - Children: Percentage of participants who had researched

A similar trend was noticed between having children and the likelihood of researching, as evidenced

in Figure 3.7, above. In the preliminary survey, it was determined that a given person would

be 26.66% more likely to carry out research if they were a parent. The difference was much

more marked in the final survey, with people being 56.41% more likely to have researched

their ancestors if they had children. This was a much bigger discrepancy than that of the

relationship correlation and it cannot be fully accounted for by the age difference. This leads

the author to believe that whilst there is certainly a large correlation, it is not as pronounced

as the final survey indicated.

There was not enough of a difference over both surveys for the author to assert a

correlation between gender and having carried out research. There was a negligible

difference in the preliminary survey of less than one percent, whilst in the final survey the

data suggested that females were 16.73% more likely to carry out research. This fits with

Hill's (2011) findings that the majority of Archives.com users were female. Despite this, the

author does not draw a correlation because the final survey saw a significantly

younger male population than female population, with nearly 55% of males surveyed being

under the age of 25 and just over 44% of females also fitting into that age bracket, which

the author believes accounts for the difference.
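The reasoning here is that a difference in age composition between two groups can produce a difference in their overall rates even when the groups behave identically within each age bracket. The Python sketch below illustrates that effect under stated assumptions: all counts are hypothetical, chosen only so that 55% of males and 45% of females fall under 25, and they are not the survey's data. The names used are likewise illustrative.

```python
# Hypothetical counts for illustration only; they are NOT the survey's data.
# Layout: age group -> gender -> (respondents, respondents who had researched).
HYPOTHETICAL = {
    "under 25": {"male": (110, 44), "female": (90, 36)},
    "25 and over": {"male": (90, 63), "female": (110, 77)},
}


def crude_rate(data, gender):
    """Overall share of a gender that had researched, ignoring age."""
    respondents = sum(groups[gender][0] for groups in data.values())
    researchers = sum(groups[gender][1] for groups in data.values())
    return researchers / respondents


def stratum_rates(data, gender):
    """Share of a gender that had researched within each age group."""
    rates = {}
    for age_group, groups in data.items():
        respondents, researchers = groups[gender]
        rates[age_group] = researchers / respondents
    return rates


for gender in ("male", "female"):
    print(gender, f"overall: {crude_rate(HYPOTHETICAL, gender):.1%}")
    for age_group, rate in stratum_rates(HYPOTHETICAL, gender).items():
        print(f"  {age_group}: {rate:.0%}")
```

In this constructed example the within-bracket rates are identical for both genders, yet the overall female rate comes out higher purely because the female sample is older. Comparing rates within age brackets, rather than overall, is one way to check whether an apparent gender effect survives once age is taken into account.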

Although there were not enough respondents from all countries for the author to assert a trend

overall, there were enough respondents from both the United States and United Kingdom to

draw a conclusion. Respondents were nearly twice as likely to have carried out some form

of research if they were from the United States as opposed to the United Kingdom.

It was revealed that a genealogical researcher is most likely to be aged over 45 years old,

married with children and living in the United States. All of those in that cross-section over both

surveys answered that they had indeed carried out research, with age proving to be the least

important factor of the four, as 94% of Americans who are married with children and

answered the survey indicated that they had undertaken genealogical research.

To summarise, correlations were found between the likelihood of a given person

researching their ancestors and their relationship status, their age, and whether they had children.

It was found that people are significantly more likely to have carried out research as they

grow older, and that they are much more likely to have researched their family history if

they have had children or are in a serious relationship involving cohabitation of some form,

i.e. being married or living with their partner.

3.2 Privacy

This part of the report is dedicated to evaluating the various privacy questions asked in the

final survey. Firstly, respondents' perceptions of privacy concerning genealogy will be

investigated, followed by an analysis of the privacy issues they are most worried about with

genealogy on the Internet specifically. Finally, social media's role in the privacy argument will

be examined and quantified.

Perceptions

Participants in the survey were asked to complete a matrix question that required them to

select how worried they were about six different factors in genealogy on a five-category

scale. These factors were accuracy of genealogical data in general, accuracy of genealogical

data on the Internet, identity theft, privacy of living people, security of data and others

profiting from your data.

The results of this question can be found below in Table 6. These results

show that the factor that the respondents were most worried about was accuracy of

genealogy on the Internet, with 51% of those questioned answering that they were either

worried or very worried. It is evident that the data being online is the major issue, as 36%

noted that they were either worried or very worried about genealogical data's accuracy in

general, 15 percentage points lower than the option related to the Internet. Not all respondents thought that

this digital information was of poor accuracy, however, with one respondent with intimate

knowledge of the digitisation process stating:

Having been involved in transcribing/digitizing records for FamilySearch.org, I don't

worry much about accuracy of digitized records. It's what people do or do not do with

those records when building a tree that concerns me.

It is not surprising that identity theft was the lowest ranked factor with only 22% of

respondents agreeing that it is definitely something to worry about, and 41% stating that

they are not worried at all by it due to the non-sensitive nature of the data available,

although not all respondents agreed.

We are close to the point of too much information being available. I was a victim of

ID theft. It is very easy to get the minimum information required for ID theft.

The author believes that most of the information that people would be worried about being

mined from genealogical data, such as dates of birth and mothers' maiden names, is

generally made freely available by people on social media anyway through poor privacy

settings. Surprisingly, the category that saw the fewest participants indicate that

they were very worried about it was the accidental release of data pertaining to living

people. The author feels that it is the most dangerous category, in that it can be potentially

very harmful, whilst also being much more likely to occur than identity theft, and can, in fact,

lead to identity theft. The author feels that this is especially surprising given previous

instances of this occurring, e.g. IrishGenealogy.ie's accidental release (Edwards, 2014).

Table 6 - Final survey: Privacy matrix question

It was found that males are less likely to be worried about the factors regarding genealogy's

privacy issues, with 35.32% of male participants stating that they were not worried, whilst

only 21.27% of females said the same. Women were also nearly twice as likely to be

indecisive about their feelings on the subject, with 15.35% stating they were not sure, compared with

only 7.94% of male participants. There was a slight skew in

favour of women being more worried overall, but it was not as significant as the other

relationships.

As Table 7, below, shows, there is no clear correlation between age and participants'

comfort levels regarding the various privacy factors, and no conclusions can be drawn from

the data.

Table 7 - Final survey: Privacy matrix vs. age

The study revealed that a given participant was much more likely to be worried about the

factors given if they were not a parent. Of the childless respondents, 37.86% responded that

they were either worried or very worried, whilst only 28.2% of parents indicated similarly.

There was a similar, but less significant, trend in respondents classifying themselves as not worried

about the six factors, with nearly 7 percentage points more stating that they were not worried if they had

children, as 30.34% of parents revealed that they were not worried, whereas 23.66% of

childless respondents said the same.
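When two percentages are compared, the gap can be expressed either in percentage points or as a relative difference, and the two can differ considerably. As a brief sketch using the "not worried" figures quoted above, the Python snippet below computes both. The function names are illustrative choices and not part of the study.

```python
def point_difference(a_pct, b_pct):
    """Gap between two percentages, in percentage points."""
    return a_pct - b_pct


def relative_difference(a_pct, b_pct):
    """Percentage by which a_pct exceeds b_pct, relative to b_pct."""
    if b_pct <= 0:
        raise ValueError("baseline percentage must be positive")
    return (a_pct / b_pct - 1) * 100


parents_not_worried = 30.34    # % of parents, from the text
childless_not_worried = 23.66  # % of childless respondents, from the text

points = point_difference(parents_not_worried, childless_not_worried)
relative = relative_difference(parents_not_worried, childless_not_worried)

print(f"{points:.2f} percentage points")  # prints "6.68 percentage points"
print(f"{relative:.1f}% relative difference")
```

The gap of just under 7 described in the text corresponds to the first measure. The relative difference is considerably larger, which is why it helps to state explicitly which measure is being reported.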

The survey concluded that those who had never married were much more likely to be worried

than those who had previously married. Individual statistics can be found below in Table 8.

With the exception of those who are separated, which could be due to a low sample size,

and widowed, of whom there were no respondents, those who are currently married are the least

likely to be worried about any of the six factors, with only 7% stating that they were very

worried about the factors given.

Table 8 - Final survey: Privacy matrix vs. marital status

To summarise, the study concluded that certain demographics were more prone to worr
