BSc (Hons) Information Technology Genealogy and Privacy Issues in an Online World Declan Greally 23/4/2015 Supervisor: Andrew Rae

Genealogy and Privacy in an Online World - Declan Greally


DESCRIPTION

Undergraduate dissertation. The purpose of this study is to analyse the human element of genealogy as information technology has become an integral part of the field. The demographics of researchers are identified and their fears about privacy are quantified using a two-survey, quantitative approach, with correlations drawn between certain demographical data such as their age, marital status and number of children and their likelihood of having carried out research. Conclusions are drawn on these through a comparison of the two surveys. The history and future of genealogy are investigated through a combination of literature review and qualitative survey approaches and predictions are made, alongside analysing the growth of genealogy on the Internet as a social venture.


  • BSc (Hons) Information Technology

    Genealogy and Privacy Issues in an Online World

    Declan Greally

    23/4/2015

    Supervisor: Andrew Rae

  • Contents

    ACKNOWLEDGEMENTS IV

    ABSTRACT V

    INTRODUCTION 1

    1 LITERATURE REVIEW 2

    1.1 Genealogy 3

    1.2 Genealogy and Information Technology 8

    1.3 Privacy Issues 14

    2 PRIMARY RESEARCH METHODOLOGY 19

    2.1 Terms 20

    2.2 Available Research Methodologies 22

    2.3 Chosen Research Methodology 28

    3 PRIMARY RESEARCH ANALYSIS 30

    3.1 Demographics 31

    3.2 Privacy 51

    3.3 Genealogy 60

    CONCLUSION 69

    REFERENCES 73

    APPENDIX 78

    A: Preliminary Questionnaire 78

    B: Final Questionnaire 80

    C: Supervisor Meeting Agenda Example 83

    D: Supervisor Meeting Minutes Example 84

    E: Data from Preliminary Survey 85

    F: Preliminary Survey Demographics Results 88

    G: Final Survey Demographics Results 89

    H: Genealogy Website Rankings Data 90

    I: Genealogy Website Rankings Graph 91

  • Table of Figures

    FIGURES

    Figure 1.1 - Top 15 genealogy website traffic-rankings over a five-month period (lower is better) - Alexa 13

    Figure 3.1 - Preliminary survey: Age vs. interest and research undertaken 33

    Figure 3.2 - Preliminary survey: Age of researchers vs. when they started researching 37

    Figure 3.3 - Final survey: The 40 countries that responded 39

    Figure 3.4 - Final survey: Age vs. interest and research undertaken 41

    Figure 3.5 - Final survey: Marital status and undertaking of research 43

    Figure 3.6 - Comparison: Percentage of each age group that had undertaken research 46

    Figure 3.7 - Comparison - Children: Percentage of participants who had researched 49

    Figure 3.8 - Final survey: Age and experience vs. perception of genealogy as a social activity 62

    Figure 3.9 - Preliminary survey: Why did you start researching your family tree? 63

    TABLES

    Table 1 - Preliminary survey: Marital status vs. interest and research undertaken 34

    Table 2 - Preliminary survey: Number of children vs. interest and research undertaken 35

    Table 3 - Final survey: Number of children vs. interest and research undertaken 44

    Table 4 - Preliminary survey: Age vs. years of experience of researchers 47

    Table 5 - Final survey: Age vs. feeling of experience of researchers 47

    Table 6 - Final survey: Privacy matrix question 52

    Table 7 - Final survey: Privacy matrix vs. age 53

    Table 8 - Final survey: Privacy matrix vs. marital status 54

    Table 9 - Final survey: What are you most worried about [] your genealogical data on the Internet? 56

    Table 10 - Final survey: Social quantitative questions 61

    Table 11 - Preliminary survey: Why did you start researching your family tree? - Data 64

  • iv

    Acknowledgements

    The purpose of this page is to thank those that have helped me achieve the completion of

    this dissertation.

    The first person that I would like to thank is my supervisor, Andrew Rae, who has proven to

    be of immense help throughout the creation of this body of work. Andrew helped to keep

    me on track and motivated, was a fantastic sounding board for ideas for the project, and

    actually initially proposed the idea of a project on the progress of genealogy in relation to

    information technology. I certainly do not believe that my work would be of the quality that

    it is without the massive assistance you provided, Andrew.

    I would also like to thank Mark Stansfield for agreeing to moderate this project. Although we

    did not have much interaction in relation to this project, the lectures that you provided

    throughout the year for the subject proved invaluable, allowing me to know exactly what

    was required of the project. Alongside Mark, I also have to thank Carolyn Begg for stepping

    in to moderate my presentation and I enjoyed answering the questions that you provided

    during said presentation.

    I would finally also like to thank my family and friends who supported me through the

    production of this dissertation. The help of my parents, who have provided everything that I

    could have needed and have supported me through my education throughout my life, cannot

    be overstated; without them, I would not have been able to produce a piece of work of this

    standard of quality.

  • v

    Abstract

    The purpose of this study is to analyse the human element of genealogy as information

    technology has become an integral part of the field. The demographics of researchers are

    identified and their fears about privacy are quantified using a two-survey, quantitative

    approach, with correlations drawn between certain demographical data such as their age,

    marital status and number of children and their likelihood of having carried out research.

    Conclusions are drawn on these through a comparison of the two surveys. The history and

    future of genealogy are investigated through a combination of literature review and

    qualitative survey approaches and predictions are made, alongside analysing the growth of

    genealogy on the Internet as a social venture.

  • 1

    Introduction

    Genealogy is an age-old discipline that has seen a rapid evolution as the Internet and

    computers in general have become more widespread, available and advanced. For many

    years, the art of genealogy was carried out in a manual and laborious way, with the only way

    to gather information being to physically travel to or otherwise correspond with archives and

    other such genealogical repositories. It was not until early 1994, following the public

    release of the World-Wide Web (hereafter referred to as "the web") in 1991, that the

    Internet became a viable resource in genealogical research, with Genserv becoming the first

    publicly accessible genealogy website (Christian, 2014). That is not to say that there was

    nothing available before Genserv: prior to the release of the web, newsgroups were

    available for use, the first of these being net.roots (Christian, 2014).

    With that being said, this paper is not written to discuss the history of genealogy on the

    Internet, nor indeed of genealogy itself; but to analyse the effects that the introduction of

    computers, the Internet and of information technology in general has had on genealogy as a

    whole, with a special regard paid to the social consequences of this integration.

    This paper aims to identify the demographics of those who undertake genealogical research

    and analyse said demographics to reveal patterns and to investigate why people choose to

    undertake genealogical research.

    Privacy concerns continue to mount as the Internet becomes a larger part of genealogy,

    especially since the appearance of DNA testing related to genealogy. A portion of this

    paper is therefore dedicated to investigating how safe this data is, what the data is used

    for, how accurate it is and what the future holds for genealogy as a whole.

    This paper is intended to address an apparent lack of academic interest regarding genealogy,

    especially as pertains to the growth and various positive effects that information technology

    has brought to genealogical research.

  • 2

    1 Literature Review

    Surprisingly, there has been little work of note carried out concerning the fusion of

    genealogy and information technology and this has been noted by various authors (Bishop,

    2008; Veale, 2004). As a result, the majority of this literature review focuses on different

    aspects of genealogy, ranging from books discussing the history and future of genealogy as a

    subject to the privacy policies included with the major companies in modern genealogy.

    These works were chosen because, taken together, the information they impart helps

    the paper fulfil all objectives of this study.

    The following literature review is split into different categories, culminating in a concluding

    section reflecting on the literature review as a whole.

    The first section addresses the assorted works that discuss the history of genealogy, from

    the very beginnings of genealogy up to the modern day. Moving past the history of

    genealogy, this section then discusses the motivations behind the mass undertaking of

    genealogical research over the years.

    The second section focuses on the merging of genealogy and information technology, first

    explaining how this union occurred before moving on to examining the growth of genealogy

    because of the Internet and discussing the future trends of genealogy on the Internet.

    Finally, the literature review examines the core topic of this thesis, the underlying privacy

    issues that have become a part of genealogy as it has evolved onto the Internet. It examines

    who owns the genealogical data, how said data can be used in malicious ways and, finally,

    the integrity of said data.

  • 3

    1.1 Genealogy

    This section investigates genealogy in a historical context, through analysis of the historical

    literature available on the subject, defining genealogy as a concept and revealing the origins

    of genealogy. The progress of pre-modern genealogy will also be analysed, with the reasons

    behind the popularity booms in genealogy being explained in the context of the era.

    Definition of Genealogy

    (the study of) the history of the past and present members of a family

    The above is the definition given to genealogy by Oxford Dictionaries (2015) and it is

    exactly that. Genealogists are the people who undertake research into their ancestors and

    family tree to discover how they are all interlinked. Saar (2002) believed that genealogy is

    a form of writing history, whereas Sharpe (2011) posits that purists draw a clear distinction

    between the terms "genealogy" and "family history":

    Genealogists set out to develop a pedigree, a family tree, based on identification

    of births, marriages and deaths. Family history is wider in scope, aiming to fill out the

    branches of this tree through investigation of other aspects of our ancestors' lives. It

    involves genealogical, biographical and historical research

    The word "genealogy" itself has Ancient Greek roots, stemming from the Greek words genea

    (γενεά), meaning race, family or generation, and logia (-λογία), a suffix that denotes the

    study of a subject (Oxford Dictionaries, 2015; Teknia, 2015).

  • 4

    History of Genealogy

    Early Written Genealogy

    Written genealogy has an incredibly long history but there are differing opinions on where it

    originated, with no clear agreed-upon answer. Garrett (2010) claims that:

    The tracing of genealogical lineages in Western Europe dates back, at least, to St.

    Matthew's gospel which was first written in Greek.

    Potter-Phillips (1999) holds a slightly different view: whilst examining the mentions of

    genealogy within the Bible, the author points out that these references may have

    been born out of the Roman culture of the time.

    Genealogy was practiced by the ancient Romans to distinguish between the

    patrician class (those with proven noble ancestry) and plebians[sic] (commoners).

    The author goes further, pointing out that the ancient Egyptians and Chinese both had

    dynasties, which could both be construed as genealogies, with both pre-dating the New

    Testament of the Bible.

    As history progressed through the ages, the discipline of genealogy was primarily used to

    settle disputes over titles, land and wealth, as much political importance was attached to

    one's bloodline, with said power and wealth passed down through the generations. Whilst

    not without question, these pedigrees were generally accurate as Pine (2014) explains:

    The truth was sometimes bent to suit some political end, but, on the

    whole, medieval European records are genealogically valid. This is because they were

    not primarily intended to supply genealogical information but to record land

    transactions, taxation, and lawsuits.

  • 5

    Early Modern Genealogy

    Unlike the origins of written genealogy, theories regarding the beginnings of modern

    genealogy are relatively dispute free. It is generally agreed that modern genealogy

    originated in the sixteenth century, arising in particular from a law passed by King Henry VIII

    in 1538, which required that ministers keep records of christenings, baptisms, marriages

    and burials (Potter-Phillips, 1999).

    The following centuries were filled with upheaval within the nobility and society as a whole,

    with literacy rates rising alongside an increased societal interest in history, leading to a major

    growth in genealogical interest (Pine, 2014; Potter-Phillips, 1999). Various European

    provinces soon began to keep records similar to British ones, with Potter-Phillips (1999)

    suggesting that this was due to the influence of the Catholic Church. Interestingly, the author

    also states, "In most countries, church parish registers pre-date any civil record keeping."

    Little of note occurred for centuries, as genealogy remained a steadily studied discipline with

    stable interest levels. Genealogy received a boost following the American Revolution as citizens of

    the USA sought to establish links to the heroes of the Revolution and to those who originally

    colonised the New World. This led to the creation of the first genealogical society in the

    world, the New England Historic Genealogical Society, in Boston, Massachusetts in 1845

    (New England Historic Genealogical Society, 2015; Potter-Phillips, 1999). It was not until

    1911 that a similar society was set up in the UK: the Society of Genealogists, founded in

    London, England (Kennett, 2011; Society of Genealogists, 2015).

    Pre-Internet Modern Genealogy

    This author considers the start of modern genealogy, as we know it today, to coincide with

    the turn of the twentieth century. This view rests on many major genealogical

    societies being established during this time, such as the aforementioned Society of

    Genealogists. It was also in this time period, specifically in 1894, that the Church of Jesus

    Christ of Latter-day Saints (hereafter referred to as the LDS Church), also known as the

    Mormon Church, began collecting genealogical records and making them available.

    The LDS Church is based in Salt Lake City, Utah and believes that members can save

    deceased ancestors and baptise them, which led to its collection of genealogical records. To

    help achieve this aim, it set up the Family History Library, which is the largest genealogical

    library in the world (Garrett, 2010; Utah, 2015).

    In the early twentieth century, standardisation became a concern with the rise in

    genealogy's popularity, as genealogists sought to make their field academically relevant.

    Garrett (2010) stated that:

    The need for standards resulted from a growing disdain of historians, librarians, and

    archivists who viewed genealogy as nonscholarly[sic], error-filled works of family

    pride.

    Scholars nevertheless viewed these attempts with disdain; O'Hare (2002) gives an example,

    citing an article from a 1942 edition of William & Mary Quarterly:

    "[a]s a pleasant and harmless form of antiquarianism, the study of family history,

    biography, and the tracing of genealogy are tolerantly humored but certainly not

    seriously honored by historians and scientists."

    Genealogy was seen as a quaint hobby but certainly not an academic subject, rightly or

    wrongly. It continued to increase in popularity steadily following the end of World War II

    (Garrett, 2010), before exploding in popularity following the release of Alex Haley's book

    Roots: The Saga of an American Family in 1976 and its subsequent award-winning

    miniseries in 1977. Pettinato (2014) revealed that:

    Requests to the National Archives for genealogical material quadrupled the week

    after the TV show ended.

  • 7

    The author also points out that the number of genealogical societies being inaugurated in

    the USA increased drastically following the release of Roots. The release of Roots spurred a

    worldwide interest in the subject and pushed the discipline to heights that it had not known

    before.

    In this chapter, the author defined genealogy before identifying the key factors that drove

    the evolution of genealogy and studied the history of the genealogical subject as a whole,

    starting at the beginning of written genealogies until the appearance of information

    technology. The author identified genealogy as a steadily growing subject that struggles with

    being academically accepted but has captured the imagination of the public and at times, of

    the nobility and politicians.

  • 8

    1.2 Genealogy and Information Technology

    This section of the paper is intended to review the available literature and evaluate

    genealogy as a field since the introduction of information technology, and reveal how the

    field has grown since said introduction, with special attention paid to the growth of

    genealogy since the introduction of the Internet.

    Beginnings

    Genealogy first appeared on computers in 1979 with the release of the first piece of

    genealogical software, Genealogy: Compiling Roots and Branches, which was written by

    John J. Armstrong and cost $250 ($813.19 adjusted for inflation, or £537.14) (Eastman, 2002;

    US Inflation Calculator, 2015; XE Currency Converter, 2015).

    As genealogical software became more popular and widespread following the

    pioneering work of John J. Armstrong in the creation of Genealogy: Compiling Roots and

    Branches, there arose a need to create a specification that allowed the transfer of

    genealogical data between the various software packages on the market.

    The Family History Department of the Church of Jesus Christ of Latter-day Saints recognised

    this need and created the GEDCOM (GEnealogical Data COMmunications) standard in 1984

    (Nurse, 1994). The initial problem to overcome was that only the program written by the

    Church of Jesus Christ of Latter-day Saints could read this new standard, although over the

    years, most programs came to read the .GED format utilised by the GEDCOM standard

    (Eastman, 2014). Eastman (2014) further explains that GEDCOM requires the file to be a

    plain text file with a set, structured format: a number precedes each piece of information to

    indicate the level of the hierarchy at which it sits, alongside a tag that identifies the

    significance of the information.
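
    The structure Eastman describes can be illustrated with a minimal sketch. The sample record

    below is invented for illustration, and the cross-reference identifier (the "@I1@" part) is a

    detail of the GEDCOM specification that the description above does not cover. The class and

    function names are hypothetical and are not taken from any genealogy program.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # depth in the hierarchy (0 = top-level record)
    xref: Optional[str]   # optional cross-reference id such as "@I1@"
    tag: str              # short code giving the meaning of the value
    value: str            # the information itself (may be empty)


def parse_gedcom_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into level, optional xref, tag and value."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record header, e.g. "0 @I1@ INDI": the id comes before the tag.
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
    else:
        xref = None
        rest = " ".join(parts[1:]).split(" ", 1)
    tag = rest[0]
    value = rest[1] if len(rest) > 1 else ""
    return GedcomLine(level, xref, tag, value)


# Invented sample record: one individual with a name and a birth date.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1900
"""

if __name__ == "__main__":
    for raw in SAMPLE.splitlines():
        parsed = parse_gedcom_line(raw)
        # Indent by level to make the hierarchy visible.
        print("  " * parsed.level + f"{parsed.tag} {parsed.value}".rstrip())
```

    Because every line carries its own level number, a reading program can rebuild the hierarchy

    without any closing markers, which is part of what made the plain text format easy for

    different programs to exchange.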

  • 9

    In 1983, genealogy made its first appearance on the Internet with the creation of the

    newsgroup net.roots (Isaacson, 1998) though it was replaced in September 1986 by soc.roots

    (Christian, 2014). This was followed up by the creation of the first genealogical mailing list,

    ROOTS-L, in December 1987 (ROOTS-L, 2010).

    These were the early days of the Internet and it was not the most useful of resources to use

    at the time, as Garrett (2010) explains:

    genealogists could only access the services of their online service providers and

    communicate with fellow genealogists who subscribed to the same provider.

    However, these were only temporary issues, alleviated by the public introduction of the

    World Wide Web on 6th August 1991 (CERN, 2015). Garrett (2010) goes further,

    identifying other issues that were present at the time:

    These early forays into Internet genealogical research were further limited by the

    text-based technology of the time, which made it impossible to view digital facsimiles

    of records.

    The first family tree was put online shortly after by Alan Stanier in June 1993, followed by the

    first directory of genealogical resources on the Internet, The Genealogy Home Page, in July

    1994 (Christian, 2014).

    In between these two events, Mosaic was released in March 1993 and is credited with

    helping to popularise the World Wide Web. Mosaic was the first browser that was able to

    display images on the same page as text, instead of having to click on a link to view the

    image in a separate window (Boutell, 2006).

    In one of the most significant events in the history of genealogy on the Internet,

    Ancestry.com was released in 1996 and it rapidly became the highest-trafficked site on the

    genealogical Internet, where it remains to this day (Alexa, 2015; Ancestry, 2015).

  • 10

    In a landmark move, Scots Origins was launched on 6th April 1998, becoming the first pay-

    per-view site for UK public records and in August 2001, they became the first to add the

    1881 and 1891 census indexes and images to their records collection (Christian, 2014).

    By 22nd April 2006, all available UK censuses had been made available online with the addition of

    the previously unavailable 1841 censuses, and the collection was made complete in England

    and Wales in January 2009 when the 1911 census was added, with Scotland following suit in

    April 2011 (Christian, 2014).

    Growth of Genealogy on the Internet

    In late 1999, McClure (1999) stated that "A search of the word genealogy on the Internet

    results in over 5 million possible pages." On 5th January 2015, a Google search showed that

    this figure stood at 92.1 million possible pages, with this number growing to 94.1

    million results as of 22nd April 2015. This is an unsurprising amount of growth, given how

    rapidly the Internet in and of itself has expanded over the years. To give a more recent example,

    Kennett (2011) identified that in August 2010, the Office for National Statistics revealed 10

    million adults had never been online; making up 21% of the adult population within the

    United Kingdom. In May 2014, the Office for National Statistics (2014) announced that the

    number of adults that had never been online had dropped to 6.4 million, or 13% of the

    British adult population.

    Blogging about genealogy has also seen growth through the years: Hill (2011) identified

    that at the end of 2010, GeneaBloggers (a website dedicated to blogging about genealogy)

    had 1,535 bloggers regularly posting on their website and by January 2015, that number had

    risen to 3,056 (GeneaBloggers, 2015).

    An article by GenealogyInTime Magazine (2012) included traffic rankings of the top 25

    genealogical websites according to Alexa's traffic ranking statistics, with the average traffic

    ranking of said websites being 23,606. The author of this study (Genealogy and Privacy Issues

    in an Online World) carried out similar research on 19th November 2014 and found the

    average traffic ranking at that date to be 19,168, a growth of 23.15%. When directly

    compared with their previous rankings, websites showed an average growth of 23.3%, discounting

    new entries to the rankings.
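
    The 23.15% figure is consistent with dividing the 2012 average rank by the 2014 average

    rank. A minimal sketch of that calculation follows, assuming this is how the figure was

    obtained; the function names are hypothetical. Because a lower rank is better, the same

    change can also be expressed as a reduction relative to the 2012 figure, which gives a

    smaller percentage.

```python
def rank_growth_percent(old_rank: float, new_rank: float) -> float:
    """Growth expressed as how much larger the old rank number is than the new one."""
    return (old_rank / new_rank - 1) * 100


def rank_reduction_percent(old_rank: float, new_rank: float) -> float:
    """The same change expressed as a fall relative to the old rank number."""
    return (old_rank - new_rank) / old_rank * 100


if __name__ == "__main__":
    average_2012 = 23606  # average Alexa rank reported for 2012
    average_2014 = 19168  # average Alexa rank found on 19th November 2014
    print(f"ratio-based growth: {rank_growth_percent(average_2012, average_2014):.2f}%")
    print(f"rank reduction:     {rank_reduction_percent(average_2012, average_2014):.2f}%")
```

    Which of the two percentages is quoted depends on the baseline chosen, so it is worth

    stating the baseline whenever ranking improvements are compared.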

    Genealogy on the Internet is also growing in a more subtle way, through the undeniably

    ageing population of the world.

    Archives.com data and broader industry analyses indicate that users of genealogy

    websites tend to be female aged 45 or older. This age group constitutes 62% of

    Archives.com members.

    (Hill, 2011)

    That alone does not show growth; however, statistics released by the UK government in

    February 2012 indicate that the population of the UK is ageing (Rutherford, 2012). Analysis

    of the statistics given in said paper predicts that the population within the age group

    of 40 and over will overtake those younger than 40 by 2025, with the average age of the

    population rising from 40 to 43 by 2035 (Rutherford, 2012). Further analysis shows that the

    population of the age group of 45 and over, which is directly relevant to the data revealed by

    Hill (2011), will grow by an average of 1.83% over the average population growth until at least

    2035, where the available predicted statistics end (Rutherford, 2012). Josiam and Frazier

    (2008) also stated that:

    The more a person has used the Internet and the older they are, the more likely they

    are to use the Internet for genealogy research.

    This leads to the overall conclusion that as the population ages and becomes more

    accustomed to the Internet, interest in genealogy on the Internet will rise.

  • 12

    The first recorded instance of the term "genetic genealogy" appeared in a Dallas, Texas

    newspaper, the Dallas Morning News, in March 1989:

    Of course, scientists have long known that we all carry a record of our roots in our

    genes. It's just that the record in the rocks has been easier to read. Lately, though,

    practitioners of genetic genealogy have found methods to search for the woman

    from whom we all are descended.

    (Siegfried, 1989)

    Due to the increasing merging of genealogy and technology, genetic DNA tests became

    commercially available in 2000 following the launch of the companies Family Tree DNA

    and Oxford Ancestors (Family Tree DNA, 2009; Oxford Ancestors, 2015). Surprisingly, the

    testing was not prohibitively expensive when the technology first became

    available, with Family Tree DNA offering tests for mtDNA (mitochondrial DNA, which is passed

    from mothers to children; Phillips DNA Project, 2015) for $219, with the equivalent test in

    2015 costing $199 (Family Tree DNA, 2000; Family Tree DNA, 2015b).

    Figure 1.1, on the following page, shows the rapid growth of genealogy websites related to

    DNA. Of particular note are Family Tree DNA and 23andMe, which improved their Alexa

    traffic rankings by 6,815 and 5,281 places respectively over the

    five-month period between 19th November 2014 and 16th April 2015. It also shows that the

    top websites within the genealogical world remain largely static or show slow but steady

    growth. Of particular interest is that there appears to be a slight drop in interest going over

    the Christmas season, although it recovers quickly going into March. A ranking of the top 25

    websites with a comparison to 2012 can be found in Appendix H: Genealogy Website

    Rankings - Data and Appendix I: Genealogy Website Rankings - Graph.

  • 13

    Figure 1.1 - Top 15 genealogy website traffic-rankings over a five-month period (lower is better) - Alexa

    The size of Ancestry.com as a whole cannot be overstated, with seven of the top 15

    websites shown in Figure 1.1 being owned by the Ancestry.com group, these being

    Ancestry.ca, Ancestry.co.uk, Ancestry.com, Ancestry.com.au, Archives.com, Genealogy.com

    and Find A Grave (Tester, 2014).

    In summary, the field of genealogy has seen enormous growth since appearing in a virtual

    format, and especially on the Internet. The gradually increasing penetration of the Internet

    has made genealogy more accessible to people on lower incomes and to older people, reducing

    the need for expensive travel, or indeed for any cost at all, as it is very possible to carry out

    genealogical research entirely free.

  • 14

    1.3 Privacy Issues

    This unit of the literature review is dedicated to reviewing the literature available on privacy

    issues related to genealogy as a field, mainly evaluating the accuracy of data and the

    databases that genealogical data is held on in the age of the Internet.

    Integrity of Genealogical Data

    Issues with the veracity of genealogical data are not new to the Internet age of

    genealogy, but they have been carefully watched as the Internet has become a larger and larger

    part of genealogy as a whole. As Howells (1998) posited when genealogy on the Internet was

    under its highest levels of scrutiny:

    In the future, we will continue to find published information which is based on

    hearsay and poor research methods entirely lacking in any source citations.

    The key word in this statement is "continue". There were also accuracy issues with

    earlier genealogical records; for example, Durie (2009) identifies missing records as an

    issue in early Scottish censuses, writing about the 1841 Scottish census:

    Some parishes are known to be missing from the records. A lot of these are in Fife

    because the records were lost overboard during their transit by boat to Edinburgh.

    Even though people might have moved after census night, and therefore could be

    counted twice, it was impossible to repeat the exercise for these fourteen Fife

    parishes, which represented about 30% of Fife's census data, much to the fury of

    genealogists ever since.

    This problem recurred in the 1851 Scottish census, with seven registration districts going

    missing. Interestingly, in the aforementioned 1841 census, the ages of anyone over the age

    of fifteen were rounded down to the nearest five, creating inaccuracies within dates of

    birth (Durie, 2009).
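
    The rounding rule described by Durie (2009) can be sketched as follows, assuming that

    "rounded down to the nearest five" means the nearest lower multiple of five and that ages of

    fifteen and under were recorded exactly. The function name is hypothetical.

```python
def recorded_age_1841(true_age: int) -> int:
    """Return the age as it would appear under the rounding rule described above."""
    if true_age <= 15:
        # Assumption: ages of fifteen and under were recorded as given.
        return true_age
    # Round down to the nearest lower multiple of five.
    return true_age - (true_age % 5)


if __name__ == "__main__":
    for age in (12, 15, 19, 40, 44):
        print(age, "->", recorded_age_1841(age))
```

    Under this rule a recorded age of 40 could correspond to any true age from 40 to 44, so a

    year of birth inferred from the record may be out by up to four years.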

    Some authors have doubts about the authenticity of genealogical records on the Internet,

    amongst them Kovacs (2001) who wrote:

    One issue which should concern genealogists who find records on the Internet is the

    authenticity of the documents. It is often difficult to ascertain whether or not primary

    records have been altered (either inadvertently or intentionally) in the digitization

    process.

    There are also genealogical researchers who reject online records outright, as

    explained by Garrett (2010), referencing the book Genealogy Online by E. P. Crowe:

    In fact, according to Crowe, many professional genealogists refuse to consider online

    records as authoritative: [t]heir attitude is this: A source is not a primary source

    unless you have held the original document in your hand. And a primary source is not

    proof unless it is supported by at least one other original document you have held in

    your hand.

    There is also the constant issue that when data is on the Internet, it is there forever, despite

    any questions over the accuracy of said data. Bishop (2008) carried out a study analysing

    exactly why genealogists carry out their research, prompting them to keep a diary and

    record their experiences. It threw up some interesting anecdotes about errors in

    genealogical research, with one researcher noting that she had found errors in a book but

    decided against notifying the author of said book of the error, perpetuating a cycle of

    erroneous information. Bishop (2008) writes:

    One researcher shared her frustration at her inability to convince other researchers

    to correct a piece of information about her great-great-grandmother. "They have

    [her] married 4 times and will not change their documentation. With each post to an

    online message board devoted to the family, this myth perpetuates itself!" she

    wrote.

    Finally in this study, another researcher noted problems within census taking in the early

    twentieth century:

    "I was so disgusted with my grandmother and what she told the census taker on the

    1930 census," said one researcher. Her grandmother said she was born in California

    and that her parents were from France; both pieces of information turned out to be

    incorrect.

    Veale (2004) identified several issues with genealogy on the Internet in her paper, noting

    that groups such as the Internet Genealogists for Quality have been set up in response to the

    volume of false genealogical data available on the Internet. A possible reason for this

    proliferation is given by Veale (2004):

    Thus the many genealogies published on the Internet have given rise to the quickie

    genealogist - those who go online to pursue their ancestry, and by using the work of

    others, copy the information verbatim, disregarding basic genealogical methodology,

    to regurgitate the material, mistakes and all, as their own

    This all combines to give a negative perception according to Veale (2004):

    Some of the negative perceptions include: concerns over information veracity and

    quality; fears about intrusions into privacy and even the chance for identities to be

    stolen; and the commercialisation of both amateur labour and previously free

    information.

    Even within genealogical software itself, there can be issues with data loss when exchanging

    data between programs using dissimilar standards as Eastman (2014) explains:

    Translating from one program's database to GEDCOM is sort of the same as

    translating from one spoken language to another. The basics work, but subtleties and

    details sometimes do not translate well. Then, when translating to the third language

    (the receiving genealogy program's database), more translation losses creep in.
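The two-step loss that Eastman describes can be sketched with a toy model. Everything here is hypothetical: the field names, the example record and the sets of supported fields are invented for illustration and do not reflect the GEDCOM specification or any real genealogy program.

```python
# Hypothetical record and field sets, invented purely for illustration.
PROGRAM_A_RECORD = {
    "name": "Mary Smith",
    "birth_date": "abt. 1850",
    "birth_place": "Fife, Scotland",
    "source_confidence": "low",
    "research_notes": "Conflicting parish entries",
}
INTERCHANGE_FIELDS = {"name", "birth_date", "birth_place", "research_notes"}
PROGRAM_B_FIELDS = {"name", "birth_date", "birth_place"}


def translate(record: dict, supported_fields: set) -> dict:
    """Copy only the fields that the target format is able to represent."""
    return {key: value for key, value in record.items() if key in supported_fields}


# First translation: sending program to the interchange format.
interchange = translate(PROGRAM_A_RECORD, INTERCHANGE_FIELDS)
# Second translation: interchange format to the receiving program.
received = translate(interchange, PROGRAM_B_FIELDS)

lost_in_export = sorted(set(PROGRAM_A_RECORD) - set(interchange))
lost_in_import = sorted(set(interchange) - set(received))
```

In this toy case one field is dropped at each step, so the receiving program ends up with less detail than either the original database or the interchange file held.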

    In terms of the security and integrity of genealogical databases/websites, there has been

    little to worry about. The biggest of these, Ancestry.com, has only one incident that the

    author can find online of its services being compromised, and this was when its servers

    were subjected to a DDoS (Distributed Denial of Service) attack in June 2014, although no

    data was compromised (Dobner, 2014).

    There was, however, an incident in which the data of living people was accidentally released

    by a genealogy website run by the Irish government, IrishGenealogy.ie. In July 2014, the

    website accidentally released the civil registry records of every citizen who was born or

    married in the State (Edwards, 2014).

    Aside from these two incidents, genealogy websites appear to be mostly free from security

    issues. Whether this is due to good security practices by the services or simply a lack of

    interest from malicious actors is questionable but ultimately irrelevant, as the author cannot

    prove it either way at this point.

    Who Owns Genealogical Data?

    Hoffman (2011) states that genealogy falls under the purview of intellectual property laws.

    This means that the genealogical data that you create is copyrighted to you; for example, if

    you create a family tree online, that family tree is copyrighted to you. However, this is only

    the case if you add some creativity to the tree, for example by adding some form of narration.

    In essence, you own the genealogical data that you create, so long as

    you have added a modicum of creativity to it (Hoffman, 2011).

    One of the most contentious issues arising due to recent genealogical advances is the

    ownership of DNA information if a genealogist submits their DNA to a testing lab to identify

    their heritage. Despite this, the author's reading of the privacy policies of DNA testing

    companies such as Family Tree DNA, 23andMe and AncestryDNA has revealed that there

    are no real issues regarding this within the largest companies. Interestingly, however, Family

    Tree DNA do not seem to have put very much thought into their privacy policy (Family Tree

    DNA, 2015a). The author posits this as the policy states:

    Family Tree DNA also adheres to the Genetic Genealogy Guidelines as proposed by

    the The[sic] Genetic Genealogy Standards Committee in January 2015.

    This is interesting as The Genetic Genealogy Standards Committee (2015) state at the start

    of the second paragraph of their Genetic Genealogy Standards that:

    These Standards are intentionally directed to genealogists, not to genetic genealogy

    testing companies.

    This chapter, more than anything, identified the sheer lack of literature regarding privacy in

    genealogy. However, it also revealed the error-strewn past of genealogy, showing that

    inaccuracies are not solely born of the Internet age for genealogy.

    In the first chapter, the author examined the history of genealogy before studying the effects

    that information technology has had on genealogy as a whole in the second chapter. Finally,

    the third chapter focused on the integrity and partially on the ownership of genealogical

    data and the websites that hold this data.

    There is a surprising lack of information about privacy issues in terms of genealogy and the

    author has found that even the most basic of questions have yet to be answered within

    genealogy as an academic field.

    The material discussed within the past three chapters and the material that has been

    reviewed as a whole throughout the entire process has led to the final question of:

    Are genealogical researchers worried about privacy issues within genealogy in the age of

    the Internet and, if so, what worries them the most?

    2 Primary Research Methodology

    This chapter is intended to outline the possible methodologies that can be used to carry out

    the primary research required for a thorough undertaking of the project, discussing the

    features and limitations of the various methodologies available for use.

    The chapter also identifies the optimal methods for undertaking a study

    such as this one and discusses the reasons behind the choice of methodology. The

    underlying philosophical terms relevant to modern methodologies are also explained in

    detail, to give a fuller understanding of the subject matter.

    2.1 Terms

    When discussing research methodologies, it is important to understand some of the terms

    behind the subject matter and this section aims to explain the various terms required for a

    full understanding of the methodologies. There are two main philosophical approaches to

    methodologies and these are positivism and interpretivism.

    Positivism

    Positivism is generally regarded as being a scientific approach to research, using an objective

    and measurable approach to every study.

    Punch (2013) defines positivism as:

    the belief that objective accounts of the world can be given, and that the function

    of science is to develop descriptions and explanations in the form of universal laws

    Bryman (2012) identifies that there are five key principles of positivism, and these are:

    Phenomenalism - the principle that only knowledge that can be perceived through the senses

    can be identified as usable knowledge (Mastin, 2008).

    Deductivism - the ability to create hypotheses that are testable and allow explanations to be

    easily created (Bryman, 2012).

    Inductivism - the principle in which "Knowledge is arrived at through the gathering of facts

    that provide the basis for laws" (Bryman, 2012).

    Alongside these three principles, the fourth and fifth principles of positivism require that a

    study be free from bias (objective) and that a clear distinction is made between scientific

    statements and subjective statements.

    Interpretivism

    Interpretivism is generally seen as the antithesis of positivism by researchers. Whereas

    positivism focuses on objective and scientific approaches, Murphy (2014) states that

    interpretivism encourages researchers to explore the data and to come to an understanding

    of why the people involved in the study made the choices that they did. Punch (2013) gives

    the definition of interpretivism as being:

    the philosophical position that people bring meanings to situations, and use these

    meanings to understand their world and influence their behaviour

    This shows that interpretivism has to be approached with a measure of subjectivity, as it is

    important to discover the reasoning behind answers. Interpretivism also requires the author

    of a study to not structure the data in a format that follows the researcher's initial

    assumptions (Murphy, 2014).

    2.2 Available Research Methodologies

    There are three main approaches to be considered when identifying an appropriate

    methodology for the undertaking of primary research; these are given as qualitative,

    quantitative and mixed methods (Creswell, 2003). This section is dedicated to exploring

    these methods and aims to evaluate the individual methods' strengths and weaknesses.

    Quantitative

    Quantitative research methods attempt to maximize objectivity, replicability, and

    generalizibility[sic] of findings, and are typically interested in prediction. Integral to

    this approach is the expectation that a researcher will set aside his or her experiences,

    perceptions, and biases to ensure objectivity in the conduct of the study and the

    conclusions that are drawn. Key features of many quantitative studies are the use of

    instruments such as tests or surveys to collect data, and reliance on probability theory

    to test statistical hypotheses that correspond to research questions of interest.

    (Harwell, 2011)

    Quantitative research is generally carried out in the form of surveys or through alternative

    data gathering activities that have a focus on closed-ended questions (Creswell, 2003).

    Creswell (2003) also identifies that quantitative research is post-positivist; that is, it is an

    empirically scientific approach to a study, focusing on the objective facts throughout a study,

    although not to the rigid standards set by positivism, allowing for some manner of

    subjectivity to be applied.

    To give an example of a quantitative approach related to the subject matter of this project, a

    survey could be created and then distributed to a large number of people using the Internet

    to gather demographical information, with questions ranging from asking respondents' age

    to asking if they have ever undertaken genealogical research.

    Advantages

    There are multiple advantages to choosing to approach a study with a quantitative

    methodology, key amongst them being that a quantitative approach allows a large amount

    of data to be collected relatively quickly. This can be in the form of an online survey

    distributed to many respondents, which, aside from the initial time spent designing the

    survey, essentially runs itself and allows responses to be collected passively without much

    effort on the part of the researcher.

    The data that is gathered through a quantitative approach can generally be easily quantified

    and analysed, as it comes from closed-ended questions, which allow

    responses to be tallied into easily read sets of data. Large-scale analysis can be carried out

    using software such as Microsoft Excel, and it is relatively straightforward to carry

    out this analysis, with a lot of the work carried out by the program automatically, although it

    can be time-consuming (Punch, 2013).

    Quantitative research such as surveys also appeals to the natural human preference for

    numbers as opposed to having to fill out text boxes for questions (Creswell, 2015).

    Due to the closed-ended nature of questions asked, quantitative results are generally

    regarded as objective data as the raw data is free from misinterpretation, as opposed to a

    respondent responding in the form of text, where tone could be important. This leads to

    quantitative data being largely accepted as unbiased and mostly free from subjectivity.

    Quantitative studies have better replicability than qualitative studies, allowing multiple

    researchers to carry out the same survey, to further confirm or reject the original conclusions of

    the study (Altermatt, 2008).

    Disadvantages

    A quantitative approach is not without its disadvantages, however, with one of the major

    criticisms levelled at the methodology being its inherent inflexibility. In many cases, a

    researcher cannot identify every single category of response to a question, leading to

    inaccurate responses being gathered, due to the respondent not agreeing with any of the

    closed-ended options (Altermatt, 2008).

    Creswell (2015) also states that quantitative research "Is impersonal, dry" and that it "Does

    not record the words of participants", potentially missing key information that

    would have otherwise been gathered through a more personal, qualitative approach.

    Qualitative

    In contrast to quantitative research, Harwell (2011) implies that the qualitative methodology

    is characterised by:

    discovering and understanding the experiences, perspectives, and thoughts of

    participants - that is, qualitative research explores meaning, purpose, or reality

    Altermatt (2008) concisely defines qualitative research as:

    qualitative research involves observations that are transformed into records based

    on the observer's intuitive sense of what is important.

    In essence, qualitative research is the antipode of quantitative research, focusing more on

    the narrative of the information gathered as opposed to the facts produced (Creswell, 2015).

    A qualitative approach is usually used to generate theories, as it gathers open-ended

    answers from respondents, allowing researchers to generate hypotheses based on the

    answers given (Punch, 2013).

    In the context of this project, an example of a qualitative approach would

    be performing in-depth interviews, preferably face-to-face, with genealogists from

    genealogical societies, where possible. This would allow for an in-depth discussion on the

    factors behind undertaking research and the future of genealogy as a whole.

    Advantages

    The most important advantage that qualitative research has over quantitative research is

    that qualitative research can provide extremely detailed information on the subject in

    question through a written description as opposed to purely numerical data. Data is not as

    narrowly focused as quantitative data is, allowing for a more thorough analysis to be

    performed (Altermatt, 2008).

    Altermatt (2008) suggests that if a researcher is unfamiliar with the project at hand, a

    qualitative approach allows a researcher with moderate knowledge to ask open questions,

    which, when answered, give the researcher a greater knowledge of the subject and allow

    them to approach the project with a narrower focus.

    Creswell (2015) states that a qualitative study "Is based on the views of participants, not of

    the researcher", suggesting that a qualitative approach helps to mitigate the problems

    associated with the pre-defined assumptions of the researcher designing the study.

    Disadvantages

    The biggest disadvantage of undertaking a qualitative study is that it becomes very difficult

    to study a large subset of people, as each individual study requires a much larger amount of

    time, as opposed to a quantitative study.

    The nature of a qualitative study is such that it only provides soft data, which is highly

    subjective and relies heavily on the participants of the study, reducing the ability of the

    researcher to apply their expertise (Creswell, 2015).

    Largely, the disadvantages of the qualitative methodology mirror the advantages of the

    quantitative methodology. Altermatt (2008) indicates that qualitative studies are susceptible

    to confirmation bias as:

    observers' intuitions may lead them to seek out, notice, interpret, and remember

    events that are consistent with their expectations

    It is also much more difficult to analyse qualitative data as opposed to quantitative data as it

    is generally in the form of words, instead of numerical data. It is still possible to quantify this

    data, although it is extremely time-consuming, compared to quantitative data.

    Qualitative studies are also difficult to replicate due to the in-depth, subjective nature that is

    part of the methodology's core principles (Harwell, 2011).

    Mixed Methods

    As the name indicates, a mixed methods methodology combines aspects of both

    quantitative and qualitative research methods. Creswell (2015) states that he believes mixed

    methods research to be:

    An approach to research in the social, behavioral, and health sciences in which the

    investigator gathers both quantitative (closed-ended) and qualitative (open-ended)

    data, integrates the two, and then draws interpretations based on the combined

    strengths of both sets of data to understand research problems.

    Mixed methods research is generally considered a fairly new methodology, as indicated by Harwell

    (2011), with its modern standards appearing during the early 1990s. No set, widely

    accepted definition of the mixed methods approach exists, due to the contentious nature of

    the methodology. The author agrees with the above definition however, as for it to be a

    methodology of its own, the approach cannot simply attach the methods together without

    any relevant linkage between them. Harwell (2011) posits that:

    other authors say a mixed methods study must have a mixed methods question,

    both qualitative and quantitative analyses, and integrated inferences

    This definition meshes effectively with the definition given by Creswell, and these are the

    definitions that the author agrees with. The opposite view is that a mixed

    methods approach is "any study with both qualitative and quantitative data" (Harwell,

    2011), which, in the author's opinion, is simply a study using both qualitative and quantitative

    methodologies as opposed to a new methodology in its own right.

    Advantages

    A mixed methods approach allows for the most in-depth and accurate data gathering of the

    three given methods due to the combination of both qualitative and quantitative

    approaches to form a complete picture (Creswell, 2015).

    Disadvantages

    The major disadvantage of a mixed methods study is that compared to both qualitative and

    quantitative studies, it is incredibly time-consuming. This is in part due to the large amount

    of planning that is required to effectively carry out a mixed methods study, and also due to

    the complex data analysis required to successfully integrate the qualitative and quantitative

    data (Punch, 2013).

    2.3 Chosen Research Methodology

    The chosen methodology for the purposes of this study was that of a quantitative approach,

    with some elements of the qualitative methodology as well. As the qualitative questions do not

    directly link with the quantitative questions proposed, it is not a mixed methods survey.

    A quantitative approach was chosen because gathering the demographics of genealogical

    researchers was of central importance and required a high volume of respondents; a

    qualitative approach would have been inadequate given the time

    required to carry out large-scale qualitative research. As two surveys were planned, a

    quantitative survey was also ideal due to its high replicability. This replicability allowed the

    author to carry out two surveys within the time allotted, one being a preliminary survey to

    gather a baseline result and so ensure that the final survey's results were not badly skewed.

    A quantitative approach was also chosen because the gathering of demographics, which is

    central to this study, requires a relatively large sample size. The

    quantifiable nature of the data is also helpful for the data analysis required to identify

    patterns in those who undertake research as it allows the data to be analysed quickly and

    thoroughly using programs such as Microsoft Excel.

    Whilst a qualitative survey would have allowed for a much more in-depth analysis of

    sections of the final project, for example, for investigating why genealogists carry out their

    research, it would have been far too time-consuming to carry out qualitative research to the

    volume and ultimately quality required for an accurate demographics result.

    The qualitative elements of the survey come in the form of open-ended questions at the end

    of both surveys. Each survey was split into two separate pages, with

    multiple choice questions on the first page, and text-based open-ended questions on the

    second page. Participants only advanced to the second page if they indicated that they

    had undertaken genealogical research at some point, as the questions on the second page

    require some knowledge of genealogy. This decision was made because the author did not

    want respondents closing the survey at the sight of text boxes, and it was reasoned that if

    respondents indicated that they were interested in genealogy, they would be

    more likely to answer said questions.

    To summarise this chapter, the author has investigated the various methodologies available

    to a researcher carrying out a study. Each of the three main approaches was evaluated,

    with the advantages and disadvantages of each method revealed and contrasted. This

    evaluation of methodologies was used to select an appropriate method for the

    study in question, with the reasons justified and outlined. The method chosen for the

    study was that of a quantitative approach, with elements of the qualitative method, as it

    allows the author to assess the demographics of researchers at the appropriate volume.

    3 Primary Research Analysis

    This chapter is intended to evaluate the primary research that has been gathered

    throughout the entire project timeline. The chosen format for the study was two separate

    quantitative surveys, created using instant.ly, issued online through various social media

    avenues such as Facebook and Reddit, and analysed using the inbuilt tools of instant.ly and

    Microsoft Excel. Some data is also displayed using tools made available by info.gram.

    There are three sections of data analysed in this chapter. The first section of data is the

    quantitative data gathered to identify the demographics of those who undertake

    genealogical research. The second section analyses both quantitative and qualitative data,

    with a heavy focus on investigating the privacy issues associated with genealogy and how

    participants feel about said privacy issues. Finally, the third section focuses mainly on the

    qualitative data gathered; this section is dedicated to understanding the underlying

    motivations and future of genealogy as a whole, with attention also paid to genealogy as a

    social venture.

    3.1 Demographics

    This section reviews and analyses the demographical data acquired through the two surveys

    issued for the purposes of this study. The author evaluates both surveys in their own right

    before comparing and contrasting them with each other in order to identify possible trends

    within the genealogical community. As will become evident, the study received a large

    contingent of responses from younger, single and childless participants. As such, the author

    will not state overall shares per demographic, as these are skewed by the lack of a normal

    distribution. Instead, only the percentage of each demographic that has

    undertaken research will be evaluated, to prevent incorrect results.
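The difference between a demographic's overall share and its rate of having undertaken research can be shown with a small sketch. The responses below are hypothetical and are chosen only to mimic a sample dominated by one group; they are not taken from either survey.

```python
from collections import defaultdict

# Hypothetical responses: (demographic group, has undertaken research).
responses = (
    [("single", True)] * 30 + [("single", False)] * 50
    + [("married", True)] * 15 + [("married", False)] * 5
)


def group_summary(rows):
    """Return {group: (share of all researchers, research rate within the group)}."""
    totals = defaultdict(int)
    researchers = defaultdict(int)
    for group, researched in rows:
        totals[group] += 1
        researchers[group] += int(researched)
    all_researchers = sum(researchers.values())
    return {
        group: (researchers[group] / all_researchers, researchers[group] / totals[group])
        for group in totals
    }


summary = group_summary(responses)
```

In this invented sample, single respondents account for two thirds of all researchers simply because there are more of them, yet only 37.5% of them have carried out research, against 75% of married respondents. Reporting the within-group rate avoids that distortion.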

    Preliminary Survey

    Introduction

    The preliminary survey was taken by 310 respondents across 24 different countries. It was

    distributed over the Internet using various social media such as Facebook and Reddit. The

    survey tool used was instant.ly and select data was displayed in the form of an infographic,

    using info.gram (see Appendix E: Data from Preliminary Survey). In total, 149 people

    completed the entire survey, having answered yes to having undertaken genealogical

    research and not leaving the survey prior to completion.

    The survey comprised nine closed-ended (quantitative) questions overall, with six being

    multiple choice radio buttons, two accepting numerical answers and one using a

    drop-down menu. Two qualitative questions were also asked, both requiring a textual

    answer. Seven of the quantitative questions were placed on the first page, with the other

    two quantitative and two qualitative questions asked on page two. All questions in the

    survey were mandatory.

    The survey was intentionally designed to have nothing but multiple-choice questions on the

    first page, to avoid participants deciding to end the survey without fully completing it.

    Unfortunately, once greeted with the two qualitative questions on page two, 17.68% of the

    participants who reached that page opted to leave the survey; however, this ultimately validated the author's

    decision to split the questionnaire into two pages, as the demographics of those 32

    respondents were still collected. A full transcript of the survey can be found in Appendix A:

    Preliminary Questionnaire. The average time taken to complete this survey was 1 minute

    and 48 seconds.
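The drop-out figure can be checked with some simple arithmetic. The sketch below assumes that every respondent who reached page two either completed the survey or left it, so that the 149 completions and the 32 departures together make up everyone who reached that page; this assumption is the present sketch's and is not stated in the survey data.

```python
completed = 149          # respondents who finished the whole survey (from the text)
left_on_page_two = 32    # respondents who left at page two (from the text)
total_respondents = 310  # all respondents to the preliminary survey (from the text)

# Assumption: everyone who reached page two either completed it or left.
reached_page_two = completed + left_on_page_two

dropout_rate = 100 * left_on_page_two / reached_page_two
research_rate = 100 * reached_page_two / total_respondents
```

On that assumption 181 respondents reached page two, which reproduces the 17.68% drop-out rate and implies that roughly 58% of all respondents reported having undertaken research.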

    The aim of this survey was primarily to gather a baseline demographic for genealogical

    researchers, to ensure that the final survey had something to compare to, in order to verify

    that the final survey was not skewed towards any one demographic, and to be able to make

    correct and valid correlations and conclusions based on the combined data. Appendix F:

    Preliminary Survey Demographics Results contains the raw quantitative response data

    from the preliminary survey, for reference purposes.

    Age

    The preliminary survey saw a large range of ages participating in the study, with participants

    as young as 14 and as old as 70, with the overall median age of respondents being 26.5 years

    old. For researchers, the minimum and maximum ages stayed the same, at 14 and 70 years

    old respectively; however, the median age increased by three and a half years to thirty.

    Appendix F: Preliminary Survey Demographics Results identifies the shares that each age

    group held in the survey.

    Figure 3.1 - Preliminary survey: Age vs. interest and research undertaken

    In Figure 3.1, above, the data shows a clear correlation between age and having undertaken

    research, with a visible turning point being reached when participants were above

    24 years of age. There is a somewhat similar correlation between interest in genealogy and age,

    although it is far less pronounced than with actually carrying out research.

    Gender

    There was a fairly even split of genders amongst respondents to the initial survey, with

    46.45% of respondents being male and 52.26% of respondents being female, with a further

    1.29% of participants not identifying as either male or female. There does not appear to be

    any relationship between gender and either being interested in genealogy or having

    researched a family tree, with a difference of 1 percentage point between males and females in

    interest (male: 85.42%, female: 86.42%) and a difference of 2.4 percentage points in having undertaken

    research (male: 60.42%, female: 58.02%). The author did not feel that the data on those who

    did not identify as either male or female was sufficient to produce a conclusive

    relationship, as only four respondents to the survey recorded themselves as such.

    Marital Status

    There was a significant skew in favour of those who were single, although this was

    unsurprising, given the aforementioned median age of 26.5. The percentage share per

    marital status is as follows:

    Single: 50.32%

    Living with a partner: 16.77%

    Married: 28.39%

    Separated: 0.65%

    Divorced: 3.23%

    Widowed: 0.65%

    As the data above shows, those who have never married comprise 67.1% of the survey

    participants, with the remaining 32.9% having married at some point in their lifetime.

    These are the two groups that the analysis will focus on, as the statuses within each group carry similar views.

    Table 1 - Preliminary survey: Marital status vs. interest and research undertaken

    As Table 1 shows, there is no significant correlation between interest

    in genealogy and a person's marital status. There is a trend between marital statuses and

    having carried out research, however. Those who have been married at some point in their

    lives were found to be 41.03% more likely to investigate their family genealogy than those

    who have never married. This was the strongest correlation of the preliminary survey. It

    arises from 72.55% of those who have been married indicating that they had undertaken research, in

    contrast to 51.44% of those who have never married.
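The figure of 41.03% is a relative increase, not a difference in percentage points, and the distinction can be made explicit with a short calculation. The counts used here are assumptions: they are back-calculated from the reported percentages and the 310 respondents, and are not quoted from the survey's raw data.

```python
def relative_increase(rate_a: float, rate_b: float) -> float:
    """Percentage by which rate_a exceeds rate_b, relative to rate_b."""
    return 100 * (rate_a / rate_b - 1)


# Assumed counts, back-calculated from the reported percentages.
ever_married = 74 / 102    # about 72.55% had undertaken research
never_married = 107 / 208  # about 51.44% had undertaken research

gap_in_points = 100 * (ever_married - never_married)
gap_relative = relative_increase(ever_married, never_married)
```

With these assumed counts the gap between the two groups is about 21 percentage points, which corresponds to the 41.03% relative increase reported above.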

    Children

    The prevalence of young, single participants leads to a large skew towards

    respondents having no children. 73.87% of respondents were childless, with only 26.13%

    having children. This disparity led the author to collate all respondents with children into a

    single category, as opposed to evaluating each separately. For the record, 10% had one child,

    10% had two children, 3.87% had three children, 0.97% had four children and 1.29% of

    participants had five or more children.

    Table 2 - Preliminary survey: Number of children vs. interest and research undertaken

    Table 2 indicates no correlation between interest in genealogy and

    the number of children that a person has, although there is certainly one with having

    undertaken research. In this instance, Table 2 shows that 54.59% of those without children

    had carried out some form of family research, and when the responses of all participants with

    children were collated, it was revealed that 69.14% of parents have researched their

    genealogy, a relative increase of 26.66%.

    Experience

    Participants in the survey were also asked when they had first begun to research their

    genealogies, as well as how many years of experience they had of research; this question

    was only asked if they had indicated that they had indeed carried out genealogical research

    at some point in their lives. The reason that two such similar questions were asked was

    to identify whether researchers measured their experience from their first piece of

    research or counted only continuous research.

    The median experience of respondents was five and a half years when calculated from the raw

    quantitative data as the difference between a respondent's age and the age at which they

    started their research. Many respondents answered that they had zero years of experience,

    which is made clear in Figure 3.2 on the following page, and the longest amount of

    experience was 60 years. Unfortunately, not all respondents answered this question correctly,

    with fourteen respondents giving unusable responses.

    The median years of experience of respondents, based on the bandings of experience

    given (0-1, 1-3, 3-5, 5-10, 10+), was three to five years. This does not differ much from the

    previous result, and the difference can be explained by the absence of the fourteen responses.
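
A median banding of this kind can be located by accumulating the band counts until the middle respondent is reached. The following is a minimal sketch of that procedure; the per-band counts are hypothetical, chosen only for illustration, and are not the survey data. The function name `median_band` is likewise an assumption.

```python
def median_band(bands, counts):
    """Return the label of the band that contains the median respondent."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses")
    # 1-based position of the median respondent. For an even total the median
    # straddles two respondents; this sketch returns the band of the upper one.
    midpoint = (total + 1) / 2
    running = 0
    for label, count in zip(bands, counts):
        running += count
        if running >= midpoint:
            return label


bands = ["0-1", "1-3", "3-5", "5-10", "10+"]
hypothetical_counts = [40, 35, 30, 25, 30]  # illustrative only, not survey data

print(median_band(bands, hypothetical_counts))
```

Because the result is a band rather than a single number, it can only be compared loosely with a median computed from raw ages, which is why the two results above are described as similar rather than equal.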

  • 37

    Figure 3.2 - Preliminary survey: Age of researchers vs. when they started researching

    Figure 3.2, above, shows a scatter graph of all respondents' ages versus when they started

    researching. It is clear that many participants had little experience, with 49% of respondents

    indicating that they had less than three years of experience. The largest contingent of

    respondents indicated that they had their first taste of genealogical research before the age

    of 18, with 34.81% of participants fitting into this section; this would indicate that many

    researchers had their first experience of research whilst still in school, perhaps as part of a

    school project.

  • 38

    Conclusion

    The preliminary survey revealed that a given person was 26.66% more likely to carry out

    research if they were a parent, 46.6% more likely to research their family tree if they were in

    a serious relationship involving cohabitation and generally more likely to carry out research

    as they age. It was found that the median amount of experience that researchers have is five

    and a half years, with the largest segment of researchers having first undertaken research

    before the age of 18. No significant correlation was found between gender and the chances

    of having carried out research. Interestingly, it was also revealed that 70.48% of respondents

    from the United States of America had carried out research but only 35.29% of participants

    from the United Kingdom had done so, a huge culture shift, with Canada also leaning

    towards the American result with 64.52% of their respondents also indicating that they had

    undertaken research.

  • 39

    Final Survey

    Introduction

    The final survey was taken by 420 unique people, across 40 different countries (outlined in

    Figure 3.3, below). It was distributed through the same channels and with the same tools as

    the preliminary survey, to minimise differences in respondent demographics and allow both

    studies to be compared adequately. The survey was created with instant.ly and

    distributed through Facebook, Reddit, The Student Room, and through emailing genealogical

    societies.

    Figure 3.3 - Final survey: The 40 countries that responded

  • 40

    The final survey was more in-depth than the preliminary survey, with a total question count

    of 16. There was a higher focus on qualitative questions throughout the survey, as it was

    primarily intended for use in identifying the privacy issues that researchers feel are present

    in genealogy since the advent of the Internet. Despite this, the survey also had a secondary

    objective of confirming the results identified in the preliminary survey.

    Following the completion percentage success of the preliminary survey, the author decided

    that it would be appropriate to follow that example, and limit qualitative questions to the

    second page of the questionnaire. The first page of the survey remained nearly identical to

    the preliminary survey, with the only change being the addition of age bandings to the

    question of respondents' ages, instead of asking for their specific age. It was reasoned that

    this change would increase participants' comfort levels with answering the survey, as it

    decreases the amount of identifiable data, without harming the data quality excessively.

    Ten of the questions were multiple-choice questions with radio buttons, with a further

    multiple-choice question using a dropdown box as an answer selector. One question

    consisted of a matrix of six sub-questions, asking respondents about their levels of worry

    in relation to each category. The other five questions were qualitative

    comment-based questions, with two of these being optional. A full transcript of the survey

    can be found in Appendix B: Final Questionnaire. The average time taken to complete the

    survey was 2 minutes and 6 seconds, 18 seconds longer than the preliminary survey.

    Unfortunately, the increased quantity of in-depth questions led to a much higher survey

    termination rate, with 41.18% of participants opting to end the survey upon viewing the

    second page of questions, an incredibly high ratio. This led to the final survey having fewer

    participants completing the entirety of the survey, with only 120 finishing the full survey, a

    decrease of 29 from the initial survey, despite there being an increase of 110 in the number

    of participants completing the first page of the survey.

    Appendix G: Final Survey Demographics Results presents the raw demographic data gathered

    through the lifetime of the survey's issuance, for reference.

  • 41

    Age

    As this survey asked respondents for their age banding as opposed to their specific age, a

    maximum and minimum age could not be acquired in this survey. However, the median age

    banding for respondents in the survey was 25-34, with the median age banding for

    researchers being the same.

    Figure 3.4 - Final survey: Age vs. interest and research undertaken

    Figure 3.4, above, shows a clear correlation between age and having carried out research.

    Interest levels instead follow a bell curve, although the author believes this to be due to

    the small sample size of those aged 65+. The jump in respondents

    having researched their family history once they reach the age of 25 is nothing short of

  • 42

    remarkable, with a 111% increase in the likelihood of carrying out research when moving

    from the 18-24 age group to the 25-34 age group.

    Gender

    There was a similar split in the final survey to the preliminary survey, with a slight increase in

    favour of females: 44.05% of respondents identified as male and 54.76% as female, with the

    remaining 1.19% identifying as neither male nor female. Whilst there was no relationship

    found between gender and being interested in genealogy, it was found that participants

    were 16.73% more likely to carry out research if they were female. Again, as with the

    preliminary survey, the author does not feel comfortable drawing conclusions about those

    who do not identify as either gender, as only five respondents out of 420 classified

    themselves as such.

    Marital Status

    Appendix G: Final Survey Demographics Results indicates that there was a large skew

    towards respondents classifying themselves as single, with 55.95% of participants identifying

    as such. Following the example set in the preliminary survey, the author again splits the

    demographics into two groups, those who have married (married, separated, divorced, and

    widowed) and those who have never married (single, living with a partner). A relationship

    was found between the likelihood of a given person being interested in genealogy based on

    their marital status, with people who had married being 20.5% more likely to be interested

    in genealogy than those who had never married, with the figures standing at 89.17% for

    those who had married and 74% for those who had not.

  • 43

    Figure 3.5 - Final survey: Marital status and undertaking of research

    Following the trend of genealogical interest, there was an extremely large correlation

    between marital statuses and the likelihood of having carried out research. Based on the

    aforementioned groupings, it was found that those who had married were 97.37% more

    likely to have carried out research than those who had never married. Figure 3.5, above,

    shows the percentage of participants having undertaken research amongst all of the

    individual marital statuses. When grouped into having married and never married, the figure

    for those who had undertaken research stood at 75% of those who had married, in contrast

    to 38% of those who had never married.

  • 44

    Children

    As with the preliminary survey, there is a large skew towards childless respondents, with

    78.57% of participants stating that they have no children, with every other option under

    10%. Full listings of shares can be found in Appendix G: Final Survey Demographics

    Results.

    Table 3 - Final survey: Number of children vs. interest and research undertaken

    There is a direct correlation between people being a parent and being interested in

    genealogy, with a parent being 13.95% more likely to be interested in genealogy than a

    childless respondent is. In addition, there is a clear relationship between parenthood and the

    likelihood of a given person having carried out genealogical research, as shown in Table 3,

    above. Amongst participants, 67.78% of parents indicated that they had undertaken

    research, whilst only 43.33% of childless respondents had done so. This revealed that a

    respondent was 56.41% more likely to research their family tree if they were a parent.

  • 45

    Conclusion

    The final survey confirmed many of the conclusions of the preliminary survey. A direct

    correlation was found between age and the likelihood of a given person carrying out

    genealogical research, with a noticeably large jump of 111% when moving between the age

    brackets 18-24 and 25-34. A relationship was found between gender and the percentage of

    respondents undertaking research; it was found that females are 16.73% more likely to carry

    out research than males. The final survey also found that a given person is 20.5% more likely

    to be interested in genealogy if they have been married at one point in their lives, regardless

    of whether they remain married. This cross-section of people is also 97.37% more likely to

    have researched their family history than those who had never married, according to the

    findings of this study. Finally, it was also found that a given participant was 13.95% more

    likely to be interested in genealogy and 56.41% more likely to have carried out some form of

    genealogical research if they were parents.

  • 46

    Comparison and Conclusions

    One conclusion that can be drawn from the surveys is that respondents were reluctant to

    complete surveys that required them to answer in the form of a text box, justifying the

    author's decision to split the surveys into two separate pages. The data supports this

    hypothesis: 17.68% of participants decided to end the survey upon seeing the second

    page of the preliminary survey, which included two comment-based questions, whilst the

    final survey, with five qualitative questions, saw 41.18% of the respondents close the page

    upon seeing the second page of the survey.
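
The two termination rates can be sanity-checked with simple arithmetic. The sketch below assumes the rate is the share of respondents reaching the second page who did not go on to finish. The counts of 181 and 204 respondents reaching the second page are not stated in the text; they are back-calculated here from the reported rates and the 149 and 120 completions, and should be treated as assumptions.

```python
def termination_rate(reached_page_two: int, completed: int) -> float:
    """Percentage of respondents who reached page two but did not finish."""
    if reached_page_two <= 0:
        raise ValueError("at least one respondent must reach page two")
    return (reached_page_two - completed) / reached_page_two * 100.0


# Completions come from the text (120 final, 29 more in the preliminary survey).
# The page-two counts are back-calculated assumptions, not reported figures.
preliminary = termination_rate(reached_page_two=181, completed=149)
final = termination_rate(reached_page_two=204, completed=120)

print(f"preliminary: {preliminary:.2f}%  final: {final:.2f}%")
```

Under these assumed counts the rates come out at 17.68% and 41.18%, matching the figures quoted above.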

    Figure 3.6 - Comparison: Percentage of each age group that had undertaken research

  • 47

    There is an incredibly strong correlation between age and the chances of a given person

    having undertaken genealogical research. This is consistent with the findings of Hill (2011),

    who identified that the majority of Archives.com users were aged over 45 years old.

    There is a peak in the 45-54 age banding before a drop in the 55-64 banding on both surveys in

    Figure 3.6 on the preceding page, more noticeably on the preliminary survey. The author

    believes that this is due to the 1977 release of the TV mini-series Roots by Alex Haley, when

    the participants in question would have been between the ages of 6 and 16, school ages,

    where they could have been asked to undertake research as part of a school project.

    Table 4 - Preliminary survey: Age vs. years of experience of researchers

    Table 4, above, and Table 5, below, certainly support this theory as there is a noticeable

    jump in experience of those aged between 45 and 54 years old, indicating that they could

    well have had their first taste of research in school, around the time of Roots' release.

    Table 5 - Final survey: Age vs. feeling of experience of researchers

  • 48

    There is also a marked drop-off for 45-54 year olds in the "Very experienced" option,

    which suggests that they undertook their research in school many

    years ago and thus do not regard themselves as very experienced. The discrepancy between

    the data may be due to participants feeling that the original experience question in the

    preliminary survey was based on when they first undertook research as opposed to

    continuous research. This hypothesis is further supported by two participants in the age

    bracket answering that they had carried out research as part of a school project and one

    48-year-old respondent, who, when asked why they started researching their family tree,

    answered:

    I saw "Roots" and wanted to learn where my roots were.

    Both surveys indicated that it was much more likely for a given person to have carried out

    research if they were or had been married, with 72.55% in the preliminary survey and 75% in

    the final survey of those who were or had been married having carried out research. This is in

    sharp contrast to the 51.44% in the preliminary and 38% in the final surveys, respectively, of

    those who had never been married. The final survey also indicated that those who had been

    married were also 20.5% more likely to be interested in genealogy.

    This indicated that a given person would be 41.03% and 97.37% more likely to have carried

    out genealogical research, in the preliminary and final surveys respectively, if they were or

    had been married. The author

    believes that the discrepancy between the results of both surveys is owed in a large part to a

    younger contingent of respondents answering the final survey, with 41% of respondents to

    the preliminary survey being under 25 years of age, and 49% being of said age banding in the

    final survey. This is significant due to the previously identified age vs. likelihood of having

    carried out research correlation, which revealed that participants became more likely to

    have researched their ancestors the older they became.

  • 49

    Figure 3.7 - Comparison - Children: Percentage of participants who had researched

    A similar trend was noticed with likelihood of researching and having children, as evidenced

    in Figure 3.7, above. In the preliminary survey, it was determined that a given person would

    be 26.66% more likely to carry out research if they were a parent. The difference was much

    more marked in the final survey, with people being 56.41% more likely to have researched

    their ancestors if they had children. This was a much bigger discrepancy than that of the

    relationship correlation and it cannot be fully accounted for by the age difference. This leads

    the author to believe that whilst there is certainly a large correlation, it is not as pronounced

    as the final survey indicated.

    There was not enough of a difference over both surveys for the author to assert a

    correlation between gender and having carried out research. There was a negligible

    difference in the preliminary survey of less than one percent, whilst in the final survey the

    data suggested that females were 16.73% more likely to carry out research. This fits with

    Hill's (2011) findings that the majority of Archives.com users were female. Despite this, the

    reason the author does not draw a correlation is due to the final survey seeing a significantly

    younger male population than female population, with nearly 55% of males surveyed being

    under the age of 25 and just over 44% of females also fitting into that age bracket, which

    accounts for the difference.

  • 50

    Although there were not enough respondents from all countries for the author to assert a

    trend overall, there were enough respondents from both the United States and United Kingdom

    to draw a conclusion. Respondents were nearly twice as likely to have carried out some form

    of research if they were from the United States as opposed to the United Kingdom.

    It was revealed that a genealogical researcher is most likely to be aged over 45 years old,

    married with children and living in the United States. All of those in that cross-section over

    both surveys answered that they had indeed carried out research. Age proved to be the least

    important factor of the four, as 94% of Americans who are married with children and

    answered the survey indicated that they had undertaken genealogical research.

    To summarise, correlations were found between the likelihood of a given person

    researching their ancestors and their relationship status, age and whether they had children.

    It was found that people are significantly more likely to have carried out research as they

    grow older, and that they are much more likely to have researched their family history if

    they have had children or are in a serious relationship involving cohabitation of some form,

    i.e. being married or living with their partner.

  • 51

    3.2 Privacy

    This part of the report is dedicated to evaluating the various privacy questions asked in the

    final survey. Firstly, respondents' perceptions of privacy concerning genealogy will be

    investigated, followed by analysing what privacy issues they are most worried about with

    genealogy on the Internet specifically. Finally, social media's role in the privacy argument will

    be examined and quantified.

    Perceptions

    Participants in the survey were asked to complete a matrix question that required them to

    select how worried they were about six different factors in genealogy on a five-category

    scale. These factors were accuracy of genealogical data in general, accuracy of genealogical

    data on the Internet, identity theft, privacy of living people, security of data and others

    profiting from your data.

    The results of this question can be found on the following page in Table 6. These results

    show that the factor that the respondents were most worried about was accuracy of

    genealogy on the Internet, with 51% of those questioned answering that they were either

    worried or very worried. It is evident that the data being online is the major issue, as 36%

    noted that they were either worried or very worried about genealogical data's accuracy in

    general, 15 percentage points fewer than for the option related to the Internet. Not all

    respondents thought that this digital information was of poor accuracy, however, with one

    respondent with intimate knowledge of the digitisation process stating:

    Having been involved in transcribing/digitizing records for FamilySearch.org, I don't

    worry much about accuracy of digitized records. It's what people do or do not do with

    those records when building a tree that concerns me.
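
The gap between the 51% and 36% figures above can be expressed in two ways, and the two should not be confused. The following is a minimal sketch contrasting the percentage-point gap with the relative gap; the function names are illustrative assumptions.

```python
def point_gap(share_a: float, share_b: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return share_a - share_b


def relative_gap(share_a: float, share_b: float) -> float:
    """Difference between two percentages, as a percentage of share_b."""
    if share_b <= 0:
        raise ValueError("baseline share must be positive")
    return (share_a - share_b) / share_b * 100.0


worried_online = 51.0   # % worried or very worried about accuracy on the Internet
worried_general = 36.0  # % worried or very worried about accuracy in general

print(f"{point_gap(worried_online, worried_general):.0f} percentage points")
print(f"{relative_gap(worried_online, worried_general):.1f}% relative difference")
```

The 15 quoted above is the first of these measures, whereas the "more likely" figures in section 3.1 are the second.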

  • 52

    It is not surprising that identity theft was the lowest ranked factor with only 22% of

    respondents agreeing that it is definitely something to worry about, and 41% stating that

    they are not worried at all by it due to the non-sensitive nature of the data available,

    although not all respondents agreed.

    We are close to the point of too much information being available. I was a victim of

    ID theft. It is very easy to get the minimum information required for ID theft.

    The author believes that most of the information that people would be worried about being

    mined from genealogical data, such as dates of birth and mothers' maiden names, is

    generally made freely available by people on social media anyway through poor privacy

    settings. Surprisingly, the category that saw the fewest participants indicate that

    they were very worried about it was the accidental release of data pertaining to living

    people. The author feels that it is the most dangerous section, in that it can be potentially

    very harmful, whilst also being much more likely to occur than identity theft, and can, in fact,

    lead to identity theft. The author feels that this is especially surprising given previous

    instances of this occurring, e.g. IrishGenealogy.ie's accidental release (Edwards, 2014).

    Table 6 - Final survey: Privacy matrix question

  • 53

    It was found that males are less likely to be worried about the factors regarding genealogy's

    privacy issues, with 35.32% of male participants stating that they were not worried, whilst

    only 21.27% of females said the same. Women were also nearly twice as likely to be

    indecisive about their feelings on the subject, with 15.35% stating they were not sure,

    compared with only 7.94% of male participants. There was a slight skew in

    favour of women being more worried overall, but it was not as significant as the other

    relationships.

    As Table 7, below, shows, there is no clear correlation between age and participants'

    comfort levels regarding the various privacy factors, and no conclusions can be drawn from

    the below data.

    Table 7 - Final survey: Privacy matrix vs. age

    The study revealed that a given participant was much more likely to be worried about the

    factors given if they were not a parent. Of the childless respondents, 37.86% responded that

    they were either worried or very worried, whilst only 28.2% of parents indicated similarly.

    There was a similar, but not as significant, trend for classifying themselves as not worried

    about the six factors, with nearly 7 percentage points more stating that they were not worried

    if they had children: 30.34% of parents revealed that they were not worried, whereas 23.66% of

    childless respondents said the same.

  • 54

    The survey concluded that those who had never married were much more likely to be worried

    than those who had previously married. Individual statistics can be found below in Table 8.

    With the exception of those who are separated, which could be due to a low sample size,

    and widowed, who had zero respondents, those who are currently married are the least

    likely to be worried about any of the six factors, with only 7% stating that they were very

    worried about the factors given.

    Table 8 - Final survey: Privacy matrix vs. marital status

    To summarise, the study concluded that certain demographics were more prone to worr