Databases for Renaissance and Early Modern Sources

Page 1: Databases for Renaissance and Early Modern Sources

Databases for Renaissance and Early Modern Sources

Session Tutor: Sarah Richardson

sarah.richardson@warwick.ac.uk

Page 2: Databases for Renaissance and Early Modern Sources

Using Databases

Databases may be used in a number of ways to support your research:

  • Bibliography (see later sessions)
  • For simple lists
  • To analyse complex sources

Page 3: Databases for Renaissance and Early Modern Sources

Overview

Source assessment and data-modelling

  • The challenge of sources
  • How will relational databases help?
  • Source analysis
  • Database design and creation
  • Free text databases
  • Methodological issues

Page 4: Databases for Renaissance and Early Modern Sources

Challenges

  • Unstructured source material
  • Missing data
  • Complications with numbers and dates
  • Data comes from more than one source

Page 5: Databases for Renaissance and Early Modern Sources

Databases should look like this?

Voter ID  First Name  Surname  Address  Occupation  Voting Preference
001       John        Smith    Halifax  Butcher     X
002       John        Smith    Halifax  Butcher     X
003       John        Smith    Halifax  Butcher     X
004       John        Smith    Halifax  Butcher     X

Labels on the table:

  • Unique identifier or primary key (the Voter ID)
  • Column, field or attribute
  • Row or record
  • Field name or attribute name (the column heading)
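As a minimal sketch of the table above, the following uses Python's built-in sqlite3 module. The table name `voter`, the column names and the helper `insert_voter` are assumptions made for illustration; the slide does not name any particular software. The Voter ID is stored as text so that leading zeros such as 001 survive.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE voter (
        voter_id          TEXT PRIMARY KEY,  -- unique identifier / primary key
        first_name        TEXT,              -- each column is a field (attribute)
        surname           TEXT,
        address           TEXT,
        occupation        TEXT,
        voting_preference TEXT
    )
    """
)

# Each tuple is one row (record), taken from the slide.
rows = [
    ("001", "John", "Smith", "Halifax", "Butcher", "X"),
    ("002", "John", "Smith", "Halifax", "Butcher", "X"),
]
conn.executemany("INSERT INTO voter VALUES (?, ?, ?, ?, ?, ?)", rows)


def insert_voter(record):
    """Return True if the record was stored, False if its ID is already in use."""
    try:
        conn.execute("INSERT INTO voter VALUES (?, ?, ?, ?, ?, ?)", record)
        return True
    except sqlite3.IntegrityError:
        return False


# Made-up record that reuses ID 001: the primary key rejects it.
duplicate_accepted = insert_voter(("001", "Jane", "Smith", "Halifax", "Baker", "Y"))
voter_count = conn.execute("SELECT COUNT(*) FROM voter").fetchone()[0]
```

The primary key is what lets the database tell apart two voters who are otherwise identical on every field, as in the rows above.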

Page 6: Databases for Renaissance and Early Modern Sources

But what do you do with this?

[Image: letter from the Medici Granducal Archive]

Page 7: Databases for Renaissance and Early Modern Sources

From Source to Database

Frankpledge: original source from The National Archives translated into the Thame Database

Page 8: Databases for Renaissance and Early Modern Sources

How will relational databases help?

  • A relational database is a database created with many tables linked together
  • Each table has a common factor which links it to others in the database
  • For complex sources a number of tables may be created to deal with different aspects of the data

Page 9: Databases for Renaissance and Early Modern Sources

Relational model

Sentence Table: Defendant ID, Case Number, Verdict, Sentence, Comments

Offences Table: Defendant ID, Case Number, Offence Type, Place of Offence, Date of Offence, Description, Comments

Occupational Categorisation Table: Occupation Title, Occupational Categorisation 1, Occupational Categorisation 2

Witnesses Table: Case Number, Witness 1 First name, Witness 1 Surname, Witness 1 Address, Witness 1 Sex, Witness 2 First name, Witness 2 Surname, Witness 2 Address, Witness 2 Sex, Comments

Defendant Table: Defendant ID, First name, Surname, Address, Age, Sex, Occupation Title, Comments
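A cut-down sketch of three of these tables shows how the shared fields (Defendant ID and Case Number) link them. It uses Python's sqlite3 module. The snake_case names, the reduced set of columns and the sample values are assumptions for illustration, not data from any source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE defendant (
        defendant_id     INTEGER PRIMARY KEY,
        first_name       TEXT,
        surname          TEXT,
        occupation_title TEXT
    );
    CREATE TABLE offence (
        defendant_id INTEGER REFERENCES defendant(defendant_id),
        case_number  INTEGER,
        offence_type TEXT
    );
    CREATE TABLE sentence (
        defendant_id INTEGER REFERENCES defendant(defendant_id),
        case_number  INTEGER,
        verdict      TEXT,
        sentence     TEXT
    );
    -- Made-up sample rows.
    INSERT INTO defendant VALUES (1, 'John', 'Smith', 'Butcher');
    INSERT INTO offence   VALUES (1, 101, 'Theft');
    INSERT INTO sentence  VALUES (1, 101, 'Guilty', 'Fine');
    """
)

# The common fields are what allow the tables to be queried together.
query = """
    SELECT d.surname, o.offence_type, s.verdict
    FROM defendant AS d
    JOIN offence  AS o ON o.defendant_id = d.defendant_id
    JOIN sentence AS s ON s.defendant_id = o.defendant_id
                      AND s.case_number  = o.case_number
"""
linked = conn.execute(query).fetchall()
```

Here `linked` holds one combined row per defendant, offence and sentence, even though each fact is stored only once in its own table.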

Page 10: Databases for Renaissance and Early Modern Sources

A more complex relational database

Page 11: Databases for Renaissance and Early Modern Sources

Source analysis

  • Data should be broken down into components that collect groups of information into objects or events.
  • For example, information relating to a person, an organisation, a document, an object or a building, or to events such as a marriage, a transaction, the making of a will, or an election.
  • In database terminology these are referred to as entities.
  • Each entity will form a table in the final database.

Page 12: Databases for Renaissance and Early Modern Sources

Attributes

  • Once each entity has been identified, list the data associated with each.
  • For example, the Defendant table has information on the first name, surname, address, age, sex and occupation of each defendant.
  • This information will produce the fields for each table.
  • The fields are also known as attributes.

Page 13: Databases for Renaissance and Early Modern Sources

Field types

  • Text: for alphabetical or numerical data, but beware that numbers will be treated like text if you choose this data type.
  • Numbers: for all numbers, but you may wish to use one of the types below for currency/dates.
  • Date/Time: for dates and/or times.
  • Currency: in most commercial database software this is applicable only to modern currency.
  • AutoNumber: allocates a unique identifier to each record. It is useful for ID fields.
  • Memo: for fields containing much unstructured information. Useful for comments fields.
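The warning about numbers stored as text can be seen in a small sketch using Python's sqlite3 module. The table `person` and its two columns are assumptions for illustration: the same three ages are stored once as text and once as numbers, then sorted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The same ages held in a text field and in a number field.
conn.execute("CREATE TABLE person (age_text TEXT, age_number INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("9", 9), ("21", 21), ("100", 100)],
)

# Text is compared character by character, so "100" sorts before "21".
text_order = [row[0] for row in conn.execute(
    "SELECT age_text FROM person ORDER BY age_text")]

# Numbers are compared by value.
number_order = [row[0] for row in conn.execute(
    "SELECT age_number FROM person ORDER BY age_number")]
```

The text field returns the ages in the order 100, 21, 9, while the number field returns 9, 21, 100, which is why the choice of field type matters before any sorting or calculation.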

Page 14: Databases for Renaissance and Early Modern Sources

Issues for field types

  • Size
  • Calculations
  • Dates
  • Currency
  • Unstructured data
  • Unique identifiers

Page 15: Databases for Renaissance and Early Modern Sources

Relationships

  • One-to-one relationships: records in one table have only one match with records in a second table.
  • One-to-many relationships: records in the first table match many in the second, but those in the second table only have one match.
  • Many-to-many relationships: records from both tables have relationships between them.
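A many-to-many relationship is commonly handled with a third linking table. That technique is not named on the slide, so the sketch below is one possible approach under that assumption, written with Python's sqlite3 module. The tables `court_case`, `witness` and `case_witness` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE court_case (case_number INTEGER PRIMARY KEY);
    CREATE TABLE witness (witness_id INTEGER PRIMARY KEY, surname TEXT);

    -- Linking table: one row per (case, witness) pairing.
    CREATE TABLE case_witness (
        case_number INTEGER REFERENCES court_case(case_number),
        witness_id  INTEGER REFERENCES witness(witness_id),
        PRIMARY KEY (case_number, witness_id)
    );

    -- Made-up sample rows.
    INSERT INTO court_case VALUES (101), (102);
    INSERT INTO witness VALUES (1, 'Ayres'), (2, 'Morris');
    INSERT INTO case_witness VALUES (101, 1), (101, 2), (102, 1);
    """
)

# One case has many witnesses ...
witnesses_in_101 = [row[0] for row in conn.execute(
    """SELECT w.surname FROM witness AS w
       JOIN case_witness AS cw ON cw.witness_id = w.witness_id
       WHERE cw.case_number = 101 ORDER BY w.surname""")]

# ... and one witness may appear in many cases.
cases_for_ayres = [row[0] for row in conn.execute(
    """SELECT cw.case_number FROM case_witness AS cw
       JOIN witness AS w ON w.witness_id = cw.witness_id
       WHERE w.surname = 'Ayres' ORDER BY cw.case_number""")]
```

Compared with fixed "Witness 1" and "Witness 2" columns, this layout does not limit how many witnesses a case can have.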

Page 16: Databases for Renaissance and Early Modern Sources

Data entry tips

  • Fields may be designated as ‘required’.
  • Default values may be entered.
  • Use the tool to allow one of only two options to be entered such as Yes/No, True/False, Guilty/Not Guilty.
  • ‘Look-up’ tables: a fixed list of values that may be entered into a particular field.
  • Validation rules.
  • Automatic generation of unique numbers.
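Most of these tips have a rough equivalent in SQL constraints. The sketch below uses Python's sqlite3 module; the table names, the age range used as a validation rule and the default place are all assumptions for illustration. Desktop database software exposes the same ideas through its own design screens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces look-ups when asked
conn.executescript(
    """
    -- Look-up table: the fixed list of permitted verdicts.
    CREATE TABLE verdict_lookup (verdict TEXT PRIMARY KEY);
    INSERT INTO verdict_lookup VALUES ('Guilty'), ('Not Guilty');

    CREATE TABLE trial (
        trial_id INTEGER PRIMARY KEY AUTOINCREMENT,        -- automatic unique number
        surname  TEXT NOT NULL,                            -- 'required' field
        place    TEXT DEFAULT 'Halifax',                   -- default value
        age      INTEGER CHECK (age IS NULL OR age BETWEEN 0 AND 120),  -- validation rule
        verdict  TEXT REFERENCES verdict_lookup(verdict)   -- one of two options
    );
    """
)


def try_insert(surname, age, verdict):
    """Return True if the row passes every rule, False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO trial (surname, age, verdict) VALUES (?, ?, ?)",
                (surname, age, verdict),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("Smith", 40, "Guilty")
missing_required = try_insert(None, 40, "Guilty")      # no surname
failed_validation = try_insert("Smith", 400, "Guilty")  # implausible age
not_in_lookup = try_insert("Smith", 40, "Maybe")        # verdict not in the list
stored = conn.execute("SELECT trial_id, place FROM trial").fetchall()
```

Only the first row is stored. It receives an ID automatically and the default place, without either being typed in.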

Page 17: Databases for Renaissance and Early Modern Sources

Free Text Databases

  • Free text databases search unstructured texts and images provided in digital form.
  • They work by ‘tagging’ the text in a mark-up language (e.g. HTML, XML, SGML). In the past users had to do this. Now most programs will do it for you.
  • The database may then be searched in a number of ways: full-text; wildcard searches with * and ?; Boolean searches (AND, OR, and NOT); proximity searches; numeric searches (>, <, >=, <=, <>); date searches; fuzzy searches.
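Three of these search types can be imitated with Python's standard library, which may help make the terms concrete. This is only a sketch: real free text databases use their own indexes and query syntax, and the three sample texts, the function names and the fuzzy cut-off of 0.75 are assumptions for illustration.

```python
import difflib
import fnmatch
import re

# Made-up sample texts standing in for transcribed documents.
documents = {
    "doc1": "John Smith butcher of Halifax voted for the Whig candidate",
    "doc2": "Jon Smyth of Halifax was a witness in the case",
    "doc3": "Mary Ayres widow made her will",
}


def words(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z]+", text.lower())


def wildcard_search(pattern):
    """Documents with a word matching a pattern using * (any run) or ? (one letter)."""
    return sorted(name for name, text in documents.items()
                  if fnmatch.filter(words(text), pattern.lower()))


def boolean_and_not(include, exclude):
    """Boolean search: documents containing `include` AND NOT `exclude`."""
    return sorted(name for name, text in documents.items()
                  if include in words(text) and exclude not in words(text))


def fuzzy_search(term, cutoff=0.75):
    """Documents with a word that is similar, but not necessarily equal, to `term`."""
    return sorted(name for name, text in documents.items()
                  if difflib.get_close_matches(term.lower(), words(text),
                                               n=1, cutoff=cutoff))
```

For example, `wildcard_search("sm?th")` and `fuzzy_search("smith")` both find doc1 and doc2, catching the Smith/Smyth spelling variants, while `boolean_and_not("halifax", "witness")` finds only doc1.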

Page 18: Databases for Renaissance and Early Modern Sources

Zotero

Zotero is an easy-to-use yet powerful research tool that helps you gather, organize, and analyze sources (citations, full texts, web pages, images, and other objects), and lets you share the results of your research in a variety of ways. It stores author, title, and publication fields and exports that information as formatted references. It also has the ability to interact, tag, and search in advanced ways.

http://www.zotero.org/

Page 19: Databases for Renaissance and Early Modern Sources

http://www.zotero.org/

For anyone who writes with footnotes, Zotero is a fabulous tool. With a click of a mouse, it imports catalogue records from a library database, or JSTOR, or even Amazon, allowing a scholar to create a personal reference database on his desktop. Better still, it permits extensive annotations, keyword tagging, and hyperlinks both to other items in the database and to external materials. Some users know that it can catalogue images, too, pulling metadata from Flickr. If you already run Zotero and need to work with images, try it. The possibilities are mind-bending for those of us who work with visual resources.

Page 20: Databases for Renaissance and Early Modern Sources

Old Bailey Online

http://www.oldbaileyonline.org/

Page 21: Databases for Renaissance and Early Modern Sources

Methodological Issues

  • Nominal record linkage
  • Coding
  • Occupational analysis
  • Prosopography
  • Community reconstruction

Page 22: Databases for Renaissance and Early Modern Sources

Nominal Record Linkage

  • Concerns all historians using data containing names.
  • How do we determine that sources relate to the same person and not another person with the same name?
  • Particularly difficult for early modern sources where names are not fixed.
  • Two problems:
      • The existence of multiple common names. This problem is particularly acute in local communities where certain surnames are dominant.
      • Variation in spellings.

Page 23: Databases for Renaissance and Early Modern Sources

Solutions

  • Coding surnames using standardisation schemes, e.g. SOUNDEX or FISKs
  • Using multiple passes through the data, changing variables each time as the data is matched
  • Using a combination of computer and manual techniques
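The multi-pass idea can be sketched in Python as follows. This is an illustration under stated assumptions rather than an established linkage procedure: the two passes, the key functions `strict_key` and `loose_key`, and the sample records are all hypothetical, and every record is assumed to have a non-empty forename and surname. Only unambiguous matches are linked automatically; everything else is left for manual checking.

```python
def strict_key(record):
    """Pass 1: forename and surname must agree exactly (ignoring case)."""
    return (record["forename"].lower(), record["surname"].lower())


def loose_key(record):
    """Pass 2: forename initial plus the surname with later vowels and y removed."""
    surname = record["surname"].lower()
    skeleton = surname[0] + "".join(c for c in surname[1:] if c not in "aeiouy")
    return (record["forename"][0].lower(), skeleton)


def link_records(source_a, source_b, passes):
    """Run each pass in turn over the records that are still unmatched."""
    links = []
    unmatched_a, unmatched_b = list(source_a), list(source_b)
    for key in passes:
        index = {}
        for record in unmatched_b:
            index.setdefault(key(record), []).append(record)
        remaining_a = []
        for record in unmatched_a:
            candidates = index.get(key(record), [])
            if len(candidates) == 1:  # link only when there is exactly one candidate
                match = candidates.pop()
                unmatched_b.remove(match)
                links.append((record["id"], match["id"], key.__name__))
            else:                     # none or several: leave for a later pass
                remaining_a.append(record)
        unmatched_a = remaining_a
    return links, unmatched_a, unmatched_b


# Made-up records from two hypothetical sources.
poll_book = [
    {"id": "P1", "forename": "John", "surname": "Smith"},
    {"id": "P2", "forename": "Thomas", "surname": "Smyth"},
    {"id": "P3", "forename": "William", "surname": "Ayres"},
]
rate_book = [
    {"id": "R1", "forename": "John", "surname": "Smith"},
    {"id": "R2", "forename": "Tho", "surname": "Smith"},
    {"id": "R3", "forename": "Mary", "surname": "Porter"},
]

links, left_a, left_b = link_records(poll_book, rate_book, [strict_key, loose_key])
```

Each link records which pass produced it, so that matches made on the looser key can be reviewed by hand. Where two people share a name, a key has several candidates and no automatic link is made.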

Page 24: Databases for Renaissance and Early Modern Sources

SOUNDEX

Number  Represents the Letters
1       B, F, P, V
2       C, G, J, K, Q, S, X, Z
3       D, T
4       L
5       M, N
6       R

Page 25: Databases for Renaissance and Early Modern Sources

SOUNDEX rules

  • Names With Double Letters: If the surname has any double letters, they should be treated as one letter.
  • Names with Letters Side-by-Side that have the Same SOUNDEX Code Number: should be treated as one letter. For example, Jackson (c, k and s) or Schmidt (d and t).
  • Names with Prefixes: such as Van or De should be coded twice, with and without the prefix.
  • Consonant Separators: If a vowel (A, E, I, O, U) separates two consonants that have the same SOUNDEX code, the consonant to the right of the vowel is coded.
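The table and rules above can be sketched in Python. Two points are assumptions drawn from the common American Soundex convention rather than from these slides: the code is the first letter plus three digits, padded with zeros, and the letters H and W are skipped without acting as separators. The function names and the prefix list are illustrative.

```python
# Letter-to-digit table from the SOUNDEX slide.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(surname):
    """Return the first letter plus three digits, e.g. 'J250' for Jackson."""
    letters = [c for c in surname.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # assumption: H and W do not separate equal codes
        code = CODES.get(letter, "")
        if code and code != previous:
            digits.append(code)  # new sound, so record it
        # Doubles and side-by-side letters share `previous` and are skipped;
        # a vowel resets it, so a repeated code after a vowel is coded again.
        previous = code
    return (first + "".join(digits) + "000")[:4]


PREFIXES = ("VAN ", "DE ")  # illustrative list of prefixes


def soundex_variants(surname):
    """Code a surname with its prefix and, if it has one, again without it."""
    codes = [soundex(surname)]
    for prefix in PREFIXES:
        if surname.upper().startswith(prefix):
            codes.append(soundex(surname[len(prefix):]))
    return codes
```

With this sketch Jackson is coded J250 and Schmidt S530, and the variants Smith and Smyth receive the same code, which is the property that makes SOUNDEX useful for record linkage.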

Page 26: Databases for Renaissance and Early Modern Sources

Problems with SOUNDEX

  • Does not work so well for European names. Works best with names of English origin
  • Does not work as well with early modern names and spelling variants
  • One solution for early modern historians is FISK

Page 27: Databases for Renaissance and Early Modern Sources

Four Letter Initial Surname Codes (FISK)

  • Consists of letters and punctuation marks
  • Generated from first letter of a surname variant plus up to three further consonants from the surname.
  • Vowels only used when they are the first letter of the surname
  • A full stop is used where no second, third or fourth letter is available for use.

Page 28: Databases for Renaissance and Early Modern Sources

If surname variants are deduced to be of the same surname base these names are considered to form a distinct surname group and the same FISK is allocated. Thus:

  • Eyres is coded as ARS. (group Ayres)
  • Morrice is coded as MRS. (group Morris)
  • Bowyer is coded with Boyer and Springall with Springold.
  • Davies and Davidson are placed in one group.
  • ap Howell is included in the group Powell

Page 29: Databases for Renaissance and Early Modern Sources

Five letter FISKs

  • Used to differentiate between similar but distinct surname groups.
  • Fifth letter would normally be a distinctive letter from the end of the surname, but any letter could be used, and often a vowel from the start of the surname would be convenient.
  • To distinguish Partridge from Porter (FISK = PRTR) an additional letter g is added to make the new FISK for Partridge (PRTRG). The code for Porter remains as (PRTR).
  • To distinguish Bailey from Bloy (FISK = BLY.) an additional letter y is added to make the new FISK for Bailey (BLY.Y). The code for Bloy remains as (BLY.).
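The FISK rules on these slides can be partly sketched in Python, but only as a first pass: allocating variants to surname groups is a matter of the historian's judgement, so the sketch keeps those decisions in hand-made tables. The function names and both tables are hypothetical. Two further assumptions are made to reproduce the slide examples: doubled letters count once (Morris gives MRS.) and Y counts as a consonant (Bloy gives BLY.). The second assumption does not fit the Ayres example, which is one reason the group table is needed.

```python
VOWELS = set("AEIOU")


def provisional_fisk(surname):
    """First letter plus up to three further consonants, padded with full stops."""
    letters = [c for c in surname.upper() if c.isalpha()]
    if not letters:
        raise ValueError("surname contains no letters")
    code = [letters[0]]
    previous = letters[0]
    for letter in letters[1:]:
        if len(code) == 4:
            break
        # Assumption: a doubled letter is used once; Y is treated as a consonant.
        if letter not in VOWELS and letter != previous:
            code.append(letter)
        previous = letter
    return "".join(code).ljust(4, ".")


# Hand-made decisions, taken from the examples on the slides.
GROUP_CODES = {"EYRES": "ARS.", "AYRES": "ARS.", "MORRICE": "MRS."}
FIFTH_LETTERS = {"PARTRIDGE": "G", "BAILEY": "Y"}


def fisk(surname):
    """Provisional code, overridden by group decisions and extended by a fifth letter."""
    key = surname.upper()
    code = GROUP_CODES.get(key, provisional_fisk(surname))
    return code + FIFTH_LETTERS.get(key, "")
```

The provisional codes for Porter and Partridge are both PRTR, and those for Bloy and Bailey are both BLY., which is exactly the clash that the fifth letter is meant to resolve.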

Page 30: Databases for Renaissance and Early Modern Sources

Coding

  • Used to be necessary because databases could not handle large amounts of text
  • Historians still code:
      • Data entry may be speeded up by using simple codes, e.g. ‘M’ for married, ‘U’ for unmarried, and ‘W’ for widowed, but complicated coding may slow data entry down
      • Is a form of close assessment of the data and may lead to the development of categories for ease
      • May facilitate the process of record linkage
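A simple coding scheme of this kind amounts to a small look-up from code to meaning. The sketch below uses the three marital status codes from the slide; the function name `decode` and the sample entries are assumptions for illustration.

```python
from collections import Counter

# Codes from the slide: short to type during data entry.
MARITAL_STATUS = {"M": "married", "U": "unmarried", "W": "widowed"}


def decode(code):
    """Expand a code, rejecting anything that is not in the scheme."""
    key = code.strip().upper()
    if key not in MARITAL_STATUS:
        raise ValueError(f"unknown marital status code: {code!r}")
    return MARITAL_STATUS[key]


# Made-up column of coded entries, as they might be typed in.
entered = ["M", "m", "W", "U", "M"]
summary = Counter(decode(code) for code in entered)
```

Rejecting unknown codes at this point catches typing slips early, which is one way a short scheme can help accuracy as well as speed.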

Page 31: Databases for Renaissance and Early Modern Sources

Deciding to code

  • Should coding take place before or after data entry?
  • Should codes be letters or numbers?
      • Numbers mean high level of error
  • Coding schemes should make decisions in the light of other classification systems used by historians.
  • Full code book should be developed as part of the documentation to accompany the database.

Page 32: Databases for Renaissance and Early Modern Sources

Occupational analysis

  • Form of post-coding
  • Assist in analysing fields with numerous values
  • Most common type is categorisation of occupational information.
  • Must be able to compare with other research in the field and to provide as complete a picture as possible regarding the status and occupation of the population

Page 33: Databases for Renaissance and Early Modern Sources

Coding schemes

  • Modern historians use standardised occupational classification systems
  • Early modern historians often each devise their own schema
  • A compromise is to use a multi-dimensional approach: each occupation is classified using several different methods. Occasionally individual occupational titles may be isolated where any categorisation would destroy the nuances of work experiences.
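The multi-dimensional approach can be sketched as a code book that gives each occupational title a category under more than one scheme. Everything in the sketch is hypothetical: the two schemes ("sector" and "status"), the category names and the titles are invented for illustration and do not come from any published classification. A value of None marks a title that is kept as it stands rather than forced into a category.

```python
from collections import Counter

# Hypothetical code book: one entry per occupational title, two schemes each.
OCCUPATION_CODEBOOK = {
    "butcher":   {"sector": "food and drink", "status": "craft and trade"},
    "baker":     {"sector": "food and drink", "status": "craft and trade"},
    "weaver":    {"sector": "textiles",       "status": "craft and trade"},
    "gentleman": {"sector": None,             "status": "gentry"},
}


def classify(title, scheme):
    """Return the category of a title under one scheme."""
    key = title.strip().lower()
    entry = OCCUPATION_CODEBOOK.get(key)
    if entry is None:
        return "unclassified"  # title not yet in the code book
    category = entry[scheme]
    return category if category is not None else key  # isolated title


def tabulate(titles, scheme):
    """Count how many titles fall into each category of the chosen scheme."""
    return Counter(classify(title, scheme) for title in titles)


# Made-up occupation column from a source.
titles = ["Butcher", "Baker", "Weaver", "Gentleman", "Cordwainer"]
by_sector = tabulate(titles, "sector")
by_status = tabulate(titles, "status")
```

Because the original title is never overwritten, the same column can be tabulated under either scheme, and titles missing from the code book show up as "unclassified" for review.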

Page 34: Databases for Renaissance and Early Modern Sources

Prosopography

  • Mostly used for study of elites
  • Database is created not from a single source but many, bringing biographical data together
  • Use relational design to avoid very large, multi-field databases containing many blank fields
  • Consider issues of nominal record linkage

Page 35: Databases for Renaissance and Early Modern Sources

Community Reconstruction

  • Concentrates on bringing together all records from one place
  • Needs careful design
  • Primary methodological issue is one of record linkage, so documents, place names and individuals may all have their own ID codes