
ETD 2005 International Accesses to a Digital Library of ETDs


Ana Pavani

Departamento de Engenharia Elétrica
Pontifícia Universidade Católica do Rio de Janeiro

www.maxwell.lambda.ele.puc-rio.br/

Presentation outline

Profile of the digital library

Generation of data

Combination and analysis of data – interesting results

Next steps

Profile of the digital library

Beginning of the collection – 2nd semester of 1995

Items to start the collection – courseware (texts, exercises, technical manuals, tests, etc.)

The digital library is part of a system that:

Is an LMS (Learning Management System)

Has administrative functions that allow data exchange with the university’s administrative system

Is linked (2 directions) to CNPq’s Lattes Platform (curricula database with more than 595,000 CVs)

Allows the control of series collections

Is multilingual and has interfaces in 3 languages

Evolution of the collection:

Administrative documents

Preprints, published papers & online articles

Interactive courseware

ETDs (2000)

Online journals (2003)

Senior projects (2003)

Online bulletins – distributed through mailing lists, archived and published automatically (2004)

Books (Oct. 2005)

Numbers of titles in the collection:

Courseware (many types) – 2,700+

Administrative documents – 33

Technical documents – 94

ETDs – 1,873 (PUC-Rio) + 31 (UNICAP)

Preprints, published papers & online articles – 280

Senior projects – 305

Online journals – 3 (+ 1 in Oct. 2005 + 1 in Dec. 2005)

Online bulletins – 2

Books – 1 (to be published in Oct. 2005)

Total number of digital objects (DOs): 16,400+

Technological characteristics:

Machine – IBM RS/6000

Operating system – IBM AIX

Web server – Apache

DBMS – IBM DB2

The Apache log contains information on accesses to ALL digital contents on the system, as well as every transaction that users perform (clicking buttons, reading posts, reading help pages, etc.). Data on transactions with contents must therefore be extracted from the server log to generate the numbers to be analyzed.

Generation of data

Data have 2 different natures: production and accesses

Production data come from functions of the system that are not related to the Apache server but only to the DB

example


(*) PUC-Rio started requiring ETDs in Aug. 2002;

(*) UNICAP does not require ETDs.

[Figure: ETD time series, PUC-Rio & UNICAP. Vertical axis: numbers, 0 to 2,000. Horizontal axis: semesters, jun.00 to sep.05.]


Access data are obtained from both the Apache Server log and the DB:

Logs are mined (according to the following definitions) and the results are stored on the DB

Mined data are combined with production data (metadata) already in the database (types of contents, authors, programs, areas of knowledge, dates, countries, etc.) to yield results

Definitions for mining the log

When access statistics came into discussion, it was necessary to define how data should be mined from the log and how it should be combined afterwards

The definitions follow – (M) mining definitions and (C) combining definitions


(M) Visits and complete visits

An ETD can have one or many digital objects. The number of visits is the sum of all accesses to all digital objects in a given month. A complete visit is a set of visits to all digital objects from a country in a given month.
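The two counts can be sketched in Python as follows. This is only one possible reading of the definition, not the system's actual code: it assumes that the number of complete visits from a country is limited by the least-visited digital object of the ETD in that month. The function names, file names and numbers are hypothetical.

```python
def visits(counts):
    """Visits to one ETD in one month: sum of accesses to all its digital objects."""
    return sum(counts.values())


def complete_visits(counts, digital_objects, country):
    """Assumed reading: a country completes a visit only when every digital
    object of the ETD was accessed, so the least-accessed object sets the count."""
    return min(counts.get((do, country), 0) for do in digital_objects)


# Hypothetical accesses to one ETD in one month, keyed by (digital object, country).
etd_objects = ["ch1.pdf", "ch2.pdf", "ch3.pdf"]
month_counts = {
    ("ch1.pdf", "BR"): 5,
    ("ch2.pdf", "BR"): 3,
    ("ch3.pdf", "BR"): 4,
    ("ch1.pdf", "US"): 2,
}

print(visits(month_counts))                              # 14
print(complete_visits(month_counts, etd_objects, "BR"))  # 3
print(complete_visits(month_counts, etd_objects, "US"))  # 0
```

Under this reading, the two accesses from the US count as visits but not as a complete visit, because chapters 2 and 3 were never requested from that country.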


(M) Country x IP address

The decision to use the country and not the IP address to establish a visit was based on the fact that the visits to an ETD can be made at different times (and reconnecting may assign a new IP address) and from different locations (with fixed IP addresses).

(M) Counting visits from the same IP address

Visits from the same IP are counted individually because a network with many machines can appear under the single IP address of its firewall.


(M) Counting visits to restricted digital objects

Some ETDs are totally or partially restricted – approximately 30% have some type of permanent or temporary restriction. Metadata, abstracts included, are publicly available for all of them. It was decided that attempts followed by denials of access would be counted as accesses.

!! This is explained in the help pages of the system, which suggest that authors consider allowing their contents to become public if many attempts occur.

(C) Lines to mine

Since the interest was in accesses to digital objects, the decision was to take the log lines for files with extensions .dcr, .doc, .htm, .pdf, etc. All extensions present in the database are considered, as long as the corresponding item is cataloged in the digital library, so that a static HTML system page that happens to be requested is not counted.
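A minimal Python sketch of this filter is shown below. It is not the system's code: the extension set lists only the four extensions named above, and the catalog is a hypothetical in-memory set standing in for the database lookup.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Only the extensions named on the slide; the real list comes from the database.
CONTENT_EXTENSIONS = {".dcr", ".doc", ".htm", ".pdf"}


def is_minable(request_path, cataloged_paths):
    """True when the request targets a cataloged digital object."""
    path = urlsplit(request_path).path            # drop any query string
    extension = PurePosixPath(path).suffix.lower()
    return extension in CONTENT_EXTENSIONS and path in cataloged_paths


# Hypothetical catalog entries standing in for the database.
cataloged = {"/etd/0001/chapter1.pdf", "/etd/0001/abstract.htm"}

print(is_minable("/etd/0001/chapter1.pdf", cataloged))  # True: cataloged PDF
print(is_minable("/help/index.htm", cataloged))         # False: static system page
print(is_minable("/images/logo.gif", cataloged))        # False: extension not mined
```

Checking the catalog in addition to the extension is what keeps a static .htm help page out of the counts.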

Observations

(1) Statistics were planned on a monthly basis. The model treats data as sequences of points with discrete-time intervals of a month. Data for past months are unchanged and the current month is updated according to the Update definition.

(2) IPs are resolved using a plug-in called GeoIP Free that is available with AWStats.


(C) Information to get from a log line

The month and the year are extracted along with identification of the digital object and the country of the IP address that accessed the digital object.
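The extraction can be illustrated with the Python sketch below. It assumes the log uses the Apache Common Log Format, which the slides do not state, and it replaces the IP-to-country resolver with a hypothetical lookup table (`FAKE_GEO`, `lookup_country`). The sample line is invented.

```python
import re
from datetime import datetime

# Apache Common Log Format: host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical stand-in for the IP-to-country resolver.
FAKE_GEO = {"192.0.2.10": "BR", "198.51.100.7": "PT"}


def lookup_country(ip):
    return FAKE_GEO.get(ip, "unknown")


def extract(line):
    """Return (year, month, digital object path, country), or None if unparsable."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    # %b expects English month abbreviations (the default C locale).
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return (when.year, when.month, match.group("path"),
            lookup_country(match.group("ip")))


sample = ('192.0.2.10 - - [28/Sep/2005:10:15:32 -0300] '
          '"GET /etd/0001/chapter1.pdf HTTP/1.1" 200 48213')
print(extract(sample))  # (2005, 9, '/etd/0001/chapter1.pdf', 'BR')
```

The status code is captured but not used as a filter here, since the definitions count denied attempts on restricted objects as accesses.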


(C) Update of the DB

The lines are read every hour on the hour (00:00, 01:00, etc.); only the lines added since the previous run are mined. Accesses are summed for each month-year-DO-country combination, so the table is not very big – in the first 6 months of 2005 the average number of lines per month was 10,000.
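The summation can be sketched in Python with a counter keyed by (year, month, DO, country). This is an illustration only: the real table lives in the DB, and the records and names below are hypothetical.

```python
from collections import Counter


def update_counts(table, new_records):
    """Add newly mined (year, month, digital object, country) records to the table."""
    table.update(new_records)
    return table


table = Counter()

# Hypothetical records mined in two consecutive hourly runs.
first_hour = [
    (2005, 9, "etd-0001/ch1.pdf", "BR"),
    (2005, 9, "etd-0001/ch1.pdf", "BR"),
    (2005, 9, "etd-0001/ch1.pdf", "PT"),
]
second_hour = [(2005, 9, "etd-0001/ch1.pdf", "BR")]

update_counts(table, first_hour)
update_counts(table, second_hour)

for key, accesses in sorted(table.items()):
    print(key, accesses)
# (2005, 9, 'etd-0001/ch1.pdf', 'BR') 3
# (2005, 9, 'etd-0001/ch1.pdf', 'PT') 1
```

Four log lines collapse into two table rows, which is why the table stays small even when the log grows quickly.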

(C) When to start computing

The log of the Apache Server started being saved on Jun 01, 2004. The computation could therefore start on this date or on a later one, for example Jan 01, 2005. The decision was to use all available monthly logs.

When the process started, some days of offline processing were required. Afterwards, the update became automatic according to the Update definition.

Observations

(1) Maybe these were not the best definitions – we are willing to discuss alternatives!!

(2) The (original) logs are stored and saved offline in case some change in the mining strategy is decided (we have not sunk the ships!!).

Definitions for computing statistics

By author

Visited ETDs by year, month and country

Visited ETDs by country, month and year

25 most visited ETDs (on the system = PUC-Rio + UNICAP)

20 most visited ETDs by institution


10 most visited ETDs by graduate program

Visited ETDs by institution, program, year and month

Initial Results


# ETDs may/sep – 13%

# accesses may/sep – 54.6%

Access to ETDs is increasing (Sep 28, 2005)

[Figure: Visits per month by origin (Brazil, United States, pt speaking countries, es speaking countries, others). Vertical axis: 0 to 1,600. Horizontal axis: months 1 to 16, Jun 04 - Sep 05.]

The total number of visits is increasing (Sep 28, 2005)

[Figure: Total Visits per month. Vertical axis: 0 to 3,500. Horizontal axis: months 1 to 16, Jun 04 - Sep 05.]

The accumulated average of total visits is increasing (Sep 28, 2005)

[Figure: Accumulated Average Total Visits. Vertical axis: 0 to 900. Horizontal axis: months 1 to 16, Jun 04 - Sep 05.]

But…

Brazil + pt speaking + es speaking = 75%

Brazil + US + pt speaking + es speaking = 87%

[Figure: share of accesses by origin, in %]

Brazil – 54.87

United States – 11.94

pt speaking countries – 7.00

es speaking countries – 13.53

others – 12.66

Brazil accounts for 55% of the accesses since Jun 01, 2004 (Sep 28, 2005)
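The two sums on this slide can be checked from the five percentages in the chart. The short Python check below assumes the values pair with the legend in the order listed (Brazil, United States, pt speaking, es speaking, others). That pairing is inferred from the slide layout, not stated on it.

```python
# Percentages from the chart, paired with the legend in listed order (assumed).
shares = {
    "Brazil": 54.87,
    "United States": 11.94,
    "pt speaking countries": 7.00,
    "es speaking countries": 13.53,
    "others": 12.66,
}

iberian = (shares["Brazil"]
           + shares["pt speaking countries"]
           + shares["es speaking countries"])
with_us = iberian + shares["United States"]

print(f"{iberian:.2f}")               # 75.40
print(f"{with_us:.2f}")               # 87.34
print(f"{sum(shares.values()):.2f}")  # 100.00
```

Under that pairing the figures agree with the rounded 75% and 87% quoted on the slide, and the five shares add up to 100%.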

On Jun 15, 2007 the numbers of ETDs in Iberian languages on the NDLTD DB were:

Institution                Country            Language(s)                     Number
National Library           Portugal           Portuguese                         185
IBICT (includes PUC-Rio)   Brazil             Portuguese                      11,118
UAB                        Spain (Catalunya)  Catalan or English or Spanish    1,011
UIB                        Spain (Catalunya)  Catalan or English or Spanish       22
UJI                        Spain (Catalunya)  Catalan or English or Spanish       42
UOC                        Spain (Catalunya)  Catalan                              1
UPC                        Spain (Catalunya)  Catalan or English or Spanish      415
UPF                        Spain (Catalunya)  Catalan or English or Spanish       67
URL                        Spain (Catalunya)  Spanish                              1
URV                        Spain (Catalunya)  Catalan or English or Spanish      106
UdG                        Spain (Catalunya)  Catalan or English or Spanish      131
UdL                        Spain (Catalunya)  Catalan or English or Spanish       70
UV                         Spain (Catalunya)  Catalan or English or Spanish      200

Brazilian ETDs were 83% of all ETDs in Iberian languages (total number 13,369)

Percentage of visits from Brazil is decreasing (Sep 28, 2005)

[Figure: Percentage of Visits from Brazil. Vertical axis: 0 to 80. Horizontal axis: months 1 to 16, Jun 04 - Sep 05.]

Accumulated percentage averages of visits from Brazil (Sep 28, 2005)

[Figure: Accumulated Average Percentage of Visits from Brazil. Vertical axis: 50 to 68. Horizontal axis: months 1 to 16, Jun 04 - Sep 05.]
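The slides do not define how the accumulated average is computed. One plausible reading is a cumulative mean: the value for month n is the mean of the monthly values from month 1 to month n. The Python sketch below works under that assumption, and the monthly percentages in it are hypothetical, not taken from the chart.

```python
from itertools import accumulate


def accumulated_average(values):
    """Cumulative mean: entry n is the mean of the first n monthly values."""
    return [total / n for n, total in enumerate(accumulate(values), start=1)]


# Hypothetical monthly percentages of visits from Brazil.
monthly_share = [66.0, 60.0, 57.0, 53.0]

print(accumulated_average(monthly_share))  # [66.0, 63.0, 61.0, 59.0]
```

A cumulative mean falls more slowly than the monthly series it is built from, which matches a chart that smooths a decreasing monthly percentage.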


Total accesses top 10 countries (Sep 28, 2005)

# identified countries 122

+ unidentified countries

+ satellite access host

Country Visits

Brazil 12,845

USA 2,795

Portugal 1,489

Spain 679

Peru 652

Mexico 432

Chile 364

France 245

Colombia 225

Argentina 224

Some interesting results

Some ETDs are permanent ‘best sellers’

They are on specific subjects (examples: a specific philosopher and history of modern architecture in Brazil)

They are linked from sites on the subjects (examples: the first from the US & Brazil and the second from Germany)

They are accessed from different countries

Some topics are permanent ‘best sellers’ (example: energy)


Some ETDs are temporary ‘best sellers’ – this seems to happen when they are displayed at the ‘last published ETDs’ functions (system and graduate program)

Some graduate programs are permanent ‘best sellers’

They research topics that are very specific to the country (examples: education and history of culture)

They are indexed in other sites and/or digital libraries (examples: Universia in Spain for social sciences and humanities)

They are accessed from different countries


The 25 most visited ETDs have a large number of visits

No average is lower than 100 visits per month

[Figure: Average Number of Visits of the 25 Most Visited ETDs. Vertical axis: 0 to 400. Horizontal axis: 1 to 16, labeled Jan 04 - Sep 05.]

Next steps

Find out how readers got to ETDs (BDTD, NDLTD, SCIRUS, etc.) – an online survey is planned

Interview faculty to check if some ETDs are recommended reading in courses

Gather more data and analyze in a ‘more scientific’ manner (must find a student!!)


Develop additional functions comparing accesses with production

Extend to other digital contents (at the moment only ETDs and online journals have access statistics)

Thank you!

Muito obrigada!