National Archives and Records Administration National Archives Catalog (The Catalog) NARA Catalog Reporting Design – Catalog Perspective – Status-Final Version 1.7 June 11, 2015


National Archives and Records Administration

National Archives Catalog (The Catalog)

NARA Catalog Reporting Design – Catalog Perspective –

Status-Final

Version 1.7

June 11, 2015


National Archives & Records Administration

NARA Catalog Reporting Design

Archana Ballur

Madhu Koneni

Rhea Mandavilli

Version 1.7

Contract Number GS-35F-0541U

Order Number NAMA-13-F-0120

June 11, 2015


Contents

1 Overview
2 Reporting Technology
  2.1 Data Sources
  2.2 Log Files
  2.3 Database
  2.4 Report Format
  2.5 Roles and Permissions (14.34)
  2.6 System Admin Configuration
    2.6.1 Configure Top N Count
    2.6.2 Configure Time Based Reports
3 Access Reports
  3.1 Number of queries- Daily/Weekly
  3.2 Top 1000 Search Terms
  3.3 Top 1000 Most Accessed Files
  3.4 Simple Search Access Reports
  3.5 Advanced Search Access Reports
4 Contributions Reports
  4.1 No. of User Contributions broken down by type of contribution
  4.2 No. Of Registered Users who contributed by Contribution Type
  4.3 Top 100 records with most contributions
  4.4 Reasons for removing contributions
  4.5 Reasons for Restoring contributions
  4.6 Other User Contribution Reports
    4.6.1 Average number of contributions for the top 1000 contributors by contribution type
    4.6.2 Top 100 contributors with their username and total number of contributions (excluding NARA users) [Req: 14.8.4]
    4.6.3 Top 1000 tags contributed by users and the total number of links that the tags have to records in OPA [Req: 14.8.5]
5 API - Reports
  5.1 Number of Queries Generated through the API
  5.2 Characteristics of Queries made through API
  5.3 Number of times exports are generated through API
  5.4 Total no. of Bytes served via the API
  5.5 No. of User contributions via the API
  5.6 No. of Unique contributors via the API
6 Export Reports
  6.1 No. of times records are exported
  6.2 Average number of Exports per User
7 Saved Lists - Reports
8 Server Space Reports
9 Shares – Reports
  9.1 No. of Shares made by Users
  9.2 No. of Shares per User
  9.3 Records with most no. of shares
  9.4 Top 100 Shares by Destination
  9.5 No. of times Copy URL is accessed
10 Digital Analytics – Reports
  10.1 Tabbed Groupings Access
  10.2 Advanced Search – Access
  10.3 Policy & Help Link – Access
11 Statistics – Report
12 Requirements INDEX

Version Control

Version  Date        Author                        Summary Description
0.1      2014-02-19  Rhea Mandavilli               Initial Outline and structure to incorporate all requirements
1.0      2014-03-10  Archana Ballur, Madhu Koneni  First Version
1.2      2014-04-02  Archana Ballur                Added Diagram to Section 2.1 Data Sources to show the various sources input to the Reporting System
1.3      2014-04-11  Archana Ballur                Updated based on Requirements Spreadsheet dated Apr 4
1.4      2014-07-28  Altán Cabal                   Update based on requested log format changes
1.5      2014-11-14  Brandon Stahl                 Removed “Confidential to Search Technologies” text from footer
1.6      2014-11-24  Brandon Stahl                 Replaced https://research.archives.gov url with https://catalog.archives.gov url
1.7      2015-06-11  Kristy Martin                 Changed branding for system name throughout document.


1 Overview

This document is a detailed description of the National Archives Catalog reporting system and covers the following aspects:

- Technologies used to generate reports

- Sources used to generate reports

- SQL/Pseudo queries that can be used to extract the information required for the reports

- Sample Reports

- Roles and Permissions of Authorized Users

- System Configuration Options

For the purpose of organizing this document, the reports are classified into the following categories:

Access Reports – Related to various access statistics.

Contributions Reports - Related to user contributions.

API reports – Related to all the API requests from public users through systems other than the Catalog.

Server Space Reports – Related to Server space information.

Export Reports – Related to statistics on downloads/exports of Catalog documents/objects/contributions, etc.

Saved Lists Reports – Related to the “My Lists” created by users.

Shares Reports – Related to shares on public social networking sites like Twitter, Facebook, etc.

Digital Analytics Reports – Reports which can be obtained using Google Analytics

Statistics Reports – Related to the statistics of Catalog descriptions.


2 Reporting Technology

It is assumed that Splunk is the reporting tool used for the Catalog reporting purposes. It is currently being evaluated by NARA as part of “Analysis of Alternatives”.

Splunk Enterprise transforms machine data into real-time operational intelligence. It enables organizations to monitor, search, analyze, visualize and act on the massive streams of machine data generated by the websites, applications, servers and networks.

Customers use Splunk software to improve analysis of log data to better manage their business. Splunk software automatically indexes all of the data, including structured, unstructured and complex multi-line application log data, enabling you to search on all of the data without need for custom connectors and without the scalability limitations inherent in traditional solutions.

Once the data is in Splunk, one can quickly search, report and diagnose operations. Splunk can also integrate with relational database systems such as MySQL. Splunk allows users to create a wide variety of reports. These reports can be requested on demand and can also be scheduled.

2.1 Data Sources

Data for Reports will be obtained either from various log files on the server or from the relational database.

The following diagram captures the Reporting system and the various sources of the log files and database tables.


2.2 Log Files

In Catalog production, the application will be logging all the various events and actions including errors and other information that is needed to generate the reports based on the requirements.

Tools like Splunk can process large volumes of this log data, structuring and indexing it. The indexed data can be queried for a given time range, and Splunk offers various functions that can be used to further filter and clean up the returned results.

Data source and methods to extract report details from log files will be detailed for each report in various sections below.

2.3 Database

The information stored in the database can be used to generate some of the reports. Tools like Splunk provide database connectors through which data from the database can be fed to Splunk, and Splunk can then be configured to generate these reports. Splunk accepts SQL queries, and reports can be configured with SQL statements. These reports can be scheduled or requested on demand.


2.4 Report Format [Req: 14.7]

All Catalog reports are Microsoft Excel compatible. The report names will have the pattern “Report_Name_YYYY.MM.DD.HH:MM:SS”. For example, the report for the top 1000 search terms that resulted in zero results will be named “Top_Searches_ZeroResults_2014.03.05.00:00:00”.
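The naming pattern can be illustrated with a minimal Python sketch. The function name report_file_name is hypothetical and is not part of the Catalog or Splunk; only the pattern itself comes from the requirement above.

```python
from datetime import datetime


def report_file_name(report_name: str, run_time: datetime) -> str:
    """Build a name in the Report_Name_YYYY.MM.DD.HH:MM:SS pattern.

    report_name is assumed to already use underscores instead of spaces.
    """
    return f"{report_name}_{run_time.strftime('%Y.%m.%d.%H:%M:%S')}"


# Reproduces the example name given in the text.
print(report_file_name("Top_Searches_ZeroResults", datetime(2014, 3, 5)))
# Top_Searches_ZeroResults_2014.03.05.00:00:00
```

Note that a colon is not a valid character in Windows file names, so a report saved to a Windows file system under this pattern would likely need a different time separator.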

All the reports will have the following template. The sections 4 to 12 will have complete examples.

Report Name

Run Date <date on which the report was run>

Reporting Period <start date of the reporting period> - <end date of the reporting period>

<Report Content goes here>

Figure 1 Report in Excel Format

2.5 Roles and Permissions (14.34)

The System Administrator can create user accounts in Splunk and define their roles and permissions. The requirements identify two types of Authorized Users: System Administrators and Reporters. The following table lists the permissions each type of Authorized User will be granted in Splunk to generate, save, and print reports and to modify the reporting period.

Generate Reports [Req: 14.1]

System Administrator can generate report [Req: 14.1.1]

Reporter can generate report [Req: 14.1.2]

Save Reports [Req: 14.2]

System Administrator can save report [Req: 14.2.1]

Reporter can save report [Req: 14.2.2]

Print Reports [Req: 14.3]

System Administrator can print report [Req: 14.3.1]

Reporter can print report [Req: 14.3.2]

Modify Reporting Period for all system reports [Req: 14.5]

System Administrator can enter the start/end dates for all system reports. [Req: 14.5.1]

Reporter can enter the start/end dates for all system reports. [Req: 14.5.2]

2.6 System Admin Configuration

The System Administrator will be able to configure the reporting settings for the parameters listed in the requirements below.

2.6.1 Configure Top N Count

The system will provide the capability for a System Administrator to configure all reports to report on the top 10, top 100, or top 1000 statistics specific to that report. [Req: 14.16]

The system shall provide the capability for a System Administrator to configure all reports to report on the top 10 statistics specific to that report. [Req: 14.16.1]

The system shall provide the capability for a System Administrator to configure all reports to report on the top 100 statistics specific to that report. [Req: 14.16.2]

The system shall provide the capability for a System Administrator to configure all reports to report on the top 1000 statistics specific to that report. [Req: 14.16.3]

2.6.2 Configure Time Based Reports

System administrator will have the capability to make modifications to the reporting period to allow them to view reports for data for specific periods of time. The ability to modify reporting start and end dates shall provide system administrator the flexibility to tailor reports to specific data needs.

The Catalog system shall provide the capability for a System Administrator to configure time-based reports to provide totals for weekly, monthly, yearly, and since inception. [Req: 14.15]

The system shall provide the capability for a System Administrator to configure time-based reports to provide weekly totals. [Req: 14.15.1]

The system shall provide the capability for a System Administrator to configure time-based reports to provide monthly totals. [Req: 14.15.2]

The system shall provide the capability for a System Administrator to configure time-based reports to provide yearly totals. [Req: 14.15.3]


The system shall provide the capability for a System Administrator to configure time-based reports to provide totals since inception. [Req: 14.15.4]


3 Access Reports

The system will automatically generate the following scheduled access reports [Req: 14.4.6]. It will send an email notification to System administrator when the report is ready [Req: 14.4.1].

3.1 Number of queries - Daily/Weekly

The system will generate the following scheduled weekly access reports on each Monday at 12:01 AM [Req: 14.4.3].

Report Type

Daily number of queries [Req: 14.4.3.1]

Weekly Number of Queries for the Last Quarter [Req: 14.4.3.2]

Source: This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date-timestamp> MainType=USAGE Controller=<controller-Name> Type=WebApp, Query= {query parameters}

<date-timestamp> MainType=USAGE Controller=<controller-Name> Type=API, Query= {query parameters}

Sample Logs:

2014-03-04 15:28:27,088 MainType=INFO Controller=SearchController Type=WebApp, Query= {q=Truman&source=authorities&type=person&filter=year: 1930}

2014-03-04 15:29:20,233 MainType=INFO Controller=SearchController Type=API, Query= {q=Obama}

Splunk Configuration:

For these reports, Splunk will be configured to identify the keys “Type=WebApp” or “Type=API” and “Query={q=” and filter the queries by the date range.
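The extraction that Splunk would perform can be sketched outside Splunk in a few lines of Python. This is only an illustration under stated assumptions: the regular expression is inferred from the sample log lines in this section, the function names are hypothetical, and weeks are taken to start on Sunday because the sample report's weeks do.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Matches a query event: timestamp, optional milliseconds, then the
# "Type=WebApp" or "Type=API" key followed by "Query= {q=".
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:,\d+)?\s.*?"
    r"Type=(?:WebApp|API),\s*Query=\s*\{q="
)


def queries_per_day(lines, start, end):
    """Count query events per calendar day for start <= timestamp < end."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a query event
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        if start <= ts < end:
            counts[ts.date()] += 1
    return counts


def queries_per_week(daily):
    """Roll daily counts up into weeks keyed by the Sunday that starts them."""
    weekly = Counter()
    for day, count in daily.items():
        # weekday() is Monday=0 ... Sunday=6, so this is "days since Sunday".
        week_start = day - timedelta(days=(day.weekday() + 1) % 7)
        weekly[week_start] += count
    return weekly
```

In Splunk itself the same result would come from a saved search over the indexed log events rather than from custom code; the sketch only makes the matching and grouping rules explicit.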


Sample Report:

Catalog Access Report - Weekly number of Queries for the last quarter

Run Date 02/16/2013

Reporting Period 4/1/2012 - 6/30/2012

Week                      Number of Queries
04/01/2012 - 04/07/2012   2000
04/08/2012 - 04/14/2012   3000
04/15/2012 - 04/21/2012   2500
04/22/2012 - 04/28/2012   2800
04/29/2012 - 05/05/2012   2400

3.2 Top 1000 Search Terms

Report Content

Top 1000 search terms in the last 1 month [Req: 14.4.3.3]

Top 1000 Search Terms in the last 1 year [Req: 14.4.3.4]

Top 1000 search terms since inception – monthly report [Req: 14.4.3.5]

Top 1000 search terms that return zero results – monthly report [Req: 14.4.3.6]

Source: This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date-timestamp> MainType=USAGE <controller-Name> Type=WebApp, Query= {query parameters}

<date-timestamp> MainType=USAGE <controller-Name> Type=API, Query= {query parameters}


Sample Logs:

2014-03-04 15:28:27 MainType=INFO SearchController Type=WebApp, Query= {q=Truman&source=authorities&type=person&filter=year: 1930}

Splunk Configuration:

For these reports, Splunk will be configured to identify the keys “Type=WebApp” or “Type=API” and “Query={q=” and filter the queries by the date range. Extract the q=<query term> from the Query String and group the results by query term to get the top search terms.
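The extract-and-group step can be sketched in Python as follows. The sketch assumes the log lines have already been restricted to the reporting period, that the text between the braces is an ordinary URL query string, and that search terms are counted exactly as typed (the source does not say whether case should be normalized). The function name is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Captures the text between the braces of a WebApp or API query event.
QUERY = re.compile(r"Type=(?:WebApp|API),\s*Query=\s*\{(?P<qs>[^}]*)\}")


def top_search_terms(lines, n=1000):
    """Return the n most frequent q= values as (term, count) pairs."""
    counts = Counter()
    for line in lines:
        match = QUERY.search(line)
        if not match:
            continue
        params = parse_qs(match.group("qs"))
        for term in params.get("q", []):
            counts[term.strip()] += 1
    return counts.most_common(n)
```

Passing n=10 or n=100 gives the other Top N variants described in section 2.6.1.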

Sample Report:

Catalog Access Report - Top 1,000 Search Terms - Yearly

Run Date 02/16/2013

Reporting Period 02/16/2012 - 02/16/2013

Search Term                   Frequency
war                           4
1960                          2
benjamin franklin             2
dd214                         2
declaration of independence   2
immigration records           2
industrial revolution         2
Izzo                          2

3.3 Top 1000 Most Accessed Files

The system will generate a weekly Top 1000 Most Accessed Files Report on each Monday at 12:01 AM. [Req: 14.4.4]

The system will generate a weekly Top 1000 Most Accessed Files Report since Inception on each Monday at 12:01 AM. [Req: 14.4.5]


Report Type

Top 1000 Most Accessed: Descriptions with Digital Objects in the last 1 week. [Req: 14.4.4.1]

Top 1000 Most Accessed: Descriptions without Digital Objects in the last 1 week. [Req: 14.4.4.2]

Top 1000 Most Accessed: Authority Records in the last 1 week. [Req: 14.4.4.3]

Top 1000 Most Accessed webpages (Archives.gov, Presidential Libraries) in the last 1 week [Req: 14.4.4.4]

Top 1000 Most Accessed: Descriptions with Digital Objects since inception. [Req: 14.4.5.1]

Top 1000 Most Accessed: Descriptions without Digital Objects since inception. [Req: 14.4.5.2]

Top 1000 Most Accessed: Authority Records since inception. [Req: 14.4.5.3]

Top 1000 Most Accessed webpages (Archives.gov, Presidential Libraries) since inception [Req: 14.4.5.4]

Source: This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date-timestamp> MainType=INFO Controller=<controller-Name> Type=WebApp, Action=ViewFullResults, Source=<source-name>, NaId=<naid>, ObjectId=<objectId>

<date-timestamp> MainType=INFO Controller=<controller-Name> Type=API, Action=ViewFullResults, Source=<source-name>, NaId=<naid>, ObjectId=<objectId>

Source will be one of OnlineHoldings, DescriptionsOnly, WebPages, and Authorities.

Sample Logs:

2014-01-04 17:28:27 MainType=INFO Controller=FullResultsController Type=WebApp, Action=ViewFullResults, Source=DescriptionsOnly Naid=12345

2014-03-04 15:28:27 MainType=INFO Controller=FullResultsController Type=API, Action=ViewFullResults, Source=WebPages Naid=167261

Splunk Configuration:

For these reports, Splunk will be configured to identify the keys “Type=WebApp” or “Type=API” and “Action=ViewFullResults”, and to filter the events by the date range.

Grouping by Naid gives the number of times each record has been accessed. Splunk can then return the top n results.
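As an illustration of the grouping, the following Python sketch counts ViewFullResults events per Naid for one source. The pattern is an assumption inferred from the log format and sample logs in this section: it accepts both the "NaId" and "Naid" spellings and an optional comma after the source name, since the format and the samples differ on both points. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches a full-results view and captures the source and the record id.
VIEW = re.compile(
    r"Type=(?:WebApp|API),\s*Action=ViewFullResults,\s*"
    r"Source=(?P<source>\w+),?\s+Na[Ii]d=(?P<naid>\d+)"
)


def most_accessed(lines, source, n=1000):
    """Return the n most viewed records for one source as (naid, count) pairs."""
    counts = Counter()
    for line in lines:
        match = VIEW.search(line)
        if match and match.group("source") == source:
            counts[match.group("naid")] += 1
    return counts.most_common(n)
```

Calling the function once per source value (OnlineHoldings, DescriptionsOnly, WebPages, Authorities) would correspond to the separate report types listed above.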

Sample Report:

Catalog Report - NARA's 1000 Most Accessed: Descriptions without Digital Objects - Weekly

Run Date 02/16/2013

Reporting Period 01/20/2013 - 01/26/2013

File Name                                         Frequency
http://catalog.archives.gov/description/520628    4
http://catalog.archives.gov/description/2538325   2
http://catalog.archives.gov/description/7387382   2
http://catalog.archives.gov/description/6364848   2
http://catalog.archives.gov/description/1937409   2
http://catalog.archives.gov/description/296529    2
http://catalog.archives.gov/description/296531    2
http://catalog.archives.gov/description/296523    2
http://catalog.archives.gov/description/296517    2
http://catalog.archives.gov/description/296508    2

3.4 Simple Search Access Reports

The system shall generate a report that displays the number of times the filter options are accessed. [Req: 14.24]

Source: This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date-timestamp> MainType=USAGE Controller=<ControllerName> Type=WebApp, Query= {q=Truman&facet.year=2010}

Sample Logs:

2014-01-04 17:28:27 MainType=USAGE Controller=BriefResultsController Type=WebApp, Query= {q=Truman&facet.fields=year,oldScope,level}


Splunk Configuration:

For these reports, Splunk will be configured to identify the key “Type=WebApp” and a Query that contains “facet”, and to filter the queries by the date range.

Extract the facet.fields parameters from the Query String and then group by facet.fields to get the number of times each filter option was accessed.
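A minimal Python sketch of this extraction is shown below. It assumes, based on the sample log in this section, that facet.fields carries a comma-separated list of filter names, and that the lines have already been restricted to the reporting period. The function name is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Captures the text between the braces of a WebApp query event.
QUERY = re.compile(r"Type=WebApp,\s*Query=\s*\{(?P<qs>[^}]*)\}")


def filter_option_counts(lines):
    """Count how often each facet.fields entry appears in WebApp queries."""
    counts = Counter()
    for line in lines:
        match = QUERY.search(line)
        if not match:
            continue
        params = parse_qs(match.group("qs"))
        for value in params.get("facet.fields", []):
            for field in value.split(","):
                if field.strip():
                    counts[field.strip()] += 1
    return counts
```

The counts are keyed by the raw field names from the log (for example year or level); mapping them to the display labels used in the report is left out of the sketch.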

Sample Report:

Catalog Report - No. of times the filter options were accessed from Simple Search Screen

Run Date 02/16/2013

Reporting Period 01/20/2013 - 01/26/2013

Filter                                No. of Times
Data source - Archives.gov            123
Level of Description - Collection     34
File Format - ASCII Text              1

3.5 Advanced Search Access Reports

The system shall generate a report on the type of filters that are applied to advanced searches, and the frequency of each. [Req: 14.25]

Source: This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date-timestamp> MainType=USAGE Controller=<ControllerName> - Type=WebApp, SearchType=Advanced, Query={q=Truman& f.filtername= filtervalue}

Sample Logs:

2014-01-04 17:28:27 MainType=USAGE Controller=BriefResultsController Type=WebApp, Query= {q=Truman& f.inclusiveStart= range (1900, 1999)}

Splunk Configuration:

For these reports, Splunk will be configured to identify the key “Type=WebApp” and a Query that contains “f.”, and to filter the queries by the date range.

Extract the f.filtername parameters from the Query String and then group by f.filtername to get the number of times each filter option was accessed.
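The grouping by filter name can be sketched in Python as below. The sketch assumes the braces hold a URL-style query string; because the sample log contains spaces around the parameter names, each key is stripped before it is tested for the "f." prefix. It does not require SearchType=Advanced, since the sample log line omits it. The function name is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl

# Captures the text between the braces of a WebApp query event, allowing
# other keys (such as SearchType=Advanced) between Type and Query.
QUERY = re.compile(r"Type=WebApp,.*?Query=\s*\{(?P<qs>[^}]*)\}")


def advanced_filter_counts(lines):
    """Count how often each f.<filtername> parameter is used."""
    counts = Counter()
    for line in lines:
        match = QUERY.search(line)
        if not match:
            continue
        for key, _value in parse_qsl(match.group("qs")):
            key = key.strip()
            if key.startswith("f."):
                counts[key[2:]] += 1  # drop the "f." prefix
    return counts
```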

Sample Report:

Catalog Report - No. of times the filter options were accessed from Advanced Search Screen

Run Date 02/16/2013

Reporting Period 01/20/2013 - 01/26/2013

Filter                                No. of Times
Data source - Archives.gov            123
Level of Description - Collection     34
File Format - ASCII Text              1


4 Contributions Reports

The system will generate the following scheduled reports to capture user contributed data [Req: 14.8]. It will send an email notification to the System Administrator when the report is ready [Req: 14.4.1].

4.1 No. of User Contributions broken down by type of contribution

The system will generate a report on the total number of user contributions broken down by type of contribution (translations, transcriptions, comments, tags). [Req: 14.8.1]

The system will also generate a report on the cumulative number of user contributions made across all types of contribution (translations, transcriptions, comments, and tags). [Req: 14.8.1.1]

Source: The source for these reports will be the Catalog Database. Splunk will be configured to communicate with the MySQL database. Refer to the Application Server Design document for the database schema.

Query: This is the initial query as per the schema at the moment. This is subject to change based on the schema updates.

SELECT TR_count.Transcriptions, TL_count.Translations, C_count.Comments, T_count.Tags
FROM
  (SELECT COUNT(*) AS Transcriptions
   FROM annotations_transcriptions
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS TR_count,
  (SELECT COUNT(*) AS Translations
   FROM annotations_translations
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS TL_count,
  (SELECT COUNT(*) AS Comments
   FROM annotations_comments
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS C_count,
  (SELECT COUNT(*) AS Tags
   FROM annotations_tags
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS T_count

For the cumulative count, the select statement would be:


SELECT SUM(TR_count.Transcriptions + TL_count.Translations + C_count.Comments + T_count.Tags) AS CUMULATIVE
FROM
  (SELECT COUNT(*) AS Transcriptions
   FROM annotations_transcriptions
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS TR_count,
  (SELECT COUNT(*) AS Translations
   FROM annotations_translations
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS TL_count,
  (SELECT COUNT(*) AS Comments
   FROM annotations_comments
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS C_count,
  (SELECT COUNT(*) AS Tags
   FROM annotations_tags
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS T_count

Sample Report:

Catalog Report - No. of user contributions broken down by type of contribution

Run Date 02/16/2013

Reporting Period 01/20/2013 - 01/26/2013

Contribution Type   Total
Tags                100
Comments            162
Transcriptions      123
Translations        12
All                 397

4.2 No. Of Registered Users who contributed by Contribution Type

The system will generate a report on the total number of registered users who have contributed data, broken down by type of contribution (translations, transcriptions, comments, tags), with totals for weekly, monthly, yearly, and since inception, as well as cumulative totals across all types and time periods. [Req: 14.8.2, 14.8.2.1]


Source: The source for these reports will be the Catalog Database. Splunk will be configured to communicate with the MySQL database. Refer to the Application Server Design document for the database schema.

Query: This is subject to change based on the schema updates.

SELECT TR_count.Transcriptions, TL_count.Translations, C_count.Comments, T_count.Tags
FROM
  (SELECT COUNT(DISTINCT account_id) AS Transcriptions
   FROM annotations_transcriptions
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS TR_count,
  (SELECT COUNT(DISTINCT account_id) AS Translations
   FROM annotations_translations
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS TL_count,
  (SELECT COUNT(DISTINCT account_id) AS Comments
   FROM annotations_comments
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS C_count,
  (SELECT COUNT(DISTINCT account_id) AS Tags
   FROM annotations_tags
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS T_count

For the cumulative count, replace the SELECT statement with the following. Note that this sums the per-type counts, so a user who contributed in more than one type is counted once for each type.

SELECT SUM(TR_count.Transcriptions + TL_count.Translations + C_count.Comments + T_count.Tags) AS CUMULATIVE
FROM
  (SELECT COUNT(DISTINCT account_id) AS Transcriptions FROM annotations_transcriptions
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS TR_count,
  (SELECT COUNT(DISTINCT account_id) AS Translations FROM annotations_translations
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS TL_count,
  (SELECT COUNT(DISTINCT account_id) AS Comments FROM annotations_comments
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS C_count,
  (SELECT COUNT(DISTINCT account_id) AS Tags FROM annotations_tags
   WHERE annotation_ts >= NOW() - INTERVAL 1 WEEK) AS T_count

Sample Report:

Catalog User Contributions Total - Registered Users (Including NARA staff) - Since Inception
Run Date          02/16/2013
Reporting Period  01/01/2012 - 02/16/2013

Contribution Type  Total
Tags               1200
Comments           3500
Transcriptions     600
Translations       700
All                6000

Total Number of users contributed  2020
Average contributions per user     2.97029703

4.3 Top 100 records with most contributions

Report Type

Top 100 records (descriptions, authorities, and digital objects) that contain the most comments contributed by users. [Req: 14.8.6]

Top 100 records (digital objects) that contain the most translations contributed by users. [Req: 14.8.7]

Top 100 records (digital objects) that contain translations with the most edits, and the total number of edits made to the translations. [Req: 14.8.8]

Top 100 records (digital objects) that contain transcriptions with the most edits, the average number of edits made to the transcriptions, and the total number of edits per transcription. [Req: 14.8.9]

Source: The source for these reports will be the Catalog Database. Splunk will be configured to communicate with the MySQL database. Refer to the Application Server Design document for the database schema.

Query: For the report on the top 100 records that contain the most translations contributed by users, the following pseudo query will be used.

SELECT AC.opa_id, OT.opa_title, COUNT(*) AS translation_count
FROM annotations_translations AC
JOIN opa_titles OT ON AC.opa_id = OT.opa_id
GROUP BY AC.opa_id, OT.opa_title
ORDER BY translation_count DESC
LIMIT 100

Sample Report:

Catalog user contributions - Top 100 Records with most comments

Run Date          02/16/2013
Reporting Period  01/01/2013 – 02/16/2013

NAID     Type            Title                                                                                                   Page     Comments
300321   Description     The Final Rolls of Citizens and Freedmen of the Five Civilized Tribes in Indian Territory                        72
300320   Digital Object  Index to the Final Rolls of Citizens and Freedmen of the Five Civilized Tribes in Indian Territory  1 of 5   70
535413   Description     We can do it!, ca                                                                                                50
2745164  Digital Object  The Indian School Journal                                                                               1 of 10  45
7226539  Description     Letter from George McGovern to Harry S. Truman                                                                   45
7283870  Description     Fall 1973: 17-34-2: What did Truman Say About the CIA?, by Benjamin F. Onate                                     40
2745164  Digital Object  The Indian School Journal                                                                               2 of 5   30
2745164  Digital Object  The Indian School Journal                                                                               3 of 5   20

4.4 Reasons for removing contributions

The system will generate a report on the reasons that user-contributed data has been removed from the Catalog, broken down by type of contribution (translations, transcriptions, comments, tags), and the total number of contributions removed for each reason. [Req: 14.8.10]

Source:

The source for these reports will be the Catalog Database. Splunk will be configured to communicate with the MySQL database. Refer to the Application Server Design document for the database schema.

Query:

SELECT AL.annotation_type, COUNT(*), AR.reason
FROM annotations_log AL, accounts_reasons AR
WHERE AL.reason_id = AR.id
  AND AL.action = 'REMOVE'
GROUP BY AL.annotation_type, AR.reason

Sample Report:

Catalog user contributions - Reasons for removing contributions

Run Date  02/16/2013

Contribution Type  Reason         Count
Tags               VANDALISM      100
Comments           VANDALISM      200
Transcriptions     VANDALISM      10
Translations       VANDALISM      10
Tags               SPAM           50
Comments           SPAM           100
Transcriptions     SPAM           5
Translations       SPAM           5
Tags               FOUL LANGUAGE  100
Comments           FOUL LANGUAGE  100
Transcriptions     FOUL LANGUAGE  0
Translations       FOUL LANGUAGE  0
Tags               OTHER          200
Comments           OTHER          100
Transcriptions     OTHER          20
Translations       OTHER          10
All                All            1010

4.5 Reasons for Restoring contributions

The system will generate a report on the reasons that user-contributed data has been restored, broken down by type of contribution, and the total number of contributions restored for each reason. [Req: 14.8.11]

Source:

The source for these reports will be the Catalog Database. Splunk will be configured to communicate with the MySQL database. Refer to the Application Server Design document for the database schema.

Query: The following is the pseudo query:

SELECT AL.annotation_type, COUNT(*), AR.reason
FROM annotations_log AL, accounts_reasons AR
WHERE AL.reason_id = AR.id
  AND AL.action = 'RESTORE'
GROUP BY AL.annotation_type, AR.reason

Sample Report:

Catalog user contributions - Reasons for restoring contributions

Run Date  02/16/2013

Contribution Type  Reason              Count
Tags               ACCIDENTAL REMOVAL  50
Comments           ACCIDENTAL REMOVAL  100
Transcriptions     ACCIDENTAL REMOVAL  5
Translations       ACCIDENTAL REMOVAL  5
Tags               REQUESTED BY USER   40
Comments           REQUESTED BY USER   100
Transcriptions     REQUESTED BY USER   5
Translations       REQUESTED BY USER   5
Tags               OTHER               100
Comments           OTHER               100
Transcriptions     OTHER               0
Translations       OTHER               0
All                All                 510

4.6 Other User Contribution Reports

4.6.1 Average number of contributions for the top 1000 contributors by contribution type.

The system will generate a report on the average number of user contributions for the top 1,000 contributors, broken down by type of contribution (translation, transcription, comments, and tags). [Req: 14.8.3]

Source: The source for these reports will be the Catalog Database. Splunk will be configured to communicate with the MySQL database. Refer to the Application Server Design document for the database schema.

Query:

The following is the pseudo query:

SELECT AL.annotation_type, COUNT(*)
FROM annotations_log AL
JOIN (SELECT account_id FROM annotations_log
      GROUP BY account_id ORDER BY COUNT(*) DESC LIMIT 1000) AS top_users
  ON AL.account_id = top_users.account_id
GROUP BY AL.annotation_type;

This will give the counts for each annotation type for the top 1000 contributors.

To get the average for each type, divide the counts by 1000.
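The two steps above (count per annotation type among the top contributors, then divide by the size of the top group) can be sketched end to end. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory table instead of the Catalog's MySQL database, and the two-column annotations_log layout, the helper names, and the sample rows are assumptions made for the example, not the real schema.

```python
import sqlite3

def demo_connection():
    """Build an in-memory stand-in for annotations_log (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE annotations_log (account_id INTEGER, annotation_type TEXT)")
    rows = [
        (1, "TAG"), (1, "TAG"), (1, "TAG"), (1, "COMMENT"),  # user 1: 4 contributions
        (2, "TAG"), (2, "TAG"),                              # user 2: 2 contributions
        (3, "COMMENT"),                                      # user 3: 1 contribution
    ]
    conn.executemany("INSERT INTO annotations_log VALUES (?, ?)", rows)
    return conn

def average_by_type(conn, top_n):
    """Average contributions per annotation type among the top_n contributors.

    As in the text above, the per-type count is divided by top_n itself.
    """
    rows = conn.execute(
        """
        SELECT al.annotation_type, COUNT(*) * 1.0 / ?
        FROM annotations_log AS al
        JOIN (SELECT account_id
              FROM annotations_log
              GROUP BY account_id
              ORDER BY COUNT(*) DESC, account_id
              LIMIT ?) AS top_users
          ON al.account_id = top_users.account_id
        GROUP BY al.annotation_type
        """,
        (top_n, top_n),
    ).fetchall()
    return dict(rows)

if __name__ == "__main__":
    print(average_by_type(demo_connection(), 2))
```

With the sample rows, the top two contributors are users 1 and 2, who together made 5 tags and 1 comment, so the averages for a top group of 2 are 2.5 and 0.5.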

Sample Report:

Catalog user contributions – Average number of contributions for top 1000 contributors
Run Date          02/16/2013
Reporting Period  01/01/2013 - 02/16/2013

Contribution Type  Average
Tags               1000
Comments           1500
Transcriptions     500
Translations       400
All                3300

4.6.2 Top 100 contributors with their username and total number of contributions (excluding NARA users) [Req: 14.8.4]

The system will generate a report on the top 100 contributors, including their username and total number of contributions, excluding users registered with a NARA email address. [Req: 14.8.4]

Source: The source for these reports will be the Catalog Database. Splunk will be configured to communicate with the MySQL database. Refer to the Application Server Design document for the database schema.

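Query:

CREATE VIEW contributions_by_userid AS
(SELECT account_id, COUNT(*) AS contribution_count FROM annotations_transcriptions GROUP BY account_id)
UNION ALL
(SELECT account_id, COUNT(*) FROM annotations_translations GROUP BY account_id)
UNION ALL
(SELECT account_id, COUNT(*) FROM annotations_comments GROUP BY account_id)
UNION ALL
(SELECT account_id, COUNT(*) FROM annotations_tags GROUP BY account_id);

SELECT account_id, SUM(contribution_count) AS total_contributions
FROM contributions_by_userid
GROUP BY account_id
ORDER BY total_contributions DESC
LIMIT 100;

UNION ALL is used so that identical (account_id, count) rows from different tables are not collapsed into one. The username lookup and the exclusion of users registered with a NARA email address are not shown in this query.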

Sample Report:

Catalog Report – Top 100 Contributors and their Total no. of contributions (Excluding NARA Staff)

Run Date          02/16/2013
Reporting Period  01/01/2013 – 02/16/2013

User Name  All
mkoneni    240
aballur    205
pnelson    179
sdugan     165
lleu       130

4.6.3 Top 1000 tags contributed by users and the total number of links that the tags have to records in the system [Req: 14.8.5]

Source:

The source for these reports will be the Catalog Database. Splunk will be configured to communicate with the MySQL database. Refer to the Application Server Design document for the database schema.

Query:

SELECT annotation, COUNT(*) AS no_of_records
FROM annotations_tags
GROUP BY annotation
ORDER BY no_of_records DESC
LIMIT 1000

Sample Report:

Catalog Report – Top 1000 Tags and Total no. of links tags have to records in the Catalog

Run Date          02/16/2013
Reporting Period  02/9/2013 – 02/16/2013

Tags             Total no. of records tagged
Independence     24
Washington DC    21
China Agreement  15
Coal deals       3
Gold Industry    1

5 API - Reports

The following section describes the various API reports that need to be generated.

5.1 Number of Queries Generated through the API

The system will generate a report on the number of queries generated through the API [Req: 14.26].

Source: This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date-timestamp> MainType=USAGE <controller-Name> Type=API, Query= {query parameters}

Sample Logs:

2014-03-04 15:28:27 MainType=INFO FullResultsController Type=API, Query= {holdings/7226539/description&format=JSON}

2014-03-04 15:28:27 MainType=INFO SearchController Type=API, Query= {q=Truman&type=description}

SPLUNK configuration:

For these reports, Splunk will be configured to identify the keys “Type=API” and “Query”, and to filter by date range.
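The matching and date filtering described above can be illustrated outside Splunk with a short Python sketch. It is not the Splunk configuration itself: the regular expression and the function name are assumptions written against the draft log format and the two sample lines in this section, and they would need to change if the format changes.

```python
import re
from datetime import datetime

# Hypothetical pattern for the draft format:
# <date-timestamp> MainType=<level> <controller-Name> Type=<type>, Query= {<query>}
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+MainType=(?P<main>\w+)\s+"
    r"(?P<controller>\S+)\s+Type=(?P<type>\w+),\s*Query=\s*\{(?P<query>.*)\}\s*$"
)

def count_api_queries(lines, start, end):
    """Count lines with Type=API and a Query field whose timestamp is in [start, end]."""
    total = 0
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match.group("type") != "API":
            continue
        timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        if start <= timestamp <= end:
            total += 1
    return total

# The two sample log lines from this section.
SAMPLE_LOGS = [
    "2014-03-04 15:28:27 MainType=INFO FullResultsController Type=API, Query= {holdings/7226539/description&format=JSON}",
    "2014-03-04 15:28:27 MainType=INFO SearchController Type=API, Query= {q=Truman&type=description}",
]

if __name__ == "__main__":
    print(count_api_queries(SAMPLE_LOGS, datetime(2014, 3, 1), datetime(2014, 3, 8)))
```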

Sample Report:

Catalog Report - No. of queries generated through the API in the last 1 week

Run Date 02/16/2013

Reporting Period 02/9/2013 - 02/16/2013

No. of queries generated through the API in last 1 week 212

5.2 Characteristics of Queries made through API

The system shall generate a report that indicates the characteristics of queries made through the use of the API, including queries that caused errors, speed of query responses, and a listing of the most common fields queried [Req: 14.27].

Source:

This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date timestamp> ERROR <Controller-Name> Type=API respTime= <response time> query= {query parameters}

Sample Logs:

2014-03-04 15:29:12 ERROR SearchController Type=API, results=0, query = {q=LGBTQ}

Splunk Configuration:

For these reports, Splunk will be configured to identify these keys (“Type=API”).

Sample Report:

Catalog Report – Characteristics of queries made via the API

Run Date 02/16/2013

Reporting Period 02/16/2011 - 02/16/2013

Query                        ERROR  Response Time
q=Truman&source=authory      YES    2ms
q=Truman&source=authorities  NO     5ms

5.3 Number of times exports are generated through API

The system will generate a report on the number of exports generated through the API [Req: 14.28].

Source: This report can be generated using the Application Server log files as input to Splunk. Splunk will be configured to extract the required information from the logs and generate a report. The following log format will be used for all the logging in the Catalog application. This is only an initial version and the format may change.

<date-timestamp> MainType=INFO <controller-Name> Type=WebApp, Action=export,

All exports generated through the API will be logged as Type=API and Action=export.

SPLUNK configuration: For these reports, Splunk will be configured to identify the keys “Type=API” and “Action=export”.

Sample Report:

Catalog Report - No. of times exports are generated through the API

Run Date 02/16/2013

Reporting Period 02/9/2013 - 02/16/2013

No. of times exports generated through the API 181

5.4 Total no. of Bytes served via the API

The system will generate a report on the number of total bytes served via the API. [Req: 14.29]

Source: This report can be generated using the Application Server log files as input to Splunk. Splunk will be configured to extract the required information from the logs and generate a report. The following log format will be used for all the logging in the Catalog application. This is only an initial version and the format may change.

<date timestamp> ERROR <Controller-Name> Type=API respTime= <response time> query= {query parameters} , BytesServed = 1290192817

For every query response, a log will be made and it will log the amount of data being transferred.

Splunk Configuration:

For these reports, Splunk will be configured to identify these keys “Type=API”, “BytesServed”
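Summing the BytesServed field over API entries could look like the following Python sketch. This section gives no sample log lines, so the three lines below are hypothetical and merely follow the draft format above; the function name and the regular expression are likewise assumptions.

```python
import re

# Tolerates spaces around "=", as in the draft format ("BytesServed = 1290192817").
BYTES_RE = re.compile(r"BytesServed\s*=\s*(\d+)")

def total_api_bytes(lines):
    """Sum BytesServed over all lines that carry the Type=API key."""
    total = 0
    for line in lines:
        if "Type=API" not in line:
            continue
        match = BYTES_RE.search(line)
        if match:
            total += int(match.group(1))
    return total

# Hypothetical log lines written to match the draft format; not real Catalog output.
SAMPLE_LOGS = [
    "2014-03-04 15:29:12 ERROR SearchController Type=API respTime= 5ms query= {q=Truman} , BytesServed = 2048",
    "2014-03-04 15:29:13 ERROR SearchController Type=API respTime= 7ms query= {q=Lincoln} , BytesServed = 1024",
    "2014-03-04 15:29:14 ERROR SearchController Type=WebApp respTime= 7ms query= {q=Lincoln} , BytesServed = 4096",
]

if __name__ == "__main__":
    print(total_api_bytes(SAMPLE_LOGS))
```

Only the two Type=API lines are counted, so the WebApp entry does not contribute to the total.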

Sample Report:

Catalog Report - No. of Bytes Served via the API

Run Date 02/16/2013

Reporting Period 02/9/2013 - 02/16/2013

No. of Bytes Served via the API 10.1 GB

5.5 No. of User contributions via the API

The system will generate a report on the number of user contributions (translations, transcriptions, comments, tags) via the API. [Req: 14.30]

Source:

This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date timestamp> MainType=USAGE Controller=<Controller-Name> Type={API, WebApp}, Action=save, AnnotationType=<annotation type>, Naid=<naid> , Object=<object no if any>, Username=<username>

Sample Logs:

2014-03-04 15:28:27,054 MainType=USAGE Controller=gov.nara.opa.api.controller.annotation.tags.CreateTagController Type=API, Action=save, AnnotationType= Transcription, Naid= 12345, Object=1, Username=jsmith

SPLUNK configuration:

For these reports, Splunk will be configured to identify entries that contain “Type=API”, INFO, “Action=save”, and “AnnotationType”, and to filter by date range. The results are then grouped by contribution type to get the counts.
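The grouping step can be sketched in Python as a key=value parse followed by a count per AnnotationType. The parser, the function names, and the second and third log lines are assumptions for illustration (only the first line comes from this section), and date filtering is left out to keep the sketch short.

```python
import re
from collections import Counter

# Matches Key=value pairs and tolerates a space after "=", as in the sample log.
PAIR_RE = re.compile(r"(\w+)=\s*([^,\s]+)")

def parse_fields(line):
    """Return the Key=value pairs of one log line as a dict."""
    return dict(PAIR_RE.findall(line))

def contributions_by_type(lines):
    """Count Type=API, Action=save entries per AnnotationType."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if fields.get("Type") == "API" and fields.get("Action") == "save":
            annotation_type = fields.get("AnnotationType")
            if annotation_type:
                counts[annotation_type] += 1
    return counts

SAMPLE_LOGS = [
    # Sample line from this section.
    "2014-03-04 15:28:27,054 MainType=USAGE Controller=gov.nara.opa.api.controller.annotation.tags.CreateTagController Type=API, Action=save, AnnotationType= Transcription, Naid= 12345, Object=1, Username=jsmith",
    # Hypothetical lines added for illustration.
    "2014-03-04 15:30:02,101 MainType=USAGE Controller=CreateTagController Type=API, Action=save, AnnotationType= Tag, Naid= 12345, Object=1, Username=mia61",
    "2014-03-04 15:31:45,007 MainType=USAGE Controller=CreateTagController Type=WebApp, Action=save, AnnotationType= Tag, Naid= 67890, Object=2, Username=jsmith",
]

if __name__ == "__main__":
    print(contributions_by_type(SAMPLE_LOGS))
```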

Sample Report:

Catalog Report - No. of user contributions via the API made in the last 1 week

Run Date 02/16/2013

Reporting Period 02/9/2013 - 02/16/2013

Contribution Type  No. of User Contributions via API
Transcriptions     12
Translations       7
Comments           109
Tags               191

5.6 No. of Unique contributors via the API

The system will generate a report on the number of unique contributors via the API. [Req: 14.31]

Source:

This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date timestamp> MainType=USAGE Controller=<Controller-Name> Type=API, Action=save, AnnotationType=<annotation type>, Naid=<naid>, Object=<object no if any>, Username=<username>

Sample Logs:

2014-03-04 15:28:27,889 MainType=USAGE Controller=CreateTagController Type=API, Action=save, AnnotationType= Transcription, Naid= 12345, Object=1, Username=jsmith

SPLUNK configuration:

For these reports, Splunk will be configured to identify the keys “Type=API”, “MainType=INFO”, “Action=save”, and “AnnotationType”, and to filter by date range. Extract the results as stated above and apply a filter to get distinct usernames. Additional note: Splunk has a way to remove duplicates. For more information, see http://docs.splunk.com/Documentation/Splunk/4.2.3/SearchReference/Dedup
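The de-duplication step amounts to collecting usernames into a set. The Python sketch below illustrates that idea only; it is not Splunk's dedup command, and the function name and the second and third log lines are hypothetical (the first line is the sample from this section).

```python
import re

# Matches Key=value pairs and tolerates a space after "=".
PAIR_RE = re.compile(r"(\w+)=\s*([^,\s]+)")

def unique_api_contributors(lines):
    """Return the distinct usernames found on Type=API, Action=save entries."""
    usernames = set()
    for line in lines:
        fields = dict(PAIR_RE.findall(line))
        if (fields.get("Type") == "API" and fields.get("Action") == "save"
                and "AnnotationType" in fields and "Username" in fields):
            usernames.add(fields["Username"])
    return usernames

SAMPLE_LOGS = [
    # Sample line from this section.
    "2014-03-04 15:28:27,889 MainType=USAGE Controller=CreateTagController Type=API, Action=save, AnnotationType= Transcription, Naid= 12345, Object=1, Username=jsmith",
    # Hypothetical lines: a repeat contribution by jsmith and one by another user.
    "2014-03-04 15:40:10,120 MainType=USAGE Controller=CreateTagController Type=API, Action=save, AnnotationType= Tag, Naid= 12345, Object=1, Username=jsmith",
    "2014-03-04 15:41:55,431 MainType=USAGE Controller=CreateTagController Type=API, Action=save, AnnotationType= Tag, Naid= 67890, Object=2, Username=mia61",
]

if __name__ == "__main__":
    print(len(unique_api_contributors(SAMPLE_LOGS)))
```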

Sample Report:

Catalog Report - No. of unique contributors via the API made in the last 1 week

Run Date 02/16/2013

Reporting Period 02/9/2013 - 02/16/2013

No. of unique contributors via the API made in last 1 week 12

6 Export Reports

6.1 No. of times records are exported

The system will generate a report on the number of times records are exported from the system, and the characteristics of the export. [Req: 14.9]

The various characteristics that will be included in the report are:
- Number of records exported
- Type of formats exported
- Type of results exported (e.g., brief vs. full results)
- Total file size
- Type of export (normal, bulk, API)
- Citizen contributed data included in export (tags, comments, transcriptions, translations)
- Type of image setting (thumbnails, only metadata, metadata and thumbnails)

The system will generate a report on the number of times records are exported from the system, broken down by type of export format. [Req: 14.10]

The system will generate a report on the number of times that records are exported from the system. [Req: 14.11]

Source:

This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date-timestamp> MainType=INFO <controller-Name> Type=WebApp, Action=export, Format=<format>,TotalRecords= <no of records exported>, ExportOptions = {brief, full, thumbnails, tags, comments, transcriptions, translations}, username: <username>

Sample Logs:

2014-03-04 15:28:27 MainType=INFO Controller-Name Type=API, Action=export, Format=XML, TotalRecords=500,ExportOptions={thumbnails }, username:jsmith

2013-01-15 15:28:27 MainType=INFO Controller-Name Type=API, Action=export, Format=XML, TotalRecords=500,ExportOptions={thumbnails, comments, tags}, username:jsmith

SPLUNK configuration:

For these reports, Splunk will be configured to identify these keys “Type=API”, “MainType=INFO”, “Action=export” and filter by date range.

Group by export format to get the number of times records are exported in each format (for example, the number of exports in CSV format, the number of exports in PDF format, and so on).
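Grouping export entries by the Format key can be sketched in Python as follows. The first two lines are the sample logs from this section; the third line, the function name, and the regular expression are assumptions for illustration. The sketch filters on Action=export only and leaves out the Type and date-range filters.

```python
import re
from collections import Counter

# Captures the value of the Format key, e.g. "Format=XML".
FORMAT_RE = re.compile(r"\bFormat=(\w+)")

def exports_by_format(lines):
    """Count Action=export entries per export format."""
    counts = Counter()
    for line in lines:
        if "Action=export" not in line:
            continue
        match = FORMAT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

SAMPLE_LOGS = [
    # Sample lines from this section.
    "2014-03-04 15:28:27 MainType=INFO Controller-Name Type=API, Action=export, Format=XML, TotalRecords=500,ExportOptions={thumbnails }, username:jsmith",
    "2013-01-15 15:28:27 MainType=INFO Controller-Name Type=API, Action=export, Format=XML, TotalRecords=500,ExportOptions={thumbnails, comments, tags}, username:jsmith",
    # Hypothetical line added for illustration.
    "2014-03-05 09:10:11 MainType=INFO Controller-Name Type=WebApp, Action=export, Format=CSV, TotalRecords=20,ExportOptions={brief}, username:mia61",
]

if __name__ == "__main__":
    print(exports_by_format(SAMPLE_LOGS))
```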

Sample Report:

Catalog Report - No. of times records are exported from the system, broken down by export format
Run Date          02/16/2013
Reporting Period  01/20/2013 - 01/26/2013

File Format  Frequency
XML          45
CSV          54
JSON         12
PDF          90

Catalog Report - No. of times records are exported from the system in the last 1 week
Run Date          02/16/2013
Reporting Period  02/9/2013 - 02/16/2013

Date        Frequency
02/09/2013  45
02/10/2013  54
02/11/2013  12
02/12/2013  90
02/13/2013  45
02/14/2013  32
02/15/2013  11
02/16/2013  87

6.2 Average number of Exports per User

The system will generate a report on the average number of exports per user. [Req: 14.12]

Source: This report can be generated using the Application Server log files which are input to Splunk. Splunk will be configured to extract the required information from the logs. The following log format will be used for logging in the Catalog application. This is only an initial version and the format may change.

<date-timestamp> MainType=INFO <controller-Name> Type=API, Action=export, Format=<format>,TotalRecords= <no of records exported>, ExportOptions = {thumbnails, tags, comments, transcriptions, translations}, username: <username>

Sample Logs:

2014-03-04 15:28:27 MainType=INFO Controller-Name Type=API, Action=export, Format=XML, TotalRecords=500,ExportOptions={thumbnails }, username:jsmith

SPLUNK configuration:

For these reports, Splunk will be configured to identify the keys “Type=API”, “MainType=INFO”, and “Action=export”. Then group by username to get the number of exports per user, and use Splunk’s avg function to get the average.
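The per-user grouping and the average can be sketched in Python. This only illustrates the arithmetic (total exports divided by the number of distinct users); it is not Splunk's avg function. The function names and all log lines after the first are hypothetical, and the username pattern follows the "username:<username>" form used in the draft log format.

```python
import re
from collections import Counter

# The draft format writes the user as "username:<username>" (colon, not "=").
USER_RE = re.compile(r"username:\s*(\S+)")

def exports_per_user(lines):
    """Count Action=export entries per username."""
    counts = Counter()
    for line in lines:
        if "Action=export" not in line:
            continue
        match = USER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def average_exports_per_user(lines):
    """Total exports divided by the number of distinct exporting users."""
    counts = exports_per_user(lines)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)

SAMPLE_LOGS = [
    # Sample line from this section.
    "2014-03-04 15:28:27 MainType=INFO Controller-Name Type=API, Action=export, Format=XML, TotalRecords=500,ExportOptions={thumbnails }, username:jsmith",
    # Hypothetical lines added for illustration.
    "2014-03-04 16:01:00 MainType=INFO Controller-Name Type=API, Action=export, Format=CSV, TotalRecords=10,ExportOptions={tags}, username:jsmith",
    "2014-03-04 16:05:00 MainType=INFO Controller-Name Type=API, Action=export, Format=PDF, TotalRecords=5,ExportOptions={comments}, username:jsmith",
    "2014-03-04 16:09:00 MainType=INFO Controller-Name Type=API, Action=export, Format=XML, TotalRecords=1,ExportOptions={tags}, username:mia61",
]

if __name__ == "__main__":
    print(exports_per_user(SAMPLE_LOGS))
    print(average_exports_per_user(SAMPLE_LOGS))
```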

Sample Report:

Catalog Report- Average no. of exports per user

Run Date          02/16/2013
Reporting Period  01/20/2013 - 01/26/2013

Username     Average no. of exports
jsmith       32
mia61        12
amartin1     1
kelizabeth5  1

7 Saved Lists - Reports

The system will generate a report on the number of saved lists created by registered users. [Req: 14.17]

Following Reports to be Generated

No. of saved lists created weekly. [Req: 14.17.1]

No. of saved lists created monthly. [Req: 14.17.2]

No. of saved lists created yearly. [Req: 14.17.3]

No. of saved lists created since inception. [Req: 14.17.4]

Source: The source for these reports will be the Catalog Database. Splunk will be configured to communicate with the MySQL database. Refer to the Application Server Design document for the database schema.

Query:

SELECT account_id, COUNT(*)
FROM accounts_lists
GROUP BY account_id

Sample Report:

Catalog Report- No. of saved lists created by registered user

Run Date          02/16/2013
Reporting Period  01/20/2013 - 01/26/2013

Username     No. of lists created
jsmith       32
mia61        12
amartin1     1
kelizabeth5  1

8 Server Space Reports

The system will generate a monthly Catalog Server Space Report on the first Monday of a month [Req: 14.4.2]

Source: The information can be obtained from the Catalog servers. A script will be scheduled to run on the servers to extract the information required to generate this report.

Sample Report:

Catalog Server Space Report
Run Date          02/16/2013
Reporting Period  01/01/2013 - 01/31/2013

Total Space                     300000
Total Used Space                30000
Space Reserved for Maintenance  10000
Total Available/Usable Space    260000

All sizes rounded up to the nearest GB

9 Shares – Reports

9.1 No. of Shares made by Users

Release INFO: Not in R1

The system will generate a report on the number of shares made by users [Req: 14.18].

Following Reports to be Generated

Number of shares made in the last 1 week [Req: 14.18.1]

Number of shares made in the last 1 month [Req: 14.18.2]

Number of shares made in the last 1 year [Req: 14.18.3]

Number of shares made since inception [Req: 14.18.4]

Source: Front end will log the information related to Shares such as Username, URL (Results Page), Record (for content detail pages), and Share Destination. This is only an initial version and the format may change.

<date-timestamp> MainType=INFO <Controller-Name> - Username=<username>, SharedTo=<twitter/email/etc>, Url=<complete url>, RecordId= <naid>

Sample Logs:

2014-03-04 15:28:27 MainType=INFO LogController – username= jsmith, SharedTo=twitter, url=http://catalog.archives.gov/ui/search?q=Truman

2014-01-01 12:23:12 MainType=INFO LogController – username= jsmith, SharedTo=twitter, RecordId=desc-12345

Splunk Configuration:

For these reports, Splunk will be configured to identify the keys “MainType=INFO” and “SharedTo”.

Filter the above results by using different time ranges to get the weekly, monthly, yearly and since inception reports.

Sample Report:

Catalog Report - No. of shares made in last 1 week

Run Date          02/16/2013
Reporting Period  02/9/2013 - 02/16/2013

No. of shares made in last 1 week  78

9.2 No. of Shares per User

Release INFO: Not in R1

The system will generate a report that captures the number of shares per user. [Req: 14.19]

Source: Front end will log the information related to Shares such as Username, URL (Results Page), Record (for content detail pages), and Share Destination. This is only an initial version and the format may change.

<date-timestamp> MainType=INFO <Controller-Name> - username=<username>, sharedTo=<twitter/email/etc>, Url=<complete url>, RecordId=<naid>

Sample Logs:

2014-03-04 15:28:27 MainType=INFO LogController– Username= jsmith, SharedTo=twitter, Url=http://catalog.archives.gov/ui/search?q=Truman

Splunk Configuration:

For these reports, Splunk will be configured to identify the keys “MainType=INFO” and “SharedTo”.

Group the returned results by username to get the number of shares made per user.

Sample Report:

Catalog Report - No. of shares per user

Run Date          02/16/2013
Reporting Period  02/9/2013 - 02/16/2013

Username      No. of shares
Public        56
Jsmith        34
kelizabeth12  23
Rdann         6

9.3 Records with most no. of shares

Release INFO: Not in R1

The system will generate a report that captures the records with the most shares. [Req: 14.20]

Source: Front end will log the information related to Shares such as Username, URL (Results Page), Record (for content detail pages), and Share Destination. This is only an initial version and the format may change.

<date-timestamp> MainType=INFO <Controller-Name> - Username=<username>, SharedTo=<twitter/email/etc>, recordId=<naid>

Sample Logs:

2014-03-04 15:28:27 MainType=INFO LogController– Username= jsmith, SharedTo=twitter, RecordId=672514

Splunk Configuration:

For these reports, Splunk will be configured to identify the keys “MainType=INFO”, “SharedTo”, and “RecordId”.

Group by RecordId (the record's NAID) to get the number of times each record was shared.

Sample Report:

Catalog Report - Records with most no. of shares

Run Date          02/16/2013
Reporting Period  02/9/2013 - 02/16/2013

Record – NAID  No. of shares
desc-12345     56
desc-877111    34
desc-453112    23
desc-918261    6

9.4 Top 100 Shares by Destination

Release INFO: Not in R1

The system will generate a report on the top 100 share destinations on social media platforms, broken down by the destination (e.g., Twitter) and the total number of shares for that destination. [Req: 14.21]

Source: Front end will log the information related to Shares such as Username, URL (Results Page), Record (for content detail pages), and Share Destination. This is only an initial version and the format may change.

<date-timestamp> MainType=INFO <Controller-Name> - Username=<username>, SharedTo=<twitter/email/etc>, Url=<complete url>, RecordId=<naid>

Sample Logs:

2014-03-04 15:28:27 MainType=INFO LogController – Username= jsmith, SharedTo=twitter, Url=http://catalog.archives.gov/ui/search?q=Truman

2014-03-04 15:28:27 MainType=INFO LogController – Username= jsmith, SharedTo=twitter, RecordId=12345

Splunk Configuration:

For these reports, Splunk will be configured to identify the keys “MainType=INFO” and “SharedTo”.

Group by the SharedTo field to get the number of times records were shared to each destination.
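Grouping by SharedTo and keeping the most frequent destinations can be sketched in Python. The first two lines are the sample logs from this section; the third line, the function name, and the choice to lower-case destination names before counting are assumptions made for illustration.

```python
import re
from collections import Counter

# Captures the SharedTo value; tolerates a space or a straight quote after "=".
SHARED_TO_RE = re.compile(r"SharedTo=\s*\"?(\w+)")

def top_share_destinations(lines, limit=100):
    """Return (destination, share count) pairs, most shared first."""
    counts = Counter()
    for line in lines:
        match = SHARED_TO_RE.search(line)
        if match:
            # Lower-casing is an assumption so that "Twitter" and "twitter" group together.
            counts[match.group(1).lower()] += 1
    return counts.most_common(limit)

SAMPLE_LOGS = [
    # Sample lines from this section.
    "2014-03-04 15:28:27 MainType=INFO LogController – Username= jsmith, SharedTo=twitter, Url=http://catalog.archives.gov/ui/search?q=Truman",
    "2014-03-04 15:28:27 MainType=INFO LogController – Username= jsmith, SharedTo=twitter, RecordId=12345",
    # Hypothetical line added for illustration.
    "2014-03-04 15:30:00 MainType=INFO LogController – Username= mia61, SharedTo=email, RecordId=12345",
]

if __name__ == "__main__":
    print(top_share_destinations(SAMPLE_LOGS))
```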

Sample Report:

Catalog Report - Top 100 Share Destinations and no. of shares to that destination
Run Date          02/16/2013
Reporting Period  02/9/2013 - 02/16/2013

Share Destination  No. of shares
Gmail              56
Email              34
Facebook           23
Twitter            6

9.5 No. of times Copy URL is accessed

Release INFO: Not in R1

The system shall generate a report that displays the number of times the copy URL function is accessed. [Req: 14.22]

Source: Front end will log the information related to Shares such as Username, URL (Results Page), Record (for content detail pages), and Share Destination. This is only an initial version and the format may change.

<date-timestamp> MainType=INFO <Javascript Filename> <JavaScript Function Name> - Username=<username>, SharedTo=CopyUrl, Url=<complete url>, NaId=<naid>

Sample Logs:

2014-03-04 15:28:27 MainType=INFO LogController – Username= jsmith, SharedTo=CopyUrl, Url=http://catalog.archives.gov/ui/search?q=Truman

Splunk Configuration:

For these reports, Splunk will be configured to identify the keys “MainType=INFO” and “SharedTo=CopyUrl”. Count the results to get the number of times copy URL was accessed.

Sample Report:

Catalog Report - No. of times Copy URL is accessed

Run Date          02/16/2013
Reporting Period  02/9/2013 - 02/16/2013

No. of times Copy URL was accessed  56

10 Digital Analytics – Reports

The system will implement a digital analytics program. [Req: 14.13]

“Implementation of a digital analytics program”: the government-wide contract is for Google Analytics.

URL for information on code to implement:

http://www.howto.gov/web-content/digital-metrics/digital-analytics-program/analytics-tool-instructions

10.1 Tabbed Groupings Access

The system will generate a report on the total number of times that the tabbed groupings are accessed in the brief search results display. [Req: 14.14]

Following Reports to be Generated

Total no. of times the tabbed groupings are accessed in the brief search results display in the last 1 week. [Req: 14.14.1]

Total no. of times the tabbed groupings are accessed in the brief search results display in the last one month. [Req: 14.14.2]

Total no. of times the tabbed groupings are accessed in the brief search results display in the last one year. [Req: 14.14.3]

Source:

The information for this report will be obtained from the Digital Analytics provided by Google. The logs from Google Analytics will be used to generate information required for this report.

The exact Google Analytics log format will be available during the development phase, once Google Analytics is integrated into the website and logs are generated on the development server. The report content, i.e. the frequency of accessing the tabs in the brief results display, can be obtained from the Google Analytics log files. Google Analytics captures all the clicks and events on a webpage.
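Because the Google Analytics log format is not yet known, the weekly, monthly, and yearly counts can only be sketched against a placeholder record shape. The Python example below assumes each event has already been reduced to a hypothetical (timestamp, tab name) pair and approximates a month as 30 days and a year as 365 days; none of these names or shapes come from Google Analytics.

```python
from collections import Counter
from datetime import datetime, timedelta

# Reporting windows for Req 14.14.1 to 14.14.3 (month and year approximated in days).
WINDOWS = {
    "week": timedelta(days=7),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}

def tab_access_counts(events, run_date, window):
    """Count tab clicks per tab name within the window ending at run_date."""
    start = run_date - WINDOWS[window]
    return Counter(tab for stamp, tab in events if start <= stamp < run_date)

# Hypothetical (timestamp, tab name) records; the real export format is not yet known.
events = [
    (datetime(2013, 2, 15, 10, 0), "All"),
    (datetime(2013, 2, 14, 9, 30), "Images"),
    (datetime(2013, 2, 1, 12, 0), "All"),
    (datetime(2012, 6, 1, 8, 0), "Video"),
]

run_date = datetime(2013, 2, 16)
for window in ("week", "month", "year"):
    print(window, dict(tab_access_counts(events, run_date, window)))
```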

Sample Report:

Catalog Report - Tabbed Groupings Access

Run Date 02/16/2013

Reporting Period 02/9/2013 - 02/16/2013

Tab name            No. of times accessed
All                 455
Available Online    212
Web Pages           211
Documents           114
Images              321
Video               122

10.2 Advanced Search – Access

The system will generate a report that displays the number of times the Advanced Search Screen is accessed. [Req: 14.23]

Source:

The information for this report will be obtained from the Digital Analytics provided by Google. The logs from Google Analytics will be used to generate information required for this report.

The exact Google Analytics log format will be available during the development phase, once Google Analytics is integrated into the website and logs are generated on the development server. The report content, i.e. the frequency of accessing the Advanced Search Screen, can be obtained from the Google Analytics log files. Google Analytics captures all the clicks and events on a webpage.

Sample Report:

Catalog Report - Advanced Search Screen Access

Run Date 02/16/2013

Reporting Period 02/9/2013 - 02/16/2013

No. of times Advanced Search Screen was accessed 78

10.3 Policy & Help Link – Access

Release INFO: Not in R1

The system will generate a report on the number of times the policy link is accessed. [Req: 14.32]

Following Reports to be Generated

Display the number of times the policy link is accessed weekly. [Req: 14.32.1]

Display the number of times the policy link is accessed monthly. [Req: 14.32.2]

Display the number of times the policy link is accessed yearly. [Req: 14.32.3]

The system shall generate a report on the number of times the help link is accessed. [Req: 14.34]

Source:

The information for this report will be obtained from the Digital Analytics provided by Google. The logs from Google Analytics will be used to generate information required for this report.

The exact Google Analytics log format will be available during the development phase, once Google Analytics is integrated into the website and logs are generated on the development server. The report content, i.e. the frequency of accessing the policy and help links, can be obtained from the Google Analytics log files. Google Analytics captures all the clicks and events on a webpage.

Sample Report:

Catalog Report - No. of times Policy Link was accessed - in the last 1 week

Run Date 02/16/2013

Reporting Period 02/9/2013 - 02/16/2013

No. of times Policy link was accessed 43

11 Statistics – Report

This refers to the Statistics page which shows the amount of data per level of description.

The system will provide the capability for a System Administrator to view the amount of data per level of description (i.e. Collection, Record Group, File Unit, Series, and Item). [Req: 14.6]

The system will provide the capability to select a level of description (i.e. Collection, Record Group, Series, File Unit and Item) in order to preview data. [Req: 14.6.1]

The system will provide the capability for a System Administrator to view the amount of data per level of description (i.e. Collection, Record Group, File Unit, Series, and Item). [Req: 14.6.2]

The system will provide the capability for a Reporter to view the amount of data per level of description (i.e. Collection, Record Group, File Unit, Series, and Item). [Req: 14.6.3]

Statistics will be a report in the form of a webpage that is part of the Catalog website and is also accessible to public users. A Statistics link will be available in the header block on all pages of the Catalog website.

The look and feel and the contents of the Statistics webpage are detailed in Section 12 of the Catalog UI Design Document.

The above two reports can be generated using the ARC data indexed in SOLR. This has already been implemented in the prototype.
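One way such per-level totals could be read from Solr is a facet query on the level of description. The Python sketch below is an illustration only and does not describe the prototype's implementation; the field name `level`, the host, and the core name `catalog` are assumptions, and the response fragment is hand-written with made-up counts.

```python
from urllib.parse import urlencode

# Hypothetical Solr field holding the level of description; the real schema name may differ.
LEVEL_FIELD = "level"

def facet_query(base_url):
    """Build a Solr select URL that returns only facet counts per level."""
    params = {
        "q": "*:*",        # match every indexed description
        "rows": 0,         # no documents needed, only the counts
        "wt": "json",
        "facet": "true",
        "facet.field": LEVEL_FIELD,
    }
    return base_url + "/select?" + urlencode(params)

def level_counts(response):
    """Convert Solr's flat [value, count, value, count, ...] facet list to a dict."""
    flat = response["facet_counts"]["facet_fields"][LEVEL_FIELD]
    return dict(zip(flat[0::2], flat[1::2]))

# Hand-written fragment in Solr's default JSON facet layout; the counts are made up.
response = {
    "facet_counts": {
        "facet_fields": {"level": ["Item", 120, "Series", 30, "Record Group", 5]}
    }
}

print(facet_query("http://localhost:8983/solr/catalog"))
print(level_counts(response))
```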

12 Requirements INDEX

Requirement ..... Page

1.20.1 ..... 10
14.1 ..... 5
14.1.1 ..... 5
14.1.2 ..... 5
14.10 ..... 31
14.11 ..... 31
14.12 ..... 33
14.13 ..... 41
14.14 ..... 41
14.14.1 ..... 41
14.14.2 ..... 41
14.14.3 ..... 41
14.15 ..... 6
14.15.1 ..... 6
14.15.2 ..... 6
14.15.3 ..... 6
14.15.4 ..... 6
14.16 ..... 6
14.16.1 ..... 6
14.16.2 ..... 6
14.16.3 ..... 6
14.17 ..... 34
14.17.1 ..... 34
14.17.2 ..... 34
14.17.3 ..... 34
14.17.4 ..... 34
14.18 ..... 36
14.18.1 ..... 36
14.18.2 ..... 36
14.18.3 ..... 36
14.19 ..... 37
14.2 ..... 5
14.2.1 ..... 5
14.2.2 ..... 5
14.20 ..... 38
14.21 ..... 39
14.22 ..... 40
14.23 ..... 42
14.24 ..... 11
14.25 ..... 12
14.26 ..... 25
14.27 ..... 26
14.28 ..... 26
14.29 ..... 27
14.3 ..... 5
14.3.1 ..... 5
14.3.2 ..... 6
14.30 ..... 28
14.31 ..... 29
14.32 ..... 42
14.32.1 ..... 43
14.32.2 ..... 43
14.32.3 ..... 43
14.34 ..... 43
14.4.1 ..... 7, 14
14.4.2 ..... 35
14.4.3 ..... 7
14.4.3.1 ..... 7
14.4.3.2 ..... 7
14.4.3.3 ..... 8
14.4.3.4 ..... 8
14.4.3.5 ..... 8
14.4.3.6 ..... 8
14.4.4 ..... 9
14.4.4.1 ..... 9
14.4.4.2 ..... 9
14.4.4.3 ..... 9
14.4.4.4 ..... 10
14.4.5 ..... 9
14.4.5.1 ..... 10
14.4.5.2 ..... 10
14.4.5.3 ..... 10
14.4.5.4 ..... 10
14.4.6 ..... 7
14.5 ..... 6
14.5.1 ..... 6
14.5.2 ..... 6
14.6 ..... 44
14.6.1 ..... 44
14.6.2 ..... 44
14.6.3 ..... 44
14.7 ..... 5
14.8 ..... 14
14.8.1 ..... 14
14.8.1.1 ..... 14
14.8.10 ..... 19
14.8.11 ..... 20
14.8.2 ..... 15
14.8.2.1 ..... 15
14.8.3 ..... 21
14.8.4 ..... 22
14.8.5 ..... 23
14.8.6 ..... 17
14.8.7 ..... 17
14.8.8 ..... 17
14.8.9 ..... 17
14.9 ..... 31

Database Reports Technical Details

DB App:

Download Splunk DB Connect from http://apps.splunk.com/app/958/

Follow the instructions below to install the app:

Log into Splunk Enterprise

On the Apps Menu, click Manage Apps

Click Install App from file

In the upload app window, click “Choose File”

Locate the .tar.gz file you just downloaded, and then click open or choose. Click upload.

Click Restart Splunk, and then confirm that you want to restart

To install apps and add-ons directly into Splunk Enterprise:

Put the downloaded file in $SPLUNK_HOME/etc/apps directory

Untar and unzip your app or add-on, using a tool like tar -xvf (on Linux/Unix) or WinZip (Windows)

Restart Splunk
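The direct-install steps above can also be scripted. The following is a minimal Python sketch, assuming the downloaded app is a gzip-compressed tar archive from a trusted source; the function name `install_app` is hypothetical, and Splunk still has to be restarted afterwards.

```python
import os
import tarfile

def install_app(archive_path, splunk_home):
    """Unpack a downloaded .tar.gz app into $SPLUNK_HOME/etc/apps and return that directory."""
    apps_dir = os.path.join(splunk_home, "etc", "apps")
    os.makedirs(apps_dir, exist_ok=True)
    # "r:gz" opens a gzip-compressed tar archive; only extract archives you trust.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(apps_dir)
    return apps_dir

# Example call (the archive name is a placeholder, not the real download name):
# install_app("downloaded-app.tar.gz", os.environ["SPLUNK_HOME"])
```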

After installing the app, set it up by clicking on setup:

Set Java Home

Click Save

Install JDBC driver

Copy the MySQL driver mysql-connector-java-5.1.18.jar to the $SPLUNK_HOME/etc/apps/dbx/bin/lib directory.

Restart Splunk (Settings -> Server Controls -> Restart)

Create DB connection

http://docs.splunk.com/Documentation/DBX/latest/DeployDBX/Abouttheconnector

Configure DB inputs