RESEARCH EXPERT Paul Varcholik Joshua Thompson EEL 6883 – Software Engineering II Spring 2009

Page 1: Research ExperT

RESEARCH EXPERT

Paul Varcholik, Joshua Thompson

EEL 6883 – Software Engineering II, Spring 2009

Page 2: Research ExperT

Background
• Academic Research
  • Literature Reviews
  • Conferences
  • Journals
• Material collected from the Internet
  • Google Scholar
• How do researchers organize the papers they find?
  • Hard copies
  • On-Disk Directory Structures

Page 3: Research ExperT

Background (cont.)
• Needs
  • Storage and quick retrieval of research papers
  • Collaboration with colleagues
  • User-provided reviews
  • Annotated references
• Existing Tools
  • 2collab.com
  • Mendeley
  • Zotero
  • Papers (Mac-only)
  • Wikipedia comparison

Page 4: Research ExperT

High-Level Architecture

[Diagram: Database, Data Layer, UI]

• 5 Assemblies
  • 1 Common
  • 1 Data Layer
  • 1 Unit Test
  • 2 UI
    ○ 1 Web
    ○ 1 Windows Forms (WinForms)

Page 5: Research ExperT

First Iteration
• Requirements gathering, initial design, and implementation
• Web-based system
• Foundation set, key features available
• Large scope required feature pull-back
• UI lacking polish

Page 6: Research ExperT

Second Iteration
• Windows Forms (WinForms) UI
• Same base code – database and data layer with some extensions
• Attempts at auto-extraction of meta-data

Page 7: Research ExperT

Iteration Metrics Comparison

Metric                     First Iteration   Second Iteration
Files                      180               (count missing)
ELOC                       ~4,500            ~9,650
Classes and enumerations   57                92
Database tables            15                16
Stored procedures          88                100
Unit tests                 87                96

Page 9: Research ExperT

Unit Testing

Page 10: Research ExperT

Discussion (cont.)
• Low complexity

Cyclomatic complexity   Risk
1-10                    A simple, low risk program
11-20                   A more complex program, moderate risk
21-50                   A complex, high risk program
Greater than 50         An un-testable program (very high risk)

Assembly         Cyclomatic complexity
Unit Test        0.73
Data Layer       1.10
Common           1.54
Windows Client   1.16
Average          1.13
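The two tables above can be combined in a short Python sketch that maps each assembly's figure to its risk band and recomputes the average. The band boundaries and the four values come from the slide. The function name `risk_band` is illustrative, and treating values below 1 as part of the lowest band is an assumption, since the slide's ranges start at 1.

```python
# Risk bands as listed on the slide: (inclusive upper bound, description).
RISK_BANDS = [
    (10, "A simple, low risk program"),
    (20, "A more complex program, moderate risk"),
    (50, "A complex, high risk program"),
]


def risk_band(complexity):
    """Return the slide's risk description for a cyclomatic complexity value.

    Assumption: values below 1 are treated as part of the lowest band.
    """
    for upper_bound, description in RISK_BANDS:
        if complexity <= upper_bound:
            return description
    return "An un-testable program (very high risk)"


# Per-assembly figures reported on the slide.
ASSEMBLY_COMPLEXITY = {
    "Unit Test": 0.73,
    "Data Layer": 1.10,
    "Common": 1.54,
    "Windows Client": 1.16,
}

# Plain arithmetic mean of the four reported figures.
average = sum(ASSEMBLY_COMPLEXITY.values()) / len(ASSEMBLY_COMPLEXITY)

for name, value in ASSEMBLY_COMPLEXITY.items():
    print(f"{name}: {value:.2f} -> {risk_band(value)}")
print(f"Average: {average:.2f}")
```

Every assembly lands in the lowest band, which is what the "Low complexity" heading summarizes.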

Page 11: Research ExperT

Discussion (cont.)
• High maintainability

Assembly         Maintainability
Unit Test        90.75
Data Layer       80.69
Common           82.64
Windows Client   64.34
Average          67.98 *

You can think of the score as a percentage grade: numbers closer to 100 are better.

* The formula for average complexity is logarithmic (the numbers don’t add up like sums)
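The slides do not say which tool produced these scores. Assuming they come from Visual Studio's code metrics, the maintainability index is computed from logarithms of Halstead volume and lines of code, which would explain why scores cannot be averaged like ordinary sums. The Python sketch below uses that formula. The two sets of input values are made up for illustration and are not measurements from the project.

```python
import math


def maintainability_index(halstead_volume, cyclomatic_complexity, lines_of_code):
    """Maintainability index on a 0-100 scale.

    Assumption: this is the formula used by Visual Studio's code metrics;
    the slides do not name the tool that produced their scores.
    """
    raw = (
        171
        - 5.2 * math.log(halstead_volume)
        - 0.23 * cyclomatic_complexity
        - 16.2 * math.log(lines_of_code)
    )
    return max(0.0, raw * 100 / 171)


# Two hypothetical methods (illustrative inputs only).
small = maintainability_index(halstead_volume=100, cyclomatic_complexity=1, lines_of_code=5)
large = maintainability_index(halstead_volume=4000, cyclomatic_complexity=10, lines_of_code=100)

# Averaging the two scores differs from scoring the averaged inputs,
# because the logarithm is not linear.
mean_of_scores = (small + large) / 2
score_of_means = maintainability_index(
    halstead_volume=(100 + 4000) / 2,
    cyclomatic_complexity=(1 + 10) / 2,
    lines_of_code=(5 + 100) / 2,
)

print(f"small={small:.2f} large={large:.2f}")
print(f"mean of scores={mean_of_scores:.2f} score of means={score_of_means:.2f}")
```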

Page 12: Research ExperT

PDF Parsing
• Metadata
  • Issue Heading
  • Title
  • Authors
  • Abstract
  • Keywords

Page 13: Research ExperT

PDF Parsing (cont.)
• Using PDFBox libraries for PDF reading and manipulation
• Three methods for parsing PDFs
  • Automatic
  • XML based
  • User-driven image based

Page 14: Research ExperT

PDF Parsing (cont.)
• Automatic parsing
  • Uses heuristics to determine metadata
    ○ Font sizes
    ○ Relative positioning
    ○ Specific tokens
  • Pros
    ○ No user input required
    ○ Can provide reasonable guesses
  • Cons
    ○ Makes assumptions
    ○ Does not always work 100%
    ○ Difficulties with text grabbing
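The three heuristics named on this slide (font sizes, relative positioning, specific tokens) can be illustrated with a minimal Python sketch. It is not the project's implementation, which used PDFBox from .NET. The `Span` type, the rule "largest font is the title", the rule "first span below the title is the author line", and the sample page are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A run of text from page 1 (hypothetical structure, not a PDFBox type)."""
    text: str
    font_size: float
    top: float  # distance from the top of the page; smaller means higher up


def guess_metadata(spans):
    """Guess title, authors, abstract and keywords from first-page text spans."""
    ordered = sorted(spans, key=lambda s: s.top)

    # Font size heuristic: assume the largest text on the page is the title.
    largest = max(s.font_size for s in ordered)
    title_spans = [s for s in ordered if s.font_size == largest]
    title_bottom = max(s.top for s in title_spans)

    # Relative positioning heuristic: assume the first span below the title
    # holds the authors.
    below = [s for s in ordered if s.top > title_bottom]

    guess = {
        "title": " ".join(s.text for s in title_spans),
        "authors": below[0].text if below else None,
        "abstract": None,
        "keywords": None,
    }

    # Token heuristic: look for spans that start with a known label.
    for span in below:
        lowered = span.text.lower()
        for token in ("abstract", "keywords"):
            if lowered.startswith(token):
                guess[token] = span.text[len(token):].lstrip(" :-.").strip()
    return guess


# Illustrative first page, deliberately out of reading order.
sample = [
    Span("Abstract: We describe a hypothetical system.", 10.0, 120.0),
    Span("An Example Paper Title", 18.0, 40.0),
    Span("A. Author, B. Author", 11.0, 70.0),
    Span("Keywords: pdf, metadata", 10.0, 200.0),
]
guess = guess_metadata(sample)
print(guess)
```

The sketch also shows where the listed cons come from: a paper whose venue banner is set larger than its title would defeat the font-size rule.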

Page 15: Research ExperT

PDF Parsing (cont.)

Page 16: Research ExperT

PDF Parsing (cont.)
• XML Parsing
  • Paper formats are specified
    ○ Order of metadata
    ○ Relative font sizes
    ○ Token delimiters
  • Pros
    ○ More effective than automatic parsing
    ○ No direct user input required
  • Cons
    ○ Requires manual input for each publication source
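A per-source format specification of this kind might look like the following Python sketch. The slides do not show the project's actual XML schema, so the `template` and `field` elements and the `order` and `token` attributes are hypothetical. Relative font sizes are left out to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical template for one publication source.
TEMPLATE_XML = """
<template source="Hypothetical Proceedings">
  <field name="title" order="1"/>
  <field name="authors" order="2"/>
  <field name="abstract" order="3" token="Abstract"/>
  <field name="keywords" order="4" token="Keywords"/>
</template>
"""


def load_template(xml_text):
    """Return (field name, token or None) pairs in the declared order."""
    root = ET.fromstring(xml_text.strip())
    fields = sorted(root.findall("field"), key=lambda f: int(f.get("order")))
    return [(f.get("name"), f.get("token")) for f in fields]


def apply_template(fields, lines):
    """Extract metadata from first-page text lines using the template."""
    result = {}
    cursor = 0
    for name, token in fields:
        if token is None:
            # Positional field: take the next line in order.
            result[name] = lines[cursor] if cursor < len(lines) else None
            cursor += 1
        else:
            # Delimited field: find the line that starts with the token.
            match = next((l for l in lines[cursor:] if l.startswith(token)), None)
            result[name] = match[len(token):].lstrip(" :").strip() if match else None
    return result


fields = load_template(TEMPLATE_XML)
page_lines = [
    "An Example Paper Title",
    "A. Author, B. Author",
    "Abstract: We describe a hypothetical system.",
    "Keywords: pdf, metadata",
]
parsed = apply_template(fields, page_lines)
print(parsed)
```

One template is needed per publication source, which is the manual input listed under the cons.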

Page 17: Research ExperT

PDF Parsing (cont.)
• User-Driven Image Based Parsing
  • Display Page 1
  • User draws rectangles around metadata
  • Uses automatic parsing as an initial guess
    ○ User can review/modify the results
  • Pros
    ○ Uses automatic and user-driven methods
  • Cons
    ○ Requires user input
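The rectangle step can be sketched in Python as a hit test: collect the words whose centers fall inside the user's selection. The `Box` and `Word` types, the center-point rule, and the sample coordinates are assumptions for illustration, not details taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle in page coordinates (origin at top-left)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains_point(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass
class Word:
    """A word on page 1 with its bounding box (hypothetical structure)."""
    text: str
    box: Box


def text_in_rectangle(words, selection):
    """Join the words whose center lies inside the user-drawn rectangle."""
    picked = []
    for word in words:
        center_x = (word.box.left + word.box.right) / 2
        center_y = (word.box.top + word.box.bottom) / 2
        if selection.contains_point(center_x, center_y):
            picked.append(word)
    # Reading order: top to bottom, then left to right.
    picked.sort(key=lambda w: (w.box.top, w.box.left))
    return " ".join(w.text for w in picked)


# Illustrative page: a title line and part of an author line.
words = [
    Word("Title", Box(135, 40, 170, 55)),
    Word("An", Box(50, 40, 70, 55)),
    Word("Example", Box(75, 40, 130, 55)),
    Word("A.", Box(50, 70, 65, 82)),
    Word("Author", Box(70, 70, 110, 82)),
]
title_selection = Box(40, 35, 200, 60)  # rectangle drawn around the title
selected_title = text_in_rectangle(words, title_selection)
print(selected_title)
```

In the workflow on the slide, the automatic guess would supply the initial rectangles and the user would only adjust the ones that are wrong.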

Page 18: Research ExperT

PDF Parsing

Page 19: Research ExperT

Demonstration

Page 20: Research ExperT

Discussion
• Interesting uses of .NET Reflection
  • Object Registry
• Difficulties of PDF Parsing
  • Approaches to resolving these difficulties
    ○ Publication source templates
    ○ User input
    ○ Cut-and-paste

Page 21: Research ExperT

Future Work
• Integrated meta-data parsing
• Group-User-Repository access roles
• Author ranking
• Advanced searching
• Annotated references
• Additional document types (e.g. MS Word)
• More UI polish
  • Server selection
  • Review attachment improvements
  • Administration features

Page 22: Research ExperT

Questions?

Research Expert

Paul Varcholik, Joshua Thompson

EEL 6883 – Software Engineering II, Spring 2009