I. Keivanloo, L. Roostapour, P. Schugerl, J. Rilling
Scalable Semantic Web-based Source Code Search Infrastructure
SE-CodeSearch
ICSM 2010 ERA 2
Search
Who lives in London?
Who has relatives in London!
9/14/2010
Source code search
Where is it defined? Where is it called!
Query types
• Pure structural (PSQ)
• Metadata (MDQ)
• Transitive closure-based (TCQ)
• Method call (MCQ)
• Absent information (AIQ)
• Mixed queries (MXQ)
Requirement-based classification
SICS: Semantic-rich Internet-scale Code Search
• Supports all query types
• Handles a tera-scale repository
Is there any SICS?
•NO
Challenges
• Incomplete code (no binaries)
• Repository evolution
  – The crawler is working 24/7
  – Dependent code might be indexed in any order
• Very large repository (tera-scale)
SE-CodeSearch
• Creates a small ontology for each code part
  – Code facts
  – Static code analysis rules
• Saves them in the RDF repository
• Uses a backward-chaining reasoner to answer
  – Not only structural queries
  – But also all the other query types
  (embedded code analysis at runtime)
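
The pipeline above can be illustrated with a minimal Python sketch. It assumes code facts are stored as plain (subject, predicate, object) triples and that one analysis rule, transitive subtyping, is evaluated only when a query asks for it, in the spirit of backward chaining. The predicate names (`extends`, `declaresMethod`) and the helpers `objects`, `supertypes` and `declaring_types` are hypothetical; they are not taken from SICSONT or from the SE-CodeSearch implementation.

```python
# Hypothetical code facts for three Java types, stored as RDF-like triples.
FACTS = {
    ("app.Dog", "extends", "app.Animal"),
    ("app.Puppy", "extends", "app.Dog"),
    ("app.Animal", "declaresMethod", "speak"),
}


def objects(subject, predicate, facts=FACTS):
    """Return every o such that (subject, predicate, o) is a stored fact."""
    return {o for (s, p, o) in facts if s == subject and p == predicate}


def supertypes(type_name, facts=FACTS, seen=None):
    """Goal-driven rule: T is a supertype of X if X extends T directly,
    or X extends some Y and T is a supertype of Y.
    Nothing is materialised in the store; the rule runs per query."""
    seen = set() if seen is None else seen
    for parent in objects(type_name, "extends", facts):
        if parent not in seen:  # guard against cyclic facts
            seen.add(parent)
            supertypes(parent, facts, seen)
    return seen


def declaring_types(type_name, method, facts=FACTS):
    """Mixed query: which type in the hierarchy declares `method`?"""
    candidates = {type_name} | supertypes(type_name, facts)
    return {t for t in candidates
            if method in objects(t, "declaresMethod", facts)}


print(sorted(supertypes("app.Puppy")))
print(declaring_types("app.Puppy", "speak"))
```

Only the three asserted facts are stored; the indirect relation between `app.Puppy` and `app.Animal` is derived when the query is asked.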
SICSONT
• Source Code Ontology for Internet-scale Static Analysis
http://aseg.cs.concordia.ca/ontology#sicsont
Semantic Web-based Static Code Analysis
• Knowledge-based approach
• Inference engine does the analysis
• Restricted to OWL-DL
– De facto standard for knowledge sharing
– Based on Description Logic
• Decidable
• More restricted than rule-based families
Semantic Web-based Static Code Analysis (Cont.)
• No compiler
• Possible analysis
  – Inheritance tree computation
  – Fully qualified name resolution
  – Method call/return statement and type resolution
• Translation template for each analysis rule
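
As one illustration of a compiler-free analysis, fully qualified name resolution can be sketched as a lookup over import and declaration facts. The Python sketch below uses a simplified precedence (explicit import, then same package, then wildcard import) and hypothetical inputs. The function `resolve` and its parameters are assumptions for illustration, not the translation template used by SE-CodeSearch.

```python
def resolve(simple_name, package, single_imports, wildcard_imports, known_types):
    """Resolve a simple type name to a fully qualified name, or None.

    known_types is the set of fully qualified names indexed so far; with
    partial code it may be incomplete, so None means "not resolvable yet".
    """
    # 1. Explicit single-type import, e.g. `import java.util.List;`.
    #    The import itself carries the full name, so the target type
    #    does not need to be indexed.
    for fqn in single_imports:
        if fqn.rsplit(".", 1)[-1] == simple_name:
            return fqn
    # 2. A type declared in the same package.
    candidate = f"{package}.{simple_name}"
    if candidate in known_types:
        return candidate
    # 3. Wildcard import, e.g. `import java.io.*;` (java.lang is implicit).
    for pkg in list(wildcard_imports) + ["java.lang"]:
        candidate = f"{pkg}.{simple_name}"
        if candidate in known_types:
            return candidate
    return None


# Hypothetical facts extracted from one source file and the index so far.
known = {"java.lang.String", "java.io.File", "app.model.User"}
for name in ["List", "User", "File", "String", "Widget"]:
    print(name, resolve(name, "app.model", ["java.util.List"], ["java.io"], known))
```

A `None` result is kept as "unknown" and is not treated as an error, so the name can still be resolved once the missing declaration is indexed.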
Scalability Test
Queries:
1. Transitive closure-based
2. Method call
Dataset: 600,000 Java classes (no binaries) from a very large dataset (~400 GB)
http://www.ics.uci.edu/~lopes/datasets.
Hardware:
• 3 GB RAM
• 3.40 GHz CPU
SE-CodeSearch Highlights
•Avoid expensive knowledge modeling
•Optimized ontology population
•Backward-chaining reasoner
•Disk-based computation
–Works on minimum hardware
SE-CodeSearch Highlights (Cont.)
•Parallelization
–One pass code analysis
–Static code analysis on
•Complete code
•Partial Code
–Independent of parsing order
•First Package A then Package B
•First Package B then Package A
–Repository evolves incrementally
•Open World Reasoning (Not available in Relational DB)
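
The order-independence and open-world points above can be illustrated with a small Python sketch. It assumes each package contributes a set of hypothetical (subtype, supertype) facts and that the inheritance query is evaluated at query time, so a fact that has not arrived yet is treated as unknown, not false. The class `FactStore` and its methods are illustrative names only.

```python
class FactStore:
    """Append-only store of (subtype, supertype) facts."""

    def __init__(self):
        self.extends = set()

    def index(self, facts):
        # Facts may arrive in any order and at any time (crawler-style).
        self.extends |= set(facts)

    def supertypes(self, type_name):
        """Supertypes derivable from the facts indexed so far."""
        found, todo = set(), [type_name]
        while todo:
            current = todo.pop()
            for sub, sup in self.extends:
                if sub == current and sup not in found:
                    found.add(sup)
                    todo.append(sup)
        return found


# Package A depends on package B: a.Impl extends b.Base extends b.Root.
PACKAGE_A = [("a.Impl", "b.Base")]
PACKAGE_B = [("b.Base", "b.Root")]

a_first, b_first = FactStore(), FactStore()

a_first.index(PACKAGE_A)
partial = a_first.supertypes("a.Impl")  # B not indexed yet: partial answer
a_first.index(PACKAGE_B)

b_first.index(PACKAGE_B)
b_first.index(PACKAGE_A)

print(partial)
print(a_first.supertypes("a.Impl") == b_first.supertypes("a.Impl"))
```

Because nothing is precomputed at indexing time, both indexing orders end with the same answer, and a query issued before package B arrives simply returns what is known so far.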
The poster
?
• SE-CodeSearch homepage: http://aseg.cs.concordia.ca/codesearch
• Source Code Ontology homepage: http://aseg.cs.concordia.ca/ontology
• ASEG Lab. homepage: http://aseg.cs.concordia.ca
• Any questions: [email protected]