Graph-based, Pattern-oriented, Context-sensitive Code Completion
Anh Nguyen, Tung Nguyen, Hoan Nguyen, Ahmed Tamrawi, Hung Nguyen, Jafar Al-Kofahi, and Tien N. Nguyen
Electrical and Computer Engineering Department
Iowa State University
2
3
Eclipse’s Built-in Code Completion
[Screenshot: Eclipse's code completion, annotated with the code completion invocation point, the list of recommended methods, and the documentation on a proposed method]
4
Eclipse’s Built-in Code Completion
[Screenshot: the filled-in code]
5
Source Code Completion
- Plays an important role in modern IDEs
- Supports developers by
  - Recommending relevant code
  - Automatically filling in code
6
State-of-the-Art Code Completion
- Single method/field recommendation
  - Non-ranked list (sorted alphabetically)
  - Ranked list
    - By return type (Ye et al., ICSE ’02)
    - Via co-occurring methods (Bruch et al., FSE ’09)
    - Via editing history (Robbes and Lanza, ASE ’08)
- Template-based recommendation (e.g., Eclipse)
7
Template-based Code Completion
Recommending a template without considering context
8
Our Goal
A code completion approach and tool
- Auto-completing a high volume of code
- Taking into consideration the context of the currently edited code
9
GraPacc Approach
- Developing a pattern-oriented, context-sensitive code completion approach
- Evaluating the usefulness of our code completion method and tool
10
Programming Pattern
- A correct and frequent usage of API elements
- Used to perform a specific programming task

// Reading a text file char-by-char using FileReader and BufferedReader
// Declaring
String fileName = "myfile.txt";
FileReader fReader = new FileReader(fileName);
BufferedReader bReader = new BufferedReader(fReader);
// Reading characters
while (bReader.ready()) {
    bReader.read();
}
// Closing
bReader.close();
fReader.close();
11
Pattern-oriented Completion
GraPacc recommends multiple method invocations on multiple variables of different types, together with control structures (if, for, …), adapted to the code currently being edited.
12
Context-sensitive Recommendation
Query
- A code fragment under editing
- A sequence of textual tokens
- Often incomplete and possibly unparseable
13
Context-sensitive Recommendation
Different cursor positions can lead to different recommendation lists
[Figure: panels a) and b) show recommendation lists for two cursor positions]
14
GraPacc Overview
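[Diagram: GraPacc pipeline]
- Pattern Database: stores patterns {P} and provides pattern features {fp}
- Query Processing: turns query Q into query features {fq}
- Searching & Ranking: matches {fq} against {fp} and produces a ranked list of patterns {P}
- Code Completion: uses a ranked pattern to produce the filled-in code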
15
Pattern Management
16
Pattern Representation
Graph-based Object Usage Model (Groum) [Nguyen et al., FSE ’09]
- A directed acyclic graph
- Represents control and data dependencies

FileReader fReader = new FileReader("c:/aTextFile.txt");
BufferedReader bReader = new BufferedReader(fReader);
while (bReader.ready()) {}

[Figure: Groum of the code above]
- Action nodes: FileReader.new, BufferedReader.new, BufferedReader.ready
- Data nodes: FileReader, BufferedReader
- Control node: WHILE
- Edges: data dependencies and control dependencies
17
Features
- Graph-based feature: a sequence of the textual labels of the nodes along a path of a Groum
- Token-based feature: a lexical token extracted from a query
18
Features
[Figure: the example Groum with nodes FileReader.new, FileReader, BufferedReader.new, BufferedReader, BufferedReader.ready, WHILE]
A feature’s size: the number of nodes on the path

Size-1 features
- FileReader.new
- FileReader
- BufferedReader.new
- BufferedReader
- BufferedReader.ready
- WHILE

Size-2 features
- FileReader.new -> FileReader
- FileReader -> BufferedReader.new
- FileReader.new -> BufferedReader.new
- BufferedReader.new -> BufferedReader
- BufferedReader -> BufferedReader.ready
- BufferedReader.ready -> WHILE

Size-3 features
- FileReader.new -> FileReader -> BufferedReader.new
- FileReader.new -> BufferedReader.new -> BufferedReader
- FileReader.new -> BufferedReader.new -> BufferedReader.ready
- FileReader -> BufferedReader.new -> BufferedReader
- FileReader -> BufferedReader.new -> BufferedReader.ready
- BufferedReader.new -> BufferedReader -> BufferedReader.ready
- BufferedReader.new -> BufferedReader.ready -> WHILE
- BufferedReader -> BufferedReader.ready -> WHILE
- …
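The enumeration of graph-based features can be sketched as a bounded path walk over the Groum. This is a minimal illustration, not GraPacc's implementation: the class `GroumFeatures`, the adjacency-map encoding, and the exact edge set of the example are assumptions read off the feature lists on this slide, and nodes are identified by their labels, which only works while labels are unique.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists the label paths (features) of a Groum up to a maximum size.
class GroumFeatures {

    // Example Groum as an adjacency map; the edge set is an assumption.
    static Map<String, List<String>> exampleGroum() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("FileReader.new", List.of("FileReader", "BufferedReader.new"));
        g.put("FileReader", List.of("BufferedReader.new"));
        g.put("BufferedReader.new", List.of("BufferedReader", "BufferedReader.ready"));
        g.put("BufferedReader", List.of("BufferedReader.ready"));
        g.put("BufferedReader.ready", List.of("WHILE"));
        g.put("WHILE", List.of());
        return g;
    }

    // Returns every path with 1..maxSize nodes (maxSize >= 1), rendered as "a -> b -> c".
    static List<String> features(Map<String, List<String>> groum, int maxSize) {
        List<String> out = new ArrayList<>();
        for (String start : groum.keySet()) {
            List<String> path = new ArrayList<>();
            path.add(start);
            extend(groum, path, maxSize, out);
        }
        return out;
    }

    // Depth-first extension of the current path; terminates because a Groum is acyclic.
    private static void extend(Map<String, List<String>> groum, List<String> path,
                               int maxSize, List<String> out) {
        out.add(String.join(" -> ", path));
        if (path.size() >= maxSize) {
            return;
        }
        String last = path.get(path.size() - 1);
        for (String next : groum.getOrDefault(last, List.of())) {
            path.add(next);
            extend(groum, path, maxSize, out);
            path.remove(path.size() - 1);
        }
    }

    public static void main(String[] args) {
        features(exampleGroum(), 3).forEach(System.out::println);
    }
}
```

Calling `features(exampleGroum(), 1)` yields the six size-1 features; raising `maxSize` adds the longer paths.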
19
Patterns’ Feature Weighting
Significance of feature f in pattern P (tf-idf):
$$ s(f, P) = \frac{N_{f,P}}{N_P} \cdot \log \frac{N}{N_f} $$
- N_{f,P}: number of occurrences of f in P
- N_P: number of features in P
- N_f: number of patterns containing f
- N: number of patterns in the pattern database

Popularity of pattern P: Pr(P)
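As a small worked illustration of the significance formula, the sketch below computes s(f, P) from the four counts. The class name, the sample counts, and the use of the natural logarithm are assumptions; the slide does not state the base of the logarithm.

```java
// Hypothetical helper illustrating the tf-idf style significance s(f, P).
class FeatureSignificance {

    // nFP: occurrences of f in P, nP: features in P,
    // nF: patterns containing f, n: patterns in the database.
    static double significance(int nFP, int nP, int nF, int n) {
        double tf = (double) nFP / nP;
        double idf = Math.log((double) n / nF); // natural log is an assumption
        return tf * idf;
    }

    public static void main(String[] args) {
        // Made-up counts: f occurs 2 times among 10 features of P,
        // and 5 of 100 patterns contain f.
        System.out.println(significance(2, 10, 5, 100));
    }
}
```

A feature that occurs in every pattern gets significance 0, since log(N / N_f) = log 1 = 0, so ubiquitous features do not influence ranking.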
20
Storing Patterns
GraPacc stores each pattern with
- Features and their weights
- Code templates
Inverted indexing is applied to patterns via their features.
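Inverted indexing over features can be sketched as a map from each feature to the patterns that contain it, so that only patterns sharing at least one feature with the query need to be scored. The class `PatternIndex`, its method names, and the pattern identifiers are hypothetical and only illustrate the idea.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical inverted index: feature -> identifiers of the patterns containing it.
class PatternIndex {
    private final Map<String, Set<String>> featureToPatterns = new HashMap<>();

    // Registers a pattern under each of its features.
    void add(String patternId, Collection<String> features) {
        for (String feature : features) {
            featureToPatterns.computeIfAbsent(feature, k -> new TreeSet<>()).add(patternId);
        }
    }

    // Returns the patterns that share at least one feature with the query.
    Set<String> candidates(Collection<String> queryFeatures) {
        Set<String> result = new TreeSet<>();
        for (String feature : queryFeatures) {
            result.addAll(featureToPatterns.getOrDefault(feature, Set.of()));
        }
        return result;
    }
}
```

For instance, after adding a pattern "readFile" with the features FileReader.new and BufferedReader.new, a query containing BufferedReader.new retrieves "readFile" as a candidate without scanning unrelated patterns.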
21
Query Processing
Partial Program Analysis [Dagenais et al. OOPSLA ’08]
Tokenizing and parsing the code under editing into an Abstract Syntax Tree (AST)
22
public void readText() {
    FileReader fReader;
    BufferedReader bReader = new BufferedReader(fReader);
}

[Figure: AST of the code above. Node labels: Method declaration, Method body, Declaration, fReader, Declaration, bReader, Initialization, Assignment, fReader]
23
Building Groum
The query’s AST is used to build the query’s Groum.
[Figure: the query's AST (Method declaration, Method body, Declaration, Initialization, Assignment, fReader, bReader) and the resulting Groum with nodes FileReader, BufferedReader.new, BufferedReader]
24
Feature Extraction
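[Figure: the query's Groum with nodes FileReader, BufferedReader.new, BufferedReader]

Graph-based features
- Size 1: FileReader; BufferedReader.new; BufferedReader
- Size 2: FileReader -> BufferedReader.new; BufferedReader.new -> BufferedReader
- Size 3: FileReader -> BufferedReader.new -> BufferedReader

Token-based features
- Remaining textual tokens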
25
Weighting Query’s Features
- Context-sensitive: takes into account the current editing point and the surrounding code
- Features are weighted to represent their significance in a query

$$ w(q) = (w_s(q) + w_c(q)) \cdot w_f(q) $$
- w_s(q): size-based factor
- w_c(q): centrality-based factor
- w_f(q): location-based factor (distance to the focus point)
26
Searching and Ranking
27
Searching and Ranking
[Diagram: query Q is compared with each pattern P1, P2, …, Pn, giving a relevancy score rel(Pi, Q) for each pattern]
28
Pattern Relevancy
Relevancy between pattern P and query Q is defined as:
$$ \mathrm{rel}(P, Q) = \Pr(P) \cdot \sum_{p \in P,\ q = M(p)} \mathrm{rel}(p, q) $$
- rel(P, Q): combined relevancy
- Pr(P): popularity of P
- M: weighted maximum matching
- rel(p, q): relevancy between pairs of features p in P and q in Q
29
Feature Relevancy
The relevancy between two features p in pattern P and q in query Q:
$$ \mathrm{rel}(p, q) = s(p, P) \cdot w(q) \cdot \mathrm{sim}(p, q) $$
- s(p, P): significance of p in P
$$ s(p, P) = \frac{N_{p,P}}{N_P} \cdot \log \frac{N}{N_p} $$
- w(q): weight of q in Q
$$ w(q) = (w_s(q) + w_c(q)) \cdot w_f(q) $$
- sim(p, q): similarity between p and q
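The two relevancy formulas compose as sketched below: each matched feature pair contributes the product of its three factors, and the sum over the matching is scaled by the pattern's popularity. The class `Relevancy`, the `MatchedPair` holder, and the sample values are hypothetical, and the matching M is assumed to be computed elsewhere.

```java
import java.util.List;

// Hypothetical sketch of rel(p, q) and rel(P, Q) for an already computed matching M.
class Relevancy {

    // One matched pair (p, q = M(p)) with its precomputed factors.
    static final class MatchedPair {
        final double significance; // s(p, P)
        final double queryWeight;  // w(q)
        final double similarity;   // sim(p, q)

        MatchedPair(double significance, double queryWeight, double similarity) {
            this.significance = significance;
            this.queryWeight = queryWeight;
            this.similarity = similarity;
        }
    }

    // rel(p, q) = s(p, P) * w(q) * sim(p, q)
    static double featureRelevancy(MatchedPair m) {
        return m.significance * m.queryWeight * m.similarity;
    }

    // rel(P, Q) = Pr(P) * sum of rel(p, q) over the matched pairs
    static double patternRelevancy(double popularity, List<MatchedPair> matching) {
        double sum = 0.0;
        for (MatchedPair m : matching) {
            sum += featureRelevancy(m);
        }
        return popularity * sum;
    }

    public static void main(String[] args) {
        List<MatchedPair> matching = List.of(
                new MatchedPair(0.5, 2.0, 1.0),
                new MatchedPair(0.25, 4.0, 0.5));
        System.out.println(patternRelevancy(0.5, matching));
    }
}
```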
30
Feature Similarity
The similarity between two size-k features:
- Feature p of P: p_1 p_2 .. p_k
- Feature q of Q: q_1 q_2 .. q_k

$$ \mathrm{sim}(p, q) = \prod_{i=1}^{k} \mathrm{nsim}(p_i, q_i) $$
where nsim(p_i, q_i) is the name-based similarity between the pair of elements p_i and q_i.

Example:
p: FileReader -> BufferedReader.new
q: FileRead -> BufferedReader.new
$$ \mathrm{sim}(p, q) = \frac{1}{2} \cdot 1 = \frac{1}{2} $$
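The product form of the feature similarity can be sketched as follows. The slides do not define the name-based similarity nsim, so the version here is an assumption chosen only because it reproduces the example: labels are split into camel-case words (and at dots), and nsim is the number of shared words divided by the larger word count. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of sim(p, q) as a product of element-wise name similarities.
class FeatureSimilarity {

    // Splits a label such as "BufferedReader.new" into {Buffered, Reader, new}.
    static Set<String> words(String label) {
        return new HashSet<>(Arrays.asList(label.split("(?<=[a-z])(?=[A-Z])|\\.")));
    }

    // Assumed name-based similarity: shared words / larger word count.
    static double nsim(String a, String b) {
        Set<String> wordsA = words(a);
        Set<String> wordsB = words(b);
        Set<String> shared = new HashSet<>(wordsA);
        shared.retainAll(wordsB);
        return (double) shared.size() / Math.max(wordsA.size(), wordsB.size());
    }

    // sim(p, q) for two features of the same size k.
    static double sim(List<String> p, List<String> q) {
        if (p.size() != q.size()) {
            throw new IllegalArgumentException("features must have the same size k");
        }
        double product = 1.0;
        for (int i = 0; i < p.size(); i++) {
            product *= nsim(p.get(i), q.get(i));
        }
        return product;
    }

    public static void main(String[] args) {
        System.out.println(sim(List.of("FileReader", "BufferedReader.new"),
                               List.of("FileRead", "BufferedReader.new")));
    }
}
```

Under this assumed nsim, FileReader and FileRead share one of two words, so the example evaluates to 1/2 · 1 = 1/2 as on the slide.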
31
Code Completion
32
Aligning Nodes
[Figure: maximal alignment between the query's Groum and the pattern's Groum]
- Query nodes: FileReader.new, FileReader, BufferedReader.new, BufferedReader
- Pattern nodes: FileReader.new, FileReader, BufferedReader.new, BufferedReader, String, String.new, BufferedReader.ready, FOR, BufferedReader.close, FileReader.close
- The maximal alignment is computed by maximum weighted bipartite matching
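To make the alignment step concrete, the sketch below finds a maximum weighted bipartite matching by exhaustive search. This is a deliberately simple stand-in: exhaustive search is only practical for a handful of nodes, and a real tool would more likely use a polynomial-time method such as the Hungarian algorithm. The class `NodeAlignment` and the weight matrix are hypothetical; how GraPacc weights node pairs is not stated on the slide.

```java
import java.util.Arrays;

// Hypothetical sketch: exact maximum weighted bipartite matching by exhaustive search.
class NodeAlignment {

    // weights[i][j]: assumed similarity of query node i and pattern node j (0 = not alignable).
    // Returns, for each query node, the index of its aligned pattern node, or -1.
    static int[] align(double[][] weights, int patternSize) {
        int[] current = new int[weights.length];
        int[] best = new int[weights.length];
        Arrays.fill(current, -1);
        Arrays.fill(best, -1);
        double[] bestScore = {0.0};
        search(weights, 0, new boolean[patternSize], current, 0.0, best, bestScore);
        return best;
    }

    private static void search(double[][] w, int i, boolean[] used, int[] current,
                               double score, int[] best, double[] bestScore) {
        if (i == w.length) {
            if (score > bestScore[0]) { // keep the best complete assignment seen so far
                bestScore[0] = score;
                System.arraycopy(current, 0, best, 0, current.length);
            }
            return;
        }
        current[i] = -1; // option 1: leave query node i unaligned
        search(w, i + 1, used, current, score, best, bestScore);
        for (int j = 0; j < used.length; j++) { // option 2: align it with a free pattern node
            if (!used[j] && w[i][j] > 0) {
                used[j] = true;
                current[i] = j;
                search(w, i + 1, used, current, score + w[i][j], best, bestScore);
                used[j] = false;
            }
        }
        current[i] = -1;
    }

    public static void main(String[] args) {
        double[][] weights = {{1.0, 0.9, 0.0}, {0.95, 0.0, 0.0}};
        System.out.println(Arrays.toString(align(weights, 3)));
    }
}
```

With the weights in `main`, picking the single best pair first (query node 0 with pattern node 0) would leave query node 1 unaligned for a total of 1.0, whereas the optimal matching pairs node 0 with pattern node 1 and node 1 with pattern node 0 for a total of 1.85.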
33
Inserting Nodes & Edges
[Figure: the query's Groum and the pattern's Groum after alignment]
- Aligned nodes: FileReader.new, FileReader, BufferedReader.new, BufferedReader
- Unaligned pattern nodes, inserted into the query together with their edges: String, String.new, BufferedReader.ready, FOR, BufferedReader.close, FileReader.close
34
Empirical Evaluation
- Goal: measure how accurately GraPacc recommends and fills in the current code
- 28 subject systems:
  - 4 systems for mining patterns
  - 24 systems for testing
- Using 197 patterns from java.util and java.io
35
Simulation
- Divide the code and take the first half as the query
- Obtain GraPacc’s recommendation
- Compare the recommendation with the real code and calculate accuracy from:
  - # shared nodes between recommended and real code
  - # nodes that appear in real code
  - # nodes that appear in recommended code
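The three node counts lend themselves to the usual precision, recall, and F-score measures; that reading is an assumption, since the slide lists only the counts. The sketch below also simplifies by treating each piece of code as a set of node labels, and the class `CompletionAccuracy` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: accuracy of a recommendation from node-label sets.
class CompletionAccuracy {

    // Number of nodes shared between recommended and real code.
    static int shared(Set<String> recommended, Set<String> real) {
        Set<String> common = new HashSet<>(recommended);
        common.retainAll(real);
        return common.size();
    }

    // Fraction of recommended nodes that appear in the real code.
    static double precision(Set<String> recommended, Set<String> real) {
        return recommended.isEmpty() ? 0.0 : (double) shared(recommended, real) / recommended.size();
    }

    // Fraction of real nodes that appear in the recommended code.
    static double recall(Set<String> recommended, Set<String> real) {
        return real.isEmpty() ? 0.0 : (double) shared(recommended, real) / real.size();
    }

    // Harmonic mean of precision and recall.
    static double fScore(Set<String> recommended, Set<String> real) {
        double p = precision(recommended, real);
        double r = recall(recommended, real);
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        Set<String> recommended = Set.of("FileReader.new", "BufferedReader.new",
                "BufferedReader.ready", "FOR");
        Set<String> real = Set.of("FileReader.new", "BufferedReader.new",
                "BufferedReader.ready", "WHILE", "BufferedReader.close");
        System.out.println(precision(recommended, real) + " " + recall(recommended, real)
                + " " + fScore(recommended, real));
    }
}
```

In the made-up example, three of the four recommended nodes appear in the real code and three of the five real nodes are recommended.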
36
Accuracy results
71% of API usages are covered by API usage patterns.
Approximately 1.2 patterns are recommended per test method.
37
Conclusions
Code completion using graph-based patterns and context information
Future work includes a user study and adaptive code completion
Demo on Friday 10:45-12:45
GraPacc
http://home.engineering.iastate.edu/~anhnt/Research/GraPacc
38
39
Return-type-based Ranking
40
Sources of inaccuracies
- Customization of a pattern (e.g., two consecutive readLine method calls)
  - GraPacc does not aim to replace developers
  - Developers can easily customize the filled-in code
- Lack of patterns in the database
- API usage that spans two methods