An algorithm (with code on GitHub) to identify cross-language relations. Welcome to polyglot software development!
Spotting automatically
cross-language relations
Federico Tomassetti (me)
Giuseppe Rizzo
Marco Torchiano
data.sql

    CREATE TABLE Persons (
        ID int,
        FirstName varchar(255),
        LastName varchar(255),
        City varchar(255)
    );

Person.java

    String query = "select ID, FirstName, LastName, " +
                   "City " +
                   "from " + dbName + ".Persons";
    try {
        ...
        while (rs.next()) {
            int id = rs.getInt("ID");
            String firstName = rs.getString("FirstName");
            String lastName = rs.getString("LastName");
            String city = rs.getString("City");
        }
    } catch (SQLException e) {
        // (Hopefully it does not happen)
    }
…the overall system works, sometimes
If we could automatically identify
cross-language relations, we could:
• Recognize them: I am aware that this ID is related to something else
• Support refactoring: if I change one, the others are updated
• Validate them: broken relations show up as errors
• Navigate them: click to see the other side of the relation
CodeModels
ASTs
Embedded AST (figure to be taken from the paper)
index.html
<ul id="types">
  <li ng-repeat="t in types" ng-class="{'selected': t.id == type}">
    <a ng-href="#/{{t.id}}">{{t.title}}</a>
  </li>
</ul>
app.js
var types = [
  { id: 'sliding-puzzle', title: 'Sliding puzzle' },
  { id: 'word-search-puzzle', title: 'Word search puzzle' }
];
app.controller('slidingAdvancedCtrl', function($scope) {
$scope.puzzles = [
{ src: './img/misko.jpg', title: 'Miško Hevery', rows: 4, cols: 4 },
{ src: './img/igor.jpg', title: 'Igor Minár', rows: 3, cols: 3 },
{ src: './img/vojta.jpg', title: 'Vojta Jína', rows: 4, cols: 3 }
];
});
<div ng-repeat="puzzle in puzzles">
<h2>{{puzzle.title}}</h2>
…
</div>
Context of a node:
all the descendants
+
the siblings and their descendants
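The definition of context above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, its `add` helper, and the `context` function are hypothetical names, not the data structures of the actual tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical AST node: a value (ID, string, number) plus children."""
    value: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach `child` below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def descendants(node: Node) -> List[Node]:
    """All nodes strictly below `node`, in depth-first order."""
    result = []
    for child in node.children:
        result.append(child)
        result.extend(descendants(child))
    return result


def context(node: Node) -> List[Node]:
    """Descendants of `node`, plus its siblings and their descendants."""
    result = descendants(node)
    if node.parent is not None:
        for sibling in node.parent.children:
            if sibling is not node:
                result.append(sibling)
                result.extend(descendants(sibling))
    return result
```

Note that the parent itself is not part of the context in this sketch, following the definition on the slide literally.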
Some metrics we use:
• Number of shared values
• Min and max number of different values
• Tversky Index
$TV(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha |X - Y| + \beta |Y - X|}$
• Jaro, Jaccard, tf-idf and others
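As an illustration of the Tversky index, here is a minimal Python sketch over sets of values. The default weights `alpha = beta = 0.5` are an assumption made for the example (the slides do not state which weights are used), and returning 0.0 for two empty sets is likewise a choice of this sketch.

```python
def tversky(x, y, alpha=0.5, beta=0.5):
    """Tversky index of two collections of values, treated as sets.

    alpha and beta weight the values found only in x and only in y.
    The defaults are an assumption of this sketch, not taken from the slides.
    """
    x, y = set(x), set(y)
    shared = len(x & y)
    denominator = shared + alpha * len(x - y) + beta * len(y - x)
    return shared / denominator if denominator else 0.0


def jaccard(x, y):
    """Jaccard index: the Tversky index with alpha = beta = 1."""
    return tversky(x, y, alpha=1.0, beta=1.0)
```

For example, with `X = {"ID", "FirstName", "LastName", "City"}` and `Y = {"ID", "FirstName", "dbName"}` there are 2 shared values, 2 values only in `X`, and 1 only in `Y`, so the Jaccard index is 2 / 5.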
How to compare contexts:
1) Take all the values in the context (IDs, strings,
numbers)
+
2) Employ different metrics
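The two steps above might look as follows in Python. This is a sketch under assumptions: `context_features` is a hypothetical name, and "min and max number of different values" is interpreted here as the smaller and larger count of distinct values in the two contexts, which the slides do not spell out.

```python
def context_features(values_a, values_b):
    """Compare the values (IDs, strings, numbers) found in two contexts.

    Returns a few simple set-based features; the interpretation of
    min/max as distinct-value counts is an assumption of this sketch.
    """
    a, b = set(values_a), set(values_b)
    return {
        "shared": len(a & b),                   # number of shared values
        "min_distinct": min(len(a), len(b)),    # smaller context size
        "max_distinct": max(len(a), len(b)),    # larger context size
    }
```

A feature dictionary like this, extended with the similarity metrics listed earlier, is the kind of input a classifier can use to decide whether a candidate pair is a real relation.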
How to combine those metrics:
Random Tree tells us
We built a golden set of 1200 candidate relations
(around 140 real relations; the others just share the same ID)
We train it on the golden set
Random Tree finds the best way to combine those
metrics to decide whether a pair is related or not
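The candidate relations in the golden set are pairs that share the same ID. A simplified Python sketch of enumerating such candidates is shown below; it works at file level rather than on AST nodes and assumes each file is written in a different language. The function name `candidate_relations` and the input shape are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_relations(ids_by_file):
    """Enumerate pairs of files that contain the same ID.

    ids_by_file maps a file name to the IDs found in it (hypothetical
    input shape). Returns a sorted list of (id, file_a, file_b) tuples.
    """
    files_by_id = defaultdict(set)
    for filename, ids in ids_by_file.items():
        for identifier in ids:
            files_by_id[identifier].add(filename)

    candidates = []
    for identifier, files in sorted(files_by_id.items()):
        # Every pair of distinct files sharing the ID is one candidate.
        for file_a, file_b in combinations(sorted(files), 2):
            candidates.append((identifier, file_a, file_b))
    return candidates
```

Most candidates produced this way are coincidental name matches, which is why a classifier is needed to separate them from real relations.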
Rule to decide whether two nodes with the same ID are
connected
Output of Random Tree
How to evaluate it?
10-fold cross-validation
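A minimal sketch of k-fold cross-validation in plain Python, reporting precision and recall. It is not the evaluation code of the tool: `train` stands for any hypothetical training function that returns a predicate from a sample to a boolean, and the fold assignment (shuffle, then stride) is one of several reasonable choices.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def precision_recall(predicted, actual):
    """Precision and recall for boolean predictions against boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(samples, labels, train, k=10, seed=0):
    """Predict each sample with a model trained on the other k-1 folds.

    `train(samples, labels)` is a hypothetical callback that must return
    a function mapping one sample to True (related) or False.
    """
    predicted = [False] * len(samples)
    for fold in k_fold_indices(len(samples), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        for i in fold:
            predicted[i] = model(samples[i])
    return precision_recall(predicted, labels)
```

Each sample is predicted exactly once, by a model that never saw it during training, so the resulting precision and recall are not inflated by memorized examples.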
What now?
Code available at:
https://github.com/orgs/CrossLanguageProject
• We want to build a larger golden set
• We want to integrate support in editors
What we have
• A tool that automatically spots cross-language relations
with precision and recall > 90% (on a first in-house
dataset)
www.slideshare.net/FTomassetti
Spotting Automatically
Cross-Language Relations
Federico Tomassetti, Giuseppe Rizzo, Marco Torchiano
CSMR 2014, Antwerpen, Belgium
Preprint at:
http://www.di.unito.it/~rizzo/publications/Tomassetti_Rizzo-CSMRWCRE2014.pdf