Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Finding File Clones in FreeBSD Ports Collection
Yusuke Sasaki
Tetsuo Yamamoto
Yasuhiro Hayase
Katsuro Inoue
File Clones
* Two or more files with the same content (comments and code indentation are ignored)
* Found inside a project or between different projects
* Research about file clones is scarce
* Goal: gain new knowledge about file clones
#include <stdio.h>
int main(void) { printf("Hello msr!"); return 0; }
[Figure: the same file appears in Project A and in Project B]
FCFinder
* Input: .c and .h files
* Output: file-clone sets
* Faster than other clone detectors (CCFinder, D-CCFinder)
* Detection steps: tokenization, MD5 hash calculation, exact matching
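The three detection steps can be sketched in C as below. Assumptions: the input strings are already normalized (tokenized without comments and layout), and 64-bit FNV-1a is swapped in for MD5 only to keep the sketch free of external libraries; fnv1a64 and group_clones are hypothetical names. Sorting by hash brings identical files next to each other, so each run of equal hashes is one file-clone set.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* 64-bit FNV-1a. NOTE: FCFinder uses MD5; FNV-1a is a stand-in here. */
uint64_t fnv1a64(const char *s) {
    uint64_t h = 14695981039346656037ULL;      /* FNV offset basis */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;                 /* FNV prime */
    }
    return h;
}

typedef struct {
    uint64_t hash;
    size_t file;   /* index into the caller's file list */
} Entry;

/* Order by hash, then by file index, so runs of equal hashes are adjacent. */
static int cmp_entry(const void *a, const void *b) {
    const Entry *x = a, *y = b;
    if (x->hash != y->hash) return x->hash < y->hash ? -1 : 1;
    return x->file < y->file ? -1 : (x->file > y->file);
}

/* Exact matching on hashes of normalized file contents.
 * Writes a clone-set id per file into set_id (-1 for files without a clone)
 * and returns the number of clone sets (sets with at least two files).
 * A production tool would also compare contents to rule out collisions. */
size_t group_clones(const char *normalized[], size_t n, long set_id[]) {
    Entry *e = malloc(n * sizeof *e);
    if (!e) return 0;
    for (size_t i = 0; i < n; i++) {
        e[i].hash = fnv1a64(normalized[i]);
        e[i].file = i;
        set_id[i] = -1;
    }
    qsort(e, n, sizeof *e, cmp_entry);
    size_t sets = 0;
    for (size_t i = 0; i < n;) {
        size_t j = i + 1;
        while (j < n && e[j].hash == e[i].hash) j++;
        if (j - i >= 2) {                      /* one run = one clone set */
            for (size_t k = i; k < j; k++) set_id[e[k].file] = (long)sets;
            sets++;
        }
        i = j;
    }
    free(e);
    return sets;
}
```

Sorting costs O(n log n) in the number of files instead of comparing every pair of files.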
Tool speed on ~1.4M files:
Tool         Time          Speedup   Machines
CCFinder     960 hours     x1        1 PC
D-CCFinder   51 hours      x19       80 PCs
FCFinder     17.16 hours   x55       1 PC
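The speedup column is each tool's time divided into CCFinder's 960 hours. Working the table's numbers (the ~1.4M file count is approximate, and the last line assumes all 80 PCs were busy for the full 51 hours, so it is an upper bound):

```latex
\begin{align*}
S_{\text{D-CCFinder}} &= \frac{960\,\text{h}}{51\,\text{h}} \approx 18.8 \\
S_{\text{FCFinder}} &= \frac{960\,\text{h}}{17.16\,\text{h}} \approx 55.9 \\
\text{throughput}_{\text{FCFinder}} &\approx \frac{1.4\times 10^{6}\ \text{files}}{17.16\,\text{h}}
  \approx 8.2\times 10^{4}\ \text{files/h} \approx 22.7\ \text{files/s} \\
\text{machine time}_{\text{D-CCFinder}} &\le 80 \times 51\,\text{h} = 4080\ \text{PC-hours}
\end{align*}
```

The exact FCFinder ratio is about 55.9, which the slide reports as x55; on one PC it also uses far less total machine time than D-CCFinder's 80 PCs.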
Experiment
Target: only the .c and .h files in the FreeBSD Ports Collection (~1.4M files, ~12 GB), processed by FCFinder in 17.16 hours
We measured:
* File size
* Number of files in each project
* Size of each file-clone set
* Number of file clones in each project
The distributions of these values follow a power law.
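A power law y = a * x^b is a straight line on a log-log plot, log y = log a + b log x, so its exponent and an R^2 value can be obtained by ordinary least squares on the logged data. The slides do not say how their R^2 values were computed; this C sketch assumes that method. It takes already log-transformed inputs so it needs no math library, and fit_loglog and LogLogFit are hypothetical names.

```c
#include <stddef.h>

typedef struct {
    double slope;      /* power-law exponent b in y = a * x^b */
    double intercept;  /* log10(a) */
    double r2;         /* coefficient of determination on the log-log scale */
} LogLogFit;

/* Ordinary least squares on pairs (lx[i], ly[i]), where the caller passes
 * lx[i] = log10(x_i) and ly[i] = log10(y_i) for x_i, y_i > 0.
 * Requires n >= 2 and at least two distinct lx values. */
LogLogFit fit_loglog(const double lx[], const double ly[], size_t n) {
    double mx = 0, my = 0;
    for (size_t i = 0; i < n; i++) { mx += lx[i]; my += ly[i]; }
    mx /= n;
    my /= n;

    double sxx = 0, sxy = 0, syy = 0;   /* centered sums of squares/products */
    for (size_t i = 0; i < n; i++) {
        double dx = lx[i] - mx, dy = ly[i] - my;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }

    LogLogFit f;
    f.slope = sxy / sxx;
    f.intercept = my - f.slope * mx;
    /* For simple linear regression R^2 = sxy^2 / (sxx * syy);
     * if every y is equal, a horizontal line fits exactly. */
    f.r2 = (syy > 0) ? (sxy * sxy) / (sxx * syy) : 1.0;
    return f;
}
```

A negative slope corresponds to the falling distributions on the following plots; an R^2 close to 1 means the points lie close to a straight line in log-log space.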
[Figure: population of file-clone sets. Log-log plot of the number of file-clone sets (y) against file-clone set size (x); power-law fit R² = 0.8508. Annotated outliers: 419 sets of 120 file clones used in both PHP4 and PHP5; 650 sets of 61 file clones used in PHP5 (left) and 500 sets of 59 file clones used in PHP4 (right).]
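The plotted distribution is built from the size of each file-clone set: for every size s, count how many sets contain exactly s files. A minimal C sketch, assuming the set sizes are already known from clone detection; set_size_histogram is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: counts[s] = number of file-clone sets that contain
 * exactly s files, for 0 <= s <= max_size (counts needs max_size + 1 slots).
 * Returns how many sets were larger than max_size and therefore skipped. */
size_t set_size_histogram(const size_t sizes[], size_t n,
                          size_t counts[], size_t max_size) {
    for (size_t s = 0; s <= max_size; s++)
        counts[s] = 0;
    size_t skipped = 0;
    for (size_t i = 0; i < n; i++) {
        if (sizes[i] <= max_size)
            counts[sizes[i]]++;
        else
            skipped++;
    }
    return skipped;
}
```

Each nonzero counts[s] becomes one point (s, counts[s]) on the log-log plot; empty sizes are left out because log 0 is undefined. Outliers such as the PHP groups above show up as isolated points well above the fitted line.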
[Figure: number of file clones per project (clones inside the same project are excluded). Log-log plot of the number of projects with file clones (y) against file clones per project (x, 5 to 10K); power-law fit R² = 0.8263. Annotated groups: PHP5 modules (left), projects related to binutils (center), PHP4 modules (right).]
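The slides do not spell out how clones inside a project are excluded. One plausible reading, assumed in this C sketch, counts for each project its files that belong to a file-clone set containing at least one file from a different project; sets whose members all come from one project are ignored. Files, sets, and projects are identified by small integer indices, and cross_project_clone_counts is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed interpretation: per_project[p] = number of project p's files that
 * sit in a clone set spanning more than one project.
 * set_of[i] / project_of[i] give the clone set and project of file i.
 * Returns 0 on success, -1 if memory allocation fails. */
int cross_project_clone_counts(const size_t set_of[], const size_t project_of[],
                               size_t n_files, size_t n_sets,
                               size_t per_project[], size_t n_projects) {
    long *first = malloc(n_sets * sizeof *first);   /* first project seen */
    bool *mixed = malloc(n_sets * sizeof *mixed);   /* spans >1 project? */
    if (n_sets > 0 && (!first || !mixed)) {
        free(first);
        free(mixed);
        return -1;
    }
    for (size_t s = 0; s < n_sets; s++) { first[s] = -1; mixed[s] = false; }
    for (size_t p = 0; p < n_projects; p++) per_project[p] = 0;

    /* Pass 1: mark clone sets whose members come from several projects. */
    for (size_t i = 0; i < n_files; i++) {
        size_t s = set_of[i];
        if (first[s] < 0)
            first[s] = (long)project_of[i];
        else if ((size_t)first[s] != project_of[i])
            mixed[s] = true;
    }
    /* Pass 2: count each project's files that belong to a mixed set. */
    for (size_t i = 0; i < n_files; i++)
        if (mixed[set_of[i]])
            per_project[project_of[i]]++;

    free(first);
    free(mixed);
    return 0;
}
```

Feeding the nonzero per_project values into a histogram, as for the set sizes, gives the number of projects (y) for each file-clone count (x).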
File-clones Between Projects
* Nodes represent projects
* An edge between two projects is labeled with the number of file clones they share
Example: gcc41 and gfortran share 7691 file clones.
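Edge weights for such a project graph can be derived from the file-clone sets. The slides do not define exactly how shared file clones are counted; this C sketch assumes one count per file-clone set that contains files from both projects. project_edge_weights is a hypothetical name, and projects and sets are small integer indices.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* weight is an n_projects x n_projects row-major matrix. On return,
 * weight[p * n_projects + q] (p != q) = number of clone sets containing files
 * from both project p and project q; the diagonal stays 0.
 * Returns 0 on success, -1 if memory allocation fails. */
int project_edge_weights(const size_t set_of[], const size_t project_of[],
                         size_t n_files, size_t n_sets,
                         size_t weight[], size_t n_projects) {
    if (n_projects == 0) return 0;
    for (size_t k = 0; k < n_projects * n_projects; k++) weight[k] = 0;

    /* present[s * n_projects + p]: does clone set s contain a file of p? */
    bool *present = calloc(n_sets * n_projects, sizeof *present);
    if (n_sets > 0 && !present) return -1;
    for (size_t i = 0; i < n_files; i++)
        present[set_of[i] * n_projects + project_of[i]] = true;

    /* Every pair of projects present in the same set gets one shared clone. */
    for (size_t s = 0; s < n_sets; s++) {
        const bool *row = present + s * n_projects;
        for (size_t p = 0; p < n_projects; p++) {
            if (!row[p]) continue;
            for (size_t q = p + 1; q < n_projects; q++) {
                if (row[q]) {
                    weight[p * n_projects + q]++;
                    weight[q * n_projects + p]++;   /* keep matrix symmetric */
                }
            }
        }
    }
    free(present);
    return 0;
}
```

Nonzero entries above the diagonal are the graph's edges; under this counting rule, a weight such as 7691 between gcc41 and gfortran would mean 7691 clone sets contain files from both projects.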
Conclusions & Future Work
Conclusions
* Measured several features of the FreeBSD Ports Collection
* Found that the measured features follow a power law
Future Work
* Investigate logical coupling between projects