Finding File Clones in FreeBSD Ports Collection

Page 1: Finding File Clones in FreeBSD Ports Collection

Department of Computer Science, Graduate School of Information Science & Technology, Osaka University

Finding File Clones in FreeBSD Ports Collection

Yusuke Sasaki

Tetsuo Yamamoto

Yasuhiro Hayase

Katsuro Inoue

Page 2: Finding File Clones in FreeBSD Ports Collection

File Clones

Two or more files with the same content

Comments and code indentation are ignored

Found inside a project or between different projects

Research about file clones is scarce

Goal: gain new knowledge about file clones

#include <stdio.h>
int main() { printf("Hello msr!"); return 0; }

[Figure: the same file appears in Project A and in Project B]

Page 3: Finding File Clones in FreeBSD Ports Collection

FCFinder

Input: .c and .h files

Output: file-clone sets

Faster than other tools

Detection: tokenization, MD5 hash calculation, exact matching

Tool         Files   Time          Speedup   PCs
CCFinder     1.4M    960 hours     x1        1
D-CCFinder   1.4M    51 hours      x19       80
FCFinder     1.4M    17.16 hours   x55       1
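
The detection steps listed on this slide (tokenization, MD5 hash calculation, exact matching) can be sketched roughly as follows. This is a minimal illustration, not FCFinder's implementation: the regex-based tokenizer and the names `tokenize`, `fingerprint`, and `find_file_clone_sets` are assumptions made for the example, and a real C lexer handles more cases (preprocessor line continuations, trigraphs, and so on).

```python
import hashlib
import re
from collections import defaultdict

# Assumption: a crude C lexer. Comments and tokens are matched in one pass so
# that "//" or "/*" inside a string literal is not mistaken for a comment.
_COMMENT = r"/\*.*?\*/|//[^\n]*"
_STRING = r'"(?:\\.|[^"\\])*"'
_CHAR = r"'(?:\\.|[^'\\])*'"
_LEX = re.compile(
    f"(?P<comment>{_COMMENT})|(?P<token>{_STRING}|{_CHAR}|\\w+|[^\\w\\s])",
    re.DOTALL,
)


def tokenize(source: str) -> list[str]:
    """Return the token sequence; comments, whitespace and indentation are dropped."""
    return [m.group("token") for m in _LEX.finditer(source)
            if m.group("token") is not None]


def fingerprint(source: str) -> str:
    """MD5 digest of the token sequence (NUL-separated to keep token boundaries)."""
    joined = "\x00".join(tokenize(source))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def find_file_clone_sets(files: dict[str, str]) -> list[list[str]]:
    """Group paths whose fingerprints match exactly; keep groups of two or more."""
    groups = defaultdict(list)
    for path, text in files.items():
        groups[fingerprint(text)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Hypothetical inputs: two layouts of the same program plus one different file.
sample = {
    "project_a/hello.c": 'int main() {printf("Hello msr!");return 0;}',
    "project_b/hello.c": '/* greeting */\nint main() {\n    printf("Hello msr!"); // hi\n    return 0;\n}\n',
    "project_b/other.c": "int main() { return 1; }",
}
print(find_file_clone_sets(sample))
# prints [['project_a/hello.c', 'project_b/hello.c']]
```

Hashing the token sequence instead of comparing files pairwise means each file is read once and matching reduces to grouping equal digests, which is consistent with the speed advantage claimed on this slide.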

Page 4: Finding File Clones in FreeBSD Ports Collection

Experiment

Target: only .c and .h files in the FreeBSD Ports Collection (~1.4M files, ~12 GB, processed in 17.16 hours)

We measured:

* File size
* Number of files in each project
* Size of each file-clone set
* Number of file clones in a project

These values follow a power law
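
One common way to check a power-law claim like this is to fit a straight line to the data on log-log axes and report the coefficient of determination R^2. The slides do not say how the fit was done, so the sketch below is an assumption about the method; the function name `fit_power_law` and the data points are made up for illustration and are not the measured values.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**k on log-log axes.

    Returns (c, k, r2), where r2 is the R^2 of the straight-line fit
    to (log10 x, log10 y). All inputs must be positive.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx                      # slope = power-law exponent
    c = 10 ** (mean_y - k * mean_x)    # intercept back on the linear scale
    r2 = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return c, k, r2


# Synthetic data: (file-clone set size, number of sets of that size).
sizes = [2, 3, 5, 10, 20, 50, 100]
counts = [9000, 4200, 1500, 420, 90, 18, 4]
c, k, r2 = fit_power_law(sizes, counts)
print(f"exponent k = {k:.2f}, R^2 = {r2:.4f}")
```

An R^2 close to 1 means the points lie near a straight line on the log-log plot. A high R^2 alone does not prove a power law, since other heavy-tailed distributions can also look linear over a limited range.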

Page 5: Finding File Clones in FreeBSD Ports Collection

Population of file-clone sets

[Figure: number of file-clone sets (y-axis) against file-clone set size (x-axis). Legend: file clone set size, R^2 = 0.8508]

Plot annotations:

Left: used in PHP5; Right: used in PHP4

D, E

Used in both PHP4 and PHP5

419 sets; L: 650 sets; R: 500 sets

L: 61 file clones; R: 59 file clones

120 file clones

Page 6: Finding File Clones in FreeBSD Ports Collection

Number of file clones in projects (clones inside a project are excluded)

[Figure: number of projects with file clones (y-axis) against file clones per project (x-axis). Legend: number of file clone sets, R^2 = 0.8263]

Plot annotations:

Left: PHP5 modules; Center: projects related to bin-utils; Right: PHP4 modules

G

Page 7: Finding File Clones in FreeBSD Ports Collection

File-clones Between Projects (1/3)

* Nodes show the projects
* Edges between projects show the number of file clones between two projects

Example: gcc41 and gfortran share 7691 file clones
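
The graph described on this slide can be derived from the file-clone sets by counting, for each pair of projects, how many clone sets they share. The sketch below is a hedged illustration only: the slides do not define exactly how "the number of file clones between two projects" is counted, so counting one per shared clone set is an assumption, and `project_edges` and the project names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_edges(clone_sets):
    """Build weighted edges between projects from file-clone sets.

    clone_sets: iterable of clone sets, each a list of (project, path) pairs.
    Returns a Counter mapping (project_a, project_b), sorted alphabetically,
    to the number of clone sets that contain files from both projects.
    Clone sets confined to one project produce no edge.
    """
    edges = Counter()
    for members in clone_sets:
        projects = sorted({project for project, _path in members})
        for a, b in combinations(projects, 2):
            edges[(a, b)] += 1
    return edges


# Toy data with made-up project names.
toy_sets = [
    [("proj_a", "util.c"), ("proj_b", "lib/util.c")],
    [("proj_a", "cfg.h"), ("proj_b", "cfg.h"), ("proj_c", "include/cfg.h")],
    [("proj_a", "x.c"), ("proj_a", "copy/x.c")],  # inside one project only
]
for (a, b), weight in sorted(project_edges(toy_sets).items()):
    print(a, b, weight)
```

Deduplicating projects per clone set before pairing keeps a project with several copies of the same file from inflating the edge weight.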

Page 8: Finding File Clones in FreeBSD Ports Collection

File-clones Between Projects (2/3)

* Nodes show the projects
* Edges between projects show the number of file clones between two projects

Page 9: Finding File Clones in FreeBSD Ports Collection

File-clones Between Projects (3/3)

* Nodes show the projects
* Edges between projects show the number of file clones between two projects

Page 10: Finding File Clones in FreeBSD Ports Collection

Conclusions & Future Work

Conclusions

* Measured several features of the FreeBSD Ports Collection
* Found that the measured features follow a power law

Future Work

* Investigate logical coupling between projects