Code and Pattern Mining in C/C++

Aditya S. Deshpande
Namratha Nayak

Guides:
Dr. A. Serebrenik (TU/e)
P. Kourzanov, ir (NXP)
Y. Dajsuren, PDEng (Virage Logic)

Agenda

• Introduction

• Problem Definition

• Data flow

• Design patterns

• Summary

/ Faculteit Wiskunde en Informatica, 11-04-23

Introduction

• Code mining – Process of extracting patterns from source code.

• Design Patterns – A design pattern is a general reusable solution to a commonly occurring problem in software design.

• Streaming Data – Data streaming is the transfer of data at a steady, high-speed rate sufficient to support applications such as high-definition television or radio.

Problem Definition

• Lack of synchronisation between models and source code.

• Significant amount of repetitive code in different modules.

• Identifying patterns and integrating them in the framework.

• Objective

Approach

• Study the available data flow models.

• Study the various design pattern matching methods and tools.

Data flow models

• Kahn Process Networks

• Synchronous Data Flow

Kahn Process Network - Introduction

• Processes communicate via FIFO channels.

• Parallel computation is organized as follows:

  • Autonomous computing stations are connected to each other in a network by communication lines.

  • A station computes on data arriving on its input lines to produce output on some or all of its output lines.

• Assumptions:

  • Communication lines are the only means of communication between stations.

  • Communication lines transmit information within a finite time.

• Restrictions:

  • At any given time, a computing station is either computing or waiting for information on one of its input lines.

  • Each computing station follows a sequential program.

Kahn Process Network - Example

• From Kahn’s original 1974 paper

    process f(in int u, in int v, out int w)
    {
        int i;
        bool b = true;
        for (;;) {
            i = b ? wait(u) : wait(v);
            printf("%i\n", i);
            send(i, w);
            b = !b;
        }
    }

Process alternately reads from u and v, prints the data value, and writes it to w.

Figure: process f with input channels u and v and output channel w.

The process interface consists of FIFOs.

wait() returns the next token from an input FIFO, blocking if the FIFO is empty.

send() writes a data value to an output FIFO.
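The process f above can be approximated in C++ with one thread per process and a blocking queue per FIFO. The following is a minimal sketch under stated assumptions, not Kahn's notation: the `Channel` class and the `run_merge` driver are hypothetical names, channels are unbounded, and f is limited to `count` tokens so that the sketch terminates (the original loops forever).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical unbounded FIFO channel: send() never blocks,
// wait() blocks while the channel is empty.
class Channel {
public:
    void send(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(value);
        }
        not_empty_.notify_one();
    }

    int wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        int value = queue_.front();
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<int> queue_;
};

// Process f from the slide: alternately reads u and v, prints the value,
// and forwards it to w. Bounded to `count` tokens (assumption).
void f(Channel& u, Channel& v, Channel& w, int count) {
    bool b = true;
    for (int n = 0; n < count; ++n) {
        int i = b ? u.wait() : v.wait();
        std::printf("%i\n", i);
        w.send(i);
        b = !b;
    }
}

// Hypothetical driver: runs f in its own thread and feeds both inputs.
// Assumes us and vs have the same length, otherwise f would block forever.
std::vector<int> run_merge(const std::vector<int>& us, const std::vector<int>& vs) {
    assert(us.size() == vs.size());
    Channel u, v, w;
    const int count = static_cast<int>(us.size() + vs.size());
    std::thread process(f, std::ref(u), std::ref(v), std::ref(w), count);
    for (int x : vs) v.send(x);  // v is filled first on purpose
    for (int x : us) u.send(x);
    std::vector<int> out;
    for (int n = 0; n < count; ++n) out.push_back(w.wait());
    process.join();
    return out;
}
```

Calling `run_merge({1, 3, 5}, {2, 4, 6})` interleaves the two inputs. Because wait() blocks, the order on w depends only on the contents of u and v, not on the order in which the driver happens to fill them.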

SDF - Introduction

• A synchronous data flow (SDF) graph is a network of synchronous nodes (also called blocks).

• For a synchronous node, the number of samples consumed on each input and produced on each output per firing is known a priori.

• Homogeneous SDF: every node consumes and produces exactly one sample on each arc per firing.

SDF - Delay

• Delay, in the signal processing sense

• A unit delay on the arc between A and B means:

  • The nth sample consumed by B is the (n-1)th sample produced by A.

• An arc with delay d is initialized with d zero samples.

Figure: node A connected to node B by an arc with delay d.
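The effect of a delay can be illustrated with a FIFO that already holds d zero samples before A fires for the first time. This is a minimal sketch; `DelayArc` and `run_through_delay` are hypothetical names, and both nodes are assumed to produce and consume one sample per firing.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical arc from A to B that starts with d zero samples (the delay).
class DelayArc {
public:
    explicit DelayArc(std::size_t d) : buffer_(d, 0) {}

    // Called when A fires.
    void produce(int sample) { buffer_.push_back(sample); }

    // Called when B fires; the arc must hold at least one sample.
    int consume() {
        assert(!buffer_.empty());
        int sample = buffer_.front();
        buffer_.pop_front();
        return sample;
    }

private:
    std::deque<int> buffer_;
};

// Fires A and then B once per input sample and returns what B consumed.
std::vector<int> run_through_delay(const std::vector<int>& produced_by_a, std::size_t d) {
    DelayArc arc(d);
    std::vector<int> consumed_by_b;
    for (int sample : produced_by_a) {
        arc.produce(sample);
        consumed_by_b.push_back(arc.consume());
    }
    return consumed_by_b;
}
```

With d = 1 and A producing 10, 20, 30, node B consumes 0, 10, 20: each sample B sees is the one A produced one firing earlier, and the first one is the initial zero.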

SDF - Implementation

• Implementation requires:

  • Buffering of the data samples passing between nodes

  • Scheduling of nodes when their inputs are available

• A dynamic (runtime) implementation requires:

  • A runtime scheduler that checks when inputs are available and schedules nodes when a processor is free
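A dynamic implementation of this kind can be sketched for a single processor as a loop that, before every firing, checks which nodes have enough samples on all of their inputs. The names `Arc`, `is_ready`, `fire` and `run_dynamic`, as well as the policy of preferring the highest-numbered ready node, are assumptions made for illustration only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical arc: `produced` samples are added per firing of `src`,
// `consumed` samples are removed per firing of `dst`.
struct Arc {
    int src;
    int dst;
    int produced;
    int consumed;
    int tokens;  // samples currently buffered on the arc
};

// A node is ready when every incoming arc holds enough samples.
bool is_ready(int node, const std::vector<Arc>& arcs) {
    for (const Arc& a : arcs)
        if (a.dst == node && a.tokens < a.consumed) return false;
    return true;
}

// Firing a node consumes from its inputs and produces on its outputs.
void fire(int node, std::vector<Arc>& arcs) {
    for (Arc& a : arcs) {
        if (a.dst == node) a.tokens -= a.consumed;
        if (a.src == node) a.tokens += a.produced;
    }
}

// Runtime scheduler: the readiness check is repeated before every firing,
// which is the overhead a compile-time schedule avoids.
std::vector<int> run_dynamic(int node_count, std::vector<Arc> arcs, int max_firings) {
    assert(node_count > 0 && max_firings >= 0);
    std::vector<int> order;
    while (static_cast<int>(order.size()) < max_firings) {
        int chosen = -1;
        for (int node = node_count - 1; node >= 0 && chosen < 0; --node)
            if (is_ready(node, arcs)) chosen = node;
        if (chosen < 0) break;  // no node can fire: deadlock
        fire(chosen, arcs);
        order.push_back(chosen);
    }
    return order;
}
```

For a two-node graph in which node 0 produces 2 samples per firing and node 1 consumes 3, five firings give the order 0, 0, 1, 0, 1.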

• Contribution of Lee and Messerschmitt (1987):

  • SDF graphs can be scheduled at compile time

  • No runtime scheduling overhead

• The compiler will:

  • Determine the execution order of the nodes on one or multiple processors or data path units

  • Determine the communication buffers between nodes
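Compile-time scheduling is possible because the rates are fixed: for every arc, the number of samples produced over one period must equal the number consumed, i.e. q[src] * produced = q[dst] * consumed, where q[n] is the number of firings of node n per period. The sketch below solves these balance equations for a connected graph and then derives a single-processor periodic schedule by simulated firing. It is a simplified illustration, not the algorithm as published; `SdfArc`, `repetition_vector` and `static_schedule` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SdfArc { int src; int dst; int produced; int consumed; int delay; };

long gcd_of(long a, long b) {
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

// Smallest positive integer solution of the balance equations.
// Assumes a connected graph; returns {} if the rates are inconsistent.
std::vector<long> repetition_vector(int node_count, const std::vector<SdfArc>& arcs) {
    assert(node_count > 0);
    std::vector<long> num(node_count, 0), den(node_count, 1);  // q[n] = num[n] / den[n]
    num[0] = 1;
    for (bool changed = true; changed;) {
        changed = false;
        for (const SdfArc& a : arcs) {
            if (num[a.src] != 0 && num[a.dst] == 0) {
                num[a.dst] = num[a.src] * a.produced;
                den[a.dst] = den[a.src] * a.consumed;
            } else if (num[a.dst] != 0 && num[a.src] == 0) {
                num[a.src] = num[a.dst] * a.consumed;
                den[a.src] = den[a.dst] * a.produced;
            } else {
                continue;
            }
            changed = true;
        }
    }
    long scale = 1;  // least common multiple of the reduced denominators
    for (int n = 0; n < node_count; ++n) {
        long g = gcd_of(num[n], den[n]);
        num[n] /= g;
        den[n] /= g;
        scale = scale / gcd_of(scale, den[n]) * den[n];
    }
    std::vector<long> q(node_count);
    for (int n = 0; n < node_count; ++n) q[n] = num[n] * (scale / den[n]);
    for (const SdfArc& a : arcs)
        if (q[a.src] * a.produced != q[a.dst] * a.consumed) return {};
    return q;
}

// One period of a single-processor schedule; {} on inconsistency or deadlock.
std::vector<int> static_schedule(int node_count, const std::vector<SdfArc>& arcs) {
    std::vector<long> remaining = repetition_vector(node_count, arcs);
    if (remaining.empty()) return {};
    std::vector<long> tokens;
    for (const SdfArc& a : arcs) tokens.push_back(a.delay);
    std::vector<int> schedule;
    for (bool progress = true; progress;) {
        progress = false;
        for (int n = 0; n < node_count; ++n) {
            if (remaining[n] == 0) continue;
            bool ready = true;
            for (std::size_t k = 0; k < arcs.size(); ++k)
                if (arcs[k].dst == n && tokens[k] < arcs[k].consumed) ready = false;
            if (!ready) continue;
            for (std::size_t k = 0; k < arcs.size(); ++k) {
                if (arcs[k].dst == n) tokens[k] -= arcs[k].consumed;
                if (arcs[k].src == n) tokens[k] += arcs[k].produced;
            }
            --remaining[n];
            schedule.push_back(n);
            progress = true;
        }
    }
    for (long r : remaining)
        if (r != 0) return {};  // some node could never fire: deadlock
    return schedule;
}
```

For an arc on which node 0 produces 2 samples and node 1 consumes 3, the repetition vector is (3, 2) and the schedule is 0, 0, 1, 0, 1. The largest token count reached on the arc during that schedule also gives a sufficient buffer size.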

Design Patterns

• Describe solutions to common, recurring problems

• Can be used in a wider context because they are defined informally

• Documenting them in a software system simplifies maintenance and program understanding

• Their use is usually not documented, so design patterns need to be discovered from the source code

Pattern Mining

• The structure of a design pattern is searched for in the source code.

• The pattern description should include the main properties of the design pattern.

• The description should be flexible enough to cover slightly distorted pattern occurrences.

• Helps to understand the relationships between the different parts of a large system

• Reverse engineering

  • Analysis of a system to

    − Identify the components and their interrelationships

    − Create representations of the system in another form

• Why tools for reverse engineering?

  • Existing legacy code

  • High number of participants in code development

• Tools have been developed to mine patterns from the source code

Pattern Mining Tools

• Aspects in which the mining tools differ:

  • Programming language: tools exist for Java and C++

  • Method used to discover design patterns: graph matching, constraint satisfaction problem (CSP) solving, pattern inference

  • Intermediate representation: Abstract Semantic Graph, Abstract Syntax Tree, matrix and vector
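To make the graph matching idea concrete, the sketch below represents both the source code and a pattern as small graphs of classes (or roles) and relations, and searches by backtracking for an assignment of classes to roles that preserves every pattern relation. It is a brute-force illustration and not the method of any particular tool; all names (`Rel`, `Edge`, `Graph`, `find_pattern`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <tuple>
#include <vector>

enum class Rel { Inherits, Uses };

// Relation "from --rel--> to" between class indices (code) or role indices (pattern).
struct Edge {
    int from;
    Rel rel;
    int to;
    bool operator<(const Edge& o) const {
        return std::tie(from, rel, to) < std::tie(o.from, o.rel, o.to);
    }
};

struct Graph {
    int node_count;
    std::set<Edge> edges;
};

// Checks every pattern edge whose two roles are already assigned.
bool consistent(const Graph& pattern, const Graph& code, const std::vector<int>& assignment) {
    const int assigned = static_cast<int>(assignment.size());
    for (const Edge& e : pattern.edges) {
        if (e.from >= assigned || e.to >= assigned) continue;
        if (code.edges.count(Edge{assignment[e.from], e.rel, assignment[e.to]}) == 0)
            return false;
    }
    return true;
}

// Backtracking search: assignment[role] is the class playing that role.
void match(const Graph& pattern, const Graph& code,
           std::vector<int>& assignment, std::vector<std::vector<int>>& found) {
    if (static_cast<int>(assignment.size()) == pattern.node_count) {
        found.push_back(assignment);
        return;
    }
    for (int cls = 0; cls < code.node_count; ++cls) {
        // Each class may play at most one role.
        if (std::find(assignment.begin(), assignment.end(), cls) != assignment.end()) continue;
        assignment.push_back(cls);
        if (consistent(pattern, code, assignment)) match(pattern, code, assignment, found);
        assignment.pop_back();
    }
}

std::vector<std::vector<int>> find_pattern(const Graph& pattern, const Graph& code) {
    assert(pattern.node_count >= 0 && code.node_count >= 0);
    std::vector<int> assignment;
    std::vector<std::vector<int>> found;
    match(pattern, code, assignment, found);
    return found;
}
```

As a usage example, take an object Adapter described by the roles Target (0), Adapter (1) and Adaptee (2), with the relations "Adapter inherits Target" and "Adapter uses Adaptee". In a code graph with four classes where class 0 uses class 1, class 2 inherits class 1 and class 2 uses class 3, the only match assigns classes 1, 2 and 3 to the three roles. Matching "slightly distorted" occurrences would require relaxing the exact edge check, which this sketch does not attempt.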

Columbus – Design Pattern Mining Tool

• Reverse engineering framework

• Developed in cooperation between the Research Group on Artificial Intelligence in Szeged, the Software Technology Laboratory of the Nokia Research Center and FrontEndART Ltd.

• Analyzes large C/C++ projects and extracts data according to the Columbus Schema

• Supports project handling, data extraction, data representation, data storage, filtering and visualization

• Has a C/C++ extractor plug-in that performs the parsing of the source code

• Information collected by the plug-in corresponds to the Columbus Schema

• The schema captures the C++ language at a low level of detail (i.e., the Abstract Syntax Tree) and also contains higher-level elements (i.e., the semantics of types)

• Supports various file formats for exporting the extracted data

Other Pattern Mining Tools

• Other tools to be studied:

  • CPP2XMI

  • Maisa

  • CrocoPat

Issues to be considered

• Can the tools support NXP source code?

• Would it be possible to add proprietary patterns to these tools?

• Can these tools be extended to support other languages like C?

Summary

• Overview of the data flow models

• Introduced the design pattern mining tool Columbus

• Find the patterns present in the NXP source code and check whether these can be mined using the available tools

References

• E. A. Lee and D. G. Messerschmitt. Synchronous Data Flow. Proceedings of the IEEE, vol. 75, pages 1235–1245, Sept. 1987.

• G. Kahn. The Semantics of a Simple Language for Parallel Programming. In Proceedings of the IFIP Congress, Stockholm, Sweden, pages 471–475, Aug. 1974.

• E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

• R. Ferenc, A. Beszedes, M. Tarkiainen, and T. Gyimothy. Columbus – Reverse Engineering Tool and Schema for C++. In Proceedings of the 18th International Conference on Software Maintenance (ICSM 2002), pages 172–181. IEEE Computer Society, Oct. 2002.

• R. Ferenc and A. Beszedes. Data Exchange with the Columbus Schema for C++. In Proceedings of the 6th European Conference on Software Maintenance and Reengineering (CSMR 2002), pages 59–66. IEEE Computer Society, Mar. 2002.

• Z. Balanyi and R. Ferenc. Mining Design Patterns from C++ Source Code. In Proceedings of the 19th International Conference on Software Maintenance (ICSM 2003), pages 305–314. IEEE Computer Society, Sept. 2003.

Questions
