Software Networks Christian Bird Computer Science Dept. UC Davis


Page 1: Software Networks

Software Networks

Christian Bird

Computer Science Dept.

UC Davis

Page 2: Software Networks

A network like any other

• A software network is made up of
– Nodes: software artifacts
– Edges: relationships between those artifacts (may be directed or undirected)

[Diagram: example nodes (function, module, class, file) connected by example edges (imports, co-committed, includes, requires)]
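The definition above maps directly onto a small graph data structure. The following is a minimal sketch in Python, not part of the original slides; the class name `SoftwareNetwork` and the file names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class SoftwareNetwork:
    """Nodes are artifact names; edges are labeled relationships between them."""

    def __init__(self):
        self.nodes = {}                # artifact name -> kind (function, file, ...)
        self.edges = defaultdict(set)  # source -> {(target, relation)}

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def add_edge(self, source, target, relation, directed=True):
        self.edges[source].add((target, relation))
        if not directed:
            # An undirected edge is stored as two directed ones.
            self.edges[target].add((source, relation))

    def neighbors(self, name):
        return sorted(target for target, _ in self.edges[name])


# Hypothetical artifacts, for illustration only.
net = SoftwareNetwork()
net.add_node("logger.c", "file")
net.add_node("util.h", "file")
net.add_node("main.c", "file")
net.add_edge("logger.c", "util.h", "includes")                       # directed
net.add_edge("logger.c", "main.c", "co-committed", directed=False)   # undirected

print(net.neighbors("logger.c"))
print(net.neighbors("main.c"))
print(net.neighbors("util.h"))
```

An "includes" edge only makes sense in one direction, while "co-committed" is symmetric, which is why the sketch lets the caller choose per edge.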

Page 3: Software Networks

Nodes

• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Page 4: Software Networks

Nodes

• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions (3000 in Apache)
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Example function:

int add(int a, int b) {
    printf("%i + %i = ", a, b);
    int c = a + b;
    printf("%i\n", c);
    return c;
}

Page 5: Software Networks

Nodes

• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Example class:

class Logger {
    int logItem(Object item, int level) { stuff… }
    int logError(String msg) { more stuff… }
    more functions…
}

Page 6: Software Networks

Nodes

• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files (300 in Apache)
– Modules/Packages
– Directories
– Libraries

Example file, math.c:

float absoluteValue(float a) {
    return a > 0 ? a : -a;
}

void printName(char *name) {
    printf("Hello %s\n", name);
}

more functions…

Page 7: Software Networks

Nodes

• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Example module/package:

class Logger { stuff… }

class LogMessage { stuff… }

class LogError { stuff… }

more classes…

Page 8: Software Networks

Nodes

• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories (65 in Apache)
– Libraries

Example directory contents:

/apache/http-2.0/server/core/handle.c
/apache/http-2.0/server/core/serve.c
/apache/http-2.0/server/core/cgi.c
/apache/http-2.0/server/core/locking.c

Page 9: Software Networks

Nodes

• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries (25 in Apache)

Example libraries:

libkdeinit_konqueror.so  libkonq.so.4  libkutils.so.1  libkio.so.4
libkdeui.so.4  libkdesu.so.4  libkdecore.so.4  libDCOP.so.4
libdl.so.2  libresolv.so.2  libutil.so.1  libart_lgpl_2.so.2
libidn.so.11  libqt-mt.so.3  libpng12.so.0  libXext.so.6
libX11.so.6  libSM.so.6  libICE.so.6  libXrender.so.1

Page 10: Software Networks

Edges

• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Page 11: Software Networks

Edges

• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Example at the function level (add calls printf):

int add(int a, int b) {
    printf("%i + %i = ", a, b);
    int c = a + b;
    printf("%i\n", c);
    return c;
}

Page 12: Software Networks

Edges

• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Example at the class level:

class Logger extends Writer {                            // inheritance: Logger → Writer
    int logItem(LogMessage item, int level) { stuff… }   // parameter type: Logger → LogMessage
    int logError(String msg) { more stuff… }
    more functions…
    FileWriter w;                                        // instance member: Logger → FileWriter
}

Page 13: Software Networks

Edges

• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Example at the file level, math.c:

float absoluteValue(float a) {
    return max(a, -a);
}

void printName(char *name) {
    printf("Hello %s\n", name);
}

more functions…

Page 14: Software Networks

Edges

• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Example at the module/package level:

import java.util.*;
import edu.ucdavis.senses.*;

class WirelessSensor { … }

Page 15: Software Networks

Edges

• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Example at the directory level: a function in /apache/http-2.0/server/core/handle.c may call a function in /apache/http-2.0/apr-util/hash.c

Page 16: Software Networks

Edges

• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

Example at the library level: libkdecore.so may need to load libqt3-mt.so, which in turn may need to load libX11.so and libm.so, which all need libc.so.

[Dependency diagram: libkdecore.so → libqt3-mt.so → {libX11.so, libm.so} → libc.so]
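The library chain on this slide can be treated as a small directed graph and walked depth-first to find a valid load order. This is only a sketch in Python: the edges are the "may need" relationships stated on the slide, and the function name `load_order` is hypothetical.

```python
# Dependency edges as stated on the slide: library -> libraries it needs.
deps = {
    "libkdecore.so": ["libqt3-mt.so"],
    "libqt3-mt.so": ["libX11.so", "libm.so"],
    "libX11.so": ["libc.so"],
    "libm.so": ["libc.so"],
    "libc.so": [],
}


def load_order(library, deps):
    """Return libraries so that every dependency precedes the library needing it."""
    order, seen = [], set()

    def visit(lib):
        if lib in seen:
            return
        seen.add(lib)
        for dep in deps.get(lib, []):
            visit(dep)
        order.append(lib)  # appended only after all of its dependencies

    visit(library)
    return order


order = load_order("libkdecore.so", deps)
print(order)
```

Because libc.so is needed by two libraries, the `seen` set keeps it from being listed twice.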

Page 17: Software Networks

Example Callgraph

#include <stdio.h>

void printInt(int a) {
    printf("the number is %i\n", a);
}

int add(int a, int b) {
    return a + b;
}

int multiply(int a, int b) {
    return a * b;
}

int factorial(int a) {
    if (a <= 1) return 1;   /* also terminates for a == 0 */
    return multiply(a, factorial(a - 1));
}

int main(void) {
    printf("calculating 6!\n");
    printInt(factorial(6));
    return 0;
}

[Callgraph figure: main calls printf, printInt, and factorial; printInt calls printf; factorial calls multiply and itself; add is never called]
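The callgraph for this example can be written down as an adjacency map and searched from main to recover the "never called" annotation. A minimal sketch in Python, with the edges transcribed by hand from the C code above rather than extracted by a tool:

```python
# Caller -> set of callees, transcribed by hand from the example program.
callgraph = {
    "main": {"printf", "printInt", "factorial"},
    "printInt": {"printf"},
    "factorial": {"multiply", "factorial"},  # factorial is directly recursive
    "multiply": set(),
    "add": set(),
    "printf": set(),
}


def reachable(graph, root):
    """Return every function reachable from root by following call edges."""
    seen, stack = set(), [root]
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(graph[fn])
    return seen


never_called = sorted(set(callgraph) - reachable(callgraph, "main"))
print(never_called)
```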

Page 18: Software Networks

Static versus Runtime Callgraphs

• Static callgraphs are constructed by a syntactic analysis of the source code

• Pros
– Don’t have to build or run the program
– Works in the presence of syntactic or semantic errors
– Catches calls for exceptional situations
– Fairly fast

• Cons
– Doesn’t get valued information (how many calls to each function)
– Includes calls in dead code. Example: if (0 == 3) logError(…)
– Doesn’t include calls through function pointers
– Doesn’t include calls to functions in dynamically loaded libraries
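A static callgraph extractor can be sketched in a few lines. The example below is an illustration only: it assumes the analyzed program is written in Python (so that the standard `ast` module can parse it) rather than C, and the analyzed source and the function name `static_callgraph` are hypothetical. It reproduces two of the cons listed above: the dead-code call is reported, and the call through a variable is not resolved to its real target.

```python
import ast

# Hypothetical program to analyze; it is parsed, never executed.
SOURCE = '''
def log_error(msg):
    print(msg)

def multiply(a, b):
    return a * b

def factorial(a):
    if 0 == 3:
        log_error("unreachable")
    if a <= 1:
        return 1
    return multiply(a, factorial(a - 1))

def main():
    op = factorial
    print(op(6))
'''


def static_callgraph(source):
    """Map each function name to the set of names it syntactically calls."""
    graph = {}
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, ast.FunctionDef):
            graph[func.name] = {
                node.func.id
                for node in ast.walk(func)
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            }
    return graph


graph = static_callgraph(SOURCE)
for caller in sorted(graph):
    print(caller, "->", sorted(graph[caller]))
```

The edge factorial → log_error appears even though that branch can never run, and main is shown calling "op" instead of factorial, the same limitation the slide notes for function pointers.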

Page 19: Software Networks

Static versus Runtime Callgraphs

• Runtime callgraphs are constructed by running a piece of software one or more times and logging the number of function calls

• Pros
– Includes number of times function calls occur
– Includes calls through function pointers and dynamically loaded libraries
– Will not include calls in dead code

• Cons
– Requires building the software
– Hard to get complete code coverage
– Can take a long time
– May require a test harness of some kind (especially for interactive applications) along with test data
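A runtime callgraph can be approximated by instrumenting each function and counting caller/callee pairs while the program runs. The sketch below is one possible approach in Python, not the tooling used by the author; the `traced` decorator and the traced program are hypothetical. Real projects would more likely rely on a profiler.

```python
from collections import Counter
from functools import wraps

call_counts = Counter()  # (caller, callee) -> number of calls observed
_stack = ["<root>"]      # names of the functions currently executing


def traced(fn):
    """Record one caller -> callee edge every time fn is invoked."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[(_stack[-1], fn.__name__)] += 1
        _stack.append(fn.__name__)
        try:
            return fn(*args, **kwargs)
        finally:
            _stack.pop()
    return wrapper


@traced
def log_error(msg):
    pass


@traced
def multiply(a, b):
    return a * b


@traced
def factorial(a):
    if 0 == 3:
        log_error("unreachable")  # dead code: never shows up at runtime
    if a <= 1:
        return 1
    return multiply(a, factorial(a - 1))


@traced
def main():
    op = factorial  # call through a variable: still attributed to factorial
    return op(6)


result = main()
for (caller, callee), count in sorted(call_counts.items()):
    print(caller, "->", callee, count)
```

Compared with a static analysis, the edges now carry call counts, the indirect call is resolved, and the dead-code call to log_error is absent. The cost is that only the paths exercised by this one run are seen.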

Page 20: Software Networks

Differences between callgraphs and other graphs we’ve seen

• Has a root and commonly will form a tree-like structure

• Few if any cycles in callgraphs (direct or indirect recursion is rare)

• Reciprocity is not common due to levels of abstraction

• Preferential attachment?
– If a function is called by many functions is it more likely to be called by other functions in the future? Maybe.
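These properties can be measured directly on an edge list. The sketch below, in Python, reuses the edges of the Example Callgraph slide and computes in-degrees, direct recursion, and reciprocity. Here reciprocity is assumed to mean the fraction of non-self edges whose reverse edge also exists, which is one common definition among several.

```python
from collections import Counter

# (caller, callee) pairs from the Example Callgraph slide.
edges = [
    ("main", "printf"),
    ("main", "printInt"),
    ("main", "factorial"),
    ("printInt", "printf"),
    ("factorial", "multiply"),
    ("factorial", "factorial"),
]

in_degree = Counter(callee for _, callee in edges)
out_degree = Counter(caller for caller, _ in edges)

# Direct recursion shows up as a self-loop.
directly_recursive = sorted(u for u, v in edges if u == v)

# Reciprocity: share of non-self edges u -> v for which v -> u also exists.
edge_set = set(edges)
non_loop = [(u, v) for u, v in edges if u != v]
reciprocity = sum((v, u) in edge_set for u, v in non_loop) / len(non_loop)

print(dict(in_degree))
print(directly_recursive)
print(reciprocity)
```

On this tiny graph main has in-degree 0 (the root), no edge is reciprocated, and the only cycle is the self-loop on factorial, in line with the bullets above.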

Page 21: Software Networks

Software Repositories

• Used in development of virtually any software project (commercial, personal, OSS, etc.)

• Examples include RCS, CVS, Subversion, Perforce, BitKeeper, and SourceSafe

• Keeps track of every change to the software, who made the change, time of change, comments associated with a change, etc.

• Allows us to view the evolution of a piece of software

• A developer makes changes to software code and then commits the changes to the software repository with a description of the changes

Page 22: Software Networks

Software Networks from Repositories

• The software history allows us to relate different artifacts in the software

• Create an edge between artifacts (functions, files, classes) if they were modified in the same commit

• Create an edge between artifacts if they were modified by the same developer
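Both kinds of edges can be derived from a commit log. The following Python sketch works on a small invented history; the authors, file names, and the functions `co_commit_network` and `same_developer_network` are all hypothetical, and a real analysis would parse the output of the version control system instead.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Invented commit history, for illustration only.
commits = [
    {"author": "alice", "files": ["handle.c", "serve.c"]},
    {"author": "bob", "files": ["serve.c", "cgi.c", "locking.c"]},
    {"author": "alice", "files": ["handle.c", "serve.c", "cgi.c"]},
]


def co_commit_network(commits):
    """Weight each file pair by the number of commits that touched both files."""
    weights = Counter()
    for commit in commits:
        for pair in combinations(sorted(set(commit["files"])), 2):
            weights[pair] += 1
    return weights


def same_developer_network(commits):
    """Link two files if some developer modified both, in any commits."""
    touched = defaultdict(set)
    for commit in commits:
        touched[commit["author"]].update(commit["files"])
    links = defaultdict(set)  # file pair -> developers who touched both
    for author, files in touched.items():
        for pair in combinations(sorted(files), 2):
            links[pair].add(author)
    return links


co_commits = co_commit_network(commits)
by_developer = same_developer_network(commits)
print(co_commits.most_common(2))
print(sorted(by_developer[("cgi.c", "serve.c")]))
```

Sorting each pair keeps the edges undirected, so (a, b) and (b, a) are counted as the same tie.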

Page 23: Software Networks

Modularity: one use of a callgraph

• The characteristic of a system that has been divided into smaller subsystems which interact with each other

• Software that is modular has distinct subsystems (modules) with high levels of interaction within the subsystems and low levels of interaction between the subsystems

• Software that is modular is easier to understand and maintain

[Figure: Modular OS, with modules Filesystem, Scheduler, I/O devices, Memory Management, and Networking around the Kernel]

Page 24: Software Networks

Modularity Case Study using Callgraphs

• “Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code” by Alan MacCormack, John Rusnak, and Carliss Baldwin

• Created a “Design Structure Matrix” (DSM) at the file level using function calls as ties (i.e., if a function in foo.c calls a function in bar.c, there is a directed tie from foo.c to bar.c; the matrix is not symmetric)

• Used static analysis to extract the file-level callgraph

• Clustered the DSM using standard clustering techniques

• Metrics used:

– Clustering cost: measure of how many function calls are not within a cluster

– Propagation cost: measure of how many functions will be affected if a particular function is modified
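The two metrics can be illustrated on a toy DSM. The Python sketch below rests on stated assumptions: the four files, their calls, and the cluster assignment are invented; propagation cost is taken to be the density of the transitive closure of the DSM (with each file counted as reaching itself); and clustering cost is simplified to a plain count of dependencies that cross cluster boundaries, which is cruder than the cost function used in the study.

```python
# Invented file-level callgraph: (caller file, callee file).
files = ["a.c", "b.c", "c.c", "d.c"]
calls = {("a.c", "b.c"), ("a.c", "c.c"), ("b.c", "c.c"), ("d.c", "c.c")}
clusters = {"a.c": 0, "b.c": 0, "c.c": 1, "d.c": 1}  # assumed clustering

n = len(files)
index = {name: i for i, name in enumerate(files)}

# DSM: dsm[i][j] == 1 when file i calls into file j (not symmetric).
dsm = [[0] * n for _ in range(n)]
for caller, callee in calls:
    dsm[index[caller]][index[callee]] = 1

# Transitive closure (Warshall's algorithm), with every file reaching itself.
visibility = [row[:] for row in dsm]
for i in range(n):
    visibility[i][i] = 1
for k in range(n):
    for i in range(n):
        for j in range(n):
            if visibility[i][k] and visibility[k][j]:
                visibility[i][j] = 1

# Assumed definition: share of (i, j) pairs where i depends on j, directly or not.
propagation_cost = sum(map(sum, visibility)) / (n * n)

# Simplified clustering cost: dependencies that leave their own cluster.
cross_cluster_calls = sum(1 for u, v in calls if clusters[u] != clusters[v])

# Files that may be affected when c.c changes: everything that can reach it.
affected_by_c = sorted(
    f for f in files if f != "c.c" and visibility[index[f]][index["c.c"]]
)

print(propagation_cost, cross_cluster_calls, affected_by_c)
```

A change to c.c can reach three other files while a change to a.c reaches none, since nothing calls into a.c. That asymmetry is what the propagation cost summarizes over the whole system.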

Page 25: Software Networks

DSM examples

[Figure: Example System in Graphical and Dependency Matrix Form]

[Figure: A DSM with dependencies in an “Idealized Modular Form”]

All calls are within clusters so the clustering cost is 0

A change to F propagates to E, C, and A while a change to B only propagates to A

Page 26: Software Networks

Mozilla Project

• Netscape open-sourced Navigator in March 1998

• The project was named Mozilla and eventually led to what Firefox is today

• Initially the code was complex and tightly coupled, a common phenomenon in industry code

• This formed a high barrier to entry for volunteers to contribute code

• Architecture was re-designed in late 1998 due to increasing complexity

Page 27: Software Networks

DSMs for Mozilla

Page 28: Software Networks

Results of Mozilla Re-design

Page 29: Software Networks

More Results

• After the re-design, volunteerism went up dramatically (critical for an OSS project to succeed)

• Both functionality and performance increased

• Both code size and number of files decreased (initially)

Page 30: Software Networks

What are we doing with software nets?

• Thanks to the CVS history, we can create a callgraph for a piece of software at any point during its evolution

• Do certain parts of the callgraph stabilize before others? Why?

• Are certain portions of the callgraph more bug-prone than others?

• What does code ownership in the callgraph look like?

• What is the relationship between callgraph network, co-commit network, and ownership network?

Page 31: Software Networks

More Questions

• Does the software network bear any resemblance to the social network of the developers who work on it? (Conway’s Law)

• Are callgraphs small-world networks? What is the distribution of in- and out-degrees? What would the answers mean (if anything)?

• What partitioning techniques allow us to extract module structure from source code?

• Is there a relationship between the co-committer social network and the email social network for developers?

Page 32: Software Networks

On with the show…