Software Networks
Christian Bird
Computer Science Dept.
UC Davis
A network like any other
• A software network is made up of
– Nodes: software artifacts
– Edges: relationships between those artifacts (may be directed or undirected)
[Figure: example nodes (function, module, class, file) and example edge types (imports, co-committed, includes, requires)]
Nodes
• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries
Nodes: functions (3000 in Apache)
int add(int a, int b) {
    printf("%i + %i = ", a, b);
    int c = a + b;
    printf("%i\n", c);
    return c;
}
Nodes: classes
class Logger {
    int logItem(Object item, int level) { stuff… }
    int logError(String msg) { more stuff… }
    more functions…
}
Nodes: files (300 in Apache)
math.c
float absoluteValue(float a) {
    return a > 0 ? a : -a;
}
void printName(char *name) {
    printf("Hello %s\n", name);
}
more functions…
Nodes: modules/packages
class Logger { stuff… }
class LogMessage { stuff… }
class LogError { stuff… }
more classes…
Nodes: directories (65 in Apache)
/apache/http-2.0/server/core/handle.c
/apache/http-2.0/server/core/serve.c
/apache/http-2.0/server/core/cgi.c
/apache/http-2.0/server/core/locking.c
Nodes: libraries (25 in Apache)
libkdeinit_konqueror.so
libkonq.so.4
libkutils.so.1
libkio.so.4
libkdeui.so.4
libkdesu.so.4
libkdecore.so.4
libDCOP.so.4
libdl.so.2
libresolv.so.2
libutil.so.1
libart_lgpl_2.so.2
libidn.so.11
libqt-mt.so.3
libpng12.so.0
libXext.so.6
libX11.so.6
libSM.so.6
libICE.so.6
libXrender.so.1
Edges
• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries
Edges: functions
int add(int a, int b) {
    printf("%i + %i = ", a, b);
    int c = a + b;
    printf("%i\n", c);
    return c;
}
Edges: classes
class Logger inherits Writer {
    int logItem(LogMessage item, int level) { stuff… }
    int logError(String msg) { more stuff… }
    more functions…
    FileWriter w
}
Edges: files
math.c
float absoluteValue(float a) {
    return max(a, -a);
}
void printName(char *name) {
    printf("Hello %s\n", name);
}
more functions…
Edges: modules/packages
import java.lang.util;
import edu.ucdavis.senses;
class WirelessSensor { … }
Edges: directories
A function in /apache/http-2.0/server/core/handle.c may call a function in /apache/http-2.0/apr-util/hash.c
Edges: libraries
Library libkdecore.so may need to load libqt3-mt.so, which in turn may need to load libX11.so and libm.so, which all need libc.so.
[Figure: dependency graph libkdecore.so -> libqt3-mt.so -> libX11.so, libm.so -> libc.so]
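The chain of library dependencies above can be followed mechanically. A minimal sketch, assuming a dependency table typed in by hand from the slide (not read from real binaries) and a hypothetical helper named transitive_deps:

```python
def transitive_deps(deps, library):
    """Return every library that `library` needs, directly or indirectly."""
    needed = set()
    stack = list(deps.get(library, ()))
    while stack:
        current = stack.pop()
        if current not in needed:
            needed.add(current)
            stack.extend(deps.get(current, ()))
    return needed


# Direct dependencies as described on the slide (illustrative only).
deps = {
    "libkdecore.so": ["libqt3-mt.so"],
    "libqt3-mt.so": ["libX11.so", "libm.so"],
    "libX11.so": ["libc.so"],
    "libm.so": ["libc.so"],
    "libc.so": [],
}

all_needed = transitive_deps(deps, "libkdecore.so")
print(sorted(all_needed))
```

Each library is a node and each "needs" relation is a directed edge, so the set returned is simply everything reachable from the starting node.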
Example Callgraph
#include <stdio.h>

void printInt(int a) {
    printf("the number is %i\n", a);
}

int add(int a, int b) {
    return a + b;
}

int multiply(int a, int b) {
    return a * b;
}

int factorial(int a) {
    if (a == 1) return a;
    return multiply(a, factorial(a - 1));
}

int main(void) {
    printf("calculating 6!\n");
    printInt(factorial(6));
    return 0;
}
[Figure: callgraph with nodes main, printInt, factorial, multiply, add, and printf; add is never called]
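The example callgraph can be written down as a small adjacency structure. This is only a sketch: the edges below were transcribed by hand from the C code in the example, and the variable names are assumptions made for illustration.

```python
# Caller -> set of callees, transcribed by hand from the example program.
calls = {
    "main": {"printf", "printInt", "factorial"},
    "printInt": {"printf"},
    "factorial": {"multiply", "factorial"},
    "multiply": set(),
    "add": set(),
}

# Every function that appears as the target of at least one call.
called = set().union(*calls.values())

# Functions defined in the program that nothing calls (main is the entry point).
never_called = set(calls) - called - {"main"}
print(never_called)
```

printf is called but not defined in the program, so it appears only as a callee; add is defined but never appears as a callee.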
Static versus Runtime Callgraphs
• Static callgraphs are constructed by a syntactic analysis of the source code
• Pros
– Don’t have to build or run the program
– Works in the presence of syntactic or semantic errors
– Catches calls for exceptional situations
– Fairly fast
• Cons
– Doesn’t capture valued (weighted) information, such as how many times each function is called
– Includes calls in dead code. Example: if (0 == 3) logError(…)
– Doesn’t include calls through function pointers
– Doesn’t include calls to functions in dynamically loaded libraries
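A static callgraph can be sketched with nothing more than a parser. The example below is an illustration under several assumptions: it analyzes Python source with the standard ast module rather than C, it only resolves calls to simple names, and static_callgraph and the EXAMPLE source are hypothetical. The call inside the dead branch still shows up in the result, as listed under Cons.

```python
import ast
from collections import defaultdict


def static_callgraph(source):
    """Map each top-level function name to the set of simple names it calls."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # register the function even if it calls nothing
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return dict(graph)


EXAMPLE = '''
def log_error(msg):
    print(msg)

def handler(x):
    if 0 == 3:
        log_error("unreachable")
    return x
'''

graph = static_callgraph(EXAMPLE)
print(graph)
```

The source is never executed, which is why the analysis works without building anything and also why it cannot tell that the branch is dead.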
Static versus Runtime Callgraphs
• Runtime callgraphs are constructed by running a piece of software one or more times and logging the number of function calls
• Pros
– Includes number of times function calls occur
– Includes calls through function pointers and dynamically loaded libraries
– Will not include calls in dead code
• Cons
– Requires building the software
– Hard to get complete code coverage
– Can take a long time
– May require a test harness of some kind (especially for interactive applications) along with test data
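A runtime callgraph can be sketched by logging calls while the program runs. The example below is a rough illustration, assuming Python and its standard sys.setprofile hook rather than an instrumented C build; trace_calls and the small functions it traces are hypothetical. Only calls to Python-level functions are counted.

```python
import sys
from collections import Counter


def trace_calls(entry_point):
    """Run entry_point and count (caller, callee) pairs seen at runtime."""
    counts = Counter()

    def profiler(frame, event, arg):
        # "call" fires for Python functions; calls into C builtins are ignored.
        if event == "call":
            caller = frame.f_back.f_code.co_name if frame.f_back else "<none>"
            counts[(caller, frame.f_code.co_name)] += 1

    sys.setprofile(profiler)
    try:
        entry_point()
    finally:
        sys.setprofile(None)
    return counts


def multiply(a, b):
    return a * b


def factorial(a):
    if a == 1:
        return a
    return multiply(a, factorial(a - 1))


def main():
    return factorial(6)


counts = trace_calls(main)
for (caller, callee), n in sorted(counts.items()):
    print(caller, "->", callee, n)
```

Unlike the static version, every edge carries a count, but only code that actually ran during this one execution is represented.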
Differences between callgraphs and other graphs we’ve seen
• Has a root and commonly will form a tree-like structure
• Few if any cycles in callgraphs (direct or indirect recursion is rare)
• Reciprocity is not common due to levels of abstraction
• Preferential attachment?
– If a function is called by many functions, is it more likely to be called by other functions in the future? Maybe.
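These properties are easy to check on a concrete graph. A small sketch, assuming a hand-written callgraph (the one from the earlier factorial example) and illustrative variable names; a single toy graph says nothing about callgraphs in general.

```python
from collections import Counter

# Caller -> set of callees for a small, hand-written callgraph.
calls = {
    "main": {"printf", "printInt", "factorial"},
    "printInt": {"printf"},
    "factorial": {"multiply", "factorial"},
    "multiply": set(),
    "add": set(),
}

# Direct recursion shows up as a self-loop.
self_loops = sorted(f for f, callees in calls.items() if f in callees)

# Reciprocity: pairs of distinct functions that call each other.
reciprocal = {
    (a, b)
    for a, callees in calls.items()
    for b in callees
    if a < b and a in calls.get(b, set())
}

# In-degree: how many distinct functions call each function.
in_degree = Counter(callee for callees in calls.values() for callee in callees)

print(self_loops, reciprocal, dict(in_degree))
```

Tracking in-degree over successive versions of a program would be one way to approach the preferential attachment question.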
Software Repositories
• Used in development of virtually any software project (commercial, personal, OSS, etc.)
• Examples include RCS, CVS, Subversion, Perforce, BitKeeper, and SourceSafe
• Keeps track of every change to the software, who made the change, time of change, comments associated with a change, etc.
• Allows us to view the evolution of a piece of software
• A developer makes changes to software code and then commits the changes to the software repository with a description of the changes
Software Networks from Repositories
• The software history allows us to relate different artifacts in the software
• Create an edge between artifacts (functions, files, classes) if they were modified in the same commit
• Create an edge between artifacts if they were modified by the same developer
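The co-commit rule can be sketched in a few lines. This is a minimal illustration, assuming each commit has already been reduced to a list of file names; the commit data and the helper name co_commit_network are made up for the example, and a real study would extract the lists from the repository log.

```python
from collections import Counter
from itertools import combinations


def co_commit_network(commits):
    """Count, for each unordered pair of files, the commits that touched both."""
    edges = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical commit history: one list of modified files per commit.
commits = [
    ["handle.c", "serve.c"],
    ["handle.c", "serve.c", "cgi.c"],
    ["locking.c"],
]

edges = co_commit_network(commits)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The same function gives a same-developer network if each list holds the artifacts touched by one developer instead of one commit.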
Modularity: one use of a callgraph
• Modularity: the characteristic of a system that has been divided into smaller subsystems which interact with each other
• Software that is modular has distinct subsystems (modules) with high levels of interaction within the subsystems and low levels of interaction between the subsystems
• Software that is modular is easier to understand and maintain
[Figure: Modular OS, showing Kernel, Filesystem, Scheduler, I/O devices, Memory Management, and Networking]
Modularity Case Study using Callgraphs
• “Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code” by Alan MacCormack, John Rusnak, and Carliss Baldwin
• Created a “Design Structure Matrix” (DSM) at the file level using function calls as ties (i.e., if a function in foo.c calls a function in bar.c, then there is a non-symmetric tie from foo.c to bar.c)
• Used static analysis to extract the file-level callgraph
• Clustered the DSM using standard clustering techniques
• Metrics used:
– Clustering cost: measure of how many function calls are not within a cluster
– Propagation cost: measure of how many functions will be affected if a particular function is modified
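Lifting function calls to file-level ties, and counting the ties that cross cluster boundaries, can be sketched as follows. This is a simplified reading of the slide's description, not the cost function from the paper; the function names, file assignments, cluster labels, and both helpers are hypothetical.

```python
def file_level_ties(function_calls, file_of):
    """Directed file-to-file ties implied by function-level calls."""
    ties = set()
    for caller, callee in function_calls:
        src, dst = file_of[caller], file_of[callee]
        if src != dst:  # calls within one file do not create a tie
            ties.add((src, dst))
    return ties


def cross_cluster_ties(ties, cluster_of):
    """Number of ties whose two endpoints sit in different clusters."""
    return sum(1 for src, dst in ties if cluster_of[src] != cluster_of[dst])


# Hypothetical function-level calls (caller, callee) and file assignments.
function_calls = [
    ("parse", "hash_get"),
    ("parse", "log"),
    ("serve", "parse"),
    ("hash_get", "hash_fn"),
]
file_of = {
    "parse": "foo.c",
    "serve": "foo.c",
    "hash_get": "bar.c",
    "hash_fn": "bar.c",
    "log": "log.c",
}
cluster_of = {"foo.c": 0, "bar.c": 0, "log.c": 1}

ties = file_level_ties(function_calls, file_of)
print(sorted(ties), cross_cluster_ties(ties, cluster_of))
```

The tie from foo.c to bar.c is one-directional, matching the non-symmetric DSM described above.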
DSM examples
[Figure: Example System in Graphical and Dependency Matrix Form]
[Figure: A DSM with dependencies in an “Idealized Modular Form”]
All calls are within clusters so the clustering cost is 0
A change to F propagates to E, C, and A while a change to B only propagates to A
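Change propagation amounts to following dependencies backwards. A minimal sketch, assuming a hypothetical dependency graph chosen only so that it reproduces the statement above (the actual figure is not available here); affected_by is an illustrative helper name.

```python
def affected_by(deps, changed):
    """Return every element that depends on `changed`, directly or indirectly."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for element, targets in deps.items():
            if current in targets and element not in affected:
                affected.add(element)
                frontier.append(element)
    return affected


# Hypothetical dependencies: element -> elements it depends on.
deps = {
    "A": {"B", "C"},
    "B": set(),
    "C": {"E"},
    "E": {"F"},
    "F": set(),
}

print(sorted(affected_by(deps, "F")))
print(sorted(affected_by(deps, "B")))
```

Averaging the size of the affected set over all elements gives one rough, whole-system indication of how far changes tend to spread.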
Mozilla Project
• Netscape open-sourced Navigator in March 1998
• The project was named Mozilla and eventually led to what Firefox is today
• Initially the code was complex and tightly coupled, a common phenomenon in industry code
• This formed a high barrier to entry for volunteers to contribute code
• Architecture was re-designed in late 1998 due to increasing complexity
DSMs for Mozilla
Results of Mozilla Re-design
More Results
• After the re-design, volunteerism went up dramatically (critical for an OSS project to succeed)
• Both functionality and performance increased
• Both code size and number of files decreased (initially)
What are we doing with software nets?
• Thanks to CVS history, we can create a callgraph for a piece of software at any point during its evolution
• Do certain parts of the callgraph stabilize before others? Why?
• Are certain portions of the callgraph more bug-prone than others?
• What does code ownership in the callgraph look like?
• What is the relationship between callgraph network, co-commit network, and ownership network?
More Questions
• Does the software network bear any resemblance to the social network of the developers who work on it? (Conway’s Law)
• Are callgraphs small-world networks? What is the distribution of in- and out-degrees? What would the answers mean (if anything)?
• What partitioning techniques allow us to extract module structure from source code?
• Is there a relationship between the co-committer social network and the email social network for developers?
On with the show…