
[IEEE International Conference on Global Software Engineering (ICGSE 2007) - Munich, Germany (2007.08.27-2007.08.30)] International Conference on Global Software Engineering (ICGSE


Successful Global Development of a Large-scale Embedded Telecommunications Product

Marek Leszak and Manfred Meier

Alcatel-Lucent, Thurn-und-Taxis-Str. 10, 90411 Nuernberg, Germany {mleszak,manfredmeier}@alcatel-lucent.com

Abstract

This paper describes a success story in large-scale global systems and software development. We describe how we manage and develop a complex embedded product in a global, dynamic environment, with work carried out at several company locations and associated supplier sites. The main locations are in Germany and China. Organizational key success factors include a very experienced project management team, partnership-based collaboration in cross-functional teams across locations, and well-defined roles & responsibilities on all project levels. Sociological key factors include frequent cross-site visits on engineering and management level, cross-cultural training, and a high process and quality focus of the whole project team. Technical key factors include a feature-driven, timeboxing-based iterative process; effective multi-site tools for configuration control, change management, joint asynchronous reviews, and software integration and release; and full traceability of all product changes. Process measurements are provided as evidence supporting our view that the project is very successful.

1. Introduction

This paper describes and characterizes a successful collaboration between a German and a Chinese development location of Alcatel-Lucent, resulting in a large-scale, multi-release telecommunications product delivered to many network suppliers worldwide. The collaboration aims at developing an optical network element, called “ADMu”, which serves as a node in backbone and metropolitan communications networks. ADMu is a complex embedded system consisting of embedded hardware (circuit packs, ASICs, FPGAs, optical components) and associated software. The source code comprises about 2.7 million lines of product code (excluding generated source code, test software, and 3rd party software, so as to measure the joint manual programming effort). The programming languages used are largely ANSI C and C++. The ADMu system has high architectural complexity: there are several levels of concurrency, namely software threads, processor level, hardware board level, system level (the network element as part of a multi-node transmission network), and multi-system protocol communication between network elements and to a network management system.

Global development – our rationale. The very tough competitive market for telecom supplier equipment, with ever-declining prices, makes cost reduction a constant strategic management consideration. This trend forces us to cut costs through partnerships and offshoring with low-cost countries, as well as by shifting company-internal development to locations in such countries, without compromising the focus on time-to-market and a sustained high level of product quality in new feature releases. ADMu product development originally started at European locations; the need for development cooperation with our Shanghai (SH) branch also originated from the requirement to have a design center and manufacturing site in China as a prerequisite for doing business with Chinese telecommunications service providers. In this paper we focus on the collaboration between the Nuernberg (NB) location and SH. In addition, ADMu development involves 3rd party software and hardware suppliers. Our customers and their communication networks are also distributed over all continents; this is out of scope of this paper.

Global development – our success criteria. We managed the transition of a significant part of product development to SH several years ago while fulfilling our project commitments for most releases of the ADMu product: releases were fully on schedule for customer deliveries (the “general availability” (GA) milestone), with the largest part of the feature set delivered as agreed at project start.

In terms of post-delivery quality, most ADMu releases see only a couple of critical and major customer-found defects within 12 months after GA, a low number that is much better than the industry average of competing telecom products in the same category. This evidence is based on a standardized measurement and benchmarking system that we apply as a supplier and that our customers use to benchmark field performance, based on the telecom-specific ISO 9001 extension called “TL9000” [15].

International Conference on Global Software Engineering (ICGSE 2007) 0-7695-2920-8/07 $25.00 © 2007

Related work. Our study reveals many techniques similar to those in [1]; in particular, applying an incremental feature-driven process and strong requirements management are key success factors we share with that study. Overcoming cultural borders in global development across organizations in Western vs. Asian countries by cross-cultural training, strict project control, and common processes and development environment is described in [16]; we can confirm all these mechanisms as important success factors in our global projects, too. The major hypotheses studied in [2] for global development include an extra delay for processing large changes using modification requests (MRs, see definition below) and a significant communication increase once a project becomes distributed. We claim that in our environment both risks either do not occur or have no significant impact. Changes are managed by splitting them and assigning them to the affected area of control (i.e. a software subsystem). These “functional areas” are reflected both in the product structure and in the organizational structure, so the variance in change size is limited by subsystem borders; see the initial change metrics analysis in chapter 4, which partly confirms the validity of the results of [2] in our organization. We have established various formal and informal communication means in our global projects which have proven to be effective in our organization; all of these measures are also applied to single-site projects.

The strategies we apply for global projects are similarly covered in [3]. In addition, however, we have installed quality control techniques such as gate reviews and process metrics, and a high degree of process automation. The process set and associated process automation, which we also apply in a global environment, have been described elsewhere [4, 5, 6, 7, 8, 9]. This methodology includes e.g. feature-driven development, strict (subsystem-scoped) code ownership, iterative weekly timeboxing-based delivery, and loadbuild and smoke test; similar concepts have been published in [11, 12, 14], respectively.

Terms and acronyms. Throughout the remainder of this paper, we use the following acronyms and terms:

• NB and SH for the Nuernberg / Germany and Shanghai / China locations.

• ADMu as abbreviation for the commercial product name “Add-Drop Multiplexer Universal” [10].

• R&D functions, which we call “disciplines” and which can be mapped to CMMI terminology in the following way:

  • PjM for Project Management: includes PP, PMC, IPM and SAM/ISM,

  • SE for System Engineering: includes both product requirements management (REQM/RD) and system architecture definition,

  • SW for Software Development,

  • HW for Hardware Development, and

  • SVT for System Verification Test: includes the integration of the complete set of the product’s software and hardware (related to the PI process area) and black-box testing of the integrated system against product and system requirements (related to the verification process area; validation is partly covered as well).

• CMMI V1.1 terms, esp. process area acronyms, are used without explanation.

• MR stands for Modification Request and is the basic transaction for changing product code, documents, hardware, process documents, test scripts, etc. An important type of MR is a “defect MR”, where defect is used as a common term for both faults and errors.

• Change control is established by a CCB (change control board), a cross-functional team that decides on proposed changes for the various affected parts of the product. In our global development setting, CCB members are distributed and include delegates from our development suppliers.

• “Project” as a synonym for a “product release” of ADMu.

• “Subsystem” as a logically disjoint partitioning of the software sources of ADMu. A subsystem in our context is a unit of the SW architecture, of configuration and version control, of change control (MR-driven privilege ends at the subsystem border), and of project organization. Note that in an embedded system, the term “subsystem” is often used differently, e.g. for a certain HW unit with the SW residing on it.

2. Project organization and coordination

2.1. Multi-site project management – principles

Although formal “product ownership” lies with the NB organization, the relationship between the two Alcatel-Lucent locations has been based on a peer-to-peer partnership from the beginning, not on customer-supplier-like treatment. There was a well-organized transition from NB to SH of selected software subsystems (and also some hardware), of infrastructure (software development environment, test automation environment), of know-how (in the product domain and systems architecture), and of development processes. Communication paths have been set up in both directions by fully including Shanghai teams and colleagues on all levels, i.e. in project management as well as in the engineering work of all R&D disciplines.

The project management organization at NB and SH consists of highly experienced senior managers, originally expert engineers, with extensive and long-standing global development experience. Over a 15-year timeframe the NB organization established successful company-internal collaborations with a large number of sites in the U.S., India, the Netherlands, Great Britain, Ireland, Russia, Belgium, France, and other countries.

The cross-location project management team is guided by yearly project objectives aligned with the organization’s senior management. These objectives are refined into personal objectives for each team member and monitored by the respective supervisor. Both individual and team achievements are the basis for incentives at the end of each fiscal year. Objectives cover each planned feature release, incl. feature contents, delivery dates, and quality targets such as availability of cross-reviewed and baselined process-conformant deliverables, usage of a static code analysis tool, test coverage, etc.

2.2. Project organization

The project organization is highly decentralized between NB and SH. It also includes supplier management for certain components of the product which are developed by selected suppliers or belong to the commercial-off-the-shelf (COTS) category; lower operating system layers are partly covered by COTS components. Although project teams are geographically distributed, the focal point of PjM control is a weekly alignment meeting with the team leaders of all affected stakeholders: PjM, SE, SW, HW, and SVT. This cross-functional team of representatives from all affected disciplines coordinates (plans & monitors) the development project. A project is initiated by senior executives and by Product Management, which provides the project’s budget. On a lower level there are discipline-level sub-project coordination teams, e.g. within software development. Roles & responsibilities are defined unambiguously down to the team leader level by partitioning the work breakdown structure (WBS) across NB and SH. Software subsystems (called “software domains”) were split according to available system knowledge, criticality, and staff availability. Feature teams are formed to assess feasibility and define the system architecture for new, critical features. Besides this organizational project assignment, each subsystem is a coherent unit of functionality, software configuration, and access control.

2.3. Project dynamics

In our R&D organization, releases within one product line can overlap with 1–2 other releases. The PjM teams of active projects in the same product line may overlap as well. Over time, “large” and “small” projects have evolved. The former deliver “feature sets” to one or more customers worldwide, mostly also including fixes of unresolved defects from previous releases. The latter are corrective maintenance releases triggered by some customers, fixing a few known defects from previous releases.

The development lifecycle of large and small projects ranges from 6–9 months and from 1–2 months, respectively. We don’t use software patches: fixes provided to customers need thorough regression testing and an official release. All supported releases can be ordered by any of our customers, which are large global telecom service providers.

2.4. Supplier management

For 3rd party SW parts developed according to our statement of work, which includes system requirements and architecture (we do not outsource this strategically essential know-how!), our suppliers are monitored continuously by dedicated supplier managers in R&D. For some OEM SW (and also HW) assets to be included in our product, our senior R&D management, supported by our purchasing organization, negotiates longer-term joint system evolution and the features required for future asset deliveries. Corrective maintenance tasks are usually included in supplier contracts.

For quality control we apply joint reviews of vendor deliveries, based on feature contents, the agreed delivery schedule, and well-defined exit criteria such as completed design and code reviews.

2.5. Multi-site team building and cross-functional teams

At the beginning of development for the ADMu product family, both project teams in NB and SH invested a lot of systematic and sensible team building effort to overcome the large existing differences in culture, geographic and temporal distance, project management style, and development process. We conducted cross-cultural training for the Chinese team on German culture, and vice versa. For collaborating teams we initially organized joint team building activities. There are still frequent cross-site visits whenever a project needs joint work on a critical activity. There are also cross-site sponsor visits: the project manager from NB visits the SH team frequently, and SH management often attends face-to-face meetings in NB.

For the development of critical new features we often assign two closely cooperating feature owners at NB and SH. We frequently exchange developers between “neighbor subsystems”, i.e. closely coupled software pieces. Engineers are sent to the other site during a project for 1–6 weeks, typically for feature integration. We organize expert visits in critical project phases, e.g. at the beginning of a product release to discuss architecture extensions for new features, and at the end for “endgame” support, i.e. key software integrators visit the other location to verify software deliveries.

We have deployed a cross-functional team approach on several levels, based on CMMI’s IPPD approach: each R&D discipline (SE, SW, HW, SVT) participates in various project activities with representatives from both locations. This includes a “feature team” to assess feasibility and define the system architecture for new critical features, a central weekly project planning & monitoring meeting, and a joint change control board. Within SW development we have organized a dedicated SW project management team with participants from both sites, as in other disciplines.

We conduct process and product training, initially partly on-site in Shanghai. If the “MR handling self-discipline” or the resulting process compliance (checked by quality gate reviews for each software subsystem evolution in a project release [7,8,9]) revealed insufficient process discipline or non-uniform process deployment, we provided “refresh” courses, often conducted by local quality management at SH, which acts rather pro-actively on all kinds of quality issues and improvements.

The NB-SH split of the system into subsystems is accompanied by a partitioning of all project functions into clearly defined, disjoint and detailed roles & responsibilities, based on generic role definitions in our organizational process set (SDP, see below). E.g. for a certain software subsystem, it is not only well-defined which site owns it and who is the team leader; roles within the software process such as “domain (subsystem) architect” and “feature integrator” are also assigned, documented on the project’s website, and communicated to the project team.

Overall we can claim that our cross-location relationship is based on mutual trust, joint problem solving, and constructive risk management, avoiding the “not invented here” syndrome. Still, it sometimes took 6–12 months until a software area under transition was fully operational, i.e. could produce deliverables with sufficient speed (or rather effectiveness) and quality. Even then, the productivity of senior engineers with 10 or more years of experience cannot be fully reached in such a short timeframe, due to the complexity of both software and hardware architecture, telecom standards (e.g. for transmission protocols), etc.

After delivering a major product release, PjM coordinates retrospective analysis (lessons learned) sessions across the various R&D teams at SH and NB. The resulting aligned improvements have led to optimized processes and more effective project execution for subsequent releases.

We have also introduced strong guidance and training by experienced “expatriates”, i.e. key engineers, e.g. from the SE discipline, who stay at the other location for six months up to several years. For day-to-day engineering work, we follow a “trust but verify” approach, see section 4.2.

3. Software management

3.1. Product decomposition and configuration management

The WBS split is clearly defined between the locations with strict tool-enforced access control, see below for details. Requirements & architecture work was originally centralized at NB. Meanwhile, having gained a sufficiently high experience level, SH engineers also contribute here to a significant extent. Software subsystems (so-called software domains) are partitioned, i.e. exactly one sub-team at either location owns a complete subsystem, incl. the associated domain-level architecture, design, source code, and unit test documentation. This concept has proven to minimize coupling and communication overhead across locations.

Strict access control to source code is applied on domain level, which is one of the key success factors. Each domain is mapped onto a ClearCase (TM by IBM-Rational) Versioned Object Base (VOB). Only members of the owning team are allowed to check out / check in files there, based on CCB-controlled MRs assigned to this particular team. All VOBs for a product like ADMu constitute a distributed and replicated database: each VOB has a master site such that changes are allowed on any site, but weekly software deliveries by a sub-team onto the so-called “main branch” are only possible from the owning site and by members of the team owning the respective VOB. File replicas residing on all other participating sites are kept in sync by specific protocols of ClearCase’s “multi-site” capability.
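The ownership rules described above can be pictured as a small permission check. The following is a minimal sketch in Python and not the actual ClearCase configuration: the classes `Vob` and `ModificationRequest`, the functions `may_check_in` and `may_deliver_to_main`, and all sample names are hypothetical and serve illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vob:
    """One software domain (subsystem), mapped onto one VOB. Hypothetical model."""
    name: str
    owning_site: str          # "NB" or "SH"
    owning_team: frozenset    # user ids of the owning sub-team


@dataclass(frozen=True)
class ModificationRequest:
    """An MR as handled by the CCB. Hypothetical model."""
    mr_id: str
    vob_name: str             # the subsystem this MR is scoped to
    ccb_assigned: bool        # True once the CCB has assigned it to the owning team


def may_check_in(user: str, mr: ModificationRequest, vob: Vob) -> bool:
    """Check-in needs team membership plus a CCB-assigned MR for this very VOB."""
    return user in vob.owning_team and mr.ccb_assigned and mr.vob_name == vob.name


def may_deliver_to_main(user: str, site: str, vob: Vob) -> bool:
    """Deliveries onto the main branch: only from the owning site, by the owning team."""
    return site == vob.owning_site and user in vob.owning_team


# Illustrative data only; subsystem and user names are made up.
protocol_vob = Vob("protocol_stack", "SH", frozenset({"li", "wang"}))
mr = ModificationRequest("MR-1", "protocol_stack", ccb_assigned=True)

print(may_check_in("li", mr, protocol_vob))           # True: owner with assigned MR
print(may_check_in("schmidt", mr, protocol_vob))      # False: not in the owning team
print(may_deliver_to_main("li", "NB", protocol_vob))  # False: not at the owning site
```

Note that the check-in rule carries no site condition, reflecting that changes are allowed on any site, whereas the delivery rule is tied to the owning site.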

Only engineers belonging to the SW sub-team owning a particular VOB are allowed to change files in this VOB. This strict form of code ownership turned out to make planned (feature-driven) changes and bugfixes less error-prone, thus leading to effective PjM control of all product changes. For other kinds of code ownership see [11]. With this concept we avoid ineffective multi-site MRs [2]: all assigned change actions can usually be performed single-site. In rare cases, however, logical dependencies across multiple related changes lead to some cross-site coordination. Overall, our approach of using VOBs as the strict unit of software configuration control has proven to ensure the best resulting product quality and efficiency.

3.2. Agile techniques for software code control and software release

Our development model in the late 1990s had an XP-like code ownership, i.e. any developer could update any system part, either to implement a certain feature or to resolve complex defects. At that time projects suffered from many broken loads: subsequent smoke testing failed quite often, and the feature contents of a particular load were not clearly defined and had to be “discovered” by integration testing. We had daily build cycles; a lot of effort was wasted on resolving wrong code dependencies and link errors in what was politely called “release coordination”. In 2001 we changed the integration paradigm for all of our products under development to weekly official builds with “varying” but planned feature and bugfix contents, based on a timeboxing approach similar to [13]. Similarly to the feature-driven development (FDD) process model [12], we use incremental feature development within each SW sub-team (synchronized by the SW PjM team) and a lifecycle phase “feature integration” prior to SW deliveries towards SVT.

Since then, software release coordination has been accomplished on PjM level: all SW team leaders from NB and SH run a weekly virtual meeting for planning the release contents, mapping (partial or full) feature deliveries to weekly loads in alignment with SVT, and monitoring progress using a risk-management-based approach. Most publications following the agile paradigm (see e.g. [14]) advocate daily loadbuilds and smoke tests. According to our experience, however, this is not feasible for complex embedded systems with various levels of concurrency, hardware/software dependencies, etc.; we need an integration cycle of one week, which works very well.
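The mapping of feature deliveries to weekly loads can be pictured with a short sketch. It assumes, purely for illustration, that each planned delivery carries a target date and that a weekly load is identified by its ISO calendar week; the names `Delivery` and `plan_weekly_loads` and the sample features are hypothetical and not taken from the actual project tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Delivery:
    """One planned (partial or full) feature delivery. Hypothetical model."""
    feature: str
    subsystem: str
    site: str        # owning site, "NB" or "SH"
    target: date     # planned delivery date
    partial: bool    # True for a partial feature delivery


def plan_weekly_loads(deliveries):
    """Group planned deliveries by ISO (year, week), one group per weekly load."""
    loads = defaultdict(list)
    for d in sorted(deliveries, key=lambda d: (d.target, d.feature)):
        iso = d.target.isocalendar()
        loads[(iso[0], iso[1])].append(d)
    return dict(loads)


# Illustrative plan; feature and subsystem names are made up.
plan = plan_weekly_loads([
    Delivery("protection_switching", "switching", "NB", date(2007, 3, 5), partial=True),
    Delivery("alarm_filtering", "fault_mgmt", "SH", date(2007, 3, 8), partial=False),
    Delivery("protection_switching", "switching", "NB", date(2007, 3, 12), partial=False),
])

for (year, week), items in sorted(plan.items()):
    contents = ", ".join(
        f"{d.feature}{' (partial)' if d.partial else ''}" for d in items
    )
    print(f"load {year}-W{week:02d}: {contents}")
```

In this sketch a feature may appear in several consecutive loads, first as a partial and later as a full delivery, which mirrors the incremental, timeboxed delivery style described above.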

Coordination of planned changes (feature-based, architectural enhancements, etc.) and unplanned changes (defects found, to be fixed) is accomplished by the weekly SW PjM meeting and the SW-specific CCB (for coordination of fixes only). There we decide which team / subsystem has to provide which update, as a kind of sub-transaction of a created defect MR, and in which weekly iteration (physically: load) this change shall be delivered. Defect localization precedes this decision. For complex defects or defects affecting complex system parts, an investigation is requested from a key engineer. This may include an impact analysis w.r.t. affected features, fix effort, discussion of alternative solutions, etc. Once the GA (customer delivery) date approaches, the CCB meets daily, and only controlled fixes “by invitation” are allowed to be integrated into the main load line. This mechanism, which we sometimes call “gatekeeping”, is quite successful in making software stability converge. We have established a software stability metric, based on the number of source files touched and the size of code changes per load, which is used (among other process metrics such as the number of open high-severity defects) to assess the software product’s “GA readiness”.

Note that software change control is enforced by the product’s CCB, i.e. an engineer cannot simply assign a defect MR to him-/herself. Only the cross-discipline CCB team has sufficient overview of the whole system to assess which other parts, documents, etc. may be affected by a “simple” code change and need separate “sub-MRs” to perform the change, so that overall a fix mostly sustains integrity across the code base and associated documentation.

3.3. Collaboration tools for multi-site development in different time zones

The communication media we apply include

• audio phone bridges plus MS NetMeeting™ for virtual meetings;

• project websites as central communication instance for project organization, project status reports, access to document repositories, etc.;

• email notifications to all project members for project status updates, project meetings, review invitations, etc.;

• dedicated email notifications to specific persons for special events, e.g. to all critical reviewers to provide review comments, to engineers having an MR assigned which is overdue, etc.;

• notification and CCB monitoring and re-scheduling of overdue MRs for more effective change control;


• notification of new document versions from central web-based document repository;

• tightly coupled collaboration tools applied across cross-site software teams, e.g. VNC is used for multi-site concurrent debugging and testing and other kinds of remote execution control.

Are you missing sophisticated groupware tools in this list? We don’t: fine-granular “concurrent engineering” is not seen as an effective way of working. Thanks to the subsystem-based split of work products and responsibility, a local team within one location can typically arrange for sequentialization of activities on shared work products like code or design documents.

The temporal distance of 6–7 hours (summer/winter) between SH and NB rarely causes any problems. Rather, we can often benefit from an overall working time of 15–16 hours per day, e.g. for defect localization and defect removal. Virtual alignment calls or working meetings are scheduled in the morning at NB and in the afternoon at SH. The time window for direct communication is often extended without any request from project management: many colleagues adapt their working hours on their own initiative, i.e. many NB colleagues start much earlier than 8 a.m. local time, whereas SH colleagues can be reached at their office (or via remote access from their homes) well beyond 5 p.m. local time. This applies especially in critical endgame situations close to the GA milestone.

4. Organizational processes, quality control, and supporting software tools

Overall control of development projects is accomplished by applying our corporate product lifecycle management process, in which quality gates guarded by senior management check a project’s entry and exit conditions. The lifecycle process tracks budget availability, technical feasibility of the feature set committed between PM and R&D, and planning and availability of resources. R&D quality gates check fulfillment of criteria such as the number of open high-severity MRs and SVT completion. We now provide an overview of the development process applied between a project’s entry and exit milestones.

4.1. Development processes supporting global development

Before official project start, the new features per release are aligned based on a value-added methodology. The alignment is done between product management, development project management, and the R&D disciplines SE, SW (software architecture sub-team), HW, and SVT. It leads to a mutually committed and stable feature set at the beginning of work for a release, and to low feature churn (for ADMu: << 10%); see Figure 1 for an overview of our development lifecycle.

Figure 1: Development lifecycle (overview, largely simplified). [Diagram not reproduced. It relates the disciplines (Product Management, Systems Engineering, SW/HW Architecture / Design, SW/HW Integration, System Test, Network Validation) to the main work products (customer requirements; product (system & SW) requirements & architecture; SW/HW architecture / design documents; verified HW/SW subsystems; verified system; validated comm. network), with arrows meaning “is verified or validated by” and “is needed as further V&V input”.]

Development is based on the Standard Development Process (SDP), which has been evolving in our organization since 1993 [7]. The SDP is based on a compact and rigorous semi-graphical notation; see [7] for details. It has been defined largely bottom-up from the best practices of key engineers. Standard process models like the German V-Model or RUP did not yet exist at that time. There could be alternative ways of describing our processes, but their content is unique and highly adapted to our way of working and known best practices. The SDP has been appraised to fully satisfy CMMI maturity level 3 process areas like OPD, OPF, RSKM, etc. The SDP is uniformly deployed for all products owned by our optical development organization at NB, including collaborations with several other company locations and several offshore development suppliers. A process improvement program at our R&D organization targets CMMI maturity level 3, leading to a well-defined and stable set of processes, stakeholder communication and alignment, enforcement of certain QA rules, etc.

As detailed in [7], the SDP is structured into engineering sub-processes for each R&D discipline and into supporting sub-processes for PjM, CM, RSKM, PPQA, etc. For the SW sub-process, our company-internal collaborations have shown that both the process prescriptions and their support, e.g. by our software production environment, lead to very effective SW development. Some indications are:

• An agile paradigm is followed where feasible: incremental, timeboxing-based weekly deliveries and load builds, and a minimal amount of necessary design documentation, e.g. to pre-align software subsystem interfaces prior to coding.

• Centralized load-build and software release management. Whereas weekly SW increments for each subsystem can be developed and delivered by sub-teams from any location, the official load build and the deliveries to SVT (for testing new features, regression testing of previously released features, and re-testing of fixed defects) are controlled by a central software integration team.

• The content of a team’s incremental delivery (expert information: ClearCase label) is well-defined by the list of associated MRs that have been used to perform the changes w.r.t. planned features and possibly also some bugfixes. Thus, the main delivery information to SVT can be derived automatically, contributing to safe planning and monitoring of test execution for both the SW and SVT teams.
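To illustrate how delivery information could be derived from the MR list of one incremental delivery, here is a minimal sketch. The record layout, the `kind` classification ("feature" / "bugfix"), and the label and MR id formats are assumptions made for illustration; the actual ClearCase and MR tool integration is not described in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveredMR:
    mr_id: str       # hypothetical identifier format
    kind: str        # assumed classification: "feature" or "bugfix"
    subsystem: str   # subsystem touched by the change


def delivery_notes(label, mrs):
    """Summarize one labeled delivery for the system test team."""
    return {
        "label": label,
        "subsystems": sorted({m.subsystem for m in mrs}),
        # MRs implementing planned features need new-feature tests
        "feature_tests": sorted(m.mr_id for m in mrs if m.kind == "feature"),
        # MRs fixing defects need a re-test of the original defect
        "retests": sorted(m.mr_id for m in mrs if m.kind == "bugfix"),
    }


if __name__ == "__main__":
    mrs = [
        DeliveredMR("MR-210", "feature", "protection-switching"),
        DeliveredMR("MR-198", "bugfix", "alarm-handling"),
        DeliveredMR("MR-205", "feature", "alarm-handling"),
    ]
    print(delivery_notes("SUBSYS_X_WEEK_12", mrs))
```

Because the summary is computed purely from the MR list, the test team can plan re-tests and feature tests without a hand-written delivery note.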

4.2. Change management, reviews, and how global development is supported

As one essential principle of change management, we enforce that all changes to code, to product, project, and process documents, to business-critical tools, and to hardware units are MR controlled. There is a uniform MR system (i.e. change control toolset) for all artifacts listed above. MR handling is based on an optimized change management process: there are dedicated MR lifecycles for SW, HW, documents, and tools. The MR state-transition model of each lifecycle is structured into phases according to the corresponding engineering sub-process of the SDP. There are clear and tool-enforced responsibilities and privileges per MR state. Process and tool evolution is supported by a dedicated process engineering team. We apply a multi-site toolset for MR control (a commercial core tool plus a high degree of customization to our MR lifecycle models) based on a replicated and synchronized MR database per major development location.
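A state-transition model with tool-enforced responsibilities per state can be sketched as a small table-driven state machine. The state names, role names, and transition table below are invented for illustration and do not reproduce the actual MR lifecycles.

```python
# (current state, acting role) -> states that role may move the MR to.
# All names are assumptions, not the real lifecycle definition.
TRANSITIONS = {
    ("submitted", "ccb"): {"assigned", "rejected"},
    ("assigned", "developer"): {"resolved"},
    ("resolved", "verifier"): {"verified", "assigned"},
    ("verified", "ccb"): {"closed"},
}


class TransitionError(Exception):
    """Raised when a role attempts a transition it is not privileged for."""


class MR:
    def __init__(self, mr_id):
        self.mr_id = mr_id
        self.state = "submitted"
        self.history = []  # entries: (actor, role, old state, new state)

    def move(self, actor, role, new_state):
        allowed = TRANSITIONS.get((self.state, role), set())
        if new_state not in allowed:
            raise TransitionError(
                f"{role} may not move {self.mr_id} from {self.state} to {new_state}"
            )
        self.history.append((actor, role, self.state, new_state))
        self.state = new_state


if __name__ == "__main__":
    mr = MR("MR-300")
    mr.move("ccb_chair", "ccb", "assigned")
    mr.move("engineer_sh", "developer", "resolved")
    mr.move("expert_nb", "verifier", "verified")
    print(mr.state, mr.history)
```

Keeping the privileges in one table means that a separate lifecycle per artifact type (SW, HW, documents, tools) would only need its own table, not its own code.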

Change control is rigorous and enforced. Cross-artifact change integrity on a product basis is managed by a change control board (CCB). All R&D disciplines from the various locations participate in the CCB as stakeholders. Changes to dedicated artifacts per discipline are controlled by sub-MRs called “spawns”. Communication of MR changes (and associated SW changes) is done via an email-based notification to all stakeholders. There is also full traceability in MR history logs i.e. who has done what MR-triggered update, and when. This MR system feature relies on a tool bridge to the software and document repository (IBM-Rational ClearCase). Verification of MR-controlled changes is accomplished by a dedicated expert “verifier” role.

A recent CMMI appraisal confirmed the strength of our change control approach as one of the best in the whole company (i.e. at Lucent Technologies, before the merger with Alcatel).

Review process. In addition to MR control, for new and significantly changed artifacts we follow an effective technical review / inspection process based on a proprietary web-based review control toolset called “Quality DataBase (QDB)” [6]. The review process for engineering artifacts is applied to verify software code and product documentation provided by all R&D disciplines, i.e. system requirements and architecture from the SE Team, software architecture, design, code, and test plans by the SW Team, and test plans and automation testware by the SVT Team. HW changes are also MR controlled, but this is out of scope of our paper. For less critical and/or smaller changes to artifacts, effective asynchronous “desk reviews” are performed, with no need for a meeting. We do have measurements of review effectiveness, so we know how effectively desk reviews are applied in our organization. Quality coordinators per discipline have to ensure that the invited so-called “critical reviewers” really contribute to desk reviews as well, which is not easy to achieve in a multi-site environment and under project pressure. Review invitations are sent one week ahead to the complete R&D project team, so that experts who were not invited can also comment on the artifact under review. QDB is applied extensively throughout our Optical product division: around 3500 reviews have been performed using QDB since 2001, and the tool is popular at some other corporate locations as well.

We also apply a special review process for R&D-internal quality gates, leading to high process compliance and ultimately to better product quality, which is statistically proven in our organization [6, 7]. In contrast to audits, which are usually performed “after the fact”, the findings of gate reviews can be used in-process, with corrective actions to push the actual project back in line with the quality targets. We will soon have extended our toolset so that it can quantitatively measure compliance with our defined SDP sub-processes. This concept is based on using the activity structure of any such sub-process also as a checklist for quality gates [8, 9].
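One simple way to turn a sub-process activity list into a quantitative compliance figure is the fraction of checklist activities for which evidence exists. The sketch below assumes exactly that definition; the activity names and the function `gate_compliance` are hypothetical, and the measure described in [8, 9] may be defined differently.

```python
def gate_compliance(activities, completed):
    """Return (percentage of checklist activities completed, missing activities).

    activities: ordered activity names of one sub-process (the checklist).
    completed:  set of activity names for which evidence was found.
    """
    if not activities:
        raise ValueError("checklist must contain at least one activity")
    missing = [a for a in activities if a not in completed]
    done = len(activities) - len(missing)
    return 100.0 * done / len(activities), missing


if __name__ == "__main__":
    # Hypothetical activity names for a software sub-process gate.
    checklist = ["design reviewed", "code reviewed", "unit tests run", "MRs resolved"]
    evidence = {"design reviewed", "code reviewed", "MRs resolved"}
    percent, missing = gate_compliance(checklist, evidence)
    print(f"compliance {percent:.0f}%, missing: {missing}")
```

The list of missing activities is what makes the result usable in-process: it names the corrective actions directly.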

Applying the SDP for the ADMu team at SH has resulted in a high degree of acceptance. Meanwhile, SH management has decided to move another product from their local development process to a tailored, lean version of the SDP.

5. How effective is our global development approach?

We have a stable process-driven development environment for ADMu and other products owned at the NB site, and we have introduced and are still evolving a set of product and process metrics in a measurement database. For an overview of supported metrics, see [17]. These metrics support early warnings of critical project situations w.r.t. progress and quality issues, and detailed analyses on team level (organizational view) and subsystem level (architectural view), both within a running project and between different projects of the same product line. Thanks to the well-defined split between locations, we can perform various comparisons and gain insights into the dynamics of global development.

How can we assess the effectiveness of our global development approach? There are several qualitative and quantitative indications. First, there is the capability to handle “late feature adders” after project start in order to better serve a highly dynamic market: here the iterative feature development approach and continuous support by a SW architecture team allow fairly high flexibility. Note that in a typical project the balance is quite positive: we manage to add significantly more features after project start than we have to drop because of, e.g., lowered market priority or changes to other business conditions.

We also have the capability of a very fast load-build cycle. In case of emergency fixes for customer deliveries, we can produce a full load (several dozen MB in size!) within 24 hours. Load quality is sufficiently high: we use a load stability and quality metric, which shows that for a typical ADMu SW load less than 5% of fixes inject new bugs, and on average 10% of new defects (relative to the number fixed in any load) are found downstream.
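As a worked illustration of the two ratios, the sketch below computes them from three counts per load. The exact definition of the load stability and quality metric is not given here, so the formulas and the names `load_quality`, `bad_fix_ratio`, and `downstream_defect_ratio` are assumptions; the sample counts are invented.

```python
def load_quality(fixes_delivered, fixes_injecting_bugs, new_defects_downstream):
    """Compute two assumed ratios for one software load.

    bad_fix_ratio:           share of delivered fixes that injected a new bug.
    downstream_defect_ratio: new defects found downstream, relative to the
                             number of defects fixed in the load.
    """
    if fixes_delivered <= 0:
        raise ValueError("a load must contain at least one fix")
    return {
        "bad_fix_ratio": fixes_injecting_bugs / fixes_delivered,
        "downstream_defect_ratio": new_defects_downstream / fixes_delivered,
    }


if __name__ == "__main__":
    # Invented counts: 200 fixes, 8 of them inject a bug, 20 new defects later.
    ratios = load_quality(200, 8, 20)
    print(f"bad fixes: {ratios['bad_fix_ratio']:.1%}, "
          f"downstream defects: {ratios['downstream_defect_ratio']:.1%}")
```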

We face very few critical & major defects reported by our various customers worldwide as described in section 1.

As a partial view of productivity per site in the ADMu project, we made an initial effort to check whether some of the hypotheses in [2] on defect correction effectiveness can be verified in our environment. We compared data collected for a large 2006 release ‘A’ of ADMu with data for a release ‘B’ of another product of our organization; the forthcoming book section [9] will have details of the defect measurements on this other product. We assume that we can compare ‘B’ to the 2006 ADMu release:

• Both products with their associated releases are in the same application domain;

• Both use the same development process “SDP” with a rather small degree of variation;

• Our NB engineers have very similar skills and domain and system knowledge; many have been working on the other product over the products’ lifetimes;

• Release ‘B’ applies offshoring only to a small extent, i.e. we can compare the ‘local’ release B to the ‘global’ release A.

So the following comparative analysis should be valid:

• The duration between MR assignment and delivery of a bugfix is roughly 50% higher in the global project ‘A’ than in the local project ‘B’. The 25% (75%) quartiles of the duration’s distribution are 1 (8) for ‘A’ and 0.5 (5) for ‘B’, respectively. (Note that after the successor release of ‘B’ was partly offshored to India, the 75% quartile of bugfix duration increased from 5 to 11!) However, the validity of this data w.r.t. fix effectiveness is unclear: some engineers are also assigned to other projects for part of their time, and we have not yet analyzed the effort split. The priority of bugfixing vs. feature development can also vary between ‘A’ and ‘B’. So the correlation between MR processing intervals and effort spent in our organization is as yet unexplored.

• We can confirm the observation in another case study [2] that fix duration does not correlate with number of touched files or with changed/added LOC at all. We determined a Pearman correlation coefficient of 0.07 and 0.08 respectively, and then did not pursue any deeper statistical analysis

Table 1: Software defect distribution by project, location, and lifecycle phase detected (values in % per row)

Project / Site   SW Sub-teams   SW Integration   System Test   Customer found
A / NB           26.0           29.5             44.5          < 0.1
A / SH           27.4           35.1             36.3          1.2
B / NB           17.7           18.2             59.7          4.4

• Defect distribution by phase detected in ‘A’ vs. ‘B’: Table 1 classifies defect MRs by the phase in which a defect was detected (this information is valid in our organization: the main MR fields are CCB controlled and corrected if needed) and by the location that owns the particular subsystem to be fixed. Using project knowledge we can offer interesting interpretations and insights, some of which may need a more in-depth analysis of our data. The share of defects found by SW sub-teams based on unit testing and feature integration is roughly identical across the sites of global project ‘A’. The SH team even outperforms the NB team in early defect finding: 8% more defects are detected before delivery to SVT. The local project ‘B’ finds a much lower share of defects before SVT. We know of higher regression test automation in ‘B’, but still have to analyze the cause of this observation.
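The kinds of computation behind the comparative analysis above can be sketched with the Python standard library. This is an illustration only: the duration and file-count samples are invented, the quartile method is an assumption, and Pearson's r is used as an assumption about which correlation coefficient was computed. Only the `TABLE_1` values are taken from Table 1 (the customer-found column is omitted because one entry is given as "< 0.1").

```python
import math
import statistics


def quartiles_25_75(durations):
    """Return the 25% and 75% quartiles of a list of fix durations."""
    q1, _median, q3 = statistics.quantiles(durations, n=4, method="inclusive")
    return q1, q3


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Shares from Table 1: (SW sub-teams, SW integration, system test), in percent.
TABLE_1 = {
    "A / NB": (26.0, 29.5, 44.5),
    "A / SH": (27.4, 35.1, 36.3),
    "B / NB": (17.7, 18.2, 59.7),
}


def pre_svt_share(row):
    """Share of defects found before delivery to system test, in percent."""
    sub_teams, integration, _system_test = row
    return round(sub_teams + integration, 1)


if __name__ == "__main__":
    durations = [0.5, 1, 1, 2, 3, 5, 8, 13]   # invented sample
    files_touched = [1, 4, 2, 9, 1, 3, 2, 5]  # invented sample
    print("quartiles:", quartiles_25_75(durations))
    print("r(duration, files):", round(pearson(durations, files_touched), 2))
    for site, row in TABLE_1.items():
        print(site, "found before SVT:", pre_svt_share(row))
```

A coefficient close to zero, as reported for fix duration versus touched files or changed LOC, means that a linear relationship explains almost none of the variation, which is why no deeper analysis was pursued.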

6. Summary: Lessons learned

The success in planning, organizing, developing, and delivering the ADMu product is based on a combination of several factors. Each single factor contributed to it (although we have not analyzed quantitatively the impact of each factor):

• Very experienced project management team, large experience with various in-house and vendor collaborations worldwide in various geographic areas and cultures;

• Well defined and communicated roles & responsibilities on all levels of the project, especially WBS split clearly defined between the locations, with strict access control per area of responsibility and therefore full ownership;

• Rigorous project planning and monitoring including all stakeholders based upon fully deployed corporate product lifecycle management process;

• Project culture based on stakeholder alignment (cross-functional team concept) and verification where needed, esp. for critical changes. This includes not only cross-location stakeholders, but also management of dependencies between software and hardware deliveries for system / software / hardware integration and SVT activities – a much tougher problem with typically more complex contents and schedule dependencies than in software-only systems!

• Alignment based feature management process, leading to small or manageable feature churn;

• Mature and effective organizational development process maintained and deployed by dedicated Process Teams;

• Quality control by effective tool supported technical reviews and quality gate reviews, and by continuous collection and analysis of several key project and process metrics;

• Very efficient change control process optimized to support major process phases and steps leading to full traceability of each change needed – customer requirements, system requirements, R&D code and document artifacts, SVT testcases, and MRs which are bi-directionally traceable;

• Multi-site capabilities of development environment leading to “location transparency”, incl. tools for configuration and change management, documentation repository, SW delivery, and review support;

• Initial and continuous multi-location team building and team communication activities, “One-Team” approach;

• Established and maintained good personal relationships on many levels in the organization despite the challenging temporal, geographical, and cultural barriers.

We are proud of having achieved the “One Team” approach: a recent CMMI (SCAMPI A) appraisal confirmed that process deployment and artifacts were largely indistinguishable, regardless of whether they had been created at Shanghai or Nuernberg, across all disciplines (PjM, SE, SW, HW, and SVT).

After merging into our new company Alcatel-Lucent, we look forward to intensifying our global development experience with many new colleagues and teams in various countries worldwide!

ACKNOWLEDGMENT

The expert contributions and constructive comments of our colleagues Helmut Merz, Herbert Hoess, Chen Hong, and Ludwig Bayer are greatly appreciated.

7. References

[1] Christof Ebert and Philip de Neve, “Surviving Global Software Development”, IEEE Software 18, 2 (Mar./Apr. 2001), pp. 62-69

[2] James D. Herbsleb and Audris Mockus, “An Empirical Study of Speed and Communication in Globally Distributed Software Development”, IEEE Trans. Software Engineering 29, 6 (June 2003), pp. 481-494

[3] Gwanhoo Lee, William Delone, and Alberto Espinosa, “Ambidextrous Coping Strategies in Globally Distributed SW Development Projects”, Comm. of the ACM 49, 10 (Oct. 2006), pp. 35-40

[4] Marek Leszak, Dewayne E. Perry, and Dieter Stoll, “A Case Study in Root Cause Defect Analysis”, Proc. IEEE International Conf. on Software Engineering (ICSE-22), Limerick / Ireland, June 2000

[5] Marek Leszak, Dewayne E. Perry, and Dieter Stoll, “Classification and Evaluation of Defects in a Project Retrospective”, Journal of Systems and Software, 61/2002, pp. 173-187

[6] Marek Leszak and Walter Kammerer, “QDB - a Flexible Environment for Process and Quality Management”, Proc. of CONQUEST-2002, Nuernberg / Germany, Sept. 2002

[7] Marek Leszak, “Process Modeling and Quality Control for Embedded Telecommunication Systems”, Proc. of IEE Workshop on Process Modeling and Simulation (ProSim), Edinburgh, May 2004

[8] Matthias Ruffler and Marek Leszak, “Software Quality Assessment – A Tool-Supported Model”, Proc. Int. Workshop on Software Measurement (IWSM/Metrikon2006), Potsdam / Germany, Nov. 2006

[9] Marek Leszak and Dieter Stoll, “Quality gates – a case study”, chapter in Best Practices in Software Measurement, 2nd ed., Christof Ebert and Reiner Dumke (editors), Springer 2007 (to appear)

[10] Alcatel-Lucent, “Add-Drop Multiplexer Universal (ADMu) product - white paper”, http://www.alcatel-lucent.com/wps/portal/products

[11] M.E. Nordberg, “Managing Code Ownership”, IEEE Software, March/April 2003, pp. 26-33

[12] S. Palmer and J. Felsing, “A Practical Guide to Feature-Driven Development”, Prentice Hall, 2002, http://www.featuredrivendevelopment.com (accessed in May 2007)

[13] Pankaj Jalote et al., “Timeboxing: A Process Model for Iterative Software Development”, http://www.cse.iitk.ac.in/users/jalote/papers/Timeboxing.pdf (accessed in May 2007)

[14] Steve McConnell, “Best Practices - Daily Build and Smoke Test”, IEEE Software 13, 4 (July 1996)

[15] Quest Forum, “TL9000 quality management system”, http://www.questforum.org/tl9000/tl9000.htm

[16] S. Krishna, Sundeep Sahay, and Geoff Walsham, “Managing cross-cultural issues in global software outsourcing”, Comm. of the ACM 47, 4 (April 2004), pp. 62-66

[17] Martin Kunz, Marek Leszak, Rene Braungarten, and Reiner Dumke, “Design of an Integrated Measurement Database for Telecom Systems Development”, Proc. Int. Workshop on Software Measurement (IWSM/Metrikon2006), Potsdam / Germany, Nov. 2006
