CERN-THESIS-2014-086, 21/07/2014

Master's Thesis
Karlsruhe, 13.01.2014
Chair of the Examination Board: Prof. Dr. Ditzinger
Fakultät für Informatik und Wirtschaftsinformatik (Faculty of Computer Science and Business Information Systems)

Name: Yves Fischer
Topic: Monitoring and Diagnostics for C/C++ Real-Time Applications
Workplace: CERN, Geneva
First examiner (Referent): Prof. Dr. Fuchß
Second examiner (Korreferent): Prof. Dr. Hoffmann
Submission date: 12.07.2014

Master-Thesiscds.cern.ch/record/1746269/files/CERN-THESIS-2014-086.pdf · CERN-THESIS-2014-086 21/07/2014 Master-Thesis Thema: Referent: Karlsruhe, 13.01.2014 Der Vorsitzende des




Eidesstattliche Erklärung / Statutory Declaration

Ich versichere, alle verwendeten Quellen angegeben zu haben. Alle übernommenen Textzeilen, ganze Textpassagen, Tabellen oder Bilder sind mit Quelle angegeben. Dies gilt unabhängig davon, ob die Quelle ein Buch oder eine Veröffentlichung im Internet ist. Auch eine direkte Übersetzung eines fremdsprachigen Dokuments ist mit Quellenangabe versehen. Die deutsche Version dieser Erklärung ist bindend.

I hereby declare that I have cited all sources used. All adopted lines of text, entire passages, tables and images are given with their source, regardless of whether the source is a book or a publication on the Internet. Direct translations of documents in other languages are also referenced. The German version of this statutory declaration is authoritative.

Prévessin, 12th of July 2014

Yves Johannes Wolfgang Fischer


Acknowledgements

First, I would like to thank my supervisor at CERN, Felix Ehm, for his support and helpful guidance. His advice, expertise and understanding added considerably to my graduate experience.

I would like to express my gratitude to Professor Thomas Fuchß, who suggested many valuable points to include and gave me advice whenever it was needed.

I would also like to thank Stephen Page, who proofread my text and provided helpful comments and suggestions.


Abstract

Knowledge about the internal state of computational processes is essential for problem diagnostics as well as for continuous monitoring and the early recognition of impending failures. The CMX library provides monitoring capabilities for C and C++ applications similar to those of the Java Management Extensions (JMX).

This thesis provides a detailed analysis of the requirements for monitoring and diagnostics of C/C++ processes at CERN.

The developed CMX library enables real-time C/C++ processes to expose values without harming their normal execution. CMX is portable and can be integrated into different monitoring architectures.


Contents

1 Introduction
1.1 Motivation
1.2 Overview of CERN
1.3 Structure of this Thesis

2 Monitoring of C/C++ Systems
2.1 Technical Environment
2.2 Motivation
2.3 Related Work

3 Requirements
3.1 Terms
3.2 Functional Requirements
3.3 Technical Requirements

4 Existing Technologies and Solutions
4.1 Monitoring Systems
4.2 Logging Systems
4.3 Interprocess Communications
4.3.1 Possibilities
4.3.2 Evaluation
4.4 Existing Software Solutions
4.5 Conclusions

5 Design of CMX Protocol and Data Structures
5.1 Design of CMX Data Structures
5.2 Shared Memory
5.3 Design of CMX Protocol
5.3.1 Real-Time Constraints
5.3.2 Concurrent Access to Shared Memory
5.3.3 Verification
5.4 Comparison with Similar Algorithms
5.5 Verification with Models
5.5.1 Simple Example of a Promela Model
5.5.2 Model of Two Writers
5.5.3 Model of Concurrent Reader/Writer
5.6 Conclusions

6 Implementation of CMX
6.1 Platform and Toolchain
6.1.1 Compiler
6.1.2 Atomicity of Operations
6.1.3 Processor Memory Consistency
6.1.4 Processor Cache Coherency
6.1.5 Choosing a Suitable Timesource
6.2 Implementation Overview
6.2.1 The Implementation in C
6.2.2 The C++ API
6.2.3 Independent Usage of CMX
6.2.4 Real-Time Compatibility
6.2.5 Automated Testing
6.2.6 Performance Analysis
6.2.7 Possible Extensions
6.3 Conclusions

7 Integration in CERN Infrastructure
7.1 A Remote Agent for CMX
7.1.1 Diagnostic Access in the DIAMON GUI
7.1.2 Monitoring of CMX Enabled Applications in DIAMON
7.2 Interaction of CMX with Build Tools
7.3 Conclusions

8 Summary

Literature

Glossary

List of Definitions and Requirements

List of Figures

List of Tables


1 Introduction

High system availability is essential for successfully operating a large industrial facility. For this reason, it is important to identify sources of errors and potential problems as early as possible. In the field of computing, system and application monitoring fulfills this task.

This work describes the implementation of application monitoring and diagnostic tools that are suitable for real-time applications, such as those used in CERN's accelerator control system.

1.1 Motivation

Large installations like particle accelerators or industrial sites are expensive to build and to operate. The cost of building the LHC accelerator was about 6 billion CHF. The experiments, which depend on the correct functioning of the accelerator, are funded independently; the material costs for the ATLAS experiment were 540 million CHF [1, p. 17].

This investment only pays off while everything is working correctly and, in the case of the LHC, collisions can be delivered to the experiments. Proper operation of the accelerator depends on the reliability of many smaller and larger hardware and software components.

The BE-CO group, where this work was carried out, is responsible for a large part of the accelerator controls software. Naturally, our primary goal is to provide reliable, fault-tolerant software and, in case of unforeseen events, the shortest possible response times.

Monitoring plays a critical role in the early recognition of possible error conditions and the fast identification of problem sources. The monitoring system constantly watches about 2,000 machines and applies many rules to detect problems.


Monitoring is always limited to what developers consider worth monitoring. Hence, enabling developers to expose metrics easily from within their applications in a suitable and standardized way is a key factor for success.

We did not find any existing solution for C/C++ applications that fulfills our requirements to a large extent and is, at the same time, compatible with the existing monitoring and diagnostic system. This was the initial reason to develop a new monitoring and diagnostics library for C/C++, called CMX, at CERN.

1.2 Overview of CERN

This project is carried out at CERN in Geneva, where physicists and engineers research the fundamental structure of the universe. Founded in 1954, the CERN laboratory was one of Europe's first joint ventures and now has 21 member states.

Figure 1.1: CERN Accelerator Complex [2]

Today CERN hosts many particle physics experiments. The largest and best known are the LHC particle accelerator and the ATLAS and CMS detectors, famous for the discovery of the Higgs boson.

(a) “The observed probability (local p-value) that the background-only hypothesis would yield the same or more events as are seen in the CMS data, as a function of the SM Higgs boson mass for the five channels considered. The solid black line shows the combined local p-value for all channels.”

(b) “Event recorded with the CMS detector in 2012 at a proton-proton centre-of-mass energy of 8 TeV. The event shows characteristics expected from the decay of the SM Higgs boson to a pair of photons (dashed yellow lines and green towers). The event could also be due to known standard model background processes”

Figure 1.2: Pictures related to the discovery of the Higgs Boson, CMS Collaboration [3]

A bunch of particles that collides in one of the LHC detectors has passed through a cascade of increasingly powerful accelerators (Fig. 1.1) to reach a speed of 0.999 999 991 times the speed of light. At the time of collision, every proton has reached a top energy of 3.5 TeV. Compared to everyday energies this is still little, but in the LHC it is concentrated in space like nowhere else. The two beams, circulating in opposite directions, together contain 2 · 3.5 TeV · (1.1 · 10^11 particles) · (2808 bunches) ≃ 350 MJ, which is about as much energy as a 400 t train, such as the French TGV, travelling at 150 km/h [1].
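The 350 MJ figure above can be checked with a short calculation, using 1 eV ≈ 1.602 · 10^-19 J and 150 km/h ≈ 41.67 m/s; the factor 2 accounts for the two counter-rotating beams:

```latex
\begin{aligned}
E_\text{beams} &= 2 \cdot 3.5\,\text{TeV} \cdot \left(1.1 \cdot 10^{11}\right) \cdot 2808 \\
               &\approx 2 \cdot \left(5.607 \cdot 10^{-7}\,\text{J}\right) \cdot \left(3.089 \cdot 10^{14}\right)
                \approx 3.46 \cdot 10^{8}\,\text{J} \approx 350\,\text{MJ} \\
E_\text{train} &= \tfrac{1}{2}\, m v^{2}
                = \tfrac{1}{2} \cdot \left(4 \cdot 10^{5}\,\text{kg}\right) \cdot \left(41.67\,\text{m/s}\right)^{2}
                \approx 3.47 \cdot 10^{8}\,\text{J} \approx 350\,\text{MJ}
\end{aligned}
```

Each beam alone therefore stores roughly 173 MJ.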

Colliding inside one of the four detectors, the protons or lead ions produce subatomic particles. Particle detectors use different devices to identify these particles. For instance, their paths in a magnetic field are measured and tracked, and calorimeters finally stop some of the particles and measure their energy. These and additional measurement techniques allow physicists to identify events in which unusual particles appear that may fit into established or new theories.

Fig. 1.2 shows two graphics taken from the paper about “a new boson at a mass of 125 GeV with the CMS experiment at the LHC” [3] by the CMS Collaboration. The plot on the left shows the local p-value, i.e. the probability that the background-only hypothesis would yield at least the observed number of events, as a function of the Higgs boson mass. The illustration on the right is a visualization of a particle collision with the characteristics of a Higgs boson decay.


1.3 Structure of this Thesis

The following chapter 2 is about monitoring in general and the technical environment at CERN. It defines the usage scenario for a C/C++ monitoring solution.

Chapter 3 defines requirements for monitoring and diagnostics in C/C++.

Chapter 4 discusses basic technology decisions and existing software solutions.

Chapter 5 describes the technical design of the proposed solution in detail.

Chapter 6 focuses on important points of the implementation process.

Chapter 7 describes how the existing monitoring and diagnostics system is extended with the C/C++ monitoring capabilities provided by CMX.

The final chapter 8 gives an overall summary. More specific conclusions can be found at the end of chapters 4, 5, 6 and 7.


2 Monitoring of C/C++ Systems

CERN's accelerator control system is essential for operating the accelerators (Fig. 1.1); hence its availability, performance and correct functioning are critical. In the Beams Department (BE) of CERN, the Controls Group (BE-CO) [4] is responsible for the specification, design, procurement, integration, installation, commissioning and operation of the controls infrastructure for all CERN accelerators, their transfer lines and the experimental areas.

The Controls Group provides general services such as the front-end software framework (FESA), general machine and beam-synchronous timing generation and distribution, a signal observation system, communication middleware, surveillance and monitoring (DIAMON), alarms, general logging facilities and data management.

[Figure: layered diagram with the layers Presentation, Business and Device. Labels include operator consoles, status displays, monitoring, logging, messaging, databases, configuration, protection, access control, magnets and cooling; the layers are connected through Controls Middleware (CORBA/zmq) and JMS / Java RMI / JDBC / HTTP.]

Figure 2.1: CERN Accelerator Control System


2.1 Technical Environment

In terms of software, the accelerator control system comprises approximately 3,500 applications written in Java, C and C++. It is composed of three layers (Fig. 2.1):

• Presentation layer: hosts the graphical control interfaces, status displays and operator consoles.

• Business layer: general services, including logging, monitoring, messaging and configuration management.

• Device layer: time-critical control software, mostly written using the FESA framework.

While the Presentation and Business layer applications mostly run on Java, the time-critical applications in the Device layer are written in C or C++.

In terms of numbers, most C++ applications are written using the FESA framework. FESA provides an object-oriented description of equipment with standardized basic functionality, such as real-time event handling, a standardized interface to device properties, logging, testing and simulation, as well as the necessary tooling for code generation and integration into the Eclipse IDE.

Host infrastructure The C/C++ software runs mostly on Intel x86 and x86-64 based hardware, but some also runs on very old PowerPC based hardware with the LynxOS operating system. Internal support for the PowerPC based hardware is slowly being abandoned and replaced by Intel x86-64 systems. This process will not be finished before the next long shutdown (LS2, around 2018). The newer 32- and 64-bit Intel systems run Scientific Linux CERN (SLC) 4 (32-bit), 5 (32-bit) and 6 (64-bit).

It is unlikely that software currently running on PowerPC will be enhanced and recompiled only to integrate new minor features without moving to the newer systems at the same time. Supporting PowerPC requires that newly written software compiles cleanly with the old gcc 2.95 from the LynxOS toolchain and, at the same time, with gcc 4.1 (SLC5) and gcc 4.4 (SLC6).

The C++ Toolbox In terms of application monitoring and diagnostic remote access, the C/C++ landscape is a rather fragmented, complicated and incomplete area compared to the Java platform.


The current operational C++ Toolbox consists of:

• Diagnostics: Tools specific to individual programming frameworks and tasks

• Debugging: Trigger core dump and/or attaching gdb

• Post-mortem: Analyze core dump with gdb

• Info/Warnings: Centralized logging

• Configuration Management: Centralized framework for tracing messages

In many cases, logging is reduced to a minimum to avoid performance degradation that would affect the real-time constraints.

For monitoring process health, there are currently only simple process existence checks and regular probing of functionality from the outside (e.g. application-specific tests that the process does what it is supposed to do), as well as manually triggering a core dump if a problem is suspected.

There is currently neither a standardized nor an easy way to monitor a specific value from inside a C/C++ process. However, as in Java, there is a need for such monitoring in C++.
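The simple process existence check mentioned above can be sketched in a few lines of POSIX C++. The helper name `process_exists` is hypothetical and not part of the CERN tooling; the sketch assumes a Linux or other POSIX host. It also shows the limitation of such checks: a process that is alive but stuck, for example because an internal queue has filled up, still passes.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical existence check: signal 0 makes kill(2) perform its
// existence and permission checks without delivering any signal.
bool process_exists(pid_t pid) {
    assert(pid > 0);  // pid <= 0 would address process groups instead
    if (kill(pid, 0) == 0) {
        return true;  // process exists and we are allowed to signal it
    }
    return errno == EPERM;  // process exists but belongs to another user
}
```

The check says nothing about whether the process is still doing useful work, which is exactly the gap that exposing internal values is meant to close.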

2.2 Motivation

We claim that monitoring is even more critical for C/C++ than for Java, since C and C++ are commonly considered low-level programming languages. Applications written in low-level languages are generally more error-prone, more difficult to verify automatically and have a more complex build process; hence they need more attention than similar Java applications during the development and testing phases.

However, not all problems can be identified before testing is finished; some are only discovered very late, during production. In all these stages, no appropriate, simple-to-use monitoring and diagnosis tools are available to developers.

Experience from real-world operation has revealed a serious problem, for example: software can be updated during working hours and may introduce faulty code. Although it may work initially, it eventually stops at some later point, e.g. because an internal message queue has filled up or a counter has overflowed.
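The counter case can be quantified with a small sketch. It assumes a hypothetical 32-bit unsigned event counter; the helper names are illustrative only. Unsigned arithmetic in C and C++ is defined to wrap modulo 2^32, so such a counter silently restarts at 0 instead of crashing, and any code relying on it growing monotonically breaks at that moment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit event counter: after 4294967295 the next value is 0.
std::uint32_t next_count(std::uint32_t count) {
    return count + 1u;  // unsigned arithmetic wraps modulo 2^32
}

// Days until such a counter wraps when incremented at a constant rate.
double days_until_wrap(double events_per_second) {
    assert(events_per_second > 0.0);
    const double range = 4294967296.0;           // 2^32 distinct values
    return range / events_per_second / 86400.0;  // 86400 seconds per day
}
```

At an assumed rate of 1,000 events per second, `days_until_wrap(1000.0)` is about 49.7 days: long enough for the software to pass every test and still fail weeks after a deployment. Exposing the counter as a metric would let a monitoring rule warn well before the wrap.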


The original author has no chance to identify this issue until the program breaks during the night. No monitoring system has a clue about the approaching problem, and the person operationally responsible for the service is not automatically alerted.

The consequence: machine operators have to identify the source of the problem themselves, then experts have to be called in, and resolving the situation takes much longer than during the day. The experts may have very little information about the C++ programs and often no possibility to inspect a running instance of the program remotely. Crashed programs may, depending on their settings, write a core dump of their last state, which can be analyzed. In any case, historical data of internal metrics, which would help to find the source of the problem, is not always available.

Next steps Improving this situation for the CERN BE-CO environment is constrained in several ways. In the following chapters we define the targets and requirements for a possible solution. We look at existing ideas and solutions and evaluate their suitability, also with respect to environmental constraints and especially their compatibility with real-time controls applications.

2.3 Related Work

The following chapters contain references to related work; this section provides an overview of it. A discussion of existing software products follows in chapter 4.

The efforts for better C/C++ monitoring at CERN will enhance the capabilities of the larger controls monitoring framework DIAMON [5] (Diagnostics and Monitoring). Just as existing modules do for Java services, DIAMON's capabilities are to be extended to services written in C/C++.

The closest relative to this work is the CERN paper “CMX – A Generic Solution to Explore Monitoring Metrics” [6] for the 14th International Conference on Accelerator & Large Experimental Physics Control Systems (ICALEPCS), published in autumn 2013. The corresponding source files, paper and poster can be obtained from the CMX website.

The paper describes the ideas for C/C++ monitoring and the planned integration into the existing controls monitoring framework. At that time a prototype of CMX existed, but this version suffered from several issues, including memory leaks, locking errors, blocking between readers and writers, and a flawed object-oriented design of the C++ part.

For the implementation, we refer mainly to the following literature:

• “Is Parallel Programming Hard, And, If So, What Can You Do About It?” [7] by McKenney explains parallel programming in general, atomic operations and CPU memory barriers. The sequence locks described there in section 8.2 are equivalent to the locking mechanism used in CMX.

• “Timecounters: Efficient and precise timekeeping in SMP kernels.” [8] by Kamp describes a specific use case in the FreeBSD operating system. The algorithm can be seen as a mix of keeping multiple copies of a value and a sequence lock.

• “Effective synchronization on Linux/NUMA systems” [9] by Lameter describes sequence locks as they are implemented in the Linux kernel.
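To make the reference to sequence locks concrete, here is a minimal single-writer sketch in C++11. It follows the reader/writer pattern McKenney describes and the C++11 formulation of it by Hans Boehm, but it is an illustration of the technique, not the CMX implementation: the class name and the value/timestamp payload are assumptions, and C++11 atomics were not available with the gcc 2.95/4.1/4.4 toolchains described in section 2.1.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical single-writer sequence lock protecting a (value, timestamp) pair.
class SeqLockedSample {
public:
    SeqLockedSample() : seq_(0), value_(0), timestamp_(0) {}

    // Writer (e.g. the real-time thread): constant cost, never waits for readers.
    // Only one thread may ever call write().
    void write(std::int64_t value, std::uint64_t timestamp) {
        const std::uint32_t s = seq_.load(std::memory_order_relaxed);
        assert((s & 1u) == 0u);                               // no other writer active
        seq_.store(s + 1, std::memory_order_relaxed);         // odd: update in progress
        std::atomic_thread_fence(std::memory_order_release);  // data stores stay after the odd mark
        value_.store(value, std::memory_order_relaxed);
        timestamp_.store(timestamp, std::memory_order_relaxed);
        seq_.store(s + 2, std::memory_order_release);         // even: snapshot consistent
    }

    // Reader: returns false if a write overlapped, so the caller may retry.
    bool try_read(std::int64_t& value, std::uint64_t& timestamp) const {
        const std::uint32_t s1 = seq_.load(std::memory_order_acquire);
        if (s1 & 1u) {
            return false;  // writer is in the middle of an update
        }
        value = value_.load(std::memory_order_relaxed);
        timestamp = timestamp_.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);  // data loads stay before the re-check
        return seq_.load(std::memory_order_relaxed) == s1;
    }

    // Convenience reader that retries until it obtains a consistent snapshot.
    void read(std::int64_t& value, std::uint64_t& timestamp) const {
        while (!try_read(value, timestamp)) {
        }
    }

private:
    std::atomic<std::uint32_t> seq_;
    std::atomic<std::int64_t> value_;
    std::atomic<std::uint64_t> timestamp_;
};
```

The writer only bumps a counter and stores the data, so its cost is constant and it never waits; readers pay instead by retrying. This asymmetry is what makes the scheme attractive when the writer is a real-time thread.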


3 Requirements

This chapter defines what we expect from a C/C++ monitoring and diagnostics solution. It provides a formalized overview of the requirements for this project. All requirements and definitions are formatted as follows:

TYPE N.N (short name) … short description …
Followed by a detailed explanation … ⋄

3.1 Terms

In the following, terms are described which are used throughout this chapter.

TERM 1 (Roles). We use roles to characterize different kinds of user groups. A user can have any number of roles. The following are important in our scenarios: Developer, Operator, Monitoring and Expert.

• A Developer creates applications and may use libraries from other developers. Incase of unforeseen incidents, the developer may be involved in resolving errorsduring operation.

• An Operator takes care of machine operation. Operators can call specific expertsif additional support is needed.

• A Monitoring system reads values periodically to provide a historical view andtrigger alarms.

• An Expert is acquainted with a broad range of systems and can be a developer atthe same time.

TERM 2 (Metric). A metric describes a measure of a property of the system that is being monitored.


A metric is similar to a Key Performance Indicator (KPI): every KPI is a metric, but not every metric is automatically a good candidate for a KPI.

Examples of possible metrics are: “requests processed per second”, “number of connected clients”, “amount of memory in use”, “round-trip time to external peripheral”, “uptime”, “number of CRC errors”.

Metrics can be exported from running applications; the following applies:

• One application has zero to many metrics.

• A machine can execute multiple applications; it can also execute the same application multiple times.

The term 'metric' can be used more precisely as follows:

• Metric: App.no_connections/int, defined by developers

• Metric instance: App<pid>.no_connections/int, created at program runtime

• Metric value: App<pid>.no_connections=10, updated during runtime

• Metric aggregate: App<1>.no_connections + App<2>.no_connections + App<3>.no_connections = 120, calculated in an external system

• Metric alarm: App<pid>.no_connections == 0, calculated in an external system

Developers define metrics during the development of their applications. Their names do not have to be compile-time constants. The metrics are registered at program startup. Their values can be static (determined at compile time) or dynamic (updated at run-time).

Using these values, an independent system (e.g. the monitoring system) can aggregate metrics or apply rules to trigger alarms.
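The relationship between metric, instance, value, aggregate and alarm can be sketched as a small data model. All type and function names below are hypothetical and do not reflect the CMX data layout; for brevity only 64-bit integer values are modelled, and the aggregate and alarm functions play the role of the external system.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class MetricType { Int64, Float64, Bool };

struct Metric {          // defined by developers, e.g. App.no_connections/int
    std::string name;    // "no_connections"
    MetricType type;
};

struct MetricInstance {  // created at program runtime, e.g. App<pid>.no_connections
    std::string app;     // application name, e.g. "App"
    int pid;             // process owning this instance
    Metric metric;
    std::int64_t value;  // metric value, updated during runtime (Int64 only here)
};

// Metric aggregate, calculated externally: sum over all instances of the
// same application and metric name.
std::int64_t aggregate_sum(const std::vector<MetricInstance>& instances,
                           const std::string& app, const std::string& name) {
    std::int64_t sum = 0;
    for (const MetricInstance& inst : instances) {
        if (inst.app == app && inst.metric.name == name) {
            sum += inst.value;
        }
    }
    return sum;
}

// Metric alarm, also calculated externally: the rule "no_connections == 0".
bool alarm_no_connections(const MetricInstance& inst) {
    assert(inst.metric.type == MetricType::Int64);
    return inst.value == 0;
}
```

With three hypothetical instances holding 40, 50 and 30 connections, `aggregate_sum(instances, "App", "no_connections")` yields the 120 from the example above, and `alarm_no_connections` only fires for an instance whose value is 0.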

TERM 3 (Real-Time). “A real-time computer system is a computer system where the correctness of the system behavior depends not only on the logical results of the computations, but also on the physical time when these results are produced. By system behavior we mean the sequence of outputs in time of a system.” [10, p. 2]

Real-time is often divided into soft and hard real-time. An example of soft real-time is a live 3D visualization that runs at about 30 fps to look smooth; occasional frame skips can be tolerated and will not be noticed by the human eye. In contrast, a hard real-time system might control an industrial robot. If it fails to send a STOP signal to the robot at exactly the right time, things might be damaged permanently.

If we see a real-time system as a black box that receives inputs and reacts by emitting outputs, then our goal is that this system keeps fulfilling its response-time constraints. In fact, our main interest is to avoid harming the real-time properties of existing systems when adding the monitoring functionality to their execution path.

Real-time is not a strict term. In this work we do not make detailed Worst Case Execution Time (WCET) calculations. That means we do not look at influences such as caches and peripherals, and we do not calculate actual execution times based on scheduling priorities. Both are heavily dependent on the execution platform's hardware and software. Instead, we analyse the complexity and added costs of algorithms and functions.

3.2 Functional Requirements

[Figure: use-case diagram with the actors Developer, Operator and Monitoring system and the use cases Development cycle, Add metrics using CMX values, Error during operation, Monitor real-time application and Monitor exposed values regularly, linked by “includes” relations; CMX and JMX appear as systems.]

Figure 3.1: C++ Monitoring and Diagnostics: Users and their use-cases

The functional requirements are described from a user's point of view. Fig. 3.1 shows the use cases of the intended users.

FUNC 1 (Monitoring). The monitoring system must be able to collect internal processmetrics from C/C++ processes.


The DIAMON monitoring system is able to monitor different sources of information through customized data acquisition agents. Support for monitoring C/C++ metrics shall be provided to the same extent as it is currently supported for Java processes through JMX.

It is not required to provide event-processing grade precision, where every status change in the monitored application is evaluated by the monitoring system.

FUNC 2 (Development). The system must provide tools for developers to allow insight into software test-runs during development.

C/C++ monitoring needs to be accessible with the same comfort as is currently possible for Java/JMX using JConsole or the embedded JMX GUI in the CERN-specific monitoring GUI.

In addition, because C/C++ is often used in a very low-level, system-oriented environment where using Java GUIs can be uncomfortable, there should be powerful command-line tools to access values exposed by applications.

Developers concentrate on the main functionality of their applications. Monitoring and diagnosis functionality is often only added where needed. To encourage developers to add metrics, doing so must be as easy as possible.

FUNC 3 (Diagnosis). Operators and developers must be able to explore all exposed values for error diagnosis.

The differences between monitoring and diagnosis require that the user can decide to re-acquire metric values at any time, meaning the user is not bound to the regular refresh interval of the monitoring system.

For diagnosis, the user must be able to access every exposed metric, not only those configured to be surveyed by the monitoring system.

3.3 Technical Requirements

TECH 1 (Real-Time). The system must never interfere with the main program through blocking execution or non-deterministic duration of function calls.

The real-time processes must not be disturbed when the monitoring system requests a regular update of a metric (monitoring aspect) or when a user asks for the current value of a metric (diagnostic aspect).

The overhead for the real-time thread to update a value needs to be very low, and the update must not block the process or obstruct its execution in any other way. ⋄
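For a single numeric metric, the update cost demanded by TECH 1 can be met with one atomic instruction. The following C++11 sketch uses a hypothetical class name and assumes that `std::atomic<std::int64_t>` is lock-free on the target, which holds on x86-64 but should be checked with `is_lock_free()` on 32-bit platforms; C++11 atomics were also not available with the gcc 2.95/4.1/4.4 toolchains described in section 2.1.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical monotonic counter updated from a real-time thread. Each update
// is one atomic read-modify-write: constant cost, no lock, no system call and
// no memory allocation, so a concurrent reader can never block the writer.
class RealtimeCounter {
public:
    RealtimeCounter() : value_(0) {}

    void add(std::int64_t n) {
        assert(n >= 0);  // counters only grow; compiled out with NDEBUG
        value_.fetch_add(n, std::memory_order_relaxed);
    }

    // Reader side (monitoring or diagnostic access): a plain atomic load.
    std::int64_t get() const {
        return value_.load(std::memory_order_relaxed);
    }

    // Whether the platform implements this atomic without a hidden lock.
    bool lock_free() const {
        return value_.is_lock_free();
    }

private:
    std::atomic<std::int64_t> value_;
};
```

A single atomic is only sufficient when each metric value is independent. When several fields, for example a value and its timestamp, must be read consistently, a scheme such as the sequence locks referenced in section 2.3 is needed.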

TECH 2 (Integration). The C/C++ monitoring system must integrate well into existing infrastructures, such as the build process and the data acquisition of the monitoring system.

The system shall be able to use the compile-time information which is already generated in the build process and make it available at run-time. This includes, e.g., the username of the user who compiled a product or the exact release timestamp.

Information exposed at run-time must be accessible to the existing accelerator controls monitoring system. ⋄

TECH 3 (Reusability). The C/C++ monitoring system must be usable in all kinds of C/C++ applications.

C/C++ projects in CERN BE-CO are not standardized. There are several projects from different teams that differ slightly from each other. A monitoring system must not be specific to special use cases but must be applicable to all kinds of applications. ⋄

TECH 4 (Portability). The system needs to be separated into CERN-specific and generic modules. The core libraries and tools must be portable to other similar environments.

It is also preferable that the process is not directly tied to the monitoring system in use at CERN, but provides only a clean interface from which many tools can profit. ⋄

TECH 5 (Easy to use). Exposing run-time information must be an easy programming task.

The developer must be bothered as little as possible with implementation details, and the implementation must be as unintrusive as possible.

The goal of a library for exposing monitoring and diagnostics information is to hide all implementation details behind a simple API. ⋄

TECH 6 (Datatypes). Support the most common datatypes for metric values.

The following common datatypes must be supported:

• signed integer numbers of 64-bit

• signed floating point numbers of 64-bit

• boolean values

• character strings of a maximal length specified by the developer

The values of the different datatypes must be stored without applying any lossy type conversion. The value returned by read calls must be bit-wise equal to the value stored before.

4 Existing Technologies and Solutions

This chapter highlights existing possible solutions and points out their characteristics.

For enabling processes to expose internal values, there has to be some kind of inter-process communication (IPC) technique. In this chapter we discuss possibilities for inter-process communication.

IPC in general is defined as “different ways of message passing between different processes that are running on some operating system” [11]. In fact, the processes can also run in different places connected via a network.

In this chapter we evaluate IPC mechanisms regarding their suitability to connect real-time processes to monitoring facilities without coupling their execution behavior. This investigation contains an overview of existing technologies and assesses them against the requirements from the previous chapter.

The monitoring patterns and criteria for IPC mechanisms are then applied in a review of existing monitoring software. In this comparison, some software that is popular and powerful in general turns out to be inapplicable or does not provide solutions for the use case of monitoring real-time applications.

Before doing a detailed analysis of IPC mechanisms, we will next look at general properties of a monitoring system and compare it to a logging system.

4.1 Monitoring Systems

A general architecture of a monitoring system is shown in Fig. 4.1. In the following, we will describe the components shown in this figure and then focus on the connection between the application and the connector/agent of the monitoring system.

[Figure: Applications (Source) feed the Monitoring System. The left application is polled in intervals by an acquisition agent. The right application pushes values to a connector. A rule engine (Processor) evaluates metric1 < 14 AND metric2 > 500 THEN status=1 ELSE status=2. Sinks are alarms, a status display, and a history (e.g. status = 2 for time: 1 min).]

Figure 4.1: General monitoring system architecture

Components The figure shows two example applications under monitoring. The left one is monitored through a server-side agent that regularly polls values. It could be a JMX-enabled Java application whose JMX attributes are read in specified intervals.

Next to it, on the right side, an application actively pushes information to the monitoring system. The application sends out values as events, which are then processed by the rules of the monitoring system.

Each application contributes a metric (named metric1 and metric2). In this example, each metric is evaluated with a rule that yields a boolean result. These results are combined with a logical AND and translated into an integer status code.

The result of the evaluation is then used by any kind of status listener. Typical candidates are systems for recording the history of values, status displays for operators, alarm systems, and notification systems sending messages via e-mail and SMS.

Temporal behavior The entire stack shown in Fig. 4.1 is driven by events, which are propagated from bottom to top (from source over the event processor to sink).

If the right application changes a value, it sends out an event. Therefore, all changes in the application reach the monitoring system. There, the update event invalidates all dependent results and triggers a recalculation.

The application on the left does not itself decide the point in time at which it sends metric updates. Instead, the values are acquired regularly in fixed intervals determined by the acquisition agent. The acquisition agent then sends the values to the monitoring system, where they trigger a recalculation if they have changed.

Real-time applications If we imagine that the application to monitor is a real-time controls application, then both scenarios are suboptimal for the following reasons.

A real-time application can operate at high frequency. If it communicates every state change to the monitoring system, all results that depend on this change have to be recalculated every time. This can create a considerable amount of work for the monitoring system, depending on the complexity of the rules.

Application metrics can be updated at a very high frequency. The resulting network traffic will therefore be highly non-deterministic and can easily overload the network. A fluctuating load on the kernel's network stack is also prone to influence the system behavior. Even when using stateless protocols (possibly UDP), the in-kernel work connected with writing to sockets is not tolerable inside the real-time application thread.

Regarding the regular querying of an application through an acquisition agent (Fig. 4.1, left), it is not acceptable to allow an external entity to communicate with a real-time application directly. External requests, especially if they require session state handling and command parsing, may harm the deterministic execution as well.

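[Figure: as in Fig. 4.1, but each application is connected through IPC 2 to a local agent. The left agent is polled in intervals by the acquisition agent. The right agent pushes values to the connector. The connection to the monitoring system is IPC 1.]

Figure 4.2: Data acquisition for monitoring systems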

For these reasons, the acquisition has to be decoupled from the normal program flow. Fig. 4.2 extends the previous Fig. 4.1 with this aspect.

Fig. 4.2 shows two IPC connections. The first (IPC 1) is the same as before: probably a network connection that sends values to the monitoring system.

The second (“IPC 2”) is a new, additional IPC connection between the real-time controls program and a lower-priority agent. This agent is either queried by the monitoring system's acquisition agent (Fig. 4.2, left) or communicates directly with the monitoring system and pushes values (right).

“IPC 2” needs to be non-blocking from the side of the real-time application. The overhead must also be as low as possible. The IPC mechanisms available on our target systems are discussed in section 4.3.

4.2 Logging Systems

From a high-level perspective, the task of system or application monitoring is similar to system or application logging. Both involve a producer of information (system/application) and a consumer (logging/monitoring system).

Most enterprise landscapes have a centralized logging system. At CERN, the accelerator control system uses a low-footprint logging library [12]. The C++ variant of this service sends messages as UDP packets to a central endpoint, which injects them into the logging system. From there, user applications can subscribe to messages or inspect the history.

This means that a system already exists that allows pushing information to a central instance. The question arises whether it can be used to transmit C/C++ monitoring information.

Generally the use-cases for logging and monitoring are not the same:

• Logging provides access to the latest log message issued by the process in near real-time, but the point in time at which the next message will be issued or received cannot easily be determined.

• Monitoring wants to keep track of values over time. Usually, “snapshots” of values are created in predefined intervals.

• Diagnostic access is entirely controlled by the user.

For monitoring systems, evaluating logging messages can be one source of information. For example, simply the number of messages per second can be a useful metric for classifying the health state of a service. More specific information requires knowledge of the data format used by the applications.

However, given the characteristics of monitoring real-time applications described in the previous section, the technical implementation of logging is incompatible with monitoring. As shown in Table 4.1, logging is traditionally push-based. Push means that the log producer sends its messages/events to the log consumer at its own will. Therefore, there is no separation like the one named “IPC 2”.

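|            | Temporal behavior                     | Data format                 |
|------------|---------------------------------------|-----------------------------|
| Logging    | determined by producer (the program)  | mostly unstructured         |
| Monitoring | regularly / by producer or consumer   | structured and standardized |
| Diagnostic | determined by user                    | structured but flexible     |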

Table 4.1: Comparison of Logging, Monitoring and Diagnostics

Table 4.1 summarizes the differences between logging, monitoring and diagnostic systems. These differences lead to the following arguments against using the logging infrastructure to monitor real-time applications:

• the number of variables may grow so high that pushing the value changes becomes inefficient.

• scaling: logging might work for a small number of hosts. Scaled up to around 2000 hosts, each constantly logging at least 200 messages per second, it would certainly overload the current logging system.

• the update frequency is very limited and entirely determined by the process. One probably does not want to emit logging messages at a higher rate than about 1 per second. This would again require rate-limiting or non-deterministic logic on the client side.

• real-time software does not tolerate interruptions. Therefore, logging over the network is not possible without a non-blocking queuing mechanism that delivers the monitoring data to non-real-time threads.

• real-time software often turns logging off because it threatens its deterministic behaviour

• all values have to be sent over the network every time, even if the monitoring system throws some of them away or does not actually monitor a value. The client side does not necessarily know which information the server is interested in.

Conclusion The existing logging system is not suitable for monitoring and diagnostics. The identified differences in communication make an effective implementation of monitoring and diagnostics on top of logging facilities unfeasible for the requested environment.

4.3 Interprocess Communications

Current operating systems provide different kinds of IPC mechanisms, sometimes with different implementations of the same principle. This section provides an overview of the advantages and disadvantages of different IPC solutions.

Because of the huge variety of available IPC mechanisms, we concentrate on the most applicable ones in the following.

[Figure: taxonomy tree. communication → data transfer (message; byte stream, e.g. FIFO/pipe/stream socket; pseudoterminal) and shared memory (e.g. anonymous mapping, SysV shared memory). synchronization → semaphore, file lock (fcntl(), flock()). signal → standard/realtime signal. HTTP and MOM are added as applications.]

Figure 4.3: Taxonomy of UNIX IPC facilities (Figure is based on [13, p. 878])

To sort out the non-applicable IPC mechanisms, we use the “IPC Taxonomy” (Fig. 4.3) found in “The Linux Programming Interface: A Linux and UNIX System Programming Handbook” [13]. It shows different categories of IPC mechanisms, actual implementations and applications.

4.3.1 Possibilities

A monitoring communication protocol involves two parties: the writer(s) and the reader(s). The writer is an application exposing internal information. The reader is the part of the monitoring system that accesses this application's information. Most of the time there is probably only one writer, but there can be any number of readers.

The non-blocking requirement (TECH 1) implies that the protocol must be designed without states in which the writer has to wait for the reader. A request-response model is also unsuited, because we want to rule out any possibility that the reader disrupts the execution of the writer.

communication → data transfer A very common use case for data-transfer IPC in the “byte stream” subclass is TCP sockets. A solution designed to use stream sockets can easily be adapted to fast communication inside one host using UNIX domain sockets, as well as to network-transparent communication with TCP. An authentication, encryption and compression layer can be added, which makes the solution scalable for internet-wide usage.

Writing data to sockets requires a system call, which is expensive in terms of execution time. It may also block the execution of the current thread somewhere in the kernel.

In data transfer, a message between the writer (publishing a metric's value) and the reader is destroyed after reading. As a consequence, the reader always has to listen for values sent by the publishing applications.

In real-time systems, we expect the reader to always have a lower priority than the writer. Hence, if the writer, for whatever reason, publishes metric values at full speed, this means a lot of work for the reader. If the reader cannot cope with this speed, the writer will be blocked, or some other overflow mitigation has to deal with the situation. The reader has to handle incoming updates all the time, even if the reading side is currently not interested in up-to-date values. Conversely, if the reader is unavailable for some reason, the writer needs to be able to deal with this situation as well.

A concept like the traditional client-server model, as implemented e.g. in HTTP (Hypertext Transfer Protocol) [14], is also unsuited. It requires connection handling and state tracking, so the rate and volume of requests will certainly influence the execution of the application. In this way, a low-priority reader process can affect high-priority real-time processes. This is a form of priority inversion and is undesirable.

The same applies to the connection to a MOM (Message-Oriented Middleware). While different communication patterns are possible here, they all suffer either from too much client-side work for handling subscriptions or from potentially generating unpredictable amounts of traffic for value updates.

communication → shared memory There are different flavors of shared memory. All of them allow any number of processes to access the same memory region by mapping it into the process's virtual address space.

Shared memory, in contrast to data transfer, is unsynchronized by design. Therefore, custom synchronization methods have to be implemented.

Reads from shared memory are non-destructive. The values stay the same until they are overwritten, and without additional effort the writing process does not need to care about readers.

In communication realized by data-transfer mechanisms, both sides agree on the same “protocol”, which can be designed to be backward-compatible. In data transfer, a protocol is event-oriented: typically, every message either contains a set of information or triggers a state change.

In shared memory, there is only one piece of shared information, which any involved party can modify at any time. This makes protocol design for shared memory much harder than for data-transfer/event-oriented communication.

The most important advantage of shared memory is that it is by design the fastest possible inter-process communication technique, because it involves no in-kernel queues or locks.

In current computer systems, there cannot be anything faster than communicating directly through memory. The essential communication instructions (read, write) are implemented in hardware, which is the most direct communication possible.

Memory access is direct, but its latency and throughput are still not constant. Like every other operation, it is influenced by various factors, including:

• Size of processor cache

• Number of processors in an SMP system

• Performance of the cache hierarchy, performance of the attached main memory

• Performance of the inter-processor bus, cache coherency protocol

• Current system workload, type of other concurrent applications

• Timing anomalies caused by out-of-order execution, prefetching, speculative execution, etc. [15]

signal/synchronization These facilities are not usable for communication because they do not transport any payload data. Signals and locks have to be used with great care, as they cause time delays and unpredictable changes of the program flow. They are listed here for completeness.

4.3.2 Evaluation

In general, all the mechanisms under data transfer are unsuited for the same reasons that logging solutions are unsuited for monitoring (see section 4.2).

While a message-driven architecture is probably the most flexible approach, it is hard to make a message-driven system real-time compatible. The overhead and blocking behavior of the message-oriented approaches could be addressed by separating the creation of a metric update from actually sending it. This can be implemented with a separate, low-priority sender thread. The real-time threads would pass their data to this monitoring thread through something like a size-limited queue, and would therefore be unaffected by the latency of the communication with the monitoring server.

This kind of decoupling between real-time and non-real-time code can be achieved more easily by using shared memory directly inside the client application's address space.

We consider letting the applications to be monitored write directly to an IPC shared memory region preferable to less direct communication solutions such as byte transfers over a network or pipes. This approach also reduces the amount of code and logic that needs to be included in every application. Furthermore, the reduced complexity makes it easier to give trustworthy guarantees about real-time suitability.

Due to the nature of shared memory, if one publishes whole data structures at once, the implementation becomes less private (no “encapsulation” in OO terms). As a result, it is more complicated to maintain a fully backward-compatible system.

By avoiding the in-kernel work that all other IPC mechanisms trigger, shared memory becomes very fast and has the lowest possible overhead. It allows us to make value updates nearly as cheap as normal arithmetic operations. SHM is therefore perfectly suited to fulfill the TECH 1 Real-Time requirement.

The integration into the centralized monitoring system is then done in a separate reader process, which can be scheduled with low priority. This process can be started, stopped, updated and extended independently. This separation also makes it easier to integrate with any other monitoring solution in the future (TECH 4 Portability).

4.4 Existing Software Solutions

This section covers a wide range of existing monitoring solutions and software components that might be useful in monitoring tasks.

JMX On the Java Platform, the Java Management Extensions (JMX) [16] are often used for remote monitoring of Java processes. JMX has an extensive set of features that covers more than just exposing run-time information.

Run-time information in JMX is organized in different kinds of MBeans. Basically, an MBean exposes values and enables remote triggering of operations.

Fig. 4.4 shows attributes of an MBean in the Java VisualVM tool. In this example, an application exports information about loaded JAR (Java Archive) files as MBeans. The MBeans have two attributes: URL and Properties.

The use of JMX is restricted to the Java Platform. However, it is possible to start a Java virtual machine embedded inside a C++ application solely for the purpose of JMX [17].

Figure 4.4: Java VisualVM showing JMX attributes

xymon The xymon [18] host-monitoring solution uses POSIX shared memory for inter-process communication (see lib/xymond_ipc.c). Judging from the source code, it appears that the xymon developers chose shared memory for ease of use rather than for performance reasons.

pcp As a full-featured monitoring suite, the “Performance Co-Pilot” (called pcp) [19] supports so-called memory-mapped values (pcp/src/libpcp_mmv). pcp is a complete framework for logging application performance metrics. For the environment at CERN, the pcp framework is too intrusive. We also did not find any statement in the documentation or code about how the memory-mapped values in pcp handle concurrent access.

other applications of shared memory We found shared memory in use in other applications that are less connected to monitoring. Nevertheless, we mention them and briefly describe how they use shared memory.

pvbrowser [20] is a SCADA application framework. It provides a System V shared-memory-backed value table supporting all common datatypes. The table is protected by a process-shared pthread mutex [21, pthread_mutex_init]. pvbrowser supports the Linux, Windows and VMS operating systems.

localmemcache [22] is a hash table implementation that uses POSIX SHM objects as its storage back-end. It is mainly intended to be used from programs written in Ruby and tries to emulate a Berkeley DB style access paradigm. The design looks very specific: it uses POSIX named semaphores for exclusive locking of the whole table.

The X11 window manager i3 [23] uses shared memory to provide a debug channel that persists across crashes and remains readable if the program hangs (see src/log.c). It does not lock the shared memory data structures. Instead, it provides a pointer to the latest log message and uses a pthreads condition variable [21, pthread_cond_init] to broadcast signals when the pointer has been updated after adding a new message.

4.5 Conclusions

This chapter presented monitoring systems in general and highlighted specific aspects of monitoring real-time applications. We identified that there has to be a decoupling layer between the monitoring system and the real-time application, which ensures that activity of the monitoring system cannot harm the application.

In the evaluation of different IPC mechanisms, we chose shared memory for the communication between the real-time application and a monitoring agent on the local machine.

A comparison with existing software solutions showed that there is currently no plug-in solution to this problem. However, some aspects of exposing values using shared memory are implemented elsewhere for different use cases.

Consequently, the next chapters describe the development of CMX, a new solution for monitoring real-time controls applications written in C and C++.

5 Design of CMX Protocol and Data Structures

This chapter describes the design of the data structures and the access protocol of the CMX library. The CMX library is intended to fulfill the requirements described in chapter 3.

In the previous chapter, we concluded that a monitoring solution that meets the requirements must use shared memory as its inter-process communication technique. The next section discusses the design of shared memory data structures and a reader-writer access protocol.

5.1 Design of CMX Data Structures

From the definition of a metric (TERM 2), the behavior of our selected IPC mechanism, and our system environment, we derived the model shown in Fig. 5.1 as the optimal representation of the data:

• A host can execute any number of applications at the same time

• An application can expose independent sets of metrics (Components: TERM 2).

• Some predefined metrics are exposed by default (easy to use: TECH 5).

• Metrics have different data types, depending on their content (types: TECH 6).

The model is a hierarchy written as Host->Process->Component->Metric.

The Host level is self-evident, since we do not want to communicate metrics over a network using the core CMX library. A Host is a computer system executing multiple Processes and is identified by a host name.

[Figure: an application process exposing two Components. Component TestMetrics: active_users=5/int64, last_sql_stmt="SELECT..."/string, items_processed=120123/int64. Process Component: start_time=1323123123/int64, hostname=ewe-123-fbcdev/string, process_name=TEST-ECW10/string.]

Figure 5.1: CMX Host - Process - Component model

The Processes shall be independent in the way they expose their metrics. This allows smooth upgrades when the library is improved and reduces the risk of interference between the processes.

The next separation level is the Component, which maps directly to a shared memory space where the metrics are organized. Most executables are built from many different libraries, some of them from third parties. To avoid the risk that one library interferes with others by filling up a shared Component, every library should register its own Components independently.

Additionally, the CMX monitoring library automatically registers a so-called Process Component for every application to expose some predefined metrics such as start time, host name or resource usage.

Metrics are placed into Components. Every metric has a name, a type and a value. The name is a limited character string. The value can have a fixed length (integer, float, boolean) or an arbitrary length (TECH 6).

Metrics can be addressed either by their index in the Component or by searching for their names. The search is a simple linear search and should be avoided if possible.

A Component starts empty and has to be filled with Metrics, which are initialized with a neutral value (zero: 0 or empty string: ""). The developer can then set and get the values of these metrics, referring to them by their index in the Component or, more slowly, by name.

Hash map access to Components is not planned, because a hash map is by design non-deterministic in time and a very dynamic data structure. The implementation effort would be quite high for static-sized shared memory segments and is not feasible for CMX.

By putting the metrics into Components, we also avoid name collisions if two instances of the same application register a metric with the same name.

A process can expose the same Component multiple times under different names, e.g. once for each of its client connections or storage subsystems. This approach is similar to JMX, in the sense that one can expose several instances/objects of the same type (MBean) [16].

Implementation of the Hierarchy The planned hierarchy for metrics stored on a computer is Host->Process->Component->Metric. Given that we chose POSIX SHM objects as our memory-backing technology, the first abstraction (Host) comes for free, provided by the separation through the operating system.

The next separation level (Process) is special, since POSIX SHM objects are not grouped by owner process. Instead, they are linked to the creator's user ID. Therefore, we write the PID of the owning process into the name of the SHM file, e.g. /dev/shm/cmx.2345.ComponentName, where 2345 is the operating system process ID (PID).

The Metrics stored in the Components are organized in “slots”. A slot is usually used for one value (integer, float, bool). For character strings, many slots can be chained together to support data of arbitrary length.

5.2 Shared Memory

This section discusses some aspects of using shared memory to implement CMX. It starts with a description of how different SHM implementations identify shared memory regions. Then it discusses the use of pointers inside shared memory, followed by some general thoughts about the design of shared memory data structures. Finally, we focus on the delayed mapping of allocated shared memory to physical pages.


Identification and Handles System V shared memory segments are identified by a handle, which is determined when the segments are created. This handle is derived from a numeric key specified by the application: from this key, the operating system derives a unique SysV ID, which acts as the handle. The numeric key itself is subject to possible identifier collisions between independent applications.

The derived ID is then used in calls to the control (shmctl) and attach/detach (shmat/shmdt) functions. Special System V command-line utilities are available to create, inspect and destroy shared memory segments.

It is also possible to skip the numeric key and ask the operating system directly to generate the unique ID. This ID can then be passed to other processes.

POSIX shared memory objects, by contrast, use character-based identifiers. On Linux, the SHM object is created as a file in the directory /dev/shm. This directory is a filesystem of type tmpfs (a filesystem stored entirely in RAM). In this sense, POSIX SHM objects are memory-mapped files on an in-memory filesystem.
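The following sketch illustrates the System V handle model described above. It skips the numeric key (IPC_PRIVATE) and lets the kernel generate the SysV ID, which then serves as the handle for shmat(), shmdt() and shmctl(). The function name and the test string are illustrative assumptions.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a System V segment without a numeric key, attach it, use it,
   detach it and mark it for removal. Returns 0 on success, -1 on error. */
int sysv_roundtrip(size_t size)
{
    assert(size >= 6);                       /* room for "hello" + NUL */
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);  /* kernel picks the ID */
    if (id == -1)
        return -1;

    char *mem = shmat(id, NULL, 0);          /* attach: map into this process */
    if (mem == (void *) -1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    strcpy(mem, "hello");
    int ok = (strcmp(mem, "hello") == 0);

    shmdt(mem);                              /* detach */
    shmctl(id, IPC_RMID, NULL);              /* freed once the last process detaches */
    return ok ? 0 : -1;
}
```

Segments that are not removed this way remain in the system. They can be listed with ipcs -m and removed with ipcrm.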

Re-sizing shared memory: pointer issues While it seems appealing to be able to re-size the memory available for storing metrics according to the actual need, this raises some practical problems and makes the implementation of data structures more complicated. The most significant problem with re-sizing applies in the same way to ordinary memory management using malloc() and realloc() from the C standard library.

For the initial allocation, the call to void *malloc(size_t size) returns (if successful) a pointer to the newly allocated memory.

The operating system creates a mapping between the process's virtual memory and the host's physical memory (see Fig. 5.2). Subsequent calls to malloc() will likely return an address equal to the previously returned pointer plus the size of the previously allocated region.

Allocated memory can later be resized using realloc(). The first allocated block may not be expandable, because it would grow into the address space of the next allocated block. In that case, realloc() moves the data and returns a new pointer that differs from the previous one.

The same problem applies in principle to the mapping of POSIX shared memory objects using shm_open(), mmap() and ftruncate(). Hence, the variable holding the pointer to the shared-memory data structure needs to be protected against concurrent access. Otherwise, one thread could trigger the re-size operation and thereby invalidate the current pointer while another thread is still using it. This would lead to an illegal memory access in the second thread.

Any implementation is therefore limited to increasing, never decreasing, the shared memory size. Otherwise, blocking synchronization would be required to prevent other threads and processes from accessing possibly invalid memory. That would make the whole effort of providing fast, guaranteed non-blocking access to shared memory useless.
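A common way to sidestep the dangling-pointer problem is to store positions inside the region as offsets from its base rather than as raw pointers, and to recompute addresses after every resize. The sketch below uses malloc()/realloc() as a stand-in for a re-mapped shared memory object. The struct region type and the helper names are hypothetical, and the grow-only rule mirrors the constraint stated above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t region_off;        /* hypothetical offset type */

struct region {
    unsigned char *base;            /* may change on every resize */
    size_t size;
};

/* Turn a stable offset into a (temporary) address. */
static void *region_at(const struct region *r, region_off off)
{
    assert(off < r->size);
    return r->base + off;
}

/* Grow-only resize: shrinking is refused. */
static int region_grow(struct region *r, size_t new_size)
{
    if (new_size <= r->size)
        return -1;
    unsigned char *p = realloc(r->base, new_size);
    if (p == NULL)
        return -1;                  /* the old block is still valid */
    memset(p + r->size, 0, new_size - r->size);
    r->base = p;                    /* every cached raw pointer is now stale */
    r->size = new_size;
    return 0;
}
```

Offsets do not remove the need for synchronization. A thread that has already turned an offset into a pointer still holds a stale address after a concurrent resize.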

Re-sizing shared memory: data structure design Dynamic data structures are more complicated to represent as C data structures. Apart from a single flexible array member at the end of a struct (C99), C has no notion of an “array with variable size X” inside a struct.

A workaround is to use pointer manipulation. The following struct definitions serve as an example:

struct cmx_value {
    int value;                  //> numeric value
    char name[64];              //> name of this value
};
struct cmx {
    int process_id;             //> process-id of the creator
    int number_of_values;       //> current number of values (size)
    char component_name[64];    //> name of this collection/component
};

Here we do not define the actual number of struct cmx_value elements. The allocation has to calculate their size and add it manually to the overall size:

struct cmx *cmx;
cmx = (struct cmx *) malloc(sizeof(struct cmx) + NO_OF_VALUES * sizeof(struct cmx_value));

Since the value slots are no longer real fields of struct cmx, one has to take the address of the struct cmx, add one to skip to the next element of a (hypothetical) array, and then cast this pointer to the struct cmx_value * type.

This operation can be simplified using C preprocessor macros, but it remains error-prone because it cannot be verified easily. It gets even more complicated if there is more than one dynamically growing field, for example one array of value names and a second one holding the values themselves. Then one either has to build groups or copy a lot of memory.
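The pointer arithmetic described above can be written as follows, using the two structs from the listing. For comparison, the sketch also shows the C99 flexible array member. It avoids the cast but, as noted in the comment, supports only one trailing variable-size array. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct cmx_value {
    int value;                  /* numeric value */
    char name[64];              /* name of this value */
};

struct cmx {
    int process_id;             /* process-id of the creator */
    int number_of_values;       /* current number of values (size) */
    char component_name[64];    /* name of this collection/component */
};

/* Values start right after the header: "one struct cmx further" in memory,
   cast to struct cmx_value *. Assumes no stricter alignment is needed. */
#define CMX_VALUES(c) ((struct cmx_value *) ((c) + 1))

/* C99 alternative: a flexible array member. Only one is allowed and it must
   be the last member, so two independently growing arrays still do not fit. */
struct cmx_fam {
    int process_id;
    int number_of_values;
    char component_name[64];
    struct cmx_value values[];
};

struct cmx *cmx_alloc(int n)
{
    assert(n >= 0);
    struct cmx *c = calloc(1, sizeof(struct cmx) + (size_t) n * sizeof(struct cmx_value));
    if (c != NULL)
        c->number_of_values = n;
    return c;
}

struct cmx_fam *cmx_fam_alloc(int n)
{
    assert(n >= 0);
    struct cmx_fam *c = calloc(1, sizeof(struct cmx_fam) + (size_t) n * sizeof(struct cmx_value));
    if (c != NULL)
        c->number_of_values = n;
    return c;
}
```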

The first prototype of CMX used groups of System V shared memory segments, which were allocated as needed and freed when empty. This was inefficient because it involved many mapping/un-mapping operations and required very strict locking.


Figure 5.2: Virtual to physical memory addresses (diagram: the virtual address spaces 0x0000 to 0xffff of process A and process B each contain a shared memory region; both map to the same region of physical memory, next to another process's memory)

Mapping of virtual to physical pages The mapping from a virtual memory address to a physical memory page is not set up immediately when mmap() or malloc() is called.

Memory allocated by ftruncate() or malloc() is only mapped to physical memory when it is first accessed. The operating system's page-fault handler creates the mapping on demand, in units of at least 4 KB (one page).

Fig. 5.2 shows a simplified mapping of a memory segment shared between process A and process B. This mapping consists of two smaller physical memory regions and also unallocated holes. Holes make it possible to allocate huge amounts of memory, reserving space in the virtual address space without actually consuming the same amount of physical memory.

This method works as long as the memory is not initialized (for instance, zeroed) up front. It works with ordinary malloc()-allocated memory, System V shared memory segments and POSIX shared memory objects alike. Most modern Linux filesystems also support the similar concept of “holes” in files (sparse files), so the method is usable with filesystem-backed mappings too.

This behaviour can be changed at the operating system level by calling mlockall(). Called with the flag MCL_CURRENT, it faults in and locks all pages currently mapped into the process. With the flag MCL_FUTURE, it also affects future mappings, such as shared memory mappings. This setting can be turned off by calling munlockall(). The behaviour can also be changed for specific memory regions using mlock() or the mmap() flag MAP_LOCKED.
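The lazy fault-in behaviour and the effect of mlock() can be observed with mincore(), which reports for each page whether it is resident in physical memory. This is a Linux-oriented sketch with hypothetical function names. The exact counts depend on the kernel configuration, and mlock() may be refused if RLIMIT_MEMLOCK is too small.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages of [addr, addr + len) that are currently resident in
   physical memory. 'addr' must be page aligned. Returns -1 on error. */
long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    assert((uintptr_t) addr % page == 0);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    long n = 0;
    for (size_t i = 0; i < npages; i++)
        n += vec[i] & 1;                   /* bit 0: page is resident */
    free(vec);
    return n;
}

/* Map 8 anonymous pages, write to one of them, then lock the whole range,
   recording the number of resident pages after each step. */
int lazy_mapping_demo(long *before, long *touched, int *lock_rc, long *locked)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t len = 8 * page;
    unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    *before = resident_pages(mem, len);    /* nothing faulted in yet */
    mem[3 * page] = 1;                     /* first write triggers a page fault */
    *touched = resident_pages(mem, len);

    /* mlock() faults in and pins all pages of the range; it can fail with
       EPERM or ENOMEM if RLIMIT_MEMLOCK is too small. */
    *lock_rc = mlock(mem, len);
    *locked = resident_pages(mem, len);
    if (*lock_rc == 0)
        munlock(mem, len);

    munmap(mem, len);
    return 0;
}
```

On a typical Linux system, one would expect no resident pages before the write, one after it and all eight after a successful mlock().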

Criteria                     POSIX SHM         mapped file       SysV SHM
Common name                  SHM object        mmap()-ed file    SHM segment
Available since              1993 (POSIX.1b)   1999 (Linux)      1983 (SysV SVR1)
Identifier                   FS path           FS path           SysV key (integer)
Handle                       file descriptor   file descriptor   SysV ID (integer)
Follows UNIX I/O design      yes               yes               no
Resizeable                   yes               yes               no
Auto-delete on last detach   no                no                yes
Portability                  ok                ok                very good
Data persistence             reboot            filesystem        reboot

Table 5.1: Comparison of Shared Memory Implementations

Conclusions There are three major implementations of shared memory to choose from. Table 5.1 compares the three implementations available on SLC5 and SLC6 using common criteria.

Memory-mapped files are, strictly speaking, not shared memory. As files, they are usually stored on a disk-persistent filesystem. However, they can also be mapped into process memory and then shared between different processes. Mapped files are interesting because they offer persistence across system reboots, which to some extent also covers system crashes. However, in our target environment most systems mount their filesystems read-only, so applications cannot write to them.

Overall, we consider POSIX SHM the most powerful and suitable solution. Its programming interface is more consistent than that of System V SHM. System V can still be interesting if one needs to support the Microsoft Windows operating system, which, through the UNIX compatibility library provided by Microsoft, only supports System V IPC or an entirely different Windows-specific interface.

5.3 Design of CMX Protocol

The previous sections have described the overall data structure at a high level. The following section is about managing concurrent access to the data. It discusses ways to let a process write its internal values to shared memory without having to fear obstruction by concurrent reads, while at the same time ensuring that a reader can always detect corrupted values.

5.3.1 Real-Time Constraints

The basic real-time constraints are outlined in TECH 1. Based on them, the implementation makes the following assumptions:

• Not all functions need to be real-time suitable. For example, we cannot make guarantees about functions involving system calls.

• The basic get/set operations must be real-time suitable so that they can be called from real-time threads.

• The creation of CMX Components and the registration of names of new metrics do not necessarily need to be real-time suitable, since these operations can run once at initialization time.

• We can assume that only one thread wants to update the very same value at a time; every concurrent update of the same value is allowed to fail.

• A read is allowed to fail if a concurrent write happens. Otherwise it must succeed.

• A write is allowed to fail if a concurrent write happens. Otherwise it must succeed.

The following table gives an overview of the high-level operations needed to publish metric values using CMX. Some of these actions operate on an “object”; for example, a metric is always bound to a component.

The column “Real-time required?” marks whether these actions have critical real-time requirements according to TECH 1.

Operation                           Object            Real-time required?
Create/update process information   -                 no
Create Component                    -                 no
Remove Component                    -                 no
Create Metric instance              Component         no
Remove Metric instance              Component         no
Update (set) Metric value           Metric instance   yes
Read (get) Metric value             Metric instance   (limited) yes


5.3.2 Concurrent Access to Shared Memory

For accessing shared memory, a common protocol is mandatory. The role of the protocolis to take care of handling concurrent access, and thus prevent data corruption and loss.

A shared memory access protocol differs from the familiar stream-oriented protocols. In shared memory, values can be manipulated at any position at any time (random access). POSIX SHM objects have no built-in access synchronization between multiple parties, which may access SHM concurrently and in overlapping operations.

Therefore, the access methods must implement a protocol which guarantees that dataintegrity is ensured at any time. This applies for read and write operations equally.

In the following the evolution of such a protocol is described. We start with a naive,broken design and improve it to provide the required guarantees. The final design isdescribed from page 41 onwards.

First approach A very simple example shows the basic usage of POSIX SHM objects and a flawed way of storing values. The exposed values are 32-bit integers, identified by keys of up to 64 characters.

The naive shared memory data structure looks like this:

struct cmx_value_t {
    char name[64];
    int32_t value;
};
struct cmx_shm_t {
    char component_name[64];
    int32_t process_id;
    struct cmx_value_t values[1024];
};

The process can then map this structure to a shared-memory region of the same size(error checking omitted):

1  // create shared memory object
2  int fd = shm_open("/component-name" /* name, leading slash for portability */,
3                    (O_CREAT | O_EXCL | O_RDWR /* flags */),
4                    (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH /* mode */));
5  // set size of shared memory
6  ftruncate(fd, (off_t) sizeof(struct cmx_shm_t));
7  // map shared memory into virtual address space / create mapping
8  struct cmx_shm_t * cmx_shm_ptr;
9  cmx_shm_ptr = (struct cmx_shm_t *) mmap(0, sizeof(struct cmx_shm_t),
10                                         PROT_READ | PROT_WRITE,
11                                         MAP_SHARED, fd, 0);


The shared-memory object is created by calling shm_open() (line 2 in the listing above). Its size is then set to the size of the cmx_shm_t struct type using ftruncate() (line 6).

The actual pointer to the shared memory is returned by the mmap() function (line 9). The mapping is configured read-write (PROT_READ|PROT_WRITE) and shared (MAP_SHARED). A private mapping (MAP_PRIVATE) would be a copy-on-write clone of the current state, with changes visible only to the current process.

The returned reference to the shared memory is assigned to the variable cmx_shm_ptr of type pointer to cmx_shm_t (line 9). This pointer can then be used to access the struct as usual, so up to 1024 values of type cmx_value_t can be stored there (about 70 KB of memory).
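The following sketch makes the difference between MAP_SHARED and MAP_PRIVATE concrete. It maps the same object three times in one process: two shared mappings stand in for a writer and a reader process, and a private mapping shows copy-on-write. Unlike the listing, it checks errors and unlinks the object right away. The object name used in the test is an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct cmx_value_t { char name[64]; int32_t value; };
struct cmx_shm_t {
    char component_name[64];
    int32_t process_id;
    struct cmx_value_t values[1024];
};

/* Returns 0 if shared and private mappings behave as described, else -1. */
int shared_vs_private(const char *name)
{
    assert(name != NULL && name[0] == '/');
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    shm_unlink(name);                 /* the open fd keeps the object alive */

    size_t sz = sizeof(struct cmx_shm_t);
    if (ftruncate(fd, (off_t) sz) == -1) {
        close(fd);
        return -1;
    }
    struct cmx_shm_t *writer = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    struct cmx_shm_t *reader = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
    struct cmx_shm_t *priv = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                        /* mappings remain valid after close() */

    int rc = -1;
    if (writer != MAP_FAILED && reader != MAP_FAILED && priv != MAP_FAILED) {
        writer->values[5].value = 42; /* visible through every shared mapping */
        priv->values[5].value = 7;    /* copy-on-write: stays private */
        if (reader->values[5].value == 42 && priv->values[5].value == 7)
            rc = 0;
    }
    if (writer != MAP_FAILED) munmap(writer, sz);
    if (reader != MAP_FAILED) munmap(reader, sz);
    if (priv != MAP_FAILED) munmap(priv, sz);
    return rc;
}
```

On glibc versions before 2.34, programs using shm_open() must be linked with -lrt.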

This simple approach works to a certain extent, but lacks data-integrity. The next stepswill document the problems in this solution and improve it gradually to the state whichis actually implemented in CMX.

First enhancements The previously described version has many shortcomings, some of which we address in this first enhancement phase.

So far, it is undefined when a value is set (contains valid data) or unset. We now define: if the name is empty (it starts with a null byte '\0'), the value is unset.

The structure also lacks an update timestamp, but a timestamp field (named mtime, for “modification time”) can easily be added:

struct cmx_value_t {
    char name[64];
    int32_t value;
    uint64_t mtime;
};
...

This raises a new problem: writing one value is atomic, but writing two values is not. Imagine the following program with two threads:

Update value        Read value
• Write data        • Read data
• Write mtime       • Read mtime

Whatever the ordering, the reader has no guarantee that the data it has read (value, mtime) belongs together.


In consequence, the following reader/writer pattern can appear:

Update value: write(data, mtime) -> ()   Read value: read() -> (data, mtime)
T1: write("A", 0x1)                      read() -> (data_T2, mtime_T1) -> ("B", 0x1)
T2: write("B", 0x2)                      read() -> (data_T3, mtime_T2) -> ("C", 0x2)
T3: write("C", 0x3)

Here the pairs read, ("B", 0x1) and ("C", 0x2), do not match any of the pairs written on the left side.
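The interleaving above can be replayed deterministically in a single thread, which makes the torn read reproducible. The struct pair type and the helper functions are hypothetical. As in the two-thread listing, each write stores data before mtime, and the reader loads the fields in the same order.

```c
#include <assert.h>
#include <stdint.h>

struct pair { char data; uint64_t mtime; };  /* unprotected (data, mtime) */

static void write_data(struct pair *p, char d)      { p->data = d; }
static void write_mtime(struct pair *p, uint64_t t) { p->mtime = t; }

/* Replays: T1 completed, T2 is half done when the reader runs. */
struct pair torn_read_replay(void)
{
    struct pair shm = { 'A', 0x1 };   /* T1: write("A", 0x1) completed */
    struct pair seen;

    write_data(&shm, 'B');            /* T2 starts: data already updated ... */
    seen.data = shm.data;             /* reader: read data  -> 'B' */
    seen.mtime = shm.mtime;           /* reader: read mtime -> 0x1 */
    write_mtime(&shm, 0x2);           /* ... T2 finishes: mtime updated */

    return seen;                      /* ("B", 0x1) was never written */
}
```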

Second enhancement To preserve the connection between value and mtime, we need to prevent the reader from accessing the data while an update is in progress.

The classical approach is to use a simple spin-lock (condition flag) or a reader-writer lock. However, this is undesirable, as it adds blocking behavior to the write and/or read functions. In real-time systems it is unacceptable for the writer process to wait until a reader has finished its read operation.

A lock requires the reader to interfere with the writer by grabbing the lock. It looks like an easy solution to simply use a lock without actually blocking execution, by failing immediately if the lock flag is set. This does not work. Imagine adding a simple bit that indicates the current state (locked/unlocked). The reader then has to check this bit both on entry to and on exit from the critical region in which it reads the value.

This approach is fundamentally broken because a reader can be suspended inside the critical region at any time, for instance because of a process context switch or because the processor stalls on a data access. If an update starts and completes during this sleep period, the value is unlocked again when the reader resumes. In this case, the reader is unable to detect changes made during its read phase, and the read data is likely corrupted or inconsistent. This scenario is generally known as the ABA problem [24, p. 235].

Instead of traditional locking, sequence locks can be used. Existing implementations arediscussed and references to literature are made in section 5.4.

While this second step adds a classical sequence lock, the final implementation in CMX modifies it to suit its particular needs. The description here uses a traditional sequence lock for better understanding.

For counting the sequence value, a new field called ctr (for counter) is added to the cmx_value_t struct:


struct cmx_value_t {
    char name[64];
    int32_t value;
    uint64_t mtime;
    uint64_t ctr;
};

For the new ctr field the following applies:

• will be initialized with zero.

• if it is even, the current value is valid.

• if it is odd, the current value is invalid.

Based on the previously described update/read protocol, there are now new steps to set and verify the ctr value:

Write value:
  assert(ctr is even) && increment ctr   (ATOMIC)
  Set value
  Set mtime
  Increment ctr
  return

Read value:
  Read ctr1 <- ctr
  Read value <- value
  Read mtime <- mtime
  Read ctr2 <- ctr
  Check ctr1 == ctr2 && ctr1 is even (true: the data read is valid; false: it is not)
  return

Figure 5.3: Program flow of a sequence lock implementation

Fig. 5.3 shows the flow diagram of a sequence lock write and read operation.

The first step of the Write Value procedure contains two operations. They need to be executed atomically, that is, in one step, without being interrupted by thread scheduling or by concurrent execution on other processors.

The Intel architecture and almost all other modern CPU architectures provide suitable atomic instructions, such as compare-and-swap, to implement this.
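The following is a minimal sketch of Fig. 5.3 in C11 atomics. It illustrates the traditional sequence lock and is not the CMX source. The payload fields are atomics accessed with relaxed ordering, so a concurrent read is not a data race under the C11 memory model, and the fences follow the commonly used seqlock pattern. The struct and function names are assumptions. The reader is split into a begin step and a validate step so that a read spanning a complete update can be demonstrated.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct seq_value {
    _Atomic uint64_t ctr;      /* even: valid, odd: write in progress */
    _Atomic int32_t value;
    _Atomic uint64_t mtime;
};

/* Writer: fails instead of waiting if another write is in progress. */
bool seq_write(struct seq_value *v, int32_t value, uint64_t mtime)
{
    uint64_t c = atomic_load_explicit(&v->ctr, memory_order_relaxed);
    /* "assert(ctr is even) && increment ctr" as one atomic step */
    if ((c & 1) || !atomic_compare_exchange_strong_explicit(
            &v->ctr, &c, c + 1, memory_order_acquire, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_release);   /* ctr+1 before payload */
    atomic_store_explicit(&v->value, value, memory_order_relaxed);
    atomic_store_explicit(&v->mtime, mtime, memory_order_relaxed);
    atomic_store_explicit(&v->ctr, c + 2, memory_order_release);
    return true;
}

uint64_t seq_read_begin(struct seq_value *v)
{
    return atomic_load_explicit(&v->ctr, memory_order_acquire);   /* ctr1 */
}

bool seq_read_valid(struct seq_value *v, uint64_t ctr1)
{
    atomic_thread_fence(memory_order_acquire);   /* payload loads before ctr2 */
    uint64_t ctr2 = atomic_load_explicit(&v->ctr, memory_order_relaxed);
    return ctr1 == ctr2 && (ctr1 & 1) == 0;
}

bool seq_read(struct seq_value *v, int32_t *value, uint64_t *mtime)
{
    uint64_t ctr1 = seq_read_begin(v);
    *value = atomic_load_explicit(&v->value, memory_order_relaxed);
    *mtime = atomic_load_explicit(&v->mtime, memory_order_relaxed);
    return seq_read_valid(v, ctr1);
}
```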


Figure 5.4: Visualization of data in “Second enhancement” version, over time (diagram: the fields name, value, mtime and ctr of struct cmx_value_t while add("test"), set(101) and set(102) are executed; ctr counts up from 0, the value 101 is stored with mtime 1400766 and the value 102 with mtime 1400865, and the data alternates between valid and invalid phases)

With sequence locks, the data switches from valid to invalid during write access. Fig. 5.4 shows the validity of the data inside the cmx_value_t struct over time. First, the struct is initialized (add()); then the value is set to 101 at the fictitious time 1400766 (set(101)). Next, the value is updated again to 102 (set(102)).

Between those updates there are time frames in which the value is valid (ctr is even). As long as the reader stays inside such a time frame, the read succeeds. If the reader starts in one valid time frame but ends in another, this is detected using the ctr value.

Third enhancement So far, only values of type int32_t (32-bit signed integer) are supported. As mentioned in TECH 6 (Datatypes), it must also be possible to store float values and character strings.

Support for both float and integer values can be easily added in a type-safe manner byusing a union for storing the value. A new separate integer value keeps track of the actualtype in the union.

Character strings of arbitrary length can be implemented by allowing the cmx_value_t data structure to be chained like a singly-linked list. Only in the first element of this list are the name and mtime fields relevant; the other elements only store the payload of the character string.

typedef enum cmx_value_type_t
{ // enumeration of possible types
    TYPE_INT64 = 1,         //> identifies values of type integer 64-bit
    TYPE_FLOAT64 = 2,       //> identifies values of type floating-point 64-bit
    TYPE_STRING = 3,        //> identifies head of character string values
    TYPE_STRING_CONT = 4    //> identifies continuation of string values
} cmx_value_type;

struct cmx_value_t {
    char name[64];          //> name of the value
    uint8_t type;           //> type, see enum cmx_value_type_t
    union {
        int64_t _int64;     //> the integer value (TYPE_INT64)
        double _float64;    //> the 64-bit float value (TYPE_FLOAT64)
        struct {            //> struct, so that data and next do not overlap
            char data[64];  //> data of string value
            int32_t next;   //> index of the next cmx_value_t data structure
        } _string;
    } value;
    uint64_t mtime;         //> modification timestamp
    uint64_t ctr;           //> sequence lock counter
};

The following is an example state of four values in memory. The value of mtime is not relevant, and ctr is always 2 after the first update, so mtime and ctr are omitted here.

cmx_value_t[0] {.name = "val1", .type = TYPE_INT64,.value._int64 = 1234L, };

cmx_value_t[1] {.name = "val2", .type = TYPE_FLOAT64,.value._float64 = 12.34L };

cmx_value_t[2] {.name = "val3", .type = TYPE_STRING,.value._string.data = "Hello..[64]",.value._string.next = 3 };

cmx_value_t[3] {.name = "", .type = TYPE_STRING_CONT,.value._string.data = "World..[64]",.value._string.next = -1 };
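The following sketch reads a chained string value from such an array of slots. It uses the value layout with the string fields in a struct, so that data and next do not overlap. Like the example state, it omits mtime and ctr. It also assumes that a chunk is NUL-terminated unless it is completely filled. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum cmx_value_type { TYPE_INT64 = 1, TYPE_FLOAT64 = 2, TYPE_STRING = 3, TYPE_STRING_CONT = 4 };

#define CMX_CHUNK 64

struct cmx_value_t {
    char name[64];
    uint8_t type;
    union {
        int64_t _int64;
        double _float64;
        struct {
            char data[CMX_CHUNK];   /* not NUL-terminated if completely filled */
            int32_t next;           /* index of the next slot, -1 = end */
        } _string;
    } value;
};

/* Number of slots needed for a string of 'len' bytes (at least one). */
size_t cmx_slots_needed(size_t len)
{
    return len == 0 ? 1 : (len + CMX_CHUNK - 1) / CMX_CHUNK;
}

/* Follow the chain starting at 'head' and copy the string into 'out'.
   Returns the string length, or -1 for a malformed chain or short buffer. */
long cmx_read_string(const struct cmx_value_t *slots, size_t nslots,
                     int32_t head, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    if (head < 0 || (size_t) head >= nslots || slots[head].type != TYPE_STRING)
        return -1;
    size_t pos = 0;
    int32_t i = head;
    for (size_t hops = 0; i != -1; hops++) {
        if (i < 0 || (size_t) i >= nslots || hops >= nslots)
            return -1;              /* index out of range, or a cycle */
        const struct cmx_value_t *s = &slots[i];
        if (hops > 0 && s->type != TYPE_STRING_CONT)
            return -1;
        size_t n = strnlen(s->value._string.data, CMX_CHUNK);
        if (pos + n >= outlen)
            return -1;              /* keep room for the terminating NUL */
        memcpy(out + pos, s->value._string.data, n);
        pos += n;
        i = s->value._string.next;
    }
    out[pos] = '\0';
    return (long) pos;
}
```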

Final design The current implementation of CMX contains some further changes compared to the state described here. The definitive data structure used in shared memory is visualized in Fig. 5.5 by a diagram generated from the source code. The definitive protocol is given at the bottom of Fig. 5.6.

So far, values have been addressed by index. Of course, one can also search through all values by name. However, since values can be added and removed, a plain index would still appear valid after its value has been removed and the slot has been reused for a differently named value. To avoid this, we include a counter value in the reference. The reference is thus effectively a 15-bit index value plus a 16-bit counter value. The remaining single bit is used to indicate errors (the sign bit makes the value negative, which is handy for error checking).
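A possible encoding of such a reference is sketched below. Placing the index in the low 15 bits and the counter in the next 16 bits is an assumption: the text only fixes the field widths and the use of the sign bit for errors. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define REF_INDEX_BITS 15
#define REF_INDEX_MASK 0x7FFFu
#define REF_CTR_MASK   0xFFFFu

/* Pack index (15 bit) and counter (16 bit); bit 31 stays 0, so valid
   references are never negative and negative values can signal errors. */
int32_t ref_encode(uint16_t index, uint16_t counter)
{
    assert(index <= REF_INDEX_MASK);
    uint32_t raw = ((uint32_t) counter << REF_INDEX_BITS) | index;
    return (int32_t) raw;                 /* raw <= 0x7FFFFFFF */
}

uint16_t ref_index(int32_t ref)   { return (uint16_t) ((uint32_t) ref & REF_INDEX_MASK); }
uint16_t ref_counter(int32_t ref) { return (uint16_t) (((uint32_t) ref >> REF_INDEX_BITS) & REF_CTR_MASK); }

/* A reference is stale if its slot has been freed and reused since the
   reference was handed out: the slot's current counter no longer matches. */
int ref_is_current(int32_t ref, const uint16_t *slot_counters, uint16_t nslots)
{
    if (ref < 0 || ref_index(ref) >= nslots)
        return 0;
    return slot_counters[ref_index(ref)] == ref_counter(ref);
}
```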

Instead of incrementing ctr twice, a field called state is used. The state can be any of FREE, OCCUPIED, UPDATE, SET and PAYLOAD. OCCUPIED is used in the transition from FREE to SET. PAYLOAD is a new state used when saving strings.

The value_state variable is an array independent of the values themselves. This way, scanning for a free value in insert or find operations does not touch, and thus does not initialize, the memory of the values.
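The transition from FREE via OCCUPIED to SET can be sketched with a compare-and-swap on the separate state array. The numeric state values, the slot layout and the helper names are assumptions for illustration. With FREE = 0, a freshly created, zero-filled segment starts with all slots FREE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

enum slot_state { FREE = 0, OCCUPIED, UPDATE, SET, PAYLOAD };

#define NSLOTS 16

struct slot { char name[64]; long value; };

struct component {
    _Atomic int state[NSLOTS];     /* scanned without touching value[] */
    struct slot value[NSLOTS];
};

/* Claim a free slot. Returns its index, or -1 if the component is full. */
int slot_insert(struct component *c, const char *name, long initial)
{
    assert(name != NULL);
    for (int i = 0; i < NSLOTS; i++) {
        int expected = FREE;
        /* FREE -> OCCUPIED: only one inserter can win this slot */
        if (!atomic_compare_exchange_strong(&c->state[i], &expected, OCCUPIED))
            continue;
        strncpy(c->value[i].name, name, sizeof c->value[i].name - 1);
        c->value[i].name[sizeof c->value[i].name - 1] = '\0';
        c->value[i].value = initial;
        atomic_store(&c->state[i], SET);   /* OCCUPIED -> SET: now readable */
        return i;
    }
    return -1;
}

/* Free a slot, but only if no update is in progress (state is SET). */
bool slot_remove(struct component *c, int i)
{
    int expected = SET;
    if (!atomic_compare_exchange_strong(&c->state[i], &expected, UPDATE))
        return false;
    atomic_store(&c->state[i], FREE);
    return true;
}
```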


• cmx_shm_t (struct, 2.7 MB): CMX_TAG char[8], process_id int, reserved char[52], name char[64], value_state int[20479], value array[20479]
• unnamed union (128.0 B): value struct, value_payload struct
• cmx_shm_slot_value_t (struct, 128.0 B): id unsigned short, type int, mtime unsigned long, name char[64], txn unsigned long, _reserved char[30], value union
• cmx_shm_slot_payload_t (struct, 128.0 B): id unsigned short, next_index unsigned short, next_id unsigned short, data char[116]
• cmx_shm_value_t (union, 8.0 B): _int64 long, _float64 double, _bool int, _string struct
• cmx_shm_value_string_t (struct, 8.0 B): size, current_size, next_index, next_id (all unsigned short)

Figure 5.5: Memory structures of CMX (generated using a custom tool on top of sparse [25])


5.3.3 Verification

In verifying the properties of the data-access protocol, we want to prove that several constraints are met. The following arguments are written in prose. A verification using machine-based reasoning follows in section 5.5.

The correctness of the algorithm depends on several conditions, which we prove one after another. The explanations are separated by a ⋆ on the right side of the page.

The next seven statements relate to Fig. 5.6. There are no illustrative figures for Verify 8 to Verify 10.

Verify 1 (Read before Write). Succeed, trivial.

Verify 2 (Read after Write). Succeed, trivial.

Verify 3 (Read overlaps Write). The read fails in TR2 because TW1 sets state to UPDATE.

Verify 4 (Start of Write overlaps Read partially). This operation succeeds, because the read ends before the writer increases the ctr value in TW2. The data read is valid, as the writer has changed nothing before the end of TR4 is reached.

Verify 5 (Start of Write overlaps Read). The read operation fails in TR5, because the writer increased the ctr value in TW2 before the reader finished TR4.

Verify 6 (Read inside Write). The read operation fails in TR1 because the state variableis set to UPDATE in TW1 .

Verify 7 (Write inside Read). The read fails in TR5 because the writer modified the ctr value in TW2.

Verify 8 (The value update is non-blocking). The update (set) operation cannot blockbecause there are no blocking instructions.

Verify 9 (The value update fails if another update is in progress). The atomic compare-and-exchange operation in TW1 acts as mutual exclusion. The update operation cannot be entered by a second process while state ≠ SET.

Verify 10 (The read operation always detects invalid data). An invalid read would happen if the read (TR3) were executed after the transaction counter is incremented (TW2) but before the value (value and mtime) is written completely.

For this to happen, the check state = SET (TR2) must execute before TW1. If so, it is guaranteed that ctr_j (TR1) is read before ctr++ (TW2) and that ctr_k is read after TW2. Therefore, the two must differ, and the modification will be detected by the check in TR5.

Protocol steps (bottom of Fig. 5.6):

TW1 CAS(state, SET, UPDATE)   TR1 ctr_j := ctr
TW2 ctr++                     TR2 state = SET
TW3 value := /*value*/        TR3 read value
TW4 state := SET              TR4 ctr_k := ctr
                              TR5 ctr_j = ctr_k

Example interleavings of one READ (TR1 to TR5) and one WRITE (TW1 to TW4): (1) read before write, (2) read after write, (3) read overlaps write, (4) write begin overlaps read partially, (5) write begin overlaps read, (6) read inside write, (7) write inside read.

Figure 5.6: Overview of the CMX Reader/Writer Protocol with examples
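The protocol of Fig. 5.6 can be expressed in C11 atomics as sketched below. This is an illustration under assumptions, not the CMX source: the field names, the state values and the memory orderings are chosen for the sketch. The reader is split into two steps so that interleavings such as Verify 7 can be replayed in a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum { FREE = 0, OCCUPIED, UPDATE, SET, PAYLOAD };

struct cmx_slot {
    _Atomic int state;
    _Atomic uint64_t txn;          /* transaction counter ("ctr") */
    _Atomic int64_t value;
    _Atomic uint64_t mtime;
};

/* Writer: never blocks; fails if another update is in progress. */
bool cmx_set(struct cmx_slot *s, int64_t value, uint64_t mtime)
{
    int expected = SET;
    if (!atomic_compare_exchange_strong(&s->state, &expected, UPDATE))  /* TW1 */
        return false;
    atomic_fetch_add(&s->txn, 1);                                        /* TW2 */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->value, value, memory_order_relaxed);       /* TW3 */
    atomic_store_explicit(&s->mtime, mtime, memory_order_relaxed);
    atomic_store_explicit(&s->state, SET, memory_order_release);         /* TW4 */
    return true;
}

bool cmx_get_begin(struct cmx_slot *s, uint64_t *ctr_j)
{
    *ctr_j = atomic_load(&s->txn);                                       /* TR1 */
    return atomic_load(&s->state) == SET;                                /* TR2 */
}

bool cmx_get_end(struct cmx_slot *s, uint64_t ctr_j)
{
    atomic_thread_fence(memory_order_acquire);
    uint64_t ctr_k = atomic_load_explicit(&s->txn, memory_order_relaxed); /* TR4 */
    return ctr_j == ctr_k;                                               /* TR5 */
}

bool cmx_get(struct cmx_slot *s, int64_t *value, uint64_t *mtime)
{
    uint64_t ctr_j;
    if (!cmx_get_begin(s, &ctr_j))
        return false;
    *value = atomic_load_explicit(&s->value, memory_order_relaxed);      /* TR3 */
    *mtime = atomic_load_explicit(&s->mtime, memory_order_relaxed);
    return cmx_get_end(s, ctr_j);
}
```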

5.4 Comparison with Similar Algorithms

The reader/writer protocol for CMX is neither new nor unique. While this specific implementation is customized for the use case of CMX, the general idea has been described and implemented elsewhere before. We found three publications or implementations of similar algorithms.

Non-Blocking Write Protocol (NBW) This description of a similar algorithm was published in 1993 by Kopetz and Reisinger [26]. They provide correctness arguments and a detailed analysis of scheduling. The usage scenario described there is more abstract. It is not specifically designed for SMP machines but for a network of nodes, each consisting of a CPU, memory and a communication controller. Every node can execute many tasks quasi-parallel: because there is only one CPU, the tasks are executed interleaved, one after another, which means that they get interrupted from time to time.

The nodes receive updates via broadcasts through a communication processor, which writes each update directly into the memory of the node. Thus, this is a single-writer, many-reader situation (every task can be a reader). If a task gets interrupted, the communication processor may update the value in the meantime. The Non-Blocking Reader/Writer protocol makes sure to detect those cases and to inform the reader that its data is invalid.

The NBW protocol as shown in Fig. 5.7 and as described in the paper [26, p. 3/133] issimilar to the one in CMX.

Write message is designed for a single writer, while CMX uses the state value to prevent two writers from running at the same time. The writer first increments the CCF counter, then writes the value, and then increments the CCF counter again.


start: CCF_i := 0;   // The counter CCF_i starts globally at zero

Write message_i:
  start: CCF_old := CCF_i;
         CCF_i := CCF_old + 1;
         <write message_i>
         CCF_i := CCF_old + 2;

Read message_i:
  start: CCF_begin := CCF_i;
         <read message_i>
         CCF_end := CCF_i;
         if CCF_end ≠ CCF_begin or CCF_begin is odd
         then goto start;

Figure 5.7: Non Blocking Reader Writer Protocol [26]

In Read message, the reader looks at the CCF counter, then reads the value and then looks at the CCF counter again. This is repeated until both reads of the CCF counter are equal and the first one is even.
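For comparison, here is a sketch of the NBW protocol from Fig. 5.7 in C11 atomics. It assumes a single writer, so the counter update needs no compare-and-swap. It also bounds the number of reader retries instead of jumping back to start indefinitely. The names and the payload size are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct nbw_msg {
    _Atomic uint64_t ccf;            /* concurrency control field, starts at 0 */
    _Atomic int64_t payload[4];      /* message contents (size is arbitrary) */
};

/* Single writer only: CCF_old + 1 (odd, busy), write, CCF_old + 2 (even). */
void nbw_write(struct nbw_msg *m, const int64_t in[4])
{
    uint64_t old = atomic_load_explicit(&m->ccf, memory_order_relaxed);
    atomic_store_explicit(&m->ccf, old + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    for (int i = 0; i < 4; i++)
        atomic_store_explicit(&m->payload[i], in[i], memory_order_relaxed);
    atomic_store_explicit(&m->ccf, old + 2, memory_order_release);
}

/* Returns true and fills 'out' once a consistent copy was read. */
bool nbw_read(struct nbw_msg *m, int64_t out[4], int max_retries)
{
    assert(max_retries >= 0);
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        uint64_t begin = atomic_load_explicit(&m->ccf, memory_order_acquire);
        for (int i = 0; i < 4; i++)
            out[i] = atomic_load_explicit(&m->payload[i], memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        uint64_t end = atomic_load_explicit(&m->ccf, memory_order_relaxed);
        if (end == begin && (begin & 1) == 0)
            return true;             /* otherwise: "goto start" */
    }
    return false;
}
```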

Timecounters in FreeBSD “Timecounters: Efficient and precise timekeeping in SMP kernels” [8], published in 2002, covers the management of high-performance timer values in the FreeBSD operating system. The paper is mostly about timekeeping, except for the section “Locking, lack of …”, which describes a lock-free single-writer data structure supporting multiple data generations.

In contrast to the previous algorithm, this one keeps multiple copies of the data in a ring buffer. This way, a reader that is slower than the writer can still succeed: it has time until the data-item generation it selected is overwritten in the next round of the ring buffer. This significantly reduces the reader failure rate, especially in scenarios with very high update rates, where readers would otherwise be unable to read the complete data before the writer starts its next update.

This approach is perfectly suited for exposing time sources to the whole system, because the time is read by every process in the system, from very low to very high priorities. Even a low-priority process that might be suspended for a longer time slice must succeed in reading its time data. Also, timekeeping is critical for many applications, and retries are more problematic than the negligible increase in memory usage for the ring buffer.

In CMX, we do not expect many reading clients with different process priorities. Because CMX is implemented without a ring buffer, a reader has a higher chance to fail. For example, if a process does nothing but update its CMX values, like a time source continuously updating its timestamp, a reader might never succeed in reading the value.


CMX may also be used to store larger amounts of data (character strings), for which holding multiple copies quickly becomes expensive in memory. Furthermore, we do not expect CMX users to update metric values at a frequency comparable to that of time sources: whereas CMX values may, under some circumstances, be updated every millisecond, a time source has to handle nanosecond precision.

In contrast to time-source clients, CMX readers are less critical; they can tolerate retrying a read that has failed.

Linux Sequence Locks The Linux kernel has provided Sequence Locks at least since kernel version 2.6.12 (around April 2005). They are not locks in the common sense, but an abstraction of a reader/writer protocol like the one from [26] described above.

A description of Sequence Locks can be found in “Effective synchronization on Linux/NUMA systems” by Lameter [9, p. 15] and in “Linux device drivers, Third Edition” by Rubini and Corbet [27, chapter 5, p. 127], as well as in the C source of the Linux kernel itself [28, include/linux/seqlock.h].

The Sequence Lock additionally uses a spinlock to synchronize write access. CMX locks write access in a similar way by setting the state variable from SET to UPDATE, but instead of spinning, it fails immediately.

The seqcount struct contains a counter variable of type unsigned; seqlock_t combines it with a spinlock:

    typedef struct {
        struct seqcount seqcount;
        spinlock_t lock;
    } seqlock_t;

The usage of a sequence lock in the Linux kernel looks as follows (assuming lock is of type seqlock_t):

Writer:

    write_seqlock(&lock);
    // modify
    write_sequnlock(&lock);

Reader:

    unsigned seq;
    do {
        seq = read_seqbegin(&lock);
        // read data here
    } while (read_seqretry(&lock, seq));
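To contrast the two writer-side strategies, here is a minimal user-space analogue of seqlock_t written with C11 atomics. It is not the kernel implementation: all names are hypothetical, and an atomic_bool stands in for spinlock_t. example_write_lock_spin() waits in the manner of write_seqlock(), while example_write_lock_try() gives up immediately, as CMX does when the state variable is not SET.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space analogue of seqlock_t. */
struct example_seqlock {
    atomic_uint sequence;      /* odd while a writer is active */
    atomic_bool writer_busy;   /* serialises writers, like spinlock_t */
};

static void example_seqlock_init(struct example_seqlock *s)
{
    atomic_init(&s->sequence, 0);
    atomic_init(&s->writer_busy, false);
}

/* Start of a write: the counter becomes odd before any data is changed. */
static void example_begin_write(struct example_seqlock *s)
{
    atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
}

/* write_seqlock() style: spin until no other writer is active. */
static void example_write_lock_spin(struct example_seqlock *s)
{
    while (atomic_exchange_explicit(&s->writer_busy, true, memory_order_acquire))
        ; /* another writer is active: spin */
    example_begin_write(s);
}

/* CMX style: fail immediately instead of spinning. */
static bool example_write_lock_try(struct example_seqlock *s)
{
    if (atomic_exchange_explicit(&s->writer_busy, true, memory_order_acquire))
        return false;          /* CMX reports a concurrent modification */
    example_begin_write(s);
    return true;
}

static void example_write_unlock(struct example_seqlock *s)
{
    atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release); /* even */
    atomic_store_explicit(&s->writer_busy, false, memory_order_release);
}
```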


Unlike the algorithm described for CMX, the Linux seqlock uses an even/odd check to detect whether a writer is currently active; the CMX algorithm uses the state flag for this. As the state variable needs to be checked in CMX anyway, there is no advantage in checking even/odd instead or in addition. On the contrary, since the CMX counter is incremented only once per update instead of twice, it wraps around only half as often, although a wrap-around of a 64-bit counter (2^64 values) within a single read cycle will not happen in practice anyway.
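To put the overflow argument into numbers, assume a hypothetical writer that performs one update per nanosecond, far more often than CMX values are expected to change. With one increment per update, as in CMX, a 64-bit counter wraps around after

```latex
t_{\mathrm{wrap}} = \frac{2^{64}}{10^{9}\,\mathrm{s}^{-1}}
  \approx 1.84 \times 10^{10}\,\mathrm{s}
  \approx 584\ \mathrm{years}
```

With the even/odd scheme (two increments per update), the same writer would wrap around after about 292 years. A reader would have to be stalled for that long before ctr^j == ctr^k could hold across a wrap-around.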

5.5 Verification with Models

Our earlier attempts to verify the data access protocol of CMX by writing proofs in natural language showed how difficult it is to describe parallel algorithms in informal prose. As suggested by McKenney [7], we therefore additionally use the modeling language Promela and its model checker Spin [29] to model some of the core aspects of the algorithms in CMX.

Rewriting the algorithm in a language specialized for concurrent systems also improves our understanding of it.

In contrast to the unit tests written in C/C++ (see subsection 6.2.5), where we try to find problematic situations by executing code sections concurrently many times in order to trigger a race condition, Spin searches the entire state space of the algorithm.

If we tried to model a complex algorithm completely, this would quickly lead to state space explosion: the resulting state space could not be searched exhaustively in a lifetime. Instead, we focus on key aspects of our algorithms. After a short introduction, a first small model reflects only the writer-writer situation; the next model covers the writer-reader situation.

All the models described in the following were successfully validated by means of theassertions written in the code.

5.5.1 Simple Example of a Promela Model

This is an introductory example unrelated to CMX. Listing 5.1 shows a very simple Promela model. It consists of two so-called processes (keyword proctype).

• The producer sets the global variable turn to C, triggering the other process, consumer, to start.


• In a real scenario, the second process would do some real work. In this example, the consumer immediately changes the state back to P, indicating that it has finished its work.

    mtype = { P, C };
    mtype turn = P;

    active proctype producer() {
        do
        :: (turn == P) ->
    progress_produce:
            printf("Produce\n");
            turn = C;
        od;
    }

    active proctype consumer() {
        do
        :: (turn == C) ->
    progress_consume:
            printf("Consume\n");
            turn = P;
        od;
    }

Listing 5.1: Simple model of a producer/consumer scenario

Figure 5.8 shows the state graphs of the two processes as generated by Spin.

Figure 5.8: Model states, image generated by Spin from Listing 5.1

5.5.2 Model of Two Writers

The next Promela program is directly related to the CMX reader/writer algorithm. It is still a fairly obvious case: in a scenario with two writers, is it guaranteed that the two threads cannot access (write) the data at the same time? In other words, does the mutual exclusion through the state variable actually work?

    mtype = { STATE_SET, STATE_UPDATE };
    mtype state = STATE_SET;
    int count;

    inline enter() {
        atomic { state == STATE_SET -> state = STATE_UPDATE; }
    }

    inline leave() {
        atomic { state = STATE_SET; }
    }

    active [2] proctype writer() {
        do
        :: enter() ->
            count++;
            assert(count == 1);
            count--;
            leave();
    progress:
            skip
        od
    }

Listing 5.2: Model of two Writers/Updaters using test-and-set locking with state variable

The model in Listing 5.2 describes two identical writer processes named writer(). Spin explores all possible execution paths of these two processes running concurrently. Whenever a process has entered the critical region, Spin checks with assert(count == 1) that the other process is not in the critical region at the same time.
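For comparison with Listing 5.2, the same test-and-set locking on the state variable can be written in C with the C11 compare-exchange operation. This is a sketch with hypothetical names and state values. Note that the Promela guard in enter() blocks until the state is STATE_SET, whereas this version returns false immediately, matching the fail-fast behaviour of CMX.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical values mirroring STATE_SET / STATE_UPDATE. */
enum { EXAMPLE_STATE_SET = 1, EXAMPLE_STATE_UPDATE = 2 };

static atomic_int state = EXAMPLE_STATE_SET;

/* enter(): atomic test-and-set SET -> UPDATE. Returns false if another
   writer already holds the state, instead of waiting for it. */
static bool example_enter(void)
{
    int expected = EXAMPLE_STATE_SET;
    return atomic_compare_exchange_strong(&state, &expected,
                                          EXAMPLE_STATE_UPDATE);
}

/* leave(): UPDATE -> SET */
static void example_leave(void)
{
    atomic_store(&state, EXAMPLE_STATE_SET);
}
```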

5.5.3 Model of Concurrent Reader/Writer

The previous check can be extended to include a reader. Listing 5.3 contains one writer and one reader. The model covers all the states of the CMX protocol as described in Fig. 5.6.

Instead of writing real values, it uses a boolean value which is 'true' if the field contains valid data and 'false' while the writer updates the field inside the critical region.

According to the protocol, the reader makes a copy of ctr, checks that state is SET, copies the value, and then reads ctr again. If both copies of ctr match at the end, the copied boolean value must be true.
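The steps of Listing 5.3 can be mirrored in C as follows. This sketch assumes C11 atomics and uses hypothetical names, state values and result codes; the payload consists of two words (lo, hi) so that an inconsistent copy would actually be possible. For simplicity, it uses the default sequentially consistent atomic operations; weaker orderings are possible but require more careful reasoning. Where the Promela reader blocks until the state is SET, this reader returns EXAMPLE_RETRY and leaves the retry loop to the caller.

```c
#include <assert.h>
#include <stdatomic.h>

enum { EXAMPLE_SET = 1, EXAMPLE_UPDATE = 2 };  /* hypothetical state values */
enum { EXAMPLE_OK = 0, EXAMPLE_RETRY = -1 };   /* hypothetical result codes */

/* One hypothetical CMX-like value with a two-word payload. */
struct example_value {
    atomic_int state;
    atomic_ulong ctr;
    atomic_long lo, hi;
};

/* Writer, T_W1 .. T_W4 (sequentially consistent atomics throughout). */
static int example_write(struct example_value *v, long lo, long hi)
{
    int expected = EXAMPLE_SET;
    if (!atomic_compare_exchange_strong(&v->state, &expected, EXAMPLE_UPDATE))
        return EXAMPLE_RETRY;             /* T_W1 failed: concurrent writer */
    atomic_fetch_add(&v->ctr, 1);         /* T_W2: ctr++ */
    atomic_store(&v->lo, lo);             /* T_W3: write the value */
    atomic_store(&v->hi, hi);
    atomic_store(&v->state, EXAMPLE_SET); /* T_W4: state = SET */
    return EXAMPLE_OK;
}

/* Reader, T_R1 .. T_R5. Never writes to the shared value. */
static int example_read(struct example_value *v, long *lo, long *hi)
{
    unsigned long ctr_j = atomic_load(&v->ctr);   /* T_R1 */
    if (atomic_load(&v->state) != EXAMPLE_SET)     /* T_R2 */
        return EXAMPLE_RETRY;
    long l = atomic_load(&v->lo);                  /* T_R3 */
    long h = atomic_load(&v->hi);
    unsigned long ctr_k = atomic_load(&v->ctr);   /* T_R4 */
    if (ctr_j != ctr_k)                            /* T_R5 */
        return EXAMPLE_RETRY;
    *lo = l;
    *hi = h;
    return EXAMPLE_OK;
}
```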


    mtype = { STATE_SET, STATE_UPDATE };
    mtype state = STATE_SET;
    int ctr_count = 1;
    bool value = true;
    int no_writers;

    inline enter_update() {
        atomic { state == STATE_SET -> state = STATE_UPDATE; } }

    inline leave_update() {
        atomic { state = STATE_SET; } }

    active [1] proctype writer() {
        if
        :: enter_update() ->         /* TW_1 CAS(state,SET,UPDATE) */
            no_writers++;
            assert(no_writers == 1);
            ctr_count++;             /* TW_2 ctr++ */
            value = false;           /* TW_3 would happen here */
            assert(no_writers == 1);
            no_writers--;
            value = true;            /* TW_3 ends */
            leave_update();          /* TW_4 state = SET */
        fi;
    }

    active [1] proctype reader() {
        int ctr_j;
        int ctr_k;
        bool read_value;
        bool success = false;

    start:
        ctr_j = ctr_count;                  /* TR_1 ctr^j */
        if
        :: (state == STATE_SET) ->          /* TR_2 state == SET */
            read_value = value;             /* TR_3 read value */
            ctr_k = ctr_count;              /* TR_4 ctr^k */
            if
            :: (ctr_j == ctr_k) ->          /* TR_5 ctr^j == ctr^k */
                assert(read_value)          /* read was outside TW_2..TW_3 */
            :: else ->
    progress_retry: goto start
            fi;
        fi;
    end_success:
    }

Listing 5.3: Model of a Writer and a Reader


5.6 Conclusions

This chapter analyzed suitable inter-process communication principles and implementations. Based on shared memory, a data access protocol for shared data structures was developed. The protocol is well suited to the CMX use case: it provides very fast and unobtrusive publishing of run-time values through shared memory. The next step is to implement CMX using this protocol on top of POSIX shared memory objects.

The algorithm was checked with different verification techniques. First, we reasoned about its correctness in written prose. Then, models were created using a specialized language and verification tool. Successful verification of the protocol was a prerequisite for advertising CMX for use in critical applications.


6 Implementation of CMX

This chapter covers different aspects of the implementation process. The first part analyzes properties of the compiler and the target computing platform, including the operating system, processor instructions and memory consistency.

The second section of this chapter introduces the implementation details of the new CMX library.

6.1 Platform and Toolchain

CMX primarily targets Scientific Linux CERN 5 on Intel x86-32 and Scientific Linux CERN 6 on Intel x86-64 (SLC5/SLC6). It is written in the C and C++ programming languages. The compiler versions shipped with these operating system releases are gcc 4.1.2 on SLC5 and gcc 4.4.7 on SLC6.

6.1.1 Compiler

Today, only the most up-to-date C compilers are aware of concurrent programming. There is an ongoing effort to standardize and implement atomic operations and architecture-aware memory access for the C language. The new C11 standard (ISO/IEC 9899:2011, section 7.17 Atomics) [30] specifies a header named stdatomic.h which offers basic atomic types and operations. The compilers used on SLC5/6, however, do not support these features.

Memory consistency is not the only issue a developer needs to take care of when creating concurrent programs. Because compilers are not aware of concurrent execution, they may apply code optimizations that change the original intent of the program.


Erroneous Optimization The following example shows that compiler optimization can have undesired results when it comes to concurrent access to shared state (here an int value).

The following program looks legitimate:

    int function(int * state) {
        int c = 0;
        if (*state == 1) c += 1;
        if (*state == 1) c += 1;
        if (*state == 1) c += 1;
        return c;
    }

However, the compiler is right to assume that *state does not change inside the function, since nothing in the function writes to it. Both of the following transformations are therefore valid:

    int function(int * state) {
        int c = 0;
        int s = *state;
        if (s == 1) c += 1;
        if (s == 1) c += 1;
        if (s == 1) c += 1;
        return c;
    }

    int function(int * state) {
        if (*state == 1) return 3;
        else return 0;
    }

In the first transformation, the compiler deduced that the three if clauses always read the same memory value. It therefore reduced the accesses to a single read, whose result can then stay in a CPU register. Building on this, the second transformation evaluates the additions to c at compile time.

In the C language, the keyword volatile is used to mark variables that are memory-mapped or can otherwise be changed from outside the program's control. As described in “Volatiles Are Miscompiled, and What to Do about It” [31], some compilers unfortunately contain bugs in their handling of volatile data types. For example, a program that accesses an object through a volatile int * pointer multiple times can still be compiled wrongly to access it only once. The authors recommend using helper functions for these accesses to prevent the compiler from optimizing them; in their analysis, “96% of all the volatile errors we found are fixed through the introduction of helper functions” [31, section 5 and 6.3].


Helper functions also improve the readability of the source code: instead of adding volatile to all relevant variables, we rely on the implicit conversion to volatile int * in the function parameter:

    int read_int(volatile int * v) __attribute__ ((noinline));

    int read_int(volatile int * v) {
        return *v;
    }

    int function(int * state) {
        int c = 0;
        if (read_int(state) == 1)
            c += 1;
        ....
    }

With the additional gcc function attribute noinline, we prevent the compiler from inlining this function. This eases the verification of the algorithm because we can easily trace the order in which operations are executed. On the other hand, one needs to remember that the additional call may have a considerable performance impact.

6.1.2 Atomicity of Operations

With concurrent execution of algorithms, any non-atomic (divisible) operation can expose half-processed data to other threads. The following small program with two threads illustrates this:

    // Thread 1
    while (c < n) {
        data[c] = source[c];
        c++;
    }

    // Thread 2
    int a = data[0];
    int b = data[1];

• Thread 1 provides data in a global array called data.

• Thread 2 reads data from Thread 1 and stores it in two variables named a and b.

If both threads start at the same time, “int b” can contain the value from after the update while “int a” was read before the update in Thread 1 occurred. This is a very simple scenario, and we will not go deeper into solving it at this point; it was already discussed in subsection 5.3.2.

The essential question is: how do we know that a simple assignment is executed atomically, i.e. in one step? For example, for the assignment x = 0xff00ff00 (with x an unsigned 32-bit integer), can we be sure that the memory or register backing x never contains an intermediate value such as 0x0000ff00 or 0xff000000?

Simple Atomic Operations If we translate the program to assembler, we see that the assignment of a 32-bit value maps to exactly one store instruction. Still, the processor could translate this internally into several smaller operations, for example:

• Moving the value in four separate 8-bit pieces

• Loading the constant into a register, then storing the register to the memory location of x.

While the second option is fine in most situations, the first one leaves an intermediate value at the destination for some time.

For x86-64 the Intel documentation [32, ch.8 p.8] says “The Intel-64 memory-orderingmodel guarantees that:”

• Constituent memory operation that read or write a double word (4 bytes) whose address is aligned on a 4 byte boundary appears to execute as a single memory access

• … read or write quad word (8 bytes) whose address is aligned on a 8 bytes boundary…

• Any locked instruction appears to execute as an indivisible sequence of load(s), followed by store(s) regardless of alignment.

For x86-32 (Intel Pentium and newer), the documentation [32, ch.8 p.2] states that “the following additional memory operations will always be carried out atomically: Reading or writing a quadword aligned on a 64-bit boundary”.
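Whether these guarantees apply depends on the alignment of the data. The sketch below, with a hypothetical record layout, forces 8-byte alignment of a 64-bit field with C11 alignas (gcc 4.x would need __attribute__((aligned(8))) instead) and checks it at run time. On x86-32, alignment is necessary but not sufficient: a plain uint64_t assignment is typically compiled into two 32-bit stores there, so the single 8-byte access has to come from an instruction such as cmpxchg8b or an SSE move.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared-memory record. Without alignas(8), the i386
   System V ABI would place the uint64_t member at offset 4 on x86-32,
   i.e. not on the 8-byte boundary the quadword guarantee requires. */
struct example_record {
    uint32_t state;
    alignas(8) uint64_t value;
};

/* Returns 1 if p lies on an 8-byte boundary. */
static int example_is_quadword_aligned(const void *p)
{
    return ((uintptr_t)p % 8) == 0;
}
```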

More Complex Atomic Operations If two threads want to communicate with each other about shared data, they need an atomic operation that allows them to compare the value of a shared datum and conditionally modify the datum in one (atomic) step. This can be illustrated by a simple spinlock implementation, as used for example in operating systems for low-level synchronization. The naive implementation looks like this:


    1  bool lock = false; // shared variable
    2  PROCESS() {
    3    do {
    4      if (!lock) {
    5        lock = true;
    6        break;
    7      }
    8    } while (1);
    9    // critical section
    10   lock = false; // unlock
    11 }

This solution is wrong because the check and the assignment are not performed atomically: both hardware threads can pass the if (!lock) check at the same time, before either of them sets lock. We can fix this with an instruction that compares (line 4) and modifies (line 5) in one step.

Imagine an instruction, called check_and_set, that performs the work of lines 4 to 7 of the previous program in one single atomic step. The resulting program would look like this:

    bool lock = false; // shared variable
    PROCESS() {
        // spin on lock
        while (1) { if (check_and_set(&lock)) { break; } }
        // critical section
        lock = false; // unlock
    }
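With C11 atomics, check_and_set maps naturally onto atomic_flag_test_and_set(), which atomically sets a flag and returns its previous value. The sketch below, with hypothetical names, is a C11 version of the spinlock above; on x86, compilers typically implement the test-and-set with an xchg instruction. The SLC5/SLC6 compilers do not provide <stdatomic.h>, so this is for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag example_lock_flag = ATOMIC_FLAG_INIT; /* shared, free */

/* check_and_set: atomically sets the flag and reports whether it was
   previously clear, i.e. whether the caller now owns the lock.
   atomic_flag_test_and_set returns the OLD value, hence the negation. */
static bool check_and_set(atomic_flag *l)
{
    return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void example_acquire(void)
{
    while (!check_and_set(&example_lock_flag))
        ; /* spin on lock */
}

static void example_release(void)
{
    atomic_flag_clear_explicit(&example_lock_flag, memory_order_release);
}
```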

An instruction of this kind has been available on the Intel platform since the i486: compare and swap, also called compare exchange (instruction mnemonic: cmpxchg). Its operation is defined as follows [33, pp. 3-148]:

    TEMP ← DEST
    IF %eax = TEMP
    THEN
        ZF ← 1;
        DEST ← SRC;
    ELSE
        ZF ← 0;
        %eax ← TEMP;
        DEST ← TEMP;
    FI;

Usage: lock; cmpxchg DEST, SRC, e.g. lock; cmpxchg r/m32, r32: compare EAX with r/m32. If equal, set ZF and load r32 into r/m32. Else, clear ZF and load r/m32 into EAX.

Compare and swap is not part of Standard C99 [30, ISO 9899:1999], but it can be wrapped in a C function using inline assembler. In the following short example, this wrapper is called cmx_atomic_val_compare_and_swap_int32 and returns the value of TEMP, i.e. the previous content of the memory location. If the exchange succeeds, this value is equal to the expected value (the second argument), and the memory location has been set to the new value (the third argument).

The following code from CMX is used to enter state TW1 of the writer access protocol:

    // State = T_W_1
    // Enter state UPDATE
    switch (cmx_atomic_val_compare_and_swap_int32(
                &cmx_shm_ptr->value_state[value_index],
                CMX_SLOT_STATE_SET, CMX_SLOT_STATE_UPDATE))
    {
        case CMX_SLOT_STATE_SET:
            // that's OK, the XCHG succeeded
            break;
        case CMX_SLOT_STATE_UPDATE:
            // another update is in progress
            return E_CMX_CONCURRENT_MODIFICATION;
        default:
            // invalid other state
            return E_CMX_OPERATION_FAILED;
    }

The C compiler gcc [34] in version ≥ 4.4 provides builtin functions for some atomic primitives. For example, __sync_val_compare_and_swap() is a wrapper to access cmpxchg in a portable way. However, the experience from this project shows that one should always check the generated assembler code before relying on the gcc atomic builtins.

With older compilers (gcc 4.1.2 on SLC5), inline assembler must be used. Newer compilers supporting C11 provide compare-exchange operations (atomic_compare_exchange_strong() and atomic_compare_exchange_weak()) through the <stdatomic.h> header.
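A wrapper of this kind might look like the sketch below. It is not the actual source of cmx_atomic_val_compare_and_swap_int32; the name example_cas_int32 is hypothetical. On x86 it uses inline assembler with the lock; cmpxchg instruction shown above, and on other architectures it falls back to the gcc builtin __sync_val_compare_and_swap(). In both cases it returns the previous content of *ptr (TEMP), so the caller compares the result with the expected value, as in the switch statement above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CAS wrapper: if *ptr == oldval, store newval.
   Always returns the value *ptr held before the operation. */
static inline int32_t example_cas_int32(volatile int32_t *ptr,
                                        int32_t oldval, int32_t newval)
{
#if defined(__i386__) || defined(__x86_64__)
    int32_t prev;
    /* EAX holds oldval; cmpxchg compares it with *ptr. On success *ptr
       becomes newval, otherwise *ptr is loaded into EAX. Either way EAX
       ends up holding the previous value of *ptr. */
    __asm__ __volatile__("lock; cmpxchgl %2, %1"
                         : "=a"(prev), "+m"(*ptr)
                         : "r"(newval), "0"(oldval)
                         : "memory", "cc");
    return prev;
#else
    /* Non-x86 fallback: the gcc/clang builtin. */
    return __sync_val_compare_and_swap(ptr, oldval, newval);
#endif
}
```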

6.1.3 Processor Memory Consistency

In the previous section, we described how the compiler can be prevented from reordering our carefully sequenced program. The next step is to ensure that the processor does not destroy our efforts by executing instructions out of order, causing other processors in an SMP system to see corrupted data.

The old DEC Alpha architecture and ARM-based CPUs, now popular in mobile computing, have very weak memory ordering models: they aggressively reorder memory accesses to gain performance [7, appendix C.7.1]. With out-of-order execution, a CPU is free to execute independent instructions while waiting for the datum of a previous load instruction to arrive. This way, independent units inside the processor (e.g. the FPU and the integer ALU) can continue to work instead of idling when the current instruction does not concern them.

Today's multi-processor machines can execute hundreds of instructions in the time needed to fetch a single datum from memory, so instruction reordering yields an even greater speedup. The gap between processor and memory performance has grown steadily since 1980 [35, p. 289].

To reduce communication with the slower levels of the memory hierarchy (CPU registers to L1/L2/L3 cache, cache to main memory), processors have store and load buffers between the registers and the level 1 cache. In contrast to the caches, a buffer entry holds only a single datum, not an entire cache line mirroring 64 bytes of main memory.

[Figure: CPU 0 and CPU 1, each with its own cache, connected through an interconnect to memory; one variant with a store buffer between each CPU and its cache, one without.]

Figure 6.1: Simplified CPU with and without Store Buffer (Figure based on [7])

At the same time, store buffers are not part of the cache coherency protocol that ensures a consistent view of memory between the processors in an SMP system. The important consequence is that the programmer must take care that the buffer is flushed to the cache at the right time if other processors are to see the update in time.

In contrast to ARM, PowerPC or Alpha CPUs, Intel x86-32/64 processors have a rather strong memory ordering model, while still using a store buffer per logical CPU (see Fig. 6.1 right). This means that developers have fewer things to take care of when writing parallel code than on ARM, but the CPU needs more logic to provide a strong memory model while also applying optimizations such as out-of-order execution and load/store buffers.


However, the manufacturer's reference manual on the Intel x86 memory model [32, ch.8 p.6] is still extensive and complicated.

While the ARM/PowerPC memory model is formally defined, there has been some effort to provide the same for the x86 architecture. The most recent description of the x86 memory model is called x86-TSO (Total Store Order) [36]. x86-TSO is not official documentation from Intel or AMD; it was specified by researchers based on vendor documentation, assumptions and extensive testing on real hardware.

Total Store Order means that the processors agree on a global order of the store (memory write) operations. In contrast to SC (sequential consistency), the strongest memory model, in which all processors share a consistent view of memory at any time, an Intel x86 processor is allowed to buffer write (store) operations. As a kind of exception, a processor is also allowed to read (load) its own stored values before they have become part of the global order.

Because the x86 architecture has changed considerably over its history, x86-TSO describes a model that is valid for all x86 processors, even though some x86 processors may provide a stronger memory ordering.

Store Buffer Effects with x86-TSO The Intel documentation claims in one place: “In general, the existence of the store buffer is transparent to software, even in systems that use multiple processors” [32, ch.11 p.20]. Taken literally, this is not quite true.

Fig. 6.3 shows a simple data race involving the store buffer. The scenario, whose program is shown in Fig. 6.2, features two processes (Tp and Tq) operating on the memory locations [x] and [y], each with its own register %eax. [x], [y] and both %eax registers are initialized with 0.

We see that the load of [y] by Tp and the load of [x] by Tq take place while the stored values are still in the store buffer of the respective other processor. In consequence, both loads can return the initial value (0) instead of 1, an outcome that no interleaving of the two programs in program order could produce.

To fix this problem, one has to insert so-called fences or barriers. They force the store buffer to flush its content before subsequent loads are executed. A corrected version would look like this:


    Tp:                 Tq:
    mov [x], $1         mov [y], $1
    mov %eax, [y]       mov %eax, [x]

Figure 6.2: x86 Assembler test program

[Figure: per-thread timeline of instructions (INSTR), store buffer (SB) and RAM: the stores x=1 and y=1 are held in the store buffers while both loads read 0 from RAM (eax=0); the buffers are flushed to RAM only afterwards.]

Figure 6.3: Timing of x86 assembler test program

    Tp:                 Tq:
    mov [x], $1         mov [y], $1
    mfence              mfence
    mov %eax, [y]       mov %eax, [x]

With a barrier inserted after each memory write, each thread's store buffer is drained before the subsequent load executes. Thus, both threads share a consistent view of memory, and the outcome in which both registers end up 0 is ruled out [37, p. 12].
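The same store-buffering test can be written portably with C11 atomics and POSIX threads (compile with -pthread). In this sketch, whose names are hypothetical, relaxed atomic stores and loads stand for the mov instructions and atomic_thread_fence(memory_order_seq_cst) stands for mfence; compilers typically emit mfence for it on x86-64. With both fences, the C11 memory model forbids the outcome in which both loads return 0, so the counter must stay zero. Without the fences, that outcome becomes allowed and can show up on real x86 hardware, though not necessarily in every run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;   /* [x] and [y] */
static atomic_int go;     /* start signal so both threads race */
static int eax_p, eax_q;  /* the %eax results of Tp and Tq */

static void *example_tp(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&go, memory_order_acquire))
        ;                                                   /* wait */
    atomic_store_explicit(&x, 1, memory_order_relaxed);     /* mov [x], $1   */
    atomic_thread_fence(memory_order_seq_cst);              /* mfence        */
    eax_p = atomic_load_explicit(&y, memory_order_relaxed); /* mov %eax, [y] */
    return NULL;
}

static void *example_tq(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&go, memory_order_acquire))
        ;                                                   /* wait */
    atomic_store_explicit(&y, 1, memory_order_relaxed);     /* mov [y], $1   */
    atomic_thread_fence(memory_order_seq_cst);              /* mfence        */
    eax_q = atomic_load_explicit(&x, memory_order_relaxed); /* mov %eax, [x] */
    return NULL;
}

/* Runs the test `rounds` times and counts the forbidden outcome
   eax_p == 0 && eax_q == 0. Returns -1 if a thread cannot be created. */
static int example_count_both_zero(int rounds)
{
    int both_zero = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t p, q;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        atomic_store(&go, 0);
        if (pthread_create(&p, NULL, example_tp, NULL) != 0)
            return -1;
        if (pthread_create(&q, NULL, example_tq, NULL) != 0)
            return -1;
        atomic_store_explicit(&go, 1, memory_order_release);
        pthread_join(p, NULL);
        pthread_join(q, NULL);
        if (eax_p == 0 && eax_q == 0)
            both_zero++;
    }
    return both_zero;
}
```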

Store Buffer and CMX/Sequence Locks Applying the theory about data races with x86-TSO to the algorithm used in CMX reveals no concurrency problems. For this algorithm, the store buffer indeed stays transparent to software.

[Figure: writer W executes W1 (w, x, 1) followed by W2 (w, y, 1); reader R executes R1 (r, x, _) followed by R2 (r, y, _).]

Figure 6.4: Read and write values x and y


The diagram in Fig. 6.4 is simplified but equivalent to processes that read and write values x and y using CMX. Here, process W writes to CMX values (x and y) in shared memory and process R reads the very same values. As described in section 5.3, a reader in the CMX algorithm never writes to shared memory.

Because TSO ensures that stores are not reordered with other stores and loads are not reordered with other loads, the data always stays valid according to the protocol.

However, things get interesting in a special case. It is described in “Reasoning About the Implementation of Concurrency Abstractions on x86-TSO” [37, p. 16] and applies in the same way to the CMX algorithm:

[…] a program where the reading processors only access memory via the code at Reader is trivially TRF [triangular race free]. However, there are data races between the writer and a reader on […ctr, value], and if the reading processor has written to memory before initiating the read, these become triangular races.

Fig. 6.5 shows the data-races described in the quote.

However, on x86, we will not be able to observe this theoretical problem with CMX, because the locked cmpxchg instruction in TW1, which changes the state from SET to UPDATE, imposes a total memory order.

As marked with the blue line (from T^1_W1 to T^0_W4), the writer on thread 1 cannot progress until the SET written in T^0_W4 has left the store buffer of thread 0.

As an optimization, we could flush the store buffer explicitly at the end of TW4, and thus minimize the time the reader TR on thread 1 spends waiting for the unlock, and with it the failure rate on thread 1.

The locking behavior on the state variable makes the race on ctr (red lines) irrelevant.

Conclusion The documentation of memory ordering for our target x86-32/x86-64 processor architecture is quite complicated, fragmented and leaves room for interpretation. The x86 architecture has a long history with many improvements over time, and current processors still have to remain compatible with chips from the 1980s. The x86-TSO model seems to provide a good reference for the processor behaviour, but on the other hand it is highly academic.


[Figure: events of the two threads. Thread 0, writer: T^0_W1 (CAS, state, SET→UPDATE), T^0_W2 (r, txn, A), T^0_W2 (w, txn, A+1), T^0_W3 (w, value, V), T^0_W4 (w, state, SET); reader: T^0_R1 (r, ctr_j, B'), T^0_R2 (r, state, SET), T^0_R3 (r, value, W'), T^0_R4 (r, ctr_k, B''). Thread 1, reader: T^1_R1 (r, ctr_j, A'), T^1_R2 (r, state, SET), T^1_R3 (r, value, V'), T^1_R4 (r, ctr_k, A''); writer: T^1_W1 (CAS, state, SET→UPDATE), T^1_W2 (r, ctr, B), T^1_W2 (w, ctr, B+1), T^1_W3 (w, value, W), T^1_W4 (w, state, SET).]

Figure 6.5: Reader/Writer with CPU swap

Compared with processors that have weak memory models, x86 is easier to target because of its stronger memory model. From the processor's point of view, however, these guarantees add dependencies and increase the need for inter-core communication, so the overall performance is theoretically lower.

Using memory barriers that are unnecessary on x86 instead of relying on x86-specific behavior costs some performance, but is much easier to handle and more portable. The required memory barriers can be issued using gcc builtin functions on SLC6 and assembler instructions on SLC5.
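A compile-time selection between the two variants might look like the sketch below; the macro name and the version threshold are assumptions based on the compiler versions named above. It keeps the pre-C11 style of CMX (plain and volatile variables instead of atomic types). Note that mfence requires SSE2, so very old 32-bit CPUs would need a different instruction, for example a locked add.

```c
#include <assert.h>

/* Hypothetical full memory barrier, chosen at compile time:
   - gcc >= 4.4 (SLC6) or clang: the __sync_synchronize() builtin;
   - older gcc on x86 (SLC5, gcc 4.1.2): mfence via inline assembler.
   Both variants also act as a compiler barrier. */
#if defined(__clang__) || \
    (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4)))
#  define EXAMPLE_FULL_BARRIER() __sync_synchronize()
#elif defined(__i386__) || defined(__x86_64__)
#  define EXAMPLE_FULL_BARRIER() __asm__ __volatile__("mfence" ::: "memory")
#else
#  error "no full memory barrier defined for this compiler/architecture"
#endif

static int example_data;
static volatile int example_ready;

/* Publish the data, then the flag: the barrier keeps the two stores
   in order for both the compiler and the CPU. */
static void example_publish_flag(int value)
{
    example_data = value;
    EXAMPLE_FULL_BARRIER();
    example_ready = 1;
}
```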


6.1.4 Processor Cache Coherency

Multiprocessor systems use a cache coherency protocol to maintain coherency betweenall the processor caches and the main memory. For high performance and low overheadthe cache management communication must be kept as low as possible.

In our target architectures, the cache line size is 64 bytes. This means that if a processor caches some memory, it loads blocks of 64 bytes from main memory at once. This has to be considered when defining the layout of data structures later on.
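One consequence for the layout is that values written by different processes should not share a cache line; otherwise every update invalidates the line for the other writers (false sharing). A minimal sketch of a padded slot, assuming a 64-byte line and a hypothetical set of fields, could look like this:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_CACHE_LINE 64   /* assumed line size of the target CPUs */

/* Hypothetical value slot padded to exactly one cache line, so that
   writers updating neighbouring slots do not keep invalidating each
   other's cache line. */
struct example_padded_slot {
    alignas(EXAMPLE_CACHE_LINE) uint32_t state;
    uint32_t reserved;
    uint64_t ctr;
    int64_t  value;
    char     pad[EXAMPLE_CACHE_LINE - 3 * sizeof(uint64_t)];
};

static struct example_padded_slot example_slots[4];
```

Memory obtained with mmap() is page-aligned, so slots placed at multiples of 64 bytes from the start of a shared memory object keep this alignment.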

From the cache perspective, the CMX algorithm is equivalent to Sequence Locks (see section 5.4). It is well suited to keep cache communication low, because the reader process never writes to the data [9, p.16]. Therefore, the reader never has to gain exclusive ownership of the cache lines holding the values.

6.1.5 Choosing a Suitable Timesource

Metrics in CMX are time-stamped whenever they are updated. Currently, the timestamps are set automatically using the POSIX real-time API: the clock_gettime() function reads one of the clocks defined in the time.h system header.

Both the 32-bit and 64-bit SLC6 systems offer the following clocks (overview based on theclock_gettime() documentation):

• CLOCK_REALTIME System-wide real-time clock. Real-time in the sense that the values represent the amount of time (in seconds and nanoseconds) since the start of the epoch (01.01.1970).

• CLOCK_MONOTONIC Clock that represents monotonic time, counting from some unspecified starting point.

• CLOCK_PROCESS_CPUTIME_ID Per-process CPU-time clock (CPU time consumed by all threads of the process).

Newer kernels define additional clock sources (CLOCK_MONOTONIC_RAW since Linux 2.6.28, the _COARSE clocks since Linux 2.6.32), but on SLC6 they are exposed only in the 64-bit version.

• CLOCK_REALTIME_COARSE A faster but less precise version of CLOCK_REALTIME.

• CLOCK_MONOTONIC_COARSE A faster but less precise version of CLOCK_MONOTONIC.


• CLOCK_MONOTONIC_RAW Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based time that is not subject to NTP adjustments.

As described in [38, ch. 15.2.1], the _COARSE variants avoid the actual read of the designated timer source and therefore also the switch from user space to kernel space. This has been validated with the perf [39] tool. Instead, they use the so-called vDSO mechanism (see vdso(7)), where actual kernel functions are exposed and then executed in user space.
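The following is a minimal sketch of taking a modification timestamp with clock_gettime(). It prefers CLOCK_REALTIME_COARSE where the system headers define it (for example on 64-bit SLC6) and falls back to CLOCK_REALTIME at compile time or when the kernel rejects the coarse clock at runtime. The helper name `cmx_timestamp_ns` is hypothetical; it returns nanoseconds since the epoch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <time.h>

/* Prefer the coarse clock (no hardware timer read) when it is defined. */
#ifdef CLOCK_REALTIME_COARSE
#define CMX_TIMESTAMP_CLOCK CLOCK_REALTIME_COARSE
#else
#define CMX_TIMESTAMP_CLOCK CLOCK_REALTIME
#endif

/* Hypothetical helper: wall-clock time in nanoseconds since the epoch,
 * or 0 if no clock can be read. */
uint64_t cmx_timestamp_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CMX_TIMESTAMP_CLOCK, &ts) != 0) {
        /* Kernels older than 2.6.32 reject the coarse clock (EINVAL):
         * fall back to the precise clock. */
        if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
            return 0;
    }
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
```

The coarse clock only advances at timer-tick granularity (typically a few milliseconds), which is acceptable for a modification timestamp but not for latency measurements.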

The performance gain of using the _COARSE variant is noticeable. On the virtualized machines used during the development of CMX, it is even higher, presumably because the clock read-out avoids a call to the hypervisor.

Table 6.1 shows the time for updating a 64-bit integer value 5 million times in a single thread in CMX under different conditions. The tests were repeated 10 times with perf, and the percentage gives the variation between these runs.

                 CLOCK_REALTIME        CLOCK_REALTIME_COARSE
native           0.459 s (± 2.51%)     0.353 s (± 6.87%)
virtualized      4.024 s (± 2.20%)     0.421 s (± 4.10%)

Table 6.1: Execution times on virtualized and non-virtualized hosts for different clock sources

The tests were run on up-to-date hardware. The virtualized tests ran in a Microsoft Hyper-V guest system. The native system is equipped with two Intel Xeon X5660 processors at 2.8 GHz.

Because of the effects of the processor caches, these results cannot simply be divided by 5 million to obtain the execution time of a single CMX update. A single update can take much longer if the processor first has to load the CMX data structures from main memory.

Conclusion CMX is intended to be used in real-time applications. These machines are very seldom virtualized and have fast time sources available for use with CLOCK_REALTIME.

Depending on requests from users, CMX might provide a way to pass the update timestamp in the future. There may be applications of CMX where the users want to pass a timestamp obtained from an external source, e.g. an accelerator timing receiver.


6.2 Implementation Overview

CMX is implemented in C. Additionally, there is a C++ library available which acts as a wrapper for the C functions.

6.2.1 The Implementation in C

The implementation of CMX is split into the following modules:

cmx.h             Parent header file, includes the complete public API
shm.h             CMX Components and Values in shared memory
log.h             CMX logging and log redirection functions
process.h         Predefined metrics for the CMX Process Component
registry.h        Lookup of CMX Components
§atomic.h         Implementation of atomic primitives
§common.h         CMX error codes and some common functions
§shm-private.h    CMX data structures in shared memory that are not exposed in the API

Table 6.2: Tabular overview of C source-code header files

For each header file (<name>.h), a corresponding implementation file (<name>.c) exists, except for shm-private.h. The files marked with § are for internal use only and are protected with an #ifdef acting as an include guard.

6.2.2 The C++ API

The API for C++ is a wrapper around the C implementation. It provides more convenient access to CMX while still focusing on real-time suitability and low overhead.

Fig. 6.6 shows a UML Diagram of the classes in the CMX C++ API.

• CmxRef is the base type of a CMX Value reference. It holds the type of the value as an integer but is itself untyped.

• CmxImmutableInt64, …Float64, …Bool and …String are typed references to CMX Values. They can be created from CmxRefs using the cmx_cast function.


• CmxInt64 (without Immutable), …Float64, …Bool and …String are the same but extended with write support. They are created by the newInt64, newFloat64, newBool and newString functions of Component.

• ImmutableComponent can be used to open a CmxComponent of another process. It implements the read-only operations on CMX components. If requested, the implementation can hide all error cases from the user by returning neutral values instead of errors.

• Component adds the modifying create, remove and set operations to ImmutableComponent. If requested, it can also act as a 'dummy', turning all write operations into no-ops.

• Registry is responsible for listing existing CMX components on the local system. Its cleanup() operation removes components of dead processes.

• ProcessComponent can be used to create a CMX Process Component, which exposes predefined metrics about the process. It should be updated by every process before it creates CMX components.

• CmxException Various exception classes map the C error codes.
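The idea behind the untyped CmxRef and the checked cmx_cast can be sketched in plain C. An untyped reference carries its type tag, and a checked cast only hands out a typed reference when the tag matches. The C++ API presumably reports a failed cast with an exception. All type and function names below are hypothetical and not part of the CMX C API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags, mirroring the four CMX value types. */
enum value_type { TYPE_INT64, TYPE_FLOAT64, TYPE_BOOL, TYPE_STRING };

/* Untyped reference: knows the type tag of the value, not its C type. */
struct value_ref {
    enum value_type type;
    void *slot; /* points to the value in shared memory */
};

/* Typed reference for int64 values. */
struct int64_ref {
    int64_t *slot;
};

/* Checked cast: fills *out and returns 0 only if the tag matches,
 * otherwise returns -1 (where a C++ wrapper would throw). */
int cast_to_int64(const struct value_ref *ref, struct int64_ref *out)
{
    if (ref == NULL || ref->type != TYPE_INT64)
        return -1;
    out->slot = (int64_t *)ref->slot;
    return 0;
}
```

Keeping the type check in one cast function means the typed accessors themselves need no further checks on the hot path.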

One way of using CMX in C++ is similar to the C API. Fig. 6.7 shows a self-contained example.

The C++ API can also be used in a more abstract way: it supports mapping a C++ class to a CMX Component with very little effort from the developer. This approach is shown in Fig. 6.8. The class CmxSupport used there is not included in the class diagram of Fig. 6.6 but is also part of CMX-C++.


[Class diagram "The CMX C++ API", namespace CMW::CMX; classes and members as recovered from the figure:]

• CmxRef: -immutableComponent_: ImmutableComponentPtr; +mtime(): uint64_t; +name(): std::string; +type(): int

• CmxImmutableInt64, CmxImmutableFloat64, CmxImmutableBool, CmxImmutableString: +c_type: typedef = CmxTypeInfo<CmxTypeTagInt64 | CmxTypeTagFloat64 | CmxTypeTagBool | CmxTypeTagString>::c_type; +operator c_type()

• MutableComponentMixin: -component_: ComponentPtr

• CmxInt64, CmxFloat64, CmxBool, CmxString: +operator=(value: const c_type &): c_type

• ImmutableComponent: +iterator: typedef = ComponentIterator; +begin(): iterator; +end(): iterator; +getValue(ref: CmxImmutableInt64 &): CmxImmutableInt64::c_type (likewise for Float64, Bool and String); +getValueAsString(ref: CmxRef &): std::string; +name(): std::string; +processId(): int; +setIgnoreErrors(ignoreErrors: bool): void; +isIgnoreErrors(): bool

• Component: +newInt64(): CmxInt64; +newFloat64(): CmxFloat64; +newBool(): CmxBool; +newString(): CmxString; +set(ref: CmxInt64 &, value: CmxInt64::c_type &): void; +set(ref: CmxFloat64 &, value: CmxFloat64::c_type &): void; +remove(ref: CmxRef &): void; +create(name: std::string, ignoreError: bool = false): ComponentPtr; +dummy(): ComponentPtr

• <<typedef>> ComponentPtr: boost::shared_ptr<Component>; <<typedef>> ImmutableComponentPtr: boost::shared_ptr<ImmutableComponent>

• Registry: +iterator: typedef = RegistryIterator; +begin(): iterator; +end(): iterator; +cleanup(); +open(processId: int, componentName: std::string &): ImmutableComponentPtr

• ProcessComponent: +update()

• Exceptions: CmxException, CmxErrorCodeException, ..., CmxCastException

Figure 6.6: C++ Class diagram


#include <ctime>
#include <iostream>
#include <cmw-cmx-cpp/ProcessComponent.h>
#include <cmw-cmx-cpp/Component.h>

using namespace cmw::cmx;

int main()
{
    struct timespec tm;
    tm.tv_sec = 0;
    tm.tv_nsec = 50000000; // sleep 50 ms per iteration

    ProcessComponent::update();

    ComponentPtr component = Component::create("stats");
    CmxInt64 metr_test = component->newInt64("test");

    std::cout << "Enter work-sleep loop" << std::endl;
    for (int i = 0; i < 100; i++)
    {
        metr_test = i; // update metric

        if (i % 50 == 0) ProcessComponent::update(); // update process metrics

        (std::cout << ".").flush();
        nanosleep(&tm, NULL);
    }
}

Figure 6.7: Simple example of using the CMX C++ API

Header:

class Demo1 : CmxSupport
{
    CmxInt64 counterInt;
    CmxFloat64 counterFloat;
    CmxBool counterBool;
    CmxString counterString;
public:
    Demo1();
    void execute();
};

Implementation:

Demo1::Demo1() :
    counterInt(newInt64("Component1", "counter_int")),
    counterFloat(newFloat64("Component1", "counter_float")),
    counterBool(newBool("Component1", "counter_bool")),
    counterString(newString("Component1", "counter_string", 30))
{
    counterInt = 1;
    counterString = "Initializing...";
}

Figure 6.8: Demo of OO abstraction for CMX (excerpt)


6.2.3 Independent Usage of CMX

CMX can be used independently of the CERN infrastructure. The public source code releases contain configuration files for building the libraries and example applications using the scons [40] build system.

Also included is a Python program, based on the C++ API, which allows inspecting all CMX-enabled applications running on a host via the HTTP [14] protocol. The output is formatted either in HTML, intended for humans, or in JSON [41], for integration into existing monitoring systems.

The integration of CMX into the CERN environment is discussed in chapter 7.

6.2.4 Real-Time Compatibility

CMX is designed to meet the requirements of real-time applications. We defined the term “real-time” in TERM 3 and our concrete requirements in TECH 1.

The real-time compatibility of a program can be influenced by different parameters. In the following, we give an overview of possible causes that can harm real-time execution and analyze their impact on CMX.

Priority inversion describes the effect of unintentionally changing the effective priority of a thread or process in a scheduled system. The effect is illustrated in Fig. 6.9 with an example where 3 jobs (threads or processes) access 2 resources concurrently.

In this scenario, the 3 jobs have different priorities. T1, the job with the highest priority, tries to grab a lock on resource 'brown'. However, this resource is currently owned by job T3, which has a 'low' priority. This means that the high-priority job must wait for the low-priority job to finish.

Moreover, before unlocking resource 'brown', the low-priority job T3 must do some work on resource 'blue'. Resource 'blue', however, is owned by the middle-priority job T2. This way, the high-priority job is delayed by two lower-priority jobs.

This scenario does not take scheduling effects into account. Depending on the strategy, the effects may vary, but in the end the system is generally more willing to give time to higher-priority jobs which, in case of priority inversion, cannot make any progress and immediately yield the execution or, even worse, waste valuable compute time by spinning on the lock.

[Figure: priority (High, Middle, Low) of jobs T1, T2 and T3 over time; T1 waits for T3, which in turn waits for T2, before T1 can continue.]

Figure 6.9: Priority Inversion With 3 Threads on 2 Resources

In section 5.1 we concluded that real-time aspects are most important for the get() and set() operations. For get() and set(), priority inversion is not an issue, because they contain neither resource accesses that are vulnerable to priority inversion nor calls to the operating system. The resources in use show no blocking behavior; hence, the effect cannot appear.

In other scenarios, a possible way to solve priority inversion is priority inheritance. Here, the processes that a high-priority process is waiting for temporarily inherit its high priority and can therefore finish earlier, so that the waiting high-priority process can continue sooner.
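With POSIX threads, priority inheritance is requested per mutex through the PTHREAD_PRIO_INHERIT protocol attribute. The following minimal sketch is independent of CMX, whose get()/set() path uses no locks. The helper name `init_pi_mutex` is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>

/* Initialise `mutex` with the priority-inheritance protocol: while a thread
 * holds it, that thread runs at least at the priority of the highest-priority
 * thread blocked on it. Returns 0 on success or an error number. */
int init_pi_mutex(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(mutex, &attr);
    pthread_mutexattr_destroy(&attr); /* the mutex keeps its own copy */
    return rc;
}
```

Inheritance only has a visible effect when the threads run under a real-time scheduling policy such as SCHED_FIFO or SCHED_RR; under the default policy priorities are not strictly enforced.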

Memory Management The malloc()/free() functions are not used inside the set() or get() functions of CMX. However, memory reserved in shared memory is unmapped (not wired to physical memory) by default. This means the 4 KB pages are bound to physical memory only on demand, namely the first time they are accessed.

This way, a process can reserve any addressable amount of memory without immediate effect: the operating system maps the address to physical memory only at the first real data access. This first access is detected through the page fault it causes. Handling the page fault takes time and has to be avoided in a real-time process.


One can force the mapping simply by calling memset() on the whole data structure, which triggers the page-fault handling of the operating system in advance, or by using mlock(), which also guarantees that the pages stay in RAM and are not swapped out to disk. In CMX, however, we actually want to profit from the on-demand behavior, since a CMX Component has a fixed size but is rarely filled completely.

On Linux, the mapping to physical memory can be reversed with a madvise(MADV_REMOVE) (memory advise) call.

System Calls System calls in general can badly harm real-time execution, because the user-space program using CMX has no control over what is happening inside the kernel. For example, the shared memory allocation code uses locks internally to protect the management data structures against concurrent access.

Except for setting up the data structures in shared memory, CMX does not use any system calls.

Foreign/libc-provided functions In principle, every function called in CMX that is not implemented in CMX itself has to be verified in terms of runtime complexity. Fortunately, there are not many, and if we only take those involved in get() and set() operations into account, they are the following:

• printf()/vprintf() for the logging functions (can be disabled).

• memcpy() is used to copy multi-byte data such as character strings.

• getpid() is used to check whether the current process owns a CMX Component.

• clock_gettime() is used to obtain a time value for the modification timestamp.

Logging in CMX involves calling printf() and related functions. String formatting is quite expensive in general; however, depending on the logging level, functions involving printf() are only called in case of errors. In that case, and for warnings depending on the configured log level, logging can also cause side effects and blocking when it is configured to write to stderr. To avoid this, logging can be disabled completely at compile time, for specific or all log levels. This also reduces the binary size considerably.
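Disabling log levels at compile time can be sketched with preprocessor macros. A call below the configured level expands to a no-op, so neither the printf() formatting nor the format string ends up in the binary. The macro names, the level constants and the `CMX_LOG_LEVEL` switch are hypothetical; the actual CMX macros may look different.

```c
#include <stdarg.h>
#include <stdio.h>

#define LOG_LEVEL_ERROR 1
#define LOG_LEVEL_WARN 2
#define LOG_LEVEL_DEBUG 3

/* Hypothetical build switch, e.g. -DCMX_LOG_LEVEL=0 removes all logging. */
#ifndef CMX_LOG_LEVEL
#define CMX_LOG_LEVEL LOG_LEVEL_WARN
#endif

static int log_calls; /* counts emitted messages (for illustration) */

static void log_write(const char *level, const char *fmt, ...)
{
    va_list ap;
    log_calls++;
    fprintf(stderr, "[%s] ", level);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
}

/* Each macro compiles to nothing when its level is disabled, so its
 * arguments are not even evaluated. */
#if CMX_LOG_LEVEL >= LOG_LEVEL_ERROR
#define LOG_ERROR(...) log_write("ERROR", __VA_ARGS__)
#else
#define LOG_ERROR(...) ((void)0)
#endif
#if CMX_LOG_LEVEL >= LOG_LEVEL_WARN
#define LOG_WARN(...) log_write("WARN", __VA_ARGS__)
#else
#define LOG_WARN(...) ((void)0)
#endif
#if CMX_LOG_LEVEL >= LOG_LEVEL_DEBUG
#define LOG_DEBUG(...) log_write("DEBUG", __VA_ARGS__)
#else
#define LOG_DEBUG(...) ((void)0)
#endif
```

With the default level shown here, error and warning messages are compiled in, while every LOG_DEBUG call disappears from the binary.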

The runtime of memcpy() is directly proportional to the amount of memory to be copied, provided that this memory is initialized (wired). This is the case after calling add() in CMX.


While getpid() is generally translated into a system call, the glibc versions shipped with SLC5 and SLC6 cache the value and only update it after a fork() call (newer glibc releases, from 2.25 on, no longer cache it).
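The ownership check from the list above can be sketched as a comparison between the process ID stored in the component header and getpid(). The struct and field names are hypothetical and not the CMX shared-memory layout.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical component header in shared memory. */
struct component_header {
    pid_t owner_pid; /* process that created the component */
    char name[64];
};

/* Only the creating process may modify a component: compare the stored
 * owner with the caller's PID (cheap if libc caches the PID). */
bool is_owner(const struct component_header *hdr)
{
    return hdr->owner_pid == getpid();
}
```

If getpid() is a real system call on a given platform, the value could instead be read once at start-up and refreshed after fork(); this sketch simply calls it each time.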

The clock_gettime() function uses the vDSO (virtual Dynamic Shared Object) on Linux, which is actual kernel code, linked and executed in user space. This allows a low-overhead call that is almost constant in time, since it translates into reading a value protected by a Linux SeqLock (see section 5.4).

6.2.5 Automated Testing

CMX is embedded into the BE-CO integration test environment based on Bamboo, a continuous integration (CI) server application by Atlassian Software [42].

The CI server is connected to the source code management system and triggers the execution of test plans according to changes in the source code.

The automated tests are split into tests of the C API (including a test code-coverage report), the C++ API and an integration test performing interprocess communication between all recent CMX versions to ensure backward compatibility.

LCOV code coverage report (current view: top level, cmw-cmx; test: unnamed; date: 2014-04-17)

               Hit    Total    Coverage
Lines          481    687      70.0%
Functions      47     54       87.0%

Filename        Line coverage         Function coverage
atomic.c        100.0%  (28/28)       100.0%  (11/11)
common.c         92.0%  (23/25)       100.0%  (4/4)
log.c           100.0%  (48/48)       100.0%  (6/6)
registry.c       60.8%  (48/79)       100.0%  (6/6)
shm-private.h     0.0%  (0/47)          0.0%  (0/6)
shm.c            72.2%  (328/454)      94.4%  (17/18)
shm.h           100.0%  (6/6)         100.0%  (3/3)

Generated by: LCOV version 1.10

Figure 6.10: Results of the coverage analysis (CMX 2.0.4)


Unit tests using Google Test The tests using the Google Test framework are intended to cover all possible usage scenarios of CMX and ensure the correctness of the code. The coverage of the tests can be verified using code coverage analysis tools such as gcov with lcov [43].

Fig. 6.10 shows the code-coverage result for the CMX-C source code. The low coverage of registry.c and shm.c is due to error handling, where possible errors cannot easily be triggered in unit tests. The file shm-private.h contains some functions which are not used in production, so this result can be ignored.

Valgrind tests Valgrind is a machine code execution engine which can apply transformations before actually executing the code.

The most popular tool in the Valgrind family is memcheck. It keeps track of the memory allocated by the program. If the program makes an illegal access to this memory, memcheck prints a detailed error report describing what is going on, including a stack trace.

Furthermore, Valgrind can be used to apply wrapper functions to already defined symbols. This way, library functions such as the libc system-call wrappers can easily be wrapped without having to deal with dynamic linking.

We created tests where the ftruncate() function is overridden by a mock implementation. The actual test, which executes cmx_shm_create() while my_ftruncate() is called instead of ftruncate(), then looks like:

WITH_WRAPPER(ftruncate, my_ftruncate)
{
    cmx_shm * cmx_shm_ptr;
    my_ftruncate_result.ret = -1;
    ASSERT_EQ(E_CMX_CREATE_FAILED, cmx_shm_create("foo", &cmx_shm_ptr));
    ASSERT_NE(0, my_ftruncate_result.fd);
    ASSERT_NE(0, my_ftruncate_result.length);
}

The asserts in this example verify the correct error handling and whether the correct parameters have been passed to the wrapper of ftruncate().

Checking of struct size and packing The constant layout of the shared memory structures is crucial for the correct operation of shared memory applications like CMX. Two versions of CMX in which the shared memory structures are compiled differently will certainly lead to problems at run-time.


There is no syntactic way in the C language to guarantee that the compiler will always create the same memory layout. A compiler optimizing for size might choose the densest packing of the fields, while another one optimizing for fast access puts the fields in the best alignment for the processor.

In CMX, checks verifying the constant in-memory layout of the data structures are mandatory and executed at compile time. All builds must fail if the checks are not successful.

This is implemented in functions which will not be used later on, but have to be syntactically correct anyway. The checks examine the offsets and the overall size of all structs and compare them to user-defined expected values.

The size is obtained using the sizeof(struct type) operator and the offset using the offsetof(struct type, field) macro. Both yield compile-time constants whose values are calculated by the compiler. A condition comparing two constant values is resolved at compile time; if it holds, the false branch is eliminated and never compiled into the binary.

In the false branch of the condition, we place a call to a function which is artificially marked as erroneous using gcc attributes. The whole process is supported by preprocessor macros: if the false branch is taken, i.e. a check fails, gcc aborts the compilation with a message about the cause of the failure.

A set of tests for a struct looks like:

static_assert_eq(0U, offsetof(cmx_shm_value, _int64), "Check Offset");
static_assert_eq(0U, offsetof(cmx_shm_value, _float64), "Check Offset");
static_assert_eq(0U, offsetof(cmx_shm_value, _bool), "Check Offset");
static_assert_eq(0U, offsetof(cmx_shm_value, _string), "Check Offset");
static_assert_eq(8U, sizeof(cmx_shm_value), "size of cmx_shm_value");
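The definition of static_assert_eq is not shown here. The following alternative sketch achieves the same compile-time failure with the classic negative-array-size trick, which works with any C compiler. It is a swapped-in technique, not the gcc error-attribute mechanism described above; C11 code would simply use _Static_assert. The union `value_union` is a hypothetical stand-in for cmx_shm_value, and its string member is an assumed descriptor.

```c
#include <stddef.h>
#include <stdint.h>

#define CMX_CONCAT_(a, b) a##b
#define CMX_CONCAT(a, b) CMX_CONCAT_(a, b)

/* Fails to compile ("size of array is negative") if expected != actual.
 * __LINE__ makes each typedef name unique; `msg` only documents the check. */
#define STATIC_ASSERT_EQ(expected, actual, msg) \
    typedef char CMX_CONCAT(static_assert_line_, __LINE__) \
        [((expected) == (actual)) ? 1 : -1]

/* Hypothetical stand-in for cmx_shm_value: all members start at offset 0. */
typedef union {
    int64_t _int64;
    double _float64;
    uint8_t _bool;
    struct { uint32_t offset; uint32_t length; } _string; /* assumed layout */
} value_union;

STATIC_ASSERT_EQ(0U, offsetof(value_union, _int64), "Check Offset");
STATIC_ASSERT_EQ(0U, offsetof(value_union, _float64), "Check Offset");
STATIC_ASSERT_EQ(0U, offsetof(value_union, _bool), "Check Offset");
STATIC_ASSERT_EQ(0U, offsetof(value_union, _string), "Check Offset");
STATIC_ASSERT_EQ(8U, sizeof(value_union), "size of value_union");
```

Changing, for example, the expected size to 16U turns the last line into a compile error, which is exactly the build failure the text requires.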

6.2.6 Performance Analysis

Latencies The latencies of CMX highly depend on the actual setup. Today's systems are seldom single-processor systems but most often multi-core systems with more than 4 processor cores. The performance of shared-memory applications depends on architecture-specific features and can vary among different generations of Intel/AMD processors. Most importantly, it depends on the main workload running on a machine.

As long as a CMX writer thread is the only workload on a system, it obviously runs at very high speed, since all the operations take place inside the processor's Level 1 cache.


Once another thread on a second processor core reads the CMX Values, this core needs access to the cache lines holding them. At the same time, the writing thread must regain exclusive ownership of the cache line to modify the value. Writing to the cache line then invalidates all copies that have previously been shared with the reading thread. This activity creates cache-coherence protocol communication between the cores and hence adds latency.

When CMX is embedded into another application, its data will probably not stay in the Level 1 cache. Therefore, the values may need to be fetched from DRAM first, which results in an additional delay. A DRAM access can be estimated at around 60 ns [44, p. 22]. Int64/Float64/Bool values fit into one cache line. Character string values are slower, depending on their size.

In general, the cost of a CMX update is comparable to that of similar memory operations; updating a value in CMX is only slightly more expensive than a plain memory write. The overhead is created in the reader/writer access protocol by memory fences and the compare-and-swap that mutually excludes the writers.

Validation of memory behavior As described in section 5.2, we expect the operating system to allocate the memory for the shared memory used in CMX on demand. This can easily be verified by creating a CMX component through the CMX API and adding values while recording the current memory usage.

The results are plotted in Fig. 6.11. The values were taken on an SLC6 system and obtained from the getrusage() [21] function. The plots show the values over time while CMX values are added to a CMX Component.

The upper plot shows the current memory usage minus the value at start time (blue), together with the calculated size, i.e. the size of the cmx_value struct multiplied by the number of values.

The bottom plot shows the number of page faults of the process. This aligns nicely with the increase of the actual memory usage. Every first access to an uninitialized page triggers a page fault; as a consequence, the operating system jumps in and allocates the memory as needed.
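A measurement similar to the one behind Fig. 6.11 can be approximated with getrusage(). The sketch counts the minor page faults (ru_minflt) the process incurs while touching freshly mapped shared pages for the first time. It uses an anonymous MAP_SHARED mapping instead of a CMX component, so the setup and the function name are assumptions. Roughly one fault per page is to be expected, but the exact count depends on the kernel.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults of the calling process so far, or -1 on error. */
static long minor_faults(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_minflt;
}

/* Maps `pages` fresh shared pages, writes one byte to each and returns how
 * many minor page faults this caused (-1 on error). */
long faults_for_first_touch(size_t pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page, off;
    long before, after;
    volatile char *touch;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    touch = mem;
    before = minor_faults();
    for (off = 0; off < len; off += page)
        touch[off] = 1; /* first access: page fault, page gets wired */
    after = minor_faults();
    munmap(mem, len);
    return (before < 0 || after < 0) ? -1 : after - before;
}
```

Calling it with increasing page counts and plotting the result reproduces the shape of the lower plot: the fault count grows linearly with the amount of memory actually touched.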


[Figure: upper panel, resident set size [kB] (0 to 500) versus number of metrics (0 to 4000), with the curves "maxrss minus init" and "calculated size"; lower panel, number of page faults (about 280 to 440) versus number of metrics.]

Figure 6.11: Change of memory usage depending on number of metrics

6.2.7 Possible Extensions

Direct increment operations Feedback from early adopters indicated that it would be handy to have a call for integer values which directly increments the value by one. Otherwise, users who do not keep a copy of the value themselves must execute a get() operation before every update of the value.

While this is implemented in the C++ part using a separate get() before set(), it would certainly make sense to support this operation in one transaction. Without such an “atomic” increment of CMX values, a scenario with multiple writer threads is not possible without risking lost updates.
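A single-transaction increment for an int64 value can be sketched with the gcc `__sync_fetch_and_add()` built-in. Unlike a get() followed by a set(), it cannot lose updates when several threads increment concurrently. This is a hypothetical extension, not part of the CMX API. A complete version would also have to update the modification timestamp and cooperate with the reader protocol, for example by bumping the sequence counter.

```c
#include <stdint.h>

/* Hypothetical int64 metric slot in shared memory. */
struct int64_metric {
    volatile int64_t value;
};

/* Racy variant (what get() + set() amounts to): two threads can read the
 * same old value, and one of the two increments is lost. */
void increment_racy(struct int64_metric *m)
{
    int64_t old = m->value; /* get() */
    m->value = old + 1;     /* set() */
}

/* Atomic variant: a single locked read-modify-write (lock xadd on x86).
 * Returns the new value. */
int64_t increment_atomic(struct int64_metric *m, int64_t delta)
{
    return __sync_fetch_and_add(&m->value, delta) + delta;
}
```

On 32-bit x86, the 64-bit built-in requires a target with cmpxchg8b (i586 or later).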

Garbage collection In the current implementation, the space for the values stays uninitialized until its first use (by add()). After a remove() call, the memory is not uninitialized again and thus stays in physical memory. Currently this is not a big problem, since so far most users populate their values once at start time and do not change anything later on.

Uninitializing a single value is impossible, since this can only be done for whole memory pages at once. A memory page on Linux/x86 is 4 KB, whereas a CMX Value is 128 bytes. Therefore, many adjacent values must be freed before their page can be uninitialized.

The foreseen approach for this is to provide a call that searches for values in aligned blocks greater than or equal to the page size, marks them with a special status for garbage collection, informs the operating system that this memory is free using madvise(MADV_DONTNEED) and then sets the status to free.

Dynamic character string size The current implementation requires the maximum string length to be determined at the time a new CMX Value is added. When the value is set, the exact length is recorded, but it must be less than or equal to the previously set maximum size.

This behavior could be made more dynamic by allocating the slots for storing the data as needed when updating the value. A string value would then only consume the amount of space it really needs.

On the other hand, this introduces the risk of update failures at runtime, and the developer should not be obliged to handle such situations. The same behavior can be emulated by removing the value for a short time and then adding a new one under the same name with a different size.

If this happens for more than one value, it will also increase the fragmentation of character value slots.

6.3 Conclusions

This chapter covered the main part of the work in this thesis. Starting from the ideas and the technical foundations prepared in the previous chapters, we transformed the CMX algorithm/protocol from theory into an implementation using the C and C++ programming languages and POSIX SHM objects.

During the implementation, we observed different kinds of issues across the involved technology stack. We described and solved different aspects, ranging from hardware characteristics and programming techniques to compiler tricks.

Certainly, the whole issue around the memory model could be solved more elegantly if C11 were available, but this will not be the case for the many development setups targeting the SLC5 toolchain throughout the coming years.


7 Integration in CERN Infrastructure

This chapter describes the essential steps in the integration of the monitoring and diagnostic capabilities of CMX into established tools at CERN. This includes the integration into the DIAMON system (see requirement FUNC 1, p. 12) and tools for the diagnostics use case (FUNC 3).

7.1 A Remote Agent for CMX

The CMX library includes command-line tools which can be used to inspect, read and create CMX values on the local machine. In many cases, users expect to access the values exposed by CMX remotely. For example, in the Java world the graphical JConsole tool can be used to inspect a process over a network. This is implemented by direct communication with the process over a TCP connection.

In contrast to JMX, CMX does not provide networking capabilities. Instead, there has to be a process, a CMX Agent, which connects to the shared memory data structures managed by CMX and responds to requests over a network.
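On the agent side, reading a component boils down to opening the POSIX SHM object read-only and mapping it with PROT_READ, so the agent can never modify the data it monitors. The following is a minimal sketch. The function name is hypothetical, and so is the object name used in the test, since the CMX naming scheme for SHM objects is not described here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an existing POSIX SHM object read-only. On success, returns the
 * mapping and stores its size in *size; returns NULL on error. */
const void *map_component_readonly(const char *shm_name, size_t *size)
{
    struct stat st;
    void *addr;
    int fd = shm_open(shm_name, O_RDONLY, 0);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    if (addr == MAP_FAILED)
        return NULL;
    *size = (size_t)st.st_size;
    return addr;
}
```

Because the mapping is shared, the agent sees every update of the writing process without any copying; consistency of individual values still depends on the reader protocol.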

In the accelerator controls infrastructure, there is already a daemon running on almost every machine: the “clic” agent. It is a system monitoring daemon with a plugin architecture, for which hardware-specific modules exist. All “clic” agents report the acquired metrics to a message broker dedicated to monitoring.

Due to the modular architecture of the agent, it was easy to extend its functionality. With the new CMX module, monitoring CMX Values is supported in the same standardized way as other system metrics. Additionally, custom commands can be used to list the available CMX Components and their values.

Standardized metrics are transmitted over a message broker connection using a “clic”-specific, self-describing protocol. The CMX-related remote commands respond with JSON [41] formatted messages.


7.1.1 Diagnostic Access in the DIAMON GUI

To provide easy diagnostic access to CMX values, a graphical user interface has been developed. The MX-Viewer is available either as a standalone application or integrated into the host-centric view of the DIAMON Console user interface (see Fig. 7.2).

To make the workflow more general and reach a higher user acceptance for CMX, the MX-Viewer also supports read-only access to Java programs using JMX. In this respect, it duplicates the functionality of the JConsole tool for accessing MBean attributes.

7.1.2 Monitoring of CMX Enabled Applications in DIAMON

DIAMON is based on the C2MON [45, 46] SCADA monitoring system, which uses distinct modules to access different kinds of data sources, e.g. one for network equipment via SNMP and one for Java applications via JMX.

For the integration of CMX into DIAMON, it was not necessary to create a new accessmodule for C2MON, since the one for accessing the “clic” agent can be reused.

CMX metrics are addressed using a generic metric naming format. The agent reads the metric configuration from the database, where each field is formatted as:

<modulename>{arguments}

For example: cmx.metric{process-name,component-name,value-name}

Real example: cmx.metric{CGFS_COHAL,COMPONENT,lun1.LatestCall}
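Splitting such a specifier into module name and arguments is straightforward. The following is a minimal C sketch. It assumes that arguments never contain commas or braces themselves. The helper `parse_metric_spec` and its size limits are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPEC_MAX_ARGS 8
#define SPEC_MAX_LEN 64

/* Parsed form of "<modulename>{arg1,arg2,...}". */
struct metric_spec {
    char module[SPEC_MAX_LEN];
    char args[SPEC_MAX_ARGS][SPEC_MAX_LEN];
    int nargs;
};

/* Splits e.g. "cmx.metric{P,C,V}" into "cmx.metric" and {"P","C","V"}.
 * Returns 0 on success, -1 if the input is malformed or too long. */
static int parse_metric_spec(const char *s, struct metric_spec *out)
{
    assert(s != NULL && out != NULL);
    size_t slen = strlen(s);
    const char *open = strchr(s, '{');
    /* need a non-empty module name and a closing '}' at the very end */
    if (open == NULL || open == s || s[slen - 1] != '}')
        return -1;
    size_t mlen = (size_t)(open - s);
    if (mlen >= SPEC_MAX_LEN)
        return -1;
    memcpy(out->module, s, mlen);
    out->module[mlen] = '\0';

    const char *end = s + slen - 1;        /* the closing '}' */
    const char *p = open + 1;
    out->nargs = 0;
    while (p <= end) {
        const char *comma = memchr(p, ',', (size_t)(end - p));
        const char *stop = comma != NULL ? comma : end;
        size_t alen = (size_t)(stop - p);
        if (out->nargs == SPEC_MAX_ARGS || alen >= SPEC_MAX_LEN)
            return -1;
        memcpy(out->args[out->nargs], p, alen);
        out->args[out->nargs][alen] = '\0';
        out->nargs++;
        p = stop + 1;                      /* skip the ',' or step past '}' */
    }
    return 0;
}
```

The module name selects the agent module, and the argument list is passed on to it. For the real example above, this yields the module `cmx.metric` with the three arguments `CGFS_COHAL`, `COMPONENT` and `lun1.LatestCall`.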

The monitoring of CMX values is configured in the CERN Controls database. The configuration string for identifying the metric can be generated in the MX-Viewer (see Fig. 7.3).

Since CMX integrates tightly into the monitoring system, CMX values can be analyzed with the provided tools like any other monitored value, and the alarm-trigger rules apply to them as well.


[Figure: on a computer, C and C++ processes expose their components (example values: clients=10, packets_lost=0, commits_failed=0, txns_per_s=134, no_threads=10) through the CMX library in shared memory. The CMX agent (STOMP) reads them and provides remote access for the accelerator monitoring system and the CMX remote viewer; a command-line tool provides local access. Users: software engineer, operator.]

Figure 7.1: CMX remote access using an agent

Figure 7.2: The MX-Viewer, running embedded in the DIAMON Console, showing live values from a CERN BE-CO C++ program


Figure 7.3: Configuration of a CMX Metric. The MX Viewer (on top) generated the value specifier for the web-based controls configuration interface (at the bottom).

7.2 Interaction of CMX with Build Tools

C and C++ projects for CERN accelerator controls are built with a common build system based on make. The system enforces conventions about the source directory layout and the structure of released products. Releases are copied to a common repository, similar to Maven [47] but far simpler: it does not involve a server process, and all data is shared over NFS. Only one repository can be used at a time, and there is no dependency management in place.

Compared to the standards known from Java, the build process of C/C++ libraries and applications has serious shortcomings in release and dependency management. In fact, the system provides no support for these tasks. Without exceptional care, it is easy to mix up the dependencies between the various products, and most software projects in BE-CO depend on many libraries.

Unlike Java programs, which report errors through various RuntimeExceptions with detailed information, C/C++ programs do not fail gracefully. A program affected by a linking error may simply crash without giving any useful information about the cause. In particular, the C toolchain does not encode a function's signature in its symbol name, so mismatched declarations go undetected, and the linker does not warn about duplicate symbols across the libraries in its search path. This creates even more scenarios that are very difficult to debug.

Dependency verification using a manifest To improve the current situation, we established a non-intrusive extension to the build system that collects dependency and compiler information throughout the build process. This information is called the manifest.

The manifest contains metadata about the project, describing the external resources and dependencies used at build time. This data is then propagated through each dependent build process, so that in the end it gives insight into the complete build chain.

The manifest is also embedded into the resulting binaries. These are currently copied to the servers without any metadata and hence lack any self-identifying information. Embedding the manifest prevents situations where one cannot determine which version of a piece of software is actually running on a live system.

Format of a manifest file During the compilation of each product, a manifest file is generated as a plain-text file. The following snippet shows an example manifest file:

```
$Manifest: name=fesa-deploy-unit/CGAFG_DU:1.0.3@2014-05-06T08:40:09+0000;
dep=fesa-class/CGAFG:1.0.3@2014-05-06T07:40:25+0000;
user=...;
compiler=i386-redhat-linux-g++ (GCC) ...;
os=Linux 2.6.32-...;
cpu=L865 $
$ManifestDep:
name=fesa-class/CGAFG:1.0.3@2014-05-06T07:40:25+0000;
dep=;
user=...;
compiler=i386-redhat-linux-g++ (GCC) ...;
os=Linux 2.6.32-...;
cpu=L865 $
```

The format is modeled on the RCS keywords [48, sec. 2.4], which can be identified with the UNIX tool ident. ident can be run on any file: it searches for RCS keywords and prints them one per line. RCS keywords are formatted like $keyword: data $. The manifest uses the keyword “Manifest” for the product itself and “ManifestDep” for dependent libraries.
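The following C sketch mimics the matching that ident performs. It makes a simplifying assumption: a keyword is alphanumeric and its data contains neither '$' nor a newline. GNU ident's exact rules may differ in details. Because the function takes an explicit length, it also works on binaries containing NUL bytes. `next_keyword` is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Finds the next "$keyword: data $" in buf[*pos..len), similar to what the
 * ident tool prints. Assumes the keyword is alphanumeric and the data holds
 * neither '$' nor a newline. On a match, stores its start and length
 * (both '$' included), advances *pos past it and returns 1; else returns 0. */
static int next_keyword(const char *buf, size_t len, size_t *pos,
                        const char **start, size_t *n)
{
    assert(buf != NULL && pos != NULL && start != NULL && n != NULL);
    for (size_t i = *pos; i < len; i++) {
        if (buf[i] != '$')
            continue;
        size_t j = i + 1;
        while (j < len && isalnum((unsigned char)buf[j]))
            j++;
        /* need at least one keyword character, then ": " */
        if (j == i + 1 || j + 1 >= len || buf[j] != ':' || buf[j + 1] != ' ')
            continue;
        size_t k = j + 2;
        while (k < len && buf[k] != '$' && buf[k] != '\n')
            k++;
        /* the data must be closed by " $" on the same line */
        if (k >= len || buf[k] != '$' || buf[k - 1] != ' ')
            continue;
        *start = buf + i;
        *n = k - i + 1;
        *pos = k + 1;
        return 1;
    }
    *pos = len;
    return 0;
}
```

Calling `next_keyword` in a loop until it returns 0 lists every keyword in a file, which is essentially what ident prints. This is why the manifest lines remain identifiable in libraries and executables without any CMX-specific tooling.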

CMX uses one keyword line per product. Each product ships all keyword lines of its dependent libraries. Their $Manifest:$ line is transformed into a $ManifestDep:$ line, and their name= value is added to the dep= attribute of the depending product.
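The propagation step can be sketched as two string operations on a dependency's manifest line: renaming the keyword and extracting the name= value. This is a hypothetical illustration (`to_dep_line`, `extract_name`), not the build system's actual implementation. The example does not show how several entries are separated inside dep=, so the sketch stops short of building that list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrites a dependency's "$Manifest: ..." line into "$ManifestDep: ...";
 * lines that already are ManifestDep entries are copied unchanged.
 * Returns 0 on success, -1 on unknown input or if cap is too small. */
static int to_dep_line(const char *line, char *out, size_t cap)
{
    static const char own[] = "$Manifest:";
    static const char dep[] = "$ManifestDep:";
    int r;
    assert(line != NULL && out != NULL && cap > 0);
    if (strncmp(line, own, sizeof own - 1) == 0)
        r = snprintf(out, cap, "%s%s", dep, line + sizeof own - 1);
    else if (strncmp(line, dep, sizeof dep - 1) == 0)
        r = snprintf(out, cap, "%s", line);
    else
        return -1;
    return (r < 0 || (size_t)r >= cap) ? -1 : 0;
}

/* Copies the name= value (up to the next ';') into out. */
static int extract_name(const char *line, char *out, size_t cap)
{
    const char *p = strstr(line, "name=");
    if (p == NULL)
        return -1;
    p += strlen("name=");
    size_t n = strcspn(p, ";");
    if (n == 0 || n >= cap)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

Applied to the dependency in the example above, the extracted name is exactly the value that appears in the dep= attribute of fesa-deploy-unit/CGAFG_DU.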


The name= attribute is formatted as project-name/product-name:version@timestamp. This is similar to the GAVC identifiers used in Java environments. In this analogy, it corresponds to group-id/artifact-id:version@timestamp, where the timestamp has no equivalent in the Java world and the classifier has no equivalent in the C/C++ world. The timestamp is used to detect spurious re-releases of the same version, which might cause inconsistencies but are not necessarily fatal.
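Because the timestamp itself contains colons, a parser has to split on the first '/' and then on the first ':' and '@' after it. Below is a minimal C sketch with hypothetical names (`artifact_id`, `parse_artifact_id`, `is_rerelease`) and assumed field size limits. `is_rerelease` flags the case described above, where project, product and version match but the timestamps differ.

```c
#include <assert.h>
#include <string.h>

/* Fields of "project-name/product-name:version@timestamp". */
struct artifact_id {
    char project[64];
    char product[64];
    char version[32];
    char timestamp[32];
};

/* Copies [from, to) into dst; fails on empty or oversized fields. */
static int copy_field(char *dst, size_t cap, const char *from, const char *to)
{
    size_t n = (size_t)(to - from);
    if (n == 0 || n >= cap)
        return -1;
    memcpy(dst, from, n);
    dst[n] = '\0';
    return 0;
}

/* The timestamp contains ':' itself, so split on the first '/', then on the
 * first ':' and '@' after it. Returns 0 on success, -1 otherwise. */
static int parse_artifact_id(const char *s, struct artifact_id *id)
{
    assert(s != NULL && id != NULL);
    const char *slash = strchr(s, '/');
    if (slash == NULL)
        return -1;
    const char *colon = strchr(slash + 1, ':');
    const char *at = strchr(slash + 1, '@');
    if (colon == NULL || at == NULL || colon > at)
        return -1;
    if (copy_field(id->project, sizeof id->project, s, slash) < 0 ||
        copy_field(id->product, sizeof id->product, slash + 1, colon) < 0 ||
        copy_field(id->version, sizeof id->version, colon + 1, at) < 0 ||
        copy_field(id->timestamp, sizeof id->timestamp,
                   at + 1, at + 1 + strlen(at + 1)) < 0)
        return -1;
    return 0;
}

/* Same project, product and version but a different timestamp. */
static int is_rerelease(const struct artifact_id *a, const struct artifact_id *b)
{
    return strcmp(a->project, b->project) == 0 &&
           strcmp(a->product, b->product) == 0 &&
           strcmp(a->version, b->version) == 0 &&
           strcmp(a->timestamp, b->timestamp) != 0;
}
```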

The attribute user= contains the name of the user running the release process, compiler= the C/C++ compiler name and version, os= the operating system, and cpu= the identifier of the target platform.

Figure 7.4: MX-Viewer showing dependency information of a remotely running program

Embedding of structured text The plain-text manifest file is converted to an object-code file named manifest.o using objcopy from GNU Binutils [49]. The manifest.o file is then put into the static library archive together with the rest of the product's object files. It is also linked explicitly into the final executable, which makes it possible to use ident to identify deployed executables.
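The exact objcopy invocation is not given in the text. With GNU objcopy's `-I binary` input mode, the file contents are wrapped into an object file and exposed through symbols such as `_binary_manifest_txt_start` and `_binary_manifest_txt_end`. These names are derived from the input file name, here assumed to be manifest.txt. Since that requires a separate build step, the self-contained C sketch below uses a simpler stand-in technique instead: a string literal kept in the binary with the GCC `used` attribute, which ident finds in the same way. The manifest content and the `get_manifest` accessor are hypothetical.

```c
#include <assert.h>

/* Stand-in for the objcopy-generated manifest.o: the manifest text as a
 * string literal. GCC's "used" attribute keeps it in the binary even if
 * nothing else referenced it, so ident can find it. Content is hypothetical. */
__attribute__((used))
static const char manifest[] =
    "$Manifest: name=example/demo:0.1.0@2014-01-01T00:00:00+0000; dep=; $";

/* Returns the embedded manifest, e.g. to publish it as a CMX value. */
static const char *get_manifest(void)
{
    /* ident only recognises the keyword if it ends with the closing '$' */
    assert(sizeof manifest >= 2 && manifest[sizeof manifest - 2] == '$');
    return manifest;
}
```

Either way, the manifest ends up as plain bytes in the binary. It can therefore be found by ident on disk and read by the running program itself, which is the prerequisite for exposing it remotely.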

The role of CMX in improving the build system of the CERN control system is to make this manifest accessible remotely to operators and other specialists as well, eliminating the need for command-line tools. With this in place, the current configuration can be verified automatically by comparing it with the software that is actually running.
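Such a verification could, for example, compare the release listed in a deployment configuration with the name= value reported by the running program. The sketch below assumes a hypothetical configuration entry of the form project/product:version. `check_release` and its result codes are illustrative and not part of CMX.

```c
#include <assert.h>
#include <string.h>

enum release_check {
    RELEASE_MATCH,
    RELEASE_VERSION_MISMATCH,
    RELEASE_OTHER_PRODUCT
};

/* Compares a configured release "project/product:version" (hypothetical
 * configuration format) with the name= value
 * "project/product:version@timestamp" reported by a running program. */
static enum release_check check_release(const char *expected, const char *running)
{
    assert(expected != NULL && running != NULL);
    const char *colon = strchr(expected, ':');
    if (colon == NULL)                    /* malformed entry never matches */
        return RELEASE_OTHER_PRODUCT;
    size_t plen = (size_t)(colon - expected) + 1;   /* "project/product:" */
    if (strncmp(expected, running, plen) != 0)
        return RELEASE_OTHER_PRODUCT;
    const char *version = colon + 1;
    size_t vlen = strlen(version);
    const char *rv = running + plen;
    /* the version must match exactly, i.e. be followed by the '@' separator */
    if (strncmp(version, rv, vlen) == 0 && rv[vlen] == '@')
        return RELEASE_MATCH;
    return RELEASE_VERSION_MISMATCH;
}
```

Checking for the '@' separator prevents a configured version such as 1.0 from matching a running 1.0.3.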

The MX-Viewer application post-processes the manifest information to give a graphical and easily understandable overview. Fig. 7.4 shows the MX-Viewer focusing on the manifest. The application (here the “clic” agent itself) has two dependencies, cmw/cmx-cpp (the C++ wrapper for the CMX library) and cmw/cmx, and cmw/cmx-cpp itself also depends on cmw/cmx.
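The post-processing into a tree can be sketched as a recursive walk over the products reconstructed from the $Manifest and $ManifestDep lines. In this C sketch, `demo/agent` in the usage test is a hypothetical placeholder for the agent, whose product name is not given. The dependency structure follows Fig. 7.4. Unresolved dependencies are printed as leaves.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPS 4
#define MAX_DEPTH 16

/* A product as reconstructed from one $Manifest or $ManifestDep line. */
struct product {
    const char *name;
    const char *deps[MAX_DEPS];   /* names from dep=, NULL-terminated */
};

static const struct product *find_product(const struct product *all, size_t n,
                                          const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(all[i].name, name) == 0)
            return &all[i];
    return NULL;
}

/* Appends one indented line to the NUL-terminated buffer out. */
static void append_line(char *out, size_t cap, int depth, const char *name)
{
    size_t len = strlen(out);
    snprintf(out + len, cap - len, "%*s%s\n", 2 * depth, "", name);
}

/* Appends the dependency tree below `name` to out, two spaces per level.
 * Names without a manifest entry are printed as leaves; the depth limit
 * guards against cyclic dependency data. */
static void print_tree(const struct product *all, size_t n, const char *name,
                       int depth, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    append_line(out, cap, depth, name);
    const struct product *p = find_product(all, n, name);
    if (p == NULL || depth >= MAX_DEPTH)
        return;
    for (int i = 0; i < MAX_DEPS && p->deps[i] != NULL; i++)
        print_tree(all, n, p->deps[i], depth + 1, out, cap);
}
```

For the dependencies of Fig. 7.4 this prints cmw/cmx twice, once directly below the agent and once below cmw/cmx-cpp, which makes the shared dependency visible at a glance.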

A software library developer can scan all running programs to find out where their software is running and in which version. This helps when taking inventory and supports smooth upgrades of operational software. In problem diagnosis, it provides a standardized, easy-to-use interface to compile-time information.

7.3 Conclusions

CMX was successfully integrated into the existing monitoring system DIAMON. As a result, more than 2,000 hosts can easily be configured to make use of the C/C++ monitoring capabilities provided by CMX.

The modular architecture of both the “clic” monitoring agent and CMX enabled a seamless integration of CMX values with other system metrics. The support for CMX in DIAMON was shipped as a regular update of the DIAMON “clic” daemon. This showed that CMX is ready to be integrated into existing infrastructure.

The new possibilities offered by CMX were successfully combined with efforts to improve the C/C++ build system for the CERN accelerator controls software. With the support of CMX, compile-time information becomes accessible at run time.


8 Summary

The goal of this thesis was to provide a way to expose run-time information from real-time C/C++ applications, such as those of CERN's accelerator control system. We collected requirements and surveyed existing solutions, but found no existing product that fit the requirements.

With the CMX software library developed in this thesis, we were able to fulfill the predefined requirements. The solution combines low-overhead inter-process communication based on shared memory with a non-blocking communication protocol. We have seen that the run-time overhead of CMX is minimal and suitable for timing-critical processes.

The CMX API allows developers to expose run-time information in a standardized way. This removes the burden of developing and maintaining different application-specific tools. The CMX C library has a low memory footprint and increases the code size by only 10-20 KB. The C++ wrapper adds syntactic constructs that provide a simpler and more concise API, while keeping the performance characteristics of the underlying C library.

CMX is integrated into the CERN accelerator controls monitoring infrastructure. User-friendly tools are provided to simplify access and hence reduce the overall diagnostic time. Additionally, users can easily develop their own tools and scripts. At the time of writing, CMX is being adopted in accelerator controls software at CERN.

CMX is also ready to be used in any common Linux environment. All source code is published as an open source project under the LGPL license.

Today, CMX is a good choice for adding monitoring capabilities to any kind of timing-critical C/C++ application. It thus enables measures that increase the overall availability and lower the mean time to recover.


Literature

[1] CERN. LHC the guide. [Online; accessed 10-June-2014]. url: http://cds.cern.ch/record/999421.

[2] CERN. The Accelerator complex/Complexe des accélérateurs. [Online; accessed 10-June-2014]. url: http://cds.cern.ch/record/1621894.

[3] CMS Collaboration. “Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC”. In: Phys. Lett. B716 (2012), pp. 30–61. doi: 10.1016/j.physletb.2012.08.021. arXiv: 1207.7235 [hep-ex].

[4] CERN. The Accelerator Control Group (BE-CO). [Online; accessed 10-June-2014]. url: http://cern.ch/be-dep-co.

[5] W. Buczak et al. Diamon2 - Improved Monitoring of CERN’s Accelerator Controls Infrastructure. Tech. rep. CERN-ACC-2013-0234. [Online; accessed 10-June-2014]. Geneva: CERN, Oct. 2013. url: http://cds.cern.ch/record/1611115.

[6] Felix Ehm et al. CMX – A Generic Solution to Explore Monitoring Metrics. Tech. rep. CERN-ACC-2013-0241. [Online; accessed 10-June-2014]. Geneva: CERN, Oct. 2013. url: http://www.cern.ch/cmx.

[7] Paul E. McKenney. Is Parallel Programming Hard, And, If So, What Can You Do About It? First Edition Release Candidate 4. [Online; accessed 10-June-2014]. Linux Technology Center, IBM Beaverton, 2014. url: https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html.

[8] Poul-Henning Kamp. Timecounters: Efficient and precise timekeeping in SMP kernels. Tech. rep. [Online; accessed 10-April-2014]. The FreeBSD Project, 2002. url: phk.freebsd.dk/pubs/timecounter.pdf.

[9] Christoph Lameter. Effective synchronization on Linux/NUMA systems. In: Gelato Conference. [Online; accessed 10-June-2014]. Silicon Graphics, Inc., 2005. url: https://www.kernel.org/pub/linux/kernel/people/christoph/gelato/gelato2005-presentation.pdf.


[10] Hermann Kopetz. Real-Time Systems: Design Principles for Distributed Embedded Applications. 1st Edition. Norwell, MA, USA: Kluwer Academic Publishers, 1997. isbn: 0792398947.

[11] W. Richard Stevens. UNIX Network Programming: Networking APIs: Sockets and XTI. 2nd. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1997. isbn: 013490012X.

[12] F. Ehm and A. Dworak. “A Remote Tracing Facility For Distributed Systems”. In: Conf. Proc. C111010. CERN-ATS-2011-200 (Oct. 2011), WEMAU001. 4 p.

[13] Michael Kerrisk. The Linux Programming Interface: A Linux and UNIX System Programming Handbook. 1st. San Francisco, CA, USA: No Starch Press, 2010. isbn: 1593272200, 9781593272203.

[14] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing. RFC 7230 (Proposed Standard). [Online; accessed 10-June-2014]. Internet Engineering Task Force, June 2014. url: http://www.ietf.org/rfc/rfc7230.txt.

[15] T. Lundqvist and P. Stenstrom. “Timing anomalies in dynamically scheduled microprocessors”. In: Real-Time Systems Symposium, 1999. Proceedings. The 20th IEEE. 1999, pp. 12–21. doi: 10.1109/REAL.1999.818824.

[16] S. Larsen and H. Wong. JSR 3: Java™ Management Extensions (JMX™) Specification. [Online; accessed 10-June-2014]. 1998. url: https://jcp.org/en/jsr/detail?id=3.

[17] Johannes Hölzl. “Monitoring von Anwendungsservern mit Java Management Extensions (JMX)”. Diplomarbeit. Institut für Informationsverarbeitung und Mikroprozessortechnik, Universität Linz, 2007.

[18] Henrik Storner et al. Xymon systems and network monitor. [Online; accessed 10-June-2014]. url: http://xymon.sourceforge.net/.

[19] SGI/Red Hat. Performance Co-Pilot. [Online; accessed 10-June-2014]. url: http://oss.sgi.com/projects/pcp.

[20] pvbrowser.de. The Process Visualization Browser. [Online; accessed 10-June-2014]. url: https://github.com/pvbrowser/pvb.

[21] The IEEE and The Open Group. The Open Group Base Specifications Issue 7: IEEE Std 1003.1™, 2013 Edition. [Online; accessed 11-June-2014]. 2013. url: http://pubs.opengroup.org/onlinepubs/9699919799/.

[22] Sven C. Koehler. localmemcache. [Online; accessed 10-June-2014]. url: https://github.com/sck/localmemcache.


[23] Michael Stapelberg and contributors. i3 - improved tiling wm. [Online; accessed 17-June-2014; v4.8]. url: http://i3-wm.org.

[24] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2008. isbn: 9780123705914.

[25] Linus Torvalds, Josh Triplett, and Christopher Li. Sparse - a Semantic Parser for C. [Online; accessed 10-June-2014]. url: https://sparse.wiki.kernel.org/index.php/Main_Page.

[26] Hermann Kopetz and J. Reisinger. “The non-blocking write protocol NBW: A solution to a real-time synchronization problem”. In: Real-Time Systems Symposium, 1993, Proceedings. Dec. 1993, pp. 131–137. doi: 10.1109/REAL.1993.393507.

[27] Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman. Linux Device Drivers, Third Edition. [Online; accessed 10-June-2014]. O'Reilly Media, Inc., 2005. url: https://lwn.net/Kernel/LDD3/.

[28] Linux Kernel Developers. Linux. [Online; accessed 01-July-2014]. url: https://www.kernel.org/.

[29] Gerard Holzmann. The SPIN Model Checker: Primer and Reference Manual. First. Addison-Wesley Professional, 2003. isbn: 0-321-22862-6.

[30] ISO IEC JTC1/SC22/WG14 - C. [Online; accessed 11-June-2014]. url: http://www.open-std.org/jtc1/sc22/wg14/.

[31] Eric Eide and John Regehr. Volatiles Are Miscompiled, and What to Do about It. 2008.

[32] Intel®. Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A/B/C (order number 253668, 253669, 326019). 2014.

[33] Intel®. Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A/B/C (order number 325383-050US). 2014.

[34] GCC, the GNU Compiler Collection. [Online; accessed 11-June-2014]. url: https://gcc.gnu.org.

[35] John L. Hennessy and David A. Patterson. Computer Architecture, Fifth Edition: A Quantitative Approach. 5th. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2011. isbn: 012383872X, 9780123838728.

[36] Scott Owens, Susmit Sarkar, and Peter Sewell. “A Better x86 Memory Model: x86-TSO”. In: Theorem Proving in Higher Order Logics. Ed. by Stefan Berghofer et al. Vol. 5674. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2009, pp. 391–407. isbn: 978-3-642-03358-2. doi: 10.1007/978-3-642-03359-9_27. url: http://dx.doi.org/10.1007/978-3-642-03359-9_27.


[37] Scott Owens. “Reasoning About the Implementation of Concurrency Abstractions on x86-TSO”. In: Proceedings of the 24th European Conference on Object-oriented Programming. ECOOP'10. Maribor, Slovenia: Springer-Verlag, 2010, pp. 478–503. isbn: 3-642-14106-4, 978-3-642-14106-5. url: http://dl.acm.org/citation.cfm?id=1883978.1884011.

[38] Lana Brindley, Alison Young, and Cheryn Tan. Red Hat Enterprise MRG 2 Realtime Reference Guide. 2013.

[39] perf: Linux profiling with performance counters. [Online; accessed 10-June-2014]. url: https://perf.wiki.kernel.org/index.php/Main_Page.

[40] Steven Knight et al. SCons - build your software, better. [Online; accessed 11-June-2014]. url: http://www.scons.org.

[41] T. Bray. The JavaScript Object Notation (JSON) Data Interchange Format. RFC 7159 (Proposed Standard). Internet Engineering Task Force, Mar. 2014. url: http://www.ietf.org/rfc/rfc7159.txt.

[42] Atlassian Software. Bamboo Continuous Integration and Build Server. [Online; accessed 11-June-2014]. url: https://www.atlassian.com/software/bamboo.

[43] Peter Oberparleiter et al. lcov - a graphical GCOV front-end. [Online; accessed 11-June-2014]. url: http://ltp.sourceforge.net/coverage/lcov.php.

[44] David Levinthal. Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors. Intel Corporation, 2009.

[45] CERN. C2MON open source SCADA monitoring system. [Online; accessed 10-June-2014]. url: http://cern.ch/c2mon.

[46] M. Braeger et al. High-Availability Monitoring and Big Data: Using Java Clustering and Caching Technologies to Meet Complex Monitoring Scenarios. 2014.

[47] Apache Maven Project. [Online; accessed 01-May-2014]. url: http://maven.apache.org.

[48] GNU RCS Manual. [Online; accessed 11-June-2014]. url: http://www.gnu.org/software/rcs/manual/.

[49] GNU Binutils. [Online; accessed 11-June-2014]. url: http://www.gnu.org/software/binutils/.


Glossary

ARM refers to the ARMv7/v8 computer architecture, designed by ARM Holdings plc. 58, 59, 60

CERN French: Organisation Européenne pour la Recherche Nucléaire. French (originally): Conseil Européen pour la Recherche Nucléaire. English: European Organization for Nuclear Research. 2, 5, 8, 14, 19, 26, 70, 79, 80, 82, 86, 91

DIAMON The Diagnostics and Monitoring system (“DIAMON2”) is the main monitoring system in CERN's accelerator controls infrastructure. 5, 8, 13, 80, 85

Intel Intel Corporation, inventor of the x86 platform. The term Intel platform is also used to refer to the Intel processor architecture. Here we use the term x86-32 for the Intel 32-bit architecture (also called i386) and x86-64 for the 64-bit architecture (also AMD64 or Intel 64). 6, 57, 59, 60

IPC Inter Process Communication. 16, 19, 21, 24, 27

JMX Java Management Extensions - Specification for surveillance and remote control of applications running on the Java Platform. 13, 17, 25, 30, 79, 80

LHC The Large Hadron Collider is the world's largest and most powerful particle collider, located at CERN near Geneva. 1, 2, 3

make Make is a build automation tool whose makefile language is designed to automate build processes. A commonly used implementation is GNU Make. Make can model the dependencies between source files and their corresponding compilation units, as well as between any other file-based entities. 82

NTP The Network Time Protocol is widely used to synchronize computer clocks over IP networks. 65

POSIX Portable Operating System Interface, published by IEEE. 26, 30, 31, 33, 52, 64, 78

SCADA Supervisory Control and Data Acquisition (SCADA) describes the surveillance and control of industrial processes using a computer system. 26

SLC Scientific Linux for CERN, a Linux-based operating system built from the sources of Red Hat Enterprise Linux. 6, 34, 53, 58, 63, 64

SMP A symmetric multiprocessing (SMP) system consists of multiprocessor hardware in which two or more processors connect to a single shared main memory and are controlled by a single OS instance. 24

System V Unix System V. Its interfaces, such as System V IPC, are standardized in POSIX 1003.1-2008 [21]. 26, 32, 33, 34


List of Definitions and Requirements

1 TERM: Roles . . . 10
2 TERM: Metric . . . 10
3 TERM: Real-Time . . . 11
1 FUNC: Monitoring . . . 12
2 FUNC: Development . . . 13
3 FUNC: Diagnosis . . . 13
1 TECH: Real-Time . . . 13
2 TECH: Integration . . . 14
3 TECH: Reusability . . . 14
4 TECH: Portability . . . 14
5 TECH: Easy to use . . . 14
6 TECH: Datatypes . . . 15

1 Verify: Read before Write . . . 43
2 Verify: Read after Write . . . 43
3 Verify: Read overlaps Write . . . 43
4 Verify: Start of Write overlaps Read partially . . . 43
5 Verify: Start of Write overlaps Read . . . 43
6 Verify: Read inside Write . . . 43
7 Verify: Write inside Read . . . 43
8 Verify: The value update is non-blocking . . . 43
9 Verify: The value update fails if another update is in progress . . . 43
10 Verify: The read operation always detects invalid data . . . 43


List of Figures

1.1 CERN Accelerator Complex [2] . . . 2
1.2 Pictures related to the discovery of the Higgs Boson, CMS Collaboration [3] . . . 3

2.1 CERN Accelerator Control System . . . 5

3.1 C++ Monitoring and Diagnostics: Users and their use-cases . . . 12

4.1 General monitoring system architecture . . . 17
4.2 Data acquisition for monitoring systems . . . 18
4.3 Taxonomy of UNIX IPC facilities (Figure is based on [13, p. 878]) . . . 21
4.4 Java VisualVM showing JMX attributes . . . 26

5.1 CMX Host - Process - Component model . . . 29
5.2 Virtual to physical memory addresses . . . 33
5.3 Program flow of a sequence lock implementation . . . 39
5.4 Visualization of data in “Second enhancement” version, over time . . . 40
5.5 Memory structures of CMX (generated using a custom tool on top of sparse [25]) . . . 42
5.6 Overview of the CMX Reader/Writer Protocol with examples . . . 44
5.7 Non Blocking Reader Writer Protocol [26] . . . 46
5.8 Model states, image generated by spin from Listing 5.1 . . . 49

6.1 Simplified CPU with and without Store Buffer (Figure based on [7]) . . . 59
6.2 x86 Assembler test program . . . 61
6.3 Timing of x86 assembler test program . . . 61
6.4 Read and write values x and y . . . 61
6.5 Reader/Writer with CPU swap . . . 63
6.6 C++ Class diagram . . . 68
6.7 Simple example of using the CMX C++ API . . . 69
6.8 Demo of OO abstraction for CMX (excerpt) . . . 69


6.9 Priority Inversion With 3 Threads on 2 Resources . . . 71
6.10 Results of the coverage analysis (CMX 2.0.4) . . . 73
6.11 Change of memory usage depending on number of metrics . . . 77

7.1 CMX remote access using an agent . . . 81
7.2 The MX-Viewer, running embedded in the DIAMON Console, showing live values from a CERN BE-CO C++ program . . . 81
7.3 Configuration of a CMX Metric. The MX Viewer (on top) generated the value specifier for the web-based controls configuration interface (at the bottom). . . . 82
7.4 MX-Viewer showing dependency information of a remotely running program . . . 84


List of Tables

4.1 Comparison of Logging, Monitoring and Diagnostics . . . . . . . . . . . . 20

5.1 Comparison of Shared Memory Implementations . . . . . . . . . . . . . . 34

6.1 Compare execution times on virtualized/unvirtualized hosts regarding the usage of different clock sources . . . 65

6.2 Tabular overview of C source-code header files . . . . . . . . . . . . . . . . 66
