
Service Database (SDB)

system requirements technical specification

author: Sebastian Lopienski (IT/FIO)


with input from: T. Bell, G. Cancio, T. Cass, V. Dore, A. Grossir, V. Lefebure, M. Marques Coelho, U. Schwickerath, B. Tomlin


Document version 1.2 / January 24, 2007

Change log

Version  Date        Changes                                  Author(s)
1.0      15.12.2006  First draft                              S. Lopienski
1.1      19.12.2006  Several comments                         G. Cancio
1.2      24.1.2007   Included many comments and corrections   T. Bell, G. Cancio, T. Cass,
                     – most important:                        V. Dore, A. Grossir,
                     • split between service managers         V. Lefebure, S. Lopienski,
                       and support people                     M. Marques Coelho,
                     • periods in best effort preferences     U. Schwickerath, B. Tomlin
                     • limited access to contact details


Table of contents

1. Introduction
    1.1. ITIL compliance
    1.2. About this document
2. Definitions
3. Data model
    3.1. Services, subservices, metaservices
    3.2. Clusters and nodes
    3.3. Clusters/nodes to services mapping
    3.4. Service class
    3.5. Support schedule
        Best effort
        Piquet
4. Use cases
    4.1. Use case #1 – Whole CC down
    4.2. Use case #2 – Single (isolated) alarm on a node
5. Requirements
    5.1. User operations
    5.2. Administrator operations
    5.3. Offline copy
    5.4. Reports for the management
    5.5. Non-functional requirements
6. Security
    6.1. Security model
    6.2. User roles
    6.3. Offline copy
    6.4. Data recovery
7. System architecture
    7.1. System components
    7.2. Data flow
8. Data structures
    8.1. Service
    8.2. Service class
    8.3. Manager/support person contact information
9. Interactions with existing systems
    9.1. CDB (Configuration Database)
    9.2. SLS (Service Level Status)
    9.3. Lemon (CERN Monitoring)
    9.4. CERN Phonebook and HR databases
10. End notes
    10.1. What SDB is not?
    10.2. Open questions


1. Introduction

The aim of the Service Database (SDB) project is to provide an interactive database of computing services hosted in the Computer Centre (CC) at CERN. SDB will cover both low-level infrastructure services (operating system and core software on nodes and clusters) and high-level services (applications).

SDB should work like a service catalog that also stores service class definitions, the service hierarchy and dependencies, service support schedules (with service managers and support personnel on a best effort basis or on a piquet schedule), contact information for support people etc. Its purpose is to provide accurate information for operating the Computer Centre (properly restarting services after a major problem in the CC, escalating a problem that arises on a specific node or service at a specific time to the right person or persons), and to store and provide information on services.

The main users of SDB will be Computer Centre operators and sysadmins, service providers and IT department management.

SDB will be independent of, and complementary to, the existing Configuration Database (CDB), which stores the details of node and cluster configuration. Service details will be regularly exported from SDB to the Service Level Status (SLS) system.

1.1. ITIL compliance

Information Technology Infrastructure Library (ITIL) is a framework of best practice approaches intended to facilitate the delivery of high quality IT services.¹ Although SDB is not part of an explicit ITIL implementation, it will provide new features to consolidate the description of system and service configurations in the Computer Centre at CERN, as recommended by ITIL.

1.2. About this document

This document describes the data model used in SDB, groups and structures the system requirements for SDB, and lists a number of important use cases. It proposes a security model for SDB and describes its main data structures. It also describes the general architecture of the system, as well as its interactions with, and impact on, existing systems.

¹ Read more at http://www.itil.co.uk/ and http://en.wikipedia.org/wiki/ITIL


2. Definitions

service – a computing service that is (usually) provided by the CERN IT department and hosted in the Computer Centre.

application service – a service that is seen/used by users (like the CVS Service).

infrastructure service (aka fabric service) – not a real computing service, but rather a set of activities carried out on particular nodes or clusters, like updating software, reacting to low-level alarms etc. (e.g. taking care of the lxbatch cluster).

service class – a concept that allows sharing attributes between very similar services. Services that belong to the same service class are its instances.

service manager – a person responsible for a service; an administrator of a service.

service support person – a person supporting a service, who may be contacted in case of problems with the service. (A given person can both manage a service and support it.)

CDB – Configuration Database, part of the Quattor toolkit (more details at: http://www.quattor.org).

node – a single machine in the Computer Centre. Information about nodes and details of node configuration are stored in CDB.

cluster – a set of nodes with similar configuration, usually used by the same or similar services. Information about clusters and details of cluster configuration are stored in CDB. Clusters may be divided into subclusters. For the sake of simplicity, “a cluster” means “a CDB cluster or a CDB subcluster” in this document.

SLS – Service Level Status display (available at http://cern.ch/SLS).

SLS static XML – XML file with a general service description that is consumed by SLS. It contains service details like name, group, dependencies etc.¹

subservice – (in both SLS and SDB) a service that is (physically or logically) part of another service or group of services.

metaservice – (in SDB) a service with the metaservice flag on; (in SLS) a group of services. The availability of a metaservice is usually calculated as a weighted average of the availabilities of its subservices.

3. Data model

3.1. Services, subservices, metaservices

SDB will share its service model with SLS. The service model used by SLS is a hierarchy of services. Services that are below (part of) another service are its subservices. A service can be part of several other services. A metaservice is simply a group of services, represented as a separate service with several subservices.²

3.2. Clusters and nodes

Clusters and nodes in CDB are stored in a hierarchical model where each node belongs to one domain, one cluster and possibly one sub-cluster. This model is used by operators, sysadmins and service managers. When dealing with clusters and nodes, SDB will adhere to this model.

3.3. Clusters/nodes to services mapping

SDB will keep information about both infrastructure and application services. Information about clusters and nodes is, and will continue to be, stored in CDB. There is usually no simple one-to-one mapping between clusters and services. For example, a given cluster could host several services, or a service could use machines from different clusters.

CDB templates describing the configuration of nodes or clusters shall contain the IDs of the service(s) hosted on those nodes/clusters. This information should be available to SDB via the CDB SQL database (see also chapter 9.1 CDB (Configuration Database)).

¹ See more at: https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSManualForSM#StaticXML
² In SLS, the metaservice webpage is different from a regular service page.

3.4. Service class

A service class (see also the definition) is a way of grouping services that share some attributes. The goal is to be able to define one service class and then create several instances of it, instead of having to specify the same details for several services in a row. To some extent, a service class is like a service template with some service description fields pre-filled.

Using a service class is different from cloning a service. Cloning just creates a separate, new service and copies the service attributes over once; further changes in the original service will not affect the cloned service. In contrast, when a service class is modified, all its instances are subsequently modified as well.

A service class can have any or all of the service attributes defined. If an attribute is specified in a service class, it cannot be overwritten in its instances. There is no hierarchy of service classes. A service class can be cloned.
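The rule that class attributes cannot be overwritten in instances can be sketched as follows. This is only an illustration under assumptions: services and service classes are represented as plain dictionaries, an attribute with value None counts as "not specified", and the function names are hypothetical rather than part of SDB.

```python
from typing import Any, Dict, Optional


def effective_attributes(instance: Dict[str, Any],
                         service_class: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the attributes of an instance as seen by SDB users.

    Attributes specified in the service class always win, because
    instances cannot overwrite them. (Hypothetical helper.)
    """
    merged = dict(instance)
    if service_class is not None:
        # Only attributes actually specified in the class are locked.
        merged.update({k: v for k, v in service_class.items() if v is not None})
    return merged


def can_modify_in_instance(attribute: str,
                           service_class: Optional[Dict[str, Any]]) -> bool:
    """An instance may set an attribute only if its class leaves it unspecified."""
    return service_class is None or service_class.get(attribute) is None


# Example: the class fixes "group" but leaves "email" open.
batch_class = {"group": "FIO", "email": None}
batch_node = {"id": "batch-01", "group": "XYZ", "email": "batch.support@example.org"}
resolved = effective_attributes(batch_node, batch_class)
```

Because the class values are merged in at read time, a later change to the class is reflected in all instances, which is the difference from cloning described above.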

3.5. Support schedule

In principle, there are two service support schemas recognized in SDB:

• best effort – the person concerned declares, in general or per service, his preferences: whether or not he wants to be contacted during working hours, evenings, weekends and nights

• piquet – only time periods in full hours, at most one person for a service at any given time, no preferences (the person on piquet can be called)

A service may have both a piquet schedule and best effort personnel defined at the same time. When a service support person is to be contacted, the person on piquet takes precedence over people supporting the service on a best effort basis.

Best effort

Best effort preferences can be modified by the person concerned, or (per service) by the service managers of a given service. For each period (working hours, evenings, weekends and nights) and service choice (a particular service or all services), a person declares whether and how he wants to be contacted (mobile phone, phone at home) and his “callability” (high, normal or low), and can add comments (e.g. “Contact only for critical production problems”). An e-mail notification of such changes will be sent to the other managers of the services this person manages or supports.


It will be possible to define different best effort preferences for different periods, each defined by its start and end time (full hours or days). These periods cannot overlap and do not repeat (no recurrence).

Each support person can declare his current availability by setting and unsetting a special flag. Additionally, a similar flag is defined for each support person per service, indicating whether this person is currently an active support person for this service. The latter flag can be modified by any service manager of the given service (for example, if a support person is ill and has no Internet access to change his availability).
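The constraints on best effort periods (full hours, no overlaps) could be checked as in the sketch below. It assumes that a period is a (start, end) pair of datetime values with an exclusive end; the function names are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

Period = Tuple[datetime, datetime]  # (start, end), end is exclusive (assumption)


def validate_period(period: Period) -> None:
    """Reject periods that are empty or not aligned to full hours."""
    start, end = period
    if start >= end:
        raise ValueError("period must end after it starts")
    for moment in (start, end):
        if moment.minute or moment.second or moment.microsecond:
            raise ValueError("periods must start and end on full hours")


def find_overlaps(periods: List[Period]) -> List[Tuple[Period, Period]]:
    """Return neighbouring periods (after sorting by start) that overlap.

    If any two periods overlap, at least one pair of neighbours overlaps,
    so an empty result means the whole set is free of overlaps.
    """
    ordered = sorted(periods)
    clashes = []
    for previous, current in zip(ordered, ordered[1:]):
        if current[0] < previous[1]:
            clashes.append((previous, current))
    return clashes


first = (datetime(2007, 1, 1, 8), datetime(2007, 1, 5, 18))
second = (datetime(2007, 1, 5, 18), datetime(2007, 1, 10, 8))   # starts when first ends
third = (datetime(2007, 1, 4, 0), datetime(2007, 1, 6, 0))      # clashes with both
```

With an exclusive end, two periods that merely touch (one ends at 18:00, the next starts at 18:00) are not reported as overlapping.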

Piquet

Piquet schedule for a service can be modified by any manager of the service. An e-mail notification will be sent to other service managers and support persons. If there is no piquet support for a service for a given period, its managers should mark it accordingly.

4. Use cases

4.1. Use case #1 – Whole CC down

(in italic: components that are not part of SDB)

• SDB (or SDB offline copy) provides a list of services (infrastructure or application), ordered by their (1) criticality and (2) dependencies

• For each service, SDB
    o identifies which other services are needed before starting the service
    o lists nodes that this service works on (is hosted on)
    o provides contact information for service support people
    o points to Service Restart Procedure documentation; this documentation explains how a service/application should be restarted, which nodes have to be started, and how to confirm whether the service has been correctly started up
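One possible reading of "ordered by criticality and dependencies" is a topological sort in which, among all services whose prerequisites are already started, the most critical one comes first. The sketch below illustrates that interpretation only; the function name, the input dictionaries and the choice to treat every listed dependency as a hard prerequisite are assumptions, and every dependency is assumed to be a known service.

```python
import heapq
from typing import Dict, List, Set


def restart_order(criticality: Dict[str, int],
                  depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order services so that dependencies come first, most critical first."""
    # Prerequisites still waiting to be started, per service.
    remaining = {s: set(depends_on.get(s, ())) for s in criticality}
    # Reverse mapping: which services wait for a given service.
    dependants: Dict[str, List[str]] = {s: [] for s in criticality}
    for service, deps in remaining.items():
        for dep in deps:
            dependants[dep].append(service)

    # Min-heap on negated criticality gives "highest criticality first".
    ready = [(-criticality[s], s) for s, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        _, service = heapq.heappop(ready)
        order.append(service)
        for other in dependants[service]:
            remaining[other].discard(service)
            if not remaining[other]:
                heapq.heappush(ready, (-criticality[other], other))

    if len(order) != len(criticality):
        raise ValueError("dependency cycle detected")
    return order


example_order = restart_order(
    {"network": 100, "afs": 80, "cvs": 50, "batch": 90},
    {"afs": {"network"}, "cvs": {"afs"}, "batch": {"network", "afs"}},
)
```

In this example the batch service is more critical than CVS, but both wait for the network and AFS, so the resulting order is network, afs, batch, cvs.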

4.2. Use case #2 – Single (isolated) alarm on a node

(in italic: parts of the procedure/process that are not part of SDB)

• Alarm is related to either infrastructure (node) or an application on it.
• SDB says which services run on that node.
• Operator follows adequate procedure(s).
• If he has to call a service support person, SDB provides an ordered list of persons to call for a given service (infrastructure or application)
    o first the person on piquet, then people on best effort basis (ordered by their “callability”, then in random order)
    o each entry in the list should contain the person’s name, phone numbers, information whether the person is on piquet or best effort, and the person’s comments
• If there is nobody on the list, operator doesn’t call anyone.
    o SDB displays the group leader’s name and phone numbers, in case escalation is needed.

Group leaders will not be called except for very serious incidents, as judged by CC staff, based on their experience.
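The ordering of the call list described above (piquet first, then best effort by callability, then random) can be sketched as follows. The Contact class and its fields are hypothetical, and the input is assumed to contain only the people who are available for the service at the given moment.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional

CALLABILITY_RANK = {"high": 0, "normal": 1, "low": 2}


@dataclass
class Contact:
    name: str
    phones: List[str] = field(default_factory=list)
    on_piquet: bool = False
    callability: str = "normal"   # only meaningful for best effort support
    comment: str = ""


def call_list(contacts: List[Contact],
              rng: Optional[random.Random] = None) -> List[Contact]:
    """Return contacts in the order in which an operator should call them."""
    rng = rng or random.Random()
    shuffled = list(contacts)
    rng.shuffle(shuffled)  # decides the order among otherwise equal entries
    # sorted() is stable, so the shuffled order is kept within each rank.
    return sorted(
        shuffled,
        key=lambda c: (0, 0) if c.on_piquet else (1, CALLABILITY_RANK[c.callability]),
    )


people = [
    Contact("A", callability="low"),
    Contact("B", on_piquet=True),
    Contact("C", callability="high"),
    Contact("D", callability="high"),
]
ordered = call_list(people)
```

Shuffling before a stable sort is one simple way to get a random order inside each callability level without writing a custom comparison.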

5. Requirements

5.1. User operations

This chapter lists operations that users should be able to perform via user interfaces to SDB – provided, of course, that they are authorized to do so (see chapter 6.2 User roles for more details).

Operations on a service (and on a service class):

• create a new, blank service
• clone this service (create a new, independent service that is a copy of this service)
• make service class of that service (or vice-versa for a service class)
• see and modify service details (including service personnel, subservices, dependencies, infrastructure etc.)
• delete this service
• see and modify piquet schedule for a given period for this service (periods when there is no piquet coverage defined should be clearly marked)
• see best effort support personnel (and their preferences) for this service
• get a list of support people (with contact details) for a service and a given moment in time, as described in 4.2 Use case #2 – Single (isolated) alarm on a node
• list all subservices of that service (also indirect)
• list all services this service is a subservice of (also indirect)
• list all services that depend on this service (also indirectly)
• list all services this service depends on (also indirectly)
• search for a service by name, group etc.
• browse between services (lists mentioned above should be clickable)
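The four "also indirect" listings above are all graph traversals: either following the subservice or dependency links forward, or following them backwards. A minimal sketch, assuming the links are available as a dictionary from a service ID to the IDs it points to (the function names are hypothetical):

```python
from collections import deque
from typing import Dict, List, Set


def reachable(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """All services reachable from start, directly or indirectly."""
    seen: Set[str] = set()
    queue = deque(edges.get(start, ()))
    while queue:
        service = queue.popleft()
        if service in seen or service == start:
            continue  # already visited; also guards against cycles
        seen.add(service)
        queue.extend(edges.get(service, ()))
    return seen


def invert(edges: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Reverse every link, e.g. 'has subservice' becomes 'is subservice of'."""
    inverted: Dict[str, List[str]] = {}
    for parent, children in edges.items():
        for child in children:
            inverted.setdefault(child, []).append(parent)
    return inverted


subservices = {
    "mail": ["smtp", "imap"],
    "imap": ["imap-storage"],
    "web": ["imap"],
}
all_subservices_of_mail = reachable("mail", subservices)
all_parents_of_storage = reachable("imap-storage", invert(subservices))
```

The same two functions cover dependencies: pass the "depends on" mapping for the services a service depends on, and its inverse for the services that depend on it.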

Operations on a service manager or a support person (notifications should go to all service managers of service(s) concerned):

• see and modify person’s contact information
• see and modify (with notification) person’s best effort preferences and comments
• see this person’s piquet schedule for all services for a given period
• list services managed or supported by the person
• assign the person (as a service manager or support person) to a service (with notification)
• remove the person from a service (with notification)
• search for a person by name, group etc.

In addition to operations mentioned above, SDB user interfaces should also provide data views described in 4.1 Use case #1 – Whole CC down and 4.2 Use case #2 – Single (isolated) alarm on a node.

5.2. Administrator operations

SDB administrators should be able to perform the following operations (in addition to all operations listed above):

• add persons to, and remove persons from, the list of users authorized to access SDB
• list “zombie” services (services not managed or supported by anyone, services outside of the service hierarchy, services with missing mandatory information etc.)
• access SDB logs
• etc.
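The "zombie" checks listed above could look roughly like the sketch below. It assumes a service is a dictionary keyed by the field names from chapter 8.1, that the caller knows whether the service is a subservice of anything, and that "outside of the service hierarchy" means having neither a parent nor subservices; all of these are assumptions made for illustration.

```python
from typing import Dict, List

# Mandatory fields as listed in chapter 8.1.
MANDATORY_FIELDS = ("id", "fullname", "type", "group", "section", "production_level")


def zombie_reasons(service: Dict, has_parent: bool) -> List[str]:
    """Return the reasons why a service looks like a zombie (empty if none)."""
    reasons = []
    if not service.get("servicemanagers") and not service.get("supportpeople"):
        reasons.append("not managed or supported by anyone")
    if not has_parent and not service.get("subservices"):
        reasons.append("outside of service hierarchy")
    missing = [name for name in MANDATORY_FIELDS if not service.get(name)]
    if missing:
        reasons.append("missing mandatory fields: " + ", ".join(missing))
    return reasons


orphan = {"id": "old-svc", "fullname": "Old service", "type": "application",
          "group": "FIO", "section": "", "production_level": "production"}
healthy = dict(orphan, section="TSI", servicemanagers=["jdoe"])
```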

5.3. Offline copy

The SDB offline copy is a copy of crucial SDB data that can be moved to a stand-alone machine disconnected from the network. It is a read-only copy, so it will not allow any data modifications. It may therefore be implemented with different technologies than SDB itself. Information in the SDB offline copy should be stored in an easily printable format.

The offline copy doesn’t have to include all information stored in SDB, but just the data needed by CC staff in case of a major CC failure (see 4.1 Use case #1 – Whole CC down). In principle, it should contain the information necessary to restart services in the CC, and to contact services’ managers and support people. The SDB offline copy will not contain Operational Procedures (like how to restart a given service), as they are not part of SDB.

The offline copy should be updated frequently. SDB should automatically and regularly (for example, every hour) generate the offline copy, then send it by e-mail to an outside account and store it on a flash drive plugged in to the server. SDB should also detect, and warn administrators, if the offline copy was not requested recently.


5.4. Reports for the management

Service managers and IT management would like to be able to generate different reports about computing resources and services. Some examples are listed below.

Reports that are based on SDB data only should be available via SDB user interface:

• What services are being managed by person X?
• What services are being run by people in section X?
• What services are running on machine or cluster X?
• Which services require person X on piquet during the next Y months?
• Which people were on piquet for service X (or: from section Y) and for how many hours during last month? (with subtotals for Sundays/holidays and for regular working days)
• etc.
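As an illustration of the piquet hours report with subtotals, the sketch below counts the hours of one person's piquet periods. It relies on piquet periods being defined in full hours; the function name, the (start, end) representation and the decision to count Saturdays together with regular days are assumptions.

```python
from datetime import date, datetime, timedelta
from typing import Dict, FrozenSet, Iterable, Tuple


def piquet_hours(shifts: Iterable[Tuple[datetime, datetime]],
                 holidays: FrozenSet[date] = frozenset()) -> Dict[str, int]:
    """Count piquet hours, split into Sundays/holidays and all other days."""
    totals = {"sunday_or_holiday": 0, "regular_day": 0}
    for start, end in shifts:
        hour = start
        while hour < end:
            # weekday() == 6 is Sunday in Python's datetime.
            special = hour.weekday() == 6 or hour.date() in holidays
            totals["sunday_or_holiday" if special else "regular_day"] += 1
            hour += timedelta(hours=1)
    totals["total"] = totals["sunday_or_holiday"] + totals["regular_day"]
    return totals


# Saturday 6 January 2007, 20:00, until Monday 8 January 2007, 08:00.
weekend_shift = [(datetime(2007, 1, 6, 20), datetime(2007, 1, 8, 8))]
report = piquet_hours(weekend_shift)
```

For this shift the sketch counts 24 hours on Sunday and 12 hours on the other two days, 36 hours in total.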

Generating other reports would also require querying CDB or other databases. Such reports will not be available via SDB user interface. The SDB database should nevertheless accept SQL queries for SDB data necessary for reports like the following:

• What services are running on machines which run out of warranty in the next X months?
• Are all services running in the critical area able to maintain their service if the other machines are off?
• What would be the impact of changing the service level of a service to 8x5?
• Which service requires backup to another building and/or disaster recovery facilities off-site?
• Which service is using the operations contract for the most alarms?
• Which services have no deputy technical contact?
• What services are being run by someone who is leaving?

5.5. Non-functional requirements

SDB user interfaces should be simple and intuitive to use. SDB should be easy to maintain for its administrators.

6. Security

6.1. Security model

Because of the sensitivity of information stored in SDB, the whole system should be accessible only to authorized persons (CC operators, service managers/support people, and IT management). The list of authorized users will be managed by SDB administrators.


Users will be authenticated using a site-wide login mechanism (e.g. with their NICE or CERN Lightweight account credentials). A registered user will be “assigned to” one or more roles (see chapter 6.2 User roles for more details). Depending on their role(s), they will be able to access certain data. SDB administrators will manage assignments of users to roles. It should be noted that SDB is not a tool for end users – they should use SLS to get basic information about services, their availability etc.

6.2. User roles

There are several different roles defined in SDB:

• CC staff (operators, sysadmins)
• Service managers
• Service support people
• IT management
• SDB administrators

As stated before, a user may have more than one role. Table 1 is a role/action matrix that shows which roles are allowed to perform which actions:


                           [------------- via user interface -------------]  n/a       SQL access
                           CC        Service   Service   IT        SDB       Service   External
                           operators managers  support   mgmt      admins    users     report tools
reading service details    +         +         +         +         +         –         +
modifying service details  –         +/– (1)   –         –         +         –         –
generating reports         –         +/– (1)   –         +         +         –         –
reading contact info       +         +/– (2)   +/– (3)   +/– (4)   +         –         –
modifying contact info     –         +/– (2)   +/– (5)   –         +         –         –
generating offline copy    +         –         –         –         +         –         –
managing user groups       –         –         –         –         +         –         –

(1) Only for services managed by a given service manager
(2) Only for service managers and support people of services managed by a given service manager
(3) Only for people currently supporting services supported by a given support person
(4) Only contact details and preferences for people in their group
(5) Only contact details and preferences concerning a given support person

Table 1. Roles and actions matrix

CC operators can read all data from SDB, including service managers’ contact information, but are not granted any rights to modify data. They can generate the SDB offline copy.

A service manager or support person can:

• see details of all services stored in SDB
• see best effort preferences and contact details for only support people of services he supports
• modify only his best effort preferences and his contact details

A service manager can additionally:

• create a new service (he initially becomes its manager)
• modify and remove services he manages (including piquet schedule)
• see best effort preferences and contact details for only people managing or supporting services he manages
• modify his contact details and best effort preferences, and that of other people managing or supporting services he manages

IT management can read all service details and generate reports about service and service support.

SDB administrators can read, modify and remove all data stored in SDB.
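The role/action matrix of Table 1 maps naturally onto a lookup table. The sketch below encodes it and resolves the access level for a user with several roles; the identifiers are hypothetical, and taking the most permissive level across roles is an assumption, since the document only says that a user may have more than one role. A CONDITIONAL result still has to be checked against the corresponding note (1) to (5).

```python
from enum import IntEnum
from typing import Iterable


class Access(IntEnum):
    NO = 0            # "–" in Table 1
    CONDITIONAL = 1   # "+/–", subject to notes (1)-(5)
    YES = 2           # "+"


N, C, Y = Access.NO, Access.CONDITIONAL, Access.YES

ROLES = ("cc_operator", "service_manager", "service_support",
         "it_mgmt", "sdb_admin", "service_user", "external_tool")

# One row per action, one column per role, in the order of ROLES.
MATRIX = {
    "read_service_details":   (Y, Y, Y, Y, Y, N, Y),
    "modify_service_details": (N, C, N, N, Y, N, N),
    "generate_reports":       (N, C, N, Y, Y, N, N),
    "read_contact_info":      (Y, C, C, C, Y, N, N),
    "modify_contact_info":    (N, C, C, N, Y, N, N),
    "generate_offline_copy":  (Y, N, N, N, Y, N, N),
    "manage_user_groups":     (N, N, N, N, Y, N, N),
}


def access_for(roles: Iterable[str], action: str) -> Access:
    """Most permissive access level granted by any of the user's roles."""
    row = MATRIX[action]
    levels = [row[ROLES.index(role)] for role in roles]
    return max(levels, default=Access.NO)
```

For example, `access_for({"service_manager"}, "modify_service_details")` yields `Access.CONDITIONAL`, meaning the request is allowed only for services that this manager actually manages.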


Services’ end-users will not be authorized to use SDB, and should refer to SLS instead. Additionally, external tools may be granted direct read-only SQL access to SDB database, but only to service details (not to service managers’ contact information).

6.3. Offline copy

The SDB offline copy will be protected with a password. This password will be communicated to the people concerned (CC staff). Reading data stored in the SDB offline copy will not be permitted unless the password is provided.

The SDB offline copy can be requested by CC staff, as described above, or by non-interactive scripts from a predefined set of machines. In the latter case, SDB will rely on host-based authentication (such as certificates) to ensure that only authorized hosts get a copy of SDB data.

6.4. Data recovery

SDB will not keep a history of changes to the data it stores. Nevertheless, user actions will be logged for auditing purposes. These logs, together with regular SDB database backups, should allow data recovery in case of a malicious attack or accidental data loss.

7. System architecture

The general architecture of the system is shown in Figure 1. The diagram also shows the different groups of users (roles) and the other existing systems that SDB interacts with.


Figure 1. SDB data flow diagram

7.1. System components

The SDB Web application will be the only user interface to SDB. It will allow authorized users to enter, modify and retrieve information about computing services at CERN.

SDB will store data in a relational (SQL) database. The system will be implemented to use Oracle DB. An Entity Relationship Diagram will be prepared, but is not part of this document.

The SDB offline copy will be implemented as a set of files in HTML format (or similar) exported from SDB.

7.2. Data flow

The arrows in the diagram in Figure 1 are meaningful: they show the main direction in which data flows between a given pair of entities. For example, CC operators and sysadmins are in principle data consumers, while the main task of service managers is to provide information about their services. The direction in which an arrow points does not necessarily indicate who initiates that particular communication. Each connection is discussed below.

(Users to SDB)

• CC staff (operators and sysadmins) access SDB web interface to get information about services, their dependencies etc. as well as contact details of people responsible for given services (see 4.1 Use case #1 – Whole CC down and 4.2 Use case #2 – Single (isolated) alarm on a node).

• If SDB web application is not available (for any reason: network down, database problem etc), CC staff uses SDB offline copy (which is main SDB data exported to HTML files).

• Service managers use SDB web interface to enter and modify information about their services, as well as their contact details and preferences.

• Other users (IT management, service managers) use SDB web interface to learn about various services, dependencies between them, people and support schedules etc.

(SDB components)

• SDB web application will access SDB database to store and retrieve information about services, support people and support schedules.

• SDB web application will be able to generate, on request, an offline copy of SDB data, by exporting data to HTML files. Scheduling this operation (automatic and regular export of SDB data) and securely storing the resulting SDB offline copy will be done in cooperation with CC staff.
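The export of SDB data to printable HTML files might be sketched as below. The page layout, the function name and the three columns shown are assumptions made for illustration; a real offline copy would contain whatever CC staff need to restart services and contact support people.

```python
from html import escape
from typing import Dict, List


def render_offline_page(services: List[Dict[str, str]]) -> str:
    """Render a simple, printable HTML table of services and contacts."""
    rows = []
    for service in services:
        # escape() keeps free-text fields from breaking the markup.
        rows.append("<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            escape(service["id"]),
            escape(service["fullname"]),
            escape(service.get("contacts", "")),
        ))
    return (
        "<html><body><h1>SDB offline copy</h1>\n"
        "<table border=\"1\">\n"
        "<tr><th>ID</th><th>Name</th><th>Contacts</th></tr>\n"
        + "\n".join(rows)
        + "\n</table></body></html>"
    )


page = render_offline_page([
    {"id": "cvs", "fullname": "CVS <production>", "contacts": "J. Doe, 70000"},
])
```

Plain static HTML keeps the offline copy readable on a stand-alone machine with nothing more than a browser, and it prints easily.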

(SDB to existing systems)

• SDB will query CDB SQL for information about clusters and nodes (see chapter 9.1 CDB (Configuration Database) for more details).

• SDB will be able to export service details to SLS static XML files¹ (see chapter 9.2 SLS (Service Level Status) for more details).

• External tools/systems may be granted direct read access to non-confidential data (service details, but not contact information) in the SDB SQL database (see chapter 5.4 Reports for the management).

¹ Learn more about SLS static XML files at: https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSManualForSM#StaticXML

8. Data structures

This chapter does not try to cover all data structures that will be used in SDB; only the main ones are described. Others, like the ones representing the group and section hierarchy and their leaders, or service availability classes, are skipped.

8.1. Service

The following attributes (properties) describe a single service.

Mandatory fields:


• id – globally unique service alphanumeric ID
• fullname – full name of the service
• type – service type (infrastructure or application)
• group and section – a service is assigned to a group and section
• production_level – is it a production service or a test instance?

Recommended fields:

• shortname – shorter name of the service (if full name is longer than 20 characters)
• criticality – priority (importance) of the service, from 0 (low) to 100 (high)
• site – where (which lab) does this service run (default: CERN)
• vo – which VO (Virtual Organization) does this service belong to
• servicedesc – brief service description
• email – main service contact/support email address
• webpage – service webpage
• alarmpage – service alarms or news page
• procedurespage – link to service operational procedures in OPM
• notes – additional information concerning the service (free text field)
• development_notes – comments on development person/team, contact details etc. (free text field)
• servicemanagers – a list of managers of this service (see below)
• supportpeople – a list of people supporting this service (see below)
• subservices – a list of subservices (see below)
• dependencies – a list of services this service depends on (see below)
• service_class – ID of a service class this service is an instance of
• is_a_service_class (yes/no flag) – is it a service class, or a regular service (default: no)
• contact_info_visible (yes/no flag) – should contact info of support people for this service be visible for CC staff (default: yes)
• visible_in_sls (yes/no flag) – should this service appear in SLS (default: no)

Fields and flags used in SLS¹ (optional unless stated otherwise):

• datasource – URL of the SLS update XML (mandatory)
• downifnoupdate (yes/no flag) – if update XML is not accessible, should service be treated as not available (default: no)²
• refreshperiod – how often should SLS update XML be retrieved (in minutes)
• validityduration – how long is an update valid (in minutes)
• availabilitydesc – description of how the availability of the service is calculated
• staticxmllocation – original location (URL) of the static XML file³

¹ More information on service fields (tags) used in SLS is available at: https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSManualForSM#Which_fields_in_which_XML
² See also: https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSManualForSM#TagDownIfNoUpdateMore
³ See also: https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSManualForSM#TagStaticxmllocationMore


• accountingxmllocation – URL of the SLS accounting XML
• availabilitythresholds_available, availabilitythresholds_affected and availabilitythresholds_degraded – availability threshold levels¹
• metaservice (yes/no flag) – is this service a metaservice (default: no)
• searchable (yes/no flag) – should this service appear in search results (default: yes)
• visibleinsubservices (yes/no flag) – should this service be listed in its subservices (default: yes)

More information on complex fields:

• servicemanagers is an ordered list of NICE login names. There is an additional yes/no flag (main) attached to each person to mark main service managers. Otherwise, all service managers are equal (there is no further distinction between service responsible, development team leader, secondary support person etc.).

• supportpeople is an ordered list of NICE login names. This list and the previous one (servicemanagers) are not mutually exclusive – a given person can appear on both lists.

• subservices – an ordered list of service IDs of subservices. Each subservice may have a weight (default: 1), which SLS uses to calculate the weighted average of the subservices’ availabilities.

• dependencies – an ordered list of service IDs of services this service depends on. Each dependency is marked either as “dependson” (strong dependency: the service will not work if the other service is unavailable) or as “uses” (weak dependency: the service sometimes uses the other service, but should be able to work without it) [2].

• infrastructure – a list of clusters and nodes used by this service. There are no additional attributes or flags for clusters or nodes listed in this field. (For example, if only a subset of nodes hosting a service is necessary to start it up, it should be stated and described in the Service Restart Procedure provided by service managers).
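The weighted subservice list and the two dependency kinds described above can be sketched as follows. This is only an illustration under stated assumptions: the class names, the function names and the service IDs are hypothetical, and the exact formula SLS applies may differ from this plain weighted average.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Subservice:
    service_id: str
    weight: float = 1.0  # default weight of 1, as stated in the field description


@dataclass
class Dependency:
    service_id: str
    kind: str  # "dependson" (strong) or "uses" (weak)

    def __post_init__(self) -> None:
        if self.kind not in ("dependson", "uses"):
            raise ValueError(f"unknown dependency kind: {self.kind!r}")


def weighted_availability(subservices: List[Subservice],
                          availabilities: Dict[str, float]) -> float:
    """Weighted average of subservice availabilities (hypothetical helper)."""
    total_weight = sum(s.weight for s in subservices)
    if total_weight <= 0:
        raise ValueError("total subservice weight must be positive")
    weighted_sum = sum(s.weight * availabilities[s.service_id] for s in subservices)
    return weighted_sum / total_weight


def strong_dependencies(dependencies: List[Dependency]) -> List[str]:
    """IDs of services without which this service will not work."""
    return [d.service_id for d in dependencies if d.kind == "dependson"]
```

For instance, two subservices with weights 1 and 3 and availabilities 100 and 60 give (1·100 + 3·60) / 4 = 70.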

8.2. Service class
Since a service class and a service share the same fields, service class details can be represented in the same data structure as service instances.

8.3. Manager/support person contact information

[1] See also: https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSManualForSM#Availability_vs_status
[2] See also: https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSManualForSM#TagDependenciesMore


The following fields (attributes) describe a support person or service manager. For CERN staff members, fields in bold cannot be modified in SDB; their values are synchronized with (automatically and regularly taken from) CERN Phonebook and HR database.

• nice_login – person’s NICE login name
• cern_staff (yes/no flag) – is this person a CERN staff member
• active (yes/no flag) – is this person active as a support person or service manager (i.e. not on holiday etc.)
• first_name and last_name – person’s name
• department, group, section – person’s affiliation within CERN
• email – main CERN e-mail address
• email-alternative – alternative, private e-mail address
• phone-work – phone number at work
• phone-mobile – mobile phone number
• phone-home – phone number at home
• phone-other – other phone number

9. Interactions with existing systems
This chapter describes interactions between SDB and existing systems, in particular CDB and SLS, and discusses the resulting impact on, or requirements for, those systems.

9.1. CDB (Configuration Database)
CDB stores templates that describe the configuration of clusters and nodes in the Computer Centre. These templates should be able to include a list of services hosted on a node or cluster. Such a list of service IDs for a given cluster or node should be available to SDB via the CDB SQL database (as a view, materialized view, or table).

It was decided to move all cluster/node responsible contact information to SDB, once SDB is available. In CDB, a node or cluster should point to a corresponding infrastructure (fabric) service, so that CDB would not need to store any contact info (except perhaps root e-mail addresses). For example, the template describing the lxplus cluster in CDB should in the future say that this cluster “hosts” the LXPLUS fabric service. This infrastructure service, declared in SDB, should contain all contact details for the cluster (including the current CDB “service manager”).

CDB should otherwise not contain any information that logically “belongs to” SDB (such as service dependencies). CDB and SDB should avoid duplicating information.

It should be noted that CDB and SDB are planned to be (and should stay) independent systems.
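As an illustration of how SDB might read the cluster/node to service mapping through a view, here is a minimal sketch. It uses an in-memory SQLite database purely as a stand-in for the CDB SQL database; the table name cdb_template_services, the view name hosted_services and their columns are assumptions made for this example and are not defined by this document.

```python
import sqlite3
from typing import List


def make_demo_cdb() -> sqlite3.Connection:
    """Build a throwaway stand-in for the CDB SQL database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cdb_template_services (
            host_name  TEXT NOT NULL,   -- cluster or node name
            host_type  TEXT NOT NULL,   -- 'cluster' or 'node'
            service_id TEXT NOT NULL    -- ID of a service declared in SDB
        );
        INSERT INTO cdb_template_services VALUES ('lxplus', 'cluster', 'LXPLUS');
        -- The view is the only object SDB would read in this sketch.
        CREATE VIEW hosted_services AS
            SELECT host_name, service_id FROM cdb_template_services;
        """
    )
    return conn


def services_hosted_on(conn: sqlite3.Connection, host_name: str) -> List[str]:
    """Return the service IDs hosted on a given cluster or node."""
    rows = conn.execute(
        "SELECT service_id FROM hosted_services "
        "WHERE host_name = ? ORDER BY service_id",
        (host_name,),
    )
    return [row[0] for row in rows]
```

Reading through a view keeps the two systems independent: SDB sees only the mapping and does not depend on how CDB stores its templates.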


9.2. SLS (Service Level Status)
Since most of the computing services at CERN are covered by SLS, their details should initially be imported into SDB (in order to avoid manual data entry). SLS stores service descriptions in XML files called SLS static XMLs. These static XML files are currently provided (and usually manually edited) by service managers. Once SDB is available, service managers will be encouraged to use it to create and modify their services’ descriptions. SLS static XML files will then be generated automatically and regularly from SDB. SLS itself will stay independent of SDB. Deploying SDB will not require any development on the SLS side.
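The generation step could look roughly like the sketch below. It is not the actual SLS static XML schema: the element layout, the root element name and the example service ID are assumptions chosen for illustration, and only the field names (datasource, refreshperiod, validityduration, availabilitydesc, subservices) come from this document.

```python
import xml.etree.ElementTree as ET

# Simple text fields copied one-to-one into the generated file (illustrative subset).
SIMPLE_FIELDS = ("datasource", "refreshperiod", "validityduration", "availabilitydesc")


def build_static_xml(service: dict) -> str:
    """Serialize one SDB service record into an XML string (hypothetical layout)."""
    root = ET.Element("service", attrib={"id": service["id"]})
    for name in SIMPLE_FIELDS:
        if name in service:
            ET.SubElement(root, name).text = str(service[name])
    subservices = ET.SubElement(root, "subservices")
    for sub_id, weight in service.get("subservices", []):
        # Each subservice carries its weight (default 1 in the data model).
        ET.SubElement(subservices, "subservice",
                      attrib={"weight": str(weight)}).text = sub_id
    return ET.tostring(root, encoding="unicode")


example = {
    "id": "EXAMPLE_SERVICE",          # hypothetical service ID
    "refreshperiod": 5,
    "validityduration": 15,
    "subservices": [("EXAMPLE_SUB", 1)],
}
xml_text = build_static_xml(example)
```

Running such a generator on a schedule would let SLS keep reading static XML files exactly as it does today, which is consistent with the requirement that SLS needs no changes.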

9.3. Lemon (CERN Monitoring)
There is no direct interaction between SDB and Lemon. Nevertheless, while collecting requirements for SDB it became apparent that it should be possible to link alarms with services, not only with nodes. Currently, Lemon/LAS alarms, as well as Remedy/ITCM, which process LAS alarms, relate to nodes. In the future, an alarm would continue to be linked, by default, to a node. If an alarm is related to a service (either an infrastructure service or an application service), it will have to indicate clearly for which service it is raised. Operators will then be able to follow instructions for the given alarm/service combination. Implementing service alarms would help CC staff and service support people: when responding to an alarm, they would know how important it is and which service it concerns.
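The default-to-node rule can be expressed in a few lines. This is a sketch only: the Alarm class, its field names and the lookup key format are hypothetical and do not describe the Lemon/LAS or Remedy/ITCM data model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Alarm:
    name: str
    node: str                          # every alarm is linked to a node by default
    service_id: Optional[str] = None   # set only when the alarm concerns a service


def instruction_key(alarm: Alarm) -> Tuple[str, str, str]:
    """Pick the key under which operator instructions would be looked up."""
    if alarm.service_id is not None:
        # Service alarm: instructions are per alarm/service combination.
        return ("service", alarm.name, alarm.service_id)
    # No service indicated: fall back to the node, as alarms do today.
    return ("node", alarm.name, alarm.node)
```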

9.4. CERN Phonebook and HR databases
In order to avoid multiple copies of contact information (and the consequent synchronization problems), SDB should retrieve service personnel’s contact details, such as home or mobile phone numbers, from the HR databases and CERN Phonebook. On the other hand, SDB should not depend on the constant availability of external information sources. SDB will therefore store contact information, update it automatically and regularly from the HR databases and CERN Phonebook, and will not allow users to modify data that comes from external databases.
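A minimal sketch of this store-locally, sync-regularly, read-only-for-users behaviour is shown below. The set of synchronized fields is an assumption made for the example (the text above names only home and mobile phone numbers explicitly), and the class and method names are hypothetical.

```python
from typing import Dict, Optional

# Assumed set of externally owned fields; the real list is defined by the
# fields marked as non-modifiable in section 8.3.
SYNCED_FIELDS = frozenset({"first_name", "last_name", "phone-mobile", "phone-home"})


class ContactRecord:
    """Local copy of one person's contact details (illustrative only)."""

    def __init__(self, nice_login: str, cern_staff: bool) -> None:
        self.nice_login = nice_login
        self.cern_staff = cern_staff
        self._data: Dict[str, str] = {}

    def get(self, field: str) -> Optional[str]:
        return self._data.get(field)

    def set_by_user(self, field: str, value: str) -> None:
        """User edits are rejected for externally owned fields of CERN staff."""
        if self.cern_staff and field in SYNCED_FIELDS:
            raise PermissionError(f"{field} is synchronized from external databases")
        self._data[field] = value

    def sync_from_external(self, external: Dict[str, str]) -> None:
        """Periodic update; the stored copy stays usable if the source is down."""
        for field in SYNCED_FIELDS:
            if field in external:
                self._data[field] = external[field]
```

Because values are copied into SDB rather than looked up on demand, a temporary outage of the external sources leaves the last synchronized contact details available.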


10. End notes

10.1. What SDB is not?
This document focuses on what the Service Database is, what it will do, and how. However, it should also be clearly stated what SDB is not:

• it is not a tool for generating management reports across various databases
• it is not a replacement for the OPM (Operational Procedure Management) project
• it is not a flexible, generic calendar tool for managing service support

The following are outside the scope of the SDB project:
• maintaining and assuring availability of the computer(s) storing the SDB offline copy
• storing, updating and making available operational procedures

10.2. Open questions

• (Which attributes from the service catalog XLS) Are service attributes like enduser broadcast, enduser contact, service access time, support time, support response time, In-Support Time Support Line, Out of Hours Time, Out of Hours Response Time, Out of Hours, Support Line, Service Maintenance Window, Service Availability Target, Service Availability, Service Lifetime etc. needed/required in SDB? Can they be in free text format (which would make implementation much simpler)? How would they be used/consumed by SDB?
