Establishment of National Agricultural Bioinformatics Grid in ICAR
I.A.S.R.I/T.B. 05/2014
Centre for Agricultural Bioinformatics
Indian Agricultural Statistics Research Institute
Library Avenue, Pusa, New Delhi 110012, India
2014
POLICY AND RESOURCE ALLOCATION DOCUMENT
Anil Rai
K. K. Chaturvedi
S. B. Lal
Anu Sharma
IASRI, New Delhi
Sanjay Wandhekar
Ashish Ranjan
Gourav Chaudhari
Abhishek Sharma
Tarun Singh
C-DAC, Pune
Table of Contents
1 Data Center Security Policy
1.1 Introduction
2 Management Responsibilities
3 Policy
3.1 Physical Access Control
3.1.1 Access to Network and communication devices
3.1.2 Access to HPC Datacenter facilities
3.1.3 BMS room
3.1.4 UPS room
3.1.5 Visitors
3.1.6 Facilities maintenance personnel
3.1.7 Food, Drink, Tobacco and Inflammable Products
3.1.8 Photography/Videography
3.2 Logical Access Control
3.2.1 User Creation
3.2.2 Procedure to get a new account:
3.2.3 Modification of user rights:
3.2.4 Account locking
3.2.5 Unlocking or resetting of password
3.2.6 User deletion
3.3 Application Access and Installation
4 Physical Access to Data Centre Facility
5 Administrator rights
5.1 User ID Reviews:
6 Storage Allocation
6.1 Storage Summary:
6.2 Type of users:
6.3 Storage quota
6.3.1 Limits for home storage 250T:
6.3.2 Limits for scratch (Parallel) storage 200T:
6.3.3 scratch Policy
6.3.4 Limits for archive storage 200T:
6.4 Automatic File Deletion Policy
7 CLUSTER RESOURCE ALLOCATION
7.1 Types of cluster:
7.2 Available Cluster Resources:
8 BACKUP AND RESTORE
8.1 Procedure
8.2 Backup Content
8.3 Backup Approaches
8.4 Database Back-up
8.5 Data Backup policy:
8.6 Restore Procedure
8.7 Deleted User Account
8.8 Best Practices for User
Tables
Table 1: Management Responsibilities
Table 2: Storage Summary
Table 3: Limits for Home Storage
Table 4: Limits for Scratch Storage
Table 5: Limits for Archive Storage
Table 6: Automatic File Deletion Policy
Table 7: Available Cluster Resources
Table 8: Allocation of Resources
Abbreviations
HPC: High Performance Computing (HPC) is a system with high processing and computing capability for carrying out complex calculations.
Lead Centre: IASRI, New Delhi
Domain Centres: NBPGR, New Delhi; NBAGR, Karnal; NBFGR, Lucknow; NBAIM, Mau; and NBAII, Bangalore.
NFS: Network File System is a distributed file system protocol allowing a user on a client computer to access files over a network in a manner similar to how local storage is accessed.
SMP: Symmetrical Multiprocessing involves a multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single shared main memory, have full access to all I/O devices, and are controlled by a single OS instance, and in which all processors are treated equally, with none being reserved for special purposes.
CABin: Centre for Agricultural Bioinformatics
IASRI: Indian Agricultural Statistics Research Institute
PFS: Parallel File System
1 DATA CENTER SECURITY POLICY
1.1 INTRODUCTION
This document covers data center security, allocation of storage, allocation of cluster
resources and data backup requirements. Data center security concerns the physical and logical
security of data center resources. Allocation of storage and cluster resources concerns the
distribution of a limited amount of storage, cores and memory among the different user categories.
The data backup requirements describe how to protect data, which approach to use for backing up
important data, how often to take backups and how to restore data.
Implementing the policy described in this document will increase the security and
efficient use of the HPC system and help to safeguard HPC resources.
Physical and logical access to data and information processing resources is covered in
this document.
2 MANAGEMENT RESPONSIBILITIES
Two levels of user authentication are implemented to prevent unauthorized access to the
resources. The level of authentication is initially approved by the Centre Head, and the
login credentials are created by the System Administrator. The Centre Head is
responsible for granting physical and logical access to the resources. Two manager positions
are proposed; they are responsible for authenticating the intended users of the available
resources. The role of each manager type is shown in Table 1.
Table 1: Management Responsibilities
| Manager Type | Role |
|---|---|
| Centre Head | Identifying physical and logical access rights to be granted to the users. |
| System Administrator | 1. Creation / deletion of user IDs. 2. Granting / revoking of access rights. 3. Review and report. |
3 POLICY
Access control is a mechanism to ensure that only authorized personnel have access to the
information and information processing resources assigned to them. It also helps to
track accountability. Access controls are mainly of two types: physical access control
and logical access control. Physical access control ensures that only authorized personnel have
access to the physical assets assigned to them, including physical access to the
HPC Data center. Logical access control ensures that only authorized personnel have access
to information or data in electronic form, including access to the operating system,
applications and associated information.
3.1 Physical Access Control
3.1.1 Access to Network and communication devices
The network devices on all the floors should be housed in secure cabinets that can be locked
and access should be restricted to network administrators or authorized personnel only.
3.1.2 Access to HPC Datacenter facilities
A valid ID card is mandatory for system administrators and maintenance personnel who need
to service co-located equipment. These cards must be checked at entry to and
exit from the facility. IASRI staff will escort them to their equipment in the HPC Data
center.
As HPC systems shall be operational 24x7, the support provided by a vendor should be
identified in advance so that vendor representatives can get the necessary access to the system.
The system administrator should notify staff as soon as possible after learning that a
vendor support visit is planned. Vendor representatives will be escorted in the facility.
A biometric device is installed to control access to the HPC Data center. CCTV cameras are
installed in the HPC Data center to monitor activities in the server room. No person is
allowed to enter the HPC Data center unless accompanied by an authorized person.
Entry and exit times for visitors must be logged in the system or in the entry log book.
3.1.3 BMS room
A biometric device is installed to control access to the BMS room. Only authorized staff, or
persons who monitor alerts generated by security devices, are allowed to enter the BMS room.
3.1.4 UPS room
A biometric device is installed to control access to the UPS room. Only authorized persons are
allowed to enter the UPS room.
3.1.5 Visitors
All IASRI staff, students and third parties visiting the facility are required to present
their ID cards or valid government-issued identification, which will be checked when entering
and leaving the facility. General visitors to the HPC Data center must be escorted by
IASRI staff during their visit to the facility.
3.1.6 Facilities maintenance personnel
Maintenance of equipment and the facility by IASRI staff and third parties is essential.
Maintenance may include, but is not limited to, general cleaning, raised floor space
cleaning, and maintenance of electrical and mechanical systems. Maintenance visits by
non-IASRI staff must be scheduled in advance and notified to the System Administrator.
Maintenance staff will be escorted and/or under surveillance at all times. All maintenance
personnel must carry an approved identification credential and adhere to IASRI policies and
procedures.
3.1.7 Food, Drink, Tobacco and Inflammable Products
Food, drinks, tobacco and inflammable products shall not be allowed in the HPC Data center
area. Smoking shall be strictly prohibited in the HPC Data center facility.
3.1.8 Photography/Videography
Taking of pictures and/or video, including by cell phones equipped with cameras, is
prohibited unless a valid approval from the competent authority is presented.
3.2 Logical Access Control
Login credentials for accessing the computing resources or information/data are created by
the system administrator.
3.2.1 User Creation
A unique identifier (User ID) should be created for every individual who is given access to
the HPC facilities at IASRI. The naming convention for IDs is decided by the
system administrator, who will create a new user-id and provide the
access rights recommended by the CABin Head.
3.2.2 Procedure to get a new account:
1. Open the registration/login form available on the webapp.cabgrid.res.in/biocomp portal.
2. Fill in the form online.
3. The credentials will be verified.
4. After verification, get the approval of the Centre Head.
5. Submit the request to the HPC system administrator to get the account created.
6. An email is sent to the concerned user.
3.2.3 Modification of user rights:
Any request to modify user access rights should be submitted to the CABin
Head for approval. The approved request should then be submitted to the system administrator,
who will make the necessary changes. If rights are often modified on a temporary basis, it
is very important to review user rights regularly so that misuse of elevated
rights can be avoided.
3.2.4 Account locking
An account will be locked after a number of wrong password attempts. An account will also be
locked if the user is not going to use it for a long period of time.
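The two locking conditions can be sketched as follows. This is only an illustration: the policy does not state how many wrong attempts or how long a period of disuse triggers a lock, so `MAX_FAILED_ATTEMPTS` and `MAX_INACTIVE_DAYS` are hypothetical values, and the `Account` class is an assumed data structure rather than part of the HPC system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical thresholds: the policy does not give concrete numbers.
MAX_FAILED_ATTEMPTS = 5
MAX_INACTIVE_DAYS = 180


@dataclass
class Account:
    user_id: str
    failed_attempts: int = 0
    last_login: Optional[date] = None
    locked: bool = False


def record_failed_login(account: Account) -> None:
    """Count a wrong password attempt and lock once the threshold is reached."""
    account.failed_attempts += 1
    if account.failed_attempts >= MAX_FAILED_ATTEMPTS:
        account.locked = True


def lock_if_inactive(account: Account, today: date) -> None:
    """Lock an account that has not been used for a long period."""
    if account.last_login is None:
        return
    if today - account.last_login > timedelta(days=MAX_INACTIVE_DAYS):
        account.locked = True


def unlock(account: Account) -> None:
    """Administrator action after a user's unlock or reset request."""
    account.failed_attempts = 0
    account.locked = False
```

Keeping the thresholds as named constants makes it easy to align the sketch with whatever values the system administrator actually configures.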
3.2.5 Unlocking or resetting of password
If an account is locked or a user has forgotten his/her password, he/she needs to send a
request by email to the system administrator, who will then unlock the account or reset the
password.
3.2.6 User deletion
Information regarding user deletion should be sent to the CABin Head, who will notify the
system administrator to disable or delete the user-id. This is required when
the user has resigned, been suspended or terminated from service, or left the institute
for any other reason.
On receipt of the notification, the system administrator will carry out the requested action
within the specified period of time. User-ids of retired personnel must be deleted within 15
days, unless explicitly advised otherwise by the CABin Head. It is advisable to retain deleted
users' data in archive storage for a certain period of time.
3.3 Application Access and Installation
Users can access all the available applications through the web portal. Users must not
install any application in their home directory without prior permission from the system
administrator/CABin Head. If a required application is not installed,
the user may send a request to the system administrator. The application will be
installed if the CABin Head approves the request.
4 PHYSICAL ACCESS TO DATA CENTRE FACILITY
1. A person who wants physical access to the data center must submit a written
request to the system administrator/CABin Head for permission.
2. Once permission is granted, the user contacts the system administrator,
who will then create an account in the biometric software.
3. After the account is created, the user can enter the data center through the finger
scan device.
5 ADMINISTRATOR RIGHTS
Administrator logins and privileged access rights allow users to override HPC system
controls. Users must not be allowed to work with administrator credentials or with
privileged rights unless it is strictly required, in which case the work must be done in the
presence of the system administrator.
5.1 User ID Reviews:
Access requests must be renewed annually to maintain approved access. Access permissions
should be reviewed periodically. Users shall notify the system administrator immediately if
access is no longer required due to an employee's termination or a change in
responsibilities.
6 STORAGE ALLOCATION
Each individual user/researcher is assigned a standard storage allocation or quota on each
type of storage, namely /home, /scratch and /archive. Researchers/users may use
home storage subject to a soft limit, hard limit and grace period, and scratch storage subject
to a fixed space and a fixed time period. The table below gives a general view of the types of
storage provided to the users.
6.1 Storage Summary:
Table 2 shows the different types of storage with their purpose, total size, backup status
and the file system used.
Table 2: Storage Summary
| Storage | Purpose | Size | Backup | File System |
|---|---|---|---|---|
| /home | Space where users have their home directories. Users can keep their files as long as they want, but usage must be kept under the soft limit. | 250 TB | Yes, but only for particular user accounts | NFS |
| /scratch | Computational work space | 200 TB | No | PFS |
| /archive | Long-term storage | 200 TB | No | NFS |
Important: Of all the space, only /scratch should be used for computational purposes.
6.2 Type of users:
The users are grouped based on the resources and application usage. There are three types of
users in the HPC system.
1. Registered User (RU)
2. Centre Normal User (CNU)
3. Centre Main User (CMU)
The Registered User (RU) profile is created through a web-based registration process and
is available to any user whose request is approved by the Centre Head. The Centre Normal User
(CNU) category is for users who can use a fair amount of cluster
resources. Centre Main Users (CMU) are those who will frequently use a large amount of
cluster resources.
Initially there will be approximately 1200 users across all categories: 100 in CMU, 100 in
CNU and 1000 in RU.
6.3 Storage quota
The storage quota is assigned based on the types of users. There are three types of storage
namely home, scratch and archive. The storage is allocated based on soft limit, hard limit and
grace period.
Soft Limit: Soft limit in quota is defined as the limit which can be exceeded for a particular
time i.e. grace period.
Hard Limit: Hard limit in quota is defined as the limit which cannot be exceeded.
Grace Period: Time period for which a user can keep their space usage above the soft limit.
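How the three terms interact can be illustrated with a small check. This is a sketch of the definitions above, not the quota implementation of the storage system; the function name `write_allowed` and its parameters are hypothetical, and the assumption that the grace period starts when usage first goes above the soft limit is ours.

```python
from datetime import datetime, timedelta
from typing import Optional


def write_allowed(
    usage_gb: float,
    request_gb: float,
    soft_gb: float,
    hard_gb: float,
    grace_days: int,
    over_soft_since: Optional[datetime],
    now: datetime,
) -> bool:
    """Decide whether a write of request_gb fits within the quota rules."""
    new_usage = usage_gb + request_gb
    if new_usage > hard_gb:
        return False  # the hard limit can never be exceeded
    if new_usage <= soft_gb:
        return True  # under the soft limit, no restriction applies
    if over_soft_since is None:
        return True  # first time over the soft limit: grace period starts now
    # Already over the soft limit: allowed only while the grace period lasts.
    return now - over_soft_since <= timedelta(days=grace_days)
```

For example, with a soft limit of 500G, a hard limit of 600G and a 30-day grace period, a user at 520G can keep writing for 30 days after first crossing 500G, but can never go above 600G.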
6.3.1 Limits for home storage 250T:
The main purpose of home storage is to hold users' home directories, where users can keep
their files as long as they want. Table 3 shows the soft limit, hard limit and grace period
assigned to each category, with the total number of users in each category.
Table 3: Limits for Home Storage
| User Type | Soft Limit | Hard Limit | Grace Period | Total No. of Users |
|---|---|---|---|---|
| Registered User | 20G | 25G | 60 Days | 1000 |
| CNU | 500G | 600G | 30 Days | 100 |
| CMU | 750G | 900G | 40 Days | 100 |
| Maximum home space used by all users | 60T | 175T | - | - |
The home storage limits above may be varied in future according to the number of users.
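The 175T figure in the hard limit column can be reproduced from the per-user hard limits and the user counts in Table 3. The snippet below is a minimal arithmetic sketch; it assumes 1T = 1000G, and the dictionary name `HOME_HARD_LIMITS` is only illustrative.

```python
# (hard limit in G, number of users) per category, taken from Table 3.
HOME_HARD_LIMITS = {
    "RU": (25, 1000),
    "CNU": (600, 100),
    "CMU": (900, 100),
}


def total_hard_limit_tb(limits: dict) -> float:
    """Sum hard limit x users over all categories, converted from G to T."""
    total_gb = sum(hard_gb * users for hard_gb, users in limits.values())
    return total_gb / 1000  # assumes 1T = 1000G


total_tb = total_hard_limit_tb(HOME_HARD_LIMITS)  # 25T + 60T + 90T
```

Under this assumption the worst case, in which every user fills their hard limit, stays within the 250T of home storage.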
6.3.2 Limits for scratch (Parallel) storage 200T:
Scratch storage is used to keep data that is either required as input or produced as
output during the execution of a parallel application. Table 4 shows the hard limit and
time period assigned to each category, with the total number of users in each category.
Table 4: Limits for Scratch Storage
| User Type | Hard Limit | Time Period | Total No. of Users |
|---|---|---|---|
| Registered User | 25G | 25 Days | 1000 |
| CNU | 600G | 40 Days | 100 |
| CMU | 900G | 50 Days | 100 |
| Maximum scratch space used by all users | 175T | - | - |
The scratch storage limits above may be varied in future according to the number of users.
A user who needs more space than the hard limit can request permission to
exceed the hard limit for some time.
6.3.3 /scratch Policy
The /scratch storage system is a shared resource that needs to run as efficiently as possible for
the benefit of all users. There is no system backup for data in /scratch; it is the user's
responsibility to back up their data frequently.
- All files older than the time period allowed to a particular user will be removed
regularly as part of the cleaning process.
- Users are strongly advised to clean their data in /scratch regularly, reducing
/scratch usage by backing up the files they need to retain either on
/archive or elsewhere.
- The administrator reserves the right to clean up files on /scratch at any time if
needed to improve the performance of the system.
Some precautions for the users:
- Do not put important source code, scripts, libraries or executables in /scratch. These
important files should be stored in /home.
- Do not create soft links from /home to folders in /scratch for /scratch access.
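The age-based cleaning of /scratch described above can be sketched as a scan for files older than the user's allowed time period. This is not the cleaning tool used on the cluster: the function `find_expired_files` is hypothetical, and measuring age by last modification time is an assumption, since the policy does not say which timestamp is used. The sketch only lists candidates and deletes nothing.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def find_expired_files(
    root: str, max_age_days: int, now: Optional[float] = None
) -> List[Path]:
    """Return files under root whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    expired = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # lstat so that symbolic links are judged by their own age
                if path.lstat().st_mtime < cutoff:
                    expired.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return expired
```

A user could run such a scan on their own /scratch directory with their category's time period (for example 25 days for a Registered User) to see which files are due for removal and copy the ones they need to /archive or elsewhere first.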
6.3.4 Limits for archive storage 200T:
Archive storage is low-cost storage used to keep data for a longer period of time, and
it can only be used by CMU. 40% of it, i.e. 80 TB, is kept for database and application
archiving. Table 5 shows the archive storage limit assigned to users.
Table 5: Limits for Archive Storage
| User Type | Hard Limit | Time Period |
|---|---|---|
| Registered User | 0 | 0 |
| CNU | 0 | 0 |
| CMU | 1.2T | None |
The archive storage limit above may be varied in future according to the number of users.
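The figures in this section are consistent with the 60%/40% split of the 200 TB archive storage described in Section 8.1 and the 100 CMU accounts given in Section 6.2. The short calculation below is only a check of that arithmetic; the variable names are illustrative.

```python
ARCHIVE_TOTAL_TB = 200   # total archive storage
USER_SHARE_PCT = 60      # share reserved for user data (Section 8.1)
CMU_USERS = 100          # initial number of CMU accounts (Section 6.2)

# Space reserved for user data and for database/application archiving.
user_pool_tb = ARCHIVE_TOTAL_TB * USER_SHARE_PCT / 100
reserved_tb = ARCHIVE_TOTAL_TB * (100 - USER_SHARE_PCT) / 100

# Archive quota per CMU account if the user pool is divided equally.
per_cmu_tb = user_pool_tb / CMU_USERS
```

Dividing the 120 TB user pool equally among 100 CMU accounts gives the 1.2T hard limit shown in Table 5, with 80 TB left for database and application archiving.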
6.4 Automatic File Deletion Policy
The table below describes the policy concerning the automatic deletion of files from home,
scratch and archive storage.
Table 6: Automatic File Deletion Policy
| Space | Automatic File Deletion Policy |
|---|---|
| /home | None |
| /archive | None |
| /scratch | Files will be deleted after the expiry of the allowed time period. Files may be deleted as needed without warning if required for system productivity. |
| ALL | All /home and /archive files associated with expired accounts will be automatically deleted after 90 days. The /scratch files will automatically be deleted according to the time period assigned to each user. |
7 CLUSTER RESOURCE ALLOCATION
Resource allocation is an important process to ensure efficient and fair use of the cluster.
The following section describes the different types of cluster available and the allocation of
resources to each user type on each cluster.
7.1 Types of cluster:
1. Linux cluster having 256 nodes
2. Windows cluster having 16 nodes
3. Linux based GPU cluster having 16 nodes
4. Linux cluster at each of the five domain centres
5. SMP
7.2 Available Cluster Resources:
The following table provides information about the cores and RAM in each cluster and its
individual nodes. The Cores columns give the number of cores per node and in the whole
cluster, and the Memory column gives the amount of memory per node.
Table 7: Available Cluster Resources
| Cluster Type | Cores per Node | Total Cores | Memory per Node |
|---|---|---|---|
| 256 Nodes Linux Cluster | 12 | 12*256=3072 | 96G |
| 16 Nodes Windows Cluster | 12 | 12*16=192 | 96G |
| 16 Node GPU based Linux Cluster* | 12 | 12*16=192 | 96G |
| SMP | 64 | 64 | 1.5T |
*Each GPU node contains two GPU cards and the memory of each GPU card is 6 GB.
Allocation of Resources:
The following table shows the allocation of cores to the CMU (Centre Main User), CNU (Centre
Normal User) and RU (Registered User) categories.
Table 8: Allocation of Resources
| Cluster Type | Max Cores/User: CMU | Max Cores/User: CNU | Max Cores/User: RU | Total Cores Assigned: CMU | Total Cores Assigned: CNU | Total Cores Assigned: RU | Total Cores Under Use |
|---|---|---|---|---|---|---|---|
| 256 Nodes Linux Cluster | 40 | 20 | 4 | 1600 | 800 | 400 | 2800 |
| 16 Nodes Windows Cluster | 8 | 4 | NA | 176 | 80 | NA | 256 |
| 16 Node GPU based Linux Cluster* | 8 | 4 | NA | 176 | 80 | NA | 256 |
| SMP | NL | NL | NA | NL | NL | NA | 64 |
*32 GB RAM is set as a limit to the registered user
*NA stands for Not Allowed and NL stands for No Limit
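The per-user core limits in Table 8 can be expressed as a simple lookup that a job submission wrapper might consult. This is a hedged sketch, not the scheduler configuration used on the cluster: the cluster keys (`linux`, `windows`, `gpu`, `smp`) and the function `may_request` are hypothetical names, NA is modelled as `None` and NL as infinity.

```python
NO_LIMIT = float("inf")  # "NL" in Table 8
NOT_ALLOWED = None       # "NA" in Table 8

# Maximum cores per user, by cluster and user category (from Table 8).
MAX_CORES_PER_USER = {
    "linux": {"CMU": 40, "CNU": 20, "RU": 4},
    "windows": {"CMU": 8, "CNU": 4, "RU": NOT_ALLOWED},
    "gpu": {"CMU": 8, "CNU": 4, "RU": NOT_ALLOWED},
    "smp": {"CMU": NO_LIMIT, "CNU": NO_LIMIT, "RU": NOT_ALLOWED},
}


def may_request(cluster: str, category: str, cores: int) -> bool:
    """Check a core request against the per-user limit for the category."""
    limit = MAX_CORES_PER_USER[cluster][category]
    if limit is NOT_ALLOWED:
        return False  # this category may not use the cluster at all
    return cores <= limit
```

For instance, a CNU user may request up to 20 cores on the Linux cluster, while a Registered User cannot submit to the Windows, GPU or SMP systems.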
8 BACKUP AND RESTORE
The importance of data and the unprecedented growth in data volumes have necessitated an
efficient approach to data backup and recovery. This section provides details
of data backup and retrieval operations.
The purpose of the backup and restore policy is as follows:
- To safeguard the information assets of IASRI.
- To prevent the loss of data in the case of accidental deletion or corruption of data,
system failure, or disaster.
- To permit timely restoration of information if an unwanted event occurs.
- To manage and secure backup and restoration processes and the media employed in
the process.
8.1 Procedure
The archive storage currently deployed for backup has 200 TB of disk-based storage. 60% of
the storage is reserved for user data and the rest is for database, application data and other data.
The backup software used to control the backup processes is HP ibrix. The Systems Support
team ensures that all backups are completed successfully and reviews the backup process
daily. Logs are maintained to verify the amount of data backed up and to record unsuccessful
backup occurrences.
8.2 Backup Content
The primary data that will be backed up are: data files of Centre Main Users, database files,
applications installed on the cluster and common application data required by users. Data to be
backed up will be listed by location and specified data sources.
8.3 Backup Approaches
1. Data accessed 24x7 should be backed up with full backups most of the time, because
restoring from a full backup takes less time to bring the data back online. It is important to
decide how many days each full backup copy is kept, as full backups consume a lot of backup
storage space. If backup space is a constraint, a cycle of one full backup
followed by several daily differential backups should be carried out.
2. User data backup should be carried out as differential or incremental backup.
Differential backup should be considered first if enough space is available;
otherwise use incremental backup.
3. Installed applications and the data used by these applications do not change very often, so
incremental backup is the best option.
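The difference between the three approaches lies in which files each run copies. The sketch below uses the standard definitions (a differential backup copies everything changed since the last full backup, an incremental backup copies everything changed since the last backup of any kind); it does not describe how the backup software at IASRI selects files, and the function `files_to_back_up` is hypothetical.

```python
from typing import Dict, List


def files_to_back_up(
    mtimes: Dict[str, float], kind: str, last_full: float, last_backup: float
) -> List[str]:
    """Select files for a backup run.

    mtimes maps each file path to its last modification time.
    last_full is the time of the most recent full backup.
    last_backup is the time of the most recent backup of any kind.
    """
    if kind == "full":
        return sorted(mtimes)  # everything, regardless of age
    if kind == "differential":
        since = last_full  # changes since the last full backup
    elif kind == "incremental":
        since = last_backup  # changes since the last backup of any kind
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(path for path, mtime in mtimes.items() if mtime > since)
```

A differential run grows each day until the next full backup but needs only the full copy plus one differential to restore, whereas an incremental run stays small but a restore needs the full copy and every incremental since. This is why the space available drives the choice in point 2.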
8.4 Database Back-up
1. The database backup process requires an online database to be taken offline or locked, so
that the backup copy contains a consistent state of the database.
2. If taking the database offline is not an option, then specialized software should be used to
carry out periodic backups.
3. Time interval for backup
8.5 Data Backup policy:
1. Full backups are performed weekly. Full backups are retained for 3 months before
being overwritten.
2. Incremental backups are performed daily. Incremental backups are retained for
1 month before being overwritten.
3. Backups are carried out overnight.
4. Once the backup process is finished, the backup copy is copied to a remote site for
disaster recovery.
5. Backups are stored securely and only authorized persons have access to them.
6. The IT department monitors backup operations, and the status of backup jobs is
checked daily during the working week.
7. Failed backups are re-run the next day.
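The retention rules in points 1 and 2 determine whether a given backup copy can still be used for a restore. The check below is a minimal sketch; it approximates 3 months as 90 days and 1 month as 30 days, which is an assumption, and the names `RETENTION` and `is_within_retention` are illustrative.

```python
from datetime import date, timedelta

# Retention periods from the policy, approximated in days (assumption).
RETENTION = {
    "full": timedelta(days=90),         # about 3 months
    "incremental": timedelta(days=30),  # about 1 month
}


def is_within_retention(kind: str, taken_on: date, today: date) -> bool:
    """Return True if a backup of this kind has not yet been overwritten."""
    return today - taken_on <= RETENTION[kind]
```

For example, a full backup taken on 1 January 2014 is still available on 1 March 2014, whereas an incremental backup taken on the same day is not.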
8.6 Restore Procedure
1. Data is available for restoration once the ongoing backup is complete, or if the required
backup copy already exists, i.e. in an older backup.
2. Backup data will only be available for restoration during the retention period.
3. Requests for data restoration/recovery must be sent to the backup/IT administrator or
the Head of the IASRI datacenter.
8.7 Deleted User Account
If a user account is deleted, its backup will be kept for a limited period of time.
However, users must be made aware that IASRI is not responsible for the data once an account
is deleted, because the user's backup data may be deleted if backup space runs short.
8.8 Best Practices for User
1. Always keep a backup of your data on your personal device.
2. For better use of backup storage, please remove backed-up data that you will never
need in the future.
3. Do not back up multimedia files that are unrelated to your HPC
work.