Lenovo System Management Solutions
2015 Lenovo All rights reserved.
Luigi Brochard, Lenovo HPC Distinguished Engineer
HPC Advisory Council 2016, Lugano April 21-23.
2
HPC Software Solutions through Partnerships
• Building partnerships to provide the “Best In-Class” HPC cluster solutions for our customers
• Collaborating with software vendors to provide features that optimize customer workloads
• Leveraging “Open Source” components that are production ready
• Contributing to “Open Source” (e.g. xCAT, Confluent, OpenStack) to enhance our platforms
• Providing “Services” to help customers deploy and optimize their clusters
Customer Applications
Debuggers & Monitoring: Eclipse PTP + debugger, gdb, ..; ICINGA; Ganglia
Compilers & Tools: Intel Parallel Studio, MKL; Open Source Tools: FFTW, PAPI, TAU, ..
Parallel Runtime: Intel MPI; Open MPI; MVAPICH, IBM PMPI
Workload & Resources: IBM LSF, HPC & Symphony; Adaptive Moab; Maui/Torque; Slurm
Parallel File Systems: IBM GPFS; Lustre; NFS
Systems Management: IBM PCM; xCAT (Extreme Cloud Admin. Toolkit)
OS; VM
OFED; UFM; OmniPath
Compute; Storage; Network
Lenovo System x: Virtual, Physical, Desktop, Server
Enterprise Solution Services: Installation and custom services, may not include service support for third party software
3
xCAT
Open Source
Collaboration with IBM
Server Hardware Management
OS Deployment
IP and network service management
Virtualization Management
CLI
Holistic solution management
Weak GUI
Complex to learn
Lacking structure
Poor enablement for web development
Good for large clusters, difficult for smaller solutions/enterprise networks
4
WEB ORCHESTRATION: Initial Goals
• Provide easy cluster access to new HPC customers using Open Source HPC infrastructure
– Low cost entry into HPC
• Visual summary views to help understand cluster usage
– Admin Console: user management, cluster monitoring
– User Console: job submission, job/cluster monitoring
• Initial target and Proof of Concept trials: China market
– Focus on the China market first: a lot of customers there are just coming into HPC workloads
– Collaborating with customers to understand their usage models and future requirements
– Very positive feedback and market acceptance
– LiCO (Lenovo Intelligent Computing Orchestration) was released to the China market
• WW market: create an English version and work with collaborators to release it as an “Open Source” project: OSMWC
– Oxford University collaboration
5
Lenovo Intelligent Cluster Orchestrator (LiCO)
What is Web Console:
A Unified GUI
• User Portal (dashboard, submit job, monitor job)
• Admin Portal (dashboard, user/account management)
Future Work Items:
• SLURM integration
• ICINGA integration
• Intel OPA integration
• LDAP integration
Legend: Lenovo components; Open Source/3rd party; Lenovo Hardware
WEB CONSOLE GUI
xCAT/Confluent; Torque/MAUI; GOLD; Ganglia
OpenMPI, MVAPICH, MPICH, Intel Parallel Studio
CentOS/RHEL; Lustre; OFED
Server; Storage; Network
Installation guide / scripts
Admin guide / scripts
Main HPC components below the GUI would be part of OpenHPC project
6
Open System Management Web Console (OSMWC)
What is Web Console:
A Unified GUI
• User Portal (dashboard, submit job, monitor job)
• Admin Portal (dashboard, user/account management)
Future Work Items:
• SLURM integration
• ICINGA integration
• Intel OPA integration
• LDAP integration
Legend: Lenovo components; Open Source/3rd party; Lenovo Hardware
WEB CONSOLE GUI
xCAT/Confluent; Torque/MAUI; Ganglia
OpenMPI, MVAPICH, MPICH, Intel Parallel Studio
CentOS/RHEL; Lustre; OFED
Server; Storage; Network
Installation guide / scripts
Admin guide / scripts
Main HPC components below the GUI would be part of OpenHPC project
7
END USER PORTAL – TRANSLATED VIEW
8
ADMIN PORTAL – TRANSLATED VIEW
9
Confluent
10
Confluent Goals
Lenovo led project to improve upon xCAT heritage
Carries on strong CLI and other facets of xCAT
More structured interface
Easier to learn
Web development enabled – RESTful APIs – good GUI possible
Faster performance/lower memory usage/higher scalability for large solutions
Better equipped to work in smaller configurations without full network control
Enhanced security model
Reuse effort across HPC, OpenStack, xClarity efforts
Reuse development effort across multiple projects (Lenovo/external ecosystem)
More contributions from third parties
11
Confluent updates
• xCAT style noderanges
• Client connections persist across server restart (e.g. consoles)
• xCAT style commands:
– nodehealth (new)
– nodesensors (like rvitals)
– nodepower (like rpower)
– nodeeventlog (like reventlog)
– nodeconsole (like rcons)
– nodesetboot (like rsetboot)
– nodeidentify (like rbeacon)
– nodelist (like nodels)
• Inventory in API (nodeinventory to come, similar to rinv)
• Dynamic nodegroups (groups with a ‘noderange’ attribute get expanded)
• Enriched debugging facilities
• Rotating log support (defaults to daily)
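The noderange and dynamic nodegroup bullets above can be illustrated with a minimal Python sketch. It is not Confluent's implementation: it handles only a simplified subset of the xCAT-style syntax (comma lists, `n[1-3]` and `n1-n3` ranges, and group names), and the `groups` dictionary layout and the function name `expand_noderange` are assumptions made for this example.

```python
import re

# Simplified patterns (assumption): the real noderange grammar is richer.
BRACKET = re.compile(r"^([A-Za-z]+)\[(\d+)-(\d+)\]$")   # e.g. n[1-3]
PLAIN = re.compile(r"^([A-Za-z]+)(\d+)-\1(\d+)$")       # e.g. r01-r03


def _span(prefix, lo, hi):
    """Return prefix+number for lo..hi, keeping the zero padding of lo."""
    return [prefix + str(i).zfill(len(lo)) for i in range(int(lo), int(hi) + 1)]


def expand_noderange(expr, groups=None, _seen=frozenset()):
    """Expand a comma-separated noderange into an ordered, de-duplicated list.

    groups is a hypothetical mapping: name -> {"nodes": [...]} for a static
    group, or name -> {"noderange": "..."} for a dynamic group whose
    membership is computed by expanding that attribute.
    """
    groups = groups or {}
    nodes = []
    for element in expr.split(","):
        element = element.strip()
        if not element:
            continue
        bracket = BRACKET.match(element)
        plain = PLAIN.match(element)
        if bracket:
            expanded = _span(*bracket.groups())
        elif plain:
            expanded = _span(*plain.groups())
        elif element in groups:
            if element in _seen:
                raise ValueError(f"group cycle at {element!r}")
            group = groups[element]
            if "noderange" in group:
                # Dynamic group: expand its noderange attribute recursively.
                expanded = expand_noderange(
                    group["noderange"], groups, _seen | {element}
                )
            else:
                expanded = list(group.get("nodes", []))
        else:
            expanded = [element]  # a single node name
        for node in expanded:
            if node not in nodes:
                nodes.append(node)
    return nodes


if __name__ == "__main__":
    demo_groups = {
        "compute": {"noderange": "n[1-2]"},   # dynamic group
        "storage": {"nodes": ["s1"]},         # static group
    }
    print(expand_noderange("compute,storage,n2", demo_groups))
```

In this sketch a dynamic group stores no member list of its own, so changing its `noderange` attribute changes its membership the next time it is expanded.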
12
Confluent Web UI (consoles without plugin or java)
13
Confluent CLI – through confetti (RESTful API)
14
nodesensors (csv, and time series data)
15
Confluent performance
16
Future High Performance Computing Open Solutions
• Partnering as a founding member of the OpenHPC initiative to establish a common Open HPC Framework
• Collaborating with Oxford University to create an Open System Management framework for small to medium clusters
• Leading Open Source system management projects: Confluent and the soon-to-be-formed OSMWC
• Contributing to the xCAT Open Source project to enhance our platforms
• Providing “Services” to help customers deploy and optimize their clusters
Customer Applications
Parallel File Systems: Lenovo GSS; Intel Lustre; NFS
Systems Management: Open System Management WEB Console (OSMWC); Confluent; xCAT (Extreme Cloud Admin. Toolkit)
OS; VM
OFED; UFM; OmniPath
Compute; Storage; Network
Lenovo System x: Virtual, Physical, Desktop, Server
Enterprise Solution Services: Installation and custom services, may not include service support for third party software
17
Future High Performance Computing Solutions
• Adding new features
• Power & Energy awareness
• Light weight virtual HPC
• Big Data / Spark workload
• Managing more than the servers
Customer Applications
Parallel File Systems: Lenovo GSS; Intel Lustre; NFS
Open System Management WEB Console (OSMWC)
Integration with xCAT (Extreme Cloud Admin Toolkit), Confluent
OS; VM
OFED; UFM; OmniPath
Compute; Storage; Network
Lenovo System x: Virtual, Physical, Desktop, Server
Enterprise Professional Services: Installation and custom services, may not include service support for third party software
18
Future HPC Software Solutions through Partnerships
• Building partnerships to provide the “Best In-Class” HPC cluster solutions for our customers
• Collaborating with software vendors to provide features that optimize customer workloads
• Bright Computing
• Altair
• …
Customer Applications
Debuggers & Monitoring: Eclipse PTP + debugger, gdb, ..; ICINGA; Ganglia
Compilers & Tools: Intel Parallel Studio, MKL; Open Source Tools: FFTW, PAPI, TAU, ..
Parallel Runtime: Intel MPI; Open MPI; MVAPICH, IBM PMPI
Workload & Resources: IBM LSF, HPC & Symphony; Adaptive Moab; Maui/Torque/Slurm/PBSPro
Parallel File Systems: IBM GPFS; Lustre; NFS
Systems Management: IBM PCM; xCAT (Extreme Cloud Admin. Toolkit); BC CM
OS; VM
OFED; UFM; OmniPath
Compute; Storage; Network
Lenovo System x: Virtual, Physical, Desktop, Server
Enterprise Solution Services: Installation and custom services, may not include service support for third party software
20
ADMIN PORTAL – TRANSLATED VIEW
21
User Job Submission views
22
User Job Submission – provide Scheduler job file
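The screenshot for this slide is not recoverable. As a rough illustration of what a scheduler job file can look like for the Torque/Maui stack listed earlier in the deck, the Python sketch below renders a Torque-style script. The function name `render_torque_job`, its parameters, and the example command are hypothetical and are not part of the web console; only the `#PBS` directives and `$PBS_O_WORKDIR` are standard Torque conventions.

```python
def render_torque_job(name, command, nodes=1, ppn=1,
                      walltime="01:00:00", queue=None):
    """Return the text of a Torque/PBS job script (illustrative helper)."""
    if nodes < 1 or ppn < 1:
        raise ValueError("nodes and ppn must be positive")
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",                    # job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",   # nodes and processes per node
        f"#PBS -l walltime={walltime}",       # wall-clock limit, HH:MM:SS
    ]
    if queue:
        lines.append(f"#PBS -q {queue}")      # optional target queue
    lines += [
        "cd $PBS_O_WORKDIR",                  # run from the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example command is a placeholder, not taken from the slides.
    print(render_torque_job("demo", "mpirun ./my_app", nodes=2, ppn=16))
```

A file produced this way would typically be handed to the scheduler with `qsub`; how the console itself passes the file on is not described in the slides.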
23
Admin / Operator views
24
ADMIN PORTAL – TRANSLATED VIEW
25
ADMIN PORTAL – TRANSLATED VIEW
26
ADMIN PORTAL – TRANSLATED VIEW
27
nodehealth