> Solving Big problems with OS: Condor > Antonio Sanz ([email protected]) > 15 / Oct / 2012

Solving Big problems with Condor - II HPC Sysadmins Meeting


DESCRIPTION

This is a long talk about the main features of Condor, and what tweaks we have added at I3A.


Page 1: Solving Big problems with Condor - II HPC Sysadmins Meeting

> Solving Big problems with OS: Condor
> Antonio Sanz ([email protected])
> 15 / Oct / 2012

Page 2: Solving Big problems with Condor - II HPC Sysadmins Meeting

> Antonio Sanz
> I3A System Manager
> HERMES HPC cluster sysadmin
> [email protected]
> @antoniosanzalc

Page 3: Solving Big problems with Condor - II HPC Sysadmins Meeting


Page 4: Solving Big problems with Condor - II HPC Sysadmins Meeting

I’m no SGE guy …

Page 5: Solving Big problems with Condor - II HPC Sysadmins Meeting


Condor – Main features

Page 6: Solving Big problems with Condor - II HPC Sysadmins Meeting


Page 7: Solving Big problems with Condor - II HPC Sysadmins Meeting


Healthy project

Page 8: Solving Big problems with Condor - II HPC Sysadmins Meeting

Condor Basics

Heterogeneous computing

Page 9: Solving Big problems with Condor - II HPC Sysadmins Meeting


Job Surveillance

Page 10: Solving Big problems with Condor - II HPC Sysadmins Meeting


Requirements

Page 11: Solving Big problems with Condor - II HPC Sysadmins Meeting


Fair use of resources

3. Queue management systems: Condor

Page 12: Solving Big problems with Condor - II HPC Sysadmins Meeting


Checkpoints

Page 13: Solving Big problems with Condor - II HPC Sysadmins Meeting


Nested jobs (DAG)
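
Nested jobs form a directed acyclic graph: a job starts only after the jobs it depends on have finished. In Condor this is handled by DAGMan. The sketch below is not DAGMan; it uses hypothetical job names and plain Python (3.9+ for `graphlib`) only to illustrate how a dependency graph turns into batches of runnable jobs.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: job name -> set of jobs that must finish first.
deps = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "collect": {"simulate_a", "simulate_b"},
}


def run_order(dependencies):
    """Return batches of jobs; every job in a batch has all its parents done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run in parallel
        batches.append(ready)
        sorter.done(*ready)  # pretend the whole batch finished successfully
    return batches


for step, batch in enumerate(run_order(deps), start=1):
    print(step, batch)
```

Jobs in the same batch are independent of each other, so a scheduler is free to run them at the same time on different machines.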

Page 14: Solving Big problems with Condor - II HPC Sysadmins Meeting


Easy Licensing

Page 15: Solving Big problems with Condor - II HPC Sysadmins Meeting


… with Hadoop, MPI, OpenMP, GPU

Page 16: Solving Big problems with Condor - II HPC Sysadmins Meeting


Condor Flocking

Page 17: Solving Big problems with Condor - II HPC Sysadmins Meeting


Grid & Cloud Computing

Page 18: Solving Big problems with Condor - II HPC Sysadmins Meeting


VM Universe

Page 19: Solving Big problems with Condor - II HPC Sysadmins Meeting


Hooks & APIs

Page 20: Solving Big problems with Condor - II HPC Sysadmins Meeting


Flexibility

Page 21: Solving Big problems with Condor - II HPC Sysadmins Meeting


How Condor works


Page 22: Solving Big problems with Condor - II HPC Sysadmins Meeting


Management

[Hello, Dave]

Page 23: Solving Big problems with Condor - II HPC Sysadmins Meeting


Compute

* Hey! I’m a 64K one!

Page 24: Solving Big problems with Condor - II HPC Sysadmins Meeting


Job list ClassAd


Page 25: Solving Big problems with Condor - II HPC Sysadmins Meeting


Resource list ClassAd

Page 26: Solving Big problems with Condor - II HPC Sysadmins Meeting


Matchmaking
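
Matchmaking pairs a job ClassAd with a resource ClassAd: both sides publish attributes plus a requirements expression, a match needs both expressions to accept the other ad, and a rank expresses preference among acceptable matches. The following is a toy sketch of that idea in Python, not the real ClassAd language; every attribute name, machine name and value here is made up for illustration.

```python
# Toy matchmaking sketch. Ads are plain dicts; "requirements" and "rank"
# are Python callables standing in for ClassAd expressions (an assumption).

def matches(job, machine):
    """A match is symmetric: each ad's requirements must accept the other."""
    return job["requirements"](machine) and machine["requirements"](job)


def matchmake(job, machines):
    """Return the acceptable machine the job ranks highest, or None."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])


job = {
    "owner": "alice",
    "request_memory": 2048,
    "requirements": lambda m: m["arch"] == "X86_64" and m["memory"] >= 2048,
    "rank": lambda m: m["memory"],  # prefer the machine with more memory
}

machines = [
    # Too little memory for the job.
    {"name": "node01", "arch": "X86_64", "memory": 1024,
     "requirements": lambda j: True},
    # Accepts any owner except "guest".
    {"name": "node02", "arch": "X86_64", "memory": 4096,
     "requirements": lambda j: j["owner"] != "guest"},
    # Big machine whose owner only accepts small jobs.
    {"name": "node03", "arch": "X86_64", "memory": 8192,
     "requirements": lambda j: j["request_memory"] <= 1024},
]

best = matchmake(job, machines)
print(best["name"] if best else "no match")
```

Note that the largest machine is rejected even though the job would prefer it: the machine's own requirements veto the match, which is how resource owners keep control over what runs on their hardware.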

Page 27: Solving Big problems with Condor - II HPC Sysadmins Meeting


Priority Management

Page 28: Solving Big problems with Condor - II HPC Sysadmins Meeting

Data Transfer

Page 29: Solving Big problems with Condor - II HPC Sysadmins Meeting


Job running

Page 30: Solving Big problems with Condor - II HPC Sysadmins Meeting


Job Monitoring

Page 31: Solving Big problems with Condor - II HPC Sysadmins Meeting


Job End

Page 32: Solving Big problems with Condor - II HPC Sysadmins Meeting


Example

Page 33: Solving Big problems with Condor - II HPC Sysadmins Meeting


Hello, World !!

#!/bin/sh
# I’m hola.sh
echo Hola mundo desde `hostname`

#
# A Hello World .. In Condor!
#
# I’m hello.sub
Universe = vanilla
Executable = hola.sh
Log = hola.log
Output = hola.out
Error = hola.err
Queue
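
The submit description above queues a single job. As a hedged sketch, the helper below (a hypothetical function, not part of Condor) renders the same kind of file from Python and queues several copies, using the `$(Process)` submit macro so that each job writes its own output and error files. The file naming scheme is an assumption for illustration.

```python
def render_submit(executable, count=1, universe="vanilla"):
    """Build a submit description like hello.sub that queues `count` jobs."""
    stem = executable.rsplit(".", 1)[0]  # "hola.sh" -> "hola"
    lines = [
        f"Universe   = {universe}",
        f"Executable = {executable}",
        f"Log        = {stem}.log",
        # $(Process) is expanded by Condor to 0, 1, 2, ... for each queued job.
        f"Output     = {stem}.$(Process).out",
        f"Error      = {stem}.$(Process).err",
        f"Queue {count}",
    ]
    return "\n".join(lines) + "\n"


print(render_submit("hola.sh", count=3))
```

The generated text can be saved to a file and handed to `condor_submit` in the same way as `hello.sub`.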

Page 34: Solving Big problems with Condor - II HPC Sysadmins Meeting

Launching the computation

condor_submit

4. Condor Basics – An easy computation

Page 35: Solving Big problems with Condor - II HPC Sysadmins Meeting

Launching the computation

condor_q

Page 36: Solving Big problems with Condor - II HPC Sysadmins Meeting


HERMES

I3A HPC cluster

Page 37: Solving Big problems with Condor - II HPC Sysadmins Meeting


1500 executing jobs, 40000 in queue … Lookin’ good

Page 38: Solving Big problems with Condor - II HPC Sysadmins Meeting


Condor Tweaks

Page 39: Solving Big problems with Condor - II HPC Sysadmins Meeting

Proprietary Resources

Page 40: Solving Big problems with Condor - II HPC Sysadmins Meeting


Dynamic Partitioning

Page 41: Solving Big problems with Condor - II HPC Sysadmins Meeting


Long Jobs

Page 42: Solving Big problems with Condor - II HPC Sysadmins Meeting


Short Jobs

Page 43: Solving Big problems with Condor - II HPC Sysadmins Meeting


Big Jobs

Page 44: Solving Big problems with Condor - II HPC Sysadmins Meeting


Advanced Accounting

Page 45: Solving Big problems with Condor - II HPC Sysadmins Meeting


Dynamic Checkpointing

Page 46: Solving Big problems with Condor - II HPC Sysadmins Meeting


Condor_ssh

Interactive Access

Page 47: Solving Big problems with Condor - II HPC Sysadmins Meeting

GPU Integration

Page 48: Solving Big problems with Condor - II HPC Sysadmins Meeting


Extra Bonus

Future (always work in progress)

Page 49: Solving Big problems with Condor - II HPC Sysadmins Meeting


HA

Page 50: Solving Big problems with Condor - II HPC Sysadmins Meeting


Cgroups Isolation

Page 51: Solving Big problems with Condor - II HPC Sysadmins Meeting


Hadoop Integration


Page 52: Solving Big problems with Condor - II HPC Sysadmins Meeting


Green Computing

Page 53: Solving Big problems with Condor - II HPC Sysadmins Meeting


Nobody’s perfect ….

Page 54: Solving Big problems with Condor - II HPC Sysadmins Meeting

No MPI + Dynamic Partitioning (yet)

No job backfilling, no slot-wise preemption

HA tough as nails

Page 55: Solving Big problems with Condor - II HPC Sysadmins Meeting


Page 56: Solving Big problems with Condor - II HPC Sysadmins Meeting

> Conclusions

Page 57: Solving Big problems with Condor - II HPC Sysadmins Meeting


Example

Page 58: Solving Big problems with Condor - II HPC Sysadmins Meeting

Antonio Sanz
[email protected]
@antoniosanzalc

Slides here: http://slideshare.net/ansanz

Fly like a bird with Condor
Powerful. Flexible. Free.

Page 59: Solving Big problems with Condor - II HPC Sysadmins Meeting


Extra Bonus

Page 60: Solving Big problems with Condor - II HPC Sysadmins Meeting


I3A & Condor

Page 61: Solving Big problems with Condor - II HPC Sysadmins Meeting

Alzheimer & Dementia Diagnosis

Page 62: Solving Big problems with Condor - II HPC Sysadmins Meeting


Tissue Modelling

Page 63: Solving Big problems with Condor - II HPC Sysadmins Meeting


Rare Diseases

Page 64: Solving Big problems with Condor - II HPC Sysadmins Meeting


Crash test simulations

Page 65: Solving Big problems with Condor - II HPC Sysadmins Meeting

Complete heart simulation

Page 66: Solving Big problems with Condor - II HPC Sysadmins Meeting

Communication Systems

Page 67: Solving Big problems with Condor - II HPC Sysadmins Meeting


Dynamic gaming AI

Page 68: Solving Big problems with Condor - II HPC Sysadmins Meeting


Autonomous robots

Page 69: Solving Big problems with Condor - II HPC Sysadmins Meeting
