From Newton to Einstein: relativistic dynamics of black holes in dense star clusters
Rainer Spurzem, Astronomisches Rechen-Institut, Zentrum für Astronomie (ZAH), Univ. Heidelberg
[email protected]
http://www.ari.uni-heidelberg.de/mitarbeiter/spurzem/
with … Astrogrid-D, DLR, NVIDIA
@ ARI / ITA (ZAH): Peter Berczik, Ingo Berentzen, Miguel Preto, Ralf Klessen, Robi Banerjee, Jonathan Downing, Andreas Ernst, Jose Fiestas, Oliver Porth, Kristina Wäcken
Computer Engineering at U Heidelberg: Reinhard Männer, Andreas Kugel, Guillermo Marcus
@ UvA, NL: Simon Portegies Zwart et al.
Further important collaborators: David Merritt (RIT), Sverre Aarseth (IoA), Doug Lin (KIAA / UCO Lick), G. Schäfer, A. Gopakumar (Univ. Jena, D), N. Nakasato (Aizu-Wakamatsu, Japan), T. Hamada (Nagasaki U, Japan)
Sponsors and Team:
SPP1177
MWK Ba-Wü.
N. Attig, M.A. Hermanns, B. Orth, W. Frings
May 2009 DEISA PRACE
Contents
Introduction
Algorithms / Parallelization / Software
Binary Black Holes / Gravitational Waves
Hardware Issues
(Credit: X-ray: NASA/CfA/J. Grindlay et al., Optical: NASA/STScI/R. Gilliland et al.)
Slide: Miguel Preto
Galaxies merge, hierarchical Structure formation, their centres? Black Holes?
Volonteri
Slide: Ingo Berentzen
Sesana et al. 2004
Extending study of Volonteri et al. 03: Rate of expected Black Hole Mergers in Galaxies
Classical Black Holes
Contents
Single Black Hole
Binary Black Holes / Gravitational Waves
Special Supercomputing
Observations of Binary Black Holes?
Left: double-double radio galaxies, clear out of inner disk, milliparsec, small mass ratio
Right: Jet Flipping, large mass ratio, depend on viscous time scales (Theory: Liu 2004, Liu et al. 2006)
Astrophysical Introduction
Dynamical Time Scale: 10^6 yrs
Relaxation Time Scale: 10^8 yrs
Age of Universe: 10^10 yrs
← Virial Equilibrium →
Laboratories for gravothermal N-Body Systems!
Note: Cosmological and Galactic N-Body Simulations need few crossing times, and less than a relaxation time, while gravothermal systems need multiples of N crossing times, several relaxation times! Complexity goes as N^3!
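The N^3 growth noted on this slide can be illustrated with a deliberately crude estimate. The sketch below assumes the textbook relation t_relax ≈ N / (8 ln N) crossing times and a cost of N^2 pairwise forces per crossing time; both are assumptions made for illustration and are not taken from the slides, and the function names are hypothetical.

```python
import math

def relaxation_in_crossing_times(n):
    """Assumed textbook estimate: t_relax ~ N / (8 ln N) crossing times."""
    return n / (8.0 * math.log(n))

def relative_cost(n, n_relax=1.0):
    """Crude cost: N^2 pairwise forces per crossing time (assumption),
    multiplied by the number of crossing times in n_relax relaxation times."""
    return n * n * n_relax * relaxation_in_crossing_times(n)

if __name__ == "__main__":
    for n in (1e4, 1e5, 1e6):
        ratio = relative_cost(2 * n) / relative_cost(n)
        print(f"N = {n:.0e}: doubling N multiplies the cost by {ratio:.2f}")
```

Doubling N multiplies the estimate by slightly less than 8, because the logarithm in the relaxation time weakens the pure N^3 law a little.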
Three-Body – Million Body
Astrophysical Introduction
Kinetic Models for Comparison:
Gaseous Model: Any N, spherical, smooth profiles, no bin.
Fokker-Planck: Any N, axisymm., smooth profiles, no bin.
Monte Carlo: Large N, spherical, star-by-star, binaries
N-Body: Fair N, anything, star-by-star, (binaries)
Contents
Introduction
Algorithms / Parallelization / Software
Binary Black Holes / Gravitational Waves
Hardware Issues
+ General Relativity.... (if separations get smaller than some critical radius; for galactic supermassive black holes of 10^7 M☉ it is a = 0.1 pc for a circular orbit, or e.g. a = 1.0 pc for e = 0.9)
So we need (among others): Supercomputers and Supersoftware …
• 2-body Regularization (Kustaanheimo & Stiefel 1965)
• 3-body Regularization (Aarseth & Zare 1974)
• Hierarchical Subsystems (Chain, Aarseth & Mikkola)
• Triples, Quadruples, Stability (Mardling & Aarseth 2001)
• Individual Hierarchical Time Stepping
• Ahmad-Cohen Neighbour Scheme
Two-Body Relaxation, Core Collapse, Interplay with Most Compact Binaries
Direct N-Body is NOT brute force
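Individual hierarchical time stepping can be sketched as follows. This is a minimal illustration, not the NBODY6++ implementation: each particle gets a natural step from the standard Aarseth criterion, the step is rounded down to a power of two that divides the current time, and only the particles due next form the active block. The function names, the default `eta` and the `dt_min` guard are assumptions made for this sketch.

```python
import math

def aarseth_timestep(a, j, s, c, eta=0.02):
    """Aarseth criterion from the magnitudes of the acceleration (a) and its
    first three time derivatives (j, s, c); eta is an accuracy parameter."""
    return math.sqrt(eta * (a * s + j * j) / (j * c + s * s))

def block_timestep(dt, t_now, dt_max=1.0, dt_min=2.0 ** -40):
    """Largest power-of-two step <= dt that also divides t_now.
    Assumes t_now already lies on the power-of-two time grid."""
    step = dt_max
    while step > dt or math.fmod(t_now, step) != 0.0:
        step *= 0.5
        if step < dt_min:
            raise ValueError("time step fell below dt_min")
    return step

def active_block(t_last, dt):
    """Next update time and the indices of all particles due at that time."""
    t_next = min(t + d for t, d in zip(t_last, dt))
    due = [i for i, (t, d) in enumerate(zip(t_last, dt)) if t + d == t_next]
    return t_next, due
```

Because all steps are powers of two, particles sharing a step stay synchronised, so forces for a whole block can be computed together and distributed over processors.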
Scaling of algorithm:
O(p N/p) + O(N^2/p) [ + O(N N_n/p) + O(N_bin^2/p) ]
1: Communication (systolic)
2: Total Force / Regular Force
3: SPH Force / Irregular Force
4: KS Binaries
Parallelization and Software
Codes Discussed Here:
φGRAPE/φGPU: 1 + 2 (parallel GRAPE, GPU)
NBODY6++: 1 + 2 (parallel) + 3 (parallel), no GRAPE
NBODY6xx: as NBODY6++ + 4 (parallel)
NBODY6yy: as NBODY6++ + φGRAPE
Figure: Moore's Law for Direct N-Body, by D.C. Heggie, via www.maths.ed.ac.uk. Hardware eras marked: Vector Computers, GRAPE, GRAPE/GPU Clusters. Labelled points: Baumgardt, Heggie, Hut; Baumgardt, Makino; Berentzen, Preto, Berczik, Merritt, Spurzem. Visible axis marks: 2010 and 10^6.
Method A: use geodesic equations, harmonic gauge, directly obtain eqs. of motion (Blanchet et al.)
Method B: Hamiltonian approach using ADM gauge (Schaefer et al.)
A and B are equivalent up to PN2.5 (1/c**5); at higher order gauge functions appear.
Post-Newtonian Dynamics
Perihelion shift
... higher order...
Grav. Radiation
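The three labels above refer to successive terms of the post-Newtonian expansion of the relative two-body acceleration. A schematic form is given below; the symbols a_1PN, a_2PN and a_2.5PN are placeholders for the full expressions (which depend on the gauge choice of Method A or B) and are not spelled out here.

```latex
\frac{d\mathbf{v}}{dt}
  = -\frac{G m}{r^{2}}\,\mathbf{n}
  + \underbrace{\frac{1}{c^{2}}\,\mathbf{a}_{1\mathrm{PN}}}_{\text{perihelion shift}}
  + \underbrace{\frac{1}{c^{4}}\,\mathbf{a}_{2\mathrm{PN}}}_{\text{higher order}}
  + \underbrace{\frac{1}{c^{5}}\,\mathbf{a}_{2.5\mathrm{PN}}}_{\text{grav. radiation}}
  + \dots
```

Here m is the total mass, r the separation and n the unit vector along the separation. The first three terms are conservative; the 1/c^5 term is the lowest order at which energy is lost to gravitational radiation.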
Kupi, Amaro-Seoane & Spurzem 2006
What happens afterwards? Post-Newton Order „2.5“...
Post-Newtonian Dynamics
Indirect Proof by Hulse and Taylor, binary pulsar (Nobel prize 1993)
Gravitational Waves
Post-Newtonian Dynamics
Kupi, Amaro-Seoane, Spurzem, MNRAS 2006
A complete inspiral in NBODY6++…
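For orientation only: the orbit-averaged decay of an isolated binary by gravitational radiation is commonly estimated with the Peters (1964) formulae. The sketch below implements that estimate; it is not the post-Newtonian force implementation used in NBODY6++, it ignores the surrounding stars, and it does not recompute the critical separations quoted earlier. The constant values and function names are assumptions of this sketch.

```python
G = 6.674e-11       # m^3 kg^-1 s^-2
C = 2.998e8         # m s^-1
MSUN = 1.989e30     # kg
PC = 3.086e16       # m

def peters_enhancement(e):
    """Factor by which eccentricity e speeds up the orbit-averaged decay of a."""
    return (1.0 + 73.0 / 24.0 * e**2 + 37.0 / 96.0 * e**4) / (1.0 - e**2) ** 3.5

def peters_dadt(m1, m2, a, e=0.0):
    """Orbit-averaged da/dt (m/s) from gravitational radiation (Peters 1964)."""
    k = 64.0 / 5.0 * G**3 * m1 * m2 * (m1 + m2) / C**5
    return -k / a**3 * peters_enhancement(e)

def peters_merger_time_circular(m1, m2, a):
    """Time (s) for a circular binary of separation a to coalesce."""
    return 5.0 / 256.0 * C**5 * a**4 / (G**3 * m1 * m2 * (m1 + m2))

if __name__ == "__main__":
    m = 1.0e7 * MSUN
    for a_pc in (1e-3, 1e-2):
        t = peters_merger_time_circular(m, m, a_pc * PC)
        print(f"a = {a_pc} pc: circular merger time {t:.3e} s")
```

The strong a^4 dependence and the steep growth of the enhancement factor with eccentricity show why eccentric binaries enter the relativistic regime at much larger separations than circular ones.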
Parallelization and Software
• Copy Algorithm: parallelize work over block members, replicate all data on all processors
  Example: NBODY6++, for regular and irregular forces; experimental: for binaries (Spurzem 1999)
• Ring Algorithm: domain decomposition, partial forces shifted; blocking or non-blocking, systolic or hyper-systolic (Gualandris et al. 2005, Dorband et al. 2003)
• Mixed Algorithm: φGPU – domain decomposition on GPU memories, copy algorithm for active particles (Harfst et al. 2007)
All scaling: O(N p) + O(N^2/p)
Note: Special hypersystolic quadratic algorithm (Makino 2002): O(N/sqrt(p)) + O(N^2/p)
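The ring (systolic) idea can be illustrated with a serial simulation of p processors. This is a sketch under stated assumptions, not any of the cited codes: it circulates the source particles around the ring (one common variant; the slide describes shifting partial forces), uses a round-robin decomposition, G = 1 and a softening `eps`. All names are hypothetical.

```python
def partial_acc(targets, sources, eps=1e-3):
    """Softened accelerations (G = 1) on target positions from (mass, position) sources."""
    out = []
    for tx, ty, tz in targets:
        ax = ay = az = 0.0
        for m, (sx, sy, sz) in sources:
            dx, dy, dz = sx - tx, sy - ty, sz - tz
            inv_r3 = (dx * dx + dy * dy + dz * dz + eps * eps) ** -1.5
            ax += m * dx * inv_r3
            ay += m * dy * inv_r3
            az += m * dz * inv_r3
        out.append([ax, ay, az])
    return out

def ring_acc(pos, mass, p, eps=1e-3):
    """Simulate p ranks: each keeps its own targets while source buffers travel the ring."""
    n = len(pos)
    owner = [list(range(r, n, p)) for r in range(p)]            # round-robin domains
    buffers = [[(mass[i], pos[i]) for i in idx] for idx in owner]
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for _ in range(p):                                          # p systolic steps
        for rank in range(p):
            part = partial_acc([pos[i] for i in owner[rank]], buffers[rank], eps)
            for i, a in zip(owner[rank], part):
                for k in range(3):
                    acc[i][k] += a[k]
        buffers = buffers[-1:] + buffers[:-1]                   # rank r receives from r-1
    return acc
```

After p shifts every rank has seen every buffer once, so the result matches the direct sum while each rank only ever stores about N/p source particles.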
Parallelization and Software
N: particle number
s: block size
p: processor number
τf: force calc. time
τl: latency time
τc: communic. time
ξ: number of bytes (192)
β: number of flop/force (20)
X: speed (flop/s)
Y: bandwidth (Byte/s)
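One way these symbols can combine into a timing estimate is sketched below. The model itself is an assumption made for illustration (the slide's own formula is not reproduced in this text): force time βNs/(Xp) shared by p processors, plus p ring shifts that each pay one latency τl and move s/p particles of ξ bytes at bandwidth Y. The function names and the example machine parameters are hypothetical.

```python
import math

def step_time(n, s, p, X, Y, tau_l, beta=20.0, xi=192.0):
    """Assumed model for one block step of s active particles on p processors."""
    compute = beta * n * s / (X * p)                  # force work shared by p
    communicate = p * (tau_l + xi * (s / p) / Y)      # p shifts of s/p particles
    return compute + communicate

def optimal_processors(n, s, X, tau_l, beta=20.0):
    """Minimum of the assumed model: only the latency term grows with p."""
    return math.sqrt(beta * n * s / (X * tau_l))

if __name__ == "__main__":
    n, s, X, Y, tau_l = 1e6, 1e3, 1e9, 1e9, 1e-5      # illustrative values only
    p_opt = optimal_processors(n, s, X, tau_l)
    print(f"p_opt ~ {p_opt:.0f}, time per block step {step_time(n, s, p_opt, X, Y, tau_l):.3e} s")
```

In this model adding processors beyond p_opt makes each step slower, which is the kind of behaviour an optimal processor number in a benchmark expresses.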
GIZMO/NBODY6++ nearly same! But: Memory!
Dorband et al. 2003, Harfst et al. 2007
Parallelization and Software
Dorband, Hemsendorf, Merritt 2003, J. Comp. Phys.
DEISA Cineca / JUBL
Benchmarks
With Oliver Porth, Andreas Ernst and NIC Support Team: Marc-Andre Hermanns, Boris Orth, Wolfgang Frings
DEISA Louhi / JUMP
Benchmarks
With Oliver Porth, Andreas Ernst and NIC Support Team: Marc-Andre Hermanns, Boris Orth, Wolfgang Frings
Right: DEISA / JUMP / JUBL Benchmarks, Optimal Processor Numbers
Software, Benchmarks
Contents
Introduction
Algorithms / Parallelization / Software
Binary Black Holes / Gravitational Waves
Hardware Issues
Two equal-mass black holes near center of Plummer-model galaxy
Initial Conditions - I:
m_BH1 = m_BH2 = 0.005
m_BH1 = m_BH2 = 0.020
a = 0.6
G = M = 1
E_TOT = −1/4
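A Plummer model in these units (G = M = 1, E_TOT = −1/4) can be sampled as in the sketch below. It is a minimal illustration, not the setup used for the runs shown: only positions are drawn, velocities and the two black holes are omitted, and the function names are hypothetical. The scale radius 3π/16 is the value for which a Plummer sphere in virial equilibrium has potential energy −1/2 and hence total energy −1/4.

```python
import math
import random

PLUMMER_A = 3.0 * math.pi / 16.0   # scale radius for E_tot = -1/4 with G = M = 1

def plummer_positions(n, rng):
    """Draw n positions by inverting the cumulative mass M(r) = r^3 / (r^2 + a^2)^(3/2)."""
    pts = []
    while len(pts) < n:
        u = rng.random()
        if u <= 0.0 or u >= 1.0:
            continue                                   # avoid r = 0 and r = infinity
        r = PLUMMER_A / math.sqrt(u ** (-2.0 / 3.0) - 1.0)
        z = r * (1.0 - 2.0 * rng.random())             # isotropic direction
        phi = 2.0 * math.pi * rng.random()
        rho = math.sqrt(max(r * r - z * z, 0.0))
        pts.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return pts

def potential_energy(pts):
    """Direct pairwise sum for equal masses 1/N (G = 1)."""
    m = 1.0 / len(pts)
    w = 0.0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            w -= m * m / math.dist(pts[i], pts[j])
    return w

if __name__ == "__main__":
    pts = plummer_positions(500, random.Random(42))
    print("potential energy:", potential_energy(pts))
```

With velocities drawn from the matching distribution function the kinetic energy would be close to 1/4, giving the total energy quoted above.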
Galactic Nuclei, Black Holes
Binary BH Evolution in non-rotating spherical Models
Slide: Ingo Berentzen
Increasing Particle Number
Two equal-mass black holes near center of King-model (W0=6) galaxy
Initial Conditions - II:
m_BH1 = m_BH2 = 0.020
a = 0.6
G = M = 1
E_TOT = −1/4
ω0 = 0.0+, 0.3±, 0.6±, 1.2±, 1.8±
ω0 = 1.8+, Lz ≈ 0
Galactic Nuclei, Black Holes
Returning to Classical Dynamics:
Spherical Systems: BH binary stalls with N
Axisymmetric, Rotating systems: No stalling observed
Berczik, Merritt, Spurzem, 2005, ApJ
Berczik, Merritt, Spurzem, Bischof, 2006, ApJ
Movie!
Post-Newtonian Dynamics
Berentzen, Preto, Berczik, Merritt, Spurzem, Astroph. Jl. 2009
Astrophysical Objects in the realm of LISA (left) and VIRGO or LIGO (right)
= activities with Nbody Simulations
Gravitational Wave Prediction from Black Holes in Galactic Nuclei and Star Clusters
in star clusters
Advanced VIRGO
Post-Newtonian Dynamics
Slide: Ingo Berentzen
Berentzen, Preto, Berczik, Merritt, Spurzem, Astroph. Jl. 2009
Our templates (Gopakumar, Schäfer et al.)
Quasi-periodic variations in orbital elements
We can handle arbitrary eccentricities
Contents
Introduction
Algorithms / Parallelization / Software
Binary Black Holes / Gravitational Waves
Hardware Issues
Computational Science... ...after von Neumann...
To reach Exaflop/s (10^9 Gflop/s = 10^18 Flop/s = mol + 2) new ideas needed!
Problems:
Power Consumption
Efficiency for Real Applications
Exaflop/s?
Petaflop/s
Teraflop/s
Gigaflop/s
Computational Science... ...after von Neumann...
Our (and others') solutions....
GPU, 1 Tflop/s peak: ~ 0.4 Watt / Gflop/s
IBM BlueGene: ~ 2.5 Watt / Gflop/s
Standard PC: ~ 25 Watt / Gflop/s
Real Codes Need More Power:
Efficiency N-Body ~ 80 %: ~ 0.5 W/Gflop/s
Efficiency SPH ~ 4 %: ~ 10 W/Gflop/s
Reconfigurable Hardware (FPGA) Univ. Heidelberg MPRACE Board
Efficiency SPH ~ 4 W/Gflop/s
Tesla C1060 graphics processing unit (GPU), 240 cores, 100 Gbit/s
Hardware at CAS
Very (!) Competitive! ( → Peter Berczik )
200 Tesla GPU ~ 200 Tflop/s Peak
Frontier Cluster Heidelberg
41x8=328 Cores
40 Tesla GPU
20 Tflop/s Peak
Klessen, Banerjee, Berczik, Spurzem, Männer et al.
titan Cluster Heidelberg
Using two different kinds of accelerators: GRAPE/GPU, MPRACE/FPGA
Funded by Volkswagen Foundation
green Supercomputing...
Custom Hypertransport Based Interconnect using FPGA Interface Cards, at the Univ. of Heidelberg, partnership with AMD
U. Brüning and collaborators
HTX Board, sub microsec latency
EXTOLL Interconnect, switchless
shared / distributed memory boundary blurs!
• 32 dual-Xeon 3.0 GHz nodes
• 32 GRAPE6a
• 14 TB RAID
• Infiniband link (10 Gb/s)
• Speed: ~4 Tflops
• N up to 4M
• Cost: ~500K USD
• Funding: NSF/NASA/RIT
ARI 32 node cluster GRACE = GRAPE + MPRACE
• 32 dual-Xeon 3.2 GHz nodes
• 32 GRAPE6a
• 32 FPGA
• 7 TB RAID
• Dual port Infiniband link (20 Gb/s)
• Speed: ~4 Tflops
• N up to 4M
• Cost: ~380K EUR
• Funding: Volkswagen/Baden-Württemberg
Infiniband Dual 20Gb/s
Special Supercomputing
Pipeline Generation on FPGA I
Pressure force
pipeline:
Special Supercomputing
Harfst, Gualandris, Merritt, Spurzem, Portegies Zwart, Berczik 2007, New Astron.
Parallelization and Software
Figure: Speed (GFlops, 10^-1 to 10^4) versus particle number N (10^3 to 10^6); curves labelled GRAPE6a, GRAPE6 and 32xGRAPE6a; legend entries 01, 02, 04, 08, 16, 32.
Hardware
Recent Development: GPU – Graphics Cards
GeForce 8800 GTX (NVIDIA)
Using CUDA Library
Special Interfaces and API from GRACE project ported.
Spurzem et al. 2007, Jl. Phys. Conf. Ser.
Berczik et al. 2008, Marcus et al. 2008 (SPHERIC)
φGPU Hermite 6th results
2009.04.22 – China, Peking, Lenovo cluster testing
Speed maximum: for N = 4M on NGPU = 49, ~13 Tflops!!!
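The results above use a 6th-order Hermite scheme. As a shorter illustration of the same predictor-corrector idea, the sketch below implements the standard 4th-order Hermite step with one shared time step; it is a swapped-in lower-order scheme written in plain Python, not the φGPU code, and all names are hypothetical. G = 1 and no softening are assumed.

```python
import math

def acc_jerk(pos, vel, mass):
    """Pairwise accelerations and their time derivatives (jerks), G = 1."""
    n = len(mass)
    acc = [[0.0] * 3 for _ in range(n)]
    jerk = [[0.0] * 3 for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dr = [pos[j][k] - pos[i][k] for k in range(3)]
            dv = [vel[j][k] - vel[i][k] for k in range(3)]
            r2 = sum(x * x for x in dr)
            rv = sum(x * y for x, y in zip(dr, dv))
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dr[k] * inv_r3
                jerk[i][k] += mass[j] * (dv[k] - 3.0 * rv * dr[k] / r2) * inv_r3
    return acc, jerk

def hermite_step(pos, vel, mass, dt):
    """One 4th-order Hermite predictor-corrector step with a shared time step."""
    n = len(mass)
    a0, j0 = acc_jerk(pos, vel, mass)
    # Predictor: Taylor series using acceleration and jerk.
    pp = [[pos[i][k] + vel[i][k] * dt + a0[i][k] * dt**2 / 2 + j0[i][k] * dt**3 / 6
           for k in range(3)] for i in range(n)]
    vp = [[vel[i][k] + a0[i][k] * dt + j0[i][k] * dt**2 / 2
           for k in range(3)] for i in range(n)]
    a1, j1 = acc_jerk(pp, vp, mass)
    # Corrector: combine old and new accelerations and jerks.
    new_vel = [[vel[i][k] + (a0[i][k] + a1[i][k]) * dt / 2 + (j0[i][k] - j1[i][k]) * dt**2 / 12
                for k in range(3)] for i in range(n)]
    new_pos = [[pos[i][k] + (vel[i][k] + new_vel[i][k]) * dt / 2 + (a0[i][k] - a1[i][k]) * dt**2 / 12
                for k in range(3)] for i in range(n)]
    return new_pos, new_vel

def total_energy(pos, vel, mass):
    """Kinetic plus potential energy, used to monitor integration accuracy."""
    n = len(mass)
    e = sum(0.5 * mass[i] * sum(v * v for v in vel[i]) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= mass[i] * mass[j] / math.dist(pos[i], pos[j])
    return e
```

The double loop in `acc_jerk` is the O(N^2) part that GRAPE boards and GPUs accelerate; the predictor and corrector cost only O(N).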
Future Work
• How does SMBH (-binary) interact with interstellar central gas disk? Circum-binary, individual, exchange of torques between 0.1 and 10^-3 pc?
  Stellar Dynamical Shrinking Time Scale
  Viscous Timescale of Disk (Thin/Thick)
  Migration of Black Holes / Star Formation in Disk
• Spin-Orbit Interaction / with Albert-Einstein Inst. Golm
  Spin Alignment BH-Disk, Bardeen-Petterson Effect
  Spin Alignment due to Star Accretion by Tidal Disruption
• Relativistic Spin-Orbit- and Spin-Spin-Interaction
Spin-Orbit Interaction S / Spin-Spin SS
Faye, Blanchet, Buonanno 2006
Note: S_true = m a v_spin
S = c S_true
Maximally rotating body: a ~ Gm/c^2; v_spin = c; S_true ~ Gm^2/c; S contains no power of c.
If not maximally rotating, the PN orders change.
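Written out, the estimate for a maximally rotating body follows directly from the definitions above; the order-of-magnitude signs are kept because factors of order unity are dropped.

```latex
S_{\mathrm{true}} = m\,a\,v_{\mathrm{spin}}
  \sim m \cdot \frac{G m}{c^{2}} \cdot c
  = \frac{G m^{2}}{c},
\qquad
S = c\,S_{\mathrm{true}} \sim G m^{2}.
```

The rescaled spin S therefore carries no power of c, which is why the formal PN order assigned to the spin terms holds only for maximal rotation.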
Post-Newtonian Dynamics
T. Prince (Caltech), Project Scientist, LISA:
Two of the outstanding theoretical issues to be addressed if LISA is to succeed are:
“Formation and evolution of nuclear star clusters around supermassive black holes”
“Understanding the fate of supermassive black holes in galaxy mergers”
GPU, PRACE and future computers will be premier computational instruments in the world for answering these questions. (R. Spurzem)