

SEVERAL TOPICS FROM RELATIVITY

FRANZ ROTHE

2010 Mathematics subject classification: 51Fxx; 51Pxx; 70H40.
Keywords and phrases: Instructional exposition, General theory, Geometry and physics, special relativity.

Contents

1 Riemannian geometry
1.1 Curved coordinate systems
1.2 About differentiable manifolds
1.3 Tensors
1.4 Riemannian manifold
1.5 Lie derivative

2 Special Relativity
2.1 Relativity of time and length
2.2 Discovery of Aberration and Parallax
2.3 Aberration and the Doppler effect
2.4 The one-dimensional Doppler effect
2.5 Four-vectors and Minkowski metric
2.6 The relativistic Doppler effect
2.7 Four-velocity
2.8 The energy-momentum vector
2.9 The Compton effect
2.10 Collision of particles
2.11 The motion of particles

3 The Lorentz Group
3.1 Different aging of twins
3.2 The Lorentz transformations
3.3 Infinitesimal generators

4 The Poincaré Half-Plane Model
4.1 Poincaré half-plane and Poincaré disk
4.2 The Euler-Lagrange equation
4.3 The curve of minimal hyperbolic length
4.4 The minimum of hyperbolic length
4.5 Some useful reflections in the half-plane

5 Equation of motion
5.1 Affine geodesic
5.2 Metric geodesic
5.3 The quadratic Lagrangian
5.4 Null geodesics
5.5 The method
5.6 Killing vector

6 Geodesics in the Schwarzschild metric
6.1 The equation for the shape of relativistic orbits
6.2 Kepler's classical nonrelativistic orbits
6.3 Scattering in Newtonian dynamics
6.4 Perturbation expansion for relativistic bounded orbits
6.5 The Mercury perihelion rotation
6.6 Perturbation expansion for the angle of deflection
6.7 The bending of light

7 Gauss' Differential Geometry and the Pseudo-Sphere
7.1 Introduction
7.2 About Gauss' differential geometry
7.3 Riemann metric of the Poincaré disk
7.4 Riemann metric of Klein's model
7.5 A second proof of Gauss' remarkable theorem
7.6 Principal and Gaussian curvature of rotation surfaces
7.7 The pseudo-sphere
7.8 Poincaré half-plane and Poincaré disk
7.9 Embedding the pseudo-sphere into Poincaré's half-plane
7.10 Embedding the pseudo-sphere into Poincaré's disk
7.11 About circle-like curves
7.12 Mapping the boundaries


Franz Rothe, Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223, e-mail: [email protected]


1. Riemannian geometry

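1.1. Curved coordinate systems The conversion of spherical coordinates $(r, \theta, \phi)$ to Cartesian coordinates $(x, y, z)$ is

$$x = r \sin\theta \cos\phi \tag{1.1}$$
$$y = r \sin\theta \sin\phi \tag{1.2}$$
$$z = r \cos\theta \tag{1.3}$$

These are the formulas normally used to define spherical coordinates, taking as their standard domain the values $(r, \theta, \phi)$ with $r \ge 0$, $0 \le \theta \le \pi$ and $-\pi < \phi \le \pi$. According to the multivariable chain rule, the conversion of the differentials becomes

$$\begin{bmatrix} dx \\ dy \\ dz \end{bmatrix} =
\begin{bmatrix}
\sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\
\sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\
\cos\theta & -r\sin\theta & 0
\end{bmatrix}
\begin{bmatrix} dr \\ d\theta \\ d\phi \end{bmatrix} \tag{1.4}$$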
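As a quick plausibility check of the matrix in (1.4), the following sketch compares its entries with central finite differences of the formulas (1.1) to (1.3). The function names, the sample point and the step size are illustrative assumptions, not part of the original text. Note that the entry in the third row and second column is $-r\sin\theta$, as follows from differentiating $z = r\cos\theta$ with respect to $\theta$.

```python
import math

def to_cartesian(r, theta, phi):
    """Spherical -> Cartesian coordinates, formulas (1.1)-(1.3)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def jacobian(r, theta, phi):
    """Analytic matrix of (1.4): rows x, y, z; columns r, theta, phi."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, r * ct * cp, -r * st * sp],
            [st * sp, r * ct * sp, r * st * cp],
            [ct, -r * st, 0.0]]

def numeric_jacobian(r, theta, phi, h=1e-6):
    """Central finite differences of to_cartesian (hypothetical helper)."""
    point = (r, theta, phi)
    columns = []
    for k in range(3):
        plus, minus = list(point), list(point)
        plus[k] += h
        minus[k] -= h
        fp, fm = to_cartesian(*plus), to_cartesian(*minus)
        columns.append([(fp[i] - fm[i]) / (2 * h) for i in range(3)])
    # columns[k][i] = d x_i / d q_k; transpose so that rows belong to x, y, z
    return [[columns[k][i] for k in range(3)] for i in range(3)]

def max_deviation(r, theta, phi):
    """Largest entrywise difference between analytic and numeric matrix."""
    a, b = jacobian(r, theta, phi), numeric_jacobian(r, theta, phi)
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

print(max_deviation(2.0, 0.7, 1.1))
```

The printed deviation should be tiny compared with the matrix entries; a wrong entry in the matrix would show up as a deviation of order one.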

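For example, we consider a point particle moving along a path $x = x(t)$, $y = y(t)$, $z = z(t)$, respectively $r = r(t)$, $\theta = \theta(t)$, $\phi = \phi(t)$. The components of the velocity

$$v_x = \frac{dx}{dt}, \quad v_y = \frac{dy}{dt}, \quad v_z = \frac{dz}{dt}$$

respectively

$$w_r = \frac{dr}{dt}, \quad w_\theta = \frac{d\theta}{dt}, \quad w_\phi = \frac{d\phi}{dt}$$

have the same transformation law given by formula (1.4). Indeed, one gets

$$\begin{bmatrix} v_x \\ v_y \\ v_z \end{bmatrix} =
\begin{bmatrix}
\sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\
\sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\
\cos\theta & -r\sin\theta & 0
\end{bmatrix}
\begin{bmatrix} w_r \\ w_\theta \\ w_\phi \end{bmatrix} \tag{1.5}$$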

The following is a totally different situation. A potential $V$ is called scalar if the invariance

$$W(r, \theta, \phi) = V(x, y, z)$$

holds. Usually, we use the same letter, and indicate by a prime that the functions $V$ and $W$ are different. But they describe the same geometric or physical situation. According to the multivariable chain rule, the gradients $[\partial_r W, \partial_\theta W, \partial_\phi W]$ and $[\partial_x V, \partial_y V, \partial_z V]$ are related by

$$[\partial_r W, \partial_\theta W, \partial_\phi W] = [\partial_x V, \partial_y V, \partial_z V]
\begin{bmatrix}
\sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\
\sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\
\cos\theta & -r\sin\theta & 0
\end{bmatrix} \tag{1.6}$$

One sees that the same matrix as in formula (1.5) occurs, but now on the other side of the equation. Note also that this coincidence is only made possible by using column vectors for the differentials, whereas we have used row vectors for the gradient. I use the symbol $T$, meaning "transpose", to convert row vectors to column vectors for typesetting convenience, and for the transposition of matrices.

Definition 1.1 (Components of a contravariant vector). The components of a contravariant vector have the same conversion (1.4) as the differentials.

Definition 1.2 (Components of a covariant vector). The components of a covariant vector have the same conversion (1.6) as the components of the gradient of a scalar.


For example, the electric field $[E_x, E_y, E_z]$ in Cartesian coordinates gets in spherical coordinates the components $[F_r, F_\theta, F_\phi]$ such that

$$[F_r, F_\theta, F_\phi] = [E_x, E_y, E_z]
\begin{bmatrix}
\sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\
\sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\
\cos\theta & -r\sin\theta & 0
\end{bmatrix} \tag{1.7}$$

Remark. The same conversion (1.6) as for a gradient holds for partial derivatives under any general linear transformation, but only for the covariant derivatives under nonlinear transformations.

1.2. About differentiable manifolds These laws for the conversion of coordinate systems apply in a more general context, indeed on any differential manifold.

Definition 1.3 (Differential manifold). A differential manifold $M$ of dimension $n$ is a locally compact topological Hausdorff space with the following additional structure. For every point there exists a neighborhood with a coordinate system $(x^1, \dots, x^n)$.

The coordinate systems for intersecting neighborhoods are compatible. There exist bijective differentiable transformations between any two different coordinate systems, say $(x^1, \dots, x^n)$ and $(x'^1, \dots, x'^n)$, which are valid in the intersection of such neighborhoods and make them compatible.

Remark. Thus coordinate transformations are considered to be passive transformations, naming the same points with different coordinate labels, at least at first. We name the coordinates, respectively base vectors, of two systems $S$ and $S'$ by the same letters and use primes for distinguishing between them.

Remark. The extensions of such transformations to the entire manifold are called point transformations. Existence of such point transformations, indeed locally of any prescribed form, can be proved. The proof uses a tool called the partition of unity.

Definition 1.4 (Tangent plane and tangent manifold). For every point $P$ of a differential manifold of dimension $n$, there exists a tangent plane $T_P$. This is an $n$-dimensional linear space, with some basis $[\mathbf{e}_1, \dots, \mathbf{e}_n]$. The contravariant differential $[dx^a]$ corresponds to the vector

$$d\mathbf{x} = \sum_{a=1}^{n} \mathbf{e}_a\, dx^a$$

These vectors are invariant under coordinate transformations, just like scalar quantities. The union

$$T = \bigcup_{P \in M} T_P$$

of all tangent planes is called the tangent manifold. This is a differential manifold of dimension $2n$.

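Lemma 1.1. Under any point transformation

$$x'^a = x'^a(x^1, \dots, x^n) \quad \text{for } a = 1, \dots, n$$

the contravariant components $[v^b]$ have the transformation law

$$v'^a = \frac{\partial x'^a}{\partial x^b}\, v^b \quad \text{for } a = 1, \dots, n$$

and the base vectors $\mathbf{e}_b$ have the transformation law

$$\mathbf{e}_b = \frac{\partial x'^a}{\partial x^b}\, \mathbf{e}'_a \quad \text{for } b = 1, \dots, n$$

Hence the vector itself is invariant:

$$\mathbf{v} = v^b \mathbf{e}_b = v'^a \mathbf{e}'_a$$

These are the vectors in the tangent plane $T_P$.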


Remark. We have already used the Einstein sum convention: the sum is to be taken over any index which appears both as an upper and as a lower index in an expression, unless otherwise stated. For example, the differential is simply written

$$d\mathbf{x} = \mathbf{e}_a\, dx^a$$

Remark. The existence of the tangent manifold can be proved in general, by means of some rather abstract (and farfetched) construction. The existence proof is easy under the assumption that the manifold is, at least locally, embedded into a flat space $\mathbb{R}^N$ of higher dimension, and no restrictions on the dimension $N$ are imposed.¹ This situation occurs for general relativity.

Indeed, suppose an embedding of a manifold $M \subseteq \mathbb{R}^N$ is given by formulas

$$X^i = X^i(x^1, \dots, x^n) \quad \text{for } i = 1, \dots, N \tag{1.8}$$

The tangent space basis at any point $P = (x^1, \dots, x^n)$ is simply

$$\mathbf{e}_a = \left[\frac{\partial X^1}{\partial x^a}, \dots, \frac{\partial X^N}{\partial x^a}\right]^T \quad \text{for } a = 1, \dots, n \tag{1.9}$$

Definition 1.5 (Dual linear space). The dual $X^*$ of a real linear space $X$ consists of all linear functionals $X \to \mathbb{R}$. We get the natural bilinear form $\langle \cdot, \cdot \rangle$ which maps $X^* \times X$ to $\mathbb{R}$ and assigns to any ordered pair of $x^* \in X^*$ and $x \in X$ the real number $\langle x^*, x \rangle$.

The cotangent plane $T_P^*$ is identified with the dual of $T_P$, which is the linear space of linear functionals $T_P \to \mathbb{R}$. One gets a convenient basis $[\boldsymbol{\omega}^a]$ for $T_P^*$ by requiring

$$\langle \boldsymbol{\omega}^a, \mathbf{e}_b \rangle = \delta^a_b = \begin{cases} 1 & \text{if } a = b; \\ 0 & \text{if } a \neq b \end{cases}$$

for $a, b = 1, \dots, n$ and extending by linearity.

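Lemma 1.2. Under any point transformation

$$x'^a = x'^a(x^1, \dots, x^n) \quad \text{for } a = 1, \dots, n$$

the covariant components $[f_b]$ have the transformation law

$$f_b = \frac{\partial x'^a}{\partial x^b}\, f'_a \quad \text{for } b = 1, \dots, n$$

and the base vectors $\boldsymbol{\omega}^b$ have the transformation law

$$\boldsymbol{\omega}'^a = \frac{\partial x'^a}{\partial x^b}\, \boldsymbol{\omega}^b \quad \text{for } a = 1, \dots, n$$

Hence the vector

$$\mathbf{v} = f_b\, \boldsymbol{\omega}^b = f'_a\, \boldsymbol{\omega}'^a$$

itself is invariant. These are the vectors in the cotangent plane $T_P^*$.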

Lemma 1.3. For a particle moving in a scalar potential field $S$ with velocity $\mathbf{v}$, the rate of change of the potential felt by the particle is

$$\frac{dS}{dt} = (\partial_a S)\, v^a = \langle \nabla S, \mathbf{v} \rangle$$

¹ Allowing an arbitrarily large number $N$, this is not too strong an assumption. The situation is different once restrictions on the number $N$ are imposed.


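Proof. For the rate of change of the composite function $S(t) = S(x^1(t), \dots, x^n(t))$, the multivariable chain rule gives

$$\frac{dS}{dt} = \frac{\partial S}{\partial x^a}\, \frac{dx^a}{dt}$$

but this is just the functional $\nabla S = (\partial_a S)\, \boldsymbol{\omega}^a \in T_P^*$ applied to the vector $\mathbf{v} = v^b \mathbf{e}_b$, since

$$\langle \nabla S, \mathbf{v} \rangle = \langle (\partial_a S)\, \boldsymbol{\omega}^a, v^b \mathbf{e}_b \rangle = (\partial_a S)\, v^b\, \delta^a_b = (\partial_a S)\, v^a$$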
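The invariance of the pairing $\langle \nabla S, \mathbf{v} \rangle$ can be illustrated numerically. The sketch below evaluates $dS/dt$ once in Cartesian and once in spherical coordinates and compares the results. The sample potential $V = xy + z^2$, the sample point, the velocity components and all helper names are assumptions made only for this illustration; derivatives are approximated by central differences.

```python
import math

def to_cartesian(q):
    """Spherical (r, theta, phi) -> Cartesian (x, y, z), formulas (1.1)-(1.3)."""
    r, theta, phi = q
    return [r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta)]

def V(p):
    """Sample scalar potential in Cartesian coordinates (illustrative choice)."""
    x, y, z = p
    return x * y + z * z

def W(q):
    """The same scalar field expressed in spherical coordinates."""
    return V(to_cartesian(q))

def partials(f, q, h=1e-6):
    """Central-difference partial derivatives of f at the point q."""
    out = []
    for k in range(len(q)):
        plus, minus = list(q), list(q)
        plus[k] += h
        minus[k] -= h
        out.append((f(plus) - f(minus)) / (2 * h))
    return out

q = [2.0, 0.7, 1.1]    # point (r, theta, phi)
w = [0.3, -0.2, 0.5]   # contravariant components (dr/dt, dtheta/dt, dphi/dt)

# J[i][k] = d x^i / d q^k, the matrix of (1.4), obtained numerically
J = [partials(lambda p, i=i: to_cartesian(p)[i], q) for i in range(3)]
v = [sum(J[i][k] * w[k] for k in range(3)) for i in range(3)]  # law (1.5)

grad_V = partials(V, to_cartesian(q))  # covariant components, Cartesian
grad_W = partials(W, q)                # covariant components, spherical

rate_cartesian = sum(g * c for g, c in zip(grad_V, v))
rate_spherical = sum(g * c for g, c in zip(grad_W, w))
print(rate_cartesian, rate_spherical)
```

Both numbers agree up to discretization error, although the individual components of the gradient and of the velocity differ between the two coordinate systems.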

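Lemma 1.4. By the rule

$$\langle \mathbf{b}, \mathbf{a} \rangle = \langle_{*}\, \mathrm{id}(\mathbf{a}), \mathbf{b} \rangle_{*} \tag{1.10}$$

for all $\mathbf{a} \in T_P$ and $\mathbf{b} \in T_P^*$, the natural bijection $\mathrm{id} : T_P \to T_P^{**}$ is given. Via this mapping $\mathrm{id}$, one may identify the double-dual $T_P^{**}$ with the tangent plane $T_P$. Hence one may view $T_P$ as the space of linear functionals $T_P^* \to \mathbb{R}$.

Proof. The double-dual $T_P^{**}$ consists by definition of the linear functionals $T_P^* \to \mathbb{R}$. A basis $[\mathbf{f}_a]$ of $T_P^{**}$ is given by requiring

$$\langle_{*}\, \mathbf{f}_a, \boldsymbol{\omega}^b \rangle_{*} = \delta_a^b = \begin{cases} 1 & \text{if } a = b; \\ 0 & \text{if } a \neq b \end{cases} \tag{1.11}$$

and extending by linearity. An injective linear mapping $\mathrm{id} : T_P \to T_P^{**}$ is defined by setting $\mathrm{id}(\mathbf{e}_a) = \mathbf{f}_a$ for $a = 1, \dots, n$ and extending by linearity. Since the linear spaces $T_P$ and $T_P^{**}$ both have the same finite dimension $n$, one even obtains a bijection. This bijection gives the identification $T_P^{**} = T_P$. Since

$$\langle_{*}\, \mathbf{f}_a, \boldsymbol{\omega}^b \rangle_{*} = \delta_a^b = \langle \boldsymbol{\omega}^b, \mathbf{e}_a \rangle$$

linearity gives the rule (1.10) for all $\mathbf{a} \in T_P^{**} = T_P$ and $\mathbf{b} \in T_P^*$.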

1.3. Tensors

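Definition 1.6 (Tensor). Let $q \ge 0$ and $r \ge 0$ be integers. A tensor $\mathbf{T}$ of type $(q, r)$ is a multilinear mapping¹

$$\mathbf{T} : \overbrace{T_P^* \times T_P^* \times \cdots \times T_P^*}^{q \text{ factors}} \times \overbrace{T_P \times T_P \times \cdots \times T_P}^{r \text{ factors}} \to \mathbb{R}$$

The type of such a tensor is $q$-fold contravariant and $r$-fold covariant.

We see that a tangent vector $\mathbf{v} \in T_P$ has become a tensor of type $(1, 0)$. It has been "disguised" as the linear mapping $\langle \cdot, \mathbf{v} \rangle : T_P^* \to \mathbb{R}$. A generic tensor $\mathbf{T}$ of type $(q, r)$ is a linear combination

$$\mathbf{T} = T^{a_1 \dots a_q}{}_{b_1 \dots b_r}\; \mathbf{e}_{a_1} \otimes \mathbf{e}_{a_2} \otimes \cdots \otimes \mathbf{e}_{a_q} \otimes \boldsymbol{\omega}^{b_1} \otimes \boldsymbol{\omega}^{b_2} \otimes \cdots \otimes \boldsymbol{\omega}^{b_r} \tag{1.12}$$

Note the convenience of the Einstein sum convention. In shorthand, we often write $[T^{a_1 \dots a_q}{}_{b_1 \dots b_r}]$ for such a tensor, since all information is already contained in the set of components.

The basis of the linear space of tensors of type $(q, r)$ is the set of outer (tensor) products

$$\mathbf{e}_{a_1} \otimes \mathbf{e}_{a_2} \otimes \cdots \otimes \mathbf{e}_{a_q} \otimes \boldsymbol{\omega}^{b_1} \otimes \boldsymbol{\omega}^{b_2} \otimes \cdots \otimes \boldsymbol{\omega}^{b_r} = \mathbf{e}_{a_1 \dots a_q}{}^{b_1 \dots b_r}$$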

¹ Some authors used to talk about a "machine", but mathematicians still need coffee machines to convert their thoughts into theorems, nevertheless.


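with any $a_1, \dots, a_q$ and $b_1, \dots, b_r$ in $1, \dots, n$. The basis tensors are defined by the requirements

$$\mathbf{e}_{a_1 \dots a_q}{}^{b_1 \dots b_r}\left(\boldsymbol{\omega}^{c_1}, \dots, \boldsymbol{\omega}^{c_q}, \mathbf{e}_{d_1}, \dots, \mathbf{e}_{d_r}\right) = \delta^{c_1}_{a_1} \cdots \delta^{c_q}_{a_q} \cdot \delta^{b_1}_{d_1} \cdots \delta^{b_r}_{d_r} = \begin{cases} 1 & \text{if } a_1 = c_1, \dots, a_q = c_q \text{ and } b_1 = d_1, \dots, b_r = d_r; \\ 0 & \text{otherwise} \end{cases}$$

with any $a_1, \dots, a_q$, $c_1, \dots, c_q$ and $b_1, \dots, b_r$, $d_1, \dots, d_r$ in $1, \dots, n$, and extending by linearity. The dimension of this linear space is $n^{q+r}$.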

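Lemma 1.5. The rule

$$\mathbf{a}_1 \otimes \mathbf{a}_2 \otimes \cdots \otimes \mathbf{a}_q \otimes \mathbf{b}_1 \otimes \mathbf{b}_2 \otimes \cdots \otimes \mathbf{b}_r \left(\mathbf{c}_1, \dots, \mathbf{c}_q, \mathbf{d}_1, \dots, \mathbf{d}_r\right) = \langle \mathbf{c}_1, \mathbf{a}_1 \rangle \cdots \langle \mathbf{c}_q, \mathbf{a}_q \rangle \cdot \langle \mathbf{b}_1, \mathbf{d}_1 \rangle \cdots \langle \mathbf{b}_r, \mathbf{d}_r \rangle$$

holds for any $\mathbf{a}_1, \dots, \mathbf{a}_q \in T_P$, $\mathbf{c}_1, \dots, \mathbf{c}_q \in T_P^*$ and $\mathbf{b}_1, \dots, \mathbf{b}_r \in T_P^*$, $\mathbf{d}_1, \dots, \mathbf{d}_r \in T_P$.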

Problem 1.1. In general, when transforming the components of a tensor of arbitrary type $(q, r)$, the components for the $S'$-system are obtained from those of the $S$-system by putting for each superscript a Jacobian transformation matrix $\partial x'^a / \partial x^c$, and for each subscript an inverse Jacobian $\partial x^c / \partial x'^a$. Both Jacobians appear on the right-hand side together with the $S$-system tensor.

Apply these rules to get the components $t'_{ab}{}^{c}$ from the components $t_{de}{}^{f}$.

Answer.

$$t'_{ab}{}^{c} = \frac{\partial x^d}{\partial x'^a}\, \frac{\partial x^e}{\partial x'^b}\, \frac{\partial x'^c}{\partial x^f}\; t_{de}{}^{f}$$

Problem 1.2. Suppose we may only use the Jacobian $\partial x'^a / \partial x^c$ but not its inverse. Under that restriction, the superscripts are handled in the same way as in problem 1.1, but for each subscript the Jacobian $\partial x'^a / \partial x^c$ appears on the left-hand side together with the $S'$-system tensor.

Apply these rules to relate the components $t'_{ab}{}^{c}$ and $t_{de}{}^{f}$.

Answer.

$$\frac{\partial x'^a}{\partial x^d}\, \frac{\partial x'^b}{\partial x^e}\; t'_{ab}{}^{c} = \frac{\partial x'^c}{\partial x^f}\; t_{de}{}^{f}$$

Remark. It helps to remember that upper and lower indices on the same side of the equation are always paired, when one counts the upper index in the denominator of a differential quotient like a lower index.

Definition 1.7 (Affine connection). An affine connection defines the tangential part of the rate of change of the tangent plane and its base vectors. The connection coefficients are defined by

$$\Gamma^a_{bc} = \langle \boldsymbol{\omega}^a, \partial_c \mathbf{e}_b \rangle \tag{1.13}$$

Lemma 1.6. An affine connection yields

$$\langle \partial_c \boldsymbol{\omega}^a, \mathbf{e}_b \rangle = -\Gamma^a_{bc} \tag{1.14}$$

and hence determines, too, the cotangential part of the rate of change of the cotangent plane and its base vectors.


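Lemma 1.7. The rates of change of the tangent base vectors are

$$\partial_c \mathbf{e}_b = \mathbf{e}_a \Gamma^a_{bc} + \mathbf{n}_{bc}$$

where the normal parts satisfy

$$\langle \boldsymbol{\omega}^a, \mathbf{n}_{bc} \rangle = 0$$

for all $a, b, c = 1, \dots, n$.

Lemma 1.8. The rates of change of the cotangent base vectors are

$$\partial_c \boldsymbol{\omega}^a = -\boldsymbol{\omega}^b \Gamma^a_{bc} + \mathbf{m}^a{}_c$$

where the normal parts satisfy

$$\langle \mathbf{m}^a{}_c, \mathbf{e}_b \rangle = 0$$

for all $a, b, c = 1, \dots, n$.

Definition 1.8 (Intrinsic derivatives). The intrinsic derivatives¹ of the base vectors are

$$\partial_c \mathbf{e}_b = \mathbf{e}_a \langle \boldsymbol{\omega}^a, \partial_c \mathbf{e}_b \rangle = \Gamma^a_{bc}\, \mathbf{e}_a \tag{1.15}$$

$$\partial_c \boldsymbol{\omega}^a = \boldsymbol{\omega}^b \langle \partial_c \boldsymbol{\omega}^a, \mathbf{e}_b \rangle = -\Gamma^a_{bc}\, \boldsymbol{\omega}^b \tag{1.16}$$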

Definition 1.9 (Covariant derivative). Let $c$ be any index in $1, \dots, n$. The partial covariant derivative of a scalar function $S$ is equal to the partial derivative:

$$\nabla_c S = \partial_c S \tag{1.17}$$

The partial covariant derivative $\nabla_c \mathbf{T}$ of any tensor $\mathbf{T}$, say of type $(q, r)$ as in equation (1.12), is obtained by using linearity and the Leibniz product rule, taking the partial derivatives of the components and the intrinsic partial derivatives of the base vectors.

Remark. Suppose a specific embedding $M \subseteq \mathbb{R}^N$ of the manifold is given. Then the following situation occurs:

• The intrinsic part of any vector $\mathbf{X} \in \mathbb{R}^N$ is given by the projection

$$\mathrm{Proj}_{\parallel T_P}\, \mathbf{X} = \sum_{a=1}^{n} \mathbf{e}_a \langle \boldsymbol{\omega}^a, \mathbf{X} \rangle \tag{1.18}$$

• As explained in lemma 1.12 below, the connection turns out to be symmetric: $\Gamma^a_{bc} = \Gamma^a_{cb}$ holds for all indices $a, b, c$.

• In traditional Gaussian differential geometry the dimensions are $n = 2$ and $N = 3$. Here the normal parts can be calculated, and yield the Weingarten formulas. One needs to carefully distinguish partial derivatives, which refer to the embedding space $\mathbb{R}^3$, from the intrinsic derivatives used in this exposition.

Lemma 1.9. The following are the two simplest cases for a covariant derivative. Let $\mathbf{v} = v^s \mathbf{e}_s$ be a contravariant vector and $c$ be any index in $1, \dots, n$. The $c$-th partial covariant derivative has the components

$$\nabla_c v^a = \partial_c v^a + \Gamma^a_{sc}\, v^s \tag{1.19}$$

Let $\mathbf{f} = f_b\, \boldsymbol{\omega}^b$ be a covariant vector. The $c$-th partial covariant derivative has the components

$$\nabla_c f_b = \partial_c f_b - \Gamma^s_{bc}\, f_s \tag{1.20}$$

¹ In this exposition of differentiable manifolds, the intrinsic derivatives are given no symbol of their own.


Lemma 1.10 (Rule to get covariant derivatives). In the general case of a tensor of type $(q, r)$, the components of the partial covariant derivative are denoted by

$$\nabla_c T^{a_1 \dots a_q}{}_{b_1 \dots b_r} \quad \text{or even simpler} \quad T^{a_1 \dots a_q}{}_{b_1 \dots b_r ; c}$$

They are obtained by the following rule: Each such component is the sum of $1 + q + r$ terms. The first term is the partial derivative

$$\partial_c T^{a_1 \dots a_q}{}_{b_1 \dots b_r}$$

The remaining terms are all products of the tensor components with a Christoffel symbol. The next $q$ terms are added. For each term, a Christoffel symbol "has robbed" a different one of the contravariant indices $a_1, \dots, a_q$ and replaced this index by a contravariant summation index $s$. The Christoffel symbol gets the robbed index, the covariant summation index $s$, and $c$ as last index.

The last $r$ terms are subtracted. Once more, for each term, a Christoffel symbol "has robbed" a different one of the covariant indices $b_1, \dots, b_r$, and replaced it by a covariant summation index $s$. The Christoffel symbol gets as a contraction the corresponding contravariant summation index $s$, the robbed index, and $c$ as last index.
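The index bookkeeping of lemma 1.10 is mechanical, so it can be sketched as a small string generator. The function name, its argument conventions and the ASCII notation (`D_c` for the partial derivative, `Gamma^a_sc` for a Christoffel symbol) are assumptions made for this illustration only; the summation index `s` must differ from all other index names.

```python
def covariant_derivative_terms(name, upper, lower, c, s="s"):
    """Return the 1 + q + r terms of lemma 1.10 as (sign, text) pairs.

    name:  symbol of the tensor, e.g. "t"
    upper: list of contravariant index names (length q)
    lower: list of covariant index names (length r)
    c:     index of the covariant derivative
    s:     summation index, assumed different from all other indices
    """
    def component(up, low):
        text = name
        if up:
            text += "^" + "".join(up)
        if low:
            text += "_" + "".join(low)
        return text

    # First term: the plain partial derivative of the component.
    terms = [("+", "D_" + c + " " + component(upper, lower))]

    # q added terms: the Christoffel symbol robs one upper index a,
    # which is replaced by s in the tensor component.
    for k, a in enumerate(upper):
        robbed = upper[:k] + [s] + upper[k + 1:]
        terms.append(("+", "Gamma^" + a + "_" + s + c + " "
                      + component(robbed, lower)))

    # r subtracted terms: the Christoffel symbol robs one lower index b,
    # which is replaced by s in the tensor component.
    for k, b in enumerate(lower):
        robbed = lower[:k] + [s] + lower[k + 1:]
        terms.append(("-", "Gamma^" + s + "_" + b + c + " "
                      + component(upper, robbed)))
    return terms

for sign, text in covariant_derivative_terms("t", ["c"], ["a", "b"], "d"):
    print(sign, text)
```

For a contravariant vector, `covariant_derivative_terms("v", ["a"], [], "c")` reproduces the two terms of (1.19), and for a covariant vector `covariant_derivative_terms("f", [], ["b"], "c")` reproduces those of (1.20).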

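Proof for a contravariant vector. Let $\mathbf{v} = v^s \mathbf{e}_s$ be a contravariant vector and $c$ be any index in $1, \dots, n$. The $c$-th partial covariant derivative is

$$\nabla_c \mathbf{v} = \mathbf{e}_a \left\langle \boldsymbol{\omega}^a, \frac{D\mathbf{v}}{Dx^c} \right\rangle$$

where the capital $D$ means taking into account the derivatives of the base vectors $\mathbf{e}_s$, too. But because of the projection onto the tangent plane, only the tangential part of these derivatives, which is incorporated into the connection coefficients, is taken into account. Hence

$$\begin{aligned}
\nabla_c \mathbf{v} &= \mathbf{e}_a \left\langle \boldsymbol{\omega}^a, \frac{D(v^s \mathbf{e}_s)}{Dx^c} \right\rangle = \mathbf{e}_a \left\langle \boldsymbol{\omega}^a, \frac{\partial v^s}{\partial x^c}\, \mathbf{e}_s + v^s\, \frac{\partial \mathbf{e}_s}{\partial x^c} \right\rangle \\
&= \mathbf{e}_a\, \frac{\partial v^a}{\partial x^c} + \mathbf{e}_a\, v^s \left\langle \boldsymbol{\omega}^a, \frac{\partial \mathbf{e}_s}{\partial x^c} \right\rangle = \mathbf{e}_a \left[ \frac{\partial v^a}{\partial x^c} + \Gamma^a_{sc}\, v^s \right]
\end{aligned}$$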

Lemma 1.11. For any product or contraction of tensors, the covariant derivatives are formed following the Leibniz product rule.

Proof for the simplest case. Take the contraction $v^a f_a$ of a contravariant vector $\mathbf{v} = v^a \mathbf{e}_a$ and a covariant vector $\mathbf{f} = f_b\, \boldsymbol{\omega}^b$. Since the contraction is a scalar, its covariant derivative is just the partial derivative, and clearly satisfies the Leibniz product rule.

$$\begin{aligned}
\nabla_c (v^a f_a) &= \partial_c (v^a f_a) = (\partial_c v^a) f_a + v^a (\partial_c f_a) \\
&= (\partial_c v^a + \Gamma^a_{sc}\, v^s) f_a + v^b (\partial_c f_b - \Gamma^s_{bc}\, f_s) = (\nabla_c v^a) f_a + v^b (\nabla_c f_b)
\end{aligned}$$

One can add and subtract the Christoffel terms to get the corresponding covariant derivatives. Thus one sees that the covariant partial derivative satisfies the Leibniz product rule.

One sees that formula (1.17), together with lemmas 1.9 and 1.11, already uniquely determines the covariant derivatives of tensors of types $(0, 0)$, $(1, 0)$ and $(0, 1)$. One may proceed inductively from tensors of type $(q, r)$ to those of types $(q + 1, r)$ and $(q, r + 1)$. Indeed, take any tensor $[T^{a_1 \dots a_q}{}_{b_1 \dots b_r}]$ of type $(q, r)$, and $[v^s]$ of type $(1, 0)$, as well as $[f_t]$ of type $(0, 1)$.

The outer product $[v^s\, T^{a_1 \dots a_q}{}_{b_1 \dots b_r}]$ is of type $(q + 1, r)$. Similarly, the outer product $[T^{a_1 \dots a_q}{}_{b_1 \dots b_r}\, f_t]$ is of type $(q, r + 1)$. Their covariant derivatives are to be obtained via the stipulated Leibniz rule. One obtains:

$$\nabla_c \left( v^s\, T^{a_1 \dots a_q}{}_{b_1 \dots b_r} \right) = (\nabla_c v^s)\, T^{a_1 \dots a_q}{}_{b_1 \dots b_r} + v^s \left( \nabla_c T^{a_1 \dots a_q}{}_{b_1 \dots b_r} \right) \tag{1.21}$$

where the right-hand side may already be calculated using formula (1.19), and the rule for tensors of type $(q, r)$. Thus one may proceed inductively to tensors of any type, and check that the rules given by definition 1.9 are always valid. Indeed, with a bit of additional work, one obtains the following result.

Theorem 1.1. Once a connection is specified, and the Leibniz rule is stipulated, the covariant derivatives are uniquely determined for all tensors of arbitrary types. Indeed, they are obtained by the rule from lemma 1.10, and moreover have the following properties:

• The covariant derivative of any contraction equals the contraction of the covariant derivative. As an example we take a tensor $[t^{ab}{}_{df}]$ of type $(2, 2)$.

If $v^{ab}{}_{dfc} = \nabla_c t^{ab}{}_{df}$ and one contracts to $y^a{}_f = t^{ab}{}_{bf}$, then $\nabla_c y^a{}_f = v^{ab}{}_{bfc}$.

In the end of course, both sides are called $\nabla_c t^{ab}{}_{bf}$.

• The Leibniz product rule holds for all possible products of tensors. As an example take a tensor $[t^{ab}]$ of type $(2, 0)$ and a tensor $[f_d]$ of type $(0, 1)$.

$$\nabla_c (t^{ab} f_d) = (\nabla_c t^{ab}) f_d + t^{ab} (\nabla_c f_d)$$

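Problem 1.3. Apply the rules from lemma 1.10 to get the covariant derivative $\nabla_d t_{ab}{}^{c}$.

Answer.

$$\nabla_d t_{ab}{}^{c} = \partial_d t_{ab}{}^{c} - \Gamma^s_{ad}\, t_{sb}{}^{c} - \Gamma^s_{bd}\, t_{as}{}^{c} + \Gamma^c_{sd}\, t_{ab}{}^{s}$$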

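Proposition 1.1 (Transformation of an affine connection). For any $C^2$-smooth point transformation between two coordinate systems, say $(x^1, \dots, x^n)$ and $(x'^1, \dots, x'^n)$, the Christoffel symbols are transformed following the rule

$$\begin{aligned}
\Gamma'^a_{bc} &= \frac{\partial x'^a}{\partial x^d}\, \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^c}\, \Gamma^d_{fg} + \frac{\partial x'^a}{\partial x^d}\, \frac{\partial^2 x^d}{\partial x'^c\, \partial x'^b} \\
&= \frac{\partial x'^a}{\partial x^d}\, \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^c}\, \Gamma^d_{fg} - \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^d}{\partial x'^c}\, \frac{\partial^2 x'^a}{\partial x^d\, \partial x^f}
\end{aligned} \tag{1.22}$$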

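Proof. The connection coefficients in the $S'$-system are defined following the rule (1.13):

$$\Gamma'^a_{bc} = \left\langle \boldsymbol{\omega}'^a, \frac{\partial \mathbf{e}'_b}{\partial x'^c} \right\rangle$$

Now the point transformation gives

$$\begin{aligned}
\Gamma'^a_{bc} &= \left\langle \frac{\partial x'^a}{\partial x^d}\, \boldsymbol{\omega}^d, \frac{\partial}{\partial x'^c} \left( \frac{\partial x^f}{\partial x'^b}\, \mathbf{e}_f \right) \right\rangle \\
&= \left\langle \frac{\partial x'^a}{\partial x^d}\, \boldsymbol{\omega}^d, \left( \frac{\partial x^f}{\partial x'^b}\, \frac{\partial \mathbf{e}_f}{\partial x'^c} + \frac{\partial^2 x^f}{\partial x'^b\, \partial x'^c}\, \mathbf{e}_f \right) \right\rangle \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \frac{\partial x^f}{\partial x'^b} \left\langle \boldsymbol{\omega}^d, \frac{\partial x^g}{\partial x'^c}\, \frac{\partial \mathbf{e}_f}{\partial x^g} \right\rangle + \frac{\partial^2 x^f}{\partial x'^b\, \partial x'^c}\, \langle \boldsymbol{\omega}^d, \mathbf{e}_f \rangle \right] \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^c}\, \Gamma^d_{fg} + \frac{\partial^2 x^d}{\partial x'^b\, \partial x'^c} \right]
\end{aligned}$$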


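thus proving the first formula. Inverse point transformations have inverse Jacobian matrices. Hence

$$\frac{\partial x'^a}{\partial x^d}\, \frac{\partial x^d}{\partial x'^c} = \delta^a_c$$

$$\frac{\partial}{\partial x'^b} \left( \frac{\partial x'^a}{\partial x^d}\, \frac{\partial x^d}{\partial x'^c} \right) = 0$$

$$\frac{\partial^2 x'^a}{\partial x^d\, \partial x^f}\, \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^d}{\partial x'^c} + \frac{\partial x'^a}{\partial x^d}\, \frac{\partial^2 x^d}{\partial x'^c\, \partial x'^b} = 0$$

Thus we get the second formula from the first one.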

Corollary 1. For a $C^2$-smooth manifold, the variation of the connection symbols $\delta\Gamma^a_{bc}$ is a tensor.

Definition 1.10 (Torsion tensor). For a $C^2$-smooth manifold, the antisymmetric part $T^a{}_{bc} = \Gamma^a_{bc} - \Gamma^a_{cb}$ is a tensor, called the torsion tensor.

Definition 1.11 (Symmetric connection). A connection is called symmetric iff $\Gamma^a_{bc} = \Gamma^a_{cb}$ holds for all indices $a, b, c$.

Problem 1.4. Convince yourself that a $C^2$-smooth point transformation takes a symmetric connection to a symmetric one.

Lemma 1.12. Suppose there exists a $C^2$-smooth embedding $M \subseteq \mathbb{R}^N$ of the manifold. Then the connection coefficients are symmetric: $\Gamma^a_{bc} = \Gamma^a_{cb}$ holds for all indices $a, b, c$.

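Proof. With the embedding given by formulas (1.8), the tangent space basis at any point $P = (x^1, \dots, x^n)$ has the vectors

$$\mathbf{e}_b = \left[ \frac{\partial X^1}{\partial x^b}, \dots, \frac{\partial X^N}{\partial x^b} \right]^T = \frac{\partial \vec{X}}{\partial x^b}$$

for $b = 1, \dots, n$. Hence the connection coefficients are

$$\Gamma^a_{bc} = \langle \boldsymbol{\omega}^a, \partial_c \mathbf{e}_b \rangle = \left\langle \boldsymbol{\omega}^a, \frac{\partial^2 \vec{X}}{\partial x^b\, \partial x^c} \right\rangle = \Gamma^a_{cb}$$

since the order of taking partial derivatives can be exchanged for $C^2$-smooth functions.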
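The formula $\Gamma^a_{bc} = \langle \boldsymbol{\omega}^a, \partial^2 \vec{X} / \partial x^b \partial x^c \rangle$ can be tried out on a concrete embedding. The sketch below uses the unit sphere in $\mathbb{R}^3$ with coordinates $(\theta, \phi)$ as an illustrative choice. It rests on one assumption that is not spelled out at this point of the text: each $\boldsymbol{\omega}^a$ is realized as a vector of $\mathbb{R}^3$ acting through the Euclidean dot product, obtained by inverting the Gram matrix of the base vectors, so that $\langle \boldsymbol{\omega}^a, \mathbf{e}_b \rangle = \delta^a_b$ and $\boldsymbol{\omega}^a$ vanishes on normal vectors. All helper names and the step size are hypothetical, and derivatives are approximated by central differences.

```python
import math

def X(q):
    """Embedding of the unit sphere into R^3 (illustrative choice)."""
    theta, phi = q
    return [math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def moved(q, steps, h):
    """Point q shifted by steps[k] * h along coordinate k."""
    return [qk + sk * h for qk, sk in zip(q, steps)]

def unit(n, k):
    return [1 if j == k else 0 for j in range(n)]

def d1(q, b, h=1e-4):
    """Central difference for the base vector e_b = dX/dx^b, formula (1.9)."""
    s = unit(len(q), b)
    plus, minus = X(moved(q, s, h)), X(moved(q, s, -h))
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

def d2(q, b, c, h=1e-4):
    """Central difference for d^2 X / dx^b dx^c."""
    sb, sc = unit(len(q), b), unit(len(q), c)
    def at(i, j):
        return X(moved(q, [i * u + j * w for u, w in zip(sb, sc)], h))
    pp, pm, mp, mm = at(1, 1), at(1, -1), at(-1, 1), at(-1, -1)
    return [(w1 - w2 - w3 + w4) / (4 * h * h)
            for w1, w2, w3, w4 in zip(pp, pm, mp, mm)]

def christoffel(q):
    """Gamma[a][b][c] = <omega^a, d_c e_b> for a 2-dimensional surface."""
    e = [d1(q, b) for b in range(2)]
    g = [[dot(e[a], e[b]) for b in range(2)] for a in range(2)]  # Gram matrix
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # omega^a as a vector of R^3 with <omega^a, e_b> = delta^a_b
    omega = [[sum(g_inv[a][b] * e[b][i] for b in range(2)) for i in range(3)]
             for a in range(2)]
    return [[[dot(omega[a], d2(q, b, c)) for c in range(2)]
             for b in range(2)] for a in range(2)]

G = christoffel([0.9, 0.4])
print(G[0][1][1], -math.sin(0.9) * math.cos(0.9))  # Gamma^theta_{phi phi}
print(G[1][0][1], G[1][1][0])                      # symmetric pair
```

Up to discretization error, the computed coefficients are symmetric in the lower indices, and they reproduce the well-known values $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$ for the sphere.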

Proposition 1.2. Assume that the connection is symmetric. For any given point, there exists a point transformation which makes all connection coefficients zero at this point.

Proof. To transform the connection coefficients at the point $P$ to zero, the quadratic transformation

$$x'^a = x^a - x^a(P) + \frac{\Gamma^a_{bc}(P)}{2} \left[ x^b - x^b(P) \right] \left[ x^c - x^c(P) \right] \quad \text{for } a = 1, \dots, n \tag{1.23}$$

will do. Since the connection is symmetric, the derivatives of the above transformation are

$$\frac{\partial x'^a}{\partial x^b} = \delta^a_b + \Gamma^a_{bc}(P) \left( x^c - x^c(P) \right)$$

$$\frac{\partial^2 x'^a}{\partial x^b\, \partial x^c} = \Gamma^a_{bc}(P)$$


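Now the second formula from equation (1.22) to transform the connection coefficients yields, writing $e$ for the summation index of the first-derivative term in order to avoid a clash with the free index $c$,

$$\begin{aligned}
\Gamma'^a_{bc} &= \frac{\partial x'^a}{\partial x^d}\, \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^c}\, \Gamma^d_{fg} - \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^d}{\partial x'^c}\, \frac{\partial^2 x'^a}{\partial x^d\, \partial x^f} \\
&= \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^c} \left[ \delta^a_d + \Gamma^a_{de}(P) \left( x^e - x^e(P) \right) \right] \Gamma^d_{fg} - \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^d}{\partial x'^c}\, \Gamma^a_{df}(P) \\
&= \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^c}\, \Gamma^a_{fg} + \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^c}\, \Gamma^a_{de}(P) \left( x^e - x^e(P) \right) \Gamma^d_{fg} - \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^d}{\partial x'^c}\, \Gamma^a_{df}(P) \\
&= \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^c} \left[ \Gamma^a_{fg} - \Gamma^a_{gf}(P) \right] + \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^c}\, \Gamma^a_{de}(P) \left( x^e - x^e(P) \right) \Gamma^d_{fg}
\end{aligned}$$

Both terms are zero at the point $P$: the first because the connection is symmetric, the second because of the factor $x^e - x^e(P)$. If the connection coefficients are $C^1$-smooth, both terms are small near the point $P$. Indeed

$$\Gamma'(x') = O(\| x - x(P) \|) = O(\| x' - x'(P) \|) = O(\| x' \|)$$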

Problem 1.5. If the connection is not assumed to be symmetric, convince yourself that one may at least achieve, by the above quadratic transformation, that $\Gamma'^a_{bc}(P) = -\Gamma'^a_{cb}(P)$.

Problem 1.6. Assume that the connection is symmetric and the connection coefficients are $C^1$-smooth. Use the first formula from equation (1.22) and the quadratic transformation $x' \mapsto x$

$$x^d = x^d(P) + x'^d - \frac{\Gamma^d_{bc}(P)}{2}\, x'^b x'^c \quad \text{for } d = 1, \dots, n \tag{1.24}$$

to achieve that the transformed connection satisfies $\Gamma'(P) = 0$ and moreover $\Gamma'(x') = O(\| x' \|)$ near the point $P$.

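Proof. Since the connection is symmetric, the derivatives of the above transformation (1.24) are

$$\frac{\partial x^d}{\partial x'^b} = \delta^d_b - \Gamma^d_{bc}(P)\, x'^c$$

$$\frac{\partial^2 x^d}{\partial x'^b\, \partial x'^c} = -\Gamma^d_{bc}(P)$$

Now the first formula from equation (1.22) to transform the connection coefficients yields, with summation indices $h$ and $e$ chosen to avoid a clash with the free indices $b$ and $c$,

$$\begin{aligned}
\Gamma'^a_{bc} &= \frac{\partial x'^a}{\partial x^d} \left[ \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^c}\, \Gamma^d_{fg} + \frac{\partial^2 x^d}{\partial x'^c\, \partial x'^b} \right] \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \left( \delta^f_b - \Gamma^f_{bh}(P)\, x'^h \right) \left( \delta^g_c - \Gamma^g_{ce}(P)\, x'^e \right) \Gamma^d_{fg} - \Gamma^d_{bc}(P) \right] \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \delta^f_b \delta^g_c\, \Gamma^d_{fg} - \delta^f_b\, \Gamma^g_{ce}(P)\, x'^e\, \Gamma^d_{fg} - \Gamma^f_{bh}(P)\, x'^h\, \delta^g_c\, \Gamma^d_{fg} + \Gamma^f_{bh}(P)\, x'^h\, \Gamma^g_{ce}(P)\, x'^e\, \Gamma^d_{fg} - \Gamma^d_{bc}(P) \right] \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \Gamma^d_{bc} - \Gamma^g_{ce}(P)\, x'^e\, \Gamma^d_{bg} - \Gamma^f_{bh}(P)\, x'^h\, \Gamma^d_{fc} + \Gamma^f_{bh}(P)\, x'^h\, \Gamma^g_{ce}(P)\, x'^e\, \Gamma^d_{fg} - \Gamma^d_{bc}(P) \right] \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \Gamma^d_{bc} - \Gamma^d_{bc}(P) \right] + O(\| x' \|) = O(\| x' \|)
\end{aligned}$$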

Theorem 1.2. The partial covariant derivatives $(\nabla_c \mathbf{T})\, \mathbf{e}^c$ of any tensor $\mathbf{T}$ of type $(q, r)$ form a tensor of type $(q, r + 1)$.


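Proof for the case $(q, r) = (1, 0)$. Let $\mathbf{v} = v^a \mathbf{e}_a$ be a contravariant vector. The $c$-th partial covariant derivative has the components

$$\nabla_c v^a = \frac{\partial v^a}{\partial x^c} + \Gamma^a_{sc}\, v^s$$

obtained from formula (1.19). Under any point transformation $x'^a = x'^a(x^1, \dots, x^n)$ the contravariant components $[v^b]$ have the transformation law

$$v'^a = \frac{\partial x'^a}{\partial x^b}\, v^b$$

The partial derivatives are

$$\begin{aligned}
\frac{\partial v'^a}{\partial x'^e} &= \frac{\partial x^c}{\partial x'^e}\, \frac{\partial v'^a}{\partial x^c} = \frac{\partial x^c}{\partial x'^e}\, \frac{\partial}{\partial x^c} \left( \frac{\partial x'^a}{\partial x^b}\, v^b \right) \\
&= \frac{\partial x^c}{\partial x'^e}\, \frac{\partial^2 x'^a}{\partial x^b\, \partial x^c}\, v^b + \frac{\partial x^c}{\partial x'^e}\, \frac{\partial x'^a}{\partial x^b}\, \frac{\partial v^b}{\partial x^c}
\end{aligned}$$

But we need the covariant derivatives

$$\nabla'_e v'^a = \frac{\partial v'^a}{\partial x'^e} + \Gamma'^a_{be}\, v'^b$$

The connection coefficients are transformed by the second form of the rule (1.22):

$$\Gamma'^a_{be} = \frac{\partial x'^a}{\partial x^d}\, \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^g}{\partial x'^e}\, \Gamma^d_{fg} - \frac{\partial x^f}{\partial x'^b}\, \frac{\partial x^d}{\partial x'^e}\, \frac{\partial^2 x'^a}{\partial x^d\, \partial x^f}$$

$$\begin{aligned}
\Gamma'^a_{be}\, v'^b &= \left[ \frac{\partial x'^a}{\partial x^d}\, \frac{\partial x^g}{\partial x'^e}\, \Gamma^d_{fg} - \frac{\partial x^d}{\partial x'^e}\, \frac{\partial^2 x'^a}{\partial x^d\, \partial x^f} \right] \frac{\partial x^f}{\partial x'^b}\, v'^b \\
&= \left[ \frac{\partial x'^a}{\partial x^d}\, \frac{\partial x^c}{\partial x'^e}\, \Gamma^d_{fc} - \frac{\partial x^b}{\partial x'^e}\, \frac{\partial^2 x'^a}{\partial x^b\, \partial x^f} \right] v^f
\end{aligned}$$

Addition of the formulas yields the covariant derivatives, and the terms with the second derivative cancel.

$$\begin{aligned}
\nabla'_e v'^a &= \frac{\partial v'^a}{\partial x'^e} + \Gamma'^a_{be}\, v'^b \\
&= \frac{\partial x^c}{\partial x'^e}\, \frac{\partial^2 x'^a}{\partial x^b\, \partial x^c}\, v^b + \frac{\partial x^c}{\partial x'^e}\, \frac{\partial x'^a}{\partial x^b}\, \frac{\partial v^b}{\partial x^c} + \frac{\partial x'^a}{\partial x^b}\, \frac{\partial x^c}{\partial x'^e}\, \Gamma^b_{fc}\, v^f - \frac{\partial x^b}{\partial x'^e}\, \frac{\partial^2 x'^a}{\partial x^b\, \partial x^f}\, v^f \\
&= \frac{\partial x^c}{\partial x'^e}\, \frac{\partial x'^a}{\partial x^b} \left[ \frac{\partial v^b}{\partial x^c} + \Gamma^b_{fc}\, v^f \right] = \frac{\partial x^c}{\partial x'^e}\, \frac{\partial x'^a}{\partial x^b}\, \nabla_c v^b
\end{aligned}$$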

Proof for the case (q, r) = (0, 1). Let $f = f_a\omega^a$ be a covariant vector. The c-th partial covariant derivative has the components

\[
\nabla_c f_a = \frac{\partial f_a}{\partial x^c} - \Gamma^s_{ac}\,f_s
\]

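obtained from formula (1.20). Under any point transformation $x^a = x^a(x'^1, \dots, x'^n)$ the transformed covariant components $[f'_b]$ are

\[
f'_b = \frac{\partial x^s}{\partial x'^b}\,f_s
\]

and taking the partial derivatives one gets

\[
\frac{\partial f'_b}{\partial x'^c}
= \frac{\partial}{\partial x'^c}\left(\frac{\partial x^s}{\partial x'^b}\,f_s\right)
= \frac{\partial^2 x^s}{\partial x'^b\,\partial x'^c}\,f_s
 + \frac{\partial x^a}{\partial x'^b}\frac{\partial f_a}{\partial x^d}\frac{\partial x^d}{\partial x'^c}
\]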


But we need the covariant derivatives

\[
\nabla'_c f'_b = \frac{\partial f'_b}{\partial x'^c} - \Gamma'^s_{bc}\,f'_s
\]

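The connection coefficients are transformed by the first form of the rule (1.22):

\[
\Gamma'^s_{bc} = \frac{\partial x'^s}{\partial x^d}\frac{\partial x^h}{\partial x'^b}\frac{\partial x^g}{\partial x'^c}\,\Gamma^d_{hg}
 + \frac{\partial x'^s}{\partial x^d}\frac{\partial^2 x^d}{\partial x'^c\,\partial x'^b}
\]

\[
\begin{aligned}
\Gamma'^s_{bc}\,f'_s
&= \left[\frac{\partial x^h}{\partial x'^b}\frac{\partial x^g}{\partial x'^c}\,\Gamma^d_{hg}
 + \frac{\partial^2 x^d}{\partial x'^c\,\partial x'^b}\right]\frac{\partial x'^s}{\partial x^d}\,f'_s \\
&= \left[\frac{\partial x^h}{\partial x'^b}\frac{\partial x^g}{\partial x'^c}\,\Gamma^d_{hg}
 + \frac{\partial^2 x^d}{\partial x'^c\,\partial x'^b}\right] f_d
\end{aligned}
\]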

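Subtraction of the two formulas yields the covariant derivative. The terms with the second derivative cancel. After relabeling some summation indices one gets

\[
\begin{aligned}
\nabla'_c f'_b
&= \frac{\partial f'_b}{\partial x'^c} - \Gamma'^s_{bc}\,f'_s
 = \frac{\partial x^a}{\partial x'^b}\frac{\partial x^d}{\partial x'^c}\frac{\partial f_a}{\partial x^d}
 - \frac{\partial x^h}{\partial x'^b}\frac{\partial x^g}{\partial x'^c}\,\Gamma^d_{hg}\,f_d \\
&= \frac{\partial x^a}{\partial x'^b}\frac{\partial x^d}{\partial x'^c}\left[\frac{\partial f_a}{\partial x^d} - \Gamma^s_{ad}\,f_s\right]
 = \frac{\partial x^a}{\partial x'^b}\frac{\partial x^d}{\partial x'^c}\,\nabla_d f_a
\end{aligned}
\]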

End of the proof of theorem 1.2. One now proceeds by induction. Because of formula (1.17) the covariant derivatives of tensors of type (0, 0) have type (0, 1). In other words, the derivative of a scalar is a covariant vector.

By the proof for the case (q, r) = (1, 0), the covariant derivative $[\nabla_c v^a]$ of a tensor $[v^a]$ of type (1, 0) has type (1, 1). The Leibniz product rule from equation (1.3) implies that the covariant derivatives of tensors of type (0, 1) have type (0, 2). Indeed, for any contravariant vector $[v^a]$ and covariant vector $[f_a]$ one has

\[
\nabla_c(v^a f_a) - (\nabla_c v^a)\,f_a = v^b\,(\nabla_c f_b)
\]

Both terms on the left-hand side have type (0, 1). The contravariant vector $[v^b]$ has type (1, 0); let the tensor $[\nabla_c f_b]$ have type (q, r). The contraction operation lowers the type from (q, r) to (q, r − 1) for the tensor $[v^b\nabla_c f_b]$. Since the types on both sides are equal, one concludes (0, 1) = (q, r − 1), and hence the tensor $[\nabla_c f_b]$ has type (0, 2).

Based on the fact that all covariant derivatives are to be obtained via the stipulated Leibniz rule (1.21)

\[
\nabla_c\bigl(v^s\,T^{a_1\dots a_q}{}_{b_1\dots b_r}\bigr)
= (\nabla_c v^s)\,T^{a_1\dots a_q}{}_{b_1\dots b_r}
 + v^s\,\bigl(\nabla_c T^{a_1\dots a_q}{}_{b_1\dots b_r}\bigr)
\]

one may proceed inductively from tensors of type (q, r) to those of types (q + 1, r) and (q, r + 1), and check that indeed the tensor $[\nabla_c T^{a_1\dots a_q}{}_{b_1\dots b_r}]$ is of type (q, r + 1).

Definition 1.12 (Intrinsic derivative of a vector along a curve). For a field $v = v^a e_a$ of contravariant vectors, the intrinsic derivative along the curve x = x(u) with any real parameter u is

\[
\frac{Dv^a}{du} = \frac{dv^a}{du} + \Gamma^a_{sc}\,v^s\,\frac{dx^c}{du} \tag{1.25}
\]

\[
\frac{Dv}{du} = (\nabla_c v^a)\,e_a\,\frac{dx^c}{du} \tag{1.26}
\]


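For a field $f = f_b\,\omega^b$ of covariant vectors, the intrinsic derivative along the curve x = x(u) with any real parameter u is

\[
\frac{Df_b}{du} = \frac{df_b}{du} - \Gamma^s_{bc}\,f_s\,\frac{dx^c}{du} \tag{1.27}
\]

\[
\frac{Df}{du} = (\nabla_c f_b)\,\omega^b\,\frac{dx^c}{du} \tag{1.28}
\]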

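Remark. The first formula (1.25) yields

\[
\frac{Dv^a}{du} = \frac{dv^a}{du} + \Gamma^a_{sc}\,v^s\,\frac{dx^c}{du}
= \left[\frac{\partial v^a}{\partial x^c} + \Gamma^a_{sc}\,v^s\right]\frac{dx^c}{du}
= \nabla_c v^a\,\frac{dx^c}{du}
\]

Hence, after attaching the basis vectors, the second formula (1.26) is obtained:

\[
\frac{Dv}{du} = \frac{D(v^a e_a)}{du} = \frac{Dv^a}{du}\,e_a = (\nabla_c v^a)\,e_a\,\frac{dx^c}{du}
\]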

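Remark. Suppose a specific embedding $M \subseteq \mathbb{R}^N$ of the manifold is given. The intrinsic part of any vector $X \in \mathbb{R}^N$ is given by the projection $\mathrm{Proj}_{\parallel T_P} X$ from equation (1.18). The intrinsic derivative of the vector v along a curve $x^c = x^c(u)$ becomes

\[
\frac{Dv}{du} = \mathrm{Proj}_{\parallel T_P}\,\frac{dv}{du} \tag{1.29}
\]

Indeed

\[
\begin{aligned}
\mathrm{Proj}_{\parallel T_P}\,\frac{dv}{du}
&= \mathrm{Proj}_{\parallel T_P}\,\frac{d(v^a e_a)}{du}
 = \mathrm{Proj}_{\parallel T_P}\left[\frac{dv^a}{du}\,e_a + v^s\,\frac{\partial e_s}{\partial x^c}\frac{dx^c}{du}\right] \\
&= \frac{dv^a}{du}\,e_a + v^s\,\mathrm{Proj}_{\parallel T_P}\frac{\partial e_s}{\partial x^c}\,\frac{dx^c}{du}
 = \frac{dv^a}{du}\,e_a + v^s\,e_a\left\langle \omega^a, \frac{\partial e_s}{\partial x^c}\right\rangle\frac{dx^c}{du} \\
&= \left[\frac{dv^a}{du} + v^s\,\Gamma^a_{sc}\,\frac{dx^c}{du}\right] e_a
 = \bigl[\nabla_c v^a\bigr]\,e_a\,\frac{dx^c}{du}
\end{aligned}
\]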

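Definition 1.13 (Intrinsic derivative along a curve). Given is a curve $x^c = x^c(u)$ and a tensor field T = T(x) of type (q, r). The intrinsic derivative $\frac{DT}{du}$ of the tensor T along this curve is defined by the identity

\[
\frac{DT}{du} = \bigl(\nabla_c T^{a_1\dots a_q}{}_{b_1\dots b_r}\bigr)\,e^{\,b_1\dots b_r}_{a_1\dots a_q}\,\frac{dx^c}{du} \tag{1.30}
\]

Under the assumption that a specific embedding $M \subseteq \mathbb{R}^N$ of the manifold is given,

\[
\frac{DT}{du} = e^{\,b_1\dots b_r}_{a_1\dots a_q}\left\langle\!\!\left\langle \omega^{a_1\dots a_q}_{\,b_1\dots b_r},\ \frac{dT(x(u))}{du}\right\rangle\!\!\right\rangle \tag{1.31}
\]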

Corollary 2. We assume an embedding $M \subseteq \mathbb{R}^N$ of the manifold exists. Then the intrinsic derivative $\frac{DT}{du}$ of the tensor T of type (q, r) is again a tensor of the same type (q, r).

Proof. The equation (1.31) is a coordinate-free definition, since the projection involved is coordinate free.¹

Corollary 3. We assume an embedding $M \subseteq \mathbb{R}^N$ of the manifold exists. The covariant derivative of the tensor T of type (q, r)

\[
\nabla T = (\nabla_c T) \otimes \omega^c
\]

is a tensor of type (q, r + 1).

¹ Helmholtz' ants can feel tensors but no coordinates.

Proof. The contraction

\[
\frac{DT}{du} = (\nabla_c T)\,\frac{dx^c}{du}
\]

is of type (q, r) and the tangential vector $\frac{dx^c}{du}$ is of type (1, 0). Hence the tensor $\nabla_c T$ is of type (q, r + 1).

1.4. Riemannian manifold

Definition 1.14 (Riemannian manifold). A Riemannian manifold M is a differentiable manifold with an additional metric structure. At every point and for every differential $dx = e_a\,dx^a$ at that point, the length ds is given by the Riemannian metric

\[
ds^2 = g_{ab}\,dx^a\,dx^b \tag{1.32}
\]

The symmetric matrix $[g_{ab}]$ is assumed to be nonsingular,

\[
g_{ab} = g_{ba} \quad\text{and}\quad g := \det g_{ab} \neq 0
\]

and to have the same number of positive and negative eigenvalues at all points of the manifold.¹

Definition 1.15 (Dot product). By putting $e_a \cdot e_b = g_{ab}$ for all a, b = 1 . . . n and extending by linearity, a commutative dot product is defined on a Riemann manifold.

Postulate. For a Riemannian manifold, the tangent space $T_P$ and the cotangent space $T^*_P$ are identified by the requirement that the inner product equals the bilinear form:

\[
b \cdot a = \langle b, a\rangle \quad\text{for all } b \in T^*_P \text{ and } a \in T_P \tag{1.33}
\]

Lemma 1.13. The postulate to identify the tangent plane with the cotangent plane is equivalent to the rules for lifting and lowering an index:

\[
\omega^a = g^{ab}\,e_b \quad\text{and}\quad e_a = g_{ab}\,\omega^b
\]

Here the matrices $[g^{ab}]$ and $[g_{ab}]$ are inverse to each other. The rules extend to any of the indices of any tensor of any type.

Moreover, under this postulate both

\[
\omega^a \cdot e_c = \delta^a_c, \qquad e_a \cdot e_c = g_{ac} \tag{1.34}
\]

hold for all a, c = 1 . . . n. The postulate is compatible with the identification $T^{**}_P = T_P$ from lemma 1.4.

Proof. For the base vectors, the requirement (1.33) gives

\[
\omega^a \cdot e_c = \langle \omega^a, e_c\rangle = \delta^a_c \quad\text{for all } a, c = 1 \dots n \tag{1.35}
\]

Since $[e_b]$ is a basis of the tangent plane, any identification $T_P \leftrightarrow T^*_P$ gives formulas $\omega^a = g^{ab} e_b$ with a, b = 1 . . . n. From equations (1.35) one obtains

\[
g^{ab}\,e_b \cdot e_c = g^{ab}\,g_{bc} = \delta^a_c \tag{1.36}
\]

Hence the matrices $[g^{ab}]$ and $[g_{ab}]$ are inverse to each other, and one has obtained the rule to lift the index. The rule to lower the index is now easy to check.

1 The last requirement can be proved under the assumption that the manifold is connected.


Conversely, the rules to lower and lift indices imply the requirement (1.33) that the inner product equals the bilinear form.

The identifications T ∗∗P = id(TP) = TP from equation (1.10) and T ∗P = TP from equation (1.33) work together to produce

b · a = 〈b, a〉 = 〈∗id(a),b〉∗ = 〈∗a,b〉∗ = a · b

for all b ∈ T ∗P and a ∈ TP. One puts 〈., .〉 = 〈∗., .〉∗. No contradiction arises since the dot product is commutative.

Corollary 4. Corresponding rules to lift and lower indices apply to tensors of any type (q, r).

Problem 1.7. Apply the rules for lifting and lowering the indices from lemma 1.13 to get the components $t^{a}{}_{bc}$ in terms of the components $t_{de}{}^{f}$.

Answer.

\[
t^{a}{}_{bc} = g^{ad}\,g_{cf}\,t_{db}{}^{f}
\]

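Lemma 1.14. For any $C^1$-smooth Riemann manifold, the metric tensor has covariant derivatives zero, and indeed satisfies

\[
\partial_c g_{ab} = \Gamma_{bac} + \Gamma_{abc} \quad\text{and}\quad \nabla_c g_{ab} = 0 \tag{1.37}
\]

Proof. The connection coefficients are defined by equation (1.13). By the rule from lemma 1.13, one may lower the first indices on both sides and obtain

\[
\Gamma^a_{bc} = \langle \omega^a, \partial_c e_b\rangle, \qquad
\Gamma_{abc} = \langle e_a, \partial_c e_b\rangle
\]

Taking the intrinsic partial derivative $\partial_c$ on both sides of the second equation (1.34) and using the Leibniz rule and commutativity of the dot product, we get

\[
e_a \cdot e_b = g_{ab}, \qquad
e_b \cdot (\partial_c e_a) + e_a \cdot (\partial_c e_b) = \partial_c g_{ab}
\]

From the postulate to identify the tangent space $T_P$ and the cotangent space $T^*_P$, the dot products are bilinear forms and

\[
\langle e_b, \partial_c e_a\rangle + \langle e_a, \partial_c e_b\rangle = \partial_c g_{ab}
\]

For the partial and the covariant derivatives of the metric tensor, one gets

\[
\begin{aligned}
\partial_c g_{ab} &= \Gamma_{bac} + \Gamma_{abc} \\
\nabla_c g_{ab} &= \partial_c g_{ab} - g_{sb}\,\Gamma^s_{ac} - g_{as}\,\Gamma^s_{bc}
 = \partial_c g_{ab} - \Gamma_{bac} - \Gamma_{abc} = 0
\end{aligned}
\]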

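Theorem 1.3. For a $C^2$-smoothly embedded Riemann manifold, the metric tensor determines the connection symbols:

\[
\Gamma_{abc} = \tfrac{1}{2}\left(\partial_b g_{ca} + \partial_c g_{ab} - \partial_a g_{bc}\right) \tag{1.38}
\]

\[
\Gamma^a_{bc} = \frac{g^{ad}}{2}\left(\partial_b g_{cd} + \partial_c g_{db} - \partial_d g_{bc}\right) \tag{1.39}
\]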


Proof. By lemma 1.12 the connection coefficients are symmetric: $\Gamma^a_{bc} = \Gamma^a_{cb}$ and hence $\Gamma_{abc} = \Gamma_{acb}$. Since the equations

\[
\Gamma_{bac} + \Gamma_{abc} = \partial_c g_{ab} \quad\text{and}\quad \Gamma_{abc} = \Gamma_{acb}
\]

hold for all indices a, b, c = 1 . . . n and in particular for their permutations, they imply identity (1.38). The second identity (1.39) follows by the rule to lift an index.

Problem 1.8. A metric is called diagonal if $g_{ab} = 0$ for all a ≠ b. Convince yourself that for a diagonal metric and symmetric connection, the connection coefficients $\Gamma^a_{bc}$ are zero for a, b, c all three different. Check the formulas

\[
\Gamma^a_{ac} = \frac{\partial_c g_{aa}}{2\,g_{aa}} \quad\text{for all } a, c = 1 \dots n;
\]

\[
\Gamma^a_{bb} = -\frac{\partial_a g_{bb}}{2\,g_{aa}} \quad\text{for all } a, b = 1 \dots n \text{ with } a \neq b;
\]

where no summation is implied.
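The formulas of problem 1.8 can be checked numerically. The following is a minimal sketch, assuming spherical coordinates (r, θ, φ) in R³ as the example of a diagonal metric and central differences for the partial derivatives; the function names are illustrative and not part of the text.

```python
import math

def metric(x):
    """Diagonal metric of spherical coordinates x = (r, theta, phi) in R^3."""
    r, theta, _ = x
    return [1.0, r * r, (r * math.sin(theta)) ** 2]  # g_rr, g_thth, g_phph

def d_metric(x, c, h=1e-6):
    """Central-difference estimate of the partial derivatives d_c g_aa."""
    xp, xm = list(x), list(x)
    xp[c] += h
    xm[c] -= h
    return [(p - m) / (2 * h) for p, m in zip(metric(xp), metric(xm))]

def christoffel(x):
    """Gamma^a_bc from formula (1.39), specialised to a diagonal metric."""
    n = 3
    g = metric(x)
    dg = [d_metric(x, c) for c in range(n)]  # dg[c][a] = d_c g_aa

    def dgab(c, a, b):
        # d_c g_ab, which vanishes off the diagonal
        return dg[c][a] if a == b else 0.0

    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for a in range(n):
        for b in range(n):
            for c in range(n):
                # g^{ad} is diagonal, so only d = a contributes to the sum
                gamma[a][b][c] = (dgab(b, c, a) + dgab(c, a, b)
                                  - dgab(a, b, c)) / (2 * g[a])
    return gamma
```

With index order (r, θ, φ) = (0, 1, 2), the entry `christoffel(x)[0][1][1]` approximates Γ^r_θθ = −r and `christoffel(x)[1][0][1]` approximates Γ^θ_rθ = 1/r, in agreement with the two formulas of the problem.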

Let a point P be given on a pseudo-Riemannian manifold, as used in general relativity. We know that there exists a point transformation such that $g_{ab}(P) = \eta_{ab}$ at this one point P.

Problem 1.9. Explain, using linear algebra, how to get such a point transformation. It can even be chosen to be linear.

In the above situation, the cotangent and tangent base vectors satisfy

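\[
\omega^0 = e_0 \quad\text{and}\quad \omega^i = -e_i
\]

\[
e_0 \cdot e_0 = 1, \qquad e_0 \cdot e_i = 0, \qquad e_i \cdot e_k = -\delta_{ik}
\]

\[
e_\alpha \cdot e_\beta = \eta_{\alpha\beta}
\]

for i, k = 1, 2, 3 and α, β = 0, 1, 2, 3.

Definition 1.16 (Tetrad). The base vectors satisfying

\[
\hat e_\alpha \cdot \hat e_\beta = \eta_{\alpha\beta} \quad\text{for } \alpha, \beta = 0, 1, 2, 3
\]

are called a tetrad. They are denoted by carets. I call an orthonormal basis in three dimensions a tetrad, too, and use the same notation.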

Problem 1.10. Suppose the metric $g_{ab}$ is diagonal and positive definite, as occurs for example for spherical coordinates in R3. Write down, in terms of $g_{aa}$:
• the relations of the bases $e_a$ for the tangent space and $\omega^a$ for the cotangent space;
• and the relations to the corresponding tetrad $\hat e_a$.

Answer. With no summation implied,

\[
\omega^a = g^{ab}\,e_b = \frac{e_a}{g_{aa}}, \qquad
\hat e_a = \frac{e_a}{\sqrt{g_{aa}}} = \sqrt{g_{aa}}\;\omega^a
\]


In the case of orthogonal coordinates, one obtains the same tetrad from the tangent basis as from the cotangent basis. Hence one may further simplify the notation. For example, take spherical coordinates (r, θ, φ) in R3. It is customary to denote the unit vector by the boldface name of the coordinate with a caret put above it:

\[
\begin{aligned}
\hat{\mathbf r} &= e_r = \omega^r \\
\hat{\boldsymbol\theta} &= r^{-1}\,e_\theta = r\,\omega^\theta \\
\hat{\boldsymbol\varphi} &= (r\sin\theta)^{-1}\,e_\varphi = r\sin\theta\;\omega^\varphi
\end{aligned}
\]

Problem 1.11. Assume at some point, the electrical field has the covariant components $[E_r, E_\theta, E_\varphi] = [2, 3, 5]$. Calculate the components for the orthonormal basis $\hat{\mathbf r}, \hat{\boldsymbol\theta}, \hat{\boldsymbol\varphi}$.

Problem 1.12. Assume at some point, the velocity of a particle has the contravariant components $[v^r, v^\theta, v^\varphi] = [2, 3, 5]$. Calculate the components for the orthonormal basis $\hat{\mathbf r}, \hat{\boldsymbol\theta}, \hat{\boldsymbol\varphi}$.
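Problems 1.11 and 1.12 can be approached with a short computation. The sketch below assumes the relations of problem 1.10: covariant components are divided by $\sqrt{g_{aa}}$ and contravariant components are multiplied by $\sqrt{g_{aa}}$. The problems do not specify the point, so the values r = 2 and θ = π/2 used in the usage note are a hypothetical choice.

```python
import math

def scale_factors(r, theta):
    """sqrt(g_aa) for spherical coordinates (r, theta, phi)."""
    return [1.0, r, r * math.sin(theta)]

def covariant_to_orthonormal(components, r, theta):
    """E = E_a omega^a with omega^a = e_hat_a / sqrt(g_aa)."""
    return [c / s for c, s in zip(components, scale_factors(r, theta))]

def contravariant_to_orthonormal(components, r, theta):
    """v = v^a e_a with e_a = sqrt(g_aa) e_hat_a."""
    return [c * s for c, s in zip(components, scale_factors(r, theta))]
```

For the hypothetical point r = 2, θ = π/2, the covariant components [2, 3, 5] become approximately [2, 1.5, 2.5] in the orthonormal basis, while the contravariant components [2, 3, 5] become approximately [2, 6, 10].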

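Definition 1.17 (Christoffel symbols). For any smooth Riemann manifold, the Christoffel symbols are defined by

\[
M_{abc} = \tfrac{1}{2}\left(\partial_b g_{ca} + \partial_c g_{ab} - \partial_a g_{bc}\right) \tag{1.40}
\]

\[
\left\{\begin{smallmatrix} a \\ b\;c \end{smallmatrix}\right\}
= \frac{g^{ad}}{2}\left(\partial_b g_{cd} + \partial_c g_{db} - \partial_d g_{bc}\right)
= g^{ad}\,M_{dbc} \tag{1.41}
\]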

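Lemma 1.15. For any $C^2$-smooth Riemann manifold

\[
\Gamma^a_{bc} = \left\{\begin{smallmatrix} a \\ b\;c \end{smallmatrix}\right\}
 + \tfrac{1}{2}\left(T_{bc}{}^{a} + T_{cb}{}^{a} + T^{a}{}_{bc}\right) \tag{1.42}
\]

\[
\Gamma^a_{bc} = \left\{\begin{smallmatrix} a \\ b\;c \end{smallmatrix}\right\}
 + \tfrac{1}{2}\left(T^{a}{}_{bc} - T_{c}{}^{a}{}_{b} + T_{bc}{}^{a}\right) \tag{1.43}
\]

\[
\tfrac{1}{2}\left(\Gamma^a_{bc} + \Gamma^a_{cb}\right) = \left\{\begin{smallmatrix} a \\ b\;c \end{smallmatrix}\right\}
 + \tfrac{1}{2}\left(T_{bc}{}^{a} + T_{cb}{}^{a}\right) \tag{1.44}
\]

where $T^a{}_{bc} = \Gamma^a_{bc} - \Gamma^a_{cb}$ is the torsion tensor.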

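Proof. Since the torsion is a tensor, it is sufficient to prove

\[
\begin{aligned}
\Gamma_{abc} &= M_{abc} + \tfrac{1}{2}\left(T_{bca} + T_{cba} + T_{abc}\right) \\
\Gamma_{abc} &= M_{abc} + \tfrac{1}{2}\left(T_{abc} - T_{cab} + T_{bca}\right) \\
\tfrac{1}{2}\left(\Gamma_{abc} + \Gamma_{acb}\right) &= M_{abc} + \tfrac{1}{2}\left(T_{bca} + T_{cba}\right)
\end{aligned}
\]

Since $\partial_c g_{ab} = \Gamma_{bac} + \Gamma_{abc}$ by equation (1.37), the definition (1.40) yields

\[
\begin{aligned}
M_{abc} &= \tfrac{1}{2}\left(\partial_b g_{ca} + \partial_c g_{ab} - \partial_a g_{bc}\right)
 = \tfrac{1}{2}\left(\Gamma_{acb} + \Gamma_{cab} + \Gamma_{bac} + \Gamma_{abc} - \Gamma_{cba} - \Gamma_{bca}\right) \\
M_{abc} - \Gamma_{abc} &= \tfrac{1}{2}\left(\Gamma_{acb} + \Gamma_{cab} + \Gamma_{bac} - \Gamma_{abc} - \Gamma_{cba} - \Gamma_{bca}\right)
 = \tfrac{1}{2}\left(T_{acb} + T_{cab} + T_{bac}\right) \\
\Gamma_{abc} - M_{abc} &= \tfrac{1}{2}\left(T_{bca} + T_{cba} + T_{abc}\right)
 = \tfrac{1}{2}\left(T_{abc} - T_{cab} + T_{bca}\right) \\
(\Gamma_{abc} + \Gamma_{acb}) - 2M_{abc} &= \tfrac{1}{2}\left(T_{bca} + T_{cba} + T_{abc} + T_{cba} + T_{bca} + T_{acb}\right)
 = T_{bca} + T_{cba}
\end{aligned}
\]

which checks formulas (1.42), (1.43) and (1.44).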


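1.5. Lie derivative

Any vector field $[v^a]$ defines infinitesimal point transformations with

\[
x'^a = x^a + \varepsilon v^a, \qquad x^a = x'^a - \varepsilon v^a + O(\varepsilon^2) \tag{1.45}
\]

\[
\frac{\partial x'^a}{\partial x^c} = \delta^a_c + \varepsilon\,\partial_c v^a, \qquad
\frac{\partial x^c}{\partial x'^a} = \delta^c_a - \varepsilon\,\partial_a v^c + O(\varepsilon^2) \tag{1.46}
\]

One may even define on the manifold a (local) flow $\varepsilon \mapsto x' = \Phi(\varepsilon, x)$ as the (local) solution of the initial value problem

\[
\frac{\partial \Phi^a(\varepsilon, x)}{\partial \varepsilon} = v^a\bigl(\Phi(\varepsilon, x)\bigr), \qquad \Phi^a(0, x) = x^a \tag{1.47}
\]

For any tensor field one gets the induced flow $t' = \Phi(\varepsilon, t)$. Here the tensor t′ has been transformed, but is still evaluated at the same coordinates x.¹

\[
t'(x') = \text{Product of Jacobians} \cdot t(x) \tag{1.48}
\]

\[
\Phi(\varepsilon, t)(x) = t'(x) = \text{Product of Jacobians} \cdot t(\Phi(-\varepsilon, x)) \tag{1.49}
\]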

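Definition 1.18 (Lie derivative). The Lie derivative $L_v t$ of any tensor t along the vector field v is determined by the transformation of the tensor under the above flow. Either, in terms of the infinitesimal point transformations (1.45), one defines

\[
\begin{aligned}
t(x') - t'(x') &= \varepsilon\,L_v t + O(\varepsilon^2) \\
t'(x) - t(x) &= -\varepsilon\,L_v t + O(\varepsilon^2)
\end{aligned} \tag{1.50}
\]

or, in terms of the flow $x' = \Phi(\varepsilon, x)$ and its induced flow $t' = \Phi(\varepsilon, t)$,

\[
\lim_{\varepsilon \to 0} \frac{\Phi(\varepsilon, t) - t}{\varepsilon} = L_v t \tag{1.51}
\]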

Lemma 1.16 (Rule to get the Lie derivative). In the general case of a tensor T of type (q, r), the components of the Lie derivative are denoted by

\[
L_v T^{a_1\dots a_q}{}_{b_1\dots b_r}
\]

They are obtained by the following rule: Each component is the sum of 1 + q + r terms. The first term is the directional derivative

\[
v^s\,\partial_s T^{a_1\dots a_q}{}_{b_1\dots b_r}
\]

The remaining terms are all products of the tensor components with some partial derivative of components of the vector field v along which the Lie derivative is taken. The terms corresponding to the q upper indices are subtracted. For each term, the vector field has "robbed" a different one of the contravariant indices $a_1 \dots a_q$ and replaced this index by a contravariant summation index s. The partial derivative $\partial_s$ is applied to the vector component that carries the robbed index.

The last r terms are added. Once more, for each term, a different one of the covariant indices $b_1 \dots b_r$ has been "robbed" and is replaced by a covariant summation index s. The partial derivative of the vector field component $v^s$ is taken along the robbed index.

Problem 1.13. Convince yourself that for any scalar function S = S(x), the Lie derivative is the directional derivative:

\[
L_v S = v^a\,\partial_a S \tag{1.52}
\]

¹ The minus sign appears naturally, well motivated by the consistent use of passive transformations.


Answer. Since a scalar function transforms by the rule S′(x′) = S(x) under any point transformation, one gets from the first formula (1.50)

\[
S(x') - S(x) = S(x') - S'(x') = \varepsilon\,L_v S + O(\varepsilon^2)
\]

and from a Taylor expansion

\[
S(x') = S(x) + (x'^a - x^a)\,\partial_a S + O(|x - x'|^2) = S(x) + \varepsilon v^a\,\partial_a S + O(\varepsilon^2)
\]

Thus comparison gives the formula (1.52). Hence the Lie derivative of a scalar is the directional derivative.

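Problem 1.14. Apply the rules for the Lie derivative, given by lemma 1.16, to get $L_v t_{ab}{}^{c}$ of the tensor t along the vector field v.

Answer. The terms belonging to the two covariant indices are added and the term belonging to the contravariant index is subtracted:

\[
L_v t_{ab}{}^{c} = v^d\,\partial_d t_{ab}{}^{c} + (\partial_a v^s)\,t_{sb}{}^{c} + (\partial_b v^s)\,t_{as}{}^{c} - (\partial_s v^c)\,t_{ab}{}^{s}
\]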

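Proof of validity.

\[
t(x') - t'(x') = t(x') - t(x) - \bigl[t'(x') - t(x)\bigr]
\]

The first term comes from partial derivatives:

\[
t_{ab}{}^{c}(x') - t_{ab}{}^{c}(x) = \varepsilon v^s\,\partial_s t_{ab}{}^{c} + O(\varepsilon^2)
\]

The second term comes from the transformation flow:

\[
\begin{aligned}
t'_{ab}{}^{c}(x')
&= \frac{\partial x^d}{\partial x'^a}\frac{\partial x^e}{\partial x'^b}\frac{\partial x'^c}{\partial x^f}\,t_{de}{}^{f}(x) \\
&= \bigl(\delta^d_a - \varepsilon\,\partial_a v^d\bigr)\bigl(\delta^e_b - \varepsilon\,\partial_b v^e\bigr)\bigl(\delta^c_f + \varepsilon\,\partial_f v^c\bigr)\,t_{de}{}^{f}(x) + O(\varepsilon^2) \\
&= t_{ab}{}^{c}(x) - \varepsilon\,\partial_a v^d\,t_{db}{}^{c} - \varepsilon\,\partial_b v^e\,t_{ae}{}^{c} + \varepsilon\,\partial_f v^c\,t_{ab}{}^{f} + O(\varepsilon^2)
\end{aligned}
\]

\[
t'_{ab}{}^{c}(x') - t_{ab}{}^{c}(x) = -\varepsilon\,\partial_a v^s\,t_{sb}{}^{c} - \varepsilon\,\partial_b v^s\,t_{as}{}^{c} + \varepsilon\,\partial_s v^c\,t_{ab}{}^{s} + O(\varepsilon^2)
\]

Subtraction of the second from the first yields

\[
t_{ab}{}^{c}(x') - t'_{ab}{}^{c}(x') = \varepsilon v^s\,\partial_s t_{ab}{}^{c} + \varepsilon\,\partial_a v^s\,t_{sb}{}^{c} + \varepsilon\,\partial_b v^s\,t_{as}{}^{c} - \varepsilon\,\partial_s v^c\,t_{ab}{}^{s} + O(\varepsilon^2)
\]

The second formula is obtained since

\[
t_{ab}{}^{c}(x) - t'_{ab}{}^{c}(x) = t_{ab}{}^{c}(x') - t'_{ab}{}^{c}(x') + O(\varepsilon^2)
\]

The third formula (1.51) is obtained from the definition (1.48).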

Corollary 5. The Lie derivative $L_v$ of a tensor t is a tensor of the same type, provided the vector field v is transformed, too.

Proposition 1.3. The Lie derivative $L_v$ applied to tensors of any type, with the vector field v fixed, obeys the Leibniz product rule.

Proof of the most simple case. For any contravariant vector $[h^a]$ and covariant vector $[k_a]$ one has

\[
\begin{aligned}
(L_v h^a)\,k_a + h^a\,L_v(k_a)
&= \bigl[(v^s\partial_s h^a) - h^s(\partial_s v^a)\bigr]\,k_a + h^a\,\bigl[(v^s\partial_s k_a) + k_s(\partial_a v^s)\bigr] \\
&= (v^s\partial_s h^a)\,k_a + h^a\,(v^s\partial_s k_a) = v^s\partial_s(h^a k_a) = L_v(h^a k_a)
\end{aligned}
\]
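The cancellation in this proof can be observed numerically. The sketch below is only an illustration under stated assumptions: the three fields v, h and k on R² are hypothetical test fields chosen here, and partial derivatives are approximated by central differences. The component formulas are the ones used in the proof above.

```python
import math

H = 1e-5  # step size for central differences

def partial(f, x, s):
    """Central-difference partial derivative of the scalar function f along x^s."""
    xp, xm = list(x), list(x)
    xp[s] += H
    xm[s] -= H
    return (f(xp) - f(xm)) / (2 * H)

# Hypothetical smooth test fields on R^2 (not taken from the text)
v = [lambda x: x[0] * x[1], lambda x: math.sin(x[0])]   # vector field v^a
h = [lambda x: x[1] ** 2, lambda x: x[0] + x[1]]        # contravariant h^a
k = [lambda x: math.cos(x[1]), lambda x: x[0] ** 2]     # covariant k_a

def lie_contra(x, a):
    """(L_v h)^a = v^s d_s h^a - h^s d_s v^a."""
    return sum(v[s](x) * partial(h[a], x, s) - h[s](x) * partial(v[a], x, s)
               for s in range(2))

def lie_co(x, a):
    """(L_v k)_a = v^s d_s k_a + k_s d_a v^s."""
    return sum(v[s](x) * partial(k[a], x, s) + k[s](x) * partial(v[s], x, a)
               for s in range(2))

def lie_scalar(x):
    """L_v (h^a k_a) = v^s d_s (h^a k_a), the directional derivative."""
    scalar = lambda y: sum(h[a](y) * k[a](y) for a in range(2))
    return sum(v[s](x) * partial(scalar, x, s) for s in range(2))

def leibniz_gap(x):
    """Difference between both sides of the Leibniz rule at the point x."""
    lhs = sum(lie_contra(x, a) * k[a](x) + h[a](x) * lie_co(x, a)
              for a in range(2))
    return lhs - lie_scalar(x)
```

At any sample point, `leibniz_gap` should vanish up to the discretisation error of the finite differences.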


2. Special Relativity

2.1. Relativity of time and length

Let (ct, x, y, z) be the coordinates for any event, measured in the inertial system S. Let (ct′, x′, y′, z′) be the coordinates for the same event, measured in the inertial system S′. Assume that the origins of systems S and S′ are equal, and that the system S′ moves with velocity v in +x direction relative to system S. We want to determine the linear transformation

\[
\begin{aligned}
t' &= At + Bx \\
x' &= Dt + Ex \\
y' &= y \quad\text{and}\quad z' = z
\end{aligned} \tag{2.1}
\]

In special relativity, it is customary to introduce the dimensionless parameters

\[
\beta := \frac{v}{c} \quad\text{and}\quad \gamma := \frac{1}{\sqrt{1 - \beta^2}}
\]

Problem 2.1. From the relative velocity of the two systems, and from the postulate of constancy of the velocity of light c, one gets the three assumptions
• x = vt if and only if x′ = 0;
• x = ct if and only if x′ = ct′;
• x = −ct if and only if x′ = −ct′.
Use these three assumptions to determine the constants B, D and E in terms of A and the relativity parameters β and γ.

Answer.
• x = vt ⇔ x′ = 0 yields D = −Ev;
• x = ct ⇔ x′ = ct′ yields $cA - D + c^2B - cE = 0$ since

\[
0 = ct' - x' = (cA - D)\,t + (cB - E)\,x = (cA - D + c^2B - cE)\,t
\]

• x = −ct ⇔ x′ = −ct′ yields $-cA - D + c^2B + cE = 0$ since

\[
0 = -(ct' + x') = -(cA + D)\,t - (cB + E)\,x = (-cA - D + c^2B + cE)\,t
\]

Subtracting the last two relations yields 2cA − 2cE = 0, hence A = E. Adding them yields $-2D + 2c^2B = 0$, hence $D = c^2B$. After eliminating B, D and E one gets

\[
\begin{aligned}
t' &= A\,t - A\,v\,x/c^2 \\
x' &= -A\,v\,t + A\,x \\
y' &= y \quad\text{and}\quad z' = z
\end{aligned}
\]

In 4d-matrix notation:

\[
\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix}
=
\begin{pmatrix}
A & -\beta A & 0 & 0 \\
-\beta A & A & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}
\]
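The result of problem 2.1 can be tested against its three assumptions. The following is a small sketch, assuming units in which c defaults to 1; the function names are chosen here for illustration, and the factor A is left as a free input because the problem does not yet determine it.

```python
def boost_constants(A, v, c=1.0):
    """Return (B, D, E) as found in problem 2.1: E = A, D = -E v, B = D / c^2."""
    E = A
    D = -E * v
    B = D / c ** 2
    return B, D, E

def transform(A, v, t, x, c=1.0):
    """Apply t' = A t + B x and x' = D t + E x from transformation (2.1)."""
    B, D, E = boost_constants(A, v, c)
    return A * t + B * x, D * t + E * x
```

For any value of A, an event on the line x = vt is mapped to x′ = 0, and events on the light rays x = ±ct are mapped to x′ = ±ct′. This illustrates that the three assumptions alone leave A undetermined.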

Definition 2.1 (Isochronic Lorentz-, proper Lorentz transformation). An isochronic Lorentz transformation maps the cone of light rays which point into the future into itself. A proper Lorentz transformation maps the cone of light rays which point into the future into itself, and has the determinant +1.


Problem 2.2. In many texts and lectures, it is customary to begin with the stronger postulate of invariance of the Minkowski metric:

\[
c^2t^2 - x^2 - y^2 - z^2 = c^2t'^2 - x'^2 - y'^2 - z'^2 \tag{2.2}
\]

instead of the weaker postulate of constancy of the velocity of light.

Use the invariance of the Minkowski metric and the fact that x = vt ⇔ x′ = 0 to determine the constants A, B, D and E in the transformation (2.1), in terms of the relativity parameters β and γ. There are four solutions, with different signs of these constants. Write down all four solutions. What is the meaning of these solutions? Which one of these four solutions is a proper Lorentz transformation?

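Answer. Invariance of the Minkowski metric implies $c^2t^2 - x^2 = c^2(At + Bx)^2 - (Dt + Ex)^2$. We compare the coefficients of $t^2$, $tx$ and $x^2$ and get

\[
\begin{aligned}
c^2 &= c^2A^2 - D^2 \\
0 &= 2c^2AB - 2DE \\
-1 &= c^2B^2 - E^2
\end{aligned}
\]

We still use that D = −Ev for a boost with relative velocity v in +x direction. Hence $c^2AB = -vE^2$ from the second relation. Squaring and plugging in the first and third relation yields

\[
\begin{aligned}
v^2E^4 &= (c^2A^2)(c^2B^2) = (c^2 + v^2E^2)(E^2 - 1) \\
0 &= -c^2 + (c^2 - v^2)E^2 \\
E &= \pm\frac{c}{\sqrt{c^2 - v^2}} = \pm\gamma \\
A^2 &= 1 + c^{-2}D^2 = 1 + c^{-2}v^2E^2 = 1 + \frac{v^2}{c^2 - v^2} = \frac{c^2}{c^2 - v^2} \\
A &= \pm\gamma
\end{aligned}
\]

First consider the solution with A = E = γ. One gets $B = -vE^2/(c^2A) = -v\gamma/c^2$ and the transformation

\[
\begin{aligned}
t' &= \gamma\,t - \gamma v\,x/c^2 \\
x' &= -\gamma v\,t + \gamma\,x \\
y' &= y \quad\text{and}\quad z' = z
\end{aligned}
\]

which is the Lorentz boost with relative velocity +v. It is the only one of the four solutions that is a proper Lorentz transformation.

The solution with A = −γ and E = +γ is

\[
\begin{aligned}
t' &= -\gamma\,t + \gamma v\,x/c^2 \\
x' &= -\gamma v\,t + \gamma\,x \\
y' &= y \quad\text{and}\quad z' = z
\end{aligned}
\]

which is the Lorentz boost followed by time reversal.

The solution with A = +γ and E = −γ is

\[
\begin{aligned}
t' &= \gamma\,t - \gamma v\,x/c^2 \\
x' &= +\gamma v\,t - \gamma\,x \\
y' &= y \quad\text{and}\quad z' = z
\end{aligned}
\]

which is the Lorentz boost followed by space reflection. For the fourth solution

\[
\begin{aligned}
t' &= -\gamma\,t + \gamma v\,x/c^2 \\
x' &= +\gamma v\,t - \gamma\,x \\
y' &= y \quad\text{and}\quad z' = z
\end{aligned}
\]

both the time is reversed and the space reflected.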
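The four sign choices of problem 2.2 can be enumerated and checked mechanically. This is a minimal sketch, assuming units with c = 1 (so v = β) and restricting attention to the (t, x) block of the transformation; the helper names are illustrative only.

```python
import math

def solutions(beta):
    """The four sign choices (A, B, D, E) of problem 2.2, in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    result = []
    for sign_a in (+1, -1):
        for sign_e in (+1, -1):
            A, E = sign_a * gamma, sign_e * gamma
            D = -E * beta              # from x = v t  <=>  x' = 0
            B = -beta * E ** 2 / A     # from c^2 A B = -v E^2
            result.append((A, B, D, E))
    return result

def interval(t, x):
    """Minkowski squared interval c^2 t^2 - x^2 with c = 1."""
    return t * t - x * x

def apply(sol, t, x):
    """Apply t' = A t + B x and x' = D t + E x."""
    A, B, D, E = sol
    return A * t + B * x, D * t + E * x

def is_proper(sol):
    """Future-preserving (A > 0) with determinant A E - B D > 0."""
    A, B, D, E = sol
    return A > 0 and A * E - B * D > 0
```

All four solutions preserve the interval, and only the first one, with A = E = γ, passes `is_proper`.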

Here is another way to determine the constant A left open in problem 2.1. We begin with the assumptions

(i) The proper Lorentz transformations are a group.

(ii) Among the Lorentz transformations are the boosts as well as the usual 3-dimensional rotations.

(iii) The rotations without boost leave the time invariant.

(iv) Any isochronic Lorentz transformation that is diagonal leaves the time invariant.

Thus both the boost

\[
L = \begin{pmatrix}
A & -\beta A & 0 & 0 \\
-\beta A & A & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]

and the rotation

\[
S = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]

about the z-axis by 180° are Lorentz transformations.

Problem 2.3. Calculate the matrix products SLS and LSLS. Give a convincing argument that $A^2(1 - \beta^2) = 1$ holds in the physically meaningful case. Determine the sign of A occurring for a proper Lorentz transformation. Once more, write down the matrix for a boost in +x direction.

Answer.

\[
SLS = \begin{pmatrix}
A & \beta A & 0 & 0 \\
\beta A & A & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\quad\text{and}\quad
LSLS = \begin{pmatrix}
A^2(1 - \beta^2) & 0 & 0 & 0 \\
0 & A^2(1 - \beta^2) & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]

This last matrix is a proper Lorentz transformation, and is diagonal, too. By the above assumption item (iv), it leaves the time invariant. Hence $A^2(1 - \beta^2) = 1$ and A = ±γ. Only the case A = +γ is an isochronic Lorentz transformation.
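The two matrix products can be verified with a few lines of code. This sketch uses plain nested lists and a hand-written 4 × 4 product so that it stays self-contained; the function names are chosen here and the value of A is a free input.

```python
def matmul(P, Q):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def boost(A, beta):
    """The matrix L of the text, with the factor A still undetermined."""
    return [[A, -beta * A, 0, 0],
            [-beta * A, A, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

# Rotation about the z-axis by 180 degrees
S = [[1, 0, 0, 0],
     [0, -1, 0, 0],
     [0, 0, -1, 0],
     [0, 0, 0, 1]]

def sls(A, beta):
    """S L S: the boost with the opposite velocity."""
    return matmul(S, matmul(boost(A, beta), S))

def lsls(A, beta):
    """L S L S: diagonal with entries A^2 (1 - beta^2), A^2 (1 - beta^2), 1, 1."""
    return matmul(boost(A, beta), sls(A, beta))
```

With A = γ the product `lsls` reduces to the identity matrix up to rounding, which is the content of the condition $A^2(1 - \beta^2) = 1$.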

For a graphic representation of the usual boost

\[
\begin{aligned}
ct' &= \gamma\,ct - \beta\gamma\,x \\
x' &= -\beta\gamma\,ct + \gamma\,x
\end{aligned} \tag{2.3}
\]

one uses the same scale for the lengths ct and x, and a right angle between the ct-axis and the x-axis.


Problem 2.4. Convince yourself that
• the same angle occurs between the ct- and ct′-axis as between the x- and x′-axis;
• the angle between the x′- and the ct′-axis is acute.
Draw the future light ray x = ct, t ≥ 0. Draw the hyperbola of all points which have the invariant space-like squared distance −1 from the origin.

To illustrate the Lorentz contraction, we imagine a rigid tube of (comoving) length d with mirrors on both ends. Now the tube is moving with velocity v relative to the S-system. Thus the tube is at rest in the comoving S′ system.

I calculate at first with coordinates from the S-system, in which the tube has the length L. The mirrors at the left and right end of the tube move along the lines

\[
x = vt \quad\text{and}\quad x = vt + L
\]

Let a light ray OA be sent from the mirror at the left end to the mirror at the right end, and reflected into a light ray AB from the mirror at the right end back to the mirror at the left end. The equations of these light rays are x = ct and x = −ct + a.

Problem 2.5. Determine the constant a. Determine the S-system coordinates of the reflection events A and B.

Answer. Point A is the intersection of the light ray x = ct with the world line of the right mirror x = vt + L. Hence $t = \frac{L}{c - v}$ and the coordinates are

\[
(ct, x)_A = \left(\frac{cL}{c - v},\ \frac{cL}{c - v}\right) = \left(\frac{L}{1 - \beta},\ \frac{L}{1 - \beta}\right)
\]

and $a = 2ct_A = \frac{2cL}{c - v}$. Point B is the intersection of the light ray x = a − ct with the world line of the left mirror x = vt. Hence $t = \frac{a}{c + v}$ and the coordinates are

\[
(ct, x)_B = \left(\frac{2c^2L}{c^2 - v^2},\ \frac{v}{c}\cdot\frac{2c^2L}{c^2 - v^2}\right) = \left(\frac{2L}{1 - \beta^2},\ \frac{2\beta L}{1 - \beta^2}\right)
\]

Problem 2.6. Use the Lorentz transformation (2.3) to determine the S′-system coordinates of the reflection events A and B.

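Answer. From the equations (2.3) for the boost one gets the S′-coordinates of point A to be

\[
\begin{aligned}
ct'_A &= \gamma\,ct - \beta\gamma\,x = \gamma(1 - \beta)\cdot\frac{L}{1 - \beta} = \gamma L \\
x'_A &= -\beta\gamma\,ct + \gamma\,x = \gamma(-\beta + 1)\cdot\frac{L}{1 - \beta} = \gamma L
\end{aligned}
\]

and the S′-coordinates of point B to be

\[
\begin{aligned}
ct'_B &= \gamma\,ct - \beta\gamma\,x = \gamma(1 - \beta^2)\,\frac{2L}{1 - \beta^2} = 2\gamma L \\
x'_B &= -\beta\gamma\,ct + \gamma\,x = \gamma(-\beta + \beta)\,\frac{2L}{1 - \beta^2} = 0
\end{aligned}
\]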

Problem 2.7. Convince yourself that

\[
L = \sqrt{1 - \beta^2}\;d < d \tag{2.4}
\]

which is the famous Lorentz-FitzGerald contraction. There are no forces of any kind involved.

Answer. In the comoving system S′, the coordinates of points A and B are, by definition of the proper distance d,

\[
(ct', x')_A = (d, d) \quad\text{and}\quad (ct', x')_B = (2d, 0)
\]

Comparing with the result of the previous problem 2.6, where $ct'_A = \gamma L$, gives γL = d and hence the relation (2.4) for the Lorentz contraction.
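The chain of problems 2.5 to 2.7 can be replayed numerically. The sketch below assumes the formulas from the answers above, with events written as pairs (ct, x); the function names are illustrative.

```python
import math

def reflection_events_S(L, beta):
    """Events A and B as (ct, x) in the S-system, from the answer to problem 2.5."""
    A = (L / (1 - beta), L / (1 - beta))
    B = (2 * L / (1 - beta ** 2), 2 * beta * L / (1 - beta ** 2))
    return A, B

def boost(event, beta):
    """Boost (2.3): maps (ct, x) to (ct', x')."""
    gamma = 1 / math.sqrt(1 - beta ** 2)
    ct, x = event
    return gamma * ct - beta * gamma * x, -beta * gamma * ct + gamma * x

def contracted_length(d, beta):
    """Formula (2.4): length in S of a tube with comoving length d."""
    return math.sqrt(1 - beta ** 2) * d
```

For example, with β = 0.6 and d = 1 the contracted length is approximately 0.8, and boosting the events A and B computed with that L gives approximately (d, d) and (2d, 0) in the comoving system.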

To illustrate the time dilation, we imagine a rigid tube of (comoving) length d with mirrors on both ends. Again the tube is moving with velocity v relative to the S-system, but turned by 90°. Thus the tube is at rest in the comoving S′ system, and lying on the y′-axis.

I calculate at first with coordinates (ct′, x′, y′) from the comoving S′-system. The mirrors at the ends of the tube move along the lines

\[
x' = 0,\ y' = 0 \quad\text{and}\quad x' = 0,\ y' = d
\]

Let a light ray OA be sent from the mirror at y′ = 0 to the mirror at y′ = d, and reflected into a light ray AB from the mirror at y′ = d back to the mirror at y′ = 0. The light goes on to be reflected back and forth, and each reflection gives a tick of this mirror-clock.

In the comoving system, the equations of these light rays are

\[
\begin{aligned}
OA:&\quad x' = 0,\ y' = ct' \\
AB:&\quad x' = 0,\ y' = 2d - ct'
\end{aligned}
\]

The emission and reflection events have the S′-coordinates O = (0, 0, 0), A = (d, 0, d), B = (2d, 0, 0). Thus d/c is the proper time interval between the ticks of the mirror clock.

Problem 2.8. Determine the S-system coordinates (ct, x, y) of the reflection events A and B. Determine the equations of the light rays OA and AB.

Answer. One needs the inverse of the boost (2.3). The S-coordinates of reflection event A, where ct′ = d and x′ = 0, are

\[
\begin{aligned}
ct_A &= \gamma\,ct' + \beta\gamma\,x' = \gamma d \\
x_A &= \beta\gamma\,ct' + \gamma\,x' = \beta\gamma d \\
y_A &= y' = d
\end{aligned}
\]

The S-coordinates of reflection event B, where ct′ = 2d and x′ = 0, are

\[
\begin{aligned}
ct_B &= \gamma\,ct' + \beta\gamma\,x' = 2\gamma d \\
x_B &= \beta\gamma\,ct' + \gamma\,x' = 2\beta\gamma d \\
y_B &= y' = 0
\end{aligned}
\]

We determine the equations of the light rays OA and AB. In the S-system, the equations of light ray OA are

\[
\begin{aligned}
ct &= \gamma\,ct' + \beta\gamma\,x' = \gamma\,ct' \\
x &= \beta\gamma\,ct' + \gamma\,x' = \gamma v\,t' \\
y &= y' = ct'
\end{aligned}
\]

Remark. One has to keep some parameter along the light ray. This cannot be the proper time. Such a parameter is also called an affine parameter. I have chosen t′ for that role.

In the S-system, the equations of light ray AB are

\[
\begin{aligned}
ct &= \gamma\,ct' + \beta\gamma\,x' = \gamma\,ct' \\
x &= \beta\gamma\,ct' + \gamma\,x' = \gamma v\,t' \\
y &= y' = 2d - ct'
\end{aligned}
\]

Problem 2.9. Convince yourself that the times of the reflection events A and B in the S-system are γd/c and 2γd/c. Since γ > 1, we get longer time intervals between the ticks of the clock. The moving clock is slowed down by the relativity parameter γ.

Answer. From the expressions for $ct_A$ and $ct_B$ calculated in the previous problem 2.8, we found already that γd/c is the time interval between the ticks of the clock, as observed in the S-system.

In spite of the slowing down of the clock rate, the velocity of light remains the same. Using the theorem of Pythagoras, we find the distance the light has travelled from event O to A to be

\[
\sqrt{x_A^2 + y_A^2} = \sqrt{\beta^2\gamma^2 d^2 + d^2} = \gamma d
\]

So the moving clock slows down since the light has to travel a longer distance between the reflection events.
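The light-clock argument can be checked with a short computation. This is a sketch under the assumption that the reflection event A has the comoving coordinates ct′ = d, x′ = 0, y′ = d, and that events are written as triples (ct, x, y); the function names are illustrative.

```python
import math

def inverse_boost(event, beta):
    """Inverse of boost (2.3): maps (ct', x', y') to (ct, x, y)."""
    gamma = 1 / math.sqrt(1 - beta ** 2)
    ctp, xp, yp = event
    return gamma * ctp + beta * gamma * xp, beta * gamma * ctp + gamma * xp, yp

def tick_in_S(d, beta):
    """Return (c * time, light path length) from O to A, both measured in S."""
    ct, x, y = inverse_boost((d, 0.0, d), beta)
    path = math.hypot(x, y)  # theorem of Pythagoras in the xy-plane
    return ct, path
```

Both returned values equal γd up to rounding, so the ratio of path length to travel time is c in the S-system as well.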

2.2. Discovery of Aberration and Parallax

In 1725, James Bradley, who held a position at Oxford as astronomer and natural philosopher, began observations of γ Draconis at the home of a friend, Samuel Molyneux. Using a telescope affixed to a chimney, so that it pointed nearly vertically, he changed the position of the telescope very slightly, and very accurately measured its change in position, using a screw and plumb-line. Over the course of a year or so, he found that the star did indeed vary in position during the course of the year by 40 arc-seconds, just like Polaris.

Figure 2.1. Aberration.

Stellar aberration produces an elliptical motion, circular at the ecliptic poles and linear in the ecliptic plane. Its semi-major axis is a constant, regardless of the distance or angular position of the star, equal to one radian multiplied by the ratio of the Earth's orbital velocity to the speed of light. This ratio is about 9.93 · 10⁻⁵, which corresponds to 20.48′′. The first figure gives radian measure, the second figure gives the angle in arc-seconds; to convert, multiply by 3600 · 180/π. As the Earth moves, the apparent positions of all stars are shifted in the direction of the velocity of the Earth's motion.

Figure 2.2. Parallax.

Exactly as had already been known for Polaris, the change in position was in the wrong direction for stellar parallax. Parallax was actually observed more than a hundred years later, for the first time by F. W. Bessel and W. Struve in 1838. They observed a shift of 0.292′′ for the star 61 Cygni. This star has a distance of about 11 light years from the earth, and is one of the stars nearest to the earth. Roughly speaking, parallax is about 1/100 of aberration, or less.

Parallax produces a shift reciprocal to each star's distance in parsecs, and is used to measure the distance of the stars nearest to the earth. Parallax produces an elliptical motion of the star, circular at the ecliptic poles and linear in the ecliptic plane, whose semi-major axis equals the reciprocal of each star's distance in parsecs, which is of course different for different stars. As the Earth moves, the apparent positions are shifted in the direction of the radius from the earth to the sun. The parallax shift occurs a quarter cycle of the circular motion later than the aberration shift.
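The two numerical statements of this section can be reproduced with a few lines of arithmetic. The sketch below rests on assumptions that are not given in the text: a mean orbital speed of the Earth of 29.78 km/s, the speed of light 299792.458 km/s, and 3.2616 light years per parsec.

```python
import math

ARCSEC_PER_RADIAN = 3600 * 180 / math.pi

def aberration_arcsec(v_orbit_km_s=29.78, c_km_s=299792.458):
    """Aberration constant: the ratio v/c, converted from radians to arc-seconds."""
    return (v_orbit_km_s / c_km_s) * ARCSEC_PER_RADIAN

def parallax_distance_ly(parallax_arcsec, ly_per_parsec=3.2616):
    """Distance in light years; the distance in parsecs is 1 / parallax."""
    return ly_per_parsec / parallax_arcsec
```

With these inputs the aberration constant comes out near 20.5 arc-seconds, and the parallax of 0.292 arc-seconds quoted for 61 Cygni corresponds to a distance of roughly 11 light years.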

2.3. Aberration and the Doppler effect

Here is the simple non-relativistic reasoning. Assume raindrops fall with the velocity c vertically. A pedestrian moves with velocity v. At which angle shall he observe the rain? Seen in a frame at rest, the rain gives a right triangle with hypotenuse c, horizontal leg c cos α and vertical leg c sin α. In the moving frame, the right triangle has the same vertical leg c sin α and the (non-relativistic!) horizontal leg c cos α + v. Hence

\[
\tan\alpha' = \frac{c\sin\alpha}{c\cos\alpha + v}
\]
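The non-relativistic aberration formula is easy to evaluate. A minimal sketch, assuming angles in radians and using `atan2` so that the case of a vanishing horizontal leg is handled without division; the function name is illustrative.

```python
import math

def apparent_angle(alpha, v, c):
    """Non-relativistic aberration: tan(alpha') = c sin(alpha) / (c cos(alpha) + v)."""
    return math.atan2(c * math.sin(alpha), c * math.cos(alpha) + v)
```

For vertically falling rain (α = π/2) and a pedestrian moving as fast as the rain falls (v = c), the apparent angle is π/4, and for v = 0 the angle is unchanged.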

Remark. Part of the information above is taken from the website of Courtney Seligman, Professor of Astronomy:

http://cseligman.com/text/history/bradley.htm

A second source is dtv Atlas der Astronomie.

Figure 2.3. Aberration of the rain, and of the light.

Figure 2.4. The one-dimensional Doppler effect.

2.4. The one-dimensional Doppler effect

Let the observer O be stationary in the inertial S-frame, with world-line CD. Let the light source or emitter E be stationary in the inertial S′-frame, with world-line AB. Let the S′-frame move with uniform velocity v along the positive x-axis. Suppose a light ray is sent in the negative x-direction. Let rays AC and BD be two successive crests of the light wave. I calculate in the S-system, see Figure 2.4. The emitter moves along the line x = vt + k, and ct + x is constant along each ray. Subtracting

\[
\begin{aligned}
(c + v)\,t_A + k &= ct_A + x_A = ct_C + x_C \quad\text{and} \\
(c + v)\,t_B + k &= ct_B + x_B = ct_D + x_D = ct_D + x_C \quad\text{yields} \\
(1 + \beta)\,\Delta t_{AB} &= \Delta t_{CD}
\end{aligned}
\]

To obtain the frequency shift, one needs to calculate with the proper times, both for the emitter and observer. The phase of the wave is invariant, and differs by 2π for successive crests of the light wave. Hence

\[
2\pi\nu_E\,\Delta\tau_{AB} = 2\pi\nu_O\,\Delta\tau_{CD} = 2\pi
\]

and taking into account the time dilation

\[
\Delta\tau_{AB} = \sqrt{1 - \beta^2}\;\Delta t_{AB} \quad\text{and}\quad \Delta\tau_{CD} = \Delta t_{CD}
\]

For the ratio of the frequencies we obtain

\[
\frac{\nu_O}{\nu_E} = \frac{\Delta\tau_{AB}}{\Delta\tau_{CD}} = \frac{\sqrt{1 - \beta^2}\;\Delta t_{AB}}{\Delta t_{CD}} = \frac{\sqrt{1 - \beta^2}}{1 + \beta} = \sqrt{\frac{1 - \beta}{1 + \beta}}
\]

For β > 0, we see that $\nu_O < \nu_E$. Thus a receding light source is redshifted, as expected.
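The two steps of this derivation can be mirrored in code. The sketch below compares the closed formula with a version assembled from the coordinate-time relation and the time dilation; the function names are chosen here for illustration.

```python
import math

def doppler_ratio(beta):
    """nu_O / nu_E = sqrt((1 - beta) / (1 + beta)) for a source receding with speed beta * c."""
    return math.sqrt((1 - beta) / (1 + beta))

def doppler_ratio_stepwise(beta):
    """The same ratio, assembled from the two steps of the derivation."""
    dt_AB = 1.0                                   # emission interval in S (arbitrary unit)
    dt_CD = (1 + beta) * dt_AB                    # reception interval in S
    dtau_AB = math.sqrt(1 - beta ** 2) * dt_AB    # proper time of the moving emitter
    dtau_CD = dt_CD                               # the observer is at rest in S
    return dtau_AB / dtau_CD
```

For β = 0.6 both functions give a ratio of 0.5 up to rounding, so the observed frequency is half the emitted one.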

2.5. Four-vectors and Minkowski metric The approach to special relativity we use here, goesback to a lecture by Minkowski from 1908. I use a modern notation as in Hobson’s book [7]General Relativity. I shall use space-time with 1 + 3 dimensions. Space-time vectors are eitherdenoted by their contravariant components with upper Greek indices and put into square brackets,or by an invariant four-vector and written in bold face. To begin with, to denote any position inspace-time the four-vector x, or [xµ] is used. Here x0 = ct and x1 = x, x2 = y, x3 = z are thecomponents of the three-dimensional space vector ~x. We assume t, x, y, z to be real. I anywaysmake an endeavor that all components of a four-vector have the same dimension. Any two space-time vectors x = (ct, x, y, z) and p = (s/c, p, q, r) have a Lorentz-invariant scalar product

xT ηp = ts − xp − yq − zr (2.5)

With the matrix notation from linear algebra this means that

η =

1 0 0 00 −1 0 00 0 −1 00 0 0 −1

x · p := xT ηp (2.6)

for the Lorentz-invariant scalar product. In spite of its surprising properties, the Lorentz-invariantscalar product has still many properties in common with the ordinary scalar product. It is easy tocheck that the scalar product is• commutative: x · p = p · x• non-degenerate: If x · p = 0 for all vectors p, then x = 0.

Definition 2.2 (Perpendicular subspace). For any subspace $U \subseteq \mathbb{R}^4$, the Minkowski-perpendicular space is defined as

$$U^\perp = \{\, y \in \mathbb{R}^4 : y \cdot u = 0 \text{ for all } u \in U \,\}$$

Definition 2.3 (Present, future, future light-cone).

The future cone is $\mathrm{Future} = \{(ct, x, y, z) : c^2t^2 - x^2 - y^2 - z^2 > 0 \text{ and } t > 0\}$. A vector such that x ∈ Future or −x ∈ Future is called time-like.

The future light-cone is $\mathrm{Light}^+ = \{(ct, x, y, z) : c^2t^2 - x^2 - y^2 - z^2 = 0 \text{ and } t \ge 0\}$. A vector x ∈ Light+ is called future light-like. A vector such that x ∈ Light+ or −x ∈ Light+ is called light-like.

The present is $\mathrm{Present} = \{(ct, x, y, z) : c^2t^2 - x^2 - y^2 - z^2 < 0\}$. A vector x ∈ Present is called space-like.

The proper time along a time-like vector x = (ct, x, y, z), with x ∈ Future or −x ∈ Future, is the Lorentz-invariant quantity

$$|x| = \sqrt{x \cdot x} = \sqrt{c^2t^2 - x^2 - y^2 - z^2}$$
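The scalar product (2.5) and the classification of Definition 2.3 are easy to experiment with. The sketch below is only an illustration under my own assumptions: four-vectors are plain tuples (ct, x, y, z), the names `minkowski_dot`, `classify` and `boost_x` are hypothetical, and `boost_x` anticipates the Lorentz transformation along the x-axis written down in section 2.6. The test for light-like vectors compares with exact zero, which is reliable only for exactly representable components.

```python
import math

def minkowski_dot(x, p):
    """Lorentz-invariant scalar product with signature (+, -, -, -)."""
    return x[0] * p[0] - x[1] * p[1] - x[2] * p[2] - x[3] * p[3]

def classify(x):
    """Classify a four-vector (ct, x, y, z) as time-like, light-like or space-like."""
    s = minkowski_dot(x, x)
    if s > 0:
        return "time-like"
    if s < 0:
        return "space-like"
    return "light-like"

def boost_x(beta, x):
    """Lorentz transformation to a frame moving with speed beta*c along the x-axis."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return (gamma * x[0] - beta * gamma * x[1],
            -beta * gamma * x[0] + gamma * x[1],
            x[2],
            x[3])

x = (5.0, 3.0, 0.0, 0.0)
p = (2.0, 1.0, 1.0, 0.0)
print(classify(x), classify((1.0, 1.0, 0.0, 0.0)), classify((0.0, 1.0, 0.0, 0.0)))
print(minkowski_dot(x, p), minkowski_dot(boost_x(0.6, x), boost_x(0.6, p)))
```

The second line prints the scalar product before and after the boost; up to rounding, both values agree.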

2.6. The relativistic Doppler effect Now I give the correct 1 + 3-dimensional relativistic argument. The amplitude of a wave, as a function on space-time, is described by plane sinusoidal waves $\cos(\omega t - \vec k \cdot \vec x)$ and their superpositions. We consider any matter wave or light wave, but disregard polarization. The vector $\vec k \in \mathbb{R}^3$ is called the wave vector and ω > 0 is the angular frequency. The wave length λ is related to the wave vector by $2\pi/\lambda = |\vec k|$.

Concerning (passive) transformation by the Lorentz group, we know that $(ct, \vec x)$ is a four-vector and the phase $(\omega t - \vec k \cdot \vec x)$ is invariant. Hence $[k^\mu] = [\omega/c, \vec k]$ is a four-vector, too. Thus the phase has turned out to be the invariant scalar product x · k of two four-vectors. So far, the arguments hold for any type of waves.

We now restrict ourselves to light waves in vacuum. In that case, the wave equation implies $\vec k \cdot \vec k - \omega^2/c^2 = 0$ and hence the four-vector $(\omega/c, \vec k)$ is light-like. The wave length λ, the frequency ν and the circular frequency ω are related by

$$|\vec k| = \frac{2\pi}{\lambda} = \frac{\omega}{c} = \frac{2\pi\nu}{c}$$

Suppose a light ray lies in the xy-plane, and let θ be the angle between the direction of propagation of light and the positive x-axis. In this setting, the wave four-vector is

$$(\omega/c, \vec k) = \frac{2\pi}{\lambda}\,(1, \cos\theta, \sin\theta, 0)$$

Let the observer O be stationary in the inertial S-frame. Let the light source or emitter E be stationary in the inertial S′-frame. Let the S′-frame move with uniform velocity v along the positive x-axis. The corresponding quantities referring to the S′-frame are denoted by primes.

The relative velocity is $\beta = \tanh\lambda_L$ with the Lobachevskij parameter $\lambda_L$. We use the abbreviations, common in relativity,

$$\beta = \frac{v}{c} \quad\text{and}\quad \gamma = \frac{1}{\sqrt{1 - \beta^2}} = \cosh\lambda_L$$

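The Lorentz transformation between the two frames is

$$ct' = \gamma\, ct - \beta\gamma\, x, \qquad x' = -\beta\gamma\, ct + \gamma\, x, \qquad y' = y \quad\text{and}\quad z' = z$$

or in matrix notation

$$\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}$$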

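Exactly the same Lorentz transformation applies to $(\omega/c, \vec k)$ since this is a space-time vector, too. Hence we conclude

$$\begin{pmatrix} \omega'/c \\ k'_x \\ k'_y \\ k'_z \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \omega/c \\ k_x \\ k_y \\ k_z \end{pmatrix}$$

So far, this transformation even holds for any type of waves. We specialize to light rays in the xy-plane, and use polar coordinates to get

$$\frac{2\pi}{\lambda'} \begin{pmatrix} 1 \\ \cos\theta' \\ \sin\theta' \\ 0 \end{pmatrix} = \frac{2\pi}{\lambda} \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ \cos\theta \\ \sin\theta \\ 0 \end{pmatrix}$$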

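Hence the Doppler shift of the wave length, respectively frequency, is

$$\frac{\nu_E}{\nu_O} = \frac{\nu'}{\nu} = \frac{\lambda}{\lambda'} = \gamma(1 - \beta\cos\theta) = \frac{1 - \beta\cos\theta}{\sqrt{1 - \beta^2}}$$

The transformation of the spatial components gives the change of direction called aberration. We obtain the formulas

$$\cos\theta' = \frac{\cos\theta - \beta}{1 - \beta\cos\theta}, \qquad \sin\theta' = \frac{\gamma^{-1}\sin\theta}{1 - \beta\cos\theta}, \qquad \tan\theta' = \frac{\sin\theta}{\gamma\cos\theta - \gamma\beta}$$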

For the motion of the earth around the sun, $\beta \approx 10^{-4}$. Even the motion through the Milky Way gives a relative velocity of the same magnitude. And even the recently measured motion of the Milky Way against the average position of many spiral nebulae gives β of the order of $2 \cdot 10^{-3} \ll 1$. Hence in the case of astronomical aberration, the deflection angle θ′ − θ is small. After some calculation we get

$$\sin(\theta' - \theta) = \sin\theta'\cos\theta - \cos\theta'\sin\theta = \frac{\beta\sin\theta + \sin\theta\cos\theta\,(\gamma^{-1} - 1)}{1 - \beta\cos\theta} = \beta\sin\theta + O(\beta^2)$$

For the approximation in first order of β, we have confirmed the common formula

θ′ = θ + β sin θ + O(β2)
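The Doppler factor and the aberration formulas can be reproduced by applying the Lorentz matrix to the light-like vector (1, cos θ, sin θ, 0). The following is a minimal sketch under my own naming (`boost_wave_vector` is hypothetical); it drops the common factor 2π/λ, so the returned time component is directly the ratio ν′/ν.

```python
import math

def boost_wave_vector(beta, theta):
    """Boost (1, cos θ, sin θ, 0) along the x-axis; return (nu'/nu, theta')."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    k0, kx, ky = 1.0, math.cos(theta), math.sin(theta)
    k0_prime = gamma * k0 - beta * gamma * kx      # time component: Doppler factor
    kx_prime = -beta * gamma * k0 + gamma * kx     # x-component after the boost
    ky_prime = ky                                  # transverse component is unchanged
    theta_prime = math.atan2(ky_prime, kx_prime)   # direction seen in the S'-frame
    return k0_prime / k0, theta_prime

beta = 1.0e-4                                      # orbital motion of the earth
theta = math.radians(60.0)
ratio, theta_prime = boost_wave_vector(beta, theta)
print("nu'/nu          =", ratio)
print("theta' - theta  =", theta_prime - theta)
print("beta*sin(theta) =", beta * math.sin(theta))
```

For small β the last two printed numbers differ only in second order of β, in agreement with the first-order aberration formula.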

Remark. With a "wave" vector pointing from the observer, we need to do the substitutions θ → α + 180° and θ′ → α′ + 180°. Hence the different sign of β occurs in the intuitive argument above.


2.7. Four-velocity For a particle moving under arbitrary forces, one uses the world-line $x^\mu = x^\mu(\tau)$ for components µ = 0, 1, 2, 3 and the proper time τ as parameter. For the world-line of a light ray $x^\mu = x^\mu(p)$, one has to use some other parameter p, since the proper time is constant along the light path. The four-velocity $[u^\mu]$ is defined as the derivative

$$u^\mu = \frac{dx^\mu}{d\tau}$$

whereas the Newtonian velocity is

$$\vec v = \frac{d\vec x}{dt}$$

Because of time dilation dt = γ dτ, they are related by ¹

$$[u^\mu] = \gamma[c, \vec v\,] = \gamma[c, v_x, v_y, v_z]$$

Lemma 2.1. The four-velocity of a material particle is a time-like vector pointing to the future, and has the Lorentz-invariant length |u| = c.

Reason. $u \cdot u = \gamma^2(c^2 - \vec v^{\,2}) = c^2\gamma^2(1 - \beta^2) = c^2$ and $u^0 = \gamma c \ge c > 0$.

2.8. The energy-momentum vector For any particle of rest-mass m, the energy-momentum vector $[p^\mu]$ is defined by

$$p^\mu = m u^\mu = m\,\frac{dx^\mu}{d\tau}$$

The space components of this four-vector are

$$\vec p = \gamma m\,\frac{d\vec x}{dt}$$

This turns out to be the momentum occurring in Newton’s law ²

$$\vec F = \frac{d\vec p}{dt} \qquad \text{(Newton)}$$

What is the meaning of the component $p^0$? To find out, one needs to use the relation

$$W = \vec F \cdot \vec x \qquad \text{(work)}$$

for the work W done by a force $\vec F$ during a motion over a distance $\vec x$. We imagine that a particle is moving along some path, under the influence of forces. These may be, for example, forces from electric and magnetic fields, among other possibilities. The path is denoted as $x^\mu = x^\mu(\tau)$, with the proper time τ as parameter. From the property of the four-velocity checked in lemma 2.1, one gets

$$(p^0)^2 - \vec p^{\,2} = p \cdot p = m^2c^2$$

As long as the forces do not change the rest-mass, which is true for electric forces anyway, we have found the constant of motion $m^2c^2$. One differentiates with respect to the parameter τ used on the particle path and obtains

$$p^0\,\frac{dp^0}{d\tau} = \vec p \cdot \frac{d\vec p}{d\tau}$$

1 To avoid upper indices 1, 2, 3 for the components of a common vector, I use instead the lower indices x, y, z.
2 I have to ask the reader to accept this statement at face value.


The definition of the four-momentum implies

$$\vec p = \gamma m \vec v = \frac{p^0}{c}\,\vec v$$

One uses this identity and cancels $p^0$:

$$c\,p^0\,\frac{dp^0}{d\tau} = p^0\,\vec v \cdot \frac{d\vec p}{d\tau}$$

$$c\,\frac{dp^0}{d\tau} = \vec v \cdot \frac{d\vec p}{d\tau} = \frac{d\vec x}{dt} \cdot \frac{d\vec p}{d\tau} = \frac{d\vec x}{d\tau} \cdot \frac{d\vec p}{dt}$$

Now Newton’s law (Newton) is used to obtain

$$c\,\frac{dp^0}{d\tau} = \frac{d\vec x}{d\tau} \cdot \vec F$$

The right-hand side is the rate at which work is done by the force $\vec F$. This work is only used to increase or decrease the kinetic energy T of the particle. Hence

$$c\,\frac{dp^0}{d\tau} = \frac{dT}{d\tau}$$

Integration over the particle path yields $cp^0 = T + R$ with some constant of integration R. The constant is determined by referring to a spot $p^\mu(\tau_0)$ along the path where the particle is at rest. For the parameter $\tau_0$ we obtain $p^0 = mc$ and T = 0, and hence conclude that $R = mc^2$. Since R is constant, one has obtained

$$cp^0 = T + mc^2$$

Following Einstein, the term $mc^2$ is interpreted as the energy equivalent of the rest mass, and $E := T + mc^2$ as the total energy of the particle.

Theorem 2.1. The four-momentum of a particle moving in any field of forces is

$$p^\mu = [E/c, \vec p\,]$$

The total energy E and the rest mass m are related by

$$E = \sqrt{m^2c^4 + c^2\vec p^{\,2}} \qquad (2.7)$$

The momentum and the velocity are related by

$$\vec p = \frac{E}{c^2}\,\vec v \qquad (2.8)$$

These formulas retain their meaning for m = 0, as occurs for a photon or another massless particle.

Problem 2.10. A particle is non-relativistic if $c|\vec p\,| \ll mc^2$, or equivalently $\beta \ll 1$. Use the power expansion

$$\sqrt{1 + x} = 1 + \frac{x}{2} - \frac{x^2}{8} \pm \dots$$

to get the approximation of the kinetic energy for a non-relativistic particle.


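Answer. From the relation (2.7), the total energy $E = T + mc^2$ and the kinetic energy T are

$$E = \sqrt{m^2c^4 + c^2\vec p^{\,2}} = mc^2\sqrt{1 + \frac{p^2}{m^2c^2}}$$

$$T = mc^2\left[\sqrt{1 + \frac{p^2}{m^2c^2}} - 1\right] = mc^2\left[\frac{p^2}{2m^2c^2} - \frac{p^4}{8m^4c^4} \pm \dots\right]$$

$$T \simeq \frac{p^2}{2m}$$

as is well known from the basics of classical mechanics.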
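How good the non-relativistic approximation is can be seen numerically. The following sketch, with function names of my own choosing and units in which c = 1 by default, compares the exact kinetic energy from (2.7) with the first and the second term of the expansion.

```python
import math

def kinetic_energy_exact(p, m, c=1.0):
    """Kinetic energy T = E - m c^2 with E from relation (2.7)."""
    return math.sqrt(m ** 2 * c ** 4 + c ** 2 * p ** 2) - m * c ** 2

def kinetic_energy_newton(p, m):
    """Leading term p^2 / (2m) of the expansion."""
    return p ** 2 / (2.0 * m)

def kinetic_energy_second_order(p, m, c=1.0):
    """Expansion including the first relativistic correction -p^4 / (8 m^3 c^2)."""
    return p ** 2 / (2.0 * m) - p ** 4 / (8.0 * m ** 3 * c ** 2)

for p in (0.01, 0.1, 0.5):
    print(p,
          kinetic_energy_exact(p, 1.0),
          kinetic_energy_newton(p, 1.0),
          kinetic_energy_second_order(p, 1.0))
```

The Newtonian value always lies slightly above the exact one, and the gap grows like the fourth power of p/(mc).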

Theorem 2.2. The four-momentum of a massive particle moving in a field of forces is, in terms of its velocity,

$$p^\mu = [E/c, \vec p\,] = [\gamma mc, \gamma m\vec v\,]$$

The total energy E and the total mass γm are related by

$$E = \gamma mc^2$$

These formulas do not retain their meaning for m = 0.

Figure 2.5. The principal setup of Compton’s experiment.

2.9. The Compton effect If light consists of photons, collisions between photons and particles of matter should be possible. For photons and electrons, this quantum effect was discovered in 1922 by A.H. Compton from the university of St. Louis in Missouri. The figure on page 36 shows the principal setup of Compton’s experiment. The monochromatic Mo-Kα-rays are scattered by a graphite crystal. The wave-length spectrum dW/dλ of the scattered x-radiation is measured, by means of a Bragg crystal, for different scattering angles θ. In addition to photons of the incident wavelength $\lambda = 70 \cdot 10^{-12}$ m = 70 pm, photons with a longer wavelength λ′ occur. The shift ∆λ(θ) = λ′(θ) − λ is an increasing function of the scattering angle θ. For example, ∆λ(90°) = 2.4 pm. Compton already gave the correct interpretation of his results as an elastic scattering of photons by the quasi-free electrons inside the graphite.


Actually one finds that the scattered photons have two different wavelengths. One set of photons gets its wavelength shifted, by a shift depending on the scattering angle. The shift comes out as predicted below, by the scattering from electrons. A second set of photons has unshifted wavelength. This set is due to scattering from the positively charged ions. The mechanism of scattering is the same for both sets, except that for the ions the electron mass has to be replaced by the ion mass, which is many thousand times larger. So the shift is tiny.

Figure 2.6. The kinematics of the collision of a photon with an electron, initially at rest.

In short-hand, the scattering process is written as γ + e → γ + e. The situation in the lab system is shown in the figure on page 37. For a relativistic calculation, we assume that the four-momenta of the incident photon and electron are

$$q = (\hbar\omega/c)\,[1, 1, 0, 0] \quad\text{and}\quad p = [mc, 0, 0, 0]$$

and those of the scattered photon and electron ¹ are

$$q' = (\hbar\omega'/c)\,[1, \cos\theta, \sin\theta, 0] \quad\text{and}\quad p' = (E'/c)\,[1, \beta'\cos\phi, \beta'\sin\phi, 0]$$

and use the conservation of the total four-momentum. Thus one gets four equations

$$q + p = q' + p' \qquad (2.9)$$

from which we want to eliminate the less interesting quantity p′, by means of the known relation (2.7), which holds both for the incident and the scattered electron. From here on, it is perfectly possible to proceed just by elementary means. ² I prefer to take advantage of the invariant scalar products as follows:

$$(q + p)^2 = (q' + p')^2$$

$$q^2 + 2\,q \cdot p + p^2 = q'^2 + 2\,q' \cdot p' + p'^2$$

We now use $q^2 = q'^2 = 0$ and $p^2 = p'^2 = m^2c^2$, and next eliminate the variable p′:

$$q \cdot p = q' \cdot p' = q' \cdot (p + q - q') = q' \cdot (p + q)$$

1 I prefer to use unwound angles from polar coordinates. The drawing on page 37 is an example with φ > 0 and θ < 0.
2 An up-to-date presentation along these lines is given for example in the textbook [6] p.114


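We put the assumed vectors into the last equation to get

$$\hbar\omega\, mc^2 = \hbar\omega'(\hbar\omega + mc^2) - \hbar^2\omega\omega'\cos\theta$$

$$\frac{\lambda'}{\lambda} = \frac{\omega}{\omega'} = \frac{\hbar\omega(1 - \cos\theta) + mc^2}{mc^2}$$

$$\lambda' - \lambda = \frac{\hbar\omega\lambda\,(1 - \cos\theta)}{mc^2} = \frac{h}{mc}\,(1 - \cos\theta)$$

The factor

$$\frac{h}{mc} = \lambda_C = 2.4263\ \text{pm}$$

is called the Compton wavelength of the electron. A photon of the wavelength $\lambda_C$ has the energy $mc^2 = 511000$ eV, equivalent to the rest mass of the electron. Thus we have obtained the shift of the wavelength to be

$$\lambda' - \lambda = \lambda_C(1 - \cos\theta) \qquad (2.10)$$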

Problem 2.11. Check that the kinetic energy of the scattered electron is

$$T' = mc^2\,\frac{\lambda_C^2}{\lambda\lambda'}\,(1 - \cos\theta)$$

Take as an example the monochromatic Mo-Kα-rays from A.H. Compton’s experiment of 1922. The incident wavelength is λ = 70 pm. Calculate the kinetic energy of the scattered electron.

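Answer. From the 0-component of the four-momentum conservation (2.9), the kinetic energy of the electron is $T' = cq^0 - cq'^0$, and converted to wavelengths becomes

$$T' = cq^0 - cq'^0 = \hbar\omega - \hbar\omega' = hc\left(\frac{1}{\lambda} - \frac{1}{\lambda'}\right) = mc^2\left(\frac{\lambda_C}{\lambda} - \frac{\lambda_C}{\lambda'}\right)$$

The equation (2.10) for the shift of the wave length yields

$$T' = mc^2\,\frac{\lambda_C^2}{\lambda\lambda'}\,(1 - \cos\theta)$$

With the data from Compton’s experiment of 1922, and approximating λ′ ≈ λ in the denominator, one obtains T′ ≈ (1 − cos θ) · 613.9 eV.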
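The numbers of this example are easily reproduced. The sketch below uses the values λ_C = 2.4263 pm and mc² = 511000 eV quoted in the text; the function names are my own assumptions. It evaluates the shift (2.10) and the kinetic energy of the electron without approximating λ′ by λ.

```python
import math

LAMBDA_C_PM = 2.4263   # Compton wavelength of the electron in pm, value from the text
MC2_EV = 511000.0      # rest energy of the electron in eV, value from the text

def compton_shift_pm(theta_deg):
    """Wavelength shift lambda' - lambda in pm, equation (2.10)."""
    return LAMBDA_C_PM * (1.0 - math.cos(math.radians(theta_deg)))

def electron_kinetic_energy_ev(lambda_pm, theta_deg):
    """Kinetic energy T' of the scattered electron in eV (Problem 2.11)."""
    lambda_prime = lambda_pm + compton_shift_pm(theta_deg)
    factor = 1.0 - math.cos(math.radians(theta_deg))
    return MC2_EV * LAMBDA_C_PM ** 2 / (lambda_pm * lambda_prime) * factor

for theta in (30.0, 90.0, 180.0):
    print(theta, compton_shift_pm(theta), electron_kinetic_energy_ev(70.0, theta))
```

For the Mo-Kα line the electron receives at most a little more than one keV, a small fraction of the photon energy of about 17.7 keV.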

Problem 2.12. Determine the direction of the scattered electron, from the y-component of the four-momentum conservation (2.9).

(a) Write down the y-component of the four-momentum conservation.

(b) Get sin φ in terms of sin θ, the wave length λ′ and the relativity parameter γ′ for the scattered electron.

(c) From the kinetic energy T′ obtained in the previous problem see that

$$\gamma' - 1 = \frac{\lambda_C^2}{\lambda\lambda'}\,(1 - \cos\theta)$$

Use this expression to simplify. After a somewhat lengthy calculation one gets sin φ = −A cos(θ/2) with a factor A < 1 which approaches one for a non-relativistic electron.

(d) Express the result sin φ = −cos(θ/2), to be expected in the experiments, in simple geometric terms.


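Answer. (a) The y-components $q'_y = (\hbar\omega'/c)\sin\theta$ and $p'_y = mc\,\gamma'\beta'\sin\phi$ add up to zero. One gets $cp'\sin\phi + \hbar\omega'\sin\theta = 0$.

(b)

$$\sin\phi = -\frac{\hbar\omega'}{cp'}\,\sin\theta = -\frac{h}{\lambda' p'}\,\sin\theta = -\frac{h}{mc\,\lambda'\beta'\gamma'}\,\sin\theta = -\frac{\lambda_C}{\lambda'\sqrt{\gamma'^2 - 1}}\,\sin\theta$$

(c)

$$\sin\phi = -\frac{\lambda_C}{\lambda'\sqrt{\gamma' + 1}\,\sqrt{\gamma' - 1}}\,\sin\theta = -\frac{\lambda_C}{\lambda'\sqrt{\gamma' + 1}}\cdot\frac{\sqrt{\lambda\lambda'}}{\lambda_C}\cdot\frac{\sin\theta}{\sqrt{1 - \cos\theta}} = -\sqrt{\frac{\lambda}{\lambda'}}\,\sqrt{\frac{2}{\gamma' + 1}}\,\cos(\theta/2)$$

(d) The non-relativistic electron is scattered along the ray opposite to the angle bisector of the incident and scattered photon. The deviation from this rule is of first order in v/c.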

Remark. We see that the kinetic energy of the scattered electron increases when using harder x-rays. The scattered electrons were made visible by means of a Wilson cloud chamber, for the first time by Compton in 1925. Moreover, Bothe and Geiger in 1925 used an electronic coincidence circuit connecting two Geiger counters, and demonstrated that the scattered electron and x-ray appear simultaneously. Decreasing the intensity of the incoming x-ray, one obtains in the counters no continuous intensity, but discrete pulses. Similar to observations with a photomultiplier, these observations cannot be explained by considering light as a wave. Light acts, in these experiments, as a flow of discrete particles with all mechanical properties of a particle. During the same experiment, in the interaction with the Bragg crystal, diffraction is used for the measurement of the wave length. Here the same light acts as a wave!

Part of the information above is taken from the script Wellen und Quanten, by K.R. Schubert (2004/5), professor of physics.

http://hep.phy.tu-dresden/~schubert/physik3.html

Problem 2.13. Rotate the lab-system by the angle φ such that the scattered electron moves along the positive x-axis. Transform the components of the four-momenta for the incoming and scattered photon and electron to this system. Convince yourself that

$$\frac{\sin(\phi - \theta)}{\sin\phi} = \frac{\omega}{\omega'} = 1 + \frac{\lambda_C}{\lambda}\,(1 - \cos\theta)$$

Conclude that for an experiment with $\lambda \gg \lambda_C$, one gets indeed φ ≈ θ/2 − 90°.

Answer. One applies the rotation matrix

$$R = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\phi & \sin\phi & 0 \\ 0 & -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

to get the transformed components of the four-momenta. For the incident photon and electron

$$(\hbar\omega/c)\,R\,[1, 1, 0, 0]^T = (\hbar\omega/c)\,[1, \cos\phi, -\sin\phi, 0]^T \quad\text{and}\quad (mc)\,R\,[1, 0, 0, 0]^T = (mc)\,[1, 0, 0, 0]^T$$


and for the scattered photon and electron

$$(\hbar\omega'/c)\,R\,[1, \cos\theta, \sin\theta, 0]^T = (\hbar\omega'/c)\,[1, \cos(\theta - \phi), \sin(\theta - \phi), 0]^T \quad\text{and}\quad (E'/c)\,R\,[1, \beta'\cos\phi, \beta'\sin\phi, 0]^T = (E'/c)\,[1, \beta', 0, 0]^T$$

Only the photon has a momentum along the new y-axis. By conservation of momentum one gets $-\hbar\omega\sin\phi = \hbar\omega'\sin(\theta - \phi)$. Together with the formula for the shift of the wave length, this gives

$$\frac{\sin(\phi - \theta)}{\sin\phi} = \frac{\omega}{\omega'} = 1 + \frac{\lambda_C}{\lambda}\,(1 - \cos\theta)$$

For an experiment with $\lambda \gg \lambda_C$, one gets indeed sin(φ − θ) ≈ sin φ. Hence φ − θ ≈ 180° − φ, φ ≈ θ/2 + 90° and |φ| ≈ 90° − |θ|/2.

2.10. Collision of particles

Problem 2.14. Inverse Compton scattering ¹ occurs whenever a photon scatters off a particle moving with almost the speed of light. Suppose that a particle with rest mass M and total energy E collides head on with a photon of energy $E_\gamma$. For simplicity, assume that the scattered particles move back along the same axis.

(i) Convince yourself that the conservation of the four-momentum implies

$$E_\gamma(E + cp) = E'_\gamma(E - cp) + 2E_\gamma E'_\gamma$$

(ii) Solve for $E'_\gamma$ and use that $Mc^2 \ll E$, and hence E + cp ≈ 2E holds with any accuracy needed. Show that under this assumption the scattered photon has the energy

$$E'_\gamma = \frac{4E^2 \cdot E_\gamma}{M^2c^4 + 4E \cdot E_\gamma}$$

(iii) Such a situation may occur for example for a collision of an ultra-relativistic cosmic ray with a photon from the cosmic background radiation. How much energy can such a cosmic ray proton transfer to a microwave background photon?

Energies up to $E = 10^{20}$ eV may occur in cosmic rays. The proton has rest energy $Mc^2 = 938.272$ MeV. A typical energy of a photon from the cosmic background radiation is $E_\gamma = 2.7\,\text{K} \cdot 8.617 \cdot 10^{-5}$ eV/K.

Answer. The four-momenta of the incident photon and proton are

$$cq = E_\gamma[1, 1, 0, 0] \quad\text{and}\quad cp = [E, -cp, 0, 0]$$

the four-momenta for the scattered photon and proton are

$$cq' = E'_\gamma[1, -1, 0, 0] \quad\text{and}\quad cp' = [E', -cp', 0, 0]$$

From the conservation of four-momentum (2.9) we eliminate the less interesting quantity p′. I prefer to take advantage of the invariant scalar products and get, with the same calculation as above

$$q \cdot p = q' \cdot p' = q' \cdot (p + q - q') = q' \cdot (p + q)$$

1 See Hobson [7] p.132 problem 5.14


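We put the assumed vectors into the last equation and obtain

$$E_\gamma(E + cp) = E'_\gamma(E - cp) + 2E_\gamma E'_\gamma$$

$$E'_\gamma = \frac{E_\gamma(E + cp)}{E - cp + 2E_\gamma} = \frac{E_\gamma(E + cp)^2}{E^2 - c^2p^2 + 2E_\gamma(E + cp)} = \frac{E_\gamma(E + cp)^2}{M^2c^4 + 2E_\gamma(E + cp)}$$

We use that $Mc^2 \ll E$, and hence E + cp ≈ 2E holds with any accuracy needed, to simplify

$$E'_\gamma = \frac{4E_\gamma \cdot E^2}{M^2c^4 + 4E_\gamma \cdot E}$$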
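Part (iii) of the problem asks for a number, which the formulas above deliver at once. The following is a minimal numerical sketch with the data given in the problem; the function names are hypothetical and all energies are in eV. The rewritten form with the denominator M²c⁴ + 2E_γ(E + cp) is used for the exact value because it avoids the cancellation in E − cp.

```python
import math

def scattered_photon_energy_direct(E, E_gamma, Mc2):
    """E'_gamma solved directly from E_gamma (E + cp) = E'_gamma (E - cp) + 2 E_gamma E'_gamma."""
    cp = math.sqrt(E ** 2 - Mc2 ** 2)
    return E_gamma * (E + cp) / (E - cp + 2.0 * E_gamma)

def scattered_photon_energy_exact(E, E_gamma, Mc2):
    """Same quantity in the rewritten form, numerically stable for Mc^2 << E."""
    cp = math.sqrt(E ** 2 - Mc2 ** 2)
    return E_gamma * (E + cp) ** 2 / (Mc2 ** 2 + 2.0 * E_gamma * (E + cp))

def scattered_photon_energy_approx(E, E_gamma, Mc2):
    """Ultra-relativistic approximation with E + cp replaced by 2E."""
    return 4.0 * E ** 2 * E_gamma / (Mc2 ** 2 + 4.0 * E * E_gamma)

E = 1.0e20                     # cosmic ray proton energy in eV
Mc2 = 938.272e6                # proton rest energy in eV
E_gamma = 2.7 * 8.617e-5       # typical background photon energy in eV

result = scattered_photon_energy_approx(E, E_gamma, Mc2)
print("scattered photon energy in eV:", result)
print("fraction of the proton energy:", result / E)
```

With these data the photon energy comes out somewhat below 10^19 eV, so roughly a tenth of the proton energy is transferred in a single head-on collision.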

Definition 2.4 (Center of mass system). For any set of colliding particles, the center of mass system, or CMS, is the inertial system for which the total momentum of the incoming particles is zero. Thus their four-momentum is [Mc, 0, 0, 0] where M is the total rest mass.

Problem 2.15. Consider the Compton scattering process in the CMS, with the incoming photon moving in the positive x-direction. Write down the four-momenta of the incident photon and electron, and the possible four-momenta for the scattered photon and electron.

Answer. The four-momenta of the incident photon and electron are

$$q = (\hbar\omega/c)\,[1, 1, 0, 0] \quad\text{and}\quad p = (\gamma mc)\,[1, -\beta, 0, 0]$$

with $\hbar\omega = mc^2\beta\gamma$, and the possible four-momenta for the scattered photon and electron are

$$q' = (\hbar\omega/c)\,[1, \cos\theta, \sin\theta, 0] \quad\text{and}\quad p' = (\gamma mc)\,[1, -\beta\cos\theta, -\beta\sin\theta, 0]$$

with any angle θ.

Problem 2.16. The Bevatron at Berkeley was built with the idea of producing antiprotons, ¹ by the reaction $p + p \to p + p + p + \bar p$. It was designed to give about 6.2 GeV kinetic energy to the protons it accelerates. Thus one intended to let a high-energy proton strike a proton at rest; at that point of history, it was not yet possible to send two proton rays against each other. By known conservation laws, it was clear that only an additional pair of proton and antiproton could be expected. Find the threshold for the energy E, and kinetic energy T, of the incoming proton at which this reaction becomes possible.

(i) Calculate the total energy-momentum four-vector $p_{tot}$ of the incoming particles in the lab system.

(ii) At the threshold, the created four particles cannot have any additional kinetic energy. Thus they need to be at rest in the CMS. Calculate the total energy-momentum four-vector $p'_{tot}$ of the scattered particles in the CMS.

(iii) From the conservation of the four-momentum and Lorentz invariance, we know that

$$p_{tot} \cdot p_{tot} = p'_{tot} \cdot p'_{tot}$$

(iv) Determine E and $c|\vec p\,|$ from the two equations

$$(E + Mc^2)^2 - c^2\vec p^{\,2} = (4Mc^2)^2$$

$$E^2 - c^2\vec p^{\,2} = M^2c^4$$

1 See also R. Feynman [10], vol. II, chpt. 25 and D. Griffiths [5], p.106


(v) Determine the kinetic energy T and the speed of the incoming protons at threshold.

Answer. (i) $p_{tot} = [(E + Mc^2)/c, p, 0, 0]$ is the total four-momentum of the two incoming protons, in the lab system.

(ii) $p'_{tot} = [4Mc, 0, 0, 0]$ is the total energy-momentum four-vector of the four scattered particles, in the CMS.

(iii) The Lorentz invariants of the total four-momenta are

$$p_{tot} \cdot p_{tot} = (E/c + Mc)^2 - \vec p^{\,2}$$

$$p'_{tot} \cdot p'_{tot} = 16M^2c^2$$

(iv) From the conservation of the four-momentum and Lorentz invariance, we get the first equation. The second one refers to the incoming proton from the ray.

$$(E + Mc^2)^2 - c^2\vec p^{\,2} = (4Mc^2)^2$$

$$E^2 - c^2\vec p^{\,2} = M^2c^4$$

One gets $E = 7Mc^2$ and $c|\vec p\,| = \sqrt{48}\,Mc^2$.

(v) $T = E - Mc^2 = 6Mc^2$. Indeed the antiprotons were discovered when the machine reached about 6000 MeV. At threshold, the speed of the proton is $\beta = \sqrt{48}/7$ times the speed of light.
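The threshold values can be put into numbers. The sketch below assumes the proton rest energy 938.272 MeV quoted in Problem 2.14; the function name is my own. It follows the elimination described in item (iv): subtracting the second equation from the first leaves a linear equation for E.

```python
import math

MC2_MEV = 938.272   # proton rest energy in MeV, value quoted in Problem 2.14

def antiproton_threshold(mc2=MC2_MEV):
    """Return (E, c|p|, T, beta) of the incoming proton at threshold, energies in MeV."""
    # Subtracting E^2 - c^2 p^2 = (Mc^2)^2 from (E + Mc^2)^2 - c^2 p^2 = (4 Mc^2)^2
    # gives 2 E Mc^2 + (Mc^2)^2 = 15 (Mc^2)^2, hence E = 7 Mc^2.
    energy = 7.0 * mc2
    cp = math.sqrt(energy ** 2 - mc2 ** 2)   # from relation (2.7)
    kinetic = energy - mc2
    beta = cp / energy                        # from relation (2.8)
    return energy, cp, kinetic, beta

energy, cp, kinetic, beta = antiproton_threshold()
print("E    =", energy, "MeV")
print("c|p| =", cp, "MeV")
print("T    =", kinetic, "MeV")
print("beta =", beta)
```

The kinetic energy at threshold comes out close to 5630 MeV, which is consistent with the design energy of about 6.2 GeV mentioned in the problem.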


2.11. The motion of particles

Definition 2.5 (Four-force). The vector

$$f^\mu = \left[(\gamma/c)\,\vec v \cdot \vec F,\ \gamma\vec F\,\right]$$

is called the four-force.

Proposition 2.1. We imagine that a particle is moving along some path, under the influence of forces of whatever origin. Newton’s law

$$\vec F = \frac{d\vec p}{dt} \qquad \text{(Newton)}$$

together with the relation

$$W = \vec F \cdot \vec x \qquad \text{(work)}$$

for the work W done by a force $\vec F$ during a motion over a distance $\vec x$ are equivalent to the equation of motion

$$f^\mu = \frac{dp^\mu}{d\tau} \qquad (2.11)$$

Under the assumptions made above, the four-force is a four-vector.

Reason. The relation (work) gives the rate at which work is done on the particle, which is equal to the rate of energy increase for the particle. Hence

$$\frac{dE}{dt} = \vec F \cdot \frac{d\vec x}{dt}$$


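Now the definition of the four-force and Newton’s law (Newton) yield

$$f^\mu = \left[\gamma\,\frac{d\vec x}{d(ct)} \cdot \vec F,\ \gamma\vec F\,\right] = \left[\gamma\,\frac{dE}{d(ct)},\ \gamma\,\frac{d\vec p}{dt}\right]$$

We convert the derivatives by coordinate time back to derivatives by the proper time, and finally use the definition of the four-momentum. Hence

$$f^\mu = \left[\frac{dE}{d(c\tau)},\ \frac{d\vec p}{d\tau}\right] = \frac{dp^\mu}{d\tau}$$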

The converse statement is checked in the same way.

Remark. The conservation of the four-momentum, both for the free motion and in collision processes, is well confirmed experimentally and justified theoretically. Already for this reason, it is sound to formulate Newton’s equation of motion together with the rate of work, in terms of a differential equation for the four-momentum $p^\mu$. Moreover, the actual studies of the motion of electrons and other particles under the influence of electromagnetic forces confirm the equation of motion

$$\frac{d\vec p}{dt} = q\,[\vec E + \vec v \times \vec B\,] \qquad \text{(Lorentz)}$$

for a charged particle with charge q, both in the non-relativistic and in the relativistic regime. Based on these facts, the occurrence of the acceleration in the traditional form of Newton’s law seems to me to be no more than an artifact. The complicated distinction between the parallel and transverse mass in the corrected form of Newton’s law is a further hint supporting that point of view.

Definition 2.6 (Four-acceleration). For any particle with four-velocity $[u^\mu]$, moving along any path $x^\mu = x^\mu(\tau)$ with the proper time τ as parameter, the four-acceleration vector $[a^\mu]$ is defined by

$$a^\mu = \frac{du^\mu}{d\tau}$$

Lemma 2.2. In the Minkowski metric, the four-velocity and the four-acceleration are perpendicular. For a material particle, the four-acceleration is a space-like vector.

Reason. We know from lemma 2.1 that the four-velocity has length c. Differentiating by the proper time yields

$$u \cdot u = c^2$$

$$0 = \frac{d(u \cdot u)}{d\tau} = 2\,u \cdot a$$

For a material particle the four-velocity is a time-like vector. As shown in lemma 3.3, any vector perpendicular to a time-like vector is space-like.

Proposition 2.2. We assume the equations of motion (2.11) to be valid. The following are equivalent:

(i) The equations of motion have the form $f^\mu = ma^\mu$;

(ii) the four-force $[f^\mu]$ does not change the rest mass m;

(iii) the four-force is perpendicular to the four-velocity: $f \cdot u = f^\mu u_\mu = 0$.


Proof. The equations of motion (2.11) imply

$$f^\mu = \frac{d(mu^\mu)}{d\tau} = \frac{dm}{d\tau}\,u^\mu + ma^\mu \qquad (2.12)$$

$$(f^\mu - ma^\mu)\,u_\mu = \frac{dm}{d\tau}\,u^\mu u_\mu = c^2\,\frac{dm}{d\tau} \qquad (2.13)$$

Assume item (i) holds. We conclude $0 = (f^\mu - ma^\mu)u_\mu = c^2\,\frac{dm}{d\tau}$ and thus item (ii) holds. Also, item (i) and lemma 2.2 yield $f^\mu u_\mu = ma^\mu u_\mu = 0$ and thus item (iii) holds.

Conversely assume item (iii) to hold. Hence lemma 2.2 and equation (2.13) imply $0 = f^\mu u_\mu = (f^\mu - ma^\mu)u_\mu = c^2\,\frac{dm}{d\tau}$ and thus item (ii) holds.

Assume now item (ii) to hold. Hence we get the equation of motion in the form $f^\mu = \frac{d(mu^\mu)}{d\tau} = \frac{dm}{d\tau}\,u^\mu + ma^\mu = ma^\mu$ of item (i).

Lemma 2.3. In terms of the velocity $\vec v$, acceleration $\vec a$ and relativity parameter γ, the four-acceleration has the components

$$a^0 = c\gamma\,\frac{d\gamma}{dt}, \qquad a^i = \gamma^2 a_i + \gamma\,\frac{d\gamma}{dt}\,v_i \quad\text{for } i = 1, 2, 3$$


3. The Lorentz Group

3.1. Different aging of twins

Problem 3.1 (Twin paradox or travelling keeps young). For this problem, I put c = 1 and consider only one space dimension. We do a series of simplifying calculations, and prove the conjecture at first in 1 + 1 dimensions. Confirm that for any two time-like vectors x = (t, x) ∈ Future and p = (E, p) ∈ Future, the reversed triangle inequality

$$|x + p| \ge |x| + |p|$$

holds, even with strict inequality > unless they are proportional.

Answer. All the following inequalities are equivalent:

$$\begin{aligned}
|x| + |p| &\le |x + p| \\
|x|^2 + |p|^2 + 2|x||p| &\le |x + p|^2 \\
t^2 - x^2 + E^2 - p^2 + 2\sqrt{(t^2 - x^2)(E^2 - p^2)} &\le (t + E)^2 - (x + p)^2 \\
2\sqrt{(t^2 - x^2)(E^2 - p^2)} &\le 2tE - 2xp \\
(t^2 - x^2)(E^2 - p^2) &\le \left[tE - xp\right]^2 \\
-t^2p^2 - x^2E^2 &\le -2tExp \\
0 &\le (tp - xE)^2
\end{aligned}$$

Problem 3.2. At least the calculation in problem 3.1 above is a guide, as I now go back to 1 + 3 dimensions. Check that any two vectors x, p ∈ Future ∪ Light+ have Minkowski scalar product x · p ≥ 0.

Answer. Let x = (t, x, y, z) and p = (E, p, q, r) be the two vectors in Future ∪ Light+. We calculate the Minkowski scalar product and use the common Cauchy-Schwarz inequality $xp + yq + zr \le \sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)}$. Since the vectors are assumed to be in the future or light cone, $0 \le \sqrt{x^2 + y^2 + z^2} \le t$ and $0 \le \sqrt{p^2 + q^2 + r^2} \le E$. We obtain as claimed

$$-x \cdot p = -tE + xp + yq + zr \le \sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)} - tE \le 0 \qquad (3.1)$$

Problem 3.3. Assume that two nonzero vectors x, p ∈ (Future ∪ Light+) \ {0} have Minkowski scalar product x · p = 0. Check that x = αp ∈ Light+ holds with α > 0. Thus they are light-like and linearly dependent.

Answer. The assumption that the vectors are nonzero implies t > 0, E > 0 and tE > 0. Hence (x, y, z) ≠ 0 and (p, q, r) ≠ 0. We get equality everywhere in formula (3.1). Hence the vectors (x, y, z) and (p, q, r) are proportional, as can be seen from the following calculation:

$$\begin{aligned}
(xp + qy + rz)^2 &= (x^2 + y^2 + z^2)(p^2 + q^2 + r^2) \\
2xpyq + 2xprz + 2qyrz &= (x^2q^2 + y^2p^2) + (x^2r^2 + z^2p^2) + (y^2r^2 + z^2q^2) \\
(xq - yp)^2 + (xr - zp)^2 + (yr - zq)^2 &= 0 \\
xq = yp, \quad xr = zp, \quad yr &= zq
\end{aligned}$$

Since (x, y, z) ≠ 0 and (p, q, r) ≠ 0, we conclude that x = αp, y = αq, z = αr with some α ≠ 0. Furthermore,

$$\sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)} = tE, \qquad \sqrt{x^2 + y^2 + z^2} \le t \quad\text{and}\quad \sqrt{p^2 + q^2 + r^2} \le E$$

imply $\sqrt{x^2 + y^2 + z^2} = t$ and $\sqrt{p^2 + q^2 + r^2} = E$. Hence x, p ∈ Light+ and t = αE with α > 0. Together, we have confirmed x = αp ∈ Light+ as conjectured.

Problem 3.4. Assume the sum of two light-like vectors is light-like. Prove that they are linearly dependent.

Answer. We assume x, p ∈ Light and x + p ∈ Light. Calculate the Minkowski scalar products

$$0 = (x + p) \cdot (x + p) = x \cdot x + p \cdot p + 2\,x \cdot p = 2\,x \cdot p$$

Hence the two vectors are orthogonal. By the last problem, they are linearly dependent.

Lemma 3.1 (Reversed inequalities in the future cone). For any two future time-like or future light-like vectors x, p ∈ Future ∪ Light+ we get the inequalities

$$0 \le (x \cdot x)(p \cdot p) \le (x \cdot p)^2 \qquad (3.2)$$

$$|x||p| \le x \cdot p \qquad (3.3)$$

$$|x| + |p| \le |x + p| \qquad (3.4)$$

Equality occurs in any one of these formulas if and only if the vectors x and p are linearly dependent. Note that these inequalities in the future cone are the reversed versions of the corresponding inequalities from Euclidean geometry.

Proof. Take vectors x = (t, x, y, z) and p = (E, p, q, r) ∈ Future ∪ Light+ with p ≠ 0.

$$\begin{aligned}
&(x \cdot x)(p \cdot p) - (x \cdot p)^2 \\
&= (t^2 - x^2 - y^2 - z^2)(E^2 - p^2 - q^2 - r^2) - (tE - xp - yq - zr)^2 \\
&\le (t^2 - x^2 - y^2 - z^2)(E^2 - p^2 - q^2 - r^2) - \left[tE - \sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)}\right]^2 \\
&= t^2E^2 + (x^2 + y^2 + z^2)(p^2 + q^2 + r^2) - t^2(p^2 + q^2 + r^2) - (x^2 + y^2 + z^2)E^2 \\
&\quad - t^2E^2 - (x^2 + y^2 + z^2)(p^2 + q^2 + r^2) + 2tE\sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)} \\
&= -t^2(p^2 + q^2 + r^2) - E^2(x^2 + y^2 + z^2) + 2\sqrt{t^2(p^2 + q^2 + r^2)}\,\sqrt{E^2(x^2 + y^2 + z^2)} \\
&= -\left[\sqrt{t^2(p^2 + q^2 + r^2)} - \sqrt{E^2(x^2 + y^2 + z^2)}\right]^2 \le 0
\end{aligned}$$

We have confirmed formula (3.2). To check under which assumptions equality occurs, assume formula (3.2) holds with equality. Then equality occurs in the common Cauchy-Schwarz inequality, $xp + yq + zr = \sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)}$, hence x = αp, y = αq, z = αr with some factor α. The last line implies $\sqrt{t^2(p^2 + q^2 + r^2)} = \sqrt{E^2(x^2 + y^2 + z^2)}$ and hence t = αE and α ≥ 0. Together, we see that x = αp.

Formula (3.3) follows by taking roots since x · p ≥ 0 by Problem 3.2. Finally we check the reversed triangle inequality (3.4):

$$\begin{aligned}
-|x + p|^2 + (|x| + |p|)^2 &= -|x + p|^2 + |x|^2 + |p|^2 + 2|x||p| \\
&= -(x + p) \cdot (x + p) + x \cdot x + p \cdot p + 2|x||p| \\
&= -2\,x \cdot p + 2|x||p| \le 0
\end{aligned}$$


Lemma 3.2 (Convexity).
• The convex combination of any two independent future light-like vectors is in the future cone.
• The future cone Future is convex.
• The union Future ∪ Light+ of the future cone and the future light cone is convex.

Proof. For given linearly independent vectors x, p ∈ Light+ the inequality (3.4) holds in the strict form. Hence with any 0 < α < 1

$$0 = |\alpha x| + |(1 - \alpha)p| < |\alpha x + (1 - \alpha)p|$$

and αx + (1 − α)p ∈ Future. The other claims follow similarly.

Proposition 3.1 (Twin paradox). For any two future time-like or future light-like vectors x, p ∈ Future ∪ Light+ which are not linearly dependent, the reversed strict triangle inequality ¹ holds:

$$|x + p| > |x| + |p|$$
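The reversed triangle inequality can be tested on random vectors from the future cone, and the twin example itself fits into two lines. The following is only a sketch with helper names of my own (`mdot`, `mnorm`, `random_future_vector`), in units with c = 1. One twin travels out with velocity 3/5 and back with velocity −3/5; the other stays at home for the coordinate time 10.

```python
import math
import random

def mdot(x, p):
    """Minkowski scalar product with signature (+, -, -, -)."""
    return x[0] * p[0] - x[1] * p[1] - x[2] * p[2] - x[3] * p[3]

def mnorm(x):
    """Proper time |x| of a future time-like or light-like vector."""
    return math.sqrt(max(mdot(x, x), 0.0))

def add(x, p):
    return tuple(xi + pi for xi, pi in zip(x, p))

def random_future_vector(rng):
    """Random time-like vector pointing to the future."""
    x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
    t = math.sqrt(x * x + y * y + z * z) + rng.uniform(0.1, 1.0)
    return (t, x, y, z)

def check_reversed_triangle(samples=1000, seed=0):
    """Return True if |x + p| >= |x| + |p| holds for all sampled pairs."""
    rng = random.Random(seed)
    for _ in range(samples):
        x = random_future_vector(rng)
        p = random_future_vector(rng)
        if mnorm(add(x, p)) < mnorm(x) + mnorm(p) - 1e-12:
            return False
    return True

out = (5.0, 3.0, 0.0, 0.0)      # outbound leg of the travelling twin
back = (5.0, -3.0, 0.0, 0.0)    # return leg
print(mnorm(out) + mnorm(back), mnorm(add(out, back)))   # prints 8.0 10.0
print(check_reversed_triangle())
```

The travelling twin ages 8 units of time, the twin at home 10 units.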

Lemma 3.3 (Orthogonal complement of a one-dimensional space). Let e ≠ 0 be any vector. The orthogonal complement $e^\perp$ is always three-dimensional.

If e ≠ 0 is time-like, the orthogonal complement $e^\perp$ consists only of space-like vectors.

If e ≠ 0 is light-like, the orthogonal complement $e^\perp$ is spanned by e itself and two space-like vectors.

If e ≠ 0 is space-like, the orthogonal complement $e^\perp$ contains both space-like and time-like vectors. There is a basis consisting of two light-like vectors and a space-like vector.

Imagine that a radar signal is sent from a station on earth to the moon and the reflected signal is received. The duration between the sending time $t_1$ and the receiving time $t_2$ is measured. In which sense can we conclude that the distance from the station on earth to the reflector on the moon is $c(t_2 - t_1)/2$?

I assume for simplicity that both the station on earth and the reflector on the moon move without acceleration, and do not discuss the effects of gravity. So I deal with the following simplified picture: In space-time, there is a sender active at space-time point s, a reflector on the moon hit at space-time point m, and a receiver getting the signal back at space-time point r. The vectors m − s and r − m along the radar signal are light-like. The proper time |r − s| is measured.

Question. What is the distance between the world line s r and the reflector m? How can the spatial distance be specified precisely?

Answer. Indeed we can say that the distance to the moon, seen in the frame in which sender and receiver are at the same spot, is |r − s|/2. Let

$$h = \frac{s + r}{2}$$

be the midpoint of the segment s r from sender to receiver. We get the following three statements specifying a spatial distance, where ⟨ , ⟩ denotes the Minkowski scalar product:

(a) The vectors m − h and r − s are perpendicular:

$$\langle m - h, r - s \rangle = 0$$

1 That is why traveling keeps young!


(b) The space-like distance from the midpoint to the moon is greater than the distance from any other point on the world line sender-receiver to the moon. Since space-like vectors have a negative Minkowski square, this reads

$$\langle m - h, m - h \rangle \le \langle m - p, m - p \rangle \quad\text{for any point p on the world line s r.}$$

(c) The square of the space-like distance from the midpoint to the moon is the negative of the square of half the reflection time:

$$\langle m - h, m - h \rangle = -\frac{1}{4}\,\langle s - r, s - r \rangle$$
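The three statements can be tested on a concrete radar event. The sketch below is an illustration under my own assumptions: units with c = 1, a reflector at distance D = 3, hypothetical helper names, and a boost along the x-axis to show that the statements do not depend on the frame.

```python
import math

def mdot(x, p):
    """Minkowski scalar product with signature (+, -, -, -)."""
    return x[0] * p[0] - x[1] * p[1] - x[2] * p[2] - x[3] * p[3]

def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def boost_x(beta, x):
    """Lorentz transformation along the x-axis."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return (gamma * x[0] - beta * gamma * x[1],
            -beta * gamma * x[0] + gamma * x[1], x[2], x[3])

def midpoint(s, r):
    return tuple((si + ri) / 2.0 for si, ri in zip(s, r))

def radar_invariants(s, m, r):
    """Return (<m-h, r-s>, <m-h, m-h>, <r-s, r-s>) for the midpoint h of s and r."""
    h = midpoint(s, r)
    mh = sub(m, h)
    rs = sub(r, s)
    return mdot(mh, rs), mdot(mh, mh), mdot(rs, rs)

D = 3.0
s = (0.0, 0.0, 0.0, 0.0)        # signal is sent
m = (D, D, 0.0, 0.0)            # signal hits the reflector
r = (2.0 * D, 0.0, 0.0, 0.0)    # signal is received

print(radar_invariants(s, m, r))
print(radar_invariants(boost_x(0.6, s), boost_x(0.6, m), boost_x(0.6, r)))
```

In both frames the first number vanishes, which is statement (a), and the second equals minus one quarter of the third, which is statement (c).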

3.2. The Lorentz transformations

Definition 3.1. A Lorentz transformation is a linear mapping from space-time to space-time that leaves the Minkowski metric (2.5) invariant.

Let A be the matrix for the Lorentz transformation. Any two vectors x, y and their images Ax, Ay satisfy

〈 Ax , Ay 〉 = x · y (3.5)

In other words, the Lorentz transformations are the isometries for the Minkowski metric. We convert this requirement to a matrix equation characterizing the Lorentz transformations. With the matrix G of the Minkowski metric, the identity

$$x^T A^T G A\, y = (Ax)^T G\, (Ay) = x^T G\, y$$

for all vectors x, y yields the matrix equation

$$A^T G A = G \tag{3.6}$$
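Equation (3.6) is easy to test on concrete matrices. The sketch below assumes G = diag(1, 1, −1); the helper names (`matmul`, `is_lorentz`, `rotation`, `boost`) are illustrative and not taken from the text. It also uses the consequence A⁻¹ = G Aᵀ G of (3.6), which holds because G G = I.

```python
import math

# Assumed metric matrix for 2+1 dimensions: two space axes, then time.
G = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def max_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

def is_lorentz(A, tol=1e-9):
    """True if A^T G A = G up to rounding, i.e. equation (3.6)."""
    return max_diff(matmul(matmul(transpose(A), G), A), G) < tol

def rotation(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def boost(lam):
    c, s = math.cosh(lam), math.sinh(lam)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [s, 0.0, c]]

product = matmul(rotation(0.4), boost(1.3))
# From A^T G A = G and G G = I one reads off the inverse A^{-1} = G A^T G.
inverse = matmul(matmul(G, transpose(product)), G)
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
shear = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # not an isometry

print(is_lorentz(product), is_lorentz(inverse), is_lorentz(shear))
```

The product and the inverse of Lorentz matrices pass the test, which is the content of the group property, while a Euclidean shear does not.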

Proposition 3.2. The set of all Lorentz transformations is a group.

Definition 3.2 (orthochronous and time-reversing transformations).
• A Lorentz transformation which maps the future cone to itself is called orthochronous.
• A Lorentz transformation which maps the future cone to the past cone is called antichronous or time-reversing.

Problem 3.5. Give a reason why any Lorentz transformation either maps the future and past cones to themselves, or exchanges them.

Reason. The invariance of the Minkowski metric implies

A(Future) ⊆ Future ∪ (−Future)

Since Future is a convex set, the image A(Future) is convex, too. There exist convex combinations of a time-like vector in the future and a time-like vector in the past which are not time-like. Hence any convex subset of the union Future ∪ (−Future) is either a subset of Future or a subset of −Future. Since a Lorentz transformation is a bijection, and the inverse is a Lorentz transformation, too, we conclude that either A(Future) = Future or A(Future) = −Future. These two cases lead to the orthochronous and time-reversing transformations.

From the matrix equation (3.6), we see that $(\det A)^2 = 1$ and hence det A = +1 or det A = −1.


Definition 3.3. An orthochronous Lorentz transformation with determinant one is called proper. The subgroup of proper Lorentz transformations is the proper Lorentz group.

Problem 3.6. Fix an arbitrary vector a. Show that the transformation

$$x' = \frac{-x + (x \cdot x)\, a}{N} , \qquad N = 1 - 2\, x \cdot a + (a \cdot a)(x \cdot x) \tag{3.7}$$

is its own inverse. It leaves the light-cone invariant. This is a nonlinear conformal transformation.
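Before attempting a proof, the two claims of Problem 3.6 can be tested on sample vectors. This is a numerical sketch only, assuming the metric G = diag(1, 1, −1); the vectors `a`, `x`, `null` are arbitrary illustrative choices for which the denominator N does not vanish.

```python
def mink(x, y):
    """Minkowski product for the assumed metric G = diag(1, 1, -1)."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]

def transform(x, a):
    """The map (3.7): x' = (-x + (x.x) a) / N with N = 1 - 2 x.a + (a.a)(x.x)."""
    xx = mink(x, x)
    n = 1 - 2 * mink(x, a) + mink(a, a) * xx
    return tuple((-xi + xx * ai) / n for xi, ai in zip(x, a))

a = (0.3, -0.2, 0.5)       # hypothetical fixed vector
x = (1.1, 0.4, -0.7)       # hypothetical test vector
null = (3.0, 4.0, 5.0)     # light-like, since 9 + 16 - 25 = 0

twice = transform(transform(x, a), a)
involution_error = max(abs(p - q) for p, q in zip(twice, x))
null_image = transform(null, a)
cone_error = abs(mink(null_image, null_image))

print(involution_error, cone_error)
```

Both printed numbers should be at rounding level, which supports (but of course does not prove) that the map is an involution and preserves the light-cone.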

Problem 3.7. Prove that any linear conformal transformation x ↦ Ax such that x · x = 〈 Ax , Ax 〉 for all vectors x is a Lorentz transformation.

Lemma 3.4. If a subspace $U \subset \mathbb{R}^3$ is invariant under a Lorentz transformation, then its orthogonal complement is invariant, too. In particular, the orthogonal complement of an eigenvector is invariant.

Proof. Assume the subspace $U \subset \mathbb{R}^3$ is invariant for the Lorentz transformation A. By definition this means A(U) ⊆ U, and hence $A(U) \subseteq U = A^{-1}A(U) \subseteq A^{-1}(U)$. Since the Lorentz transformation is invertible, these spaces have equal finite dimension and hence $A(U) = U = A^{-1}(U)$.

Given is any vector x ∈ U⊥. Hence x · u = 0 for all u ∈ U. The Lorentz invariance implies

〈 Ax , Au 〉 = x · u = 0

for all u ∈ U. Hence

$$Ax \in A(U)^{\perp} = U^{\perp}$$

as was to be shown.

Lemma 3.5. Assume a Lorentz transformation A has two linearly independent light-like eigenvectors l, m ∈ Light⁺. Then their eigenvalues ν, µ have product νµ = 1. The vector b orthogonal to their span is a space-like eigenvector, indeed Ab = (det A)b.

Proof. Let U = span(l, m) be the linear span of the two light-like eigenvectors. The orthogonal complement U⊥ is spanned by one space-like vector b. Since the orthogonal complement of an invariant space is invariant, b is an eigenvector:

Al = νl , Am = µm and Ab = βb

The Lorentz invariance implies

$$\langle Al , Am \rangle = l \cdot m , \qquad \text{that is} \qquad \nu\mu\; l \cdot m = l \cdot m$$

Since the two light-like vectors are independent, we know by Lemma 3.1, inequality (3.3), that l · m < 0, and hence νµ = 1 and $\nu = \mu^{-1}$. Since det A = νµβ, we get β = det A.

Question. Explain these facts geometrically in Klein’s model.

Lemma 3.6. Assume an orthochronous Lorentz transformation A has a space-like eigenvector. Then there exist

(i) either two linearly independent light-like eigenvectors;

(ii) or a time-like eigenvector.

If both (i) and (ii) happen together, then either the transformation is the identity A = I, or det A = −1, these three vectors lie in a plane, and the transformation A is a reflection across this plane.

Proof. Let b ∈ Present be the eigenvector and Ab = βb. The orthogonal complement b⊥ is invariant, too. It contains two linearly independent light-like vectors l, m ∈ Light⁺. Now the invariance of b⊥ together with the Lorentz invariance leaves us with two possibilities:

(i) They are both eigenvectors: Al = νl and Am = µm

(ii) They are switched: Al = νm and Am = µl

Since the Lorentz transformation A is orthochronous, in both cases ν, µ > 0. Here are the further conclusions for case (ii). Indeed

$$A(\sqrt{\mu}\, l + \sqrt{\nu}\, m) = \sqrt{\nu\mu}\, (\sqrt{\mu}\, l + \sqrt{\nu}\, m)$$

is an eigenvector, which is a combination with positive coefficients of vectors in the future light-cone, and hence by Problem ?? it is time-like.

Now we assume that both (i) and (ii) happen together. Hence ν = µ > 0 and, by Lemma 3.5, νµ = 1, which implies ν = µ = 1. By Lemma 3.5, the orthogonal complement U⊥ = span(l, m)⊥ is spanned by a space-like eigenvector b satisfying Ab = (det A)b. Altogether, we see that either det A = 1 and A = I, or det A = −1 and the transformation A is a reflection across the plane U = span(l, m).

Theorem 3.1 (Structure of an orthochronous Lorentz transformation). Each proper Lorentz transformation A has an eigenvector with eigenvalue 1. Except for the identity, this eigenvector e is unique. There are three mutually exclusive cases:

Rotation If the eigenvector e is time-like, the orthogonal complement is an invariant plane spanned by two space-like vectors. The restriction of A to this plane is a rotation by some angle α. There exists a proper Lorentz transformation S such that

$$S^{-1} A S = \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

and hence Tr A = 1 + 2 cos α.

Lorentz boost If the eigenvector e is space-like, the orthogonal complement has an orthonormal basis of a space-like and a time-like unit vector. The restriction of A to this plane is a Lorentz boost with some Lobachevskij parameter λ. There exists a proper Lorentz transformation S such that

$$S^{-1} A S = \begin{pmatrix} \cosh\lambda & 0 & \sinh\lambda \\ 0 & \det A & 0 \\ \sinh\lambda & 0 & \cosh\lambda \end{pmatrix}$$

and hence Tr A = det A + 2 cosh λ.

Rotation about a light ray If the eigenvector e is light-like, the orthogonal complement has an orthogonal basis of a space-like unit vector and a light-like vector. The restriction of A to this plane is a shearing map. The transformation A has the eigenvalue one with geometric multiplicity one and algebraic multiplicity three. There exists a proper Lorentz transformation S such that

$$S^{-1} A S = \begin{pmatrix} 1 - \frac{\alpha^2}{2} & -\alpha & \frac{\alpha^2}{2} \\ \alpha & 1 & -\alpha \\ -\frac{\alpha^2}{2} & -\alpha & 1 + \frac{\alpha^2}{2} \end{pmatrix} \tag{3.8}$$

Indeed, the parameter α > 0 or α < 0 can be chosen arbitrarily, except for its sign!

Identity The existence of three linearly independent light-like eigenvectors implies that the transformation is the identity.

Each orthochronous Lorentz transformation A with det A = −1 has an eigenvector with eigenvalue −1. This eigenvector b is unique, and it is always space-like.

The orthogonal complement b⊥ has an orthonormal basis of a space-like and a time-like unit vector. The restriction of A to this plane is a Lorentz boost with some Lobachevskij parameter λ. Only the Lorentz boost case can occur, where λ = 0 yields an ordinary reflection.
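The three normal forms of the proper case can be inspected numerically. The sketch below assumes the metric G = diag(1, 1, −1) and takes det A = 1 in the boost form; the function names and the sample parameters are illustrative. For each normal form it reports the trace, the Minkowski square of the eigenvector with eigenvalue 1, and how well that vector is fixed.

```python
import math

def mink(x, y):
    """Minkowski product for the assumed metric G = diag(1, 1, -1)."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]

def apply(A, v):
    return tuple(sum(A[i][k] * v[k] for k in range(3)) for i in range(3))

def trace(A):
    return A[0][0] + A[1][1] + A[2][2]

def rotation(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def boost(lam):
    c, s = math.cosh(lam), math.sinh(lam)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [s, 0.0, c]]

def light_ray(alpha):
    h = alpha * alpha / 2
    return [[1 - h, -alpha, h], [alpha, 1.0, -alpha], [-h, -alpha, 1 + h]]

alpha, lam = 0.8, 0.5  # hypothetical sample parameters
cases = [
    ("rotation", rotation(alpha), (0.0, 0.0, 1.0)),    # time-like eigenvector
    ("boost", boost(lam), (0.0, 1.0, 0.0)),            # space-like eigenvector
    ("light ray", light_ray(alpha), (1.0, 0.0, 1.0)),  # light-like eigenvector
]

for name, A, e in cases:
    fixed = max(abs(p - q) for p, q in zip(apply(A, e), e))
    print(name, "trace =", trace(A), "e.e =", mink(e, e), "|Ae - e| =", fixed)
```

The sign of e · e distinguishes the three cases, and the traces follow the formulas 1 + 2 cos α, 1 + 2 cosh λ and 3 respectively.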

Proof. Given is an orthochronous Lorentz transformation A. Since the dimension 3 is odd, there exists a real eigenvalue and eigenvector. We distinguish the following mutually exclusive cases:

(a) There exists a time-like eigenvector.

(b) There does not exist any time-like eigenvector, but there exist two light-like eigenvectors.

(c) There exist no time-like and only one light-like eigenvector.

(d) There exist three linearly independent light-like eigenvectors.

(e) There exist only space-like eigenvectors.

Consider case (a) and let e ∈ Future be a time-like eigenvector, hence Ae = αe. The Lorentz invariance (3.6) implies $\alpha^2 = 1$. Because the transformation is orthochronous, we know that α = +1. By Lemma 3.4, the orthogonal complement e⊥ is invariant, too. By Lemma 3.3, the orthogonal complement is spanned by two space-like vectors a and b. We may choose a, b, e all to be orthogonal unit vectors.

The matrix S for the change of basis has the new basis vectors as columns:

$$S = [a, b, e]$$

Question. Check that our choice of the basis makes S an orthochronous Lorentz transformation, and indeed the sign of a can be chosen to make it a proper one.

Moreover, a straightforward matrix calculation (one that each student needs to do at least ten times) shows that the above explanations mean

$$AS = SN \quad \text{with the normal form} \quad N = \begin{pmatrix} n_{11} & n_{12} & 0 \\ n_{21} & n_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Since the orthochronous Lorentz transformations form a group, the normal form N is an orthochronous Lorentz transformation and det N = det A. Because of the zeros and the one occurring in N, the 2 × 2 matrix in the upper left corner is a two-dimensional rotation or reflection. In the latter case, we can choose the space-like basis vectors along and perpendicular to the axis of reflection, and even get case (b) with λ = 0.

Now we consider case (b). Let l, m ∈ Light⁺ be two linearly independent light-like eigenvectors. The eigenvalues are µ > 0, since the transformation is orthochronous, and $\mu^{-1}$ by Lemma 3.5:

$$Am = \mu m \quad \text{and} \quad Al = \mu^{-1} l$$

Any linear combination of l and m with positive coefficients is in the future cone, as shown in Problem ??. Hence the span U = span(l, m) contains a normalized time-like vector e ∈ Future, from which we get an orthogonal space-like unit vector a ∈ U. Since l · m < 0, the simplest choice is

$$a = \frac{l - m}{\sqrt{-2\, l \cdot m}} , \qquad e = \frac{l + m}{\sqrt{-2\, l \cdot m}}$$

The action of the transformation A on these vectors is

$$A(a + e) = \mu^{-1}(a + e) , \qquad A(a - e) = \mu\,(a - e)$$

$$Aa = \frac{\mu + \mu^{-1}}{2}\, a + \frac{\mu^{-1} - \mu}{2}\, e$$

$$Ae = \frac{\mu^{-1} - \mu}{2}\, a + \frac{\mu + \mu^{-1}}{2}\, e$$

In terms of the Lobachevskij parameter λ, the matrix elements are

$$\frac{\mu + \mu^{-1}}{2} = \cosh\lambda \quad \text{and} \quad \frac{\mu^{-1} - \mu}{2} = \sinh\lambda$$

The orthogonal complement U⊥ is spanned by one space-like vector b. Since the orthogonal complement of an invariant space is invariant, this is an eigenvector with eigenvalue det A. The matrix S = [a, b, e] for the change of basis has these new basis vectors as columns. Our choice of an orthonormal basis makes S an orthochronous Lorentz transformation. Indeed, switching the sign of a if needed makes det S = +1 and hence S a proper Lorentz transformation. The above calculations are summarized in the matrix equation

$$AS = SN \quad \text{with the normal form} \quad N = \begin{pmatrix} \cosh\lambda & 0 & \sinh\lambda \\ 0 & \det A & 0 \\ \sinh\lambda & 0 & \cosh\lambda \end{pmatrix}$$

Consider case (d) and assume there exist three linearly independent light-like eigenvectors. Using just any two of them, we argue as in case (b). The existence of the third eigenvector implies µ = 1 and hence the Lobachevskij parameter λ = 0. Hence the transformation is the identity.

Finally we discuss the case (e), with the only purpose to rule it out. Assume there exist only space-like eigenvectors. Let b ∈ Present and Ab = βb. By Lemma 3.6 two cases can occur:

(i) either two linearly independent light-like eigenvectors exist, which has already been considered in case (b);

(ii) or there exists a time-like eigenvector, which has already been considered in case (a).

The case (c) is possible for det A = 1, and interesting. The argument is continued in the next section.

3.3. Infinitesimal generators

Let

$$A(x) = e^{xB} \tag{3.9}$$

in the sense of the matrix exponential function. The set of A(x) for all real x is called a one-parameter group and B is called the generator.

Lemma 3.7. Formula (3.9) generates a one-parameter group of Lorentz transformations if and only if

$$B^T G + G B = 0 \tag{3.10}$$

Proof. We need to confirm that

$$e^{xB^T} G\, e^{xB} = G \tag{3.11}$$

Differentiating by x yields

$$\frac{d}{dx}\, e^{xB^T} G\, e^{xB} = e^{xB^T} \left( B^T G + G B \right) e^{xB}$$

Since equation (3.11) is true for x = 0, it holds for all x if and only if the derivative of the left-hand side is zero, which happens if and only if the relation (3.10) holds for the generator.

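In 2 + 1 dimensions with the Minkowski metric as above, relation (3.10) holds if and only if

$$B = \begin{pmatrix} 0 & -b_3 & b_2 \\ b_3 & 0 & -b_1 \\ b_2 & -b_1 & 0 \end{pmatrix} = b_1 S_1 + b_2 S_2 + b_3 S_3 \tag{3.12}$$

where I have defined

$$S_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} , \quad S_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} , \quad S_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

We need to assume b ≠ 0.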

Lemma 3.8. The eigenvectors of the Lorentz transformation $e^{xB}$ are eigenvectors of the generator B, too. If the generator B has the eigenvalue β, the Lorentz transformation $A = e^{xB}$ has the eigenvalue $e^{x\beta}$ with the same eigenvector.

The characteristic polynomial of B is

$$\det(B - \lambda I) = -\lambda^3 + (b_1^2 + b_2^2 - b_3^2)\,\lambda$$

It is easy to see that B has the eigenvalue 0 with eigenvector $b = (b_1, b_2, b_3)^T$. Hence b is the eigenvector with eigenvalue one for the Lorentz transformation $e^{xB}$. The other two eigenvalues of B are $\pm\sqrt{b_1^2 + b_2^2 - b_3^2}$.

For the evaluation of the exponential series, one needs the powers

$$B^2 = \begin{pmatrix} -b_3^2 + b_2^2 & -b_1 b_2 & b_1 b_3 \\ -b_1 b_2 & -b_3^2 + b_1^2 & b_2 b_3 \\ -b_1 b_3 & -b_2 b_3 & b_1^2 + b_2^2 \end{pmatrix} \tag{3.13}$$

$$B^3 = (b_1^2 + b_2^2 - b_3^2)\, B$$

The last formula can be obtained from the Hamilton-Cayley theorem. By the Hamilton-Cayley theorem, one gets zero by plugging the matrix into its own characteristic polynomial.
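The algebraic properties of the generator can be confirmed on a sample vector. This is a small sketch, assuming the metric G = diag(1, 1, −1); the vector `b` is an arbitrary illustrative choice and the helper names are not from the text.

```python
# Assumed metric matrix for 2+1 dimensions: two space axes, then time.
G = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def generator(b):
    """The matrix B of formula (3.12) for b = (b1, b2, b3)."""
    b1, b2, b3 = b
    return [[0.0, -b3, b2], [b3, 0.0, -b1], [b2, -b1, 0.0]]

b = (0.5, -1.2, 0.7)  # hypothetical sample vector
B = generator(b)

# Relation (3.10): B^T G + G B = 0.
BT_G, G_B = matmul(transpose(B), G), matmul(G, B)
lorentz_defect = max(abs(BT_G[i][j] + G_B[i][j]) for i in range(3) for j in range(3))

# b is an eigenvector of B with eigenvalue 0.
kernel_defect = max(abs(sum(B[i][k] * b[k] for k in range(3))) for i in range(3))

# Hamilton-Cayley: B^3 = (b1^2 + b2^2 - b3^2) B.
q = b[0] ** 2 + b[1] ** 2 - b[2] ** 2
B3 = matmul(B, matmul(B, B))
cayley_defect = max(abs(B3[i][j] - q * B[i][j]) for i in range(3) for j in range(3))

print(lorentz_defect, kernel_defect, cayley_defect)
```

All three defects should vanish up to rounding for any choice of `b`.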

I now consider the case (c) of Theorem 3.1. In this case b is a light-like vector. Hence $B^3 = 0$. This implies that the exponential series of formula (3.9) has only three terms:

$$e^{xB} = I + xB + \frac{x^2}{2}\, B^2$$

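Problem 3.8. Calculate the example replacing b by $f = (1, 0, 1)^T$ and the matrix B by

$$F = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} = S_1 + S_3$$

Confirm that you get back the right-hand side of formula (3.8) in Theorem 3.1, the special case of a rotation about a light ray. Hence

$$N = e^{\alpha F}$$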
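The calculation asked for in Problem 3.8 can be cross-checked by machine. The sketch below builds the three-term exponential series for F and compares it with the matrix of formula (3.8); the helper names and the sample value of `alpha` are illustrative.

```python
F = [[0.0, -1.0, 0.0], [1.0, 0.0, -1.0], [0.0, -1.0, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

F2 = matmul(F, F)
F3 = matmul(F2, F)  # vanishes, so the exponential series stops after three terms

def exp_alpha_F(alpha):
    """I + alpha F + (alpha^2 / 2) F^2, the full series because F^3 = 0."""
    return [[(1.0 if i == j else 0.0) + alpha * F[i][j] + alpha ** 2 / 2 * F2[i][j]
             for j in range(3)] for i in range(3)]

def normal_form(alpha):
    """Right-hand side of formula (3.8)."""
    h = alpha * alpha / 2
    return [[1 - h, -alpha, h], [alpha, 1.0, -alpha], [-h, -alpha, 1 + h]]

alpha = 0.37  # hypothetical sample value
E, N = exp_alpha_F(alpha), normal_form(alpha)
series_defect = max(abs(E[i][j] - N[i][j]) for i in range(3) for j in range(3))
nilpotent_defect = max(abs(F3[i][j]) for i in range(3) for j in range(3))

print(series_defect, nilpotent_defect)
```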

Proof of Theorem 3.1 for case (c). Let A be any proper Lorentz transformation in the case (c) of Theorem 3.1, and let b be its light-like eigenvector. Let $f = (1, 0, 1)^T$ be the eigenvector for the normal form N in formula (3.8). There exists a proper Lorentz transformation R such that

b = R f

The matrices N and $R^{-1}AR$ have the eigenvector f in common. Are the matrices N and $R^{-1}AR$ equal? No, one cannot expect such a coincidence, because of the free parameter α appearing in formula (3.8). But there is a similarity transformation to adjust this parameter. Let

$$L(\lambda) = \begin{pmatrix} \cosh\lambda & 0 & \sinh\lambda \\ 0 & 1 & 0 \\ \sinh\lambda & 0 & \cosh\lambda \end{pmatrix}$$

With a bit of calculation one sees that

$$L(\lambda)^{-1} F L(\lambda) = e^{-\lambda} F$$

Hence the exponential series implies

$$L(\lambda)^{-1} e^{\alpha F} L(\lambda) = e^{\alpha e^{-\lambda} F}$$

Thus we get a transformation $\alpha \mapsto \alpha e^{-\lambda}$, which rescales α by an arbitrary positive factor.

We choose λ such that the matrices $L(\lambda)^{-1} N L(\lambda)$ and $R^{-1}AR$ have equal restrictions to the invariant subspace f⊥. After all the work, these two matrices have the following common properties:
• Both are proper Lorentz transformations.
• Both have the eigenvector f with eigenvalue 1.
• Both have equal restrictions to the invariant subspace f⊥.
• Both have the same characteristic polynomial.
• Both have the same Jordan normal form with triple eigenvalue 1, and only a one-dimensional eigenspace spanned by f.

Question. Give a detailed reasoning with linear algebra to prove that any two matrices with all the common properties mentioned above are equal.

Question. Given are two automorphic collineations, and of both one knows
• They preserve the orientation.
• They have the same fixpoint on the line of infinity.
• Neither one of them has a second fixpoint.
Give a simple geometric argument why these two automorphic collineations are equal.

In the end, with the appropriate choice of λ, we have confirmed that $R^{-1}AR = L(\lambda)^{-1} N L(\lambda)$. Hence $S^{-1}AS = N$ holds with the proper Lorentz transformation $S = R\, L(\lambda)^{-1}$.
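The similarity transformation with the boost L(λ) used in the proof above can be checked numerically. The following sketch conjugates F in both orders and reads off the scalar factor; the helper names and the value of `lam` are illustrative. For L(λ) exactly as written, the factor of L(λ)⁻¹ F L(λ) comes out as cosh λ − sinh λ = e^{−λ}, while the opposite order L(λ) F L(λ)⁻¹ gives e^{λ}. Either way the parameter α is rescaled by an arbitrary positive factor and keeps its sign.

```python
import math

F = [[0.0, -1.0, 0.0], [1.0, 0.0, -1.0], [0.0, -1.0, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def boost(lam):
    """The matrix L(lam); its inverse is boost(-lam)."""
    c, s = math.cosh(lam), math.sinh(lam)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [s, 0.0, c]]

def scaled(k, A):
    return [[k * entry for entry in row] for row in A]

def max_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

lam = 0.6  # hypothetical sample value
L, L_inv = boost(lam), boost(-lam)
left = matmul(matmul(L_inv, F), L)   # L^{-1} F L
right = matmul(matmul(L, F), L_inv)  # L F L^{-1}

print(left[0][1] / F[0][1], math.exp(-lam))
print(right[0][1] / F[0][1], math.exp(lam))
```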

Problem 3.9. Convince (at least) yourself that the rotation angle α mod 2π, the Lobachevskij parameter λ in Theorem 3.1, and the shift α along a horocycle are unique, once you require that the transformation S is a proper Lorentz transformation.

4. The Poincaré Half-Plane Model

In the first subsection, we construct the half-plane model via an isometric mapping of the disk to the half-plane. We obtain the hyperbolic lines and distances as expected. Both the distance of two points and the Riemann metric of Poincaré's half-plane are calculated via the isometry.

The next section explains the Euler-Lagrange equation from the calculus of variations. In the following sections, we reconstruct all features of Poincaré's half-plane model, taking the Riemann metric as starting point.

At first, the curve of minimal distance between any two given points is calculated from the Euler-Lagrange equation. It turns out to be a circular arc with center on the boundary line of the half-plane or, in a special exceptional case, a Euclidean line perpendicular to the boundary line. These minimizing lines specify the hyperbolic line connecting the two given points. The minimum of the hyperbolic length of any connecting curve determines the hyperbolic distance between these two points. Using the Riemann metric, the length of the hyperbolic segment is obtained by integration. The hyperbolic distance turns out to be the logarithm of the cross ratio of the two endpoints of the given segment and the ideal endpoints of the hyperbolic line through these two points.

4.1. Poincaré half-plane and Poincaré disk

The open unit disk is denoted by D = {(x, y) : x² + y² < 1}, and its boundary is ∂D = {(x, y) : x² + y² = 1}. We denote the upper open half-plane by H = {(u, v) : v > 0}. Its boundary is just the real axis ∂H = {(u, v) : v = 0}. We shall construct the half-plane model via an isometric mapping of the disk to the half-plane. It is convenient to use the complex variables z = x + iy and w = u + iv. We use the notation $\bar{w} = u - iv$ for the complex conjugate.

Proposition 4.1 (Isometric mapping of the half-plane to the disk). The linear fractional function

$$w = i\, \frac{1 - z}{1 + z} \tag{4.1}$$

is a conformal mapping, and a bijection from ℂ ∪ {∞} to ℂ ∪ {∞}. The inverse mapping is

$$z = \frac{i - w}{i + w} \tag{4.2}$$

These bijective mappings preserve angles, the cross ratio, and the orientation, and they map generalized circles to generalized circles.

The unit disk D = {z = x + iy : x² + y² < 1} is mapped bijectively to the upper half-plane H = {w = u + iv : v > 0}. In particular, one easily checks that

z = −1 ↦ w = ∞ , z = i ↦ w = 1 , z = 1 ↦ w = 0 , z = 0 ↦ w = i

Problem 4.1.

(a) We check whether the mapping (4.1) indeed maps the boundaries ∂D ↦ ∂H and find the restriction of the mapping to the boundary. Confirm that a point $z = e^{i\theta}$ is mapped to $w = \tan\frac{\theta}{2}$. Do not separate real and imaginary parts.

(b) Only now separate real and imaginary parts. Use the mapping (4.1) to confirm the identities

$$\tan\frac{\theta}{2} = \frac{\sin\theta}{1 + \cos\theta} = \frac{1 - \cos\theta}{\sin\theta}$$

(c) Use the inverse mapping (4.2) to confirm the identities

$$\cos\theta = \frac{1 - \tan^2\frac{\theta}{2}}{1 + \tan^2\frac{\theta}{2}} \quad \text{and} \quad \sin\theta = \frac{2\tan\frac{\theta}{2}}{1 + \tan^2\frac{\theta}{2}}$$

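Solution of part (a). We plug $z = e^{i\theta}$ into formula (4.1) and get

$$w = i\, \frac{1 - z}{1 + z} = i\, \frac{1 - e^{i\theta}}{1 + e^{i\theta}} = i\, \frac{e^{-i\theta/2} - e^{i\theta/2}}{e^{-i\theta/2} + e^{i\theta/2}} = \frac{2\sin(\theta/2)}{2\cos(\theta/2)} = \tan\frac{\theta}{2}$$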
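The boundary behaviour of the mapping (4.1) can also be checked with complex floating-point arithmetic. The sketch below is illustrative only; the function names `to_half_plane` and `to_disk` and the sample angles are assumptions made for this example.

```python
import cmath
import math

def to_half_plane(z):
    """Mapping (4.1): w = i (1 - z) / (1 + z)."""
    return 1j * (1 - z) / (1 + z)

def to_disk(w):
    """Inverse mapping (4.2): z = (i - w) / (i + w)."""
    return (1j - w) / (1j + w)

angles = [0.3, 1.0, 2.0, -2.5]  # sample angles, avoiding theta = pi
for theta in angles:
    w = to_half_plane(cmath.exp(1j * theta))
    # The image is real (on the boundary of H) and equals tan(theta / 2).
    print(theta, w, math.tan(theta / 2))

z_inside = 0.2 - 0.5j
round_trip = to_disk(to_half_plane(z_inside))
print(to_half_plane(z_inside).imag > 0, abs(round_trip - z_inside))
```

Interior points of the disk land in the upper half-plane, and the two mappings undo each other.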

The Poincaré half-plane model of hyperbolic geometry is constructed from the disk model via this isometry. One translates the definitions from the section on the Poincaré disk model to the half-plane model and arrives at the following conventions:

The points of H are the "points" for Poincaré's model. The points of ∂H are called "ideal points" or "endpoints". Those are not points for the hyperbolic geometry. The "lines" for Poincaré's model are circular arcs or, in a special case, Euclidean lines perpendicular to ∂H. The "angles" for Poincaré's model are the usual Euclidean angles between tangents to the circular arcs.

Remark. The inversion image of any point P = (u, v) obtained by reflection across the real axis is P′ = (u, −v). In complex notation, reflection by the real axis is complex conjugation: the point P = w = u + iv is reflected to $P' = \bar{w} = u - iv$.

Remark. The one-point compactification ℂ* = ℂ ∪ {∞} of the complex plane is well-known and useful in complex analysis. In particular, it is possible to define regularity and power series of functions in a neighborhood of ∞. Only a linear fractional mapping, as for example mapping (4.1) and its inverse, is naturally extended to a bijective continuous mapping ¹

ℂ ∪ {∞} ↦ ℂ ∪ {∞}

Hence the half-plane of Poincaré's model gets just one point ∞. This point can occur as ideal end of a line, circle, equidistance line or horocycle.

Especially for the half-plane, this state of affairs is a bit contrary to the common imagination. Indeed, we have a totally different definition and usage of improper points in the projective completion from projective geometry.

Given are any two points A and B. Let P and Q be the ideal endpoints of the hyperbolic line through A and B. These points are named in a way that A, B, P, Q occur in this order during an entire turn around the circle.

For the definition of a hyperbolic "distance" and of "congruence of segments", the Definition ?? and the preservation of the cross ratio are used as starting points. Thus we arrive at the following

Definition 4.1 (Distance and Congruence). The hyperbolic distance of points A and B is given by

$$s(A, B) := \ln(AB, PQ) = \ln \frac{|AP| \cdot |BQ|}{|AQ| \cdot |BP|} \tag{4.3}$$

Two segments AB and XY are called "congruent" iff s(A, B) = s(X, Y).

¹ They are indeed the only analytic mappings with such an extension.

Since the mapping (4.2) provides an isometry between the half-plane and the disk model, we can directly calculate the Riemann metric for Poincaré's half-plane:

Proposition 4.2 (Riemann Metric for Poincaré's half-plane). In the Poincaré half-plane, the infinitesimal hyperbolic distance ds of points with coordinates (u, v) and (u + du, v + dv) is

$$(ds_H)^2 = \frac{du^2 + dv^2}{v^2} \tag{4.4}$$

Proof. The metric is determined by the requirement that

$$z = \frac{i - w}{i + w} \tag{4.2}$$

provides an isometry from the half-plane to the disk:

$$ds_D = ds_H \tag{4.5}$$

Hence we need to convert the known metric

$$ds^2 = 4\, \frac{dx^2 + dy^2}{(1 - x^2 - y^2)^2} \tag{??}$$

of the Poincaré disk model to a metric in the half-plane. We calculate the denominator

$$1 - |z|^2 = \frac{|i + w|^2 - |i - w|^2}{|i + w|^2} = \frac{(w + i)(\bar{w} - i) - (i - w)(-i - \bar{w})}{|w + i|^2} = \frac{4v}{|w + i|^2}$$

and the derivative of the mapping (4.2):

$$\frac{dz}{dw} = \frac{-2i}{(w + i)^2}$$

Putting these two results into formula (??) yields

$$\begin{aligned}
ds^2 &= \frac{4\,(dx^2 + dy^2)}{(1 - x^2 - y^2)^2} = \frac{4\, |dz|^2}{(1 - |z|^2)^2} \\
&= 4 \left| \frac{dz}{dw} \right|^2 |dw|^2 \left( \frac{|w + i|^2}{4v} \right)^2 \\
&= 4 \left| \frac{-2i}{(w + i)^2} \right|^2 |dw|^2 \left( \frac{|w + i|^2}{4v} \right)^2 \\
&= \frac{|dw|^2}{v^2} = \frac{du^2 + dv^2}{v^2}
\end{aligned}$$

Thus formula (4.4) arises from the isometry (4.5) of the half-plane and the disk.
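The isometry can be tested numerically without using the closed formula for the derivative. In the sketch below, the derivative of the mapping (4.2) is approximated by a central difference, and the pulled-back disk metric is compared with 1/v. The function names, the step size and the sample points are illustrative assumptions.

```python
def to_disk(w):
    """Mapping (4.2): z = (i - w) / (i + w)."""
    return (1j - w) / (1j + w)

def disk_scale(z):
    """Disk metric: ds = disk_scale(z) * |dz| = 2 |dz| / (1 - |z|^2)."""
    return 2 / (1 - abs(z) ** 2)

def half_plane_scale(w):
    """Half-plane metric (4.4): ds = |dw| / v with v = Im w."""
    return 1 / w.imag

def pulled_back_scale(w, step=1e-6):
    """Disk metric transported to the half-plane through the mapping (4.2)."""
    dz_dw = (to_disk(w + step) - to_disk(w - step)) / (2 * step)
    return disk_scale(to_disk(w)) * abs(dz_dw)

samples = [0.3 + 0.5j, -2.0 + 1.5j, 1.0 + 0.1j]  # points with v > 0
for w in samples:
    print(w, pulled_back_scale(w), half_plane_scale(w))
```

The two printed scale factors agree up to the discretization error of the difference quotient.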

4.2. The Euler-Lagrange equation

The basic problem of the calculus of variations is to determine the curve y = y(x) between two given points (a, y(a)) and (b, y(b)) for which the prescribed functional

$$L[y] := \int_a^b F(x, y, y')\, dx$$

assumes an extremum (minimum or maximum), or simply becomes stationary. ¹ It turns out that the stationary curves for the functional L[y] satisfy the Euler-Lagrange equation

$$\frac{d}{dx} \frac{\partial F}{\partial y'} - \frac{\partial F}{\partial y} = 0$$

¹ In physical applications, the functional is obtained from first principles of physics.

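To derive the Euler-Lagrange equation, we take a pencil of connecting curves y = y(x, p) depending smoothly on a parameter p, and differentiate the functional L[y(·, p)] by the parameter p. It is customary to denote the derivative of any quantity by p with the symbol δ and call it the variation of this quantity. One obtains

$$\begin{aligned}
\delta L[y] = \frac{d}{dp} L[y] &= \int_a^b \frac{d\, F(x, y, y')}{dp}\, dx \\
&= \int_a^b \left[ \frac{\partial F(x, y, y')}{\partial y} \cdot \frac{\partial y}{\partial p} + \frac{\partial F(x, y, y')}{\partial y'} \cdot \frac{\partial y'}{\partial p} \right] dx \\
&= \left[ \frac{\partial F(x, y, y')}{\partial y'} \cdot \frac{\partial y}{\partial p} \right]_a^b + \int_a^b \left[ \frac{\partial F(x, y, y')}{\partial y} \cdot \frac{\partial y}{\partial p} - \frac{d}{dx} \left( \frac{\partial F(x, y, y')}{\partial y'} \right) \cdot \frac{\partial y}{\partial p} \right] dx
\end{aligned}$$

The boundary terms vanish for a problem with prescribed endpoints (a, y(a)) and (b, y(b)) of the curve. Hence we obtain

$$\delta L[y] = \int_a^b \left[ \frac{\partial F}{\partial y} - \frac{d}{dx} \left( \frac{\partial F}{\partial y'} \right) \right] \delta y\; dx$$

Since the variation δy of the curve can be chosen to be any smooth function of x, the Lemma of the calculus of variations shows that the expression in the bracket has to vanish identically. Thus we obtain the Euler-Lagrange differential equation.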

Lemma 4.1 (Lemma of the calculus of variations). Let g(x) be a piecewise continuous function and suppose that

$$\int_0^1 \eta(x)\, g(x)\, dx = 0$$

for all functions η ∈ C∞. Then the function g is identically zero.

Proof. We show that for a continuous function g ≠ 0 the assertion

$$\int_0^1 \eta(x)\, g(x)\, dx = 0$$

does not hold for all functions η ∈ C∞.

Assume g ≠ 0 and g continuous. Hence there exists 0 < a < 1 such that g(a) > 2ε > 0. If g has no positive values, apply the following reasoning to the negative function −g instead.

Because of the continuity of g there exists δ > 0 such that |x − a| < 2δ implies |g(x) − g(a)| < ε and hence g(x) > g(a) − ε > ε.

There exists a continuous, and even C∞, function η ≥ 0 such that η(x) = 0 for |x − a| > 2δ and η(x) = 1 for |x − a| < δ. Hence

$$\int_0^1 \eta(x)\, g(x)\, dx = \int_{a - 2\delta}^{a + 2\delta} \eta(x)\, g(x)\, dx \ge \int_{a - \delta}^{a + \delta} \eta(x)\, g(x)\, dx \ge 2\delta\varepsilon > 0$$

Hence the assumption that

$$\int_0^1 \eta(x)\, g(x)\, dx = 0$$

for all functions η ∈ C∞ does not hold. By contraposition, a continuous function g for which this integral vanishes for all functions η ∈ C∞ satisfies g(x) = 0 for all x.

4.3. The curve of minimal hyperbolic length

We want to find the curve of minimal hyperbolic length connecting two given points. In this problem, it turns out to be more convenient to use the right half-plane {(x, y) : x > 0} as model of hyperbolic geometry. The corresponding Riemann metric is

$$ds = \frac{\sqrt{dx^2 + dy^2}}{x} \tag{4.6}$$

The hyperbolic length of any curve y = y(x) between two given points (a, y(a)) and (b, y(b)) is given by the functional

$$L[y] := \int_a^b \frac{\sqrt{dx^2 + dy^2}}{x}$$

Choosing the variable x as independent, we get

$$\int_a^b \frac{\sqrt{dx^2 + dy^2}}{x} = \int_a^b \frac{\sqrt{1 + y'^2}}{x}\, dx$$

In the variational problem occurs the function

$$F(x, y, y') = \frac{\sqrt{1 + y'^2}}{x}$$

The Euler-Lagrange equation becomes particularly simple. Since the variable y does not occur in the function F, we can immediately perform one integration and get

$$\begin{aligned}
\frac{d}{dx} \frac{\partial}{\partial y'} \frac{\sqrt{1 + y'^2}}{x} &= 0 \\
\frac{\partial}{\partial y'} \frac{\sqrt{1 + y'^2}}{x} &= c \\
\frac{y'}{x \sqrt{1 + y'^2}} &= c \\
y'^2 &= (1 + y'^2)\, c^2 x^2 \\
y'^2 (1 - c^2 x^2) &= c^2 x^2 \\
y' &= \frac{cx}{\sqrt{1 - c^2 x^2}}
\end{aligned}$$

Here c denotes a constant independent of x. Of course the value of c can still depend on the coordinates of the endpoints. The last line is a first order differential equation. If c = 0, we get the solution y = const. The minimizing curve is a Euclidean line perpendicular to the boundary. If c ≠ 0, we do a further integration and obtain

$$y = y_0 + \int \frac{cx\, dx}{\sqrt{1 - c^2 x^2}}$$

We substitute $v = 1 - c^2 x^2$ and $dv = -2c^2 x\, dx$ to obtain

$$y = y_0 - \frac{1}{2c} \int \frac{dv}{\sqrt{v}} = y_0 - \frac{\sqrt{v}}{c} = y_0 \mp \sqrt{c^{-2} - x^2}$$

This is the equation of a circular arc with center $(0, y_0)$ and radius $|c|^{-1}$.
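As a plausibility check of the integration above, one can verify numerically that the circular arc makes the first integral y′/(x√(1 + y′²)) constant and equal to c. This is a sketch with illustrative names and sample values; it takes c > 0 and the lower sign of the square root, and approximates y′ by a central difference.

```python
import math

def arc(x, c, y0):
    """Circular arc y = y0 - sqrt(c^-2 - x^2), for c > 0 and 0 < x < 1/c."""
    return y0 - math.sqrt(c ** -2 - x * x)

def first_integral(x, c, y0, step=1e-6):
    """y' / (x sqrt(1 + y'^2)) with y' from a central difference quotient."""
    dy = (arc(x + step, c, y0) - arc(x - step, c, y0)) / (2 * step)
    return dy / (x * math.sqrt(1 + dy * dy))

c, y0 = 0.5, 1.0  # hypothetical constants: radius 1/c = 2, center (0, 1)
for x in (0.3, 1.0, 1.7):
    print(x, first_integral(x, c, y0))
```

Each printed value should be close to the chosen constant c = 0.5, independently of x.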


4.4. The minimum of hyperbolic length

I now go back to the more commonly used upper half-plane. For the convenience of the reader, I use variables x and y. The upper half-plane is {(x, y) : y > 0} and has the metric

$$ds = \frac{\sqrt{dx^2 + dy^2}}{y} \tag{4.7}$$

The minimum of the hyperbolic length of any connecting curve determines the hyperbolic distance between two points. Given are two points A with Euclidean coordinates $(x_A, y_A)$ and B with coordinates $(x_B, y_B)$. In the case $x_A \ne x_B$, the minimizing curve of connection is a circular arc with center on the x-axis. ¹ The equation of such an arc is

$$y = +\sqrt{r^2 - (x - x_0)^2} \tag{4.8}$$

The radius r > 0 and the center $(x_0, 0)$ are to be determined from the coordinates of the two points A and B.

Problem 4.2. We can check directly that the function (4.8) is a solution of the Euler-Lagrange equation for the function

$$F(x, y, y') = \frac{\sqrt{1 + y'^2}}{y}$$

(a) Confirm that

$$\frac{d}{dx} \frac{\partial F}{\partial y'} - \frac{\partial F}{\partial y} = \frac{1 + y'^2 + y'' y}{y^2 (1 + y'^2) \sqrt{1 + y'^2}}$$

(b) Check that the derivatives of function (4.8) for the upper half-circle satisfy $y'^2 + y y'' + 1 = 0$.

The hyperbolic length of the arc is

$$s(A, B) = \int_{x_A}^{x_B} \frac{\sqrt{dx^2 + dy^2}}{y} = \int_{x_A}^{x_B} \frac{\sqrt{1 + y'^2}}{y}\, dx$$

We need to differentiate the square root function (4.8) occurring inside the integral and get

$$y' = -\frac{x - x_0}{\sqrt{r^2 - (x - x_0)^2}}$$

$$1 + y'^2 = \frac{r^2}{r^2 - (x - x_0)^2} = \frac{r^2}{y^2}$$

We plug this into the distance functional and obtain

$$s(A, B) = \int_{x_A}^{x_B} \frac{\sqrt{1 + y'^2}}{y}\, dx = \int_{x_A}^{x_B} \frac{r\, dx}{y^2} = \int_{x_A}^{x_B} \frac{r\, dx}{r^2 - (x - x_0)^2}$$

¹ I leave the special case $x_A = x_B$ as an exercise.

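This integral can be calculated by means of the partial fraction decomposition

$$\frac{r}{r^2 - (x - x_0)^2} = \frac{1}{2(r - x + x_0)} + \frac{1}{2(r + x - x_0)}$$

$$s(A, B) = \int_{x_A}^{x_B} \frac{dx}{2(r + x - x_0)} + \int_{x_A}^{x_B} \frac{dx}{2(r - x + x_0)} = \left[ \frac{1}{2} \ln(r + x - x_0) - \frac{1}{2} \ln(r - x + x_0) \right]_{x_A}^{x_B}$$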

It remains to check that this result agrees with formula (4.3). We calculate the logarithm of the cross ratio of the two endpoints A and B of the given segment, and the ideal endpoints P and Q of the hyperbolic line through these two points. We assume $x_A < x_B$. The points A, B, P, Q occur during a clockwise turn around the circle. The Euclidean coordinates of the endpoints are $(x_P, y_P) = (x_0 + r, 0)$ and $(x_Q, y_Q) = (x_0 - r, 0)$. Hence the cross ratio and its logarithm are

$$(AB, PQ) = \frac{|AP| \cdot |BQ|}{|AQ| \cdot |BP|}$$

$$\begin{aligned}
\ln(AB, PQ) &= \frac{1}{2} \ln \frac{|AP|^2}{|AQ|^2} - \frac{1}{2} \ln \frac{|BP|^2}{|BQ|^2} \\
&= \frac{1}{2} \left[ \ln \frac{(r - x + x_0)^2 + y^2}{(r + x - x_0)^2 + y^2} \right]_{x_B}^{x_A} \\
&= \frac{1}{2} \left[ \ln \frac{-2(x - x_0) r + 2r^2}{2(x - x_0) r + 2r^2} \right]_{x_B}^{x_A} \\
&= \frac{1}{2} \left[ \ln \frac{-x + x_0 + r}{x - x_0 + r} \right]_{x_B}^{x_A}
\end{aligned}$$

in agreement with the result (??). In the special case $x_A = x_B$, the minimizing curve is a Euclidean line perpendicular to the x-axis. We leave the calculation of the distance in the special case as an exercise.
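The agreement between the length integral and the cross-ratio formula (4.3) can be illustrated with numbers. The sketch below integrates r/(r² − (x − x₀)²) with Simpson's rule and compares the result with the logarithm of the cross ratio computed from Euclidean distances. The function names and the sample circle (x₀ = 1, r = 2) are assumptions made only for this example.

```python
import math

def arc_y(x, x0, r):
    """Upper half-circle (4.8)."""
    return math.sqrt(r * r - (x - x0) ** 2)

def length_by_integration(xa, xb, x0, r, n=2000):
    """Simpson's rule for the integral of r / (r^2 - (x - x0)^2); n must be even."""
    f = lambda x: r / (r * r - (x - x0) ** 2)
    h = (xb - xa) / n
    total = f(xa) + f(xb)
    total += 4 * sum(f(xa + k * h) for k in range(1, n, 2))
    total += 2 * sum(f(xa + k * h) for k in range(2, n, 2))
    return total * h / 3

def length_by_cross_ratio(xa, xb, x0, r):
    """ln of |AP| |BQ| / (|AQ| |BP|) with ideal endpoints P, Q on the x-axis."""
    A, B = (xa, arc_y(xa, x0, r)), (xb, arc_y(xb, x0, r))
    P, Q = (x0 + r, 0.0), (x0 - r, 0.0)
    dist = lambda U, V: math.hypot(U[0] - V[0], U[1] - V[1])
    return math.log(dist(A, P) * dist(B, Q) / (dist(A, Q) * dist(B, P)))

x0, r, xa, xb = 1.0, 2.0, 0.0, 2.5  # hypothetical circle and abscissas, xa < xb
print(length_by_integration(xa, xb, x0, r), length_by_cross_ratio(xa, xb, x0, r))
```

For these sample values both numbers should equal ½ ln 21, since the squared cross ratio works out to 12 · 14 / (4 · 2) = 21.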

4.5. Some useful reflections in the half-plane

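Problem 4.3. Check how the mapping

$$w = i\, \frac{1 - z}{1 + z} \tag{4.1}$$

maps the boundaries ∂D ↦ ∂H. Confirm that a point $z = e^{i\theta}$ is mapped to $w = \tan\frac{\theta}{2}$.

Use the inverse mapping

$$z = \frac{i - w}{i + w} \tag{4.2}$$

to confirm the identities

$$\cos\theta = \frac{1 - \tan^2\frac{\theta}{2}}{1 + \tan^2\frac{\theta}{2}} \quad \text{and} \quad \sin\theta = \frac{2\tan\frac{\theta}{2}}{1 + \tan^2\frac{\theta}{2}}$$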

Problem 4.4. Let $S_\alpha$ denote the reflection across the line with ends α ∈ R and ∞. Confirm that

$$S_\alpha(w) = 2\alpha - \bar{w} \tag{4.9}$$

for any α ∈ R. Use this result for an easy check that

$$S_{\alpha + \beta} = S_\alpha S_0 S_\beta \tag{4.10}$$

holds for any α, β ∈ R.

Problem 4.5. Let $M_\gamma$ denote the reflection across the line with ends γ > 0 and −γ. Confirm that

$$M_\gamma(w) = \frac{\gamma^2 w}{|w|^2} \tag{4.11}$$

for any γ > 0. Use this result for an easy check of

$$M_{\gamma\delta} = M_\gamma M_1 M_\delta \tag{4.12}$$

Problem 4.6. Confirm that $S_0 M_\gamma = M_\gamma S_0$ and

$$M_\gamma M_1 S_\alpha M_1 M_\gamma = S_{\alpha\gamma^2} \tag{4.13}$$

for any α, γ ∈ R.
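The composition rules of Problems 4.4 to 4.6 lend themselves to a quick numerical test with complex numbers. The sketch below assumes that the reflection across the vertical line with end α is w ↦ 2α − w̄ and that the reflection across the half-circle with ends ±γ is the inversion w ↦ γ²w/|w|²; the names `S`, `M`, `compose` and the sample values are illustrative.

```python
def S(alpha):
    """Reflection across the line with ends alpha and infinity."""
    return lambda w: 2 * alpha - w.conjugate()

def M(gamma):
    """Reflection across the line with ends gamma and -gamma (circle inversion)."""
    return lambda w: gamma ** 2 * w / abs(w) ** 2

def compose(*maps):
    """compose(f, g, h)(w) = f(g(h(w))), the rightmost map is applied first."""
    def composed(w):
        for f in reversed(maps):
            w = f(w)
        return w
    return composed

w = 0.7 + 1.9j                        # sample point in the upper half-plane
alpha, beta, gamma, delta = 0.8, -1.5, 1.3, 0.6

err_410 = abs(S(alpha + beta)(w) - compose(S(alpha), S(0), S(beta))(w))
err_412 = abs(M(gamma * delta)(w) - compose(M(gamma), M(1), M(delta))(w))
err_commute = abs(compose(S(0), M(gamma))(w) - compose(M(gamma), S(0))(w))
err_413 = abs(compose(M(gamma), M(1), S(alpha), M(1), M(gamma))(w)
              - S(alpha * gamma ** 2)(w))

print(err_410, err_412, err_commute, err_413)
```

All four errors should be at rounding level. Note that the composite M_γ M_1 is the dilation w ↦ γ²w, which explains formula (4.13).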


5. Equation of motion

The motion of a particle or photon in a gravitational field can in principle be determined by any one of the following three principles:

(A) "The trajectory is even on itself".

(B) "The geodesic is an extremal for the proper time or distance."

(C) The quadratic Lagrangian integral is stationary.

5.1. Affine geodesic

Alternative (A) leads to the equation of parallel transport to be postulated for the tangent of the trajectory. With any parameter u, consider the trajectory $u \mapsto x^a(u)$ with tangent $t^a = \frac{dx^a}{du}$. Using the definition 1.12 of the intrinsic derivative of a vector along a curve, postulate (A) gives the equation of motion

$$\frac{Dt}{du} = \lambda(u)\, t \tag{5.1}$$

The arbitrary function λ(u) depends on the choice of the parameter. Using equation (1.26), we may write out (5.1) in coordinates:

$$\frac{d^2 x^a}{du^2} + \Gamma^a_{bc}\, \frac{dx^b}{du} \frac{dx^c}{du} = \lambda(u)\, \frac{dx^a}{du} \tag{5.2}$$

Definition 5.1. The solutions of the equation of motion (5.2) with $\left[ \frac{dx^a}{du} \right] \ne 0$ everywhere define the affine geodesics. A parameter with λ(u) ≡ 0 is called an affine parameter.

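Lemma 5.1. There exist affine parameters. Any two affine parameters v and v′ are related by v′ = Av + B with any constants A ≠ 0 and B. One may find the bijective substitution (reparametrization) v = v(u) for which λ(v) = 0 holds along the entire curve as a solution of the linear differential equation

$$\frac{d^2 v}{du^2} = \lambda(u)\, \frac{dv}{du} \tag{5.3}$$

Proof. Putting v = v(u) into the equation (5.2) for the affine geodesic, one obtains

$$\begin{aligned}
0 &= \frac{d^2 x^a}{du^2} + \Gamma^a_{bc}\, \frac{dx^b}{du} \frac{dx^c}{du} - \lambda(u)\, \frac{dx^a}{du} \\
&= \left[ \frac{d^2 x^a}{dv^2} + \Gamma^a_{bc}\, \frac{dx^b}{dv} \frac{dx^c}{dv} \right] \left( \frac{dv}{du} \right)^2 + \frac{dx^a}{dv} \left[ \frac{d^2 v}{du^2} - \lambda(u)\, \frac{dv}{du} \right]
\end{aligned}$$

A nonconstant solution of the linear differential equation (5.3) exists and has $\frac{dv}{du} \ne 0$ everywhere. Hence one obtains

$$\frac{d^2 x^a}{dv^2} + \Gamma^a_{bc}\, \frac{dx^b}{dv} \frac{dx^c}{dv} = 0 \quad \text{and} \quad \left[ \frac{dx^a}{dv} \right] \ne 0$$

as expected.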

Lemma 5.2. For an affine parameter, the affine equation of motion (5.2) has the following equivalent forms

$$\ddot{x}^a + \Gamma^a_{bc}\, \dot{x}^b \dot{x}^c = 0 \tag{5.4}$$

$$\frac{d\, (g_{ab}\, \dot{x}^b)}{du} - \Gamma^c_{ab}\, \dot{x}_c\, \dot{x}^b = 0 \tag{5.5}$$

where the Christoffel symbol is defined by equation (1.41). In the second form, we use the covariant velocity (or momentum) $\dot{x}_c = g_{cd}\, \dot{x}^d$.

5.2. Metric geodesic Following alternative (B), the geodesic (more precisely, the metric geodesic) is defined to be an extremal for the proper time or distance. The Euler-Lagrange equation for this variational problem is the geodesic equation. Let $x^a = x^a(u)$ be a parametric equation for the curve using any, not necessarily affine, parameter $u$. For fixed endpoints $P$ and $Q$, we have to solve the variational problem

\[ \delta \int_P^Q ds = 0 \quad\text{or}\quad \delta \int_P^Q d\tau = 0 \tag{5.6} \]

with the line element $ds = \sqrt{L}\,du$ or $c\,d\tau = \sqrt{L}\,du$ and the Lagrangian

\[ L := g_{ab}\,\dot x^a \dot x^b \tag{5.7} \]

Here the derivative by any arbitrary parameter $u$ is denoted by a dot. Without any further assumption, the quantity

\[ \dot s = \frac{ds}{du} = \sqrt{L} \]

may depend on the parameter $u$. We see immediately that only trajectories may be considered for which the tangent vector is either everywhere time-like (material particles), or everywhere space-like. In the second case one needs to put $\dot s = \sqrt{-L}$. Only the first alternative will be pursued here.

The Euler-Lagrange equation for the variational problem (5.6) is

\[ \frac{d}{du}\,\frac{\partial \sqrt{L}}{\partial \dot x^a} - \frac{\partial \sqrt{L}}{\partial x^a} = 0 \tag{5.8} \]

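Lemma 5.3. Even without any further assumption on the parameter $u$, the Euler-Lagrange equation (5.8) has the following equivalent forms

\[ \frac{d}{du}\left(\frac{g_{ab}\dot x^b}{\sqrt{L}}\right) - \frac{(\partial_a g_{bc})\,\dot x^b \dot x^c}{2\sqrt{L}} = 0 \tag{5.9} \]

\[ \frac{d\,(g_{ab}\dot x^b)}{du} - \frac12 (\partial_a g_{bc})\,\dot x^b \dot x^c = \frac{\ddot s}{\dot s}\, g_{ab}\dot x^b \tag{5.10} \]

\[ \ddot x^a + \left\{ {a \atop b\,c} \right\} \dot x^b \dot x^c = \frac{\ddot s}{\dot s}\,\dot x^a \tag{5.11} \]

where the Christoffel symbol is defined by equation (1.41).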

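Checking the calculations.

\[
\begin{aligned}
&\frac{d}{du}\left(\frac{g_{ab}\dot x^b}{\sqrt{L}}\right) - \frac{(\partial_a g_{bc})\,\dot x^b \dot x^c}{2\sqrt{L}} = 0 \\
&\frac{\frac{d}{du}(g_{ab}\dot x^b)\,\dot s - (g_{ab}\dot x^b)\,\ddot s}{\dot s^2} - \frac{(\partial_a g_{bc})\,\dot x^b \dot x^c}{2\dot s} = 0 && \text{from } \sqrt{L} = \dot s \text{ and the quotient rule;} \\
&\frac{d\,(g_{ab}\dot x^b)}{du} - \frac12 (\partial_a g_{bc})\,\dot x^b \dot x^c = \frac{\ddot s}{\dot s}\, g_{ab}\dot x^b \\
&(\partial_c g_{ab})\,\dot x^c \dot x^b + g_{ab}\ddot x^b - \frac12 (\partial_a g_{bc})\,\dot x^b \dot x^c = \frac{\ddot s}{\dot s}\, g_{ab}\dot x^b && \text{from product and chain rule;} \\
&g_{ab}\ddot x^b + \frac12 (\partial_c g_{ab} + \partial_b g_{ac} - \partial_a g_{bc})\,\dot x^b \dot x^c = \frac{\ddot s}{\dot s}\, g_{ab}\dot x^b && \text{multiplying by } g^{da} \text{ one gets} \\
&\ddot x^d + \left\{ {d \atop b\,c} \right\} \dot x^b \dot x^c = \frac{\ddot s}{\dot s}\,\dot x^d
\end{aligned}
\]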


Let the rest mass be $m > 0$ and let $c$ denote the speed of light. We assume the tangent vector is everywhere time-like along the geodesic. This case occurs for the motion of massive particles. Here the most convenient choice of curve parameter is the proper time $\tau$, which is defined by requiring

\[ L := g_{ab}\,\frac{dx^a}{d\tau}\frac{dx^b}{d\tau} = c^2 \tag{5.12} \]

to hold.

Lemma 5.4. There exist parameters for which $\ddot s \equiv 0$ and hence the Lagrangian is constant along the geodesic. Any two such parameters $v$ and $v'$ are related by $v' = Av + B$ with constants $A \neq 0$ and $B$. One may find the bijective substitution (reparametrization) $\tau = \tau(u)$ by integrating

\[ c\,\frac{d\tau}{du} = \sqrt{L} \tag{5.13} \]

and thus introduce the proper time $\tau$ as curve parameter.

5.3. The quadratic Lagrangian We now consider alternative (C), which turns out to be a very attractive possibility. For fixed endpoints $P$ and $Q$, we have to solve the variational problem

\[ \delta \int_P^Q L\,du = 0 \tag{5.14} \]

for the quadratic Lagrangian $L$ from (5.7). One obtains the Euler-Lagrange equation

\[ \frac{d}{du}\,\frac{\partial L}{\partial \dot x^a} - \frac{\partial L}{\partial x^a} = 0 \tag{5.15} \]

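Lemma 5.5. The Euler-Lagrange equation (5.15) has the following equivalent forms

\[ \frac{d\,(g_{ab}\dot x^b)}{du} - \frac12 (\partial_a g_{bc})\,\dot x^b \dot x^c = 0 \tag{5.16} \]

\[ \ddot x^a + \left\{ {a \atop b\,c} \right\} \dot x^b \dot x^c = 0 \tag{5.17} \]

\[ \frac{d\dot x_a}{du} - \left\{ {c \atop a\,b} \right\} \dot x_c \dot x^b = 0 \tag{5.18} \]

where the Christoffel symbol is defined by equation (1.41). In the third form, we use the covariant velocity (or momentum) $\dot x_c = g_{cd}\dot x^d$.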

Proposition 5.1. The Euler-Lagrange equation (5.15) is equivalent to the affine geodesic equation (5.4) if and only if the torsion tensor satisfies

\[ T^a_{\ bc} + T^a_{\ cb} = 0 \tag{5.19} \]

In particular, for a symmetric connection, the Euler-Lagrange equation (5.15) is equivalent to the affine geodesic equation (5.4).

Proof. From formula (1.44) one concludes

\[ \ddot x^a + \Gamma^a_{bc}\,\dot x^b \dot x^c = \ddot x^a + \left\{ {a \atop b\,c} \right\} \dot x^b \dot x^c + T^a_{\ bc}\,\dot x^b \dot x^c \]


Lemma 5.6. Along the trajectories of equation (5.16), the quadratic Lagrangian

\[ L \equiv g_{ab}\,\dot x^a \dot x^b = \text{Const} \tag{5.20} \]

is a constant of motion. Under the assumption that this constant of motion $L > 0$ is positive, the parameter $u$ is automatically $u = As + B$ or $u = A\tau + B$ with some constants $A \neq 0$ and $B$. Thus we have obtained a solution of equation (5.10) with $\ddot s \equiv 0$, too.

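Proof. We differentiate the quantity $L$ by the parameter $u$, denoting derivatives by $u$ with a dot. The Leibniz product rule yields

\[
\begin{aligned}
\frac{d\,(g_{ab}\dot x^a \dot x^b)}{du} &= \frac{d g_{ab}}{du}\,\dot x^a \dot x^b + g_{ab}\ddot x^a \dot x^b + g_{ab}\dot x^a \ddot x^b \\
&= 2\,\frac{d\,(g_{ab}\dot x^b)}{du}\,\dot x^a - \frac{d g_{ab}}{du}\,\dot x^a \dot x^b = (\partial_a g_{bc})\,\dot x^b \dot x^c \dot x^a - (\partial_c g_{ab})\,\dot x^a \dot x^b \dot x^c = 0
\end{aligned}
\]

and confirms that $L$ is a constant of motion.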

The situation with $L > 0$ always holds for a positive definite metric, as usually assumed in classical differential geometry. Also, in general relativity one gets $L > 0$ for the time-like geodesics, which are the paths of massive particles. In all these cases the parameter is automatically the arc-length or proper time, up to a linear transformation. Indeed, under the assumption that $L > 0$, the definition of arc-length (respectively proper time) gives

\[ \frac{d^2 s}{du^2} = \frac{d\sqrt{L}}{du} = \frac{1}{2\sqrt{L}}\,\frac{dL}{du} = 0 \]

and hence $u = As + B$, or $u = A\tau + B$, respectively.

Theorem 5.1. Each solution of the variational problem

\[ \delta \int_P^Q L\,du = 0 \tag{5.14} \]

with the quadratic Lagrangian $L$ has the Lagrangian as a constant of motion. Under the additional assumption that this constant of motion $L > 0$ is positive, the curve parameter $u$ is automatically proportional to the arc-length or proper time; and the trajectory satisfies the variational problem

\[ \delta \int_P^Q ds = 0 \tag{5.6} \]

for the arc-length or proper time, too.

Conversely, from a solution of the variational problem (5.6), by introducing the arc-length or proper time as parameter, one obtains a solution of the variational problem (5.14) for the quadratic Lagrangian $L$.

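The geodesic equation has the following two equivalent forms, which I nickname the "physical form" and the "Christoffel form":

\[ \frac{d}{d\lambda}\left(g_{ab}\dot x^b\right) - \frac12 (\partial_a g_{bc})\,\dot x^b \dot x^c = 0 \tag{5.21} \]

\[ \frac{d\dot x^a}{d\lambda} + \Gamma^a_{bc}\,\dot x^b \dot x^c = 0 \tag{5.22} \]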


Problem 5.1. As a first benefit, one can use the equivalence of the two forms (5.21) and (5.22) for the geodesic equation in order to calculate the connection coefficients $\Gamma^a_{bc}$ of a symmetric connection. As an example, we take the spherical coordinates $(r, \theta, \varphi)$ of the Euclidean $\mathbb{R}^3$. They have the metric

\[ ds^2 = dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\varphi^2 \]

Get all non-zero connection coefficients.

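Answer. The $r$-component of the physical form of the geodesic equation yields

\[
\begin{aligned}
&\frac{d}{d\lambda}(g_{rr}\dot r) - \frac12 (\partial_r g_{\theta\theta})\,\dot\theta^2 - \frac12 (\partial_r g_{\varphi\varphi})\,\dot\varphi^2 = 0 \\
&\ddot r - r\,\dot\theta^2 - r \sin^2\theta\,\dot\varphi^2 = 0 \\
&\Gamma^r_{rr} = 0\,, \quad \Gamma^r_{\theta\theta} = -r\,, \quad \Gamma^r_{\varphi\varphi} = -r \sin^2\theta
\end{aligned}
\]

The $\theta$-component of the physical form of the geodesic equation yields

\[
\begin{aligned}
&\frac{d}{d\lambda}\left(g_{\theta\theta}\dot\theta\right) - \frac12 (\partial_\theta g_{\varphi\varphi})\,\dot\varphi^2 = 0 \\
&r^2\ddot\theta + 2r\dot r\dot\theta - r^2 \sin\theta\cos\theta\,\dot\varphi^2 = 0 \\
&\Gamma^\theta_{\theta r} + \Gamma^\theta_{r\theta} = 2r^{-1}\,, \quad \Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta
\end{aligned}
\]

One has to use that the connection is symmetric to end up with $\Gamma^\theta_{\theta r} = r^{-1}$.

The $\varphi$-component of the physical form of the geodesic equation yields

\[
\begin{aligned}
&\frac{d}{d\lambda}\left(g_{\varphi\varphi}\dot\varphi\right) = 0 \\
&2r \sin^2\theta\,\dot r\dot\varphi + 2r^2 \sin\theta\cos\theta\,\dot\theta\dot\varphi + r^2 \sin^2\theta\,\ddot\varphi = 0 \\
&\Gamma^\varphi_{r\varphi} = r^{-1}\,, \quad \Gamma^\varphi_{\theta\varphi} = \cot\theta
\end{aligned}
\]

Again one has used that the connection is symmetric to cancel the factor 2.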
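The coefficients found in Problem 5.1 can be cross-checked numerically. The following is a minimal sketch, not part of the original notes: it assumes that equation (1.41) is the usual formula $\Gamma^a_{bc} = \tfrac12 g^{ad}(\partial_b g_{dc} + \partial_c g_{db} - \partial_d g_{bc})$, uses central finite differences for the derivatives of the metric, and relies on the metric being diagonal. The function names `metric`, `d_metric` and `christoffel` are illustrative choices.

```python
import math

def metric(x):
    """Metric g_ab of Euclidean R^3 in spherical coordinates x = (r, theta, phi)."""
    r, th, _ = x
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(th)) ** 2]]

def d_metric(x, c, h=1e-5):
    """Central-difference approximation of the partial derivative d_c g_ab."""
    xp, xm = list(x), list(x)
    xp[c] += h
    xm[c] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(3)] for a in range(3)]

def christoffel(x):
    """Return G with G[a][b][c] approximating Gamma^a_{bc}.

    Assumption: the metric is diagonal, so g^{ad} = delta^{ad} / g_aa and
    Gamma^a_{bc} = (d_b g_ac + d_c g_ab - d_a g_bc) / (2 g_aa).
    """
    g = metric(x)
    dg = [d_metric(x, c) for c in range(3)]  # dg[c][a][b] = d_c g_ab
    G = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for a in range(3):
        for b in range(3):
            for c in range(3):
                G[a][b][c] = 0.5 * (dg[b][a][c] + dg[c][a][b] - dg[a][b][c]) / g[a][a]
    return G

if __name__ == "__main__":
    r, th = 2.0, 0.7
    G = christoffel([r, th, 0.4])
    print("Gamma^r_thth   ", G[0][1][1], "expected", -r)
    print("Gamma^th_rth   ", G[1][0][1], "expected", 1 / r)
    print("Gamma^ph_thph  ", G[2][1][2], "expected", 1 / math.tan(th))
```

Indices 0, 1, 2 stand for $r, \theta, \varphi$. Finite differences are used only to keep the sketch dependency-free; a computer algebra system would give the symbols in closed form.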

5.4. Null geodesics In general relativity the null geodesics with $L \equiv 0$ are the paths of photons. Here are two among several possibilities to find and justify the equation of motion for the photon:
• The affine geodesic equation (5.4) may be applied, and an affine parameter is still available;
• In the variational problem

\[ \delta \int_P^Q d\tau = 0 \tag{5.6} \]

for the proper time, one takes the limit of the rest mass tending to zero.

I define the physical affine parameter as

\[ \lambda = \frac{\tau}{m} \]

where $m$ denotes the rest mass. The equations governing the four-momentum

\[ p^a = \frac{dx^a}{d\lambda} \quad\text{and}\quad p^0 = \frac{dx^0}{d\lambda} = \frac{E}{c} = c\,\frac{dt}{d\lambda} \quad\text{and}\quad p_a p^a = m^2c^2 \tag{5.23} \]

hold independently of the rest mass. In the limit $m \to 0$, they remain valid and are still physically meaningful. Indeed, in this way one obtains correct dynamical equations for the photon.

The affine geodesic equation (5.4) may be written as the system

\[ \frac{dx^a}{d\lambda} = p^a\,, \qquad \frac{dp_a}{d\lambda} = \Gamma^c_{ab}\, p_c\,\frac{dx^b}{d\lambda} \tag{5.24} \]


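which is an initial value problem for the functions $x^a(\lambda)$, $p_a(\lambda)$. Quite similarly, at least for $m > 0$, the Euler-Lagrange equation (5.15) may be written as

\[ \frac{dx^a}{d\lambda} = p^a\,, \qquad \frac{dp_a}{d\lambda} = \frac12 (\partial_a g_{bc})\, p^c\,\frac{dx^b}{d\lambda} \tag{5.25} \]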

Proposition 5.2 (Local smoothness). The solutions of both initial value problems (5.24) and (5.25) depend smoothly on the rest mass and the physical affine parameter. In particular, both problems are well posed for rest mass $m = 0$.

More precisely: assume the respective initial value problem has a solution for some mass $m \ge 0$ and some initial value $[x^a]$, $[p^a]$ such that $p_a p^a = m^2c^2$ and $[p^a] \neq 0$, existing on an open interval containing the closed interval $[\Lambda_1, \Lambda_2] \ni 0$, and having everywhere nonzero four-momentum $[p^a] \neq 0$. Then there exist open neighborhoods of this initial value, this mass $m$ and the interval $[\Lambda_1, \Lambda_2]$ on which the initial value problem is solvable. Moreover, the solution depends smoothly on these parameters, and the four-momentum $[p^a] \neq 0$ is nonzero everywhere.

Remark. Still a blow-up of the solution after a finite "parameter-lifetime" is possible. The end of existence may occur for a photon even slightly sooner than for a massive particle.

Proof. Because of the conservation law

\[ g_{ab}\, p^a p^b = m^2c^2 = \text{Const} \]

and the signature of the metric, one obtains bounds $c\,p^0 \in [E_{\min}, E_{\max}] \subset (0, \infty)$ for all the energy values occurring within parameters $m \in [0, M]$ and $\lambda \in [-\Lambda, +\Lambda]$. Hence

\[ \frac{dt}{d\lambda} = p^0/c \in [c^{-2}E_{\min},\, c^{-2}E_{\max}] \]

which allows one to use the coordinate time as independent variable. With this substitution, the Euler-Lagrange equations (5.15) are transformed into

\[ \frac{dx^a}{dt} = c\,\frac{p^a}{p^0}\,, \qquad \frac{dp_a}{dt} = \frac12 (\partial_a g_{bc})\, p^c\,\frac{dx^b}{dt} \]

We have set up a system which remains meaningful for rest mass $m = 0$. Choose $T \ge c^{-2}E_{\max}$. The smooth dependence of the solution for $m \in [0, M]$ and $t \in [-T, +T]$ is a standard theorem from differential equations. Hence smooth dependence for the solution of equation (5.25) and the Euler-Lagrange equation (5.15) holds within any bounded parameter ranges $m \in [0, M]$ and $\lambda \in [-\Lambda, +\Lambda]$, under the only assumption that $p^a \neq 0$ holds for all four-momentum values occurring. The proof for the affine geodesic equation (5.4) is similar.

5.5. The method The equation of motion can be derived either from

(A) parallel transport;

(B) the extremum for proper time;

(C) the variational principle for the quadratic Lagrangian integral.


Methods (A) and (C) lead to well-posed initial value problems, with solutions which depend smoothly on the rest mass $m \ge 0$ and the physical affine parameter $\lambda$. Moreover, under the additional assumption of a symmetric connection, we have shown that methods (A) and (C) lead to the same initial-value problem.

For the (somehow most attractive) method (B) we have obtained a well-posed initial value problem only for massive particles. Indeed, for positive rest mass $m > 0$, the solution of the initial value problem depends smoothly on $m > 0$ and the proper time $\tau$. Moreover, under the restriction of positivity of mass, methods (B) and (C) turn out to be essentially equivalent.

Method (C) allows one to include the case of zero rest mass continuously. On the other hand, method (B) may be physically more justified because of its relation to the principles of superposition of waves. These principles of interference are well known to hold both for massive particles and light. Hence there are reasons to believe that method (B) can be extended smoothly to include the case of massless particles, too.

If one accepts such a postulate, the unique smooth extension to rest mass $m \to 0$ obtained from method (C) is the physically meaningful extension of method (B) to rest mass $m \to 0$, too, since for positive mass, methods (B) and (C) are essentially equivalent.

Corollary 6 ("The method"). Assuming the connection is symmetric, the same equations of motion are obtained either from parallel transport, the extremum of proper time, or the variational principle for the quadratic Lagrangian. Both for photons and massive particles, the equation of motion may be obtained as solution of the variational problem

\[ \delta \int_P^Q L\,d\lambda = 0 \tag{5.26} \]

with the quadratic Lagrangian

\[ L := g_{ab}\,\frac{dx^a}{d\lambda}\frac{dx^b}{d\lambda} \tag{5.27} \]

In place of $u$, we have to use the physical affine parameter $\lambda = \tau/m$, and denote the derivative by $\lambda$ with a dot. The physical momentum of the particle or photon is $p^a = \dot x^a$, and may be used (and indeed shows up) both in its original contravariant form and in the covariant form $p_a = g_{ab}\,p^b$. One obtains (and easily checks) the equations of motion

\[ \frac{dx^a}{d\lambda} = p^a\,, \qquad \frac{dp_a}{d\lambda} = \frac12 (\partial_a g_{bc})\, p^c p^b \tag{5.28} \]

Automatically, the Lagrangian is a constant of motion. Because of the choice of the physical affine parameter, its value is fixed to be

\[ L = p_a p^a = m^2c^2 \tag{5.29} \]
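To see the system (5.28) at work, here is a small numerical sketch. It is an illustration under stated assumptions, not the author's method: the metric is the Euclidean spherical metric of Problem 5.1 rather than a spacetime metric, the derivatives $\partial_a g_{bc}$ are entered by hand, and a classical fourth-order Runge-Kutta step is used. All function names are hypothetical. In flat space the geodesics must be straight lines, which gives an independent check besides the conservation of $L$ and of $p_\varphi$.

```python
import math

def rhs(state):
    """Right-hand side of (5.28) for ds^2 = dr^2 + r^2 dth^2 + r^2 sin^2(th) dph^2.

    state = (r, th, ph, p_r, p_th, p_ph) with covariant momenta p_a.
    """
    r, th, ph, p_r, p_th, p_ph = state
    s = math.sin(th)
    # Raise the index: p^a = g^{ab} p_b (diagonal metric).
    pr_up = p_r
    pth_up = p_th / r ** 2
    pph_up = p_ph / (r * s) ** 2
    # dp_a/dlambda = 1/2 (d_a g_bc) p^b p^c
    dp_r = r * pth_up ** 2 + r * s ** 2 * pph_up ** 2
    dp_th = r ** 2 * s * math.cos(th) * pph_up ** 2
    dp_ph = 0.0  # the metric does not depend on phi
    return [pr_up, pth_up, pph_up, dp_r, dp_th, dp_ph]

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(k, f):
        return [s + f * ki for s, ki in zip(state, k)]
    k1 = rhs(state)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return [s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def integrate(state, lam_end, steps):
    """Integrate from lambda = 0 to lambda = lam_end in equal steps."""
    h = lam_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

def lagrangian(state):
    """Quadratic Lagrangian L = g^{ab} p_a p_b."""
    r, th, _, p_r, p_th, p_ph = state
    return p_r ** 2 + (p_th / r) ** 2 + (p_ph / (r * math.sin(th))) ** 2

def to_cartesian(state):
    r, th, ph = state[:3]
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

if __name__ == "__main__":
    # Start at the Cartesian point (1, 0, 0) with Cartesian velocity (0, 1, -0.3).
    start = [1.0, math.pi / 2, 0.0, 0.0, 0.3, 1.0]
    end = integrate(start, 2.0, 2000)
    print("end point (Cartesian):", to_cartesian(end))
    print("L at start and end:   ", lagrangian(start), lagrangian(end))
    print("p_phi at end:         ", end[5])
```

The covariant momentum $p_\varphi$ stays constant because the metric does not depend on $\varphi$, and the end point lies on the straight line through the starting point.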

5.6. Killing vector

Definition 5.2 (Isometry). A point transformation $x' = x'(x)$ on a Riemannian manifold is called an isometry iff $g'_{ab}(x) = g_{ab}(x)$ holds for all coordinates $x$ and indices $a, b$.

Definition 5.3 (Killing vector). A vector field $[K^a]$ is a Killing vector field, or simply a Killing vector, iff the flow

\[ \frac{\partial \Phi^a(\varepsilon, x)}{\partial \varepsilon} = K^a(x) \tag{5.30} \]

produces isometries.


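Proposition 5.3. A vector field $K$ is a Killing vector field if and only if the Lie derivative of the metric along the vector field vanishes:

\[ \mathcal{L}_K\, g_{ab} = 0 \tag{5.31} \]

\[ K^c \partial_c g_{ab} + (\partial_a K^s)\,g_{sb} + (\partial_b K^s)\,g_{as} = 0 \tag{5.32} \]

Independent proof. The flow defined by the initial value problem (5.30) gives for small $\varepsilon$ the infinitesimal transformations

\[ x'^a = \Phi^a(\varepsilon, x) = x^a + \varepsilon K^a(x) + O(\varepsilon^2) \]

Under these point transformations the metric is transformed by

\[ g'_{ab}(x') = \frac{\partial x^d}{\partial x'^a}\frac{\partial x^e}{\partial x'^b}\, g_{de}(x) = g_{ab}(x) - \varepsilon(\partial_a K^d)\,g_{db} - \varepsilon(\partial_b K^e)\,g_{ae} + O(\varepsilon^2) \]

At the same coordinate $x$ one gets as transformation of the metric

\[
\begin{aligned}
g'_{ab}(x) - g_{ab}(x) &= [g'_{ab}(x) - g'_{ab}(x')] + [g'_{ab}(x') - g_{ab}(x)] \\
&= -\varepsilon K^c \partial_c g_{ab} - \varepsilon(\partial_a K^d)\,g_{db} - \varepsilon(\partial_b K^e)\,g_{ae} + O(\varepsilon^2)
\end{aligned}
\]

Assuming that the flow (5.30) of the vector field $K$ induces isometries for all $\varepsilon$, we conclude that the first order term is zero and get equation (5.32). Conversely, let us assume that equation (5.32) holds identically on the entire manifold. Then the Killing flow induces transformations of the metric such that

\[
\begin{aligned}
\Phi(\varepsilon, g)(x) = g'(x) &= \text{Product of Jacobians} \cdot g(\Phi(-\varepsilon, x)) \\
&= g(x) - \varepsilon\left[K^c \partial_c g_{ab} + (\partial_a K^d)\,g_{db} + (\partial_b K^e)\,g_{ae}\right] \omega^a \otimes e^b + O(\varepsilon^2) \\
&= g(x) + O(\varepsilon^2)
\end{aligned}
\]

for all $\varepsilon$, and hence $g' = g$ for all $\varepsilon$ on the entire manifold. This means that the Killing flow induces isometries.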

Proposition 5.4. Let $[K^a]$ be a vector field on a Riemannian manifold. The scalar product $S = K_a \dot x^a$ is a constant of motion along all geodesics $\lambda \mapsto x^a(\lambda)$ if and only if the Killing equation holds:

\[ \nabla_a K_b + \nabla_b K_a = 0 \tag{5.33} \]

Proof. The covariant derivative of the scalar product $S = K_a \dot x^a = K^a \dot x_a$ along the geodesic is calculated. Since Leibniz' rule holds for covariant derivatives, and the geodesic equation is assumed,

\[ \dot S = \frac{dS}{d\lambda} = (\nabla_b K_a \dot x^a)\,\dot x^b = (\nabla_b K_a)\,\dot x^a \dot x^b + K_a (\nabla_b \dot x^a)\,\dot x^b \tag{5.34} \]

\[ = (\nabla_b K_a)\,\dot x^a \dot x^b + K_a\,\frac{D\dot x^a}{d\lambda} = \frac12 (\nabla_a K_b + \nabla_b K_a)\,\dot x^a \dot x^b \tag{5.35} \]

Hence the scalar product $S = K_a \dot x^a$ is a constant of motion along all geodesics if and only if the Killing equation (5.33) holds for the vector field $[K^a]$.

Remark. Note that for the velocities and corresponding momenta, the index is lowered and raised by the rules

\[ \dot x_a = g_{ab}\,\dot x^b \quad\text{and}\quad \dot x^a = g^{ab}\,\dot x_b \]

\[ p_a = g_{ab}\,p^b \quad\text{and}\quad p^a = g^{ab}\,p_b \]

even if the metric is not constant.


Corollary 7. Assume the metric does not depend on the coordinate $x^c$, and the connection is symmetric. Under that assumption, the vector field $K^a := \delta^a_c$ is a Killing vector. Hence the corresponding covariant velocity $u_c = g_{ca}\,u^a$ and momentum $p_c = g_{ca}\,p^a$ are constants of motion.

Proof.

\[
\begin{aligned}
\nabla_a K^b &= \Gamma^b_{sa} K^s = \Gamma^b_{ca} \\
\nabla_a K_b &= \Gamma_{bca} \\
\nabla_a K_b + \nabla_b K_a &= \Gamma_{bca} + \Gamma_{acb} = \Gamma_{bac} + \Gamma_{abc} = \partial_c g_{ab} = 0
\end{aligned}
\]

Corollary 8. Assume the metric satisfies $K^c \partial_c g_{ab} = 0$ for a constant vector $[K^c]$, and the connection is symmetric. Under that assumption, the product

\[ S = g_{ab}\,K^a u^b \]

and the corresponding momentum are constants of motion.

Proof.

\[
\begin{aligned}
\nabla_a K^b &= \Gamma^b_{sa} K^s \\
\nabla_a K_b &= \Gamma_{bsa} K^s = \Gamma_{bas} K^s \\
\nabla_a K_b + \nabla_b K_a &= K^s \partial_s g_{ab} = 0
\end{aligned}
\]

Hence $K$ is a Killing vector and the scalar product $S = g_{ab}\,K^a u^b$ is a constant of motion.

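Theorem 5.2. Let $K$ be a vector field on a Riemannian manifold with symmetric connection. The following are equivalent:

(i) $K$ is a Killing vector, which means the flow

\[ \frac{\partial \Phi^a(\varepsilon, x)}{\partial \varepsilon} = K^a(x) \tag{5.30} \]

produces isometries.

(ii) The infinitesimal point transformations $x'^a = x^a + \varepsilon K^a(x) + O(\varepsilon^2)$ are isometries up to order $O(\varepsilon^2)$.

(iii) The Lie derivative of the metric along the vector field vanishes: $\mathcal{L}_K\, g_{ab} = 0$.

(iv) $K^c \partial_c g_{ab} + g_{bc}\,\partial_a K^c + g_{ac}\,\partial_b K^c = 0$

(v) The Killing equation $\nabla_a K_b + \nabla_b K_a = 0$ holds identically.

(vi) The scalar product $S = K_a \dot x^a = K^a \dot x_a$ is constant along all geodesics.

Proof. We still need to check that the Killing equation is equivalent to item (iv).

\[
\begin{aligned}
\nabla_a K_b + \nabla_b K_a &= \nabla_a(g_{bc}K^c) + \nabla_b(g_{ac}K^c) = g_{bc}\nabla_a K^c + g_{ac}\nabla_b K^c \\
&= g_{bc}(\partial_a K^c + K^s \Gamma^c_{sa}) + g_{ac}(\partial_b K^c + K^s \Gamma^c_{sb}) \\
&= g_{bc}\,\partial_a K^c + K^s \Gamma_{bsa} + g_{ac}\,\partial_b K^c + K^s \Gamma_{asb} = g_{bc}\,\partial_a K^c + g_{ac}\,\partial_b K^c + K^s(\Gamma_{bas} + \Gamma_{abs}) \\
&= K^s \partial_s g_{ab} + g_{bc}\,\partial_a K^c + g_{ac}\,\partial_b K^c
\end{aligned}
\]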


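Problem 5.2. Check that the metric

\[ ds^2 = dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\varphi^2 \]

has the Killing vector $K = r^2 \sin\varphi\, d\theta + r^2 \cos\theta \sin\theta \cos\varphi\, d\varphi$.

Answer. We may directly check the Killing equation (5.33). The non-zero Christoffel symbols have been calculated in problem 5.1:

\[
\begin{aligned}
&\Gamma^r_{rr} = 0\,, \quad \Gamma^r_{\theta\theta} = -r\,, \quad \Gamma^r_{\varphi\varphi} = -r \sin^2\theta \\
&\Gamma^\theta_{\theta r} = r^{-1}\,, \quad \Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta \\
&\Gamma^\varphi_{r\varphi} = r^{-1}\,, \quad \Gamma^\varphi_{\theta\varphi} = \cot\theta
\end{aligned}
\]

To check whether $\nabla_a K_b + \nabla_b K_a = 0$, calculate

\[
\begin{aligned}
\nabla_r K_\theta + \nabla_\theta K_r &= \partial_r K_\theta - \Gamma^s_{\theta r} K_s + \partial_\theta K_r - \Gamma^s_{\theta r} K_s = \partial_r K_\theta - 2\Gamma^\theta_{\theta r} K_\theta = 2r \sin\varphi - 2r^{-1} r^2 \sin\varphi = 0 \\
\nabla_r K_\varphi + \nabla_\varphi K_r &= \partial_r K_\varphi - 2\Gamma^s_{\varphi r} K_s = 2r \cos\theta \sin\theta \cos\varphi - 2r^{-1} r^2 \cos\theta \sin\theta \cos\varphi = 0 \\
\nabla_\theta K_\theta &= \partial_\theta K_\theta - \Gamma^s_{\theta\theta} K_s = \partial_\theta K_\theta = 0 \\
\nabla_\varphi K_\varphi &= \partial_\varphi K_\varphi - \Gamma^\theta_{\varphi\varphi} K_\theta = -r^2 \cos\theta \sin\theta \sin\varphi + \sin\theta\cos\theta\, r^2 \sin\varphi = 0 \\
\nabla_\theta K_\varphi + \nabla_\varphi K_\theta &= \partial_\theta K_\varphi + \partial_\varphi K_\theta - 2\cot\theta\, K_\varphi = 0
\end{aligned}
\]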
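The result of Problem 5.2 can also be checked numerically through item (iv) of Theorem 5.2, which needs no Christoffel symbols. The sketch below is an illustration only: it raises the index of the given covector to $K^a = (0, \sin\varphi, \cot\theta\cos\varphi)$, approximates all partial derivatives by central differences, and evaluates $K^c\partial_c g_{ab} + g_{cb}\,\partial_a K^c + g_{ac}\,\partial_b K^c$ at one sample point. The function names are hypothetical.

```python
import math

def metric(x):
    """Spherical-coordinate metric of Euclidean R^3, x = (r, theta, phi)."""
    r, th, _ = x
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(th)) ** 2]]

def killing(x):
    """Contravariant components K^a of the vector field of Problem 5.2."""
    _, th, ph = x
    return [0.0, math.sin(ph), math.cos(ph) / math.tan(th)]

def shifted(x, c, h):
    y = list(x)
    y[c] += h
    return y

def d_metric(x, c, h=1e-5):
    """Central difference for d_c g_ab."""
    gp, gm = metric(shifted(x, c, h)), metric(shifted(x, c, -h))
    return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(3)] for a in range(3)]

def d_killing(x, c, h=1e-5):
    """Central difference for d_c K^a."""
    kp, km = killing(shifted(x, c, h)), killing(shifted(x, c, -h))
    return [(kp[a] - km[a]) / (2 * h) for a in range(3)]

def lie_derivative_of_metric(x):
    """Matrix of K^c d_c g_ab + g_cb d_a K^c + g_ac d_b K^c at the point x."""
    g, K = metric(x), killing(x)
    dg = [d_metric(x, c) for c in range(3)]    # dg[c][a][b] = d_c g_ab
    dK = [d_killing(x, a) for a in range(3)]   # dK[a][c]    = d_a K^c
    return [[sum(K[c] * dg[c][a][b] + g[c][b] * dK[a][c] + g[a][c] * dK[b][c]
                 for c in range(3))
             for b in range(3)]
            for a in range(3)]

if __name__ == "__main__":
    for row in lie_derivative_of_metric([2.0, 0.7, 0.4]):
        print(["%.2e" % v for v in row])
```

All nine entries come out at the level of the finite-difference error, in agreement with the Killing property.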

Problem 5.3. Find a more inspired solution of problem 5.2 without use of Christoffel symbols.

Problem 5.4. Fear Schopenhauer's mousetraps and do no mousetrap proofs! Give the reason behind the last calculation. Why does the Killing flow map geodesics into geodesics, and why do these even have the same two constants of motion: both $K_a \dot x^a$ as well as $g_{ab}\,\dot x^a \dot x^b$?

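Lemma 5.7. For any vector field $[K^a]$, one has $\mathcal{L}_K K^a = 0$. For a Killing vector field, one has $\mathcal{L}_K K_a = 0$, too.

Proof.

\[
\begin{aligned}
\mathcal{L}_K K^a &= (K^s \partial_s) K^a - K^s (\partial_s K^a) = 0 \\
\mathcal{L}_K K_a &= \mathcal{L}_K (g_{ab} K^b) = (\mathcal{L}_K g_{ab})\,K^b + g_{ab}\,\mathcal{L}_K K^b = 0
\end{aligned}
\]

The first term is zero by proposition 5.3. The second term is zero by the first line above.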

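Lemma 5.8. Let $x^a = x^a(\lambda, \varepsilon)$ be a family of curves satisfying

\[ \frac{\partial x^a(\lambda, \varepsilon)}{\partial \varepsilon} = v^a(x) \tag{5.36} \]

In other words, the curve $x^a = x^a(\lambda, 0)$ is transported by the flow of the vector field $[v^a]$. Then $\mathcal{L}_v \dot x^a = 0$, where the dot denotes the partial derivative by the curve parameter $\lambda$.

Proof.

\[ \mathcal{L}_v \dot x^a = (v^s \partial_s)\,\dot x^a - \dot x^s (\partial_s v^a) = \frac{\partial^2 x^a}{\partial \varepsilon\, \partial \lambda} - \dot x^s (\partial_s v^a) = \frac{dv^a}{d\lambda} - \frac{\partial x^s}{\partial \lambda}(\partial_s v^a) = 0 \]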


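Proposition 5.5. Let $x^a = x^a(\lambda, \varepsilon)$ be a family of curves satisfying

\[ \frac{\partial x^a(\lambda, \varepsilon)}{\partial \varepsilon} = K^a(x) \tag{5.37} \]

Moreover, assume the curve $x^a = x^a(\lambda, 0)$ is a geodesic and is transported by the flow of the Killing field $[K^a]$.

(i) The curves $\lambda \mapsto x^a(\lambda, \varepsilon)$ are geodesics for all $\varepsilon$.

(ii) The Lagrangian $L = g_{ab}\,\dot x^a \dot x^b$ is independent of both $\lambda$ and $\varepsilon$.

(iii) The quantity $S = g_{ab}\,K^a \dot x^b$ is independent of both $\lambda$ and $\varepsilon$.

Proof of item (i). This is left to the reader.

Proof of item (ii).

\[ \frac{\partial L}{\partial \varepsilon} = \mathcal{L}_K (g_{ab}\,\dot x^a \dot x^b) = (\mathcal{L}_K g_{ab})\,\dot x^a \dot x^b + 2 g_{ab}\,\dot x^a\, \mathcal{L}_K \dot x^b = 0 \]

The first term is zero by proposition 5.3. The second term is zero by lemma 5.8.

Proof of item (iii).

\[ \frac{\partial S}{\partial \varepsilon} = \mathcal{L}_K (g_{ab}\,K^a \dot x^b) = (\mathcal{L}_K g_{ab})\,K^a \dot x^b + g_{ab}\,(\mathcal{L}_K K^a)\,\dot x^b + g_{ab}\,K^a\, \mathcal{L}_K \dot x^b = 0 \]

The first term is zero by proposition 5.3. The second term is zero by lemma 5.7. The third term is zero by lemma 5.8.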


6. Geodesics in the Schwarzschild metric

The Schwarzschild metric is the solution of Einstein's field equation for a mass $M$ at the center.

\[ ds^2 = c^2\left(1 - \frac{r_*}{r}\right) dt^2 - \left(1 - \frac{r_*}{r}\right)^{-1} dr^2 - r^2 d\theta^2 - r^2 \sin^2\theta\, d\varphi^2 \tag{6.1} \]

Here

\[ r_* = \frac{2GM}{c^2} \tag{6.2} \]

is the Schwarzschild radius. For the sun $r_* \approx 2.96$ km.
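As a quick plausibility check of formula (6.2), the following sketch evaluates the Schwarzschild radius of the sun. The numerical constants are assumptions taken from standard tables (they are not given in the text), and the function name is illustrative.

```python
# Assumed tabulated values (SI units); not taken from the text.
G_NEWTON = 6.67430e-11      # gravitational constant, m^3 kg^-1 s^-2
C_LIGHT = 299_792_458.0     # speed of light, m/s
M_SUN = 1.98847e30          # solar mass, kg

def schwarzschild_radius(mass_kg):
    """Schwarzschild radius r_* = 2 G M / c^2 in metres, equation (6.2)."""
    return 2.0 * G_NEWTON * mass_kg / C_LIGHT ** 2

if __name__ == "__main__":
    print("r_* of the sun in km:", schwarzschild_radius(M_SUN) / 1000.0)
```

With these inputs the result is a little under 3 km, consistent with the rounded value quoted above.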

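Problem 6.1. Write down the "physical form" of the geodesic equations in the Schwarzschild metric, with proper time as parameter.

\[ \frac{d}{d\tau}\left[\left(1 - \frac{r_*}{r}\right)\frac{dt}{d\tau}\right] = 0 \tag{6.3} \]

\[ \frac{d}{d\tau}\left[\left(1 - \frac{r_*}{r}\right)^{-1}\frac{dr}{d\tau}\right] = -\frac12 (\partial_r g_{bc})\,\dot x^b \dot x^c \tag{6.4} \]

\[ \frac{d}{d\tau}\left[r^2\,\frac{d\theta}{d\tau}\right] = r^2 \sin\theta\cos\theta \left(\frac{d\varphi}{d\tau}\right)^2 \tag{6.5} \]

\[ \frac{d}{d\tau}\left[r^2 \sin^2\theta\,\frac{d\varphi}{d\tau}\right] = 0 \tag{6.6} \]

Remark. The right-hand side of the radial equation is a nuisance, but can be avoided.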

The orbits lie in a plane. For simplicity I may choose the plane $\theta \equiv 90^\circ$. Since the Schwarzschild metric depends neither on the time $t$ nor on the angle $\varphi$, one gets two constants of motion

\[ \left(1 - \frac{r_*}{r}\right) c\,\frac{dt}{d\tau} = \frac{E_{esc}}{mc} \equiv \text{const} \tag{6.7} \]

\[ r^2\,\frac{d\varphi}{d\tau} = \frac{l}{m} \equiv \text{const} \tag{6.8} \]

The physical meaning of these integration constants can be determined from the limit $r \to \infty$, where special relativity holds. It turns out that $E_{esc}$ is the energy of the particle or planet escaped to infinity. The relativity parameter is $\gamma = E_{esc}/(mc^2)$. One gets the slow-down of proper time $t = \gamma\tau$ as known from special relativity.

Let $p_{esc}$ be the momentum of the escaped particle. From special relativity we know the important formula

\[ E_{esc}^2 = m^2c^4 + c^2 p_{esc}^2 \]

It turns out that $l$ is the angular momentum of the particle or planet around the $z$-axis. Let us introduce the impact parameter $b$ to be the perpendicular distance of the line of straight motion of the particle from the central mass. Suppose that the particle escapes along a line $x(\tau)\,e_x + b\,e_y$. With $x = r\cos\varphi$, $y = r\sin\varphi$, one gets the angular momentum

\[ l = m r^2 \dot\varphi = -\dot x\, m y + \dot y\, m x \simeq b\, m \dot x = b\, p_{esc} \tag{6.9} \]

as expected.


Remark. For bounded orbits we may still use the parameters $p_{esc} \in i[0,\infty)$ and $b = l/p_{esc} \in i[0,\infty)$, but with imaginary values. The boundary case of parabolic motion has $p_{esc} = 0$ and any value $A \in [0,\infty)$ for the semilatus rectum and $l \in [0,\infty)$.

The integration of the equations of motion becomes possible since the Lagrangian is a further constant of motion. With the proper time $\tau$ as curve parameter,

\[ L := g_{ab}\,\frac{dx^a}{d\tau}\frac{dx^b}{d\tau} = c^2 \tag{5.12} \]

is the constant value of the Lagrangian. For the Schwarzschild metric this identity reduces to

\[ c^2\left(1 - \frac{r_*}{r}\right)\left(\frac{dt}{d\tau}\right)^2 - \left(1 - \frac{r_*}{r}\right)^{-1}\left(\frac{dr}{d\tau}\right)^2 - r^2\left(\frac{d\theta}{d\tau}\right)^2 - r^2 \sin^2\theta\left(\frac{d\varphi}{d\tau}\right)^2 = c^2 \]

and is further simplified by use of $\theta \equiv 90^\circ$. Also, from now on I use the simpler dot notation.

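\[ c^2\left(1 - \frac{r_*}{r}\right)\dot t^2 - \left(1 - \frac{r_*}{r}\right)^{-1}\dot r^2 - r^2\dot\varphi^2 = c^2 \tag{6.10} \]

A bit of arithmetic is still needed. Then I further simplify by use of the constants of motion from equations (6.7) and (6.8).

\[
\begin{aligned}
c^2\left(1 - \frac{r_*}{r}\right)^2 \dot t^2 - \dot r^2 - r^2\left(1 - \frac{r_*}{r}\right)\dot\varphi^2 &= c^2\left(1 - \frac{r_*}{r}\right) \\
\dot r^2 + r^2\left(1 - \frac{r_*}{r}\right)\dot\varphi^2 - \frac{c^2 r_*}{r} &= c^2\left(1 - \frac{r_*}{r}\right)^2 \dot t^2 - c^2 \\
\dot r^2 + \left(1 - \frac{r_*}{r}\right)\frac{l^2}{m^2r^2} - \frac{c^2 r_*}{r} &= \frac{E_{esc}^2}{m^2c^2} - c^2 \equiv \frac{p_{esc}^2}{m^2}
\end{aligned}
\]

The last equation may be multiplied by $m/2$ to get the energy balance for the Kepler motion, with just one extra term! We shall also need the angular equation of motion.

\[ \frac{m}{2}\,\dot r^2 + \left(1 - \frac{r_*}{r}\right)\frac{l^2}{2mr^2} - \frac{GMm}{r} = \frac{p_{esc}^2}{2m} \tag{6.11} \]

\[ m r^2 \dot\varphi = l \equiv b\,p_{esc} \tag{6.12} \]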

For the discussion of the motion of photons, I use the physical affine parameter $\lambda = \tau/m$, for which, as expected, one gets equations which are meaningful in the limit $m \to 0$. Equation (6.11) is multiplied by $2m$ and the affine parameter $\lambda$ is introduced. We also need the equation of motion (6.7) for the time and the angular equation of motion (6.8). Altogether we obtain as equations of motion for photons and massless particles

\[ \left(\frac{dr}{d\lambda}\right)^2 + \left(1 - \frac{r_*}{r}\right)\frac{l^2}{r^2} = p_{esc}^2 \equiv \frac{l^2}{b^2} \tag{6.13} \]

\[ \left(1 - \frac{r_*}{r}\right) c\,\frac{dt}{d\lambda} = \frac{E_{esc}}{c} \equiv p_{esc} \tag{6.14} \]

\[ r^2\,\frac{d\varphi}{d\lambda} = l \equiv b\,p_{esc} \tag{6.15} \]

Problem 6.2. Write a paragraph on the rotation of the perihelion of Mercury.
• How does one proceed from the equations of motion (6.11) and (6.12) to get an equation about the shape of the orbit?
• One does an expansion in powers of $r_*$, respectively $c^{-2}$, which is a small parameter. What is the zeroth order term obtained with $r_* = 0$?
• One puts $r^{-1} =: u = u_0 + u_1 + \dots$. Which equation does one get for $u_1$?
• Which formula for the perihelion rotation is obtained?

6.1. The equation for the shape of relativistic orbits For the equation about the shape of the orbit, we use the variable $u := r^{-1}$ and need an equation of motion for the function $u = u(\varphi)$. Since

\[ \frac{du}{d\varphi} = -\frac{1}{r^2}\frac{dr}{d\varphi} = -\frac{\dot r}{r^2\dot\varphi} = -\frac{m\dot r}{l} \tag{6.16} \]

we multiply equation (6.11) by $2m\,l^{-2}$ and substitute to obtain

\[ \left(\frac{du}{d\varphi}\right)^2 + (1 - r_* u)\,u^2 - \frac{2GMm^2}{l^2}\,u = \frac{p_{esc}^2}{l^2} \]

This step excludes the case $l = 0$ of radial motion. As a convenient geometric quantity, one may introduce the semilatus rectum

\[ A := \frac{l^2}{GMm^2} \ge 0 \tag{6.17} \]

and gets

\[ \left(\frac{du}{d\varphi}\right)^2 + u^2 - \frac{2u}{A} - r_* u^3 = \frac{p_{esc}^2}{l^2} \tag{6.18} \]

Remark. Equation (6.18) is valid both for unbounded and for bounded orbits, as long as $l \neq 0$. Note that in the latter case, one needs to use imaginary values $p_{esc} \in i[0,\infty)$ for the escape momentum.

For hyperbolic motion the impact parameter $b = l/p_{esc}$ from equation (6.9) and the escape momentum $p_{esc}$ are convenient parameters. Since equations (6.17), (6.9) and (6.2) imply

\[ A = \frac{l^2}{GMm^2} = \frac{b^2 p_{esc}^2}{GMm^2} = \frac{2b^2}{r_*}\left(\frac{p_{esc}}{mc}\right)^2 \]

one obtains

\[ \left(\frac{du}{d\varphi}\right)^2 + u^2 - \frac{r_*}{b^2}\left(\frac{mc}{p_{esc}}\right)^2 u - r_* u^3 = \frac{1}{b^2} \tag{6.19} \]

Remark. Again the equation (6.19) is valid both for unbounded orbits with $p_{esc} > 0$ and for bounded orbits. But in the latter case, one needs to use imaginary values $p_{esc} \in i(0,\infty)$ and $b = l/p_{esc} \in i(0,\infty)$.

Remark. The boundary case of parabolic motion occurs for $p_{esc} = 0$ and any value of the angular momentum $l \in [0,\infty)$. The semilatus rectum from equation (6.17) still exists and takes any value $A \in [0,\infty)$. For $A \neq 0$, one gets formally the impact parameter $b = \infty$. For the parabolic motion and $A \neq 0$, one gets the equation of shape

\[ \left(\frac{du}{d\varphi}\right)^2 + u^2 - \frac{2u}{A} - r_* u^3 = 0 \tag{6.20} \]


6.2. Kepler's classical nonrelativistic orbits One does an expansion in powers of $r_*$, respectively $c^{-2}$, which is a small parameter. The zeroth order term is obtained by putting $r_* = 0$ into equation (6.18). One obtains the classical approximation of Kepler motion

\[ \left(\frac{du}{d\varphi}\right)^2 + u^2 - \frac{2u}{A} = \text{Const} \equiv \frac{p_{esc}^2}{l^2} \tag{6.21} \]

from which the orbits are obtained to be conic sections. The equation of motion (6.21) is also called the Binet equation. All cases lead to the orbits

\[ u(\varphi) = \frac{1 + e\cos\varphi}{A} \tag{6.22} \]

with the eccentricity $e \in [0,\infty)$ and the semilatus rectum $A \in (0,\infty)$ as convenient parameters. Only the purely radial motion with $l = 0$ needs to be treated separately. Plugging the solution (6.22) into the equation of motion (6.21), the reader should check that the equation of motion holds. Moreover one obtains

\[ \frac{e^2 - 1}{A^2} = \frac{p_{esc}^2}{l^2} = \frac{1}{b^2} \tag{6.23} \]

to be the relation between the geometrical and physical parameters and the impact parameter. The first equation holds in all cases with $l > 0$. For $e \neq 1$ the major half-axis $a > 0$ satisfies

\[ b^2 = \frac{A^2}{e^2 - 1} = \mp A a = (e^2 - 1)\,a^2 \tag{6.24} \]

with the upper minus sign for the ellipse. Hence the equation

\[ b = a\sqrt{e^2 - 1} \tag{6.25} \]

holds wonderfully in all cases.

These are the different shapes of orbits for $l > 0$:

• The case $p_{esc}^2 < 0$ and $u \equiv \text{const}$ gives circles. The eccentricity is $e = 0$ and the radius is

\[ 1/u \equiv r = A = \frac{|l|}{|p_{esc}|} \]

• The case $p_{esc}^2 < 0$ but $u$ not constant gives ellipses. The eccentricity lies in the range $e \in (0, 1)$. Reusing relation (6.23) and the almost magical semilatus rectum from equation (6.17), the major half axis turns out to be

\[ a = \frac{A}{1 - e^2} = -\frac{l^2}{A\,p_{esc}^2} = -\frac{GMm^2}{p_{esc}^2} = -\frac{r_*}{2}\left(\frac{p_{esc}}{mc}\right)^{-2} \]

into which formula even the Schwarzschild radius from (6.2) fits!
• The case $p_{esc}^2 = 0$ gives parabolas; here $e = 1$. The impact parameter is formally $b = \infty$.
• The case $p_{esc}^2 > 0$ gives hyperbolas; here the eccentricity lies in the range $e \in (1,\infty)$. The major half axis is

\[ a = \frac{A}{e^2 - 1} = \frac{l^2}{A\,p_{esc}^2} = \frac{r_*}{2}\left(\frac{p_{esc}}{mc}\right)^{-2} \]
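The check that the conic sections (6.22) solve the Binet equation (6.21), with the constant given by (6.23), can be done numerically for all four orbit types at once. The sketch below is only an illustration; the function names and the sample values of $e$, $A$ and $\varphi$ are arbitrary choices.

```python
import math

def u_kepler(phi, e, A):
    """Inverse radius u = (1 + e cos(phi)) / A of equation (6.22)."""
    return (1.0 + e * math.cos(phi)) / A

def du_kepler(phi, e, A):
    """Derivative du/dphi of (6.22)."""
    return -e * math.sin(phi) / A

def binet_lhs(phi, e, A):
    """Left-hand side (du/dphi)^2 + u^2 - 2u/A of equation (6.21)."""
    u, du = u_kepler(phi, e, A), du_kepler(phi, e, A)
    return du * du + u * u - 2.0 * u / A

def binet_constant(e, A):
    """Right-hand side (e^2 - 1) / A^2 according to (6.23)."""
    return (e * e - 1.0) / (A * A)

if __name__ == "__main__":
    A = 1.7
    for e, name in ((0.0, "circle"), (0.5, "ellipse"), (1.0, "parabola"), (2.0, "hyperbola")):
        worst = max(abs(binet_lhs(0.1 * k, e, A) - binet_constant(e, A)) for k in range(63))
        print(f"{name:9s} e = {e}: max deviation {worst:.1e}")
```

The sign of the constant $(e^2 - 1)/A^2$ reproduces the case distinction of the list above: negative for circles and ellipses, zero for parabolas, positive for hyperbolas.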


6.3. Scattering in Newtonian dynamics From the equation (6.25), we see that the impact parameter is the minor half axis of the hyperbola. The scattering angle $\theta$ is supplementary to the angle between the asymptotes of the hyperbola. Hence the Cartesian equation

\[ \frac{y^2}{b^2} = \frac{x^2}{a^2} - 1 \]

implies

\[ \cot\frac{\theta}{2} = \frac{b}{a} \]

I shall use spherical coordinates with the incoming beam in $+z$ direction. One wants to express the ray properties in terms of the impact parameter $b$ and the escape momentum $p_{esc}$. To this end, we need $b^2 = aA$ from good old conics (6.24), the semilatus rectum from equation (6.17), and $l = b\,p_{esc}$, and produce

\[ \cot\frac{\theta}{2} = \frac{b}{a} = \frac{A}{b} = \frac{l^2}{GMm^2\, b} = \frac{b\,p_{esc}^2}{GMm^2} \]

For small angles, we get the first order approximation

\[ \theta \simeq 2\tan\frac{\theta}{2} = \frac{2GMm^2}{b\,p_{esc}^2} = \frac{r_*}{b}\,\frac{m^2c^2}{p_{esc}^2} \]

which is only half of the physical value calculated in equation (6.32) below.

For arbitrary scattering angles, but nonrelativistic dynamics, we may obtain the Rutherford scattering formula. The argument below works both for attractive and for repulsive interactions.

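For an incoming ray of intensity $I$, some fraction $dI$ is scattered into the space angle $d\Omega = \sin\theta\, d\theta\, d\varphi$, and is measured at large distance $R$ from the scatterer as an intensity $I R^{-2}\, d\sigma$. Thus one defines the differential cross section $d\sigma$. These scattered particles correspond to impact parameters in $[b, b + db]$ and rotation angles in $[\varphi, \varphi + d\varphi]$. Hence

\[ d\sigma = b\, db\, d\varphi\,, \qquad \frac{d\sigma}{d\Omega} = \frac{b}{\sin\theta}\frac{db}{d\theta} \]

Simply differentiate

\[
\begin{aligned}
b &= \frac{GMm^2}{p_{esc}^2}\cot\frac{\theta}{2} \\
\frac{db}{d\theta} &= -\frac{GMm^2}{2p_{esc}^2}\sin^{-2}\frac{\theta}{2} \\
b\,\frac{db}{d\theta} &= -\frac{G^2M^2m^4}{2p_{esc}^4}\cos\frac{\theta}{2}\,\sin^{-3}\frac{\theta}{2} \\
\frac{d\sigma}{d\Omega} &= \frac{b}{\sin\theta}\frac{db}{d\theta} = -\frac{G^2M^2m^4}{4p_{esc}^4}\sin^{-4}\frac{\theta}{2}
\end{aligned}
\]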

6.4. Perturbation expansion for relativistic bounded orbits One expands in powers of the small parameter $r_*$ and puts $u = u_0 + u_1 + \dots$ into the equation of shape (6.18). The zeroth order term is the classical Kepler ellipse, given by equation (6.22). For the next term of order $r_*$ we get a linear equation, which one may solve:

\[
\begin{aligned}
&2\,\frac{du_0}{d\varphi}\frac{du_1}{d\varphi} + 2u_0u_1 - \frac{2u_1}{A} = r_* u_0^3 \\
&-e\sin\varphi\,\frac{du_1}{d\varphi} + e\cos\varphi\; u_1 = \frac{r_*(1 + e\cos\varphi)^3}{2A^2} \\
&u_1 = \frac{3r_*\, e\,\varphi\sin\varphi}{2A^2} + \text{periodic terms with only } \sin\varphi,\ \cos\varphi,\ \sin^2\varphi,\ \cos^2\varphi
\end{aligned}
\]

One needs only the secular term proportional to $\varphi\sin\varphi$. The periodic terms with $\sin\varphi$, $\cos\varphi$, $\sin^2\varphi$, $\cos^2\varphi$ are below the accuracy of observation.

Problem 6.3. Since expansions work best for linear equations, some people may want to see the following approach. We differentiate the equation of shape (6.18) by the angle $\varphi$ and obtain

\[ \frac{d^2u}{d\varphi^2} + u - \frac{1}{A} - \frac{3r_*}{2}\,u^2 = 0 \tag{6.26} \]

One expands in powers of the small parameter $r_*$ and puts $u = u_0 + u_1 + \dots$. The zeroth order term is the classical Kepler ellipse, given by equation (6.22). Determine and solve the first order equation. Get once more the secular term of first order and thus check the above calculation.

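6.5. The Mercury perihelion rotation The solution

\[ u = \frac{1 + e\cos\varphi}{A} + \frac{3r_*\, e\,\varphi\sin\varphi}{2A^2} \]

has a tiny perihelion advance $\varepsilon$ per rotation. Successive maxima of the solution take place for $\varphi = 0$ and $\varphi = 2\pi + \varepsilon$.

\[
\begin{aligned}
u' &= -\frac{e\sin\varphi}{A} + \frac{3r_*\, e\,(\sin\varphi + \varphi\cos\varphi)}{2A^2} \\
\frac{A}{e}\,u'(2\pi + \varepsilon) &= -\sin\varepsilon + \frac{3r_*}{2A}\left(\sin\varepsilon + (2\pi + \varepsilon)\cos\varepsilon\right) \\
&\simeq -\varepsilon + \frac{3r_*}{2A}\left[\varepsilon + (2\pi + \varepsilon)\right] + O(\varepsilon^2)
\end{aligned}
\]

We need to put this derivative to zero and solve for $\varepsilon$. The terms of order $r_*^2$ are neglected once more. One gets

\[ \varepsilon = \frac{3\pi r_*}{A} = \frac{3\pi r_*}{a(1 - e^2)} \]

This value is approximately $\varepsilon = 3\pi r_*/A \approx 5 \cdot 10^{-7}$ radians per revolution. Next I introduce the major half axis and the eccentricity, which are easily obtained from astronomical tables. For the sun the Schwarzschild radius is approximately $r_* \approx 2.96$ km. But we may instead use the known velocity $c$ of light, and obtain $r_*$ from Kepler's third law. Finally, one needs to convert to arcseconds per century and gets

\[
\begin{aligned}
\frac{4\pi^2 a^3}{T^2} &= GM = \frac{c^2 r_*}{2} \\
\varepsilon &= \frac{3\pi r_*}{a(1 - e^2)} = \frac{6\pi}{a(1 - e^2)}\,\frac{4\pi^2 a^3}{c^2T^2} = \frac{24\pi^3 a^2}{(1 - e^2)\,c^2T^2} \\
\varepsilon &= \frac{3600 \cdot 180}{\pi}\,\frac{T_{century}}{T}\,\frac{24\pi^3 a^2}{(1 - e^2)\,c^2T^2}
\end{aligned}
\]

which you may now check with easily obtainable data. The famous value is $\varepsilon \approx 43''\ \text{century}^{-1}$.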
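The numerical check suggested above takes only a few lines. The sketch below evaluates $\varepsilon = 24\pi^3 a^2 / ((1 - e^2)c^2T^2)$ and converts it to arcseconds per century. The orbital data for Mercury are assumed values from standard astronomical tables (they are not given in the text), a Julian century of 36525 days is assumed, and the function names are illustrative.

```python
import math

C_LIGHT = 299_792_458.0          # speed of light, m/s
# Assumed tabulated orbital elements of Mercury (not taken from the text):
A_MERCURY = 5.7909e10            # major half axis a, m
E_MERCURY = 0.2056               # eccentricity e
T_MERCURY = 87.969 * 86400.0     # orbital period T, s
T_CENTURY = 36525.0 * 86400.0    # Julian century, s

def advance_per_orbit(a, e, T, c=C_LIGHT):
    """Perihelion advance per revolution in radians: 24 pi^3 a^2 / ((1 - e^2) c^2 T^2)."""
    return 24.0 * math.pi ** 3 * a ** 2 / ((1.0 - e ** 2) * c ** 2 * T ** 2)

def advance_arcsec_per_century(a, e, T):
    """Convert the advance per revolution to arcseconds per century."""
    orbits_per_century = T_CENTURY / T
    return math.degrees(advance_per_orbit(a, e, T)) * 3600.0 * orbits_per_century

if __name__ == "__main__":
    print("advance per orbit [rad]:      ", advance_per_orbit(A_MERCURY, E_MERCURY, T_MERCURY))
    print("advance [arcsec per century]: ", advance_arcsec_per_century(A_MERCURY, E_MERCURY, T_MERCURY))
```

With these inputs the advance per revolution is close to $5 \cdot 10^{-7}$ and the accumulated advance is close to $43''$ per century, as stated above.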


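6.6. Perturbation expansion for the angle of deflection We start from the equation (6.19)

\[ \left(\frac{du}{d\varphi}\right)^2 + u^2 - \frac{r_*}{b^2}\left(\frac{mc}{p_{esc}}\right)^2 u - r_* u^3 = \frac{1}{b^2} \tag{6.19} \]

for the shape of the orbit. We assume that the Schwarzschild radius $r_* \ll b$ is a small quantity compared to the impact parameter, and the particle is either a photon, or at least moving fast enough such that, say, $mc/p_{esc} \le 10$. To start a perturbation expansion, we put $r_* := 0$ and obtain for the zeroth order term the equation of motion

\[ \left(\frac{du}{d\varphi}\right)^2 + u^2 = \frac{1}{b^2} \]

which has the solution

\[ u_0 = \frac{\cos\varphi}{b} \]

This is simply a straight line with impact parameter $b$. The goal is to set up the expansion $u = u_0 + u_1 + \dots$ by powers of $r_*$. We need the first order term $u_1$ in order to calculate the deflection angle. From the terms linear in $r_*$, we get a linear differential equation for $u_1$, which one has to solve.

\[
\begin{aligned}
2\,\frac{du_0}{d\varphi}\frac{du_1}{d\varphi} + 2u_0u_1 &= \frac{r_*}{b^2}\left(\frac{mc}{p_{esc}}\right)^2 u_0 + r_* u_0^3 \\
-2\sin\varphi\,\frac{du_1}{d\varphi} + 2\cos\varphi\; u_1 &= \frac{r_*}{b^2}\left(\frac{mc}{p_{esc}}\right)^2 \cos\varphi + \frac{r_*}{b^2}\cos^3\varphi
\end{aligned}
\]

I take the Ansatz $u_1 = A + B\cos^2\varphi$ and compare coefficients to determine $A$ and $B$. The reader should check the following result:

\[
\begin{aligned}
(2A + 4B)\cos\varphi - 2B\cos^3\varphi &= \frac{r_*}{b^2}\left(\frac{mc}{p_{esc}}\right)^2 \cos\varphi + \frac{r_*}{b^2}\cos^3\varphi \\
u &= \frac{\cos\varphi}{b} + A + B\cos^2\varphi \\
u &= \frac{\cos\varphi}{b} + \frac{r_*}{b^2} + \frac{r_*}{2b^2}\left(\frac{mc}{p_{esc}}\right)^2 - \frac{r_*}{2b^2}\cos^2\varphi
\end{aligned}
\]

The ingoing and outgoing rays correspond to $r \to \infty$ and hence $u = 0$. They occur for the polar angles $\pm\varphi = 90^\circ + \varepsilon/2$, where $\varepsilon$ is the total deflection. Since $\cos(90^\circ + \varepsilon/2) \simeq -\varepsilon/2$ for small $\varepsilon \ll 1$, one gets

\[ 0 = -\frac{\varepsilon}{2b} + \frac{r_*}{b^2} + \frac{r_*}{2b^2}\left(\frac{mc}{p_{esc}}\right)^2 + O(\varepsilon^2) \]

and finally the first order approximation for the small bending angle

\[ \varepsilon \simeq \frac{2r_*}{b}\left[1 + \frac12\left(\frac{mc}{p_{esc}}\right)^2\right] + O(r_*^2) \tag{6.27} \]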

6.7. The bending of light The equation of shape (6.19) is valid for photons, too. One may simply put m = 0 and obtains a "Binet-type" equation for the inverse radius u = r⁻¹ as a function of φ

\[ \left(\frac{du}{d\varphi}\right)^2 + u^2 - r^* u^3 = \frac{1}{b^2} \tag{6.28} \]


The bending of light is obtained from the perturbation expansion, similarly as in the last section. One obtains the small bending angle in first approximation to be

\[ \varepsilon \simeq \frac{2r^*}{b} + O(r^{*2}) \tag{6.29} \]

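For illustration of this important result, I give the entire reasoning independently, once more. Clearly we may start with equations of motion (6.13) and (6.15) and eliminate the proper time by means of

\[ \frac{du}{d\varphi} = -\frac{1}{r^2}\frac{dr}{d\varphi} = -\frac{dr}{d\lambda}\left(r^2\frac{d\varphi}{d\lambda}\right)^{-1} = -\frac{1}{l}\frac{dr}{d\lambda} = -\frac{1}{b\,p_{\rm esc}}\frac{dr}{d\lambda} \tag{6.30} \]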

Thus one arrives at the same equation (6.28) for the shape of the orbit. An expansion in powers of the small parameter r∗ is needed. The zeroth order term is obtained by putting r∗ = 0

\[ \left(\frac{du}{d\varphi}\right)^2 + u^2 = \frac{1}{b^2} \]

which has the solution

\[ u_0 = \frac{\cos\varphi}{b} \]

As is to be expected, this is simply a straight line with impact parameter b. We set up the expansion u = u0 + u1 + . . . by powers of r∗. We need only the first order term u1 in order to calculate the small deflection angle. I illustrate a variant method which is sometimes useful since expansions work best for linear equations. We differentiate the equation of shape (6.28) by the angle φ and obtain

\[ \frac{d^2u}{d\varphi^2} + u - \frac{3r^*}{2}u^2 = 0 \tag{6.31} \]

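From the terms linear in r∗, we get a linear differential equation for u1, which one has to solve. To this end, I make the Ansatz u1 = A + B cos² φ and compare coefficients to determine A and B.

\[ \frac{d^2u_1}{d\varphi^2} + u_1 = \frac{3r^*}{2}u_0^2 \]

\[ 2B(1 - 2\cos^2\varphi) + A + B\cos^2\varphi = \frac{3r^*}{2b^2}\cos^2\varphi \]

\[ u = \frac{\cos\varphi}{b} + A + B\cos^2\varphi \]

\[ u = \frac{\cos\varphi}{b} + \frac{r^*}{b^2} - \frac{r^*}{2b^2}\cos^2\varphi \]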

The reader should check the above calculation.

We may now calculate the total deflection ε of the light ray. The ingoing and outgoing rays correspond to r → ∞ and hence u = 0. They occur for the polar angles ±φ = 90° + ε/2. Since cos(90° + ε/2) ≃ −ε/2 for small |ε| ≪ 1, one has to solve

\[ 0 = -\frac{\varepsilon}{2b} + \frac{r^*}{b^2} + O(\varepsilon^2) \]

and finally the first order approximation for the small bending angle

\[ \varepsilon \simeq \frac{2r^*}{b} + O(r^{*2}) \tag{6.32} \]
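To get a feeling for the size of the bending angle, here is a minimal numerical sketch for a light ray grazing the sun. The Schwarzschild radius is the value quoted earlier in the text; taking the solar radius (about 696000 km, an assumed value not given in these notes) as the impact parameter b is an assumption for illustration.

```python
import math

r_star_km = 2.96    # Schwarzschild radius of the sun, as quoted in the text
b_km = 6.96e5       # assumed impact parameter: solar radius in km

# first order bending angle eps = 2 r*/b, in radians
eps_rad = 2.0 * r_star_km / b_km
# convert radians to arcseconds
eps_arcsec = eps_rad * 180.0 * 3600.0 / math.pi

print(f"bending angle: {eps_arcsec:.2f} arcsec")
```

The second order term O(r∗²) is smaller by a factor of order r∗/b, which is about 4 · 10⁻⁶ here, so the first order value is sufficient for this purpose.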


7. Gauss’ Differential Geometry and the Pseudo-Sphere

7.1. Introduction Through the work of Gauss on differential geometry, it became clear, after a painfully slow historic process, that there is a model of hyperbolic geometry on surfaces of constant negative Gaussian curvature. One particularly simple such surface is the pseudo-sphere.

According to Morris Kline, it is not clear whether Gauss himself already saw this non-Euclidean interpretation of his geometry of surfaces. Continuing Gauss’ work, Riemann and Minding thought about surfaces of constant negative curvature. Neither Riemann nor Minding related curved surfaces to hyperbolic geometry (Morris Kline III, p. 888 etc). But, independently of Riemann, Beltrami finally recognized that surfaces of constant curvature are non-Euclidean spaces. Due to the ideas put forward by Gauss, mathematicians have in the end advanced to the concept of a curved surface as a space of its own interest. Gauss’ work implies that there are non-Euclidean geometries on surfaces regarded as spaces in themselves. An obvious and important idea is finally spelled out!

As we explain in detail below, Beltrami shows that one can realize a piece of the hyperbolic plane on a rotation surface of negative constant curvature. This surface is called a pseudo-sphere. But this new discovery comes with a disappointment: by a result of Hilbert, there is no regular analytic surface of constant negative curvature on which the geometry of the entire hyperbolic plane is valid (see Hilbert’s Foundations of Geometry, appendix V).

Concerning models of hyperbolic geometry, the final outcome turns out to be a trade off between the pseudo-sphere and the Poincaré disk. Both have their strengths and weaknesses. The pseudo-sphere is a model for a limited portion of the hyperbolic plane. Both angle and length are represented correctly. The arc length of a geodesic is the correct hyperbolic distance. Furthermore, because of the constant Gaussian curvature, on the pseudo-sphere a figure may be shifted about and just bending will make it conform to the surface. The situation is similar to the more familiar case of Euclidean geometry on a circular cylinder or cone. As everybody knows, on a circular cylinder, a plane figure can be fitted by simply bending it, without stretching and shrinking.

On the other hand, only the Poincaré disk is a model for the entire hyperbolic plane. Here only angles are still represented correctly, but the price one finally has to pay is that hyperbolic distances are distorted. The hyperbolic lines become circular arcs, perpendicular to the ideal boundary. One can see the distortion easily in Escher’s superb artwork, based on tiling of the hyperbolic plane with congruent figures.

The trade off just explained makes the isometry from the pseudo-sphere into the Poincaré disk especially interesting. One such isometric mapping is explicitly constructed below. Hilbert’s result becomes rather natural, too. As explained below, in the sense of hyperbolic geometry, the boundary of the pseudo-sphere turns out to be an arc of a horocycle.

7.2. About Gauss’ differential geometry Karl Friedrich Gauss had devoted an immense amount of work to geodesy and map making, starting in 1816. This stimulus led to his definitive paper on differential geometry of 1827: "Disquisitiones Generales circa Superficies Curvas". In this work, Gauss introduces the basics of curved surfaces, and goes far beyond. The real benefit is that, due to the ideas put forward by Gauss, mathematicians have in the end advanced to the concept of a curved surface as a space of its own interest.

To begin with, one imagines a curved surface to be embedded into three dimensional space R³, and given by some parametric equations

x = x(u, v) , y = y(u, v) , z = z(u, v) (7.1)

The distance ds of neighboring points on the surface with parameters (u, v) and (u + du, v + dv) is given by the first fundamental form

ds2 = Edu2 + 2Fdudv + Gdv2 (7.2)


The first fundamental form is straightforward to calculate from the parametric equations (7.1) since

\[ \begin{aligned} E &= x_u^2 + y_u^2 + z_u^2 \\ F &= x_ux_v + y_uy_v + z_uz_v \\ G &= x_v^2 + y_v^2 + z_v^2 \end{aligned} \tag{7.3} \]

follows from elementary vector calculus.

The geodesics on curved surfaces are defined to be the shortest curves lying on the given surface, connecting any two given points. Gauss’ work sets up the differential equation for the geodesics. Gauss introduces the two main curvatures, called c1, c2. They turn out to be simply the extremal curvatures of normal sections of the surface. A new important feature is the Gaussian curvature, called K. Gauss shows that $K = \frac{LN - M^2}{EG - F^2}$, the quotient of the determinants of the second and first fundamental form. But, even simpler, the Gaussian curvature turns out to be the product of the two principal curvatures:

K = c1c2 (7.4)

Gauss shows the remarkable fact that this curvature is preserved during the process of bending the curved surface inside a higher dimensional space, without stretching, contracting or tearing it. In contrast, the two main curvatures are changed by flexing the surface.

There are actually at least two different proofs for this fact contained in Gauss’ work. The first one depends on Gauss’ characteristic equation

\[ K = \frac{1}{2H}\frac{\partial}{\partial u}\left[\frac{F}{EH}\frac{\partial E}{\partial v} - \frac{1}{H}\frac{\partial G}{\partial u}\right] + \frac{1}{2H}\frac{\partial}{\partial v}\left[\frac{2}{H}\frac{\partial F}{\partial u} - \frac{1}{H}\frac{\partial E}{\partial v} - \frac{F}{EH}\frac{\partial E}{\partial u}\right] \tag{7.5} \]

where $H = \sqrt{EG - F^2}$. Obviously, any such equation implies that the Gaussian curvature depends only on the first fundamental form. The first fundamental form is preserved, if one bends the curved surface in three space, without stretching, contracting or tearing it. Therefore the functions E, F, G, H which determine the first fundamental form depend only on the parameters (u, v), but do not depend at all on how (or even whether at all) the surface lies in a three dimensional space. Because of Gauss’ characteristic equation (7.5), the same is true for the Gaussian curvature K. Because of all that, one says that the Gaussian curvature is an intrinsic property of the curved surface.

7.3. Riemann metric of the Poincaré disk

Proposition 7.1 (Riemann Metric for Poincaré’s Model). In the Poincaré model, the infinitesimal hyperbolic distance ds of points with coordinates (x, y) and (x + dx, y + dy) is

\[ (ds_D)^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2} \tag{7.6} \]

Reason. The fact that angles are measured in the usual Euclidean way implies that $ds^2 = E(x, y)(dx^2 + dy^2)$. The rotational symmetry around the center O implies that $E(x, y) = E(\sqrt{x^2 + y^2}, 0)$. Hence

\[ ds^2 = E(\sqrt{x^2 + y^2}, 0)\,(dx^2 + dy^2) \tag{7.7} \]

Now it is enough to calculate the distance of the points (x, 0) and (x + dx, 0). The hyperbolic distance of a point (x, 0) from the center (0, 0) is 2 tanh⁻¹ x, as we have derived in Proposition ?? in the section on the Poincaré disk model. See formula (??) there, which is of course the primary origin of the hyperbolic distance! Taking the derivative by the variable x yields

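\[ \begin{aligned} \frac{ds}{dx} &= \frac{d}{dx}\left(2\tanh^{-1}x\right) = \frac{d}{dx}\ln\frac{1+x}{1-x} = \frac{1}{1+x} + \frac{1}{1-x} = \frac{2}{1-x^2} \\ ds^2 &= \frac{4}{(1-x^2)^2}\,dx^2 \end{aligned} \tag{7.8} \]

Hence $E(x, 0) = \frac{4}{(1-x^2)^2}$ and

\[ E(\sqrt{x^2 + y^2}, 0) = \frac{4}{(1 - x^2 - y^2)^2} \tag{7.9} \]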

Now formulas (7.7) and (7.9) imply the claim (7.6).

Problem 7.1 (Hyperbolic circumference of a circle). Calculate the circumference of a circle of hyperbolic radius R. We use the Poincaré disk, and put the center of the circle at the center O of the disk. In polar coordinates, the Riemann metric is

\[ ds^2 = 4\,\frac{dx^2 + dy^2}{(1 - x^2 - y^2)^2} = 4\,\frac{dr^2 + r^2d\theta^2}{(1 - r^2)^2} \]

(a) Calculate the hyperbolic length R of a segment OA with Euclidean length |OA| = r < 1.

(b) Get the circumference $C = \int ds$ of the circle around O, at first in terms of the Euclidean radius |OA| = r < 1.

(c) Get the circumference C of this circle in terms of the hyperbolic radius R.

Solution. We take O as center of the circle, and point A on the circumference. Let r = |OA| denote the Euclidean radius, and R = s(O, A) be the hyperbolic radius.

The hyperbolic radius R can be found directly from the Riemann metric (7.14). One needs partial fractions to do the integral.

\[ R = \int_0^r ds = \int_0^r \frac{2\,dr}{1 - r^2} = \int_0^r\left[\frac{1}{1-r} + \frac{1}{1+r}\right]dr = \Big[-\ln(1-r) + \ln(1+r)\Big]_0^r = 2\tanh^{-1}r \]

Remark. Of course, we can go back once more to Proposition ??, formula (??) from the section on the Poincaré disk model and get R = s(O, A) = 2 tanh⁻¹ |OA| = 2 tanh⁻¹ r.

We solve R = 2 tanh⁻¹ r for the Euclidean radius and get

\[ r = \tanh\frac{R}{2} = \frac{e^{R/2} - e^{-R/2}}{e^{R/2} + e^{-R/2}} = \frac{e^R - 1}{e^R + 1} \]

For the usual Euclidean polar coordinates (r, θ) we get the Euclidean arc length:

\[ \begin{aligned} L_{\rm Eucl} &= \int_0^{2\pi}\sqrt{dx^2 + dy^2} = \int_0^{2\pi}\sqrt{dr^2 + r^2d\theta^2} \\ &= r\int_0^{2\pi}d\theta = 2\pi r \end{aligned} \tag{7.10} \]

The first line holds for any smooth curve. In the second line, we go to the special case of a circle. For a circle, the coordinate r is constant and hence dr = 0, and the factor r can be pulled out of the integral.


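Now the distance along the circumference is measured in the hyperbolic metric (7.14) from Proposition 7.1. Hence the calculation above is modified to

\[ \begin{aligned} L_{\rm hyp} &= \int_0^{2\pi}\frac{2\sqrt{dx^2 + dy^2}}{1 - x^2 - y^2} = \int_0^{2\pi}\frac{2\sqrt{dr^2 + r^2d\theta^2}}{1 - r^2} \\ &= \frac{2r}{1 - r^2}\int_0^{2\pi}d\theta = \frac{4\pi r}{1 - r^2} \end{aligned} \tag{7.11} \]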

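We have found the correct hyperbolic arc length. But still, one needs to use the formula $r = \frac{e^R - 1}{e^R + 1}$ to express r in terms of the hyperbolic distance R.

\[ \begin{aligned} L_{\rm hyp} &= \frac{4\pi r}{1 - r^2} = \frac{4\pi(e^R - 1)}{\left[1 - \left(\frac{e^R - 1}{e^R + 1}\right)^2\right](e^R + 1)} \\ &= \frac{4\pi(e^R - 1)(e^R + 1)}{(e^R + 1)^2 - (e^R - 1)^2} = \frac{4\pi(e^{2R} - 1)}{4e^R} \\ &= \pi(e^R - e^{-R}) = 2\pi\sinh R \end{aligned} \tag{7.12} \]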

Proposition 7.2 (The circumference of a circle). In hyperbolic geometry, the circumference of a circle of hyperbolic radius R is 2π sinh R.

Problem 7.2. The hyperbolic circumference of a circle is much larger than the Euclidean circumference. Let R = 1, 2, 5, 10 and estimate how many times the radius fits around the circumference of a circle of that radius.

Answer. A simple calculation yields

R      (2π sinh R)/R
1      7.38
2      11.39
5      93.25
10     6919.82
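The table can be reproduced with a few lines of code. This is a minimal sketch using only the formula C = 2π sinh R from Proposition 7.2; the function name is chosen here for illustration.

```python
import math

def circumference_ratio(R):
    """How many times the hyperbolic radius R fits around the circle: (2 pi sinh R) / R."""
    return 2.0 * math.pi * math.sinh(R) / R

ratios = {R: circumference_ratio(R) for R in (1, 2, 5, 10)}
for R, q in ratios.items():
    print(f"R = {R:2d}   (2 pi sinh R)/R = {q:.2f}")
```

For comparison, the Euclidean ratio is 2π ≈ 6.28 for every radius, which is the limit of the hyperbolic ratio as R → 0.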

Problem 7.3 (Hyperbolic area of a circle). For a circle of hyperbolic radius R, calculate the area A. Again, we use the Poincaré disk, and put the center of the circle at the center O of the disk. For the area, we use the formula from differential geometry

\[ A = \int_0^{2\pi}\int_0^r \sqrt{EG - F^2}\;dr\,d\theta \]

The first fundamental form is the Riemann metric. It has been already given by formula (7.14), and transformed to polar coordinates in the previous problem.

\[ ds^2 = 4\,\frac{dx^2 + dy^2}{(1 - x^2 - y^2)^2} = 4\,\frac{dr^2 + r^2d\theta^2}{(1 - r^2)^2} = E\,dr^2 + 2F\,dr\,d\theta + G\,d\theta^2 \tag{7.13} \]

(a) Get the area of the circle, at first in terms of the Euclidean radius |OA| = r < 1.

(b) Get the area A of this circle in terms of the hyperbolic radius R.

(c) Check that $\frac{dA}{dR} = C$.

Answer. (a) The first fundamental form (7.13) yields

\[ H = \sqrt{EG - F^2} = \frac{4r}{(1 - r^2)^2} \]

and hence the hyperbolic area of a circle of Euclidean radius r is

\[ A = \int_0^{2\pi}\int_0^r\sqrt{EG - F^2}\;dr\,d\theta = 2\pi\int_0^r\frac{4r\,dr}{(1 - r^2)^2} \]

This integral is solved with the substitution u = r² and du = 2r dr.

\[ A = 2\pi\int_0^{r^2}\frac{2\,du}{(1 - u)^2} = \left[\frac{4\pi}{1 - u}\right]_0^{r^2} = \frac{4\pi}{1 - r^2} - 4\pi = \frac{4\pi r^2}{1 - r^2} \]

This is the area in terms of the Euclidean radius |OA| = r < 1.

(b) The hyperbolic radius R has already been calculated in the previous problem. We solve R = 2 tanh⁻¹ r for the Euclidean radius and get

\[ r = \tanh\frac{R}{2} = \frac{e^{R/2} - e^{-R/2}}{e^{R/2} + e^{-R/2}} = \frac{e^R - 1}{e^R + 1} \]

\[ r^2 = \frac{(e^R - 1)^2}{(e^R + 1)^2} \quad\text{and}\quad 1 - r^2 = \frac{(e^R + 1)^2 - (e^R - 1)^2}{(e^R + 1)^2} = \frac{4e^R}{(e^R + 1)^2} \]

Now plug this formula into the result from part (a) and get

\[ A = \frac{4\pi r^2}{1 - r^2} = \frac{\pi(e^R - 1)^2}{e^R} = \pi(e^R - 2 + e^{-R}) = 2\pi(\cosh R - 1) \]

An alternative formula is

\[ A = \frac{\pi(e^R - 1)^2}{e^R} = \pi\left(e^{R/2} - e^{-R/2}\right)^2 = 4\pi\sinh^2\frac{R}{2} \]

(c) We have obtained in Proposition 7.2 from the section on the Poincaré disk model, that the hyperbolic circle of radius R has the circumference C = 2π sinh R. On the other hand, differentiating the result of (b) gives

\[ \frac{dA}{dR} = 2\pi\,\frac{d\cosh R}{dR} = 2\pi\sinh R = C \]

as was to be shown.
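The three parts of this answer can be cross-checked numerically. The following is a minimal sketch; the sample radius R = 1.5 and the step size of the central difference are arbitrary choices made here.

```python
import math

def area(R):
    """Hyperbolic area of a circle of hyperbolic radius R: 2 pi (cosh R - 1)."""
    return 2.0 * math.pi * (math.cosh(R) - 1.0)

def area_from_euclidean_radius(r):
    """The same area in terms of the Euclidean radius r < 1: 4 pi r^2 / (1 - r^2)."""
    return 4.0 * math.pi * r * r / (1.0 - r * r)

def circumference(R):
    """Hyperbolic circumference 2 pi sinh R."""
    return 2.0 * math.pi * math.sinh(R)

R = 1.5
r = math.tanh(R / 2.0)   # Euclidean radius belonging to R
h = 1e-6
dA_dR = (area(R + h) - area(R - h)) / (2.0 * h)   # central difference

print(area(R), area_from_euclidean_radius(r), 4.0 * math.pi * math.sinh(R / 2.0) ** 2)
print(dA_dR, circumference(R))
```

The first line prints three expressions for the same area, the second line compares dA/dR with the circumference C.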

Problem 7.4. Use the fundamental form for the Poincaré disk model

\[ (ds_D)^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2} \tag{7.14} \]

to calculate its Gaussian curvature.


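Answer. Formula (7.14) implies that the functions in the first fundamental form are $E = G = H = 4(1 - x^2 - y^2)^{-2}$ and F = 0. Hence, with u = x and v = y, we get from formula (7.5)

\[ \begin{aligned} K &= \frac{1}{2E}\frac{\partial}{\partial x}\left[-\frac{1}{E}\frac{\partial E}{\partial x}\right] + \frac{1}{2E}\frac{\partial}{\partial y}\left[-\frac{1}{E}\frac{\partial E}{\partial y}\right] = -\frac{1}{2E}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\ln E \\ &= +\frac{1}{E}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\ln(1 - x^2 - y^2) = +\frac{1}{E}\left(\frac{\partial}{\partial x}\,\frac{-2x}{1 - x^2 - y^2} + \frac{\partial}{\partial y}\,\frac{-2y}{1 - x^2 - y^2}\right) \\ &= \frac{-2}{E}\left(\frac{1 - x^2 - y^2 + 2x^2}{(1 - x^2 - y^2)^2} + \frac{1 - x^2 - y^2 + 2y^2}{(1 - x^2 - y^2)^2}\right) = \frac{(-2)\cdot 2}{4} = -1 \end{aligned} \]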

By the way, the result K = −1 motivates the annoying factor 4 in formula (7.14).
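The result K = −1 can also be checked numerically. The sketch below uses the intermediate formula K = −(1/2E)(∂²/∂x² + ∂²/∂y²) ln E from the answer and approximates the Laplacian by a five-point finite difference; the sample point and the step size are arbitrary choices made here.

```python
import math

def E(x, y):
    """Conformal factor of the Poincare disk metric, E = 4 / (1 - x^2 - y^2)^2."""
    return 4.0 / (1.0 - x * x - y * y) ** 2

def curvature(x, y, h=1e-3):
    """K = -(1/(2E)) * Laplacian(ln E), with a five-point finite difference Laplacian."""
    def ln_E(px, py):
        return math.log(E(px, py))
    laplacian = (ln_E(x + h, y) + ln_E(x - h, y) + ln_E(x, y + h) + ln_E(x, y - h)
                 - 4.0 * ln_E(x, y)) / (h * h)
    return -laplacian / (2.0 * E(x, y))

K_sample = curvature(0.3, 0.4)
print(K_sample)
```

Replacing the factor 4 in E by 1 gives K ≈ −4 at the same point, which illustrates the role of that factor.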

7.4. Riemann metric of Klein’s model

Proposition 7.3 (Hilbert-Klein Metric). In the Klein model, the infinitesimal hyperbolic distance ds of points with coordinates (X, Y) and (X + dX, Y + dY) is

\[ ds^2 = \frac{dX^2 + dY^2 - (X\,dY - Y\,dX)^2}{(1 - X^2 - Y^2)^2} \tag{7.15} \]

Proof. We shall derive this metric using the transformation from the Poincaré to the Klein model. As stated in Proposition ??, the mapping from a point P in Poincaré’s model to a point K in Klein’s model is

\[ |OK| = \frac{2\,|OP|}{1 + |OP|^2} \tag{??} \]

requiring that the rays $\overrightarrow{OP} = \overrightarrow{OK}$ are identical. We use Cartesian coordinates and put P = (x, y) for Poincaré’s model and K = (X, Y) for the points in Klein’s model. Finally we put r² = x² + y² and R² = X² + Y². From the mapping (??), we get

\[ X = \frac{2x}{1 + r^2} \quad\text{and}\quad Y = \frac{2y}{1 + r^2} \tag{7.16} \]

The Riemann metric for Poincaré’s model has been derived in Proposition 7.1 to be

\[ ds^2 = 4\,\frac{dx^2 + dy^2}{(1 - x^2 - y^2)^2} = E\,dx^2 + 2F\,dx\,dy + G\,dy^2 \]

Here E, F, G denotes the fundamental form for the Poincaré model in terms of (x, y). In the following we shall use the matrix

\[ \begin{bmatrix} E & F \\ F & G \end{bmatrix} = \frac{4}{(1 - x^2 - y^2)^2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{7.17} \]

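From the fact that the transformation from Poincaré’s to Klein’s model is a passive coordinate transformation, we know that the infinitesimal hyperbolic distance ds of points is left invariant. Because of the invariance, the fundamental form $\bar E, \bar F, \bar G$ for the Klein model has to satisfy

\[ ds^2 = \bar E\,dX^2 + 2\bar F\,dX\,dY + \bar G\,dY^2 = E\,dx^2 + 2F\,dx\,dy + G\,dy^2 \]

We take for now (x, y) as independent variables. From calculus, we know that by means of the chain rule

\[ \begin{bmatrix} \frac{\partial X}{\partial x} & \frac{\partial Y}{\partial x} \\ \frac{\partial X}{\partial y} & \frac{\partial Y}{\partial y} \end{bmatrix}\begin{bmatrix} \bar E & \bar F \\ \bar F & \bar G \end{bmatrix}\begin{bmatrix} \frac{\partial X}{\partial x} & \frac{\partial X}{\partial y} \\ \frac{\partial Y}{\partial x} & \frac{\partial Y}{\partial y} \end{bmatrix} = \begin{bmatrix} E & F \\ F & G \end{bmatrix} \tag{7.18} \]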


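It now remains to carry out the arithmetic. The superscript T denotes transposition of matrices and the superscript −1 denotes inversion of matrices. As usual, we use

\[ \frac{DX}{Dx} = \begin{bmatrix} \frac{\partial X}{\partial x} & \frac{\partial X}{\partial y} \\ \frac{\partial Y}{\partial x} & \frac{\partial Y}{\partial y} \end{bmatrix} \]

as shorthand for the Jacobi matrix of the transformation (7.16). We need to solve the equation (7.18) for the new fundamental form $\bar E, \bar F, \bar G$ to obtain

\[ \left[\frac{DX}{Dx}\right]^{T}\begin{bmatrix} \bar E & \bar F \\ \bar F & \bar G \end{bmatrix}\left[\frac{DX}{Dx}\right] = \begin{bmatrix} E & F \\ F & G \end{bmatrix} \]

\[ \begin{bmatrix} \bar E & \bar F \\ \bar F & \bar G \end{bmatrix} = \left[\frac{DX}{Dx}\right]^{T,-1}\begin{bmatrix} E & F \\ F & G \end{bmatrix}\left[\frac{DX}{Dx}\right]^{-1} \]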

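The Jacobi matrix of the transformation (7.16) is obtained explicitly from equations (7.16) to be

\[ \frac{DX}{Dx} = \frac{2}{(1 + x^2 + y^2)^2}\begin{bmatrix} 1 - x^2 + y^2 & -2xy \\ -2xy & 1 + x^2 - y^2 \end{bmatrix} \]

The determinant is

\[ \mathrm{Det}\,\frac{DX}{Dx} = \frac{4}{(1 + r^2)^4}\left[1 - (x^2 - y^2)^2 - 4x^2y^2\right] = \frac{4}{(1 + r^2)^4}\left[1 - r^4\right] = \frac{4(1 - r^2)}{(1 + r^2)^3} \]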

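Hence the inverse turns out to be

\[ \left[\frac{DX}{Dx}\right]^{-1} = \frac{(1 + r^2)^3}{4(1 - r^2)}\,\frac{2}{(1 + r^2)^2}\begin{bmatrix} 1 + x^2 - y^2 & 2xy \\ 2xy & 1 - x^2 + y^2 \end{bmatrix} = \frac{1 + r^2}{2(1 - r^2)}\begin{bmatrix} 1 + x^2 - y^2 & 2xy \\ 2xy & 1 - x^2 + y^2 \end{bmatrix} \]

With the fundamental form from formula (7.17) and the inverse Jacobi matrix just obtained plugged into equation (??), we calculate

\[ \begin{aligned} \begin{bmatrix} \bar E & \bar F \\ \bar F & \bar G \end{bmatrix} &= \frac{(1 + r^2)^2}{4(1 - r^2)^2}\,\frac{4}{(1 - r^2)^2}\begin{bmatrix} 1 + x^2 - y^2 & 2xy \\ 2xy & 1 - x^2 + y^2 \end{bmatrix}^2 \\ &= \frac{(1 + r^2)^2}{(1 - r^2)^4}\begin{bmatrix} 1 + 2x^2 - 2y^2 + (x^2 - y^2)^2 + 4x^2y^2 & 4xy \\ 4xy & 1 - 2x^2 + 2y^2 + r^4 \end{bmatrix} \\ &= \frac{(1 + r^2)^2}{(1 - r^2)^4}\begin{bmatrix} (1 + r^2)^2 - 4y^2 & 4xy \\ 4xy & (1 + r^2)^2 - 4x^2 \end{bmatrix} \end{aligned} \]

This is the new fundamental form. We still need to introduce the new coordinates (X, Y). We use the shorthands r² = x² + y² and R² = X² + Y². By means of equation (7.16) we get

\[ 1 - X^2 = \frac{(1 + r^2)^2 - 4x^2}{(1 + r^2)^2}, \qquad 1 - Y^2 = \frac{(1 + r^2)^2 - 4y^2}{(1 + r^2)^2}, \]

\[ XY = \frac{4xy}{(1 + r^2)^2}, \qquad 1 - R^2 = \frac{(1 - r^2)^2}{(1 + r^2)^2} \]

Thus the new fundamental form miraculously simplifies to

\[ \begin{bmatrix} \bar E & \bar F \\ \bar F & \bar G \end{bmatrix} = \frac{1}{(1 - R^2)^2}\begin{bmatrix} 1 - Y^2 & XY \\ XY & 1 - X^2 \end{bmatrix} \tag{7.19} \]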


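For the line element we get from this fundamental form

\[ \begin{aligned} ds^2 &= \bar E\,dX^2 + 2\bar F\,dX\,dY + \bar G\,dY^2 \\ &= \frac{(1 - Y^2)\,dX^2 + 2XY\,dX\,dY + (1 - X^2)\,dY^2}{(1 - R^2)^2} \\ &= \frac{dX^2 + dY^2 - (X\,dY - Y\,dX)^2}{(1 - X^2 - Y^2)^2} \end{aligned} \]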
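The invariance of ds under the change of model can be tested numerically. The sketch below maps a point and a tangent vector from the Poincaré disk to the Klein disk with the mapping (7.16) and its Jacobi matrix, and compares the two line elements; the function names and sample values are chosen here for illustration.

```python
def poincare_ds2(x, y, dx, dy):
    """Line element of the Poincare disk, 4 (dx^2 + dy^2) / (1 - x^2 - y^2)^2."""
    return 4.0 * (dx * dx + dy * dy) / (1.0 - x * x - y * y) ** 2

def klein_ds2(X, Y, dX, dY):
    """Hilbert-Klein line element (7.15)."""
    return (dX * dX + dY * dY - (X * dY - Y * dX) ** 2) / (1.0 - X * X - Y * Y) ** 2

def to_klein(x, y):
    """Mapping (7.16) from Poincare coordinates (x, y) to Klein coordinates (X, Y)."""
    s = 1.0 + x * x + y * y
    return 2.0 * x / s, 2.0 * y / s

def push_forward(x, y, dx, dy):
    """Apply the Jacobi matrix of (7.16) to the tangent vector (dx, dy)."""
    s2 = (1.0 + x * x + y * y) ** 2
    dX = 2.0 * ((1.0 - x * x + y * y) * dx - 2.0 * x * y * dy) / s2
    dY = 2.0 * (-2.0 * x * y * dx + (1.0 + x * x - y * y) * dy) / s2
    return dX, dY

x, y, dx, dy = 0.3, -0.2, 0.7, 0.4
X, Y = to_klein(x, y)
dX, dY = push_forward(x, y, dx, dy)
ds2_poincare = poincare_ds2(x, y, dx, dy)
ds2_klein = klein_ds2(X, Y, dX, dY)
print(ds2_poincare, ds2_klein)
```

Both numbers agree up to rounding, as the invariance of ds requires.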

Problem 7.5 (Gaussian curvature of the Hilbert-Klein metric). Use Gauss’ characteristic equation (7.5) and check directly that the Hilbert-Klein metric (7.15) from Proposition 7.3 has constant Gaussian curvature K = −1. We use polar coordinates X = r cos θ, Y = r sin θ and convert formula (7.15) to

\[ ds^2 = \frac{dr^2 + r^2(1 - r^2)\,d\theta^2}{(1 - r^2)^2} \tag{7.20} \]

since this simplifies the calculation considerably.

Answer. We have to put u = r and v = θ. The first fundamental form and its coefficients become

\[ \begin{aligned} ds^2 &= E\,dr^2 + 2F\,dr\,d\theta + G\,d\theta^2 \\ E &= (1 - r^2)^{-2}, \quad F = 0, \quad G = r^2(1 - r^2)^{-1} \\ H &= \sqrt{EG - F^2} = r(1 - r^2)^{-3/2} \end{aligned} \]

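Hence we get for the Gaussian curvature K from the characteristic equation

\[ \begin{aligned} 2HK &= -\frac{\partial}{\partial r}\left[H^{-1}\frac{\partial G}{\partial u}\right] = -\frac{\partial}{\partial r}\left[r^{-1}(1 - r^2)^{3/2}\,\frac{\partial}{\partial r}\frac{r^2}{1 - r^2}\right] \\ &= -\frac{\partial}{\partial r}\left[r^{-1}(1 - r^2)^{3/2}\,\frac{2r(1 - r^2) - r^2(-2r)}{(1 - r^2)^2}\right] = -\frac{\partial}{\partial r}\left[2(1 - r^2)^{-1/2}\right] \\ &= (-2)(-1/2)(1 - r^2)^{-3/2}(-2r) = -2r(1 - r^2)^{-3/2} = -2H \end{aligned} \]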

We get the constant Gaussian curvature K = −1, as expected.

Problem 7.6 (Distortion of angles by the Hilbert-Klein metric). We use polar coordinates to simplify the calculation, and the corresponding contravariant components for the tangent vectors. At a point K with polar coordinates X = r cos θ, Y = r sin θ are attached the radial tangent vector $(a_r, a_\theta) = (1, 0)$ and any other tangent vector $(b_r, b_\theta)$. Hence the apparent angle α satisfies

\[ \cos\alpha = \frac{b_r}{\sqrt{b_r^2 + r^2b_\theta^2}} \quad\text{and}\quad \tan\alpha = \frac{r\,b_\theta}{b_r} \]

Check with the Hilbert-Klein metric (7.20) that the angle ω between the two vectors in Klein’s model satisfies

\[ \tan\omega = \tan\alpha\,\sqrt{1 - r^2} \tag{7.21} \]


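Answer. The apparent angle α and the hyperbolic angle ω between the two tangent vectors $(a_r, a_\theta)$ and $(b_r, b_\theta)$ satisfy

\[ \cos\alpha = \frac{a_rb_r + r^2a_\theta b_\theta}{\sqrt{a_r^2 + r^2a_\theta^2}\cdot\sqrt{b_r^2 + r^2b_\theta^2}} \]

\[ \cos\omega = \frac{a_rb_r + r^2(1 - r^2)\,a_\theta b_\theta}{\sqrt{a_r^2 + r^2(1 - r^2)\,a_\theta^2}\cdot\sqrt{b_r^2 + r^2(1 - r^2)\,b_\theta^2}} \]

In the given example with $a_r = 1$, $a_\theta = 0$ the expressions simplify

\[ \cos\alpha = \frac{b_r}{\sqrt{b_r^2 + r^2b_\theta^2}} \quad\text{and}\quad \tan^2\alpha = \frac{r^2b_\theta^2}{b_r^2} \]

\[ \cos\omega = \frac{b_r}{\sqrt{b_r^2 + r^2(1 - r^2)\,b_\theta^2}} \quad\text{and}\quad \tan^2\omega = \frac{r^2(1 - r^2)\,b_\theta^2}{b_r^2} \]

Hence

\[ \tan\omega = \tan\alpha\,\sqrt{1 - r^2} \]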
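A small numerical example shows how strongly angles are distorted near the boundary of the Klein disk. This is a minimal sketch of relation (7.21); the helper names and the sample vector are chosen here for illustration, and the overall conformal factor (1 − r²)⁻² is dropped since it cancels in the cosine.

```python
import math

def apparent_angle(r, b_r, b_theta):
    """Euclidean angle between the radial direction (1, 0) and (b_r, b_theta)."""
    return math.acos(b_r / math.sqrt(b_r ** 2 + r ** 2 * b_theta ** 2))

def klein_angle(r, b_r, b_theta):
    """Hyperbolic angle from the Hilbert-Klein metric (7.20)."""
    g_theta = r ** 2 * (1.0 - r ** 2)   # coefficient of d theta^2 without the common factor
    return math.acos(b_r / math.sqrt(b_r ** 2 + g_theta * b_theta ** 2))

b_r, b_theta = 1.0, 2.0
for r in (0.1, 0.6, 0.99):
    alpha = apparent_angle(r, b_r, b_theta)
    omega = klein_angle(r, b_r, b_theta)
    print(f"r = {r:4.2f}  alpha = {math.degrees(alpha):6.2f} deg  omega = {math.degrees(omega):6.2f} deg")
```

Near the center the two angles nearly agree, while for r close to 1 the hyperbolic angle ω becomes much smaller than the apparent angle α.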

7.5. A second proof of Gauss’ remarkable theorem The most enlightening proof that the Gaussian curvature is an intrinsic property of the surface uses Gauss’ notion of integral curvature. For any domain G on a given curved surface, the integral curvature is defined as the integral $\iint_G K\,dA$, where dA denotes the area element of the surface.

Take a geodesic triangle △ABC. Let T denote the region bounded by the geodesics between any three given points A, B, C on the surface. Let α, β, γ be the angles (between tangents) to the geodesics at the three vertices. Gauss proves

\[ \iint_T K\,dA = \alpha + \beta + \gamma - \pi \tag{7.22} \]

where angles are to be measured in radians. The quantity on the right hand side is the deviation of the angle sum α + β + γ from the Euclidean value 180°, respectively π.¹ The quantity α + β + γ − π is called the excess of the triangle △ABC. For the hyperbolic case, the excess is negative. In that case, one calculates using the excess times −1, which is called the defect. In words, Gauss’ theorem tells the following:

For a geodesic triangle, the integral curvature equals the excess of its angle sum.

This theorem, Gauss says, ought to be counted as a most elegant theorem.

I discuss a few immediate, but important consequences of (7.22). First of all, instead of the complicated characteristic equation (7.5), one has a simple property of a geodesic triangle from which to derive the Gaussian curvature in a limiting process. Secondly, as an immediate implication of (7.22), the Gaussian curvature is an intrinsic property of a curved surface. Recall that both geodesics, as well as measurement of area depend only on the first fundamental form. Hence, because of (7.22), the same is true for the Gaussian curvature.

Another easy consequence of (7.22) is obtained from the special case of a sphere. For this surface, the Gaussian curvature is constant, and equal to K = R⁻² where R is the radius of the sphere. Hence one obtains for the area of a spherical triangle A = (α + β + γ − π)R², as was already known before Gauss, e.g. to Lambert.

¹ In radian measure, the Euclidean angle sum is π.


Problem 7.7. Tile a sphere by equilateral triangles. It can be done in three ways:

(i) Four triangles with α = β = γ = 120°.

(ii) Eight triangles with α = β = γ = 90°.

(iii) N triangles with α = β = γ = 72°.

Explain and draw these tilings. To which Platonic bodies do the vertices correspond? Determine the surface area of the sphere from (i) and (ii), then get the number N in (iii).

Answer. From item (i), the area of the sphere is

\[ A = 4\cdot\left(3\,\frac{2\pi}{3} - \pi\right)R^2 = 4\pi R^2 \]

The vertices of the four triangles form a tetrahedron. Similarly, item (ii) yields

\[ A = 8\cdot\left(\frac{3\pi}{2} - \pi\right)R^2 = 4\pi R^2 \]

The vertices of the eight triangles form an octahedron.

We can now calculate the number N of triangles in the tiling (iii). Because of

\[ A = N\cdot\left(3\,\frac{2\pi}{5} - \pi\right)R^2 = 4\pi R^2 \]

one gets N = 20. The vertices of the twenty triangles form an icosahedron.
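The counting argument can be written as a one-line formula: the number of congruent equilateral triangles equals the area 4πR² of the sphere divided by the area (3α − π)R² of one triangle. A minimal sketch, with a function name chosen here for illustration:

```python
import math

def triangles_needed(angle_deg):
    """Number of equilateral spherical triangles with the given vertex angle
    needed to cover the sphere; the radius R cancels."""
    excess = 3.0 * math.radians(angle_deg) - math.pi
    return 4.0 * math.pi / excess

for angle in (120, 90, 72):
    print(angle, round(triangles_needed(angle)))
```

The three angles give 4, 8 and 20 triangles, in agreement with the tetrahedron, octahedron and icosahedron.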

Here is a further important consequence of equation (7.22).

Corollary 9 (A common bound for the area of all triangles). On a surface with negative constant Gaussian curvature K < 0, the area of any triangle is less than $\frac{\pi}{-K}$.

On December 17, 1799, Gauss wrote to his friend, the Hungarian mathematician Wolfgang Farkas Bolyai (1775-1856):

As for me, I have already made some progress in my work. However, the path I have chosen does not lead at all to the goal which we seek [deduction of the parallel axiom], and which you assure me you have reached. It seems rather to compel me to doubt the truth of geometry itself. It is true that I have come upon much which by most people would be held to constitute a proof; but in my eyes it proves as good as nothing. For example, if we could show that a rectilinear triangle whose area would be greater than any given area is possible, then I would be ready to prove the whole of Euclidean geometry absolutely rigorously.

Most people would certainly let this stand as an axiom; but I, no! It would indeed be possible that the area might always remain below a certain limit, however far apart the three angular points of the triangle were taken.

From about 1813 on Gauss developed his new geometry. He became convinced that it was logically consistent and rather sure that it might be applicable. His letter written in 1817 to Olbers says:

I am becoming more and more convinced that the physical necessity of our Euclidean geometry cannot be proved, at least not by human reason nor for human reason. Perhaps in another life we will be able to obtain insight into the nature of space, which is now unattainable. Until then we must place geometry not in the same class with arithmetic, which is purely a priori, but with mechanics.


Problem 7.8. (a) Find two further enlightening statements of Gauss, and comment on all four statements.

(i) Are they courageous?

(ii) Are they to the benefit of the scientific community?

(iii) Are they helpful for the person he addresses?

(iv) Are they just against other people?

(v) What would you have done in Gauss’ place?

(b) Choose two of Gauss’ comments. Write a letter as you imagine you would have written in place of Gauss.

Problem 7.9. To test the applicability of Euclidean geometry and his non-Euclidean geometry, Gauss actually measured the sum of the angles of the triangle formed by three mountain peaks in middle Germany: Brocken, Hohenhagen, and Inselsberg. The sides of the triangle were 69, 85, 107 km. His measurement yielded that the angle sum exceeded 180° by 14.85″.

(a) Use Heron’s formula $A = \sqrt{s(s - a)(s - b)(s - c)}$ to calculate the area of the triangle, in a very good approximation.

(b) Take R = 6378 km as radius of the earth. Calculate the angle excess for a spherical triangle between the three mountain peaks. You need to convert angular measurements! 1 radian equals $\frac{180^\circ}{\pi} = \frac{180\cdot 3600''}{\pi}$.

(c) Is the triangle that Gauss measured actually a spherical triangle? Why or why not?

(d) Reflect on the motives why Gauss did his measurement. Find and read some further sources. Think of the following and further motives and possibilities. Did Gauss really just want to

(i) check accuracy?

(ii) check geometry?

(iii) It was just a theoretical thought experiment, not really performed.

Answer. (a) Heron’s formula gives the area $A = \sqrt{s(s - a)(s - b)(s - c)} = 2929.42\ \mathrm{km}^2$.

(b) Take R = 6378 km as radius of the earth. The angle excess for a spherical triangle between the three mountain peaks is

\[ \alpha + \beta + \gamma - \pi = \frac{A}{R^2} = 7.201\cdot 10^{-5} \tag{7.23} \]

This is the value in radian measure. Converted to degrees, we get 0.00413°, which is 14.9″.

(c) Of course the triangle one measures is not a spherical triangle, since light rays do not follow the curvature of the earth.
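Parts (a) and (b) are easily redone by machine. The sketch below assumes the side lengths 69, 85 and 107 km; the value 107 km for the longest side is an assumption made here because it satisfies the triangle inequality and reproduces the area quoted in the answer.

```python
import math

def heron_area(a, b, c):
    """Area of a plane triangle with side lengths a, b, c (Heron's formula)."""
    s = 0.5 * (a + b + c)
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

sides_km = (69.0, 85.0, 107.0)   # assumed side lengths in km (see the remark above)
R_earth_km = 6378.0              # radius of the earth as given in the problem

area_km2 = heron_area(*sides_km)
excess_rad = area_km2 / R_earth_km ** 2              # spherical excess A / R^2
excess_arcsec = math.degrees(excess_rad) * 3600.0    # radians -> arcseconds

print(f"area   = {area_km2:.2f} km^2")
print(f"excess = {excess_rad:.4e} rad = {excess_arcsec:.2f} arcsec")
```

Using the plane area in place of the spherical area is the "very good approximation" mentioned in part (a): the relative error is of the order of the excess itself.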


7.6. Principal and Gaussian curvature of rotation surfaces Before introducing the pseudo-sphere, we need some facts about the curvature of general rotation surfaces. We take the graph of an arbitrary function y = f(x), and rotate it about the y-axis to produce a rotation surface in three dimensional space. The first principal curvature of a rotation surface, taken in the xy plane, is

\[ c_1 = \frac{y''}{(1 + y'^2)^{3/2}} \tag{7.24} \]

This is just the curvature of the graph y = f(x).

Recall that the perpendicular to the tangent of a curve is called the normal of the curve. The second principal curvature occurs for a section of the surface by a plane P2, which intersects the xy plane along the normal of the curve y = f(x), and is perpendicular to the xy plane. The second principal curvature is

\[ c_2 = \frac{y'}{x(1 + y'^2)^{1/2}} \tag{7.25} \]

Proposition 7.4. The Gaussian curvature of the rotation surface produced by rotating the graph of y = f(x) around the y-axis, is the product

\[ K = \frac{y'\,y''}{x(1 + y'^2)^2} \tag{7.26} \]

The formula (7.24) for the curvature of a plane curve is standard. Finally, since K = c1c2, formulas (7.24) and (7.25) imply the claim (7.26).

Here is an argument to justify (7.25): Let tan β = y′ be the tangent of the slope angle for y = f(x), as usual. Calculate sin β. Calculate the hypotenuse AB of the right △ABC, with vertex A = (x, y) on the curve, leg AC parallel to the x-axis, leg BC on the axis of rotation, and hypotenuse AB perpendicular to the curve. One can show that point B is the center of the best approximating circle in the plane P2. Hence $c_2 = \frac{1}{\overline{AB}}$. Use this idea to get the main curvature c2.

Answer. $\tan\beta = \frac{\overline{AC}}{\overline{BC}} = y'$. Hence

\[ \sin\beta = \frac{\overline{AC}}{\overline{AB}} = \frac{\overline{AC}}{\sqrt{\overline{BC}^2 + \overline{AC}^2}} = \frac{y'}{\sqrt{1 + y'^2}} \]

\[ c_2 = \frac{1}{\overline{AB}} = \frac{\sin\beta}{x} = \frac{y'}{x(1 + y'^2)^{1/2}} \]

A second proof of Proposition 7.4. On the surface of rotation, we choose as parameters u = φ, the rotation angle, and v = r, the distance from the rotation axis. Since the y-axis is the axis of rotation, the surface of rotation gets the parametric representation

\[ x = v\cos u, \qquad y = f(v), \qquad z = v\sin u \]

The derivatives by the parameters are

\[ \begin{aligned} x_u &= -v\sin u & x_v &= \cos u \\ y_u &= 0 & y_v &= f'(v) \\ z_u &= v\cos u & z_v &= \sin u \end{aligned} \tag{7.27} \]


Figure 7.1. Curvature of a rotation surface.

From these derivatives, one gets the first fundamental form. We use the general formulas

\[ \begin{aligned} E &= x_u^2 + y_u^2 + z_u^2 \\ F &= x_ux_v + y_uy_v + z_uz_v \\ G &= x_v^2 + y_v^2 + z_v^2 \end{aligned} \tag{7.3} \]

valid for any surface, and specialize to the surface of rotation given above. Now calculate E, F, G from (7.3) and (7.27), to get the first fundamental form (7.2). Next get the root of the determinant $H = \sqrt{EG - F^2}$. Finally calculate the Gaussian curvature from the characteristic equation (7.5).

Problem 7.10. Use the approach as indicated to confirm formula (7.26).

Answer. One gets E = v2 , F = 0 and G = 1 + f ′2 and

ds2 = v2du2 + (1 + f ′2)dv2 = r2dφ2 + (1 + f ′(r)2)dr2

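The root of the determinant is $H = v\sqrt{1 + f'^2}$. Because all four quantities E, F, G, H depend only on v, the partial derivatives by u all vanish. Thus Gauss’ characteristic equation (7.5) can be simplified to yield

\[ \begin{aligned} K &= \frac{1}{2H}\frac{\partial}{\partial u}\left[\frac{F}{EH}\frac{\partial E}{\partial v} - \frac{1}{H}\frac{\partial G}{\partial u}\right] + \frac{1}{2H}\frac{\partial}{\partial v}\left[\frac{2}{H}\frac{\partial F}{\partial u} - \frac{1}{H}\frac{\partial E}{\partial v} - \frac{F}{EH}\frac{\partial E}{\partial u}\right] \\ &= \frac{1}{2H}\frac{\partial}{\partial v}\left[-\frac{1}{H}\frac{\partial E}{\partial v}\right] = -\frac{1}{2H}\frac{d}{dv}\left[\frac{2v}{H}\right] \end{aligned} \]

Now use $H = v(1 + f'^2)^{1/2}$ and go on. We arrive at

\[ \begin{aligned} K &= -\frac{1}{2H}\frac{d}{dv}\left[\frac{2v}{H}\right] = -\frac{1}{v\sqrt{1 + f'^2}}\,\frac{d\,(1 + f'^2)^{-1/2}}{dv} \\ &= \frac{1}{2v\sqrt{1 + f'^2}}\left[(1 + f'^2)^{-3/2}\right]2f'f'' = \frac{f'f''}{v(1 + f'^2)^2} \end{aligned} \]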

This result is equivalent to formula (7.26), since x = v = r is the distance from the axis of rotation and y = f(x).

Problem 7.11. Calculate the Gaussian curvature of a three dimensional sphere of radius a.

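Answer. The sphere is produced by rotating the graph of x² + y² = a² about the y-axis. Implicit differentiation yields 2x + 2yy′ = 0 and hence

\[ \begin{aligned} y' &= -\frac{x}{y} \\ y'' &= -\frac{1\cdot y - xy'}{y^2} = \frac{xy' - y}{y^2} = \frac{-x^2/y - y}{y^2} = -\frac{a^2}{y^3} \\ K &= \frac{y'y''}{x(1 + y'^2)^2} = \frac{a^2}{y^4(1 + x^2/y^2)^2} = \frac{a^2}{(x^2 + y^2)^2} = \frac{1}{a^2} \end{aligned} \]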
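Formula (7.26) lends itself to a quick numerical test. The following minimal sketch approximates y′ and y″ by central differences and evaluates K for the upper half of the circle x² + y² = a²; the radius a = 2, the sample points and the step size are arbitrary choices made here.

```python
import math

def gaussian_curvature(f, x, h=1e-4):
    """K = y' y'' / (x (1 + y'^2)^2) for the surface obtained by rotating y = f(x)
    about the y-axis, with derivatives from central differences."""
    y1 = (f(x + h) - f(x - h)) / (2.0 * h)
    y2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return y1 * y2 / (x * (1.0 + y1 * y1) ** 2)

a = 2.0

def upper_circle(x):
    """Profile curve of the sphere of radius a."""
    return math.sqrt(a * a - x * x)

K_values = [gaussian_curvature(upper_circle, x) for x in (0.5, 1.0, 1.5)]
print(K_values)   # all values should be close to 1/a^2
```

The same function can be applied to any other smooth profile curve y = f(x) with x > 0.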

7.7. The pseudo-sphere The issue is now to find a rotation surface of constant negativeGaussian curvature K = −a−2. Such a surface is called pseudo-sphere.

Problem 7.12. Use the formula (7.26) for the Gaussian curvature, and get a differential equation of first order for the function u := y′². You may begin by getting the derivative $\frac{du}{dx}$.

Answer. The derivative of the function u := y′² is $\frac{du}{dx} = 2y'y''$. Next, I put the requirement K = −a⁻² into the formula (7.26). One gets

\[ \frac{y'\,y''}{x(1 + y'^2)^2} = -\frac{1}{a^2} \tag{7.28} \]

\[ 2y'\,y'' = -\frac{2x(1 + y'^2)^2}{a^2} \tag{7.29} \]

\[ \frac{du}{dx} = -\frac{2x(1 + u)^2}{a^2} \tag{7.30} \]

Problem 7.13. Solve the differential equation

\[ \frac{du}{dx} = -\frac{2x(1 + u)^2}{a^2} \tag{7.30} \]

by separation of variables. For simplicity, we use the initial data u(a) = 0, and get a curve through the point x = a, u = 0.


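Answer. With the initial data u(a) = 0, the variable x runs from a to x while u runs from 0 to u.

\[ \begin{aligned} \frac{du}{dx} &= -\frac{2x(1 + u)^2}{a^2} \\ \frac{du}{(1 + u)^2} &= -\frac{2x\,dx}{a^2} \\ \int_0^u\frac{du}{(1 + u)^2} &= -\int_a^x\frac{2x\,dx}{a^2} \\ \left[-\frac{1}{1 + u}\right]_0^u &= \left[-\frac{x^2}{a^2}\right]_a^x \\ -\frac{1}{1 + u} + 1 &= -\frac{x^2}{a^2} + 1 \\ 1 + u &= \frac{a^2}{x^2} \\ u &= \frac{a^2 - x^2}{x^2} \end{aligned} \]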
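As a check, one may insert the solution u = (a² − x²)/x² back into the differential equation (7.30). The sketch below does this numerically with a central difference; the value a = 2 and the sample points are arbitrary choices made here.

```python
a = 2.0

def u(x):
    """Claimed solution u(x) = (a^2 - x^2) / x^2 with u(a) = 0."""
    return (a * a - x * x) / (x * x)

def right_hand_side(x):
    """Right hand side of (7.30): -2 x (1 + u)^2 / a^2."""
    return -2.0 * x * (1.0 + u(x)) ** 2 / (a * a)

def du_dx(x, h=1e-6):
    """Central difference approximation of du/dx."""
    return (u(x + h) - u(x - h)) / (2.0 * h)

residuals = [abs(du_dx(x) - right_hand_side(x)) for x in (0.5, 1.0, 1.5)]
initial_value = u(a)
print(residuals, initial_value)
```

The residuals are at the level of the discretization error, and the initial value u(a) = 0 is met exactly.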

Problem 7.14. Check that the differential equation

$$y' = -\frac{\sqrt{a^2 - x^2}}{x}$$

with $0 < x \le a$ has the solution

$$y = a\ln\frac{a + \sqrt{a^2 - x^2}}{x} - \sqrt{a^2 - x^2} + C \tag{7.31}$$

Also, find the general solution of the equation

$$y' = +\frac{\sqrt{a^2 - x^2}}{x}$$

Answer.

$$y = a\ln\frac{a - \sqrt{a^2 - x^2}}{x} + \sqrt{a^2 - x^2} + C$$

Definition 7.1 (Pseudo-sphere). The rotation surface with constant negative Gaussian curvature is called a pseudo-sphere. With curvature $K = -a^{-2}$ and the $y$-axis as axis of rotation, its equation is

$$y = a\ln\frac{a + \sqrt{a^2 - x^2 - z^2}}{\sqrt{x^2 + z^2}} - \sqrt{a^2 - x^2 - z^2}$$

Problem 7.15. Check, once more, that the Gaussian curvature of the specified surface is K = −a−2.

Answer.
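One possible verification, sketched here, inserts the profile curve (7.31) into the curvature formula $K = y'y''/(x(1+y'^2)^2)$ used in Problems 7.11 and 7.12. Here $x$ stands for the distance $\sqrt{x^2+z^2}$ from the axis of rotation, which is an interpretation of the definition above rather than a separate result.

```latex
\begin{align*}
y' &= -\frac{\sqrt{a^2-x^2}}{x}, \qquad 1 + y'^2 = \frac{a^2}{x^2},\\
y'' &= \frac{1}{\sqrt{a^2-x^2}} + \frac{\sqrt{a^2-x^2}}{x^2}
     = \frac{a^2}{x^2\sqrt{a^2-x^2}},\\
K &= \frac{y'y''}{x(1+y'^2)^2}
   = -\frac{a^2}{x^3}\cdot\frac{x^3}{a^4}
   = -\frac{1}{a^2}.
\end{align*}
```

The value is independent of $x$, as required for a surface of constant curvature.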

Problem 7.16. Check the following fact: the segment on the tangent to the curve (7.31), between the touching point $T$ and the intersection $S$ of the tangent with the $y$-axis, always has the same length $a$. For that reason, the curve (7.31) is called a tractrix.


Figure 7.2. The tractrix has a segment on its tangent of constant length.

Answer. Take the right triangle $\triangle TSC$, formed by the segment $TS$ on the tangent at point $T$ and the perpendicular from $T$ onto the $y$-axis.¹ We know that

$$y' = \tan\alpha = \frac{SC}{TC}$$

$$TS^2 = TC^2 + SC^2 = x^2(1 + y'^2) = a^2$$
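The constant tangent length can be observed numerically. The sketch below assumes the tractrix (7.31) with $C = 0$ and approximates the slope by a central difference; the function names and sample values are hypothetical and serve only as an illustration.

```python
import math

def tractrix(x, a):
    # Formula (7.31) with C = 0, valid for 0 < x <= a
    s = math.sqrt(a * a - x * x)
    return a * math.log((a + s) / x) - s

def tangent_segment_length(x, a, h=1e-6):
    """Length of the tangent segment from T = (x, y(x)) to the point S
    where the tangent line meets the y-axis."""
    slope = (tractrix(x + h, a) - tractrix(x - h, a)) / (2 * h)
    y_t = tractrix(x, a)
    y_s = y_t - slope * x            # tangent line evaluated at x = 0
    return math.hypot(x, y_t - y_s)

a = 3.0  # sample parameter (assumption)
lengths = [tangent_segment_length(x, a) for x in (0.5, 1.0, 2.0, 2.9)]
print(lengths)
```

All printed lengths should be close to $a$, whatever the position of the touching point $T$.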

Problem 7.17. The surface area of a rotation surface, made by rotating $y = f(x)$ about the $y$-axis, for $x_1 \le x \le x_2$ is

$$S = \int_{x_1}^{x_2} 2\pi x\sqrt{1 + y'^2}\,dx$$

Calculate the surface area of the pseudo-sphere for bounds $0 < x \le a$.

Answer. Because $1 + y'^2 = 1 + u = \frac{a^2}{x^2}$, we get

$$S = \int_0^a 2\pi x\,\frac{a}{x}\,dx = 2\pi a^2$$

¹ This triangle is different from triangle $\triangle ABC$ in Figure 7.1.

We now introduce $(\varphi, r)$ as two convenient coordinates on the pseudo-sphere. As first coordinate, we choose the angle of rotation $\varphi$ about the $y$-axis. The second coordinate is the radius $r = \sqrt{x^2 + z^2}$ measured from the axis of rotation. Up to now it was called $x$, but from here on I name it $r$. The three parameters $r, \varphi, y$ are cylindrical coordinates of three-dimensional space. The first two of them are convenient parameters on the pseudo-sphere.

Proposition 7.5 (Riemann Metric for the Pseudo Sphere). The infinitesimal distance $ds$ of points with coordinates $(\varphi, r)$ and $(\varphi + d\varphi, r + dr)$ is

$$ds^2 = r^2\,d\varphi^2 + \frac{a^2}{r^2}\,dr^2 \tag{7.32}$$

Proof. The distance on the pseudo-sphere is calculated from the usual Euclidean distance for points of the three-dimensional space into which the surface is embedded. At first, I convert the distance from Cartesian to cylindrical coordinates. Because the $y$-axis is the rotation axis, its coordinate stays, but the pair $(x, z)$ is converted to polar coordinates. Hence one gets

$$ds^2 = dx^2 + dy^2 + dz^2 = dr^2 + r^2\,d\varphi^2 + dy^2 \tag{7.33}$$

We restrict to points on the pseudo-sphere. Hence the coordinates $r$ and $y$ are related in the same way as $x$ and $y$ before. Thus $y' = -\frac{\sqrt{a^2 - x^2}}{x}$ becomes

$$\frac{dy}{dr} = -\frac{\sqrt{a^2 - r^2}}{r} \tag{7.34}$$

Now we use (7.34) to eliminate $y$ from (7.33) and get

$$ds^2 = dr^2 + r^2\,d\varphi^2 + dy^2 = dr^2 + r^2\,d\varphi^2 + \left(\frac{dy}{dr}\right)^2 dr^2 = dr^2 + r^2\,d\varphi^2 + \frac{a^2 - r^2}{r^2}\,dr^2 = r^2\,d\varphi^2 + \frac{a^2}{r^2}\,dr^2$$

as was to be shown. As an alternative, we can use the first fundamental form calculated above. Since $1 + f'(r)^2 = 1 + \frac{a^2 - r^2}{r^2} = \frac{a^2}{r^2}$, one gets again

$$ds^2 = r^2\,d\varphi^2 + (1 + f'(r)^2)\,dr^2 = r^2\,d\varphi^2 + \frac{a^2}{r^2}\,dr^2$$

7.8. Poincaré half-plane and Poincaré disk Throughout, we denote the upper open half-plane by $H = \{(u, v) : v > 0\}$. Its boundary is just the real axis $\partial H = \{(u, v) : v = 0\}$. The open unit disk is denoted by $D = \{z = x + iy : x^2 + y^2 < 1\}$, and its boundary is $\partial D = \{z = x + iy : x^2 + y^2 = 1\}$. The following isometric mapping of the half-plane to the disk is used in this section. It differs from the one used in the previous section by a rotation of the disk by a right angle. We repeat it for convenience.

Proposition 7.6 (Isometric Mapping of the Half-plane to the Disk). The linear fractional function

$$z = \frac{iw + 1}{w + i} \tag{7.35}$$

is a conformal mapping and a bijection from $\mathbb{C} \cup \{\infty\}$ to $\mathbb{C} \cup \{\infty\}$. The inverse mapping is

$$w = \frac{1 - iz}{z - i} \tag{7.36}$$


These mappings preserve angles, the cross ratio, and the orientation, and they map generalized circles to generalized circles.

The upper half-plane $H = \{w = u + iv : v > 0\}$ is mapped onto the unit disk $D = \{z = x + iy : x^2 + y^2 < 1\}$. In particular,

$$w = 0 \mapsto z = -i, \quad w = 1 \mapsto z = 1, \quad w = \infty \mapsto z = i, \quad w = i \mapsto z = 0$$
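The listed values and the inverse formula can be spot-checked with complex arithmetic. This is a small sketch using Python's built-in complex numbers; the function names and the sample points are illustrative assumptions, and the point $w = \infty$ is left out because it has no floating-point representation.

```python
def half_plane_to_disk(w):
    # Formula (7.35): z = (i w + 1) / (w + i)
    return (1j * w + 1) / (w + 1j)

def disk_to_half_plane(z):
    # Formula (7.36): w = (1 - i z) / (z - i)
    return (1 - 1j * z) / (z - 1j)

# Sample points of the upper half-plane (assumption: any v > 0 would do).
samples = [0.3 + 2j, -5 + 0.01j, 1j, 100 + 3j]

for w in samples:
    z = half_plane_to_disk(w)
    # |z| should be below 1, and the round trip should return w.
    print(w, abs(z), abs(disk_to_half_plane(z) - w))
```

Each sample should land strictly inside the unit disk, and the round trip through (7.36) should return the starting point up to rounding.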

Proposition 7.7 (Riemann Metric for Poincaré's half-plane). In the Poincaré half-plane, the infinitesimal hyperbolic distance $ds$ of points with coordinates $(u, v)$ and $(u + du, v + dv)$ is

$$(ds_H)^2 = \frac{du^2 + dv^2}{v^2} \tag{7.37}$$

The mapping (7.35) provides an isometry between the half-plane and the disk:

$$ds_D = ds_H \tag{4.5}$$

Proof. The metric of the half-plane is calculated from the known metric

$$(ds_D)^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2} \tag{7.14}$$

of the Poincaré disk model. The mapping

$$z = \frac{iw + 1}{w + i} \tag{7.35}$$

provides an isometry from the half-plane to the disk. The denominator is

$$1 - |z|^2 = \frac{(w + i)(\bar{w} - i) - (iw + 1)(-i\bar{w} + 1)}{|w + i|^2} = \frac{2i\bar{w} - 2iw}{|w + i|^2} = \frac{4v}{|w + i|^2}$$

The derivative of the mapping (7.35) is

$$\frac{dz}{dw} = -\frac{2}{(w + i)^2}$$

Putting the last two formulas into (7.14) yields

$$ds^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2} = \frac{4|dz|^2}{(1 - |z|^2)^2} = 4\left|\frac{dz}{dw}\right|^2 |dw|^2 \left(\frac{|w + i|^2}{4v}\right)^2 = 4\left|\frac{2}{(w + i)^2}\right|^2 |dw|^2 \left(\frac{|w + i|^2}{4v}\right)^2 = \frac{|dw|^2}{v^2} = \frac{du^2 + dv^2}{v^2}$$

Hence formula (7.37) arises from the isometry (7.35) between the half-plane and the disk.
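The pull-back computed in the proof can be spot-checked numerically: inserting $z(w)$ and $dz/dw$ into (7.14) should give the scale factor $1/v$ of (7.37). The sketch below assumes only the two formulas from the proof; the function names and sample points are hypothetical.

```python
def half_plane_to_disk(w):
    # Formula (7.35)
    return (1j * w + 1) / (w + 1j)

def pulled_back_scale(w):
    """Scale factor ds_D / |dw| obtained by inserting z(w) into (7.14)."""
    z = half_plane_to_disk(w)
    dz_dw = -2 / (w + 1j) ** 2       # derivative of (7.35)
    return 2 * abs(dz_dw) / (1 - abs(z) ** 2)

# Sample points w = u + i v with v > 0 (assumption).
points = [0.3 + 0.5j, -2 + 1j, 4 + 0.1j]
for w in points:
    # The two printed scale factors should coincide: ds_D/|dw| and 1/v.
    print(w, pulled_back_scale(w), 1 / w.imag)
```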

7.9. Embedding the pseudo-sphere into Poincaré’s half-plane

Proposition 7.8. The mapping

$$w = \varphi + i\,\frac{a}{r} \tag{7.38}$$

transforms the line element $ds_{PS}$ of the pseudo-sphere to the line element $ds_H$ of the half-plane such that

$$ds_{PS} = a\,(ds_H) \tag{7.39}$$

For $a = 1$, we get an isometry. This is just the case with Gaussian curvature $K = -a^{-2} = -1$. Because an isometry conserves the Gaussian curvature, this shows that the Poincaré half-plane has Gaussian curvature $-1$.


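Proof. We separate equation (7.38) into its real and imaginary parts to get

$$u = \varphi, \quad v = \frac{a}{r} \tag{7.40}$$

Using the derivatives, we plug into

$$ds_H^2 = \frac{du^2 + dv^2}{v^2} \tag{7.37}$$

One gets

$$ds_H^2 = \frac{d\varphi^2 + \left(-ar^{-2}\right)^2 dr^2}{a^2 r^{-2}} = a^{-2}\left(r^2\,d\varphi^2 + \frac{a^2}{r^2}\,dr^2\right)$$

Now comparing with

$$ds_{PS}^2 = r^2\,d\varphi^2 + \frac{a^2}{r^2}\,dr^2 \tag{7.32}$$

one concludes

$$ds_H^2 = a^{-2}(ds_{PS})^2$$

and hence equation (7.39) holds.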
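Relation (7.39) can also be tested directly against the embedding in three-dimensional space. In the sketch below, two nearby points of the pseudo-sphere are embedded via the tractrix (7.31) with $C = 0$, and their Euclidean distance is compared with $a$ times the half-plane line element (7.37). The function names, the sample point, and the small displacement are assumptions, and the agreement holds only to first order in the displacement.

```python
import math

def embed(phi, r, a):
    """Point of the pseudo-sphere in R^3; the y-axis is the rotation axis."""
    s = math.sqrt(a * a - r * r)
    y = a * math.log((a + s) / r) - s      # tractrix (7.31) with C = 0
    return (r * math.cos(phi), y, r * math.sin(phi))

def half_plane_element(phi, r, dphi, dr, a):
    """ds_H from (7.37) for the displacement induced by w = phi + i a / r."""
    v = a / r
    du, dv = dphi, -a / r**2 * dr
    return math.sqrt(du * du + dv * dv) / v

a, phi, r = 2.0, 0.7, 1.2                  # sample values (assumption)
dphi, dr = 1e-5, 2e-5                      # small displacement (assumption)
p = embed(phi, r, a)
q = embed(phi + dphi, r + dr, a)
chord = math.dist(p, q)                    # Euclidean distance in R^3
print(chord, a * half_plane_element(phi, r, dphi, dr, a))
```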

7.10. Embedding the pseudo-sphere into Poincaré's disk The next goal is to construct an isometric mapping from the pseudo-sphere to the Poincaré disk. It is convenient to get this mapping as a composition of a mapping from the pseudo-sphere to the half-plane and the conformal mapping from the half-plane to the disk.

Proposition 7.9. We take a pseudo-sphere with $a = 1$. This normalizes the Gaussian curvature to be $K = -1$. The mapping

$$z = \frac{r - 1 + ir\varphi}{r\varphi + i(r + 1)} \tag{7.41}$$

maps the pseudo-sphere isometrically into the Poincaré disk.

Proof. The mapping (7.41) is constructed as a composition of two mappings $PS \mapsto H \mapsto D$. Take the mapping $PS \mapsto H$ given by equation (7.42), and the mapping $H \mapsto D$ given by equation (7.35). The composition of the mapping

$$w = \varphi + \frac{i}{r} \tag{7.42}$$

from the pseudo-sphere to the Poincaré half-plane with the mapping

$$z = \frac{iw + 1}{w + i} \tag{7.35}$$

from the Poincaré half-plane to the Poincaré disk is the required mapping

$$z = \frac{i\left(\varphi + \frac{i}{r}\right) + 1}{\varphi + \frac{i}{r} + i} = \frac{r - 1 + ir\varphi}{r\varphi + i(r + 1)}$$

from the pseudo-sphere to the Poincaré disk. Both mappings (7.42) and (7.35) are isometries, as stated by formula (7.39) with $a = 1$ and formula (??). Hence their composition (7.41) conserves the line element: $ds_{PS} = ds_H = ds_D$.
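The algebra of the composition can be confirmed numerically. This sketch evaluates (7.42) followed by (7.35) and compares the result with the closed form (7.41); the function names and the sample coordinates $(\varphi, r)$ are illustrative assumptions.

```python
def pseudo_sphere_to_half_plane(phi, r):
    # Formula (7.42), the case a = 1 of (7.38)
    return phi + 1j / r

def half_plane_to_disk(w):
    # Formula (7.35)
    return (1j * w + 1) / (w + 1j)

def pseudo_sphere_to_disk(phi, r):
    # Closed form (7.41)
    return (r - 1 + 1j * r * phi) / (r * phi + 1j * (r + 1))

# Sample coordinates with -pi <= phi < pi and 0 < r <= 1 (assumption).
samples = [(-3.0, 0.2), (0.0, 0.5), (1.3, 0.9), (3.1, 1.0)]
differences = []
for phi, r in samples:
    composed = half_plane_to_disk(pseudo_sphere_to_half_plane(phi, r))
    differences.append(abs(composed - pseudo_sphere_to_disk(phi, r)))
print(differences)
```

All differences should vanish up to rounding errors.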


7.11. About circle-like curves We now go back to the Poincaré disk model. At first, here are a few remarks about circle-like curves. In hyperbolic geometry, there exist three different types of circle-like curves. I define as a circle-like curve a curve which appears to be a circle in the Poincaré model.

Recall that $\partial D$ is the boundary circle of the Poincaré disk. Take any second circle $C$. I call its Euclidean center $M$ the quasi-center. The meaning of $C$ for the hyperbolic geometry of the Poincaré disk depends on the nature of the intersection of the two circles $C$ and $\partial D$. There are three important cases:

(i) The circle $C$ lies totally in the interior $D$. In that case, it is a circle for hyperbolic geometry. This circle has a center $A$ in hyperbolic geometry. Note that the quasi-center $M$ is different from the center $A$ of $C$ as an object of hyperbolic geometry.

(ii) The circle $C$ touches the boundary $\partial D$ from inside, say at endpoint $E$. In that case, it is a horocycle for hyperbolic geometry. A horocycle has no hyperbolic center; instead it contains an ideal point $E$. Hence it is unbounded. The hyperbolic circumference of a horocycle is infinite, as follows from part (c) below.

(iii) The circle $C$ intersects the boundary $\partial D$ at two endpoints $E$ and $F$. In that case, the circular arc inside the disk $D$ is either an equidistance line or a geodesic for hyperbolic geometry. A geodesic intersects $\partial D$ perpendicularly. In the case of non-perpendicular intersection of $\partial D$ and $C$, one gets an equidistance line. Actually all points of that equidistance line have the same distance from the hyperbolic straight line with ends $E$ and $F$.

Problem 7.18. Take points $Y_+ = (0, 1)$ and $O = (0, 0)$. Find the analytic equation for a horocycle $\mathcal{H}$ with apparent diameter $OY_+$.

Answer. In complex notation, point $Y_+$ is $i$. The quasi-center is $M = \frac{i}{2}$, and the apparent radius is $\frac{1}{2}$. Hence one gets the equation

$$\left|z - \frac{i}{2}\right|^2 = \frac{1}{4}$$

$$x^2 + \left(y - \frac{1}{2}\right)^2 - \frac{1}{4} = 0$$

$$x^2 + y(y - 1) = 0$$

We need another, more convenient parametric equation for the horocycle $\mathcal{H}$. Let $Z = (x, y)$ be any point on $\mathcal{H}$ and define the circumference angle $d = \angle OY_+Z$. Calculate $\tan d$ in terms of $(x, y)$. Then express $x$ and $y$ in terms of the central angle $2d = \angle OMZ$. Use double angle formulas, and finally express $x$ and $y$ in terms of $\tan d$.

Answer.

$$\tan d = \frac{x}{1 - y} = \frac{y}{x}$$

$$x = \frac{\sin 2d}{2} = \sin d\cos d = \frac{\tan d}{1 + \tan^2 d}, \qquad y = \frac{1 - \cos 2d}{2} = \sin^2 d = \frac{\tan^2 d}{1 + \tan^2 d} \tag{7.43}$$

Problem 7.19. Confirm that the hyperbolic arc length of the arc $OZ$ on the horocycle $\mathcal{H}$ is just $s = 2\tan d$.


Figure 7.3. Measuring an arc of a horocycle.

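Answer. Let $t = \tan d$. Differentiation yields

$$x = \frac{t}{1 + t^2}, \qquad \frac{dx}{dt} = \frac{1 - t^2}{(1 + t^2)^2}$$

$$y = \frac{t^2}{1 + t^2}, \qquad \frac{dy}{dt} = \frac{2t}{(1 + t^2)^2}$$

$$1 - x^2 - y^2 = 1 - y = \frac{1}{1 + t^2}$$

$$\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 = \frac{1}{(1 + t^2)^2}$$

Hence the hyperbolic metric (7.14) implies

$$\left(\frac{ds}{dt}\right)^2 = 4(1 - x^2 - y^2)^{-2}\left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2\right] = 4$$

Hence by elementary integration $s = 2t = 2\tan d$.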
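The arc length can be approximated without any differentiation by summing short chords weighted with the factor of the disk metric (7.14). This is a minimal sketch under that assumption; the function names, the end value of $t$, and the number of steps are illustrative choices.

```python
import math

def horocycle_point(t):
    # Parametrization (7.43) with t = tan d
    return (t / (1 + t * t), t * t / (1 + t * t))

def hyperbolic_length(t_end, steps=20000):
    """Approximate the hyperbolic length of the arc from t = 0 to t = t_end
    by summing Euclidean chords weighted with the factor 2 / (1 - x^2 - y^2)
    of the disk metric (7.14), evaluated at the midpoint parameter."""
    total = 0.0
    h = t_end / steps
    for k in range(steps):
        x0, y0 = horocycle_point(k * h)
        x1, y1 = horocycle_point((k + 1) * h)
        xm, ym = horocycle_point((k + 0.5) * h)
        total += 2 * math.hypot(x1 - x0, y1 - y0) / (1 - xm * xm - ym * ym)
    return total

t_end = 1.5   # sample value of tan d (assumption)
approx = hyperbolic_length(t_end)
print(approx, 2 * t_end)
```

The numerical length should approach $2t = 2\tan d$ as the number of steps grows.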

Problem 7.20. Give the representation of the horocycle $\mathcal{H}$ with this arc length as parameter. Explain in a drawing how to measure the arc length on this horocycle.

Answer. We get the parametrization

$$x = \frac{2s}{4 + s^2}, \qquad y = \frac{s^2}{4 + s^2} \tag{7.44}$$

The hyperbolic arc length of $OZ$ on the horocycle $\mathcal{H}$ is the Euclidean length $|Y_-Z'|$, since $s = 2\tan d = |Y_-Z'|$.


Figure 7.4. Isometry of the sliced pseudo-sphere to a half infinity strip.

7.12. Mapping the boundaries There cannot exist an isometry between the pseudo-sphere and the half-plane, since they have different topologies.

A corresponding problem already arises in Euclidean geometry, for the construction of an isometry between the cylinder and the plane. At least, there exists a non-invertible homomorphism from the plane onto the cylinder. This homomorphism can be restricted to an isomorphism between a strip of the plane and the sliced cylinder.

We return to the hyperbolic case. By slicing the pseudo-sphere, we get an isomorphism of the sliced pseudo-sphere into a strip of the half-plane, and furthermore into part of the disk, too.

The pseudo-sphere is sliced along the geodesic in the negative $(x, y)$-plane, restricting the rotation angle to the half-open interval $-\pi \le \varphi < \pi$. The mapping

$$w = \varphi + \frac{i}{r} \tag{7.42}$$

maps the sliced pseudo-sphere onto a half-open rectangular domain $PS_H$ in the upper half-plane. The boundary of $PS_H$ consists of a segment $AB$ with the endpoints $A = -\pi + i$, $B = \pi + i$, and two unbounded rays $\overrightarrow{A\infty}$ and $\overrightarrow{B\infty}$ with vertices $A$ and $B$, parallel to the positive $v$-axis. Furthermore, we map the sliced pseudo-sphere to the Poincaré disk via the isometry (7.41). The image $PS_D$ of the pseudo-sphere is a part of the interior of the horocycle $\mathcal{H}$ with apparent diameter from $0$ to $i$.

Problem 7.21. On the pseudo-sphere, we use as parameters the cylindrical coordinates $r$ and $\varphi$. The boundary of the sliced pseudo-sphere is given by $r = 1$ and $-\pi < \varphi < \pi$. To which curve in the disk $D$ is the boundary mapped by the isometry (7.41)?

Answer. The boundary $\partial PS_D$ of the image $PS_D$ consists of three circular arcs: a segment $AB$ of a horocycle $\mathcal{H}$ with endpoints

$$A = \frac{i\pi}{\pi + 2i}, \qquad B = \frac{i\pi}{\pi - 2i}$$

as well as two geodesic rays $\overrightarrow{AY_+}$ and $\overrightarrow{BY_+}$ with vertices $A$ and $B$, pointing to the ideal endpoint $Y_+ = i$.


Figure 7.5. Isometric image of the sliced pseudo-sphere in the Poincaré disk.

Problem 7.22. Give a parametric equation for the boundary, with parameter $\varphi$, at first in complex notation for $z = x + iy$. Then separate into real and imaginary parts to get equations for $x$ and $y$.

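Answer. Simply put $r = 1$ into equation (7.41). One gets

$$z = \frac{i\varphi}{\varphi + 2i}$$

To separate real and imaginary parts, one needs to make the denominator real:

$$z = \frac{i\varphi}{\varphi + 2i} = \frac{i\varphi(\varphi - 2i)}{(\varphi + 2i)(\varphi - 2i)} \tag{7.45}$$

$$x + iy = \frac{i\varphi^2 + 2\varphi}{\varphi^2 + 4} \tag{7.46}$$

$$x = \frac{2\varphi}{\varphi^2 + 4} \tag{7.47}$$

$$y = \frac{\varphi^2}{\varphi^2 + 4} \tag{7.48}$$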

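Problem 7.23. Check that your parametric equation is a circle with center $\frac{i}{2}$.

Answer. This is a parametric equation of a circle with center $\frac{i}{2}$ because

$$z - \frac{i}{2} = \frac{i\varphi}{\varphi + 2i} - \frac{i}{2} = \frac{i(\varphi - 2i)}{2(\varphi + 2i)}$$

$$\left|z - \frac{i}{2}\right| = \frac{1}{2}$$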
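A quick numerical confirmation of the last two answers: the sketch below evaluates the boundary parametrization (7.45), compares it with the separated form (7.47), (7.48), and measures the distance to the quasi-center $\frac{i}{2}$. The function name and the sample angles are assumptions for illustration.

```python
def boundary_point(phi):
    # Formula (7.45): the isometry (7.41) evaluated on the rim r = 1
    return 1j * phi / (phi + 2j)

angles = (-3.14, -1.0, 0.0, 0.5, 3.14)   # sample values of phi (assumption)
rows = []
for phi in angles:
    z = boundary_point(phi)
    x = 2 * phi / (phi**2 + 4)           # formula (7.47)
    y = phi**2 / (phi**2 + 4)            # formula (7.48)
    # Store the mismatch with (7.47), (7.48) and the distance to i/2.
    rows.append((abs(z - complex(x, y)), abs(z - 0.5j)))
print(rows)
```

In every row the first entry should vanish up to rounding, and the second should equal the apparent radius $\frac{1}{2}$.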

(d) Compare (7.46) with the result

$$x = \frac{\tan d}{1 + \tan^2 d}, \qquad y = \frac{\tan^2 d}{1 + \tan^2 d}$$

and check that $\varphi = 2\tan d$. Hence, because $s = 2\tan d$ was shown, one concludes that $\varphi = s = 2\tan d$. Of course $\varphi = s$ follows directly because of the isometries $PS \mapsto H \mapsto D$.

Problem 7.24. Draw sketches of the pseudo-sphere as it appears in the domains $PS \mapsto H \mapsto D$. Use different colors for the different vertices and edges of the boundary, but the same colors for corresponding objects in all three domains $PS \mapsto H \mapsto D$, as they are mapped by our isometries from above.

Problem 7.25. The Poincaré disk can be tiled with congruent triangles. Indeed, there exist infinitely many different types of such tilings. I choose a tiling with congruent equilateral triangles, such that at each vertex seven triangles meet. Use Gauss' remarkable theorem to calculate the hyperbolic area of one such triangle.

Answer. I measure angles in radians. The angles of one triangle are all

$$\alpha = \beta = \gamma = \frac{2\pi}{7}$$

and hence the excess of the angle sum is

$$\alpha + \beta + \gamma - \pi = \frac{6\pi}{7} - \pi = -\frac{\pi}{7}$$

Since the Gaussian curvature of Poincaré's model is $K = -1$, the area is just the negative of the excess (this is also called the defect), and is $\frac{\pi}{7}$.

Problem 7.26. Overlay two drawings: the tiling by equilateral triangles, and a drawing of the image $PS_D$ of the pseudo-sphere and its boundary $\partial PS_D$.

How many of those triangles of the tiling fit entirely into $PS_D$? How many triangles make up (with bits and pieces!) the total area of $PS_D$?

Answer. Using Problem 7.17 with $a = 1$, one calculates that the total area of the pseudo-sphere is $2\pi$. This equals the area of 14 triangles.
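The count of triangles follows from exact arithmetic. The sketch below works in units of $\pi$ with rational numbers and assumes the relation $K \cdot \text{area} = (\text{angle sum}) - \pi$ for a geodesic triangle on a surface of constant curvature $K$; the helper name `triangle_area_over_pi` is hypothetical.

```python
from fractions import Fraction

def triangle_area_over_pi(angles_over_pi, curvature=-1):
    """Area / pi of a geodesic triangle on a surface of constant curvature,
    using K * area = (angle sum) - pi."""
    excess = sum(angles_over_pi) - 1     # angle excess in units of pi
    return excess / curvature

tile = triangle_area_over_pi([Fraction(2, 7)] * 3)   # angles 2*pi/7 each
pseudo_sphere = Fraction(2)                          # area 2*pi for a = 1
print(tile, pseudo_sphere / tile)
```

The first number is the tile area divided by $\pi$, the second is the number of tiles with the same total area as the pseudo-sphere.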


List of Figures

2.1 Aberration.
2.2 Parallax.
2.3 Aberration of the rain, and the light.
2.4 The one-dimensional Doppler effect.
2.5 The principal setup of Compton's experiment.
2.6 The kinematics of the collision of a photon with an electron, initially at rest.
7.1 Curvature of a rotation surface.
7.2 The tractrix has a segment on its tangent of constant length.
7.3 Measuring an arc of a horocycle.
7.4 Isometry of the sliced pseudo-sphere to a half infinity strip.
7.5 Isometric image of the sliced pseudo-sphere in the Poincaré disk.