
Online Submission ID: 010

Inverse Transient Light Transport via Time-Images

Abstract

We show that multi-path analysis of a photo from a time-of-flight (ToF) camera provides a tantalizing opportunity to infer the geometry and reflectance of not only visible but also hidden parts of a scene. We provide a framework for analyzing global light transport from just a single viewpoint using cameras which capture a temporal profile for each pixel at picosecond resolution. Unlike traditional cameras, which estimate an intensity per pixel, I(x, y), these ultra-short-sampling cameras capture a time-image I(x, y, t). This time-image encodes the global dynamical interactions of light with the scene elements.

In this paper we model the dynamics of global illumination using transient properties of light transport formulated as a linear time-invariant state space system. We also propose formulations and algorithms that use the time-image I(x, y, t) for reasoning about scene content in scenarios where transient reasoning exposes scene properties that are beyond the reach of traditional computer vision. In particular, we present a system identification algorithm for inferring depth and scene structure, as well as for measuring the 4-dimensional bidirectional reflectance distribution function (BRDF) of the scene elements, using transient global light transport. We conclude by presenting inverse rendering results that combine the inferred scene properties with traditional global illumination rendering.

Keywords: Light transport, Global illumination, Multi-path analysis, Inverse problems, Inverse rendering, Computational Imaging

1. Introduction

Cameras are an integral part of computer graphics and vision research, and 2-dimensional intensity images I(x, y) have long been used to observe and interpret scenes. The recovered digital models have included properties such as 3D geometry, scene lighting, and surface reflectance. These have found applications in robotics, industrial sensing, security, and user interfaces. New sensors and methods for interpreting scenes will clearly be of benefit in many application areas. This paper introduces a framework for reasoning about transient light transport and shows that it can allow new properties of scenes to be observed and interpreted.

During the exposure time, the incoming light arriving at a pixel is integrated along the angular, temporal and wavelength dimensions to record a single intensity value. Many distinct scenes result in identical projections, and thus identical recorded intensity values on the sensor. It is therefore challenging to estimate scene properties that are not directly observable, such as the BRDF of scene elements, the depth in a multi-reflective scene, or the overall scale of a scene. It is possible to sample other dimensions: 4D lightfield cameras sample both the incoming angle of light and its spatial location, and they have paved the way for powerful algorithms that perform many unconventional tasks. We show that capturing the time variation of incoming light similarly provides powerful tools.

Steady-state and Transient Light Transport: Steady-state light transport assumes an equilibrium in global illumination due to an effectively infinite speed of light. In a room-sized environment, a microsecond exposure (integration) time is long enough for a light impulse to fully traverse all the possible multi-paths introduced by inter-reflections between scene elements and to reach equilibrium (steady) state. Traditional video cameras sample light very slowly compared to the time scale at which the transient properties of light come into play. Videos may be interpreted as a sequence of images of different but static worlds because the exposure time of each frame is sufficiently long.

In transient light transport, we assume that the speed of light is finite. As light scatters around a scene, it takes different paths, and longer paths take a longer time to traverse. Even a single pulse of light along a single ray-beam can evolve into a complex pattern in time, as shown in Figure ??. The ray impulse is a space-time impulse, and the resultant time-image I(x, y, t) records the pattern of photon arrival rates at a particular pixel. It is analogous to the impulse response function in the signal processing literature. We define the Space Time Impulse Response (STIR(S)) of a scene, S, as a five-dimensional function: for each outgoing 2D ray-impulse direction of illumination we record the 3D time-image. Unlike a traditional 2D pixel, which measures the number of photons, transient transport describes the rate of incoming photons as a function of time. The forward model for transient light transport, for direct and global illumination, is well understood in graphics, e.g., in ray tracing or photon mapping. In this paper, we emphasize the inverse transient light transport problem to enable novel scene understanding by inferring scene characteristics.

Our approach: This work proposes an imaging model which samples light on the picosecond scale, equivalent to light travel on the order of millimeters. At this scale, it is possible to reason about the individual paths light takes within a scene, enabling measurement of scene properties that are beyond the reach of traditional machine vision. We enable this reasoning with a theoretical formulation for transient global light transport.

The photon-bounce counting recursive linear systems used in graphics and vision are well-suited for steady state transport. However, analyzing dynamic time-based interactions requires a different approach. System identification (sysid) is the algorithmic procedure of building dynamical models from measured input/output data and estimating their parameters. We choose a 'grey box' model of sysid in which, although the specifics of the internals are only partially known, the physical laws governing light transport allow estimation of the unknown free parameters, such as the geometry and reflectance of visible as well as hidden scene elements.

In this paper, we show how one can invert the captured STIR into geometric and reflectance properties of the scene by discretizing it into patches. We perform this in two steps. First, we exploit information about onsets to estimate the geometry of patches with unknown reflectance. (Onsets are used in LIDAR to estimate the geometry of only visible and diffuse patches.) In the second stage, we use the state-space formulation to infer the irradiance from one patch to another, indirectly computing the reflectance of each patch. This allows reasoning about the general surface reflectance of visible as well as hidden patches.

Our work is deeply inspired by pioneering work in the analysis of light transport by Seitz, Matsushita and Kutulakos [2005], dual photography by Sen et al. [2005], and the separation of direct and global components by Nayar et al. [2006]. Earlier inverse light transport used diffuse-world assumptions due to the limited data available from traditional steady state cameras. A measured STIR contains more information, and thus requires only minimal assumptions about the scene in order to infer its properties. In dual photography [Sen et al. 2005], to reconstruct a surface hidden from the camera, one requires a light source to illuminate that hidden surface. Transient


reasoning allows inferences even when no device is in the line of sight of that surface.

In addition to simulated results, we built a prototype physical system using a directional femtosecond laser and directionally sensitive picosecond-accurate detectors. However, conducting full experiments is beyond our current abilities. We instead show all the key elements: geometry, photometry, bounce observations, and functioning in free space. We emphasize that our hardware experimentation, although based on expensive components, is extremely preliminary, and the paper is focused on providing the theoretical tools necessary to enable new applications. Importantly, our hardware design is well aligned with existing commercial devices, and we believe that STIR imaging will find practical applications.

Limitations: Our approach shares limitations with existing active illumination systems in terms of power and use scenarios: the sources must overcome ambient illumination, and only scenes within a finite distance from the camera can be imaged. In addition, we require precise high-frequency pulsed opto-electronics at a speed and quality that are not currently available at consumer prices. We expect that solid state lasers will continue to increase in power and frequency, doing much to alleviate these concerns.

An important challenge we face is to collect strong multi-path signals. Direct reflections from scene elements are significantly stronger than light which has traveled a more complex path, possibly reflecting from diffuse surfaces. Computationally, solving the inverse light transport problem introduces significant complexity. As with all inverse problems, the inverse light transport problem has degenerate cases in which multiple solutions (scenes) exist for the same observable STIR. Importantly, although our additional time-domain data is noisy, it still restricts the class of solutions to a greater extent than the more limited data of traditional cameras. As is common, we make a set of a priori assumptions that serve as robustness constraints for the optimization algorithm we use to estimate the scene parameters.

Contributions: We propose a model of the dynamics of global illumination and inverse rendering using transient properties of light transport. This paper makes the following contributions:

• A state space dynamical system formulation to derive the relationship between the transient and steady state impulse response, and a state update approach based on time rather than the photon bounce methods commonly used in computer graphics.

• A specific system identification algorithm which accepts the scene STIR and infers the scene structure and reflectance.

• Example inverse rendering scenarios in which transient reasoning can be used to infer scene properties that are beyond the reach of traditional machine vision, including direct observation of distance, separation of direct and indirect illumination, and inference of scene elements hidden from the camera.

• Reflectance-based segmentation, relighting, and material replacement from a single STIR photo.

• A hardware design for transient imaging based on a femtosecond laser and picosecond-accurate detectors, as well as initial experiments from our prototype.

2. Related work

2.1. Imaging Solutions

LIDAR: LIDAR systems modulate light, typically on the order of nanoseconds, and measure the phase of the reflected signal to determine depth [Kamerman 1993]. Flash LIDAR systems use a 2D imager to provide fast measurement of full depth maps [Iddan and Yahav 2001; Lange and Seitz 2001; Gvili et al. 2003]. Importantly, a number of companies are pushing this technology towards consumer price points [Canesta; MESA Imaging; 3DV Systems; PMD Technologies]. The quality of phase estimation can be improved by simulating the expected shape of the reflected signal [Jutzi 2006], or by estimating the effect of ambient light [Miyagawa and Kanade 1997; Schroeder et al. 1999; Kawakita et al. 2000; Gonzalez-Banos and Davis 2004]. Some LIDAR systems do measure the transient photometric response function (TPRF) explicitly. Separately detecting multiple peaks in the sensor response can allow two surfaces, such as a forest canopy and the ground plane, to be detected [Blair et al. 1999; Hofton et al. 2000], and waveform analysis can detect surface discontinuities [Vandapel et al. 2004]. However, all of these methods reason locally about the sensor response in a single direction, rather than about the global scene structure. This paper proposes that complex global reasoning about scene contents is possible given a measured TPRF. Although SONAR does not use light propagation, reasoning about global structure in that domain is common [Russell et al. 1996].

Time gated imaging: Time gated imaging allows a reflected pulse of light to be integrated over extremely short windows, effectively capturing I(x, y, t_δ). Multiple captures while incrementing the time window t_δ allow I(x, y, t) to be captured; e.g., Busck et al. show a response function measured to 100-picosecond accuracy [Busck and Heiselberg 2004]. While gated imaging is related to LIDAR, it has uses beyond 3D imaging. Nanosecond windows are used for imaging tanks at a range of kilometers [Andersson 2006]. Picosecond gating allows imaging in turbid water [McLean et al. 1995]. Femtosecond windows allow ballistic photons to be separated from scattered photons while imaging through biological tissue [Farsiu et al. 2007; Das et al. 1993]. Most applications make limited use of global reasoning about scene characteristics, instead using a single time-gated window to improve the signal-to-noise ratio while imaging.

Streak cameras: Streak cameras are ultrafast photonic recorders which deposit photons across a spatial dimension, rather than integrating them in a single pixel. Using a 2D array, I(x, y_δ, t) can be measured; sweeping the fixed direction y_δ allows I(x, y, t) to be captured. Picosecond streak cameras have been available for decades [Campillo and Shapiro 1983]. Modern research systems can function in the attosecond range [Itatani et al. 2002], and commercially available products image in the femtosecond regime [Hamamatsu].

2.2. Algorithmic approaches

Global light transport: Light often follows a complex path between the emitter and sensor. A description of forward steady-state light transport in a scene is referred to as the rendering equation [Kajiya 1986a]. Extensions have been described to include transient light transport [Arvo 1993], but no rendering work has yet built on this foundation. Accurate measurement of physical scene properties, often called inverse rendering, requires reasoning about this path [Marschner 1998; Patow and Pueyo 2003]. Complex models have been developed for reconstructing specular scenes [Kutulakos and Steger 2007], transparent scenes [Morris and Kutulakos 2007], Lambertian scenes [Nayar et al. 1990], reflectance properties [Yu et al. 1999], joint lighting and reflectance [Ramamoorthi and Hanrahan 2001], and scattering properties [Narasimhan et al. 2006]. All of this work has made use of traditional cameras, which provide measurements only of steady-state light transport phenomena. In this work, we propose that transient light transport can both be observed and meaningfully used to improve estimates of scene properties.

Capturing light transport: Recent work in image-based modeling and computational photography has shown several methods for capturing steady-state light transport [Sen et al. 2005; Garg et al. 2006; Masselus et al. 2003; Debevec et al. 2000]. The incident illumination is represented as a 4D illumination field and the resultant radiance as a 4D view field; taken together, the 8D reflectance field represents all time-invariant interactions of a scene. More relevant to this paper, light transport has been decomposed into direct and indirect components under the assumption that the scene has no high-frequency components [Nayar et al. 2006], as well as into multi-bounce components under the assumption that the scene is Lambertian [Seitz et al. 2005].

3. Transient Light Transport

The theory of light transport describes the interaction of light rays with a scene. Incident illumination provides the first set of light rays that travel towards other elements in the scene and the camera. This direct bounce is followed by a complex pattern of inter-reflections whose dynamics are governed by the scene geometry and the material properties of the scene elements. The process continues until an equilibrium light flow is attained.

We consider a scene S (figure 1) composed of M small planar facets (patches with unit area) p_1, ..., p_M with geometry G = {Z, D, N, V}, comprising: the patch positions Z = [z_1, ..., z_M], where each z_i ∈ R^3; the distance matrix D = [d_ij], where d_ij = d_ji is the Euclidean distance between patches p_i and p_j and d_ii = 0; the orientation matrix N = [n_1, ..., n_M], consisting of the unit surface normal vectors n_i ∈ R^3 at each patch p_i with respect to a fixed coordinate system; and the visibility matrix V = [v_ij], where v_ij = v_ji is 0 or 1 depending on whether or not patch p_i is occluded from p_j. For analytical convenience, we consider the camera (observer) and illumination (source) as a single patch denoted by p_0. All the analysis that follows can be generalized to include multiple sources and an observer at an arbitrary position in the scene.

Figure 1: A scene consisting of M = 5 patches and the illumination-camera patch p_0. The patches have different spatial coordinates (z_i^x, z_i^y, z_i^z), orientations n_i and relative visibility between patches v_ij. The patches also have different material properties; for instance p_3 is diffuse, p_4 is translucent and p_5 is a mirror.
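For concreteness, the geometry G can be carried around as a small container object; the following is an illustrative sketch in Python/NumPy (the class and field names are ours, not part of the paper):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SceneGeometry:
    """Geometry G = {Z, D, N, V} for M unit-area patches (illustrative sketch)."""
    Z: np.ndarray  # (M, 3) patch positions z_i
    N: np.ndarray  # (M, 3) unit surface normals n_i
    V: np.ndarray  # (M, M) visibility matrix, v_ij in {0, 1}

    @property
    def D(self) -> np.ndarray:
        # Pairwise Euclidean distances d_ij (symmetric, zero diagonal).
        diff = self.Z[:, None, :] - self.Z[None, :, :]
        return np.linalg.norm(diff, axis=-1)
```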

We now introduce a variation of the rendering equation [Kajiya 1986b] in which light takes a finite amount of time to traverse distances within the scene. This introduces delays in the arrival of light from one patch to another. Let t denote a discrete time step and {L_ij : i, j = 0, ..., M} be the set of radiances for rays that travel between scene patches. Transient light transport is governed by the following dynamical equation, which we term the transient rendering equation:

L_ij[t] = E_ij[t] + Σ_{k=0}^{M} f_kij L_ki[t − δ_ki]    (1)

Equation 1 states that the radiance L_ij[t] of a ray traveling from patch p_i to p_j at time t is the sum of the emissive radiance E_ij[t] and the appropriately weighted sum of the incoming radiances from all other patches p_k, k ≠ i, at earlier time instants. For simplicity, let the speed of light c = 1. Then the propagation delay δ_ij is equal to the distance d_ij (see figure 2). The scalar weights f_kij, or form factors, denote the proportion of light incident from patch p_k onto p_i that will be directed towards p_j:

f_kij = ρ_kij ( cos(θ_in) cos(θ_out) / ‖z_i − z_j‖² ) v_ki v_ij

where ρ_kij is the directional reflectance, which depends on the material properties and obeys Helmholtz reciprocity (ρ_kij = ρ_jik); θ_in is the incident angle and θ_out is the viewing angle (see figure 3). Additionally, since a patch does not interact with itself, f_kij = 0 for k = i or i = j. We assume that the scene is static and that the material properties are constant over the imaging interval. The source and observer patch p_0 does not participate in inter-reflections: f_i0j = 0; i, j = 0, ..., M.
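For intuition, the form factor above can be transcribed directly into code; the following sketch (Python/NumPy) is our own illustration, with the reflectance ρ omitted (set to 1) so that only the geometric terms remain:

```python
import numpy as np

def form_factor(z, n, v, k, i, j):
    """Sketch of f_kij: the fraction of light incident from patch k on patch i
    that is directed on toward patch j (reflectance rho_kij set to 1)."""
    if k == i or i == j:
        return 0.0  # a patch does not interact with itself
    d_in = z[i] - z[k]                    # incoming direction k -> i
    d_out = z[j] - z[i]                   # outgoing direction i -> j
    cos_in = max(0.0, np.dot(n[i], -d_in / np.linalg.norm(d_in)))
    cos_out = max(0.0, np.dot(n[i], d_out / np.linalg.norm(d_out)))
    r2 = float(np.dot(z[i] - z[j], z[i] - z[j]))  # ||z_i - z_j||^2
    return cos_in * cos_out / r2 * v[k, i] * v[i, j]
```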

Figure 2: A ray impulse E_02[t] directed towards patch p_2 at time t. The ray illuminates p_2 at time instant t + δ_02 and generates the directional radiance vector [L_20[t + δ_02], L_21[t + δ_02], L_23[t + δ_02]]. These light rays travel towards the camera p_0 and the scene patches p_1 and p_3, resulting in global illumination.

We model illumination using the emitter patch p_0. All other patches in the scene are non-emissive: E_ij[t] = 0 for i = 1, ..., M; j = 0, ..., M; t = 0, ..., ∞. The illumination is the set of radiances {E_0j[t] : j = 1, ..., M; t = 0, ..., ∞} representing the light emitted towards all scene patches at different time instants.
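To make the forward model concrete, the transient rendering equation can be simulated directly as a time-stepped recursion; the following is a minimal sketch under our own discretization assumptions (integer delays, a finite time horizon), not the paper's renderer:

```python
import numpy as np

def simulate_transient(E, f, delay, T):
    """Step the transient rendering equation (1) forward in time.

    E:     (T, M+1, M+1) emissions E_ij[t] (only rows i = 0 are non-zero,
           since only the source patch p0 emits)
    f:     (M+1, M+1, M+1) form factors f[k, i, j] = f_kij
    delay: (M+1, M+1) integer propagation delays delta_ij
    Returns L with L[t, i, j] = L_ij[t].
    """
    M1 = f.shape[0]
    L = np.zeros((T, M1, M1))
    for t in range(T):
        for i in range(M1):
            for j in range(M1):
                acc = E[t, i, j]
                for k in range(M1):
                    tk = t - delay[k, i]  # light from k arrived at i earlier
                    if tk >= 0:
                        acc += f[k, i, j] * L[tk, k, i]
                L[t, i, j] = acc
    return L
```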


Figure 3: The scalar form factors f_kij give the proportion of light incident from patch p_k onto p_i that will be directed towards p_j.

The outgoing light field at patch p_i is the vector of directional radiances L[i, t] = [L_i0[t], ..., L_iM[t]]^T, and for the entire scene we have the transient light field matrix L[t] = [L[1, t], ..., L[M, t]]^T, which contains M(M − 1) + M scalar irradiances. We can only observe the projection of L[t] that is directed towards the camera p_0. At each time t we record a vector of M intensity values L_c[t] = [L_10[t − δ_10], ..., L_M0[t − δ_M0]]^T.

A transient light field camera (see figure 4) comprises a generalized sensor and a pulsed illumination source. The sensor is directional: it can separately sense the incident light field arriving from distinct patches, and it can time-sample the incoming irradiance at picosecond or shorter time scales. A transient light field camera captures a time-image {I(i_x, i_y, t) : i = 1, ..., M}: a collection of time profiles of the irradiance arriving at the camera pixel (i_x, i_y) observing patch p_i. Due to bandwidth limitations, the incoming light can only be sampled at discrete instants. The pulsed illumination source must be capable of transmitting directional impulses, and the sensor and illumination source need to be synchronized.

Figure 4: The transient light field camera consists of a pulsed illumination source and a high-bandwidth detector which are synchronized with each other. It can time-sample the incoming light field and sense the direction of incident light rays.

We define the ray impulse for a patch p_i as a unit pulse that is directed only towards p_i at time t = 0. Analogous to the notion of the impulse response of a multiple-input multiple-output (MIMO) filter, we define the Space Time Impulse Response of the scene S, denoted STIR(S), as the collection of time profiles arriving at the camera from all patches when only a single patch is illuminated. We can measure STIR(S) using the transient light field camera as follows (also see figure 5):

1. Illuminate patch p_i with an impulse ray.

2. Observe the radiance profile arriving at the sensor from all the patches, i.e., record L_c[t] for t = 0, ..., ∞.

3. Repeat steps 1-2 for all observable scene patches p_i : i = 1, ..., M.
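In simulation, this measurement procedure amounts to probing the forward model once per patch; a minimal sketch, reusing the hypothetical simulate_transient from above:

```python
import numpy as np

def measure_stir(f, delay, M, T):
    """Collect STIR(S): for each ray impulse toward patch i, record the
    time profiles of radiance arriving back at the camera patch p0."""
    stir = np.zeros((M, M, T))        # stir[i, j, t]: response from p_j
    for i in range(1, M + 1):         # step 3: repeat for every patch
        E = np.zeros((T, M + 1, M + 1))
        E[0, 0, i] = 1.0              # step 1: unit ray impulse p0 -> p_i at t = 0
        L = simulate_transient(E, f, delay, T)
        for j in range(1, M + 1):     # step 2: record L_c[t] per patch
            for t in range(T):
                ts = t - delay[j, 0]  # radiance left p_j delta_j0 ticks ago
                stir[i - 1, j - 1, t] = L[ts, j, 0] if ts >= 0 else 0.0
    return stir
```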

Figure 5: Measuring the STIR of a scene with 3 patches. A ray impulse is directed towards each patch and a time-image is recorded. The time-image is the time profile of the radiances arriving at the camera from the different patches. The STIR is simply the collection of time-images captured under the different ray impulse illuminations.

At macroscopic scales, light transport is a linear phenomenon. We note that the transient rendering equation (1) describes a MIMO linear time-invariant (LTI) system. One of the important properties of an LTI system is that scaled and time-shifted inputs result in correspondingly scaled and time-shifted outputs. Hence the STIR of the scene can be used to completely characterize its behavior under any general illumination.
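Concretely, by linearity and time invariance, the camera response to an arbitrary illumination sequence is a superposition of scaled, shifted copies of the STIR profiles; a sketch of this convolution, our illustration using the hypothetical stir array from above:

```python
import numpy as np

def response_from_stir(stir, u):
    """Predict the camera time profiles under general illumination.

    stir: (M, M, T) impulse responses stir[i, j, t]
    u:    (M, T) intensity sent toward each patch over time
    Returns y: (M, T) predicted radiance profile from each patch.
    """
    M, _, T = stir.shape
    y = np.zeros((M, T))
    for i in range(M):        # each illuminated patch contributes...
        for j in range(M):    # ...to each observed patch's profile
            y[j] += np.convolve(u[i], stir[i, j])[:T]
    return y
```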

The transient rendering equation (1) is a difference equation describing the evolution of a discrete linear dynamical system. If the delays in the scene are finite, then we need only a finite number of states to drive the forward light transport process without any loss of radiance information. Our key contribution is to show that transient light transport can be represented as a MIMO LTI state space system whose parameters can be estimated using a system identification framework. We develop our formulation for a scene with arbitrary form factors and geometry, but for simplicity we analyze the following scenario:

• Each patch in the scene is visible from all the other patches, i.e., v_ij ≠ 0 : i, j = 0, ..., M.

• Each patch p_i has a non-zero diffuse component.

Clearly these assumptions are not true for scenes with occluders and objects with no diffuse component, such as mirrors. In later sections we extend our framework to general scenes. In the next section we show how to model transient light transport as a state space system.

4. State Space Formulation

Now we formulate transient light transport in a scene S as a traditional state space system which can be parameterized in terms of the scene geometry G and the form factors f_kij. A state space representation is a mathematical model of a dynamical system with input, output and state variables related by first-order differential equations. A state space model can be represented in matrix form when the dynamical system is linear and time-invariant [Kailath 1979]. It provides a convenient and compact way to model and analyze systems with multiple inputs and outputs. The internal state variables are the smallest possible subset of system variables that can represent the entire state of the system at any given time. State variables must be linearly independent, and the minimum number of state variables required to represent a given system is usually equal to the order n of the system's defining differential equation. Our contribution is to show that transient light transport in any real scene can be represented as the following LTI state space system¹ (see figure 6):

x[t+1] = A x[t] + B u[t]
y[t] = C x[t] + η[t]

where x[t] ∈ R^n is called the state vector, which in our case is related to the transient light field matrix L[t]; y[t] ∈ R^M is called the output vector and is related to the time-image L_c[t]; and u[t] ∈ R^p is called the input (or control) vector, which can be derived from the instantaneous illumination vector E[t].

Figure 6: Discrete state space representation of the transient light transport process in any scene. The state matrix A is related to inter-reflection between scene elements (indirect component). The input matrix B controls how the illumination u[t] directly interacts with the scene (direct component), and the observation matrix C determines how much of the global light transport x[t] reaches the camera y[t]. The observations are recorded with zero-mean AWGN noise η(t) ∼ N(0, σ²). The system matrices can be parameterized in terms of the scene geometry G and the form factors Θ.

The matrix A ∈ R^{n×n} is the state matrix, which embeds the form factors for inter-reflection between scene elements; B ∈ R^{n×p} is the input matrix, which consists of the form factors related to illumination; and C ∈ R^{M×n} is the output or projection matrix, which embeds the form factors related to the observer (camera). The time variable is t, and all system matrices A, B, C are time-invariant. We assume that we are dealing with only finite measurement precision, hence all delays are rational and can be scaled to be integers. We now show how to formally construct the system matrices using the delays and form factors.

¹The discrete time-invariant state space system is usually expressed as

x[t+1] = A x[t] + B u[t] + w[t]
y[t] = C x[t] + D u[t] + η[t]

where D ∈ R^{M×p} is the gain matrix which models the direct feed-through of the input. It is often chosen to be the zero matrix 0 ∈ R^{M×p}, which is also true for our formulation, since light from the source must interact with the scene before reaching back to the camera. The vectors w[t] and η[t] are Gaussian noise vectors which model the process and measurement noise respectively. We only consider measurement noise η[t] in our analysis and treat process noise as an intrinsic property of the system.

State Vector x[t]: The state vector is essentially a vectorization of the transient light field matrix L[t] for all possible delay values.

The intuition behind the construction of the state vector is that for each pair of patches p_i and p_j we need to hold the state of the outgoing light field L_ij[t] until it reaches p_j. Thus, the system needs a memory of δ_ij states for each patch pair. If all the distances (and hence delays) in the scene are finite, we only need a finite number of states to model the transient rendering equation (1). The size of the state vector is the order of the system, n = Σ_i Σ_j δ_ij. We construct the state vector x[t] as follows:

x[t] = [x_1[t], x_2[t], ..., x_M[t]]^T
x_i[t] = [x_ij,r[t]], j = 0, ..., M; j ≠ i; r = 1, ..., δ_ij
x_ij,r[t] = L_ij[t − r]
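As an illustration of this bookkeeping, one can build an explicit index map from (i, j, r) triples to positions in the state vector; a sketch under our own naming:

```python
def build_state_index(delay, M):
    """Map each delayed radiance L_ij[t - r] to a slot in the state vector.

    delay[i, j]: integer delta_ij. Returns (index, n) where
    index[(i, j, r)] is the state coordinate and n is the system order.
    """
    index, n = {}, 0
    for i in range(1, M + 1):
        for j in range(M + 1):
            if j == i:
                continue
            for r in range(1, int(delay[i, j]) + 1):
                index[(i, j, r)] = n   # one state per unit of delay
                n += 1
    return index, n
```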

Control Vector u[t]: The input pulses originate at p_0 and reach a patch p_i after a time delay δ_0i. To model this propagation delay, we need δ_0i states for each p_i. We have also shown that general illumination can be realized using the ray impulse set {E_0i[t] : i = 1, ..., M; t = 0, ..., ∞}. It therefore suffices to model the input vector u[t] for ray impulse illumination only.

u[t] = [u_1[t], u_2[t], ..., u_M[t]]^T
u_i[t] = [u_ir[t]], r = 0, ..., δ_0i
u_ir[t] = E_0i[t] if t = 0 and r = δ_0i; 0 otherwise

Output Vector y[t]: The output of the linear state space system is the directional irradiance measured at the camera (p_0) arriving from the M patches. Thus y[t] = L_c[t] = [L_10[t − δ_10], ..., L_M0[t − δ_M0]]^T.

State Matrix A: The state matrix A must model two aspects of the transient light transport process: the interactions due to inter-reflections (indirect component) and the intrinsic delays δ_ij. The intuition behind constructing A is that the rows of A that correspond to a ray radiance L_ij[t] should model the light transport of equation (1). This implies that they should contain the form factors f_kij, k = 1, ..., M at the appropriate column indices, governed by the delays δ_ki : k = 1, ..., M. Each of the remaining rows of A contains exactly a single 1 entry along the off-diagonal, to model the time propagation. As a simple example, we show the A matrix for a Lambertian scene with M = 5 patches in figure 7.

Input Matrix B: The input matrix B models the transport of light from the emitter patch p_0 to the other scene patches (direct component). Since the B matrix multiplies the input (illumination) vector, the rows of B corresponding to a ray radiance L_ij[t] should contain the form factor f_0ij in the appropriate column index, governed by δ_0i. The remaining rows of B are all zeros, 0_{1×p}. The B matrix for the same Lambertian scene with M = 5 patches is shown in figure 8.

Output Matrix C: The projection matrix C is related to observability and embeds the form factors f_ij0 : i, j = 1, ..., M in each of its M rows, again at the appropriate column indices governed by δ_i0. The matrix C for the example Lambertian scene is shown in figure 9.

Also, the initial state x[0] is the zero vector 0 ∈ R^n, since we assume no a priori scene illumination. The effect of ambient light can be modeled by setting x[0] to the value of the ambient illumination. We end the formulation by restating that all the system matrices A, B and C depend on the relative distances between patches and on the form factors: A models the indirect component of light transport (inter-reflections), B governs how the illumination interacts with the scene (direct component), and C is the projection matrix which controls how the overall light transport in the scene reaches the observer (camera).
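Assembling the state matrix then reduces to filling in these two kinds of rows from the delays and form factors; a sketch using the hypothetical build_state_index above (B and C are filled analogously, per figures 8 and 9):

```python
import numpy as np

def build_A(f, delay, index, n, M):
    """Assemble the state matrix A (sketch).

    Rows for x_{ij,1}[t+1] = L_ij[t] apply the transient rendering equation
    (the direct f_0ij input terms live in B); rows for r > 1 are unit-delay
    shifts that model light still in flight.
    """
    A = np.zeros((n, n))
    for (i, j, r), row in index.items():
        if r > 1:
            A[row, index[(i, j, r - 1)]] = 1.0   # propagate light one tick
        else:
            for k in range(1, M + 1):
                key = (k, i, int(delay[k, i]))   # L_ki delayed by delta_ki
                if k != i and key in index:
                    A[row, index[key]] = f[k, i, j]
    return A
```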


Figure 7: The state matrix A for a Lambertian scene containing M = 5 diffuse patches. Using the diffuse assumption, the form factors f_kij reduce to f_ki, since the same amount of radiance is directed towards all the patches. This also implies that L_ij[t] reduces to L_i[t]. As an illustration, the first row of A, which corresponds to the state variable L_1[t], contains the form factors f_21, f_31, f_41, f_51 (we let f_11 = 0). These are in column indices governed by the delays δ_21, δ_31, δ_41, δ_51. The off-diagonal 1s model the forward propagation of light.

Figure 8: The input matrix B for a Lambertian scene containing M = 5 diffuse patches. The rows of B that correspond to the state variables L_i[t] contain the form factors f_0i : i = 1, ..., 5. These are in column indices governed by the delays δ_0i : i = 1, ..., 5.

Figure 9: The output matrix C for a Lambertian scene containing M = 5 diffuse patches. The M rows of C contain a 1 in each row at the column position that corresponds to the state variable L_i[t − δ_i0]. This models the fact that the diffuse light rays reach the camera after a delay.

4.1. System Identification

System identification is the algorithmic procedure of building dynamical models from measured input/output data and estimating their parameters. In the context of the light transport problem, system identification refers to two kinds of inverse problems:

• Replicating the I/O behavior of the scene, which enables applications such as scene relighting from a novel viewpoint and illumination.

• Estimating the parameters of the state space system that describes the transient light transport process. These parameters are the form factors [f_kij] : i, j, k = 0, ..., M, which can be used to devise novel algorithms for scene understanding and material sensing.

The form factors can be vectorized into the parameter vector Θ, where Θ has length O(M³) for a scene containing M patches. The system matrices A, B and C are parameterized by the delays and by Θ, giving the parameterized state space model M(Θ):

M(Θ): x[t+1] = A(Θ) x[t] + B(Θ) u[t]
      y[t] = C(Θ) x[t] + η[t]

Our system identification problem can be formally stated as: given STIR(S), estimate A(Θ), B(Θ), C(Θ).

The reflectance terms ρ_kij occurring in f_kij can be arbitrary (within the bounds of energy conservation and Helmholtz reciprocity). Ideally we would like to estimate the whole set of reflectances, whose cardinality scales as O(M³) in the number of scene patches M. As an example, for a real-world scene comprising M = O(10³) patches, we have O(10⁹) form factors to estimate.

The inverse process of computing estimates for such a large number of parameters is extremely computationally expensive and is an ill-conditioned inverse problem. Moreover, the form factors enter the observations recorded by the camera nonlinearly, due to the recursive behavior of the system. System identification for transient light transport is thus a challenging inverse problem, and we demonstrate initial progress in this area by formulating it based on a thorough understanding of the physical process.

If the STIR of the scene is available, it can be used directly to obtain the geometry G of the scene by solving the inverse geometry problem discussed in section 5. It can also be used to construct the I/O pair {y[t], u[t] : t = 0, ..., ∞}, which we use in section 6 to estimate Θ by solving the inverse reflectance problem. If we only have a time-image profiling the light arriving at the camera from the scene, {y[t] : t = 0, ..., ∞}, in response to an arbitrary illumination {u[t] : t = 0, ..., ∞}, then we can obtain STIR(S) using subspace system identification []. We now propose the following system identification algorithm (Algorithm 1).

5. Inverse Geometry

Unlike traditional time-of-flight imaging, our goal is to compute not only the direct distance from the first bounce but also the pairwise distances between patches. If the camera and light source are intrinsically calibrated, the Euclidean direction of each ray is known, directly providing the 3D location of each patch. But to support more sophisticated operations later, such as estimating the positions of hidden patches, we instead exploit the second and higher order bounces for inter-patch distance estimation. The first step in our inversion pipeline is to use the STIR(S) to infer the scene geometry G = {Z, D, N, V}, which is then used to construct the model M(Θ).

Algorithm 1 SYSID[STIR(S)]
1. Solve the inverse geometry problem using STIR(S):
   - Estimate the distance matrix D using onsets.
   - Compute the coordinate set Z using isometric embedding.
   - Compute the surface normals N using a smoothness assumption.
2. Use D to construct the model M(Θ).
3. Solve the inverse reflectance problem:
   - Estimate the form factors Θ using M(Θ).

5.1. Distances from STIR

We now demonstrate how to compute the distance matrix D = [d_ij] : i, j = 0, 1, ..., M from the onsets contained in STIR(S). Note that D is a symmetric matrix with a zero diagonal, and it may have up to M(M + 1)/2 distinct entries. Define O^1 = {O^1_i | i = 1, ..., M} as the set of first onsets: the collection of all times O^1_i at which the transient light field camera receives the first non-zero response from patch p_i when illuminating the same patch (see figure 5). It is easy to observe that O^1_i is the time taken by the impulse ray originating at p_0 and directed towards p_i to arrive back at p_0 after the first bounce; this corresponds to the direct path p_0 → p_i → p_0. Similarly, we define O^2 = {O^2_ij | i, j = 1, ..., M; j ≠ i} as the set of second onsets: the collection of times at which the transient light field camera receives the first non-zero response from a patch p_j when illuminating a different patch p_i (see figure 5). This corresponds to the multi-path p_0 → p_i → p_j → p_0. Note that ideally O^2_ij = O^2_ji. It is worth noting that although all the onsets contained in O^1 and O^2 correspond to first and second bounces respectively, they are contained in completely different time profiles {L^c_i[t] : i = 1, ..., M; t = 0, ..., ∞} that comprise STIR(S). This is a result of the procedure we use to measure the STIR (section 3).
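Extracting these onsets from a measured STIR is essentially a per-profile search for the first above-threshold sample; a sketch (the threshold handling is our assumption):

```python
import numpy as np

def extract_onsets(stir, eps=1e-9):
    """First (O1) and second (O2) onsets from stir[i, j, t].

    O1[i]: first arrival when illuminating and observing patch i.
    O2[i, j]: first arrival from patch j when illuminating patch i.
    """
    M = stir.shape[0]
    O1 = np.full(M, np.inf)
    O2 = np.full((M, M), np.inf)
    for i in range(M):
        for j in range(M):
            hits = np.nonzero(stir[i, j] > eps)[0]
            if hits.size == 0:
                continue          # e.g., a hidden patch: no 1st/2nd bounce seen
            if i == j:
                O1[i] = hits[0]
            else:
                O2[i, j] = hits[0]
    return O1, O2
```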

In order to compute D using O^1 and O^2, we construct the forward distance transform matrix T_2, of size M(M + 1)/2 × M(M + 1)/2, which models the sums of the appropriate combinations of path lengths contained in the distance vector d = [d_ij] : i = 1, ..., M; j = i, ..., M and relates it to the vector of observed onsets O. We then solve the linear system T_2 d = O to obtain d and construct D. As an example, consider the scene in figure 10 consisting of M = 3 patches. The linear system that allows us to solve for the distances is:

[ 2 0 0 0 0 0 ] [d_01]   [ O^1_1  ]
[ 1 1 0 1 0 0 ] [d_12]   [ O^2_12 ]
[ 1 0 1 0 0 1 ] [d_13] = [ O^2_13 ]
[ 0 0 0 2 0 0 ] [d_02]   [ O^1_2  ]
[ 0 0 0 1 1 1 ] [d_23]   [ O^2_23 ]
[ 0 0 0 0 0 2 ] [d_03]   [ O^1_3  ]
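In code, assembling T_2 and solving for the distances is a small linear-algebra exercise; a sketch for the M = 3 example (the ordering of d matches the system above; this construction is our reading of the definitions):

```python
import numpy as np

# Rows: O1_1, O2_12, O2_13, O1_2, O2_23, O1_3
# Cols: d01, d12, d13, d02, d23, d03
T2 = np.array([
    [2, 0, 0, 0, 0, 0],   # O1_1  = 2*d01
    [1, 1, 0, 1, 0, 0],   # O2_12 = d01 + d12 + d02
    [1, 0, 1, 0, 0, 1],   # O2_13 = d01 + d13 + d03
    [0, 0, 0, 2, 0, 0],   # O1_2  = 2*d02
    [0, 0, 0, 1, 1, 1],   # O2_23 = d02 + d23 + d03
    [0, 0, 0, 0, 0, 2],   # O1_3  = 2*d03
], dtype=float)

def solve_distances(onsets):
    """Recover [d01, d12, d13, d02, d23, d03] from the 6 observed onsets
    (times scaled so that c = 1). lstsq tolerates noisy onsets."""
    d, *_ = np.linalg.lstsq(T2, np.asarray(onsets, dtype=float), rcond=None)
    return d
```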

It can easily be verified that for any M the forward distance transform matrix T_2 is full rank and well-conditioned. Due to synchronization errors, device delays and response times, the observed onsets have measurement uncertainties, which we assume correspond to a bounded error ε in the distances: we obtain distance approximations d̂_ij such that d_ij − ε ≤ d̂_ij ≤ d_ij + ε for i, j = 1, ..., M. Moreover, we can use the redundancy in the second onset values (O^2_ij = O^2_ji) to obtain multiple estimates of D and reduce the error by averaging them.

Figure 10: A scene with M = 3 patches showing the distances between the patches that form the distance matrix D.

5.2. Structure from Pairwise Distances

An algebraic framework for estimating the scene structure Z using the distance estimates D is discussed in appendix ??. The problem of estimating structure from distances can be stated as finding an isometric embedding of the M patches into R^3, i.e., a coordinate set Z ∈ R^{M×3} consistent with D. For computational convenience we take p_0 to be the origin, i.e., z_0 = (0, 0, 0). An example of recovering structure from noisy distance estimates using the isometric embedding algorithm is shown in figure 11. The embedding algorithm, and the use of convex optimization to compute the optimal coordinate set estimate Ẑ in the presence of distance uncertainties, is also explained in appendix ??. The estimated coordinates Ẑ can be used to recompute robust distance estimates.
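The appendix's algorithm is not reproduced here, but classical multidimensional scaling is one standard way to realize such an isometric embedding from a (noisy) distance matrix; a sketch:

```python
import numpy as np

def embed_from_distances(D):
    """Classical MDS: recover 3D coordinates (up to a rigid transform)
    from an (M+1) x (M+1) matrix of pairwise distances."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J             # Gram matrix of centered points
    w, U = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:3]           # three largest eigenvalues
    Z = U[:, top] * np.sqrt(np.maximum(w[top], 0.0))
    return Z - Z[0]                         # translate so p0 sits at the origin
```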

In general, the orientation matrix N = [n_1, ..., n_M], which consists of the unit surface normal vectors n_i ∈ R^3, cannot be computed even with knowledge of the distances D and the structure Z. However, we can estimate N for patches sharing a common surface normal. Also, if we assume that we are imaging piecewise smooth surfaces, then given the coordinate set Z we can fit an analytical (piecewise) surface and compute N.

5.3. Scenes with Occluders

Now we consider the case of estimating the geometry of a scene that contains a set H of patches hidden from the camera: v_0i = 0 for p_i ∈ H. To simplify the analysis we make two assumptions: that there is no inter-reflection amongst the hidden patches, and that no two third bounces involving hidden patches arrive at the same time in the same time profile contained in STIR(S). The likelihood of the latter assumption holding for general scenes is very high because we illuminate one patch at a time. If a patch p_i is not visible from the camera p_0, then the sets of first and second onsets do not contain any responses from p_i, i.e., the vector of distances d_H = [d_ij] : p_i ∈ H, j = 0, ..., M cannot be estimated using O^1 and O^2. Hence we need to consider the set of third onsets O^3 = {O^3_ijk : i, j, k = 1, ..., M; i ≠ j; j ≠ k}, which corresponds to third bounces. Note that the number of first onsets |O^1| is O(M), and there are O(M²) second onsets and O(M³) third onsets. Euclidean geometry also imposes that O^3_ijk = O^3_kji. Although we can easily identify (label) the onsets contained in O^1 and O^2 using STIR(S), labeling the onsets in O^3 is non-trivial. We use the following procedure, discussed in more detail below, to compute the distance estimates D for scenes with hidden patches:

1. Estimate the distance set D_{S\H} for the restricted scene which contains only the visible patches.

2. Use D_{S\H} and the stated assumptions to label the time onsets contained in O^3.

3. Construct the distance transform operator T_3 that relates the third-bounce arrival times O_H involving the hidden patches to the distances d_H to the hidden patches.

4. Solve the resulting linear system T_3 d_H = O_H and obtain the complete distance set D.

Figure 11: The original and estimated geometries of a piecewise smooth scene with M = 147 patches. Each of the 3 planar surfaces is made up of M/3 patches that have a non-zero diffuse component. Given the noisy STIR of this scene, the distances D are computed using up to the 2-bounce onsets and the distance transform T_2. We reduce noise in the estimates by averaging the redundant onsets contained in the STIR. The patch coordinates are then estimated using isometric embedding (appendix ??) and a piecewise smooth surface is fitted using the Matlab function griddata. This fitted surface is used to compute the patch normals using the function surfnorm.

Figure 12: A scene with M = 4 patches where the two patches p_2 and p_3 are not directly visible from the camera, and the STIR onset arrival profile. The green (first) and blue (second) onsets are a result of direct observations of patches p_1 and p_4. The pattern of arrival of the third onsets depends on the relative distances of the hidden patches p_2 and p_3 from the visible patches. The onsets that correspond to light traversing the same Euclidean distance can be easily identified (they have the same arrival times in different time profiles). Once the onsets are labeled, they are used to obtain the distances that involve hidden patches.

As a simple example consider the scene in figure 12, and assume that the patches p_2 and p_3 are hidden. We first compute the distances d_01, d_04, d_14 between the visible patches. The distances (d_21, d_24) and (d_31, d_34) are not directly observable, although once these distances are estimated, d_02, d_03, d_23 can be computed using triangulation. Now we apply our labeling algorithm to identify the third onsets. The onsets O^3_141, O^3_414 are readily identified since we know the distances to patches p_1 and p_4. The onsets O^3_121, O^3_131, O^3_424, O^3_434, O^3_124, O^3_134, O^3_421, O^3_431 can be disambiguated using the fact that O^3_421 = O^3_124 and O^3_431 = O^3_134, and that they arrive in different time profiles of STIR(S). We sort the remaining onsets based on their arrival times and label them based on their proximity to the visible patches. This labeling procedure can be generalized to multiple planar hidden patches as follows:

1. Establish an ordering of the hidden patches based on their proximity to an arbitrary visible patch.

2. Identify the onsets that obey the constraint O^3_ijk = O^3_kji and label them using the ordering in step 1.

3. Sort the remaining onsets according to their arrival times and again use the ordering in step 1 to label them.

The onset arrival profile for the example scene is shown in figure 12. We now construct the distance operator T_3 and set up the system of equations that is solved for the distances involving the hidden patches:

[ 2 0 0 0 ] [d_21]     [ O^3_121 − O^1_1             ]
[ 1 1 0 0 ] [d_24] = c [ O^3_124 − (O^1_1 + O^1_4)/2 ]
[ 0 0 2 0 ] [d_31]     [ O^3_131 − O^1_1             ]
[ 0 0 1 1 ] [d_34]     [ O^3_134 − (O^1_1 + O^1_4)/2 ]
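The corresponding solve mirrors the T_2 case; a sketch for this 4-patch example (onset names follow the figure; times are scaled so that c = 1):

```python
import numpy as np

# Rows relate labeled onsets to hidden-patch distances [d21, d24, d31, d34].
T3 = np.array([
    [2, 0, 0, 0],   # O3_121 - O1_1            = 2*d21
    [1, 1, 0, 0],   # O3_124 - (O1_1 + O1_4)/2 = d21 + d24
    [0, 0, 2, 0],   # O3_131 - O1_1            = 2*d31
    [0, 0, 1, 1],   # O3_134 - (O1_1 + O1_4)/2 = d31 + d34
], dtype=float)

def hidden_distances(O3_121, O3_124, O3_131, O3_134, O1_1, O1_4):
    """Distances from hidden patches p2, p3 to visible patches p1, p4."""
    rhs = np.array([
        O3_121 - O1_1,
        O3_124 - (O1_1 + O1_4) / 2,
        O3_131 - O1_1,
        O3_134 - (O1_1 + O1_4) / 2,
    ])
    return np.linalg.solve(T3, rhs)   # [d21, d24, d31, d34]
```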



Figure 13: Estimating distances in scenes with hidden patches: the original and estimated geometries of the piecewise smooth scene of figure 11. The second planar surface is occluded and all patches have a non-zero diffuse component. Again, given the noisy STIR of this scene, we used up to the third-bounce onsets and the operator T3 to estimate the distance matrix D. We also averaged over the redundant onsets contained in the STIR to reduce noise in the estimates. The coordinates, surface fitting and normals were computed using exactly the same Matlab routines as in figure 11.

An example of estimating distances in a scene containing hidden patches is shown in figure 13. With our stated assumptions, extending this construction to any number of hidden patches is straightforward². Once the distance set D is estimated, we no longer need special constructions and we proceed to estimate structure (Z) and orientation (N) as discussed in section 5. Having obtained the geometry G, we use the distance matrix to obtain the delays ∆. We can then construct the model M(Θ).

²An important generalization of the hidden patches scenario is to estimate distances in the case of multiple interactions between hidden patches. It can be shown that if a hidden patch can have at most N inter-reflections with the other hidden patches, then we need to utilize onsets that correspond to up to (N + 3) bounces, i.e., the sets O^1, O^2, . . . , O^(N+3).
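For concreteness, a minimal sketch of the distance-to-delay conversion; the 1 ps sample period and the variable names are our own assumptions, not prescribed by the model.

% Sketch: quantize pairwise distances D (meters) into integer sample
% delays for the discrete-time model, assuming sample period dt.
c  = 3e8;                        % speed of light (m/s)
dt = 1e-12;                      % assumed sample period: 1 ps
Delays = round(D ./ (c * dt));   % Delays(i,j): travel time i->j in samples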

6. Inverse Reflectance

In this section we develop a system identification formulation to estimate the form factors contained in the parameter vector Θ. The numerical sensitivity of the model M(Θ) and the performance of the inverse reflectance procedure may vary dramatically between state space parameterizations. Hence it is very important to carefully choose a model parameterization whose behavior is as close to the physical process as possible. We have used physical insight and knowledge of the inferred geometry to construct M(Θ) so that it closely matches the actual transient light transport process.

The structure of the system matrices A(Θ), B(Θ), C(Θ) of the state space model M(Θ) is parameterized by the scene geometry G and the form factors contained in Θ. The inverse reflectance problem is to estimate Θ given a time image {y[t], u[t] | t = 0, 1, . . . , T} and the structural parameterization of M(Θ), where T is the number of time samples under consideration. Note that if we are just given the STIR(S) we can easily construct a time image from it. The next step is to formulate the estimation of the model parameters as an optimization problem: we seek a Θ_OPT that minimizes an error norm J(·) between the predicted output y[t|Θ] = M(Θ, u(t)) and the observed output y[t] for all t = 0, . . . , T. Regularization is frequently used to overcome the non-uniqueness of the minimizing Θ, and a penalty term is added to the cost function:

\[
\Theta_{OPT} = \arg\min_{\Theta \in \Omega} \; J(y[t|\Theta], y[t]) + \lambda \|\Theta\|^2 \tag{2}
\]

where λ is a user-selectable regularization weight and Ω is the solution space of all possible parameter values. We used two common quadratic error criteria: least squares Output Error Minimization (OEM) and one-step Prediction Error Minimization (PEM):

\[
\mathrm{PEM:}\quad J(y[t|\Theta], y[t]) = \frac{1}{T} \sum_{t=0}^{T-1} \| y[t|(t-1),\Theta] - y[t] \|^2
\]
\[
\mathrm{OEM:}\quad J(y[t|\Theta], y[t]) = \frac{1}{T} \sum_{t=0}^{T-1} \| y[t|\Theta] - y[t] \|^2
\]
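To make the OEM criterion concrete, the following Matlab sketch evaluates the regularized cost of equation 2 for a candidate Θ by simulating M(Θ). It reuses the constructSysMat helper from appendix 12 and assumes Θ is stored as the M-by-M-by-M form-factor array that helper expects; this is an illustration, not our exact implementation.

% Sketch: regularized OEM cost for a candidate parameter vector Theta.
% U, Y are the (nu-by-T, ny-by-T) input and observed time images.
function J = oemCost(Theta, M, Delays, U, Y, lambda)
    [A, B, C] = constructSysMat(M, Delays, Theta);  % build M(Theta)
    sys  = ss(A, B, C, 0, 1);                       % discrete, Ts = 1
    Yhat = lsim(sys, U')';                          % predicted output y[t|Theta]
    T    = size(Y, 2);
    J    = sum(sum((Yhat - Y).^2)) / T + lambda * norm(Theta(:))^2;
end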

For both the PEM and OEM cost functions it can be shown that the model output is linear in the unknown parameter vector Θ: y[t|Θ] = Φ(t)^T Θ, where Φ(t) is a linear transform derived using the structural parameterization of the model M(Θ). This is discussed in detail in appendix ??.

The next step is iterative numerical minimization of the cost function in equation 2 to compute the optimum parameter values Θ_OPT, starting from an initial guess Θ0. We obtained good initial estimates for parameter values (such as diffuse reflectance) using a single image taken with a traditional camera, which captures steady state light transport. To determine a numerical solution to the parameter optimization problem (2) we minimized the cost function using the following iterative gradient descent methods: the Gauss-Newton, Steepest Descent and Levenberg-Marquardt algorithms. We found that the Levenberg-Marquardt (LM) algorithm outperformed the other two methods in terms of both convergence and computational complexity. An excerpt of the Matlab code that implements the above procedure for our PEM formulation using the System Identification Toolbox and Control System Toolbox is included in appendix 12.

The computation of the Jacobian J′(y[t|Θi], y[t]) of the cost function is a central step in the numerical minimization process. For our case of linear state space system identification, an analytic Jacobian can be efficiently obtained by simulating the linear state space system M(Θ). It is shown in appendix ?? that the calculation of the Jacobian boils down to simulating M(Θ) for every element of the parameter vector. Therefore, if there are O(M³) parameters in Θ, we need to simulate O(M³) linear dynamical systems at each step of the algorithm in order to compute both the Jacobian and the Hessian. This computational burden is circumvented by using adjoint methods [], allowing the computation of the Jacobian using only 2 simulations of M(Θ).
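For intuition about this scaling, the naive finite-difference gradient below requires one simulation of M(Θ) per parameter, precisely the per-element cost that the analytic and adjoint approaches avoid. This is a sketch only (oemCost is the cost sketch above; eps0 is an assumed step size); we do not use finite differences in practice, for the convergence reasons discussed in item 4 below.

% Sketch: one M(Theta) simulation per parameter, i.e. O(|Theta|) per step.
function g = costGradient(Theta, M, Delays, U, Y, lambda)
    eps0 = 1e-6;                           % assumed perturbation size
    J0 = oemCost(Theta, M, Delays, U, Y, lambda);
    g  = zeros(numel(Theta), 1);
    for p = 1:numel(Theta)                 % one simulation per parameter
        Tp = Theta; Tp(p) = Tp(p) + eps0;
        g(p) = (oemCost(Tp, M, Delays, U, Y, lambda) - J0) / eps0;
    end
end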

Inverse problems such as system identification usually do not have a unique solution. In the case of transient light transport, this implies that there are several possible scenes with different state space parameterizations M(Θ) that correspond to the same STIR(S) or time image {y[t], u[t] | t = 0, 1, . . . , T}. We now summarize the unique properties of our state-space-based formulation that allow us to regularize the inverse system identification problem, produce robust parameter estimates in the presence of noise, and reduce computational complexity:

1 Imposing structural constraints on the system matrices A(Θ), B(Θ), C(Θ) using the scene geometry G constrains the solution space of possible transfer functions {M(Θ) | Θ ∈ Ω} to scenes having the same G as the original scene but possibly different form factors

2 The structural constraints also automatically ensure that our state space parameterization M(Θ) is asymptotically stable. This means that given a bounded input, the dynamical system will eventually reach steady state, which is also a property of the physical light transport process. Requiring asymptotic stability of the model leads to a further restriction of the solution space. A detailed discussion of stability can be found in appendix ??

3 Exploiting physical properties of light transport, we can further constrain the parameters (f_ijk) of M(Θ). For instance, Helmholtz reciprocity (ρ_ijk = ρ_kji; 0 < ρ_ijk < 1) and energy conservation (Σ_{i,j,k} f_ijk < 1; 0 < f_ijk < 1) restrict the otherwise unconstrained parameter set (Ω = R^|Θ|) to a much smaller subset bounded by the unit hypercube. Starting with good initial estimates, convergence and performance can be further improved (a projection sketch follows this list)

4 One of the biggest advantages of modeling light transport as a linear state space system is that we have an analytical expression for the Jacobian J′(·) of the cost function used in the numerical minimization procedure. Moreover, J′(·) is sparse for our formulation. Usually the Jacobian is approximated by finite differences, which may lead to poor convergence. Having explicit access to the Jacobian of the cost function not only reduces computational complexity but also improves convergence and the numerical performance of the minimization algorithm, especially near Θ_OPT

5 PEM is an online algorithm and is used to update estimates as the I/O data is being recorded. In the presence of low measurement noise (high SNR) it produces accurate estimates from few observations. If the measurement noise is high, the offline OEM produces unbiased and consistent estimates of the parameter vector Θ as T → ∞. Noise analysis is discussed in appendix ??

6 Our formulation allows straightforward integration of higher order BRDF parameterizations in the state space model, as discussed in section 6.1
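As one simple illustration of item 3 (not necessarily the projection used in our implementation), the following Matlab sketch forces raw estimates back onto the physically admissible set; R and F are hypothetical M-by-M-by-M arrays of reflectances ρ_ijk and form factors f_ijk.

% Sketch: project raw estimates onto the physical constraint set.
M = size(R, 1);
for i = 1:M                                % Helmholtz reciprocity:
    for j = 1:M                            % rho_ijk = rho_kji
        for k = 1:M
            m = (R(i,j,k) + R(k,j,i)) / 2;
            R(i,j,k) = m;
            R(k,j,i) = m;
        end
    end
end
R = min(max(R, 0), 1);                     % 0 < rho_ijk < 1
F = min(max(F, 0), 1);                     % 0 < f_ijk  < 1
s = sum(F(:));
if s >= 1
    F = F * (0.999 / s);                   % energy conservation: sum < 1
end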

Like all ill-conditioned inverse problems, our inverse reflectance algorithm may be unstable. Even with the key structural and computational properties of our state space formulation, we suffer from poor convergence in low SNR (high measurement noise) conditions. There is also a strict dependence on the accuracy of the estimated scene geometry. Depending on the noise, the initial guesses and the structure of the scene, we might converge to a local minimum. The rule of thumb is to use as much a priori information about the scene as is available to impose structural and parametric constraints, and to start with good initial guesses. In the next section, we show how to use BRDF models to reduce model complexity and improve scalability.

6.1. Simplified Reflectance Models

One way to reduce the dimensionality of the problem is to use a parametric model of the BRDF. If the BRDF is unparameterized, there are O(M²) form factors per patch. Once we have computed the scene geometry G, then by assuming Lambertian (diffuse) or Phong reflection models we can decrease the number of parameters at each patch to just 1 or 3, respectively.

6.1.1 Lambertian Scenes

For demonstration of concept we will use the Lambertian or diffuse model:

\[ \rho_{kij} = \rho_{d_i} \]
\[ f_{kij} = \rho_{d_i} \cos(\theta_{in}) = f_{ki} \]
\[ l_{out}[i \to j] = \rho_{d_i} \cos(\theta_{in}) \, l_{in}[k \to i] \]

where ρ_di is the diffuse reflectance of patch p_i, and l_out[i → j] is the fraction of the incident light l_in[k → i] arriving from p_k onto p_i that is reflected towards p_j. Under the Lambertian assumption each patch radiates light equally over a hemisphere of directions, i.e., l_out[i → j] = l_out[i] for all j = 1, . . . , M. Once the scene geometry G is estimated (or known), the problem of estimating O(M³) form factors reduces to inferring just O(M) reflectance parameters, since cos(θ_in) is available.
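A minimal Matlab sketch of this reduction, assuming the geometry-derived cosines have been precomputed; rho_d and cosTheta are hypothetical variable names, not part of our implementation.

% Sketch: Lambertian form factors. cosTheta(k,i) is the cosine of the
% incidence angle at p_i for light from p_k (computable from G); only
% the M diffuse reflectances rho_d remain unknown.
F = zeros(M, M, M);
for k = 1:M
    for i = 1:M
        F(k, i, :) = rho_d(i) * cosTheta(k, i);  % f_kij independent of j
    end
end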

6.1.2 Parameterized General Scenes

If we use the modified-Phong BRDF, capable of modeling glossy (specular + diffuse) patches:

\[ \rho_{kij} = \rho_{d_i} + \rho_{s_i} [\cos(\theta_{ref})]^{\alpha_{s_i}} \]
\[ f_{kij} = \rho_{kij} \left[ v_{ki} \, v_{ij} \cos(\theta_{in}) \cos(\theta_{out}) / d_{ij}^2 \right] \]
\[ l_{out}[i \to j] = f_{kij} \, l_{in}[k \to i] \]

The model parameters ρ_di, ρ_si and α_si control the behavior of the material, ranging from purely diffuse to highly glossy, and θ_ref is the angle between the direction of true reflection and the viewing direction. Again, if the scene geometry G is estimated (or known), we need to estimate only 3M parameters.
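A corresponding Matlab sketch for the modified-Phong case, again with hypothetical geometry-derived inputs: cosIn, cosOut and cosRef are cosine terms computable from G, v(k,i) is the 0/1 visibility between patches, and d(i,j) the inter-patch distance.

% Sketch: modified-Phong form factors from known geometry and 3M
% material parameters (rho_d, rho_s, alpha_s).
F = zeros(M, M, M);
for k = 1:M
    for i = 1:M
        for j = 1:M
            if v(k,i) && v(i,j) && i ~= j
                rho = rho_d(i) + rho_s(i) * cosRef(k,i,j)^alpha_s(i);
                F(k,i,j) = rho * cosIn(k,i) * cosOut(i,j) / d(i,j)^2;
            end
        end
    end
end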


8 Implementation

In our initial experiments, we opted to verify our theory using a pulsed laser and a single photodiode. Free-space setups built around femtosecond- and picosecond-accurate devices do exist, but so far they have been used only in small-scale environments, for example in femtochemistry of biological samples and in optical coherence tomography. We believe our relatively simple configuration demonstrates that a practical implementation is within reach. In particular, we intend this prototype to show that it is possible to reason about multi-bounce global transport using the STIR of a scene, and that the time profile of light introduced into a scene can be reliably captured even after reflections of order greater than one, a harder setting than LIDAR or previously demonstrated time-gated systems. A 2D implementation using time-gated or sweep imagers is left for future work.


Figure 14: Verifying distance calculations. (a) Configuration of two detectors receiving the same femtosecond laser pulse with a distance-induced delay. (b) Plot from a 10 GHz scope showing the arrival times.

8.1 Device

Our sensor is a commercially available reverse-biased silicon photodiode, Thorlabs FDS02, intended for fiber coupling. The sensor has an active area of 250 micron diameter and a condensing lens of 1 millimeter diameter. Sensor signals (photocurrents) are digitized by a 5 GHz sampling oscilloscope, a LeCroy Wavemaster 8500A. The combination results in an impulse response width (-3 dB) of 210 picoseconds. No optical or electrical amplifiers are used, and the sensor has a gain of unity.

Scene illumination is provided by a modelocked Ti:sapphire oscillator, manufactured by Kapteyn-Murnane Laboratories, with a center wavelength of 810 nm, emitting pulses of 50 fs FWHM duration at a repetition rate of 93.68 MHz. The temporal bandwidth of these pulses so far exceeds the response bandwidth of our sensor that we consider the laser pulses to be effectively impulses. Average laser power is 420 milliwatts, corresponding to peak powers greater than 84 kW. The high peak powers generated by our oscillator are critical for preserving the signal-to-noise ratio in a photodiode generating current proportional to incident power.

8.2 Experiments and Performance Analysis

The combination of a high-speed photodiode with a modulated source of illumination is the basis of the optical time-domain reflectometer commonly used in the telecommunications industry to detect back reflections caused by defects in optical fibers. The novelty of our device is twofold: it functions in free space rather than being confined to an optical fiber, and the scale of its operation is centimeters rather than the dozens of meters to hundreds of kilometers usual in telecommunications. The increased difficulty is clear, on account of the signal loss due to both misalignment and inverse-square diffusion in free space, as well as the much higher speeds and the ensuing further attenuation of the signal.

The success of a transient light field camera depends on the ability to recover two broad factors from scene information: photometry and geometry. We demonstrated that both factors can be recovered in a free-space setting through three experiments.

The first experiment, depicted in Figure ??, tested the precision³ of the distance measurements achievable with our device, and correspondingly the ability of our sensor to recover scene geometry. A plot of the time delay against a reference arm as the detector travels on a linear stage is shown in Figure ??.

The second experiment demonstrated the recoverability of scene photometry after multiple bounces; that is, that a twice-diffused signal remains detectable. A scene diagram is given in Figure ??. The two diffusers were both anisotropic: the dull side of aluminum foil and a similar copper foil. The impulse response of this scene, consisting solely of the second bounce from the second patch, was successfully recorded, showing that a scene with two diffusions does not result in an irrecoverable loss of signal. Therefore, the photometric characteristics of the two diffusers can be inferred.

The third experiment combined demonstrations of our device's ability to recover both geometry and photometry. The scene, drawn in Figure ?? and pictured in Figure ??, consists of two mirrors outside the field of view of the sensor. The presence of the multiple hidden mirror patches is nonetheless revealed in the time profile of third bounces. Though our sensor did not have the resolution to fully distinguish the two different path lengths, the fact that the amplitude of the signal varied when one or both of the mirrors were removed is strong evidence for the recovery of the third bounce.

³A note on precision vs. resolution in our system: the temporal and therefore spatial resolution of our system is known absolutely, based on the convention that two finite-width pulses can be resolved provided that their centers are separated by half their FWHM; in our case 105 ps, corresponding to a distance of 3.15 cm. The precision and accuracy, however, are characteristics of the measurement noise.


10. Future work

An important outcome of the imaging approach in this paper is that, by exploiting emerging technologies for sensors, optics and active illumination, we open a completely new space of methods and algorithms for solving hard problems in computer vision and graphics based on time-of-flight analysis. This in turn has applications in other areas such as medical imaging and military reconnaissance. We expect our analysis and results to pave the way not only for improved approaches to existing hard problems but also for the definition of a novel class of problems that can be cast in our framework.


We consider two primary directions for future work:

Theoretical: There is still considerable scope for compact representation of the described state space system. Better optimization algorithms for state space identification can be devised with the use of surface BRDF models. The state space framework can also be used for solving rendering problems in computer graphics.

Implementation: Building hardware for a time-of-flight camera capable of imaging at picosecond or better resolution is crucial to testing our approach in practice and on real scenes. We propose the design of the very first transient light field camera in figure ??.

11. Conclusion

We have presented a conceptual framework for scene understanding through the modeling and analysis of global light transport, exploring new opportunities in multi-path analysis using time-of-flight sensors. Our approach to scene understanding is four-fold:

• Measure the scene's transient photometric response function using a directional time-of-flight camera and active impulse illumination.

• Estimate the structure and geometry of the scene using the observed TPRF.

• Use the estimated structure and geometry, along with a priori models of surface light scattering properties, to infer the BRDF form factors of the scene.

• Construct higher order inference engines that use the estimated scene properties for higher level scene abstraction and understanding.

The time-image camera described here is not available today, but the picosecond-resolution impulse response can be captured by scanning in time or space. Emerging trends in femtosecond-accurate emitters, detectors and non-linear optics may support single-shot time-image cameras. The goal of this paper is to explore the opportunities in multi-path analysis of the transient response. We developed the theoretical basis for the analysis and demonstrated potential methods for recovering scene properties using simple examples, but a complete procedure for estimating scene parameters remains future work. The contribution of this work is conceptual rather than experimental. We hope to influence the direction of future research in time-of-flight systems both in terms of sensor design and algorithms for scene understanding.

Femtosecond and picosecond lasers are high-power devices, and increasing their power involves careful design. Fortunately, there is an increasing trend towards solid state lasers, and in the future ultra-short-duration solid state lasers should bring down costs in the same way that 3DV and Canesta have done with solid state nanosecond-scale sensors and emitters. There is nothing in physical law to prevent such a trend for the foreseeable future. We require precise opto-electronics that introduce minimal measurement noise.


References

3DV SYSTEMS. http://www.3dvsystems.com/.

ANDERSSON, P. 2006. Long-range three-dimensional imaging using range-gated laser radar images. Optical Engineering 45, 034301.

ARVO, J. 1993. Transfer equations in global illumination. In Global Illumination, SIGGRAPH 93 Course Notes.

BLAIR, J., RABINE, D., AND HOFTON, M. 1999. The Laser Vegetation Imaging Sensor: a medium-altitude, digitisation-only, airborne laser altimeter for mapping vegetation and topography. ISPRS Journal of Photogrammetry and Remote Sensing 54, 2-3, 115–122.

BUSCK, J., AND HEISELBERG, H. 2004. Gated viewing and high-accuracy three-dimensional laser radar. Applied Optics 43, 24, 4705–4710.

CAMPILLO, A., AND SHAPIRO, S. 1983. Picosecond streak camera fluorometry – a review. IEEE Journal of Quantum Electronics 19, 4, 585–603.

CANESTA. http://canesta.com/.

DAS, B., YOO, K., AND ALFANO, R. 1993. Ultrafast time-gated imaging in thick tissues: a step toward optical mammography. Optics Letters 18, 13, 1092–1094.

DEBEVEC, P., HAWKINS, T., TCHOU, C., DUIKER, H.-P., SAROKIN, W., AND SAGAR, M. 2000. Acquiring the reflectance field of a human face. In SIGGRAPH '00: Proceedings of the 27th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley, New York, NY, USA, 145–156.

FARSIU, S., CHRISTOFFERSON, J., ERIKSSON, B., MILANFAR, P., FRIEDLANDER, B., SHAKOURI, A., AND NOWAK, R. 2007. Statistical detection and imaging of objects hidden in turbid media using ballistic photons. Appl. Opt. 46, 23, 5805–5822.

GARG, G., TALVALA, E.-V., LEVOY, M., AND LENSCH, H. P. A. 2006. Symmetric photography: Exploiting data-sparseness in reflectance fields. In Rendering Techniques 2006: Eurographics Symposium on Rendering, Eurographics Association, Nicosia, Cyprus, T. Akenine-Möller and W. Heidrich, Eds., 251–262.

GONZALEZ-BANOS, H., AND DAVIS, J. 2004. Computing depth under ambient illumination using multi-shuttered light. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), vol. 2.

GVILI, R., KAPLAN, A., OFEK, E., AND YAHAV, G. 2003. Depth keying. SPIE Elec. Imaging 5006, 564–574.

HAMAMATSU. www.hamamatsu.com/.

HOFTON, M., MINSTER, J., AND BLAIR, J. 2000. Decomposition of laser altimeter waveforms. IEEE Transactions on Geoscience and Remote Sensing 38, 4, 1989–1996.

IDDAN, G., AND YAHAV, G. 2001. 3D imaging in the studio (and elsewhere...). In Proc. SPIE, vol. 4298, 48–55.

IMMEL, D. S., COHEN, M. F., AND GREENBERG, D. P. 1986. A radiosity method for non-diffuse environments. In SIGGRAPH '86, 133–142.

ITATANI, J., QUERE, F., YUDIN, G., IVANOV, M., KRAUSZ, F., AND CORKUM, P. 2002. Attosecond streak camera. Physical Review Letters 88, 17, 173903.

KAILATH, T. 1979. Linear Systems. Prentice Hall Information and System Sciences Series.

KAJIYA, J. T. 1986. The rendering equation. In SIGGRAPH '86: Proceedings of the 13th annual conference on Computer graphics and interactive techniques, ACM, New York, NY, USA, 143–150.

KAMERMAN, G. 1993. Laser radar. Chapter 1 of Active Electro-Optical Systems, Vol. 6, The Infrared and Electro-Optical Systems Handbook.

KAWAKITA, M., IIZUKA, K., AIDA, T., KIKUCHI, H., FUJIKAKE, H., YONAI, J., AND TAKIZAWA, K. 2000. Axi-Vision Camera (real-time distance-mapping camera). Appl. Opt. 39, 22, 3931–3939.

KUTULAKOS, K. N., AND STEGER, E. 2007. A theory of refractive and specular 3D shape by light-path triangulation. International Journal of Computer Vision 76, 13–29.

LANGE, R., AND SEITZ, P. 2001. Solid-state time-of-flight range camera. IEEE Journal of Quantum Electronics 37, 3, 390–397.

LJUNG, L. 1987. System Identification: Theory for the User. Prentice Hall Information and System Sciences Series.

MARSCHNER, S. 1998. Inverse rendering for computer graphics. PhD dissertation, Cornell University, Ithaca, NY, USA.

MASSELUS, V., PEERS, P., DUTRE, P., AND WILLEMS, Y. D. 2003. Relighting with 4D incident light fields. In SIGGRAPH '03: ACM SIGGRAPH 2003 Papers, ACM, New York, NY, USA, 613–620.

MCLEAN, E., BURRIS JR., H., AND STRAND, M. 1995. Short-pulse range-gated optical imaging in turbid water. Applied Optics 34, 21.

MESA IMAGING. http://www.mesa-imaging.ch/.

MIYAGAWA, R., AND KANADE, T. 1997. CCD-based range-finding sensor. IEEE Transactions on Electron Devices 44, 10, 1648–1652.

MORRIS, N. J. W., AND KUTULAKOS, K. N. 2007. Reconstructing the surface of inhomogeneous transparent scenes by scatter trace photography. In Proceedings of the 11th International Conference on Computer Vision.

NARASIMHAN, S. G., GUPTA, M., DONNER, C., RAMAMOORTHI, R., NAYAR, S. K., AND JENSEN, H. W. 2006. Acquiring scattering properties of participating media by dilution. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, ACM, New York, NY, USA, 1003–1012.

NAYAR, S. K., IKEUCHI, K., AND KANADE, T. 1990. Shape from interreflections. In Third International Conference on Computer Vision.

NAYAR, S. K., KRISHNAN, G., GROSSBERG, M. D., AND RASKAR, R. 2006. Fast separation of direct and global components of a scene using high frequency illumination. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, ACM, New York, NY, USA, 935–944.

PATOW, G., AND PUEYO, X. 2003. A survey of inverse rendering problems. Computer Graphics Forum 22, 4, 663–687.

PMD TECHNOLOGIES. http://www.pmdtec.com/.

RAMAMOORTHI, R., AND HANRAHAN, P. 2001. A signal-processing framework for inverse rendering. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, ACM, New York, NY, USA, 117–128.

RUSSELL, G., BELL, J., HOLT, P., AND CLARKE, S. 1996. Sonar image interpretation and modelling. In Proceedings of the 1996 Symposium on Autonomous Underwater Vehicle Technology (AUV '96), 317–324.

SCHROEDER, W., FORGBER, E., AND ESTABLE, S. 1999. Scannerless laser range camera. Sensor Review 19, 4, 28–29.

SEITZ, S. M., MATSUSHITA, Y., AND KUTULAKOS, K. N. 2005. A theory of inverse light transport. In Proc. Tenth IEEE International Conference on Computer Vision (ICCV 2005), vol. 2, 1440–1447.

SEN, P., CHEN, B., GARG, G., MARSCHNER, S. R., HOROWITZ, M., LEVOY, M., AND LENSCH, H. P. A. 2005. Dual photography. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, ACM, New York, NY, USA, 745–755.

VANDAPEL, N., AMIDI, O., AND MILLER, J. 2004. Toward laser pulse waveform analysis for scene interpretation. In Proceedings of the 2004 IEEE International Conference on Robotics and Automation (ICRA '04), vol. 1.

VERHAEGEN, M., AND VERDULT, V. 2007. Filtering and System Identification: A Least Squares Approach. Cambridge University Press.

YU, Y., DEBEVEC, P., MALIK, J., AND HAWKINS, T. 1999. Inverse global illumination: recovering reflectance models of real scenes from photographs. In SIGGRAPH '99: Proceedings of the 26th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley, New York, NY, USA, 215–224.

12 Appendix

Matlab source code

%% System identification code
% M = # of patches in scene; Delays = pairwise delay matrix
% Y(:,1:T) = time image output for T time instants
% U(:,1:T) = time image input for T time instants

% Initial guess for parameter values f[i,j,k]
initialGuess = 0.1 .* ones(M, M, M);

% Construct system matrices. Implements section 2
[A, B, C] = constructSysMat(M, Delays, initialGuess);

% Initial state vector
L0 = zeros(size(A, 1), 1);

% Construct state space model with initial estimates
ssModel = idss(A, B, C, 0, 0, L0, 'Ts', 1, ...
    'SSparameterization', 'structured');

% Identify parameters to be estimated by setting them to NaN
[Aest, Best, Cest] = setParams2NaN(A, B, C);
ssModel.As = Aest; ssModel.Bs = Best; ssModel.Cs = Cest;

% Create I/O data (iddata expects T-by-ny and T-by-nu arrays)
data = iddata(Y(:,1:T)', U(:,1:T)', 1);

% Prediction error minimization to estimate parameters
modelEstimated = pem(data, ssModel);

% Read out the estimated parameters
estimates = readEstimates(modelEstimated.As, ...
    modelEstimated.Bs, modelEstimated.Cs);
