Lectures given 31 Jan. - 4 Feb., 1994
F. JAMES
Introduction to NEURAL NETWORKS
ACADEMIC TRAINING PROGRAMME
CERN
February 8, 1994
[Transparency] THE VON NEUMANN COMPUTER was once a model of how the human brain works:
- sequential execution of instructions
- a store containing instructions and data
- most of the store is idle most of the time
[Two transparencies of tabular notes comparing inputs and system properties; illegible in the scan, and the same table appears twice.]
[Transparency] THE ARTIFICIAL NEURON
The inputs x_i are multiplied by weights w_i and summed, and the output is obtained by passing the sum through a transfer function g:

    output = g( sum_i w_i x_i )

where g may be a hard threshold (output 1 if the weighted sum exceeds the threshold, 0 otherwise) or a smooth "sigmoid" such as g(a) = 1/(1 + e^(-a)). [Remaining annotations illegible in the scan.]
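The transparency's neuron can be written out concretely. The sketch below is illustrative only (the lectures' software context is FORTRAN 77 / JETNET, not Python); the AND-gate weights and threshold are standard textbook values, not taken from the notes.

```python
import math

def neuron(inputs, weights, bias, transfer="sigmoid"):
    """One artificial neuron: weighted sum of inputs passed through a transfer function."""
    a = sum(w * x for w, x in zip(weights, inputs)) + bias
    if transfer == "sigmoid":
        return 1.0 / (1.0 + math.exp(-a))   # smooth threshold, output in (0, 1)
    return 1.0 if a > 0 else 0.0            # hard threshold (McCulloch-Pitts style)

# A two-input neuron computing logical AND with a hard threshold:
and_gate = lambda x1, x2: neuron([x1, x2], [1.0, 1.0], -1.5, transfer="step")
print(and_gate(1, 1), and_gate(1, 0))  # -> 1.0 0.0
```

With the sigmoid transfer and zero bias, a zero weighted sum gives output 0.5, the midpoint of the smooth threshold.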
[Transparency] WHAT IS SO EXCITING ABOUT N.N.?
(1) Massively parallel → fast.
(2) Can do what [no] other known methods [can]; [solve hard] problems.
(3) No need to program: learn from examples.
(4) Universal paradigm.
(5) [Error-correcting] memory.
(6) Insensitive to [noisy or missing data].
(7) It is speculated that traditional programming will become obsolete as computers [evolve]; a crazy extrapolation?
[Transparency: worked example of a single neuron with two inputs, whose output fires if the weighted sum exceeds a limit (threshold); details illegible in the scan.]
[Transparency] SUPERVISED LEARNING
Suppose we have a network (representing a family of functions through its free parameters, the weights) and a TRAINING SAMPLE. Let us train the network, adjusting its free parameters, so that it generates the desired decision [for each pattern].
THE BACK-PROPAGATION ALGORITHM (Rumelhart, Hinton and Williams, 1986)
(1) Calculate the error (chi-square) of the network by a forward pass of the training sample through the net.
(2) Calculate all the derivatives dE/dw by the chain rule, propagating backward through the net.
(3) Modify the weights by a step in the "steepest descent" direction (the step length is a compromise, conventionally proportional to the negative gradient through a learning-rate factor).
Repeat. This technique is very slow, but it sometimes works, and it is very popular.
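The three back-propagation steps can be made concrete on a tiny network. The Python sketch below is purely illustrative (a 2-2-1 sigmoid network with no bias terms; the learning rate, seed and single training pattern are invented for the demonstration and are not part of the lecture material).

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W1, W2):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]  # hidden layer
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)))                   # output unit
    return h, y

def backprop_step(x, target, W1, W2, eta):
    """One steepest-descent update: forward pass, then chain-rule derivatives."""
    h, y = forward(x, W1, W2)
    d_out = (y - target) * y * (1.0 - y)           # dE/da at the output unit
    for j, hj in enumerate(h):                     # hidden deltas, propagated backward
        d_hid = d_out * W2[j] * hj * (1.0 - hj)
        for i, xi in enumerate(x):
            W1[j][i] -= eta * d_hid * xi
    for j, hj in enumerate(h):
        W2[j] -= eta * d_out * hj
    return 0.5 * (y - target) ** 2                 # error before the update

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(2)]
errors = [backprop_step([1.0, 0.0], 1.0, W1, W2, eta=0.5) for _ in range(50)]
print(errors[0] > errors[-1])  # the error on the pattern decreases
```

With the fixed seed the run is deterministic; the recorded error on the single training pattern decreases over the 50 steps.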
Computer Physics Communications 70 (1992) 167-182
North-Holland

Pattern recognition in high energy physics with artificial neural networks - JETNET 2.0

Leif Lonnblad, Carsten Peterson and Thorsteinn Rognvaldsson
Department of Theoretical Physics, University of Lund, Solvegatan 14 A, S-223 62 Lund, Sweden

Received 27 August 1991

A F77 package of adaptive artificial neural network algorithms, JETNET 2.0, is presented. Its primary target is the high energy physics community, but it is general enough to be used in any pattern-recognition application area. The basic ingredients are the multilayer perceptron back-propagation algorithm and the topological self-organizing map. The package consists of a set of subroutines, which can either be used with standard options or be easily modified to host alternative architectures and procedures.

PROGRAM SUMMARY

Title of program: JETNET version 2.0
Catalogue number: ACGV
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland (see application form in this issue)
Computer for which the program is designed: DECstation, SUN, Apollo, VAX, IBM and others with a F77 compiler
Computer: DECstation 3100; Installation: Department of Theoretical Physics, University of Lund, Sweden
Operating system under which the program has been tested: ULTRIX RISC 4.2
Programming language used: FORTRAN 77
Memory required to execute with typical data: ~90 kwords
No. of bits in a word: 32
Peripherals used: terminal for input, terminal or printer for output
No. of lines in distributed program, including test deck data, etc.: 3345
Keywords: pattern recognition, jet identification, artificial neural network
Licensing provisions: none

Nature of physical problem
High energy physics offers many challenging pattern-recognition problems. It could be separating photons from leptons based on calorimeter information or the identification of a quark based on the kinematics of the hadronic fragmentation products. Standard procedure for such recognition problems is the introduction of relevant cuts in the multi-dimensional data.

Method of solution
Artificial neural networks (ANN) have turned out to be a very powerful paradigm for automated feature recognition in a wide range of problem areas. In particular feed-forward multilayer networks are widely used due to their simplicity and good performance. JETNET 2.0 implements such a network with the back-propagation updating algorithm.
LU TP 93-29
CERN-TH.7135/94
December 1993

JETNET 3.0 - A Versatile Artificial Neural Network Package

Carsten Peterson and Thorsteinn Rognvaldsson
Department of Theoretical Physics, University of Lund, Solvegatan 14 A, S-223 62 Lund, Sweden

Leif Lonnblad
Theory Division, CERN, CH 1211 Geneva 23, Switzerland

Submitted to Computer Physics Communications

PROGRAM SUMMARY

Title of Program: JETNET version 3.0
Catalogue number:
Program obtainable from: denni@thep.lu.se or via anonymous ftp from thep.lu.se in directory pub/jetnet/, or from freehep.scri.fsu.edu in directory freehep/analysis/jetnet
Computer for which the programme is designed: DEC Alpha, DECstation, SUN, Apollo, VAX, IBM, Hewlett-Packard, and others with a F77 compiler
Computer: DEC Alpha 3000; Installation: Department of Theoretical Physics, University of Lund, Lund, Sweden
Operating system: DEC OSF 1.3
[Transparency] THE COST FUNCTION
The quadratic cost (chi-square) is the optimal choice when fitting Gaussian-distributed data; other cost functions are appropriate for other error distributions (e.g. Poisson). [Notes on the limiting behaviour of the error function E(w) and on the solution when the weights get very far from the starting point are illegible in the scan.]
[Transparency] DID YOU SAY STEEPEST?
Steepest descent uses only the gradient, which gives the locally steepest direction but carries no "second-order" information: it doesn't know where the minimum of the function is, or how far away it lies.
[Transparency] SECOND-DERIVATIVE METHODS
Methods using the second derivative (the curvature) can predict where the minimum lies and take a finite step toward it, but such a step is unstable if a second derivative is negative. [Remaining annotations illegible in the scan.]
[Transparency] CONJUGATE GRADIENT MINIMIZATION (Fletcher and Reeves, 1964)
Consider a quadratic function

    F(x) = c + g.x + (1/2) x.A.x

Two directions d_i and d_j are said to be CONJUGATE if d_i.A.d_j = 0.
(1) Choose the first direction (steepest descent) and minimize F along it.
(2) Choose the second direction conjugate to the first, and minimize along it.
(3) Continue in the same way: each new direction is chosen conjugate to the previous ones, and the successive gradients come out mutually orthogonal.
It is possible to find conjugate directions without knowing A explicitly, and with N line minimizations along conjugate directions the minimum of an N-dimensional quadratic function is found exactly.
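The Fletcher-Reeves recipe can be written down directly for a quadratic. In the Python sketch below the gradient of F(x) = (1/2) x.A.x - b.x is g = A.x - b; the 2x2 matrix and vector are an invented example, not from the notes. As the transparency states, N line minimizations (here N = 2) reach the minimum exactly.

```python
def conjugate_gradient(A, b, x, n_steps):
    """Fletcher-Reeves conjugate gradient for F(x) = 1/2 x.A.x - b.x."""
    def matvec(M, v): return [sum(Mij * vj for Mij, vj in zip(row, v)) for row in M]
    def dot(u, v): return sum(ui * vi for ui, vi in zip(u, v))
    g = [gi - bi for gi, bi in zip(matvec(A, x), b)]
    d = [-gi for gi in g]                      # first direction: steepest descent
    for _ in range(n_steps):
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)        # exact line minimization along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / dot(g, g)   # Fletcher-Reeves formula
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]                   # symmetric positive definite
b = [1.0, 2.0]
x = conjugate_gradient(A, b, [0.0, 0.0], 2)    # N = 2 steps suffice for a quadratic
print(x)  # the minimizer solves A.x = b
```

After the two steps the gradient vanishes to rounding accuracy: x = (1/11, 7/11), the solution of A.x = b.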
[Transparency] SIMULATED ANNEALING
To reach the GLOBAL MINIMUM without being trapped in LOCAL MINIMA, run a random search starting at HIGH TEMPERATURE → LOW TEMPERATURE, gradually "cooling" the system.
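A minimal Metropolis-style sketch of the idea in Python; the double-well test function, step size, cooling schedule and seed are all invented for the illustration and are not from the notes.

```python
import math, random

def simulated_annealing(f, x0, t_start=10.0, t_end=1e-3, cooling=0.95, seed=1):
    """Random steps, accepting uphill moves with probability exp(-dE/T);
    the temperature T is lowered slowly (high T -> low T)."""
    rng = random.Random(seed)
    x, t, best = x0, t_start, x0
    while t > t_end:
        for _ in range(50):                       # a few trial moves per temperature
            x_new = x + rng.uniform(-2.0, 2.0)
            dE = f(x_new) - f(x)
            if dE < 0 or rng.random() < math.exp(-dE / t):
                x = x_new
                if f(x) < f(best):
                    best = x
        t *= cooling                              # slow cooling
    return best

# A double-well function: local minimum near x = +2, global minimum near x = -2.
f = lambda x: x**4 - 8 * x * x + x
x = simulated_annealing(f, x0=2.0)                # start in the wrong (local) minimum
print("best x found:", round(x, 3))
```

Because the best point visited is recorded, the answer is never worse than the starting local minimum; at high temperature the uphill acceptances let the search cross the barrier toward the global minimum.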
[Transparency] DIFFERENT STRATEGIES OF USING THE TRAINING SAMPLE
(1) Minimize (update the weights) after each pattern;
(2) cycle over sets (subsets) of the training sample, updating after a few patterns;
(3) update only over the whole training sample (known as "global" updating).
One cycle through the whole training sample is called an EPOCH.
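The per-pattern and whole-sample strategies can be contrasted on the simplest possible case, a single linear unit trained by least squares; the data and learning rate below are invented for the illustration.

```python
def grad(w, x, y):
    # dE/dw for E = 1/2 (w*x - y)^2, a single linear unit
    return (w * x - y) * x

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # roughly y = 2x
eta = 0.02

# Per-pattern (strategy 1): update the weight after every pattern.
w_online = 0.0
for epoch in range(100):                     # one epoch = one pass over the sample
    for x, y in data:
        w_online -= eta * grad(w_online, x, y)

# Global (strategy 3): accumulate the gradient over the whole sample,
# then update once per epoch.
w_batch = 0.0
for epoch in range(100):
    g = sum(grad(w_batch, x, y) for x, y in data)
    w_batch -= eta * g

print(round(w_online, 2), round(w_batch, 2))
```

Both strategies converge toward the least-squares slope (here about 1.99); the per-pattern updates wander slightly around it within each epoch.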
SYNAPSE-1, designed to simulate the human brain, is the fastest neural computer in the world
By Dr. Ulrich Ramacher, Siemens AG

Until now, conventional computers have fallen far short in replicating what the human brain, nervous system and senses appear to achieve without any conscious effort. For example, even very powerful computers are still a long way from understanding natural language or recognizing faces. Yet automatic speech and image recognition, or making sensible guesses when knowledge is incomplete (recognizing handwritten letters and numbers, or predicting the movement of exchange rates) are the sort of things we will want our computers to do in the future. Even the fastest workstations take weeks, sometimes months, to perform such tasks; so a special computer is required which can simulate all 38 types of neural networks and learning [procedures]. Unlike computers based on the von Neumann model, a neural computer is equipped with a network of simple processors providing massively parallel processing. Successful results have already been achieved in scientific and industrial research and development; artificial neural networks can now recognize a wide range of [patterns].

A leap in computing capacity
The major obstacle to the rapid development of new applications is the difficulty of simulating the neural learning process: the computing power required exceeds existing resources by at least five orders of magnitude. SYNAPSE-1 is the result of two years' work by the Central Research and Development Department at Siemens AG in Munich, directed towards dramatically reducing the time needed to process neural applications. It can perform more than five billion fixed-point operations (i.e. multiplying or adding 16-bit numbers) every second. With power on this scale, SYNAPSE-1 is not only 8 000 times faster than a powerful workstation, it is currently the world's fastest neural computer.

Neurocomputer SYNAPSE-1
The capabilities of the human brain and senses are difficult to model by classical information technology. On the other hand, these capabilities, e.g. image and speech recognition, solution of complex optimization problems and especially learning capabilities, become more and more important. Therefore, natural neural systems are modelled by artificial neural networks in order to tackle the above mentioned problem areas.

The structure of artificial neural networks is considerably different from the architecture of today's computer systems. Although neural networks can be simulated on conventional computers like workstations, only very simple cases can be explored because of the extremely long processing times. Especially the simulation of the neural learning process prevents fast development of neural applications on such platforms.

SYNAPSE-1 fulfills all the relevant requirements for a universal neurocomputer. Special purpose computers that take into account the architecture of neural networks are orders of magnitude above conventional computers, so that development times can be minimized compared with software solutions. The neurocomputer SYNAPSE-1 offers a performance that lies several orders of magnitude above that of a powerful workstation, and support of small, medium and large neural networks, so that a complete area of potential applications can be implemented. The power of SYNAPSE-1 (up to 3.2 billion connections or multiplications and accumulations per second) results from a scalable multi-processor and memory architecture and from the neuro-signal-processor MA16, which executes the compute-intensive operations of neural algorithms. Any of today's neural networks are supported and provision is made for future developments ("general purpose" architecture).

Technical data. The system consists of the following hardware components:
- an array of 8 MA16 chips: peak performance 3.2 billion multiplications (16x16 bit) and accumulations (48 bit) per second; frequency rate 25 MHz; weight memory capacity 128 MBytes;
- a data unit for non compute-intensive operations: peak performance 100 million integer operations per second; working memory 8 MBytes; VME bus connection 4 MBytes/s;
- a control unit for control and coordination: peak performance sequencer 20 MIPS; peak performance address generator 25 million addresses per second; working memory 8 MBytes; VME bus connection 4 MBytes/s; frequency rate 25 MHz;
- a high-bandwidth memory: capacity 32 MBytes; transfer rates: backplane 0.8 GBytes/s (plus 100 MBytes/s parity), data busses 0.8 GBytes/s, control busses 200 MBytes/s, address bus 56 MBytes/s.
Communication with host workstation and specialized input/output devices is implemented via a VME bus interface.

On the software side, the system is delivered with microprograms, operating software and a workstation programming environment. Applications are programmed in the easy-to-use "neural Algorithms Programming Language" (nAPL), which supports the user in the development of his algorithms. nAPL is embedded in C++.

For more information, please contact:
Siemens Nixdorf Informationssysteme SA, International Accounts, Peter Puhlmann,
Avenue des Baumettes 7, CH-1020 Renens.
Tel.: (021) 632 01 11, Fax: (021) 635 88 82
[Transparency] GENERALIZATION: the response of the network to a pattern it hasn't seen. A network that merely memorizes the training sample (OVERFITS the data) acts as a black box that will not generalize.

[Transparency] Let us step back and consider the global problem: choose the number of neurons and connections, then find the values of the weights from the training sample. This is an INVERSE PROBLEM: the chapter on inverse problems in "Numerical Recipes in Fortran" by William Press et al. (pages 795 ff.) treats [the same ideas]. Of course the neural-network case is non-linear, but there is much in common.
18.4 Inverse Problems and the Use of A Priori Information

Later discussion will be facilitated by some preliminary mention of a couple of mathematical points. Suppose that u is an "unknown" vector that we plan to determine by some minimization principle. Let A[u] > 0 and B[u] > 0 be two positive functionals of u, so that we can try to determine u by either

    minimize: A[u]    or    minimize: B[u]    (18.4.1)

(Of course these will generally give different answers for u.) As another possibility, now suppose that we want to minimize A[u] subject to the constraint that B[u] have some particular value, say b. The method of Lagrange multipliers gives the variation

    (d/du) {A[u] + L1(B[u] - b)} = (d/du) (A[u] + L1 B[u]) = 0    (18.4.2)

where L1 is a Lagrange multiplier. Notice that b is absent in the second equality, since it doesn't depend on u.

Next, suppose that we change our minds and decide to minimize B[u] subject to the constraint that A[u] have a particular value, a. Instead of equation (18.4.2) we have

    (d/du) {B[u] + L2(A[u] - a)} = (d/du) (B[u] + L2 A[u]) = 0    (18.4.3)

with, this time, L2 the Lagrange multiplier. Multiplying equation (18.4.3) by the constant 1/L2, and identifying 1/L2 with L1, we see that the actual variations are exactly the same in the two cases. Both cases will yield the same one-parameter family of solutions, say, u(L1). As L1 varies from 0 to infinity, the solution u(L1) varies along a so-called trade-off curve between the problem of minimizing A and the problem of minimizing B. Any solution along this curve can equally well be thought of as either (i) a minimization of A for some constrained value of B, or (ii) a minimization of B for some constrained value of A, or (iii) a weighted minimization of the sum A + L1 B.

The second preliminary point has to do with degenerate minimization principles. In the example above, now suppose that A[u] has the particular form

    A[u] = |A.u - c|^2    (18.4.4)

for some matrix A and vector c. If A has fewer rows than columns, or if A is square but degenerate (has a nontrivial nullspace, see 2.6, especially Figure 2.6.1), then minimizing A[u] will not give a unique solution for u. (To see why, review 15.4, and note that for a "design matrix" A with fewer rows than columns, the matrix A^T.A in the normal equations 15.4.10 is degenerate.) However, if we add any multiple L times a nondegenerate quadratic form B[u], for example u.H.u with H a positive definite matrix, then minimization of A[u] + L B[u] will lead to a unique solution for u. (The sum of two quadratic forms is itself a quadratic form, with the second piece guaranteeing nondegeneracy.)

We can combine these two points, for this conclusion: When a quadratic minimization principle is combined with a quadratic constraint, and both are positive, only one of the two need be nondegenerate for the overall problem to be well-posed. We are now equipped to face the subject of inverse problems.

The Inverse Problem with Zeroth-Order Regularization

Suppose that u(x) is some unknown or underlying (u stands for both unknown and underlying!) physical process, which we hope to determine by a set of N measurements c_i, i = 1, 2, ..., N. The relation between u(x) and the c_i's is that each c_i measures a (hopefully distinct) aspect of u(x) through its own linear response kernel r_i, and with its own measurement error n_i. In other words,

    c_i = s_i + n_i = Int r_i(x) u(x) dx + n_i    (18.4.5)

(compare this to equations 13.3.1 and 13.3.2). Within the assumption of linearity, this is quite a general formulation. The c_i's might approximate values of u(x) at certain locations x_i, in which case r_i(x) would have the form of a more or less narrow instrumental response centered around x = x_i. Or, the c_i's might "live" in an entirely different function space from u(x), measuring different Fourier components of u(x) for example.

The inverse problem is, given the c_i's, the r_i(x)'s, and perhaps some information about the errors n_i such as their covariance matrix

    S_ij = Covar[n_i, n_j]    (18.4.6)

how do we find a good statistical estimator of u(x), call it u^(x)?

It should be obvious that this is an ill-posed problem. After all, how can we reconstruct a whole function u^(x) from only a finite number of discrete values c_i? Yet, whether formally or informally, we do this all the time in science. We routinely measure "enough points" and then "draw a curve through them." In doing so, we are making some assumptions, either about the underlying function u(x), or about the nature of the response functions r_i(x), or both. Our purpose now is to formalize these assumptions, and to extend our abilities to cases where the measurements and underlying function live in quite different function spaces. (How do you "draw a curve" through a scattering of Fourier coefficients?)

We can't really want every point x of the function u^(x). We do want some large number M of discrete points x_mu, mu = 1, 2, ..., M, where M is sufficiently large, and the x_mu's are sufficiently evenly spaced, that neither u(x) nor r_i(x) varies much between any x_mu and x_mu+1. (Here and following we will use Greek letters like mu to denote values in the space of the underlying process, and Roman letters like i to denote values of immediate observables.) For such a dense set of x_mu's, we can replace equation (18.4.5) by a quadrature like

    c_i = Sum_mu R_i,mu u(x_mu) + n_i    (18.4.7)

where the N x M matrix R has components

    R_i,mu = r_i(x_mu)(x_mu+1 - x_mu-1)/2    (18.4.8)

(or any other simple quadrature; it rarely matters which). We will view equations (18.4.5) and (18.4.7) as being equivalent for practical purposes.

How do you solve a set of equations like equation (18.4.7) for the unknown u(x_mu)'s? Here is a bad way, but one that contains the germ of some correct ideas: Form a chi-square measure of how well a model u^(x) agrees with the measured data,

    chi^2 = Sum_i Sum_j [c_i - Sum_mu R_i,mu u^(x_mu)] S^(-1)_ij [c_j - Sum_mu R_j,mu u^(x_mu)]
          ~ Sum_i [c_i - Sum_mu R_i,mu u^(x_mu)]^2 / sigma_i^2    (18.4.9)

(compare with equation 15.1.5). Here S^(-1) is the inverse of the covariance matrix, and the approximate equality holds if you can neglect the off-diagonal covariances, with sigma_i = (Covar[i, i])^(1/2).

Now you can use the method of singular value decomposition (SVD) in 15.4 to find the vector u^ that minimizes equation (18.4.9). Don't try to use the method of normal equations; since M is greater than N they will be singular, as we already discussed. The SVD process will thus surely find a large number of zero singular values, indicative of a highly non-unique solution. Among the infinity of degenerate solutions (most of them badly behaved with arbitrarily large u^(x_mu)'s) SVD will select the one with smallest |u^| in the sense of

    Sum_mu [u^(x_mu)]^2  a minimum    (18.4.10)

(look at Figure 2.6.1). This solution is often called the principal solution. It is a limiting case of what is called zeroth-order regularization, corresponding to minimizing the sum of the two positive functionals

    minimize: chi^2[u^] + L (u^.u^)    (18.4.11)

in the limit of small L. Below, we will learn how to do such minimizations, as well as more general ones, without the ad hoc use of SVD.

What happens if we determine u^ by equation (18.4.11) with a non-infinitesimal value of L? First, note that if M > N (many more unknowns than equations), then u will often have enough freedom to be able to make chi^2 (equation 18.4.9) quite unrealistically small, if not zero. In the language of 15.1, the number of degrees of freedom nu = N - M, which is approximately the expected value of chi^2 when nu is large, is being driven down to zero (and, not meaningfully, beyond). Yet, we know that for the true underlying function u(x), which has no adjustable parameters, the number of degrees of freedom and the expected value of chi^2 should be about nu ~ N.

Increasing L pulls the solution away from minimizing chi^2 in favor of minimizing u^.u^. From the preliminary discussion above, we can view this as minimizing u^.u^ subject to the constraint that chi^2 have some constant nonzero value. A popular choice, in fact, is to find that value of L which yields chi^2 = N, that is, to get about as much extra regularization as a plausible value of chi^2 dictates. The resulting u^(x) is called the solution of the inverse problem with zeroth-order regularization.

The value N is actually a surrogate for any value drawn from a Gaussian distribution with mean N and standard deviation (2N)^(1/2) (the asymptotic chi-square distribution). One might equally plausibly try two values of L, one giving chi^2 = N + (2N)^(1/2), the other N - (2N)^(1/2).

Zeroth-order regularization, though dominated by better methods, demonstrates most of the basic ideas that are used in inverse problem theory. In general, there are two positive functionals, call them A and B. The first, A, measures something like the agreement of a model to the data (e.g., chi-square), or sometimes a related quantity like the "sharpness" of the mapping between the solution and the underlying function. When A by itself is minimized, the agreement or sharpness becomes very good (often impossibly good), but the solution becomes unstable, wildly oscillating, or in other ways unrealistic, reflecting that A alone typically defines a highly degenerate minimization problem.

That is where B comes in. It measures something like the "smoothness" of the desired solution, or sometimes a related quantity that parametrizes the stability of the solution with respect to variations in the data, or sometimes a quantity reflecting a priori judgments about the likelihood of a solution. B is called the stabilizing functional or regularizing operator. In any case, minimizing B by itself is supposed to give a solution that is "smooth" or "stable" or "likely", and that has nothing at all to do with the measured data.

[Figure 18.4.1; only the labels "achievable solutions", "best agreement (independent of smoothness)" and "Better Smoothness B" are recoverable from the scan.]
Figure 18.4.1. Almost all inverse problem methods involve a trade-off between two optimizations: agreement between data and solution, or "sharpness" of mapping between true and estimated solution (here denoted A), and smoothness or stability of the solution (here denoted B). Among all possible solutions, shown here schematically as the shaded region, those on the boundary connecting the unconstrained minimum of A and the unconstrained minimum of B are the "best" solutions, in the sense that every other solution is dominated by at least one solution on the curve.
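The minimization (18.4.11) with the quadrature matrix R and unit measurement errors (S taken as the identity) reduces to solving the regularized normal equations (R^T R + L 1) u^ = R^T c. The NumPy sketch below is illustrative only: the grid, response matrix, noise level and the two values of L are invented for the demonstration. It exhibits the behaviour the text describes, that increasing L shrinks |u^|.

```python
import numpy as np

def zeroth_order_reg(R, c, lam):
    """Minimize |R u - c|^2 + lam |u|^2 via the regularized normal equations."""
    M = R.shape[1]
    return np.linalg.solve(R.T @ R + lam * np.eye(M), R.T @ c)

rng = np.random.default_rng(0)
M, N = 20, 5                                  # many more unknowns than measurements
x = np.linspace(0.0, 1.0, M)
u_true = np.sin(np.pi * x)                    # underlying function on a dense grid
R = rng.normal(size=(N, M)) / M               # N broad response kernels (invented)
c = R @ u_true + 0.001 * rng.normal(size=N)   # noisy measurements

u_small = zeroth_order_reg(R, c, lam=1e-8)    # near the principal solution
u_reg = zeroth_order_reg(R, c, lam=1e-2)      # more regularization -> smaller |u|
print(np.linalg.norm(u_small), np.linalg.norm(u_reg))
```

With the tiny L the data are fitted almost exactly (the residual |R u^ - c| is negligible); the larger L trades some agreement for a solution of smaller norm, moving along the trade-off curve of Figure 18.4.1.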
[Transparency] REGULARIZATION MAKES [better] GENERALIZATION
Fit the data, but not [perfectly]; avoid overfitting:
(1) Use the simplest network that will solve the problem:
    - remove connections with small weights,
    - remove neurons with small outputs,
    - if 2 weights are [nearly equal, combine them].
(2) Constrain the weights to be small.
6.6 Optimal Network Architectures

FIGURE 6.16 Network construction algorithm. The black dots are inputs, while the numbered circles are threshold units, numbered in order of their creation. Shaded arrows represent connections from all the units or inputs at the arrow's tail. (a) Marchand et al. [1990]. (b) Frean [1990]. (c) Mezard and Nadal [1989].
[Figure graphics not recoverable from the scan.]
[Transparency] Instead of training a network with a large, fully connected hidden layer, a [different] technique consists in ADDING HIDDEN NODES one at a time, STARTING FROM A MINIMAL-SIZE network.

[Transparency] TRAINING SAMPLE SIZE: sometimes we have [little] a priori knowledge of the required training sample size. [Notes relating the sample size p to the number of inputs N and to the "dimension" (connectivity) of a feed-forward network are illegible in the scan.]
[Transparency] LEARNING FROM THE VIEWPOINT OF COMPUTATIONAL COMPLEXITY
How does the learning time grow as a function of the size of the network? COMPLEXITY THEORY SAYS THIS IS AN "NP-COMPLETE" PROBLEM, which means:
(1) no one knows how to solve it [in polynomial time];
(2) [in the worst case the time is believed to grow] exponentially.
POLYNOMIAL TIME means that for large N the time increases like N^k [for some fixed k]. EXPONENTIAL time means in the worst case T ~ 2^N, so each added neuron [doubles] the time: 2^10 ~ 1000, [i.e. going from 200 to 400 neurons can turn a soluble problem into an insoluble one].
[Further notes on finding the minimum with integer weights and on the required accuracy are illegible in the scan.]
[Transparency] EXAMPLE: THE FAMOUS TRAVELLING SALESMAN PROBLEM
Find the tour that visits ALL CITIES with the MINIMUM TOTAL TRAVELLING [distance]. Finding the exact solution is NP-complete, [hard] in the worst case, but sometimes easy [in practice].
Approximate solutions are sometimes not [hard to find]:
(1) for the T.S.P., using the minimum spanning tree, a solution can be found whose length is less than 3/2 x the length of the [optimal tour];
(2) heuristic techniques [often give a good] solution.
Many other methods give good approximations in polynomial time.
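The minimum-spanning-tree idea the transparency mentions can be sketched. Note that the simple "double tree" shortcut below only guarantees a tour within a factor 2 of the optimum; the 3/2 bound quoted on the slide corresponds to Christofides' refinement, which adds a matching step not implemented here. The city coordinates are invented for the illustration.

```python
import itertools, math

def tsp_mst_approx(points):
    """MST 'double-tree' heuristic: preorder walk of a minimum spanning tree.
    For metric instances the resulting tour is at most 2x the optimum."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # Prim's algorithm for the minimum spanning tree.
    in_tree, parent = {0}, {}
    best = {j: (dist(0, j), 0) for j in range(1, n)}
    while len(in_tree) < n:
        j = min(best, key=lambda k: best[k][0])
        parent[j] = best[j][1]
        in_tree.add(j)
        del best[j]
        for k in best:
            if dist(j, k) < best[k][0]:
                best[k] = (dist(j, k), j)
    children = {i: [] for i in range(n)}
    for j, p in parent.items():
        children[p].append(j)
    # Preorder walk = depth-first shortcutting of the doubled tree.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour

def tour_length(points, tour):
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0.5), (0.5, 2)]
t = tsp_mst_approx(pts)
opt = min(tour_length(pts, [0] + list(p)) for p in itertools.permutations(range(1, 6)))
print(tour_length(pts, t) <= 2 * opt)  # True: the factor-2 guarantee
```

For this 6-city instance the exact optimum can still be found by brute force (120 permutations), so the guarantee can be checked directly; the heuristic itself runs in polynomial time.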
Artificial Neural Networks: Approximation and Learning Theory

Halbert White
with A. R. Gallant, K. Hornik, M. Stinchcombe, and J. Wooldridge

Part I  Approximation Theory
  There Exists a Neural Network That Does Not Make Avoidable Mistakes (with A. R. Gallant)
  Multilayer Feedforward Networks Are Universal Approximators (with K. Hornik and M. Stinchcombe)  12
  Universal Approximation Using Feedforward Networks with Non-sigmoid Hidden Layer Activation Functions (with M. Stinchcombe)  29
  Approximating and Learning Unknown Mappings Using Multilayer Feedforward Networks with Bounded Weights (with M. Stinchcombe)  41
  Universal Approximation of an Unknown Mapping and Its Derivative (with K. Hornik and M. Stinchcombe)  55

Part II  Learning and Statistics  79
  Neural Network Learning and Statistics  81
  Learning in Artificial Neural Networks: a Statistical Perspective
Connectionist Nonparametric Regression  175

Where possible, notation follows that of White and Wooldridge. Other notation and definitions are as in Stinchcombe and White (1989b). We write B(.) to denote the Borel sigma-field generated by the open sets of the argument set, gr(.) denotes the graph of the indicated correspondence, and A(.) is the collection of analytic sets of the indicated space.

Theorem 4.1. (a) Let (Omega, F, P) be a complete probability space and let (Theta, rho) be a metric space. For n = 1, 2, ..., let Theta_n be a compact separable Borel subset of Theta and let Theta^_n: Omega -> Theta_n be a [measurable] correspondence such that for each omega in Omega, Theta^_n(omega) is non-empty and compact. Suppose that Q^_n(omega, .) is lower semicontinuous on Theta for each omega in Omega, n = 1, 2, .... Then for each n = 1, 2, ... there exists a function theta^_n: Omega -> Theta_n such that Q^_n(omega, theta^_n(omega)) = min over theta in Theta^_n(omega) of Q^_n(omega, theta), for all omega in Omega.
(b) In addition, suppose {Theta_n} and {Theta-bar_n} are increasing sequences of compact subsets of Theta such that the union of the Theta_n is dense in Theta and Theta_n included in Theta^_n(omega) included in Theta-bar_n for all omega in Omega, n = 1, 2, .... Suppose there exists a function Q-bar: Theta -> R such that for all eps > 0

    P[ sup over theta in Theta-bar_n of |Q^_n(omega, theta) - Q-bar(theta)| > eps ] -> 0 as n -> infinity,    (4.1)

and for theta_0 in Theta

    inf over theta in eta^c(theta_0, eps) of Q-bar(theta) - Q-bar(theta_0) > 0,    (4.2)

where eta^c(theta_0, eps) = {theta in Theta: rho(theta, theta_0) >= eps}, and Q-bar is continuous at theta_0. Then rho(theta^_n, theta_0) -> 0 [in probability].

Part (a) establishes the existence of a measurable estimator theta^_n. Without measurability we cannot make probability statements about theta^_n, such as the indicated statements about consistency. Part (b) establishes consistency of theta^_n for theta_0. The object theta_0 is distinguished by its role as minimizer of Q-bar (condition 4.2), the limit to which Q^_n converges uniformly (condition 4.1). This uniform convergence can be verified in particular stochastic contexts; our next result permits this for i.i.d. and stationary mixing processes. The other notable assumption of part (b) is the existence of nonstochastic sets Theta_n and Theta-bar_n bounding Theta^_n(omega). The behavior of Theta_n ensures that Theta^_n(omega) becomes sufficiently dense in Theta, so that an element of Theta^_n(omega) can well approximate an element of Theta. The behavior of Theta-bar_n ensures that Theta^_n(omega) does not increase too fast and that Q^_n converges to Q-bar uniformly in an appropriate sense. The constants of the previous section determine Theta_n and Theta-bar_n in our application.

In our application, (Omega, F, P) is the space on which the stochastic process {Z_t} is defined. The properties of P determine whether {Z_t} is an independent or a mixing sequence. The space Theta contains the object of interest (the unknown regression function theta_0), and rho is a metric that measures distance in this space (weighted mean squared error). Q^_n is the criterion function (squared error) optimized to arrive at an estimator theta^_n. The set Theta^_n(omega) over which optimization is carried out may depend on the data through omega; this permits treatment of cross-validation procedures. In some applications it may be natural to have Q^_n defined only on the graph of Theta^_n(omega) or on Omega x Theta_n instead of on all of Omega x Theta as we assume. Lemma 2.1 of Stinchcombe and White (1989b) establishes that defining Q^_n on Omega x Theta results in no loss of generality, as there generally exists an appropriate measurable extension to Omega x Theta of Q^_n originally defined on gr Theta^_n or Omega x Theta_n.

We now give a result permitting verification of condition (4.1), related to Corollary 2 of Haussler (1989). Instead of using the concept of V-C dimension (Vapnik and Chervonenkis, 1971) we use the concept of metric entropy [...]
biological, vol. 3 contains tutorial and software)(Vol. 1 gives the foundations and algorithms, vol. 2 is moreMIT Press, 19IIParallel Distributed Processing, vols. 1, 2 and 3,
McClelland and Rumelhart,
Springer•Verlag, 1999se1f·Organi:ation and Associative Memory (3rd edition)
Kohonen, T.,
(probably the best book available for physicists)Addison·wesley, 1991Introduction to the Theory of Neural Computation
nertz, Krogh and Palmer,
world Scientific, 1990Physical Models of Neural Networks,
Geszti, T.,
(reprinted in Anderson and Rosenfeld)Reviews of Modern Physics 34(1962), 123-135The Perceptron: A model for brain functioning,
Block, H.D.
(A good general introduction, paperback, not expensive.)Adam Hilger, 1990Neural Computing, an introduction,
Beale and Jackson,
(A collection of important papers, expensive)MIT Press, 1988weurocomputing: Foundations of Research
Anderson and Rosenfeld (eds.),
approach;)(Amit is a theoretical physicist, but often adopts the biologicalCambridge University Press, 1989Modelling Brain Function,
Amit, Daniel
F. James, August, 1991SOM! RLCENT GOOD BOOKS ON NBURAL NETWORKS
A microscopic "rnask" is laid on thetriaminel in their experiments.They used one called DETA (for diethylene· nirlinnOCR Outputthat promotes the growth of neurons.of
er single layer of molecules of a chemicalat0 chip of silicon or glass and coat it with a
The two scientists said they take aUsing DETA
University of California·lrvine. othertional institutes of Health and at thecollaborations with scientists at the NaResearch. are developing the biochlps infunded largely by the Office of Naval
Drs. Stenger and Hlchnan. who arememory or learning.with, or perhaps enhance. functions like than onuse to see if new compounds might interfere
aw make "biochips" that drug makers couldhe It also may be possible to eventuallyI 3.
idea. Dr. Stenger said.ne
variation of the "canary-in·the-coa.l mine'used in chemical warfare. a bioelectronic$9 mxdetected a nerve poison. such as nerve gas
more oftenws on a chip that would sound the alarm when it
·/* { FEB. to develop a semor composed of nerve cellsing with researchers at Stanford Universitypany. For example. the pair are collaboratginia. a "high-tech" contract-research comJaz/KA/A·¢ ELtiom lntermtionsl Corp. in Mclean. Vir
list
the For Euchemist Jay Hickman of Science Appllca·51-*said Dr. Stenger and his collaborator.lead to some useful bioelectronic devices.ww-L
Nevertheles. their basic research couldFEE
Possible Nerve·Gas Sensor
engineer. Dr. Stenger explained.puter developers are currently trying to
ellche
on. artificial neurons that electronics and comlearn how to make better networks of1%
UC. with each other on the chips. they shouldthe biological neurons function and connect animal for only a matter of hours.
comment onhe said. As the researchers study how networks of brain cells outside the intactin Rockvilleyou can investigate how neurons work:mof ent. neuroscientists are able to study
Susan Cr‘We’re working with systems where perimentation. Dr. Stenger noted. At presbacterial coeffort to make an artificial brain. functioning for months for study and ex0.5% concemight be misinterpreted as a fledgling The neurons can be kept alive anda nationwicin Washington, worried that the research of DETA.
ln earlyStenger of the Naval Research Laboratory dendrites and axons following the pathsinCopley’siresearch," said biophysicist David L. then send out connecting "wires" calledterol. The iBx ‘I want to emphasize this is fundamental off," Dr. Stenger explained. The neuronssystem userons form in the brain. sparkly things on it and then shook themmaintenancwill crudely mimic the circuitry that neu a piece of paper and sprinkled thoseical monitoto grow connections to each other that when you were a kid and you put glue onhave writtetor months to get the brain cells. or neurons, will "take" and grow. "lt's kind of likenumber ofto be able within the next six to 12 that those landing on the DETA spots
The FDAalong desired paths. The scientists hope less sprinkled on the chip with the hoperesolved.and then induce the brain cells to grow Embryonic rat neurons are more orand will bedesired spots on silicon or glass chips ron growth.this matter§€I' how to place embryonic brain cells in layer of molecules that discourage neu·Copley comce! The researchers said they had learned and the cleared sections are filled with aUE DWDMthat use living brain cells. let laser removes the unshielded DETA.through a sstep coward creating electronic microchips follow to make connections. An ultravio·
CopleyResearchers said they took a key first the channels they want the neurons toSea;] Reporzgr ceutlcal finscientists want the neurons to settle and
) keted at hl:By Jann? E. Brsuor chip that shields both the spots where thecost versioproducersproduction.In Microchip-Neuron Linkuconcern abhas nrvmltrouble ovtResearchers Make Advanceoimts ven
Albuuqulky at
'°”“‘""§;..i..‘ei.1...$$.$ 0... th. p.a¤;•p§mp;.€§.t;2L¥,?¥‘£7 §'G"E”‘”?‘$ ¥£}§$‘°‘‘}°’°
T
t; sl;¤n~te¤¤. Peng chip an\. II
• I I 53
'PNZAM. OCR Outputrnmmezg 4-» \ A 7;
has ¤@
rf! 7- Y 07c ‘ fy Re$mont.s
several NN architectures are compared with conventional methods. Excellent results are obtained.network with error haclopropagation. Two completely different methods are described in detail and their efficiencies for
We developped neural-network learning techniques for the recognition of decays of charged tracks using .i |`eed·for~s4rd
Received 10 Nlav l99l
Luhnrunure Je Phixrque C urptm u/utre. L`ntt·erstté Blume Pascal. C/errrmnr-Ferrund. M'} " .·lu/were. Frame
Georg Stimpfl-Abele
network techniquesRecognition of decays of charged tracks with neural
~··¤h·H·*"¢¤·* CommunicationsOCR Outputgimputer Physics Ctimmunicmons 67 ( l99l» IH}-1*42 CO{"npu{Qf Physics
@2.-ti QJQZQ ·yt,c<.cvLx Xg.4$1(Q,\F` LUQV1 {Dk QU/{Ki OCR Output
N - ‘\l' ·
consumingpa,
47.078.3 65.9`aster. Since Nm 48.179.3 65.3of the input
XBM 40.276.0 62.0; effects (theXgcs 14.353.1 35.8
) in randoml0 GeV3 GeV 5 GeV
.d 150 times
Kinlvfinding efficiencies (%) for DSIM datathem to the
Table 3bbout 40000
par51.679.0 69.1
50.1N res 75.2 68.5
one hiddenX par 50.279.8 68.0
concentrateX res 38.073.5 57.8
not gain by10 GeV5 GeV3 GeV
2-1 layouts.Kink-finding efficiencies (%) for FSIM datahe same efTable 3azonfiguration
residual ap
l0-·10-1 the
parameter-comparison method.wc obtainedrectly be compared to the results of the analyticaltwo hidden
The efficiency of the neural networks can dilarger for DSIM data.ber of non-decaying tracks with bad fits is much
tracks and to 5% for DSIM tracks since the num
0010-1655/93/$06.00 © 1993 — Elsevier Science Publishers B.V. All rights reserved OCR Output
Correspondence to: R. Barlow. Department of Physics. Manchester University, Manchester, M13 9PL. UK
Carlo models.
systematic effects that may result from the use of such a network, by using samples from different MonteSection 4 studies the size and topology of network needed to bring out the results. In section 5 we discusspreprocessed variables (following the suggestions of ref. [2]), secondary jet clusters, and raw momenta.we compare the performance of networks using 3 different philosophies for input variables: highlydefinition of the extent to which one network performance is "better" than another. Then in section 3particularly when using it to perform a statistical separation. This gives an objective and unambiguous
In the next section we discuss how the separating power of a network can be quantitatively measured.GeV. and it is not clear that the situations are comparable.study was done at a centre of mass energy of 14 GeV, where the distinction is much clearer than at 91format (they used a hypothetical calorimeter) is preferable to using more sophisticated variables. but thisthem. Gottschalk and Nolty [1] suggest that entering the shape of the event in a largely unprocessed("raw") event information, trusting to the network training algorithm to find the best way of combiningfor b identification, to give the network the best possible chance of success, or to use unprocessednetwork; in particular, whether to use constructed shape variables known from experience to be useful
In such an application the most important question is what the best input variables are to use with theof b quark jets produced in the reaction e* e`-• Z -• qi at a centre of mass energy of 91 GeV.nature of the jets is known. This note explores the possibility further, concentrating on the identificationnetworks trained and tested on samples of data generated by Monte Carlo programs. for which theparameters). Some studies have already been done on this topic [1,2], examining the performance ofgeneral jet shape as input (as opposed to specihc inputs such as high pT leptons or large impactjet, specifically to discrimination between jets from heavy (b) and light (u, d, s, c) quarks. using only thephysics they can be applied to the problem of recognising the nature of the quark producing a hadronic
Neural networks with feed-forward topology are widely used in pattern recognition. In high energy
1. Introduction
techniques are given.
believed to be good for separation. Some first studies of systematic errors resulting from using neural network separationnetworks perform better separation if they are given simple inputs, as opposed to inputs already combined into variables
The problem of identifying b quark jets produced at LEP using a neural network technique has been studied. We find that
Received 5 June 1992
Department of Physics, Manchester University. Manchester, M13 9PL. UK
Graham Bahan and Roger Barlow
Identification of b jets using neural networks
CommunicationsN°"“`H°"“‘°
Computer Physics Communications 74 (1993) 199-216 CO.-nputer physics
Fig. 7. The learning curve for nw inputs. OCR Output
N0. of Training Cycle:
ZM lm W W IW IZM 14% IW IW
OJ25
0.15
0.175
0.2
0.225
0.25
0.275
?•I‘I MIIHH I ?
0.3
0.325
Fig. 5. The lcaming curve for the semndary clustering inpuu.
No. dTr¤ining Cycle:
2% % @ 1% 12% 14%
0.125
0.15
0.1 75
0.2
0.225
0. 25
O. 2 75
0.3
0. 325
208 G. Bahan, R. Barlow / ldenufcazian of b jets using neural network!
nnn OCR Output· (ht · ir) F · " ———— Mi
i- 1, 2, 3...NB and F estimated ash(x) and l(x) but to the two Monte Carlo event samples. These must be histogrammed in bins
To evaluate 1-' for the results of a particular network one does not have access to the parent functions
Appendix. Evaluating F
useful discussions.
We are indebted to Leif Lbnnblad for providing us with the JETNET program, and to Terry Wyatt for
Acknowledgements
networks, or different versions of the same network.discriminating power of the network, enabling meaningful comparisons to be made between different
We recommend the use of the figure of merit, F, as an unambiguous parameter describing theunderstood.
network is much longer, and there are still features of the output which are unpleasant and not well
Fig. 9. Effect of the number of nodes in a single hidden layer on the performance of networks using different input;
No. of
I 3 5 7 9 Il I3 I3 17 19 210.23
A IL
0.26
0.2 7
0.28
0.29
0.3
0.31
OCR OutputI 212 G. Bahan. R. Barlow / Idennfcanbn of b jets mn; neural network:
0010-4655/93/S 06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved OCR Output
‘ E-mail address: [email protected] (Em) around 91 GeV (LEP 100), has [email protected] CERN, operating with a center-of-mass energyCERN, CH·l2ll Geneva 23, Switzerland; E-mail addr¤s:
the Large Electron Positron collider (LEP) atCorrespondence to: G. Stimph-Abele, PPE division,Breaking mechanism [2]. During recent yearsson, H, responsible for the so-called Symmetrymodel predicts the existence of the Higgs bointeractions among elementary particles. Thisphasis is given to the selection of the best inputis the commonly accepted theory to explain the. layer and error back-propagation are used. EmThe Standard Model of particle physics [1]Standard feed-forward nets with one hidden
the Z mass.LEP 200mass is rather challenging since it is just below2. Higgs production and backgrounds atgrounds can be better discriminated. The higher
case because the sipal is higher and the backGeV/cz. The lower mass represents the easiering two mass hypotheses are chosen: 70 and 90 of the methods. Finally, conclusions are given.for the Higgs boson at LEP 200. The follow sively in section 7 in order to test the reliabilitywith a neural net (NN) approach in the search in section 6. Systematic eH`ects are studied exten
standard one-dimensional cuts, is compared of the N'Ns in the Higgs search is demonstratedIn this study a traditional filtering method, us select the best input·variables. The performance
background events. methods developed to analyse neural nets and totering process intended to separate signal and and learning procedure, and a description of theysis looking for new phenomena needs a fil cal details ofthe net generation, like architecturenumber of conventional events. Hence an anal dimensional cuts. Section S contains the techniticles are produced along with a much larger icated to the standard analysis based on onephysics. In general, events containing new par preselection of the input data. Section 4 is dedamong the most important tasks in high energy lowed by a description ofthe generation and the
The search for new elementary particles is The physics case is discussed in section 2, folSystematic eH`ects are studied in detail.
1. Introduction variables by analysing their utility inside the net.
nets are found to be sipiticantly better than those of standard methods.are presented. The sensitivity of the nets for systematic eH`ects is studied extensively. The ehiciencies of the neuralat LEP 200. New methods to select the most edicient variables in such a classiiication task and to analyse the netsof neural nets. Feed-forward nets with error bacbpropagation are applied to the search for the standard Higgs boson
Two aspects of ¤eura1·¤et analysis are addressed: the application of neural nets to physics analysis and the analysis
Received 8 June 1993; in revised form 22 August 1993
Rice University, Houston TX 77251-1892, USALaboratoire de Physique Corpusculaire, Université Blaise Pascal, Clermont-Ferrand, F-63177 Aubiere, France
Georg Stimpt'1-Abele “ and Pablo Yepes "·‘
Higgs search and neural-uct analysis
>*<>¤¤·H<>¤¤¤¤ CommunicationsComputer Physics Communications 78 (1993) 1-16 COWWDUTGT
1`o allow for a direct compari standard cuts (table 3) are included to ease the OCR Outputdoes not improve the statisti the significance is maximal. The results of thePgm (9). Adding them to the The cut on the NN output is chosen such thatibove. There are two new vari the analysis at 70 and 90 GeV/cz, respectively.the standard analysis are con cal significance are shown in tables 7 and 8 forwith section 4 shows that all ground events of the three nets and the statisti
The number of accepted signal and backnputs (N7) but still good per maximum around 0.9for high performance and the raise almost linearly with the cut, reaching theirfor further study. The net with to the preselection. The statistical significancesthe performance two nets from statistical fluctuations. The threshold at O is due
events above the cut are demanded to avoid bigng between the number of input for both masses. At least two background
r nets N, are the variables 1 to the three nets as function of the cut on the outre the inputs for N 10. The in
354.391.422.Min. luminosity [pb"] 610.fitted with the HZ hypothesis8.47.7 8.0Statistical significance 6.4
f secondaries in the Higgs jets15.5 10.9 6.935.1Total backgound
4.211.4 5.36.6e+e‘ -• Z Z. ofthe most energetic electron 1.63.9 2.611.8e+e" -• W+W‘·
1.1:0 three jets (S9). 3.0ll.9 5.0e"’e' - qdggangles between the jets if the 22.026.438.0 30.5e+e‘ -—· H90 Z
N10CutsReactiontass of all secondary tracks
canoe.
. event is timed with the WW analyses at my = 90 GeV/cz and their statistical signifiSignal and background events left for standard and NN
Table 8·-.1argcd tracks (Md;).
124.157.174.Min. luminosity [pb"] 330.10.9 8.0 14.2 '12.612.0Statistical simiticance 8.712.6 8.0
8.33.56.048.7Total backgound13.2 8.20.60.21.4 0.4e+e" —· Z Z13.2 8.44.01.43.225.0e+e‘ -» W+W‘14.2 8.43.71.92.422.3e*e‘ -·» qdgg15.5
40.923.629.560.515.8 e+e‘—» Hm Z
19.1 8.7N10NetCutsReaction
Hm Hoocance.
.scs. analyses at my = 70 GeV/cz and their statistical signitiSignal and background events left for standard and NNas function of the number of input
Table 7
G. Stimpfl-Abele. P. Yepes / Higgs search and neural~net analysis
© l990 The American Physical Society OCR Output l32|
hidden units.ard layered architecture (see Fig. l) with inputforwkc ours the neurons are often organized in a feed FIG. l. A l`eed·l`orward neural network with one layer of
nectivity weights wi,. For feature recognition problemsingredients in a neural network are neurons n, and con . $• Xit
‘—° jkThe neuralmetwork learning algorithm.--The basiclions.
could be used in a variety of different triggering situaSluon and quark jets. it is clear that the methodology
Although this paper is limited to the separation ofindependent, e.g.. considering the fastest particles only.ourselves to kinematical quantities that are most modelready there. This eH`ect can be minimized by limiting
(6)Amy;. ·· nZmj,5,g'(a,)xi +aAm}’l°our studies; some of the physics one wants to study is al`Some extent this induces a "chicken·and·egg" eH'ect to Correspondingly, for the input to hidden layers one hasefe ` events using the Lund Monte Carlo model. To
6; '(yg ‘I, )g'(di) .We conhne our studies to Monte Carlo·generatedtime triggering.
for the hidden to output layers, where 6, is given bystructure. The latter feature is very important for real~in custom·made hardware with its simple processor Aw;] ' ` Hajhj +GA(.r)3 (4)very general, inherently parallel. and easy to implement
sponds tohas the advantage over other Etting schemes in that it is
is minimized. Changing my- by gradient descent correis exactly what the neural~network approach aims at. Itxpert`s exercise to a "black box" htting procedure. This
(3)E- % ZZ(y,‘P’—t,‘P’)’(quark or gluon). This reduces the problem from anserved hadronic kinematical information and the feature tion
would be to End the functional mapping between the ob back·pr0pagation learning rules where the error funcA straightforward method for identifying the jets quently used procedure for accomplishing this is the
produced in high·pr hadron-hadron collisions. equals the desired output or target value ti". A freenergies are less well known. One such example is jets ter x ') gives rise to an output (feature) value y"that(’that in many situations “g1obal" quantities like total jet changing the weights my such that a given input paramewould be desirable to focus on the latter alternative given to be learned. Training the network corresponds tothe entire event rather than just a single isolated jet. It building up an "internal representation" of the patternsorate schemes.’·° Such procedures are often based on The hidden nodes have the task of correlating andjet with smallest energy as the gluon jet' to more elab and Z, wijhj, respectively.the kinematic variables ranging from just identifying the weighted input sums aj and aj are given by Zi wjixiidentihcation has been done by making various cuts on where the "temperature" T sets the slope of g and thecoupling in e*e ' annihilation.; To date the gluon-jet
y;'g(G;/T) ,quired for establishing the existence of the three—gluonAlso. a fairly precise identincation of the gluon jet is re
(I)h,•g(aj/T),studies on the so-called string eH`ect‘ and related issues.
one haslight on the confinement mechanism in terms of detailedtiit from many perspectives. lt can shed experimental 0.5[l+tanh(x)]. For the hidden and output neurons
__ab\e to distinguish the origin of a jet of hadrons is impor thresholds this sum with a "sigmoid" function g(.x)performs a weighted sum of the incoming signals andand quark jets using a neural-network identifier. Being
In this Letter, we demonstrate how to separate gluon (xi), hidden (hj), and output (y.) nodes. Each neuron
PACS numbers: l3.87.Fh. l2.38.Qk, l3.65.+i
Monte Carlo-generated e 'e ' events with 85%-90% accuracy.Using a neural-network classifier we are able to separate gluon from quark jets originating from
(Received 6 April 1990)Deparrmenz of Theoretical Physics, L'nii-army of Lund. Sélvegaran [4,4. S -22362 Lund. Sweden
Leif L6nnblad.“Carsten Pe1erson,and Thorsteinn R6gnvaldsson(""()
Finding Gluon Jets with a Neural Trigger
vo1.L·Ms65, Numsanll 10 SEPTEMBER 1990PHYSICAL REVIEW LETTERS
l.9;O.l OCR Output2.730.13.0;0.lr(m,,___,>nrK)l.b0;0.il22.1630.03 2.00:0.02r={n:`>/(rt,}
M';74'2l()0‘I·Success rate
Smallest jetNeural networkMC truth
approach and the "smallest jet" approachdifferent methods of identifying the gluon jet: using the true gluon jet of the MC. the NN ,r
Results for the r · (rt:)/<n,> measured on a sample of l0000 MC events (ARIADNEI for / ‘N`DTitan.; 7 LP AA
* ** thgpdrm 5¢ldc52 (bitnet) tlennitu thep.lu.se (internet) Jfwyo,* ' thepcaptu seldc52 (bitnet) carstenfu thep.lu.se (internetl
thepllét seldc52 (bitnetl leifw thep.lu.se (internet)
and energy. ln a previous paper [1] preliminary results for gluon·quark separationfeature extraction procedures will become more acute with increasing luminosityrelevant quantities in collected data. Needless to say. the demand for efficientlow-level trigger conditions in experimental setups, to extraction of theoretically
High·energy physics contain many feature recognition problems, ranging fromexecution times and thereby facilitating real-time performance.
neural networks and the feasibility of making custom made hardware with fast
ness and robustness. Another attractive feature is the inherent parallelism inof the NN promising but the entire approach is very appealing with its adaptivevariety of real·world feature recognition applications. Not only is the performanceenthusiasm is the power this new computational paradigm has shown for a widebrain·style computing in terms of artificial neural networks (NN). The origin of this
During the last couple of years there has been an upsurge in interest for
1. Introduction
compressing the dimensionality of the state space of hadrons.how the neural network method can be used to disentangle different hadronization schemes bypurity. which is comparable with what is expected from vertex detectors. We also speculate onjust observing the hadrons. ln particular we are able to separate b-quarks with an efficiency and
ln addition, heavy quarks fb and cl in e°e‘ reactions can be identified on the 50% level by
effect.
model used. This approach for isolating the gluon jet is then used to study the so-called stringCarlo generated e’e‘ events with ~85‘? approach. The result is independent of the MCnetwork. With this method we are able to separate gluon from quark jets originating from Montefunctions using a gradient descent procedure. where the errors are back-propagated through thequark-gluon identity. This is done with a neuronic expansion in terms of a network of sigmoidalis to find an efficient mapping between certain observed hadronic kinematical variables and the
A neural network method for identifying the ancestor of a hadron jet is presented. The idea
Received 29 June l990
Department of Theoretical Physics. University of Lund, Séltegaran I 4.4. S-22362 Lund. Sweden
* * *Leif LONNBLAD ". Carsten PETERSON * ' and Thorsteinn ROGNVALDSSON
USING NEURAL NETWORKS TO IDENTIFY JETS
North-Holland
Nuclear Physics B349 ( l99l l 675-702
0010-4655/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved OCR Output
Amendola 173, 70126 Bari, Italy. followed by a Xe—CH, filled proportional chamCorrespondence to: M. Castellano, INFN, Sez. di Bari, Via ules, each one made of a carbon fiber radiator
The TRD prototype [10] consists of 10 mod
produced by known particle-classes, are stored in 2. The pattern classiication systempurpose, the output patterns of the detectors,the classification power of detectors. For thistaken into account in high energy physics to study data and test results are discussed.ful statistical and neurocomputing techniques, are the third section the application to experimental
Pattem recognition methods, involving power ture and the likelihood ratio test are described; inments [8-10]. tion system is presented and the neural architecup to l TeV energy in cosmic ray space experi ray experiments. In the next section the classificacan be used to distinguish positrons from protons for electrons/ hadrons discrimination in cosmicexperiments (see e.g. ref. [7]). Moreover, TRDs ate the pcrforrnances of a TRD developed [10,11]
network and likelihood ratio algorithm to evalufuture also in LHC (see e.g. ref. [6]) and SSCtion system based on a back-propagation neuralat CERN [2,3] and Fermilab [4,5] and in the
high energy particle identification in experiments In this paper is presented a pattem recogiiclassification power.Lorentz factor [1]: they have been employed forin a suitable feature space provides the detectorused to discriminate particles with a different yspace. A measure of the separability of the classesTransition radiation detectors (TRDs) can betern from pattern space into class·membershipfor particle identification in high energy physics.tionship among data in terms of mapping a patSeveral detectors have been proposed, so far,cation system in order to carry out explicit rela
1. Introduction tagged data files. They are examined by a classifi
successfully identities 4.0 GeV/c electrons with an hadron contamination of about 4><10°’ at 98% acceptance efficiency.multiwires proportional chamber of the detector. The best results are obtained by the neural network approach thatlikelihood ratio technique. The information fed into the classihcation system consists of the number of hits detected by eachdiscrimination is presented. It is based both on a layered feed~forward neural network trained using back-propagation and a
A classihcation system able to evaluate the performances of a transition radiation detector prototype for electrons / hadrons
Received 16 June 1993; in revised form 16 August 1993
___ ‘ Istituto Elabaraziane Segnali e Immagini - CMR., Va Amendola 166/5, 70126 Bari, Italyb INFN Sez. di Bari, Via Amendola 173, 70126 Bari, Italy“ Dipartimento di Fzlsica, Université di Bari. and INFM Se:. di Bari, Va Amendola 173, 70126 Bari, Italy
G. Pasquariello ° and P. SpinelliR. Bellotti a, M. Castellano b, C. De Marzo “, N. Giglietto b,
radiation detector
method to evaluate the performance of a transitionA comparison bctwccn a neural network and the likelihood
CommunicationsNorth-Holland
Computer Physics Communications 78 (1993) 17-22 Computer Physics
(solid line). Collider in the LEP-Tunnel, ECFA 90-133, Vol. l (1990). OCR Outputhood ratio technique (dotted line) and the neural network [6] Proc. ECFA-CERN Workshop on the Large HadronGeV/ c data: the TRD performance evaluated by the likeli (1989).Fig. 5. Pion contamination versus electron acceptance for 4 [5] D. Errede et al., preprint Fermilab-Conf-89/170-E
E|•etron keontence (1984).¤.S¤ 0.9 0.92 0.94 0.96 0.98
[4] A. Denisov et al., preprint Fermilab-Conf-84/134-E[3] A. Vacchi. Nucl. Instrum. Methods A 252 (1986) 235.[2] J. Cobb et al., Nucl. Instrum. Methods 140 (1977) 413.[1] B. Dolgoshein, Nucl. Instrum. Methods A 326 (1993) 434.
Neurol Network References
task.Lihelihm
components to speed up the particle classificationcould be hardware implemented using neuron-likeparallel distributed processing model of the netresults. Finally, from the practical view point, the
4 Gov/C Dole played a fundamental role in obtaining betterthe likelihood ratio statistical technique hasmany hypotheses simultaneously with respect tobeen presented. The ability of the net to explorefor the electron / pion discrimination problem hasapproaches to evaluate the TRD performances
A comparison between statistical and neuralthree different momentum data set for bothmerizes pion contamination against acceptance at
4. Conclusionselectron acceptance efficiency. The table 1 summum pion contamination value around 95% ofAs a result, the NN technique reaches the mini
as possible is required.to the procedure described in the second section.short duration exposures, an acceptance as largeversus electron acceptance is computed accordingsearching for rare events with the constraint ofemphasized in fig. 5 where pion contaminationspace cosmic ray experiments [9,10] where,respect to the L one. This behaviour is well[12,19,20] and it can be useful to employ TRDs infeature values is achieved in the No space withperformance is never shown by other methodsonly. Better accumulation around 0 and 1 of thenation is already achieved at full acceptance. Thisated, as shown in fig. 4 for 4 GeV/c momentummethods. In the NN method a very low contamimentum, the NO and L feature spaces are gener
sets, i.e. about 5000 events for each beam mocounts in each chamber. Using the same datadetector, while the electron patterns have some
98% 236.3 26.7 97.9 90.0 21.4 4.5tected with a small number of hits in the whole 97% 134.9 26.7 46.6 8.1 14.1 4.5shown in fig. 3. The pion events are mainly de 21.9 6.0 6.8 4.295 % 83.4 26.7
90% 25.3 15.1 14.8 6.0 3.0 4.2able. Typical data for pions and electrons are1, are taken at the different beam momenta avail L NN L NN L NN
The TRD data, according to the scheme in fig. 2 GeV 4 GeVAcceptance 1 GeVfor the electrons is evaluated of about 10"‘ [10].
techniques.ule electronic logic. A mean tagging inefficiencyacceptance by the likelihood ratio test and neural network
ters, lead glass scintillators and a standard mod Pion contamination (x l0") evaluated for different electronfixed momentum, by means of Cherenkov coun Table l
21R. Belloui ct aL / Evaluating the performance of a transition radiation detector
0010-4655 /93/$06.1X) © 1993 — Elsevier Science Publishers B.V. All rights reserved OCR Output
Computer Physics Communications 78 (1993) 23-28
North-Holland

Selection of optimal subsets of tracks with a feed-back neural network

Rudolf Frühwirth
Institut für Hochenergiephysik der Österreichischen Akademie der Wissenschaften, Wien, Austria

Received 6 July 1993

A feed-back neural network is described which solves the problem of finding an optimal set of compatible nodes in a compatibility graph. The properties of the network are explored with random graphs. The performance is illustrated with simulated data of the DELPHI detector.

1. Introduction

An important constraint on any track reconstruction algorithm is given by the fact that any two tracks in the final selection have to be compatible, i.e. must not share data. The selection algorithm therefore has to solve an optimization problem by selecting a maximal set of compatible tracks. [...] The quality indicators (QIs) [...] the set with the largest sum [...].

The problem of finding maximal compatible sets in a compatibility graph has been analyzed in considerable detail in ref. [1]. [...] Hopfield n[etwork] [...] In our specific application we have found it useful to define the track QI as a combination of track angle and chi-square probability. The procedure is described in section 4. [...] performance of the network algorithm [...] are presented in section 4.

Table 2
Classification of simulated tracks.

                          Data set I              Data set II
Algorithm                 neural net   standard   neural net   standard
Simulated tracks          5000         5000       10000        10000
Reconstructable tracks    4331         4331       8279         8279
Tracks in class 1         3867         3712       4874         5518
Tracks in class 2         850          240        903          251
Tracks in class 3         16           251        176          149
Processing time [CPU s]   518          2067       ...          2538

2. Architecture of the network

The relation of compatibility in a set of tracks can be mapped on a compatibility graph in which [...]

Correspondence to: R. Frühwirth, Nikolsdorfer Gasse 18, A-1050 Wien, Austria (permanent address).
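Frühwirth's feed-back network is of the Hopfield type: one neuron per track, inhibitory couplings between incompatible (data-sharing) tracks, and an excitatory bias given by the track's quality indicator, so that low-energy states correspond to high-quality compatible subsets. A schematic sketch under those assumptions; the energy form, penalty value, and deterministic update order are generic textbook choices, not the paper's exact parametrization:

```python
import numpy as np

def select_tracks(qi, incompatible, penalty=10.0, sweeps=50):
    """Hopfield-style selection of a compatible subset of tracks.

    qi           : quality indicator per track (higher = better)
    incompatible : (i, j) pairs of tracks that share data
    Minimizes E(s) = -sum_i qi[i] s[i] + penalty * sum_(i,j) s[i] s[j]
    over binary states s by asynchronous single-neuron updates.
    """
    n = len(qi)
    W = np.zeros((n, n))
    for i, j in incompatible:            # inhibitory couplings
        W[i, j] = W[j, i] = penalty
    s = np.zeros(n, dtype=int)           # start with nothing selected
    order = np.argsort(-np.asarray(qi))  # deterministic: best tracks first
    for _ in range(sweeps):
        changed = False
        for i in order:
            # Turning track i on lowers the energy iff its QI beats the
            # total penalty from currently selected incompatible tracks.
            new = int(qi[i] - W[i] @ s > 0)
            if new != s[i]:
                s[i], changed = new, True
        if not changed:                  # local minimum reached
            break
    return np.flatnonzero(s)

# Tracks 0 and 1 share hits, so only the better of the two survives.
print(select_tracks(np.array([3.0, 2.0, 1.0]), [(0, 1)]))  # → [0 2]
```

Because the couplings are symmetric and every accepted flip lowers the energy, the asynchronous updates converge to a local minimum; how good such minima are compared with a standard selection algorithm is exactly what Table 2 quantifies.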
Test of agreement between two multidimensional empirical distributions

Lluis Garrido(1,2), Vicens Gaitan(2), Miquel Serra-Ricart(3)

(1) Departament d'Estructura i Constituents de la Materia, Universitat de Barcelona,
    Diagonal 647, E-08028 Barcelona, Spain
(2) Institut de Fisica d'Altes Energies, Universitat Autonoma de Barcelona,
    E-08193 Bellaterra (Barcelona), Spain
(3) Instituto de Astrofisica de Canarias,
    E-38200 La Laguna (Tenerife), Spain

January 20, 1994

Abstract

We present a method to test the agreement between two multidimensional empirical distributions. The method is not restricted to working with projections in fewer dimensions due to the lack of data and, importantly, it is free of binning. It can be successfully implemented on layered neural nets, and it gives a lower bound on any estimator that measures the inconsistency between the two distributions.

(Submitted to Computer Physics Communications)

1 Introduction

An everyday task in all areas of science is the comparison of theoretical models with experimental data. In some cases the theoretical models cannot be checked directly because their explicit analytical form does not exist (e.g. the models are the result of a large number of calculations with complex algorithms) and/or there are additional effects not included in the model (e.g. detector effects during the acquisition of the experimental data). A commonly used strategy to overcome this problem is to generate Monte Carlo events according to the theoretical model and to fold them with the additional effects. The result is a simulated data sample (MC) that can be checked against the experimental data.
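The core idea above can be paraphrased as a classifier two-sample test: train a learner to separate sample A from sample B; if the two empirical distributions agree, no classifier can beat chance, so any excess over 50% accuracy signals an inconsistency. A toy sketch with plain logistic regression standing in for the layered neural net; the Gaussian samples, step size, and epoch count are illustrative assumptions:

```python
import numpy as np

def two_sample_test(x_a, x_b, epochs=200, lr=0.1):
    """Fit a logistic model to separate sample A (label 0) from sample B
    (label 1) and return the training accuracy.  Values near 0.5 mean the
    two empirical distributions are indistinguishable to this model;
    values well above 0.5 signal a discrepancy."""
    x = np.vstack([x_a, x_b])
    y = np.r_[np.zeros(len(x_a)), np.ones(len(x_b))]
    x = np.hstack([x, np.ones((len(x), 1))])   # append a bias column
    w = np.zeros(x.shape[1])
    for _ in range(epochs):                    # plain full-batch gradient descent
        p = 1.0 / (1.0 + np.exp(-x @ w))       # sigmoid output in (0, 1)
        w -= lr * x.T @ (p - y) / len(y)
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return np.mean((p > 0.5) == (y == 1))

rng = np.random.default_rng(0)
same = two_sample_test(rng.normal(0, 1, (2000, 3)), rng.normal(0, 1, (2000, 3)))
shifted = two_sample_test(rng.normal(0, 1, (2000, 3)), rng.normal(1, 1, (2000, 3)))
print(same, shifted)   # near 0.5 vs. clearly above 0.5
```

Loosely echoing the abstract: the trained model's output plays the role of a binning-free estimate of which sample a point came from, and the separation it achieves bounds the detectable inconsistency between the two distributions.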
[Handwritten transparency, largely illegible in the scan. Legible fragments: "How to estimate ...", "(unknown ...)", "... neural network, of course".]