Lectures given 31 Jan. - 4 Feb., 1994
F. JAMES
Introduction to NEURAL NETWORKS
ACADEMIC TRAINING PROGRAMME
CERN
February 8, 1994
[Transparency] THE VON NEUMANN COMPUTER was once a model of how the human brain works:
- sequential execution of instructions
- a store containing instructions and data
- most of the store is idle most of the time
[Two transparencies of tabular notes comparing inputs and system properties; illegible in the scan, and the same table appears twice.]
[Transparency] THE ARTIFICIAL NEURON
The inputs x_i are multiplied by weights w_i and summed, and the output is obtained by passing the sum through a transfer function g:

    output = g( sum_i w_i x_i )

where g may be a hard threshold (output 1 if the weighted sum exceeds the threshold, 0 otherwise) or a smooth "sigmoid" such as g(a) = 1/(1 + e^(-a)). [Remaining annotations illegible in the scan.]
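The transparency's neuron can be written out concretely. The sketch below is illustrative only (the lectures' software context is FORTRAN 77 / JETNET, not Python); the AND-gate weights and threshold are standard textbook values, not taken from the notes.

```python
import math

def neuron(inputs, weights, bias, transfer="sigmoid"):
    """One artificial neuron: weighted sum of inputs passed through a transfer function."""
    a = sum(w * x for w, x in zip(weights, inputs)) + bias
    if transfer == "sigmoid":
        return 1.0 / (1.0 + math.exp(-a))   # smooth threshold, output in (0, 1)
    return 1.0 if a > 0 else 0.0            # hard threshold (McCulloch-Pitts style)

# A two-input neuron computing logical AND with a hard threshold:
and_gate = lambda x1, x2: neuron([x1, x2], [1.0, 1.0], -1.5, transfer="step")
print(and_gate(1, 1), and_gate(1, 0))  # -> 1.0 0.0
```

With the sigmoid transfer and zero bias, a zero weighted sum gives output 0.5, the midpoint of the smooth threshold.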
[Transparency] WHAT IS SO EXCITING ABOUT N.N.?
(1) Massively parallel → fast.
(2) Can do what [no] other known methods [can]; [solve hard] problems.
(3) No need to program: learn from examples.
(4) Universal paradigm.
(5) [Error-correcting] memory.
(6) Insensitive to [noisy or missing data].
(7) It is speculated that traditional programming will become obsolete as computers [evolve]; a crazy extrapolation?
[Transparency: worked example of a single neuron with two inputs, whose output fires if the weighted sum exceeds a limit (threshold); details illegible in the scan.]
[Transparency] SUPERVISED LEARNING
Suppose we have a network (representing a family of functions through its free parameters, the weights) and a TRAINING SAMPLE. Let us train the network, adjusting its free parameters, so that it generates the desired decision [for each pattern].
THE BACK-PROPAGATION ALGORITHM (Rumelhart, Hinton and Williams, 1986)
(1) Calculate the error (chi-square) of the network by a forward pass of the training sample through the net.
(2) Calculate all the derivatives dE/dw by the chain rule, propagating backward through the net.
(3) Modify the weights by a step in the "steepest descent" direction (the step length is a compromise, conventionally proportional to the negative gradient through a learning-rate factor).
Repeat. This technique is very slow, but it sometimes works, and it is very popular.
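The three back-propagation steps can be made concrete on a tiny network. The Python sketch below is purely illustrative (a 2-2-1 sigmoid network with no bias terms; the learning rate, seed and single training pattern are invented for the demonstration and are not part of the lecture material).

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W1, W2):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]  # hidden layer
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)))                   # output unit
    return h, y

def backprop_step(x, target, W1, W2, eta):
    """One steepest-descent update: forward pass, then chain-rule derivatives."""
    h, y = forward(x, W1, W2)
    d_out = (y - target) * y * (1.0 - y)           # dE/da at the output unit
    for j, hj in enumerate(h):                     # hidden deltas, propagated backward
        d_hid = d_out * W2[j] * hj * (1.0 - hj)
        for i, xi in enumerate(x):
            W1[j][i] -= eta * d_hid * xi
    for j, hj in enumerate(h):
        W2[j] -= eta * d_out * hj
    return 0.5 * (y - target) ** 2                 # error before the update

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(2)]
errors = [backprop_step([1.0, 0.0], 1.0, W1, W2, eta=0.5) for _ in range(50)]
print(errors[0] > errors[-1])  # the error on the pattern decreases
```

With the fixed seed the run is deterministic; the recorded error on the single training pattern decreases over the 50 steps.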
Computer Physics Communications 70 (1992) 167-182
North-Holland

Pattern recognition in high energy physics with artificial neural networks - JETNET 2.0

Leif Lonnblad, Carsten Peterson and Thorsteinn Rognvaldsson
Department of Theoretical Physics, University of Lund, Solvegatan 14 A, S-223 62 Lund, Sweden

Received 27 August 1991

A F77 package of adaptive artificial neural network algorithms, JETNET 2.0, is presented. Its primary target is the high energy physics community, but it is general enough to be used in any pattern-recognition application area. The basic ingredients are the multilayer perceptron back-propagation algorithm and the topological self-organizing map. The package consists of a set of subroutines, which can either be used with standard options or be easily modified to host alternative architectures and procedures.

PROGRAM SUMMARY

Title of program: JETNET version 2.0
Catalogue number: ACGV
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland (see application form in this issue)
Computer for which the program is designed: DECstation, SUN, Apollo, VAX, IBM and others with a F77 compiler
Computer: DECstation 3100; Installation: Department of Theoretical Physics, University of Lund, Sweden
Operating system under which the program has been tested: ULTRIX RISC 4.2
Programming language used: FORTRAN 77
Memory required to execute with typical data: ~90 kwords
No. of bits in a word: 32
Peripherals used: terminal for input, terminal or printer for output
No. of lines in distributed program, including test deck data, etc.: 3345
Keywords: pattern recognition, jet identification, artificial neural network
Licensing provisions: none

Nature of physical problem
High energy physics offers many challenging pattern-recognition problems. It could be separating photons from leptons based on calorimeter information or the identification of a quark based on the kinematics of the hadronic fragmentation products. Standard procedure for such recognition problems is the introduction of relevant cuts in the multi-dimensional data.

Method of solution
Artificial neural networks (ANN) have turned out to be a very powerful paradigm for automated feature recognition in a wide range of problem areas. In particular feed-forward multilayer networks are widely used due to their simplicity and good performance. JETNET 2.0 implements such a network with the back-propagation updating algorithm.
LU TP 93-29
CERN-TH.7135/94
December 1993

JETNET 3.0 - A Versatile Artificial Neural Network Package

Carsten Peterson and Thorsteinn Rognvaldsson
Department of Theoretical Physics, University of Lund, Solvegatan 14 A, S-223 62 Lund, Sweden

Leif Lonnblad
Theory Division, CERN, CH 1211 Geneva 23, Switzerland

Submitted to Computer Physics Communications

PROGRAM SUMMARY

Title of Program: JETNET version 3.0
Catalogue number:
Program obtainable from: denni@thep.lu.se or via anonymous ftp from thep.lu.se in directory pub/jetnet/, or from freehep.scri.fsu.edu in directory freehep/analysis/jetnet
Computer for which the programme is designed: DEC Alpha, DECstation, SUN, Apollo, VAX, IBM, Hewlett-Packard, and others with a F77 compiler
Computer: DEC Alpha 3000; Installation: Department of Theoretical Physics, University of Lund, Lund, Sweden
Operating system: DEC OSF 1.3
[Transparency] THE COST FUNCTION
The quadratic cost (chi-square) is the optimal choice when fitting Gaussian-distributed data; other cost functions are appropriate for other error distributions (e.g. Poisson). [Notes on the limiting behaviour of the error function E(w) and on the solution when the weights get very far from the starting point are illegible in the scan.]
[Transparency] DID YOU SAY STEEPEST?
Steepest descent uses only the gradient, which gives the locally steepest direction but carries no "second-order" information: it doesn't know where the minimum of the function is, or how far away it lies.
[Transparency] SECOND-DERIVATIVE METHODS
Methods using the second derivative (the curvature) can predict where the minimum lies and take a finite step toward it, but such a step is unstable if a second derivative is negative. [Remaining annotations illegible in the scan.]
[Transparency] CONJUGATE GRADIENT MINIMIZATION (Fletcher and Reeves, 1964)
Consider a quadratic function

    F(x) = c + g.x + (1/2) x.A.x

Two directions d_i and d_j are said to be CONJUGATE if d_i.A.d_j = 0.
(1) Choose the first direction (steepest descent) and minimize F along it.
(2) Choose the second direction conjugate to the first, and minimize along it.
(3) Continue in the same way: each new direction is chosen conjugate to the previous ones, and the successive gradients come out mutually orthogonal.
It is possible to find conjugate directions without knowing A explicitly, and with N line minimizations along conjugate directions the minimum of an N-dimensional quadratic function is found exactly.
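The Fletcher-Reeves recipe can be written down directly for a quadratic. In the Python sketch below the gradient of F(x) = (1/2) x.A.x - b.x is g = A.x - b; the 2x2 matrix and vector are an invented example, not from the notes. As the transparency states, N line minimizations (here N = 2) reach the minimum exactly.

```python
def conjugate_gradient(A, b, x, n_steps):
    """Fletcher-Reeves conjugate gradient for F(x) = 1/2 x.A.x - b.x."""
    def matvec(M, v): return [sum(Mij * vj for Mij, vj in zip(row, v)) for row in M]
    def dot(u, v): return sum(ui * vi for ui, vi in zip(u, v))
    g = [gi - bi for gi, bi in zip(matvec(A, x), b)]
    d = [-gi for gi in g]                      # first direction: steepest descent
    for _ in range(n_steps):
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)        # exact line minimization along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / dot(g, g)   # Fletcher-Reeves formula
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]                   # symmetric positive definite
b = [1.0, 2.0]
x = conjugate_gradient(A, b, [0.0, 0.0], 2)    # N = 2 steps suffice for a quadratic
print(x)  # the minimizer solves A.x = b
```

After the two steps the gradient vanishes to rounding accuracy: x = (1/11, 7/11), the solution of A.x = b.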
[Transparency] SIMULATED ANNEALING
To reach the GLOBAL MINIMUM without being trapped in LOCAL MINIMA, run a random search starting at HIGH TEMPERATURE → LOW TEMPERATURE, gradually "cooling" the system.
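A minimal Metropolis-style sketch of the idea in Python; the double-well test function, step size, cooling schedule and seed are all invented for the illustration and are not from the notes.

```python
import math, random

def simulated_annealing(f, x0, t_start=10.0, t_end=1e-3, cooling=0.95, seed=1):
    """Random steps, accepting uphill moves with probability exp(-dE/T);
    the temperature T is lowered slowly (high T -> low T)."""
    rng = random.Random(seed)
    x, t, best = x0, t_start, x0
    while t > t_end:
        for _ in range(50):                       # a few trial moves per temperature
            x_new = x + rng.uniform(-2.0, 2.0)
            dE = f(x_new) - f(x)
            if dE < 0 or rng.random() < math.exp(-dE / t):
                x = x_new
                if f(x) < f(best):
                    best = x
        t *= cooling                              # slow cooling
    return best

# A double-well function: local minimum near x = +2, global minimum near x = -2.
f = lambda x: x**4 - 8 * x * x + x
x = simulated_annealing(f, x0=2.0)                # start in the wrong (local) minimum
print("best x found:", round(x, 3))
```

Because the best point visited is recorded, the answer is never worse than the starting local minimum; at high temperature the uphill acceptances let the search cross the barrier toward the global minimum.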
[Transparency] DIFFERENT STRATEGIES OF USING THE TRAINING SAMPLE
(1) Minimize (update the weights) after each pattern;
(2) cycle over sets (subsets) of the training sample, updating after a few patterns;
(3) update only over the whole training sample (known as "global" updating).
One cycle through the whole training sample is called an EPOCH.
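The per-pattern and whole-sample strategies can be contrasted on the simplest possible case, a single linear unit trained by least squares; the data and learning rate below are invented for the illustration.

```python
def grad(w, x, y):
    # dE/dw for E = 1/2 (w*x - y)^2, a single linear unit
    return (w * x - y) * x

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # roughly y = 2x
eta = 0.02

# Per-pattern (strategy 1): update the weight after every pattern.
w_online = 0.0
for epoch in range(100):                     # one epoch = one pass over the sample
    for x, y in data:
        w_online -= eta * grad(w_online, x, y)

# Global (strategy 3): accumulate the gradient over the whole sample,
# then update once per epoch.
w_batch = 0.0
for epoch in range(100):
    g = sum(grad(w_batch, x, y) for x, y in data)
    w_batch -= eta * g

print(round(w_online, 2), round(w_batch, 2))
```

Both strategies converge toward the least-squares slope (here about 1.99); the per-pattern updates wander slightly around it within each epoch.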
SYNAPSE-1, designed to simulate the human brain, is the fastest neural computer in the world
By Dr. Ulrich Ramacher, Siemens AG

Until now, conventional computers have fallen far short in replicating what the human brain, nervous system and senses appear to achieve without any conscious effort. For example, even very powerful computers are still a long way from understanding natural language or recognizing faces. Yet automatic speech and image recognition, or making sensible guesses when knowledge is incomplete (recognizing handwritten letters and numbers, or predicting the movement of exchange rates) are the sort of things we will want our computers to do in the future. Even the fastest workstations take weeks, sometimes months, to perform such tasks; so a special computer is required which can simulate all 38 types of neural networks and learning [procedures]. Unlike computers based on the von Neumann model, a neural computer is equipped with a network of simple processors providing massively parallel processing. Successful results have already been achieved in scientific and industrial research and development; artificial neural networks can now recognize a wide range of [patterns].

A leap in computing capacity
The major obstacle to the rapid development of new applications is the difficulty of simulating the neural learning process: the computing power required exceeds existing resources by at least five orders of magnitude. SYNAPSE-1 is the result of two years' work by the Central Research and Development Department at Siemens AG in Munich, directed towards dramatically reducing the time needed to process neural applications. It can perform more than five billion fixed-point operations (i.e. multiplying or adding 16-bit numbers) every second. With power on this scale, SYNAPSE-1 is not only 8 000 times faster than a powerful workstation, it is currently the world's fastest neural computer.

Neurocomputer SYNAPSE-1
The capabilities of the human brain and senses are difficult to model by classical information technology. On the other hand, these capabilities, e.g. image and speech recognition, solution of complex optimization problems and especially learning capabilities, become more and more important. Therefore, natural neural systems are modelled by artificial neural networks in order to tackle the above mentioned problem areas.

The structure of artificial neural networks is considerably different from the architecture of today's computer systems. Although neural networks can be simulated on conventional computers like workstations, only very simple cases can be explored because of the extremely long processing times. Especially the simulation of the neural learning process prevents fast development of neural applications on such platforms.

SYNAPSE-1 fulfills all the relevant requirements for a universal neurocomputer. Special purpose computers that take into account the architecture of neural networks are orders of magnitude above conventional computers, so that development times can be minimized compared with software solutions. The neurocomputer SYNAPSE-1 offers a performance that lies several orders of magnitude above that of a powerful workstation, and support of small, medium and large neural networks, so that a complete area of potential applications can be implemented. The power of SYNAPSE-1 (up to 3.2 billion connections or multiplications and accumulations per second) results from a scalable multi-processor and memory architecture and from the neuro-signal-processor MA16, which executes the compute-intensive operations of neural algorithms. Any of today's neural networks are supported and provision is made for future developments ("general purpose" architecture).

Technical data. The system consists of the following hardware components:
- an array of 8 MA16 chips: peak performance 3.2 billion multiplications (16x16 bit) and accumulations (48 bit) per second; frequency rate 25 MHz; weight memory capacity 128 MBytes;
- a data unit for non compute-intensive operations: peak performance 100 million integer operations per second; working memory 8 MBytes; VME bus connection 4 MBytes/s;
- a control unit for control and coordination: peak performance sequencer 20 MIPS; peak performance address generator 25 million addresses per second; working memory 8 MBytes; VME bus connection 4 MBytes/s; frequency rate 25 MHz;
- a high-bandwidth memory: capacity 32 MBytes; transfer rates: backplane 0.8 GBytes/s (plus 100 MBytes/s parity), data busses 0.8 GBytes/s, control busses 200 MBytes/s, address bus 56 MBytes/s.
Communication with host workstation and specialized input/output devices is implemented via a VME bus interface.

On the software side, the system is delivered with microprograms, operating software and a workstation programming environment. Applications are programmed in the easy-to-use "neural Algorithms Programming Language" (nAPL), which supports the user in the development of his algorithms. nAPL is embedded in C++.

For more information, please contact:
Siemens Nixdorf Informationssysteme SA, International Accounts, Peter Puhlmann,
Avenue des Baumettes 7, CH-1020 Renens.
Tel.: (021) 632 01 11, Fax: (021) 635 88 82
[Transparency] GENERALIZATION: the response of the network to a pattern it hasn't seen. A network that merely memorizes the training sample (OVERFITS the data) acts as a black box that will not generalize.

[Transparency] Let us step back and consider the global problem: choose the number of neurons and connections, then find the values of the weights from the training sample. This is an INVERSE PROBLEM: the chapter on inverse problems in "Numerical Recipes in Fortran" by William Press et al. (pages 795 ff.) treats [the same ideas]. Of course the neural-network case is non-linear, but there is much in common.
18.4 Inverse Problems and the Use of A Priori Information

Later discussion will be facilitated by some preliminary mention of a couple of mathematical points. Suppose that u is an "unknown" vector that we plan to determine by some minimization principle. Let A[u] > 0 and B[u] > 0 be two positive functionals of u, so that we can try to determine u by either

    minimize: A[u]    or    minimize: B[u]    (18.4.1)

(Of course these will generally give different answers for u.) As another possibility, now suppose that we want to minimize A[u] subject to the constraint that B[u] have some particular value, say b. The method of Lagrange multipliers gives the variation

    (d/du) {A[u] + L1(B[u] - b)} = (d/du) (A[u] + L1 B[u]) = 0    (18.4.2)

where L1 is a Lagrange multiplier. Notice that b is absent in the second equality, since it doesn't depend on u.

Next, suppose that we change our minds and decide to minimize B[u] subject to the constraint that A[u] have a particular value, a. Instead of equation (18.4.2) we have

    (d/du) {B[u] + L2(A[u] - a)} = (d/du) (B[u] + L2 A[u]) = 0    (18.4.3)

with, this time, L2 the Lagrange multiplier. Multiplying equation (18.4.3) by the constant 1/L2, and identifying 1/L2 with L1, we see that the actual variations are exactly the same in the two cases. Both cases will yield the same one-parameter family of solutions, say, u(L1). As L1 varies from 0 to infinity, the solution u(L1) varies along a so-called trade-off curve between the problem of minimizing A and the problem of minimizing B. Any solution along this curve can equally well be thought of as either (i) a minimization of A for some constrained value of B, or (ii) a minimization of B for some constrained value of A, or (iii) a weighted minimization of the sum A + L1 B.

The second preliminary point has to do with degenerate minimization principles. In the example above, now suppose that A[u] has the particular form

    A[u] = |A.u - c|^2    (18.4.4)

for some matrix A and vector c. If A has fewer rows than columns, or if A is square but degenerate (has a nontrivial nullspace, see 2.6, especially Figure 2.6.1), then minimizing A[u] will not give a unique solution for u. (To see why, review 15.4, and note that for a "design matrix" A with fewer rows than columns, the matrix A^T.A in the normal equations 15.4.10 is degenerate.) However, if we add any multiple L times a nondegenerate quadratic form B[u], for example u.H.u with H a positive definite matrix, then minimization of A[u] + L B[u] will lead to a unique solution for u. (The sum of two quadratic forms is itself a quadratic form, with the second piece guaranteeing nondegeneracy.)

We can combine these two points, for this conclusion: When a quadratic minimization principle is combined with a quadratic constraint, and both are positive, only one of the two need be nondegenerate for the overall problem to be well-posed. We are now equipped to face the subject of inverse problems.

The Inverse Problem with Zeroth-Order Regularization

Suppose that u(x) is some unknown or underlying (u stands for both unknown and underlying!) physical process, which we hope to determine by a set of N measurements c_i, i = 1, 2, ..., N. The relation between u(x) and the c_i's is that each c_i measures a (hopefully distinct) aspect of u(x) through its own linear response kernel r_i, and with its own measurement error n_i. In other words,

    c_i = s_i + n_i = Int r_i(x) u(x) dx + n_i    (18.4.5)

(compare this to equations 13.3.1 and 13.3.2). Within the assumption of linearity, this is quite a general formulation. The c_i's might approximate values of u(x) at certain locations x_i, in which case r_i(x) would have the form of a more or less narrow instrumental response centered around x = x_i. Or, the c_i's might "live" in an entirely different function space from u(x), measuring different Fourier components of u(x) for example.

The inverse problem is, given the c_i's, the r_i(x)'s, and perhaps some information about the errors n_i such as their covariance matrix

    S_ij = Covar[n_i, n_j]    (18.4.6)

how do we find a good statistical estimator of u(x), call it u^(x)?

It should be obvious that this is an ill-posed problem. After all, how can we reconstruct a whole function u^(x) from only a finite number of discrete values c_i? Yet, whether formally or informally, we do this all the time in science. We routinely measure "enough points" and then "draw a curve through them." In doing so, we are making some assumptions, either about the underlying function u(x), or about the nature of the response functions r_i(x), or both. Our purpose now is to formalize these assumptions, and to extend our abilities to cases where the measurements and underlying function live in quite different function spaces. (How do you "draw a curve" through a scattering of Fourier coefficients?)

We can't really want every point x of the function u^(x). We do want some large number M of discrete points x_mu, mu = 1, 2, ..., M, where M is sufficiently large, and the x_mu's are sufficiently evenly spaced, that neither u(x) nor r_i(x) varies much between any x_mu and x_mu+1. (Here and following we will use Greek letters like mu to denote values in the space of the underlying process, and Roman letters like i to denote values of immediate observables.) For such a dense set of x_mu's, we can replace equation (18.4.5) by a quadrature like

    c_i = Sum_mu R_i,mu u(x_mu) + n_i    (18.4.7)

where the N x M matrix R has components

    R_i,mu = r_i(x_mu)(x_mu+1 - x_mu-1)/2    (18.4.8)

(or any other simple quadrature; it rarely matters which). We will view equations (18.4.5) and (18.4.7) as being equivalent for practical purposes.

How do you solve a set of equations like equation (18.4.7) for the unknown u(x_mu)'s? Here is a bad way, but one that contains the germ of some correct ideas: Form a chi-square measure of how well a model u^(x) agrees with the measured data,

    chi^2 = Sum_i Sum_j [c_i - Sum_mu R_i,mu u^(x_mu)] S^(-1)_ij [c_j - Sum_mu R_j,mu u^(x_mu)]
          ~ Sum_i [c_i - Sum_mu R_i,mu u^(x_mu)]^2 / sigma_i^2    (18.4.9)

(compare with equation 15.1.5). Here S^(-1) is the inverse of the covariance matrix, and the approximate equality holds if you can neglect the off-diagonal covariances, with sigma_i = (Covar[i, i])^(1/2).

Now you can use the method of singular value decomposition (SVD) in 15.4 to find the vector u^ that minimizes equation (18.4.9). Don't try to use the method of normal equations; since M is greater than N they will be singular, as we already discussed. The SVD process will thus surely find a large number of zero singular values, indicative of a highly non-unique solution. Among the infinity of degenerate solutions (most of them badly behaved with arbitrarily large u^(x_mu)'s) SVD will select the one with smallest |u^| in the sense of

    Sum_mu [u^(x_mu)]^2  a minimum    (18.4.10)

(look at Figure 2.6.1). This solution is often called the principal solution. It is a limiting case of what is called zeroth-order regularization, corresponding to minimizing the sum of the two positive functionals

    minimize: chi^2[u^] + L (u^.u^)    (18.4.11)

in the limit of small L. Below, we will learn how to do such minimizations, as well as more general ones, without the ad hoc use of SVD.

What happens if we determine u^ by equation (18.4.11) with a non-infinitesimal value of L? First, note that if M > N (many more unknowns than equations), then u will often have enough freedom to be able to make chi^2 (equation 18.4.9) quite unrealistically small, if not zero. In the language of 15.1, the number of degrees of freedom nu = N - M, which is approximately the expected value of chi^2 when nu is large, is being driven down to zero (and, not meaningfully, beyond). Yet, we know that for the true underlying function u(x), which has no adjustable parameters, the number of degrees of freedom and the expected value of chi^2 should be about nu ~ N.

Increasing L pulls the solution away from minimizing chi^2 in favor of minimizing u^.u^. From the preliminary discussion above, we can view this as minimizing u^.u^ subject to the constraint that chi^2 have some constant nonzero value. A popular choice, in fact, is to find that value of L which yields chi^2 = N, that is, to get about as much extra regularization as a plausible value of chi^2 dictates. The resulting u^(x) is called the solution of the inverse problem with zeroth-order regularization.

The value N is actually a surrogate for any value drawn from a Gaussian distribution with mean N and standard deviation (2N)^(1/2) (the asymptotic chi-square distribution). One might equally plausibly try two values of L, one giving chi^2 = N + (2N)^(1/2), the other N - (2N)^(1/2).

Zeroth-order regularization, though dominated by better methods, demonstrates most of the basic ideas that are used in inverse problem theory. In general, there are two positive functionals, call them A and B. The first, A, measures something like the agreement of a model to the data (e.g., chi-square), or sometimes a related quantity like the "sharpness" of the mapping between the solution and the underlying function. When A by itself is minimized, the agreement or sharpness becomes very good (often impossibly good), but the solution becomes unstable, wildly oscillating, or in other ways unrealistic, reflecting that A alone typically defines a highly degenerate minimization problem.

That is where B comes in. It measures something like the "smoothness" of the desired solution, or sometimes a related quantity that parametrizes the stability of the solution with respect to variations in the data, or sometimes a quantity reflecting a priori judgments about the likelihood of a solution. B is called the stabilizing functional or regularizing operator. In any case, minimizing B by itself is supposed to give a solution that is "smooth" or "stable" or "likely", and that has nothing at all to do with the measured data.

[Figure 18.4.1; only the labels "achievable solutions", "best agreement (independent of smoothness)" and "Better Smoothness B" are recoverable from the scan.]
Figure 18.4.1. Almost all inverse problem methods involve a trade-off between two optimizations: agreement between data and solution, or "sharpness" of mapping between true and estimated solution (here denoted A), and smoothness or stability of the solution (here denoted B). Among all possible solutions, shown here schematically as the shaded region, those on the boundary connecting the unconstrained minimum of A and the unconstrained minimum of B are the "best" solutions, in the sense that every other solution is dominated by at least one solution on the curve.
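The minimization (18.4.11) with the quadrature matrix R and unit measurement errors (S taken as the identity) reduces to solving the regularized normal equations (R^T R + L 1) u^ = R^T c. The NumPy sketch below is illustrative only: the grid, response matrix, noise level and the two values of L are invented for the demonstration. It exhibits the behaviour the text describes, that increasing L shrinks |u^|.

```python
import numpy as np

def zeroth_order_reg(R, c, lam):
    """Minimize |R u - c|^2 + lam |u|^2 via the regularized normal equations."""
    M = R.shape[1]
    return np.linalg.solve(R.T @ R + lam * np.eye(M), R.T @ c)

rng = np.random.default_rng(0)
M, N = 20, 5                                  # many more unknowns than measurements
x = np.linspace(0.0, 1.0, M)
u_true = np.sin(np.pi * x)                    # underlying function on a dense grid
R = rng.normal(size=(N, M)) / M               # N broad response kernels (invented)
c = R @ u_true + 0.001 * rng.normal(size=N)   # noisy measurements

u_small = zeroth_order_reg(R, c, lam=1e-8)    # near the principal solution
u_reg = zeroth_order_reg(R, c, lam=1e-2)      # more regularization -> smaller |u|
print(np.linalg.norm(u_small), np.linalg.norm(u_reg))
```

With the tiny L the data are fitted almost exactly (the residual |R u^ - c| is negligible); the larger L trades some agreement for a solution of smaller norm, moving along the trade-off curve of Figure 18.4.1.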
[Transparency] REGULARIZATION MAKES [better] GENERALIZATION
Fit the data, but not [perfectly]; avoid overfitting:
(1) Use the simplest network that will solve the problem:
    - remove connections with small weights,
    - remove neurons with small outputs,
    - if 2 weights are [nearly equal, combine them].
(2) Constrain the weights to be small.
6.6 Optimal Network Architectures

FIGURE 6.16 Network construction algorithm. The black dots are inputs, while the numbered circles are threshold units, numbered in order of their creation. Shaded arrows represent connections from all the units or inputs at the arrow's tail. (a) Marchand et al. [1990]. (b) Frean [1990]. (c) Mezard and Nadal [1989].
[Figure graphics not recoverable from the scan.]
[Transparency] Instead of training a network with a large, fully connected hidden layer, a [different] technique consists in ADDING HIDDEN NODES one at a time, STARTING FROM A MINIMAL-SIZE network.

[Transparency] TRAINING SAMPLE SIZE: sometimes we have [little] a priori knowledge of the required training sample size. [Notes relating the sample size p to the number of inputs N and to the "dimension" (connectivity) of a feed-forward network are illegible in the scan.]
[Transparency] LEARNING FROM THE VIEWPOINT OF COMPUTATIONAL COMPLEXITY
How does the learning time grow as a function of the size of the network? COMPLEXITY THEORY SAYS THIS IS AN "NP-COMPLETE" PROBLEM, which means:
(1) no one knows how to solve it [in polynomial time];
(2) [in the worst case the time is believed to grow] exponentially.
POLYNOMIAL TIME means that for large N the time increases like N^k [for some fixed k]. EXPONENTIAL time means in the worst case T ~ 2^N, so each added neuron [doubles] the time: 2^10 ~ 1000, [i.e. going from 200 to 400 neurons can turn a soluble problem into an insoluble one].
[Further notes on finding the minimum with integer weights and on the required accuracy are illegible in the scan.]
[Transparency] EXAMPLE: THE FAMOUS TRAVELLING SALESMAN PROBLEM
Find the tour that visits ALL CITIES with the MINIMUM TOTAL TRAVELLING [distance]. Finding the exact solution is NP-complete, [hard] in the worst case, but sometimes easy [in practice].
Approximate solutions are sometimes not [hard to find]:
(1) for the T.S.P., using the minimum spanning tree, a solution can be found whose length is less than 3/2 x the length of the [optimal tour];
(2) heuristic techniques [often give a good] solution.
Many other methods give good approximations in polynomial time.
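The minimum-spanning-tree idea the transparency mentions can be sketched. Note that the simple "double tree" shortcut below only guarantees a tour within a factor 2 of the optimum; the 3/2 bound quoted on the slide corresponds to Christofides' refinement, which adds a matching step not implemented here. The city coordinates are invented for the illustration.

```python
import itertools, math

def tsp_mst_approx(points):
    """MST 'double-tree' heuristic: preorder walk of a minimum spanning tree.
    For metric instances the resulting tour is at most 2x the optimum."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # Prim's algorithm for the minimum spanning tree.
    in_tree, parent = {0}, {}
    best = {j: (dist(0, j), 0) for j in range(1, n)}
    while len(in_tree) < n:
        j = min(best, key=lambda k: best[k][0])
        parent[j] = best[j][1]
        in_tree.add(j)
        del best[j]
        for k in best:
            if dist(j, k) < best[k][0]:
                best[k] = (dist(j, k), j)
    children = {i: [] for i in range(n)}
    for j, p in parent.items():
        children[p].append(j)
    # Preorder walk = depth-first shortcutting of the doubled tree.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour

def tour_length(points, tour):
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0.5), (0.5, 2)]
t = tsp_mst_approx(pts)
opt = min(tour_length(pts, [0] + list(p)) for p in itertools.permutations(range(1, 6)))
print(tour_length(pts, t) <= 2 * opt)  # True: the factor-2 guarantee
```

For this 6-city instance the exact optimum can still be found by brute force (120 permutations), so the guarantee can be checked directly; the heuristic itself runs in polynomial time.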
Artificial Neural Networks: Approximation and Learning Theory

Halbert White
with A. R. Gallant, K. Hornik, M. Stinchcombe, and J. Wooldridge

Part I  Approximation Theory
  There Exists a Neural Network That Does Not Make Avoidable Mistakes (with A. R. Gallant)
  Multilayer Feedforward Networks Are Universal Approximators (with K. Hornik and M. Stinchcombe)  12
  Universal Approximation Using Feedforward Networks with Non-sigmoid Hidden Layer Activation Functions (with M. Stinchcombe)  29
  Approximating and Learning Unknown Mappings Using Multilayer Feedforward Networks with Bounded Weights (with M. Stinchcombe)  41
  Universal Approximation of an Unknown Mapping and Its Derivative (with K. Hornik and M. Stinchcombe)  55

Part II  Learning and Statistics  79
  Neural Network Learning and Statistics  81
  Learning in Artificial Neural Networks: a Statistical Perspective
Connectionist Nonparametric Regression  175

Where possible, notation follows that of White and Wooldridge. Other notation and definitions are as in Stinchcombe and White (1989b). We write B(.) to denote the Borel sigma-field generated by the open sets of the argument set, gr(.) denotes the graph of the indicated correspondence, and A(.) is the collection of analytic sets of the indicated space.

Theorem 4.1. (a) Let (Omega, F, P) be a complete probability space and let (Theta, rho) be a metric space. For n = 1, 2, ..., let Theta_n be a compact separable Borel subset of Theta and let Theta^_n: Omega -> Theta_n be a [measurable] correspondence such that for each omega in Omega, Theta^_n(omega) is non-empty and compact. Suppose that Q^_n(omega, .) is lower semicontinuous on Theta for each omega in Omega, n = 1, 2, .... Then for each n = 1, 2, ... there exists a function theta^_n: Omega -> Theta_n such that Q^_n(omega, theta^_n(omega)) = min over theta in Theta^_n(omega) of Q^_n(omega, theta), for all omega in Omega.
(b) In addition, suppose {Theta_n} and {Theta-bar_n} are increasing sequences of compact subsets of Theta such that the union of the Theta_n is dense in Theta and Theta_n included in Theta^_n(omega) included in Theta-bar_n for all omega in Omega, n = 1, 2, .... Suppose there exists a function Q-bar: Theta -> R such that for all eps > 0

    P[ sup over theta in Theta-bar_n of |Q^_n(omega, theta) - Q-bar(theta)| > eps ] -> 0 as n -> infinity,    (4.1)

and for theta_0 in Theta

    inf over theta in eta^c(theta_0, eps) of Q-bar(theta) - Q-bar(theta_0) > 0,    (4.2)

where eta^c(theta_0, eps) = {theta in Theta: rho(theta, theta_0) >= eps}, and Q-bar is continuous at theta_0. Then rho(theta^_n, theta_0) -> 0 [in probability].

Part (a) establishes the existence of a measurable estimator theta^_n. Without measurability we cannot make probability statements about theta^_n, such as the indicated statements about consistency. Part (b) establishes consistency of theta^_n for theta_0. The object theta_0 is distinguished by its role as minimizer of Q-bar (condition 4.2), the limit to which Q^_n converges uniformly (condition 4.1). This uniform convergence can be verified in particular stochastic contexts; our next result permits this for i.i.d. and stationary mixing processes. The other notable assumption of part (b) is the existence of nonstochastic sets Theta_n and Theta-bar_n bounding Theta^_n(omega). The behavior of Theta_n ensures that Theta^_n(omega) becomes sufficiently dense in Theta, so that an element of Theta^_n(omega) can well approximate an element of Theta. The behavior of Theta-bar_n ensures that Theta^_n(omega) does not increase too fast and that Q^_n converges to Q-bar uniformly in an appropriate sense. The constants of the previous section determine Theta_n and Theta-bar_n in our application.

In our application, (Omega, F, P) is the space on which the stochastic process {Z_t} is defined. The properties of P determine whether {Z_t} is an independent or a mixing sequence. The space Theta contains the object of interest (the unknown regression function theta_0), and rho is a metric that measures distance in this space (weighted mean squared error). Q^_n is the criterion function (squared error) optimized to arrive at an estimator theta^_n. The set Theta^_n(omega) over which optimization is carried out may depend on the data through omega; this permits treatment of cross-validation procedures. In some applications it may be natural to have Q^_n defined only on the graph of Theta^_n(omega) or on Omega x Theta_n instead of on all of Omega x Theta as we assume. Lemma 2.1 of Stinchcombe and White (1989b) establishes that defining Q^_n on Omega x Theta results in no loss of generality, as there generally exists an appropriate measurable extension to Omega x Theta of Q^_n originally defined on gr Theta^_n or Omega x Theta_n.

We now give a result permitting verification of condition (4.1), related to Corollary 2 of Haussler (1989). Instead of using the concept of V-C dimension (Vapnik and Chervonenkis, 1971) we use the concept of metric entropy [...]
biological, vol. 3 contains tutorial and software)(Vol. 1 gives the foundations and algorithms, vol. 2 is moreMIT Press, 19IIParallel Distributed Processing, vols. 1, 2 and 3,
McClelland and Rumelhart,
Springer•Verlag, 1999se1f·Organi:ation and Associative Memory (3rd edition)
Kohonen, T.,
(probably the best book available for physicists)Addison·wesley, 1991Introduction to the Theory of Neural Computation
nertz, Krogh and Palmer,
world Scientific, 1990Physical Models of Neural Networks,
Geszti, T.,
(reprinted in Anderson and Rosenfeld)Reviews of Modern Physics 34(1962), 123-135The Perceptron: A model for brain functioning,
Block, H.D.
(A good general introduction, paperback, not expensive.)Adam Hilger, 1990Neural Computing, an introduction,
Beale and Jackson,
(A collection of important papers, expensive)MIT Press, 1988weurocomputing: Foundations of Research
Anderson and Rosenfeld (eds.),
approach;)(Amit is a theoretical physicist, but often adopts the biologicalCambridge University Press, 1989Modelling Brain Function,
Amit, Daniel
F. James, August, 1991SOM! RLCENT GOOD BOOKS ON NBURAL NETWORKS
A microscopic "rnask" is laid on thetriaminel in their experiments.They used one called DETA (for diethylene· nirlinnOCR Outputthat promotes the growth of neurons.of
er single layer of molecules of a chemicalat0 chip of silicon or glass and coat it with a
The two scientists said they take aUsing DETA
University of California·lrvine. othertional institutes of Health and at thecollaborations with scientists at the NaResearch. are developing the biochlps infunded largely by the Office of Naval
Drs. Stenger and Hlchnan. who arememory or learning.with, or perhaps enhance. functions like than onuse to see if new compounds might interfere
aw make "biochips" that drug makers couldhe It also may be possible to eventuallyI 3.
idea. Dr. Stenger said.ne
variation of the "canary-in·the-coa.l mine'used in chemical warfare. a bioelectronic$9 mxdetected a nerve poison. such as nerve gas
more oftenws on a chip that would sound the alarm when it
·/* { FEB. to develop a semor composed of nerve cellsing with researchers at Stanford Universitypany. For example. the pair are collaboratginia. a "high-tech" contract-research comJaz/KA/A·¢ ELtiom lntermtionsl Corp. in Mclean. Vir
list
the For Euchemist Jay Hickman of Science Appllca·51-*said Dr. Stenger and his collaborator.lead to some useful bioelectronic devices.ww-L
Nevertheles. their basic research couldFEE
Possible Nerve·Gas Sensor
engineer. Dr. Stenger explained.puter developers are currently trying to
ellche
on. artificial neurons that electronics and comlearn how to make better networks of1%
UC. with each other on the chips. they shouldthe biological neurons function and connect animal for only a matter of hours.
comment onhe said. As the researchers study how networks of brain cells outside the intactin Rockvilleyou can investigate how neurons work:mof ent. neuroscientists are able to study
Susan Cr‘We’re working with systems where perimentation. Dr. Stenger noted. At presbacterial coeffort to make an artificial brain. functioning for months for study and ex0.5% concemight be misinterpreted as a fledgling The neurons can be kept alive anda nationwicin Washington, worried that the research of DETA.
ln earlyStenger of the Naval Research Laboratory dendrites and axons following the pathsinCopley’siresearch," said biophysicist David L. then send out connecting "wires" calledterol. The iBx ‘I want to emphasize this is fundamental off," Dr. Stenger explained. The neuronssystem userons form in the brain. sparkly things on it and then shook themmaintenancwill crudely mimic the circuitry that neu a piece of paper and sprinkled thoseical monitoto grow connections to each other that when you were a kid and you put glue onhave writtetor months to get the brain cells. or neurons, will "take" and grow. "lt's kind of likenumber ofto be able within the next six to 12 that those landing on the DETA spots
The FDAalong desired paths. The scientists hope less sprinkled on the chip with the hoperesolved.and then induce the brain cells to grow Embryonic rat neurons are more orand will bedesired spots on silicon or glass chips ron growth.this matter§€I' how to place embryonic brain cells in layer of molecules that discourage neu·Copley comce! The researchers said they had learned and the cleared sections are filled with aUE DWDMthat use living brain cells. let laser removes the unshielded DETA.through a sstep coward creating electronic microchips follow to make connections. An ultravio·
CopleyResearchers said they took a key first the channels they want the neurons toSea;] Reporzgr ceutlcal finscientists want the neurons to settle and
) keted at hl:By Jann? E. Brsuor chip that shields both the spots where thecost versioproducersproduction.In Microchip-Neuron Linkuconcern abhas nrvmltrouble ovtResearchers Make Advanceoimts ven
Albuuqulky at
'°”“‘""§;..i..‘ei.1...$$.$ 0... th. p.a¤;•p§mp;.€§.t;2L¥,?¥‘£7 §'G"E”‘”?‘$ ¥£}§$‘°‘‘}°’°
T
t; sl;¤n~te¤¤. Peng chip an\. II
• I I 53
'PNZAM. OCR Outputrnmmezg 4-» \ A 7;
has ¤@
rf! 7- Y 07c ‘ fy Re$mont.s
several NN architectures are compared with conventional methods. Excellent results are obtained.network with error haclopropagation. Two completely different methods are described in detail and their efficiencies for
We developped neural-network learning techniques for the recognition of decays of charged tracks using .i |`eed·for~s4rd
Received 10 Nlav l99l
Luhnrunure Je Phixrque C urptm u/utre. L`ntt·erstté Blume Pascal. C/errrmnr-Ferrund. M'} " .·lu/were. Frame
Georg Stimpfl-Abele
network techniquesRecognition of decays of charged tracks with neural
~··¤h·H·*"¢¤·* CommunicationsOCR Outputgimputer Physics Ctimmunicmons 67 ( l99l» IH}-1*42 CO{"npu{Qf Physics
@2.-ti QJQZQ ·yt,c<.cvLx Xg.4$1(Q,\F` LUQV1 {Dk QU/{Ki OCR Output
N - ‘\l' ·
consumingpa,
47.078.3 65.9`aster. Since Nm 48.179.3 65.3of the input
XBM 40.276.0 62.0; effects (theXgcs 14.353.1 35.8
) in randoml0 GeV3 GeV 5 GeV
.d 150 times
Kinlvfinding efficiencies (%) for DSIM datathem to the
Table 3bbout 40000
par51.679.0 69.1
50.1N res 75.2 68.5
one hiddenX par 50.279.8 68.0
concentrateX res 38.073.5 57.8
not gain by10 GeV5 GeV3 GeV
2-1 layouts.Kink-finding efficiencies (%) for FSIM datahe same efTable 3azonfiguration
residual ap
l0-·10-1 the
parameter-comparison method.wc obtainedrectly be compared to the results of the analyticaltwo hidden
The efficiency of the neural networks can dilarger for DSIM data.ber of non-decaying tracks with bad fits is much
tracks and to 5% for DSIM tracks since the num
0010-1655/93/$06.00 © 1993 — Elsevier Science Publishers B.V. All rights reserved OCR Output
Correspondence to: R. Barlow. Department of Physics. Manchester University, Manchester, M13 9PL. UK
Carlo models.
systematic effects that may result from the use of such a network, by using samples from different MonteSection 4 studies the size and topology of network needed to bring out the results. In section 5 we discusspreprocessed variables (following the suggestions of ref. [2]), secondary jet clusters, and raw momenta.we compare the performance of networks using 3 different philosophies for input variables: highlydefinition of the extent to which one network performance is "better" than another. Then in section 3particularly when using it to perform a statistical separation. This gives an objective and unambiguous
In the next section we discuss how the separating power of a network can be quantitatively measured.GeV. and it is not clear that the situations are comparable.study was done at a centre of mass energy of 14 GeV, where the distinction is much clearer than at 91format (they used a hypothetical calorimeter) is preferable to using more sophisticated variables. but thisthem. Gottschalk and Nolty [1] suggest that entering the shape of the event in a largely unprocessed("raw") event information, trusting to the network training algorithm to find the best way of combiningfor b identification, to give the network the best possible chance of success, or to use unprocessednetwork; in particular, whether to use constructed shape variables known from experience to be useful
In such an application the most important question is what the best input variables are to use with theof b quark jets produced in the reaction e* e`-• Z -• qi at a centre of mass energy of 91 GeV.nature of the jets is known. This note explores the possibility further, concentrating on the identificationnetworks trained and tested on samples of data generated by Monte Carlo programs. for which theparameters). Some studies have already been done on this topic [1,2], examining the performance ofgeneral jet shape as input (as opposed to specihc inputs such as high pT leptons or large impactjet, specifically to discrimination between jets from heavy (b) and light (u, d, s, c) quarks. using only thephysics they can be applied to the problem of recognising the nature of the quark producing a hadronic
Neural networks with feed-forward topology are widely used in pattern recognition. In high energy
1. Introduction
techniques are given.
believed to be good for separation. Some first studies of systematic errors resulting from using neural network separationnetworks perform better separation if they are given simple inputs, as opposed to inputs already combined into variables
The problem of identifying b quark jets produced at LEP using a neural network technique has been studied. We find that
Received 5 June 1992
Department of Physics, Manchester University. Manchester, M13 9PL. UK
Graham Bahan and Roger Barlow
Identification of b jets using neural networks
CommunicationsN°"“`H°"“‘°
Computer Physics Communications 74 (1993) 199-216 CO.-nputer physics
Fig. 7. The learning curve for nw inputs. OCR Output
N0. of Training Cycle:
ZM lm W W IW IZM 14% IW IW
OJ25
0.15
0.175
0.2
0.225
0.25
0.275
?•I‘I MIIHH I ?
0.3
0.325
Fig. 5. The lcaming curve for the semndary clustering inpuu.
No. dTr¤ining Cycle:
2% % @ 1% 12% 14%
0.125
0.15
0.1 75
0.2
0.225
0. 25
O. 2 75
0.3
0. 325
208 G. Bahan, R. Barlow / ldenufcazian of b jets using neural network!
nnn OCR Output· (ht · ir) F · " ———— Mi
i- 1, 2, 3...NB and F estimated ash(x) and l(x) but to the two Monte Carlo event samples. These must be histogrammed in bins
To evaluate 1-' for the results of a particular network one does not have access to the parent functions
Appendix. Evaluating F
useful discussions.
We are indebted to Leif Lbnnblad for providing us with the JETNET program, and to Terry Wyatt for
Acknowledgements
networks, or different versions of the same network.discriminating power of the network, enabling meaningful comparisons to be made between different
We recommend the use of the figure of merit, F, as an unambiguous parameter describing theunderstood.
network is much longer, and there are still features of the output which are unpleasant and not well
Fig. 9. Effect of the number of nodes in a single hidden layer on the performance of networks using different input;
No. of
I 3 5 7 9 Il I3 I3 17 19 210.23
A IL
0.26
0.2 7
0.28
0.29
0.3
0.31
OCR OutputI 212 G. Bahan. R. Barlow / Idennfcanbn of b jets mn; neural network:
0010-4655/93/S 06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved OCR Output
‘ E-mail address: [email protected] (Em) around 91 GeV (LEP 100), has [email protected] CERN, operating with a center-of-mass energyCERN, CH·l2ll Geneva 23, Switzerland; E-mail addr¤s:
the Large Electron Positron collider (LEP) atCorrespondence to: G. Stimph-Abele, PPE division,Breaking mechanism [2]. During recent yearsson, H, responsible for the so-called Symmetrymodel predicts the existence of the Higgs bointeractions among elementary particles. Thisphasis is given to the selection of the best inputis the commonly accepted theory to explain the. layer and error back-propagation are used. EmThe Standard Model of particle physics [1]Standard feed-forward nets with one hidden
the Z mass.LEP 200mass is rather challenging since it is just below2. Higgs production and backgrounds atgrounds can be better discriminated. The higher
case because the sipal is higher and the backGeV/cz. The lower mass represents the easiering two mass hypotheses are chosen: 70 and 90 of the methods. Finally, conclusions are given.for the Higgs boson at LEP 200. The follow sively in section 7 in order to test the reliabilitywith a neural net (NN) approach in the search in section 6. Systematic eH`ects are studied exten
standard one-dimensional cuts, is compared of the N'Ns in the Higgs search is demonstratedIn this study a traditional filtering method, us select the best input·variables. The performance
background events. methods developed to analyse neural nets and totering process intended to separate signal and and learning procedure, and a description of theysis looking for new phenomena needs a fil cal details ofthe net generation, like architecturenumber of conventional events. Hence an anal dimensional cuts. Section S contains the techniticles are produced along with a much larger icated to the standard analysis based on onephysics. In general, events containing new par preselection of the input data. Section 4 is dedamong the most important tasks in high energy lowed by a description ofthe generation and the
The search for new elementary particles is The physics case is discussed in section 2, folSystematic eH`ects are studied in detail.
1. Introduction variables by analysing their utility inside the net.
nets are found to be sipiticantly better than those of standard methods.are presented. The sensitivity of the nets for systematic eH`ects is studied extensively. The ehiciencies of the neuralat LEP 200. New methods to select the most edicient variables in such a classiiication task and to analyse the netsof neural nets. Feed-forward nets with error bacbpropagation are applied to the search for the standard Higgs boson
Two aspects of ¤eura1·¤et analysis are addressed: the application of neural nets to physics analysis and the analysis
Received 8 June 1993; in revised form 22 August 1993
Rice University, Houston TX 77251-1892, USALaboratoire de Physique Corpusculaire, Université Blaise Pascal, Clermont-Ferrand, F-63177 Aubiere, France
Georg Stimpt'1-Abele “ and Pablo Yepes "·‘
Higgs search and neural-uct analysis
>*<>¤¤·H<>¤¤¤¤ CommunicationsComputer Physics Communications 78 (1993) 1-16 COWWDUTGT
1`o allow for a direct compari standard cuts (table 3) are included to ease the OCR Outputdoes not improve the statisti the significance is maximal. The results of thePgm (9). Adding them to the The cut on the NN output is chosen such thatibove. There are two new vari the analysis at 70 and 90 GeV/cz, respectively.the standard analysis are con cal significance are shown in tables 7 and 8 forwith section 4 shows that all ground events of the three nets and the statisti
The number of accepted signal and backnputs (N7) but still good per maximum around 0.9for high performance and the raise almost linearly with the cut, reaching theirfor further study. The net with to the preselection. The statistical significancesthe performance two nets from statistical fluctuations. The threshold at O is due
events above the cut are demanded to avoid bigng between the number of input for both masses. At least two background
r nets N, are the variables 1 to the three nets as function of the cut on the outre the inputs for N 10. The in
354.391.422.Min. luminosity [pb"] 610.fitted with the HZ hypothesis8.47.7 8.0Statistical significance 6.4
f secondaries in the Higgs jets15.5 10.9 6.935.1Total backgound
4.211.4 5.36.6e+e‘ -• Z Z. ofthe most energetic electron 1.63.9 2.611.8e+e" -• W+W‘·
1.1:0 three jets (S9). 3.0ll.9 5.0e"’e' - qdggangles between the jets if the 22.026.438.0 30.5e+e‘ -—· H90 Z
N10CutsReactiontass of all secondary tracks
canoe.
. event is timed with the WW analyses at my = 90 GeV/cz and their statistical signifiSignal and background events left for standard and NN
Table 8·-.1argcd tracks (Md;).
124.157.174.Min. luminosity [pb"] 330.10.9 8.0 14.2 '12.612.0Statistical simiticance 8.712.6 8.0
8.33.56.048.7Total backgound13.2 8.20.60.21.4 0.4e+e" —· Z Z13.2 8.44.01.43.225.0e+e‘ -» W+W‘14.2 8.43.71.92.422.3e*e‘ -·» qdgg15.5
40.923.629.560.515.8 e+e‘—» Hm Z
19.1 8.7N10NetCutsReaction
Hm Hoocance.
.scs. analyses at my = 70 GeV/cz and their statistical signitiSignal and background events left for standard and NNas function of the number of input
Table 7
G. Stimpfl-Abele. P. Yepes / Higgs search and neural~net analysis
© l990 The American Physical Society OCR Output l32|
hidden units.ard layered architecture (see Fig. l) with inputforwkc ours the neurons are often organized in a feed FIG. l. A l`eed·l`orward neural network with one layer of
nectivity weights wi,. For feature recognition problemsingredients in a neural network are neurons n, and con . $• Xit
‘—° jkThe neuralmetwork learning algorithm.--The basiclions.
could be used in a variety of different triggering situaSluon and quark jets. it is clear that the methodology
Although this paper is limited to the separation ofindependent, e.g.. considering the fastest particles only.ourselves to kinematical quantities that are most modelready there. This eH`ect can be minimized by limiting
(6)Amy;. ·· nZmj,5,g'(a,)xi +aAm}’l°our studies; some of the physics one wants to study is al`Some extent this induces a "chicken·and·egg" eH'ect to Correspondingly, for the input to hidden layers one hasefe ` events using the Lund Monte Carlo model. To
6; '(yg ‘I, )g'(di) .We conhne our studies to Monte Carlo·generatedtime triggering.
for the hidden to output layers, where 6, is given bystructure. The latter feature is very important for real~in custom·made hardware with its simple processor Aw;] ' ` Hajhj +GA(.r)3 (4)very general, inherently parallel. and easy to implement
sponds tohas the advantage over other Etting schemes in that it is
is minimized. Changing my- by gradient descent correis exactly what the neural~network approach aims at. Itxpert`s exercise to a "black box" htting procedure. This
(3)E- % ZZ(y,‘P’—t,‘P’)’(quark or gluon). This reduces the problem from anserved hadronic kinematical information and the feature tion
would be to End the functional mapping between the ob back·pr0pagation learning rules where the error funcA straightforward method for identifying the jets quently used procedure for accomplishing this is the
produced in high·pr hadron-hadron collisions. equals the desired output or target value ti". A freenergies are less well known. One such example is jets ter x ') gives rise to an output (feature) value y"that(’that in many situations “g1obal" quantities like total jet changing the weights my such that a given input paramewould be desirable to focus on the latter alternative given to be learned. Training the network corresponds tothe entire event rather than just a single isolated jet. It building up an "internal representation" of the patternsorate schemes.’·° Such procedures are often based on The hidden nodes have the task of correlating andjet with smallest energy as the gluon jet' to more elab and Z, wijhj, respectively.the kinematic variables ranging from just identifying the weighted input sums aj and aj are given by Zi wjixiidentihcation has been done by making various cuts on where the "temperature" T sets the slope of g and thecoupling in e*e ' annihilation.; To date the gluon-jet
y;'g(G;/T) ,quired for establishing the existence of the three—gluonAlso. a fairly precise identincation of the gluon jet is re
(I)h,•g(aj/T),studies on the so-called string eH`ect‘ and related issues.
one haslight on the confinement mechanism in terms of detailedtiit from many perspectives. lt can shed experimental 0.5[l+tanh(x)]. For the hidden and output neurons
__ab\e to distinguish the origin of a jet of hadrons is impor thresholds this sum with a "sigmoid" function g(.x)performs a weighted sum of the incoming signals andand quark jets using a neural-network identifier. Being
In this Letter, we demonstrate how to separate gluon (xi), hidden (hj), and output (y.) nodes. Each neuron
PACS numbers: l3.87.Fh. l2.38.Qk, l3.65.+i
Monte Carlo-generated e 'e ' events with 85%-90% accuracy.Using a neural-network classifier we are able to separate gluon from quark jets originating from
(Received 6 April 1990)Deparrmenz of Theoretical Physics, L'nii-army of Lund. Sélvegaran [4,4. S -22362 Lund. Sweden
Leif L6nnblad.“Carsten Pe1erson,and Thorsteinn R6gnvaldsson(""()
Finding Gluon Jets with a Neural Trigger
vo1.L·Ms65, Numsanll 10 SEPTEMBER 1990PHYSICAL REVIEW LETTERS
l.9;O.l OCR Output2.730.13.0;0.lr(m,,___,>nrK)l.b0;0.il22.1630.03 2.00:0.02r={n:`>/(rt,}
M';74'2l()0‘I·Success rate
Smallest jetNeural networkMC truth
approach and the "smallest jet" approachdifferent methods of identifying the gluon jet: using the true gluon jet of the MC. the NN ,r
Results for the r · (rt:)/<n,> measured on a sample of l0000 MC events (ARIADNEI for / ‘N`DTitan.; 7 LP AA
* ** thgpdrm 5¢ldc52 (bitnet) tlennitu thep.lu.se (internet) Jfwyo,* ' thepcaptu seldc52 (bitnet) carstenfu thep.lu.se (internetl
thepllét seldc52 (bitnetl leifw thep.lu.se (internet)
and energy. ln a previous paper [1] preliminary results for gluon·quark separationfeature extraction procedures will become more acute with increasing luminosityrelevant quantities in collected data. Needless to say. the demand for efficientlow-level trigger conditions in experimental setups, to extraction of theoretically
High·energy physics contain many feature recognition problems, ranging fromexecution times and thereby facilitating real-time performance.
neural networks and the feasibility of making custom made hardware with fast
ness and robustness. Another attractive feature is the inherent parallelism inof the NN promising but the entire approach is very appealing with its adaptivevariety of real·world feature recognition applications. Not only is the performanceenthusiasm is the power this new computational paradigm has shown for a widebrain·style computing in terms of artificial neural networks (NN). The origin of this
During the last couple of years there has been an upsurge in interest for
1. Introduction
compressing the dimensionality of the state space of hadrons.how the neural network method can be used to disentangle different hadronization schemes bypurity. which is comparable with what is expected from vertex detectors. We also speculate onjust observing the hadrons. ln particular we are able to separate b-quarks with an efficiency and
ln addition, heavy quarks fb and cl in e°e‘ reactions can be identified on the 50% level by
effect.
model used. This approach for isolating the gluon jet is then used to study the so-called stringCarlo generated e’e‘ events with ~85‘? approach. The result is independent of the MCnetwork. With this method we are able to separate gluon from quark jets originating from Montefunctions using a gradient descent procedure. where the errors are back-propagated through thequark-gluon identity. This is done with a neuronic expansion in terms of a network of sigmoidalis to find an efficient mapping between certain observed hadronic kinematical variables and the
A neural network method for identifying the ancestor of a hadron jet is presented. The idea
Received 29 June l990
Department of Theoretical Physics. University of Lund, Séltegaran I 4.4. S-22362 Lund. Sweden
* * *Leif LONNBLAD ". Carsten PETERSON * ' and Thorsteinn ROGNVALDSSON
USING NEURAL NETWORKS TO IDENTIFY JETS
North-Holland
Nuclear Physics B349 ( l99l l 675-702
0010-4655/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved OCR Output
Amendola 173, 70126 Bari, Italy. followed by a Xe—CH, filled proportional chamCorrespondence to: M. Castellano, INFN, Sez. di Bari, Via ules, each one made of a carbon fiber radiator
The TRD prototype [10] consists of 10 mod
produced by known particle-classes, are stored in 2. The pattern classiication systempurpose, the output patterns of the detectors,the classification power of detectors. For thistaken into account in high energy physics to study data and test results are discussed.ful statistical and neurocomputing techniques, are the third section the application to experimental
Pattem recognition methods, involving power ture and the likelihood ratio test are described; inments [8-10]. tion system is presented and the neural architecup to l TeV energy in cosmic ray space experi ray experiments. In the next section the classificacan be used to distinguish positrons from protons for electrons/ hadrons discrimination in cosmicexperiments (see e.g. ref. [7]). Moreover, TRDs ate the pcrforrnances of a TRD developed [10,11]
network and likelihood ratio algorithm to evalufuture also in LHC (see e.g. ref. [6]) and SSCtion system based on a back-propagation neuralat CERN [2,3] and Fermilab [4,5] and in the
high energy particle identification in experiments In this paper is presented a pattem recogiiclassification power.Lorentz factor [1]: they have been employed forin a suitable feature space provides the detectorused to discriminate particles with a different yspace. A measure of the separability of the classesTransition radiation detectors (TRDs) can betern from pattern space into class·membershipfor particle identification in high energy physics.tionship among data in terms of mapping a patSeveral detectors have been proposed, so far,cation system in order to carry out explicit rela
1. Introduction tagged data files. They are examined by a classifi
successfully identities 4.0 GeV/c electrons with an hadron contamination of about 4><10°’ at 98% acceptance efficiency.multiwires proportional chamber of the detector. The best results are obtained by the neural network approach thatlikelihood ratio technique. The information fed into the classihcation system consists of the number of hits detected by eachdiscrimination is presented. It is based both on a layered feed~forward neural network trained using back-propagation and a
A classihcation system able to evaluate the performances of a transition radiation detector prototype for electrons / hadrons
Received 16 June 1993; in revised form 16 August 1993
___ ‘ Istituto Elabaraziane Segnali e Immagini - CMR., Va Amendola 166/5, 70126 Bari, Italyb INFN Sez. di Bari, Via Amendola 173, 70126 Bari, Italy“ Dipartimento di Fzlsica, Université di Bari. and INFM Se:. di Bari, Va Amendola 173, 70126 Bari, Italy
G. Pasquariello ° and P. SpinelliR. Bellotti a, M. Castellano b, C. De Marzo “, N. Giglietto b,
radiation detector
method to evaluate the performance of a transitionA comparison bctwccn a neural network and the likelihood
CommunicationsNorth-Holland
Computer Physics Communications 78 (1993) 17-22 Computer Physics
(solid line). Collider in the LEP-Tunnel, ECFA 90-133, Vol. l (1990). OCR Outputhood ratio technique (dotted line) and the neural network [6] Proc. ECFA-CERN Workshop on the Large HadronGeV/ c data: the TRD performance evaluated by the likeli (1989).Fig. 5. Pion contamination versus electron acceptance for 4 [5] D. Errede et al., preprint Fermilab-Conf-89/170-E
E|•etron keontence (1984).¤.S¤ 0.9 0.92 0.94 0.96 0.98
[4] A. Denisov et al., preprint Fermilab-Conf-84/134-E[3] A. Vacchi. Nucl. Instrum. Methods A 252 (1986) 235.[2] J. Cobb et al., Nucl. Instrum. Methods 140 (1977) 413.[1] B. Dolgoshein, Nucl. Instrum. Methods A 326 (1993) 434.
Neurol Network References
task.Lihelihm
components to speed up the particle classificationcould be hardware implemented using neuron-likeparallel distributed processing model of the netresults. Finally, from the practical view point, the
4 Gov/C Dole played a fundamental role in obtaining betterthe likelihood ratio statistical technique hasmany hypotheses simultaneously with respect tobeen presented. The ability of the net to explorefor the electron / pion discrimination problem hasapproaches to evaluate the TRD performances
A comparison between statistical and neuralthree different momentum data set for bothmerizes pion contamination against acceptance at
4. Conclusionselectron acceptance efficiency. The table 1 summum pion contamination value around 95% ofAs a result, the NN technique reaches the mini
as possible is required.to the procedure described in the second section.short duration exposures, an acceptance as largeversus electron acceptance is computed accordingsearching for rare events with the constraint ofemphasized in fig. 5 where pion contaminationspace cosmic ray experiments [9,10] where,respect to the L one. This behaviour is well[12,19,20] and it can be useful to employ TRDs infeature values is achieved in the No space withperformance is never shown by other methodsonly. Better accumulation around 0 and 1 of thenation is already achieved at full acceptance. Thisated, as shown in fig. 4 for 4 GeV/c momentummethods. In the NN method a very low contamimentum, the NO and L feature spaces are gener
sets, i.e. about 5000 events for each beam mocounts in each chamber. Using the same datadetector, while the electron patterns have some
98% 236.3 26.7 97.9 90.0 21.4 4.5tected with a small number of hits in the whole 97% 134.9 26.7 46.6 8.1 14.1 4.5shown in fig. 3. The pion events are mainly de 21.9 6.0 6.8 4.295 % 83.4 26.7
90% 25.3 15.1 14.8 6.0 3.0 4.2able. Typical data for pions and electrons are1, are taken at the different beam momenta avail L NN L NN L NN
The TRD data, according to the scheme in fig. 2 GeV 4 GeVAcceptance 1 GeVfor the electrons is evaluated of about 10"‘ [10].
techniques.ule electronic logic. A mean tagging inefficiencyacceptance by the likelihood ratio test and neural network
ters, lead glass scintillators and a standard mod Pion contamination (x l0") evaluated for different electronfixed momentum, by means of Cherenkov coun Table l
21R. Belloui ct aL / Evaluating the performance of a transition radiation detector
0010-4655 /93/$06.1X) © 1993 — Elsevier Science Publishers B.V. All rights reserved OCR Output
Computer Physics Communications 78 (1993) 23-28
North-Holland

Selection of optimal subsets of tracks with a feed-back neural network

Rudolf Frühwirth
Institut für Hochenergiephysik der Österreichischen Akademie der Wissenschaften, Wien, Austria

Received 6 July 1993

A feed-back neural network is described which solves the problem of finding an optimal set of compatible nodes in a compatibility graph. The properties of the network are explored with random graphs. The performance is illustrated with simulated data of the DELPHI detector.

1. Introduction

An important constraint on any track reconstruction algorithm is given by the fact that any two tracks in the final selection have to be compatible, i.e. must not share data. The selection algorithm therefore has to solve an optimization problem by selecting a maximal set of compatible tracks. [...] The quality indicators (QIs) [...] the set with the largest sum [...].

The problem of finding maximal compatible sets in a compatibility graph has been analyzed in considerable detail in ref. [1]. [...] Hopfield n[etwork] [...] In our specific application we have found it useful to define the track QI as a combination of track angle and chi-square probability. The procedure is described in section 4. [...] performance of the network algorithm [...] are presented in section 4.

Table 2
Classification of simulated tracks.

                          Data set I              Data set II
Algorithm                 neural net   standard   neural net   standard
Simulated tracks          5000         5000       10000        10000
Reconstructable tracks    4331         4331       8279         8279
Tracks in class 1         3867         3712       4874         5518
Tracks in class 2         850          240        903          251
Tracks in class 3         16           251        176          149
Processing time [CPU s]   518          2067       ...          2538

2. Architecture of the network

The relation of compatibility in a set of tracks can be mapped on a compatibility graph in which [...]

Correspondence to: R. Frühwirth, Nikolsdorfer Gasse 18, A-1050 Wien, Austria (permanent address).
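Frühwirth's feed-back network is of the Hopfield type: one neuron per track, inhibitory couplings between incompatible (data-sharing) tracks, and an excitatory bias given by the track's quality indicator, so that low-energy states correspond to high-quality compatible subsets. A schematic sketch under those assumptions; the energy form, penalty value, and deterministic update order are generic textbook choices, not the paper's exact parametrization:

```python
import numpy as np

def select_tracks(qi, incompatible, penalty=10.0, sweeps=50):
    """Hopfield-style selection of a compatible subset of tracks.

    qi           : quality indicator per track (higher = better)
    incompatible : (i, j) pairs of tracks that share data
    Minimizes E(s) = -sum_i qi[i] s[i] + penalty * sum_(i,j) s[i] s[j]
    over binary states s by asynchronous single-neuron updates.
    """
    n = len(qi)
    W = np.zeros((n, n))
    for i, j in incompatible:            # inhibitory couplings
        W[i, j] = W[j, i] = penalty
    s = np.zeros(n, dtype=int)           # start with nothing selected
    order = np.argsort(-np.asarray(qi))  # deterministic: best tracks first
    for _ in range(sweeps):
        changed = False
        for i in order:
            # Turning track i on lowers the energy iff its QI beats the
            # total penalty from currently selected incompatible tracks.
            new = int(qi[i] - W[i] @ s > 0)
            if new != s[i]:
                s[i], changed = new, True
        if not changed:                  # local minimum reached
            break
    return np.flatnonzero(s)

# Tracks 0 and 1 share hits, so only the better of the two survives.
print(select_tracks(np.array([3.0, 2.0, 1.0]), [(0, 1)]))  # → [0 2]
```

Because the couplings are symmetric and every accepted flip lowers the energy, the asynchronous updates converge to a local minimum; how good such minima are compared with a standard selection algorithm is exactly what Table 2 quantifies.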
Test of agreement between two multidimensional empirical distributions

Lluis Garrido(1,2), Vicens Gaitan(2), Miquel Serra-Ricart(3)

(1) Departament d'Estructura i Constituents de la Materia, Universitat de Barcelona,
    Diagonal 647, E-08028 Barcelona, Spain
(2) Institut de Fisica d'Altes Energies, Universitat Autonoma de Barcelona,
    E-08193 Bellaterra (Barcelona), Spain
(3) Instituto de Astrofisica de Canarias,
    E-38200 La Laguna (Tenerife), Spain

January 20, 1994

Abstract

We present a method to test the agreement between two multidimensional empirical distributions. The method is not restricted to working with projections in fewer dimensions due to the lack of data and, importantly, it is free of binning. It can be successfully implemented on layered neural nets, and it gives a lower bound on any estimator that measures the inconsistency between the two distributions.

(Submitted to Computer Physics Communications)

1 Introduction

An everyday task in all areas of science is the comparison of theoretical models with experimental data. In some cases the theoretical models cannot be checked directly because their explicit analytical form does not exist (e.g. the models are the result of a large number of calculations with complex algorithms) and/or there are additional effects not included in the model (e.g. detector effects during the acquisition of the experimental data). A commonly used strategy to overcome this problem is to generate Monte Carlo events according to the theoretical model and to fold them with the additional effects. The result is a simulated data sample (MC) that can be checked against the experimental data.
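The core idea above can be paraphrased as a classifier two-sample test: train a learner to separate sample A from sample B; if the two empirical distributions agree, no classifier can beat chance, so any excess over 50% accuracy signals an inconsistency. A toy sketch with plain logistic regression standing in for the layered neural net; the Gaussian samples, step size, and epoch count are illustrative assumptions:

```python
import numpy as np

def two_sample_test(x_a, x_b, epochs=200, lr=0.1):
    """Fit a logistic model to separate sample A (label 0) from sample B
    (label 1) and return the training accuracy.  Values near 0.5 mean the
    two empirical distributions are indistinguishable to this model;
    values well above 0.5 signal a discrepancy."""
    x = np.vstack([x_a, x_b])
    y = np.r_[np.zeros(len(x_a)), np.ones(len(x_b))]
    x = np.hstack([x, np.ones((len(x), 1))])   # append a bias column
    w = np.zeros(x.shape[1])
    for _ in range(epochs):                    # plain full-batch gradient descent
        p = 1.0 / (1.0 + np.exp(-x @ w))       # sigmoid output in (0, 1)
        w -= lr * x.T @ (p - y) / len(y)
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return np.mean((p > 0.5) == (y == 1))

rng = np.random.default_rng(0)
same = two_sample_test(rng.normal(0, 1, (2000, 3)), rng.normal(0, 1, (2000, 3)))
shifted = two_sample_test(rng.normal(0, 1, (2000, 3)), rng.normal(1, 1, (2000, 3)))
print(same, shifted)   # near 0.5 vs. clearly above 0.5
```

Loosely echoing the abstract: the trained model's output plays the role of a binning-free estimate of which sample a point came from, and the separation it achieves bounds the detectable inconsistency between the two distributions.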
[Handwritten transparency, largely illegible in the scan. Legible fragments: "How to estimate ...", "(unknown ...)", "... neural network, of course".]