SOLAR @ 2003 1
Janusz Starzyk and Yongtao Guo
School of Electrical Engineering and Computer Science
Ohio University, Athens, OH 45701, U.S.A.
September 2003
OUTLINE
1. Introduction
2. SOLAR Principle
3. Simulation Results
4. HW/SW Co-Simulation
5. Hardware Organization
6. Conclusion
Self-Organizing Learning Array (SOLAR)
• New learning algorithm
– Multilayer structure and on-line learning
– Local and sparse interconnections
– Entropy-based self-organized learning
• Superior performance
– Parallel computing organization
– Low power dissipation
– Efficient communication
– High chip utilization rate
• Potential to be a leading technology in machine learning
– Paves the way to machine intelligence application areas including pattern recognition, intelligent control, signal processing, robotics, and biological research
DARPA: Cognitive Information Processing Technology
Wanted: a machine that
• Can reason, using substantial amounts of knowledge
• Can learn from its experiences so that its performance improves with knowledge and experience
• Can explain itself and can accept direction
• Is aware of its own behavior and reflects on its own capabilities
• Responds in a robust manner to a surprise
Self-Organizing Learning ARray (SOLAR)
Dowling, 1998, p. 17
Self-organizing Principle
Information index:

$$I = 1 - \frac{E_s}{E_{\max}}$$

$$E_s = -\sum_{s}\left[\sum_{c} P_{sc}\log(P_{sc}) - P_s\log(P_s)\right]$$

$$E_{\max} = -\sum_{c} P_c\log(P_c)$$

Here, $P_c$, $P_s$, and $P_{sc}$ represent the class probability, the attribute probability, and the joint probability, respectively.
Neuron self-organization includes:
• Selection of inputs
• Choosing transformation function
• Setting threshold
• Providing output probabilities
• Setting output control
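The entropy-based information index on this slide can be illustrated with a minimal Python sketch. It assumes that the attribute value s is the side of a neuron's threshold on which a training sample falls, and that all probabilities are estimated as relative frequencies; the function name `information_index` and the zero-entropy convention are illustrative assumptions, not part of the original material.

```python
import math
from collections import Counter


def information_index(classes, sides):
    """Estimate I = 1 - E_s / E_max from paired class labels and threshold sides.

    classes[i] is the class of sample i; sides[i] is the (assumed) attribute
    value s, e.g. whether the sample lies below or above a neuron's threshold.
    """
    n = len(classes)
    if n == 0 or n != len(sides):
        raise ValueError("classes and sides must be non-empty and equally long")

    def plogp(labels):
        # Sum of P * log(P) over the relative frequencies of the labels.
        return sum((k / n) * math.log(k / n) for k in Counter(labels).values())

    e_max = -plogp(classes)                       # E_max = -sum_c P_c log P_c
    if e_max == 0.0:
        return 0.0                                # single class: convention chosen here
    e_s = -(plogp(zip(sides, classes)) - plogp(sides))
    return 1.0 - e_s / e_max


if __name__ == "__main__":
    # A threshold that separates the two classes perfectly gives I close to 1.
    print(information_index([0, 0, 1, 1], [0, 0, 1, 1]))
    # A threshold that is independent of the class gives I close to 0.
    print(information_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

With this formulation, I behaves like a normalized mutual information between class and threshold side, which is why a neuron would prefer inputs and thresholds with a high index.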
Self-Organizing Process and Neuron Structure
Self-Organizing Process: Matlab Simulation
[Figure panels: initial interconnection; learning process]
Synthetic Data Classification
Credit Card Data Set
Method Error Rate
Cal5 0.131
SOLAR 0.133
Itrule 0.137
Discrim 0.141
Logdisc 0.141
DIPOL92 0.141
CART 0.145
RBF 0.145
CASTLE 0.148
NaiveBay 0.151
Backprop 0.154
C4.5 0.155
SMART 0.158
Baytree 0.171
k-NN 0.181
NewID 0.181
LVQ 0.197
ALLOC80 0.201
Quadisc 0.207
Default 0.440
Kohonen Failed
SOLAR self organizing structure
SW/HW Codesign of SOLAR
[Diagram labels: JTAG programming; software run in PC; PCI bus; hardware board; Virtex XCV800 FPGA dynamic configuration]
Cosimulation - What and Why?
• Cosimulation
– Simulation of heterogeneous systems whose hardware and software components are interacting
• Benefits of cosimulation
– Verifying correct functionality of the target even before hardware is built
– Profiling the dynamic behavior
– Identifying the performance bottleneck
– Preventing problems such as over-design or under-design related to system integration
– Saving system development cost and cycle time
Traditional Cosimulation Environment
– A software process
• Written in a high-level language, such as C/C++
– A simulation process of the hardware model
• Hardware description language, such as VHDL
– Inter-process communication (IPC) routine
• Connects the hardware process and the software process
[Diagram: software model (C program) with IPC routines, linked by IPC to the hardware model (VHDL) with foreign IPC procedures; two simulators]
Traditional Cosimulation
To perform cosimulation, two simulators must be combined and complex IPC routines must be developed. These routines are error-prone because they have to handle various formats of data and processed signals.
Especially when focusing on the hardware part, we want the software part to be minimal and the HW/SW communication to be simple and reliable.
SOLAR Cosimulation
– A software process
• Written in behavioral VHDL, which is not synthesizable
– A hardware process
• Written in RTL VHDL, which is synthesizable
– HW/SW communication
• FSM and FIFOs
[Diagram: software model (behavioral VHDL) linked through FSM and FIFOs to the hardware model (RTL VHDL); one simulator]
SOLAR Cosimulation
To perform SOLAR cosimulation, a single VHDL simulator is used, so complex, error-prone IPC is avoided. Data formats and other problems can be easily handled.
The HW/SW interface is implemented by several FIFOs controlled by an FSM, which is simple, reliable, and easily modified.
File I/O functions are used to simplify the design of the software part when focusing on the hardware implementation.
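The idea of an HW/SW interface built from FIFOs under the control of an FSM can be sketched behaviorally in Python. This is only an illustration under stated assumptions: the state names, the `Interface` class, and the `process` callback standing in for the hardware are all hypothetical, and the FIFO depth of 1024 is borrowed from the 1024x32 FIFO shown on the hardware architecture slide. The original design is written in VHDL, not Python.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    IDLE = auto()     # waiting for a start command from the software side
    LOAD = auto()     # moving words from the input FIFO into hardware memory
    RUN = auto()      # hardware processes the loaded words
    UNLOAD = auto()   # moving results into the output FIFO


class Interface:
    """Hypothetical FSM that shuttles data between software and hardware."""

    def __init__(self, process, depth=1024):
        self.process = process          # stand-in for the hardware function
        self.depth = depth
        self.in_fifo = deque()
        self.out_fifo = deque()
        self.memory = []
        self.state = State.IDLE

    def write(self, word):
        """Software side: push one word into the input FIFO."""
        if len(self.in_fifo) >= self.depth:
            raise OverflowError("input FIFO full")
        self.in_fifo.append(word)

    def read(self):
        """Software side: pop one result, or None if the output FIFO is empty."""
        return self.out_fifo.popleft() if self.out_fifo else None

    def step(self, start=False):
        """Advance the FSM by one clock cycle."""
        if self.state is State.IDLE:
            if start:
                self.state = State.LOAD
        elif self.state is State.LOAD:
            if self.in_fifo:
                self.memory.append(self.in_fifo.popleft())
            else:
                self.state = State.RUN
        elif self.state is State.RUN:
            self.memory = [self.process(w) for w in self.memory]
            self.state = State.UNLOAD
        elif self.state is State.UNLOAD:
            if self.memory:
                self.out_fifo.append(self.memory.pop(0))
            else:
                self.state = State.IDLE


if __name__ == "__main__":
    iface = Interface(process=lambda w: 2 * w)
    for word in (1, 2, 3):
        iface.write(word)
    iface.step(start=True)
    while iface.state is not State.IDLE:
        iface.step()
    print([iface.read() for _ in range(3)])
```

Because both sides only ever touch the FIFOs, the software model and the hardware model can be changed independently, which is the property the slide attributes to this style of interface.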
Co-simulation System Decomposition
[Diagram with three parts:
– System architecture modelling (behavioral VHDL): flowchart with blocks Main, Initialization, File I/O, SOLAR Training, and an "Over" decision (No/Yes)
– Interface modeling (RTL VHDL): input FIFO, output FIFO, FSM, interface control
– Self-organizing learning architecture (structural VHDL): OP, EBE, REG, FIFO, MEM]
SW Organization VHDL Model
All functions and signal variables in the packages are shared, and program execution is functionally interleaved.
• The lower-level package describes system input and output, and the initialization and update of the memory elements in the network.
• The higher-level packages encapsulate new system functions based on the functions described by the lower-level packages.
• The highest design-level function, which represents the software part of the overall system, implements the system organization and management.
Single Neuron’s Hardware Architecture
Figure 4: Single neuron’s learning architecture
[Block labels: DREG CTRL, registers (R), FIFO/DMA CTRL, MAIN CONTROLLER, OP, 1024x32 FIFO, INTERFACE, M, ALU]
Interface Process
[Timing diagram of SW and HW activity along the time axis. Recoverable labels: configuration, send data, receive data, conf done, start, wait command, send command, over, read registers, dma request]
Interface Modeling
Figure 5: Interface modeling using FSM & FIFO
[Diagram labels: software (behavioral VHDL); interface with FIFOs, memory module, Ctrl, and nodes numbered 1 to 6; hardware (structural VHDL); class, other, training, Others]
Interface Simulation
Small Training Data Set
System Synchronized Work
[Figure labels: Software Work, Hardware Work, Interface Work; axis: Time]
Incremental Prototyping
Overall system design can be accelerated by replacing each HW subcomponent with real hardware once it has been successfully simulated.
[Figure labels: axis t; HW function; VHDL-simulated (incremental part); HW function is completely defined and prototyped]
EBE Simulation
Main procedures:
1. Sending data from software to chip memory
2. Triggering the start signal
3. ALU calculation for all data
4. Moving calculated results to intermediate memory
5. Threshold scanning & ID calculation
6. Updating the intermediate values
7. Data movement if the current ID is optimal
8. Repeating from 3 to 6 until all functions are scanned
9. Sending data from chip to software
In this simulation waveform, the signals “Opt_Threshold” and “ID” represent the optimal threshold and the corresponding information index deficiency for this particular training neuron in its learning subspace.
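The threshold scanning and ID calculation steps can be sketched in Python as follows. This is a software illustration only, under two assumptions that the slides do not state: that the information index deficiency ID equals E_s / E_max (that is, 1 minus the information index I), and that candidate thresholds are the distinct data values. The function names are hypothetical.

```python
import math
from collections import Counter


def _plogp(labels, n):
    # Sum of P * log(P) over relative frequencies of the labels.
    return sum((k / n) * math.log(k / n) for k in Counter(labels).values())


def information_deficiency(values, classes, threshold):
    """Assumed definition: ID = E_s / E_max for the split `value >= threshold`."""
    n = len(values)
    sides = [v >= threshold for v in values]
    e_max = -_plogp(classes, n)
    if e_max == 0.0:
        return 0.0                      # single class: no uncertainty (convention)
    e_s = -(_plogp(zip(sides, classes), n) - _plogp(sides, n))
    return e_s / e_max


def scan_thresholds(values, classes):
    """Try every distinct value as a threshold and keep the one with the lowest ID."""
    best_threshold, best_id = None, float("inf")
    for candidate in sorted(set(values)):
        current = information_deficiency(values, classes, candidate)
        if current < best_id:           # data movement only when the ID improves
            best_threshold, best_id = candidate, current
    return best_threshold, best_id


if __name__ == "__main__":
    opt_threshold, opt_id = scan_thresholds([1, 2, 3, 4], [0, 0, 1, 1])
    print(opt_threshold, opt_id)
```

The pair returned by `scan_thresholds` plays the role of the “Opt_Threshold” and “ID” signals for one transformation function; repeating the scan per function and keeping the overall minimum would mirror the loop in the procedure list.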
EBE Prototyping
[Figure label: SOLAR Training]
Map onto Virtex (57.8% logic, 60.3% route)
Minimum period: 23.140ns (Maximum Frequency: 43.215MHz)
Minimum input arrival time before clock: 11.036ns
Maximum output required time after clock: 13.758ns
Run Time
Main Operations                CLK Number per Data
PC data to in-chip memory      38
ALU calculation                16
Threshold scanning             4
ID calculation                 7
Memory data movement           1
In-chip FIFO to PC             1
ALU calculation, threshold scanning, ID calculation, and memory data movement are repeated for each of the 7 functions (x7).
For instance, suppose a particular neuron has 1024 subspace data points.
PC to chip: 38x1024 = 38912 CLKs
ALU calculation: 16x1024 = 16384 CLKs
Threshold scan & ID calculation (maximum): (4x1024+7)x1024 = 4201472 CLKs
Data movement (maximum): 1x1024 = 1024 CLKs
Chip to PC: 1x1024 = 1024 CLKs
Other (starting sequence, wait, handshaking, etc.): 20x1024 = 20480 CLKs
Total: 38912+(16384+4201472+1024)x7+1024+20480 = 29592576 CLKs
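The clock count above can be reproduced with a short Python sketch. The per-data clock numbers and the factor of 7 functions come from the run-time figures; converting clocks to seconds assumes the design runs at the minimum period of 23.140 ns reported on the EBE Prototyping slide, which is an assumption about the operating clock rather than a stated result. The function name `total_clocks` is hypothetical.

```python
def total_clocks(n_data=1024, n_functions=7):
    """Worst-case clock count for training one neuron on n_data subspace points."""
    pc_to_chip = 38 * n_data
    alu = 16 * n_data
    scan_and_id = (4 * n_data + 7) * n_data   # threshold scan plus ID calculation
    data_move = 1 * n_data
    chip_to_pc = 1 * n_data
    other = 20 * n_data                        # start sequence, wait, handshaking
    per_function = alu + scan_and_id + data_move
    return pc_to_chip + per_function * n_functions + chip_to_pc + other


MIN_PERIOD_NS = 23.140  # assumed clock period, taken from the prototyping report

clocks = total_clocks()
seconds = clocks * MIN_PERIOD_NS * 1e-9

if __name__ == "__main__":
    print(clocks)             # 29592576
    print(round(seconds, 3))  # about 0.685 s under the assumed clock period
```

The threshold scan term grows quadratically with the number of data points, so it dominates the total for any realistic subspace size.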
Prototyping Board
Future Work - System SOLAR
SOLAR will grow
Chip (Virtex XCV1000): 1 million gates
Board (6 chips, 2x3): 6 million gates
Rack (4 boards, 1x4): 24 million gates
System (16 cabinets, 4x4): half of a billion gates
Questions