8

Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S ... · Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S Lakshmi, 2014 embedded memories, application specific circuits,

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S ... · Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S Lakshmi, 2014 embedded memories, application specific circuits,

385

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

DESIGN AND PERFORMANCE OF ON-CHIPMEMORY ARCHITECTURE EXPLORATION OF

EMBEDDED SYSTEM ANALYSIS ON CHIPG Ravi Kumar1 and S Lakshmi2

Corresponding Author G Ravi Kumar ravikumarg034gmailcom

Todayrsquos feature-rich multimedia products require embedded system solution with complexSystem-on-Chip (SoC) to meet market expectations of high performance at low cost and lowerenergy consumption SoCs are complex designs with multiple embedded processors memorysubsystems and application specific peripherals The memory architecture of embedded SoCsstrongly influences the area power and performance of the entire system Further the memorysubsystem constitutes a major part (typically up to 70) of the silicon area for the current daySoC The performance of a memory architecture also depends on how the data variables of theapplication are placed in the memory There are multiple data layouts for each memoryarchitecture that are efficient from a power and performance viewpoint Due to its large impacton system performance parameters the memory architecture is often hand-crafted byexperienced designers exploring a very small subset of this design space The vast memorydesign space prohibits any possibility for a manual analysis In this work we propose anautomated framework for on-chip memory architecture exploration Our proposed frameworkintegrates memory architecture exploration and data layout to search the design space efficientlyWhile the memory exploration selects specific memory architectures the data layout efficientlymaps the given application on to the memory architecture under consideration and thus helps inevaluating the memory architecture The proposed memory exploration framework works atboth logical and physical memory architecture level Our work addresses on-chip memoryarchitecture for DSP processors that is organized as multiple memory banks with each backcan be a singledual port banks and with non-uniform bank sizes Further our work also addressmemory architecture exploration for on-chip memory architectures that is SPRAM and cachebased Our proposed method is based on multi-objective Genetic Algorithm based and outputsseveral hundred Pareto-optimal design solutions that are interesting from a area power andperformance viewpoints within a few hours of running on a standard desktop configuration

Keywords SoC On-Chip Memory Organizer DSP Processor and Multiple memory banks

INTRODUCTIONTodayrsquos Embedded Systems and VLSI

Research Paper

1 MTech PhD Scholar Department of Electronics amp Communication Engineering Sathyabama University Jeppiaar Nagar RajivGandhi Road Chennai Tamil Nadu India

2 MTech PhD Professor Dept Of ECE Sathyabama University Jeppiaar Nagar Rajiv Gandhi Road Chennai Tamil Nadu India

technology allows us to integrate tens ofprocessor cores on the same chip along with

Int J Engg Res amp Sci amp Tech 2014

ISSN 2319-5991 wwwijerstcomVol 3 No 2 May 2014

copy 2014 IJERST All Rights Reserved

386

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

embedded memories application specificcircuits and interconnect infrastructure As aresult it is possible to integrate an entiresystem onto a single chip The single chipphone which has been introduced by severalsemiconductor vendors is an example of sucha system-on-chip it includes the modem radiotransceiver power management functionalitya multimedia engine and security features allon the same chip An embedded system is anapplication-specific system which is optimizedto perform a single function or a small set offunctions We distinguish this from a general-purpose system which is software-programmable to perform multiple functionsA personal computer is an ex- ample of ageneral-purpose system depending on thesoftware we run on the computer it can beuseful for playing games word processingdatabase operations scientific computationetc On the other hand a digital camera is anexample of an embedded system which canperform a limited set of functions such astaking pictures organizing them or transferringthem to another device through a suitable IOinterface Other examples of embeddedsystems include mobile phones audiovideoplayers videogame consoles set- top boxescar infotainment systems personal digitalassistants telephone central-office switchesdedicated network routers and bridges Notethat a large number of embedded systems arebuilt for the consumer market As a result inorder to be competitive the cost of anembedded system cannot be very high Yetthe consumers demand higher performanceand more features from the embeddedsystems products It is easy to appreciate thispoint if we compare the performance and

feature set offered by mobile phones that costRs 5000-(or 100$) today and which cost thesame a few years ago We also see that alarge number of embedded systems are beingbuilt for the mobile market This trend is notsurprising - the number of mobile phonesubscribers increased from 500 Million in year2000 to 26 Billion in 2007 Because of suchhigh volumes embedded systems areextremely cost sensitive and their designdemands careful silicon-area optimizationSince mobile devices use batteries as themain source of power embedded systemsmust also be optimized for energy dissipationPower which represents the rate at whichenergy is consumed must also be kept low toavoid heating and improving reliability Insummary the designer of an embeddedsystem must simultaneously consider andoptimize price performance energy andpower dissipation Application specificembedded systems designed today demandinnovative methods to optimize these systemcost functions

MEMORY SUBSYSTEMOn-chip Memory OrganizationThe memory architecture of an embeddedprocessor core is complex and is custom de-signed to improve run-time performance andpower consumption In this section wedescribe only on the memory architecture ofthe DSP processor as this is the focus of thethesis This is because the memoryarchitecture of the DSP is more complex thanthat of microcontrollers (MCU) due to thefollowing reasons

(a) DSP applications are more datadominated than the control-dominated

387

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

software executed on an MCU Memorybandwidth requirements for DSPapplications range from 2 to 3 memoryaccesses per processor clock cycle For anMCU this figure is at best one memoryaccess per cycle

(b) It is critical in DSP application to extractmaximum performance from the memorysubsystem in order to meet the real-timeconstraints of the embedded application Asa consequence the DSP software forcritical kernels is developed mostly as handoptimized assembly code In contrast thesoftware for MCU is typically developed inhigh-level languages The memoryarchitecture for a DSP is unique since theDSP has multiple on- chip buses andmultiple address generation units to servicehigher bandwidth needs

DATA LAYOUTTo efficiently use the on-chip memory criticaldata variables of the application need to beidentified and mapped to the on-chip RAMThe memory architecture may contain both on-chip cache and SPRAM In such a case it isimportant to partition the data section andassign them appropriately to on-chip cacheand SPRAM such that memory performanceof the application is optimized Further amongthe data sections assigned to on-chip cacheand SPRAM a proper placement of the datasections on the cache and SPRAM is requiredto ensure that the cache misses are reducedand the multiple memory banks of the SPRAMand the dual ported SPRAMs are efficientlyutilized Identifying such a data placement fordata sections referred to as the data layoutproblem is complex and critical step This task

Figure 1 Architecture of an Embedded SoC

388

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

is typically performed manually as the compilercannot assume that the code undercompilation represents the entire system

Memory Architecture Exploration

In modern embedded systems the area andpower consumed by the memory subsystemis up to 10 times that of the data path makingmemory a critical component of the designFurther the memory subsystem constitutes alarge part (typically up to 70) of the siliconarea for the current day SoC and it is expectedto go up to 94 in 2014 as shown in the Figure2 The main reason for this is that embeddedmemory has a relatively small subsystem per-area design cost in terms of both man-powertime-to- market and power consumptionHence the memory plays an important role inthe design of embedded SoCs Further thememory architecture strongly influences thecost performance and power dissipation ofan embedded SoC

As discussed earlier the on-chip memoryorganization of embedded processors varieswidely from one SoC to another depending

on the application and market segment forwhich the SoC is deployed There is a widevariety of choices available for the embeddeddesigners starting from simple on-chipSPRAM based architecture to more complexcache-SPRAM based hybrid architecture Tobegin with the system designer needs todecide if the SoC requires cache and what isthe right size of on-chip RAM Once the highlevel memory organization is decided the finerparameters need to be defined to completethe memory architecture definition For the on-chip SPRAM based architecture theparameters namely size latency number ofmemory banks number of readwrite ports permemory bank and connectivity collectivelydefine the memory organization and stronglyinfluence the performance cost and powerconsumption For cache based on-chip RAMthe finer parameters are the size of cacheassociativity line size miss latency and writepolicy Due to its large impact on systemperformance parameters the memoryarchitecture is often hand-crafted by the

Figure 2 Memory Trends in SoC

389

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

designer based on the targeted applicationsHowever with the combination of on-chipSPRAM and cache the memory design spaceis too large for a manual analysis Also withthe projected growth in the complexity ofembedded systems and the vast design spacein memory architecture hand optimization ofthe memory architecture will soon becomeimpossible This warrants an automated frame-work which can explore the memoryarchitecture design space and identifyinteresting design points that are optimal froma performance power consumption and VLSIarea (and hence cost) perspective As thememory architecture design space itself isvast a brute force design space explorationtool may take large computation time andhence is unlikely to be useful in meeting thetight time-to-market constraint Further foreach given memory architecture there areseveral possible data section layouts which

are opti- mal in terms of performance andpower This further compounds the memoryarchitecture exploration problem

DSP ON-CHIP SPRAMARCHITECTUREDSP processor based embedded systemshave an on-chip memory which typically has asingle cycle access time The on-chip memoryalso referred to as scratch pad memory ismapped into an address space disjoint fromthe off-chip memory but connected to the sameaddress and data buses Typically the scratch-pad memory is organized into multiple memorybanks to facilitate multiple simultaneous dataaccesses DSP Processors typically have 2or more address generation units and multipleon-chip buses to facilitate multiple memoryaccesses

Further each on-chip memory bank can beorganized either as a single-access RAM

Figure 3 On-chip Memory Architecture of Embedded Processors

390

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

(SARAM) or as a dual-access RAM(DARAM) to provide single or dual accessesto the same memory bank in a single cycleFor example Texas Instruments TMS320C54Xdigital signal processor has two data readbuses and one data write bus TexasInstruments TMS320C55X processor hasthree data read busses and two data writebusses since concurrent access to the samearray are common in DSP applications Figure4 presents memory map of C55X DSPswhere multiple memory banks of SARAM andDARAM memory banks form a part of memorymap and MMR represents memory mappedregisters which typically contain controlregisters status registers and stack pointersThe DARAM and SARAM regions can berecognized using multiple memory bank toenable two concurrent accesses

BIBLIOGRAPHY1 ARM920T and ARM922T ARM9 Family

of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

2 ARM926EJ-S and ARM926E-S ARM9E

Family of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

3 lp solve httplpsolvesourceforgenet55

4 SystemC ndash Language for System-LevelModeling Design and Verification httpwwwsystemcorghome

5 Verilog Hardware DescriptionLanguage httpwwwverilogcomindexhtml

6 International Technology Roadmap forSemiconductors SEMATECH 3101Indus- trial Terrace Suite 106 Austin TX78758 2001

7 2007 global mobile communications -statistics trends and forecastsTechnical report httpwwwreportbuyercomtelecomsmobile2007 globalmobile trendshtml 2007

8 1st IEEE Inter Symposium on IndustrialEmbedded Systems PanelDiscussion Open Issues in SoCDesign httpwwwiestcfaorgpaneldiscussionshtm 2006

Figure 4 Example DSP Memory Map

Page 2: Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S ... · Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S Lakshmi, 2014 embedded memories, application specific circuits,

386

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

embedded memories application specificcircuits and interconnect infrastructure As aresult it is possible to integrate an entiresystem onto a single chip The single chipphone which has been introduced by severalsemiconductor vendors is an example of sucha system-on-chip it includes the modem radiotransceiver power management functionalitya multimedia engine and security features allon the same chip An embedded system is anapplication-specific system which is optimizedto perform a single function or a small set offunctions We distinguish this from a general-purpose system which is software-programmable to perform multiple functionsA personal computer is an ex- ample of ageneral-purpose system depending on thesoftware we run on the computer it can beuseful for playing games word processingdatabase operations scientific computationetc On the other hand a digital camera is anexample of an embedded system which canperform a limited set of functions such astaking pictures organizing them or transferringthem to another device through a suitable IOinterface Other examples of embeddedsystems include mobile phones audiovideoplayers videogame consoles set- top boxescar infotainment systems personal digitalassistants telephone central-office switchesdedicated network routers and bridges Notethat a large number of embedded systems arebuilt for the consumer market As a result inorder to be competitive the cost of anembedded system cannot be very high Yetthe consumers demand higher performanceand more features from the embeddedsystems products It is easy to appreciate thispoint if we compare the performance and

feature set offered by mobile phones that costRs 5000-(or 100$) today and which cost thesame a few years ago We also see that alarge number of embedded systems are beingbuilt for the mobile market This trend is notsurprising - the number of mobile phonesubscribers increased from 500 Million in year2000 to 26 Billion in 2007 Because of suchhigh volumes embedded systems areextremely cost sensitive and their designdemands careful silicon-area optimizationSince mobile devices use batteries as themain source of power embedded systemsmust also be optimized for energy dissipationPower which represents the rate at whichenergy is consumed must also be kept low toavoid heating and improving reliability Insummary the designer of an embeddedsystem must simultaneously consider andoptimize price performance energy andpower dissipation Application specificembedded systems designed today demandinnovative methods to optimize these systemcost functions

MEMORY SUBSYSTEMOn-chip Memory OrganizationThe memory architecture of an embeddedprocessor core is complex and is custom de-signed to improve run-time performance andpower consumption In this section wedescribe only on the memory architecture ofthe DSP processor as this is the focus of thethesis This is because the memoryarchitecture of the DSP is more complex thanthat of microcontrollers (MCU) due to thefollowing reasons

(a) DSP applications are more datadominated than the control-dominated

387

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

software executed on an MCU Memorybandwidth requirements for DSPapplications range from 2 to 3 memoryaccesses per processor clock cycle For anMCU this figure is at best one memoryaccess per cycle

(b) It is critical in DSP application to extractmaximum performance from the memorysubsystem in order to meet the real-timeconstraints of the embedded application Asa consequence the DSP software forcritical kernels is developed mostly as handoptimized assembly code In contrast thesoftware for MCU is typically developed inhigh-level languages The memoryarchitecture for a DSP is unique since theDSP has multiple on- chip buses andmultiple address generation units to servicehigher bandwidth needs

DATA LAYOUTTo efficiently use the on-chip memory criticaldata variables of the application need to beidentified and mapped to the on-chip RAMThe memory architecture may contain both on-chip cache and SPRAM In such a case it isimportant to partition the data section andassign them appropriately to on-chip cacheand SPRAM such that memory performanceof the application is optimized Further amongthe data sections assigned to on-chip cacheand SPRAM a proper placement of the datasections on the cache and SPRAM is requiredto ensure that the cache misses are reducedand the multiple memory banks of the SPRAMand the dual ported SPRAMs are efficientlyutilized Identifying such a data placement fordata sections referred to as the data layoutproblem is complex and critical step This task

Figure 1 Architecture of an Embedded SoC

388

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

is typically performed manually as the compilercannot assume that the code undercompilation represents the entire system

Memory Architecture Exploration

In modern embedded systems the area andpower consumed by the memory subsystemis up to 10 times that of the data path makingmemory a critical component of the designFurther the memory subsystem constitutes alarge part (typically up to 70) of the siliconarea for the current day SoC and it is expectedto go up to 94 in 2014 as shown in the Figure2 The main reason for this is that embeddedmemory has a relatively small subsystem per-area design cost in terms of both man-powertime-to- market and power consumptionHence the memory plays an important role inthe design of embedded SoCs Further thememory architecture strongly influences thecost performance and power dissipation ofan embedded SoC

As discussed earlier the on-chip memoryorganization of embedded processors varieswidely from one SoC to another depending

on the application and market segment forwhich the SoC is deployed There is a widevariety of choices available for the embeddeddesigners starting from simple on-chipSPRAM based architecture to more complexcache-SPRAM based hybrid architecture Tobegin with the system designer needs todecide if the SoC requires cache and what isthe right size of on-chip RAM Once the highlevel memory organization is decided the finerparameters need to be defined to completethe memory architecture definition For the on-chip SPRAM based architecture theparameters namely size latency number ofmemory banks number of readwrite ports permemory bank and connectivity collectivelydefine the memory organization and stronglyinfluence the performance cost and powerconsumption For cache based on-chip RAMthe finer parameters are the size of cacheassociativity line size miss latency and writepolicy Due to its large impact on systemperformance parameters the memoryarchitecture is often hand-crafted by the

Figure 2 Memory Trends in SoC

389

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

designer based on the targeted applicationsHowever with the combination of on-chipSPRAM and cache the memory design spaceis too large for a manual analysis Also withthe projected growth in the complexity ofembedded systems and the vast design spacein memory architecture hand optimization ofthe memory architecture will soon becomeimpossible This warrants an automated frame-work which can explore the memoryarchitecture design space and identifyinteresting design points that are optimal froma performance power consumption and VLSIarea (and hence cost) perspective As thememory architecture design space itself isvast a brute force design space explorationtool may take large computation time andhence is unlikely to be useful in meeting thetight time-to-market constraint Further foreach given memory architecture there areseveral possible data section layouts which

are opti- mal in terms of performance andpower This further compounds the memoryarchitecture exploration problem

DSP ON-CHIP SPRAMARCHITECTUREDSP processor based embedded systemshave an on-chip memory which typically has asingle cycle access time The on-chip memoryalso referred to as scratch pad memory ismapped into an address space disjoint fromthe off-chip memory but connected to the sameaddress and data buses Typically the scratch-pad memory is organized into multiple memorybanks to facilitate multiple simultaneous dataaccesses DSP Processors typically have 2or more address generation units and multipleon-chip buses to facilitate multiple memoryaccesses

Further each on-chip memory bank can beorganized either as a single-access RAM

Figure 3 On-chip Memory Architecture of Embedded Processors

390

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

(SARAM) or as a dual-access RAM(DARAM) to provide single or dual accessesto the same memory bank in a single cycleFor example Texas Instruments TMS320C54Xdigital signal processor has two data readbuses and one data write bus TexasInstruments TMS320C55X processor hasthree data read busses and two data writebusses since concurrent access to the samearray are common in DSP applications Figure4 presents memory map of C55X DSPswhere multiple memory banks of SARAM andDARAM memory banks form a part of memorymap and MMR represents memory mappedregisters which typically contain controlregisters status registers and stack pointersThe DARAM and SARAM regions can berecognized using multiple memory bank toenable two concurrent accesses

BIBLIOGRAPHY1 ARM920T and ARM922T ARM9 Family

of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

2 ARM926EJ-S and ARM926E-S ARM9E

Family of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

3 lp solve httplpsolvesourceforgenet55

4 SystemC ndash Language for System-LevelModeling Design and Verification httpwwwsystemcorghome

5 Verilog Hardware DescriptionLanguage httpwwwverilogcomindexhtml

6 International Technology Roadmap forSemiconductors SEMATECH 3101Indus- trial Terrace Suite 106 Austin TX78758 2001

7 2007 global mobile communications -statistics trends and forecastsTechnical report httpwwwreportbuyercomtelecomsmobile2007 globalmobile trendshtml 2007

8 1st IEEE Inter Symposium on IndustrialEmbedded Systems PanelDiscussion Open Issues in SoCDesign httpwwwiestcfaorgpaneldiscussionshtm 2006

Figure 4 Example DSP Memory Map

Page 3: Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S ... · Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S Lakshmi, 2014 embedded memories, application specific circuits,

387

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

software executed on an MCU Memorybandwidth requirements for DSPapplications range from 2 to 3 memoryaccesses per processor clock cycle For anMCU this figure is at best one memoryaccess per cycle

(b) It is critical in DSP application to extractmaximum performance from the memorysubsystem in order to meet the real-timeconstraints of the embedded application Asa consequence the DSP software forcritical kernels is developed mostly as handoptimized assembly code In contrast thesoftware for MCU is typically developed inhigh-level languages The memoryarchitecture for a DSP is unique since theDSP has multiple on- chip buses andmultiple address generation units to servicehigher bandwidth needs

DATA LAYOUTTo efficiently use the on-chip memory criticaldata variables of the application need to beidentified and mapped to the on-chip RAMThe memory architecture may contain both on-chip cache and SPRAM In such a case it isimportant to partition the data section andassign them appropriately to on-chip cacheand SPRAM such that memory performanceof the application is optimized Further amongthe data sections assigned to on-chip cacheand SPRAM a proper placement of the datasections on the cache and SPRAM is requiredto ensure that the cache misses are reducedand the multiple memory banks of the SPRAMand the dual ported SPRAMs are efficientlyutilized Identifying such a data placement fordata sections referred to as the data layoutproblem is complex and critical step This task

Figure 1 Architecture of an Embedded SoC

388

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

is typically performed manually as the compilercannot assume that the code undercompilation represents the entire system

Memory Architecture Exploration

In modern embedded systems the area andpower consumed by the memory subsystemis up to 10 times that of the data path makingmemory a critical component of the designFurther the memory subsystem constitutes alarge part (typically up to 70) of the siliconarea for the current day SoC and it is expectedto go up to 94 in 2014 as shown in the Figure2 The main reason for this is that embeddedmemory has a relatively small subsystem per-area design cost in terms of both man-powertime-to- market and power consumptionHence the memory plays an important role inthe design of embedded SoCs Further thememory architecture strongly influences thecost performance and power dissipation ofan embedded SoC

As discussed earlier the on-chip memoryorganization of embedded processors varieswidely from one SoC to another depending

on the application and market segment forwhich the SoC is deployed There is a widevariety of choices available for the embeddeddesigners starting from simple on-chipSPRAM based architecture to more complexcache-SPRAM based hybrid architecture Tobegin with the system designer needs todecide if the SoC requires cache and what isthe right size of on-chip RAM Once the highlevel memory organization is decided the finerparameters need to be defined to completethe memory architecture definition For the on-chip SPRAM based architecture theparameters namely size latency number ofmemory banks number of readwrite ports permemory bank and connectivity collectivelydefine the memory organization and stronglyinfluence the performance cost and powerconsumption For cache based on-chip RAMthe finer parameters are the size of cacheassociativity line size miss latency and writepolicy Due to its large impact on systemperformance parameters the memoryarchitecture is often hand-crafted by the

Figure 2 Memory Trends in SoC

389

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

designer based on the targeted applicationsHowever with the combination of on-chipSPRAM and cache the memory design spaceis too large for a manual analysis Also withthe projected growth in the complexity ofembedded systems and the vast design spacein memory architecture hand optimization ofthe memory architecture will soon becomeimpossible This warrants an automated frame-work which can explore the memoryarchitecture design space and identifyinteresting design points that are optimal froma performance power consumption and VLSIarea (and hence cost) perspective As thememory architecture design space itself isvast a brute force design space explorationtool may take large computation time andhence is unlikely to be useful in meeting thetight time-to-market constraint Further foreach given memory architecture there areseveral possible data section layouts which

are opti- mal in terms of performance andpower This further compounds the memoryarchitecture exploration problem

DSP ON-CHIP SPRAMARCHITECTUREDSP processor based embedded systemshave an on-chip memory which typically has asingle cycle access time The on-chip memoryalso referred to as scratch pad memory ismapped into an address space disjoint fromthe off-chip memory but connected to the sameaddress and data buses Typically the scratch-pad memory is organized into multiple memorybanks to facilitate multiple simultaneous dataaccesses DSP Processors typically have 2or more address generation units and multipleon-chip buses to facilitate multiple memoryaccesses

Further each on-chip memory bank can beorganized either as a single-access RAM

Figure 3 On-chip Memory Architecture of Embedded Processors

390

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

(SARAM) or as a dual-access RAM(DARAM) to provide single or dual accessesto the same memory bank in a single cycleFor example Texas Instruments TMS320C54Xdigital signal processor has two data readbuses and one data write bus TexasInstruments TMS320C55X processor hasthree data read busses and two data writebusses since concurrent access to the samearray are common in DSP applications Figure4 presents memory map of C55X DSPswhere multiple memory banks of SARAM andDARAM memory banks form a part of memorymap and MMR represents memory mappedregisters which typically contain controlregisters status registers and stack pointersThe DARAM and SARAM regions can berecognized using multiple memory bank toenable two concurrent accesses

BIBLIOGRAPHY1 ARM920T and ARM922T ARM9 Family

of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

2 ARM926EJ-S and ARM926E-S ARM9E

Family of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

3 lp solve httplpsolvesourceforgenet55

4 SystemC ndash Language for System-LevelModeling Design and Verification httpwwwsystemcorghome

5 Verilog Hardware DescriptionLanguage httpwwwverilogcomindexhtml

6 International Technology Roadmap forSemiconductors SEMATECH 3101Indus- trial Terrace Suite 106 Austin TX78758 2001

7 2007 global mobile communications -statistics trends and forecastsTechnical report httpwwwreportbuyercomtelecomsmobile2007 globalmobile trendshtml 2007

8 1st IEEE Inter Symposium on IndustrialEmbedded Systems PanelDiscussion Open Issues in SoCDesign httpwwwiestcfaorgpaneldiscussionshtm 2006

Figure 4 Example DSP Memory Map

Page 4: Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S ... · Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S Lakshmi, 2014 embedded memories, application specific circuits,

388

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

is typically performed manually as the compilercannot assume that the code undercompilation represents the entire system

Memory Architecture Exploration

In modern embedded systems the area andpower consumed by the memory subsystemis up to 10 times that of the data path makingmemory a critical component of the designFurther the memory subsystem constitutes alarge part (typically up to 70) of the siliconarea for the current day SoC and it is expectedto go up to 94 in 2014 as shown in the Figure2 The main reason for this is that embeddedmemory has a relatively small subsystem per-area design cost in terms of both man-powertime-to- market and power consumptionHence the memory plays an important role inthe design of embedded SoCs Further thememory architecture strongly influences thecost performance and power dissipation ofan embedded SoC

As discussed earlier the on-chip memoryorganization of embedded processors varieswidely from one SoC to another depending

on the application and market segment forwhich the SoC is deployed There is a widevariety of choices available for the embeddeddesigners starting from simple on-chipSPRAM based architecture to more complexcache-SPRAM based hybrid architecture Tobegin with the system designer needs todecide if the SoC requires cache and what isthe right size of on-chip RAM Once the highlevel memory organization is decided the finerparameters need to be defined to completethe memory architecture definition For the on-chip SPRAM based architecture theparameters namely size latency number ofmemory banks number of readwrite ports permemory bank and connectivity collectivelydefine the memory organization and stronglyinfluence the performance cost and powerconsumption For cache based on-chip RAMthe finer parameters are the size of cacheassociativity line size miss latency and writepolicy Due to its large impact on systemperformance parameters the memoryarchitecture is often hand-crafted by the

Figure 2 Memory Trends in SoC

389

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

designer based on the targeted applicationsHowever with the combination of on-chipSPRAM and cache the memory design spaceis too large for a manual analysis Also withthe projected growth in the complexity ofembedded systems and the vast design spacein memory architecture hand optimization ofthe memory architecture will soon becomeimpossible This warrants an automated frame-work which can explore the memoryarchitecture design space and identifyinteresting design points that are optimal froma performance power consumption and VLSIarea (and hence cost) perspective As thememory architecture design space itself isvast a brute force design space explorationtool may take large computation time andhence is unlikely to be useful in meeting thetight time-to-market constraint Further foreach given memory architecture there areseveral possible data section layouts which

are opti- mal in terms of performance andpower This further compounds the memoryarchitecture exploration problem

DSP ON-CHIP SPRAMARCHITECTUREDSP processor based embedded systemshave an on-chip memory which typically has asingle cycle access time The on-chip memoryalso referred to as scratch pad memory ismapped into an address space disjoint fromthe off-chip memory but connected to the sameaddress and data buses Typically the scratch-pad memory is organized into multiple memorybanks to facilitate multiple simultaneous dataaccesses DSP Processors typically have 2or more address generation units and multipleon-chip buses to facilitate multiple memoryaccesses

Further each on-chip memory bank can beorganized either as a single-access RAM

Figure 3 On-chip Memory Architecture of Embedded Processors

390

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

(SARAM) or as a dual-access RAM(DARAM) to provide single or dual accessesto the same memory bank in a single cycleFor example Texas Instruments TMS320C54Xdigital signal processor has two data readbuses and one data write bus TexasInstruments TMS320C55X processor hasthree data read busses and two data writebusses since concurrent access to the samearray are common in DSP applications Figure4 presents memory map of C55X DSPswhere multiple memory banks of SARAM andDARAM memory banks form a part of memorymap and MMR represents memory mappedregisters which typically contain controlregisters status registers and stack pointersThe DARAM and SARAM regions can berecognized using multiple memory bank toenable two concurrent accesses

BIBLIOGRAPHY1 ARM920T and ARM922T ARM9 Family

of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

2 ARM926EJ-S and ARM926E-S ARM9E

Family of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

3 lp solve httplpsolvesourceforgenet55

4 SystemC ndash Language for System-LevelModeling Design and Verification httpwwwsystemcorghome

5 Verilog Hardware DescriptionLanguage httpwwwverilogcomindexhtml

6 International Technology Roadmap forSemiconductors SEMATECH 3101Indus- trial Terrace Suite 106 Austin TX78758 2001

7 2007 global mobile communications -statistics trends and forecastsTechnical report httpwwwreportbuyercomtelecomsmobile2007 globalmobile trendshtml 2007

8 1st IEEE Inter Symposium on IndustrialEmbedded Systems PanelDiscussion Open Issues in SoCDesign httpwwwiestcfaorgpaneldiscussionshtm 2006

Figure 4 Example DSP Memory Map

Page 5: Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S ... · Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S Lakshmi, 2014 embedded memories, application specific circuits,

389

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

designer based on the targeted applicationsHowever with the combination of on-chipSPRAM and cache the memory design spaceis too large for a manual analysis Also withthe projected growth in the complexity ofembedded systems and the vast design spacein memory architecture hand optimization ofthe memory architecture will soon becomeimpossible This warrants an automated frame-work which can explore the memoryarchitecture design space and identifyinteresting design points that are optimal froma performance power consumption and VLSIarea (and hence cost) perspective As thememory architecture design space itself isvast a brute force design space explorationtool may take large computation time andhence is unlikely to be useful in meeting thetight time-to-market constraint Further foreach given memory architecture there areseveral possible data section layouts which

are opti- mal in terms of performance andpower This further compounds the memoryarchitecture exploration problem

DSP ON-CHIP SPRAMARCHITECTUREDSP processor based embedded systemshave an on-chip memory which typically has asingle cycle access time The on-chip memoryalso referred to as scratch pad memory ismapped into an address space disjoint fromthe off-chip memory but connected to the sameaddress and data buses Typically the scratch-pad memory is organized into multiple memorybanks to facilitate multiple simultaneous dataaccesses DSP Processors typically have 2or more address generation units and multipleon-chip buses to facilitate multiple memoryaccesses

Further each on-chip memory bank can beorganized either as a single-access RAM

Figure 3 On-chip Memory Architecture of Embedded Processors

390

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

(SARAM) or as a dual-access RAM(DARAM) to provide single or dual accessesto the same memory bank in a single cycleFor example Texas Instruments TMS320C54Xdigital signal processor has two data readbuses and one data write bus TexasInstruments TMS320C55X processor hasthree data read busses and two data writebusses since concurrent access to the samearray are common in DSP applications Figure4 presents memory map of C55X DSPswhere multiple memory banks of SARAM andDARAM memory banks form a part of memorymap and MMR represents memory mappedregisters which typically contain controlregisters status registers and stack pointersThe DARAM and SARAM regions can berecognized using multiple memory bank toenable two concurrent accesses

BIBLIOGRAPHY1 ARM920T and ARM922T ARM9 Family

of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

2 ARM926EJ-S and ARM926E-S ARM9E

Family of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

3 lp solve httplpsolvesourceforgenet55

4 SystemC ndash Language for System-LevelModeling Design and Verification httpwwwsystemcorghome

5 Verilog Hardware DescriptionLanguage httpwwwverilogcomindexhtml

6 International Technology Roadmap forSemiconductors SEMATECH 3101Indus- trial Terrace Suite 106 Austin TX78758 2001

7 2007 global mobile communications -statistics trends and forecastsTechnical report httpwwwreportbuyercomtelecomsmobile2007 globalmobile trendshtml 2007

8 1st IEEE Inter Symposium on IndustrialEmbedded Systems PanelDiscussion Open Issues in SoCDesign httpwwwiestcfaorgpaneldiscussionshtm 2006

Figure 4 Example DSP Memory Map

Page 6: Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S ... · Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S Lakshmi, 2014 embedded memories, application specific circuits,

390

This article can be downloaded from httpwwwijerstcomcurrentissuephp

Int J Engg Res amp Sci amp Tech 2014 G Ravi Kumar and S Lakshmi 2014

(SARAM) or as a dual-access RAM(DARAM) to provide single or dual accessesto the same memory bank in a single cycleFor example Texas Instruments TMS320C54Xdigital signal processor has two data readbuses and one data write bus TexasInstruments TMS320C55X processor hasthree data read busses and two data writebusses since concurrent access to the samearray are common in DSP applications Figure4 presents memory map of C55X DSPswhere multiple memory banks of SARAM andDARAM memory banks form a part of memorymap and MMR represents memory mappedregisters which typically contain controlregisters status registers and stack pointersThe DARAM and SARAM regions can berecognized using multiple memory bank toenable two concurrent accesses

BIBLIOGRAPHY1 ARM920T and ARM922T ARM9 Family

of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

2 ARM926EJ-S and ARM926E-S ARM9E

Family of Embedded Processors httpwwwarmcomproductsCPUsfamiliesARM9Familyhtml

3 lp solve httplpsolvesourceforgenet55

4 SystemC ndash Language for System-LevelModeling Design and Verification httpwwwsystemcorghome

5 Verilog Hardware DescriptionLanguage httpwwwverilogcomindexhtml

6 International Technology Roadmap forSemiconductors SEMATECH 3101Indus- trial Terrace Suite 106 Austin TX78758 2001

7 2007 global mobile communications -statistics trends and forecastsTechnical report httpwwwreportbuyercomtelecomsmobile2007 globalmobile trendshtml 2007

8 1st IEEE Inter Symposium on IndustrialEmbedded Systems PanelDiscussion Open Issues in SoCDesign httpwwwiestcfaorgpaneldiscussionshtm 2006

Figure 4 Example DSP Memory Map

Page 7: Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S ... · Int. J. Engg. Res. & Sci. & Tech. 2014 G Ravi Kumar and S Lakshmi, 2014 embedded memories, application specific circuits,