
[IEEE 2012 International Conference on Information Security and Intelligence Control (ISIC) - Yunlin, Taiwan (2012.08.14-2012.08.16)] 2012 International Conference on Information Security


The DDRx Memory Controller Extended for Reconfigurable Computing

Jih-Ching Chiu
Department of Electrical Engineering
National Sun Yat-Sen University
70 Lien-Hai Rd, Kaohsiung 804, Taiwan
[email protected]

Kai-Ming Yang
Department of Electrical Engineering
National Sun Yat-Sen University
70 Lien-Hai Rd, Kaohsiung 804, Taiwan
[email protected]

Abstract-DDRx memory is widely used in digital products and platforms, and reconfigurable computing systems have the potential to accelerate computation on large amounts of data. The current trend is to combine a microprocessor with one or more reconfigurable computing units, so the massive data transfer among CPUs, memory modules, and reconfigurable accelerators becomes a major challenge for the system bus, and system performance is limited by the system bus bandwidth. In this paper, we propose an architecture, called the brain module controller, that connects DDRx memory and a reconfigurable FPGA directly and supports data transfer between them while bypassing the system bus. Its instruction set is created by extending that of the DDRx memory controller. With the controller functions, we can construct a software-hardware co-design platform using memory-mapped methods.

Keywords-Reconfigurable computing, Memory controller

Reconfigurable computing systems have been shown to have the potential to improve performance on computations over large amounts of data. Most reconfigurable computing research, however, focuses on how to increase the degree of parallel execution in hardware and how to map algorithms onto hardware implementations [1]. Few studies address (1) how to expand the system bus bandwidth for operations on large amounts of data and (2) how to handle the hardware/software communication interface. When a reconfigurable computing system is connected to multiple acceleration hardware units, a unit that is transferring data occupies the system bus, so other devices cannot transfer data, which reduces system efficiency.

This paper focuses on this problem by integrating DDRx memory with the reconfigurable computing unit. The functions of the DDRx controller are extended with data exchange mechanisms for transferring data between the reconfigurable device and memory, which reduces contention for the bus. Because hardware interfaces are not identical, software designers must write different programs for different hardware, and hardware/software communication adds considerable programming complexity. A hardware management unit with hardware/software communication mechanisms gives hardware designers and software designers a standard interface and reduces program development time. A memory-mapped mechanism is proposed to suit common programming styles, so that software developers can use integrated programming environments to achieve hardware/software co-design [13] [14].

However, in current reconfigurable computing systems the reconfigurable accelerators are usually connected to the system bus, and the system bus therefore becomes congested. DDRx controllers [2] are widely used in personal computers, workstations, and servers, and especially in digital electronic products and embedded multiprocessor systems such as video conversion boxes, digital cameras, and embedded SoC platforms. In multiprocessor systems the local memory may be built from DDRx SDRAM to obtain a large memory space, and data duplication becomes a significant issue for large data transfers.

Given these characteristics and widespread applications, we can design a DDRx memory controller with a reconfigurable device interface and implement fast data switching and configurable functions on it, so that data can be transferred directly between the reconfigurable device and memory with rapid data switching. This not only reduces the effort developers spend learning how to control the memory but also broadens the applications of digital products on the market.

In this paper, we propose a memory controller with internal fast data switching mechanisms and configurable functions for reconfigurable computing. It can both control the memory and switch data between different bank addresses for the reconfigurable device without adding load to the system bus. It thus reduces the workload of the system bus and increases the data exchange rate between the reconfigurable device and memory, and through its memory-mapped style it offers a friendlier interface for hardware/software co-design applications.

I. PREVIOUS WORK AND BACKGROUND

A. Overview of Vforce

The Vforce framework [10], shown in Fig. 1, is based on the object-oriented VSIPL++ standard and allows the same application code to run on different reconfigurable computer architectures without change. To achieve this goal, the Vforce framework encapsulates hardware-specific implementations behind a standard API, so the application code does not need to know the hardware-specific details. The Vforce framework can therefore be used to address the communication interface between software and hardware. However, Vforce targets reconfigurable supercomputing architectures, in contrast to embedded systems, and it needs a large amount of space to build a processing kernel library and VSIPL++ data.

978-1-4673-2588-2/12/$31.00 ©2012 IEEE

B. IRES

IRES (I-link for Reconfigurable Embedded System), shown in Fig. 2, is an integration platform for developing reconfigurable hardware and software for embedded systems [3] [4]. It is mainly composed of I-link (the integration linker), the Hardware Management Unit (HMU), the boot loader, and the operating system. I-link is the main part of IRES; it links the application code and the acceleration hardware netlist files with operating objects into an executable file, called the executor. The hardware management unit is the bridge between reconfigurable hardware and tasks, and it can dynamically control hardware threads. The boot loader initializes the system environment. Operating objects provide a multitasking environment. Finally, IRES uses an FPGA matrix, in which multiple FPGAs simulate a partially reconfigurable FPGA [9] [11] [12], as the testing platform.

II. CONTROLLER ARCHITECTURE

A. The architecture of the controller

In this paper, we propose an architecture, called the brain module controller and shown in Fig. 3, that connects DDRx memory and a reconfigurable FPGA directly and supports data transfer between them while bypassing the system bus. Its instruction set is created by extending that of the DDRx memory controller. With the controller functions, we can construct a software-hardware co-design platform using memory-mapped methods.

The controller is composed of a DDRx controller and an FPGA controller, as shown in Fig. 4. The DDRx controller is responsible for the memory parameters and for reading and writing data, and the FPGA controller is responsible for configuring the FPGA and for accepting the FPGA instructions. In addition, the controller uses a queue structure between the DDRx and FPGA sides for internal data exchange.

Fig. 3. The proposed architecture

Fig. 4. The architecture of the controller

Fig. 1. The Vforce framework architecture.

The Vforce framework gives user programs portability and scalability, but the Vforce platform itself does not provide tools for generating the hardware configuration, so hardware designers must also supply software designers with the actual configuration interface of the underlying hardware objects that the application refers to. In addition, because a software program must go through a processing object, and through the hardware object inside it, to communicate and reconnect with the RTRM and to pass the parameters needed by the operation and the hardware configuration of the computing unit, the communication incurs significant additional cost.

Fig. 2. Block diagram of IRES

The IRES executor is a novel concept with potential in embedded systems. Developing a reconfigurable embedded system is hard work for designers, but IRES handles hardware/software communication well. Using the data structure of the IRES executor, the operating system interface can be enhanced to connect with conventional operating systems. The IRES executor can also be executed as a conventional executable file, so that hardware acceleration can be used in conventional operating systems.

The internal architecture of the controller is shown in the figure. The decoder module decodes the addresses and instructions from the CPU side and generates the corresponding control signals to complete the instruction operations. The Initial, Write, and Read modules control the memory. The Initial module controls memory initialization. The Write module controls memory writes and also reports the number and address of the data written. The Read module controls memory reads, producing the enable signal to read the data, and also reports the number and address of the data read.

The FPGA control module controls the FPGA and is divided into two parts: a configuration control module and a command module. When the configuration control module is enabled, the FPGA begins configuring its hardware from the configuration data in memory. The FPGA command module is responsible for FPGA hardware management. When it is enabled, it uses the address to determine which control register the command targets: the hardware number, the starting address of the parameters, the number of parameters, the address at which the result is stored, the size of the memory space for the results written back, and the hardware setting. Because there is more than one FPGA acceleration hardware unit, and in order to increase parallelism, we create a hardware management unit (HMU) [3] [5] to manage communication between the memory and the hardware, as shown in Fig. 5. The In_Arbiter arbitrates the data path to the acceleration hardware. Through the Out_Arbiter, the operation results are stored back.
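The address-based selection of control registers can be illustrated with a minimal Python sketch. The base address `0x1000`, the register offsets, and the names `Reg` and `FpgaCommandModule` are assumptions made for illustration; the paper does not give a register map. Only the five registers whose purpose is clear from the description are modeled.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Reg(IntEnum):
    """Hypothetical offsets for the control registers named in the text."""
    HW_NUMBER = 0     # which acceleration hardware is addressed
    PARAM_ADDR = 1    # starting address of the parameters
    PARAM_COUNT = 2   # number of parameters
    RESULT_ADDR = 3   # address where the result is stored
    RESULT_SIZE = 4   # size of the memory space for written-back results


@dataclass
class FpgaCommandModule:
    base: int = 0x1000  # assumed base address of the register window
    regs: dict = field(default_factory=lambda: {r: 0 for r in Reg})

    def decode(self, address: int) -> Reg:
        """Select the control register from the address."""
        try:
            return Reg(address - self.base)
        except ValueError:
            raise ValueError(
                f"address {address:#x} is outside the register window"
            ) from None

    def write(self, address: int, value: int) -> Reg:
        """Store a command value in the register selected by the address."""
        reg = self.decode(address)
        self.regs[reg] = value
        return reg
```

Under these assumptions, a memory-mapped store to `0x1001` would set the parameter starting address, which is the kind of access a software task would issue instead of a device-specific driver call.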


B. Data switching timing

The data exchange mechanism, shown in Fig. 6, has two parts: the acceleration hardware reading data, and the operation results being stored back. When the acceleration hardware reads data, the hardware management unit issues the request and the acceleration hardware sends the address of the data to be read. The hardware management unit sends the read command, and the requested data is read out of memory and stored in the controller's FPGA_write_Queue. When the read operation is complete, the controller informs the hardware management unit with the ready signal, and the data is passed to the acceleration hardware. When the acceleration hardware writes the result of an operation, the hardware management unit issues the request and the acceleration hardware sends the address of the data to be written. The hardware management unit then checks whether the locations are free and sends the ready signal to the acceleration hardware, which stores the data into the Memory_write_Queue. Once the store operation has finished, the hardware management unit issues a write instruction and the data is written back to memory.

Fig. 5. The connection between the Hardware Management Unit and the acceleration hardware

Fig. 6. The data exchange mechanism
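The two paths of the data exchange mechanism can be sketched as a small behavioral model in Python. This is an illustration under stated assumptions, not the authors' hardware design: memory is modeled as a list, the queue names follow the text, and the "locations are free" check is modeled with a hypothetical set of reserved addresses (`busy`), since the paper does not say how the check is performed.

```python
from collections import deque


class Controller:
    """Behavioral stand-in for the controller and its two internal queues."""

    def __init__(self, memory):
        self.memory = memory
        self.fpga_write_queue = deque()    # FPGA_write_Queue: memory -> hardware
        self.memory_write_queue = deque()  # Memory_write_Queue: hardware -> memory

    def read_to_queue(self, addr, count):
        """Read `count` words from memory into FPGA_write_Queue; return ready."""
        for a in range(addr, addr + count):
            self.fpga_write_queue.append(self.memory[a])
        return True

    def flush_to_memory(self, addr):
        """Write everything held in Memory_write_Queue back to memory."""
        while self.memory_write_queue:
            self.memory[addr] = self.memory_write_queue.popleft()
            addr += 1


class HardwareManagementUnit:
    def __init__(self, controller):
        self.controller = controller
        self.busy = set()  # hypothetical: addresses that are not free

    def serve_read(self, addr, count):
        """Read path: issue the read, wait for ready, pass data to the hardware."""
        ready = self.controller.read_to_queue(addr, count)
        if not ready:
            return []
        queue = self.controller.fpga_write_queue
        return [queue.popleft() for _ in range(count)]

    def serve_write(self, addr, data):
        """Write path: check locations, queue the data, then issue the write."""
        if any(a in self.busy for a in range(addr, addr + len(data))):
            return False  # no ready signal; the hardware must retry later
        self.controller.memory_write_queue.extend(data)
        self.controller.flush_to_memory(addr)
        return True
```

In this model neither path touches a system bus object at all, which mirrors the point of the design: the transfer stays between the controller queues and the acceleration hardware.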

III. SIMULATION RESULT

We use data sorting as an example application to verify the feasibility of the system architecture. A sorting algorithm arranges a set of data in a specified order. An efficient sorting function is one of the most important components of some applications, such as databases. The complete simulation architecture is shown in Fig. 7.

Through the emulator, we send data write commands to the controller until all the data has been written to memory, and then send a command to the hardware management unit to start the hardware function. When the hardware function has finished, we read the memory that stores the sorted data and check whether the sorting completed successfully. The simulation results, shown in Fig. 8, verify the functions of the controller.
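The verification sequence (write the data, start the hardware, read back, check) can be mimicked in software with the following Python sketch. The memory layout, the function name `run_sort_simulation`, and the use of Python's `sorted` as a stand-in for the sorting hardware are all assumptions; the paper reports its own results only through the waveform in Fig. 8.

```python
from collections import Counter


def run_sort_simulation(data):
    """Emulate the write / start / read-back / check sequence for a sort."""
    n = len(data)
    # Assumed layout: input in [0, n), result in [n, 2n).
    memory = [0] * (2 * n)

    # 1. Write commands: the emulator writes the whole data set to memory.
    for offset, value in enumerate(data):
        memory[offset] = value

    # 2. Start command: stand-in for the sorting hardware behind the HMU.
    memory[n:2 * n] = sorted(memory[0:n])

    # 3. Read back the region that holds the sorted data.
    result = memory[n:2 * n]

    # 4. Check: output is non-decreasing and is a permutation of the input.
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    same_items = Counter(result) == Counter(data)
    return result, ordered and same_items
```

Checking both ordering and the multiset of values guards against a faulty transfer that drops or duplicates words while still producing an ordered output.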

IV. CONCLUSION

In this paper, we extend the instructions of the DDRx memory controller to support the control functions of a reconfigurable computing unit such as an FPGA. With this approach, the memory and the reconfigurable computing modules can be connected together and provide a memory-mapped interface for hardware/software co-design applications. The software-hardware integration methods for computing in reconfigurable embedded systems are explored on the IRES platform. Users can design applications using memory-mapped concepts and run the executor in the target embedded environment, achieving communication between hardware and software with little system bus overhead. Tasks communicate with the hardware through function calls and defined variables to invoke hardware functions. The controller functions have been fully verified using the simulated partially reconfigurable FPGA target board, demonstrating effective communication between hardware and software. The design philosophy of this controller is a novel concept for reconfigurable computing. Developing a reconfigurable embedded system is hard work for designers, but this controller handles communication between hardware and software well. Moreover, the reconfigurable computing unit can be used through conventional programming concepts, so that hardware acceleration can be used in conventional operating systems.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their many constructive comments and suggestions for improving this paper. We also acknowledge the support of National Sun Yat-Sen University and the Aim for the Top University Project under grant 01C030710.

REFERENCES

[1] D. Buell, T. El-Ghazawi, K. Gaj, and V. Kindratenko, “High-Performance Reconfigurable Computing,” Computer, Mar. 2007, pp. 23-27.

[2] Jih-Ching Chiu and Kai-Ming Yang, “Data Tunnel in DDRx Memory Controller,” Workshop on Parallel and Distributed Computing (PD), 2009 National Computer Symposium (NCS 2009), Taipei, Taiwan, pp. 257-262, Nov. 2009.

[3] J. C. Chiu, Y. L. Chou, and R. B. Lin, “The multi-context reconfigurable processing unit for fine-grain computing,” J. Inform. Sci. Eng., vol. 24, no. 3, pp. 965-979, 2008.

[4] J.-C. Chiu and T.-L. Yeh, “IRES: An integrated software and hardware interface framework for reconfigurable embedded system,” IET Computers and Digital Techniques, vol. 4, no. 1, pp. 27-37, Jan. 2010.

[5] Jih-Ching Chiu, Ta-Li Yeh, and Mun-Kit Leong, “The Software and Hardware Integration Linker for Reconfigurable Embedded System,” IEEE International Conference on Computational Science and Engineering (CSE '09), Vancouver, Canada, Aug. 2009.

[6] Jih-Ching Chiu, Kai-Ming Yang, and Ta-Li Yeh, “A Hardware Invocation Mechanism for Reconfigurable Embedded System,” International Computer Symposium, pp. 664-669, Dec. 2010.

[7] HyperDrive Multi-port DDR2 Memory Controller IP. Available : http://www.altera.com.cn/products/ip/iup/memory/m-mtx-multiport-hyperdrive-sdram.html

[8] Primecell DDR2 Dynamic Memory Controller. Available: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0418d/index.html.

[9] Altera Application note 116: Configuring SRAM-Based LUT Devices[S], 2002.

[10] N. Moore, A. Conti, M. Leeser, L.S. King, “Vforce: An Extensible Framework for Reconfigurable Supercomputing,” Computer, March. 2007, pp. 39-49.

[11] D. Andrews, D. Niehaus, and P. Ashenden, “Programming Models for Hybrid CPU/FPGA Chips,” Computer, Jan. 2004, pp. 118-120.

[12] K. Parnell and R. Bryner, “Comparing and Contrasting FPGA and Microprocessor System Design and Development,” Xilinx Inc., July 2004.

[13] A. Samahi, S. Boukhechem, and E. Bourennane, “Communication Interface Generation For HW/SW Architecture In The STARSoC Environment,” IEEE International Conference on Reconfigurable Computing and FPGA's, IEEE Computer Society Press, San Luis Potosi, Mexico, 20 September 2006.

[14] A. Samahi and S. Boukhechem, “Automated Integration and Communication Synthesis of Reconfigurable MPSoC Platforms,” NASA/ESA Conference on Adaptive Hardware and Systems, Edinburgh, United Kingdom, 8 August 2007.

Fig. 7. The simulation architecture

Fig. 8. The simulation Result
