VLSI IEEE-2012 Titles
Image processing
1. Implementation of image reconstruction algorithm using
compressive sensing in FPGA
Abstract
Compressive Sensing (CS) is a technique that suggests the possibility of
reconstructing a signal vector from far fewer linear measurements than its
dimension. Sparse signals are acquired as vectors using sensing matrices. If
the signals are sparse enough, the original signal can be reconstructed
successfully. In CS applications the signal can be acquired using basic
methods, but reconstructing it from incomplete data sets requires high
processing power and complex statistical computations. In this research OMP
(Orthogonal Matching Pursuit), which is faster and more hardware-implementable
than other reconstruction algorithms, is used. The OMP algorithm is
implemented on a Virtex-6 FPGA (Field Programmable Gate Array). With various
optimizations the designed system yielded results at least a thousand times
faster than CPU (Central Processing Unit) and GPU (Graphics Processing Unit)
applications.
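The reconstruction step can be illustrated with a small software sketch. The code below is a textbook form of OMP in plain Python, not the FPGA design from the abstract; the function names, the tiny sensing matrix, and the assumption that the columns of the sensing matrix have unit norm are all illustrative choices.

```python
def solve_least_squares(cols, y):
    """Least-squares fit of y by the given columns, via the normal equations."""
    k = len(cols)
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    # Gaussian elimination with partial pivoting on the k x k system.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(gram[r][i]))
        gram[i], gram[p] = gram[p], gram[i]
        rhs[i], rhs[p] = rhs[p], rhs[i]
        for r in range(i + 1, k):
            f = gram[r][i] / gram[i][i]
            for c in range(i, k):
                gram[r][c] -= f * gram[i][c]
            rhs[r] -= f * rhs[i]
    coef = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(gram[i][c] * coef[c] for c in range(i + 1, k))
        coef[i] = (rhs[i] - tail) / gram[i][i]
    return coef


def omp(A, y, sparsity):
    """Recover a sparse x with A x ~= y (A given as a list of rows)."""
    m, n = len(A), len(A[0])
    columns = [[A[r][c] for r in range(m)] for c in range(n)]
    residual, support, coef = list(y), [], []
    for _ in range(sparsity):
        # Pick the unused column most correlated with the current residual.
        scores = [abs(sum(a * r for a, r in zip(col, residual))) for col in columns]
        for j in support:
            scores[j] = -1.0
        support.append(max(range(n), key=scores.__getitem__))
        # Re-fit y on all selected columns, then update the residual.
        chosen = [columns[j] for j in support]
        coef = solve_least_squares(chosen, y)
        residual = [y[r] - sum(c * col[r] for c, col in zip(coef, chosen))
                    for r in range(m)]
    x = [0.0] * n
    for c, j in zip(coef, support):
        x[j] = c
    return x


# Two measurements of a 3-dimensional, 1-sparse signal (hypothetical data).
A = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.8]]
x_hat = omp(A, [3.0, 4.0], sparsity=1)
```

The repeated least-squares solve is the costly part of each iteration, which is one reason hardware implementations of OMP tend to focus on it.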
2. Implementation of algorithm for detection and correction of
defective pixels in FPGA
Abstract
Defective pixels are a common occurrence in digital camera sensors, either
resulting from the manufacturing process or developing over time. Though few
in number, they are very noticeable and can degrade the perceived quality of
the images. This paper presents a method for detection and correction of
defective pixels in images generated by a Bayer mosaic image sensor. We
propose an online and adaptive algorithm, which analyzes the images retrieved
from a Bayer array sensor on a pixel-by-pixel basis. We consider the values of
adjacent pixels to determine if the current pixel is possibly defective, which
is either confirmed or refuted by repeating the analysis in subsequent frames.
For the confirmed defective pixels, interpolation is performed to restore the
image quality. The algorithm is implemented on an FPGA logic device, suitable
for the very high frequency operation required to correct defective pixels in
images produced by high definition (HD) cameras.
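The detect, confirm, and interpolate flow described above can be sketched in software as follows. This is only an illustration under stated assumptions: the abstract does not give the actual decision rule, so the median test, the threshold, the frame counter, and all function names are hypothetical. The one structural fact used is that in a Bayer mosaic the pixels two steps away share the color of the current pixel.

```python
from statistics import median


def same_color_neighbors(frame, r, c):
    """Values two pixels away, which share the Bayer color of (r, c)."""
    h, w = len(frame), len(frame[0])
    return [frame[r + dr][c + dc]
            for dr in (-2, 0, 2) for dc in (-2, 0, 2)
            if (dr or dc) and 0 <= r + dr < h and 0 <= c + dc < w]


def update_suspects(frame, counts, threshold):
    """Count consecutive frames in which a pixel deviates from its neighbors."""
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            deviation = abs(value - median(same_color_neighbors(frame, r, c)))
            counts[r][c] = counts[r][c] + 1 if deviation > threshold else 0


def correct(frame, counts, confirm_frames):
    """Replace confirmed defective pixels by the median of their neighbors."""
    out = [row[:] for row in frame]
    for r, row in enumerate(frame):
        for c in range(len(row)):
            if counts[r][c] >= confirm_frames:
                out[r][c] = median(same_color_neighbors(frame, r, c))
    return out


# Hypothetical 5x5 frame with one stuck-high pixel, seen in three frames.
frame = [[100] * 5 for _ in range(5)]
frame[2][2] = 255
counts = [[0] * 5 for _ in range(5)]
for _ in range(3):
    update_suspects(frame, counts, threshold=50)
fixed = correct(frame, counts, confirm_frames=3)
```

Resetting the counter whenever a pixel looks normal is what lets later frames refute a suspicion raised by scene content rather than by a defect.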
3. Real time hardware co-simulation of Edge Detection for video
processing system
Abstract
A methodology for implementing real-time DSP applications on field
programmable gate arrays (FPGAs) using Xilinx System Generator (XSG) for
Matlab is presented in this paper. It presents an architecture for Edge
Detection using a Sobel filter for image processing using Xilinx System
Generator. The design was implemented targeting a Spartan3A DSP 3400 device
(XC3SD3400A-4FGG676C) and then a Virtex 5 (xc5vlx50-1ff676). The Edge
Detection method has been verified successfully with no visually perceptible
errors in the resulting images.
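For reference, Sobel edge detection itself is compact. The sketch below is a plain software version, not the System Generator design; the threshold and the use of |Gx| + |Gy| instead of the exact gradient magnitude (a common hardware-friendly approximation) are assumptions.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel_edges(img, threshold):
    """Return a binary edge map; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(GX[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(GY[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            # |gx| + |gy| avoids the square root of the exact magnitude.
            edges[r][c] = 1 if abs(gx) + abs(gy) > threshold else 0
    return edges


# Hypothetical image with a vertical step between columns 2 and 3.
image = [[0, 0, 0, 100, 100] for _ in range(5)]
edge_map = sobel_edges(image, threshold=200)
```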
4. FPGA implementation of graph cut based image thresholding
Abstract
Thresholding is an important process in many image processing applications.
Recently, a bi-level image thresholding method based on graph cut was
proposed. The method provided thresholding results which were superior to
those obtained with previous techniques. Moreover, the technique was
computationally less complex compared to other graph cut-based image
thresholding approaches. However, the execution time requirements may still
be significant, especially if it is of interest to perform real-time thresholding of
a large number of images, such as in the case of high-resolution video
sequences. In this paper, we propose a method based on the previously
proposed graph cut thresholding method, which is nevertheless appropriate for
hardware (FPGA) real-time implementations. A subset of the proposed
modifications is also appropriate for a general software implementation.
Considering only this subset, the C implementation of the modified method is
approximately 2.2 times faster than the original method, as it was presented in
the original graph cut-based thresholding paper. Furthermore, the FPGA-based
implementation is designed to be 70-100 times faster than the software
implementation, depending on the image used.
5. Background subtraction algorithm for moving object detection in
FPGA
Abstract
Currently, both the market and the academic community demand
applications based on image and video processing with several real-time
constraints. In addition, detection of moving objects is a very important
task in mobile robotics and surveillance applications. In order to achieve an
alternative design that allows for rapid development of real time motion
detection systems, this paper proposes a hardware architecture for motion
detection based on the background subtraction algorithm, which is implemented
on FPGAs (Field Programmable Gate Arrays). For achieving this, the following
steps are executed: (a) a background image (in gray-level format) is stored in an
external SRAM memory, (b) a low-pass filter is applied to both the stored and
current images, (c) a subtraction operation between both images is performed,
and (d) a morphological filter is applied over the resulting image. Afterward,
the gravity center of the object is calculated and sent to a PC (via RS-232
interface). Both the practical results of the motion detection system and the
synthesis results demonstrate the feasibility of implementing the proposed
algorithms on an FPGA-based hardware platform. The implemented system
provides one processed pixel per FPGA clock cycle (after the latency time)
and speeds up the software implementation (using the real-time xPC Target OS
from MathWorks) by a factor of 32.
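Steps (b) to (d) and the gravity-center calculation can be sketched in software as below. The abstract does not name the low-pass or morphological filters, so the 3x3 mean filter, the 3x3 erosion, the threshold, and every function name here are assumptions made for illustration.

```python
def box_blur(img):
    """3x3 mean filter as a simple low-pass; edges average the pixels present."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) // len(vals)
    return out


def erode(mask):
    """3x3 binary erosion: keep a pixel only if its whole neighborhood is set."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
             for c in range(w)] for r in range(h)]


def centroid(mask):
    """Gravity center (row, column) of the set pixels, or None if there are none."""
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not points:
        return None
    return (sum(r for r, _ in points) / len(points),
            sum(c for _, c in points) / len(points))


def detect_motion(background, current, threshold):
    bg, cur = box_blur(background), box_blur(current)            # step (b)
    diff = [[abs(a - b) for a, b in zip(row_c, row_b)]           # step (c)
            for row_c, row_b in zip(cur, bg)]
    mask = [[int(d > threshold) for d in row] for row in diff]
    return centroid(erode(mask))                                 # step (d)


# Hypothetical 9x9 scene: empty background, bright 5x5 object in the current frame.
background = [[0] * 9 for _ in range(9)]
current = [[200 if 2 <= r <= 6 and 2 <= c <= 6 else 0 for c in range(9)]
           for r in range(9)]
center = detect_motion(background, current, threshold=100)
```

Every stage touches only a small fixed window around each pixel, which is what makes a one-pixel-per-clock pipeline of the kind reported in the abstract plausible.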
6. Efficient FPGA implementation of steerable Gaussian smoothers
Abstract
Smoothing filters have been extensively used in image and video analysis. In
particular, directional smoothers have been employed in motion analysis, edge
detection, line parameter estimation, and texture analysis. Such applications
often necessitate the use of several directional filters oriented at different
angles. However, applying a large number of filters commonly requires a
significant amount of computing resources. In such cases, real-time
performance may be achievable by using hardware devices
with parallel processing capabilities. Additionally, techniques can take
advantage of the inherent properties of certain smoothing filters. Such a
property is steerability, which implies that the outputs of several filtering
operations can be linearly combined in order to produce the output of a
directional filter at an arbitrary orientation. Although several efficient FPGA
implementations of the convolution operation have been presented in the
literature for non-separable and separable filters, research on steerable
filter implementations on FPGA is limited. In this paper, steerable Gaussian
smoothers are implemented on an FPGA platform. The technique is compared
with a software-based implementation. Performance comparisons indicate that
the FPGA technique provides a significant speed-up factor of at least ~6,
utilizing only a small percentage of the FPGA resources.
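The steerability property can be checked numerically on a small example. The sketch below uses the first derivative of a Gaussian, a standard textbook steerable filter, as a stand-in; whether the paper's smoothers use this basis is not stated in the abstract, and all names and parameter values are illustrative.

```python
import math


def derivative_kernel(radius, sigma, theta):
    """First derivative of a 2-D Gaussian along direction theta (radians)."""
    ux, uy = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            g = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(-(x * ux + y * uy) * g / (sigma * sigma))
        kernel.append(row)
    return kernel


def filter_response(kernel, patch):
    """Response of one filter at one pixel: sum of element-wise products."""
    return sum(k * p for krow, prow in zip(kernel, patch)
               for k, p in zip(krow, prow))


def steered_response(resp_x, resp_y, theta):
    """Combine the two basis responses into the response at angle theta."""
    return math.cos(theta) * resp_x + math.sin(theta) * resp_y


# Hypothetical 5x5 image patch and an arbitrary orientation.
patch = [[(3 * i + 5 * j * j) % 7 for j in range(5)] for i in range(5)]
theta = 0.7
resp_x = filter_response(derivative_kernel(2, 1.0, 0.0), patch)
resp_y = filter_response(derivative_kernel(2, 1.0, math.pi / 2), patch)
direct = filter_response(derivative_kernel(2, 1.0, theta), patch)
steered = steered_response(resp_x, resp_y, theta)
```

Here `steered` matches `direct` up to rounding, so only the two basis filters need to be run over the image; every further orientation costs two multiplications and one addition per pixel.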
7. An FPGA-Based Hardware Implementation of Configurable Pixel-
Level Color Image Fusion
Abstract
Image fusion has attracted a lot of interest in recent years. As a result, different
fusion methods have been proposed mainly in the fields of remote sensing and
computer (e.g., night) vision, while hardware implementations have also been
presented to tackle real-time processing in different application domains. In this
paper, a linear pixel-level fusion method is employed and implemented on a
field-programmable-gate-array-based hardware system that is suitable for
remotely sensed data. Our work incorporates a fusion technique (called VTVA)
that is a linear transformation based on the Cholesky decomposition of the
covariance matrix of the source data. The circuit is composed of different
modules, including covariance estimation, Cholesky decomposition, and
transformation modules. The resulting compact hardware design can be
characterized as a linear configurable implementation, since the color
properties of the final fused color can be selected by the user by controlling
the resulting correlation between color components.
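The abstract does not spell out the VTVA transformation, so the following is only a generic sketch of how a Cholesky factor of the source covariance can be used to impose a user-chosen correlation on the output components (whiten with the source factor, then recolor with the target factor). The function names, sample data, and target covariance are hypothetical.

```python
import math


def covariance(samples):
    """Mean vector and (population) covariance matrix of a list of vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_substitute(low, b):
    """Solve low * z = b for z, with low lower-triangular."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def recorrelate(samples, target_cov):
    """Linearly transform samples so that their covariance becomes target_cov."""
    mean, cov = covariance(samples)
    src, dst = cholesky(cov), cholesky(target_cov)
    out = []
    for s in samples:
        # Whitening: z has identity covariance across the data set.
        z = forward_substitute(src, [v - m for v, m in zip(s, mean)])
        # Coloring: multiply by the target factor to set the correlation.
        out.append([sum(dst[i][k] * z[k] for k in range(i + 1))
                    for i in range(len(z))])
    return out


# Hypothetical two-band pixel values and a chosen output correlation of 0.5.
pixels = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [6.0, 4.0]]
fused = recorrelate(pixels, [[1.0, 0.5], [0.5, 1.0]])
```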
8. Implementation of image reconstruction algorithm using
compressive sensing in FPGA
Abstract
Compressive Sensing (CS) is a technique that suggests the possibility of
reconstructing a signal vector from far fewer linear measurements than its
dimension. Sparse signals are acquired as vectors using sensing matrices. If
the signals are sparse enough, the original signal can be reconstructed
successfully. In CS applications the signal can be acquired using basic
methods, but reconstructing it from incomplete data sets requires high
processing power and complex statistical computations. In this research OMP
(Orthogonal Matching Pursuit), which is faster and more hardware-implementable
than other reconstruction algorithms, is used. The OMP algorithm is
implemented on a Virtex-6 FPGA (Field Programmable Gate Array). With various
optimizations the designed system yielded results at least a thousand times
faster than CPU (Central Processing Unit) and GPU (Graphics Processing Unit)
applications.
9. A hardware acceleration of a real time video processing
Abstract
This paper presents a method based on the Edge histogram descriptor to
accelerate a shot cut detection algorithm for real-time applications. Before
any content-based manipulation, the hierarchical structure of the video must
be determined, and a pure software solution is not suitable for this
application due to the constraints imposed by this algorithm. In this context
we have used a Field Programmable Gate Array (FPGA) integrated architecture to
accelerate this processing.
10. A non linear equation based cryptosystem for image encryption and
decryption
Abstract
In this paper a new approach for image encryption and decryption using a
chaotic map and a non linear equation known as the BB equation is described.
Chaotic maps have been widely used in data encryption. Various chaos map based
encryption and decryption algorithms are in use but have been found to be
insecure. Hence a new method is implemented based on the BB
(Brahmagupta-Bhaskara) equation, which is combined with chaos to give a non
linear dependency and thus improved security. A VLSI architecture for the
proposed algorithm is designed and realized using Xilinx ISE VLSI software for
hardware implementation.
DSP
1. Efficient VLSI implementation of soft-input soft-output fixed-
complexity sphere decoder
Abstract
Fixed-complexity sphere decoder (FSD) is one of the most promising techniques
for the implementation of multiple-input multiple-output (MIMO) detection, with
relevant advantages in terms of constant throughput and high flexibility of parallel
architecture. The reported works on FSD are mainly based on software-level
simulations, and few details have been provided on hardware implementation.
The authors present the study based on a four-nodes-per-cycle parallel FSD
architecture with several examples of VLSI implementation in 4×4 systems with
both 16-quadrature amplitude modulation (QAM) and 64-QAM modulation and
both real and complex signal models. The implementation aspects and details of
the architecture are analysed in order to provide a variety of performance-
complexity trade-offs. The authors also provide a parallel implementation of log-
likelihood-ratio (LLR) generator with optimised algorithm to enhance the proposed
FSD architecture to be a soft-input soft-output (SISO) MIMO detector. To the
authors' best knowledge, this is the first complete VLSI implementation of an FSD
based SISO MIMO detector. The implementation results show that the proposed
SISO FSD architecture is highly efficient and flexible, making it very suitable for
real applications.
2. Lossy Compression of Discrete Sources via the Viterbi Algorithm
Abstract
We present a new lossy compressor for finite-alphabet sources. For coding a
sequence x^n, the encoder starts by assigning a certain cost to each possible
reconstruction sequence. It then finds the one that minimizes this cost and
describes it losslessly to the decoder via a universal lossless compressor. The
cost of each sequence is a linear combination of its distance from the sequence
x^n and a linear function of its kth order empirical distribution. The
structure of the cost function allows the encoder to employ the Viterbi
algorithm to find the sequence with minimum cost. We identify a choice of the
coefficients used in the cost function which ensures that the algorithm
universally achieves the optimum rate-distortion performance for any
stationary ergodic source, in the limit of large n, provided that k increases
as o(log n). Iterative techniques for approximating the coefficients, which
alleviate the computational burden of finding the optimal coefficients, are
proposed and studied.
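To see why a cost of this shape admits a Viterbi search, consider a minimal sketch with k = 1, Hamming distance, and coefficients indexed by pairs of consecutive reconstruction symbols. The pair-indexed coefficients, their values, and the function names are assumptions for illustration; the choice of coefficients analysed in the paper is not reproduced here.

```python
def sequence_cost(x, y, slope, coeff):
    """slope * Hamming(x, y) plus a linear function of the pair counts of y."""
    hamming = sum(a != b for a, b in zip(x, y))
    context = sum(coeff[(y[i - 1], y[i])] for i in range(1, len(y)))
    return slope * hamming + context


def viterbi_min_cost(x, alphabet, slope, coeff):
    """Return (cost, y) minimising sequence_cost over all y of len(x)."""
    # best[a] = cheapest cost and path among prefixes ending in symbol a.
    best = {a: (slope * (a != x[0]), [a]) for a in alphabet}
    for xi in x[1:]:
        step = {}
        for b in alphabet:
            prev = min(alphabet, key=lambda a: best[a][0] + coeff[(a, b)])
            cost = best[prev][0] + coeff[(prev, b)] + slope * (b != xi)
            step[b] = (cost, best[prev][1] + [b])
        best = step
    return min(best.values(), key=lambda t: t[0])


# Hypothetical binary source sequence; symbol changes in y are penalised.
x = [0, 1, 0, 0, 1, 1, 0, 1]
coeff = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): 1.5, (1, 0): 1.5}
cost, y = viterbi_min_cost(x, (0, 1), slope=1.0, coeff=coeff)
```

Because the cost splits into per-position terms that depend only on the current and previous reconstruction symbols, the search runs in time linear in the sequence length instead of enumerating every candidate sequence.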
3. FPGA implementation of IEEE 802.15.3c receiver
Abstract
This paper presents the implementation of the OFDM demodulator and the Viterbi
decoder, proposed as part of a wireless High Definition video receiver to be
integrated in an FPGA. These blocks were implemented in a Xilinx Virtex-6
FPGA. The complete system was previously modeled and simulated using
MATLAB/Simulink to extract important hardware characteristics for the FPGA
implementation.
4. A Network-on-Chip-based turbo/LDPC decoder architecture
Abstract
The current convergence process in wireless technologies demands strong
efforts in conceiving highly flexible and interoperable equipment. This
contribution focuses on one of the most important baseband processing units in
wireless receivers, the forward error correction unit, and proposes a Network-on-
Chip (NoC) based approach to the design of multi-standard decoders. High level
modeling is exploited to drive the NoC optimization for a given set of both turbo
and Low-Density-Parity-Check (LDPC) codes to be supported. Moreover,
synthesis results prove that the proposed approach can offer a fully compliant
WiMAX decoder, supporting the whole set of turbo and LDPC codes with higher
throughput and an occupied area comparable or lower than previously reported
flexible implementations. In particular, the mentioned design case achieves a
worst-case throughput higher than 70 Mb/s at the area cost of 3.17 mm² on a 90 nm
CMOS technology.
5. Design and implementation of an optical OFDM baseband receiver in
FPGA
Abstract
In this paper, a baseband receiver design and its FPGA implementation for an
OOFDM system aimed at the NG-PON (passive optical network) applications are
presented. A low cost IMDD (intensity modulation, direct detection) architecture is
adopted and baseband DSP measures are employed to compensate various optical
impairments. Targeting a 4 GSps throughput rate, an 8-way parallel architecture is
developed to perform the synchronization, FFT and equalization each with massive
parallelism. A real valued FFT module taking advantage of the Hermitian spectrum
is also developed to reduce the circuit complexity significantly. The simulation
results show the proposed baseband receiver is capable of achieving an 8 Gbps
(effective) transmission bandwidth for 64-QAM coded OFDM symbols over a
25 km long single mode fiber network. The uncoded BER reaches 10^-3 when the
received optical power is -16 dBm. Due to speed and resource limitations, the
FPGA implementation yields a fully functional but speed-degraded system. The
maximum working frequency is 250 MHz, which is one half of the 500 MHz
required for real time processing. The design occupies 21,423 logic slices and 56
embedded multiplier modules.
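Two points in this abstract can be checked with a few lines of code: the Hermitian symmetry that lets a real-valued FFT skip roughly half of the output bins, and the relation between the 8-way parallelism and the quoted clock rates. The sketch uses a direct DFT for clarity rather than a fast algorithm, and it assumes one sample per lane per clock cycle; the function names are illustrative.

```python
import cmath


def dft_bins(samples, bins):
    """Direct DFT of the given samples, evaluated only at the requested bins."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, s in enumerate(samples))
            for k in bins]


def real_dft_half(samples):
    """Bins 0..N/2 of a real signal; the remaining bins follow by symmetry."""
    return dft_bins(samples, range(len(samples) // 2 + 1))


def expand_hermitian(half, n):
    """Rebuild the full spectrum using X[n - k] = conj(X[k])."""
    return half + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]


def throughput_sps(lanes, clock_hz):
    """Samples per second, assuming one sample per lane per clock cycle."""
    return lanes * clock_hz


# Hypothetical real-valued input block.
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
full = dft_bins(signal, range(len(signal)))
rebuilt = expand_hermitian(real_dft_half(signal), len(signal))
required = throughput_sps(8, 500e6)   # 4e9 samples/s, the 4 GSps target
achieved = throughput_sps(8, 250e6)   # 2e9 samples/s at the reported 250 MHz
```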
6. VLSI Architecture for a Reconfigurable Spectrally Efficient FDM
Baseband Transmitter
Abstract
Spectrally efficient FDM (SEFDM) systems employ non-orthogonal overlapped
carriers to improve spectral efficiency for future communication systems. One of
the key research challenges for SEFDM systems is to demonstrate efficient
hardware implementations for transmitters and receivers. Focusing on transmitters,
this paper explains the SEFDM concept and examines the complexity of published
modulation algorithms, with particular consideration to implementation issues. We
then present two new variants of a digital baseband transmitter architecture for
SEFDM, based on a modulation algorithm which employs the discrete Fourier
transform (DFT) implemented efficiently using the fast Fourier transform (FFT).
The algorithm requires multiple FFTs, which can be configured either as parallel
transforms, which is optimal for throughput, or using a multi-stream FFT
architecture, for reduced circuit area. We propose a simplified approach to IFFT
pruning for pipeline architectures, based on a token-flow control style, specifically
optimized for the SEFDM application. Reconfigurable implementations for
different bandwidth compression ratios, including conventional OFDM, are easily
derived from the proposed implementations. The SEFDM transmitters have been
synthesized, placed and routed in a commercial 32 nm CMOS process technology
and also verified in FPGA. We report circuit area and simulated power dissipation
figures, which confirm the feasibility of SEFDM transmitters.
7. A Nonbinary LDPC Decoder Architecture With Adaptive Message
Control
Abstract
A new decoder architecture for nonbinary low-density parity-check (LDPC) codes
is presented in this paper to reduce the hardware operational complexity in VLSI
implementations. The low decoding complexity is achieved by employing adaptive
message control (AMC) that dynamically trims the message length of belief
information to reduce the amount of memory accesses and arithmetic operations.
To implement the proposed AMC, we develop the architecture of a horizontal
sequential nonbinary LDPC decoder. Key components in the architecture have
been designed with the consideration of variable message lengths to leverage the
benefit of the proposed AMC. Simulation results demonstrate that the proposed
nonbinary LDPC decoder architecture can significantly reduce hardware
operations and power consumption as compared with existing work with negligible
performance degradation.
8. Design and implementation of low power FFT/IFFT processor for
wireless communication
Abstract
Fast Fourier transform (FFT) processing is one of the key procedures in popular
orthogonal frequency division multiplexing (OFDM) communication systems.
Structured pipeline architectures, low power consumption, high speed and reduced
chip area are the main concerns in this VLSI implementation. In this paper, an
efficient implementation of an FFT/IFFT processor for OFDM applications is
presented. The processor can be used in various OFDM-based communication
systems, such as Worldwide Interoperability for Microwave Access (WiMAX),
digital audio broadcasting (DAB), and digital video broadcasting-terrestrial
(DVB-T). We adopt a single-path delay feedback architecture. To eliminate the
read only memories (ROMs) used to store the twiddle factors, the proposed
architecture applies a reconfigurable complex multiplier to achieve a ROM-less
FFT/IFFT processor, and to reduce the truncation error we adopt a fixed-width
modified Booth multiplier. Three processing elements (PEs) and delay-line (DL)
buffers are used for computing the IFFT. The design thus achieves low power
consumption, lower hardware cost, high efficiency and reduced chip size.
9. High-Speed Low-Power Viterbi Decoder Design for TCM Decoders
Abstract
High-speed, low-power design of Viterbi decoders for trellis coded modulation
(TCM) systems is presented in this paper. It is well known that the Viterbi
decoder (VD) is the dominant module determining the overall power
consumption of TCM decoders. We propose a pre-computation architecture
incorporated with T-algorithm for VD, which can effectively reduce the power
consumption without degrading the decoding speed much. A general solution to
derive the optimal pre-computation steps is also given in the paper.
Implementation results of a VD for a rate-3/4 convolutional code used in a TCM
system show that, compared with the full trellis VD, the pre-computation
architecture reduces the power consumption by as much as 70% without
performance loss, while the degradation in clock speed is negligible.
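As background, the T-algorithm as it is commonly described keeps only those trellis states whose path metric lies within a threshold T of the current best metric, so that fewer states are processed in each step. The sketch below shows that pruning rule only; it does not model the pre-computation architecture of the paper, and the state labels and metric values are made up.

```python
def t_algorithm_prune(path_metrics, threshold):
    """Keep states whose path metric is within `threshold` of the best one.

    path_metrics maps a state label to its accumulated path metric, where a
    smaller metric means a more likely path.
    """
    best = min(path_metrics.values())
    return {state: metric for state, metric in path_metrics.items()
            if metric - best <= threshold}


# Hypothetical metrics for a four-state trellis after some decoding step.
survivors = t_algorithm_prune({0: 5, 1: 9, 2: 6, 3: 12}, threshold=2)
```

Note that the pruning decision needs the minimum over all path metrics before any state can be kept or dropped, which makes that search a natural candidate for optimization in a high-speed design.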
Network On Chip
1. Design methodology for on-chip bus architectures using system-on-chip
network protocol
Abstract
As the number of IP cores that can be integrated into a single chip has increased
significantly in recent years, various types of multi-layered bus architectures are
now being used. However, a reckless use of bus layers may lead to an excessive
number of wires and low resource utilization. To reduce such waste, researchers
have studied automated on-chip bus design methods for optimal architecture
synthesis. This study expands the existing studies in two aspects. First, it considers
all possible topologies and redefines the existing exploration problem, whereas the
existing studies assume only a few types of topologies. Second, the study includes
an exploration process based on a new on-chip bus protocol, system-on-chip
network protocol (SNP), as well as processes based on existing protocols to solve
the redefined problem. After the time complexity is investigated, it is found that
the problem is NP-hard. Accordingly, this study proposes fast search algorithms
that can be applied to each of the exploration steps. The proposed algorithms are
implemented as a software program of exploration. The overall reduction ratio of
the time complexity reaches about three millionths, with a maximal 16% increase
in communication time (CT). Considering today's design life cycle, this seems to be
a good trade-off.
2. Configuring algorithm for reconfigurable Network-on-Chip
architecture
Abstract
With the challenge that a larger number of cores will be integrated on a single
chip, Network-on-Chip (NoC) has gradually become the popular solution.
Recently, researchers have focused on improving the performance of NoCs to
achieve well-performing chips. In this paper, we propose a configuring
algorithm based on a reconfigurable NoC architecture to design
application-specific NoCs. The reconfigurable NoC architecture decreases the
design complexity and makes NoC design more flexible compared to the
topology-generation-floorplanning scheme and the mapping scheme, respectively.
Besides, our configuring algorithm aims at optimized networks with better
performance. For one specific application, we can choose the reconfigurable
NoC architecture with a suitable size and configure it according to the
communication relationship to make sure that the final network is optimized.
A cycle-accurate simulator is used to carry out simulations for three networks
designed by our scheme and two other methods for the same application under
the same environment. The results show that our network performs better.
3. A Novel Encoding Scheme for Low Power in Network on Chip Links
Abstract
Dynamic power dissipation in interconnects is a major contributor to power
consumption in Networks on Chip (NoCs). This is mainly due to two factors: self
switching activity of an individual link and coupling switching activity among
adjacent links. Two novel techniques are proposed to reduce power consumption
due to switching transitions and crosstalk. The first technique reorders the
data in such a way that switching transitions are reduced. The second technique
ensures that power consumption due to cross-coupling activity is reduced. An
end-to-end encoding scheme using two-stage coding to reduce power consumption
in wormhole-routed networks on chip is designed using the proposed power
reduction techniques. An encoder and decoder implementing the proposed scheme
have been described at RTL level in Verilog HDL, synthesized and mapped onto
the UMC 180 nm technology library. It has been observed that the proposed
technique (TSC) offers an average reduction in dynamic power consumption of
17.34%. The proposed scheme was compared with existing techniques, and the
observations showed that there was not much degradation in area, speed and
static power dissipation. Power reduction for different kinds of data streams
was analyzed, and the results indicate that the proposed scheme offers uniform
power reduction irrespective of the nature of the data stream, unlike the
existing techniques.
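The effect of data ordering on self-switching activity can be illustrated with a toy model. The abstract does not say how the first technique reorders the data, so the greedy nearest-neighbor ordering below is a hypothetical stand-in, as are the function names and the 4-bit flits; a real scheme would also need to convey the original order to the decoder, which this sketch ignores.

```python
def transitions(a, b):
    """Number of link wires that toggle when flit b follows flit a."""
    return bin(a ^ b).count("1")


def total_transitions(flits):
    """Self-switching activity of a flit sequence sent over one link."""
    return sum(transitions(a, b) for a, b in zip(flits, flits[1:]))


def greedy_reorder(flits):
    """Keep the first flit, then always send the remaining flit that
    toggles the fewest wires relative to the one just sent."""
    ordered, remaining = [flits[0]], list(flits[1:])
    while remaining:
        nxt = min(remaining, key=lambda f: transitions(ordered[-1], f))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered


# Hypothetical 4-bit flits.
packet = [0b0000, 0b1111, 0b0001, 0b1110]
before = total_transitions(packet)                 # 11 wire toggles
after = total_transitions(greedy_reorder(packet))  # 5 wire toggles
```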
4. Dynamic buffer management to improve the performance of fault
tolerance adaptive Network-On-Chip applications
Abstract
Networks-On-Chip are developed with a trade-off between latency and power
dissipation defined at design time. But, if the communication pattern is changed,
decisions taken at design time (say buffer size) may result in large area and power
consumption or higher latency. Using large buffers to guarantee performance leads
to excessive power dissipation. Small buffers reduce power consumption but result
in increased latency. The purpose of the proposed work is to design a
heterogeneous router where the buffer slots are dynamically assigned to improve
the performance, under different communication needs in fault-tolerant adaptive
NoC applications. In the proposed router, buffer slots can be dynamically
re-allocated for various applications to improve performance. Reallocation is
based on the number of hotspots using EBLA (Extended Buffer Loan Algorithm).
Introducing oversized IPs (OIPs) may break the regular mesh-based NoC
architecture. The resulting mesh-based NoC becomes irregular and needs new
routing algorithms to solve routing problems in case of faulty links. A NoC
with an irregular 2D mesh topology is considered and a fault-tolerant adaptive
routing algorithm is used.
5. Congestion mitigation using flexible router architecture for
Network-on-Chip
Abstract
An important topic in Network-on-Chip (NoC) design is the tradeoff between area
and performance. Some techniques tend to increase the number of buffers to
improve performance. However, this method increases both the chip area and the
power consumption. In this paper we introduce a new flexible router architecture
that can improve the performance of the overall network using the same amount of
available buffering, but in a more efficient way. Therefore there is no need to
increase the size of buffers or to use extra virtual channels (VCs), which have
high power and area overheads or complex logic. If there is a request to a busy
buffer, the router will store the incoming packet in any other suitable free
buffer in the router. The flexible router shows an increase in performance in
terms of the saturation rate for Hotspot, Uniform, and Nearest-Neighbor
traffic, especially Hotspot with an 11.4% increase. A discussion of the area
overhead over a standard base router and an analysis of packets arriving out
of order (a side effect) are also presented.
6. Performance evaluation of a flow control algorithm for Network-
on-Chip
Abstract
Network-on-chip (NoC) has been proposed for SoC (System-on-Chip) as an
alternative to on-chip bus-based interconnects to achieve better performance and
lower energy consumption. Several approaches have been proposed to deal with
NoC design and can be classified into two main categories: design-time
approaches and run-time approaches. Design-time approaches are generally
tailored for an application domain or a specific application by providing a
customized NoC. All parameters, such as routing and switching schemes, are
defined at design time. Run-time approaches, however, provide techniques that
allow a NoC to continuously adapt its structure and its behavior (i.e., at runtime).
In this paper, performance evaluation of a flow control algorithm for congestion
avoidance in NoCs is presented. This algorithm allows NoC elements to
dynamically adjust their inflow by using a feedback control-based mechanism.
Analytical and simulation results are reported to show the viability of this
mechanism for congestion avoidance in NoCs.
7. Low-area boundary BIST architecture for mesh-like network-on-
chip
Abstract
This paper proposes a Built-In Self-Test (BIST) architecture for targeting the
routing infrastructure of mesh-like NoCs from their boundaries. The architecture
contains a counter and a Finite State Machine (FSM) implementing the test
configurations. Test data is generated and test responses compacted by a dedicated
hardware structure requiring very little silicon area. The advantage of this new
boundary BIST concept with respect to existing methods is that costly data
wrappers in the NoC network are unnecessary, and thus, area and performance
penalties are avoided. We have also improved previously developed test
configurations. Experiments show that up to two orders of magnitude gains in the
speed of testing are achieved using the new method for large NoCs.
8. Effect of Application Mapping on Network-on-Chip Performance
Abstract
Network-on-Chip (NoC) is a developing and promising on-chip communication
paradigm that improves scalability and performance of Systems-on-Chip. NoC
design flow contains many problems from different areas, for example networking,
embedded design and computer architecture. Application mapping is one of these
problems, which is well studied in literature but generally considered as a
communication energy minimization problem. The present study discusses the
effect of application mapping on network parameters such as average queuing
delay or packet loss rates of routers. On the other hand, self similarity is a
phenomenon that is used to characterize Ethernet and/or wide area network traffic,
as well as most on-chip network traffic. The main concern of this study is to
analyze the effect of application mapping on network related parameters by using
an on-chip traffic characterization that contains self similarity. The results of our
computational study show that mapping of cores may have a significant
degenerative effect on network performance, and so adding network related terms
to the application mapping problem may improve the overall on-chip network
performance considerably.
9. AdNoC: Runtime Adaptive Network-on-Chip Architecture
Abstract
Networks-on-chip (NoCs) have emerged as a promising on-chip interconnect for
future multi/many-core architectures as NoCs are able to scale communication
links with the growing number of cores. State-of-the-art NoC designs rely mainly
on a static network configuration using fixed routing algorithms and buffer
placements. These approaches are not effective in dealing with hard-to-predict
system behavior, for instance due to user behavior or varying workloads, since in
order for static NoCs to cover these scenarios, they would have to be designed for
worst case scenarios. In this paper, we address these problems with a runtime
adaptive network-on-chip (AdNoC). Focusing on the architecture-level adaptation,
we present an adaptive route allocation algorithm which provides a required level
of QoS (guaranteed bandwidth) coupled with an adaptive buffer assignment
scheme which reassigns buffer blocks on-demand. Furthermore, the adaptivity
requires a comprehensive, hardly intrusive, runtime observability infrastructure,
i.e., using monitoring components, in order to gather data on the system state. The
area overhead introduced by the adaptive scheme can be traded off against the
flexibility gained. Moreover, the area overhead is also reduced by resource
multiplexing due to the on-demand buffer assignment at each output port (we
achieved on average a 42% buffer saving in our experiments). We demonstrate
the advantage by using various digital media applications and compare our
approach to state-of-the-art static NoC architectures, e.g., Xpipe, QNoC, and
Æthereal.
10. Fine-Grained Bandwidth Adaptivity in Networks-on-Chip
Using Bidirectional Channels
Abstract
Networks-on-Chip (NoC) serve as efficient and scalable communication substrates
for many-core architectures. Currently, the bandwidth provided in NoCs is
overprovisioned for their typical usage case. In real-world multi-core applications, less
than 5% of channels are utilized on average. Large bandwidth resources serve to
keep network latency low during periods of peak communication demands.
Increasing the average channel utilization through narrower channels could
improve the efficiency of NoCs in terms of area and power; however, in current
NoC architectures this degrades overall system performance. Based on thorough
analysis of the dynamic behaviour of real workloads, we design a novel NoC
architecture that adapts to changing application demands. Our architecture uses
fine-grained bandwidth-adaptive bidirectional channels to improve channel
utilization without negatively affecting network latency. Running PARSEC
benchmarks on a cycle-accurate full-system simulator, we show that fine-grained
bandwidth adaptivity can save up to 75% of channel resources while achieving
92% of overall system performance compared to the baseline network. No
performance is sacrificed in our network design configured with 50% of the
channel resources used in the baseline.
11. Active Memory Processor for Network-on-Chip-Based
Architecture
Abstract
Memory-intensive operations and their memory access latency are often the
performance bottleneck in parallel applications. In this paper, we investigate the
concept of active memory operation which is an active data processing operation
performed on the memory side. Utilizing the active memory operation, we can
replace multiple transactions of memory accesses over the on-chip network and
related computations on the processor side with a smaller number of high-level
transactions and computations on the memory side. To realize the concept, we
have designed a special-purpose processor called active memory processor which
is tightly coupled with the memory and executes the active memory operations. In
our case studies, we have applied the concept to five real-world applications
(parallelized JPEG, FFT, text indexing for data mining, histogram, and eikonal
equation solver) running on a 36-tile architecture with 64 cores and four memory
tiles and found that the proposed approach can improve performance by 20.5 to
259.3 percent.
12. Implementation of CDMA technique for Network-on-Chip
Abstract
A Code-Division Multiple Access (CDMA) based on-chip communication network
is proposed in this paper. The proposed design features a novel encoding and
decoding scheme for CDMA transmission which improves area, latency and power
dissipation of the Network-on-Chip (NoC). The orthogonal and balance properties
of Walsh codes are used for the routing of data between the resources on the
network. The proposed CDMA encoding and decoding schemes are compared with
the conventional schemes. The overall area required to implement the proposed
CDMA NoC design is reduced by 54%. The design decreases the latency of the
network by 48.2%. The total power consumption required to achieve the proposed
design is decreased by 54.8%.
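The role of Walsh-code orthogonality can be illustrated with a minimal Python sketch of conventional CDMA spreading and correlation decoding. The abstract does not describe its novel encoding and decoding scheme, so this shows only the textbook approach; the function names (`walsh_codes`, `encode`, `decode`) and the three-sender setup are illustrative assumptions.

```python
def walsh_codes(order):
    """Return the 2**order Walsh codes (Sylvester-Hadamard rows) as +1/-1 lists."""
    rows = [[1]]
    for _ in range(order):
        # Each doubling step appends the row to itself and to its negation.
        rows = [r + r for r in rows] + [r + [-c for c in r] for r in rows]
    return rows

def encode(bits, codes):
    """Spread one bit per sender with its code and sum the chips on a shared link."""
    length = len(codes[0])
    channel = [0] * length
    for bit, code in zip(bits, codes):
        symbol = 1 if bit else -1  # map bit 1 -> +1, bit 0 -> -1
        for i in range(length):
            channel[i] += symbol * code[i]
    return channel

def decode(channel, code):
    """Recover one sender's bit by correlating the summed chips with its code."""
    correlation = sum(c * k for c, k in zip(channel, code))
    return 1 if correlation > 0 else 0

codes = walsh_codes(2)   # four codes of length 4
senders = codes[1:]      # skip the all-ones row, which is not balanced
bits = [1, 0, 1]         # hypothetical data, one bit per sender
channel = encode(bits, senders)
recovered = [decode(channel, code) for code in senders]
print(recovered)         # prints [1, 0, 1]
```

Because distinct codes have zero correlation, each receiver sees only its own sender's contribution. The balance property (equal numbers of +1 and -1 chips) holds for every row except the all-ones row, which is why the sketch leaves that row unused.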
Cryptography
1. A novel architecture for VLSI implementation of RSA cryptosystem
Abstract
The RSA system is widely employed in networking applications and achieves good
performance and high security. In this paper, we use Verilog to implement a 16-bit
RSA block cipher system. The whole implementation includes three parts: key
generation, encryption and decryption process. The key generation stage aims to
generate a public and private key pair, and the private key is then
distributed to the receiver according to certain key distribution schemes. The memory
usage and overhead associated with the key generation is eliminated by the
proposed system model. The cipher text can be decrypted at the receiver side using the RSA
private key. The design is simulated in Xilinx and the hardware is synthesized using RTL
Compiler. The existing and proposed models are then analyzed for performance
measures using Synopsys Design Vision. The netlist generated by RTL Compiler
will be used to generate the IC layout.
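The three stages named in the abstract (key generation, encryption, decryption) can be sketched in Python for a 16-bit modulus. This is a software illustration only, not the Verilog design; the primes 251 and 241, the public exponent 7, and the function names are assumptions chosen so that the modulus fits in 16 bits. A modulus this small offers no real security.

```python
from math import gcd

def generate_keys(p, q, e):
    """Build an RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)*(q-1)")
    d = pow(e, -1, phi)  # modular inverse; needs Python 3.8 or later
    return (e, n), (d, n)

def encrypt(message, public_key):
    """Encrypt an integer message smaller than the modulus."""
    e, n = public_key
    return pow(message, e, n)

def decrypt(cipher, private_key):
    """Decrypt with the private exponent."""
    d, n = private_key
    return pow(cipher, d, n)

# Assumed toy parameters: 251 * 241 = 60491, which fits in 16 bits.
public_key, private_key = generate_keys(251, 241, 7)
message = 12345
cipher = encrypt(message, public_key)
plain = decrypt(cipher, private_key)
print(plain == message)
```

In hardware, the `pow(..., n)` calls correspond to a modular exponentiation unit, which typically dominates area and delay.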
2. VLSI Implementation of Advanced Encryption Standard
Abstract
Information security is always a primary concern for users. It is
required to protect data, and a number of approaches and
software tools are available to achieve it. The proposed work
presents one such cryptographic technique, AES, and
implements it in hardware using VHDL. Active-HDL software is used for
simulation and implementation. The system accepts the
input and performs the encoding; the decoding process is then defined. The results
are presented as waveforms.
3. Highly secured high throughput VLSI architecture for AES algorithm
Abstract
This paper provides an efficient VLSI architecture to increase the throughput and
security of the Advanced Encryption Standard (AES) algorithm. Existing
architectures use a look-up table for the SubBytes and inverse
SubBytes transformations of the AES algorithm; our proposed technique uses
combinational circuits and pipelining, which increase the throughput and
reduce the delay. This design proposes a new technique for implementing the
S-box, which determines the speed and power of the AES architecture. The basic
components of this architecture are made completely fault detectable by using
pseudo-nMOS technology, which increases the security of the system. This
AES design was modeled using Verilog HDL and synthesized using TSMC's 90
nm standard cell library with RTL Compiler, and physical design implementation
was done using SOC Encounter, achieving a throughput of 58.18 Gbps
after detailed routing. The basic security of the system is validated using
Cadence Virtuoso in the transistor-level design.
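To show what replacing the look-up table with logic means, the sketch below computes the AES S-box arithmetically in Python, following the standard definition: a multiplicative inverse in GF(2^8) followed by an affine transformation. The abstract does not describe its actual circuit, so this is only the mathematical function that such a combinational block has to realize; the helper names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES field polynomial
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 = 1
        result = gf_mul(result, a)
    return result

def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox(x):
    """SubBytes value for one byte: field inverse, then the affine transform."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

print(hex(sbox(0x00)), hex(sbox(0x53)))  # prints 0x63 0xed
```

A hardware version would unroll these operations into XOR/AND gates and insert pipeline registers between stages, whereas a table-based design stores all 256 outputs in memory.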
4. A non linear equation based cryptosystem for image encryption and
decryption
Abstract
In this paper a new approach for image encryption and decryption using a chaotic
map and a non linear equation known as the BB equation is described. Chaotic maps
have been widely used in data encryption. Various chaos map based encryption
and decryption algorithms are used but are found to be insecure. Hence a new
method is implemented based on BB (Brahmagupta-Bhaskara) equation which is
combined with chaos to give a non linear dependency and thus improved security.
VLSI architecture for the proposed algorithm is designed and realized using Xilinx
ISE VLSI software for hardware implementation.
Low power VLSI
1. Routing-efficient implementation of an internal-response-based BIST
architecture
Abstract
Recently, internal-response-based BIST techniques have been proposed. By using internal
circuit responses to directly generate test patterns, these techniques can
significantly reduce or even eliminate storage requirement for test data. For these
techniques, appropriate routing of the circuit internal nets to the BIST circuitry is
crucial for minimizing the required area overhead and the induced performance
impact. In this paper, an efficient net sharing algorithm together with special
response decompressor hardware is proposed to minimize the total number of
required internal nets for an internal-response-based BIST scheme. Experimental
results show that on average 3.24% of nets and 2.83% area overhead of the
response decompressors are sufficient to achieve complete fault coverage for
ISCAS'85 circuits.
2. A high throughput sort free VLSI architecture for wireless applications
Abstract
Multiple Input Multiple Output (MIMO) technology is used in wireless
communications to achieve high data rates. The use of multiple antennas at both
transmitter and receiver significantly increases the capacity and spectral efficiency of wireless
systems. This paper presents a Field Programmable Gate Array (FPGA)
implementation for a 4 × 4 breadth first K-best MIMO decoder using a 64
Quadrature Amplitude Modulation (QAM) scheme. A novel sort-free approach to
path extension, together with quantized metrics, results in high throughput, low power
and low area. Finally, VLSI architectural tradeoffs are explored for a design synthesized using
Synopsys tools, with power and throughput analysis in 120 nm technology. The
power needed is 20.0025 μW.
3. Design and implementation of high-performance high-valency Ling
adders
Abstract
Parallel prefix adders are used for efficient VLSI implementation of binary number
additions. Ling architecture offers a faster carry computation stage compared to the
conventional parallel prefix adders. Recently, Jackson and Talwar proposed a new
method to factorize Ling adders, which helps to reduce the complexity as well as
the delay of the adder further. This paper discusses the design and implementation
details for such lower complexity, fast parallel prefix adders based on Ling theory
of factorization. In particular, valency or radix, the number of inputs to a single
node, is explored as a design parameter. Several low and high valency adders are
implemented in 65 nm CMOS technology. Experimental results show that the
high-valency Ling adders have superior area×delay characteristics over previously
reported Ling-based or non-Ling adders for the same input size. Moreover, our 20-
bit high valency adder has a better area×delay measurement than the previously-
published 16-bit adders.
4. Design of 64-bit low power parallel prefix VLSI adder for high speed
arithmetic circuits
Abstract
The addition of two binary numbers is the basic and most often used arithmetic
operation on microprocessors, digital signal processors and data processing
application specific integrated circuits. Parallel prefix adder is a general technique
for speeding up binary addition. This method implements logic functions which
determine whether groups of bits will generate or propagate a carry. The proposed
64-bit adder is designed using four different types of prefix cell operators: even-dot
cells, odd-dot cells, even-semi-dot cells and odd-semi-dot cells. It offers robust
adder solutions typically used for low-power and high-performance design
applications. Parallel prefix adders with various input ranges are compared
in terms of power, number of transistors, and number of nodes.
The Tanner EDA tool was used for simulating the parallel prefix adder designs in
250 nm technology.
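The generate/propagate idea behind parallel prefix addition can be sketched in Python. The example uses a Kogge-Stone style prefix network as a generic stand-in; it does not model the even-dot, odd-dot and semi-dot cells of the proposed adder, and the function name `prefix_add` is an assumption for illustration.

```python
def prefix_add(a, b, width=64):
    """Add two unsigned integers with a Kogge-Stone style prefix network.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    g = a & b & mask     # generate: bit position creates a carry by itself
    p = (a ^ b) & mask   # propagate: bit position passes an incoming carry on
    half_sum = p         # sum bits before carries are applied
    dist = 1
    while dist < width:
        # Combine each group with the group 'dist' positions below it:
        # (g, p) o (g', p') = (g | (p & g'), p & p')
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist <<= 1       # log2(width) stages in total
    carries = (g << 1) & mask            # carry into bit i is the group generate of bits 0..i-1
    carry_out = (g >> (width - 1)) & 1
    return half_sum ^ carries, carry_out

total, carry = prefix_add(0xFFFF_FFFF_FFFF_FFFF, 1)
print(hex(total), carry)
```

Each loop iteration corresponds to one level of prefix cells, so a 64-bit adder needs six levels; different prefix topologies trade the number of cells and wiring against this depth.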
5. Low-power dissipation using FPGA architecture
Abstract
Power optimization is the process of generating the best design in digital VLSI
circuits without violating design specifications. In this paper, the existing FPGA
routing switch is compared with the proposed low-power FPGA routing circuitry.
The experimental results show that the power dissipation in the proposed technique
is less than in the existing FPGA design.
4G Technology
1. Design and implementation of an optical OFDM baseband receiver
in FPGA
Abstract
In this paper, a baseband receiver design and its FPGA implementation for an
OOFDM system aimed at the NG-PON (passive optical network) applications are
presented. A low cost IMDD (intensity modulation, direct detection) architecture is
adopted and baseband DSP measures are employed to compensate various optical
impairments. Targeting a 4GSps throughput rate, an 8-way parallel architecture is
developed to perform the synchronization, FFT and equalization each with massive
parallelism. A real valued FFT module taking advantage of the Hermitian spectrum
is also developed to reduce the circuit complexity significantly. The simulation
results show the proposed baseband receiver is capable of achieving an 8Gbps
(effective) transmission bandwidth for 64-QAM coded OFDM symbols over a
25km long single mode fiber network. The uncoded BER reaches 10^-3 when the
received optical power is -16dBm. Due to speed and resource limitations, the
FPGA implementation obtains a fully functional but speed degraded system. The
maximum working frequency is 250 MHz, which is one half of the 500MHz
required for real time processing. The design occupies 21,423 logic slices and 56
embedded multiplier modules.
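The Hermitian-spectrum property that the real-valued FFT module exploits can be checked with a short Python sketch. For a real input of length N, bin N-k is the complex conjugate of bin k, so only bins 0 to N/2 need to be computed. The naive DFT below is for illustration only and is not the paper's parallel FFT architecture; the function names and the sample values are assumptions.

```python
import cmath

def dft_bins(x, bins):
    """Naive DFT of sequence x, evaluated only at the requested bin indices."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in bins
    ]

def real_dft_half(x):
    """Bins 0..N/2 of a real sequence (N even); the rest follow by symmetry."""
    return dft_bins(x, range(len(x) // 2 + 1))

def rebuild_full(half, n):
    """Fill in bins N/2+1..N-1 using X[N-k] = conj(X[k])."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full

samples = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 0.0, -2.0]  # hypothetical real input
half = real_dft_half(samples)
full = rebuild_full(half, len(samples))
reference = dft_bins(samples, range(len(samples)))
max_error = max(abs(a - b) for a, b in zip(full, reference))
print(max_error < 1e-9)
```

Computing roughly half of the bins is where the reduction in circuit complexity comes from; a hardware implementation would apply the same symmetry inside a fast FFT structure rather than a direct DFT.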
2. Performance analysis of Multiple carrier code division multiple
access system
Abstract
Multi-Carrier Code Division Multiple Access (MC-CDMA) is a suitable choice
for achieving high data rates in next generation wireless communication systems.
MC-CDMA combines the CDMA and OFDM schemes and so gains
the advantages of both. Capacity planning is one of the major
issues in designing a wireless communication system, and it
depends greatly on the bit error rate (BER). This study
investigates the BER performance of an MC-CDMA system over a Rayleigh fading
channel for different spreading code lengths. Walsh-Hadamard (W-H) code has
been chosen for spreading, which reduces the multiple access interference (MAI)
in the downlink due to its orthogonal property. Simulation results show the
improvement in BER performance with the increasing length of the spreading
code. A comparative study of BER performance over different modulation
techniques also shows that the minimum BER is obtained with the BPSK modulation
technique.