# VLSI-Oriented Lossy Image Compression Approach using DA-Based 2D-Discrete Wavelet

Devangkumar Shah<sup>1</sup> and Chandresh Vithlani<sup>2</sup> <sup>1</sup>Electronics and Communication Department, RK University, India <sup>2</sup>Electronics and Communication Department, Government Engineering College, India

Abstract: In this paper, we introduce a VLSI-oriented lossy image compression approach based on the Discrete Wavelet Transform (DWT), which is widely used as the core of digital image compression. The Distributed Arithmetic (DA) technique is applied to compute the wavelet coefficients, so that the number of arithmetic operations is reduced substantially. In addition, the compression rate is improved by introducing an $R_W$ block that sets some of the coefficients obtained from the high pass filter to zero. Subsequently, Differential Pulse-Code Modulation (DPCM) and Huffman encoding are applied to obtain the binary sequence of the image. The functional simulation of each module is presented, and the performance of each module is analyzed in terms of gates required, clock cycles required, power, processing rate, and processing time. The analysis shows that the DPCM module requires more gates than the other modules. Finally, the proposed compression approach is compared with existing methods in terms of processor area and power. The comparison shows that the proposed method offers better power efficiency, 0.328 mW/chip, than the prior methods.

Keywords: Image compression, DWT, DA, DPCM, Huffman coding.

Received November 11, 2011; accepted May 22, 2012; published online January 29, 2013

# 1. Introduction

One major problem in the transmission and storage of raw images is the large amount of disk space they require. Thus, there is an ever increasing need for a potent and robust technique for compressing such images. A compression technique that is fast, memory efficient, and simple can satisfy the requirements of the user [34]. Generally, image compression refers to the compression of data in digital images. Its principal goal is to reduce the redundancy of the image data so that the data can be stored or transmitted effectively while providing the best image quality at a given bit-rate or compression rate. Two types of compression techniques are commonly employed for image compression: lossless and lossy [35]. Lossy compression that produces indiscernible differences can be called visually lossless [23, 25]. In lossless image compression, the compression ratio obtained is extremely low, so considerable resources cannot be saved. Lossy image compression compromises the resulting image quality in a way that the viewer hardly notices. The loss in image quality increases with the degree of compression, which in turn saves more resources [23, 25].

In recent years, wavelet theory and its application in image compression have progressed rapidly [7, 22, 23, 25, 30]. The field of wavelets is still sufficiently new, and further progress will continue to be reported in several areas. One of the most important processing components of image compression is the wavelet transform [15].

Discrete Wavelet Transform (DWT), quantization, and entropy encoding are the three main sequential steps of compression. After preprocessing, each component is analyzed individually by an appropriate discrete wavelet transform [28]. With the emergence of the JPEG 2000 standard, substantial attention has been paid to the development of efficient DWT system architectures. Field-Programmable Gate Array (FPGA) implementations can speed up the DWT by pipelining its operations. For real-time signal processing, several DWT-based VLSI architectures have been designed and implemented [5, 29, 31]. For the 1-D DWT, the architectures are classified into three types: convolution-based [8], lifting-based [1, 20, 25, 26], and B-spline-based [21]. The convolution-based method implements two-channel filter banks directly. The lifting-based method exploits the relationship between the low pass and high pass filters to save multipliers and adders [14, 37], whereas the B-spline-based method reduces the number of multipliers through B-spline factorization [21]. The B-spline-based architectures require fewer multipliers, while the lifting scheme fails to reduce the intricacy.

In wavelet image compression, quantization, a lossy compression technique, is carried out after the wavelet transform; it maps a set of values to a single quantum value. When the number of distinct symbols in a given stream is reduced, the stream becomes more compressible. For example, the file size of a digital image can be reduced by reducing the number of colors used to define it. JPEG quantizes Discrete Cosine Transform (DCT) data, whereas JPEG-2000 quantizes DWT data. After quantization, the quantized DWT coefficients are converted into sign-magnitude representation before entropy coding because of the intrinsic characteristics of the entropy encoding process. Normally, entropy coding can provide a much shorter image representation by means of short code words for probable images and longer code words for less probable images [13]. Entropy encoding, a lossless type of compression, is applied to an image to make its storage more efficient. Normally, either 8 bits or 16 bits are necessary to store a pixel of a digital image. With efficient entropy encoding, however, a smaller number of bits is adequate to represent a pixel, which results in less memory being used to store or transmit an image. Moreover, the Karhunen-Loeve theorem allows us to select the best basis for encoding in order to reduce the entropy and error for better image representation with efficient storage and transmission. Engineers have also employed Shannon-Fano entropy, Huffman coding, Kolmogorov entropy, and arithmetic coding in various applications [36].

In our study, we propose a wavelet-based image compression algorithm built on the popular Distributed Arithmetic (DA) technique. Here, the number of wavelet coefficients is reduced using the $R_W$ block in each level of computation in order to increase the compression rate. Then, Differential Pulse-Code Modulation (DPCM) is applied as the quantization technique to narrow the range of the wavelet coefficients. Finally, the transformed wavelet coefficients are given to a Huffman encoder, which is designed by combining the least probable symbols so that the images are compressed. The main contributions of the paper are as follows:

- To improve the compression rate, we have included one more block, $R_W$, in the wavelet computation.
- To achieve low power consumption, we utilize the DA-based wavelet technique.
- To improve the significance, the above two points are incorporated into the standard JPEG 2000 compression method, with DPCM as the quantization method and Huffman coding as the entropy encoder.
- To evaluate the performance, parameters such as gates required, clock cycles required, power, processing rate, and processing time are employed.
- To prove the competency, we have conducted a comparative analysis against the prior methods in terms of power efficiency.

The paper is organized as follows: The review of recent works is given in section 2, the overview of wavelets is discussed in section 3, the system architecture of the proposed image compression approach is described in section 4, the functional simulation of the proposed approach is discussed in section 5, the performance analysis and comparative analysis of the proposed image compression approach are given in sections 6 and 7, respectively and finally, the conclusion is given in section 8.

# 2. Review of Related Works

In the literature, a handful of low power architectures have been presented for computing the wavelet coefficients. Our proposed methodology concentrates mainly on wavelet-based image compression. In prior works, low power architectures were employed to compute the wavelet transform so that overall efficiency could be achieved. These architectures were typically based on lifting, distributed arithmetic, or B-spline factorization. Huang et al. [18] presented an architecture for the DWT based on B-spline factorization. The B-spline factorization comprises two parts: (1) the B-spline part and (2) the distributed part. The former was constructed by means of direct implementation or Pascal implementation, and the latter introduced multipliers and was implemented with Type-1 or Type-2 polyphase decomposition. Because the distributed part was designed to be as small as possible, the architectures employed fewer multipliers than the prior art but required more adders. However, many of the adders could be implemented in a smaller area at low speed because they were not on the critical path. Three case studies, using the JPEG2000 default (9, 7) filter, the (6, 10) filter, and the (10, 18) filter, were provided to demonstrate the potency of the architecture.

On the other hand, Liu et al. [27] presented a VLSI architecture that performs the line-based DWT by means of a lifting approach. The architecture comprises row processors, column processors, an intermediate buffer, and a control module. The row processor and column processor work as the horizontal and vertical filters, respectively. The intermediate buffer contains five FIFOs to store the temporary results of the horizontal filter. The control module schedules the output order to external memory. Compared with prior techniques, the architecture parallelizes all levels of the wavelet transform to compute the multilevel DWT within one image transmission time. In addition, a Decomposed Lifting Algorithm (DLA) was introduced by Chao and Peng [9], in which the image data is processed in raster scan manner in both the row processor and the column processor. Theoretical analysis revealed that the accuracy of DLA in terms of round-off noise and internal word-length was better than that of other lifting-based algorithms. A line-based architecture was also modeled to perform the DLA-based 2-D DWT with high performance and low memory by avoiding the implementation of a data buffer. For an N\*N image, only 4N internal memory was used for the 9/7 filter, with an output latency of 2N clock cycles. Compared with related 2-D DWT architectures, the size of the on-chip memory and the output latency were reduced substantially under the same arithmetic cost, memory bandwidth, and timing constraint.

Cao et al. [6] proposed a simple architecture for the 9/7 DWT based on DA. This architecture was designed by considering the periodicity and symmetry of the DWT to increase the performance and to reduce the computational redundancy. The inner product of the DWT coefficient matrix was distributed over the input by careful evaluation of the input, output, and coefficient word lengths. The elements in the space domain were processed by assigning the required computation using linear maps in the coefficient matrix. The architecture also has a regular data flow and low control complexity. The result was a DWT processor of low hardware intricacy for the 9/7 transform that allows a clock two times faster than the direct implementation. This design is well suited to image compression systems such as JPEG2000 and MPEG4. Farahani and Eshghi [16] proposed a design of the Discrete Wavelet Packet Transform with robust hardware acceleration. This design is based on a word-serial pipeline architecture and parallel filter processing. A high-pass filter and a low-pass filter were employed simultaneously in each level to accelerate the Discrete Wavelet Packet Transform. Compared with the design presented in [38], the design using parallel filters worked two times faster. The architecture was implemented using the internal multipliers of the FPGA, and the results of these implementations for various filter lengths were presented. This high speed architecture is well suited to on-line applications and can be implemented for the Discrete Wavelet Packet Transform with any number of tree levels.

Huang *et al.* [19] presented a detailed study of VLSI architectures for the 1-D and 2-D DWT in several aspects, and also designed three related architectures. The 1-D DWT and Inverse DWT (IDWT) architectures were classified into three categories: convolution-based, lifting-based, and B-spline-based. These categories were discussed in terms of hardware complexity, critical path, and registers. For the 2-D DWT, the large amount of frame memory access and the die area occupied by the embedded internal buffer were the most important problems. Different external memory scan techniques were applied for categorizing and analyzing the 2-D DWT architectures. The implementation problems of the internal buffer were also discussed, and some real-life experiments were performed to demonstrate that the area and power of the internal buffer were highly related to the memory technology and working frequency, rather than to the required memory size alone. In addition to the analysis, a B-spline-based IDWT architecture and an overlapped stripe-based scan technique were proposed. Finally, they developed an adaptable and efficient architecture for a one-level 2-D DWT, which exploits several findings of the presented analysis.

The literature also presents several VLSI implementations for image compression. Gupta et al. [17] presented the VLSI design of a Block Coder (BC) system that can process 21 megapixels per second. For the Bit Plane Coder (BPC), a Concurrent Symbol Processing (CSP) algorithm was used to process all 4 sample locations within a stripe-column in a single clock cycle during a pass. The BPC produced 1.21 Context Data (CxD) pairs per clock cycle. Moreover, an Arithmetic Coder (AC) was developed that processes 2 CxDs per clock cycle. An architecture was also designed for an intermediate buffer in order to allow efficient coupling of the proposed BPC and AC modules. The BC chip was implemented in TSMC 0.18 micrometer technology and occupied an area of $1.6 \text{ mm}^2$, with an equivalent gate count of 95,000 that includes 24,576 memory bits. Its throughput was higher, and so the JPEG2000 BC engine was efficient in handling both normal and causal modes of operation.

Rao and Latha [35] introduced a hybrid image compression method based on reversible blockade transform coding. This method, implemented over the Regions of Interest (ROIs), was based on the selection of coefficients that belong to diverse transforms. The method allows: (1) codification of numerous kernels at different levels of interest, (2) an arbitrarily shaped spectrum, and (3) proper adjustment of the compression quality of the image and the background. It is also unnecessary to modify the standard JPEG-2000 decoder. The image coding method was applied to diverse types of images, and a better performance was achieved for the selected regions. Lastly, the VLSI implementation of the method was shown. It was also shown that the kernels of the Hartley and Cosine transforms provided better performance than any other model. Uzun and Amira [39] presented the design and FPGA implementation of a non-separable 2-D DBWT architecture, which is the core of their High-Definition Television (HDTV) compression system. The architecture uses periodic symmetric extension at the image boundaries and thus conforms to the JPEG-2000 standard. Hardware implementation results on a Xilinx Virtex-2000E FPGA chip revealed that processing the 2-D DBWT at 105 MHz provided a satisfactory solution for the real-time calculation of the 2-D DBWT for HDTV compression.

# 3. Overview of Wavelet

The DWT is based on sub-band coding, which yields a fast computation of the wavelet transform. It is simple to implement and reduces the computation time and the resources required [12]. The DWT analyzes the signal at different frequency bands with different resolutions by decomposing it into a coarse approximation and detail information. The approximation components are obtained by passing the signal through the low pass filter H, which removes the high frequency components. Filtering halves the resolution but leaves the scale unaffected. Subsequently, the signal is subsampled by two, since half of the samples are now redundant. This subsampling does not affect the resolution, but it doubles the scale. Likewise, the detail coefficients are obtained by passing the signal through the high pass filter G [3]. For an image, these outputs are filtered again with the low pass and high pass filter coefficients to obtain the LL, LH, HL, and HH bands. Generally, the wavelet transform of the image is computed using the following equations, and Figure 1 illustrates the block diagram used to compute the wavelet coefficients.

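$$W_{Low}[n] = \sum_{k} x[k]\, h[n-k] \tag{1}$$

$$W_{High}[n] = \sum_{k} x[k]\, g[n-k] \tag{2}$$

Here, $x$ is the input signal, and $h$ and $g$ are the impulse responses of the low pass filter H and the high pass filter G, respectively.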
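As a minimal sketch of Equations (1) and (2) followed by the subsampling step described above, the following Python code computes one decomposition level of a 1-D signal. The Haar filter pair and the names `convolve`, `dwt_level`, `LOW_PASS`, and `HIGH_PASS` are illustrative assumptions; this section does not fix a particular filter, and the hardware described later does not compute the transform this way.

```python
import math


def convolve(x, f):
    """Full linear convolution: y[n] = sum_k x[k] * f[n - k]."""
    y = [0.0] * (len(x) + len(f) - 1)
    for k, xk in enumerate(x):
        for j, fj in enumerate(f):
            y[k + j] += xk * fj
    return y


def dwt_level(x, low_pass, high_pass):
    """One decomposition level: filter, then keep every second sample."""
    approx = convolve(x, low_pass)[1::2]   # coarse approximation
    detail = convolve(x, high_pass)[1::2]  # detail information
    return approx, detail


# Haar analysis filters (an assumption chosen for brevity).
S = 1.0 / math.sqrt(2.0)
LOW_PASS = [S, S]
HIGH_PASS = [S, -S]

approx, detail = dwt_level([4.0, 6.0, 10.0, 12.0], LOW_PASS, HIGH_PASS)
print(approx, detail)
```

For an image, the same routine would be applied along the rows and then along the columns of each output to obtain the LL, LH, HL, and HH bands.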



# 4. System Architecture for the Proposed Image Compression Approach

This section describes the system architecture of the proposed image compression approach, which increases the compression rate with low power consumption. Here, we follow the procedure of JPEG-2000 compression with some modifications in the wavelet computation. A DA-based wavelet is used in the proposed architecture so that a substantial performance gain can be achieved over the traditional arithmetic formulation of the wavelet computation. Subsequently, DPCM is applied to reduce the number of bits needed to represent the coefficients obtained from the wavelet, so that the compressibility of the images is improved. Then, the data is transformed into a bit stream with the aid of Huffman encoding, an entropy encoding algorithm employed for lossless data compression. The steps involved in the proposed image compression scheme are described in the following sections (as shown in Figure 2):

- 1. DWT module.
- 2. DPCM transformation.
- 3. Huffman encoding module.



Figure 2. The proposed image compression approach.

#### 4.1. DWT Module

Initially, the input image is given to the DWT module, which transforms it into wavelet coefficients. In the proposed wavelet architecture, the number of filter coefficients obtained after the decimation process is reduced according to the devised procedure. Here, a zig-zag scanning order is employed to determine the filter coefficients that are set to zero based on the neighborhood condition. This procedure is applied only to the filter coefficients obtained from the high pass filter so that the compression is improved. Accordingly, we have integrated this model into the wavelet computation, as shown in Figure 3. In this figure, the $R_W$ block is included in every wavelet transformation level.

$R_W$ block: This block is used to enhance the compressibility of the image by changing some of the wavelet coefficients to zero. Here, a pixel value is changed to zero only if the neighborhood pixels contain coefficient values. This process is applied to the coefficients obtained after the high pass filter so as to maintain the visual quality of the image. Subsequently, the output of the $R_W$ block and the low pass filter output are given to the next level to obtain the LL, LH, HL, and HH bands. The $R_W$ procedure is applied again to the high pass filter output of the second level, and this process is repeated up to the desired level of wavelet computation.



Figure 4 shows the VLSI architecture of the $R_W$ block, where a 1-bit counter is placed between the $W_H$ block and an m-bit register. The coefficients obtained from the $W_H$ block are stored in the m-bit register, which loads the input bits in parallel upon receiving a high signal on its CLK input from the counter and blocks its input otherwise. When the input is blocked, the respective values are filled with zero.



Figure 4. Implementation of $R_W$ block.
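The gating behaviour of Figure 4 can be sketched in software as follows. This is only a behavioural model of the counter-gated register under stated assumptions: the function name `rw_block` and the starting phase of the counter (`counter_start`) are hypothetical, and the neighborhood condition mentioned earlier is not modelled.

```python
def rw_block(high_pass_coeffs, counter_start=1):
    """Gate coefficients with a toggling 1-bit counter; blocked slots become 0."""
    counter = counter_start & 1
    out = []
    for coeff in high_pass_coeffs:
        # The register loads its input only while the counter output is high.
        out.append(coeff if counter == 1 else 0)
        counter ^= 1  # a 1-bit counter toggles on every cycle
    return out


gated = rw_block([5, -3, 7, 2, -1, 4])
print(gated)
```

With this model, every second high pass coefficient is replaced by zero, which lengthens the runs of identical symbols seen by the later coding stages.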

The DA-based wavelet is applied to reduce the computation time and the number of arithmetic operations. Distributed arithmetic computations are bit-serial in nature, i.e., each bit of the input samples must be indexed successively before a new output sample becomes available. A shift register continuously shifts the input data bits, which are combined with the filter coefficients stored in a Look-Up Table (LUT). Subsequently, the LUT outputs are accumulated by an adder to generate the final coefficients. The VLSI architecture of the DA-based wavelet is shown in Figure 5.



Figure 5. VLSI architecture of DA-based wavelet computation.
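To make the bit-serial idea concrete, the following Python sketch evaluates an inner product of input samples with fixed filter taps using only a look-up table, shifts, and additions. It is a textbook form of distributed arithmetic, not the authors' circuit; the integer taps in `COEFFS`, the 8-bit word length, and the function names are assumptions made for illustration.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of coeffs[k] for every bit k that is set in addr."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(samples, coeffs, bits=8):
    """Bit-serial inner product of two's-complement samples with fixed taps."""
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's-complement encoding
    acc = 0
    for b in range(bits):
        # The LUT address is formed from bit b of every input sample.
        addr = 0
        for k, word in enumerate(words):
            addr |= ((word >> b) & 1) << k
        term = lut[addr] << b
        # The sign bit has weight -2^(bits-1), so its term is subtracted.
        acc += -term if b == bits - 1 else term
    return acc


COEFFS = [3, -1, 2, 5]        # hypothetical integer filter taps
SAMPLES = [10, -4, 7, -128]   # must fit in `bits`-bit signed words
result = da_inner_product(SAMPLES, COEFFS)
print(result)
```

The loop runs once per input bit rather than once per tap, and no multiplier appears anywhere, which is the property that makes the technique attractive for low power hardware.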

#### 4.2. DPCM Coding

The wavelet coefficients obtained from the DWT module are passed to the DPCM coding stage, which enhances the compressibility of the images and at the same time performs the quantization. The wavelet coefficients obtained from the previous steps vary over a wide range, which degrades the compression rate of the images. To avoid this situation, the DPCM coding shown in Figure 6 is applied, which maps the wavelet coefficients into a shorter range. The steps involved in DPCM coding are:

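1. Obtain the forecast wavelet coefficient $\overline{F_W}$ from the $M$ previous wavelet coefficients using the equation below:

$$\overline{F_W}(n) = \mathrm{mean}\{F_W(n-1), F_W(n-2), \ldots, F_W(n-M)\} \tag{3}$$

2. Determine the residual coefficients from the forecast and the current coefficient $F_W(n)$:

$$R_c(n) = \overline{F_W}(n) - F_W(n) \tag{4}$$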



Figure 6. DPCM transformation.

The VLSI architecture shown in Figure 7 is used to implement the DPCM transformation. In this architecture, the wavelet coefficients are held in shift registers, and the adder section sums the coefficients in these registers. Subsequently, the wavelet coefficient value is predicted by performing the averaging operation via a multiplier. Finally, the subtractor computes the difference between the predicted coefficient and the current coefficient value to obtain the residual coefficient.



Figure 7. VLSI architecture of DPCM transformation.
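The prediction and residual steps of Equations (3) and (4) can be sketched in Python as below. The window length `m`, the zero-initialised history, and the use of an integer mean are assumptions made for illustration. No separate quantizer is modelled because the equations do not include one, so this sketch is exactly invertible.

```python
from collections import deque


def dpcm_encode(coeffs, m=4):
    """Residual = mean of the previous m coefficients minus the current one."""
    history = deque([0] * m, maxlen=m)  # shift register, zero-initialised (assumption)
    residuals = []
    for c in coeffs:
        predicted = sum(history) // m   # adder followed by averaging (integer mean)
        residuals.append(predicted - c) # subtractor: predicted minus current
        history.append(c)               # shift in the current coefficient
    return residuals


def dpcm_decode(residuals, m=4):
    """Invert dpcm_encode by rebuilding the same predictions."""
    history = deque([0] * m, maxlen=m)
    coeffs = []
    for r in residuals:
        predicted = sum(history) // m
        c = predicted - r
        coeffs.append(c)
        history.append(c)
    return coeffs


coeffs = [100, 104, 103, 108, 110, 109]
residuals = dpcm_encode(coeffs)
print(residuals)
```

Once the history has filled with real coefficients, the residuals of slowly varying input stay close to zero, which is the narrower range that the subsequent entropy coder benefits from.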

#### 4.3. Huffman Encoding Module

After the residual coefficients are obtained, Huffman coding is employed to convert them into a bit stream, as shown in Figure 8. To obtain the encoded bit stream, we first compute the frequency of each residual coefficient and arrange the coefficients in ascending order of frequency. Then, the two nodes with the lowest frequencies are merged, and the sum of their frequencies is assigned to the new node. The same process is repeated until a single node remains. Finally, a binary value is assigned to every node in accordance with its location (left or right). Each value thereby obtains one code vector, which is used to create the bit stream of the input image that is stored in place of the image. For the implementation of the Huffman encoding procedure, the VLSI architecture developed in [33] is employed in the proposed compression technique.



Figure 8. Huffman encoding procedure.
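The merging procedure described above can be sketched in software with a priority queue. This is a generic Huffman construction written for illustration, not the hardware architecture of [33]; the function names and the sample residual values are hypothetical, and symbols are assumed to be plain integers.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(symbols):
    """Return {symbol: bitstring} by repeatedly merging the two rarest nodes."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    tie = count()                           # tie-breaker keeps heap entries comparable
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second lowest frequency
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):         # internal node: (left, right)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                               # leaf: an integer symbol
            codes[node] = prefix

    assign(heap[0][2], "")
    return codes


def huffman_encode(symbols):
    """Encode a symbol sequence; returns (bitstring, code table)."""
    codes = huffman_codes(symbols)
    return "".join(codes[s] for s in symbols), codes


sample = [0, 0, 0, 0, -3, -3, -3, -7, -7, 5]  # hypothetical residual values
bits, codes = huffman_encode(sample)
print(codes, len(bits))
```

Frequent residuals receive short code words and rare ones receive long code words, so the total bit count falls below that of a fixed-length representation.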

# 5. Functional Simulation of the System Architecture of the Proposed Image Compression Approach

This section discusses the functional simulation of the system architecture of the proposed image compression approach. The modules are programmed in the Verilog hardware description language and then synthesized using the Active-HDL and Synplify Pro software. For evaluating the results, the developed modules are verified with the MATLAB EDA simulator. Simulation waveforms of the DWT module, DPCM module, and Huffman encoding module are shown in Figures 9, 10, and 11, respectively.



Figure 9. Functional Verilog simulation of DA-based DWT.



Figure 10. Functional Verilog simulation of DPCM transformation.



Figure 11. Functional Verilog simulation of Huffman encoding.

# 6. Performance Analysis of the Proposed Image Compression Approach

This section describes the performance analysis of the proposed image compression approach. We have analyzed the proposed architecture with two images (Lena and Baboon) at four sizes: 128\*128, 256\*256, 512\*512, and 1024\*1024. The performance is analyzed in terms of gates required, clock cycles required, power, processing rate, and processing time. The number of gates required by each module of the proposed approach is given in Table 1, which shows that the gate requirement does not vary with the size of the image. On the other hand, the DPCM module needs more logic gates than the other modules.

Table 1. Performance in terms of logic gates (number of gates required).

| Image  | Size      | DWT Module | DPCM Module | Huffman Encoding Module | Proposed Image Compression |
|--------|-----------|------------|-------------|-------------------------|----------------------------|
| Lena   | 128*128   | 302        | 5160        | 500                     | 5962                       |
| Lena   | 256*256   | 302        | 5160        | 500                     | 5962                       |
| Lena   | 512*512   | 302        | 5160        | 500                     | 5962                       |
| Lena   | 1024*1024 | 302        | 5160        | 500                     | 5962                       |
| Baboon | 128*128   | 302        | 5160        | 500                     | 5962                       |
| Baboon | 256*256   | 302        | 5160        | 500                     | 5962                       |
| Baboon | 512*512   | 302        | 5160        | 500                     | 5962                       |
| Baboon | 1024*1024 | 302        | 5160        | 500                     | 5962                       |

Next, we take the number of clock cycles as the parameter for analyzing the performance of the modules as well as of the proposed compression approach. In Table 2, the number of clock cycles varies with the size of the input image: the larger the input image, the more clock cycles the approach needs to complete the process.

Table 2. Performance in terms of clock cycles (number of clock cycles).

| Image  | Size      | DWT Module | DPCM Module | Huffman Encoding Module | Proposed Image Compression |
|--------|-----------|------------|-------------|-------------------------|----------------------------|
| Lena   | 128*128   | 262144     | 262144      | 262144                  | 262144                     |
| Lena   | 256*256   | 1048576    | 1048576     | 1048576                 | 1048576                    |
| Lena   | 512*512   | 4194304    | 4194304     | 4194304                 | 4194304                    |
| Lena   | 1024*1024 | 16777216   | 16777216    | 16777216                | 16777216                   |
| Baboon | 128*128   | 262144     | 262144      | 262144                  | 262144                     |
| Baboon | 256*256   | 1048576    | 1048576     | 1048576                 | 1048576                    |
| Baboon | 512*512   | 4194304    | 4194304     | 4194304                 | 4194304                    |
| Baboon | 1024*1024 | 16777216   | 16777216    | 16777216                | 16777216                   |

When power is considered as the parameter, no significant variation is observed across image sizes or modules. From Table 3, we can conclude that the power does not vary with the image or the image size. Similarly, the processing rate of every module, shown in Table 4, gives similar results.

Table 3. Performance in terms of power (µW).

| Image  | Size      | DWT Module | DPCM Module | Huffman Encoding Module | Proposed Image Compression |
|--------|-----------|------------|-------------|-------------------------|----------------------------|
| Lena   | 128*128   | 328        | 328         | 328                     | 328                        |
| Lena   | 256*256   | 328        | 328         | 328                     | 328                        |
| Lena   | 512*512   | 328        | 328         | 328                     | 328                        |
| Lena   | 1024*1024 | 328        | 328         | 328                     | 328                        |
| Baboon | 128*128   | 328        | 328         | 328                     | 328                        |
| Baboon | 256*256   | 328        | 328         | 328                     | 328                        |
| Baboon | 512*512   | 328        | 328         | 328                     | 328                        |
| Baboon | 1024*1024 | 328        | 328         | 328                     | 328                        |

Finally, the performance of the proposed image compression approach is analyzed in terms of the processing time shown in Table 5. As the image size increases, every module takes more time to complete the process. Furthermore, the processing time does not vary significantly between the different modules.

Table 5. Performance in term of processing time.

| Image  | Size      | DWT Module<br>Processing Time (sec) | DPCM Module<br>Processing Time (sec) | Huffman Encoding Module<br>Processing Time (sec) | Proposed Image Compression<br>Processing Time (sec) |
|--------|-----------|-------------------------------------|--------------------------------------|--------------------------------------------------|-----------------------------------------------------|
| Lena   | 128*128   | 0.262144                            | 0.262144                             | 0.262144                                         | 0.262144                                            |
|        | 256*256   | 1.0486                              | 1.0486                               | 1.0486                                           | 1.0486                                              |
|        | 512*512   | 4.1943                              | 4.1943                               | 4.1943                                           | 4.1943                                              |
|        | 1024*1024 | 16.7772                             | 16.7772                              | 16.7772                                          | 16.7772                                             |
| Baboon | 128*128   | 0.262144                            | 0.262144                             | 0.262144                                         | 0.262144                                            |
|        | 256*256   | 1.0486                              | 1.0486                               | 1.0486                                           | 1.0486                                              |
|        | 512*512   | 4.1943                              | 4.1943                               | 4.1943                                           | 4.1943                                              |
|        | 1024*1024 | 16.7772                             | 16.7772                              | 16.7772                                          | 16.7772                                             |
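
The processing times in Table 5 grow in proportion to the number of pixels. As a quick check (a sketch that uses only the values listed in the table; the variable and function names are illustrative), dividing each time by the pixel count gives roughly the same per-pixel cost of about 16 µs for every image size:

```python
# Processing times (sec) copied from Table 5, keyed by image side length.
table5_time_sec = {128: 0.262144, 256: 1.0486, 512: 4.1943, 1024: 16.7772}


def per_pixel_time(side, seconds):
    """Time spent per pixel for a square image of side*side pixels."""
    return seconds / (side * side)


per_pixel = {side: per_pixel_time(side, t) for side, t in table5_time_sec.items()}

for side, t in per_pixel.items():
    print(f"{side}*{side}: {t * 1e6:.2f} microseconds per pixel")
```

Quadrupling the pixel count therefore quadruples the processing time, which matches the step from each row of Table 5 to the next.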

# 7. Comparative Analysis

Table 6 compares the performance of the proposed approach with the previous approaches given in the literature [16, 17, 35, 38, 39]. A direct comparison of the different algorithms is not straightforward, because they use different designs, computational requirements, circuit complexities, image qualities, and image resolutions. Nevertheless, we present here the comparative report of the different VLSI architectures of the compression methods taken from [28], to which we have added the details of our proposed approach in order to compare the performance across various parameters. Most of the works taken for comparison present the architecture, algorithm, and VLSI hardware for image acquisition, storage, and compression on a single-chip CMOS image sensor, whereas the proposed approach concentrates mainly on compression. From Table 6, our work compares favorably with the previous methods in terms of the power needed to compress images.

Table 4. Performance in terms of processing rate.

| Image  | Size      | DWT Module<br>Processing Rate | DPCM Module<br>Processing Rate | Huffman Encoding Module<br>Processing Rate | Proposed Image Compression<br>Processing Rate |
|--------|-----------|-------------------------------|--------------------------------|--------------------------------------------|-----------------------------------------------|
| Lena   | 128*128   | 11 cycles/pixel               | 11 cycles/pixel                | 11 cycles/pixel                            | 11 cycles/pixel                               |
|        | 256*256   | 11 cycles/pixel               | 11 cycles/pixel                | 11 cycles/pixel                            | 11 cycles/pixel                               |
|        | 512*512   | 11 cycles/pixel               | 11 cycles/pixel                | 11 cycles/pixel                            | 11 cycles/pixel                               |
|        | 1024*1024 | 11 cycles/pixel               | 11 cycles/pixel                | 11 cycles/pixel                            | 11 cycles/pixel                               |
| Baboon | 128*128   | 11 cycles/pixel               | 11 cycles/pixel                | 11 cycles/pixel                            | 11 cycles/pixel                               |
|        | 256*256   | 11 cycles/pixel               | 11 cycles/pixel                | 11 cycles/pixel                            | 11 cycles/pixel                               |
|        | 512*512   | 11 cycles/pixel               | 11 cycles/pixel                | 11 cycles/pixel                            | 11 cycles/pixel                               |
|        | 1024*1024 | 11 cycles/pixel               | 11 cycles/pixel                | 11 cycles/pixel                            | 11 cycles/pixel                               |

Table 6. Comparative analysis of the proposed image compression approach.

| Compression Scheme    | DCT [4]           | QTD [2]           | Haar Wavelet [32]           | Predictive [24]         | SPIHT [26]         | AQ/QTD [10]              | Shoushun Chen et al. [11] | Our Work           |
|-----------------------|-------------------|-------------------|-----------------------------|-------------------------|--------------------|--------------------------|---------------------------|--------------------|
| Compression Type      | Lossy             | Lossy             | Lossy                       | Lossless                | Lossy              | Lossy                    | Lossy                     | Lossy              |
| Technology            | 0.5µm             | 0.35µm            | 0.35µm                      | 0.35µm                  | 0.5µm              | 0.35µm                   | 0.35µm                    | 0.35µm             |
| Array Size            | 104×128           | 32×32             | 128×128                     | 80×44                   | 33×25              | 64×64                    | 64×64                     | 128×128            |
| Processor Area        | 1.5mm<sup>2</sup> | 0.4mm<sup>2</sup> | 1.8mm<sup>2</sup>           | 0.11mm<sup>2</sup>      | 0.36mm<sup>2</sup> | 1.8mm<sup>2</sup>        | 0.55mm<sup>2</sup>        | 0.65mm<sup>2</sup> |
| Power                 | 80µW/frame        | 70mW/chip         | 26.2mW/chip<br>24.4mW/proc. | 150mW/chip<br>3mW/proc. | 0.25mW/chip        | 20mW/chip<br>6.3mW/proc. | 17mW/chip<br>2mW/proc.    | 0.328mW/chip       |
| Post-proc Requirement | Yes               | No                | No                          | No                      | No                 | No                       | No                        | No                 |
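
For readers who want to order the power figures of Table 6, the following sketch sorts the per-chip values copied from the table. The dictionary and variable names are illustrative, and the DCT entry [4] is omitted because it is reported per frame rather than per chip:

```python
# Per-chip power figures (mW) copied from Table 6. The DCT entry [4] is
# reported per frame rather than per chip, so it is left out here.
power_mw_per_chip = {
    "QTD [2]": 70.0,
    "Haar Wavelet [32]": 26.2,
    "Predictive [24]": 150.0,
    "SPIHT [26]": 0.25,
    "AQ/QTD [10]": 20.0,
    "Shoushun Chen et al. [11]": 17.0,
    "Our Work": 0.328,
}

# Sort the schemes from lowest to highest per-chip power.
ranking = sorted(power_mw_per_chip.items(), key=lambda item: item[1])

for scheme, mw in ranking:
    print(f"{scheme:<28}{mw:>8.3f} mW/chip")
```

Since the designs differ in array size and, in some cases, in technology, such an ordering is only indicative.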

# 8. Conclusions

In this paper, a wavelet-based image compression algorithm using the well-known DA technique was proposed. We added one more block, called $R_w$, to the wavelet computation to enhance the compression rate. In addition, the DA-based wavelet method was used to ensure low power consumption. The wavelet coefficients were then passed to the DPCM technique, which improves the compressibility of the image. Subsequently, the bit stream was generated from the transformed coefficients using Huffman coding. Finally, the modules were programmed in Verilog and then synthesized with the aid of Active-HDL and Synplify Pro. We analyzed the performance of every module using parameters such as gates required, clock cycles required, power, processing rate, and processing time. In addition, we conducted a performance analysis of the proposed architecture with two images (Lena and Baboon) of different sizes, i.e., 128\*128, 256\*256, 512\*512 and 1024\*1024. From the comparative analysis against the prior methods, we conclude that the proposed method offers good power-efficiency, corresponding to 0.328 mW/chip.
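
To illustrate the DA idea summarized above, the following is a minimal software sketch, not the Verilog design described in this paper. It assumes unsigned 8-bit input samples and integer filter taps; the tap values, sample values, and function names are hypothetical and chosen only for illustration. The inner product is accumulated bit-serially from a precomputed look-up table, so no multiplier is needed:

```python
def build_lut(coeffs):
    """Precompute, for every bit pattern, the sum of the selected coefficients."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(samples, lut, bits=8):
    """Inner product of unsigned `bits`-bit samples with the LUT's coefficients.

    One table look-up and one shift-and-add are performed per bit plane.
    """
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into a LUT address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        # Weight the partial sum by 2**b and accumulate.
        acc += lut[addr] << b
    return acc


# Hypothetical 4-tap filter and 8-bit samples (illustration only).
coeffs = [1, -3, 3, -1]
samples = [12, 200, 7, 255]

lut = build_lut(coeffs)
da_result = da_inner_product(samples, lut)
direct = sum(c * x for c, x in zip(coeffs, samples))
assert da_result == direct
```

The look-up table has 2^K entries for a K-tap filter, which is why DA trades memory for the removal of multipliers.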

# References

- [1] Andra K., Chakrabarti C., and Acharya T., "A VLSI Architecture for Lifting-Based Forward and Inverse Wavelet Transform," *IEEE Transaction Signal Process*, vol. 50, no. 4, pp. 966-977, 2002.
- [2] Artyomov E. and Yadid-Pecht O., "Adaptive Multiple-Resolution CMOS Active Pixel Sensor," *IEEE Transaction on Circuits System I: Regular Papers*, vol. 53, no. 10, pp. 2178-2186, 2006.
- [3] Baili J., Lahouar S., Hergli M., Amimi A., and Besbes K., "Application of the Discrete Wavelet Transform to Denoise GPR Signals," *in Proceedings of ISCCSP*, pp. 1-4, 2006.
- [4] Bandyopadhyay A., Lee J., Robucci R., and Hasler P., "Matia: A Programmable 80 µW/frame CMOS Block Matrix Transform Imager Architecture," *IEEE Journal Solid-State Circuits*, vol. 41, no. 3, pp. 663-672, 2006.
- [5] Bhuyan M., Amin N., Madesa M., and Islam M., "FPGA Realization of Lifting Based Forward Discrete Wavelet Transform for JPEG 2000," *International Journal of Circuits, Systems and Signal Processing*, vol. 1, no. 2, pp. 124-129, 2007.
- [6] Cao X., Xie Q., Peng C., Wang Q., and Yu D., "An Efficient VLSI Implementation of Distributed Architecture for DWT," *in Proceedings of the IEEE 8<sup>th</sup> Workshop on Multimedia Signal Processing*, Victoria, pp. 364-367, 2006.

- [7] Calderbank A., "Wavelet Transforms that Map Integers to Integers," *Applied and Computational Harmonic Analysis*, vol. 5, no. 3, pp. 332-369, 1998.
- [8] Chakrabarti C., Vishwanath M., and Owens R., "Architectures for Wavelet Transforms: A Survey," *Journal VLSI Signal Process*, vol. 14, no. 2, pp. 171-192, 1996.
- [9] Chao W. and Peng C., "Efficient Architecture for 2-Dimensional Discrete Wavelet Transform with Novel Lifting Algorithm," *Chinese Journal of Electronics*, vol. 19, no. 1, pp. 1-6, 2010.
- [10] Chen S., Bermak A., Yan W., and Martinez D., "Adaptive-Quantization Digital Image Sensor for Low-Power Image Compression," *IEEE Transaction Circuits System I: Regular Papers*, vol. 54, no. 1, pp. 13-25, 2007.
- [11] Chen S., Bermak A., and Wang Y., "A CMOS Image Sensor with on-Chip Image Compression Based on Predictive Boundary Adaptation and Memoryless QTD Algorithm," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 19, no. 4, pp. 538-547, 2011.
- [12] Chopade N., Ghatol A., and Kolte M., "Efficient Image Compression and Transmission using SPECK," in Proceedings of SPIT-IEEE Colloquium and International Conference, India, vol. 1, pp. 156-160, 2007.
- [13] Davis G. and Nosratinia A., "Wavelet-Based Image Coding: an Overview," *in Proceedings of Applied and Computational Control, Signals, and Circuits*, New York, vol. 1, pp. 205-269, 1998.
- [14] Daubechies I. and Sweldens W., "Factoring Wavelet Transforms into Lifting Steps," *Journal Fourier Analysis Applications*, vol. 4, no. 3, pp. 247-269, 1998.
- [15] Dhulap S. and Nalbalwar S., "Image Compression Based on IWT, IWPT & DPCM-IWPT," International Journal of Engineering Science and Technology, vol. 2, no. 12, pp. 7413-7422, 2010.
- [16] Farahani M. and Eshghi M., "Implementing a New Architecture of Wavelet Packet Transform on FPGA," in Proceedings of the 8<sup>th</sup> WSEAS International Conference on Acoustics & Music: Theory & Applications, Canada, pp. 37-41, 2007.
- [17] Gupta A., Dyer M., Hirsch A., Nooshabadi S., and Taubman D., "Design of a Single Chip Block Coder for the EBCOT Engine in JPEG2000," in Proceedings of the 48<sup>th</sup> Midwest Symposium on Circuits and Systems, pp. 63-66, 2005.
- [18] Huang C., Tseng P., and Chen L., "VLSI Architecture for Forward Discrete Wavelet Transform Based on B-Spline Factorization," *Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology*, vol. 40, no. 3, pp. 343-353, 2005.

- [19] Huang C., Tseng P., and Chen L., "Analysis and VLSI Architecture for 1-D and 2-D Discrete Wavelet Transform," *IEEE Transactions on Signal Processing*, vol. 53, no. 4, pp. 1575-1586, 2005.
- [20] Huang C., Tseng P., and Chen L., "Flipping Structure: an Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Transform," *IEEE Transaction Signal Processing*, vol. 52, no. 4, pp. 1080-1089, 2004.
- [21] Huang C., Tseng P., and Chen L., "VLSI Architecture for Discrete Wavelet Transform Based on B-Spline Factorization," in Proceedings of IEEE Workshop Signal Processing System, Taiwan, pp. 346-350, 2003.
- [22] Jeng Y., Hsu S., and Chang Y., "Entropy Improvement for Fractal Image Coder," *International Arab Journal of Information Technology*, vol. 9, no. 5, pp. 403-410, 2012.
- [23] Kharate G., Ghatol A., and Rege P., "Image Compression using Wavelet Packet Tree," *ICGST International Journal on Graphics, Vision and Image Processing*, vol. 5, no. 7, pp. 37-40, 2005.
- [24] Leon-Salas W., Balkir S., Sayood K., Schemm N., and Hoffman M., "A CMOS Imager with Focal Plane Compression Using Predictive Coding," *IEEE Journal Solid-State Circuits*, vol. 42, no. 11, pp. 2555-2572, 2007.
- [25] Lin C., Zhang B., and Zheng Y., "Packed Integer Wavelet Transform Constructed by Lifting Scheme," *IEEE Transactions on Circuits and Systems for Video Technology*, vol. 10, no. 8, pp. 1496-1501, 2000.
- [26] Lin Z., Hoffman M., Schemm N., Leon-Salas W., and Balkir S., "A CMOS Image Sensor for Multi-Level Focal Plane Image Decomposition," *IEEE Transaction Circuits System: Regular Papers*, vol. 55, no. 9, pp. 2561-2572, 2008.
- [27] Liu K., Wang K., Li Y., and Wu C., "A Novel VLSI Architecture for Real-Time Line-Based Wavelet Transform using Lifting Scheme," *Journal of Computer Science and Technology*, vol. 22, no. 5, pp. 661-672, 2007.
- [28] Maamoun M., Namane A., Neggazi M., Beguenane R., Meraghni A., and Berkani D., "VLSI Design for High-Speed Image Computing using Fast Convolution-Based Discrete Wavelet Transform," *in Proceedings of the World Congress on Engineering*, London, vol. I, pp. 1-5, 2009.
- [29] Mallat S., "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," *IEEE Transaction on Pattern Analysis and Machine Intelligence*, vol. 11, no. 7, pp. 674-693, 1989.

- [30] Munteanu A., Cornelis J., and Cristea P., "Wavelet-Based Lossless Compression of Coronary Angiographic Images," *IEEE Transactions on Medical Imaging*, vol. 18, no. 3, pp. 272-281, 1999.
- [31] Nebout C., Moury G., and Blamont J., "Status of Onboard Image Compression for CNES Space Missions," in Proceedings of SPIE, Applications of Digital Image Processing XXII, vol. 3808, pp. 242-256, 1999.
- [32] Nilchi A., Aziz J., and Genov R., "Focal-Plane Algorithmically-Multiplying CMOS Computational Image Sensor," *IEEE Journal Solid-State Circuits*, vol. 44, no. 6, pp. 1829-1839, 2009.
- [33] Pillai L., "Huffman Coding," available at: http://www.xilinx.com/support/documentation/application_notes/xapp616.pdf, last visited 2003.
- [34] Pujar J. and Kadlaskar L., "A New Lossless Method of Image Compression and Decompression using Huffman Coding Techniques," *Journal of Theoretical and Applied Information Technology*, vol. 15, no. 1, pp. 18-23, 2010.
- [35] Rao C. and Latha M., "A Novel VLSI Architecture of Hybrid Image Compression Model Based on Reversible Blockade Transform," in Proceedings of World Academy of Science, Engineering and Technology, USA, vol. 52, pp. 1016-1022, 2009.
- [36] Song M., "Entropy Encoding in Wavelet Image Compression," in Proceedings of Representations, Wavelets and Frames, Applied and Numerical Harmonic Analysis, Birkhäuser Boston, pp. 293-311, 2007.
- [37] Sweldens W., "The Lifting Scheme: A Custom-Design Construction of Bi-Orthogonal Wavelets," *Applied and Computational Harmonic Analysis*, vol. 3, no. 15, pp. 186-200, 1996.
- [38] Trenas M., Lopez J., and Zapata E., "FPGA Implementation of Wavelet Packet transform with Reconfigurable Tree Structure," *in Proceedings of the 26<sup>th</sup> Euromicro Conference*, Spain, vol. 1, pp. 244-251, 2000.
- [39] Uzun I. and Amira A., "Real-Time 2-D Wavelet Transform Implementation for HDTV Compression," *Real-Time Imaging*, vol. 11, no. 2, pp. 151-165, 2005.



**Devangkumar Shah** obtained his BS degree in electrical engineering from Gujarat University, India in 1999. Then he obtained his MS degree in microprocessor system applications from the MS University of Baroda, India in 2008. Currently, he is an assistant professor at the School of Engineering, RK University, India. His specializations include Bluetooth networks, networking, and virtual reality. His current research interests are digital signal and image processing, microprocessors, embedded systems, and VLSI.



**Chandresh Vithlani** obtained his BS degree in electronics and communication engineering from Gujarat University, India in 1991. Then he obtained his MS degree in electronics and communication engineering from Gujarat University, India in 1998 and his PhD degree in electronics and communication from Gujarat University in 2006. Currently, he is working as an associate professor in the Department of Electronics and Communication Engineering, Government Engineering College, India. He has published a number of papers in national and international conferences and journals. His current research interests are microprocessors, embedded systems, and digital signal and image processing.