

# General Description

This core can perform the two dimensional Discrete Cosine Transform (DCT) and its inverse (IDCT) on an 8x8 block of samples. The simple, fully synchronous design allows fast operation while keeping the gate count low. It offers high performance and a range of features suited to multimedia, digital video and digital printing applications.

## **Applications**

DCT/IDCT is a typical building block for image processing, printers, desktop video editing, digital still cameras, surveillance systems, and video conferencing cores.

#### **Features**

- DCT and IDCT, both supported on an 8x8 block of samples.
- DCT and IDCT operations performed at one clock per sample.
- DCT input precision 8 bits; output precision 12 bits.
- IDCT input precision 11 bits; output precision 8 bits.
- High clock speed and low gate count.
- Silicon proven.
- Suitable for JPEG designs.
- Fully synchronous design.
- Test benches provided.
- Available as a fully functional and synthesizable VHDL or Verilog core.

## **Symbol**



*[Figure: OL_DCT core symbol]*

**Pin Description**

| Name      | Type   | Description                                                                                                   |
|-----------|--------|---------------------------------------------------------------------------------------------------------------|
| RES_N     | Input  | Core asynchronous reset, active LOW.                                                                          |
| CLK       | Input  | Core clock signal.                                                                                            |
| START     | Input  | Start transform. A one clock cycle pulse marks the first sample of a series of blocks.                        |
| EN        | Input  | Core synchronous enable signal. When LOW, the core stalls.                                                    |
| IDCT      | Input  | When HIGH, the core performs the IDCT on the input samples; when LOW, it performs the DCT.                    |
| XH[10:0]  | Input  | Input sample to be transformed.                                                                               |
| XV[13:0]  | Input  | Intermediate data read from the memory.                                                                       |
| MEMH[5:0] | Output | Write address for intermediate data.                                                                          |
| MEMV[5:0] | Output | Read address for intermediate data.                                                                           |
| YH[13:0]  | Output | Intermediate data written to the memory.                                                                      |
| YV[11:0]  | Output | Transformed output sample.                                                                                    |
| READY     | Output | Output data valid. Pulses HIGH for one cycle when the first valid transformed sample of a block is produced. |

# **Functional Description**

This core can perform both the Discrete Cosine Transform (DCT) and its inverse (IDCT) on an 8x8 block of samples. The mathematical definitions of the DCT and IDCT are given below.

$$Y_{uv} = \frac{1}{4} C_u C_v \sum_{i=0}^{7} \sum_{j=0}^{7} X_{ij} \cos \frac{(2i+1)u\pi}{16} \cos \frac{(2j+1)v\pi}{16}$$

$$X_{ij} = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v Y_{uv} \cos \frac{(2i+1)u\pi}{16} \cos \frac{(2j+1)v\pi}{16}$$

Where $C_u = 1/\sqrt{2}$ for $u = 0$ and $C_u = 1$ otherwise; $C_v$ is defined in the same way in terms of $v$.
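To check the core's output in simulation, it helps to have a floating-point reference model of the two definitions above. The following Python sketch transcribes the formulas term by term (slow, but adequate for generating test vectors). The names `dct8x8` and `idct8x8` are hypothetical and are not part of the core's deliverables.

```python
import math

def c(k: int) -> float:
    """Normalisation factor: 1/sqrt(2) for k == 0, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def _cos(n: int, k: int) -> float:
    """cos((2n + 1) * k * pi / 16), the basis term shared by both transforms."""
    return math.cos((2 * n + 1) * k * math.pi / 16)

def dct8x8(x):
    """Forward 2-D DCT of an 8x8 block x[i][j], returning Y[u][v]."""
    return [[0.25 * c(u) * c(v) * sum(
                x[i][j] * _cos(i, u) * _cos(j, v)
                for i in range(8) for j in range(8))
             for v in range(8)]
            for u in range(8)]

def idct8x8(y):
    """Inverse 2-D DCT of an 8x8 block y[u][v], returning X[i][j]."""
    return [[0.25 * sum(
                c(u) * c(v) * y[u][v] * _cos(i, u) * _cos(j, v)
                for u in range(8) for v in range(8))
             for j in range(8)]
            for i in range(8)]

if __name__ == "__main__":
    block = [[(7 * i + 13 * j) % 256 - 128 for j in range(8)] for i in range(8)]
    back = idct8x8(dct8x8(block))
    err = max(abs(back[i][j] - block[i][j]) for i in range(8) for j in range(8))
    print(f"max round-trip error: {err:.2e}")
```

A floating-point model cannot be bit-exact with a fixed-point datapath, so comparisons against the core's YV samples should use a small tolerance rather than exact equality.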

In order to operate, this core must be connected to a 64x14 dual port RAM. This memory is written and read synchronously.

Input samples are provided on the XH port, while transformation results are available on port YV. If we consider a block of samples as shown in the picture below, the input port XH accepts rows of samples: input samples must be provided in the order $X_{00}, X_{01}, ..., X_{07}, X_{10}, ..., X_{70}, ..., X_{77}$.

Port YV outputs the transformed samples column by column (i.e. $Y_{00}, Y_{10}, ..., Y_{70}, Y_{01}, ..., Y_{77}$) after a latency of 72 clock cycles.
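The port ordering can be written down as index sequences. The Python helpers below are a hypothetical sketch (the names `row_order`, `column_order`, `stream_from_block` and `block_from_stream` are illustrative only). They turn an 8x8 block into the 64-sample stream expected on XH and reassemble the stream read from YV into a block.

```python
def row_order():
    """(i, j) pairs in the order samples are fed to XH for a DCT: X00, X01, ..., X77."""
    return [(i, j) for i in range(8) for j in range(8)]

def column_order():
    """(u, v) pairs in the order YV delivers DCT results: Y00, Y10, ..., Y70, Y01, ..., Y77."""
    return [(u, v) for v in range(8) for u in range(8)]

def stream_from_block(block, order):
    """Flatten an 8x8 block into a 64-sample stream following `order`."""
    return [block[r][c] for r, c in order]

def block_from_stream(stream, order):
    """Rebuild an 8x8 block from a 64-sample stream that follows `order`."""
    block = [[0] * 8 for _ in range(8)]
    for value, (r, c) in zip(stream, order):
        block[r][c] = value
    return block
```

For the IDCT the roles are swapped: the stream fed to XH follows `column_order()` and the samples read from YV follow `row_order()`.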

A one clock cycle pulse on the START input marks the very first sample $X_{00}$ of a series of blocks to be transformed.

The *IDCT* pin selects the type of transform to be performed on the input samples, DCT or IDCT. This input must be stable from the input sample  $X_{00}$  until at least the output sample  $Y_{77}$ .
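A test bench has to drive START, IDCT and XH according to the rules above. The following Python sketch produces one input vector per clock for a series of back-to-back blocks. The `Cycle` tuple and the `stimulus` generator are hypothetical names. Masking XH values to 11 bits as two's complement is an assumption about the number format of XH[10:0].

```python
from typing import Iterator, List, NamedTuple

class Cycle(NamedTuple):
    start: int  # START input
    idct: int   # IDCT input (0 = DCT, 1 = IDCT)
    xh: int     # XH[10:0] input sample, 11-bit two's complement (assumed)

def stimulus(streams: List[List[int]], idct: int) -> Iterator[Cycle]:
    """Yield one input vector per clock for back-to-back blocks.

    `streams` holds one 64-sample list per block, already in port order.
    START is HIGH only with the very first sample X00 of the series, and
    IDCT is held constant for the whole series.
    """
    first = True
    for samples in streams:
        if len(samples) != 64:
            raise ValueError("each block must contain 64 samples")
        for s in samples:
            yield Cycle(start=1 if first else 0, idct=idct, xh=s & 0x7FF)
            first = False
```

The generator stops at the last input sample. The bench must keep IDCT stable after that until the final output sample has been produced.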



When performing the IDCT, samples are input as columns (i.e.  $Y_{00}, Y_{10}, ..., Y_{70}, Y_{01}, ..., Y_{77}$ ) and output in rows ( $X_{00}, X_{01}, ..., X_{07}, X_{10}, ..., X_{77}$ ).

During the DCT, the input range of the core is -128 to +127 and the output range is -1024.0 to +1023.5 (12 bits, to be rounded to 11).
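The stated output range implies a fixed-point format. Twelve bits covering -1024.0 to +1023.5 in steps of 0.5 means one fractional bit. The sketch below decodes a raw YV word on that basis and rounds it to an 11-bit integer. The document does not specify the rounding rule, so the round-half-up with saturation in `yv_round11` is an assumption. All function names are hypothetical.

```python
def to_signed(raw: int, bits: int) -> int:
    """Interpret the low `bits` bits of `raw` as a two's complement integer."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw

def yv_value(raw: int) -> float:
    """Real value of a 12-bit YV word, assuming one fractional bit."""
    return to_signed(raw, 12) / 2

def yv_round11(raw: int) -> int:
    """Round a 12-bit YV word to an 11-bit integer coefficient.

    Assumed rule: round half up ((v + 1) >> 1), then saturate to the
    11-bit range so that +1023.5 does not overflow to +1024.
    """
    v = to_signed(raw, 12)
    return max(-1024, min(1023, (v + 1) >> 1))
```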

During the IDCT, the input range of the core is -1024 to +1023. Since exact computation of the IDCT would require infinite precision, the output can fall outside the range -128 to +127. In this case, clamping may be required.
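A DC-only block gives a concrete case in which an in-range IDCT input produces an out-of-range output. With only $Y_{00}$ non-zero, the IDCT definition reduces to $X_{ij} = \frac{1}{4} \cdot \frac{1}{2} \cdot Y_{00} = Y_{00}/8$ for every $i, j$. So $Y_{00} = 1023$ gives 127.875, which rounds to 128. The Python helpers below are a hypothetical illustration of that arithmetic and of a simple clamp; they are not a description of any clamping the core performs internally.

```python
def dc_only_idct(y00: int) -> float:
    """IDCT output X_ij of a block whose only non-zero coefficient is Y00.

    Every term of the sum vanishes except u = v = 0, where
    1/4 * C0 * C0 * cos(0) * cos(0) = 1/8, so X_ij = Y00 / 8 for all i, j.
    """
    return y00 / 8

def clamp8(x: float) -> int:
    """Round to the nearest integer (Python round(): ties go to even),
    then clamp to the 8-bit range -128..+127."""
    return max(-128, min(127, round(x)))
```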

## **Memory requirements**

The core requires a 64x14 dual port RAM (DCTRam) to correctly operate. This memory is used to store DCT and IDCT intermediate results.

The core uses one port as write only and the other as read only. The two ports must be independently addressable.

The core accesses both the write and the read port synchronously, with a clock synchronous to the core clock.

A simple model of the dual port RAM is included in the distribution and used by the test bench. Note that there is no write enable signal, since the write port is written continuously.
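The distributed RAM model is not reproduced here. The following is an independent Python sketch of the behaviour described in this section, which can be useful in a cycle-based model of the core. Class and method names are hypothetical. This document does not specify what a read returns when MEMH and MEMV address the same word on the same edge; the sketch assumes the previously stored data is returned (read-before-write).

```python
class DCTRamModel:
    """Behavioural sketch of a 64x14 synchronous dual port RAM.

    One write-only port and one read-only port with independent addresses.
    There is no write enable: on every enabled rising edge the write port
    stores its data. Read-before-write on an address collision is assumed.
    """
    DEPTH, WIDTH = 64, 14

    def __init__(self) -> None:
        self.mem = [0] * self.DEPTH
        self.q = 0  # registered read data (drives XV)

    def rising_edge(self, en: int, wr_addr: int, wr_data: int, rd_addr: int) -> int:
        """Advance one clock. wr_addr/wr_data come from MEMH/YH, rd_addr from MEMV."""
        if en:
            self.q = self.mem[rd_addr % self.DEPTH]
            self.mem[wr_addr % self.DEPTH] = wr_data & ((1 << self.WIDTH) - 1)
        return self.q
```

When EN is LOW nothing is written and the read register holds its previous value, matching the stall behaviour of the core.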

The figure below shows the dual port RAM and its connection to the core buses.



The DCTRam memory is fully synchronous; the figure below shows a sample timing diagram. Reads and writes take place on the rising edge of the clock, when the address and data are sampled, provided the EN signal is active.



# **Timing information**

As shown in the figure below, the core is started by pulsing the input START. After 72 clock cycles the first result emerges from the YV output. The first output sample is marked by the output READY pulsing HIGH.

As long as the synchronous enable signal EN is HIGH, pixels can be input continuously, one per clock, with no gaps, even between blocks.

When the input EN is LOW, the core stalls and all inputs are ignored, with the exception of the asynchronous reset RES\_N.
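The latency rule can be captured in a few lines to predict when READY should pulse in a simulation. This Python sketch uses hypothetical names. It assumes that the 72-cycle latency counts only cycles with EN HIGH, since a LOW EN stalls the core; that interpretation is not stated explicitly above.

```python
from typing import List, Optional

LATENCY = 72  # enabled clock cycles from X00 on XH to the first sample on YV

def ready_cycle(start_cycle: int, en: List[int]) -> Optional[int]:
    """Return the clock cycle at which READY pulses for the first block.

    `en[t]` is the value of EN at cycle t; START (with X00) is applied at
    `start_cycle`. Assumption: only cycles with EN HIGH advance the core,
    so each stalled cycle delays READY by one cycle.
    """
    if not en[start_cycle]:
        raise ValueError("START/X00 are ignored while EN is LOW")
    count = 0
    for t in range(start_cycle + 1, len(en)):
        if en[t]:
            count += 1
            if count == LATENCY:
                return t
    return None  # the trace does not contain enough enabled cycles
```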



#### **Performance**

Performance figures of the core, implemented in 0.5 µm technology, are shown in the table below. Much higher performance is expected in a more modern process (0.18 µm).

| Technology  | Area                     | Speed   | Throughput     |
|-------------|--------------------------|---------|----------------|
| ASIC 0.5 µm | 12 Kgates + 64x14 DP RAM | >50 MHz | >50 Msamples/s |

Table 1: Performance of the OL\_DCT core.
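At one sample per clock, the throughput in Table 1 translates directly into blocks and frames. The arithmetic below uses the 50 MHz lower bound from the table. The 640x480 frame is a hypothetical example, and only one (luminance) plane is counted.

```python
CLOCK_HZ = 50e6          # lower bound from Table 1 (">50 MHz")
SAMPLES_PER_BLOCK = 64   # 8x8 block, one sample per clock

def blocks_per_second(clock_hz: float = CLOCK_HZ) -> float:
    """8x8 blocks transformed per second with continuous input."""
    return clock_hz / SAMPLES_PER_BLOCK

def frame_time_s(width: int, height: int, clock_hz: float = CLOCK_HZ) -> float:
    """Time to stream one frame through the core, ignoring the 72-cycle
    latency and assuming width and height are multiples of 8."""
    blocks = (width // 8) * (height // 8)
    return blocks * SAMPLES_PER_BLOCK / clock_hz

if __name__ == "__main__":
    print(f"{blocks_per_second():.0f} blocks/s")
    print(f"640x480 luma: {frame_time_s(640, 480) * 1e3:.3f} ms")
```

At 50 MHz this gives 781,250 blocks per second, or about 6.1 ms for a single 640x480 plane.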

#### **Deliverables**

Synthesizable VHDL or Verilog RTL. Test bench.

## Ocean Logic Pty Ltd

PO BOX 768 - Manly NSW 1655 - Australia. Fax: +61-2-90120979. E-Mail: [contact@ocean-logic.com](mailto:contact@ocean-logic.com). URL: [http://www.ocean-logic.com/](http://www.ocean-logic.com/)