An Efficient CORDIC-Based Vector Interpolator in Power-Aware 3-D Graphics Rendering

Tze-Yun Sung1 Chun-Wang Yu1 Yaw-Shih Shieh1 Hsi-Chin Hsin2

1Department of Microelectronics Engineering, Chung Hua University, Hsinchu, Taiwan 300-12
2Department of Computer Science and Information Engineering, Formosa University, Hu-Wei, Taiwan 632-08

Abstract

High performance architectures for the data intensive and latency restrained applications can be achieved by maximizing both parallelism and pipelining. In this paper, the CORDIC based hardware primitives of 3-D rotation with high throughput 3-D vector interpolation are presented. The proposed architecture for 3-D vector interpolator, which is based on the redundant CORDIC arithmetic, has been implemented by VLSI.

Keywords: Redundant CORDIC arithmetic, CORDIC algorithm, 3-D vector interpolation, high-throughput.

1. Introduction

Flexible hardware along with precision control is much desirable for the power-aware 3-D graphics rendering applications. In [1], 3-D vector interpolation is required. The 3-D vector interpolator of Euh et al. provides multiple precisions for the design of power-aware systems [2].

The well known CORDIC algorithm, which has been applied with a great success to the hardware implementations of many signal processing tasks, e.g. sine and cosine generation, vector rotation, coordinate transformation, and linear system solving, is suitable for the implementation of 3-D vector interpolation [3]-[4]. In CORDIC, only simple shifters and adders are needed, which can be realized by the use of reconfigurable hardware platforms, especially by FPGA [5]. Thus, the CORDIC-based 3-D vector interpolator is more flexible for the interpolation task.

In this paper, the architecture of 3-D vector interpolator based on the CORDIC algorithm is proposed. It is suitable for VLSI implementation in terms of the computational complexity.

The remainder of the paper is organized as follows. In section 2, the conventional CORDIC algorithm is reviewed. In section 3, the 3-D CORDIC algorithm is given. The proposed VLSI architecture of 3-D vector interpolator based on the CORDIC rotation algorithm is presented in section 4. Its analysis is given in section 5, and the conclusion can be found in section 6.

2. The CORDIC Algorithm

CORDIC (COordinate Rotation DIgital Computer) is an algorithm performing a sequence of iteration computations by the use of coordinate rotation [3] [4]. It can be used to generate important elementary functions by using only simple adders and shifters. The basic CORDIC iteration equations are given by

\[ x_{i+1} = x_i - m \sigma_i 2^{-i(m,i)} y_i \]  
\[ y_{i+1} = y_i + m \sigma_i 2^{-i(m,i)} x_i \]  
\[ z_{i+1} = z_i - m \sigma_i \alpha_{m,i} \]  

where \( m \) denotes the circular \((m=1)\), linear \((m=0)\) or hyperbolic \((m=-1)\) coordinate system, \( i=0, 1, 2, \ldots, n-1 \), and

\[ s(m,i) = \begin{cases} 0, 1, 2, 3, 4, 5, \ldots, m = 1 \\ 1, 2, 3, 4, 5, 6, \ldots, m = 0 \end{cases}, \quad \text{and} \quad 1, 2, 3, 4, 5, 6, \ldots, m = -1 \]

\[ \alpha_{m,i} = m^{-1/2} \tan^{-1}\left( \sqrt{m} 2^{-i(m,i)} \right) \]  

The rotation \( \sigma_i = \text{sign}(z_i) \) for the rotation mode \( (z_n \rightarrow 0) \); \( \sigma_i = -\text{sign}(x_i) \cdot \text{sign}(y_i) \) for the vectoring mode \( (y_n \rightarrow 0) \). The scale factor \( k_{m,i} = \sqrt{1 + m \sigma_i^2 2^{-2i(m,i)}} \) in the \( i \)-th iteration. After \( n \) iterations, the product of all the scale factors is as follows.

\[ K_m = \prod_{i=0}^{n-1} k_{m,i} = \prod_{i=0}^{n} \sqrt{1 + m \sigma_i^2 2^{-2i(m,i)}} = \prod_{i=0}^{n} \sqrt{1 + m 2^{-2i(m,i)}} \]  

where the rotation direction is defined by \( \sigma_i = \{-1,1\} \).

3. 3-D CORDIC Algorithm

Figure 1 shows a vector \( R \) in the 3-D space. Its respective Cartesian and spherical coordinates are \((X_i,Y_i,Z_i)\) and \((R,\theta,\phi)\). \( R \) can be rotated and then becomes a new vector denoted by \( S \) with Cartesian coordinates \((X_{i+1},Y_{i+1},Z_{i+1})\) and spherical coordinates \((R,\theta+\alpha_i,\phi+\beta_i)\). The relationship between the Cartesian coordinates and spherical coordinates of \( R \) and \( S \) are given by

\[ X_{i+1} = R \cos \theta_i \sin \phi_i \]  
\[ Y_{i+1} = R \sin \theta_i \sin \phi_i \]  
\[ Z_{i+1} = R \cos \phi_i \]
\[
X_{\text{int}} = R \cos(\theta + \alpha) \sin(\phi + \beta) \\
Y_{\text{int}} = R \sin(\theta + \alpha) \sin(\phi + \beta) \\
Z_{\text{int}} = R \cos(\phi + \beta)
\]

Equations (9), (10) and (11) can be rewritten by
\[
X_{\text{int}} = R \cos(\alpha) \cos(\beta - \phi) - \sin(\alpha) \sin(\beta + \phi) + \cos(\alpha) \sin(\beta - \phi) \\
Y_{\text{int}} = R \sin(\alpha) \cos(\beta - \phi) + \cos(\alpha) \sin(\beta + \phi) - \sin(\alpha) \cos(\beta - \phi) \\
Z_{\text{int}} = Z \cos(\beta - \phi) - W \sin(\beta - \phi)
\]

It is noted that there are four 2-D CORDIC rotations involved in the 3-D rotation of a vector. In addition, the scale factor of \(Z_{\text{int}}\) and \(W_{\text{int}}\) is different from that of \(U_{\text{int}}\), \(V_{\text{int}}\), \(X_{\text{int}}\) and \(Y_{\text{int}}\). They can be compensated via the pre-scale of inputs or post-scale of outputs with their respective constants \(K\) and \(K'\), which are given by
\[
K = \prod_{i=0}^{n-1} k_i
\]

\[
K^2 = \prod_{i=0}^{n-1} k_i^2
\]

4. VLSI Architecture for 3-D Vector Interpolator with CORDIC Algorithm

Vector interpolation can be obtained by using algorithms based on spherical interpolation, linear interpolation or CORDIC interpolation. The spherical interpolation involves complex computations. The linear interpolation requires post-normalization, which is also complex. The proposed CORDIC-based 3-D interpolator, which is performed on polar components, is efficient and more flexible in terms of the hardware implementations. Figure 2 shows the architecture of the proposed 3-D vector interpolator by using the circular CORDIC algorithm with rotation mode. In which, the generators of \((U_{\text{int}}, V_{\text{int}})\) and \((X_{\text{int}}, Y_{\text{int}})\) consist of two 2-D CORDIC processors, two hardware shifters, and two adders/subtractors. The generators of \(W_{\text{int}}\) and \(Z_{\text{int}}\) consist of half 2-D CORDIC Processor.

The initial coordinates \((U_0, V_0, W_0)\) are obtained by using the auxiliary generator \((U_0, V_0, W_0)\) [6], which is shown in Fig. 3. Thus, the proposed architecture is composed of the auxiliary generator \((U_0, V_0, W_0)\), the redundant CORDIC arithmetic (for the computation of 3-D vector interpolation), and dual-memory banks (for storing the coordinates \((X_i, Y_i, Z_i)\) and \((U_i, V_i, W_i)\), respectively).

The hardware code of the proposed system is written in Verilog-hardware description Language (HDL) [7]. The system diagram is shown in Figure 4. The chip is synthesized by TSMC 0.18 \(\mu\)m 1P6M CMOS cell libraries [8]. The layout view of the 16-bit 3-D vector interpolator is shown in Figure 5. The gate count is reported by the Synopsys\textsuperscript{8} design analyzer.
The power consumption is reported by PrimPower®. The core size is 1620 µm × 1620 µm, and the power dissipation is 56.5 mW with the clock rate of 50 MHz at 1.8 V. The critical path is 5.4 ns. All control signals are generated internally on-chip. This chip offers a high throughput with low gate counts by using a parallel-pipelined architecture. The layout view of the 16-bit Auxiliary Coordinate Generator \((U_0, V_0, W_0)\) is shown in Figure 6. The core size is 1841 µm × 1841 µm, and the power dissipation is 51.4 mW with the clock rate of 50 MHz at 1.8 V. The critical path is 4.6 ns.

5. Advantages of New Architectures and Algorithms

The Euler angle method takes a sequence of three rotations [2], [9], each of which rotates with respect to one of the three orthogonal axes. This method can be represented by the Euler angles corresponding to the sequence of rotations with respect to the coordinate axes. In [2], the 3-D rotation is implemented by cascading two 2-D CORDIC processors. Lang and Antelo developed a method to replace the two 2-D CORDIC processors by one 3-D CORDIC processor [7]. The sequence of rotations is composed of one 2-D CORDIC rotation followed by one 3-D CORDIC rotation. Both of the aforementioned methods require more than two 2-D CORDIC computations. In the proposed 3-D rotation algorithm, the architecture based on the conventional CORDIC processor requires one 2-D CORDIC computation in parallel. The auxiliary generator of coordinate \((U_0, V_0, W_0)\) and the redundant arithmetic CORDIC for 3-D rotation can perform in parallel.

6. Conclusions

High-throughput architecture for the 3-D vector interpolation task based on the CORDIC algorithm is presented. It takes only one conventional CORDIC computation time.

The proposed architecture by the use of CORDIC processor is simple, regular and therefore suitable for VLSI implementation. In power-aware 3-D graphics rendering, the performance of 3-D vector interpolation can be improved by using the proposed algorithm and architecture. Table 1 shows the comparison of this work with Eberly [10] and Lang and Antelo [9].

References

Fig. 2. The architecture of the 3-D vector interpolator

Fig. 3. The auxiliary coordinate ($U_0, V_0, W_0$) generator

Fig. 4. The system diagram of 3-D vector interpolator

Fig. 5. The chip layout of 16-bit 3-D vector interpolator (Left)

Fig. 6. The chip layout of 16-bit auxiliary coordinate generator ($U_0, V_0, W_0$) (Right)