Hardware Design of Image Processing System based on EMCCD

Wenke He\textsuperscript{1,a}, Xuwen Li\textsuperscript{2,b} and Qiang Wu\textsuperscript{3,c}

\textsuperscript{1,3}College of Information and Communication Engineering, Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China

\textsuperscript{2}College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, China

\textsuperscript{a}email: wenkehe@126.com, \textsuperscript{b}email: lixuwen@bjut.edu.cn, \textsuperscript{c}email: wuqiang@bjut.edu.cn

\textbf{Keywords:} EMCCD; CCD201; Image Processing System; Output Ripple.

\textbf{Abstract.} The Electron Multiplying Charge Coupled Device (EMCCD) is a highly sensitive imaging device widely used in low-light detection. This paper proposes an image processing system for the CCD201, a high-frequency, high-sensitivity EMCCD from E2V Corporation. We selected the Xilinx ZYNQ-7000 series, based on an ARM+FPGA architecture, as the processor, and chose the specific model, ZYNQ-7030, according to the amount of image data and the complexity of the image processing. We designed an off-chip memory system and communication interfaces for the processor. To improve power conversion efficiency and reduce system size, we selected the LTM4644, a 4-channel DC-DC micro-module. The LTM4644 has lower output voltage ripple than common discrete DC-DC converters and meets the power noise requirements of the system.

\textbf{Introduction}

The EMCCD has a gain register between the readout register and the output amplifier. The weak photo-generated charge signal is amplified hundreds to thousands of times in the gain register, so faint targets under low-light conditions can be enhanced. At present, domestic research on EMCCD imaging devices focuses mostly on imaging itself: the image collector transfers the images to a PC for computation and processing. In this paper, we designed an image processing system for the CCD201, an EMCCD device from the British company E2V. The system uses the Xilinx ZYNQ-7030 as the processor, which performs drive control, image acquisition, image processing and transmission. As an independent system, it can work as a navigation sensor without relying on other systems. Compared with traditional image processing systems, it offers small volume, low power consumption and high stability while maintaining image processing performance. The system can be used as a star sensor in aerospace applications.

\textbf{Performance Requirements Analysis}

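The CCD201 sensor has a resolution of 1024×1024 \cite{1} and operates at a pixel readout frequency of 10MHz. Since one frame contains 1024×1024 ≈ 1.05M pixels, the sensor generates about 10 frames per second (10MHz/1,048,576 pixels ≈ 9.5 frames per second before readout overhead).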

The sensor outputs an analog signal, which must be converted to a digital signal before the images can be sent to the processor \cite{2}. The ADC converts each pixel to a 14-bit digital value, which occupies 2 bytes in the digital system. The system must therefore handle 10M×2 bytes = 20MB of data per second.
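A minimal Python sketch of the throughput arithmetic above, using only the figures in the text. It assumes one pixel per clock and ignores line and frame transfer overhead; under that assumption the exact frame rate is about 9.5 frames per second, which the text rounds to 10.

```python
# Throughput of the CCD201 readout chain, using the figures from the text.
PIXEL_RATE_HZ = 10e6          # pixel readout frequency: 10 MHz
WIDTH, HEIGHT = 1024, 1024    # CCD201 resolution
BYTES_PER_PIXEL = 2           # a 14-bit ADC sample stored in 2 bytes


def frame_rate(pixel_rate_hz: float, width: int, height: int) -> float:
    """Frames per second, ignoring line/frame transfer overhead."""
    return pixel_rate_hz / (width * height)


def data_rate(pixel_rate_hz: float, bytes_per_pixel: int) -> float:
    """Raw digital data rate in bytes per second."""
    return pixel_rate_hz * bytes_per_pixel


fps = frame_rate(PIXEL_RATE_HZ, WIDTH, HEIGHT)
rate = data_rate(PIXEL_RATE_HZ, BYTES_PER_PIXEL)
print(f"{fps:.2f} frames/s, {rate / 1e6:.0f} MB/s")  # 9.54 frames/s, 20 MB/s
```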

Image preprocessing, connected-component analysis, map matching, star tracking and attitude calculation are performed on the processor \cite{3}. The processed images and measurement results are then uploaded to the host computer. The whole process must run in real time, so the computing power, the storage capacity and the bandwidth of the image transmission interface are the key factors in the design.

\textbf{Design of Image Processing Circuit}

**The Processor Selection.** The Xilinx ZYNQ-7000 system-on-chip is based on an ARM+FPGA architecture and supports parallel development of the ARM processor system and the FPGA. Compared with a traditional DSP+FPGA dual-processor image processing system, it integrates the ARM and the FPGA on a single chip, which greatly reduces the power consumption, volume and hardware cost of the system. In addition, a DSP+FPGA system needs a dedicated high-speed interface for communication between the two processors, which increases resource overhead and debugging difficulty. On the ZYNQ-7000, the ARM and FPGA exchange large amounts of data over the on-chip AXI bus, a high-bandwidth, low-latency industry-standard protocol, which makes communication between them more stable and efficient.

The Programmable Logic (PL) side of the ZYNQ-7000 uses one of two FPGA fabrics, Artix-7 or Kintex-7. The ZYNQ-7030 and larger devices use the Kintex-7 fabric, which has more logic cells and more on-chip memory. The ZYNQ-7030 has 125K logic cells and 9.3Mb (about 1.16MB) of Block RAM in total. On-chip Block RAM is valuable and limited. It is also the fastest memory to access, which is essential for image processing speed: the more image data it caches, the better the processing performance. However, FPGA process and cost constraints limit how much Block RAM a device can have, so external RAM is sometimes added as a second-level data cache. A line of the CCD201 contains 1024 pixels, and each pixel is converted to 14-bit data occupying 2 bytes, so one line occupies 2KB of Block RAM. The ZYNQ-7030 on-chip Block RAM can therefore cache about 1.16MB/2KB ≈ 580 lines, which is roughly half of a frame. Some algorithms can be implemented directly in hardware, and for these the ZYNQ-7030 provides a peak signal processing capability of 593GMACs. For complex image processing algorithms, the on-chip ARM and FPGA can work together to optimize processing performance.
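The line-buffer estimate can be checked with a short Python sketch. It assumes the 9.3Mb figure uses binary units (1Mb = 2^20 bits), as Xilinx Block RAM sizes conventionally do; with that convention the result is about 595 lines, slightly more than the 580 obtained above with mixed units, and either way a little over half of a 1024-line frame.

```python
# How many CCD201 lines fit in ZYNQ-7030 Block RAM (figures from the text).
BRAM_MBIT = 9.3          # total Block RAM, in Mb
LINE_PIXELS = 1024       # pixels per CCD201 line
BYTES_PER_PIXEL = 2      # 14-bit sample stored in 2 bytes
FRAME_LINES = 1024       # lines per frame


def lines_in_bram(bram_mbit: float, line_pixels: int, bytes_per_pixel: int) -> int:
    """Whole lines that fit, assuming binary units (1 Mb = 2**20 bits)."""
    bram_bytes = bram_mbit * 2**20 / 8
    line_bytes = line_pixels * bytes_per_pixel  # 2048 bytes = 2 KB
    return int(bram_bytes // line_bytes)


lines = lines_in_bram(BRAM_MBIT, LINE_PIXELS, BYTES_PER_PIXEL)
print(lines, f"{lines / FRAME_LINES:.0%} of a frame")  # 595 58% of a frame
```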

Chip size and power consumption are also important factors in selecting the specific model: the more logic and storage resources a processor has, the larger its volume and power consumption. The FPGA core supply VCCINT of the ZYNQ-7000 series is 1V. Using the Xilinx power estimation software, we estimated a VCCINT current of 2.368A for the ZYNQ-7030 and 4.725A for the ZYNQ-7035, so the ZYNQ-7035 core draws about twice the current of the ZYNQ-7030 core. Since the goal is to minimize volume and power consumption while ensuring system performance, the ZYNQ-7030 is the most appropriate choice.
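The comparison can be expressed as core-rail power (P = V × I) with a small Python sketch. It covers only the VCCINT figures quoted above; the other PL and PS rails are not included, so it is not a total device power estimate.

```python
# VCCINT power comparison from the estimated core currents in the text.
VCCINT_V = 1.0
VCCINT_CURRENT_A = {"ZYNQ-7030": 2.368, "ZYNQ-7035": 4.725}


def core_power_w(voltage_v: float, current_a: float) -> float:
    """Power drawn on one supply rail: P = V * I."""
    return voltage_v * current_a


for part, current in VCCINT_CURRENT_A.items():
    print(f"{part}: {core_power_w(VCCINT_V, current):.3f} W on VCCINT")

ratio = VCCINT_CURRENT_A["ZYNQ-7035"] / VCCINT_CURRENT_A["ZYNQ-7030"]
print(f"ZYNQ-7035 / ZYNQ-7030 core current: {ratio:.2f}")  # about 2
```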

The Processing System (PS) side of the ZYNQ-7030 integrates a dual-core ARM Cortex-A9 with NEON extension. This architecture is designed to support graphics and multimedia applications, with single- and double-precision floating-point operations, at clock frequencies up to 1GHz. Each core delivers 2.5DMIPS/MHz. As shown in Fig. 1, the chip contains two levels of cache (L1 and L2) and 256KB of on-chip RAM, which is sufficient for medium-scale image data processing.
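As a rough illustration of the PS headroom, the following Python sketch multiplies out the Dhrystone figures in the text and divides by the 10MHz pixel rate. It assumes both cores run at the full 1GHz with no memory stalls, and Dhrystone is not an image-processing benchmark, so the result is only an order-of-magnitude budget before any work is offloaded to NEON or the PL.

```python
# Order-of-magnitude instruction budget per pixel for the dual Cortex-A9.
CORES = 2
DMIPS_PER_MHZ = 2.5     # per core, from the text
CLOCK_MHZ = 1000        # maximum ARM clock from the text (assumed reached)
PIXEL_RATE_HZ = 10e6    # CCD201 pixel readout rate


def aggregate_dmips(cores: int, dmips_per_mhz: float, clock_mhz: float) -> float:
    """Total Dhrystone MIPS across all cores."""
    return cores * dmips_per_mhz * clock_mhz


total = aggregate_dmips(CORES, DMIPS_PER_MHZ, CLOCK_MHZ)  # 5000 DMIPS
per_pixel = total * 1e6 / PIXEL_RATE_HZ  # Dhrystone-equivalent instructions per pixel
print(f"{total:.0f} DMIPS, about {per_pixel:.0f} instructions per pixel")
```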

The basic data communication between PS and PL is carried over AMBA AXI. The AXI (Advanced eXtensible Interface) protocol, the most important part of the AMBA (Advanced Microcontroller Bus Architecture) 3.0 specification from ARM, is a high-performance, high-bandwidth, low-latency on-chip bus. Viewed from the PS, the chip includes two 32-bit AXI general-purpose master interfaces and two 32-bit AXI general-purpose slave interfaces, providing fast channels between PL and PS. It also includes four configurable 64-bit/32-bit buffered AXI high-performance slave interfaces with direct access to the DDR memory and the on-chip memory (OCM), so the PL does not need a dedicated DDR memory of its own.
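To show how much headroom one high-performance AXI port leaves over the 20MB/s image stream, here is a Python sketch of the theoretical peak bandwidth (one data beat per clock per direction). The 100MHz PL-side AXI clock is a hypothetical value chosen for illustration; the text does not specify the clock, and real throughput is lower once burst and arbitration overhead are counted.

```python
# Theoretical peak bandwidth of one AXI port versus the sensor data rate.
HP_PORT_WIDTH_BITS = 64       # high-performance ports configured as 64-bit
AXI_CLOCK_HZ = 100e6          # hypothetical PL-side AXI clock (not from the text)
SENSOR_RATE_BYTES = 20e6      # 20 MB/s from the CCD201


def axi_peak_bandwidth(width_bits: int, clock_hz: float) -> float:
    """Bytes per second in one direction, assuming one data beat per clock.

    AXI has independent read and write data channels, so each direction
    can reach this figure on its own.
    """
    return width_bits / 8 * clock_hz


peak = axi_peak_bandwidth(HP_PORT_WIDTH_BITS, AXI_CLOCK_HZ)
print(f"{peak / 1e6:.0f} MB/s per direction, "
      f"{peak / SENSOR_RATE_BYTES:.0f}x the sensor rate")
```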

**Off-chip Memory System.** As shown above, the on-chip RAM is not large enough to cache a full frame of image data, so the memory must be expanded off-chip. The processor first stores the received image data in off-chip memory. When the data needs to be processed, the processor moves it into on-chip Block RAM, and when processing is done, it moves the data back to off-chip memory \cite{4}.
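The buffering scheme can be sketched in Python as tile-by-tile processing: a frame held in a list standing in for off-chip memory is copied a few lines at a time into a small buffer standing in for Block RAM, processed, and written back. The function name, tile size and pixel operation are hypothetical; the sketch handles point-wise operations only, since neighborhood filters would also need overlapping boundary lines in each tile.

```python
from typing import Callable, List

Frame = List[List[int]]  # rows of pixel values


def process_in_tiles(external: Frame, tile_lines: int,
                     op: Callable[[int], int]) -> None:
    """Apply a point-wise op to a frame, tile_lines rows at a time.

    'external' stands in for off-chip memory; 'onchip' stands in for Block RAM.
    """
    for start in range(0, len(external), tile_lines):
        stop = min(start + tile_lines, len(external))
        onchip = [row[:] for row in external[start:stop]]  # off-chip -> on-chip
        for row in onchip:                                  # process on-chip
            for i, pixel in enumerate(row):
                row[i] = op(pixel)
        external[start:stop] = onchip                       # on-chip -> off-chip


# Usage: double each 14-bit pixel with saturation, 3 lines per tile.
frame = [[r * 1000 + c for c in range(4)] for r in range(10)]
process_in_tiles(frame, 3, lambda p: min(2 * p, 2**14 - 1))
print(frame[9])  # [16383, 16383, 16383, 16383]
```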

One of the most important features of this system is its abundant off-chip memory, which makes it more flexible in handling large amounts of image data. The off-chip memory system includes image memory, program memory and star-map memory \cite{5}, for which we designed DDR3 SDRAM, SRAM and NOR Flash. Memory capacity and interface speed are key factors affecting system performance. Each frame is 2MB and the sensor delivers 10 frames per second, so both the memory interface and the communication interface must sustain more than 20MB/s.

We selected the MT41J256M16HA-125, a 4Gb device operated at a data rate of 1333MT/s, as the DDR3 SDRAM. Two MT41J256M16HA-125 devices are connected to the PS side of the ZYNQ-7030, giving a total capacity of 8Gb (1GB) with a 32-bit data width and a peak bandwidth of 1333MT/s×32bit/8 = 5.332GB/s. The DDR3 SDRAM is used to run programs and store image data. The PL side of the ZYNQ-7030 can access the DDR3 SDRAM directly through the on-chip AXI bus, so images from the PL side can be cached directly in it.

The SRAM is the IS61WV102416ALL-20MI, with a capacity of 16Mb and a 50MHz operating frequency. Two IS61WV102416ALL-20MI devices are connected to the PL side of the ZYNQ-7030. Their combined capacity of 32Mb (4MB) can store at most 2 image frames. The parallel data width is 32 bits and the bandwidth is 50MHz×32bit/8 = 200MB/s.
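The capacity and bandwidth figures for both memories can be checked against the 20MB/s requirement with a short Python sketch. Capacities use binary units (1Gb = 2^30 bits, 1Mb = 2^20 bits) and the frame is exactly 1024×1024×2 bytes; bandwidths are theoretical peaks computed as in the text (data rate × bus width / 8).

```python
# Check the off-chip memories against one frame size and the 10 frame/s stream.
FRAME_BYTES = 1024 * 1024 * 2             # 2 MB per frame
REQUIRED_BYTES_PER_S = FRAME_BYTES * 10   # 10 frames per second

MEMORIES = {
    # name: (total capacity in bytes, peak bandwidth in bytes/s), from the text
    "DDR3": (2 * 4 * 2**30 // 8, 1333e6 * 32 / 8),   # 2 x 4Gb, 32-bit, 1333MT/s
    "SRAM": (2 * 16 * 2**20 // 8, 50e6 * 32 / 8),    # 2 x 16Mb, 32-bit, 50MHz
}


def check(capacity_bytes: int, bandwidth: float) -> tuple:
    """Return (whole frames that fit, whether bandwidth exceeds the stream rate)."""
    return capacity_bytes // FRAME_BYTES, bandwidth > REQUIRED_BYTES_PER_S


for name, (capacity, bandwidth) in MEMORIES.items():
    frames, fast_enough = check(capacity, bandwidth)
    print(f"{name}: {frames} frames, {bandwidth / 1e6:.0f} MB/s, "
          f"fast enough: {fast_enough}")
```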

NOR Flash uses a parallel interface and reads much faster than NAND Flash. In this system, the NOR Flash is connected to the PS side of the ZYNQ-7030 and stores the ARM program, the FPGA configuration file and the original star map. The specific model is the S29GL256P, with a capacity of 256Mb (32MB) and a maximum data width of 16 bits; the system configures it for an 8-bit data width. At 11.1MHz, the bandwidth is 11.1MHz×8bit/8 = 11.1MB/s, so reading one frame of the original star image from the NOR Flash into the processor takes about 2MB/(11.1MB/s) ≈ 180ms.
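The star-map load time follows from the bus width and clock; here is a Python sketch, assuming one transfer per clock cycle as in the text. Using the exact frame size of 1024×1024×2 bytes gives about 189ms; the 180ms above comes from rounding the frame to 2×10^6 bytes.

```python
# Time to load one star image from NOR Flash over the 8-bit parallel bus.
FRAME_BYTES = 1024 * 1024 * 2   # one stored star image
FLASH_CLOCK_HZ = 11.1e6         # bus transfer rate from the text
BUS_WIDTH_BITS = 8              # NOR Flash configured as 8-bit


def read_time_s(nbytes: float, clock_hz: float, bus_bits: int) -> float:
    """Transfer time assuming one bus transfer per clock cycle."""
    bandwidth = clock_hz * bus_bits / 8   # bytes per second
    return nbytes / bandwidth


t = read_time_s(FRAME_BYTES, FLASH_CLOCK_HZ, BUS_WIDTH_BITS)
print(f"{t * 1e3:.0f} ms")  # 189 ms
```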

**Interface Circuit.** The processor interfaces include an interface to the analog front end (AFE) and communication interfaces to the host computer. To exchange commands, status and images with the host computer, we designed optical fiber, 1553B and RS422 interfaces and a time check circuit for the system.

The system uses the optical fiber interface to output images in real time during normal operation and to inject test images during simulation.

The required optical fiber transmission distance is within 20m, so an 850nm multimode optical module is sufficient. The specific model is the TSPL2G03D from TIBTRONIX Inc., an SFP module with a 2.67Gb/s data rate, full-duplex communication and hot-plug support. Its transmission distance reaches 300m over 50/125μm multimode fiber.
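Whether the optical link can carry the 20MB/s image stream can be estimated with a Python sketch. It assumes the link runs at the module's full 2.67Gb/s with 8b/10b line coding (an assumption, not stated in the text) and ignores framing and protocol overhead, so the payload figure is an upper bound.

```python
# Upper-bound payload of the fiber link versus the image stream.
LINE_RATE_BPS = 2.67e9        # module data rate (bits/s), from the text
CODING_EFFICIENCY = 8 / 10    # assumed 8b/10b line coding
IMAGE_STREAM_BYTES = 20e6     # 20 MB/s from the CCD201


def payload_bytes_per_s(line_rate_bps: float, coding_efficiency: float) -> float:
    """Usable bytes per second after line coding, before protocol overhead."""
    return line_rate_bps * coding_efficiency / 8


payload = payload_bytes_per_s(LINE_RATE_BPS, CODING_EFFICIENCY)
print(f"{payload / 1e6:.0f} MB/s payload, link utilization "
      f"{IMAGE_STREAM_BYTES / payload:.1%}")
```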

The SFP module is connected to a GTX/GTH high-speed transceiver on the PL side of the ZYNQ-7030, which supports full-duplex communication. Vivado, the official development tool for ZYNQ, provides a dedicated IP core for communicating with the SFP module, which greatly reduces development difficulty.

The GTX/GTH transceiver has relatively strict power supply and clock requirements. It requires 3 analog power supplies: 1.0V MGTAVCC, 1.8V MGTVCCAUX and 1.2V MGTAVTT. Analog power supply noise strongly affects transceiver performance and must be strictly controlled. Power supply noise comes from three sources:
• Power supply regulator noise
• Power distribution network
• Coupling from other circuits

We therefore suppress noise at all three sources. First, we choose a power supply with small ripple. Second, the supply is effectively isolated and filtered with ferrite beads and filter capacitors (ceramic capacitors with small ESR), as shown in Fig. 2. Finally, the isolated and filtered supply is placed as close as possible to the power pins of the GTX/GTH transceiver to reduce coupled noise. The noise at the supply pins should be kept within 10mV.

![Fig. 2 Isolation and filtering circuit for GTX/GTH.](image)

The GTX/GTH transceiver reference clock is a 125MHz LVDS clock that requires high frequency stability. We therefore use an oscillator with high frequency stability and a MAX9112 to convert its single-ended output to an LVDS clock, which is connected to the GTX/GTH transceiver clock pins.

To ensure signal quality, high-speed signal traces should be as short as possible, and the differential pair impedance should be continuous to reduce reflections and signal jitter.

**Design of Power Supply System.** The power supply design must take into account the power consumption of the processor, memory and interface chips, the power conversion relationships, the conversion efficiency and, where needed, the power-on sequence. The ZYNQ-7030 has no strict power-on sequence requirement, so all supplies can be powered up at the same time.

The choice of power chips is the key to the power supply design. A DC-DC converter offers high efficiency and high output current, but its output ripple and switching noise are relatively large. An LDO (low-dropout linear regulator) offers low cost, low noise and low quiescent current. However, if the difference between its input and output voltages is large, much of the energy is dissipated in the LDO and the conversion efficiency is low.
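The LDO efficiency penalty follows from the fact that a linear regulator passes its load current from input to output and drops the voltage difference across itself. Below is a Python sketch of that relation; the 3.3V and 1.5V inputs and the 0.5A load are hypothetical examples, not values from the text.

```python
# Power dissipated in a linear regulator and its efficiency.
def ldo_loss(v_in: float, v_out: float, i_load: float, i_q: float = 0.0):
    """Return (dissipated power in W, efficiency) for an LDO.

    Input current is the load current plus the quiescent current i_q.
    """
    p_in = v_in * (i_load + i_q)
    p_out = v_out * i_load
    return p_in - p_out, p_out / p_in


# Hypothetical example: a 1.0V, 0.5A rail fed from 3.3V versus from 1.5V.
for v_in in (3.3, 1.5):
    loss, eff = ldo_loss(v_in, 1.0, 0.5)
    print(f"{v_in}V -> 1.0V: {loss:.2f} W lost, {eff:.0%} efficient")
```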

High-speed digital interfaces and analog circuits, such as the high-speed optical fiber interface in this system, have strict power noise requirements. The GTX/GTH high-speed transceiver on the ZYNQ-7030 PL side requires power supply ripple within 10mV. Because an LDO has low output noise, the usual practice when low power noise is required is to use an LDO to obtain a stable, low-noise voltage. A key parameter of an LDO regulator is its Line Regulation: the change in output voltage as the input voltage varies over its rated range, with the regulator at full load.

\[ \text{Line Regulation} = \frac{V_{\text{max}} - V_{\text{min}}}{V_{\text{nor}}} \]

where \( V_{\text{nor}} \) is the output voltage at nominal input voltage and full load, \( V_{\text{max}} \) is the maximum output voltage as the input voltage varies, and \( V_{\text{min}} \) is the minimum output voltage as the input voltage varies.
The Line Regulation of common LDOs is within 0.2%. For example, the maximum Line Regulation of the ASM1117 is 0.2%, and that of the RT9183 is 0.18%. For an output voltage \( V_{\text{nor}} = 1.8V \), the formula \( (V_{\text{max}} - V_{\text{min}}) = V_{\text{nor}} \times \text{Line Regulation} \) gives \( (V_{\text{max}} - V_{\text{min}}) = 1.8V \times 0.2\% = 3.6mV \), which comfortably meets the GTX/GTH requirement of power supply ripple within 10mV.
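The worst-case output variation for both LDOs can be reproduced with a Python sketch of the formula above, taking Line Regulation as a fraction (0.2% = 0.002) and checking the result against the 10mV limit.

```python
# Output-voltage variation from Line Regulation: (Vmax - Vmin) = Vnor * LR.
V_NOR = 1.8          # nominal output voltage (V)
LIMIT_V = 0.010      # GTX/GTH supply ripple limit: 10 mV
MAX_LINE_REGULATION = {"ASM1117": 0.002, "RT9183": 0.0018}  # from the text


def output_variation(v_nor: float, line_regulation: float) -> float:
    """Vmax - Vmin for a given nominal output and line regulation (fraction)."""
    return v_nor * line_regulation


for part, lr in MAX_LINE_REGULATION.items():
    dv = output_variation(V_NOR, lr)
    print(f"{part}: {dv * 1e3:.2f} mV, within limit: {dv < LIMIT_V}")
```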

However, adding LDOs not only increases system complexity but also adds power dissipation, which works against miniaturizing the whole system. To minimize power consumption and component count, we examined whether the ripple of a DC-DC converter alone can meet the GTX/GTH ripple requirement.

We evaluated two common DC-DC converter chips, shown in Fig. 3: the TI TPS62140 (left) and the RichTek RT8011 (right). Their output voltage ripple depends on the output current and generally increases as the output current increases. As Fig. 3 shows, the ripple of both chips is mostly below 20mV but above 10mV, so neither meets the GTX/GTH high-speed transceiver power supply ripple requirement. Conventional discrete DC-DC converters are therefore unsuitable.

![Fig. 3 Common DCDC output ripple.](image)

The LTM4644 is a four-channel DC-DC micro-module with up to 4A per channel, an input voltage range of 4V to 14V and an output voltage range of 0.6V to 5.5V. The output voltages are set with external resistors. The switches and inductors are integrated in the module, so it needs only a few external components, and its 9mm × 15mm package greatly reduces PCB area. The LTM4644 output voltage ripple is within 5mV, which meets the GTX/GTH high-speed transceiver power supply requirement. In this system, the LTM4644 input is 13V and its outputs are configured as 1.0V, 1.2V, 1.8V and 3.3V. To minimize ripple, low-ESR ceramic capacitors should be used for output filtering, together with a careful PCB layout.
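Setting the four outputs with external resistors can be illustrated with a Python sketch of the feedback-divider relation. The constants are assumptions not given in the text: a 0.6V internal reference (consistent with the 0.6V minimum output) and a 60.4kΩ internal resistor between VOUT and VFB, as we recall from the LTM4644 datasheet. The resulting RFB values should be checked against the datasheet and rounded to standard resistor values.

```python
# Feedback resistor RFB (VFB to ground) for each LTM4644 output.
# ASSUMED constants, recalled from the LTM4644 datasheet; verify before use.
V_REF = 0.6        # internal reference voltage (V)
R_TOP = 60.4e3     # internal resistor between VOUT and VFB (ohms)


def r_fb(v_out: float) -> float:
    """Bottom divider resistor so that V_REF * (1 + R_TOP / RFB) = v_out."""
    if v_out <= V_REF:
        raise ValueError("output voltage must exceed the reference")
    return R_TOP * V_REF / (v_out - V_REF)


for v_out in (1.0, 1.2, 1.8, 3.3):
    print(f"{v_out:.1f} V -> RFB = {r_fb(v_out) / 1e3:.1f} kOhm")
```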

**Measurements**

The system hardware is shown in Fig. 4. The system acquires star maps from the analog front end (AFE), processes them, and sends the images and measurement results to the host computer.

![Fig. 4 Hardware of the system.](image)

Fig. 5 shows the image processing results displayed on the host computer, demonstrating that the system works and communicates with the host computer correctly and in real time, as intended.

![Fig. 5 Star point display view.](image)

**Conclusions**

This paper has proposed an image processing system based on an EMCCD, and all functions of the design have been implemented. Real-time image acquisition, processing and transmission are achieved at 10 frames per second, and the system adjusts the gain automatically according to image quality. Its small size, light weight and compact structure make it suitable for spacecraft, where it can serve as a star sensor for star map observation, identification and tracking.

**Literature References**


