Exploring the world of Infinite-ISP: A guide to Infinite possibilities

Motivation behind the initiative

Image Signal Processors (ISPs) play a vital role in image processing across computer vision applications, including autonomous cars, surveillance systems, medical imaging, augmented reality, and robotics. ISPs largely dictate image quality, governing everything from noise reduction to color correction. Understanding how ISPs work is more than an academic exercise; it is a necessary skill for engineers and developers in these industries. Developers can fine-tune these processors to suit the demands of their applications so that the images produced meet the required quality and precision. This improves the performance and reliability of systems across many computer vision domains, making ISP expertise key to unlocking the full potential of computer vision technology.

Infinite-ISP, an open-source project, seeks to leverage global collaboration to overcome the challenges of processing raw data in image signal processing. This collaborative setup encourages teamwork by bringing state-of-the-art ISP technology to a broader audience. Our underlying goal is to boost the project’s accessibility by encouraging creativity, maintaining quality, and uniting ISP development efforts for various real-world applications.

Design

The Infinite-ISP project has been built with user accessibility in mind: it offers a well-structured, comprehensively documented ISP pipeline and includes sample data for reference. The Reference Model is a Python-based fixed-point model of the Infinite-ISP pipeline intended for hardware implementation. It employs lookup tables for complex functions such as Gaussian and sigmoid, and uses fixed-point numbers or custom approximations for divisions and square roots, aiming for minimal loss in image quality.

The pipeline is divided into four sections, each corresponding to a specific color space, as highlighted in the image above. The current implementation includes 18 ISP modules, which can be configured through a separate configuration (config) file. In its current state, the model implements simple algorithms per module, with plans to incorporate more complex RTL-friendly algorithms in future versions. The implementation of each module is briefly discussed below:

Crop
  • RAW domain

Crops the sides of the input image to a given size before it is processed by the Infinite-ISP pipeline. Because the input image is raw Bayer data, this module crops the image only if the Bayer pattern remains intact after cropping. Cropping is most useful in low-power applications, where it reduces the effort of processing a larger image, and in applications that require a high frame rate, where it improves the processing speed of the ISP.
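
The pattern-preserving rule can be illustrated with a minimal sketch. It assumes a centered crop and treats the Bayer pattern as intact only when an even number of rows and columns is removed from each side; the function name `crop_bayer` and this exact rule are illustrative assumptions, not the project's code.

```python
import numpy as np

def crop_bayer(raw, new_height, new_width):
    """Center-crop a Bayer mosaic only if the 2x2 pattern stays aligned."""
    height, width = raw.shape
    rows_to_crop = height - new_height
    cols_to_crop = width - new_width
    # Removing an odd number of rows/columns from one side would shift the
    # pattern, so the total removed must be a multiple of 4 (even per side).
    if rows_to_crop < 0 or cols_to_crop < 0 or rows_to_crop % 4 or cols_to_crop % 4:
        return raw  # leave the image untouched (assumed fallback)
    top, left = rows_to_crop // 2, cols_to_crop // 2
    return raw[top:top + new_height, left:left + new_width]
```

With this rule, an 8×8 mosaic cropped to 4×4 starts at row 2, column 2, so the first pixel of the output has the same color as the first pixel of the input.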

Dead Pixel Correction (DPC)
  • RAW domain

The Infinite-ISP pipeline employs dynamic detection and correction of dead pixels, that is, pixels affected by noise, dust, or sensor fabrication defects. The DPC algorithm classifies each image pixel as defective or non-defective by examining a 3 × 3 neighborhood within the same color channel. A pixel is marked as defective if it satisfies two conditions:

  • Its value falls outside the range of neighboring pixels
  • The difference exceeds a configurable threshold.

Defective pixels are then corrected by computing gradients with neighboring pixels and averaging values in the minimum gradient direction.
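
The detection and correction steps can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: same-color neighbors are taken two pixels apart in the mosaic, the threshold test is applied to every neighbor, the gradient is a second difference across the center, and borders use reflect padding. The name `correct_dead_pixels` is hypothetical.

```python
import numpy as np

def correct_dead_pixels(raw, threshold):
    """Detect and correct dead pixels in a single-plane Bayer mosaic."""
    padded = np.pad(raw.astype(np.int64), 2, mode="reflect")
    out = raw.copy()
    height, width = raw.shape
    for y in range(height):
        for x in range(width):
            # 3x3 same-color neighborhood: neighbors sit two pixels apart.
            win = padded[y:y + 5:2, x:x + 5:2]
            centre = win[1, 1]
            neighbours = np.delete(win.ravel(), 4)
            outside = centre < neighbours.min() or centre > neighbours.max()
            far = np.all(np.abs(neighbours - centre) > threshold)
            if not (outside and far):
                continue
            # Opposite-neighbor pairs: vertical, horizontal, two diagonals.
            pairs = [(win[0, 1], win[2, 1]), (win[1, 0], win[1, 2]),
                     (win[0, 0], win[2, 2]), (win[0, 2], win[2, 0])]
            # Gradient along each direction (second difference at the center).
            gradients = [abs(2 * centre - a - b) for a, b in pairs]
            a, b = pairs[int(np.argmin(gradients))]
            out[y, x] = (a + b) // 2  # average along the flattest direction
    return out
```

The per-pixel loop keeps the logic readable; a hardware-oriented model would typically express the same test with a sliding window.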

Corrected dead pixel in the pipeline output with the DPC module

Black Level Correction (BLC)
  • RAW domain
  • Tunable

BLC is implemented by subtracting a configurable offset from each channel; the offset represents the dark-current contribution to the pixel values. Optionally, pixel values can then be scaled across the desired value range by a linearization function. Black level correction is performed independently for each channel using the following formula:

Where n represents the maximum pixel value for the given bit depth of the input image.
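
As an illustration of per-channel offset subtraction with optional linearization, here is a minimal sketch. The linearization form used here, (pixel − offset) / (saturation − offset) × n, is an assumption for the example and is not taken from the article; the function name and the dictionaries keyed by 2×2 Bayer position are likewise hypothetical.

```python
import numpy as np

def black_level_correction(raw, offsets, saturations, bit_depth, linearize=True):
    """offsets/saturations: dicts keyed by 2x2 Bayer position (row, col)."""
    n = 2 ** bit_depth - 1  # maximum pixel value for the bit depth
    out = raw.astype(np.float64)
    for (dy, dx), offset in offsets.items():
        channel = out[dy::2, dx::2] - offset
        if linearize:
            # Assumed linearization: stretch [offset, saturation] to [0, n].
            channel = channel / (saturations[(dy, dx)] - offset) * n
        out[dy::2, dx::2] = channel
    return np.clip(np.round(out), 0, n).astype(np.uint16)
```

Values that fall below the offset are clipped to zero, and a pixel at the saturation level maps to n.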

Opto-Electronic Conversion Function (OECF)
  • RAW domain

The OECF defines the relationship between the incident light and the sensor output. The sensor response is captured first, and the OECF is then calibrated with a tuning tool so that the response becomes linear; this linear relationship is crucial for accurate white balance and color representation. The OECF module applies voltage re-mapping lookup curves tailored to a particular sensor.
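
Applying a lookup curve per channel can be sketched as below. The function name `apply_oecf` and the dictionary of curves keyed by 2×2 Bayer position are assumptions for illustration; real curves would come from sensor calibration.

```python
import numpy as np

def apply_oecf(raw, luts):
    """luts: dict mapping each 2x2 Bayer position to a 1-D lookup array.

    All four positions must be present, since the output starts uninitialized.
    """
    out = np.empty_like(raw)
    for (dy, dx), lut in luts.items():
        # Each raw value is used as an index into its channel's curve.
        out[dy::2, dx::2] = lut[raw[dy::2, dx::2]]
    return out
```

An identity curve (`np.arange(2 ** bit_depth)`) leaves the image unchanged, which is a convenient default when no calibration data is available.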

Bayer Noise Reduction (BNR)
  • RAW domain

The Bayer Noise Reduction (BNR) block suppresses noise in the Bayer domain before demosaicing. It employs a Joint/Cross Bilateral Filter (JBF) for raw image denoising. The process includes:

  • Green channel interpolation using the Malvar-He-Cutler algorithm
  • Deconstruction of the input Bayer image into R and B sub-images
  • Creation of guide images for R and B from the interpolated G data

The JBF is applied to the R and B sub-images using their respective guide images, and to the interpolated G image using itself as a guide. This process involves two types of kernels (range and spatial) and a lookup table for the range kernel. Finally, the output Bayer image is reconstructed by combining the R, G, and B pixels from the JBF outputs.
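
The core of this step, a joint bilateral filter driven by a guide image, can be sketched as follows. This is a floating-point illustration, not the fixed-point model: the range kernel is computed with `np.exp` rather than a lookup table, and the function name, window radius, and sigma parameters are assumptions.

```python
import numpy as np

def joint_bilateral_filter(image, guide, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Smooth `image` with weights taken from `guide` (same shape)."""
    image = image.astype(np.float64)
    guide = guide.astype(np.float64)
    pad_img = np.pad(image, radius, mode="reflect")
    pad_gd = np.pad(guide, radius, mode="reflect")
    height, width = image.shape
    acc = np.zeros_like(image)
    norm = np.zeros_like(image)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Spatial kernel: depends only on the offset.
            spatial = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
            ys, xs = radius + dy, radius + dx
            shifted_img = pad_img[ys:ys + height, xs:xs + width]
            shifted_gd = pad_gd[ys:ys + height, xs:xs + width]
            # Range kernel: depends on guide similarity, not on the image.
            rng = np.exp(-((shifted_gd - guide) ** 2) / (2 * sigma_r ** 2))
            weight = spatial * rng
            acc += weight * shifted_img
            norm += weight
    return acc / norm
```

For an RGGB mosaic, the R and B sub-images would be `raw[0::2, 0::2]` and `raw[1::2, 1::2]`, each filtered with the interpolated G samples at the same positions as the guide. Because the weights come from the guide, edges present in G are preserved in the filtered R and B planes.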

Denoising results zoomed in (noisy image captured at ISO 2500)

Digital Gain
  • RAW domain

Digital gain is a fundamental module in an ISP pipeline and is always active by default within the Infinite-ISP. It works in combination with the auto exposure module to choose an appropriate scalar value to scale the pixel values of all channels to improve the image exposure. This scaling factor, alternatively referred to as “gain,” can be configured by specifying a list of gain values in the configuration file.
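
A minimal sketch of the scaling step, assuming the gain has already been chosen from the configured list and that results are clipped to the sensor bit depth (the function name is hypothetical):

```python
import numpy as np

def apply_digital_gain(raw, gain, bit_depth):
    """Scale all channels by one scalar gain and clip to the valid range."""
    max_value = 2 ** bit_depth - 1
    scaled = raw.astype(np.float64) * gain
    return np.clip(np.round(scaled), 0, max_value).astype(np.uint16)
```

Clipping matters because bright pixels saturate once the gain pushes them past the maximum value, which is what produces an overexposed result at high gains.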

DG 1

DG 2

DG 3

White Balance
  • RAW domain
  • Tunable

White balance corrects the appearance of the white color in the image by applying gains to the red and blue channels. Parameters can be configured to automatically tune these gains using an auto-white balance algorithm.
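
Applying the red and blue gains directly on the Bayer mosaic can be sketched as below. The function name, the `bayer` argument, and the clipping to a 10-bit range are assumptions for illustration.

```python
import numpy as np

# Position of the R and B samples inside the 2x2 Bayer cell.
RB_POSITIONS = {"rggb": ((0, 0), (1, 1)), "bggr": ((1, 1), (0, 0)),
                "grbg": ((0, 1), (1, 0)), "gbrg": ((1, 0), (0, 1))}

def apply_white_balance(raw, r_gain, b_gain, bayer="rggb", bit_depth=10):
    """Multiply R and B samples by their gains; G is left unchanged."""
    (ry, rx), (by, bx) = RB_POSITIONS[bayer]
    out = raw.astype(np.float64)
    out[ry::2, rx::2] *= r_gain
    out[by::2, bx::2] *= b_gain
    return np.clip(np.round(out), 0, 2 ** bit_depth - 1).astype(np.uint16)
```

Green acts as the reference channel, so only two gains are needed.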

Output of White Balance

3A STATS
  • RAW domain

In an ISP, 3A refers to three modules: auto white balance (AWB), auto exposure (AE), and auto focus (AF, not currently implemented in our ISP), all of which contribute significantly to the image quality delivered by an ISP. These modules function in combination with digital gain and white balance to provide feedback on configured parameters while not directly modifying the image.

Auto White Balance (AWB)

AWB works with the WB module to update the white balance (WB) gains. It uses a modified Grey World algorithm, which automatically configures the red and blue gains. The algorithm first filters out overexposed and underexposed pixels using configurable thresholds and then applies the Grey World algorithm to the remaining pixels. The Grey World algorithm was chosen for its balance between ease of implementation in RTL (Register-Transfer Level) and overall performance.

  • Preprocess Pixel Filtering – removal of specified number of underexposed and overexposed pixels
  • Grey World Algorithm – the red and blue gains are calculated relative to the green channel using channel averages, assuming that the scene averages to a neutral gray.
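
The two steps above can be sketched as follows, assuming an RGGB mosaic with even dimensions and simple lower and upper thresholds for the exposure filter. The function name and parameter names are hypothetical.

```python
import numpy as np

def grey_world_gains(raw, underexposed_limit, overexposed_limit):
    """Return (r_gain, b_gain) for an RGGB mosaic."""
    raw = raw.astype(np.float64)
    r = raw[0::2, 0::2]
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2  # average of the two greens
    b = raw[1::2, 1::2]
    quads = np.stack([r, g, b])
    # Keep only 2x2 cells in which every channel is well exposed.
    valid = np.all((quads > underexposed_limit) & (quads < overexposed_limit), axis=0)
    if not valid.any():
        return 1.0, 1.0  # nothing usable: fall back to unity gains
    r_mean, g_mean, b_mean = r[valid].mean(), g[valid].mean(), b[valid].mean()
    # Grey World: scale R and B so their averages match the green average.
    return g_mean / r_mean, g_mean / b_mean
```

The returned gains are the values a white balance stage would then multiply into the red and blue samples.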

WB without tuned gains (R and B gains=1)

WB with Tuned gains (r: 1.253, b: 2.843)

Auto Exposure (AE)

The AE module adjusts image brightness by assessing the frame and providing feedback on the digital gain applied to it. The primary AE statistic is the skewness of the luminance histogram of the input frame. The skewness around a central luminance (a configurable parameter) indicates whether an image is underexposed or overexposed. Because an image with high dynamic range and high contrast has a roughly uniform grayscale distribution, the AE module suggests feedback that moves the histogram toward a balanced distribution around the center luminance. The feedback is based on the skewness, calculated using the Fisher-Pearson coefficient, and the configurable parameter histogram skewness.

AE feedback categorizes the image as:

  • |skewness| ≤ histogram skewness: Image has correct exposure
  • skewness < -histogram skewness: Image is underexposed
  • skewness > histogram skewness: Image is overexposed

This feedback can be incorporated by processing the frame again using the render_3a flag discussed below.
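
A minimal sketch of the skewness test, assuming the Fisher-Pearson coefficient m3 / m2^1.5 with both moments taken about the configured center luminance rather than the mean. The function name and the returned labels are illustrative.

```python
import numpy as np

def ae_feedback(luma, center_luminance, histogram_skewness):
    """Classify exposure from the skewness of luminance around a center."""
    diff = luma.astype(np.float64).ravel() - center_luminance
    m2 = np.mean(diff ** 2)  # second moment about the center luminance
    m3 = np.mean(diff ** 3)  # third moment about the center luminance
    skewness = 0.0 if m2 == 0 else m3 / m2 ** 1.5
    if skewness < -histogram_skewness:
        return skewness, "underexposed"
    if skewness > histogram_skewness:
        return skewness, "overexposed"
    return skewness, "correct"
```

Taking the moments about a fixed center means a frame whose pixels sit mostly below that center yields negative skewness, which matches the underexposed case in the list above.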

Overexposed image (DG=3)

Corrected exposure (DG=1)

Color Filter Array (CFA)
  • RGB domain

A CFA is a mosaic of tiny color filters placed over the individual pixels of a sensor so that it can capture color information and record full-color images. In Infinite-ISP, the filters follow the Bayer pattern, which records each color at alternating positions. The CFA (demosaic) module in an ISP transforms the captured raw data into a three-channel RGB image. In Infinite-ISP, this transformation interpolates the missing color samples with the Malvar-He-Cutler algorithm, known for its gradient-corrected bilinear interpolation approach. Demosaicing is a default module, i.e. it is always active in the pipeline.
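
To give a flavor of gradient-corrected interpolation, the sketch below estimates only the green plane, using the 5×5 Malvar-He-Cutler kernel for green at red or blue sites. It assumes an RGGB mosaic and reflect padding at the borders; the red and blue planes use further kernels that are omitted here, and the function name is hypothetical.

```python
import numpy as np

# Malvar-He-Cutler kernel for estimating G at an R or B site.
G_AT_RB = np.array([[0, 0, -1, 0, 0],
                    [0, 0, 2, 0, 0],
                    [-1, 2, 4, 2, -1],
                    [0, 0, 2, 0, 0],
                    [0, 0, -1, 0, 0]], dtype=np.float64) / 8

def interpolate_green(raw):
    """Full-resolution green plane from an RGGB mosaic."""
    raw = raw.astype(np.float64)
    height, width = raw.shape
    padded = np.pad(raw, 2, mode="reflect")
    estimate = np.zeros_like(raw)
    # Correlate with the kernel by summing shifted copies of the image.
    for ky in range(5):
        for kx in range(5):
            if G_AT_RB[ky, kx] != 0:
                estimate += G_AT_RB[ky, kx] * padded[ky:ky + height, kx:kx + width]
    green = raw.copy()                        # G sites keep their samples
    green[0::2, 0::2] = estimate[0::2, 0::2]  # fill in R sites
    green[1::2, 1::2] = estimate[1::2, 1::2]  # fill in B sites
    return green
```

The kernel combines the bilinear average of the four green neighbors with a correction term from the same-color samples two pixels away, which is what "gradient-corrected" refers to.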

Before Demosaicing (RAW)

After Demosaicing

Color Correction Matrix (CCM)
  • RGB domain
  • Tunable

The Color Correction Matrix (CCM) module plays a critical role in counteracting the effects of incorrect color representation. It employs a calibrated 3×3 matrix, tailored to a specific sensor, to correct the Red, Green, and Blue (RGB) color channels of an image using matrix multiplication.

This correction maps the image onto the standard sRGB space ensuring a more precise display of colors as captured by the sensor.
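
The matrix multiplication can be sketched per pixel as follows. The function name is hypothetical and the example works in floating point with rounding and clipping, whereas a fixed-point model would use integer coefficients.

```python
import numpy as np

def apply_ccm(rgb, ccm, bit_depth=8):
    """Multiply every RGB pixel (last axis) by a 3x3 correction matrix."""
    max_value = 2 ** bit_depth - 1
    # For a row-vector pixel p, p @ ccm.T equals ccm @ p.
    corrected = rgb.astype(np.float64) @ np.asarray(ccm, dtype=np.float64).T
    return np.clip(np.round(corrected), 0, max_value).astype(rgb.dtype)
```

When each row of the matrix sums to 1, neutral gray pixels pass through unchanged, so the correction does not disturb the white balance.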

Gamma Correction
  • RGB domain

Gamma correction plays a pivotal role in image processing by translating the input image into a nonlinear space, aligning it with the nonlinear response of display devices. This transformation either broadens or compresses the dynamic range of images. Within Infinite-ISP, lookup tables are used to implement gamma correction as this approach is known for its ease of modification and encoding in hardware. It’s essential to highlight that this module functions as the complement to the OECF module. While the OECF initially converted the captured nonlinear data into linear data, gamma correction subsequently maps this linear data back into a nonlinear space for display, ensuring accurate and visually appealing color representation.
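
A lookup-table implementation can be sketched as below. It assumes a pure power-law curve with exponent 1/2.2 on 8-bit data; the actual table contents in the project may differ, and both function names are hypothetical.

```python
import numpy as np

def build_gamma_lut(bit_depth=8, gamma=2.2):
    """Precompute the encoded output for every possible input level."""
    max_value = 2 ** bit_depth - 1
    levels = np.arange(max_value + 1) / max_value
    return np.round(max_value * levels ** (1 / gamma)).astype(np.uint16)

def apply_gamma(rgb, lut):
    """Apply the curve by indexing the table with the integer pixel values."""
    return lut[rgb]
```

Changing the curve only requires regenerating the table, which is why this approach is easy to modify and to encode in hardware.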

Color Space Conversion (CSC)
  • YCbCr/YUV domain

In Infinite-ISP, the color space conversion module translates the RGB color representation of an image into the YUV color space, with the U and V channels carrying color details while the Y channel represents luminance. These conversions adhere to established ITU-R standards, providing three options:

  1. BT.601 – Tailored for SDTV
  2. BT.709 – Optimized for HDTV
  3. BT.2020 – Anticipated for future implementation
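
The conversion for the first two options can be sketched from the luma coefficients of each standard. This example assumes full-range 8-bit output with chroma centered at 128 and floating-point arithmetic; the fixed-point model may use integer coefficients and a different range. The names `LUMA_COEFFS` and `rgb_to_ycbcr` are hypothetical.

```python
import numpy as np

# (Kr, Kb) luma coefficients defined by each ITU-R recommendation.
LUMA_COEFFS = {"bt601": (0.299, 0.114), "bt709": (0.2126, 0.0722)}

def rgb_to_ycbcr(rgb, standard="bt709"):
    """Convert an (H, W, 3) 8-bit RGB image to full-range YCbCr."""
    kr, kb = LUMA_COEFFS[standard]
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb)) + 128  # blue-difference chroma
    cr = (r - y) / (2 * (1 - kr)) + 128  # red-difference chroma
    ycbcr = np.stack([y, cb, cr], axis=-1)
    return np.clip(np.round(ycbcr), 0, 255).astype(np.uint8)
```

Neutral pixels carry no chroma, so white, black, and gray all map to Cb = Cr = 128 under either standard.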

2D Noise Reduction
  • YCbCr/YUV domain

To reduce image noise and improve image quality, Infinite-ISP includes a 2D Noise Reduction (2DNR) algorithm employing a Non-Local Means (NLM) Filter. The NLM algorithm, a spatial domain denoising technique, capitalizes on the self-similarity present in natural images. It achieves denoising by calculating weighted averages of similar pixels across the entire image, with weights determined by the Euclidean distance between pixel intensities. This approach preserves image details while effectively reducing noise, although it can be computationally intensive due to its non-local nature.
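
The weighted-average idea can be sketched for a single luma plane as follows. To keep the cost manageable, this example restricts the comparison to a small search window instead of the entire image and compares small patches rather than single pixels; the function name, patch and window sizes, and the filter strength `h` are assumptions.

```python
import numpy as np

def non_local_means(image, patch_radius=1, search_radius=3, h=10.0):
    """Denoise a 2-D luma plane with a windowed Non-Local Means filter."""
    image = image.astype(np.float64)
    height, width = image.shape
    pad = patch_radius + search_radius
    padded = np.pad(image, pad, mode="reflect")
    size = 2 * patch_radius + 1
    ext_h, ext_w = height + 2 * patch_radius, width + 2 * patch_radius
    base = padded[search_radius:search_radius + ext_h,
                  search_radius:search_radius + ext_w]
    acc = np.zeros_like(image)
    norm = np.zeros_like(image)
    for dy in range(2 * search_radius + 1):
        for dx in range(2 * search_radius + 1):
            shifted = padded[dy:dy + ext_h, dx:dx + ext_w]
            sq_diff = (base - shifted) ** 2
            # Mean squared patch difference via a box sum over the patch.
            dist = np.zeros_like(image)
            for py in range(size):
                for px in range(size):
                    dist += sq_diff[py:py + height, px:px + width]
            dist /= size * size
            weight = np.exp(-dist / (h * h))  # similar patches weigh more
            centre = shifted[patch_radius:patch_radius + height,
                             patch_radius:patch_radius + width]
            acc += weight * centre
            norm += weight
    return acc / norm
```

Even with a 7×7 search window and 3×3 patches, every output pixel needs 49 patch comparisons, which illustrates why the filter is computationally intensive.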

Denoising results zoomed in (noisy image captured at ISO 2500)

RGB Conversion

The RGB conversion module determines the output format of the Infinite-ISP pipeline. If enabled, it performs the inverse color space conversion from YUV to RGB, based on the conversion type specified in the CSC module parameters; otherwise, the pipeline outputs a YUV image.

The conversion follows the ITU-R BT.601 and BT.709 standards, multiplying the YUV image by the corresponding transformation matrix.
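
The inverse conversion can be sketched by solving the luma and chroma definitions for R, G, and B. As an assumption for this example, the input is full-range 8-bit YCbCr with chroma centered at 128, and the arithmetic is floating point; the names used are hypothetical.

```python
import numpy as np

# (Kr, Kb) luma coefficients defined by each ITU-R recommendation.
LUMA_COEFFS = {"bt601": (0.299, 0.114), "bt709": (0.2126, 0.0722)}

def ycbcr_to_rgb(ycbcr, standard="bt709"):
    """Convert an (H, W, 3) full-range YCbCr image back to 8-bit RGB."""
    kr, kb = LUMA_COEFFS[standard]
    ycbcr = ycbcr.astype(np.float64)
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1] - 128, ycbcr[..., 2] - 128
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    # Green follows from the luma definition Y = Kr*R + Kg*G + Kb*B.
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    rgb = np.stack([r, g, b], axis=-1)
    return np.clip(np.round(rgb), 0, 255).astype(np.uint8)
```

The same standard must be used in both directions; mixing BT.601 and BT.709 coefficients produces a visible color shift.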

Output of RGB conversion (RGB image)

Invalid Region Crop (IRC)

This block crops the image to a valid input size for the next scale module using a configurable starting point.

Scale

The Scale block downsizes a full-resolution image to the output resolution configured by the user, using the nearest-neighbor algorithm. The valid input sizes for the Scale block are 1920×1080 and 1920×1440. If the input size is invalid, the image is first cropped to a valid size and then scaled down to the desired output. The implemented approach uses a combination of downscaling and cropping, as shown in the table below:

Input Size | Output Size | Downscale Factor (width, height) | Crop Value (width, height)
1920×1080 | 640×480 | 3, 2 | -, 60
1920×1080 | 640×360 | 3, 3 | -, -
1920×1440 | 960×720 | 2, 2 | -, -
1920×1440 | 640×480 | 3, 3 | -, -

Table 1: Valid input and corresponding output sizes for Infinite-ISP pipeline.
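
The combinations in the table can be reproduced with a short sketch. It assumes that nearest-neighbor downscaling keeps every n-th sample and that the crop is applied after scaling and split evenly between the two sides; the function name and both of these choices are illustrative.

```python
import numpy as np

def downscale_nearest(image, factor_h, factor_w, crop_h=0, crop_w=0):
    """Decimate by integer factors, then remove crop_h rows / crop_w columns."""
    scaled = image[::factor_h, ::factor_w]  # keep every n-th sample
    top, left = crop_h // 2, crop_w // 2
    height, width = scaled.shape[:2]
    return scaled[top:height - (crop_h - top), left:width - (crop_w - left)]
```

For the first table row, a 1920×1080 frame is scaled by 3 in width and 2 in height to 640×540, and cropping 60 rows gives 640×480: `downscale_nearest(frame, 2, 3, crop_h=60)`.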

YUV Format - 444-422
  • YCbCr/YUV domain

The YUV conversion format module subsamples YUV images to conserve bandwidth. The Y, U, and V components are consolidated into a single array, known as the packed format. Pixels are grouped into macro-pixel clusters, and the arrangement varies with the specific YUV format. The implemented YUV formats are 4:4:4 and 4:2:2, which trade off image quality against bandwidth.
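
Packing a 4:4:4 image into 4:2:2 can be sketched as below. The YUYV byte order (Y0 U0 Y1 V0 per two-pixel macro pixel) and the choice to keep the chroma of the first pixel in each pair instead of averaging are assumptions for this example; the function name is hypothetical.

```python
import numpy as np

def yuv444_to_yuyv422(yuv):
    """yuv: (H, W, 3) array with even W. Returns packed (H, 2 * W) YUYV."""
    height, width, _ = yuv.shape
    packed = np.empty((height, width * 2), dtype=yuv.dtype)
    packed[:, 0::2] = yuv[..., 0]      # every Y sample is kept
    packed[:, 1::4] = yuv[:, 0::2, 1]  # U from the first pixel of each pair
    packed[:, 3::4] = yuv[:, 0::2, 2]  # V from the first pixel of each pair
    return packed
```

The packed 4:2:2 frame uses two values per pixel instead of three, a one-third reduction in bandwidth at the cost of halved horizontal chroma resolution.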

Some functions, such as Lens Shading Correction (LSC), High Dynamic Range (HDR) stitching, Tone Mapping (TM), Local Dynamic Contrast Improvement (LDCI), and Sharpening or Edge Enhancement (EE), will be added to the design in the future.

Distinctive Features

Image rendering

In the domain of Image Signal Processing (ISP), 3A stands for the trio of essential modules: auto white balance (AWB), auto exposure (AE), and auto focus (AF), all of which wield significant influence over the quality of images delivered by an ISP. AWB strives to achieve natural coloration in output images, while AE corrects scene brightness through digital gain adjustments. AF fine-tunes focus for optimal contrast and image clarity. These 3A modules finely tune parameters to enhance image quality based on 3A statistics computed at various points in the pipeline. This feedback loop, evident in printed logs, guides adjustments such as reducing digital gain for overexposed scenes. The 3A render feature iteratively updates parameters, refining digital gain and white balance gains to enhance image quality, and can be activated in the config file via the render_3a flag.

Dataset Processing

Processing an entire dataset is made easy with the Infinite-ISP pipeline. The pipeline currently supports three file formats: .raw, .NEF, and .dng. It retrieves the relevant sensor information from the image metadata and updates it in the ‘sensor_info’ section of the config file. The dataset should consist of a directory containing input files in one of the supported formats, along with their respective config files. The user can process multiple files with a single configuration or provide custom configurations for individual input files in the dataset. A default config file is pre-configured for the provided sample data. The user can also customize the script to process a dataset stored on the local computer by configuring paths.

Video processing

The video mode operates much like the render-3A feature, with one distinction: the feedback of the 3A modules from one frame is carried over to the next, so each frame is processed only once. Since a video comprises a series of sequentially captured frames, a single config file suffices for video processing. This feature can be enabled via a boolean flag in the script.

Test vector generation

This feature serves a unique purpose: it’s tailor-made for testing and debugging specific segments of the pipeline. This handy feature allows you to tap into the pipeline’s output at any chosen point by designating it as the Device Under Test (DUT). The DUT can be a single module or a series of consecutive modules. The DUT output, called the test vector, is saved as a numpy array (.npy file).

Modular Output

Infinite-ISP introduces a unique capability that empowers users to capture module-level output using a configurable flag called “is_save” in the config file. This level of granularity in output capture is not typically offered by most ISPs. Currently, Infinite-ISP supports two save formats: numpy files and png files.

Sample dataset

For a seamless user experience, a sample dataset is included as part of the project. This dataset comprises raw files, each accompanied by its corresponding tuned parameter configuration file. It includes one indoor and four outdoor images, as shown below.

Conclusion

In conclusion, we are committed to providing not only a robust open-source Image Signal Processing solution but also a good user experience. The project is developed with the user in mind, with features such as modular output and test vector generation to aid testing and debugging. We prioritize usability, making it simple for you to use Infinite-ISP in your image processing work. As we continue to expand and enhance the project, this commitment to users will guide us in keeping Infinite-ISP a useful tool for your ISP needs.