10xEngineers

Impact of the Code Size Reduction Extension (Zce) on RISC-V Cores

Authors: Rohan Arshid, 10xEngineers, Pakistan, Fatima Saleem, 10xEngineers, Pakistan, Farhan Ali, 10xEngineers, Pakistan

Background

The RISC-V instruction set architecture, known for its flexibility and efficiency, is popular in embedded systems and low-power applications. To optimize code size, the Zce (Code Size Reduction Extension) was introduced. Within Zce, the Zcb, Zcmp, and Zcmt sub-extensions provide compressed instructions that streamline Load/Store operations, Zero and Sign extension, Arithmetic Operations, Logical Operations, Stack operations, Data Movement, and Indexed jumps. This is especially beneficial for devices with limited memory, such as those found in embedded systems and the Internet of Things (IoT).

Objectives

This case study aims to evaluate the impact of the Zcb, Zcmp, and Zcmt sub-extensions on RISC-V cores, taking OpenHW’s CVA6 as an implementation example. Architecture changes in the CVA6 core are highlighted in blue below.

Figure 1: OpenHW’s CVA6 Architecture Pipeline with Zc* extension changes

We also examine the impact of these instructions, focusing on:

  • Reduced code size in applications that heavily use the stack, Load/Store operations, Zero and Sign extension, Arithmetic Operations, and Logical Operations.

  • Improved performance in tasks involving register and memory management.

Overview

Zc* Extension

This extension reduces code size by replacing common instruction patterns with more compact representations. It is particularly useful for minimizing memory usage in embedded systems.

Zc* is a family of six extensions: Zca, Zcf, Zcd, Zcb, Zcmp, and Zcmt. Of these, Zca, Zcb, Zcmp, and Zcmt are collectively known as the Code Size Reduction Extension (Zce).

Zcb Sub-extension

The Zcb sub-extension further enhances the C extension by adding 12 new 16-bit instructions. These instructions are essentially compressed versions of existing 32-bit operations, so the extension integrates seamlessly without introducing entirely new functionality.

Key Features of Zcb

The Zcb extension introduces compressed instructions for:

  • Load and Store Operations: Compressed versions of byte and half-word load/store instructions with limited offset ranges.
  • Zero and Sign Extension: Instructions for zero or sign-extending bytes, half-words, and words within registers.
  • Arithmetic Operations: A compressed multiply instruction where the result overwrites one of the operands.
  • Logical Operations: A compressed NOT instruction that inverts the bits of a register.

These additions are particularly beneficial for high-performance implementations and are included in the RVA23 application profile. On average, Zcb achieves nearly a 1% reduction in code size with minimal implementation cost. Some benchmarks, such as those from Embench, have demonstrated code size reductions of up to 5-6% with the adoption of Zcb [1].
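The correspondence between the 12 Zcb instructions and the 32-bit instructions they compress can be written down as a small table. The sketch below is illustrative only: the operand spellings are simplified, the register and offset restrictions of each encoding are omitted, and the instruction counts in the example are hypothetical rather than measured.

```python
# Each Zcb instruction paired with the 32-bit instruction it compresses.
# Operand spellings are illustrative; register and offset limits are omitted.
ZCB_EQUIVALENTS = {
    "c.lbu":    "lbu rd, uimm(rs1)",
    "c.lhu":    "lhu rd, uimm(rs1)",
    "c.lh":     "lh rd, uimm(rs1)",
    "c.sb":     "sb rs2, uimm(rs1)",
    "c.sh":     "sh rs2, uimm(rs1)",
    "c.zext.b": "andi rd, rd, 0xff",
    "c.sext.b": "sext.b rd, rd",        # requires Zbb
    "c.zext.h": "zext.h rd, rd",        # requires Zbb
    "c.sext.h": "sext.h rd, rd",        # requires Zbb
    "c.zext.w": "add.uw rd, rd, zero",  # RV64 only, requires Zba
    "c.not":    "xori rd, rd, -1",
    "c.mul":    "mul rd, rd, rs2",      # requires M or Zmmul
}


def bytes_saved(mnemonic_counts):
    """Bytes saved when each counted 32-bit instruction becomes its 16-bit Zcb form."""
    unknown = set(mnemonic_counts) - set(ZCB_EQUIVALENTS)
    if unknown:
        raise ValueError(f"not Zcb instructions: {sorted(unknown)}")
    # Every replacement shrinks one instruction from 4 bytes to 2 bytes.
    return 2 * sum(mnemonic_counts.values())


# Hypothetical occurrence counts, chosen only to show the arithmetic.
example_counts = {"c.lbu": 40, "c.sb": 25, "c.not": 5}
print(len(ZCB_EQUIVALENTS), bytes_saved(example_counts))
```

Each replaced instruction saves exactly two bytes, so the overall reduction depends entirely on how often a program uses these patterns with operands that fit the compressed encodings.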

Zcmp Sub-extension

The Zcmp sub-extension introduces compressed instructions for stack and register manipulation. These include:

  • cm.push: Pushes registers onto the stack.
  • cm.pop: Pops registers off the stack.
  • cm.popret: Pops registers and returns from a subroutine.
  • cm.popretz: Pops registers, returns from a subroutine, and clears a specific register (a0).
  • cm.mvsa01: Moves the argument registers a0 and a1 into any two specified saved registers in the range s0 – s7.
  • cm.mva01s: Moves any two specified saved registers in the range s0 – s7 into the argument registers a0 and a1.

These instructions optimize common operations in functions, such as saving/restoring register values and returning from function calls, which often contribute significantly to an application’s code size.
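To make the saving concrete, a single cm.push can be expanded into the store and stack-adjust instructions it stands for. The following is a minimal sketch for RV32, based on the pseudo-code in the Zc specification; the function name expand_cm_push and its parameters are hypothetical and are not part of any toolchain or of CVA6.

```python
# ABI names of the registers cm.push can save, in ascending x-register order.
SAVE_ORDER = ["ra", "s0", "s1", "s2", "s3", "s4", "s5",
              "s6", "s7", "s8", "s9", "s10", "s11"]


def expand_cm_push(num_regs, stack_adj):
    """Return the RV32 instruction sequence equivalent to one cm.push.

    num_regs:  how many registers from SAVE_ORDER are saved, starting at ra
    stack_adj: total stack adjustment in bytes, given as a positive number
    """
    if not 1 <= num_regs <= len(SAVE_ORDER) or num_regs == 12:
        # The list {ra, s0-s10} has no encoding; s10 and s11 are saved together.
        raise ValueError("unsupported register list")
    base = -(-4 * num_regs // 16) * 16  # register area rounded up to 16 bytes
    if stack_adj not in (base, base + 16, base + 32, base + 48):
        raise ValueError("stack_adj is not encodable for this register list")
    seq = []
    offset = -4
    # The highest-numbered register is stored just below the incoming sp.
    for reg in reversed(SAVE_ORDER[:num_regs]):
        seq.append(f"sw {reg}, {offset}(sp)")
        offset -= 4
    seq.append(f"addi sp, sp, -{stack_adj}")
    return seq


for line in expand_cm_push(3, 16):
    print(line)
```

In this example one 16-bit cm.push replaces three stores and an addi, which would otherwise occupy 16 bytes as 32-bit instructions (or 8 bytes if every one of them had a compressed form).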

Zcmt Sub-extension

The Zcmt extension introduces support for an indexed jump to an address stored in a jump table, reducing a sequence of multiple instructions to a single instruction as long as the jump target can be stored in a 256-entry jump table.

The Zcmt extension introduces a new Control/Status register (CSR), JVT, and two new instructions: cm.jt and cm.jalt. cm.jalt is an extended version of cm.jt: on top of the cm.jt behaviour, it links the return address by copying the address of the instruction following the cm.jalt to the ra register, so that execution can return to it later.

The instruction builds a table_address as JVT.base + (index << 2) for RV32, or JVT.base + (index << 3) for RV64, and reads the jump target address from program memory at table_address. The hart then jumps to that address (and, in the case of cm.jalt only, links the return address to ra).

This is illustrated by the following diagram. In this example, the hart control flow eventually jumps to the address 0x1337 stored in the i-th entry of the jump table [2].

Figure 2: JVT CSR utilization to calculate the Jump table address
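The table lookup can be sketched as a small behavioural model. This is an illustration under stated assumptions, not the CVA6 logic: the function table_jump, the dictionary standing in for program memory, and all addresses are hypothetical. The split between the two instructions (indexes below 32 for cm.jt, 32 and above for cm.jalt) follows the Zcmt specification rather than the text above.

```python
def table_jump(jvt_base, index, pc, memory, xlen=32):
    """Model cm.jt / cm.jalt and return (next_pc, link).

    link is the value written to ra, or None when nothing is linked.
    memory is a hypothetical dict mapping table addresses to stored targets.
    """
    if not 0 <= index < 256:
        raise ValueError("index must fit the 256-entry table")
    shift = 2 if xlen == 32 else 3          # entries are 4 or 8 bytes wide
    table_address = jvt_base + (index << shift)
    target = memory[table_address]
    # Indexes below 32 encode cm.jt; 32 and above encode cm.jalt, which links.
    link = pc + 2 if index >= 32 else None  # the instruction itself is 16 bits
    return target, link


# Hypothetical jump table with two populated entries.
JVT_BASE = 0x8000_1000
table = {
    JVT_BASE + (3 << 2): 0x8000_3000,   # entry 3, reached with cm.jt
    JVT_BASE + (40 << 2): 0x8000_2000,  # entry 40, reached with cm.jalt
}
print(table_jump(JVT_BASE, 3, 0x8000_0100, table))
print(table_jump(JVT_BASE, 40, 0x8000_0100, table))
```

The parentheses around the shift matter: the index is scaled to a byte offset first and only then added to the base.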

Compatibility

Instructions of the Zcmp and Zcmt extensions reuse encodings of the compressed double-precision floating-point loads and stores (c.fsdsp, c.fldsp) and are thus incompatible with these instructions. This is in line with the philosophy of the new extensions: improve code size for embedded platforms, where floating-point instructions are less critical (and may not even be implemented in their uncompressed formats).

Use Case Scenarios

Embedded Systems

In embedded systems with tight memory constraints, such as automotive controllers, the Zce extension is invaluable. Code size reduction allows developers to pack more functionality into limited memory, while performance improvements ensure real-time responsiveness.

IoT Devices

IoT devices, which often have limited RAM and ROM, benefit significantly from Zce. The reduced code size leaves more room for additional features or updates, while lower energy consumption extends battery life.

Real-Time Operating Systems (RTOS)

Zce compressed instructions are especially useful in RTOS environments, where efficient context switching (which involves saving and restoring registers) is critical. Zce reduces the overhead of these operations, improving system responsiveness.

Energy Efficiency

Fewer memory accesses and smaller instructions can lower energy consumption. In battery-operated devices this extends battery life, which matters in low-power, real-time applications.

Implementation of Zcb extension in CVA6

Implementing the Zcb extension is straightforward, as it involves adding 12 additional instructions to the compressed decoder (16-bit instruction decoder). This minimal addition maps onto existing decodings, ensuring compatibility with the architecture profiles. All encodings used by Zcb were previously reserved, which maintains compatibility and prevents conflicts within the ISA [3].

Implementation of Zcmp extension in CVA6

Since each of the new instructions is executed as a series of existing 32-bit instructions, we need a special sequential decoder that stalls the fetching of new instructions until all the corresponding instructions have been generated (one instruction per cycle) and issued.

This new decoder extends the existing compressed decoder and the main decoder in the instruction decode stage.

Our new macro_decoder has the following important features:

  1. It has an input signal to indicate that the current instruction is a Zcmp instruction and needs to be decoded into a series of 32-bit instructions.
  2. It has an input signal from the Issue Stage indicating that the last issued instruction has been acknowledged, so that the next instruction in the series can be decoded and sent.
  3. It has an output signal to indicate that the macro decoder is busy decoding the current instruction into a series of instructions and instruction fetching should be stalled as long as it is busy.
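The interplay of these three signals can be illustrated with a cycle-level toy model. It is only a sketch: the class MacroDecoder, its method cycle, and the parameter names is_zcmp, expansion, and issue_ack are hypothetical stand-ins for the signals listed above, and the real macro_decoder generates the expansion itself rather than receiving it as an argument.

```python
class MacroDecoder:
    """Cycle-level toy model of a sequential decoder for Zcmp instructions."""

    def __init__(self):
        self.pending = []  # expanded instructions not yet accepted by issue

    @property
    def busy(self):
        # While busy, the fetch of new instructions must stay stalled.
        return bool(self.pending)

    def cycle(self, is_zcmp=False, expansion=(), issue_ack=False):
        """Advance one clock cycle and return the instruction presented to issue."""
        if self.busy:
            if issue_ack:              # the previous instruction was accepted
                self.pending.pop(0)
        elif is_zcmp:                  # a new Zcmp instruction arrives
            self.pending = list(expansion)
        return self.pending[0] if self.pending else None


decoder = MacroDecoder()
sequence = ["sw s0, -4(sp)", "sw ra, -8(sp)", "addi sp, sp, -16"]
print(decoder.cycle(is_zcmp=True, expansion=sequence), decoder.busy)
print(decoder.cycle(issue_ack=True), decoder.busy)
print(decoder.cycle(issue_ack=False), decoder.busy)  # held until acknowledged
print(decoder.cycle(issue_ack=True), decoder.busy)
print(decoder.cycle(issue_ack=True), decoder.busy)   # sequence finished
```

The model holds the current instruction steady until the issue stage acknowledges it, and busy drops only after the last instruction of the series has been accepted.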

Design Changes

  1. compressed_decoder: Updated the compressed decoder to indicate that the current compressed instruction belongs to the Zcmp extension.
  2. id_stage: As mentioned, we instantiate and connect our new macro_decoder between the compressed decoder and the larger main decoder when the Zcmp extension is enabled in the CVA6 configuration. Furthermore, the fetching of new instructions is stalled for as long as our macro_decoder is busy [4].

State machine

Figure 3: Macro Decoder State machine

Implementation of Zcmt extension in CVA6

The ZCMT implementation in CVA6 targets only the 32-bit embedded-class configuration of the core.

Key additions

  • Added support for the compressed table jump instructions cm.jt (jump table) and cm.jalt (jump-and-link table) in the zcmt_decoder module.
  • Implemented the Jump Vector Table (JVT) CSR, which stores the base address of the jump table, in the csr_reg module.
  • Implemented a return address stack in the zcmt_decoder module, enabling cm.jalt to behave equivalently to jal ra (jump-and-link with return address) by pushing the return address onto the stack.

The implementation of the ZCMT extension involved modifying the compressed decoder to support ZCMT instruction decoding, enhancing the branch unit to correctly execute jumps for ZCMT instructions, and connecting the cache interface to enable implicit reading of the jump table from memory. Additionally, the zcmt_decoder module was introduced to decode instructions, fetch addresses, and construct jump instructions, ensuring efficient integration of the ZCMT extension for code size reduction in embedded platforms. A high-level block diagram of the ZCMT implementation in CVA6 is shown in Figure 4.

Figure 4: High-level block diagram of ZCMT extension implementation

Compressed Decoder

The instructions cm.jt and cm.jalt are decoded by the compressed decoder, which generates a signal indicating that the instruction is both compressed and a ZCMT instruction. The original instruction is then forwarded to the zcmt_decoder along with the ZCMT signal for further processing.

ZCMT_Decoder

The zcmt_decoder is the primary decoder module for ZCMT instructions. It interfaces with the compressed decoder, the cache interface for memory access, and the main decoder via the cvxif_compressed_if_driver. When a cm.jt or cm.jalt instruction is received, the module first distinguishes between the two instruction types and extracts the index. The base address is read from the JVT CSR and combined with the index to calculate the effective address of the table entry. A memory request is then issued to fetch the jump target address. Once the address is retrieved, a new jump instruction is constructed and forwarded to the issue stage. While cm.jt/cm.jalt instructions are being decoded, subsequent instructions are stalled through the cvxif_compressed_if_driver. The block diagram of the zcmt_decoder module is shown in Figure 5.

Figure 5: Block diagram of zcmt_decoder
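The first decoding step, telling cm.jt from cm.jalt and extracting the index, can be sketched as follows. The bit layout used here (bits [15:13] = 101, bits [12:10] = 000, an 8-bit index in bits [9:2], and quadrant bits [1:0] = 10) is taken from the ratified Zc specification and is stated as an assumption, since the text above does not give it. The function name decode_table_jump is hypothetical.

```python
def decode_table_jump(instr):
    """Return (mnemonic, index) for a 16-bit cm.jt/cm.jalt encoding, else None."""
    if instr & 0x3 != 0b10:                # compressed quadrant 2
        return None
    if (instr >> 10) & 0x3F != 0b101000:   # bits [15:13] = 101, [12:10] = 000
        return None
    index = (instr >> 2) & 0xFF            # 8-bit table index in bits [9:2]
    # Both instructions share one encoding; the index value selects between them.
    return ("cm.jt" if index < 32 else "cm.jalt", index)


print(decode_table_jump(0xA002))  # index 0
print(decode_table_jump(0xA0A2))  # index 40
print(decode_table_jump(0x0001))  # c.nop, not a table jump
```

Because the two instructions differ only in the index range, the decoder needs no extra opcode bits to decide whether the return address must be linked.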

Branch Unit

In CVA6, unconditional jumps are pre-decoded in the frontend, and predicted addresses are calculated. When the instruction reaches the branch unit, the jump is executed based on the predicted address. However, if the instruction was not identified as a jump during frontend processing, the target address is not calculated, leading to invalid jumps. To address this issue, the is_zcmt signal is propagated from the zcmt_decoder to the branch unit during the execute stage, along with the newly constructed jump instruction. A condition in the branch unit ensures that ZCMT instructions are always treated as mispredicted, so the core discards the predicted address and jumps to the calculated address of the newly constructed jump instruction.
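The forced-mispredict condition amounts to one extra term in the comparison. The sketch below is a simplified illustration, not the CVA6 branch unit: the function resolve_jump and its parameters are hypothetical, and only is_zcmt corresponds to a signal named in the text.

```python
def resolve_jump(predicted_target, computed_target, is_zcmt):
    """Return (mispredict, redirect_pc) for a jump reaching the branch unit."""
    # A table jump was not recognised as a jump in the frontend, so whatever
    # address was predicted is meaningless: always redirect to the computed one.
    mispredict = is_zcmt or predicted_target != computed_target
    return mispredict, computed_target


print(resolve_jump(0x100, 0x100, is_zcmt=False))  # ordinary jump, correct guess
print(resolve_jump(0x100, 0x100, is_zcmt=True))   # table jump, forced redirect
```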

Known Limitations

The implementation targets only the 32-bit embedded configuration of CVA6, which has no MMU. Because these cores do not use an MMU, the cache port that would otherwise serve it (port 0) is reused to connect the zcmt_decoder to the cache [5].

Impact of Zcb, Zcmp, and Zcmt on RISC-V Cores

Code Size Reduction

The primary benefit of Zcb, Zcmp, and Zcmt instructions is the ability to reduce code size by using compressed instructions for common operations like pushing and popping registers, logical operations, register movements, and indexed jumps. Furthermore, the size of stack-heavy code can be reduced by 10-15% in some cases. This reduction is crucial in memory-constrained environments, such as embedded systems and microcontrollers.

Performance Improvements

The Zcmp instructions can also improve performance by reducing the number of instructions that must be fetched during function calls and returns. Instructions like cm.popret and cm.popretz combine multiple operations (popping registers and returning from a function) into a single instruction, reducing the instruction count and improving execution speed.

Conclusion

The integration of the Zcb, Zcmp, and Zcmt sub-extensions of Zce into RISC-V cores provides instruction compression for Load/Store operations, Zero and Sign extension, Arithmetic Operations, Logical Operations, Stack operations, register movements, and indexed jumps. By reducing code size and improving the efficiency of common tasks, these extensions provide both memory and performance benefits. For embedded systems, IoT devices, and other memory-constrained environments, the Zcb, Zcmp, and Zcmt extensions are a powerful tool that allows developers to write compact, efficient code.

This makes RISC-V, along with extensions like Zcb, Zcmp, and Zcmt, a compelling choice for the next generation of embedded processors.

References

[1] “It’s all about RISC-V code size.” Codasip. Accessed: Feb. 18, 2025. [Online]. Available: https://codasip.com/2023/07/05/it-is-all-about-riscv-code-size/

[2] FPRox. “RISC-V Compressed Instructions (part 2): Zc extensions.” What are you optimizing for ? (fprox’s substack) | Substack. Accessed: Feb. 18, 2025. [Online]. Available: https://fprox.substack.com/p/riscv-compressed-zc-extensions

[3] “ADD SUPPORT FOR `Zcb` EXTENSION (from Code Size Reduction, Zce) by Abdulwadoodd · Pull Request #1431 · openhwgroup/cva6.” GitHub. Accessed: Feb. 19, 2025. [Online]. Available: https://github.com/openhwgroup/cva6/pull/1431

[4] “Zcmp extension support by rohan-10xe · Pull Request #1779 · openhwgroup/cva6.” GitHub. Accessed: Feb. 19, 2025. [Online]. Available: https://github.com/openhwgroup/cva6/pull/1779

[5] “Adding support for ZCMT Extension for Code-Size Reduction in CVA6 by farhan-108 · Pull Request #2659 · openhwgroup/cva6.” GitHub. Accessed: Feb. 19, 2025. [Online]. Available: https://github.com/openhwgroup/cva6/pull/2659