Cache

Introduction

A Cache is a small, high-speed memory located between the CPU and main memory that stores recently accessed instructions and data. Because the CPU runs much faster than external memory (such as PSRAM or Flash), the Cache exploits the principle of locality to keep hot data on-chip. This reduces CPU wait cycles, lowers the number of accesses to the external memory interface, and improves system throughput and response time.

Cache Configuration

Each CPU core in the chip is equipped with a dedicated I-Cache and D-Cache, both adopting an N-way Set-Associative structure:

  • Way: Each Set contains N ways, and each way holds one Cache Line. With N ways, up to N Cache Lines whose addresses map to the same Set can be held in that Set at the same time.

  • Set: The Set is selected by the index bits of the address. The ways within a Set are candidates to replace one another, and a replacement algorithm chooses which way to evict. The chip uses the LRU (Least Recently Used) replacement algorithm.

  • Cache Line: The minimum unit of data transferred between main memory and Cache. A longer Cache Line brings in more contiguous data per fill; however, if accesses are scattered, much of each line goes unused and bandwidth is spent on unnecessary transfers.
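The interaction between ways and LRU replacement within a single Set can be illustrated with a toy software model. This is only a sketch: the type and function names (toy_set_t, toy_set_access) are hypothetical, the age-counter scheme is one simple way to model LRU, and it does not describe how the hardware tracks usage.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS 4u   /* illustrative value for a 4-way cache */

/* One Set of a toy N-way cache: each way holds one tag plus an age stamp. */
typedef struct {
    uint32_t tag[NUM_WAYS];
    uint32_t age[NUM_WAYS];    /* larger value = used more recently */
    bool     valid[NUM_WAYS];
    uint32_t clock;            /* incremented on every access */
} toy_set_t;

/* Look up `tag` in the Set. Returns true on a hit. On a miss, the line is
 * placed in an empty way if one exists; otherwise the least recently used
 * way is evicted and reused. */
static bool toy_set_access(toy_set_t *s, uint32_t tag)
{
    uint32_t victim = 0;

    s->clock++;
    for (uint32_t w = 0; w < NUM_WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->age[w] = s->clock;          /* hit: now most recently used */
            return true;
        }
    }
    for (uint32_t w = 0; w < NUM_WAYS; w++) {
        if (!s->valid[w]) {                /* prefer an empty way */
            victim = w;
            break;
        }
        if (s->age[w] < s->age[victim]) {  /* otherwise track the oldest way */
            victim = w;
        }
    }
    s->tag[victim]   = tag;
    s->valid[victim] = true;
    s->age[victim]   = s->clock;
    return false;
}
```

Starting from a zero-initialized toy_set_t, four different tags fill the four ways; a fifth tag then evicts whichever of the four was touched least recently, while tags that were re-accessed in between survive.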

By default, the chip uses the following strategies for memory read and write operations:

  • Read: Read-Allocate. On a read miss, the Cache Line containing the read address is first loaded from memory into Cache, and the CPU then reads from Cache.

  • Write: Write-Allocate + Write-Back. On a write hit, the CPU writes the data to Cache only; memory is not updated at that time. On a write miss, the Cache Line containing the write address is first loaded from memory into Cache, and the CPU then writes the data to Cache. A modified line is written to memory later, when it is evicted or explicitly cleaned.
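The effect of these policies can be sketched with a toy model that holds a single Cache Line. It is purely illustrative: the names (toy_cache_t, toy_read, toy_write), the 256-byte backing array, and the one-line capacity are assumptions made to keep the example short, and addresses must stay below MEM_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u             /* bytes per Cache Line (illustrative) */
#define MEM_SIZE  256u            /* size of the toy backing memory */

typedef struct {
    uint8_t  mem[MEM_SIZE];       /* stands in for external memory */
    uint8_t  line[LINE_SIZE];     /* the single cached line */
    uint32_t base;                /* line-aligned address held in `line` */
    bool     valid;
    bool     dirty;               /* true when `line` differs from memory */
    unsigned line_fills;          /* memory -> Cache transfers */
    unsigned write_backs;         /* Cache -> memory transfers */
} toy_cache_t;

/* Ensure the line containing `addr` is resident, evicting if necessary. */
static void toy_allocate(toy_cache_t *c, uint32_t addr)
{
    uint32_t base = addr & ~(LINE_SIZE - 1u);

    if (c->valid && c->base == base) {
        return;                                   /* hit: nothing to do */
    }
    if (c->valid && c->dirty) {                   /* write-back on eviction */
        memcpy(&c->mem[c->base], c->line, LINE_SIZE);
        c->write_backs++;
    }
    memcpy(c->line, &c->mem[base], LINE_SIZE);    /* fill the whole line */
    c->base  = base;
    c->valid = true;
    c->dirty = false;
    c->line_fills++;
}

static uint8_t toy_read(toy_cache_t *c, uint32_t addr)
{
    toy_allocate(c, addr);                        /* read-allocate */
    return c->line[addr - c->base];
}

static void toy_write(toy_cache_t *c, uint32_t addr, uint8_t value)
{
    toy_allocate(c, addr);                        /* write-allocate */
    c->line[addr - c->base] = value;              /* memory is not touched */
    c->dirty = true;
}
```

After toy_write() to an address, the backing array still holds the old value until a later access to a different line evicts the dirty line. This is why, under Write-Back, memory can be temporarily stale with respect to the Cache.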

Note

Users can change the Cache policy for a specific memory region by modifying the MPU or MMU configuration.
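As a rough illustration of what such a policy change looks like at the attribute level, the sketch below builds memory attribute bytes in the format used by the Armv8-M MPU attribute indirection registers (MPU_MAIR0/MPU_MAIR1). It assumes the region is controlled through an Armv8-M MPU; the macro and function names are hypothetical and are not SDK APIs, and the code only computes values without writing any register.

```c
#include <stdint.h>

/* 4-bit cacheability hint for Normal memory in an Armv8-M MAIR attribute:
 *   bit 3: non-transient, bit 2: write-back (0 = write-through),
 *   bit 1: read-allocate, bit 0: write-allocate.
 * The encoding 0b0100 means Non-cacheable. Hypothetical helper names. */
#define MAIR_HINT(nt, wb, ra, wa) \
    ((uint8_t)(((nt) << 3) | ((wb) << 2) | ((ra) << 1) | (wa)))
#define MAIR_HINT_NON_CACHEABLE ((uint8_t)0x4)

/* Combine outer (bits 7:4) and inner (bits 3:0) hints into one attribute
 * byte. Valid for Normal memory only; Device memory uses another layout. */
static inline uint8_t mair_normal_attr(uint8_t outer, uint8_t inner)
{
    return (uint8_t)(((outer & 0xFu) << 4) | (inner & 0xFu));
}

/* Place an attribute byte into slot `index` (0..3) of a 32-bit MAIR value.
 * Each MAIR register holds four 8-bit attribute slots. */
static inline uint32_t mair_set_attr(uint32_t mair, unsigned index, uint8_t attr)
{
    unsigned shift = (index & 3u) * 8u;
    return (mair & ~(0xFFu << shift)) | ((uint32_t)attr << shift);
}
```

With these helpers, MAIR_HINT(1, 1, 1, 1) for both inner and outer gives 0xFF (Write-Back with read and write allocation, matching the default policy described above), MAIR_HINT(1, 0, 1, 0) gives 0xAA (Write-Through, Read-Allocate), and MAIR_HINT_NON_CACHEABLE gives 0x44. An MPU region then selects one of these attribute slots by index.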

The following table lists the Cache configuration parameters for each chip:

RTL8721Dx:

CPU    Type       Size    Way    Cache Line Size
KM4    I-Cache    16KB    4      32B
KM4    D-Cache    16KB    4      32B
KM0    I-Cache    16KB    4      32B
KM0    D-Cache    16KB    4      32B
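For these parameters, the number of Sets is 16KB / (4 ways × 32B) = 128, so each way covers 4KB. The sketch below shows how an address would split into tag, Set index, and line offset under the conventional scheme in which the lowest bits select the byte within the line and the next bits select the Set. That indexing scheme is an assumption, as the exact hardware index function is not stated here, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Cache geometry, e.g. { 16 * 1024, 4, 32 } for a 16KB, 4-way, 32B-line cache. */
typedef struct {
    uint32_t size_bytes;
    uint32_t ways;
    uint32_t line_bytes;
} cache_geom_t;

/* Fields of an address as seen by the cache (conventional indexing assumed). */
typedef struct {
    uint32_t tag;
    uint32_t set;
    uint32_t offset;
} cache_addr_t;

/* Number of Sets = total size / (ways * line size). */
static uint32_t cache_num_sets(const cache_geom_t *g)
{
    return g->size_bytes / (g->ways * g->line_bytes);
}

/* Split an address into line offset, Set index, and tag. */
static cache_addr_t cache_split(const cache_geom_t *g, uint32_t addr)
{
    uint32_t sets = cache_num_sets(g);
    cache_addr_t a;

    a.offset = addr % g->line_bytes;              /* byte within the line */
    a.set    = (addr / g->line_bytes) % sets;     /* which Set is indexed */
    a.tag    = addr / (g->line_bytes * sets);     /* remaining upper bits */
    return a;
}
```

Under this assumption, addresses that differ by a multiple of 4KB map to the same Set and compete for its 4 ways. More than four such hot addresses will evict one another even when the rest of the Cache is idle.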

Note

For more information about the KM0 and KM4 Cache, refer to the Arm®v8-M Architecture Reference Manual.

Cache Way Restriction

In scenarios with strict real-time requirements, critical code or data may be evicted from Cache by the replacement algorithm (such as LRU) due to other accesses, causing unpredictable access latency jitter. The Cache Way Restriction feature allows developers to “lock” data in a specified address range to particular Cache ways, preventing it from being replaced by data outside the restricted range, thereby providing a stable and predictable Cache hit rate for critical tasks.

RTL8721Dx:

Not supported.

Tightly-Coupled Memory (TCM)

When some or all Cache ways are no longer used as Cache, they can be remapped via register configuration to the TCM (Tightly-Coupled Memory) address space for direct CPU access. TCM has a fixed access latency (typically 0 wait cycles). Unlike Cache, where access time depends on whether an access hits or misses, TCM provides deterministic access timing, making it well suited to interrupt service routines (ISRs), real-time control algorithms, and other critical code and data with strict execution timing requirements.

Note

TCM cannot be accessed by bus masters other than the CPU (such as DMA). Data that needs to be accessed by other masters should not be placed in TCM.

RTL8721Dx:

Not supported.

Registers

The following describes the Cache-related register maps and field definitions for each chip’s CPU.

RTL8721Dx:

Not supported.