Preface#
Recently, I wanted to push the performance of the ADC to its limits, so I researched related peripheral functions.
I had never fully understood cache and MPU attributes before, but this time I finally got a clearer picture. On ARM chips with a cache, the cache attributes of each memory region are configured through the MPU, so in practice the two are used together.
- cache: A buffer between the processor core and memory that makes memory access more continuous, since some accesses have to cross buses. It can prefetch nearby memory on reads and defer writes to some extent, and most importantly it reduces contention on the internal bus, which speeds up data processing.
- MPU (Memory Protection Unit): The memory protection mechanism of ARM chips, used here mainly to configure the cache attributes of a memory region. It can have other uses, but frankly I have found few significant ones. For example, region settings can be refined per task or thread to get something similar to an MMU, but most applications never get that complex.
- DMA (Direct Memory Access): Copies values from peripheral registers to memory without processor involvement. After a set number of transfers (e.g., 1024), an interrupt is generated to notify us to process the data.
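The DMA description above can be sketched as a circular buffer that is processed one block at a time. This is a minimal host-side model, not HAL code: `dma_ring`, `on_dma_half_complete`, and `on_dma_full_complete` are hypothetical names standing in for the buffer the DMA fills and the two interrupt callbacks, and the "processing" is just a sum.

```c
#include <stddef.h>
#include <stdint.h>

#define DMA_BLOCK 1024u            /* samples per interrupt, as in the text */
#define DMA_RING  (2u * DMA_BLOCK) /* circular buffer holding two blocks */

/* Hypothetical buffer that the DMA would fill in circular mode. */
volatile uint16_t dma_ring[DMA_RING];

/* One result per block; here simply the sum of the samples. */
uint32_t block_sum[2];

/* Process one block while the DMA keeps writing into the other one. */
static void process_block(size_t index)
{
    const volatile uint16_t *src = &dma_ring[index * DMA_BLOCK];
    uint32_t sum = 0u;

    for (size_t i = 0; i < DMA_BLOCK; i++) {
        sum += src[i];
    }
    block_sum[index] = sum;
}

/* Hypothetical half-transfer interrupt: the first block is complete. */
void on_dma_half_complete(void)
{
    process_block(0u);
}

/* Hypothetical transfer-complete interrupt: the second block is complete. */
void on_dma_full_complete(void)
{
    process_block(1u);
}
```

Splitting the buffer in two means the CPU always reads the half that the DMA is not currently writing, as long as processing finishes before the DMA wraps around.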
Main differences between MPU and MMU
Aspect | MPU | MMU |
---|---|---|
Buffer Hit Cycle | Fixed at 1 instruction cycle | Varies from 1 to 20 cycles |
Management Granularity | Limited areas (dozens) management | Can be refined page by page in RAM |
Multi-task Support | Divides physical areas for limited isolation | Complete isolation of virtual address processes |
In simple terms | For real-time applications | For Linux usage |
Many other chips have similar cache and MPU features, but here we focus on ARM chips. The differences are mostly in management granularity, the number of regions, or attributes that are fixed and cannot be changed; the attributes themselves and what they do are similar.
Understanding the Bus#
The peripherals of MCU chips are distributed across different buses, and reading and writing between peripherals and memory on the same bus is relatively fast. We will focus on the cooperation of ADC, DMA, and memory to understand the bus.
In my application I used ADC1/2/3, DMA1/2, and BDMA1. A DMA controller is fastest when the peripheral and memory it serves are in its own domain, and the BDMA can only reach the D3 domain at all, so I paired each ADC with a DMA controller and a buffer in the same domain.
Refer to the following bus diagram for understanding. ADC1 samples through DMA1 into a buffer in D2 domain memory, ADC3 samples through BDMA into a buffer in D3 domain memory, and after initial processing the sampled values are moved to D1 domain memory for subsequent CPU calculations. This allocation minimizes the performance lost to crossing domains.
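One way to get this allocation is to pin each buffer to a linker section. The sketch below uses the GCC/Clang attribute syntax for ELF targets; the section names `.d2_sram` and `.d3_sram` are assumptions and must match output sections that your own linker script maps to the D2 and D3 SRAM, and the buffer and function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section names: the linker script must place them in
 * D2 SRAM and D3 SRAM respectively. 32-byte alignment matches the
 * Cortex-M7 cache line size. */
#define D2_BUFFER __attribute__((section(".d2_sram"), aligned(32)))
#define D3_BUFFER __attribute__((section(".d3_sram"), aligned(32)))

#define ADC_SAMPLES 1024u

/* Filled by DMA1 from ADC1, so it stays inside the D2 domain. */
D2_BUFFER volatile uint16_t adc1_raw[ADC_SAMPLES];

/* Filled by BDMA from ADC3, so it stays inside the D3 domain. */
D3_BUFFER volatile uint16_t adc3_raw[ADC_SAMPLES];

/* Working copy for CPU calculations. With no section attribute it goes
 * wherever the linker script puts ordinary data (assumed here to be
 * D1 AXI SRAM). */
uint16_t adc1_work[ADC_SAMPLES];

/* Move one finished block from the DMA buffer into CPU-side memory. */
void adc1_copy_to_d1(void)
{
    for (size_t i = 0; i < ADC_SAMPLES; i++) {
        adc1_work[i] = adc1_raw[i];
    }
}
```

The only cross-domain traffic left is the copy in `adc1_copy_to_d1`, which happens once per block instead of once per sample.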
Configuration of Cache Attributes#
The MPU configuration controls the cache attributes, with the three most important configuration items as follows, along with my simple summary:
- IsCacheable determines whether the cache is enabled for the region.
- IsBufferable determines whether writes may be held in a write buffer before they reach memory. Together with IsCacheable and TEX, it selects the memory type and cache policy.
- IsShareable determines whether strict timing control is applied. This matters when peripherals access external memory: the resource is off-chip, so strict timing needs to be enabled.
Here is the corresponding explanation from ChatGPT, which is clearer and more detailed than mine.
- IsCacheable controls whether the D-Cache is used; on the M7 it works with TEX/C/B to determine the caching strategy.
- IsBufferable allows write buffering (write buffering/combining), which may delay the moment a write becomes visible to the outside in exchange for higher throughput.
- IsShareable marks the region as shared by multiple bus masters. On the M7 it also changes the caching behavior: a Cacheable + Shareable region is treated as Non-cacheable by default, or as Write-Through if the CACR.SIWT bit is set.
Detailed explanation
- IsCacheable
  - Concern: Whether accesses go through the D-Cache; together with TEX/C/B it determines whether the policy is Write-Back/Write-Allocate or Write-Through/No-Write-Allocate.
  - Impact: CPU access latency and bandwidth, and bus traffic; it does not directly change peripheral sampling rates but affects CPU processing speed.
- IsBufferable
  - Meaning: Allows writes to enter a write buffer first and possibly merge, then flush asynchronously to memory/bus, reducing bus transactions and improving throughput.
  - Impact:
    - Performance: Continuous writes (memcpy, streaming writes) are faster.
    - Visibility/Order: The moment a write becomes visible to other bus masters (DMA/peripherals) may be delayed; use barrier instructions to enforce order where necessary.
  - Typical usage: Normal memory is often set as bufferable; MMIO that must not have its writes buffered or merged leaves it clear.
- IsShareable
  - Meaning: The region may be shared by multiple bus masters (CPU, DMA, etc.). On the Cortex-M7 it has a special effect on Cacheable Normal regions:
    - Cacheable + Shareable = not cached in the D-Cache by default (Write-Through only if CACR.SIWT is set), which avoids other masters seeing stale data, but CPU-side performance is lower than WB/WA.
    - Cacheable + Non-shareable = usually Write-Back/Write-Allocate (with TEX=1, C=1, B=1), providing the best CPU performance.
  - Impact:
    - Order and visibility semantics are more conservative, which helps when cooperating with other bus masters.
    - It is not "hardware cache coherence"; it changes the caching strategy and the scope of memory barrier semantics.
Common combination overview (Normal memory)
- TEX=1, C=1, B=1, Shareable=0 → Write-Back, Write-Allocate (CPU fastest, suitable for most computational buffers)
- TEX=1, C=1, B=1, Shareable=1 → on the M7, treated as Non-cacheable by default, or Write-Through if CACR.SIWT is set (more conservative sharing semantics)
- C=0 → Non-cacheable. With TEX=1, B=0 this is Normal non-cacheable memory; with TEX=0, B=1 gives Device memory and B=0 gives Strongly-ordered memory, the strictest.
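The combinations above follow the ARMv7-M TEX/C/B encoding table, which can be written down as a small lookup. This is a minimal sketch for checking a configuration on paper: `mem_type_t` and `mpu_decode` are illustrative names, not part of the HAL or CMSIS, and encodings with TEX=0b1xx (separate inner/outer policies) are lumped into `MEM_OTHER`.

```c
#include <stdint.h>

typedef enum {
    MEM_STRONGLY_ORDERED,
    MEM_DEVICE,
    MEM_NORMAL_NON_CACHEABLE,
    MEM_NORMAL_WT_NO_WA, /* Write-Through, no Write-Allocate */
    MEM_NORMAL_WB_NO_WA, /* Write-Back, no Write-Allocate */
    MEM_NORMAL_WB_WA,    /* Write-Back, Write-Allocate */
    MEM_OTHER            /* reserved, implementation defined, or TEX=0b1xx */
} mem_type_t;

/* Decode the TEX, C and B fields of an MPU region into a memory type. */
mem_type_t mpu_decode(uint8_t tex, uint8_t c, uint8_t b)
{
    if (tex == 0u) {
        if (!c) {
            return b ? MEM_DEVICE : MEM_STRONGLY_ORDERED;
        }
        return b ? MEM_NORMAL_WB_NO_WA : MEM_NORMAL_WT_NO_WA;
    }
    if (tex == 1u) {
        if (!c && !b) {
            return MEM_NORMAL_NON_CACHEABLE;
        }
        if (c && b) {
            return MEM_NORMAL_WB_WA;
        }
        return MEM_OTHER; /* C=0,B=1 is reserved; C=1,B=0 is implementation defined */
    }
    if (tex == 2u && !c && !b) {
        return MEM_DEVICE; /* non-shareable Device */
    }
    return MEM_OTHER;
}
```

For example, `mpu_decode(1, 1, 1)` gives `MEM_NORMAL_WB_WA`, and `mpu_decode(1, 0, 0)` gives `MEM_NORMAL_NON_CACHEABLE`. Note that `mpu_decode(1, 0, 1)` falls into `MEM_OTHER`, because that encoding is reserved.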
Refer to the previous bus diagram. The cache belongs to the CPU, so data that DMA writes into memory (the D2/D3 domain memory regions) bypasses it, and the CPU has to cross domains to read that data. If those regions were cached, the memory consistency issue would have to be resolved by invalidating the cache over the buffer's address range, which is a loop across the whole buffer one cache line at a time, and that is unnecessary overhead.
Therefore, in the MPU configuration, the address ranges of the D2 and D3 domains do not enable the cache, corresponding to the third combination above. The commonly used memory region in the D1 domain can enable IsCacheable, enable IsBufferable, and disable IsShareable to achieve maximum performance, corresponding to the first combination above.
As for the second combination mentioned above, I have not encountered a usage scenario yet and cannot imagine one.
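To get a feel for the cost of the range invalidation mentioned above, the following sketch counts how many cache lines such a loop would visit. It only models the loop; on the target the real work is done by CMSIS calls such as `SCB_InvalidateDCache_by_Addr`. The function name `dcache_lines_touched` is illustrative, and the 32-byte line size is that of the Cortex-M7 D-Cache.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 D-Cache line size in bytes */

/* Number of cache lines covered by the byte range [addr, addr + len). */
size_t dcache_lines_touched(uintptr_t addr, size_t len)
{
    if (len == 0u) {
        return 0u;
    }

    uintptr_t mask  = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t first = addr & mask;              /* line holding the first byte */
    uintptr_t last  = (addr + len - 1u) & mask; /* line holding the last byte */

    return (size_t)((last - first) / DCACHE_LINE_SIZE) + 1u;
}
```

A buffer of 1024 16-bit samples (2048 bytes) aligned to a cache line covers 64 lines, so every block handed over by the DMA would cost 64 loop iterations plus barriers before the CPU could safely read it. A buffer that is not line-aligned covers one extra line and also risks invalidating neighbouring data.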
Below is the matching code for the above configuration for reference and my own backup.
```c
/**
 * @brief  MPU configuration
 * @param  None
 * @retval None
 */
static void MPUInit(void)
{
    MPU_Region_InitTypeDef MPU_InitStruct;

    /* Disable the MPU while the regions are being changed */
    HAL_MPU_Disable();

    /* AXI SRAM (D1): Write-Back, Read-Allocate, Write-Allocate
       for optimal performance, used for CPU processing */
    MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
    MPU_InitStruct.BaseAddress      = 0x24000000;
    MPU_InitStruct.Size             = MPU_REGION_SIZE_512KB;
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.IsBufferable     = MPU_ACCESS_BUFFERABLE;
    MPU_InitStruct.IsCacheable      = MPU_ACCESS_CACHEABLE;
    MPU_InitStruct.IsShareable      = MPU_ACCESS_NOT_SHAREABLE;
    MPU_InitStruct.Number           = MPU_REGION_NUMBER0;
    MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL1;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_ENABLE;
    HAL_MPU_ConfigRegion(&MPU_InitStruct);

    /* D2 domain SRAM: DMA buffers for D2 domain peripherals, cache disabled.
       TEX=1, C=0, B=0 is the Normal non-cacheable encoding
       (TEX=1, C=0, B=1 is a reserved encoding, so B is left clear). */
    MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
    MPU_InitStruct.BaseAddress      = 0x30000000;
    MPU_InitStruct.Size             = MPU_REGION_SIZE_256KB;
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
    MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
    MPU_InitStruct.IsShareable      = MPU_ACCESS_NOT_SHAREABLE;
    MPU_InitStruct.Number           = MPU_REGION_NUMBER1;
    MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL1;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_DISABLE;
    HAL_MPU_ConfigRegion(&MPU_InitStruct);

    /* Ethernet transmit/receive descriptors: Device memory
       (TEX=0, C=0, B=1); Strongly-ordered would need B=0 */
    MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
    MPU_InitStruct.BaseAddress      = 0x30040000;
    MPU_InitStruct.Size             = MPU_REGION_SIZE_32KB;
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.IsBufferable     = MPU_ACCESS_BUFFERABLE;
    MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
    MPU_InitStruct.IsShareable      = MPU_ACCESS_NOT_SHAREABLE;
    MPU_InitStruct.Number           = MPU_REGION_NUMBER2;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL0;
    MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_DISABLE;
    HAL_MPU_ConfigRegion(&MPU_InitStruct);

    /* D3 domain SRAM: DMA buffers for D3 domain peripherals, cache disabled
       (Normal non-cacheable: TEX=1, C=0, B=0) */
    MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
    MPU_InitStruct.BaseAddress      = 0x38000000;
    MPU_InitStruct.Size             = MPU_REGION_SIZE_64KB;
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
    MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
    MPU_InitStruct.IsShareable      = MPU_ACCESS_NOT_SHAREABLE;
    MPU_InitStruct.Number           = MPU_REGION_NUMBER3;
    MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL1;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_DISABLE;
    HAL_MPU_ConfigRegion(&MPU_InitStruct);

    /* FMC chip select 3: Strongly-ordered (TEX=0, C=0, B=0).
       This peripheral must be non-cacheable, otherwise it will cause
       repeated chip select and read/write enable pulses. */
    MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
    MPU_InitStruct.BaseAddress      = 0x68000000;
    MPU_InitStruct.Size             = MPU_REGION_SIZE_256B;
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
    MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
    MPU_InitStruct.IsShareable      = MPU_ACCESS_SHAREABLE;
    MPU_InitStruct.Number           = MPU_REGION_NUMBER4;
    MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL0;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_ENABLE;
    HAL_MPU_ConfigRegion(&MPU_InitStruct);

    /* Enable the MPU, keeping the default memory map for privileged code */
    HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
```
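When adapting these regions to another memory layout, two things are easy to get wrong: an ARMv7-M MPU region must be a power of two in size (at least 32 bytes) with its base aligned to that size, and the attribute fields end up packed into the region's RASR register. The sketch below checks the first and computes the second on a host machine. The helper names are illustrative and not part of the HAL; the bit positions are those of the ARMv7-M `MPU_RASR` register.

```c
#include <stdbool.h>
#include <stdint.h>

/* A region must be a power of two, at least 32 bytes, and its base
 * address must be aligned to its size. */
bool mpu_region_is_valid(uint32_t base, uint32_t size_bytes)
{
    if (size_bytes < 32u) {
        return false;
    }
    if ((size_bytes & (size_bytes - 1u)) != 0u) {
        return false; /* not a power of two */
    }
    return (base & (size_bytes - 1u)) == 0u;
}

/* RASR.SIZE encodes the region size as 2^(SIZE + 1) bytes.
 * Expects a power-of-two size between 32 bytes and 2 GB. */
uint32_t mpu_size_field(uint32_t size_bytes)
{
    uint32_t field = 0u;

    while (field < 30u && (UINT32_C(1) << (field + 1u)) < size_bytes) {
        field++;
    }
    return field;
}

/* Pack the attributes of an enabled region into a RASR value
 * (sub-region disable bits left at zero). */
uint32_t mpu_pack_rasr(bool xn, uint32_t ap, uint32_t tex,
                       bool s, bool c, bool b, uint32_t size_bytes)
{
    return ((uint32_t)xn << 28) |        /* XN: execute never */
           ((ap & 7u) << 24) |           /* AP: access permission */
           ((tex & 7u) << 19) |          /* TEX */
           ((uint32_t)s << 18) |         /* S: shareable */
           ((uint32_t)c << 17) |         /* C: cacheable */
           ((uint32_t)b << 16) |         /* B: bufferable */
           (mpu_size_field(size_bytes) << 1) |
           1u;                           /* ENABLE */
}
```

For the AXI SRAM region above (full access, TEX=1, C=1, B=1, not shareable, 512 KB), `mpu_pack_rasr(false, 3, 1, false, true, true, 512u * 1024u)` evaluates to `0x030B0025`. All five base/size pairs in `MPUInit` pass `mpu_region_is_valid`.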
The original link is https://www.yono233.cn/posts/novel/25_10_13_arm-cache_mpu