Stack behavior of local variables: what happens when loop variables are defined inside the loop statement

Conclusion#

The conclusions come first.

  1. Defining a loop variable inside a for statement does not cause repeated stack operations, whether the compiler is AC6 or GCC; the two loop variables keep using the same two stack offsets. With optimization enabled, the assembly is exactly the same for both declaration styles, as long as the two versions are logically equivalent.

  2. In fact, defining the loop variable in the for statement itself is good practice. Moving every local variable definition to the beginning of the function either makes the generated code worse or changes nothing, depending on the optimization level and the compiler.

  3. Therefore, if you are chasing maximum performance, declare a local variable only in the branch where it is used.
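As a minimal illustration of point 3, the sketch below declares a temporary only inside the branch that needs it. The function `scale_if_enabled` and its arithmetic are hypothetical, chosen only to show the scoping; they are not part of the tests in this article.

```c
/* Hypothetical example: `scaled` is declared only in the branch that uses it,
 * so nothing has to be initialized for it on the other path. */
int scale_if_enabled(int enabled, int x)
{
    if (enabled)
    {
        int scaled = x * 3; /* lives only inside this branch */
        return scaled + 1;
    }
    return x; /* `scaled` does not exist on this path */
}
```

Declaring `scaled` at the top of the function with an initializer would add a store that the `enabled == 0` path never needs, which is the same effect the unoptimized listings in this article show for loop variables.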

The following tests are compiled for an STM32H7 target.

Examining the stack operations of local variables with the following code#

    for(int i = 0; i < 50; i++)
    {
        for(int j = 0; j < 50; j++)
        {
            HAL_Delay(1);
        }
    }

Intuitively, the local variable j is declared anew on every iteration of the outer loop. Does this cause a stack allocation on every iteration?

The disassembly for this part is as follows#

        0x0000001e:       LDR      r0,[sp,#0]
        0x00000020:       STR      r0,[sp,#8]
        0x00000022:       B        {pc}+0x2 ; 0x24
        0x00000024:       LDR      r0,[sp,#8]
        0x00000026:       CMP      r0,#0x31
        0x00000028:       BGT      {pc}+0x2c ; 0x54
        0x0000002a:       B        {pc}+0x2 ; 0x2c
        0x0000002c:       MOVS     r0,#0
        0x0000002e:       STR      r0,[sp,#4]
        0x00000030:       B        {pc}+0x2 ; 0x32
        0x00000032:       LDR      r0,[sp,#4]
        0x00000034:       CMP      r0,#0x31
        0x00000036:       BGT      {pc}+0x14 ; 0x4a
        0x00000038:       B        {pc}+0x2 ; 0x3a
        0x0000003a:       MOVS     r0,#1
        0x0000003c:       BL       HAL_Delay
        0x00000040:       B        {pc}+0x2 ; 0x42
        0x00000042:       LDR      r0,[sp,#4]
        0x00000044:       ADDS     r0,#1
        0x00000046:       STR      r0,[sp,#4]
        0x00000048:       B        {pc}-0x16 ; 0x32
        0x0000004a:       B        {pc}+0x2 ; 0x4c
        0x0000004c:       LDR      r0,[sp,#8]
        0x0000004e:       ADDS     r0,#1
        0x00000050:       STR      r0,[sp,#8]
        0x00000052:       B        {pc}-0x2e ; 0x24

Outer loop#

The outer loop is not the main point here; it simply uses compare and branch instructions to run the inner loop 50 times, keeping i at sp+8.

        0x0000001e:       LDR      r0,[sp,#0]
        0x00000020:       STR      r0,[sp,#8]
        0x00000022:       B        {pc}+0x2 ; 0x24
        0x00000024:       LDR      r0,[sp,#8]
        0x00000026:       CMP      r0,#0x31
        0x00000028:       BGT      {pc}+0x2c ; 0x54
        0x0000002a:       B        {pc}+0x2 ; 0x2c
        ; .....inner loop
        0x0000004a:       B        {pc}+0x2 ; 0x4c
        0x0000004c:       LDR      r0,[sp,#8]
        0x0000004e:       ADDS     r0,#1
        0x00000050:       STR      r0,[sp,#8]
        0x00000052:       B        {pc}-0x2e ; 0x24

Inner loop#

        0x0000002c:       MOVS     r0,#0
        0x0000002e:       STR      r0,[sp,#4]
        0x00000030:       B        {pc}+0x2 ; 0x32
        0x00000032:       LDR      r0,[sp,#4]
        0x00000034:       CMP      r0,#0x31
        0x00000036:       BGT      {pc}+0x14 ; 0x4a
        0x00000038:       B        {pc}+0x2 ; 0x3a
        0x0000003a:       MOVS     r0,#1
        0x0000003c:       BL       HAL_Delay
        0x00000040:       B        {pc}+0x2 ; 0x42
        0x00000042:       LDR      r0,[sp,#4]
        0x00000044:       ADDS     r0,#1
        0x00000046:       STR      r0,[sp,#4]
        0x00000048:       B        {pc}-0x16 ; 0x32

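The instructions at 0x2c and 0x2e store 0 to sp+4, which is the slot used for j.

The load, increment, and store at 0x42-0x46, together with the branch back to 0x32, then run the loop body 50 times.

This means that every outer iteration repeats this same sequence on the same slot at sp+4, while the outer loop always works on sp+8. The stack pointer is never adjusted inside the loop, so re-declaring j costs only the store that resets it to 0.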
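One way to observe the same thing from C, without reading disassembly, is to record the address of j on each outer iteration. The sketch below is a host-side experiment under stated assumptions: `count_j_slots` and `struct j_slot_result` are hypothetical names, the HAL_Delay call is left out, and the C standard does not promise that the address stays the same between iterations.

```c
#include <stdint.h>

/* Hypothetical result type for this experiment. */
struct j_slot_result
{
    int outer_iterations; /* how many times the outer loop body ran */
    int distinct_slots;   /* how many different addresses j was seen at */
};

struct j_slot_result count_j_slots(void)
{
    struct j_slot_result result = {0, 0};
    uintptr_t seen[50]; /* addresses of j observed so far */

    for (int i = 0; i < 50; i++)
    {
        for (int j = 0; j < 50; j++)
        {
            if (j != 0)
                continue; /* sample the address once per outer iteration */

            /* Convert while j is alive; only the integer value is kept. */
            uintptr_t addr = (uintptr_t)&j;
            int known = 0;
            for (int k = 0; k < result.distinct_slots; k++)
            {
                if (seen[k] == addr)
                    known = 1;
            }
            if (!known)
                seen[result.distinct_slots++] = addr;
        }
        result.outer_iterations++;
    }
    return result;
}
```

If the compiler behaves like the listing above and keeps j in one fixed slot, `distinct_slots` should come out as 1, but that is an observation about a particular toolchain rather than a language guarantee.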

What if local variables are defined in advance?#

Change the code to the following:

    int i = 0;
    int j = 0;
    for(i = 0; i < 50; i++)
    {
        for(j = 0; j < 50; j++)
        {
            HAL_Delay(1);
        }
    }

The disassembly for this part is as follows#

        0x0000001e:       LDR      r0,[sp,#0]
        0x00000020:       STR      r0,[sp,#8]
        0x00000022:       STR      r0,[sp,#4]
        0x00000024:       STR      r0,[sp,#8]
        0x00000026:       B        {pc}+0x2 ; 0x28
        0x00000028:       LDR      r0,[sp,#8]
        0x0000002a:       CMP      r0,#0x31
        0x0000002c:       BGT      {pc}+0x2c ; 0x58
        0x0000002e:       B        {pc}+0x2 ; 0x30
        0x00000030:       MOVS     r0,#0
        0x00000032:       STR      r0,[sp,#4]
        0x00000034:       B        {pc}+0x2 ; 0x36
        0x00000036:       LDR      r0,[sp,#4]
        0x00000038:       CMP      r0,#0x31
        0x0000003a:       BGT      {pc}+0x14 ; 0x4e
        0x0000003c:       B        {pc}+0x2 ; 0x3e
        0x0000003e:       MOVS     r0,#1
        0x00000040:       BL       HAL_Delay
        0x00000044:       B        {pc}+0x2 ; 0x46
        0x00000046:       LDR      r0,[sp,#4]
        0x00000048:       ADDS     r0,#1
        0x0000004a:       STR      r0,[sp,#4]
        0x0000004c:       B        {pc}-0x16 ; 0x36
        0x0000004e:       B        {pc}+0x2 ; 0x50
        0x00000050:       LDR      r0,[sp,#8]
        0x00000052:       ADDS     r0,#1
        0x00000054:       STR      r0,[sp,#8]
        0x00000056:       B        {pc}-0x2e ; 0x28

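The loop itself (0x26-0x56) is identical to the previous version (0x22-0x52). The only difference is two extra STR instructions, at 0x22 and 0x24, that write the initial 0 to sp+4 and sp+8 before the loop starts. Declaring the variables in advance therefore makes the unoptimized code slightly worse.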

Will complicating the loop make a difference?#

With the more complex loop below, the disassembly still shows no extra stack operations: j stays at sp+8 and i stays at sp+12 (sp+0xc).

    int test = 0;
    for(int i = 0; i < 50; i++)
    {
        for(int j = 0; j < 50; j++)
        {
            if((test & 0x01) == 0)
                HAL_Delay(1);
            else
                HAL_Delay(2);
        }
        test++;
    }

The disassembly is as follows:

        0x0000001e:    9801        ..      LDR      r0,[sp,#4]
        0x00000020:    9004        ..      STR      r0,[sp,#0x10]
        0x00000022:    9003        ..      STR      r0,[sp,#0xc]
        0x00000024:    e7ff        ..      B        {pc}+0x2 ; 0x26
        0x00000026:    9803        ..      LDR      r0,[sp,#0xc]
        0x00000028:    2831        1(      CMP      r0,#0x31
        0x0000002a:    dc21        !.      BGT      {pc}+0x46 ; 0x70
        0x0000002c:    e7ff        ..      B        {pc}+0x2 ; 0x2e
        0x0000002e:    2000        .       MOVS     r0,#0
        0x00000030:    9002        ..      STR      r0,[sp,#8]
        0x00000032:    e7ff        ..      B        {pc}+0x2 ; 0x34
        0x00000034:    9802        ..      LDR      r0,[sp,#8]
        0x00000036:    2831        1(      CMP      r0,#0x31
        0x00000038:    dc12        ..      BGT      {pc}+0x28 ; 0x60
        0x0000003a:    e7ff        ..      B        {pc}+0x2 ; 0x3c
        0x0000003c:    f89d0010    ....    LDRB     r0,[sp,#0x10]
        0x00000040:    07c0        ..      LSLS     r0,r0,#31
        0x00000042:    b920         .      CBNZ     r0,{pc}+0xc ; 0x4e
        0x00000044:    e7ff        ..      B        {pc}+0x2 ; 0x46
        0x00000046:    2001        .       MOVS     r0,#1
        0x00000048:    f7fffffe    ....    BL       HAL_Delay
        0x0000004c:    e003        ..      B        {pc}+0xa ; 0x56
        0x0000004e:    2002        .       MOVS     r0,#2
        0x00000050:    f7fffffe    ....    BL       HAL_Delay
        0x00000054:    e7ff        ..      B        {pc}+0x2 ; 0x56
        0x00000056:    e7ff        ..      B        {pc}+0x2 ; 0x58
        0x00000058:    9802        ..      LDR      r0,[sp,#8]
        0x0000005a:    3001        .0      ADDS     r0,#1
        0x0000005c:    9002        ..      STR      r0,[sp,#8]
        0x0000005e:    e7e9        ..      B        {pc}-0x2a ; 0x34

With the declarations moved to the top, the code again gets worse: two extra STR instructions appear before the loop (at 0x24 and 0x26), while the loop body is unchanged.

    int test = 0;
    int i    = 0;
    int j    = 0;
    for(i = 0; i < 50; i++)
    {
        for(j = 0; j < 50; j++)
        {
            if((test & 0x01) == 0)
                HAL_Delay(1);
            else
                HAL_Delay(2);
        }
        test++;
    }

The disassembly is as follows:

        0x0000001e:    9801        ..      LDR      r0,[sp,#4]
        0x00000020:    9004        ..      STR      r0,[sp,#0x10]
        0x00000022:    9003        ..      STR      r0,[sp,#0xc]
        0x00000024:    9002        ..      STR      r0,[sp,#8]
        0x00000026:    9003        ..      STR      r0,[sp,#0xc]
        0x00000028:    e7ff        ..      B        {pc}+0x2 ; 0x2a
        0x0000002a:    9803        ..      LDR      r0,[sp,#0xc]
        0x0000002c:    2831        1(      CMP      r0,#0x31
        0x0000002e:    dc21        !.      BGT      {pc}+0x46 ; 0x74
        0x00000030:    e7ff        ..      B        {pc}+0x2 ; 0x32
        0x00000032:    2000        .       MOVS     r0,#0
        0x00000034:    9002        ..      STR      r0,[sp,#8]
        0x00000036:    e7ff        ..      B        {pc}+0x2 ; 0x38
        0x00000038:    9802        ..      LDR      r0,[sp,#8]
        0x0000003a:    2831        1(      CMP      r0,#0x31
        0x0000003c:    dc12        ..      BGT      {pc}+0x28 ; 0x64
        0x0000003e:    e7ff        ..      B        {pc}+0x2 ; 0x40
        0x00000040:    f89d0010    ....    LDRB     r0,[sp,#0x10]
        0x00000044:    07c0        ..      LSLS     r0,r0,#31
        0x00000046:    b920         .      CBNZ     r0,{pc}+0xc ; 0x52
        0x00000048:    e7ff        ..      B        {pc}+0x2 ; 0x4a
        0x0000004a:    2001        .       MOVS     r0,#1
        0x0000004c:    f7fffffe    ....    BL       HAL_Delay
        0x00000050:    e003        ..      B        {pc}+0xa ; 0x5a
        0x00000052:    2002        .       MOVS     r0,#2
        0x00000054:    f7fffffe    ....    BL       HAL_Delay
        0x00000058:    e7ff        ..      B        {pc}+0x2 ; 0x5a
        0x0000005a:    e7ff        ..      B        {pc}+0x2 ; 0x5c
        0x0000005c:    9802        ..      LDR      r0,[sp,#8]
        0x0000005e:    3001        .0      ADDS     r0,#1
        0x00000060:    9002        ..      STR      r0,[sp,#8]
        0x00000062:    e7e9        ..      B        {pc}-0x2a ; 0x38
        0x00000064:    9804        ..      LDR      r0,[sp,#0x10]
        0x00000066:    3001        .0      ADDS     r0,#1
        0x00000068:    9004        ..      STR      r0,[sp,#0x10]
        0x0000006a:    e7ff        ..      B        {pc}+0x2 ; 0x6c
        0x0000006c:    9803        ..      LDR      r0,[sp,#0xc]
        0x0000006e:    3001        .0      ADDS     r0,#1
        0x00000070:    9003        ..      STR      r0,[sp,#0xc]
        0x00000072:    e7da        ..      B        {pc}-0x48 ; 0x2a
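The two versions differ only in where i and j are declared, so they should behave identically, which means the extra stores in the second listing buy nothing. A host-side sketch to check that, assuming a counting stub in place of the real HAL_Delay (the names `loop_declared_inline`, `loop_declared_up_front`, `measure`, and the two counters are hypothetical):

```c
#include <stdint.h>

/* Host-side stand-in for the STM32 HAL function: it only records the calls. */
static uint32_t delay_calls;
static uint32_t delay_total_ms;

static void HAL_Delay(uint32_t ms)
{
    delay_calls++;
    delay_total_ms += ms;
}

/* Variant A: loop variables declared in the for statements. */
void loop_declared_inline(void)
{
    int test = 0;
    for (int i = 0; i < 50; i++)
    {
        for (int j = 0; j < 50; j++)
        {
            if ((test & 0x01) == 0)
                HAL_Delay(1);
            else
                HAL_Delay(2);
        }
        test++;
    }
}

/* Variant B: all locals declared in advance. */
void loop_declared_up_front(void)
{
    int test = 0;
    int i    = 0;
    int j    = 0;
    for (i = 0; i < 50; i++)
    {
        for (j = 0; j < 50; j++)
        {
            if ((test & 0x01) == 0)
                HAL_Delay(1);
            else
                HAL_Delay(2);
        }
        test++;
    }
}

/* Runs one variant from a clean state and reports what the stub recorded. */
void measure(void (*variant)(void), uint32_t *calls, uint32_t *total_ms)
{
    delay_calls    = 0;
    delay_total_ms = 0;
    variant();
    *calls    = delay_calls;
    *total_ms = delay_total_ms;
}
```

Both variants make 50 × 50 = 2500 calls. Half of the outer iterations see an even `test` (1 ms) and half an odd one (2 ms), so the requested delay totals 25 × 50 × 1 + 25 × 50 × 2 = 3750 ms in either case.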

Using Optimization#

O1#

Using the same complex loop as above.

Declared inside the for statement:

        0x00000014:    2400        .$      MOVS     r4,#0
        0x00000016:    bf00        ..      NOP      
        0x00000018:    f0040501    ....    AND      r5,r4,#1
        0x0000001c:    2632        2&      MOVS     r6,#0x32
        0x0000001e:    bf00        ..      NOP      
        0x00000020:    2002        .       MOVS     r0,#2
        0x00000022:    2d00        .-      CMP      r5,#0
        0x00000024:    bf08        ..      IT       EQ
        0x00000026:    2001        .       MOVEQ    r0,#1
        0x00000028:    f7fffffe    ....    BL       HAL_Delay
        0x0000002c:    3e01        .>      SUBS     r6,#1
        0x0000002e:    d1f7        ..      BNE      {pc}-0xe ; 0x20
        0x00000030:    3401        .4      ADDS     r4,#1
        0x00000032:    2c32        2,      CMP      r4,#0x32
        0x00000034:    d1f0        ..      BNE      {pc}-0x1c ; 0x18

Declared in advance (the output is completely identical to the listing above):

        0x00000014:    2400        .$      MOVS     r4,#0
        0x00000016:    bf00        ..      NOP      
        0x00000018:    f0040501    ....    AND      r5,r4,#1
        0x0000001c:    2632        2&      MOVS     r6,#0x32
        0x0000001e:    bf00        ..      NOP      
        0x00000020:    2002        .       MOVS     r0,#2
        0x00000022:    2d00        .-      CMP      r5,#0
        0x00000024:    bf08        ..      IT       EQ
        0x00000026:    2001        .       MOVEQ    r0,#1
        0x00000028:    f7fffffe    ....    BL       HAL_Delay
        0x0000002c:    3e01        .>      SUBS     r6,#1
        0x0000002e:    d1f7        ..      BNE      {pc}-0xe ; 0x20
        0x00000030:    3401        .4      ADDS     r4,#1
        0x00000032:    2c32        2,      CMP      r4,#0x32
        0x00000034:    d1f0        ..      BNE      {pc}-0x1c ; 0x18
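Read back into C, the O1 listing corresponds roughly to the sketch below. This is a hand-written reading of the disassembly, not compiler output: the register comments refer to the listing, `loop_as_compiled_at_o1` is a hypothetical name, and HAL_Delay is replaced by a host-side counting stub so the sketch can run on a PC.

```c
#include <stdint.h>

/* Host-side stand-in for the STM32 HAL function: it only records the calls. */
static uint32_t delay_calls;
static uint32_t delay_total_ms;

static void HAL_Delay(uint32_t ms)
{
    delay_calls++;
    delay_total_ms += ms;
}

void loop_as_compiled_at_o1(void)
{
    int outer = 0; /* r4: i and test always hold the same value, so one register serves both */
    do
    {
        int odd       = outer & 1; /* r5: test & 1, computed once per outer iteration */
        int remaining = 50;        /* r6: the inner loop became a down-counter */
        do
        {
            HAL_Delay(odd == 0 ? 1 : 2); /* MOVS r0,#2 / CMP / IT EQ / MOVEQ r0,#1 / BL */
            remaining--;                 /* SUBS r6,#1 */
        } while (remaining != 0);        /* BNE back to 0x20 */
        outer++;                         /* ADDS r4,#1 */
    } while (outer != 50);               /* CMP r4,#0x32 / BNE back to 0x18 */
}
```

Every variable lives in a register here and none has a stack slot, which is why the place where i and j are declared no longer leaves any trace in the output.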

O2#

Using the same complex loop as above.

Declared inside the for statement:

        0x00000014:    2500        .%      MOVS     r5,#0
        0x00000016:    bf00        ..      NOP      
        0x00000018:    2402        .$      MOVS     r4,#2
        0x0000001a:    2632        2&      MOVS     r6,#0x32
        0x0000001c:    07e8        ..      LSLS     r0,r5,#31
        0x0000001e:    bf08        ..      IT       EQ
        0x00000020:    2401        .$      MOVEQ    r4,#1
        0x00000022:    bf00        ..      NOP      
        0x00000024:    4620         F      MOV      r0,r4
        0x00000026:    f7fffffe    ....    BL       HAL_Delay
        0x0000002a:    3e01        .>      SUBS     r6,#1
        0x0000002c:    d1fa        ..      BNE      {pc}-0x8 ; 0x24
        0x0000002e:    3501        .5      ADDS     r5,#1
        0x00000030:    2d32        2-      CMP      r5,#0x32
        0x00000032:    d1f1        ..      BNE      {pc}-0x1a ; 0x18

Declared in advance (the output is completely identical to the listing above):

        0x00000014:    2500        .%      MOVS     r5,#0
        0x00000016:    bf00        ..      NOP      
        0x00000018:    2402        .$      MOVS     r4,#2
        0x0000001a:    2632        2&      MOVS     r6,#0x32
        0x0000001c:    07e8        ..      LSLS     r0,r5,#31
        0x0000001e:    bf08        ..      IT       EQ
        0x00000020:    2401        .$      MOVEQ    r4,#1
        0x00000022:    bf00        ..      NOP      
        0x00000024:    4620         F      MOV      r0,r4
        0x00000026:    f7fffffe    ....    BL       HAL_Delay
        0x0000002a:    3e01        .>      SUBS     r6,#1
        0x0000002c:    d1fa        ..      BNE      {pc}-0x8 ; 0x24
        0x0000002e:    3501        .5      ADDS     r5,#1
        0x00000030:    2d32        2-      CMP      r5,#0x32
        0x00000032:    d1f1        ..      BNE      {pc}-0x1a ; 0x18

O3#

O3 is not worth discussing here, since it completely unrolls the loop.

Results with GCC#

With GCC, defining the local variables in advance also produces worse code.

Loop variables defined inside the for statement: 20 instructions.

Local variables defined in advance: 24 instructions.

This article is updated by Mix Space to xLog. The original link is https://www.yono233.cn/posts/shoot/24_8_6_%E5%85%B3%E4%BA%8E%E5%B1%80%E9%83%A8%E5%8F%98%E9%87%8F%E7%9A%84%E6%A0%88%E8%A1%8C%E4%B8%BA%E2%80%94%E2%80%94%E7%94%B1%E5%BE%AA%E7%8E%AF%E8%AF%AD%E5%8F%A5%E5%86%85%E5%AE%9A%E4%B9%89%E5%BE%86%E7%8E%AF%E5%8F%98%E9%87%8F%E5%BC%95%E7%94%B3
