[Memory] C/C++ String Definition and Memory Allocation

The null character

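The null character \0 (ASCII name NUL) is stored at the end of every string. Printing it as a character produces no visible output. Its value in the ASCII table is 0, so \0 can be converted to an integer and printed as a number: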

#include <stdio.h>
#include <iostream>
int main() {
    char nullChar = '\0';
    int nullInt = (int)nullChar;
    printf("|printf char before: %c\t", nullChar);
    printf("|printf int after: %d\n", nullInt);
    std::cout << "|cout char before: " << nullChar << "\t";
    std::cout << "|cout int after: " << nullInt << std::endl;
    return 0;
}
$ g++ test.cpp && ./a.out
|printf char before:    |printf int after: 0
|cout char before:      |cout int after: 0

Defining strings in C

In C, a string is usually defined in one of two ways: as a character array, char str[], or through a character pointer, char*. Both are discussed below.

Character arrays

When defining char str[], the array length can be left out. The compiler then sizes the array from the initializer.

#include <stdio.h>
int main() {
    char str1[10] = "abc";
    char str2[] = "def";
    // char str3[] = {'h','i','j','\0'};  // correct: the terminating '\0' is added by hand
    char str3[] = {'h','i','j'};          // wrong: no '\0', so str3 is not a valid string
    printf("str: %s,%s,%s\n", str1, str2, str3);
    return 0;
}

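The output is shown below. When str3 is initialized with a brace list, the trailing \0 must not be omitted. Without it, printf with %s keeps reading past the end of str3, which is undefined behavior. In this run it happened to continue into the bytes of str2.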

$ gcc test.c && ./a.out
str: abc,def,hijdef

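In C, a character array may be declared with a size equal to the literal's size minus 1, that is, exactly the number of characters. In that case the terminating \0 is simply dropped. See the C array initialization documentation for details.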

Such an initialization compiles without error, but it causes the same problem as a missing \0. Consider the following program:

#include <stdio.h>
int main() {
    char str1[3] = "abc";       // str1 has type char[3] and holds 'a', 'b', 'c'
    char str2[2] = "def";       // str2 has type char[2] and holds 'd', 'e'
    printf("----------------------\n");
    printf("str: %s,%s\n", str1, str2);
    return 0;
}
$ gcc test.c && ./a.out
test.c: In function ‘main’:
test.c:4:20: warning: initializer-string for array of chars is too long
    4 |     char str2[2] = "def";       // str2 has type char[2] and holds 'd', 'e'
      |                    ^~~~~
----------------------
str: abc,deabc

char str1[3] = "abc" compiles cleanly. By contrast, char str2[2] = "def" triggers a warning that the initializer string is longer than the target array. GCC still builds the program, and it runs.

As a result, str2 holds only the two characters d and e, and str1 holds the three characters a, b and c. Neither contains a \0.

The incorrect output (deabc instead of de) is analyzed in the string memory allocation section below.

Character pointers

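A char* ptr can point to a string. ptr holds the address of the string's first character, and *ptr is that first character. *(ptr+i), equivalently ptr[i], is the i-th character, and the element after the last character is \0.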

#include <stdio.h>
int main() {
    char* ptr = "abc";
    printf("%s|\n", ptr);
    printf("%c-%c-%c-%d|\n", *ptr, *(ptr+1), *(ptr+2), (int)(*(ptr+3)));
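    printf("0x%lx\n", (unsigned long)&ptr);     // address where the pointer variable itself is stored
    printf("0x%lx\n", (unsigned long)ptr);      // address the pointer points to (the literal "abc")
    printf("%p\n", (void *)ptr);                // same address, printed portably with %p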
    return 0;
}
$ gcc test.c && ./a.out
abc|
a-b-c-0|
0x7ffdb1473a00
0x557990186004
0x557990186004

Differences between the two

The two definitions store their data differently, which in turn determines which operations are allowed:

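char str[] = "abc" defines a character array. The compiler allocates space for the characters on the stack (for a local array) and copies "abc" and its \0 into it. str can be thought of as a constant pointer that cannot be reassigned; strictly, it is an array name, not a pointer variable. The contents it refers to are not constant, however, so the array can be modified, e.g. str[0] = 'x'.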

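char *ptr = "abc" defines a modifiable pointer ptr. The compiler stores the string constant "abc" in the literal constant area of memory and allocates ptr itself on the stack; ptr holds the address of the string constant. Because ptr points to a constant, modifying it through ptr[0] = 'x' is undefined behavior. With GCC on Linux the literal sits in read-only memory, so the program crashes with a segmentation fault.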

#include <stdio.h>
int main() {
    char str1[] = "abc";
    char str2[] = "def";
    // str2 = str1;        // error: assignment to expression with array type
    str1[0] = 'x';         // ok
    printf("%s\n", str1);
    char *ptr = "abc";  
    // ptr[1] = 'x';       // segmentation fault  ./a.out
    ptr = str2;            // ok
    ptr[1] = 'x';          // ok
    printf("%s\n", str2);
    return 0;
}
$ gcc test.c && ./a.out
xbc
dxf

String memory allocation

To analyze how strings are allocated in memory, consider the following example:

#include <stdio.h>
#include <string.h>
int main() {
    char str1[] = "abc";    // str1 has type char[4] and holds 'a', 'b', 'c', '\0'
    char str2[3] = "def";   // str2 has type char[3] and holds 'd', 'e', 'f' (no '\0')
    char str3[2] = "hij";
    char* str4 = "klm";
    printf("----------------------\n");
    printf("str: %s,%s,%s,%s\n", str1, str2, str3, str4);   // str2/str3 lack '\0': undefined behavior, shown for analysis
    printf("strlen: %zu,%zu,%zu,%zu\n", strlen(str1), strlen(str2), strlen(str3), strlen(str4));
    printf("sizeof: %zu,%zu,%zu,%zu\n", sizeof(str1), sizeof(str2), sizeof(str3), sizeof(str4));
    printf("point: %p,%p,%p,%p,%p\n", str1, str2, str3, &str4, str4);
    printf("diff: %p,%td,%td,%lu\n", str1, str1 - str2, str2 - str3, (unsigned long)str3 - (unsigned long)(&str4));
    return 0;
}
$ gcc test.c && ./a.out
test.c: In function ‘main’:
test.c:6:20: warning: initializer-string for array of chars is too long
    6 |     char str3[2] = "hij";
      |                    ^~~~~
----------------------
str: abc,defabc,hidefabc,klm
strlen: 3,6,8,3
sizeof: 4,3,2,8
point: 0x7ffd17049fa4,0x7ffd17049fa1,0x7ffd17049f9f,0x7ffd17049f90,0x55d871030004
diff: 0x7ffd17049fa4,3,2,15

Memory model

From these results we can sketch the memory layout and draw the following conclusions:

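  • strlen counts characters up to the first \0, not including the \0 itself;

  • sizeof reports the size of the object itself. For an array it is the declared length at 1 byte per char, which includes the \0 only when there was room for it. For a pointer such as str4 it is the size of the pointer (8 bytes on this 64-bit system), not the length of the string;

  • when the declared array length is less than or equal to the number of characters in the literal, no \0 is stored. Printing the array as a string then continues until a \0 happens to appear in adjacent memory; here that memory was the neighboring arrays;

  • arrays defined with char [] live on the stack. For char*, the pointer variable lives on the stack, while the string value is stored as a constant in the static data area (compare the 0x7ffd... stack addresses with 0x55d871030004).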

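The order in which the arrays are placed in memory does not necessarily follow the order of definition. The compiler chooses the stack layout, and the details are left for later study. Here is an example: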

#include <stdio.h>
#include <string.h>
int main() {
    char str1[] = "abc";    // str1 has type char[4] and holds 'a', 'b', 'c', '\0'
    char str2[3] = "def";   // str2 has type char[3] and holds 'd', 'e', 'f' (no '\0')
    char str3[2] = "hij";
    char* str4 = "klm";
    char str5[3] = "opq";
    printf("----------------------\n");
    printf("str: %s,%s,%s,%s,%s\n", str1, str2, str3, str4, str5);
    printf("strlen: %zu,%zu,%zu,%zu,%zu\n", strlen(str1), strlen(str2), strlen(str3), strlen(str4), strlen(str5));
    printf("sizeof: %zu,%zu,%zu,%zu,%zu\n", sizeof(str1), sizeof(str2), sizeof(str3), sizeof(str4), sizeof(str5));
    printf("point: %p,%p,%p,%p,%p\n", str1, str2, str3, str4, str5);
    printf("diff: %p,%td,%td,%p,%td\n", str1, str1 - str2, str1 - str3, str4, str1 - str5);
    return 0;
}
$ gcc test.c && ./a.out
test.c: In function ‘main’:
test.c:6:20: warning: initializer-string for array of chars is too long
    6 |     char str3[2] = "hij";
      |                    ^~~~~
----------------------
str: abc,defopqabc,hidefopqabc,klm,opqabc
strlen: 3,9,11,3,6
sizeof: 4,3,2,8,3
point: 0x7ffc58df0da4,0x7ffc58df0d9e,0x7ffc58df0d9c,0x55d2910fb004,0x7ffc58df0da1
diff: 0x7ffc58df0da4,6,8,0x55d2910fb004,3

Defining strings in C++

Character arrays

In C++, a character array must be large enough to hold every character of the string literal, including the terminating null character \0.

int main() {
    char str1[] = "abc";
    char str2[4] = "aaa";
    char str3[3] = "aaa";
    return 0;
}

The code above fails to compile at char str3[3] = "aaa":

$ g++ test.cpp && ./a.out        
test.cpp: In function ‘int main()’:
test.cpp:4:20: error: initializer-string for array of chars is too long [-fpermissive]
    4 |     char str3[3] = "aaa";
      |  

Character pointers

int main() {
    const char* str1 = "aaa";   // str1 is a pointer to const char
    // str1[0] = 'x';           // error: assignment of read-only location ‘* str1’
    char* str2 = "aaa";
    str2[0] = 'x';      // segmentation fault  ./a.out
    return 0;
}
test.cpp: In function ‘int main()’:
test.cpp:2:17: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
    2 |     char* str = "aaa";
      |   

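Compared with C, C++ has stricter rules for defining strings with character arrays and pointers. It tries to catch a missing \0 and modification of string constants at compile time. On top of that, the C++ string class (std::string) is safer and easier to use.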
