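Null character
The null character `\0` represents NUL and is stored at the end of every C string. Printing it directly produces nothing visible. In the ASCII table `\0` has the value 0, so it can be converted to an integer and printed instead.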
```cpp
#include <stdio.h>
#include <iostream>
int main() {
    char nullChar = '\0';
    int nullInt = (int)nullChar;  // '\0' converts to the integer 0
    printf("|printf char before: %c\t", nullChar);  // writes a NUL byte, which shows up as nothing
    printf("|printf int after: %d\n", nullInt);
    std::cout << "|cout char before: " << nullChar << "\t";
    std::cout << "|cout int after: " << nullInt << std::endl;
    return 0;
}
```
```
$ g++ test.cpp && ./a.out
|printf char before: |printf int after: 0
|cout char before: |cout int after: 0
```
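String definitions in C
In C, a string is usually defined in one of two ways: as a character array `char str[]` or through a character pointer `char*`. Each is discussed below.
Character arrays
The length of `char str[]` may be omitted, in which case the compiler sizes the array from its initializer (for a string literal, this includes the terminating `\0`).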
```c
#include <stdio.h>
int main() {
    char str1[10] = "abc";
    char str2[] = "def";
    // char str3[] = {'h','i','j','\0'};  // correct: append '\0' by hand at the end
    char str3[] = {'h','i','j'};          // wrong: no terminating '\0'
    printf("str: %s,%s,%s\n", str1, str2, str3);  // %s on str3 is undefined behavior
    return 0;
}
```
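As the output below shows, the `\0` must not be omitted when defining `str3`: without it, `printf` has no terminator to stop at and keeps reading past the end of `str3` (here into `str2`), which is undefined behavior.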
```
$ gcc test.c && ./a.out
str: abc,def,hijdef
```
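In C, a character array may be declared with exactly as many elements as the string has characters, i.e. one less than the size of the string literal, so that the terminating null character `\0` is left out (see the C array initialization documentation). Such an initialization compiles without error, but it leads to the same problem as a missing `\0`. Consider the following program: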
```c
#include <stdio.h>
int main() {
    char str1[3] = "abc"; // str1 has type char[3] and holds 'a', 'b', 'c'
    char str2[2] = "def"; // str2 has type char[2] and holds 'd', 'e'
    printf("----------------------\n");
    printf("str: %s,%s\n", str1, str2);
    return 0;
}
```
```
$ gcc test.c && ./a.out
test.c: In function ‘main’:
test.c:4:20: warning: initializer-string for array of chars is too long
    4 |     char str2[2] = "def"; // str2 has type char[2] and holds 'd', 'e'
      |                    ^~~~~
----------------------
str: abc,deabc
```
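`char str1[3] = "abc"` compiles cleanly, while `char str2[2] = "def"` produces a warning that the initializer string is longer than the target array; the program nevertheless builds and runs. `str2` stores only the two characters `d` and `e`, and `str1` stores three characters; neither contains a `\0`.
The wrong output is analyzed later in the section on string memory allocation.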
Character pointers
A `char* ptr` can point to a string: `ptr` is a pointer holding the string's address, and `*ptr` is the string's first character.
```c
#include <stdio.h>
int main() {
    char* ptr = "abc";
    printf("%s|\n", ptr);
    printf("%c-%c-%c-%d|\n", *ptr, *(ptr+1), *(ptr+2), (int)(*(ptr+3)));  // the 4th byte is '\0'
    printf("0x%lx\n", (unsigned long)&ptr);  // address where ptr itself is stored
    printf("0x%lx\n", (unsigned long)ptr);   // address ptr points to
    printf("%p\n", (void *)ptr);             // same address, printed with %p
    return 0;
}
```
```
$ gcc test.c && ./a.out
abc|
a-b-c-0|
0x7ffdb1473a00
0x557990186004
0x557990186004
```
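Differences between the two
The two forms store their data differently, which in turn determines what you can do with them:
`char str[] = "abc"` defines a character array, and (for a local variable) the compiler allocates space for the characters on the stack. `str` behaves like a constant pointer that cannot be reassigned (strictly, it is an array whose name converts to a pointer to its first element), but the contents are not constant, so the array can be modified, e.g. `str[0] = 'x'`.
`char *ptr = "abc"` defines a modifiable pointer `ptr`. The compiler stores the string literal `"abc"` in the constant (read-only) data area, allocates `ptr` on the stack, and sets `ptr` to the address of the literal. Because `ptr` points to a literal, trying to modify it through `ptr[0] = 'x'` is undefined behavior; on the system used here the program crashes with a segmentation fault.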
```c
#include <stdio.h>
int main() {
    char str1[] = "abc";
    char str2[] = "def";
    // str2 = str1;  // error: assignment to expression with array type
    str1[0] = 'x';   // ok
    printf("%s\n", str1);
    char *ptr = "abc";
    // ptr[1] = 'x';  // segmentation fault ./a.out
    ptr = str2;      // ok
    ptr[1] = 'x';    // ok
    printf("%s\n", str2);
    return 0;
}
```
```
$ gcc test.c && ./a.out
xbc
dxf
```
String memory allocation
To see how strings are laid out in memory, consider the following example:
```c
#include <stdio.h>
#include <string.h>
int main() {
    char str1[] = "abc";   // str1 has type char[4] and holds 'a', 'b', 'c', '\0'
    char str2[3] = "def";  // str2 has type char[3] and holds 'd', 'e', 'f'
    char str3[2] = "hij";
    char* str4 = "klm";    // str4 points to a string literal
    printf("----------------------\n");
    printf("str: %s,%s,%s,%s\n", str1, str2, str3, str4);
    printf("strlen: %zu,%zu,%zu,%zu\n", strlen(str1), strlen(str2), strlen(str3), strlen(str4));
    printf("sizeof: %zu,%zu,%zu,%zu\n", sizeof(str1), sizeof(str2), sizeof(str3), sizeof(str4));
    printf("point: %p,%p,%p,%p,%p\n", (void *)str1, (void *)str2, (void *)str3, (void *)&str4, (void *)str4);
    // differences between distinct objects only illustrate this particular build's layout
    printf("diff: %p,%td,%td,%ld\n", (void *)str1, str1 - str2, str2 - str3, (long)((unsigned long)str3 - (unsigned long)&str4));
    return 0;
}
```
```
$ gcc test.c && ./a.out
test.c: In function ‘main’:
test.c:6:20: warning: initializer-string for array of chars is too long
    6 |     char str3[2] = "hij";
      |                    ^~~~~
----------------------
str: abc,defabc,hidefabc,klm
strlen: 3,6,8,3
sizeof: 4,3,2,8
point: 0x7ffd17049fa4,0x7ffd17049fa1,0x7ffd17049f9f,0x7ffd17049f90,0x55d871030004
diff: 0x7ffd17049fa4,3,2,15
```
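From this output we can sketch the memory layout and draw the following conclusions:
- `strlen` counts characters up to the first `\0`, without counting the `\0` itself;
- `sizeof` gives the size of the object itself: for an array it is the declared length (a `char` is 1 byte), which includes the `\0` only if the array has room for it, while for a pointer such as `str4` it is the pointer size (8 bytes on this 64-bit platform), not the string length;
- When the declared array length is less than or equal to the number of characters in the literal, no `\0` is added, so printing the array with `%s` only stops when it happens to reach a `\0` in the memory that follows;
- Arrays defined with `char []` live on the stack; a pointer defined with `char*` also lives on the stack, while the string it points to is stored as a constant in the static data area.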
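The order in which the strings are placed in memory does not follow the order in which they are defined; C leaves the placement of local variables to the compiler, and the details are left for later study. Here is an example: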
```c
#include <stdio.h>
#include <string.h>
int main() {
    char str1[] = "abc";   // str1 has type char[4] and holds 'a', 'b', 'c', '\0'
    char str2[3] = "def";  // str2 has type char[3] and holds 'd', 'e', 'f'
    char str3[2] = "hij";
    char* str4 = "klm";
    char str5[3] = "opq";
    printf("----------------------\n");
    printf("str: %s,%s,%s,%s,%s\n", str1, str2, str3, str4, str5);
    printf("strlen: %zu,%zu,%zu,%zu,%zu\n", strlen(str1), strlen(str2), strlen(str3), strlen(str4), strlen(str5));
    printf("sizeof: %zu,%zu,%zu,%zu,%zu\n", sizeof(str1), sizeof(str2), sizeof(str3), sizeof(str4), sizeof(str5));
    printf("point: %p,%p,%p,%p,%p\n", (void *)str1, (void *)str2, (void *)str3, (void *)str4, (void *)str5);
    // differences between distinct objects only illustrate this particular build's layout
    printf("diff: %p,%td,%td,%p,%td\n", (void *)str1, str1 - str2, str1 - str3, (void *)str4, str1 - str5);
    return 0;
}
```
```
$ gcc test.c && ./a.out
test.c: In function ‘main’:
test.c:6:20: warning: initializer-string for array of chars is too long
    6 |     char str3[2] = "hij";
      |                    ^~~~~
----------------------
str: abc,defopqabc,hidefopqabc,klm,opqabc
strlen: 3,9,11,3,6
sizeof: 4,3,2,8,3
point: 0x7ffc58df0da4,0x7ffc58df0d9e,0x7ffc58df0d9c,0x55d2910fb004,0x7ffc58df0da1
diff: 0x7ffc58df0da4,6,8,0x55d2910fb004,3
```
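String definitions in C++
Character arrays
In C++, a character array must be large enough to hold every character of the string, including the terminating null character `\0`.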
```cpp
int main() {
    char str1[] = "abc";
    char str2[4] = "aaa";
    char str3[3] = "aaa";
    return 0;
}
```
The code above fails to compile at `char str3[3] = "aaa"`, because the array has no room for the `\0`:
```
$ g++ test.cpp && ./a.out
test.cpp: In function ‘int main()’:
test.cpp:4:20: error: initializer-string for array of chars is too long [-fpermissive]
    4 |     char str3[3] = "aaa";
      |
```
Character pointers
```cpp
int main() {
    const char* str1 = "aaa";  // declare str1 as a pointer to const char
    str1[0] = 'x';             // error: assignment of read-only location ‘* str1’
    char* str2 = "aaa";
    str2[0] = 'x';             // segmentation fault ./a.out
    return 0;
}
```
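Here `str1[0] = 'x'` is a compile-time error, so that line has to be removed or commented out before the program builds; `str2[0] = 'x'` then compiles (with a warning) but crashes at run time with a segmentation fault on the system used here. The warning below was recorded for a variant of the program whose line 2 reads `char* str = "aaa";`:
```
test.cpp: In function ‘int main()’:
test.cpp:2:17: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
    2 |     char* str = "aaa";
      |
```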
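Compared with C, C++ imposes stricter rules on how strings are defined and tries to catch a missing `\0` and writes to string literals at compile time. On top of that, C++'s `std::string` class is safer and easier to use.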
Reference