A Neat Look at How main() Comes in More Than One Form


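As everyone knows, I work on application-level code and don't know much about the lower layers. When that crowd starts discussing the kernel, I can't get a word in. Most of the time I just quietly take notes and try to keep up with the experts 😁.

So I went through some reference material to look into this question. In this post I'll walk through how main() is implemented for C programs and, along the way, answer the question raised in the group chat: how can main() be written both with and without parameters?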

Definition

As C/C++ developers, we all know that main() is the entry function of an executable, and most of us write it in one of these two ways:

int main() {}
int main(int argc, char *argv[]) {}

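But even as a seasoned developer, that was all I knew: this is simply how it's done. When Er Ge (二哥) raised the question, my first thought was overloading, but as everyone knows, C does not support function overloading. So could it be default arguments instead? Something like this: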

int main(int argc = 1, char **argv = NULL)

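Default arguments are a C++ feature and C has none, so that guess cannot be the answer for C either. To settle the question, let's dig in.

ps: cppreference also lists a form of main() with a third parameter, char *envp[], which carries the environment variables. Since the forms without it are far more common, that parameter is outside the scope of this post.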
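For readers who are curious about that third parameter anyway: where an implementation provides envp, it is a NULL-terminated array of "NAME=value" strings. The sketch below shows one way such an array could be searched. lookup_env is a hypothetical helper name made up for this illustration and is not a libc function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libc): scan a NULL-terminated array
   of "NAME=value" strings, which is the shape of envp, and return a
   pointer to the value for `name`, or NULL when the name is absent. */
const char *lookup_env(char *const envp[], const char *name)
{
    size_t len = strlen(name);

    for (size_t i = 0; envp[i] != NULL; i++) {
        /* The entry must start with `name` immediately followed by '='. */
        if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
            return envp[i] + len + 1;
    }
    return NULL;
}
```

Inside a main() that declares envp, a call such as lookup_env(envp, "HOME") would return the value part of the HOME entry if one exists.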

Debugging with breakpoints

To see more clearly how main() gets executed, I wrote a trivial program and inspected the stack with gdb. The code:

int main() {return 0;}

After compiling, debug the program with gdb: set a breakpoint at main() and look at the backtrace:

(gdb) bt
#0  main () at main.c:2
(gdb)

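The gdb output shows main() as the only frame on the stack. Since the goal is to see what calls main(), a backtrace that stops at main() is not enough to answer my question.

After some more reading, I found that gdb can be told to print a fuller backtrace.

The compile command: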

gcc -gdwarf-5 main.c -o main

Then the relevant gdb commands (look them up online for the details; I won't go through them here):

gdb ./main -q
Reading symbols from /mtad/main...done.
(gdb) set backtrace past-entry
(gdb) set backtrace past-main
(gdb) show backtrace past-entry
Whether backtraces should continue past the entry point of a program is on.
(gdb) show backtrace past-main
Whether backtraces should continue past "main" is on.

Then set a breakpoint at main(), run the program, and look at the backtrace:

(gdb) bt
#0  main () at main.c:2
#1  0x00007ffff7a2f555 in __libc_start_main () from /lib64/libc.so.6
#2  0x0000000000400429 in _start ()
(gdb)

The backtrace shows the call chain _start() -> __libc_start_main() -> main(), so the answer should lie in those two functions. Let's look at each in turn.

_start()

To look inside _start(), set a breakpoint on it as well and step through:

(gdb) r
Starting program: xxx
Missing separate debuginfos, use: debuginfo-install glibc-2.17-317.el7.x86_64
Breakpoint 1, 0x0000000000400400 in _start ()
(gdb) s
Single stepping until exit from function _start,
which has no line number information.
0x00007ffff7a2f460 in __libc_start_main () from /lib64/libc.so.6

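Stepping through showed no source for _start(). A web search revealed that _start() is written in assembly, so I downloaded the glibc 2.5 source. The code below is the i386 variant, found at sysdeps/i386/elf/start.S:

#include "bp-sym.h"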
   .text
   .globl _start
   .type _start,@function
_start:
   /* Clear the frame pointer.  The ABI suggests this be done, to mark
      the outermost frame obviously.  */
   xorl %ebp, %ebp
   /* Extract the arguments as encoded on the stack and set up
      the arguments for `main': argc, argv.  envp will be determined
      later in __libc_start_main.  */
   popl %esi  /* Pop the argument count.  */
   movl %esp, %ecx  /* argv starts just at the current stack top.*/
   /* Before pushing the arguments align the stack to a 16-byte
   (SSE needs 16-byte alignment) boundary to avoid penalties from
   misaligned accesses.  Thanks to Edward Seidl 
   for pointing this out.  */
   andl $0xfffffff0, %esp
   pushl %eax  /* Push garbage because we allocate
                  28 more bytes.  */
   /* Provide the highest stack address to the user code (for stacks
      which grow downwards).  */
   pushl %esp
   pushl %edx  /* Push address of the shared library
                  termination function.  */
#ifdef SHARED
   /* Load PIC register.  */
   call 1f
   addl $_GLOBAL_OFFSET_TABLE_, %ebx
   /* Push address of our own entry points to .fini and .init.  */
   leal __libc_csu_fini@GOTOFF(%ebx), %eax
   pushl %eax
   leal __libc_csu_init@GOTOFF(%ebx), %eax
   pushl %eax
   pushl %ecx  /* Push second argument: argv.  */
   pushl %esi  /* Push first argument: argc.  */
   pushl BP_SYM (main)@GOT(%ebx)
   /* Call the user's main function, and exit with its value.
      But let the libc call main.    */
   call BP_SYM (__libc_start_main)@PLT
#else
   /* Push address of our own entry points to .fini and .init.  */
   pushl $__libc_csu_fini
   pushl $__libc_csu_init
   pushl %ecx  /* Push second argument: argv.  */
   pushl %esi  /* Push first argument: argc.  */
   pushl $BP_SYM (main)
   /* Call the user's main function, and exit with its value.
      But let the libc call main.    */
   call BP_SYM (__libc_start_main)
#endif
   hlt   /* Crash if somehow `exit' does return.  */
#ifdef SHARED
1: movl (%esp), %ebx
   ret
#endif
/* To fulfill the System V/i386 ABI we need this symbol.  Yuck, it's so
  meaningless since we don't support machines 
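
One step in the listing that is easy to skim past is andl $0xfffffff0, %esp, which rounds the stack pointer down to a 16-byte boundary before anything is pushed. As a rough illustration in C (the helper name align_down_16 is made up for this sketch and does not appear in glibc):

```c
#include <stdint.h>

/* Hypothetical helper: clear the low four bits of an address, which
   rounds it down to the nearest multiple of 16. This mirrors the mask
   used by `andl $0xfffffff0, %esp` on a 32-bit stack pointer. */
uintptr_t align_down_16(uintptr_t sp)
{
    return sp & ~(uintptr_t)0xF;
}
```

For example, align_down_16(0xbffff7cc) gives 0xbffff7c0, and an address that is already aligned, such as 0x1000, comes back unchanged. Rounding down is the safe direction here because the stack grows towards lower addresses.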
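
This implementation is fairly simple:
xorl %ebp, %ebp: clears the ebp register, which marks the outermost stack frame.
popl %esi and movl %esp, %ecx: the loader has already placed the program's arguments and environment variables on the stack. At entry, the top of the stack holds argc, followed by the argv array and then the array of environment variables. These two instructions amount to int argc = pop from stack; char **argv = top of stack.
call BP_SYM (__libc_start_main): calls __libc_start_main, passing the values pushed just before it, including argc and argv.

The logic above corresponds to the following pseudocode: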
void _start() {
  %ebp = 0;
  int argc = pop from stack;
  char **argv = top of stack;
  __libc_start_main(main, argc, argv, __libc_csu_init, __libc_csu_fini,
                    edx, top of stack);
}
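
To make the pseudocode a little more concrete, here is a minimal C sketch of the same bookkeeping. It assumes the entry-time stack can be viewed as an array of pointer-sized slots: argc, then the argv pointers, a NULL, then the environment pointers, and another NULL. struct start_args and read_initial_stack are hypothetical names introduced for illustration; real startup code does this in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Views into the initial process stack, roughly as _start sees it. */
struct start_args {
    int    argc;
    char **argv;   /* argc pointers followed by a NULL */
    char **envp;   /* begins right after argv's terminating NULL */
};

/* `sp` points at the slot holding argc, mimicking the stack pointer at
   process entry. Every slot is one pointer wide (an assumption of this
   sketch), so argc is stored as an integer squeezed into a pointer. */
struct start_args read_initial_stack(char **sp)
{
    struct start_args a;

    a.argc = (int)(intptr_t)sp[0];   /* corresponds to: popl %esi       */
    a.argv = sp + 1;                 /* corresponds to: movl %esp, %ecx */
    a.envp = a.argv + a.argc + 1;    /* skip argv[] and its NULL        */
    return a;
}
```

Given a mock stack such as {(char *)(intptr_t)2, "prog", "-v", NULL, "HOME=/root", NULL}, the function reports argc as 2, argv[1] as "-v", and envp[0] as "HOME=/root".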

__libc_start_main

In the previous section we saw that _start() is the real entry point of the executable and that it calls __libc_start_main(). That function is defined as follows:
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
                int argc, char *__unbounded *__unbounded ubp_av,
#ifdef LIBC_START_MAIN_AUXVEC_ARG
                ElfW(auxv_t) *__unbounded auxvec,
#endif
                __typeof (main) init,
                void (*fini) (void),
                void (*rtld_fini) (void), void *__unbounded stack_end)
{
#if __BOUNDED_POINTERS__
 char **argv;
#else
# define argv ubp_av
#endif
 /* Result of the 'main' function.  */
 int result;
 __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up;
...
...
 if (init)
   (*init) (argc, argv, __environ MAIN_AUXVEC_PARAM);
...
 result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
 exit (result);
}

As the listing shows, this function is what finally calls main(), handing it the command-line arguments and the environment (result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);).
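
The shape of that call can be imitated in a few lines of ordinary C. The following is only a simplified stand-in, not glibc code: sketch_start_main, ignores_args_main and counts_args_main are hypothetical names, the init hook is reduced to a single optional call, and the sketch assumes the environment pointers sit directly after argv's terminating NULL, as they do on the initial process stack.

```c
#include <stddef.h>

/* Same shape as the `main` parameter in the excerpt above:
   argc, argv, and the environment. */
typedef int (*main_fn)(int, char **, char **);

/* Simplified stand-in for __libc_start_main (an illustration only).
   The real function ends with exit(result) and never returns; this
   one returns the value so that it can be inspected. */
int sketch_start_main(main_fn user_main, int argc, char **argv, main_fn init)
{
    char **envp = &argv[argc + 1];   /* environment follows argv's NULL */

    if (init != NULL)
        init(argc, argv, envp);      /* initialisation hook, if any */
    return user_main(argc, argv, envp);
}

/* Entry point that ignores everything it is handed. */
int ignores_args_main(int argc, char **argv, char **envp)
{
    (void)argc; (void)argv; (void)envp;
    return 0;
}

/* Entry point that reports how many user arguments follow argv[0]. */
int counts_args_main(int argc, char **argv, char **envp)
{
    (void)argv; (void)envp;
    return argc - 1;
}
```

With words laid out as {"prog", "a", "b", NULL, "HOME=/root", NULL}, calling sketch_start_main(counts_args_main, 3, words, NULL) yields 2, while passing ignores_args_main with the same arguments yields 0. The caller does exactly the same work in both cases; only the callee decides whether to look at what it was given.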
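
That covers the whole call path into main(), but it still hasn't answered Er Ge's question: how can main() work both with and without parameters? Put bluntly, the startup code knows only one way to call main(), and that is with arguments. Whether or not anything was typed on the command line, __libc_start_main() makes the same call. The arguments are passed on the stack (32-bit x86, where the caller also cleans them up afterwards) or in registers (x86-64), so a main() declared without parameters simply never looks at them and nothing breaks. The C standard itself allows both int main(void) and int main(int argc, char *argv[]). When the command line carries no extra arguments, argv typically still holds the program name, so argc is 1 and argv[argc] is a null pointer.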
 

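星哥玩云 (WeChat official account)

Posted in: Linux tutorials, 2024-07-24

Reprint notice: unless otherwise stated, articles on this site are published under the CC 4.0 license; please credit the source when reprinting.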
