Full-Link Tracing (Choosing a Call-Tracing Technique)

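Background: I mainly work on a C++ SDK that business teams integrate into their own products. During integration, functional bugs sometimes appear, meaning the SDK does not produce the expected result. How do you debug that?

Analysis: These problems are somewhat hard to debug. A crash at least leaves a stack trace to analyze, but a functional bug does not crash, so there is no stack to capture at the moment things go wrong. The code is not running on my machine either, so I cannot attach a debugger. That leaves logs. One approach is to print detailed logs along the critical paths inside the SDK and analyze them when a problem occurs. But something always gets missed: nobody can guarantee the logging is complete, and the bug may well sit in a function that logs nothing.

Solution: Given this background and analysis, I considered whether a full-link tracing scheme could print the SDK's entire call path: which function was entered, which function was exited, and so on.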

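Idea 1: Add a context struct parameter to every SDK interface and record the call path in it. This is probably the more general and effective approach, but our SDK interfaces are already frozen. Changing them would be very difficult and the business teams would almost certainly refuse, so it does not fit our situation. A middleware or SDK built from scratch could certainly consider it.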
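To make Idea 1 concrete, here is a minimal sketch of what a context parameter might look like. All names (TraceContext, ScopedTrace, sdk_open, sdk_read) are hypothetical and do not come from the SDK discussed here; the point is only that every signature gains an extra parameter.

```cpp
#include <string>
#include <vector>

// Hypothetical context object that every SDK call would receive.
struct TraceContext {
    std::vector<std::string> events;  // ordered enter/exit records
};

// RAII helper: records "enter X" on construction and "exit X" on destruction.
class ScopedTrace {
public:
    ScopedTrace(TraceContext& ctx, const char* fn) : ctx_(ctx), fn_(fn) {
        ctx_.events.push_back("enter " + fn_);
    }
    ~ScopedTrace() { ctx_.events.push_back("exit " + fn_); }

private:
    TraceContext& ctx_;
    std::string fn_;
};

// Hypothetical SDK entry points; note the extra parameter on every signature.
inline int sdk_read(TraceContext& ctx) {
    ScopedTrace t(ctx, "sdk_read");
    return 0;
}

inline int sdk_open(TraceContext& ctx) {
    ScopedTrace t(ctx, "sdk_open");
    return sdk_read(ctx);
}
```

After a call to sdk_open(ctx), ctx.events holds the enter/exit sequence in call order, which the integrator can hand back together with a bug report.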

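Idea 2: Is there a way to trace the function call path without changing any interface?

Investigating further along these lines, I found a compiler flag supported by both gcc and clang: -finstrument-functions. When code is compiled with this flag, the compiler inserts calls to two fixed callback functions at the entry and exit of every function, namely: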

void __cyg_profile_func_enter(void *callee, void *caller);
void __cyg_profile_func_exit(void *callee, void *caller);

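The arguments are the addresses of the callee and the caller. How do you resolve an address to a function name? With the dladdr function:

int dladdr(const void *addr, Dl_info *info);

Take a look at the following code: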

// tracing.cc
#include <cxxabi.h>  // abi::__cxa_demangle
#include <dlfcn.h>   // for dladdr
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef NO_INSTRUMENT
#define NO_INSTRUMENT __attribute__((no_instrument_function))
#endif

// Called on entry to every instrumented function. The hook itself must not be
// instrumented, otherwise it would call itself recursively.
extern "C" NO_INSTRUMENT void __cyg_profile_func_enter(void *callee, void *caller) {
    Dl_info info;
    if (dladdr(callee, &info)) {
        int status;
        const char *name;
        // Demangle the C++ symbol; the result is malloc'ed and must be freed.
        char *demangled = abi::__cxa_demangle(info.dli_sname, NULL, 0, &status);
        if (status == 0) {
            name = demangled ? demangled : "[not demangled]";
        } else {
            // Not a mangled C++ name (e.g. main), or no symbol was found.
            name = info.dli_sname ? info.dli_sname : "[no dli_sname nd std]";
        }
        printf("enter %s (%s)\n", name, info.dli_fname);
        if (demangled) {
            free(demangled);
            demangled = NULL;
        }
    }
}

// Called on exit from every instrumented function; mirrors the enter hook.
extern "C" NO_INSTRUMENT void __cyg_profile_func_exit(void *callee, void *caller) {
    Dl_info info;
    if (dladdr(callee, &info)) {
        int status;
        const char *name;
        char *demangled = abi::__cxa_demangle(info.dli_sname, NULL, 0, &status);
        if (status == 0) {
            name = demangled ? demangled : "[not demangled]";
        } else {
            name = info.dli_sname ? info.dli_sname : "[no dli_sname and std]";
        }
        printf("exit %s (%s)\n", name, info.dli_fname);
        if (demangled) {
            free(demangled);
            demangled = NULL;
        }
    }
}

This is the test file:

// test_trace.cc
void func1() {}
void func() { func1(); }
int main() { func(); }

Compile and link test_trace.cc together with tracing.cc to get the call trace:

g++ test_trace.cc tracing.cc -std=c++14 -finstrument-functions -rdynamic -ldl; ./a.out

Output:

enter main (./a.out)
enter func() (./a.out)
enter func1() (./a.out)
exit func1() (./a.out)
exit func() (./a.out)
exit main (./a.out)
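A flat listing like the one above gets hard to read once calls nest deeply. One possible refinement, shown here only as a sketch, is to indent each line by the current call depth. The g_depth counter and the trace_depth helper are names invented for this illustration, and the sketch prints raw addresses to stderr rather than resolved names to keep it short.

```cpp
#include <cstdio>

#define NO_INSTRUMENT __attribute__((no_instrument_function))

// Current nesting depth of the calling thread (hypothetical helper state).
// A plain thread_local int is used so that the hooks call no instrumented code.
static thread_local int g_depth = 0;

NO_INSTRUMENT int trace_depth() { return g_depth; }

extern "C" NO_INSTRUMENT void __cyg_profile_func_enter(void* callee, void* caller) {
    (void)caller;
    // "%*s" with an empty string prints g_depth * 2 spaces of indentation.
    std::fprintf(stderr, "%*s-> %p\n", g_depth * 2, "", callee);
    ++g_depth;
}

extern "C" NO_INSTRUMENT void __cyg_profile_func_exit(void* callee, void* caller) {
    (void)caller;
    --g_depth;
    std::fprintf(stderr, "%*s<- %p\n", g_depth * 2, "", callee);
}
```

When a program is built with -finstrument-functions and linked against these hooks, each level of nesting shifts the output two columns to the right, so matching enter and exit lines line up.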

What if func() calls some other functions?

#include <iostream>
#include <vector>

void func1() {}

void func() {
    std::vector<int> v{1, 2, 3};
    std::cout << v.size();
    func1();
}

int main() { func(); }

After recompiling, the output looks like this:

enter [no dli_sname nd std] (./a.out)
enter [no dli_sname nd std] (./a.out)
exit [no dli_sname and std] (./a.out)
exit [no dli_sname and std] (./a.out)
enter main (./a.out)
enter func() (./a.out)
enter std::allocator<int>::allocator() (./a.out)
enter __gnu_cxx::new_allocator<int>::new_allocator() (./a.out)
exit __gnu_cxx::new_allocator<int>::new_allocator() (./a.out)
exit std::allocator<int>::allocator() (./a.out)
enter std::vector<int, std::allocator<int> >::vector(std::initializer_list<int>, std::allocator<int> const&) (./a.out)
enter std::_Vector_base<int, std::allocator<int> >::_Vector_base(std::allocator<int> const&) (./a.out)
enter std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) (./a.out)
enter std::allocator<int>::allocator(std::allocator<int> const&) (./a.out)
enter __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) (./a.out)
exit __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) (./a.out)
exit std::allocator<int>::allocator(std::allocator<int> const&) (./a.out)
exit std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) (./a.out)
exit std::_Vector_base<int, std::allocator<int> >::_Vector_base(std::allocator<int> const&) (./a.out)

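I have pasted only part of the output. This is clearly not what we want: we only want the call path through our own functions, with everything else filtered out. What can we do?

One option is to give all of your own functions a common prefix and print only the symbols that carry that prefix. I consider this the more general solution.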
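The prefix idea could be sketched roughly as follows. The prefix "mysdk_" and the helper should_trace are hypothetical names chosen for this example; for code that lives in a C++ namespace, the demangled names would start with something like "mysdk::" instead.

```cpp
#include <cstring>

#define NO_INSTRUMENT __attribute__((no_instrument_function))

// Hypothetical prefix that every function of our own SDK starts with.
static const char kTracePrefix[] = "mysdk_";

// Returns true when a (demangled) symbol name should be printed by the hooks.
NO_INSTRUMENT bool should_trace(const char* name) {
    if (name == nullptr) return false;
    // sizeof includes the trailing '\0', so subtract one to compare the prefix only.
    return std::strncmp(name, kTracePrefix, sizeof(kTracePrefix) - 1) == 0;
}
```

Inside the enter and exit hooks, the printf call would then be wrapped in if (should_trace(name)). Note that with such a filter, main itself would no longer appear in the trace unless it is special-cased.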

Below is the code I used, which instead filters out any name containing the substring std or gnu:

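// In __cyg_profile_func_enter: skip names containing "std" or "gnu".
// strcasestr is a case-insensitive GNU extension declared in <string.h>.
if (!strcasestr(name, "std") && !strcasestr(name, "gnu")) {
    printf("enter %s (%s)\n", name, info.dli_fname);
}

// In __cyg_profile_func_exit: the same filter.
if (!strcasestr(name, "std") && !strcasestr(name, "gnu")) {
    printf("exit %s (%s)\n", name, info.dli_fname);
}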

After recompiling, the output is what I wanted:

g++ test_trace.cc tracing.cc -std=c++14 -finstrument-functions -rdynamic -ldl; ./a.out

Output:

enter main (./a.out)
enter func() (./a.out)
enter func1() (./a.out)
exit func1() (./a.out)
exit func() (./a.out)
exit main (./a.out)

Another option is to pass the following flag at compile time:

-finstrument-functions-exclude-file-list

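It excludes files that you do not want traced. However, this flag is only available in gcc and is not supported by clang, so the string-filtering approach above is more portable.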
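For excluding individual functions rather than whole files, the no_instrument_function attribute that tracing.cc already applies to its hooks is accepted by both gcc and clang. A small sketch, with checksum and checksum_step as made-up example functions:

```cpp
#define NO_INSTRUMENT __attribute__((no_instrument_function))

// Hypothetical hot helper: excluded from instrumentation, so calling it
// triggers neither __cyg_profile_func_enter nor __cyg_profile_func_exit.
NO_INSTRUMENT inline int checksum_step(int acc, int byte) {
    return acc * 31 + byte;
}

// Instrumented as usual when built with -finstrument-functions.
int checksum(const unsigned char* data, int len) {
    int acc = 0;
    for (int i = 0; i < len; ++i) {
        acc = checksum_step(acc, data[i]);
    }
    return acc;
}
```

This only helps for code you own and can annotate; standard library functions pulled in from headers still need a filter such as the string match above.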

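The approach above only yields function names; it cannot pinpoint the file and line number. To get more information, you need to combine it with the bfd family of functions (such as bfd_find_nearest_line) and libunwind. I leave that for readers to explore.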
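A lighter-weight alternative to linking bfd, which is not what this article describes and is offered only as a sketch, is to log module-relative offsets and resolve them to file and line offline with a tool such as addr2line. The helper name module_offset is invented here; in a hook, the module and base arguments would come from the dli_fname and dli_fbase fields that dladdr fills in. This assumes a position-independent executable or shared library built with debug information.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Builds "module+0xOFFSET" for an address inside a module loaded at `base`.
// Hypothetical helper; `base` is expected to be Dl_info::dli_fbase.
std::string module_offset(const char* module, const void* base, const void* addr) {
    const std::uintptr_t off =
        reinterpret_cast<std::uintptr_t>(addr) - reinterpret_cast<std::uintptr_t>(base);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "+0x%llx", static_cast<unsigned long long>(off));
    return std::string(module ? module : "?") + buf;
}
```

Because this helper uses std::string, it is better suited to code that post-processes recorded addresses than to the hooks themselves: inline standard library functions compiled with -finstrument-functions would call back into the hooks.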

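Tip 1: This article is meant to start a discussion. I am not a backend developer, and as far as I know the backend C++ world has many mature tracing solutions. If you know a better approach, please leave a comment and share it.

Tip 2: The approach above does achieve call tracing, but in the end I did not use it in my project. The project has strict performance requirements, and this approach degraded the whole SDK's performance so severely that it could no longer run within those requirements. So I have shelved the tracing idea for now.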
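Much of the cost in the hooks shown earlier presumably comes from calling dladdr, demangling, and printf on every single function entry and exit. One direction that might reduce it, untested against any real SDK and offered purely as a sketch, is to store only raw addresses in a per-thread ring buffer and resolve names later, for example when a bug is reported. TraceEvent, kCapacity, record_event, recorded_count, and recent_event are all names invented for this illustration.

```cpp
#include <cstddef>

#define NO_INSTRUMENT __attribute__((no_instrument_function))

// One recorded event: which function, and whether it was entered or exited.
struct TraceEvent {
    void* fn;
    bool is_enter;
};

const std::size_t kCapacity = 4096;  // hypothetical size; older events get overwritten

// Per-thread storage, so the hot path needs no locks and calls no library code.
static thread_local TraceEvent t_events[kCapacity];
static thread_local std::size_t t_count = 0;  // total events seen by this thread

NO_INSTRUMENT void record_event(void* fn, bool is_enter) {
    TraceEvent& e = t_events[t_count % kCapacity];
    e.fn = fn;
    e.is_enter = is_enter;
    ++t_count;
}

NO_INSTRUMENT std::size_t recorded_count() { return t_count; }

// Returns the i-th most recent event (0 = newest).
// The caller must ensure i < min(recorded_count(), kCapacity).
NO_INSTRUMENT TraceEvent recent_event(std::size_t i) {
    return t_events[(t_count - 1 - i) % kCapacity];
}

extern "C" NO_INSTRUMENT void __cyg_profile_func_enter(void* callee, void*) {
    record_event(callee, true);
}

extern "C" NO_INSTRUMENT void __cyg_profile_func_exit(void* callee, void*) {
    record_event(callee, false);
}
```

The per-call overhead of the instrumentation itself remains, so whether this is fast enough would have to be measured on the actual workload.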

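The techniques in this article are still worth knowing and may come in handy. During my research I also came across an open-source project built on this approach (call-stack-logger); take a look if you are interested.

Original article: https://mp.weixin.qq.com/s/ZZd_o_x5Ti8o8haMjG0btw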
