A Simple Method for Analyzing Deadlocks on Linux

Published: 2017-1-09 10:19


 Author: IBM    Source: 51Testing Software Testing Network (compiled and edited)

  Listing 2. Compiling the test program
  [dyu@xilinuxbldsrvpurify]$ g++ -g lock.cpp -o lock -lpthread
  Listing 3. Finding the process ID of the test program
  [dyu@xilinuxbldsrvpurify]$ ps -ef | grep lock
  dyu      6721  5751  0 15:21 pts/3    00:00:00 ./lock
  Listing 4. Output of the first pstack run against the deadlocked process (pstack <PID>)
[dyu@xilinuxbldsrvpurify]$ pstack 6721
Thread 5 (Thread 0x41e37940 (LWP 6722)):
#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d1a808e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003d1a808cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a9b in func1() ()
#4  0x0000000000400ad7 in thread1(void*) ()
#5  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#6  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x42838940 (LWP 6723)):
#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d1a808e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003d1a808cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a17 in func2() ()
#4  0x0000000000400a53 in thread2(void*) ()
#5  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#6  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x43239940 (LWP 6724)):
#0  0x0000003d19c9a541 in nanosleep () from /lib64/libc.so.6
#1  0x0000003d19c9a364 in sleep () from /lib64/libc.so.6
#2  0x00000000004009bc in thread3(void*) ()
#3  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#4  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x43c3a940 (LWP 6725)):
#0  0x0000003d19c9a541 in nanosleep () from /lib64/libc.so.6
#1  0x0000003d19c9a364 in sleep () from /lib64/libc.so.6
#2  0x0000000000400976 in thread4(void*) ()
#3  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#4  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b984ecabd90 (LWP 6721)):
#0  0x0000003d1a807b35 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000400900 in main ()
  Listing 5. Output of the second pstack run against the deadlocked process (pstack <PID>)
[dyu@xilinuxbldsrvpurify]$ pstack 6721
Thread 5 (Thread 0x40bd6940 (LWP 6722)):
#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d1a808e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003d1a808cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a87 in func1() ()
#4  0x0000000000400ac3 in thread1(void*) ()
#5  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#6  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x415d7940 (LWP 6723)):
#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d1a808e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003d1a808cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a03 in func2() ()
#4  0x0000000000400a3f in thread2(void*) ()
#5  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#6  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x41fd8940 (LWP 6724)):
#0  0x0000003d19c7aec2 in memset () from /lib64/libc.so.6
#1  0x00000000004009be in thread3(void*) ()
#2  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#3  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x429d9940 (LWP 6725)):
#0  0x0000003d19c7ae0d in memset () from /lib64/libc.so.6
#1  0x0000000000400982 in thread4(void*) ()
#2  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#3  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2af906fd9d90 (LWP 6721)):
#0  0x0000003d1a807b35 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000400900 in main ()
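  Analyze the process by inspecting its call stacks several times in a row. When the process hangs, run pstack on it repeatedly. A deadlocked thread stays blocked waiting for a lock the whole time, so by comparing the successive stack dumps you can identify the two (or more) threads whose stacks never change and that remain waiting for a lock.
  Analysis of the output:
  Comparing the two outputs above, threads 2 and 3 moved from sleep in the first pstack output to memset in the second. Threads 4 and 5, however, stayed in the lock-waiting state (pthread_mutex_lock), and their call chains did not change between the two consecutive pstack outputs, so we can infer that threads 4 and 5 are deadlocked.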
  Output of gdb's info threads command:
  Listing 6. Attaching gdb to the deadlocked process (for example with gdb -p 6721) and listing its threads
(gdb) info threads
5 Thread 0x41e37940 (LWP 6722)  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
4 Thread 0x42838940 (LWP 6723)  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
3 Thread 0x43239940 (LWP 6724)  0x0000003d19c9a541 in nanosleep () from /lib64/libc.so.6
2 Thread 0x43c3a940 (LWP 6725)  0x0000003d19c9a541 in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x2b984ecabd90 (LWP 6721)  0x0000003d1a807b35 in pthread_join () from /lib64/libpthread.so.0
  Listing 7. Output after switching to thread 5
(gdb) thread 5
[Switching to thread 5 (Thread 0x41e37940 (LWP 6722))]#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) where
#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d1a808e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003d1a808cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a9b in func1 () at lock.cpp:18
#4  0x0000000000400ad7 in thread1 (arg=0x0) at lock.cpp:43
#5  0x0000003d1a80673d in start_thread () from /lib64/libpthread.so.0
#6  0x0000003d19cd40cd in clone () from /lib64/libc.so.6
  Listing 8. Output for threads 4 and 5
(gdb) f 3
#3  0x0000000000400a9b in func1 () at lock.cpp:18
18          pthread_mutex_lock(&mutex2);
(gdb) thread 4
[Switching to thread 4 (Thread 0x42838940 (LWP 6723))]#0  0x0000003d1a80d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) f 3
#3  0x0000000000400a17 in func2 () at lock.cpp:31
31          pthread_mutex_lock(&mutex1);
(gdb) p mutex1
$1 = {__data = {__lock = 2, __count = 0, __owner = 6722, __nusers = 1, __kind = 0,
__spins = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\000\000\000\000B\032\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) p mutex3
$2 = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
__kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = '\000' <repeats 39 times>, __align = 0}
(gdb) p mutex2
$3 = {__data = {__lock = 2, __count = 0, __owner = 6723, __nusers = 1,
__kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\000\000\000\000C\032\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb)
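  From Listing 8, thread 4 is trying to acquire mutex1, but mutex1 is already held by the thread with LWP 6722 (__owner = 6722); thread 5 is trying to acquire mutex2, but mutex2 is already held by the thread with LWP 6723 (__owner = 6723). The pstack output shows that LWP 6722 corresponds to thread 5 and LWP 6723 corresponds to thread 4. We can therefore conclude that threads 4 and 5 are deadlocked, each holding the lock the other is waiting for. Inspecting the threads' source code confirms that both threads use mutex1 and mutex2 but request them in conflicting orders: func1, running in thread 5, holds mutex1 and then requests mutex2, while func2, running in thread 4, holds mutex2 and then requests mutex1.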
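  Summary
  This article has briefly presented one way to analyze deadlocks on the Linux platform, which can help with diagnosing some deadlock problems; we hope it is useful. Once you understand the causes of deadlock, in particular the four necessary conditions (mutual exclusion, hold and wait, no preemption, and circular wait), you can avoid, prevent, and resolve deadlocks as far as possible. In system design and process scheduling, therefore, make sure these four conditions cannot all hold at once, choose a sound resource-allocation algorithm, and keep processes from holding system resources indefinitely. Also prevent a process from holding resources while it is waiting. While the system runs, check each resource request that the system is able to satisfy before granting it: if granting it could lead to deadlock, refuse it; otherwise grant it. Resource allocation should therefore be planned carefully; ordered resource allocation and the banker's algorithm are effective ways to avoid deadlock.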