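Senior technical expert at Taobao Mall (Tmall). 3 years of development, 3 years of performance testing, tuning and system testing, and 4 years of team management, test architecture and R&D systems practice. New stage, new outlook: I am deepening my work on test infrastructure and R&D architecture and hope to become a real expert in one technical field. Talent referrals welcome: http://bbs.51testing.com/viewthread.php?tid=120496&extra=&page=1 . Email: jianzhao.liangjz@alibaba-inc.com, MSN: liangjianzhao@163.com. Weibo: http://t.sina.com.cn/1674816524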

Core dump when spring dm exits, and pitfalls for QA to avoid

2009-04-12 16:52:06 / Category: Java performance monitoring and tuning

Symptoms

telnet localhost 2401

>exit

or /home/aranda/springsource/bin/shutdown.sh

spring dm dumps core on exit:



[aranda@dc_4 bin]$ *** glibc detected *** /home/aranda/software/jdk1.6.0_12/bin/java:corrupted double-linked list:0x00000000509cb740 ***

======= Backtrace: =========

/lib64/libc.so.6[0x3e11e7155c]

/lib64/libc.so.6(cfree+0x8c)[0x3e11e74c5c]

/home/aranda/aranda.home/bin/apr/lib64/libapr-1.so.0(apr_allocator_destroy+0x1b)[0x2aaaf61e559b]

/home/aranda/aranda.home/bin/apr/lib64/libapr-1.so.0(apr_pool_terminate+0x2d)[0x2aaaf61e61fd]

[0x2aaaab1467b0]

======= Memory map: ========

40000000-40009000 r-xp 00000000 08:08 3442306                            /home/aranda/software/jdk1.6.0_12/bin/java

40108000-4010a000 rwxp 00008000 08:08 3442306                            /home/aranda/software/jdk1.6.0_12/bin/java

4010b000-4010e000 ---p 4010b000 00:00 0

4010e000-4014c000 rwxp 4010e000 00:00 0

42015000-42115[2009-04-12 14:40:57.227] Thread-1                 <SPOF0004I> Shutdown initiated.

[2009-04-12 14:40:57.286] server-dm-1              <SPSC0002I> Shutting down ServletContainer.

/home/aranda/springsource/bin/startup.sh: line 128:  4896 Aborted                 $JAVA_HOME/bin/java $JAVA_OPTS $DEBUG_OPTS $APP_OPTS $JMX_OPTS -Dcom.springsource.server.home=$SERVER_HOME -Dcom.springsource.server.configDir=$CONFIG_DIR -Djava.io.tmpdir=$SERVER_HOME/work/tmp/ -classpath $CLASSPATH com.springsource.server.kernel.bootstrap.Bootstrap

 

 

 

[aranda@dc_4 ~]$ gdb   software/jdk1.6.0_12/bin/java  /home/core/core-java-1267-1239518181

GNU gdb Red Hat Linux (6.5-37.el5rh)

 

Core was generated by `/home/aranda/software/jdk1.6.0_12/bin/java -Djava.library.path=/home/aranda/ara'.

Program terminated with signal 6, Aborted.

#0  0x0000003e11e30155 in raise () from /lib64/libc.so.6

(gdb) bt

#0  0x0000003e11e30155 in raise () from /lib64/libc.so.6

#1  0x0000003e11e31bf0 in abort () from /lib64/libc.so.6

#2  0x0000003e11e6a38b in __libc_message () from /lib64/libc.so.6

#3  0x0000003e11e7155c in _int_free () from /lib64/libc.so.6

#4  0x0000003e11e74c5c in free () from /lib64/libc.so.6

#5  0x00002aaaf61e559b in apr_allocator_destroy (allocator=0x5105b720) at memory/unix/apr_pools.c:134

#6  0x00002aaaf61e61fd in apr_pool_terminate () at memory/unix/apr_pools.c:602

#7  0x00002aaaab1467b0 in ?? ()

 

Solution:

 

Edit /home/aranda/springsource/config/servletContainer.config and set "enabled" to false for the AprLifecycleListener entry:

 

 

 

"listeners": [

                        {

                                /*

                                 * APR library loader.

                                * Documentation at http://tomcat.apache.org/tomcat-6.0-doc/apr.html

                                */

                                "enabled":false,

                                "className": "org.apache.catalina.core.AprLifecycleListener",

                                "SSLEngine": "off"

                        },

 

 

Root cause analysis

 

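The cause has largely been established: on both 64-bit and 32-bit JVMs, spring-dm crashes whenever APR is in use. The detailed analysis has been posted on the official SpringSource forum; an excerpt follows: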

When the shutdown sequence starts, Tomcat's AprLifecycleListener receives "AFTER_STOP_EVENT" and calls Library.terminate(), which in turn calls the C function apr_terminate(). At that point all memory managed by the APR library is released.

Unfortunately, inside dm-server the AprLifecycleListener is no longer the last component to die.

Even though the APR library has already been terminated, dm-server's shutdown sequence is still in progress, and dm-server's own executor (an instance of DelegatingExecutor that is set on the AprEndpoint at startup) can still get the chance to handle a broken socket, that is, the native Socket class associated with APR data structures.
The AprEndpoint logic then invokes native APR C functions to close and release those broken sockets, but the associated APR structures were already released by the earlier apr_terminate() call!

 

 

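After tracing through the apr, tomcat and spring-dm source code, we concluded that disabling the AprLifecycleListener in dmserver/config/servletContainer.config is enough to avoid this crash.

One more point: dm-server is still safe without the AprLifecycleListener. Its Tomcat connector initializes libtcnative on its own, and when the JVM exits, libtcnative receives a callback from the JVM and releases the memory pools managed by APR.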

 

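Leaving this problem unfixed would also do no harm, because the crash does not occur while the server is running, only on shutdown. What matters here is whether we analyzed the problem thoroughly and found the root cause of the failure.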

 

 

Countermeasures

 

By default, /etc/profile on Linux turns off core dump output:

# No core files by default

ulimit -S -c 0 > /dev/null 2>&1

 

For C++ applications and Java applications that use JNI, change that line in /etc/profile to

ulimit -S -c unlimited > /dev/null 2>&1

or add the following to .bash_profile:

ulimit -c unlimited
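To confirm that the limit really applies in the shell that launches the server, it can be checked and raised just before startup. This is a minimal sketch and not part of the original setup: the function name enable_core_dumps is hypothetical, and it raises the soft limit only as far as the hard limit allows, so it also works where the hard limit is lower than unlimited.

```shell
#!/usr/bin/env bash
# Hypothetical helper: raise the soft core-file size limit of the current
# shell to whatever the hard limit permits (often "unlimited").
enable_core_dumps() {
    local hard
    hard="$(ulimit -H -c)"      # hard ceiling for the core file size
    ulimit -S -c "$hard"        # the soft limit may be raised up to the hard limit
    echo "core file size: soft=$(ulimit -S -c) hard=$hard"
}

enable_core_dumps
```

The function has to run in the same shell that later starts the server (not in a subshell), because the limit is inherited by child processes only. If the reported hard limit is 0, core files remain disabled and the hard limit must be raised by an administrator first.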

 

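By default the core dump file is written to the application's working directory (the directory it was started from). To generate files with a specific naming pattern, set /proc/sys/kernel/core_pattern to, for example:

/home/core/core-%e-%p-%t

Here %e is the executable name, %p the process ID, and %t the time of the dump in seconds since the epoch.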
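As an illustration of how such a pattern maps to a file name, the sketch below substitutes the three specifiers used above in the way the kernel does when it writes the dump. expand_core_pattern is a hypothetical helper written for this note, and it handles only %e, %p and %t, while the kernel supports more specifiers. Changing the real pattern requires root, so that command appears only as a comment.

```shell
#!/usr/bin/env bash
# To change the real pattern (root required), one would run:
#   echo '/home/core/core-%e-%p-%t' > /proc/sys/kernel/core_pattern

# Hypothetical helper: substitute %e (executable name), %p (PID) and
# %t (dump time, seconds since the epoch) into a core_pattern string.
expand_core_pattern() {
    local pattern="$1" exe="$2" pid="$3" ts="$4"
    printf '%s\n' "$pattern" |
        sed -e "s/%e/$exe/g" -e "s/%p/$pid/g" -e "s/%t/$ts/g"
}

# Values taken from the core file name in the gdb session above.
expand_core_pattern '/home/core/core-%e-%p-%t' java 1267 1239518181
```

With those values the helper prints /home/core/core-java-1267-1239518181, the same file name that was loaded into gdb earlier. The target directory (/home/core here) must already exist and be writable by the crashing process, otherwise no file is written.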

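In addition, a crashing Java application produces a file whose name starts with hs (the JVM fatal error log, hs_err_pid<pid>.log).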

 

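Several bugs have already been found in Spring dm 1.0.1. With new applications like this, be especially careful and check every output thoroughly, including the Spring logs and system logs such as /var/log/messages.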
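One way to make that check routine is to scan the relevant logs for crash markers after every shutdown. A rough sketch follows, under stated assumptions: scan_crash_markers is a hypothetical name, the marker list contains only the two strings seen in the console output above, and the log paths passed to it are placeholders for whatever your installation actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line (with file name and line number)
# in the given files that contains a crash marker seen in this incident.
# Exit status follows grep: 0 means at least one match and no read errors.
scan_crash_markers() {
    grep -H -n -E 'glibc detected|Aborted' "$@" 2>/dev/null
}

# Example call (the first path is a placeholder, not a real location):
# scan_crash_markers /path/to/console.log /var/log/messages
```

Reading /var/log/messages usually needs root, and unreadable files are skipped silently here because grep's error output is discarded.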

    

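For new applications like these, agree explicitly with the developers and the requirement owners on the supported JVM, web server, application, OS and patch versions. Even minor differences can lead to bugs.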


TAG: GDB Spring spring dm core dump gdb

 
