
How to get rid of FIN_WAIT1 sockets

Posted 2008-06-03 09:39:06 / Category: Zee's Life

> As I'd reported some time back, I've been having problems with
> these MPICH socket timeouts, with sockets going into FIN_WAIT1's and
> never coming out.

A socket enters the FIN_WAIT_1 state when one side of a connection calls
close() on an open socket, causing a FIN to be transmitted to the other
end. It stays in this state while waiting for the other end to ACK that
FIN. The remote kernel should send the ACK automatically, without any
action from the remote application, and that moves the closing side into
the FIN_WAIT_2 state. It remains in FIN_WAIT_2 until the remote sends its
own FIN, which happens when the other side calls close() on its end of
the socket (the remote then sits in LAST_ACK until that FIN is
acknowledged). At that point the closing side enters the TIME_WAIT state,
where it stays for the 2MSL timeout (typically 30, 60 or 180 seconds;
Linux uses 60).
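
To make the sequence above concrete, here is a minimal sketch of the
active-close path as a small state machine. The type and function names
(TcpState, Event, next_state) are made up for illustration, and the model
is deliberately simplified: it leaves out simultaneous close (the CLOSING
state) and the passive-close side entirely.

```cpp
// States seen by the side that calls close() first (active close).
enum class TcpState { Established, FinWait1, FinWait2, TimeWait, Closed };

// Events that drive the active-close path (hypothetical names).
enum class Event { LocalClose, AckOfFinReceived, FinReceived, TwoMslExpired };

// Returns the next state. An event that does not apply in the current
// state leaves the state unchanged in this simplified model.
TcpState next_state(TcpState s, Event e) {
    switch (s) {
    case TcpState::Established:
        if (e == Event::LocalClose) return TcpState::FinWait1;   // our FIN goes out
        break;
    case TcpState::FinWait1:
        if (e == Event::AckOfFinReceived) return TcpState::FinWait2;
        break;
    case TcpState::FinWait2:
        if (e == Event::FinReceived) return TcpState::TimeWait;  // peer closed too
        break;
    case TcpState::TimeWait:
        if (e == Event::TwoMslExpired) return TcpState::Closed;
        break;
    case TcpState::Closed:
        break;
    }
    return s;
}
```

A socket stuck in FIN_WAIT_1 corresponds to the AckOfFinReceived event
never arriving: nothing else in this model moves it forward.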

From my interpretation, the remote end is not responding to the socket
close requests that the client is making, thus suspending the socket
connection in the FIN_WAIT_1 state until the connection times out. I
can't imagine how this could be happening: a crash on the remote end
would cause the socket to be closed by the OS, and the ACK that takes the
client from FIN_WAIT_1 to FIN_WAIT_2 should be sent by the OS as well.

> Short of rebooting, is there a command I can use to kill them?

Alas, no. At least, not without forging some packets. You may, however,
be able to solve it another way.

> I'm not even able to experiment with them piling up. I tried ifdowning
> and then ifup'ing them (ifconfig down followed by ifconfig up) but it
> does nothing to clear the sockets.

Unfortunately, the ifdown/up of the interface should (correctly) have no
effect on the connection.
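
If you want to watch how many sockets are stuck while you experiment, one
option is to count them yourself. The sketch below assumes a Linux box,
where /proc/net/tcp lists one socket per row with the state as a
two-digit hex code in the fourth column, and where 04 is FIN_WAIT1. The
function name count_sockets_in_state is made up for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Counts rows whose "st" column equals the given hex state code.
// Assumption: the input follows the Linux /proc/net/tcp layout:
//   sl  local_address rem_address   st tx_queue rx_queue ...
int count_sockets_in_state(std::istream& in, const std::string& state_hex) {
    std::string line;
    std::getline(in, line);  // skip the header row
    int count = 0;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string sl, local, remote, st;
        // The first four whitespace-separated fields are all we need.
        if (fields >> sl >> local >> remote >> st && st == state_hex) {
            ++count;
        }
    }
    return count;
}
```

To use it, open /proc/net/tcp with a std::ifstream and pass "04" as the
state code. Running netstat and filtering on FIN_WAIT1 gives the same
information without any code.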

One solution I recommend (as it may allow you to re-use local port
numbers while waiting for the timeout) is to enable the SO_REUSEADDR
option. My quick skimming of STEVENS 1994, section 18.6 suggests it does
not directly address FIN_WAIT_1; however, it may help. The more typical
problem I have seen is a bunch of sockets in the FIN_WAIT_2 or TIME_WAIT
states.

Here's the magic snip of code from our socket class.

...
  int value = 1;   // non-zero enables the option; set it before bind()
  setsockopt( Socket, SOL_SOCKET, SO_REUSEADDR, &value,
                                                sizeof( value ) );
...
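
For context, here is a fuller sketch of where that call would sit. It is
not the socket class mentioned above; bind_reusable_socket is a made-up
helper, and binding to all interfaces is an assumption. The one point it
is meant to show is that the option has to be set on the socket before
bind() is called.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Creates a TCP socket, enables SO_REUSEADDR, then binds it to the given
// port on all interfaces. Returns the descriptor, or -1 on failure.
// (Hypothetical helper, for illustration only.)
int bind_reusable_socket(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int value = 1;  // non-zero turns the option on
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, sizeof(value)) < 0) {
        close(fd);
        return -1;
    }

    sockaddr_in addr{};  // zero-initialised
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Passing port 0 lets the kernel pick a free port, which is handy for a
quick check; a server would pass its well-known port and then call
listen() on the returned descriptor.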

Another solution would be to go straight to the source (pun intended) and
change the various TCP timeouts in include/net/tcp.h. My copy of the
source declares three interesting timers. I didn't read tcp.c carefully
to see their effects, but you have TCP_TIMEOUT_LEN, which is 15 minutes;
TCP_TIMEWAIT_LEN (60s), which is the time to spend waiting for a socket to
close(!); and TCP_FIN_TIMEOUT, which is 3 minutes. I don't know which one
would apply to a socket in FIN_WAIT_1. In a Beowulf, however, I can't
imagine timeouts measured in minutes being remotely applicable.

C=)

P.S. setsockopt(2) & _TCP/IP_Illustrated_Volume_1_ section 18.6 should
prove useful.

