we've been using fsh for some internal stuff recently, along with openssh (currently we're using version 3.0.2). we've been having some problems with ssh connections failing with the message:
Received disconnect from x.x.x.x: 2: fork failed: Resource temporarily unavailable
the problem seems related to either a lot of openssh procs, a lot of fsh procs, or a combination. currently we have a cron job set to execute every 15 minutes which does:
for A in `ps aux | grep '[f]sh' | egrep -v "reset_fsh|fsh -T" | awk '{print $2}'`; do kill $A; done
this fixes the problem, but obviously isn't a good long-term solution.
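Before the cron job clears them, it can help to count the processes and see whether ssh or fsh is the kind piling up. The block below is a minimal POSIX sh sketch and is not part of fsh or OpenSSH. `count_procs` is a hypothetical helper, and the plain substring patterns "ssh" and "fsh" are assumptions about how the processes are named. The block also assumes `ps ax -o args=` prints one full command line per process with no header, so check ps(1) on your systems.

```sh
#!/bin/sh
# count_procs: hypothetical helper, not part of fsh or OpenSSH.
# Reads a process listing (one command line per line) on stdin and prints
# the total number of lines, how many mention "ssh" and how many mention
# "fsh". A line that mentions both is counted in both columns.
count_procs() {
    awk '/ssh/ { s++ } /fsh/ { f++ }
         END { printf "total=%d ssh=%d fsh=%d\n", NR, s, f }'
}

# Take the snapshot first and count it afterwards, so the awk process started
# by count_procs is not part of the listing it counts. Keep "ssh" and "fsh"
# out of this script's file name, or its own line will be counted.
# Assumption: `ps ax -o args=` prints full command lines with no header;
# older versions of ps may need different flags.
if snapshot=$(ps ax -o args= 2>/dev/null); then
    printf '%s\n' "$snapshot" | count_procs
fi
```

Logging this line from the same cron entry, just before the kill loop runs, would show over time which kind of process accumulates.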
we use ssh and scp for a great deal of stuff, so the machines that are having problems (which act as controller machines for our other machines) have a LOT of ssh procs running usually.
most of the machines are debian linux (potato) with a custom built fsh 1.1 package (based on the woody package). there are a few freebsd machines as well.
this has become a problem since we switched to ssh v2 everywhere (i assume the extra processor overhead of the larger dsa keys doesn't help).
so i realize the problem is most likely partly with ssh, but it only seems to show up when we're using fsh to tunnel connections.
any tips on configuring fsh, ssh, or both to allow more concurrent procs? i tried changing the limits.h include file to allow more procs and recompiling ssh, but this didn't seem to help.
i'm a bit more hesitant to edit the fsh stuff since i don't know python at all.
anyway, any help would be greatly appreciated, especially from people who use fsh with a great number of concurrent connections.
Will Yardley <william@hq.newdream.net> writes:
> any tips on configuring fsh, ssh, or both to allow more concurrent procs? i tried changing the limits.h include file to allow more procs and recompiling ssh, but this didn't seem to help.
The error message is probably not due to ssh or fsh putting a limit on the number of processes. My guess is that you are running into an operating-system limit on the number of concurrent processes you can have.
You say that you are running Linux. I don't know which kernel version Debian Potato uses, but you will probably have to recompile your kernel. Take a look at http://www.linuxraid.org/ under the heading "Increase the number of processes that may run simultaneously"; the patch there may help, and some of the other patches on the same page are probably useful too.
If you are running Linux 2.4, you may be able to tweak this at runtime, but I didn't find any information in a quick Google search.
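The operating-system guess can be checked from a shell. On Linux, fork() fails with EAGAIN ("Resource temporarily unavailable") when the calling user has reached its per-user process limit (RLIMIT_NPROC, shown by `ulimit -u` in bash) or when the system-wide limit is reached. On 2.4 kernels that system-wide limit is /proc/sys/kernel/threads-max, which can be changed at runtime. The block below is a minimal sketch, not a finished tool. `report_limits` is a hypothetical name, the FreeBSD branch (the `kern.maxproc` sysctl) is an assumption, and the numbers in the closing comments are placeholders.

```sh
#!/bin/bash
# report_limits: hypothetical helper, not part of fsh or OpenSSH.
# Prints the two limits that usually make fork() fail with EAGAIN
# ("Resource temporarily unavailable").
report_limits() {
    # Per-user limit (RLIMIT_NPROC) for this shell's user; `ulimit -u` is
    # the bash spelling, other shells may not accept it.
    printf 'per_user=%s\n' "$(ulimit -u 2>/dev/null || echo unknown)"

    if [ -r /proc/sys/kernel/threads-max ]; then
        # Linux 2.4 and later: system-wide task limit, writable at runtime.
        printf 'system_wide=%s\n' "$(cat /proc/sys/kernel/threads-max)"
    elif sysctl -n kern.maxproc >/dev/null 2>&1; then
        # Assumed FreeBSD case: system-wide process limit.
        printf 'system_wide=%s\n' "$(sysctl -n kern.maxproc)"
    else
        printf 'system_wide=unknown\n'
    fi
}

report_limits

# Raising them (as root); the numbers are placeholders, not recommendations:
#   echo 16384 > /proc/sys/kernel/threads-max   # Linux 2.4+, applies immediately
#   ulimit -u 4096    # e.g. in the script that starts sshd; children inherit it
```

Comparing these numbers with the process counts on a controller machine at the time the error appears should show which limit is being hit.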
> i'm a bit more hesitant to edit the fsh stuff since i don't know python at all.
I doubt that Python or fsh has anything to do with the problem. You could, however, recompile fsh with "--enable-timeout=3600" so that idle fsh connections time out after one hour instead of the default of 10 hours.
> anyway, any help would be greatly appreciated, especially from people who use fsh with a great number of concurrent connections.
I've only used fsh in small installations, but since nobody else has stepped forward with more insight, I thought I would share what little I know.
/ceder