Re: The plot thickens (Re: Segmentation fault cause)

23 Jul 2020


      Tobias S. Josefowitz @ Pike developers forum wrote:
...
...
Am I reading those correctly that both are upon thread creation?
...
First, nice backtrace. I think it's more that any thread started ever
will have been started by thread creation, and that's what we're
seeing here, not that this necessarily immediately follows thread
creation.
That was the other explanation, but since I don't know enough about the
innards of Pike in this respect, I wasn't quite sure.
...
#ifdef PIKE_USE_MACHINE_CODE
         call_check_threads_etc();
#endif
...
I assume you have machine code enabled. call_check_threads_etc()
Normally I do. However, to get to the bottom of this, I temporarily compiled
with:
gcc-9 -g -O1 -pipe -DPIKE_DEBUG=1
  --with-cdebug --with-rtldebug --with-valgrind
  --with-double-precision --with-long-int \
  --disable-noopty-retry \
  --without-machine-code \
  --with-poll \
  --with-portable-bytecode \
...
indeed may schedule other threads and stuff and we may return from it
with the object we just looked up the identifier in destructed. When
we then call the identifier, we will indeed call into a destructed
object.
I already had the distinct impression this was happening.  In the pgsql driver
I've had numerous issues over the years where I had to cover for methods
running in destructed objects (because the driver is asynchronous to the bone).
But in most cases this did not result in segfaults, so I'm not quite sure
whether the rest of the Pike system is naturally more robust against this.
I have one unexplained segfault there from about six months ago, but it was
so hard to reproduce that it may well have been the same problem I'm
chasing now.
...
Now, what to do about it... indeed check at every function entry that
we're not destructed? I don't know, but that just doesn't feel
cool.
Actually, it seems Pike already does this check before *every* access
of a local variable.  I.e. in the numerous cases I had to deal with in pgsql,
I invariably got an exception like "lookup in destructed object".
With that in mind, a check upon function entry does not sound so bad.
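
To make the sequence under discussion concrete, here is a minimal C sketch of
a check upon function entry.  It is not Pike's actual code: struct object,
struct program, yield_hook and call_method_checked() are hypothetical
stand-ins, and the sketch assumes that a destructed object can be recognised
by a NULL program pointer.  The yield callback stands in for
call_check_threads_etc(), during which another thread may destruct the object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not Pike's real structures. */
struct program { const char *name; };

struct object {
    struct program *prog;  /* assumption: NULL once the object is destructed */
    int calls;             /* how often the method body actually ran */
};

/* Stands in for call_check_threads_etc(): other threads may run here. */
typedef void (*yield_hook)(struct object *);

/* Simulates another thread destructing the object while we yielded. */
static void destruct_during_yield(struct object *o) { o->prog = NULL; }

/* A yield during which nothing happens to the object. */
static void quiet_yield(struct object *o) { (void)o; }

/* The method body; it must never run on a destructed object. */
static void method_body(struct object *o)
{
    assert(o->prog != NULL);
    o->calls++;
}

/* Returns 0 if the method ran, -1 if the object was found destructed. */
static int call_method_checked(struct object *o, yield_hook yield)
{
    if (o->prog == NULL)
        return -1;          /* destructed before the identifier lookup */

    /* ... the identifier lookup would happen here ... */

    yield(o);               /* the object may be destructed in here */

    if (o->prog == NULL) {  /* the check upon function entry */
        fprintf(stderr, "call into destructed object refused\n");
        return -1;
    }

    method_body(o);
    return 0;
}
```

Calling call_method_checked() with quiet_yield runs the method body once;
calling it with destruct_during_yield returns -1 without entering the body.
The cost is one extra pointer comparison per call.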
P.S. Speaking of asynchronous destructs.  One of the "fun" things
     I discovered about three months ago was that since the actual destruct()
     method can also be called from inside a totally random stackframe
     (far beyond the stackframe where the object's scope actually
     ended), it is quite hazardous to try to acquire any kind of mutex
     from within a destruct() method, since it can result in seemingly random
     and very rare deadlocks.
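
The deadlock hazard can be sketched in C as follows.  This is only an
illustration under stated assumptions: struct toy_lock, struct resource and
resource_destruct() are hypothetical names, the lock is a toy non-recursive
lock built on C11 atomic_flag rather than Pike's mutex, and deferring the
cleanup is just one possible way to avoid blocking.  The point is that if the
destructor fires inside a stackframe that already holds the lock, a blocking
acquire would wait forever, whereas a try-lock can detect the situation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy lock for illustration; not Pike's mutex. */
struct toy_lock { atomic_flag held; };

static void toy_lock_init(struct toy_lock *l) { atomic_flag_clear(&l->held); }

/* Non-blocking acquire: true if we got the lock, false if already held. */
static bool toy_try_lock(struct toy_lock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(struct toy_lock *l) { atomic_flag_clear(&l->held); }

struct resource {
    struct toy_lock *lock;  /* lock shared with ordinary code paths */
    bool cleaned;           /* cleanup ran under the lock */
    bool deferred;          /* cleanup postponed because the lock was busy */
};

/*
 * Destructor that may be invoked from an arbitrary stackframe, possibly
 * one that already holds r->lock.  A blocking acquire here would then
 * never return, so the sketch only tries the lock and defers otherwise.
 */
static void resource_destruct(struct resource *r)
{
    assert(r != NULL && r->lock != NULL);

    if (!toy_try_lock(r->lock)) {
        r->deferred = true;  /* leave the cleanup for a later, safe point */
        return;
    }
    r->cleaned = true;       /* the actual cleanup would happen here */
    toy_unlock(r->lock);
}
```

If a frame holds the lock when resource_destruct() runs, the cleanup is marked
as deferred instead of blocking; once the lock is free, a second call performs
the cleanup.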
-- 
Stephen.
