Hi all,
A few months ago, I started a conversation about what I'll call multi-tenant applications: that is, a single Pike process running multiple applications (perhaps including copies of a given application) with isolated program and module spaces. The idea is to provide a capability similar to that provided by Java's ClassLoader API.
That is, I'd like to be able to do the following:
                      pike
      +-----------------|-----------------+
      |                                   |
thread a1...an                      thread b1 ... bn
compilation handler a               compilation handler b
loads classes for app1 instance 1   loads classes for app1 instance
from location a,b                   from location a,c (perhaps)
uses module foo.bar                 uses module foo.bar
foo.bar != foo.bar
though it would perhaps be acceptable if
object_program(foo.bar) == object_program(foo.bar)
applications would not be aware of each other (a condition of the multi-tenant contract) and thus object identity would not need to be maintained.
I've been dabbling with the approach suggested at the Pike conference, which was to use a compilation handler to provide this functionality. I've come tantalizingly close, being able to use an overridden master to provide versions of compile_string() and friends that automatically select the desired compilation handler based on various criteria (such as threads in an application). While a bit clunky, it seems to allow me to control visibility of identifiers in a given application or thread, but there does seem to be a limitation that for me is fatal: the programs and module objects are cached by the master, and therefore any two applications are not truly isolated: they share a common set of modules (and join/dirnodes) and (though less problematic) precompiled programs.
Because programs and modules seem to be indexed by filename, I've played around with adding a unique identifier in order to split the cache on a per-handler basis, but this hasn't worked either (programs seem to be loaded as desired, but modules are still problematic).
I've attempted to add storage of these caches in the compilation handler, but it results in extremely odd failures (for example, a given class will be cached as a zero value in the programs mapping, which means the program won't be found, even if it's on disk).
As a brute force attempt to prove that the idea can work, I'm thinking about short circuiting the auto-reload functionality so that it always reloads a given class from disk. I'm not sure that this will actually prove beneficial, as modules would still be persistent.
As always, any thoughts or suggestions would be welcome.
Bill
Isn't it necessary for you to implement something similar to the compat master scheme? I.e. not only have separate compilation handlers, but also separate master objects?
Looks like the compat master scheme uses subtyped object pointers, i.e. Pike_N_M_master::xxx, which I doubt would work for you, but you could instead have a global mapping somewhere where you keep track of the master objects for your "tenants".
Note also that some caches in the real master should be possible to keep global. E.g. fc, because it uses paths, and the objects mapping, because it's indexed on the program instances which are different when there is a real difference. The programs mapping also uses paths, but a problem there is the special "/master" entry, so it'd require some sort of wrapper object with `[], `[]= etc.
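As a rough illustration of the wrapper idea above, here is a minimal sketch, assuming the only entry that must differ per tenant is "/master". The class name ProgramsWrapper and its variable names are hypothetical, strings stand in for real programs, and a real wrapper would also need the other mapping-like callbacks hinted at by "etc." (deletion, indices and so on).

```pike
// Hypothetical wrapper: shares one programs mapping between tenants,
// but keeps the special "/master" entry separate for each tenant.
class ProgramsWrapper {
  protected mapping(string:mixed) shared_programs;
  protected mapping(string:mixed) tenant_programs = ([]);

  protected void create(mapping(string:mixed) programs)
  {
    shared_programs = programs;
  }

  // Lookup: per-tenant entries take precedence over shared ones.
  mixed `[](string path)
  {
    if (has_index(tenant_programs, path)) return tenant_programs[path];
    return shared_programs[path];
  }

  // Assignment: only "/master" is stored per tenant in this sketch.
  mixed `[]=(string path, mixed prog)
  {
    if (path == "/master") return tenant_programs[path] = prog;
    return shared_programs[path] = prog;
  }
}

int main()
{
  mapping(string:mixed) global_programs = ([]);
  ProgramsWrapper a = ProgramsWrapper(global_programs);
  ProgramsWrapper b = ProgramsWrapper(global_programs);

  // Strings are used as stand-ins for compiled programs.
  a["/master"] = "master for tenant a";
  b["/master"] = "master for tenant b";
  a["/some/file.pike"] = "shared entry";

  write("%O\n%O\n%O\n", a["/master"], b["/master"], b["/some/file.pike"]);
  return 0;
}
```

In this sketch each tenant sees its own "/master" while every other path resolves through the shared mapping.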
Yes, I think it will have to be something a bit more involved.
My original plan was to override all of the functions that are in play here (all of the methods defined in CompilationHandler, plus a number of others) so that they use the appropriate data.
However, as I think about this, perhaps the answer is even simpler:
The problem isn't strictly the methods themselves, it's more a matter of making them use the right set of data. Therefore, would it not be just as effective to implement getters/setters on the appropriate datasources:
    class ResolutionEnvironment {
      array pike_module_path = ({});
      array pike_include_path = ({});
      mapping objects;
      // etc...
    }

    mapping(Thread.Thread|string:ResolutionEnvironment) _multitenant_threads =
      (["default": ResolutionEnvironment()]);

    array `->pike_module_path() {
      ResolutionEnvironment x;
      // do we have a special environment for this thread?
      if(x = _multitenant_threads[Thread.this_thread()]) {
        return x->pike_module_path;
      }
      // otherwise return the global environment
      else return _multitenant_threads["default"]->pike_module_path;
    }
My understanding is that the getter/setters operate at a lower level than standard `->(), so it's impossible to avoid them being called, which is desirable in this case.
Of course, this is all in addition to the necessary machinery to register a given configuration with one or more threads.
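The registration machinery mentioned above might look something like the following. This is only a sketch under assumptions: register_environment(), unregister_environment() and current_environment() are hypothetical helper names, the module path is a placeholder, and Pike is assumed to be built with thread support.

```pike
class ResolutionEnvironment {
  array pike_module_path = ({});
  array pike_include_path = ({});
  mapping objects = ([]);
}

// Keyed on thread objects, plus the string "default" for the fallback.
mapping(object|string:ResolutionEnvironment) _multitenant_threads =
  ([ "default": ResolutionEnvironment() ]);

// Hypothetical helper: associate an environment with a thread.
void register_environment(object thread, ResolutionEnvironment env)
{
  _multitenant_threads[thread] = env;
}

// Hypothetical helper: drop the association, e.g. at tenant shutdown.
void unregister_environment(object thread)
{
  m_delete(_multitenant_threads, thread);
}

// Environment for the calling thread, or the default one if none is set.
ResolutionEnvironment current_environment()
{
  return _multitenant_threads[Thread.this_thread()]
    || _multitenant_threads["default"];
}

int main()
{
  ResolutionEnvironment env = ResolutionEnvironment();
  env->pike_module_path = ({ "/path/to/tenant_a/modules" }); // placeholder

  Thread.Thread t = Thread.Thread(lambda() {
    register_environment(Thread.this_thread(), env);
    write("tenant thread: %O\n", current_environment()->pike_module_path);
    unregister_environment(Thread.this_thread());
  });
  t->wait();

  // The main thread was never registered, so it gets the default.
  write("main thread: %O\n", current_environment()->pike_module_path);
  return 0;
}
```

A getter such as the one above could then simply return current_environment()->pike_module_path.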
Bill
On Sun, 12 Feb 2012, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
/.../
It turns out that `->symbol and some other minor magic can be used to solve all of the problems I've been concerned about. I've run some simple tests that show two different threads with independent module/program paths that seem to be (almost) completely isolated from each other. The only code they share with each other is the standard set of static modules (which is an acceptable situation that could also be changed).
I'll need to test this quite a bit, but the initial results are encouraging.
Bill
On Mar 14, 2012, at 1:01 PM, Bill Welliver wrote:
/.../
For those following along, I may have been premature in my declaration of total victory, as the solution only seems to work with pike 7.9 (where it seems to work with very few issues). When used with 7.8, I get lots of errors like so:
/usr/local/pike/7.8.352/lib/modules/Protocols.pmod/HTTP.pmod/module.pmod:161:Got placeholder object when indexing module HTTP with 'Query'. (Resolver problem.)
The problem appears to be the programs and objects mappings in the master. If I replace either, these problems happen. If I use the existing mappings from the current master (moved before using replace_master()), things work properly. If I do a shallow copy, things fail, so I'm inclined to believe that something's holding on to those mappings, perhaps in dirnode().
Note that at this point, the error occurs before trying to use multiple compile environments; so it doesn't seem like it could be a matter of disjoint data for the resolver.
I haven't compared the differences between the two masters, but I know that changes were made to the resolver, correct?
Anyhow care to venture a guess as to the source of the problem?
Bill
On Thu, 15 Mar 2012, H. William Welliver III wrote:
> It turns out that `->symbol and some other minor magic can be used to solve all of the problems I've been concerned about. I've run some simple tests that show two different threads with independent module/program paths that seem to be (almost) completely isolated from each other. The only code they share with each other is the standard set of static modules (which is an acceptable situation that could also be changed).
/.../
> /usr/local/pike/7.8.352/lib/modules/Protocols.pmod/HTTP.pmod/module.pmod:161:Got placeholder object when indexing module HTTP with 'Query'. (Resolver problem.)
> The problem appears to be the programs and objects mappings in the master. If I replace either, these problems happen. If I use the existing mappings from the current master (moved before using replace_master()), things work properly. If I do a shallow copy, things fail, so I'm inclined to believe that something's holding on to those mappings, perhaps in dirnode().
It could also be that the old master is still called in some cases.
> I haven't compared the differences between the two masters, but I know that changes were made to the resolver, correct?
Spontaneously I thought there would be, but there are actually fairly few commits to the master in 7.9 only:
    git log --oneline 7.8..7.9 lib/master.pike.in

    eac07315 Fixed compat resolver fallback order.
    fac36c39 Runtime: Changed backtrace representation for event handlers.
    9eaaf898 Updated copyright.
    c5d9a9a6 Removed $Id$.
    080e3aa0 Added support for dynamic compile-time macros.
    0e22d157 Added RECUR_COMPILE_DEBUG to attempt to help debugging recursive resolver issues.
    91ac5642 Ensure _master_file_name is set even without -m.
    fea47d91 Fixed unbalanced use of INC/DEC_RESOLV_MSG_DEPTH() in dirnode()->low_ind()
    a63ecdbd Instantiate the fallback codecs instead of using the master directly.
    548e838f Improved unregister() to find stuff in joinnodes a bit better.
    3032b456 Deprecating pike.ida.liu.se for pike.lysator.liu.se.
    eb6f0eef master: Restored lost comment.
    1a03d133 Clean up some create():s
    0e26166a Added compatibility mode for Pike 7.8.
    6dd09dad Fixed the codec to handle the Val module values in a good way.
    7729dbc4 Improved master compatibility with Pike 7.6.
    130970d5 Improved describe_function for top level functions in modules.
    e15f7f13 Added callbacks to allow overlaying masters to read precompiled code from other sources.
    9dcabf1e Give dirnodes and joinnodes real names to improve sprintf output.
I haven't dug around, but I see nothing obvious there that could have a bearing on this problem.
> The problem isn't strictly the methods themselves, it's more a matter of making them use the right set of data. Therefore, would it not be just as effective to implement getters/setters on the appropriate datasources:
/.../
That could be a simpler alternative, I guess. I think it'd be slightly slower though.
> My understanding is that the getter/setters operate at a lower level than standard `->(), so it's impossible to avoid them being called, which is desirable in this case.
Yes. That's why I prefer the syntax `foo and `foo=, besides it being shorter. I actually think it's somewhat unfortunate that the `->foo and `->foo= syntax wasn't removed from the start, because it just adds confusion.
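For readers unfamiliar with the shorter form, a minimal sketch of a getter/setter pair written with the `foo and `foo= syntax follows. The class name Settings and the backing variable real_module_path are hypothetical and only serve the illustration.

```pike
class Settings {
  // Backing storage; the accessors below expose it as "module_path".
  protected array(string) real_module_path = ({});

  // Getter: called whenever module_path is read.
  array(string) `module_path()
  {
    return real_module_path;
  }

  // Setter: called whenever module_path is assigned to.
  void `module_path=(array(string) value)
  {
    real_module_path = value;
  }
}

int main()
{
  Settings s = Settings();
  s->module_path = ({ "/some/modules" }); // goes through the setter
  write("%O\n", s->module_path);          // goes through the getter
  return 0;
}
```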
Bill Welliver wrote:
> multi-tenant applications: that is, a single pike process running multiple applications (perhaps including copies of a given application) with isolated program and module spaces. The idea is to provide a similar capability to that provided by Java's ClassLoader API.
Interesting as such, but, what would be the real benefit of this approach as opposed to simply starting multiple instances of Pike? Or is this geared towards an embedded solution where starting another Pike is difficult and/or impossible?
Hi Stephen,
Welcome back.
Good question. Obviously, starting a new process is always going to be the most flexible and straightforward approach, and it's the one I've been using for a while now. However, once you have 5-10 pike instances running in a resource-constrained environment, elbow room becomes a problem: the overhead of each running pike instance starts to matter.
So, benefits might include:
- Minimize overhead of each additional pike interpreter
- Enable easier solutions to embedded problems (as you suggest)
- Enable multiple applications to share a single port without having the (perhaps simply different) complexity of running a reverse proxy
There may be other "benefits" that I'm not thinking of, but those are the major benefits for me.
Best,
Bill
On Wed, 28 Mar 2012, Stephen R. van den Berg wrote:
/.../
pike-devel@lists.lysator.liu.se