Regarding the new String.secure feature, what do you think about censoring such strings in sprintf("%O") output? That'd make it very useful to avoid password leakage in backtraces etc. All other operations on it - especially sprintf("%q") - would continue to work as usual.
We discussed this at Per's (after you left, probably). We all agreed that it was a good idea, but there are some problems that need solving.
* Censoring the string should not be based on the flag in the string object. If it were, having a String.secure("hej") would also prevent non-secure "hej"s from being displayed.
* Therefore censoring would have to be based on a flag in the subtype field of the svalue, also set by String.secure(). However, subtypes are lost when the string is stored in a short svalue (such as a class variable).
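The two points above can be illustrated with a rough sketch. It is written in Python rather than Pike, purely as a toy model: SharedString, Ref, pool_intern and the describe_* helpers are made-up names standing in for the shared string object and the svalue, not Pike's real internals.

```python
# Toy model only: one shared object per distinct string content,
# plus per-reference wrappers that play the role of svalues.

class SharedString:
    def __init__(self, text):
        self.text = text
        self.secure = False  # flag stored on the shared string object


_pool = {}


def pool_intern(text):
    """Return the single shared object for this content."""
    if text not in _pool:
        _pool[text] = SharedString(text)
    return _pool[text]


class Ref:
    """Stands in for an svalue: a reference plus its own flag field."""

    def __init__(self, text, secure=False):
        self.target = pool_intern(text)
        self.secure = secure  # flag stored per reference


def describe_by_object_flag(ref):
    # Censoring decided by the flag on the shared object.
    return "CENSORED" if ref.target.secure else repr(ref.target.text)


def describe_by_ref_flag(ref):
    # Censoring decided by the flag carried by the reference itself.
    return "CENSORED" if ref.secure else repr(ref.target.text)


password = Ref("hej", secure=True)
password.target.secure = True  # what marking the string object amounts to
greeting = Ref("hej")          # unrelated, non-secure use of the same text

print(describe_by_object_flag(greeting))  # the innocent "hej" is hidden too
print(describe_by_ref_flag(greeting))     # only flagged references are hidden
print(describe_by_ref_flag(password))
```

The per-reference flag only helps as long as the wrapper survives, which is exactly what the short-svalue problem breaks.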
Aha, ok. Then the only option afaics is to add a string-lookalike class, e.g. String.Secure.
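A minimal sketch of what such a lookalike class could do, again in Python as a stand-in for Pike. SecureString and login are hypothetical names; repr() plays the role of sprintf("%O") and str() the role of an explicit request for the value.

```python
class SecureString:
    """Hypothetical string lookalike: debug output is censored."""

    def __init__(self, text):
        self._text = text

    def __repr__(self):
        # Used by debug formatting and backtrace-style output.
        return "SecureString(CENSORED)"

    def __str__(self):
        # Explicit conversion still yields the real value.
        return self._text

    def __len__(self):
        return len(self._text)

    def __eq__(self, other):
        return str(self) == str(other)

    def __hash__(self):
        return hash(self._text)


def login(user, password):
    # Simulates a failure whose message formats its arguments for debugging.
    raise RuntimeError("login failed for %r with %r" % (user, password))


try:
    login("alice", SecureString("hunter2"))
except RuntimeError as err:
    message = str(err)

print(message)
```

Because the censoring lives in the object's type rather than in a flag beside the reference, it is not lost when the value is stored in a variable.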
is there any advantage to not sharing them (with normal strings)? i am thinking of features like making sure that secure strings are never written to swap (don't know if that is even possible).
greetings, martin.
Another thing that could be a problem is that it's possible to guess passwords by generating candidate strings and seeing whether you get more than your own refs to them.
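The guessing trick can be sketched with a simulated pool. This is Python and a toy model: StringPool and someone_else_holds are invented for illustration and only mimic the idea of a reference count being visible on a shared string.

```python
class StringPool:
    """Toy shared string pool that tracks a reference count per content."""

    def __init__(self):
        self._refs = {}

    def acquire(self, text):
        self._refs[text] = self._refs.get(text, 0) + 1
        return text

    def release(self, text):
        self._refs[text] -= 1
        if self._refs[text] == 0:
            del self._refs[text]

    def refs(self, text):
        return self._refs.get(text, 0)


pool = StringPool()
pool.acquire("hunter2")  # somebody else's password, held elsewhere


def someone_else_holds(candidate):
    """Create our own reference and check whether it is the only one."""
    pool.acquire(candidate)
    try:
        return pool.refs(candidate) > 1
    finally:
        pool.release(candidate)


guesses = ["letmein", "hunter2", "qwerty"]
leaked = [g for g in guesses if someone_else_holds(g)]
print(leaked)
```

A hit only says that the string is in use somewhere, not that it is a password, but it narrows the search considerably.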
That can be disabled with the pike security system, at least. Not that I know of anyone who has ever used it, though..
It's possible, but I think it'd be unfortunate to do that. _refs is useful for debugging and even in real code occasionally. I think it'd be better to have a String.Secure class which simply doesn't use a shared string for storage.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
> It's possible, but I think it'd be unfortunate to do that. _refs is useful for debugging and even in real code occasionally. I think it'd be better to have a String.Secure class which simply doesn't use a shared string for storage.
Yes, and preferably using some kind of simple garbling mechanism to copy the string out of hiding into the normal space on a need-to-know basis.
I.e. use a simple pseudo-random number generator with a salt that is determined at runtime (when the class is instantiated) to encrypt the string, then store it in non-shared storage. Then when the string is needed, decrypt at runtime and copy to a normal string.
The only thing you can't ensure this way is that, once the string is garbage collected, it is actually wiped from memory. Not quite sure how to solve that. Or maybe we can: keep a reference to the string inside the class, and once the cleartext version of the string is no longer needed, call String.Secure.wipe or something similar. That call would throw an error if the string has more than one remaining reference; otherwise it deletes the string in a special way, overwriting the physical memory contents with zeroes before releasing it to the garbage collector.
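A rough sketch of the garble-and-wipe idea, in Python as a stand-in. GarbledSecret, reveal and wipe are hypothetical names. The XOR keystream from a seeded PRNG is obfuscation, not real encryption, and the sketch does not attempt the reference-count check: the cleartext returned by reveal() is an ordinary immutable string that this code cannot zero.

```python
import random
import secrets


class GarbledSecret:
    """Keeps the text XOR-garbled in private, mutable storage."""

    def __init__(self, text):
        data = text.encode("utf-8")
        # Salt chosen at runtime, when the object is created.
        self._salt = secrets.randbits(64)
        self._garbled = bytearray(self._xor(data))
        self._wiped = False

    def _xor(self, data):
        # Same salt gives the same keystream, so XOR is its own inverse.
        rng = random.Random(self._salt)
        return bytes(b ^ rng.randrange(256) for b in data)

    def reveal(self):
        """Copy the cleartext out into a normal string on request."""
        if self._wiped:
            raise ValueError("secret has been wiped")
        return self._xor(self._garbled).decode("utf-8")

    def wipe(self):
        """Overwrite the private buffer in place before dropping it."""
        for i in range(len(self._garbled)):
            self._garbled[i] = 0
        self._salt = 0
        self._wiped = True


secret = GarbledSecret("hej")
clear = secret.reveal()
secret.wipe()
```

Wiping the cleartext copy as well would need support from the runtime itself, which is the part the proposal above leaves open.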
If you have a secure string "hej" which is not written to swap, and a non-secure string "hej" which is not shared, then "hej" may still be written to swap from the non-secure string. So no, not sharing does not give any advantage in that respect.
just because it is possible to have a string both in a secured and non-secured version does not negate the advantage of not having secured strings in swap.
as long as it is not possible to relate a non-secure string to a secure one this should not be an issue. at least it is no worse than two unrelated people choosing the same password. knowing about one password does nothing to reveal the other one.
greetings, martin.
Martin Baehr wrote:
> just because it is possible to have a string both in a secured and non-secured version does not negate the advantage of not having secured strings in swap.
> as long as it is not possible to relate a non-secure string to a secure one this should not be an issue. at least it is no worse than two unrelated people choosing the same password. knowing about one password does nothing to reveal the other one.
Well, if I needed to find a password, and I know it exists as a shared string, then dumping all strings (either by enumerating them or by dumping the whole core of the program and then fishing out all the strings), and then using all the found strings to launch an attack is likely to succeed in a shorter time than doing a full brute-force attack, and therefore is weaker than not having the string visible at all.
but how do you know that it exists as a shared string?
of course if you know that then all is lost, but gaining that knowledge has nothing to do with whether secure and non-secured strings are shared or not. at least i can't see how either way here could make any difference.
in a multiuser environment it must not be possible to detect that a string is secured. if i write random strings out with %O and then i find some that are not printed then i can guess passwords that way. i don't even need access to pike but just a "helpful" debugging api that will use "%O" somewhere to print my own input (in a webinterface for example) could allow me to do some password guessing if this %O attack works.
watching reference counts could also be a problem. if the count rises more than i'd expect then i could guess that someone else is using that string. it won't tell me if it is used as a password, though. i guess in a multiuser environment the access to reference counters should generally be restricted or turned off, as it would be detectable either way because any secured string will at some point be non-secured (when it enters the system before it is secured).
i am not sure if any of these are real problems and if they are possible/worth fixing, but if they actually are problems at least there should be a big warning associated with using them.
greetings, martin.
You miss the point. If anything, preventing secure strings from being written to swap works _better_ if the secure and non-secure strings are shared. I didn't say there was no gain from not swapping secured strings, I said there was no gain from not _sharing_ them. At least not in this respect.
yes, i did indeed miss that. thank you for pointing it out (i am really just making a lot of assumptions here in the hope that they will either be corrected or acknowledged so that i can learn something in the process).
could you explain how that works?
greetings, martin.
The way (shared) secure strings work now is that when "hej" is turned into a secured string, the string object "hej" in the shared string pool gets marked as "secure". This object is referenced by all strings "hej", secure or not. The flag remains on the string object until all references to "hej" are gone and the string object is freed (at which time the actual string is zeroed over; this is the "secure" feature).
The fact that also non-secure strings can refer to the same string object means that the string can remain longer in memory (before being zeroed over) than if it had only secure references to it, but actually it makes no difference. Consider the case that there is one non-secure reference to "hej", and one secure, which has just been removed.
Case 1, shared secure and non-secure strings: The secure shared string "hej" remains in memory because it still has a (non-secure) reference. Total counts of "hej" in the process space: 1.
Case 2, non-shared secure and non-secure strings: The secure string "hej" has been zeroed over because it has no more references. The non-secure string "hej" still remains because it has a reference. Total counts of "hej" in the process space: 1.
So it is clearly the case that not sharing the strings does not reduce the visibility of the secret string in the process space. In the same way, if making a string secure would also move it to non-pageable memory (which might be good), making the string not shared would not reduce the visibility of the secret string in the swap partition. It would rather be the other way around.
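The two cases can be restated as a small counting model. This is a Python sketch with an invented helper, copies_in_memory, and it assumes a buffer is freed (and zeroed) exactly when its last reference goes away.

```python
def copies_in_memory(shared, secure_refs, plain_refs):
    """Count live buffers holding the text in this toy model."""
    if shared:
        # One buffer serves every reference, secure or not.
        return 1 if secure_refs + plain_refs > 0 else 0
    # Separate buffers for the secure and the non-secure copy.
    return (1 if secure_refs > 0 else 0) + (1 if plain_refs > 0 else 0)


# The situation described above: the secure reference has just been
# removed, one non-secure reference remains.
case1 = copies_in_memory(shared=True, secure_refs=0, plain_refs=1)
case2 = copies_in_memory(shared=False, secure_refs=0, plain_refs=1)

# While both references are still alive:
both_shared = copies_in_memory(shared=True, secure_refs=1, plain_refs=1)
both_split = copies_in_memory(shared=False, secure_refs=1, plain_refs=1)

print(case1, case2, both_shared, both_split)
```

In this model not sharing never lowers the count, and while both references are alive it raises it, which is the "other way around" effect.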
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
> The way (shared) secure strings work now is that when "hej" is turned into a secured string, the string object "hej" in the shared string pool gets marked as "secure". This object is referenced by all strings "hej", secure or not. The flag remains on the string object until all references to "hej" are gone and the string object is freed (at which time the actual string is zeroed over; this is the "secure" feature).
I see. Ok, that is useful in its own right. The only problem that arises here is that if you are (ab)using this marker to suppress casual display of the string, then if this string just happens to be the same as some other (unsecured) string, then the printing of the unsecured string is going to be prohibited as well (all of a sudden), which is confusing at best.
sorry, i reread and now i understand what you meant. i disagree with that.
preventing a string that is not secured from being written somewhere because the same string is secured elsewhere makes it possible to detect the existence of a secured string. that is what i see as a danger and would like to avoid.
separating secured and non-secured strings should (i hope) prevent that. and to reiterate, my point is that _if_ a secured and a non-secured string are unrelated then having one version of it in swap (or written out with %O) can't do any harm. but that is a big if, since for any secured string there will always be a non-secured copy at least for some time until the string is being secured (and the non-secured version dereferenced).
greetings, martin.
> preventing a string that is not secured from being written somewhere because the same string is secured elsewhere makes it possible to detect the existence of a secured string. that is what i see as a danger and would like to avoid.
I think you are confusing different features now. How would the owner of a non-secure reference to the string be able to detect if the string is prevented from being written to swap? You probably think about the (suggested) hiding of the string in sprintf("%O"). And it was precisely because of the danger you talk about that I said that this feature can not be implemented with the current secure bit, but would need something added to the svalue instead (text 16616268).
> and to reiterate, my point is that _if_ a secured and a non-secured string are unrelated then having one version of it in swap (or written out with %O) can't do any harm.
And my point is that, as far as swap-lock (or zeroing) is concerned, having them non-shared will not do any good either.
For the %O issue, having non-shared strings could be a solution, but there are also other possible solutions, which do not break the ubiquitous assumption that strings are shared.
Btw, the fact that secure strings start out as non-secure is not a problem for the current secure feature (which only kicks in after the string has lost all references), but it would indeed be a potential problem for a swap-lock feature.
On the other hand, the existence of both a secure and a non-secure string with the same contents is clear evidence that the security model has been broken.
Shouldn't the correct response to this evidence be to alert someone, by throwing an error or printing a stern warning?
It doesn't matter much why the model is broken. It may be because the random seed turned out not to be so random. Or because the secret key was stored in a non-secure string first. Or because the user chose a password that is far too easy to guess (by a computer if not a human).
It seems that just silently ensuring that the contents is not swapped today is rarely enough to solve the problem. Or perhaps the secure strings will be used for purposes where the do-not-swap feature is not that important?
There is no requirement about secret strings being random. In fact, I predict that they will be clear text passwords most of the time.
The do-not-swap feature is something I've been bringing up now and again, but it really isn't the main feature of secret strings. Neither is hiding secret strings from the rest of the Pike process. The absolute main usecase is making sure they are not printed in a backtrace on a webpage somewhere. Everything else is possible bonuses.
You are certainly correct. The greatest need is obviously for do-not-print-carelessly strings.
However, it would still be neat with important-crypto-stuff strings. And I think the correct behavior for such strings is to throw an error if they turn out to have the same contents as other strings.
That's not secure, it's just crash-randomly-and-unexpectedly-in-production because we don't like your way of using the secure string.
Or you could find out that the random number generator is broken now, instead of when someone does a code inspection two years later.
Strings containing cryptographic keys must be random or secret for the security model to work. If your production environment relies on cryptography to protect something, it is correct behavior to shut it down on clear evidence that the keys are broken.
On Thu, Jul 03, 2008 at 08:45:02AM +0000, Mattias Wingstedt (Firefruit) @ Pike (-) developers forum wrote:
> On the other hand, the existence of both a secure and a non-secure string with the same contents is clear evidence that the security model has been broken.
that is only true if both strings come from the same source. in a multiuser environment this is not necessarily the case.
greetings, martin.
A secure string, containing a cryptographic key, that has the same contents as a non-secure string, is broken no matter if it is a single or multiuser system. A cryptographic key that can so easily be recreated by another user is broken. Either the random number generator, or the user choosing a pass phrase, created a far too weak key.
Shouldn't a good random number generator create any string (of the appropriate length), including "Mattias Wingstedt (Firefruit)", with equal probability?
I can easily recreate the string "P°\221R\1v9V\36ÀÈ<\f\227å\a" (I just did). Does this mean that a random generator should never return this string? Should we create a blacklist with strings we can recreate, and check random numbers against it?
I actually do believe that a good random number generator for serious key lengths should never generate a collision. Thus we could well test it by creating such a blacklist. 2^128 gives enough combinations so that we should expect each random number to be unique.
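To put a number on "enough combinations", here is a small Python calculation of the usual birthday-style upper bound. It assumes keys drawn uniformly and independently; collision_bound is an illustrative helper, and the figure of 2^32 generated keys is just an example.

```python
from fractions import Fraction


def collision_bound(num_keys, key_bits):
    """Union bound on the chance that any two of num_keys keys are equal.

    Each pair collides with probability 2**-key_bits, and there are
    num_keys * (num_keys - 1) / 2 pairs.
    """
    pairs = num_keys * (num_keys - 1) // 2
    return Fraction(pairs, 2 ** key_bits)


# Example: about four billion 128-bit keys.
p = collision_bound(2 ** 32, 128)
print(float(p))
```

For these example numbers the bound is below 2^-65, so a repeat is not impossible but would be far more plausibly explained by a faulty generator than by chance.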
Don't all the new cool distributed version control systems also rely on that assumption? (I guess you can look at a cryptographic hash function as a special sort of random number generator.)
Not on being unable to encounter the same hash in any random other string, no - only on not having collisions in output of the function that generates a hash from an input string, from different inputs.
In your interpretation, security seems deemed broken as soon as the system encounters the same string, by any means, which to me seems more like an attack vector in itself -- if one can inject things that will make critical parts throw an exception, should your injected code trip on a secret key, I don't see how that would be a good thing.
If you can force the system to generate the same random key twice, by injecting some data somewhere, then it is indeed a very broken system.
If the same secret key is stored both as a secured object and as a non-secure object then either the system itself is broken, the key is too weak or the user has been too careless with her handling of the key.
If the same secret key is used in several secured objects then it is probably only the same user multitasking.
If an attacker can find out the random or secret key then the least of our problems is that she can use that knowledge to perform a new novel kind of denial-of-service attack against other users of that very key.
Ah. I think I misunderstood you. As I read you, the string itself (any random string data of a specific length known by the pike process as a string), however produced, would be used to crash and burn on the generation of a [crypto-secure-object-by-whichever-name] whenever any kind of overlap of the two sets existed.
Now I think you are talking about some specific random-key-generation- function to crash and burn on generating the same output twice (when it manages to detect that having happened). The latter sounds like a potentially useful trait. The former did not.
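The latter reading could look roughly like this. It is a Python sketch, not an existing Pike facility: CheckedKeyGenerator and RepeatedKeyError are hypothetical names, and the choice to remember SHA-256 digests of issued keys rather than the keys themselves is an assumption made for the example.

```python
import hashlib
import secrets


class RepeatedKeyError(RuntimeError):
    """Raised when the random source hands out a key it already produced."""


class CheckedKeyGenerator:
    def __init__(self, source=secrets.token_bytes):
        self._source = source  # callable taking a byte count
        self._seen = set()     # digests of keys issued so far

    def new_key(self, nbytes=16):
        key = self._source(nbytes)
        digest = hashlib.sha256(key).digest()
        if digest in self._seen:
            raise RepeatedKeyError("random source repeated an earlier key")
        self._seen.add(digest)
        return key


good = CheckedKeyGenerator()
first_key = good.new_key()

# A deliberately broken source that always returns the same bytes.
broken = CheckedKeyGenerator(source=lambda n: b"\x00" * n)
broken.new_key()
try:
    broken.new_key()
    detected = False
except RepeatedKeyError:
    detected = True
```

Only the generator's own output is compared, so unrelated strings elsewhere in the process cannot trigger the error.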
yes, but that makes the string or the application that allowed this particular string to be created broken, but it does not follow that any application where any one string happens to exist in a secure or non-secure version is broken.
i expect that this secure feature will be used for much less critical things that i don't want to show up in logs. usernames, urls, in fact any user data is a candidate for something that should not show up in logs.
this leads to the suggestion to rename the whole concept from "secure string" to "private string" or "hidden string". there is not much secure about secure strings, and i think secure is not a good term.
private could (at least from the terminology) clash with the current private keyword, and hidden nicely relates to the resulting feature of "hiding" the string from %O output.
greetings, martin.
Even if that idea had any merit (which I don't think it does), it has no use in combination with secure strings. It would in fact make applications much less secure. Let me exemplify:
Take a webserver, let's call it Roxen. This Roxen loads user-creatable scripts and modules. The author of the htaccess-module decides that storing the user:crypt(passwd) pairs in secure strings would be good to avoid printing them in a backtrace by mistake. This works fine for a while until a user reports that his Roxen no longer works because the htaccess module keeps crashing.
It turns out this is because this user loads another module that parses some sort of passwd file where one of the entries is a match for one of the entries the htpasswd module has loaded, but without the secure string.
There are two ways of resolving this situation, but the only reasonable one - and the one that will be used - is to remove the secure bit from htaccess to avoid other random breakage.
I think you assumed that I'm advocating that do-not-print-carelessly strings must implement all properties that are important for important-crypto-stuff strings. I'm not. Clearly do-not-print-carelessly strings are important, and they cannot behave as important-crypto-stuff strings should.
However, I still think that an important-crypto-stuff string type would be cool and useful.
Obviously the important-crypto-stuff string type is neither necessary nor useful if there are already other objects that are capable of handling the cryptographic keys and seeds, and thus no reason to store them in a string.
pike-devel@lists.lysator.liu.se