Hello,
I thought I would make these two toys available for your perusal, even though they are hardly finished. If by some chance they survive scrutiny by thoughtful Pike developers (and maybe gain some improvements or bug fixes as a result), I may follow up with a more "public" announcement (freshmeat maybe), produce some usage samples, test cases, ... (and profit ;-)

Brief descriptions follow. In addition, the sources include some AutoDoc markup, and before setting up a dedicated webpage I hope to figure out how to extract and format API docs from standalone .pmod files.
* PiXPull.pmod aims to be a Pike implementation of an API not unlike the one described and advocated at http://xmlpull.org/ (which is designed for Java). In brief, it is a streaming XML parser based on the "pull" parsing model rather than the more usual event-callback scheme (SAX, etc.). Currently it neither validates nor parses DOCTYPE declarations (it merely skips them), and it is not fully compliant with the XML spec in detecting violations of well-formedness (many checks that are too expensive to do in Pike are missing). XML Namespaces are not implemented (support is optional in the XMLPull API), but shouldn't be too hard to add.

  Other bugs/omissions:
  - input encoding detection isn't quite there (but at least utf-8, utf-16 and the 8-bit encodings supported by Pike should work)
  - attribute value normalization is buggy
  - error reporting needs improvement

  It does however seem to work OK for just about all the XML input files I tried ;-) Being written entirely in Pike, PiXPull is not lightning-fast... but it does not seem to be significantly slower than pure Java implementations (though it is of course no match for compiled native code...)
* PiXTree.pmod is a rough sketch of an interface for generating XML output. If PiXPull is pre-alpha, this is just an experimental prototype. Your feedback (if any ;-) will tell me whether this is a good idea at all. Basically, you can do something like:
  import .PiXTree;
  ...
  Element html = Element("html");
  html( Element("head")(Element("title")("This is the title.")));
  Element body = Element("body")(Element("h1")("The Title."));
  html << Comment("The <body> starts below.");
  html << "\n";
  body << Element("p")("Here is some text in a paragraph.");
  write( html(body)->toString() ); // defaults to utf-8 encoding

  // or a number of different permutations ;-)
  write( Doc(html)->toString("iso8859-1") );
  // ..to produce an xml header and use the provided encoding;
  // note the body has already been attached to html,
  // you don't want to do it again.
The two modules can temporarily be found at:

  http://bobo.fuw.edu.pl/~rjb/STUFF/PiXPull.pmod
  http://bobo.fuw.edu.pl/~rjb/STUFF/PiXTree.pmod
Enjoy! (or otherwise ;-)