wish: string with other quoting than \

21 Sep 2003

Here are the docstrings for my set of regexp operators. I doubt much
else of my code would be of any use here.
//! @decl RxNode any();
  //!
  //! Matches any symbol.
//! @decl RxNode seq (LaxRxType... regexps);
  //!
  //! A sequence. If an array is used as a sub-regexp it's converted to
  //! this.
//! @decl RxNode seq_or (LaxRxType... regexps);
  //!
  //! Like @[Rx.or], but keeps the order between the sub-regexps, so
  //! that if two or more of them match the same input, it's always
  //! the match in the first one that's returned first. If all
  //! alternative matches are requested, they're enumerated in the order
  //! that the sub-regexps match.
  //!
  //! @note
  //! This is the union variant that most closely resembles the "|"
  //! operator in most other regexp engines. However, in some cases it
  //! cannot determinize as well as @[Rx.or], so if the order isn't
  //! relevant, use that one instead.
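The ordering guarantee of `seq_or` can be illustrated with a minimal Python model. This is not the Rx API: `lit`, `seq_or` and `first_match` are hypothetical names, a matcher is modeled as a function from an input and a start position to an iterator of end positions (in the order the matches are returned), and only prefix matches at position 0 are considered.

```python
from typing import Callable, Iterator, Optional

# Hypothetical model: a matcher maps (text, pos) to an iterator of end
# positions, enumerated in the order the matches are returned.
Matcher = Callable[[str, int], Iterator[int]]


def lit(word: str) -> Matcher:
    """Matches the literal word at pos."""
    def m(text: str, pos: int) -> Iterator[int]:
        if text.startswith(word, pos):
            yield pos + len(word)
    return m


def seq_or(*alts: Matcher) -> Matcher:
    """Ordered union: every match of an earlier alternative is returned
    before any match of a later one."""
    def m(text: str, pos: int) -> Iterator[int]:
        for alt in alts:
            yield from alt(text, pos)
    return m


def first_match(m: Matcher, text: str) -> Optional[str]:
    """Returns the first prefix of text matched by m, or None."""
    for end in m(text, 0):
        return text[:end]
    return None


print(first_match(seq_or(lit("a"), lit("ab")), "abc"))  # a
print(first_match(seq_or(lit("ab"), lit("a")), "abc"))  # ab
print(list(seq_or(lit("a"), lit("ab"))("abc", 0)))      # [1, 2]
```

Both alternatives match "abc" here; only the argument order decides which match comes first.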
//! @decl RxNode range (Symbol from, Symbol to);
  //!
  //! A range of all symbols between @[from] and @[to], inclusive.
//! @decl RxNode rep (LaxRxType regexp, void|int low, void|int high);
  //!
  //! Repetition, which can be upwardly bounded or unbounded. (In the
  //! unbounded forms this includes "Kleene star" and "Kleene plus".)
  //! The given regexp must match at least @[low] and at most @[high]
  //! times. @[low] defaults to zero. There's no upper bound if @[high]
  //! is left out or is negative. If @[high] isn't negative but less
  //! than @[low], this matches nothing.
  //!
  //! @note
  //! The first returned match is the longest possible one. Therefore
  //! this operator is "greedy". There's also a non-greedy variant
  //! @[Rx.lrep].
  //!
  //! Actually the above is not entirely correct; the first returned
  //! match is really the first match of @[regexp], repeated as many
  //! times as possible.
  //!
  //! For example, if @[regexp] matches @tt{"aa"@} and @tt{"a"@} in that
  //! order, then the first match on @tt{"aaa"@} will have two
  //! repetitions where @[regexp] matched @tt{"aa"@} and then @tt{"a"@},
  //! and not three repetitions where each matched @tt{"a"@}.
  //!
  //! On the other hand, if @[regexp] is lazy and matches @tt{"a"@} before
  //! @tt{"aa"@}, and if the repetition is upwardly bounded to two
  //! repetitions, then the first match on @tt{"aaa"@} will be two
  //! repetitions where each matched @tt{"a"@}. I.e. the first match is
  //! not the longest possible one.
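The match order described in the note above can be reproduced with a small Python model of greedy repetition. The names `choice` and `rep` are hypothetical stand-ins, not the Rx API. Matches are prefixes starting at position 0, each match is reported as the list of pieces the sub-regexp matched, and empty sub-matches are skipped (a simplification of this sketch) so the recursion terminates.

```python
from typing import Callable, Iterator, List

Matcher = Callable[[str, int], Iterator[int]]


def choice(*words: str) -> Matcher:
    """Stand-in sub-regexp: tries the literal words in the given order."""
    def m(text: str, pos: int) -> Iterator[int]:
        for w in words:
            if text.startswith(w, pos):
                yield pos + len(w)
    return m


def rep(r: Matcher, low: int = 0, high: int = -1):
    """Greedy repetition. Yields the list of pieces matched by each
    repetition, starting at position 0, in match order."""
    def go(text: str, pos: int, count: int) -> Iterator[List[str]]:
        # First try one more repetition, for each match of r in r's order...
        if high < 0 or count < high:
            for end in r(text, pos):
                if end == pos:
                    continue  # ignore empty matches so the recursion ends
                for rest in go(text, end, count + 1):
                    yield [text[pos:end]] + rest
        # ...and only then fall back to stopping here.
        if count >= low:
            yield []
    return lambda text: go(text, 0, 0)


# r matches "aa" before "a": the first match is ['aa', 'a'], not three 'a'.
print(next(rep(choice("aa", "a"))("aaa")))
# r matches "a" before "aa", at most two repetitions: ['a', 'a'].
print(next(rep(choice("a", "aa"), 0, 2)("aaa")))
# high is non-negative but below low: no matches at all.
print(list(rep(choice("a"), 3, 2)("aaa")))
```

The first match follows the first match of the sub-regexp and then repeats as far as possible, which is why it isn't always the longest one.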
//! @decl RxNode lrep (LaxRxType regexp, void|int low, void|int high);
  //!
  //! Like @[Rx.rep], but implements laziness: The first returned match
  //! repeats the regexp as few times as possible within the limits,
  //! whereas @[Rx.rep] repeats it as many times as possible.
  //!
  //! @note
  //! The first returned match is actually the first match of @[regexp],
  //! repeated as few times as possible.
  //!
  //! For example, if @[regexp] matches @tt{"a"@} and @tt{"aa"@} in that
  //! order, then the first match on @tt{"aaa"@} will have three
  //! repetitions where each @[regexp] matched @tt{"a"@}, and not two
  //! repetitions where one of them matched @tt{"aa"@}.
  //!
  //! On the other hand, if @[regexp] is greedy and matches @tt{"aa"@} before
  //! @tt{"a"@}, and if the repetition must match at least once, then
  //! the first match on @tt{"aaa"@} will be one repetition where
  //! @[regexp] matched @tt{"aa"@} and not @tt{"a"@}. I.e. the first
  //! match is not the shortest possible one.
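The lazy variant can be modeled the same way by trying "stop here" before "one more repetition". As before, `choice`, `lrep` and `first` are hypothetical names, not the Rx API. The docstrings don't say whether a match has to cover the whole input. In this model the first example above only comes out as described if it does (otherwise zero repetitions would come first), while the second example assumes a prefix match, so the helper `first` takes a `whole` flag.

```python
from typing import Callable, Iterator, List, Optional

Matcher = Callable[[str, int], Iterator[int]]


def choice(*words: str) -> Matcher:
    """Stand-in sub-regexp: tries the literal words in the given order."""
    def m(text: str, pos: int) -> Iterator[int]:
        for w in words:
            if text.startswith(w, pos):
                yield pos + len(w)
    return m


def lrep(r: Matcher, low: int = 0, high: int = -1):
    """Lazy repetition: stopping is tried before one more repetition."""
    def go(text: str, pos: int, count: int) -> Iterator[List[str]]:
        if count >= low:
            yield []  # fewest repetitions first
        if high < 0 or count < high:
            for end in r(text, pos):
                if end == pos:
                    continue  # ignore empty matches so the recursion ends
                for rest in go(text, end, count + 1):
                    yield [text[pos:end]] + rest
    return lambda text: go(text, 0, 0)


def first(rx, text: str, whole: bool = False) -> Optional[List[str]]:
    """First match of rx; with whole=True only matches covering all of
    text are accepted."""
    for pieces in rx(text):
        if not whole or sum(map(len, pieces)) == len(text):
            return pieces
    return None


# r matches "a" before "aa", whole input: three repetitions of 'a'.
print(first(lrep(choice("a", "aa")), "aaa", whole=True))
# r matches "aa" before "a", at least once, prefix: one repetition 'aa'.
print(first(lrep(choice("aa", "a"), 1), "aaa"))
```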
//! @decl RxNode opt (LaxRxType regexp);
  //!
  //! Match the regexp optionally, i.e. like
  //! @tt{@[Rx.rep] (@[regexp], 0, 1)@}.
  //!
  //! @note
  //! In the case where it's possible to both match the regexp and not
  //! match it, the first returned match will be with the regexp. I.e.
  //! this operator is "greedy" just like @[Rx.rep]. There's also a
  //! non-greedy variant @[Rx.lopt].
//! @decl RxNode lopt (LaxRxType regexp);
  //!
  //! Match the regexp optionally and lazily, i.e. like
  //! @tt{@[Rx.lrep] (@[regexp], 0, 1)@}. So whenever it's possible to
  //! not match the regexp, the first returned match won't match it.
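The only difference between `opt` and `lopt` is the order of the two candidate matches. A minimal Python model (hypothetical names, using the same end-position iterators as a stand-in for Rx matches):

```python
from typing import Callable, Iterator

Matcher = Callable[[str, int], Iterator[int]]


def lit(word: str) -> Matcher:
    """Matches the literal word at pos."""
    def m(text: str, pos: int) -> Iterator[int]:
        if text.startswith(word, pos):
            yield pos + len(word)
    return m


def opt(r: Matcher) -> Matcher:
    """Greedy optional, like rep(r, 0, 1): matching r is tried first."""
    def m(text: str, pos: int) -> Iterator[int]:
        yield from r(text, pos)
        yield pos
    return m


def lopt(r: Matcher) -> Matcher:
    """Lazy optional, like lrep(r, 0, 1): skipping r is tried first."""
    def m(text: str, pos: int) -> Iterator[int]:
        yield pos
        yield from r(text, pos)
    return m


print(list(opt(lit("ab"))("abc", 0)))   # [2, 0]
print(list(lopt(lit("ab"))("abc", 0)))  # [0, 2]
```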
//! @decl RxNode str (string literal);
  //!
  //! A literal string. If a string is used as a sub-regexp, it's
  //! converted to this. Technically this is a syntax parser that treats
  //! its whole input as a literal.
//! @decl RxNode set_str (string chars);
  //!
  //! A set of symbols parsed from a string.
//! @decl RxNode save (LaxRxType regexp, void|string name);
  //!
  //! Saves the match of @[regexp] for later retrieval. If @[name] is
  //! given, it's used as a name to identify the saved submatch,
  //! otherwise it's accessed by position.
  //!
  //! The position is determined by counting the start of each unnamed
  //! submatch as it is encountered from left to right, beginning at
  //! zero. Note that this might not be well defined if e.g. a multiset
  //! @tt{(< >)@} or a mapping @tt{([ ])@} is used to build the regexp
  //! tree, since their elements aren't kept in the order they're written.
  //!
  //! If @[regexp] matches several times (typically when used inside a
  //! repetition) every match overwrites the preceding one, so only the
  //! last match is available afterwards.
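The "last match wins" behavior is the same as for a capturing group inside a repetition in PCRE. Python's `re` module behaves the same way and is used here only because it's easy to run. Note that `re` numbers groups from 1, whereas the positional numbering described above starts at zero.

```python
import re

# A group inside a repetition: each iteration overwrites the previous
# capture, so only the last word survives.
m = re.fullmatch(r"(?:([a-z]+),?)+", "ab,cd,ef")
print(m.group(1))  # ef

# A named group behaves the same way.
m = re.fullmatch(r"(?:(?P<word>[a-z]+),?)+", "ab,cd,ef")
print(m.group("word"))  # ef
```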
//! @decl RxNode saveall (LaxRxType regexp, void|string name);
  //!
  //! Like @[Rx.save], but if @[regexp] matches several times (typically
  //! when used inside a repetition) then all those matches are saved.
  //! The saved value is an array of the matches, in the order they are
  //! found.
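Plain PCRE-style capturing has no direct equivalent of `saveall`, so glue code built on it would need a workaround. One possible approach, sketched with Python's `re` (the helper `save_all_items` and the comma-separated format are hypothetical), is to validate the whole input first and then rescan it for the repeated part. This only works when the repeated part can be found again unambiguously, as it can here because the items are separated by commas.

```python
import re
from typing import List, Optional

ITEM = r"[a-z]+"
LIST = rf"{ITEM}(?:,{ITEM})*"


def save_all_items(text: str) -> Optional[List[str]]:
    """Returns every item matched by the repetition, in input order,
    or None if text isn't a comma-separated list of items."""
    if re.fullmatch(LIST, text) is None:
        return None
    return re.findall(ITEM, text)


print(save_all_items("ab,cd,ef"))  # ['ab', 'cd', 'ef']
print(save_all_items("ab,,cd"))    # None
```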
To put the operators above in some perspective, here are the others
that I think would be a bit difficult to include in the pcre glue:
//! @decl RxNode sym (Symbol... symbols);
  //!
  //! A sequence of symbols. The difference from @[Rx.seq] is that the
  //! elements are treated as literal symbols and not regexps. This is
  //! only necessary when the symbols are of a type that otherwise would
  //! be interpreted as something else, e.g. strings.
//! @decl RxNode pair (Symbol from, Symbol to);
  //!
  //! The pair @tt{@[from]/@[to]@}, where the symbol @[from] in the
  //! input is mapped to @[to] in the output. The result is thus a
  //! transducer.
//! @decl RxNode or (LaxRxType... regexps);
  //!
  //! A union; matches if any of the arguments match. If a multiset is
  //! used as a sub-regexp it's converted to this.
  //!
  //! @note
  //! When given no arguments, this doesn't match anything at all.
  //!
  //! @note
  //! This operator tries to determinize as well as possible by allowing
  //! the alternatives to match in any order. It's therefore
  //! effectively "greedy" to the extent that determinization succeeds,
  //! but that can't be counted on since determinization isn't
  //! guaranteed to be complete. There's also the @[Rx.seq_or] variant
  //! that always matches the alternatives in the order they are given
  //! (which most closely resembles the behavior in other common regexp
  //! engines).
//! @decl RxNode and (LaxRxType... regexps);
  //!
  //! Intersection; matches only when all the arguments match.
//! @decl RxNode neg (LaxRxType regexp);
  //!
  //! Negation; matches everything that @[regexp] doesn't match.
//! @decl RxNode sub (LaxRxType a, LaxRxType b);
  //!
  //! Subtraction; matches when @[a] but not @[b] matches.
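Ignoring match order, `or`, `and`, `neg` and `sub` correspond to ordinary set operations on the sets of strings they match. The Python sketch below uses hypothetical names (`or_` and `and_`, since `or` and `and` are Python keywords) and a bounded universe, all strings over {a, b} of length 0 to 3, so that `neg` stays finite. The real operators work on automata over unbounded input. The sketch also checks the identity sub(a, b) = and(a, neg(b)).

```python
from itertools import product
from typing import FrozenSet

# Bounded universe: all strings over {a, b} of length 0..3.
ALPHABET = "ab"
UNIVERSE = frozenset("".join(p) for n in range(4)
                     for p in product(ALPHABET, repeat=n))


def or_(*langs: FrozenSet[str]) -> FrozenSet[str]:
    return frozenset().union(*langs)  # no arguments: matches nothing


def and_(*langs: FrozenSet[str]) -> FrozenSet[str]:
    return UNIVERSE.intersection(*langs)


def neg(lang: FrozenSet[str]) -> FrozenSet[str]:
    return UNIVERSE - lang


def sub(a: FrozenSet[str], b: FrozenSet[str]) -> FrozenSet[str]:
    return a - b


starts_a = frozenset(s for s in UNIVERSE if s.startswith("a"))
ends_b = frozenset(s for s in UNIVERSE if s.endswith("b"))

print(sorted(and_(starts_a, ends_b)))                         # ['aab', 'ab', 'abb']
print(sub(starts_a, ends_b) == and_(starts_a, neg(ends_b)))   # True
print(or_() == frozenset())                                   # True
```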
//! @decl RxNode set (Symbol... symbols);
  //!
  //! A set of symbols. Much like @[Rx.or], but the elements are treated
  //! as literal symbols and not regexps.
//! @decl RxNode map (LaxRxType from, LaxRxType to);
  //!
  //! Maps the regexp @[from] to the regexp @[to]. Both must be
  //! recognizers and the result is a transducer. If a mapping with a
  //! single element is used as a sub-regexp, it's converted to this (a
  //! mapping with more elements becomes the union of the pairs in
  //! it).
  //!
  //! (Technically, this is the cross product of @[from] and @[to], i.e.
  //! the set of string pairs @tt{a/b@}, where @tt{a@} matches @[from]
  //! and @tt{b@} matches @[to].)
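For finite languages, the cross product and the mapping conversion can be sketched in Python with string pairs standing in for transducer paths. The names `map_`, `pair` and `from_mapping` are hypothetical, and the mapping's keys and values are treated as single literal strings rather than arbitrary regexps.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

Relation = FrozenSet[Tuple[str, str]]


def map_(frm: FrozenSet[str], to: FrozenSet[str]) -> Relation:
    """Cross product: every pair a/b with a in frm and b in to."""
    return frozenset(product(frm, to))


def pair(frm: str, to: str) -> Relation:
    """The single pair frm/to."""
    return frozenset({(frm, to)})


def from_mapping(m: Dict[str, str]) -> Relation:
    """A mapping becomes the union of its pairs."""
    return frozenset().union(*(pair(k, v) for k, v in m.items()))


print(sorted(map_(frozenset({"a", "b"}), frozenset({"x"}))))
# [('a', 'x'), ('b', 'x')]
print(sorted(from_mapping({"cat": "dog", "red": "blue"})))
# [('cat', 'dog'), ('red', 'blue')]
```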
//! @decl RxNode test (function(DataList,void|Rx.Rx.Process:int) func, @
  //!		       void|int low, void|int high);
//! @decl RxNode test (function(DataList,void|Rx.Rx.Process:int) func, @
  //!		       LaxRxType regexp);
  //!
  //! Calls @[func] to test whether there's a match at this position.
  //!
  //! The function will be called with a piece of the input and should
  //! return nonzero if the whole piece matches, zero otherwise. The
  //! second argument to the function is the current @[Rx.Rx.Process]
  //! object. Although it can't be used to look at the input reliably,
  //! it might be useful for checking flags, e.g. @[Rx.Rx.Process.DEBUG_LOG].
  //!
  //! If @[low] and/or @[high] is given, they set the lower and upper
  //! limits on the length of the string that @[func] can possibly
  //! match. @[low] defaults to zero. There's no upper bound if
  //! @[high] is left out or is negative.
  //!
  //! If @[regexp] is given, only input which it matches will be tested
  //! with @[func].
  //!
  //! @note
  //! If the possible matches aren't screened with @[regexp] or a narrow
  //! @[low]/@[high] interval, it's likely that the test function is
  //! called excessively often.
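Why an unscreened test function is called so often can be seen in a small Python model. The name `rx_test` and the shortest-first order of candidate lengths are assumptions of this sketch. It passes only the input piece to the function (the real callback also receives the Rx.Rx.Process object) and records every call made at one start position.

```python
from typing import Callable, Iterator, List, Optional


def rx_test(func: Callable[[str], int], low: int = 0, high: int = -1,
            calls: Optional[List[str]] = None):
    """Matcher that offers func each candidate piece starting at pos,
    shortest first, and yields the end of every piece func accepts."""
    def m(text: str, pos: int) -> Iterator[int]:
        longest = len(text) - pos
        if high >= 0:
            longest = min(longest, high)
        for n in range(low, longest + 1):
            piece = text[pos:pos + n]
            if calls is not None:
                calls.append(piece)  # record the call to show the cost
            if func(piece):
                yield pos + n
    return m


unbounded: List[str] = []
print(list(rx_test(str.isdigit, calls=unbounded)("12345x", 0)))  # [1, 2, 3, 4, 5]
print(len(unbounded))  # 7

bounded: List[str] = []
print(list(rx_test(str.isdigit, 3, 3, calls=bounded)("12345x", 0)))  # [3]
print(len(bounded))  # 1
```

Without bounds, the number of calls grows with the remaining input length at every position tried, which is the cost the note above warns about.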
/ Martin Stjernholm, Roxen IS
Previous text:
...
2003-09-21 15:43:
Subject: wish: string with other quoting than \

...
Just changing the regexp quote character to something else would make
a simple rule.
Of course.
...
It'd be very simple to implement a similar object/function interface
in your pcre glue. It'd just be a set of functions that internally
convert to pcre regexp syntax. I can provide the design I've made for
that; it's very straightforward.
That's true. I'm currently about to start writing the Pike-level
glue for Regexp.PCRE... Was there a start on that somewhere? I
can't seem to find it.
/ Mirar
