A new word.

Communicy - a community that exists only inside a communication medium.

To replace the phrase "virtual community", since communicies are often more real than the ones on the streets these days.




Back to the Rats

It came up again on LtU, and I had been thinking about it since Burning Chrome mentioned parsers for syntax highlighting. Then I was at Henry S. Thompson's seminar on XML schema validation (which is essentially a similar problem to parser design), and Brendan Eich mentioned Narcissus in his closing session at XTech. So I've been playing with a packrat parser written in JavaScript.

I started again at 7pm on Thursday during XTech*, and spent most of the night on it, then did more on Saturday at BarCamp Amsterdam. Current work in progress is in this directory, with a simple demo which parses a definition of the grammar used to define grammars, then creates divs to render it prettily in the iframe. The grammar below the iframe is the grammar used to parse what's in the iframe, and below it are debug messages. Editing what's in the iframe fires events that mark it dirty, but the incremental reparsing doesn't work yet.

The next stage is to add the incremental reparsing, so that when you edit the text it tries to reparse, and to link that to the divs so that you don't have to throw away all of the content of a div when you reparse it, as happens at present.

The Triep stuff is a combination of a trie with predicates, which was another idea for incremental parsing tailored for interactive development. I'm going to add some form of trie to the parse states for autocompletion, though I may go back to using a standard trie, as the calculus for the predicates is tricky, and reusing the packrat grammar may mean it's not needed.
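As a rough illustration of the "standard trie" fallback for autocompletion, here is a minimal sketch. It is in Python rather than the JavaScript of the parser, it is not the Triep structure (there are no predicates), and the class and method names are made up for this example.

```python
class Trie:
    """A plain character trie; each node maps one character to a child."""

    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a stored word ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.terminal = True

    def completions(self, prefix):
        """Return every stored word that starts with `prefix`, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # nothing stored under this prefix
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.terminal:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


# Hypothetical usage: offer the grammar's operator names as completions.
operators = Trie()
for name in ("zeroOrMore", "oneOrMore", "zeroOrOne"):
    operators.insert(name)
print(operators.completions("zero"))
```

In an editor the prefix would be whatever has been typed so far at the cursor, and the stored words would be whatever the current parse state allows next.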

One thing about implementing packrat with the visitor pattern (or any other tree-walking recursive descent) is that you don't need to convert the grammar into a finite state machine. So while Henry S. Thompson's handling of numerical constraints is an elegant solution, the problem simply doesn't exist - you can put the counter in a for-loop in the visitor's method, and you're done. The parser doesn't do that, since most text grammars don't require it - it only handles zeroOrMore, oneOrMore or zeroOrOne, but nToM would be handled in much the same way as those. I don't know of any reason in principle why you couldn't use tree-walking descent for XML - my ASN XER to Java data bindings used exactly that approach, and that's a superset of WXD.
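To make the counter-in-a-loop point concrete, here is a minimal sketch of bounded repetition in a recursive descent style. It is Python rather than the JavaScript of the parser, it leaves out the packrat memoisation and the visitor classes, and `literal` and `n_to_m` are hypothetical names, not functions from the parser described above. Each parser takes a source string and a position and returns the new position, or None on failure.

```python
def literal(text):
    """Return a parser that matches `text` exactly at the given position."""
    def parse(source, pos):
        return pos + len(text) if source.startswith(text, pos) else None
    return parse


def n_to_m(item, minimum, maximum=None):
    """Match `item` between `minimum` and `maximum` times, greedily.

    maximum=None means unbounded. As in a PEG, there is no backtracking
    into the repetition once it has consumed input.
    """
    def parse(source, pos):
        count = 0
        while maximum is None or count < maximum:
            nxt = item(source, pos)
            if nxt is None:
                break
            count += 1
            if nxt == pos:
                # Empty match: repeating would match empty again forever,
                # so treat the minimum as satisfied and stop.
                count = max(count, minimum)
                break
            pos = nxt
        return pos if count >= minimum else None
    return parse


# The three existing operators fall out as special cases.
def zero_or_more(item):
    return n_to_m(item, 0)

def one_or_more(item):
    return n_to_m(item, 1)

def zero_or_one(item):
    return n_to_m(item, 0, 1)


two_to_three_a = n_to_m(literal("a"), 2, 3)
print(two_to_three_a("aaaa", 0))  # stops after the third "a"
print(two_to_three_a("a", 0))     # too few matches, so None
```

The count lives in an ordinary local variable, so no states have to be generated per bound, which is the whole of the argument above.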




Decamel and the Colossal Cave

One thing I've noticed, when working in languages sufficiently introspective that you can access names of functions, is that any large enough system ends up implementing a decamel method.

Basically it takes a string, being a method or class name, which (due to limitations of language design that date back to the days when the only way you'd get timely parsing on the available hardware was to use recursive descent) is MadeOfSeveralWordsUsingCapitalsToIndicateBreaks, and breaks it apart for user presentation.
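A decamel method can be very small. The following is one possible sketch in Python, not taken from any particular system; the function name and the treatment of acronym runs (keeping "XML" together) are assumptions made for this example.

```python
import re

# Word boundaries: a capital that follows a lowercase letter or digit, or
# the last capital of an acronym run when it is followed by a lowercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def decamel(name):
    """Insert a space at each camel-case word boundary in `name`."""
    return _BOUNDARY.sub(" ", name)


print(decamel("MadeOfSeveralWordsUsingCapitalsToIndicateBreaks"))
print(decamel("parseXMLDocument"))
```

A real one usually also adjusts the case of the result to suit where it is shown, for example a menu item versus a test report heading.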

Normally it's the text of a menu that's late bound to a function call, or the default label of an entity type on the palette of a modelling tool. I've also used it with a unit test framework that output the test names and results in an XML file, and with the magic of Ant, Saxon and FOP presented a properly formatted PDF automated test report for my line manager.

There's an example on code_poet of an OSS tool using it to document unit tests. His 'self describing system' label is more than a little off the mark, though: labelling a unit test 'Dog walks when you kick it' doesn't guarantee that the test has anything to do with the label. Both his labels are in the present simple, but the test for barking is in the past perfect and the test for walking is in the present progressive, which shows that even a long, meaningful label isn't guaranteed to uncover the underlying assumptions that make creating good unit tests tricky.

Now, that stuff is very easy, and a very powerful tool for creating agile applications (you don't want a thousand anonymous ActionListeners cluttering up your code - even if they're generated by your IDE, it's all hardwired and a pain to maintain), and maybe a help for readable unit tests.

Something else on my radar, this time via decafbad but also now on LtU, is Inform7, which is creating a bit of a buzz.

Now I've known about controlled English systems such as CLCE and ACE, both of which are general purpose mappings of a subset of natural language onto first order logic, and given effort you can create a model that can be transformed into running code. But it seems that the generality of the AI-originated systems gets in the way - or maybe I'm just too lazy to dig into them far enough to get a return on my effort.

It's also true that both CLCE and ACE have been used to create running code for systems specified in natural language, but in my experience there are always problems with general purpose code generators (not that I'm saying anything of the quality of those projects in particular).

So instead of applying the full systems to the general case, what happens if you add a simple adventure game style parser to the decamelled code? Can you generate unit tests like that, or do you need the full linguistic systems to get the assumptions about tense correct?




This Is the Title

Link: http://www.math.uchicago.edu/~chruska/recursive/moser.html

LoL, via LtU