2008-05-28

Python and EA

For my current contract, I'm part of a team producing scripts which define interfaces to equipment in a satellite communications system for monitoring and control. Various interfaces, serial, low-level TCP and SNMP.

The application is fairly old - originated on Unix in early 1990s, ported to MFC late 1990s - and there are many links between the various configuration files. It's a high reliability system, so uses the classic pattern of a watchdog process with spawns the active processes, monitors them for liveliness, CPU and memory use, and respawns them or kills them in event of failure.

The goal is to use Enterprise Architect for modelling the system in UML. EA has a limited custom code generation - one file per class, poor tree walking functionality - which is just sufficient to create the interface scripts, but none of the other configuration which is more highly intertwined - for each rack or group of racks of equipment to be monitored, there's a server which runs a set of monitoring processes, each monitoring a set of instances of the equipment. Various parts of the configuration, such as the port ID of each server process, need to be put into different configuration files, so the desired layout of generated files doesn't follow EA's assumptions.

Also, there needs to be a parser for the configurations which reverse engineers a UML model from the configuration files - we don't want to have to generate the UML models for existing equipments by hand.

So I started writing a little parser in Python which generated an XMI file for import into EA. That worked well enough, and impressed the line manager, so have been spending the last couple of days fleshing it out.

EA doesn't quite treat tagged values as first-class citizens of UML, relegating them to a second window (though it's nice to have them docked and always visible, but that could apply to all the panes in the properties dialogs). So I wanted a table view which looked like the config file layout, of all the extra tagged values we're using to define the application specific properities used to implement the attributes of the equipment in the configuration files.

I've been using GTK for that - I've played with GTK a bit in C, and quite like it's level of abstraction and simple event model. It's not too hard to get things working in Python (some of the handling of attributes is a bit counter intuitive), and it mostly looks native (compared to Java or Gecko, or even EA's UI, which I believe is written in C#, but isn't using WinAPI menus).

I quite like Python, but it's a bit too imperitive for my taste, and there's no type system to tell you if you've put a right-handed vector into a left-handed library, or referenced an attribute of an object which doesn't exist. But being able to create a simple dialog with a REPL (read/eval/print/loop) in it to do the initial exploration of the EA COM automation interface was very handy.

The thing I dislike most in Python is that it doesn't actually use indentation to indicate blocks (I read Ken Arnold's chapter in The Best Software Writing over the weekend, and think I agree). It uses the colon character to indicate a block, and following a colon you have to use indentation. I'm always just doing the indentation, and forgetting the colons.

Labels: , ,

2008-04-25

Some things I'm thinking about

I've been playing more with my mostly functional modelling and simulation language (kin), here are some ideas which I want to try and profile to see if they offer gains:

Carrying type and data as a tuple:


Some implementation of Scheme and JavaScript use a single void* size datum for each object, and for integers set the lowest bit and encode the value in the rest, and for other objects the lowest bits are zero due to alignment, and the whole is a pointer to object. That doesn't work very well for doubles, and requires that the type is stored in the object's header. Java, C#, and C++ (for object with a vtable) have type information in the object header, which costs in large arrays. I've been thinking about passing the type and the data around separately, which means you can have more tightly packed arrays for non-vtable types.

One-bit markers for escape detection:


Based on One-bit Reference Counting, a marker in the type field to distinguish 'shared' objects you can take a reference to, and 'non shared' which you must not.

Write-once arrays:


As an extension to 'one bit reference counting' style; you maintain a high water mark instead of always copying, for use in structures such as VLists so you don't copy on passing automatically, and it allows you to mutate an array if it can be proven that all accesses happen before writes in matrix multiplications:

ax[i] <- ax[i] + row[i] * col[i]

Block order arrays for cache sensitive iteration and super compiled functors:


In a simulation, it's not uncommon to have a reduction on some function of a few attributes of an entity:

let total_force = reduce(array_of_entities, lambda(e:entity, ax) => f.mass * f.acceleration + ax, 0)

If this is over a homomorphic array, and mass and acceleration are simple double values, this would in C translate to picking 8 bytes here and there from the memory for the array. If instead each field is kept either in 'column order', or in what I'll call block order - so the fields for (cacheline size in bytes) objects are held contiguously. This should both reduce cache misses, and allow the use of SIMD instructions to process the data. The obvious disadvantage is that an object in an array is no longer the same layout as a single object on the heap, and to exploit it you need either a super-compiler or a trace-based JIT.

Effective model for scopes based on lists, symbols and objects:


Trying to build an interpreter in Java, which makes it tempting to use maps for everything, I found that the properties of an object and the properties of a lexical scope are much the same, (the duality of closures and objects is well known) so will try and define the binding model for values in kin using symbols, integers and lists only.

Using CAS for thread safe homogeneous VLists


Similar to the vector implementation in Lock-free Dynamically Resizable Arrays.

Options for compiling Verilog to shader language


Having had an interview last week as a systems modelling engineer with a company who were using C as the modelling language for timing simulations in embedded memory controllers, which is a bit like going for a job as a car mechanic and discovering that you're being quizzed on your wood-working skills, I was thinking about compiling an appropriate modelling language to something which executes efficiently. Though their particulary problem - timing conflicts - I would have though as having an analytic solution.

Something after UML


Speaking of Verilog, Verilog came into existance because even standardized schematic diagrams don't carry strong enough semantics and are not amenable to algebraic analysis, and graphical notations don't scale to large systems without hiding information.
Pi-calculus came into existance as Petri nets don't scale to large systems and are not amenable to algebraic analysis.
UML is very much in the standardised schematic mould, lacks formal semantics, and relies on hiding information to scale.

Often the hidden information in UML is very important - what appears to be a good design is not a good design if its simplicity is achieved by hiding most of the attributes and operations of a class, as often happens when a class gets multiple unrelated responsibilities. The notation allow you to hide details which should indicate that the design should be factored into smaller units, and in fact encourage such behaviour as a mechanism to scale up to showing many classes. For example, in Altova UModel:
You can customize the display of classes in your diagram to show or hide individual class properties and operations. ... This feature lets you simplify the diagram to focus on the properties and operations relevant to the task at hand.

If the class has features which are not relevant, then refactor it. Don't hide it so it doesn't make the documentation more complicated. The classes for well-factored systems have nothing to hide, but require more relations between the classes, which UML tools don't handle well (last time I checked none provided basic auto-routing or anything like good layout engines) and look more complicated, even though they are more simple - in the same way a dry stone wall is simpler than a minimalist plastered white apartment, but is visually more complex.

So what might come after UML then? What would not mitigate against refactoring, but allow a visual overview? What notation might give good analytic properties for software, rather than being a system schematic with loose semantics?

I don't know yet.

Labels: , , ,