Design notes
Other developers are occasionally interested in why particular design decisions were taken with PSICS. The following notes address some of the more common questions.
Units and XML
In PSICS, dimensional quantities are expressed as single strings, as in:
<KSChannel id="HH_Na" permeantIon="Na" gSingle="4pS">...</KSChannel>
<KSChannel id="HH_Na"> <permeantIon>Na</permeantIon> <gSingle>4pS</gSingle> </KSChannel>
<gSingle unit="pS">4</gSingle>
<gSingle> <unit>pS</unit> <value>4</value> </gSingle>
This could be seen as breaking the principle of atomicity (also called the principle of structured data) for XML content, which states that structured data should be split out into XML elements rather than enclosed in strings. The problem with the two forms immediately above is that the result is not really atomic. There is a degeneracy in the split between the value and the unit, with the result that neither form can be considered a canonical representation of the quantity.
The atomicity principle is also more subtle than the statement above would suggest. XML formats make extensive use of structured data within strings. Examples include scientific notation for numbers as in the XML Schema datatypes recommendation, the ISO 8601 date format, XPath expressions and variable references in XSLT, and comma-separated lists in Ant. The common thread in these applications is that the processor using the data has no choice but to understand the concept being encoded. An XPath expression is of no use except to an XPath-aware processor, so there is no point in splitting it up further. The same is true for an ISO 8601 date and the other examples. In the rest of this section, we demonstrate that this is also the case for dimensional quantities in PSICS.
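As a small illustration of structured strings that only make sense to a processor that understands the concept, the sketch below uses Python's standard library parsers for scientific notation and ISO 8601 dates. The particular values are arbitrary examples chosen for illustration, not taken from PSICS.

```python
from datetime import date
from decimal import Decimal

# Scientific notation: three different strings, one number.
# Decimal compares by numeric value, so the spellings are interchangeable.
spellings = ["6.02E23", "0.602E24", "602E21"]
values = [Decimal(s) for s in spellings]
all_equal = all(v == values[0] for v in values)

# ISO 8601: year, month and day are packed into one string, and a
# date-aware parser is what recovers the parts (the date is arbitrary).
day = date.fromisoformat("2008-03-14")
parts = (day.year, day.month, day.day)
```

In both cases the string is left whole in the document and the structure is recovered by code that knows the notation.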
Consider the quantity 4pS. It can also be written as 0.004nS or 4E-12S or 0.004 per Gohm or 4E-19s^3 A^2 per g per cm^2 or 4E-12s^3 A^2 kg^-1 m^-2 or any of an indefinite number of other variants, all of which specify exactly the same quantity. Expressing these as
<gSingle> <value>0.004</value> <unit>per GOhm</unit> </gSingle>
<gSingle> <value>4E-6</value> <unit>uS</unit> </gSingle>
<real decimal="6.02" exponent="23" />
<real exponent="23">6.02</real>
<real exponent="24">0.602</real>
<real> <decimal>602.</decimal> <exponent>21</exponent> </real>
Returning to dimensional quantities and the 4pS example, the natural canonical representation is
<gSingle> <value>4E-12</value> <M>-1</M> <L>-2</L> <T>3</T> <A>2</A> </gSingle>
<gSingle> <value>4E-12</value> <unit>S</unit> </gSingle>
<gSingle> <value>4E-12</value> <unit>Ohm^-1</unit> </gSingle>
So the XML designer has the choice whether to use the full canonical representation, an SI value and SI units, a plain undivided string, or a somewhat arbitrary split where neither component is unique. The last route is often taken through simplistic application of the atomicity principle without recognizing the degeneracy, which takes away most of the benefits of such a split. The difficulty of the full canonical form is that it is hard to recognize that the dimension being expressed is that of the siemens. The unit could be included as an attribute for reader convenience, but that reintroduces the fragility problem because it could then be inconsistent with the dimension that is actually expressed.
The main lesson to be taken from all this is that working with dimensional quantities is hard. Either specifications have to use the canonical representation or anything processing specifications of such quantities has to be aware of the subtleties of units and dimensions. Splitting a string into value and units components masquerades as XML atomicity but without the usual benefits. Since PSICS is intended to be read and written by users, the canonical option is not viable as it would force the user to do manual conversions to SI throughout. So PSICS takes the other option and implements a full dimensions model within the software itself, leaving dimensional quantities as undissociated strings. They suffer the same lack of uniqueness as with splits, but at least they do not look atomic, and they are certainly closer to the way such quantities are normally written.
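The following is a minimal sketch of what reducing an undivided string to a canonical form could look like. It is not the PSICS implementation: the prefix table, the unit table, the regular expression, and the names `parse_quantity` and `same_quantity` are all assumptions made for illustration, and only a handful of units are covered.

```python
import math
import re

# Illustrative tables only; the prefixes and units PSICS actually accepts
# are not listed in these notes, so everything here is an assumption.
PREFIXES = {"": 1.0, "G": 1e9, "M": 1e6, "k": 1e3,
            "m": 1e-3, "u": 1e-6, "n": 1e-9, "p": 1e-12}

# Exponents of the base dimensions in the order (M, L, T, A).
UNITS = {
    "S": (-1, -2, 3, 2),     # siemens
    "Ohm": (1, 2, -3, -2),   # ohm
    "F": (-1, -2, 4, 2),     # farad
    "V": (1, 2, -3, -1),     # volt
}

# A number, an optional "per", then a prefixed unit symbol.
QUANTITY = re.compile(
    r"^\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*(per\s+)?([A-Za-z]+)\s*$"
)


def parse_quantity(text):
    """Return (value in SI, (M, L, T, A) exponents) for strings like '4pS'."""
    match = QUANTITY.match(text)
    if match is None:
        raise ValueError(f"not a quantity: {text!r}")
    number, per, symbol = match.groups()
    for base, dims in UNITS.items():
        prefix = symbol[: len(symbol) - len(base)]
        if symbol.endswith(base) and prefix in PREFIXES:
            scale = PREFIXES[prefix]
            if per:
                # "per GOhm" inverts both the scale and the dimensions.
                return float(number) / scale, tuple(-d for d in dims)
            return float(number) * scale, dims
    raise ValueError(f"unknown unit: {symbol!r}")


def same_quantity(a, b):
    """True if two strings denote the same physical quantity."""
    value_a, dims_a = parse_quantity(a)
    value_b, dims_b = parse_quantity(b)
    return dims_a == dims_b and math.isclose(value_a, value_b, rel_tol=1e-9)
```

Under these assumptions, `same_quantity("4pS", "0.004 per GOhm")` holds, because both strings reduce to the same SI value with exponents M=-1, L=-2, T=3, A=2. The user keeps the familiar notation and the code does the conversion.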
Compound units and equations
The above discussion of intermediate expressions for dimensional quantities between plain strings and the canonical representation only scratches the surface of the problem. For example, a quantity such as 0.95 microFarad per square centimetre would properly demand some MathML to encode the expression 0.95 * microFarad / power(cm, 2). But given that such things are not atomic in any case and are likely to rapidly get more complicated than the true canonical representation, that does not seem like a good idea.
There is also a tendency in scientific applications of XML to use dimensionless variables in equations, presumably because computer languages only deal natively with dimensionless quantities. For example, if P is a membrane potential with dimensions Volt, then one can write p = P/Volt to get a dimensionless quantity p. Doing similar manipulations for other quantities, one can write whole physical equations in dimensionless form, such as c dp/dt = g(e - p) for the potential in a capacitive compartment with an ohmic conductance g to a driving potential e. This is the dimensionless form of C dP/dT = G(E - P), where C has dimensions of capacitance, G of conductance, T of time, and E and P have dimensions Volt.
The key difference between these forms is that the dimensional form says nothing about units. It is a statement about physical quantities, in the same sense that I can lay a metre rule end to end with a six inch rule and get something one metre and six inches long. But if I add 1 plus 6 it is nonsense. Likewise, the dimensional form is universal, but the dimensionless form has a high capacity for nonsense. If I choose to work in mV and microFarad, so p = P / mV and c = C / uF, then I have to work out the right units for g or the equation is wrong, even though it looks just like the dimensional one. It is worth noting that publications in physics and biology normally express dimensional equations. They may include statements such as "if V is in mV and g in pS then c is in...", but these are just metadata presenting one dimensionless form from the whole family of such forms encompassed in the equation. Where model specification tools choose to represent only relationships between dimensionless quantities, they are, often unwittingly, picking an impoverished and fragile version of the original.
The point here is the same as above: working with dimensional quantities takes work that has to be done somewhere. It can be forced on the user by providing an XML specification that requires equations to have been correctly reduced to a dimensionless form in advance. Or it can be implemented in the code that works with the data, leaving the user free to express equations with dimensional quantities. Given that PSICS has a single code base, but hopefully many users making many models, the latter is the preferred solution here.
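To make the contrast concrete, here is a hedged sketch of C dP/dT = G(E - P) evaluated with quantities that carry their dimensions, next to the same formula applied to bare numbers. The `Quantity` class, the function names, and the numerical values are hypothetical and chosen only for illustration; they are not the PSICS dimensions model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    si: float      # magnitude in SI base units
    dims: tuple    # exponents of (M, L, T, A)

    def _check(self, other):
        if self.dims != other.dims:
            raise TypeError(f"dimension mismatch: {self.dims} vs {other.dims}")

    def __add__(self, other):
        self._check(other)
        return Quantity(self.si + other.si, self.dims)

    def __sub__(self, other):
        self._check(other)
        return Quantity(self.si - other.si, self.dims)

    def __mul__(self, other):
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.si * other.si, dims)

    def __truediv__(self, other):
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Quantity(self.si / other.si, dims)


VOLT = (1, 2, -3, -1)
SIEMENS = (-1, -2, 3, 2)
FARAD = (-1, -2, 4, 2)


def potential_rate(C, G, E, P):
    """dP/dT from C dP/dT = G(E - P), with dimensions tracked."""
    return G * (E - P) / C


def naive_rate(c, g, e, p):
    """The same formula on bare numbers; the units are left implicit."""
    return g * (e - p) / c


# Hypothetical compartment: 10 pF, 1 nS, E = 0 mV, P = -65 mV.
rate = potential_rate(C=Quantity(10e-12, FARAD), G=Quantity(1e-9, SIEMENS),
                      E=Quantity(0.0, VOLT), P=Quantity(-65e-3, VOLT))

# Same physical situation, two choices of working units for g and c.
in_nS_and_pF = naive_rate(c=10, g=1, e=0, p=-65)
in_pS_and_uF = naive_rate(c=1e-5, g=1000, e=0, p=-65)
```

The dimensional result is a single quantity with the dimensions of volts per second. The two dimensionless calls describe the same compartment but return different numbers, and nothing in the formula records which time unit each result implies.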
Elements and Attributes
Following the terminology used by Uche Ogbuji in his Principles of XML design, most PSICS design decisions can be seen as application of a slightly modified version of his principle of core content:
In a sense, of course, none of the information in a model specification is incidental or peripheral. It is all part of the model, so in that sense, it should all go in elements. But there are still two categories of content in a model, and it makes sense to use the element/attribute distinctions provided by XML to style them differently.
Because the models are intended to parallel physical entities, there is a natural distinction in the model between things and properties of those things. A cell or an ion channel are things; the radius of a cell or the conductance of an ion channel are properties. This leads to XML looking like:
<Cell radius="5um">...</Cell> <KSChannel conductance="3pS">...</KSChannel>
There are two rules of thumb that help decide what should be an element and what should be an attribute in this picture (in addition to the unambiguous cases noted by Ogbuji):
- If it feels most natural to say "the X has a Y" then Y should be a sub-element of X
- If it feels more natural to say "the X's Y is..." or "the Y of the X is..." then Y should be an attribute of X
The second rule of thumb, which is in fact the only rule that really matters since it is what defines the XML, relates to representing objects as classes and instances in an object-oriented language:
- If the object being described is an instance of a class of the same name, then it should be an element
- If the object being described has a local name that is different from the class name, then it should be an attribute if possible
class IaFCell { Voltage threshold; }
class IaFCell { Threshold threshold; }
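The class-name rule can be sketched as a small serializer. This is an illustration under stated assumptions rather than the way PSICS maps objects to XML: the `to_xml` function, the Python versions of `IaFCell`, `Voltage` and `Threshold`, and the value -50mV are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Voltage:
    text: str    # an undivided quantity string such as "-50mV"

    def __str__(self):
        return self.text


@dataclass
class Threshold:
    value: Voltage


@dataclass
class IaFCell:
    threshold: object    # a Voltage or a Threshold, to show both cases


def to_xml(obj):
    """Serialize a dataclass instance following the class-name rule."""
    element = ET.Element(type(obj).__name__)
    for field in fields(obj):
        value = getattr(obj, field.name)
        if type(value).__name__.lower() == field.name.lower():
            # Instance of a class with the same name: child element.
            element.append(to_xml(value))
        else:
            # Local name differs from the class name: attribute.
            element.set(field.name, str(value))
    return element


as_attribute = to_xml(IaFCell(threshold=Voltage("-50mV")))
as_element = to_xml(IaFCell(threshold=Threshold(value=Voltage("-50mV"))))
```

With a `Voltage` field named `threshold`, the threshold becomes an attribute of the `IaFCell` element. With a `Threshold` field named `threshold`, it becomes a `Threshold` child element whose own `value` is an attribute.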