Active Image Data Resource Formats

This document defines the formats for the various Adat resources used to format active image elements, which are compiled by BibleTrans into executable translation code. Each class of resource is confined to a range of ID numbers, specified separately in Adep resources and by codes in the text links. They are organized here by those codes. These and other resources are created by the DocPrep program, separately described.

This document also defines the format for structured text output resources (OTrx) and their supporting data, which are created by the translation engine, then used to display the translation history graph.

For additional information about these resources or their related data, see the BibleTrans Design Decisions.

Several of the formats share a standardized way of encoding a list of text items, consisting of a single number at the front defining how many (or which) items are there, followed by up to 31 integer links to the text items, followed by the text items themselves. The low 12 bits of each link are an offset to the (integer, 4-byte-increment) beginning of the text; the next 6 bits are the number of characters in the item, and the upper bits usually encode the pixel width of that item in its normal font. The Resource Viewer in DocPrep should know about most of these, and display them properly.
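The link layout just described can be sketched in a few lines of Python. This is only an illustration: the function name and the dictionary keys are invented here, and the base from which the offset is measured is not stated above, so the sketch reports the raw word offset and its byte equivalent without resolving it to an address.

```python
def decode_text_link(link):
    """Split one 32-bit text-list link into its fields (names are illustrative)."""
    link &= 0xFFFFFFFF
    word_offset = link & 0xFFF             # low 12 bits: offset in 4-byte words
    return {
        "word_offset": word_offset,
        "byte_offset": word_offset * 4,    # the same offset expressed in bytes
        "length": (link >> 12) & 0x3F,     # next 6 bits: number of characters
        "pixel_width": link >> 18,         # upper bits: usually the pixel width
    }

# A made-up link: 5 characters, 40 pixels wide, starting at word 33.
fields = decode_text_link((40 << 18) | (5 << 12) | 33)
```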

Often a text list aggregates multiple single-line text items such as checkboxes or pushbuttons. The aggregate can be displayed separately as an editable list, permitting the user to change the names of the checkboxes. The resource ID of the list is a multiple of 32, and the IDs of the collected one-line items are each offsets from that list ID, by the line number. Thus 64 is the list of language categories, and 65 is the first checkbox in that list. These list/checkbox combinations can also be used to enable or disable other data formats, by the matching low bits of their respective IDs. Thus checkbox 65 enables radio button group ID 1089 and variable list ID 2113, as well as L&N linkage table ID 32065; it also supplies a name to be added to the title of the Language Category 1 window where the category variables are displayed.
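As a rough illustration of the ID arithmetic, the following Python sketch derives the related resource IDs from a language-category checkbox ID. The base values 1088, 2112 and 32064 are inferred only from the single example above (checkbox 65), and the function name is hypothetical.

```python
def related_ids(checkbox_id):
    """Derive IDs that share the checkbox's low five bits (bases are inferred)."""
    line = checkbox_id & 31                # low five bits: line number in the list
    return {
        "list_id": checkbox_id - line,     # the list ID is a multiple of 32
        "line": line,
        "radio_group": 1088 + line,        # assumed base, from 65 -> 1089
        "variable_list": 2112 + line,      # assumed base, from 65 -> 2113
        "ln_linkage": 32064 + line,        # assumed base, from 65 -> 32065
    }

ids = related_ids(65)
```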

The numbers in the list here following index the various display alternatives for purposes of displaying and editing the image elements. They are stored in sequences of Adat resources separately numbered (shown in parentheses below).

1  Popup Menu (1-31) <PopM ..>

This is a text list. The high half of the first word is the number of items in the menu when popped up, and the low half is the currently selected item, as displayed normally.

2  Push Button (3840-4160) <Push ..>

This is a single line in a text list. The button name is that text item. The first word of the list is all 0 except for the button that is currently pushed, whose bit is 1 (the least significant bit is button 1). The upper 8 bits of the ID are the ID of a PshM resource containing the message to be sent; the low 8 bits of the ID are an offset into the PshM resource. Messages in the BibleTrans framework are 4-character ASCII codes that name events. Each thread has its own event loop for responding to these messages. Button messages are typically forwarded to whichever thread is responsible for processing them.

3  Checkbox (32-992) <ChBx ..>

This is a single line in a text list (see Fixed List). The checkbox name is that text item. Each bit of the first word of the list is 1 for checked items and 0 if unchecked (the least significant bit is checkbox 1).
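The checkbox bitmap can be read with a one-line test per bit. The Python sketch below is illustrative only (the function name is made up); it lists the checkbox numbers whose bits are set in the first word.

```python
def checked_items(first_word):
    """Return the 1-based numbers of the checked items in a checkbox list."""
    # Bit 0 (least significant) is checkbox 1, bit 1 is checkbox 2, and so on.
    return [n for n in range(1, 32) if (first_word >> (n - 1)) & 1]

checked = checked_items(0b101)   # checkboxes 1 and 3 are checked
```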

4  Radio Button Group (3072-3583) <RadX ..> <RadG ..>

This is a complete text list; each radio button in the group is a line item in the list. The low half of the first word is the index of the currently selected item.

5  Edit List <EdLs ..>

This is an editable text list, with up to 31 one-line items (not used at this time).

6  Drag Line Group <DrLn ..>

7  Checked Drag Line (160, 1185-1215, 2209-2239) <CkDL ..>

Both of these formats are the same, but the Drag Line Group is not used at this time, and a checked drag line is not displayed if its associated checkbox is not checked. The checkboxes are in Adat#160, and the linked (syntax) drag lines are in the 31 resources starting at Adat#1185. The low half of the first word is the ID of an Item List (the same low five bits, starting at Adat#2209) containing the names of the elements that can appear in a line of this format. The low four bits of the upper half are the number of lines (up to 9), and the next four bits the currently selected line while a user is actively dragging an element. The high byte of this word is the element number being dragged.
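A minimal Python sketch of unpacking the first word, following the bit positions given above. The function and field names are invented for illustration.

```python
def decode_drag_header(word):
    """Unpack the first word of a drag line resource (field names are illustrative)."""
    word &= 0xFFFFFFFF
    return {
        "item_list_id": word & 0xFFFF,          # low half: ID of the Item List
        "line_count": (word >> 16) & 0xF,       # low four bits of the upper half
        "selected_line": (word >> 20) & 0xF,    # next four bits, used while dragging
        "dragged_element": (word >> 24) & 0xFF, # high byte: element being dragged
    }

header = decode_drag_header((3 << 24) | (2 << 20) | (4 << 16) | 2209)
```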

The second word of the format has 1s in the bits where elements in the corresponding item list represent variable names, and 0s where the elements are computed values. Only variables in the item list can be linked to slot labels in the Dot Connector format.

Each line of the group can have up to 31 elements, one per byte with a byte count at the front of the line, for a total of exactly eight integer words (32 bytes) for each line, starting at byte 8 (third integer word). Only the low five bits of each byte are significant; the other bits may be set to facilitate exporting, but are ignored.
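The line layout can be illustrated by reading one 32-byte line directly from the raw resource bytes. This Python sketch uses a hypothetical function name and a 0-based line index; masking the count byte to five bits as well is an assumption, since the text only says that the low five bits of each byte are significant.

```python
def drag_line_elements(resource, line):
    """Return the element numbers of one drag line (line is 0-based here)."""
    start = 8 + 32 * line                  # lines begin at byte 8, 32 bytes each
    row = resource[start:start + 32]
    if len(row) < 32:
        raise ValueError("resource too short for that line")
    count = row[0] & 0x1F                  # byte count at the front (masking assumed)
    # Only the low five bits of each element byte are significant.
    return [b & 0x1F for b in row[1:1 + count]]

# Two header words (8 bytes), then one line holding three elements.
sample = bytes(8) + bytes([3, 0x01, 0x82, 0x1F]) + bytes(28)
elements = drag_line_elements(sample, 0)
```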

8  Predefined Table (1536, 1537, 1792) <Tb2x ..>

9  Checked Table (64, 2113-2143, 32001-32007, 32065-32095) <CkTb ..>

12  Table Lookup (192, 1217-1247) <LkTb ..>

These three formats are essentially the same. The Predefined Table is enabled if populated; the other two require their associated checkbox to be enabled. The high half of the first word in each designates the row labels, and the low half designates the column labels, in each case the ID of a text list whose items form the row or column labels. The Checked Table has a special code to indicate that the row labels come from the checked items in its corresponding column of the L&N Linkage Table; the column labels come from the variable list associated with that language category. In the Table Lookup format, the row and column labels are each dynamically created from the possible values of a designated variable. Only static values are considered for this list; if a variable is set by a Set Variable operation, those values are not considered. There are at most 31 rows or columns; additional labels are not displayed.

The second word of the format is a row/column count, the number of actual rows and columns to be displayed; the row count is in the high half, and the column count is in the low half. Following this count word are tab stops for each column boundary (one more than the number of data columns), the pixel position of the left edge of that column. These tab stops are calculated dynamically from the actual widths of the labels and data items.
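To make the word positions concrete, here is a hedged Python sketch that splits a table resource (given as a list of integer words) into its counts, tab stops and data cells. The function name is hypothetical, and the cells are returned as a flat list because the storage order of rows and columns is not stated above.

```python
def table_layout(words):
    """Split a table resource into counts, tab stops and data cells."""
    rows = (words[1] >> 16) & 0xFFFF       # high half of the second word
    cols = words[1] & 0xFFFF               # low half of the second word
    tab_end = 2 + cols + 1                 # one more tab stop than data columns
    return {
        "rows": rows,
        "cols": cols,
        "tabs": words[2:tab_end],
        "cells": words[tab_end:tab_end + rows * cols],
    }

sample = [(64 << 16) | 2113, (2 << 16) | 3, 0, 40, 80, 120, 1, 2, 3, 4, 5, 6]
layout = table_layout(sample)
```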

Following the tab stops is one word for each data item, (rows * columns) words. Each word is encoded as one of these, distinguished by its high two or three bits:

00 No data, or integer > 0
10 Character
01 4 Chars
110 Text link
111 Negative integer

An integer 0 value is encoded as a character '0'. The low 18 bits of a text link are understood in the same way as the links in a text list, 12 bits are an integer offset (at the end of the table), and six bits are the text length. Four-character items allow short text entries (starting with a letter) without allocating and managing variable-length text space.
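The tag bits can be tested in a fixed order, three-bit tags first. The following Python sketch only classifies a cell word; it assumes that "no data" is an all-zero word (plausible because an integer 0 is stored as the character '0') and that a four-character item keeps its first character in the high byte. Both are assumptions, and the names are illustrative.

```python
def classify_cell(word):
    """Classify one table data word by its high bits."""
    word &= 0xFFFFFFFF
    top3 = word >> 29
    if top3 == 0b111:
        return "negative integer"
    if top3 == 0b110:
        return "text link"
    top2 = word >> 30
    if top2 == 0b10:
        return "character"
    if top2 == 0b01:
        return "4 chars"
    return "no data" if word == 0 else "positive integer"   # zero test is assumed

def text_link_fields(word):
    """Low 18 bits of a text link: 12-bit integer offset and 6-bit length."""
    return word & 0xFFF, (word >> 12) & 0x3F

kinds = [classify_cell(w) for w in (0, 5, -3, 0xC0000000 | (5 << 12) | 10,
                                    int.from_bytes(b"Noun", "big"))]
```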

10  L&N Linkage Table (64, 32001-32007) <LNtb ..>

This table has some number of L&N concept numbers for its row labels, and all the enabled language categories for its column labels. A checkmark in a particular cell of this table enables that L&N concept to appear in the Checked Table corresponding to that language category. The second word of the format is the same as the Checked Table (rows and columns), but there are exactly 32 tab items (unchecked language categories effectively have zero width).

Following the tabs, there are two words for each row: the first word is encoded with the L&N concept number, and the second word is bitwise encoded with the checkmarks for that row, the least significant bit corresponding to column 1 and language category 1.
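A short Python sketch of reading one row of the linkage table. It assumes the two leading words are laid out as in the Checked Table, so the rows begin after 2 + 32 words; the concept word is returned undecoded because its encoding is not described here. The function name and the 0-based row index are illustrative.

```python
TAB_COUNT = 32   # the L&N Linkage Table always carries exactly 32 tab items

def linkage_row(words, row):
    """Return (concept word, checked language categories) for a 0-based row."""
    base = 2 + TAB_COUNT + 2 * row         # two words per row after the tabs
    concept_word, marks = words[base], words[base + 1]
    # Least significant bit is column 1, that is, language category 1.
    categories = [c for c in range(1, 32) if (marks >> (c - 1)) & 1]
    return concept_word, categories

sample = [0, (1 << 16) | 31] + [0] * TAB_COUNT + [12345, 0b1001]
row0 = linkage_row(sample, 0)
```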

11 Fixed List (32-992) <FxLs ..>

This is an editable text list, with exactly 31 one-line items. It is used to aggregate up to 31 checkboxes. The user can change the spelling, but not add or delete items.

13 Item List (2113-2463) <ItLs ..>

This is an editable text list, with up to 31 one-word variable names. The user can only choose from a collection of existing names in popup menus derived from (other) variable lists or other sources of data like drag lines and lookup tables. These items can then become the elements of an associated drag line or parameters in a lexical rule.

14 Variable List (1024-1055) <VrLs ..> <VrIn ..>

This is an editable text list, with up to 31 one-word items that form variable names. The user supplies the spelling of each variable in the list, but it is constrained to be a single word with some punctuation disallowed.

15 Dot Connector Group (20001-22047) <DotG ..>

This format displays up to 10 connection patterns, each linking the slots of one node shape to the items associated with a drag line. Each group is associated with a single node shape, so a single shape can have up to ten different drag lines and connection patterns associated with it. When there is more than one pattern, it is selected by the value of a designated variable. Only numeric values in the range 0-9 are considered; any other value in that variable is treated the same as number 0.

The high half of the first word contains the 11-bit ID number of the node shape; the top 3 bits select the icon type, and the low 8 bits enumerate the various shapes defined for that type. The low four bits of the first word are the number of connection patterns in this group, and the remaining bits are used to temporarily hold the active group and anchor dot while the user is forming a connection. The high half of the second word contains the (relative local) xy coordinate of the free end of the connection line being formed, and the low half is the ID of a variable that selects which pattern to use when there is more than one.

Each defined pattern takes six integer words: the high half of the first word is the drag line ID, and the low four bits select one of its lines, if more than one. The remaining bits of the first two words contain formatting information. The last four words in each pattern enumerate, in Item List order, which slot is connected to each item, four bits each, for a total of 31 possible items. Zero in any item position means no connection. There are at most eight slots in any node shape, the index of which fits easily in four bits.

16 Conditional Value (128, 1153-1183; 384, 1409-1439) <Iffy ..>

This format displays up to nine conditions with a resulting value for each condition, plus a default if none of the conditions are satisfied. Any one of the second set of number ranges can also be designated for pronoun selection. The conditions can be any comparison of values, and each value may be an atomic value such as a text string or a number or a variable name, or else an arithmetic or logical construction of values. The low half of the first word is the ID of the Adep resource listing the base IDs of possible values, which are displayed at the top of the group as a set of popup menus. The high half of the first word is the total size of the significant data.

The values are stored in prefix Polish form, one integer per code; the operation is in the high byte, and the xy location of its popup in the displayed image is packed into the low 24 bits. The five data codes are followed by one or more words of actual data. Expression values can be arbitrarily complex, up to the size limit of resources; nested values are shown in depressed rectangles, with the operators in popup menu buttons.

0 Null data
1 Integer data
2 Up to 4 chars of text data
3 Variable data
4 Formatting code data
5 String Length() function
6 Negative
7 Logical NOT
8  +
9  -
10  *
11  /
12 MOD
13 AND
14 OR
15 XOR
16 String Item() function
17 String ConCat() function
18  <
19  >=
20  <=
21  >
22  =
23 unequal
24 L&N number of tree node
25 parent of tree node
26 sibling of tree node
27 child of tree node
28 noun# of tree node
29 Bible reference (bk,ch,vs)
30 pronoun # of tree node
32 ...
62 else
63 if ... then
128+n n-character string data
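As an illustration of how one code word in the prefix form might be taken apart, the Python sketch below separates the operation byte from the 24-bit popup location and names a few of the codes from the list above. The table is deliberately partial, the names are invented, and the xy value is left packed because its internal layout is not given here.

```python
OP_NAMES = {0: "null", 1: "integer", 2: "text", 3: "variable",
            8: "+", 9: "-", 10: "*", 11: "/", 12: "MOD",
            13: "AND", 14: "OR", 15: "XOR", 22: "=", 63: "if ... then"}

def decode_op(word):
    """Split a code word into its operation byte and packed xy location."""
    word &= 0xFFFFFFFF
    return {"op": word >> 24, "xy": word & 0xFFFFFF}

def describe_op(word):
    """Give a readable name for the operation in a code word."""
    op = decode_op(word)["op"]
    if op >= 128:                          # 128+n: n-character string data
        return "string data, %d chars" % (op - 128)
    return OP_NAMES.get(op, "op %d" % op)

plus = decode_op((8 << 24) | 0x012034)
```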

17 Set Variable (96, 1121-1151, 2145-2175; 352, 1377-1407, 2401-2431) <SetV ..>

This format identifies up to 9 variables to be given a new value, and the value to be given, which may be a literal constant, or else another variable or one of the calculated values like a table lookup or conditional value. Variables can be set in the middle of a drag line; no output value results, but the variables being set can thus have different values at the front and back of the same drag line. The "Early SetVars" (the second set of number ranges) get called before any other rules, before the Connector Group for a lexical rule activates (if selected there).

The high half of the first word is the number of variables being set, and the low half is the Item List ID containing popups for possible values. Each variable line consists of a word containing the variable list ID and line number (in its high half) of the variable being set, followed by a word for the value to be assigned, encoded similarly to table values.

18 Variable Connector (288, 1057-1087) <VrCn ..>

This format identifies a single variable and up to 10 drag lines to be activated when this variable has a tree value assigned to it. The variable must be given a tree value ("+") in a Checked Table associated with one or more language categories. It may optionally be given a place in some drag line, in which case its own drag line will be activated at that place in the parent drag line, but only if the variable has a tree node assigned to it. Otherwise, the variable connector will be activated when the modifiers list containing that tree node is processed as part of a Proposition or Thing dot connector. If more than one drag line is specified, you also designate a variable whose value (0-9, other values count as 0) selects which drag line to use. If any drag line has multiple forms, a secondary popup chooses which line to use. Note that the Variable Connector makes no assignments to the items of a selected drag line; those must be made some other way, perhaps by Set Variable.

The low 22 bits of the first word choose the variable to connect. The same bits in the second word identify a selector variable, if there is more than one line; the number of lines is in the upper byte of this second word. Each additional word is one line of connection, with the drag line ID in the low half and its line number in the low four bits of the high half.
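The word layout of a Variable Connector can be sketched as follows in Python. The function and key names are hypothetical, and the 22-bit variable references are returned as raw numbers because their internal structure is not spelled out in this section.

```python
def decode_variable_connector(words):
    """Unpack a Variable Connector resource given as a list of integer words."""
    count = (words[1] >> 24) & 0xFF        # upper byte of the second word
    return {
        "variable": words[0] & 0x3FFFFF,   # low 22 bits of the first word
        "selector": words[1] & 0x3FFFFF,   # low 22 bits of the second word
        "lines": [
            {"drag_line_id": w & 0xFFFF,   # low half
             "line": (w >> 16) & 0xF}      # low four bits of the high half
            for w in words[2:2 + count]
        ],
    }

sample = [0x12345, (2 << 24) | 0xABC, (1 << 16) | 1185, (3 << 16) | 1186]
connector = decode_variable_connector(sample)
```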

19 Pronoun Selector (1024, 2048) <PnEx ..>

This format defines two popup names, one to choose a conditional which returns an integer pronoun number, the other to choose either a conditional or a table lookup (or neither) for generating the inflected pronoun text.

20 Output Character Set or Morphological Rule (2040-2047, 10000-10009) <GfD_ ..>

We have six different data types collected into this "format", distinguished by the low 3 bits of the resource ID number:
0  <GfDg ..>  This is a placeholder for a 12x16 glyph table starting in resource#10000 (it could grow as big as seven or eight 1K resources, depending on how many characters are defined and how wide they are). The table is in the form of a font table as used for normal text display: Table position [0] is the 4-character font name, [1] gives the total height and ascent (from base line to top of cell) in 16-bit numbers, followed by [2] the character spacing and [3] space width. Beginning in [4] is an index of offsets to the beginning of each ASCII character's glyph (at position code-28; that is, [4] is the space character 32, [5] is '!', and so on) up to character 255 in [227], followed by the offset to the end of the table in [228]. Glyph width is determined by the difference between adjacent glyph offsets. See example below.

1  <GfDg ..>  This is a placeholder for a panel displaying the pixels of a single glyph enlarged for editing. The resource contains the index of the selected glyph, and some positioning information; the actual pixels are in the table. Only the first word [0] is used: the low byte is the selected glyph, or zero if none; the next four bits are the cell width (editable white background), then a 4-bit offset from the left of the panel, the width of the grey zone there. The upper half of this word is a 5-bit offset that allows for the black pixels to start other than at the left edge of the character cell (as in the font table). The pixels in the file are always normalized, but you don't want them jumping around in the editing panel as you add or delete pixels on the left.

2  <GfDp ..>  This is a kerning table, in case the language needs to overlap vowel and consonant glyphs (not yet implemented).

3  <GfDp ..>  This is a text string in the glyph font, so the characters can be viewed in context.

4  <GfDD ..>  This is 27 character sets (named by each letter of the Roman alphabet, plus a special set of word breaks named "#"). Each set is a 224-bit bitmap, one bit for each character in the set.

5-7  <GfDp ..>  These are three groups of morphological (character substitution) rules to be applied after translation. Because the translation rules are ASCII only, one of these (#6) defines a conversion from (Roman) ASCII to whatever character font is defined in the glyphs. The other two perform substitutions before (#5) and after (#7) conversion. The differences are superficial (which font the characters are displayed in); all rules work exactly the same, and are tested in strict numerical sequence exactly once on each generated text character. Each rule is stored as a sequence of characters that is the "context" for applying the rule, followed by a sequence of characters to replace the match with. Character set codes (1-27) can be used in the rules to refer to any character in the corresponding lettered set (#4 above).
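For the glyph table index in the first of these data types, the position and width calculation can be sketched in Python as below. The function names are invented, and the table is treated simply as a list of integers; how the offsets map onto the pixel words is not assumed here.

```python
def glyph_index_position(code):
    """Table position of the index entry for a character code (space is 32)."""
    return code - 28                       # [4] is space, [5] is '!', and so on

def glyph_width(table, code):
    """Glyph width: the difference between adjacent glyph offsets."""
    pos = glyph_index_position(code)
    return table[pos + 1] - table[pos]

# A made-up table: the space has no pixel columns and '!' is 3 columns wide.
table = [0] * 229
table[4], table[5], table[6] = 300, 300, 303
bang_width = glyph_width(table, ord("!"))
```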

24 Node Shape Editor (1984-1987) <BTed ..>

We have four different editors collected into this "format", distinguished by the low 3 bits of the resource reference number, but there is only one placeholder resource (#1984) because the resource ID number controls what data is to be fetched from other resources and displayed.

25 Built-in Rules (1977-1983) <GBIR ..>

We have eight different rules possible in this "format", distinguished by the low 3 bits of the resource reference number, but the placeholder resources have no data because the Strn resource of the corresponding ID number contains the text to be displayed.

26 Tree Node List (3008) <NdLs ..>

This "format" is a placeholder resource for the node list, which is display-only.

30 Search (3024-3043)  <Srch ..>

This format displays one of 18 different panels showing search criteria and/or results.

31 Title (3584-3839) <Attx ..>

This format does not display anything, but rather alters the window title to include the name of the checkbox with the same low bits of ID. The last word of the given window title must be a number, which is replaced by the checkbox name.

Structured Text Output Resources

The structured text output window shows not only the translated text, but also the history of rules that generated it, in a graphical presentation. This information is recorded in the OTrx resources, which are a complex linked structure of nodes with grammar rules or output text as labels. This is so complicated that I wrote a second document, "Structured Text Output Window", to explain how the data is generated and formatted for display.

OTrx #32767 is an index of the available translated text resources in this file, in reverse order of translation (most recent first). The beginning of the resource is the number of entries, then the date (in seconds since 2000 Jan 1) of most recent translation. Beginning with offset +2, each entry consists of two numbers, first the episode number from the Tree episode that was translated, then the specific Bible reference (if any, or else the reference associated with that episode), which is used to represent it in the Translation menu. Each such episode is stored in one or more OTrx resources, numbered first by the episode number, then sequentially by adding 1024 (0x400) to the episode number, for a maximum of 32 segments, which should be sufficient for a thousand words.
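The index layout and the segment numbering can be illustrated with a small Python sketch. The function names are made up, the index is taken as a list of integers, and the date is converted as a naive timestamp because no time zone is stated.

```python
from datetime import datetime, timedelta

def parse_otrx_index(words):
    """Read the OTrx #32767 index: entry count, date, then (episode, reference) pairs."""
    count = words[0]
    date = datetime(2000, 1, 1) + timedelta(seconds=words[1])   # time zone unknown
    entries = [(words[2 + 2 * i], words[3 + 2 * i]) for i in range(count)]
    return {"count": count, "date": date, "entries": entries}

def otrx_segment_ids(episode, segments):
    """Resource IDs of an episode's OTrx segments: episode, +1024, +2048, ..."""
    if not 1 <= segments <= 32:
        raise ValueError("an episode has between 1 and 32 segments")
    return [episode + 1024 * k for k in range(segments)]

index = parse_otrx_index([2, 86400, 7, 1001, 3, 1002])
ids = otrx_segment_ids(7, 3)
```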

When it is formatted, the effective image can be up to a half-million pixels wide, which is sufficient for 30,000 characters 16 pixels each (including spaces), or about 5000 medium words, eight single-space typewritten pages. Properly encoded, no Bible episode should generate more than a quarter of that.

The first eight numbers in the first OTrx resource are a header:

+0  The start of the node data (=8) in high, the next available item, just past end of the data, in low.
+1  The episode number
+2  The rectangle, in 16-pixel scroll units (may be 0 in file)
+3  The size of display data (may be 0 in file)
+4  The Bible reference
+5  The date&time this resource was created
+6  Pixel offset (to the left of this resource)
+7  (reserved for future use)
Subsequent OTrx resources in the same episode have only a 4-word header:

+0  The start of the node data (=4) in high, the next available item, just past end of the data, in low.
+1  The episode number
+2  Pixel offset (to the left of this resource)
+3  The size of display data (may be 0 in file)
After the header, each node in the graph consists of two numbers. The column top is distinctive, in that it defines the horizontal position of the translated word (which may be 0 in the file; it is calculated on the fly and saved to the file only if there is more than one resource), and a link to the next column top (the next translated word in the resource). If there is a gloss, it looks like a column top but the horizontal position part is zero (the gloss is centered under the previous column top) and there is no additional history data following it.

The second word of the column top or gloss item is an index into the EmTx (output text) or GlTx (gloss) resource with the same episode number. Duplicate words in the output text point to the same word in the word list. The index points in turn into a TxtS resource of the same episode sequence, where the text string of that word may be found. The index word is packed from three numbers, the byte offset into the selected TxtS resource, six upper bits of the resource number (added to the episode number) which count down from 31 (31744 + episode number), and the length of that word. Small colored rectangles represent non-text output, the first three "words" in the list.

Following the column top are rule references, two numbers each. Each rule reference consists of four 16-bit numbers packed into two integers: the line number in that rule, and the rule reference number, then some flags. The flag word is positive for lexical rules (seven bits of domain over nine bits of concept within that domain), and negative for a named rule index into a CodX resource (see "Compiled Rule Code"); its low bits are zero when there is more in this column, or non-zero when the next number links to another column for a horizontal line or is zero at the root of the tree.

EmTx (output text) and GlTx (gloss) resources are sequences of integers. Each entry is packed with a byte offset (low 12 bits), then a part index (6 bits, tacked onto the 10-bit episode, which is the resource number where this word can be found), a 6-bit length, and 8 bits of display (pixel) width if it fits. These in turn point into TxtS resources containing the actual text, which is limited to 61440 bytes of output text (they could be unique words or word fragments, but now is straight output text, plus separators, including larger gaps at resource boundaries, so that no word crosses a boundary) in a single episode. This is more than enough to accommodate several thousand words, far larger than the biggest well-formed BibleTrans episode.
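A hedged Python sketch of unpacking one EmTx or GlTx entry. It assumes the four fields sit in ascending bit order exactly as listed (12 + 6 + 6 + 8 = 32 bits) and that the part index forms the upper bits of the TxtS resource number above the 10-bit episode; the function and key names are illustrative.

```python
def unpack_text_entry(entry, episode):
    """Unpack one EmTx/GlTx entry (field order from low bit upward is assumed)."""
    entry &= 0xFFFFFFFF
    part = (entry >> 12) & 0x3F
    return {
        "byte_offset": entry & 0xFFF,                    # low 12 bits
        "part": part,                                    # next 6 bits
        "txts_resource": (part << 10) | (episode & 0x3FF),
        "length": (entry >> 18) & 0x3F,                  # 6-bit length
        "pixel_width": (entry >> 24) & 0xFF,             # 8 bits of display width
    }

word = unpack_text_entry((50 << 24) | (4 << 18) | (31 << 12) | 100, episode=7)
```

With part index 31 and episode 7 the resource number comes out as 31744 + 7, which agrees with the countdown from 31 described for the column top index.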

Font Data

A font specifier array has a prolog, the index, then the pixels. The first number of the prolog is by convention the font name. The second number is a composite of the line height, descent and ascent, packed one byte each as H*0x10000+D*0x100+A. The third and fourth numbers are the number of pixels separating each character, and the width of a space. A monospace font prolog has two more numbers, a zero followed by the character width (which should be the same as the space width), and the index is omitted; the pixels for "!" start in table position [6]. A proportional font index starts in position [4] for the space, which should be the same as for "!" because there are no non-white pixels in a space. The pixel information must be in ASCII order, with the final index position, CHR(0x7F), pointing just past the last displayable character. Each character is encoded as a sequence of 1-word pixel columns, up to 32 pixels tall (the 32 bits in one integer), as many pixels wide as needed (but currently limited to a maximum of 63). You can think of each character as if rotated 90 degrees clockwise, thus:

[Figure not reproduced: a sample character rotated 90 degrees clockwise, with the pixels as bits in the table.]
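The packing of the second prolog number can be checked with a short Python sketch. The function names are hypothetical; the formula H*0x10000+D*0x100+A is the one given above.

```python
def pack_line_metrics(height, descent, ascent):
    """Pack line height, descent and ascent one byte each, as H*0x10000+D*0x100+A."""
    return height * 0x10000 + descent * 0x100 + ascent

def unpack_line_metrics(word):
    """Recover the three one-byte metrics from the second prolog number."""
    return {
        "height": (word >> 16) & 0xFF,
        "descent": (word >> 8) & 0xFF,
        "ascent": word & 0xFF,
    }

packed = pack_line_metrics(16, 3, 12)
metrics = unpack_line_metrics(packed)
```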

Rev. 2013 September 27