
656 Kreag Road - Pittsford, NY 14534-3730 - USA
PHONE: (US)-585-385-3810 FAX: (US)-585-385-6822 WEB: www.acii.com


InterScript - Document Description Language


1 - What is InterScript?

InterScript is a language for efficiently describing the textual, layout and formatting content of word processing documents. The InterScript file format is designed to be as simple as possible while still supporting advanced and complex document formatting functions.

An InterScript reader/writer converts InterScript files to popular word processor and other industry-standard document file formats (such as MS Word, HTML, RTF, etc.), and converts those formats back to InterScript. This is useful for software developers whose applications need to read or write formatted documents in these different file formats. By incorporating InterScript read/write technology into their own product, they need to read or write only one file format: InterScript.

1.1 - Standard vs. Customized InterScript

InterScript files represent formatting and control functions by means of mark-up tags. Standard InterScript refers to an implementation of InterScript which uses the standard tags as described in this document, and which does not translate tags or characters in a non-standard fashion.

However, InterScript gives you the flexibility of defining your own tags in place of the standard ones, including changing the tag markers. It also allows customized translation at the character level, for example to accommodate a specific character set or code page. Implementations of InterScript that take advantage of such customization features are called Customized InterScript. Thus, you can tailor the InterScript language to closely resemble a customized mark-up file format that may be peculiar to a particular application.


2 - Basic Components of InterScript

This chapter discusses the basics of InterScript structure, and its two basic components: text and commands.

2.1 - Text Representation

Text is represented in Standard InterScript by printable ASCII characters in the range 32 through 126. Other characters in the file have no significance. Therefore, for example, carriage returns (ASCII code 13) and line feeds (ASCII code 10) may be inserted in Standard InterScript files to improve readability, but they have no semantic significance. Similarly, the tab character (ASCII code 9), formfeed character (ASCII code 12), bell (ASCII code 7), backspace (ASCII code 8), etc., have no semantic significance in Standard InterScript. If such characters are encountered, they are ignored.

It is important to note that unlike standard ASCII files, a line ending in a Standard InterScript file is not an implied space or a word break. If a space is desired following the word just before a line ending, it must be explicitly included in the file.
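
The filtering rule above can be illustrated with a minimal Python sketch. The function name significant_chars is hypothetical, and the printable range is taken as 32-126, the range given in the Non-ASCII Characters section; this is not the vendor's implementation.

```python
def significant_chars(raw: str) -> str:
    """Keep only the characters that carry meaning in Standard InterScript."""
    # Assumption: printable ASCII means codes 32-126. Everything else
    # (CR, LF, tab, formfeed, bell, backspace, ...) is dropped outright;
    # a line ending is NOT turned into a space.
    return "".join(ch for ch in raw if 32 <= ord(ch) <= 126)


sample = "one \r\ntwo\r\nthree"
print(significant_chars(sample))  # one twothree
```

Note that "two" and "three" run together because no explicit space precedes the second line ending.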

In Customized InterScript, any character in the range 1-255 may be assigned special significance. This feature enables the use of non-ASCII codes to represent special functions. It may also be used to assign special significance to printable ASCII characters. For example, the standard ASCII character "<" may be used to represent a hard return. Similarly, characters in the upper 128 set may be assigned to represent frequently used symbols, or to directly support an extended character set such as a DOS code page or the ANSI set.

2.2 - Command Sequences

In Standard InterScript, all document characteristics other than straight ASCII text are represented by tags, also known as command sequences. In Standard InterScript, a command sequence begins with the opening curly parenthesis ({), and terminates with the closing one (}). Any number of command sequences may be entered between the opening and closing parentheses. In Customized InterScript this convention may be changed, and document formatting functions may be implemented by command sequences, by single bytes, or combinations of these.

A command sequence consists of a command word, followed by parameters (also called values or arguments) if necessary. Commands and parameters must be separated from each other by at least one space. Similarly, multiple command sequences must be separated from each other by at least one space. Missing numeric arguments are assumed to be 0, except where noted otherwise in this document.

Here is an example of InterScript text containing command sequences:

In this example, {bold+}this portion is boldfaced{bold-}.
It is also possible to {bold+ ital+}boldface and
italicize{ital- bold-} at the same time.

This Standard InterScript segment, when printed after conversion, may appear something like this:

In this example, this portion is boldfaced. It
is also possible to boldface and italicize at
the same time.

Command words may be entered in upper, lower or mixed case. Case is not significant to InterScript. The only exception is the 2-letter commands representing accented letters, as described later.
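
As an illustration of the structure described in this chapter, the following Python sketch splits a Standard InterScript string into text runs and command groups. The function name tokenize and the token layout are hypothetical. The sketch is simplified: it splits the contents of each group on spaces, so quoted string arguments that contain spaces are not handled.

```python
import re


def tokenize(source: str):
    """Split source into ("text", str) and ("cmd", [words]) tokens."""
    tokens = []
    # Alternate between {...} groups and runs of plain text.
    for match in re.finditer(r"\{([^{}]*)\}|([^{}]+)", source):
        group, text = match.groups()
        if group is not None:
            # Simplification: words are separated by one or more spaces.
            tokens.append(("cmd", group.split()))
        else:
            tokens.append(("text", text))
    return tokens


for token in tokenize("It is {bold+ ital+}both{ital- bold-}."):
    print(token)
```

A fuller reader would also fold command words to one case before lookup, leaving the 2-letter accent commands untouched.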


3 - Command Arguments

As described in the previous chapter, many InterScript commands take one or more arguments. These arguments may be of many different types. This chapter discusses the different types of arguments in general, and how they are specified.

3.1 - Numeric arguments

Some commands require a numeric argument. This argument is simply entered as a number (base 10). For example:

{nunln 2} --- specifies double-underline.

3.2 - Single characters

Some commands require a single character as an argument. This argument is simply entered as the character. For example:

{ovschr /} --- specifies the slash character for overstrike.

3.3 - Character strings

Some commands require a character string argument (e.g., a font name is a character string). If the string is a single simple word (i.e., it does not contain any spaces or curly parentheses), it may be entered as is. Otherwise, it must be entered as a string delimited by single or double quotes. In the latter case, if the delimiter character also occurs inside the string, it should be entered twice within the string.

The leading quote character (`) may be used to enter line or page control characters as follows:
If the leading quote is not followed by one of the characters r, n, t, or f, the quote itself is ignored but the following character is output as is. Therefore, to enter the leading quote itself, it must be entered twice.

If a backslash is the first character of a string, the following character is taken literally.

Here are some examples of commands taking string arguments:

{fontnm Arial} --- The font name string is a simple word.
{fontnm "Times New Roman"} --- The font name contains spaces, so is enclosed in quotation marks.
{fontnm "Typographic ""Classic"" Bold"} --- The font name is Typographic "Classic" Bold.
{asc_lend `n`r} --- ASCII line ending is line-feed followed by carriage-return.
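
The quoting rule for string arguments can be sketched in Python as follows. The function name decode_string_argument is hypothetical. The sketch covers only simple words and quote-delimited strings with doubled delimiters; leading-quote and backslash handling are deliberately left out.

```python
def decode_string_argument(arg: str) -> str:
    """Decode a simple-word or quote-delimited string argument."""
    if len(arg) >= 2 and arg[0] in "'\"" and arg[-1] == arg[0]:
        quote = arg[0]
        # A doubled delimiter inside the string stands for one delimiter.
        return arg[1:-1].replace(quote * 2, quote)
    return arg  # a simple word is used as is


print(decode_string_argument('"Typographic ""Classic"" Bold"'))
```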

3.4 - Measurement values

Some commands take measurement values as one or more arguments. For example, the left margin command requires a value to specify the left margin, as in {lmar 1.5"}. Values may be specified with or without a decimal point, and may additionally be immediately followed by any of the following modifiers entered in upper or lower case. These modifiers denote the unit of measurement, and are called unit descriptors:
The first five modes of measurement are absolute, while the last one is relative to the character size in effect at that point in the document. If no unit descriptor is given, a default is used. In Standard InterScript the default measure is in inches, but this may be changed as described later.

Whether specified using absolute or character-relative measure, these parameters are internally converted to absolute measurements using the current value for character width (horizontal measure) or line height (vertical measure). For example, {lmar 10u} specifies a left margin of 10 characters. Assuming that the character width at the point the command is specified is 0.1" (10 characters per inch), that translates to a left margin of one inch. Later in the document, if the character width is changed (say, to 15 characters per inch), the left margin will stay at one inch because it is internally stored as an absolute value. Character width and Line height values must be specified using absolute measurement, since these values cannot be specified relative to themselves.

Here are some examples of Standard InterScript commands that take measurement values as arguments:

{lmar 12u} --- Set left margin to 12 character positions.
{rmar 1.25"} --- Set right margin at 1.25 inches.
{colgut 144t} --- Set column gutter to 144 twips (1/10 inch).
{rind 2c} --- Set right indent to 2 centimeters.
{lht 12p} --- Set line height to 12 printer points.
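
The conversion to absolute measure described above might be sketched in Python like this. The function name to_inches and its parameters are hypothetical, only the unit descriptors that appear in the examples are covered, and the figure of 72 points per inch is an assumption rather than something this document states.

```python
import re

# Inches per unit for the descriptors used in the examples above.
INCHES_PER_UNIT = {
    '"': 1.0,         # inches
    "t": 1.0 / 1440,  # twips (144 twips = 1/10 inch)
    "c": 1.0 / 2.54,  # centimeters
    "p": 1.0 / 72,    # points; 72 per inch is an assumption
}


def to_inches(value: str, char_width_in: float, default_unit: str = '"') -> float:
    """Convert a horizontal measurement value to absolute inches."""
    match = re.fullmatch(r'(\d+(?:\.\d*)?|\.\d+)(["a-zA-Z]?)', value)
    if match is None:
        raise ValueError(f"not a measurement value: {value!r}")
    number = float(match.group(1))
    unit = match.group(2).lower() or default_unit
    if unit == "u":
        # Character-relative: resolved using the current character width.
        return number * char_width_in
    return number * INCHES_PER_UNIT[unit]


# {lmar 10u} at 10 characters per inch gives a one-inch margin.
print(to_inches("10u", char_width_in=0.1))
```

Because the result is stored as an absolute value, a later change of character width does not move a margin that was set earlier.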

3.5 - Line Spacing values

Line spacing (which is the ratio of inter-line spacing to line height) for various components of a document may be specified using a decimal point, or as a fraction by using the divide (/) operator. For example:

{lsp 3/2} --- Line spacing one and a half.
{lsp 1.5} --- Also one and a half line spacing.
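
Both notations can be read with Python's standard fractions module, as in this minimal sketch. The function name parse_line_spacing is hypothetical.

```python
from fractions import Fraction


def parse_line_spacing(arg: str) -> Fraction:
    """Parse a line spacing value given as a decimal or as a fraction."""
    # Fraction accepts both "3/2" and "1.5" as string input.
    return Fraction(arg)


print(parse_line_spacing("3/2") == parse_line_spacing("1.5"))  # True
```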

3.6 - Horizontal alignment

Horizontal alignment is specified by one of the following arguments:

3.7 - Basic Horizontal Placement arguments

Basic horizontal placement with respect to the currently effective margins is specified by one of the following arguments:

3.8 - Basic Vertical Placement arguments

Basic vertical placement with respect to the currently effective margins is specified by one of the following arguments:

3.9 - Complete Horizontal Placement arguments

Complete horizontal placement takes into account all the details about placing an object (such as an image) with respect to any of the page's horizontal milestones.

The horizontal milestones of a page (denoted here by href) are:
A complete horizontal placement specification can take any of the following forms:

3.10 - Complete Vertical Placement arguments

Complete vertical placement takes into account all the details about placing an object (such as an image) with respect to any of the page's vertical milestones.

The vertical milestones of a page (denoted here by vref) are:
A complete vertical placement specification can take any of the following forms:

3.11 - Color specification

Color is specified by its red, green, blue and opacity components, in that order. The red, green and blue components can range from 0 (minimum) to 255 (maximum), while opacity ranges from 0 (minimum) to 200 (maximum). The values are specified one after the other, separated by the colon character, with no intervening spaces. For example:

{fontco 255:0:0:200} --- Red color with 100% opacity.

Note that an opacity of 0 means no color. Usually, the value 0:0:0:0 is used to denote no color.

An opacity value greater than 200 means no enforcement of color. In this case, the color is not altered, but stays at whatever color has been inherited by the document at that point in the file.
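
The color rules above can be summarized in a short Python sketch. The names Color, parse_color and color_mode, and the three mode labels, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Color(NamedTuple):
    red: int
    green: int
    blue: int
    opacity: int


def parse_color(arg: str) -> Color:
    """Parse a red:green:blue:opacity specification."""
    parts = [int(p) for p in arg.split(":")]
    if len(parts) != 4:
        raise ValueError(f"expected four components: {arg!r}")
    if not all(0 <= c <= 255 for c in parts[:3]):
        raise ValueError(f"color component out of range: {arg!r}")
    return Color(*parts)


def color_mode(color: Color) -> str:
    """Classify a color according to its opacity component."""
    if color.opacity > 200:
        return "inherit"   # no enforcement: keep the inherited color
    if color.opacity == 0:
        return "none"      # no color
    return "explicit"      # apply the given color


print(color_mode(parse_color("255:0:0:200")))  # explicit
```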

3.12 - Line Style Arguments

Line Style arguments describe line segments used in borders that enclose text or other items (such as paragraph or table cell borders), as well as horizontal or vertical lines more generally inserted in a document. A line style is specified by means of a group of characters forming a single word. The constituent letters are as follows:

3.13 - White Space specification

White space in text is the space between words. InterScript recognizes three categories of white space, and each is denoted by a representative letter:
A whitespace specification is a string of one or more of these characters (no intervening spaces), or "n" if none of the three is intended. For example:

{aunln st} --- Also underline spaces and tabs.
{aovstk sti} --- Also strikeout whitespace produced by spaces, tabs and indents.
{aunln n} --- Do not underline any white space.
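
A whitespace specification can be decoded as in the following Python sketch. The letters s, t and i are inferred from the examples above (spaces, tabs, indents). The function name and the category labels are hypothetical.

```python
# Letters inferred from the examples above; labels are illustrative.
WHITESPACE_KINDS = {"s": "spaces", "t": "tabs", "i": "indents"}


def parse_whitespace_spec(spec: str) -> set:
    """Return the set of whitespace categories named by a specification."""
    if spec == "n":
        return set()  # none of the three categories
    return {WHITESPACE_KINDS[letter] for letter in spec}


print(sorted(parse_whitespace_spec("st")))  # ['spaces', 'tabs']
```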

3.14 - Number Style arguments

Some commands indicate the style in which automatic numbers should be generated (e.g., automatic page numbers, footnote or endnote numbers, automatic paragraph numbers, outline tags, etc.). These commands take a single letter as an argument. The case (upper or lower) of the argument is significant as described below:
For example, the Standard InterScript sequence {pgnstyle A} denotes that pages should be numbered using the upper case alphabetic numbering scheme.


4 - Command Translation

InterScript lets you replace a command by a replacement sequence. This replacement sequence is specified as a string of InterScript commands. The InterScript command to effect such a replacement is of the form cmd> where cmd represents the command being replaced. For example, the command sequence {bold+> "shad+ ital+" bold-> "ital- shad-"} indicates that {bold+} should be replaced by the command sequence {shad+ ital+}, while {bold-} should be replaced by {ital- shad-}. This replacement process is called command translation, and is usually done in conversion customization files so that specific document functions may be converted in a special way. The replacement string is called the InterScript command translation string. There are some special cases:
InterScript command translation is non-recursive, so the replacement string may contain the command being replaced without causing infinite recursion. Both self-recursion and cross-recursion are prevented. Therefore, for example, the command string {bold+> "bold+ ital+" bold-> "ital- bold-"} is valid, and causes boldfacing to be replaced by boldface and italics.
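
One way to picture non-recursive translation is a single pass over the commands in which replacement words are emitted without being looked up again. The Python sketch below illustrates that idea under this assumption. The function name translate and the dictionary form of the table are hypothetical, not the format of a conversion customization file.

```python
def translate(commands, table):
    """Apply a translation table once; replacements are not re-translated."""
    out = []
    for cmd in commands:
        # Words of the replacement string are emitted as is, so an entry
        # may mention the command it replaces without looping.
        out.extend(table.get(cmd, cmd).split())
    return out


table = {"bold+": "bold+ ital+", "bold-": "ital- bold-"}
print(translate(["bold+"], table))  # ['bold+', 'ital+']
```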


5 - Special Characters

This chapter discusses how special characters such as mathematical symbols, typographic symbols, non-English characters, accents (such as umlaut, grave, etc.), decorative symbols, etc. are represented in InterScript.

5.1 - Non-ASCII Characters

A character that does not belong to the ASCII range 32-126 (decimal) is represented in Standard InterScript by the command #..., where ... represents one of the following:
ASCII characters in the range 32-126 may also be represented this way. Therefore, curly parentheses (which otherwise have special significance in InterScript) may be represented by their decimal codes 123 and 125 respectively. For example, the text One, [two] and {three} would be represented in Standard InterScript as One, [two] and {#123}three{#125}.
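
The decimal-code form used in the example above can be sketched in Python as follows. Only that form is covered, and the function names escape_braces and unescape_codes are hypothetical.

```python
import re


def escape_braces(text: str) -> str:
    """Replace literal curly parentheses by {#123} and {#125}."""
    return "".join(f"{{#{ord(ch)}}}" if ch in "{}" else ch for ch in text)


def unescape_codes(source: str) -> str:
    """Replace each {#nnn} decimal-code command by its character."""
    return re.sub(r"\{#(\d+)\}", lambda m: chr(int(m.group(1))), source)


print(escape_braces("One, [two] and {three}"))
```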

5.2 - Accented Characters

Accented characters may be represented in InterScript using the #... command described above, or using special two-character commands. In these 2-character commands, the first character represents the accent as follows:
============================
CHAR    Accent Represented
----------------------------
  '     Acute
  ,     Cedilla
  ^     Circumflex
  `     Grave
  /     Stroke
  ~     Tilde
  "     Umlaut
============================
The second character is the letter to be accented. For example, the letter "e-circumflex" may be represented as {^e}. Note that the case (upper or lower) of the second character is significant.
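
When the destination is Unicode text, the two-character commands could be mapped through combining marks, as in this Python sketch. The mapping to Unicode code points is an assumption made for illustration. The stroke accent has no canonical composition in Unicode, so the sketch handles it with a small explicit table covering only o and O.

```python
import unicodedata

# Accent character -> Unicode combining mark (an illustrative mapping).
COMBINING = {
    "'": "\u0301",  # acute
    ",": "\u0327",  # cedilla
    "^": "\u0302",  # circumflex
    "`": "\u0300",  # grave
    "~": "\u0303",  # tilde
    '"': "\u0308",  # umlaut
}
# Stroked letters do not compose from a combining mark; list them directly.
STROKED = {"o": "\u00f8", "O": "\u00d8"}


def accent_command_to_char(cmd: str) -> str:
    """Convert a 2-character accent command such as ^e to one character."""
    if len(cmd) != 2:
        raise ValueError(f"not a 2-character accent command: {cmd!r}")
    accent, letter = cmd
    if accent == "/":
        return STROKED[letter]
    # The case of the letter is preserved, since it is significant.
    return unicodedata.normalize("NFC", letter + COMBINING[accent])


print(accent_command_to_char("^e"))
```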

5.3 - Character Translation

InterScript permits the replacement of any character by any user-defined sequence of text and/or commands. This is particularly useful to replace specific extended characters by a different character, or in fact by any custom text. The command to do this is:

6 - Page Size and Margin Commands

The commands in this chapter relate to page size, and the left, right, top and bottom margins. Note that all margin settings are measured from the nearest page edge.
The following commands determine if the left margin in the converted document should be based on the native setting provided by certain destination applications, or if that formatting should be overridden by that specified in the source document.
The following commands determine if the right margin in the converted document should be based on the native setting provided by certain destination applications, or if that formatting should be overridden by that specified in the source document.

7 - Character Format Commands

Character format commands are those that affect the appearance of characters in a document. This chapter discusses these commands.

7.1 - Print Enhancement

The commands described in this section cause text to be printed or displayed using enhanced functions, or control such enhancements.

The following commands control boldfacing:
The following commands control italics:
The following commands control underlining:
The following commands control overlining (this is like underlining, except that the line(s) are printed above rather than under the emphasized text):
The following commands control highlighting. This is a form of emphasis where the emphasized text prints with a color around it, as if it has been highlighted with a highlighter marker.
The following commands control turning overstriking on or off, and the character used for overstriking. This function is also called strikeout.
The following commands control outline-styled characters:
The following commands control shadow-styled characters:
The following commands control upper-case translation:
The following commands control the appearance of characters in small capitals:
The following commands control the marking of text as inserted. This refers to text that has been inserted in revision mode.
The following commands control the marking of text as deleted. This refers to text that has been deleted in revision mode. It is also referred to as redlined text.
The following commands control the appearance of text in reverse video. This mode is sometimes used to make selected text stand out on the screen.
The following commands control the blinking of text when it is displayed on the screen. They generally have no effect on the printed document.
The following commands control engraved appearance of text.
The following commands control embossed appearance of text.
The following commands control hidden text.
The following Standard InterScript segment serves as an example of the use of print enhancement functions:

We can {bold+}boldface{bold-} and
{ital+}italicize{ital-} text. We can also
{bold+}{ital+}do boldface and italics{ital-}{bold-}
at the same time.

This Standard InterScript segment, when printed after conversion, may appear something like this:

We can boldface and italicize text.
We can also do boldface and italics at the
same time.

7.2 - Superscript & Subscript Commands

Many word processors treat superscript and subscript like other print enhancement functions, in that they may be toggled or turned on or off. InterScript provides more advanced treatment: it supports multiple levels of superscripts and subscripts. For example, a superscript on a superscript is a level-2 superscript. InterScript addresses this by defining a superscript/subscript level called the supsub level. Normal baseline text has a supsub level of 0. A simple superscript has a supsub level of 1, while a simple subscript has a supsub level of -1. Higher-level superscripts have supsub levels of 2, 3, etc., while lower-level subscripts have supsub levels of -2, -3, etc.
For example, the mathematical expression a raised to (n squared) may be represented in Standard InterScript as a{sup+}n{sup+}2{sup- sup-}, or as a{sup+}n{sup+}2{sup}. The function ((a squared) + (b squared)) may be represented simply by a{sup}2{sup} + b{sup}2{sup}.
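
The supsub level of each text run in the first form above can be traced with a short Python sketch. It assumes that {sup+} raises the level by one and {sup-} lowers it by one, which is consistent with that example. The function name and the token layout are hypothetical, and the plain {sup} form is not modeled.

```python
def supsub_levels(tokens):
    """Return (text, supsub level) pairs for a token stream."""
    level = 0
    runs = []
    for kind, value in tokens:
        if kind == "cmd":
            for word in value:
                # Assumption: sup+ raises and sup- lowers the level by one.
                if word.lower() == "sup+":
                    level += 1
                elif word.lower() == "sup-":
                    level -= 1
        else:
            runs.append((value, level))
    return runs


# a{sup+}n{sup+}2{sup- sup-}
tokens = [
    ("text", "a"), ("cmd", ["sup+"]),
    ("text", "n"), ("cmd", ["sup+"]),
    ("text", "2"), ("cmd", ["sup-", "sup-"]),
]
print(supsub_levels(tokens))  # [('a', 0), ('n', 1), ('2', 2)]
```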

7.3 - Fonts

InterScript supports specification of font name, size and color. The various font commands are:

7.4 - Hard and Soft Character Codes

This section describes commands that insert special codes which act like printed characters but carry special significance.

8 - Paragraph Format Commands

Paragraph format commands are those that affect the appearance of paragraphs in a document. This chapter discusses these commands.

8.1 - Paragraph Alignment & Justification

Alignment refers to how the lines in a paragraph are horizontally aligned with respect to each other (i.e., aligned at the left, center or right). Justification refers to whether or not lines are stretched (or compressed) to make them all the same length so that both the left and right margins are even.

The following commands affect paragraph alignment:
The following commands affect justification, and are relevant if the paragraph alignment mode is left-align. A paragraph may also be partially justified by specifying the justification percentage. For example, if a paragraph is 50% justified, the raggedness of the right margin is about half-way between normal ragged right (no justification) and completely even (100% justification). This may be used to create the appearance of unjustified text while at the same time minimizing the unevenness of the right margin.
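
Partial justification can be pictured as interpolating each line's width between its natural width and the full measure. The Python sketch below assumes simple linear interpolation, which is only one plausible reading of "about half-way". The function name and parameters are hypothetical.

```python
def justified_width(natural: float, full: float, percent: float) -> float:
    """Width of a line after partial justification (linear model)."""
    # 0% leaves the line at its natural width; 100% stretches it to full.
    return natural + (full - natural) * percent / 100.0


# A 5.0" line in a 6.0" measure at 50% justification.
print(justified_width(5.0, 6.0, 50))  # 5.5
```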

8.2 - Line spacing, line height and leading

Line spacing refers to the distance between the lines in a paragraph, expressed as a ratio to the size of the characters used in the lines. Line height refers to the actual height of a line. Leading refers to additional space inserted between lines.

The following commands affect line spacing before, within and after a paragraph:
The following commands determine whether the line height is automatically adjusted based on the font size of the characters in a line, or whether it is an absolute value. The absolute line height value is used if automatic line height adjustment is off.
The following commands determine the leading:

8.3 - Indents

An Indent refers to the space inserted between the edge of a line and the margin setting. This may refer to space on the left of a line (left indent) or right (right indent). If the left indent of the first line of a paragraph is less than that of the subsequent lines, it is also referred to as a hanging indent.

The following commands affect the indentation of a paragraph:

8.4 - Paragraph Protection

This section refers to commands that protect parts of a paragraph from soft page breaks.

The following commands prevent widows in a paragraph, which are isolated lines at the start of a paragraph (usually the first one or two lines) appearing at the bottom of a page while the rest of the paragraph appears on the following page:
The following commands prevent orphans in a paragraph, which are isolated lines at the end of a paragraph (usually the last one or two lines) appearing at the start of a page while the rest of the paragraph appears on the preceding page:

8.5 - Paragraph Borders

The following commands affect the style of paragraph borders. A value of 0 means no border. This portion of InterScript is currently under construction.
The following commands affect the thickness of paragraph borders. A value of 0 means a hairline border (i.e., the thinnest border supported on any output device).
The following commands affect the gutters of paragraph borders:
The following commands affect the paragraph border colors:

8.6 - Heading Level

The following commands affect the heading level of a paragraph. The highest level is number 1, next lowest is number 2, etc. Level 1 is usually the title of the entire document (e.g., title of a book). Level 2 usually refers to chapters, level 3 to sections within a chapter, level 4 to sub-sections, and so on. A level of 0 refers to normal paragraphs that are not headings.
The following commands determine if heading-level paragraphs should use the native formatting provided by the destination application, or if that formatting should be overridden by that specified in the source document.

8.7 - Miscellaneous Paragraph Formatting

The following commands affect the color used for the background of a paragraph of text.

9 - Line Ending Commands

This chapter describes commands that insert different kinds of line endings, and result in a move to a new line.

10 - Column Ending Commands

This chapter describes commands that insert different kinds of column endings, and result in a move to a new column.

11 - Page Ending Commands

This chapter describes commands that insert different kinds of page endings, and result in a move to a new page.

12 - Tabs

These commands affect tab settings. InterScript recognizes different kinds of tabs, including centered, right-aligned etc. Some tabs are defined as "character-aligned", i.e., a specific text character lines up at that location. That character is called the alignment character. Decimal tabs are a special case of these, where the alignment character is a decimal point. Some tabs are defined with leaders, i.e., a special character called the leader character is used to fill the white space when positioning to those tab points.

The following commands clear or set tabs:
The following commands specify the alignment character (for character-aligned tabs), and the leader-fill character (for leader tabs).
The following commands turn leaders on or off. When leaders are on, the leader fill character specified in the last {ldrfchr} command is used as a filler when positioning horizontally. Also, any tab defined when leader-fill is turned on is defined to be a leader tab using the current leader character.

13 - Horizontal Positioning

These commands represent "tab functions", enabling horizontal positioning of subsequent text, as well as defining centered, right-aligned, decimal-aligned or character-aligned fields. In all these cases, the argument represents the distance from the left edge of the current paragraph to the position point. Further, if leader fill is turned on (see {ldrf+}), then the most recently defined leader character (see {ldrfchr}) is used to fill the white space when positioning subsequent text.

It is not necessary to set tabs when using these commands. InterScript processors automatically set tabs at the appropriate positions if required to ensure correct positioning.

Centered, right-aligned and character-aligned (of which decimal-aligned is a special case) fields terminate implicitly on any end-of-line code or another horizontal positioning code. Alternatively, they may be explicitly terminated by the {endfld} command described below.
The following commands affect positioning of subsequent text in relation to the current effective margins, which are defined as the horizontal limits of the current paragraph (excluding the left and right indents):
The following commands control whether or not reverse-positioning is allowed to take place. When reverse-positioning is enabled, the destination word processor may backspace or reverse-tab in order to move to the left of the current position:

14 - Headers and Footers

This chapter describes commands that affect document headers and footers. Headers are relatively constant text that is repeated at the top of each page while the header is active. Similarly, footers are repeated at the bottom of each page while active.

Many of these commands work identically to the corresponding commands in the main document (body) text. Note that the commands described in this chapter appear outside headers or footers, but their effect applies to headers and/or footers. In general, most commands that can be applied to body text (e.g., print enhancements, font changes, paragraph formatting, etc.) can also occur inside headers and footers, and have their usual effect on formatting of the header/footer contents. Those commands have not been repeated in this chapter.

14.1 - Header/Footer Margins

The following commands affect header/footer margins, which refer to the distances from the edges of the header or footer to the nearest paper edge.

14.2 - Header/Footer Paragraph Formatting

The following commands affect paragraph alignment and justification within headers/footers.
The following commands affect line spacing and line height within headers/footers:

14.3 - Header/Footer Character Formatting

The following commands affect fonts within headers/footers:

14.4 - Other Header/Footer Commands

The following commands define headers/footers, or turn them on or off:

15 - Footnotes/Endnotes

This chapter describes commands that affect footnotes/endnotes. Many of these commands work identically to the corresponding commands in the main document (body) text.

Note that the commands described in this chapter appear outside footnotes or endnotes, but their effect applies to footnotes/endnotes. In general, most commands that can be applied to body text (e.g., print enhancements, font changes, paragraph formatting, etc.) can also occur inside footnotes and endnotes, and have their usual effect on formatting of the footnote/endnote contents. Those commands have not been repeated in this chapter.

15.1 - Footnote/Endnote Margins

The following commands affect footnote/endnote margins, which refer to the distances from the edges of the footnote or endnote to the nearest paper edge.

15.2 - Footnote/Endnote Paragraph Formatting

The following commands affect paragraph alignment and justification within footnotes and endnotes.
The following commands affect line spacing and line height within footnotes and endnotes: