NSGMLS(1)

NAME
       nsgmls - a validating SGML parser

       An SGML System Conforming to
       International Standard ISO 8879 --
       Standard Generalized Markup Language

SYNOPSIS
       nsgmls  [  -BCdeglprsuv  ]  [  -alinktype  ]  [ -bbctf ] [
       -csysid ] [ -Ddirectory ] [ -Emax_errors ] [  -ffile  ]  [
       -iname ] [ -ooutput_option ] [ sysid...  ]

DESCRIPTION
       Nsgmls parses and validates the SGML document whose document
       entity is specified by the system identifiers sysid...  and
       prints on the standard output a simple ASCII representation
       of its Element Structure Information Set (ESIS).  (This is
       the information set which a structure-controlled conforming
       SGML application should act upon.)  If more than one system
       identifier is
       specified,  then  the  corresponding entities will be con-
       catenated to form the document entity. Thus  the  document
       entity  may  be spread amongst several files; for example,
       the SGML declaration, prolog  and  document  instance  set
       could  each  be  in a separate file.  If no system identi-
       fiers are specified, then nsgmls will  read  the  document
       entity  from  the  standard  input.  A command line system
       identifier of - can also be used to refer to the  standard
       input.   (Normally in a system identifier, <osfd>0 is used
       to refer to standard input.)

       The following options are available:

       -alinktype
              Make link type linktype active. Not all ESIS infor-
              mation  is output in this case: the active LPDs are
              not  explicitly  reported,   although   each   link
              attribute  is  qualified  with  its link type name;
              there is no information about result elements; when
              there  are  multiple  link  rules applicable to the
              current element, nsgmls always chooses the first.

       -bbctf Use the BCTF named bctf for output.

       -B     Batch mode. Parse each sysid...  specified  on  the
              command  line separately, rather than concatenating
              them. This is useful mainly with -s.
              If -tfilename is also specified, then the specified
              filename  will be prefixed to the sysid to make the
              filename for the RAST result for each sysid.

       -csysid
              Map public identifiers and entity names  to  system
              identifiers using the catalog entry file whose sys-
              tem identifier is sysid.  Multiple -c  options  are
              allowed.  If  there  is a catalog entry file called
              catalog in the same place as the  document  entity,
              it  will  be  searched  for immediately after those
              specified by -c.

       -C     The sysid...  arguments  specify  catalog  files
              rather  than  the  document  entity.  The  document
              entity is specified by the first DOCUMENT entry  in
              the catalog files.

       -Ddirectory
              Search  directory  for  files  specified  in system
              identifiers.  Multiple -D options are allowed.  See
              the  description  of the osfile storage manager for
              more information about file searching.

       -e     Describe open entities in  error  messages.   Error
              messages  always  include  the position of the most
              recently opened external entity.

       -Emax_errors
              Nsgmls  will  exit  after  max_errors  errors.   If
              max_errors is 0, there is no limit on the number of
              errors. The default is 200.

       -ffile Redirect errors to file.   This  is  useful  mainly
              with  shells  that  do  not  support redirection of
              stderr.

       -g     Show the generic identifiers of  open  elements  in
              error messages.

       -iname Pretend that
                     <!ENTITY % name "INCLUDE">
              occurs  at  the start of the document type declara-
              tion  subset  in  the   document   entity.    Since
              repeated definitions of an entity are ignored, this
              definition will take precedence over any other def-
              initions of this entity in the document type decla-
              ration.  Multiple -i options are allowed.   If  the
              declaration replaces the reserved name INCLUDE then
              the new reserved name will be the replacement  text
              of  the entity.  Typically the document type decla-
              ration will contain
                     <!ENTITY % name "IGNORE">
              and will use %name; in the status keyword  specifi-
              cation  of  a  marked section declaration.  In this
              case the effect of the option will be to cause  the
              marked section not to be ignored.

       -o output_option
              Output  additional  information  according  to out-
              put_option:
              L commands giving the current line number and filename.

       -p     Parse only the prolog.  Nsgmls will exit after parsing
              the document type declaration.  Implies -s.

       -r     Warn about defaulted references.

       -s     Suppress  output.   Error  messages  will  still be
              printed.

       -u     Warn about undefined elements: elements used in the
              DTD  but  not  defined.   Also warn about undefined
              short reference maps.

       -v     Print the version number.

   Entity Manager
       An external entity resides in  one  or  more  files.   The
       entity manager component of sgmls maps a sequence of files
       into an entity in three sequential stages:

       1.     each carriage return character  is  turned  into  a
              non-SGML character;

       2.     each  newline character is turned into a record end
              character, and at the  same  time  a  record  start
              character  is  inserted  at  the  beginning of each
              line;

       3.     the files are concatenated.
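
       The three stages above can be sketched roughly as follows.
       This is only an illustration, not the parser's own code: the
       function names are hypothetical, the record start and record
       end codes are those of the reference concrete syntax (10 and
       13), the character used for NON_SGML is an arbitrary stand-in,
       and the treatment of a final line that lacks a newline is an
       assumption.

```python
RS = "\x0a"        # record start (code 10 in the reference concrete syntax)
RE = "\x0d"        # record end (code 13 in the reference concrete syntax)
NON_SGML = "\x00"  # assumption: arbitrary stand-in for "a non-SGML character"


def file_to_records(text):
    """Apply stages 1 and 2 to the contents of one file."""
    # Stage 1: every carriage return becomes a non-SGML character.
    text = text.replace("\r", NON_SGML)
    # Stage 2: every newline becomes RE, and RS is inserted at the
    # beginning of each line.
    pieces = text.split("\n")
    records = [RS + line + RE for line in pieces[:-1]]
    tail = pieces[-1]  # text after the last newline, possibly empty
    if tail:
        # Assumption: an unterminated last line still gets a record start.
        records.append(RS + tail)
    return "".join(records)


def build_entity(file_contents):
    """Stage 3: concatenate the transformed files into one entity."""
    return "".join(file_to_records(text) for text in file_contents)
```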

       A system identifier is interpreted as a list of  filenames
       separated by colons.  A filename of - can be used to refer
       to the standard input.  If no system  identifier  is  sup-
       plied,  then the entity manager will attempt to generate a
       filename using the public identifier (if there is one) and
       other  information  available to it.  Notation identifiers
       are not subject to this treatment.  This process is controlled
       by the environment variable SGML_PATH, which contains a
       colon-separated list of filename templates.  A filename
       template is a filename that may contain substitution fields;
       a substitution field is a % character followed by a single
       letter that indicates the value of the
       substitution.  If SGML_PATH uses the %S field  (the  value
       of  which  is the system identifier), then the entity man-
       ager will also use SGML_PATH to generate a filename when a
       system identifier that does not contain any colons is sup-
       plied.  The value of a substitution can either be a string
       or it can be null.  The entity manager transforms the list
       of filename templates into a list of filenames by  substi-
       tuting for each substitution field and discarding any tem-
       plate that contained a substitution field whose value  was
       null.   It  then  uses  the  first resulting filename that
       exists and is readable.  Substitution  values  are  trans-
       formed  before  being  used for substitution: firstly, any
       names that were subject to  upper  case  substitution  are
       folded  to  lower  case;  secondly,  space  characters are
       mapped to underscores and slashes are mapped to  percents.
       The  value of the %S field is not transformed.  The values
       of substitution fields are as follows:

       %%     A single %.

       %D     The entity's data content notation.  This substitu-
              tion  will succeed only for external data entities.

       %N     The entity, notation or document type name.

       %P     The public identifier if there was a public identi-
              fier, otherwise null.

       %S     The system identifier if there was a system identifier,
              otherwise null.

       %X     (This is provided  mainly  for  compatibility  with
              ARCSGML.)  A three-letter string chosen as follows:
                                         |            |
                                         |            | With public identifier
                                         |            +-------------+-----------
                                         | No public  |   Device    |  Device
                                         | identifier | independent | dependent
              ---------------------------+------------+-------------+-----------
              Data or subdocument entity | nsd        | pns         | vns
              General SGML text entity   | gml        | pge         | vge
              Parameter entity           | spe        | ppe         | vpe
              Document type definition   | dtd        | pdt         | vdt
              Link process definition    | lpd        | plp         | vlp
              The device dependent version  is  selected  if  the
              public text class allows a public text display ver-
              sion but no public text display version was  speci-
              fied.

       %Y     The  type  of thing for which the filename is being
              generated:
              SGML subdocument entity    sgml
              Data entity                data
              General text entity        text
              Parameter entity           parm
              Document type definition   dtd
              Link process definition    lpd

       The value of the following  substitution  fields  will  be
       null unless a valid formal public identifier was supplied.

       %A     Null if the text identifier in  the  formal  public
              identifier  contains an unavailable text indicator,
              otherwise the empty string.

       %C     The public text class, mapped to lower case.

       %E     The  public  text  designating   sequence   (escape
              sequence) if the public text class is CHARSET, oth-
              erwise null.

       %I     The empty string if the  owner  identifier  in  the
              formal  public  identifier  is an ISO owner identi-
              fier, otherwise null.

       %L     The public text language,  mapped  to  lower  case,
              unless  the  public text class is CHARSET, in which
              case null.

       %O     The owner identifier (with the +//  or  -//  prefix
              stripped.)

       %R     The  empty  string  if  the owner identifier in the
              formal public  identifier  is  a  registered  owner
              identifier, otherwise null.

       %T     The public text description.

       %U     The  empty  string  if  the owner identifier in the
              formal public identifier is an  unregistered  owner
              identifier, otherwise null.

       %V     The public text display version.  This substitution
              will be null if the  public  text  class  does  not
              allow a display version or if no version was speci-
              fied.  If an empty version was specified,  a  value
              of default will be used.
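
       The expansion of SGML_PATH described in this section can be
       sketched as below.  The function names and the dictionary of
       field values are hypothetical; a value of None stands for
       null, and the empty string is an ordinary (non-null) value.
       The sketch assumes the caller has already folded to lower
       case any names that were subject to upper case substitution.

```python
import os


def transform(value):
    """Map spaces to underscores and slashes to percents."""
    return value.replace(" ", "_").replace("/", "%")


def expand_templates(sgml_path, values):
    """Turn a colon-separated template list into candidate filenames.

    `values` maps a field letter (for example "N" or "S") to a string,
    or to None when the substitution is null.
    """
    filenames = []
    for template in sgml_path.split(":"):
        out = []
        usable = True
        i = 0
        while i < len(template):
            ch = template[i]
            if ch == "%" and i + 1 < len(template):
                field = template[i + 1]
                i += 2
                if field == "%":
                    out.append("%")  # %% is a single %
                    continue
                value = values.get(field)
                if value is None:
                    usable = False  # discard templates with a null field
                    break
                # The value of %S is not transformed.
                out.append(value if field == "S" else transform(value))
            else:
                out.append(ch)
                i += 1
        if usable:
            filenames.append("".join(out))
    return filenames


def first_readable(filenames):
    """Return the first filename that exists and is readable, else None."""
    for name in filenames:
        if os.path.isfile(name) and os.access(name, os.R_OK):
            return name
    return None
```

       For instance, with the template list
       "/usr/lib/sgml/%O/%C/%T:%N.%X:%S" and a null %X, the second
       template is dropped and only the first and third yield
       candidate filenames.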

   System declaration
       The system declaration for sgmls is as follows:
                          SYSTEM "ISO 8879:1986"
                                  CHARSET
       BASESET  "ISO 646-1983//CHARSET
                 International Reference Version (IRV)//ESC 2/5 4/0"
       DESCSET  0 128 0
       CAPACITY PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
                                 FEATURES
       MINIMIZE DATATAG NO  OMITTAG  YES   RANK     NO  SHORTTAG YES
       LINK     SIMPLE  NO  IMPLICIT NO    EXPLICIT NO
       OTHER    CONCUR  NO  SUBDOC   YES 1 FORMAL   YES
       SCOPE    DOCUMENT
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Core//EN"
                                 VALIDATE
                GENERAL YES MODEL    YES   EXCLUDE  YES CAPACITY YES
                NONSGML YES SGML     YES   FORMAL   YES
                                   SDIF
                PACK    NO  UNPACK   NO

       The  memory usage of sgmls is not a function of the capac-
       ity points used by a document; however, sgmls  can  handle
       capacities significantly greater than the reference capac-
       ity set.

       In some environments, higher values may be  supported  for
       the SUBDOC parameter.

       Documents  that do not use optional features are also sup-
       ported.  For example, if FORMAL NO  is  specified  in  the
       declaration, public identifiers will not be required to be
       valid formal public identifiers.

       Certain parts of the concrete syntax may be changed:
              The shunned character numbers can be changed.
              Eight bit characters can be assigned  to  LCNMSTRT,
              UCNMSTRT,  LCNMCHAR  and  UCNMCHAR.  Declaring this
              requires that the syntax reference character set be
              declared like this:
                     BASESET   "ISO Registration Number 100//CHARSET
                                ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
                     DESCSET   0 256 0
              Uppercase substitution can be performed or not per-
              formed both for entity names and for other names.
              Either short reference delimiters assigned  by  the
              reference  delimiter  set  or  no  short  reference
              delimiters are supported.
              The reserved names can be changed.
              The quantity set can be  increased  within  certain
              limits  subject  to  there  being sufficient memory
              available.  The upper limit on NAMELEN is 239.  The
              upper  limits on ATTCNT, ATTSPLEN, BSEQLEN, ENTLVL,
              LITLEN, PILEN, TAGLEN, and  TAGLVL  are  more  than
              thirty  times  greater  than  the reference limits.
              The upper limit on GRPCNT, GRPGTCNT, and GRPLVL  is
              253.   NORMSEP  cannot  be  changed.   DTAGLEN  and
              DTEMPLEN are irrelevant since sgmls does not support
              the DATATAG feature.

   SGML declaration
       The SGML declaration may be omitted, in which case the
       following declaration will be implied:
                             <!SGML "ISO 8879:1986"
                                     CHARSET
       BASESET  "ISO 646-1983//CHARSET
                 International Reference Version (IRV)//ESC 2/5 4/0"
       DESCSET    0  9 UNUSED
                  9  2  9
                 11  2 UNUSED
                 13  1 13
                 14 18 UNUSED
                 32 95 32
                127  1 UNUSED
       CAPACITY PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
       SCOPE    DOCUMENT
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
                                    FEATURES
       MINIMIZE DATATAG NO OMITTAG  YES          RANK     NO  SHORTTAG YES
       LINK     SIMPLE  NO IMPLICIT NO           EXPLICIT NO
       OTHER    CONCUR  NO SUBDOC   YES 99999999 FORMAL   YES
                                  APPINFO NONE>
       with the exception that characters 128 through 254 will be
       assigned  to  DATACHAR.  When exporting documents that use
       characters in this range, an accurate description  of  the
       upper  half  of the document character set should be added
       to this declaration.   For  ISO  Latin-1,  an  appropriate
       description would be:
       BASESET   "ISO Registration Number 100//CHARSET
                  ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
       DESCSET   128 32 UNUSED
                 160 95 32
                 255  1 UNUSED
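
       Each DESCSET entry above is a triple: the first described
       character number, the number of characters, and either the
       corresponding base set character number or UNUSED.  As a
       rough illustration of how such triples map document character
       numbers onto the base set, consider this sketch (the function
       name and the tuple representation are hypothetical):

```python
def expand_descset(triples):
    """Expand DESCSET triples into {document char: base char or None}.

    Each triple is (start, count, base), where base is an integer base
    set character number or the string "UNUSED".
    """
    mapping = {}
    for start, count, base in triples:
        for offset in range(count):
            if base == "UNUSED":
                mapping[start + offset] = None
            else:
                mapping[start + offset] = base + offset
    return mapping


# The ISO Latin-1 upper-half description shown above.
latin1_upper = [(128, 32, "UNUSED"), (160, 95, 32), (255, 1, "UNUSED")]
upper_half = expand_descset(latin1_upper)
```

       Here document characters 160 through 254 map to base set
       characters 32 through 126, and 128 through 159 and 255 are
       unused.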

   Output format
       The output is a series of lines.  Lines can be arbitrarily
       long.  Each line consists of an initial command  character
       and  one  or more arguments.  Arguments are separated by a
       single space, but when a command takes a fixed  number  of
       arguments  the last argument can contain spaces.  There is
       no space between  the  command  character  and  the  first
       argument.   Arguments  can  contain  the  following escape
       sequences.

       \\     A \.

       \n     A record end character.

       \|     Internal SDATA entities are bracketed by these.

       \nnn   The character whose code is nnn octal.

       A record start character  will  be  represented  by  \012.
       Most  applications  will need to ignore \012 and translate
       \n into newline.
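
       A consumer of this output might decode the escape sequences
       along the following lines.  This is a sketch only: the
       function name and the sdata_marker parameter are hypothetical,
       and it assumes that \nnn always carries exactly three octal
       digits.  Following the advice above, \012 is dropped and \n
       becomes a newline.

```python
import re

# A backslash followed by: another backslash, "n", "|", or three octal digits.
_ESCAPE = re.compile(r"\\(\\|n|\||[0-7]{3})")


def decode_arg(text, sdata_marker=""):
    """Decode the escape sequences in one argument.

    `sdata_marker` replaces each \\| bracket around internal SDATA
    entities; by default the brackets are simply removed.
    """
    def replace(match):
        token = match.group(1)
        if token == "\\":
            return "\\"
        if token == "n":
            return "\n"          # record end, translated to newline
        if token == "|":
            return sdata_marker  # SDATA bracket
        if token == "012":
            return ""            # record start, ignored
        return chr(int(token, 8))

    return _ESCAPE.sub(replace, text)
```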

       The possible command characters and arguments are as  fol-
       lows:

       (gi    The start of an element whose generic identifier is
              gi.  Any attributes for this element will have been
              specified with A commands.

       )gi    The end of an element whose generic identifier is gi.

       -data  Data.

       &name  A reference to an external data entity  name;  name
              will have been defined using an E command.

       ?pi    A processing instruction with data pi.

       Aname val
              The  next  element  to  start has an attribute name
              with value val which takes  one  of  the  following
              forms:
              IMPLIED
                     The value of the attribute is implied.
              CDATA data
                     The  attribute  is  character data.  This is
                     used for attributes whose declared value  is
                     CDATA.
              NOTATION nname
                     The attribute is a notation name; nname will
                     have been defined using an N command.   This
                     is  used for attributes whose declared value
                     is NOTATION.
              ENTITY name...
                     The attribute is a list  of  general  entity
                     names.   Each  entity  name  will  have been
                     defined using an I, E or S command.  This is
                     used  for attributes whose declared value is
                     ENTITY or ENTITIES.
              TOKEN token...
                     The attribute is a list of tokens.  This  is
                     used  for attributes whose declared value is
                     anything else.

       Dename name val
              This is the same as the A command, except  that  it
              specifies  a  data attribute for an external entity
              named ename.  Any D commands will come after the  E
              command  that  defines  the  entity  to  which they
              apply, but before any & or A commands  that  refer-
              ence the entity.

       Nnname Define a notation nname.  This command will be preceded
              by a p command if the notation was declared with a
              public identifier, and by an s command if the
              notation was declared with a system identifier.   A
              notation will only be defined if it is to be refer-
              enced in an E command or in an  A  command  for  an
              attribute with a declared value of NOTATION.

       Eename typ nname
              Define  an  external  data  entity named ename with
              type typ (CDATA, NDATA or SDATA) and notation nname.
              This command will be preceded by one or more f commands
              giving the filenames generated by the entity manager
              from the system and public identifiers, by a p command
              if a public identifier was declared for the entity, and
              by an s command if a system identifier was declared for
              the entity.  nname will have been defined using an N
              command.  Data attributes
              may be specified for the entity using  D  commands.
              An  external data entity will only be defined if it
              is to be referenced in a & command or in an A  com-
              mand  for  an  attribute  whose  declared  value is
              ENTITY or ENTITIES.

       Iename typ text
              Define an internal data  entity  named  ename  with
              type typ (CDATA or SDATA) and entity text text.  An
              internal data entity will only be defined if it  is
              referenced  in  an A command for an attribute whose
              declared value is ENTITY or ENTITIES.

       Sename Define a subdocument entity named ename.  This com-
              mand  will  be  preceded  by one or more f commands
              giving the filenames generated by the  entity  man-
              ager from the system and public identifiers, by a p
              command if a public identifier was declared for the
              entity, and by an s command if a system  identifier
              was declared for the entity.  A subdocument  entity
              will  only  be  defined  if it is referenced in a {
              command or in an A command for an  attribute  whose
              declared value is ENTITY or ENTITIES.

       ssysid This  command applies to the next E, S or N command
              and specifies the associated system identifier.

       ppubid This command applies to the next E, S or N  command
              and specifies the associated public identifier.

       ffilename
              This command applies to the next E or S command and
              specifies an associated filename.   There  will  be
              more than one f command for a single E or S command
              if the system identifier used a colon.

       {ename The start of the SGML subdocument entity ename; ename
              will have been defined using an S command.

       }ename The end of the SGML subdocument entity ename.

       Llineno file
       Llineno
              Set  the  current  line  number  and filename.  The
              filename argument will be omitted if only the  line
              number  has  changed.   This will be output only if
              the -l option has been given.

       #text  An APPINFO parameter of text was specified  in  the
              declaration.   This  is  not  strictly  part of the
              ESIS, but  a  structure-controlled  application  is
              permitted  to act on it.  No # command will be out-
              put if APPINFO NONE was  specified.   A  #  command
              will  occur  at most once, and may be preceded only
              by a single L command.

       C      This command indicates that the document was a
              conforming SGML document.  If this command is output,
              it will be the last command.  An SGML document is not
              conforming if it references a subdocument entity that
              is not conforming.
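
       To illustrate how the commands fit together, here is a
       minimal reader for a subset of this output.  The function
       name and the event tuples are hypothetical; the sketch only
       shows that A commands accumulate until the next ( command,
       and it leaves argument escape sequences undecoded.  All
       other commands are passed through unchanged.

```python
def parse_esis(lines):
    """Turn output lines into a list of simple event tuples."""
    events = []
    pending_attrs = {}  # attributes announced for the next element
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # No space separates the command character from its first argument.
        cmd, rest = line[0], line[1:]
        if cmd == "A":
            # "Aname val", where val is IMPLIED or "<kind> <data>".
            name, _, val = rest.partition(" ")
            kind, _, data = val.partition(" ")
            pending_attrs[name] = (kind, data)
        elif cmd == "(":
            events.append(("start", rest, pending_attrs))
            pending_attrs = {}
        elif cmd == ")":
            events.append(("end", rest))
        elif cmd == "-":
            events.append(("data", rest))
        elif cmd == "?":
            events.append(("pi", rest))
        elif cmd == "C":
            events.append(("conforming",))
        else:
            # Entity, notation, identifier and line commands are kept as-is.
            events.append(("other", cmd, rest))
    return events
```

       A fuller reader would also attach pending s, p and f commands
       to the next E, S or N command in the same way that attributes
       are attached to the next element here.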

BUGS
       Some non-SGML characters in literals are  counted  as  two
       characters  for the purposes of quantity and capacity cal-
       culations.

SEE ALSO
       The SGML Handbook, Charles F. Goldfarb
       ISO 8879 (Standard Generalized Markup Language), International
       Organization for Standardization

ORIGIN
       ARCSGML was written by Charles F. Goldfarb.

       Sgmls   was   derived   from   ARCSGML   by   James  Clark
       jjc@jclark.com, to whom bugs should be reported.
