3.2.10 Character Set Control
-gnatic
Normally GNAT recognizes the Latin-1 character set in source program
identifiers, as described in the Ada Reference Manual. This switch causes
GNAT to recognize alternate character sets in identifiers. c is a
single character indicating the character set, as follows:
- 1: ISO 8859-1 (Latin-1) identifiers
- 2: ISO 8859-2 (Latin-2) letters allowed in identifiers
- 3: ISO 8859-3 (Latin-3) letters allowed in identifiers
- 4: ISO 8859-4 (Latin-4) letters allowed in identifiers
- 5: ISO 8859-5 (Cyrillic) letters allowed in identifiers
- 9: ISO 8859-15 (Latin-9) letters allowed in identifiers
- p: IBM PC letters (code page 437) allowed in identifiers
- 8: IBM PC letters (code page 850) allowed in identifiers
- f: Full upper-half codes allowed in identifiers
- n: No upper-half codes allowed in identifiers
- w: Wide-character codes (that is, codes greater than 255) allowed in identifiers
See Foreign Language Representation for full details on the
implementation of these character sets.
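To see why the choice of identifier character set matters, the following minimal Python sketch decodes the same upper-half byte under several of the character sets listed above. The codec names are Python's own, the helper `decode_byte` is hypothetical, and `str.isalpha()` is only an approximation of "letter allowed in identifiers". It does not reflect how GNAT is implemented; it only shows that one byte value can denote different characters depending on the set in force.

```python
# Illustrative only: Python codec names standing in for some of the
# character sets listed above. This is not GNAT's implementation.
CODECS = {
    "ISO 8859-1 (Latin-1)": "iso8859-1",
    "ISO 8859-2 (Latin-2)": "iso8859-2",
    "ISO 8859-5 (Cyrillic)": "iso8859-5",
    "ISO 8859-15 (Latin-9)": "iso8859-15",
    "IBM PC (code page 437)": "cp437",
    "IBM PC (code page 850)": "cp850",
}


def decode_byte(value, charset):
    """Return the character that the single byte `value` denotes in `charset`."""
    return bytes([value]).decode(CODECS[charset])


if __name__ == "__main__":
    for name in CODECS:
        ch = decode_byte(0xE9, name)
        # isalpha() is a rough stand-in for "is a letter" in this sketch.
        print(f"{name}: U+{ord(ch):04X} letter={ch.isalpha()}")
```

A source file whose identifiers use upper-half bytes therefore only reads as intended when it is compiled with the character set it was written in.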
-gnatWe
Specify the method of encoding for wide characters.
e is one of the following:
- h: Hex encoding (brackets encoding also recognized)
- u: Upper half encoding (brackets encoding also recognized)
- s: Shift/JIS encoding (brackets encoding also recognized)
- e: EUC encoding (brackets encoding also recognized)
- 8: UTF-8 encoding (brackets encoding also recognized)
- b: Brackets encoding only (default value)
For full details on these encoding methods see Wide Character Encodings.
Note that brackets encoding is always accepted, even if one of the other
options is specified, so for example -gnatW8 specifies that both
brackets and UTF-8 encodings will be recognized. The units that are
with'ed directly or indirectly will be scanned using the specified
representation scheme, so if one of the non-brackets schemes is
used, it must be used consistently throughout the program. However,
since brackets encoding is always recognized, it may conveniently be
used in standard libraries, allowing these libraries to be used with
any of the available coding schemes.
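As an illustration of why brackets encoding is portable across all of these schemes, here is a minimal Python sketch of a brackets encoder and decoder. It assumes the form described under Wide Character Encodings, namely a sequence such as ["XXXX"] where each X is a hexadecimal digit (two digits for upper-half codes, four for wide characters, six or eight for larger codes). The function names are hypothetical and the code is not taken from GNAT.

```python
import re

# Assumed form of a brackets sequence: ["XX"], ["XXXX"], ["XXXXXX"] or
# ["XXXXXXXX"], where each X is a hexadecimal digit.
_BRACKETS = re.compile(r'\["((?:[0-9A-Fa-f]{2}){1,4})"\]')


def _replace(match):
    code = int(match.group(1), 16)
    # Leave the sequence untouched if it is outside the Unicode range.
    return chr(code) if code <= 0x10FFFF else match.group(0)


def decode_brackets(text):
    """Replace each brackets sequence in `text` with the character it encodes."""
    return _BRACKETS.sub(_replace, text)


def encode_brackets(text):
    """Encode every non-ASCII character of `text` as a brackets sequence."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)                  # plain ASCII passes through
        elif code <= 0xFF:
            out.append(f'["{code:02X}"]')   # upper-half code
        elif code <= 0xFFFF:
            out.append(f'["{code:04X}"]')   # wide character
        else:
            out.append(f'["{code:06X}"]')   # beyond 16 bits
    return "".join(out)
```

Because the encoded form consists only of ASCII characters, the same bytes can be scanned unchanged whichever of the other encodings is selected.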
If no -gnatW? parameter is present, then the default
representation is normally Brackets encoding only. However, if the
first three characters of the file are 16#EF# 16#BB# 16#BF# (the standard
byte order mark or BOM for UTF-8), then these three characters are
skipped and the default representation for the file is set to UTF-8.
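The byte order mark rule above can be sketched in a few lines of Python. This is a minimal illustration of the case where no explicit -gnatW? parameter is given; the function name `default_wide_encoding` and the returned labels are hypothetical, not part of GNAT.

```python
# The three bytes 16#EF# 16#BB# 16#BF# that form the UTF-8 byte order mark.
UTF8_BOM = b"\xef\xbb\xbf"


def default_wide_encoding(source_bytes):
    """Return (encoding_label, remaining_bytes) when no explicit switch is given.

    If the file starts with the UTF-8 BOM, the BOM is skipped and the
    default becomes UTF-8; otherwise the default is brackets encoding.
    """
    if source_bytes.startswith(UTF8_BOM):
        return "UTF-8", source_bytes[len(UTF8_BOM):]
    return "brackets", source_bytes
```

For example, calling `default_wide_encoding(b"\xef\xbb\xbfprocedure P;")` yields the label `"UTF-8"` together with the source text minus its first three bytes.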
Note that the wide character representation that is specified (explicitly
or by default) for the main program also acts as the default encoding used
for Wide_Text_IO files if not specifically overridden by a WCEM form