
2. Standards for representation of Hebrew characters

2.1 ASCII

To make one thing clear, for once and forever: There is no such thing as 8-bit ASCII. ASCII is only 7 bits. Any 8-bit code is not ASCII, but that doesn't mean it's not standard. ISO-8859-8 is standard, but not ASCII. Thanks!
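One rough way to see whether a file really is 7-bit ASCII is to count the bytes that have the eighth bit set. This is only a sketch: the function name count_high_bytes is made up for this example, and LC_ALL=C is an added precaution so that tr treats the input as raw bytes.

```shell
#!/bin/sh
# Count the bytes on stdin that have the eighth bit set (values 128-255).
# A result of 0 means the input is pure 7-bit ASCII.
# count_high_bytes is a hypothetical helper name, not a standard tool.
count_high_bytes() {
    # Delete every 7-bit byte (octal 000-177) and count what is left.
    # $(( ... )) strips any padding that wc puts around the number.
    echo $(( $(LC_ALL=C tr -d '\000-\177' | wc -c) ))
}

printf 'plain text\n' | count_high_bytes          # prints 0
printf 'abc \340\341\342\n' | count_high_bytes    # prints 3
```

Anything above 0 means the file uses some 8-bit code, for example ISO-8859-8, and is not ASCII.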

2.2 DOS Hebrew

The Hebrew encoding starts at 128 (decimal) for Aleph, so the encoding requires 8 bits. This is the table found in the hardware fonts in the video card EPROM, and all of the Hebrew DOS-based editors (Qtext, HED, etc.) use it.

2.3 ISO Hebrew

The Hebrew encoding starts at 224 (decimal) for Aleph. This is ISO-8859-8: the Internet standard, the international standard, and basically the standard for MS-Windows and for Macintoshes (Dagesh, etc.).

2.4 OLD PC Hebrew

This is 7-bit and obsolete, as it occupies essentially the same ASCII range as the English lowercase letters, so it is best avoided. However, when some ignorant Unix mail program strips the eighth bit off ISO Hebrew, this is what you get: a jumble of English letters in place of the Hebrew part of your message, mixed in with the regular English (reversed or not). You will then need to transform it to DOS or ISO Hebrew. If there was English mixed in with the Hebrew, this is a sad situation, as you will get either Hebrew plus jumble, or English plus jumble...
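The arithmetic behind the stripped eighth bit can be sketched in a few lines of shell. The helper name strip_high_bit is made up for this example; 224 is the ISO code for Aleph given above, and 250 is assumed here to be the last letter (Tav), 26 positions later.

```shell
#!/bin/sh
# Clear the eighth bit of a byte value, as a 7-bit mail path would.
# strip_high_bit is a hypothetical helper name used only for this sketch.
strip_high_bit() {
    echo $(( $1 & 127 ))
}

# ISO Aleph (224) and ISO Tav (250, assumed end of the 27-letter range).
for code in 224 250; do
    stripped=$(strip_high_bit "$code")
    printf '%d -> %d (octal %o)\n' "$code" "$stripped" "$stripped"
done
# prints:
# 224 -> 96 (octal 140)
# 250 -> 122 (octal 172)
```

Codes 96 through 122 are the backquote and the letters a through z, which is why the stripped Hebrew shows up as lowercase English.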

2.5 Conversions

Here are some simple scripts to convert from each standard to the other:

DOS - ISO:      tr '\200-\232' '\340-\372' < {dos_file} > {iso_file}
ISO - DOS:      tr '\340-\372' '\200-\232' < {iso_file} > {dos_file}
OLD - DOS:      tr '\140-\172' '\200-\232' < {old_Hebrew_file} > {dos_file}

NOTE: The numbers used by tr are in octal!
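As a rough sanity check of these ranges, the sketch below wraps the conversions in shell functions and dumps the results in octal with od. The function names are made up for this illustration. The old_to_iso mapping is not in the list above; it is an extrapolation that assumes old PC Hebrew is simply ISO Hebrew with the eighth bit cleared (octal 140-172). LC_ALL=C is an added precaution so that tr works on raw bytes.

```shell
#!/bin/sh
# Hypothetical wrapper names; the tr ranges are the ones discussed above.
dos_to_iso() { LC_ALL=C tr '\200-\232' '\340-\372'; }
iso_to_dos() { LC_ALL=C tr '\340-\372' '\200-\232'; }
# Assumption: old PC Hebrew occupies octal 140-172.
old_to_iso() { LC_ALL=C tr '\140-\172' '\340-\372'; }

# Print stdin as octal byte values with all spacing removed.
octal_dump() { od -An -t o1 | tr -d ' \n'; echo; }

# First and last letter of the DOS range, converted to ISO.
printf '\200\232' | dos_to_iso | octal_dump                # prints 340372
# Round trip: DOS to ISO and back again.
printf '\200\232' | dos_to_iso | iso_to_dos | octal_dump   # prints 200232
# First and last byte of the old 7-bit range, converted to ISO.
printf '\140\172' | old_to_iso | octal_dump                # prints 340372
```

Keep in mind that old_to_iso also turns every English lowercase letter in the input into Hebrew, which is exactly the jumble problem described in section 2.4.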

