STGuru User's Guide



Conversions Between Simplified Chinese and Traditional Chinese

 

1. Intelligent High-Precision Code Conversions Between Simplified Chinese and Traditional Chinese


1.1 The Leading Intelligent High-Precision Code Conversion Engine of STGuru
1.2 Edit Area Conversions, Clipboard Conversions, File/Web Page/Directory/Site Conversions
    1.2.1 Edit Area Conversions
    1.2.2 Clipboard Conversions
    1.2.3 File/Web Page/Directory/Site Conversions
    1.2.4 File/Folder Batch Conversions
1.3 How to Convert Files in Special Formats Such as Word, Excel, PowerPoint, Access, Trados TM...?

1.4 Real-Time/Dynamic Conversions, Command Line Conversion Interface, Batch Conversions

1.5 Unicode-Related Conversions: Conversions Between Unicode/Unicode BE and GBK, Big5 and UTF-8

1.6 Intelligent Word Adjustment
1.7 How to Perform Perfect Conversions by STGuru?
1.8 Background Knowledge: Simplified Chinese, Traditional Chinese, GBK, GB2312, Big5, UTF-8, Unicode and Unicode BE

 

1.1 The Leading Intelligent High-Precision Code Conversion Engine of STGuru


STGuru is the pioneer in high-precision code conversion between Simplified Chinese and Traditional Chinese. A leading feature of this program is that it helps ensure perfect conversion quality without manual modifications after conversion.

 

Great efforts were made to increase the precision of code conversion. Most individual characters in the GBK, GB2312 and Big5 tables were reviewed manually and with great care, aided by specially designed programs and procedures. Many related standards and a vast volume of modern text were checked again and again to make sure the conversion results are correct and appropriate.

 

In addition, we carefully designed and tuned a unique word adjustment engine and a series of word adjustment libraries, which help users reach what they ultimately expect from a code conversion engine: perfection.

 

STGuru has a unique word adjustment pack management system. It can be used to tune a word adjustment pack for a specific code conversion project. With a well-maintained word adjustment pack for your conversion task, you can easily keep conversion quality perfect. For more about word adjustment, please refer to 1.6 Intelligent Word Adjustment.

 

The Unicode code set is the prevailing code set today. It is larger than the GBK code set and contains it, so some Unicode characters cannot be found in the GBK code set. Just as STGuru moved beyond the GB2312 code set to the GBK code set many years ago, it now embraces the Unicode code set. We have improved the conversion engine, and it now supports perfect Unicode conversions between UTF-8, Unicode and Unicode BE in Clipboard Conversions and File/Web Page/Directory/Site Conversions. These conversions are now executed by the brand-new Unicode conversion engine.

 

The code conversion engine of STGuru has been specially optimized and is fast compared with other programs of its kind. The actual conversion speed depends on the performance of your computer and on the number of applied items in the working word adjustment pack.

 

STGuru provides a full range of code conversion services (clipboard conversion, edit area conversion and file/web page/directory/site conversion), as well as professional guidelines for converting documents in special formats such as MS Word, Excel, PowerPoint, Access and Trados TM, all at consistent professional quality.

 

1.2 Edit Area Conversions, Clipboard Conversions, File/Web Page/Directory/Site Conversions
 

These conversions work on text in the edit areas or the clipboard, on individual files/web pages, or on whole directories/sites. They support professional conversion between any valid combination of Simplified Chinese (CHS) or Traditional Chinese (CHT) with GBK, Big5 (CHT only), UTF-8, Unicode and Unicode BE.

 

 

Pic UG-1-1 All code conversion commands can be found under the Code Conversion submenu.

 

 

Pic UG-1-2 Edit area code conversion and file/site code conversion buttons on the upper toolbar. The two buttons in the left circles are for edit area code conversions (GBK->Big5 and Big5->GBK). The default direction for edit area conversion is from the upper edit area to the lower edit area. You can also set the direction to Upper -> Upper if you like (see Pic UG-5-4).

 

 

Pic UG-1-3 A code conversion progress bar is shown when you convert more than 100 KB of text in the clipboard or an edit area.

 

 

1.2.1 Edit Area Conversions

 

Edit area conversion converts text between Simplified Chinese and Traditional Chinese in the upper edit area. You can perform conversion with the "Edit Area Conversion" panel, which provides the most comprehensive conversion options. You can convert between any valid combination of Simplified Chinese and Traditional Chinese in GBK, Big5 and UTF-8. In addition to this panel, you can perform edit area conversions with the relevant commands on the submenu of the Code Conversion menu and with buttons on the upper or lower toolbars.

 

As edit area conversion only converts text in the upper (or source) edit area, such commands are available only when you have text in the upper edit area.

 

 

Pic UG-1-4 You can open this "Edit Area Conversion" panel with "Edit Area Conversion" under the "Code Conversion" menu. As the source text is located in the upper edit area, this panel can be opened only when there is text in the upper edit area.

 

1) Convert with the "Edit Area Conversion" Panel

 

Steps

This panel can be opened with the "Edit Area Conversion" command under the "Code Conversion" menu. The shortcut key is Ctrl+Q. Please refer to Pic UG-1-4 to view this panel.

 

Functions and Notes

Source Location: Selected text in the upper edit area (available only when some text is selected in the upper edit area) or all text in the upper edit area.

Target Location: Upper edit area (not recommended), lower edit area or clipboard. If the target location is the upper edit area, the source text will be overwritten, which is why this option is not recommended. If it is the clipboard, an additional option is available that first converts the result into Unicode and then pastes it into the clipboard.

Conversion Direction: The source code and the target code can each be any one of the following:

 

    * Simplified Chinese (GBK)

    * Simplified Chinese (UTF-8)

    * Traditional Chinese (Big5)

    * Traditional Chinese (GBK)

    * Traditional Chinese (UTF-8)

 

Since the source and target must differ, there are 5*4=20 possible directions.
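The count follows from treating a direction as an ordered pair of two different entries from the list above. A minimal Python sketch that enumerates them; the list simply copies the five entries, and nothing here is an STGuru API:

```python
from itertools import permutations

# The five source/target codes listed for edit area conversions.
EDIT_AREA_CODES = [
    "Simplified Chinese (GBK)",
    "Simplified Chinese (UTF-8)",
    "Traditional Chinese (Big5)",
    "Traditional Chinese (GBK)",
    "Traditional Chinese (UTF-8)",
]

# A direction is an ordered (source, target) pair with source != target,
# which is exactly what permutations(..., 2) yields.
directions = list(permutations(EDIT_AREA_CODES, 2))

print(len(directions))  # 20
```

The same reasoning gives 9*8=72 for the nine codes offered by clipboard and file/directory conversions.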

 

2) Perform the most common edit area conversions with commands in the menu or on the toolbar

 

Please refer to Pic UG-1-1. You can perform the most common edit area conversions with the submenu commands at the bottom of the Code Conversion menu. In addition, the buttons AA->U/A->U, G->B and B->G on the upper toolbar and AA->U/A->U on the lower toolbar are also for edit area conversions. To see a button's description, move the mouse onto the button and read the tip text. The Edit Area Conversion Options in the submenu can be set to "Upper Area -> Upper Area" or "Upper Area -> Lower Area". This setting applies specifically to edit area conversions performed with menu or toolbar commands.

 

Edit area conversion is visual text conversion. You can convert text within the upper edit area or send the result to the lower edit area. In most cases, we suggest displaying the result in the lower edit area, so you can check the conversion result while keeping the original text in the upper area.

 

As shown in Pic UG-1-3, if the text is larger than 100 KB, STGuru displays a real-time progress bar during conversion. STGuru's conversion engine itself is fast; however, if the text is very large, STGuru may take considerable time to load the result into the edit area.

 

 

 

Pic UG-1-5 After an edit area conversion, the text display language is changed automatically so that text is shown in the proper language. In this example, after a Big5->GBK conversion, the language of the upper edit area was changed to Traditional Chinese and the language of the lower edit area was changed to Simplified Chinese.

 

1.2.2 Clipboard Conversions

 

 

Pic UG-1-6 You can open this "Clipboard Conversion" panel with "Clipboard Conversion" under the "Code Conversion" menu. It can be opened only when there is text in the clipboard.

 

Clipboard conversion converts text between Simplified Chinese and Traditional Chinese in the clipboard. You can perform conversion with the "Clipboard Conversion" panel, which provides the most comprehensive conversion options. You can convert between any valid combination of Simplified Chinese and Traditional Chinese in GBK, Big5, UTF-8, Unicode and Unicode BE. In addition to this panel, you can perform clipboard conversions with the relevant commands on the submenu of the Code Conversion menu and with buttons on the upper or lower toolbars.

 

As clipboard conversion only converts text in the clipboard, such commands are available only when you have text in the clipboard.

 

1) Convert with the "Clipboard Conversion" Panel

 

Steps

This panel can be opened with the "Clipboard Conversion" command under the "Code Conversion" menu. The shortcut key is Ctrl+J. Please refer to Pic UG-1-6 to view this panel.

 

Functions and Notes

The source code and the target code can each be any one of the following:

 

    * Simplified Chinese (GBK)

    * Simplified Chinese (UTF-8)

    * Simplified Chinese (Unicode)

    * Simplified Chinese (Unicode BE)

    * Traditional Chinese (Big5)

    * Traditional Chinese (GBK)

    * Traditional Chinese (UTF-8)

    * Traditional Chinese (Unicode)

    * Traditional Chinese (Unicode BE)

 

Since the source and target must differ, there are 9*8=72 possible directions.

 

2) Perform the most common clipboard conversions with clipboard conversion commands in the menu or on the toolbar

 

Please refer to Pic UG-1-1. You can perform the most common clipboard conversions with the submenu commands at the bottom of the Code Conversion menu. In addition, the buttons G->U (GBK->Unicode), U->G (Unicode->GBK), B->U (Big5->Unicode) and U->B (Unicode->Big5) on the upper and lower toolbars are also for clipboard conversions. U->G and G->U are available when the code of the corresponding edit area is GBK, while U->B and B->U are available when the code of the corresponding edit area is Big5.

 

As with edit area conversion, if the text is larger than 100 KB, STGuru displays a real-time progress bar during conversion.

 

1.2.3 File/Web Page/Directory/Site Conversions

 

 

Pic UG-1-7 UI of the File/Web Page/Directory/Site Conversion dialog box

 

 

Pic UG-1-8 Site conversion in progress. Statistics are shown in the caption of STGuru's main window, so if you are converting a large site, you can do other work in the meantime and still monitor the status of the site conversion from the statistics on the taskbar. You can pause or stop the site conversion by clicking the image button shown in the white circle.

 

File/web page conversion and directory/site conversion share the same dialog box. You can open it with the "File/Web Page/Directory/Site Conversion" command on the "Code Conversion" menu, or with the "File/Web Page/Directory/Site Conversion" button on the upper toolbar (the rightmost button in Pic UG-1-2). If you choose to convert "A file/web page", the settings apply to a file or a web page; if you choose to convert "A directory/site", the settings apply to a directory or a site. For both options, the source code and the target code can each be any one of the following:

 

    * Simplified Chinese (GBK)

    * Simplified Chinese (UTF-8)

    * Simplified Chinese (Unicode)

    * Simplified Chinese (Unicode BE)

    * Traditional Chinese (Big5)

    * Traditional Chinese (GBK)

    * Traditional Chinese (UTF-8)

    * Traditional Chinese (Unicode)

    * Traditional Chinese (Unicode BE)

 

Since the source and target must differ, there are 9*8=72 possible directions.

 

1. File/Web Page Conversions

 

When you choose to convert "A file/web page", the setting will be that for a file or a web page.

 

If you check the option "Result in a new file/web page", the destination file must be different from the source file, which protects the original file from accidental overwriting. Otherwise, the destination file is the source file itself, and the original file will be overwritten at the end of the conversion.

 

If the file you want to convert is a web page and it has a language mark, the language mark will be converted automatically, so the resulting web page can show properly according to the new language mark. For example, if you convert a Traditional Chinese web page in Big5 (and its original language mark is Big5) into Simplified Chinese in UTF-8, the language mark in the target web page will be changed into UTF-8.
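How STGuru rewrites the language mark is not described here. As a rough sketch of the idea, assuming the "language mark" is the charset declared in a `<meta>` tag, the Python below rewrites that declaration and re-encodes the page. It deliberately leaves out the Simplified/Traditional character mapping; `retarget_charset` and `reencode_page` are hypothetical names.

```python
import re

# Matches the value after "charset=" in either
#   <meta charset="big5">  or
#   <meta http-equiv="Content-Type" content="text/html; charset=big5">
CHARSET_RE = re.compile(
    r'(<meta\b[^>]*?charset\s*=\s*["\']?)([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def retarget_charset(html: str, new_charset: str) -> str:
    """Replace the charset declared in <meta> tags with new_charset."""
    # A function replacement avoids backslash surprises in the new value.
    return CHARSET_RE.sub(lambda m: m.group(1) + new_charset, html)


def reencode_page(data: bytes, src_codec: str, dst_codec: str, dst_label: str) -> bytes:
    """Decode, relabel and re-encode a page (no Simplified/Traditional mapping here)."""
    text = data.decode(src_codec)
    return retarget_charset(text, dst_label).encode(dst_codec)
```

For example, `retarget_charset('<meta charset="big5">', "UTF-8")` returns `'<meta charset="UTF-8">'`. A regular expression is enough for this illustration; a production tool would more likely use an HTML parser.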

 

You can click the file icon at the right end of the path edit box to read the source file or the converted result.

 

2. Directory/Site Conversions

 

When you choose to convert "A directory/site", the setting will be that for a directory or a site.

 

If you check the option "Result in a new directory/site", the destination directory/site must be empty or must not exist yet. This protects you from accidentally overwriting an existing directory. If you convert to a new site, you can also copy non-text files to the destination site. This is useful because non-text files are usually images, compressed files or other resources referenced by the converted web pages.

 

While performing a directory/site conversion, STGuru automatically checks whether each file is a text file. A file is converted only if STGuru verifies it as a text file; otherwise it is left unconverted. These non-text files are usually integral parts of the directory/site, such as pictures, compressed files or other resources linked from the web pages. If you check the relevant option, they are copied to the corresponding locations in the new site.
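The guide does not say how STGuru decides whether a file is text. For illustration only, here is a common heuristic in Python (an assumption, not STGuru's method): a byte-order mark means text, a NUL byte means binary, and otherwise only a small share of unusual control bytes is tolerated. `is_probably_text` and `file_is_probably_text` are hypothetical names.

```python
# Byte-order marks for UTF-8, UTF-16 LE ("Unicode") and UTF-16 BE ("Unicode BE").
BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")
# Control bytes that legitimately occur in text: TAB, LF, FF, CR and ESC.
TEXT_CONTROLS = {9, 10, 12, 13, 27}


def is_probably_text(head: bytes) -> bool:
    """Guess whether a file is text from its first bytes (heuristic, not STGuru's rule)."""
    if head.startswith(BOMS):
        return True   # UTF-16 text contains NUL bytes, so check for a BOM first
    if b"\x00" in head:
        return False  # NUL bytes practically never occur in GBK, Big5 or UTF-8 text
    stray = sum(1 for b in head if b < 32 and b not in TEXT_CONTROLS)
    return stray * 100 <= len(head)  # tolerate up to about 1% stray control bytes


def file_is_probably_text(path: str, sample_size: int = 8192) -> bool:
    """Read only the first sample_size bytes of the file and classify them."""
    with open(path, "rb") as f:
        return is_probably_text(f.read(sample_size))
```

Note that a UTF-16 file without a BOM would be classified as binary by this sketch; a real tool needs more checks than this.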

 

The language marks of all web pages are automatically changed to match the conversion target. For example, if the conversion direction is from Simplified Chinese (GBK) to Traditional Chinese (UTF-8), the language mark of the web pages in the destination folder will be changed from GB2312 to UTF-8.

 

1.2.4 File/Folder Batch Conversions

 

A batch conversion pack can be seen as a macro made up of conversion and setting commands; it converts a list of files and folders in sequence, each with its own conversion settings. You can specify a conversion pack for any one or more steps, and you can separately specify the source/target paths, direction and conversion options for each file/folder conversion. This feature is very convenient if you often need to convert files and folders in multiple rounds with different settings in each round. Some notes:

  • You can find this feature under the "Code Conversion" menu. Its shortcut key is Ctrl+Shift+B.

  • Each macro (batch conversion pack) has macro properties (such as title, author, version, description, path…) and a list of commands. There are three types of commands: "Set Conversion Pack", "Convert File/Folder" and "File System Utilities". An option lets you restore the original conversion pack in the current instance of STGuru after a batch conversion is complete.

  • In a "Convert File/Folder" command, you can further choose whether to convert a file or a folder. If you convert a folder, an additional option lets you overwrite the target folder if it already exists. This option helps a batch conversion with many commands run smoothly without prompts partway through.

  • You can manage commands in the list individually or in batch, with operations such as Add, Delete, Modify, Cut, Copy, Paste, Move Up and Move Down, and you can set or cancel the Apply status of any command. A command that is not applied stays in the list but is skipped during the batch conversion. Multi-select is supported for all operations except Add and Modify.

  • Drag and drop of files and folders is supported. You can drop multiple files/folders into the command list at once. When a conversion pack is dropped, STGuru adds a Set Conversion Pack command automatically; when a text file or a folder is dropped, it adds a Convert File/Folder command with the "Result in new..." option UNCHECKED, so the conversion result is saved directly in the original location. You can also drag and drop a conversion pack or another file/folder while editing an individual command, with the same result. This feature works on Windows 2000, XP and 2003, but is not supported on Windows Vista or 7.

  • The dialog box has an "Edit Source File" button; clicking it opens the source file of the current batch conversion pack so you can edit it directly. When you save your changes, STGuru prompts you to load them into the dialog box. This is convenient for editing a large command list freely and in detail.

  • The UI can be resized; when it is resized, the layout of the controls, the width of the description column and the length of the description text are optimized automatically. This is especially convenient on large monitors.

  • You can save a pack you are satisfied with as a plain-text file, and load any previously saved pack.

  • The File System Utilities commands include Run Program, Open, New, Cut, Copy, Paste, Create Shortcut, Delete, Rename/Move and Properties. These commands help advanced users perform auxiliary file system operations during a batch conversion. Most users do not need them.
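To make the macro semantics above concrete (ordered commands, an Apply flag that skips a command without deleting it, and the option to restore the original conversion pack), here is a hypothetical Python model. The class names, field names and command labels are invented for illustration; STGuru's actual pack file format is not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Command:
    kind: str                                  # hypothetical: "set_pack", "convert" or "fs_utility"
    args: dict = field(default_factory=dict)
    applied: bool = True                       # unapplied commands stay listed but are skipped


@dataclass
class BatchPack:
    title: str
    commands: list = field(default_factory=list)
    restore_original_pack: bool = True


def run_batch(pack: BatchPack, current_pack: str, convert: Callable[[dict, str], None]):
    """Run applied commands in list order; return the pack in effect afterwards and a log."""
    original_pack = current_pack
    log = []
    for cmd in pack.commands:
        if not cmd.applied:
            continue
        if cmd.kind == "set_pack":
            current_pack = cmd.args["pack"]
        elif cmd.kind == "convert":
            convert(cmd.args, current_pack)
        # "fs_utility" commands are only logged in this sketch.
        log.append((cmd.kind, current_pack))
    if pack.restore_original_pack:
        current_pack = original_pack
    return current_pack, log


# Usage: the unapplied "Hong Kong" step is skipped, so both conversions use "IT".
calls = []
pack = BatchPack("Site update", [
    Command("set_pack", {"pack": "IT"}),
    Command("convert", {"src": "docs", "direction": "GBK->Big5"}),
    Command("set_pack", {"pack": "Hong Kong"}, applied=False),
    Command("convert", {"src": "news", "direction": "GBK->Big5"}),
])
final_pack, log = run_batch(pack, "Default", lambda args, p: calls.append((args["src"], p)))
print(calls)       # [('docs', 'IT'), ('news', 'IT')]
print(final_pack)  # Default
```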

1.3 How to Convert Files in Special Formats Such as Word, Excel, PowerPoint, Access, Trados TM...?


STGuru is dedicated to professional-quality conversion for and between key codes such as GBK, Big5, UTF-8 and Unicode. It does not directly operate on files in other special formats such as Word, Excel, PowerPoint, Access or Trados TM. However, STGuru provides well-organized procedures, indirect but still efficient, to help you convert these files. Once you are familiar with these procedures, converting a file in any of these special formats at professional quality usually takes only about 1 minute more than the basic conversion itself.

 

Fundamentals of These Conversions:

1) Export from the special format to XML (that is, UTF-8 encoded XML) or to another format that STGuru can recognize, keeping complete or nearly complete formatting information.

2) Perform a professional-quality conversion between Simplified Chinese and Traditional Chinese with STGuru on that format.

3) If necessary, post-process the result by such means as batch replace. (This step can often be skipped.)

4) Import the result of steps 2 and 3 and save it in the proper format.

5) If necessary, do some simple processing. (This step can often be skipped.)

 

Further Details:

For conversions for Word, Excel and PowerPoint, and general rules in conversions for files in special formats, please refer to “How to convert MS Word, Excel, Access... and files in many other non-txt formats?” in the FAQ.

For Access conversion, please refer to “I have an Access database to convert” in the FAQ.

For conversion for Trados TM, please refer to “I am a translator. I want to convert a Trados TM from the pair English -> Simplified Chinese into the pair English -> Traditional Chinese. How can I do?” in the FAQ.

 

1.4 Real-Time/Dynamic Conversions, Command Line Conversion Interface, Batch Conversions
 

Real-Time/Dynamic Conversions and Command Line Conversion Interface

 

Some developers, website owners and computer geeks interested in command line operations expressed interest in a real-time, dynamic conversion interface that a third-party program could call to perform real-time, dynamic conversions. We developed a standalone product, STGS (an ActiveX-based product), for this purpose. However, experience has shown that a command line interface provided directly by STGuru is more convenient to use and maintain. Therefore, we recently developed the command line feature. It lets a third-party program call STGuru through command line commands to perform real-time file/directory conversions. This feature is available only in STGuru Standard Edition. More details can be found in:

 

A.3 Command Line Guide (Only for STGuru Standard Edition)

 

Batch Conversion

 

1) Folder conversion may be more suitable for you than "batch conversion"

 

Some users expressed interest in "Batch Conversion" because they have some files to convert and others that cannot or do not need to be converted, and they find it inconvenient to pick the proper files one by one. However, most of these users ask for it because they have not fully realized how powerful the File/Web Page/Directory/Site Conversions module is (for simplicity, we call it Folder Conversions or Directory Conversions).

 

Please note that STGuru's Directory Conversions module converts all levels of subfolders and recognizes text files automatically during conversion. It converts only text files. At the same time, it can be configured to copy non-text files, such as Word documents, image files or compressed files, to the same locations in the destination folder. Therefore, if you have a site in Simplified Chinese (a site is usually a folder) to be converted to Traditional Chinese, you only need to convert this folder. When the folder conversion is complete, all link relations remain valid and non-text files are copied to the relevant locations, so the site is still in one piece. Since a folder conversion handles all of this, why would you need "Batch Conversion"?
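The folder conversion behaviour described above (walk every level of subfolders, convert text files, copy other files to the same relative location) can be sketched in Python as follows. `convert_tree`, `convert_file` and `is_text` are hypothetical names; the actual conversion and text detection are passed in as functions because STGuru's internals are not documented here.

```python
import shutil
from pathlib import Path
from typing import Callable


def convert_tree(src, dst, convert_file: Callable[[Path, Path], None],
                 is_text: Callable[[Path], bool], copy_non_text: bool = True):
    """Convert text files and copy other files, mirroring src's layout under dst."""
    src, dst = Path(src), Path(dst)
    converted, copied = [], []
    for path in sorted(src.rglob("*")):    # every level of subfolders
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel                  # same relative location in the destination
        if is_text(path):
            target.parent.mkdir(parents=True, exist_ok=True)
            convert_file(path, target)
            converted.append(rel)
        elif copy_non_text:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)      # images, archives, etc. are copied unchanged
            copied.append(rel)
    return converted, copied
```

Because every file keeps its relative path, relative links between pages and resources still resolve in the destination folder. In a real run, `convert_file` would perform the Simplified/Traditional conversion and update the language mark.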

 

If you have more than one file or directory to be converted, you can copy them to the same folder and convert them as a whole. It is also convenient.

 

Therefore, "Batch Conversion" is not really necessary for general users. The existing folder conversion feature is usually good enough for them.

 

2) Realize batch conversion by the command line interface

 

Some special users do need batch conversions. We suggest that these users buy STGuru Standard Edition, whose command line feature can be used to realize batch conversion. You only need to read the sample batch file (you can find the download URL of the sample pack by clicking the link below), replace the source/target files and source/target directories with your own, and run the finished batch file. A batch file can contain any number of lines, each of which is a command line command. You can set different conversion parameters for each conversion command and switch between conversion packs in real time.

 

For more details, please read:

 

A.3 Command Line Guide (Only for STGuru Standard Edition)

 

1.5 Unicode-Related Conversions: Conversions Between Unicode/Unicode BE and GBK, Big5 and UTF-8


When you copy text from a web page on Windows NT/2000/XP/2003/Vista, or edit text in Notepad under Windows XP, you are working with Unicode data. Sometimes the text you copy from a web page into STGuru or another program is shown as ????. This is because the text you pasted is not GBK or Big5 text; it is Unicode.
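One common way the ???? appears, shown here as a Python illustration rather than a description of Windows internals: when Unicode text is converted to a code page that lacks its characters, each unmappable character is replaced with "?". `to_ansi` is a hypothetical helper.

```python
def to_ansi(text: str, codepage: str) -> bytes:
    """Encode Unicode text for a non-Unicode program; unmappable characters become '?'."""
    return text.encode(codepage, errors="replace")


print(to_ansi("中文", "gbk"))     # b'\xd6\xd0\xce\xc4'  (GBK has both characters)
print(to_ansi("中文", "cp1252"))  # b'??'  (a Western European code page has neither)
```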

 

STGuru supports various Unicode-related conversions as clipboard conversions:

Major Conversions

    * Unicode "Simplified Chinese" <-> Unicode "Traditional Chinese": Converts directly between Simplified Chinese and Traditional Chinese in Unicode format. Use this when you only need to work within or between web pages, Notepad or other Unicode programs on NT/2000/XP/2003/Vista.

    * Unicode <-> GBK/Big5: Use these when you are moving data between a Unicode program (such as a web page or Notepad) and a non-Unicode program (such as STGuru).

    * Unicode BE <-> GBK/Big5: For processing Unicode BE data.

    * Unicode / Unicode BE <-> UTF-8: For processing UTF-8 data.
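To show what these conversions change at the byte level, here is a Python sketch that maps each form named above to a standard Python codec and re-encodes between them. It assumes "Unicode" means UTF-16 little-endian and "Unicode BE" UTF-16 big-endian, as in Windows Notepad; `CODECS` and `reencode` are hypothetical names. It changes only the encoding; the Simplified/Traditional character mapping is a separate step.

```python
# STGuru's names for each form, mapped to Python codec names. "Unicode" follows the
# Windows Notepad convention (UTF-16 little-endian); "Unicode BE" is UTF-16 big-endian.
CODECS = {
    "GBK": "gbk",
    "Big5": "big5",
    "UTF-8": "utf-8",
    "Unicode": "utf-16-le",
    "Unicode BE": "utf-16-be",
}


def reencode(data: bytes, src: str, dst: str) -> bytes:
    """Change only the encoding; no Simplified/Traditional character mapping is done."""
    return data.decode(CODECS[src]).encode(CODECS[dst])


# The same character 中 (U+4E2D) in each form:
for name, codec in CODECS.items():
    print(name, "中".encode(codec).hex(" "))
# GBK d6 d0 / Big5 a4 a4 / UTF-8 e4 b8 ad / Unicode 2d 4e / Unicode BE 4e 2d

print(reencode(b"\x2d\x4e", "Unicode", "Big5").hex(" "))  # a4 a4
```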

 

Shortcut Conversion Buttons

Since most users are interested only in conversions between Unicode and ANSI in GBK/Big5, the 3 most frequently used pairs of commands are also provided as buttons on the upper/lower toolbars. The first pair also appears on its own on the Code Conversion menu; the other two pairs are already part of the major conversion commands above. For these buttons, the ANSI code is the current code of the corresponding edit area.

 

"Clipboard Conversion": Selection to Unicode / Text to Unicode

The first pair consists of two slightly different commands. If some text is selected in the specified edit area, the left button appears; it converts the selected text in this edit area to Unicode and puts the Unicode result in the clipboard, from which it can be pasted into a Unicode editor. If nothing is selected in the specified edit area, the right button appears; it converts all text in the edit area to Unicode and puts the result in the clipboard. During the conversion, the text is treated as text in the current display language.

 

Clipboard Conversion: ANSI to Unicode / Unicode to ANSI

The next two pairs of commands are pure clipboard conversions from ANSI to Unicode and from Unicode to ANSI. Text in the specified edit area is not involved in the conversion, but the language of the edit area is used as the language of the ANSI text. If the code of the corresponding edit area is GBK, the commands appear as G->U / U->G; if it is Big5, they appear as B->U / U->B.

 

1.6 Intelligent Word Adjustment


STGuru provides a word adjustment mechanism to further increase the precision of code conversion.

 

The result of code conversion with word adjustment can be much better than without it. 

1. Some characters in Simplified Chinese can be mapped to 2 or more characters in Traditional Chinese, depending on the context. Likewise, some characters in Traditional Chinese can be mapped to 2 or more characters in Simplified Chinese.

2. Many modern terms for the same thing differ between Simplified Chinese and Traditional Chinese. For example, "modem" is 调制解调器 in Simplified Chinese but 數據機 in Traditional Chinese.

The word adjustment mechanism solves both kinds of problems perfectly. In addition, you can use word adjustment to change any other word you want, at a suitable point in the conversion.
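A toy Python illustration of why word adjustment is needed: a character-by-character map must pick one target for each character (here 发 always becomes 發), so 头发 ("hair") comes out wrong, and no character map can turn one term into a completely different one. The tiny character map and library are illustrative assumptions, not STGuru's data; as with the libraries described below, the pairs are applied to the already-converted text.

```python
# Tiny, illustrative subset of a GBK->Big5 character map (one choice per character).
CHAR_MAP = {"头": "頭", "发": "發", "现": "現", "调": "調"}

# Hypothetical word adjustment library: (before, after) pairs on the converted text.
ADJUSTMENT_LIB = [
    ("頭發", "頭髮"),          # 发 means "hair" here, so 髮 is correct, not 發
    ("調制解調器", "數據機"),  # a different term for "modem" in Traditional Chinese
]


def convert_chars(text: str) -> str:
    """Character-level conversion: unmapped characters pass through unchanged."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)


def adjust_words(text: str, library) -> str:
    """Apply the (before, after) pairs in list order."""
    for before, after in library:
        text = text.replace(before, after)
    return text


for source in ["发现头发", "调制解调器"]:
    print(adjust_words(convert_chars(source), ADJUSTMENT_LIB))
# 發現頭髮
# 數據機
```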

 

 

Pic UG-1-9 The general Word Adjustment UI opens when you click the "Advanced Word Adjustment" command under the Code Conversion submenu. The list and buttons in the top line are used to manage word adjustment packs. A word adjustment pack stores the configuration of all GBK->Big5 libraries and all Big5->GBK libraries for a specific conversion task, so you can tune it up to perfect conversion quality. You can save the pack, and edit or reapply it whenever you want. The four buttons at the bottom left, next to the help button, are tools for merging libraries, checking/fixing repeated items in the library list, and finding a word in a specific library or in the whole library list. When you move the mouse onto an item in this UI, a tip for it is shown at the bottom of the dialog box.

 

 

Pic UG-1-10 Editing a word adjustment library. You can change every aspect of a word adjustment library here. A library is applied in word adjustment only when "Apply this library" is checked. The word list may not display correctly if the language of your OS differs from the language of the library, but the two words of any selected pair are shown correctly in the edit boxes under "Before Adjustment" and "After Adjustment". If you don't like editing the library word by word in this dialog box, you can click the file icon to the right of the bottom of the list to open the library in another instance of STGuru. When you finish editing the library file, you will be prompted to update the library from the modified file. You can validate the library or automatically fix validation problems with the two tools in the circle at the bottom of the dialog box.

 

There are two groups of word adjustment libraries - Big5 libraries and GBK libraries. Big5 libraries are used in word adjustment at the end of a code conversion from GBK to Big5; GBK libraries are used in word adjustment at the end of a code conversion from Big5 to GBK.

 

6 system libraries and some custom libraries (IT/Windows/Network professional libraries) are provided in each group in the STGuru installation pack. You cannot delete a system library, but you can modify its content; however, this is not advised unless you are an expert. Some typical examples follow. Anasoft keeps improving STGuru's CHS-CHT conversion engine and will update the default libraries and packs to make them more useful, without advance notice.

 

System Libraries:

Base Lib: it's provided to solve the problems of character difference as described before in this section.

Phrase Difference Adjustment Foundation (PDAF) Lib: it's provided to fix the problem of different phrases.

Phrase Difference Adjustment Foundation 2 (PDAF2) Lib: it's provided to fix the problem of different phrases created or still not solved by PDAF Lib.

Final Fix Lib: applying many word adjustment libraries can sometimes introduce new problems along the way; you can fix any such problem here.

Final Fix Lib2: fixes problem phrases introduced by, or not yet fixed in, the original "Final Fix Lib".

Final Fix Lib3: fixes problem phrases introduced by, or not yet fixed in, "Final Fix Lib2".

 

Custom Libraries:

IT Lib: fixes IT-related problems.

IT Lib2-4: fix IT-related terms introduced by, or not yet fixed in, the original IT Lib.

Windows: fixes terms typical of Microsoft Windows operating systems.

Windows2: fixes Windows-related terms introduced by, or not yet fixed in, the original Windows Lib.

Network: fixes typical network-related terms.

Network2: fixes network-related terms introduced by, or not yet fixed in, the original Network Lib.

 

Custom Final Fix Libraries:

Hong Kong Lib: Word adjustment lib for the conversion of Hong Kong style Chinese.

 

One thing you should know about the adjustment libraries: all libraries are applied in sequence from top to bottom in the library list. So if you find that a problem arises from some library, you can fix it in a library below it.
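Because libraries run from top to bottom, a library lower in the list sees, and can undo or refine, the output of the ones above it. A minimal Python sketch of that ordering; the library contents are invented for illustration (including a custom final-fix pair like the "\gb2312\" -> "\big5\" example in 1.7), and plain string replacement stands in for STGuru's actual matching.

```python
# Each library: (name, applied, list of (before, after) pairs). Contents are hypothetical.
LIBRARIES = [
    ("IT Lib",           True, [("軟件", "軟體")]),
    ("Hong Kong Lib",    True, [("軟體", "軟件")]),   # lower library overrides the IT Lib
    ("My Final Fix Lib", True, [("\\gb2312\\", "\\big5\\")]),
]


def apply_libraries(text: str, libraries) -> str:
    """Apply every applied library from top to bottom; later libraries see earlier results."""
    for _name, applied, pairs in libraries:
        if not applied:
            continue
        for before, after in pairs:
            text = text.replace(before, after)
    return text


print(apply_libraries("軟件 ..\\gb2312\\abc.htm", LIBRARIES))      # 軟件 ..\big5\abc.htm
print(apply_libraries("軟件", LIBRARIES[:1]))                      # 軟體
```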

 

For more details about word adjustment, please refer to A.2 Word Adjustment Engine Customization Walkthroughs.

 

1.7 How to Perform Perfect Conversions by STGuru?


STGuru has everything you need to perform perfect code conversion for any purpose; the only limitation is your imagination. Here is some advice on how to use it.

1) For small, general-purpose conversions, apply the "IT" pack if the topic is IT-related, and the "Default" pack if it is not closely related to IT. These two packs are usually enough for small general-purpose conversions.

2) Create a special word adjustment pack for each of your important code conversion projects. You can edit and save all configurations for your word adjustment libraries in this pack. When you hope to perform the conversion, just apply this pack and begin your conversion.

3) Use the packs provided by STGuru appropriately (advanced instructions):

a) Familiarize yourself with the libraries provided by STGuru: the 6 system libraries, several custom libraries and several custom final fix libraries (there are two series of these libraries, one for GBK->Big5 and one for Big5->GBK). Make sure you know what each library is designed for.

b) It is a good idea to always apply all 6 system libraries; they are applied in the default settings.

c) Apply the IT-related libraries when converting IT-related texts, and do not apply them if your conversion project is not related to IT.

d) The installation package includes several custom final fix libraries. You can apply any of them that suits your topic.

4) Create one or two levels of custom final fix libraries, for Big5->GBK conversions, GBK->Big5 conversions or both, within your own word adjustment pack. They can be used for two purposes:

a) Add your custom fix word pairs within your custom final fix libraries.

b) Handle differences in links, or anything else that always requires manual work after code conversion. For example, instead of manually changing links such as "...\gb2312\abc.htm" to "...\big5\abc.htm" after every conversion, you can add the word fix pair "\gb2312\"->"\big5\" to your custom final fix library so that the links are fixed during code conversion. If a term is hard to fix for some reason, or a new problem arises after you apply some library, you can also fix it in your custom final fix library.
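As a minimal illustration of item b), the sketch below treats a custom final fix library as a hypothetical list of (source, target) pairs applied with plain string replacement. The function name apply_final_fix and the sample link are made up for the example and are not part of STGuru.

```python
# Hypothetical custom final fix library holding the link fix pair described above.
# Backslashes are doubled inside ordinary Python string literals.
CUSTOM_FINAL_FIX = [
    ("\\gb2312\\", "\\big5\\"),
]


def apply_final_fix(text: str, pairs: list[tuple[str, str]]) -> str:
    """Apply each fix pair to the converted text, in list order."""
    for src, dst in pairs:
        text = text.replace(src, dst)
    return text


if __name__ == "__main__":
    # Hypothetical converted page that still links to the GB2312 copy of abc.htm.
    page = '<a href="..\\gb2312\\abc.htm">'
    print(apply_final_fix(page, CUSTOM_FINAL_FIX))  # <a href="..\big5\abc.htm">
```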

 

For more details about word adjustment, please refer to A.2 Word Adjustment Engine Customization Walkthroughs.

 

1.8 Background Knowledge: Simplified Chinese, Traditional Chinese, GBK, GB2312, Big5, UTF-8, Unicode and Unicode BE
 

Simplified Chinese and Traditional Chinese

 

In the first half of the 20th century, the National Government considered simplifying Chinese characters to make the language easier to learn and write and to help promote national education. Because of the unstable political situation, the plan was never fully implemented.

 

After 1949, once the political situation across the Taiwan Strait had become relatively stable, the government of the People's Republic of China on the mainland appointed an expert commission, which led the simplification of Chinese characters from the 1950s to the 1970s. Most of the simplified characters are high-frequency characters. Most simplified characters, or simplified components, in use today are the single survivor of a group of two or more characters or components that were merged, the survivor being (one of) the simplest in the group. Many other simplified characters and components were based on ancient forms, often from ancient cursive script. Only a small fraction of today's Simplified Chinese characters and components were newly created. The government in Taiwan, however, did not resume the suspended simplification plan. As a result, there are two major forms of written Chinese: Simplified Chinese (CHS, 简体中文) and Traditional Chinese (CHT, 繁體中文).

 

Because simplification was usually achieved by merging two, three or more characters into one (usually the simplest of them), where the mapping between Simplified and Traditional characters is not one-to-one it is mostly one-to-many: one Simplified character may correspond to several Traditional characters. In addition, because of the long social and cultural separation of the communities on the two sides of the Taiwan Strait since 1949, many concepts that have emerged since then are expressed differently in Simplified Chinese and Traditional Chinese. This is another major difference.
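To see why one-to-many mapping makes character-by-character conversion unreliable, consider 发, which corresponds to 發 in 发展 (development) but to 髮 in 头发 (hair). The Python sketch below uses tiny hypothetical tables (CHAR_MAP, WORD_MAP) and tries word matches before falling back to a default character mapping. It illustrates the general idea behind word-level adjustment, not STGuru's actual engine.

```python
# Hypothetical, tiny tables for illustration; real tables are far larger.
# Default Traditional counterpart for each Simplified character.
CHAR_MAP = {"头": "頭", "发": "發"}

# Word-level overrides where the default character choice is wrong.
WORD_MAP = {"头发": "頭髮"}  # "hair": here 发 corresponds to 髮, not 發


def to_traditional(text: str) -> str:
    """Convert left to right, trying the longest word match before single characters."""
    longest = max(map(len, WORD_MAP), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try word lengths from the longest known word down to 2 characters.
        for n in range(min(longest, len(text) - i), 1, -1):
            word = text[i:i + n]
            if word in WORD_MAP:
                out.append(WORD_MAP[word])
                i += n
                break
        else:
            # No word matched: map one character (unknown characters pass through).
            out.append(CHAR_MAP.get(text[i], text[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    naive = "".join(CHAR_MAP.get(c, c) for c in "头发")
    print(naive)                   # 頭發 (wrong: character-by-character)
    print(to_traditional("头发"))  # 頭髮
    print(to_traditional("发展"))  # 發展
```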

 

Today, Simplified Chinese is the official standard in mainland China and Singapore and is also widely used in Malaysia, while Traditional Chinese is widely used in Taiwan, Hong Kong and Macao. Among Chinese communities in other countries and regions, older people tend to use Traditional Chinese, while many younger people, especially emigrants from mainland China, use Simplified Chinese. Thanks to frequent cultural exchange over the past twenty to thirty years, most Chinese speakers can read both Simplified and Traditional Chinese. However, differences and preferences in reading and writing do not change easily. Mainland China leads the development of Simplified Chinese, while Taiwan takes the lead in the development of Traditional Chinese. Hong Kong people have their own conventions in Traditional Chinese, but Hong Kong Traditional Chinese is clearly less influential than Taiwan Traditional Chinese. In recent years, Traditional Chinese has often been called 正體中文 in Taiwan, implying that it is the orthodox (standard) form of Chinese.

 

GBK/GB2312/Big5

 

Computer character encoding underwent a major transition from single-byte (ANSI) to double-byte (Unicode) encoding between the end of the 20th century and the beginning of the 21st. Windows 9x and earlier Microsoft operating systems use single-byte (ANSI) encoding to store, transfer and process files and other data. An ANSI character is 8 bits long, so there are 2^8 = 256 basic ANSI characters. Windows NT and Windows 2000+ operating systems, by contrast, have double-byte kernels. In a single-byte Simplified Chinese operating system, Chinese characters are represented with the GBK encoding scheme, while in a single-byte Traditional Chinese operating system they are represented with the Big5 encoding scheme. These two encoding schemes were developed by different teams and are not directly related to each other.

 

In a single-byte system, each Chinese character is represented by two ANSI characters (bytes) with values between 1 and 255; in both GBK and Big5 the first of the two bytes is always 0x81 or higher, which distinguishes it from an ordinary ASCII character. Together, these two bytes form the character's code. The GBK standard was set up in mainland China. It provides codes for around 22,000 Chinese characters, including all Simplified Chinese characters and a nearly complete collection of Traditional Chinese characters, so GBK covers both Simplified and Traditional Chinese. The Big5 standard was established separately by a team in Taiwan and includes around 13,000 Chinese characters. Big5 lists only Traditional Chinese characters; most Simplified Chinese characters that do not also appear in Traditional Chinese are not included.
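Because lead bytes are always 0x81 or higher while the second (trail) byte may fall in the ASCII range, software cannot interpret a byte in isolation; it has to walk the data from the start. The sketch below (split_dbcs is our own illustrative helper) splits GBK or Big5 bytes into characters under the simplifying assumption that every byte from 0x81 to 0xFE starts a two-byte character, and uses Python's built-in gbk and big5 codecs to produce test data.

```python
def split_dbcs(data: bytes) -> list[bytes]:
    """Split GBK or Big5 bytes into per-character byte strings.

    Assumes lead bytes are 0x81-0xFE (true for both GBK and Big5); any other
    byte is taken as a single-byte character. A lead byte at the very end
    of the data is kept on its own.
    """
    chars = []
    i = 0
    while i < len(data):
        if 0x81 <= data[i] <= 0xFE and i + 1 < len(data):
            chars.append(data[i:i + 2])  # lead byte + trail byte
            i += 2
        else:
            chars.append(data[i:i + 1])  # single-byte (ASCII-range) character
            i += 1
    return chars


if __name__ == "__main__":
    print(split_dbcs("A中文".encode("gbk")))
    # A trail byte can fall in the ASCII range: in Big5, 許 ends with 0x5C ("\").
    print("許".encode("big5"))
```

The 許 case is why naively searching Big5 text for a backslash byte can cut a Chinese character in half.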

 

GB2312 can be regarded as the predecessor of GBK. It is a character set of the 6,763 most common Simplified Chinese characters and was widely used in the early days. In a typical Chinese article on a general topic, counting each distinct character only once, about 98%-99.5% of the Chinese characters may fall within GB2312. As the number of distinct characters grows, however, non-GB2312 characters will inevitably appear, so in most systems GB2312 has long been replaced by the more complete GBK character set. GBK keeps the codes of all characters already defined in GB2312 and adds many other Chinese characters and symbols. GBK extends GB2312 in three respects: (1) it includes almost all Chinese characters in Big5; (2) it adds some less common Chinese characters; (3) it adds some less common punctuation marks. The GBK character set therefore includes, and is fully compatible with, the GB2312 character set. Perhaps for this reason, or because GB2312 appeared first and its name became widely accepted, the name "GB2312" still appears frequently in all kinds of IT documents. In most cases, however, "GB2312" there actually refers to the GBK character set; the term is meaningful only as a name, not as a reference to the GB2312 character set itself.
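The compatibility between GB2312 and GBK can be checked with Python's built-in gb2312 and gbk codecs. In the sketch below (fits_gb2312 is an illustrative helper, not a standard function), text inside GB2312 encodes to identical bytes under both codecs, while a Traditional character such as 體 (the Traditional form of 体) encodes in GBK but not in GB2312.

```python
def fits_gb2312(text: str) -> bool:
    """Return True if every character in text can be encoded in GB2312."""
    try:
        text.encode("gb2312")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    common = "中文"
    # GBK keeps GB2312's codes, so text inside GB2312 encodes to the same bytes.
    print(common.encode("gb2312") == common.encode("gbk"))  # True
    # 體 is in GBK but not in GB2312.
    print(fits_gb2312("體"), "體".encode("gbk"))
```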

 

Besides double-byte file and data formats, Windows NT and Windows 2000+ systems also support single-byte data. In a Simplified Chinese Windows 2000+ system, data can also be saved in single-byte GBK codes, while in a Traditional Chinese Windows 2000+ system, data can also be saved in single-byte Big5 codes.

 

Unicode/Unicode BE/UTF-8

 

Windows 2000+ double-byte operating systems use Unicode encoding. Unicode is a double-byte (or wide-character) encoding: a Unicode character is twice as long as a single-byte character, so a double-byte code can take 256*256 = 65,536 values. (This is the range of the original 16-bit Unicode, now called the Basic Multilingual Plane; later versions of Unicode extend beyond it.) Because of this large capacity, and because it resulted from discussions among linguists from all major countries, Unicode has come to include the characters of all major languages, including all Simplified and Traditional Chinese characters. If a Chinese character exists in both Simplified and Traditional Chinese (that is, it was left unchanged during simplification), it usually has different codes in GBK and Big5, but only one code in Unicode. The relationship between GBK and Unicode actually runs the other way: GBK was designed to cover the CJK Unified Ideographs of Unicode 1.1 (through the Chinese standard GB 13000.1), which is why it includes almost all Big5 characters.
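The sketch below uses Python's built-in codecs to show one character with two different legacy codes but a single Unicode code point, and re-encodes GBK bytes as Big5 by going through Unicode. This is only a change of code: the hypothetical helper gbk_to_big5 does not turn Simplified characters into Traditional ones, and it raises an error for characters Big5 lacks, which is the gap a Simplified/Traditional converter has to fill.

```python
def gbk_to_big5(data: bytes) -> bytes:
    """Re-encode GBK bytes as Big5 by decoding to Unicode first.

    Raises UnicodeEncodeError for characters that Big5 does not contain.
    """
    return data.decode("gbk").encode("big5")


if __name__ == "__main__":
    ch = "中"  # unchanged by simplification, so it exists in GBK, Big5 and Unicode
    print(f"Unicode U+{ord(ch):04X}")
    print("GBK ", ch.encode("gbk").hex())
    print("Big5", ch.encode("big5").hex())
    print(gbk_to_big5(ch.encode("gbk")) == ch.encode("big5"))  # True
```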

 

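Unicode BE is another form of Unicode encoding. In Windows, "Unicode" usually means UTF-16 little-endian, which stores the low-order byte of each double-byte character first; "Unicode BE" (big-endian) stores the same two bytes in the reverse order, high-order byte first.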
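In Python, the two byte orders correspond to the utf-16-le and utf-16-be codecs. The sketch below (swap_byte_order is an illustrative helper) converts between them by swapping the two bytes of each 16-bit unit. Files saved by Windows also typically begin with a byte order mark (FF FE for Unicode, FE FF for Unicode BE); the sketch works on raw data without one.

```python
def swap_byte_order(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit unit (Unicode <-> Unicode BE)."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each unit moves first
    swapped[1::2] = data[0::2]  # first byte of each unit moves second
    return bytes(swapped)


if __name__ == "__main__":
    text = "中A"
    le = text.encode("utf-16-le")  # "Unicode" in Windows terms: low byte first
    be = text.encode("utf-16-be")  # "Unicode BE": high byte first
    print(le.hex(), be.hex())      # 2d4e4100 4e2d0041
    print(swap_byte_order(le) == be)  # True
```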

 

Many Unicode characters, however, cannot be handled by an ANSI system as they are. For example, an ANSI system treats a 0-value byte as a string terminator, so a 0 byte cannot safely appear in the middle of text data. If we simply split each Unicode wide character into two ANSI characters, we get many X+0 or 0+X combinations (every English letter, for instance, becomes its ANSI code paired with a 0 byte), which introduce many invalid 0 characters. So Unicode characters cannot simply be split in two and saved as two ANSI characters. UTF-8 was introduced to solve this problem. It is an ANSI-compatible storage encoding for Unicode text. When a Unicode string is converted to UTF-8, each double-byte Unicode character is broken down, according to fixed rules, into 1, 2 or 3 single-byte characters, none of which is 0 (except for the 0 character itself), and ASCII characters keep their original single-byte values. With this encoding, Unicode data can be saved intact as ANSI-compatible single-byte data.
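The "fixed rules" are the UTF-8 bit patterns shown in the docstring below. This Python sketch (utf8_encode_bmp is our own illustrative helper) hand-encodes characters in the double-byte range U+0000-U+FFFF, excluding surrogates, and checks the result against Python's built-in encoder. Characters beyond U+FFFF, which this document's double-byte model does not cover, take 4 bytes in UTF-8.

```python
def utf8_encode_bmp(ch: str) -> bytes:
    """Encode one character from U+0000-U+FFFF (excluding surrogates) as UTF-8.

    1 byte:  0xxxxxxx                    (U+0000-U+007F, plain ASCII)
    2 bytes: 110xxxxx 10xxxxxx           (U+0080-U+07FF)
    3 bytes: 1110xxxx 10xxxxxx 10xxxxxx  (U+0800-U+FFFF)
    """
    cp = ord(ch)
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF and not 0xD800 <= cp <= 0xDFFF:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    raise ValueError("outside the double-byte range handled by this sketch")


if __name__ == "__main__":
    for ch in "Aé中":
        encoded = utf8_encode_bmp(ch)
        # Compare with Python's built-in encoder and check for zero bytes.
        print(ch, encoded.hex(), encoded == ch.encode("utf-8"), 0 in encoded)
    # By contrast, splitting "A" into two bytes (UTF-16) produces a zero byte.
    print("A".encode("utf-16-le"))
```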

 

So UTF-8 is Unicode in nature, yet it is compatible with both single-byte and double-byte systems. UTF-8 is the most important file storage format of the double-byte age.