STGuru User's Guide

8. Changing the Layout for STGuru with the Help of Profile Management Contents A.1 Accelerator Keys

9. Tools

9.1 Character Count, Word Count, Paragraph Count and Line Count for Text, File and Folder
    1) Counting text in edit area
    2) File/Folder Character/Word Counts
    3) Comparison between STGuru and MS Word on characters/word counts
9.2 Character Occurrences
9.3 Text Difference Check
9.4 Line Sorting

9.5 Redundant Line Management

9.6 Chinese-English Glossary Management - Merge, Split, Swap...
9.7 Chinese Characters to Decimal/Hexadecimal Unicode Codes
9.8 Page Cleaning
9.9 Term Management System

9.1 Character Count, Word Count, Paragraph Count and Line Count for Text, File and Folder

1) Counting text in edit area

You can click the "Word Count" command from under the Tools menu to count characters and words in the current edit area. If the currently active edit area is the upper area, the text in the upper area will be counted, otherwise, the text in the lower area will be counted. If some text in the current edit area is selected, STGuru will only count the selected text.

STGuru's statistics include:

1) File information: file path, file times (created, modified, accessed), key file attributes (read-only, hidden, archive, system);

2) Characters, words, paragraphs and lines: characters (all/ANSI = file/text size), characters (all/supporting double-byte characters), characters (no spaces), characters (with spaces), words (all), words (single-byte characters), words (double-byte characters), paragraphs (all non-blank paragraphs) and lines (all paragraphs, including blank lines).

When word counting is complete, you can click the "Text Report" button at the right bottom of the dialog box to read a statistics report in pure text.

2) File/Folder Character/Word Counts

You can find this function from the Tools menu.

When you click this command, a dialog box for file/folder counting will pop up. All recently counted files and folders will be displayed in the path list. If a path you selected or entered into the path edit box has been counted recently, its statistics will be displayed below automatically. The software keeps statistical information for recently counted paths. You can set the maximum number of paths in the path list and the maximum days within which statistics will be kept in the Options dialog box.

If the path you choose to count is a file, the file will be counted. If you choose to count a folder, all text files in this folder (including those in all levels of its subfolders) will be counted. STGuru will detect whether a file is a text file, and based on which, it counts text files and ignores non-text files.

The report of statistics as the result of the counting includes the language and code of each text file as well as its characters (all/ANSI = file/text size), characters (all/supporting double-byte characters), characters (no spaces), characters (with spaces), words (all), words (single-byte characters), words (double-byte characters), paragraphs (all non-blank paragraphs) and lines (all paragraphs, including blank lines) for each text file and, at last, their totals.

If a file is detected to be in Simplified Chinese or English, the statistics will be the same as those counted by Simplified Chinese version of MS Word, while if a file is in Traditional Chinese, the statistics will be the same as those counted by Traditional Chinese version of MS Word. If the product fails to detect the code (such as when the file is in Traditional Chinese, but in the code GBK, or when different codes exist in the same file, sometimes the product may have trouble detecting the code), the file will be counted as if it is in Simplified Chinese (GBK).

If the files you need to count are dispersed in different locations of the hard disk, you can copy them to one folder before counting.

3) Comparison between STGuru and MS Word on characters/word counts

Item	STGuru	MS Word	Notes
Characters (all/ANSI = file/text size) Characters (all/supporting double-byte characters)	Available	N/A
Characters (no spaces) Characters (with spaces) Words (all) Words (single-byte characters) Words (double-byte characters)	Available (Same to each other)		STGuru's statistics on these parameters are the same as those by a proper language version of MS Word. If the language of the current edit area is Simplified Chinese (GBK), the statistics are the same as those by a Simplified Chinese version of MS Word, while if the language of the current edit area is Traditional Chinese (Big5), the statistics are the same as those by a Traditional Chinese version of MS Word.
Paragraphs	Available (Same to each other)		"Paragraphs" means the number of paragraphs. It does not count blank lines (a blank line is a line that has no characters, or only has blank spaces).
Lines	The number of lines counted by STGuru is the number of all paragraphs, including blank lines. The difference between Paragraphs and Lines by STGuru is that Paragraphs does not count blank lines (a line with only blank space(s) is also a decided by STGuru, as well as Word, as a blank line), while Lines does.	The lines provided by MS Word mean the actual lines shown on the screen. It changes with display font sizes or page width, even if the text in the file does not change at all.
Other differences	Sometimes, STGuru and MS Word provide different statistics for the "same" text.		If you copy text from STGuru to MS Word or vice versa, as the two text processor are of different internal codes, copy+paste data from any one to the other may cause changes to some character (such as double-byte to single byte, or single byte to double byte, or some character cannot be recognized). Usually, the changes are some unimportant minor changes. As now there are small differences in the text, the statistics cannot be the same. If you copy+paste text from one to another, and then from the other back to the original one, you will go over all the changes, and there will be no further changes even if you copy+paste text between the two text processors for more times. Now, with the finally same text, STGuru and MS Word will provide the same statistics.

9.2 Character Occurrences

You can find this function from the Tools menu.

It can be used to find out all the Chinese and non-Chinese characters appearing in a text together with the occurrences of each character in this text. The data can be sorted naturally (by character values in the Unicode Character set), or ascending or descending by occurrences.

It can be very helpful for professionals highly concerned about linguistic quality of their documents. Usually no more than 4,000 different Chinese characters are used in even a long Chinese document. If you sort the result naturally, single-byte English digits, letters and punctuation marks will appear first, then double-byte Chinese characters, including punctuation marks, common Simplified Chinese characters and Traditional Chinese characters, and uncommon characters, will be listed separately later. If your document is a pure Chinese document and if there are "unwanted" single-byte English characters, uncommon characters or unrecognizable characters in this document, you will easily locate them in the result. If you sort the result by occurrences, you may very possibly find, in the low-frequency section, some characters that you do not expect in you document (such as typos). With the result at hand, you can locate the real problems in the text by the full-text find (Find/Replace) feature of STGuru or any other text processor and fix them accordingly.

This feature is designed for the text in the current edit area. This means if the cursor is in the upper edit text, the result is for the text in the upper edit area, and if the cursor is in the lower edit text, the result is for the text in the lower edit area.

9.3 Text Difference Check

You can find this function from the Tools menu.

You can check the differences between the upper and the lower edit areas. Line numbers and contents of the different lines will appear in the check result.

While using this command, the font and language for the two edit areas should be set as the same. If you use a small font with a wide UI, it can be easy for you to find different lines. If you narrow the UI, you can locate specific places of difference easily.

9.4 Line Sorting

You can find this function from the Tools menu.

You can use this command to sort lines in the upper or lower edit areas, or selected text in the upper or lower edit areas, or text in the Clipboard in ascending or descending order.

9.5 Redundant Line Management

You can find this function from the Tools menu.

This command can be used to analyze or remove redundant lines.

Target: upper or lower edit area, selected text in upper or lower edit area and text in the clipboard.

Operation: 1. Analysis; 2. Removing redundant lines.

After you have got the analysis result, you can click the button "Details" to read content and extra line count for each group of redundant lines.

If you choose to remove redundant lines, the first line of each group of redundant lines will be remained, while all other extra lines will be deleted.

9.6 Chinese-English Glossary Management - Merge, Split, Swap...

You can find this function from the Tools menu.

This command can help you organize Chinese-English glossary in different forms.

Target: upper and lower edit areas, selected text in upper and lower edit areas, text in the Clipboard.

Chinese and English phrases before reorganization can be in any of the following forms:

1) A phrase pair will appear in two lines - Chinese at first line, and English at second, or English at first line, and Chinese below it.

2) One phrase pair will be in one line, Chinese at left, English at right, or English at left, and Chinese at right.

3) Multiple pairs of Chinese and English phrases can be lined one pair next to another continuously in the same line.

The pair pattern after organization can also be in one of above forms. It's sure the form before and after the organization can be different. For example, before organization, the pair of Chinese and English phrases can in the same line one next to another continuously. After the organization, they can be changed to each phrase in a separate line. Another example, before organization, the Chinese and English phrase can be in two lines, each one in a line, and after the organization, they can be organized as each whole pair in a line.

The default separators supported by the program are blank space, English comma, English colon and English semicolon. If you prefer other separator, you can realize it by replacing your separator to the standard separator before organization, and change back after organization.

9.7 Chinese Characters to Decimal/Hexadecimal Unicode Codes

You can find this function from the Tools menu.

The feature was previously named Baidu Post Assistant, used to post on Baidu Tieba in Traditional Chinese. Baidu canceled the mechanism later, and the function by STGuru was also canceled. However, some users may still want to convert Chinese characters from GBK/Big5/Unicode to Unicode in decimal/hexadecimal codes for some other specific purposes, so we kept the module and renamed it to the current one after necessary reform and improvements.

When using this feature, you can put Chinese text in GBK or Big5 to the Source Text edit box for automatic conversion, or use Unicode text you have copied to Windows clipboard as the source and then convert it manually.

The resulting codes can be put into a Web page's code page, then the relevant Chinese text will be displayed when this Web page is displayed. For example, the decimal and hexadecimal codes converted from "街道" are "街道" and "街道" respectively. After you post these codes to the code page of a Web page, "街道" will be displayed in the relevant position when this Web page is displayed.

Convert text in the "Source text" edit box

When you select the first three options - "GBK (converted automatically)", "Big5 (converted automatically)" or "Auto-Detect (converted automatically)", the text in the Source text edit box is used as the source. The respective codes in the edit box for these three codes are GBK, Big5 and what is detected by the software automatically. A good news is you can visually edit the source text. Besides, the codes to output are generated automatically in the code area below. After you have done editing, you can click the Copy button below the codes edit box and paste them to where you want to use.

Convert Unicode format text in the clipboard

When you select the last option "Use Unicode-Format Text in the Clipboard" in the source code drop-down list, the text in the clipboard is used as the source, the format of which should be Unicode. If the system you use is not the very ancient Windows 98/Me, but more common Windows 2000+ operating systems, when you copy text from Windows utilities such as Notepad, from Microsoft Office or directly from a Web page, all the formats of text are Unicode.

In this mode, you need to click the Convert button above the code edit box manually to perform conversion.

When text in the "Source text" edit box is used as the source, only characters in the GBK and Big5 character sets are supported. However, when you use the Unicode format text in the clipboard as the source, you will have the whole Unicode character set available to choose from.

"Decimal" and "Hexadecimal"

If you choose the "Decimal" option at the lower left, the output codes will be 5-digit decimal numbers. If you choose "Hexadecimal", the codes will be in 4-digit hexadecimal numbers.

9.8 Page Cleaning

Run

The function of Page Cleaning is attached as preinstalled macros in the Batch Replace dialog box. They can run in the following steps:

1) Open the Find/Replace dialog box for the upper area (by the Find/Replace menu item under the Edit menu for the upper area, or by the corresponding button on the upper toolbar). This dialog box can be opened only when there is text in the upper edit area. If there is no text in the upper area, you can paste some text into it by yourself.

2) Click the Batch Replace button to open the Batch Replace dialog box.

3) Click the Open button. A drop-down path list will pop up. If you are using a newly installed MLEditor, there will be 3 default page cleaning macros in this list - "page clean.txt" ("Page Cleaning" - for English text), "page clean-GBK.txt" ("Page Cleaning for GBK" - for Chinese in GBK) and "page clean-Big5.txt" ("Page Cleaning for Big5" - for Traditional Chinese in Big5). You can select the one that you need.

Function

"Page Cleaning" is designed for the cleaning of English text, while the other two are for the cleaning and normalization of Chinese text in GBK and Big5, especially for paragraphs of articles in full or nearly full-Chinese. You are not suggested to use a Chinese page cleaning macro on English text or text mixed with English and Chinese. Professional advice from English and Chinese linguists was carefully considered while these macros were created. Applying relevant page cleaning macros on text in the three languages/codes with respective version can promptly normalize and clean its punctuations, blank spaces and paragraph layout.

Custom page cleaners

If you have some special page cleaning requirements, you can create your own page cleaning macro based on one of these standard page cleaning macros.

You can make a new custom macro just by modifying the path and saving it. Then you can modify the title, description and all items in this macro according to your special requirements. After several rounds of tests and revisions, you will get the finished macro.

9.9 Term Management System

The word adjustment system (Conversions -> Advanced Word Adjustment) itself is an advanced term management system by itself.

It manages terms in project packs. In each pack, you can manage one or several libraries (glossaries). Each pack is self-contained and is comprised of a simple text-based pack management file (.stcp file) and one or several plain-text library (glossary) files. Such a project is portable, and can be easily shared whenever necessary. You can double-click a .stcp file to open it from Windows Explorer, or can select it from the pack list in the main interface of Advanced Word Adjustment.

You can conveniently add, modify, delete, find/search, cut, copy, paste entries. While adding/editing entries, STGuru can check for repetitive entries automatically, so you do not need to worry that you may add one term in a library twice. Terms are automatically sorted in the library (glossary) after they are added, so the source file for each library is a text file containing a sorted glossary.

Each entry contains three parts - term, explanation and notes. The Edit Word dialog box was specially optimized for editing entries in a glossary. Its interface can be scaled. "Notes" can be shown or hidden. The smallest interface shows only several lines, while the largest one can show up to 1000 Chinese characters or English words.

Glossaries can be easily merged (simply append the text of one source file at the end of another!). When you open a merged library in STGuru, the program will reorganize its entries automatically. If there are repeating entries, you will see a wizard that guides you through merging repeating entries one by one.