Samuel Murray, Netherlands. Local time: 14:47. Member (since 2006), English to Afrikaans + ...
May 4, 2021
Hello everyone
I have a file in which some segments contain Chinese characters. I need to identify these segments, so I'm hoping I can search for the specific Unicode characters that are Chinese. Can anyone tell me what the UTF-8 character range for Chinese characters is?
Thanks
Samuel
Added: found it, under "CJK scripts and symbols" here:...
However, I discovered that searching for the presence of every one of these characters would be very inefficient, so instead I converted all my source text to one character per line and then removed duplicate lines to get a list of all characters used in the source text. Then I deleted the non-Chinese characters, which left a much smaller list of characters to search for (and no need to search by hexadecimal code either).
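The "one character per line, remove duplicates, delete non-Chinese characters" workflow can be sketched in a few lines of Python. This is not Samuel's actual script: the function names are hypothetical, the segments are assumed to be a list of strings, and "Chinese" is approximated here by the CJK Unified Ideographs block (U+4E00 to U+9FFF) plus Extension A (U+3400 to U+4DBF), so rarer extension blocks are not covered.

```python
from typing import Iterable, List


def is_cjk_ideograph(ch: str) -> bool:
    """True if ch is in CJK Unified Ideographs or Extension A.

    Assumption: these two blocks stand in for "Chinese characters";
    Extension B and later blocks are deliberately left out.
    """
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def distinct_chinese_chars(segments: Iterable[str]) -> List[str]:
    """In-memory equivalent of 'one character per line, remove duplicate
    lines, delete non-Chinese characters'."""
    seen = set()
    for seg in segments:
        seen.update(seg)  # a set keeps each character only once
    # Keep only ideographs, sorted by code point for a stable list.
    return sorted(ch for ch in seen if is_cjk_ideograph(ch))


if __name__ == "__main__":
    print(distinct_chinese_chars(["Hello 你好", "No Chinese here", "你们好"]))
```

The resulting list is the "much smaller list of characters to search" described above; the range check replaces the manual deletion step.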
esperantisto. Local time: 15:47. Member (since 2006), English to Russian + ...
SITE LOCALIZATION TRANSLATOR
My range
May 4, 2021
Here is the range that I use (even though you have probably already found it, it may be handy):
Code:
⺮-𰻞
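A range like this can go straight into a regular-expression character class. The Python sketch below is only an illustration (the helper name `find_segments` is hypothetical, and segments are assumed to be a list of strings). Python 3 regexes work on code points, so the supplementary-plane endpoint 𰻞 needs no special handling. Note that the range is wide: everything between the two endpoints matches, including Japanese kana and Korean Hangul, so a hit means "contains a character in this CJK-wide range" rather than strictly "contains Chinese".

```python
import re

# The range exactly as posted: every code point from ⺮ to 𰻞, inclusive.
CJK_RANGE = re.compile("[⺮-𰻞]")


def find_segments(segments):
    """Return (index, segment) pairs for segments that contain at least
    one character in the posted range. Hypothetical helper name."""
    return [(i, seg) for i, seg in enumerate(segments) if CJK_RANGE.search(seg)]


if __name__ == "__main__":
    sample = ["Plain English.", "混合 text", "Another line"]
    for i, seg in find_segments(sample):
        print(i, seg)
```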
Samuel Murray, Netherlands. Local time: 14:47. Member (since 2006), English to Afrikaans + ...
TOPIC STARTER
@Esperantisto
May 4, 2021
esperantisto wrote:
Here is the range that I use (even though you have probably already found it, it may be handy)...
Thanks, I'll give that a try as well (then I can use regex).
As it happens, my source text contained only about 1000 distinct Chinese characters, so testing for each of them one by one across 2000 segments was doable and took only about 20 seconds (not including the time it took to script it in AutoIt, of course). I'm curious whether a regex approach would be quicker (not counting preprocessing time).
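The AutoIt script itself isn't shown, so here is a hedged Python sketch of the two approaches being compared: testing each known character against each segment one at a time, versus building a single regex character class from the same character list and scanning each segment once. Function names and the sample data are hypothetical; no timings are claimed, but `timeit` can measure both on real data.

```python
import re
import timeit
from typing import List


def flag_by_lookup(segments: List[str], chars: List[str]) -> List[int]:
    """One-by-one approach: check every known character against every
    segment (up to len(chars) x len(segments) substring tests)."""
    return [i for i, seg in enumerate(segments) if any(ch in seg for ch in chars)]


def flag_by_regex(segments: List[str], chars: List[str]) -> List[int]:
    """Regex approach: one character class, one scan per segment."""
    if not chars:
        return []  # "[]" would be an invalid pattern
    pattern = re.compile("[" + "".join(re.escape(ch) for ch in chars) + "]")
    return [i for i, seg in enumerate(segments) if pattern.search(seg)]


if __name__ == "__main__":
    segments = ["Hello 你好", "No Chinese here", "你们好"] * 1000
    chars = ["你", "好", "们"]
    # Both approaches must flag exactly the same segments.
    assert flag_by_lookup(segments, chars) == flag_by_regex(segments, chars)
    for fn in (flag_by_lookup, flag_by_regex):
        seconds = timeit.timeit(lambda: fn(segments, chars), number=10)
        print(f"{fn.__name__}: {seconds:.3f} s for 10 runs")
```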
LIZ LI, China. Local time: 20:47. French to Chinese + ...
Copy & paste the source for Chinese > UTF8 in the UPPER dialog box, then click the 1st green button below;
OR
Copy & paste the source for UTF8 > Chinese in the LOWER dialog box, then click the 2nd green button below.
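The converter page referred to above isn't reproduced here, and its exact output format is unknown. Assuming "UTF8" means the hexadecimal UTF-8 bytes of the text, both directions can be sketched with Python's built-in codec. The function names are hypothetical.

```python
def chinese_to_utf8_hex(text: str) -> str:
    """Chinese > UTF8: encode as UTF-8 and show each byte as two hex digits."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def utf8_hex_to_chinese(hex_bytes: str) -> str:
    """UTF8 > Chinese: bytes.fromhex skips the spaces between byte pairs."""
    return bytes.fromhex(hex_bytes).decode("utf-8")


if __name__ == "__main__":
    encoded = chinese_to_utf8_hex("中文")
    print(encoded)
    print(utf8_hex_to_chinese(encoded))
```

Most Chinese characters take three bytes in UTF-8, which is one reason a direct "UTF-8 range" search is awkward compared with matching on code points.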