Tagger for highlighted text in MS Word file
Thread poster: adrianrff
adrianrff
Venezuela
English to Spanish
Oct 29, 2020

We are currently translating several batches of MS Word files that the client has pre-prepared with highlighted text, which they don't want translated. I'm trying to reduce the time our PMs spend preparing these files for translation (creating and updating non-translatable lists, etc.), so I want to create a tagger in memoQ that automatically imports the highlighted text as tags, minimizing the risk of translating it by mistake. I'm aware I can simply hide the highlighted text, and I have Word macros to do this quickly, but in this case that is not a good idea: there are highlighted terms in the middle of sentences that, if missing in memoQ, could lead to translation mistakes.

I thought: "This should be easy... or at least it should be easy to find information on it", but I haven't been able to come up with a simple solution for this.

I'm very familiar with the importing process and import filters, as well as with regex. memoQ doesn't seem to have a built-in filter configuration for this. I tried creating a macro to wrap the highlighted text in a special combination of characters and then creating a regex tagger for that in memoQ. It works, but I'm having problems with highlighted text that spans more than one line. The search/replace feature in Word does not stop when it encounters a paragraph or soft break, so it wraps all the lines in a single pair of markers, like in the images below. I'm using === as the wrapping group.
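
The multi-line problem can be reproduced with the markers alone. What follows is only a minimal Python sketch, assuming the === wrapper described above: the pattern, the `tag_wrapped` helper and the `<nt>` placeholder are illustrative names, not anything memoQ provides, and memoQ's regex tagger runs its own regex engine, so the pattern may need adjusting there.

```python
import re

# Assumption: the Word macro wrapped highlighted text as ===text===.
# [^\r\n\v] keeps a match on a single line (\r paragraph mark, \v manual
# line break); the lazy +? stops at the nearest closing ===.
WRAPPED = re.compile(r"===([^\r\n\v]+?)===")

def tag_wrapped(text: str) -> str:
    """Replace each single-line ===...=== span with a placeholder tag."""
    return WRAPPED.sub(lambda m: f"<nt>{m.group(1)}</nt>", text)

single = "Press ===Start=== to begin."
multi = "===Line one\nLine two=== follows."

print(tag_wrapped(single))  # the wrapped term becomes a tag
print(tag_wrapped(multi))   # unchanged: the markers span a line break
```

A pair of markers that straddles a break is simply not matched by a line-bound pattern like this one, which is why each line needs its own pair.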


I'm sure I could find a way to do this differently in Word using regex and VBA code, but it seems like overkill for what looked like a very common task, so I'm turning to this knowledgeable community for assistance.

Any suggestions are appreciated.

[Edited at 2020-10-29 14:37 GMT]


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Adrian Oct 29, 2020

adrianrff wrote:
The search/replace feature in Word does not stop when it encounters a paragraph or soft break...


Well, I'm 99% sure it would not bother the client if soft breaks and hard breaks (and possibly tabs) are not highlighted. I mean, you can't really see a non-highlighted line break. So, a trick that I often employ when doing find/replace with highlight is to find all ^p, ^l and ^t and replace them with "not highlighted" formatting, and then in the next find/replace operation, specify that the Find text must be highlighted. This causes Word to stop at the breaks.
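
The effect of that trick can be modelled on plain text. This is a rough Python sketch rather than a Word macro: it assumes breaks appear as ordinary characters (\r for a paragraph mark, \v for a manual line break, \t for a tab), and `wrap_per_line` is a made-up helper name.

```python
import re

# Assumption: breaks are modelled as plain characters, with the capture
# group keeping them in the split result so they can be put back.
BREAKS = re.compile(r"([\r\n\v\t]+)")

def wrap_per_line(highlighted: str, marker: str = "===") -> str:
    """Wrap each break-free piece of a highlighted run in its own marker pair."""
    parts = BREAKS.split(highlighted)
    # Odd indexes hold the captured breaks; they stay outside the markers.
    return "".join(
        part if i % 2 or not part else f"{marker}{part}{marker}"
        for i, part in enumerate(parts)
    )

print(repr(wrap_per_line("Line one\rLine two")))
```

Because the breaks fall outside the markers, every wrapped piece sits on one line and a single-line regex tagger can pick it up.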

You can also use other attributes but they can be a little more complicated to use, e.g. you can mark all text as English and then mark the breaks as Spanish, and then specify "must be English" in the next find/replace operation's Find field. You'd still have to consider whether you'd want to flag individual sentences, but if you do, then you'd have to mark the spaces between sentences in some way -- for this, the English vs Spanish trick might actually work better, but remember that you'd have to mark the punctuation marks as English back again after you've marked the punctuation+space as Spanish.


[Edited at 2020-10-29 16:10 GMT]


 
Stepan Konev
Russian Federation
English to Russian
Why not replace yellow highlights with hidden text? Oct 29, 2020

Even if there is some hidden text in the middle of a sentence, it turns into a tag but is not hidden completely. Why would the highlights be 'missing in memoQ'?



[Edited at 2020-10-29 19:26 GMT]


 
adrianrff
Venezuela
English to Spanish
TOPIC STARTER
Clever! Oct 29, 2020

Samuel Murray wrote:

adrianrff wrote:
The search/replace feature in Word does not stop when it encounters a paragraph or soft break...


Well, I'm 99% sure it would not bother the client if soft breaks and hard breaks (and possibly tabs) are not highlighted. I mean, you can't really see a non-highlighted line break. So, a trick that I often employ when doing find/replace with highlight is to find all ^p, ^l and ^t and replace them with "not highlighted" formatting, and then in the next find/replace operation, specify that the Find text must be highlighted. This causes Word to stop at the breaks.

You can also use other attributes but they can be a little more complicated to use, e.g. you can mark all text as English and then mark the breaks as Spanish, and then specify "must be English" in the next find/replace operation's Find field. You'd still have to consider whether you'd want to flag individual sentences, but if you do, then you'd have to mark the spaces between sentences in some way -- for this, the English vs Spanish trick might actually work better, but remember that you'd have to mark the punctuation marks as English back again after you've marked the punctuation+space as Spanish.


[Edited at 2020-10-29 16:10 GMT]


Thank you, Samuel. That's very helpful. I never thought of "unhighlighting" breaks and segment-breaking characters.

I think I was overcomplicating things, though. I didn't realize that memoQ treats hidden text just like any other attribute when there's a mid-segment change: it will enclose that text in a pair of tags, so the content will show up in memoQ. It does hide whole paragraphs with a uniform hidden attribute, but that's not a big problem: the linguists can check the reference file if they need to.

I will simply use the macro for hiding highlighted text and import that file into memoQ, but, again, your answer was somewhat enlightening. Thanks.
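
For anyone without such a macro, the same idea can be sketched on the underlying WordprocessingML instead of in Word itself. This is only an illustration under stated assumptions: it works on a fragment of document.xml passed in as a string, it only looks for `w:highlight` (colour applied as shading via `w:shd` is ignored), it does not check the element order the schema expects inside `w:rPr`, and `hide_highlighted` is a made-up name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

def hide_highlighted(document_xml: str) -> str:
    """Add <w:vanish/> (hidden text) to run properties that carry <w:highlight>."""
    root = ET.fromstring(document_xml)
    for rpr in root.iter(f"{{{W}}}rPr"):
        has_highlight = rpr.find(f"{{{W}}}highlight") is not None
        already_hidden = rpr.find(f"{{{W}}}vanish") is not None
        if has_highlight and not already_hidden:
            # Assumption: position inside w:rPr is not validated here.
            rpr.insert(0, ET.Element(f"{{{W}}}vanish"))
    return ET.tostring(root, encoding="unicode")

sample = (
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>Press </w:t></w:r>"
    '<w:r><w:rPr><w:highlight w:val="yellow"/></w:rPr><w:t>Start</w:t></w:r>'
    "</w:p>"
)
print(hide_highlighted(sample))
```

Only the highlighted run gains the hidden attribute, so the mid-sentence change in attributes is what the import would turn into a tag pair.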


 
adrianrff
Venezuela
English to Spanish
TOPIC STARTER
That's it Oct 29, 2020

Stepan Konev wrote:

Why would the highlights be 'missing in memoQ'?



They're not.

You are absolutely correct: when there's a mid-segment change in attributes (in this case hidden text), memoQ will enclose that text in a pair of tags, so the mid-sentence content will still be visible. What it does hide are whole paragraphs with uniform hidden attribute, but that's not a problem as the reference files are always available to the linguists.

Thank you, Stepan, I was definitely taking an overly complicated approach.

Regards,

[Edited at 2020-10-29 21:06 GMT]


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Adrian Oct 29, 2020

adrianrff wrote:
What it does hide are whole paragraphs with uniform hidden attribute...


Surely hiding whole paragraphs one at a time is not a problem. And what happens when you unhide the line breaks -- does MemoQ treat it better? And hey, you can always replace every line break with a line break plus a dummy character, and unhide both, so that it's clearer in MemoQ more or less where the paragraphs go.


 

