Additional fields in a Glossary
Thread starter: Selcuk Akyuz

Selcuk Akyuz  Identity Verified
Turkey
Local time: 10:48
English to Turkish
Jan 17, 2012

http://www.cafetran.com/handbook.html#part5

Dictionaries and Glossaries

CafeTran offers a flexible interface to access and update your dictionaries in the workflow.

Glossaries are more specialized dictionaries such as terminology lists. They enable an automatic and fast look-up for any specific terminology that should be used in the translation.

This distinction between glossaries and dictionaries in CafeTran only affects the resource integration in the workflow. The lookup in glossaries is done "on the fly" each time you take a new segment, whereas the dictionary check happens when you click on the Search button in the main toolbar.


I want "on the fly" term recognition, so I need to convert my existing term base from another CAT tool into a Glossary. The information is clear, and it took less than 5 minutes to create my first Glossary.

Now I would like to test the pipe character, but it seems that it only works in TMX memories ("Memory for Terms", a different use of TMX files in CafeTran).

Perhaps I can convert my glossary file (a tab-delimited text file) into TMX; there should be some free tools for that. But I will test it later.

So back to my Glossary, but there is a problem. On-the-fly term recognition is fast, but what happened to my additional fields, e.g. definition, context, subject, client, date? I cannot see them in the glossary window.

Perhaps it works with the Dictionary feature. So I created a Dictionary in the Library menu. Problem partly solved: now I can see most of the additional fields (provided that each term has one translation). But now I have another problem: Dictionary search is not performed on the fly.


My term base was created in DVX (with several additional fields), but it could come from any CAT tool; MemoQ and MultiTerm term bases also store additional fields. So how can we benefit from this valuable additional information in CafeTran?


 

Selcuk Akyuz
THREAD STARTER
conversion: glossary into tmx Jan 17, 2012

Selcuk Akyuz wrote:

Perhaps I can convert my glossary file (a tab-delimited text file) into TMX; there should be some free tools for that. But I will test it later.


It seems that CafeTran has a solution for it:

http://cafetran4mac.blogspot.com/2010/11/importing-glossary-ii.html
To make the integration of your base with CafeTran complete, you may convert it to a TMX file. Create a new memory (menu Memory | New memory), and then select Memory | Conversions | Import glossary entries. When the import is finished, save the memory to a TMX file (Memory | Save). This also solves the problem of duplicate entries, since CT adds only the latest duplicate to the TMX memory.


So far, so good! But what counts as a duplicate in CafeTran? Only two identical source segments (actually terms now) with identical (or maybe different) translations?

What happens in the case of two identical source terms with identical translations but different subject or client information? (Yes, the additional information issue again.)
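
The thread does not say exactly how CT defines a duplicate, so it can help to count the candidates in the glossary before importing. The following is only a sketch in Python: the function name and the sample rows are invented for illustration, and the two definitions it counts (same source term, same source/target pair) are assumptions, not CafeTran's documented rule.

```python
import csv
import io
from collections import Counter


def count_duplicates(tsv_text):
    """Count rows that repeat a source term or a source/target pair.

    Assumes column 1 is the source term and column 2 the target term;
    any further columns (subject, client, ...) are ignored here.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if len(row) >= 2]
    by_source = Counter(row[0] for row in rows)
    by_pair = Counter((row[0], row[1]) for row in rows)
    return {
        "rows": len(rows),
        # rows beyond the first occurrence of each source term
        "repeated_source": sum(n - 1 for n in by_source.values()),
        # rows beyond the first occurrence of each source/target pair
        "repeated_pair": sum(n - 1 for n in by_pair.values()),
    }


# Invented sample data: three rows share the source term "valve".
sample = (
    "valve\tvana\tmechanics\n"
    "valve\tvana\tplumbing\n"
    "valve\tsupap\tautomotive\n"
    "pump\tpompa\tmechanics\n"
)
stats = count_duplicates(sample)
print(stats)
```

If the number of rows lost on import matches one of the two counts, that suggests which definition of "duplicate" is in effect.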

I think I will find the answer myself, but CT freezes when segments are loaded (after the terms are successfully imported into the Memory).

-------------------

OK, 11k out of 14k term pairs were imported into the new Memory, which means a loss of approx. 3k term pairs with different translations (or identical translations but different meta information). I am now looking for another converter from tab-delimited text to TMX (one that will not delete any segments). By the way, all meta information was saved as notes in the TMX file, but I could not find a way to display it (any solution?).
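
A converter that keeps every row can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a replacement for CT's own import: the function name, the "creationtool" value and the default language codes are invented, the input is assumed to be source TAB target TAB additional fields, and the extra fields are joined into one TMX note element per unit. Whether CT displays such notes any differently from the ones it writes itself is not something this thread confirms.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the attribute "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def glossary_to_tmx(tsv_text, src_lang="en", tgt_lang="tr"):
    """Convert tab-delimited glossary text to a TMX 1.4 string.

    Every row becomes its own <tu>, so duplicate source terms are kept.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "glossary2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "tab-delimited",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].strip():
            continue  # skip blank or incomplete rows
        tu = ET.SubElement(body, "tu")
        extras = [f.strip() for f in fields[2:] if f.strip()]
        if extras:
            # <note> must come before the <tuv> elements in a <tu>
            ET.SubElement(tu, "note").text = " | ".join(extras)
        for lang, text in ((src_lang, fields[0]), (tgt_lang, fields[1])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text.strip()
    return ET.tostring(tmx, encoding="unicode")


# Invented sample rows, including a repeated source/target pair.
sample = (
    "valve\tvana\tmechanics\tClient A\n"
    "valve\tvana\tplumbing\tClient B\n"
    "valve\tsupap\tautomotive\n"
)
print(glossary_to_tmx(sample))
```

The result can be written to a .tmx file with an ordinary file write in UTF-8; an XML declaration line would have to be added separately if the importing tool requires one.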

Honestly, these results are not satisfactory for me. I will continue testing other features of CafeTran (there are many good features indeed) and wait for improvements to be made in the Glossary/Dictionary/Memory for Terms features.

Selcuk

[Edited at 2012-01-17 02:31 GMT]


 

Selcuk Akyuz
THREAD STARTER
continued... Jan 17, 2012

I tested the Glossary feature in a new project, and to my surprise additional information was displayed, at least for some terms.

I used the super tool UniCSVed to join all additional fields, separated by tabs, and tested again in my project. Additional fields were displayed for terms with a single translation, but if a term had several meanings the additional fields were not displayed (img. 1). So I removed the tab between the target term and the additional information (img. 2). But I am aware that this is not functional: the additional information should be displayed, but separated by a tab. Otherwise we cannot use it for "auto-completion".

Normally we do not need additional information for terms with a single meaning; we just use them. But when we add a second meaning for a term, we need such additional information. It may be the subject or the definition that helps us choose one meaning or the other. Unfortunately, CT does not display additional information when a term has several meanings.


 

Igor Kmitowski  Identity Verified
Poland
Local time: 09:48
Member (since 2016)
English to Polish
Additional fields in a Glossary Jan 17, 2012

Hi Selcuk,

Selcuk Akyuz wrote:

I want "on the fly" term recognition, so I need to convert my existing term base from another CAT tool into a Glossary. The information is clear, and it took less than 5 minutes to create my first Glossary.

Now I would like to test the pipe character, but it seems that it only works in TMX memories ("Memory for Terms", a different use of TMX files in CafeTran).



The pipe character in TMX memories is used for the stemming (prefix matching) feature, whereas in simple tab-delimited glossaries it is used to separate multiple target meanings.
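
To make the glossary use of the pipe concrete, here is a small Python sketch that splits one glossary line into its parts. It assumes the layout given later in this thread (source TAB target|alternative target TAB additional fields); the function name and the sample line are invented, and it says nothing about how CT itself parses the file.

```python
def parse_glossary_line(line):
    """Split one tab-delimited glossary line into (source, targets, extras).

    Assumed layout: source TAB target|alternative|... TAB additional fields.
    """
    fields = line.rstrip("\r\n").split("\t")
    source = fields[0].strip()
    targets = []
    if len(fields) > 1:
        # the pipe separates alternative target meanings
        targets = [t.strip() for t in fields[1].split("|") if t.strip()]
    extras = [f.strip() for f in fields[2:]]
    return source, targets, extras


# Invented sample line with two alternative targets and one extra field.
entry = parse_glossary_line("valve\tvana|supap\tmechanics\n")
print(entry)  # ('valve', ['vana', 'supap'], ['mechanics'])
```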


Perhaps I can convert my glossary file (a tab-delimited text file) into TMX; there should be some free tools for that. But I will test it later.

So back to my Glossary, but there is a problem. On-the-fly term recognition is fast, but what happened to my additional fields, e.g. definition, context, subject, client, date? I cannot see them in the glossary window.



CafeTran assumes that all fields in a text file are separated by the same character (for example TAB). Are they?
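
One way to answer that question for a large file is to count the fields on every line and list the lines that deviate. The sketch below is an assumption-laden illustration in Python (invented function name and sample data); it treats the most common field count as the expected one and only checks consistency, not whether CT accepts the file.

```python
from collections import Counter


def find_irregular_lines(tsv_text, delimiter="\t"):
    """Return (expected_field_count, [(line_number, field_count), ...]).

    The expected count is the most common one; blank lines are skipped.
    """
    counted = [
        (number, len(line.split(delimiter)))
        for number, line in enumerate(tsv_text.splitlines(), start=1)
        if line.strip()
    ]
    if not counted:
        return 0, []
    expected = Counter(n for _, n in counted).most_common(1)[0][0]
    irregular = [(number, n) for number, n in counted if n != expected]
    return expected, irregular


# Invented sample: line 3 uses semicolons, line 4 lacks the third field.
sample = (
    "valve\tvana\tmechanics\n"
    "pump\tpompa\tmechanics\n"
    "gasket;conta;mechanics\n"
    "flange\tflanş\n"
)
expected, irregular = find_irregular_lines(sample)
print(expected, irregular)  # 3 [(3, 1), (4, 2)]
```

A line that uses a different separator shows up with a field count of 1, which makes mixed delimiters easy to spot.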

Igor


 

Igor Kmitowski
conversion: glossary into tmx Jan 17, 2012

You can keep the duplicate entries when converting from a tab-delimited glossary to a TMX memory. Just check the Keep all duplicates box in Memory | New Memory | Filter when you create the new TMX memory for the conversion.

Selcuk Akyuz wrote:

So far, so good! But what counts as a duplicate in CafeTran? Only two identical source segments (actually terms now) with identical (or maybe different) translations?

What happens in the case of two identical source terms with identical translations but different subject or client information? (Yes, the additional information issue again.)

I think I will find the answer myself, but CT freezes when segments are loaded (after the terms are successfully imported into the Memory).



CT may freeze when you load huge TMX or glossary files into the RAM assigned to CafeTran. You can increase this value: see Edit | Options | Memory tab | Java memory size (MB). Remember not to exceed the actual RAM on your system.


-------------------

OK, 11k out of 14k term pairs were imported into the new Memory, which means a loss of approx. 3k term pairs with different translations (or identical translations but different meta information). I am now looking for another converter from tab-delimited text to TMX (one that will not delete any segments). By the way, all meta information was saved as notes in the TMX file, but I could not find a way to display it (any solution?).



When you see search results in the Memory tab, click the segment number and go to the Edit Tu menu to see the notes for this translation unit.


 

Selcuk Akyuz
THREAD STARTER
on Edit Tu menu and others Jan 18, 2012

Igor Kmitowski wrote:

The pipe character in TMX memories is used for the stemming (prefix matching) feature, whereas in simple tab-delimited glossaries it is used to separate multiple target meanings.


Clear information, thanks! My glossary structure is Source Term TAB Target Term TAB Additional fields (all separated by tabs). There are no pipe characters, but I don't know how CT treats my additional fields.

CafeTran assumes that all fields in a text file are separated by the same character (for example TAB). Are they?


Sure, they are. I am good with CSV files thanks to UniCSVed.

Well, as I stated above in my third message, the Glossary displays additional fields only if there are no duplicate source terms (with identical or different translations). I want to see the additional fields for duplicate terms too, so the Glossary is not very useful for me.

You can keep the duplicate entries when converting from a tab-delimited glossary to a TMX memory. Just check the Keep all duplicates box in Memory | New Memory | Filter when you create the new TMX memory for the conversion.


Thanks, it worked. No data loss now!


CT may freeze when you load huge TMX or glossary files into the RAM assigned to CafeTran. You can increase this value: see Edit | Options | Memory tab | Java memory size (MB). Remember not to exceed the actual RAM on your system.


I have only 2 GB, and 1 GB is assigned to Java. I assume it would not be a good idea to increase it to 2 GB; other programs may freeze.

When you see search results in the Memory tab, click the segment number and go to the Edit Tu menu to see the notes for this translation unit.




Scroll down the list with the mouse to find the term, click on the number, click on "Edit Tu", select "Edit note" to display it, then click on X (or press Esc three times) to close the window. IMO, this is time-consuming and relies too heavily on the mouse. (Generally speaking, after testing CT for 3 days, I feel that many operations in CT require the mouse.)


I have not yet tested the External DB function; I have to do some research before using H2, MySQL, Oracle 10g, HSQLDB 2.0 or Derby (Java DB). Using a term list with any of these databases may be better (for speed and, hopefully, for the GUI).

As for the other terminology management features I tested (Glossary, Dictionary, Memory for Terms), sorry, but I am really lost in them. IMO, Dictionary is useless because there is no on-the-fly search. Memory for Terms requires too much mouse and keyboard use. The most promising one is Glossary, but it needs the improvements I have discussed in this thread. By the way, I loved the docking and undocking of tabs.

Kind regards,

Selcuk



[Edited at 2012-01-18 03:38 GMT]


 

Igor Kmitowski
on Edit Tu menu and others Jan 18, 2012

Is there any standard for fields in tab-delimited text files? Currently, CT follows this scheme:

source TAB target|alternative target|alternative target TAB additional fields

I implemented the above based on users' requests. It seems to me that there is no common agreement on how to treat the fields. In your case the issue is with the alternative targets: they are not pipe-separated but placed in other fields.
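
A glossary with repeated source terms can be reshaped into the scheme above with a short script. The Python sketch below assumes that alternative translations sit on separate rows that repeat the source term (source TAB target TAB additional fields); the function name, the sample rows and the way the extra fields are folded into a single third column are all invented for illustration. Whether CT then displays that third column for entries with several targets is not confirmed in this thread.

```python
def merge_duplicates(tsv_text):
    """Merge rows sharing a source term into source TAB t1|t2 TAB notes."""
    merged = {}  # source -> (targets, notes); dicts keep insertion order
    for line in tsv_text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or incomplete rows
        source, target = fields[0], fields[1]
        extras = [f for f in fields[2:] if f]
        targets, notes = merged.setdefault(source, ([], []))
        if target not in targets:
            targets.append(target)
        if extras:
            # label each note with its target so the meanings stay apart
            notes.append(f"{target}: {', '.join(extras)}")
    lines = []
    for source, (targets, notes) in merged.items():
        columns = [source, "|".join(targets)]
        if notes:
            columns.append(" ; ".join(notes))
        lines.append("\t".join(columns))
    return "\n".join(lines)


# Invented sample: "valve" has two translations on separate rows.
sample = (
    "valve\tvana\tmechanics\n"
    "valve\tsupap\tautomotive\n"
    "pump\tpompa\n"
)
merged = merge_duplicates(sample)
print(merged)
```

Labelling each note with its target keeps the subject or definition attached to the meaning it belongs to, which is the information needed to choose between alternatives.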

As for mouse operations, all basic workflow operations have keyboard shortcuts. For example, press F2 to list the matched terms and press the term number to insert it. The same holds true for autotranslation and fuzzy matches (F1 key). Yes, to reach additional meta information such as segment/term notes in the Memory, you need to use the mouse.

Igor



 

