Recommendations for creating glossary from websites?
Thread poster: Miranda Drew

What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?
I'm going to use memoQ for the translation of a Word document, and the client has a bilingual website. Any suggestions on the easiest way to make a glossary? The client has a lot of technical terms in their business.

Mario Chávez | United States | Local time: 22:31 | Member (since 2024) | English to Spanish + ... | Skeptical about terminology extraction but... | Aug 15, 2024
Thanks to a phenomenon called polysemy, most languages record more than one meaning for a single term. I used memoQ for complex technical translations of CAD software intended for architects and others in the construction industries. I had to make several entries for a single term because the meaning and context were different. I've never used terminology extraction tools; instead, I apply my expertise in the field and a disciplined instinct, comparing bilingual documents side by side to see how a term was rendered in the foreign language.
Your client's bilingual website may be a pain to navigate, and it may have been translated by different people who used different terms for the same thing. Does your client want you to stick closely to the translations in their bilingual website? Do you have leeway to use better industry terms? Is your project's volume large enough to warrant building a client-specific termbase? I love memoQ's features for building and maintaining termbases, by the way, and I exploit them as much as possible. Another question to ponder: do you have enough time to research terms?
Even if you use a terminology extraction tool (free or paid), I bet you'll have a time-consuming task ahead of you. I wouldn't touch any AI application to do any term extraction at all.

Miranda Drew | Italy | Local time: 04:31 | Italian to English | TOPIC STARTER | Thanks for mansplaining that | Aug 15, 2024
I've been a translator for 20 years. I know how to check terminology. Your answer is extremely condescending and completely useless.

Jorge Payan | United States | Local time: 21:31 | Member (since 2002) | German to Spanish + ... | My suggestion | Aug 16, 2024
Miranda Drew wrote:
What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?
I'm going to use memoQ for the translation of a Word document, and the client has a bilingual website. Any suggestions on the easiest way to make a glossary? The client has a lot of technical terms in their business.
I would use HTTrack or a similar tool to download the two versions of your customer's website to my PC.
Then, I would align both versions and create a TM in TMX format.
Finally, I would use Synchroterm to extract and curate bilingual terms and expressions from that TM, producing the desired glossary.
So far, I haven't found any AI-driven tool specifically for this bilingual term extraction process; I'm sure someone is working on it.
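
The middle step of this workflow (aligned segments in, TMX out) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the segment pairs are made up, the alignment itself is assumed to have been done already by an alignment tool, and `build_tmx` and the `creationtool` value are hypothetical names, not part of HTTrack, Synchroterm, or memoQ.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang, tgt_lang):
    """Return the root element of a TMX 1.4 document built from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(
        tmx,
        "header",
        {
            "creationtool": "align-sketch",  # hypothetical tool name
            "creationtoolversion": "0.1",
            "segtype": "sentence",
            "o-tmf": "plain-text",
            "adminlang": "en",
            "srclang": src_lang,
            "datatype": "plaintext",
        },
    )
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # One translation unit per aligned pair, with one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


# Made-up Italian/English pairs standing in for aligned website segments.
aligned = [
    ("Valvola di sicurezza", "Safety valve"),
    ("Pressione di esercizio", "Operating pressure"),
]
tmx_text = ET.tostring(build_tmx(aligned, "it", "en"), encoding="unicode")
```

Saving `tmx_text` to a file with a `.tmx` extension should give something a CAT tool or a term extractor can open, although real alignment output usually needs manual review before it is trusted.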
[Edited at 2024-08-16 05:06 GMT]

Philippe Locquet | Portugal | Local time: 03:31 | Member (since 2013) | English to French + ... | Synchroterm or Copilot | Aug 16, 2024
Jorge Payan wrote:
Finally, I would use Synchroterm to extract and curate bilingual terms and expressions from that TM, producing the desired glossary.
Synchroterm is very good at terminology extraction. I get the best results when it is used on bitexts produced by Align Factory (from the same developers, Terminotix). Align Factory can also align directly from the URL if the website doesn't have scraping protection.
Jorge Payan wrote:
So far, I haven't found any AI-driven tool specifically for this bilingual term extraction process; I'm sure someone is working on it.
AI: You don't necessarily need a dedicated tool to do this. If both source and target are found on the same page, you can write a prompt asking Copilot to extract source and target terms from the page. Open Copilot and the page as I describe in this video: https://youtu.be/u4nROnmnIxI?si=YAMmP5yQtwif-HhV You will need to agree to work in no-memory mode (Copilot shenanigans...), then ask it to extract the terms and tell it to format the output in Markdown, which will let you copy and paste it into a table. There is a limit, though: it will probably do only 20 or so term pairs at a time. If you need more, and your prompt works, try "Search GPT" (paid).
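
Since the output arrives in batches of 20 or so pairs, it may be easier to convert each Markdown reply into a tab-separated file than to paste it by hand. The following is a rough sketch, assuming the chatbot returned a two-column pipe table with a header row; the sample reply and the function names `parse_markdown_table` and `to_tsv` are invented for illustration.

```python
import csv
import io


def parse_markdown_table(markdown):
    """Return the data rows of a pipe-delimited Markdown table as lists of cells."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any prose wrapped around the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # skip the |---|---| separator row
        rows.append(cells)
    return rows[1:]  # assumption: the first table row is the header


def to_tsv(rows):
    """Serialize rows as tab-separated text, one term pair per line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up reply standing in for one batch of chatbot output.
reply = """Here are the term pairs:

| Italian | English |
|---|---|
| valvola di sicurezza | safety valve |
| pressione di esercizio | operating pressure |
"""
pairs = parse_markdown_table(reply)
glossary_tsv = to_tsv(pairs)
```

Appending the `glossary_tsv` text from each batch to one file gives a single delimited list, a format that CAT tools can generally import into a termbase. The pairs still need checking, since nothing here verifies that the extracted translations are correct.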
Hope this helps

Mario Chávez | United States | Local time: 22:31 | Member (since 2024) | English to Spanish + ... | No need to be unprofessional | Aug 16, 2024
Have you considered for a moment that the first sentence in my comment was introductory to address the topic, not to address you personally?
At no time was my comment addressing your knowledge or lack thereof. I find your reaction unprofessional and uncalled for.
MC
Miranda Drew wrote:
I've been a translator for 20 years. I know how to check terminology. Your answer is extremely condescending and completely useless.

Miranda Drew | Italy | Local time: 04:31 | Italian to English | TOPIC STARTER | You talk down to me and I'm unprofessional? | Aug 16, 2024
Mario Chávez wrote:
Have you considered for a moment that the first sentence in my comment was introductory to address the topic, not to address you personally?
At no time was my comment addressing your knowledge or lack thereof. I find your reaction unprofessional and uncalled for.
MC
I didn't ask for a general treatise on language and translation. I asked a specific question about specific tools. You decided to mansplain things that I think I learned in kindergarten (wow, words can have more than one meaning?), give me unsolicited advice, and not provide me with anything remotely near an actual answer to my question. You have the right to post whatever you want, but I've dealt with this kind of condescending behavior from men my whole life and I'm not going to be quiet about it anymore, even if that makes me 'unprofessional'.

Dan Lucas | United Kingdom | Local time: 03:31 | Member (since 2014) | Japanese to English | And with files | Aug 16, 2024
Philippe Locquet wrote:
If you need more, and your prompt works, try "Search GPT" (paid). Hope this helps 
It does, thank you. My prompt engineering is not very good, so this is very useful.
Do you think it would be possible to use an LLM and two PDFs to extract terms?
I haven't found Synchroterm useful for Japanese so far.
Regards,
Dan

Miranda Drew | Italy | Local time: 04:31 | Italian to English | TOPIC STARTER | Looks interesting | Aug 16, 2024
That looks useful, I'll check it out, thanks.

Dan Lucas | United Kingdom | Local time: 03:31 | Member (since 2014) | Japanese to English | Unsuccessful | Aug 16, 2024
This is interesting. Unfortunately I tried it with two one-page PDF documents in Japanese and English and it was unable to process the job, several times. The error was not informative. Perhaps it works better with European languages?
But thanks again,
Dan

Philippe Locquet | Portugal | Local time: 03:31 | Member (since 2013) | English to French + ...
Dan Lucas wrote:
Philippe Locquet wrote:
If you need more, and your prompt works, try "Search GPT" (paid). Hope this helps 
It does, thank you. My prompt engineering is not very good, so this is very useful.
Do you think it would be possible to use an LLM and two PDFs to extract terms?
I haven't found Synchroterm useful for Japanese so far.
Regards,
Dan
If you wish to use AI for this task, something like ChatGPT should work. To engineer your prompt, first tell the robot what you want from it, and that it will have to wait for you to upload the two files on which the job is to be executed. Then pop both files in, and Bob's your uncle!
Hope it works (it should, unless ChatGPT complains about PDFs...).
I said ChatGPT, but Claude is very good with text too. They need slightly different prompt styles, but with some tweaking you should be OK.
Bests,
Philippe

Samuel Murray | Netherlands | Local time: 04:31 | Member (since 2006) | English to Afrikaans + ...
Miranda Drew wrote:
What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?
Do you mean a glossary of terms or a translation memory?
I'm not aware of any tool that can reliably create a list of words in the source language that are likely to be "terms" and then find their translations in the target language. In fact, the biggest problem with what you're proposing is how difficult it is to create a list of source language terms. I've tried some programs that do this in the past, but the results were dismal. These tools either assume that (a) frequently occurring words are "terms" or (b) highly unique words are "terms". This approach may work in languages with compound nouns, but not in e.g. English. Do you think it'll work in Italian?
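
To make assumption (a) concrete, here is a minimal sketch of a purely frequency-based extractor. It is not how any particular commercial tool works; the stopword list, the sample text, and the name `candidate_terms` are all invented for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "is", "to", "in", "for", "on", "with"}


def candidate_terms(text, max_words=2, min_count=2):
    """Return (n-gram, count) pairs for word n-grams seen at least min_count times."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip n-grams that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


sample = (
    "The safety valve protects the boiler. "
    "Check the safety valve before you start the boiler. "
    "The valve is on the boiler."
)
candidates = candidate_terms(sample)
```

On this sample the candidate list contains "safety valve", but also the bare words "valve", "boiler", and "safety". Frequency alone cannot tell a multi-word term from its everyday components, so every candidate still has to be judged by a person.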
I have found that the best way to extract terminology from a bilingual website is manually. In other words, create a TM from the website, load that TM into the CAT tool, look up terms regularly as you start translating, and add them to the glossary based on matches from the TM.
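
The lookup step is what CAT tools call a concordance search. Stripped of the interface, it amounts to something like the sketch below, which assumes the TM has already been loaded as a list of (source, target) pairs; the sample segments and the name `concordance` are made up.

```python
def concordance(tm_pairs, query):
    """Return (source, target) pairs whose source contains query, ignoring case."""
    needle = query.casefold()
    return [(src, tgt) for src, tgt in tm_pairs if needle in src.casefold()]


# Made-up TM content standing in for segments aligned from the website.
tm = [
    ("Controllare la valvola di sicurezza.", "Check the safety valve."),
    ("La Valvola di sicurezza è chiusa.", "The safety valve is closed."),
    ("Avviare la caldaia.", "Start the boiler."),
]

glossary = {}
hits = concordance(tm, "valvola di sicurezza")
if hits:
    # The translator reads the hits and decides on the entry; nothing is automatic.
    glossary["valvola di sicurezza"] = "safety valve"
```

The code only finds the segments. Choosing which rendering goes into the glossary remains a manual decision, which is the point of this approach.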
I wonder if it would be possible to ask an AI tool to come up with a list of words in the source language that are "likely to be terms". You can then add that list to your glossary (which would be useful even if the glossary entries have no target text).
[Edited at 2024-08-17 16:35 GMT]