Convert ae, oe, ue, ss to ä, ö, ü, ß where applicable
Thread poster: Hans Lenting

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
Jan 1

I have a list with about 40K (1) entries where ä, Ä, ö, Ö, ü, Ü and ß have been transcribed as ae, Ae, oe, Oe, ue, Ue and ss. But the list also contains (2) entries where ae, Ae, oe, Oe, ue, Ue and ss are not transcriptions of ä, Ä, ö, Ö, ü, Ü and ß.

Question: How can I correct entries of type (1) but leave entries of type (2) unmodified?

Ablesegeraet
Ablieferungspruefung (1)
Ablieferungspruefungen
abmeisseln
Abmessen
Abmessung
Abschaltfrequenz (2)
Abschaltreaktivitaet
Abschaltsteuerung
Abschaltverstaerker
abschiessen
Abschirmbehaelter (1)
Abschirmungsschlauch (2)
Abschlaege
Abschlaeger
Abschlaglaenge
Abschlagschuss
abschliessen


 

esperantisto  Identity Verified
Local time: 22:28
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
Spellcheck Jan 1

First, batch-replace ae with ä, oe with ö, etc. Then revert the most obvious wrong replacements, such as ßch back to ssch. Finally, run a spellchecker and correct as suggested.


Erik Freitag  Identity Verified
Germany
Local time: 20:28
Member (2006)
Dutch to German
+ ...
Exactly Jan 1

esperantisto wrote:

First, batch-replace ae with ä, oe with ö, etc. Then revert the most obvious wrong replacements, such as ßch back to ssch. Finally, run a spellchecker and correct as suggested.


That'd be my advice, too. Wrong replacements (false positives) involving umlauts will be few and far between anyway. You'll have most of them covered by re-replacing "qü" with "que", "Qü" with "Que", "eü" with "eue", and "Eü" with "Eue". Then, as esperantisto suggests, do ßch->ssch. Correct what's left over with a spellchecker (preferably a good one; the old Duden spellchecker comes to mind).
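
The two passes described above could be sketched in Python roughly as follows. This is only an illustration: the function names `blind_convert` and `_multi_replace` are made up for this example, and the revert table contains just the patterns mentioned in this thread, so a real list will need more.

```python
import re

# First pass: transcription -> umlaut/eszett, applied blindly everywhere.
FORWARD = {"ae": "ä", "oe": "ö", "ue": "ü",
           "Ae": "Ä", "Oe": "Ö", "Ue": "Ü", "ss": "ß"}

# Second pass: known-bad results of the blind pass, put back as they were.
REVERT = {"qü": "que", "Qü": "Que", "eü": "eue", "Eü": "Eue", "ßch": "ssch"}

def _multi_replace(text, table):
    """Replace every key of `table` found in `text` in a single scan."""
    pattern = re.compile("|".join(re.escape(key) for key in table))
    return pattern.sub(lambda m: table[m.group(0)], text)

def blind_convert(word):
    """Blind replacement followed by reverting the obvious mistakes."""
    return _multi_replace(_multi_replace(word, FORWARD), REVERT)

if __name__ == "__main__":
    for w in ["Ablesegeraet", "Abschaltfrequenz", "Abschirmungsschlauch", "Abmessen"]:
        print(w, "->", blind_convert(w))
```

Words such as "Abmessen" still come out wrong ("Abmeßen") because no revert rule covers them, which is why the spellchecker step remains necessary.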

You may be left with not as many manual corrections as one would think at first glance.

Good luck!



Samuel Murray  Identity Verified
Netherlands
Local time: 20:28
Member (2006)
English to Afrikaans
+ ...
@Hans Jan 2

Hans Lenting wrote:
I have a list with about 40 000 entries...
How can I correct entries of type (1) but leave entries of type (2) unmodified?


I'm afraid you're going to have to use a spell-checker, and it would have to be a spell-checker capable of checking compound nouns. Do you have such a spell-checker? I would be surprised if MS Word's spell-checker can't do this sort of thing.

Then it's a matter of removing mis-spelled words from the list, then doing conversions on those mis-spelled words, then removing the mis-spelled words from that list, and then you're left with a list of words that your spell-checker doesn't recognise with or without the conversion, which you'd have to check manually. One possible downside to this method (that you can work around, if you know of it) is that only one variant of a word will end up in the final list. So if for example both "ass" and "aß" are valid German words, then only one of them will end up in your list.
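
As a rough sketch of that triage (not the Word macro itself), the following Python splits a list into words that are valid as they stand, words that become valid after conversion, and words that need a manual check. The small `lexicon` set is only a stand-in for a real spell-checker, and `triage` and `naive_convert` are hypothetical names chosen for this example.

```python
PAIRS = [("ae", "ä"), ("oe", "ö"), ("ue", "ü"),
         ("Ae", "Ä"), ("Oe", "Ö"), ("Ue", "Ü"), ("ss", "ß")]

def naive_convert(word):
    """Replace every transcription pair, without any checks."""
    for src, dst in PAIRS:
        word = word.replace(src, dst)
    return word

def triage(words, is_valid, convert=naive_convert):
    """Return (valid as-is, {original: converted}, needs manual check)."""
    keep, fixed, manual = [], {}, []
    for word in words:
        if is_valid(word):
            keep.append(word)            # recognised unchanged: leave alone
            continue
        candidate = convert(word)
        if is_valid(candidate):
            fixed[word] = candidate      # recognised only after conversion
        else:
            manual.append(word)          # unknown either way
    return keep, fixed, manual

if __name__ == "__main__":
    # Assumption: a tiny word set standing in for the spell-checker's dictionary.
    lexicon = {"Abmessung", "Abschaltfrequenz", "Ablesegerät", "Abschläge"}
    words = ["Abmessung", "Ablesegeraet", "Abschlaege", "Abschlagschuss"]
    print(triage(words, lexicon.__contains__))
```

Because a word that is already valid is never converted, a word that is valid in both spellings keeps its original form, which is the downside mentioned above.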

I use a macro in MS Word from editorium.com that makes a list of mis-spelled words, although the macro does not remove those words from the original list (so you'd have to find a way of doing that). On a large document with many mis-spellings, your display could freeze until the macro has run its entire course. You can try to increase the speed by replacing line breaks with spaces temporarily. You may also benefit from a different macro (or second macro) that highlights mis-spelled words in the original list. I googled for it and found one that works for me, here. In addition, I confirm that this macro works in Excel 365 (at least, it works in French) -- it highlights whole cells, so you'd have to ensure you have one word per cell.

Samuel

[Edited at 2021-01-02 12:00 GMT]



Heinrich Pesch  Identity Verified
Finland
Local time: 21:28
Member (2003)
Finnish to German
+ ...
Replace qu/Qu and ssch Jan 2

Replace these with placeholder characters, then run the general replacement of ue -> ü, ss -> ß, etc. Afterwards, convert the placeholders back.
I am satisfied with Word's spell checker.
With ß you have to keep in mind, of course, that ß follows diphthongs but only rarely follows single vowels. So I would convert iess to ieß across the board, and so on. Or the list is meant for Switzerland, in which case there is no ß at all.
In the end you will still have to check the list manually.
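
A minimal Python sketch of this placeholder idea, under a few assumptions: the placeholders are private-use Unicode characters that are presumed not to occur in the list, only the sequences named in this thread (qu, Qu, ssch) are protected, and `convert_with_placeholders` is a name invented for the example.

```python
# Sequences to shield from the general replacement, each mapped to a
# private-use character assumed not to occur anywhere in the word list.
PROTECT = {"qu": "\uE000", "Qu": "\uE001", "ssch": "\uE002"}

GENERAL = [("ae", "ä"), ("oe", "ö"), ("ue", "ü"),
           ("Ae", "Ä"), ("Oe", "Ö"), ("Ue", "Ü"), ("ss", "ß")]

def convert_with_placeholders(word):
    # 1. Hide the protected sequences behind placeholders.
    for seq, mark in PROTECT.items():
        word = word.replace(seq, mark)
    # 2. Run the general replacement on what is left.
    for src, dst in GENERAL:
        word = word.replace(src, dst)
    # 3. Turn the placeholders back into the original sequences.
    for seq, mark in PROTECT.items():
        word = word.replace(mark, seq)
    return word

if __name__ == "__main__":
    for w in ["Abschaltfrequenz", "Abschirmungsschlauch",
              "abschliessen", "Abschirmbehaelter"]:
        print(w, "->", convert_with_placeholders(w))
```

Compared with reverting mistakes afterwards, protecting the sequences first means the general replacement never touches them, but the result still needs the manual check.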



Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
It was a lot of work Jan 3

Heinrich Pesch wrote:

In the end you will still have to check the list manually.


That is exactly how I did it. And I learned a new word along the way:

https://iate.europa.eu/search/standard/result/1609653594195/1

Rebate on the rebate. I think that says it all. This German word is perhaps doomed to perish. Curiously, there’s no entry for “good riddance”.


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Both forms Jan 4

Samuel Murray wrote:

One possible downside to this method (that you can work around, if you know of it) is that only one variant of a word will end up in the final list. So if for example both "ass" and "aß" are valid German words, then only one of them will end up in your list.


I used this list to fix misspellings in my downloaded copy of the IATE de_nl. Since I added the term pairs with the corrected spelling, the old ones, probably from the beginning of IATE, are still available.

On the other hand, there will be many term pairs where I incorrectly replaced an ae with ä, etc. For my purposes, that doesn't matter: the correct spelling forms are still available. I wonder whether the IATE will ever be corrected in this regard. Probably not, since that would be a gigantic operation.


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Another approach Jan 9

In order to reduce the number of words that I would have to check manually, I came up with this other approach:

From various sources I collected lists with correctly spelled German words. I placed them in one file of about 500K words. From this list I extracted all words with an ä, Ä, ö, Ö, ü, Ü or ß, resulting in a new list of about 76K words.

I changed all words in this list to lowercase and copied them to the second column of a spreadsheet. I then replaced all ä, ö, ü and ß in the 76K list to ae, oe, ue and ss and copied the result to the first column of the spreadsheet.

Finally, I used this spreadsheet to make case-adaptive replacement to the original list of 40K words with incorrect spelling.

So, using the 76K list I have entries like:

[Screenshot 2021-01-09 at 09.55.52]

and:

[Screenshot 2021-01-09 at 09.56.06]

And with this, I can correct words like:

Fuehrungsgelaende
Gelaendefuehrung
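
For anyone who prefers a script to a spreadsheet, the same idea might look roughly like this in Python. It is a sketch under assumptions: the reference list here is a tiny sample rather than the 500K-word file, keys are matched as substrings so that compound words are covered, and the names `build_lookup` and `make_corrector` are invented for the example.

```python
import re

def transcribe(word):
    """ä/ö/ü/ß -> ae/oe/ue/ss (input is expected to be lowercase)."""
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")):
        word = word.replace(src, dst)
    return word

def build_lookup(reference_words):
    """Map the transcribed lowercase form to the correct lowercase form."""
    lookup = {}
    for word in reference_words:
        low = word.lower()
        if any(ch in low for ch in "äöüß"):
            lookup[transcribe(low)] = low
    return lookup

def make_corrector(lookup):
    """Return a function that applies the lookup, adapting initial capitals."""
    if not lookup:
        return lambda word: word
    keys = sorted(lookup, key=len, reverse=True)   # prefer the longest match
    pattern = re.compile("|".join(map(re.escape, keys)), re.IGNORECASE)

    def fix(match):
        found = match.group(0)
        repl = lookup[found.lower()]
        # Case-adaptive: keep an initial capital if the original had one.
        return repl[0].upper() + repl[1:] if found[0].isupper() else repl

    return lambda word: pattern.sub(fix, word)

if __name__ == "__main__":
    correct = make_corrector(build_lookup(["Führung", "Gelände", "Frequenz"]))
    for w in ["Fuehrungsgelaende", "Gelaendefuehrung", "Abschaltfrequenz"]:
        print(w, "->", correct(w))
```

Substring matching can still produce wrong replacements inside unrelated words, so the result is a shorter list to check, not a list that needs no checking.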



[Edited at 2021-01-09 09:07 GMT]

