Translating PDF Files with Free Tools
By Eric Le Carre
Published 03/4/2009
Translation Techniques
Quicklink: http://hun.proz.com/doc/2264
If, like me, you receive many requests for quotes on PDF documents, especially PDF marketing materials, and you don't know where to start because you don't own PDF editing software such as ABBYY PDF Transformer, Nuance PDF Converter or Solid Converter PDF, this article shows you how to count the words in your PDF files, extract their text and keep its formatting.
This solution is a workaround rather than a complete solution for translating PDF files, as it may require some manual editing. However, it is well suited to short and mid-sized PDF documents, especially marketing materials.
Please note that this solution doesn't work if your PDF file is password-protected and has its PDF security options turned on.
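The tools you need are the following:
- Adobe Reader
- AbracadabraCompteur 2, a word-counting plug-in for Adobe Reader
- Translator's Abacus
- AutoUnbreak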
For information on how and where to install these programs, especially AbracadabraCompteur 2, read the accompanying documentation.
Counting Words
Counting words is the first step: you need to know how many words your PDF file contains before you can give your customer a quote.
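As a rough illustration of how a word count turns into a price, here is a minimal Python sketch. The per-word rate and the minimum fee are hypothetical values chosen for the example; your own rates and pricing rules will differ.

```python
from decimal import Decimal, ROUND_HALF_UP


def quote(word_count: int, rate_per_word: Decimal, minimum_fee: Decimal) -> Decimal:
    """Return words x rate, but never less than the minimum fee.

    The minimum-fee rule is a hypothetical pricing policy used for illustration.
    """
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    price = Decimal(word_count) * rate_per_word
    # Round to cents for display on the quote.
    price = price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return max(price, minimum_fee)


# Hypothetical figures: 0.10 per word, minimum fee of 30.00.
print(quote(1250, Decimal("0.10"), Decimal("30.00")))  # 125.00
print(quote(200, Decimal("0.10"), Decimal("30.00")))   # 30.00
```

Decimal is used instead of float so that amounts such as 0.10 are represented exactly on the quote.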
To count words with AbracadabraCompteur 2:
- In Adobe Reader, select Tools > Word Counter > Current Page to count the words on the currently displayed page, or Tools > Word Counter > Document to count the words in a PDF file with more than one page.
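The difference between the Current Page and Document commands can be pictured with a short Python sketch that counts words page by page and then totals them. It assumes the text of each page is already available as a string and treats any run of non-whitespace characters as one word; AbracadabraCompteur 2 may apply different counting rules, so its figures can differ.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simplified word-counting rule)."""
    return len(text.split())


def page_counts(pages: list[str]) -> list[int]:
    """Word count of each page, similar in spirit to Current Page."""
    return [count_words(page) for page in pages]


def document_count(pages: list[str]) -> int:
    """Word count of the whole document, similar in spirit to Document."""
    return sum(page_counts(pages))


# Hypothetical two-page document.
pages = [
    "Our new product line\nis available now.",
    "Contact our sales team today.",
]
print(page_counts(pages))     # [7, 5]
print(document_count(pages))  # 12
```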
To count words with Translator's Abacus:
- Double-click WordCount.exe, the executable file for Translator's Abacus. For my part, I put it under C:\Program Files\Translator'sAbacus3.1 and created a shortcut on the Windows Desktop.
- In the Translator's Abacus window, click Add files.
- In the Open File window, select the PDF file or files whose words you want to count.
- Click Report Word Count.
- The word count is displayed in your web browser.
- In the Translator's Abacus window, click Exit to quit the application.
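Translator's Abacus presents its results in your web browser. The Python sketch below imitates that idea: it counts the words in several plain-text files and builds a small HTML table with a count per file and a total. The file names, the whitespace-based counting rule and the report layout are assumptions for illustration; this is not the report Translator's Abacus produces, and unlike Translator's Abacus it does not read PDF files directly.

```python
import html
import tempfile
from pathlib import Path


def word_count_report(files: list[Path]) -> tuple[str, int]:
    """Return an HTML table of per-file word counts, plus the total count.

    Words are counted as whitespace-separated tokens, a simplified rule.
    """
    rows = ["<tr><th>File</th><th>Words</th></tr>"]
    total = 0
    for path in files:
        words = len(path.read_text(encoding="utf-8").split())
        total += words
        # Escape file names so characters like & or < cannot break the HTML.
        rows.append(f"<tr><td>{html.escape(path.name)}</td><td>{words}</td></tr>")
    rows.append(f"<tr><th>Total</th><th>{total}</th></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>", total


# Demo with two hypothetical text files created in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "brochure.txt").write_text("Fast, reliable service.", encoding="utf-8")
    (folder / "flyer.txt").write_text("Call us today for a free quote.", encoding="utf-8")
    report, total = word_count_report(sorted(folder.glob("*.txt")))
    (folder / "report.html").write_text(report, encoding="utf-8")
    print(total)  # 10
```

To view such a report in a browser, you could save the HTML file to a permanent location and open it with the webbrowser module from Python's standard library.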
Extracting the Text...
Here is how to extract the text from the PDF file.
In Adobe Reader, select Edit > Select All, then select Edit > Copy. You can also use the key combinations Ctrl+A (Select All) and Ctrl+C (Copy). All the selected text is then copied to the Windows Clipboard.
...and Keeping Its Formatting
Using AutoUnbreak, you can keep the basic formatting attributes of the original PDF file (font names, sizes, colors, etc.) and remove most of the carriage returns/line breaks that you get when you simply copy and paste the contents of a PDF file into an empty RTF or MS Word document.
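AutoUnbreak's own algorithm is not described here, but the general idea of removing unwanted line breaks can be illustrated with a simple heuristic in Python: a line break is kept when it follows sentence-ending punctuation or separates paragraphs (a blank line), and is otherwise replaced by a space. This is a hypothetical sketch that works on plain text only; unlike AutoUnbreak, it does not preserve font names, sizes or colors, and it will misjudge cases such as headings without final punctuation.

```python
import re

# Line endings after which a break is assumed to be intentional (hypothetical rule).
SENTENCE_END = (".", "!", "?", ":", ";")


def unbreak(text: str) -> str:
    """Join lines broken mid-sentence, keeping sentence-end and paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text)
    result = []
    for para in paragraphs:
        merged = ""
        for line in para.splitlines():
            line = line.strip()
            if not line:
                continue
            if not merged:
                merged = line
            elif merged.endswith(SENTENCE_END):
                # Previous line ended a sentence: keep the line break.
                merged += "\n" + line
            else:
                # Previous line stopped mid-sentence: replace the break with a space.
                merged += " " + line
        if merged:
            result.append(merged)
    # Separate paragraphs with one blank line.
    return "\n\n".join(result)


copied = "Our new product line\nis available now.\nOrder today\nand save.\n\nContact us"
print(unbreak(copied))
```

Here the two sentences each end up on a single line, and the blank line before "Contact us" is kept as a paragraph break.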
To keep the formatting of your original PDF file:
- Double-click AutoUnbreak.exe, the executable file for AutoUnbreak, to start the application.
- In the AutoUnbreak main window, click the 1. Paste button to paste the contents of the Windows Clipboard into AutoUnbreak.
- When the contents of the Windows Clipboard are in the AutoUnbreak main window, click the 2. Unbreak! button to remove the carriage returns/line breaks.
- In the Processing done! message window that appears, click OK.
- Back in the AutoUnbreak main window, click the 3. Copy results button.
- In the Text copied to clipboard message window that appears, click OK.
- Back in the AutoUnbreak main window, click Quit to close AutoUnbreak.
- Start MS Word.
- In an empty MS Word document, press Ctrl+V to paste the resulting text from your AutoUnbreak session into MS Word.
You can now compare the text in MS Word with the original PDF file to see whether any carriage returns/line breaks or other formatting issues remain. You will have to fix these manually.
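To speed up that comparison, a small Python sketch can flag lines that probably still end in the middle of a sentence, assuming the text has been saved or copied as plain text. It flags a line when it does not end with sentence-ending punctuation and the next line starts with a lowercase letter. This rule is an assumption and will produce both false alarms and misses; it only points you to places worth checking by hand.

```python
def suspicious_breaks(text: str) -> list[int]:
    """Return 1-based numbers of lines that likely end mid-sentence."""
    lines = text.splitlines()
    flagged = []
    # Pair each line with the one after it.
    for number, (line, next_line) in enumerate(zip(lines, lines[1:]), start=1):
        current = line.rstrip()
        following = next_line.lstrip()
        if (
            current
            and following
            and not current.endswith((".", "!", "?", ":", ";"))
            and following[0].islower()
        ):
            flagged.append(number)
    return flagged


sample = "Our new product line\nis available now.\nOrder today and save.\nVisit our\nwebsite."
print(suspicious_breaks(sample))  # [1, 4]
```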
When you are happy with your new MS Word document, you can start translating it with the translation memory system of your choice.
Happy translating!
This article was written with KompoZer, an open-source WYSIWYG (What You See Is What You Get) HTML editor.
Copyright © ProZ.com, 1999-2024. All rights reserved.