Pages in topic:   < [1 2]
Lingvopoint - reduce profile info?
Topic starter: Deirdre Brophy (X)
Dan Lucas  Identity Verified
United Kingdom
Local time: 20:04
Member (2014)
Japanese to English
Automating OCR is not difficult Nov 19, 2014

Triston Goodwin wrote:
They would have to come to each of these adapted profiles individually, take a screenshot, run it through an OCR program and then upload the information to their site. And that's assuming they are able to identify which profiles weren't scanned during the crawl.

It's probably easier than you think.

For example, the PyTesser Python module uses the Tesseract OCR engine, so you could scrape a profile using something like Beautiful Soup and check to see if the profile contains any large images. If it does, pass the image to the OCR engine and parse the (text) result. Not perfect, but likely good enough. Otherwise parse the profile text as usual.
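As a rough illustration of the approach described above, here is a minimal Python sketch. Two substitutions are assumed: the standard library's `html.parser` stands in for Beautiful Soup, and the `pytesseract` wrapper stands in for PyTesser (both drive the same kind of parsing and the Tesseract engine, respectively). The "large image" rule is a hypothetical heuristic: an `<img>` counts as large only if its declared `width` and `height` attributes are both at least 200 px. The names `LargeImageFinder`, `analyse_profile`, `ocr_image` and `MIN_SIZE` are illustrative, not part of any real tool.

```python
from html.parser import HTMLParser

# Hypothetical threshold: images at least this many pixels wide AND tall
# are treated as "large" (i.e. likely to be text rendered as an image).
MIN_SIZE = 200


class LargeImageFinder(HTMLParser):
    """Collects <img> sources with large declared sizes, plus the page's visible text."""

    def __init__(self, min_size=MIN_SIZE):
        super().__init__()
        self.min_size = min_size
        self.large_images = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        try:
            w = int(a.get("width", 0))
            h = int(a.get("height", 0))
        except (TypeError, ValueError):
            # e.g. width="100%" or a bare attribute: size unknown, skipped in this sketch
            return
        if w >= self.min_size and h >= self.min_size and a.get("src"):
            self.large_images.append(a["src"])

    def handle_data(self, data):
        # Ordinary profile text, parsed "as usual"
        if data.strip():
            self.text_parts.append(data.strip())


def analyse_profile(html, min_size=MIN_SIZE):
    """Return (list of large image URLs, plain text) for one profile page."""
    finder = LargeImageFinder(min_size)
    finder.feed(html)
    finder.close()
    return finder.large_images, " ".join(finder.text_parts)


def ocr_image(path):
    """OCR a downloaded image file (requires Pillow, pytesseract and Tesseract)."""
    # Imported lazily so the parsing part above runs without these installed
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))


if __name__ == "__main__":
    page = ('<div><p>About me</p>'
            '<img src="about.png" width="600" height="400">'
            '<img src="flag.gif" width="16" height="11"></div>')
    images, text = analyse_profile(page)
    print(images, text)  # ['about.png'] About me
```

Downloading each URL in `images` and passing the file to `ocr_image` would complete the pipeline; pages whose "About me" sits in the text layer are covered by `text` alone.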

In theory - I say that because it is ProZ, not ourselves, who controls the site - we have two obvious choices. First, try to close the site completely, which would, whatever some members may think, have a chilling effect on site use. Second, just accept that scammers are a fact of life in any profession that delivers intangible services over the internet, and work round them.

Dan


Triston Goodwin  Identity Verified
United States
Local time: 13:04
Spanish to English
Automated OCR Nov 19, 2014

Dan Lucas wrote:

Triston Goodwin wrote:
They would have to come to each of these adapted profiles individually, take a screenshot, run it through an OCR program and then upload the information to their site. And that's assuming they are able to identify which profiles weren't scanned during the crawl.

It's probably easier than you think.

For example, the PyTesser Python module uses the Tesseract OCR engine, so you could scrape a profile using something like Beautiful Soup and check to see if the profile contains any large images. If it does, pass the image to the OCR engine and parse the (text) result. Not perfect, but likely good enough. Otherwise parse the profile text as usual.

In theory - I say that because it is ProZ, not ourselves, who controls the site - we have two obvious choices. First, try to close the site completely, which would, whatever some members may think, have a chilling effect on site use. Second, just accept that scammers are a fact of life in any profession that delivers intangible services over the internet, and work round them.

Dan




I think you're right. I personally lean more towards the second option.

I hadn't seen this kind of automated OCR tool before. Using an image might still be effective at first, since it's not something we really see here on ProZ. I know Google sure had a hard time with my profile when I used an image instead of text for my About Me a few months ago.


Thayenga  Identity Verified
Germany
Local time: 21:04
Member (2009)
English to German
Additionally Nov 20, 2014

Maija Cirule wrote:

As a preventive measure, I have included the following sentence in my "About me" text: For business correspondence, I use ONLY the EMAIL address WITH THE DOMAIN NAME specified in my profile, no gmail, yahoo, hotmail, etc.; therefore, any business-related e-mails sent in my name from free email addresses are INVALID. Besides, I have encrypted my CV (of course, it can be retyped, but it cannot be copied or edited). And last but not least: never ever include your e-mail address in your CV or elsewhere.


My CVs are not publicly available; they are sent only upon request, and even then they include no sensitive information. My address, Skype, location, email address, etc. are provided on my invoice upon the first job assignment. This might "scare off" a few potential customers, but serious, legitimate agencies and end clients understand these precautions, which protect both parties. Additionally, I have password-protected my PDF business brochures so they cannot be copied or printed, only retyped if someone has the time. They also have my name in text fields/watermarks across the pages so that screenshots cannot be "marketed".


DLyons  Identity Verified
Ireland
Local time: 20:04
Spanish to English
Can be bypassed Nov 20, 2014

Thayenga wrote:

Additionally, I have password-protected my PDF business brochures so they cannot be copied or printed, only retyped if someone has the time. They also have my name in text fields/watermarks across the pages so that screenshots cannot be "marketed".



It's not hard to get around password protection on PDFs. But time is money to scammers, so they usually ignore anything that takes extra effort and move on to someone else.