You never know when some knowledge will be useful

I'm a big fan of learning. I can't stand a day without learning something new, and I can't stand people who don't learn either. Recently I was reading Black Hat Python. It's a great book and I truly recommend it. Even a noob like me could figure out the stuff described there; it's just brilliant. Anyway, I was reading it for fun, and maybe to use it in my private projects sometime. I never really thought that I could use it at work. Sure, it would help me develop my skills in general, but there was nothing to apply directly. So I thought. And I was wrong.

In chapter 10, in the section Winning the Race, you can read, among other things, about a nicely working folder monitor. That got me thinking. We have FineReader 12 Professional at work and I had already been playing with its command line magic, thanks to Stack Overflow. I also knew that ABBYY offers Hot Folder functionality, but only in the Corporate version, which is more expensive, of course. So hey, why not combine that folder monitor with the command line options and create my own Hot Folder? And so I did.

I won't guide you through folder monitoring. People smarter than I am have figured it out for you, so just head to this script and adjust it to your needs. You shouldn't care much about anything other than "if action == FILE_CREATED:", as this is where you'll put the call to my method. I strongly advise you to strip the things you don't really need from this code. If you don't know how, hire an engineer. The method is really simple.

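import os
import subprocess

def OCR(filePath, fileName, language):
    # Replace the input file's extension with .docx
    fileName = os.path.splitext(fileName)[0] + '.docx'
    savePath = os.path.join(saveDir, fileName)  # saveDir is a global defined elsewhere
    # Launch FineReader on the input file, write the .docx, and quit when done
    command = ['FineCMD.exe', filePath, '/lang', language, '/out', savePath, '/quit']
    subprocess.call(command)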

As you can see, it just launches FineReader with the specified recognition language and input file; it also tells it to output a docx and quit after processing the input file. And yeah, saveDir is defined elsewhere in the code as a global variable. Good catch :)
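If you want to see what actually gets sent to FineReader before letting it loose on real files, a dry run helps. The following is a minimal sketch, not part of my script: build_finecmd_args is a hypothetical helper, the sample file name is made up, and the /lang, /out and /quit switches are simply the ones used in this post, so check them against your FineReader version. It uses ntpath so that Windows paths are handled the same way on any machine.

```python
import ntpath  # Windows path rules, so the sketch behaves the same on any OS


def build_finecmd_args(file_path, language, save_dir):
    """Return the FineCMD.exe argument list for one input file (hypothetical helper)."""
    # Drop the directory and the extension, then add .docx for the output name.
    base_name = ntpath.splitext(ntpath.basename(file_path))[0]
    save_path = ntpath.join(save_dir, base_name + ".docx")
    # Switches as used in this post: recognition language, output file, quit when done.
    return ["FineCMD.exe", file_path, "/lang", language, "/out", save_path, "/quit"]


if __name__ == "__main__":
    # Made-up sample input; prints the argument list instead of running anything.
    print(build_finecmd_args(r"D:\Data\FR\Read\German\scan 01.pdf",
                             "german", r"D:\Data\FR\Out"))
```

Passing the arguments as a list means you don't have to quote paths with spaces by hand.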

To make your life a bit easier, you can create a dictionary of languages and paths to be monitored. Good guess again, "out" will be your saveDir :)

{
  "english": "D:\\Data\\FR\\Read\\English",
  "german": "D:\\Data\\FR\\Read\\German",
  "hungarian": "D:\\Data\\FR\\Read\\Hungarian",
  "out": "D:\\Data\\FR\\Out"
}
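One way to use such a mapping is to keep it as JSON and split it into the folders to watch and the output folder when the script starts. This is only a sketch under that assumption: the post does not say where the dictionary lives, and load_config is a hypothetical helper name.

```python
import json

# The mapping from above as JSON text; it could just as well be read from a file.
CONFIG_TEXT = r'''
{
  "english": "D:\\Data\\FR\\Read\\English",
  "german": "D:\\Data\\FR\\Read\\German",
  "hungarian": "D:\\Data\\FR\\Read\\Hungarian",
  "out": "D:\\Data\\FR\\Out"
}
'''


def load_config(text):
    """Return (folders to monitor keyed by language, output folder)."""
    dirs = json.loads(text)
    save_dir = dirs.pop("out")  # "out" is the save directory, not a watched folder
    return dirs, save_dir


if __name__ == "__main__":
    dirsToMonitor, saveDir = load_config(CONFIG_TEXT)
    print(sorted(dirsToMonitor), saveDir)
```

Taking "out" out of the dictionary up front means the remaining keys are all recognition languages.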

And then launch the monitors (yeah, there are many, each on its own thread) like this:

import threading
for language, path in dirsToMonitor.items():
    if language != "out":  # "out" is saveDir, not a folder to watch
        threading.Thread(target=start_monitor, args=(path, language)).start()
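The monitor itself is not reproduced in this post. If you just want to try the idea, here is a rough, portable sketch that polls a folder instead of reacting to FILE_CREATED events the way the original script does. That is a different and cruder technique: it can miss a file that appears and disappears between two polls, and it may report a file before it has been fully written. The names find_new_files, poll_folder and report are hypothetical, and the callback is only a stand-in for the OCR call.

```python
import os
import tempfile
import threading
import time


def find_new_files(path, seen):
    """Return (sorted names not in `seen`, current set of names) for `path`."""
    current = set(os.listdir(path))
    return sorted(current - seen), current


def poll_folder(path, language, on_created, stop_event, interval=1.0):
    """Call on_created(file_path, file_name, language) for each file that appears."""
    seen = set(os.listdir(path))  # files already present are ignored
    while not stop_event.is_set():
        created, seen = find_new_files(path, seen)
        for file_name in created:
            on_created(os.path.join(path, file_name), file_name, language)
        stop_event.wait(interval)  # sleep, but wake up early when asked to stop


if __name__ == "__main__":
    watched = tempfile.mkdtemp()
    stop = threading.Event()

    def report(file_path, file_name, language):
        # Stand-in for the OCR call from this post.
        print("would OCR", file_name, "as", language)
        stop.set()

    worker = threading.Thread(target=poll_folder,
                              args=(watched, "german", report, stop, 0.1))
    worker.start()
    time.sleep(0.5)  # give the worker time to take its first snapshot
    open(os.path.join(watched, "scan.pdf"), "w").close()
    worker.join(5)
    stop.set()
```

The stop event is there so each monitor thread can be shut down cleanly instead of being killed along with the process.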

You can really do anything you like with it. I've also added a simple cleanup function, just so my hard drive doesn't get swamped with input files that are no longer needed. I won't share it here though; go create your own. It'll be better, I promise, as it'll be suited to your needs.

Back to the book. I've read a lot of criticism on the web directed at this and other cookbooks. It can all be summed up as "people don't get the idea of a cookbook". When you're making dinner and using a "real" cookbook, you want only simple directions: what you need to do and in what order. That's it. I don't think any normal person expects a book of recipes to tell them how to make wine or what a quarter is. Yet for some strange reason people expect exactly that from IT cookbooks. And that's wrong. Such a book is just a collection of recipes which could help you solve common problems. If you want to know more about some area, or apply it to another problem like I did, research. And think. Don't expect everything to be handed to you on a silver platter. I know that you were taught this approach at school, but it's wrong. If you want to achieve anything in life, research. Learn by yourself. Nobody will do that for you, and while some may guide you a bit, you really need to find your own way. Get your own understanding of the subject. And figure out your own application of what you've learned. Otherwise you're just a repeater, and you don't want to be that.
