Want to be efficient? Go with the API (memoQ and TMX)

Recently I was given the task of backing up Translation Memories (TMs) from a memoQ server into TMX format. You can of course do it from the memoQ desktop client, but given the number of TMs we have, that would mean at least 400 clicks every time a backup is needed. It would be quite a dull task, and as a human being you can always miss one or two TMs in the process, and you don't want that.

memoQ API to the rescue! Kilgray has made its API pretty robust, and of course exporting TMs to TMX is implemented there. If you ask their support nicely, they'll send you the documentation, including a simple demo client written in C#. But you don't have to limit yourself to one programming language. I'll show you how to do it using Python. I assume you have some knowledge of the language, as I'm not going to explain the code in detail here.

As memoQ provides a SOAP API, we need a library that will allow us to connect to it. I'm using suds, but don't use the one installed via pip. Use the fixed version from this guy; he fixed the issue with the recursion limit, which is a pain when it comes to this particular API. You'll also need the base64 module (part of Python's standard library), as the data you receive will be Base64-encoded. That's it: just two libraries and you're ready to go. My sample script, available below, uses more libraries, but they're only for the flashy things like the progress bar, building paths, etc.
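To see why base64 is on the list: each chunk of the export arrives as Base64 text, and it has to be decoded back to bytes before it is written to the TMX file. A minimal sketch of that decoding step; the TMX fragment below is made up for illustration and is not real memoQ output.

```python
import base64

# Made-up TMX fragment standing in for real server data
original = b'<?xml version="1.0" encoding="utf-8"?><tmx version="1.4">'

# Roughly what a chunk looks like on the wire: the bytes as Base64 text
encoded_chunk = base64.b64encode(original).decode('ascii')

# What the download code does with each chunk before writing it to disk
decoded = base64.b64decode(encoded_chunk)

print(encoded_chunk)
print(decoded == original)  # True
```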

Let's start. As with every service, you first need to connect to it.

from suds.client import Client

wsdl_url = 'http://127.0.0.1:8080/memoqservices/tm?wsdl'
client = Client(wsdl_url)

Of course, if your server is not on the same machine you're connecting from, or it uses a different port for the WS API, you need to adjust wsdl_url accordingly. I assume you know how to enable the API on the memoQ server; if not, go read the documentation.
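If you connect to more than one server, building the URL from its parts saves hand-editing it. A minimal sketch with a hypothetical build_wsdl_url helper; it only assumes the /memoqservices/tm?wsdl path shown above stays the same, and memoq.example.com is a placeholder host name.

```python
def build_wsdl_url(host='127.0.0.1', port=8080):
    # Hypothetical helper: same URL layout as above, with host and port filled in
    return 'http://{0}:{1}/memoqservices/tm?wsdl'.format(host, port)

# The defaults reproduce the URL used above
print(build_wsdl_url())  # http://127.0.0.1:8080/memoqservices/tm?wsdl

# A remote server on another port (placeholder values)
print(build_wsdl_url('memoq.example.com', 8081))
```

With suds, the result goes straight into the constructor, e.g. `client = Client(build_wsdl_url('memoq.example.com', 8081))`.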

OK, you're in. Now you need to check which TMs are available on the server.

allTMs = client.service.ListTMs()

You can pass language codes to the ListTMs() method, so if you only want the English to German TMs, go with:

allTMs = client.service.ListTMs('eng', 'ger')

We have our list, so let's download something. You need a function to do that, and I won't explain what it does step by step, sorry. Figure it out, and maybe you'll write something better in the process.

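import base64

def downloadTMX(outputFilename, guid):
    # Open a chunked export session for the TM identified by guid
    sessionID = client.service.BeginChunkedTMXExport(guid)
    try:
        with open(outputFilename, 'wb') as output:
            while True:
                # Chunks arrive Base64-encoded; None means the export is complete
                chunk = client.service.GetNextTMXChunk(sessionID)
                if chunk is None:
                    break
                output.write(base64.b64decode(chunk))
    finally:
        # Always close the session, even if the download failed
        client.service.EndChunkedTMXExport(sessionID)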

As you can see, it takes two arguments: the name of the output (TMX) file, which should actually be a full path, and the GUID of the TM you want to download (its unique identifier). You can get both when iterating through the TM list acquired earlier.
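Because the file name is built from the TM name, a name containing a character Windows forbids in file names (such as : or ?) would make open() fail. A minimal sketch of a hypothetical safe_tmx_path helper that replaces those characters with underscores; the exact set of characters to strip is an assumption based on the usual Windows rules.

```python
import os
import re

def safe_tmx_path(folder, tm_name):
    # Hypothetical helper: replace < > : " / \ | ? * and control characters
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', '_', tm_name)
    # Trailing dots and spaces also cause trouble on Windows
    cleaned = cleaned.rstrip('. ') or 'unnamed'
    return os.path.join(folder, cleaned + '.tmx')

print(safe_tmx_path('backup', 'Customer A: EN/DE'))
```

In the loop, `filename = safe_tmx_path('c:\\somefolder', tm.Name)` could stand in for the string concatenation.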

for tm in allTMs[0]:
    filename = 'c:\\somefolder\\' + tm.Name + '.tmx'
    downloadTMX(filename, tm.Guid)

Just change somefolder to the real location where you want your TMX files. That's it, it's that simple.
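Without a live server it can be hard to see what downloadTMX actually does. The sketch below replays the same Begin/GetNext/End sequence against a fake stand-in for client.service. FakeTMService and its canned chunks are invented for illustration; only the three method names come from the memoQ API calls used above.

```python
import base64
import os
import tempfile


class FakeTMService(object):
    """Invented stand-in for client.service; mimics the chunked export calls."""

    def __init__(self, chunks):
        # Pre-encode the chunks the way the server delivers them (Base64 text)
        self._chunks = [base64.b64encode(c).decode('ascii') for c in chunks]
        self._pending = []
        self.ended = []

    def BeginChunkedTMXExport(self, guid):
        self._pending = list(self._chunks)
        return 'session-for-' + guid

    def GetNextTMXChunk(self, session_id):
        # Return None once everything has been sent, which ends the loop
        return self._pending.pop(0) if self._pending else None

    def EndChunkedTMXExport(self, session_id):
        self.ended.append(session_id)


def download_tmx(service, output_filename, guid):
    # Same flow as downloadTMX, with the service passed in explicitly
    session_id = service.BeginChunkedTMXExport(guid)
    try:
        with open(output_filename, 'wb') as output:
            while True:
                chunk = service.GetNextTMXChunk(session_id)
                if chunk is None:
                    break
                output.write(base64.b64decode(chunk))
    finally:
        # The session is closed even if writing fails
        service.EndChunkedTMXExport(session_id)


service = FakeTMService([b'<tmx version="1.4">', b'<body/>', b'</tmx>'])
path = os.path.join(tempfile.mkdtemp(), 'demo.tmx')
download_tmx(service, path, 'demo-guid')

with open(path, 'rb') as f:
    print(f.read())   # b'<tmx version="1.4"><body/></tmx>'
print(service.ended)  # ['session-for-demo-guid']
```

Passing the service in as a parameter is what makes the fake possible; with the real server you would call `download_tmx(client.service, filename, tm.Guid)`.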

You can spice things up by filtering TMs by name.

for tm in allTMs[0]:
    if 'Customer' in tm.Name:
        filename = 'c:\\somefolder\\' + tm.Name + '.tmx'
        downloadTMX(filename, tm.Guid)

Only TMs with "Customer" in their name will be downloaded this time. Of course, you can filter by any other property; read the API documentation and have fun.
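One way to generalize the name check is to pass the condition in as a function. A sketch with a hypothetical select_tms helper; the namedtuple only stands in for the TM entries returned by ListTMs, and only the Guid and Name attributes used in the code above are assumed to exist.

```python
from collections import namedtuple

# Stand-in for the TM entries returned by ListTMs (only Guid and Name are used)
TM = namedtuple('TM', ['Guid', 'Name'])


def select_tms(tms, predicate=lambda tm: True):
    # Hypothetical helper: map GUID -> name for every TM the predicate accepts
    return dict((tm.Guid, tm.Name) for tm in tms if predicate(tm))


tms = [TM('g1', 'Customer A EN-DE'), TM('g2', 'Internal'), TM('g3', 'customer B EN-FR')]

# Case-sensitive keyword match, as in the loop above
print(select_tms(tms, lambda tm: 'Customer' in tm.Name))

# Case-insensitive variant that also catches 'customer B EN-FR'
print(select_tms(tms, lambda tm: 'customer' in tm.Name.lower()))
```

Against the real server, tms would simply be allTMs[0].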

Below I'm sharing the full script, which does exactly what I've described above. The API address is hardcoded, so just adjust the wsdl_url variable if needed. It takes the download path as its first parameter, and if you add "-k something" (one word, please, no spaces), it will only download TMs whose names contain that keyword. Oh, and it gets all language combinations, so if you need just some, well, adjust it.

And if you haven't understood a word of what I've just explained but found the whole idea interesting: hire an engineer!

from suds.client import Client
import base64
import progressbar
import argparse
import os

# Parse command-line arguments
parser = argparse.ArgumentParser(description='Download memoQ TMs to TMX')
parser.add_argument(dest='path', action='store', help='specify download path')
parser.add_argument('-k', dest='keyword', type=str,
                    help='specify keyword for TM name lookup')
args = parser.parse_args()

wsdl_url = 'http://127.0.0.1:8080/memoqservices/tm?wsdl'
client = Client(wsdl_url)

allTMs = client.service.ListTMs()

# Map each TM's GUID to its name; with -k, keep only names containing the keyword
guids = dict()
for tm in allTMs[0]:
    if args.keyword is None or args.keyword in tm.Name:
        guids[tm.Guid] = tm.Name

def downloadTMX(outputFilename, guid):
    # Open a chunked export session for the TM identified by guid
    sessionID = client.service.BeginChunkedTMXExport(guid)
    try:
        with open(outputFilename, 'wb') as output:
            while True:
                # Chunks arrive Base64-encoded; None means the export is complete
                chunk = client.service.GetNextTMXChunk(sessionID)
                if chunk is None:
                    break
                output.write(base64.b64decode(chunk))
    finally:
        # Always close the session, even if the download failed
        client.service.EndChunkedTMXExport(sessionID)

# Download every selected TM, showing progress
bar = progressbar.ProgressBar(maxval=len(guids),
    widgets=[progressbar.Bar('=', '[', ']'), ' ', progressbar.Percentage()])
bar.start()
for i, guid in enumerate(guids):
    filename = os.path.join(args.path, guids[guid] + '.tmx')
    downloadTMX(filename, guid)
    # Report how many TMs are finished so far
    bar.update(i + 1)
bar.finish()
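To check how the script interprets its arguments without touching the server, the same parser definition can be fed an explicit argument list. A small sketch; the path, the keyword and the script name in the comment are example values only.

```python
import argparse

# Same argument definitions as in the script above
parser = argparse.ArgumentParser(description='Download memoQ TMs to TMX')
parser.add_argument(dest='path', action='store', help='specify download path')
parser.add_argument('-k', dest='keyword', type=str,
                    help='specify keyword for TM name lookup')

# Equivalent of running (hypothetical script name): python script.py C:\backup -k Customer
args = parser.parse_args(['C:\\backup', '-k', 'Customer'])
print(args.path, args.keyword)

# Without -k, keyword stays None and every TM is downloaded
args = parser.parse_args(['C:\\backup'])
print(args.keyword)  # None
```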
