Peter's Blog

Recently I revived my subscription to Wired. I bought a yearly subscription for $5, which is a great price. I used my Gmail address when ordering it and received a confirmation at this address. However, after logging in I found out my subscription wasn’t active and I couldn’t activate it. Let’s create a support ticket, I thought, and off to the support page I went. I needed to log in again and it asked me to activate my subscription, again.

Some time ago I created a tool that tested our Translation Memories (TMs). It was useful because they became corrupted quite often, and the sooner we knew about it the better. So I set it to run every day. These days problems with TMs are sporadic, but I’m still running it just in case. The process is automated and all it requires from me is to read a short report each day, so no biggie.

Agile is the hype these days and there’s probably not a single startup that isn’t using this approach right now. In big corporations the adoption is slower, but they are also moving toward it, as it simply works. Develop in small steps, measure how well the product works so far, then move forward or take a step back and fix. Data is really crucial here. Measure how much a new improvement helped with your problem, or measure how it made some things harder, as it can go both ways.

I’ve been interested in computers since I learnt that they exist, and I’ve always wanted to work with them. But life is strange and I landed with an MA in Marketing and Management. Thankfully, these days education doesn’t chain you to one type of work, and currently I work as a, sort of, software engineer. But I’ve always felt like an imposter, because I don’t have a formal CS education. The Imposter’s Handbook to the rescue!

The BOM (byte order mark) is a Unicode character, U+FEFF. In the context of UTF-8 the most important thing is that it confirms that the file is UTF-8 encoded (most probably), because there’s no other way to be almost certain (you never get 100% confidence with encodings). Of course there are methods of heuristic analysis which can offer high accuracy. I myself am using Mozilla Universal Charset Detector, but it’s still guessing.
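To make this concrete, here is a minimal Python sketch of checking for the UTF-8 BOM. The function names are made up for illustration. It relies only on the fact that U+FEFF encoded as UTF-8 is the three bytes EF BB BF, which the standard library exposes as `codecs.BOM_UTF8`.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 encoded BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if there is one.

    The "utf-8-sig" codec accepts input both with and without a BOM.
    """
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "hello".encode("utf-8")
    print(has_utf8_bom(sample))   # file is (most probably) UTF-8
    print(decode_utf8(sample))    # text without the BOM character
```

A missing BOM tells you nothing, of course. The file can still be UTF-8, and then you are back to guessing.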

This year alone I’ve written more than a thousand lines of code for various libraries. And I’ve started to write libraries ;) Each of my methods is unit tested. My classes are as well designed as possible based on my current knowledge. I can write a new automation script for our servers in a matter of hours (sometimes minutes), compared to days not so long ago. I’m able to test most things before they get on the server, so I’ve almost eliminated the vicious cycle of build-deploy-check-fix.

I’m very interested in computer security. From the beginning I’ve had an idol, or idols even, whom I’ve followed and learned from. I think it started with Kevin Mitnick, after reading his book The Art of Intrusion, which was not exactly what I was hoping for. I didn’t know much about computers back then, so I thought I would read more about scripts and such rather than how to trick a secretary.

The Word document is still a popular format in the localization industry. Translators and reviewers use tracked changes to discuss potential translation issues. Oftentimes these people are from different companies and their anonymity has to be preserved. Word has functionality to make documents anonymous, but it’s a blunt tool which will wipe all private information. And you still want to know which comment comes from which person. At some point in my career I was introduced to this solution.

I had a problem where I needed to clean some stuff from files and then upload them to the server. The cleaning part was fast: for more than a hundred files (2 GB in size) it took less than a minute. The problem was with the upload. It took several hours. So, I thought, let’s try doing it in parallel. I’ll launch the upload on as many threads as I have, and my workstation has a Xeon, so it should help.
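The idea can be sketched in a few lines of Python. This is only an illustration and not the code I actually ran: `upload_all` is a made-up name, and `upload` is a placeholder for whatever function sends a single file to the server.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def upload_all(
    paths: Iterable[str],
    upload: Callable[[str], T],
    workers: Optional[int] = None,
) -> List[T]:
    """Call upload(path) for every path on a pool of worker threads.

    `upload` is a hypothetical callable supplied by the caller.
    Results come back in the same order as the input paths.
    """
    # Default to one thread per logical CPU, as in the plan described above.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload, paths))


if __name__ == "__main__":
    # Stand-in upload function that only reports what it would send.
    print(upload_all(["a.tmx", "b.tmx"], lambda p: f"uploaded {p}", workers=2))
```

More threads only help when the waiting happens on your side. If the network link or the server is the bottleneck, the threads just share the same limited bandwidth.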

memoQ is a great product, I say it a lot. But some features need a bit of polishing and one of them is support for Machine Translation (MT). It’s done through a series of plugins. I’ve used just two so far. The first is Pseudo-translation, which is fine but limited, although they’ve enhanced it in the Adriatic version. The second plugin is for Google Translate. There’s not much in terms of configuration: you set the API key, can specify a regex whose matches will be ignored in the MT process, and enable an option to put the tags from the source at the end of the translation.

Journalists suck at tech

Your projects can evolve

Agile over waterfall

I'm an Imposter

Why BOM?

Invisible improvement

My new idol

Anonymize DOCX Comments

Multithreading is not always the best solution

Sometimes you need to hack it (MT and memoQ)