Your projects can evolve

Some time ago I created a tool that tests our Translation Memories (TMs). It was useful because they became corrupted quite often, and the sooner we knew about it the better. So I set it to run every day. These days problems with TMs are sporadic, but I still run it just in case. The process is automated and all it requires from me is reading a short report each day, so no biggie.

Agile over waterfall

Agile is all the hype these days and there’s probably not a single startup that isn’t using this approach right now. In big corporations adoption is slower, but they are also moving toward it because it simply works. Develop in small steps, measure how well the product works so far, then move forward or take a step back and fix it. Data is crucial here. Measure how much a new improvement helped with your problem, or how much harder it made other things, as it can go both ways.

I'm an Imposter

I’ve been interested in computers since I learnt that they exist, and I’ve always wanted to work with them. But life is strange and I ended up with an MA in Marketing and Management. Thankfully, these days education doesn’t chain you to one type of work, and I currently work as a, sort of, software engineer. But I’ve always felt like an imposter, because I don’t have a formal CS education. The Imposter’s Handbook to the rescue!

Why BOM?

The BOM (byte order mark) is a Unicode character, U+FEFF. In the context of UTF-8 the most important thing is that it confirms that the file is (most probably) UTF-8 encoded, because there’s no other way to be almost certain (you never get 100% confidence with encodings). Of course there are methods of heuristic analysis which can offer high accuracy. I myself use the Mozilla Universal Charset Detector, but it’s still guessing.
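To make the BOM check concrete, here is a minimal Python sketch. In UTF-8, U+FEFF is encoded as the three bytes EF BB BF, so the check is just a prefix test. The helper names `has_utf8_bom` and `decode_text` are made up for this illustration and are not taken from any tool mentioned in the post.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_text(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if there is one.

    The "utf-8-sig" codec strips the BOM when present and otherwise
    behaves like plain "utf-8".
    """
    return data.decode("utf-8-sig")


# Hypothetical sample data: the same text with and without a BOM.
without_bom = "zażółć".encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

print(has_utf8_bom(with_bom))
print(has_utf8_bom(without_bom))
print(decode_text(with_bom) == decode_text(without_bom))
```

Note that a missing BOM proves nothing: plenty of valid UTF-8 files are written without one, which is where the heuristic detectors come in.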

Invisible improvement

This year alone I’ve written more than a thousand lines of code for various libraries. And I’ve started to write libraries ;) Each of my methods is unit tested. My classes are as well designed as my current knowledge allows. I can write a new automation script for our servers in a matter of hours (sometimes minutes), compared to days not so long ago. I’m able to test most things before they get to the server, so I’ve almost eliminated the vicious cycle of build-deploy-check-fix.

My new idol

I’m very interested in computer security. From the beginning I’ve had an idol, or even idols, whom I’ve followed and learned from. I think it started with Kevin Mitnick, after I read his book The Art of Intrusion, which was not exactly what I was hoping for. I didn’t know much about computers back then, so I thought I would read more about scripts and such rather than how to trick a secretary.

Anonymize DOCX Comments

The Word document is still a popular format in the localization industry. Translators and reviewers use tracked changes to discuss potential translation issues. Often these people are from different companies and their anonymity has to be preserved. Word has functionality to make documents anonymous, but it’s a blunt tool which wipes all private information, and you still want to know which comment comes from which person. At some point in my career I was introduced to this solution.
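The solution itself is not shown in this excerpt, so what follows is only one possible approach, sketched under my own assumptions. A DOCX file is a ZIP archive, and comments and tracked changes carry the author's name in a `w:author` attribute. The sketch replaces each distinct name with a stable alias ("Reviewer 1", "Reviewer 2", and so on), so you can still tell who said what. The function name `anonymize_authors` is hypothetical, and the regex assumes attributes are written with double quotes.

```python
import io
import re
import zipfile

# Matches w:author="Some Name" in WordprocessingML (double quotes assumed).
AUTHOR_ATTR = re.compile(r'(w:author=")([^"]*)(")')


def anonymize_authors(docx_bytes: bytes) -> bytes:
    """Return a copy of the DOCX with each author replaced by a stable alias."""
    aliases = {}

    def replace(match):
        name = match.group(2)
        # The same real name always maps to the same alias.
        alias = aliases.setdefault(name, f"Reviewer {len(aliases) + 1}")
        return match.group(1) + alias + match.group(3)

    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            # Only touch the XML parts under word/ (document, comments, ...).
            if item.filename.startswith("word/") and item.filename.endswith(".xml"):
                data = AUTHOR_ATTR.sub(replace, data.decode("utf-8")).encode("utf-8")
            dst.writestr(item, data)
    return out.getvalue()
```

This leaves other identifying data alone, for example `w:initials` on comments and the creator fields in `docProps/core.xml`, so a real tool would need to handle those as well.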

Multithreading is not always the best solution

I had a problem where I needed to clean some stuff from files and then upload them to the server. The cleaning part was fast: for more than a hundred files (2 GB in size) it took less than a minute. The problem was the upload, which took several hours. So, I thought, let’s try doing it in parallel. I’ll launch uploads on as many threads as I have, and since my workstation has a Xeon, it should help.
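The "one upload per available thread" idea can be sketched in Python roughly like this. It is an illustration only, not the code from the post: `upload_all` and `fake_upload` are hypothetical names, and `fake_upload` merely stands in for whatever call actually sends a file to the server.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def upload_all(paths, upload, max_workers=None):
    """Run upload(path) for every path on a thread pool.

    By default the pool gets one worker per CPU reported by the OS.
    Results come back in the same order as the input paths.
    """
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload, paths))


def fake_upload(path):
    # Hypothetical stand-in: a real version would send the file to the server.
    return f"uploaded {path}"


results = upload_all(["a.xml", "b.xml", "c.xml"], fake_upload)
print(results)
```

Threads only help when the work can really overlap. If the bottleneck is the network link or the server itself, more workers just split the same bandwidth.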

Sometimes you need to hack it (MT and memoQ)

memoQ is a great product, I say it a lot. But some features need a bit of polishing and one of them is support for Machine Translation (MT). It’s done through a series of plugins. I’ve used just two so far. The first is pseudo-translation, which is fine but limited, although they’ve enhanced it in the Adriatic version. The second plugin is for Google Translate. There’s not much in terms of configuration: you set the API key, can specify a regex whose matches will be ignored in the MT process, and can enable an option to put the tags from the source at the end of the translation.

Don't oversell

Overselling happens when you have a server which can handle 10 users, but you sell it to 15, because you assume they won’t utilize all the resources assigned to them, or that you’ll manage to add more resources before users start using them fully. It’s the most broken assumption in the tech industry: it almost never works, drives customers mad and ruins your brand. And that’s exactly what happened to me recently.