Machine translation (MT) evaluation tools have proven faster, cheaper, and more consistent than human evaluators at scoring the correctness of translated text. The metrics they produce can adequately compare MT-generated translations against a gold standard, but they do not address the challenge of choosing the most appropriate path through intermediate languages in a multi-stage translation; in other words, they do not take MT transitivity into account. We propose a novel approach called translational score maps to extend the power of these evaluation tools. The purp...
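To make the transitivity problem concrete (this sketch is an illustration of path selection, not the score-map method itself), one could model languages as graph nodes and pairwise MT quality scores in (0, 1] as edge weights, then pick the pivot path that maximizes the product of scores. The function name `best_translation_path` and the example scores below are hypothetical; taking negative logarithms turns the max-product search into a standard shortest-path problem.

```python
import heapq
import math

def best_translation_path(scores, src, tgt):
    """Return the pivot-language path from src to tgt that maximizes
    the product of pairwise MT quality scores, plus that product.

    scores: dict mapping language -> {language: quality in (0, 1]}.
    """
    # Dijkstra on -log(score): minimizing the sum of -log(s) is
    # equivalent to maximizing the product of the scores s.
    dist = {src: 0.0}
    prev = {}
    pq = [(0.0, src)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue
        visited.add(u)
        if u == tgt:
            break
        for v, s in scores.get(u, {}).items():
            nd = d - math.log(s)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    if tgt not in dist or tgt not in visited:
        return None, 0.0  # no translation path exists
    # Reconstruct the path from tgt back to src.
    path = [tgt]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[tgt])

# Hypothetical pairwise quality scores for German -> Chinese:
scores = {
    "de": {"en": 0.9, "fr": 0.6},
    "en": {"zh": 0.8},
    "fr": {"zh": 0.9},
}
path, quality = best_translation_path(scores, "de", "zh")
# de -> en -> zh (0.9 * 0.8 = 0.72) beats de -> fr -> zh (0.54)
```

Here the direct comparison of candidate pivot chains is exactly what per-segment metrics alone do not provide: each edge score is meaningful in isolation, but ranking multi-hop routes requires composing them.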