Friday, 19 July 2013

Still on Google Translate

Longman is one of the best English-language dictionaries ever written because it is a contextualist dictionary: It frequently gives examples of use for the sigmatoids (the symbols that form the word, all together, as they appear in the word) that it associates with world references.

From that alone we can tell that it is absolutely impossible for a machine to always do competent work in translation: If it ever gets through a document, say a technical document, without committing a mistake, then it will commit a mistake in the next one. Basically, it is all a random process of luck.

Accuracy should be important.

We believe that any serious reader would like to visit the place where the person who created the original document was when the document was created as they read the translated version of such a document.

This trip can only happen if we have human hands driving the vehicle… . 

With GT, or any replacement for GT, we will always be playing that chickens’ game (Atari, remember?): If we are lucky, then we will make it to the other side without a scratch, and we will then get bonus points and all, but, the vast majority of the time, even if we know the level well from so much playing, we will be smashed way before that.

Certain professions can be tried by the adventurous person (data entry, for instance), and if they go in and out, it is not a big deal, but not Translation and Interpretation.

This way, unless the IT professional is really serious about it, and holds true passion for the processes we write about, it is better to keep a good distance (Google Translate and all other pieces of software will obviously always depend on the hands that create their systems, that is, on the IT professional).

Amorzinho and others

In the last post, we wrote about the word amorzinho, which would be better translated into sweetheart if we are working with the pair (Portuguese, English).

Amorzinho is just one word, but if we start playing with contexts, perhaps whilst wondering about GT-possibilities, we will feel really bad very quickly.

  1. Amorzinho, traz o jornal para mim? → Sweetheart, please bring the newspaper to me (not only does the punctuation need to change, and therefore the intonation, but a word also needs to be added)
  2. Vamos fazer um amorzinho? → Let’s (please) make some sweet love! (words have been added and the punctuation has changed, but, more importantly, amorzinho got a different equivalent in the English language this time)
  3. Você é um amorzinho de pessoa → You are a sweetie (a word has now been subtracted and there was no change in intonation)
  4. Amorzinho do papai, traz o jornal por favor → Daddy’s baby, please bring the newspaper (yet another term to replace amorzinho and no addition or subtraction of words)

Basically, there are still many more contexts in which the word amorzinho could appear.

Translating linguistic tokens without considering context is therefore a very dangerous thing.
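To make the point concrete, here is a minimal sketch of what a context-aware lookup for amorzinho could look like. It is only an illustration built from the four sentences listed above: the table CONTEXT_RULES and the function equivalent_for_amorzinho are hypothetical names, and matching on substrings is a simplifying assumption, not a description of how GT or any real system works.

```python
# Hypothetical context table built only from the four example sentences above.
# Each rule pairs a context cue (a substring of the lowercased source sentence)
# with the English equivalent chosen for amorzinho in that context.
# More specific cues come first so that they win over the generic one.
CONTEXT_RULES = [
    ("amorzinho do papai", "Daddy's baby"),
    ("fazer um amorzinho", "sweet love"),
    ("um amorzinho de pessoa", "sweetie"),
    ("amorzinho,", "Sweetheart"),
]


def equivalent_for_amorzinho(sentence):
    """Return the equivalent for the first matching cue, or None if no cue matches."""
    lowered = sentence.lower()
    for cue, equivalent in CONTEXT_RULES:
        if cue in lowered:
            return equivalent
    return None  # unknown context: a human translator has to decide


if __name__ == "__main__":
    for text in (
        "Amorzinho, traz o jornal para mim?",
        "Vamos fazer um amorzinho?",
        "Amorzinho do papai, traz o jornal por favor",
        "Que amorzinho!",  # a context the table does not cover
    ):
        print(text, "->", equivalent_for_amorzinho(text))
```

Even this toy table shows the problem: every new context needs a new rule, and any sentence outside the table falls back to a person.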

The more we know about the context, the more details we have, and the better our translated version of the text will be. 

The person who denies all this process, that of acquisition of information, when working as a linguist, has to be irresponsible, to say the least.

For a person to be good at doing translation work, even when it is all inside of the same language (in the case of those who make the lexicons, for instance), they really have to know quite a lot. They also need to have highly advanced processes of thinking (they need to be able to master the upper levels of Bloom’s Taxonomy).

For a person to be good at revising the work of translators, they need to be able to go even farther in Bloom’s Taxonomy, since they have to, basically, evaluate the evaluation.

All these skills may be scientifically measured through exams.

However, the basic salaries of translators and interpreters would have to be raised quite a lot for us to be able to demand so much from these professionals: They already spend so many resources, and give so much of themselves, to both become and remain professionals in the area... .

Sunday, 14 July 2013

Our chances against the machine: Will translators stop having a profession soon?

We decided to quickly test the machine, thinking perhaps of the Turing Machine contest, and got the following results for the pair (Portuguese, English):

  • Simple sentences seem to do well if they do not escape machine reasoning, that is, if they are not, for instance, localisms. We try eu te amo and GT[1] (from now onwards we will call Google Translate GT) responds with I love you;
  • Localisms are ignored. We try "é fogo, hein?", which is the most common carioca way of saying "it is quite unbearable, is it not?" (something like the not-so-elegant version of this sentence in their local language), and we get "and fire, huh?";
  • Technical expressions such as Rede Mundial de Computadores get a very good result, for GT comes up with World Wide Web, but the truth is that Rede Mundial de Computadores is the translation of the word Internet too, and, if we try Internet, we get Internet. According to the Brazilian linguists, it is not acceptable to leave untranslated those terms that can be translated into Portuguese. We agree: This is to protect the language, to preserve it. Besides, if we think like the purists, World Wide Web is actually Rede Mundial only, and Internet is Entre Redes (literal translation) or Rede Mundial de Computadores (recommendation of the Brazilian linguists);
  • Simple, but technical, words like Internet are therefore not finding a good translation in GT, which is unexpected, since our work on Translation[2] points to all that is technical being amenable to automation. Another easy example is município, which is municipality. If we use GT, we get município again for some reason. Along the same lines, cartório becomes registry in the GT system, but it should become registry office or office of births, deaths, and marriages, for instance (see our post on the topic);
  • GT seems to ignore subtleties of the Portuguese language: Things like the gender of the person, which is frequently conveyed through the endings of the words or the articles that come before them in Portuguese, seem to be completely ignored by the machine. For example, we try juíza, which should mean female magistrate or female judge in English, and we get judge. We try professora and we get teacher, not female teacher, and so on and so forth;
  • GT seems to actually ignore anything that is specific to the Portuguese language, that is, anything that does not mimic the English language. As another example, we have diminutives and augmentatives. For instance, we try cãozinho and we get doggy. Well, cãozinho is not doggy. Cãozinho is little dog. Doggy could perhaps be translated into cachorrada or cachorrinho (if a child is saying that) in Portuguese, but only very rarely into cãozinho (we believe that doggy style would definitely be better translated into estilo cachorrada than into estilo cãozinho). We try peninha and we get little feather, but peninha might mean little pity as well, and this sense, little pity, is also a localism. Everyone would agree that the distance between one and the other is almost infinite. We try homenzarrão and, apparently because there is only one sense for it in Portuguese, things go well: Big man. However, big man is usually homem grande and homenzarrão should be super big man instead;
  • GT translates tu into thou, but tu is not thou: tu is you. Vós is thou. Vós is translated into ye by GT. Because ye is the plural of thou and they translate tu into thou, however, everything seems to make sense; and
  • The best item ever found in GT is, we believe, amorzinho in the most common of its senses. GT translates this one into chickabiddy, which is chicken or child in English (see chickabiddy, Merriam-Webster).
Well, amorzinho actually means sweetheart, considering cultural equivalences (it could be little love, if one goes literally, but that would be an extraordinary mistake in translation: This is not a word to diminish the size of the love, but to actually show a huge amount of affection and consideration).

Perhaps we all know that, with machines, we will always need the human hand over the final product in order to provide a good translated version of any document, regardless of how simple the document is.

If we stick to the paradigms that we have created for the science of translation, then it is impossible to go without the human hand, since we need to adapt everything considering time of the production of the original, location, and all other aspects that we have classified as important.

According to the theory we have developed, we need to worry even about the style of the document, so that if the author said tu, we want to understand, upon reading the translated version of such a document, that this is the word for you only in a few special regions of Brazil (where people use tu instead of você to refer to the person next to them at the moment they speak), if we talk about Brazil.

The problem is that we really need to pass that information to the reader of the translated version of the document, so that we have to translate that tu into you, but we also have to add a note to explain that the original document brought tu, and that was because of the place where it has been produced or because of the person who wrote it or even because of the intentions of the writer (say that it is a play in which the characters are from the South of Brazil).

It is possible to insert all this information into a machine (we get the best translated versions of documents that we know of, scan them, and then record them in entries containing also dates and places where they have been produced), and consider it as we try to translate texts with GT (say that we enter the date and the location of the document that we have in our hands in the system as well).

Notwithstanding, it is not possible to get an equivalent for a particular term that works for all documents.

It is not possible to get an equivalent that works for all documents for an expression either.
The computer will always have to come up with a set of choices if things are done in a serious manner.
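The idea of recording equivalents together with dates and places, and of returning a set of choices rather than a single answer, can be sketched as follows. This is an illustration under stated assumptions: the Entry record, the candidates function, the scoring rule, and the years in the sample data are all hypothetical, and only the cartório equivalents themselves come from our posts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    source: str      # expression in the source language
    equivalent: str  # equivalent found in a trusted translated document
    place: str       # place recorded for that document
    year: int        # year recorded for that document


def candidates(entries, source, place=None, year=None, max_year_gap=10):
    """Return every recorded equivalent for `source`, closest contexts first."""

    def score(entry):
        points = 0
        if place is not None and entry.place == place:
            points += 1
        if year is not None and abs(entry.year - year) <= max_year_gap:
            points += 1
        return points

    matches = [e for e in entries if e.source == source]
    matches.sort(key=score, reverse=True)  # stable: ties keep insertion order
    ranked = []
    for entry in matches:
        if entry.equivalent not in ranked:  # drop duplicate equivalents
            ranked.append(entry.equivalent)
    return ranked


# Sample data: the years are invented for illustration only.
SAMPLE = [
    Entry("cartório", "registry office", "Britain", 2005),
    Entry("cartório", "office of births, deaths, and marriages", "Australia", 2013),
]

if __name__ == "__main__":
    print(candidates(SAMPLE, "cartório", place="Australia", year=2013))
```

The function deliberately returns a ranked list instead of one winner: choosing among the candidates remains the job of the person revising the text.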

GT will never be a perfect tool, regardless. One of the reasons for that is that language is always being created.

Even if all the translators of this world were entering their translated versions of documents with their original documents, and inserting all this data we talk about as they do that, into a system, twenty-four hours a day, seven days a week, the system, regardless of its nature, would still not be complete.

However, it would be a much better system, since it would allow for the translation of more pieces of text in a reliable manner.
If that has already happened or if that is what is happening now somehow, and GT itself says that this is more or less what is currently happening[3], we would think that we should start charging royalties for every expression coined by a translator (in translation/equivalences).

The date of the creation would have to be irrelevant, since we would have to start from today.

We would then obviously be paying royalties to all the translators whose data has been inserted into the system each and every time someone uses their work.

It would not be fair to worry about these issues in music and writing and not worry about them in translation... .

Note: All sources mentioned in this text were consulted on the fifteenth of July of two thousand and thirteen.


Saturday, 13 July 2013

Localisms versus accuracy

Finding an equivalent in another language is sometimes the same as entering one of Tom Cruise’s movies from the series Mission Impossible in the body of Tom Cruise without leaving the Watchers’ World.

Finding perfect, or close-to-perfect, matches is so difficult that we think that only those linguists who have lived in the Country of their target language should be attempting to translate or interpret into it.

Ideally, linguists would know the culture of both the target and source countries.

Sometimes, however, knowing the culture of both the target and source countries is not enough, since the linguist would actually have to know the culture of the specific location that they are targeting to do a good job (see our previous article with PROz: cacetinho and pao privado, for instance).

We believe that English is English everywhere, is it not? If a people declare officially that they speak English, they must speak English, right?

The problem is actually the corruption of the language in the specific location we target.

For instance, the English of England would be at least sometimes completely different from the English of Australia.

We look in the dictionary (Longman’s) and we see that a Registry Office is a local government building in Britain where you can get married, and where births, marriages, and deaths are officially recorded (Pearson Education, 2005).

In Australia, we have the Office of Births, Deaths, and Marriages instead.

In Brazil, there are places where we can get married and record our signatures for later certification. These places are called cartórios and we have to pay fees to get our documents certified by them. We cannot get our documents certified in any other place.

In Portugal, all public officers and solicitors, for instance, can certify our documents, and we have to pay a fee for that (7Graus, 2011-2013; Ordem dos notarios, 2007).

Still in Portugal, if we want to get married, we can do that in a conservatória (7Graus, 2011-2013b).

In Australia, certification of documents is free, and any justice of the peace can do that for us (Attorney-General’s, 2013). Because becoming a justice of the peace is not that hard (Attorney-General’s, 2013), we can find them in several places, including banks and libraries.

In England, we can get our documents certified for free, but we may also have to pay: It all depends on whom we go to (, 1999-2010).

We then understand that translating and interpreting, in this case, has to imply possessing detailed knowledge of the standard systems of all these very different countries, even with the pair being always the same.

We could then blame the lexicons and say that they are incomplete or not properly built.

It might be that including cartório in the dictionary would not be much trouble, even if we had to list all the cultural equivalents, say for each and every Country that speaks English or Portuguese... .

However, when we look at expressions of the type bowel movements, we notice that only deep understanding and knowledge of both cultures could do the trick.

We can actually find bowel movements, precisely like that, in a few dictionaries. Longman, for instance, states that bowel movements is the act of getting rid of solid waste from your body.

Well, excrement is as solid as mucus at least sometimes, so that Longman’s explanation is at least incomplete. If you think that secretions are different from waste, then think of vomit, for instance, since vomit contains solid elements at least sometimes and has body waste as one of its synonyms (Collins, 2002).

For a person from another culture to understand that what we refer to is the solid waste from our intestine after reading Longman’s definition of bowel movements, they perhaps would have to check the definitions of bowel and movements.

A further criticism of the definition of bowel movements given by the Longman Dictionary, one of the best English-language dictionaries ever written, is the following: Our excrement does not need to be solid and, at least sometimes, it will be a sort of liquid, especially if we have diarrhea.

We could interpret the just-mentioned expression as movimentos do intestino by means of literal interpretation. This would be understood at the other end, but perhaps only after a certain number of repetitions.

Ideally, we would simply use the local expression for such a thing, that is, what a doctor from Brazil, for instance, would be saying in place of bowel movements in the same sort of situation. We would then say digestão.

Some purists would then argue that this is an incorrect choice of terms because we have the word digestion in English and, if the doctor wanted to talk about that, they would have said digestion.

Notwithstanding, we have to know the culture of the Country where we work and the difference between interpreting and translating.

In Brazil, the number of people who would acknowledge a movement of their own bowel is, pushing it, probably something like five percent.

Of course they should be able to understand any question of the sort how are your bowel movements going? but it is very likely that they do not.

We can then imagine that people in Australia, for instance, are so superior in all that they worry quite a lot about their own health: They worry to the point of monitoring, on a daily basis, or at least on a frequent basis, the movements of their own intestines.

Well, congratulations to them. They would then be able to answer such a question with oh, yesterday I accompanied the movements of my bowel and the contractions were exactly the same, in pattern, that I saw it making when I was five or something like that.

In a place like Brazil, however, the maximum that the local standards of intelligence and observation would allow for is: Yes, I am going to the toilet as usual.

There is then no point in agreeing with the purists and saying bowel movements in Portuguese (movimentos do intestino) when serving someone in the capacity of professional interpreter. We should, in this particular case, sacrifice detail in the name of information and communication.

If the doctor really meant it, and that is rarely the case, in Australia too, then they will reword that question in a way that makes communication possible.

If we are serving someone in the capacity of professional translator, however, we should probably say movimentos do intestino and add a footnote on the cultural differences involved.

We should not use movimentos peristálticos, which is the most we could be hearing in Brazil for this one. The reason for this advice is that we have the word peristaltic in the English language and this word has not appeared in our original expression.

We then notice that the work of the linguist is, most of the time, happening inside the upper levels of Bloom’s scale (Overbaugh, 2011), that is, inside Evaluation, Synthesis, and Analysis. Only rarely does it happen inside the lower levels (Application, Comprehension, and Knowledge).

A good linguist therefore has to be a person who has at least an inquisitive mind, since remaining in the lowest levels of the scale is very likely to make communication amongst people more difficult.


