Anyone who has spent time researching subjects in the MENA region will be aware of how difficult it can be to confidently identify target individuals and organisations. Whilst almost all international business is conducted using names in their English form, official registration documents tend, unsurprisingly, to use only the Arabic or Farsi forms. It is therefore necessary to translate, or transliterate (replace each letter with its closest corresponding letter in the other alphabet), from one form to the other, while staying aware of the common pitfalls these transformations can introduce.
For example, suppose you were interested in the Iranian importer with the Farsi name گلستان. This could be transliterated as “Golestan”, “Gulistan”, “Gulistān”, or even “Golistan” or “Gulestan”. The name could also be translated as “Rose Garden”. To produce a faithful and consistent rendering of this name, we would first check whether the company has an online presence under an ‘English’ name. In the case of گلستان it does, and we can see that the company styles itself “Golestan” in English rather than, for example, using the translation ‘Rose Garden’. The spelling “Golestan” also follows the method we at Diligencia would normally use for transliteration, with ‘o’ and ‘e’ representing short vowels and ‘a’ representing both the long and the short ‘a’ sound. A diacritic could be placed over the ‘a’ (as in “Golestān”) to show that the Persian original has a long ‘a’, but diacritics are generally superfluous in the commercial sector.
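Letter-by-letter transliteration alone cannot produce “Golestan”, because Persian script normally leaves short vowels unwritten. The minimal Python sketch below uses a hypothetical table covering only the six letters of گلستان (the names `PERSIAN_TO_LATIN` and `transliterate` are illustrative, not part of any Diligencia tool) and shows the consonant skeleton that results. The transliterator still has to supply the ‘o’ and ‘e’, and that choice is where variants such as “Gulistan” and “Gulestan” come from.

```python
# Hypothetical one-to-one table covering only the letters of گلستان.
PERSIAN_TO_LATIN = {
    "\u06AF": "g",  # gaf
    "\u0644": "l",  # lam
    "\u0633": "s",  # sin
    "\u062A": "t",  # te
    "\u0627": "a",  # alef, the long 'a' (ā)
    "\u0646": "n",  # nun
}


def transliterate(word: str) -> str:
    """Replace each letter with its Latin counterpart; unknown letters become '?'."""
    return "".join(PERSIAN_TO_LATIN.get(ch, "?") for ch in word)


print(transliterate("گلستان"))  # glstan
```

The missing vowels are not a flaw in the table: they are simply not present in the source text, which is why a human (or a house style) has to decide between “o” and “u”, or “e” and “i”.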
At Diligencia, we have a variety of techniques to assist with quickly identifying the correct target person or company record. We have incorporated these methods into the technical solutions that our team use to maintain our database and also into the Clarified By Portal and API (Application Programming Interface) that allow our customers to directly access our data.
The techniques include:
- Ensuring that searches are diacritic-agnostic, returning the same results whether or not diacritics appear in the search term or in the target records.
- Supporting wildcard searches, using the asterisk (*) to represent any run of unclear or ambiguous characters, including none. For example, searching for “m*h*m*d ali akbar” will return results for people called Mohammed Ali Akbar whichever of the name’s many variants is recorded (e.g. Muhamed, Muhammad, Mohammad, Muhammed, Mhamed, Mhammad, or Mhammed).
- Providing a fuzzy-logic search option that automatically substitutes potentially ambiguous characters or letter sequences, widening the search to cover alternative spellings.
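The three search techniques above can be sketched in Python. This is a minimal illustration under stated assumptions, not Diligencia’s implementation: the function names (`fold`, `wildcard_match`, `fuzzy_key`), the use of Unicode NFKD decomposition to strip diacritics, and the `SUBSTITUTIONS` table are all hypothetical. The wildcard is assumed to match an empty run as well, since that is what lets “Mhamed” match “m*h*m*d”. For the fuzzy option it swaps in a simple consonant-skeleton key: both the query and each record are reduced to their consonants and then compared.

```python
import re
import unicodedata

# Hypothetical substitution rules for letters romanised inconsistently,
# e.g. Arabic qaf written as "q" or "k" ("Iraq" / "Irak").
SUBSTITUTIONS = [("q", "k")]
VOWELS = set("aeiou")


def fold(text: str) -> str:
    """Remove diacritics and case, so 'Gulistān' and 'gulistan' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def wildcard_match(pattern: str, name: str) -> bool:
    """Match a name against a pattern in which '*' stands for any run of
    non-space characters, including an empty one (an assumption)."""
    parts = (re.escape(part) for part in fold(pattern).split("*"))
    return re.fullmatch(r"\S*".join(parts), fold(name)) is not None


def fuzzy_key(name: str) -> str:
    """Reduce each word to a consonant skeleton: apply substitutions,
    collapse doubled letters, then drop the vowels a, e, i, o, u."""
    words = []
    for word in fold(name).split():
        for old, new in SUBSTITUTIONS:
            word = word.replace(old, new)
        word = re.sub(r"(.)\1+", r"\1", word)  # "mm" -> "m"
        words.append("".join(ch for ch in word if ch not in VOWELS))
    return " ".join(words)


if __name__ == "__main__":
    records = ["Mohammed Ali Akbar", "Mhamed Ali Akbar", "Muhammad Ali Akbar", "Ahmed Ali Akbar"]
    # Diacritic-agnostic comparison
    print(fold("Gulistān") == fold("gulistan"))  # True
    # Wildcard search
    print([r for r in records if wildcard_match("m*h*m*d ali akbar", r)])
    # ['Mohammed Ali Akbar', 'Mhamed Ali Akbar', 'Muhammad Ali Akbar']
    # Fuzzy search: compare skeleton keys rather than raw spellings
    print([r for r in records if fuzzy_key(r) == fuzzy_key("Mohamed Ali Akbar")])
    # ['Mohammed Ali Akbar', 'Mhamed Ali Akbar', 'Muhammad Ali Akbar']
```

The consonant skeleton mirrors the way Arabic and Farsi script leaves short vowels unwritten, so every variant of “Golestan” reduces to `glstn`. The key is deliberately wide, though: “Mahmoud” also reduces to `mhmd`, so fuzzy results would still need reviewing.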
Diligencia’s unique combination of skills allows us both to render Arabic and Farsi terms as their most appropriate English equivalents and to produce the best technology for making such data searchable by our clients.