Lifeline for non-English-speaking web users

17 February 2014

Opening access: Dr Mohammed Mustafa Ali has developed a way to create 'language-aware' internet search engines that can process queries in multiple languages, and even throw up mixed-language results.

Although growing numbers of people are carrying their multilingual habits into online communication, search engine technology has not kept pace with this trend. UCT-trained computer scientist Dr Mohammed Mustafa Ali has now proposed a solution.

Ali, who hails from Sudan and whose mother tongue is Arabic, became acutely aware that the technology driving internet search engines had not kept up with the online practice of switching back and forth between two languages, a habit that is particularly prevalent in the non-English-speaking world.

So during his doctoral studies he put his computer science skills to work and devised algorithms that, he argues, can create "language-aware" search engines able to recognise searches made in multiple languages. These algorithms could also allow search engines to return mixed-language results, opening up new worlds of information to non-native speakers of English.
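To make "recognising searches made in multiple languages" a little more concrete, here is a toy sketch of one step such a system might take: labelling each term of a code-switched query with a language. This is not Ali's algorithm, which the article does not describe. The word lists, the language codes and the function names (`tag_query`, `query_languages`, `is_mixed_language`) are hypothetical, and a practical system would use a trained language identifier rather than hand-made lexicons. The example query mixes English with the Afrikaans words "lekker" (nice) and "braai" (barbecue).

```python
# Hypothetical miniature word lists, chosen only for this illustration.
# A practical system would rely on a trained language identifier or far
# larger vocabularies; these are NOT taken from Ali's thesis.
LEXICONS: dict[str, set[str]] = {
    "en": {"recipe", "recipes", "ideas", "for", "the", "weekend", "best"},
    "af": {"braai", "lekker", "resep", "vir"},
}


def tag_query(query: str) -> list[tuple[str, str]]:
    """Pair each lower-cased query term with the code of the first lexicon
    that contains it, or "unknown" if no lexicon does."""
    tagged = []
    for term in query.lower().split():
        lang = next(
            (code for code, words in LEXICONS.items() if term in words),
            "unknown",
        )
        tagged.append((term, lang))
    return tagged


def query_languages(query: str) -> set[str]:
    """Return the set of languages detected in the query, ignoring unknown terms."""
    return {lang for _, lang in tag_query(query) if lang != "unknown"}


def is_mixed_language(query: str) -> bool:
    """Treat a query as mixed-language if terms from two or more languages occur."""
    return len(query_languages(query)) > 1


if __name__ == "__main__":
    q = "lekker braai recipe"
    print(tag_query(q))          # [('lekker', 'af'), ('braai', 'af'), ('recipe', 'en')]
    print(is_mixed_language(q))  # True
```

Once a query is known to be mixed, a language-aware engine could in principle handle each part differently, whereas a purely monolingual system treats every term as an opaque string to be matched.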

"The algorithms proposed in this thesis address the Web searching needs of such non-English speakers, who often need the most relevant information rather than just retrieving documents containing exactly their query terms," says Ali. "The proposed algorithms could empower and present a direction for future search engines, which should allow multilingual users (and their multilingual queries) to retrieve relevant information created by other multilingual users.

"Thus, it could have significant outcomes for languages with limited modern vocabulary, mostly those non-English ones, in developing countries."

Language mixing, known as code-switching, is common in multilingual communities where the barrier between cultures is porous and people use more than one language in their everyday interactions.

"In such communities, natives are able to express some keywords in languages other than their native tongue or vice versa," Ali says. "From personal experience, the typical Arabic speaker speaks a mixture of tightly integrated words in both English and Arabic, and in various slang variants. Hong Kong speakers typically speak Cantonese with many English words. Capetonians speak English with many scattered Afrikaans words, and/or local slang."

Current search engines and traditional Information Retrieval (IR) systems perform poorly on multilingual queries because in most cases they fail to return the most relevant documents, explains Ali. He attributes this to two causes.

"First, the underlying assumption in IR is that users post queries in their native tongues. Second, most traditional IR systems depend primarily on similarity ranking methods that are based solely on monolingual computations and statistics, without taking into account the multilingual text in multilingual queries.

"Ignorance of this feature causes the most dominant documents on the ranked retrieval list to be those documents that contain exactly the same terms as in the multilingual query, regardless of its languages."
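The failure mode Ali describes can be reproduced with a textbook monolingual ranking scheme. The sketch below is not from the thesis: it scores a toy corpus with TF-IDF weights and cosine similarity, a standard IR baseline assumed here purely for illustration. The three documents, the Afrikaans-English query "braai recipe" and the function names are all hypothetical. Because "braai" is the Afrikaans word for a barbecue, the English document that says "barbecue" shares only one term with the query.

```python
import math
from collections import Counter

# Hypothetical toy corpus: d1 repeats the query terms exactly, d2 is an
# English document using "barbecue", d3 is an Afrikaans document.
DOCS = {
    "d1": "braai recipe",
    "d2": "barbecue recipe ideas for the weekend",
    "d3": "lekker braai resep",
}


def tokenize(text: str) -> list[str]:
    """Lower-case and split on whitespace; no language handling at all."""
    return text.lower().split()


def smoothed_idf(term: str, doc_sets: list[set[str]]) -> float:
    """Smoothed inverse document frequency; always positive."""
    df = sum(1 for toks in doc_sets if term in toks)
    return math.log((1 + len(doc_sets)) / (1 + df)) + 1


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Term frequency times IDF, dropping terms unseen in the corpus."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity of TF-IDF vectors, highest first."""
    doc_tokens = {name: tokenize(text) for name, text in docs.items()}
    doc_sets = [set(toks) for toks in doc_tokens.values()]
    vocab = set().union(*doc_sets)
    idf = {t: smoothed_idf(t, doc_sets) for t in vocab}
    q_vec = tfidf(tokenize(query), idf)
    scores = {name: cosine(q_vec, tfidf(toks, idf)) for name, toks in doc_tokens.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank("braai recipe", DOCS):
        print(name, round(score, 3))
```

Run on the query "braai recipe", this puts d1 first with a score of 1.0, followed by d3 (about 0.33) and d2 (about 0.23). The document containing exactly the query's strings dominates, and the English barbecue document ranks last even if it is what the user wants, because the scorer has no notion that "braai" and "barbecue" name the same thing in two languages.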

Ali maintains that his algorithms will enable non-native English speakers to access large swathes of online knowledge previously difficult to find.

"With information globalisation and moving towards an international community, it becomes essential not to constrain non-English-speakers, such as Arabic users, to single languages," he says. "There are many problems introduced by the explicit handling of multiple languages, but the algorithms and experiments conducted demonstrate that these problems can be adequately resolved in an IR system.

"The evidence suggests that language-awareness and mixed-language solutions are feasible for IR systems, without diminishing the quality of results."

Story by Yusuf Omar.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

