How Gulf-developed large language models like Jais are bringing Arabic into the AI mainstream

As Gulf states aim to become AI leaders by investing in R&D and startups (Supplied/MBZUAI)
Updated 09 October 2023

  • ChatGPT understands inquiries in Arabic, but answers can sound unnatural or fail to convey the right message
  • Now homegrown LLMs can capture linguistic nuances and even comprehend dialects and cultural references

DUBAI: When ChatGPT made its debut last year, the artificial intelligence program caused a global sensation, as users found themselves communicating with a machine that could pass as another human being.

However, the enthusiasm among techies in the Arab world was somewhat diminished by ChatGPT’s limited grasp of Arabic, in part the result of the language’s complexity, diacritical markings, inflection system and regional dialects.

Although ChatGPT, which is based on a large language model, or LLM, can understand inquiries in Arabic and is able to translate, especially when using Modern Standard Arabic, answers can come across as unnatural, while literal translations do not always convey the right message.

That is why Jais, an LLM designed to support Arabic, was unveiled in July, bringing one of the world’s most widely spoken, though occasionally overlooked, languages into the AI mainstream.

Jais, a name that recalls the UAE’s highest peak in Ras Al-Khaimah, is the brainchild of a team of academics and engineers who embarked on the project because they felt too few LLMs were credibly multilingual.




The Ameca humanoid robot greets visitors at Dubai's Museum of the Future. (AFP)

Downloadable on the machine learning platform Hugging Face, Jais is the result of a collaboration between Cerebras Systems, Mohamed bin Zayed University of Artificial Intelligence, or MBZUAI, and a subsidiary of the Abu Dhabi-based G42 called Inception.

“It is vital that large language models are developed for languages other than English to ensure that innovation is accessible to everyone,” Andy Jackson, CEO of Inception, told Arab News.

“A quality Arabic LLM is critical for all sectors, businesses and organizations, as well as individuals. Innovation thrives when we collaborate, and Jais sets a new standard for AI advancement in the Middle East, ensuring that the Arabic language, with its depth and heritage, finds its voice within the AI landscape.

“Jais demonstrates our commitment to excellence, and our dedication to democratizing AI and promoting innovation.”

LLMs are functional machine learning models that use deep learning algorithms to process and understand natural human language. These models are then trained on large amounts of text data to learn patterns in the language.

These programs, which are rapidly proliferating in the wake of ChatGPT’s success, are capable of generating text on a seemingly endless array of subjects, producing everything from academic papers to poetry.

What is especially impressive about them is their ability to produce convincingly human-like responses to questions in almost any language, including programming languages.

But in order to make those languages sound convincing, native-speaking human programmers are often required to provide a critical layer of context and understanding that can enhance accuracy and reliability.

“Jais is purpose-built for the Arabic language and excels in capturing its intricacies and nuances, ensuring highly accurate and contextually relevant responses — a distinct advantage over general-purpose models,” said Jackson.




AI programs that are responsive to the Arabic language could widen access to a transformational new technology. (MBZUAI)

“This specialization is a pivotal development, opening up opportunities for governments, industries, and individuals across the Arab world to tap into the potential of generative AI.”

Currently considered among the foremost Arabic LLMs, Jais is a 13-billion-parameter model trained on a newly developed dataset of 395 billion tokens, made up of 116 billion Arabic tokens and 279 billion English tokens. Training ran on Condor Galaxy, one of the largest cloud AI supercomputers in the world, which G42 and Cerebras launched in July.
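The reported token counts can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are assumptions introduced here, and the only figures used are the 116 billion Arabic and 279 billion English tokens reported above.

```python
# Token counts as reported in the article, in billions.
ARABIC_TOKENS_B = 116
ENGLISH_TOKENS_B = 279


def dataset_mix(arabic_b: int, english_b: int) -> dict:
    """Return the total token count and each language's share of it."""
    total = arabic_b + english_b
    return {
        "total_b": total,
        "arabic_share": arabic_b / total,
        "english_share": english_b / total,
    }


mix = dataset_mix(ARABIC_TOKENS_B, ENGLISH_TOKENS_B)
print(f"Total: {mix['total_b']}B tokens")
print(f"Arabic: {mix['arabic_share']:.1%}, English: {mix['english_share']:.1%}")
```

On these figures the two parts add up to the reported 395 billion tokens, with Arabic making up a little under 30 percent of the training data.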

“Jais was born in Abu Dhabi and offers more than 400 million Arabic speakers the opportunity to harness the potential of generative AI,” Preslav Nakov, professor and deputy department chair of Natural Language Processing at MBZUAI, told Arab News.

“It will facilitate and expedite innovation, highlighting Abu Dhabi’s leading position as a hub for AI, innovation, culture preservation and international collaboration.”

As an open-source model, Jais is expected to engage scientists, academics and developers to accelerate the growth of an Arabic language AI ecosystem. It could also serve as a model for other languages now underrepresented in mainstream AI.

FASTFACTS

• Large language models, or LLMs, are a type of AI that can mimic human intelligence.

• Arabic is spoken by 400m people, but accounts for 1 percent of total global online content.

• Jais was created by Cerebras, MBZUAI, and a subsidiary of G42 called Inception.

“Jais outperforms existing Arabic models by a sizable margin,” said Nakov. “It is also competitive with English models of similar size despite being trained on significantly less English data.

“This exciting result shows that the model’s English component learned from the Arabic data and vice versa, opening a new era in LLM development and training.”

In Jais’s development, significant attention was devoted to pre-processing Arabic text, enhancing support for the language’s unique features, including its writing style and word order.

Jais also maintains a balanced Arabic-English dataset focus for optimal performance, offering a marked improvement over models with a limited Arabic text presence.

Its developers say Jais, unlike other models, captures linguistic nuances and even comprehends various Arabic dialects and cultural references.

“Jais facilitates faster customization for specific Arabic-focused use cases and addresses data ownership concerns by being based in the UAE, offering a reassuring solution for local enterprises,” said Inception CEO Jackson.




LLMs are functional machine learning models that use deep learning algorithms to process and understand natural human language. (Supplied)

The UAE’s Ministry of Foreign Affairs and Ministry of Industry and Advanced Technology, Abu Dhabi’s National Oil Company and Department of Health, Etihad Airways, First Abu Dhabi Bank, and global technology group e& are planning to utilize Jais, offering valuable insights to enhance the model and its applications across their industries.

Given the strong digital transformation efforts by several of the Arab Gulf governments, accompanied by huge investments in high-tech industries and homegrown tech startups, AI programs that are responsive to the Arabic language could widen access to a transformational new technology and challenge the monopoly of a clutch of Silicon Valley companies.

Last month, Technology Innovation Institute, an Emirati research center in Abu Dhabi, released Falcon 180b, an open-source AI model. Established in 2020, TII released Falcon 40b, the first version of its flagship open-source AI model, in May this year, after unveiling Noor, an Arabic-based AI model, last year.

According to a report in The Economist magazine, TII is the applied-research arm of the Advanced Technology Research Council, a government agency that employs an 800-strong multinational staff working on subjects from biotechnology and robotics to quantum computing.

“We are entering the game to disrupt the core players,” Faisal Al-Bannai, secretary-general of the ATRC, told The Economist, adding that TII will build new proprietary models and applications catering for specific fields such as medicine and law.

For its part, Saudi Arabia launched its National Strategy for Data and Artificial Intelligence in October 2020, aiming to become a global leader in the field as it seeks to attract $20 billion in foreign and local investments by 2030.

The Kingdom is also determined to future-proof its workforce, initially by training and developing a pool of 20,000 AI and data specialists. In May this year, Deloitte’s AI Institute was officially launched at the Experience Analytics conference in Riyadh.

Just last week Saudi Arabia launched a National Olympiad for Programming and Artificial Intelligence open to all middle- and high-school pupils. An estimated 300,000 students will be selected from 3 million participants for training in programming and AI, according to media reports.




The hope is that the advent of AI and the automation of rapid translation will be a game changer for Arabic content. (LEAP)

The initiative is a collaboration between the Saudi Data and Artificial Intelligence Authority, the Ministry of Education, and the King Abdulaziz and His Companions Foundation for Giftedness and Creativity (Mawhiba).

Saudi Arabia’s adoption of digitalization and emerging technologies is forecast to contribute about 2.4 percent to its gross domestic product by 2030, according to a recent report by global consultancy firm PwC.

In terms of average annual growth in the contribution of AI by region, Saudi Arabia is expected to grab a 31.3 percent share in the technology’s expansion between 2018 and 2030, the PwC report added.

“AI is developing rapidly, and its impact will be felt more and more across all sectors and areas of life,” said MBZUAI’s Nakov. “In this context, it is vital that the Arab world has access to an advanced LLM that can be adapted and utilized across all sectors.

“The rapid advancement of AI means that organizations that fail to adapt and start using AI sooner rather than later will be left behind, which makes it even more essential for the Arab world to have access to quality LLMs.”

Beyond its business applications, however, a crucial aspect of a program such as Jais is its ability to champion neglected languages, preserve them in a fast-changing economy, and promote digital inclusivity.

Although Arabic is an official language in 22 countries and is partly spoken in 11 others, it accounts for just 1 percent of total global online content, according to Jais’s creators. The hope is that the advent of AI and the automation of rapid translation will be a game changer.

By placing the language at the forefront of the AI revolution, Jais and its successors could help to maintain Arabic’s global prominence and its distinctive cultural significance in the digital age.


Blinken says Israel needs a clear and concrete plan for Gaza’s future


  • “We do not support and will not support an Israeli occupation. We also of course, do not support Hamas governance in Gaza...” Blinken said
  • Israel says it intends to keep overall security control and has baulked at proposals for the Palestinian Authority to take charge

KYIV: Israel needs a clear and concrete plan for the future of Gaza where it faces the potential for a power vacuum that could become filled by chaos, US Secretary of State Antony Blinken said on Wednesday.
Washington and its ally Israel say Hamas cannot continue to run Gaza after militants from the group ignited the conflict with attacks on southern Israel that killed 1,200 people on Oct. 7.
“We do not support and will not support an Israeli occupation. We also of course, do not support Hamas governance in Gaza... We’ve seen where that’s led all too many times for the people of Gaza and for Israel. And we also can’t have anarchy and a vacuum that’s likely to be filled by chaos,” Blinken said during a press conference in Kyiv.
The top US diplomat has held numerous talks with Israel’s Arab neighbors on a post-conflict plan for Gaza since Israel vowed to root out Hamas from the Palestinian enclave more than seven months ago.
But Israel says it intends to keep overall security control and has baulked at proposals for the Palestinian Authority, which governs with partial authority in the Israeli-occupied West Bank, to take charge.
“It’s imperative that Israel also do this work and focus on what the future can and must be,” Blinken said. “There needs to be a clear and concrete plan, and we look to Israel to come forward with its ideas.”

Turkiye tells US that Israel’s attack on Rafah unacceptable, Turkish source says


  • Fidan also told Blinken that it was important to achieve a ceasefire in Gaza as soon as possible

ANKARA: Turkish Foreign Minister Hakan Fidan told his US counterpart Antony Blinken in a call on Wednesday that Israel’s attack on the Gazan city of Rafah is unacceptable, a Turkish diplomatic source said.
Fidan also told Blinken that it was important to achieve a ceasefire in Gaza as soon as possible, while emphasising that obstacles to the access of humanitarian aid into the enclave must be removed, the source said.


Ireland to recognize Palestinian statehood ‘this month’: FM Martin


  • FM Micheal Martin: ‘We will be recognizing the state of Palestine before the end of the month’
  • Martin: ‘The specific date is still fluid because we’re still in discussions with some countries in respect of a joint recognition of a Palestinian state’

DUBLIN: Ireland is certain to recognize Palestinian statehood by the end of May, the country’s Foreign Minister Micheal Martin said on Wednesday, without specifying a date.
“We will be recognizing the state of Palestine before the end of the month,” Martin, who is also Ireland’s deputy prime minister, told the Newstalk radio station.
In March the leaders of Spain, Ireland, Slovenia and Malta said in a joint statement that they stand ready to recognize Palestinian statehood.
Ireland has long said it has no objection in principle to officially recognizing the Palestinian state if it could help the peace process in the Middle East.
But Israel’s war against Hamas militants in Gaza has given the issue new impetus.
Last week, EU foreign policy chief Josep Borrell said Spain, Ireland and Slovenia planned to symbolically recognize a Palestinian state on May 21, with others potentially following suit.
But Martin on Wednesday shied away from pinpointing a date.
“The specific date is still fluid because we’re still in discussions with some countries in respect of a joint recognition of a Palestinian state,” he said.
“It will become clear in the next few days as to the specific date but it certainly will be before the end of this month.
“I will look forward to consultations today with some foreign ministers in respect of the final specific detail of this.”
Last month during a visit to Dublin by Spanish premier Pedro Sanchez, Irish prime minister Simon Harris said the countries would coordinate the move together.
“When we move forward, we would like to do so with as many others as possible to lend weight to the decision and to send the strongest message,” said Harris.
Harris’s office said Wednesday that he updated King Abdullah II of Jordan by telephone on Ireland’s plan for statehood recognition.
Harris “outlined Ireland and Spain’s ongoing efforts on Palestinian recognition and ongoing discussions with other like-minded countries,” a statement read.
“The King and the Taoiseach (prime minister) agreed that both Ireland and Jordan should stay in touch in the coming days,” it added.
The conflict in Gaza followed Hamas’s unprecedented October 7 attack against Israel, which resulted in the deaths of more than 1,170 people, mostly civilians, according to an AFP tally of official Israeli figures.
Militants also seized about 250 hostages, 128 of whom Israel estimates remain in Gaza, including 36 the military says are dead.
Israel’s retaliatory offensive has killed more than 35,000 people in Gaza, mostly women and children, according to the Hamas-run territory’s health ministry.


Hezbollah says struck Israel after field commander’s killing


  • Hezbollah fighters on Wednesday attacked “the Meron base with dozens of Katyusha rockets, heavy rockets and artillery shells”
  • The attacks were “part of the response to the assassination carried out by the Israeli enemy in the south” the previous day, it said

BEIRUT: Lebanon’s Iran-backed Hezbollah group said it launched dozens of rockets at military positions in northern Israel on Wednesday in retaliation for the killing of a member Israel said was a field commander.
Israel and Hamas ally Hezbollah have exchanged near-daily fire following the Palestinian group’s October 7 attack on southern Israel that sparked the war in Gaza.
Hezbollah fighters on Wednesday attacked “the Meron base with dozens of Katyusha rockets, heavy rockets and artillery shells” as well as targeting a barrack with “heavy rockets,” the group said.
The attacks were “part of the response to the assassination carried out by the Israeli enemy in the south” the previous day, it said.
Israel’s army said sirens sounded in Meron on Wednesday without providing further details.
On Tuesday evening, Hezbollah said Israeli fire had killed its member Hussein Makki, who was identified as a field commander by a source close to the group.
The Israeli army later confirmed it had launched the strike that killed Makki.
It described him as “a senior field commander” in Hezbollah responsible for planning and executing “numerous terrorist attacks against Israeli civilians and territory.”
“He previously served as the commander of Hezbollah’s forces in the coastal region,” the army added.
Lebanon’s official National News Agency had reported two people killed in an “enemy drone strike that targeted a car on the Tyre-Al-Hush main road.”
But another source close to Hezbollah later told AFP that while Makki was killed, the other person was injured.
At least 412 people have been killed in Lebanon in more than seven months of cross-border violence, mostly militants but also including 79 civilians, according to an AFP tally.
Israel says 14 soldiers and 10 civilians have been killed on its side of the border.
Tens of thousands of people have been displaced in areas on both sides of the border.


Jordan foils militant attempt to smuggle arms


  • Investigations are ongoing on the smuggling attempt

AMMAN: Jordan foiled an attempt by foreign-backed militants to smuggle arms into its territory, a security official told state news agency PETRA on Wednesday.

Security services seized the arms and detained the smugglers, who were Jordanians, in March.

“Investigations and operations are ongoing,” read the PETRA statement.

Jordan had recently blocked several attempts to smuggle arms including mines, explosives, Kalashnikov rifles, and Katyusha rockets.