Applying artificial intelligence to transform language learning

28 February 2022

The pioneering work of ERC grantee Roberto Navigli has brought computer language learning and translation closer to the level of human comprehension. Not only has it delivered large-scale lexical resources used by language learners, universities and organisations across the world, but it could also have important implications for artificial intelligence, robotics and the future of machine translation.

A longstanding dream of many experts working in the field of artificial intelligence (AI) has been to develop machines capable of dialogue and language comprehension indistinguishable from that of a human. For this, a computer must not only understand the meaning of words, but also grasp the context of sentences.

This is a challenge that Roberto Navigli, Professor in the Department of Computer Science at the Sapienza University of Rome, Italy, has been grappling with for over 20 years.

Putting the problem in context

‘A core difficulty is the ambiguity of words’, he explains. ‘For example, if I say in English, “please call me a taxi”, what exactly am I talking about? For a human, the context should be clear, but a computer might think that I am asking to literally be called “taxi!”’

While humans take for granted that words have specific meanings in specific contexts, this is not the case for machines. Computers need to be able to choose the contextual meaning of a word from a list, a process called ‘word-sense disambiguation’.
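To make the idea of choosing a meaning from a list concrete, here is a minimal sketch in Python. It uses a simple gloss-overlap heuristic in the spirit of the classic Lesk approach, which is not the method developed in Navigli’s project. The sense list, the definitions and the stopword set are all invented for illustration.

```python
import re

# Hypothetical toy sense inventory: each sense of "call" has a short definition.
SENSES = {
    "call": {
        "summon": "order or request a taxi, doctor or service to come",
        "name": "give someone a name, title or label",
    }
}

# Hypothetical stopword list: common words that carry little meaning.
STOPWORDS = {"a", "an", "the", "or", "to", "me", "please", "someone"}


def tokens(text):
    """Lower-case the text and return its set of non-stopword words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(word, sentence):
    """Pick the sense whose definition shares the most words with the sentence."""
    context = tokens(sentence) - {word}
    scores = {
        sense: len(context & tokens(definition))
        for sense, definition in SENSES[word].items()
    }
    # On a tie, max() keeps the first sense listed in the inventory.
    return max(scores, key=scores.get)


print(disambiguate("call", "please call me a taxi"))
print(disambiguate("call", "they call the baby a nice name"))
```

With this toy inventory, the taxi sentence is matched to the ‘summon’ sense because ‘taxi’ appears in that definition. Real systems rely on far larger sense inventories and much richer models of context.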

A key motivation behind Navigli’s ERC-funded work was to tackle the ambiguity of words and bring computers closer to human-level performance. The project brought together computer science, linguistics and language, as well as AI and robotics.

What made this project especially ambitious was that it sought to tackle word ambiguity in just about every language. ‘ERC funding really lets you go in new directions, and to break new ground’, says Navigli.

Results with everyday application

Navigli’s research represents a benchmark in multilingual translation and computer language comprehension. A huge knowledge database of words and synonyms was developed using sophisticated algorithms, and the ‘BabelNet’ platform was born.

‘BabelNet is an innovative computer dictionary’, explains Navigli. ‘What makes it different is that it represents language as a relationship between concepts. This means that queries are done not by word, but by concept. Related concepts are then linked, like in Wikipedia.’
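The difference between looking up a word and looking up a concept can be sketched in a few lines of Python. This is only an illustration of the general idea: the concept identifiers, the entries and the function names below are invented and do not reflect BabelNet’s actual data or interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """One language-independent concept with its words in several languages."""
    concept_id: str
    words: Dict[str, List[str]]  # language code -> words expressing the concept
    related: List[str] = field(default_factory=list)  # ids of linked concepts


# Hypothetical miniature concept network (illustrative entries only).
CONCEPTS = {
    "c:taxi": Concept(
        "c:taxi",
        {"en": ["taxi", "cab"], "it": ["taxi"], "fr": ["taxi"]},
        related=["c:car"],
    ),
    "c:car": Concept(
        "c:car",
        {"en": ["car", "automobile"], "it": ["automobile", "macchina"], "fr": ["voiture"]},
    ),
}


def concepts_for(word, lang):
    """Return the ids of all concepts that the word can express in a language."""
    return [c.concept_id for c in CONCEPTS.values() if word in c.words.get(lang, [])]


def words_for(concept_id, lang):
    """Return the words that express a concept in the given language."""
    return CONCEPTS[concept_id].words.get(lang, [])


def related_concepts(concept_id):
    """Follow the links from one concept to its neighbours."""
    return CONCEPTS[concept_id].related


print(concepts_for("cab", "en"))
print(words_for("c:car", "fr"))
print(related_concepts("c:taxi"))
```

Because every entry is keyed by concept rather than by word, a translation is simply a second lookup on the same concept in another language, and related concepts are reached by following links.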

The platform, which just turned ten years old, contains more than 20 million entries in 500 languages. As a comparison, the English language version of Wikipedia contains just over 6 million articles. The resource is used for translation by more than 1,000 universities around the world. Language learners also make use of BabelNet on a daily basis to find phrases and to disambiguate texts across languages.

‘This also led to the creation of a company, Babelscape, which was something that I never expected’, says Navigli. ‘Without this, it would have been very difficult to make BabelNet sustainable.’

The company now employs around 25 staff and works with several multinational companies. The EU Intellectual Property Office also uses BabelNet in order to interpret trademark applications and ensure compliance across the EU.

The next leap forward

Thanks to additional ERC funding, Navigli now hopes to make the next conceptual leap: representing the meaning of texts in a way that is independent of language. In other words, this would enable a computer to make sense of a text, irrespective of which language the text is written in.

‘If we achieve this, then we will be getting closer and closer to what humans do with language’, explains Navigli. ‘If I read a text in French and want to translate it into English, I don’t aim to translate it word-for-word. Rather, I aim to convey the meaning. This, after all, is what interpreters do, and this is what we are aiming for.’

Potential end uses could include robotics, with machines capable of being instructed in any language. The comprehension of these commands would be independent of the language in which they are expressed.

‘This comes back to my original goal, which is about enabling understanding’, says Navigli. ‘The work is a bit like neural networks. Because we cannot “read” a brain, we need to use intermediate representations to explain what a neural network is doing. And this is the same with developing machines capable of dialogue and language comprehension.’

About the researcher

Roberto Navigli is a computer scientist and language specialist working within the Department of Computer Science at the Sapienza University of Rome. In addition, he is head of the Sapienza Natural Language Processing (NLP) Group and creator of BabelNet, the largest multilingual encyclopaedic computational dictionary in the world. In 2013, he received the prestigious Marco Cadoli AI*IA prize for his results as a young Italian researcher working in the field of AI. He won an ERC Starting Grant in 2010, and an ERC Consolidator Grant in 2016.

Project information

MOUSSE: Multilingual, Open-text Unified Syntax-independent SEmantics

Researcher: Roberto Navigli

Host institution: Universita Degli Studi Di Roma La Sapienza, Italy

Call details: ERC-2016-CoG, PE6

ERC funding: 1 497 250 €