Microsoft Bing's large-scale multilingual spelling correction models, collectively called Speller100, are being rolled out worldwide, delivering high precision and high recall in more than 100 languages.
According to Bing, around 15% of queries submitted by users contain misspellings, which can lead to incorrect answers and suboptimal search results.
To address this problem, Bing built what it believes is the most comprehensive spelling correction system ever made.
In A/B tests comparing queries with and without Speller100, Bing observed the following results:
- The number of pages with no results has been reduced by up to 30%.
- The frequency with which users had to manually rephrase their query has been reduced by 5%.
- The number of times users clicked spelling suggestions rose from single digits to 67%.
- The number of times users clicked an item on the page increased from single digits to 70%.
How did Bing achieve this? Read on to learn more about Speller100.
Improved spelling correction in Bing search results
Spelling correction has long been a priority for Bing, and the search engine is now taking it a step further by adding more languages from around the world.
"To make Bing more inclusive, we decided to expand our current spell checker service to over 100 languages while maintaining the same high quality bar that we set for the original two dozen languages."
The introduction of Speller100 represents a significant step forward for Bing and is made possible by recent advances in AI.
The technology behind Speller100 is explained in the company's latest blog post. Here are some key details about Bing's new spelling correction technology.
Speller100 technology from Microsoft Bing
Bing credits zero-shot learning as a key AI advance that makes Speller100 possible.
With zero-shot learning, an AI model can learn to correct spelling accurately without additional language-specific training data. This is in contrast to traditional spelling correction solutions, which rely solely on training data to learn the spelling of a language.
Relying on training data is a challenge when it comes to correcting the spelling of languages that don't have enough data. That is the problem that zero-shot learning seeks to solve.
“Imagine someone taught you how to spell in English and you automatically learned to spell in German, Dutch, Afrikaans, Scots and Luxembourgish too. That's what zero-shot learning enables, and it's a key component in Speller100 that allows us to expand to languages with very little to no data.”
Spelling correction is not natural language processing
Bing points out that, despite significant advances in natural language processing, spelling correction is a different task.
All misspellings can be divided into two types:
- Non-word error: Occurs when the word is not in the vocabulary of a given language.
- Real-word error: Occurs when the word is valid but does not fit the larger context.
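The difference between the two types can be illustrated with a minimal Python sketch. The toy vocabulary and the helper names `is_non_word` and `non_word_errors` are hypothetical and made up for illustration; they are not part of Speller100. The sketch shows that a plain vocabulary lookup catches a non-word error but misses a real-word error, which can only be detected from context.

```python
# Toy vocabulary, invented for illustration; a real system would need
# a far larger lexicon per language.
VOCAB = {"i", "want", "a", "piece", "peace", "of", "cake"}

def is_non_word(token, vocab=VOCAB):
    """Return True if the token is missing from the vocabulary."""
    return token.lower() not in vocab

def non_word_errors(query):
    """List the tokens in a query that a vocabulary lookup flags."""
    return [tok for tok in query.split() if is_non_word(tok)]

# "peice" is not a word, so the lookup flags it.
print(non_word_errors("i want a peice of cake"))  # ['peice']

# "peace" is a valid word used in the wrong context, so the lookup
# finds nothing; catching it requires looking at the whole query.
print(non_word_errors("i want a peace of cake"))  # []
```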
Bing has developed a deep learning approach to correcting these misspellings that is inspired by Facebook's BART model. However, it differs from BART in that it frames spelling correction as a character-level problem.
To address a character-level problem, Bing's Speller100 model is trained using character-level mutations that mimic misspellings.
Bing calls these "noise functions":
“We developed noise functions to generate common errors of rotation, insertion, deletion and replacement.
The use of a noise function has significantly reduced our demand for human-labeled annotations, which are often required in machine learning. This is very useful for languages for which we have little or no training data.”
Using noise functions, Bing can train Speller100 to correct spelling in languages for which a large amount of misspelled query data is not available.
Instead, Bing gets by with normal text from websites, which is collected through regular web crawling. There is said to be a sufficient amount of text on the web to facilitate training in hundreds of languages.
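The idea of turning clean web text into training data can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Bing's implementation: the function names are hypothetical, "rotation" is assumed to mean swapping two adjacent characters, and the alphabet is limited to lowercase ASCII letters.

```python
import random
import string

def rotate(word, i):
    """Swap the characters at positions i and i+1 (assumed meaning of 'rotation')."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def insert(word, i, ch):
    """Insert character ch before position i."""
    return word[:i] + ch + word[i:]

def delete(word, i):
    """Drop the character at position i."""
    return word[:i] + word[i + 1:]

def replace(word, i, ch):
    """Overwrite the character at position i with ch."""
    return word[:i] + ch + word[i + 1:]

def add_noise(word, rng, alphabet=string.ascii_lowercase):
    """Apply one randomly chosen mutation to a word.

    The result can occasionally equal the input, for example when a
    character is replaced by itself.
    """
    if len(word) < 2:
        return word
    op = rng.choice(["rotate", "insert", "delete", "replace"])
    ch = rng.choice(alphabet)
    if op == "rotate":
        return rotate(word, rng.randrange(len(word) - 1))
    if op == "insert":
        return insert(word, rng.randrange(len(word) + 1), ch)
    if op == "delete":
        return delete(word, rng.randrange(len(word)))
    return replace(word, rng.randrange(len(word)), ch)

def make_training_pairs(text, seed=0):
    """Turn clean text into (noisy, clean) word pairs for training."""
    rng = random.Random(seed)
    return [(add_noise(w, rng), w) for w in text.split()]

print(rotate("spell", 1))      # sepll
print(insert("spell", 2, "x")) # spxell
print(delete("spell", 0))      # pell
print(replace("spell", 4, "k"))# spelk
```

Each pair produced by `make_training_pairs` couples a synthetic misspelling with its correct form, so no human-labeled corrections are needed.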
“This pretraining task is a solid first step towards solving multilingual spelling correction for more than 100 languages. It helps reach 50% correction recall for top candidates in languages for which we have no training data.”
While this is a significant advancement, Bing says 50% recall isn't good enough. This is where zero-shot learning comes in.
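To make the recall figure concrete, here is a small Python sketch. It assumes that recall for top candidates means the share of misspelled queries whose top suggestion matches the expected correction; the source does not spell out the exact definition. The function name `top_candidate_recall` and the word lists are invented for illustration and are not Bing data.

```python
def top_candidate_recall(suggestions, expected):
    """Fraction of misspelled queries whose top suggestion is the expected correction."""
    if not expected:
        return 0.0
    hits = sum(1 for s, e in zip(suggestions, expected) if s == e)
    return hits / len(expected)

# Made-up toy data: expected corrections and a model's top suggestions.
expected = ["weather", "recipe", "restaurant", "calendar"]
suggestions = ["weather", "recipe", "restarant", "calender"]

# Two of the four top suggestions are right, so recall is 50%.
print(top_candidate_recall(suggestions, expected))  # 0.5
```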
For languages without training data, Bing applies zero-shot learning across language families. The approach relies on the fact that most of the world's languages are related to others.
“This orthographic, morphological and semantic similarity between languages in the same group makes a zero-shot learning error model very efficient and effective …
Zero-shot learning makes it possible to learn spelling prediction for these low-resource or no-resource languages.”
The introduction of Speller100 to Bing marks the first step in a larger effort to implement the technology in more Microsoft products.
Source: Microsoft Research Blog