As the most popular encyclopedia of all time — with some 6.5 million articles — Wikipedia is the default first stop in the hunt for research information, background material, or an answer to that nagging question about pop culture. Wikipedia can tell you that scientists named a new species of fungus Spongiforma squarepantsii, after the cartoon character SpongeBob SquarePants, or that Blackfeet Tribe member Joe Hipp was the first Native American to compete for the World Boxing Association’s World Heavyweight title.
But sometimes that quick search for information comes with a nagging doubt: How do we know whether what we’re reading is accurate? For instance, if you had read the Joe Hipp entry mentioned above a month ago, the Wikipedia citation for that claim pointed to a webpage that didn’t mention Hipp or boxing at all. Wikipedia is crowdsourced, so it usually requires that facts be corroborated; quotations, controversial statements, and contentious material about living people must include a citation. Volunteers double-check Wikipedia’s footnotes, but as the site continues to grow, it’s challenging to keep pace with the more than 17,000 new articles added each month.
Automated tools can help identify gibberish or statements that lack citations, but helping human editors determine whether a source actually backs up a claim is a much more complex task — one that requires an AI system’s depth of understanding and analysis.
Building on Meta AI’s research and advancements, we’ve developed the first model capable of automatically scanning hundreds of thousands of citations at once to check whether they truly support the corresponding claims. The model is open source, and you can try a demo of our verifier. As a knowledge source for the model, we created a new dataset of 134 million public webpages, an order of magnitude larger and significantly more intricate than any dataset used for this sort of research before. The system calls attention to questionable citations, allowing human editors to evaluate the cases most likely to be flawed without having to sift through thousands of properly cited statements. If a citation seems irrelevant, our model suggests a more applicable source, even pointing to the specific passage that supports the claim. Eventually, our goal is to build a platform that helps Wikipedia editors systematically spot citation issues and quickly fix the citation, or correct the content of the corresponding article, at scale.
"This is a powerful example of machine learning tools that can help scale the work of volunteers by efficiently recommending citations and accurate sources. Improving these processes will allow us to attract new editors to Wikipedia and provide better, more reliable information to billions of people around the world. I look forward to continued improvements in this area, especially as machine learning tools are able to provide more customized citations and multilingual options to serve our Wikimedia communities across more than 300 languages."
Shani Evenstein Sigalov, a researcher at Tel Aviv University and long-time Wikimedian.
Teaching machines to understand the relationship between complex text passages, such as Wikipedia entries and the sources they cite, will also help the research community advance AI toward smarter systems that can reason about real-world knowledge with more complexity and nuance.
For example, to replace the failed citation about Joe Hipp, our system recommends a passage from a 2015 article in the Great Falls Tribune:
“In 1989 at the twilight of his career, [Marvin] Camel fought Joe Hipp of the Blackfeet Nation. Hipp, who became the first Native American to challenge for the world heavyweight championship, said the fight was one of the weirdest of his career.”
To identify that source, our system had to parse complex semantics. The newspaper passage does not explicitly mention boxing, but the model inferred the context from indirect clues, such as the term heavyweight. It also understood that the word challenge in the Tribune article means the same thing as compete in the Wikipedia claim.
At first, the sheer scale of the job seemed formidable even for an advanced AI system: There were millions of citations to check and millions of potential evidence documents to consider. Even more daunting was that citation editing requires near-human language comprehension and acumen. To succeed at this task, an AI model must understand the claim in question, find the corresponding passage on the cited website, and predict whether the source truly verifies the statement.
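Concretely, that loop amounts to three steps: check the cited page against the claim, and if the citation fails, search a corpus for a better source and propose it to an editor. The toy sketch below illustrates the flow; the tiny corpus, the lexical-overlap scorer, and the threshold are stand-ins for the learned retrieval and entailment models described in the rest of this post, not our actual implementation.

```python
# Toy sketch of the verification loop: score the cited page against the
# claim, and hunt for a better source if the citation fails. The corpus,
# the string-overlap "scorer," and the threshold are all placeholders.

from difflib import SequenceMatcher

# Stand-in web corpus: URL -> page text.
CORPUS = {
    "https://example.com/camel-profile": (
        "In 1989 Marvin Camel fought Joe Hipp of the Blackfeet Nation. "
        "Hipp became the first Native American to challenge for the "
        "world heavyweight championship."
    ),
    "https://example.com/fly-fishing": "A guide to fly fishing in Montana.",
}

def support_score(claim: str, page_text: str) -> float:
    # Placeholder for a learned entailment model: crude string overlap.
    return SequenceMatcher(None, claim.lower(), page_text.lower()).ratio()

def check_citation(claim: str, cited_url: str, threshold: float = 0.3):
    """Return (is_supported, suggested_replacement_url_or_None)."""
    supported = support_score(claim, CORPUS.get(cited_url, "")) >= threshold
    suggestion = None
    if not supported:
        # Retrieve alternatives from the corpus and keep the best-scoring one.
        url, score = max(
            ((u, support_score(claim, text)) for u, text in CORPUS.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            suggestion = url
    return supported, suggestion

claim = ("Joe Hipp was the first Native American to compete for the "
         "world heavyweight title.")
print(check_citation(claim, "https://example.com/fly-fishing"))
```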
At Meta AI, we have already begun to develop the building blocks of the next generation of citation tools. Last year, we released an AI model that integrates information retrieval and verification, and we are training neural networks to learn more nuanced representations of language so they can pinpoint relevant source material in an internet-size pool of data.
Where a person would use reasoning and common sense to evaluate a citation, our system applies natural language understanding (NLU) techniques to estimate the likelihood that a claim can be inferred from a source. In NLU, a model translates human sentences (or words, phrases, or paragraphs) into complex mathematical representations. We’ve designed our tools to compare these representations in order to determine whether one statement supports or contradicts another.
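As a rough illustration of that entailment check, an off-the-shelf natural language inference (NLI) model can score whether a source passage supports a claim. The model below, roberta-large-mnli from Hugging Face, is a generic stand-in, not the verifier we built:

```python
# Score a (premise, hypothesis) pair with an off-the-shelf NLI model.
# The premise is the source passage; the hypothesis is the Wikipedia claim.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

premise = (
    "Hipp, who became the first Native American to challenge for the "
    "world heavyweight championship, said the fight was one of the "
    "weirdest of his career."
)
hypothesis = (
    "Joe Hipp was the first Native American to compete for the "
    "World Boxing Association's World Heavyweight title."
)

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

# roberta-large-mnli label order: 0 = contradiction, 1 = neutral, 2 = entailment
for label, p in zip(["contradiction", "neutral", "entailment"], probs):
    print(f"{label}: {p.item():.3f}")
```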
Our new dataset of 134 million webpages serves as one of the system’s main components: Sphere, a web-scale retrieval library that we have open-sourced.
To find appropriate sources among the millions of webpages in the library, we designed a way to use AI to index a vast amount of information. We fed our algorithms 4 million claims from Wikipedia, teaching them to zero in on a single source from an enormous pool of webpages to validate each statement.
During a search, the models create and compare mathematical representations of the meanings of entire statements rather than of individual words. Because webpages can contain long stretches of text, the models assess content in chunks and consider only the most relevant passage when deciding whether to recommend a URL. These prebuilt indices, which catalog 40 times more content than other Wikipedia indices, will be included with Sphere.
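At small scale, the same idea can be sketched with off-the-shelf tools: encode each passage as a single vector, put the vectors in a nearest-neighbor index, and embed the claim as a query. The encoder and FAISS index below are illustrative stand-ins; Sphere’s real indices are built with our own models over 134 million pages.

```python
# Minimal dense-retrieval sketch: embed whole passages (not individual
# words), index them, and search with the embedded claim as the query.

import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in passage encoder

# Long pages are assessed in chunks; each passage is indexed separately.
passages = [
    "In 1989 Marvin Camel fought Joe Hipp of the Blackfeet Nation.",
    "Hipp became the first Native American to challenge for the world "
    "heavyweight championship.",
    "The Dallas Symphony Association announced a new president and CEO.",
]

embeddings = encoder.encode(passages, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product = cosine here
index.add(embeddings)

claim = ("Joe Hipp was the first Native American to compete for the "
         "heavyweight title.")
query = encoder.encode([claim], normalize_embeddings=True)
scores, ids = index.search(query, k=2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {passages[i]}")
```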
The indices pass potential sources to an evidence-ranking model, which compares the new text with the original citation. Using fine-grained language comprehension, the model ranks the cited source and the retrieved alternatives according to the likelihood that they support the claim. When deployed in the real world, the model will offer the most relevant URLs as prospective citations for a human editor to review and approve.
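A rough analogue of this ranking step can be built with a cross-encoder, which reads the claim and a candidate passage together and outputs a relevance score. The reranker named below is a generic off-the-shelf model, not the evidence-ranking model described in this post:

```python
# Rank the original citation and a retrieved alternative against the claim.

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

claim = (
    "The Los Angeles Philharmonic announces the appointment of Simon Woods "
    "as its next president and chief executive officer."
)
candidates = [
    "The Dallas Symphony Association announced the appointment of its new "
    "president and CEO.",
    "On Thursday Los Angeles Philharmonic announced the appointment of "
    "Simon Woods as its new Chief Executive Director.",
]

# Each (claim, passage) pair is scored jointly, then sorted best-first.
scores = reranker.predict([(claim, passage) for passage in candidates])
for score, passage in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.2f}  {passage}")
```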
Models like this are usually developed with inputs of just a sentence or two. We trained ours on complicated statements from Wikipedia, accompanied by full websites that may or may not support the claims. As a result, our models achieved a leap in performance at detecting the accuracy of citations. For example, our system found a better source for a citation in the Wikipedia article “2017 in Classical Music.” The claim reads:
“The Los Angeles Philharmonic announces the appointment of Simon Woods as its next president and chief executive officer, effective 22 January 2018.”
The current Wikipedia footnote for this statement links to a press release from the Dallas Symphony Association announcing the appointment of its new president and CEO, also effective January 22, 2018. Despite their similarities, our evidence-ranking model deduced that the press release was not relevant to the claim. Our AI indices suggested another possible source, a blog post on the website Violinist.com, which notes,
“On Thursday Los Angeles Philharmonic announced the appointment of Simon Woods as its new Chief Executive Director, effective Jan. 22, 2018.”
The evidence-ranking model then correctly concluded that this was more relevant than Wikipedia’s existing citation for the claim.
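To make the training setup described above concrete, a single training record might pair a claim with a full candidate page and a support label. The schema below is purely illustrative; the field names are hypothetical, not our actual data format.

```python
# Hypothetical shape of one training example for the verification models:
# a Wikipedia claim paired with an entire page that may or may not support it.

from dataclasses import dataclass

@dataclass
class CitationExample:
    claim: str       # statement extracted from a Wikipedia article
    page_url: str    # candidate source page
    page_text: str   # full page content, chunked into passages during training
    supports: bool   # label: does the page verify the claim?

example = CitationExample(
    claim=("The Los Angeles Philharmonic announces the appointment of Simon "
           "Woods as its next president and chief executive officer."),
    page_url="https://example.com/dallas-symphony-press-release",
    page_text=("The Dallas Symphony Association announced its new president "
               "and CEO, effective January 22, 2018..."),
    supports=False,  # similar wording, wrong orchestra
)
```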
Once they are ready to be deployed, our models will bolster the quality of knowledge on Wikipedia, helping to preserve the accuracy of a resource that virtually everyone uses. Beyond that, this project might ultimately help the research community solve difficult problems in AI. Our models have been trained on realistic data at an unprecedented scale. In addition to representing preliminary steps toward the development of an automatic fact-checking system, they could guide the way — by serving as pretrained models, for example — to better results on many other tasks, such as classic natural language inference, retrieval in question-answering systems, and few-shot learning.
Open source projects like these, which teach algorithms to understand dense material with an ever-higher degree of sophistication, help AI make sense of the real world. While we can’t yet design a computer system that has a human-level comprehension of language, our research creates smarter, more flexible algorithms. This improvement will only become more important as we rely on computers to interpret the surging volume of text generated each day.
While we continue to refine our verification and retrieval models, we are also looking to the future. These models are the first components of potential editors that could help verify documents in real time. In addition to proposing citations, the system would suggest auto-complete text — informed by relevant documents found on the web — and offer proofreading corrections. Ideally, the models would understand multiple languages and be able to process several types of media, including video, images, and data tables. These capabilities are among Meta AI’s new targets as we help teach technology to understand our world.
NOTE: Wikimedia and Meta are not partnering on this project. The project is still in the research phase and not being used to automatically update any content on Wikipedia.