The search giant is automatically building Knowledge Vault, a massive database that could give us unprecedented access to the world’s facts
GOOGLE is building the largest store of knowledge in human history – and it’s doing so without any human help.
Instead, Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it.
The breadth and accuracy of this gathered knowledge is already becoming the foundation of systems that allow robots and smartphones to understand what people ask them. It promises to let Google answer questions like an oracle rather than a search engine, and even to turn a new lens on human history.
Knowledge Vault is a type of “knowledge base” – a system that stores information so that machines as well as people can read it. Where a database deals with numbers, a knowledge base deals with facts. When you type “Where was Madonna born” into Google, for example, the place given is pulled from Google’s existing knowledge base.
This existing base, called Knowledge Graph, relies on crowdsourcing to expand its information. But the firm noticed that growth was stalling; humans could only take it so far.
So Google decided it needed to automate the process. It started building the Vault by using an algorithm to automatically pull in information from all over the web, using machine learning to turn the raw data into usable pieces of knowledge.
Knowledge Vault has pulled in 1.6 billion facts to date. Of these, 271 million are rated as “confident facts”, to which Google’s model ascribes a more than 90 per cent chance of being true. It does this by cross-referencing new facts with what it already knows.
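To make the idea of a "confident fact" concrete, here is a minimal sketch in Python. It is not Google's implementation: the `Fact` structure, the `confident_facts` helper and the probability values are all hypothetical, chosen only to illustrate filtering facts by a 90 per cent threshold.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """A single candidate fact. Field names are illustrative assumptions."""
    subject: str
    predicate: str
    obj: str
    probability: float  # hypothetical model estimate that the fact is true


def confident_facts(facts, threshold=0.9):
    """Return only the facts whose estimated probability exceeds the threshold."""
    return [f for f in facts if f.probability > threshold]


# Made-up probabilities, for illustration only.
facts = [
    Fact("Madonna", "born_in", "Bay City, Michigan", 0.97),
    Fact("Madonna", "born_in", "Detroit", 0.12),
    Fact("Elvis Presley", "born_in", "Tupelo, Mississippi", 0.95),
]

confident = confident_facts(facts)
for fact in confident:
    print(fact.subject, fact.predicate, fact.obj, fact.probability)
```

In a real system the probability would come from a learned model that weighs each extracted claim against existing knowledge; here it is simply supplied by hand.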
“It’s a hugely impressive thing that they are pulling off,” says Fabian Suchanek, a data scientist at Télécom ParisTech in France.
Google’s Knowledge Graph is currently bigger than the Knowledge Vault, but it only includes manually integrated sources such as the CIA Factbook.
Knowledge Vault offers Google fast, automatic expansion of its knowledge – and it’s only going to get bigger. As well as analysing the text on a webpage for facts to feed its knowledge base, Google can also peer under the surface of the web, hunting for hidden sources of data such as the figures that feed Amazon product pages.
Tom Austin, a technology analyst at Gartner in Boston, says that the world’s biggest technology companies are racing to build similar vaults. “Google, Microsoft, Facebook, Amazon and IBM are all building them, and they’re tackling these enormous problems that we would never even have thought of trying 10 years ago,” he says.
The potential of a machine system that has the whole of human knowledge at its fingertips is huge. One of the first applications will be virtual personal assistants that go way beyond what Siri and Google Now are capable of, says Austin.
“Before this decade is out, we will have a smart priority inbox that will find for us the 10 most important emails we’ve received and handle the rest without us having to touch them,” Austin says. Our virtual assistant will be able to decide what matters and what doesn’t.
Other agents will carry out the same process to watch over and guide our health, sorting through a knowledge base of medical symptoms to find correlations with data in each person’s health records. IBM’s Watson is already doing this for cancer at Memorial Sloan Kettering Hospital in New York.
Knowledge Vault promises to supercharge our interactions with machines, but it also comes with an increased privacy risk. The Vault doesn’t care if you are a person or a mountain – it is voraciously gathering every piece of information it can find.
“Behind the scenes, Google doesn’t only have public data,” says Suchanek. It can also pull in information from Gmail, Google+ and YouTube. “You and I are stored in the Knowledge Vault in the same way as Elvis Presley,” Suchanek says.
Google researcher Kevin Murphy and his colleagues will present a paper on Knowledge Vault at the Conference on Knowledge Discovery and Data Mining in New York on 25 August.
As well as improving our interactions with computers, large stores of knowledge will be the fuel for augmented reality, too. Once machines get the ability to recognise objects, Knowledge Vault could be the foundation of a system that can provide anyone wearing a heads-up display with information about the landmarks, buildings and businesses they are looking at in the real world. “Knowledge Vault adds local entities – politicians, businesses. This is just the tip of the iceberg,” Suchanek says.
Richer vaults of knowledge will also change the way we study human society. “This is the most visionary thing,” says Suchanek. “The Knowledge Vault can model history and society.”
Google already has a way to track mentions of names over time using historical texts, measuring the popularity of Albert Einstein vs Charles Darwin, for instance. By adding knowledge bases – which know the gender, age and place of birth of myriad people – historians would be able to investigate more in-depth questions, such as how the popularity of female singers has changed over time.
Suchanek has already carried out a version of this kind of data-driven history. By combining a knowledge base called YAGO with data from French newspaper Le Monde, he was able to show how the gender gap in French politics changed over time. This was only possible because YAGO knows the gender of every French politician, and can apply that knowledge to names mentioned in Le Monde. He will present the work at the Very Large Databases Conference in Hangzhou, China, in September.
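The kind of analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Suchanek's actual method: the `kb_gender` lookup stands in for a knowledge base such as YAGO, the `mentions` list stands in for names extracted from a newspaper archive, and all names and years are invented.

```python
from collections import defaultdict

# Hypothetical knowledge base: person name -> gender (invented entries).
kb_gender = {
    "Alice Martin": "female",
    "Bernard Petit": "male",
    "Claire Dubois": "female",
}

# Hypothetical mentions extracted from a newspaper archive: (year, name).
mentions = [
    (1990, "Bernard Petit"),
    (1990, "Bernard Petit"),
    (1990, "Alice Martin"),
    (2010, "Claire Dubois"),
    (2010, "Bernard Petit"),
    (2010, "Unknown Person"),  # not in the knowledge base, so it is skipped
]


def female_share_by_year(mentions, kb_gender):
    """For each year, compute the fraction of known-gender mentions that are female."""
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    for year, name in mentions:
        gender = kb_gender.get(name)
        if gender in ("female", "male"):
            counts[year][gender] += 1
    return {
        year: c["female"] / (c["female"] + c["male"])
        for year, c in sorted(counts.items())
    }


shares = female_share_by_year(mentions, kb_gender)
for year, share in shares.items():
    print(year, round(share, 2))
```

The key step is the lookup: the newspaper text supplies only names and dates, and the knowledge base supplies the attribute (here, gender) that turns raw mentions into a trend.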
It might even be possible to use a knowledge base as detailed and broad as Google’s to start making accurate predictions about the future based on analysis and forward projection of the past, says Suchanek.
“This is an entirely new generation of technology that’s going to result in massive changes – improvement in how people live and have fun, and how they make war,” says Austin. “This is a quantum leap.”