As far back as 2001, a team at Princeton University studied the persistence of web references in scientific articles, finding that the raw number of URLs contained in academic articles was increasing but that many of the links were broken, including 53 percent of those in the articles they had collected from 1994. Thirteen years later, six researchers created a data set of more than 3.5 million scholarly articles about science, technology, and medicine, and determined that one in five no longer points to its originally intended source. In 2016, an analysis with the same data set found that 75 percent of all references had drifted.
Deletion isn’t the only issue. Not only can information be removed, but it can also be changed. Before the advent of the internet, it would have been futile to try to change the contents of a book long after it had been published.
So yes, it is a very real problem, and due to the decentralised nature of the Internet, sites, blogs, books, and even government sites get deleted and changed. On the site/hosting side this will not change, so right now our hope really lies in the Internet Archive’s Wayback Machine (and in it continuing to get funding) and in efforts like Amberlink, as it is unlikely that any legislation will change this reality. The fact is, we are doomed to lose a lot of human knowledge through the Internet, at least for now.
#technology #archiving #knowledge #internetarchive #waybackmachine
Too much has been lost already. The glue that holds humanity’s knowledge together is coming undone.