Zombse

The Zombie Stack Exchanges That Just Won't Die

View the Project on GitHub anjackson/zombse

How are articles from news websites like Yahoo News, CNN, and Reuters catalogued and archived?

There are a lot of online-only articles on sites like Yahoo News and Reuters, and they often get deleted after some period of time. This creates a lot of dead links in the references sections of Wikipedia articles, which is frustrating since many of these articles contain useful information. Also, many of them aren't even accessible through archive.org (which is sadly not as effective as it used to be, thanks to Ajax and robots.txt).

Is there some other way that they are being archived in order to be accessed later? Or are the articles effectively lost forever once the links go dead?

InquilineKea


Answer by Joe

The bigger problem is that, because these articles exist only online with no physical manifestation, publishers can correct or otherwise edit them over time ... so multiple versions of an article might have been posted under a single URL. Or they could serve different information to different groups (e.g., using geolocation or other information known about the user), so we can't be sure that any copy of the article is the same as the one that was cited.

For the cases where I'm citing something, I make a local copy (e.g., print to PDF) for later reference. In one case, where I'm tracking news articles about a project, I use wget to stash a copy of the page + its dependencies on a local server that's not accessible to the outside world ... then we still have a copy when it comes time for the project review every 3-5 years.
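The wget approach can be sketched roughly as below. This is only an illustration under stated assumptions: the function names `build_mirror_command` and `stash_copy`, the destination directory, and the particular choice of wget options are mine, not the exact command used in this answer.

```python
import subprocess
from pathlib import Path


def build_mirror_command(url, dest_dir):
    """Return a wget command that saves one page plus the files it needs.

    The flag selection is an assumption about what "page + dependencies"
    means here; the answer does not say which options it actually uses.
    """
    return [
        "wget",
        "--page-requisites",   # also fetch images, CSS and scripts used by the page
        "--convert-links",     # rewrite links so the local copy works offline
        "--adjust-extension",  # save HTML documents with an .html suffix
        "--span-hosts",        # allow requisites hosted on other domains (e.g. CDNs)
        f"--directory-prefix={dest_dir}",  # where the local copy is stored
        url,
    ]


def stash_copy(url, dest_dir):
    """Run wget and return its exit code (needs wget installed and network access)."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_mirror_command(url, dest_dir), check=False).returncode
```

Calling `stash_copy("https://example.com/story.html", "archive")` would attempt to save the page and its requisites under `archive/`. Note that a copy made this way only captures the version served at that moment, to that machine.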

For everyone else to use? I don't know of any that specifically target US news sites. Most of the archives that I know about tend to be run by a country's national library or a similar body, to archive websites from that country. There are others that are institutional, i.e., web pages from a given university. For a list, see NetPreserve.org

The US Library of Congress has a web archive with news articles, but it's rather limited in scope -- they only have websites related to a certain event (e.g., elections, Sept 11 2001), and for a limited time period.

... so I guess the point is -- there might be a copy of the item out there ... but there's no easy way to find them that I know of (other than hope it's in Archive.org)
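For the "hope it's in Archive.org" case, one way to check is the Wayback Machine's availability endpoint, which returns JSON describing the closest snapshot of a URL, if any. The sketch below is a rough illustration; the helper names are hypothetical, and the response shape it assumes (`archived_snapshots` → `closest` → `url`) should be checked against the current API documentation.

```python
import json
import urllib.parse
import urllib.request

# Wayback Machine availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(api_response):
    """Return the snapshot URL from a decoded API response, or None."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(page_url, timestamp=None):
    """Query the endpoint over the network and return a snapshot URL or None."""
    query = availability_url(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

A `None` result only means the Wayback Machine has no usable snapshot; it says nothing about whether a national library or press archive holds a copy.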


Answer by Jakob

Some newspaper indexing agencies such as Genios also collect some online sources (just search for "online" to get examples). So a partial answer is that these online news articles are catalogued and archived by press archives and indexing services. The Library of Congress provides a list of references, as does this Wikipedia page.
