Manipulate archived internet pages? Yes we can!

Previous articles described how the Wayback Machine (TWM, www.archive.org) may be used to establish a date of public availability for a particular internet page, which may be useful in legal proceedings. Those articles also discussed a pitfall in dating information from a page that uses frames. This article discusses another pitfall, which relates to the use of objects in a page.

Generally, an internet page is written in HTML (HyperText Markup Language). HTML elements form the main building blocks of an internet page, and may contain various types of content, including marked-up text and objects such as images or Flash objects. Unlike text, objects are not part of the HTML script. They are separate files saved on the web server, to which an HTML element links so that the object is incorporated into the page. Technically, when TWM archives an internet page, it copies the HTML script, including the links to files saved on the web server, and saves it in an archive without necessarily archiving the files themselves. When such a page is retrieved, any object that was not archived is loaded from the web server as it exists at the time of retrieval. Only if the content of the files on the web server never changes will TWM display the internet page as it was on the date of archiving.
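To make the distinction between text and objects concrete, the following minimal Python sketch lists the separate files that a page's HTML links to. It is an illustration only: the sample page, the example.com addresses and the names `OBJECT_ATTRS`, `ObjectLinkCollector` and `list_object_links` are assumptions made for this example, and the list of tags is not exhaustive.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags that commonly pull in a separate file, and the attribute holding the link.
# Illustrative only; real pages may reference files in other ways as well.
OBJECT_ATTRS = {
    "img": "src",
    "embed": "src",
    "object": "data",
    "iframe": "src",
}


class ObjectLinkCollector(HTMLParser):
    """Collects links to files that are not part of the HTML itself."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = OBJECT_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the address of the page.
                self.links.append((tag, urljoin(self.base_url, value)))


def list_object_links(html, base_url):
    """Return (tag, absolute_url) pairs for every linked object in the page."""
    collector = ObjectLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page: the paragraph is stored in the HTML, the two objects are not.
sample_html = """
<html><body>
<p>This text is part of the HTML itself.</p>
<img src="images/frontpage.jpg" alt="front page">
<object data="/media/banner.swf"></object>
</body></html>
"""

for tag, url in list_object_links(sample_html, "http://www.example.com/news/index.html"):
    print(tag, url)
```

Every address printed by such a script points to a file whose content can change without any change to the HTML, so it is exactly this content that needs separate verification.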

In a recent court case in which the content of an archived internet page was cited as prior art against a patent, the authors found that this assumption did not hold. Instead of amending the HTML script, the owner of the internet page regularly updated the object saved in a file on the web server, to which the HTML script linked. As the HTML script had stayed the same over many years, TWM displayed the internet page for every archiving date as identical to the currently active internet page. As a demonstration, the authors updated the object to show a recent newspaper front page. That front page could then be retrieved via TWM as if it had been publicly available years before the newspaper was published. This behaviour of TWM was verified by a bailiff and recognised by a Belgian judge. The content of the object was therefore rejected as part of the prior art.

The case at hand shows how TWM information may be misinterpreted, or even deliberately manipulated. It illustrates that care is needed when relying on TWM information that does not form part of the archived HTML script. Such information may differ considerably from what was shown on screen on the date the page was archived.
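One practical precaution is to check, for each object shown in an archived page, where the object is actually loaded from. The Python sketch below is a rough illustration under stated assumptions, not a forensic tool. It assumes that archived resources are served from addresses of the form `web.archive.org/web/<14-digit timestamp>/<original address>`, optionally with a short modifier such as `im_` after the timestamp. The function names, the example addresses and the 30-day threshold are hypothetical choices made for this example.

```python
import re
from datetime import datetime

# Assumed address pattern of an archived capture: a 14-digit timestamp,
# optionally followed by a two-letter modifier such as "im_".
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})([a-z]{2}_)?/(.+)$")


def capture_time(url):
    """Return the capture time encoded in an archive address, or None."""
    match = WAYBACK_RE.match(url)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def check_objects(page_url, object_urls, max_gap_days=30):
    """Flag objects that are not archived, or were captured far from the page date.

    max_gap_days is an arbitrary illustrative threshold.
    """
    page_time = capture_time(page_url)
    if page_time is None:
        raise ValueError("page_url is not an archive capture address")
    report = []
    for url in object_urls:
        object_time = capture_time(url)
        if object_time is None:
            status = "not archived: loaded from the live web server"
        else:
            gap_days = abs(object_time - page_time).days
            if gap_days <= max_gap_days:
                status = "ok"
            else:
                status = f"captured {gap_days} days apart from the page"
        report.append((url, status))
    return report


# Hypothetical example: one page capture and three objects it displays.
page = "https://web.archive.org/web/20050301120000/http://www.example.com/index.html"
objects = [
    "https://web.archive.org/web/20050302080000im_/http://www.example.com/logo.gif",
    "https://web.archive.org/web/20120615000000im_/http://www.example.com/frontpage.jpg",
    "http://www.example.com/banner.swf",
]

for url, status in check_objects(page, objects):
    print(status, "|", url)
```

An object flagged as not archived, or as captured long after the page itself, cannot be dated by the archiving date of the page alone and would need independent evidence.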

Author: Yannick Philippaerts - Publisher: Managing IP