Archive Assisted Archival Fixity Verification Framework

05/29/2019
by Mohamed Aturban, et al.

The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure that an archived resource has remained unaltered since the time it was captured. Some web archives do not allow users to access fixity information and, more importantly, even when fixity information is available, it is provided by the same archive from which the archived resources are requested. In this research, we propose two approaches, namely Atomic and Block, to establish and check the fixity of archived resources. In the Atomic approach, the fixity information of each archived web page is stored in a JSON file (a manifest) and published at a well-known web location (an Archival Fixity server) before it is disseminated to several on-demand web archives. In the Block approach, we first batch together the fixity information of multiple archived pages in a single binary-searchable file (a block) before it is published and disseminated to archives. In both approaches, the fixity information is not obtained directly from archives. Instead, we compute the fixity information (e.g., hash values) based on the playback of archived resources. One advantage of the Atomic approach is the ability to verify the fixity of archived pages even in the absence of the Archival Fixity server. The Block approach requires pushing fewer resources into archives, and it performs fixity verification faster than the Atomic approach. On average, it takes about 1.25X, 4X, and 36X longer to disseminate a manifest to perma.cc, archive.org, and webcitation.org, respectively, than to archive.is, while it takes 3.5X longer to disseminate a block to archive.org than to perma.cc. The Block approach is 4.46X faster than the Atomic approach at verifying the fixity of archived pages.
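To make the two approaches concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not specify the manifest fields, the hash algorithm, or the block layout, so the field names (`uri_m`, `created`, `hash_algorithm`, `hash`), the choice of SHA-256, and the sorted-list block representation are all assumptions made for illustration. The playback content is passed in as bytes rather than fetched from an archive.

```python
import bisect
import hashlib
import json
from typing import Dict, List, Optional, Tuple


def compute_fixity(playback_body: bytes) -> str:
    """Hash the bytes returned by replaying an archived page (SHA-256 assumed)."""
    return hashlib.sha256(playback_body).hexdigest()


def build_manifest(uri_m: str, playback_body: bytes, created: str) -> str:
    """Atomic approach: one JSON manifest per archived page (hypothetical fields)."""
    manifest = {
        "uri_m": uri_m,
        "created": created,
        "hash_algorithm": "sha256",
        "hash": compute_fixity(playback_body),
    }
    return json.dumps(manifest, sort_keys=True)


def verify_with_manifest(manifest_json: str, playback_body: bytes) -> bool:
    """Recompute the hash from the current playback and compare with the manifest."""
    manifest = json.loads(manifest_json)
    return manifest["hash"] == compute_fixity(playback_body)


def build_block(pages: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """Block approach: batch many (uri_m, hash) records, sorted by uri_m."""
    return sorted((uri_m, compute_fixity(body)) for uri_m, body in pages.items())


def lookup_in_block(block: List[Tuple[str, str]], uri_m: str) -> Optional[str]:
    """Binary-search the sorted block for uri_m; return its hash or None."""
    # The 1-tuple (uri_m,) sorts just before any (uri_m, hash) pair,
    # so bisect_left lands on the matching record if one exists.
    i = bisect.bisect_left(block, (uri_m,))
    if i < len(block) and block[i][0] == uri_m:
        return block[i][1]
    return None
```

In this sketch, verification under either approach reduces to recomputing the hash from the current playback and comparing it with the stored value. The difference is only in how the stored value is located: one manifest per page in the Atomic case, or a logarithmic-time lookup in a shared sorted file in the Block case.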

