Memento: Time Travel for the Web	
  


                                 Herbert Van de Sompel
                                   Robert Sanderson
                                   Michael L. Nelson

                                 http://mementoweb.org/



                                       Memento is funded by
                                      The Library of Congress

                                   	
  
Updated Technical Details (May 2011)
Memento wants to make it easy to navigate the Web of the Past

Technical Specification
https://datatracker.ietf.org/doc/draft-vandesompel-memento/
[Figure: Tate Online today; selecting the date March 16 2008; Tate Online as of March 16 2008, from the National Archives]


Memento achieves this by introducing a uniform version access capability to integrate the past and current Web




Problem Statement …




Resources




Resources have Representations




Resources have Representations that Change over Time




Only the Current Representation is Available from a Resource




Old Representations are Lost Forever




Archived/Version Resources Exist




Resource Versions on the Web



                   •  Content Management Systems

                   •  Web Archives

                   •  Transactional archives

                   •  Search engine caches

                   •    …




Archived Resources

•  Sep 11 2001, 20:36:10 UTC:
   http://web.archive.org/web/20010911203610/http://www.cnn.com/
   archived resource for http://cnn.com

•  Dec 20 2001, 4:51:00 UTC:
   http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333
   archived resource for http://en.wikipedia.org/wiki/September_11_attacks

Versions are Not Integrated with the Web



                        •  Cannot talk about a resource as it
                           used to exist

                        •  Cannot access a prior version
                           knowing the current one

                        •  Cannot access the current version
                           knowing a prior one




Memento Wants to Integrate the Past and Current Web




The Memento Framework:


Protocol to Integrate Past and Current Web


                    Overview




Memento Framework



                   •  Regards the Web as a big Content
                      Management System

                   •  Introduces a uniform capability to
                      access versions on the Web

                   •  Does not build new archives but
                      leverages existing systems that
                      host versions




Memento Framework



                   •  Is distributed: versions may exist on
                      several servers

                   •  Uses time as a global version
                      indicator

                   •  Is based on the primitives of the
                      Web: resource, resource state,
                      representation, content negotiation,
                      link




Original Resources and Mementos




Bridge from Present to Past




Bridge from Past to Present




Memento Framework




Memento Client Server Interaction




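Memento HTTP Flow
(R = Original Resource, G = TimeGate, M = Memento, T = TimeMap)

  Request:   HEAD R, Accept-Datetime
  Response:  Link to G

  Request:   GET G, Accept-Datetime
  Response:  302 to M, Vary, Link to M, R, T

  Request:   GET M, Accept-Datetime
  Response:  200, Memento-Datetime, Link to M, R, T, G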
The Memento Framework:


Protocol to Integrate Past and Current Web


            Interesting Cases




Multiple Archives




Original Resource Gone




Original Resource’s Server Gone




Original Resource Provides no Link




The Memento Framework:


Protocol to Integrate Past and Current Web


               HTTP Headers




HTTP Headers used in Memento


•  Defines two new headers:
    –  request: Accept-Datetime
    –  response: Memento-Datetime

•  Introduces new content for two existing headers:
    –  response: Vary, Link

•  Uses two existing headers without modification:
    –  response: Location, TCN




HTTP Request Headers: Accept-Datetime

•  Accept-Datetime

   o    Issued against TimeGate, (Original Resource), (Memento)

   o    Header value:
         -  Desired datetime of Memento (MANDATORY)
            Must be in RFC 1123 format and in GMT

         -  Interval indicator to express that the client is only interested in
            Mementos within the interval (OPTIONAL)
              –  Expressed as two ISO 8601 durations:
                 "-P3DT5H;+P2DT6H"

  Accept-Datetime: Mon, 12 Oct 2009 14:20:33 GMT
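
As a rough illustration of the header value described above, the sketch below formats a datetime as an RFC 1123 date in GMT and turns the interval indicator into a pair of bounds. The helper names (accept_datetime_value, parse_duration, interval_bounds) are hypothetical, and the duration parser is an assumption that covers only days, hours, minutes and seconds, not the full ISO 8601 grammar.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def accept_datetime_value(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 1123 date in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


# Assumption: only the D, H, M and S components of a duration are supported.
_DURATION = re.compile(
    r"^([+-])P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a signed duration such as '-P3DT5H' into a timedelta."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    sign, days, hours, minutes, seconds = match.groups()
    delta = timedelta(
        days=int(days or 0),
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=int(seconds or 0),
    )
    return -delta if sign == "-" else delta


def interval_bounds(moment: datetime, interval: str):
    """Return (earliest, latest) datetimes for an interval like '-P3DT5H;+P2DT6H'."""
    before, after = (parse_duration(part) for part in interval.split(";"))
    return moment + before, moment + after


desired = datetime(2009, 10, 12, 14, 20, 33, tzinfo=timezone.utc)
header_value = accept_datetime_value(desired)
earliest, latest = interval_bounds(desired, "-P3DT5H;+P2DT6H")
print("Accept-Datetime:", header_value)
print("Interval:", earliest.isoformat(), "to", latest.isoformat())
```

How the interval indicator is attached to the header value on the wire is not shown here; the sketch only interprets the two durations relative to the desired datetime.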

HTTP Response Headers: Memento-Datetime
•  Memento-Datetime
    o  Returned by Mementos
        -  Always, even when the Memento is not reached via a TimeGate
    o  Header value: archival datetime of the Memento
        -  The resource has not changed and will not change after that datetime
    o  This header is sticky:
        -  Once returned, it must always be returned with the same value
        -  It must also be preserved when Mementos are mirrored at
           different URIs
•  This header is crucial to allow a client to understand it has arrived
   at a Memento
       See: http://www.mementoweb.org/guide/resourcetype/

  Memento-Datetime: Mon, 12 Oct 2009 14:20:33 GMT

HTTP Response Headers: Vary

•  Vary
    o  Returned by TimeGate
    o  Similar to regular content negotiation
    o  Header value:
        -  negotiate, accept-datetime

•  TimeGate must first meet the datetime preference, and then, if
   possible, other content negotiation preferences

•  Note: the accept-datetime value in the Vary header is crucial to
   allow a client to understand it has arrived at a TimeGate.
      See: http://www.mementoweb.org/guide/resourcetype/

           Vary: negotiate, accept-datetime
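
The two header slides say that a client recognizes a Memento by its Memento-Datetime header and a TimeGate by accept-datetime in Vary. A minimal sketch of that check follows; the function name classify_resource and the plain-dict representation of response headers are assumptions made for illustration.

```python
def classify_resource(headers: dict) -> str:
    """Guess the Memento resource type from a response's headers.

    `headers` is assumed to be a plain mapping of header name to value.
    """
    normalized = {name.lower(): value for name, value in headers.items()}

    # A Memento always returns Memento-Datetime.
    if "memento-datetime" in normalized:
        return "memento"

    # A TimeGate lists accept-datetime among the Vary dimensions.
    vary_tokens = [
        token.strip().lower() for token in normalized.get("vary", "").split(",")
    ]
    if "accept-datetime" in vary_tokens:
        return "timegate"

    return "other"


memento_kind = classify_resource(
    {"Memento-Datetime": "Mon, 12 Oct 2009 14:20:33 GMT"}
)
timegate_kind = classify_resource({"Vary": "negotiate, accept-datetime"})
plain_kind = classify_resource({"Vary": "Accept-Encoding"})
print(memento_kind, timegate_kind, plain_kind)
```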

HTTP Response Headers: Location

 •  Location


    o    Returned by TimeGate
    o    Similar to regular content negotiation
    o    Header value: URI of selected Memento




Location: http://web.archive.org/web/20010911223004/http://cnn.com

HTTP Response Headers: Link

    •  Link
        o  Returned by Original Resource, TimeGate and Mementos
        o  Various new Relation Types are introduced:
            -  “original”
            -  “timegate”
            -  “memento”
            -  “timemap”
        o  HTTP Link Header: RFC 5988
           See: https://datatracker.ietf.org/doc/rfc5988/

Link: <http://web.archive.org/web/20010911223004/http://cnn.com>;rel="memento";datetime="Tue, 11 Sep 2001 22:30:04 GMT"
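
To make the Link header syntax concrete, here is a small parsing sketch. It is a simplified, assumed parser (the name parse_link_value is hypothetical) that handles quoted and unquoted attribute values, including the commas inside a quoted datetime, but it does not implement every corner of RFC 5988.

```python
import re
from email.utils import parsedate_to_datetime

# One link: <URI> followed by zero or more ;name=value attributes.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_value(value: str):
    """Return a list of (uri, attributes) pairs from a Link header value."""
    links = []
    for uri, raw_params in _LINK.findall(value):
        attributes = {}
        for name, quoted, bare in _PARAM.findall(raw_params):
            attributes[name.lower()] = quoted or bare
        links.append((uri, attributes))
    return links


link_value = (
    '<http://cnn.com/>;rel="original", '
    '<http://web.archive.org/web/20010911223004/http://cnn.com>'
    ';rel="memento";datetime="Tue, 11 Sep 2001 22:30:04 GMT"'
)
links = parse_link_value(link_value)
for uri, attributes in links:
    print(attributes["rel"], uri)

# The datetime attribute uses the same RFC 1123 format as Memento-Datetime.
memento_moment = parsedate_to_datetime(links[1][1]["datetime"])
```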

Memento HTTP Flow

  Request:   HEAD R, Accept-Datetime
  Response:  Link to G (timegate)

  Request:   GET G, Accept-Datetime
  Response:  302 to M, Vary, Link to M, R, T (memento, original, timemap)

  Request:   GET M, Accept-Datetime
  Response:  200, Memento-Datetime, Link to M, R, T, G (memento, original, timemap, timegate)
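
The three request/response pairs of this flow can be sketched from the client side as follows. This is an assumption-laden outline, not a reference client: the transport is injected as a hypothetical fetch(method, uri, headers) callable returning (status, headers), header names are matched exactly as written, and the canned responses exist only so the sketch runs without a network.

```python
import re


def link_target(link_value: str, relation: str):
    """Return the first URI in a Link header value that carries `relation`."""
    for uri, params in re.findall(r'<([^>]*)>([^<]*)', link_value):
        rel = re.search(r'rel="([^"]*)"', params)
        if rel and relation in rel.group(1).split():
            return uri
    return None


def find_memento(original_uri, accept_datetime, fetch):
    """Follow Original Resource -> TimeGate -> Memento."""
    request_headers = {"Accept-Datetime": accept_datetime}

    # HEAD R: look for the "timegate" link.
    _status, headers = fetch("HEAD", original_uri, request_headers)
    timegate = link_target(headers.get("Link", ""), "timegate")
    if timegate is None:
        raise LookupError("no timegate link; fall back to TimeGate discovery")

    # GET G: the TimeGate redirects (302) to the selected Memento.
    status, headers = fetch("GET", timegate, request_headers)
    if status != 302 or "Location" not in headers:
        raise LookupError("TimeGate did not redirect to a Memento")
    memento_uri = headers["Location"]

    # GET M: a Memento identifies itself with Memento-Datetime.
    status, headers = fetch("GET", memento_uri, request_headers)
    if status != 200 or "Memento-Datetime" not in headers:
        raise LookupError("response is not a Memento")
    return memento_uri, headers["Memento-Datetime"]


# Canned responses, for illustration only.
MEMENTO = "http://web.archive.org/web/20010911223004/http://cnn.com"
TIMEGATE = "http://web.archive.org/timegate/cnn.com/"
CANNED = {
    ("HEAD", "http://cnn.com/"): (200, {"Link": f'<{TIMEGATE}>;rel="timegate"'}),
    ("GET", TIMEGATE): (302, {"Location": MEMENTO,
                              "Vary": "negotiate, accept-datetime"}),
    ("GET", MEMENTO): (200, {"Memento-Datetime": "Tue, 11 Sep 2001 22:30:04 GMT"}),
}


def fake_fetch(method, uri, headers):
    return CANNED[(method, uri)]


result = find_memento("http://cnn.com/", "Tue, 11 Sep 2001 22:30:04 GMT", fake_fetch)
print(result)
```

Injecting fetch keeps the protocol logic separate from the HTTP library, so the same function could be wired to a real client later.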
HTTP Response Headers: Link

•  Link
    o  Returned by Original Resource, TimeGate and Mementos

    o  Various new Relation Types are introduced




The Memento Framework:


Protocol to Integrate Past and Current Web


            HTTP Interactions




Memento HTTP Flow: Step 1
Memento HTTP Flow: Step 2
Memento HTTP Flow: Step 3
Memento HTTP Flow: Step 4
Memento HTTP Flow: Step 5
Memento HTTP Flow: Step 6
The Memento Framework:


Protocol to Integrate Past and Current Web


       HTTP Link Header Details




HTTP Response Headers: Link




Memento HTTP Flow

  Request:   HEAD R, Accept-Datetime
  Response:  Link to G (timegate)

  Request:   GET G, Accept-Datetime
  Response:  302 to M, Vary, Link to M, R, T

  Request:   GET M, Accept-Datetime
  Response:  200, Memento-Datetime, Link to M, R, T, G
HTTP Response Headers: Link


•  RECOMMENDED “timegate” Link from Original Resource to
   TimeGate

•  If this Link is not available, the client must try to find a TimeGate
   itself, via:
          -  Memento Discovery approaches (see later)
          -  User interaction (e.g. Preferences in an application)




HTTP Response Headers: Link




Memento HTTP Flow

  Request:   HEAD R, Accept-Datetime
  Response:  Link to G

  Request:   GET G, Accept-Datetime
  Response:  302 to M, Vary, Link to M (memento)

  Request:   GET M, Accept-Datetime
  Response:  200, Memento-Datetime, Link to M (memento)
HTTP Response Headers: Link

•  MANDATORY “memento” Links from TimeGate and Memento to
   Mementos

•  “memento” Links point at the following Mementos known to the
   responding server:
    o  Selected Memento (MANDATORY if a Memento is selected)
    o  First Memento, Last Memento (MANDATORY)
    o  Memento prev to selected one, Memento next to selected one
       (RECOMMENDED)
    o  Other Mementos (OPTIONAL, and only if prev and next are
       provided)
    o  Temporal order of Mementos is expressed using existing Relation
       Types (RFC 5829, RFC 5988): first, last, next, prev,
       successor-version, predecessor-version
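
A server that knows its Mementos in temporal order could work out which ones to link to roughly as sketched below. The selection rule (latest Memento at or before the desired datetime, falling back to the first) is an assumption, as are the function name memento_links and the placeholder URIs; how the labels are finally written into rel values is left to the specification.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def memento_links(mementos, desired):
    """Pick the selected, first, last, prev and next Mementos.

    `mementos` is a non-empty list of (datetime, uri) sorted by datetime.
    Returns a list of (uri, labels) in temporal order.
    """
    datetimes = [moment for moment, _uri in mementos]
    # Assumed rule: latest Memento at or before `desired`, else the first one.
    selected = max(bisect_right(datetimes, desired) - 1, 0)

    labels = {}

    def tag(position, label):
        if 0 <= position < len(mementos):
            labels.setdefault(position, []).append(label)

    tag(0, "first")
    tag(len(mementos) - 1, "last")
    tag(selected - 1, "prev")
    tag(selected + 1, "next")
    tag(selected, "selected")
    return [(mementos[position][1], labels[position]) for position in sorted(labels)]


def day(number):
    return datetime(2001, 9, number, tzinfo=timezone.utc)


# Placeholder URIs, for illustration only.
holdings = [(day(10), "m1"), (day(11), "m2"), (day(12), "m3"),
            (day(13), "m4"), (day(14), "m5")]

middle = memento_links(holdings, datetime(2001, 9, 12, 12, 0, tzinfo=timezone.utc))
early = memento_links(holdings, datetime(2001, 9, 1, tzinfo=timezone.utc))
print(middle)
print(early)
```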

HTTP Response Headers: Link

•  MANDATORY “memento” Links from TimeGate and Memento to
   Mementos

•  Attributes for a “memento” Link:
    o  datetime (MANDATORY)
       datetime of the Memento pointed at by the link
    o  license (OPTIONAL)
       license associated with the Memento
    o  embargo (OPTIONAL)
       datetime until which the Memento will remain inaccessible
    o  type (RECOMMENDED)
       mime type of the Memento


HTTP Response Headers: Link




Memento HTTP Flow

  Request:   HEAD R, Accept-Datetime
  Response:  Link to M = R (memento, original)
HTTP Response Headers: Link

•  Both “memento” and “original” Link on a resource.

•  The resource is its own Memento, i.e. it is a stable resource.
    o  Resource that was born stable or became stable; it will not change
       anymore.
    o  For example, resources with PermaLink on news sites
    o  Note the difference with the Last-Modified header

•  Can still also provide a “timegate” Link
    o  For example, pointing at TimeGates for Mementos of the resource
       before it became stable




HTTP Response Headers: Link




Memento HTTP Flow

  Request:   GET M, (Accept-Datetime)
  Response:  200, Memento-Datetime, Link to M, R, T (memento, original, timemap)
HTTP Response Headers: Link


•  Mementos without a TimeGate, for example:
    o  Resources in snapshot archives
    o  Version resources in systems that have not yet implemented
       TimeGates

•  Should still use Memento-Datetime header
•  Should still use “original” Link
•  Can have “timemap” Link




HTTP Response Headers: Link




Memento HTTP Flow

  Request:   HEAD R, Accept-Datetime
  Response:  Link to G

  Request:   GET G, Accept-Datetime
  Response:  302 to M, Vary, Link to T (timemap)

  Request:   GET M, Accept-Datetime
  Response:  200, Memento-Datetime, Link to T (timemap)
HTTP Response Headers: Link
•  RECOMMENDED “timemap” Links from TimeGate and
   Mementos.

•  A TimeMap is introduced to allow retrieving an inventory of Mementos
   for an Original Resource that the responding server is aware of. It lists:
    o  URI of Original Resource (MANDATORY)
    o  URI and datetime of all known Mementos (MANDATORY)
    o  URI of TimeGate for Original Resource (RECOMMENDED)
    o  URI of TimeMap itself (RECOMMENDED)

•  Multiple TimeMap serializations possible; link-value format
   MANDATORY
    o  application/link-format:
       see https://datatracker.ietf.org/doc/draft-ietf-core-link-format/

Memento HTTP Flow: GET TimeMap




Memento HTTP Flow: TimeMap Response




The Memento Framework:


Discovery to Support Integration of Past and Current Web


                 TimeGate Discovery




Batch Discovery of TimeGates: robots.txt

•  robots.txt file is used by Web servers to convey crawling
   policies
•  Web crawlers (such as for archives) retrieve and parse it
•  De-facto standard, no official endorsement
•  Extended with new directives, including by Google

  User-agent: *               # Reject all crawlers
  Disallow: /
  # Google Sitemap extension
  Sitemap: http://some.example.com/me/sitemap.xml

  User-agent: NiceBot         # Select only NiceBot
  Crawl-delay: 10             # 10 seconds between requests
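
How a crawler might interpret this example can be checked with Python's standard urllib.robotparser module. This is only an illustration of one parser's reading of the file; the page URL and the "OtherBot" agent name are made up for the example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *               # Reject all crawlers
Disallow: /
Sitemap: http://some.example.com/me/sitemap.xml

User-agent: NiceBot         # Select only NiceBot
Crawl-delay: 10             # 10 seconds between requests
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "http://some.example.com/me/page.html"  # hypothetical page
other_allowed = parser.can_fetch("OtherBot", page)
nicebot_allowed = parser.can_fetch("NiceBot", page)
nicebot_delay = parser.crawl_delay("NiceBot")

print("OtherBot allowed:", other_allowed)
print("NiceBot allowed:", nicebot_allowed)
print("NiceBot crawl delay:", nicebot_delay)
```

With this parser, NiceBot's own group carries no Disallow rule, so it is allowed while every other agent falls under the catch-all group.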


Batch Discovery of TimeGates: robots.txt

•  Add TimeGate and Archived directives to support discovery
   of TimeGates known to the server
•  User agent should concatenate the desired URL with the TimeGate link
•  Archived value is a truncated host/path, or * to describe a general
   web archive




              http://mementoweb.org/guide/robotstxt/
The Memento Framework:


Discovery to Support Integration of Present and Past Web


 Discovery via TimeMaps: All Mementos for a given
      Original Resource known by an archive




TimeMap Overview

•  A TimeMap is an inventory of Mementos for an Original Resource
   that the responding server is aware of. It lists at least:
      •  URI of Original Resource
      •  URI and datetime of all known Mementos
      •  URI of TimeGate for Original Resource
      •  URI of TimeMap itself

•  Multiple TimeMap serializations possible:
     •  application/link-format mandatory
        see https://datatracker.ietf.org/doc/draft-ietf-core-link-format/
     •  RDF TimeMaps proposed




TimeMaps: Link Format Syntax

    •  Document in the format of the value of the Link HTTP Header

    •  Format:
            <URI>;rel="type";attr="val",
            <URI2>…

    •  rel is the relationship between the context URI and the URI in angle brackets
    •  The context URI for TimeMaps is the URI with rel="original"
    •  Other rel types link to Mementos, TimeGates etc.

<http://cnn.com/>;rel="original",
<http://web.archive.org/web/20010911223004/http://cnn.com>;rel="memento";datetime="Tue, 11 Sep 2001 22:30:04 GMT"




TimeMaps: Link Attributes

•  Existing Attributes for Links
     •  "rel"        The type of relationship
     •  "type"       The (mime) format of the linked resource
     •  "title"      Title of the linked resource
     •  "hreflang"   Language of the linked resource
     •  "media"      Intended media (e.g. screen)
     •  "anchor"     URI to override context URI for link

•  New rel types introduced:
     •  "original"   The Original Resource
     •  "memento"    A Memento of the Original
     •  "timegate"   A TimeGate for the Original
     •  "timemap"    A TimeMap of Mementos of the Original




TimeMaps: Link Attributes

      •  New Attributes for Mementos:
           •  datetime   The Memento-Datetime
           •  license    License associated with Memento
           •  embargo    Time after which Memento is available
           •  Are others necessary?

<http://cnn.com/>;rel="original",
<http://web.archive.org/timegate/cnn.com/>;rel="timegate",
<http://web.archive.org/timemap/link/cnn.com/>;rel="timemap",
<http://web.archive.org/web/20010911223004/http://cnn.com>;rel="memento";datetime="Tue, 11 Sep 2001 22:30:04 GMT",
<http://web.archive.org/web/…/cnn.com/>;rel="memento";license="http://archive.org/license/1";datetime="…";embargo="Wed, 20 Jul 2011 00:00:00 GMT",
…
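
A TimeMap in this link format could be produced by a small serializer along the following lines. The function name serialize_timemap and its argument layout are assumptions for illustration; the sketch only enforces that each Memento carries the mandatory datetime attribute and does not escape special characters in values.

```python
def serialize_timemap(original, timegate, timemap, mementos):
    """Serialize a TimeMap as a link-format document.

    `mementos` is a list of (uri, attributes) pairs; each attributes dict
    must contain "datetime" and may contain "license", "embargo", "type".
    """
    entries = [
        f'<{original}>;rel="original"',
        f'<{timegate}>;rel="timegate"',
        f'<{timemap}>;rel="timemap"',
    ]
    for uri, attributes in mementos:
        if "datetime" not in attributes:
            raise ValueError(f"memento {uri} lacks the mandatory datetime")
        parts = [f"<{uri}>", 'rel="memento"']
        parts += [f'{name}="{value}"' for name, value in attributes.items()]
        entries.append(";".join(parts))
    # Links are separated by commas; one link per line for readability.
    return ",\n".join(entries)


document = serialize_timemap(
    "http://cnn.com/",
    "http://web.archive.org/timegate/cnn.com/",
    "http://web.archive.org/timemap/link/cnn.com/",
    [
        (
            "http://web.archive.org/web/20010911223004/http://cnn.com",
            {"datetime": "Tue, 11 Sep 2001 22:30:04 GMT"},
        ),
    ],
)
print(document)
```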

The Memento Framework:


Discovery to Support Integration of Past and Current Web


  Memento Discovery: All Mementos known by an
                    archive




Batch discovery of Mementos: robots.txt

•  robots.txt file is used by Web servers to convey crawling
   policies
•  Support discovery of Mementos through robots.txt via existing
   User-agent and Allow directives
•  Use value memento for User-agent to convey that the value
   for the Memento-Datetime header must remain sticky when
   crawling/mirroring Mementos




Batch discovery of Mementos: Memento Feeds

•  Concept:
    •  Archives publish feeds in which each entry provides details
       about a specific Memento, e.g. Memento-Datetime, Original
       Resource, etc.
    •  As new Mementos become available, new feeds with new
       entries are published
    •  Once published, feeds remain static

•  Technology:
     •  To be decided in collaboration with IIPC
     •  Inspired by the approach and functionality of CDX files (see
        http://www.archive.org/web/researcher/cdx_legend.php) but:
           •  With Memento-specific extensions;
           •  Possibly using a different serialization;
           •  Including mechanisms to discover these feeds.


The Memento Framework:




              Tools




Memento Client Support

                    •  Several client tools developed by
                       us and others

                    •  Add-ons for Firefox (operational)
                       and Internet Explorer
                       (experimental)

                    •  Applications for Android
                       (operational) and iPhone/iPad (in
                       development)

                    •  Paper in Code4Lib Journal
                        http://journal.code4lib.org/articles/4979




Memento Server Support




                              •  Plug-in for MediaWiki (operational)

                              •  Used on W3C’s main wiki

                              •  Please install it for your MediaWiki!




http://www.mediawiki.org/wiki/Extension:Memento
Memento Server Support

                        •  Memento-compliant Wayback
                           software:

                             •  In production at the Internet
                                Archive

                             •  Available to Web archives,
                                worldwide

                             •  Please have your favorite Web
                                Archive experiment with the
                                new 1.6 version!




http://mementoweb.org/tools/wayback/
Memento Server Validator


                         •  Server-side client:

                              •  Attempts to perform all
                                 Memento actions against a
                                 given URI

                              •  Reports success/failure of the
                                 interactions and warnings for
                                 optional aspects

                              •  Kept up to date with IETF
                                 Internet Draft


http://mementoweb.org/tools/validator/
Memento Proxy Support


                     •  Several systems that host
                        Mementos made Memento-
                        compliant “by proxy”

                          •  Many major Web Archives that
                             do not yet run Memento-
                             compliant software
                          •  3,000+ MediaWiki systems,
                             including Wikipedia, Wikia

                     •  We would love all of these to
                        become natively Memento
                        compliant!



Memento Aggregator TimeGate



                      •  Aggregates all known TimeGates
                          •  Proxies
                          •  Native Implementations

                      •  Redirects to authoritative
                         TimeGates (Wikipedia,
                         Transactional Archives)

                      •  Currently implemented with
                         BerkeleyDB
                      •  Future version to use
                         Facebook's Cassandra platform
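
The aggregation idea can be sketched as merging the holdings of several archives and answering a datetime request from the combined list. Everything here is an assumption made for illustration: the in-memory lists stand in for the BerkeleyDB store, the URIs are placeholders, and "closest in time" is one plausible selection rule rather than the aggregator's documented behavior.

```python
import heapq
from datetime import datetime, timezone


def merge_holdings(*holdings):
    """Merge per-archive lists of (datetime, uri), each sorted by datetime."""
    return list(heapq.merge(*holdings))


def closest_memento(merged, desired):
    """Assumed selection rule: the Memento nearest in time to `desired`."""
    return min(merged, key=lambda entry: abs(entry[0] - desired))


def utc(year, month, day_of_month):
    return datetime(year, month, day_of_month, tzinfo=timezone.utc)


# Placeholder holdings for two hypothetical archives.
archive_a = [(utc(2001, 9, 1), "http://archive-a.example/1"),
             (utc(2001, 9, 20), "http://archive-a.example/2")]
archive_b = [(utc(2001, 9, 11), "http://archive-b.example/1")]

merged = merge_holdings(archive_a, archive_b)
best = closest_memento(merged, utc(2001, 9, 12))
print(best)
```

Queried alone, archive A would return a Memento eight or eleven days away from the desired date; the merged view finds the one a day away, which is the effect the next slide illustrates.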



Aggregators Find More Mementos!




•  1000 URIs sampled from delicious.com
•  1 dot = 1 Memento (x=Memento-Datetime, y=URI of Original Resource)
•  Sorted by URI longevity
But Still Too Few Mementos To Be Found…




•  1000 URIs sampled from search engine result pages
•  See: “How Much of the Web is Archived?” JCDL 2011
Crawl-Based Web Archives




                Observations
For example: Heritrix crawler for Internet Archive


Crawl-Based Web Archives

•  Collect discrete observations of resources, not their entire
   evolution.

•  Can be rejected (robots.txt, by user-agent, by host IP).

•  Can be deceived (cloaking, by geo-location, by user-agent).

•  Coverage of a particular Web server depends on the crawl strategy.




Server-Side Transactional Web Archives




                   Change History
For example: TTApache, PageVault, Vignette Web Capture


Server-Side Transactional Web Archives

•  Collect all representations served by the to-be-archived server.

•  The to-be-archived server needs to cooperate.
     •  Incentives, e.g. institutional memory, official record of Web
        presence.

•  Archival coverage is restricted by the to-be-archived server and does not
   include external servers (e.g. embedded resources).

•  The to-be-archived server can submit falsified information.

•  Archival collection management: what to keep, what not (e.g.
   significant changes, deduplication, …).




Development of Transactional Web Archive Software
Capture:
   •  Apache connection filter module captures URI, headers, body
   •  POSTs in real-time to transactional archive




Access:
   •  Online, real time access via Memento TimeGates
   •  Batch Export via WARC files for long term preservation


Submit:
   •  Java-Grizzly-Jersey submission interface application
   •  Berkeley DB metadata store
   •  FS store for body and headers
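
The capture and submit steps can be sketched roughly as below. This is not the LANL software: the class name, method names, and file layout are hypothetical, a plain dict stands in for the Berkeley DB metadata store, and a directory of files stands in for the FS store.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class TransactionalArchiveSketch:
    """Toy archive: a dict holds metadata, plain files hold headers and bodies."""

    def __init__(self, root: Path):
        self.root = root
        self.metadata = {}  # uri -> list of (served_at, record_id), in submission order
        self.next_id = 0

    def submit(self, uri: str, headers: dict, body: bytes, served_at: datetime) -> str:
        """Store one served representation and return its record id."""
        record_id = f"{self.next_id:08d}"
        self.next_id += 1
        (self.root / f"{record_id}.headers.json").write_text(json.dumps(headers))
        (self.root / f"{record_id}.body").write_bytes(body)
        self.metadata.setdefault(uri, []).append((served_at, record_id))
        return record_id

    def body_of(self, record_id: str) -> bytes:
        """Read a stored body back from the file store."""
        return (self.root / f"{record_id}.body").read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = TransactionalArchiveSketch(Path(tmp))
        served = datetime(2011, 5, 1, 12, 0, tzinfo=timezone.utc)
        rid = archive.submit("http://example.org/foo.html",
                             {"Content-Type": "text/html"}, b"<html>v1</html>", served)
        print(rid, archive.body_of(rid))
```

In the real system the capture side would be the Apache filter POSTing each served response to the submission interface; here `submit` is simply called directly.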

Development of Transactional Web Archive Software
Access:
   •  Transactional archive natively supports Memento
   •  Immediate availability of archived content
   •  Export as WARC, e.g. for long-term archiving in another environment
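
A sketch of how a TimeGate over a transactional archive might pick a Memento for a requested datetime. The selection policy (latest version at or before the requested datetime, falling back to the first version) is an assumption made for illustration, and `select_memento` is a hypothetical helper, not part of the archive software.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def select_memento(memento_datetimes, accept_datetime):
    """Pick from a sorted list the latest datetime <= accept_datetime.

    Falls back to the earliest Memento when the request predates all of them.
    """
    if not memento_datetimes:
        raise ValueError("no Mementos known for this resource")
    i = bisect_right(memento_datetimes, accept_datetime)
    return memento_datetimes[0] if i == 0 else memento_datetimes[i - 1]


if __name__ == "__main__":
    versions = [datetime(2011, m, 1, tzinfo=timezone.utc) for m in (1, 2, 3)]
    print(select_memento(versions, datetime(2011, 2, 15, tzinfo=timezone.utc)))
```

Because a transactional archive records every change, the version chosen this way is the one that was actually being served at the requested time.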




Development Timeline:
   •  Ongoing development (LANL) and testing (ODU)
   •  Submit/Access finalized; coding focus on collection management
   •  Expected release as open source, 3rd Quarter 2011

The Memento Framework:




 Resource Versioning




Memento Framework

Original Resource: http://lanlsource.lanl.gov/pics/picoftheday.png




Time Travel across Versions of a Picture of the Day




Movie at: http://www.mementoweb.org/demo/picoftheday.mov

Memento Framework

Original Resource: http://dbpedia.org/resource/France




Time-Series Analysis across DBpedia Versions




      Data collected through HTTP Navigation

      paper at http://arxiv.org/abs/1003.3661

The Memento Framework:




 About Memento-Datetime:

Archive Navigation Coherence




Resource History Recorded by CMS and Transactional Archives




Navigate foo.html @ t4




Navigation Coherence for foo.html @ t4




Resource Observations Recorded by Crawler-Based Archives




Missed Observations




Navigate foo.html @ t4




Navigation Incoherence foo.html @ t4




Increase Coherence with Observations from Multiple Archives?




The Memento Framework:




          About Memento-Datetime:

Relation to Creation Datetime and Last-Modified




Three Notions of Time

•  Creation: Datetime when the resource first came into being
•  Last-Modified: Datetime when the resource was last changed
•  Memento-Datetime: Datetime at which the resource was “frozen”, e.g.
   as a result of:
    o  Archiving it at a different URI (e.g. in a CMS, Web Archive,
       Transactional Archive, Snapshot Archive);
    o  Deciding never to change it again and keeping it at its
       original URI.
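
As a small illustration, the Memento-Datetime and Last-Modified values of a response can be parsed and compared with the standard library. The function name and the Last-Modified value used in the example are hypothetical; the Memento-Datetime value follows the RFC 1123 style used elsewhere in this deck.

```python
from email.utils import parsedate_to_datetime


def frozen_before_last_change(headers: dict) -> bool:
    """Return True if Memento-Datetime is strictly earlier than Last-Modified."""
    memento_dt = parsedate_to_datetime(headers["Memento-Datetime"])
    last_modified = parsedate_to_datetime(headers["Last-Modified"])
    return memento_dt < last_modified


if __name__ == "__main__":
    example = {
        "Memento-Datetime": "Mon, 12 Oct 2009 14:20:33 GMT",
        "Last-Modified": "Tue, 13 Oct 2009 09:00:00 GMT",  # hypothetical value
    }
    print(frozen_before_last_change(example))
```

A result of True corresponds to the case where something about the Memento (for instance an added archive banner) changed after the archival datetime.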




Creation = Memento-Datetime = Last-Modified


(Timeline labels: Cr = Creation, MD = Memento-Datetime, LM = Last-Modified)



At a particular point in time,
the Original Resource is
observed, and the associated
Memento is created. All time
values for the Memento are
the same.



Creation = Memento-Datetime < Last-Modified


(Timeline labels: Cr, MD, LM)


The HTML archive banner
added to a Memento
necessitates a change in
Last-Modified of the
Memento, but Creation date
and Memento-Datetime
remain unchanged.



Creation < Memento-Datetime <= Last-Modified


(Timeline labels: Cr, MD, LM)



It is possible that the Original
Resource was a placeholder
resource and returned a 200
response before it started to
identify a Memento
(URI-R = URI-M).



Memento-Datetime < Creation <= Last-Modified


(Timeline labels: Cr, MD, LM)

If a Memento is copied to a
new archive, the copied
Memento has a Creation and
Last-Modified equal to the
time of copying. The
Memento-Datetime is “sticky”
and is the same for the
Memento and its copy.
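
The copy case can be sketched as follows. The `MementoRecord` type and `copy_to_archive` function are hypothetical names for illustration; the only point shown is that copying resets Creation and Last-Modified while Memento-Datetime is carried over unchanged.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class MementoRecord:
    uri: str
    creation: datetime
    memento_datetime: datetime
    last_modified: datetime


def copy_to_archive(memento: MementoRecord, new_uri: str, copied_at: datetime) -> MementoRecord:
    """Copy a Memento to a new URI; Memento-Datetime stays sticky."""
    return replace(memento, uri=new_uri, creation=copied_at, last_modified=copied_at)


if __name__ == "__main__":
    observed = datetime(2009, 10, 12, 14, 20, 33, tzinfo=timezone.utc)
    original = MementoRecord("http://archive-a.example/m1", observed, observed, observed)
    copy = copy_to_archive(original, "http://archive-b.example/m1",
                           datetime(2011, 5, 1, tzinfo=timezone.utc))
    print(copy.memento_datetime < copy.creation <= copy.last_modified)
```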



The Memento Framework:




Persistent Web Annotations




Web-Centric Annotation: No Persistence




Google Sidewiki Annotation on http://news.bbc.co.uk/ as of 2010-06-14

Web-Centric Annotation: No Persistence




                       Archived page from:
http://www.dracos.co.uk/work/bbc-news-archive/2010/03/08/07.05.html

Web-Centric Annotation: Desired Persistence




Open Annotation: Dealing with Web Time

•  As regular Web resources, Body and Target of an Annotation have
representations that can change over time.

•  Body and Target can change independently of each other.

•  If an Annotation involves resources as they existed at a particular point
in time, this needs to be recorded.

•  The OAC model provides hooks for doing so:
     •  Timeless Annotations;
     •  Uniform Time Annotations;
     •  Varied Time Annotations.




Open Annotation: Uniform Time Annotations
•  The Annotation is not always applicable, but pertains to the state of the
Body and Target at a specific moment in time.


•  Add oac:when property to the Annotation.




Memento + Open Annotation: Persistent Annotations
•  To reconstruct the Annotation as intended, use Memento to obtain
archived representations of the Body (B) and Target (T) as they existed
at the oac:when datetime.
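
A minimal sketch of turning an oac:when datetime into a Memento request. It only builds the request and does not send it. The function name and the TimeGate URI in the example are hypothetical, and how a client discovers the TimeGate for B or T is not covered here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def memento_request(timegate_uri: str, when: datetime) -> Request:
    """Build a TimeGate request for the state at `when` (must be timezone-aware)."""
    # Accept-Datetime is expressed in RFC 1123 format and in GMT.
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_uri, headers={"Accept-Datetime": accept_datetime})


if __name__ == "__main__":
    oac_when = datetime(2009, 10, 12, 14, 20, 33, tzinfo=timezone.utc)
    req = memento_request("http://timegate.example/http://news.bbc.co.uk/", oac_when)
    print(req.header_items())
```

The same call would be made once for the Body and once for the Target, each with the Annotation's oac:when value.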




Create an Annotation




Reconstruct the Annotation without Memento




Reconstruct the Annotation with Memento




The Memento Framework:




The Increasing Value of a URI




URI as Access Point to a Page




URI as Access Point to Page and Data




URI as Access Point to Current and Past Pages and Data




References

•  Van de Sompel, H., Nelson, M.L., Sanderson, R.,
   Balakireva, L., Ainsworth, S., Shankar, H. (2009)
   Memento: Time Travel for the Web.
   http://arxiv.org/abs/0911.1112
•  Van de Sompel, H., Sanderson, R., Nelson, M.L.,
   Balakireva, L., Ainsworth, S., Shankar, H. (2010) An
   HTTP-Based Versioning Mechanism for Linked Data.
   Proceedings of the 3rd Workshop on Linked Data on the
   Web. http://arxiv.org/abs/1003.3661
•  Sanderson, R., Van de Sompel, H. (2010) Making Web
   Annotations Persistent over Time. Proceedings of the
   10th ACM/IEEE-CS Joint Conference on Digital Libraries.
   http://arxiv.org/abs/1003.2643
•  Sanderson, R., Van de Sompel, H. (2011) Open Annotation
   Alpha3 Data Model Guide.
   http://www.openannotation.org/spec/alpha3/


Memento wants to make navigating the Web’s Past Easy




                    http://mementoweb.org/
        http://groups.google.com/group/memento-dev

More Related Content

Memento: Updated technical details (May 2011)

  • 1. Memento: Time Travel for the Web   Herbert Van de Sompel Robert Sanderson Michael L. Nelson http://mementoweb.org/ Memento is funded by The Library of Congress   Updated Technical Details (May 2011) Memento: Time Travel for the Web Updated Technical Details (05/2011)
  • 2. Memento wants to make it Easy to navigate the Web of the Past Technical Specification https://datatracker.ietf.org/doc/draft-vandesompel-memento/ Memento: Time Travel for the Web 2 Updated Technical Details (05/2011)
  • 3. Tate Online Select Date Tate Online Today March 16 2008 March 16 2008 From National Archives Memento: Time Travel for the Web 3 Updated Technical Details (05/2011)
  • 4. Memento achieves this by introducing a uniform version access capability to integrate the past and current Web Memento: Time Travel for the Web 4 Updated Technical Details (05/2011)
  • 5. Problem Statement … Memento: Time Travel for the Web 5 Updated Technical Details (05/2011)
  • 6. Resources Memento: Time Travel for the Web 6 Updated Technical Details (05/2011)
  • 7. Resources have Representations Memento: Time Travel for the Web 7 Updated Technical Details (05/2011)
  • 8. Resources have Representations that Change over Time Memento: Time Travel for the Web 8 Updated Technical Details (05/2011)
  • 9. Only the Current Representation is Available from a Resource Memento: Time Travel for the Web 9 Updated Technical Details (05/2011)
  • 10. Old Representations are Lost Forever Memento: Time Travel for the Web 10 Updated Technical Details (05/2011)
  • 11. Archived/Version Resources Exist Memento: Time Travel for the Web 11 Updated Technical Details (05/2011)
  • 12. Resource Versions on the Web •  Content Management Systems •  Web Archives •  Transactional archives •  Search engine caches •  … Memento: Time Travel for the Web 12 Updated Technical Details (05/2011)
  • 13. Sep 11 2001, 20:36:10 UTC Dec 20 2001, 4:51:00 UTC Archived Resources http://en.wikipedia.org/w/index.php? http://web.archive.org/web/20010911203610/http:// title=September_11_attacks&oldid=282333 archived www.cnn.com/ archived resource for http://cnn.com resource for http://en.wikipedia.org/wiki/ September_11_attacks Memento: Time Travel for the Web Updated Technical Details (05/2011) 13
  • 14. Versions are Not Integrated with the Web •  Cannot talk about a resource as it used to exist •  Cannot access a prior version knowing the current one •  Cannot access the current version knowing a prior one Memento: Time Travel for the Web 14 Updated Technical Details (05/2011)
  • 15. Memento Wants to Integrate the Past and Current Web Memento: Time Travel for the Web 15 Updated Technical Details (05/2011)
  • 16. The Memento Framework: Protocol to Integrate Past and Current Web Overview Memento: Time Travel for the Web 16 Updated Technical Details (05/2011)
  • 17. Memento Framework •  Regards the Web as a big Content Management System •  Introduces a uniform capability to access versions on the Web •  Does not build new archives but leverages existing systems that host versions Memento: Time Travel for the Web 17 Updated Technical Details (05/2011)
  • 18. Memento Framework •  Is distributed: versions may exist on several servers •  Uses time as a global version indicator •  Is based on the primitives of the Web: resource, resource state, representation, content negotiation, link Memento: Time Travel for the Web 18 Updated Technical Details (05/2011)
  • 19. Original Resources and Mementos Memento: Time Travel for the Web 19 Updated Technical Details (05/2011)
  • 20. Bridge from Present to Past Memento: Time Travel for the Web 20 Updated Technical Details (05/2011)
  • 21. Bridge from Past to Present Memento: Time Travel for the Web 21 Updated Technical Details (05/2011)
  • 22. Memento Framework Memento: Time Travel for the Web 22 Updated Technical Details (05/2011)
  • 23. Memento Client Server Interaction Memento: Time Travel for the Web 23 Updated Technical Details (05/2011)
  • 24. Memento HTTP Flow HEAD R, Accept-Datetime Link  G GET G, Accept-Datetime 302  M, Vary, Link  M,R,T GET M, Accept-Datetime 200, Memento-Datetime, Link  M,R,T,G 24
  • 25. The Memento Framework: Protocol to Integrate Past and Current Web Interesting Cases Memento: Time Travel for the Web 25 Updated Technical Details (05/2011)
  • 26. Multiple Archives Memento: Time Travel for the Web 26 Updated Technical Details (05/2011)
  • 27. Original Resource Gone Memento: Time Travel for the Web 27 Updated Technical Details (05/2011)
  • 28. Original Resource’s Server Gone Memento: Time Travel for the Web 28 Updated Technical Details (05/2011)
  • 29. Original Resource Provides no Link Memento: Time Travel for the Web 29 Updated Technical Details (05/2011)
  • 30. The Memento Framework: Protocol to Integrate Past and Current Web HTTP Headers Memento: Time Travel for the Web 30 Updated Technical Details (05/2011)
  • 31. HTTP Headers used in Memento •  Defines two new headers: –  request: Accept-Datetime! –  response: Memento-Datetime! •  Introduce new content for two existing headers: –  response: Vary ; Link! •  Use one existing headers without modification: –  response: Location, TCN:! Memento: Time Travel for the Web 31 Updated Technical Details (05/2011)
  • 32. HTTP Request Headers: Accept-Datetime •  Accept-Datetime
 o  Issued against TimeGate, (Original Resource), (Memento) o  Header value: -  Desired datetime of Memento (MANDATORY) Must be in RFC 1123 format and in GMT -  Interval indicator to express the client is only interested in Mementos within the interval (OPTIONAL) –  Expressed as two ISO8601 durations: "-P3DT5H;+P2DT6H"! Accept-Datetime: Mon, 12 Oct 2009 14:20:33 GMT! Memento: Time Travel for the Web 32 Updated Technical Details (05/2011)
  • 33. HTTP Response Headers: Memento-Datetime •  Memento-Datetime! o  Returned by Mementos -  Always. Even when not via a TimeGate o  Header value: Archival datetime of the Memento -  Resource has not and will not change beyond that date o  This header is sticky: -  Once returned, must always return it with same value -  Must also be preserved when Mementos are mirrored at different URIs •  This header is crucial to allow a client to understand it has arrived at a Memento See: http://www.mementoweb.org/guide/resourcetype/ Memento-Datetime: Mon, 12 Oct 2009 14:20:33 GMT! Memento: Time Travel for the Web 33 Updated Technical Details (05/2011)
  • 34. HTTP Response Headers: Vary •  Vary! o  Returned by TimeGate o  Similar to regular content negotiation o  Header value: -  negotiate, accept-datetime! •  TimeGate must first meet the datetime preference, and then – if possible – other content negotiation preferences •  Note: accept-datetime value in Vary header is crucial to allow a client to understand it has arrived at a TimeGate. See: http://www.mementoweb.org/guide/resourcetype/ Vary: negotiate, accept-datetime! Memento: Time Travel for the Web 34 Updated Technical Details (05/2011)
  • 35. HTTP Response Headers: Location •  Location
 o  Returned by TimeGate o  Similar to regular content negotiation o  Header value: URI of selected Memento Location: http://web.archive.org/web/20010911223004/ http://cnn.com! Memento: Time Travel for the Web 35 Updated Technical Details (05/2011)
  • 36. HTTP Response Headers: Link •  Link! o  Returned by Original Resource, TimeGate and Mementos o  Various new Relation Types are introduced: -  “original”! -  “timegate”! -  “memento”! -  “timemap”! o  HTTP Link Header: RFC 5988 See: https://datatracker.ietf.org/doc/rfc5988/ Link: <http://web.archive.org/web/20010911223004/http://! cnn.com>;rel="memento";datetime="Mon, 11 Sep 2001 22:30:04 GMT"! Memento: Time Travel for the Web 36 Updated Technical Details (05/2011)
  • 37. Memento HTTP Flow HEAD R, Accept-Datetime Link  G timegate GET G, Accept-Datetime 302  M, Vary, Link  M,R,T memento,original,timemap GET M, Accept-Datetime 200, Memento-Datetime, Link  M,R,T,G memento,original,timemap,timegate
  • 38. HTTP Response Headers: Link •  Link! o  Returned by Original Resource, TimeGate and Mementos o  Various new Relation Types are introduced Memento: Time Travel for the Web 38 Updated Technical Details (05/2011)
  • 39. The Memento Framework: Protocol to Integrate Past and Current Web HTTP Interactions Memento: Time Travel for the Web 39 Updated Technical Details (05/2011)
  • 46. The Memento Framework: Protocol to Integrate Past and Current Web HTTP Link Header Details Memento: Time Travel for the Web 46 Updated Technical Details (05/2011)
  • 47. HTTP Response Headers: Link Memento: Time Travel for the Web 47 Updated Technical Details (05/2011)
  • 48. Memento HTTP Flow HEAD R, Accept-Datetime Link  G timegate GET G, Accept-Datetime 302  M, Vary, Link  M,R,T GET M, Accept-Datetime 200, Memento-Datetime, Link  M,R,T,G
  • 49. HTTP Response Headers: Link •  RECOMMENDED "timegate” Link from Original Resource to TimeGate •  If this Link is not available, the client must try and find a TimeGate itself, via: -  Memento Discovery approaches (see later) -  User interaction (e.g. Preferences in an application) Memento: Time Travel for the Web 49 Updated Technical Details (05/2011)
  • 50. HTTP Response Headers: Link Memento: Time Travel for the Web 50 Updated Technical Details (05/2011)
  • 51. Memento HTTP Flow HEAD R, Accept-Datetime Link  G GET G, Accept-Datetime 302  M, Vary, Link  M memento GET M, Accept-Datetime 200, Memento-Datetime, Link  M memento
  • 52. HTTP Response Headers: Link •  MANDATORY ”memento” Links from TimeGate and Memento to Mementos •  ”memento” Links point at the following Mementos know to the responding server: o  Selected Memento (MANDATORY if a Memento is selected) o  First Memento, Last Memento (MANDATORY) o  Memento prev to selected one, Memento next to selected one (RECOMMENDED) o  Other Mementos (OPTIONAL, and only if prev and next are provided) o  Temporal order of Mementos is expressed using existing Relation Types (RFC 5829, RFC 5988): first, last, next, prev, successor-version, predecessor-version Memento: Time Travel for the Web 52 Updated Technical Details (05/2011)
  • 53. HTTP Response Headers: Link •  MANDATORY ”memento” Links from TimeGate and Memento to Mementos •  Attributes for a ”memento” Link: o  datetime (MANDATORY) datetime of the Memento pointed at by the link o  license (OPTIONAL) license associated with the Memento o  embargo (OPTIONAL) datetime until which the Memento will remain inaccessible o  type (RECOMMENDED) mime type of the Memento. Memento: Time Travel for the Web 53 Updated Technical Details (05/2011)
  • 54. HTTP Response Headers: Link Memento: Time Travel for the Web 54 Updated Technical Details (05/2011)
  • 55. Memento HTTP Flow HEAD R, Accept-Datetime Link  M=R memento,original
  • 56. HTTP Response Headers: Link •  Both ”memento” and ”original” Link on a resource. •  The resource is its own Memento, i.e. it is a stable resource. o  Resource that was born stable or became stable; it will not change anymore. o  For example resources with PermaLink on news sites o  Note the difference with Last-Modified header •  Can still also provide a ”timegate” Link! o  For example pointing at TimeGates for Mementos of the resource before it became stable Memento: Time Travel for the Web 56 Updated Technical Details (05/2011)
  • 57. HTTP Response Headers: Link Memento: Time Travel for the Web 57 Updated Technical Details (05/2011)
  • 58. Memento HTTP Flow GET M, (Accept-Datetime) 200, Memento-Datetime, Link  M,R,T memento,original,timemap
  • 59. HTTP Response Headers: Link •  Mementos without a TimeGate, for example: o  Resources in snapshot archives o  Version resources in systems that have not yet implemented TimeGates •  Should still use Memento-Datetime header •  Should still use “original” Link •  Can have “timemap” Link Memento: Time Travel for the Web 59 Updated Technical Details (05/2011)
  • 60. HTTP Response Headers: Link Memento: Time Travel for the Web 60 Updated Technical Details (05/2011)
  • 61. Memento HTTP Flow HEAD R, Accept-Datetime Link  G GET G, Accept-Datetime 302  M, Vary, Link  T timemap GET M, Accept-Datetime 200, Memento-Datetime, Link  T timemap
  • 62. HTTP Response Headers: Link •  RECOMMENDED ”timemap” Links from TimeGate and Mementos. •  A TimeMap is introduced to allow retrieving an inventory of Mementos for an Original Resource that the responding server is aware of. It lists: o  URI of Original Resource (MANDATORY) o  URI and datetime of all known Mementos (MANDATORY) o  URI of TimeGate for Original Resource (RECOMMENDED) o  URI of TimeMap itself (RECOMMENDED) •  Multiple TimeMap serializations possible; link-value format MANDATORY o  application/link-format: see https://datatracker.ietf.org/doc/draft-ietf-core-link-format/ Memento: Time Travel for the Web 62 Updated Technical Details (05/2011)
  • 63. Memento HTTP Flow: GET TimeMap Memento: Time Travel for the Web 63 Updated Technical Details (05/2011)
  • 64. Memento HTTP Flow: TimeMap Response Memento: Time Travel for the Web Updated Technical Details (05/2011)
  • 65. The Memento Framework: Discovery to Support Integration of Past and Current Web TimeGate Discovery Memento: Time Travel for the Web 65 Updated Technical Details (05/2011)
  • 66. Batch Discovery of TimeGates: robots.txt •  robots.txt file is used by Web servers to convey crawling policies •  Web crawlers (such as for archives) retrieve and parse it •  De-facto standard, no official endorsement •  Extended with new directives, including by Google User-agent: * # Reject all crawlers! Disallow: /! # Google Sitemap extension! Sitemap: http://some.example.com/me/sitemap.xml! User-agent: NiceBot # Select only NiceBot! Crawl-delay: 10 # 10 seconds between requests! Memento: Time Travel for the Web 66 Updated Technical Details (05/2011)
  • 67. Batch Discovery of TimeGates: robots.txt •  Add TimeGate and Archived directives to support discovery of TimeGates known to the server •  User agent should concatenate desired URL with TimeGate link •  Archived value is truncated host/path or * to describe a general web archive http://mementoweb.org/guide/robotstxt/ Memento: Time Travel for the Web 67 Updated Technical Details (05/2011)
  • 68. The Memento Framework: Discovery to Support Integration of Present and Past Web Discovery via TimeMaps: All Mementos for a given Original Resource known by an archive Memento: Time Travel for the Web 68 Updated Technical Details (05/2011)
  • 69. TimeMap Overview •  A TimeMap is an inventory of Mementos for an Original Resource that the responding server is aware of. It lists at least: •  URI of Original Resource •  URI and datetime of all known Mementos •  URI of TimeGate for Original Resource •  URI of TimeMap itself •  Multiple TimeMap serializations possible: •  application/link-format mandatory see https://datatracker.ietf.org/doc/draft-ietf-core-link-format/ •  RDF TimeMaps proposed Memento: Time Travel for the Web 69 Updated Technical Details (05/2011)
  • 70. TimeMaps: Link Format Syntax •  Document in the format of the value of the Link HTTP Header •  Format: <URI>;rel="type";attr="val", ! !<URI2>…
 •  rel is the relationship between context URI and the URI in <>s •  The Context URI for TimeMaps is the URI with rel="original" •  Other rel types link to Mementos, TimeGates etc.! <http://cnn.com/>;rel="original",
 <http://web.archive.org/web/20010911223004/http://! cnn.com>;rel="memento";datetime="Mon, 11 Sep 2001 22:30:04 GMT"! Memento: Time Travel for the Web 70 Updated Technical Details (05/2011)
  • 71. TimeMaps: Link Attributes •  Existing Attributes for Links •  "rel" The type of relationship •  "type" The (mime) formatt of the linked resource •  "title" Title of the linked resource •  "hreflang" !Language of linked resource •  "media" Intended media (eg screen) •  "anchor" URI to override context URI for link •  New rel types introduced: •  "original" The Original Resource •  "memento" A Memento of the Original •  "timegate" A TimeGate for the Original •  "timemap" A TimeMap of Mementos of the Original! Memento: Time Travel for the Web 71 Updated Technical Details (05/2011)
  • 72. TimeMaps: Link Attributes •  New Attributes for Mementos: •  datetime The Memento-Datetime! •  license!License associated with Memento •  embargo!Time after which Memento is available •  Are others necessary?! <http://cnn.com/>;rel="original",! <http://web.archive.org/timegate/cnn.com/>;rel="timegate",! <http://web.archive.org/timemap/link/cnn.com/>;rel="timemap",! <http://web.archive.org/web/20010911223004/http://
 cnn.com>;rel="memento";datetime="Mon, 11 Sep 2001 22:30:04 GMT"! <http://web.archive.org/web/…/cnn.com/>;rel="memento";
 license="http://archive.org/license/1";
 datetime="…";embargo="Mon, 20 Jul 2011 00:00:00 GMT",! … ! Memento: Time Travel for the Web 72 Updated Technical Details (05/2011)
  • 73. The Memento Framework: Discovery to Support Integration of Past and Current Web Memento Discovery: All Mementos known by an archive Memento: Time Travel for the Web 73 Updated Technical Details (05/2011)
  • 74. Batch discovery of Mementos: robots.txt •  robots.txt file is used by Web servers to convey crawling policies •  Support discovery of Mementos through robots.txt via existing User-agent and Allow directives •  Use value memento for User-agent to convey that the value for the Memento-Datetime header must remain sticky when crawling/mirroring Mementos Memento: Time Travel for the Web 74 Updated Technical Details (05/2011)
  • 75. Batch discovery of Mementos: Memento Feeds •  Concept: •  Archives publish feeds in which each entry provides details about a specific Memento, e.g. Memento-Datetime, Original Resource, etc. •  As new Mementos become available, new feeds with new entries are published •  Once published, feeds remain static •  Technology: •  To be decided in collaboration with IIPC •  Inspired by the approach and functionality of CDX files (see http://www.archive.org/web/researcher/cdx_legend.php) but: •  With Memento-specific extensions; •  Possibly using different serialization; •  Including mechanisms to discover these feeds. Memento: Time Travel for the Web 75 Updated Technical Details (05/2011)
  • 76. The Memento Framework: Tools Memento: Time Travel for the Web 76 Updated Technical Details (05/2011)
  • 77. Memento Client Support •  Several client tools developed by us and others •  Add-ons for FireFox (operational) and Internet Explorer (experimental) •  Applications for Android (operational) and iPhone/iPad (in development) •  Paper in Code4Lib Journal http://journal.code4lib.org/articles/4979 Memento: Time Travel for the Web 77 Updated Technical Details (05/2011)
  • 78. Memento Server Support •  Plug-in for MediaWiki (operational) •  Used on W3C’s main wiki •  Please install it for your MediaWiki! http://www.mediawiki.org/wiki/Extension:Memento Memento: Time Travel for the Web 78 Updated Technical Details (05/2011)
  • 79. Memento Server Support •  Memento-compliant Wayback software: •  In production at the Internet Archive •  Available to Web archives, worldwide •  Please have your favorite Web Archive experiment with the new 1.6 version! http://mementoweb.org/tools/wayback/ Memento: Time Travel for the Web 79 Updated Technical Details (05/2011)
  • 80. Memento Server Validator •  Server side client: •  Attempts to perform all Memento actions against a given URI •  Reports success/failure of the interactions and warnings for optional aspects •  Kept up to date with IETF Internet Draft http://mementoweb.org/tools/validator/ Memento: Time Travel for the Web 80 Updated Technical Details (05/2011)
  • 81. Memento Proxy Support •  Several systems that host Mementos have been made Memento-compliant “by proxy”: •  Many major Web Archives that do not yet run Memento-compliant software •  3,000+ MediaWiki systems, including Wikipedia and Wikia •  We would love all of these to become natively Memento-compliant!
  • 82. Memento Aggregator TimeGate •  Aggregates all known TimeGates (proxies and native implementations) •  Redirects to authoritative TimeGates (Wikipedia, Transactional Archives) •  Currently implemented with BerkeleyDB •  Future version to use Facebook's Cassandra platform
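The aggregation idea can be illustrated with a minimal Python sketch. It assumes each archive exposes its Mementos as a list of (Memento-Datetime, Memento URI) pairs and that the aggregator picks the Memento closest in time to the requested datetime; both the data shape and the selection rule are assumptions made for illustration, and the function names and example URIs are hypothetical.

```python
from datetime import datetime, timezone


def aggregate(timemaps):
    """Merge per-archive lists of (memento_datetime, uri_m) into one sorted list."""
    merged = [entry for entries in timemaps.values() for entry in entries]
    return sorted(merged)


def closest(mementos, target):
    """Pick the Memento whose datetime is nearest to `target` (assumed policy)."""
    return min(mementos, key=lambda memento: abs(memento[0] - target))


utc = timezone.utc
timemaps = {
    "archive-a": [
        (datetime(2008, 1, 1, tzinfo=utc), "http://a.example/2008/"),
        (datetime(2010, 1, 1, tzinfo=utc), "http://a.example/2010/"),
    ],
    "archive-b": [
        (datetime(2009, 1, 1, tzinfo=utc), "http://b.example/2009/"),
    ],
}
merged = aggregate(timemaps)
best = closest(merged, datetime(2009, 3, 1, tzinfo=utc))
```

In this toy data set neither archive alone covers 2008 through 2010, but the merged list does, which is the point the following slides make about aggregators finding more Mementos.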
  • 83. Aggregators Find More Mementos! •  1000 URIs sampled from delicious.com •  1 dot = 1 Memento (x=Memento-Datetime, y=URI of Original Resource) •  Sorted by URI longevity
  • 84. But Still Too Few Mementos To Be Found… •  1000 URIs sampled from search engine result pages; •  See: “How Much of the Web is Archived?” JCDL 2011
  • 85. Crawl-Based Web Archives: Observations •  For example: Heritrix crawler for the Internet Archive
  • 86. Crawl-Based Web Archives •  Collect discrete observations of resources, not their entire evolution. •  Can be rejected (robots.txt, by user-agent, by host IP). •  Can be deceived (cloaking, by geo-location, by user-agent). •  Coverage of a particular Web server depends on the crawl strategy.
  • 87. Server-Side Transactional Web Archives: Change History •  For example: TTApache, PageVault, Vignette Web Capture
  • 88. Server-Side Transactional Web Archives •  Collect all representations served by the to-be-archived server. •  The to-be-archived server needs to cooperate. •  Incentives: e.g. institutional memory, official record of Web presence. •  Archival coverage is restricted to the to-be-archived server and does not include external servers (e.g. embedded resources). •  The to-be-archived server can submit falsified information. •  Archival collection management: what to keep, what not (e.g. significant changes, deduplication, …).
  • 89. Development of Transactional Web Archive Software Capture: •  Apache connection filter module captures URI, headers, body •  POSTs in real-time to transactional archive Access: •  Online, real time access via Memento TimeGates •  Batch Export via WARC files for long term preservation
  • 90. Development of Transactional Web Archive Software Capture: •  Apache connection filter module captures URI, headers, body •  POSTs in real-time to transactional archive Submit: •  Java-Grizzly-Jersey submission interface application •  Berkeley DB metadata store •  FS store for body and headers
  • 91. Development of Transactional Web Archive Software Access: •  Transactional archive natively supports Memento •  Immediate availability of archived content •  Export of WARC, e.g. for long-term archiving in other environment Development Timeline: •  Ongoing development (LANL) and testing (ODU) •  Submit/Access finalized; coding focus on collection management •  Expected release as open source, 3rd Quarter 2011
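The capture, submit, and access steps on the three slides above can be sketched as a toy in-memory archive. This is not the software described on the slides: Python dicts stand in for the Berkeley DB metadata store and the file-system store, response headers are omitted for brevity, timestamps are assumed to arrive in non-decreasing order, and the class and method names are hypothetical. Skipping a submission whose body is unchanged is one simple form of the deduplication mentioned under collection management.

```python
import hashlib
from bisect import bisect_right


class TransactionalArchive:
    """Toy transactional archive: records every changed representation of a URI."""

    def __init__(self):
        self.bodies = {}   # digest -> body bytes (stands in for the file store)
        self.history = {}  # uri -> list of (timestamp, digest), in arrival order

    def submit(self, uri, timestamp, body):
        """Record a served representation; return False if it is unchanged."""
        digest = hashlib.sha256(body).hexdigest()
        versions = self.history.setdefault(uri, [])
        if versions and versions[-1][1] == digest:
            return False  # same body as the previous capture: deduplicate
        self.bodies.setdefault(digest, body)
        versions.append((timestamp, digest))
        return True

    def as_of(self, uri, timestamp):
        """Return the body that was being served at `timestamp`, or None."""
        versions = self.history.get(uri, [])
        index = bisect_right([t for t, _ in versions], timestamp)
        if index == 0:
            return None  # nothing captured at or before that time
        return self.bodies[versions[index - 1][1]]
```

Because every change is recorded at the moment it is served, `as_of` can answer for any point in time after the first capture, which is what distinguishes a transactional archive from the discrete observations of a crawler.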
  • 92. The Memento Framework: Resource Versioning
  • 100. Memento Framework •  Original Resource: http://lanlsource.lanl.gov/pics/picoftheday.png
  • 101. Time Travel across Versions of a Picture of the Day •  Movie at: http://www.mementoweb.org/demo/picoftheday.mov
  • 107. Memento Framework •  Original Resource: http://dbpedia.org/resource/France
  • 108. Time-Series Analysis across DBpedia Versions •  Data collected through HTTP navigation •  Paper at http://arxiv.org/abs/1003.3661
  • 109. The Memento Framework: About Memento-Datetime: Archive Navigation Coherence
  • 110. Resource History Recorded by CMS and Transactional Archives
  • 111. Navigate foo.html @ t4
  • 112. Navigation Coherence for foo.html @ t4
  • 113. Resource Observations Recorded by Crawler-Based Archives
  • 114. Missed Observations
  • 115. Navigate foo.html @ t4
  • 116. Navigation Incoherence foo.html @ t4
  • 117. Increase Coherence with Observations from Multiple Archives?
  • 118. The Memento Framework: About Memento-Datetime: Relation to Creation Datetime and Last-Modified
  • 119. Three Notions of Time •  Creation: Datetime when the resource first came into being •  Last-Modified: Datetime when the resource was last changed •  Memento-Datetime: Datetime that the resource was “frozen”, e.g. as a result of: o  Archiving it at a different URI (e.g. in a CMS, Web Archive, Transactional Archive, Snapshot Archive); o  Deciding never to change it anymore and keeping it at its original URI.
  • 120. Creation = Memento-Datetime = Last-Modified (timeline: Cr, MD, LM) •  At a particular point in time, the Original Resource is observed, and the associated Memento is created. All time values for the Memento are the same.
  • 121. Creation = Memento-Datetime < Last-Modified (timeline: Cr, MD, LM) •  The HTML archive banner added to a Memento necessitates a change in Last-Modified of the Memento, but Creation date and Memento-Datetime remain unchanged.
  • 122. Creation < Memento-Datetime <= Last-Modified (timeline: Cr, MD, LM) •  It is possible that the Original Resource was a placeholder resource and returned a 200 response before it started to identify a Memento (URI-R = URI-M).
  • 123. Memento-Datetime < Creation <= Last-Modified (timeline: Cr, MD, LM) •  If a Memento is copied to a new archive, the copied Memento has a Creation and Last-Modified equal to the time of copying. The Memento-Datetime is “sticky” and is the same for the Memento and its copy.
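The four orderings on the preceding slides can be restated as a small Python function. This is only a sketch for illustration: the function name and the returned labels are hypothetical, and any comparable values (datetimes or plain numbers) can be passed for Creation, Memento-Datetime, and Last-Modified.

```python
def classify_times(creation, memento_datetime, last_modified):
    """Name the slide scenario that matches the ordering of the three datetimes."""
    cr, md, lm = creation, memento_datetime, last_modified
    if cr == md == lm:
        return "observed and archived at the same moment"
    if cr == md < lm:
        return "archived, then modified (e.g. archive banner added)"
    if cr < md <= lm:
        return "existed before being frozen (e.g. placeholder at original URI)"
    if md < cr <= lm:
        return "copy of an older Memento (Memento-Datetime is sticky)"
    return "ordering not covered by the four scenarios"
```

For example, `classify_times(5, 2, 5)` falls into the last of the four scenarios: the copy was created at time 5 but still reports the Memento-Datetime 2 of the Memento it was copied from.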
  • 124. The Memento Framework: Persistent Web Annotations
  • 125. Web-Centric Annotation: No Persistence Google Sidewiki Annotation on http://news.bbc.co.uk/ as of 2010-06-14
  • 126. Web-Centric Annotation: No Persistence Archived page from: http://www.dracos.co.uk/work/bbc-news-archive/2010/03/08/07.05.html
  • 127. Web-Centric Annotation: Desired Persistence
  • 128. Open Annotation: Dealing with Web Time •  As regular Web resources, Body and Target of an Annotation have representations that can change over time. •  Body and Target can change independently of each other. •  If an Annotation involves resources as they existed at a particular point in time, this needs to be recorded. •  The OAC model provides hooks for doing so: •  Timeless Annotations; •  Uniform Time Annotations; •  Varied Time Annotations.
  • 129. Open Annotation: Uniform Time Annotations •  The Annotation is not always applicable, but pertains to the state of the Body and Target at a specific moment in time. •  Add oac:when property to the Annotation.
  • 130. Memento + Open Annotation: Persistent Annotations •  In order to reconstruct the Annotation as intended: Use Memento to obtain an archived representation of B and T as they existed at the oac:when datetime.
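Reconstructing a Uniform Time Annotation amounts to resolving Body and Target against the same datetime. The Python sketch below assumes an annotation is a plain dict with hypothetical keys `body`, `target`, and `when` (standing in for oac:when), and that `resolve` is any callable mapping a URI and a datetime to a Memento URI. The toy resolver shown picks the latest Memento at or before the requested datetime from an in-memory table; a real client would ask a TimeGate instead, and the example URIs are made up.

```python
def reconstruct(annotation, resolve):
    """Resolve Body and Target to Mementos for the annotation's single datetime."""
    when = annotation["when"]
    return {
        "body": resolve(annotation["body"], when),
        "target": resolve(annotation["target"], when),
        "when": when,
    }


def make_resolver(archive):
    """Build a toy resolver from {uri: [(memento_datetime, uri_m), ...]} sorted ascending."""
    def resolve(uri, when):
        candidates = [uri_m for md, uri_m in archive.get(uri, []) if md <= when]
        return candidates[-1] if candidates else None
    return resolve


# Datetimes are ISO 8601 strings in one fixed format, so string comparison orders them.
archive = {
    "http://example.org/target": [
        ("2010-03-08T07:05:00Z", "http://archive.example.org/20100308/target"),
        ("2010-07-01T00:00:00Z", "http://archive.example.org/20100701/target"),
    ],
    "http://example.org/body": [
        ("2010-06-01T00:00:00Z", "http://archive.example.org/20100601/body"),
    ],
}
annotation = {
    "body": "http://example.org/body",
    "target": "http://example.org/target",
    "when": "2010-06-14T00:00:00Z",
}
rebuilt = reconstruct(annotation, make_resolver(archive))
```

Without the `when` value, a client could only dereference the current Body and Target, which may no longer match what the annotator saw.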
  • 131. Create an Annotation
  • 132. Reconstruct the Annotation without Memento
  • 133. Reconstruct the Annotation with Memento
  • 134. The Memento Framework: The Increasing Value of a URI
  • 135. URI as Access Point to a Page
  • 136. URI as Access Point to Page and Data
  • 137. URI as Access Point to Current and Past Pages and Data
  • 138. References •  Van de Sompel, H., Nelson, M.L., Sanderson, R., Balakireva, L., Ainsworth, S., Shankar, H. (2009) Memento: Time Travel for the Web. http://arxiv.org/abs/0911.1112 •  Van de Sompel, H., Sanderson, R., Nelson, M.L., Balakireva, L., Ainsworth, S., Shankar, H. (2010) An HTTP-Based Versioning Mechanism for Linked Data. Proceedings of the 3rd Workshop on Linked Data on the Web. http://arxiv.org/abs/1003.3661 •  Sanderson, R., Van de Sompel, H. (2010) Making Web Annotations Persistent over Time. Proceedings of the 10th ACM/IEEE-CS Joint Conference on Digital Libraries. http://arxiv.org/abs/1003.2643 •  Sanderson, R., Van de Sompel, H. (2011) Open Annotation Alpha3 Data Model Guide. http://www.openannotation.org/spec/alpha3/
  • 139. Memento wants to make navigating the Web’s Past Easy http://mementoweb.org/ http://groups.google.com/group/memento-dev