AS ISO 28500:2018, Australian Standard: Information and documentation – WARC file format.
6 WARC record types
6.1 General
The purpose and use of each defined record type is described in 6.2 to 6.9.
Because new record types that extend the WARC format may be defined in future versions of the WARC
standard, WARC processing software shall skip records of unknown type.
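The skip rule above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes records have already been parsed into dictionaries keyed by header name, and the names `KNOWN_TYPES` and `known_records` are hypothetical. The set lists the eight record types defined in WARC 1.1.

```python
# Record types defined by the WARC 1.1 format (one per subclause 6.2 to 6.9).
KNOWN_TYPES = frozenset({
    "warcinfo", "response", "resource", "request",
    "metadata", "revisit", "conversion", "continuation",
})


def known_records(records):
    """Yield records with a recognized WARC-Type; silently skip the rest.

    `records` is assumed to be an iterable of dicts holding parsed
    WARC headers (a hypothetical in-memory representation).
    """
    for record in records:
        if record.get("WARC-Type") in KNOWN_TYPES:
            yield record
```

Skipping rather than raising an error keeps a reader usable on files written against a later revision of the format.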
6.2 ‘warcinfo’
A ‘warcinfo’ record describes the records that follow it, up through end of file, end of input, or until next ‘warcinfo’ record. Typically, this appears once and at the beginning of a WARC file. For a web archive, it often contains information about the web crawl which generated the following records.
The format of this descriptive record block may vary, though the use of the “application/warc-fields” content-type is recommended. Allowable fields include, but are not limited to, all [DCMI] plus the following field definitions. All fields are optional.
a) ‘operator’: contact information for the operator who created this WARC resource. A name or name and email address is recommended.
b) ‘software’: the software and software version used to create this WARC resource. For example, “heritrix/1.12.0”.
c) ‘robots’: the robots policy followed by the harvester creating this WARC resource. The string ‘classic’ indicates the 1994 web robots exclusion standard rules are being obeyed.
d) ‘hostname’: the hostname of the machine that created this WARC resource, such as “crawling17.archive.org”.
e) ‘ip’: the IP address of the machine that created this WARC resource, such as “123.2.3.4”.
f) ‘http-header-user-agent’: the HTTP ‘user-agent’ header usually sent by the harvester along with each request. Note that if ‘request’ records are used to save verbatim requests, this information is redundant. (If a ‘request’ or ‘metadata’ record reports a different ‘user-agent’ for a specific request, the more specific information should be considered more reliable.)
g) ‘http-header-from’: the HTTP ‘from’ header usually sent by the harvester along with each request. (The same considerations as for ‘user-agent’ apply.)
The record may also contain technical information, such as the base encoding of the digests used in named fields.
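As an illustration of the fields listed above, the following sketch reads a ‘warcinfo’ block whose content-type is “application/warc-fields”. It assumes the block is UTF-8 text with one “name: value” pair per line; folded continuation lines are not handled, and a repeated field keeps only its last value. The function name `parse_warc_fields` is hypothetical.

```python
def parse_warc_fields(block: bytes) -> dict[str, str]:
    """Parse a simple application/warc-fields block into a dict.

    Assumption: one "name: value" pair per line, no folded lines.
    Field names are lower-cased so lookups are case-insensitive.
    """
    fields = {}
    for line in block.decode("utf-8").splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon: not a name/value line, ignore it
        fields[name.strip().lower()] = value.strip()
    return fields


example = (
    b"software: heritrix/1.12.0\r\n"
    b"robots: classic\r\n"
    b"ip: 123.2.3.4\r\n"
)
info = parse_warc_fields(example)
```

Because all fields are optional, callers would typically use `info.get("operator")` rather than indexing directly.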
So that multiple record excerpts from inside WARC files are also valid WARC files, it is optional that the first record of a legal WARC be a ‘warcinfo’ description. Also, to allow the concatenation of WARC files into a larger valid WARC file, it is allowable for ‘warcinfo’ records to appear anywhere in a WARC file.
See B.1 for an example of a ‘warcinfo’ record.
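The positional scope of a ‘warcinfo’ record described in this subclause (it covers the records that follow it until the next ‘warcinfo’ record or the end of input) can be sketched as below. This is an illustrative assumption about how a reader might track that scope; records are again modelled as dictionaries of parsed headers, and `attach_warcinfo` is a hypothetical name.

```python
def attach_warcinfo(records):
    """Pair each non-warcinfo record with the warcinfo record governing it.

    The governing record is the most recent preceding 'warcinfo', or None
    when the input does not start with one (which the format permits).
    """
    current = None
    for record in records:
        if record.get("WARC-Type") == "warcinfo":
            current = record  # a new scope starts here
        else:
            yield record, current
```

Since the scope is tracked per record, the same loop works on concatenated WARC files, where a ‘warcinfo’ record may appear in the middle of the stream.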
6.3 ‘response’
6.3.1 General
A ‘response’ record should contain a complete scheme-specific response, including network protocol information, where possible. The exact contents of a ‘response’ record are determined not just by the record type but also by the URI scheme of the record’s target-URI, as described in 6.3.2 to 6.3.3.
See B.2 for an example of a ‘response’ record.