This sounds like a simple question. Ten seconds on most sites will
tell a human viewer where the site originates, and for the rest a little
digging will eventually produce the answer. But under Non-Print Legal Deposit,
we need a scalable way of settling the question without human
intervention. Our remit under the new regulations extends to sites that
are issued from a .uk or other UK geographic top-level domain, or where
part of the publishing process takes place in the UK. (See the
regulations here, and a summary here.)
We estimate that there are just short of five million sites that end
in .uk - a simple, unambiguous and machine-readable way of knowing that a
site originates from within the UK and so is covered by the remit we
now have. However, not all UK domains end in .uk. Many .com, .org and
other sites are in fact published from within the UK, and there are few
reliable figures as to how many of these there are. And so to identify
which of these fall within the scope of the regulations, we need other
methods.
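The .uk test is the one part of this that is easy to automate. A minimal sketch in Python, assuming each site is identified by a URL or bare host name; the function names are illustrative, and the suffix list is a parameter so that other UK geographic top-level domains can be added:

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if none is found."""
    # urlsplit only recognises the host when '//' is present, so add it
    # to bare names such as 'www.bl.uk'.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host.rstrip(".")  # drop the trailing dot of a fully qualified name


def in_uk_domain(url, suffixes=(".uk",)):
    """True if the host name ends in one of the given domain suffixes."""
    return host_of(url).endswith(tuple(suffixes))
```

Only the host name is tested, so a path such as example.com/uk does not count.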
One such method is to find out where the site is hosted. www.geoiptool.com provides
information on where a server is located, although it is difficult to
attain 100% accuracy. Another way is to look at where the domain name is
registered, using a service such as www.whois.net. However,
in many cases domains are registered by one company on behalf of
another or of an individual, perhaps because they want their contact
details to remain private. There also isn't (yet) a straightforward way
of querying any of these services at scale for thousands or indeed
millions of sites.
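As a rough illustration of what a hosting lookup at scale might involve, the sketch below resolves many host names concurrently and passes each address to a country lookup. It is a sketch under assumptions, not a description of any existing service: country_of_ip is a hypothetical callable (for example, a wrapper around a locally held GeoIP database), and the result is only as accurate as that data.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve(host):
    """Return the IPv4 address for a host name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(host)
    except (OSError, UnicodeError):
        return None


def hosting_countries(hosts, country_of_ip, resolver=resolve, workers=32):
    """Map each host to the country code reported for its server, or None.

    country_of_ip is a hypothetical callable taking an IP address string
    and returning a country code (or None when the address is unknown).
    """
    hosts = list(hosts)  # the sequence is iterated twice below

    def lookup(host):
        ip = resolver(host)
        return country_of_ip(ip) if ip else None

    # DNS lookups spend most of their time waiting, so threads are enough.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(hosts, pool.map(lookup, hosts)))
```

A server located abroad does not rule a site out, so the output would be one piece of evidence among several rather than a verdict.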
There may be sites for which we have direct knowledge, from the site
owner, that their .com domain is operated from within the UK, but that
could only ever be for a tiny proportion of sites. And so after all
these possibilities are exhausted, the next step is to make judgements
based on the presentation of the site itself. But what in a site is
"enough"? A postal address in a Contact Us page is a possibility; so is
a UK-domain email address (for those sites whose owners don't use
anything as twentieth century as the post).
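Both kinds of evidence can be looked for mechanically. The sketch below scans the text of a page for strings shaped like UK postcodes and for email addresses on a .uk domain. The patterns are deliberately simplified assumptions: the postcode pattern checks shape only, not whether the postcode exists, and a match is a hint rather than proof of where the site is published.

```python
import re

# Simplified shape of a UK postcode, e.g. "NW1 2DB" or "SW1A 1AA".
POSTCODE = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}\b")

# An email address whose domain ends in .uk, e.g. "info@example.org.uk".
UK_EMAIL = re.compile(r"\b[\w.+-]+@(?:[\w-]+\.)+uk\b", re.IGNORECASE)


def uk_signals(text):
    """Return the postcode-like strings and .uk email addresses found in text."""
    return {
        "postcodes": POSTCODE.findall(text),
        "uk_emails": UK_EMAIL.findall(text),
    }
```

How many such signals count as "enough" remains a matter of policy; the code only collects them.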
What if a site doesn't disclose the information we might like, but is
self-evidently from the UK (once you look at the content)? One example
is Conservative Home,
a prominent political site, which nowhere explicitly states that it is
published in the UK. This is a particular issue for blogs, which are
often hosted on a platform service such as WordPress
(which is based in San Antonio, Texas) but would be thought by most to
be "published" from wherever the author is based. There are similar
issues in determining which parts of social media sites such as Twitter
or Facebook should be treated as published from within the UK.
All of this of course supposes that all website owners tell the truth
about where they are based. There may be cases where a site is
published in another country but purports to be from the UK, perhaps to
protect the author from a repressive regime. Conversely, an owner might,
for reasons which are hard to predict, wish a site that is published
within the UK not to appear so.