War of the worlds – website security
Forget the summer blockbuster starring
Tom Cruise. The real pyrotechnics took place between
two astronomers last year.
In July 2005, Jose-Luis Ortiz and his
team at the Institute of Astrophysics of Andalusia announced
that they had discovered a giant object orbiting beyond Neptune.
Mike Brown, an astronomer at Caltech, emailed his congratulations
to Ortiz, and at the same time, told the Minor Planet Center
(MPC) that he had also been tracking the object. Soon after,
Brian Marsden of the MPC told Brown that telescope logs including
his observations were publicly available on the internet.
Brown then checked his server records and, by performing a reverse
DNS lookup (incidentally demonstrating what a valuable process
this is), discovered that his logs had been accessed via two
computers at the Institute of Astrophysics of Andalusia. Ortiz
readily admits that this is the case. However, he claims that
he did nothing wrong, as he found the logs on a publicly available
website via a Google search. Since the use of the Caltech
logs was not acknowledged, though, it is not clear whether the log
file data was used to validate the Spanish findings, or whether
it caused them to re-examine images taken more than two years
previously.
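
For readers who have not met reverse DNS lookup before, here is a minimal Python sketch of the general idea: take the visitor IP addresses recorded in a web server log and ask DNS which hostnames they belong to. It is not a reconstruction of what Brown actually ran. The function names and the sample addresses are made up for illustration, and the sketch assumes each log line starts with the client's IPv4 address, as in the common Apache-style log format.

```python
import re
import socket
from typing import Callable, Dict, Iterable, Optional

# Assumption: each log line begins with the client's IPv4 address.
IPV4_AT_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def system_reverse_lookup(ip: str) -> Optional[str]:
    """Ask the system resolver for the hostname of an IP (None if unknown)."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # No reverse (PTR) record, or the resolver could not be reached.
        return None


def visitors_by_hostname(
    log_lines: Iterable[str],
    lookup: Callable[[str], Optional[str]] = system_reverse_lookup,
) -> Dict[str, Optional[str]]:
    """Map each distinct visitor IP found in the log to its hostname."""
    hostnames: Dict[str, Optional[str]] = {}
    for line in log_lines:
        match = IPV4_AT_START.match(line)
        if match is None:
            continue  # Skip lines that do not start with an IPv4 address.
        ip = match.group(1)
        if ip not in hostnames:
            hostnames[ip] = lookup(ip)  # One lookup per distinct address.
    return hostnames
```

Passing the lines of a log file to visitors_by_hostname gives a table of addresses and the hostnames behind them, which is usually enough to see which organisation a visitor came from. Not every address has a reverse record, so some entries will come back empty.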
Putting to one side the elements of the debate
particular to the astronomy community, let's concentrate on
the accessing of the log files. Well, within the letter of
the law, you would have to say that Ortiz is right in saying
that the log files were in the public domain, and therefore
“fair game”. In fact, we found that the log files
are still available to the public. However, we would have
to say that for us, it isn’t right ethically.
Don’t
think that finding information not really intended for everyone
is uncommon. Not that long ago, we found that we had been
nominated for an award when we found the entry form via Google.
It wasn’t particularly sensitive, but we knew that it
wasn’t supposed to be available to the general public.
If Google can index your website, it will.
Normally, of course, this is a good thing, but it's worth sitting
back for a moment and thinking about what you have on your
website and whether you want Google to index everything it
finds.
One thing you can do is to go to Google and type in
“site:www.mydomain.com”, inserting your own domain
name, of course. This will list all the pages that Google
has indexed from your website.
Assuming that you want to keep
information from Google, what can you do? Well, firstly, you
can password-protect directories and pages. This is probably
the best solution, as it is difficult to argue that information
is in the public domain if someone has to hack a password
to get it. You can also use a robots.txt file to tell the
search engine spiders (the programs that crawl and index websites)
what they can list and what is off limits. Similarly, a meta
tag can be placed in the head of individual pages to the same
effect.
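
As a rough illustration of the robots.txt approach, the Python sketch below holds a small example file and uses the standard library's robots.txt parser to check which addresses a well-behaved spider would be allowed to fetch. The directory names /logs/ and /private/ and the helper name is_crawlable are invented for the example, and www.mydomain.com stands in for your own domain.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content. The two directories are hypothetical;
# list whichever parts of your own site should stay out of the index.
ROBOTS_TXT = """\
User-agent: *
Disallow: /logs/
Disallow: /private/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt lets this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/logs/observations.txt"):
        url = "http://www.mydomain.com" + path
        print(url, "crawlable:", is_crawlable(ROBOTS_TXT, url))
```

The per-page equivalent is a tag such as <meta name="robots" content="noindex"> in the head of the page. Bear in mind that both are requests to the spider rather than access controls: they keep a page out of search results but do not stop anyone who already knows the address, which is why password protection remains the stronger option.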
This won’t completely fireproof you, however. On some
websites, the server will show visitors the contents of a directory
if there is no index page. That loophole can also be closed;
just don't forget to do it.
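
One way to check whether a directory on your site is exposed like this is to fetch its address and look at what comes back. The Python sketch below is only a heuristic: it assumes the server produces the kind of automatic listing that Apache and nginx generate by default, whose page title begins with "Index of /". Other servers format their listings differently, and the function name is made up for the example.

```python
import re

# Assumption: automatic listings carry a title such as "Index of /logs",
# which is the default for Apache and nginx. Other servers may differ.
LISTING_TITLE = re.compile(r"<title>\s*Index of\s+/", re.IGNORECASE)


def looks_like_directory_listing(html: str) -> bool:
    """Guess whether a fetched page is an auto-generated directory listing."""
    return LISTING_TITLE.search(html) is not None
```

If a page fetched from one of your directories passes this check, either add an index page to the directory or switch off automatic listings in the server configuration (on Apache, for example, with "Options -Indexes").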