I was involved in getting the legal office to approve our release of data. This post summarises some of the key aspects of the conversation:
Q. Was a data collection notice used which indicated this possible use of the
data?
A. Users of CamTools have always (since the service first went online) had the
option to open the privacy policy, which says "Logs are used to create summary
statistics which may be made publicly available. Summary statistics do not
include personal data." Although it would be ideal to email users, as a matter
of courtesy, to tell them that we are going to anonymise and release the data
(because users were not required to read the privacy policy, which is not
prominently displayed), it was felt we did not have to do this.
The personal information concerned includes CRSids but not sensitive personal
data. It was therefore deemed that we did not need consent to anonymise (i.e.
process) the data. It is not completely clear whether anonymising counts as
processing, but it is safest to assume it does. The personal information was
not provided on a confidential basis, so there should be no issue on that score.
Conclusion. Be rigorous about having a privacy policy for systems that collect
personal data and anticipate public use of the data. We will probably move to
actively collecting agreement to terms and conditions (including privacy policy)
with click-through pages in future.
Q. Is the anonymisation secure enough?
A. We had taken advice from an expert in the Computer Lab about anonymisation of
data, but we were also advised to cross-check with the University Computing
Service (UCS) that: (1) they were happy that the security of the anonymisation
process used, including the identifiers we might insert in relation to some of
the data (for example, identifying the school to which the users belong), was
adequate to protect the identity of individuals, and (2) since the open data
licence contemplates commercial use, the UCS was confident that this is within
the JANET acceptable use policy.
It was pointed out that if we use the PDDL open data licence, we are
releasing the material irrevocably and with no restraints. Both
commercialisation and patenting at a future date are precluded in fact and by
the terms of the licence.
Conclusion. This is a tricky area and it is good to have as many opinions as
possible confirming
that the anonymisation is adequate. We also need to track (if we can) uses made
of the data and any indication the anonymisation is inadequate. However, the
terms of the PDDL mean there is little we can do about it once the data is
released, which just increases the pressure to ensure we have good
anonymisation in the first place.
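
To make the concern about inserted identifiers concrete, here is a minimal
sketch of one common approach: replacing each user identifier with a keyed
hash (HMAC-SHA256) and attaching a coarse school label. This is an illustration
under stated assumptions, not necessarily the process used for the CamTools
release; the function names, the sample identifiers and the school lookup are
all hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymise(user_id: str, key: bytes, length: int = 16) -> str:
    """Return a truncated keyed hash (HMAC-SHA256) of a user identifier.

    The same identifier always maps to the same token for a given key, so
    activity can still be grouped per user without exposing the identifier.
    """
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def anonymise_rows(rows, key, school_of):
    """Replace the identifier in each (user_id, event) row with a token and
    add a coarse school label (hypothetical lookup table)."""
    released = []
    for user_id, event in rows:
        token = pseudonymise(user_id, key)
        school = school_of.get(user_id, "unknown")
        released.append((token, school, event))
    return released


# Hypothetical sample data; the identifiers are made up for illustration.
key = secrets.token_bytes(32)  # secret key: keep private, never release it
rows = [("abc123", "login"), ("xyz789", "login"), ("abc123", "view_site")]
school_of = {"abc123": "School A", "xyz789": "School B"}

released = anonymise_rows(rows, key, school_of)
for row in released:
    print(row)
```

Strictly speaking, a keyed hash gives pseudonymisation rather than full
anonymisation, and a coarse label such as the school can still narrow down who
a person is when a group is small, which is why an independent check of the
inserted identifiers is worth having.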
Q. Would this breach anyone’s intellectual property rights?
A. The data was created for the administrative purposes of the University, so
there should be no problem under the IP Ordinance (the University would own the
data). Release of the data also releases database table structures. Since the
software that produced the log tables was licensed under the ECL2 open source
licence, which allows for onward distribution on an open source basis, there
should be no problem here either.
We were advised to check there is no conflict between the ECL2 and the
PDDL licence.
Conclusion. I hadn’t thought about this aspect of releasing data. This question
reveals an unexpected benefit of running open source code and a potential minor
headache for anyone releasing data from a commercial system.