Exposing VLE Activity Data: October 2011

I was involved in getting the legal office to approve our release of data. This post summarises some of the key aspects of the conversation:

Q. Was a data collection notice used which indicated this possible use of the data?

A. Users of CamTools have always (since the service first went online) had the option to open the privacy policy, which says: "Logs are used to create summary statistics which may be made publicly available. Summary statistics do not include personal data." Ideally we would email users, as a matter of courtesy, to tell them that we are going to anonymise and release the data, because users were not required to read the privacy policy and it is not prominently displayed. However, it was felt that we did not have to do this.

The personal information concerned includes CRSids but no sensitive personal data. It was therefore deemed that we did not need consent to anonymise (i.e. process) the data. It is not completely clear whether anonymising counts as processing, but it is safest to assume it does. The personal information was not provided on a confidential basis, so there should be no issue on that score.

Conclusion. Be rigorous about having a privacy policy for systems that collect personal data, and make sure it anticipates public use of the data. We will probably move to actively collecting agreement to terms and conditions (including the privacy policy) with click-through pages in future.

Q. Is the anonymisation secure enough?

A. We had taken advice from an expert in the Computer Lab about anonymisation of data, but we were also advised to cross-check with the University Computing Service (UCS) that: (1) they were happy that the anonymisation process used, including any identifiers we might insert alongside some of the data (for example, identifying the school to which a user belongs), was adequate to protect the identity of individuals; and (2) since the open data licence contemplates commercial use, the UCS was confident that this is within the JANET acceptable use policy.

It was pointed out that if we use the PDDL open data licence, we are releasing the material irrevocably and with no restraints. Both commercialisation and patenting at a future date are precluded, in fact and by the terms of the licence.

Conclusion. This is a tricky area and it is good to have as many opinions as possible confirming that the anonymisation is adequate. We also need to track (if we can) uses made of the data and any indication the anonymisation is inadequate. However, the terms of the PDDL mean there is little we can do about it once the data is released, which just increases the pressure to ensure we have good anonymisation in the first place.
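To make the discussion concrete, here is a minimal sketch of one common way to replace user identifiers before release. It is not a description of the process we actually used. It assumes a keyed hash (HMAC-SHA256) stands in for each CRSid and that a coarse attribute such as the user's school is added alongside. The function names, field names, sample CRSid and school lookup are all hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymise(user_id: str, key: bytes) -> str:
    """Return a keyed hash of an identifier (HMAC-SHA256, first 16 hex chars).

    Without the key, the original identifier cannot be recovered by simply
    hashing a list of candidate identifiers and comparing the results.
    """
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def anonymise_row(row: dict, key: bytes, school_of: dict) -> dict:
    """Copy a log row, dropping the CRSid and adding a pseudonym and school.

    'crsid', 'user' and 'school' are hypothetical field names.
    """
    out = dict(row)
    crsid = out.pop("crsid")
    out["user"] = pseudonymise(crsid, key)
    out["school"] = school_of.get(crsid, "unknown")
    return out


# Hypothetical example data: one log row and a CRSid-to-school lookup.
key = secrets.token_bytes(32)  # random secret key, generated once per release
school_of = {"abc123": "Technology"}
row = {"crsid": "abc123", "event": "content.read", "timestamp": "2011-10-01T09:00:00"}

released = anonymise_row(row, key, school_of)
```

In a scheme like this, the same user maps to the same pseudonym throughout one release, so activity can still be analysed per user. Two caveats apply. The key has to be kept secret, or destroyed once the data set is produced. And every added attribute, such as the school, narrows the group a pseudonym could belong to, which is exactly why the identifiers inserted alongside the data needed checking.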

Q. Would this breach anyone’s intellectual property rights?

A. The data was created for the administrative purposes of the University, so there should be no problem under the IP Ordinance (the University would own the data). Release of the data also releases database table structures. Since the software that produced the log tables was licensed under the ECL2 open source licence, which allows for onward distribution on an open source basis, there should be no problem here either. We were advised to check there is no conflict between the ECL2 and the PDDL licence.

Conclusion. I hadn’t thought about this aspect of releasing data. This question represents an unexpected benefit of running open source code and a potential minor headache for anyone releasing data from a commercial system.

Friday, 28 October 2011

Legal clearance for releasing personal data