- We've been extracting the data from Sakai, which was more difficult than it sounds. Sakai stores its events in one massive SQL table, one after the other, so it grows to tens of millions of rows in no time at all. The work involved merging tables, fixing corrupt old data, that kind of thing. Anyway, all done now.
- We're investigating tools to help us analyse the data. Pentaho looks very promising.
Though none of this should be treated as doctrine, and we're still definitely open to ideas, we thought it was time to do some initial investigations, now that we have the data. The key structuring concept for me is:
Who will be interested in our data, and what would they like to know?
Some easy-to-imagine, though not all-encompassing, imaginary situations are these.
- If someone else were running the VLE, what would we want to know about it?
- If we could get secret, spy-style access to our deadliest rival institution (identity an exercise for the reader) what would we want to find out to make our VLE more awe-inspiring than theirs?
- If a charismatic leader were to rouse academics or students to come to our door bearing pitchforks and burning torches, demanding VLE data, what would be the rhetoric -- what would they be demanding?
In terms of the data, what we have is:
who does what
So to do a meaningful analysis we have two axes: Who and What. While we'll give away as much raw data as possible, we need to provide supporting mappings. Who is dps10? What is site 85? We also need to make sure that, when we anonymise, we don't lose those aspects that enable external people to ask questions.
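To make the anonymisation point concrete, here is a minimal sketch in Python of one way it could work: replace each user id with a keyed hash, but keep a coarse category for both Who and What so that outsiders can still ask questions. Everything here is an assumption for illustration only: the key, the category labels, the event names and the second user id are invented, and this is not how our extraction actually does it.

```python
import hashlib
import hmac

# Hypothetical secret, kept private so pseudonyms can't be reversed by guessing ids.
SECRET_KEY = b"replace-with-a-private-key"

# Hypothetical supporting mappings; the categories are invented for this sketch.
WHO = {"dps10": "staff", "abc12": "student"}
WHAT = {"85": "course site", "86": "project site"}


def pseudonym(user_id, key=SECRET_KEY):
    """Return a stable pseudonym: the same user always gets the same token."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def anonymise(event):
    """Drop the real user id but keep the Who and What categories."""
    user, action, site = event
    return {
        "who": pseudonym(user),
        "who_category": WHO.get(user, "unknown"),
        "action": action,
        "site": site,
        "site_category": WHAT.get(site, "unknown"),
    }


# Illustrative (user, action, site) events, not real data.
events = [
    ("dps10", "content.read", "85"),
    ("abc12", "content.read", "85"),
    ("dps10", "site.visit", "86"),
]
released = [anonymise(e) for e in events]
```

Because the pseudonym is stable, someone outside can still follow one person's activity across sites and compare categories, without ever learning who that person is.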
We're working out how we should take a first stab at Who and What, and are looking at finding sources. I imagine that when we've done this first round of analysis we'll discover the world doesn't divide up how we imagine. That seems to be the near-universal experience of user experience analysis; certainly we learnt in our JISC Academic Networking project that the world of networking isn't divided up in quite the way we imagined. As we discover this from the activity data, we will iterate, trying again and again.
It might even be worth applying Bayesian Clustering or Entropy-Based Tree Building to see how a machine would cluster behaviour. All very exciting (to me, anyway!). See pages 15-21 of this powerpoint by Allan Neymark at SJSU to see all this simply explained in terms of Simpsons characters.
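For the curious, the core of entropy-based tree building is choosing the attribute whose split reduces entropy the most (its information gain). Below is a minimal sketch in Python of that one step, not a full tree builder. The rows and attribute names are invented toy data, and it assumes we have a target column to predict, which a true clustering approach would not need.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Invented toy data: does Who or time of day better explain the action?
rows = [
    {"who": "student", "time": "day", "action": "read"},
    {"who": "student", "time": "night", "action": "read"},
    {"who": "staff", "time": "day", "action": "upload"},
    {"who": "staff", "time": "night", "action": "upload"},
]

# The attribute with the highest gain would become the first split in the tree.
best = max(["who", "time"], key=lambda a: information_gain(rows, a, "action"))
```

A tree builder would repeat this choice on each resulting group until the groups are pure or no attributes remain.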
Exciting times. At the same time, extremely tedious for the guys doing the database extraction and normalisation. Personally, I seem to have escaped that bit for this project. Phew!