Friday 16 September 2011

What can our Perl analysis tool do?

Introduction

We have developed a Perl script that allows us to extract information from the event logs and enrich it with data from other sources.

We have used this extracted data in several ways, for example to look at when peak usage occurs for different events and to visualise a network of sites linked by their users.

This article gives an overview of the information that can be extracted using this script and of the additional data it needs.

Institutional Unique Logins

Input Data:

  • Event Logs broken down by Academic Year

  • Users and the Institutions (eg colleges, departments) they belong to

Request:

  • Academic years to query

  • Period over which to accumulate counts (year, month, week).

Output:

  • Report of how many users belong to only one institution and how many belong to multiple (2, 3 or 4) institutions.

    inst_membership_count.csv

  • For each institution, produces a count of unique logins over the requested time unit. These are produced separately for each unit of time and also joined into one overall file for the requested period. This can be pivoted in a spreadsheet to produce overall total unique logins per year/month/week for each institution.

    unique_insts_year_wk or month no.csv (one file for each week or month number)

    unique_insts_by_week.csv or unique_insts_by_month.csv

Warning: Many undergraduates are assigned only to a college, not to a department, during their first years of study.
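Both counts described above can be sketched in a few lines. This is a minimal Python sketch, not the Perl script itself. Several things in it are assumptions: the sample data, the "user.login" event name used to recognise logins, and the use of ISO 8601 weeks. It also reads "unique logins" as the number of distinct users who logged in at least once during the week.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample data: the real script reads event logs and an
# institution file whose exact formats are not described here.
memberships = {  # user id -> institutions (colleges, departments)
    "u1": ["KINGS", "ENG"],
    "u2": ["TRIN"],
    "u3": ["KINGS", "PHYS", "MATHS"],
}
events = [  # (user id, event type, timestamp)
    ("u1", "user.login", datetime(2010, 10, 4, 9, 0)),
    ("u1", "user.login", datetime(2010, 10, 5, 9, 0)),  # same week, counted once
    ("u3", "user.login", datetime(2010, 10, 6, 20, 0)),
    ("u2", "user.login", datetime(2010, 10, 12, 14, 0)),  # following week
]


def membership_counts(memberships):
    """Number of users belonging to 1, 2, 3, ... institutions."""
    return Counter(len(insts) for insts in memberships.values())


def unique_logins_by_week(events, memberships):
    """Distinct users logging in, per (ISO year, ISO week, institution)."""
    users = defaultdict(set)
    for user, event, when in events:
        if event != "user.login":  # assumed name of the login event
            continue
        iso_year, iso_week, _ = when.isocalendar()
        # A user in several institutions counts once towards each of them.
        for inst in memberships.get(user, []):
            users[(iso_year, iso_week, inst)].add(user)
    return {key: len(members) for key, members in users.items()}


print(membership_counts(memberships))
print(unique_logins_by_week(events, memberships))
```

A user who belongs to several institutions is added to each of them. The per-institution counts can therefore add up to more than the number of distinct users, which is one reason to read them alongside the membership report.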

Counts of Events

Input Data:

  • Event Logs broken down by Academic Year

Request:

  • Academic years to query

  • Period over which to accumulate counts (month, week).

Output:

  • Total counts for each type of event, produced as a separate file for each unit of time and also joined into one overall file for the requested period. This can be pivoted in a spreadsheet to produce overall total counts for each event per month or week.

    total_events_year_wk or month no.csv (one file for each week or month number)

    total_events_by_week.csv or total_events_by_month.csv

This was used to analyse when peaks in activity occurred for different events.
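Here is a minimal sketch of the event totals. It assumes each log row has already been reduced to an (event type, timestamp) pair, and the function names are hypothetical. The combined file has one row per year, period and event rather than one column per event. That layout is a design choice made here because it keeps the file easy to pivot.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def count_events(events, period="week"):
    """Total count of each event type per period.

    events: iterable of (event type, timestamp) pairs (an assumed shape).
    period: "week" groups by ISO year and week, "month" by calendar year and month.
    """
    counts = Counter()
    for event, when in events:
        if period == "week":
            year, num, _ = when.isocalendar()
        elif period == "month":
            year, num = when.year, when.month
        else:
            raise ValueError(f"unknown period: {period}")
        counts[(year, num, event)] += 1
    return counts


def to_long_csv(counts):
    """One row per (year, period, event) so a spreadsheet can pivot it."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["year", "period", "event", "count"])
    for (year, num, event), n in sorted(counts.items()):
        writer.writerow([year, num, event, n])
    return out.getvalue()


sample = [
    ("content.read", datetime(2011, 1, 10, 9, 30)),
    ("content.read", datetime(2011, 1, 20, 14, 0)),
    ("content.new", datetime(2011, 2, 1, 11, 15)),
]
monthly = count_events(sample, period="month")
print(to_long_csv(monthly))
```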


Detailed and Summary reports for a list of Users

Input Data:

  • Event Logs broken down by Academic Year

  • Sakai Site file for Site Title

Parameters:

  • List of user identifiers – can just be one user

  • Academic years to query

Output:

For each user:

  • Detailed breakdown of each session (see Detailed Session Breakdown below).

    events_for_user_year.csv


Illustration 1: Selected columns for one User session showing Sites and Content Type

  • Sites visited by this user with counts for each week

    sites_for_user_year.csv



Illustration 2: Sites visited by a User with Weekly Counts


  • Events and sites visited by this user with counts for each week

    events_for_user_year_weekly_event&site.csv

    This resembles Illustration 3 below, with weekly counts added.

  • Summary events and sites (no weekly counts)

    events_for_user_year_summary_event&site.csv


    Illustration 3: Event Types, Sites and Counts for a User


  • Total counts for each event across all sites

    events_for_user_year_summary_event_all_sites.csv


Illustration 4: Totals for each Event Type



Overall list of sites visited by these users:

    sites_for_user_list_.csv
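The user reports above differ only in how the counts are grouped. The following Python sketch makes that explicit. The row shape, sample users and site titles are hypothetical. The comments map each grouping to the report it roughly corresponds to.

```python
from collections import Counter
from datetime import datetime

# Assumed row shape: (user id, event type, site title, timestamp).
rows = [
    ("u1", "content.read", "Physics 1A", datetime(2011, 1, 11, 10, 0)),
    ("u1", "content.read", "Physics 1A", datetime(2011, 1, 12, 16, 0)),
    ("u1", "content.read", "Maths", datetime(2011, 1, 19, 9, 0)),
    ("u2", "content.read", "Maths", datetime(2011, 1, 19, 9, 5)),
]


def user_counts(rows, user, key):
    """Count one user's events, grouped by whatever key(event, site, when) returns."""
    return Counter(key(event, site, when)
                   for uid, event, site, when in rows if uid == user)


# Per (ISO year, week), event and site: roughly the weekly event&site report.
weekly = user_counts(rows, "u1", lambda e, s, w: (w.isocalendar()[:2], e, s))
# Per event and site, no weeks: roughly the summary event&site report.
summary = user_counts(rows, "u1", lambda e, s, w: (e, s))
# Per event across all sites: roughly the summary_event_all_sites report.
all_sites = user_counts(rows, "u1", lambda e, s, w: e)

print(weekly, summary, all_sites, sep="\n")
```

The weekly key uses the ISO year together with the week number. An academic year spans two calendar years, so this stops the same week number from two different years being merged into one count.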



Detailed and Summary reports for a list of Sites

Input Data:

  • Event Logs broken down by Academic Year

  • Users and the Institutions (eg colleges, departments) they belong to

  • Sakai Site file for Site Title for other sites visited.

Parameters:

  • List of Site Titles – can just be one site (will also take a Site Id).

  • Academic years to query

  • Whether to look at admin users, non-admin users or both

Output:

For each user:

  • Detailed breakdown of each session (see later).

    usertype_events_for_site_year.csv

  • List of users which have logged events on this site.

    active_users_for_site_year.csv

    These can be joined across sites and pivoted to give a list of all active users who have visited these sites. This data was used for the network visualisation in Gephi.

  • Events and sites visited by this user with counts for each week, covering all sites visited within any session in which the requested site is visited.

    usertype_events_for_site_year_weekly_event&site.csv

  • Events and sites visited by this user with counts for each week, for the requested site only.

    usertype_events_for_site_year_weekly_event_this_site.csv

  • Summary events and sites (no weekly counts), covering all sites visited within any session in which the requested site is visited.

    usertype_events_for_site_year_summary_event&site.csv


    Illustration 5: Sample Rows for this report


  • Summary events and sites (no weekly counts), for the requested site only.

    usertype_events_for_site_year_summary_event_this_site.csv


    Illustration 6: Sample rows for this report (just events for this site)
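One plausible way to turn the joined active-user lists into a network of sites is to link two sites whenever they share active users, weighting each link by the number of shared users. The sketch below does that and writes an edge table. The input mapping is an assumption, and we have not confirmed that this is exactly how the Gephi visualisation was built. The Source, Target and Weight column names follow the edge-table layout that Gephi's spreadsheet import recognises.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Assumed input: the active_users_for_site_year.csv files joined into
# one mapping of site title -> set of active user ids.
active_users = {
    "Physics 1A": {"u1", "u2", "u3"},
    "Maths": {"u2", "u3"},
    "Chemistry": {"u3", "u4"},
}


def site_edges(active_users):
    """Undirected site-to-site edges weighted by the number of shared users."""
    weights = Counter()
    for a, b in combinations(sorted(active_users), 2):
        shared = len(active_users[a] & active_users[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def edges_csv(weights):
    """Edge table with Source, Target and Weight columns."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), w in sorted(weights.items()):
        writer.writerow([a, b, w])
    return out.getvalue()


print(edges_csv(site_edges(active_users)))
```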


Detailed Session Breakdown

This is much the same whether we are looking at a list of users or a list of sites. A good deal of information has already been extracted and derived, but the tool needs further development (or spreadsheet pivoting) to summarise it into useful reports and visualisations.

Warning:

Site and Tool information is embedded in the Event Ref field, so institutions other than the University of Cambridge that use the Sakai VLE will need customised code to extract it.

Session Information (Only produced for a Site request)

A session is normally a chain of events that takes place between a user logging in and logging out (or being automatically logged out).

The detailed breakdown of each session includes the session length, an indication of whether the logout was forced by the system after a period of inactivity, and the time of day (am, pm, eve or night) when the session occurred. This data could be used to analyse how session length or time of day varies across sites and over the year.


Illustration 7: Selected columns for a Session for a Site showing Session Information
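The session fields could be derived roughly as follows. This sketch assumes that each session's events are already grouped together. It treats a session with no explicit "user.logout" event as having been logged out by the system. The hour boundaries for am, pm, eve and night are our own assumption, not the tool's.

```python
from datetime import datetime


def time_of_day(when):
    """Bucket a timestamp; these hour boundaries are assumptions."""
    hour = when.hour
    if 6 <= hour < 12:
        return "am"
    if 12 <= hour < 18:
        return "pm"
    if 18 <= hour < 23:
        return "eve"
    return "night"


def summarise_session(events):
    """events: (event type, timestamp) pairs for one session, in any order."""
    events = sorted(events, key=lambda ev: ev[1])
    start, end = events[0][1], events[-1][1]
    return {
        "length_mins": (end - start).total_seconds() / 60,
        # Assumption: no explicit logout means the system timed the user out.
        "forced_logout": not any(name == "user.logout" for name, _ in events),
        "time_of_day": time_of_day(start),
    }


session = [
    ("content.read", datetime(2011, 3, 1, 19, 20)),
    ("user.login", datetime(2011, 3, 1, 19, 5)),
    ("user.logout", datetime(2011, 3, 1, 19, 50)),
]
print(summarise_session(session))
```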


Event Information

'Routine' events such as login and logout, searching for announcements and pres.begin have been excluded from this report (they can be turned back on within the code if required).
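Excluding routine events amounts to filtering against a set of event types. In this sketch the set is a parameter, so passing an empty set turns the excluded events back on. The login, logout and pres.begin names follow Sakai's event naming as we understand it. The text does not give the event name for announcement searches, so it is left out.

```python
# Event types treated as routine noise (an assumed list; extend as needed).
ROUTINE_EVENTS = {"user.login", "user.logout", "pres.begin"}


def interesting(events, skip=ROUTINE_EVENTS):
    """Keep (event type, ref) pairs whose type is not in skip."""
    return [ev for ev in events if ev[0] not in skip]


log = [("user.login", "/session"), ("content.read", "/content/group/abc123/notes.pdf")]
print(interesting(log))             # routine events removed
print(interesting(log, skip=set())) # everything kept
```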

Institution Details

The first two institutions for each user are included where available, together with an indication of whether the user is an admin user, a regular user or both. This is based on a flag set in the Site file for admin users.


Event Time

As well as a time stamp for each event, we include the calendar year and week number.
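Week numbers depend on the convention used. This sketch assumes ISO 8601 weeks (Python's isocalendar) and a timestamp format of 'YYYY-MM-DD HH:MM:SS'. The Perl tool may use a different week convention.

```python
from datetime import datetime


def year_and_week(timestamp):
    """Calendar year and ISO 8601 week number for an event timestamp.

    Assumes timestamps are formatted like '2011-01-12 09:00:00'.
    """
    when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    iso_week = when.isocalendar()[1]
    return when.year, iso_week


print(year_and_week("2011-01-12 09:00:00"))
print(year_and_week("2011-01-02 10:15:00"))  # a Sunday in ISO week 52 of 2010
```

The second call shows the catch in pairing the calendar year with an ISO week. The first days of January can fall in week 52 or 53 of the previous ISO year. Grouping by ISO year and week avoids mixing those days with the end of the same calendar year.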

Event, Site and Tool

This shows the event type (eg content.read). The Site is extracted from the Event Ref using knowledge of what the Event Ref contains for each event type. This routine will most likely need changing for other institutions' implementations of the Sakai VLE. The Sakai Site file is used to convert a Site Number to a Site Title. Where there is no match in the Sakai Site file, '*Site not found*' is shown.
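Here is a minimal sketch of rule-based site extraction. It assumes each event-type prefix maps to a pattern that captures the site id from the Event Ref. The prefixes and patterns below are illustrative, not Cambridge's actual rules, and the sites dictionary stands in for the Sakai Site file.

```python
import re

# Illustrative rules only: event type prefix -> pattern capturing the site id.
SITE_RULES = [
    ("content.", re.compile(r"^/content/group/([^/]+)/")),
    ("annc.", re.compile(r"^/announcement/msg/([^/]+)/")),
    ("site.", re.compile(r"^/site/([^/]+)")),
]


def site_title(event, ref, sites):
    """Site title for an event, '*Site not found*' if the id is unknown,
    or None if no rule knows how to read this kind of ref."""
    for prefix, pattern in SITE_RULES:
        if event.startswith(prefix):
            match = pattern.match(ref)
            if match:
                return sites.get(match.group(1), "*Site not found*")
    return None


sites = {"abc123": "Physics 1A"}  # stand-in for the Sakai Site file
print(site_title("content.read", "/content/group/abc123/notes/week1.pdf", sites))
```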

The Tool used is derived from the Event Type and the Event Ref. For some events, such as those dealing with content, the exact tool has to be determined from further information in the reference, which may point to, for example, the assignments, announcements, calendar, course outline, forums or mail tool.

The reports that the Perl tool produces are based on events, not on the underlying tools, and further work would be needed to report by tool.

Content and content type

The content can be extracted from the Event Ref for many events, including content.read. There is a rudimentary analysis of what the content contains, based on searching for terms such as “Examination”, “Exercise”, “Timetable” and “Syllabus”. This is printed in the detailed event report, but further work would be needed to look at which sites display what type of content or when students access it.

The length of the content address string and its depth in the folder hierarchy have also been extracted to allow future analysis here.
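Here is a sketch of the rudimentary content analysis and the two path measures. Keyword matching is a case-insensitive substring test on the reference. Depth is taken to mean the number of non-empty path segments. Both are assumptions about how the Perl tool does it.

```python
# Keywords from the text; matching is a case-insensitive substring test.
KEYWORDS = ("Examination", "Exercise", "Timetable", "Syllabus")


def analyse_content(path):
    """Rudimentary content type, string length and path depth for a content ref."""
    lowered = path.lower()
    kinds = [word for word in KEYWORDS if word.lower() in lowered]
    # Depth here is the number of non-empty segments in the path.
    depth = len([part for part in path.split("/") if part])
    return {"types": kinds, "length": len(path), "depth": depth}


print(analyse_content("/content/group/abc123/Lectures/Exam Timetable.pdf"))
```

A plain substring test is deliberately crude. "Exam" does not match "Examination", for example, which is one reason this counts as rudimentary.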

Created By, Created On, Modified By and Modified On

(Currently only printed in reports produced for Users)

Records who created and last modified this site and when.

Counts

Various counts are printed for use in spreadsheet pivoting.

The following are currently only printed in reports produced for Sites:

Search For

The search terms could be used to investigate what users search for and, combined with the next site they go to, to understand more about search issues.
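Here is a sketch that pairs each search with the next site visited in the same session. The event name "search.query" and the row shape are hypothetical. The real event names depend on the search tool deployed.

```python
SEARCH_EVENT = "search.query"  # hypothetical event name


def search_then_next_site(session):
    """Pair each search term with the next site visited after it.

    session: time-ordered list of (event type, site title, search term or None).
    """
    pairs = []
    for i, (event, _site, term) in enumerate(session):
        if event != SEARCH_EVENT:
            continue
        next_site = next(
            (site for ev, site, _ in session[i + 1:] if ev != SEARCH_EVENT and site),
            None,  # the session ended without another site visit
        )
        pairs.append((term, next_site))
    return pairs


session = [
    ("search.query", "Physics 1A", "past papers"),
    ("content.read", "Maths", None),
    ("search.query", "Maths", "timetable"),
]
print(search_then_next_site(session))
```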

Assessment Title

For Samigo Tests and Quizzes the tool can determine the title of the Assessment provided it has access to the following files:

sam_assessmentgrading_t

sam_publishedassessment_t
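Here is a sketch of the Samigo lookup as a two-step join over exports of the two tables. It assumes the Event Ref yields an assessment grading id. It also assumes the column names shown, which come from the Samigo schema as we understand it and should be checked against your own data.

```python
import csv
import io

# Hypothetical CSV exports of the two Samigo tables.
GRADING_CSV = """ASSESSMENTGRADINGID,PUBLISHEDASSESSMENTID
501,7
502,8
"""
PUBLISHED_CSV = """ID,TITLE
7,Week 3 Quiz
"""


def assessment_titles(grading_csv, published_csv):
    """Map an assessment grading id to the title of its published assessment."""
    published = csv.DictReader(io.StringIO(published_csv))
    title_by_id = {row["ID"]: row["TITLE"] for row in published}
    grading = csv.DictReader(io.StringIO(grading_csv))
    # None marks a grading record whose published assessment is missing.
    return {row["ASSESSMENTGRADINGID"]: title_by_id.get(row["PUBLISHEDASSESSMENTID"])
            for row in grading}


titles = assessment_titles(GRADING_CSV, PUBLISHED_CSV)
print(titles)
```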

Evaluation Owner and Title

For the Swift Tool, the Perl tool can determine the Evaluation owner and title provided it has access to the following files:

eval_response

eval_adhoc_group






