CAMRa2011

Evaluation

We have received some questions about the evaluation of the two tracks. This page explains the evaluation approaches in detail.

Please note that you are welcome to use other evaluation metrics as well; however, in order to gain a simple overview of all the different approaches, we need you to include the metrics below.

Due to the different recommendation approaches used by different teams, there are two means of evaluation, depending on whether your approach is group-based (household-based) or user-based.

Note that the difference between a group-based and a user-based approach is whether the list of recommended items is recommended to individual users or to a set of users (i.e. the household).

The type of approach will be taken into consideration by the organizers when comparing the obtained results.

Track 1 – Household Recommendations

Track 1 should be evaluated using the information contained in the file public_eval_t1.csv. All users in the file belong to a household; the household precision and household MAP values should be averaged per household.

Furthermore, at the latest in your camera-ready version, you should also include the average ranking (position in the list of recommended items) of the movies found in the public_eval_t1.csv file, averaged per movie per household. That is, for each movie in the evaluation set, average the movie's rank over the members of the household; then average these values over all movies in the household; and finally average over all households.
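The three averaging steps above can be sketched as follows. The data layout is a hypothetical one (a ranked recommendation list per user, and per-household user/movie sets parsed from public_eval_t1.csv); adapt it to your own format.

```python
def average_household_rank(rec_lists, eval_data):
    """Average rank of evaluation movies, per movie per household.

    rec_lists: {user_id: [movie_id, ...]}  ranked recommendations per user.
    eval_data: {household_id: {"users": [...], "movies": [...]}}.
    (Both structures are assumptions; parse them from public_eval_t1.csv.)
    """
    household_avgs = []
    for hh in eval_data.values():
        movie_avgs = []
        for movie in hh["movies"]:
            # Step 1: rank of the movie for each household member (1-based).
            ranks = [rec_lists[u].index(movie) + 1
                     for u in hh["users"] if movie in rec_lists[u]]
            if ranks:
                movie_avgs.append(sum(ranks) / len(ranks))
        # Step 2: average over all movies in the household.
        if movie_avgs:
            household_avgs.append(sum(movie_avgs) / len(movie_avgs))
    # Step 3: average over all households.
    return sum(household_avgs) / len(household_avgs)
```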

User-based approaches:

Example of household precision@1, 2, 3 for a household with two members and two movies (the division by 2 at every step corresponds to the number of users in the household)
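A minimal sketch of this user-based household precision, assuming each member has their own ranked list and the household's evaluation movies act as the relevant set (the function names and data layout are illustrative, not prescribed by the challenge):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def household_precision_at_k(rec_lists, users, relevant, k):
    """User-based: average each member's precision@k over the household,
    i.e. divide by the number of users in the household."""
    return sum(precision_at_k(rec_lists[u], relevant, k)
               for u in users) / len(users)
```

For a two-member household with relevant movies {m1, m2}, each member's precision@k is computed on their own list and the two values are averaged (the "division by 2" from the example above).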

Group-based approaches:

Example of household precision@1, 2, 3 for a household with 3 users and two movies
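For group-based approaches there is a single ranked list per household, so precision@k can be computed directly on that list with no per-user averaging. A minimal sketch (illustrative data layout):

```python
def group_precision_at_k(group_list, relevant, k):
    """Group-based: one ranked list per household; precision@k on that list,
    with the household's evaluation movies as the relevant set."""
    return sum(1 for item in group_list[:k] if item in relevant) / k
```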

Track 2 – Identifying Ratings

Track 2 should be evaluated using the information contained in the file public_eval_t2.csv and/or housemovies_t2.csv.

The metric to be used for this track is a classification error rate by household, i.e. the number of correct predictions divided by the total number of predictions, averaged per household. Additionally, similar error rates should be calculated for each household size, i.e. one error rate, averaged per household, over all households of size {2, 3, ...}.
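The per-household rate and the per-size breakdown can be sketched as follows; the rate is computed exactly as defined above (correct predictions divided by total predictions). The input format is a hypothetical one: per household, its size and a list of correctness flags derived from public_eval_t2.csv.

```python
from collections import defaultdict

def household_rates(predictions):
    """predictions: {household_id: (size, [correct_flags])} (assumed layout).

    Returns the rate averaged over all households, plus one averaged
    rate per household size."""
    per_hh = {hh: sum(flags) / len(flags)
              for hh, (size, flags) in predictions.items()}
    overall = sum(per_hh.values()) / len(per_hh)

    by_size = defaultdict(list)
    for hh, (size, flags) in predictions.items():
        by_size[size].append(per_hh[hh])
    per_size = {s: sum(v) / len(v) for s, v in by_size.items()}
    return overall, per_size
```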

 

Other metrics

Household Area under the ROC Curve should be calculated according to:

  • Compute the AUC for each user (i.e. each true positive is one step up, each false positive one step right).
  • Average the areas within each household.
  • Calculate one final average AUC over all households.
  • Please note that a completely random approach will have an AUC value of 0.5.
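The per-user step of the procedure above can be sketched as follows, assuming each user has a ranked list and a set of relevant items (both hypothetical structures). Walking down the list, each relevant item moves the ROC curve one step up and each non-relevant item one step right; the accumulated area is then normalised.

```python
def user_auc(ranked, relevant):
    """AUC of one user's ranked list: true positive = one step up,
    false positive = one step right; area normalised to [0, 1]."""
    pos = sum(1 for item in ranked if item in relevant)
    neg = len(ranked) - pos
    if pos == 0 or neg == 0:
        return 0.5  # degenerate case; convention chosen here, not by the challenge
    height, area = 0, 0
    for item in ranked:
        if item in relevant:
            height += 1      # true positive: step up
        else:
            area += height   # false positive: step right, add current height
    return area / (pos * neg)
```

These per-user values are then averaged within each household, and finally over all households, as listed above.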

MAP