Trainset class
-
class surprise.Trainset(ur, ir, n_users, n_items, n_ratings, rating_scale, offset, raw2inner_id_users, raw2inner_id_items)
A trainset contains all useful data that constitutes a training set.
It is used by the fit() method of every prediction algorithm. You should not try to build such an object on your own but rather use the Dataset.folds() method or the DatasetAutoFolds.build_full_trainset() method.
Trainsets are different from Datasets. You can think of a Dataset as the raw data, and of Trainsets as higher-level data where useful methods are defined. Also, a Dataset may comprise multiple Trainsets (e.g. when doing cross-validation).
-
ur
defaultdict of list – The users' ratings. This is a dictionary containing lists of tuples of the form (item_inner_id, rating). The keys are user inner ids.
-
ir
defaultdict of list – The items' ratings. This is a dictionary containing lists of tuples of the form (user_inner_id, rating). The keys are item inner ids.
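To make the shape of these two dictionaries concrete, here is a minimal sketch that builds them from a handful of ratings. It is not the library's own construction code: the toy `ratings` list is made up for illustration, and it assumes the ratings are already expressed with inner ids.

```python
from collections import defaultdict

# Hypothetical toy data: (user_inner_id, item_inner_id, rating) triples.
ratings = [(0, 0, 4.0), (0, 1, 3.0), (1, 0, 5.0)]

ur = defaultdict(list)  # user inner id -> [(item_inner_id, rating), ...]
ir = defaultdict(list)  # item inner id -> [(user_inner_id, rating), ...]
for uid, iid, r in ratings:
    ur[uid].append((iid, r))
    ir[iid].append((uid, r))

# The count attributes follow directly from these structures.
n_users = len(ur)
n_items = len(ir)
n_ratings = len(ratings)
```

Storing every rating twice, once per user and once per item, lets an algorithm look up "everything this user rated" and "everyone who rated this item" without scanning the whole set.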
-
n_users
Total number of users \(|U|\).
-
n_items
Total number of items \(|I|\).
-
n_ratings
Total number of ratings \(|R_{train}|\).
-
rating_scale
tuple – The minimum and maximum ratings of the rating scale.
-
global_mean
The mean of all ratings \(\mu\).
-
all_ratings()
Generator function to iterate over all ratings.
Yields: A tuple (uid, iid, rating) where ids are inner ids (see this note).
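A generator with this behaviour can be sketched over a `ur`-style dictionary as follows. This is an illustrative standalone function, not the method's actual source, and the sample `ur` is made up.

```python
def all_ratings(ur):
    """Yield (uid, iid, rating) for every rating stored in ur (inner ids)."""
    for uid, u_ratings in ur.items():
        for iid, rating in u_ratings:
            yield uid, iid, rating

# Hypothetical toy data: user inner id -> [(item_inner_id, rating), ...]
ur = {0: [(0, 4.0), (1, 3.0)], 1: [(0, 5.0)]}
triples = list(all_ratings(ur))
```

Because it is a generator, the ratings are produced one at a time; wrapping it in `list()` as above materializes them all.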
-
build_anti_testset(fill=None)
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are not in the trainset, i.e. all the ratings \(r_{ui}\) where the user \(u\) is known, the item \(i\) is known, but the rating \(r_{ui}\) is not in the trainset. As \(r_{ui}\) is unknown, it is either replaced by the fill value or assumed to be equal to the mean of all ratings, global_mean.
Parameters: fill (float) – The value to fill unknown ratings. If None, the global mean of all ratings global_mean will be used.
Returns: A list of tuples (uid, iid, fill) where ids are raw ids.
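The logic described above can be sketched in plain Python. This is a hedged illustration rather than the library's implementation: the standalone function, its extra parameters, and the `inner2raw_*` lookup dictionaries are hypothetical, and item inner ids are assumed to run from 0 to n_items - 1.

```python
def build_anti_testset(ur, n_items, global_mean,
                       inner2raw_users, inner2raw_items, fill=None):
    """List (raw_uid, raw_iid, fill) for every known user/item pair
    that has no rating in ur."""
    fill = global_mean if fill is None else float(fill)
    anti_testset = []
    for u, u_ratings in ur.items():
        rated = {i for i, _ in u_ratings}  # items this user already rated
        for i in range(n_items):
            if i not in rated:
                anti_testset.append((inner2raw_users[u], inner2raw_items[i], fill))
    return anti_testset

# Hypothetical toy data.
ur = {0: [(0, 4.0), (1, 3.0)], 1: [(0, 5.0)]}
inner2raw_users = {0: "alice", 1: "bob"}
inner2raw_items = {0: "item_a", 1: "item_b"}

with_mean = build_anti_testset(ur, 2, 4.0, inner2raw_users, inner2raw_items)
with_zero = build_anti_testset(ur, 2, 4.0, inner2raw_users, inner2raw_items, fill=0)
```

The result holds n_users × n_items − n_ratings entries (here 2 × 2 − 3 = 1), so on a sparse dataset it can be far larger than the trainset itself.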
-
build_testset()
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are in the trainset, i.e. all the ratings returned by the all_ratings() generator. This is useful in cases where you want to test your algorithm on the trainset.
-
global_mean
Return the mean of all ratings.
It is computed only once, on first access.
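A compute-once value like this is commonly written as a property backed by a cached attribute. The sketch below shows that pattern under stated assumptions: `TinyTrainset` is a hypothetical stand-in class, and `n_computations` exists only to make the caching observable.

```python
class TinyTrainset:
    """Hypothetical stand-in holding only a ur-style dictionary."""

    def __init__(self, ur):
        self.ur = ur
        self._global_mean = None   # not computed yet
        self.n_computations = 0    # demo counter, not part of any real API

    @property
    def global_mean(self):
        # Compute on first access, then reuse the cached value.
        if self._global_mean is None:
            ratings = [r for u_ratings in self.ur.values() for _, r in u_ratings]
            self._global_mean = sum(ratings) / len(ratings)
            self.n_computations += 1
        return self._global_mean

ts = TinyTrainset({0: [(0, 4.0), (1, 3.0)], 1: [(0, 5.0)]})
first = ts.global_mean
second = ts.global_mean
```

Both accesses return the same value, but the mean is calculated only during the first one.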
-
knows_item(iid)
Indicate if the item is part of the trainset.
An item is part of the trainset if the item was rated at least once.
Parameters: iid (int) – The (inner) item id. See this note.
Returns: True if item is part of the trainset, else False.
-
knows_user(uid)
Indicate if the user is part of the trainset.
A user is part of the trainset if the user has at least one rating.
Parameters: uid (int) – The (inner) user id. See this note.
Returns: True if user is part of the trainset, else False.
-
to_inner_iid(riid)
Convert an item raw id to an inner id.
See this note.
Parameters: riid (str) – The item raw id.
Returns: The item inner id.
Return type: int
Raises: ValueError – When item is not part of the trainset.
-
to_inner_uid(ruid)
Convert a user raw id to an inner id.
See this note.
Parameters: ruid (str) – The user raw id.
Returns: The user inner id.
Return type: int
Raises: ValueError – When user is not part of the trainset.
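The raw-to-inner conversion amounts to a dictionary lookup that reports unknown ids as a ValueError. Here is a minimal sketch of that idea, assuming inner ids are handed out as consecutive integers in order of first appearance; the standalone function and the sample raw ids are illustrative, not the library's code.

```python
# Assign inner ids 0, 1, 2, ... in order of first appearance (assumption).
raw2inner_id_users = {}
for ruid in ["alice", "bob", "alice"]:  # hypothetical raw user ids
    if ruid not in raw2inner_id_users:
        raw2inner_id_users[ruid] = len(raw2inner_id_users)

def to_inner_uid(ruid):
    """Return the inner id for a raw user id, or raise ValueError."""
    try:
        return raw2inner_id_users[ruid]
    except KeyError:
        raise ValueError(f"User {ruid!r} is not part of the trainset.") from None

alice_id = to_inner_uid("alice")
bob_id = to_inner_uid("bob")
```

Calling `to_inner_uid("carol")` on this mapping raises ValueError, since "carol" never appeared in the data. The item-side conversion follows the same pattern with an item mapping.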
-