.. _FAQ:

FAQ
===

You will find here the Frequently Asked Questions, as well as some other
use-case examples that are not part of the User Guide.

How to get the top-N recommendations for each user
----------------------------------------------------------

Here is an example where we retrieve retrieve the top-10 items with highest
rating prediction for each user in the MovieLens-100k dataset. We first train
an SVD algorithm on the whole dataset, and then predict all the ratings for the
pairs (user, item) that are not in the training set. We then retrieve the
top-10 prediction for each user.

.. literalinclude:: ../../examples/top_n_recommendations.py
    :caption: From file ``examples/top_n_recommendations.py``
    :name: top_n_recommendations.py
    :lines: 10-

.. _precision_recall_at_k:

How to compute precision@k and recall@k
-----------------------------------------------------------------------

Here is an example where we compute Precision@k and Recall@k for each user:

:math:`\text{Precision@k} = \frac{ | \{ \text{Recommended items that are relevant} \} | }{ | \{ \text{Recommended items} \} | }`
:math:`\text{Recall@k} = \frac{ | \{ \text{Recommended items that are relevant} \} | }{ | \{ \text{Relevant items} \} | }`

An item is considered relevant if its true rating :math:`r_{ui}` is greater
than a given threshold.  An item is considered recommended if its estimated
rating :math:`\hat{r}_{ui}` is greater than the threshold, and if it is among
the k highest estimated ratings.

.. literalinclude:: ../../examples/precision_recall_at_k.py
    :caption: From file ``examples/precision_recall_at_k.py``
    :name: precision_recall_at_k.py
    :lines: 7-

.. _get_k_nearest_neighbors:

How to get the k nearest neighbors of a user (or item)
--------------------------------------------------------------

You can use the :meth:`get_neighbors()
<surprise.prediction_algorithms.algo_base.AlgoBase.get_neighbors>` methods of
the algorithm object. This is only relevant for algorithms that use a
similarity measure, such as the :ref:`k-NN algorithms
<pred_package_knn_inpired>`.

Here is an example where we retrieve the 10 nearest neighbors of the movie Toy
Story from the MovieLens-100k dataset. The output is:

.. parsed-literal::

    The 10 nearest neighbors of Toy Story are:
    Beauty and the Beast (1991)
    Raiders of the Lost Ark (1981)
    That Thing You Do! (1996)
    Lion King, The (1994)
    Craft, The (1996)
    Liar Liar (1997)
    Aladdin (1992)
    Cool Hand Luke (1967)
    Winnie the Pooh and the Blustery Day (1968)
    Indiana Jones and the Last Crusade (1989)

There's a lot of boilerplate because of the conversions between movie names and
their raw/inner ids (see :ref:`this note <raw_inner_note>`), but it all boils
down to the use of :meth:`get_neighbors()
<surprise.prediction_algorithms.algo_base.AlgoBase.get_neighbors>`:

.. literalinclude:: ../../examples/k_nearest_neighbors.py
    :caption: From file ``examples/k_nearest_neighbors.py``
    :name: k_nearest_neighbors.py
    :lines: 10-

Naturally, the same can be done for users with minor modifications.

.. _serialize_an_algorithm:

How to serialize an algorithm
-----------------------------

Prediction algorithms can be serialized and loaded back using the :func:`dump()
<surprise.dump.dump>` and :func:`load() <surprise.dump.load>` functions. Here
is a small example where the SVD algorithm is trained on a dataset and
serialized. It is then reloaded and can be used again for making predictions:

.. literalinclude:: ../../examples/serialize_algorithm.py
    :caption: From file ``examples/serialize_algorithm.py``
    :name: serialize_algorithm.py
    :lines: 9-

.. _further_analysis:

Algorithms can be serialized along with their predictions, so that can be
further analyzed or compared with other algorithms, using pandas dataframes.
Some examples are given in the two following notebooks:

    * `Dumping and analysis of the KNNBasic algorithm
      <http://nbviewer.jupyter.org/github/NicolasHug/Surprise/tree/master/examples/notebooks/KNNBasic_analysis.ipynb/>`_.
    * `Comparison of two algorithms
      <http://nbviewer.jupyter.org/github/NicolasHug/Surprise/tree/master/examples/notebooks/Compare.ipynb/>`_.

How to build my own prediction algorithm
----------------------------------------

There's a whole guide :ref:`here<building_custom_algo>`.

.. _raw_inner_note:

What are raw and inner ids
--------------------------

Users and items have a raw id and an inner id. Some methods will use/return a
raw id (e.g. the :meth:`predict()
<surprise.prediction_algorithms.algo_base.AlgoBase.predict>` method), while
some other will use/return an inner id.

Raw ids are ids as defined in a rating file or in a pandas dataframe. They can
be strings or numbers. Note though that if the ratings were read from a file
which is the standard scenario, they are represented as strings (see e.g.
:ref:`here <train_on_whole_trainset>`).

On trainset creation, each raw id is mapped to a unique
integer called inner id, which is a lot more suitable for `Surprise
<https://nicolashug.github.io/Surprise/>`_ to manipulate. Conversions between
raw and inner ids can be done using the :meth:`to_inner_uid()
<surprise.dataset.Trainset.to_inner_uid>`, :meth:`to_inner_iid()
<surprise.dataset.Trainset.to_inner_iid>`, :meth:`to_raw_uid()
<surprise.dataset.Trainset.to_raw_uid>`, and :meth:`to_raw_iid()
<surprise.dataset.Trainset.to_raw_iid>` methods of the :class:`trainset
<surprise.dataset.Trainset>`.


Can I use my own dataset with Surprise, and can it be a pandas dataframe
------------------------------------------------------------------------

Yes, and yes. See the :ref:`user guide <load_custom>`.

How to tune an algorithm parameters
-----------------------------------

You can tune the parameters of an algorithm with the :class:`GridSearch
<surprise.evaluate.GridSearch>` class as described :ref:`here
<tuning_algorithm_parameters>`. After the tuning, you may want to have an
:ref:`unbiased estimate of your algorithm performances
<unbiased_estimate_after_tuning>`.

How to get accuracy measures on the training set
------------------------------------------------

You can use the :meth:`build_testset()
<surprise.dataset.Trainset.build_testset()>` method of the :class:`Trainset
<surprise.dataset.Trainset>` object to build a testset that can be then used
with the :meth:`test()
<surprise.prediction_algorithms.algo_base.AlgoBase.test>` method:

.. literalinclude:: ../../examples/evaluate_on_trainset.py
    :caption: From file ``examples/evaluate_on_trainset.py``
    :name: evaluate_on_trainset.py
    :lines: 9-24

Check out the example file for more usage examples.

.. _unbiased_estimate_after_tuning:

How to save some data for unbiased accuracy estimation
------------------------------------------------------

If your goal is to tune the parameters of an algorithm, you may want to spare a
bit of data to have an unbiased estimation of its performances. For instance
you may want to split your data into two sets A and B. A is used for parameter
tuning using grid search, and B is used for unbiased estimation. This can be
done as follows:

.. literalinclude:: ../../examples/split_data_for_unbiased_estimation.py
    :caption: From file ``examples/split_data_for_unbiased_estimation.py``
    :name: split_data_for_unbiased_estimation.py
    :lines: 10-
