Metadata-Version: 2.1
Name: pyassetpricing
Version: 1.0
Summary: Python library for asset pricing research
Home-page: https://github.com/chulwoohan/pyanomaly
Author: Chulwoo Han and Jongho Kang
Author-email: chulwoo.han@skku.edu
Project-URL: Documentation, https://pyanomaly.readthedocs.io/en/latest/index.html
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: LICENSE.txt
Requires-Dist: wrds
Requires-Dist: openpyxl
Requires-Dist: pandas
Requires-Dist: statsmodels
Requires-Dist: numba
Requires-Dist: matplotlib
Requires-Dist: scikit-learn
Requires-Dist: pyarrow

==========
PyAnomaly
==========

PyAnomaly is a comprehensive Python library for asset pricing research with a focus on firm characteristic and factor generation.
It covers the majority of the firm characteristics published in the literature and contains various analytic tools that are
commonly used in asset pricing research, such as quantile portfolio construction, factor regression, and cross-sectional regression.
The purpose of PyAnomaly is *NOT* to generate firm characteristics in a fixed manner. Rather, we aim to build
a package that can serve as a standard library for asset pricing research and help reduce *non-standard errors*.

The current list of the firm characteristics supported by PyAnomaly can be found in Coverage.
PyAnomaly is a live project and we plan to add more firm characteristics and functionalities going forward. We also welcome contributions
from other scholars.

PyAnomaly is very efficient, comprehensive, and flexible.

    Efficiency
        PyAnomaly can generate over 200 characteristics from 1950 onward in under one hour (tested on a desktop with
        a 12th Gen Intel(R) Core(TM) i9-12900KS, 128GB RAM, and a 1GB network). Once the data have been downloaded from
        WRDS, the processing time is under 15 minutes.
        To achieve this, PyAnomaly uses the numba, multiprocessing, and asyncio packages where possible, but sparingly,
        so that the code remains readable.
        The library is slower on the first run because numba-jitted functions are compiled and cached.

    Comprehensiveness
        PyAnomaly supports over 200 firm characteristics published in the literature. It covers most characteristics in
        Green et al. (2017) and Jensen et al. (2021), except those that use IBES data. It also provides
        various tools for asset pricing research.

    Flexibility
        PyAnomaly adopts an object-oriented design and is easy to customize and extend.
        This means users can easily change the definition of an existing characteristic, add a new characteristic, or
        change configurations to run the program. For instance, a user can choose whether to update annual accounting
        variables quarterly (using Compustat.fundq) or annually (using Compustat.funda), or whether
        to use the latest market equity or the year-end market equity when generating firm characteristics.


Main Features
=============

* Efficient data download from WRDS using asyncio.
* Generation of over 200 firm characteristics. You can choose which firm characteristics to generate.
* Factor models

    * Fama-French 3-factor model
    * Fama-French 5-factor model
    * Hou-Xue-Zhang 4-factor model
    * Stambaugh-Yuan 4-factor model

* Analytics

    * Cross-sectional regression
    * 1-D sort
    * 2-D sort
    * Rolling regression
    * Quantile portfolio
    * Long-short portfolio
    * Portfolio performance analysis

* Data tools

    * Filtering
    * Winsorizing
    * Trimming
    * Grouping
    * Population
    * And more...
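
The difference between winsorizing and trimming can be illustrated with a minimal, library-independent sketch.
The functions below are hypothetical stand-ins written for this illustration, not PyAnomaly's ``datatools`` API,
and the quantile bounds and sample values are assumptions:

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of an ascending-sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def winsorize(values, lower=0.1, upper=0.9):
    """Clip values outside the [lower, upper] quantile bounds to those bounds."""
    s = sorted(values)
    lo, hi = quantile(s, lower), quantile(s, upper)
    return [min(max(v, lo), hi) for v in values]


def trim(values, lower=0.1, upper=0.9):
    """Drop values outside the [lower, upper] quantile bounds."""
    s = sorted(values)
    lo, hi = quantile(s, lower), quantile(s, upper)
    return [v for v in values if lo <= v <= hi]


# Made-up sample with one extreme value in each tail.
sample = [-50.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
winsorized = winsorize(sample)  # same length, tails pulled in to the bounds
trimmed = trim(sample)          # shorter, tails removed
```

Winsorizing keeps every observation and only caps the tails, whereas trimming shrinks the sample, which matters
when the data are later merged or sorted into portfolios.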


Changelog
=========

v0.9 - 2022.01.15
-----------------

Initial version.

v0.923 - 2022.01.16
--------------------

Multiprocessing in ``datatools.populate()`` has been updated to increase the speed.


v0.930 - 2022.01.17
--------------------

The trend factor of Han, Zhou, and Zhu (2016) has been added. We thank Guofu Zhou for this suggestion.


v0.931 - 2022.01.23
--------------------

A bug where FUNDA.c_ebitda_mev did not return the result has been fixed.

A new characteristic method for Enterprise multiple (Loughran and Wellman, 2011), c_enterprise_multiple,
has been added to FUNDA, as the previous one (c_ebitda_mev) that implements JKP's SAS code uses a different definition
from the original definition. This new method uses the original definition.

v1.0 - 2024.02.28 (Major Update)
--------------------------------

There are several important updates in this version and some functions are not backward compatible.
See the examples in Cookbook for changes.

**Major updates**

- Performance upgrade

    The library is now significantly faster and more memory efficient.

- ``panel.FCPanel``

    The ``Panel`` class has been divided into two classes: ``Panel`` class that serves as the base class for panel
    data analysis and ``FCPanel`` class that inherits ``Panel`` and serves as the base class for firm characteristics
    generation. ``FUNDA``, ``FUNDQ``, ``CRSPM``, ``CRSPD``, and ``Merge`` now inherit ``FCPanel`` instead of ``Panel``.

- ``characteristics.CRSPDRaw``

    Previously, ``CRSPD.data`` contained daily crspd data and ``CRSPD.chars`` contained monthly firm characteristics.
    In the new version, a new class ``CRSPDRaw`` handles daily crspd data and is a member of ``CRSPD``.
    ``CRSPDRaw.data`` contains daily crspd data and ``CRSPD.data`` contains monthly firm characteristics.

- Factor models

    Two new factor models, the Fama-French 5-factor and Stambaugh-Yuan 4-factor models, have been added.

- CRSP-Compustat link

    If a user does not have a WRDS subscription for ccmxpf_linktable, PyAnomaly will create a link table internally
    and use it to map permno and gvkey. Compared to using ccmxpf_linktable, about 13% of gvkeys are different when
    using the internal link table ('crsp_comp_linktable').
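
As a rough illustration of the idea behind a cusip-based link, the sketch below maps permno to gvkey through
shared CUSIPs. It is not PyAnomaly's implementation: the function name, the choice to match on the first eight
characters, the omission of link dates, and all identifiers in the sample are assumptions made for this example.

```python
def build_link_table(crsp_rows, comp_rows, n_chars=8):
    """Map permno to gvkey for securities whose CUSIPs agree on the first n_chars characters.

    crsp_rows: iterable of (permno, cusip) pairs.
    comp_rows: iterable of (gvkey, cusip) pairs.
    """
    # Index the Compustat side by the truncated CUSIP.
    gvkey_by_cusip = {cusip[:n_chars]: gvkey for gvkey, cusip in comp_rows}
    links = {}
    for permno, cusip in crsp_rows:
        gvkey = gvkey_by_cusip.get(cusip[:n_chars])
        if gvkey is not None:  # keep only securities found on both sides
            links[permno] = gvkey
    return links


# Made-up identifiers for illustration only.
crsp = [(10001, "12345610"), (10002, "98765410"), (10003, "55555510")]
comp = [("001000", "123456109"), ("002000", "987654103")]
links = build_link_table(crsp, comp)
```

A link built this way ignores historical identifier changes, which is one plausible reason why a cusip-based
table can disagree with ccmxpf_linktable for some securities.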

**Minor updates**

- Default log directory has been added as ``config.log_dir``.
- Float datatype can be configured to float32 using ``set_config(float_type='float32')``.
- New file format, parquet, has been added. To change the file format to parquet,
  do ``set_config(file_format='parquet')``. The default file format is pickle.
- ``log.set_log_path()`` has been revised so that it can create a log file automatically from a file name.
- ``datatools.classify()`` has been revised so that if the characteristic is a binary variable, the class is either
  0 or (number of quantiles - 1). In the previous version, the class was not deterministic.
- ``jkp.py`` has been renamed as ``factors.py``.
- ``analytics.rolling_beta()`` has been renamed as ``numba_support.rolling_regression()``.
- ``panel.Panel.rolling_beta()`` has been renamed as ``panel.Panel.rolling_regression()``.
- Input arguments have been changed in the following functions.

    - ``datatools.classify()``
    - ``datatools.trim()``
    - ``datatools.filter()``
    - ``datatools.winsorize()``

- A new argument ``fname`` has been added to ``load_data()`` of ``FUNDA``, ``FUNDQ``, ``CRSPM``, and ``CRSPD``.
  If the funda, fundq, crspm, or crspd data have been modified (e.g., cleansed) and saved under different file names,
  those names can be passed via ``fname`` to load the modified data files.

- ``mapping.xlsx``: New columns, original sample start date (sample_start) and original sample end date (sample_end),
  have been added.
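
The revised behavior of ``datatools.classify()`` for binary variables can be pictured with the sketch below.
It is a hypothetical stand-in, not the library's code: the function name, the rule that "binary" means at most
two distinct values, and the rank-based fallback are all assumptions.

```python
def classify_sketch(values, n_quantiles):
    """Assign each value a class in 0..n_quantiles-1.

    Binary inputs are mapped deterministically to the lowest and highest class.
    """
    distinct = sorted(set(values))
    if len(distinct) <= 2:
        # Binary (or constant) variable: the smaller value goes to class 0,
        # the larger one to class n_quantiles - 1.
        low = distinct[0]
        return [0 if v == low else n_quantiles - 1 for v in values]
    # Otherwise use equal-count classes by rank; ties are broken by position
    # because Python's sort is stable.
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * n_quantiles // len(values)
    return classes


binary_classes = classify_sketch([0, 1, 1, 0, 1], n_quantiles=5)
ranked_classes = classify_sketch([10, 20, 30, 40], n_quantiles=2)
```

With a deterministic mapping, a long-short portfolio built from the top and bottom classes always compares the
two groups of a binary characteristic.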

**New functions**

    - ``analytics.grs_test()``: GRS (Gibbons, Ross, and Shanken, 1989) test.
    - ``config.set_config()``: Set library configuration.
    - ``config.get_config()``: Get library configuration.
    - ``datatools.apply_to_groups()``: Group data and apply a function to each group.
    - ``datatools.apply_to_groups_jit()``: Group data and apply a function to each group (jitted version).
    - ``datatools.apply_to_groups_reduce_jit()``: Group data and apply a reduce function to each group (jitted version).
    - ``numba_support.roll_sum()``: Rolling sum.
    - ``numba_support.roll_mean()``: Rolling mean.
    - ``numba_support.roll_std()``: Rolling standard deviation.
    - ``numba_support.roll_var()``: Rolling variance.
    - ``numba_support.rank()``: Rank.
    - ``numba_support.bivariate_regression()``: Bivariate regression.
    - ``numba_support.regression()``: Multivariate regression.
    - ``numba_support.rolling_regression()``: Rolling regression.
    - ``panel.Panel.apply_to_ids()``: Apply a function to each id group.
    - ``panel.Panel.apply_to_dates()``: Apply a function to each date group.
    - ``wrdsdata.WRDS.create_crsp_comp_linktable()``: Create a CRSP-Compustat link table using cusip.
    - ``wrdsdata.WRDS.add_gvkey_to_crsp_cusip()``: Add gvkey to m(d)sf and identify primary stocks using internal link table.
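
For readers unfamiliar with rolling regression, the following plain-Python sketch shows the general technique
for a single regressor. It does not reproduce ``numba_support.rolling_regression()``: the function name,
signature, and return format are assumptions made for this illustration.

```python
def rolling_regression_sketch(y, x, window):
    """Regress y on x over a trailing window ending at each observation.

    Returns a list with one (intercept, slope) tuple per observation, or None
    where the window is not yet full or x has no variation within the window.
    """
    out = []
    for t in range(len(y)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
            continue
        ys = y[t + 1 - window : t + 1]
        xs = x[t + 1 - window : t + 1]
        mx = sum(xs) / window
        my = sum(ys) / window
        sxx = sum((a - mx) ** 2 for a in xs)
        if sxx == 0:
            out.append(None)  # slope is undefined when x is constant
            continue
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        slope = sxy / sxx
        out.append((my - slope * mx, slope))
    return out


# Made-up data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
coefs = rolling_regression_sketch(y, x, window=3)
```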

**Deprecated functions**

    - ``characteristics.FUNDA.convert_to_monthly()``: Use ``Panel.populate()`` instead.
    - ``characteristics.FUNDQ.convert_to_monthly()``: Use ``Panel.populate()`` instead.
    - ``datatools.filter_n()``.
    - ``datatools.groupby_apply()``: Use ``datatools.apply_to_groups()``, ``datatools.apply_to_groups_jit()``, or
      ``datatools.apply_to_groups_reduce_jit()``.
    - ``datatools.groupby_apply_np()``: Use ``datatools.apply_to_groups()``, ``datatools.apply_to_groups_jit()``, or
      ``datatools.apply_to_groups_reduce_jit()``.
    - ``datatools.rolling_apply()``: Use ``datatools.apply_to_groups()``, ``datatools.apply_to_groups_jit()``, or
      ``datatools.apply_to_groups_reduce_jit()``.
    - ``datatools.rolling_apply_np()``: Use ``datatools.apply_to_groups()``, ``datatools.apply_to_groups_jit()``, or
      ``datatools.apply_to_groups_reduce_jit()``.

**Bug fix**

    - ``characteristics.FUNDA.c_currat()``: A bug where the result was not returned has been fixed.
    - ``characteristics.FUNDQ.c_ni_inc8q()``: In the previous version, dibq (difference of ibq) was set to nan in the
      first 4 quarters. This made some valid ni_inc8q values in the first 12 quarters become nan. In the new version,
      all nan values of dibq are set to 0 before ni_inc8q is calculated, and ni_inc8q is set to nan where dibq is nan.
      The revised logic does not lose valid ni_inc8q values in the first 12 quarters.
    - ``characteristics.CRSPD.zero_trades_21d()``: Fixed a division by 0 when the total turnover is 0.
      When counting the number of days in a month, only the days when turnover is not nan are counted. Before, all days
      were counted.
    - ``characteristics.CRSPD.c_zero_trades_126d()``: Fixed a division by 0 when the total turnover is 0.
    - ``characteristics.CRSPD.c_zero_trades_252d()``: Fixed a division by 0 when the total turnover is 0.
    - ``characteristics.CRSPD.c_rmax5_21d()``: A bug that occurred when a month has only a few distinct return values
      has been fixed.
      Suppose the return is positive on two days and 0 on the other days. Previously, rmax5_21d was the mean of the
      two positive returns. In the new version, it is the mean of the two positive returns and three 0 returns.
      Also, if the number of days with valid (non-nan) returns is 5 or fewer, the result is nan.
    - ``characteristics.Merge.age()``: In the previous version, age was the max of (funda history, crspm history).
      This logic could make the age decrease when funda history is missing: if funda data exist from 2000.01 to 2020.12
      and crsp data from 2001.01 to 2022.12, the age decreases in 2021.01. The logic has been revised so that the
      age doesn't decrease when funda data are missing.
    - ``panel.Panel.rolling()``: When ``lag`` > 0, shifted rows were not properly removed. This bug has been fixed.
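
The corrected rmax5_21d rule described above can be expressed as a short sketch. This is an illustration of the
stated logic only, not the code in ``CRSPD``: the function name is hypothetical and the input is assumed to be a
plain list of one month's daily returns, with nan marking missing values.

```python
import math


def rmax5_sketch(daily_returns):
    """Mean of the five largest valid daily returns in a month.

    Zero returns count as valid observations. The result is nan when five or
    fewer valid (non-nan) returns are available.
    """
    valid = [r for r in daily_returns if not math.isnan(r)]
    if len(valid) <= 5:
        return math.nan
    top5 = sorted(valid, reverse=True)[:5]
    return sum(top5) / 5


# Made-up month: two positive returns and nineteen zero returns.
month = [0.02, 0.04] + [0.0] * 19
value = rmax5_sketch(month)  # averages 0.04, 0.02, and three zeros

# Only five valid returns: too few observations, so the result is nan.
sparse = rmax5_sketch([0.01] * 5 + [math.nan] * 16)
```

Under the previous behavior described above, the same made-up month would have produced the mean of the two
positive returns only.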

