Metadata-Version: 2.4
Name: alphapurify
Version: 0.1.7
Summary: High-performance quantitative factor analysis and purification toolkit
Author-email: Elias Wu <elaiswu71@gmail.com>
License-Expression: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: pandas
Requires-Dist: polars
Requires-Dist: duckdb
Requires-Dist: plotly
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: pyarrow
Requires-Dist: joblib
Requires-Dist: scikit-learn
Requires-Dist: tqdm
Dynamic: license-file

# AlphaPurify: Factor analytics for quants

**AlphaPurify** is a Python library for financial data aggregation, factor construction, IC testing, factor return attribution, full-pipeline backtesting, and large-scale experimentation. It helps quants validate ideas rapidly.

---

![IC](assets/logo.jpg)

---

### AlphaPurify comprises four main modules:

1.  **`alphapurify.FactorAnalyzer`** — for IC testing and quantile portfolio analysis to evaluate factor predictive ability.
2.  **`alphapurify.AlphaPurifier`** — for factor preprocessing, including 40+ Winsorization, Neutralization, and Standardization methods.
3.  **`alphapurify.Database`** — for reading, writing, and aggregating financial and factor datasets.
4.  **`alphapurify.Exposures`** — for factor correlation analysis and factor-based return attribution.

---

## Why AlphaPurify?

Unlike traditional factor research tools, **all you need is a DataFrame**.

**• Optimized for single-machine research**

Many independent researchers work on a single laptop where memory overflow and slow computation are common issues.  
AlphaPurify is designed with optimized caching, vectorized computation, and multiprocessing wherever possible.

For example, a full factor evaluation of a **15-year daily dataset of the CSI 300 universe**, including **long-only, long-short, and short-only portfolios and IC analysis**, completes in **around 30 seconds** on a typical laptop.


**• Adaptive to arbitrary bar frequency**

AlphaPurify works with **any bar frequency** (daily, hourly, minute-level, etc.).  
Return aggregation automatically adapts to the data frequency, while allowing users to explicitly specify the horizon if needed.

The framework is carefully designed to **strictly prevent look-ahead bias**.
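
To make the look-ahead point concrete, here is a minimal sketch of how forward returns can be labeled so that the factor at bar `t` is only ever paired with the return from `t` to `t + horizon`. It illustrates the general idea under simple assumptions (one row per symbol per bar, no gaps) and is not AlphaPurify's internal logic; `add_forward_returns` and the `fwd_ret_*` column name are hypothetical.

```python
import pandas as pd

def add_forward_returns(df, horizon=1, date_col="datetime",
                        symbol_col="symbol", price_col="close"):
    """Attach the return from bar t to bar t + horizon, computed per symbol."""
    out = df.sort_values([symbol_col, date_col]).copy()
    # shift(-horizon) pulls the FUTURE price back onto the current row,
    # so this column is a label to predict, never a feature.
    future_price = out.groupby(symbol_col)[price_col].shift(-horizon)
    out[f"fwd_ret_{horizon}"] = future_price / out[price_col] - 1.0
    return out

bars = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2024-01-01 09:30", "2024-01-01 09:31", "2024-01-01 09:32"] * 2
    ),
    "symbol": ["AAPL"] * 3 + ["MSFT"] * 3,
    "close": [189.9, 190.0, 190.4, 378.5, 378.9, 379.1],
})
labeled = add_forward_returns(bars, horizon=1)
print(labeled)
```

The last `horizon` bars of each symbol have no forward return and stay `NaN`; dropping them, rather than filling them, keeps future information out of the evaluation. Because the shift is counted in bars, the same code works for daily, hourly, or minute data.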


**• Professional factor preprocessing toolkit**

AlphaPurify provides **40+ built-in preprocessing methods** for factor research, including common operations such as:

- winsorization
- neutralization
- standardization

This allows researchers to rapidly experiment with different factor cleaning pipelines.
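
As a rough illustration of what these three operations do on a single cross-section, the sketch below uses one simple variant of each: MAD-based clipping, z-scoring, and OLS residualization. These are generic textbook versions written with pandas and NumPy, not the library's own methods; the function names and the `log_cap` exposure column are assumptions made for the example.

```python
import numpy as np
import pandas as pd

def winsorize_mad(x: pd.Series, n_mad: float = 3.0) -> pd.Series:
    """Clip values further than n_mad (unscaled) MADs from the median."""
    med = x.median()
    mad = (x - med).abs().median()
    return x.clip(lower=med - n_mad * mad, upper=med + n_mad * mad)

def zscore(x: pd.Series) -> pd.Series:
    """Standardize to zero mean and unit (population) standard deviation."""
    return (x - x.mean()) / x.std(ddof=0)

def neutralize(y: pd.Series, exposures: pd.DataFrame) -> pd.Series:
    """Return the OLS residual of y regressed on the exposures plus an intercept."""
    X = np.column_stack([np.ones(len(exposures)), exposures.to_numpy(dtype=float)])
    y_values = y.to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y_values, rcond=None)
    return pd.Series(y_values - X @ beta, index=y.index)

# One cross-section (a single timestamp) with an obvious outlier.
cross_section = pd.DataFrame({
    "factor": [0.1, 0.2, 0.3, 0.4, 10.0],
    "log_cap": [1.0, 2.0, 3.0, 4.0, 5.0],
})
clipped = winsorize_mad(cross_section["factor"])
standardized = zscore(clipped)
residual = neutralize(standardized, cross_section[["log_cap"]])
print(pd.DataFrame({"clipped": clipped, "z": standardized, "resid": residual}))
```

In a panel, each step would be applied per timestamp, for example with `df.groupby("datetime")["factor"].transform(...)`, so that statistics from one date never leak into another.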

**• Lightweight high-performance data backend**

AlphaPurify integrates a fast **Parquet + DuckDB** data layer for factor storage and aggregation.

This avoids the need for configuring complex database systems while still providing **high-performance querying and fast factor construction workflows**.

---

## Quick Start

### 1. Install with pip
Install ``AlphaPurify`` with pip:

```bash
pip install alphapurify
```
**Note**: pip installs the latest stable release of ``AlphaPurify``. The main branch is under active development; to test its latest scripts or functions, clone the repository and install ``AlphaPurify`` from source.

---

### 2. Load your DataFrame
| datetime           | symbol | close | volume | alpha_003 | momentum_12_1 | vol_60 | beta_252 |
|:-------------------|:------|------:|------:|------:|--------------:|------:|--------:|
| 2024-01-01 09:30   | AAPL  | 189.9 | 120034 | 0.42 | 0.15 | 0.21 | 1.08 |
| 2024-01-01 09:31   | AAPL  | 190.0 | 98321  | 0.38 | 0.16 | 0.22 | 1.07 |
| 2024-01-01 09:32   | AAPL  | 190.4 | 101245 | 0.41 | 0.17 | 0.23 | 1.06 |
| 2024-01-01 09:30   | MSFT  | 378.5 | 84211  | -0.15 | -0.05 | 0.18 | 0.95 |
| 2024-01-01 09:31   | MSFT  | 378.9 | 90122  | -0.12 | -0.04 | 0.19 | 0.96 |
| 2024-01-01 09:32   | MSFT  | 379.1 | 95433  | -0.08 | -0.03 | 0.20 | 0.97 |
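
For a quick experiment, a frame with this layout can be built directly in pandas. The sketch below reproduces the sample rows; the factor column is named `alpha_003` so that it matches the `factor_col` / `factor_name` arguments used in the next step. In practice the data would more likely come from a file, for example via `pd.read_parquet(...)` or `pd.read_csv(...)`.

```python
import pandas as pd

# Long-format panel: one row per (datetime, symbol) pair.
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2024-01-01 09:30", "2024-01-01 09:31", "2024-01-01 09:32"] * 2
        ),
        "symbol": ["AAPL"] * 3 + ["MSFT"] * 3,
        "close": [189.9, 190.0, 190.4, 378.5, 378.9, 379.1],
        "volume": [120034, 98321, 101245, 84211, 90122, 95433],
        "alpha_003": [0.42, 0.38, 0.41, -0.15, -0.12, -0.08],
        "momentum_12_1": [0.15, 0.16, 0.17, -0.05, -0.04, -0.03],
        "vol_60": [0.21, 0.22, 0.23, 0.18, 0.19, 0.20],
        "beta_252": [1.08, 1.07, 1.06, 0.95, 0.96, 0.97],
    }
)
print(df.dtypes)
print(df.head())
```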

---

### 3. Create reports
```python
from alphapurify import AlphaPurifier, FactorAnalyzer, Pure_Exposures

# preprocess (`df` is the DataFrame loaded in step 2)
df = (
    AlphaPurifier(df, factor_col="alpha_003")
    .winsorize(method="mad")
    .standardize(method="zscore")
    .to_result()
)

# backtest
FA = FactorAnalyzer(base_df=df,
                    trade_date_col='datetime',
                    symbol_col='symbol',
                    price_col='close',
                    factor_name='alpha_003')
FA.run()
FA.create_long_return_sheet()
FA.create_long_short_return_sheet()
FA.create_short_return_sheet()
FA.create_single_fac_ic_sheet()

# contributions of other factors
Ex = Pure_Exposures(
    base_df=df,
    trade_date_col='datetime',
    symbol_col='symbol',
    price_col='close',
    factor_name='alpha_003',
    exposure_cols=['momentum_12_1', 'vol_60', 'beta_252'],
)

Ex.run()
Ex.plot_pure_exposures()
Ex.plot_pure_returns()
Ex.plot_pure_exposures_and_returns()
Ex.plot_correlations()
```

---

## Examples of Outputs
### Portfolio for long positions only:
![IC](assets/newplot2.png)
### Contributions of other factors:
![IC2](assets/newplot3.png)
![IC2](assets/newplot4.png)
![IC2](assets/newplot5.png)

---

## P.S.
More detailed documentation and examples will be released soon.

Suggestions and improvements are welcome. Feel free to open an issue, submit a pull request, or contact me via email.

---

**Elias Wu**

