Metadata-Version: 2.4
Name: narayanpop
Version: 0.0.1
Summary: Narayan-Popp ADF Unit Root Test with Two Structural Breaks
Author-email: Dr Merwan Roudane <merwanroudane920@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/merwanroudane/narayanpop
Project-URL: Repository, https://github.com/merwanroudane/narayanpop
Project-URL: Bug Tracker, https://github.com/merwanroudane/narayanpop/issues
Project-URL: Documentation, https://github.com/merwanroudane/narayanpop/blob/main/README.md
Keywords: econometrics,unit root test,structural breaks,time series,ADF test,stationarity,cointegration
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: scipy>=1.7.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=4.5.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: numpydoc>=1.2.0; extra == "docs"
Provides-Extra: examples
Requires-Dist: matplotlib>=3.5.0; extra == "examples"
Requires-Dist: seaborn>=0.11.0; extra == "examples"
Requires-Dist: jupyter>=1.0.0; extra == "examples"
Dynamic: license-file

# narayanpop

[![Python Version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Narayan-Popp ADF Unit Root Test with Two Structural Breaks**

A Python implementation of the unit root test with two structural breaks proposed by Narayan and Popp (2010).

## Reference

Narayan, P. K. and Popp, S. (2010), "A new unit root test with two structural breaks in level and slope at unknown time", *Journal of Applied Statistics*, 37(9), 1425-1438. 

DOI: [10.1080/02664760903039883](https://doi.org/10.1080/02664760903039883)

## Features

- ✅ **Exact replication** of the original GAUSS code and paper methodology
- ✅ **Two model specifications**:
  - Model 1 (Model A): Two breaks in level
  - Model 2 (Model C): Two breaks in level and trend
- ✅ **Sequential break date selection** procedure
- ✅ **Innovational Outlier (IO)** model for gradual breaks
- ✅ **Flexible lag selection**: AIC, SIC, or t-statistic criterion
- ✅ **Publication-ready output**: `summary()` returns a formatted results table
- ✅ **Critical values** for various sample sizes (T ≤ 50, 50 < T ≤ 200, 200 < T ≤ 400, T > 400)
- ✅ **Panel data support** for testing multiple series

## Installation

```bash
pip install narayanpop
```

Or install from source:

```bash
git clone https://github.com/merwanroudane/narayanpop.git
cd narayanpop
pip install -e .
```

## Quick Start

### Basic Usage

```python
import numpy as np
import pandas as pd
from narayanpop import adf_2breaks

# Generate sample data
np.random.seed(42)
y = np.cumsum(np.random.randn(100))

# Run test with Model 1 (breaks in level only)
result = adf_2breaks(y, model=1)

# Print formatted results
print(result.summary())
```

### Working with Time Series Data

```python
import numpy as np
import pandas as pd
from narayanpop import adf_2breaks

# Build an annual series (replace with your own data)
dates = pd.date_range('1960', periods=100, freq='YS')
y = pd.Series(np.cumsum(np.random.randn(100)), index=dates)

# Run test with Model 2 (breaks in level and trend)
result = adf_2breaks(y, model=2, pmax=8, ic=3, trimm=0.10)

# Access results
print(f"Test Statistic: {result.test_statistic:.4f}")
print(f"First Break: {result.break1}")
print(f"Second Break: {result.break2}")
print(f"Optimal Lag: {result.optimal_lag}")
print(f"Critical Values: {result.critical_values}")
```

### Panel Data Analysis

```python
import numpy as np
import pandas as pd
from narayanpop import adf_2breaks_panel

# Load panel data
data = pd.DataFrame({
    'GDP': np.cumsum(np.random.randn(100)),
    'CPI': np.cumsum(np.random.randn(100)),
    'Unemployment': np.cumsum(np.random.randn(100))
})

# Test all series
results_df = adf_2breaks_panel(data, model=1)
print(results_df)
```

## Methodology

### Models

#### Model 1 (Model A): Break in Level
```
Δy_t = ρy_{t-1} + α_1 + β*t + θ_1*D(TB)_{1,t} + θ_2*D(TB)_{2,t} 
       + δ_1*DU_{1,t-1} + δ_2*DU_{2,t-1} + Σβ_j*Δy_{t-j} + ε_t
```

#### Model 2 (Model C): Break in Level and Trend
```
Δy_t = ρy_{t-1} + α* + β*t + κ_1*D(TB)_{1,t} + κ_2*D(TB)_{2,t}
       + δ*_1*DU_{1,t-1} + δ*_2*DU_{2,t-1} + γ*_1*DT_{1,t-1} + γ*_2*DT_{2,t-1}
       + Σβ_j*Δy_{t-j} + ε_t
```

Where:
- `DU_{i,t}` = 1 if t > TB_i, 0 otherwise (level shift dummy)
- `DT_{i,t}` = (t - TB_i) if t > TB_i, 0 otherwise (trend shift dummy)
- `D(TB)_{i,t}` = 1 if t = TB_i + 1, 0 otherwise (impulse dummy)
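
The three dummy definitions above can be built directly with NumPy. The following is a minimal sketch, not the package's internal code; the helper name `break_dummies` and the 1-based time index are assumptions made for illustration.

```python
import numpy as np

def break_dummies(T, tb):
    """Build DU, DT and D(TB) for a break at period tb (time runs 1..T)."""
    t = np.arange(1, T + 1)
    du = (t > tb).astype(float)            # level shift: 1 after the break
    dt = np.where(t > tb, t - tb, 0.0)     # trend shift: 1, 2, 3, ... after the break
    impulse = (t == tb + 1).astype(float)  # one-off pulse in the first post-break period
    return du, dt, impulse

du, dt, impulse = break_dummies(T=8, tb=3)
print(du.tolist())       # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(dt.tolist())       # [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(impulse.tolist())  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Note that the impulse dummy is the first difference of the level shift dummy, which is why it marks only the first post-break period.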

### Break Date Selection

The test uses a **sequential procedure**:

1. **First Break**: Maximize |t_θ1| (Model 1) or |t_κ1| (Model 2)
2. **Second Break**: Conditional on the first, maximize |t_θ2| or |t_κ2|

This approach is computationally efficient (2T operations vs T² for grid search).
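
The two-step search can be sketched as follows. This is an illustration under simplifying assumptions (a Model 1 style regression, no lagged differences, plain OLS t-statistics), not the package's actual routine; the names `ols_tstats`, `break_columns` and `sequential_breaks` are hypothetical.

```python
import numpy as np

def ols_tstats(X, z):
    """t-statistics of the OLS coefficients in z = X b + e."""
    b, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ b
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.pinv(X.T @ X)))
    return b / se

def break_columns(T, tb):
    """Columns [D(TB)_t, DU_{t-1}] for t = 2..T and a break at period tb."""
    t = np.arange(2, T + 1)
    return np.column_stack([t == tb + 1, t - 1 > tb]).astype(float)

def sequential_breaks(y, trimm=0.10):
    """Two-step break search; returns the two break periods in time order."""
    y = np.asarray(y, dtype=float)
    T = y.size
    dy, ylag = np.diff(y), y[:-1]
    # Regressors common to every candidate: y_{t-1}, constant, trend
    base = np.column_stack([ylag, np.ones(T - 1), np.arange(2, T + 1)])
    lo = max(int(trimm * T), 2)   # assumed trimming rule; at least 2 avoids collinearity
    candidates = range(lo, T - lo)

    # Step 1: a single break, chosen by the largest |t| on its impulse dummy
    def t_first(tb):
        X = np.column_stack([base, break_columns(T, tb)])
        return abs(ols_tstats(X, dy)[3])
    tb1 = max(candidates, key=t_first)

    # Step 2: hold tb1 fixed and search for the second break the same way
    def t_second(tb):
        X = np.column_stack([base, break_columns(T, tb1), break_columns(T, tb)])
        return abs(ols_tstats(X, dy)[5])
    tb2 = max((tb for tb in candidates if abs(tb - tb1) >= 2), key=t_second)
    return tuple(sorted((tb1, tb2)))

# Random walk with two large level shifts, after periods 40 and 70
rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(100))
y[40:] += 30.0
y[70:] += 20.0
print(sequential_breaks(y))
```

Each step scans the candidate dates once, which is where the roughly 2T cost comes from. With shifts this large relative to the noise, the search should recover periods 40 and 70.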

### Null and Alternative Hypotheses

- **H₀**: Unit root with structural breaks (y_t is I(1) with breaks)
- **H₁**: Trend stationary with structural breaks (y_t is I(0) around a broken trend)
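
To get a feel for the two hypotheses, it can help to simulate one series of each kind. The sketch below uses a simple additive construction (deterministic trend with two level shifts plus an AR(1) error); the function name, default break dates and shift sizes are illustrative assumptions, not values from the paper.

```python
import numpy as np

def simulate_two_breaks(T=100, breaks=(40, 70), shifts=(5.0, -3.0),
                        rho=1.0, slope=0.1, sigma=1.0, seed=0):
    """rho = 1 gives a unit root with breaks (H0);
    |rho| < 1 gives stationarity around a broken trend (H1)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    # Deterministic part: linear trend plus two level shifts
    d = slope * t
    for tb, shift in zip(breaks, shifts):
        d = d + shift * (t > tb)
    # Stochastic part: u_t = rho * u_{t-1} + e_t
    e = sigma * rng.standard_normal(T)
    u = np.zeros(T)
    u[0] = e[0]
    for i in range(1, T):
        u[i] = rho * u[i - 1] + e[i]
    return d + u

y_h0 = simulate_two_breaks(rho=1.0)  # unit root with breaks
y_h1 = simulate_two_breaks(rho=0.5)  # stationary around a broken trend
```

Both series share the same shocks and break dates, so any difference in their behaviour comes from `rho` alone.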

## Parameters

### `adf_2breaks(y, model, pmax=8, ic=3, trimm=0.10)`

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `y` | array-like | Data series (1D array or pandas Series) | Required |
| `model` | int | Model specification: 1 (level breaks) or 2 (level & trend breaks) | Required |
| `pmax` | int | Maximum number of lags for Δy | 8 |
| `ic` | int | Information criterion: 1 (AIC), 2 (SIC), 3 (t-stat) | 3 |
| `trimm` | float | Trimming rate for break search (0 < trimm < 0.5) | 0.10 |

## Output

### `ADF2BreaksResult` Object

| Attribute | Type | Description |
|-----------|------|-------------|
| `test_statistic` | float | ADF test statistic |
| `break1` | int/date | First break location |
| `break2` | int/date | Second break location |
| `optimal_lag` | int | Selected lag length |
| `critical_values` | dict | Critical values at 1%, 5%, 10% levels |
| `model` | int | Model specification used |
| `nobs` | int | Number of observations |

### Methods

- `summary()`: Returns formatted output suitable for journal publication

## Critical Values

Critical values from Narayan & Popp (2010), Table 3:

### Model 1 (Break in Level)

| Sample Size | 1% | 5% | 10% |
|-------------|----|----|-----|
| T ≤ 50 | -5.259 | -4.514 | -4.143 |
| 50 < T ≤ 200 | -4.958 | -4.316 | -3.980 |
| 200 < T ≤ 400 | -4.731 | -4.136 | -3.825 |
| T > 400 | -4.672 | -4.081 | -3.772 |

### Model 2 (Break in Level and Trend)

| Sample Size | 1% | 5% | 10% |
|-------------|----|----|-----|
| T ≤ 50 | -5.949 | -5.181 | -4.789 |
| 50 < T ≤ 200 | -5.576 | -4.937 | -4.596 |
| 200 < T ≤ 400 | -5.318 | -4.741 | -4.430 |
| T > 400 | -5.287 | -4.692 | -4.396 |
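
For quick checks outside the package, the two tables can be encoded as a small lookup. This is a sketch that follows the sample-size bins exactly as printed above; the function name `lookup_critical_values` is hypothetical, and the package's own selection logic may differ.

```python
# Values transcribed from the tables above: (upper bound of T, 1%, 5%, 10%)
CRITICAL_VALUES = {
    1: [(50, -5.259, -4.514, -4.143),
        (200, -4.958, -4.316, -3.980),
        (400, -4.731, -4.136, -3.825),
        (float("inf"), -4.672, -4.081, -3.772)],
    2: [(50, -5.949, -5.181, -4.789),
        (200, -5.576, -4.937, -4.596),
        (400, -5.318, -4.741, -4.430),
        (float("inf"), -5.287, -4.692, -4.396)],
}

def lookup_critical_values(model, T):
    """Return the critical values for the first bin whose upper bound covers T."""
    for upper, cv1, cv5, cv10 in CRITICAL_VALUES[model]:
        if T <= upper:
            return {"1%": cv1, "5%": cv5, "10%": cv10}

print(lookup_critical_values(model=1, T=100))
# {'1%': -4.958, '5%': -4.316, '10%': -3.98}
```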

## Examples

### Example 1: Nelson-Plosser Data

```python
import pandas as pd
from narayanpop import adf_2breaks

# Real GNP data (1909-1970)
data = pd.read_csv('nelson_plosser.csv', index_col=0)
y = data['Real_GNP']

# Test with Model 2
result = adf_2breaks(y, model=2, pmax=8, ic=3)
print(result.summary())
```

Output:
```
======================================================================
Narayan-Popp ADF Unit Root Test with Two Structural Breaks
======================================================================
Model: Model C (Break in Level and Trend)
Number of observations: 62
Optimal lag length: 2

Test Results:
----------------------------------------------------------------------
ADF test statistic:    -5.5970

Critical Values:
  1% level:     -5.9490
  5% level:     -5.1810
 10% level:     -4.7890

Structural Breaks:
  First break:  1921 (19.35%)
  Second break: 1938 (46.77%)

Conclusion: Reject H0 at 5% level: Evidence AGAINST unit root **
======================================================================
Note: *** 1%, ** 5%, * 10% significance levels
H0: Unit root with structural breaks
H1: Trend stationary with structural breaks
======================================================================
```

### Example 2: US Macroeconomic Data

```python
from narayanpop import adf_2breaks

# cpi_data is assumed to be an already-loaded 1D array or pandas Series
# of CPI observations (1948-2007)
result = adf_2breaks(cpi_data, model=1, pmax=8, ic=3, trimm=0.10)

if result.test_statistic < result.critical_values['5%']:
    print("Reject unit root at 5% level")
    print(f"Breaks detected at: {result.break1}, {result.break2}")
else:
    print("Cannot reject unit root hypothesis")
```

### Example 3: Monte Carlo Simulation

```python
import numpy as np
from narayanpop import adf_2breaks

# Simulation parameters
T = 100
n_sims = 1000
rejections = 0
rng = np.random.default_rng(42)  # fixed seed for reproducibility

for _ in range(n_sims):
    # Generate I(1) data with no breaks
    y = np.cumsum(rng.standard_normal(T))
    
    # Run test
    result = adf_2breaks(y, model=1, pmax=8, ic=3)
    
    # Check rejection at 5% level
    if result.test_statistic < result.critical_values['5%']:
        rejections += 1

print(f"Empirical size at 5% level: {rejections/n_sims:.3f}")
# Should be close to 0.05
```

## Comparison with Related Tests

| Test | Breaks | Breaks under H₀ | Breaks under H₁ | Type |
|------|--------|----------|----------|------|
| **Narayan-Popp (2010)** | 2 | Yes | Yes | ADF-IO |
| Lee-Strazicich (2003) | 2 | Yes | Yes | LM |
| Lumsdaine-Papell (1997) | 2 | No | Yes | ADF-IO |
| Perron (1989) | 1 | Yes | Yes | ADF-IO |
| Zivot-Andrews (1992) | 1 | No | Yes | ADF-IO |

**Key Advantage**: Narayan-Popp allows for breaks under both null and alternative hypotheses, avoiding spurious rejections that occur with tests that only allow breaks under H₁.

## Testing Strategy

### Step 1: Choose Model
- Use **Model 1** if only level shifts are expected
- Use **Model 2** if both level and trend changes are possible

### Step 2: Set Parameters
- `pmax`: Either Schwert's rule of thumb, `int(12 * (T/100)**0.25)` (which gives 12 for T = 100), or the package default of 8
- `ic`: Use `3` (t-stat) for general-to-specific approach
- `trimm`: Keep at 0.10 (following Zivot-Andrews, Lumsdaine-Papell)
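
The `pmax` rule of thumb from Step 2 (Schwert's rule) is easy to evaluate for a few sample sizes. The helper name `schwert_pmax` is an illustrative assumption and is not part of the package.

```python
def schwert_pmax(T):
    """Schwert's rule of thumb for the maximum lag: int(12 * (T/100)^(1/4))."""
    return int(12 * (T / 100) ** 0.25)

for T in (50, 100, 200, 400):
    print(T, schwert_pmax(T))
# 50 10
# 100 12
# 200 14
# 400 16
```

The rule grows slowly with T, so the default `pmax=8` is more conservative than the rule for every sample size shown here.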

### Step 3: Interpret Results
1. Compare test statistic to critical values
2. If reject H₀: evidence of trend stationarity with breaks
3. Check break dates for economic/historical relevance
4. Verify optimal lag is reasonable
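
The comparison in item 1 can be wrapped in a small helper that returns the usual significance stars. This is a sketch; `significance_stars` is a hypothetical name, and the dictionary keys `'1%'` and `'10%'` are assumed to follow the same pattern as the `'5%'` key used elsewhere in this README.

```python
def significance_stars(stat, critical_values):
    """Return '***', '**', '*' or '' for rejection at the 1%, 5%, 10% level."""
    for level, stars in (("1%", "***"), ("5%", "**"), ("10%", "*")):
        # The test is left-tailed: reject when the statistic is below the critical value
        if stat < critical_values[level]:
            return stars
    return ""

cv = {"1%": -5.949, "5%": -5.181, "10%": -4.789}
print(significance_stars(-5.597, cv))  # **
```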

### Step 4: Robustness Checks
- Try both models
- Vary pmax
- Check sensitivity to trimming rate
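
One way to organise these checks is to loop over the settings and collect the results in a list of rows. The sketch below takes the test function as an argument so that it stays self-contained; `robustness_grid` and its parameter names are hypothetical, and it relies only on the `adf_2breaks` signature and result attributes documented above.

```python
from itertools import product

def robustness_grid(y, test_fn, models=(1, 2), pmax_values=(4, 8, 12),
                    trimm_values=(0.10, 0.15)):
    """Run test_fn over every combination of model, pmax and trimm."""
    rows = []
    for model, pmax, trimm in product(models, pmax_values, trimm_values):
        res = test_fn(y, model=model, pmax=pmax, ic=3, trimm=trimm)
        rows.append({
            "model": model, "pmax": pmax, "trimm": trimm,
            "stat": res.test_statistic,
            "break1": res.break1, "break2": res.break2,
            "lag": res.optimal_lag,
        })
    return rows

# Usage with the package (assumes narayanpop is installed and y is your series):
# import pandas as pd
# from narayanpop import adf_2breaks
# print(pd.DataFrame(robustness_grid(y, adf_2breaks)))
```

Break dates and conclusions that stay stable across the rows are more credible than ones that appear under a single setting.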

## Validation

This implementation has been validated against:

1. ✅ **Original GAUSS code** (Saban Nazlioglu)
2. ✅ **Critical values** from Narayan & Popp (2010), Table 3
3. ✅ **Nelson-Plosser dataset** results
4. ✅ **Monte Carlo simulations** for size and power properties

## Technical Notes

### Innovational Outlier (IO) Model

The IO model assumes breaks occur **gradually** rather than instantaneously:
- More realistic for economic time series
- Breaks affect the series through the same dynamic process as innovations
- Specified through the inclusion of Ψ*(L) in the deterministic component
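
The difference between a gradual and an instantaneous break is easiest to see without noise. The sketch below assumes the simplest case, an AR(1) filter with coefficient `phi` standing in for Ψ*(L), and compares the innovational outlier path with an additive outlier path that reaches the same long-run level. All names and parameter values are illustrative.

```python
import numpy as np

def level_shift_paths(T=12, tb=4, theta=1.0, phi=0.5):
    """Noise-free response to a level shift under IO and AO dynamics."""
    t = np.arange(1, T + 1)
    du = (t > tb).astype(float)
    # IO: the shift enters like an innovation, so it is filtered by 1 / (1 - phi L)
    io = np.zeros(T)
    for i in range(T):
        prev = io[i - 1] if i > 0 else 0.0
        io[i] = phi * prev + theta * du[i]
    # AO: the series jumps straight to the same long-run level
    ao = theta / (1.0 - phi) * du
    return io, ao

io, ao = level_shift_paths()
print(io[4:8].tolist())  # [1.0, 1.5, 1.75, 1.875]
print(ao[4:8].tolist())  # [2.0, 2.0, 2.0, 2.0]
```

Under IO the series approaches the new level of 2 step by step, while under AO it arrives there in the first post-break period.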

### Computational Efficiency

- **Sequential procedure**: ~2T operations
- **Grid search**: ~T² operations
- For T=100: Sequential is ~50x faster
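
The speed-up figure follows directly from the two approximate counts. A short worked check, using the rough orders of magnitude quoted above rather than exact counts after trimming:

```python
def search_costs(T):
    """Approximate number of candidate evaluations for each search strategy."""
    sequential = 2 * T   # one pass over the dates per break
    grid = T ** 2        # every pair of candidate dates
    return sequential, grid, grid / sequential

print(search_costs(100))  # (200, 10000, 50.0)
```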

### Trimming

Default trimming (0.10) ensures sufficient observations on each side of the breaks by excluding:
- First 10% of observations from the break1 search
- Last 10% of observations from the break2 search
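
As a rough illustration of what the trimming rate means in terms of observations, the sketch below converts `trimm` into the first and last admissible break positions. The rounding rule (`int(trimm * T)`) and the helper name are assumptions; the package may round differently.

```python
def break_search_window(T, trimm=0.10):
    """Return (first, last) admissible break positions for a sample of size T."""
    if not 0 < trimm < 0.5:
        raise ValueError("trimm must satisfy 0 < trimm < 0.5")
    cut = int(trimm * T)   # observations excluded at each end
    return cut, T - cut

print(break_search_window(100))       # (10, 90)
print(break_search_window(62, 0.10))  # (6, 56)
```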

## Troubleshooting

### Issue: "Data contains missing values"
**Solution**: Remove or interpolate NaN values before testing

```python
y = y.dropna()  # or y.ffill() to forward-fill gaps
```

### Issue: "Optimal lag is 0"
**Solution**: This is normal if the differenced series shows little serial correlation or `pmax` is too small. Consider:
- Increasing `pmax`
- Using a different `ic` criterion
- Checking data quality

### Issue: "No clear breaks detected"
**Solution**:
- Try a different model specification
- Check whether breaks actually exist in the data
- Consider single-break tests first

## Citation

If you use this package in your research, please cite:

```bibtex
@article{narayan2010unit,
  title={A new unit root test with two structural breaks in level and slope at unknown time},
  author={Narayan, Paresh Kumar and Popp, Stephan},
  journal={Journal of Applied Statistics},
  volume={37},
  number={9},
  pages={1425--1438},
  year={2010},
  publisher={Taylor \& Francis}
}

@software{narayanpop2024,
  author = {Roudane, Merwan},
  title = {narayanpop: Python implementation of Narayan-Popp unit root test},
  year = {2024},
  url = {https://github.com/merwanroudane/narayanpop},
  version = {0.0.1}
}
```

## Related Packages

- **statsmodels**: General econometrics (ADF, KPSS, etc.)
- **arch**: ARCH/GARCH models and unit root tests
- **linearmodels**: Panel data econometrics

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/NewFeature`)
3. Commit your changes (`git commit -am 'Add NewFeature'`)
4. Push to the branch (`git push origin feature/NewFeature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Author

**Dr Merwan Roudane**
- Email: merwanroudane920@gmail.com
- GitHub: [@merwanroudane](https://github.com/merwanroudane)

## Acknowledgments

- Original GAUSS code by Saban Nazlioglu
- Narayan & Popp (2010) for the methodology
- The econometrics community for valuable feedback

## Changelog

### Version 0.0.1 (2024)
- Initial release
- Full implementation of Narayan-Popp test
- Model 1 and Model 2 support
- Panel data functionality
- Publication-ready output formatting

---

**Note**: This is an independent implementation for research purposes. For commercial applications, please verify results with the original paper and code.
