Metadata-Version: 2.4
Name: pycachy
Version: 0.0.7
Summary: Cache your API calls with a single line of code. No mocks, no fixtures. Just faster, cleaner code.
Author-email: Tommy <tc@answer.ai>
License: Apache-2.0
Project-URL: Repository, https://github.com/AnswerDotAI/cachy
Project-URL: Documentation, https://AnswerDotAI.github.io/cachy
Keywords: nbdev,jupyter,notebook,python
Classifier: Natural Language :: English
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastcore>=1.12.9
Requires-Dist: httpx
Provides-Extra: dev
Requires-Dist: openai; extra == "dev"
Requires-Dist: anthropic; extra == "dev"
Requires-Dist: litellm; extra == "dev"
Requires-Dist: google-genai; extra == "dev"
Requires-Dist: nbdev; extra == "dev"
Dynamic: license-file

# cachy


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

We often call APIs while prototyping and testing our code. A single API
call (e.g. an Anthropic chat completion) can take hundreds of
milliseconds to run. This can really slow down development, especially
if our notebook contains many API calls 😞.

`cachy` caches API requests. It does this by saving the result of each
call to a local `cachy.jsonl` file. Before calling an API (e.g. OpenAI),
it checks whether the request already exists in `cachy.jsonl`. If it
does, it returns the cached result instead of making the call.

**How does it work?**

Under the hood, popular SDKs like OpenAI, Anthropic and LiteLLM use
`httpx.Client` and `httpx.AsyncClient`.

`cachy` patches the `send` method of both clients and injects a simple
caching mechanism:

- create a cache key from the request
- if the key exists in `cachy.jsonl` return the cached response
- if not, call the API and save the response to `cachy.jsonl`
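
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not `cachy`'s actual implementation: the helper names (`cache_key`, `read_cache`, `write_cache`, `cached_send`), the SHA-256 key, and the `key`/`response` record fields are all assumptions made for the example, and `fake_send` stands in for the real `send` method so that no network call is made.

``` python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(method: str, url: str, body: bytes) -> str:
    """Hypothetical key: a SHA-256 hash over method, URL and body."""
    h = hashlib.sha256()
    for part in (method.encode(), url.encode(), body):
        h.update(part)
        h.update(b"\0")  # separator so adjacent fields cannot run together
    return h.hexdigest()


def read_cache(path: Path, key: str):
    """Return the stored response for `key`, or None if it is not cached."""
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["key"] == key:
                return record["response"]
    return None


def write_cache(path: Path, key: str, response: str) -> None:
    """Append one JSON record per line (the JSON Lines format)."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"key": key, "response": response}) + "\n")


def cached_send(send, method: str, url: str, body: bytes, path: Path) -> str:
    """Wrap `send` so repeated identical requests are served from `path`."""
    key = cache_key(method, url, body)
    hit = read_cache(path, key)
    if hit is not None:
        return hit  # cache hit: skip the API call entirely
    response = send(method, url, body)  # cache miss: call the API
    write_cache(path, key, response)
    return response


# Demo with a stand-in for the network call (no real API is contacted).
calls = []


def fake_send(method, url, body):
    calls.append(url)
    return "Hey! How can I help you today?"


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cachy.jsonl"
    url = "https://api.openai.com/v1/responses"
    first = cached_send(fake_send, "POST", url, b'{"input": "Hey!"}', cache_file)
    second = cached_send(fake_send, "POST", url, b'{"input": "Hey!"}', cache_file)

print(first == second, len(calls))
```

In this sketch the second call returns the same response while `fake_send` runs only once, which is the behaviour the patched `send` method is described as providing.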

## Usage

To use `cachy`:

- install the package: `pip install pycachy`
- add the snippet below to the top of your notebook

``` python
from cachy import enable_cachy

enable_cachy()
```

By default `cachy` will cache requests made to OpenAI, Anthropic, Gemini
and DeepSeek.

*Note: Gemini caching only works via the LiteLLM SDK.*

> [!NOTE]
>
> ### Custom APIs
>
> If you’re using the OpenAI or LiteLLM SDK for other LLM providers such
> as Grok or Mistral, you can cache these requests as shown below.
>
> ``` python
> from cachy import enable_cachy, doms
> enable_cachy(doms=doms+('api.x.ai', 'api.mistral.ai'))
> ```
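
To illustrate why extending `doms` matters, here is one way a domain filter of this kind could decide whether a request is eligible for caching. This is a sketch under assumptions, not `cachy`'s own matching logic: `EXAMPLE_HOSTS` and `should_cache` are hypothetical names, and the tuple below is an illustrative subset rather than the real default value of `doms`.

``` python
from urllib.parse import urlparse

# Illustrative hosts only; not the actual contents of cachy's `doms`.
EXAMPLE_HOSTS = ("api.openai.com", "api.anthropic.com")


def should_cache(url: str, hosts=EXAMPLE_HOSTS) -> bool:
    """Hypothetical check: cache only requests whose host is in `hosts`."""
    host = urlparse(url).hostname or ""
    return host in hosts


# Extending the tuple mirrors the `doms + (...)` pattern.
extended = EXAMPLE_HOSTS + ("api.x.ai",)

print(should_cache("https://api.x.ai/v1/chat/completions"))
print(should_cache("https://api.x.ai/v1/chat/completions", extended))
```

With a filter like this, requests to a host that is not listed pass straight through to the API, which is why a custom provider has to be added explicitly.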

## Docs

The documentation is hosted on the
[GitHub Pages site](https://AnswerDotAI.github.io/cachy/) for this
[repository](https://github.com/AnswerDotAI/cachy).

## How to use

First, import and enable `cachy`:

``` python
from cachy import enable_cachy
```

``` python
enable_cachy()
```

Now run your API calls as normal.

``` python
from openai import OpenAI
```

``` python
cli = OpenAI()
```

``` python
r = cli.responses.create(model="gpt-4.1", input="Hey!")
r
```

    Response(id='resp_0f917a6452ee099400697b191473c48191b8e4f6e76e0add02', created_at=1769675028.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4.1-2025-04-14', object='response', output=[ResponseOutputMessage(id='msg_0f917a6452ee099400697b1914af1081919353425f8cd2d7d7', content=[ResponseOutputText(annotations=[], text='Hey! How can I help you today? 😊', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, completed_at=1769675028.0, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, prompt_cache_retention=None, reasoning=Reasoning(effort=None, generate_summary=None, summary=None), safety_identifier=None, service_tier='default', status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium'), top_logprobs=0, truncation='disabled', usage=ResponseUsage(input_tokens=9, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=11, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=20), user=None, billing={'payer': 'developer'}, frequency_penalty=0.0, presence_penalty=0.0, store=True)

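If you run the same request again, the response is read from the cache.
Note that the `id` and `created_at` values below are identical to those
of the first response.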

``` python
r = cli.responses.create(model="gpt-4.1", input="Hey!")
r
```

    Response(id='resp_0f917a6452ee099400697b191473c48191b8e4f6e76e0add02', created_at=1769675028.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4.1-2025-04-14', object='response', output=[ResponseOutputMessage(id='msg_0f917a6452ee099400697b1914af1081919353425f8cd2d7d7', content=[ResponseOutputText(annotations=[], text='Hey! How can I help you today? 😊', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, completed_at=1769675028.0, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, prompt_cache_retention=None, reasoning=Reasoning(effort=None, generate_summary=None, summary=None), safety_identifier=None, service_tier='default', status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium'), top_logprobs=0, truncation='disabled', usage=ResponseUsage(input_tokens=9, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=11, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=20), user=None, billing={'payer': 'developer'}, frequency_penalty=0.0, presence_penalty=0.0, store=True)
