Metadata-Version: 2.1
Name: masa-ai
Version: 0.1.0
Summary: MASA SDK - Masa's AI Software Architecture
Author-email: Masa AI <engineering@masa.ai>
License: # MIT License
        
        Copyright (c) 2024 Masa
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/masa-finance/masa
Project-URL: Bug Tracker, https://github.com/masa-finance/masa/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=5.4
Requires-Dist: requests
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: python-dotenv
Requires-Dist: tqdm
Requires-Dist: colorlog
Requires-Dist: dynaconf
Requires-Dist: scikit-learn
Requires-Dist: torch
Requires-Dist: transformers
Requires-Dist: streamlit
Requires-Dist: streamlit-extras
Requires-Dist: langchain
Requires-Dist: langchain_community
Requires-Dist: langchain-openai
Requires-Dist: langchainhub
Requires-Dist: langchain-ollama
Requires-Dist: openai
Requires-Dist: tiktoken
Requires-Dist: nomic[local]
Requires-Dist: ollama
Requires-Dist: pinecone-client[grpc]
Requires-Dist: matplotlib
Requires-Dist: jupyter
Requires-Dist: notebook
Requires-Dist: ipykernel
Requires-Dist: sphinx
Requires-Dist: sphinx_rtd_theme
Requires-Dist: recommonmark
Requires-Dist: myst-parser
Requires-Dist: psycopg2
Requires-Dist: feedparser
Requires-Dist: tavily-python
Requires-Dist: langgraph
Requires-Dist: assemblyai
Requires-Dist: pytube

# Masa AI Software Architecture

MASA is an SDK for data retrieval, quality control, and orchestration. It currently retrieves Twitter data through the Masa Protocol Node API; support for other data sources and features is planned.

The SDK currently requires a Masa Protocol Node to be running on your system. See the [Masa Protocol documentation](https://developers.masa.ai/docs/welcome-to-masa/#masa-protocol) for instructions on installing and running a node.

## Quick Start

1. Install the MASA package:

   ```bash
   pip install masa-ai
   ```

2. Create a `request_list.json` file containing the queries you want to process. The file can be placed anywhere on your system. For example:

   ```json
   [
       {
           "query": "#example",
           "max_results": 100
       },
       {
           "query": "from:example_user",
           "max_results": 50
       }
   ]
   ```

   An example `request_list.json` file is included in the package. You can find it using the following command:

   ```bash
   EXAMPLE_PATH=$(pip show masa-ai | grep Location | awk '{print $2"/masa_ai/request_list.json"}')
   echo "Example request_list.json path: $EXAMPLE_PATH"
   ```

3. Use the MASA CLI:

   ```bash
   masa-ai-cli <action> [arguments]
   ```

   Available actions:
   - `process [path_to_requests_json]`: Process all requests (both resumed and new)
   - `--docs [page_name]`: Rebuild and view the documentation for the specified page
   - `--data`: List the scraped data files

   For example:

   ```bash
   masa-ai-cli process /path/to/request_list.json
   masa-ai-cli --docs usage
   masa-ai-cli --data
   ```

4. Accessing Scraped Data:

   Scraped data is saved in the `data` folder inside the package directory. To list all scraped data files, run:

   ```bash
   masa-ai-cli --data
   ```

   This will display the structure of the `data` folder and list all the files contained within it.

5. Recommendations for Accessing and Using Scraped Data:

   > **IMPORTANT:** The `data` folder is not included in the package. It is created only when you run the `masa-ai-cli process [path_to_requests_json]` command.

   - **Command Line**: Navigate to the `data` folder from the command line to view and work with the files directly:

     ```bash
     # Find the installation path of the masa-ai package
     PACKAGE_PATH=$(pip show masa-ai | grep Location | awk '{print $2"/masa_ai"}')
     echo "Masa package path: $PACKAGE_PATH"
     # Change into the data folder (it exists only after a `process` run)
     cd "$PACKAGE_PATH/data" && ls
     ```

     Use this path to access the data for further processing, analysis, and use with agents.

6. For detailed usage instructions, please refer to the [Usage Guide](usage.rst).
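
If you prefer to prepare requests and inspect results from Python rather than the shell, the steps above can be sketched as follows. This is a minimal illustration, not part of the SDK: `write_request_list` and `list_data_files` are hypothetical helper names, and the validation rules (a non-empty `query`, a positive integer `max_results`) are assumptions based only on the example file shown in step 2.

```python
import json
from importlib.util import find_spec
from pathlib import Path


def write_request_list(requests, path):
    """Write request entries in the format shown in step 2 (hypothetical helper)."""
    for entry in requests:
        query = entry.get("query")
        max_results = entry.get("max_results")
        # Assumed rules: the README only shows these two fields per entry.
        if not isinstance(query, str) or not query.strip():
            raise ValueError(f"missing or empty 'query' in {entry!r}")
        if (isinstance(max_results, bool) or not isinstance(max_results, int)
                or max_results <= 0):
            raise ValueError(f"'max_results' must be a positive integer in {entry!r}")
    path = Path(path)
    path.write_text(json.dumps(requests, indent=4), encoding="utf-8")
    return path


def list_data_files(package_name="masa_ai"):
    """Return all files under <package>/data, or [] if the package or folder is absent."""
    spec = find_spec(package_name)  # locates the package without importing it
    if spec is None or not spec.submodule_search_locations:
        return []
    data_dir = Path(next(iter(spec.submodule_search_locations))) / "data"
    if not data_dir.is_dir():
        # The data folder exists only after a `masa-ai-cli process` run.
        return []
    return sorted(p for p in data_dir.rglob("*") if p.is_file())


# Example usage (writes request_list.json to the current directory):
# write_request_list([{"query": "#example", "max_results": 100}], "request_list.json")
# for data_file in list_data_files():
#     print(data_file)
```

The resulting file can then be passed to `masa-ai-cli process`. The sketch makes no assumption about the format of the scraped files; it only lists them.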

## Configuration

The project uses YAML files for configuration:

- `configs/settings.yaml`: Main configuration file containing settings for Twitter API, request management, and logging.
- `configs/.secrets.yaml`: (Optional) File for storing sensitive information like API keys. Not currently in use.

The `settings.yaml` file is loaded using Dynaconf, which allows for easy environment-based configuration management.

## Advanced Twitter Search

The Masa Protocol Node API provides advanced search capabilities for retrieving Twitter data. Some of the available search options include:

- Hashtag Search: `#hashtag`
- Mention Search: `@username`
- From User Search: `from:username`
- Keyword Exclusion: `-keyword`
- OR Operator: `term1 OR term2`
- Geo-location Based Search: `geocode:latitude,longitude,radius`
- Language-Specific Search: `lang:language_code`

For more details, refer to the [Masa Protocol Twitter Docs](xtwitter_advanced.rst).
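
The operators above are plain strings, so queries for `request_list.json` can be composed programmatically. The sketch below is one way to do that, under stated assumptions: `build_query` and `to_request` are hypothetical helpers, not SDK functions, and joining operators with a space (treated as an implicit AND) follows common Twitter search behavior rather than anything this README confirms.

```python
def build_query(keywords=(), hashtags=(), mentions=(), from_user=None,
                any_of=(), exclude=(), geocode=None, lang=None):
    """Compose a search string from the operators listed above (hypothetical helper)."""
    parts = list(keywords)
    parts += [f"#{tag.lstrip('#')}" for tag in hashtags]     # Hashtag Search
    parts += [f"@{name.lstrip('@')}" for name in mentions]   # Mention Search
    if from_user:
        parts.append(f"from:{from_user.lstrip('@')}")        # From User Search
    if any_of:
        parts.append(" OR ".join(any_of))                    # OR Operator
    parts += [f"-{word}" for word in exclude]                # Keyword Exclusion
    if geocode:
        latitude, longitude, radius = geocode
        parts.append(f"geocode:{latitude},{longitude},{radius}")
    if lang:
        parts.append(f"lang:{lang}")                         # Language-Specific Search
    # Assumption: space-separated operators are combined as an implicit AND.
    return " ".join(parts)


def to_request(query, max_results):
    """Wrap a query in the entry format used by request_list.json."""
    return {"query": query, "max_results": max_results}


if __name__ == "__main__":
    query = build_query(hashtags=["ai"], exclude=["spam"], lang="en")
    print(to_request(query, 100))
    # prints {'query': '#ai -spam lang:en', 'max_results': 100}
```

How an `OR` group interacts with neighboring terms depends on the search backend, so check combined queries against the Masa Protocol Twitter docs before relying on them.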

## Project Structure

- `masa_ai/`: Main package directory
  - `configs/`: Configuration files
  - `connections/`: API connection handlers
  - `tools/`: Core functionality modules
    - `qc/`: Quality control tools
    - `retrieve/`: Data retrieval tools
    - `utils/`: Utility functions
  - `orchestration/`: Request management and processing

## Dependencies

Key dependencies include:

- Data processing: numpy, pandas
- API interaction: requests
- Configuration: dynaconf
- Logging: colorlog
- Natural Language Processing: langchain, openai
- Data visualization: matplotlib, streamlit

For a full list of dependencies, refer to `pyproject.toml`.

## Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for more information on how to get started, including documentation best practices.

## Documentation

The MASA project uses Sphinx to generate its documentation. Running `masa-ai-cli` with the `--docs` option rebuilds the documentation and displays it.

To view the documentation:

```bash
masa-ai-cli --docs [page_name]
```

The page name is optional. If you omit it, the documentation for the entire project is displayed.

## License

This project is licensed under the MIT License. See the [LICENSE](../../LICENSE) file for details.
