Metadata-Version: 2.4
Name: feather_tools
Version: 1.1.0
Summary: Scripts to use for pyarrow feather files in a Linux terminal window
Author-email: Andreas Paepcke <paepcke@cs.stanford.edu>
Maintainer-email: Andreas Paepcke <paepcke@cs.stanford.edu>
License: MIT License
Project-URL: Homepage, https://github.com/paepcke/feather-tools
Project-URL: Documentation, https://github.com/paepcke/feather-tools
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: pandas>=2.2.2
Requires-Dist: pyarrow>=16.0.0
Provides-Extra: test
Requires-Dist: pytest; extra == "test"

# Command Line Tools for Apache Feather Files

Stand-alone Unix shell command line tools for Apache's .feather formatted files:

```bash
- fless
- ftail
- fwc (-l)
- f2csv
```

Each tool operates in the spirit of the analogous Unix shell command. Only the most basic uses of these original Unix tools are suported in their *f* version. For example, the `fwc` command operates like `wc -l`, i.e. it displays the number of data rows. But `wc -c` is not provided.

## Installation

The recommended installation is via `pipx`, which is designed to install shell level command line tools. It will create a `$HOME/.local/bin` directory if it does not exist. The commands `fless`, `ftail`, etc. will reside there. The  `pipx` installation offers to add the location to the OS `PATH` by adding an entry in the shell config file.

```bash
- pip install pipx
- pipx install feather-tools
```

## Usage

### `fless`

```bash
fless <fpath>
```

Shows one screenful of the .feather file at a time. The number of rows displayed is determined by the terminal in which the command is issued. At the end of each displayed page, type a single character:

- To show the next page: `spacebar`, or `\n`, or the character *n*
- Back one page: *b*
- Back to beginning (page 0): *s*
- To the last page: *e*
- For help: *h*
- To quit the display: *q*

### `ftail <fpath> [(-n | --lines) n]`

Displays the last *n* rows of the .feather file. Default is the lesser of 10, and the length of the data.

**NOTE**: Starts the row display with the logical (i.e. terminal-height) page that contains the first line specified by the tail default or in the `--lines` argument. So a few more rows than requested may be displayed at the top.

### `fwc <fpath>`

Is analogout to the Unix `wc -l`, and shows the number of data rows, not counting the column header.

### `f2csv <src-fpath> [<dst-fpath>]`

Writes a .csv file that contains the .feather data. Default separator is comma. Default output is stdout.

The command line arguments are as in the [Pandas `df.to_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html) documentation. This means that the `index` keyword is True by default, while `index_label` is None. Which leads to ugly .csv like:

```markdown
,foo,bar,fum
Row0,1,2,3
Row1,4,5,6
Row2,7,8,9
```

Where the `Row`*n* are the dataframe index. Note the orphan comma in the header. To fix (same as with df.to_csv()):

```bash
f2csv --index=true --index_label=Rows file.feather

Rows,foo,bar,fum
Row0,1,2,3
Row1,4,5,6
Row2,7,8,9
```

or:

```bash
f2csv --index=false file.feather

foo,bar,fum
1,2,3
4,5,6
7,8,9
```

If you clone this repo to make changes: all Python imports are relative to `<proj-root>/src`. So ensure that your $PYTHONPATH includes that directory.

## Testing

To test the code, clone this repo, and install the test dependency:

```bash
pip install .[test]
pytest
```
