Metadata-Version: 2.1
Name: pytokr
Version: 1.0
Summary: Very simple tokenizer for teaching purposes
Author-email: José Luis Balcázar <jose.luis.balcazar@upc.edu>
Project-URL: Homepage, https://github.com/balqui/pytokr
Project-URL: Bug Tracker, https://github.com/balqui/pytokr/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

# pytokr

Very simple, somewhat stoned tokenizer for teaching purposes.

Current version 1.0 both for this repo and for the pip-installable version.

Behaviorally inspired by the early versions of the 
[easyinput module](https://github.com/jutge-org/easyinput); 
shares with it some similar aims, but not the aim of 
conceptual consistency with C/C++. A separate, different 
evolution of `easyinput` is [yogi](https://github.com/jutge-org/yogi).


## Install

The usual incantation should work: `pip install pytokr` or,
in case you already have an earlier `pytokr`,
`pip install --upgrade pytokr`
(maybe with either `sudo` or `--user` or within a 
virtual environment).

If that does not work, download or clone the repo, then 
put the `pytokr` folder where Python can see it from 
wherever you want to use it.

## Simplest usage

Finds items (simple tokens, white-space separated) in a 
string-based iterable such as stdin (default). Ends of 
line are counted as white space but are otherwise ignored. 

Simplest usage is

`from pytokr import item`

Then call `item()` to keep retrieving white-space-separated
items from `stdin`. If no items remain, a custom
`EndOfDataError` exception is raised. Note that, since
white space (including ends of line) is ignored, a program
whose input holds only white space _is_ at end of data.
The outcomes are `str`: casting them into `int` or `float`
or whatever, if convenient, falls upon the caller.
Of course you can assign to the function a different name
at `import` time by using a standard `as` clause.
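
A minimal sketch of this simplest usage (the input values and the
casts are merely illustrative; casting is up to the caller):

    from pytokr import item

    day = item()           # each call returns the next item, as a str
    month = item()         # ends of line are skipped like any white space
    year = int(item())     # cast explicitly when a number is needed
    print(day, month, year)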

Alternatively, you may import an iterator on the whole
contents of `stdin`:

`from pytokr import items`

It is most naturally employed in a `for` loop:

`for itm in items():`

Then, the iterator gracefully stops at end of data and 
does not raise the `EndOfDataError` exception. Again the
renaming option applies, of course, and again ends of line
are ignored as white space.
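
For instance, a minimal sketch that adds up all the numbers in
`stdin`, however they are split across lines:

    from pytokr import items

    total = 0
    for itm in items():    # stops gracefully at end of data
        total += int(itm)  # items come as str; cast as needed
    print(total)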

If you import both, they interact naturally: the
individual `item()` function can be called inside a `for`
loop on the iterator, provided there is still at least
one item not yet read. Such a call advances the reading,
so the next item of the loop will be the current one _after_
those local advances. Briefly: both advance _the same_ iterator.
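
A minimal sketch of that interplay, assuming the input comes in
pairs of items:

    from pytokr import item, items

    for first in items():   # first item of each pair, from the loop
        second = item()     # second item, read via the same iterator
        print(first, second)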

## Slightly less simple usage

Alternatively, import the function that creates the reading
functions:

`from pytokr import pytokr`

Then call `pytokr` to obtain the tokenizer function; give it 
whatever name you see fit, say, `item`:

`item = pytokr()`

If a different source of items is desired, say `source` 
(e.g. a `file` just `open`'ed or a list of strings), 
simply pass it on:

`item = pytokr(source)`
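
For instance, a minimal sketch where the source is a list of strings
(the values are merely illustrative):

    from pytokr import pytokr

    item = pytokr(["10 20", "30"])
    print(item(), item(), item())   # prints: 10 20 30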

In either case, a second output can be requested, namely an
iterator over the items; say you want to name it `items`:

`item, items = pytokr(iter = True)`

(such a call also accepts a `source` as its first parameter).
Then you can run `for itm in items():` or build a list with `ls = list(items())`
and, with some care, avoid depending on the `EndOfDataError`
exception. Both combine naturally as explained above.
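
A minimal sketch combining both options on a just-opened file (the
file name is merely illustrative, and the file is assumed to hold at
least one item):

    from pytokr import pytokr

    with open("data.txt") as source:
        item, items = pytokr(source, iter = True)
        first = item()          # read one item individually
        rest = list(items())    # then gather whatever remains in a list
        print(first, rest)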

Also `from pytokr import __version__` works as expected.


## Example

Based on [Jutge problem P29448](https://jutge.org/problems/P29448_en),
Correct Dates (with spoilers removed):

    from pytokr import pytokr
    item, items = pytokr(iter = True)
    # alternative: from pytokr import item, items
    for d in items():
        m, y = item(), item()
        if correct_date(int(d), int(m), int(y)):
            print("Correct Date")
        else:
            print("Incorrect Date")

## (Un)Deprecations

The import of `item` and `items` has gone through several
deprecation and undeprecation stages. They are currently
undeprecated and can be used normally. Please upgrade
to the most recent version of `pytokr` and check the descriptions above.

The function `make_tokr` from earlier versions stays
deprecated. If employed on version 1.0 it will still work
but will print a deprecation message on `stderr`.


<!--- ## To do: 

- Sources in the 'tests/deprecated/' folder may use 
obsolete identifiers; keep updating them and moving
them to other appropriate folders (not sure how many
yet to go).

- I called initially the items 'toks' (for very simple 
'tokens') but that sounded a bit inappropriate to me, 
first, because of the simplicity of the case and, 
second, due to the early programming level of my 
target students. Calling them 'items' seems suboptimal 
though, since we are going to study `dict`'s later on 
and then risk confusions. But I settled on 'items' for 
the time being anyway; alternative suggestions welcome.

--->
