Metadata-Version: 2.1
Name: tworavens-preprocess
Version: 1.0
Summary: TwoRavens Preprocess package
Home-page: https://github.com/TwoRavens/raven-metadata-service
Author: Two Ravens Team
Author-email: raman_prasad@harvard.edu
License: UNKNOWN
Keywords: tworavens preprocess metadata
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.6
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: pandas (>=0.22.0)
Requires-Dist: scipy (>=1.0.0)
Requires-Dist: simplejson (>=3.13.2)
Requires-Dist: dictdiffer (==0.8.0)
Requires-Dist: xlrd (>=1.1.0)
Requires-Dist: jsonschema (>=2.6.0)
Requires-Dist: pycountry (==19.8.18)
Requires-Dist: us (==1.0.0)

# TwoRavens Preprocess

Python package that produces TwoRavens metadata for a data file. It is published on PyPI:
  - https://pypi.org/project/tworavens-preprocess/

```shell
pip install tworavens-preprocess
```
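
To confirm the install worked before opening a shell, one option is to ask Python whether the `raven_preprocess` module can be located. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, not part of the package.

```python
import importlib.util


def is_installed(module_name="raven_preprocess"):
    """Return True if a top-level module can be found in this environment.

    importlib.util.find_spec locates the module without executing it.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("raven_preprocess installed:", is_installed())
```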

## Preprocess a data file

- Open a Python shell and run the following:

```python
from raven_preprocess.preprocess_runner import PreprocessRunner

# process a data file
#
run_info = PreprocessRunner.load_from_file('input/path/my-data-file.csv')

# Did it work?
#
if not run_info.success:
    # nope :(
    #
    print(run_info.err_msg)
else:
    # yes :)
    #
    runner = run_info.result_obj

    # show the JSON (string)
    #
    print(runner.get_final_json(indent=4))

    # retrieve the data as a python OrderedDict
    #
    metadata = runner.get_final_dict()

    # iterate through the variables
    #
    for vkey, vinfo in metadata['variables'].items():
        print('-' * 40)
        print(f'--- {vkey} ---')
        print('nature:', vinfo['nature'])
        print('invalidCount:', vinfo['invalidCount'])
        print('validCount:', vinfo['validCount'])
        print('uniqueCount:', vinfo['uniqueCount'])
        print('median:', vinfo['median'])
        print('etc...')
```
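
The loop above indexes each statistic directly, so it raises `KeyError` if a variable lacks one of those keys. Whether every variable always carries every key is not stated here, so the sketch below assumes some may be missing and uses `.get()` instead. `summarize_variables` is a hypothetical helper, and the `sample` dict only mimics the `metadata['variables']` shape used above; its values are illustrative, not real preprocess output.

```python
from collections import OrderedDict

# Fields printed by the loop above. Whether every variable carries every
# field is an assumption; missing fields are reported as None.
DEFAULT_FIELDS = ("nature", "invalidCount", "validCount", "uniqueCount", "median")


def summarize_variables(metadata, fields=DEFAULT_FIELDS):
    """Return a list with one OrderedDict per variable.

    `metadata` is expected to have the shape used above:
    {'variables': {name: {field: value, ...}, ...}, ...}
    """
    rows = []
    for vkey, vinfo in metadata.get("variables", {}).items():
        row = OrderedDict([("name", vkey)])
        for field in fields:
            # .get() returns None instead of raising KeyError
            row[field] = vinfo.get(field)
        rows.append(row)
    return rows


# Hypothetical sample that only mimics the shape of metadata['variables'].
sample = {
    "variables": OrderedDict([
        ("age", {"nature": "ratio", "invalidCount": 0, "validCount": 10,
                 "uniqueCount": 8, "median": 34}),
        ("country", {"nature": "nominal", "invalidCount": 1, "validCount": 9,
                     "uniqueCount": 4}),
    ])
}

for row in summarize_variables(sample):
    print(dict(row))
```

With real output, pass the dict from `runner.get_final_dict()` in place of `sample`.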

## Preprocess a single file: output to screen or file

```python
# -------------------------
# Preprocess a single file,
# write output to screen
# -------------------------
from raven_preprocess.preprocess import run_preprocess
run_preprocess('path-to-input-file.csv')

# -------------------------
# Preprocess a single file,
# write output to a file (the output is JSON metadata)
# -------------------------
from raven_preprocess.preprocess import run_preprocess
run_preprocess('path-to-input-file.csv', 'path-to-OUTPUT-file.json')
```
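
An output file can be read back later with the standard library. This sketch assumes the file holds the same JSON that `get_final_json()` returns; `load_metadata` is a hypothetical helper, and `object_pairs_hook=OrderedDict` keeps keys in file order, matching the OrderedDict returned by `get_final_dict()`.

```python
import json
from collections import OrderedDict


def load_metadata(path):
    """Read a preprocess JSON output file into an OrderedDict.

    Assumes the file contains the metadata JSON written by run_preprocess.
    """
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh, object_pairs_hook=OrderedDict)


# Usage (the path is a placeholder):
# metadata = load_metadata('path-to-OUTPUT-file.json')
# print(list(metadata['variables']))
```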


