Metadata-Version: 2.1
Name: tsstp
Version: 0.2.0
Summary: Template semi-structure text parser
Home-page: https://gitlab.com/bmaterials/tsstp
Author: Zhuming Gong; Bing He
Author-email: shhebing@qq.com
License: UNKNOWN
Platform: UNKNOWN
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: lark (==1.1.2)
Requires-Dist: xmltodict (==0.12.0)

# Template Semi-Structured Text Parser
tsstp (Template Semi-Structured Text Parser) is a Python module for parsing semi-structured text with user-defined templates. It was developed to give programmatic access to the semi-structured text output produced by computational materials science software, but it can parse any semi-structured text that contains distinctive repeating patterns, and it can be extended to other text with special patterns and notation.

Unlike conventional file-specific parsers, which take only the data as input, tsstp takes two inputs, the data to be parsed and a parsing template, and returns a result structure containing the extracted information.

The same data can be parsed with different templates, each producing results that match its template. Templates are easy to create, so users can define templates that extract exactly the data they need, and they are encouraged to write their own tsstp templates. This makes data reuse simple, in line with the FAIR principles.

## Install

```
pip install tsstp
```

## How to use

 ```python
from tsstp import DataTemplate

# raw string, so the backslash in "POSCAR\(4)" is kept literally
data_to_parse = r"""
 POSCAR\(4)
 3
1.00000000000000
  8.3879995346000005    0.0000000000000000    0.0000000000000000
  0.0000000000000000    8.3879995346000005    0.0000000000000000
  0.0000000000000000    0.0000000000000000   23.0000000000000000
O    Fe   Ni
 50    33     1
 Direct configuration= 1
 """

template = """
 {{ head }}
 {{ loop_num }}
 {{ Scaling }}
 {{ Coordinates1 }} ~ loop_num
 {{ Coordinates2 }} ~ 3
 {{ Coordinates3 }} ~ 3
 {{ elements }} ~ n
 {{ elements_num }} ~ n
 Direct configuration= {{ number }}
 """

# create parser object and parse data using template:
parser = DataTemplate(data=data_to_parse, template=template)
parser.parse()

# print result in JSON format
results = parser.result(format='json')
print(results)

# Output:
# {
#  "head": "POSCAR\(4)",
#  "loop_num": "3",
#  "Scaling": "1.00000000000000",
#  "Coordinates1": ["8.3879995346000005", "0.0000000000000000", "0.0000000000000000"],
#  "Coordinates2": ["0.0000000000000000", "8.3879995346000005", "0.0000000000000000"],
#  "Coordinates3": ["0.0000000000000000", "0.0000000000000000", "23.0000000000000000"],
#  "elements": ["O", "Fe", "Ni"],
#  "elements_num": ["50", "33", "1"],
#  "number": "1"
# }
 ```
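All extracted values in the result above are strings. For numerical work they usually need to be converted first. The following is a minimal sketch of such a post-processing step, using only the standard library. It assumes that `parser.result(format='json')` returns a JSON string; the hard-coded `results` value is a stand-in copied from the example output (with the `head` field left out), and the helper name `to_structure` is hypothetical, not part of tsstp.

```python
import json

# Stand-in for parser.result(format='json'), copied from the example above.
# Assumption: the real call returns a JSON string with this layout.
results = """
{
 "loop_num": "3",
 "Scaling": "1.00000000000000",
 "Coordinates1": ["8.3879995346000005", "0.0000000000000000", "0.0000000000000000"],
 "Coordinates2": ["0.0000000000000000", "8.3879995346000005", "0.0000000000000000"],
 "Coordinates3": ["0.0000000000000000", "0.0000000000000000", "23.0000000000000000"],
 "elements": ["O", "Fe", "Ni"],
 "elements_num": ["50", "33", "1"]
}
"""


def to_structure(result_json: str) -> dict:
    """Convert the string-valued parse result into numeric Python types.

    Hypothetical helper for illustration; not part of the tsstp API.
    """
    raw = json.loads(result_json)
    scale = float(raw["Scaling"])
    # Build the 3x3 lattice matrix and apply the scaling factor.
    lattice = [
        [scale * float(x) for x in raw[key]]
        for key in ("Coordinates1", "Coordinates2", "Coordinates3")
    ]
    # Pair each element symbol with its atom count.
    composition = {
        element: int(count)
        for element, count in zip(raw["elements"], raw["elements_num"])
    }
    return {"lattice": lattice, "composition": composition}


structure = to_structure(results)
print(structure["composition"])                # {'O': 50, 'Fe': 33, 'Ni': 1}
print(sum(structure["composition"].values()))  # 84
```

Keeping the conversion separate from the template means the same template can be reused while each consumer decides which types it needs.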

