Metadata-Version: 2.1
Name: ctextlib
Version: 1.0.6
Summary: Python package with CText C++ extension
Home-page: https://github.com/antonmilev/CText
Author: Anton Milev
Author-email: baj.mile@abv.bg
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown

# CText
# Modern C++ text processing library

https://github.com/antonmilev/CText

## Python Reference
To install CText:

```
pip install ctextlib
```

To use CText in a Python script:
```python
from ctextlib import CTextA as text
a = text("Hello World")
print(a)
```

Python methods reference:

<b>addToFileName</b>
```python
a = text("C:\\Temp\\Temp2\\File.bmp")
a.addToFileName("_mask")
print(a)
```

```
C:\Temp\Temp2\File_mask.bmp
```

<b>append</b>
```python
a = text("Hello ")
a.append("World")
```

```
Hello World
```

```python
a = text("123")
a.append('4',4)
```

```
1234444
```

```python
a = text("")
a.append(['Hello', ' ', 'World'])
```

```
Hello World
```

<b>appendRange</b>
```python
a = text()
a.appendRange('a','z').appendRange('0','9')

```

```
abcdefghijklmnopqrstuvwxyz0123456789
```

<b>between</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
a.between('q','d')
print(a)
```

```
uick brown fox jumps over the lazy
```

```python
a = text('The quick brown fox jumps over the lazy dog')
a.between('quick','lazy')
print(a)
```

```
 brown fox jumps over the
```

<b>contain</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
if a.contain('quick') :
    print("contain 'quick'")
```

```
contain 'quick'
```

Case-insensitive search:

```python
a = text('The quick brown fox jumps over the lazy dog')
if a.contain('Quick', False) :
    print("contain 'quick'")
```

```
contain 'quick'
```

```python
a = text('The quick brown fox jumps over the lazy dog')
if a.contain(['slow','fast','quick']):
    print("contain 'quick'")
```

```
contain 'quick'
```

<b>containAny</b>
```python
a = text('Hello World')
a.containAny('abcd')
True
```
<b>containOnly</b>
```python
a = text('4365767')
a.containOnly('0123456789')
True
```

<b>convertToHex</b>
```python
a = text("Hello World")
a.convertToHex()
print(a)
```

```
48 65 6C 6C 6F 20 57 6F 72 6C 64
```

<b>count</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
a.count('the', False)
```

```
2
```

<b>countWordFrequencies</b>
```python
from ctextlib import CTextA as text
a = text("The quick brown fox jumps over the lazy dog")
a.countWordFrequencies(False)
```

```
[(2, 'the'), (1, 'brown'), (1, 'dog'), (1, 'fox'), (1, 'jumps'), (1, 'lazy'), (1, 'over'), (1, 'quick')]
```
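For comparison, the same result can be reproduced with the Python standard library. This is only a sketch: `count_word_frequencies` is a hypothetical helper, the `False` argument is assumed to mean case-insensitive counting, and the ordering (most frequent first, ties alphabetical) is inferred from the sample output above.

```python
from collections import Counter

def count_word_frequencies(s, case_sensitive=True):
    """Return (count, word) pairs, most frequent first, ties in alphabetical order."""
    # Assumption: words are separated by whitespace only
    words = s.split() if case_sensitive else s.lower().split()
    counts = Counter(words)
    pairs = [(n, w) for w, n in counts.items()]
    return sorted(pairs, key=lambda p: (-p[0], p[1]))

print(count_word_frequencies("The quick brown fox jumps over the lazy dog", False))
```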

<b>cutAfterFirst</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
a.cutAfterFirst('o')
```

```
The quick br
```

<b>cutAfterLast</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
a.cutAfterLast('o')
```

```
The quick brown fox jumps over the lazy d
```


<b>cutBeforeFirst</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
a.cutBeforeFirst('o')
```

```
own fox jumps over the lazy dog
```

<b>cutEnds</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
a.cutEnds(4)
```

```
quick brown fox jumps over the lazy
```

<b>cutLeft</b>
```python
s = text("Hello World")
s.cutLeft(6)
```

```
World
```

<b>cutRight</b>
```python
s = text("Hello World")
s.cutRight(6)
```

```
Hello
```

<b>enclose</b>
```python
a = text("Hello World")
a.enclose('<','>')
```

```
<Hello World>
```

<b>endsWith</b>
```python
a = text("Hello World")
if a.endsWith('World'):
    print("ends with 'World'")
```


```
ends with 'World'
```

With case-insensitive search:

```python
a = text("Hello World")
if a.endsWith('world', False):
    print("ends with 'world'")
```

```
ends with 'world'
```

<b>endsWithAny</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
if a.endsWithAny(['cat','dog']):
    print('end to animal...')
```

```
end to animal...
```

<b>erase</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
a.erase(8, 10)
print(a)
```

``` 
The quicx jumps over the lazy dog
``` 

<b>equal</b>
```python
a = text()
a.equal('A',10)
```

```
AAAAAAAAAA
```

<b>find</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
a.find('brown')
```

```
'brown fox jumps over the lazy dog'
```

With case-insensitive search:

```python
a = text('The quick brown fox jumps over the lazy dog')
a.find('Brown', False)
```

```
'brown fox jumps over the lazy dog'
```

<b>fromArray</b>
```python
a = text()
a.fromArray([1,2,3,4])
print(a)
```

```
1 2 3 4
```

```python
a = text()
a.fromArray([1,2,3,4], '|')
print(a)
```

```
1|2|3|4
```

```python
a = text()
a.fromArray([1,2,3,4], '')
print(a)
```

```
1234
```

Array of floats

```python
a = text()
a.fromArray([1.1,2.2,3.3,4.4])
print(a)
```

```
1.1 2.2 3.3 4.4
```

Array of strings
```python
a = text()
a.fromArray(['hello','world'])
print(a)
```

```
hello world
```

```python
import numpy as np
a = text()
a.fromArray(np.array(["hello","world"]))
print(a)
```

```
hello world
```

<b>fromArrayAsHex</b>
```python
a = text()
a.fromArrayAsHex([10,20,30,40])
print(a)
```

```
0A 14 1E 28
```

Use without separator

```python
a.fromArrayAsHex([10,20,30,40],2,'')
print(a)
```

```
0A141E28
```

```python
a = text()
a.fromArrayAsHex([1000,2000,3000,4000])
print(a)
```

```
3E8 7D0 BB8 FA0
```

```python
a = text()
a.fromArrayAsHex([1000,2000,3000,4000], 4, ',')
print(a)
```

```
03E8,07D0,0BB8,0FA0
```
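The formatting rules visible in the outputs above (upper-case digits, zero-padding to a minimum width, a configurable separator) can be sketched in plain Python. `array_as_hex` is a hypothetical helper, and the default width of 2 is an assumption inferred from the examples, not taken from the CText source.

```python
def array_as_hex(values, width=2, sep=" "):
    """Format non-negative integers as upper-case hex, zero-padded to at least `width` digits."""
    # "0{width}X" pads with zeros but never truncates longer values
    return sep.join(format(v, "0{}X".format(width)) for v in values)

print(array_as_hex([10, 20, 30, 40]))
print(array_as_hex([1000, 2000, 3000, 4000], 4, ","))
```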

<b>fromBinary</b>
```python
a = text()
a.fromBinary(12345)
print(a)
```

```
00000000000000000011000000111001
```
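The output above is a 32-digit binary string. A plain-Python equivalent, assuming a fixed 32-bit width (inferred from the example, and `to_binary_text` is a hypothetical name):

```python
def to_binary_text(n, bits=32):
    """Return n as a zero-padded binary string of at least `bits` digits."""
    return format(n, "0{}b".format(bits))

print(to_binary_text(12345))
```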

<b>fromDouble</b>
```python
a = text()
a.fromDouble(3.333338478)
print(a)
a.fromDouble(3.33989, 4)
print(a)
a.fromDouble(3.333338478, 10)
```

```
3.333338
3.3399
3.3333384780
```

<b>fromHex</b>
```python
a = text()
a.fromHex(1234567)
a.fromHex('a')
```

```
0012D687
61
```

<b>fromInteger</b>
```python
a = text()
a.fromInteger(358764)
print(a)
```

```
358764
```

<b>fromMatrix</b>
```python
from ctextlib import CTextA as text
import numpy as np
x = np.array([[10, 20, 30], [40, 50, 60]])
a = text()
a.fromMatrix(x)
print(a)
```

```
10 20 30
40 50 60
```

```python
from ctextlib import CTextA as text
import numpy as np
x = np.array([[10, 20, 30], [40, 50, 60]])
a = text()
a.fromMatrix(x, ',')

```

```
10,20,30
40,50,60
```

<b>fromMatrixAsHex</b>
```python
from ctextlib import CTextA as text
import numpy as np
x = np.array([[10, 20, 30], [40, 50, 60]])
a = text()
a.fromMatrixAsHex(x)
print(a)
```

```
0A 14 1E
28 32 3C
```

```python
from ctextlib import CTextA as text
import numpy as np
x = np.array([[1000, 2000, 3000], [4000, 5000, 6000]])
a = text()
a.fromMatrixAsHex(x,4)
print(a)
```

```
03E8 07D0 0BB8
0FA0 1388 1770
```

<b>getDir</b>
```python
a = text("D:\\Folder\\SubFolder\\TEXT\\file.dat")
a.getDir()
```

```
D:\Folder\SubFolder\TEXT\
```

<b>getExtension</b>
```python
a = text("D:\\Folder\\SubFolder\\TEXT\\file.dat")
a.getExtension()
```

```
'.dat'
```

<b>getFileName</b>
```python
a = text("D:\\Folder\\SubFolder\\TEXT\\file.dat")
a.getFileName()
```

```
'file.dat'
```

<b>hash</b>
```python
s.hash()
```

```
9257130453210036571
```

<b>indexOf</b>
```python
a = text("The quick brown fox jumps over the lazy dog.")
a.indexOf("brown")
```

```
10
```

<b>indexOfAny</b>
```python
a = text("The quick brown fox jumps over the lazy dog.")
a.indexOfAny(["fox", "dog"])
```

```
16
```

<b>indexOfAny</b>

```python
a = text("The quick brown fox jumps over the lazy dog.")
a.indexOfAny("abc")
```

```
7
```

<b>insert</b>
```python
a = text("abc")
a.insert(1,'d',2)
```

```
addbc
```

```python
a = text("The quick jumps over the lazy dog.")
a.insert(10,"fox ")
```

```
The quick fox jumps over the lazy dog.
```

<b>insertAtBegin</b>
<br><b>insertAtEnd</b>
```python
a = text("Hello")
a.insertAtBegin("<begin>")
a.insertAtEnd("</begin>")
```

```
<begin>Hello</begin>
```

<b>isAlpha</b>
```python
a = text("Abcd")
a.isAlpha()
True
```

<b>isBinary</b>
```python
a = text("01111011100001")
a.isBinary()
True
```

<b>isEmpty</b>
```python
a = text()
a.isEmpty()
True
```

<b>isHexNumber</b>
```python
a = text("12AB56FE")
a.isHexNumber()
True
```

<b>isNumber</b>
```python
a = text("123456")
a.isNumber()
True
```

<b>isLower</b>
```python
a = text("hello world")
a.isLower()
True
```

<b>isUpper</b>
```python
a = text("HELLO WORLD")
a.isUpper()
True
```

<b>isPalindrome</b>
```python
a = text("racecar")
a.isPalindrome()
True
```

<b>keep</b>
```python
s = text("Hello World").keep(3,5)
```

```
lo Wo
```

<b>keepLeft</b>
```python
a = text("The quick jumps over the lazy dog.")
a.keepLeft(10)
```

```
The quick
```

<b>keepRight</b>
```python
a = text("The quick jumps over the lazy dog.")
a.keepRight(10)
```

```
 lazy dog.
```

<b>lastIndexOf</b>
```python
s = text("Hello World")
s.lastIndexOf('l')
```

```
9
```

<b>lines</b>
```python
a = text("L1\nL2\n\nL3\nL4\n  \n\nL5")
a.lines()
```

```
['L1', 'L2', 'L3', 'L4', 'L5']
```

<b>linesCount</b>
```python
a = text("L1\nL2\n\nL3\nL4\n  \n\nL5")
a.linesCount()
```

```
7
```

<b>linesRemoveEmpty</b>
```python
a = text("L1\nL2\n\nL3\nL4\n  \n\nL5")
a.linesRemoveEmpty()
print(a)
```

```
L1
L2
L3
L4
L5
```

Several per line methods
<br><b>linesAppend</b>
<br><b>linesInsertAtBegin</b>
<br><b>linesSort</b>
<br><b>linesPaddRight</b>
<br><b>linesTrim</b>
<br>Example of opening a text file, sorting all lines, and saving the result under another name
```python
from ctextlib import CTextA as text
s = text()
s.readFile('Unordered.txt')
s.linesSort()
s.writeFile('Sorted_python.txt')
```

<b>limit</b>
```python
s = text("Hello World")
s.limit(6)
```

```
Hello
```

<b>lower</b>
```python
s = text("Hello World")
s.lower()
```

```
hello world
```

<b>makeUnique</b>
```python
a = text()
a.appendRange('a','z').appendRange('a','z')
# a is now: abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
a.makeUnique()
print(a)
```

```
abcdefghijklmnopqrstuvwxyz
```

<b>mid</b>
```python
a = text("Hello World").mid(3)
```

```
lo Wo
```

<b>nextLine</b>
```python
# Example of iterating all lines
from ctextlib import CTextA as text
a = text("Line1\nLine2\nLine3")
line = text()
pos = 0
while(pos >= 0):
    pos = a.nextLine(pos,line)
    print(line)
```

```
Line1
Line2
Line3
```

<b>nextWord</b>
```python
# Example of iterating all words
from ctextlib import CTextA as text
a = text('The quick brown fox jumps over the lazy dog')
word = text()
pos = 0
while(pos >= 0):
    pos = a.nextWord(pos,word)
    print(word)
```

```
The
quick
brown
fox
jumps
over
the
lazy
dog
```

<b>paddLeft</b>
```python
s = text("Abra")
s.paddLeft('.', 16)
```

```
............Abra
```

<b>paddRight</b>
```python
s = text("Abra")
s.paddRight('.', 16)
```

```
Abra............
```

<b>pathCombine</b>
```python
a = text("C:\\Temp")
a.pathCombine("..\\Folder")
```

```
C:\Folder
```

<b>quote</b>
```python
a = text("Hello")
a.quote()
```

```
"Hello"
```

<b>random</b>
```python
a = text()
a.random()
"P1kAlMiG2Kb7FzP5"
a.sort()
"1257AFGKMPPbiklz"
a.shuffle()
"k2lF7KAPG5M1Pzbi"
a.random(32)
"P1kAlMiG2Kb7FzP5tM1QBI6DSS92c31A"
```

<b>randomAlpha</b>
```python
s = text()
s.randomAlpha()
"IkEffmzNiMKKASVW"
```

<b>randomNumber</b>
```python
s = text()
s.randomNumber()
3892795431
s.randomNumber(32)
33341138742779319865028602486509
```

<b>readFile</b>
```python
# demonstrates how to read a whole text file
from ctextlib import CTextA as text
a = text()
a.readFile('test.txt')
print(a)
```

```
Hello World
```


<b>regexMatch</b>
```python
s = text("+336587890078")
if(s.regexMatch("(\\+|-)?[[:digit:]]+")):
    print("it is a number")
```

```
it is a number
```

<b>regexLines</b>
```txt
animals.txt
------------
Cat
Dog
Giraffe
Lion
Llama
Monkey
Mouse
Parrot
Poodle
Scorpion
Snake
Weasel
```

```python
# collect all lines starting with given characters
from ctextlib import CTextA as text
a = text()
a.readFile("animals.txt")
a.regexLines("^[A-G][a-z]+")
```

```
['Cat', 'Dog', 'Giraffe']
```

<b>regexReplace</b>    
```python
from ctextlib import CTextA as text
a = text("there is sub-sequence in the sub-way string")
a.regexReplace("\\b(sub)([^ ]*)", "sub-$2")
```

```
there is sub--sequence in the sub--way string
```
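For comparison, the same substitution with Python's built-in `re` module is shown below. The `$2` back-reference in the example above suggests ECMAScript-style replacement syntax (an inference, not confirmed by the source); in Python the equivalent back-reference is written `\2`.

```python
import re

s = "there is sub-sequence in the sub-way string"
# Group 2 captures everything after "sub" up to the next space, including the hyphen
result = re.sub(r"\b(sub)([^ ]*)", r"sub-\2", s)
print(result)
```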

<b>regexSearch</b>    
```python
# collect all words using regex
from ctextlib import CTextA as text
a = text("The quick brown fox jumps over the lazy dog")
a.regexSearch("\\w+")
```

```
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```

<b>regexWords</b>    
```python
# collect all words starting with given characters
from ctextlib import CTextA as text
a = text("The quick brown fox jumps over the lazy dog")
a.regexWords("^[a-n][a-z]+")
```

```
['brown', 'fox', 'jumps', 'lazy', 'dog']   
```

<b>remove</b>
```python
a = text('we few, we happy few, we band of brothers.')
a.remove('we')
a.reduceChain()
a.trim()
```

```
few happy few band of brothers
```

<b>removeAny</b>
```python
from ctextlib import CTextA as text
a = text('The quick brown fox jumps over the lazy dog')
a.removeAny(['brown','quick','lazy'])
a.reduceChain()
```

```
The fox jumps over the dog
```

<b>removeExtension</b>
```python
a = text("D:\\Folder\\SubFolder\\TEXT\\File.dat")
a.removeExtension()
```

```
D:\Folder\SubFolder\TEXT\File
```

<b>removeFileName</b>
```python
a = text("D:\\Folder\\SubFolder\\TEXT\\File.dat")
a.removeFileName()
```

```
D:\Folder\SubFolder\TEXT\
```

<b>removeWhileBegins</b>
```python
a = text("Some text ending with something")
a.removeWhileBegins("Some text ")
print(a)
```

```
ending with something
```

<b>removeWhileEnds</b>
```python
a = text("Some text ending with something")
a.removeWhileEnds(" something")
print(a)
```

```
Some text ending with
```

<b>replace</b>
```python
a = text("The quick brown fox jumps over the lazy dog")
a.replace("fox", "cat")
print(a)
```

```
The quick brown cat jumps over the lazy dog
```

```python
a = text("The quick brown fox jumps over the lazy dog")
a.replace(["fox", "cat","dog","quick"], "-")
```

```
The ----- brown --- jumps over the lazy ---
```

<b>replaceAny</b>
```python
a = text("The quick brown fox jumps over the lazy dog")
a.replaceAny(["fox", "cat","dog"], "***")
print(a)
```

```
The quick brown *** jumps over the lazy ***
```

```python
a = text("The quick brown fox jumps over the lazy dog")
a.replaceAny(["fox", "dog"], ["dog", "fox"])
```

```
The quick brown dog jumps over the lazy fox
```

<b>reverse</b>
```python
a = text("Hello")
a.reverse()
```

```
olleH
```






<b>right</b>
```python
a = text("Hello World")
a.right(5)
```

```
World
```

<b>rotate</b>
```python
a = text("Hello World")
a.rotateLeft(2)
a.rotateRight(4)
```

Output
```
llo WorldHe
ldHello Wor
```
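The two rotations can be expressed with plain Python slicing, which makes the semantics easy to check. `rotate_left` and `rotate_right` are hypothetical helpers written for illustration, not part of ctextlib.

```python
def rotate_left(s, n):
    """Move the first n characters of s to the end."""
    if not s:
        return s
    n %= len(s)
    return s[n:] + s[:n]

def rotate_right(s, n):
    """Move the last n characters of s to the front."""
    if not s:
        return s
    return rotate_left(s, len(s) - n % len(s))

b = rotate_left("Hello World", 2)
print(b)
print(rotate_right(b, 4))
```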

<b>split</b>
```python
# by default split uses the standard separators (" \t\r\n")
a = text("The quick brown fox jumps over the lazy dog")
a.split()
```

```
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```

```python
# split can be used with any list of separator characters
a = text("The quick, brown....fox,,, ,jumps over,the  lazy.dog")
a.split(",. ")
```

```
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```
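Python's `str.split` does not accept a set of separator characters, so the closest standard-library equivalent uses `re.split`. A sketch, where `split_any` is a hypothetical helper and dropping empty fields is assumed from the output above:

```python
import re

def split_any(s, separators=" \t\r\n"):
    """Split s on runs of any character in `separators`, dropping empty fields."""
    pattern = "[" + re.escape(separators) + "]+"
    return [part for part in re.split(pattern, s) if part]

print(split_any("The quick, brown....fox,,, ,jumps over,the  lazy.dog", ",. "))
```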

<b>toBinaryNumber</b>
```python
bOk = False
a = text("100001")
a.toBinaryNumber(bOk)
33
```

<b>toHexNumber</b>
```python
bOk = False
a = text("1E1E")
a.toHexNumber(bOk)
7710
```

<b>trim</b>
```python
a = text(" \t\n   lazy dog  \t\n   ")
a.trim()
# lazy dog
a = text("000000000000101")
a.trimLeft("0")
# 101
a = text("101000000000000")
a.trimRight('0')
# 101
a = text("0000000101000000000")
a.trim("0")
# 101
```

<b>upper</b>
```python
s = text("Hello World")
s.upper()
```

```
HELLO WORLD
```

<b>words</b>
```python
a = text("The quick brown fox jumps over the lazy dog")
a.words()
```

```
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```

```python
a = text("The|quick|brown|fox|jumps|over|the|lazy|dog")
a.words('|')
```

```
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```

<b>wordsCapitalize</b>
```python
a = text("The quick brown fox jumps over the lazy dog")
a.wordsCapitalize()
```

```
The Quick Brown Fox Jumps Over The Lazy Dog
```

<b>wordsCount</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
a.wordsCount()
```

```
9
```

<b>wordsEnclose</b>
```python
a = text("The quick brown fox jumps over the lazy dog")
a.wordsEnclose('[',']')
```

```
[The] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog]
```

<b>wordsReverse</b>
```python
a = text("The quick brown fox jumps over the lazy dog")
a.wordsReverse()
```

```
ehT kciuq nworb xof spmuj revo eht yzal god
```


<b>wordsSort</b>
```python
a = text('The quick brown fox jumps over the lazy dog')
a.wordsSort()
```

Output
```
The brown dog fox jumps lazy over quick the
```

<b>writeFile</b>
```python
# demonstrates how to write text to a file
from ctextlib import CTextA as text
a = text("Hello World")
a.writeFile('test.txt')
print(a)
```

For the full list, type `help(ctextlib)`.

## Performance Tests

Compared to Python's built-in string methods, CText methods are faster in many cases, sometimes 2-3 times faster. Compared to Python regular expressions, the difference is even bigger.

Several Python performance tests and their results are given below.

Test 1 - words enclose
```python
from time import perf_counter
from ctextlib import CTextA as text
import re
import urllib.request

# download words.txt from https://github.com/dwyl/english-words
print("download words.txt.....")
url = 'https://github.com/dwyl/english-words/raw/master/words.txt'
urllib.request.urlretrieve(url, 'words.txt')

a = text()
if(a.readFile("words.txt") == False):
    print("error opening file")
    exit()
start = perf_counter()
a.wordsEnclose('[',']')
duration = perf_counter() - start
print('{} took {:.3f} ms'.format("words.txt CText wordsEnclose", duration * 1000))

if(a.readFile("words.txt") == False):
    print("error opening file")
    exit()

s = a.str()

start = perf_counter()
b = ' '.join('[{}]'.format(word) for word in s.split('\n'))
duration = perf_counter() - start

print('{} took {:.3f} ms'.format("words.txt Python split-join", duration * 1000))

s = a.str()

start = perf_counter()
s = re.sub(r'(\w+)',r'[\1]',s)
duration = perf_counter() - start
print('{} took {:.3f} ms'.format("words.txt Python re.sub", duration * 1000))
```

Output
```
words.txt CText wordsEnclose took 92.083 ms
words.txt Python split-join took 186.377 ms
words.txt Python re.sub took 601.214 ms
```

Test 2 - words reverse

```python
from time import perf_counter
from ctextlib import CTextA as text
import re
import urllib.request

# download words.txt from https://github.com/dwyl/english-words
print("download words.txt.....")
url = 'https://github.com/dwyl/english-words/raw/master/words.txt'
urllib.request.urlretrieve(url, 'words.txt')

a = text()
if(a.readFile("words.txt") == False):
    print("error opening file")
    exit()
start = perf_counter()
a.wordsReverse()
duration = perf_counter() - start
print('{} took {:.3f} ms'.format("words.txt CText wordsReverse", duration * 1000))

if(a.readFile("words.txt") == False):
    print("error opening file")
    exit()

s = a.str()

start = perf_counter()
b = ' '.join( word[::-1] for word in (s.split('\n')))
duration = perf_counter() - start

print('{} took {:.3f} ms'.format("words.txt Python reverse split-join", duration * 1000))

start = perf_counter()
words = ' '.join( word[::-1] for word in ( re.findall(r'\w+|[:;,.!?]', s)))
duration = perf_counter() - start
print('{} took {:.3f} ms'.format("words.txt Python reverse re.findall", duration * 1000))
```

```
words.txt CText wordsReverse took 78.501 ms
words.txt Python reverse split-join took 130.286 ms
words.txt Python reverse re.findall took 609.706 ms
```

Test 3 - remove empty lines

```python
from time import perf_counter
from ctextlib import CTextA as text
import re
import urllib.request

# download from https://www.gutenberg.org/files/2600/2600-0.txt
print("download 2600-0.txt.....")
url = 'https://www.gutenberg.org/files/2600/2600-0.txt'
urllib.request.urlretrieve(url, '2600-0.txt')

a = text()
if(a.readFile("2600-0.txt") == False):
    print("error opening file")
    exit()

s = a.str()

start = perf_counter()
a.linesRemoveEmpty()
duration = perf_counter() - start
print('{} took {:.3f} ms'.format("linesRemoveEmpty", duration * 1000))
#print(a)

start = perf_counter()    
b = '\n'.join(line for line in s.split('\n') if line.strip() != '')
duration = perf_counter() - start
print('{} took {:.3f} ms'.format("line.strip", duration * 1000))
```


```
linesRemoveEmpty took 11.599 ms
line.strip took 31.567 ms
```

![CText Performance](https://github.com/antonmilev/CText/blob/master/PerfChart.png)

When CText word-list operations are compared with Python regular expressions, the performance gap becomes much bigger.
For example, the CText wordsReplaceAny function is compared below with regex.sub. To manage large word lists, CText uses optimized character tries,
so search time is a linear function of the number of words. Replacing the 500 most common English words with a single fixed string in War and Peace by Leo Tolstoy (Gutenberg EBook) takes CText 27 times less time than regular expressions; for 1000 words <b>CText becomes more than 50 times faster</b>!

![CText replace words Performance](https://github.com/antonmilev/CText/blob/master/PerfWordsReplace.png)

```python
from time import perf_counter
from ctextlib import CTextA as text
import re
import urllib.request

url = 'https://gist.githubusercontent.com/deekayen/4148741/raw/98d35708fa344717d8eee15d11987de6c8e26d7d/1-1000.txt'
urllib.request.urlretrieve(url, 'words1000.txt')

with open('words1000.txt', 'r') as f:
    words = f.read().split('\n')

print(words)

print("replace using CText.....")    
a = text()
if(a.readFile("2600-0.txt") == False):
    print("error opening file")
    exit()

s = a.str()

start = perf_counter()   
a.wordsReplaceAny(words, "***")
duration = perf_counter() - start
print('{} took {:.3f} ms'.format("replace 1000 words with CText wordsReplaceAny", duration * 1000))

start = perf_counter()  
regex = re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, words)))   
s_new = regex.sub("***", s)
duration = perf_counter() - start
print('{} took {:.3f} ms'.format("replace 1000 words with regex.sub", duration * 1000))

```

```
replace 1000 words with CText wordsReplaceAny took 77.058 ms
replace 1000 words with regex.sub took 4445.524 ms
```

Similarly, wordsReplaceWithChar is <b>more than 60 times</b> faster than re.sub:

```python
# replaces each word from the given list with a single character repeated to the word's length
from time import perf_counter
from ctextlib import CTextA as text
import re
import urllib.request

url = 'https://gist.githubusercontent.com/deekayen/4148741/raw/98d35708fa344717d8eee15d11987de6c8e26d7d/1-1000.txt'
urllib.request.urlretrieve(url, 'words1000.txt')

with open('words1000.txt', 'r') as f:
    words = f.read().split('\n')[:1000]


print("replace using CText.....")    
a = text()
if(a.readFile("2600-0.txt") == False):
    print("error opening file")
    exit()

s = a.str()

start = perf_counter()   
a.wordsReplaceWithChar(words, "-")
duration = perf_counter() - start
print('{} took {:.3f} ms'.format("replace 1000 words with CText wordsReplaceWithChar", duration * 1000))

#print(a)

def repl(m):
    return '-' * len(m.group())

start = perf_counter()  
regex = re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, words)))   
s_new = regex.sub(repl, s)
duration = perf_counter() - start
print('{} took {:.3f} ms'.format("replace 1000 words with regex.sub", duration * 1000))
```

```
replace 1000 words with CText wordsReplaceWithChar took 69.136 ms
replace 1000 words with regex.sub took 4225.293 ms
```
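The character-trie lookup mentioned above for word-list replacement can be illustrated in pure Python. This is only a minimal sketch of the idea, not CText's implementation: `build_trie`, `in_trie`, and `replace_words` are hypothetical names, and a word is assumed to be a run of `\w` characters.

```python
import re

def build_trie(words):
    """Build a nested-dict character trie; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root

def in_trie(trie, token):
    """Walk the trie one character at a time; True only for a complete stored word."""
    node = trie
    for ch in token:
        node = node.get(ch)
        if node is None:
            return False
    return None in node

def replace_words(s, words, replacement):
    """Replace every whole word of s that is stored in the trie."""
    trie = build_trie(w for w in words if w)  # skip empty entries
    def repl(m):
        return replacement if in_trie(trie, m.group()) else m.group()
    return re.sub(r"\w+", repl, s)

print(replace_words("The quick brown fox jumps over the lazy dog",
                    ["fox", "cat", "dog"], "***"))
```

Here the regular expression only tokenizes the text; membership in the word list is decided by the trie walk, so the pattern does not grow with the word list.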

## TODO List
* **More methods for words, lines, sentences and complex expressions**: There are many more methods that can be added to support different NLP and lexical tasks.
* **Further improve containers abstraction**: CText needs more conversion routines to/from STL and other containers and generic data structures.
* **Regular Expressions**: Partial or full support for regular expressions.
* **Other char types**: Character types like char_32 can also be supported.
* **Mini Text Editor**: This is a text editor based on CText that I plan to port to Modern C++.
* **Export to Python**: I want to export the CText library to Python 3.



