Metadata-Version: 1.1
Name: unidump
Version: 1.1.2
Summary: hexdump for your unicode data
Home-page: https://github.com/Codepoints/unidump
Author: Manuel Strehl
Author-email: boldewyn@gmail.com
License: MIT
Description:   A Unicode codepoint dump.
        
          Think of it as  hexdump(1)  for Unicode.  The command analyses  the input and
          then prints three columns: the byte offset of the first codepoint in the row,
          the  codepoints in hex notation,  and finally the raw input characters,  with
          control and whitespace characters replaced by a dot.
        
          Invalid byte sequences  are represented  with an “X”  and with  the hex value
          enclosed in question marks, e.g., “?F5?”.
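        
          As a rough illustration of this notation, the following Python sketch marks
          undecodable bytes in the same way. It relies only on the built-in codecs and
          is not unidump's actual implementation; the helper name mark_invalid and its
          return shape are hypothetical.
        
        ```python
        def mark_invalid(data, encoding="utf-8"):
            """Return (hex column, text column) for data, flagging invalid bytes.
        
            Hypothetical helper for illustration only, not part of unidump.
            """
            hex_column, text_column = [], []
            pos = 0
            while pos < len(data):
                try:
                    # Happy path: everything from pos onwards decodes cleanly.
                    good, bad = data[pos:].decode(encoding), b""
                    pos = len(data)
                except UnicodeDecodeError as err:
                    # err.start/err.end are relative to the slice data[pos:].
                    good = data[pos:pos + err.start].decode(encoding)
                    bad = data[pos + err.start:pos + err.end]
                    pos += err.end
                for ch in good:
                    hex_column.append(f"{ord(ch):04X}")
                    text_column.append(ch)
                for byte in bad:
                    # Assumed notation: hex value in question marks, "X" as text.
                    hex_column.append(f"?{byte:02X}?")
                    text_column.append("X")
            return " ".join(hex_column), "".join(text_column)
        
        
        print(mark_invalid(b"A\xf5B"))  # ('0041 ?F5? 0042', 'AXB')
        ```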
        
          You can pipe in  data from stdin,  select several files at once,  or even mix
          all those input methods together.
        
        Examples:
        
        * Basic usage with stdin:
        
              echo -n 'ABCDEFGHIJKLMNOP' | unidump -n 4
                    0    0041 0042 0043 0044    ABCD
                    4    0045 0046 0047 0048    EFGH
                    8    0049 004A 004B 004C    IJKL
                   12    004D 004E 004F 0050    MNOP
        
        * Dump the code points translated from another encoding:
        
              unidump -c latin-1 some-legacy-file
        
        * Dump many files at the same time:
        
              unidump foo-*.txt
        
        * Control characters and whitespace are safely rendered:
        
              echo -n -e '\x01' | unidump -n 1
                   0    0001    .
        
        * Finally learn what your favorite Emoji is composed of:
        
              ( echo -n -e '\xf0\x9f\xa7\x9d\xf0\x9f\x8f\xbd\xe2' ; \
                echo -n -e '\x80\x8d\xe2\x99\x82\xef\xb8\x8f' ; ) | \
              unidump -n 5
                   0    1F9DD 1F3FD 200D 2642 FE0F    .🏽.♂️
        
          See  <http://emojipedia.org/man-elf-medium-skin-tone/> for images.  The “elf”
          emoji (the first character) is replaced with a dot here,  because the current
          version of Python’s unicodedata doesn’t know of this character yet.
        
        * Use it like strings(1):
        
              unidump -e '{data}' some-file.bin
        
          This will replace  every unknown byte from the input file  with “X” and every
          control and whitespace character with “.”.
        
        * Only print the code points of the input:
        
              unidump -e '{repr}'$'\n' -n 1 some-file.txt
        
          This results in  a stream of codepoints in hex notation,  each on a new line,
          without byte counter  or rendering of actual data.  You can use this to count
          the total number of characters  (as opposed to raw bytes)  in a file,  if you
          pipe it through `wc -l`.
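        
        The three-column layout shown in the examples above can be approximated with
        the Python standard library alone. The following is a minimal sketch, not
        unidump's own code: the function name sketch_dump, the per_row parameter, and
        the rule of replacing every character in a Unicode "C" (control/other) or "Z"
        (separator) category with a dot are assumptions made for illustration. It does
        not handle invalid byte sequences.
        
        ```python
        import unicodedata
        
        
        def sketch_dump(data, per_row=16, encoding="utf-8"):
            """Return rows of (byte offset, hex code points, printable text).
        
            Hypothetical helper for illustration only, not part of unidump.
            """
            text = data.decode(encoding)  # raises UnicodeDecodeError on bad input
            rows = []
            offset = 0
            for start in range(0, len(text), per_row):
                chunk = text[start:start + per_row]
                hex_points = " ".join(f"{ord(ch):04X}" for ch in chunk)
                # Assumption: "C*" and "Z*" categories are shown as a dot.
                shown = "".join(
                    "." if unicodedata.category(ch)[0] in "CZ" else ch
                    for ch in chunk
                )
                rows.append((offset, hex_points, shown))
                # Advance by the encoded size of this row, not its character count.
                offset += len(chunk.encode(encoding))
            return rows
        
        
        for offset, hex_points, shown in sketch_dump(b"ABCDEFGHIJKLMNOP", per_row=4):
            print(f"{offset:7d}    {hex_points}    {shown}")
        ```
        
        Because the offset advances by encoded bytes while each row holds a fixed
        number of code points, the two counts diverge as soon as the input contains
        multi-byte characters.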
        
        This is version 1.1.2 of unidump, using Unicode 8.0.0 data.
        
Keywords: unicode hexdump codepoint utility
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Debuggers
Classifier: Topic :: Software Development :: Internationalization
Classifier: Topic :: Text Processing :: General
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
