Metadata-Version: 2.4
Name: rtfsig
Version: 0.1.4
Summary: Extract potentially unique strings from RTF files for threat hunting
Home-page: https://github.com/PwCUK-CTO/rtfsig
Author: David Cannings
Author-email: david@edeca.net
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Environment :: Console
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Classifier: Topic :: Security
Description-Content-Type: text/markdown
License-File: LICENSE-2.0.txt
Requires-Dist: Jinja2==3.1.6
Provides-Extra: docs
Provides-Extra: tests
Requires-Dist: pylint; extra == "tests"
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-cov; extra == "tests"
Requires-Dist: plyara; extra == "tests"
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: tox; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: wheel; extra == "dev"
Requires-Dist: pylint; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: plyara; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: summary

# Introduction

This tool is designed to make it easy to signature potentially unique parts of RTF files.

It was written by David Cannings (@edeca) and released by PwC UK under the Apache 2.0 license.  

To install, you'll need Python 3 and some basic libraries. These are handled automatically if you install using `pip`:

    $ pip install rtfsig

Then run it like this:

    $ rtfsig -f badfile.rtf -y output.yar

This will scan the file for potentially unique RTF tags, print details to screen and save a Yara rule to `output.yar`.
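
When hunting across many samples it can be convenient to drive the command above from a script. The following is a minimal sketch, not part of rtfsig itself: the helper names `rtfsig_command` and `run_on_directory`, the `rules` output directory and the `*.rtf` glob are all assumptions made for illustration. Only the `-f` and `-y` options shown above are used.

```python
import subprocess
from pathlib import Path


def rtfsig_command(sample: Path, rule_dir: Path) -> list[str]:
    """Build the argv for one sample (hypothetical helper, not an rtfsig API)."""
    # One rule file per sample, named after the sample's stem.
    rule_path = rule_dir / (sample.stem + ".yar")
    return ["rtfsig", "-f", str(sample), "-y", str(rule_path)]


def run_on_directory(sample_dir: Path, rule_dir: Path) -> None:
    """Run rtfsig once per *.rtf file found in sample_dir."""
    rule_dir.mkdir(parents=True, exist_ok=True)
    for sample in sorted(sample_dir.glob("*.rtf")):
        # check=True assumes rtfsig exits non-zero on failure.
        subprocess.run(rtfsig_command(sample, rule_dir), check=True)


# Show the command that would be run, without executing it.
print(rtfsig_command(Path("badfile.rtf"), Path("rules")))
```

Calling `run_on_directory(Path("samples"), Path("rules"))` would then write one Yara file per sample, assuming `rtfsig` is on the `PATH`.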

Please raise bugs as GitHub issues, and note that this tool is in beta.

# Output

## Console

Basic output is shown on the console. The strings it reports can be used to search VirusTotal (try a search like `content:rsid7043998`).

    -> % rtfsig -f 0b06052d3b5954594cf0e28bd9c50d9110eb8fb78cb78c9a99686eb4ba3391df.hostile
    INFO:root:Starting to parse file 0b06052d3b5954594cf0e28bd9c50d9110eb8fb78cb78c9a99686eb4ba3391df.hostile
    INFO:root:Non-standard RTF magic marker, should be {\rtf1, often a sign of malicious docs
    INFO:root:Found an RSID table in this document
    INFO:root:Found 1 embedded image(s) with set height/width
    INFO:root:Found 2 document information group tags
    INFO:root:Interesting strings (higher chance of FP): \rsid7043998, \rsid7476075, insrsid7043998, \rsid10243744, \rsid7604251, insrsid10243744, {\author blue}, rsidroot10243744, \rsid9200135, tblrsid10243744, charrsid10243744, \picw1\pich1\picwgoal1\pichgoal1 , pararsid10243744, \rsid7238080, insrsid7476075, \rsid11666446, insrsid12343406, \rsid12343406, {\operator blue}
    INFO:root:Found some unique strings!  Consider using vtgrep or deploying Yara rules

Debug output can be generated using `-v`, which is helpful if you are reporting a bug.
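
The VirusTotal search suggested at the start of this section can be prepared for several strings at once. This is a rough sketch under stated assumptions: `vt_content_queries` is a hypothetical helper, and it only handles plain alphanumeric control words, dropping the leading backslash in the same way as the `content:rsid7043998` example. Strings containing braces or spaces (such as `{\author blue}`) are skipped, because how they should be quoted depends on VirusTotal's own search syntax.

```python
import re


def vt_content_queries(strings):
    """Build `content:` searches for plain alphanumeric control words."""
    queries = []
    for s in strings:
        # Drop the leading RTF backslash, mirroring the README example.
        token = s.lstrip("\\")
        if re.fullmatch(r"[A-Za-z0-9]+", token):
            query = f"content:{token}"
            if query not in queries:  # avoid duplicate searches
                queries.append(query)
    return queries


found = ["\\rsid7043998", "insrsid7043998", "{\\author blue}"]
print(vt_content_queries(found))
# ['content:rsid7043998', 'content:insrsid7043998']
```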

## Yara rules

The tool will automatically generate Yara rules if the `-y` option is passed.  Two Yara rules are created: one that should generate few false positives (`strict_rule`) and one that may have a higher false positive rate (`loose_rule`).

It is recommended to review strings carefully and to change `any of them` to a sensible number, for example `3 of them`.
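
The threshold change can be made by hand, but a small script is one option when processing many rule files. The sketch below is an illustration rather than a feature of rtfsig: `tighten_loose_rule` is a hypothetical helper and the default of 3 is only the example figure from the text. It rewrites the condition of `loose_rule` only, on the assumption that `strict_rule` may contain a single string (in which case `3 of them` could never match), and it leaves the rule description untouched.

```python
import re


def tighten_loose_rule(rule_text, minimum=3):
    """Replace 'any of them' with '<minimum> of them' in loose_rule only."""
    suffix = "any of them"
    out = []
    in_loose = False
    for line in rule_text.splitlines():
        # Track which rule the current line belongs to.
        match = re.match(r"\s*rule\s+(\w+)", line)
        if match:
            in_loose = match.group(1) == "loose_rule"
        stripped = line.rstrip()
        # Only the condition line ends with 'and any of them'; the
        # description mentions the phrase mid-string and is not touched.
        if in_loose and stripped.endswith("and " + suffix):
            line = stripped[: -len(suffix)] + f"{minimum} of them"
        out.append(line)
    return "\n".join(out) + "\n"


sample = (
    "rule loose_rule {\n"
    "  condition:\n"
    "    uint32be(0) == 0x7b5c7274 and any of them\n"
    "}\n"
)
print(tighten_loose_rule(sample))
```

The strings themselves should still be reviewed manually; the script only adjusts the count.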

An example rule generated from `0b06052d3b5954594cf0e28bd9c50d9110eb8fb78cb78c9a99686eb4ba3391df` looks like:

    rule loose_rule {
      meta:
        description = "RTF file matching known unique identifiers (higher chance of FP, adjust 'any of them' if required)"
        generated_by = "rtfsig version 0.0.2"

      strings:
        $ = "{\\author blue}" ascii
        $ = "\\rsid7238080" ascii
        $ = "pararsid10243744" ascii
        $ = "insrsid7043998" ascii
        $ = "\\rsid7043998" ascii
        $ = "rsidroot10243744" ascii
        $ = "\\rsid9200135" ascii
        $ = "\\rsid7604251" ascii
        $ = "insrsid7476075" ascii
        $ = "\\rsid10243744" ascii
        $ = "insrsid12343406" ascii
        $ = "{\\operator blue}" ascii
        $ = "insrsid10243744" ascii
        $ = "charrsid10243744" ascii
        $ = "\\rsid11666446" ascii
        $ = "\\rsid12343406" ascii
        $ = "\\picw1\\pich1\\picwgoal1\\pichgoal1 " ascii
        $ = "tblrsid10243744" ascii
        $ = "\\rsid7476075" ascii

      condition:
        uint32be(0) == 0x7b5c7274 and any of them
    }

    rule strict_rule {
      meta:
        description = "RTF file matching known unique identifiers (lower chance of FP)"
        generated_by = "rtfsig version 0.0.2"

      strings:
        $ = "\\rsid7043998\\rsid7238080\\rsid7476075\\rsid7604251\\rsid9200135\\rsid10243744\\rsid11666446\\rsid12343406" ascii

      condition:
        uint32be(0) == 0x7b5c7274 and any of them
    }
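
Both conditions begin with `uint32be(0) == 0x7b5c7274`, which compares the first four bytes of the file, read as a big-endian integer, against the bytes `{\rt`. The short sketch below reproduces that check in Python to show what it does and does not cover; `has_rtf_magic` is a name chosen for this illustration and is not part of rtfsig.

```python
import struct

RTF_MAGIC = 0x7B5C7274  # the four bytes "{\rt"


def has_rtf_magic(data: bytes) -> bool:
    """Mirror the Yara check uint32be(0) == 0x7b5c7274."""
    if len(data) < 4:
        return False
    (value,) = struct.unpack(">I", data[:4])  # big-endian unsigned 32-bit
    return value == RTF_MAGIC


print(has_rtf_magic(b"{\\rtf1\\ansi"))  # True
print(has_rtf_magic(b"{\\rtx"))         # True
print(has_rtf_magic(b"PK\x03\x04"))     # False
```

Because only four bytes are compared, a header that starts with `{\rt` but does not continue as `{\rtf1` still satisfies the condition, so documents with the non-standard magic marker mentioned in the console output can still be matched.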
    
# Known limitations

* At present, documents containing lots of obfuscation (e.g. comments between control words and their values) may not be parsed correctly. Please raise an issue with sample files for further inspection.

# Contributing

To set up a development environment, clone the git repository and run the following inside a virtualenv:

    $ pip install -e ".[dev]"

Before submitting a pull request, please check that all tests pass and that there is 100% coverage of the core module.

This is as simple as running tox and checking the output:

    $ tox
    .. tool output ..
    
    py37: commands succeeded
    congratulations :)

Packaging:

    $ python setup.py sdist bdist_wheel 

Check and upload to PyPI, signing with GPG:

    $ twine check dist/*
    $ twine upload dist/* --sign --identity FCEC8AAA140C74C826592AC357974C5B48A00D9B

# Version history

* v0.0.1 (18th October 2019) - Initial version, supports RSID control words and generating Yara rules
* v0.0.2 (23rd October 2019) - Second beta, added support for unique image identifiers and document information
* v0.0.3 (23rd October 2019) - Third beta, added support for picture sizes
* v0.1.0 (19th September 2020) - First public release, packaged as a Python module for PyPI
* v0.1.1 (26th January 2024) - Bumped Jinja2 dependency to a current version
* v0.1.2 (7th January 2025) - Bumped Jinja2 dependency to a current version
* v0.1.3 (7th January 2025) - Tests fixed and integrated with GitHub actions
* v0.1.4 (5th January 2026) - Bumped Jinja2 dependency to a current version
