Metadata-Version: 2.1
Name: iocide
Version: 0.2.0
Summary: Indicator of Compromise (IOC) Detection Utility
Home-page: https://gitlab.com/dsfinn/iocide
Author: David Finn
Author-email: dsfinn@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Security
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: chardet (>=4.0.0)
Requires-Dist: pdfminer.six (>=20201018)
Requires-Dist: regex (>=2021.4.4)
Requires-Dist: unidecode (>=1.2.0)

# iocide

Defanged/Obfuscated Indicator of Compromise (IOC) Detection

`iocide` uses programmatically generated regular expressions to capture IOC
values that may have been defanged and/or obfuscated using a range of
techniques.

Expressions for each IOC type have been tailored to the relevant specification
standard, allowing identification of values obfuscated with combinations of
defanging, alternate Unicode characters, and unusual edge-case formatting.

## Author

David Finn: dsfinn@gmail.com

## Features

### Detected IOC Types

`iocide` can detect multiple IOC types, including:

- remote URLs
- remote IPs
- remote hostnames
- email addresses
- hashes

### Document Types

`iocide` automatically detects and decodes PDF and zip files, including modern
Microsoft Office documents.
Other data will be inspected for text encoding using
[`chardet`](https://pypi.org/project/chardet/).

This automatic decoding extends to embedded binary values in input text,
meaning that `iocide` is able to detect PDF, zip/Office, and text files
encoded as text blobs to arbitrary depth.

### Text Deobfuscation

Invoking `iocide` without the `--raw` parameter will cause Unicode characters
in the input text to be converted to ASCII where possible using
[`unidecode`](https://pypi.org/project/Unidecode/).
IOC values will be extracted from this normalised text, neutralising
obfuscation based on Unicode character substitution.
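
To give a feel for this kind of normalisation, here is a minimal sketch that uses only the standard library's `unicodedata` module. It is an approximation for illustration, not what `unidecode` or `iocide` does internally, and `strip_accents` is a hypothetical helper name.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Approximate ASCII folding using only the standard library."""
    # NFKD splits accented letters into a base letter plus combining marks,
    # and maps compatibility forms (e.g. fullwidth letters) to plain forms.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base characters.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(strip_accents("ÀÇÈÌÑ"))
print(strip_accents("ｅｘａｍｐｌｅ．ｃｏｍ"))
```

Characters with no Unicode decomposition (such as `Ð`) pass through this sketch unchanged, whereas a transliteration table like the one in `unidecode` can map them to an ASCII lookalike.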

### Encoded Binary

`iocide` can detect (and optionally refang) binary blobs encoded according to
[RFC 3548](https://datatracker.ietf.org/doc/html/rfc3548.html),
including:

- base16
- base32
- base64
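
For reference, the three encodings listed above can be produced and reversed with Python's standard `base64` module. This sketch only shows what such blobs look like; the payload is a made-up example and nothing here reflects how `iocide` detects them.

```python
import base64

# Made-up example payload; any bytes would do.
payload = b"http://example.com/malware"

encoded = {
    "base16": base64.b16encode(payload),
    "base32": base64.b32encode(payload),
    "base64": base64.b64encode(payload),
}
decoders = {
    "base16": base64.b16decode,
    "base32": base64.b32decode,
    "base64": base64.b64decode,
}

for name, blob in encoded.items():
    print(name, blob.decode("ascii"))
    # Each encoding round-trips back to the original bytes.
    assert decoders[name](blob) == payload
```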

### Binary-Embedded Text

`iocide` can automatically inspect detected binary blobs for text encodings.
Detected text can be searched for IOCs to arbitrary recursion depth.
By default, only binary blobs found in the top-level text are inspected
further.
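
The idea of a recursion depth can be sketched as follows. This is a simplified illustration that handles base64 only, with a hypothetical pattern and a hypothetical `embedded_texts` helper; the meaning given to `depth` here is an assumption and may differ from `iocide`'s own.

```python
import base64
import binascii
import re

# Hypothetical pattern: runs of 16+ base64 characters with optional padding.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def embedded_texts(text, depth=1):
    """Yield text decoded from base64 runs, recursing up to `depth` levels.

    `depth=None` means no limit (an assumption made for this sketch).
    """
    if depth is not None and depth < 1:
        return
    for match in BASE64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(), validate=True)
            decoded = raw.decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        yield decoded
        next_depth = None if depth is None else depth - 1
        yield from embedded_texts(decoded, next_depth)


# Build a two-level example: text inside base64 inside base64.
inner = "evil.example.com/payload"
level1 = "note: " + base64.b64encode(inner.encode()).decode()
level2 = "outer " + base64.b64encode(level1.encode()).decode()

print(list(embedded_texts(level2, depth=1)))
print(list(embedded_texts(level2, depth=None)))
```

With `depth=1` only the first decoded layer is produced; with `depth=None` the sketch keeps decoding until no further blobs are found.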

## Installation

```
pip install iocide
```

## Command Line Quickstart

See `iocide -h` for parameters and subcommands.

`iocide` reads text from `stdin` and writes detected IOC values to `stdout`.
If invoked without arguments, it will search for all known IOC types including
binary blobs and binary-embedded text.

The `--refang` flag (shortcut `-r`) will cause `iocide` to normalise detected
values to their 'fanged' representations if the IOC type supports refanging.

By default, `iocide` will normalise input text by replacing non-ASCII
characters with ASCII where possible.
This behaviour can be deactivated using the `--raw` flag.

### Finding defanged IOCs in text

```
echo "fake1,Ħ×Xƥŝ://ÀÇÈÌÐÑ<ąŧ>ƒơő[.ƃăr.)ḅȃź{ďōţ}çøm<fake2>" | iocide
```
Output:
```
HxXps://ACEIDN<at>foo[.bar.)baz
//ACEIDN<at>foo[.bar.)baz
foo[.bar.)baz
```

### Refanging detected IOCs

```
echo "fake1,Ħ×Xƥŝ://ÀÇÈÌÐÑ<ąŧ>ƒơő[.ƃăr.)ḅȃź{ďōţ}çøm<fake2>" | iocide --refang
```
Output:
```
https://ACEIDN@foo.bar.baz
//ACEIDN@foo.bar.baz
foo.bar.baz
```
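
To illustrate what refanging means in general, here is a minimal sketch covering a handful of common defanging conventions. The rule table and the `refang` function are hypothetical and far narrower than the expressions `iocide` generates; they are not its implementation.

```python
import re

# Hypothetical substitution rules for a few common defanging styles.
REFANG_RULES = [
    # hxxp:// or hXXps:// -> http:// or https://
    (re.compile(r"h[tx]{2}p(s?)://", re.IGNORECASE), r"http\1://"),
    # [.] (.) {.} [dot] -> .
    (re.compile(r"\[\.\]|\(\.\)|\{\.\}|\[dot\]", re.IGNORECASE), "."),
    # [@] [at] <at> -> @
    (re.compile(r"\[@\]|\[at\]|<at>", re.IGNORECASE), "@"),
]


def refang(value: str) -> str:
    """Apply each substitution rule in turn to restore the 'fanged' form."""
    for pattern, replacement in REFANG_RULES:
        value = pattern.sub(replacement, value)
    return value


print(refang("hxxps://example[.]com/path"))
print(refang("user[at]example[dot]com"))
```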

### Finding a specific IOC type

```
cat suspicious_document.txt | iocide url
cat suspicious_document.txt | iocide --refang url
```

### Finding binary-embedded text

```
cat suspicious_document.txt | iocide secrets
```

### Filter output for unique values
```
cat suspicious_document.txt | iocide | sort -u
```
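
The same filtering can be done in Python. A minimal sketch, assuming only that the values being filtered convert to strings with `str()`; `unique_values` is a hypothetical helper, not part of `iocide`.

```python
def unique_values(values):
    """Return the distinct string forms of `values`, in first-seen order."""
    # dict keys preserve insertion order, so the first occurrence of each
    # value determines its position in the result.
    return list(dict.fromkeys(str(value) for value in values))


print(unique_values(["foo.bar.baz", "1.2.3.4", "foo.bar.baz"]))
```

Unlike `sort -u`, this keeps values in the order they were first seen rather than sorting them.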

### Specify an input path

For large input, buffering file content from stdin can be avoided by specifying
an input file using `--input`:

```
iocide --input some/path.txt
```

## Python Interface

```python
import iocide


some_raw_text = ...

# Normalise text characters to ASCII where possible
some_text = iocide.normalise(some_raw_text)

# extract_all will extract all known IOC types

for defanged_ioc in iocide.extract_all(text=some_text, refang=False):
	...

for refanged_ioc in iocide.extract_all(text=some_text, refang=True):
	...

# Use the appropriate submodule to extract a specific IOC type
# e.g. for url:

for defanged_url in iocide.url.extract(text=some_text, refang=False):
	...

for refanged_url in iocide.url.extract(text=some_text, refang=True):
	...


# To find all text contents of data including encoded text embedded as binary
# blobs to an arbitrary depth of recursion:
with open('some/file') as data_file:
	for text in iocide.blobs.extract_text(data=data_file, depth=None):
		...


# To exclude text from the top level of encoding, use blobs.extract_text with
# `embedded_only=True`:
with open('some/file') as data_file:
	for secret in iocide.blobs.extract_text(
			data=data_file, embedded_only=True, depth=3):
		...

```

To facilitate advanced use of detected IOC values, generated values are
instances of appropriate built-in Python types.
For example, URL values are instances of `urllib.parse.ParseResult`.
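
As a reminder of what a `ParseResult` offers, the sketch below parses a URL with the standard library alone. The URL is illustrative, and the example uses the plain `urllib.parse` type rather than any `iocide` class.

```python
from urllib.parse import urlparse

# Illustrative URL, parsed with the standard library only.
parsed = urlparse("https://ACEIDN@foo.bar.baz/path?q=1")

print(parsed.scheme)    # URL scheme
print(parsed.username)  # userinfo before the '@'
print(parsed.hostname)  # host, lower-cased by urllib
print(parsed.path)      # path component
print(parsed.geturl())  # reassembled URL string
```

On a plain standard-library `ParseResult`, `geturl()` reassembles the URL text from its components.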

Some IOC types don't correspond to a built-in Python type (such as hashes),
and are generated as `str` objects.

All generated values can be converted to strings by calling the `str`
constructor:

```python
str(defanged_url)
str(refanged_url)
```

Custom subclasses have been used to facilitate `str` construction and defanged
text preservation where necessary.
These subclasses can be inspected in the relevant module.


