Metadata-Version: 2.0
Name: jprofile
Version: 0.7.1
Summary: Automated JP2 profiling for digitisation batches
Home-page: https://github.com/KBNLresearch/jprofile
Author: Johan van der Knijff
Author-email: johan.vanderknijff@kb.nl
License: LGPL
Download-URL: https://github.com/KBNLresearch/jprofile/archive/0.7.1.tar.gz
Description-Content-Type: UNKNOWN
Platform: POSIX
Platform: Windows
Classifier: Environment :: Console
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, <4
Requires-Dist: setuptools
Requires-Dist: jpylyzer
Requires-Dist: lxml

# Jprofile

Johan van der Knijff, KB/National Library of the Netherlands.

## What is *jprofile*?

*Jprofile* is a simple tool for automated profiling of large batches of *JP2* images. Internally it wraps around [*jpylyzer*](http://jpylyzer.openpreservation.org/), which is used for validating each image and for extracting its properties. The *jpylyzer* output is then validated against a set of [*Schematron*](http://en.wikipedia.org/wiki/Schematron) schemas that contain the required characteristics for master, access and target images, respectively. 

## Licensing

*Jprofile* is released under the [GNU Lesser General Public License](https://www.gnu.org/licenses/lgpl.html).

## Dependencies

*Jprofile* uses the [lxml](http://lxml.de/) toolkit. You need to install this to run the software in Python. The Windows binaries of *jprofile* are completely stand-alone, and don't require the installation of lxml (or anything else).

## Installation

Just unzip the contents of *jprofile_x.y.z_win32.zip* to any empty directory.

## Command-line syntax


    usage: jprofile batchDir prefixOut -p PROFILE


## Positional arguments

**batchDir**: root directory of batch  

**prefixOut**: prefix that is used for writing output files

**PROFILE**: name of profile that defines schemas for master, access and target images 

To list all available profiles, use a value of *l* or *list* for *PROFILE*.


## Profiles

A *profile* is an *XML*-formatted file that simply defines which schemas are used to validate *jpylyzer*'s output for master, access and target images, respectively. Here's an example:

    <profile>

    <!-- Sample profile -->

    <schemaMaster>master300Gray_2014.sch</schemaMaster>
    <schemaAccess>access300Colour_2014.sch</schemaAccess>
    <schemaTarget>master300Colour_2014.sch</schemaTarget>

    </profile>

Note that each entry only contains the *name* of a profile, not its full path! All profiles are located in the *profiles* directory in the installation folder.

## Available profiles

The following profiles are included by default:

| Name|Description|
| :------| :-----|
|kb_generic_2014.xml|Generic profile for KB digitisation streams (doesn't include any checks on resolution or colour spaces!)|
|kb_300Colour_2014.xml|As generic profile, but with additional requirements than resolution equals 300 ppi and colour space is Adobe RGB 1998|
|kb_300Gray_2014.xml|As generic profile, but with additional requirements than resolution equals 300 ppi and colour space is Gray Gamma 2.2|
|kb_600Colour_2014.xml|As generic profile, but with additional requirements than resolution equals 600 ppi and colour space is Adobe RGB 1998|
|kb_600Gray_2014.xml|As generic profile, but with additional requirements than resolution equals 360 ppi and colour space is Gray Gamma 2.2|

It is possible to create custom-made profiles. Just add them to the *profiles* directory in the installation folder.   

## Schemas

The quality assessment is based on a number of rules/tests that are defined a set of *Schematron* schemas. These are located in the *schemas* folder in the installation directory. In principe *any* property that is reported by *jpylyzer* can be used here, and new tests can be added by editing the schemas. More details on this can be found in [this blog post](http://openpreservation.org/knowledge/blogs/2012/09/04/automated-assessment-jp2-against-technical-profile/).  

## Available schemas

| Name|Description|
|:------| :-----|
|kbMaster_2014.sch|Generic schema for losslessly-compressed master images according to 2014 specifications|
|master600Colour_2014.sch|Schema for losslessly-compressed master images, 600 ppi, Adobe RGB (1998) colour space|
|master600Gray_2014.sch|Schema for losslessly-compressed master images, 600 ppi, Gray Gamma 2.2 colour space|
|master300Colour_2014.sch|Schema for losslessly-compressed master images, 300 ppi, Adobe RGB (1998) colour space|
|master300Gray_2014.sch|Schema for losslessly-compressed master images, 300 ppi, Gray Gamma 2.2 colour space|
|kbAccess_2014.sch|Generic schema for lossily-compressed access images according to 2014 specifications|
|access600Colour_2014.sch|Schema for lossily-compressed access images, 600 ppi, Adobe RGB (1998) colour space|
|access600Gray_2014.sch|Schema for lossily-compressed access images, 600 ppi, Gray Gamma 2.2 colour space|
|access300Colour_2014.sch|Schema for lossily-compressed access images, 300 ppi, Adobe RGB (1998) colour space|
|access300Gray_2014.sch|Schema for lossily-compressed access images, 300 ppi, Gray Gamma 2.2 colour space|

It is possible to create custom-made schemas. Just add them to the *schemas* directory in the installation folder.

## Overview of 2014 schemas

The following tables give a general overview of the technical profiles that the generic master- and access schemas are representing:

### Master

|Parameter|Value|
|:---|:---|
|File format|JP2 (JPEG 2000 Part 1)|
|Compression type|Reversible 5-3 wavelet filter|
|Colour transform|Yes (only for colour images)|
|Number of decomposition levels|5|
|Progression order |RPCL|
|Tile size |1024 x 1024|
|Code block size| 64 x 64 (2<sup>6</sup> x 2<sup>6</sup>)|
|Precinct size	|256 x 256 (2<sup>8</sup>) for 2 highest resolution levels; 128 x 128 (2<sup>7</sup>) for remaining resolution levels|
|Number of quality layers|11|
|Target compression ratio layers|2560:1 [1] ; 1280:1 [2] ;  640:1 [3] ; 320:1 [4] ; 160:1 [5] ; 80:1 [6] ; 40:1 [7] ; 20:1 [8] ; 10:1 [9] ; 5:1 [10] ; 2.5:1 [11]|
|Error resilience|Start-of-packet headers; end-of-packet headers; segmentation symbols|
|Sampling rate|Stored in "Capture Resolution" fields|
|Capture metadata|Embedded as XMP metadata in XML box|


### Access

|Parameter|Value|
|:---|:---|
|File format|JP2 (JPEG 2000 Part 1)|
|Compression type|Irreversible 7-9 wavelet filter|
|Colour transform|Yes (only for colour images)|
|Number of decomposition levels|5|
|Progression order |RPCL|
|Tile size |1024 x 1024|
|Code block size| 64 x 64 (2<sup>6</sup> x 2<sup>6</sup>)|
|Precinct size	|256 x 256 (2<sup>8</sup>) for 2 highest resolution levels; 128 x 128 (2<sup>7</sup>) for remaining resolution levels|
|Number of quality layers|8|
|Target compression ratio layers|2560:1 [1] ; 1280:1 [2] ;  640:1 [3] ; 320:1 [4] ; 160:1 [5] ; 80:1 [6] ; 40:1 [7] ; 20:1 [8]|
|Error resilience|	Start-of-packet headers; end-of-packet headers; segmentation symbols|
|Sampling rate|Stored in "Capture Resolution" fields|
|Capture metadata|Embedded as XMP metadata in XML box|

Note that jpylyzer is unable to establish the compression ratio of individual layers, so the access schema only checks for the overall compression ratio (i.e. 20:1). The more specific schemas (300Colour, 600Gray, etc.) contain additional checks for resolution values, the number of colour components and embedded ICC profiles. 

## Usage examples

### List available profiles

    jprofile d:\myBatch mybatch -p list

This results in a list of all available profiles (these are stored in the installation folder's *profiles* directory):

    Available profiles:

    kb_600Gray_2014.xml
    kb_300Gray_2014.xml
    kb_300Colour_2014.xml
    kb_600Colour_2014.xml
    kb_generic_2014.xml


### Analyse batch

    jprofile d:\myBatch mybatch -p kb_300Colour_2014.xml

This will result in the creation of 2 output files:

- `mybatch_status.csv` (status output file)
- `mybatch_failed.txt` (detailed output on images that failed quality asessment)

## Status output file

This is a comma-separated file with the assessment status of each analysed image. The assessment status is either *pass* (passed all tests) or *fail* (failed one or more tests). Here's an example:

    F:\test\access\MMKB03_000004896_00015_access.jp2,pass
    F:\test\access\MMKB03_000004896_00115_access.jp2,pass
    F:\test\access\MMKB03_000004896_00215_access.jp2,pass
    F:\test\targets-jp2\MMKB03_MTF_RGB_20120626_02_01.jp2,fail
    F:\test\master\MMKB03_000004896_00015_master.jp2,pass


## Failure output file

Any image that failed one or more tests are reported in the failure output file. For each failed image, it contains a full reference to the file path, followed by the specific errors. An example:


    F:\test\targets-jp2\MMKB03_MTF_RGB_20120626_02_01.jp2
    *** Schema validation errors:
    Test "layers = '11'" failed (wrong number of layers)
    Test "transformation = '5-3 reversible'" failed (wrong transformation)
    Test "comment = 'KB_MASTER_LOSSLESS_01/01/2015'" failed (wrong codestream comment string)
    ####

Entries in this file are separated by a sequence of 4 '#' characters. Note that each line here corresponds to a failed test in the schema (this information is taken from *Probatron*'s output). For images that are identified as not-valid JP2 some additional information from *jpylyzer*'s output is included as well. For example:


    F:\test\master\MMUBL07_MTF_GRAY_20121213_01_05.jp2
    *** Schema validation errors:
    Test "isValidJP2 = 'True'" failed (no valid JP2)
    *** Jpylyzer JP2 validation errors:
    Test methIsValid failed
    Test precIsValid failed
    Test approxIsValid failed
    Test foundNextTilePartOrEOC failed
    Test foundEOCMarker failed
    ####


Here, the outcome of test *isValidJP2* means that the image does not conform to the *JP2* specification. The lines following 'Jpylyzer JP2 validation errors' lists the specific errors that were reported by *jpylyzer*. The meaning of these errors can be found in the *jpylyzer* User Manual.

## Preconditions

- All images that are to be analysed have a .jp2 extension (all others are ignored!)
- *Master* images are located in a (subdirectory of a) directory called '*master*'
- *Access* images are located in a (subdirectory of a) directory called '*access*'
- *Target* images are located in a (subdirectory of a) directory called '*targets-jp2*'.
- Either of the above directories may be missing.

Other than that, the organisation of images may follow any arbitrary directory structure (*jprofile* does a recursive scan of whole directory tree of a batch)

## Known limitations

- Images that have names containing square brackets ("[" and "]" are ignored (limitation of *Python*'s *glob* module, will be solved in the future).

## Useful links

- [*jpylyzer*](http://jpylyzer.openpreservation.org/)
- [*Schematron*](http://en.wikipedia.org/wiki/Schematron)
- [Automated assesment of JP2 against a technical profile using jpylyzer and Schematron](http://openpreservation.org/blog/2012/09/04/automated-assessment-jp2-against-technical-profile/)




