Changelog
=========

1.2 (2012-12-28)
----------------
- fix cache check to prevent overwriting cache [djay]
- turn redirects into Link objects [djay]
- summary stats of which mimetypes were crawled [djay]
- fixed bug where redirected pages weren't getting uploaded [djay]
- fixed bugs with storing default pages in cache [djay]
- fixed bug with space chars in urls [ivanteoh]
- better handling of charset detection [djay]


1.1 (2012-04-17)
----------------

- add start-urls option [djay]
- add ignore_robots option [djay]
- fixed bug in http-equiv refresh handling [djay]
- fixes to disk caching [djay]
- better logging [djay]
- default maxsize is unlimited [djay]
- Provide ability for the reformat function to substitute patterns with 
  empty strings (nothing).  Buildout does not support empty lines within
  configuration, so if a substitution is <EMPTYSTRING> this becomes an empty
  string. [davidjb]
- Provide a logger in the LXMLPage class so the reformat function can 
  succeed [davidjb]
- Reformat spacing in webcrawler reformat function [davidjb] 


1.0 (2011-06-29)
----------------
-    many fixes for importing from local directory w/ many languages [simahawk]
-    fix UnicodeEncodeError when file name/language is not english [simahawk]
-    fix iterating over non-sequence [simahawk]
-    fix missing import for MyStringIO [simahawk]

1.0b7 (2011-02-17)
------------------
- fix bug in cache check [djay]

1.0b6 (2011-02-12)
------------------
-    only open cache files when needed so don't run out of handles [djay]
-    follow http-equiv refresh links [djay]

1.0b5 (2011-02-06)
------------------
- files use file pointers to reduce memory usage [djay]
- cache saves .metadata files to record and playback headersx [djay]

1.0b4 (2010-12-13)
------------------
- improve logging [djay]
- fix encoding bug caused by cache [djay]

1.0b3 (2010-11-10)
------------------

- Fixed bug in cache that caused many links to be ignored in some cases [djay]
- Fix documentation up [djay]

1.0b2 (2010-11-09)
------------------

- Stopped localhost output when no output set [djay]

1.0b1 (2010-11-08)
------------------

- change site_url to just url. [djay]

- rename maxpage to maxsize [djay]

- fix file: style urls  [djay]

- Added cache option to replace base_alias [djay]

- fix _origin key set by webcrawler, instead of url now it is path as expected by further blue
  [Vitaliy Podoba]

- add _orig_path to pipeline item to keep original path for any further purposes, we will need
  [Vitaliy Podoba]

- make all url absolute taking into account base tags inside webcrawler blueprint
   [Vitaliy Podoba] 


0.1 (2008-09-25)
----------------

- renamed package from pretaweb.blueprints to transmogrify.webcrawler.
      [djay]

- enhanced import view [djay]


