3. determine dpi
4. foreach double-page-spread (scan page)
4.1. extract scan page from pdf, save as png
- 4.2. run a mask over it to pull off large black areas
- 4.3. run unpaper over it, creating 2 pages (physical page)
- 4.4. foreach physical page
- 4.4.1. remask and retrim
- 4.4.2. attempt to detect if a physical page contains 2 logical pages,
- 4.4.2.1. if so split with unpaper
- 4.4.3. do any final processing (resize for bebook)
-5. move all the final pictures into a final picture directory
-In the accidentally deleted code we used ocropus's binarise stuff to do some
-extra cleaning.
+5. run ocropus's binarise over all the pngs
+
+6. foreach binarised scan page
+ 6.1. create a mask from the original (unbinarised) page
+ 6.2. use the mask to trim the large black areas off the binarised page (removing them improves unpaper's accuracy)
+ 6.3. run unpaper over the clean binarised page, creating 2 pages (physical page)
+ 6.4. foreach physical page
+ 6.4.1. remask and retrim
+ 6.4.2. attempt to detect if a physical page contains 2 logical pages,
+ 6.4.2.1. if so split with unpaper
+ 6.4.3. do any final processing (resize for bebook)
+7. move all the final pictures into a final picture directory
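A rough sketch of steps 6.1 and 6.2, to make the mask-then-trim idea concrete. This is not the deleted code. It assumes a page is a plain list of rows of greyscale values (0 = black, 255 = white), and the names `content_box`, `trim`, `dark_below` and `max_dark` are made up for illustration. The box is measured on the original page and then applied to the binarised one.

```python
def dark_fraction(values, dark_below=64):
    """Fraction of pixels in a row or column darker than the threshold."""
    return sum(1 for v in values if v < dark_below) / len(values)


def content_box(original, dark_below=64, max_dark=0.9):
    """Find the area left after dropping almost-black edge rows and columns.

    Returns (top, bottom, left, right); bottom and right are exclusive.
    """
    height, width = len(original), len(original[0])

    def is_border(values):
        return dark_fraction(values, dark_below) >= max_dark

    # Walk in from the top and bottom edges.
    top = 0
    while top < height and is_border(original[top]):
        top += 1
    bottom = height
    while bottom > top and is_border(original[bottom - 1]):
        bottom -= 1
    if top == bottom:
        return top, top, 0, 0  # the whole page is dark, nothing to keep

    # Judge columns only over the rows we kept, so a thick top or
    # bottom border does not make every column look dark.
    columns = [[row[x] for row in original[top:bottom]] for x in range(width)]
    left = 0
    while left < width and is_border(columns[left]):
        left += 1
    right = width
    while right > left and is_border(columns[right - 1]):
        right -= 1
    return top, bottom, left, right


def trim(page, box):
    """Cut a page (same size as the one the box came from) down to the box."""
    top, bottom, left, right = box
    return [row[left:right] for row in page[top:bottom]]


original = [
    [0, 0, 0, 0, 0],
    [0, 200, 210, 220, 0],
    [0, 205, 30, 215, 0],
    [0, 0, 0, 0, 0],
]
binarised = [
    [0, 0, 0, 0, 0],
    [0, 255, 255, 255, 0],
    [0, 255, 0, 255, 0],
    [0, 0, 0, 0, 0],
]
clean = trim(binarised, content_box(original))
```

The thresholds are guesses and would likely need to be options in their own right.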
= What options do we need? =
Anything we attempt to detect automatically should also have an option to set it manually
* an option to set a default and certain exceptions would be ace.
- options for final output
- options to ignore partial products
+ - more debug options
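One possible way to handle "a default and certain exceptions", sketched under assumptions: options are keyed by name, exceptions are keyed by scan page number, and the function name `resolve` is hypothetical. A per-page exception beats a manual default, which in turn beats whatever was detected automatically.

```python
def resolve(option, page, detected, defaults, exceptions):
    """Pick the value of one option for one page.

    Precedence: per-page exception, then manual default,
    then the automatically detected value.
    """
    per_page = exceptions.get(page, {})
    if option in per_page:
        return per_page[option]
    if option in defaults:
        return defaults[option]
    return detected


# Hypothetical settings: force 300 dpi everywhere except page 12,
# and never try to split page 12 into two logical pages.
defaults = {"dpi": 300}
exceptions = {12: {"dpi": 150, "split": False}}

dpi_page_3 = resolve("dpi", 3, 600, defaults, exceptions)
dpi_page_12 = resolve("dpi", 12, 600, defaults, exceptions)
split_page_3 = resolve("split", 3, True, defaults, exceptions)
```

With no default and no exception set, the detected value passes straight through, so the automatic behaviour stays the fallback.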