= How are things laid out? =

1 scan page contains 2 physical pages.
Each physical page may contain either 2 or 1 logical pages (future: or 4 slides!)

= What do we do? =

0. init, process args, etc.
1. determine page count
2. determine depth
3. determine dpi
4. foreach double-page-spread (scan page)
  4.1. extract scan page from pdf, save as png
5. run ocropus's binarise over all the pngs
6. foreach binarised scan page
  6.1. create a mask from the original (unbinarised) page
  6.2. use the mask to trim the binarised page (cutting this off improves unpaper's accuracy)
  6.3. run unpaper over the clean binarised page, creating 2 pages (physical page)
  6.4. foreach physical page
    6.4.1. remask and retrim
    6.4.2. attempt to detect if a physical page contains 2 logical pages
      6.4.2.1. if so split with unpaper
    6.4.3. do any final processing (resize for bebook)
7. move all the final pictures into a final picture directory

= What options do we need? =

Anything we attempt to detect automatically should have the option to set manually

- depth
- dpi
- probably which pages we want to process
- how many logical pages a physical page has
  * an option to set a default and certain exceptions would be ace.
- options for final output
- options to ignore partial products
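
A rough sketch of how the "default plus certain exceptions" option for logical pages per physical page could be parsed. The spec syntax (`2,5=1,10-12=1`), the function names and the allowed counts are all made up for illustration; nothing here is decided yet. The first field is the default, and each later field overrides one physical page or a range of them.

```python
# Hypothetical option format, e.g. "2,5=1,10-12=1":
#   first field  -> default number of logical pages per physical page
#   later fields -> PAGE=COUNT or FIRST-LAST=COUNT exceptions
# Assumption: only 1 or 2 logical pages are allowed for now (4 slides is "future").
ALLOWED_COUNTS = {1, 2}


def _check_count(text):
    """Convert a count to int and make sure it is one we support."""
    count = int(text)
    if count not in ALLOWED_COUNTS:
        raise ValueError(f"unsupported logical page count: {count}")
    return count


def parse_logical_spec(spec):
    """Return (default, exceptions) where exceptions maps page number -> count."""
    fields = [f.strip() for f in spec.split(",") if f.strip()]
    if not fields:
        raise ValueError("empty spec")
    default = _check_count(fields[0])
    exceptions = {}
    for field in fields[1:]:
        pages, _, count_text = field.partition("=")
        if not count_text:
            raise ValueError(f"missing '=COUNT' in {field!r}")
        count = _check_count(count_text)
        first_text, _, last_text = pages.partition("-")
        first = int(first_text)
        last = int(last_text) if last_text else first
        if last < first:
            raise ValueError(f"backwards page range in {field!r}")
        for page in range(first, last + 1):
            exceptions[page] = count
    return default, exceptions


def logical_pages(page, default, exceptions):
    """Look up how many logical pages a given physical page holds."""
    return exceptions.get(page, default)
```

With this, step 6.4.2 could skip auto-detection whenever the option is given and just call `logical_pages(n, default, exceptions)` for physical page `n`.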