major clean

[dja/scandal.git] / architecture.txt
diff --git a/architecture.txt b/architecture.txt
new file mode 100644
index 0000000..95251d3
--- /dev/null
+++ b/architecture.txt
@@ -0,0 +1,35 @@
+= How are things laid out? =
+
+One scan page contains two physical pages.
+Each physical page contains either one or two logical pages
+(or, in future, four slides).
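A minimal sketch of this layout as a data structure, assuming it is useful to count logical pages before processing. The names `ScanPage`, `PhysicalPage` and `total_logical_pages` are hypothetical and do not come from the scandal code; the future four-slide case is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhysicalPage:
    # Hypothetical: one or two logical pages (four slides is future work).
    logical_pages: int = 1

    def __post_init__(self) -> None:
        if self.logical_pages not in (1, 2):
            raise ValueError("a physical page holds 1 or 2 logical pages")


@dataclass
class ScanPage:
    # One scan page always contains exactly two physical pages.
    left: PhysicalPage = field(default_factory=PhysicalPage)
    right: PhysicalPage = field(default_factory=PhysicalPage)

    def logical_page_count(self) -> int:
        return self.left.logical_pages + self.right.logical_pages


def total_logical_pages(scans: List[ScanPage]) -> int:
    """Count the logical pages a whole scanned document should yield."""
    return sum(scan.logical_page_count() for scan in scans)


scans = [ScanPage(), ScanPage(PhysicalPage(2), PhysicalPage(1))]
print(total_logical_pages(scans))  # 5
```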
+
+= What do we do? =
+
+0. Initialise, process arguments, etc.
+1. Determine the page count.
+2. Determine the depth.
+3. Determine the DPI.
+4. For each double-page spread (scan page):
+       4.1. Extract the scan page from the PDF and save it as a PNG.
+       4.2. Run a mask over it to remove large black areas.
+       4.3. Run unpaper over it, splitting it into two physical pages.
+       4.4. For each physical page:
+               4.4.1. Re-mask and re-trim.
+               4.4.2. Attempt to detect whether the physical page contains two logical pages;
+                       4.4.2.1. if so, split it with unpaper.
+               4.4.3. Do any final processing (such as resizing for the BeBook).
+5. Move all the final pictures into a final picture directory.
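The steps above can be sketched as a Python loop. This shows the control flow only: `process_document` and its three callbacks are hypothetical names, the `scan-NNN.png` file naming is an assumption, and the real PDF extraction, masking and unpaper calls would live inside the callbacks.

```python
from typing import Callable, List


def process_document(
    page_count: int,
    split_spread: Callable[[str], List[str]],   # 4.3: spread -> physical pages
    is_double: Callable[[str], bool],           # 4.4.2: two logical pages?
    split_logical: Callable[[str], List[str]],  # 4.4.2.1: split into two
) -> List[str]:
    """Return the final page images in reading order (hypothetical helper)."""
    final: List[str] = []
    for scan in range(1, page_count + 1):
        # 4.1 and 4.2: extract and mask the spread (assumed file name).
        spread = f"scan-{scan:03d}.png"
        for physical in split_spread(spread):
            # 4.4.1 (re-mask and re-trim) would happen here.
            if is_double(physical):
                final.extend(split_logical(physical))
            else:
                final.append(physical)
            # 4.4.3 (final processing) would apply to each page just added.
    # 5: the caller would then move these files into the final directory.
    return final


pages = process_document(
    2,
    split_spread=lambda s: [s.replace(".png", "-L.png"), s.replace(".png", "-R.png")],
    is_double=lambda p: p == "scan-002-R.png",
    split_logical=lambda p: [p.replace(".png", "a.png"), p.replace(".png", "b.png")],
)
```

Passing the steps in as functions keeps the loop testable without any image tools installed; in practice each callback would shell out to the relevant program.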
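Step 4.2 does not say how large black areas are found. One plausible approach, assumed here rather than taken from the original code, is connected-component labelling: flood-fill each dark region and whiten any region above a size threshold, so scanner shadows and page edges go while ordinary letters stay. The function name, the `dark` cut-off of 64 and the toy `min_area` are illustrative, not tuned values.

```python
from collections import deque
from typing import List

Image = List[List[int]]  # greyscale rows, 0 = black, 255 = white


def mask_large_black_areas(img: Image, min_area: int, dark: int = 64) -> Image:
    """Whiten every 4-connected dark region of at least min_area pixels."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]  # leave the input untouched
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if seen[y][x] or img[y][x] >= dark:
                continue
            # Breadth-first flood fill collects one dark component.
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] < dark:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Only large components are treated as junk to remove.
            if len(region) >= min_area:
                for ry, rx in region:
                    out[ry][rx] = 255
    return out


page = [[255, 0, 0, 255], [255, 0, 0, 255], [0, 255, 255, 255]]
cleaned = mask_large_black_areas(page, min_area=4)
```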
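Step 4.4.2 likewise leaves the detection method open. A simple heuristic, again an assumption, is to look for a nearly ink-free vertical gutter close to the centre of the physical page, with ink on both sides of it; the returned column could then guide the unpaper split. `find_gutter` and its thresholds (`ink`, `band`, `max_ink`) are hypothetical.

```python
from typing import List, Optional

Image = List[List[int]]  # greyscale rows, 0 = black, 255 = white


def find_gutter(img: Image, ink: int = 128, band: float = 0.1,
                max_ink: float = 0.01) -> Optional[int]:
    """Return a gutter column if the page looks like two logical pages."""
    h, w = len(img), len(img[0])
    # Fraction of each column's pixels that count as ink.
    col_ink = [sum(1 for row in img if row[x] < ink) / h for x in range(w)]
    # Only search a band of columns around the centre.
    half = max(1, int(w * band))
    lo, hi = max(0, w // 2 - half), min(w - 1, w // 2 + half)
    blank = [x for x in range(lo, hi + 1) if col_ink[x] <= max_ink]
    if not blank:
        return None  # ink runs through the centre: probably one logical page
    x = min(blank, key=lambda c: abs(c - w // 2))
    # Require ink on both sides, so a blank page is not mistaken for two.
    if any(v > max_ink for v in col_ink[:x]) and any(v > max_ink for v in col_ink[x + 1:]):
        return x
    return None
```

On a real scan the thresholds would need tuning against the chosen DPI; the point is only that a clean central gutter is cheap to test for.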
+
+The accidentally deleted code used OCRopus's binarisation step to do some
+extra cleaning.
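The OCRopus binarisation itself is not reproduced here. As a plainly simpler stand-in, the sketch below applies Otsu's global threshold, which picks the grey level that best separates ink from paper by maximising the between-class variance. The function names are hypothetical, and a single global threshold will cope worse with uneven lighting than a locally adaptive method.

```python
from typing import List

Image = List[List[int]]  # greyscale rows, values 0-255


def otsu_threshold(pixels: List[int]) -> int:
    """Grey level t maximising between-class variance; pixels <= t are ink."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]          # pixels at or below t
        sum0 += t * hist[t]
        w1 = total - w0        # pixels above t
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # unnormalised between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarise(img: Image) -> Image:
    """Map every pixel to pure black (0) or pure white (255)."""
    t = otsu_threshold([p for row in img for p in row])
    return [[0 if p <= t else 255 for p in row] for row in img]
```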
+
+= What options do we need? =
+Anything we attempt to detect automatically should also be settable manually:
+ - depth
+ - dpi
+ - which pages to process (probably)
+ - how many logical pages a physical page has
+       * an option to set a default plus per-page exceptions would be ace.
+ - options for final output
+ - options to ignore partial products
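A minimal command-line sketch of these options using Python's standard `argparse`. Every flag name is hypothetical, leaving a value unset stands for "detect automatically", and the `DEFAULT[,PAGE=N...]` syntax for logical pages is one assumed way to express a default with exceptions. Reading "ignore partial products" as "redo steps even if intermediate files already exist" is a guess.

```python
import argparse
from typing import Dict, Set


def parse_pages(spec: str) -> Set[int]:
    """Parse a scan-page selection such as '1-3,7' into {1, 2, 3, 7}."""
    pages: Set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = (int(n) for n in part.split("-", 1))
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return pages


class LogicalPages:
    """A default plus per-page exceptions, e.g. '2,1=1' (page 1 holds one)."""

    def __init__(self, spec: str) -> None:
        default, *exceptions = spec.split(",")
        self.default = int(default)
        self.exceptions: Dict[int, int] = {}
        for item in exceptions:
            page, count = item.split("=")
            self.exceptions[int(page)] = int(count)

    def for_page(self, page: int) -> int:
        return self.exceptions.get(page, self.default)


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Clean up scanned double-page PDFs.")
    p.add_argument("pdf")
    # None means "detect automatically".
    p.add_argument("--depth", type=int, default=None)
    p.add_argument("--dpi", type=int, default=None)
    p.add_argument("--pages", type=parse_pages, default=None,
                   help="scan pages to process, e.g. 1-3,7 (default: all)")
    p.add_argument("--logical-pages", type=LogicalPages, default=None,
                   help="DEFAULT[,PAGE=N...] (default: detect)")
    p.add_argument("--output-dir", default="final")
    p.add_argument("--ignore-partial", action="store_true",
                   help="redo steps even if intermediate files exist")
    return p


args = build_parser().parse_args(
    ["book.pdf", "--dpi", "300", "--pages", "1-3,7", "--logical-pages", "2,1=1"])
```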