Amazing paper + Document Taxonomy

diff --git a/LitReviewDavid.tex b/LitReviewDavid.tex

index b6e75cc..5605cef 100644
--- a/LitReviewDavid.tex
+++ b/LitReviewDavid.tex
@@ -2,6 +2,8 @@
  \usepackage[utf8]{inputenc}
  \usepackage{hyperref}
  \usepackage{graphicx}
+\usepackage{amsmath}
+\usepackage{amssymb}
  
  %opening
  \title{Literature Review}
@@ -32,28 +34,128 @@ the content of the document to be explored in ways that perhaps the author had n
  However, these data-driven formats typically do not support fixed layouts, and the display differs from renderer to
  renderer.
  
-Ultimately, there are two fundamental stages by which all documents --- digital or otherwise --- are produced and displayed:
-\emph{layout} and \emph{display}. The \emph{layout} stage is where the positions and sizes of text and other graphics are
-determined, while the \emph{display} stage actually produces the final output, whether as ink on paper or pixels on a computer monitor.
+\subsection{A Taxonomy of Document Formats}
  
-Existing document formats, due to being designed to model paper,
-have limited precision (8 decimal digits for PostScript\cite{plrm}, 5 decimal digits for PDF\cite{pdfref17}).
-This matches the limited resolution of printers and ink, but is limited when compared to what aught to be possible
-with ``zoom'' functionality, which is prevent from working beyond a limited scale factor, lest artefacts appear due
+The process of creating and displaying a document is largely universal (see Figure~\ref{documenttimeline}), though
+different document formats approach it slightly differently. A document often begins as raw content: text and images
+(be they raster or vector), and it must end up as a set of photons flying towards the reader's eyes.
+
+\begin{figure}
+       \centering \includegraphics[width=0.8\linewidth]{figures/documenttimeline}
+       \caption{The lifecycle of a document}
+       \label{documenttimeline}
+\end{figure}
+
+There are two fundamental stages by which all documents --- digital or otherwise --- are produced and displayed:
+\emph{layout} and \emph{rendering}. The \emph{layout} stage is where the positions and sizes of text and other graphics are
+determined. Text is \emph{flowed} around graphics and individual glyphs are positioned, ensuring
+that there is no undesired overlap and that everything fits on the page or screen.
+
+The \emph{rendering} stage actually produces the final output, whether as ink on paper or pixels on a computer monitor.
+Each graphical element is rasterized and composited into a single image of the target resolution.
+
+
+Different document formats cover documents in different stages of this process. Bitmapped images,
+for example, would represent the output of the final stage of the process, whereas markup languages typically specify
+a document which has not yet been processed, ready for the layout stage. 
+
+Furthermore, some document formats treat the document as a program, written in
+a (usually Turing-complete) document language with instructions which emit shapes to be displayed. These shapes are either displayed
+immediately, as in PostScript, or stored in another file, as with \TeX{} or \LaTeX, which emit a \texttt{DVI} file. Most other
+forms of document use a \emph{Document Object Model}, being a list or tree of objects to be rendered; examples include \texttt{DVI}, \texttt{PDF},
+\texttt{HTML}\footnote{Some of these formats, most notably \texttt{HTML}, implement a scripting language such as JavaScript,
+which permits the DOM to be modified while the document is being viewed.} and SVG\cite{svg2011-1.1}. Of these, only \texttt{HTML} and \TeX{} typically
+store documents in pre-layout stages, whereas even Turing-complete document formats such as PostScript typically encode documents
+which already have their elements placed.
+
+\begin{description}
+       \item[\TeX \, and \LaTeX]
+       Donald Knuth's typesetting language \TeX \, is one of the older computer typesetting systems, originally conceived in 1977\cite{texdraft}.
+       It implements a Turing-complete language that is human-readable and writable, and is still popular
+       due to its excellent support for typesetting mathematics.
+       \TeX{} only implements the ``layout'' stage of document display, and produces a typeset file,
+       traditionally in \texttt{DVI} format, though modern implementations will often target \texttt{PDF} instead.
+       
+       This document was prepared in \LaTeXe.
+       
+       \item[DVI]
+       \TeX \, traditionally outputs to the \texttt{DVI} (``DeVice Independent'') format: a binary format consisting of
+       instructions for a simple stack machine, which draw glyphs and curves\cite{fuchs1982theformat}.
+       
+       A \texttt{DVI} file is a representation of a document which has been typeset, and \texttt{DVI}
+       viewers will rasterize this for display or printing, or convert it to another similar format like PostScript
+       to be rasterized.
+       
+       \item[HTML]
+       The Hypertext Markup Language (HTML)\cite{html2rfc} is the widely used document format which underpins the
+       World Wide Web. In order for web pages to adapt appropriately to different devices, the HTML format simply
+       defined semantic parts of a document, such as headings, phrases requiring emphasis, references to images or links
+       to other pages, leaving the \emph{layout} up to the browser, which would also rasterize the final document.
+       
+       The HTML format has changed significantly since its introduction, and most of the layout and styling is now controlled
+       by a set of style sheets in the CSS\cite{css2spec} format.
+       
+       \item[PostScript]
+       Much like DVI, PostScript\cite{plrm} is a stack-based format for drawing vector graphics, though unlike DVI (but like \TeX), PostScript is
+       text-based and Turing-complete. PostScript was traditionally run on a control board in laser printers, rasterizing pages at high resolution
+       to be printed, though PostScript interpreters for desktop systems also exist, and are often used with printers which do not support PostScript natively\cite{ghostscript}.
+       
+       PostScript programs typically embody documents which have been typeset, though, as PostScript is Turing-complete, some layout can be performed by the document itself.
+       
+       \item[PDF]
+       Adobe's Portable Document Format (PDF)\cite{pdfref17} takes the PostScript rendering model, but does not implement a Turing-complete language.
+       Later versions of PDF also extend the PostScript rendering model to support translucent regions via Porter-Duff compositing\cite{porter1984compositing}.
+       
+       PDF documents represent a particular layout, and must be rasterized before display.
+\end{description}
+
+\subsection{Precision in Document Formats}
+
+Existing document formats --- typically due to having been designed for documents printed on paper, which of course has
+limited size and resolution --- use numeric types which can only represent a fixed range and precision.
+While this works fine with printed pages, users reading documents on computer screens using programs
+with ``zoom'' functionality are prevented from working beyond a limited scale factor, lest artefacts appear due
  to issues with numeric precision.
  
+\TeX{} uses a 14.16-bit fixed-point type (implemented as a 32-bit integer type, with one sign bit and one bit used to detect overflow)\cite{beebe2007extending}.
+This can represent values in the range $[-(2^{14}), 2^{14} - 1]$ with 16 binary digits of fractional precision.
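+
+As a worked illustration (assuming the usual reading of a 14.16 format, with 14 integer bits and 16 fractional bits), the smallest
+representable step and the bound on the integer part are
+\begin{equation}
+       2^{-16} \approx 1.53 \times 10^{-5}, \qquad 2^{14} - 1 = 16383,
+\end{equation}
+so a coordinate near the top of the range still resolves steps of $2^{-16}$, but no value can be scaled by a factor that pushes its magnitude past $2^{14}$.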
+
+The DVI files \TeX \, produces may use ``up to'' 32-bit signed integers\cite{fuchs1982theformat} to specify the document, but there is no requirement that
+implementations support the full 32-bit type. It would be permissible, for example, to have a DVI viewer support only 24-bit signed integers, though
+files which require a greater range may then fail to render correctly.
+
+PostScript\cite{plrm} supports two different numeric types: \emph{integers} and \emph{reals}, both of which are specified as strings. The interpreter's representation of numbers
+is not exposed, though the representation of integers can be divined by a program by the use of bitwise operations. The PostScript specification lists some ``typical limits''
+of numeric types, though the exact limits may differ from implementation to implementation. Integers typically must fall in the range $[-2^{31}, 2^{31} - 1]$,
+and reals are listed to have largest and smallest values of $\pm10^{38}$, values closest to $0$ of $\pm10^{-38}$ and approximately $8$ decimal digits of precision,
+derived from the IEEE 754 single-precision floating-point specification.
+
+Similarly, the PDF specification\cite{pdfref17} stores \emph{integers} and \emph{reals} as strings, though in a more restricted format than PostScript.
+The PDF specification gives limits for the internal representation of values. Integer limits have not changed from the PostScript specification, but numbers
+representable with the \emph{real} type have been specified differently: the largest representable values are $\pm 3.403\times 10^{38}$, the smallest non-zero representable values are
+$\pm1.175 \times 10^{-38}$\footnote{The PDF specification mistakenly leaves out the negative in the exponent here.},
+with approximately $5$ decimal digits of precision \emph{in the fractional part}.
+Adobe's implementation of PDF uses both IEEE 754 single precision floating-point numbers and (for some calculations, and in previous versions) 16.16 bit fixed-point values.
+
+
  \section{Rendering}
  
-Computer graphics comes in two forms: bit-mapped (or raster) graphics, which is defined by an array of pixel colours, 
+Computer graphics comes in two forms: bit-mapped (or raster) graphics, which is defined by an array of pixel colours; 
  and \emph{vector} graphics, defined by mathematical descriptions of objects. Bit-mapped graphics are well suited to photographs
  and are match how cameras, printers and monitors work. However, bitmap devices do not handle zooming beyond their
  ``native'' resolution --- the resolution where one document pixel maps to one display pixel ---, exhibiting an artefact
  called pixelation where the pixel structure becomes evident. Attempts to use interpolation to hide this effect are
-never entirely successful, and sharp edges, such as those found in text and diagrams, are particularly effected.
+never entirely successful, and sharp edges, such as those found in text and diagrams, are particularly affected.
+
+\begin{figure}[h]
+       \centering \includegraphics[width=0.8\linewidth]{figures/vectorraster_example}
+       \caption{A circle as a vector image and a $32 \times 32$ pixel raster image}
+\end{figure}
+
  
  Vector graphics lack many of these problems: the representation is independent of the output resolution, and rather
  an abstract description of what it is being rendered, typically as a combination of simple geometric shapes like lines,
-arcs and ``B\'ezier curves''. 
+arcs and ``B\'ezier curves''\cite{catmull1974asubdivision}. 
  As existing displays (and printers) are bit-mapped devices, vector documents must be \emph{rasterized} into a bitmap at
  a given resolution. This bitmap is then displayed or printed. The resulting bitmap is then an approximation of the vector image
  at that resolution.
@@ -80,16 +182,25 @@ the Cairo\cite{worth2003xr} library, based around the PostScript/PDF rendering m
  renderer by nVidia\cite{kilgard2012gpu} as an OpenGL extension\cite{kilgard300programming}.
  
  
-\section{Floating-Point Precision}
+\section{Numeric formats}
  
  On modern computer architectures, there are two basic number formats supported:
  fixed-width integers and \emph{floating-point} numbers. Typically, computers
  natively support integers of up to 64 bits, capable of representing all integers
-between $0$ and $2^{64} - 1$\footnote{Most machines also support \emph{signed} integers,
+between $0$ and $2^{64} - 1$, inclusive\footnote{Most machines also support \emph{signed} integers,
  which have the same cardinality as their \emph{unsigned} counterparts, but which
-represent integers between $-(2^{63})$ and $2^{63} - 1$}.
+represent integers in the range $[-(2^{63}), 2^{63} - 1]$}.
  
-Floating-point numbers\cite{goldberg1991whatevery} are the binary equivalent of scientific notation:
+By introducing a fractional component (analogous to a decimal point), we can convert
+integers to \emph{fixed-point} numbers, which have a more limited range, but a fixed, greater
+precision. For example, a number in 4.4 fixed-point format would have four bits representing the integer
+component, and four bits representing the fractional component:
+\begin{equation}
+       \underbrace{0101}_{\text{integer component}}.\underbrace{1100}_{\text{fractional component}} = 5.75
+\end{equation}
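+The value above can be checked by expanding each set bit as a power of two (a worked sketch; the unsigned interpretation
+of the 4.4 format is an assumption made here for illustration):
+\begin{equation}
+       0101.1100_2 = 2^{2} + 2^{0} + 2^{-1} + 2^{-2} = 4 + 1 + 0.5 + 0.25 = 5.75
+\end{equation}
+Under the same assumption, the format covers $[0, 16 - 2^{-4}] = [0, 15.9375]$ in uniform steps of $2^{-4} = 0.0625$.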
+
+
+Floating-point numbers\cite{goldberg1992thedesign} are the binary equivalent of scientific notation:
  each number consisting of an exponent ($e$) and a mantissa ($m$) such that a number is given by
  \begin{equation}
         n = 2^{e} \times m
@@ -98,21 +209,51 @@ each number consisting of an exponent ($e$) and a mantissa ($m$) such that a num
  The IEEE 754 standard\cite{ieee754std1985} defines several floating-point data types
  which are used\footnote{Many systems' implement the IEEE 754 standard's storage formats,
  but do not implement arithmetic operations in accordance with this standard.} by most
-computer systems. The standard defines 32-bit (8-bit exponent, 23-bit mantissa) and 
-64-bit (11-bit exponent, 53-bit mantissa) formats\footnote{The 2008
+computer systems. The standard defines 32-bit (8-bit exponent, 23-bit mantissa, 1 sign bit) and
+64-bit (11-bit exponent, 52-bit mantissa, 1 sign bit) formats\footnote{The 2008
  revision to this standard\cite{ieee754std2008} adds some additional formats, but is
-less widely supported in hardware.} 
-
-How floating-point works and what its behaviour is w/r/t range and precision
-\cite{goldberg1991whatevery}
-\cite{goldberg1992thedesign}
-
-Arb. precision exists
-
+less widely supported in hardware.}, which can store approximately 7 and 15 decimal digits
+of precision respectively.
+
+Floating-point numbers behave quite differently to integers or fixed-point numbers, as
+the representable numbers are not evenly distributed. Large numbers are stored to a lesser
+absolute precision than numbers close to zero. This can present problems in documents when zooming in
+on objects far from the origin.
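+To make this concrete (a worked sketch, assuming the 32-bit format with its 23-bit stored mantissa and ignoring denormals),
+the gap between adjacent representable values around a number $x$ with $2^{e} \le |x| < 2^{e+1}$ is
+\begin{equation}
+       \Delta(x) = 2^{e - 23}
+\end{equation}
+Near $x = 1$ this gives $\Delta = 2^{-23} \approx 1.19 \times 10^{-7}$, whereas at $x = 2^{24} = 16777216$ it gives $\Delta = 2$,
+so a coordinate of $2^{24} + 1$ cannot be stored exactly.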
+
+IEEE floating-point has some interesting features as well, including values for negative zero,
+positive and negative infinity, the ``Not a Number'' (NaN) value and \emph{denormal} values, which
+trade precision for range when dealing with very small numbers. Indeed, with these values,
+IEEE 754 floating-point equality does not form an equivalence relation (NaN compares unequal to itself, so equality is not reflexive), which can cause issues
+when not considered carefully.\cite{goldberg1991whatevery}
+
+There also exist formats for storing numbers with arbitrary precision and/or range.
+Some programming languages support ``big integer''\cite{java_bigint} types which can
+represent any integer that can fit in the system's memory. Similarly, there are
+arbitrary-precision floating-point data types\cite{java_bigdecimal}\cite{boost_multiprecision}
+which can represent any number of the form
+\begin{equation}
+       \frac{n}{2^d}, \qquad n, d \in \mathbb{Z}
+\end{equation}
+These types are typically built from several native data types such as integers and floats,
+paired with custom routines implementing arithmetic primitives.\cite{priest1991algorithms}
+These, therefore, are likely slower than the native types they are built on.
+
+While traditionally, GPUs have supported some approximation of IEEE 754's 32-bit floats,
+modern graphics processors also support 16-bit\cite{nv_half_float} and 64-bit\cite{arb_gpu_shader_fp64}
+IEEE floats.  Note, however, that some parts of the GPU are only able to use some formats,
+so precision will likely be truncated at some point before display.
  Higher precision numeric types can be implemented or used on the GPU, but are
-slow.
-\cite{emmart2010high}
+slow.\cite{emmart2010high}
  
+Pairs of integers $(a \in \mathbb{Z},b \in \mathbb{Z}\setminus \{0\})$ can be used to represent rationals. This allows
+values such as $\frac{1}{3}$ to be represented exactly, whereas in binary fixed or floating-point formats,
+this would have a recurring representation:
+\begin{equation}
+       \underbrace{0}_{\text{integer part}} . \underbrace{01}_{\text{recurring part}} 01 \; \; 01 \; \; 01 \dots
+\end{equation}
+With a rational type, this is simply $\frac{1}{3}$.
+Rationals do not have a unique representation for each value; typically the reduced fraction is used
+as the canonical representative.
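+For instance (a worked sketch; the pair notation $(a, b)$ for $\frac{a}{b}$ is used here only for illustration), the pairs $(2, 6)$ and $(1, 3)$ denote
+the same value, and dividing both components by their greatest common divisor selects the reduced form:
+\begin{equation}
+       \gcd(2, 6) = 2 \quad \Longrightarrow \quad \frac{2}{6} = \frac{2 / 2}{6 / 2} = \frac{1}{3}
+\end{equation}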
  
  
  \section{Quadtrees}