% LitReviewDavid.tex

\documentclass[a4paper,10pt]{article}
\usepackage[utf8]{inputenc}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}

%opening
\title{Literature Review}
\author{David Gow}

\begin{document}

\maketitle

\section{Introduction}

Since mankind first climbed down from the trees, it is our ability to communicate that has made us unique.
Once ideas could be passed from person to person, it made sense to have a permanent record of them; one which
could be passed on from person to person without them ever meeting.

And thus the document was born.

Traditionally, documents have been static: just marks on paper, but with the advent of computers many more possibilities open up.

\section{Document Formats}

Most existing document formats --- such as the venerable PostScript and PDF --- are, however, designed to imitate
existing paper documents, largely to allow for easy printing. In order to truly take advantage of the possibilities that operating in the digital
domain opens up to us, we must look to new formats.

Formats such as \texttt{HTML} allow for a greater scope of interactivity and for a more data-driven model, allowing
the content of the document to be explored in ways that perhaps the author had not anticipated\cite{hayes2012pixels}.
However, these data-driven formats typically do not support fixed layouts, and the display differs from renderer to
renderer.

\subsection{A Taxonomy of Document Formats}

The process of creating and displaying a document is a rather universal one (Figure~\ref{documenttimeline}), though
different document formats approach it slightly differently. A document often begins as raw content: text and images
(be they raster or vector), and it must end up as a set of photons flying towards the reader's eyes.

\begin{figure}
        \centering \includegraphics[width=0.8\linewidth]{figures/documenttimeline}
        \caption{The lifecycle of a document}
        \label{documenttimeline}
\end{figure}

There are two fundamental stages by which all documents --- digital or otherwise --- are produced and displayed:
\emph{layout} and \emph{rendering}. The \emph{layout} stage is where the positions and sizes of text and other graphics are
determined. The text will be \emph{flowed} around graphics and individual glyphs will be positioned, ensuring
that there is no undesired overlap and that everything will fit on the page or screen.

The \emph{rendering} stage actually produces the final output, whether as ink on paper or pixels on a computer monitor.
Each graphical element is rasterized and composited into a single image of the target resolution.


Different document formats cover documents in different stages of this process. Bitmapped images,
for example, represent the output of the final stage of the process, whereas markup languages typically specify
a document which has not yet been processed, ready for the layout stage.

Furthermore, some document formats treat the document as a program, written in
a (usually Turing-complete) document language with instructions which emit shapes to be displayed. These shapes are either displayed
immediately, as in PostScript, or stored in another file, as with \TeX{} or \LaTeX{}, which emit a \texttt{DVI} file. Most other
forms of document use a \emph{Document Object Model} (DOM), being a list or tree of objects to be rendered; examples include \texttt{DVI}, \texttt{PDF},
\texttt{HTML}\footnote{Some of these formats --- most notably \texttt{HTML} --- implement a scripting language such as JavaScript,
which permits the DOM to be modified while the document is being viewed.} and SVG\cite{svg2011-1.1}. Of the formats discussed here, only \texttt{HTML} and \TeX{} typically
store documents in pre-layout stages, whereas even Turing-complete document formats such as PostScript typically encode documents
which already have their elements placed.

\begin{description}
        \item[\TeX{} and \LaTeX]
        Donald Knuth's typesetting language \TeX{} is one of the older computer typesetting systems, originally conceived in 1977\cite{texdraft}.
        It implements a Turing-complete language, is human-readable and writable, and is still popular
        due to its excellent support for typesetting mathematics.
        \TeX{} only implements the ``layout'' stage of document display, and produces a typeset file,
        traditionally in \texttt{DVI} format, though modern implementations will often target \texttt{PDF} instead.

        This document was prepared in \LaTeXe.

        \item[DVI]
        \TeX{} traditionally outputs to the \texttt{DVI} (``DeVice Independent'') format: a binary format which consists of
        instructions for a simple stack machine that draws glyphs and rules (filled rectangles)\cite{fuchs1982theformat}.

        A \texttt{DVI} file is a representation of a document which has been typeset, and \texttt{DVI}
        viewers will rasterize this for display or printing, or convert it to another similar format like PostScript
        to be rasterized.

        \item[HTML]
        The Hypertext Markup Language (HTML)\cite{html2rfc} is the widely used document format which underpins the
        World Wide Web. In order for web pages to adapt appropriately to different devices, the HTML format originally
        defined only the semantic parts of a document, such as headings, phrases requiring emphasis, references to images or links
        to other pages, leaving the \emph{layout} up to the browser, which would also rasterize the final document.

        The HTML format has changed significantly since its introduction, and most of the layout and styling is now controlled
        by a set of style sheets in the CSS\cite{css2spec} format.

        \item[PostScript]
        Much like DVI, PostScript\cite{plrm} is a stack-based format for drawing vector graphics, though unlike DVI (but like \TeX), PostScript is
        text-based and Turing-complete. PostScript was traditionally run on a control board in laser printers, rasterizing pages at high resolution
        to be printed, though PostScript interpreters for desktop systems also exist, and are often used with printers which do not support PostScript natively\cite{ghostscript}.

        PostScript programs typically embody documents which have been typeset, though as PostScript is a Turing-complete language, some layout can be performed by the document itself.

        \item[PDF]
        Adobe's Portable Document Format (PDF)\cite{pdfref17} takes the PostScript rendering model, but does not implement a Turing-complete language.
        Later versions of PDF also extend the PostScript rendering model to support translucent regions via Porter-Duff compositing\cite{porter1984compositing}.

        PDF documents represent a particular layout, and must be rasterized before display.
\end{description}

\subsection{Precision in Document Formats}

Existing document formats --- typically due to having been designed for documents printed on paper, which of course has
limited size and resolution --- use numeric types which can only represent a fixed range and precision.
While this works fine with printed pages, users reading documents on computer screens using programs
with ``zoom'' functionality are prevented from working beyond a limited scale factor, lest artefacts appear due
to issues with numeric precision.

\TeX{} uses a $14.16$ bit fixed-point type (implemented as a 32-bit integer type, with one sign bit and one bit used to detect overflow)\cite{beebe2007extending}.
This can represent values in the range $[-(2^{14}), 2^{14} - 1]$ with 16 binary digits of fractional precision.
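
As a rough worked illustration, assume that these values are lengths measured in \TeX{}'s printer's points, with $72.27\,\mathrm{pt}$ to the inch (the unit is an assumption made here; it is not stated above). The smallest step and the largest magnitude are then approximately
\begin{equation}
        2^{-16}\,\mathrm{pt} \approx 1.53 \times 10^{-5}\,\mathrm{pt} \approx 5.4\,\mathrm{nm},
        \qquad
        2^{14}\,\mathrm{pt} = 16384\,\mathrm{pt} \approx 5.76\,\mathrm{m}.
\end{equation}
The ratio between the two is $2^{14} / 2^{-16} = 2^{30} \approx 1.07 \times 10^{9}$, which bounds the zoom factor over which a single such coordinate stays meaningful.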

The DVI files \TeX{} produces may use ``up to'' 32-bit signed integers\cite{fuchs1982theformat} to specify the document, but there is no requirement that
implementations support the full 32-bit type. It would be permissible, for example, to have a DVI viewer support only 24-bit signed integers, though
files which require a greater range might then fail to render correctly.

PostScript\cite{plrm} supports two different numeric types: \emph{integers} and \emph{reals}, both of which are specified as strings. The interpreter's representation of numbers
is not exposed, though the representation of integers can be divined by a program by the use of bitwise operations. The PostScript specification lists some ``typical limits''
of numeric types, though the exact limits may differ from implementation to implementation. Integers typically must fall in the range $[-2^{31}, 2^{31} - 1]$,
and reals are listed to have largest and smallest values of $\pm10^{38}$, values closest to $0$ of $\pm10^{-38}$ and approximately $8$ decimal digits of precision,
derived from the IEEE 754 single-precision floating-point specification.

Similarly, the PDF specification\cite{pdfref17} stores \emph{integers} and \emph{reals} as strings, though in a more restricted format than PostScript.
The PDF specification gives limits for the internal representation of values. Integer limits have not changed from the PostScript specification, but numbers
representable with the \emph{real} type have been specified differently: the largest representable values are $\pm 3.403\times 10^{38}$, and the smallest non-zero representable values are
$\pm1.175 \times 10^{-38}$\footnote{The PDF specification mistakenly leaves out the negative in the exponent here.}, with approximately $5$ decimal digits of precision \emph{in the fractional part}.
Adobe's implementation of PDF uses both IEEE 754 single-precision floating-point numbers and (for some calculations, and in previous versions) 16.16 bit fixed-point values.
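
One possible reading of the ``5 decimal digits in the fractional part'' figure (an interpretation offered here, not a statement taken from the specification) is that it reflects the resolution of a 16.16 fixed-point value:
\begin{equation}
        2^{-16} \approx 1.53 \times 10^{-5}, \qquad \log_{10} 2^{16} \approx 4.8 .
\end{equation}
A 16-bit fraction therefore resolves a little under five decimal places. Assuming a signed 32-bit word, the remaining 16 bits limit the integer part to magnitudes below $2^{15} = 32768$.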


\section{Rendering}

Computer graphics comes in two forms: bit-mapped (or \emph{raster}) graphics, which are defined by an array of pixel colours;
and \emph{vector} graphics, which are defined by mathematical descriptions of objects. Bit-mapped graphics are well suited to photographs
and match how cameras, printers and monitors work. However, bit-mapped images do not handle zooming beyond their
``native'' resolution (the resolution where one document pixel maps to one display pixel), exhibiting an artefact
called pixelation where the pixel structure becomes evident. Attempts to use interpolation to hide this effect are
never entirely successful, and sharp edges, such as those found in text and diagrams, are particularly affected.

\begin{figure}[h]
        \centering \includegraphics[width=0.8\linewidth]{figures/vectorraster_example}
        \caption{A circle as a vector image and a $32 \times 32$ pixel raster image}
\end{figure}


Vector graphics lack many of these problems: the representation is independent of the output resolution, and is instead
an abstract description of what is being rendered, typically as a combination of simple geometric shapes like lines,
arcs and ``B\'ezier curves''\cite{catmull1974asubdivision}.
As existing displays (and printers) are bit-mapped devices, vector documents must be \emph{rasterized} into a bitmap at
a given resolution. This bitmap is then displayed or printed, and is an approximation of the vector image
at that resolution.
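
For reference, the cubic B\'ezier curve (the form used by the PostScript and PDF path operators) with control points $P_0, \dots, P_3$ can be written as follows; the symbols $P_i$, $t$ and $B$ are notation introduced here for illustration:
\begin{equation}
        B(t) = (1-t)^3 P_0 + 3(1-t)^2 t \, P_1 + 3(1-t) t^2 \, P_2 + t^3 P_3, \qquad t \in [0, 1].
\end{equation}
The curve starts at $B(0) = P_0$ and ends at $B(1) = P_3$, and its midpoint is $B(\tfrac{1}{2}) = \tfrac{1}{8}(P_0 + 3P_1 + 3P_2 + P_3)$. Because the description is a polynomial in $t$ rather than a set of pixels, it can be evaluated at whatever resolution the rasterizer requires.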

This project will be based around vector graphics, as these properties make them more suited to experimenting with zoom
quality.


The rasterization process typically operates on an individual ``object'' or ``shape'' at a time: there are special algorithms
for rendering lines\cite{bresenham1965algorithm}, triangles\cite{giesen2013triangle}, polygons\cite{pineda1988parallel} and B\'ezier
curves\cite{goldman_thefractal}. Typically, these are rasterized independently and composited in the bitmap domain using Porter-Duff
compositing\cite{porter1984compositing} into a single image. This allows complex images to be formed from many simple pieces, as well
as allowing for layered translucent objects, which would otherwise require the solution of some very complex constructive geometry problems.
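
The most commonly used Porter-Duff operator is ``over''. As a brief sketch, write $C_a$ and $C_b$ for colour values already multiplied by their coverage (alpha) values $\alpha_a$ and $\alpha_b$; these symbols are chosen here for illustration. Compositing $a$ over $b$ then gives
\begin{equation}
        C_o = C_a + (1 - \alpha_a) \, C_b, \qquad \alpha_o = \alpha_a + (1 - \alpha_a) \, \alpha_b .
\end{equation}
For example, a half-covered pixel with $\alpha_a = 0.5$ and $C_a = 0.5$ placed over an opaque pixel with $\alpha_b = 1$ and $C_b = 0.2$ yields $C_o = 0.5 + 0.5 \times 0.2 = 0.6$ and $\alpha_o = 0.5 + 0.5 \times 1 = 1$.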

While rasterization was traditionally done entirely in software, modern computers and mobile devices have hardware support for rasterizing
some basic primitives (typically lines and triangles), designed for use in rendering 3D scenes. This hardware is usually programmed with an
API like \texttt{OpenGL}\cite{openglspec}.

More complex shapes like B\'ezier curves can be rendered by combining the use of bitmapped textures (possibly using signed-distance
fields\cite{leymarie1992fast}\cite{frisken2000adaptively}\cite{green2007improved}) with polygons approximating the curve's shape\cite{loop2005resolution}\cite{loop2007rendering}.

Indeed, there are several implementations of entire vector graphics systems using OpenGL: OpenVG\cite{robart2009openvg} has been implemented on top of OpenGL ES\cite{oh2007implementation};
the Cairo\cite{worth2003xr} library, based around the PostScript/PDF rendering model, has the ``Glitz'' OpenGL backend\cite{nilsson2004glitz}; and NVIDIA provide an SVG/PostScript GPU
renderer\cite{kilgard2012gpu} as an OpenGL extension\cite{kilgard300programming}.


\section{Numeric formats}

On modern computer architectures, there are two basic number formats supported:
fixed-width integers and \emph{floating-point} numbers. Typically, computers
natively support integers of up to 64 bits, capable of representing all integers
between $0$ and $2^{64} - 1$, inclusive\footnote{Most machines also support \emph{signed} integers,
which have the same cardinality as their \emph{unsigned} counterparts, but which
represent integers in the range $[-(2^{63}), 2^{63} - 1]$.}.

By introducing a fractional component (analogous to a decimal point), we can convert
integers to \emph{fixed-point} numbers, which have a more limited range, but a fixed, greater
precision. For example, a number in 4.4 fixed-point format would have four bits representing the integer
component, and four bits representing the fractional component:
\begin{equation}
        \underbrace{0101}_{\text{integer component}}.\underbrace{1100}_{\text{fractional component}} = 5.75
\end{equation}
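
To spell this example out, each bit contributes a power of two according to its position relative to the binary point:
\begin{equation}
        0101.1100_2 = 2^{2} + 2^{0} + 2^{-1} + 2^{-2} = 4 + 1 + 0.5 + 0.25 = 5.75 .
\end{equation}
Equivalently, the stored 8-bit integer $01011100_2 = 92$ is scaled by $2^{-4}$, giving $92 / 16 = 5.75$. Assuming the format is unsigned, it covers the range $[0, 15.9375]$ in uniform steps of $2^{-4} = 0.0625$.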


Floating-point numbers\cite{goldberg1992thedesign} are the binary equivalent of scientific notation:
each number consists of an exponent ($e$) and a mantissa ($m$) such that the number is given by
\begin{equation}
        n = 2^{e} \times m
\end{equation}

The IEEE 754 standard\cite{ieee754std1985} defines several floating-point data types
which are used\footnote{Many systems implement the IEEE 754 standard's storage formats,
but do not implement arithmetic operations in accordance with this standard.} by most
computer systems. The standard defines 32-bit (1 sign bit, 8-bit exponent, 23-bit mantissa) and
64-bit (1 sign bit, 11-bit exponent, 52-bit mantissa) formats\footnote{The 2008
revision to this standard\cite{ieee754std2008} adds some additional formats, but is
less widely supported in hardware.}, which can store approximately 7 and 15 decimal digits
of precision respectively. In both formats an implicit leading bit adds one further bit of precision to the stored mantissa.
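
As a sketch of how the fields combine for normalised values (the symbols $s$, $b$, $f$ and $p$ are introduced here for illustration), let $s$ be the sign bit, $e$ the stored exponent, $b$ the exponent bias ($127$ and $1023$ for the two formats), $f$ the stored fraction field read as an integer, and $p$ the number of significant bits including the implicit leading one ($24$ and $53$). Then
\begin{equation}
        n = (-1)^{s} \times 2^{e - b} \times \left( 1 + \frac{f}{2^{p-1}} \right).
\end{equation}
The decimal precision follows from $\log_{10} 2^{24} \approx 7.2$ and $\log_{10} 2^{53} \approx 15.95$, which is where the figures of roughly 7 and 15 to 16 decimal digits come from.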

Floating-point numbers behave quite differently to integers or fixed-point numbers, as
the representable numbers are not evenly distributed. Large numbers are stored to a lesser
precision than numbers close to zero. This can present problems in documents when zooming in
on objects far from the origin.
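
To make this concrete for a 32-bit float with 24 significant bits, the gap between adjacent representable values in the interval $[2^{k}, 2^{k+1})$ is
\begin{equation}
        \Delta(k) = 2^{k - 23}, \qquad \Delta(0) = 2^{-23} \approx 1.19 \times 10^{-7}, \qquad \Delta(20) = 2^{-3} = 0.125 .
\end{equation}
An object placed about a million units ($2^{20} \approx 1.05 \times 10^{6}$) from the origin can therefore not be positioned more finely than an eighth of a unit, whereas an object near $1$ can be positioned roughly a million times more finely.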

IEEE floating-point has some interesting features as well, including values for negative zero,
positive and negative infinity, the ``Not a Number'' (NaN) value and \emph{denormal} values, which
trade precision for range when dealing with very small numbers. Indeed, with these values,
IEEE 754 floating-point equality does not form an equivalence relation, which can cause issues
when not considered carefully\cite{goldberg1991whatevery}.
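
Two small cases illustrate the point under the default IEEE 754 comparison and division rules:
\begin{equation}
        \mathrm{NaN} \neq \mathrm{NaN}, \qquad +0 = -0 \quad \text{but} \quad \frac{1}{+0} = +\infty \neq -\infty = \frac{1}{-0} .
\end{equation}
The first shows that equality is not reflexive, which is enough to rule out an equivalence relation. The second shows that two values which compare equal need not be interchangeable in later arithmetic.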

There also exist formats for storing numbers with arbitrary precision and/or range.
Some programming languages support ``big integer''\cite{java_bigint} types which can
represent any integer that can fit in the system's memory. Similarly, there are
arbitrary-precision floating-point data types\cite{java_bigdecimal}\cite{boost_multiprecision}
which can represent any number of the form
\begin{equation}
        \frac{n}{2^d}, \qquad n, d \in \mathbb{Z}
\end{equation}
These types are typically built from several native data types such as integers and floats,
paired with custom routines implementing arithmetic primitives\cite{priest1991algorithms}.
They are therefore likely to be slower than the native types they are built on.

While GPUs have traditionally supported some approximation of IEEE 754's 32-bit floats,
modern graphics processors also support 16-bit\cite{nv_half_float} and 64-bit\cite{arb_gpu_shader_fp64}
IEEE floats. Note, however, that some parts of the GPU are only able to use some formats,
so precision will likely be truncated at some point before display.
Higher precision numeric types can be implemented or used on the GPU, but are
slow\cite{emmart2010high}.

Pairs of integers $(a \in \mathbb{Z}, b \in \mathbb{Z}\setminus \{0\})$ can be used to represent rationals. This allows
values such as $\frac{1}{3}$ to be represented exactly, whereas in binary fixed or floating-point formats,
this would have a recurring representation:
\begin{equation}
        \underbrace{0}_{\text{integer part}} . \underbrace{01}_{\text{recurring part}} \; \; 01 \; \; 01 \; \; 01 \dots
\end{equation}
With a rational type, this is simply $\frac{1}{3}$.
Rationals do not have a unique representation for each value; typically the reduced fraction is used
as the canonical representative.
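
The recurring expansion can be checked as a geometric series, in which each repeated pair of bits $01$ contributes one power of $\frac{1}{4}$:
\begin{equation}
        \sum_{k=1}^{\infty} 4^{-k} = \frac{1/4}{1 - 1/4} = \frac{1}{3}, \qquad
        \frac{1}{3} - \sum_{k=1}^{m} 4^{-k} = \frac{4^{-m}}{3} .
\end{equation}
Truncating after $m$ pairs of bits thus always leaves a non-zero error. For the rational type, two pairs $(a, b)$ and $(c, d)$ denote the same value exactly when $ad = bc$; for instance $(2, 6)$ reduces to $(1, 3)$ after dividing both entries by their greatest common divisor, $2$.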


\section{Quadtrees}
When viewing or processing a small part of a large document, it may be helpful to
skip --- or \emph{cull} --- parts of the document which are not on-screen.

\begin{figure}[h]
        \centering \includegraphics[width=0.4\linewidth]{figures/quadtree_example}
        \caption{A simple quadtree.}
\end{figure}
The quadtree\cite{finkel1974quad} is a data structure --- one of a family of \emph{spatial}
data structures --- which recursively breaks down space into smaller subregions
which can be processed independently. Points (or other objects) are added to a single
node. If certain criteria are met (typically the number of points in a node
exceeding a maximum, though in our case likely the level of precision required exceeding
that supported by the data type in use), the node is split into four equal-sized subregions, and
each point is attached to the subregion which contains it.

In this project, we will be experimenting with a form of quadtree in which each
node has its own independent coordinate system, allowing us to store some spatial
information\footnote{One bit per coordinate, per level of the quadtree.} within the
quadtree structure, eliminating redundancy in the coordinates of nearby objects.
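
As a sketch of how this might work (one possible scheme assumed here for illustration, not a description of a final design), take a coordinate $x \in [0, 1)$ measured relative to a node. Descending one level selects a child with the bit $b$ and re-expresses the coordinate relative to that child as $x'$:
\begin{equation}
        b = \lfloor 2x \rfloor \in \{0, 1\}, \qquad x' = 2x - b \in [0, 1).
\end{equation}
Each level therefore moves the leading binary digit of $x$ out of the stored coordinate and into the position of the node within the tree. Starting from $x = 0.7$, three levels give $(b, x') = (1, 0.4)$, $(0, 0.8)$ and $(1, 0.6)$; the path bits spell $0.101_2 = 0.625$, and $0.625 + 0.6 / 2^{3} = 0.7$ recovers the original value.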

Other spatial data structures exist, such as the KD-tree\cite{bentley1975multidimensional},
which partitions the space on any axis-aligned line; or the BSP tree\cite{fuchs1980onvisible},
which splits along an arbitrary line which need not be axis-aligned. We believe, however,
that the simpler conversion from binary coordinates to the quadtree's binary split makes
it a better avenue for initial research to explore.

\bibliographystyle{unsrt}
\bibliography{papers}

\end{document}