LitReviewDavid.tex

   1 \documentclass[a4paper,10pt]{article}
   2 \usepackage[utf8]{inputenc}
   3 \usepackage{hyperref}
   4 \usepackage{graphicx}
   5 \usepackage{amsmath}
   6 \usepackage{amssymb}
   7
   8 %opening
   9 \title{Literature Review}
  10 \author{David Gow}
  11
  12 \begin{document}
  13
  14 \maketitle
  15
  16 \section{Introduction}
  17
  18 Since mankind first climbed down from the trees, it is our ability to communicate that has made us unique.
  19 Once ideas could be passed from person to person, it made sense to have a permanent record of them; one which
  20 could be passed on from person to person without them ever meeting.
  21
  22 And thus the document was born.
  23
  24 Traditionally, documents have been static: just marks on rock, parchment or paper, but with the advent of computers many more possibilities open up.
  25
  26
  27 \section{Rendering}
  28
  29 Computer graphics comes in two forms: bit-mapped (or raster) graphics, which is defined by an array of pixel colours;
  30 and \emph{vector} graphics, defined by mathematical descriptions of objects. Bit-mapped graphics are well suited to photographs
  31 and match how cameras, printers and monitors work.
  32
  33
  34 \begin{figure}[h]
  35         \centering \includegraphics[width=0.8\linewidth]{figures/vectorraster_example}
  36         \caption{A circle as a vector image and a $32 \times 32$ pixel raster image}
  37 \end{figure}
  38
  39
  40 However, bitmap devices do not handle zooming beyond their ``native'' resolution (the resolution where one document pixel maps
  41 to one display pixel), exhibiting an artefact called pixelation where the pixel structure becomes evident. Attempts to use
  42 interpolation to hide this effect are never entirely successful, and sharp edges, such as those found in text and diagrams, are particularly affected.
  43
  44 Vector graphics avoid many of these problems: the representation is independent of the output resolution, and rather
  45 an abstract description of what it is being rendered, typically as a combination of simple geometric shapes like lines,
  46 arcs and glyphs.
  47
  48 As existing displays (and printers) are bit-mapped devices, vector documents must be \emph{rasterized} into a bitmap at
  49 a given resolution. This bitmap is then displayed or printed. The resulting bitmap is then an approximation of the vector image
  50 at that resolution.
  51
  52 % Project specific line
  53 This project will be based around vector graphics, as these properties make it more suited to experimenting with zoom
  54 quality.
  55
  56 \subsection{Rasterizing Vector Graphics}
  57
  58 Before an vector document can be rasterized, the co-ordinates of any shapes must
  59 be transformed into \emph{screen space} or \emph{viewport space}\cite{blinn1992trip}.
  60 On a typical display, many of these screen-space coordinates require very little precision or range.
  61 However, the co-ordinate transform must take care to ensure that precision is preserved during this transform.
  62
  63 After this transformation, the image is decomposed into its separate shapes, which are rasterized
  64 and then composited together.
  65 Most graphics formats support Porter-Duff compositing\cite{porter1984compositing}.
  66 Porter-Duff compositing gives each element (typically a pixel) a ``coverage'' value,
  67 denoted $\alpha$ which represents the contribution of that element to the final scene.
  68 Completely transparent elements would have an $\alpha$ value of $0$, and completely opaque
  69 elements have an $\alpha$ of $1$. This permits arbitrary shapes to be layered over one another
  70 in the raster domain, while retaining soft-edges.
  71
  72 The rasterization process may then proceed on one object (or shape) at a time. There are special algorithms for rasterizing
  73 different shapes.
  74
  75 \begin{description}
  76         \item[Line Segment]
  77         Straight lines between two points are easily rasterized using Bresenham's algorithm\cite{bresenham1965algorithm}.
  78         Bresenham's algorithm draws a pixel for every point along the \emph{long} axis of the line, moving along the short
  79         axis when the error exceeds $\frac{1}{2}$ a pixel.
  80
  81         Bresenham's algorithm only operates on lines whose endpoints lie on integer pixel coordinates. Due to this, line ``clipping''
  82         may be performed to find endpoints of the line segment such that the entire line will be on-screen. However, if line clipping is
  83         performed na\"ively without also setting the error accumulator correctly, the line's slope will be altered slightly, becoming dependent
  84         on the viewport.
  85
  86         \item[B\'ezier Curve]
  87         A B\'ezier curve is a smooth (i.e.\ infinitely differentiable) curve between two points, represented by a Bernstein polynomial.
  88         The coefficients of this Bernstein polynomial are known as the ``control points.''
  89
  90         B\'ezier curves are typically rasterized using De Casteljau's algorithm\cite{foley1996computer}
  91         Line Segments are a first-order B\'ezier curve.
  92
  93         \item[B\'ezier Spline]
  94         A spline of order $n$ is a $C^{n-1}$ smooth continuous piecewise function composed of polynomials of degree $\leq n$.
  95         In a B\'ezier spline, these polynomials are Bernstein polynomials, hence the spline is a curve made by joining B\'ezier curves
  96         end-to-end (in a manner which preserves some level of smoothness).
  97
  98         Many vector graphics formats call B\'ezier splines of a given order (typically quadratic or cubic) ``paths'' and treat them as the
  99         fundamental type from which shapes are formed.
 100 \end{description}
 101
 102
 103 %There are special algorithms
 104 %for rendering lines\cite{bresenham1965algorithm}, triangles\cite{giesen2013triangle}, polygons\cite{pineda1988parallel} and B\'ezier
 105 %Curves\cite{goldman_thefractal}.
 106
 107 \subsection{GPU Rendering}
 108 While traditionally, rasterization was done entirely in software, modern computers and mobile devices have hardware support for rasterizing
 109 lines and triangles designed for use rendering 3D scenes. This hardware is usually programmed with an
 110 API like \texttt{OpenGL}\cite{openglspec}.
 111
 112 More complex shapes like B\'ezier curves can be rendered by combining the use of bitmapped textures (possibly using signed-distance
 113 fields\cite{leymarie1992fast}\cite{frisken2000adaptively}\cite{green2007improved}) strtched over a triangle mesh
 114 approximating the curve's shape\cite{loop2005resolution}\cite{loop2007rendering}.
 115
 116 Indeed, there are several implementations of entire vector graphics systems using OpenGL:
 117 \begin{itemize}
 118         \item The OpenVG standard\cite{robart2009openvg} has been implemented on top of OpenGL ES\cite{oh2007implementation};
 119         \item the Cairo\cite{worth2003xr} library, based around the PostScript/PDF rendering model, has the ``Glitz'' OpenGL backend\cite{nilsson2004glitz}
 120         \item and the SVG/PostScript GPU renderer by nVidia\cite{kilgard2012gpu} as an OpenGL extension\cite{kilgard300programming}.
 121 \end{itemize}
 122
 123
 124 \section{Numeric formats}
 125
 126 On modern computer architectures, there are two basic number formats supported:
 127 fixed-width integers and \emph{floating-point} numbers. Typically, computers
 128 natively support integers of up to 64 bits, capable of representing all integers
 129 between $0$ and $2^{64} - 1$, inclusive\footnote{Most machines also support \emph{signed} integers,
 130 which have the same cardinality as their \emph{unsigned} counterparts, but which
 131 represent integers in the range $[-(2^{63}), 2^{63} - 1]$}.
 132
 133 By introducing a fractional component (analogous to a decimal point), we can convert
 134 integers to \emph{fixed-point} numbers, which have a more limited range, but a fixed, greater
 135 precision. For example, a number in 4.4 fixed-point format would have four bits representing the integer
 136 component, and four bits representing the fractional component:
 137 \begin{equation}
 138         \underbrace{0101}_\text{integer component}.\underbrace{1100}_\text{fractional component} = 5.75
 139 \end{equation}
 140
 141
 142 Floating-point numbers\cite{goldberg1992thedesign} are the binary equivalent of scientific notation:
 143 each number consisting of an exponent ($e$) and a mantissa ($m$) such that a number is given by
 144 \begin{equation}
 145         n = 2^{e} \times m
 146 \end{equation}
 147
 148 The IEEE 754 standard\cite{ieee754std1985} defines several floating-point data types
 149 which are used\footnote{Many systems' implement the IEEE 754 standard's storage formats,
 150 but do not implement arithmetic operations in accordance with this standard.} by most
 151 computer systems. The standard defines 32-bit (8-bit exponent, 23-bit mantissa, 1 sign bit) and
 152 64-bit (11-bit exponent, 53-bit mantissa, 1 sign bit) formats\footnote{The 2008
 153 revision to this standard\cite{ieee754std2008} adds some additional formats, but is
 154 less widely supported in hardware.}, which can store approximately 7 and 15 decimal digits
 155 of precision respectively.
 156
 157 Floating-point numbers behave quite differently to integers or fixed-point numbers, as
 158 the representable numbers are not evenly distributed. Large numbers are stored to a lesser
 159 precision than numbers close to zero. This can present problems in documents when zooming in
 160 on objects far from the origin.
 161
 162 IEEE floating-point has some interesting features as well, including values for negative zero,
 163 positive and negative infinity, the ``Not a Number'' (NaN) value and \emph{denormal} values, which
 164 trade precision for range when dealing with very small numbers. Indeed, with these values,
 165 IEEE 754 floating-point equality does not form an equivalence relation, which can cause issues
 166 when not considered carefully.\cite{goldberg1991whatevery}
 167
 168 There also exist formats for storing numbers with arbitrary precising and/or range.
 169 Some programming languages support ``big integer''\cite{java_bigint} types which can
 170 represent any integer that can fit in the system's memory. Similarly, there are
 171 arbitrary-precision floating-point data types\cite{java_bigdecimal}\cite{boost_multiprecision}
 172 which can represent any number of the form
 173 \begin{equation}
 174         \frac{n}{2^d} \; \; \; \; n,d \in \mathbb{Z} % This spacing is horrible, and I should be ashamed.
 175 \end{equation}
 176 These types are typically built from several native data types such as integers and floats,
 177 paired with custom routines implementing arithmetic primitives.\cite{priest1991algorithms}
 178 These, therefore, are likely slower than the native types they are built on.
 179
 180 Pairs of integers $(a \in \mathbb{Z},b \in \mathbb{Z}\setminus 0)$ can be used to represent rationals. This allows
 181 values such as $\frac{1}{3}$ to be represented exactly, whereas in fixed or floating-point formats,
 182 this would have a recurring representation:
 183 \begin{equation}
 184         \underbrace{0}_\text{integer part} . \underbrace{01}_\text{recurring part} 01 \; \; 01 \; \; 01 \dots
 185 \end{equation}
 186 Whereas with a rational type, this is simply $\frac{1}{3}$.
 187 Rationals do not have a unique representation for each value, typically the reduced fraction is used
 188 as a characteristic element.
 189
 190 While traditionally, GPUs have supported some approximation of IEEE 754's 32-bit floats,
 191 modern graphics processors also support 16-bit\cite{nv_half_float} and 64-bit\cite{arb_gpu_shader_fp64}
 192 IEEE floats, though some features of IEEE floats, like denormals and NaNs are not always supported.
 193 Note, however, that some parts of the GPU are only able to use some formats,
 194 so precision will likely be truncated at some point before display.
 195 Higher precision numeric types can be implemented or used on the GPU, but are
 196 slow.\cite{emmart2010high}
 197
 198
 199 \section{Document Formats}
 200
 201 Most existing document formats --- such as the venerable PostScript and PDF --- are, however, designed to imitate
 202 existing paper documents, largely to allow for easy printing. In order to truly take advantage of the possibilities operating in the digital
 203 domain opens up to us, we must look to new formats.
 204
 205 Formats such as \texttt{HTML} allow for a greater scope of interactivity and for a more data-driven model, allowing
 206 the content of the document to be explored in ways that perhaps the author had not anticipated.\cite{hayes2012pixels}
 207 However, these data-driven formats typically do not support fixed layouts, and the display differs from renderer to
 208 renderer.
 209
 210 \subsection{A Taxonomy of Document formats}
 211
 212 The process of creating and displaying a document is a rather universal one (\ref{documenttimeline}), though
 213 different document formats approach it slightly differently. A document often begins as raw content: text and images
 214 (be they raster or vector) and it must end up as a stream of photons flying towards the reader's eyes.
 215
 216 \begin{figure}
 217         \label{documenttimeline}
 218         \centering \includegraphics[width=0.8\linewidth]{figures/documenttimeline}
 219         \caption{The lifecycle of a document}
 220 \end{figure}
 221
 222 There are two fundamental stages by which all documents --- digital or otherwise --- are produced and displayed:
 223 \emph{layout} and \emph{rendering}. The \emph{layout} stage is where the positions and sizes of text and other graphics are
 224 determined. The text will be \emph{flowed} around graphics, the positions of individual glyphs will be placed, ensuring
 225 that there is no undesired overlap and that everything will fit on the page or screen.
 226
 227 The \emph{display} stage actually produces the final output, whether as ink on paper or pixels on a computer monitor.
 228 Each graphical element is rasterized and composited into a single image of the target resolution.
 229
 230
 231 Different document formats cover documents in different stages of this project. Bitmapped images,
 232 for example, would represent the output of the final stage of the process, whereas markup languages typically specify
 233 a document which has not yet been processed, ready for the layout stage.
 234
 235 Furthermore, some document formats treat the document as a program, written in
 236 a (usually turing complete) document language with instructions which emit shapes to be displayed. These shapes are either displayed
 237 immediately, as in PostScript, or stored in another file, such as with \TeX or \LaTeX, which emit a \texttt{DVI} file. Most other
 238 forms of document use a \emph{Document Object Model}, being a list or tree of objects to be rendered. \texttt{DVI}, \texttt{PDF},
 239 \texttt{HTML}\footnote{Some of these formats --- most notably \texttt{HTML} --- implement a scripting lanugage such as JavaScript,
 240 which permit the DOM to be modified while the document is being viewed.} and SVG\cite{svg2011-1.1}. Of these, only \texttt{HTML} and \TeX typically
 241 store documents in pre-layout stages, whereas even turing complete document formats such as PostScript typically encode documents
 242 which already have their elements placed.
 243
 244 \begin{description}
 245         \item[\TeX \, and \LaTeX]
 246         Donald Knuth's typesetting language \TeX \, is one of the older computer typesetting systems, originally conceived in 1977\cite{texdraft}.
 247         It implements a turing-complete language and is human-readable and writable, and is still popular
 248         due to its excellent support for typesetting mathematics.
 249         \TeX only implements the ``layout'' stage of document display, and produces a typeset file,
 250         traditionally in \texttt{DVI} format, though modern implementations will often target \texttt{PDF} instead.
 251
 252         This document was prepared in \LaTeXe.
 253
 254         \item[DVI]
 255         \TeX \, traditionally outputs to the \texttt{DVI} (``DeVice Independent'') format: a binary format which consists of a
 256         simple stack machine with instructions for drawing glyphs and curves\cite{fuchs1982theformat}.
 257
 258         A \texttt{DVI} file is a representation of a document which has been typeset, and \texttt{DVI}
 259         viewers will rasterize this for display or printing, or convert it to another similar format like PostScript
 260         to be rasterized.
 261
 262         \item[HTML]
 263         The Hypertext Markup Language (HTML)\cite{html2rfc} is the widely used document format which underpins the
 264         world wide web. In order for web pages to adapt appropriately to different devices, the HTML format simply
 265         defined semantic parts of a document, such as headings, phrases requiring emphasis, references to images or links
 266         to other pages, leaving the \emph{layout} up to the browser, which would also rasterize the final document.
 267
 268         The HTML format has changed significantly since its introduction, and most of the layout and styling is now controlled
 269         by a set of style sheets in the CSS\cite{css2spec} format.
 270
 271         \item[PostScript]
 272         Much like DVI, PostScript\cite{plrm} is a stack-based format for drawing vector graphics, though unlike DVI (but like \TeX), PostScript is
 273         text-based and turing complete. PostScript was traditionally run on a control board in laser printers, rasterizing pages at high resolution
 274         to be printed, though PostScript interpreters for desktop systems also exist, and are often used with printers which do not support PostScript natively.\cite{ghostscript}
 275
 276         PostScript programs typically embody documents which have been typeset, though as a turing-complete language, some layout can be performed by the document.
 277
 278         \item[PDF]
 279         Adobe's Portable Document Format (PDF)\cite{pdfref17} takes the PostScript rendering model, but does not implement a turing-complete language.
 280         Later versions of PDF also extend the PostScript rendering model to support translucent regions via Porter-Duff compositing\cite{porter1984compositing}.
 281
 282         PDF documents represent a particular layout, and must be rasterized before display.
 283
 284         \item[SVG]
 285         Scalable Vector Graphics (SVG) is a vector graphics document format\cite{svg2011-1.1} which uses the Document Object Model. It consists of a tree of matrix transforms,
 286         with objects such as vector paths (made up of B\'ezier curves) and text at the leaves.
 287
 288 \end{description}
 289
 290 \subsection{Precision in Document Formats}
 291
 292 Existing document formats --- typically due to having been designed for documents printed on paper, which of course has
 293 limited size and resolution --- use numeric types which can only represent a fixed range and precision.
 294 While this works fine with printed pages, users reading documents on computer screens using programs
 295 with ``zoom'' functionality are prevented from working beyond a limited scale factor, lest artefacts appear due
 296 to issues with numeric precision.
 297
 298 \TeX uses a $14.16$ bit fixed point type (implemented as a 32-bit integer type, with one sign bit and one bit used to detect overflow)\cite{beebe2007extending}.
 299 This can represent values in the range $[-(2^14), 2^14 - 1]$ with 16 binary digits of fractional precision.
 300
 301 The DVI files \TeX \, produces may use ``up to'' 32-bit signed integers\cite{fuchs1982theformat} to specify the document, but there is no requirement that
 302 implementations support the full 32-bit type. It would be permissible, for example, to have a DVI viewer support only 24-bit signed integers, though many
 303 files which require greater range may fail to render correctly.
 304
 305 PostScript\cite{plrm} supports two different numeric types: \emph{integers} and \emph{reals}, both of which are specified as strings. The interpreter's representation of numbers
 306 is not exposed, though the representation of integers can be divined by a program by the use of bitwise operations. The PostScript specification lists some ``typical limits''
 307 of numeric types, though the exact limits may differ from implementation to implementation. Integers typically must fall in the range $[-2^{31}, 2^{31} - 1]$,
 308 and reals are listed to have largest and smallest values of $\pm10^{38}$, values closest to $0$ of $\pm10^{-38}$ and approximately $8$ decimal digits of precision,
 309 derived from the IEEE 754 single-precision floating-point specification.
 310
 311 Similarly, the PDF specification\cite{pdfref17} stores \emph{integers} and \emph{reals} as strings, though in a more restricted format than PostScript.
 312 The PDF specification gives limits for the internal representation of values. Integer limits have not changed from the PostScript specification, but numbers
 313 representable with the \emph{real} type have been specified differently: the largest representable values are $\pm 3.403\times 10^{38}$, the smallest non-zero representable values are
 314 \footnote{The PDF specification mistakenly leaves out the negative in the exponent here.}
 315 $\pm1.175 \times 10^{-38}$ with approximately $5$ decimal digits of precision \emph{in the fractional part}.
 316 Adobe's implementation of PDF uses both IEEE 754 single precision floating-point numbers and (for some calculations, and in previous versions) 16.16 bit fixed-point values.
 317
 318 The SVG specification\cite{svg2011-1.1} specifies numbers as strings with a decimal representation of the number.
 319 It is stated that a ``Conforming SVG Viewer'' must have ``all visual rendering accurate to within one device pixel to the mathematically correct result at the initial 1:1
 320 zoom ratio'' and that ``it is suggested that viewers attempt to keep a high degree of accuracy when zooming.''
 321 A ``Conforming High-Quality SVG Viewer'' must use ``double-precision floating point\footnote{Presumably the 64-bit IEEE 754 ``double'' type.}'' for computations involving
 322 coordinate system transformations.
 323
 324 \section{Quadtrees}
 325 When viewing or processing a small part of a large document, it may be helpful to
 326 only processs --- or \emph{cull} --- parts of the document which are not on-screen.
 327
 328 \begin{figure}[h]
 329         \centering \includegraphics[width=0.4\linewidth]{figures/quadtree_example}
 330         \caption{A simple quadtree.}
 331 \end{figure}
 332 The quadtree\cite{finkel1974quad}is a data structure --- one of a family of \emph{spatial}
 333 data structures --- which recursively breaks down space into smaller subregions
 334 which can be processed independently. Points (or other objects) are added to a single
 335 node which (if certain criteria are met) is split into four equal-sized subregions, and
 336 points attached to the region which contains them.
 337
 338 Quadtrees have been used in computer graphics for both culling --- excluding objects in
 339 nodes which are not visible --- and ``level of detail'', where different levels of the quadtree store
 340 different quality versions of objects or data\cite{zerbst2004game}.
 341 Typically the number of points in a node
 342 exceeding a maximum triggers this split, though in our case likely the level of precision required exceeding
 343 that supported by the data type in use.
 344
 345 In this project, we will be experimenting with a form of quadtree in which each
 346 node has its own independent coordinate system, allowing us to store some spatial
 347 information\footnote{One bit per-coordinate, per-level of the quadtree} within the
 348 quadtree structure, eliminating redundancy in the coordinates of nearby objects.
 349
 350 Other spatial data structures exist, such as the KD-tree\cite{bentley1975multidimensional},
 351 which partitions the space on any axis-aligned line; or the BSP tree\cite{fuchs1980onvisible},
 352 which splits along an arbitrary line which need not be axis aligned. We believe, however,
 353 that the simpler conversion from binary coordinates to the quadtree's binary split make
 354 it a better avenue for initial research to explore.
 355
 356 \bibliographystyle{unsrt}
 357 \bibliography{papers}
 358
 359 \end{document}