\chapter{Literature Review}\label{Background}
The first part of this chapter will be devoted to documents themselves, including the representation and display of graphics primitives, and how collections of these primitives are represented in document formats, focusing on widely used standards.
We will find that although there has been a great deal of research into the rendering, storing, editing, manipulation, and extension of document formats, modern standards are content to specify at best single precision IEEE-754 floating point arithmetic.
The research on arbitrary precision arithmetic applied to documents is rather sparse; however, arbitrary precision arithmetic itself is a very active field of research. Therefore, the remainder of this chapter will be devoted to considering fixed precision floating point numbers as specified by the IEEE-754 standard, possible limitations in precision, and alternative number representations for increased or arbitrary precision arithmetic.
In Chapter \ref{Progress}, we will discuss our findings so far with regards to arbitrary precision arithmetic applied to document formats, and expand upon the goals outlined in Chapter \ref{Proposal}.
Hearn and Baker's textbook ``Computer Graphics''\cite{computergraphics2} gives a comprehensive overview of graphics from physical display technologies through fundamental drawing algorithms to popular graphics APIs. This section will examine algorithms for drawing two dimensional geometric primitives on raster displays as discussed in ``Computer Graphics'' and the relevant literature. This section is by no means a comprehensive survey of the literature but intends to provide some idea of the computations which are required to render a document.
It is of some historical significance that vector display devices were popular during the 70s and 80s, and papers oriented towards drawing on these devices can be found\cite{brassel1979analgorithm}. Whilst curves can be drawn at high resolution on vector displays, a major disadvantage was shading\cite{lane1983analgorithm}; by the early 90s the vast majority of computer displays were raster based\cite{computergraphics2}.
\subsection{Straight Lines}\label{Straight Lines}
\input{chapters/Background_Lines}
\subsection{Spline Curves and B{\'e}ziers}\label{Spline Curves}
\section{Number Representations}\label{Number Representations}
Consider a value of $7.25 = 2^2 + 2^1 + 2^0 + 2^{-2}$. In binary (base 2), this could be written as $111.01_2$. Such a value would require 5 binary digits (bits) of memory to represent exactly in computer hardware. Some values, for example $7.3$, can be represented exactly in one base (decimal) but not in another; in binary the sequence $111.0100110011\text{...}_2$ never terminates. A rational value such as $\frac{7}{3}$ has a non-terminating expansion in both decimal and binary, but could be represented exactly by the combination of a numerator $7 = 111_2$ and denominator $3 = 11_2$. Lastly, some values such as $e \approx 2.718\text{...}$ can only be expressed exactly using a symbolic system --- in this case as the result of an infinite summation $e = \displaystyle\sum_{n=0}^{\infty} \frac{1}{n!}$.

Modern computer hardware typically supports integer and floating-point number representations and operations. Due to physical limitations, the size of these representations is limited; this is the fundamental source of both limits on range and precision in computer based calculations.
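As a concrete illustration using the binary64 \verb/double/ type native to C (an IEEE-754 format discussed below), the following program prints each of the values above to twenty significant figures; $7.25$ survives exactly, whilst $7.3$ and $\frac{7}{3}$ have already been rounded before any arithmetic is performed.
\begin{verbatim}
#include <stdio.h>

int main(void)
{
    double exact = 7.25;       /* 2^2 + 2^1 + 2^0 + 2^-2: exact in binary */
    double rounded = 7.3;      /* non-terminating binary expansion: rounded */
    double ratio = 7.0 / 3.0;  /* exact only as a (numerator, denominator) pair */

    printf("%.20g\n", exact);   /* prints 7.25 */
    printf("%.20g\n", rounded); /* prints 7.2999999999999998... */
    printf("%.20g\n", ratio);   /* prints 2.3333333333333334... */
    return 0;
}
\end{verbatim}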
\subsection{Floating Point Definitions}
Whilst a fixed point representation keeps the ``point'' at the same position in a string of bits, floating point representations can be thought of as analogous to scientific notation: an ``exponent'' and a fixed point value are encoded, with multiplication by the base raised to the exponent moving the position of the point.

A floating point number $x$ is commonly represented by a tuple of values $(s, e, m)$ in base $B$ as\cite{HFP, ieee2008-754}:
\begin{align*}
	x &= (-1)^{s} \times m \times B^{e}
\end{align*}
Where $s$ is the sign and may be zero or one, $m$ is commonly called the ``mantissa'' and $e$ is the exponent. Whilst $e$ is an integer in some range $\pm e_{max}$, the mantissa $m$ is a fixed point value in the range $0 \leq m < B$.
The choice of base $B = 2$ in the original IEEE-754 standard matches the nature of modern hardware. It has also been found that this base in general gives the smallest rounding errors\cite{HFP}. Early computers had in fact used a variety of representations including $B=3$ or even $B=7$\cite{goldman1991whatevery}, and the revised IEEE-754 standard specifies a decimal representation $B = 10$ intended for use in financial applications\cite{ieee754std2008}\footnote{E.g.\ the smallest valid unit of currency \$0.01 cannot be represented exactly in base 2.}. From now on we will restrict ourselves to considering base 2 floats.
The IEEE-754 encoding of $s$, $e$ and $m$ requires a fixed number of contiguous bits dedicated to each value. Originally two encodings were defined: binary32 and binary64. $s$ is always encoded in a single leading bit, whilst (8, 23) and (11, 52) bits are used for the (exponent, mantissa) encodings respectively.
The encoding of $m$ in the IEEE-754 standard is not exactly equivalent to a fixed point value. By assuming an implicit leading bit (i.e.\ restricting $1 \leq m < 2$) except for when $e = 0$, floating point values are guaranteed to have a unique representation; these representations are said to be ``normalised''. When $e = 0$ the leading bit is not implied; these ``denormal'' representations allow values smaller than the smallest normalised float to be represented, at reduced precision. The idea of using an implicit bit appears to have been considered by Goldberg as early as 1967\cite{goldbern1967twentyseven}.
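The bit layout described above can be inspected directly. The following sketch assumes that the C \verb/float/ type is an IEEE-754 binary32 (true on essentially all current hardware); it copies the bits of a float into an integer, extracts the three fields, and reconstructs the value of a normalised float including the implicit leading bit. Denormals, zero, infinity and NaN are deliberately not handled.
\begin{verbatim}
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <math.h>

int main(void)
{
    float f = 7.25f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));   /* reinterpret the 32 bits of the float */

    unsigned s = bits >> 31;           /* 1 bit sign */
    unsigned e = (bits >> 23) & 0xFF;  /* 8 bit biased exponent */
    unsigned m = bits & 0x7FFFFF;      /* 23 bit mantissa field */

    /* Normalised value: (-1)^s * (1 + m/2^23) * 2^(e - 127) */
    double value = (s ? -1.0 : 1.0)
        * (1.0 + m / 8388608.0) * pow(2.0, (int)e - 127);

    printf("s=%u e=%u m=0x%06X -> %g\n", s, e, m, value);
    return 0;
}
\end{verbatim}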
Figure \ref{float.pdf}\footnote{In a digital PDF viewer we suggest increasing the zoom level --- the graphs were created from SVG images} shows the positive real numbers which can be represented exactly by an 8 bit floating point number encoded in the IEEE-754 format\footnote{Not quite; we are ignoring the IEEE-754 definitions of NaN and Infinity for simplicity}, and the distance between successive floating point numbers. We show two encodings using (1,2,5) and (1,3,4) bits to encode (sign, exponent, mantissa) respectively. For each distinct value of the exponent, the successive floating point representations lie on a straight line with constant slope. As the exponent increases, larger values are represented, but the distance between successive values increases; this can be seen on the right. The marked single point discontinuities at \verb/0x10/ and \verb/0x20/ occur when $e$ leaves the denormalised region and the encoding of $m$ changes. We have also plotted a fixed point representation for comparison; fixed point and integer representations appear as straight lines, as the distance between successive values is always constant.
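The figure can be reproduced by simply enumerating bit patterns. The sketch below is not the code used to produce the figure; it assumes an IEEE-754 style exponent bias of $2^{3-1}-1 = 3$ for the (1,3,4) format, decodes each non-negative bit pattern (treating $e = 0$ as denormal and, like the figure, ignoring infinities and NaNs), and prints the represented value together with the gap to the next representation.
\begin{verbatim}
#include <stdio.h>
#include <math.h>

/* Decode an 8 bit (sign, exponent, mantissa) = (1,3,4) minifloat.
   Assumes an IEEE-754 style bias of 3, with denormals when e == 0. */
static double decode134(unsigned byte)
{
    unsigned s = (byte >> 7) & 0x1;
    unsigned e = (byte >> 4) & 0x7;
    unsigned m = byte & 0xF;
    double sign = s ? -1.0 : 1.0;

    if (e == 0)  /* denormal: no implicit leading bit */
        return sign * (m / 16.0) * pow(2.0, 1 - 3);
    return sign * (1.0 + m / 16.0) * pow(2.0, (int)e - 3);
}

int main(void)
{
    for (unsigned byte = 0x00; byte < 0x7F; ++byte)
        printf("0x%02X  %-10g gap %g\n", byte,
               decode134(byte), decode134(byte + 1) - decode134(byte));
    return 0;
}
\end{verbatim}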
\subsection{Precision and Rounding}\label{Precision and Rounding}
Real values which cannot be represented exactly in a floating point representation must be rounded to the nearest floating point value. The results of a floating point operation will in general be such values, and thus a rounding error is possible in any floating point operation. Referring to Figure \ref{floats.pdf}, it can be seen that the largest possible rounding error is half the distance between successive floats; this means that rounding errors increase as the magnitude of the value to be represented increases.
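The growth in spacing, and hence in the worst case rounding error, can be observed directly using the C standard library function \verb/nextafterf/, which returns the next representable binary32 value in a given direction.
\begin{verbatim}
#include <stdio.h>
#include <math.h>

int main(void)
{
    float samples[] = {1.0f, 256.0f, 65536.0f, 16777216.0f};
    for (int i = 0; i < 4; ++i) {
        float x = samples[i];
        /* distance from x to the next representable float above it */
        float gap = nextafterf(x, INFINITY) - x;
        printf("x = %.1f  gap = %g  worst rounding error = %g\n",
               x, gap, gap / 2);
    }
    return 0;
}
\end{verbatim}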
A straight line through two points $(x_1, y_1)$ and $(x_2, y_2)$ may be defined by:
\begin{align}
	y(x) &= m x + c \text{, where } m = \frac{y_2 - y_1}{x_2 - x_1} \text{ and } c = y_1 - m x_1 \label{eqn_line}
\end{align}
On a raster display, only points $(x,y)$ with integer coordinates can be displayed; however $m$ will generally not be an integer. Thus a straightforward use of Equation \ref{eqn_line} will require costly floating point operations and rounding (see Section \ref{Precision and Rounding}). Modifications based on computing steps $\Delta x$ and $\Delta y$ eliminate the multiplication but are still less than ideal in terms of performance\cite{computergraphics2}.
It should be noted that algorithms for drawing lines can be based upon sampling $y(x)$ only if $|m| \leq 1$; otherwise sampling at every integer $x$ coordinate would leave gaps in the line because $\Delta y > 1$. Line drawing algorithms can be trivially adapted to sample $x(y)$ if $|m| > 1$.
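A minimal sketch of such an incremental approach (a simple DDA) is given below; it assumes integer endpoints with $0 \leq m \leq 1$ and $x_1 < x_2$, and still performs one floating point addition and one rounding operation per pixel. The \verb/set_pixel/ function is a stand-in for writing to a framebuffer.
\begin{verbatim}
#include <stdio.h>
#include <math.h>

static void set_pixel(int x, int y) { printf("(%d,%d)\n", x, y); }

/* Incremental line drawing, assuming 0 <= gradient <= 1 and x1 < x2 */
static void dda_line(int x1, int y1, int x2, int y2)
{
    float m = (float)(y2 - y1) / (float)(x2 - x1); /* gradient */
    float y = (float)y1;
    for (int x = x1; x <= x2; ++x) {
        set_pixel(x, (int)roundf(y)); /* rounding at every sample */
        y += m;                       /* floating point addition at every sample */
    }
}

int main(void) { dda_line(0, 0, 10, 4); return 0; }
\end{verbatim}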
Bresenham's Line Algorithm was developed in 1965 with the motivation of controlling a particular mechanical plotter in use at the time\cite{bresenham1965algorithm}. The plotter's motion was confined to move between discrete positions on a grid one cell at a time, horizontally, vertically or diagonally. As a result, the algorithm presented by Bresenham requires only integer addition and subtraction, and it is easily adapted for drawing pixels on a raster display. Bresenham himself points out that rasterisation processes have existed since long before the first computer displays\cite{bresenham1996pixel}.
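A sketch of the integer-only formulation for the first octant ($0 \leq \Delta y \leq \Delta x$, $x_1 < x_2$) is shown below. The decision variable $d$ tracks twice the signed error between the ideal line and the current pixel row, so only integer additions and comparisons are required per pixel; \verb/set_pixel/ again stands in for framebuffer access.
\begin{verbatim}
#include <stdio.h>

static void set_pixel(int x, int y) { printf("(%d,%d)\n", x, y); }

/* Bresenham's line algorithm for the first octant: 0 <= dy <= dx, x1 < x2 */
static void bresenham_line(int x1, int y1, int x2, int y2)
{
    int dx = x2 - x1;
    int dy = y2 - y1;
    int d = 2 * dy - dx;    /* initial decision variable */
    int y = y1;
    for (int x = x1; x <= x2; ++x) {
        set_pixel(x, y);
        if (d > 0) {        /* the ideal line has passed the pixel centre */
            y += 1;
            d -= 2 * dx;
        }
        d += 2 * dy;
    }
}

int main(void) { bresenham_line(0, 0, 10, 4); return 0; }
\end{verbatim}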
In Figure \ref{rasterising-line} a) and b) we illustrate the rasterisation of a line with a single pixel width. The path followed by Bresenham's algorithm is shown. It can be seen that the pixels which are more than half filled by the line are set by the algorithm. This causes a jagged effect called aliasing, which is particularly noticeable on low resolution displays. From a signal processing point of view this can be understood as due to the sampling of a continuous signal on a discrete grid\cite{wu1991anefficient}.
Figure \ref{rasterising-line} c) shows an (idealised) antialiased rendering of the line. The pixel intensity has been set to the average of the line and background colours over that pixel. Such an ideal implementation would be impractically computationally expensive on real devices\cite{elias2000graphics}. In 1991 Wu introduced an algorithm for drawing approximately antialiased lines which, while equivalent in results to existing algorithms by Fujimoto and Iwata, set the state of the art in performance\cite{wu1991anefficient}\footnote{Techniques for antialiasing primitives other than straight lines are discussed in some detail in Chapter 4 of ``Computer Graphics''\cite{computergraphics2}}.
At a fundamental level everything that is seen on a display device is represented as either a vector or raster image. These images can be stored as stand-alone documents or embedded within a more complex document format capable of containing many other types of information.
A raster image's structure closely matches its representation on modern display hardware: the image is represented as a grid of square ``pixels''. Each pixel is considered to be a filled square of the same size and contains information describing its colour. This representation is simple and also well suited to storing images as produced by cameras and scanners. The drawback of raster images is that by their very nature there can only be one level of detail; this is illustrated in Figures \ref{vector-vs-raster} and \ref{vector-vs-raster-scaled}.
A vector image contains information about the positioning and shading of geometric shapes. To display this image on modern display hardware, coordinates are transformed according to the view and then the image is converted into a raster-like representation. Whilst the raster image merely appears to contain edges, the vector image actually contains information about these edges, meaning they can be displayed ``infinitely sharply'' at any level of detail --- or they could be, if the coordinates are stored with enough precision (see Section \ref{Precision and Rounding}).
Figures \ref{vector-vs-raster} and \ref{vector-vs-raster-scaled} illustrate the advantage of vector formats by comparing raster and vector images in a similar way to Worth and Packard\cite{worth2003xr}. On the right is a raster image which should be recognisable as an animal defined by fairly sharp edges. Figure \ref{vector-vs-raster-scaled} shows how these edges appear jagged when scaled. There is no information in the original image as to what should be displayed at a larger size, so each square-shaped pixel is simply increased in size. A blurring effect will probably be visible in most PDF viewers; the software has attempted to make the ``edge'' appear more realistic using a technique called ``antialiasing''.
The left side of each Figure is a vector image. When scaled, the edges maintain a smooth appearance which is limited by the resolution of the display rather than the image itself. %Vector images are well suited to high quality digital art\footnote{Figure \ref{vector-vs-raster} is not to be taken as an example of this.} and text.
\newlength\imageheight
There are many different ways to define a spline. One approach is to specify ``knots'' on the curve and, choosing a fixed $n$ ($n = 3$ for ``cubic'' splines), solve for the coefficients to generate polynomials passing through the points. Alternatively, special polynomials may be defined using ``control'' points which themselves are not part of the curve; these are convenient for graphical editors. B{\'e}zier splines are the most straightforward way to define a curve in the standards considered in Section \ref{Document Representations}. A spline defined from two cubic B{\'e}ziers is shown in Figure \ref{spline.pdf}.
Cubic and Quadratic B{\'e}zier Splines are used to define curved paths in the PostScript\cite{plrm}, PDF\cite{pdfref17} and SVG\cite{svg2011-1.1} standards which we will discuss in Section \ref{Document Representations}. Cubic B{\'e}ziers are also used to define vector fonts for rendering text in these standards and the {\TeX} typesetting language \cite{knuth1983metafont, knuth1984texbook}. Although he did not derive the mathematics, the usefulness of B{\'e}zier curves was realised by Pierre B{\'e}zier who used them in the 1960s for the computer aided design of automobile bodies\cite{bezier1986apersonal}.
A B{\'e}zier Curve of degree $n$ is defined by $n+1$ ``control points'' $\left\{P_0, \ldots, P_n\right\}$.
Points $P(t) = (x(t), y(t))$ along the curve for parameter $0 \leq t \leq 1$ are defined by:
\begin{align}
	P(t) &= \sum_{j=0}^{n} \binom{n}{j} t^{j} (1-t)^{n-j} P_j
\end{align}
Algorithms for rendering B{\'e}ziers may simply sample $P(t)$ for sufficiently many values of $t$ --- enough so that the spacing between successive points is always less than one pixel distance. Alternatively, a smaller number of points may be sampled, with the resulting points connected by straight lines using one of the algorithms discussed in Section \ref{Straight Lines}.
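The sketch below evaluates a cubic B{\'e}zier directly from its Bernstein form and prints a fixed number of evenly spaced samples; a renderer would instead choose the number of samples based on the size of the curve on the display.
\begin{verbatim}
#include <stdio.h>

typedef struct { double x, y; } Point;

/* Evaluate a cubic Bezier at parameter t using the Bernstein polynomials */
static Point bezier3(const Point p[4], double t)
{
    double u = 1.0 - t;
    double b0 = u * u * u, b1 = 3.0 * u * u * t;
    double b2 = 3.0 * u * t * t, b3 = t * t * t;
    Point r = {
        b0 * p[0].x + b1 * p[1].x + b2 * p[2].x + b3 * p[3].x,
        b0 * p[0].y + b1 * p[1].y + b2 * p[2].y + b3 * p[3].y
    };
    return r;
}

int main(void)
{
    Point p[4] = {{0, 0}, {0, 100}, {100, 100}, {100, 0}};
    for (int i = 0; i <= 16; ++i) {   /* 17 evenly spaced values of t */
        Point q = bezier3(p, i / 16.0);
        printf("%g %g\n", q.x, q.y);
    }
    return 0;
}
\end{verbatim}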
De Casteljau's algorithm of 1959 is often used for approximating B{\'e}ziers\cite{computergraphics2, knuth1983metafont}. This algorithm subdivides the original $n+1$ control points $\left\{P_0, \ldots, P_n\right\}$ into two sets of $n+1$ points, $\left\{Q_0, \ldots, Q_n\right\}$ and $\left\{R_0, \ldots, R_n\right\}$, describing the two halves of the curve; when iterated, the produced points converge to the curve $P(t)$. For subdivision at the midpoint $t = \frac{1}{2}$, the new control points can be expressed as\cite{goldman_thefractal}:
\begin{align}
	Q_i = \frac{1}{2^{i}} \sum_{j=0}^{i} \binom{i}{j} P_j &\text{ and } R_i = \frac{1}{2^{n-i}} \sum_{j=i}^{n} \binom{n-i}{j-i} P_j
\end{align}
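For the cubic case the same subdivision is usually computed by repeated averaging of adjacent control points rather than through the binomial form; the following sketch performs one such midpoint subdivision.
\begin{verbatim}
#include <stdio.h>

typedef struct { double x, y; } Point;

static Point midpoint(Point a, Point b)
{
    Point m = {(a.x + b.x) / 2.0, (a.y + b.y) / 2.0};
    return m;
}

/* One De Casteljau subdivision of a cubic Bezier P[0..3] at t = 1/2.
   Q receives the control points of the first half, R the second half;
   Q[3] == R[0] is the point on the curve at t = 1/2. */
static void subdivide(const Point P[4], Point Q[4], Point R[4])
{
    Point ab = midpoint(P[0], P[1]);
    Point bc = midpoint(P[1], P[2]);
    Point cd = midpoint(P[2], P[3]);
    Point abc = midpoint(ab, bc);
    Point bcd = midpoint(bc, cd);
    Point mid = midpoint(abc, bcd);

    Q[0] = P[0]; Q[1] = ab;  Q[2] = abc; Q[3] = mid;
    R[0] = mid;  R[1] = bcd; R[2] = cd;  R[3] = P[3];
}

int main(void)
{
    Point P[4] = {{0, 0}, {0, 100}, {100, 100}, {100, 0}};
    Point Q[4], R[4];
    subdivide(P, Q, R);
    printf("curve point at t = 1/2: (%g, %g)\n", Q[3].x, Q[3].y);
    return 0;
}
\end{verbatim}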
In much of the literature it is taken as trivial that it is only necessary to specify the control points of a B{\'e}zier in order to be able to render it at any level of detail\cite{knuth1983metafont, computergraphics2}. Recently, Goldman presented an argument that B{\'e}ziers could be considered fractal in nature, because the De Casteljau algorithm may be modified to express the polynomial $P(t)$ as the result of an iterated function system\cite{goldman_thefractal}. If this argument is correct, any primitive that can be described solely in terms of B{\'e}zier curves may also be considered fractal in nature. Ideally all these primitives may be rendered at any level of detail or ``zoom'' desired; however, computation of the pixel locations of the curve will be subject to the precision limits of the numerical representation which is used; we discuss these issues in Section \ref{Number Representations}.
\begin{figure}[H]
The emergence of the internet, web browsers, XML/HTML, JavaScript and related technologies has seen a revolution in the ways in which information can be presented digitally, and the PDF standard itself has begun to move beyond static text and figures\cite{hayes2012pixels, barnes2013embedding}. However, the popular document formats are still designed with the intention of showing information at either a single, fixed level of detail, or a small range of levels.
As most digital display devices are smaller than physical paper media, all useful viewers are able to ``zoom'' to a subset of the document. Vector graphics formats including PostScript, PDF and SVG support rasterisation at different zoom levels\cite{plrm, pdfref17, svg2011-1.1}, but the use of fixed precision floating point numbers causes problems due to imprecision, either far from the origin or at a high level of detail\cite{goldberg1991whatevery, goldberg1992thedesign}.
We are now seeing a widespread use of mobile computing devices with touch screens, where the display size is typically much smaller than paper pages and traditional computer monitors; it seems that there is much to be gained by breaking free of the restricted precision of traditional document formats.
\section{Overview}
The remainder of this document will be organised as follows: In Chapter \ref{Proposal} we give an overview of the current state of the research in document formats, and the motivation for implementing ``infinite precision'' in a document format. We will outline our approach to research in collaboration with David Gow\cite{proposalGow}. In Chapter \ref{Background} we provide more detailed background, examining the literature related to rendering, interpreting, and creating document formats, as well as possible techniques for increased and possibly infinite precision. Chapter \ref{Progress} gives the current state of our research and the progress towards the goals outlined in Chapter \ref{Introduction}.
Mainly motivated by producing Figure \ref{minifloat.pdf}, we have also implemented functions to convert an arbitrary \verb/Real/ type (which may be IEEE-754 floats) to and from a fixed size floating point representation of our choosing. We have not implemented any operations for floating point arithmetic using these representations.
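The listing below is not the interface of our implementation; it is only a minimal sketch of the central idea for the mantissa, using the C library functions \verb/frexp/ and \verb/ldexp/: the value is scaled so that the desired number of mantissa bits lie to the left of the point, rounded to an integer, and scaled back. Restricting the exponent range is ignored here.
\begin{verbatim}
#include <stdio.h>
#include <math.h>

/* Round x to a value with only `bits` bits of mantissa
   (including the implicit leading bit); exponent range is not restricted. */
static double round_mantissa(double x, int bits)
{
    if (x == 0.0)
        return 0.0;
    int e;
    double m = frexp(x, &e);       /* x = m * 2^e with 0.5 <= |m| < 1 */
    m = round(ldexp(m, bits));     /* keep `bits` bits of the mantissa */
    return ldexp(m, e - bits);     /* restore the original magnitude */
}

int main(void)
{
    double pi = 3.141592653589793;
    printf("%.10f\n", round_mantissa(pi, 5)); /* prints 3.1250000000 */
    return 0;
}
\end{verbatim}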
By using the functions to convert real numbers to variable precision floats as an interface for the virtual FPU, we hope to illustrate the limitations of floating point arithmetic more clearly than would be possible using IEEE-754 binary32 as is native to the C and C++ languages. Using the virtual FPU, rather than a software library running on the CPU, should also prove useful for determining the exact performance of floating point operations.
\section{Prototype Document Formats}