Summarise kilgarde2012gpu and more

[ipdf/documents.git] / LiteratureNotes.tex
diff --git a/LiteratureNotes.tex b/LiteratureNotes.tex

index 1d4fe12..6831428 100644 (file)
--- a/LiteratureNotes.tex
+++ b/LiteratureNotes.tex
@@ -1,11 +1,12 @@
-\documentclass[8pt]{extarticle}
+\documentclass[8pt]{extreport}
  \usepackage{graphicx}
  \usepackage{caption}
  \usepackage{amsmath} % needed for math align
  \usepackage{bm} % needed for maths bold face
   \usepackage{graphicx}    % needed for including graphics e.g. EPS, PS
  \usepackage{fancyhdr}  % needed for header
  \usepackage{graphicx}
  \usepackage{caption}
  \usepackage{amsmath} % needed for math align
  \usepackage{bm} % needed for maths bold face
   \usepackage{graphicx}    % needed for including graphics e.g. EPS, PS
  \usepackage{fancyhdr}  % needed for header
-
+%\usepackage{epstopdf} % Converts eps to pdf before including. Do it manually instead.
+\usepackage{float}
  \usepackage{hyperref}
  
   \topmargin -1.5cm        % read Lamport p.163
  \usepackage{hyperref}
  
   \topmargin -1.5cm        % read Lamport p.163
@@ -17,7 +18,7 @@
   \parskip 8.2pt           % sets spacing between paragraphs
   %\renewcommand{\baselinestretch}{1.5}         % Uncomment for 1.5 spacing between lines
   \parindent 0pt                  % sets leading space for paragraphs
   \parskip 8.2pt           % sets spacing between paragraphs
   %\renewcommand{\baselinestretch}{1.5}         % Uncomment for 1.5 spacing between lines
   \parindent 0pt                  % sets leading space for paragraphs
-\linespread{0.8}
+\linespread{1}
  
  
  \newcommand{\vect}[1]{\boldsymbol{#1}} % Draw a vector
  
  
  \newcommand{\vect}[1]{\boldsymbol{#1}} % Draw a vector
@@ -36,14 +37,14 @@
  \definecolor{darkred}{rgb}{0.75,0,0}
  \definecolor{darkblue}{rgb}{0,0,0.75}
  \definecolor{pink}{rgb}{1,0.5,0.5}
  \definecolor{darkred}{rgb}{0.75,0,0}
  \definecolor{darkblue}{rgb}{0,0,0.75}
  \definecolor{pink}{rgb}{1,0.5,0.5}
-\lstset{language=Java}
+\lstset{language=XML}
  \lstset{backgroundcolor=\color{darkgray}}
  \lstset{backgroundcolor=\color{darkgray}}
-\lstset{numbers=left, numberstyle=\tiny, stepnumber=1, numbersep=5pt}
-\lstset{keywordstyle=\color{darkred}\bfseries}
-\lstset{commentstyle=\color{darkblue}}
+%\lstset{numbers=left, numberstyle=\tiny, stepnumber=1, numbersep=5pt}
+%\lstset{keywordstyle=\color{darkred}\bfseries}
+%\lstset{commentstyle=\color{darkblue}}
  %\lstset{stringsyle=\color{red}}
  %\lstset{stringsyle=\color{red}}
-\lstset{showstringspaces=false}
-\lstset{basicstyle=\small}
+%\lstset{showstringspaces=false}
+%\lstset{basicstyle=\small}
  
  \newcommand{\shell}[1]{\texttt{#1}}
  \newcommand{\code}[1]{\texttt{#1}}
  
  \newcommand{\shell}[1]{\texttt{#1}}
  \newcommand{\code}[1]{\texttt{#1}}
@@ -68,21 +69,69 @@
  
  \tableofcontents
  
  
  \tableofcontents
  
-\section{Postscript Language Reference Manual\cite{plrm}}
+\chapter{Literature Summaries}
+
+\section{Postscript Language Reference Manual  \cite{plrm}}
  
  Adobe's official reference manual for PostScript.
  
  It is big.
  
  
  Adobe's official reference manual for PostScript.
  
  It is big.
  
-\section{Portable Document Format Reference Manual\cite{pdfref17}}
+\begin{itemize}
+       \item First version was published BEFORE the IEEE standard and used smaller floats than binary32
+       \item Now uses binary32 floats.
+\end{itemize}
+
+\section{Portable Document Format Reference Manual  \cite{pdfref17}}
  
  Adobe's official reference for PDF.
  
  It is also big.
  
  
  Adobe's official reference for PDF.
  
  It is also big.
  
+\begin{itemize}
+       \item Early versions did not use IEEE binary32 but 16-16 exponent/mantissa encodings (Why?)
+       \item Current standard is restricted to binary32
+       \item It specifically says PDF creators must use at most binary32 because higher precision is not supported by Adobe Reader.
+\end{itemize}
+
+\section{IEEE Standard for Floating-Point Arithmetic \cite{ieee2008-754}}
+
+The IEEE (revised) 754 standard.
+
+It is also big.
+
+Successes:
+\begin{itemize}
+       \item Has been adopted by CPUs
+       \item Standardised floats for programmers --- accomplishes goal of allowing non-numerical experts to write reasonably sophisticated platform independent programs that may perform complex numerical operations
+\end{itemize}
+
+Failures:
+\begin{itemize}
+       \item Adoption by GPUs slower\cite{hillesland2004paranoia}
+       \item It specifies the maximum errors for operations using IEEE types but nothing about internal representations
+       \item Many hardware devices (GPUs and CPUs) use non-IEEE representations internally and simply truncate/round the result
+       \begin{itemize}
+               \item This isn't so much of a problem when the device uses additional bits but it is misleading when GPUs use less than binary32 and act as if they are using binary32 from the programmer's perspective.
+               \item Devices using {\bf less} bits internally aren't IEEE compliant
+       \end{itemize}
+       \item Thus the same program compiled and run on different architectures may give completely different results\cite{HFP}
+       \begin{itemize}
+               \item The ultimate goal of allowing people to write numerical programs in total ignorance of the hardware is not entirely realised
+       \end{itemize}
+       \item This is the sort of thing that makes people want to use a virtual machine, and thus Java
+       \begin{itemize}
+               \item Objectively I probably shouldn't say that using Java is in itself a failure
+       \end{itemize}
+       \item Standards such as PostScript and PDF were slow to adopt IEEE representations
+       \item The OpenVG standard accepts IEEE binary32 in the API but specifically states that hardware may use less than this\cite{rice2008openvg}
+\end{itemize}
+
+
+
  \pagebreak
  
  \pagebreak
  
-\section{Portable Document Format (PDF) --- Finally...\cite{cheng2002portable}}
+\section{Portable Document Format (PDF) --- Finally...  \cite{cheng2002portable}}
  
  This is not spectacularly useful, is basically an advertisement for Adobe software.
  
  
  This is not spectacularly useful, is basically an advertisement for Adobe software.
  
@@ -115,7 +164,7 @@ This is not spectacularly useful, is basically an advertisement for Adobe softwa
  \end{itemize}
  
  \pagebreak
  \end{itemize}
  
  \pagebreak
-\section{Pixels or Perish \cite{hayes2012pixels}}
+\section{Pixels or Perish  \cite{hayes2012pixels}}
  
  ``The art of scientific illustration will have to adapt to the new age of online publishing''
  And therefore, JavaScript libraries ($\text{D}^3$) are the future.
  
  ``The art of scientific illustration will have to adapt to the new age of online publishing''
  And therefore, JavaScript libraries ($\text{D}^3$) are the future.
@@ -180,14 +229,14 @@ This paper uses Metaphors a lot. I never met a phor that didn't over extend itse
  
  \end{itemize}
  
  
  \end{itemize}
  
-\section{Embedding and Publishing Interactive, 3D Figures in PDF Files\cite{barnes2013embedding}}
+\section{Embedding and Publishing Interactive, 3D Figures in PDF Files  \cite{barnes2013embedding}}
  
  \begin{itemize}
         \item Linkes well with \cite{hayes2012pixels}; I heard you liked figures so I put a figure in your PDF
         \item Title pretty much summarises it; similar to \cite{hayes2012pixels} except these guys actually did something practical
  \end{itemize}
  
  
  \begin{itemize}
         \item Linkes well with \cite{hayes2012pixels}; I heard you liked figures so I put a figure in your PDF
         \item Title pretty much summarises it; similar to \cite{hayes2012pixels} except these guys actually did something practical
  \end{itemize}
  
-\section{27 Bits are not enough for 8 digit accuracy\cite{goldberg1967twentyseven}}
+\section{27 Bits are not enough for 8 digit accuracy  \cite{goldberg1967twentyseven}}
  
  Proves with maths, that rounding errors mean that you need at least $q$ bits for $p$ decimal digits. $10^p < 2^{q-1}$
  
  
  Proves with maths, that rounding errors mean that you need at least $q$ bits for $p$ decimal digits. $10^p < 2^{q-1}$
  
@@ -210,7 +259,7 @@ Proves with maths, that rounding errors mean that you need at least $q$ bits for
         \end{itemize}
  \end{itemize}
  
         \end{itemize}
  \end{itemize}
  
-\section{What every computer scientist should know about floating-point arithmetic\cite{goldberg1991whatevery}}
+\section{What every computer scientist should know about floating-point arithmetic  \cite{goldberg1991whatevery}}
  
  \begin{itemize}
         \item Book: \emph{Floating Point Computation} by Pat Sterbenz (out of print... in 1991)
  
  \begin{itemize}
         \item Book: \emph{Floating Point Computation} by Pat Sterbenz (out of print... in 1991)
@@ -260,7 +309,7 @@ Proves with maths, that rounding errors mean that you need at least $q$ bits for
  %%%%
  % David's Stuff
  %%%%
  %%%%
  % David's Stuff
  %%%%
-\section{Compositing Digital Images\cite{porter1984compositing}}
+\section{Compositing Digital Images  \cite{porter1984compositing}}
  
  
  
  
  
  
@@ -300,7 +349,7 @@ and is implemented almost exactly by modern graphics APIs such as \texttt{OpenGL
  all but guaranteed that this is the method we will be using for compositing
  document elements in our project.
  
  all but guaranteed that this is the method we will be using for compositing
  document elements in our project.
  
-\section{Bresenham's Algorithm: Algorithm for computer control of a digital plotter\cite{bresenham1965algorithm}}
+\section{Bresenham's Algorithm: Algorithm for computer control of a digital plotter  \cite{bresenham1965algorithm}}
  Bresenham's line drawing algorithm is a fast, high quality line rasterization
  algorithm which is still the basis for most (aliased) line drawing today. The
  paper, while originally written to describe how to control a particular plotter,
  Bresenham's line drawing algorithm is a fast, high quality line rasterization
  algorithm which is still the basis for most (aliased) line drawing today. The
  paper, while originally written to describe how to control a particular plotter,
@@ -323,13 +372,13 @@ sub-pixel coverage into account. Bresenham himself extended this algorithm to
  produce Bresenham's circle algorithm. The principles behind the algorithm have
  also been used to rasterize other shapes, including B\'{e}zier curves.
  
  produce Bresenham's circle algorithm. The principles behind the algorithm have
  also been used to rasterize other shapes, including B\'{e}zier curves.
  
-\section{Quad Trees: A Data Structure for Retrieval on Composite Keys\cite{finkel1974quad}}
+\section{Quad Trees: A Data Structure for Retrieval on Composite Keys  \cite{finkel1974quad}}
  
  This paper introduces the ``quadtree'' spatial data structure. The quadtree structure is
  a search tree in which every node has four children representing the north-east, north-west,
  south-east and south-west quadrants of its space.
  
  
  This paper introduces the ``quadtree'' spatial data structure. The quadtree structure is
  a search tree in which every node has four children representing the north-east, north-west,
  south-east and south-west quadrants of its space.
  
-\section{Xr: Cross-device Rendering for Vector Graphics\cite{worth2003xr}}
+\section{Xr: Cross-device Rendering for Vector Graphics  \cite{worth2003xr}}
  
  Xr (now known as Cairo) is an implementation of the PDF v1.4 rendering model,
  independent of the PDF or PostScript file formats, and is now widely used
  
  Xr (now known as Cairo) is an implementation of the PDF v1.4 rendering model,
  independent of the PDF or PostScript file formats, and is now widely used
@@ -355,7 +404,7 @@ providing a trade-off between rendering quality and performance. The library dev
  that setting the tolerance to greater than $0.1$ device pixels resulted in errors visible to the
  user.
  
  that setting the tolerance to greater than $0.1$ device pixels resulted in errors visible to the
  user.
  
-\section{Glitz: Hardware Accelerated Image Compositing using OpenGL\cite{nilsson2004glitz}}
+\section{Glitz: Hardware Accelerated Image Compositing using OpenGL  \cite{nilsson2004glitz}}
  
  This paper describes the implementation of an \texttt{OpenGL} based rendering backend for
  the \texttt{Cairo} library. 
  
  This paper describes the implementation of an \texttt{OpenGL} based rendering backend for
  the \texttt{Cairo} library. 
@@ -378,9 +427,11 @@ Performance was much improved over the software rasterization and over XRender a
  on all except nVidia hardware. However, nVidia's XRender implementation did slow down significantly when
  some transformations were applied.
  
  on all except nVidia hardware. However, nVidia's XRender implementation did slow down significantly when
  some transformations were applied.
  
+In \cite{kilgard2012gpu}, Kilgard mentions that Glitz has been abandoned. He describes it as ''GPU assisted'' rather than GPU accelerated, since it used the XRender (??) extension.
+
  %% Sam again
  
  %% Sam again
  
-\section{Boost Multiprecision Library\cite{boost_multiprecision}}
+\section{Boost Multiprecision Library  \cite{boost_multiprecision}}
  
  \begin{itemize}
         \item ``The Multiprecision Library provides integer, rational and floating-point types in C++ that have more range and precision than C++'s ordinary built-in types.''
  
  \begin{itemize}
         \item ``The Multiprecision Library provides integer, rational and floating-point types in C++ that have more range and precision than C++'s ordinary built-in types.''
@@ -391,7 +442,7 @@ some transformations were applied.
  
  % Some hardware related sounding stuff...
  
  
  % Some hardware related sounding stuff...
  
-\section{A CMOS Floating Point Unit\cite{kelley1997acmos}}
+\section{A CMOS Floating Point Unit  \cite{kelley1997acmos}}
  
  The paper describes the implentation of a FPU for PowerPC using a particular Hewlett Packard process (HP14B 0.5$\mu$m, 3M, 3.3V).
  It implements a ``subset of the most commonly used double precision floating point instructions''. The unimplemented operations are compiled for the CPU.
  
  The paper describes the implentation of a FPU for PowerPC using a particular Hewlett Packard process (HP14B 0.5$\mu$m, 3M, 3.3V).
  It implements a ``subset of the most commonly used double precision floating point instructions''. The unimplemented operations are compiled for the CPU.
@@ -418,8 +469,10 @@ It is probably not that useful, I don't think we'll end up writing FPU assembly?
  
  FPU's typically have 80 bit registers so they can support REAL4, REAL8 and REAL10 (single, double, extended precision).
  
  
  FPU's typically have 80 bit registers so they can support REAL4, REAL8 and REAL10 (single, double, extended precision).
  
+Note: Presumably this is referring to the x86 80 bit floats that David was talking about?
  
  
-\section{Floating Point Package User's Guide\cite{bishop2008floating}}
+
+\section{Floating Point Package User's Guide  \cite{bishop2008floating}}
  
  This is a technical report describing floating point VHDL packages \url{http://www.vhdl.org/fphdl/vhdl.html}
  
  
  This is a technical report describing floating point VHDL packages \url{http://www.vhdl.org/fphdl/vhdl.html}
  
@@ -432,13 +485,13 @@ See also: Java Optimized Processor\cite{jop} (it has a VHDL implementation of a
  
  \section{Low-Cost Microarchitectural Support for Improved Floating-Point Accuracy\cite{dieter2007lowcost}}
  
  
  \section{Low-Cost Microarchitectural Support for Improved Floating-Point Accuracy\cite{dieter2007lowcost}}
  
-Mentions how GPUs offer very good floating point performance but only for single precision floats.
+Mentions how GPUs offer very good floating point performance but only for single precision floats. (NOTE: This statement seems to contradict \cite{hillesland2004paranoia}.
  
  Has a diagram of a Floating Point adder.
  
  Talks about some magical technique called "Native-pair Arithmetic" that somehow makes 32-bit floating point accuracy ``competitive'' with 64-bit floating point numbers.
  
  
  Has a diagram of a Floating Point adder.
  
  Talks about some magical technique called "Native-pair Arithmetic" that somehow makes 32-bit floating point accuracy ``competitive'' with 64-bit floating point numbers.
  
-\section{Accurate Floating Point Arithmetic through Hardware Error-Free Transformations\cite{kadric2013accurate}}
+\section{Accurate Floating Point Arithmetic through Hardware Error-Free Transformations  \cite{kadric2013accurate}}
  
  From the abstract: ``This paper presents a hardware approach to performing ac-
  curate floating point addition and multiplication using the idea of error-
  
  From the abstract: ``This paper presents a hardware approach to performing ac-
  curate floating point addition and multiplication using the idea of error-
@@ -455,7 +508,9 @@ I guess it's time to try and work out how to use the Opensource VHDL implementat
  This is about reduction of error in hardware operations rather than the precision or range of floats.
  But it is probably still relevant.
  
  This is about reduction of error in hardware operations rather than the precision or range of floats.
  But it is probably still relevant.
  
-\section{Floating Point Unit from JOP\cite{jop}}
+This has the Fast2Sum algorithm but for the love of god I cannot see how you can compute anything other than $a + b = 0 \forall a,b$ using the algorithm as written in their paper. It references Dekker\cite{dekker1971afloating} and Kahn; will look at them instead.
+
+\section{Floating Point Unit from JOP  \cite{jop}}
  
  This is a 32 bit floating point unit developed for JOP in VHDL.
  I have been able to successfully compile it and the test program using GHDL\cite{ghdl}. 
  
  This is a 32 bit floating point unit developed for JOP in VHDL.
  I have been able to successfully compile it and the test program using GHDL\cite{ghdl}. 
@@ -463,7 +518,7 @@ I have been able to successfully compile it and the test program using GHDL\cite
  Whilst there are constants (eg: \verb/FP_WIDTH = 32, EXP_WIDTH = 8, FRAC_WIDTH = 23/) defined, the actual implementation mostly uses magic numbers, so 
  some investigation is needed into what, for example, the "52" bits used in the sqrt units are for.
  
  Whilst there are constants (eg: \verb/FP_WIDTH = 32, EXP_WIDTH = 8, FRAC_WIDTH = 23/) defined, the actual implementation mostly uses magic numbers, so 
  some investigation is needed into what, for example, the "52" bits used in the sqrt units are for.
  
-\section{GHDL\cite{ghdl}}
+\section{GHDL  \cite{ghdl}}
  
  GHDL is an open source GPL licensed VHDL compiler written in Ada. It had packages in debian up until wheezy when it was removed. However the sourceforge site still provides a \shell{deb} file for wheezy.
  
  
  GHDL is an open source GPL licensed VHDL compiler written in Ada. It had packages in debian up until wheezy when it was removed. However the sourceforge site still provides a \shell{deb} file for wheezy.
  
@@ -477,7 +532,7 @@ Using unix domain sockets we can execute the FPU as a child process and communic
  
  Using \shell{ghdl} the testbench can also be linked as part a C/C++ program and run using a function; however there is still no way to communicate with it other than forking a child process and using a unix domain socket anyway. Also, compiling the VHDL FPU as part of our document viewer would clutter the code repository and probably be highly unportable. The VHDL FPU has been given a seperate repository.
  
  
  Using \shell{ghdl} the testbench can also be linked as part a C/C++ program and run using a function; however there is still no way to communicate with it other than forking a child process and using a unix domain socket anyway. Also, compiling the VHDL FPU as part of our document viewer would clutter the code repository and probably be highly unportable. The VHDL FPU has been given a seperate repository.
  
-\section{On the design of fast IEEE floating-point adders\cite{seidel2001onthe}}
+\section{On the design of fast IEEE floating-point adders  \cite{seidel2001onthe}}
  
  This paper gives an overview of the ``Naive'' floating point addition/subtraction algorithm and gives several optimisation techniques:
  
  
  This paper gives an overview of the ``Naive'' floating point addition/subtraction algorithm and gives several optimisation techniques:
  
@@ -501,13 +556,316 @@ The paper concludes by summarising the optimisation techniques used by various a
  
  This paper does not specifically mention the precision of the operations, but may be useful because a faster adder design might mean you can increase the precision.
  
  
  This paper does not specifically mention the precision of the operations, but may be useful because a faster adder design might mean you can increase the precision.
  
+\section{Re: round32 ( round64 ( X ) ) ?= round32 ( X )  \cite{beebe2011round32}}
+
+I included this just for the quote by Nelson H. F. Beebe:
+
+``It is too late now to repair the mistakes of the past that are present
+in millions of installed systems, but it is good to know that careful
+research before designing hardware can be helpful.''
+
+This is in regards to the problem of double rounding. It provides a reference for a paper that discusses a rounding mode that eliminates the problem, and a software implementation.
+
+It shows that the IEEE standard can be fallible!
+
+Not sure how to work this into our literature review though.
  
  % Back to software
  
  % Back to software
-\section{Basic Issues in Floating Point Arithmetic and Error Analysis\cite{demmel1996basic}}
+\section{Basic Issues in Floating Point Arithmetic and Error Analysis  \cite{demmel1996basic}}
  
  These are lecture notes from U.C Berkelye CS267 in 1996.
  
  
  
  These are lecture notes from U.C Berkelye CS267 in 1996.
  
  
+\section{Charles Babbage  \cite{dodge_babbage, nature1871babbage}}
+
+Tributes to Charles Babbage. Might be interesting for historical background. Don't mention anything about floating point numbers though.
+
+\section{GPU Floating-Point Paranoia  \cite{hillesland2004paranoia}}
+
+This paper discusses floating point representations on GPUs. They have reproduced the program \emph{Paranoia} by William Kahan for characterising floating point behaviour of computers (pre IEEE) for GPUs.
+
+
+There are a few remarks about GPU vendors not being very open about what they do or do not do with 
+
+
+Unfortunately we only have the extended abstract, but a pretty good summary of the paper (written by the authors) is at: \url{www.cs.unc.edu/~ibr/projects/paranoia/}
+
+From the abstract:
+
+``... [GPUs are often similar to IEEE] However, we have found
+that GPUs do not adhere to IEEE standards for floating-point op-
+erations, nor do they give the information necessary to establish
+bounds on error for these operations ... ''
+
+and ``...Our goal is to determine the error bounds on floating-point op-
+eration results for quickly evolving graphics systems. We have cre-
+ated a tool to measure the error for four basic floating-point opera-
+tions: addition, subtraction, multiplication and division.''
+
+The implementation is only for windows and uses glut and glew and things.
+Implement our own version?
+
+\section{A floating-point technique for extending the available precision  \cite{dekker1971afloating}}
+
+This is Dekker's formalisation of the Fast2Sum algorithm originally implemented by Kahn.
+
+\begin{align*}
+       z &= \text{RN}(x + y) \\
+       w &= \text{RN}(z - x) \\
+       zz &= \text{RN}(y - w) \\
+       \implies z + zz &= x + y
+\end{align*}
+
+There is a version for multiplication.
+
+I'm still not quite sure when this is useful. I haven't been able to find an example for $x$ and $y$ where $x + y \neq \text{Fast2Sum}(x, y)$.
+
+\section{Handbook of Floating-Point Arithmetic \cite{HFP}}
+
+This book is amazingly useful and pretty much says everything there is to know about Floating Point numbers.
+It is much easier to read than Goldberg or Priest's papers.
+
+I'm going to start working through it and compile their test programs.
+
+\subsection{A sequence that seems to converge to a wrong limit  - pgs 9-10, \cite{HFP}}
+
+\begin{align*}
+       u_n &= \left\{ \begin{array}{c} u_0 = 2 \\ u_1 = -4 \\ u_n = 111 - \frac{1130}{u_{n-1}} + \frac{3000}{u_{n-1}u_{n-2}}\end{array}\right.
+\end{align*}
+
+The limit of the series should be $6$ but when calculated with IEEE floats it is actually $100$
+The authors show that the limit is actually $100$ for different starting values, and the error in floating point arithmetic causes the series to go to that limit instead.
+
+\begin{figure}[H]
+       \centering
+       \includegraphics[width=0.8\textwidth]{figures/handbook1-1.pdf}
+       \caption{Output of Program 1.1 from \emph{Handbook of Floating-Point Arithmetic}\cite{HFP} for various IEEE types}
+       \label{HFP-1-1}
+\end{figure}
+
+\subsection{Mr Gullible and the Chaotic Bank Society pgs 10-11 \cite{HFP}}
+
+This is an example of a sequence involving $e$. Since $e$ cannot be represented exactly with FP, even though the sequence should go to $0$ for $a_0 = e - 1$, the representation of $a_0 \neq e - 1$ so the sequence goes to $\pm \infty$.
+
+To eliminate these types of problems we'd need an \emph{exact} representation of all real numbers.
+For \emph{any} FP representation, regardless of precision (a finite number of digits) there will be numbers that can't be represented exactly hence you could find a similar sequence that would explode.
+
+IE: The more precise the representation, the slower things go wrong, but they still go wrong, {\bf even with errorless operations}.
+
+
+\subsection{Rump's example pg 12 \cite {HFP}}
+
+This is an example where the calculation of a function $f(a,b)$ is not only totally wrong, it gives completely different results depending on the CPU. Despite the CPU conforming to IEEE.
+
+\section{Scalable Vector Graphics (SVG) 1.1 (Second Edition) \cite{svg2011-1.1}}
+
+Scalable Vector Graphics (SVG) is a XML language for describing two dimensional graphics. This document is \url{http://www.w3.org/TR/2011/REC-SVG11-20110816/}, the latest version of the standard at the time of writing.
+
+\subsubsection{Three types of object}
+\begin{enumerate}
+       \item Vector graphics shapes (paths)
+       \item Images (embedded bitmaps)
+       \item Text
+\end{enumerate}
+
+\subsubsection{Rendering Model and Precision}
+
+SVG uses the ``painter's model''. Paint is applied to regions of the page in an order, covering the paint below it according to rules for alpha blending.
+
+``Implementations of SVG are expected to behave as though they implement a rendering (or imaging) model cor-
+responding to the one described in this chapter. A real implementation is not required to implement the model in
+this way, but the result on any device supported by the implementation shall match that described by this model.''
+
+SVG uses {\bf single precision floats}. Unlike PDF and PostScript, the standard specifies this as a {\bf minimum} range from \verb/-3.4e+38F/ to \verb/+3.4e+38F/
+
+``It is recommended that higher precision floating point storage and computation be performed on operations
+such as coordinate system transformations to provide the best possible precision and to prevent round-off errors.''
+
+There is also a ``High Quality SVG Viewers'' requirement to use at least {\bf double} precision floats.
+
+\section{OpenVG Specification 1.1 \cite{rice2008openvg}}
+``OpenVG is an application programming interface (API) for hardware-accelerated two-
+dimensional vector and raster graphics developed under the auspices of the Khronos
+Group \url{www.khronos.org}.''
+
+Specifically, provides the same drawing functionality required by a high-performance SVG document viewer (SVG Tiny 1.2)
+TODO: Look at that $\neq$ SVG 1.1
+
+It is designed to be similar to OpenGL.
+
+\subsubsection{Precision}
+``All floating-point values are specified in standard IEEE 754 format. However,
+implementations may clamp extremely large or small values to a restricted
+range, and internal processing may be performed with lesser precision. At least
+16 bits of mantissa, 6 bits of exponent, and a sign bit must be present, allowing
+values from $\pm 2^{-30}$ to $\pm2^{31}$ to be represented with a fractional precision of at least 1
+in $2^{16}$.''
+
+IEEE but with a non standard number of bits.
+
+Presumably the decreased precision is for increased efficiency the standard states that example applications are for ebooks.
+
+
+\section{Document Object Model --- pugixml 1.4 manual \cite{pugixmlDOM}}
+
+Pugixml is a C++ library for parsing XML documents\cite{kapoulkine2014pugixml}. XML is based on the Document Object Model (DOM); this is explained pretty well by the pugixml manual.
+
+The document is the root node of the tree. Each child node has a type. These may
+\begin{itemize}
+       \item Have attributes
+       \item Have child nodes of their own
+       \item Contain data 
+\end{itemize}
+
+In the case of HTML/SVG an XML parser within the browser/viewer creates the DOM tree from the XML (plain text) and then interprets that tree to produce the objects that will be rendered.
+
+Example of XML $\to$ DOM tree given at\cite{pugixmlDOM}.
+Example of XML parsing using pugixml is in \shell{code/src/tests/xml.cpp}
+
+
+\begin{figure}[H]
+       \centering
+\begin{lstlisting}[language=XML,basicstyle=\ttfamily]
+<?xml version="1.0"?>
+<mesh name="mesh_root">
+    <!-- here is a mesh node -->
+    some text
+    <![CDATA[someothertext]]>
+    some more text
+    <node attr1="value1" attr2="value2" />
+    <node attr1="value2">
+        <innernode/>
+    </node>
+</mesh>
+<?include somedata?>
+\end{lstlisting}
+
+       \includegraphics[width=0.6\textwidth]{references/pugixmlDOM-dom_tree.png}
+       \caption{Tree representation of the above listing \cite{pugixmlDOM}}
+\end{figure}
+
+\section{An Algorithm For Shading of Regions on Vector Display Devices \cite{brassel1979analgorithm}}
+
+All modern display devices are raster based and therefore this paper is mainly of historical interest. It provides some references for shading on a raster display.
+
+The algorithm described will shade an arbitrary simply-connected polygon using one or two sets of parallel lines.
+
+The ``traditional'' method is:
+\begin{enumerate}
+       \item Start with a $N$ vertex polygon, rotate coords by the shading angle
+       \item Determine a bounding rectangle
+       \item For $M$ equally spaced parallel lines, compute the intersections with the boundaries of the polygon
+       \item Rotate coordinates back
+       \item Render the $M$ lines
+\end{enumerate}
+
+This is pretty much exactly how an artist would shade a pencil drawing. It is $O(M\times N)$.
+
+The algorithm in this paper does:
+\begin{enumerate}
+       \item Rotate polygon coords by shading angle
+       \item Subdivide the polygon into trapezoids (special case triangle)
+       \item Shade the trapezoids independently
+       \item Rotate it all back
+\end{enumerate}
+It is more complicated than it seems. The subdivision requires a sort to be performed on the vertices of the polygon based on their rotated $x$ and $y$ coordinates.
+
+\section{An Algorithm For Filling Regions on Graphics Display Devices \cite{lane1983analgorithm}}
+
+This gives an algorithm for for polygons (which may have ``holes'').
+It requires the ability to ``subtract'' fill from a region; this is (as far as I can tell) difficult for vector graphics devices but simple on raster graphics devices, so the paper claims it is oriented to the raster graphics devices.
+
+If the polygon is defined by $(x_i, y_i)$ then this algorithm iterates from $i = 2$ and alternates between filling and erasing the triangles $[(x_i, y_i), (x_{i+1}, y_{i+1}), (x_1, y_1)]$. It requires no sorting of the points.
+
+The paper provides a proof that the algorithm is correct and is ``optimal in the number of pixel updates required for convex polygons''.
+In the conclusion it is noted that trapezoids could be used from a fixed line and edge of the polygon, but this is not pixel optimal.
+
+This paper doesn't have a very high citation count but it is cited by the NVIDIA article \cite{kilgard2012gpu}.
+Apparently someone else adapted this algorithm for use with the stencil buffer.
+
+\section{GPU-accelerated path rendering \cite{kilgard2012gpu, kilgard300programming}}
+
+Vector graphics on the GPU; an NVIDIA extension. \cite{kilgard300programming} is the API.
+
+Motivations:
+\begin{itemize}
+       \item The focus has been on 3D acceleration in GPUs; most path rendering is done by the CPU.
+       \item Touch devices allow the screen to be transformed rapidly; CPU rastering of the path becomes inefficient
+       \begin{itemize}
+               \item The source of the ugly pixelated effects on a smartphone when scaling?
+       \end{itemize}
+       \item Especially when combined with increased resolution of these devices
+       \item Standards such as HTML5, SVG, etc, expose path rendering
+       \item Javascript is getting fast enough that we can't blame it anymore (the path rendering is the bottleneck not the JS)
+       \item GPU is more power efficient than the CPU
+\end{itemize}
+
+Results show the extension is faster than almost every renderer it was compared with for almost every test image.
+
+Comparisons to other attempts:
+\begin{itemize}
+       \item Cairo and Glitz \cite{nilsson2004glitz} (abandoned)
+\      \item Direct2D from Microsoft uses CPU to tesselate trapezoids and then renders these on the GPU
+       \item Skia in Android/Chrome uses CPU but now has Ganesh which is also hybrid CPU/GPU
+       \item Khronos Group created OpenVG\cite{rice2008openvg} with several companies creating hardware units to implement the standard. Performance is not as good as ``what we report''
+\end{itemize}
+
+
+\chapter{General Notes}
+
+\section{The DOM Model}
+
+A document is modelled as a tree. The root of the tree is the document. This has nodes of varying types. Some nodes may have children, attributes, and data.
+
+\section{Floating-Point Numbers\cite{HFP,goldberg1991whatevery,goldberg1992thedesign,priest1991algorithms}}
+
+A set of FP numbers is characterised by:
+\begin{enumerate}
+       \item Radix (base) $\beta \geq 2$
+       \item Precision %$p \req 2$ ``number of sig digits'' 
+       \item Two ``extremal`` exponents $e_min < 0 < e_max$ (generally, don't have to have the $0$ in there) 
+\end{enumerate}
+
+Numbers are represented by {\bf integers}: $(M, e)$ such that $x = M \times \beta^{e - p + 1}$
+
+Require: $|M| \leq \beta^{p}-1$ and $e_min \leq e \leq e_max$.
+
+Representations are not unique; set of equivelant representations is a cohort.
+
+$\beta^{e-p+1}$ is the quantum, $e-p+1$ is the quantum exponent.
+
+Alternate represetnation: $(s, m, e)$ such that $x = (-1)^s \times m \times \beta^{e}$
+$m$ is the significand, mantissa, or fractional part. Depending on what you read.
+
+
+
+
+
+\section{Rounding Errors}
+
+
+They happen. There is ULP and I don't mean a political party.
+
+TODO: Probably say something more insightful. Other than "here is a graph that shows errors and we blame rounding".
+
+\subsection{Results of calculatepi}
+
+We can calculate pi by numerically solving the integral:
+\begin{align*}
+       \int_0^1 \left(\frac{4}{1+x^2}\right) dx &= \pi
+\end{align*}
+
+Results with Simpson Method:
+\begin{figure}[H]
+       \centering
+       \includegraphics[width=0.8\textwidth]{figures/calculatepi.pdf}
+       \caption{Example of accumulated rounding errors in a numerical calculation}
+\end{figure}
+
+Tests with \verb/calculatepi/ show it's not quite as simple as just blindly replacing all your additions with Fast2Sum from Dekker\cite{dekker1971afloating}.
+ie: The graph looks exactly the same for single precision. \verb/calculatepi/ obviously also has multiplication ops in it which I didn't change. Will look at after sleep maybe.
+
+
  \pagebreak
  \bibliographystyle{unsrt}
  \bibliography{papers}
  \pagebreak
  \bibliographystyle{unsrt}
  \bibliography{papers}