Progress*

[ipdf/sam.git] / chapters / Background.tex
diff --git a/chapters/Background.tex b/chapters/Background.tex

index 700a8b9..dbb6f26 100644 (file)
--- a/chapters/Background.tex
+++ b/chapters/Background.tex
@@ -1,113 +1,65 @@
  \chapter{Literature Review}\label{Background}
  
-This chapter will \rephrase{review the literature. It will also include some figures created by us from our test programs to aid with conceptual understanding of the literature.} A paper by paper summary of the literature is also available at: \\ \url{http://szmoore.net/ipdf/documents/LiteratureNotes.pdf}.
+\rephrase{0. Here is a brilliant summary of the sections below}
  
-\rephrase{TODO: If I want to link to the Paper by Paper summary it will need a bit of rewriting}.
+This chapter provides an overview of relevant literature. The areas of interest can be broadly grouped into two largely separate categories; Documents and Number Representations.
  
-\rephrase{TODO: Actually (re)write this entire chapter}.
+The first half of this chapter will be devoted to documents themselves, including: the representation and displaying of graphics primitives\cite{computergraphics2}, and how collections of these primitives are represented in document formats, focusing on well known standards currently in use\cite{plrm, pdfref17, svg2011-1.1}.
  
-\rephrase{TODO: Un dot point ify}
+We will find that although there has been a great deal of research into the rendering, storing, editing, manipulation, and extension of document formats, these widely used document standards are content to specify at best a single precision IEEE-754 floating point number representations. 
  
-\rephrase{TODO: Citations}
+The research on arbitrary precision arithmetic applied to documents is very sparse; however arbitrary precision arithmetic itself is a very active field of research. Therefore, the second half of this chapter will be devoted to considering the IEEE-754 standard, its advantages and limitations, and possible alternative number representations to allow for arbitrary precision arithmetic.
  
-\rephrase{TODO: Make less terrible}
+In Chapter \ref{Progress}, we will discuss our findings so far with regards to arbitrary precision arithmetic applied to document formats, and expand upon the goals outlined in Chapture \ref{Proposal}.
  
  
-\rephrase{TODO: Reconsider sections (do I really want to make this go down to} \verb/\subsubsection/?)
+\pagebreak
  
-\rephrase{TODO: :-(}
+\section{Raster and Vector Images}\label{Raster and Vector Images}
+\input{chapters/Background_Raster-vs-Vector}
  
+\section{Rasterising Vector Images}\label{Rasterising Vector Images}
  
+Throughout Section \ref{vector-vs-raster-graphics} we were careful to refer to ``modern'' display devices, which are raster based. It is of some historical significance that vector display devices were popular during the 70s and 80s, and papers oriented towards drawing on these devices can be found\cite{brassel1979analgorithm}. Whilst curves can be drawn at high resolution on vector displays, a major disadvantage was shading; by the early 90s the vast majority of computer displays were raster based\cite{computergraphics2}.
  
-\section{Vector Graphics vs Raster Graphics}
+Hearn and Baker's textbook ``Computer Graphics''\cite{computergraphics2} gives a comprehensive overview of graphics from physical display technologies through fundamental drawing algorithms to popular graphics APIs. This section will examine algorithms for drawing two dimensional geometric primitives on raster displays as discussed in ``Computer Graphics'' and the relevant literature. Informal tutorials are abundant on the internet\cite{elias2000graphics}.
  
-\newlength\imageheight
-\newlength\imagewidth
-\settoheight\imageheight{\includegraphics{figures/fox-raster.png}}
-\settowidth\imagewidth{\includegraphics{figures/fox-raster.png}}
+\subsection{Straight Lines}\label{Straight Lines}
+\input{chapters/Background_Lines}
  
-%Height: \the\imageheight
-%Width: \the\imagewidth
+\subsection{Spline Curves}\label{Spline Curves}
  
-\rephrase{TODO: Distinguish between Raster Formats and the Rasterisation of an image (which may or may not be in a raster format)}
+Splines are continuous curves formed from piecewise polynomial segments. A polynomial of $n$th degree is defined by $n$ constants $\{a_0, a_1, ... a_n\}$ and:
+\begin{align}
+       y(x) &= \displaystyle\sum_{k=0}^n a_k x^k
+\end{align}
  
  
-\subsubsection{Raster Graphics}
-\begin{itemize}
-       \item Bitmap --- array of colour information for pixels
-       \item Exact pixels in a similar format to how they would appear on a (modern) display device.
-       \begin{itemize}
-               \item Also similar to how they would be stored by a camera or scanner
-               \item \rephrase{Is it misleading to say 2D array? Pixels are actually stored in a 1D array, but conceptually it's nicer to say 2D}
-               \item \rephrase{For that matter, should it described as 3D (3rd dimension = colour)?}
-       \end{itemize}
-       \item Lowest level representation of a document
-       \item Issues with scaling; values of extra pixels must be calculated
-       \item Not convenient to edit; ill suited to text
-\end{itemize}
-
-
-\subsubsection{Vector Graphics}
-\begin{itemize}
-       \item Stores relative position of primitives - scales better
-       \item In particular, \emph{edges} of lines can be zoomed without becomming jagged; sometimes (somewhat misleadingly) described as ``infinitely sharp''
-       \item Vector Graphics must be rasterised before being drawn on most display devices.
-       \item Still can't scale forever due to use of fixed size floats
-\end{itemize}
-
-\subsubsection{Resolution and Raster Graphics}
-\begin{itemize}
-       \item DPI = dots (pixels) per inch differs per display device - a rastered image looks different on different display devices
-       \item PostScript/PDF use 72 points per inch; this means a rasterised image will look the same in all pdf viewers regardless of the display.
-       \item Tex uses 72.27 points per inch (?)
-       \item The vector image was rastered at 96 points per inch
-       \item Hence, have to scale by 72.27/96 = 0.7528125 to get the vector and rastered version to look exactly the same in the pdf
-\end{itemize}
-
+A straight line is simply a polynomial of $0$th degree. Splines may be rasterised by sampling of $y(x)$ at a number of points $x_i$ and rendering straight lines between  $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ as discussed in Section \ref{Straight Lines}. More direct algorithms for drawing splines based upon Brasenham and Wu's algorithms also exist\cite{citationneeded}.
  
+There are many different ways to define a spline. One approach is to specify ``knots'' on the spline and solve for the cooefficients to generate a cubic spline ($n = 3$) passing through the points. Beziers are a popular spline which can be created in GUI based graphics editors using several ``control points'' which themselves are not part of the curve.
  
-\begin{figure}[H]
-       \centering
+\subsubsection{Bezier Curves}
+\input{chapters/Background_Bezier}
  
-       \includegraphics[scale=0.7528125]{figures/fox-vector.pdf}
-       \includegraphics[scale=0.7528125]{figures/fox-raster.png}
-       \includegraphics[scale=0.7528125, viewport=210 85 280 150,clip, width=0.45\textwidth]{figures/fox-vector.pdf}
-       \includegraphics[scale=0.7528125, viewport=0 85 70 150,clip, width=0.45\textwidth]{figures/fox-raster.png}
-       \caption{Scaling of vector and Raster Graphics}\label{vector-vs-raster}
-\end{figure}
+\subsection{Shading}
  
+Algorithms for shading on vector displays involved drawing equally spaced lines within a region; this is limited both in the complexity of shading and the performance required to compute the lines\cite{brassel1979analgorithm}.
  
+On raster displays, shading is typically based upon Lane's algorithm of 1983\cite{lane1983analgorithm} which is implemented in the GPU  \cite{kilgard2012gpu}
  
-Figure \ref{vector-vs-raster} shows a vector image (left) which has been rasterised (right). At the original scale the two foxes should appear to be mirror images\footnote{If I've worked out the scaling to account for dpi differences between inkscape and latex/pdf correctly}. When the scale is increased, the edges of the vector image remain sharp, whilst the raster image begins to appear jagged. PDF viewers will typically use antialiasing to smooth the edges of a scaled bitmap, causing the image to appear blurred.\footnote{In the Atril Document Viewer 1.6.0 this image will only be antialiased at zoom levels $\leq 125\%$}.
+\rephrase{6. Sort of starts here... or at least background does}
  
-Various ways to end this section:
-\begin{enumerate}
-       \item \rephrase{It should be obvious that documents containing text must use the vector graphics format, and so the remainder of this chapter will concentrate on the latter}.
-       \item \rephrase{As can be seen in Figure \ref{fox}, if we were to decide to pursue ``infinite precision'' in raster graphics we would be shooting ourselves in both feet and then the face before we even started. The rest of this chapter will concentrate on vector graphics.}
-       \item \rephrase{You can't have infinite precision in raster graphics by definition, therefore we no longer care about them in this report.}
-       \item \rephrase{This report being in a vector format is a clue that we only care about vector formats}.
-\end{enumerate}
+\subsection{Rendering Vector Graphics on the GPU}
  
-\section{Primitives in Vector Graphics Formats (and how they are Rendered)}
+Traditionally, vector graphics have been rasterized by the CPU before being sent to the GPU for drawing\cite{kilgard2012gpu}. Lots of people would like to change this \cite{worth2003xr, loop2007rendering, rice2008openvg, kilgard2012gpu, green2007improved} ... \rephrase{All of these are things David found except kilgard which I thought I found and then realised David already had it :S}
  
-\subsection{Bezier Curves}
-\rephrase{I did an ipython notebook on this in February, but I forgot all of it}
-
-\subsection{Text}
-Text is just Bezier Curves
-
-\subsection{Shapes}
-Shapes are just bezier curves joined together.
-
-\subsection{Other Things}
-We don't really care about other things (ie: Colour gradients etc) in this report.
+\rephrase{2. Here are the ways documents are structured ... we got here eventually}
  
  \section{Document Representations}
  
  \rephrase{The file format can be either human readable\footnote{For some definition of human and some definition of readable} or binary\footnote{So, our viewer is basically a DOM style but stored in a binary format}. Can also be compressed or not. Here we are interested in how the document is interpreted or traversed in order to produce graphics output.}
  
-
-
  \subsection{Interpreted Model}
  
  \rephrase{Did I just invent that terminology or did I read it in a paper? Is there actually existing terminology for this that sounds similar enough to ``Document Object Model'' for me to compare them side by side?}
@@ -145,6 +97,7 @@ We don't really care about other things (ie: Colour gradients etc) in this repor
         \item XML (SGML) is the standard language used to represent documents in the DOM
         \item XML is plain text
         \item SVG is a standard for a vector graphics language conforming to XML (ie: a DOM format)
+       \item CSS style sheets represent more complicated styling on objects in the DOM
  \end{itemize}
  
  \subsection{Blurring the Line --- Javascript}
@@ -167,35 +120,42 @@ We don't really care about other things (ie: Colour gradients etc) in this repor
         \end{itemize}
  \end{itemize}
  
+\subsection{Why do we still use static PDFs}
  
-
-
-\section{Precision Limitations of Modern Documents}
-
-\rephrase{All this is very interesting and provides important context, but it is not actually directly related to the problem of infinite precision which we are going to try and solve}.
-
-\subsection{Limitations Imposed By Standards}
+Despite their limitations, we still use static, boring old PDFs. Particularly in scientific communication.
  \begin{itemize}
-       \item Implementations of PostScript and PDF must by definition restrict themselves to IEEE binary32 ``single precision'' floating point number representations in order to conform to the standards\cite{plrm, pdfref17}.
-       \item Implementations of SVG are by definition required to use IEEE binary32 as a {\bf minimum}. ``High Quality'' SVG viewers are required to use at least IEEE binary64.\cite{svg2011-1.1}
+       \item They are portable; you can write an amazing document in Mathematica/Matlab but it 
+       \item Scientific journals would need to adapt to other formats and this is not worth the effort
+       \item No network connection is required to view a PDF (although DRM might change this?)
+       \item All rescources are stored in a single file; a website is stored accross many separate files (call this a ``distributed'' document format?)
+       \item You can create PDFs easily using desktop processing WYSIWYG editors; WYSIWYG editors for web based documents are worthless due to the more complex content
+       \item Until Javascript becomes part of the PDF standard, including Javascript in PDF documents will not become widespread
+       \item Once you complicate a PDF by adding Javascript, it becomes more complicated to create; it is simply easier to use a series of static figures than to embed a shader in your document. Even for people that know WebGL.
  \end{itemize}
  
+\rephrase{3. Here are the ways document standards specify precision (or don't)}
+
+\section{Precision in Modern Document Formats}
+
+\rephrase{All the above is very interesting and provides important context, but it is not actually directly related to the problem of infinite precision which we are going to try and solve. Sorry to make you read it all.}
  
-\subsection{Limitations Imposed By Graphics APIs and/or GPUs}
  
-It's not really the standard's fault (although they could specify double); they specify IEEE because underlying hardware must use IEEE.
  
  \begin{itemize}
-       \item Internal representations are GPU dependent\cite{hillesland2004paranoia}
-       \item OpenGL standards specify: binary16, binary32, binary64
-       \item OpenVG aims to become a standard API for SVG viewers but it uses binary32 and may be {\bf worse} internally\cite{rice2008openvg}
+       \item Implementations of PostScript and PDF must by definition restrict themselves to IEEE binary32 ``single precision''\footnote{The original IEEE-754 defined single, double and extended precisions; in the revision these were renamed to binary32, binary64 and binary128 to explicitly state the base and number of bits}
+ floating point number representations in order to conform to the standards\cite{plrm, pdfref17}.
+       \item Implementations of SVG are by definition required to use IEEE binary32 as a {\bf minimum}. ``High Quality'' SVG viewers are required to use at least IEEE binary64.\cite{svg2011-1.1}
+       \item Numerical computation packages such as Mathematica and Maple use arbitrary precision floats
+       \begin{itemize}
+               \item Mathematica is not open source which is an issue when publishing scientific research (because people who do not fork out money for Mathematica cannot verify results)
+               \item What about Maple? \cite{HFP} and \cite{fousse2007mpfr} both mention it being buggy. 
+               \item Octave and Matlab use fixed precision doubles
+       \end{itemize}
  \end{itemize}
  
-\subsection{Limitations Imposed By CPU}
-
-Even if we don't use the GPU, CPU's are restricted in their representation of floating point numbers by the IEEE standard.
+The use of IEEE binary32 floats in the PostScript and PDF standards is not surprising if we consider that these documents are oriented towards representing static pages. They don't actually need higher precision to do this; 32 bits is more than sufficient for A4 paper.
  
-\rephrase{AND THUS WE FINALLY GET TO THE POINT}
+\rephrase{4. Here is IEEE-754 which is what these standards use}
  
  \section{Representation of Numbers}
  
@@ -215,32 +175,54 @@ $B = 2$, although IEEE also defines decimal representations for $B = 10$ --- the
  
  \rephrase{Aside: Are decimal representations for a document format eg: CAD also useful because you can then use metric coordinate systems?}
  
+
  \subsubsection{Precision}
  
  The floats map an infinite set of real numbers onto a discrete set of representations.
  
+
+
  \rephrase{Figure: 8 bit ``minifloats'' (all 255 of them) clearly showing the ``precision vs range'' issue}
  
  The most a result can be rounded in conversion to a floating point number is the units in last place; $m_{N} \times B^{e}$.
  
  \rephrase{Even though that paper that claims double is the best you will ever need because the error can be as much as the size of a bacterium relative to the distance to the moon}\cite{} \rephrase{there are many cases where increased number of bits will not save you}.\cite{HFP}
  
-\subsection{Examples of Precision Related Errors in Floating Point Arithmetic}
  
-\subsection{Relate This to the Sorts of Maths Done By Document Formats}
+\rephrase{5. Here are limitations of IEEE-754 floating point numbers on compatible hardware}
+
+\subsection{Limitations Imposed By CPU}
+
+CPU's are restricted in their representation of floating point numbers by the IEEE standard.
+
+
+
+\subsection{Limitations Imposed By Graphics APIs and/or GPUs}
+
+Traditionally algorithms for drawing vector graphics are performed on the CPU; the image is rasterised and then sent to the GPU for rendering\cite{}. Recently there has been a great deal of literature relating to implementation of algorithms such as bezier curve rendering\cite{} or shading\cite{} on the GPU. As it seems the trend is to move towards GPU 
  
-\subsection{Techniques for Arbitrary Precision Arithmetic}
+\rephrase{6. Here are ways GPU might not be IEEE-754 --- This goes *somewhere* in here but not sure yet}
  
  \begin{itemize}
-       \item Fast2SUM for summation (and multiplication).
-       \item Guard digits.
-       \item Other techniques
-       \item Hardware techniques that improve speed (which may be beneficial because you can get away with higher precision in hardware)
-       \item Anything you can do in hardware you can do in software but it will be slower and have more segmentation faults
+       \item Internal representations are GPU dependent and may not match IEEE\cite{hillesland2004paranoia}
+       \item OpenGL standards specify: binary16, binary32, binary64
+       \item OpenVG aims to become a standard API for SVG viewers but the API only uses binary32 and hardware implementations may use less than this internally\cite{rice2008openvg}
+       \item It seems that IEEE has not been entirely successful; although all modern CPUs and GPUs are able to read and write IEEE floating point types, many do not conform to the IEEE standard in how they represent floating point numbers internally. 
  \end{itemize}
  
+
+
+\rephrase{7. Sod all that, let's just use an arbitrary precision library (AND THUS WE FINALLY GET TO THE POINT)}
+
  \subsection{Alternate Number Representations}
  
  \rephrase{They exist\cite{HFP}}.
  
+Do it all using MFPR\cite{}, she'll be right.
+
+\rephrase{8. Here is a brilliant summary of sections 7- above}
+
+Dear reader, thankyou for your persistance in reading this mangled excuse for a Literature Review.
+Hopefully we have brought together the radically different areas of interest together in some sort of coherant fashion.
+In the next chapter we will talk about how we have succeeded in rendering a rectangle. It will be fun. I am looking forward to it.