X-Git-Url: https://git.ucc.asn.au/?p=ipdf%2Fdocuments.git;a=blobdiff_plain;f=LiteratureNotes.tex;h=0e423ca18a466348619618ca1e756123724fb3d3;hp=fd86b5a86b5f25a4e9b3eae8ab60b5649b716122;hb=e51aef2036ed3b6c219029a19b485ca0828b4828;hpb=cb7ac26fb36b428b17d76cb220fa1b4edb764abb diff --git a/LiteratureNotes.tex b/LiteratureNotes.tex index fd86b5a..0e423ca 100644 --- a/LiteratureNotes.tex +++ b/LiteratureNotes.tex @@ -1,11 +1,12 @@ -\documentclass[8pt]{extarticle} +\documentclass[8pt]{extreport} \usepackage{graphicx} \usepackage{caption} \usepackage{amsmath} % needed for math align \usepackage{bm} % needed for maths bold face \usepackage{graphicx} % needed for including graphics e.g. EPS, PS \usepackage{fancyhdr} % needed for header - +%\usepackage{epstopdf} % Converts eps to pdf before including. Do it manually instead. +\usepackage{float} \usepackage{hyperref} \topmargin -1.5cm % read Lamport p.163 @@ -45,9 +46,8 @@ \lstset{showstringspaces=false} \lstset{basicstyle=\small} - - - +\newcommand{\shell}[1]{\texttt{#1}} +\newcommand{\code}[1]{\texttt{#1}} \begin{document} @@ -69,21 +69,31 @@ \tableofcontents -\section{Postscript Language Reference Manual\cite{plrm}} +\chapter{Literature Summaries} + +\section{Postscript Language Reference Manual \cite{plrm}} Adobe's official reference manual for PostScript. It is big. -\section{Portable Document Format Reference Manual\cite{pdfref17}} +\section{Portable Document Format Reference Manual \cite{pdfref17}} Adobe's official reference for PDF. It is also big. +\section{IEEE Standard for Floating-Point Arithmetic \cite{ieee2008-754}} + +The IEEE (revised) 754 standard. + +It is also big. + + + \pagebreak -\section{Portable Document Format (PDF) --- Finally...\cite{cheng2002portable}} +\section{Portable Document Format (PDF) --- Finally... \cite{cheng2002portable}} This is not spectacularly useful, is basically an advertisement for Adobe software. 
@@ -116,7 +126,7 @@ This is not spectacularly useful, is basically an advertisement for Adobe softwa \end{itemize} \pagebreak -\section{Pixels or Perish \cite{hayes2012pixels}} +\section{Pixels or Perish \cite{hayes2012pixels}} ``The art of scientific illustration will have to adapt to the new age of online publishing'' And therefore, JavaScript libraries ($\text{D}^3$) are the future. @@ -181,16 +191,439 @@ This paper uses Metaphors a lot. I never met a phor that didn't over extend itse \end{itemize} -\section{Embedding and Publishing Interactive, 3D Figures in PDF Files\cite{barnes2013embedding}} +\section{Embedding and Publishing Interactive, 3D Figures in PDF Files \cite{barnes2013embedding}} \begin{itemize} \item Linkes well with \cite{hayes2012pixels}; I heard you liked figures so I put a figure in your PDF \item Title pretty much summarises it; similar to \cite{hayes2012pixels} except these guys actually did something practical \end{itemize} +\section{27 Bits are not enough for 8 digit accuracy \cite{goldberg1967twentyseven}} + +Proves mathematically that, because of rounding errors, you need at least $q$ bits to represent $p$ decimal digits, where $10^p < 2^{q-1}$ + +\begin{itemize} + \item Eg: For 8 decimal digits, since $10^8 < 2^{27}$ one would expect 27 binary digits to be enough; but the bound requires $2^{q-1} > 10^8$, and $2^{26} < 10^8$, so $q \geq 28$ + \item But: the integer part also uses up bits (regardless of fixed or floating point representation) + \item Trade-off between precision and range + \begin{itemize} + \item 9000000.0 $\to$ 9999999.9 needs 24 bits for the integer part, since $2^{23} = 8388608 < 9000000$ + \end{itemize} + \item Floating point zero = smallest possible machine exponent + \item Floating point representation: + \begin{align*} + y &= 0.y_1 y_2 \ldots y_q \times 2^{n} + \end{align*} + \item Can eliminate a bit by considering whether $n = -e$ for $-e$ the smallest machine exponent (???)
+ \begin{itemize} + \item Get very small numbers with the same precision + \item Get large numbers with the extra bit of precision + \end{itemize} +\end{itemize} + +\section{What every computer scientist should know about floating-point arithmetic \cite{goldberg1991whatevery}} + +\begin{itemize} + \item Book: \emph{Floating Point Computation} by Pat Sterbenz (out of print... in 1991) + \item IEEE floating point standard becoming popular (IEEE 754 was introduced in 1985 and IEEE 854 in 1987; this is 1991) + \begin{itemize} + \item As well as structure, defines the algorithms for addition, multiplication, division and square root + \item Makes things portable because results of operations are the same on all machines (following the standard) + \item Alternatives to floating point: Floating slash and Signed Logarithm (TODO: Look at these, although they will probably not be useful) + + \end{itemize} + \item Base $\beta$ and precision $p$ (number of digits to represent with) - powers of the base can be represented exactly. + \item Smallest and largest exponents $e_{min}$ and $e_{max}$ + \item Need bits for exponent and fraction, plus one for sign + \item ``Floating point number'' is one that can be represented exactly. + \item Representations are not unique! $0.01 \times 10^1 = 1.00 \times 10^{-1}$. Nonzero leading digit $\implies$ ``normalised'' + \item Requiring the representation to be normalised makes it unique, {\bf but means it is impossible to represent zero}. + \begin{itemize} + \item Represent zero as $1 \times \beta^{e_{min}-1}$ - this reserves one exponent value + \end{itemize} + \item {\bf Rounding Error} + \begin{itemize} + \item ``Units in the last place'' eg: 0.0314159 compared to 0.0314 has ulp error of 0.159 + \item If a computed result is the floating point number nearest to the exact result, it can still be in error by as much as 1/2 ulp + \item Relative error corresponding to 1/2 ulp can vary by a factor of $\beta$ (``wobble''). 
Written in terms of $\epsilon$ + \item Maths $\implies$ {\bf Relative error (when rounding to nearest) is always bounded by $\epsilon = (\beta/2)\beta^{-p}$} + \item Fixed relative error $\implies$ ulp can vary by a factor of $\beta$, and vice versa + \item Larger $\beta \implies$ larger errors + \end{itemize} + \item {\bf Guard Digits} + \begin{itemize} + \item In subtraction: Could compute exact difference and then round; this is expensive + \item Keep fixed number of digits but shift operand right; discard precision. Leads to relative error as large as $\beta - 1$ + \item Guard digit: Add one extra digit before truncating. Leads to relative error of less than $2\epsilon$. This also applies to addition + \end{itemize} + \item {\bf Catastrophic Cancellation} - Subtracting nearly equal operands that are already subject to rounding errors (eg: results of multiplications, as in $b^2 - 4ac$) + \item {\bf Benign Cancellation} - Subtracting exactly known quantities. With a guard digit the error is $< 2\epsilon$ + \item Rearrange formula to avoid catastrophic cancellation + \item Historical interest only - speculation on why IBM used $\beta = 16$ for the System/370 - increased range? Avoids shifting + \item Precision: IEEE defines extended precision (a lower bound only) + \item Discussion of the IEEE standard for operations (TODO: Go over in more detail) + \item NaNs allow computation to continue after invalid operations (eg: $0/0$), infinities after overflow; denormalised numbers provide gradual underflow + \item ``Incidentally, some people think that the solution to such anomalies is never to compare floating-point numbers for equality but instead to consider them equal if they are within some error bound E. This is hardly a cure all, because it raises as many questions as it answers.'' - On equality of floating point numbers + +\end{itemize} + + +%%%% +% David's Stuff +%%%% +\section{Compositing Digital Images \cite{porter1984compositing}} + + + +Porter and Duff's classic paper ``Compositing Digital Images'' lays the +foundation for digital compositing today.
By providing an ``alpha channel,'' +images of arbitrary shapes (including images with soft edges or sub-pixel coverage +information) can be overlaid digitally, allowing objects to be +rasterized separately without a loss in quality. + +Pixels in digital images are usually represented as 3-tuples containing +(red component, green component, blue component). Nominally these values are in +the $[0, 1]$ range. In the Porter-Duff paper, pixels are stored as $(R,G,B,\alpha)$ +4-tuples, where alpha is the fractional coverage of each pixel. If the image +only covers half of a given pixel, for example, its alpha value would be 0.5. + +To improve compositing performance, albeit at a possible loss of precision in +some implementations, the red, green and blue channels are premultiplied by the +alpha channel. This also simplifies the resulting arithmetic by having the +colour channels and alpha channels use the same compositing equations. + +Several binary compositing operations are defined: +\begin{itemize} +\item over +\item in +\item out +\item atop +\item xor +\item plus +\end{itemize} + +The paper further provides some additional operations for implementing fades and +dissolves, as well as for changing the opacity of individual elements in a +scene. + +The method outlined in this paper is still the standard system for compositing +and is implemented almost exactly by modern graphics APIs such as \texttt{OpenGL}. It is +all but guaranteed that this is the method we will be using for compositing +document elements in our project. + +\section{Bresenham's Algorithm: Algorithm for computer control of a digital plotter \cite{bresenham1965algorithm}} +Bresenham's line drawing algorithm is a fast, high-quality line rasterization +algorithm which is still the basis for most (aliased) line drawing today. The +paper, while originally written to describe how to control a particular plotter, +is equally well suited to rasterizing lines for display on a pixel grid.
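The following is a minimal sketch of the integer form of the line algorithm, written in Python for brevity. \texttt{bresenham\_line} is our own illustrative name. This is the common all-octant textbook formulation, not a transcription of the plotter-oriented presentation in the paper. It follows the error-accumulation scheme this section describes:
\begin{itemize}
\item always step along the long axis;
\item step along the short axis once the accumulated error passes 0.5;
\item keep the error scaled by twice the long-axis length, so the loop needs only integer additions and comparisons.
\end{itemize}

```python
def bresenham_line(x0, y0, x1, y1):
    """Return the pixels on the line from (x0, y0) to (x1, y1), endpoints included.

    Illustrative sketch (hypothetical helper, not code from the paper).
    Works in all octants; endpoints must be integers.
    """
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 >= x0 else -1
    sy = 1 if y1 >= y0 else -1
    steep = dy > dx  # True if y is the long axis
    if steep:
        dx, dy = dy, dx  # dx is now the long-axis length, dy the short-axis length

    # err is (error at the next long-axis step - 0.5) scaled by 2*dx, so
    # "error exceeds 0.5" becomes "err > 0" and no fractions are needed.
    two_dx, two_dy = 2 * dx, 2 * dy
    err = two_dy - dx
    x, y = x0, y0
    points = []
    for _ in range(dx + 1):
        points.append((x, y))
        if err > 0:
            # Step along the short axis ("subtract 1" from the error).
            if steep:
                x += sx
            else:
                y += sy
            err -= two_dx
        # Always step along the long axis ("add short/long" to the error).
        if steep:
            y += sy
        else:
            x += sx
        err += two_dy
    return points
```

For example, \texttt{bresenham\_line(0, 0, 5, 2)} returns \texttt{[(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]}. These are the points of the exact line $y = 0.4x$ rounded to the nearest pixel.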
+ +Lines drawn with Bresenham's algorithm must begin and end at integer pixel +coordinates, though one can round or truncate the fractional part. In order to +avoid multiplication or division in the algorithm's inner loop, the error term is +scaled by twice the long-axis length, so that it can be kept as an integer. + +The algorithm works by scanning along the long axis of the line, moving along +the short axis when the error along that axis exceeds 0.5px. Because error +accumulates linearly, this can be achieved by simply adding the per-pixel +error (equal to the short-axis length divided by the long-axis length) until it exceeds 0.5, then incrementing +the position along the short axis and subtracting 1 from the error accumulator. + +As this requires nothing but addition, it is very fast, particularly on the +older CPUs used in Bresenham's time. Modern graphics systems will often use Wu's +line-drawing algorithm instead, as it produces antialiased lines, taking +sub-pixel coverage into account. Bresenham himself extended this algorithm to +produce Bresenham's circle algorithm. The principles behind the algorithm have +also been used to rasterize other shapes, including B\'{e}zier curves. + +\section{Quad Trees: A Data Structure for Retrieval on Composite Keys \cite{finkel1974quad}} + +This paper introduces the ``quadtree'' spatial data structure. The quadtree structure is +a search tree in which every node has four children representing the north-east, north-west, +south-east and south-west quadrants of its space. + +\section{Xr: Cross-device Rendering for Vector Graphics \cite{worth2003xr}} + +Xr (now known as Cairo) is an implementation of the PDF v1.4 rendering model, +independent of the PDF or PostScript file formats, and is now widely used +as a rendering API. In this paper, Worth and Packard describe the PDF v1.4 rendering +model, and their PostScript-derived API for it.
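The rendering model in this section transforms path coordinates by a Current Transformation Matrix. PostScript and PDF write such an affine matrix as six numbers \texttt{[a b c d e f]}, mapping a point $(x, y)$ to $(ax + cy + e,\; bx + dy + f)$. The Python sketch below illustrates this. The \texttt{Matrix} class and its \texttt{apply} and \texttt{then} methods are hypothetical names for illustration only, not part of the Xr/Cairo API.

```python
from typing import NamedTuple


class Matrix(NamedTuple):
    """A PostScript/PDF-style affine matrix [a b c d e f].

    Hypothetical illustration only; not the Xr/Cairo API.
    """
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def apply(self, x, y):
        """Map (x, y) to (a*x + c*y + e, b*x + d*y + f)."""
        return (self.a * x + self.c * y + self.e,
                self.b * x + self.d * y + self.f)

    def then(self, other):
        """Return the matrix equivalent to applying self first, then other."""
        return Matrix(
            self.a * other.a + self.b * other.c,
            self.a * other.b + self.b * other.d,
            self.c * other.a + self.d * other.c,
            self.c * other.b + self.d * other.d,
            self.e * other.a + self.f * other.c + other.e,
            self.e * other.b + self.f * other.d + other.f,
        )
```

Under this row-vector convention, PostScript's \texttt{concat} (which sets $\text{CTM}' = M \times \text{CTM}$) corresponds to \texttt{M.then(ctm)}. $M$ acts on user-space coordinates first, and the existing CTM then maps the result to device space.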
+ +The PDF v1.4 rendering model is derived from the original PostScript model, built around +a set of \emph{paths} (and other objects, such as raster images) each made up of lines +and B\'{e}zier curves, which are transformed by the ``Current Transformation Matrix.'' +Paths can be \emph{filled} in a number of ways, allowing for different handling of self-intersecting +paths, or can have their outlines \emph{stroked}. +Furthermore, paths can be painted with RGB colours and/or patterns derived from either +previously rendered objects or external raster images. +PDF v1.4 extends this to provide, amongst other features, support for layering paths and +objects using Porter-Duff compositing \cite{porter1984compositing}, giving each painted path +the option of having an $\alpha$ value and a choice of any of the Porter-Duff compositing +methods. + +The Cairo library approximates the rendering of some objects (particularly curved objects +such as splines) with a set of polygons. An \texttt{XrSetTolerance} function allows the user +of the library to set an upper bound on the approximation error in fractions of device pixels, +providing a trade-off between rendering quality and performance. The library developers found +that setting the tolerance to greater than $0.1$ device pixels resulted in errors visible to the +user. + +\section{Glitz: Hardware Accelerated Image Compositing using OpenGL \cite{nilsson2004glitz}} + +This paper describes the implementation of an \texttt{OpenGL}-based rendering backend for +the \texttt{Cairo} library. + +The paper describes how OpenGL's Porter-Duff compositing is well suited to the Cairo/PDF v1.4 +rendering model. Similarly, traditional OpenGL (pre-version 3.0 core) supports a matrix stack +of the same form as Cairo. + +The ``Glitz'' backend emulates support for tiled, non-power-of-two patterns/textures if +the hardware does not support them. + +Glitz can render both triangles and trapezoids (which are formed from pairs of triangles).
+However, it cannot guarantee that the rasterization is pixel-precise, as OpenGL does not provide +this consistently. + +Glitz also supports multi-sample anti-aliasing and convolution filters for raster image reads (implemented +with shaders). + +Performance was much improved over the software rasterization and over XRender-accelerated rendering +on all except nVidia hardware. However, nVidia's XRender implementation did slow down significantly when +some transformations were applied. + +%% Sam again + +\section{Boost Multiprecision Library \cite{boost_multiprecision}} + +\begin{itemize} + \item ``The Multiprecision Library provides integer, rational and floating-point types in C++ that have more range and precision than C++'s ordinary built-in types.'' + \item Specify number of digits for precision as a template argument. + \item Precision is fixed at compile time for such types... {\bf possible approach to project:} Use \verb/boost::multiprecision::mpf_float/ (GMP backend, precision set at run time) and increase the precision as more is required? +\end{itemize} + + +% Some hardware related sounding stuff... + +\section{A CMOS Floating Point Unit \cite{kelley1997acmos}} + +The paper describes the implementation of an FPU for PowerPC using a particular Hewlett Packard process (HP14B 0.5$\mu$m, 3M, 3.3V). +It implements a ``subset of the most commonly used double precision floating point instructions''. The unimplemented operations are compiled for the CPU. + +The paper gives a description of the architecture and design methods. +This appears to be an entry in a student design competition. + +Standard is IEEE 754, but the multiplier tree is a 64-bit tree instead of a 54-bit tree. +``The primary reason for implementing a larger tree is for future additions of SIMD [Single Instruction Multiple Data] instructions similar to Intel's MMX and Sun's VIS instructions''. + +HSPICE simulations were used to determine transistor sizing. + +Paper has a block diagram that sort of vaguely makes sense to me. +The rest requires more background knowledge.
+ +\section{Simply FPU \cite{filiatreault2003simply}} + +This is a webpage at one degree of separation from Wikipedia. + +It talks about FPU internals, but mostly focuses on the instruction sets. +It includes FPU assembly code examples (!) + +It is probably not that useful; I don't think we'll end up writing FPU assembly. + +The x87 FPU has 80-bit registers so it can support REAL4, REAL8 and REAL10 (single, double and extended precision). + + +\section{Floating Point Package User's Guide \cite{bishop2008floating}} + +This is a technical report describing floating point VHDL packages \url{http://www.vhdl.org/fphdl/vhdl.html} + +In theory I know VHDL (cough) so I am interested in looking at this further to see how FPU hardware works. +It might be getting a bit sidetracked from the ``document formats'' scope though. + +The report does talk briefly about the IEEE standard and normalised / denormalised numbers as well. + +See also: Java Optimized Processor \cite{jop} (it has a VHDL implementation of an FPU). + +\section{Low-Cost Microarchitectural Support for Improved Floating-Point Accuracy \cite{dieter2007lowcost}} + +Mentions how GPUs offer very good floating point performance but only for single precision floats. (NOTE: This statement seems to contradict \cite{hillesland2004paranoia}.) + +Has a diagram of a Floating Point adder. + +Talks about some magical technique called ``Native-pair Arithmetic'' that somehow makes 32-bit floating point accuracy ``competitive'' with 64-bit floating point numbers. + +\section{Accurate Floating Point Arithmetic through Hardware Error-Free Transformations \cite{kadric2013accurate}} + +From the abstract: ``This paper presents a hardware approach to performing +accurate floating point addition and multiplication using the idea of +error-free transformations. Specialized iterative algorithms are implemented +for computing arbitrarily accurate sums and dot products.'' + +The references for this look useful. + +It also mentions VHDL.
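A simple example of an error-free transformation for addition is Dekker's Fast2Sum \cite{dekker1971afloating}. Here it is sketched in Python, whose \texttt{float} is an IEEE double with round-to-nearest.
\begin{itemize}
\item The names \texttt{fast\_two\_sum} and \texttt{is\_error\_free} are hypothetical.
\item The operand swap is our addition, so that Fast2Sum's precondition $|x| \geq |y|$ always holds.
\end{itemize}
The point is that the algorithm returns a \emph{pair}. \texttt{z} is just the ordinary rounded sum (identical to plain \texttt{x + y}). \texttt{zz} is the rounding error that \texttt{z} lost, so \texttt{z + zz} equals $x + y$ exactly.

```python
from fractions import Fraction


def fast_two_sum(x, y):
    """Dekker's Fast2Sum: return (z, zz) with z = RN(x + y) and z + zz == x + y exactly.

    Hypothetical helper name; the swap below is added to enforce |x| >= |y|.
    """
    if abs(x) < abs(y):
        x, y = y, x  # precondition of Fast2Sum: |x| >= |y|
    z = x + y   # RN(x + y): the ordinary rounded sum
    w = z - x   # RN(z - x): the part of y that made it into z (computed exactly)
    zz = y - w  # RN(y - w): the part of y that was rounded away (computed exactly)
    return z, zz


def is_error_free(x, y):
    """Check z + zz == x + y using exact rational arithmetic."""
    z, zz = fast_two_sum(x, y)
    return Fraction(z) + Fraction(zz) == Fraction(x) + Fraction(y)
```

For example, \texttt{fast\_two\_sum(1.0, 1e-16)} returns \texttt{(1.0, 1e-16)}. The plain sum rounds back to \texttt{1.0}, and the lost \texttt{1e-16} reappears in the error term.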
+ +So whenever hardware papers come up, VHDL gets involved... +I guess it's time to try and work out how to use the open source VHDL implementations. + +This is about reduction of error in hardware operations rather than the precision or range of floats. +But it is probably still relevant. + +This has the Fast2Sum algorithm but for the love of god I cannot see how you can compute anything other than $a + b = 0 \forall a,b$ using the algorithm as written in their paper. It references Dekker \cite{dekker1971afloating} and Kahan; will look at them instead. + +\section{Floating Point Unit from JOP \cite{jop}} + +This is a 32-bit floating point unit developed for JOP in VHDL. +I have been able to successfully compile it and the test program using GHDL \cite{ghdl}. + +Whilst there are constants (eg: \verb/FP_WIDTH = 32, EXP_WIDTH = 8, FRAC_WIDTH = 23/) defined, the actual implementation mostly uses magic numbers, so +some investigation is needed into what, for example, the ``52'' bits used in the sqrt units are for. + +\section{GHDL \cite{ghdl}} + +GHDL is an open source, GPL-licensed VHDL compiler written in Ada. It had packages in Debian up until wheezy, when it was removed. However, the SourceForge site still provides a \shell{deb} file for wheezy. + +This reference explains how to use the \shell{ghdl} compiler, but not the VHDL language itself. + +GHDL is capable of compiling a ``testbench'' - essentially an executable which simulates the design and ensures it meets test conditions. +A common technique is using a text file to provide the inputs/outputs of the test. The testbench executable can be supplied an argument to save a \shell{vcd} file which can be viewed in \shell{gtkwave} to see timing diagrams. + +Sam has successfully compiled the VHDL design for an FPU in JOP \cite{jop} into a ``testbench'' executable which uses standard i/o instead of a regular file.
+Using Unix domain sockets we can execute the FPU as a child process and communicate with it from our document viewing test software. This means we can potentially simulate alternative hardware designs for FPUs and observe the effect they will have on precision in the document viewer. + +Using \shell{ghdl} the testbench can also be linked as part of a C/C++ program and run using a function; however there is still no way to communicate with it other than forking a child process and using a Unix domain socket anyway. Also, compiling the VHDL FPU as part of our document viewer would clutter the code repository and probably be highly unportable. The VHDL FPU has been given a separate repository. + +\section{On the design of fast IEEE floating-point adders \cite{seidel2001onthe}} + +This paper gives an overview of the ``Naive'' floating point addition/subtraction algorithm and gives several optimisation techniques: + +TODO: Actually understand these... + +\begin{itemize} + \item Use parallel paths (based on exponent) + \item Unification of significand result ranges + \item Reduction of IEEE rounding modes + \item Sign-magnitude computation of a difference + \item Compound Addition + \item Approximate counting of leading zeroes + \item Pre-computation of post-normalization shift +\end{itemize} + +They then give an implementation that uses these optimisation techniques, including some very scary-looking block diagrams. + +They simulated the FPU. The paper does not directly mention which simulation method was used, but cites another paper (TODO: Look at this. I bet it was VHDL). + +The paper concludes by summarising the optimisation techniques used by various adders in production (in 2001). + +This paper does not specifically mention the precision of the operations, but may be useful because a faster adder design might mean you can increase the precision. + +\section{Re: round32 ( round64 ( X ) ) ?= round32 ( X ) \cite{beebe2011round32}} + +I included this just for the quote by Nelson H. F.
Beebe: + +``It is too late now to repair the mistakes of the past that are present +in millions of installed systems, but it is good to know that careful +research before designing hardware can be helpful.'' + +This is with regard to the problem of double rounding. It provides a reference for a paper that discusses a rounding mode that eliminates the problem, and a software implementation. + +It shows that the IEEE standard can be fallible! + +Not sure how to work this into our literature review though. + +% Back to software +\section{Basic Issues in Floating Point Arithmetic and Error Analysis \cite{demmel1996basic}} + +These are lecture notes from U.C. Berkeley CS267 in 1996. + + +\section{Charles Babbage \cite{dodge_babbage, nature1871babbage}} + +Tributes to Charles Babbage. Might be interesting for historical background. They don't mention anything about floating point numbers though. + +\section{GPU Floating-Point Paranoia \cite{hillesland2004paranoia}} + +This paper discusses floating point representations on GPUs. They have reproduced the program \emph{Paranoia} by William Kahan for characterising floating point behaviour of computers (pre-IEEE) for GPUs. + + +There are a few remarks about GPU vendors not being very open about what they do or do not do with + + +Unfortunately we only have the extended abstract, but a pretty good summary of the paper (written by the authors) is at: \url{www.cs.unc.edu/~ibr/projects/paranoia/} + +From the abstract: + +``... [GPUs are often similar to IEEE] However, we have found +that GPUs do not adhere to IEEE standards for floating-point +operations, nor do they give the information necessary to establish +bounds on error for these operations ... '' + +and ``...Our goal is to determine the error bounds on floating-point +operation results for quickly evolving graphics systems. We have +created a tool to measure the error for four basic floating-point +operations: addition, subtraction, multiplication and division.''
+ +The implementation is only for Windows and uses GLUT and GLEW and things. +Implement our own version? + +\section{A floating-point technique for extending the available precision \cite{dekker1971afloating}} + +This is Dekker's formalisation of the Fast2Sum algorithm originally implemented by Kahan. It assumes binary (radix 2) arithmetic with round-to-nearest and $|x| \geq |y|$. + +\begin{align*} + z &= \text{RN}(x + y) \\ + w &= \text{RN}(z - x) \\ + zz &= \text{RN}(y - w) \\ + \implies z + zz &= x + y +\end{align*} + +There is also a version for multiplication. + +I'm still not quite sure when this is useful. I haven't been able to find an example for $x$ and $y$ where $x + y \neq \text{Fast2Sum}(x, y)$. (Note that $z$ is just the ordinary rounded sum; the extra information is the error term $zz$, which is exactly the part of $x + y$ that rounding discarded.) + +\section{Handbook of Floating-Point Arithmetic \cite{HFP}} + +This book is amazingly useful and pretty much says everything there is to know about floating point numbers. +It is much easier to read than Goldberg's or Priest's papers. + +I'm going to start working through it and compile their test programs. + +\chapter{General Notes} + +\section{Rounding Errors} + +They happen. There is ULP and I don't mean a political party. + +TODO: Probably say something more insightful. Other than ``here is a graph that shows errors and we blame rounding''. + +\subsection{Results of calculatepi} + +We can calculate $\pi$ by evaluating the integral numerically: +\begin{align*} + \int_0^1 \left(\frac{4}{1+x^2}\right) dx &= \pi +\end{align*} + +Results with Simpson's rule: +\begin{figure}[H] + \centering + \includegraphics[width=0.8\textwidth]{figures/calculatepi.pdf} + \caption{Example of accumulated rounding errors in a numerical calculation} +\end{figure} + +Tests with \verb/calculatepi/ show it's not quite as simple as just blindly replacing all your additions with Fast2Sum from Dekker \cite{dekker1971afloating}. +i.e. the graph looks exactly the same for single precision.
\verb/calculatepi/ obviously also has multiplication ops in it which I didn't change. Will look at after sleep maybe. + +\subsection{A sequence that seems to converge to a wrong limit, pp.~9--10 \cite{HFP}} +\begin{align*} + u_0 &= 2, \qquad u_1 = -4, \\ + u_n &= 111 - \frac{1130}{u_{n-1}} + \frac{3000}{u_{n-1}u_{n-2}} \quad (n \geq 2) +\end{align*} +The limit of the sequence should be $6$, but when it is calculated with IEEE floats it converges to $100$. +The authors show that the limit is $100$ for almost all other starting values; rounding errors in floating point arithmetic effectively perturb the starting values, so the computed sequence goes to that limit instead. +\begin{figure}[H] + \centering + \includegraphics[width=0.8\textwidth]{figures/handbook1-1.pdf} + \caption{Output of Program 1.1 from \emph{Handbook of Floating-Point Arithmetic} \cite{HFP} for various IEEE types} +\end{figure} \pagebreak \bibliographystyle{unsrt}
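Below is a minimal Python sketch of the sequence $u_n$ from the Handbook recurrence. It is not the book's Program 1.1, and \texttt{muller} is our own hypothetical name. It runs the same recurrence twice:
\begin{itemize}
\item with \texttt{float} (IEEE double);
\item with exact rational arithmetic from the standard \texttt{fractions} module.
\end{itemize}
This shows the effect described: the exact sequence creeps towards $6$, while the rounded one is pulled to $100$.

```python
from fractions import Fraction


def muller(n, u0=2.0, u1=-4.0):
    """Return u_n for u_n = 111 - 1130/u_{n-1} + 3000/(u_{n-1} u_{n-2}).

    Hypothetical helper name. The arithmetic type follows the starting values:
    floats give IEEE double rounding, Fractions give exact rational arithmetic.
    """
    if n == 0:
        return u0
    prev, cur = u0, u1
    for _ in range(n - 1):
        prev, cur = cur, 111 - 1130 / cur + 3000 / (cur * prev)
    return cur


if __name__ == "__main__":
    # Compare rounded (double) and exact values side by side.
    for n in (5, 10, 20, 40):
        exact = muller(n, Fraction(2), Fraction(-4))
        print(n, muller(n), float(exact))
```

Passing \texttt{Fraction} starting values makes every intermediate exact. The only difference between the two columns is therefore rounding.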