LiteratureNotes.tex

   1 \documentclass[8pt]{extreport}
   2 \usepackage{graphicx}
   3 \usepackage{caption}
   4 \usepackage{amsmath} % needed for math align
   5 \usepackage{bm} % needed for maths bold face
   6  \usepackage{graphicx}    % needed for including graphics e.g. EPS, PS
   7 \usepackage{fancyhdr}   % needed for header
   8 %\usepackage{epstopdf} % Converts eps to pdf before including. Do it manually instead.
   9 \usepackage{float}
  10 \usepackage{hyperref}
  11
  12  \topmargin -1.5cm        % read Lamport p.163
  13  \oddsidemargin -0.04cm   % read Lamport p.163
  14  \evensidemargin -0.04cm  % same as oddsidemargin but for left-hand pages
  15  \textwidth 16.59cm
  16  \textheight 21.94cm
  17  %\pagestyle{empty}       % Uncomment if don't want page numbers
  18  \parskip 8.2pt           % sets spacing between paragraphs
  19  %\renewcommand{\baselinestretch}{1.5}  % Uncomment for 1.5 spacing between lines
  20  \parindent 0pt           % sets leading space for paragraphs
  21 \linespread{0.8}
  22
  23
  24 \newcommand{\vect}[1]{\boldsymbol{#1}} % Draw a vector
  25 \newcommand{\divg}[1]{\nabla \cdot #1} % divergence
  26 \newcommand{\curl}[1]{\nabla \times #1} % curl
  27 \newcommand{\grad}[1]{\nabla #1} %gradient
  28 \newcommand{\pd}[3][ ]{\frac{\partial^{#1} #2}{\partial #3^{#1}}} %partial derivative
  29 %\newcommand{\d}[3][ ]{\frac{d^{#1} #2}{d #3^{#1}}} %full derivative
  30 \newcommand{\phasor}[1]{\tilde{#1}} % make a phasor
  31 \newcommand{\laplacian}[1]{\nabla^2 {#1}} % The laplacian operator
  32
  33 \usepackage{color}
  34 \usepackage{listings}
  35
  36 \definecolor{darkgray}{rgb}{0.95,0.95,0.95}
  37 \definecolor{darkred}{rgb}{0.75,0,0}
  38 \definecolor{darkblue}{rgb}{0,0,0.75}
  39 \definecolor{pink}{rgb}{1,0.5,0.5}
  40 \lstset{language=Java}
  41 \lstset{backgroundcolor=\color{darkgray}}
  42 \lstset{numbers=left, numberstyle=\tiny, stepnumber=1, numbersep=5pt}
  43 \lstset{keywordstyle=\color{darkred}\bfseries}
  44 \lstset{commentstyle=\color{darkblue}}
  45 %\lstset{stringsyle=\color{red}}
  46 \lstset{showstringspaces=false}
  47 \lstset{basicstyle=\small}
  48
  49 \newcommand{\shell}[1]{\texttt{#1}}
  50 \newcommand{\code}[1]{\texttt{#1}}
  51
  52 \begin{document}
  53
  54
  55 \pagestyle{fancy}
  56 \fancyhead{}
  57 \fancyfoot{}
  58
  59 \fancyhead[LO, L]{}
  60 \fancyfoot[CO, C]{\thepage}
  61
  62 \begin{center}
  63         B.Eng. Final Year Project \par
  64         {\bf \Large Literature Notes} \par
  65         Sam Moore, David Gow \\
  66         Faculty of Engineering, Computing and Mathematics, University of Western Australia \\
  67         March 2014
  68 \end{center}
  69
  70 \tableofcontents
  71
  72 \chapter{Literature Summaries}
  73
  74 \section{Postscript Language Reference Manual  \cite{plrm}}
  75
  76 Adobe's official reference manual for PostScript.
  77
  78 It is big.
  79
  80 \section{Portable Document Format Reference Manual  \cite{pdfref17}}
  81
  82 Adobe's official reference for PDF.
  83
  84 It is also big.
  85
  86 \section{IEEE Standard for Floating-Point Arithmetic \cite{ieee2008-754}}
  87
  88 The IEEE (revised) 754 standard.
  89
  90 It is also big.
  91
  92
  93
  94 \pagebreak
  95
  96 \section{Portable Document Format (PDF) --- Finally...  \cite{cheng2002portable}}
  97
  98 This is not spectacularly useful, is basically an advertisement for Adobe software.
  99
 100 {\bf Intro}
 101 \begin{itemize}
 102         \item Visual communications has been revolutionised by computing
 103         \item BUT there have always been problems in exchanging formats
 104         \item Filetypes like text, rich text, IGES, DXF, TIFF, JPEG, GIFF solve problems for particular types of files only
 105         \item PDF solves everything for everyone; can include text, images, animation, sound, etc
 106 \end{itemize}
 107 {\bf PDF Features}
 108 \begin{itemize}
 109         \item Raster Image Process (RIP) --- For printing (presumably also displaying on screen)
 110         \item Originally needed to convert to PS then RIP, with PS 3 can now RIP directly.
 111         \item Reduced filesize due to compression
 112         \item Four major applications - Stoy 1999\cite{stoy1999}
 113                 \begin{enumerate}
 114                         \item Download files from internet
 115                         \item Files on CDs
 116                         \item Files for outputing to printers
 117                         \item Conventional [commercial scale?] printing
 118                 \end{enumerate}
 119         \item List of various (Adobe) PDF related software
 120         \begin{itemize}
 121                 \item Includes software for PS that converts to/from PDF
 122                 \item So PS was obviously pretty popular before PDF
 123         \end{itemize}
 124         \item Can Optimize for screen/printer [not clear how]
 125         \item Can compress for size
 126 \end{itemize}
 127
 128 \pagebreak
 129 \section{Pixels or Perish  \cite{hayes2012pixels}}
 130
 131 ``The art of scientific illustration will have to adapt to the new age of online publishing''
 132 And therefore, JavaScript libraries ($\text{D}^3$) are the future.
 133
 134 The point is that we need to change from thinking about documents as paper to thinking of them as pixels.
 135 This kind of makes it related to our paper, because it is the same way we are justifying our project. It does mention precision, but doesn't say we need to get more of it.
 136
 137 I get the feeling from this that Web based documents are a whole bunch of completely different design philosophies hacked together
 138 with JavaScript.
 139
 140 This paper uses Metaphors a lot. I never met a phor that didn't over extend itself.
 141
 142
 143 {\bf Intro}
 144 \begin{itemize}
 145         \item Drawings/Pictures are ornaments in science but they are not just ornamental
 146         \item Processes have changed a lot; eg: photographic plates $\to$ digital images
 147         \item ``we are about to turn the page --- if not close the book --- on yet another chapter in publishing history.'' (HO HO HO)
 148         \item It would be cool to have animated figures in documents (eg: Population pyramid; changes with time); not just as ``supplements''
 149         \item In the beginning, there was PostScript, 1970s and 1980s, John Warnock and Charles Geschke, Adobe Systems
 150         \item PS is a language for vector graphics; objects are constructed from geometric primitives rather than a discrete array of pixels
 151         \item PS is a complete programming language; an image is also a program; can exploit this to control how images are created based on data (eg: Faces)
 152         \item PDF is ``flattened'' PS. No longer programable. Aspires to be ``virtual paper''.
 153         \item But why are we using such powerful computing machines just to emulate sheets paper? (the author asks)
 154 \end{itemize}
 155
 156 {\bf Web based Documents}
 157 \begin{itemize}
 158         \item HTML, CSS, JavaScript - The Axis of Web Documents
 159         \begin{itemize}
 160                 \item HTML - Defines document structure
 161                 \item CSS - Defines presentation of elements in document
 162                 \item JavaScript - Encodes actions, allows dynamic content (change the HTML/CSS)
 163         \end{itemize}
 164         \item \texttt{<canvas>} will let you draw anything (So in principle don't even need all of HTML/CSS)
 165         \begin{itemize}
 166                 \item Not device independent
 167                 \item ``Coordinates can be specified with precision finer than pixel resolution'' {\bf (TODO: Investigate this?)}
 168                 \item JavaScript operators to draw things on canvas are very similar to the PostScript model
 169         \end{itemize}
 170         \item SVG --- Same structure (Document Object Model (DOM)) as HTML
 171         \begin{itemize}
 172                 \item ``Noun language''
 173                 \item Nouns define lines/curves etc, rather than paragraphs/lists
 174                 \item Also borrows things from PostScript (eg: line caps and joints)
 175                 \item IS device independent, ``very high precision'' {\bf (TODO: Investigate)}
 176                 \item JavaScript can be used to interact with SVG too
 177         \end{itemize}
 178         \item $\text{D}^{3}$ (Data Driven Documents) - A JavaScript library
 179         \begin{itemize}
 180                 \item Idea is to create or modify elements of a DOM document using supplied data
 181                 \item \url{https://github.com/mbostock/d3/wiki}
 182         \end{itemize}
 183         \item We are in a new Golden Age of data visualisation
 184         \item Why do we still use PDFs?
 185         \begin{itemize}
 186                 \item PDFs are ``owned'' by the author/reader; you download it, store it, you can print it, etc
 187                 \item HTML documents are normally on websites. They are not self contained. They often rely on remote content from other websites (annoying to download the whole document).
 188         \end{itemize}
 189         \item {\bf Conclusion} Someone should open up PDF to accept things like $\text{D}^{3}$ and other graphics formats (links nicely with \cite{barnes2013embedding})
 190         \item Also, Harry Potter reference
 191
 192 \end{itemize}
 193
 194 \section{Embedding and Publishing Interactive, 3D Figures in PDF Files  \cite{barnes2013embedding}}
 195
 196 \begin{itemize}
 197         \item Linkes well with \cite{hayes2012pixels}; I heard you liked figures so I put a figure in your PDF
 198         \item Title pretty much summarises it; similar to \cite{hayes2012pixels} except these guys actually did something practical
 199 \end{itemize}
 200
 201 \section{27 Bits are not enough for 8 digit accuracy  \cite{goldberg1967twentyseven}}
 202
 203 Proves with maths, that rounding errors mean that you need at least $q$ bits for $p$ decimal digits. $10^p < 2^{q-1}$
 204
 205 \begin{itemize}
 206         \item Eg: For 8 decimal digits, since $10^8 < 2^{27}$ would expect to be able to represent with 27 binary digits
 207         \item But: Integer part requires digits bits (regardless of fixed or floating point represenetation)
 208         \item Trade-off between precision and range
 209         \begin{itemize}
 210                 \item 9000000.0 $\to$ 9999999.9 needs 24 digits for the integer part $2^{23} = 83886008$
 211         \end{itemize}
 212         \item Floating point zero = smallest possible machine exponent
 213         \item Floating point representation:
 214         \begin{align*}
 215                 y &= 0.y_1 y_2 \text{...} y_q \times 2^{n}
 216         \end{align*}
 217         \item Can eliminate a bit by considering whether $n = -e$ for $-e$ the smallest machine exponent (???)
 218         \begin{itemize}
 219                 \item Get very small numbers with the same precision
 220                 \item Get large numbers with the extra bit of precision
 221         \end{itemize}
 222 \end{itemize}
 223
 224 \section{What every computer scientist should know about floating-point arithmetic  \cite{goldberg1991whatevery}}
 225
 226 \begin{itemize}
 227         \item Book: \emph{Floating Point Computation} by Pat Sterbenz (out of print... in 1991)
 228     \item IEEE floating point standard becoming popular (introduced in 1987, this is 1991)
 229     \begin{itemize}
 230                 \item As well as structure, defines the algorithms for addition, multiplication, division and square root
 231                 \item Makes things portable because results of operations are the same on all machines (following the standard)
 232                 \item Alternatives to floating point: Floating slasi and Signed Logarithm (TODO: Look at these, although they will probably not be useful)
 233
 234         \end{itemize}
 235         \item Base $\beta$ and precision $p$ (number of digits to represent with) - powers of the base can be represented exactly.
 236         \item Largest and smallest exponents $e_{min}$ and $e_{max}$
 237         \item Need bits for exponent and fraction, plus one for sign
 238         \item ``Floating point number'' is one that can be represented exactly.
 239         \item Representations are not unique! $0.01 \times 10^1 = 1.00 \times 10^{-1}$ Leading digit of one $\implies$ ``normalised''
 240         \item Requiring the representation to be normalised makes it unique, {\bf but means it is impossible to represent zero}.
 241         \begin{itemize}
 242                 \item Represent zero as $1 \times \beta^{e_{min}-1}$ - requires extra bit in the exponent
 243         \end{itemize}
 244         \item {\bf Rounding Error}
 245         \begin{itemize}
 246                 \item ``Units in the last place'' eg: 0.0314159 compared to 0.0314 has ulp error of 0.159
 247                 \item If calculation is the nearest floating point number to the result, it will still be as much as 1/2 ulp in error
 248                 \item Relative error corresponding to 1/2 ulp can vary by a factor of $\beta$ ``wobble''. Written in terms of $\epsilon$
 249                 \item Maths $\implies$ {\bf Relative error is always bounded by $\epsilon = (\beta/2)\beta^{-p}$}
 250                 \item Fixed relative error $\implies$ ulp can vary by a factor of $\beta$ . Vice versa
 251                 \item Larger $\beta \implies$ larger errors
 252         \end{itemize}
 253         \item {\bf Guard Digits}
 254         \begin{itemize}
 255                 \item In subtraction: Could compute exact difference and then round; this is expensive
 256                 \item Keep fixed number of digits but shift operand right; discard precision. Lead to relative error up to $\beta - 1$
 257                 \item Guard digit: Add extra digits before truncating. Leads to relative error of less than $2\epsilon$. This also applies to addition
 258         \end{itemize}
 259         \item {\bf Catastrophic Cancellation} - Operands are subject to rounding errors - multiplication
 260         \item {\bf Benign Cancellation} - Subtractions. Error $< 2\epsilon$
 261         \item Rearrange formula to avoid catastrophic cancellation
 262         \item Historical interest only - speculation on why IBM used $\beta = 16$ for the system/370 - increased range? Avoids shifting
 263         \item Precision: IEEE defines extended precision (a lower bound only)
 264         \item Discussion of the IEEE standard for operations (TODO: Go over in more detail)
 265         \item NaN allow continuing with underflow and Infinity with overflow
 266         \item ``Incidentally, some people think that the solution to such anomalies is never to compare floating-point numbers for equality but instead to consider them equal if they are within some error bound E. This is hardly a cure all, because it raises as many questions as it answers.'' - On equality of floating point numbers
 267
 268 \end{itemize}
 269
 270
 271 %%%%
 272 % David's Stuff
 273 %%%%
 274 \section{Compositing Digital Images  \cite{porter1984compositing}}
 275
 276
 277
 278 Perter and Duff's classic paper "Compositing Digital Images" lays the
 279 foundation for digital compositing today. By providing an "alpha channel,"
 280 images of arbitrary shapes — and images with soft edges or sub-pixel coverage
 281 information — can be overlayed digitally, allowing separate objects to be
 282 rasterized separately without a loss in quality.
 283
 284 Pixels in digital images are usually represented as 3-tuples containing
 285 (red component, green component, blue component). Nominally these values are in
 286 the [0-1] range. In the Porter-Duff paper, pixels are stored as $(R,G,B,\alpha)$
 287 4-tuples, where alpha is the fractional coverage of each pixel. If the image
 288 only covers half of a given pixel, for example, its alpha value would be 0.5.
 289
 290 To improve compositing performance, albeit at a possible loss of precision in
 291 some implementations, the red, green and blue channels are premultiplied by the
 292 alpha channel. This also simplifies the resulting arithmetic by having the
 293 colour channels and alpha channels use the same compositing equations.
 294
 295 Several binary compositing operations are defined:
 296 \begin{itemize}
 297 \item over
 298 \item in
 299 \item out
 300 \item atop
 301 \item xor
 302 \item plus
 303 \end{itemize}
 304
 305 The paper further provides some additional operations for implementing fades and
 306 dissolves, as well as for changing the opacity of individual elements in a
 307 scene.
 308
 309 The method outlined in this paper is still the standard system for compositing
 310 and is implemented almost exactly by modern graphics APIs such as \texttt{OpenGL}. It is
 311 all but guaranteed that this is the method we will be using for compositing
 312 document elements in our project.
 313
 314 \section{Bresenham's Algorithm: Algorithm for computer control of a digital plotter  \cite{bresenham1965algorithm}}
 315 Bresenham's line drawing algorithm is a fast, high quality line rasterization
 316 algorithm which is still the basis for most (aliased) line drawing today. The
 317 paper, while originally written to describe how to control a particular plotter,
 318 is uniquely suited to rasterizing lines for display on a pixel grid.
 319
 320 Lines drawn with Bresenham's algorithm must begin and end at integer pixel
 321 coordinates, though one can round or truncate the fractional part. In order to
 322 avoid multiplication or division in the algorithm's inner loop,
 323
 324 The algorithm works by scanning along the long axis of the line, moving along
 325 the short axis when the error along that axis exceeds 0.5px. Because error
 326 accumulates linearly, this can be achieved by simply adding the per-pixel
 327 error (equal to (short axis/long axis)) until it exceeds 0.5, then incrementing
 328 the position along the short axis and subtracting 1 from the error accumulator.
 329
 330 As this requires nothing but addition, it is very fast, particularly on the
 331 older CPUs used in Bresenham's time. Modern graphics systems will often use Wu's
 332 line-drawing algorithm instead, as it produces antialiased lines, taking
 333 sub-pixel coverage into account. Bresenham himself extended this algorithm to
 334 produce Bresenham's circle algorithm. The principles behind the algorithm have
 335 also been used to rasterize other shapes, including B\'{e}zier curves.
 336
 337 \section{Quad Trees: A Data Structure for Retrieval on Composite Keys  \cite{finkel1974quad}}
 338
 339 This paper introduces the ``quadtree'' spatial data structure. The quadtree structure is
 340 a search tree in which every node has four children representing the north-east, north-west,
 341 south-east and south-west quadrants of its space.
 342
 343 \section{Xr: Cross-device Rendering for Vector Graphics  \cite{worth2003xr}}
 344
 345 Xr (now known as Cairo) is an implementation of the PDF v1.4 rendering model,
 346 independent of the PDF or PostScript file formats, and is now widely used
 347 as a rendering API. In this paper, Worth and Packard describe the PDF v1.4 rendering
 348 model, and their PostScript-derived API for it.
 349
 350 The PDF v1.4 rendering model is based on the original PostScript model, based around
 351 a set of \emph{paths} (and other objects, such as raster images) each made up of lines
 352 and B\'{e}zier curves, which are transformed by the ``Current Transformation Matrix.''
 353 Paths can be \emph{filled} in a number of ways, allowing for different handling of self-intersecting
 354 paths, or can have their outlines \emph{stroked}.
 355 Furthermore, paths can be painted with RGB colours and/or patterns derived from either
 356 previously rendered objects or external raster images.
 357 PDF v1.4 extends this to provide, amongst other features, support for layering paths and
 358 objects using Porter-Duff compositing\cite{porter1984compositing}, giving each painted path
 359 the option of having an $\alpha$ value and a choice of any of the Porter-Duff compositing
 360 methods.
 361
 362 The Cairo library approximates the rendering of some objects (particularly curved objects
 363 such as splines) with a set of polygons. An \texttt{XrSetTolerance} function allows the user
 364 of the library to set an upper bound on the approximation error in fractions of device pixels,
 365 providing a trade-off between rendering quality and performance. The library developers found
 366 that setting the tolerance to greater than $0.1$ device pixels resulted in errors visible to the
 367 user.
 368
 369 \section{Glitz: Hardware Accelerated Image Compositing using OpenGL  \cite{nilsson2004glitz}}
 370
 371 This paper describes the implementation of an \texttt{OpenGL} based rendering backend for
 372 the \texttt{Cairo} library.
 373
 374 The paper describes how OpenGL's Porter-Duff compositing is easily suited to the Cairo/PDF v1.4
 375 rendering model. Similarly, traditional OpenGL (pre-version 3.0 core) support a matrix stack
 376 of the same form as Cairo.
 377
 378 The ``Glitz'' backend will emulate support for tiled, non-power-of-two patterns/textures if
 379 the hardware does not support it.
 380
 381 Glitz can render both triangles and trapezoids (which are formed from pairs of triangles).
 382 However, it cannot guarantee that the rasterization is pixel-precise, as OpenGL does not proveide
 383 this consistently.
 384
 385 Glitz also supports multi-sample anti-aliasing, convolution filters for raster image reads (implemented
 386 with shaders).
 387
 388 Performance was much improved over the software rasterization and over XRender accellerated rendering
 389 on all except nVidia hardware. However, nVidia's XRender implementation did slow down significantly when
 390 some transformations were applied.
 391
 392 %% Sam again
 393
 394 \section{Boost Multiprecision Library  \cite{boost_multiprecision}}
 395
 396 \begin{itemize}
 397         \item ``The Multiprecision Library provides integer, rational and floating-point types in C++ that have more range and precision than C++'s ordinary built-in types.''
 398         \item Specify number of digits for precision as a template argument.
 399         \item Precision is fixed... {\bf possible approach to project:} Use \verb/boost::mpf_float<N>/ and increase \verb/N/ as more precision is required?
 400 \end{itemize}
 401
 402
 403 % Some hardware related sounding stuff...
 404
 405 \section{A CMOS Floating Point Unit  \cite{kelley1997acmos}}
 406
 407 The paper describes the implentation of a FPU for PowerPC using a particular Hewlett Packard process (HP14B 0.5$\mu$m, 3M, 3.3V).
 408 It implements a ``subset of the most commonly used double precision floating point instructions''. The unimplemented operations are compiled for the CPU.
 409
 410 The paper gives a description of the architecture and design methods.
 411 This appears to be an entry to a student design competition.
 412
 413 Standard is IEEE 754, but the multiplier tree is a 64-bit tree instead of a 54 bit tree.
 414 `` The primary reason for implementing a larger tree is for future additions of SIMD [Single Instruction Multiple Data (?)] instructions similar to Intel's MMX and Sun's VIS instructions''.
 415
 416 HSPICE simulations used to determine transistor sizing.
 417
 418 Paper has a block diagram that sort of vaguely makes sense to me.
 419 The rest requires more background knowledge.
 420
 421 \section{Simply FPU\cite{filiatreault2003simply}}
 422
 423 This is a webpage at one degree of seperation from wikipedia.
 424
 425 It talks about FPU internals, but mostly focuses on the instruction sets.
 426 It includes FPU assembly code examples (!)
 427
 428 It is probably not that useful, I don't think we'll end up writing FPU assembly?
 429
 430 FPU's typically have 80 bit registers so they can support REAL4, REAL8 and REAL10 (single, double, extended precision).
 431
 432
 433 \section{Floating Point Package User's Guide  \cite{bishop2008floating}}
 434
 435 This is a technical report describing floating point VHDL packages \url{http://www.vhdl.org/fphdl/vhdl.html}
 436
 437 In theory I know VHDL (cough) so I am interested in looking at this further to see how FPU hardware works.
 438 It might be getting a bit sidetracked from the ``document formats'' scope though.
 439
 440 The report does talk briefly about the IEEE standard and normalised / denormalised numbers as well.
 441
 442 See also: Java Optimized Processor\cite{jop} (it has a VHDL implementation of a FPU).
 443
 444 \section{Low-Cost Microarchitectural Support for Improved Floating-Point Accuracy\cite{dieter2007lowcost}}
 445
 446 Mentions how GPUs offer very good floating point performance but only for single precision floats. (NOTE: This statement seems to contradict \cite{hillesland2004paranoia}.
 447
 448 Has a diagram of a Floating Point adder.
 449
 450 Talks about some magical technique called "Native-pair Arithmetic" that somehow makes 32-bit floating point accuracy ``competitive'' with 64-bit floating point numbers.
 451
 452 \section{Accurate Floating Point Arithmetic through Hardware Error-Free Transformations  \cite{kadric2013accurate}}
 453
 454 From the abstract: ``This paper presents a hardware approach to performing ac-
 455 curate floating point addition and multiplication using the idea of error-
 456 free transformations. Specialized iterative algorithms are implemented
 457 for computing arbitrarily accurate sums and dot products.''
 458
 459 The references for this look useful.
 460
 461 It also mentions VHDL.
 462
 463 So whenever hardware papers come up, VHDL gets involved...
 464 I guess it's time to try and work out how to use the Opensource VHDL implementations.
 465
 466 This is about reduction of error in hardware operations rather than the precision or range of floats.
 467 But it is probably still relevant.
 468
 469 This has the Fast2Sum algorithm but for the love of god I cannot see how you can compute anything other than $a + b = 0 \forall a,b$ using the algorithm as written in their paper. It references Dekker\cite{dekker1971afloating} and Kahn; will look at them instead.
 470
 471 \section{Floating Point Unit from JOP  \cite{jop}}
 472
 473 This is a 32 bit floating point unit developed for JOP in VHDL.
 474 I have been able to successfully compile it and the test program using GHDL\cite{ghdl}.
 475
 476 Whilst there are constants (eg: \verb/FP_WIDTH = 32, EXP_WIDTH = 8, FRAC_WIDTH = 23/) defined, the actual implementation mostly uses magic numbers, so
 477 some investigation is needed into what, for example, the "52" bits used in the sqrt units are for.
 478
 479 \section{GHDL  \cite{ghdl}}
 480
 481 GHDL is an open source GPL licensed VHDL compiler written in Ada. It had packages in debian up until wheezy when it was removed. However the sourceforge site still provides a \shell{deb} file for wheezy.
 482
 483 This reference explains how to use the \shell{ghdl} compiler, but not the VHDL language itself.
 484
 485 GHDL is capable of compiling a ``testbench'' - essentially an executable which simulates the design and ensures it meets test conditions.
 486 A common technique is using a text file to provide the inputs/outputs of the test. The testbench executable can be supplied an argument to save a \shell{vcd} file which can be viewed in \shell{gtkwave} to see timing diagrams.
 487
 488 Sam has successfully compiled the VHDL design for an FPU in JOP\cite{jop} into a ``testbench'' executable which uses standard i/o instead of a regular file.
 489 Using unix domain sockets we can execute the FPU as a child process and communicate with it from our document viewing test software. This means we can potentially simulate alternate hardware designs for FPUs and witness the effect they will have on precision in the document viewer.
 490
 491 Using \shell{ghdl} the testbench can also be linked as part a C/C++ program and run using a function; however there is still no way to communicate with it other than forking a child process and using a unix domain socket anyway. Also, compiling the VHDL FPU as part of our document viewer would clutter the code repository and probably be highly unportable. The VHDL FPU has been given a seperate repository.
 492
 493 \section{On the design of fast IEEE floating-point adders  \cite{seidel2001onthe}}
 494
 495 This paper gives an overview of the ``Naive'' floating point addition/subtraction algorithm and gives several optimisation techniques:
 496
 497 TODO: Actually understand these...
 498
 499 \begin{itemize}
 500         \item Use parallel paths (based on exponent)
 501         \item Unification of significand result ranges
 502         \item Reduction of IEEE rounding modes
 503         \item Sign-magnitude computation of a difference
 504         \item Compound Addition
 505         \item Approximate counting of leading zeroes
 506         \item Pre-computation of post-normalization shift
 507 \end{itemize}
 508
 509 They then give an implementation that uses these optimisation techniques including very scary looking block diagrams.
 510
 511 They simulated the FPU. Does not mention what simulation method was used directly, but cites another paper (TODO: Look at this. I bet it was VHDL).
 512
 513 The paper concludes by summarising the optimisation techniques used by various adders in production (in 2001).
 514
 515 This paper does not specifically mention the precision of the operations, but may be useful because a faster adder design might mean you can increase the precision.
 516
 517 \section{Re: round32 ( round64 ( X ) ) ?= round32 ( X )  \cite{beebe2011round32}}
 518
 519 I included this just for the quote by Nelson H. F. Beebe:
 520
 521 ``It is too late now to repair the mistakes of the past that are present
 522 in millions of installed systems, but it is good to know that careful
 523 research before designing hardware can be helpful.''
 524
 525 This is in regards to the problem of double rounding. It provides a reference for a paper that discusses a rounding mode that eliminates the problem, and a software implementation.
 526
 527 It shows that the IEEE standard can be fallible!
 528
 529 Not sure how to work this into our literature review though.
 530
 531 % Back to software
 532 \section{Basic Issues in Floating Point Arithmetic and Error Analysis  \cite{demmel1996basic}}
 533
 534 These are lecture notes from U.C Berkelye CS267 in 1996.
 535
 536
 537 \section{Charles Babbage  \cite{dodge_babbage, nature1871babbage}}
 538
 539 Tributes to Charles Babbage. Might be interesting for historical background. Don't mention anything about floating point numbers though.
 540
 541 \section{GPU Floating-Point Paranoia  \cite{hillesland2004paranoia}}
 542
 543 This paper discusses floating point representations on GPUs. They have reproduced the program \emph{Paranoia} by William Kahan for characterising floating point behaviour of computers (pre IEEE) for GPUs.
 544
 545
 546 There are a few remarks about GPU vendors not being very open about what they do or do not do with
 547
 548
 549 Unfortunately we only have the extended abstract, but a pretty good summary of the paper (written by the authors) is at: \url{www.cs.unc.edu/~ibr/projects/paranoia/}
 550
 551 From the abstract:
 552
 553 ``... [GPUs are often similar to IEEE] However, we have found
 554 that GPUs do not adhere to IEEE standards for floating-point op-
 555 erations, nor do they give the information necessary to establish
 556 bounds on error for these operations ... ''
 557
 558 and ``...Our goal is to determine the error bounds on floating-point op-
 559 eration results for quickly evolving graphics systems. We have cre-
 560 ated a tool to measure the error for four basic floating-point opera-
 561 tions: addition, subtraction, multiplication and division.''
 562
 563 The implementation is only for windows and uses glut and glew and things.
 564 Implement our own version?
 565
 566 \section{A floating-point technique for extending the available precision  \cite{dekker1971afloating}}
 567
 568 This is Dekker's formalisation of the Fast2Sum algorithm originally implemented by Kahn.
 569
 570 \begin{align*}
 571         z &= \text{RN}(x + y) \\
 572         w &= \text{RN}(z - x) \\
 573         zz &= \text{RN}(y - w) \\
 574         \implies z + zz &= x + y
 575 \end{align*}
 576
 577 There is a version for multiplication.
 578
 579 I'm still not quite sure when this is useful. I haven't been able to find an example for $x$ and $y$ where $x + y \neq \text{Fast2Sum}(x, y)$.
 580
 581 \section{Handbook of Floating-Point Arithmetic \cite{HFP}}
 582
 583 This book is amazingly useful and pretty much says everything there is to know about Floating Point numbers.
 584 It is much easier to read than Goldberg or Priest's papers.
 585
 586 I'm going to start working through it and compile their test programs.
 587
 588 \subsection{A sequence that seems to converge to a wrong limit  - pgs 9-10, \cite{HFP}}
 589
 590 \begin{align*}
 591         u_n &= \left\{ \begin{array}{c} u_0 = 2 \\ u_1 = -4 \\ u_n = 111 - \frac{1130}{u_{n-1}} + \frac{3000}{u_{n-1}u_{n-2}}\end{array}\right.
 592 \end{align*}
 593
 594 The limit of the series should be $6$ but when calculated with IEEE floats it is actually $100$
 595 The authors show that the limit is actually $100$ for different starting values, and the error in floating point arithmetic causes the series to go to that limit instead.
 596
 597 \begin{figure}[H]
 598         \centering
 599         \includegraphics[width=0.8\textwidth]{figures/handbook1-1.pdf}
 600         \caption{Output of Program 1.1 from \emph{Handbook of Floating-Point Arithmetic}\cite{HFP} for various IEEE types}
 601         \label{HFP-1-1}
 602 \end{figure}
 603
 604 \subsection{Mr Gullible and the Chaotic Bank Society pgs 10-11 \cite{HFP}}
 605
 606 This is an example of a sequence involving $e$. Since $e$ cannot be represented exactly with FP, even though the sequence should go to $0$ for $a_0 = e - 1$, the representation of $a_0 \neq e - 1$ so the sequence goes to $\pm \infty$.
 607
 608 To eliminate these types of problems we'd need an \emph{exact} representation of all real numbers.
 609 For \emph{any} FP representation, regardless of precision (a finite number of digits) there will be numbers that can't be represented exactly hence you could find a similar sequence that would explode.
 610
 611 IE: The more precise the representation, the slower things go wrong, but they still go wrong, {\bf even with errorless operations}.
 612
 613
 614 \subsection{Rump's example pg 12 \cite {HFP}}
 615
 616 This is an example where the calculation of a function $f(a,b)$ is not only totally wrong, it gives completely different results depending on the CPU. Despite the CPU conforming to IEEE.
 617
 618 \chapter{General Notes}
 619
 620 \section{Floating-Point \cite{HFP,goldberg1991whatevery,goldberg1992thedesign,priest1991algorithms}}
 621
 622 A set of FP numbers is characterised by:
 623 \begin{enumerate}
 624         \item Radix (base) $\beta \geq 2$
 625         \item Precision %$p \req 2$ ``number of sig digits''
 626         \item Two ``extremal`` exponents $e_min < 0 < e_max$ (generally, don't have to have the $0$ in there)
 627 \end{enumerate}
 628
 629 Numbers are represented by {\bf integers}: $(M, e)$ such that $x = M \times \beta^{e - p + 1}$
 630
 631 Require: $|M| \leq \beta^{p}-1$ and $e_min \leq e \leq e_max$.
 632
 633 Representations are not unique; set of equivelant representations is a cohort.
 634
 635 $\beta^{e-p+1}$ is the quantum, $e-p+1$ is the quantum exponent.
 636
 637 Alternate represetnation: $(s, m, e)$ such that $x = (-1)^s \times m \times \beta^{e}$
 638 $m$ is the significand, mantissa, or fractional part. Depending on what you read.
 639
 640
 641
 642
 643
 644 \section{Rounding Errors}
 645
 646
 647 They happen. There is ULP and I don't mean a political party.
 648
 649 TODO: Probably say something more insightful. Other than "here is a graph that shows errors and we blame rounding".
 650
 651 \subsection{Results of calculatepi}
 652
 653 We can calculate pi by numerically solving the integral:
 654 \begin{align*}
 655         \int_0^1 \left(\frac{4}{1+x^2}\right) dx &= \pi
 656 \end{align*}
 657
 658 Results with Simpson Method:
 659 \begin{figure}[H]
 660         \centering
 661         \includegraphics[width=0.8\textwidth]{figures/calculatepi.pdf}
 662         \caption{Example of accumulated rounding errors in a numerical calculation}
 663 \end{figure}
 664
 665 Tests with \verb/calculatepi/ show it's not quite as simple as just blindly replacing all your additions with Fast2Sum from Dekker\cite{dekker1971afloating}.
 666 ie: The graph looks exactly the same for single precision. \verb/calculatepi/ obviously also has multiplication ops in it which I didn't change. Will look at after sleep maybe.
 667
 668
 669 \pagebreak
 670 \bibliographystyle{unsrt}
 671 \bibliography{papers}
 672
 673 \end{document}
 674