Automatic commit of irc logs

diff --git a/LiteratureNotes.tex b/LiteratureNotes.tex

index c7aae7b..111e220 100644 (file)
--- a/LiteratureNotes.tex
+++ b/LiteratureNotes.tex
@@ -77,18 +77,56 @@ Adobe's official reference manual for PostScript.
  
  It is big.
  
+\begin{itemize}
+       \item First version was published BEFORE the IEEE standard and used smaller floats than binary32
+       \item Now uses binary32 floats.
+\end{itemize}
+
  \section{Portable Document Format Reference Manual  \cite{pdfref17}}
  
  Adobe's official reference for PDF.
  
  It is also big.
  
+\begin{itemize}
+       \item Early versions did not use IEEE binary32 but 16-16 exponent/mantissa encodings (Why?)
+       \item Current standard is restricted to binary32
+       \item It specifically says PDF creators must use at most binary32 because higher precision is not supported by Adobe Reader.
+\end{itemize}
+
  \section{IEEE Standard for Floating-Point Arithmetic \cite{ieee2008-754}}
  
  The IEEE (revised) 754 standard.
  
  It is also big.
  
+Successes:
+\begin{itemize}
+       \item Has been adopted by CPUs
+       \item Standardised floats for programmers --- accomplishes goal of allowing non-numerical experts to write reasonably sophisticated platform independent programs that may perform complex numerical operations
+\end{itemize}
+
+Failures:
+\begin{itemize}
+       \item Adoption by GPUs has been slower\cite{hillesland2004paranoia}
+       \item It specifies the maximum errors for operations using IEEE types but nothing about internal representations
+       \item Many hardware devices (GPUs and CPUs) use non-IEEE representations internally and simply truncate/round the result
+       \begin{itemize}
+               \item This isn't so much of a problem when the device uses additional bits, but it is misleading when GPUs use fewer bits than binary32 and act as if they are using binary32 from the programmer's perspective.
+               \item Devices using {\bf fewer} bits internally aren't IEEE compliant
+       \end{itemize}
+       \item Thus the same program compiled and run on different architectures may give completely different results\cite{HFP}
+       \begin{itemize}
+               \item The ultimate goal of allowing people to write numerical programs in total ignorance of the hardware is not entirely realised
+       \end{itemize}
+       \item This is the sort of thing that makes people want to use a virtual machine, and thus Java
+       \begin{itemize}
+               \item Objectively I probably shouldn't say that using Java is in itself a failure
+       \end{itemize}
+       \item Standards such as PostScript and PDF were slow to adopt IEEE representations
+       \item The OpenVG standard accepts IEEE binary32 in the API but specifically states that hardware may use less than this\cite{rice2008openvg}
+\end{itemize}
+
  
  
  \pagebreak
@@ -311,6 +349,12 @@ and is implemented almost exactly by modern graphics APIs such as \texttt{OpenGL
  all but guaranteed that this is the method we will be using for compositing
  document elements in our project.
  
+{\bf Some issues with the statements made here...}
+
+When introducing Bresenham's algorithm below you say modern graphics systems ``will often use Wu's line-drawing algorithm instead, \emph{as it produces antialiased lines}'' (and don't give a citation). Here you say OpenGL uses Porter-Duff compositing ``almost exactly''. But in their introduction they say: ``
+\begin{enumerate}
+\
+
  \section{Bresenham's Algorithm: Algorithm for computer control of a digital plotter  \cite{bresenham1965algorithm}}
  Bresenham's line drawing algorithm is a fast, high quality line rasterization
  algorithm which is still the basis for most (aliased) line drawing today. The
@@ -389,6 +433,8 @@ Performance was much improved over the software rasterization and over XRender a
  on all except nVidia hardware. However, nVidia's XRender implementation did slow down significantly when
  some transformations were applied.
  
+In \cite{kilgard2012gpu}, Kilgard mentions that Glitz has been abandoned. He describes it as ``GPU assisted'' rather than GPU accelerated, since it used the XRender (??) extension.
+
  %% Sam again
  
  \section{Boost Multiprecision Library  \cite{boost_multiprecision}}
@@ -429,6 +475,8 @@ It is probably not that useful, I don't think we'll end up writing FPU assembly?
  
  FPU's typically have 80 bit registers so they can support REAL4, REAL8 and REAL10 (single, double, extended precision).
  
+Note: Presumably this is referring to the x86 80 bit floats that David was talking about?
+
  
  \section{Floating Point Package User's Guide  \cite{bishop2008floating}}
  
@@ -702,6 +750,147 @@ Example of XML parsing using pugixml is in \shell{code/src/tests/xml.cpp}
         \caption{Tree representation of the above listing \cite{pugixmlDOM}}
  \end{figure}
  
+\section{An Algorithm For Shading of Regions on Vector Display Devices \cite{brassel1979analgorithm}}
+
+All modern display devices are raster based and therefore this paper is mainly of historical interest. It provides some references for shading on a raster display.
+
+The algorithm described will shade an arbitrary simply-connected polygon using one or two sets of parallel lines.
+
+The ``traditional'' method is:
+\begin{enumerate}
+       \item Start with an $N$-vertex polygon and rotate its coordinates by the shading angle
+       \item Determine a bounding rectangle
+       \item For $M$ equally spaced parallel lines, compute the intersections with the boundaries of the polygon
+       \item Rotate coordinates back
+       \item Render the $M$ lines
+\end{enumerate}
+
+This is pretty much exactly how an artist would shade a pencil drawing. It is $O(M\times N)$.
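+
+As a rough sketch of the traditional method (this is not code from the paper; the name \texttt{hatch\_segments} and its parameters are made up for illustration), the five steps above might look like this in Python. The loop over lines containing a loop over edges is where the $O(M\times N)$ cost comes from.

```python
import math


def hatch_segments(polygon, angle, spacing):
    """Hatch a simple polygon (list of (x, y)) with parallel lines.

    Illustrative sketch only: `angle` is the shading angle in radians and
    `spacing` the distance between neighbouring hatch lines.
    """
    # Step 1: rotate by -angle so that the hatch lines become horizontal.
    c, s = math.cos(-angle), math.sin(-angle)
    rot = [(x * c - y * s, x * s + y * c) for x, y in polygon]

    # Step 2: vertical extent of the bounding rectangle in the rotated frame.
    y_min = min(y for _, y in rot)
    y_max = max(y for _, y in rot)

    # Step 3: intersect each of the M lines with each of the N edges.
    segments = []
    n = len(rot)
    k = 0
    while y_min + (k + 0.5) * spacing < y_max:
        y = y_min + (k + 0.5) * spacing
        xs = []
        for i in range(n):
            (x0, y0), (x1, y1) = rot[i], rot[(i + 1) % n]
            # Half-open test so a vertex on the line is counted only once.
            if (y0 <= y < y1) or (y1 <= y < y0):
                t = (y - y0) / (y1 - y0)
                xs.append(x0 + t * (x1 - x0))
        xs.sort()
        # Consecutive pairs of crossings bound the parts inside the polygon.
        for a, b in zip(xs[0::2], xs[1::2]):
            segments.append(((a, y), (b, y)))
        k += 1

    # Step 4: rotate the segment end points back to the original frame.
    c, s = math.cos(angle), math.sin(angle)

    def back(p):
        return (p[0] * c - p[1] * s, p[0] * s + p[1] * c)

    # Step 5 (rendering) is left to the caller.
    return [(back(p), back(q)) for p, q in segments]
```

+For example, \texttt{hatch\_segments([(0,0),(4,0),(4,4),(0,4)], 0.0, 1.0)} yields four horizontal segments across a $4\times 4$ square.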
+
+The algorithm in this paper does:
+\begin{enumerate}
+       \item Rotate polygon coords by shading angle
+       \item Subdivide the polygon into trapezoids (special case triangle)
+       \item Shade the trapezoids independently
+       \item Rotate it all back
+\end{enumerate}
+It is more complicated than it seems. The subdivision requires a sort to be performed on the vertices of the polygon based on their rotated $x$ and $y$ coordinates.
+
+\section{An Algorithm For Filling Regions on Graphics Display Devices \cite{lane1983analgorithm}}
+
+This gives an algorithm for filling polygons (which may have ``holes'').
+It requires the ability to ``subtract'' fill from a region; this is (as far as I can tell) difficult for vector graphics devices but simple on raster graphics devices, so the paper claims it is oriented to the raster graphics devices.
+
+If the polygon is defined by $(x_i, y_i)$ then this algorithm iterates from $i = 2$ and alternates between filling and erasing the triangles $[(x_i, y_i), (x_{i+1}, y_{i+1}), (x_1, y_1)]$. It requires no sorting of the points.
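+
+A minimal sketch of the triangle fan idea on a small pixel grid is given below. It is an assumption on my part that ``filling and erasing'' can be modelled by toggling each covered pixel (XOR), which is how the stencil buffer adaptation is usually described; the paper's own rule for when to fill and when to erase is not reproduced here. The names \texttt{fan\_fill} and \texttt{point\_in\_triangle} are illustrative.

```python
def point_in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of triangle abc."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def fan_fill(vertices, width, height):
    """Fill a polygon by toggling the pixels of each fan triangle.

    vertices[0] plays the role of (x_1, y_1); no sorting is needed.
    Pixels are sampled at their centres. Caveat of this sketch: a centre
    lying exactly on an edge shared by two fan triangles is toggled
    twice, so a real rasteriser needs a consistent tie-break rule.
    """
    grid = [[0] * width for _ in range(height)]
    pivot = vertices[0]
    for i in range(1, len(vertices) - 1):
        a, b = vertices[i], vertices[i + 1]
        for y in range(height):
            for x in range(width):
                if point_in_triangle((x + 0.5, y + 0.5), pivot, a, b):
                    grid[y][x] ^= 1  # fill if empty, erase if already filled
    return grid
```

+For the concave polygon $(0,0), (6,0), (6,4), (3,1.5), (0,4)$ on a $6\times 4$ grid, the second triangle erases part of the first and the third fills part of it back in, leaving only the pixels inside the notch unfilled.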
+
+The paper provides a proof that the algorithm is correct and is ``optimal in the number of pixel updates required for convex polygons''.
+In the conclusion it is noted that trapezoids could be used from a fixed line and edge of the polygon, but this is not pixel optimal.
+
+This paper doesn't have a very high citation count but it is cited by the NVIDIA article \cite{kilgard2012gpu}.
+Apparently someone else adapted this algorithm for use with the stencil buffer.
+
+\section{GPU-accelerated path rendering \cite{kilgard2012gpu, kilgard300programming}}
+
+Vector graphics on the GPU; an NVIDIA extension. \cite{kilgard300programming} is the API.
+
+Motivations:
+\begin{itemize}
+       \item The focus has been on 3D acceleration in GPUs; most path rendering is done by the CPU.
+       \item Touch devices allow the screen to be transformed rapidly; CPU rastering of the path becomes inefficient
+       \begin{itemize}
+               \item The source of the ugly pixelated effects on a smartphone when scaling?
+       \end{itemize}
+       \item Especially when combined with increased resolution of these devices
+       \item Standards such as HTML5, SVG, etc, expose path rendering
+       \item Javascript is getting fast enough that we can't blame it anymore (the path rendering is the bottleneck not the JS)
+       \item GPU is more power efficient than the CPU
+\end{itemize}
+
+Results show the extension is faster than almost every renderer it was compared with for almost every test image.
+
+Comparisons to other attempts:
+\begin{itemize}
+       \item Cairo and Glitz \cite{nilsson2004glitz} (abandoned)
+       \item Direct2D from Microsoft uses the CPU to tessellate trapezoids and then renders these on the GPU
+       \item Skia in Android/Chrome uses CPU but now has Ganesh which is also hybrid CPU/GPU
+       \item Khronos Group created OpenVG\cite{rice2008openvg} with several companies creating hardware units to implement the standard. Performance is not as good as ``what we report''
+\end{itemize}
+
+
+\section{A Multiple Precision Binary Floating Point Library With Correct Rounding \cite{fousse2007mpfr}}
+
+This is what is now the GNU MPFR C library; it has packages in Debian wheezy.
+
+The library was motivated by the lack of existing arbitrary precision libraries which conformed to IEEE rounding standards.
+Examples of software without such rounding include Mathematica and GNU MP (which this library is actually built upon); Maple is an exception but is buggy.
+
+TODO: Read up on IEEE rounding to better understand the first section
+
+Data:
+\begin{itemize}
+       \item $(s, m, e)$ where $e$ is signed
+       \item Precision $p$ is number of bits in $m$
+       \item $\frac{1}{2} \leq m < 1$
+       \item The leading bit of the mantissa is always $1$ but it is not implied
+       \item There are no denormalised numbers
+       \item Mantissa is stored as an array of ``limbs'' (unsigned integers) as in GMP
+\end{itemize}
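+
+To get a feel for this representation, the sketch below splits an ordinary Python float into $(s, m, e)$ using \texttt{math.frexp}, which happens to use the same $\frac{1}{2} \leq m < 1$ normalisation. This is only an illustration and not MPFR's own code; the helper names, the 64 bit limb width and the least-significant-limb-first ordering are assumptions made for the example.

```python
import math


def decompose(x):
    """Split a finite non-zero float into (s, m, e).

    x == (-1)**s * m * 2**e with 1/2 <= m < 1, so the leading mantissa
    bit is always 1 and is stored explicitly.
    """
    s = 1 if math.copysign(1.0, x) < 0 else 0
    m, e = math.frexp(abs(x))
    return s, m, e


def to_limbs(m, precision, limb_bits=64):
    """Store a mantissa of `precision` bits as unsigned integer limbs.

    Assumes `precision` is a multiple of `limb_bits` and at least 53, so
    the scaled mantissa is an exact integer. Least significant limb first.
    """
    n = int(math.ldexp(m, precision))  # m * 2**precision as an integer
    mask = (1 << limb_bits) - 1
    return [(n >> (i * limb_bits)) & mask for i in range(precision // limb_bits)]
```

+For example, $6 = 0.75 \times 2^3$ gives $(s, m, e) = (0, 0.75, 3)$, and with $p = 128$ the top bit of the most significant limb is set, as expected for a normalised mantissa.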
+
+The paper includes performance comparisons with several other libraries, and a list of literature using the MPFR (the dates indicating that the library has been used reasonably widely before this paper was published).
+
+\section{Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic \cite{kahan1996ieee754}}
+
+11 years after the IEEE 754 standard became official, Kahan discusses various misunderstood features of the standard and laments the failure of compilers and microprocessors to conform to some of these.
+
+I am not sure how relevant these complaints are today, but it makes for interesting reading as Kahan is clearly very passionate about the need to conform \emph{exactly} to IEEE 754.
+
+Issues considered are: Rounding rules, Exception handling and NaNs (eg: The payload of NaNs is not used in any software Kahan is aware of), Bugs in compilers (mostly Fortran) where an expression contains floats of different precisions (the compiler may attempt to optimise an expression resulting in a failure to round a double to a single), Infinity (which is unsupported by many compilers though it is supported in hardware)...
+
+
+An example is this Fortran compiler ``nasty bug'' where the compiler replaces $s$ with $x$ in line 4 and thus a rounding operation is lost.
+\begin{lstlisting}[language=Fortran, basicstyle=\ttfamily]
+       real(8) :: x, y ! double precision (or extended as real(12))
+       real(4) :: s, t ! single precision
+       s = x ! s should be rounded
+       t = (s - y) / (...) ! Compiler incorrectly replaces s with x in this line
+\end{lstlisting}
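+
+The effect of the lost rounding can be imitated in Python, whose floats are binary64, by rounding to binary32 explicitly with the \texttt{struct} module. This is only a sketch: the values of $x$ and $y$ are chosen by me to make the difference obvious, and the elided denominator from the listing is left out.

```python
import struct


def to_single(v):
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


x = 0.1  # double precision
y = 0.1  # double precision

s = to_single(x)            # s = x, correctly rounded to single
correct = to_single(s - y)  # what the program asked for
buggy = to_single(x - y)    # what results if s is replaced by x

print(correct, buggy)
```

+Here \texttt{buggy} is exactly zero, while \texttt{correct} is the small non-zero rounding error of storing $0.1$ in single precision, so the two programs disagree in every significant digit.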
+
+\subsection{The Baleful Influence of Benchmarks \cite{kahan1996ieee754} pg 20}
+
+This section discusses the tendency to rate hardware (or software) on speed alone and to neglect accuracy.
+
+Is this complaint still relevant now when we consider the eagerness to perform calculations on the GPU?
+ie: Doing the coordinate transforms on the GPU is faster, but less accurate (see our program).
+
+Benchmarks need to be well designed when considering accuracy; how do we know an inaccurate computer hasn't gotten the right answer for a benchmark by accident?
+
+A proposed benchmark for determining the worst case rounding error is given. This is based around computing the roots of a quadratic equation.
+A quick scan of the source code for paranoia does not reveal such a test.
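+
+The sketch below is \emph{not} Kahan's benchmark; it only illustrates why the roots of a quadratic are a sensitive probe of rounding. The textbook formula loses most of its digits to cancellation when $b^2 \gg 4ac$, whereas a rearranged formula does not. Both functions are my own and assume real roots with $a \neq 0$ and $b \neq 0$.

```python
import math


def roots_naive(a, b, c):
    """Textbook quadratic formula; returns (smaller, larger) root."""
    d = math.sqrt(b * b - 4 * a * c)
    return tuple(sorted(((-b - d) / (2 * a), (-b + d) / (2 * a))))


def roots_stable(a, b, c):
    """Avoids subtracting nearly equal numbers; returns (smaller, larger)."""
    d = math.sqrt(b * b - 4 * a * c)
    q = -0.5 * (b + math.copysign(d, b))  # b and d are added with equal signs
    return tuple(sorted((q / a, c / q)))


# x^2 - 1e8 x + 1 = 0 has roots very close to 1e-8 and 1e8.
small_naive, _ = roots_naive(1.0, -1e8, 1.0)
small_stable, _ = roots_stable(1.0, -1e8, 1.0)
print(small_naive, small_stable)
```

+On binary64 hardware the naive small root is wrong in its leading digits, while the rearranged version agrees with $10^{-8}$ to about machine precision.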
+
+As we needed benchmarks for CPUs perhaps we need benchmarks for GPUs. The GPU Paranoia paper\cite{hillesland2004paranoia} is the only one I have found so far.
+
+\section{Prof W. Kahan's Web Pages \cite{kahanweb}}
+
+William Kahan, architect of the IEEE 754-1985 standard, creator of the program ``paranoia'', the ``Kahan Summation Algorithm'' and contributor to the revised standard. 
+
+Kahan's web page has more examples of errors in floating point arithmetic (and places where things have not conformed to the IEEE 754 standard) than you can poke a stick at.
+
+Wikipedia's description is pretty accurate: ``He is an outspoken advocate of better education of the general computing population about floating-point issues, and regularly denounces decisions in the design of computers and programming languages that may impair good floating-point computations.''
+
+Kahan's articles are written with almost religious zeal but they are backed up by examples and results, a couple of which I have confirmed.\footnote{One that doesn't work is an example of weird arithmetic in Excel 2000 when repeated in LibreOffice Calc 4.1.5.3} This is the unpublished work. I haven't read any of the published papers yet. 
+
+The articles are filled sporadically with lamentation for the decline of experts in numerical analysis (and error), which is somewhat ironic: had there been no IEEE 754 standard allowing any man/woman and his/her dog to write floating point arithmetic and expect it to produce platform independent results\footnote{Even if, as Kahan's articles often point out, platforms aren't actually following IEEE 754}, this decline would probably not have happened.
+
+These examples would be of real value if the topic of the project were on accuracy of numerical operations. They also explain some of the more bizarre features of IEEE 754 (in a manner attacking those who dismiss these features as being too bizarre to care about of course).
+
+It's somewhat surprising he hasn't written anything (that I could see from scanning the list) about the lack of IEEE in PDF/PostScript (until Adobe Reader 6) and further that the precision is only binary32.
+
+I kind of feel really guilty saying this, but since our aim is to obtain arbitrary scaling, it will be inevitable that we break from the IEEE 754 standard. Or at least, I (Sam) will. Probably. Also there is no way I will have time to read and understand the thousands of pages that Kahan has written.
+
+Therefore we might end up doing some things Kahan would not approve of.
+
+Still this is a very valuable reference to go in the overview of floating point arithmetic section.
+
  \chapter{General Notes}
  
  \section{The DOM Model}