Notes on Goldberg papers

author Sam Moore <[email protected]>

Mon, 31 Mar 2014 00:49:35 +0000 (08:49 +0800)

committer Sam Moore <[email protected]>

Mon, 31 Mar 2014 00:49:35 +0000 (08:49 +0800)
author Sam Moore <[email protected]>
Mon, 31 Mar 2014 00:49:35 +0000 (08:49 +0800)
committer Sam Moore <[email protected]>
Mon, 31 Mar 2014 00:49:35 +0000 (08:49 +0800)
diff --git a/LiteratureNotes.pdf b/LiteratureNotes.pdf

index ef77c50..8760bb2 100644 (file)

Binary files a/LiteratureNotes.pdf and b/LiteratureNotes.pdf differ
diff --git a/LiteratureNotes.tex b/LiteratureNotes.tex

index fd86b5a..6dfa524 100644 (file)
--- a/LiteratureNotes.tex
+++ b/LiteratureNotes.tex
@@ -188,6 +188,74 @@ This paper uses Metaphors a lot. I never met a phor that didn't over extend itse
         \item Title pretty much summarises it; similar to \cite{hayes2012pixels} except these guys actually did something practical
  \end{itemize}
  
+\section{27 Bits are not enough for 8 digit accuracy\cite{goldberg1967twentyseven}}
+
+Proves with maths, that rounding errors mean that you need at least $q$ bits for $p$ decimal digits. $10^p < 2^{q-1}$
+
+\begin{itemize}
+       \item Eg: For 8 decimal digits, since $10^8 < 2^{27}$ would expect to be able to represent with 27 binary digits
+       \item But: Integer part requires digits bits (regardless of fixed or floating point represenetation)
+       \item Trade-off between precision and range
+       \begin{itemize}
+               \item 9000000.0 $\to$ 9999999.9 needs 24 digits for the integer part $2^{23} = 83886008$
+       \end{itemize}
+       \item Floating point zero = smallest possible machine exponent
+       \item Floating point representation:
+       \begin{align*}
+               y &= 0.y_1 y_2 \text{...} y_q \times 2^{n}
+       \end{align*}
+       \item Can eliminate a bit by considering whether $n = -e$ for $-e$ the smallest machine exponent (???)
+       \begin{itemize}
+               \item Get very small numbers with the same precision    
+               \item Get large numbers with the extra bit of precision
+       \end{itemize}
+\end{itemize}
+
+\section{What every computer scientist should know about floating-point arithmetic\cite{goldberg1991whatevery}}
+
+\begin{itemize}
+       \item Book: \emph{Floating Point Computation} by Pat Sterbenz (out of print... in 1991)
+    \item IEEE floating point standard becoming popular (introduced in 1987, this is 1991)
+    \begin{itemize}
+               \item As well as structure, defines the algorithms for addition, multiplication, division and square root
+               \item Makes things portable because results of operations are the same on all machines (following the standard)
+               \item Alternatives to floating point: Floating slasi and Signed Logarithm (TODO: Look at these, although they will probably not be useful)
+
+       \end{itemize}
+       \item Base $\beta$ and precision $p$ (number of digits to represent with) - powers of the base can be represented exactly.
+       \item Largest and smallest exponents $e_{min}$ and $e_{max}$
+       \item Need bits for exponent and fraction, plus one for sign
+       \item ``Floating point number'' is one that can be represented exactly.
+       \item Representations are not unique! $0.01 \times 10^1 = 1.00 \times 10^{-1}$ Leading digit of one $\implies$ ``normalised''
+       \item Requiring the representation to be normalised makes it unique, {\bf but means it is impossible to represent zero}.
+       \begin{itemize}
+               \item Represent zero as $1 \times \beta^{e_{min}-1}$ - requires extra bit in the exponent
+       \end{itemize}
+       \item {\bf Rounding Error}
+       \begin{itemize}
+               \item ``Units in the last place'' eg: 0.0314159 compared to 0.0314 has ulp error of 0.159
+               \item If calculation is the nearest floating point number to the result, it will still be as much as 1/2 ulp in error
+               \item Relative error corresponding to 1/2 ulp can vary by a factor of $\beta$ ``wobble''. Written in terms of $\epsilon$
+               \item Maths $\implies$ {\bf Relative error is always bounded by $\epsilon = (\beta/2)\beta^{-p}$}
+               \item Fixed relative error $\implies$ ulp can vary by a factor of $\beta$ . Vice versa
+               \item Larger $\beta \implies$ larger errors
+       \end{itemize}
+       \item {\bf Guard Digits}
+       \begin{itemize}
+               \item In subtraction: Could compute exact difference and then round; this is expensive
+               \item Keep fixed number of digits but shift operand right; discard precision. Lead to relative error up to $\beta - 1$
+               \item Guard digit: Add extra digits before truncating. Leads to relative error of less than $2\epsilon$. This also applies to addition
+       \end{itemize}
+       \item {\bf Catastrophic Cancellation} - Operands are subject to rounding errors - multiplication
+       \item {\bf Benign Cancellation} - Subtractions. Error $< 2\epsilon$
+       \item Rearrange formula to avoid catastrophic cancellation
+       \item Historical interest only - speculation on why IBM used $\beta = 16$ for the system/370 - increased range? Avoids shifting
+       \item Precision: IEEE defines extended precision (a lower bound only)
+       \item Discussion of the IEEE standard for operations (TODO: Go over in more detail)
+       \item NaN allow continuing with underflow and Infinity with overflow
+       \item ``Incidentally, some people think that the solution to such anomalies is never to compare floating-point numbers for equality but instead to consider them equal if they are within some error bound E. This is hardly a cure all, because it raises as many questions as it answers.'' - On equality of floating point numbers
+
+\end{itemize}
  
  
  
diff --git a/ProjectProposalDavid.pdf b/ProjectProposalDavid.pdf

index 0dac40c..6c3f16e 100644 (file)

Binary files a/ProjectProposalDavid.pdf and b/ProjectProposalDavid.pdf differ
diff --git a/ProjectProposalSam.pdf b/ProjectProposalSam.pdf

index 8bd080e..7cc5d2f 100644 (file)

Binary files a/ProjectProposalSam.pdf and b/ProjectProposalSam.pdf differ
diff --git a/papers.bib b/papers.bib

index bb77ec4..c4a857c 100644 (file)
--- a/papers.bib
+++ b/papers.bib
@@ -20,7 +20,7 @@
  % Floating-pt Precision
  %%%%%%%%%%%%%%%%%%%%%%%
  Goldberg:1991:CSK:103162.103163,
-@article{goldberg91whatevery,
+@article{goldberg1991whatevery,
   author = {Goldberg, David},
   title = {What Every Computer Scientist Should Know About Floating-point Arithmetic},
   journal = {ACM Comput. Surv.},
@@ -347,3 +347,24 @@ month={Jun},
  pages={132-143},
  keywords={digital arithmetic;number theory;coordinates;floating point arithmetic;intersection point;line intersection;line segment;Algorithm design and analysis;Costs;Error analysis;Floating-point arithmetic;Hardware;High performance computing;Libraries;Mathematics;Packaging;Roundoff errors},
  doi={10.1109/ARITH.1991.145549},}
+
+@article{goldbern1967twentyseven,
+ author = {Goldberg, I. Bennett},
+ title = {27 Bits Are Not Enough for 8-digit Accuracy},
+ journal = {Commun. ACM},
+ issue_date = {Feb. 1967},
+ volume = {10},
+ number = {2},
+ month = feb,
+ year = {1967},
+ issn = {0001-0782},
+ pages = {105--106},
+ numpages = {2},
+ url = {http://doi.acm.org/10.1145/363067.363112},
+ doi = {10.1145/363067.363112},
+ acmid = {363112},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+} 
+
+
diff --git a/references/goldberg1967twentyseven.pdf b/references/goldberg1967twentyseven.pdf

new file mode 100644 (file)

index 0000000..2c30e3f

Binary files /dev/null and b/references/goldberg1967twentyseven.pdf differ
author	Sam Moore <[email protected]>
	Mon, 31 Mar 2014 00:49:35 +0000 (08:49 +0800)
committer	Sam Moore <[email protected]>
	Mon, 31 Mar 2014 00:49:35 +0000 (08:49 +0800)
LiteratureNotes.pdf		patch \| blob \| history
LiteratureNotes.tex		patch \| blob \| history
ProjectProposalDavid.pdf		patch \| blob \| history
ProjectProposalSam.pdf		patch \| blob \| history
papers.bib		patch \| blob \| history
references/goldberg1967twentyseven.pdf	[new file with mode: 0644]	patch \| blob