X-Git-Url: https://git.ucc.asn.au/?p=ipdf%2Fsam.git;a=blobdiff_plain;f=chapters%2FBackground%2FFloats%2FDefinition.tex;h=76b355107dbfa88318ee5b4048ffedc33825455b;hp=50029630f1b220b535c34a86a93d1fc966ba7ddc;hb=HEAD;hpb=a1ede3cfc3ef650aa0f7d3d06e78c6c6ef4cb0cc

diff --git a/chapters/Background/Floats/Definition.tex b/chapters/Background/Floats/Definition.tex
index 5002963..76b3551 100644
--- a/chapters/Background/Floats/Definition.tex
+++ b/chapters/Background/Floats/Definition.tex
@@ -5,7 +5,7 @@ Whilst a Fixed Point representation keeps the ``point'' (the location considered
-A floating point number $x$ is commonly represented by a tuple of values $(s, e, m)$ in base $B$ as\cite{HFP, ieee2008-754}: $x = (-1)^{s} \times m \times B^{e}$
+A floating point number $x$ is commonly represented by a tuple of values $(s, e, m)$ in base $B$ as\cite{HFP, ieee754std2008}: $x = (-1)^{s} \times m \times B^{e}$
 Where $s$ is the sign and may be zero or one, $m$ is commonly called the ``mantissa'' and $e$ is the exponent. Whilst $e$ is an integer in some range $\pm e_{max}$, the mantissa $m$ is a fixed point value in the range $0 < m < B$. The choice of base $B = 2$ in the original IEEE-754 standard matches the nature of modern hardware. It has also been found that this base in general gives the smallest rounding errors\cite{HFP}.
@@ -15,3 +15,5 @@ The IEEE-754 encoding of $s$, $e$ and $m$ requires a fixed number of continuous
 The encoding of $m$ in the IEEE-754 standard is not exactly equivalent to a fixed point value. By assuming an implicit leading bit (i.e. restricting $1 \leq m < 2$) except for when $e = 0$, floating point values are guaranteed to have a unique representation; these representations are said to be ``normalised''. When $e = 0$ the leading bit is not implied; these representations are called ``denormals'' because multiple representations may map to the same real value.
 The idea of using an implicit bit appears to have been considered by Goldberg as early as 1967\cite{goldbern1967twentyseven}, and it leads to an increase in precision near the origin.
+The IEEE-754 standard also defines $e$ with a biased encoding and allows representation of the special values $\pm \infty$ and different types of \texttt{NaN} (Not a Number), which can occur due to invalid operations (such as dividing zero by zero). A more detailed overview of IEEE-754 can be found in the ``Handbook of Floating Point Arithmetic''\cite{HFP}.
+
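As a worked illustration of the $(s, e, m)$ representation and the biased exponent encoding, take $B = 2$ and $(s, e, m) = (0, -3, 1.25)$. The value $0.15625$ and the IEEE-754 single precision format (8 exponent bits, 23 stored mantissa bits and a bias of $127$) are chosen here purely as an example; they are assumptions for illustration rather than part of the text above.
\[
x = (-1)^{0} \times 1.25 \times 2^{-3} = 0.15625
\]
Since $1 \leq m < 2$ the representation is normalised, so only the fractional part $m - 1 = 0.25 = 0.01_2$ needs to be stored. The exponent is stored as the biased value $e + 127 = 124 = 01111100_2$, giving the 32 bit pattern
\[
\underbrace{0}_{s} \; \underbrace{01111100}_{e + 127} \; \underbrace{01000000000000000000000}_{m - 1}
\]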