coordinate system transformations to provide the best possible precision and to prevent round-off errors.''\cite{svg2011-1.1} An SVG Viewer may refer to itself as ``High Quality'' if it uses a minimum of ``double precision'' floats.
\subsection{Javascript}
-We include Javascript here due to its relation with the SVG, HTML5 and PDF standards.
-
+%We include Javascript here due to its relation with the SVG, HTML5 and PDF standards.
According to the EMCA-262 standard, ``The Number type has exactly 18437736874454810627 (that is, $2^64-^53+3$) values,
representing the double-precision 64-bit format IEEE 754 values as specified in the IEEE Standard for Binary Floating-Point Arithmetic''\cite{ecma-262}.
The Number type does differ slightly from IEEE-754 in that there is only a single valid representation of ``Not a Number'' (NaN). The EMCA-262 does not define an ``integer'' representation.
-\section{Number Representations}
-Consider a value of $7.25 = 2^2 + 2^1 + 2^0 + 2^{-2}$. In binary, this can be written as $111.01_2$ Such a value would require 5 binary digits (bits) of memory to represent exactly. On the other hand a rational value such as $7\frac{1}{3}$ could not be represented exactly; the sequence of bits $111.0111 \text{ ...}_2$ never terminates.
+\section{Number Representations}
-Integer representations do not allow for a ``point'' as shown in these examples, that is, no bits are reserved for the fractional part of a number. $7.25$ would be rounded to the nearest integer value, $7$. A fixed point representation keeps the decimal place at the same position for all values. Floating point representations can be thought of as analogous to scientific notation; an ``exponent'' and fixed point value are encoded, with multiplication by the exponent moving the position of the point.
-Modern computer hardware typically supports integer and floating-point number representations. Due to physical limitations, the size of these representations is limited; this is the fundamental source of both limits on range and precision in computer based calculations.
+Consider a value of $7.25 = 2^2 + 2^1 + 2^0 + 2^{-2}$. In binary, this can be written as $111.01_2$ Such a value would require 5 binary digits (bits) of memory to represent exactly. On the other hand a rational value such as $7\frac{1}{3}$ could not be represented exactly; the sequence of bits $111.0111 \text{ ...}_2$ never terminates. Modern computer hardware typically supports integer and floating-point number representations. Due to physical limitations, the size of these representations is limited; this is the fundamental source of both limits on range and precision in computer based calculations.
+A Fixed Point representation keeps the ``point'' at the same position in a string of bits. Floating point representations can be thought of as analogous to scientific notation; an ``exponent'' and fixed point value are encoded, with multiplication by the exponent moving the position of the point.
-\subsection{Floating Point Definition}
+\subsection{Floating Point Definitions}
A floating point number $x$ is commonly represented by a tuple of values $(s, e, m)$ in base $B$ as\cite{HFP, ieee2008-754}:
The IEEE-754 encoding of $s$, $e$ and $m$ requires a fixed number of continuous bits dedicated to each value. Originally two encodings were defined: binary32 and binary64. $s$ is always encoded in a single leading bit, whilst (8,23) and (11,53) bits are used for the (exponent, mantissa) encodings respectively.
-$m$ as encoded in the IEEE-754 standard is not exactly equivelant to a fixed point value. By assuming an implicit leading bit (ie: restricting $1 \leq m < 2$) except for when $e = 0$, floating point values are gauranteed to have unique representations. When $e = 0$ the leading bit is not implied; these values are called ``denormals''. This idea, which allows for 1 extra bit of precision for the normalised values appears to have been considered by Goldberg as early as 1967\cite{goldbern1967twentyseven}.
+The encoding of $m$ in the IEEE-754 standard is not exactly equivelant to a fixed point value. By assuming an implicit leading bit (ie: restricting $1 \leq m < 2$) except for when $e = 0$, floating point values are gauranteed to have a unique representations; these representations are said to be ``normalised''. When $e = 0$ the leading bit is not implied; these representations are called ``denormals'' because multiple representations may map to the same real value. This idea, which allows for one extra bit of precision when using normalised values appears to have been considered by Goldberg as early as 1967\cite{goldbern1967twentyseven}.
-Figure \ref{float.pdf} shows the positive real numbers which can be represented exactly by an 8 bit floating point number encoded in the IEEE-754 format\footnote{Not quite; we are ignoring the IEEE-754 definitions of NaN and Infinity for simplicity}, and the distance between successive floating point numbers. We show two encodings using (1,2,5) and (1,3,4) bits to encode (sign, exponent, mantissa) respectively. We have also plotted a fixed point representation. For each distinct value of the exponent, the successive floating point representations lie on a straight line with constant slope. As the exponent increases, larger values are represented, but the distance between successive values increases; this can be seen on the right. The marked single point discontinuity at \verb/0x10/ and \verb/0x20/ occur when $e$ leaves the denormalised region and the encoding of $m$ changes.
+Figure \ref{float.pdf}\footnote{In a digital PDF viewer we suggest increasing the zoom level --- the graphs were created from SVG images} shows the positive real numbers which can be represented exactly by an 8 bit floating point number encoded in the IEEE-754 format\footnote{Not quite; we are ignoring the IEEE-754 definitions of NaN and Infinity for simplicity}, and the distance between successive floating point numbers. We show two encodings using (1,2,5) and (1,3,4) bits to encode (sign, exponent, mantissa) respectively. For each distinct value of the exponent, the successive floating point representations lie on a straight line with constant slope. As the exponent increases, larger values are represented, but the distance between successive values increases; this can be seen on the right. The marked single point discontinuity at \verb/0x10/ and \verb/0x20/ occur when $e$ leaves the denormalised region and the encoding of $m$ changes. We have also plotted a fixed point representation for comparison; fixed point and integer representations appear as straight lines - the distance between points is always constant.
The earlier example $7.25$ would be converted to a (1,3,4) floating point representation as follows:
\begin{enumerate}
In 1971 Dekker formalised a series of algorithms including the Fast2Sum method for calculating the correction term due to accumulated rounding errors\cite{dekker1971afloating}. The exact result of $x + y$ may be expressed in terms of floating point operations with rounding as follows:
\begin{align*}
- z &= \text{RN}(x + y) w &= \text{RN}(z - x) \\
- zz &= \text{RN}(y - w) \implies x + y &= zz
+ z = \text{RN}(x + y) &\quad w = \text{RN}(z - x) \\
+ zz = \text{RN}(y - w) &\quad \implies x + y = zz
\end{align*}
\subsection{Arbitrary Precision Floating Point Numbers}