X-Git-Url: https://git.ucc.asn.au/?p=ipdf%2Fsam.git;a=blobdiff_plain;f=chapters%2FBackground%2FFloats%2FOperations.tex;fp=chapters%2FBackground%2FFloats%2FOperations.tex;h=1a724f97580290630835a14e7eb38ab385249e18;hp=0000000000000000000000000000000000000000;hb=9fcf44a0c34f393689118e913a2d17d907036c85;hpb=d5e7e14d2ec624cfe0febcccd81e95082ef1c175;ds=sidebyside

diff --git a/chapters/Background/Floats/Operations.tex b/chapters/Background/Floats/Operations.tex
new file mode 100644
index 0000000..1a724f9
--- /dev/null
+++ b/chapters/Background/Floats/Operations.tex
@@ -0,0 +1,29 @@
+
+
+Real values which cannot be represented exactly in a floating point representation must be rounded to the nearest floating point value. The result of a floating point operation will in general be such a value, so any floating point operation may introduce a rounding error. Referring to Figure \ref{floats.pdf}, it can be seen that the largest possible rounding error is half the distance between successive floats; since this distance grows with magnitude, the absolute rounding error increases as the magnitude of the value to be represented increases.
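+
+As an illustrative sketch (assuming a hypothetical decimal format, $\beta = 10$, with a three digit integer mantissa; these parameters are chosen only for illustration and are not taken from any particular standard format), the spacing of successive floats near $1230$ and near $1.23$ is:
+\begin{align*}
+	124 \times 10^{1} - 123 \times 10^{1} &= 10 \qquad \text{(rounding error at most $5$)} \\
+	124 \times 10^{-2} - 123 \times 10^{-2} &= 0.01 \qquad \text{(rounding error at most $0.005$)}
+\end{align*}
+The bound on the absolute error grows by a factor of $\beta$ each time the exponent increases by one, while the error relative to the value being represented stays of the same order.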
+
+
+
+{\bf Put this stuff in an Appendix?}
+\subsection{Addition and Subtraction}
+
+According to the IEEE-754 standard, if $e_1 < e_2$, then the preferred form of $f_1 \pm f_2$, where $f_i = m_i \beta^{e_i}$, is:
+\begin{align}
+	m_1 \beta^{e_1} \pm m_2 \beta^{e_2} &= (m_1 \pm \beta^{e_2 - e_1} m_2) \beta^{e_1}
+\end{align}
+
+This is equivalent to shifting the digits of $m_2$ to the left by $e_2 - e_1$ places (moving its radix point to the right), and then performing fixed point addition or subtraction. If the addition produces a carry out of the most significant digit, divide the result by $\beta$ (i.e. shift the digits one place to the right) and increment the exponent. Then normalise the result: shift out any leading zeros left in the mantissa by a subtraction, and subtract their number from the exponent. Lastly, perform the rounding operation; if rounding itself generates a carry, shift right and increment the exponent again.
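+
+As a worked sketch of these steps (assuming a hypothetical decimal format, $\beta = 10$, with a three digit integer mantissa and round-to-nearest; the operands are chosen only for illustration), take $f_1 = 456 \times 10^{-1}$ and $f_2 = 123 \times 10^{1}$, so that $e_2 - e_1 = 2$:
+\begin{align*}
+	456 \times 10^{-1} + 123 \times 10^{1} &= (456 + 10^{2} \times 123) \times 10^{-1} \\
+	&= 12756 \times 10^{-1} \\
+	&\approx 128 \times 10^{1}
+\end{align*}
+The exact sum $1275.6$ needs five digits, so it is rounded to the nearest three digit value, $1280$; the rounding error of $4.4$ is below half the spacing ($5$) of floats with exponent $1$.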
+
+
+\subsection{Multiplication and Division}
+\begin{align}
+	m_1 \beta^{e_1} \times m_2 \beta^{e_2} &= (m_1 \times m_2 ) \beta^{e_1 + e_2}
+\end{align}
+
+\begin{align}
+	m_1 \beta^{e_1} \div m_2 \beta^{e_2} &= (m_1 \div m_2 ) \beta^{e_1 - e_2}
+\end{align}
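+
+As an illustrative sketch of the multiplication rule (assuming a hypothetical decimal format, $\beta = 10$, with a three digit integer mantissa and round-to-nearest; the operands are chosen only for illustration):
+\begin{align*}
+	123 \times 10^{1} \times 456 \times 10^{-1} &= (123 \times 456) \times 10^{1 + (-1)} \\
+	&= 56088 \times 10^{0} \\
+	&\approx 561 \times 10^{2}
+\end{align*}
+The product of two mantissas can have up to twice as many digits as either operand, so it must be rounded (here with an error of $12$) before it can be stored.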
+
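+Because the result of each operation is rounded, floating point multiplication and division are not exact inverses: $(f_1 \div f_2) \times f_2$ need not equal $f_1$.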
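+
+For instance, in a hypothetical decimal format ($\beta = 10$, three digit integer mantissa, round-to-nearest; an assumption made purely for illustration), dividing $1$ by $3$ and multiplying the result by $3$ does not recover $1$:
+\begin{align*}
+	(1 \times 10^{0}) \div (3 \times 10^{0}) &\approx 333 \times 10^{-3} \\
+	(333 \times 10^{-3}) \times (3 \times 10^{0}) &= 999 \times 10^{-3} \neq 1 \times 10^{0}
+\end{align*}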
+
+Floating point operations can in principle be performed using integer operations, but specialised Floating Point Units (FPUs) are an almost universal component of modern processors\cite{kelley1997acmos}. Improving FPUs remains an active area of research, with work on efficiency\cite{seidel2001onthe}, on the accuracy of operations\cite{dieter2007lowcost}, and even on the adaptation of algorithms originally used in software, such as Kahan's Fast2Sum algorithm\cite{kadric2013accurate}.