X-Git-Url: https://git.ucc.asn.au/?p=ipdf%2Fsam.git;a=blobdiff_plain;f=chapters%2FBackground.tex;h=3cf44a8840bbba40cce6fff3f301c1c40381e133;hp=8b8ad5038f4882533be7437686828d2ad112de87;hb=0d7e6aa4d2966020240ea5b5f2a824502f271eaa;hpb=1e1740165abac91f4f620ef8223a30e37e7124ab diff --git a/chapters/Background.tex b/chapters/Background.tex index 8b8ad50..3cf44a8 100644 --- a/chapters/Background.tex +++ b/chapters/Background.tex @@ -71,7 +71,7 @@ Haye's 2012 article ``Pixels or Perish'' discusses the recent history and curren \subsection{The Portable Document Format} -Adobe's Portable Document Format (PDF) is currently used almost universally for sharing documents; the ability to export or print to PDF can be found in most graphical document editors and even some plain text editors\cite{cheng2002finally}. +Adobe's Portable Document Format (PDF) is currently used almost universally for sharing documents; the ability to export or print to PDF can be found in most graphical document editors and even some plain text editors\cite{cheng2002portable}. Hayes describes PDF as ``... essentially 'flattened' PostScript; it’s what’s left when you remove all the procedures and loops in a program, replacing them with sequences of simple drawing commands.''\cite{hayes2012pixels}. Consultation of the PDF 1.7 standard shows that this statement does not a give a complete picture --- despite being based on the Adobe PostScript model of a document as a series of ``pages'' to be printed by executing sequential instructions, from version 1.5 the PDF standard began to borrow some ideas from the Document Object Model. For example, interactive elements such as forms may be included as XHTML objects and styled using CSS. ``Actions'' are objects used to modify the data structure dynamically. In particular, it is possible to include Javascript Actions. Adobe defines the API for Javascript actions seperately to the PDF standard\cite{js_3d_pdf}. There is some evidence in the literature of attempts to exploit these features, with mixed success\cite{barnes2013embedding, hayes2012pixels}. @@ -88,9 +88,9 @@ The PostScript reference describes a ``Real'' object for representing coordinate \subsection{PDF} PDF defines ``Real'' objects in a similar way to PostScript, but suggests a range of $\pm3.403\times10^{38}$ and smallest non-zero values of $\pm1.175\times10^{38}$\cite{pdfref17}. A note in the PDF 1.7 manual mentions that Acrobat 6 now uses IEEE-754 single precision floats, but ``previous versions used 32-bit fixed point numbers'' and ``... Acrobat 6 still converts floating-point numbers to fixed point for some components''. -\subsection{\TeX and METAFONT} +\subsection{{\TeX} and METAFONT} -In ``The METAFONT book'' Knuth appears to describe coordinates as fixed point numbers: ``The computer works internally with coordinates that are integer multiples of $\frac{1}{65536} \approx 0.00002$ of the width of a pixel''\cite{knuth1983metafont}. There is no mention of precision in ``The \TeX book''. In 2007 Beebe claimed that {\TeX} uses a $14.16$ fixed point encoding, and that this was due to the lack of standardised floating point arithmetic on computers at the time; a problem that the IEEE-754 was designed to solve\cite{beebe2007extending}. Beebe also suggests that \TeX and METAFONT could now be modified to use IEEE-754 arithmetic. +In ``The METAFONT book'' Knuth appears to describe coordinates as fixed point numbers: ``The computer works internally with coordinates that are integer multiples of $\frac{1}{65536} \approx 0.00002$ of the width of a pixel''\cite{knuth1983metafont}. \footnote{This corresponds to using $16$ bits for the fractional component of a fixed point representation} There is no mention of precision in ``The {\TeX} book''. In 2007 Beebe claimed that {\TeX} uses a $14.16$ fixed point encoding, and that this was due to the lack of standardised floating point arithmetic on computers at the time; a problem that the IEEE-754 was designed to solve\cite{beebe2007extending}. Beebe also suggested that {\TeX} and METAFONT could now be modified to use IEEE-754 arithmetic. \subsection{SVG} @@ -107,7 +107,9 @@ The Number type does differ slightly from IEEE-754 in that there is only a singl \section{Number Representations}\label{Number Representations} -Consider a value of $7.25 = 2^2 + 2^1 + 2^0 + 2^{-2}$. In binary (base 2), this could be written as $111.01_2$ Such a value would require 5 binary digits (bits) of memory to represent exactly in computer hardware. Some values, for example $7.3$ can not be represented exactly in one base (decimal) but not another; in binary the sequence $111.010\text{...}_2$ will never terminate. A rational value such as $\frac{7}{3}$ could not be represented exactly in any base, but could be represented by the combination of a numerator $7 = 111_2$ and denominator $3 = 11_2$. Lastly, some values such as $e \approx 2.81\text{...}$ can only be expressed exactly using a symbolical system --- in this case as the result of an infinite summation --- $e = \displaystyle\sum_n=0^{\infty}\frac{1}{n!}$ +\subsection{Exact Representations} + +Consider a value of $7.25 = 2^2 + 2^1 + 2^0 + 2^{-2}$. In binary (base 2), this could be written as $111.01_2$ Such a value would require 5 binary digits (bits) of memory to represent exactly in computer hardware. Some values, for example $7.3$ can not be represented exactly in one base (decimal) but not another; in binary the sequence $111.010\text{...}_2$ will never terminate. A rational value such as $\frac{7}{3}$ could not be represented exactly in any base, but could be represented by the combination of a numerator $7 = 111_2$ and denominator $3 = 11_2$. Lastly, some values such as $e \approx 2.718\text{...}$ can only be expressed exactly using a symbolical system --- in this case as the result of an infinite summation $e = \displaystyle\sum_n^{\infty} 1/n!$ Modern computer hardware typically supports integer and floating-point number representations and operations. Due to physical limitations, the size of these representations is limited; this is the fundamental source of both limits on range and precision in computer based calculations. @@ -115,11 +117,7 @@ Modern computer hardware typically supports integer and floating-point number re Whilst a Fixed Point representation keeps the ``point'' at the same position in a string of bits, Floating point representations can be thought of as analogous to scientific notation; an ``exponent'' and fixed point value are encoded, with multiplication by the exponent moving the position of the point. -A floating point number $x$ is commonly represented by a tuple of values $(s, e, m)$ in base $B$ as\cite{HFP, ieee2008-754}: - -\begin{align*} - x &= (-1)^{s} \times m \times B^{e} -\end{align*} +A floating point number $x$ is commonly represented by a tuple of values $(s, e, m)$ in base $B$ as\cite{HFP, ieee2008-754}: $x = (-1)^{s} \times m \times B^{e}$ Where $s$ is the sign and may be zero or one, $m$ is commonly called the ``mantissa'' and $e$ is the exponent. Whilst $e$ is an integer in some range $\pm e_max$, the mantissa $m$ is a fixed point value in the range $0 < m < B$. @@ -130,7 +128,7 @@ The IEEE-754 encoding of $s$, $e$ and $m$ requires a fixed number of continuous The encoding of $m$ in the IEEE-754 standard is not exactly equivelant to a fixed point value. By assuming an implicit leading bit (ie: restricting $1 \leq m < 2$) except for when $e = 0$, floating point values are gauranteed to have a unique representations; these representations are said to be ``normalised''. When $e = 0$ the leading bit is not implied; these representations are called ``denormals'' because multiple representations may map to the same real value. The idea of using an implicit bit appears to have been considered by Goldberg as early as 1967\cite{goldbern1967twentyseven}. -Figure \ref{float.pdf}\footnote{In a digital PDF viewer we suggest increasing the zoom level --- the graphs were created from SVG images} shows the positive real numbers which can be represented exactly by an 8 bit floating point number encoded in the IEEE-754 format\footnote{Not quite; we are ignoring the IEEE-754 definitions of NaN and Infinity for simplicity}, and the distance between successive floating point numbers. We show two encodings using (1,2,5) and (1,3,4) bits to encode (sign, exponent, mantissa) respectively. For each distinct value of the exponent, the successive floating point representations lie on a straight line with constant slope. As the exponent increases, larger values are represented, but the distance between successive values increases; this can be seen on the right. The marked single point discontinuity at \verb/0x10/ and \verb/0x20/ occur when $e$ leaves the denormalised region and the encoding of $m$ changes. We have also plotted a fixed point representation for comparison; fixed point and integer representations appear as straight lines - the distance between points is always constant. +Figure \ref{floats.pdf}\footnote{In a digital PDF viewer we suggest increasing the zoom level --- the graphs were created from SVG images} shows the positive real numbers which can be represented exactly by an 8 bit floating point number encoded in the IEEE-754 format\footnote{Not quite; we are ignoring the IEEE-754 definitions of NaN and Infinity for simplicity}, and the distance between successive floating point numbers. We show two encodings using (1,2,5) and (1,3,4) bits to encode (sign, exponent, mantissa) respectively. For each distinct value of the exponent, the successive floating point representations lie on a straight line with constant slope. As the exponent increases, larger values are represented, but the distance between successive values increases; this can be seen on the right. The marked single point discontinuity at \verb/0x10/ and \verb/0x20/ occur when $e$ leaves the denormalised region and the encoding of $m$ changes. We have also plotted a fixed point representation for comparison; fixed point and integer representations appear as straight lines - the distance between points is always constant. The earlier example $7.25$ would be converted to a (1,3,4) floating point representation as follows: \begin{enumerate} @@ -192,7 +190,7 @@ In 1971 Dekker formalised a series of algorithms including the Fast2Sum method f \subsection{Arbitrary Precision Floating Point Numbers} -Arbitrary precision floating point numbers are implemented in a variety of software libraries which will dynamically allocate extra bits for the exponent or mantissa as required. An example is the GNU MPFR library discussed by Fousse in 2007\cite{fouse2007mpfr}. Although many arbitrary precision libraries already existed, MPFR intends to be fully compliant with some of the more obscure IEEE-754 requirements such as rounding rules and exceptions. +Arbitrary precision floating point numbers are implemented in a variety of software libraries which will dynamically allocate extra bits for the exponent or mantissa as required. An example is the GNU MPFR library discussed by Fousse in 2007\cite{fousse2007mpfr}. Although many arbitrary precision libraries already existed, MPFR intends to be fully compliant with some of the more obscure IEEE-754 requirements such as rounding rules and exceptions. As we have seen, it is trivial to find real numbers that would require an infinite number of bits to represent exactly. Implementations of ``arbitrary'' precision must carefully determine at what point rounding should occur so as to balance performance with memory usage.