X-Git-Url: https://git.ucc.asn.au/?p=ipdf%2Fsam.git;a=blobdiff_plain;f=chapters%2FBackground.tex;h=1d3e141a9bc1982b58ccedab6924ed21f90bf252;hp=f11c1be75f3c0b3af4b6c77a78063b46749e7a79;hb=c9fe0f2e2b310beadf3f046a96f93d255a5b3b38;hpb=439a25f18a7a9ef114a01b11ead914d94d088ef4

diff --git a/chapters/Background.tex b/chapters/Background.tex
index f11c1be..1d3e141 100644
--- a/chapters/Background.tex
+++ b/chapters/Background.tex
@@ -99,29 +99,12 @@ In this section we will consider various approaches and motivations to specifyin
 
 \subsection{The Portable Document Format}
 
-``A PDF file should be thought of as a flattened representation of a data structure
-consisting of a collection of objects that can refer to each other in any arbitrary
-way.''
+Adobe's Portable Document Format (PDF) is used almost universally for sharing documents; the ability to export or print to PDF can be found in most graphical document editors and even in many text editors.
 
-The PDF 1.7 standard describes a format which is \rephrase{essentially PostScript plus everything in the kitchen sink}.
-\begin{itemize}
-	\item PDF is not just crippled postscript
-	\item Objects - has a type system, like a programming language, not like the DOM where all objects are fundamentally the same - this is similar to PostScript
-	\item File structure - Header, body, reference table (location of objects in file), trailer (location of reference table and special objects)
-	\begin{itemize}
-		\item Read the file from the end
-		\item File can be updated incrementally as long as the trailer is at the end
-	\end{itemize}
-	\item Document structure - This is basically a graph, wheras the DOM is a tree
-	\item Content streams - objects but conceptually different - operators or instructions
-	\item Interactivity --- At this point, PDF suddenly changes from being PostScript to being XML
-	\begin{itemize}
-		\item
-	\end{itemize}
-\end{itemize}
+Hayes describes PDF as ``\ldots{} essentially `flattened' PostScript; it's what's left when you remove all the procedures and loops in a program, replacing them with sequences of simple drawing commands''\cite{hayes2012pixelsor}. Consultation of the PDF 1.7 standard shows that this statement does not give a complete picture: despite being based on the Adobe PostScript model of a document as a series of ``pages'' to be printed by executing sequential instructions, from version 1.5 onwards the PDF standard began to borrow some ideas from the Document Object Model discussed in Section \ref{Document Object Model}. For example, interactive elements such as forms may be included as XHTML objects and styled using CSS. ``Actions'' are objects used to modify the data structure dynamically; in particular, it is possible to include JavaScript Actions. Adobe defines the API for JavaScript Actions separately from the PDF standard\cite{js_3d_pdf}. There is evidence in the literature of attempts to exploit these features, with mixed success\cite{barnes2013embedding, hayes2012pixelsor}.
+
+To quote Adobe's PDF 1.7 reference manual, ``A PDF file should be thought of as a flattened representation of a data structure consisting of a collection of objects that can refer to each other in any arbitrary way''\cite{pdfref17}.
 
-The biggest difference between the PDF design philosophy and the HTML5 philosophy is the emphasis in PDF on the actual file format.
-This means PDF is more complicated but also more efficient (at least, we would hope so).
 
 
 \subsection{Scientific Computation Packages}
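The two quoted descriptions of PDF (Hayes' "flattened PostScript" and Adobe's "collection of objects that can refer to each other") can be made concrete with a short sketch. The Python program below is an illustration only, not code from the thesis or from any PDF library; the names flattened_content, build_pdf and find_object are hypothetical. It writes a one-page PDF whose content stream is a flat list of re (rectangle) and S (stroke) operators. The loop that produces them exists only in the Python generator, which is the sense in which the page description is "flattened". The file itself is a header, a list of numbered objects joined by indirect references such as "2 0 R", a cross-reference table of byte offsets, and a trailer, so a reader starts from "startxref" at the end of the file. The reader assumes a single cross-reference section starting at object 0, LF line endings, uncompressed objects and no incremental updates; real files need a proper parser.

```python
"""Minimal sketch: write and read back a one-page PDF using only bytes.

Hypothetical helpers, not a library API. Assumes no compression,
LF line endings, one xref section and no incremental updates.
"""


def flattened_content(n_squares: int = 5) -> bytes:
    # PostScript could draw these squares with a loop; a PDF content stream
    # has no loops, so the loop runs here and only its output is stored:
    # one "x y w h re" (rectangle) followed by "S" (stroke) per square.
    ops = [f"{50 + 60 * i} 700 50 50 re S" for i in range(n_squares)]
    return "\n".join(ops).encode("ascii")


def build_pdf(content: bytes) -> bytes:
    # Objects are numbered from 1 in list order; "2 0 R" is an indirect
    # reference to object 2, generation 0.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n")  # header
    offsets = []
    for num, body in enumerate(objects, start=1):  # body
        offsets.append(len(out))  # byte offset where "num 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    # Cross-reference table: one 20-byte entry per object; entry 0 is free.
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    # Trailer points at the catalog; startxref gives the table's offset.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def find_object(pdf: bytes, num: int) -> bytes:
    # Read from the end: the last "startxref" gives the xref table offset.
    start = pdf.rindex(b"startxref")
    xref_pos = int(pdf[start:].split()[1])
    # Skip the "xref" and "0 N" lines; entry k is then line 2 + k.
    entry = pdf[xref_pos:].split(b"\n")[2 + num]
    offset = int(entry[:10])
    return pdf[offset:pdf.index(b"endobj", offset)]


if __name__ == "__main__":
    pdf = build_pdf(flattened_content())
    print(find_object(pdf, 3).decode("ascii"))
```

Saving the returned bytes with open("minimal.pdf", "wb") should give a page of five stroked squares in a typical viewer. Following a reference such as "/Contents 4 0 R" means looking up object 4 in the cross-reference table rather than descending a tree, which is why the manual speaks of objects referring to each other "in any arbitrary way".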