Summarise kilgarde2012gpu and more

author Sam Moore <[email protected]>

Thu, 1 May 2014 06:37:37 +0000 (14:37 +0800)

committer Sam Moore <[email protected]>

Thu, 1 May 2014 06:37:37 +0000 (14:37 +0800)
author Sam Moore <[email protected]>
Thu, 1 May 2014 06:37:37 +0000 (14:37 +0800)
committer Sam Moore <[email protected]>
Thu, 1 May 2014 06:37:37 +0000 (14:37 +0800)
diff --git a/LiteratureNotes.pdf b/LiteratureNotes.pdf

index 5ed142a..19d262a 100644 (file)

Binary files a/LiteratureNotes.pdf and b/LiteratureNotes.pdf differ
diff --git a/LiteratureNotes.tex b/LiteratureNotes.tex

index c7aae7b..6831428 100644 (file)
--- a/LiteratureNotes.tex
+++ b/LiteratureNotes.tex
@@ -77,18 +77,56 @@ Adobe's official reference manual for PostScript.
  
  It is big.
  
+\begin{itemize}
+       \item First version was published BEFORE the IEEE standard and used smaller floats than binary32
+       \item Now uses binary32 floats.
+\end{itemize}
+
  \section{Portable Document Format Reference Manual  \cite{pdfref17}}
  
  Adobe's official reference for PDF.
  
  It is also big.
  
+\begin{itemize}
+       \item Early versions did not use IEEE binary32 but 16-16 exponent/mantissa encodings (Why?)
+       \item Current standard is restricted to binary32
+       \item It specifically says PDF creators must use at most binary32 because higher precision is not supported by Adobe Reader.
+\end{itemize}
+
  \section{IEEE Standard for Floating-Point Arithmetic \cite{ieee2008-754}}
  
  The IEEE (revised) 754 standard.
  
  It is also big.
  
+Successes:
+\begin{itemize}
+       \item Has been adopted by CPUs
+       \item Standardised floats for programmers --- accomplishes goal of allowing non-numerical experts to write reasonably sophisticated platform independent programs that may perform complex numerical operations
+\end{itemize}
+
+Failures:
+\begin{itemize}
+       \item Adoption by GPUs slower\cite{hillesland2004paranoia}
+       \item It specifies the maximum errors for operations using IEEE types but nothing about internal representations
+       \item Many hardware devices (GPUs and CPUs) use non-IEEE representations internally and simply truncate/round the result
+       \begin{itemize}
+               \item This isn't so much of a problem when the device uses additional bits but it is misleading when GPUs use less than binary32 and act as if they are using binary32 from the programmer's perspective.
+               \item Devices using {\bf less} bits internally aren't IEEE compliant
+       \end{itemize}
+       \item Thus the same program compiled and run on different architectures may give completely different results\cite{HFP}
+       \begin{itemize}
+               \item The ultimate goal of allowing people to write numerical programs in total ignorance of the hardware is not entirely realised
+       \end{itemize}
+       \item This is the sort of thing that makes people want to use a virtual machine, and thus Java
+       \begin{itemize}
+               \item Objectively I probably shouldn't say that using Java is in itself a failure
+       \end{itemize}
+       \item Standards such as PostScript and PDF were slow to adopt IEEE representations
+       \item The OpenVG standard accepts IEEE binary32 in the API but specifically states that hardware may use less than this\cite{rice2008openvg}
+\end{itemize}
+
  
  
  \pagebreak
@@ -389,6 +427,8 @@ Performance was much improved over the software rasterization and over XRender a
  on all except nVidia hardware. However, nVidia's XRender implementation did slow down significantly when
  some transformations were applied.
  
+In \cite{kilgard2012gpu}, Kilgard mentions that Glitz has been abandoned. He describes it as ''GPU assisted'' rather than GPU accelerated, since it used the XRender (??) extension.
+
  %% Sam again
  
  \section{Boost Multiprecision Library  \cite{boost_multiprecision}}
@@ -429,6 +469,8 @@ It is probably not that useful, I don't think we'll end up writing FPU assembly?
  
  FPU's typically have 80 bit registers so they can support REAL4, REAL8 and REAL10 (single, double, extended precision).
  
+Note: Presumably this is referring to the x86 80 bit floats that David was talking about?
+
  
  \section{Floating Point Package User's Guide  \cite{bishop2008floating}}
  
@@ -702,6 +744,73 @@ Example of XML parsing using pugixml is in \shell{code/src/tests/xml.cpp}
         \caption{Tree representation of the above listing \cite{pugixmlDOM}}
  \end{figure}
  
+\section{An Algorithm For Shading of Regions on Vector Display Devices \cite{brassel1979analgorithm}}
+
+All modern display devices are raster based and therefore this paper is mainly of historical interest. It provides some references for shading on a raster display.
+
+The algorithm described will shade an arbitrary simply-connected polygon using one or two sets of parallel lines.
+
+The ``traditional'' method is:
+\begin{enumerate}
+       \item Start with a $N$ vertex polygon, rotate coords by the shading angle
+       \item Determine a bounding rectangle
+       \item For $M$ equally spaced parallel lines, compute the intersections with the boundaries of the polygon
+       \item Rotate coordinates back
+       \item Render the $M$ lines
+\end{enumerate}
+
+This is pretty much exactly how an artist would shade a pencil drawing. It is $O(M\times N)$.
+
+The algorithm in this paper does:
+\begin{enumerate}
+       \item Rotate polygon coords by shading angle
+       \item Subdivide the polygon into trapezoids (special case triangle)
+       \item Shade the trapezoids independently
+       \item Rotate it all back
+\end{enumerate}
+It is more complicated than it seems. The subdivision requires a sort to be performed on the vertices of the polygon based on their rotated $x$ and $y$ coordinates.
+
+\section{An Algorithm For Filling Regions on Graphics Display Devices \cite{lane1983analgorithm}}
+
+This gives an algorithm for for polygons (which may have ``holes'').
+It requires the ability to ``subtract'' fill from a region; this is (as far as I can tell) difficult for vector graphics devices but simple on raster graphics devices, so the paper claims it is oriented to the raster graphics devices.
+
+If the polygon is defined by $(x_i, y_i)$ then this algorithm iterates from $i = 2$ and alternates between filling and erasing the triangles $[(x_i, y_i), (x_{i+1}, y_{i+1}), (x_1, y_1)]$. It requires no sorting of the points.
+
+The paper provides a proof that the algorithm is correct and is ``optimal in the number of pixel updates required for convex polygons''.
+In the conclusion it is noted that trapezoids could be used from a fixed line and edge of the polygon, but this is not pixel optimal.
+
+This paper doesn't have a very high citation count but it is cited by the NVIDIA article \cite{kilgard2012gpu}.
+Apparently someone else adapted this algorithm for use with the stencil buffer.
+
+\section{GPU-accelerated path rendering \cite{kilgard2012gpu, kilgard300programming}}
+
+Vector graphics on the GPU; an NVIDIA extension. \cite{kilgard300programming} is the API.
+
+Motivations:
+\begin{itemize}
+       \item The focus has been on 3D acceleration in GPUs; most path rendering is done by the CPU.
+       \item Touch devices allow the screen to be transformed rapidly; CPU rastering of the path becomes inefficient
+       \begin{itemize}
+               \item The source of the ugly pixelated effects on a smartphone when scaling?
+       \end{itemize}
+       \item Especially when combined with increased resolution of these devices
+       \item Standards such as HTML5, SVG, etc, expose path rendering
+       \item Javascript is getting fast enough that we can't blame it anymore (the path rendering is the bottleneck not the JS)
+       \item GPU is more power efficient than the CPU
+\end{itemize}
+
+Results show the extension is faster than almost every renderer it was compared with for almost every test image.
+
+Comparisons to other attempts:
+\begin{itemize}
+       \item Cairo and Glitz \cite{nilsson2004glitz} (abandoned)
+\      \item Direct2D from Microsoft uses CPU to tesselate trapezoids and then renders these on the GPU
+       \item Skia in Android/Chrome uses CPU but now has Ganesh which is also hybrid CPU/GPU
+       \item Khronos Group created OpenVG\cite{rice2008openvg} with several companies creating hardware units to implement the standard. Performance is not as good as ``what we report''
+\end{itemize}
+
+
  \chapter{General Notes}
  
  \section{The DOM Model}
diff --git a/papers.bib b/papers.bib

index 7ca29f4..e99c9c8 100644 (file)
--- a/papers.bib
+++ b/papers.bib
@@ -632,3 +632,52 @@ language={English}
         journal = "pugixml 1.4 manual",
         howpublished = "http://pugixml.googlecode.com/svn/tags/latest/docs/manual/dom.html"
  }
+
+% Rendering vector graphics on vector display devices (historical)
+%Brassel:1979:ASR:965103.807434,
+@article{brassel1979analgorithm,
+ author = {Brassel, Kurt E. and Fegeas, Robin},
+ title = {An Algorithm for Shading of Regions on Vector Display Devices},
+ journal = {SIGGRAPH Comput. Graph.},
+ issue_date = {August 1979},
+ volume = {13},
+ number = {2},
+ month = aug,
+ year = {1979},
+ issn = {0097-8930},
+ pages = {126--133},
+ numpages = {8},
+ url = {http://doi.acm.org.ezproxy.library.uwa.edu.au/10.1145/965103.807434},
+ doi = {10.1145/965103.807434},
+ acmid = {807434},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+ keywords = {Cartography, Computer graphics, Line-drawing processing, Polygons, Shading, Software, Spatial information},
+} 
+%Lane:1983:AFR:357323.357326,
+@article{lane1983analgorithm,
+ author = {Lane, J. M. and M. Rarick, R. and},
+ title = {An Algorithm for Filling Regions on Graphics Display Devices},
+ journal = {ACM Trans. Graph.},
+ issue_date = {July 1983},
+ volume = {2},
+ number = {3},
+ month = jul,
+ year = {1983},
+ issn = {0730-0301},
+ pages = {192--196},
+ numpages = {5},
+ url = {http://doi.acm.org.ezproxy.library.uwa.edu.au/10.1145/357323.357326},
+ doi = {10.1145/357323.357326},
+ acmid = {357326},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+}
+
+@article{hillesland2004paranoia,
+       author = "Karl E Hillesland and Anselmo Lastra",
+       title = "GPU Floating-Point Paranoia",
+       journal = "Proceedings of GP 2004",
+       year = 2004,
+       url = "\url{http://www.cs.unc.edu/~ibr/projects/paranoia/}"
+}
diff --git a/references/brassel1979analgorithm.pdf b/references/brassel1979analgorithm.pdf

new file mode 100644 (file)

index 0000000..d0c4a3b

Binary files /dev/null and b/references/brassel1979analgorithm.pdf differ
diff --git a/references/lane1983analgorithm.pdf b/references/lane1983analgorithm.pdf

new file mode 100644 (file)

index 0000000..f6b507a

Binary files /dev/null and b/references/lane1983analgorithm.pdf differ
author	Sam Moore <[email protected]>
	Thu, 1 May 2014 06:37:37 +0000 (14:37 +0800)
committer	Sam Moore <[email protected]>
	Thu, 1 May 2014 06:37:37 +0000 (14:37 +0800)
LiteratureNotes.pdf		patch \| blob \| history
LiteratureNotes.tex		patch \| blob \| history
papers.bib		patch \| blob \| history
references/brassel1979analgorithm.pdf	[new file with mode: 0644]	patch \| blob
references/lane1983analgorithm.pdf	[new file with mode: 0644]	patch \| blob