Notes on how GPUs suck at rendering circles

diff --git a/FloatingPointCPUvsGPU.tex b/FloatingPointCPUvsGPU.tex

new file mode 100644 (file)

index 0000000..0ad0df8
--- /dev/null
+++ b/FloatingPointCPUvsGPU.tex
@@ -0,0 +1,92 @@
+\documentclass[11pt]{article}
+\input{template}
+
+\begin{document}
+\title{Floating Point and CPU vs GPU Rendering}
+\author{Sam Moore, David Gow}
+\maketitle
+
+\section*{Abstract}
+
+Modern GPUs do not appear to be compliant with IEEE-754 when performing floating point operations. There are also differences in behaviour between GPU models.
+We compare the rendering of filled circles on an x86-64 CPU and a GPU (AMD/ATI Whistler LE [Radeon HD 6610M/7610M]).
+
+\section{Introduction}
+
+Although it is well known that the behaviour of GPU drivers is inconsistent, there is little research into the behaviour of floating point operations using such drivers. 
+
+In 2004 Hillesland and Lastra adapted ``Paranoia'', Kahan's well known program from the 1980s for testing floating point arithmetic on CPUs, to run on GPUs, and found that many GPUs did not appear to be compliant with IEEE-754\cite{hillesland2004paranoia}.
+
+Given the recent interest in the use of the GPU for vector graphics\cite{kilgard2012gpu}, this seems worthy of further investigation.
+Using the same algorithm implemented in C/C++ and GLSL, we show that a particular GPU using a particular driver exhibits less precision than IEEE-754 binary32 floats.
+
+\section{Algorithm}
+
+For each integer-valued $(x,y)$ in the bounding rectangle, with coordinates taken relative to the centre of the circle, if $x^2 + y^2 \leq r^2$ where $r$ is the radius of the circle, then $(x,y)$ should be filled.
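+
+As a rough sketch of how rounding can affect this test, suppose (as an assumption, not a measurement) that $x$, $y$ and $r$ are exactly representable, that overflow and underflow do not occur, and that each multiplication and addition is rounded to nearest with unit roundoff $u$, where $u = 2^{-24}$ for IEEE-754 binary32. Writing $\mathrm{fl}(\cdot)$ for a computed value, the left hand side satisfies
+\[
+\mathrm{fl}(x^2 + y^2) = \left( x^2 (1 + \delta_1) + y^2 (1 + \delta_2) \right) (1 + \delta_3), \qquad |\delta_i| \leq u .
+\]
+Because $x^2$ and $y^2$ are both non-negative, this gives
+\[
+\left| \mathrm{fl}(x^2 + y^2) - (x^2 + y^2) \right| \leq (2u + u^2) (x^2 + y^2),
+\]
+and similarly $\left| \mathrm{fl}(r^2) - r^2 \right| \leq u r^2$.
+Under these assumptions a point can only be misclassified if $\left| x^2 + y^2 - r^2 \right|$ is at most about $3 u r^2$, so hardware that effectively keeps fewer significand bits (a larger $u$) would be expected to misclassify a wider band of pixels around the edge of the circle.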
+
+There are two sources of error: the coordinate transforms of vertices \emph{before} rendering, and the operations required to subsequently render the circle at that position. The difference between using the CPU and GPU for the former is apparent, but is only easily demonstrated in a live demo. The images below demonstrate the precision issues with the latter.
+
+\section{Results}
+
+Each pair of figures compares rendering using an x86-64 CPU and OpenGL shaders running on the AMD/ATI Whistler LE [Radeon HD 6610M/7610M] using the \emph{fglrx} proprietary graphics drivers.
+
+\begin{figure}[H]
+\centering
+\includegraphics[width=\textwidth]{figures/circles_gpu_vs_cpu/cpu0.png}
+\caption{A filled circle at the original scale (CPU rendering)}
+\end{figure}
+
+The images produced by GPU and CPU rendering are indistinguishable at the original scale. Note that ``CPU rendering'' essentially produces a bitmap and then sends it to the GPU, so all floating point operations are still done on the CPU, whereas ``GPU rendering'' involves passing floats representing vertex positions to a GLSL shader program.
+
+\begin{figure}[H]
+\centering
+\includegraphics[width=\textwidth]{figures/circles_gpu_vs_cpu/cpu1.png}
+\caption{CPU, zoomed}
+\end{figure}
+
+\begin{figure}[H]
+\centering
+\includegraphics[width=\textwidth]{figures/circles_gpu_vs_cpu/gpu1.png}
+\caption{GPU, zoomed}
+\end{figure}
+
+The rounding errors begin to become apparent.
+
+\begin{figure}[H]
+\centering
+\includegraphics[width=\textwidth]{figures/circles_gpu_vs_cpu/cpu2.png}
+\caption{CPU, zoomed more}
+\end{figure}
+
+\begin{figure}[H]
+\centering
+\includegraphics[width=\textwidth]{figures/circles_gpu_vs_cpu/gpu2.png}
+\caption{GPU, zoomed more}
+\end{figure}
+
+Zooming in further makes the errors worse.
+
+\begin{figure}[H]
+\centering
+\includegraphics[width=\textwidth]{figures/circles_gpu_vs_cpu/cpu3.png}
+\caption{CPU, zoomed further still}
+\end{figure}
+
+\begin{figure}[H]
+\centering
+\includegraphics[width=\textwidth]{figures/circles_gpu_vs_cpu/gpu3.png}
+\caption{GPU, zoomed further still}
+\end{figure}
+
+The image is no longer recognisable as a circle.
+Note that at this scale there are also issues with translation using the CPU (i.e.\ it was not possible to position the circle so that it covered the same fraction of the screen as in the earlier images).
+
+It is hard to tell how much of this is due to bugs in the fglrx driver and how much is due to physical limitations of the GPU hardware.
+
+David found code for a graphics driver with a \verb/USE_IEEE_FLOATS/ define that was \verb/false/ by default...\footnote{David say more here?}
+
+\section{Conclusions}
+
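+With the fglrx driver, the GPU tested rendered circles with visibly less precision than the CPU. GPUs are probably not as good at floating point arithmetic as CPUs, although we cannot separate the effect of the driver from that of the hardware.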
+
+\end{document}