The OpenGL pipeline has been designed to be data parallel. For example,
processing a fragment can be done independently of other fragments.
Therefore, the same shader program can run for many fragments in parallel,
with each invocation executing on its own thread.
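
The independence of fragments can be illustrated with a small CPU-side sketch. This is only an analogy, not OpenGL code: the `shade` function, the 4x3 framebuffer size, and the thread pool are hypothetical stand-ins, and Python threads will not reproduce GPU performance. The point is that each result depends only on its own fragment, so running the work sequentially or in parallel gives the same image.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tiny framebuffer, chosen only for illustration.
WIDTH, HEIGHT = 4, 3


def shade(fragment):
    """Stand-in for a fragment shader.

    The colour depends only on this fragment's own coordinates,
    never on the result computed for any other fragment.
    """
    x, y = fragment
    return (x / (WIDTH - 1), y / (HEIGHT - 1), 0.0)


# One (x, y) pair per fragment, in row-major order.
fragments = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]

# Sequential reference: shade the fragments one after another.
sequential = [shade(f) for f in fragments]

# Parallel version: the same function is applied to every fragment
# on a pool of worker threads. map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(shade, fragments))

# No fragment reads another fragment's output, so both runs agree.
assert parallel == sequential
```

On a GPU the same idea applies at a much larger scale: the hardware, rather than a thread pool, schedules one shader invocation per fragment.
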
Modern GPUs have a massive number of cores. For example, the NVIDIA GeForce
GTX 980 has 2048 CUDA cores. Each core can do vertex, geometry, or fragment
shader calculations.
Although shaders were originally designed for graphics, their parallel
nature, combined with the performance of GPUs, has led to new applications
beyond graphics.
General-purpose GPU (GPGPU) programming is an interesting topic in its own
right; several programming mechanisms exist for it (compute shaders, OpenCL,
etc.).