Approximation by sRGB colors¶
Problems caused by approximation¶
In the previous section, we described color maps as curves in a uniform color space. The most common standard for computer displays, however, is sRGB [InternationalECommission99], which is not a uniform color space. Because sRGB is intended as a standard for hardware capabilities, it is not appropriate for designing color maps. Displaying a color map on a computer requires converting from a uniform color space to sRGB, but this introduces several complications.
The first complication is the limited color palette. sRGB expresses colors in terms of red, green, and blue stimuli. The standard specifies that \(2^8 = 256\) intensities of each stimulus are available, indexed by integers between 0 and 255 inclusive. The intensities of the three stimuli together require 24 bits to store, so this bit depth is often referred to as “24-bit color.” sRGB output devices are not required to support fractional intensities, so sRGB can express \((2^8)^3 = 16,777,216\) colors and no others.
It is not straightforward to approximate even a single color by an sRGB color because the colors expressible by sRGB are not evenly distributed in a uniform color space. As an example, consider the \(J' = 10\) cross-section of CAM16-UCS:
The colors in the picture were sampled on a grid with 65,536 points in each direction. Each color was converted to sRGB, and colors requiring a stimulus value below 0 or above 255 were removed. The nearest sRGB color to each (measured in CAM16-UCS) was found; there were 6,069 distinct sRGB colors. We placed a circle at the \((a', b')\) coordinates of each of these colors. The color of each circle is the corresponding sRGB color, but lightened for visual clarity. Lightening was done by translating and linearly rescaling the \(J'\) values of the sRGB colors so that they spanned the range from 30 and 90. The actual range of \(J'\) values was 9.568 to 10.498, and the median \(J'\) value was 10.008.
As a specific example of a difficult color, consider the color with CAM16-UCS coordinates \((J', a', b') = (82.0, -23.9, -15.1)\). In sRGB coordinates, this color is (9.6, 221.5, 249.1) (when sRGB coordinates are on a scale of 0 to 255). The most obvious possibility is to round this to (10, 222, 249). The rounded color has CAM16-UCS coordinates (82.13, -23.99, -14.96), putting it 0.2088 away from the original color. But the sRGB color (1, 222, 250) is 12.3% better. When this color is converted to CAM16-UCS, it has coordinates (82.13, -23.95, -15.22), which is only 0.1830 away from the original color.
For some colors, the perceived difference between the nearest sRGB color and the color found by rounding in sRGB coordinates is even more extreme. The color with CAM16-UCS coordinates (2.3, 5.7, 2.2), for example, converts to the sRGB color (4.281, 0.504, 0.498). This rounds to (4, 1, 0), which is (2.48, 2.99, 4.49) in CAM16-UCS. But the sRGB color (6, 1, 1) converts to (2.94, 5.72, 2.09), which is 5.47 times closer to the original color!
This problem can be avoided by transforming all \(2^{24}\) sRGB colors into the uniform color space and storing them in a \(k\)-d tree [Ben75]. This data structure allows us to efficiently list the closest sRGB points to any color in the uniform color space we choose. We used the \(k\)-d tree implementation in SciPy [VGO+20].
There are three other issues with sRGB approximation that are specific to color map construction.
Two or more distinct colors in the uniform color space may have the same sRGB color as their best approximation. This is called posterization or banding, and it decreases the amount of information communicated by the color map.
Successive colors may be intended to have increasing values of \(J'\), but their best sRGB approximations may have decreasing values of \(J'\) instead (or vice versa). Such reversals cause unwanted high-frequency artifacts in the final visualization.
Similarly, successive colors may be intended to have increasing hue angles, but their best sRGB approximations may have decreasing hue angles instead (or vice versa).
Because hue mostly communicates low-frequency information, we did not believe that slightly misordered hue angles would cause problems for viewers. For that reason, we directed our efforts towards only the first two of the above problems.
An approximation algorithm¶
Our solution to these problems is an application of the \(A^*\)-algorithm [RN20]. Each step of the algorithm begins with an approximation to some of the colors in the color map, extends it by finding approximations to one of the remaining colors, and inserts the newly extended approximations into a priority queue. The algorithm discards any color map with posterization or misordered lightnesses.
Here is a description of our algorithm in Python-like pseudocode.
The description uses a function sRGB_nearest_neighbors
whose arguments are a color c and a positive integer
n and which returns a list of the n nearest sRGB
colors to c. (In practice this was implemented with a
\(k\)-d tree.) By replacing this function, the algorithm
could be used to approximate a color map by a gamut other than
sRGB.
def approximate_color_map(colors, max_neighbors):
H = PriorityQueue()
H.insert(empty_color_map(), key=0) # This key does not matter
while H.nonempty():
partial_color_map = H.pop()
t = len(partial_color_map)
if t == len(colors):
return partial_color_map
if t > 0 and colors[t].is_lighter_than(colors[t-1]):
s = 1
elif t > 0 and colors[t].is_darker_than(colors[t-1]):
s = -1
else:
s = 0
for y in sRGB_nearest_neighbors(colors[t], max_neighbors[t]):
if y in partial_color_map:
continue
if s > 0 and y.is_darker_than(partial_color_map[t-1]):
continue
if s < 0 and y.is_lighter_than(partial_color_map[t-1]):
continue
extended_partial_color_map = partial_color_map.append(y)
score = heuristic_score(colors, extended_partial_color_map)
H.insert(extended_partial_color_map, key=score)
# No sRGB approximation satisfying the desired conditions exists
return None
def heuristic_score(colors, extended_partial_color_map):
heuristic_color_map = extended_partial_color_map.copy()
for i in range(len(extended_partial_color_map), len(colors)):
heuristic_color_map.append(sRGB_nearest_neighbors(colors[i], 1)[0])
return squared_distance(colors, heuristic_color_map)
It is easy to verify that this algorithm does what it is supposed to:
Improvements to the approximation algorithm¶
While sRGB_nearest_neighbors is already useful in
practice, it can be improved in several ways.
Coalescence¶
Call each entry of H a “partial color map approximation.”
When late colors in the input color map are far away from early
colors, two partial color map approximations may coalesce: They
may differ in their first few entries but not in their later
entries. When sRGB_nearest_neighbors tries to extend
these partial color map approximations, it extends them in the
same way, producing more pairs of color maps that differ only in
their early entries. This is wasted effort, because one of these
two partial color map approximations was a better approximation
of the early colors in colors, and that partial color map
is the only one worth further attention.
To make the sRGB_nearest_neighbors appreciate this, it
can be augmented with an associative array A that maps
sets of colors to lightnesses. The array is initialized to empty
at the time H is created. After determining s,
the algorithm tests for coalescence as follows. It creates a set
containing, for all i greater than t, the
max_neighbors[i] nearest sRGB colors to
colors[i], omitting those colors already in
partial_color_map. This set k is used as the key
for A. If k is present in A, then the
algorithm looks up the corresponding value and tests s.
If s == 1 and partial_color_map[t] is lighter
than the stored lightness, or if s == -1 and
partial_color_map[t] is darker than the stored lightness,
or if s == 0, then the algorithm skips
partial_color_map and returns to the beginning of the
loop. Otherwise, the algorithm modifies A so that
k is associated to the lightness of
partial_color_map[t]. (For the empty color map, the
value is a placeholder such as zero.)
Scoring¶
Another way to improve sRGB_nearest_neighbors is to
improve heuristic_score. A better (but still consistent)
score is the same as a tighter lower bound on the minimum error
for a completion of extended_partial_color_map to an
approximation of all of colors satisfying the conclusions
of the theorem.
One way to create better lower bounds is to enforce more
constraints on the colors used to extend the partial color map.
For instance, the final color map consists of distinct colors, so
ideally heuristic_color_map should also consist of
distinct colors. For simplicity, we did not actually try to make
all the colors in heuristic_color_map distinct. Instead,
after creating heuristic_color_map, we tested it for
duplicate entries. If there were indices i1 through
ik such that heuristic_color_map[i1] through
heuristic_color_map[ik] were the same color, then we
replaced all but one of them by a worse sRGB approximation.
Specifically, if one of these entries was part of
extended_partial_color_map, then that entry was left
unchanged, while the others were switched to the second-best sRGB
approximations of the corresponding entries in colors;
otherwise, the entry corresponding to the color with the largest
distance between its best and second-best sRGB approximations was
left at the best sRGB approximation, while the others were
switched to second-best approximations.
More elaborate consistent heuristics are possible. For example,
if three or more colors in colors share the same best
sRGB approximation, then we could use a small instance of the
\(A^*\)-algorithm to find the minimum error way of assigning
distinct sRGB approximations to just these colors. We could also
try to force heuristic_color_map to satisfy, or come
closer to satisfying, the lightness consistency requirement. We
have not pursued these possibilities.
Multiple passes¶
Since there are millions of sRGB colors, testing every possible
approximation of every color in colors quickly runs out
of memory. There is little harm in having a bound such as
max_neighbors because each color in colors has
only a small number of reasonable sRGB approximations. Unless
max_neighbors contains extremely small entries, the final
color map is suboptimal only if making an especially bad
approximation to one color mysteriously improves the
approximations of the other colors. This should never happen in
any realistic situation. However, it is not easy to rule it out
with absolute certainty, which makes the presence of
max_neighbors a little irksome.
One possible solution to this is to run the algorithm twice in a
branch-and-bound fashion. The first time, max_neighbors
is taken to be fairly small, enough to guarantee that there are
only a few approximations to each color. If the output is
None, max_neighbors is increased and the
algorithm is run again. Once the algorithm outputs a color map,
that color map determines an upper bound on the square error of
the best approximation. It is not necessary to consider an sRGB
approximation to a color if it, when combined with the nearest
sRGB approximations to the other colors, would produce more
approximation error than the bound. This allows us to set
max_neighbors[i] so that we produce a minimum error
approximation without considering every possible sRGB color.
(There is one complication to this claim, namely the presence of
floating-point approximation errors. A provably correct
computation would need to use interval arithmetic. We did not
attempt this.)
We tried this branch-and-bound method on the Chromophile color
maps. The color maps had three distinct behaviors. It was
possible for the first pass to produce a color map with so little
error that there was no need to run the second pass. This
happened when the upper bound resulting from the first pass was
so tight that the second pass would never have considered more
colors than the first pass. In these cases,
max_neighbors could have been reduced. The second
behavior was that the second pass considered more colors than the
first pass but arrived at the same final result. The final
behavior was that our methods were not powerful enough for the
second pass to finish in a reasonable amount of time. Usually,
these were color maps where the first pass had a large square
error (such as cp_seq_gray) or which had a large number
of colors (such as cp_mseq_orange_green_blue_purple).
There were no cases where the second pass terminated with
different output from the first pass.