Approximation by sRGB colors

Problems caused by approximation

In the previous section, we described color maps as curves in a uniform color space. The most common standard for computer displays, however, is sRGB [InternationalECommission99], which is not a uniform color space. Because sRGB is intended as a standard for hardware capabilities, it is not appropriate for designing color maps. Displaying a color map on a computer requires converting from a uniform color space to sRGB, but this introduces several complications.

The first complication is the limited color palette. sRGB expresses colors in terms of red, green, and blue stimuli. The standard specifies that \(2^8 = 256\) intensities of each stimulus are available, indexed by integers between 0 and 255 inclusive. The intensities of the three stimuli together require 24 bits to store, so this bit depth is often referred to as “24-bit color.” sRGB output devices are not required to support fractional intensities, so sRGB can express \((2^8)^3 = 16,777,216\) colors and no others.

It is not straightforward to approximate even a single color by an sRGB color because the colors expressible by sRGB are not evenly distributed in a uniform color space. As an example, consider the \(J' = 10\) cross-section of CAM16-UCS:

The J' = 10 cross-section of CAM16-UCS with sRGB colors marked

The colors in the picture were sampled on a grid with 65,536 points in each direction. Each color was converted to sRGB, and colors requiring a stimulus value below 0 or above 255 were removed. The nearest sRGB color to each (measured in CAM16-UCS) was found; there were 6,069 distinct sRGB colors. We placed a circle at the \((a', b')\) coordinates of each of these colors. The color of each circle is the corresponding sRGB color, but lightened for visual clarity. Lightening was done by translating and linearly rescaling the \(J'\) values of the sRGB colors so that they spanned the range from 30 and 90. The actual range of \(J'\) values was 9.568 to 10.498, and the median \(J'\) value was 10.008.

As a specific example of a difficult color, consider the color with CAM16-UCS coordinates \((J', a', b') = (82.0, -23.9, -15.1)\). In sRGB coordinates, this color is (9.6, 221.5, 249.1) (when sRGB coordinates are on a scale of 0 to 255). The most obvious possibility is to round this to (10, 222, 249). The rounded color has CAM16-UCS coordinates (82.13, -23.99, -14.96), putting it 0.2088 away from the original color. But the sRGB color (1, 222, 250) is 12.3% better. When this color is converted to CAM16-UCS, it has coordinates (82.13, -23.95, -15.22), which is only 0.1830 away from the original color.

For some colors, the perceived difference between the nearest sRGB color and the color found by rounding in sRGB coordinates is even more extreme. The color with CAM16-UCS coordinates (2.3, 5.7, 2.2), for example, converts to the sRGB color (4.281, 0.504, 0.498). This rounds to (4, 1, 0), which is (2.48, 2.99, 4.49) in CAM16-UCS. But the sRGB color (6, 1, 1) converts to (2.94, 5.72, 2.09), which is 5.47 times closer to the original color!

This problem can be avoided by transforming all \(2^{24}\) sRGB colors into the uniform color space and storing them in a \(k\)-d tree [Ben75]. This data structure allows us to efficiently list the closest sRGB points to any color in the uniform color space we choose. We used the \(k\)-d tree implementation in SciPy [VGO+20].

There are three other issues with sRGB approximation that are specific to color map construction.

  1. Two or more distinct colors in the uniform color space may have the same sRGB color as their best approximation. This is called posterization or banding, and it decreases the amount of information communicated by the color map.

  2. Successive colors may be intended to have increasing values of \(J'\), but their best sRGB approximations may have decreasing values of \(J'\) instead (or vice versa). Such reversals cause unwanted high-frequency artifacts in the final visualization.

  3. Similarly, successive colors may be intended to have increasing hue angles, but their best sRGB approximations may have decreasing hue angles instead (or vice versa).

Because hue mostly communicates low-frequency information, we did not believe that slightly misordered hue angles would cause problems for viewers. For that reason, we directed our efforts towards only the first two of the above problems.

An approximation algorithm

Our solution to these problems is an application of the \(A^*\)-algorithm [RN20]. Each step of the algorithm begins with an approximation to some of the colors in the color map, extends it by finding approximations to one of the remaining colors, and inserts the newly extended approximations into a priority queue. The algorithm discards any color map with posterization or misordered lightnesses.

Here is a description of our algorithm in Python-like pseudocode. The description uses a function sRGB_nearest_neighbors whose arguments are a color c and a positive integer n and which returns a list of the n nearest sRGB colors to c. (In practice this was implemented with a \(k\)-d tree.) By replacing this function, the algorithm could be used to approximate a color map by a gamut other than sRGB.

def approximate_color_map(colors, max_neighbors):
    H = PriorityQueue()
    H.insert(empty_color_map(), key=0)      # This key does not matter

    while H.nonempty():
        partial_color_map = H.pop()
        t = len(partial_color_map)
        if t == len(colors):
            return partial_color_map

        if t > 0 and colors[t].is_lighter_than(colors[t-1]):
            s = 1
        elif t > 0 and colors[t].is_darker_than(colors[t-1]):
            s = -1
        else:
            s = 0

        for y in sRGB_nearest_neighbors(colors[t], max_neighbors[t]):
            if y in partial_color_map:
                continue
            if s > 0 and y.is_darker_than(partial_color_map[t-1]):
                continue
            if s < 0 and y.is_lighter_than(partial_color_map[t-1]):
                continue

            extended_partial_color_map = partial_color_map.append(y)
            score = heuristic_score(colors, extended_partial_color_map)
            H.insert(extended_partial_color_map, key=score)

    # No sRGB approximation satisfying the desired conditions exists
    return None


def heuristic_score(colors, extended_partial_color_map):
    heuristic_color_map = extended_partial_color_map.copy()
    for i in range(len(extended_partial_color_map), len(colors)):
        heuristic_color_map.append(sRGB_nearest_neighbors(colors[i], 1)[0])
    return squared_distance(colors, heuristic_color_map)

It is easy to verify that this algorithm does what it is supposed to:

Improvements to the approximation algorithm

While sRGB_nearest_neighbors is already useful in practice, it can be improved in several ways.

Coalescence

Call each entry of H a “partial color map approximation.” When late colors in the input color map are far away from early colors, two partial color map approximations may coalesce: They may differ in their first few entries but not in their later entries. When sRGB_nearest_neighbors tries to extend these partial color map approximations, it extends them in the same way, producing more pairs of color maps that differ only in their early entries. This is wasted effort, because one of these two partial color map approximations was a better approximation of the early colors in colors, and that partial color map is the only one worth further attention.

To make the sRGB_nearest_neighbors appreciate this, it can be augmented with an associative array A that maps sets of colors to lightnesses. The array is initialized to empty at the time H is created. After determining s, the algorithm tests for coalescence as follows. It creates a set containing, for all i greater than t, the max_neighbors[i] nearest sRGB colors to colors[i], omitting those colors already in partial_color_map. This set k is used as the key for A. If k is present in A, then the algorithm looks up the corresponding value and tests s. If s == 1 and partial_color_map[t] is lighter than the stored lightness, or if s == -1 and partial_color_map[t] is darker than the stored lightness, or if s == 0, then the algorithm skips partial_color_map and returns to the beginning of the loop. Otherwise, the algorithm modifies A so that k is associated to the lightness of partial_color_map[t]. (For the empty color map, the value is a placeholder such as zero.)

Scoring

Another way to improve sRGB_nearest_neighbors is to improve heuristic_score. A better (but still consistent) score is the same as a tighter lower bound on the minimum error for a completion of extended_partial_color_map to an approximation of all of colors satisfying the conclusions of the theorem.

One way to create better lower bounds is to enforce more constraints on the colors used to extend the partial color map. For instance, the final color map consists of distinct colors, so ideally heuristic_color_map should also consist of distinct colors. For simplicity, we did not actually try to make all the colors in heuristic_color_map distinct. Instead, after creating heuristic_color_map, we tested it for duplicate entries. If there were indices i1 through ik such that heuristic_color_map[i1] through heuristic_color_map[ik] were the same color, then we replaced all but one of them by a worse sRGB approximation. Specifically, if one of these entries was part of extended_partial_color_map, then that entry was left unchanged, while the others were switched to the second-best sRGB approximations of the corresponding entries in colors; otherwise, the entry corresponding to the color with the largest distance between its best and second-best sRGB approximations was left at the best sRGB approximation, while the others were switched to second-best approximations.

More elaborate consistent heuristics are possible. For example, if three or more colors in colors share the same best sRGB approximation, then we could use a small instance of the \(A^*\)-algorithm to find the minimum error way of assigning distinct sRGB approximations to just these colors. We could also try to force heuristic_color_map to satisfy, or come closer to satisfying, the lightness consistency requirement. We have not pursued these possibilities.

Multiple passes

Since there are millions of sRGB colors, testing every possible approximation of every color in colors quickly runs out of memory. There is little harm in having a bound such as max_neighbors because each color in colors has only a small number of reasonable sRGB approximations. Unless max_neighbors contains extremely small entries, the final color map is suboptimal only if making an especially bad approximation to one color mysteriously improves the approximations of the other colors. This should never happen in any realistic situation. However, it is not easy to rule it out with absolute certainty, which makes the presence of max_neighbors a little irksome.

One possible solution to this is to run the algorithm twice in a branch-and-bound fashion. The first time, max_neighbors is taken to be fairly small, enough to guarantee that there are only a few approximations to each color. If the output is None, max_neighbors is increased and the algorithm is run again. Once the algorithm outputs a color map, that color map determines an upper bound on the square error of the best approximation. It is not necessary to consider an sRGB approximation to a color if it, when combined with the nearest sRGB approximations to the other colors, would produce more approximation error than the bound. This allows us to set max_neighbors[i] so that we produce a minimum error approximation without considering every possible sRGB color. (There is one complication to this claim, namely the presence of floating-point approximation errors. A provably correct computation would need to use interval arithmetic. We did not attempt this.)

We tried this branch-and-bound method on the Chromophile color maps. The color maps had three distinct behaviors. It was possible for the first pass to produce a color map with so little error that there was no need to run the second pass. This happened when the upper bound resulting from the first pass was so tight that the second pass would never have considered more colors than the first pass. In these cases, max_neighbors could have been reduced. The second behavior was that the second pass considered more colors than the first pass but arrived at the same final result. The final behavior was that our methods were not powerful enough for the second pass to finish in a reasonable amount of time. Usually, these were color maps where the first pass had a large square error (such as cp_seq_gray) or which had a large number of colors (such as cp_mseq_orange_green_blue_purple). There were no cases where the second pass terminated with different output from the first pass.