Implementing Photoshop's Content-Aware Scaling

I once took a photograph of a lighthouse by the beach. I had carefully planned the composition of my photo and was pleased with the result. Later that day, I went to go post it on Instagram, only to realize that the top of the lighthouse was cropped out by Instagram's max aspect ratio of 4:5. If only I had known of content-aware scaling!

Content-aware scaling is a computational photography technique made possible by an algorithm called seam carving. But before we get into the details, let's talk about traditional methods of image scaling.

Normally, we want to shrink or enlarge an image to fit a desired format. For example, I wanted to shrink my lighthouse photo from its original aspect ratio of 2:3 to 4:5.

Common aspect ratios

The traditional methods of image resizing are scaling and cropping. Scaling is used to shrink or enlarge an image. However, if the aspect ratio is locked, then scaling results in an image of the same aspect ratio — making it useless for my goal of making the lighthouse fit in Instagram's crop. If the aspect ratio is unlocked, meaning the width or height can change independently, then the image will become distorted (like those funhouse mirrors). Cropping simply removes pixels from the image periphery, which may alter the main subject of the image. Both methods are unsatisfactory because of how they affect the image content.

Content-aware scaling aims to avoid these limitations by resizing an image while preserving the image content. The technique shrinks or enlarges an image by adding or removing unimportant pixels. This exact feature is implemented in Photoshop as Content-Aware Scaling. Now that we've discussed the purpose of this technique, let us dive into the algorithm.

Seam carving algorithm

The seam carving algorithm was first published in 2007 and later refined in 2008 (see references below). The 2008 paper extended content-aware scaling to video and introduced an improved forward-energy criterion to the algorithm. This article will only discuss the application to images.

A seam is defined as a monotonic and connected path of pixels to “carve” out. Monotonic means that there can be one and only one pixel in each row or column of the image. Connected here means the pixels on the path are horizontally, vertically, or diagonally adjacent. A seam can go either left-to-right or top-to-bottom, depending on which direction we want to resize the image.

The steps of the algorithm are:

  1. Find the energy of each pixel.
  2. Find the path of the optimal seam.
  3. Remove (or insert) the seam, and repeat the process from step 1 until the desired size has been achieved.

Find the energy of each pixel

A pixel's energy is how much its intensity value changes from neighboring pixels. A large change indicates a sharp contrast between neighboring pixels. This tells us where the edges of objects in the image lie. Our goal is to remove pixels that are low-energy and blend into their surroundings, thus leaving the high-energy edge pixels untouched.

Horizontal and vertical image gradients

We define an energy function to find the edges in an image by combining the horizontal and vertical gradients of an image. The image gradient tells us the magnitude and direction of changing values. First, we compute the convolution of our image with the Sobel operator. This computation gives us the x- and y- image gradients, which we then take the absolute values of and sum together. After summing together, we get the final energy image depicting the edges of objects within the image.

Edges of steam valve

Find the path of the optimal seam

With the energy image obtained, we can now target pixels for deletion. For each row of the image, we calculate the cumulative minimum energy of the connected path it took to get each pixel. I.e., at each pixel, we choose the adjacent pixel from the previous row with the minimum value, sum it with the current pixel's value from the energy image, and store it for the next row to use.

The optimal seam to remove has the minimum cost among all other seams, and we define the cost of a seam as the cumulative energy of all pixels on the seam.

At the end of this dynamic programming procedure, the minimum value in the last row holds the tail of our optimal seam. By backtracking from this tail pixel, we can find the entire path of this optimal seam.

Remove (or insert) the seam

Seam removal and insertion work similarly; in fact, seam insertion relies on seam removal first. Recall that we continuously loop the process of finding the optimal seam and removing the seam. Thus, the image either shrinks or grows larger by 1 pixel in each iteration.

With NumPy's Boolean array indexing, we can create a mask over the image that is the same shape and size minus the optimal seam. Then, we can copy the values of the image under the mask to a new matrix that represents the new image. We repeat this process to remove as many seams as desired. The code can be optimized by using matrix operations in NumPy and optionally Numba's JIT compiler. It is not recommended for performance reasons to use naive, nested for-loops in your code because it will take hours to process (don't ask me how I know this).

For insertion, we do the same process in reverse. Having obtained the optimal seam path, we want to duplicate that seam and add 1 pixel to the overall width or height of the image instead of subtracting. This will require keeping track of the indices of the original image, because after insertion all pixels to the right of the seam will be shifted over by 1.

Forward-energy criterion

One final consideration is the improved seam carving algorithm published in the 2008 paper. The authors noticed that the original seam carving algorithm was prone to producing visual artifacts in the results. The culprit they discovered is that the original algorithm doesn't take into account energy introduced into the new image when a seam is removed. When a seam is removed, pixels that were previously separate now became adjacent. In some cases, this would create pixels with higher energy than before the removal — leading to visual artifacts in the result such as jaggedness.

The fix was to modify the cumulative minimum cost function and consider the energy cost that would be introduced by seam removal or insertion. I.e., we look forward in time to see which pixels will become adjacent as a result of seam carving and add that to the pixel's cumulative cost. The rest of the code remains exactly the same.

New edges created by seam carving

Results

The original image is a photograph of the Statue of Liberty. There are areas of water on the side that have less image content, so we want the seam carving algorithm to target those areas over the middle of the image with the statue.

Statue of Liberty

Below are the images generated using the original (backward-energy) criterion and the improved forward-energy criterion. The images with red lines show the seams that were removed or inserted into the original image.

Backward-energy seam removal:

Backward-energy seam removal Backward-energy seam removal with red overlay

Forward-energy seam removal:

Forward-energy seam removal Forward-energy seam removal with red overlay

Backward-energy seam insertion:

Backward-energy seam insertion Backward-energy seam insertion with red overlay

Forward-energy seam insertion:

Forward-energy seam insertion Forward-energy seam insertion with red overlay

Both methods produce good results, but look closely and you'll notice that the base of the statue pedestal is less distorted in the forward-energy seam removal and insertion. From the overlay of red seams, we can see that the seam carving algorithm completely avoids impacting the statue. Traditional scaling or cropping would not be able to produce similar results, and this demonstrates the usefulness of content-aware scaling.


References

Avidan, S., & Shamir, A. (2007). Seam carving for content-aware image resizing. ACM SIGGRAPH 2007 Papers, 10–es. https://doi.org/10.1145/1275808.1276390

Rubinstein, M., Shamir, A., & Avidan, S. (2008). Improved seam carving for video retargeting. ACM Transactions on Graphics, 27(3), 1–9. https://doi.org/10.1145/1360612.1360615