livn

Implementing Photoshop's Content-Aware Scaling

Fri, 08 Apr 2022 23:25:25 +0000

I once took a photograph of a lighthouse by the beach. I had carefully planned the composition of my photo and was pleased with the result. Later that day, I went to post it on Instagram, only to realize that Instagram's maximum aspect ratio of 4:5 cropped out the top of the lighthouse. If only I had known about content-aware scaling!

Content-aware scaling is a computational photography technique made possible by an algorithm called seam carving. But before we get into the details, let's talk about traditional methods of image scaling.

Normally, we want to shrink or enlarge an image to fit a desired format. For example, I wanted to shrink my lighthouse photo from its original aspect ratio of 2:3 to 4:5.

The traditional methods of image resizing are scaling and cropping. Scaling shrinks or enlarges an image. However, if the aspect ratio is locked, scaling produces an image with the same aspect ratio, which makes it useless for my goal of fitting the lighthouse into Instagram's crop. If the aspect ratio is unlocked, meaning the width and height can change independently, the image becomes distorted (like in a funhouse mirror). Cropping simply removes pixels from the image periphery, which may cut off part of the main subject. Both methods are unsatisfactory because of how they affect the image content.

Content-aware scaling aims to avoid these limitations by resizing an image while preserving the image content. The technique shrinks or enlarges an image by removing or adding unimportant pixels. This exact feature is implemented in Photoshop as Content-Aware Scaling. Now that we've discussed the purpose of this technique, let us dive into the algorithm.

Seam carving algorithm

The seam carving algorithm was first published in 2007 and later refined in 2008 (see references below). The 2008 paper extended content-aware scaling to video and introduced an improved forward-energy criterion to the algorithm. This article will only discuss the application to images.

A seam is defined as a monotonic and connected path of pixels to “carve” out. Monotonic means that the seam contains exactly one pixel in each row (for a top-to-bottom seam) or in each column (for a left-to-right seam). Connected means that consecutive pixels on the path are adjacent, either directly or diagonally. A seam can go either left-to-right or top-to-bottom, depending on which direction we want to resize the image.

The steps of the algorithm are:

1. Find the energy of each pixel.
2. Find the path of the optimal seam.
3. Remove (or insert) the seam, and repeat the process from step 1 until the desired size has been achieved.

Find the energy of each pixel

A pixel's energy measures how much its intensity value differs from that of its neighboring pixels. A large difference indicates a sharp contrast between neighboring pixels, which tells us where the edges of objects in the image lie. Our goal is to remove low-energy pixels that blend into their surroundings, leaving the high-energy edge pixels untouched.

We define an energy function that finds the edges in an image by combining its horizontal and vertical gradients. The image gradient tells us the magnitude and direction of changing values. First, we convolve the image with the Sobel operator, which gives us the x- and y-gradients of the image. We then take the absolute value of each gradient and sum the two. The result is the final energy image, which depicts the edges of objects within the image.
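As a rough sketch of this step, the function below computes a Sobel-based energy map for a grayscale image stored as a plain 2D array. It uses nested loops over JavaScript arrays rather than NumPy, and the name energyMap and the border handling (repeating the nearest edge pixel) are assumptions made for illustration, not details from the original implementation.

```javascript
// Sobel kernels for the horizontal (x) and vertical (y) gradients.
const SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]];
const SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]];

// gray: 2D array of intensity values (rows of columns).
// Returns a same-sized 2D array holding |gx| + |gy| for every pixel.
function energyMap(gray) {
  const h = gray.length;
  const w = gray[0].length;
  // Assumption: pixels outside the image repeat the nearest border pixel.
  const at = (r, c) =>
    gray[Math.min(h - 1, Math.max(0, r))][Math.min(w - 1, Math.max(0, c))];

  const energy = [];
  for (let r = 0; r < h; r++) {
    const row = [];
    for (let c = 0; c < w; c++) {
      let gx = 0;
      let gy = 0;
      // The kernel is applied without flipping; flipping a Sobel kernel only
      // changes the sign of the result, and we take absolute values anyway.
      for (let i = -1; i <= 1; i++) {
        for (let j = -1; j <= 1; j++) {
          gx += SOBEL_X[i + 1][j + 1] * at(r + i, c + j);
          gy += SOBEL_Y[i + 1][j + 1] * at(r + i, c + j);
        }
      }
      row.push(Math.abs(gx) + Math.abs(gy));
    }
    energy.push(row);
  }
  return energy;
}
```

On an image with a single vertical edge, the energy is zero in the flat regions and large on the two columns next to the edge, which is the behavior the seam search relies on.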

Find the path of the optimal seam

With the energy image obtained, we can now target pixels for deletion. Working row by row, we calculate for each pixel the minimum cumulative energy of any connected path that reaches it. I.e., at each pixel, we choose the adjacent pixel in the previous row with the lowest cumulative value, add the current pixel's value from the energy image, and store the sum for the next row to use.

We define the cost of a seam as the cumulative energy of all pixels on the seam. The optimal seam to remove is the one with the minimum cost among all seams.

At the end of this dynamic programming procedure, the minimum value in the last row holds the tail of our optimal seam. By backtracking from this tail pixel, we can find the entire path of this optimal seam.
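The dynamic programming pass and the backtracking step can be sketched as follows for a top-to-bottom seam. This is a minimal illustration on plain arrays; the function name findVerticalSeam and the tie-breaking rule (prefer the first minimum found) are assumptions.

```javascript
// energy: 2D array of per-pixel energies.
// Returns seam, where seam[r] is the column of the seam pixel in row r.
function findVerticalSeam(energy) {
  const h = energy.length;
  const w = energy[0].length;

  // cost[r][c] is the minimum cumulative energy of any connected path
  // from the top row down to pixel (r, c).
  const cost = [energy[0].slice()];
  for (let r = 1; r < h; r++) {
    const row = new Array(w);
    for (let c = 0; c < w; c++) {
      let best = cost[r - 1][c];
      if (c > 0) best = Math.min(best, cost[r - 1][c - 1]);
      if (c < w - 1) best = Math.min(best, cost[r - 1][c + 1]);
      row[c] = energy[r][c] + best;
    }
    cost.push(row);
  }

  // The tail of the optimal seam is the minimum value in the last row.
  let col = 0;
  for (let c = 1; c < w; c++) {
    if (cost[h - 1][c] < cost[h - 1][col]) col = c;
  }

  // Backtrack: in each previous row, step to the cheapest adjacent pixel.
  const seam = new Array(h);
  seam[h - 1] = col;
  for (let r = h - 2; r >= 0; r--) {
    let best = col;
    if (col > 0 && cost[r][col - 1] < cost[r][best]) best = col - 1;
    if (col < w - 1 && cost[r][col + 1] < cost[r][best]) best = col + 1;
    col = best;
    seam[r] = col;
  }
  return seam;
}
```

A left-to-right seam can be found with the same function by transposing the energy image first.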

Remove (or insert) the seam

Seam removal and insertion work similarly; in fact, seam insertion relies on seam removal first. Recall that we continuously loop the process of finding the optimal seam and removing the seam. Thus, the image either shrinks or grows larger by 1 pixel in each iteration.

With NumPy's Boolean array indexing, we can create a mask with the same shape as the image that is False along the optimal seam and True everywhere else. Then, we can copy the values of the image under the mask to a new matrix that represents the new image, which is one pixel narrower (or shorter). We repeat this process to remove as many seams as desired. The code can be optimized by using matrix operations in NumPy and optionally Numba's JIT compiler. For performance reasons, it is not recommended to use naive, nested Python for-loops because the image will take hours to process (don't ask me how I know this).
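The same idea can be sketched without NumPy: in plain JavaScript, filtering each row by column index plays the role of the Boolean mask. The names removeVerticalSeam and shrinkWidth are hypothetical, and findSeam stands for whatever seam-finding function is in use.

```javascript
// image: 2D array of pixels; seam[r] is the column to drop from row r.
// Returns a new image that is one pixel narrower.
function removeVerticalSeam(image, seam) {
  return image.map((row, r) => row.filter((_, c) => c !== seam[r]));
}

// Repeatedly find and remove a seam until `count` seams are gone.
// findSeam(image) is assumed to return one column index per row.
function shrinkWidth(image, count, findSeam) {
  let result = image;
  for (let i = 0; i < count; i++) {
    result = removeVerticalSeam(result, findSeam(result));
  }
  return result;
}
```

Because the seam is recomputed on the already-shrunk image in each iteration, the energy image also has to be recomputed inside findSeam every time.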

For insertion, we do the same process in reverse. Having obtained the optimal seam path, we want to duplicate that seam and add 1 pixel to the overall width or height of the image instead of subtracting. This will require keeping track of the indices of the original image, because after insertion all pixels to the right of the seam will be shifted over by 1.
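One way to handle the index bookkeeping is to carry a map of original column indices alongside a working copy of the image while seams are removed, and only then duplicate the recorded pixels in the original. The sketch below makes that assumption; enlargeWidth and findSeam are hypothetical names, and the new pixels are plain duplicates rather than blends of their neighbors.

```javascript
// Enlarge an image by `count` columns (count must be smaller than the width).
// findSeam(image) is assumed to return one column index per row.
function enlargeWidth(image, count, findSeam) {
  let work = image.map((row) => row.slice());
  // index[r][c] is the original column of the pixel now at (r, c) in `work`.
  let index = image.map((row) => row.map((_, c) => c));
  // picked[r] collects the original columns chosen by any seam in row r.
  const picked = image.map(() => new Set());

  for (let i = 0; i < count; i++) {
    const seam = findSeam(work);
    seam.forEach((c, r) => picked[r].add(index[r][c]));
    // Remove the seam from the working copy and from the index map together.
    work = work.map((row, r) => row.filter((_, c) => c !== seam[r]));
    index = index.map((row, r) => row.filter((_, c) => c !== seam[r]));
  }

  // Duplicate every picked pixel in the original image.
  return image.map((row, r) =>
    row.flatMap((pixel, c) => (picked[r].has(c) ? [pixel, pixel] : [pixel]))
  );
}
```

Removing the seams from a working copy first keeps the algorithm from choosing the same lowest-energy seam over and over, which would stretch a single strip of the image.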

Forward-energy criterion

One final consideration is the improved seam carving algorithm published in the 2008 paper. The authors noticed that the original seam carving algorithm was prone to producing visual artifacts in the results. The culprit they discovered is that the original algorithm doesn't take into account the energy introduced into the new image when a seam is removed. When a seam is removed, pixels that were previously separate become adjacent. In some cases, this creates pixels with higher energy than before the removal, leading to visual artifacts in the result such as jaggedness.

The fix was to modify the cumulative minimum cost function and consider the energy cost that would be introduced by seam removal or insertion. I.e., we look forward in time to see which pixels will become adjacent as a result of seam carving and add that to the pixel's cumulative cost. The rest of the code remains exactly the same.
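A sketch of the modified cost function for a top-to-bottom seam is shown below, following the three cases in the 2008 paper: the seam can arrive at a pixel from the upper left, from directly above, or from the upper right, and each case creates a different set of new neighbors. The function name, the border handling (repeating the edge column), and the cost used for the first row are assumptions.

```javascript
// gray: 2D array of intensities. Returns seam[r] = column removed in row r.
function forwardEnergySeam(gray) {
  const h = gray.length;
  const w = gray[0].length;
  // Assumption: columns outside the image repeat the nearest border column.
  const at = (r, c) => gray[r][Math.min(w - 1, Math.max(0, c))];

  const cost = [];
  const from = []; // from[r][c]: column in row r - 1 that the best path used
  for (let r = 0; r < h; r++) {
    cost.push(new Array(w).fill(0));
    from.push(new Array(w).fill(0));
    for (let c = 0; c < w; c++) {
      // Removing (r, c) always makes its left and right neighbors adjacent.
      const cU = Math.abs(at(r, c + 1) - at(r, c - 1));
      if (r === 0) {
        cost[r][c] = cU; // assumption: the first row only pays this cost
        continue;
      }
      // Coming from a diagonal also joins the pixel above with a side neighbor.
      const cL = cU + Math.abs(at(r - 1, c) - at(r, c - 1));
      const cR = cU + Math.abs(at(r - 1, c) - at(r, c + 1));

      let best = cost[r - 1][c] + cU;
      let bestCol = c;
      if (c > 0 && cost[r - 1][c - 1] + cL < best) {
        best = cost[r - 1][c - 1] + cL;
        bestCol = c - 1;
      }
      if (c < w - 1 && cost[r - 1][c + 1] + cR < best) {
        best = cost[r - 1][c + 1] + cR;
        bestCol = c + 1;
      }
      cost[r][c] = best;
      from[r][c] = bestCol;
    }
  }

  // Start at the cheapest pixel in the last row and follow the stored links.
  let col = 0;
  for (let c = 1; c < w; c++) {
    if (cost[h - 1][c] < cost[h - 1][col]) col = c;
  }
  const seam = new Array(h);
  for (let r = h - 1; r >= 0; r--) {
    seam[r] = col;
    col = from[r][col];
  }
  return seam;
}
```

Note that this version works directly on pixel intensities rather than on a precomputed energy image, since the cost depends on which pixels end up next to each other.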

Results

The original image is a photograph of the Statue of Liberty. There are areas of water on the side that have less image content, so we want the seam carving algorithm to target those areas over the middle of the image with the statue.

Below are the images generated using the original (backward-energy) criterion and the improved forward-energy criterion. The images with red lines show the seams that were removed or inserted into the original image.

Backward-energy seam removal:

Forward-energy seam removal:

Backward-energy seam insertion:

Forward-energy seam insertion:

Both methods produce good results, but look closely and you'll notice that the base of the statue pedestal is less distorted in the forward-energy seam removal and insertion. From the overlay of red seams, we can see that the seam carving algorithm completely avoids impacting the statue. Traditional scaling or cropping would not be able to produce similar results, and this demonstrates the usefulness of content-aware scaling.

References

Avidan, S., & Shamir, A. (2007). Seam carving for content-aware image resizing. ACM SIGGRAPH 2007 Papers, 10–es. https://doi.org/10.1145/1275808.1276390

Rubinstein, M., Shamir, A., & Avidan, S. (2008). Improved seam carving for video retargeting. ACM Transactions on Graphics, 27(3), 1–9. https://doi.org/10.1145/1360612.1360615

Self-Piloting Spaceship Simulation

Tue, 26 Oct 2021 02:01:05 +0000

The goal is to simulate a self-piloting spaceship that can navigate through an asteroid field without collisions. I implemented a Kalman filter that estimates the state of the system, namely the asteroids’ future positions. Knowing where each asteroid is heading, I then programmed the spaceship to choose the best heading and velocity at each time step so as not to crash into any asteroids.

Estimating the future locations of asteroids is the most important part of the project and is done through Kalman filtering. This is a technique for estimating the state of a system given measurements over time. For this project, the measurements are the asteroids’ coordinates (with measurement noise) and the state we estimate is the coordinates of each asteroid at the next time step.

The Pilot class controls the spaceship’s flight path and contains three methods:

observe_asteroids is called once per time step and informs the spaceship of the latest asteroid measurements.
estimate_asteroid_locs predicts the location of each asteroid in the next time step after observing asteroids.
next_move chooses the best move given the current state of the spaceship and system.

The Pilot class is provided an array of coordinates for the asteroids that are currently in the field. In a real-life scenario, we can imagine the spaceship having a sensor or radar to make these measurements. All measurements, including the ones passed into the Pilot class, come with noise from the sensors, meaning they may not be 100% accurate. We account for this error by modeling it as Gaussian noise in the Kalman filter.

In a Kalman filter, every new measurement, observe_asteroids, makes us more confident about the asteroid coordinates. However, every prediction step, estimate_asteroid_locs, throws some uncertainty into the equation. The Kalman filter continually cycles through these two steps to produce the best estimate of where each asteroid could be at the current time step, but this is merely a guess and can be off.
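The measure-then-predict cycle can be illustrated with a small filter for a single coordinate axis. This is only a sketch under stated assumptions: it uses a constant-velocity motion model, one independent filter per axis, and made-up noise parameters, none of which are confirmed by the project. The class name AxisFilter and its method names are hypothetical.

```javascript
// Kalman filter for one axis with state [position, velocity].
class AxisFilter {
  constructor({ dt = 1, measurementVar = 1, processVar = 0.01 } = {}) {
    this.dt = dt;
    this.r = measurementVar; // variance of the sensor noise (assumed)
    this.q = processVar;     // variance added at each prediction (assumed)
    this.pos = 0;
    this.vel = 0;
    // Large initial covariance: we start out knowing almost nothing.
    this.P = [[1000, 0], [0, 1000]];
  }

  // Measurement update: a new position reading shrinks the uncertainty.
  observe(z) {
    const [[p00, p01], [p10, p11]] = this.P;
    const s = p00 + this.r; // innovation variance
    const k0 = p00 / s;     // Kalman gain for position
    const k1 = p10 / s;     // Kalman gain for velocity
    const residual = z - this.pos;
    this.pos += k0 * residual;
    this.vel += k1 * residual;
    this.P = [
      [(1 - k0) * p00, (1 - k0) * p01],
      [p10 - k1 * p00, p11 - k1 * p01],
    ];
  }

  // Prediction: advance the state one time step, which grows the uncertainty.
  predict() {
    const dt = this.dt;
    const [[p00, p01], [p10, p11]] = this.P;
    this.pos += this.vel * dt;
    this.P = [
      [p00 + dt * (p10 + p01) + dt * dt * p11 + this.q, p01 + dt * p11],
      [p10 + dt * p11, p11 + this.q],
    ];
    return this.pos;
  }
}
```

Each asteroid would get one such filter for x and one for y, with observe called on every new measurement and predict called to estimate the position at the next time step.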

The recording above shows red dots floating around the asteroids. These red dots are estimates that are too far away from the asteroid’s actual location to be accurate. Notice how the estimates get more accurate over time as state measurements accumulate.

To mitigate uncertainty, I have my spaceship wait at the starting area for a few time steps while the Kalman filter calibrates. Once I am confident that I have localized all the asteroids in the field to a certain margin of error, I let loose the spaceship to fly itself. The Pilot then calls next_move at each time step while considering the coordinate estimates given by the Kalman filter.
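The post does not spell out how next_move scores its options, so the following is purely an illustrative guess at one possible strategy: try a fixed set of headings and speeds, reject any move that lands too close to an estimated asteroid position, and prefer the safe move that makes the most progress. The function names, the option defaults, and the assumption that progress means increasing y are all hypothetical.

```javascript
// ship: { x, y }. asteroidEstimates: array of { x, y } predicted positions.
function nextMove(ship, asteroidEstimates, options = {}) {
  const { speeds = [0, 1, 2], headings = 8, safeDistance = 1.5 } = options;
  let best = null;
  for (const speed of speeds) {
    for (let k = 0; k < headings; k++) {
      const heading = (2 * Math.PI * k) / headings;
      const x = ship.x + speed * Math.cos(heading);
      const y = ship.y + speed * Math.sin(heading);
      // Distance from the candidate position to the nearest estimate.
      const clearance = Math.min(
        Infinity,
        ...asteroidEstimates.map((a) => Math.hypot(a.x - x, a.y - y))
      );
      const candidate = { heading, speed, x, y, clearance, safe: clearance >= safeDistance };
      if (best === null || isBetter(candidate, best)) best = candidate;
    }
  }
  return best;
}

// Safe moves beat unsafe ones; among safe moves prefer progress (larger y);
// if nothing is safe, prefer the move with the most clearance.
function isBetter(a, b) {
  if (a.safe !== b.safe) return a.safe;
  return a.safe ? a.y > b.y : a.clearance > b.clearance;
}
```

With an asteroid estimated directly ahead, this picks a diagonal move around it instead of flying straight on.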

These test runs show exciting results as the spaceship zips around, narrowly avoiding catastrophic collisions.

JWT Authentication for Single Page Applications

Sat, 12 Dec 2020 20:52:39 +0000

Last updated: April 9, 2021

A set of notes I made while learning about web authentication. As with anything else in software development, there are tradeoffs between using sessions and JWTs. At the end of this post, I highlight the security concerns that need to be taken into consideration when using JWT authentication.

Traditional client-server interactions were straightforward request-response cycles. Modern Single-Page Application (SPA) interactions can be more complex, and can involve multiple clients and servers.

Traditional server-side authentication

In server-side web applications, the user signs in and their credentials are sent to the server. The server checks these credentials against a database, and if everything matches, a session is created on the server. The session is a piece of data that identifies the user. After the session is created, a cookie containing the user's session_id is sent back to the browser. The cookie is saved in the browser, and anytime the user makes a request, the cookie is automatically sent along with the request to the server. The server extracts the session_id from the cookie and verifies that the session exists in the database. This is how users remain authenticated over multiple, separate requests, and is an example of stateful authentication.
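The stateful flow can be sketched in a few lines of Node.js. Here an in-memory Map stands in for the session table in the database, and the function names logIn and authenticate are hypothetical; credential checking is left out.

```javascript
const { randomUUID } = require("node:crypto");

// Stand-in for the sessions table: session_id -> session data.
const sessions = new Map();

// Called after the credentials have been checked against the database.
// Returns the value for the Set-Cookie response header.
function logIn(userId) {
  const sessionId = randomUUID();
  sessions.set(sessionId, { userId, createdAt: Date.now() });
  return `session_id=${sessionId}; HttpOnly; Path=/`;
}

// Called on every request with the incoming Cookie header.
// Returns the user ID if the session exists, otherwise null.
function authenticate(cookieHeader) {
  const match = /(?:^|;\s*)session_id=([^;]+)/.exec(cookieHeader || "");
  const session = match ? sessions.get(match[1]) : undefined;
  return session ? session.userId : null;
}
```

The server has to keep the sessions map around between requests, and that stored state is what makes this approach stateful.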

What is a session?

In general terms, a session is a way to preserve a desired state. For both server-side and client-side authentication, this piece of state determines whether the user is authenticated.

In a session, this data is stored in memory on the server (or database). For server-side authentication, this is the identification (session_id) for the user, which is used to make a determination about the user's authentication status. Keeping sessions in this manner is stateful.

In client-side authentication, the SPA has no way of knowing on its own whether a user is authenticated. We can't store a session in the traditional manner, because the SPA is decoupled from the backend.

Client-side authentication

The client-side authentication flow starts off similar to the server-side flow. The user submits their credentials, which are sent to the server and checked against the database. If everything matches, a token is created and signed instead of a session. The token is sent back to the user and saved in the browser, usually in local storage or a cookie. The token is then attached to the Authorization header on every subsequent HTTP request.

When the request is received on the backend, the token's signature is verified with the secret key stored on the server. The server then checks the payload and looks at the claims on the token. If the token is valid, the requested resource is returned; otherwise, a 401 is returned.

RESTful APIs have a formal constraint that they should be stateless, so traditional authentication doesn't conform to those standards. Authentication with tokens can be stateless and is a good approach for authentication with a SPA and RESTful API.

SPAs no longer rely on the server to do authentication. Instead, the client claims to the server that it is authenticated with the token. The backend can now receive requests from multiple clients and it only cares if the token is valid. The backend acts as a decoupled API serving up resources, which means there is no need for additional user access lookups because this information can be included right in the payload.

What's in a JWT?

I've been vague about what a token is so far, so let's get into the details. By token, I am referring to a JSON Web Token.

JWTs are a method for communicating between computers in a secure way, through a JSON payload. A web browser can send a claim (assertions) with a JWT to a server that asserts something about the identity (user). The client says “believe me, and here's the proof”. The token is digitally signed, and contains the proof that this client is who they say they are. All this information exists in the self-contained, compact JWT. This is a great method for performing stateless authentication.

Stateless authentication means that the client and server don't need to know too much about one another. The token contains the necessary information to identify the user, whereas stateful authentication uses sessions to identify users. These sessions must be stored in the database, hence stateful.

This is the structure of a basic JWT and its three main components. The header contains meta information about the token, the payload contains the information you want to pass along, and the signature is what makes the JWT secure. The signature is computed by hashing the encoded header and payload together with a secret key held by the server. This makes the token “unchangeable”, in that any change to the header or payload would produce a different signature.

It's important to note that JWTs are not encrypted, meaning that if someone were to get a hold of a JWT, they could easily decode it and extract the payload from it. This is why you should not put sensitive information in the payload, just enough to identify the user. However, if a JWT is modified, it is immediately invalidated because its signature no longer matches the one the server computes from the modified content.
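To make the signing and tamper detection concrete, here is a minimal sketch of creating and verifying an HS256-signed token with Node's built-in crypto module. The function names are hypothetical, and the sketch skips checks that a real application needs (such as validating the alg header and the exp claim), so a maintained JWT library is the better choice in practice.

```javascript
const crypto = require("node:crypto");

// Base64url-encode a JSON object. This is encoding, not encryption.
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");

// Build header.payload.signature, signing with HMAC-SHA256.
function signToken(payload, secret) {
  const header = { alg: "HS256", typ: "JWT" };
  const signingInput = `${encode(header)}.${encode(payload)}`;
  const signature = crypto
    .createHmac("sha256", secret)
    .update(signingInput)
    .digest("base64url");
  return `${signingInput}.${signature}`;
}

// Returns the decoded payload if the signature matches, otherwise null.
function verifyToken(token, secret) {
  const [header, payload, signature] = token.split(".");
  if (!header || !payload || !signature) return null;
  const expected = crypto
    .createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest();
  const received = Buffer.from(signature, "base64url");
  // Compare in constant time to avoid leaking how many bytes matched.
  if (received.length !== expected.length) return null;
  if (!crypto.timingSafeEqual(received, expected)) return null;
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}
```

Swapping in a different payload while keeping the old signature makes verifyToken return null, because the recomputed signature no longer matches.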

The payload contains claims about the entity for which it was issued. The JWT standard describes a set of reserved claims: iss, sub, aud, exp, nbf, iat, jti. You can see above in the picture that sub is a type of user ID and can be used to identify the user. You can add any arbitrary claim, such as name.

Implementation details about JWT authentication

Because JWT authentication is stateless, the best method for determining the user's authentication status is to go by the JWT's expiry time. If the JWT is expired, it can't be used to access protected resources.

When the user logs in, we provide an application-wide “flag” to indicate the user is logged in, by putting the token in local storage. At any point in the application's lifecycle the token's exp value can be checked against the current time. An example of a lifecycle event is the route changing. If the token expires, we change the “flag” (remove from local storage) to indicate the user is logged out.

To conditionally render content based on the user's authentication status, we can implement a function that checks whether the user is authenticated. We do this by grabbing the decoded token's expiry time and comparing it to the current time. We can then store this isAuthenticated flag in the global state.
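A minimal version of such a check might look like the sketch below. It decodes the payload by hand and assumes the payload contains only ASCII characters; a library such as jwt-decode handles the general case. The helper name decodePayload is hypothetical, and the nowMs parameter exists only to make the function easy to test.

```javascript
// Decode the middle part of a JWT without verifying the signature.
function decodePayload(token) {
  const part = token.split(".")[1];
  // Convert base64url to standard base64 and restore the padding.
  const base64 = part.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, "=");
  return JSON.parse(atob(padded));
}

// exp is expressed in seconds since the epoch; Date.now() is in milliseconds.
function isAuthenticated(token, nowMs = Date.now()) {
  if (!token) return false;
  try {
    const { exp } = decodePayload(token);
    return typeof exp === "number" && exp * 1000 > nowMs;
  } catch {
    return false; // a malformed token counts as logged out
  }
}
```

Since the signature is not verified here, this check is only suitable for deciding what to render, not for protecting data.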

User information in the JWT payload

The JWT payload is what makes the JWT useful as it contains a “summary” of the user. We want to use the information in the payload to feed our profile view. A JavaScript library that can decode the JWT payload is jwt-decode.

Payload best practices

It may be tempting to store a whole profile object in the payload, but we shouldn't do this. It's important to keep the JWT small because it is sent on every single request. Furthermore, because the JWT is decodable, we don't want to store any sensitive information in there.

So, what should be in the payload? Basic user information such as email, name, and picture are good things to keep in the payload that can build a simplified user profile. Consider providing a separate endpoint which retrieves a user profile object if you need more profile data.

Protecting API resources

The whole point of implementing authentication in an app is to restrict resource access to users who own those resources. We can think of the different levels of access:

Publicly Accessible: Open to anyone
Limited to Authenticated Users: Open to anyone logged in
Limited to a Single Authenticated User: Open only to the user logged in
Limited to a Subset of Authenticated Users: Open to anyone of a specific privilege

JWT middleware

We can create API endpoints that require an authentication check. To pass the check, a valid JWT must be present. If it's valid, access is granted for the resource.

To make this JWT check available for multiple endpoints, we can create a custom middleware that extracts the incoming Bearer token and verifies it with the server's secret key. We can also specify the scope or levels of access for a specific endpoint by using a library like express-jwt-authz, which will validate against the scope value from a JWT payload.
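As a sketch of what such middleware could look like, the functions below follow the Express (req, res, next) convention but depend only on Node's standard request and response objects. The names requireAuth and requireScope are hypothetical, verify stands for whatever function checks the token against the server's secret, and the scope claim is assumed to be a space-separated string as in OAuth 2.0.

```javascript
// verify(token) is assumed to return the token's claims, or null if invalid.
function requireAuth(verify) {
  return function (req, res, next) {
    const header = req.headers.authorization || "";
    const [scheme, token] = header.split(" ");
    const claims = scheme === "Bearer" && token ? verify(token) : null;
    if (!claims) {
      res.statusCode = 401; // missing or invalid token
      res.end("Unauthorized");
      return;
    }
    req.user = claims; // make the claims available to later handlers
    next();
  };
}

// Runs after requireAuth and checks a single required scope.
function requireScope(scope) {
  return function (req, res, next) {
    const granted = ((req.user && req.user.scope) || "").split(" ");
    if (!granted.includes(scope)) {
      res.statusCode = 403; // authenticated, but not allowed
      res.end("Forbidden");
      return;
    }
    next();
  };
}
```

A protected endpoint would then list requireAuth, and optionally requireScope, before its own handler.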

Making authenticated requests

Sending authenticated requests from the client to the server requires us to first retrieve the JWT from local storage and attach it to the Authorization header. Or, if we choose to store the token in a cookie, it will automatically be attached to every request by the browser.

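// getToken() stands in for however the app reads the stored JWT.
const token = await getToken();
const response = await fetch('/protected-route', {
	headers: {
		// The Bearer scheme tells the server that a token follows.
		Authorization: `Bearer ${token}`
	}
});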

The Bearer scheme comes from the OAuth 2.0 specification.

Protecting client-side routes

In your client-side application, you may want to protect certain routes from unauthenticated users. In traditional web apps, the server can verify the user's session before serving up the requested page. In SPAs, however, we don't have a server protecting our routes and the entire client-side is bundled and loaded on the first visit. We can protect API resources using the JWT check, but how will we protect client-side routes?

Protecting client-side routes can be tricky because the user can modify the expiration time or scope in their JWT. Moreover, we can't verify the JWT on the client-side because the secret only lives on the server.

Does it matter if the user accesses protected client-side routes? In the end, even if a user hacks their way into the protected route, they will not receive any information because protected resources live on the server. The server is responsible for feeding data into the client view, so as long as sensitive information is stored correctly on the server, there is no issue with a simple check on the client-side.

Given what we discussed above, we can safely use the expiration time and scope claims from the JWT to decide what to render to the user. We can build out a PrivateRoute component to wrap routes we want to protect.
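The decision logic behind such a wrapper can be sketched independently of any UI framework. In React, for example, it would sit inside the PrivateRoute component and decide between rendering the child route and redirecting. The route shape, the /login redirect target, and the function names below are assumptions for illustration.

```javascript
// route: { path, private, scopes }. claims: decoded JWT payload or null.
function canAccess(route, claims, nowMs = Date.now()) {
  if (!route.private) return true;
  // exp is in seconds; an expired or missing token means logged out.
  if (!claims || typeof claims.exp !== "number" || claims.exp * 1000 <= nowMs) {
    return false;
  }
  // Assumes a space-separated scope claim.
  const granted = (claims.scope || "").split(" ");
  return (route.scopes || []).every((scope) => granted.includes(scope));
}

// Returns the path to render: the route itself, or the login page.
function resolveRoute(route, claims, nowMs) {
  return canAccess(route, claims, nowMs) ? route.path : "/login";
}
```

This is a convenience for the user interface only; the server still has to verify the token before returning any protected data.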

Important considerations

Nothing is 100% secure, and JWTs are no exception. There are several attack vectors on JWT authentication. If you store your token in local storage, you are vulnerable to cross-site scripting (XSS), so you should protect your site against XSS with standard defenses such as escaping user-supplied content before rendering it. If you store your token in a cookie, you are vulnerable to cross-site request forgery (CSRF). Cookies can be set as httpOnly to prevent JavaScript access. It is commonly recommended to store the token in an httpOnly cookie.

There are also man-in-the-middle (MITM) attacks, which are prevalent on public networks. Always serve your application and API over HTTPS.

Because JWTs should be short-lived (around 1h), we need to implement refresh tokens to provide a better user experience. A refresh token is used to obtain new tokens for the user when their tokens expire or are about to expire. These refresh tokens are long-lived and tracked on the server. They should be kept secret and never stored anywhere that client-side JavaScript can read them.