url-shortener

Design a URL Shortener

Posted by

Introduction

Have you ever thought, about what happens when you paste a URL into LinkedIn Post content, It just gets converted into a short URL, How is that short URL generated, when you click on a short URL it sends you to the actual URL, This all happens with the help of a service named as URL Shortening Service.

What is URL Shortening Service

A service that converts a long URL into a short URL.

And redirects the short URL to the long URL.

This is the concept of URL shortening service, many platforms like bit.ly, and TinyURL have products around this concept.

So In today’s post, we are going to discuss How to design a URL shortening service. So let’s get started.

First of all, let’s discuss the Functional and NonFunctional Requirements:

Functional Requirements

  • Service should convert a long URL into a short URL.
  • On Clicking of short URL service should redirect to the long URL.
  • Service should consider the expiration of Old URLs

Non Functional Requirements

  • Service should be scalable and highly available. (because if the service goes down, all the URL routing will stop working)
  • Service should be fault-tolerant.

Capacity Estimation

Now that we have the requirements clear, so before starting the design we should consider how much data we are going to store, based on that it will help us understand database and server capacity. 

So if there are 1 lakh requests per month in our application. and for each request, we are going to store the following data.

  1. long URL: It will take approx 2 kb because it will be varchar.
  2. short URL: We are considering a short URL a few characters long so it will take 7-8 bytes approx.
  3. createdAt: This will also take 8 bytes

So a total of approx 2.014 KB space for one record and in 1 year total space required will be 2.014 * 100000 * 12 = 2416800 KB = 2.4 GB.
So this calculation is for 1 lakh unique requests every month and in 1 Year 2.4 GB space is required.

Short URL Length

We are thinking of making a short URL length of 7 which is an enough large string. and we will be using 62 characters[0-9A-Za-z] using these 62 characters every time a 7-character long short URL is generated.

So using this combination we will be able to generate 62^7 possible unique short URLs, which is quite a high number, If we have used only [0-9] which means 10 characters then only 10^7 unique short URLs can be generated, so using alphabets is a good idea.

Now since this is an Interview question, so don’t directly jump into the solution or approaches instead discuss these things with the interviewer first and then explain them.

We are clear now about the requirements, storage, and URL pattern. So now let’s discuss the actual problem which is how to generate a short URL.

Approaches to Generate Short URL

  1. Using Base62 encoding: The base 62 encoding technique uses 62 characters [0-9A-Za-z] and using them generates an encrypted string. Base 62 encoding uses an algorithm to encode. 

Using the above encode method just generates a short string from a long URL, Now you might be wondering if this method takes a number as input, not a long URL, so our job is to convert a long URL (string) to a long one using some algorithm like SHA-256 and then pass that number into encode method.

2. Using MD5: MD stands for MessageDigest, MD5 is an algorithm that is used to generate encrypted short URLs for a long URL. The only thing is it generates a very long short URL so just pick starting few bytes of it.

The disadvantage of this is that it can generate the same short URL for different long URLs which will cause collision and data will become redundant in DB.

Store the mapping in the Database

Now since we have an approach to convert we will be converting and storing mappings into the database for each request.

 1. Base62 Technique: If we use this technique for conversion, before storing the record in the DB need to check if the same short URL is already generated or not if it is generated then generate again a new short URL then insert it.

Another problem could be if users send the same long URL multiple times it will generate a new short URL every time, which will cause a data duplication in the long URL field. So to overcome this problem before proceeding with the request just check if a long URL exists in DB the just return the short URL associated with it.

Now next problem could be if multiple users send the request same time from different servers, The same short URL may be generated for both requests but both were not present in the DB before but they can be inserted at the same time, so this will also corrupt the data because same short URL for two long URLs is a problem.

So to overcome this problem in Spring Boot we have a version column in the database, which will make the process to access one by one and it will check each update request. I have updated the code with the above implementation so you can explore the code.

2. MD5: The MD5 technique will also be used the same way benefit of MD5 is that it will generate the same short URL every time if the long URL is the same, so this will help in reducing data duplication.

But for inserting records in both techniques we need to use RDBMS query help, which is INSERT IF NOT EXISTS so it will only insert unique records. But since in NOSQL we don’t have this way to insert data, we need to think about another approach.

3. Counter Approach: 

We will be maintaining one counter variable, It will be stored either in DB or in a memory cache anywhere, but what we will be doing is each new request will be taking the current value of the counter and using that will generate a short URL and it will increment the counter subsequently, so next request will get the different value of the counter. It will help in removing the problem of duplicate short URLs.

But the problem is this approach has a single point of failure, what if the DB that is giving the counter goes down because multiple servers will be hitting the same DB to get the value of the counter? In that case, the complete application will go down.

To overcome this problem you can say we can have multiple DBs for the counter, and different servers will use different counter DBs, but the management of this is difficult so Zookeeper can be used here for managing multiple servers.

Ok, So the implementation for this counter approach is also given in the code repo. You can check that and if you think there could be some more better approach you can create a PR for that.

Databases

For the above service, we can use an SQL database, which is ACID compliant. but the thing if the load in the application increases the performance of the system can degrade, because sharding is difficult and complex to do.

If the load increases then using NoSQL we can easily handle performance, by increasing the number of nodes easily.

Get the long URL

The next point is getting the long URL, to get the URL in a performant way we should index on the short URL column because every time we are going to fetch the record by short URL. index will make the query fast

Expired URLs

To handle expired URLs we should run some timer tasks at regular intervals to monitor the expired URLs and remove them from DB.

I have added all the code related to all the tasks in GitHub, So please check it.

Github Repo Link

Thanks for reading!!

One comment

Leave a Reply

Your email address will not be published. Required fields are marked *