Technology blogs carry many articles on database scalability and its applications. The topics are scattered, and the techniques are often introduced haphazardly, without much context. Most of these discussions do not proceed step by step, which makes them difficult for beginners to follow. Anyone starting out with database scalability challenges needs to know the available scaling options, how to identify the most feasible one, and how to administer it.
So, in this article, we discuss database scalability step by step, in a real-world context, for beginners. It is not a deeply technical article; it is written to be easy to understand. We may discuss the individual scaling techniques in detail in forthcoming articles. Here, we start with an overview.
A sample case for database scalability
For most beginners, the following sample case will be a close match. Assume that you run a startup that offers a service at a low cost. You started at a minimal scale, targeting customers in a single city, and your customers numbered in the tens.
You saved all your customer information, demographics, and transactional data in a single database on a single physical database server. In the beginning, you may not be aware of, or have any need for, fancy caching or a big data pipeline for data analysis and decision making. This setup works well while there are only a few customers.
As time passes and people start to love your service, your customer base begins to build up slowly. If you also advertise, you are likely to attract even more customers. Soon, your customer base grows to hundreds or perhaps thousands. Next, you expand your business to a second city or more, and the number of customers may double.
At this point, you may start to realize that your existing database system performs very poorly. The symptoms include API latency that troubles users more and more often, transactional deadlocks, slower application performance, and, in the worst case, frequent system failures. Your service app takes longer to respond, and each such instance ends in customer dissatisfaction. You start to see customers leaving because of poor online performance.
In this typical scenario, here are some patterns to identify and follow:
Implementation of the connection pool and query optimization
The first thing you may think of is to cache relatively static data such as service bookings, payments, and customer profiles. But caching at the application layer alone cannot resolve every API latency problem. If you find that the database is heavily normalized, you can denormalize it by introducing redundant columns, choosing the ones that appear most often in the JOIN ON and WHERE clauses of your queries. Doing this helps reduce the number of join queries, and you can also break bigger queries into multiple smaller ones.
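As a minimal sketch of the denormalization idea, the example below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample rows are hypothetical and chosen only for illustration; the redundant `customer_city` column lets a frequent query skip the join.

```python
import sqlite3

# In-memory database; the schema and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE bookings (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL,
        customer_city TEXT  -- redundant copy of customers.city (denormalized)
    );
    CREATE INDEX idx_bookings_city ON bookings (customer_city);
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Delhi")],
)
conn.executemany(
    "INSERT INTO bookings (customer_id, amount, customer_city) VALUES (?, ?, ?)",
    [(1, 100.0, "Pune"), (1, 50.0, "Pune"), (2, 70.0, "Delhi")],
)


def total_by_city_join(city):
    """Normalized access path: needs a JOIN to filter bookings by city."""
    row = conn.execute(
        "SELECT COALESCE(SUM(b.amount), 0) FROM bookings b "
        "JOIN customers c ON c.id = b.customer_id WHERE c.city = ?",
        (city,),
    ).fetchone()
    return row[0]


def total_by_city_denormalized(city):
    """Denormalized access path: filters on the redundant column, no JOIN."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM bookings WHERE customer_city = ?",
        (city,),
    ).fetchone()
    return row[0]
```

Both functions return the same totals. The trade-off is that the redundant column has to be kept in sync whenever a customer's city changes.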
Another optimization is to tune the database connections. Database client libraries are usually external libraries, available for nearly every programming language. You can use a connection pool library to keep database connections open and reuse them instead of opening a new one for every request. You may also configure the connection pool size, or the maximum number of connections, in the DBMS itself.
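To make the pooling idea concrete, here is a minimal sketch of a fixed-size connection pool built from Python's standard library. The `ConnectionPool` class is a hypothetical helper written for this article, not the API of any particular pooling library, and sqlite3 stands in for whatever database driver you actually use.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool: connections are opened once and reused."""

    def __init__(self, database, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection be handed
            # to whichever thread borrows it next.
            self._pool.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._pool.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always return the connection to the pool

    def close(self):
        while not self._pool.empty():
            self._pool.get_nowait().close()


# Note: with ":memory:" every pooled connection is its own empty database,
# which is fine for this demonstration query.
pool = ConnectionPool(":memory:", size=2)
with pool.connection() as conn:
    result = conn.execute("SELECT 1 + 1").fetchone()[0]
```

The pool size caps how many connections the application holds open at once, so it should stay below whatever connection limit the database server enforces.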
Scaling up
After examining the system metrics, you may conclude that there is no easier way to scale the database than to upgrade or add to the system's hardware. You can increase the RAM and expand the disk space to two or three times its current size. This approach is called scaling up, or vertical scaling. You may ask your server administration team or DevOps team to do it. There are also third-party providers of database solutions that can help you upgrade your system.
As RemoteDBA points out, in vertical scaling you allot a bigger machine to the database. Rather than migrating the data manually from the old system to the new one, you can set up the new server as a replica of the existing system and let replication copy the data across. Once the replica has caught up, you promote the bigger machine to primary and disconnect the older machine. Because the bigger system can serve more requests simultaneously, all database reads and writes then run through this machine.
CQRS (Command Query Responsibility Segregation)
In some cases, even the bigger machine may not be able to handle all the read and write requests. Some workloads need more capacity for writes than for reads, and others the reverse. So, it is a good idea to separate your read and write operations onto different machines. Each machine can then be tuned to handle its read or write operations more effectively.
In this case, you may take a couple of machines and set them up as replicas of the current machine. Data is then replicated from the primary machine to the two replicas. You send all read queries to the replicas (any replica can serve a read request) and route the write queries to the primary. Replication may take a little time to complete, so replicas can briefly lag behind the primary; whether that delay is acceptable depends on your specific business use case.
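The routing described above can be sketched in a few lines of Python. Everything here is an illustrative assumption: `FakeServer` is a stand-in for a real database connection, `ReadWriteRouter` is a hypothetical helper, and treating any statement that starts with SELECT as a read is a deliberate simplification.

```python
import itertools


class FakeServer:
    """Stand-in for a database connection; records the statements it receives."""

    def __init__(self, name):
        self.name = name
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        return self.name  # report which server handled the statement


class ReadWriteRouter:
    """Sends reads to replicas in round-robin order and writes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Assumes at least one replica is configured.
        self._replicas = itertools.cycle(replicas)

    def execute(self, sql):
        # Simplification: only statements beginning with SELECT count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        target = next(self._replicas) if is_read else self.primary
        return target.execute(sql)


primary = FakeServer("primary")
replicas = [FakeServer("replica-1"), FakeServer("replica-2")]
router = ReadWriteRouter(primary, replicas)

served_by = [
    router.execute("INSERT INTO bookings VALUES (1, 'Pune')"),
    router.execute("SELECT * FROM bookings"),
    router.execute("SELECT * FROM bookings WHERE id = 1"),
    router.execute("UPDATE bookings SET city = 'Delhi' WHERE id = 1"),
]
# served_by is ["primary", "replica-1", "replica-2", "primary"]
```

Because of replication lag, a read routed to a replica immediately after a write may not see that write yet. A real router might send such reads to the primary when the use case cannot tolerate stale data.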
Partitioning
If your location data gets a huge volume of read and write traffic, it puts more pressure on your existing database. The location tables may contain primary data related to customers and service transactions but may not have much to do with the financial data. So, the best approach may be to move the location tables to a separate database schema.
This is partitioning of the database by functionality. Different databases host the data for different functional areas. With this technique, you can focus on scaling up the high-demand functionalities, the ones with a higher rate of read/write requests. The backend layer takes responsibility for combining the results as necessary.
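Here is a minimal sketch of functional partitioning, assuming two separate databases, one for location data and one for financial data. Two in-memory sqlite3 databases play the role of the separate servers, and the table names, columns, and the `customer_summary` function are hypothetical. The backend function queries each database on its own and combines the results in application code.

```python
import sqlite3

# Two independent databases, one per functional area (illustrative only).
location_db = sqlite3.connect(":memory:")
finance_db = sqlite3.connect(":memory:")

location_db.execute(
    "CREATE TABLE customer_locations (customer_id INTEGER PRIMARY KEY, city TEXT)"
)
finance_db.execute("CREATE TABLE payments (customer_id INTEGER, amount REAL)")

location_db.executemany(
    "INSERT INTO customer_locations VALUES (?, ?)", [(1, "Pune"), (2, "Delhi")]
)
finance_db.executemany(
    "INSERT INTO payments VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 70.0)]
)


def customer_summary(customer_id):
    """Backend-layer join: fetch from each partition, then merge in Python."""
    city_row = location_db.execute(
        "SELECT city FROM customer_locations WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    total = finance_db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM payments WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()[0]
    return {
        "customer_id": customer_id,
        "city": city_row[0] if city_row else None,
        "total_paid": total,
    }
```

Since the two databases can no longer be joined in SQL, cross-partition queries like this one cost an extra round trip, which is the price paid for being able to scale each functional area on its own.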
There are more patterns too, such as horizontal scaling, sharding, and data-center-wise partitioning, which will help you scale your database as the business grows. These are popular database scaling techniques that may help you build a better database system and architecture.