How bad is the situation?
We need to perform an audit to answer this question in numbers. At the first stage of the audit, the critical indicators are the server-side page generation time and the browser rendering time. A combined figure of under one second is a good result. For a simple audit, the indicator is the total page display time divided by 1 second.
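As a rough illustration of this first-stage check, the sketch below measures the total fetch time of a page and computes the "display time / 1 second" indicator described above. The URL is a hypothetical placeholder, and fetch time is only a proxy for full display time (it excludes client-side rendering).

```python
import time
import urllib.request

def page_display_time(url: str) -> float:
    """Measure the total time to fetch a page body, in seconds.
    A rough proxy for display time: it ignores browser rendering."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        response.read()  # include the time to download the body
    return time.monotonic() - start

def audit_indicator(display_time_seconds: float) -> float:
    """The simple audit indicator: display time divided by 1 second.
    Values below 1.0 are good; values well above 1.0 signal a problem."""
    return display_time_seconds / 1.0

# Hypothetical usage (requires network access):
# print(audit_indicator(page_display_time("https://example.com/")))
```

A page that displays in 0.8 s scores 0.8 (good); the 9-second load from the chart below would score 9.0.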
Below is a Yandex.Metrica chart showing that the site has performance problems: full page load can take up to 9 seconds. There is a high chance the user will not wait for the page to load and will simply leave.
Many small boxes are better than one big one!
DEVOPS DEPARTMENT MANAGER, INTARO
Another important indicator is the performance headroom, which you can determine by load testing the live project. Keep in mind that the goal of load testing is to find the maximum performance values, i.e. the values at which you get a denial of service. For that very reason, load testing must be performed with special care, so as not to cause a real outage. The usual load-testing metric is RPS (requests per second) — the number of requests per second your server can handle. Compare the measured value with your current average RPS to get your project's available performance headroom.
Below is an example of load testing with Yandex.Tank. The test was run against a live project during peak hours. You can see that RPS reached values close to 30, but in fact service degradation began at 17 RPS. If you consider an average site load of 100 RPS, this margin is critically low, and the server is a potential victim of DDoS (an attack whose aim is denial of service).
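The headroom arithmetic from the test above can be sketched as follows. The exact numbers are those quoted in the text; treat the interpretation (degradation point versus average load) as an assumption.

```python
def performance_headroom(max_rps: float, average_rps: float) -> float:
    """Available capacity as a multiple of current load.
    max_rps: the point where degradation begins (from load testing).
    average_rps: the current typical load on the site.
    A value below 1.0 means the server cannot even sustain average load."""
    return max_rps / average_rps

# Numbers from the Yandex.Tank run described above:
degradation_rps = 17.0   # denial of service began here
average_rps = 100.0      # average site load used for comparison

headroom = performance_headroom(degradation_rps, average_rps)
print(f"headroom = {headroom:.2f}x")  # far below 1.0: critically low margin
```

A healthy project would show a headroom well above 1.0, leaving room for traffic spikes.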
Usually the reason is the absence of a server architecture, errors in its construction, or a lack of coordination between developers and system administrators. As a rule, different teams of developers, and even of system administrators, may work on the same project. Another possible reason is that the project was built for low load and was not ready for an abrupt traffic spike.
In any case, it is never too late to put things in order. And I'd like to note that driving a project into chaos, just like bringing it into order, is the result of the combined work of developers, system administrators, content managers and so on. That is exactly why you need to identify the project's bottlenecks, divide the areas of responsibility, and solve problems comprehensively.
Servers are the things that make us who we are!
There is a belief that if you buy a sufficiently powerful server, you can stop worrying and forget about upgrades for the next couple of years. This can be true for projects with medium traffic that do not plan to grow their user base in the near future. At first sight it is indeed simple: administer and pay for one server. But for high-traffic projects, with their possible spikes, this approach is fundamentally wrong.
For such projects, downtime during maintenance and minimizing repair time in an emergency are critically important. The expression "time is money" fully applies here. That is exactly why we need to build an architecture that provides the proper level of fault tolerance and scalability.
Three pillars of correct configuration
Let's have a closer look at each point.
By scalability we mean the ability to add compute capacity to the project quickly and simply. That is exactly why you need to separate server roles. You need to understand clearly that many "small" servers are better than one "big" one. All servers of the same role should have an identical configuration; this gives us the ability to scale horizontally in a simple way. If one "small" server fails, we get a proportional performance decrease. If the one "big" server fails, we are likely to get a denial of service.
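The "many small versus one big" argument comes down to simple arithmetic, sketched below with hypothetical server counts.

```python
def remaining_capacity(total_servers: int, failed: int) -> float:
    """Fraction of a role's capacity left after `failed` identical
    servers of that role go down."""
    if failed >= total_servers:
        return 0.0
    return (total_servers - failed) / total_servers

# Four "small" application servers: losing one costs a quarter of capacity,
# a proportional and survivable performance decrease.
assert remaining_capacity(4, 1) == 0.75

# One "big" server: losing it means total denial of service.
assert remaining_capacity(1, 1) == 0.0
```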
If we consider a typical website built on PHP and the MySQL DBMS, the role separation is as follows: request balancer, application server, database server.
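To make the balancer role concrete, here is a minimal round-robin sketch in Python. The backend names are hypothetical; in a real deployment this role is played by dedicated software such as nginx or HAProxy, not application code.

```python
from itertools import cycle

# Hypothetical pool of identically configured application servers.
APP_SERVERS = ["app1.internal:8080", "app2.internal:8080", "app3.internal:8080"]

class RoundRobinBalancer:
    """A toy request balancer: hands each incoming request to the next
    application server in turn, so identical servers share the load equally."""
    def __init__(self, backends):
        self._pool = cycle(backends)

    def pick_backend(self) -> str:
        return next(self._pool)

balancer = RoundRobinBalancer(APP_SERVERS)
picks = [balancer.pick_backend() for _ in range(6)]
# Over 6 requests, each of the 3 servers is picked exactly twice.
```

Because every application server has the same configuration, adding capacity is just a matter of adding another entry to the pool.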