Indian Fintech Paytm Introduces Plutus, A Cloud Cost Optimization Tool

Indian fintech firm Paytm says that it is growing rapidly, and so is its cloud infrastructure.

The main reason for choosing cloud computing is its ability to scale in and out as requirements change. As Paytm explains in a blog post, cloud infrastructure makes it “highly efficient” and easy to modify IT infrastructure, including processing power, networking, storage, and memory, without negatively impacting operations.

Paytm uses third-party cloud providers such as AWS, Azure, and Google Cloud, and has managed to offer a seamless user experience to millions of customers.

While expanding horizontally and vertically, Paytm created half of its existing infrastructure during the past 5 years. There are 100+ AWS accounts, more than 100 verticals, thousands of servers, multiple Petabytes of cloud storage and thousands of micro-services “running at the same time at this moment.”

Because appropriate processes for creating cloud resources were lacking in the beginning, these micro-services were “set up in a very complicated manner.” The team at Paytm also mentioned that it “became really difficult to keep track of all the resources and their optimization.”

The firm pointed out that Paytm “needed a monitoring system throughout the organization”: a system where “all the vertical owners, technical leaders, and other stakeholders can take a look to find out the current state of cloud infrastructure and the exact cost associated with it.”

Furthermore, it was expected that this tool “should also suggest how the entire infrastructure can be optimized both cost-wise and performance-wise.”

Paytm has named this tool The ‘Plutus’, “inspired by the Greek God of wealth.”

Introducing the Paytm Cloud Explorer Reports

The first step was the “most difficult and time-consuming,” the company revealed. It involved getting every AWS account onboarded. After onboarding all the accounts, “a program was written to check the current state of every AWS cloud resource.”

Based on the information obtained, the program determined the cost associated with each resource, along with “potential cost savings and other monitoring parameters to help decision making.” The resulting reports revealed “potential cost savings for cloud infrastructure.”
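To make the idea of such a report concrete, here is a minimal sketch of how per-resource findings might be rolled up into one summary per AWS account. It is not Paytm's program: the `ResourceRecord` fields, the `savings_by_account` function, and the sample figures are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    # Hypothetical shape of one scanned cloud resource.
    account_id: str
    resource_id: str
    monthly_cost: float              # current monthly spend
    potential_monthly_saving: float  # estimated saving if optimized


def savings_by_account(records):
    """Sum current cost and potential saving for each account."""
    totals = defaultdict(lambda: {"cost": 0.0, "saving": 0.0})
    for record in records:
        totals[record.account_id]["cost"] += record.monthly_cost
        totals[record.account_id]["saving"] += record.potential_monthly_saving
    return dict(totals)


# Illustrative data only; the IDs and amounts are made up.
sample = [
    ResourceRecord("acct-a", "i-001", 100.0, 40.0),
    ResourceRecord("acct-a", "vol-002", 20.0, 20.0),
    ResourceRecord("acct-b", "i-003", 50.0, 0.0),
]
report = savings_by_account(sample)
```

A summary like this could then be sent to each account owner, which matches the email-based reporting the article describes next.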

As noted in the update:

“The cost savings program started sending the cost reports to all the AWS owners along with the technical leaders of Paytm. Soon, it was realized that the cost optimization throughout Paytm can not be driven through email chains and email reports. This led to creating a tool where the cloud resources can be optimized for cost savings. One such use case where we decided to use EC2 spot instances instead of On-demand instances to reduce the compute cost by 50-60%.”
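The arithmetic behind a saving in the quoted 50-60% range can be sketched as follows. The hourly prices, fleet size, and the `spot_savings` function are assumptions for illustration; actual spot prices vary by instance type, region, and time.

```python
def spot_savings(on_demand_hourly, spot_hourly, instance_count, hours):
    """Return (absolute saving, saving as a percentage of on-demand cost)."""
    on_demand_cost = on_demand_hourly * instance_count * hours
    spot_cost = spot_hourly * instance_count * hours
    saved = on_demand_cost - spot_cost
    return saved, saved / on_demand_cost * 100


# Hypothetical prices: 0.10 per hour on-demand versus 0.04 per hour on spot,
# for 10 instances running 720 hours (roughly one month).
saved, percent = spot_savings(0.10, 0.04, 10, 720)
```

With these made-up prices the saving works out to about 60% of the on-demand bill, at the upper end of the range Paytm cites.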

The Plutus ASG Module

EC2 spot instances, “allowed in only Auto Scaling Groups, are cheaper than the regular ones, but there is a risk”: the risk of “being interrupted/terminated by AWS at 2-minute notice.” Spot instances are “ideally designed to host stateless and fault-tolerant micro-services.”
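For context on the two-minute notice: AWS publishes it to the affected instance as a small JSON document under the instance metadata path `spot/instance-action`, containing an `action` and a UTC `time`. The sketch below only parses such a document and computes how much time remains. The function name and the sample payload are hypothetical, fetching the document from the metadata service is left out, and this shows the standard notice rather than the earlier prediction that Plutus is described as making.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(instance_action_json, now):
    """Parse a spot instance-action document.

    Returns (action, seconds remaining until the action is taken).
    `now` must be a timezone-aware datetime.
    """
    doc = json.loads(instance_action_json)
    # The notice carries a UTC timestamp such as "2024-01-01T00:02:00Z".
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return doc["action"], (when - now).total_seconds()


# Made-up notice, read exactly two minutes before the scheduled action.
notice = '{"action": "terminate", "time": "2024-01-01T00:02:00Z"}'
current_time = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
action, remaining = seconds_until_interruption(notice, current_time)
```

A service that sees a notice like this has only the remaining seconds to drain connections and hand work over, which is why stateless, fault-tolerant workloads suit spot instances.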

To enhance the reliability of these instances, “an intermediate platform is required,” the team at Paytm explains. This platform ensures the proper infrastructure configurations and “leverages the data of thousands of other spot instances from all the onboarded accounts.”

Based on the data collected, the platform “predicts the interruption early enough to make the necessary changes in infrastructure for a graceful transition from one machine to a newly launched machine.”

The Plutus became this intermediate platform to “ensure the highest level of reliability in discounted/Spot virtual machines.”

As noted by Paytm:

“The vision for the Plutus is to become a one-stop solution when it comes to cloud infrastructure cost savings, optimization, and self-healing ability. In future, this platform will allow users to create automated scripts such as auto-tagging AWS creators on resources, stopping EC2 that are using unapproved AMIs, auto enforcing SSL for critical services, deleting the unencrypted volume, and many more other use cases. The spot module will support EMR clusters along with ASGs. Network cost violations & diagnosis will be added along with Fix/Negate features.”

For the full details on this update, shared by Paytm, check here.


