Autoscaling a Load Balanced Web Application

Automatic, horizontal scaling of web servers is an essential capability of the infrastructure behind any service that manages a variable amount of traffic. This post is the first in a four-part series that demonstrates how several of the over 120 features that Oracle Cloud Infrastructure has added in 2019 work together to enable a web application to scale based on changes in demand.

Let’s use a fictional online retailer, TheSmithStore, as an example. At TheSmithStore, web traffic is low overnight and starts to increase at about 6 a.m. EST. Traffic trends upward through the day and generally peaks at about 10 p.m. EST. On the weekend, traffic follows the same general pattern but is much heavier. Additionally, traffic is heavier during the holiday season than at other times of the year. Throughout the year, traffic at its peak is 20 times greater than at its lowest.
TheSmithStore Web Traffic
To handle the traffic variation on-premises, TheSmithStore would need to purchase and maintain the infrastructure required to serve their customers at peak traffic. But maintaining peak infrastructure is a tremendous waste of resources; most of the investment in people, compute, and networking equipment would be unused 300 days of the year. By moving this workload to Oracle Cloud Infrastructure, TheSmithStore's costs would follow their traffic and, more importantly, their revenue. The same number of people required to maintain a fleet of 100 web servers can maintain 2,000 web servers.

Components

The primary Oracle Cloud Infrastructure components that help TheSmithStore respond to variable web traffic are the Load Balancing service, custom Compute images, and Compute autoscaling. Each post in this series focuses on one component of this configuration. The following diagram shows how TheSmithStore uses each Oracle Cloud Infrastructure component and how they interact with each other.
TheSmithStore Architecture
This blog series covers the following resources:
  • One load balancer
  • One custom Compute image
  • One instance configuration
  • One instance pool
  • One autoscaling configuration
  • Multiple Compute instances managed by autoscaling
This blog series does not cover the following resources in detail, but documentation links are provided:
  • One virtual cloud network
  • One regional public subnet with associated route table, internet gateway, and security list
  • One regional private subnet with associated route table, NAT gateway, and security list
  • (Optional) One bastion public subnet with associated route table, internet gateway, and security list
  • (Optional) One bastion compute instance for administrative access to the web servers
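Although the network resources aren't covered in detail, a minimal CLI sketch of the VCN and the two regional subnets could look like the following. This is only a sketch: the CIDR blocks, display names, and shell variables are illustrative assumptions, and the route table, gateway, and security list wiring still needs to be added per the linked documentation.

# Create the VCN (CIDR block and display name are illustrative)
oci network vcn create --compartment-id "$COMPARTMENT_OCID" \
  --cidr-block 10.0.0.0/16 --display-name thesmithstore-vcn

# Regional public subnet for the load balancer (omitting --availability-domain makes it regional)
oci network subnet create --compartment-id "$COMPARTMENT_OCID" --vcn-id "$VCN_OCID" \
  --cidr-block 10.0.1.0/24 --display-name thesmithstore-public

# Regional private subnet for the web servers
oci network subnet create --compartment-id "$COMPARTMENT_OCID" --vcn-id "$VCN_OCID" \
  --cidr-block 10.0.2.0/24 --display-name thesmithstore-private \
  --prohibit-public-ip-on-vnic true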

Series Overview

The rest of this blog series contains the following posts:

Load Balancing Service

Part 2, Preparing a Load Balancer for Instance Pools and Autoscaling, covers how TheSmithStore uses the Oracle Cloud Infrastructure Load Balancing service. This service provides a high availability (HA) endpoint for TheSmithStore's website, making it easy to run a web store that is always available to customers.

Custom Compute Image

Part 3, Designing Images for Compute Autoscaling, covers what TheSmithStore considers when building custom images. A well-designed custom Compute image can launch quickly and immediately take customer traffic.

Compute Autoscaling

Part 4, Using Compute Autoscaling with the Load Balancing Service, covers how TheSmithStore automatically scales their web server fleet. Compute autoscaling automatically adds web servers to, or removes them from, a load balancer based on customer-specified triggers. It ensures that enough capacity is available to serve TheSmithStore customers while also keeping costs low.
For an overview of instance configurations and instance pools, see Enhanced Compute Instance Management on Oracle Cloud Infrastructure.

Get Started

By using these Oracle Cloud Infrastructure components together, TheSmithStore can increase their availability and performance while eliminating idle resource costs. Go to Part 2 to get details on the Load Balancing service for TheSmithStore. You can follow along with the steps in this series with an Oracle Cloud Infrastructure free trial. If you don't already have an Oracle Cloud account, go to https://www.oracle.com/cloud/free/ to sign up today.


--------------
This post is the second in the four-part series on Autoscaling a load balanced web application. You can find links for the entire series at the end of this post.

In Part 1, we set up the scenario of TheSmithStore, a fictional online retailer who needs their web application to respond to variable amounts of customer traffic. TheSmithStore has decided to harness the scalability of Oracle Cloud Infrastructure to ensure the availability of their web application, even during unexpected traffic spikes.
The first component of TheSmithStore's configuration is the Load Balancing service. The following diagram was introduced in Part 1, but here it highlights the load balancer components. Before we look at the details of TheSmithStore's load balancer, let's start with an overview of the Oracle Cloud Infrastructure Load Balancing service.
TheSmithStore - Load Balancer

Load Balancing Service Overview

The essential features of any load balancer are distributing load across backend resources, determining the health of backend resources, and being fault-tolerant. The Oracle Cloud Infrastructure Load Balancing service includes all of these features and more.
The core of the Load Balancing service is the load balancer itself. Load balancers can be used with public or private IP addresses and are provisioned based on needed bandwidth. Every public load balancer is regional in scope and includes a primary and standby. Placement of the redundant load balancers depends on whether you use a regional subnet (recommended) or two availability domain–specific subnets.
Each load balancer has a backend and a frontend. The backend of the load balancer distributes incoming client requests to a set of servers for processing. The frontend of the load balancer processes incoming client connections.
The backend of a load balancer can be configured to distribute traffic based on three types of policies: round robin, IP hash, and least connections. Each backend server is a member of a backend set. A load balancer can have multiple backend sets, each with a health check and the ability to use SSL encryption for communication with the load balancer. The backend set of the load balancer is where the instance pool is connected (see Part 4 of the series).
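To make those backend concepts concrete, here is a hedged CLI sketch of creating a backend set with a round-robin policy and a simple HTTP health check; the load balancer OCID variable and the health check values are assumptions, and the exact options should be confirmed against the CLI reference.

oci lb backend-set create --load-balancer-id "$LB_OCID" \
  --name thesmithstore-webservers \
  --policy ROUND_ROBIN \
  --health-checker-protocol HTTP \
  --health-checker-port 80 \
  --health-checker-url-path / \
  --health-checker-return-code 200
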
The frontend consists primarily of a listener that is configured to handle HTTP, HTTPS, or TCP traffic. The frontend of a load balancer can also be associated with virtual hostnames and route incoming requests to a particular set of backend servers based on a ruleset. Additionally, the frontend of a load balancer can have a set of SSL certificates, private keys, and certificate authorities for each listener.
For more information about the Load Balancing service, including topics like session persistence and HTTP "X-" headers, see the service documentation.

Improved Workflow for Creating Load Balancers

In June, the Oracle Cloud Infrastructure Console was updated with a new workflow for creating load balancers. The workflow simplifies the process of deploying a load balancer into three steps:
  1. You specify details about the core load balancer, the backend, and the frontend.
  2. After you complete the workflow, the load balancer is provisioned, typically in less than one minute.
  3. After the load balancer is provisioned, you configure any virtual hostnames, path routes, and HTTP header rulesets.
Now let’s walk through the new workflow to set up the load balancer for TheSmithStore.

Configure the Load Balancer for TheSmithStore

We configure the load balancer for TheSmithStore to be in a regional public subnet and to respond to traffic by using the HTTPS protocol.
The Load Balancers menu is under the Networking section of the Oracle Cloud Infrastructure Console main menu. On the Load Balancers page, click Create Load Balancer to start the creation workflow.
  1. On the Add Details page of the workflow, you specify type, bandwidth, and network details.
    TheSmithStore uses a small, public load balancer, placed in a regional subnet.
  2. On the Choose Backends page of the workflow, specify the load balancing policy, backend servers, and health checks.
    TheSmithStore uses the weighted round-robin distribution. The instance pool created in Part 4 of this series attaches as a backend set for this load balancer. The default check to determine the health of backend servers is sufficient for TheSmithStore.
    Note: Because the backend set name is a required reference for setting up the instance pool in Part 4, we specify thesmithstore-webservers as the name in the Advanced Options.
  3. On the Configure Listener page of the workflow, specify details related to the frontend, or listener, of the load balancer.
    For TheSmithStore, HTTPS is the protocol for client connections. A private key, certificate, and CA certificate are required when using HTTPS.
  4. After you create the load balancer, copy the load balancer OCID from the details page. In addition to the backend set name, the load balancer OCID is required when you set up the instance pool in Part 4.
    Note: The public IP address of the load balancer is displayed only after provisioning is complete. This address automatically comes from the Oracle Cloud Infrastructure public IP pool.
And we’re done. A highly available load balancer for TheSmithStore is ready to be used with the instance pool created in Part 4.
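If you prefer to script the follow-up steps, a small sketch like the following captures the load balancer OCID and public IP with the CLI; the display name is an assumption, and --query and --raw-output are the CLI's generic JMESPath output controls.

LB_OCID=$(oci lb load-balancer list --compartment-id "$COMPARTMENT_OCID" \
  --query "data[?\"display-name\"=='thesmithstore-lb'].id | [0]" --raw-output)

# The public IP address is populated only after provisioning completes
oci lb load-balancer get --load-balancer-id "$LB_OCID" \
  --query 'data."ip-addresses"[0]."ip-address"' --raw-output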

Wrapping Up

The load balancer is the public endpoint that web requests from TheSmithStore customers hit first. Until backend servers are attached to the load balancer, connections receive a 502 Bad Gateway error, and the overall health of the load balancer is Unknown. In Part 4 of this series, the backend set attaches to an instance pool that is managed by an autoscaling configuration. Before that, Part 3 covers a few things to consider when using Compute autoscaling.
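You can check this behavior directly once the load balancer is provisioned. In this sketch, the IP address is a placeholder, and -k skips certificate verification, which is appropriate only for testing:

curl -sk -o /dev/null -w '%{http_code}\n' https://<load-balancer-public-ip>/
# Expected output while the backend set is empty: 502
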
As always, you can try all these features for yourself with a free trial. If you don't already have an Oracle Cloud account, go to https://www.oracle.com/cloud/free/ to sign up today.

Blog Series Links

Part 2: You are here.

-----------------------
This post is the third in the four-part series on Autoscaling a load balanced web application. You can find links for the entire series at the end of this post.

In Part 1, we set up the scenario of TheSmithStore, a fictional online retailer who needs their web application to respond to variable levels of customer traffic. TheSmithStore has decided to harness the scalability of Oracle Cloud Infrastructure to ensure the availability of their web application, even during unexpected traffic spikes. In Part 2, we set up the load balancer.
This part focuses on the custom Compute image and its interaction with the Monitoring service. The following diagram was introduced in Part 1, but here it highlights the image and the Monitoring service. The connection between them depicts a one-way push relationship: the instance emits raw data points to the service through the OracleCloudAgent.
TheSmithStore - Custom Image
Part 4 describes how metrics provided by the Monitoring service trigger autoscaling events. This post doesn’t cover details about the Monitoring service. For information, including how the Monitoring service extends beyond Compute instance resources, see the service documentation.

Scaling Custom Images

At the core of Oracle Cloud Infrastructure Compute autoscaling are individual Compute instances. Part 4 describes how individual instances are launched and terminated based on an autoscaling configuration. When a scale-out event occurs, new instances reference a custom image as part of the launch process. Custom images should be set up not only to run the service that you want but to do so in a way that interacts well with autoscaling. The key to autoscaling is that no human operator is involved in provisioning or de-provisioning individual instances.
Consider the following best practices when creating a custom image to be used with autoscaling.

Launch Immediately into Service

The instance should start and immediately begin to process work. It wouldn't make sense for these servers to need someone to log in and run a command or create a file before they start to operate. That means there should be a service that runs on system startup.

Prepare at Image Build, not at Instance Launch

Most of the time, an instance pool expands or contracts based on load. When a new instance launches, it should be able to handle work as quickly as possible, which means that it shouldn’t go through a full system build and deploy. Instead, as much preparation as possible should happen when the image is created, such as installing packages, writing configuration, and possibly downloading or indexing data.

Enter the Load Balancer Pool When Fully Warmed Up

Some values are defined only after instance launch, such as values that must be unique to a particular host. Sometimes, an instance might need to synchronize or otherwise warm up before it accepts production loads. When this happens, it might be necessary for the instance not to signal that it's able to accept work until it's truly available. For load-balanced services, the instance might come up with an "unhealthy" health check on purpose and respond as "healthy" only after it's fully warmed up.

Publish Metrics to the Monitoring Service

Any system that scales infrastructure must determine the appropriate amount of resources based on usage. A human operator could check an instance to see if it's overloaded; autoscaling must be able to query an instance's utilization through the Monitoring service. By default, Oracle-provided images emit raw data to the Monitoring service about memory, disk, CPU, and network. For Linux and Windows images not provided by Oracle, the OracleCloudAgent package is available for manual installation. If you need metrics beyond the standard set, you can use custom utilization measurements by following our Publishing Custom Metrics guide.
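As an example of what autoscaling queries behind the scenes, the same CPU metric used in Part 4 can be pulled by hand from the oci_computeagent namespace. This is a minimal sketch that assumes the CLI is configured and a compartment OCID variable is set:

oci monitoring metric-data summarize-metrics-data \
  --compartment-id "$COMPARTMENT_OCID" \
  --namespace oci_computeagent \
  --query-text 'CpuUtilization[1m].mean()'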

Terminate Without Data Loss

Instances that are managed by an autoscaling configuration terminate without warning. Instances receive a power-off signal to shut down services gracefully, but if the system doesn’t stop in time, a hard stop occurs. To prevent data loss, your service should complete its work when it receives a terminate signal.

TheSmithStore Implementation

The web server images for TheSmithStore are built on Oracle Linux 7, have the Apache httpd web server installed and configured, and are enabled to run as a service at launch. Apache httpd knows how to gracefully shut down, so no other action is needed on that point. The prelaunch deployment consists of a web server package and scripts required to warm up our service. Finally, we include configurations to ensure that the service launches with a 503 Service Unavailable error message until warmup is complete.
Following is the manual build process for TheSmithStore, after which you follow the documentation to create a custom image.
Note: The image OCID of the custom image is a required reference in Part 4 of this blog series.
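When the build below is finished, the custom image (and its OCID) can also be captured from the CLI instead of the Console; in this sketch the compartment and build instance OCID variables and the display name are placeholders:

IMAGE_OCID=$(oci compute image create --compartment-id "$COMPARTMENT_OCID" \
  --instance-id "$BUILD_INSTANCE_OCID" \
  --display-name thesmithstore-webserver-image \
  --query 'data.id' --raw-output)
echo "$IMAGE_OCID"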

First run these commands:

# Install the Apache web server and permanently open HTTP in the public firewalld zone
yum install -y httpd
firewall-cmd --zone=public --permanent --add-service=http
# Start httpd automatically at boot
systemctl enable httpd

Then write these files:

/etc/httpd/conf.d/maintenance.conf
If you remember the health checks from Part 2, the load balancer is looking for a 200 response from the instance to determine its health. Here, the default 503 error response ensures that the load balancer doesn’t distribute incoming requests to new instances prematurely.
RewriteEngine On
# Serve the maintenance page as the body of 503 responses
ErrorDocument 503 /maintenance.html
# While the maintenance marker file exists, return 503 for everything except the maintenance page itself
RewriteCond /var/www/html/maintenance.html -f
RewriteRule !^/maintenance.html - [L,R=503]
/var/www/html/maintenance.html
503 Service Unavailable
/usr/local/bin/warmup
#!/bin/sh
# warmup: sync displayName data into our index and mark as healthy for load balancer
curl -Ls http://169.254.169.254/opc/v1/instance/displayName > /var/www/html/index.html
echo >> /var/www/html/index.html
rm -f /var/www/html/maintenance.html
/etc/systemd/system/warmup.service
[Unit]
Description=Warm up web content and clear maintenance mode
After=network.target
  
[Service]
ExecStart=/usr/local/bin/warmup
  
[Install]
WantedBy=default.target

Finally run these commands:

# Make the warmup script executable and enable it to run at boot
chmod +x /usr/local/bin/warmup
systemctl enable warmup
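
Before capturing the image, you can sanity-check the behavior on the build host; this sketch assumes you run it as root:

# Start httpd and confirm that maintenance mode returns 503
systemctl start httpd
curl -s -o /dev/null -w '%{http_code}\n' http://localhost/   # expect 503
# Run the warmup script by hand and confirm the instance now reports healthy
/usr/local/bin/warmup
curl -s -o /dev/null -w '%{http_code}\n' http://localhost/   # expect 200
# Restore the maintenance marker before capturing the image so new instances start "unhealthy"
echo '503 Service Unavailable' > /var/www/html/maintenance.html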

Wrapping Up

With that, a custom image is ready to be used with autoscaling. Be sure to copy the image OCID for use in Part 4. In that final post of this series, the custom Compute image created here launches into an instance pool that is attached to the load balancer from Part 2.
As always, you can try all these features for yourself with a free trial. If you don't already have an Oracle Cloud account, go to https://www.oracle.com/cloud/free/ to sign up today.

Blog Series Links

Part 3: You are here.
-------------------------
This post is the final post in the four-part series on Autoscaling a load balanced web application. You can find links for the entire series at the end of this post.

In Part 1, we set up the scenario of TheSmithStore, a fictional online retailer who needs their web application to respond to variable levels of customer traffic. TheSmithStore is harnessing the scalability of Oracle Cloud Infrastructure to ensure the availability of their web application, even during unexpected traffic spikes. In Part 2, we set up the load balancer. In Part 3, we set up a custom Compute image.
In this final part, we create an Oracle Cloud Infrastructure Compute autoscaling configuration for TheSmithStore. The following diagram was introduced in Part 1, but here it highlights the instance configuration, instance pool, and autoscaling components. Note that only the relationship between autoscaling and the instance pool is depicted as directional. The autoscaling service actively modifies an instance pool. All of the other relationships are purely resource references to each other.

Before You Get Started

In this post, the Oracle Cloud Infrastructure CLI uses prepared JSON files for input. The CLI is set up with an AUTOSCALE profile for this example. The profile includes user, fingerprint, key_file, tenancy, and region in the CLI config file, and compartment-id in the oci_cli_rc file.
If you haven’t used the CLI, read the Use the CLI with Restricted Object Storage Buckets blog post or visit the CLI documentation.
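For reference, here is a hedged sketch of what those two files could contain; all OCIDs, the key path, and the region are placeholders, and the oci_cli_rc entry follows the documented pattern for parameter defaults.

~/.oci/config
[AUTOSCALE]
user=ocid1.user.oc1..<unique-id>
fingerprint=<api-key-fingerprint>
key_file=~/.oci/autoscale-api-key.pem
tenancy=ocid1.tenancy.oc1..<unique-id>
region=us-ashburn-1

~/.oci/oci_cli_rc
[AUTOSCALE]
compartment-id=ocid1.compartment.oc1..<unique-id>
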
In this final post in the series, we’re joining all of the earlier components of the configuration for TheSmithStore. The following resource references are required:
  • Instance configuration: Custom Image OCID from Part 3
  • Instance pool:
    • Load balancer OCID from Part 2
    • Load balancer backend set name from Part 2
Additionally, the instance pool requires references to subnets and availability domains for instance placement.

Autoscaling TheSmithStore

Compute autoscaling works by modifying the number of compute instances in an instance pool based on the aggregated performance metrics of the pool. Before creating an autoscaling configuration, we need to create the instance configuration and instance pool resources. In the case of TheSmithStore, we use one instance configuration that references the custom web server compute image created in Part 3 of this series.

Create the Instance Configuration

To focus more on the autoscaling part of this solution, the following JSON file contains the minimal values required to create a valid instance configuration. In addition to the image OCID created in Part 3, a compartment OCID is a required resource reference. The compartment OCID, in this case, is the compartment where you want to launch instances.
thesmithstore-instance-configuration.json
{
  "display-name": "thesmithstore-webserver",
  "instance-details": {
    "instance-type": "compute",
    "launch-details": {
      "compartment-id": "ocid1.compartment.oc1..aaaaaaaam5onfkwnvelxft5och4u323i53yjpon7wwl",
      "create-vnic-details": {
      },
      "metadata": {
        "ssh_authorized_keys": "ssh-rsa AAAAB3NzaCzh72+7QGDPgpP36F5WotZ/OYeYe1YDWmJIDjVBHCj8Q6T7Oa/uVbecoyrmr0NGesES5RXljiyYcXDioPIXSAZRMWjheeAL admin@localhost"
      },
      "shape": "VM.Standard2.1",
      "source-details": {
        "image-id": "ocid1.image.oc1.iad.aaaaaaaaas2uz4snqbk2fp2hipk3euk3sfdzidxl",
        "source-type": "image"
      }
    }
  }
}
After you have the JSON file, you run the following CLI command to create the instance configuration:
$ oci --profile AUTOSCALE compute-management instance-configuration create --instance-details file://thesmithstore-instance-configuration.json
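
Because the instance pool in the next section references the OCID that this command returns, you can capture it directly in a shell variable; this variant relies on the compartment-id default from the oci_cli_rc file and on the CLI's generic --query and --raw-output options:

IC_OCID=$(oci --profile AUTOSCALE compute-management instance-configuration create \
  --instance-details file://thesmithstore-instance-configuration.json \
  --query 'data.id' --raw-output)
echo "$IC_OCID"
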
For more information about instance configurations and instance pools, see the Enhanced Compute Instance Management on Oracle Cloud Infrastructure blog post.

Create the Instance Pool

Again, to focus more on the autoscaling part of this solution, the following JSON file contains the minimal values required to create a valid instance pool.
The two resources that need to exist before creating the instance pool for TheSmithStore are a load balancer and subnets for web server instances. The instance pool references the load balancer OCID created in Part 2 of this series. The web server instances launch in a private regional subnet, so the OCID of that subnet is referenced in the configuration. Security list rules need to be in place to allow traffic from the load balancer to the private subnet. One final thing to note is that the initial size of the pool is zero, intentionally.
Required resource references:
  • instance-configuration-id: Use the OCID output from the previous instance-configuration create command.
  • load-balancers:load-balancer-id: Use the load balancer OCID created in Part 2 of this series.
  • load-balancers:backend-set-name: Use the load balancer backend set name created in Part 2.
  • placement-configurations:availability-domain: The availability domains where instances launch. When creating an instance pool, include only availability domains that have similar compute limits. An availability domain that has reached a resource limit could prevent the instance pool from scaling out.
  • placement-configurations:primary-subnet-id: Because we’re using a private regional subnet, we can use the same OCID for both availability domain placement configurations.
thesmithstore-instance-pool.json
{
  "display-name": "thesmithstore-webserver-pool",
  "instance-configuration-id": "ocid1.instanceconfiguration.oc1.iad.aaaaaaaarba73lx3hempves4xgsv3owdrhu73pjq",
  "load-balancers": [
    {
      "load-balancer-id": "ocid1.loadbalancer.oc1.iad.aaaaaaaaqmphpkkyvdefw5obapsijzawpckgkc5e4u2",    
      "backend-set-name": "thesmithstore-web-backend-set",
      "port": 80,
      "vnic-selection": "PrimaryVnic"
    }
  ],
  "placement-configurations": [
    {
      "availability-domain": "XXXX:US-PHOENIX-AD-1",
      "primary-subnet-id": "ocid1.subnet.oc1.iad.aaaaaaaamrpvh7fr4i7eampef7nakslftafxpqtdesya"
    },
    {
      "availability-domain": "XXXX:US-PHOENIX-AD-2",
      "primary-subnet-id": "ocid1.subnet.oc1.iad.aaaaaaaamrpvh7fr4i7eampef7nakslftafxpqtdesya"
    }
  ],
  "size": 0
}
Like the previous command, preparing the JSON file beforehand simplifies the CLI command:
$ oci --profile AUTOSCALE compute-management instance-pool create --from-json file://thesmithstore-instance-pool.json
Note: The instance pool must exist in the same compartment as the instance configuration.

Create the Autoscaling Configuration

As mentioned earlier, autoscaling works by modifying the number of compute instances in an instance pool based on the aggregated performance metrics of the pool. The autoscaling configuration for TheSmithStore is simple and starts with two compute instances. The pool scales out (adds instances) when CPU utilization of the pool is above 80%, launching two instances at a time until a maximum of eight instances exists. The pool scales in (terminates instances) when CPU utilization of the pool is below 40%, terminating one instance at a time until the minimum of two is reached.
Required resource references:
  • resource:id: Use the OCID output from the previous instance-pool create command.
thesmithstore-autoscale-configuration.json
{
  "display-name": "thesmithstore-autoscale-webserver-pool",
  "policies": [
    {
      "capacity": {
        "initial": 2,
        "max": 8,
        "min": 2
      },
      "display-name": "thesmithstore-cpu-80-40",
      "policy-type": "threshold",
      "rules": [
        {
          "action": {
            "type": "CHANGE_COUNT_BY",
            "value": 2
          },
          "display-name": "scale-out-rule",
          "metric": {
            "metric-type": "CPU_UTILIZATION",
            "threshold": {
              "operator": "GT",
              "value": 80
            }
          }
        },
        {
          "action": {
            "type": "CHANGE_COUNT_BY",
            "value": -1
          },
          "display-name": "scale-in-rule",
          "metric": {
            "metric-type": "CPU_UTILIZATION",
            "threshold": {
              "operator": "LT",
              "value": 40
            }
          }
        }
      ]
    }
  ],
  "resource": {
    "id": "ocid1.instancepool.oc1.iad.aaaaaaaawqw6t26kssqeerydb5awwquu7xgzk5",
    "type": "instancePool"
  }
}
All CLI commands allow input from a JSON file.
Note: The CLI options --generate-full-command-json-input and --generate-param-json-input are helpful in formatting the JSON input file.
$ oci --profile AUTOSCALE autoscaling configuration create --from-json file://thesmithstore-autoscale-configuration.json
To learn more about Compute autoscaling, visit the documentation or read the Right-Size Your Workloads with Oracle Cloud Infrastructure Autoscaling blog post.

Final Configuration

As soon as the autoscaling configuration appears in the tenancy, the instance pool size updates from 0 to 2, and two Compute instances launch. The instance pool has a SCALING status on the Instance Pool Details page in the Console.
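A quick way to confirm the scaling from the CLI is to read the pool back; the pool OCID variable is a placeholder for the OCID returned by the earlier instance-pool create command:

oci --profile AUTOSCALE compute-management instance-pool get \
  --instance-pool-id "$POOL_OCID" \
  --query 'data.{size: size, state: "lifecycle-state"}'
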
Now that the configuration for TheSmithStore is complete, let's look back at the diagram from Part 1 and see how all of the Oracle Cloud Infrastructure components are configured:
  1. The instance pool is a backend set for the load balancer.
  2. The custom Compute instances publish metrics to the Monitoring service.
  3. The autoscaling configuration manages the instance pool based on aggregate metrics.

Wrapping Up

Load balancers, instance configurations, instance pools, and the Monitoring service have more settings than this TheSmithStore scenario covers. If you want to revisit any post in this series or see related documentation, use the following links.
Be on the lookout for an upcoming white paper that covers all of the details of building a hyperscale web application on Oracle Cloud Infrastructure, including deploying in multiple regions, DNS traffic management, and DBaaS.
Everything discussed in this blog series is available with a trial account. If you don't have an Oracle Cloud Infrastructure account yet, go to https://www.oracle.com/cloud/free/ to sign up today.

Blog Series Links

Part 4: You are here.
