137: Data Communications and Networking: August 2013

OVERVIEW

A High-Availability (HA) cluster and a Cluster Resource Manager (CRM) are responsible for recovering services when a node in the network fails. This activity covers the concept of the Active/Passive Cluster.

Active/Passive Cluster

In an Active/Passive Cluster, when the active node fails, the passive node assumes the role of the failed node and the service continues.

Two computers (servers) are needed in this cluster. One computer serves as the active node, while the other is the passive node. Once the active node fails, the passive node takes over what the active node was doing, so the service stays available despite the failure.

Active/Passive Cluster

[Reference: http://opentodo.net/2012/04/configuring-a-failover-cluster-with-heartbeat-pacemaker/]

PROBLEM STATEMENT

The activity needs to be performed by at least two people, and two to three computers are needed to execute it. Knowledge of IP configuration helps, because each computer must be assigned an IP address and a name, and a virtual IP address must be assigned to the computer that acts as the server. The Active/Passive Cluster will be used.

GUIDES IN SOLVING THE PROBLEM

Configuring the needed configuration files

To configure two servers that use Apache and Heartbeat to serve a client or browser, we need to edit several configuration files. These configuration files are as follows:

- ha.cf

The ha.cf configuration file can be found in the /etc/ha.d directory. Type $ sudo gedit /etc/ha.d/ha.cf to edit the file. The contents of our ha.cf are based on this site: http://www.zivtech.com/blog/setting-ip-failover-heartbeat-and-pacemaker-ubuntu-lucid. This file should contain the names of the two servers.

Edit ha.cf configuration file
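As a rough illustration, an ha.cf for this kind of two-node setup might look like the sketch below. The node names server1 and server2 are the ones used in this activity; the interface name eth0 and the timing values are assumptions, not the exact values from our lab machines. The sketch writes to a scratch directory instead of /etc/ha.d so it can be tried safely.

```shell
# Write a sample ha.cf into a scratch directory (not /etc/ha.d).
demo_dir=$(mktemp -d)
cat > "$demo_dir/ha.cf" <<'EOF'
# Seconds between heartbeats.
keepalive 2
# Seconds of silence before the other node is declared dead.
deadtime 30
# Send heartbeats as UDP broadcasts on this interface (eth0 is an assumption).
udpport 694
bcast eth0
# Hand the resources back to the preferred node once it returns.
auto_failback on
# Cluster members; each name must match the output of uname -n on that machine.
node server1
node server2
EOF
cat "$demo_dir/ha.cf"
```

The same file would normally be copied to both servers, since both need to agree on who the cluster members are.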

- authkeys

The authkeys configuration file can be found in the /etc/ha.d directory. To edit this file, type $ sudo gedit /etc/ha.d/authkeys in the terminal. The contents of this file are based on this site: http://opentodo.net/2012/04/configuring-a-failover-cluster-with-heartbeat-pacemaker/

Edit authkeys file
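A minimal sketch of an authkeys file is shown below. The secret is a placeholder, and sha1 is just one of the signing methods Heartbeat accepts. Heartbeat also expects the file to be readable only by its owner, which is why the sketch ends with chmod 600. It writes to a scratch directory rather than /etc/ha.d.

```shell
# Write a sample authkeys file into a scratch directory (not /etc/ha.d).
auth_dir=$(mktemp -d)
cat > "$auth_dir/authkeys" <<'EOF'
# Sign cluster messages with key number 1.
auth 1
# Key 1: SHA1 signature with a shared secret (placeholder value).
1 sha1 ReplaceWithYourOwnSecret
EOF
# Restrict the file to its owner; Heartbeat rejects looser permissions.
chmod 600 "$auth_dir/authkeys"
ls -l "$auth_dir/authkeys"
```

Both servers need the same key, otherwise they will ignore each other's heartbeats.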

- haresources

The haresources file can be found in the /etc/ha.d directory. It contains the name of the server assigned to be the active node and the virtual IP address that the cluster will serve.

Edit haresources file
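A one-line haresources entry for this activity could look like the sketch below. server1 and 10.0.5.66 are the active node and virtual IP address used in this activity. Listing apache2 as a resource is an assumption; it asks Heartbeat to start the /etc/init.d/apache2 script on whichever node holds the virtual IP.

```shell
# Write a sample haresources file into a scratch directory (not /etc/ha.d).
res_dir=$(mktemp -d)
# Format: <preferred node> <virtual IP> <init script to start once the IP is up>
cat > "$res_dir/haresources" <<'EOF'
server1 10.0.5.66 apache2
EOF
# This file has to be identical on both servers, including the passive one.
cat "$res_dir/haresources"
```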

- /etc/hosts

The /etc/hosts file should contain the names of the two computers that will act as the two servers, each with its own distinct IP address. The virtual IP address (the alias that stands in front of the two servers) should also be listed along with its name. To edit this file, type $ sudo gedit /etc/hosts.

Edit /etc/hosts file
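The entries could look like the sketch below. Only 10.0.5.66 (the virtual IP address) and the names server1 and server2 come from this activity; the addresses 10.0.5.64 and 10.0.5.65 and the name virtualserver are hypothetical placeholders. The small lookup helper is only there to show how a name maps to an address in this file format.

```shell
# Write a sample hosts file into a scratch directory (not /etc/hosts).
hosts_dir=$(mktemp -d)
cat > "$hosts_dir/hosts" <<'EOF'
127.0.0.1   localhost
10.0.5.64   server1
10.0.5.65   server2
10.0.5.66   virtualserver
EOF

# Print the address on the line whose first name matches the argument.
lookup() {
    awk -v name="$1" '$2 == name { print $1 }' "$hosts_dir/hosts"
}

lookup server2
```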

- /etc/hostname

The /etc/hostname file should contain the name of the computer you are using. This name is necessary for the ha.cf and /etc/hosts files. To edit this file, type $ sudo gedit /etc/hostname.

Edit /etc/hostname file
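Since Heartbeat matches the node entries in ha.cf against the name reported by uname -n, a mismatch between the hostname and ha.cf is an easy mistake to make. The sketch below is one way to check for it. The helper name node_listed is our own, and the demo runs against a scratch ha.cf; on a real node the second argument would be /etc/ha.d/ha.cf.

```shell
# Succeed if the given host name appears on a "node" line of the given ha.cf.
node_listed() {
    awk -v name="$1" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$2"
}

# Demo against a scratch ha.cf that lists the two servers of this activity.
check_dir=$(mktemp -d)
printf 'node server1\nnode server2\n' > "$check_dir/ha.cf"

if node_listed "$(uname -n)" "$check_dir/ha.cf"; then
    echo "$(uname -n) is listed in ha.cf"
else
    echo "$(uname -n) is NOT listed in ha.cf"
fi
```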

Test the Active/Passive Cluster

Try the following commands to test your set-up:

- Make sure Apache is running.

Command: $ sudo /etc/init.d/apache2 start

Start Apache

- Make sure Heartbeat is running with the new configuration.

Command: $ sudo /etc/init.d/heartbeat restart

Restart Heartbeat

- Try to access the virtual IP address (10.0.5.66) in the browser using another computer. It should display the web page of the active node (server1).

Web Page of server1 (Active Node)

- Disconnect the LAN cable of the active node (server1). Refresh the web page. It should display the web page of the passive node (server2).

Web Page of server2 (Passive Node)
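Instead of refreshing the browser by hand, the failover test can also be watched from a terminal on the client. The sketch below is one way to do it, assuming curl is installed on the client. The function names fetch_page and watch_vip are our own, and the first line of each server's page is assumed to be different enough to tell the two servers apart.

```shell
# Fetch the page served at the given address; fails if nothing answers in 2 seconds.
fetch_page() {
    curl -s --max-time 2 "http://$1/"
}

# Poll the address a fixed number of times and report what answers each time.
watch_vip() {
    vip=$1
    tries=$2
    i=1
    while [ "$i" -le "$tries" ]; do
        if body=$(fetch_page "$vip"); then
            # Show only the first line of the page to keep the output short.
            echo "try $i: answered: $(printf '%s\n' "$body" | head -n 1)"
        else
            echo "try $i: no answer"
        fi
        i=$((i + 1))
        # Wait one second between tries, but not after the last one.
        if [ "$i" -le "$tries" ]; then
            sleep 1
        fi
    done
}

# Example, run from the client while someone unplugs server1:
#   watch_vip 10.0.5.66 30
```

A few "no answer" lines in the middle of the run would be expected, because the passive node only takes over after it has missed the active node's heartbeats for a while.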

LEARNING AND INSIGHTS

Configuring the configuration files was a bit challenging because we didn't know how to do it. Thanks to the power of the Internet, we were able to do it (barely).

Through patience, we were able to start Heartbeat successfully. We were also able to access the active server via the virtual IP address, and when we disconnected the LAN cable of the active server, the browser displayed the page of the passive server. We were so overwhelmed that we were able to do this after two lab meetings. T__T

***Lesson learned (for me): In order to avoid problems, the services on the server should be restarted after every change to the configuration files so that the changes take effect.

CONCLUSION
The activity, for me, was very difficult. It is difficult to accomplish because once you have made a mistake in the configuration files, you need to check all of those files and trace what went wrong. Though it was kind of confusing, we were able to fulfill the tasks needed in this activity. We barely got through the trial and error on the computers in the laboratory before we could reliably access our virtual server.

OVERVIEW

High-Availability, commonly referred to as HA, and the Cluster Resource Manager, or CRM, both help monitor a system for hardware and software failures. They work as follows: when one of the connected hosts experiences a failure, another host acts as a substitute and takes over the process that the failed host was working on. So basically, HA and CRM salvage the running process of a failed node through another node connected to it. This activity focuses on the concepts of HA and CRM and on how they work on the computers.

PROBLEM STATEMENT

Research about HA and CRM is needed to conduct this activity. Definitions, installation and configuration steps, and diagrams of both are needed to understand them thoroughly. After the research, both HA and CRM software should be installed on the student's computer for future activities involving them.

GUIDES ON SOLVING THE PROBLEM

HIGH-AVAILABILITY (HA) with Heartbeat

*Definition and Technicalities

High-Availability (HA) refers to a system or a component that is continuously operational for a desirably long length of time. According to Wikipedia, HA is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period. It can be measured relative to "100% operational" or "never failing". HA focuses on backup and failover processing and on data storage and access. Data storage can use a redundant array of independent disks (RAID) or a storage area network (SAN).
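To make "measured relative to 100% operational" more concrete, the sketch below converts an availability percentage into the hours of downtime it allows per year. The function name and the choice of a 365-day year are our own assumptions for illustration.

```shell
# Hours of downtime per year allowed by a given availability percentage.
downtime_hours_per_year() {
    awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 365 * 24 }'
}

# Compare a few common availability targets.
for pct in 99 99.9 99.99; do
    echo "$pct% available -> $(downtime_hours_per_year "$pct") hours down per year"
done
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why automatic failover matters more as the target gets stricter.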

*Installation and Configuration of Heartbeat

I used the command $ sudo apt-get install heartbeat to install Heartbeat on my computer.

Installation of Heartbeat

*Heartbeat Configuration Files

The configuration files can be found in the /etc/ha.d directory.

Some configuration files are as follows:

- ha.cf : It is the configuration file for Heartbeat cluster messaging layer. It lists the communication facilities enabled between nodes, enables or disables certain features, and optionally lists the cluster nodes by host name.

- authkeys : It is the authentication file for Heartbeat cluster messaging layer. It enables Heartbeat to securely authenticate cluster nodes.

[Reference: http://linux.die.net/man/5/authkeys]

[Reference: http://linux.die.net/man/5/ha.cf]

*Illustrations and Diagrams

High-Availability Cluster Network Diagram

This diagram shows the HA cluster. It involves at least two nodes so that service can continue if a node fails: the other node takes over from the failed node and continues the service.
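The takeover decision can be pictured with the simplified sketch below. It is only a model for illustration, not Heartbeat's actual code: the passive node treats the active node as dead once it has heard nothing for deadtime seconds (deadtime is the name of the ha.cf setting; the function names and sample times are our own).

```shell
# Succeed if the peer has been silent for at least deadtime seconds.
peer_is_dead() {
    last_seen=$1   # time of the last heartbeat received, in seconds
    now=$2         # current time, in seconds
    deadtime=$3    # allowed silence before takeover, in seconds
    [ $((now - last_seen)) -ge "$deadtime" ]
}

# Decide which node should serve, given the active node's last heartbeat.
serving_node() {
    if peer_is_dead "$1" "$2" "$3"; then
        echo "server2"   # passive node takes over
    else
        echo "server1"   # active node keeps the service
    fi
}

serving_node 100 110 30   # 10 seconds of silence
serving_node 100 140 30   # 40 seconds of silence
```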

To run in an HA cluster environment, an application must be able to use shared storage (SAN/NAS), must save as much of its state as possible to non-volatile shared storage so that another node can resume from the last state saved before the failure, must have a command line interface or scripts to control the application, and must not corrupt data if the application crashes.

CLUSTER RESOURCE MANAGER (CRM) with Pacemaker

*Definition and Technicalities

A Cluster Resource Manager (CRM) monitors the system for both hardware and software failures. Pacemaker, a cluster resource manager, detects failures and helps machines and applications recover from them. It supports maximum availability of a cluster by detecting and recovering from node-level and resource-level failures, making use of the messaging and membership capabilities provided by a cluster infrastructure (like Heartbeat). Pacemaker automatically recovers your application and makes sure it is available from one of the remaining machines in the cluster. After a failure, it uses advanced algorithms to quickly determine the optimum locations for services based on relative node preferences and on requirements relative to other cluster services.

[Reference: clusterlabs.org/wiki/Pacemaker]
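The "node preferences" and "requirements with other cluster services" mentioned above can be sketched as a small configuration fragment in crm shell syntax. This is only an illustration under assumptions: the resource and constraint names, the netmask, and the Apache config path are our own choices, and the fragment is written to a scratch file rather than applied to a cluster.

```shell
# Write a sample crm shell fragment into a scratch directory.
crm_dir=$(mktemp -d)
cat > "$crm_dir/cluster.crm" <<'EOF'
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip=10.0.5.66 cidr_netmask=24 \
    op monitor interval=30s
primitive WebSite ocf:heartbeat:apache \
    params configfile=/etc/apache2/apache2.conf \
    op monitor interval=1min
colocation website-with-ip inf: WebSite ClusterIP
order ip-before-website inf: ClusterIP WebSite
location prefer-server1 WebSite 50: server1
EOF
cat "$crm_dir/cluster.crm"
```

The two primitives are the virtual IP address and the web server, the colocation and order lines keep them together and start the IP first, and the location line expresses a preference for server1. On a machine where the crm shell is installed, a file like this would typically be loaded with crm configure load update followed by the file name.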

*Illustrations and Diagrams

This type of configuration sets up two nodes as an active/passive pair using Pacemaker and DRBD, which is a cost-effective solution for many high-availability situations. Once the active node fails, the passive node acts as its substitute.

*HA and CRM to a 3-year old child (Simple Explanation)
High-Availability Definition

High-Availability works, for example, when your toy car or your toy house castle lasts through many years of you playing with it. It means that the parts of that toy car or toy house castle are well-designed for it to last that long, given that you've taken good care of it while playing.

Cluster Resource Manager Definition
For example, you play your music box or the music player in your house, and suddenly it makes a screeching sound. Then, after a few seconds or a few minutes, it goes back to its normal behaviour and plays the music beautifully. That's how a Cluster Resource Manager works. It recovers from its failure and plays the music normally again once the music player detects what is wrong with its mechanism.

*Note: The implementation of HA and CRM will be tackled next laboratory session.

LEARNING AND INSIGHTS

Apache, Heartbeat, and Pacemaker create High-Availability on web servers. They make sure that when one node in a network fails to execute a process, another node from that same network will continue to execute that process. In this sense, the process that needs to be executed will still push through.

CONCLUSION

Basically, the activity is to research the given questions in the laboratory, and to install and configure Heartbeat and Pacemaker on the computer. Though it is difficult to explain the technicalities of the processes in simple words, it can be done with the help of the diagrams showing how HA and CRM work.

Monday, August 26, 2013

Session 7: Implementation of HA and CRM

Monday, August 12, 2013

Session 6: High-Availability and Cluster Resource Management