In one of my previous posts, I showed how you can deploy OpenShift Enterprise on AWS using the official CloudFormation template provided by Red Hat. For that deployment to work, you had to have a Red Hat/OpenShift enterprise account which many developers might not have especially if you are looking to just spin up a cluster for personal projects.
That’s why in this post, I will be showing you how you can deploy an OpenShift cluster easily on AWS without an enterprise account using OpenShift Origin 3.11. While OpenShift Origin is free to use, we will be using some EC2 instances (t2.xlarge and t2.large) that are not free. Keep that in mind.
So, the easiest way to deploy any kind of cluster on AWS is through a CloudFormation template because it can automate the whole process and allows you to easily shut down the cluster once you are done. It’s even better if someone else has already written this CloudFormation template for you. In our case, we will using the template shared by Thilina Manamgoda here. I will be making minor changes to the template and the steps from the original post to show what worked for me. Credit completely goes to Thilina for making our lives easier and hopefully, this post will make your life easier.
Deploying an AWS cluster
Before we can install OpenShift, we will need to first deploy a simple AWS cluster first.
Our cluster will consist of:
- 1 master node (t2.xlarge, 30GB EBS)
- 2 worker nodes (t2.large, 30GB EBS)
Besides that, Thilina’s OpenShift template will also setup some security groups.
Creating a Key Pair
If you have a key pair already in this region then you don’t need to create one. If you don’t, you can create one by going to EC2 page >> Key Pairs tab under Network & Security section on the left side >> Create Key Pair.
This key pair will be used by CloudFormation to create the stack as well as by you to access your servers.
Finding the right AMI
To use the CloudFormation template, you need to provide it with an AMI to use for the EC2 instances. This can depend on your requirements and which region you are deploying in. I chose to deploy my cluster in ‘Tokyo’ region.
Based on that, you will need to find a suitable AMI available in that region. Note that all AMIs you see on AWS are not available in all regions so you need to make sure that your AMI is available in your region.
The requirement to get this template to work is that your AMI should be using CentOS (x86_64) with Updates HVM. I chose an AMI with CentOS 7.4 – ami-0c1c738e580f3e01f.
You can search for public AMIs by clicking on ‘AMIs’ link under ‘Images’ category on left navigation on EC2 page.
There you can search for any image which has CentOS. Copy the ‘AMI ID’ when you have made your decision because you will need it later.
Getting your VPC ID
You will also need your VPC ID to use the CloudFormation template. You can get it by going to your VPC page and then selecting ‘Your VPCs’ from left navigation bar. You will see your VPCs there (if you have multiple), including your Default VPC. Copy the VPC ID for the VPC you want to deploy this cluster into because you will need it later as well.
Creating your IAM role
You will need to create an IAM role which will be attached to your EC2 instances when they are created. If you don’t have an IAM role already, easiest way to create one is through the AWS portal. You can do this via CloudFormation as well but for now, we will just do it via the AWS Portal.
To create a new IAM role, go to IAM page and click on ‘Roles’ on left navigation bar and then click on ‘Create role’ button.
On the next page, select ‘AWS Services’ and then select ‘EC2’ for ‘Choose the Service that will use this role’.
For this demo purpose, I am going to give admin privileges to my role. This is not recommended for other use cases. Pick the appropriate Policy name and click ‘Next’.
Add Tags if you like on the next page and click ‘Next’.
Finally, add your Role name. I will call mine openshift-admin. Click ‘Create role’ and you now have a new role which you can use for your Openshift deployment.
Finalizing CloudFormation Template
Now that we have the AMI and VPC ID, add them to Thilina’s OpenShift template. Replace AMI_ID and VPC_ID with the appropriate values. Note that I have added ‘Name’ tag so that when you go to your EC2 Dashboard, you can easily identify which instance is your master node and which instances are your worker nodes.
Additionally, replace IAM_ROLE with the name of your IAM role. In my case, that would be the role I created in the previous section: openshift-admin.
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
KeyPairName:
Description: "The private key used to log in to instances through SSH"
Type: 'AWS::EC2::KeyPair::KeyName'
Resources:
Master:
Type: "AWS::EC2::Instance"
Properties:
Tags:
- Key: Name
Value: openshift-master
- Key: kubernetes.io/cluster/openshift
Value: owned
IamInstanceProfile: IAM_ROLE
ImageId: "AMI_ID"
InstanceType: "t2.xlarge"
KeyName: !Ref KeyPairName
SecurityGroupIds:
- !Ref OpenshiftMasterSecurityGroup
- !Ref OpenshiftInternalSecurityGroup
BlockDeviceMappings:
- DeviceName: "/dev/sda1"
Ebs:
VolumeType: "io1"
Iops: "200"
DeleteOnTermination: "true"
VolumeSize: "30"
Node1:
Type: "AWS::EC2::Instance"
Properties:
Tags:
- Key: Name
Value: openshift-node1
- Key: kubernetes.io/cluster/openshift
Value: owned
IamInstanceProfile: IAM_ROLE
ImageId: "AMI_ID"
InstanceType: "t2.xlarge"
KeyName: !Ref KeyPairName
SecurityGroupIds:
- !Ref OpenshiftSSHSecurityGroup
- !Ref OpenshiftInternalSecurityGroup
BlockDeviceMappings:
- DeviceName: "/dev/sda1"
Ebs:
VolumeType: "io1"
Iops: "200"
DeleteOnTermination: "true"
VolumeSize: "30"
Node2:
Type: "AWS::EC2::Instance"
Properties:
Tags:
- Key: Name
Value: openshift-node2
- Key: kubernetes.io/cluster/openshift
Value: owned
IamInstanceProfile: IAM_ROLE
ImageId: "AMI_ID"
InstanceType: "t2.xlarge"
KeyName: !Ref KeyPairName
SecurityGroupIds:
- !Ref OpenshiftSSHSecurityGroup
- !Ref OpenshiftInternalSecurityGroup
BlockDeviceMappings:
- DeviceName: "/dev/sda1"
Ebs:
VolumeType: "io1"
Iops: "200"
DeleteOnTermination: "true"
VolumeSize: "30"
OpenshiftMasterSecurityGroup:
Type: 'AWS::EC2::SecurityGroup'
Properties:
VpcId: VPC_ID
GroupDescription: Openshift Security Group for Master node
Tags:
- Key: kubernetes.io/cluster/openshift
Value: owned
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: 22
ToPort: 22
CidrIp: 0.0.0.0/0
- IpProtocol: tcp
FromPort: 80
ToPort: 80
CidrIp: 0.0.0.0/0
- IpProtocol: tcp
FromPort: 443
ToPort: 443
CidrIp: 0.0.0.0/0
- IpProtocol: tcp
FromPort: 8443
ToPort: 8443
CidrIp: 0.0.0.0/0
OpenshiftSSHSecurityGroup:
Type: 'AWS::EC2::SecurityGroup'
Properties:
VpcId: VPC_ID
GroupDescription: Openshift Security Group for Internal SSH
Tags:
- Key: kubernetes.io/cluster/openshift
Value: owned
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: 22
ToPort: 22
SourceSecurityGroupId: !Ref OpenshiftMasterSecurityGroup
OpenshiftInternalSecurityGroup:
Type: 'AWS::EC2::SecurityGroup'
Properties:
VpcId: VPC_ID
GroupDescription: Openshift Security Group for Internal nodes
Internal53TCPIngress:
Type: 'AWS::EC2::SecurityGroupIngress'
Properties:
GroupId: !Ref OpenshiftInternalSecurityGroup
IpProtocol: tcp
FromPort: 53
ToPort: 53
SourceSecurityGroupId: !Ref OpenshiftInternalSecurityGroup
Internal8053TCPIngress:
Type: 'AWS::EC2::SecurityGroupIngress'
Properties:
GroupId: !Ref OpenshiftInternalSecurityGroup
IpProtocol: tcp
FromPort: 8053
ToPort: 8053
SourceSecurityGroupId: !Ref OpenshiftInternalSecurityGroup
Internal8053UDPIngress:
Type: 'AWS::EC2::SecurityGroupIngress'
Properties:
GroupId: !Ref OpenshiftInternalSecurityGroup
IpProtocol: udp
FromPort: 8053
ToPort: 8053
SourceSecurityGroupId: !Ref OpenshiftInternalSecurityGroup
Internal53UDPIngress:
Type: 'AWS::EC2::SecurityGroupIngress'
Properties:
GroupId: !Ref OpenshiftInternalSecurityGroup
IpProtocol: udp
FromPort: 53
ToPort: 53
SourceSecurityGroupId: !Ref OpenshiftInternalSecurityGroup
Internal2379Ingress:
Type: 'AWS::EC2::SecurityGroupIngress'
Properties:
GroupId: !Ref OpenshiftInternalSecurityGroup
IpProtocol: tcp
FromPort: 2379
ToPort: 2379
SourceSecurityGroupId: !Ref OpenshiftInternalSecurityGroup
Internal4789Ingress:
Type: 'AWS::EC2::SecurityGroupIngress'
Properties:
GroupId: !Ref OpenshiftInternalSecurityGroup
IpProtocol: tcp
FromPort: 4789
ToPort: 4789
SourceSecurityGroupId: !Ref OpenshiftInternalSecurityGroup
Internal10250Ingress:
Type: 'AWS::EC2::SecurityGroupIngress'
Properties:
GroupId: !Ref OpenshiftInternalSecurityGroup
IpProtocol: tcp
FromPort: 10250
ToPort: 10250
SourceSecurityGroupId: !Ref OpenshiftInternalSecurityGroup
Save the template to a file locally.
Create CloudFormation Stack
Now that we have the template, go to CloudFormation page and click on ‘Create Stack’ and upload your template file by clicking on ‘Upload a template file’ button and selecting ‘Choose file’.
Click ‘Next’ once your template has been uploaded.
Enter a value for ‘Stack Name’. I simply called mine ‘openshift’.
In Parameters, you have to provide a value for ‘KeyPairName’. This is the key pair you want to use to create your stacks as well as to access your servers later. Select the appropriate key pair and click ‘Next’.
You can leave next page as it is and click ‘Next’.
Review your details and click ‘Create stack’.
It should take approximately 1 minute for your stack to be created. Your Stack will be in ‘CREATE_COMPLETE’ status once it’s up. You might get an error in this stage saying your AMI was not available in this region. Make sure to find a public AMI that is available in the region you are deploying your stack in.
You can go to your EC2 page to see your master node and worker nodes that were created by the template.
Installing necessary libraries
Now that we have our servers up and running, we can go ahead and install necessary libraries which are needed to finally install OpenShift Origin. We will be using Ansible to automate this process.
To be able to use Ansible to deploy OpenShift Origin on all 3 servers, you will need to create 2 files (prepare.yaml and inventory.yaml) on your master node in the home directory.
Copy your Key Pair to your Master Node
We will need to copy our key pair to our master node so we can login to our worker nodes from our master node.
$ scp -i himanshu-tokyo.pem himanshu-tokyo.pem centos@<ip-address>.ap-northeast-1.compute.amazonaws.com:/home/centos/
himanshu-tokyo.pem 100% 1692 8.7KB/s 00:00
Create inventory.yaml file
Create inventory.yaml on your local host and replace the following keywords with the actual values that you can find on your EC2 page:
- public_hostname_master_node
- private_hostname_master_node
- private_hostname_worker_node_1
- private_hostname_worker_node_2
[OSEv3:children]
masters
etcd
nodes
[OSEv3:vars]
ansible_ssh_user=centos
ansible_sudo=true
ansible_become=true
deployment_type=origin
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
openshift_install_examples=true
openshift_docker_options='--selinux-enabled --insecure-registry 172.30.0.0/16'
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_htpasswd_users={'admin' : '$apr1$zTCG/myL$mj1ZMOSkYg7a9NLZK9Tk9.'}
openshift_master_default_subdomain=apps.public_hostname_master_node
openshift_master_cluster_public_hostname=public_hostname_master_node
openshift_master_cluster_hostname=public_hostname_master_node
openshift_disable_check=disk_availability,docker_storage,memory_availability
openshift_hosted_router_selector='node-role.kubernetes.io/infra=true'
openshift_cloudprovider_kind=aws
openshift_clusterid=openshift
[masters]
private_hostname_master_node
[etcd]
private_hostname_master_node
[nodes]
private_hostname_master_node openshift_node_group_name='node-config-master-infra' openshift_schedulable=true
private_hostname_worker_node_1 openshift_node_group_name='node-config-compute'
private_hostname_worker_node_2 openshift_node_group_name='node-config-compute'
Note that we are creating a user here with username ‘admin’ and password ‘admin’ as well. These credentials will be used to login to OpenShift after it has been deployed. Note: This is obviously not the recommended way for production applications but for our demo use case, it works. htpasswd is being used here as the identity provider.
Once created, copy it over to your master node.
scp -i himanshu-tokyo.pem inventory.yaml centos@<ip-address>.ap-northeast-1.compute.amazonaws.com:/home/centos/
Create prepare.yaml file
Similarly, create a new file locally called prepare.yaml with the following content and copy it to your master node. You don’t need to change anything here.
---
-
gather_facts: false
hosts: nodes
pre_tasks:
-
name: "install python2"
raw: "sudo yum install -y python"
-
name: "remove docker"
raw: "sudo yum remove docker* -y"
-
name: "remove kubernetes"
raw: "sudo yum remove kube* -y"
-
name: "install NetworkManager"
raw: "sudo yum install -y NetworkManager"
tasks:
-
name: "install the latest version of Apache"
retries: 3
yum:
name: docker
state: latest
-
name: "enable network-manager"
shell: "sudo systemctl enable NetworkManager && sudo systemctl start NetworkManager"
-
name: "enable docker"
retries: 3
shell: "sudo systemctl enable docker && sudo systemctl start docker"
-
name: "Add repository"
yum_repository:
baseurl: "https://rpms.svc.ci.openshift.org/openshift-origin-v3.11/"
description: "OKD 311 repo"
gpgcheck: false
name: okd
I have added two steps in the prepare.yaml file: remove docker and remove kubernetes. When I was using Thilina’s version, I was getting errors saying there was a conflict with my existing docker and kubernetes libraries installed on my servers. This could be due to the AMI I was using and you might not encounter it at all but after a lot of trial and error, I realized removing docker* and kubernetes libraries allowed my installation to proceed.
scp -i himanshu-tokyo.pem prepare.yaml centos@<ip-address>.ap-northeast-1.compute.amazonaws.com:/home/centos/
prepare.yaml is used to install and configure few things (Docker, Network Manager Service, SE Linux Policies, OpenShift Origin) on all our servers via ansible.
Login to your Master Node
Now that we have the necessary files copied over to our master node, let’s login to the server.
$ ssh -i himanshu-tokyo.pem centos@<ip-address>.ap-northeast-1.compute.amazonaws.com
Do an ‘ls’ to confirm that we have all the necessary files:
[centos@<ip-address> ~]$ ls
himanshu-tokyo.pem inventory.yaml prepare.yaml
Install Git and check out OpenShift Origin
[centos@<ip-address> ~]$ sudo yum -y install git
Once git is installed, checkout OpenShift Origin and then switch to 3.11 branch.
[centos@<ip-address> ~]$ git clone https://github.com/openshift/openshift-ansible
Cloning into 'openshift-ansible'...
remote: Enumerating objects: 144151, done.
remote: Total 144151 (delta 0), reused 0 (delta 0), pack-reused 144151
Receiving objects: 100% (144151/144151), 39.31 MiB | 12.09 MiB/s, done.
Resolving deltas: 100% (90536/90536), done.
[centos@<ip-address> ~]$ cd openshift-ansible/
[centos@<ip-address> openshift-ansible]$ git checkout release-3.11
Branch release-3.11 set up to track remote branch release-3.11 from origin.
Switched to a new branch 'release-3.11'
Install Ansible
Thilina’s post recommends installing pip and then install a specific version of ansible (2.6.5) but I have tried with latest version and it has worked for me.
[centos@<ip-address> ~]$ sudo yum install ansible
SSH into your worker nodes from your master node
You will need to SSH into your worker nodes so that these hostnames can be added to known hosts during installation. Note you will have to run the ssh command from your master node.
/ SSH into worker node 1 from master node
[centos@<ip-address> ~]$ ssh -i himanshu-tokyo.pem <private_ip_worker_node1>
/ SSH into worker node 2 from master node
[centos@<ip-address> ~]$ ssh -i himanshu-tokyo.pem <private_ip_worker_node2>
Installing OpenShift Origin
At this point, you should be ready to run the prepare.yaml and inventory.yaml scripts to deploy OpenShift Origin on all your servers.
Go back to your home directory and run the following command:
[centos@<ip-address> ~]$ ansible-playbook prepare.yaml -i inventory.yaml --key-file himanshu-tokyo.pem
If you get the following error, simply run the previous command again:
TASK [enable docker] ***************************************************************************************
fatal: [<ip-address>]: FAILED! => {"changed": true, "cmd": "sudo systemctl enable docker && sudo systemctl start docker", "delta": "0:00:00.221729", "end": "2019-10-22 15:40:59.955707", "msg": "non-zero return code", "rc": 1, "start": "2019-10-22 15:40:59.733978", "stderr": "Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.\nJob for docker.service failed because the control process exited with error code. See \"systemctl status docker.service\" and \"journalctl -xe\" for details.", "stderr_lines": ["Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.", "Job for docker.service failed because the control process exited with error code. See \"systemctl status docker.service\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []}
Here is what the output should look like when you run the command again:
[centos@<ip-address> ~]$ ansible-playbook prepare.yaml -i inventory.yaml --key-file himanshu-tokyo.pem
PLAY [nodes] **********************************************************************************************************
TASK [install python2] ************************************************************************************************
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
TASK [remove kube] *****************************************************************************************
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
TASK [install NetworkManager] ******************************************************************************
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
TASK [install NetworkManager] *****************************************************************************************
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
TASK [install the latest version of Apache] ***************************************************************************
ok: <IP_ADDRESS>
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
TASK [enable network-manager] *****************************************************************************************
[WARNING]: Consider using 'become', 'become_method', and 'become_user' rather than running sudo
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
TASK [enable docker] **************************************************************************************************
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
TASK [Add repository] *************************************************************************************************
ok: <IP_ADDRESS>
changed: <IP_ADDRESS>
changed: <IP_ADDRESS>
PLAY RECAP ************************************************************************************************************
<IP_ADDRESS> : ok=6 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
<IP_ADDRESS> : ok=6 changed=6 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
<IP_ADDRESS> : ok=6 changed=6 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Now we are ready to run prerequisites.yml file from the git repo we checked out:
[centos@<ip-address> ~]$ ansible-playbook openshift-ansible/playbooks/prerequisites.yml -i inventory.yaml --key-file himanshu-tokyo.pem
I won’t paste the output of the above command here because it is really long. It should take about a minute to complete.
Once it’s done, run the following command to deploy the cluster:
[centos@<ip-address> ~]$ ansible-playbook openshift-ansible/playbooks/deploy_cluster.yml -i inventory.yaml --key-file himanshu-tokyo.pem
Again, the output of this command is really long so I won’t paste it here. It will take few minutes to complete so feel free to grab a cup of coffee while you wait.
Here is what the last bit of my output looked like:
PLAY RECAP *************************************************************************************************
<ip-address> : ok=128 changed=53 unreachable=0 failed=0 skipped=164 rescued=0 ignored=0
<ip-address> : ok=128 changed=53 unreachable=0 failed=0 skipped=164 rescued=0 ignored=0
<ip-address> : ok=719 changed=312 unreachable=0 failed=0 skipped=1012 rescued=0 ignored=0
localhost : ok=11 changed=0 unreachable=0 failed=0 skipped=5 rescued=0 ignored=0
INSTALLER STATUS *******************************************************************************************
Initialization : Complete (0:00:17)
Health Check : Complete (0:00:42)
Node Bootstrap Preparation : Complete (0:03:19)
etcd Install : Complete (0:00:37)
Master Install : Complete (0:04:26)
Master Additional Install : Complete (0:00:34)
Node Join : Complete (0:00:38)
Hosted Install : Complete (0:00:50)
Cluster Monitoring Operator : Complete (0:01:19)
Web Console Install : Complete (0:00:53)
Console Install : Complete (0:00:25)
Service Catalog Install : Complete (0:03:37)
Phew, this is it guys. At this point, OpenShift Origin has been installed on our three nodes. To confirm that, run this command:
[centos@ip-172-31-28-201 ~]$ oc get nodes
NAME STATUS ROLES AGE VERSION
<internal-ip-address> Ready compute 15m v1.11.0+d4cacc0
<internal-ip-address> Ready compute 15m v1.11.0+d4cacc0
<internal-ip-address> Ready infra,master 18m v1.11.0+d4cacc0
Sweet! This confirms that OpenShift was installed successfully. Now, let’s bring up the OpenShift WebUI by going to: https://<public_hostname_master_node>:8443/console
You should be able to see the login page where you can login by using username/password as ‘admin’.
That’s it. We now have OpenShift Origin installed!
Remember to delete your stack once you are done from CloudFormation page or else you will incur unnecessary costs!
Once again, I would like to thank Thilina for his original post which helped me setup OpenShift. This post is simply meant to complement Thilina’s work and to show some potential errors you might encounter while installing OpenShift.
2 replies on “Deploying an OpenShift (okd) cluster on AWS”
Hi Himanshu ,
Good Document , I am also planning to build for learning , additionally AWS billing estimation will be helpful if you can update
Hi Himanshu ,
I tried to follow step by step your document , but at the end the follow error appear
TASK [openshift_control_plane : Report control plane errors] **************************
fatal: [ip-172-31-27-224.eu-south-1.compute.internal]: FAILED! => {“changed”: false, “msg”: “Control plane pods didn’t come up and become ready”}
Failure summary:
1. Hosts: ip-172-31-27-224.eu-south-1.compute.internal
Play: Configure masters
Task: Report control plane errors
Message: Control plane pods didn’t come up and become ready
And after that it is not more possible to log in to any node !
Have you any clue ?
Thanks !
jermaine