CassKop: a Kubernetes operator for Cassandra

Cyril Scetbon
8 min read · Sep 12, 2019


Today CassKop is being introduced to the Cassandra community at the 2019 Apache Cassandra Summit in Las Vegas. I’m proud to be one of the main contributors with Sebastien Allamand, as it’s been a long road to get here. In this article, I’m going to show you how simple it is to get started with this operator using Kind (Kubernetes in Docker) and to set up your first Cassandra cluster on your machine. CassKop is open source, so don’t hesitate to try it out, contribute by fixing an issue you discover, and let’s enhance it together!

What is Kind and why do we need it?

Kind is a tool for running Kubernetes on your local machine using containers: your K8s nodes, masters as well as workers, are Docker containers. This way you don’t have to install anything on your machine other than the client tools, and you can potentially create multiple independent clusters. Let’s say you want one cluster with network policies enabled and one without.

Kind is written in Go and can also be used as a library. It can be used in CI jobs and makes it simple to spawn a cluster with everything you need in minutes.

What’s the purpose of CassKop?

CassKop is a Kubernetes operator: a controller based on the Operator SDK that runs within a Kubernetes cluster and provides a means to deploy and manage Cassandra clusters.

Once the operator is installed, you can create a deployment file containing an object of type CassandraCluster describing the cluster that you want to create. For example, you specify the number of nodes, the different datacenters, the racks…

Here is an example that is used in end-to-end tests:

apiVersion: "db.orange.com/v1alpha1"
kind: "CassandraCluster"
metadata:
  name: cassandra-e2e
  labels:
    cluster: k8s.pic
spec:
  nodesPerRacks: 1
  baseImage: orangeopensource/cassandra-image
  version: latest-cqlsh
  imagePullPolicy: "Always"
  rollingPartition: 0
  dataCapacity: "1Gi"
  hardAntiAffinity: false
  deletePVC: false
  autoPilot: true
  gcStdout: true
  autoUpdateSeedList: false
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 1Gi
  topology:
    dc:
      - name: dc1
        rack:
          - name: rack1
      - name: dc2
        numTokens: 32
        nodesPerRacks: 1
        rack:
          - name: rack1

As you can see, there are global parameters like nodesPerRacks that can be overridden at the datacenter level (see datacenter dc2).

You then deploy it using kubectl, and the operator, which watches events on this type of object, will create the cluster for you. Many operations are supported, and you can take a look at the exhaustive list.

The goal is to avoid manual operations, to ensure they are executed without mistakes, and to make those operations as simple as possible. For example, we don’t want to remove a datacenter before ensuring keyspaces are no longer replicated there.
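
For instance, before dropping dc2 you would first remove it from every keyspace’s replication settings. Here is a sketch of the CQL involved — the keyspace name is hypothetical, and the kubectl exec line is shown commented rather than run:

```shell
# Hypothetical keyspace: stop replicating "my_keyspace" to dc2 by listing
# only the datacenters that should keep replicas (here dc1 with RF 3).
KEYSPACE=my_keyspace
CQL="ALTER KEYSPACE ${KEYSPACE} WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"
echo "$CQL"
# Run it on any node once you have a cluster, e.g.:
#   kubectl exec cassandra-e2e-dc1-rack1-0 -- cqlsh -e "$CQL"
```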

We use labels on pods to tell the operator what operation to run. But even applying a label on a specific pod can be a little tricky, and mistakes can be made. To simplify it, we’ve created a kubectl plugin that I’ll use in the next chapter.

Depending on the operation or change to apply to the cluster, it can be done by:

  • Updating Kubernetes objects ( CassandraCluster , ConfigMap, Pod)
  • Running a kubectl casskop command
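
Under the hood, the plugin sets operation labels on the pods for you. Here is a sketch of what doing it by hand could look like — the label keys below are an assumption written from memory, so double-check them against the operations documentation before relying on them:

```shell
# ASSUMED label keys (verify against CassKop's operations docs).
# Printing the command instead of running it, since it needs a live cluster:
POD=cassandra-e2e-dc2-rack1-0
echo kubectl label pod "$POD" operation-name=cleanup operation-status=ToDo
```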

Sneak peek at how to install and interact with it

Prerequisites

I’m not going to spend too much time on the installation of the different tools you need, but here is the list of what you need and a few commands to install them.

  • Docker
  • kubernetes-cli, kubernetes-helm : brew install kubernetes-cli kubernetes-helm kubectx
  • kubectl and kube-ps1 plugins for oh-my-zsh. Not mandatory but useful and I’ll be using some aliases in the rest of the article like kgp for kubectl get pods
  • Kind: GO111MODULE=on go get sigs.k8s.io/kind@v0.5.1
  • CassKop: https://github.com/Orange-OpenSource/cassandra-k8s-operator. It should be renamed to casskop soon
  • The ultimate command-line JSON processor jq : brew install jq
  • Python 3. It’s needed in order to use the kubectl plugin casskop
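
A quick way to check that everything is in place before starting — this just probes the PATH for each tool from the list above:

```shell
# Report which prerequisites are available on this machine.
for tool in docker kubectl helm kind jq python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```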

Create a Kubernetes cluster using Kind

I spent some time writing the documentation for the local setup. Don’t hesitate to take a look at it and create a PR if things are missing.

Handy scripts are provided to install a cluster with or without network policies enabled. Let’s create one without. You need to be at the root level of the CassKop sources:

# Replace it with your local path
$ cd ~/src/git/cassandra-k8s-operator
# We delete any old existing cluster
$ kind delete cluster
# And then create a brand new one without network policies enabled
$ samples/kind/create-kind-cluster.sh
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.15.3) 🖼
✓ Preparing nodes 📦📦📦
✓ Creating kubeadm config 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Cluster creation complete. You can now use the cluster with:
export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"

I’ve removed all the extra messages that are printed when creating a cluster for readability. We can now verify that the cluster is available by asking for the current version:

$ export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"
$ k version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:36:28Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-20T18:57:36Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

So we have Kubernetes 1.15 running and waiting for an operator.

Install CassKop

Installing the operator is easy using helm. We first need to add the casskop repository, and then we can install the latest version of the operator. When the helm package gets installed, it tries to create the CRD cassandraclusters.db.orange.com, which has already been created by our previous handy script, so we tell helm to skip that step by using the option --no-hooks:

$ helm repo add casskop https://Orange-OpenSource.github.io/cassandra-k8s-operator/helm
$ helm install --name casskop casskop/cassandra-operator --no-hooks
NAME: casskop
LAST DEPLOYED: Wed Sep 11 18:52:26 2019
NAMESPACE: cassandra-e2e
STATUS: DEPLOYED
RESOURCES:
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
casskop-cassandra-operator 0/1 1 0 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
casskop-cassandra-operator-796cb55595-9w4pd 0/1 ContainerCreating 0 0s
==> v1/RoleBinding
NAME AGE
cassandra-operator 0s
==> v1/ServiceAccount
NAME SECRETS AGE
cassandra-operator 1 0s
==> v1beta1/Role
NAME AGE
cassandra-operator 0s
NOTES:
Congratulations. You have just deployed CassKop the Cassandra Operator.
Check its status by running:
kubectl --namespace cassandra-e2e get pods -l "release=casskop"
Visit https://github.com/Orange-OpenSource/cassandra-k8s-operator for instructions on how to create & configure Cassandra clusters using the operator.

You need to wait until the operator is running before going to the next stage:

$ kgp -l "release=casskop" -w
NAME READY STATUS RESTARTS AGE
casskop-cassandra-operator-796cb55595-9w4pd 1/1 Running 0 9m16s

Create a Cassandra cluster

In this part, we’re going to create a cluster consisting of one datacenter with one rack containing one node:

$ sed 's/2$/1/' test/e2e/testdata/cassandracluster-1DC.yaml | \
k apply -f -
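
The sed expression deserves a word: s/2$/1/ rewrites any line ending in 2 so that it ends in 1, which is what turns the file’s nodesPerRacks: 2 into 1. In isolation:

```shell
# s/2$/1/ replaces a trailing "2" with "1" on each matching line.
echo 'nodesPerRacks: 2' | sed 's/2$/1/'
# Note it is a blunt tool: any other line ending in 2 (e.g. "numTokens: 32")
# would be rewritten too.
```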

Once the cluster has been created, you should see one node:

$ kgp
NAME READY STATUS RESTARTS AGE
cassandra-e2e-dc1-rack1-0 1/1 Running 0 3m29s
casskop-cassandra-operator-796cb55595-9w4pd 1/1 Running 0 18m

Scale up the cluster

We can now scale the cluster by adding one datacenter. Again, we’re going to use existing deployment files from the end-to-end tests:

$ sed '/gcStdout/d' test/e2e/testdata/cassandracluster-2DC.yaml | \
k apply -f -

After less than a minute, depending on whether the image has already been cached on the node, we can see our two-node cluster:

$ kgp
NAME READY STATUS RESTARTS AGE
cassandra-e2e-dc1-rack1-0 1/1 Running 0 4m4s
cassandra-e2e-dc2-rack1-0 1/1 Running 0 2m10s
casskop-cassandra-operator-796cb55595-9w4pd 1/1 Running 5 8h

We can also use nodetool to see the current cluster state by calling the command on one node. Of course, this is not something we could normally do in production if the exec permission is revoked:

$ keti cassandra-e2e-dc1-rack1-0 nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.244.2.17 282.26 KiB 256 92.3% 7df6af9b-392e-49ac-b5e2-d6d79d633b1b rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.244.1.13 241.81 KiB 32 16.6% 9ac0bbd5-f7cd-4bd2-8bfb-1c54b8bd4b4c rack1

Everything has been managed by the operator, and we can ask it for the current status of our racks:

$ k get cassandraclusters.db.orange.com cassandra-e2e -o json | jq '.status.cassandraRackStatus'
{
  "dc1-rack1": {
    "cassandraLastAction": {
      "Name": "Initializing",
      "endTime": "2019-09-12T11:49:12Z",
      "status": "Done"
    },
    "phase": "Running",
    "podLastOperation": {}
  },
  "dc2-rack1": {
    "cassandraLastAction": {
      "Name": "Initializing",
      "endTime": "2019-09-12T11:50:52Z",
      "status": "Done"
    },
    "phase": "Running",
    "podLastOperation": {}
  }
}

Above you can see that the operator has initialized two datacenters of one rack each (each rack is defined as a statefulset in Kubernetes) and that those actions have a status of Done, which means everything went fine. You can also see that no pod operation has happened yet.
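
That jq query is handy for scripting too, for example to assert that every rack’s last action is Done. Below it runs against an inlined copy of the status above; with a live cluster you would pipe the k get … -o json output into it instead:

```shell
# Succeeds (exit 0) only if every rack's cassandraLastAction is Done.
STATUS='{"dc1-rack1":{"cassandraLastAction":{"status":"Done"}},
         "dc2-rack1":{"cassandraLastAction":{"status":"Done"}}}'
echo "$STATUS" | jq -e 'all(.[]; .cassandraLastAction.status == "Done")' \
  && echo "all racks ready"
```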

Run a cleanup operation on a rack

Let’s trigger a cleanup operation, which usually needs to happen after a scale-up to remove data a node no longer owns. We’re going to use the plugin mentioned earlier, which was created to simplify some of the available operations. The plugin uses Python 3, so remember that it’s a requirement.

Let’s do a cleanup of datacenter dc2:

# First we need to add the plugin in our path for kubectl to use it
$ export PATH=$PATH:$PWD/plugins/
# We trigger the cleanup by specifying a prefix
$ k casskop cleanup --prefix cassandra-e2e-dc2

When it’s done, we can see it again on the Kubernetes object managed by the operator. This time let’s look only at dc2-rack1's status:

$ k get cassandraclusters.db.orange.com cassandra-e2e -o json | jq '.status.cassandraRackStatus."dc2-rack1"'
{
  "cassandraLastAction": {
    "Name": "Initializing",
    "endTime": "2019-09-12T11:50:52Z",
    "status": "Done"
  },
  "phase": "Running",
  "podLastOperation": {
    "Name": "cleanup",
    "endTime": "2019-09-12T12:04:11Z",
    "operatorName": "casskop-cassandra-operator-796cb55595-9w4pd",
    "podsOK": [
      "cassandra-e2e-dc2-rack1-0"
    ],
    "startTime": "2019-09-12T12:04:08Z",
    "status": "Done"
  }
}

As you can see above, the operation was successful. We can also see it in the operator’s logs:

$ kl casskop-cassandra-operator-796cb55595-9w4pd|grep -i cleanup
time="2019-09-12T12:04:08Z" level=info msg="Start operation" cluster=cassandra-e2e operation=Cleanup pod=cassandra-e2e-dc2-rack1-0 rack=dc2-rack1
time="2019-09-12T12:04:08Z" level=info msg="Operation start" cluster=cassandra-e2e hostName=cassandra-e2e-dc2-rack1-0.cassandra-e2e operation=Cleanup pod=cassandra-e2e-dc2-rack1-0 rack=dc2-rack1
time="2019-09-12T12:04:08Z" level=info msg="Execute the Jolokia Operation" cluster=cassandra-e2e operation=Cleanup pod=cassandra-e2e-dc2-rack1-0 rack=dc2-rack1
time="2019-09-12T12:04:10Z" level=info msg="[cassandra-e2e-dc2-rack1-0.cassandra-e2e]: Cleanup of keyspace system_distributed"
time="2019-09-12T12:04:11Z" level=info msg="[cassandra-e2e-dc2-rack1-0.cassandra-e2e]: Cleanup of keyspace system_auth"
time="2019-09-12T12:04:11Z" level=info msg="[cassandra-e2e-dc2-rack1-0.cassandra-e2e]: Cleanup of keyspace system_traces"

You can see all the supported operations by calling the command with no arguments:

$ k casskop
usage: kubectl-casskop <command> [<args>]
The available commands are:
cleanup
upgradesstables
rebuild
remove
pause
unpause
For more information you can run kubectl-casskop <command> --help
kubectl-casskop: error: the following arguments are required: command

More are coming, like the ability to trigger a rolling restart with one simple command.

Last words

I hope you liked this article. You’ve seen how easy it is to get started with CassKop, to create a cluster, and to trigger operations using the casskop plugin. CassKop is developed in Go and uses Python for the casskop plugin. It leverages the CoreOS operator-sdk, which provides some abstraction for interacting with the Kubernetes API, and we use Kind for local development. We try as much as possible to leverage tools that are already out there, like Squash for debugging. We count on you to test it and give us feedback so we can reach a stable and better version as soon as possible.

