CassKop: a Kubernetes operator for Cassandra

Cyril Scetbon
8 min read · Sep 12, 2019


Today CassKop is being introduced to the Cassandra community at the 2019 Apache Cassandra Summit in Las Vegas. I’m proud to be one of the main contributors with Sebastien Allamand, as it’s been a long road to get here. In this article, I’m going to show you how simple it is to get started with this operator using Kind (Kubernetes in Docker) and to set up your first Cassandra cluster on your machine. CassKop is open source, so don’t hesitate to try it out, contribute by fixing an issue you discover, and let’s enhance it together!

What is Kind and why do we need it?

Kind is a tool for running Kubernetes on your local machine using containers: your K8s nodes, masters as well as workers, are Docker containers. This way you don’t have to install anything on your machine other than the client tools, and you can potentially create multiple independent clusters. Let’s say you want one cluster with network policies enabled and one without.

Kind is written in Go and can also be used as a library. It can be used in CI jobs and makes it simple to spawn a cluster with everything you need in minutes.

What’s the purpose of CassKop?

CassKop is a Kubernetes operator: a controller based on the Operator SDK that runs within a Kubernetes cluster and provides a means to deploy and manage Cassandra clusters.

Once the operator is installed, you can create a deployment file containing an object of type CassandraCluster describing the cluster that you want to create. For example, you specify the number of nodes, the different datacenters, the racks…

Here is an example that is used in end-to-end tests:

apiVersion: "db.orange.com/v1alpha1"
kind: "CassandraCluster"
metadata:
  name: cassandra-e2e
  labels:
    cluster: k8s.pic
spec:
  nodesPerRacks: 1
  baseImage: orangeopensource/cassandra-image
  version: latest-cqlsh
  imagePullPolicy: "Always"
  rollingPartition: 0
  dataCapacity: "1Gi"
  hardAntiAffinity: false
  deletePVC: false
  autoPilot: true
  gcStdout: true
  autoUpdateSeedList: false
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 1Gi
  topology:
    dc:
      - name: dc1
        rack:
          - name: rack1
      - name: dc2
        numTokens: 32
        nodesPerRacks: 1
        rack:
          - name: rack1

As you can see, there are global parameters like nodesPerRacks that can be overridden at the datacenter level (see datacenter dc2).

You then deploy it using kubectl, and the operator, which watches events on this type of object, will create the cluster for you. Many operations are supported, and you can take a look at the exhaustive list.

The goal is to avoid manual operations, to ensure they are executed without mistakes, and to make those operations as simple as possible. For example, we don’t want to remove a datacenter before ensuring keyspaces are no longer replicated there.
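
For instance, before dropping dc2 you would first remove it from every keyspace’s replication settings. Here is a sketch of the CQL involved — the keyspace name is hypothetical, and the kubectl exec line is shown commented rather than run:

```shell
# Hypothetical keyspace: stop replicating "my_keyspace" to dc2 by listing
# only the datacenters that should keep replicas (here dc1 with RF 3).
KEYSPACE=my_keyspace
CQL="ALTER KEYSPACE ${KEYSPACE} WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"
echo "$CQL"
# Run it on any node once you have a cluster, e.g.:
#   kubectl exec cassandra-e2e-dc1-rack1-0 -- cqlsh -e "$CQL"
```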

We use labels on pods to tell the operator what operation to run. But even applying a label on a specific pod can be a little tricky, and mistakes can be made. To simplify it, we’ve created a kubectl plugin that I’ll use in the next chapter.

Depending on the operation or change to apply to the cluster, it can be done by:

  • Updating Kubernetes objects ( CassandraCluster , ConfigMap, Pod)
  • Running a kubectl casskop command
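
Under the hood, the plugin sets operation labels on the pods for you. Here is a sketch of what doing it by hand could look like — the label keys below are an assumption written from memory, so double-check them against the operations documentation before relying on them:

```shell
# ASSUMED label keys (verify against CassKop's operations docs).
# Printing the command instead of running it, since it needs a live cluster:
POD=cassandra-e2e-dc2-rack1-0
echo kubectl label pod "$POD" operation-name=cleanup operation-status=ToDo
```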

Sneak peek at how to install and interact with it

Prerequisites

I’m not going to spend too much time on the installation of the different tools you need, but here is the list of what you need and a few commands to install them.

  • Docker
  • kubernetes-cli, kubernetes-helm : brew install kubernetes-cli kubernetes-helm kubectx
  • kubectl and kube-ps1 plugins for oh-my-zsh. Not mandatory but useful and I’ll be using some aliases in the rest of the article like kgp for kubectl get pods
  • Kind: GO111MODULE=on go get sigs.k8s.io/kind@v0.5.1
  • CassKop: https://github.com/Orange-OpenSource/cassandra-k8s-operator. It should be renamed to casskop soon
  • The ultimate command-line JSON processor jq : brew install jq
  • Python 3. It’s needed in order to use the kubectl plugin casskop
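
A quick way to check that everything is in place before starting — this just probes the PATH for each tool from the list above:

```shell
# Report which prerequisites are available on this machine.
for tool in docker kubectl helm kind jq python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```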

Create a Kubernetes cluster using Kind

I spent some time writing the documentation for the local setup. Don’t hesitate to take a look at it and create a PR if things are missing.

Handy scripts are provided to install a cluster with or without network policies enabled. Let’s create one without. You need to be at the root level of the CassKop sources:

# Replace it with your local path
$ cd ~/src/git/cassandra-k8s-operator
# We delete any old existing cluster
$ kind delete cluster
# And then create a brand new one without network policies enabled
$ samples/kind/create-kind-cluster.sh
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.15.3) 🖼
✓ Preparing nodes 📦📦📦
✓ Creating kubeadm config 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Cluster creation complete. You can now use the cluster with:
export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"

I’ve removed all the extra messages that are printed when creating a cluster for readability. We can now verify that the cluster is available by asking for the current version:

$ export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"
$ k version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:36:28Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-20T18:57:36Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

So we have Kubernetes 1.15 running and waiting for an operator.

Install CassKop

Installing the operator is easy using helm. We first need to add the casskop repository, and then we can install the latest version of the operator. When the helm package gets installed, it tries to create the CRD cassandraclusters.db.orange.com, which has already been created by our previous handy script, so we tell helm to skip that step by using the option --no-hooks:

$ helm repo add casskop https://Orange-OpenSource.github.io/cassandra-k8s-operator/helm
$ helm install --name casskop casskop/cassandra-operator --no-hooks
NAME: casskop
LAST DEPLOYED: Wed Sep 11 18:52:26 2019
NAMESPACE: cassandra-e2e
STATUS: DEPLOYED
RESOURCES:
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
casskop-cassandra-operator 0/1 1 0 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
casskop-cassandra-operator-796cb55595-9w4pd 0/1 ContainerCreating 0 0s
==> v1/RoleBinding
NAME AGE
cassandra-operator 0s
==> v1/ServiceAccount
NAME SECRETS AGE
cassandra-operator 1 0s
==> v1beta1/Role
NAME AGE
cassandra-operator 0s
NOTES:
Congratulations. You have just deployed CassKop the Cassandra Operator.
Check its status by running:
kubectl --namespace cassandra-e2e get pods -l "release=casskop"
Visit https://github.com/Orange-OpenSource/cassandra-k8s-operator for instructions on how to create & configure Cassandra clusters using the operator.

You need to wait until the operator is running before going to the next stage:

$ kgp -l "release=casskop" -w
NAME READY STATUS RESTARTS AGE
casskop-cassandra-operator-796cb55595-9w4pd 1/1 Running 0 9m16s

Create a Cassandra cluster

In this part, we’re going to create a cluster consisting of one datacenter with one rack containing one node:

$ sed 's/2$/1/' test/e2e/testdata/cassandracluster-1DC.yaml | \
k apply -f -
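
The sed expression deserves a word: s/2$/1/ rewrites any line ending in 2 so that it ends in 1, which is what turns the file’s nodesPerRacks: 2 into 1. In isolation:

```shell
# s/2$/1/ replaces a trailing "2" with "1" on each matching line.
echo 'nodesPerRacks: 2' | sed 's/2$/1/'
# Note it is a blunt tool: any other line ending in 2 (e.g. "numTokens: 32")
# would be rewritten too.
```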

Once the cluster has been created, you should see one node:

$ kgp
NAME READY STATUS RESTARTS AGE
cassandra-e2e-dc1-rack1-0 1/1 Running 0 3m29s
casskop-cassandra-operator-796cb55595-9w4pd 1/1 Running 0 18m

Scale up the cluster

We can now scale the cluster by adding one datacenter. Again, we’re going to use existing deployment files from the end-to-end tests:

$ sed '/gcStdout/d' test/e2e/testdata/cassandracluster-2DC.yaml | \
k apply -f -

After less than a minute, depending on whether the image has already been cached on the node, we can see our two-node cluster:

$ kgp
NAME READY STATUS RESTARTS AGE
cassandra-e2e-dc1-rack1-0 1/1 Running 0 4m4s
cassandra-e2e-dc2-rack1-0 1/1 Running 0 2m10s
casskop-cassandra-operator-796cb55595-9w4pd 1/1 Running 5 8h

We can also use nodetool to see the current cluster state by calling the command on one node. Of course, this is not something we could normally do in production if the exec permission is revoked:

$ keti cassandra-e2e-dc1-rack1-0 nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.244.2.17 282.26 KiB 256 92.3% 7df6af9b-392e-49ac-b5e2-d6d79d633b1b rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.244.1.13 241.81 KiB 32 16.6% 9ac0bbd5-f7cd-4bd2-8bfb-1c54b8bd4b4c rack1

Everything has been managed by the operator, and we can ask it for the current status of our racks:

$ k get cassandraclusters.db.orange.com cassandra-e2e -o json | jq '.status.cassandraRackStatus'
{
  "dc1-rack1": {
    "cassandraLastAction": {
      "Name": "Initializing",
      "endTime": "2019-09-12T11:49:12Z",
      "status": "Done"
    },
    "phase": "Running",
    "podLastOperation": {}
  },
  "dc2-rack1": {
    "cassandraLastAction": {
      "Name": "Initializing",
      "endTime": "2019-09-12T11:50:52Z",
      "status": "Done"
    },
    "phase": "Running",
    "podLastOperation": {}
  }
}

Above you can see that the operator has initialized two datacenters of one rack each (each rack is defined as a statefulset in Kubernetes) and that those actions have a status of Done, which means everything went fine. You can also see that no pod operation has happened yet.
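
That jq query is handy for scripting too, for example to assert that every rack’s last action is Done. Below it runs against an inlined copy of the status above; with a live cluster you would pipe the k get … -o json output into it instead:

```shell
# Succeeds (exit 0) only if every rack's cassandraLastAction is Done.
STATUS='{"dc1-rack1":{"cassandraLastAction":{"status":"Done"}},
         "dc2-rack1":{"cassandraLastAction":{"status":"Done"}}}'
echo "$STATUS" | jq -e 'all(.[]; .cassandraLastAction.status == "Done")' \
  && echo "all racks ready"
```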

Run a cleanup operation on a rack

Let’s trigger a cleanup operation, which usually needs to happen after a scale-up to remove data a node no longer owns. We’re going to use the plugin mentioned earlier, which was created to simplify some of the available operations. The plugin uses Python 3, so remember that it’s a requirement.

Let’s do a cleanup of datacenter dc2:

# First we need to add the plugin in our path for kubectl to use it
$ export PATH=$PATH:$PWD/plugins/
# We trigger the cleanup by specifying a prefix
$ k casskop cleanup --prefix cassandra-e2e-dc2

When it’s done, we can see it again on the Kubernetes object managed by the operator. This time let’s look only at dc2-rack1's status:

$ k get cassandraclusters.db.orange.com cassandra-e2e -o json | jq '.status.cassandraRackStatus."dc2-rack1"'
{
  "cassandraLastAction": {
    "Name": "Initializing",
    "endTime": "2019-09-12T11:50:52Z",
    "status": "Done"
  },
  "phase": "Running",
  "podLastOperation": {
    "Name": "cleanup",
    "endTime": "2019-09-12T12:04:11Z",
    "operatorName": "casskop-cassandra-operator-796cb55595-9w4pd",
    "podsOK": [
      "cassandra-e2e-dc2-rack1-0"
    ],
    "startTime": "2019-09-12T12:04:08Z",
    "status": "Done"
  }
}

As you can see above, the operation was successful. We can also see it in the operator’s logs:

$ kl casskop-cassandra-operator-796cb55595-9w4pd|grep -i cleanup
time="2019-09-12T12:04:08Z" level=info msg="Start operation" cluster=cassandra-e2e operation=Cleanup pod=cassandra-e2e-dc2-rack1-0 rack=dc2-rack1
time="2019-09-12T12:04:08Z" level=info msg="Operation start" cluster=cassandra-e2e hostName=cassandra-e2e-dc2-rack1-0.cassandra-e2e operation=Cleanup pod=cassandra-e2e-dc2-rack1-0 rack=dc2-rack1
time="2019-09-12T12:04:08Z" level=info msg="Execute the Jolokia Operation" cluster=cassandra-e2e operation=Cleanup pod=cassandra-e2e-dc2-rack1-0 rack=dc2-rack1
time="2019-09-12T12:04:10Z" level=info msg="[cassandra-e2e-dc2-rack1-0.cassandra-e2e]: Cleanup of keyspace system_distributed"
time="2019-09-12T12:04:11Z" level=info msg="[cassandra-e2e-dc2-rack1-0.cassandra-e2e]: Cleanup of keyspace system_auth"
time="2019-09-12T12:04:11Z" level=info msg="[cassandra-e2e-dc2-rack1-0.cassandra-e2e]: Cleanup of keyspace system_traces"

You can see all the supported operations by calling the command with no arguments:

$ k casskop
usage: kubectl-casskop <command> [<args>]
The available commands are:
cleanup
upgradesstables
rebuild
remove
pause
unpause
For more information you can run kubectl-casskop <command> --help
kubectl-casskop: error: the following arguments are required: command

More are coming, like the ability to trigger a rolling restart with one simple command.

Last words

I hope you liked this article. You’ve seen how easy it is to get started with CassKop, to create a cluster, and to trigger operations using the casskop plugin. CassKop is developed in Go and uses Python for the casskop plugin. It leverages the CoreOS operator-sdk, which provides some abstraction for interacting with the Kubernetes API, and we use Kind for local development. We try as much as possible to leverage tools that are already out there, like Squash for debugging. We count on you to test it and give us feedback so we can reach a stable and better version as soon as possible.

