CassKop 1.0.1: Backup and restore
This feature is a big milestone, and we thought it deserved a bump of our version to 1.0. In this release, we collaborated with Instaclustr to add the ability to do backups and restores on Cassandra clusters deployed by CassKop.
Instaclustr Icarus, the sidecar that CassKop deploys alongside the Cassandra nodes to trigger backups and restores, relies on Instaclustr Esop, which does the actual operations. Esop supports multiple cloud providers for the object storage and performs backups incrementally: before pushing an sstable file, it checks the file's metadata to see whether it has already been uploaded. We spent a long time making sure that failures are returned to the operator and added to the corresponding object, and that Icarus supports global requests, so that CassKop doesn't have to handle that complexity itself.
In this post, I’m going to show you how simple it is to use those new features and what you can achieve by creating a few new Kubernetes objects.
To demonstrate it, I’ll use a k3d cluster, which makes setting up a Kubernetes cluster easy, and my own AWS account with a bucket I created. You can change the configuration to use GCS if you want; just take a look at the documentation of Esop.
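For reference, switching to GCS should mostly be a matter of changing the storage location (and providing a matching secret) in the backup object we'll define later. This is only a sketch based on Esop's protocol prefixes, and my-gcs-bucket and gcs-backup-secrets are placeholders, so double-check the Esop documentation for the exact secret format:
# Hypothetical GCS variant of the fields we'll use later in the CassandraBackup object
storageLocation: gcp://my-gcs-bucket
secret: gcs-backup-secrets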
Pre-requisites
In order to proceed with the demo, we need to deploy a Cassandra cluster with CassKop. I’ll assume that you already have k3d installed. First, we need a new cluster, so let’s create it:
$ k3d cluster create backup
INFO[0000] Created network 'k3d-backup'
INFO[0000] Created volume 'k3d-backup-images'
INFO[0001] Creating node 'k3d-backup-server-0'
INFO[0001] Creating LoadBalancer 'k3d-backup-serverlb'
INFO[0010] Cluster 'backup' created successfully!
INFO[0010] You can now use it like this:
kubectl cluster-info
We also need to ensure that we have the latest version of CassKop available in our Helm repository:
$ helm repo add orange-incubator https://orange-kubernetes-charts-incubator.storage.googleapis.com/
Error: repository name (orange-incubator) already exists, please specify a different name (**Yes I already have it**)
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "orange-incubator" chart
Update Complete. ⎈Happy Helming!⎈
We now need to deploy the CRDs (I assume you have cloned the repo and are at its root) and install CassKop:
$ kubectl apply -f deploy/crds
customresourcedefinition.apiextensions.k8s.io/cassandrabackups.db.orange.com created
customresourcedefinition.apiextensions.k8s.io/cassandraclusters.db.orange.com created
customresourcedefinition.apiextensions.k8s.io/cassandrarestores.db.orange.com created
$ helm install casskop orange-incubator/cassandra-operator
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
NAME: casskop
LAST DEPLOYED: Wed Oct 14 14:50:24 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Congratulations. You have just deployed CassKop the Cassandra Operator.
...
$ helm ls
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
casskop default 1 2020-11-18 14:50:24.090811 -0400 EDT deployed cassandra-operator-1.0.1 1.0.1-release
Let’s now deploy a simple Cassandra cluster:
$ cat > cluster-demo.yaml<<EOF
apiVersion: db.orange.com/v1alpha1
kind: CassandraCluster
metadata:
  name: cluster-demo
  labels:
    cluster: k8s.pic
spec:
  nodesPerRacks: 2
  dataCapacity: "1Gi"
  hardAntiAffinity: false
  deletePVC: true
  autoPilot: true
  autoUpdateSeedList: false
  resources:
    requests: &requests
      cpu: 100m
      memory: 512Mi
    limits: *requests
  cassandraImage: cassandra:3.11.7
  topology:
    dc:
      - name: dc1
        rack:
          - name: rack1
EOF
$ kubectl apply -f cluster-demo.yaml
After some time, you should see the cluster ready to use:
$ kubectl get po
NAME READY STATUS RESTARTS AGE
casskop-cassandra-operator-766f668594-xl4d4 1/1 Running 0 49m
cluster-demo-dc1-rack1-0 2/2 Running 0 5m39s
cluster-demo-dc1-rack1-1 2/2 Running 0 4m25s
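You can also look at the CassandraCluster object itself to see how the rollout is progressing. The exact status fields may differ between CassKop versions, so treat the jsonpath below as an assumption and adjust if needed:
$ kubectl get cassandraclusters
$ kubectl get cassandraclusters cluster-demo -o jsonpath='{.status.phase}'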
The last step is to have some data to back up. To create fake data, we use the cassandra-stress tool available in the Cassandra Docker image and a fancy command to generate 20,000 rows in table k1.standard1:
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- /opt/cassandra/tools/bin/cassandra-stress write n=20000 cl=one -rate threads=1 -mode native cql3 user=cassandra password=cassandra -schema 'Keyspace=k1 replication(strategy=NetworkTopologyStrategy, dc1=2)'
Defaulting container name to cassandra.
...
Results:
Op rate : 1,107 op/s [WRITE: 1,107 op/s]
Partition rate : 1,107 pk/s [WRITE: 1,107 pk/s]
Row rate : 1,107 row/s [WRITE: 1,107 row/s]
Latency mean : 0.9 ms [WRITE: 0.9 ms]
Latency median : 0.7 ms [WRITE: 0.7 ms]
Latency 95th percentile : 1.4 ms [WRITE: 1.4 ms]
Latency 99th percentile : 3.3 ms [WRITE: 3.3 ms]
Latency 99.9th percentile : 9.3 ms [WRITE: 9.3 ms]
Latency max : 39.6 ms [WRITE: 39.6 ms]
Total partitions : 20,000 [WRITE: 20,000]
Total errors : 0 [WRITE: 0]
Total GC count : 5
Total GC memory : 337.000 MiB
Total GC time : 0.2 seconds
Avg GC time : 32.8 ms
StdDev GC time : 4.1 ms
Total operation time : 00:00:18
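Before backing anything up, it doesn’t hurt to confirm the rows are really there. This is the same kind of check we’ll use again in the Restore section:
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- cqlsh -u cassandra -p cassandra -k k1 -e "select count(*) from standard1"
You should see a count of 20000.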
Backup
As I said in the introduction, we’re going to use my own AWS account to push the backup to a specific bucket I created. For Icarus to have the credentials, we first need to create a secret:
$ cat<<EOF|kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: aws-backup-secrets
type: Opaque
stringData:
  # You need to use your credentials here
  awsaccesskeyid: AKIAXXXXXXXXXXXCMQ
  awssecretaccesskey: gp8PYYYYYYYYYYYYYYYYYYYYYYYI
  awsregion: us-west-2
EOF
secret/aws-backup-secrets created
Now, to create a backup, we just need to create an object of type CassandraBackup:
$ cat<<EOF|kubectl apply -f -
apiVersion: db.orange.com/v1alpha1
kind: CassandraBackup
metadata:
  name: backup-demo
spec:
  cassandracluster: cluster-demo
  datacenter: dc1
  # This is my bucket, you need to use your own here
  storageLocation: s3://cscetbon-lab
  secret: aws-backup-secrets
  entities: k1.standard1
  snapshotTag: first
EOF
cassandrabackup.db.orange.com/backup-demo created
If you check the events of your cluster, you should see that the backup ran successfully:
$ kubectl get events|tail -5
20m Normal Pulled pod/cluster-demo-dc1-rack1-1 Container image "gcr.io/cassandra-operator/instaclustr-icarus:1.0.3" already present on machine
20m Normal Created pod/cluster-demo-dc1-rack1-1 Created container backrest-sidecar
20m Normal Started pod/cluster-demo-dc1-rack1-1 Started container backrest-sidecar
2m25s Normal BackupInitiated cassandrabackup/backup-demo Task initiated to backup datacenter dc1 of cluster cluster-demo to s3://cscetbon-lab under snapshot first
67s Normal BackupCompleted cassandrabackup/backup-demo Backup operation cea888b0-1549-4c9a-863c-dc5adf8bcb9b on node cluster-demo-dc1-rack1-0 was completed.
You can also check your bucket and the files that were uploaded there. If you want to rerun that backup, you need to create a new object with a different name or just recreate the same object (see the example after the events below). You should see that it is faster the second time: below, the first backup took 1m18s while the second took only 18s.
cassandrabackup:backup-demo Normal BackupCompleted cassandrabackup-controller 1 8s
cassandrabackup:backup-demo Normal BackupCompleted cassandrabackup-controller 1 26s
cassandrabackup:backup-demo Normal BackupCompleted cassandrabackup-controller 1 3m52s
cassandrabackup:backup-demo Normal BackupCompleted cassandrabackup-controller 1 5m10s
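If you’re curious about what Esop actually pushed, any S3 client will do, and rerunning the backup is just a matter of deleting and re-applying the object. A quick sketch, assuming you have the AWS CLI configured and saved the backup manifest above as backup-demo.yaml (the original run used a heredoc instead):
$ aws s3 ls s3://cscetbon-lab --recursive | head
$ kubectl delete cassandrabackup backup-demo
$ kubectl apply -f backup-demo.yaml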
You can see the status of the backup in the status section of the object:
$ kubectl get cassandrabackups.db.orange.com backup-demo -o json|jq '.status'
{
  "condition": {
    "lastTransitionTime": "Wed, 18 Nov 2020 01:01:13 GMT",
    "type": "COMPLETED"
  },
  "coordinatorMember": "cluster-demo-dc1-rack1-0",
  "id": "357bfd0c-7e52-442d-bf49-c2e164b0f40c",
  "progress": "100%",
  "timeCompleted": "2020-11-18T15:01:12.572Z",
  "timeCreated": "2020-11-18T15:00:03.786Z",
  "timeStarted": "2020-11-18T15:00:03.787Z"
}
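The progress field is handy when a backup takes more than a few seconds; you can poll it directly with jsonpath:
$ kubectl get cassandrabackups backup-demo -o jsonpath='{.status.progress}'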
There are two types of backups: scheduled and non-scheduled. The new CassandraBackup controller uses cronv3, the cron library used by Argo, a container-native workflow engine. You can specify a schedule in the CassandraBackup object, and CassKop will run the backup on that schedule. Let’s run the same backup, which only takes a few seconds, every minute, and then delete the object to stop it:
$ cat<<EOF|kubectl apply -f -
apiVersion: db.orange.com/v1alpha1
kind: CassandraBackup
metadata:
  name: backup-demo-scheduled
spec:
  cassandracluster: cluster-demo
  datacenter: dc1
  storageLocation: s3://cscetbon-lab
  secret: aws-backup-secrets
  entities: k1.standard1
  snapshotTag: second
  # I don't really expect you to run backups every minute ;)
  schedule: "@every 1m"
EOF
cassandrabackup.db.orange.com/backup-demo-scheduled created
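The schedule field accepts the formats supported by the cron library, so besides @every you should also be able to use descriptors like @daily or standard cron expressions; for instance, a hypothetical daily backup at 2am would look like this:
  schedule: "0 2 * * *"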
After a few minutes we can check the events and see that CassKop has done the job:
$ kubectl get events --field-selector involvedObject.name=backup-demo-scheduled
5m46s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation 272e8ec6-b046-40cc-9125-f02b639dc083 on node cluster-demo-dc1-rack1-0 was completed.
4m46s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation 034be072-def0-4eab-8b4f-e574b688d051 on node cluster-demo-dc1-rack1-1 was completed.
3m46s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation a2940990-0ce4-4ec5-9be9-3bb022f07d40 on node cluster-demo-dc1-rack1-1 was completed.
2m46s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation bbb09c58-93d7-478f-93c8-e19a30a30cb2 on node cluster-demo-dc1-rack1-1 was completed.
106s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation 69a9d2d5-b3fb-47ba-a6fc-cdfbe0e1103d on node cluster-demo-dc1-rack1-1 was completed.
46s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation 47de4f16-b1c1-4860-88f1-92994e7a943e on node cluster-demo-dc1-rack1-1 was completed.
2s Normal BackupInitiated cassandrabackup/backup-demo-scheduled Task initiated to backup datacenter dc1 of cluster cluster-demo to s3://cscetbon-lab under snapshot second
You can see from the event timestamps that it runs every minute, as configured. We can delete that object as we won’t use it anymore for the demo:
$ kubectl delete cassandrabackup backup-demo-scheduled
cassandrabackup.db.orange.com "backup-demo-scheduled" deleted
Restore
To check that restoring data works as expected, let’s first delete a couple of records from table k1.standard1.
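You’ll need some values of the key column to delete; a query along these lines will give you a couple of candidates (the keys you get will of course differ from mine):
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- cqlsh -u cassandra -p cassandra -k k1 -e "select key from standard1 limit 2"
Let’s first confirm we have all our rows, then delete two of them: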
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- cqlsh -u cassandra -p cassandra --no-color -k k1 -e "select count(*) from standard1"|grep count -A2
Defaulting container name to cassandra.
Use 'kubectl describe pod/cluster-demo-dc1-rack1-0 -n default' to see all of the containers in this pod.
 count
-------
 20000
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- cqlsh -u cassandra -p cassandra -k k1
...
[cqlsh 5.0.1 | Cassandra 3.11.7 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh:k1> delete from standard1 where key=0x504c3132384c324f3431;
cassandra@cqlsh:k1> delete from standard1 where key=0x4e4f4b4e4d32374d3330;
cassandra@cqlsh:k1> select count(*) from standard1;

 count
-------
 19998
Now we can restore our data by creating a CassandraRestore object that references our previous CassandraBackup object (it’s needed to get access to the configuration used during the backup: the snapshot, the secret, etc.):
$ cat<<EOF|kubectl apply -f -
apiVersion: db.orange.com/v1alpha1
kind: CassandraRestore
metadata:
  name: restore-demo
spec:
  cassandraCluster: cluster-demo
  cassandraBackup: backup-demo
  restorationStrategyType: HARDLINKS
  entities: k1.standard1
EOF
When our new CassandraRestore controller sees that object, it triggers a restore operation on one pod and, as it does for backups, sends a global request to Icarus. Looking at the events, we can see that the restore operation completed:
$ kubectl get events --field-selector involvedObject.name=restore-demo
3m13s Normal RestoreRequired cassandrarestore/restore-demo Restore task required from cassandraBackup of datacenter dc1 of cluster cluster-demo to s3://cscetbon-lab under snapshot first. Restore operation on pod cluster-demo-dc1-rack1-1
3m13s Normal RestoreInitiated cassandrarestore/restore-demo Restore task initiated from cassandraBackup of datacenter dc1 of cluster cluster-demo to s3://cscetbon-lab under snapshot first. Restore operation 9471bb4b-7ae0-4a92-8fa4-12fe82985658 on pod cluster-demo-dc1-rack1-1.
2m8s Normal RestoreCompleted cassandrarestore/restore-demo Restore task from cassandraBackup of datacenter dc1 of cluster cluster-demo to s3://cscetbon-lab under snapshot first is completed. Restore operation 9471bb4b-7ae0-4a92-8fa4-12fe82985658 on pod cluster-demo-dc1-rack1-1.
You can also check the status of the object:
$ kubectl get cassandrarestores restore-demo -o json|jq '.status'
{
  "condition": {
    "lastTransitionTime": "Tue, 18 Nov 2020 16:03:20 GMT",
    "type": "COMPLETED"
  },
  "coordinatorMember": "cluster-demo-dc1-rack1-1",
  "id": "9471bb4b-7ae0-4a92-8fa4-12fe82985658",
  "progress": "100%",
  "timeCompleted": "2020-11-18T16:02:14.153Z",
  "timeCreated": "2020-11-18T16:01:15.594Z",
  "timeStarted": "2020-11-18T16:01:15.595Z"
}
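As a last sanity check, rerun the earlier count; the two rows we deleted should be back:
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- cqlsh -u cassandra -p cassandra -k k1 -e "select count(*) from standard1"
If everything went well, you are back to 20000 rows.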
Last words
So you’ve seen how easy it is to do backups and restores with CassKop. We count on you to test version 1.0.1 and give us feedback so we can keep improving the solution for Cassandra users. Do not hesitate to take a look at our awesome documentation.
I’d like to thank all the CassKop contributors who collaborated on that release, and Instaclustr for their collaboration and the tools they’ve developed and enhanced along the way to deliver these amazing features.