CassKop 1.0.1: Backup and restore
This feature is a big milestone, and we thought it deserved a bump of our version to 1.0. In this release, we collaborated with Instaclustr to add the ability to do backups and restores on Cassandra clusters deployed by CassKop.
Instaclustr Icarus, the sidecar that CassKop deploys alongside the Cassandra nodes to trigger backups and restores, relies on Instaclustr Esop, which does the actual operations. Esop supports multiple cloud providers for the object storage and performs backups incrementally: before pushing an sstable file, it checks the file's metadata to see whether it has already been uploaded. We spent a long time making sure that failures are returned to the operator and added to the corresponding object, and that Icarus supports global requests, so that CassKop doesn't have to handle that complexity itself.
In this post, I’m going to show you how simple it is to use those new features and what you can achieve by creating a few new Kubernetes objects.
To demonstrate it, I’ll use a k3d cluster, which makes setting up a Kubernetes cluster easy, and my own AWS account with a bucket I created. You can change the configuration to use GCS if you want; just take a look at the documentation of Esop.
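For reference, switching to GCS should mostly be a matter of changing the storage location (and providing a matching secret) in the backup object we'll define later. This is only a sketch based on Esop's protocol prefixes, and my-gcs-bucket and gcs-backup-secrets are placeholders, so double-check the Esop documentation for the exact secret format:
# Hypothetical GCS variant of the fields we'll use later in the CassandraBackup object
storageLocation: gcp://my-gcs-bucket
secret: gcs-backup-secrets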
Pre-requisites
In order to proceed with the demo, we need to deploy a Cassandra cluster with CassKop. I’ll assume that you already have k3d installed. First, we need a new cluster, so let’s create it:
$ k3d cluster create backup
INFO[0000] Created network 'k3d-backup'
INFO[0000] Created volume 'k3d-backup-images'
INFO[0001] Creating node 'k3d-backup-server-0'
INFO[0001] Creating LoadBalancer 'k3d-backup-serverlb'
INFO[0010] Cluster 'backup' created successfully!
INFO[0010] You can now use it like this:
kubectl cluster-info
We also need to ensure that we have the latest version of CassKop available in our Helm repository:
$ helm repo add orange-incubator https://orange-kubernetes-charts-incubator.storage.googleapis.com/
Error: repository name (orange-incubator) already exists, please specify a different name (**Yes I already have it**)
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "orange-incubator" chart
Update Complete. ⎈Happy Helming!⎈
We now need to deploy the CRDs (I assume you have cloned the repo and are at its root) and install CassKop:
$ kubectl apply -f deploy/crds
customresourcedefinition.apiextensions.k8s.io/cassandrabackups.db.orange.com created
customresourcedefinition.apiextensions.k8s.io/cassandraclusters.db.orange.com created
customresourcedefinition.apiextensions.k8s.io/cassandrarestores.db.orange.com created
$ helm install casskop orange-incubator/cassandra-operator
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
NAME: casskop
LAST DEPLOYED: Wed Oct 14 14:50:24 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Congratulations. You have just deployed CassKop the Cassandra Operator.
...
$ helm ls
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
casskop default 1 2020-11-18 14:50:24.090811 -0400 EDT deployed cassandra-operator-1.0.1 1.0.1-release
Let’s now deploy a simple Cassandra cluster:
$ cat > cluster-demo.yaml<<EOF
apiVersion: db.orange.com/v1alpha1
kind: CassandraCluster
metadata:
  name: cluster-demo
  labels:
    cluster: k8s.pic
spec:
  nodesPerRacks: 2
  dataCapacity: "1Gi"
  hardAntiAffinity: false
  deletePVC: true
  autoPilot: true
  autoUpdateSeedList: false
  resources:
    requests: &requests
      cpu: 100m
      memory: 512Mi
    limits: *requests
  cassandraImage: cassandra:3.11.7
  topology:
    dc:
      - name: dc1
        rack:
          - name: rack1
EOF
$ kubectl apply -f cluster-demo.yaml
After some time, you should see the cluster ready to use:
$ kubectl get po
NAME READY STATUS RESTARTS AGE
casskop-cassandra-operator-766f668594-xl4d4 1/1 Running 0 49m
cluster-demo-dc1-rack1-0 2/2 Running 0 5m39s
cluster-demo-dc1-rack1-1 2/2 Running 0 4m25s
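You can also look at the CassandraCluster object itself to see how the rollout is progressing. The exact status fields may differ between CassKop versions, so treat the jsonpath below as an assumption and adjust if needed:
$ kubectl get cassandraclusters
$ kubectl get cassandraclusters cluster-demo -o jsonpath='{.status.phase}'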
The last step is to have some data to back up. To create fake data, we use the cassandra-stress tool available in the Cassandra Docker image and a fancy command to generate 20,000 rows in table k1.standard1:
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- /opt/cassandra/tools/bin/cassandra-stress write n=20000 cl=one -rate threads=1 -mode native cql3 user=cassandra password=cassandra -schema 'Keyspace=k1 replication(strategy=NetworkTopologyStrategy, dc1=2)'
Defaulting container name to cassandra.
...
Results:
Op rate : 1,107 op/s [WRITE: 1,107 op/s]
Partition rate : 1,107 pk/s [WRITE: 1,107 pk/s]
Row rate : 1,107 row/s [WRITE: 1,107 row/s]
Latency mean : 0.9 ms [WRITE: 0.9 ms]
Latency median : 0.7 ms [WRITE: 0.7 ms]
Latency 95th percentile : 1.4 ms [WRITE: 1.4 ms]
Latency 99th percentile : 3.3 ms [WRITE: 3.3 ms]
Latency 99.9th percentile : 9.3 ms [WRITE: 9.3 ms]
Latency max : 39.6 ms [WRITE: 39.6 ms]
Total partitions : 20,000 [WRITE: 20,000]
Total errors : 0 [WRITE: 0]
Total GC count : 5
Total GC memory : 337.000 MiB
Total GC time : 0.2 seconds
Avg GC time : 32.8 ms
StdDev GC time : 4.1 ms
Total operation time : 00:00:18
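Before backing anything up, it doesn’t hurt to confirm the rows are really there. This is the same kind of check we’ll use again in the Restore section:
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- cqlsh -u cassandra -p cassandra -k k1 -e "select count(*) from standard1"
You should see a count of 20000.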
Backup
As I said in the introduction, we’re going to use my own AWS account to push the backup to a specific bucket I created. For Icarus to have the credentials, we first need to create a secret:
$ cat<<EOF|kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: aws-backup-secrets
type: Opaque
stringData:
  # You need to use your credentials here
  awsaccesskeyid: AKIAXXXXXXXXXXXCMQ
  awssecretaccesskey: gp8PYYYYYYYYYYYYYYYYYYYYYYYI
  awsregion: us-west-2
EOF
secret/aws-backup-secrets created
Now, to create a backup, we just need to create an object of type CassandraBackup:
$ cat<<EOF|kubectl apply -f -
apiVersion: db.orange.com/v1alpha1
kind: CassandraBackup
metadata:
  name: backup-demo
spec:
  cassandracluster: cluster-demo
  datacenter: dc1
  # This is my bucket, you need to use your own here
  storageLocation: s3://cscetbon-lab
  secret: aws-backup-secrets
  entities: k1.standard1
  snapshotTag: first
EOF
cassandrabackup.db.orange.com/backup-demo created
If you check the events of your cluster, you should see that the backup ran successfully:
$ kubectl get events|tail -5
20m Normal Pulled pod/cluster-demo-dc1-rack1-1 Container image "gcr.io/cassandra-operator/instaclustr-icarus:1.0.3" already present on machine
20m Normal Created pod/cluster-demo-dc1-rack1-1 Created container backrest-sidecar
20m Normal Started pod/cluster-demo-dc1-rack1-1 Started container backrest-sidecar
2m25s Normal BackupInitiated cassandrabackup/backup-demo Task initiated to backup datacenter dc1 of cluster cluster-demo to s3://cscetbon-lab under snapshot first
67s Normal BackupCompleted cassandrabackup/backup-demo Backup operation cea888b0-1549-4c9a-863c-dc5adf8bcb9b on node cluster-demo-dc1-rack1-0 was completed.
You can also check your bucket and the files that were uploaded there. If you want to rerun that backup, you need to create a new object with a different name or just recreate the same object (see the example after the events below). You should see that it is faster the second time: below, the first backup took 1m18s while the second took only 18s.
cassandrabackup:backup-demo Normal BackupCompleted cassandrabackup-controller 1 8s
cassandrabackup:backup-demo Normal BackupCompleted cassandrabackup-controller 1 26s
cassandrabackup:backup-demo Normal BackupCompleted cassandrabackup-controller 1 3m52s
cassandrabackup:backup-demo Normal BackupCompleted cassandrabackup-controller 1 5m10s
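If you’re curious about what Esop actually pushed, any S3 client will do, and rerunning the backup is just a matter of deleting and re-applying the object. A quick sketch, assuming you have the AWS CLI configured and saved the backup manifest above as backup-demo.yaml (the original run used a heredoc instead):
$ aws s3 ls s3://cscetbon-lab --recursive | head
$ kubectl delete cassandrabackup backup-demo
$ kubectl apply -f backup-demo.yaml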
You can see the status of the backup in the status section of the object:
$ kubectl get cassandrabackups.db.orange.com backup-demo -o json|jq '.status'
{
  "condition": {
    "lastTransitionTime": "Wed, 18 Nov 2020 01:01:13 GMT",
    "type": "COMPLETED"
  },
  "coordinatorMember": "cluster-demo-dc1-rack1-0",
  "id": "357bfd0c-7e52-442d-bf49-c2e164b0f40c",
  "progress": "100%",
  "timeCompleted": "2020-11-18T15:01:12.572Z",
  "timeCreated": "2020-11-18T15:00:03.786Z",
  "timeStarted": "2020-11-18T15:00:03.787Z"
}
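The progress field is handy when a backup takes more than a few seconds; you can poll it directly with jsonpath:
$ kubectl get cassandrabackups backup-demo -o jsonpath='{.status.progress}'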
There are two types of backups: scheduled and non-scheduled. The new CassandraBackup controller uses cronv3, the cron library used by Argo, a container-native workflow engine. You can specify a schedule in the CassandraBackup object, and CassKop will run the backup on that schedule. Let’s run the same backup, which only takes a few seconds, every minute, and then delete the object to stop it:
$ cat<<EOF|kubectl apply -f -
apiVersion: db.orange.com/v1alpha1
kind: CassandraBackup
metadata:
  name: backup-demo-scheduled
spec:
  cassandracluster: cluster-demo
  datacenter: dc1
  storageLocation: s3://cscetbon-lab
  secret: aws-backup-secrets
  entities: k1.standard1
  snapshotTag: second
  # I don't really expect you to run backups every minute ;)
  schedule: "@every 1m"
EOF
cassandrabackup.db.orange.com/backup-demo-scheduled created
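The schedule field accepts the formats supported by the cron library, so besides @every you should also be able to use descriptors like @daily or standard cron expressions; for instance, a hypothetical daily backup at 2am would look like this:
  schedule: "0 2 * * *"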
After a few minutes we can check the events and see that CassKop has done the job:
$ kubectl get events --field-selector involvedObject.name=backup-demo-scheduled
5m46s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation 272e8ec6-b046-40cc-9125-f02b639dc083 on node cluster-demo-dc1-rack1-0 was completed.
4m46s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation 034be072-def0-4eab-8b4f-e574b688d051 on node cluster-demo-dc1-rack1-1 was completed.
3m46s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation a2940990-0ce4-4ec5-9be9-3bb022f07d40 on node cluster-demo-dc1-rack1-1 was completed.
2m46s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation bbb09c58-93d7-478f-93c8-e19a30a30cb2 on node cluster-demo-dc1-rack1-1 was completed.
106s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation 69a9d2d5-b3fb-47ba-a6fc-cdfbe0e1103d on node cluster-demo-dc1-rack1-1 was completed.
46s Normal BackupCompleted cassandrabackup/backup-demo-scheduled Backup operation 47de4f16-b1c1-4860-88f1-92994e7a943e on node cluster-demo-dc1-rack1-1 was completed.
2s Normal BackupInitiated cassandrabackup/backup-demo-scheduled Task initiated to backup datacenter dc1 of cluster cluster-demo to s3://cscetbon-lab under snapshot second
You can see from the event timestamps that it runs every minute, as configured. We can delete that object as we won’t use it anymore for the demo:
$ kubectl delete cassandrabackup backup-demo-scheduled
cassandrabackup.db.orange.com "backup-demo-scheduled" deleted
Restore
To check that restoring data works as expected, let’s first delete a couple of records from table k1.standard1.
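You’ll need some values of the key column to delete; a query along these lines will give you a couple of candidates (the keys you get will of course differ from mine):
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- cqlsh -u cassandra -p cassandra -k k1 -e "select key from standard1 limit 2"
Let’s first confirm we have all our rows, then delete two of them: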
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- cqlsh -u cassandra -p cassandra --no-color -k k1 -e "select count(*) from standard1"|grep count -A2
Defaulting container name to cassandra.
Use 'kubectl describe pod/cluster-demo-dc1-rack1-0 -n default' to see all of the containers in this pod.
 count
-------
 20000
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- cqlsh -u cassandra -p cassandra -k k1
...
[cqlsh 5.0.1 | Cassandra 3.11.7 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh:k1> delete from standard1 where key=0x504c3132384c324f3431;
cassandra@cqlsh:k1> delete from standard1 where key=0x4e4f4b4e4d32374d3330;
cassandra@cqlsh:k1> select count(*) from standard1;

 count
-------
 19998
Now we can restore our data by creating a CassandraRestore object that references our previous CassandraBackup object (it’s needed to get access to the configuration used during the backup: the snapshot, the secret, etc.):
$ cat<<EOF|kubectl apply -f -
apiVersion: db.orange.com/v1alpha1
kind: CassandraRestore
metadata:
  name: restore-demo
spec:
  cassandraCluster: cluster-demo
  cassandraBackup: backup-demo
  restorationStrategyType: HARDLINKS
  entities: k1.standard1
EOF
When our new CassandraRestore controller sees that object, it triggers a restore operation on one pod and, as it does for backups, sends a global request to Icarus. Looking at the events, we can see that the restore operation completed:
$ kubectl get events --field-selector involvedObject.name=restore-demo
3m13s Normal RestoreRequired cassandrarestore/restore-demo Restore task required from cassandraBackup of datacenter dc1 of cluster cluster-demo to s3://cscetbon-lab under snapshot first. Restore operation on pod cluster-demo-dc1-rack1-1
3m13s Normal RestoreInitiated cassandrarestore/restore-demo Restore task initiated from cassandraBackup of datacenter dc1 of cluster cluster-demo to s3://cscetbon-lab under snapshot first. Restore operation 9471bb4b-7ae0-4a92-8fa4-12fe82985658 on pod cluster-demo-dc1-rack1-1.
2m8s Normal RestoreCompleted cassandrarestore/restore-demo Restore task from cassandraBackup of datacenter dc1 of cluster cluster-demo to s3://cscetbon-lab under snapshot first is completed. Restore operation 9471bb4b-7ae0-4a92-8fa4-12fe82985658 on pod cluster-demo-dc1-rack1-1.
You can also check the status of the object:
$ kubectl get cassandrarestores restore-demo -o json|jq '.status'
{
  "condition": {
    "lastTransitionTime": "Tue, 18 Nov 2020 16:03:20 GMT",
    "type": "COMPLETED"
  },
  "coordinatorMember": "cluster-demo-dc1-rack1-1",
  "id": "9471bb4b-7ae0-4a92-8fa4-12fe82985658",
  "progress": "100%",
  "timeCompleted": "2020-11-18T16:02:14.153Z",
  "timeCreated": "2020-11-18T16:01:15.594Z",
  "timeStarted": "2020-11-18T16:01:15.595Z"
}
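As a last sanity check, rerun the earlier count; the two rows we deleted should be back:
$ kubectl exec -ti cluster-demo-dc1-rack1-0 -- cqlsh -u cassandra -p cassandra -k k1 -e "select count(*) from standard1"
If everything went well, you are back to 20000 rows.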
Last words
So you’ve seen how easy it is to do backups and restores with CassKop. We count on you to test version 1.0.1 and give us feedback so we can keep improving the solution for Cassandra users. Do not hesitate to take a look at our awesome documentation.
I’d like to thank all the CassKop contributors who collaborated on that release, and Instaclustr for their collaboration and the tools they’ve developed and enhanced along the way to deliver these amazing features.