Setting Up Cassandra Cluster in AWS

By | October 10, 2017

Apache Cassandra is a NoSQL database that allows for easy horizontal scaling, using the consistent hashing mechanism. Seven years ago I tried it and decided not to use it for a side-project of mine because it was too new. Things are different now: Cassandra is well established, there's a company behind it (DataStax), and there are many more tools, documentation and community support. So once again, I decided to try Cassandra.

This time I need it to run in a cluster on AWS, so I went on to set up such a cluster. Googling how to do it gives a number of interesting results, like this, this and this, but they're either incomplete, or outdated, or have too many irrelevant details. So they're only of moderate help.

My goal is to use CloudFormation (or Terraform potentially) to launch a stack which has a Cassandra auto-scaling group (in a single region) that can grow as easily as increasing the number of nodes in the group.

Also, in order to have the web application connect to Cassandra without hardcoding the node IPs, I wanted to have a load balancer in front of all Cassandra nodes that does the round-robin for me. The alternative would be a client-side round-robin, but that would mean some extra complexity on the client, which seems avoidable with a load balancer in front of the Cassandra auto-scaling group.

The relevant bits from my CloudFormation JSON can be seen here. What it does:

- Sets up 3 private subnets (1 per availability zone in the eu-west region).
- Creates a security group which allows the incoming and outgoing ports that let Cassandra accept connections (9042) and let the nodes gossip (7000/7001). Note that the ports are only accessible from within the VPC; no external connection is allowed. SSH goes only through a bastion host.
- Defines a TCP load balancer for port 9042 where all clients will connect. The load balancer requires a so-called "target group", which is defined as well.
- Configures an auto-scaling group with a pre-configured number of nodes. The auto-scaling group has a reference to the target group, so that the load balancer always sees all nodes in the auto-scaling group.
- Each node in the auto-scaling group is identical, based on a launch configuration. The launch configuration runs a few scripts on initialization. These scripts will be run for every node: either initially, or when a node dies and another one is spawned in its place, or when the cluster has to grow. The scripts are fetched from S3, where you can publish them (and version them) either manually or with an automated process.

Note: this doesn't configure specific EBS volumes, and in reality you may need to configure and attach them if the instance storage is insufficient. Don't worry about nodes dying, though, as data is safely replicated.
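To make the security group described above more concrete, here is a minimal Python sketch that generates a CloudFormation-style resource for it. This is not my actual CloudFormation JSON: the function name, the logical names `CassandraSecurityGroup` and `VPC`, and the CIDR `10.0.0.0/16` are assumptions for illustration, and only the ingress rules are shown.

```python
import json

# Port ranges taken from the description above: CQL clients and inter-node gossip.
CASSANDRA_PORTS = {
    "cql": (9042, 9042),
    "gossip": (7000, 7001),
}


def cassandra_security_group(vpc_id, vpc_cidr):
    """Build an AWS::EC2::SecurityGroup resource as a plain dict.

    Every rule is limited to vpc_cidr, so the ports are reachable only
    from inside the VPC. Both arguments are hypothetical inputs.
    """
    ingress = [
        {"IpProtocol": "tcp", "FromPort": low, "ToPort": high, "CidrIp": vpc_cidr}
        for low, high in CASSANDRA_PORTS.values()
    ]
    return {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "GroupDescription": "Cassandra client and gossip ports, VPC-internal only",
            "VpcId": vpc_id,
            "SecurityGroupIngress": ingress,
        },
    }


if __name__ == "__main__":
    # {"Ref": "VPC"} assumes a VPC resource with that logical name exists in the stack.
    resource = cassandra_security_group({"Ref": "VPC"}, "10.0.0.0/16")
    print(json.dumps({"CassandraSecurityGroup": resource}, indent=2))
```

Generating the fragment from a small table of ports keeps the client port and the gossip range in one place, which helps if more ports have to be opened later.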

That was the easy part: a bunch of AWS resources and port configurations. The Cassandra-specific setup is a bit harder, as it requires understanding how Cassandra functions.

There are two scripts, one in bash and one in python. Bash is for setting up the machine, and python is for the Cassandra-specific stuff. Instead of the bash script one could use a pre-built AMI (image), e.g. with packer, but since only 2 pieces of software are installed, I thought it's a bit of an overhead to support AMIs.

The bash script can be seen here, and simply installs Java 8 and the latest Cassandra, runs the python script, runs the Cassandra services and creates (if needed) a keyspace with proper replication configuration. A few notes here: the cassandra.yaml.template could be supplied via the cloudformation script instead of having it fetched via bash (and having to pass the bucket name); you could also have it fetched in the python script itself; it's a matter of preference. Cassandra is not configured for use with SSL, which is generally a bad idea, but the SSL configuration is out of scope of the basic setup. Finally, the script waits for the Cassandra process to run (using a while/sleep loop) and then creates the keyspace if needed. The keyspace (=database) has to be created with a NetworkTopologyStrategy, and the number of replicas for the particular datacenter (=AWS region) has to be configured. The value is 3, for the 3 availability zones where we'll have nodes. That means there's a copy in each AZ (which is seen like a "rack", although it's not exactly that).
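The keyspace creation step can be sketched as follows. This is a hedged illustration in Python rather than the bash script itself; the function name, the keyspace name `my_keyspace` and the datacenter name `eu-west` are assumptions. The datacenter name has to match whatever the snitch reports (for example in `nodetool status`), so check it on a running node before relying on it.

```python
import re


def create_keyspace_cql(keyspace, datacenter, replicas=3):
    """Return a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    replicas=3 follows the post: one copy per availability zone.
    The validation below is a simple guard, not a full CQL grammar check.
    """
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", keyspace):
        raise ValueError("unexpected keyspace name: %r" % keyspace)
    if not re.fullmatch(r"[A-Za-z0-9_-]+", datacenter):
        raise ValueError("unexpected datacenter name: %r" % datacenter)
    replication = "{'class': 'NetworkTopologyStrategy', '%s': %d}" % (datacenter, replicas)
    # IF NOT EXISTS makes the statement safe to run on every node start.
    return "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = %s;" % (keyspace, replication)


if __name__ == "__main__":
    print(create_keyspace_cql("my_keyspace", "eu-west"))
```

The resulting statement could then be passed to cqlsh once the node accepts connections on port 9042.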

The python script does some very important configuration; without it the cluster won't work. (I don't work with Python normally, so feel free to criticize my Python code.) The script does the following:

- Gets the current auto-scaling group details (using AWS EC2 APIs)
- Sorts the instances by time
- Fetches the first instance in the group in order to assign it as seed node
- Sets the seed node in the configuration file (by replacing a placeholder)
- Sets the listen_address (and therefore rpc_address) to the private IP of the node in order to allow Cassandra to listen for incoming connections
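The sorting and seed selection steps above can be sketched like this. It is not the original script: the function name is hypothetical, and the input is assumed to be a list of dicts shaped like the instance entries that the EC2 describe-instances API returns (`InstanceId`, `LaunchTime`, `PrivateIpAddress`). Fetching that list from AWS is left out so the example runs on its own.

```python
from datetime import datetime, timezone


def pick_seeds(instances, count=1):
    """Return the private IPs of the `count` oldest instances.

    Sorting by launch time means every node computes the same answer.
    The InstanceId tie-break is an addition of this sketch, used only to
    keep the order deterministic when two launch times are equal.
    """
    ordered = sorted(instances, key=lambda i: (i["LaunchTime"], i["InstanceId"]))
    return [i["PrivateIpAddress"] for i in ordered[:count]]


if __name__ == "__main__":
    # Hypothetical instances; real data would come from the EC2 API.
    group = [
        {"InstanceId": "i-0b", "PrivateIpAddress": "10.0.2.9",
         "LaunchTime": datetime(2017, 10, 2, tzinfo=timezone.utc)},
        {"InstanceId": "i-0a", "PrivateIpAddress": "10.0.1.5",
         "LaunchTime": datetime(2017, 10, 1, tzinfo=timezone.utc)},
        {"InstanceId": "i-0c", "PrivateIpAddress": "10.0.3.4",
         "LaunchTime": datetime(2017, 10, 3, tzinfo=timezone.utc)},
    ]
    print(pick_seeds(group))
    print(pick_seeds(group, count=2))
```

Passing `count=2` corresponds to taking the first two nodes as seeds instead of just one.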

Designating the seed node is important, as all cluster nodes have to join the cluster by specifying at least one seed. You can get the first two nodes instead of just one, but it shouldn't matter. Note that the seed node is not always fixed; it's just the oldest node in the cluster. If at some point the oldest node is terminated, each new node will use the second oldest as seed.

What I haven't shown is the cassandra.yaml.template file. It is basically a copy of the cassandra.yaml file from a standard Cassandra installation, with a few changes:

- cluster_name is modified to match your application name. This is just for human-readable purposes; it doesn't matter what you set it to.
- allocate_tokens_for_keyspace: your_keyspace is uncommented and the keyspace is set to match your main keyspace. This enables the new token distribution algorithm in Cassandra 3.0. It allows for evenly distributing the data across nodes.
- endpoint_snitch: Ec2Snitch is set instead of the SimpleSnitch to make use of AWS metadata APIs. Note that this setup is in a single region. For multi-region there's another snitch and some additional complications of exposing ports and changing the broadcast address.
- As mentioned above, ${private_ip} and ${seeds} placeholders are placed in the appropriate places (listen_address and rpc_address for the IP) in order to allow substitution.
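The placeholder substitution can be sketched with Python's `string.Template`, which happens to use the same `${name}` syntax. The fragment below is an assumption made for illustration, not the real template: it shows only the settings listed above, with made-up values for `cluster_name` and the keyspace, and the function name is hypothetical.

```python
from string import Template

# Illustrative fragment only; a real cassandra.yaml has many more settings.
TEMPLATE_FRAGMENT = """\
cluster_name: 'My Application Cluster'
allocate_tokens_for_keyspace: my_keyspace
endpoint_snitch: Ec2Snitch
listen_address: ${private_ip}
rpc_address: ${private_ip}
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "${seeds}"
"""


def render_config(template_text, private_ip, seeds):
    """Fill in ${private_ip} and ${seeds}; seeds is a list of IP strings.

    substitute() raises KeyError if the template contains a placeholder
    that is not supplied, which surfaces typos early.
    """
    return Template(template_text).substitute(
        private_ip=private_ip,
        seeds=",".join(seeds),
    )


if __name__ == "__main__":
    print(render_config(TEMPLATE_FRAGMENT, "10.0.1.7", ["10.0.1.5"]))
```

One caveat with this approach: `substitute()` also rejects a stray `$` that is not part of a placeholder, so a full template containing literal dollar signs would need them written as `$$`.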

This lets you run a Cassandra cluster as part of your AWS stack, which is auto-scalable and doesn't require any manual intervention, neither on setup nor on scaling up. Well, allegedly; there may be issues that have to be resolved once you hit the use cases of reality. And for clients to connect to the cluster, simply use the load balancer DNS name (you can print it in a config file on each application node).
