Deploying Riak on EC2

I’ve been working with Riak for a while now and have deployed it to a couple of production environments. I’ll share my experience with EC2 deployments here, since there are still some gray areas at the time of writing. I assume you already know what Riak is and have worked with it. The same goes for EC2.

Deployment strategy

There are a few problems to tackle when deploying Riak.

1. EC2 instances provisioned with the default settings get new values for the following on restart (a stop followed by a start):

  • Private IP address
  • Public IP address
  • Private DNS
  • Public DNS

2. EBS volumes provide stable, durable storage, while ephemeral (instance store) storage gives better and more predictable performance at the cost of losing data on restarts.

3. Performance.

IP Address change

Riak does not cope well with changing IP addresses, because each node is known to the rest of the cluster by a name that includes its host. You have several options.

1. Listen on 0.0.0.0 to accept any connection.

2. Map a valid FQDN to each node in /etc/hosts. For example, on a 3-node cluster you would have beam listening on riak1.something.com, riak2.something.com and riak3.something.com. When the IP addresses change, you must update the IPs that these FQDNs point to.

3. Same as #2, except that a private DNS server or Route 53 manages the mapping.

4. Assign a static public Elastic IP (EIP) to each node.

5. Deploy to Amazon VPC where IP addresses do not change.
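Options 2 and 3 both come down to keeping a name-to-IP mapping up to date. Here is a minimal Python sketch of the /etc/hosts variant. The IP addresses are made up for illustration, and how you discover the current IPs and push the file to each node is left out entirely.

```python
# Hypothetical node names and private IPs; substitute your own values.
NODES = {
    "riak1.something.com": "10.0.1.11",
    "riak2.something.com": "10.0.1.12",
    "riak3.something.com": "10.0.1.13",
}


def render_hosts(nodes):
    """Return /etc/hosts lines (IP, FQDN, short name), sorted by hostname."""
    lines = []
    for fqdn in sorted(nodes):
        short_name = fqdn.split(".")[0]
        lines.append(f"{nodes[fqdn]}\t{fqdn}\t{short_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the block; appending it to /etc/hosts is a separate, privileged step.
    print(render_hosts(NODES), end="")
```

Every node needs the same block, and it has to be regenerated whenever any node's IP changes, which is exactly the maintenance burden that makes this option unattractive on larger clusters.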

What to pick?

#1 does not change the fact that the beam instances need to find the other Riak nodes in the cluster. You are still stuck getting Riak to locate those nodes correctly.

#2 is a pain to maintain. The time taken to bring a cluster back up is proportional to the number of nodes in it, even if you have scripts that automate the IP changes.

#3 has the same drawback as #2 and is over-engineered on top of it.

#4 By default you can allocate at most 5 EIPs per account in a region, and they cost money. It is also not a good idea to expose Riak to the internet on a static IP.

#5 I’d recommend this. Your IP addresses never change inside a VPC, which makes configuration and cluster management that much easier. Nodes in a private subnet cannot reach the internet without a NAT instance. The downside is that it takes some work to understand a VPC environment and set it up. Trust me though, it is worth the effort.

Storage

There is a post on the riak-users list that does justice to the storage question and which option you should use. There is really no single answer. Your storage requirements vary widely with the backend you choose to run Riak on. Bitcask keeps all keys in memory, while LevelDB keeps only a fraction of them in memory.
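Because Bitcask keeps every key in memory, it is worth estimating the RAM that requires before picking instance sizes. The sketch below is a back-of-the-envelope calculation only. The per-key overhead of 40 bytes is an assumed placeholder, not a measured figure, so check the capacity planning documentation for your Riak version. The replication factor defaults to 3, which is Riak's default n_val.

```python
def bitcask_key_memory_bytes(num_keys, avg_bucket_len, avg_key_len,
                             per_key_overhead=40, n_val=3):
    """Estimate cluster-wide bytes of RAM needed to hold all Bitcask keys.

    per_key_overhead is an ASSUMED fixed cost per key entry; adjust it to
    match the figure documented for your Riak version.
    n_val is the number of replicas stored for each key.
    """
    per_key = per_key_overhead + avg_bucket_len + avg_key_len
    return num_keys * per_key * n_val


def per_node_gib(total_bytes, num_nodes):
    """Spread the cluster-wide total evenly across nodes and convert to GiB."""
    return total_bytes / num_nodes / 2**30


if __name__ == "__main__":
    # Example workload (made-up numbers): 100 million keys,
    # 10-byte bucket names, 36-byte keys, 3 nodes.
    total = bitcask_key_memory_bytes(100_000_000, 10, 36)
    print(f"{per_node_gib(total, 3):.1f} GiB per node for keys alone")
```

This covers keys only. The operating system, the Erlang VM and the file system cache all need memory on top of it.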

One rule of thumb to guide you: if you can sleep at night knowing that you have reliable backups and can survive disk failures, then you can go with ephemeral storage. EBS performance varies with load, but the chances that you will lose data are much lower. Remember that Riak nodes are meant to recover from failures anyway. Choose the storage option that suits you best.

Performance

I would recommend that you run two performance tests.

1. Use Basho Bench.

2. Use a custom load test case of your own.

Why?

Primarily because your real load will differ from what Basho Bench can produce. Don’t get me wrong, I think Basho Bench is awesome, but there are cases where it will not do. Consider an M/R query that is executed at 50 ops/s. The number of JavaScript VMs that Riak allocates by default to handle M/R cannot cope with this load. The failure will not show up in Basho Bench, since it loads the database with GET / PUT / DELETE requests and does not exercise complex features such as M/R, linking, link-walking, full-text queries and index queries.

Whether you use these additional features in your implementation is up to you. Just keep in mind that load test results vary depending on how the database is used.
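A custom load test does not have to be elaborate. The sketch below is one possible shape for it in Python: it calls an operation you supply at a fixed target rate (for instance the 50 ops/s from the M/R example) and records latencies and errors. The names run_at_rate and percentile are my own, and the operation is deliberately a stub, since the actual query depends on your client library and data.

```python
import math
import time


def run_at_rate(operation, ops_per_sec, duration_sec,
                clock=time.monotonic, sleep=time.sleep):
    """Call operation() at a steady target rate; return (latencies, errors).

    Single-threaded: if operation() takes longer than 1 / ops_per_sec,
    the achieved rate drops below the target.
    """
    interval = 1.0 / ops_per_sec
    total = round(ops_per_sec * duration_sec)
    latencies, errors = [], 0
    start = clock()
    for i in range(total):
        # Schedule against the start time so small delays do not accumulate.
        wait = start + i * interval - clock()
        if wait > 0:
            sleep(wait)
        began = clock()
        try:
            operation()
        except Exception:
            errors += 1
        latencies.append(clock() - began)
    return latencies, errors


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    def my_query():
        # Stub: replace with your own M/R, index or search request.
        time.sleep(0.005)

    latencies, errors = run_at_rate(my_query, ops_per_sec=50, duration_sec=2)
    print(f"errors={errors} p95={percentile(latencies, 95) * 1000:.1f} ms")
```

To push past what one thread can deliver, run several copies in parallel processes and add up the rates.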

Load and its characteristics

Your performance curve will vary drastically with the load you generate. If your requests are write heavy and the configured write_buffer is too small, you will see horrible performance from good machines. The number of vnodes per instance also determines how much headroom you have. Choose these values carefully.
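The vnode count per instance follows from two numbers: the ring size, which is chosen when the cluster is created and defaults to 64, and the number of nodes. A quick Python sketch of the arithmetic, assuming partitions are spread as evenly as possible (the real claim algorithm may distribute them slightly differently):

```python
def vnodes_per_node(ring_size, num_nodes):
    """Return (fewest, most) vnodes any node holds under an even spread."""
    base, extra = divmod(ring_size, num_nodes)
    return base, base + (1 if extra else 0)


if __name__ == "__main__":
    for nodes in (3, 4, 8):
        low, high = vnodes_per_node(64, nodes)
        print(f"{nodes} nodes: {low} to {high} vnodes each")
```

With a ring size of 64, a 3-node cluster ends up with 21 or 22 vnodes per node, and each of those vnodes competes for the same machine's memory and disk.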

Best practices

Basho has a good deal of advice on its site about tuning performance. Take a look at the Riak operations webinar for additional insights on deployment. Feel free to email the riak-users list too, should you need more information. Someone always replies with good advice.
