After being up all night babysitting the rebalance process, I am happy to report that it was a rather uneventful night of maintenance. The rebalance itself took 8-9 hours to complete, and it took roughly another hour after that for all the replicas to be written to disk. In theory I didn’t need to take the site down while the rebalance was happening, but I took the game down anyway, just to be safe and not compromise the game experience.
Disk access was once again the clear bottleneck throughout the rebalance. One of the reasons we went with a larger number of smaller nodes, rather than a smaller number of bigger nodes, was to spread the disk activity during a rebalance across more EBS volumes, conceptually similar to a RAID 0. We do take on a higher risk of hardware failure simply by having more nodes in the cluster, but the disk performance gain is definitely worth it.
Some folks are doing an actual RAID 0 setup across multiple EBS volumes, as described here on alestic, but I haven’t tried it personally. If anyone has attempted that setup, especially in a production environment, please share your experience in the comments!
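For reference, the basic shape of that kind of setup looks something like the sketch below. The device names, volume count, filesystem, and mount point here are placeholders I picked for illustration, not what alestic (or we) actually use, so adjust them to your own environment:

    # Stripe four attached EBS volumes into one RAID 0 array
    # (device names /dev/sdh through /dev/sdk are just examples)
    mdadm --create /dev/md0 --level=0 --raid-devices=4 \
        /dev/sdh /dev/sdi /dev/sdj /dev/sdk

    # Put a filesystem on the array and mount it where the data lives
    mkfs.xfs /dev/md0
    mkdir -p /data
    mount /dev/md0 /data

The appeal is the same as spreading buckets across more nodes: writes get striped over several EBS volumes instead of hammering one, at the cost of the whole array failing if any single volume does.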
Key stats to monitor for your Membase cluster
Happy holidays everyone! Working with Membase in production over the last year, I realized I’ve collected a few key commands in my .bashrc for quickly checking vital stats on my Membase servers, many of which came from the good folks at Couchbase. Their wiki has improved over the past year as well, and you can find a lot of good information there. Here I will list the most common commands I run for monitoring and troubleshooting, along with related links to the Membase wiki:
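To give a flavor of the kind of thing that ends up in that .bashrc, here is a rough sketch. The install path, ports, and credentials below assume a default Linux Membase install (admin REST port 8091, data port 11210, moxi proxy on 11211, user Administrator/password), so treat them as assumptions rather than the definitive list:

    # Cluster membership and rebalance progress via the Membase CLI
    /opt/membase/bin/membase-cli server-list -c localhost:8091 -u Administrator -p password
    /opt/membase/bin/membase-cli rebalance-status -c localhost:8091 -u Administrator -p password

    # Per-node engine stats (memory, disk queue, ep_* counters) from the data port
    /opt/membase/bin/mbstats localhost:11210 all

    # Quick memcached-protocol stats through the proxy port
    echo stats | nc localhost 11211

Wrapping these in short aliases or functions is what makes them handy during a 3am rebalance, since you can eyeball the disk write queue and item counts without clicking through the web console.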