Happy holidays everyone! After working with Membase in production over the last year, I’ve collected a few key commands in my .bashrc for quickly checking vital stats on my Membase servers. Many of them came from the good folks at Couchbase. Their wiki has improved over the year as well, and you can find a lot of good information there. Here are the commands I run most often for monitoring and troubleshooting, along with related links to the Membase wiki:
mbstats is the most useful utility; it prints out all the important stats in the system. Many of these stats are also accessible via the web console, but if you want a script to periodically capture the data and alert when something goes wrong, keep this command handy:
/opt/membase/bin/mbstats localhost:11210 all
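For the "script to periodically capture the data" idea, here is a minimal Python sketch. It assumes mbstats prints one stat per line in a "name: value" layout; check that against the output on your own version. The function names parse_mbstats and fetch_stats are made up for this example, and the sample values are invented.

```python
import subprocess

# Path and address copied from the command above.
MBSTATS = "/opt/membase/bin/mbstats"
NODE = "localhost:11210"


def parse_mbstats(output):
    """Parse 'name: value' lines into a dict of strings.

    Assumes one stat per line with the name before the first colon;
    lines without a colon are skipped.
    """
    stats = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            stats[name.strip()] = value.strip()
    return stats


def fetch_stats(node=NODE):
    """Run 'mbstats <node> all' and return the parsed stats."""
    result = subprocess.run(
        [MBSTATS, node, "all"],
        capture_output=True, text=True, check=True,
    )
    return parse_mbstats(result.stdout)


if __name__ == "__main__":
    # Made-up sample values, only to show the parsing.
    sample = " mem_used:        1500\n ep_queue_size:   42\n"
    print(parse_mbstats(sample))
```

Values are kept as strings on purpose, since not every stat is numeric; convert the ones you care about with int() at the point of use.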
When you are first growing out your database, it’s good to observe how quickly mem_used approaches high_wat. That is the point at which Membase has to start ejecting data from memory, and any subsequent read of ejected data results in a disk read. For Animal Party, we’ve been near high_wat for many months, but our working dataset is a fairly small part of the entire data, so we don’t have a lot of disk reads. If you see your cluster approaching high_wat quickly, start keeping watch on how often the server is fetching data from disk. If the cache miss ratio in the web console starts to climb, you should look into adding more memory to the cluster.
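A rough way to turn that into an alert is to track mem_used as a fraction of the high water mark. The sketch below takes the two numbers as plain arguments, because the exact stat key for the high water mark should be checked in your own stats output. The function names and the 0.9 warning threshold are arbitrary choices for illustration, not recommended values.

```python
def memory_pressure(mem_used, high_wat):
    """Return mem_used as a fraction of the high water mark."""
    if high_wat <= 0:
        raise ValueError("high_wat must be positive")
    return mem_used / high_wat


def should_warn(mem_used, high_wat, threshold=0.9):
    """True once memory use reaches the chosen fraction of high_wat.

    The 0.9 default is an arbitrary example threshold.
    """
    return memory_pressure(mem_used, high_wat) >= threshold


if __name__ == "__main__":
    # Invented byte counts, only to show the call.
    print(memory_pressure(900, 1000), should_warn(900, 1000))
```

Recording the ratio over time is more useful than a single reading, since the point is how quickly it is approaching 1.0.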
For more detailed monitoring of the effect of cache misses, ep_bg_fetched tells you the number of disk fetches that have been performed since the server was started. You can pull the number twice with a time lapse in between to estimate disk fetches per second or per minute, which gives you a stat to record and compare later. Adding up ep_bg_load_avg and ep_bg_wait_avg should tell you how long on average it takes an item to get loaded off the disk, in microseconds. If that starts to get too high, you know you need more memory in the cluster to reduce item ejection and, in turn, disk fetches.
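The two calculations described here can be sketched as follows. This assumes the stats have already been collected into a dict keyed by stat name; the function names are hypothetical, and treating a decreasing counter as a server restart is my own assumption about how to handle that case.

```python
def fetch_rate(first_count, second_count, elapsed_seconds):
    """Disk fetches per second between two ep_bg_fetched readings."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = second_count - first_count
    if delta < 0:
        # The counter starts over when the server restarts, so count
        # only what has accumulated since then (an assumption).
        delta = second_count
    return delta / elapsed_seconds


def avg_fetch_time(stats):
    """Sum of ep_bg_wait_avg and ep_bg_load_avg from a stats dict."""
    return int(stats["ep_bg_wait_avg"]) + int(stats["ep_bg_load_avg"])


if __name__ == "__main__":
    # Invented readings taken 60 seconds apart.
    print(fetch_rate(100, 400, 60))
    print(avg_fetch_time({"ep_bg_wait_avg": "120", "ep_bg_load_avg": "380"}))
```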
ep_queue_size and ep_flusher_todo combine to tell you how many items are waiting to be written to disk. When an item needs to be written, it first gets added to a queue, which counts towards ep_queue_size. A dedicated disk-writing process (or processes), a.k.a. the flusher, periodically takes items out of this queue and writes them to disk. Items that are off the queue but have not made it onto the disk yet are accounted for in ep_flusher_todo. Obviously, you always want the disk queue to drain more quickly than it fills, or the server will run out of memory to store these pending items.
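As a sketch of how to watch this, the snippet below adds ep_queue_size and ep_flusher_todo into one pending-write count and compares a series of readings to see whether the queue is growing. It assumes the stats are already in a dict keyed by stat name; the function names and sample numbers are invented.

```python
def disk_queue_size(stats):
    """Items waiting to be persisted: queued plus taken by the flusher."""
    return int(stats["ep_queue_size"]) + int(stats["ep_flusher_todo"])


def queue_growth(samples):
    """Change in queue size from the first to the last reading.

    A positive result means the queue filled faster than it drained
    over the sampled period.
    """
    if len(samples) < 2:
        raise ValueError("need at least two readings")
    return samples[-1] - samples[0]


if __name__ == "__main__":
    # Invented readings, only to show the calls.
    reading = {"ep_queue_size": "1200", "ep_flusher_todo": "300"}
    print(disk_queue_size(reading))
    print(queue_growth([1500, 1800, 2400]))
```

Comparing only the first and last readings is deliberately crude; a short spike in the middle is usually fine as long as the queue comes back down.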
In general, you don’t want the queue to be too large anyway, because that means you have a lot of data that is not persisted to disk. Granted, it will be replicated to another node in the cluster, but if the main copy and the replica copy are both stuck waiting in the disk queue and the servers are rebooted, you would lose that data. During a rebalance, you will see the disk queue shoot up dramatically, since data needs to be relocated to the appropriate node. This is why I choose to schedule downtime for the application during a rebalance, even though in theory it can still serve data normally: new writes that happen during a rebalance may take much longer than normal to be persisted to disk, which increases the risk of losing that data.
The Membase wiki has a good explanation on what stats to monitor and why:
http://www.couchbase.org/wiki/display/membase/Ongoing+Monitoring+and+Maintenance
As a reference, here’s a link to the complete list of all the stats and what they mean:
https://github.com/membase/ep-engine/blob/master/docs/stats.org
dispatcher is another interesting command that gives you visibility into what Membase is actually doing at this moment. You can run it with the following command:
/opt/membase/bin/mbstats localhost:11210 dispatcher logs
There are a few dispatchers, and to be frank, I don’t really know any details about them (you can read http://www.couchbase.org/wiki/display/membase/DGM+Implementation+Details if you are curious). But for monitoring purposes, I look at the entries in the “Slow jobs” section. For example, this is one of the jobs returned in the dispatcher log:
runtime: 2s
starttime: 43439
task: Fetching item from disk: StoryQuest40000001045816
So here I see that item “StoryQuest40000001045816” took 2 seconds to come off the disk, and the job started 43439 seconds after the server booted up. If you don’t see any recent slow jobs in the log, then you know things are fine. If a task is taking a long time to finish, you may get an idea of what is going on by looking at the description, but chances are you will need some support help from the Membase folks to decipher the issue.
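If you want to check for recent slow jobs from a script, something like the following could work. It is only a sketch: it assumes every entry uses the three fields shown above (runtime, starttime, task), one per line, and that you pass in the server uptime in seconds yourself. The function names and the 300-second window are invented for the example.

```python
JOB_FIELDS = ("runtime", "starttime", "task")


def parse_jobs(text):
    """Group runtime/starttime/task lines into one dict per job."""
    jobs = []
    current = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        key = key.strip()
        if not sep or key not in JOB_FIELDS:
            continue  # Ignore headings and anything unrecognized.
        if key in current:
            # Seeing a field again means a new entry has started.
            jobs.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        jobs.append(current)
    return jobs


def recent_jobs(jobs, uptime_seconds, window_seconds=300):
    """Keep jobs that started within the last window_seconds."""
    return [
        job for job in jobs
        if "starttime" in job
        and uptime_seconds - int(job["starttime"]) <= window_seconds
    ]


if __name__ == "__main__":
    log = (
        "runtime: 2s\n"
        "starttime: 43439\n"
        "task: Fetching item from disk: StoryQuest40000001045816\n"
    )
    print(recent_jobs(parse_jobs(log), uptime_seconds=43500))
```

The runtime is left as the raw string from the log rather than converted to a number, since I have only seen the "2s" form above and don’t want to guess at other units.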
The last useful command is mbcollect_info, which is what the folks at Membase always ask me to run whenever I report an issue. Since I need to run this on every node in the cluster and aggregate the output, I ended up adding it to a larger script so it can be automated.
/opt/membase/bin/mbcollect_info FILENAME.zip
Key stats to monitor for your Membase cluster