We are using Resque for our background jobs, and we use the Resque Web interface to keep track of what is happening. The only issue is that we had no real insight into how many jobs we were running on average at certain times of the day. We didn’t know how many jobs we were processing per minute, nor how many workers we had per host throughout the day, which varies depending on the time of day.

Download Script

Dependencies

/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/resque_queue_metrics.rb  -h localhost -n 15 --scheme redis-db01.resque
redis-db01.resque.queue.update_unapplied.workers 1 1412961997
redis-db01.resque.queue.update_unapplied.jobs 0 1412961997
redis-db01.resque.queue.aging.workers 1 1412961997
redis-db01.resque.queue.aging.jobs 0 1412961997
redis-db01.resque.host.worker-host-01.processed 90537 1412961997
redis-db01.resque.host.worker-host-01.queues 38 1412961997
redis-db01.resque.host.worker-host-01.workers 47 1412961997
redis-db01.resque.host.worker-host-02.processed 20482 1412961997
redis-db01.resque.host.worker-host-02.queues 29 1412961997
redis-db01.resque.host.worker-host-02.workers 20 1412961997
redis-db01.resque.host.worker-host-05.processed 20560 1412961997
redis-db01.resque.host.worker-host-05.queues 1 1412961997
redis-db01.resque.host.worker-host-05.workers 9 1412961997
redis-db01.resque.host.worker-host-03.processed 10928 1412961997
redis-db01.resque.host.worker-host-03.queues 1 1412961997
redis-db01.resque.host.worker-host-03.workers 15 1412961997
redis-db01.resque.host.worker-host-04.processed 23554 1412961997
redis-db01.resque.host.worker-host-04.queues 1 1412961997
redis-db01.resque.host.worker-host-04.workers 9 1412961997
redis-db01.resque.failed.communication.Communication.ActiveRecord_RecordNotFound 3 1412961997
redis-db01.resque.failed.communication.Communication.Redis_TimeoutError 1 1412961997
redis-db01.resque.failed.create_batch.CreateBatch.BatchCreationValidator_Error 77 1412961997
redis-db01.resque.failed.create_batch.CreateBatch.RuntimeError 5 1412961997
redis-db01.resque.failed.notification.Notification.ActiveRecord_RecordInvalid 2 1412961997
redis-db01.resque.failed.notification.Notification.Resque_DirtyExit 4 1412961997
redis-db01.resque.failed.notification.Notification.Redis_TimeoutError 3 1412961997
redis-db01.resque.failed.message_notifications.MessageTaskJob.RuntimeError 2 1412961997
redis-db01.resque.failed.note_upload.NoteUploadJob.HttpClient_AuthorizationError 40 1412961997
redis-db01.resque.failed.invitation.InvitationJob.ActiveRecord_RecordInvalid 2 1412961997
redis-db01.resque.failed.post.PostJob.Redis_TimeoutError 6 1412961997
redis-db01.resque.failed.post.PostJob.Resque_DirtyExit 48 1412961997
redis-db01.resque.failed.encounter.EncounterJob.Exception_NoTransitionAllowed 1 1412961997
redis-db01.resque.failed.claim_assembly.ClaimAssembly.Redis_TimeoutError 2 1412961997
redis-db01.resque.failed.claim_assembly.ClaimAssembly.Resque_DirtyExit 3 1412961997
redis-db01.resque.failed.claim_assembly.ClaimAssembly.Exception 1 1412961997
redis-db01.resque.failed.balance.Balancer_BalancerJob.Resque_DirtyExit 28 1412961997
redis-db01.resque.failed.balance.Balancer_BalanceJob.Redis_TimeoutError 1 1412961997
redis-db01.resque.failed.balance.Balancer_BalanceJob.Resque_DirtyExit 2 1412961997
redis-db01.resque.queue.failed.jobs   231 1412961997
redis-db01.resque.queues  35  1412961997
redis-db01.resque.workers 100 1412961997
redis-db01.resque.working 2   1412961997
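
All of those numbers come straight out of the resque gem, so a stripped-down sketch of how they could be gathered and printed in Graphite's "name value timestamp" format might look like this (the scheme and the Redis connection are hard-coded here for brevity; the real plugin takes them as the -h/-n/--scheme options shown above):

#!/opt/sensu/embedded/bin/ruby
# Rough sketch only, not the actual plugin.
require 'resque'

scheme = 'redis-db01.resque'      # assumption: hard-coded instead of --scheme
Resque.redis = 'localhost:6379'   # assumption: hard-coded instead of -h/-n
now = Time.now.to_i

# Per-queue job counts, plus how many workers list each queue
Resque.queues.each do |queue|
  watching = Resque.workers.count { |w| w.queues.include?(queue) }
  puts "#{scheme}.queue.#{queue}.workers #{watching} #{now}"
  puts "#{scheme}.queue.#{queue}.jobs #{Resque.size(queue)} #{now}"
end

# Per-host totals, derived from each worker's "host:pid:queues" id
Resque.workers.group_by { |w| w.to_s.split(':').first }.each do |host, workers|
  processed = workers.map { |w| w.processed }.reduce(0, :+)
  puts "#{scheme}.host.#{host}.processed #{processed} #{now}"
  puts "#{scheme}.host.#{host}.workers #{workers.size} #{now}"
end

# Cluster-wide totals
info = Resque.info
puts "#{scheme}.queue.failed.jobs #{info[:failed]} #{now}"
puts "#{scheme}.queues #{info[:queues]} #{now}"
puts "#{scheme}.workers #{info[:workers]} #{now}"
puts "#{scheme}.working #{info[:working]} #{now}"

The per-exception failure lines presumably come from walking Resque::Failure.all and grouping by queue, job class, and exception, with :: replaced by _ so each class name stays a single Graphite path segment.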

Screenshots

Now you can create dashboards with enough information that the developers will no longer need to ask you what is happening with Resque.

Resque failed jobs.
Resque jobs.
Resque workers.

Before I started working here, they were only collecting the common metrics for Passenger (queue, processes, and max processes). After being here a little over a month, I realized that we were always logging into the nodes, running watch passenger-status, and watching how much memory certain Passenger workers were consuming and how much time each of these processes was taking.

I kept telling my team there had to be a better way of gathering this information. After a quick glance at passenger-status --help, I hit the gold mine.

Usage: passenger-status [options] [Phusion Passenger's PID]

Tool for inspecting Phusion Passenger's internal status.

Options:
        --show=pool|requests|backtraces|xml|union_station
                                     Whether to show the pool's contents,
                                     the currently running requests,
                                     the backtraces of all threads or an XML
                                     description of the pool.
    -v, --verbose                    Show verbose information.

Download Script

Dependencies

Testing passenger-status --show xml

If you have libxml2-utils installed, the XML will look very nice. Let’s take a quick glance at what the --show xml command will display…
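
The nice formatting presumably comes from xmllint, which is what libxml2-utils provides; you can pipe the output through it like so:

passenger-status --show xml | xmllint --format -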

passenger-status --show xml
<?xml version="1.0" encoding="iso8859-1"?>
<info version="2">
   <process_count>1</process_count>
   <max>20</max>
   <capacity_used>1</capacity_used>
   <get_wait_list_size>0</get_wait_list_size>
   <get_wait_list/>
   <supergroups>
      <supergroup>
         <name>/home/deploy/test_app/current</name>
         <state>READY</state>
         <get_wait_list_size>0</get_wait_list_size>
         <capacity_used>1</capacity_used>

Another example of the XML data, this time showing the process data instead…

passenger-status --show xml
          <processes>
               <process>
                  <pid>8540</pid>
                  <sticky_session_id>1653709171</sticky_session_id>
                  <gupid>167557a-WVPmDSUsnlj</gupid>
                  <connect_password>l;sdkflskdjfslkfjsaj;flkdjs</connect_password>
                  <concurrency>1</concurrency>
                  <sessions>0</sessions>
                  <busyness>0</busyness>
                  <processed>102</processed>
                  <spawner_creation_time>1412922319025804</spawner_creation_time>
                  <spawn_start_time>1412958391718240</spawn_start_time>
                  <spawn_end_time>1412958391746191</spawn_end_time>
                  <last_used>1412958455204460</last_used>
                  <uptime>1m 5s</uptime>
                  <life_status>ALIVE</life_status>
                  <enabled>ENABLED</enabled>
                  <has_metrics>true</has_metrics>
                  <cpu>4</cpu>
                  <rss>171028</rss>
                  <pss>136789</pss>
                  <private_dirty>103424</private_dirty>
                  <swap>0</swap>
                  <real_memory>103424</real_memory>
                  <vmsize>642504</vmsize>
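
Turning that XML into Graphite lines is mostly a matter of shelling out to passenger-status --show xml and walking the document. A rough sketch of the idea using Ruby's bundled REXML (the metric names and the way the application path is sanitized are my guesses at how the scheme below is built, not a copy of the plugin):

#!/opt/sensu/embedded/bin/ruby
# Rough sketch only; the real plugin is a Sensu metrics check with a --scheme option
# and also reports per-process uptime (presumably converting "1m 5s"-style strings to seconds).
require 'rexml/document'

scheme = 'rails-01.passenger'   # assumption: hard-coded instead of --scheme
now = Time.now.to_i
doc = REXML::Document.new(`passenger-status --show xml`)

puts "#{scheme}.max_pool_size #{doc.root.elements['max'].text} #{now}"
puts "#{scheme}.processes #{doc.root.elements['process_count'].text} #{now}"
puts "#{scheme}.top_level_queue #{doc.root.elements['get_wait_list_size'].text} #{now}"

doc.elements.each('//supergroup') do |group|
  app = group.elements['name'].text.gsub(/[^\w]/, '_')   # "/var/www/..." -> "_var_www_..."
  puts "#{scheme}.#{app}.queue #{group.elements['get_wait_list_size'].text} #{now}"

  REXML::XPath.match(group, './/process').each_with_index do |process, i|
    prefix = "#{scheme}.#{app}.process_#{i + 1}"
    puts "#{prefix}.processed #{process.elements['processed'].text} #{now}"
    puts "#{prefix}.pid #{process.elements['pid'].text} #{now}"
    puts "#{prefix}.memory #{process.elements['rss'].text} #{now}"   # resident set size in KB
    puts "#{prefix}.cpu_percent #{process.elements['cpu'].text} #{now}"
  end
end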

Running the script

As you can see, you get a wealth of data that will allow you and your team to easily track down what is happening in your Rails/Sinatra cluster.

This script was written to work with Sensu and its Graphite handler.

/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/passenger_metrics.rb --scheme rails-01.passenger
rails-01.passenger.max_pool_size  50  1412958773
rails-01.passenger.processes  23  1412958773
rails-01.passenger.top_level_queue    0   1412958773
rails-01.passenger._var_www_html_test_api_current.queue    0   1412958773
rails-01.passenger._var_www_html_test_api_current.processes    23  1412958773
rails-01.passenger._var_www_html_test_api_current.processes_being_spawned  0   1412958773
rails-01.passenger._var_www_html_test_api_current.process_1.processed  3567    1412958773
rails-01.passenger._var_www_html_test_api_current.process_1.pid    7703    1412958773
rails-01.passenger._var_www_html_test_api_current.process_1.uptime 921 1412958773
rails-01.passenger._var_www_html_test_api_current.process_1.memory 909664  1412958773
rails-01.passenger._var_www_html_test_api_current.process_1.cpu_percent    27  1412958773
rails-01.passenger._var_www_html_test_api_current.process_2.processed  2194    1412958773
rails-01.passenger._var_www_html_test_api_current.process_2.pid    11878   1412958773
rails-01.passenger._var_www_html_test_api_current.process_2.uptime 659 1412958773
rails-01.passenger._var_www_html_test_api_current.process_2.memory 644108  1412958773
rails-01.passenger._var_www_html_test_api_current.process_2.cpu_percent    23  1412958773
rails-01.passenger._var_www_html_test_api_current.process_3.processed  1484    1412958773
rails-01.passenger._var_www_html_test_api_current.process_3.pid    16146   1412958773
rails-01.passenger._var_www_html_test_api_current.process_3.uptime 402 1412958773
rails-01.passenger._var_www_html_test_api_current.process_3.memory 691392  1412958773
rails-01.passenger._var_www_html_test_api_current.process_3.cpu_percent    32  1412958773
rails-01.passenger._var_www_html_test_api_current.process_4.processed  1343    1412958773
rails-01.passenger._var_www_html_test_api_current.process_4.pid    16738   1412958773
rails-01.passenger._var_www_html_test_api_current.process_4.uptime 353 1412958773
rails-01.passenger._var_www_html_test_api_current.process_4.memory 521676  1412958773
rails-01.passenger._var_www_html_test_api_current.process_4.cpu_percent    32  1412958773
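
To have Sensu run this on a schedule and ship the output through its Graphite handler, you point a metric check at the plugin. A check definition along these lines should do it (the subscription name and interval are just examples, and in practice you would derive the scheme from each client's name instead of hard-coding it):

{
  "checks": {
    "passenger_metrics": {
      "type": "metric",
      "command": "/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/passenger_metrics.rb --scheme rails-01.passenger",
      "subscribers": ["rails"],
      "interval": 60,
      "handlers": ["graphite"]
    }
  }
}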

Screenshots

Now you can create dashboards with enough information that the developers will no longer need to ask you what is happening with Passenger.

Passenger Workers.
Passenger Queues.
Passenger Workers Memory Used.
Passenger Workers Time Spent.
Nginx and Passenger Stats.

I’m currently at the tail end of collecting metrics for the company I work for. We are using a combination of tools to do it.

I built a Python script that leverages the awesome tool logtail to collect HTTP status codes without consuming too much CPU/RAM.

I have tested this script against nginx logs in two different formats (txt and JSON).

Download Script

Basic Logging

"GET /test/foo HTTP/1.0" 200 76288 "-" "Ruby"

Json Logging

I printed the JSON on multiple lines here, but in the log the JSON is all on one line.

{
    "timestamp": "2014-09-26T08:45:35-04:00",
    "fields": {
        "remote_addr": "127.0.0.1",
        "remote_user": "-",
        "body_bytes_sent": "4823",
        "request_time": "0.127",
        "status": "200",
        "request": "POST /rubyamf_gateway/ HTTP/1.0",
        "request_method": "POST",
        "http_referrer": "https://foo.bar.net/test/test",
        "http_user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36"
    }
}

The only dependency of this script is logtail, which you can install easily on any *nix system.

  • (Debian/Ubuntu) apt-get install logtail
  • (CentOS/RedHat) yum install logtail

If you do not have logtail installed, you will get the following error

{9:45}~/Linuxdynasty/scripts/sensu/metrics:master ✓ python nginx-status-code-metrics.py -h
Please install logtail

The following options are available

{9:45}~/Linuxdynasty/scripts/sensu/metrics:master ✓ ➭ python nginx-status-code-metrics.py -h
Options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory=DIRECTORY
                        The directory where the access logs exist
  -f FILE, --file=FILE  name of the access log
  -t FILE_FORMAT, --file_format=FILE_FORMAT
                        json or txt
  -s SCHEME, --scheme=SCHEME
                        the metric naming scheme

This is how you would run the script

{9:45}~/Linuxdynasty/scripts/sensu/metrics:master ✓ ➭ python nginx-status-code-metrics.py -d /var/log/nginx -f access.json -t json
webserver01.codes.200 1080 1411736116.44
webserver01.codes.304 2 1411736116.44
webserver01.codes.404 19 1411736116.44

The end result will look like this. Grafana Screenshot

I just finished migrating from WordPress, and I love the simplicity of just managing static pages and being able to edit them right from Vim. I will be updating my old posts little by little, so if a download link is not valid, just check my GitHub Page.

I have been so focused on vFense that I have been neglecting this blog. vFense has come a long way from its inception a little over 20 months ago. We are still in beta, but we are continuously making improvements. Right now, I’m the only active developer working on the next major release, 0.8.0, which is a complete server rewrite. Examples of what is coming…

  • All functions, methods, and classes will be fully documented.
  • Vulnerability correlation for the following operating systems…
    • RedHat and clones
    • Ubuntu and clones
    • Windows
    • CVE/NVD
  • Moved to using APScheduler v3.1.0 for managing scheduled jobs.
  • Logging of all administrative tasks into RethinkDB.
  • Remote command line tool for making API calls to vFense (think of Chef’s knife command).
  • Token-based authentication for the agents.
  • Separated web authentication from agent authentication.