GitHub as a Server Config File Repo

I've been working on the idea of having cloud instances' config files in a git repo for some time. It's a good concept but there are a few gotchas. Here's the basic setup:

  • cd /
  • create a .gitignore file with an * in it in the root directory /
  • git init
  • follow the github directions for setting up a remote repo except for creating the directory

This allows you to then force add (e.g. git add -f etc/apache2/vhost.conf) any file on the instance and put it under version control.
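The setup steps above can be sketched as two small shell functions. This is only a sketch: the function names and the root-directory argument are made up for illustration (the post itself uses / as the root), and it assumes git is already installed.

```shell
# Sketch: create an "ignore everything" repo at a given root directory.
# The directory argument is hypothetical; the post uses /.
init_config_repo() {
  repo_root="$1"
  cd "$repo_root" || return 1
  # A lone * ignores every file, so nothing is tracked until force-added.
  printf '*\n' > .gitignore
  git init -q .
}

# Force-add one file, given as a path relative to the repo root
# (no leading slash).
track_config_file() {
  git add -f -- "$1"
}
```

For example, `init_config_repo /` followed by `track_config_file etc/apache2/vhost.conf` would leave only that one file staged.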

Some operational notes:

  • Create a separate repo for each role of server.
  • If GitHub gives you trouble about unique SSH keys you can create a new RSA key pair and rename the files to identity and identity.pub, and SSH will automatically try them. You can then add the public key to the server repo and it will authenticate without issues.
  • When adding and committing files, omit the leading slash! DO NOT USE / in your commits. Git will return the error message "fatal: '/etc/apache2/fname' is outside repository". Git expects paths relative to the repository root, so use etc/apache2/fname instead.
  • You can create a branch for individual instances of the server if they are different. git clone accepts a branch name to use as the default: git clone -b <branch> <remote_repo>
  • To use on a new server just go to the root and do:
    • create a .gitignore file like above
    • git init
    • git remote add origin git@github.com:[your repo here]
    • Then add a section like this to .git/config:

      [branch "master"]
          remote = origin
          merge = refs/heads/master
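As an alternative to hand-editing the config file, the same remote and branch tracking settings can be written with git config. A minimal sketch, assuming it is run from the root of a repo that has already been initialized; the function name and its arguments are hypothetical.

```shell
# Sketch: point an existing repo at its remote and set upstream tracking.
# remote_url and branch are illustrative parameters, not from the post.
link_config_repo() {
  remote_url="$1"
  branch="${2:-master}"
  git remote add origin "$remote_url" &&
  git config "branch.$branch.remote" origin &&
  git config "branch.$branch.merge" "refs/heads/$branch"
}
```

Calling it with a second argument sets up tracking for a per-instance branch instead of master.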

 

Syntax Color Coding in VIM on OSX

Real men use VI, real keyboard gymnasts use EMACS. The rest of us use Textmate. (Flame war to follow in IM.) But for a quick, useful editor it's hard to beat vim. My problem was that I wasn't getting color coding on OS X. You can type ":syntax on" inside vim but you have to do that every time. To make it stick just do this:

sudo vim /usr/share/vim/vimrc

add this line:  syntax on

save the file, restart vim and enjoy. 
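If you would rather not edit the system-wide file, the same line can go in a per-user vimrc. The sketch below is an assumption on my part rather than what the post does: it appends the line to ~/.vimrc (or a file you pass in) only if it is not already there. The function name is made up.

```shell
# Sketch: add "syntax on" to a vimrc exactly once.
# Defaults to the per-user ~/.vimrc; pass another path to override.
enable_vim_syntax() {
  vimrc="${1:-$HOME/.vimrc}"
  touch "$vimrc"
  # -x matches the whole line, so a commented-out copy is not counted.
  grep -qx 'syntax on' "$vimrc" || printf 'syntax on\n' >> "$vimrc"
}
```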

Uninstalling MySQL on Mac OS X Leopard

This is from Rob Allen's dev notes. I found it useful. You can view the original here.


To uninstall MySQL and completely remove it (including all databases) from your Mac do the following:

  • Use mysqldump to back up your databases to text files!
  • Stop the database server
  • sudo rm /usr/local/mysql
  • sudo rm -rf /usr/local/mysql*
  • sudo rm -rf /Library/StartupItems/MySQLCOM
  • sudo rm -rf /Library/PreferencePanes/My*
  • edit /etc/hostconfig and remove the line MYSQLCOM=-YES-
  • rm -rf ~/Library/PreferencePanes/My*
  • sudo rm -rf /Library/Receipts/mysql*
  • sudo rm -rf /Library/Receipts/MySQL*
  • sudo rm -rf /private/var/db/receipts/*mysql*

The last three lines are particularly important as otherwise, you can't install an older version of MySQL even though you think that you've completely deleted the newer version!
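Before running that many rm -rf commands it can help to see what is actually there. The function below is a dry-run sketch that only lists matches for the same paths and deletes nothing. The function name and the optional prefix argument are hypothetical; the prefix exists only so the function can be tried against a scratch directory.

```shell
# Sketch: list MySQL leftovers without deleting anything.
# The optional prefix is an illustrative hook for testing.
list_mysql_leftovers() {
  prefix="${1:-}"
  for pattern in \
    "/usr/local/mysql*" \
    "/Library/StartupItems/MySQLCOM" \
    "/Library/PreferencePanes/My*" \
    "$HOME/Library/PreferencePanes/My*" \
    "/Library/Receipts/mysql*" \
    "/Library/Receipts/MySQL*" \
    "/private/var/db/receipts/*mysql*"
  do
    # Left unquoted on purpose so the shell expands the glob.
    for candidate in $prefix$pattern; do
      if [ -e "$candidate" ] || [ -L "$candidate" ]; then
        printf '%s\n' "$candidate"
      fi
    done
  done
  return 0
}
```

Run it once before the uninstall to see what will go, and again afterwards to confirm nothing is left.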

Surprisingly Low CPU Load Ceiling on Amazon AWS

[Image: CPU load graph]
The image above looks innocuous enough. It's a CPU load graph. The left side of the graph looks healthy enough. Lots of green. The right side looks like the server is working hard, but as most sysadmins know, a load of 40% or less isn't too bad. But they would be wrong, and sometimes dangerously so. The right side of the graph shows a server that is nearly non-responsive. You can probably ssh into it. top will take a LONG time to load. But this server can't do any real work. I/O will be affected. CPU-intensive operations won't complete. When the 'steal' meets the load, in this case at about 38%, the server behaves the way a non-virtual server would as load approaches 100%. The question is: if 38% was the practical maximum for a small instance in US East 1B that day, why doesn't Amazon AWS just tell us so?

When the neighborhood gets bad Netflix moves out. Netflix monitors the steal on their AWS servers and launches a new instance when it gets too high. Not ideal but we'll be looking into that going forward.

STEAL DEFINED: the amount of time your virtual machine's CPU has work waiting but the virtualization hypervisor is doing work for other instances instead. That time is being 'stolen' from your CPU.  
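On a Linux guest you can measure steal yourself: it is the eighth counter on the "cpu" line of /proc/stat. The sketch below takes two samples of that line and prints steal as a percentage of all CPU time in between. The function name is made up, and this is not how Netflix or any monitoring product does it, just a minimal illustration.

```shell
# Sketch: steal percentage between two "cpu" lines from /proc/stat.
# Fields 2-9 are user nice system idle iowait irq softirq steal.
steal_percent() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total = 0
      for (i = 2; i <= 9; i++) total += $i
      t[NR] = total; s[NR] = $9 }
    END {
      d = t[2] - t[1]
      if (d > 0) printf "%.1f\n", 100 * (s[2] - s[1]) / d
      else print "0.0"
    }'
}
```

Typical use would be: a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat); steal_percent "$a" "$b"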

Apache Basic Authentication for a Rewrite Condition Not a Directory

Sometimes you need to protect something on your infrastructure from casual inspection. You could put authentication and session handling into an app just to protect it, but that can be quite a bit of overhead. It can be a bit tricky to figure out how Apache's basic authentication works with a URL rather than a directory. The key thing to remember is to use the Location directive rather than Directory, and to put your rewrite statements outside the Location block. Read the code and more details here.

 

# Apache Basic Authentication for A Rewrite Condition not a Directory
#
# NOTE: Locations work off of URLs not directories
#
# Put this in your virtualhost block
#
<Location /YOUR_URL_PATH/>
  AuthType Basic
  AuthName "Authorization Required."
  AuthUserFile /etc/httpd/conf/htpasswd
  require valid-user
</Location>

RewriteCond %{LA-U:REMOTE_USER} !^$
RewriteCond %{SCRIPT_FILENAME} ^/YOUR_URL_PATH/(.*)

#This is for HAProxy your destination will be different
RewriteRule ^.*$ http://127.0.0.1:86%{SCRIPT_FILENAME} [P,QSA,L]

# To create your password file (htpasswd needs a username; it prompts for the password)...
# htpasswd -c /etc/httpd/conf/htpasswd YOUR_USERNAME

MongoDB Secondary Server Seconds Behind Script for Rightscale and Collectd

[Image: graph of secondary lag in seconds]
When you use MongoDB replica sets it can be difficult to determine whether your secondary servers are up to date and functioning properly. An improperly formed Map/Reduce instruction can cause an error which stops replication. At this time there is no application error or any other notification for this scenario. Even under the best of circumstances it's nice to know how far behind your secondary servers are. The graph above shows a production replica set which hovers around 2-3 seconds of lag. That doesn't mean that the Mongo servers take that long to replicate a change. Instead it answers the question "how many seconds has it been since a replicable event occurred?" If the lag grows dramatically, either there hasn't been an insert into the primary DB in that time or replication has failed.

The code is in a github gist here.

 

#!/bin/bash -ex
#
# Description: Graph how old the last sync was between primary and secondary mongo instances.
#
# Base code and idea thanks to: Edward M. Goldberg
#
# any mistakes courtesy of: Dennis Faust
#
# Quote the variable so the test doesn't break if RS_DISTRO is empty or unset
if [ "$RS_DISTRO" = ubuntu ]; then
  plugin_dir="/etc/collectd/conf"
elif [ "$RS_DISTRO" = centos ]; then
  plugin_dir="/etc/collectd.d"
fi

echo "*** This sets up a collectd plugin container"
cat <<"EOF" >$plugin_dir/MongoDB-lag.conf
LoadPlugin exec
<Plugin exec>
  Exec "mongodb" "/usr/local/bin/plugin-mongolag.sh"
</Plugin>
EOF

echo "*** This is called by the container"
cat <<"EOF" >/usr/local/bin/plugin-mongolag.sh
#!/bin/bash
#
while sleep 5; do
  VALUE=$(echo "db.printSlaveReplicationInfo()" | mongo | grep secs | head -1 | cut -f2 -d'=' | cut -f1 -d's' | cut -f2 -d' ')
  echo "PUTVAL \"REPLACE/mongo/gauge-secondary_lag_secs\" interval=5 N:$VALUE"
done
EOF

sed -i "s/REPLACE/$SKETCHY_UUID/g" /usr/local/bin/plugin-mongolag.sh

# Executable by everyone, writable only by the owner
chmod 755 /usr/local/bin/plugin-mongolag.sh

service collectd restart

exit 0 # leave with a smile...
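One weak spot in the plugin is that VALUE is empty whenever the mongo shell fails or the output has no "secs" line, which produces a malformed PUTVAL. The sketch below pulls the parsing and the output line into two functions and emits U (collectd's marker for an undefined value) in that case. The function names are hypothetical, and the sample line in the usage note is my assumption about what the shell prints, inferred from the cut pipeline above.

```shell
# Sketch: same cut pipeline as the plugin, reading shell output on stdin.
extract_lag_secs() {
  grep secs | head -1 | cut -f2 -d'=' | cut -f1 -d's' | cut -f2 -d' '
}

# Sketch: build the PUTVAL line, falling back to U (undefined) when
# no value was parsed. host_id stands in for the instance UUID.
putval_line() {
  host_id="$1"
  value="${2:-U}"
  printf 'PUTVAL "%s/mongo/gauge-secondary_lag_secs" interval=5 N:%s\n' \
    "$host_id" "$value"
}
```

Given an assumed line such as " = 2secs ago (0hrs)", extract_lag_secs prints 2, and putval_line with an empty second argument reports N:U instead of a broken line.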

MongoDB Open File Descriptors on Ubuntu Problem...

After installing MongoDB on a stripped-down Rightscale MySQL template all worked great until the replication pair had been in production for a week or so. Then the primary instance started loading up the CPU and would eventually become unresponsive. When examined, the iostat output showed nothing surprising, and mongostat showed lots of queued reads and writes but not much happening. I turned on profiling, expecting to find long-running queries. But instead, any find call on the profile results never returned. This was vexing, as the problem would clear up for a bit when we did a service restart on Mongo. By checking the exact timing of the CPU increase and looking in the mongodb.log file I found the following errors:

JS Error: out of memory

Some mongo research led to a discussion of file descriptors. Of course, the methods described in several good write-ups didn't work on the version of Ubuntu we're running: http://tech-torch.blogspot.com/2009/07/linux-ubuntu-tomcat-too-many-open-files.html

To make a long story short you have to explicitly name the user you want to increase the file descriptors for in the /etc/security/limits.conf file like this:

 

cat >>/etc/security/limits.conf <<EOF

# Increase the open file descriptors setting for mongodb and root users
mongodb soft nofile 5165
mongodb hard nofile 5165
root soft nofile 5165
root hard nofile 5165
mongodb soft nproc unlimited
mongodb hard nproc unlimited
EOF

Then after you've logged out and back in you can check the limits for root this way: ulimit -n

And for mongodb, since ulimit is a shell builtin and has to run inside a shell:  sudo -u mongodb sh -c 'ulimit -n'
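Checking ulimit in a fresh shell tells you what a new session gets, not what a daemon that is already running has. On Linux the limits of a live process are in /proc/<pid>/limits. A minimal sketch, with a made-up function name:

```shell
# Sketch: print the soft "Max open files" limit of a running process.
# Line format: Max open files  <soft>  <hard>  files
open_file_limit() {
  awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}
```

For the database that would be something like open_file_limit "$(pidof mongod)", run after restarting the service so it picks up the new limits.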

rsync in the cloud causes CPU overload

Because there is little disk IO induced wait time in the cloud, using rsync to move or keep files up to date server to server can put tremendous load on the servers' CPUs. I've been looking around for a solution to this problem and the best I've seen is described here. I looked at another option which is a program named cpulimit but that seems like a much heavier and problematic way to solve the problem. Let me know if you've tried either method.