MySQL Error Lexicon #1

MySQL is justly famous for providing incomprehensible error messages.  The typical response to receiving one of these messages is to google for it, because someone else has almost certainly run into the same problem before, and the answer will be posted on a forum.

Perhaps because NDB is somewhat more exotic than MySQL itself, I have been privileged to encounter some messages that the Internets could not completely decode for me.  As a public service, I intend to annotate the unusual error messages I receive.

The first entry in this MySQL error lexicon is deceptively complicated.  It is actually a well known error message with a common meaning.  But like all the best words, it has another more subtle meaning.  I encounted it when one of our NDB API nodes died, and after rebooting, was unable to rejoin the cluster and provided only this message, which is contradictory on its face:

[MgmSrvr] Allocate nodeid (0) failed. Connection from ip 192.168.2.24. Returned error string "Connection done from wrong host ip 192.168.2.24."
I have found that message can mean any of the following:
  1. The obvious meaning: an NDB node has been configured to connect using the wrong IP address.  Perhaps the node id is specified in the connect string and it doesn't match the IP address of the node.
  2. A node is trying to connect but it hasn't been configured at all.  This is also very common, but does require a slightly more liberal interpretation of the error message.
  3. NDB allocates resources for each node that is configured, and as long as it believes that a node is connected, it will not let another process connect as the same node (be that by specifying the node id or auto-detecting the node id using the IP address). It may be that if the node has died in a particular way, ndb_mgmd still thinks that it is connected.  Issuing "PURGE STALE SESSIONS" from ndb_mgm may cause ndb_mgmd to recognize that the node is no longer connected.  Reports indicate that improvements to NDB may have rendered this problem obsolete.
  4. Just as ndb_mgmd allocates resources for each node, so do the data nodes.  And even if ndb_mgmd is aware that the problem node is not connected, the data nodes may mistakenly believe that it is, especially if it has died in an inopportune manner.  If the configuration is correct, and issuing "PURGE STALE SESSIONS" and restarting ndb_mgmd do not solve the problem, there may be a stale connection on a data node.  The command "lsof" will show whether ndbd has an open connection to the host that can not join the cluster.  If it does, a rolling restart of the data nodes may release the resources that are preventing the node from joining.

 

0 comments
Tags: db code

Read a Book by Remote Control

Array of LED modulesThe center-mounted ceiling fixture found in most homes is a lighting fixture whose best use is perhaps to provide enough light to install other lights. You are virtually guaranteed to cast a shadow on whatever task you undertake, or have the light source in your field of vision, or in some cases, both.

That's the case in my small apartment where the single futon that I use for reading and watching television sits against the wall, facing in. During the day, the window behind it lets plenty of sunlight in, but at night, the only light source is that ubiquitous ceiling fixture which, by virtue of being behind the book I'm reading (rather than behind me), doesn't do much to differentiate the type from the paper.

Of course the solution is to put a lamp beside the futon. But this isn't that kind of apartment. A lamp would take up precious floor space, and I'm using the floor for other things like standing and walking. I do have wall space though.

In fact, the ideal lamp for my situation would not take up any space, so that one isn't even aware of it if when off. It should be energy efficient, but not an ugly, toxic fluorescent lamp. A string of LEDs perched on the window frame would be just about right.

There are two lighting goals for this project. The first is to provide back fill light for reading a book. Most of the light in the room would still come from the ceiling fixture, but the LEDS will bounce extra light on the ceiling behind the reader and illuminate the page. The second goal is to have house lights for watching TV and movies. Lights that can be controlled by remote, so that after watching a movie in the dark, I don't have to stumble across the room to turn on the lights. The LEDs alone achieve the first goal. The second needs some computer automation.

A single LED module

The easiest way to control the LEDs and provide a nice dimming function to ease transitions in and out of black is to do pulse width modulation (PWM) which is electronic-speak for turning something on and off real fast. You could do this with just a parallel port on a computer, but it makes more sense to have a programmable controller like the one on an Arduino handle a task like that.

To construct the system, I bought 20 white LED modules from SparkFun, and strung them together in two channels of 10 modules each. The mounting brackets they provide (I broke off the screw attachment and just used the double sided tape) have enough room to thread the jumper cable for the second channel through the brackets for the first. With this method, alternating modules are on channels in an odd/even configuration. This is helpful in the low end of the dimming range where the ability to have the different channels at separate levels gives the appearance of smoother dimming.Graph of output functions

The Arduino is very simple to use. It has a Java based IDE so that you can instantly upload your programs to the controller and test them out. I wrote a program that receives input over the USB connection and sets PWM values appropriate for the desired light level. Setting the values linearly didn't produce a fade that felt linear, so I applied a simple exponential function to adjust the output to appear more linear. I used different functions for each channel so that in the lower part of the fade, the channels have values with different luminosity to smooth the lower end, but as they increase in value, they approach 1 and the output values are not much different in luminosity.

  • int(255*pow(x/255.0, 2.0))
  • int(255*pow(x/255.0, 1.6))

The next component is the software that controls the system. It listens for IR events from an IguanaWorks USB IR tranceiver via LIRC and fades the lights appropriately. It responds to direct on/off commands, but it also watches for commands sent to other systems to infer what is going on in the room. For instance, if it sees the TV turn on, it fades the lights up, then slowly to half. After seeing further IR activity, it fades the lights out on the assumption that everyone has found their seat. When the TV turns off, the lights come back up.

I could hook up a physical button or switch to the Arduino to control the lights as well, but since my remote is programmable, why add more clutter? I have a Harmony 880 universal remote, which is activity based. To use it, one begins by pressing a button for an activity such as "Watch a DVD" or "Watch TV". I added a new activity, which may be unusual for a remote control: "Read a book".

 Remote control

0 comments

Read Buffer performance hit

I had some fun yesterday with some odd performance problems. So I did a run with oprofile and got this:


561612   25.0417  /lib64/tls/libc-2.3.4.so memset

429457   19.1491  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux clear_page

214268    9.5540  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_page_fault

144293    6.4339  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_no_page

94410     4.2097  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux buffered_rmqueue

64998     2.8982  /lib64/tls/libc-2.3.4.so memcpy

59565     2.6559  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __down_read_trylock

59369     2.6472  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux handle_mm_fault

47312     2.1096  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux free_hot_cold_page

39161     1.7462  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux release_pages

39140     1.7452  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux unmap_vmas

27200     1.2128  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __alloc_pages

21520     0.9596  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux
__pagevec_lru_add_active

20772     0.9262  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux page_add_anon_rmap

19852     0.8852  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux page_remove_rmap

16424     0.7323  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux bad_range

11993     0.5348  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __might_sleep

Which seemed to me like an awful lot of page faults, especially on a system that wasn't
swapping at all.

Alexander Rubin and I got to talking about it, and I learned a few new things. The first thing is that mmap() appears as page_faults, because the
machinery for doing mmap() and for doing swap is actually the same. Fun.

Back to the problem at hand... the following was in the my.cnf:

read_buffer=5M

As I understand it, if read_buffer set to over 256K, it uses mmap() instead of malloc() for memory
allocation. Actually - this is a libc malloc thing and is tunable, but defaults to 256k. From the manual:
Each thread that does a sequential scan allocates a buffer of this size (in bytes) for each table it scans.
In this case, the server is pretty much handling click logging, so all of the queries are pure inserts. Why then would the read buffer get used at all?

Well, it just so happens that this is a PHP app and is not using any sort of persistent connections, so the app is connecting and disconnecting 60 - 100 times a second. Each of those connections is going to have to authenticate against (at least) the mysql.user table. On the other hand, that _should_ be cached, so it shouldn't be a problem. However, lowering the read buffer is quite easy ... and it'll be interesting.

So, theory to the fire - I reduced read_buffer to 128k (the default) and then got this output from oprofile:


2220     15.7760  /lib64/tls/libc-2.3.4.so memset

785       5.5785  /lib64/tls/libc-2.3.4.so memcpy

608       4.3206  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux thread_return

398       2.8283  /usr/libexec/mysqld      yyparse(void*)

337       2.3948  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux system_call

234       1.6629  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_gettimeoffset_hpet

197       1.3999  /lib64/tls/libc-2.3.4.so _int_malloc

183       1.3005  /lib64/tls/libpthread-2.3.4.so pthread_mutex_lock

181       1.2862  /lib64/tls/libpthread-2.3.4.so pthread_mutex_unlock

181       1.2862  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux clear_page

176       1.2507  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux wake_futex

143       1.0162  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux sys_tgkill

138       0.9807  /usr/libexec/mysqld      yylex(void*, void*)

130       0.9238  /usr/libexec/mysqld      update_sys_var_str(sys_var_str*, _pthread_rwlock_t*, set_var*)

126       0.8954  /jbd                     (no symbols)

123       0.8741  /usr/libexec/mysqld      my_strntoul_ucs2

106       0.7533  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __might_sleep

106       0.7533  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_futex

104       0.7391  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux tcp_recvmsg


Which is more what we would expect.

The results on the system as a whole were that with read buffer set to 5M, extra 3-5% usr and 6-12% sys cpu time taken on an 8 cpu box (as reported by top) or, to look at it another way, about 100% of a single CPU taken by mysqld.

With read_buffer reduced to 128k, the CPU usage dropped to just about nothing.
0 comments
Tags: code db

mysql-proxy on ubuntu (and debian)

il corra walks you through building mysql-proxy for ubuntu.

il corra » mysql-proxy on ubuntu 7.04 feisty
First of all, there is not a packetized mysql-proxy for Ubuntu, so the only way to install it is to build it from the source


Which is great. But I'd like to take this opportunity to tell people that I've actually been working on packages for debian/ubuntu. They're almost ready to be released into the wild (I'm waiting on an almost non-related event) If you'd like to play with the packaging stuff before then, check out http://launchpad.net/mysql-proxy

I'll be sure to let everyone know when the packages themselves are in an APT repository.


0 comments
Tags: db code dpkg

Why I hate YUM

I really hate YUM, the crappy-ass apt wanna-be that all the non-Debian distros wet themselves over. Yes, it's better than what they had before, which was NOTHING. But it is absolutely inexcusable that, when apt was already available, powerful and quick, that they wrote YUM. Not because I think everyone has to use my favorite toy, but because it's a worse tool. It's it's slower, it's stupid and it gives terrible feedback.


Let's start with today's beef. I tried to do this:
yum install libselinux-devel
on a just-installed Fedora 7. What I got was this:
Transaction Check Error:
file /usr/share/man/man8/matchpathcon.8.gz from install of libselinux-2.0.14-4.fc7
conflicts with file from package libselinux-2.0.13-1.fc7
file /usr/share/man/man8/selinux.8.gz from install of libselinux-2.0.14-4.fc7
conflicts with file from package libselinux-2.0.13-1.fc7

Tell anybody what the problem is? First of all, has anyone ever tried installing this package?

It turns out, what it's trying to do is install the x86_64 and the i386 version of this package, and each one of those packages want to install those manpages. It seems fairly common to me that people might want to install this package on a 64-bit server. Why does it install the 32-bit lib? Why do the packages conflict? And why does the error message suck so bad?

Speaking of terrible error messages. Try this:
yum install kernel-dev
Wait, you might exclaim, being a RedHat person, this isn't a Debian system, we name our packages -devel. Quite true, my bad. Except what does YUM tell me?


Loading "installonlyn" plugin
Setting up Install Process
Parsing package install arguments
Nothing to do
Hm. Does that mean the package is already installed? Is it up to date? Or do we not even have a package named kernel-dev? Here's another way to do the same thing:



mtaylor@qualinost:~$ sudo apt-get install kernel-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Couldn't find package kernel-dev
Look at that! E for error, and then it shock tells me what the problem was.

The work has already been done! If you don't want to use APT, that's fine, you could at least look and see what it gets right and then improve on it! I mean, that's the way this whole Free Software thing is supposed to work, isn't it? Granted, maybe having to specify YUM repositories in chunks of XML is an improvement, and maybe I'm not seeing all the wonderful ways in which this is better than the crappy old APT from Debian.

But I don't think so.
0 comments