- Create your Drupal Website. Be sure to pour in in all your spare time, neglect your family and personal hygiene, and add a dash of your heart and soul.
- Learn SEO, marketing, backlink building strategies, advertising, UNIX shell commands, PHP, MySQL and general “webmastering”. Build the site to over 15,000 members, and bring in another 10,000 – 15,000 anonymous visitors each day. Make sure you’re generating $250 – $400/day.
- Neglect your database for ONLY a few weeks then sit back and watch it implode right before your eyes – while you sit helplessly… head in hands.. weeping…just..weeping
Complete, utter, total meltdown.. That’s what I was faced with yesterday afternoon. CPU racing to over 100% (i guess that’s possible with a multiprocessor, but trust me it’s not something you’d ever want to see), memory usage at 98%, MySQL crash, restart, rinse, repeat.
I think it all started about a month ago.. For the last two years, once or twice a week I would put the site in maintenance mode, disable the cache, then go into PhPMyAdmin and truncate the cache table and sessions table.. I would then optimize all tables, then to finish off i’d do a full SQL export and save it off to my local HDD for safe-keeping. In addition, about once every 6 months or so while performing this ritual I would also truncate the history table.
Last month I got lazy and started downloading my automatic (cPanel) backups to my Mac for safe-keeping. This way I didn’t have to take-down the site or bother with any of that other (apparently) important maintenance stuff. But after a few weeks I started seeing these following random errors in the Drupal logs:
Duplicate entry '8xyz7e' for key 1 query: INSERT INTO sessions (sid, uid, cache, hostname, session, timestamp) VALUES ('87xyz7e', 0, 0, 'xx.xx.xxx.xx', '', 1218510714) in /home/public_html/includes/database.mysql.inc on line 117.
Duplicate entry '28771-1' for key 1 query: INSERT INTO history (uid, nid, timestamp) VALUES (28771, 1, 1218511966) in /home/public_html/includes/database.mysql.inc on line 117.
I had rarely ever seen these errors before, and since they didn’t seem to hurt anything, I didn’t give them a second-though. After a few more days I started to notice the site lagging for 20-30 seconds at a time, with a corresponding
Duplicate entry '8xyz7e' for key 1 query: INSERT INTO sessions error in the logs. These lags started happening a little more often, and the moderators had even mentioned it.. I did a little research on the INSERT INTO SESSIONS error and there didn’t seem to be any definite fix, so I figured I might just have to live with it.
THEN… Monday, right after getting home from my day-job early and planning to take a nap because I started before 5am, I found the site totally down. Page loads were taking in excess of 3-minutes, then MySql errors, MySql would crash & automagically restart, giving me about a minute to view the logs, then it would start all over again.. The logs were filled with INSERT INTO SESSIONS and DUPLICATE ENTRY INSERT INTO HISTORY errors and CPU usage and memory usage were maxing out, making it hard even to get into cPanel.
Googling the errors didnt come up with much. Most references that I found indicated a few possibilities: Corrupted tables, Sequences table out of sequence with the sessions table or history table, or missing “user 0” in the node_access table.
Optimizing and repairing all tables didn’t help, so next I truncated the cache, sessions, history, and accesslog tables then re-optimized.. Still no joy.
Next, I tried to get the sessions and history tables in sync with the sequences table by following this bit of wisdom I found. (i saved the text, but not the source/link.. sorry):
Check the affected table. In this case, the error tells us that it is the node table (query: INSERT INTO node). This also happens with other tables, modify these instructions accordingly.
This is the node table, so the problem is the node id (nid). Each node has a unique id. Look at your table, and find the highest node id (nid). If you have many nodes, it may help to sort your table by nid to find the highest one.
Go to your sequences table. Change the node id in the sequences table to a number higher than the id you found in step 2.
For my problem, I replaced the ‘node table’ with sessions and history tables. This cured the problem with the DUPLICATE ENTRY INSERT INTO HISTORY errors – leaving the site semi-usable, but it did nothing for the DUPLICATE ENTRY INSERT INTO SESSIONS errors that were still filling the logs. Now instead of 98% memory usage and over 100% CPU, we were humming along at 60%-80% CPU and roughly the same memory usage.
Next I came across another fix that seemed to help many Drupal sites with similar problems, which involved running these three SQL statements:
INSERT INTO users (uid, name, mail) VALUES ('0', '', '');
INSERT INTO users_roles (uid, rid) VALUES (0, 1);
INSERT INTO node_access VALUES (0, 0, 'all', 1, 0, 0);
These can be safely run, and basically all they do is insert a “user 0” into the users, user_roles and node_access tables. If the entries already exist, you’ll get a duplicate-entry error. (if you try this, please backup your tables or entire database first!). The first two INSERTS gave me the duplicate-entry error (no changes made).. But the third, inserting user-0 into the node_access table executed with no errors – a quick look at the CPU usage and BANG, down to below 1% ! Problem solved!
As I was beginning to relax and pat myself on the back for a job well done, I received a private message from one of the members on the site. The member wanted to know why he was suddenly able to see all the posts in the (previously unaccessable) “Moderators forum” area. I quickly logged-out of the site and viewed a few pages as an anonymous visitor and was able to get into all private/premium/hidden/restricted forums! A little more research revealed that adding the “user 0” entry into the node_access table effectively gives all visitors access to all nodes – fine for most websites, but very VERY bad if you are using Taxonomy Access Control ! WARNING !! If you use the Taxonomy Access Control module DO NOT ADD USER-0 TO THE NODE_ACCESS TABLE! If you do make this mistake like I did, just disable the Taxonomy Access Control module, then re-enable it. This will remove user 0 from node_access and all of your permissions will be right back where they were.
So I got the permissions back, but then CPU and memory usage shot right back up, and the site was unstable again. At this point I decided it was time to whip-out the checkbook and contact SuperDrupalMan, Khalid Baheyeldin at 2Bits.com. Since by this time it was after 11PM I didnt expect an answer right away, but I wanted to get that ball rolling so he could get on the case first thing in the morning.
After about another hour of truncating the sessions table and cache tables, optimizing, and syncing with the sequences table, almost as if by magic (or dumb-luck), CPU usage dropped to normal AND my permissions were still working. I’m not positive exactly what the key was, but I think it was getting the sessions and history table cleared at the exact same time.
The site is now functioning normally again, and Khalid/2Bits.com are still on the case to try and figure out the root-cause and hopefully tell me how to avoid this issue in the future. He’s probably going to tell me to stop being lazy, and keep doing my database maintenance every week. An expensive and frustrating lesson…
Forgive the sloppier-than-usual writing.. I havn’t slept much in the last day..