Meta: Post comments and questions about the messageboards and other parts of EN World. If you have a problem, this is where to go. Moderator email addresses are posted here.
Okay, so forgive me if I don't know what I'm talking about (probably), but I'm in a course this semester (6.033: Computer Systems Engineering) wherein we learned a little bit about databases. Granted, this was not a course completely on databases, and we didn't actually do any database hacking, but we learned sound principles of database design. Anyways, in the course, we had a long section on fault tolerance wherein we were taught that any database should have a log, stored on a different disk, to be used for recovery in the case that the database becomes corrupt. Just wondering, why don't we have a log for ENWorld? Is this not common practice? It was implied in our class that even systems dating back to ARIES in 1992 have been using logs. Just wondering.
__________________ "That's so far over the line between genius and madness, it passed all the way through Cthulhuville and normalcy and all the way back to brilliance!"--MoogleEmpMog
The log files can theoretically be used to re-apply transactions since the last backup. Usually you'll do a full backup, and when it is complete it'll wipe the transaction logs and start them anew. When you restore, it looks at the logs and replays those transactions and the database ends up with nothing lost. (I had to do this with an Exchange server last year and it's really cool when it works. I also had to do it with another database system and it sucks hard when it doesn't.)
Unfortunately, with remote systems, a lot of times they won't get fully backed up very often (not enough drive space, no easy way to rotate tapes, etc), so the log files get changed to 'circular' and will overwrite themselves after a certain period. This keeps you from running the system into the ground by filling up the hard drive, but it really hurts in disaster situations.
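To make the 'circular' problem concrete, here is a minimal Python sketch. The `CircularLog` class, its capacity, and the sequence numbers are all made up for illustration; this is not how any particular database implements it. The point is just that once old entries are overwritten, a restore from an older backup has nothing to replay from.

```python
from collections import deque


# Hypothetical sketch: a "circular" transaction log keeps only the newest
# `capacity` entries, so older transactions are silently dropped.
class CircularLog:
    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)
        self.next_seq = 1  # sequence number given to the next transaction

    def append(self, change):
        self.entries.append((self.next_seq, change))
        self.next_seq += 1

    def can_replay_from(self, last_backup_seq):
        """True only if every transaction after the backup is still logged."""
        if self.next_seq - 1 <= last_backup_seq:
            return True  # nothing has happened since the backup
        oldest_seq = self.entries[0][0]
        return oldest_seq <= last_backup_seq + 1


log = CircularLog(capacity=3)
for n in range(1, 6):  # transactions 1..5; only 3, 4, 5 survive
    log.append(f"post {n}")

print(log.can_replay_from(2))  # backup covered 1..2, log still has 3..5
print(log.can_replay_from(0))  # backup is older than the oldest log entry
```

With a backup that already covers transactions 1 and 2, the surviving entries are enough. With no recent backup, transactions 1 and 2 are simply gone.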
I'm not in the loop here, though, so I don't know how it was actually set up.
Actually, "modern" web practice is to use MySQL on web applications and it can be run without transaction support. I don't know if this is true for vBulletin.
__________________ Joe Mucchiello, Head Honcho at Throwing Dice Games
Priority One: Fatherhood.
Priority Two: Sanity.
Down on the list: seemingly real close to releasing a notebook essential. It's in layout! Has been for months now. (Just nod politely so I won't cry about this.)
"I've never heard of the term Flavor lawyer..." -- Scribble
As with most things, it's not that simple. A database application has to be written to 'package' each transaction the proper way, so that if it gets replayed, all the tables it touches are restored to a consistent state. For example, if I make a post here, it has to add the post record in one table, update my post count in the user table, add a record to the subscribed threads table for me, etc. From a system standpoint, it's better to lose all the data in a transaction than to have random bits saved without knowing what or where, so the consistency functionality is crucial.
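As a rough illustration of that 'packaging', here is a small Python sketch using the standard `sqlite3` module. The table names, columns, and the `make_post` helper are invented for the example and have nothing to do with vBulletin's real schema. All three writes happen inside one transaction, so a failure partway through rolls back the earlier writes too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, post_count INTEGER NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                        body TEXT NOT NULL);
    CREATE TABLE subscriptions (user_id INTEGER NOT NULL,
                                thread_id INTEGER NOT NULL);
    INSERT INTO users (id, post_count) VALUES (1, 0);
""")


def make_post(conn, user_id, thread_id, body):
    # "with conn" commits if the block finishes and rolls back on an exception,
    # so the three writes below succeed or fail together.
    with conn:
        conn.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)",
                     (user_id, body))
        conn.execute("UPDATE users SET post_count = post_count + 1 WHERE id = ?",
                     (user_id,))
        conn.execute("INSERT INTO subscriptions (user_id, thread_id) VALUES (?, ?)",
                     (user_id, thread_id))


make_post(conn, 1, 42, "Why don't we have a log?")

try:
    # thread_id=None violates NOT NULL on the third statement, after the post
    # row and the post count have already been written inside the transaction.
    make_post(conn, 1, None, "This post should vanish completely.")
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])
print(conn.execute("SELECT post_count FROM users WHERE id = 1").fetchone()[0])
```

The second call leaves no trace: no orphaned post row and no inflated post count.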
It can be very tricky to do properly, especially for a system that's designed to be end-user customizable and (essentially) non-mission-critical. There can also be a non-negligible performance hit if a lot of tables are involved.
MySQL not only would have to support transactions (and it does), but vBulletin would have to take advantage of that, the system would need sufficient space to handle it, etc.
As I posted in another thread, I work with clients that have full-time DBAs making good bucks to manage systems less intensive than ENworld, and they have big budgets and support personnel to do on-site backups, etc. The loss sucks, but given the nature of the beast, not 100% preventable. For a system that isn't tracking large amounts of critical data (or upon which a company's fortunes depend), you make compromises where you can. In any event, even if it had been fully logged, I can't imagine there would have been sufficient space on the system to replay every transaction since last December. The transactions are only supposed to protect you during temporary glitches, or at most between frequent and regular backups.
Well, it would be even better to combine logging with more frequent backups, yes. In fact, more frequent backups are just a good idea in general, or at least checkpoint snapshots so that the log only needs to continue from the checkpoint.
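A toy Python sketch of the checkpoint idea, assuming the "database" is just a dictionary and every log entry carries a sequence number. The `recover` function and the sample data are hypothetical. Recovery starts from the snapshot and replays only the entries logged after it, so anything older than the checkpoint no longer has to be kept.

```python
# Hypothetical sketch: the "database" is a dict of key -> value, and each log
# entry is a (sequence_number, key, value) tuple.
def recover(snapshot, snapshot_seq, log):
    """Rebuild state from a checkpoint snapshot plus the log entries after it."""
    state = dict(snapshot)  # start from a copy of the checkpoint
    for seq, key, value in log:
        if seq > snapshot_seq:  # older entries are already in the snapshot
            state[key] = value
    return state


log = [
    (1, "thread:1", "first post"),
    (2, "thread:2", "hello"),
    (3, "thread:1", "first post (edited)"),
    (4, "thread:3", "new thread"),
]

# Checkpoint taken after transaction 2.
snapshot = {"thread:1": "first post", "thread:2": "hello"}

recovered = recover(snapshot, snapshot_seq=2, log=log)
print(recovered)
```

Only entries 3 and 4 get replayed here. The more often you checkpoint, the shorter the log you have to keep around.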
My impression from working with DBAs is that logs are a "black box" process.
For example (back in the old days of "film") when you took a photograph, you didn't know if everything worked right till you developed it.
When you make a backup log, you can't tell if it is a "good copy" until you try to use it.
My guess is that logs (on ENWorld) are normally taken every 2 or 3 months, but that the latest log was corrupted, and the Dec. 28th log was the most recent uncorrupted copy.
__________________ The meek shall inherit the earth. Links to my NPC's
Last edited by MavrickWeirdo; 10th May 2006 at 07:24 PM.
Ah, the logs I'm talking about are a bit different than what you're talking about. They are write-ahead logs of all the transactions, rather than the actual snapshot of the database itself.
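For anyone curious what "write-ahead" means in practice, here is a stripped-down Python sketch. The `TinyStore` class and its JSON-lines log file are invented for illustration; real systems such as the ARIES-style ones mentioned earlier are far more involved. Each change is appended to the log and forced to disk before the in-memory data is touched, so replaying the log after a crash rebuilds the state.

```python
import json
import os
import tempfile


# Hypothetical sketch of a write-ahead log: log first, apply second.
class TinyStore:
    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # rebuild state from whatever the log already holds

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-write
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value})
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # make sure the record is on disk first
        self.data[key] = value    # only now apply the change


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyStore(log_path)
store.put("post:1", "Why don't we have a log?")
store.put("post:2", "It's not that simple.")

# Simulate a crash and restart: a fresh object rebuilds itself from the log.
restarted = TinyStore(log_path)
print(restarted.data == store.data)
```

Note that the log here lives next to the data. The point made in the first post about keeping it on a different disk still applies.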
MySQL can support transaction logging if you use InnoDB tables. By default though, MySQL uses MyISAM tables which do not have transaction support.
There are various design decisions that are typically made when building up a website or application on a DB. Sometimes performance is more critical than recoverability, other times not. Sometimes you just inherit a site and work with what you have. Some sites can get away with the lack of transaction support, either because the site is primarily read-only or because the backend architecture makes up for it.
Frequent automated backups are important whether you use transaction based DBs or not. I am sure the issue is being looked at now.
Ah, the logs I'm talking about are a bit different than what you're talking about. They are write-ahead logs of all the transactions, rather than the actual snapshot of the database itself.
Is that like Audit Tables?
__________________ The meek shall inherit the earth. Links to my NPC's
Well, it would be even better to combine logging with more frequent backups, yes. In fact, more frequent backups are just a good idea in general, or at least checkpoint snapshots so that the log only needs to continue from the checkpoint.
Well sure it would. It would also be great to back up transactional logs throughout the day. Say, every 15 minutes to tape. That way, if you lose a drive, you can still restore from tape. It would be even better to ship those logs to your offsite continuity server. That way, if you lose a system, you can bring your standby system online. And if we wanted to be hardcore, we would also have a clustered system with an active/passive host, along with the offsite standby. All of these would have updates on the transactional logs and those logs would be backed up to media constantly throughout the day.
But to keep it in perspective, this is still just a web site for a niche hobby. Mind you, I love our hobby and I would love to see a highly redundant setup for the site. But is it reasonable? What are the costs associated with all that? Both in terms of hardware and software? What about people costs? Even with a volunteer effort, it still takes a lot of time and energy to bulldog all of that.
This stuff isn't easy to set up and manage without a good budget. And for all the fundraisers EN World has done, and all the money that has been generated, it still wouldn't come close to paying for an optimal setup.
EN World does have a pretty decent setup for a shoestring budget. If we want more backups in a timely manner, that is also possible. But the best way to do that is to run a solid offline backup on a regular basis. The keywords being "offline" and "regular". That does mean a regular time where the boards are down. As well, the way to keep those backups useful is to download them to a machine, kind of like Michael Morris' archive that he uploaded to get us back to this point. How long did it take for him to upload those? Something like 16 hours? So say EN World does a weekly offline backup. Morrus and company would just need to decide where that 16-hour window should fall.
The bottleneck will really be the network connection between EN World and the person executing the backup. The payload is the database for EN World. More bandwidth and/or a smaller database would reduce those times.
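The arithmetic behind that is simple enough to sketch in Python. The database size and link speed below are made-up numbers purely for illustration; I don't know the real size of the EN World database or the speed of anyone's connection.

```python
def transfer_hours(size_gb, mbit_per_s):
    """Hours needed to move size_gb gigabytes over a link of mbit_per_s."""
    bits = size_gb * 8 * 1000**3              # decimal gigabytes -> bits
    seconds = bits / (mbit_per_s * 1000**2)   # megabits per second -> bits/s
    return seconds / 3600


# Hypothetical figures: a 4 GB database over a 1 Mbit/s link.
baseline = transfer_hours(4, 1)
print(round(baseline, 1))                    # hours for the full database
print(round(transfer_hours(2, 1), 1))        # half the database
print(round(transfer_hours(4, 2), 1))        # twice the bandwidth
```

Halving the payload or doubling the bandwidth each cut the window in half, which is the whole tradeoff between trimming threads and paying for a fatter pipe.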
The whole picture is not easy to manage and I do not envy Morrus for the decisions he has to make. EN World is a global community and part of the value is the vast archive of information posted here. Downtime and data trimming (mass thread deletion) will have a negative impact on the community. But will it have more impact than another database corruption/massive downtime/massive loss of data?
Good question! We haven't been invited to help with that discussion. While I am happy to participate, I will defer to the Admins as to whether they would like me to be involved.
I don't really think that this would be very useful, unless it was known exactly which transaction(s) caused the corruption (if it wasn't something else). But that is probably not so easy to determine on such a huge database.
Bye
Thanee
__________________
In our world, immortality is not for the living. The legend lives on!
In Memoriam Dave Arneson (April 7th, 2009) & Gary Gygax (March 4th, 2008).
Wondering what the Dungeon Tiles are like? Take a look here (up to DU5 Sinister Woods).
Yay! There are logs after all! This makes me feel better about all the money I have to pay to take these courses, since before I started, I knew the better part of nothing about databases, and now I know enough to say something that might work.
From what I've heard, the problem was something along these lines: The logs were not being flushed, and they kept growing in size until the disk holding the logs was full. At that point, corruption occurred.
Coupled with inconsistent backup schedules, you get the BOOM! that we all agonized over.