Designing a Random Table Generator

So I'm having trouble describing even the basic concept to a non-gamer programmer; the one I'm talking to just doesn't get it. We all know what random generators are in RPG terms, but translating that into English for someone who's never seen a random encounter table is proving hard!

Can anyone help me out with this, and figure out how to describe this system in plain English?

At this point, we've got the basic idea hashed out between two guys who understand the end purpose, and most of my content leans toward developer-speak rather than being targeted at somebody totally new to the concept.

I think we'll need a briefing doc on the basic problem, with example tables from a gaming book, just to frame the business need for any dev candidate who can't yet fathom the topic. Ideally, a developer who understands gaming would have a better eye for completing this, but work with what you have.

Then we can get into the general approach that all our prior text is about. Then the nitty-gritty details about how this is to be done.

What's your timetable for this? It'll take me some time to hammer out documents. I can start on the first doc, briefing the general problem.

I am not familiar with vBulletin's coding architecture, so get someone who knows it and can envision how to hook the parts in. That way, we can focus on explaining the parts, and they'll have their own good idea of how to wire everything together.
 


What you guys are trying to create is essentially an ad hoc reporting tool.

You've got tables/views and a means of creating them, you've got a means of creating 'has a' and 'is a' relationships between the tables, and you seem to be moving toward some sort of simple scripting language for formatting the data (into, for example, paragraphs and sentences).

First, don't enter table row entries in terms of the numbers that must be rolled to get that result. Rather, enter table row entries in terms of their weighted likelihood. The numbers needed to select a result are computed from the weights. How they are stored is a performance issue that hopefully your developer is good enough to think through, but at the interface level, it's much easier to manipulate weights than number ranges.
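To make the weights-not-ranges idea concrete, here's a minimal sketch in PHP (assuming a simple $rows array; the field names are made up for illustration):

[CODE]
<?php
// Hypothetical row list: each entry is ['name' => ..., 'weight' => ...].
// Ranges are never stored; they're implied by the running total.
function pickWeightedRow(array $rows) {
    $total = 0;
    foreach ($rows as $row) {
        $total += $row['weight'];
    }
    $roll = mt_rand(1, $total); // effectively a d$total roll
    foreach ($rows as $row) {
        $roll -= $row['weight'];
        if ($roll <= 0) {
            return $row;
        }
    }
}

$weapons = [
    ['name' => 'broadsword', 'weight' => 3],
    ['name' => 'spear',      'weight' => 6],
    ['name' => 'whip',       'weight' => 1],
];
echo pickWeightedRow($weapons)['name']; // "spear" 60% of the time
[/CODE]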

Second, organize your interface in such a way that it loosely enforces naming conventions. That keeps the database of table assets searchable. In other words, don't have an interface that encourages a name like BOBSEVILMONSTERS to be entered into a single textbox. Rather, encourage naming conventions by having the table-creation interface offer multiple textboxes for things like 'creator', 'category', 'subcategory1', 'subcategory2', etc. That way, the community will tend to adopt a common naming convention (or at least a somewhat more organized one).

Also, I haven't seen it discussed yet, but it would be a good idea to be able to force a selection from a table. Say you've decided that all hobgoblins carry broadswords rather than random weapons; you'll want to be able to do something like <my_weapons>[1] or <my_weapons>{'broadsword'}.
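A sketch of how those two forced-selection forms might resolve, assuming rows are numbered from 1 and carry a 'name' field (both forms bypass the dice entirely):

[CODE]
<?php
// Forced selection: pick a specific row instead of rolling.
function forceByIndex(array $rows, int $i) {
    return $rows[$i - 1]; // <my_weapons>[1] -> first row
}

function forceByName(array $rows, string $name) {
    foreach ($rows as $row) {
        if ($row['name'] === $name) { // <my_weapons>{'broadsword'}
            return $row;
        }
    }
    return null; // caller decides how to report a missing entry
}
[/CODE]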

A row in the table is a parent (the identity of the table), the row name, its weight, and then a generator description in your scripting language (which can reference other tables, which in turn have rows with their own generator descriptions). Obviously, you'll need a sanity checker at the time the row is saved to make sure that you have a strict tree and not a graph with cycles (which can produce infinite loops).
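For the sanity checker, a minimal cycle-check sketch; $tableRefs is a hypothetical callable that extracts the table names referenced by a given table's generator scripts:

[CODE]
<?php
// Run at save time: walk every table the new row references and refuse
// the save if we can ever get back to the table being edited.
function createsCycle(string $editedTable, array $newRefs, callable $tableRefs): bool {
    $stack = $newRefs;
    $seen  = [];
    while ($stack) {
        $table = array_pop($stack);
        if ($table === $editedTable) {
            return true; // looped back to where we started: reject
        }
        if (isset($seen[$table])) {
            continue; // already visited via another branch
        }
        $seen[$table] = true;
        foreach ($tableRefs($table) as $ref) {
            $stack[] = $ref;
        }
    }
    return false;
}
[/CODE]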

My biggest worry as a developer at this stage is the potential resources this thing will consume. Ad hoc reporting is extremely resource intensive. I'd be thinking about making this a thick client with only intermittent connection to a hub repository. I wouldn't want to do this with a simple web client/server interface unless the customer (EnWorld in this case) was willing to accept the potential server hit.

That also deals with the issue of John changing his table definition in a breaking way - i.e., John decides to reference a table which, unbeknownst to him, causes someone else's table to be recursive, or John decides to delete a table other people depend on. The thick client relies on local copies, and the user doesn't have to accept John's update unless he wants to. Of course, that is a different architecture than Morrus seems to want, in that it would involve a downloadable client and some sort of API, rather than just extending the bulletin board system.
 

What you guys are trying to create is essentially an ad hoc reporting tool.

I don't think so... It's more of a web-based version of TableMaster from the DOS/Win3.1 days or Tablesmith.

The software that's out there to do this already is extremely lightweight and works, so I'd say follow those patterns. (I'd also recommend using one of their languages, probably Tablesmith's, as I haven't finished my clone of TableMaster. That way, there's a bigger community of table authors.)
 

I don't think so... It's more of a web-based version of TableMaster from the DOS/Win3.1 days or Tablesmith.

The software that's out there to do this already is extremely lightweight and works, so I'd say follow those patterns. (I'd also recommend using one of their languages, probably Tablesmith's, as I haven't finished my clone of TableMaster. That way, there's a bigger community of table authors.)

In many ways, Morrus's idea of users picking from a list of generators was more like a canned-report model, and my idea of them using a console to submit text was more like an ad hoc report.

Technically, this stuff ran great on my 386 back in the day, with very complex table trees. Running off flat files would probably be very fast. I'm a little concerned that if we store each Asset in a DB row, each tag would basically require a trip to the DB to fetch the next Asset and parse it.

The actual parsing should be fast. The fetching of the next Asset could be slow if we don't consider ways to do it faster.

On Celebrim's point about being able to make the table return a specific result, I agree. That's something I alluded to in an early post as part of the scripting language syntax.

On C's point about weighted values:
Good idea, though I might adapt it as another way to define table entries. Normal humans will look at tables in the DMG and enter them as begin/end ranges for each row. Some people will realize they only need to define the begin number, as the next entry caps off the previous entry's range. Other folks like C will just want to set a weighted value. None of that really stresses the engine.
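A sketch of normalizing all three entry styles down to stored weights (the field layout is assumed for illustration):

[CODE]
<?php
// Style 1: begin/end ranges, e.g. 01-03 -> weight 3.
function weightFromRange(int $begin, int $end): int {
    return $end - $begin + 1;
}

// Style 2: begin-only; each row's range is capped by the next row's
// begin, and the last row runs to the die size ($max, e.g. 100).
function weightsFromBegins(array $begins, int $max): array {
    $weights = [];
    $count = count($begins);
    for ($i = 0; $i < $count; $i++) {
        $end = ($i + 1 < $count) ? $begins[$i + 1] - 1 : $max;
        $weights[] = $end - $begins[$i] + 1;
    }
    return $weights;
}

// Style 3: direct weights need no conversion at all.
[/CODE]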

To recap on performance:
the biggest performance barrier I see is getting from the current text to processing the next tag in that text. This nesting or recursing could be slowed down by whatever steps we have to take to open up the asset.
It's also possible that our actual string-parsing technology could be slow. I largely discount that, given that this technology ran just fine on 386s, including nested tables.

[MENTION=27229]Morrus[/MENTION]:
Is it possible* with vBulletin for all of the assets to be stored as flat files on the webserver (or wherever attachments are stored, in a separate directory) rather than directly in the vBulletin database?

This would simplify the design and provide a model that works online or offline. We'd still tie the screens into vBulletin, but we'd store the actual meat in flat files (perhaps one file per asset).

*possible as in "does this fit in with what you're allowed to do with Enworld's host and design preferences". I know we can make PHP do whatever we want.
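For illustration, a one-file-per-asset sketch in PHP; the directory and the .tbl extension are placeholders, not EN World's actual layout:

[CODE]
<?php
// One file per asset in a directory outside the vBulletin database.
define('ASSET_DIR', '/path/to/attachments/generators');

function assetPath(string $fullName): string {
    // Sanitize the asset name so it's safe as a filename.
    $safe = preg_replace('/[^A-Za-z0-9._-]/', '_', $fullName);
    return ASSET_DIR . '/' . $safe . '.tbl';
}

function loadAsset(string $fullName): ?string {
    $path = assetPath($fullName);
    return is_readable($path) ? file_get_contents($path) : null;
}
[/CODE]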
 

From some hasty research, I found some links and info:

One guy made his own web generator.

There's TableSmith, which is shareware; it's Windows/.NET based:
TableSmith - Realm of Mythosa

There's Rollem, an open-source project in Python:
Rollem Table Roller

I'd examine those and see what ideas can be gleaned. The code could be converted to PHP, thus getting us a parser with relative ease (assuming you like their syntax).

Technically, I could reverse engineer TableSmith to get the code for its parser and convert to PHP, but I'm wary of it being a for-money application. Best to steer clear of trouble.
 

Technically, this stuff ran great on my 386 back in the day, with very complex table trees. Running off flat files would probably be very fast. I'm a little concerned that if we store each Asset in a DB row, each tag would basically require a trip to the DB to fetch the next Asset and parse it.

I have no doubt in my mind that with 500 tables and one user, a 386 could handle this job. In fact, looking at a few files, Tablesmith rarely did anything that complicated: 5-20 tables with 2-30 rows each seems more typical of its data. EnWorld has 126,000 users. What happens when you have, say, 100,000 assets and 100 simultaneous requests?

Tablesmith seems to have a full scripting language in the background, complete with variables, assignment statements, and conditionals. It seems you are following along this path, which certainly ensures the application is powerful. But I'm reminded of when my first boss wanted a web page that accepted an R script as input and then processed it. Do you really want that level of scripting available to users for execution on your machine? (At least this scripting language seems 'safe'; R could actually read and write files, etc.)

But what happens if someone, for example, creates a table which returns 1d100 foos, where each foo consists of 1d100 bars, each bar consists of 1d100 quxs, and so forth? In just a few lines of individually reasonable-looking code I can generate gigabytes of spam. Is that a problem you want to deal with by banning users that don't play nice, or do you want to find ways to just not allow it? I guess you could track the length of the output at each step and go: "If the output is over 10k, stop processing and send a message: 'sorry, output too long, buy a gold subscription'".
 

I have no doubt in my mind that with 500 tables and one user, a 386 could handle this job.
Basically. Not a lot of processing power should be needed for this, but a good amount of memory would be helpful. Table data could be cached so that constant database lookups (or whatever back-end storage you use) aren't required. Even then, modern DBs are really good about caching requests, so it's probably not worth worrying about too much.
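A minimal in-request cache sketch along those lines (loadTableFromDb() is a hypothetical storage call):

[CODE]
<?php
// Cache each table for the life of the request so nested tags don't
// re-fetch it from storage.
function getTable(string $name): array {
    static $cache = [];
    if (!isset($cache[$name])) {
        $cache[$name] = loadTableFromDb($name);
    }
    return $cache[$name];
}
[/CODE]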

Software-wise, you can take a complex string (like a stat block), pass it into preg_replace_callback() (if using PHP for the project) to replace the tokens, and you're done.
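A minimal sketch of that, using the [Owner:TableName] token format suggested below; rollOnTable() is a hypothetical helper that looks up a table and rolls on it:

[CODE]
<?php
// Expand [Owner:TableName] tokens, recursing into whatever the roll
// returns so nested tables get expanded too.
function expandTokens(string $text): string {
    return preg_replace_callback(
        '/\[(\w+):(\w+)\]/',
        function (array $m) {
            $result = rollOnTable($m[1], $m[2]); // owner, table
            return expandTokens($result);        // handle nested tokens
        },
        $text
    );
}

echo expandTokens('The hobgoblin carries a [GhostBear:WeaponList].');
[/CODE]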

One suggestion I would make, though: you might want to namespace table names by username, e.g. [GhostBear:WeaponList]. That way everyone can name their tables whatever they want with no chance of conflicts.

This project isn't all that complicated in terms of software complexity. With a decent spec even a novice developer should have a pretty easy time developing what you want.

Do you plan on hosting the project on something like SourceForge or GitHub and releasing it under an open license?

Edit: As far as preventing abuse and accidental infinite loops goes, you can specify a maximum depth for processing recursion. Just return an error string instead of the table result (and possibly the table path as well) so that someone knows what happened and why.
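A guard sketch combining that depth limit with the "output over 10k" cutoff floated earlier in the thread; the limits and the rollOnTable() helper are made-up placeholders:

[CODE]
<?php
const MAX_DEPTH  = 20;    // realistic table trees stay well under this
const MAX_OUTPUT = 10240; // ~10k of generated text

function expandGuarded(string $text, array $path = []): string {
    if (count($path) > MAX_DEPTH) {
        // Include the table path so the author can see where it blew up.
        return '[ERROR: max table depth exceeded via ' . implode(' > ', $path) . ']';
    }
    $out = preg_replace_callback('/\[([^\]]+)\]/', function (array $m) use ($path) {
        $next = array_merge($path, [$m[1]]);
        return expandGuarded(rollOnTable($m[1]), $next);
    }, $text);
    if (strlen($out) > MAX_OUTPUT) {
        return '[ERROR: output too long]';
    }
    return $out;
}
[/CODE]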
 

I have no doubt in my mind that with 500 tables and one user, a 386 could handle this job. In fact, looking at a few files, Tablesmith rarely did anything that complicated: 5-20 tables with 2-30 rows each seems more typical of its data. EnWorld has 126,000 users. What happens when you have, say, 100,000 assets and 100 simultaneous requests?

Not even the forums have 100 simulrequests. The front page - our busiest page by a factor of 10 - gets a few thousand every 15 minutes (the interval we measure at); even that's probably only a handful of people calling the database at any given moment. And 100,000 tables would take years to build - look at other features like the campaign manager or wiki here to see the percentage takeup. No more than half a percent of traffic ever ends up on ENW's extra features.

If we do develop that sort of traffic there, I'll spend the ad revenues on a suite of servers. But it honestly won't happen; there will be a small core of users and under 5 simulrequests at any given time.
 

Not even the forums have 100 simulrequests. The front page - our busiest page by a factor of 10 - gets a few thousand every 15 minutes (the interval we measure at); even that's probably only a handful of people calling the database at any given moment. And 100,000 tables would take years to build - look at other features like the campaign manager or wiki here to see the percentage takeup. No more than half a percent of traffic ever ends up on ENW's extra features.

If we do develop that sort of traffic there, I'll spend the ad revenues on a suite of servers. But it honestly won't happen; there will be a small core of users and under 5 simulrequests at any given time.

That's pretty much my assumption of the traffic impact on performance.

Also, let's not confuse the total quantity of tables in the system with the impact of nested tables (much like [MENTION=4937]Celebrim[/MENTION]'s example of {1d100 Foo} which calls {1d100 Bar} and so on).

If we use one flat file per table or one DB row per table, modern OSes and database engines can handle that. It's a trivial quantity. Furthermore, as C pointed out, realistic complex tables, as seen in other tools, go maybe 20 tables deep at most.

That means, for any given user's immediate task, 20 tables is the "realistic" top end. That's a pretty light load for any system to go fetching for.

I would still take performance and protection from piggish scripting seriously. But the basic uses aren't likely to be a problem.

As [MENTION=6667527]GhostBear[/MENTION] suggested, preg_replace_callback() will make short work of the parsing (it is a tool that finds symbols and then calls any code you want to process those symbols). That means it'll be easy to code it to look for anything that begins and ends with [ and ] and call a ParseTag() function. From my perspective, the parsing problem is reminiscent of my Computer Science 202 class from many centuries ago.

I also definitely agree that namespace management (or naming for normal people) is going to be the only way we can keep the collection of assets manageable.

I do like the idea of using certain fields of the Asset to define its name within the namespace architecture automatically. So when Morrus makes an Encounter table called Woodlands, its full name would be "Morrus.Encounter.Woodlands", because the editor picked the category Encounter and called it Woodlands.

Incorporating the username into the naming gets a little messy if users are able to change their name, but it could be managed.
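One hypothetical way to manage it: key storage on the stable user ID and rebuild the pretty name from the current username only for display. A sketch:

[CODE]
<?php
// Storage key uses the immutable user ID, so renames don't orphan assets.
function assetKey(int $userId, string $category, string $title): string {
    return sprintf('u%d.%s.%s', $userId, $category, $title);
}

// The display name is rebuilt from the current username on the fly.
function assetDisplayName(string $userName, string $category, string $title): string {
    return "$userName.$category.$title"; // e.g. "Morrus.Encounter.Woodlands"
}
[/CODE]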
 

I also definitely agree that namespace management (or naming for normal people) is going to be the only way we can keep the collection of assets manageable.

The other method is categories, tags, descriptions, and searches, combined with ratings and popularity.
 

