Question for Geneticists & Biologists about DNA sequence nomenclature?




Accession numbers don't carry any info unto themselves and at times they can be a pain to find, depending on what info about the gene they refer to you already know. I've found them to be a pain in the tail this semester.

IUPAC should make a forced invasion of biology to help us out. Of course I might be biased a tad because as an undergrad I got a BS chemistry as a primary major, with a BS biology secondary to that, and then reversed directions here in grad school as I've gone into molec. biology. The lack of standardized rules in some areas is striking, especially compared to the chem side of my experience. Eh. *shrugs* I enjoy it greatly though, and not just for a near fight in the department following a departmental seminar. (Never a calm thing when the department chair says, "That was a cute little presentation, however..." after a guest speaker's talk... Was half expecting a bio geek version of mortal kombat to start).
 

Shemeska said:
IUPAC should make a forced invasion of biology to help us out. Of course I might be biased a tad because as an undergrad I got a BS chemistry as a primary major, with a BS biology secondary to that, and then reversed directions here in grad school as I've gone into molec. biology. The lack of standardized rules in some areas is striking, especially compared to the chem side of my experience. Eh. *shrugs* I enjoy it greatly though, and not just for a near fight in the department following a departmental seminar. (Never a calm thing when the department chair says, "That was a cute little presentation, however..." after a guest speaker's talk... Was half expecting a bio geek version of mortal kombat to start).
The only feasible way I can think of to standardize naming (of genes) is to base it on the protein that a gene produces. As for proteins, I think the complexity of their form and function makes it all but impossible to name them with any convention. I mean, we can separate them into classes like serine proteases and so on, but actually naming one is an extremely complex problem. Hell, two different proteins with the same primary structure can have differing tertiary ones. Not to mention something like luciferase, which we can't even fold correctly.

IUPAC had it easy with chemistry - single molecules are usually much less complex than genes and proteins. And even so, there's still a lot of the older naming conventions floating around that are acceptable. Can you imagine how hard it would be to get everyone using a single standard for biochemistry and molecular bio, and remembering it? It'd be the equivalent, IMO, of trying to switch the US over to the metric system.
 

Shemeska: I totally agree with you... with my BS in chemistry but working as a molecular biologist, it gets to be a pain. Now, there are IUPAC names for proteins, of course. But do you really want to deal with a name like:

methionylprolylglycylglycylarginylarginylarginylglycylleucylvalylalanylprolylglutaminyl etc etc... ?

(The first few AA of the 1174-aa protein we work with. I'd rather just call it EAG.) And hell... even that uses trivial names, rather than calling glutamine 2-amino-4-carbamoylbutanoic acid.
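For what it's worth, a name like that can be generated mechanically from the one-letter sequence, which is part of why nobody bothers writing it out. A rough sketch, assuming the quoted fragment reads as MPGGRRRGLVAPQ in one-letter code; the table is deliberately partial (only the residues in that fragment) and the function name is made up for illustration:

```python
# Partial lookup table: one-letter amino acid code -> acyl ("-yl") residue name.
# Only the residues that appear in the fragment quoted above are included.
ACYL_NAMES = {
    "A": "alanyl",
    "G": "glycyl",
    "L": "leucyl",
    "M": "methionyl",
    "P": "prolyl",
    "Q": "glutaminyl",
    "R": "arginyl",
    "V": "valyl",
}


def peptide_name_prefix(sequence: str) -> str:
    """Concatenate acyl residue names for a one-letter peptide sequence."""
    parts = []
    for residue in sequence.upper():
        if residue not in ACYL_NAMES:
            # The table is partial on purpose; fail loudly rather than guess.
            raise ValueError(f"no acyl name in this partial table for {residue!r}")
        parts.append(ACYL_NAMES[residue])
    return "".join(parts)


# Assumed one-letter reading of the name quoted in the post.
print(peptide_name_prefix("MPGGRRRGLVAPQ"))
```

The output length grows linearly with the sequence, so the full 1174-residue name would run to thousands of characters.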

The major problem is that the further you shorten the name, the less information you are indicating about the molecule in question.

Now, for genes:

(1) Probably the only genes we can construct systematic names for would be in those where there's already a full or nearly-complete genome sequence for the organism in question.

(2) We could probably pinpoint a gene by summing up as: Species-Chromosome-P/Q-distance from centromere. But even this is quite imprecise... where do we define the centromere as ending? How do you determine which strand is the sense strand? How do you label the exons... and so forth.
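To make point (2) concrete, here is a minimal sketch of what such a locator could look like as a data structure. Everything in it is an assumption for illustration: the field names, the megabase unit, the label format, and the example values are invented, not taken from any existing database or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneLocus:
    """Hypothetical Species-Chromosome-P/Q-distance locator."""
    species: str        # species abbreviation (placeholder convention)
    chromosome: str     # kept as a string so sex chromosomes like "X" fit
    arm: str            # "p" (short arm) or "q" (long arm)
    distance_mb: float  # distance from the centromere, assumed to be in Mb

    def __post_init__(self):
        if self.arm not in ("p", "q"):
            raise ValueError("arm must be 'p' or 'q'")
        if self.distance_mb < 0:
            raise ValueError("distance from centromere cannot be negative")

    def label(self) -> str:
        # Assumed label format: fields joined by hyphens, distance to 2 decimals.
        return f"{self.species}-{self.chromosome}-{self.arm}-{self.distance_mb:.2f}"


# Placeholder values, not a real gene position.
example = GeneLocus(species="Hsa", chromosome="7", arm="q", distance_mb=12.5)
print(example.label())
```

Note that the sketch just dodges the hard questions: it has no field for strand, nothing for exons, and it takes the centromere boundary as given.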

The only way to do it, it seems, is to keep a database of the information in question (GenBank/Entrez) and make that database as easily searchable as possible. The former works pretty well; the latter is less than finished. :)
 

LazarusLong42 said:
Shemeska: I totally agree with you... with my BS in chemistry but working as a molecular biologist, it gets to be a pain. Now, there are IUPAC names for proteins, of course. But do you really want to deal with a name like:

methionylprolylglycylglycylarginylarginylarginylglycylleucylvalylalanylprolylglutaminyl etc etc... ?

*shudder* How true, and you're right that databases, made as user friendly as possible, are the best solution at the moment, and perhaps long term as well. Especially given the question of how to handle different isoforms of a protein, all of which come from the same 'gene' by way of variant RNA processing. I'd rather not think about how to make a system to handle that on top of subunits, folding in different environments or in combination w/ other proteins or RNAs, etc.

:)
 

LazarusLong42 said:
Now, for genes:

(1) Probably the only genes we can construct systematic names for would be in those where there's already a full or nearly-complete genome sequence for the organism in question.

(2) We could probably pinpoint a gene by summing up as: Species-Chromosome-P/Q-distance from centromere. But even this is quite imprecise... where do we define the centromere as ending? How do you determine which strand is the sense strand? How do you label the exons... and so forth.

Which puts us up to what, 12 now? But even that convention would run into problems. At least with current conventions, within a species, the name will usually make sense. It is highly restricted to its own field, but since those are the people using it on a daily basis, an easy name (EAG) makes sense. But the latter point has issues as well: as Shemeska pointed out, naming a gene (or a predicted gene) by physical mapping like this hinders the ability to name all the various proteins (or functional RNAs) that can be formed from splicing etc. This doesn't even cover gene overlap in the crowded bacterial genomes. Each group working with a species has come (more or less) to agreements within the community working on those organisms. The true problem comes in interspecific comparisons, where genes have COMPLETELY different names even if they have the exact same function. And our understanding of function in vivo is so narrow in nearly every organism that names based upon function become irrelevant quickly. The main problem biologists need to face right now is the consolidation of gene names. Different alleles with different names generated through different screens or discovery processes need to be consolidated now to reduce confusion in the literature 5 years from now. This is especially common in systems where mutations were a principal form of gene discovery (I am thinking principally of Arabidopsis here).
 


Heh. Looks like everyone else got here before me. Yeah, gene nomenclature is weird stuff. If you want some really silly gene names, check Drosophila; they have some silly ones in there. Also I'd like to mention that searching by sequence, or by parts of sequences, is possible and is usually what we currently use to identify what potential proteins are. The accuracy varies, but it can get you general information about what you're looking at.

Leareth
 

LazarusLong42 said:
A mosquito cried out in vain:
"A chemist has poisoned my brain!"
The cause of his sorrow
Was 2,2-dichloro-
Diphenyltrichloroethane.

Umm... maybe I'm off my game (I am home sick after all) but that doesn't sound like an actual chemical.

Ethane has room for 6 substituents, and you've got seven listed (5 chloros and 2 phenyls) unless I'm misreading.
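The count can be sanity-checked with a few lines of code. This is only a rough sketch, not a real nomenclature parser: it just adds up multiplying prefixes (di-, tri-, ...) in front of "chloro" and "phenyl" and compares the total with the six hydrogen positions on ethane. The function name and the prefix table are my own for illustration.

```python
import re

# Multiplying prefixes recognised by this sketch (not an exhaustive list).
MULTIPLIERS = {"di": 2, "tri": 3, "tetra": 4, "penta": 5, "hexa": 6}

# Ethane is C2H6, so at most 6 hydrogens can be replaced by substituents.
ETHANE_POSITIONS = 6


def count_substituents(name: str) -> int:
    """Sum the multiplied chloro/phenyl groups named in a string.

    Groups without a multiplying prefix (a lone "chloro") are not counted;
    that is enough for the name under discussion.
    """
    pattern = r"(di|tri|tetra|penta|hexa)(chloro|phenyl)"
    matches = re.findall(pattern, name.lower())
    return sum(MULTIPLIERS[prefix] for prefix, _group in matches)


name = "2,2-dichloro-diphenyltrichloroethane"
total = count_substituents(name)
print(total, "substituents named;", ETHANE_POSITIONS, "positions on ethane")
print("fits on ethane:", total <= ETHANE_POSITIONS)
```

Read literally, the name asks for one more group than ethane has room for, unless some of the chloros sit on the phenyl rings instead.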

Do you mean
And the 2,2-dichloros can be on the phenyls (one substituent on each carbon).

Very confused.

Zhaneel

Edit: Okay, so DDT. Which structure I see here [https://www.sigmaaldrich.com/cgi-bi...oductNumber=ALDRICH-386340&VersionSequence=1]

Now, I am really wondering where you got the name from.

[Yes, I am a geek, and I am fully employed as a medicinal chemist]
 