Henry said:
A thread in d20 Systems got me thinking about something, and I was hoping a poster familiar with the science of Genetics (either a professional or a student) would help point me in the right direction.
Woot, my biochem degree comes in handy!
I was wondering how certain Genes and DNA sequences in a Genome are properly named and described. I have absolutely no clue, but I know that there has to be a proper I.D. system for identifying a certain gene or DNA sequence in a genome, especially since so much work has now been done on identifying parts of the human genome.
Ho boy.
Nomenclature
Generally, a gene is named after one of two things - the phenotype (physical appearance) of the mutated gene, or the protein that it codes for. Now, a gene can take different forms - for instance, a fruit fly can have red eyes, or white eyes. These different forms of the gene are called alleles. Two alleles make up the genotype, which very simply causes you to have a certain phenotype.
With me so far?
Now, the most common allele in a population is called the wild type (basically). The wild type allele is commonly denoted with a +. So to continue my example above, the wild type fruit fly eye color could be expressed as eye+. Note that the plus sign should be a superscript. eye+ would indicate a red-eye allele, while eye would indicate a white-eye allele. The eye part... well, basically it's up to whoever discovers the gene.
Alright, great for fruit flies, what about humans? Well, the same principles exist for nomenclature - it's pretty much up to the discoverer, and usually has to do with phenotype or the protein produced. For instance, the beta-hemoglobin chain is commonly annotated as the Greek beta. Alternatively, you can call it by its full name - beta-hemoglobin chain, fruit fly eye color, and so on.
Identification
This is going to be really tough to simplify, but I'm going to try.
If you know where a gene is, it's simple - say 3q15. This means third chromosome, the long arm (p for the short one), region 1, band 5, counting outward from the center (the centromere).
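To make that notation concrete, here's a rough Python sketch that pulls a location like 3q15 apart. By the standard convention, p is the short arm and q is the long arm, and the two digits are read as region and band. The function name parse_locus and the dictionary layout are just my own illustration, and it only handles the simple chromosome-arm-region-band form (no sub-bands like .2 on the end).

```python
import re

# Standard convention: p = short arm, q = long arm.
ARMS = {"p": "short arm", "q": "long arm"}

def parse_locus(locus):
    """Split a simple location such as '3q15' into its parts.

    Hypothetical helper for illustration only; sub-bands and ranges
    are not handled.
    """
    match = re.fullmatch(r"(\d{1,2}|X|Y)([pq])(\d)(\d)", locus)
    if match is None:
        raise ValueError(f"unrecognized locus: {locus!r}")
    chromosome, arm, region, band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": ARMS[arm],
        "region": int(region),  # regions are numbered outward from the centromere
        "band": int(band),      # band within that region
    }

print(parse_locus("3q15"))
```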
If you know the genetic code, it's still pretty simple. DNA is double-stranded, and the two strands are complementary - that is, if you know one strand, you know the other. What you can do to identify the gene is make something called a primer (when it's labeled like this, you'll usually see it called a probe). A primer is a short sequence of genetic code complementary to the one you are looking for - so it'll stick to it. You attach a phosphorescent or fluorescent molecule to the primer - you make it glow. You can then disrupt the DNA (pull the two strands apart), put in the primer, and it will stick naturally. You then separate and isolate the glowing bits. Like I said, very simple.
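Here's a minimal Python sketch of the "if you know one strand, you know the other" idea, and of where a primer would stick. The sequences are made up, the function names are mine, and real binding tolerates mismatches and depends on temperature and salt, so treat the exact string matching as a toy model only.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def binding_sites(strand, primer):
    """Return 0-based positions on `strand` where `primer` pairs perfectly.

    Toy model: requires an exact match, which real hybridization does not.
    """
    target = reverse_complement(primer)
    sites = []
    start = strand.find(target)
    while start != -1:
        sites.append(start)
        start = strand.find(target, start + 1)
    return sites

# Made-up example strand; the stretch we want to find is "ATTGTA".
strand = "ATGGCCATTGTAATGGGCCGC"
primer = reverse_complement("ATTGTA")

print(primer)
print(binding_sites(strand, primer))
```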
If you don't know the code, but you have the protein it produces, you can attempt to sequence the protein, reverse engineer the genetic code from that, and then make primers to try and find it in the DNA. It's a long and arduous process, and it's why it takes so long to find genes. This is the most common way of doing things. Of course, isolating a specific protein is pretty difficult, and there are some problems with this approach, but once it's isolated the hard part is essentially over.
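One of those problems, at least as I'd guess it, is that the genetic code is redundant: most amino acids can be spelled by more than one codon, so a protein sequence doesn't pin down a single DNA sequence. The Python sketch below just counts how many DNA spellings a short peptide could have, using the codon counts from the standard genetic code. The function name and the example peptides are made up for illustration.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes; stop codons not included).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

def possible_codings(peptide):
    """Count the distinct DNA sequences that could encode `peptide`."""
    return prod(CODON_COUNTS[aa] for aa in peptide)

print(possible_codings("MWK"))  # 1 * 1 * 2 = 2
print(possible_codings("LSR"))  # 6 * 6 * 6 = 216
```

That's why stretches rich in amino acids like M and W make much better starting points for primers than stretches full of L, S, and R.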
A good shortcut is if you've identified the protein in another animal, you can use that as a base to start - chances are if the proteins have similar functions, they'll have similar code. This isn't always true, but it works often enough to be a viable shortcut to investigate.
I'd really need a more specific question to make a more coherent answer.
Oh, btw, biochem people unite! Actually, if you people with Ph.D.s wouldn't mind, I'd like to ask you guys a few questions in e-mail or PM, if that's okay with you guys.