Expression profiling
From Wikipedia, the free encyclopedia

Microarray technology is often used for gene expression profiling. It makes use of the sequence resources created by genome sequencing projects and other sequencing efforts to answer the question of which genes are expressed in a particular cell type of an organism, at a particular time, under particular conditions.
For instance, microarrays allow gene expression to be compared between normal and diseased (e.g., cancerous) cells. The technology goes by several names: DNA microarrays, DNA arrays, DNA chips, gene chips, and others. A distinction is sometimes drawn between these names, but in practice they are synonyms, because there are no standard definitions of which type of microarray technology should be called by which name.
Microarrays exploit the preferential binding of complementary nucleic acid sequences. A microarray is typically a glass slide onto which DNA molecules are attached at fixed locations (spots or features). An array may carry tens of thousands of spots, each containing a very large number of identical DNA molecules (or fragments of identical molecules) between twenty and several hundred nucleotides long. The spots are either printed onto the slide by a robot, or synthesized by photolithography (similar to computer chip production) or by ink-jet printing. Microarrays are commercially available, but many academic labs produce their own.
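The complementarity rule above can be illustrated with a minimal sketch that assumes a perfect-match model: a probe binds a target strand only if the target contains the probe's reverse complement, because the two strands pair in antiparallel orientation. Real hybridization tolerates some mismatches and depends on temperature and sequence composition, so this is a simplification; the sequences and the function names `reverse_complement` and `hybridizes` are hypothetical.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (read 5' to 3')."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str) -> bool:
    """Simplified perfect-match model (an assumption, not real thermodynamics):
    the target binds the probe if it contains the probe's reverse complement."""
    return reverse_complement(probe) in target


probe = "GATTACA"                           # hypothetical spot sequence
print(reverse_complement(probe))            # TGTAATC
print(hybridizes(probe, "CCTGTAATCGG"))     # True: contains TGTAATC
print(hybridizes(probe, "CCGATTACAGG"))     # False: same sequence, not complementary
```

The last call shows that a target carrying the probe's own sequence does not bind it; only the complementary strand does.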
Microarrays that contain all of the approximately 6,000 genes of the yeast genome have been available since 1997. The latest generations of commercial microarrays represent the entire human genome, more than 30,000 genes, on two microarrays.
Because the data generated by such experiments are highly multidimensional and often noisy, considerable attention must be paid to experimental design if valid biological conclusions are to be drawn. A key distinction to keep in mind is that between technical replicates and biological replicates, because instrument noise and biological variation are separate sources of variance, and conflating them can confound an analysis of variance (ANOVA). If one divides a sample from the same organism or culture into several subsamples and runs each on a chip, these are technical replicates: they provide information about the variability of the technology, but not about the biological variation between individual experimental units. Only by giving the same treatment to multiple experimental units can one estimate the biological variability.
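A small simulation makes the distinction concrete. The sketch below assumes, purely for illustration, that a gene's log2 expression varies between experimental units with a standard deviation `BIO_SD` and that each chip adds measurement noise with standard deviation `TECH_SD`; both values are hypothetical. Technical replicates of one sample only reveal the measurement noise, while biological replicates reveal both sources together. The large number of draws is only there to make the variance estimates stable; real experiments use far fewer replicates.

```python
import random
import statistics

# Hypothetical parameters (not from the article), on a log2 expression scale.
BIO_SD = 1.0    # variation between individual experimental units
TECH_SD = 0.2   # measurement noise added by each chip
N = 2000        # large N only so the variance estimates are stable

rng = random.Random(0)

# Technical replicates: ONE biological sample, measured N times.
true_level = rng.gauss(8.0, BIO_SD)
technical = [true_level + rng.gauss(0.0, TECH_SD) for _ in range(N)]

# Biological replicates: N independent units, each measured once.
biological = [rng.gauss(8.0, BIO_SD) + rng.gauss(0.0, TECH_SD) for _ in range(N)]

tech_var = statistics.variance(technical)    # estimates TECH_SD**2 only
bio_var = statistics.variance(biological)    # estimates BIO_SD**2 + TECH_SD**2

print(f"technical replicates:  variance {tech_var:.3f} (model {TECH_SD**2:.3f})")
print(f"biological replicates: variance {bio_var:.3f} (model {BIO_SD**2 + TECH_SD**2:.3f})")
```

Using the technical variance as if it were the biological one would make treatment effects look far more significant than they are.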
Many biologists interpret changes in gene expression by the fold change, that is, the ratio by which a gene's expression level has gone up or down between treatments. Fold change alone, however, is not a statistically valid criterion, because it ignores the variability of that gene among replicates given the same treatment. A fourfold change in the measured expression of a gene that varies greatly between samples given the same treatment is probably not significant, whereas a 1.4-fold change in the measured expression of a tightly regulated gene could be highly significant.
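The contrast between the two genes can be sketched numerically. The replicate values below are hypothetical log2 expression levels chosen to mirror the example: a noisy gene with a fourfold change and a tightly regulated gene with a roughly 1.4-fold change. As one common choice (an assumption here, not a method prescribed by the article), each gene is scored with Welch's t statistic, which divides the difference in means by its estimated standard error.

```python
import statistics
from math import sqrt


def fold_change(control, treated):
    """Fold change from log2-scale replicates: 2 ** (difference of means)."""
    return 2 ** (statistics.mean(treated) - statistics.mean(control))


def welch_t(control, treated):
    """Welch's t statistic: mean difference divided by its standard error."""
    se = sqrt(statistics.variance(control) / len(control)
              + statistics.variance(treated) / len(treated))
    return (statistics.mean(treated) - statistics.mean(control)) / se


# Hypothetical log2 expression values, four replicates per treatment.
noisy_ctrl, noisy_trt = [4.0, 9.0, 5.0, 10.0], [6.0, 12.0, 7.0, 11.0]
tight_ctrl, tight_trt = [8.00, 8.02, 7.98, 8.00], [8.485, 8.505, 8.465, 8.485]

fc_noisy, t_noisy = fold_change(noisy_ctrl, noisy_trt), welch_t(noisy_ctrl, noisy_trt)
fc_tight, t_tight = fold_change(tight_ctrl, tight_trt), welch_t(tight_ctrl, tight_trt)

print(f"noisy gene: {fc_noisy:.2f}-fold, t = {t_noisy:.2f}")   # 4.00-fold, t = 0.96
print(f"tight gene: {fc_tight:.2f}-fold, t = {t_tight:.1f}")   # 1.40-fold, t = 42.0
```

The gene with the larger fold change has the far smaller t statistic. Converting t to a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(..., equal_var=False)`.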
Consider, for example, Down syndrome, which is usually caused by a third copy of chromosome 21 (trisomy 21), so that affected people have three copies of the genes on chromosome 21 rather than the normal two. In theory this implies a 1.5-fold increase in expression relative to normal, but dosage compensation mechanisms will probably reduce this somewhat, perhaps to about 1.4-fold. Under the twofold-change cutoff used by many biologists, few of the gene expression changes caused by trisomy 21 would be deemed significant, and yet the syndrome very clearly is significant.
Biologists embarking on expression studies are strongly advised to consult biostatisticians before starting work, in order to estimate how much replication is needed for sufficient statistical power. Using more replicates than needed may be wasteful, but using too few in an attempt to keep costs down is even more wasteful, because an underpowered experiment may fail to detect the very differences it was designed to find. If the experimental design is inappropriate, even very sophisticated statistical analysis after the fact may be unable to answer the biological question that was being asked.
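As a rough illustration of such a power calculation, the sketch below uses the standard normal-approximation formula for comparing two group means, n per group = 2 (z₁₋α/₂ + z₁₋β)² σ² / δ², where δ is the log2 difference to detect and σ the biological standard deviation on the log2 scale. The function name `replicates_per_group` and the values of σ are hypothetical. The normal approximation slightly underestimates the t-test requirement for small samples, and testing thousands of genes at once calls for a much stricter per-gene α, so this is a starting point for the conversation with a biostatistician, not a substitute for it.

```python
import math
from statistics import NormalDist


def replicates_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Normal-approximation sample size for a two-sided, two-group comparison.

    delta: log2 difference in mean expression to detect
    sigma: assumed biological standard deviation on the log2 scale
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # about 0.84 for 80 % power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


delta = math.log2(1.4)                    # the roughly 1.4-fold change from the text
print(replicates_per_group(delta, sigma=0.3))               # 6
print(replicates_per_group(delta, sigma=0.6))               # 24
print(replicates_per_group(delta, sigma=0.3, alpha=0.001))  # stricter per-gene alpha
```

Doubling the biological standard deviation quadruples the number of replicates needed, which is why estimates of biological variability matter more than instrument precision when planning an experiment.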