<?xml version="1.0" ?>
<tei>
	<teiHeader>
		<fileDesc xml:id="0"/>
	</teiHeader>
	<text xml:lang="en">

		<p>The ability to inactivate a target gene transiently by RNAi <ref type="biblio">1</ref>
			has<lb/> greatly accelerated the analysis of loss-of-function phenotypes in C.<lb/>
			elegans and other organisms. Although several large-scale RNAi-<lb/>based screens have
			been used to study gene function in C. elegans <ref type="biblio">2–4</ref> ,<lb/> in
			total only about a third of the predicted genes have been analysed<lb/> so far.
			Genome-wide RNAi analyses would not only provide a key<lb/> resource for studying gene
			function in C. elegans but should also<lb/> address important issues in functional
			genomics, such as the global<lb/> organization of gene functions in a metazoan genome.
			In addition,<lb/> because more than half of the genes in C. elegans have a human<lb/>
			homologue, this kind of functional analysis in the worm should<lb/> provide insights
			into human gene function.<lb/></p>

		<head>Analysis of gene functions by RNAi<lb/></head>

		<p>Loss-of-function RNAi phenotypes can be generated efficiently by<lb/> feeding worms with
			bacteria expressing double-stranded RNA<lb/> (dsRNA) that is homologous to a target gene
				<ref type="biblio">5–7</ref> ; we previously<lb/> used this method to screen roughly
			87% of predicted genes on<lb/> chromosome I of C. elegans (<ref type="biblio">ref. 2</ref>).
			To screen most of the predicted<lb/> genes in C. elegans by RNAi, we constructed a
			library of bacterial<lb/> strains, each capable of expressing dsRNA designed to
			correspond<lb/> to a single gene. The library consists of 16,757 bacterial strains,<lb/>
			which in total correspond to about 86% of the 19,427 current<lb/> predicted genes in C.
			elegans with similar coverage across each<lb/> chromosome (see <ref type="table"
				>Supplementary Tables 1 and 2</ref>). Using this library,<lb/> we screened wild-type
			C. elegans hermaphrodites to identify genes<lb/> for which RNAi reproducibly results in
			sterility, embryonic or larval<lb/> lethality, slow post-embryonic growth, or a
			post-embryonic defect<lb/> (Methods). Such phenotypes were obtained with 1,722
			bacterial<lb/> strains (10.3% of those analysed; <ref type="figure">Fig.
			1a</ref>).<lb/></p>

		<p>Many strains gave rise to several reproducible RNAi phenotypes,<lb/> indicating that the
			targeted gene has many developmental roles. For<lb/> example, RNAi against Y77E11A.13a
			(which encodes a homologue<lb/> of the yeast Sec13p protein implicated in protein
			trafficking from<lb/> the endoplasmic reticulum to the Golgi <ref type="biblio">8</ref>
			) results in sterility,<lb/> embryonic lethality or uncoordinated movement. To
			simplify<lb/> subsequent genomic analyses, we defined three mutually exclusive<lb/>
			phenotypic classes: the nonviable (Nonv) class, consisting of<lb/> embryonic or larval
			lethality or sterility (with or without associated<lb/> post-embryonic defects); the
			growth defects (Gro) class, consisting<lb/> of slow or arrested post-embryonic growth;
			and the viable post-<lb/>embryonic phenotype (Vpep) class, consisting of defects in
			post-<lb/>embryonic development (for example, in movement or body shape)<lb/> without
			any associated lethality or slowed growth. The RNAi<lb/> phenotypes obtained on each
			chromosome are summarized in<lb/>
			<ref type="figure">Fig. 1a</ref>, and a full list of phenotypes by gene is given in <ref
				type="table">Supplemen-<lb/>tary Tables 2–4</ref>; these data are available publicly
			on WormBase<lb/> (http://www.wormbase.org).<lb/></p>

		<p>To determine the effectiveness of the screen, we assessed our<lb/> ability to identify
			correctly the known loss-of-function phenotypes<lb/> for previously studied loci.
			Overall, we obtained RNAi phenotypes<lb/> for 63.5% of 323 detectable loci; almost all
			of those detected (92%)<lb/> produced an RNAi phenotype similar to the known mutant<lb/>
			phenotype (see <ref type="table">Supplementary Tables 5 and 6</ref>). More loci with
			a<lb/> Nonv phenotype were detected (77.9%) than loci with a Vpep<lb/> phenotype
			(42.2%). This difference is likely to arise because certain<lb/> classes of gene with
			Vpep phenotypes (for example, neuronally<lb/> expressed genes) are relatively resistant
			to RNAi <ref type="biblio">7,9</ref> and because Vpep<lb/> phenotypes are more difficult
			to detect in this screen (Methods).<lb/> Notably, the estimated rate of false-positive
			RNAi phenotypes is<lb/> very low (,1%; see <ref type="figure">Supplementary Fig.
			1</ref>). In addition, our results<lb/> correlate well with, and are as sensitive as,
			previous RNAi screens<lb/> (<ref type="biblio">refs 3, 4</ref>, and <ref type="figure"
				>Supplementary Fig. 1</ref>), indicating that RNAi data are<lb/> highly reproducible
			irrespective of the method used.<lb/></p>

		<p>The most common RNAi phenotype is embryonic lethality,<lb/> which was observed for 929
			strains (5.5%). On the basis of our<lb/> efficiency of detecting known embryonic lethal
			loci, this probably<lb/> includes over 70% of embryonic lethal genes and thus will be
			an<lb/> excellent starting point for more detailed analyses of the molecular<lb/>
			mechanisms of embryogenesis in C. elegans. Of the post-embryonic<lb/> phenotypes
			detected, the largest class was uncoordinated movement<lb/> (Unc), which is typically
			indicative of a defect in the neuromuscular<lb/> system. We also defined an RNAi
			phenotype for 33 close homo-<lb/>logues (BlastP E values less than 10^-6) of human
			disease genes<lb/>
			<ref type="table">(Table 1)</ref>. Notably, many of these genes had Vpep phenotypes
			(50%<lb/> versus 16% among all genes with a phenotype), consistent with their<lb/>
			embryonic viability in humans, and thus may be useful for estab-<lb/>lishing C. elegans
			models of some human diseases.<lb/></p>

		<p>A small percentage of the bacterial strains were predicted to target<lb/> more than one
			predicted gene. Before carrying out global analyses,<lb/> we removed these ambiguous
			data to generate a set of 1,528 clones<lb/> for which RNAi phenotypes could be
			attributed to a single predicted<lb/> gene (Methods).<lb/></p>

		<head>Conservation and gene function<lb/></head>

		<p>We and others have previously found relationships between the<lb/> RNAi phenotype of a
			gene and its degree of conservation and<lb/> putative molecular function, using
			relatively small datasets <ref type="biblio">2–4,10</ref> .<lb/> Using the larger
			dataset obtained here, we have confirmed and<lb/> extended those conclusions. We find
			that C. elegans genes with an<lb/> orthologue in another eukaryote are much more likely
			to have a<lb/> detectable RNAi phenotype than all other genes (21% versus 6%).<lb/> In
			addition, highly conserved genes that are present as a single copy<lb/> in the C.
			elegans genome are more than twice as likely to have an<lb/> RNAi phenotype as those
			that are present in more than one copy<lb/> (31% versus 12%); this suggests that many
			recently duplicated<lb/> paralogues are at least partially functionally redundant or
			have<lb/> specialized functions that are not detectable in this screen.<lb/></p>

		<p>The highest cross-species conservation is seen among genes with<lb/> a Nonv RNAi
			phenotype, of which 52% have an orthologue in<lb/> another eukaryote; this shows that
			similar essential basal cellular<lb/> machinery is common to all eukaryotes. Indeed, 51%
			of C. elegans<lb/> orthologues of yeast essential genes <ref type="biblio">11</ref> have
			a Nonv RNAi phenotype.<lb/> Consistent with these findings, genes involved in the basic
			metabo-<lb/>lism and maintenance of the cell are significantly enriched for<lb/> having
			a Nonv RNAi phenotype <ref type="figure">(Fig. 2a)</ref>; by contrast, genes<lb/>
			involved in more complex processes that are expanded in metazoa,<lb/> such as signal
			transduction and transcriptional regulation, are<lb/> enriched for Vpep phenotypes <ref
				type="figure">(Fig. 2b)</ref>.<lb/></p>

		<head>Domain evolution and gene function<lb/></head>

		<p>To study further the relationship between the sequence and function<lb/> of a gene, we
			examined the domain composition of genes in each<lb/> phenotypic class. Of the 200 most
			abundant InterPro domains <ref type="biblio">12</ref> in<lb/> the C. elegans genome, 28
			show significant (P , 0.05) associations<lb/></p>

		<figure>Figure 1 Summary of RNAi phenotypes. a, Number of bacterial strains associated
			with<lb/> each RNAi phenotype. The Nonv (nonviable, including all phenotypic classes
			that result in<lb/> lethality or sterility), Gro (growth defects, including slow
			post-embryonic growth or larval<lb/> arrest) and Vpep (viable post-embryonic phenotype,
			including all other phenotypic<lb/> classes) categories are mutually exclusive; however,
			many genes are associated with<lb/> several specific RNAi phenotypes. Phenotypic classes
			are described in Methods. The<lb/> percentages are out of the total number of clones
			screened per chromosome. b, Relative<lb/> proportion of Nonv, Gro and Vpep phenotypes on
			each chromosome.<lb/></figure>

		<p>with particular classes of RNAi phenotype <ref type="table">(Table 2)</ref>. Notably, of
			the<lb/> seven InterPro domains that are significantly associated with Vpep<lb/> RNAi
			phenotypes, most (six) are represented in the fly <ref type="biblio">13</ref> and<lb/>
			human <ref type="biblio">14,15</ref> genomes but not in the genome of budding yeast <ref
				type="biblio">16</ref> or<lb/> Arabidopsis <ref type="biblio">17</ref> . Genes with
			a Vpep phenotype by definition have no<lb/> associated lethality but instead have a role
			in the multicellular<lb/> animal (such as in movement or body shape); therefore,
			these<lb/> data suggest that many of the &apos;animal-specific&apos; functions encoded
			by<lb/> genes with Vpep phenotypes may have arisen through the evolution<lb/> of new
			domains.<lb/></p>

		<p>To explore this idea further, we examined whether genes with<lb/> animal-specific domains
			are, in general, more likely to have an<lb/> &apos;animal-specific&apos; function (that
			is, to have a Vpep RNAi phenotype).<lb/> C. elegans genes encoding at least one
			identifiable domain were split<lb/> into three groups: &apos;ancient&apos;, in which all
			encoded protein domains<lb/> are found in yeast, Arabidopsis, Drosophila and humans;
			&apos;animal&apos;, in<lb/> which at least one domain is found in Drosophila or humans
			but not<lb/> in yeast or Arabidopsis; and &apos;worm&apos;, in which any domain is
			found<lb/> only in C. elegans (37% are ancient, 8% are animal, 10% are worm<lb/> and 46%
			have no identifiable domain).<lb/></p>

		<p>Whereas genes with a Nonv RNAi phenotype are highly enriched<lb/> for being in the
			ancient class (<ref type="figure">Fig. 3</ref>; 90% of those with an<lb/> identifiable
			domain are &apos;ancient&apos;), genes with a Vpep RNAi pheno-<lb/>type are enriched for
			being in the animal class (16% of Vpep genes<lb/> but only 6% of Nonv genes are in the
			animal class). This supports<lb/> the idea that the evolution of new domains has been
			important for<lb/> the evolution of animal-specific gene functions. In addition, we<lb/>
			found that almost none of the genes in the &apos;worm&apos; class has an<lb/> essential
			role in C. elegans, although many have a Vpep phenotype.<lb/> This suggests that these
			genes have nematode-specific developmen-<lb/>tal functions and supports the view that
			the basal machinery of<lb/> eukaryotes is shared and not phylum-specific.<lb/></p>

		<head>The X chromosome<lb/></head>

		<p>The C. elegans genome is organized into five autosomes and a sex<lb/> chromosome (X) <ref
				type="biblio">18</ref> . Sex in C. elegans is determined by the number of<lb/>
			copies of the X chromosome: hermaphrodites have two copies of the<lb/> X chromosome,
			each of which is partially transcriptionally silenced<lb/> to ensure dosage compensation
			to and males have a single copy<lb/> (reviewed in <ref type="biblio">ref. 19</ref>). We
			explored whether there are functional<lb/> differences between genes on the autosomes
			and the X chromo-<lb/>some. We found that whereas the autosomes each have a similar<lb/>
			distribution of RNAi phenotypes, the distribution on the X<lb/> chromosome is markedly
			different <ref type="figure">(Fig. 1b)</ref>. This difference is<lb/> due almost
			completely to a reduction in the percentage of genes<lb/> with a Nonv phenotype <ref
				type="figure">(Fig. 1a)</ref>, an effect previously reported by<lb/> other groups
			using smaller datasets <ref type="biblio">3,10</ref> . Thus, there has been strong<lb/>
			selection against the encoding of essential functions on the X<lb/> chromosome.<lb/></p>

		<p>Previous studies have shown that X-linked genes are transcrip-<lb/>tionally silenced in
			the germ line during mitosis and early meio-<lb/>sis <ref type="biblio">20,21</ref> .
			Genes required for the basic cellular processes that are<lb/> essential for the
			viability of all cells (including those in the germ<lb/> line) might thus be expected to
			be absent from the X chromosome;<lb/> many such genes have Nonv RNAi phenotypes. We
			indeed found<lb/> that genes in the functional classes enriched for Nonv phenotypes<lb/>
			(such as protein synthesis) are highly underrepresented on the X<lb/> chromosome (<ref
				type="figure">Fig. 2c</ref> and <ref type="figure">Supplementary Fig. 2</ref>). The
			reduction in<lb/> the number of essential functions encoded on the X chromosome<lb/>
			therefore seems to be related to the transcriptional repression of<lb/> X-linked genes
			in the germ line. Differential expression of X-linked<lb/> genes does not explain the
			entire difference, however, because<lb/> X-linked and autosomal genes with similar
			germline expression<lb/> profiles have very different roles. For example, although genes
			with<lb/> oocyte-enriched expression are found in similar numbers on the<lb/> X
			chromosome and the autosomes <ref type="biblio">20</ref> , none of the X-linked
			oocyte-<lb/>enriched genes have a Nonv RNAi phenotype, whereas 19% of the<lb/> autosomal
			oocyte-enriched genes are essential.<lb/></p>

		<p>A second, more intriguing property of the X chromosome is that<lb/> it is enriched for
			genes with Vpep phenotypes (P &lt; 0.01; chromo-<lb/></p>

		<figure type="table">Table 1 Thirty-three human disease gene homologues with an RNAi phenotype<lb/>
			Predicted gene<lb/> C. elegans locus<lb/> Human disease<lb/> Human gene<lb/> BlastP E value<lb/> RNAi phenotype<lb/>
			B0035.5<lb/> G6PD deficiency<lb/> G6PD<lb/> 1 × 10^-176<lb/> Emb, Clr, Gro<lb/>
			B0350.2A<lb/> unc-44<lb/> Hereditary spherocytosis<lb/> ANK1<lb/> 0.00<lb/> Slu<lb/>
			C01G6.8<lb/> cam-1/kin-8<lb/> Insulin-resistant diabetes mellitus<lb/> INSR<lb/> 6 × 10^-55<lb/> Unc, Pvl, clear patch<lb/>
			C01G8.5A<lb/> Neurofibromatosis<lb/> NF2<lb/> 1 × 10^-123<lb/> Unc, Lvl, Gro<lb/>
			C06A1.1<lb/> Zellweger syndrome<lb/> PEX1<lb/> 3 × 10^-67<lb/> Emb, Bmd, Sck, Gro<lb/>
			C07H6.7<lb/> lin-39<lb/> MODY, type IV<lb/> IPF1<lb/> 5 × 10^-14<lb/> Egl, Vul, Muv<lb/>
			C17E4.5<lb/> Oculopharyngeal muscular dystrophy<lb/> PABPN1<lb/> 3 × 10^-41<lb/> Emb, Unc, Lva<lb/>
			C29A12.3<lb/> lig-1<lb/> DNA ligase I deficiency<lb/> DNA ligase1<lb/> 1 × 10^-167<lb/> Emb<lb/>
			C48A7.1<lb/> egl-19<lb/> Long QT syndrome 3<lb/> SCN5A<lb/> 2 × 10^-64<lb/> Egl, Clr<lb/>
			C50H2.1<lb/> Leydig cell hypoplasia<lb/> LHCGR<lb/> 9 × 10^-76<lb/> Gro<lb/>
			D2045.1<lb/> Spinocerebellar ataxia 2<lb/> SCA2<lb/> 7 × 10^-09<lb/> Emb<lb/>
			F01G10.1<lb/> Wernicke–Korsakoff syndrome<lb/> TKT<lb/> 0.00<lb/> Emb, Clr, Gro<lb/>
			F07A5.7<lb/> unc-15<lb/> Tuberous sclerosis<lb/> TSC1<lb/> 1 × 10^-07<lb/> Unc, Prz, Egl<lb/>
			F11C1.6<lb/> nhr-25<lb/> Pseudohyperaldosteronism<lb/> NR3C2<lb/> 7 × 10^-24<lb/> Unc, Prz, Clr, Egl<lb/>
			F11H8.4<lb/> cyk-1<lb/> Nonsyndromic sensorineural deafness<lb/> DFNA1<lb/> 9 × 10^-49<lb/> Emb, Adl, Rup, Clr<lb/>
			F20B6.2<lb/> vha-12<lb/> Renal tubular acidosis<lb/> ATP6B1<lb/> 0.00<lb/> Emb, Ste, Adl, Lvl, Prz<lb/>
			F54D8.1<lb/> Ehlers–Danlos syndrome, type IV<lb/> COL3A1<lb/> 1 × 10^-06<lb/> Dpy<lb/>
			F53G12.3<lb/> Chronic Granulomatous Disease<lb/> X-CGD<lb/> 3 × 10^-34<lb/> Bli, Mlt, Lvl<lb/>
			F58A3.2A<lb/> egl-15<lb/> Multiple venous malformations<lb/> VMCM<lb/> 1 × 10^-62<lb/> Egl<lb/>
			K04G2.8A<lb/> apr-1<lb/> Adenomatous polyposis of the colon<lb/> APC<lb/> 9 × 10^-34<lb/> Unc, Bmd, Lvl<lb/>
			K07A1.12<lb/> rba-2<lb/> Cockayne syndrome<lb/> CKN1<lb/> 6 × 10^-13<lb/> Emb, Pvl, Lvl<lb/>
			K08A8.2<lb/> Gonadal dysgenesis<lb/> SRY<lb/> 3 × 10^-31<lb/> Unc, Egl<lb/>
			K08C7.3<lb/> epi-1<lb/> Usher syndrome 2a<lb/> USH2A<lb/> 1 × 10^-112<lb/> Ste, Unc, Muv, Dpy, Pvl, Rup<lb/>
			K11D9.2A<lb/> Darier–White disease<lb/> SERCA<lb/> 0.00<lb/> Ste, Sck<lb/>
			M02A10.2<lb/> Hyperinsulinism<lb/> KCNJ11<lb/> 4 × 10^-78<lb/> Unc<lb/>
			R107.8<lb/> lin-12<lb/> Alagille syndrome<lb/> JAG1<lb/> 2 × 10^-90<lb/> Egl<lb/>
			R12B2.1<lb/> sma-4<lb/> Pancreatic carcinoma<lb/> MADH4<lb/> 2 × 10^-39<lb/> Sma, Dpy<lb/>
			T03F6.5<lb/> lis-1<lb/> Miller–Dieker lissencephaly syndrome<lb/> PAF<lb/> 1 × 10^-148<lb/> Emb<lb/>
			W05E10.3<lb/> ceh-32<lb/> Holoprosencephaly<lb/> SIX3<lb/> 1 × 10^-69<lb/> Unc<lb/>
			W10G6.3<lb/> ifa-2<lb/> Keratoderma<lb/> KRT9<lb/> 7 × 10^-26<lb/> Unc, Lvl, Mlt<lb/>
			Y47D3A.6A<lb/> tra-1<lb/> Greig cephalopolysyndactyly syndrome<lb/> GLI<lb/> 6 × 10^-58<lb/> Rup, clear patch<lb/>
			Y76A2A.2<lb/> Menkes disease<lb/> ATP7A<lb/> 0.00<lb/> Prz, Adl, Unc<lb/>
			ZC506.4<lb/> mgl-1<lb/> Hypercalcemia<lb/> CASR<lb/> 2 × 10^-77<lb/> Gro<lb/>
			C. elegans genes with a human disease gene homologue are defined as those with a BlastP
			E value less than 1.0 × 10^-6, taken from refs 38, 39. Shown are those with an RNAi
			phenotype. The<lb/> phenotypes are defined in Methods. MODY, maturity onset diabetes of
			the young. G6PD, glucose-6-phosphate dehydrogenase.<lb/></figure>

		<p>some II is also enriched for Vpep genes). In addition, significantly<lb/> (P &lt; 0.01) more
			X-linked genes than autosomal genes encode<lb/> components of signalling pathways and
			transcription factors;<lb/> these genes are enriched for Vpep phenotypes. This
			concentration<lb/> of Vpep genes on the X chromosome may have evolutionary<lb/>
			benefits. Whereas a hermaphrodite worm that is heterozygous for<lb/> a mutant allele of
			an X-linked gene is likely to be phenotypically wild<lb/> type, a (hemizygous) male
			inheriting the mutant allele will be<lb/> mutant. Hermaphrodites could thus act as
			wild-type repositories<lb/> for mutant alleles of genes affecting the patterning,
			structure or<lb/> behaviour of worms; these alleles could then be selected for or<lb/>
			against in a dominant manner in the hemizygous male animal.<lb/> Because the number of
			males spontaneously arising from<lb/> hermaphrodites through meiotic non-disjunction
			events increases<lb/> markedly under stressful conditions (such as increased
			tempera-<lb/>ture), this haploselection for relatively subtle phenotypic changes<lb/>
			might be a powerful mechanism by which to adapt to a changing<lb/> environment.<lb/></p>

		<head>Large-scale functional gene clustering<lb/></head>

		<p>Our RNAi experiments targeted most of the genes in C. elegans, with<lb/> similar
			proportions of genes covered along each chromosome.<lb/> Using these data, we examined
			whether genes of similar function<lb/> cluster in specific regions of chromosomes.
			Unlike most animals,<lb/> C. elegans has holocentric chromosomes that lack a
			localized<lb/> centromeric region. The five autosomes have a central
			&apos;cluster&apos;,<lb/> where rates of recombination are low and where most
			studied<lb/> genetic loci are found, which is flanked by chromosome &apos;arms&apos;
			,<lb/> where recombination rates are more than tenfold higher <ref type="biblio"
				>22</ref> . These<lb/> clusters have characteristic features on all autosomes: lower
			repeat<lb/> content, greater conservation and greater representation by<lb/> expressed
			sequence tags (ESTs) <ref type="biblio">18</ref> . By contrast, the X chromosome<lb/>
			does not have a defined cluster region.<lb/></p>

		<p>In agreement with data derived from classical genetics, we found<lb/> that genes with
			RNAi phenotypes are enriched twofold in the cluster<lb/> regions relative to the arms
			(7.6% of genes on arms have an RNAi<lb/> phenotype versus 14.9% in the cluster regions;
				<ref type="figure">Fig. 4a</ref>). We next<lb/> examined the distribution of the
			Nonv, Gro and Vpep genes in the<lb/> genome (Methods). Notably, genes with a Nonv RNAi
			phenotype<lb/> are strongly enriched in large regions of the clusters of
			chromo-<lb/>somes I, II and III (P &lt; 0.01; <ref type="figure">Fig. 4b</ref>): 36% of the
			Nonv genes lie in<lb/></p>

		<figure>Figure 2 Relative enrichment of Nonv, Vpep and X chromosome genes for different<lb/>
			functional classes. The functional classes are protein synthesis (P synth), RNA
			synthesis<lb/> (RNA synth), DNA synthesis and repair/cell cycle (DNA/CC), cellular
			architecture (Cell<lb/> arch), RNA binding (RNA bind), chromatin regulation (Chromatin),
			protein degradation<lb/> (Degrad), energy and intermediary metabolism (Metab),
			transcription factors (Txn factor),<lb/> nucleic-acid binding (NA bind), signal
			transduction (Signalling), small-molecule transport<lb/> (SM tport), specific proteases
			(Protease), retroviral-and transposon-derived sequences<lb/> (Viral), collagens
			(Collagen), genes with neuronal functions (Neuro), and Unknown. Shown<lb/> are the
			levels of enrichment among genes in each functional class for Nonv phenotypes<lb/> (a),
			Vpep phenotypes (b) or genes on the X chromosome (c); bars in black denote a<lb/>
			statistically significant overenrichment (P &lt; 0.01). The grey bars in c represent
			an<lb/> underenrichment (P &lt; 0.01). For reference, a line is drawn at a relative
			representation of<lb/> 1.0.<lb/></figure>

		<figure type="table">Table 2 InterPro domains associated with RNAi phenotypes<lb/>
			Nonv only<lb/>
			Elongation factor, GTP-binding<lb/> Cyclin<lb/> Ubiquitin domain<lb/> TPR repeat<lb/>
			Zinc-finger, CCHC type<lb/> Myb DNA-binding domain<lb/> Laminin-type EGF-like domain<lb/>
			DEAD/DEAH box helicase<lb/> Ubiquitin-associated domain<lb/> Zinc-finger, C2H2 type<lb/>
			Mitochondrial substrate carrier<lb/> Protein kinase C, phorbol ester/DAG binding<lb/>
			Gro only<lb/>
			Glycosyl transferase, family 2<lb/> Zinc-finger, RING<lb/> Phosphotyrosine interaction domain<lb/>
			Proline-rich extensin<lb/>
			Nonv and Gro<lb/>
			G-protein β-subunit WD40 repeat<lb/> AAA ATPase<lb/> KH domain<lb/>
			Zinc-finger, C-X8-C-X5-C-X3-H type<lb/> RNA-binding region RNP-1 (RNA recognition)<lb/>
			Vpep<lb/>
			Immunoglobulin/major histocompatibility complex<lb/> Collagen triple helix repeat<lb/>
			Immunoglobulin-like<lb/> EGF-like calcium-binding<lb/>
			Aspartic acid and asparagine hydroxylation site<lb/> Fibronectin, type III<lb/>
			Worm-specific repeat type 1<lb/>
			We examined the phenotypes of genes containing any of the 200 most abundant InterPro
			domains (ref. 12)<lb/> in the C. elegans genome; genes containing the listed domains were
			significantly<lb/> enriched (P &lt; 0.05) for the indicated phenotypes, in order of
			decreasing significance. DAG,<lb/> diacylglycerol; EGF, epidermal growth
			factor.<lb/></figure>

		<p>these enriched regions, which represent about 13% of the genome.<lb/> By contrast, Nonv
			genes are underenriched on the autosomal arms<lb/> and the whole of the X chromosome.
			Functional redundancy<lb/> among paralogous genes might explain some of the
			underenrich-<lb/>ment, because these regions frequently overlap those areas of the<lb/>
			autosomes with increased gene duplication <ref type="figure">(Fig. 4b)</ref>.<lb/></p>

		<p>Genes with Vpep and Gro phenotypes are enriched in different<lb/> regions of the genome
			from those showing enrichment for Nonv<lb/> genes. Notably, genes with a Vpep phenotype
			are enriched signifi-<lb/>cantly in the centre of the X chromosome, despite the absence
			of a<lb/> recombinationally defined cluster <ref type="biblio">22</ref> . This suggests
			that the X<lb/> chromosome, like the autosomes, has a central accumulation of<lb/> genes
			with nonredundant functions; on the X chromosome, how-<lb/>ever, these genes are not
			required for viability, but rather for worm<lb/> behaviour or morphology. These findings
			suggest that in C. elegans<lb/> there is selective pressure for genes with similar
			organismal<lb/> functions to be colocalized in large domains of the genome.<lb/></p>

		<p>How such domains are maintained and what they represent<lb/> mechanistically are unclear.
			A possible hypothesis is that, perhaps<lb/> as a consequence of long-range chromatin
			regulation, genes in these<lb/> domains are transcriptionally co-regulated. To
			investigate this<lb/> possibility, we examined sets or &quot; mounts &quot; <ref
				type="biblio">23</ref> of C. elegans genes<lb/> identified by microarray analysis to
			share expression profiles;<lb/> we found that genes in each mount are enriched in
			distinct<lb/> regions of the chromosomes <ref type="figure">(Supplementary Fig.
			3)</ref>. Such large-<lb/>scale clustering has also been observed in both humans <ref
				type="biblio">24</ref> and<lb/> Drosophila <ref type="biblio">25</ref> .<lb/></p>

		<p>Notably, genes in mounts 7 and 11 are significantly enriched in<lb/> the same regions of
			the genome as are the Nonv genes (<ref type="figure">Fig. 4b</ref> and<lb/>
			<ref type="figure">Supplementary Fig. 3</ref>); in addition, these mounts are enriched
			for<lb/> genes with Nonv RNAi phenotypes. This suggests that in regions of<lb/> the
			genome that have concentrations of genes of similar functions,<lb/> there is large-scale
			broad transcriptional co-regulation. The scale of<lb/> these regions (over 1 megabase)
			indicates that this mode of<lb/> regulation is clearly distinct from that previously
			reported in<lb/> yeast <ref type="biblio">26</ref> and in C. elegans <ref type="biblio"
				>27</ref> , in which small clusters of nearly adjacent<lb/> genes are likely to be
			co-regulated, perhaps as a consequence of open<lb/> loops of chromatin <ref
				type="biblio">26,28</ref> . When an assembled genome sequence is<lb/> available for
			the nematode Caenorhabditis briggsae, which is closely<lb/> related to C. elegans, it
			will be intriguing to see whether these<lb/> functional domains are maintained as
			syntenic regions.<lb/></p>

		<p>In summary, we note that there are differences in gene function<lb/> between the X
			chromosome and the autosomes, as well as func-<lb/>tional clustering in different
			regions of the genome. Each chromo-<lb/>some has unique features—for example, chromosome
			V has few<lb/> essential genes relative to the other autosomes and has a high
			degree<lb/> of gene duplications, whereas chromosome III is enriched for Nonv<lb/>
			genes, and chromosome II is enriched for Vpep genes. These data<lb/> suggest that
			different chromosomes and regions of the genome may<lb/> be specialized for particular
			functions.<lb/></p>

		<figure>Figure 4 Distribution of RNAi phenotypes across the C. elegans chromosomes.<lb/> a,
			Genomic locations of genes with RNAi phenotypes. Horizontal yellow (arm regions)
			and<lb/> blue-green (cluster regions) bars represent C. elegans chromosomes; black bars
			indicate<lb/> regions enriched for duplicated genes (that is, those with a C. elegans
			homologue). Each<lb/> RNAi phenotype is represented by a single red (Nonv), green (Gro)
			or blue (Vpep) line<lb/> above the chromosomes. b, Chomosomal enrichment of genes with
			different RNAi<lb/> phenotypes. Overenrichment is indicated by filled boxes,
			underenrichment by open boxes.<lb/> No windows could be significantly underenriched for
			Gro or Vpep phenotypes owing to the<lb/> smaller sample sizes. The purple bars below the
			chromosomes represent regions that are<lb/> significantly (P , 0.01) over-or
			underenriched for genes in mount 11 (ref. 23). In the<lb/> enriched regions, 36% of Nonv
			genes lie in 13% of the genome, 11.6% of Gro genes lie in<lb/> 3.9% of the genome, and
			23.9% of Vpep genes lie in 7.8% of the genome.<lb/></figure>

		<figure>Figure 3 Conservation of domains in genes with different RNAi phenotypes. All
			predicted<lb/> genes were placed into one of four mutually exclusive classes on the
			basis of their InterPro<lb/> domain content. The &apos;ancient&apos; class comprises
			genes for which all predicted domains<lb/> are also encoded in the S. cerevisiae, A.
			thaliana, D. melanogaster and H. sapiens<lb/> genomes; the &apos;animal&apos; class
			comprises genes that contain any domain present in the<lb/> D. melanogaster or H.
			sapiens genomes, but not in S. cerevisiae or A. thaliana, and the<lb/> &apos;worm&apos;
			class comprises genes containing any domain present in the C. elegans genome,<lb/> but
			not in the other four. The proportions of All, Nonv and Vpep genes that fall into
			each<lb/> class are shown.<lb/></figure>

		<head>Conclusion<lb/></head>

		<p>We have used RNAi to examine the loss-of-function phenotypes of<lb/> about 86% of
			predicted genes in C. elegans. To our knowledge, this is<lb/> the first systematic
			functional analysis of a metazoan genome. Of<lb/> the 1,528 genes for which we could
			assign an RNAi phenotype, over<lb/> two-thirds had not been previously associated with a
			biological<lb/> function in vivo. In addition, we have created an RNAi feeding<lb/>
			library of bacterial clones that can be replicated and reused for<lb/> an unlimited
			number of future genome-wide RNAi screens in<lb/> C. elegans.<lb/></p>

		<p>Much as the genome sequence has provided an invaluable plat-<lb/>form for investigating
			C. elegans biology, these data and the<lb/> availability of this library will form a
			useful tool for functional<lb/> genomic studies in C. elegans. In the future, an
			analogous genome-<lb/>wide RNAi library approach could be extended to mammalian
			cells<lb/> by capitalizing on techniques using DNA constructs to encode<lb/> hairpin
			RNAs <ref type="biblio">29–34</ref> . We anticipate that in the coming years the<lb/>
			quantity of functional data derived from RNAi-based screens in<lb/> C. elegans and in
			other organisms will greatly expand our under-<lb/>standing of how genes function to
			bring about the phenotype of an<lb/> organism.<lb/></p>

		<head>Methods<lb/></head>

		<head>Generation of bacterial feeding library<lb/></head>

		<p>Polymerase chain reaction (PCR) products were generated using the Research Genetics<lb/>
			C. elegans GenePairs primer set of 19,213 primer pairs. The set of predicted genes
			used<lb/> includes only those genes thought to encode proteins. Primer sequences are
			listed on<lb/> the Kim Lab website at Stanford University
			(http://cmgm.stanford.edu/~kimlab/<lb/> primers.12-22-99.html). Current alignments of
			predicted GenePair PCR products on the<lb/> C. elegans genome are available at WormBase
			(http://www.wormbase.org). We generated<lb/> PCR products and constructed bacterial
			strains as described previously (ref. 2). Inserts were checked for the<lb/> correct size and confirmed by
			PCR using the original GenePair oligomers. The whole-<lb/>genome library consists of
			16,757 clones, which represent 87.2% of the GenePairs set and<lb/> are predicted to
			correspond to 86.3% of C. elegans predicted genes (ref. 18), exclusive of cross-<lb/>RNAi
			interactions (see below). To assess the quality of the cloning procedure, we<lb/>
			sequenced 100 random clones and found all of them to be correct. For the 13% of<lb/>
			GenePairs for which no bacterial strain was made, either the GenePair failed to generate
			a<lb/> PCR product or the generated product could not be cloned into the T-tailed
			vector; up to<lb/> three cloning attempts were made for each GenePair. Supplementary
				<ref type="table">Table 2</ref> gives the<lb/> complete list of GenePairs and RNAi
			phenotype class, and indicates whether a clone is<lb/> available.<lb/></p>

		<head>Screening using RNAi by feeding<lb/></head>

		<p>We carried out RNAi as described <ref type="biblio">2,7</ref> . Embryonic lethality was
			defined as &gt;10% dead<lb/> embryos, and sterility required a brood size of &lt;10 among fed
			worms (Ste) or their<lb/> progeny (Stp); wild-type worms under similar conditions
			typically have &gt;100 progeny.<lb/> Each post-embryonic phenotype was required to be
			present among at least 10% of<lb/> analysed worms; the phenotypes assayed were Emb
			(embryonic lethal), Ste (sterile), Stp<lb/> (sterile progeny), Gro (slow post-embryonic
			growth), Lva (larval arrest), Lvl (larval<lb/> lethality), Adl (adult lethal), Bli
			(blistering of cuticle), Bmd (body morphological<lb/> defects), Clr (clear), Dpy
			(dumpy), Egl (egg-laying defective), Him (high incidence of<lb/> males), Lon (long), Mlt
			(moult defects), Muv (multivulva), Prz (paralysed), Pvl<lb/> (protruding vulva), Rol
			(roller), Rup (ruptured), Sck (sick) and Unc (uncoordinated).<lb/> Phenotypes expressed
			in adults (such as Egl) were difficult to score in this screen<lb/> because food became
			limiting at this time point; some of the late expressing phenotypes<lb/> will therefore
			have been missed. Detailed listings of GenePairs with corresponding RNAi<lb/> phenotypes
			are given in <ref type="table">Supplementary Tables 3 and 4</ref> and are available at
			WormBase<lb/> (http://www.wormbase.org).<lb/></p>

		<head>Bioinformatic analyses<lb/></head>

		<p>We carried out BlastP <ref type="biblio">35</ref> analyses for all C. elegans predicted
			genes against similar databases<lb/> (downloaded on 13 Feb 2002) for S. cerevisiae
			(6,183 entries), Arabidopsis (25,813 entries),<lb/> Drosophila (13,957 entries) and Homo
			sapiens (36,493 entries), or against C. elegans itself.<lb/> C. elegans genes with
			orthologues were defined as those with BlastP E values of less than<lb/> 10^-10 with
			conservation extending over at least 80% of matched protein lengths; 21% of<lb/>
			predicted genes in C. elegans have such conservation. Predicted gene products were
			placed<lb/> into functional classes by manual inspection, primarily using data from
			Proteome,<lb/> InterPro release 4.0 (ref. 12) and BLAST analysis <ref type="biblio"
				>35,36</ref> . We could place 41% of all predicted<lb/> genes into 1 of 16
			functional classes <ref type="table">(Supplementary Table 2)</ref>, with the remaining
			59%<lb/> having unknown function.<lb/></p>

		<p>Predicted genes targeted by a given bacterial clone were determined by comparing<lb/>
			electronic PCR (ePCR) products corresponding to the bacterial clone insert (ftp://<lb/>
			ftp.ncbi.nlm.nih.gov/pub/schuler/e-PCR) <ref type="biblio">37</ref> obtained using
			chromosome DNA files from<lb/> the WS61 release of Wormbase
			(ftp://ftp.sanger.ac.uk/pub/wormbase) to gene predictions<lb/> from the same database.
			Roughly 94% of bacterial strains tested correspond to a single<lb/> predicted gene. To
			identify genes elsewhere in the genome that might be targeted by cross-<lb/>RNAi owing
			to strong homology of part of the gene to the ePCR product, we found genes<lb/> having
			.80% identity over a region of at least 200 nucleotides for each ePCR product by<lb/>
			parsing BlastN results against Wormpep release 71. In total, 1,528 clones with RNAi<lb/>
			phenotypes could be assigned directly to a single C. elegans predicted gene; these are
			listed<lb/> in <ref type="table">Supplementary Table 3</ref>. By contrast, 194 clones
			with RNAi phenotypes could not be<lb/> assigned definitively to a single predicted gene;
			these are listed in <ref type="table">Supplementary Table 4<lb/></ref> and include
			GenePairs with either no or multiple ePCR products or for which the ePCR<lb/> product is
			not predicted to overlap any coding sequence.<lb/></p>
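		<p>The cross-RNAi criterion (more than 80% identity over a region of at least 200 nucleotides)
			can be sketched as a simple filter. This is an illustration under stated assumptions, not the
			parser used for the analysis: the function name cross_rnai_targets and the input format, a
			list of already-parsed (gene, percent identity, alignment length) tuples, are hypothetical,
			and the example gene names are invented placeholders.

```python
def cross_rnai_targets(hits, intended_gene, min_identity=80.0, min_length=200):
    """Return genes other than the intended one that pass the cut-offs.

    hits is a list of (gene, percent_identity, alignment_length) tuples,
    assumed to have been parsed from BlastN output beforehand.
    """
    targets = set()
    for gene, percent_identity, alignment_length in hits:
        if gene == intended_gene:
            continue  # the intended target is not a cross-RNAi candidate
        # Strictly more than 80% identity over at least 200 nucleotides.
        if percent_identity > min_identity and alignment_length >= min_length:
            targets.add(gene)
    return sorted(targets)


# Placeholder hits for one ePCR product (gene names are made up).
hits = [
    ("geneA", 100.0, 950),  # intended target
    ("geneB", 91.5, 420),   # passes both cut-offs
    ("geneC", 95.0, 120),   # aligned region too short
    ("geneD", 78.0, 600),   # identity too low
]
print(cross_rnai_targets(hits, "geneA"))
```

			Only geneB passes both cut-offs here, so it would be flagged as a possible secondary
			target of the clone.</p>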

		<p>We found chromosomal regions of significant over- or underrepresentation by<lb/>
			considering moving windows of 250 consecutive genes along the chromosomes, and by<lb/>
			examining whether the number of genes showing a particular phenotype or in a
			particular<lb/> expression cluster within a window was significantly different from that
			expected<lb/> according to the genomic mean, using a 1% significance level in a
			two-tailed test using<lb/> the binomial distribution. <ref type="figure">Figure 4b</ref>
			and <ref type="figure">Supplementary Fig. 3</ref> show continuous<lb/> significant
			windows, from the midpoint of the leftmost to the midpoint of the rightmost<lb/> window.
			Gene positions were taken from the predicted gene set from Wormbase release<lb/>
			WS61.<lb/></p> 
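		<p>The moving-window test can be sketched as follows. This is a minimal illustration rather than
			the code used for the analysis. The function names are hypothetical, the input is assumed to
			be a list of 0/1 flags for genes in chromosomal order, and the two-tailed P value is
			computed by doubling the smaller binomial tail, which is one common convention; the text
			does not say which two-tailed construction was used.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_tailed_p(k, n, p):
    """Two-tailed P value, taken as twice the smaller tail (an assumption)."""
    lower = sum(binom_pmf(i, n, p) for i in range(0, k + 1))
    upper = sum(binom_pmf(i, n, p) for i in range(k, n + 1))
    return min(1.0, 2 * min(lower, upper))


def significant_windows(flags, window=250, alpha=0.01):
    """Scan windows of consecutive genes for over- or underrepresentation.

    flags holds 1 for genes with the phenotype and 0 otherwise, ordered
    along the chromosome. Returns (start_index, count, direction) tuples.
    """
    n_genes = len(flags)
    genomic_mean = sum(flags) / n_genes  # expected fraction per gene
    expected = window * genomic_mean
    hits = []
    count = sum(flags[:window])
    for start in range(n_genes - window + 1):
        if start > 0:
            # Slide by one gene: add the entering gene, drop the leaving one.
            count += flags[start + window - 1] - flags[start - 1]
        if alpha > two_tailed_p(count, window, genomic_mean):
            direction = "over" if count > expected else "under"
            hits.append((start, count, direction))
    return hits


# Toy chromosome: one hit every 25 genes plus a dense cluster at the start.
flags = [1 if i % 25 == 0 else 0 for i in range(1000)]
flags[0:60] = [1] * 60
results = significant_windows(flags)
print(len(results), results[0])
```

			Overlapping significant windows could then be merged into continuous regions running from
			the midpoint of the leftmost window to the midpoint of the rightmost, as described
			above.</p>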

	</text>
</tei>