
www.elsevier.com/locate/ygeno

Genomics 83 (2004) 679–693

A cluster of 21 keratin-associated protein genes within introns of

another gene on human chromosome 21q22.3$

Kazunori Shibuya,1 Izumi Obayashi,1 Shuichi Asakawa, Shinsei Minoshima,2

Jun Kudoh, and Nobuyoshi Shimizu*

Department of Molecular Biology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku, Tokyo 160-8582, Japan

Received 1 July 2003; accepted 30 September 2003

Abstract

Recently, we identified multiple unique sequences in the 21q22.3 region and predicted them to be a cluster of genes encoding hair-specific

keratin-associated proteins (KAPs). Detailed computer-aided analysis of these clustered genes revealed that the cluster spans over 165 kb and

consists of 21 KAP-related sequences including 16 putative genes and 5 pseudogenes. These were further divided into two subfamilies,

KRTAP12 (KRTAP12.1–12.4 and KRTAP12.5P) and KRTAP18 (KRTAP18.1–18.12 and KRTAP18.13P–18.16P). All 16 putative genes

possess several intragenic repeat sequences and apparently belong to the high-sulfur KAP gene family (16–30% cysteine content) known for

nonhuman mammalian species. Transcripts were detected by RT-PCR analysis for all 16 putative KAP genes and their expression was

restricted to hair root cells (radix pili cells) and not found in 28 other tissues, including skin. All 16 KAP genes produced unspliced

transcripts, indicating their nature to be that of active intronless genes. Interestingly, all these KAP-related genes are located within introns of

the recently identified gene TSPEAR (approved gene symbol C21orf29), 214 kb in size. Surprisingly, the transcriptional direction of 8 of the

16 active genes is the same as that of C21orf29/TSPEAR. This finding suggests a novel transcription mechanism in which C21orf29/TSPEAR

gene transcription passes over the multiple transcriptional termination sites of the KAP genes.

© 2003 Elsevier Inc. All rights reserved.

Keywords: Chromosome 21; Keratin-associated protein; Gene cluster

0888-7543/$ - see front matter © 2003 Elsevier Inc. All rights reserved.

doi:10.1016/j.ygeno.2003.09.024

$ Sequence data from this article have been deposited with the DDBJ/EMBL/GenBank Data Libraries under Accession Nos. AB076347–AB076364.

* Corresponding author. Fax: +81-3-3351-2370. E-mail address: [email protected] (N. Shimizu).

1 These authors contributed equally to this work.

2 Present address: Photon Medical Research Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Hamamatsu 431-3192, Japan.

Previously, as part of an international collaborative project, we reported the complete DNA sequence of the euchromatic region of human chromosome 21 and presented a comprehensive gene catalogue in which 127 confirmed genes and 98 predicted genes were listed [1]. In that report, we described multiple unique sequences in the 21q22.3 region and predicted them to be a cluster of keratin-associated protein (KAP) genes, but the molecular nature of the individual genes in the cluster had not been fully characterized.

Mammalian hair is produced in the hair follicles, which are composed of several different cell types that form the medulla, cortex, and outer cuticle of the hair shaft as well as the surrounding inner and outer root sheaths [2]. The major components of hair are keratin intermediate filaments (KIFs) and the above-mentioned KAPs. The KIFs, also called hair keratins, consist of a number of similar but distinct proteins, which are classified into two types (type I and type II). The KIF genes are located in two separate clusters on two different human chromosomes, 17 and 12 [3–9]. In contrast to human KIFs, little is known about human KAPs. Nonhuman mammalian KAPs, however, have been better characterized and are classified into three types based on their amino acid composition: ultrahigh-sulfur KAP (>30% cysteine content), high-sulfur KAP (16–30% cysteine content), and high-glycine/tyrosine KAP (35–60% glycine and tyrosine content) [2]. These three types of mammalian KAPs are further divided into 17 subfamilies [2,10–15]. KAPs share little sequence similarity among subfamilies, although they harbor some common motifs consisting of several amino acid residues. Interestingly, a mouse gene cluster including the Krtap12-1 gene, which encodes a high-sulfur KAP, has been located in a particular region of mouse chromosome 10 that is known to be homologous to a distal part of human chromosome 21q22.3 [10].
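The composition-based classification described above can be illustrated with a minimal Python sketch. The thresholds are those quoted in the text; the function names, the order in which the rules are tested, and the "unclassified" fallback are assumptions made for illustration and are not part of the original study.

```python
def percent_of(protein: str, residues: str) -> float:
    """Percentage of residues in `protein` that belong to `residues` (one-letter codes)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    return 100.0 * sum(protein.count(r) for r in residues) / len(protein)


def classify_kap(protein: str) -> str:
    """Assign a KAP type from amino acid composition (thresholds as quoted in the text)."""
    cys = percent_of(protein, "C")
    gly_tyr = percent_of(protein, "GY")
    if cys > 30:
        return "ultrahigh-sulfur"
    if cys >= 16:
        return "high-sulfur"
    if 35 <= gly_tyr <= 60:
        return "high-glycine/tyrosine"
    # Hypothetical fallback: the text does not say how other compositions are handled.
    return "unclassified"
```

For a toy sequence such as `"CCSSSSSSSS"` (20% cysteine) the sketch returns `"high-sulfur"`, the category to which the 16 putative genes in this study are assigned.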

Here, we report a novel cluster of human KAP-related

genes consisting of 16 active genes and 5 pseudogenes,

which were identified by the computer-aided dot-matrix


analysis of genomic DNA sequence and experimental

evidence. Initial characterization of this novel cluster of

KAP-related genes is described in terms of their genomic

organization, gene structure, transcript variants, tissue

expression, and molecular evolution.

Results

Dot-matrix analysis of genomic DNA sequence of the

21q22.3 region

We carried out dot-matrix analysis of a 200-kb genomic sequence of the 21q22.3 region after masking high-frequency repetitive elements with RepeatMasker and identified 21 KAP-related sequences. Further computer-aided manual inspection enabled us to predict the presence or absence of open reading frames (ORFs) for these 21 KAP-related sequences. Consequently, we were able to classify them into 16 putative genes (KRTAP12.1–12.4 and KRTAP18.1–18.12) and 5 pseudogenes (KRTAP12.5P and KRTAP18.13P–18.16P) (Fig. 1). Although KRTAP18.13P is relatively conserved in length, it does not contain any ORF because it lacks an initiation codon and carries many mutations such as substitutions and insertions–deletions. The other pseudogenes (KRTAP12.5P and KRTAP18.14P–18.16P) are much shorter than the putative KAP genes (Table 1).
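The idea behind a dot-matrix comparison can be sketched in a few lines of Python. This is a naive illustration and not the software used in the study: the 50-nucleotide window follows the Fig. 1 legend, whereas the identity cutoff `min_identity` and the function name are assumptions, and the triple loop is far too slow for a real 200-kb sequence.

```python
def dot_matrix(seq_x: str, seq_y: str, window: int = 50, min_identity: float = 0.8):
    """Return (i, j, identity) for every pair of windows whose identity reaches the cutoff.

    i and j are 0-based window start positions in seq_x and seq_y.
    """
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    hits = []
    for i in range(len(seq_x) - window + 1):
        win_x = seq_x[i:i + window]
        for j in range(len(seq_y) - window + 1):
            win_y = seq_y[j:j + window]
            # Fraction of positions at which the two windows carry the same base.
            identity = sum(a == b for a, b in zip(win_x, win_y)) / window
            if identity >= min_identity:
                hits.append((i, j, identity))
    return hits
```

When a sequence is compared with itself, every window matches itself and produces the main diagonal; repeated units such as tandemly duplicated genes appear as additional off-diagonal hits, which is the signal used to spot the KAP-related sequences.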

Interestingly, the 16 putative genes and 5 pseudogenes all belonged to the high-sulfur KAP gene family, but they were further classified into two subfamilies (KRTAP12 and KRTAP18) based on their nucleotide and amino acid sequences. More specifically, the 12 KAP genes (KRTAP18.1–18.12) in one subfamily are distinct from any KAP genes that have been deposited with the DDBJ/EMBL/GenBank database, whereas the 4 KAP genes (KRTAP12.1–12.4) in the other subfamily are very similar to mouse Krtap12.1 located on mouse chromosome 10 [10]. Thus, dot-matrix analysis followed by computer-aided manual inspection enabled us to identify all of these KAP-related genes; this would not have been possible if only prediction software (GENSCAN, MZEF, and Grail) had been used for sequence analysis. Nonetheless, GENSCAN correctly predicted 2 KAP genes (KRTAP18.9 and KRTAP18.10) but incorrectly predicted 1 KAP gene (KRTAP18.1) as a part of a multiple-exon gene (Fig. 1).

Fig. 1. The KAP gene cluster region on human chromosome 21. (A) Location of KAP gene cluster region and the summary of computer analyses indicating genes, pseudogene, DNA marker, exons (predicted by GENSCAN, MZEF, and Xgrail), repeats, and GC content. Repeats are classified into seven types, i.e., Alu (red), MIR (green), LINE (blue), LTR (yellow), DNA element (aqua), simple repeat or low complexity (gray), and other (olive). Predictions by Xgrail are categorized with different colors into "excellent" (green), "good" (blue), and "marginal" (red). Two colors in predictions by GENSCAN denote an independent gene unit instead of exons. Lines above GC content diagram indicate the direction from centromere to telomere and lines below indicate reverse direction. Eight known genes are TRPM2/TRPC7 (Accession No. AB001535), C21orf30 (Accession No. AL117578), C21orf90 (Accession Nos. AF426269 and AF426270), C21orf29/TSPEAR (Accession No. AJ487962), UBE2G2 (Accession No. AF032456), SMT3H1 (Accession No. X99584), C21orf1 (Accession No. Z50022), and ITGB2 (Accession No. M15395) and a pseudogene is IMMTP (Accession No. AP001755). (B) Dot-matrix analysis of the 21q22.3 region. Human 200-kb KAP gene cluster sequence was located on both horizontal and vertical axes. The strength of each dot reflects the sequence homology within a 50-nucleotide sliding window. Lines diagonal to both axes represent high sequence homology. KAP genes were categorized into putative gene (red) and pseudogene (aqua).

A cluster of genes within introns of another gene

Recently, a novel cDNA for TSPEAR (thrombospondin-N domain and epilepsy-associated repeat domain) has been reported [16]. The TSPEAR cDNA corresponds to the full-length cDNA for a putative gene, C21orf29 [1]. Although the TSPEAR cDNA sequence (GenBank Accession No. AJ487962) was reported to have been reconstructed on the basis of a human IMAGE cDNA clone (GenBank Accession No. BC021197), EST sequences from various organisms, and the corresponding genome sequence, precise information was not provided [16]. We therefore performed a homology search of the EST databases of various species and found 22 mammalian ESTs: 14 human, 4 bovine, 3 mouse, and 1 sheep. Since exon 1 of the TSPEAR gene was supported by only 1 bovine EST, we performed RT-PCR on cDNAs from various human tissues using a pair of primers located in exon 1 and exon 2 of the TSPEAR gene (Fig. 2). We detected amplification of TSPEAR cDNA from mRNAs of various tissues including kidney, pancreas, heart, lymph node, and fetal lung but not from hair root (data not shown). We therefore concluded that the existence of the TSPEAR cDNA is supported by experimental evidence, including one IMAGE clone (Accession No. BC021197), two bovine ESTs (Accession Nos. BM088875 and BI847891), and a human cDNA fragment obtained by RT-PCR in this study (Fig. 2). We located the TSPEAR gene in a region overlapping the KAP gene cluster. More detailed sequence analysis revealed an intriguing genomic organization in which all 16 putative KAP genes and 5 KAP-related sequences (pseudogenes) are located within a larger gene, C21orf29/TSPEAR, specifically in its intron 1 or intron 2 (Fig. 2).

Expression of KAP-related genes in the 21q22.3 region

We examined the expression of KAP transcripts by RT-PCR with various primer sets specific to each KAP gene (Table 1).

Page 4: A cluster of 21 keratin-associated protein genes within introns of another gene on human chromosome 21q22.3

Table 1
A cluster of KAP genes on 21q22.3

Gene or transcript | cDNA accession number | Primer position | Sense primer | Anti-sense primer | Annealing temp. (°C) | Enzyme (a) | PCR product (bp) | CDS (bp) | G+C of CDS (%) | M.W. (Da) | Cys (%) | Ser (%) | Pro (%) | Isoelectric point
KRTAP18.1 | AB076347 | 254806–255816 (b) | TCACTCACTCACACCTCCCG | GAGACACGGGGACCCGTCCT | 55 | HF | 1011 | 849 | 67.05 | 28,663.28 | 26.59 | 20.56 | 11.70 | 7.25
KRTAP18.2 | AB076348 | 266056–267126 (b) | TCACTCACTCACACACCTCCCC | ATCCCCAACCAGCGACCAGCGA | 55 | HF | 1071 | 768 | 66.19 | 25,614.67 | 27.05 | 22.35 | 10.19 | 6.96
KRTAP18.2s | AB076349 | 266056–267126 (b) | TCACTCACTCACACACCTCCCC | ATCCCCAACCAGCGACCAGCGA | 55 | HF | 366 | 381 | 67.52 | 13,398.10 | 3.17 | 15.87 | 14.28 | 5.86
KRTAP18.3 | AB076350 | 273645–274381 (b) | TCACTCACTCACGTCTCCCC | AACTCTGGAGAAACGGGACC | 55 | HF | 737 | 666 | 67.97 | 22,403.29 | 25.33 | 19.90 | 12.66 | 7.38
KRTAP18.4 | AB076351 | 289351–290732 (b) | AGCTCAACCCCCAGCACAGCA | GTCAAAGTGCAGGAGCAATTC | 55 | HF | 1382 | 1206 | 65.47 | 40,427.45 | 27.43 | 21.44 | 11.47 | 6.58
KRTAP18.5 | AB076352 | 295309–296226 (b) | CCAGCTCACGTCTTCCCCAC | CCTAACCCGAGTCAGGACCA | 55 | HF | 918 | 816 | 66.11 | 27,566.05 | 26.93 | 19.55 | 12.17 | 6.78
KRTAP18.6 | AB076353 | 306899–308136 (b) | CTCCACCAGTTCAACCCCAGCAT | TAAGACAAAGAGCCTGCCCCAT | 65 | EL | 1238 | 1098 | 65.42 | 36,779.97 | 26.57 | 22.46 | 10.13 | 5.87
KRTAP18.7 | AB076354 | 316253–317435 (b) | CATCTCCTCCAGTTCAATCC | TCAGGCTTTGGATGATCTTAAG | 55 | HF | 1168 | 1113 | 65.06 | 37,370.88 | 26.48 | 20.54 | 11.35 | 6.80
KRTAP18.8 | AB076355 | 327752–328627 (b) | ACCACCCAGTCCAGCACCCA | AGGACAGGACCGGAGCCGGC | 55 | HF | 876 | 780 | 65.97 | 26,316.62 | 25.48 | 19.30 | 10.81 | 7.30
KRTAP18.9 | AB076356 | 3797–4975 (c) | TCACACACTCACTTACACCTCC | CGTCCCCAACCAGCGACCAGCG | 55 | HF | 1186 | 879 | 66.77 | 29,975.63 | 25.34 | 19.52 | 11.98 | 7.24
KRTAP18.9s | AB076357 | 3797–4975 (c) | TCACACACTCACTTACACCTCC | CGTCCCCAACCAGCGACCAGCG | 55 | HF | 468 | 480 | 68.98 | 16,335.51 | 6.91 | 18.86 | 15.09 | 8.05
KRTAP18.10 | AB076358 | 14045–14966 (c) | TCACTCACTCACACACCTCCCC | CAAGACAAAGAGCCTGCCCCAC | 55 | HF | 922 | 756 | 64.96 | 25,569.59 | 24.30 | 20.31 | 11.15 | 6.86
KRTAP18.11 | AB076359 | 23088–24094 (c) | ACACTCACTTACACCTCCCCCA | TCCTGAGACTGGAGAATCCTGC | 65 | EL | 1007 | 897 | 64.83 | 30,264.94 | 26.17 | 21.47 | 10.40 | 7.22
KRTAP12.4 | AB076360 | 30887–31333 (c) | AGACCAGCCCTGTCCTCTGCG | GGAGTTCAGAGAGCCTGCTGG | 55 | HF | 447 | 339 | 65.31 | 11,432.14 | 23.21 | 11.60 | 16.07 | 7.24
KRTAP12.3 | AB076361 | 34606–35015 (c) | CAGACATCACCATCCTCCTCCC | TCTGGGGGTCCACCAGATGCT | 60 | EL | 410 | 291 | 64.14 | 9,965.42 | 22.91 | 17.70 | 14.58 | 7.65
KRTAP12.2 | AB076362 | 42971–43580 (c) | TTATCCAGCCACACGCCACCATG | CTGTCACATTCTCAATCCAGAA | 55 | EL | 610 | 441 | 61.14 | 14,688.75 | 21.23 | 21.91 | 13.01 | 7.58
KRTAP12.1 | AB076363 | 58341–58815 (c) | TTATCCAGCCACACGCCACCATG | AGGGCTCCAGATCATTCTATTA | 55 | EL | 475 | 291 | 61.25 | 9,737.06 | 21.87 | 19.79 | 13.54 | 7.67
KRTAP18.12 | AB076364 | 73844–74716 (c) | AGCTCAACCCCCAGCACGGCT | GAGCAGCCGAGGGGCCAGTAG | 55 | HF | 873 | 738 | 67.12 | 25,106.39 | 25.71 | 19.59 | 12.65 | 7.32
KRTAP18.13P | - | 79059–79630 (c, d) | CAGCTCCTGCACGCCCTTGT | AGTGGATAGGTAAGCCGTGGTTG | 65 | EL | 572 | - | - | - | - | - | - | -
KRTAP12.5P | - | 63030–63301 (c) | - | - | - | - | - | - | - | - | - | - | - | -
KRTAP18.14P | - | 260420–260468 (b) | - | - | - | - | - | - | - | - | - | - | - | -
KRTAP18.15P | - | 332739–332879 (b) | - | - | - | - | - | - | - | - | - | - | - | -
KRTAP18.16P | - | 337092–337151 (b) | - | - | - | - | - | - | - | - | - | - | - | -

(a) HF, Expand High Fidelity PCR System; EL, Expand Long Template PCR System.
(b) Positions of cDNA or pseudogenes on AP001754.
(c) Positions of cDNA or pseudogenes on AP001755.
(d) KRTAP18.13P region homologous to other KRTAP18’s locates between 78891 and 79541 on AP001755.


Fig. 2. Genomic organization of the KAP gene cluster region. Horizontal scale is consistent with that of Fig. 1A. Arrows indicate transcriptional direction. Vertical bars on the top and the second lines indicate 16 KAP genes, 5 KAP pseudogenes, and the IMMTP pseudogene, for which the numbers above or below the bars indicate the name of the respective KAP gene. "P" after a number indicates a KAP pseudogene. The numbers above the vertical bars on the third line indicate exons of the C21orf29/TSPEAR gene, which were determined based on TSPEAR cDNA sequence (Accession No. AJ487962). A human IMAGE clone (Accession No. BC021197), a bovine EST (Accession No. BM088875), and another bovine EST (Accession No. BI847891) are shown in the fourth, fifth, and sixth lines, respectively. Arrows on both ends of the dashed line indicate the region confirmed by RT-PCR in this study.


We detected at least one RT-PCR product each with the

size expected from primer positions on the genomic sequence

for all 16 putative KAP genes using an mRNA preparation

from hair root cells (Fig. 3A, lanes +). More importantly,

these RT-PCR products contained the predicted coding

sequences (see below). Thus, all 16 putative KAP genes

were confirmed to be active for transcription.

Smaller RT-PCR products were also detected for two KAP genes (KRTAP18.2 and KRTAP18.9) (Fig. 3A). Sequencing analysis confirmed that these transcripts are spliced forms (KRTAP18.2s and KRTAP18.9s) (Fig. 3A, indicated by gray and black arrows, respectively). The splice donor and acceptor sites of KRTAP18.2s and KRTAP18.9s were consistent with the consensus "GT–AG" sequence. In addition, one of the five pseudogenes (KRTAP18.13P), which has an abnormal coding sequence, generated a significant amount of transcript (Fig. 3A).

Next, we examined expression of some KAP-related

genes (KRTAP18.1, KRTAP18.2, KRTAP18.11, and

KRTAP18.12) in various human tissues other than hair root

cells. Expression of these KAP genes was not detected in

human skin (Fig. 3B) or in any of 27 human tissues examined

(Fig. 3C). Moreover, KRTAP18.12 could be amplified even

from diluted hair root cDNA (40 and 4 pg) (Fig. 3C). Thus,

expression of KAP genes seems restricted to hair root cells at

the RT-PCR level.

Isolation of KAP cDNAs and their sequence analysis

We have cloned cDNA fragments that were amplified

by RT-PCR for all 16 KAPs using mRNAs from hair root

cells and determined their nucleotide sequence. The sizes

of the ORFs of the 16 KAP cDNAs ranged from 291 to

1206 bp as an intronless form (Table 1). The deduced

amino acid sequences revealed that the 16 human KAPs

are rich in Ser and Pro as well as Cys (Table 1). The

amino acid sequence deduced from the spliced forms

(KRTAP18.2s and KRTAP18.9s) revealed several features

different from active KAP genes, especially in the content

of Cys (Table 1). However, these proteins produced from

spliced mRNAs may not function because only the first

26 or 59 amino acids at the N-terminal portions are

identical with active KAPs (KRTAP18.2 and KRTAP18.9)

and most of the ORFs are derived from the 3′ UTR

(Fig. 4).

Moreover, in the process of cDNA sequence analysis, several polymorphisms were detected (Table 2). The polymorphisms of the KAP genes included single-base substitution at various positions and a 15-bp insertion and a 7-bp deletion at two positions.

Fig. 3. Expression patterns of KAP genes. (A) RT-PCR analysis of KAP genes in hair roots. Gray and black arrows indicate KRTAP18.2s and KRTAP18.9s, respectively. A band indicated by an asterisk in the blot for KRTAP18.2 (lane +) was KRTAP18.9s apparently generated by false priming of the primer set for KRTAP18.2. (B) RT-PCR analysis of KAP genes in skin. "+" and "−" indicate the presence or absence of reverse transcriptase. "G" indicates genomic DNA. "C" indicates G3PDH (glyceraldehyde-3-phosphate dehydrogenase) cDNA. RT-PCR using G3PDH primers was used as positive control. Detailed procedure is described under Material and methods. (C) RT-PCR analysis of KRTAP18.12 in 27 human tissues (0.2 ng cDNA each) and diluted hair root cDNA (40 and 4 pg).

Fig. 4. Two spliced forms, KRTAP18.2s and KRTAP18.9s. Bands with smaller size were detected in KRTAP18.2 and KRTAP18.9. Open box, closed black box, hatched box, and dotted box indicate noncoding region, KAP-coding region, coding region translated only in spliced form, and coding region predicted from genomic sequence (Accession Nos. AP001754 and AP001755) only in spliced forms, respectively.

To determine the structural features of KAPs, we have

compared the amino acid sequences of every member of

the subfamilies (Fig. 5). For 12 members of the KAP18

subfamily, it was found that the ORFs contain several

repetitive units with diverse numbers and sequences (Fig.

5A). The KAP18 members contain several common

motifs (CC[R/Q]P[S/T], CCRT, and CC[V/K]P) that are

frequently found in the ultrahigh-sulfur-type KAPs [2],

and hence they may be considered as ultrahigh-sulfur

type rather than high-sulfur type. Four members of the

KAP12 subfamily also contain common motifs such as

CCVP, CCQP, and CCQPS, but the density of these

motifs is much lower than in KAP18 (Fig. 5B). These KAPs possess four to six units of ~20 amino acids with the consensus sequence CQX3CX4CX4CX3S, where Xn denotes n residues of any amino acid. Similar direct repeats of 10 amino acids with a CQ motif at the N-terminus are also found in the mouse KAP13 protein [11] (Accession No. AF031485).
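The motifs and the repeat consensus quoted above translate directly into regular expressions. The following Python sketch is only an illustration: the patterns are transcribed from the text, while the function names and the choice to count non-overlapping matches are assumptions.

```python
import re

# Motifs quoted in the text, written as regular expressions.
MOTIFS = {
    "CC[R/Q]P[S/T]": re.compile(r"CC[RQ]P[ST]"),
    "CCRT": re.compile(r"CCRT"),
    "CC[V/K]P": re.compile(r"CC[VK]P"),
}

# Consensus repeat unit CQX3CX4CX4CX3S (20 residues, X = any amino acid).
REPEAT_UNIT = re.compile(r"CQ.{3}C.{4}C.{4}C.{3}S")


def count_motifs(protein: str) -> dict:
    """Count non-overlapping occurrences of each motif in a protein sequence."""
    protein = protein.upper()
    return {name: len(pattern.findall(protein)) for name, pattern in MOTIFS.items()}


def find_repeat_units(protein: str) -> list:
    """Return 0-based start positions of non-overlapping consensus repeat units."""
    return [m.start() for m in REPEAT_UNIT.finditer(protein.upper())]
```

Dividing the motif counts by sequence length gives a rough motif density, which is the quantity the comparison between KAP18 and KAP12 members refers to.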

Sequence analysis of 5′ upstream regions of KAP genes

We analyzed the 5′ upstream sequences of the KAP genes by sequence alignment using the program CLUSTALW [17]. TATA-like sequences were found between −121 and −68 bp upstream of each initiation codon in the KRTAP18 subfamily members (Fig. 6A) and at −75 or −74 bp upstream of each initiation codon in the KRTAP12 subfamily members (Fig. 6B). Because KRTAP18.12 carries a four-and-a-half-fold expansion of an approximately 40-bp sequence, its TATA-like sequence is exceptionally located at −237 bp upstream of the initiation codon (Fig. 6A). No typical CpG islands were identified in the 5′ upstream regions of the KAP genes.

To search further for a common sequence motif that might specifically regulate the expression of the KAP genes, we analyzed the 5′ upstream sequences of all 16 KAP genes using the Gibbs sampler program [18,19]. This analysis revealed a highly conserved sequence with the 14-bp consensus CAVCAACAAGGAAG (V = A, C, or G) located between −64 and −51 bp upstream of each TATA-like sequence (Fig. 6C). Analysis with MatInspector [20] indicated that the 14-bp consensus sequence is not identical to any known transcription factor binding site and could therefore represent a new type of binding site.
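A degenerate consensus such as the one reported here can be scanned for by expanding the IUPAC code V into a character class. The Python sketch below illustrates this under stated assumptions: the consensus string is taken from the text, whereas the function names and the exact-match criterion are illustrative, since real promoter elements usually tolerate mismatches.

```python
import re

# Only the IUPAC codes needed for this consensus are listed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "[ACG]"}

CONSENSUS = "CAVCAACAAGGAAG"  # 14-bp consensus quoted in the text


def consensus_to_regex(consensus: str):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def find_sites(upstream: str, consensus: str = CONSENSUS) -> list:
    """Return (0-based start, matched sequence) for each exact consensus match."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group()) for m in pattern.finditer(upstream.upper())]
```

For example, `find_sites("TTTCAGCAACAAGGAAGTTT")` reports one site starting at position 3, whereas a sequence with T at the degenerate position is not matched.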

Evolutionary relationship of mammalian KAP family

members

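To elucidate the evolutionary relationship between known mammalian KAPs and the human KAPs identified in this study, a phylogenetic tree was constructed (Fig. 7). The tree revealed that subfamilies KAP18 and KAP12 are extremely closely related, which distinguishes them from the other subfamilies. This suggests that the KAP genes in these two subfamilies may have been generated by duplication of a common ancestral gene, although the amino acid sequences of the present-day KAPs are distinct (Fig. 5).

Table 2
Polymorphisms identified in this study

Gene name | Position (a) | Base | Codon | Amino acid
KRTAP18.1 | 116 | C | CCG | P
KRTAP18.1 | 116 | T | CTG | L
KRTAP18.1 | 937 | G | 3′ UTR | -
KRTAP18.1 | 937 | T | 3′ UTR | -
KRTAP18.2 | 43 | A | AAC | N
KRTAP18.2 | 43 | G | GAC | D
KRTAP18.2 | 319 | C | CCT | P
KRTAP18.2 | 319 | A | ACT | T
KRTAP18.2 | 324 | G | GTG | V
KRTAP18.2 | 324 | C | GTC | V
KRTAP18.2 | 349 | G | GCT | A
KRTAP18.2 | 349 | C | CCT | P
KRTAP18.2 | 530 | T | CTG | L
KRTAP18.2 | 530 | C | CCG | P
KRTAP18.2 | 624 | G | CCG | P
KRTAP18.2 | 624 | A | CCA | P
KRTAP18.2 | 721 | C | CGC | R
KRTAP18.2 | 721 | G | GGC | G
KRTAP18.2 | 859 | G | 3′ UTR | -
KRTAP18.2 | 859 | A | 3′ UTR | -
KRTAP18.2 | 925 | T | 3′ UTR | -
KRTAP18.2 | 925 | C | 3′ UTR | -
KRTAP18.2 | 936 | G | 3′ UTR | -
KRTAP18.2 | 936 | C | 3′ UTR | -
KRTAP18.2 | 958 | T | 3′ UTR | -
KRTAP18.2 | 958 | G | 3′ UTR | -
KRTAP18.2 | 967 | G | 3′ UTR | -
KRTAP18.2 | 967 | T | 3′ UTR | -
KRTAP18.4 | 184 | C | CGT | R
KRTAP18.4 | 184 | T | TGT | C
KRTAP18.4 | 216 | T | TGC | C
KRTAP18.4 | 216 | C | TGT | C
KRTAP18.4 | 475 | A | ATC | I
KRTAP18.4 | 475 | G | GTC | V
KRTAP18.5 | 58 | G | GAC | D
KRTAP18.5 | 58 | A | AAC | N
KRTAP18.5 | 703 | G | GTG | V
KRTAP18.5 | 703 | C | CTG | L
KRTAP18.5 | 803 | C | CCC | P
KRTAP18.5 | 803 | G | CGC | R
KRTAP18.6 | 81 | T | TCT | S
KRTAP18.6 | 81 | C | TCC | S
KRTAP18.6 | 126 | C | TGC | C
KRTAP18.6 | 126 | T | TGT | C
KRTAP18.6 | 144–145 | 15-bp insertion (b) | - | -
KRTAP18.6 | 615 | A | ACA | T
KRTAP18.6 | 615 | G | ACG | T
KRTAP18.6 | 621 | A | TCA | S
KRTAP18.6 | 621 | G | TCG | S
KRTAP18.6 | 679 | G | GCC | A
KRTAP18.6 | 679 | A | ACC | T
KRTAP18.9 | 330 | T | TGT | C
KRTAP18.9 | 330 | C | TGC | C
KRTAP18.9 | 545 | G | TGC | C
KRTAP18.9 | 545 | A | TAC | Y
KRTAP18.9 | 622 | T | TTG | L
KRTAP18.9 | 622 | C | CTG | L
KRTAP18.9 | 769 | T | TGC | C
KRTAP18.9 | 769 | C | CGC | R
KRTAP18.9 | 837 | T | CCT | P
KRTAP18.9 | 837 | G | CCG | P
KRTAP18.9 | 1044 | C6 | 3′ UTR | -
KRTAP18.9 | 1044 | C7 | 3′ UTR | -
KRTAP18.9 | 1055 | G | 3′ UTR | -
KRTAP18.9 | 1055 | A | 3′ UTR | -
KRTAP18.9 | 1070 | G | 3′ UTR | -
KRTAP18.9 | 1070 | T | 3′ UTR | -
KRTAP18.9 | 1079 | T | 3′ UTR | -
KRTAP18.9 | 1079 | G | 3′ UTR | -
KRTAP18.9 | 1109–1115 | 7-bp deletion | 3′ UTR | -
KRTAP18.10 | 214 | A | ACC | T
KRTAP18.10 | 214 | C | CCC | P
KRTAP18.11 | 638 | A | TAT | Y
KRTAP18.11 | 638 | C | TCT | S
KRTAP12.4 | 330 | T | ACT | T
KRTAP12.4 | 330 | C | ACC | T
KRTAP12.3 | 50 | G | CGC | R
KRTAP12.3 | 50 | A | CAC | H

(a) "A" of the initiation codon, ATG, in each KAP gene is regarded as +1.
(b) Inserted sequence is CCCAGCTGCTGCGCC.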

Discussion

Gene clusters by tandem gene duplication

It has been postulated that tandem gene duplication has contributed to the diversification of the structure and function of a number of gene clusters such as the T cell receptor [21], IgH [22], IgE [23], and Igκ [24]. Similarly, the diversity of KAPs may have been generated through tandem gene duplication. Why such diversity was necessary for establishing the physiological function of hair-forming KAP molecules during evolution remains an intriguing question.

Classification of KAP genes

The 16 KAP genes identified in this study are classified into two subfamilies, one consisting of 12 KAP genes and the other of 4 KAP genes. Since the genes of the first subfamily do not resemble any known KAP genes from other mammalian species, we designated them as a new group, KRTAP18.1–18.12. The second subfamily, KRTAP12.1–12.4, was named after the mouse Krtap12.1 gene because of their extreme similarity [10]. However, there are some discrepancies in naming (Fig. 7), and hence the nomenclature of the KAP genes will need to be reconsidered in the near future.


Fig. 5. Multiple alignments of the amino acid sequences of KAPs on 21q22.3. (A) Multiple alignment of the amino acid sequences of 12 KAP18’s. (B) Multiple alignment of the amino acid sequences of 4 KAP12’s. White on black background, black on light gray background, and black on dark gray background signify identical, conserved, and similar amino acids, respectively. Colored sequences represent common sequence motifs reported previously [2].


Coding sequences

To predict coding sequences, we first chose the longest possible ORF between the TATA-like sequence and the putative poly(A) signal in the genomic sequence of each KAP gene. The TATA-like sequences of all KAP genes are located ca. 100 bp upstream of the putative initiation codon, whereas the putative poly(A) signals are located ca. 290 bp downstream of the termination codon in the genomic sequence. We then performed RT-PCR analysis using primer sets (Table 1) designed to amplify the entire region of the longest ORF to examine whether it is transcribed. For all 16 putative KRTAP genes (12 KRTAP18 genes and 4 KRTAP12 genes), RT-PCR revealed that mRNAs including the entire ORF are in fact transcribed. However, we could not rule out the possibility that the coding frame starts from a common motif, MS[V/I]CSS, which is conserved in all KRTAP18 genes (Fig. 5A). For the KRTAP12 genes, we concluded that all of them have well-conserved ORFs that start with the common sequence MCHTS (Fig. 5B).
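Choosing the longest ORF in a genomic interval can be sketched as follows. This Python example is a simplified illustration rather than the procedure actually used: it scans only the forward strand, requires an ATG start and one of the three standard stop codons, and the function name and return convention are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> tuple:
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and the stop codon is included.
    (0, 0) is returned when no complete ORF is found.
    """
    dna = dna.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first ATG after the previous stop opens the ORF
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best
```

In the setting described above, the input would be the genomic segment between the TATA-like sequence and the putative poly(A) signal, and the returned interval would then guide primer design for RT-PCR confirmation.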

Moreover, the C-terminal amino acid of all the KAPs is

Cys with one exception, KAP18.4 (Fig. 5). For KAP18.4,

we speculated that the termination codon TGA after Leu401

may have been TGT or TGC (for Cys) and that the third

letter may have been substituted with ‘‘A’’ during evolution,

because neighboring amino acids of Leu401 in most other

KAP18 members are Cys. Before this substitution took

place, KAP18.4 may have had 10 additional amino acids,

CRPVCSRPAC, at the C-terminal, which is very similar to

some of the other KAP18’s. However, it is unclear how this

C-terminal truncation influences the function of KAP18.4.

Expression specificity

Expression of the KAP genes on 21q22.3 was specific to hair root cells and could not be detected in any other human tissues at the RT-PCR level (Fig. 3). A BLAST search of the nucleotide sequence databases (DDBJ, EMBL, and GenBank) revealed several human ESTs that match KRTAP18.2, KRTAP18.11, and KRTAP18.12. Two ESTs (Accession Nos. W70177 and W69912) matching KRTAP18.11 are registered as derived from fetal heart, whereas six ESTs (Accession Nos. BF057518, BF058205, BF057369, BF515903, BF221518, and BM719379) matching KRTAP18.12 are registered as derived mainly from fibrothecoma of the ovary. For another EST (Accession No. AJ003277), which matches KRTAP18.2, no tissue source is given. Since KAP is a major component of mammalian hair, it is unlikely that the reported expression in these tissues has any physiological significance. In fact, expression in these tissues could not be confirmed by our expression study (Fig. 3C).

The 14-bp consensus sequence CAVCAACAAGGAAG (V = A, C, or G) identified in the 5′ upstream region includes the UHS-1 motif (ACAAGGAA), which is found in the promoter region of hair ultrahigh-sulfur-type KAP genes expressed in human cuticle, mouse cortex/cuticle, and rabbit cortex [25]. The UHS-1 motif is considered a possible regulatory sequence in the specific expression of KAP genes in hair root cells.
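As an illustration of how such a degenerate consensus can be searched for, the Python sketch below converts IUPAC nucleotide codes into a regular expression. The function names and the test sequence are hypothetical; the consensus and the UHS-1 motif are those given in the text.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

CONSENSUS = "CAVCAACAAGGAAG"  # 14-bp consensus from the text
UHS1 = "ACAAGGAA"             # UHS-1 motif from the text


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def find_motif(seq, consensus=CONSENSUS):
    """Return 0-based start positions of non-overlapping consensus matches."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(seq.upper())]


# The UHS-1 motif occupies positions 6-13 (1-based) of the consensus.
print(CONSENSUS.find(UHS1))              # 5 (0-based offset)

# Hypothetical upstream fragment with V = G at the degenerate position.
print(find_motif("TTCAGCAACAAGGAAGTT"))  # [2]
```

Because `finditer` reports non-overlapping matches only, overlapping occurrences would need a lookahead pattern; for a 14-bp motif in a promoter region this is unlikely to matter.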

Genes within a gene

It is striking that all the KAP gene family members are located within introns of another gene, C21orf29/TSPEAR. The transcriptional direction of eight KAP genes is the same as that of C21orf29/TSPEAR, whereas the remaining eight KAP genes are transcribed in the opposite direction (Fig. 2). The latter situation has been reported in several cases, but the former has not [26]. Thus, further study of the former situation may uncover a new mechanism of transcriptional regulation of the C21orf29/TSPEAR gene. RNA polymerase II must pass at least eight transcriptional termination sites of the KAP genes while C21orf29/TSPEAR is being transcribed. Three possibilities can be considered: (1) Incomplete mRNAs may be synthesized from C21orf29/TSPEAR by stopping at every transcriptional termination site of the KAP genes. In this case, a poly(A) tail would be added to the 3′ end of the incomplete mRNAs in the same way in which KAP mRNAs are polyadenylated, and they would thus become good templates for RT-PCR of KAP mRNA. Since PCR products were not detected by RT-PCR using appropriate primer sets for KAP genes in the tissues that express C21orf29/TSPEAR mRNA (kidney, pancreas, heart, lymph node, and fetal lung), this possibility may be ruled out. (2) There may be special sites that are recognized only when KAP gene transcription is taking place and are not recognized as transcriptional termination sites for C21orf29/TSPEAR. (3) There may be a novel mechanism to pass over the transcriptional termination sites of KAP genes when C21orf29/TSPEAR gene transcription is taking place. We believe this possibility is most likely because it was demonstrated that a poly(A) signal introduced into the intron of the rabbit β-globin gene does not function efficiently in an experimental system [27], and this was thought to be due to the predominance of splicing over polyadenylation [28]. Thus, transcriptional termination sites of KAP genes in the introns of the C21orf29/TSPEAR gene may not be recognized as termination signals when C21orf29/TSPEAR is being transcribed, which would prevent the production of incomplete mRNA.

Fig. 6. Multiple alignments of the nucleotide sequences in the 5′ upstream region of the KAP genes. (A) Multiple alignment of the nucleotide sequences in the 5′ upstream region of the KRTAP18 genes. Sequences colored in red or green mark the repetitive units that make up the inserted expansion in KRTAP18.12. (B) Multiple alignment of the nucleotide sequences in the 5′ upstream region of the KRTAP12 genes. Nucleotides are numbered with the “A” of the initiation codon as position +1. White on black background, black on light gray background, and black on dark gray background signify identical, conserved, and similar nucleotides, respectively. Pink color indicates the putative TATA-like sequence. The double-headed arrow indicates the region, other than the TATA-like sequence, that is highly conserved among the genes. (C) Multiple alignment of the highly conserved sequences among the 16 active KAP genes. Numbering and shading are as in (B). Values in parentheses at the extreme right represent the probability of finding a conserved sequence motif.

Conclusion

In this study, we established the genomic organization of 21 KAP-related genes, including 16 active KAP genes, in the human chromosome 21q22.3 region. To our knowledge, 97 genes have been identified for the growth and formation of human hair. They include 9 type I KIF genes on 17q12–q21 [29], 6 type II KIF genes on 12q13 [30], 37 KAP genes on 17q12–q21 [15], 16 KAP genes on 21q22.3 (this study), and 30 KAP genes on 21q22.11 (I. Obayashi et al., unpublished). Such multigene families appear consistent with the previous estimate that 50–100 keratin-related genes (KIFs and KAPs) are expressed in the growing hair [25,31]. In this regard, comparative DNA sequence analysis of KAP gene clusters among primates, rodents, and even marine mammals will be valuable for tracing the origin and evolution of mammalian hair.

Materials and methods

Computer analysis of DNA and protein sequences

Computer analysis was performed on a genomic DNA sequence of 200 kb (Accession Nos. AP001754 and AP001755), which we determined and assembled [1]. Repetitive elements in the genomic sequence were masked through the RepeatMasker2 Web server (Smit and Green; RepeatMasker at http://ftp.genome.washington.edu/RM/RepeatMasker.html). Exon prediction was carried out using Xgrail version 1.3c [32], GENSCAN [33], and MZEF [34]. Homology searches of nucleotide and amino acid sequences were carried out through the BLAST server (http://www.ncbi.nlm.nih.gov/BLAST/) at NCBI using BLASTN, BLASTP, and BLASTX [35]. Multiple alignment of DNA and amino acid sequences and drawing of the phylogenetic tree using the neighbor-joining method were performed with CLUSTALW version 1.81 [17]. Graphic representation of the alignment of amino acid and nucleotide sequences was obtained with the program BOXSHADE 2.15 (available at http://www.ch.embnet.org/software/BOX_form.html). Dot-matrix analysis was carried out using the DOTTER program [36] with dynamic threshold control. Analysis of conserved DNA sequence motifs was carried out using the Gibbs sampler program [18,19]. The search for transcription factor binding sites was performed with MatInspector [20].

Fig. 7. Phylogenetic tree of mammalian KAPs including human KAPs identified in this study. This tree was constructed using the neighbor-joining method in CLUSTALW version 1.81 [17]. Database accession numbers are as follows: mouse KAP16.9, AF345299; mouse KAP16.2, AF345292; mouse KAP16.5, AF345295; mouse KAP16.1, AF345291; mouse KAP16.3, AF345293; mouse KAP16.4, AF345294; mouse KAP8.2/HGTpI.a, D86422; mouse HGTpII.3, D89901; mouse KAP16.8, AF345298; mouse KAP6.1/HGTpII.1, D86420 or D86421; mouse HGTpII.2, D86419; rabbit KAP6.1, M95718; sheep KAP6.1, M95719; mouse KAP16.7, AF345297; mouse KAP6.2/HGTpII.4, D89902; mouse HGTpIF, D86423; mouse KAP14/Pmg-1, AF003691; mouse KAP15/Pmg-2, AF162800; mouse KAP13, AF031485; mouse KAP5.4, M37760; mouse KAP5.1, M37759; mouse KAP9.1, M27685; rabbit KAP4L, X80035; mouse KAP12.1, AF081797; sheep KAP2.3/BIIIA3, U60024; mouse high-sulfur keratin protein, D86424; mouse KAP11.1/Hacl-1, U03686; human KAP1.1, AB052934; human KAP1.7, AB055057; human KAP1.6, AB052868; rat high-sulfur protein B2F, AB003753; and rat high-sulfur protein B2E, AB003753; for human KAP17.1, KAP9.4, KAP9.8, KAP9.3, KAP9.2, KAP9.9, KAP4.2, KAP4.12, KAP4.5, KAP4.7, KAP4.15, KAP4.14, KAP4.13, KAP4.4, KAP4.10, KAP2.4, KAP3.3, KAP3.2, KAP3.1, KAP1.5, KAP1.3, KAP9.7, KAP9.5, KAP9.6, KAP9.1, KAP4.6, KAP4.11, KAP4.9, KAP4.8, KAP4.3, KAP4.1, KAP2.3, KAP2.1A/B, KAP16.1, and KAP1.4, see [15].

Preparation of RNAs from human radix pili cells and dermal cells

One hundred fifty pieces of hair were collected from 9 females and 11 males after receiving informed consent, and only the hair roots were excised and kept in liquid nitrogen until use. Total RNA was extracted using the RNeasy Mini Kit (Qiagen). Briefly, frozen samples were solubilized in a denaturing buffer containing guanidine isothiocyanate and homogenized with intense stirring, and total RNA was then purified using the RNeasy mini spin column. To exclude contamination by genomic DNA, purified RNA was treated with 5 U of RNase-free DNase I (Nippon Gene) at 37°C for 10 min, and DNase I was then inactivated by heat treatment at 80°C for 10 min. In total, approximately 30 µg of RNA was purified from 150 pieces of hair. A dermal piece of approximately 36 mg was surgically obtained from the brachium of one of the authors. The dermal sample was frozen in liquid nitrogen, ground using a Cryo-Press CP-50W (Microtec Co., Ltd.), and solubilized in a denaturing buffer containing guanidine isothiocyanate. Following the same procedure used for hair roots, approximately 6.2 µg of RNA was purified from 36 mg of dermis.

RT-PCR of KAP cDNAs and partial TSPEAR cDNA

A 12-µl mixture containing 1 µg of total RNA and 50 pmol of oligo(dT)18VN primer [5′-T18V(A or C or G)N-3′] was incubated at 65°C for 10 min; chilled on ice; mixed with 4 µl of 5× first-strand buffer, 2 µl of 0.1 M DTT, and 1 µl of 10 mM dNTP; and then incubated at 42°C for 2 min. One µl (200 units) of SuperScript II RNase H− reverse transcriptase (Invitrogen) was added to the mixture and cDNA was synthesized at 42°C for 1 h. Then, the reaction was terminated at 70°C for 15 min. RNase H (2 units) was added to the mixture, which was incubated at 37°C for 20 min to degrade RNA. PCR was performed using the Expand High Fidelity PCR System (Roche) or the Expand Long PCR System (Roche) with the primer sets shown in Table 1. Templates used for PCR included hair root cDNA (1 ng), hair root total RNA (25 ng), dermal cDNA (1 ng), and dermal total RNA (25 ng), as well as genomic DNA (0.5 ng) as a control. PCR was performed according to the manufacturer’s protocol under the conditions of 94°C for 30 s; 55, 60, or 65°C for 1 min; and 72°C for 2 min for 39 cycles in an automated thermal cycler (Trio thermoblock 48, Biometra). The PCR conditions for each KAP gene are summarized in Table 1. Other templates included human tissue cDNAs (0.2 ng each) from Human Multiple Tissues cDNA panels (Clontech; I, II, fetal, and immune system panels) containing cDNA from 27 human tissues. For KRTAP18.12, templates used for additional PCR included diluted hair root cDNA (40 and 4 pg). PCR conditions were the same as described above, except that 35 cycles were used instead of 39.
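The final concentrations in the first-strand reaction follow from the stated volumes by simple dilution arithmetic. The Python sketch below assumes that the reaction volume is the sum of the listed components (20 µl) and ignores the RNase H added afterward, whose volume is not stated; the variable and function names are illustrative.

```python
# Component volumes (µl) of the first-strand reaction, as listed in the text.
volumes_ul = {
    "RNA + oligo(dT) primer mix": 12.0,
    "5x first-strand buffer": 4.0,
    "0.1 M DTT": 2.0,
    "10 mM dNTP": 1.0,
    "reverse transcriptase": 1.0,
}
total_ul = sum(volumes_ul.values())  # assumed final volume: 20 µl


def diluted(stock_conc, volume_ul, total=total_ul):
    """Final concentration after diluting a stock into the total volume."""
    return stock_conc * volume_ul / total


final = {
    "buffer (x)": diluted(5.0, 4.0),          # 5x stock
    "DTT (mM)": diluted(100.0, 2.0),          # 0.1 M = 100 mM stock
    "dNTP (mM)": diluted(10.0, 1.0),          # 10 mM stock
    "primer (µM)": 50.0 / total_ul,           # 50 pmol per µl is equivalent to µM
    "RNA (ng/µl)": 1000.0 / total_ul,         # 1 µg = 1000 ng
    "enzyme (U/µl)": 200.0 / total_ul,        # 200 units added
}

for name, value in final.items():
    print(f"{name}: {value:g}")
```

Under these assumptions the reaction contains 1× buffer, 10 mM DTT, 0.5 mM dNTP, 2.5 µM primer, 50 ng/µl RNA, and 10 U/µl reverse transcriptase.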

PCR of TSPEAR cDNA was performed using KOD-Plus- DNA polymerase (Toyobo) according to the manufacturer’s protocol under the conditions of 94°C for 15 s, 55°C for 30 s, and 68°C for 2 min for 35 cycles using a pair of primers, 5′-CTGCCCTGCTGAGTCTGTGTTTTGT-3′ and 5′-AAATCCTGGATGCTGGGAAGCTCAT-3′, located on exon 1 and exon 2, respectively. Templates used for PCR included hair root cDNA (1 ng) and Marathon Ready cDNAs (0.1 ng each) (Clontech) from 25 different human tissues (fetal brain, brain cerebellum, brain cerebral cortex, brain hippocampus, brain hypothalamus, whole brain, fetal liver, liver, fetal lung, lung, fetal thymus, thymus, fetal kidney, kidney, skeletal muscle, fetal skeletal muscle, retina, prostate, pancreas, placenta, ovary, heart, lymph node, testis, and colon).
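As a rough plausibility check on the 55°C annealing step, the basic properties of the two TSPEAR primers can be computed as sketched below. The melting temperature uses a commonly cited approximation for oligonucleotides longer than 13 nt, Tm = 64.9 + 41 × (G + C − 16.4) / N, which ignores salt and primer concentration; this formula is an assumption of the sketch and is not stated to have been used in the study.

```python
PRIMERS = {
    "exon 1 (forward)": "CTGCCCTGCTGAGTCTGTGTTTTGT",
    "exon 2 (reverse)": "AAATCCTGGATGCTGGGAAGCTCAT",
}


def gc_count(oligo):
    """Number of G and C bases in an oligonucleotide."""
    oligo = oligo.upper()
    return oligo.count("G") + oligo.count("C")


def basic_tm(oligo):
    """Approximate Tm (°C) for oligos longer than 13 nt; salt effects are ignored."""
    return 64.9 + 41.0 * (gc_count(oligo) - 16.4) / len(oligo)


for name, seq in PRIMERS.items():
    gc_percent = 100.0 * gc_count(seq) / len(seq)
    print(f"{name}: {len(seq)} nt, GC {gc_percent:.0f}%, Tm ~{basic_tm(seq):.1f} °C")
```

Both primers are 25-mers with about 50% GC, and the estimates (roughly 59°C and 58°C) lie a few degrees above the 55°C annealing temperature used.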

Sequencing of KAP cDNAs and partial TSPEAR cDNA

DNA sequencing was performed as described previously [37]. Briefly, fragments amplified by RT-PCR were cloned into pBluescript II SK(+) (Stratagene) or pUC18 (TaKaRa Biochemicals), and sequencing was performed with DNA sequencers (Models 377 and 3100; Applied Biosystems) using a combination of the BigDye Terminator Cycle Sequencing Ready Reaction Kit v2.0 and the dRhodamine Terminator Cycle Sequencing FS Ready Reaction Kit (Applied Biosystems).

Acknowledgments

The authors thank Miho Tatsuyama and Marie L. Yaspo for their interest and support during the initial stage of this research. This work was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) and a Grant-in-Aid for Scientific Research and the fund for the “Research for the Future” program from the Japan Society for the Promotion of Science and MEXT.

References

[1] The Chromosome 21 Mapping and Sequencing Consortium, The DNA sequence of human chromosome 21, Nature 405 (2000) 311–319.

[2] B.C. Powell, G.E. Rogers, The role of keratin proteins and their genes in the growth, structure and properties of hair, in: P. Jolles, H. Zahn, H. Hocker (Eds.), Formation and Structure of Human Hair, Birkhauser Verlag, Basel, 1997, pp. 59–148.

[3] V. Romano, et al., Chromosomal assignments of human type I and type II cytokeratin genes to different chromosomes, Cytogenet. Cell Genet. 48 (1988) 148–151.

[4] M.A. Rogers, et al., Sequence data and chromosomal localization of human type I and type II hair keratin genes, Exp. Cell Res. 220 (1995) 357–362.

[5] M. Rosenberg, A. RayChaudhury, T.B. Shows, M.M. Le Beau, E. Fuchs, A group of type I keratin genes on human chromosome 17: characterization and expression, Mol. Cell. Biol. 8 (1988) 722–736.

[6] N. Ceratto, et al., Human type I cytokeratin genes are a compact cluster, Cytogenet. Cell Genet. 77 (1997) 169–174.

[7] N.C. Popescu, P.E. Bowden, J.A. DiPaolo, Two type II keratin genes are localized on human chromosome 12, Hum. Genet. 82 (1989) 109–112.

[8] M. Rosenberg, E. Fuchs, M.M. Le Beau, R.L. Eddy, T.B. Shows, Three epidermal and one simple epithelial type II keratin genes map to human chromosome 12, Cytogenet. Cell Genet. 57 (1991) 33–38.

[9] S.J. Yoon, J. LeBlanc-Straceski, D. Ward, K. Krauter, R. Kucherlapati, Organization of the human keratin type II gene cluster at 12q13, Genomics 24 (1994) 502–508.

[10] S.E. Cole, R.H. Reeves, A cluster of keratin-associated proteins on mouse chromosome 10 in the region of conserved linkage with human chromosome 21, Genomics 54 (1998) 437–442.

[11] M. Takaishi, Y. Takata, T. Kuroki, N. Huh, Isolation and characterization of a putative keratin-associated protein gene expressed in embryonic skin of mice, J. Invest. Dermatol. 111 (1998) 128–132.

[12] N. Aoki, K. Ito, M. Ito, Hair follicle has a novel anagen-specific protein, mKAP13, J. Invest. Dermatol. 111 (1998) 804–809.

[13] F. Kuhn, et al., Pmg-1 and pmg-2 constitute a novel family of KAP genes differentially expressed during skin and mammary gland development, Mech. Dev. 86 (1999) 193–196.

[14] A.V. Tkatchenko, et al., Overexpression of Hoxc13 in differentiating keratinocytes results in downregulation of a novel hair keratin gene cluster and alopecia, Development 128 (2001) 1547–1558.

[15] M.A. Rogers, et al., Characterization of a cluster of human high/ultrahigh sulfur keratin-associated protein genes embedded in the type I keratin gene domain on chromosome 17q12–21, J. Biol. Chem. 276 (2001) 19440–19451.

[16] H. Scheel, S. Tomiuk, K. Hofmann, A common protein interaction domain links two recently identified epilepsy genes, Hum. Mol. Genet. 11 (2002) 1757–1762.

[17] J.D. Thompson, D.G. Higgins, T.J. Gibson, CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22 (1994) 4673–4680.

[18] C.E. Lawrence, et al., Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment, Science 262 (1993) 208–214.

[19] A.F. Neuwald, J.S. Liu, C.E. Lawrence, Gibbs motif sampling: detection of bacterial outer membrane protein repeats, Protein Sci. 4 (1995) 1618–1632.


[20] K. Quandt, K. Frech, H. Karas, E. Wingender, T. Werner, MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data, Nucleic Acids Res. 23 (1995) 4878–4884.

[21] L. Rowen, B.F. Koop, L. Hood, The complete 685-kilobase DNA sequence of the human beta T cell receptor locus, Science 272 (1996) 1755–1762.

[22] F. Matsuda, et al., The complete nucleotide sequence of the human immunoglobulin heavy chain variable region locus, J. Exp. Med. 188 (1998) 2151–2162.

[23] K. Kawasaki, et al., One-megabase sequence analysis of the human immunoglobulin lambda gene locus, Genome Res. 7 (1997) 250–261.

[24] K. Kawasaki, et al., Evolutionary dynamics of the human immunoglobulin kappa locus and the germline repertoire of the Vkappa genes, Eur. J. Immunol. 31 (2001) 1017–1028.

[25] B.C. Powell, A. Nesci, G.E. Rogers, Regulation of keratin gene expression in hair follicle differentiation, Ann. N. Y. Acad. Sci. 642 (1991) 1–20.

[26] I. Dunham, et al., The DNA sequence of human chromosome 22, Nature 402 (1999) 489–495.

[27] N. Levitt, D. Briggs, A. Gil, N.J. Proudfoot, Definition of an efficient synthetic poly(A) site, Genes Dev. 3 (1989) 1019–1025.

[28] G. Adami, J.R. Nevins, Splice site selection dominates over poly(A) site choice in RNA production from complex adenovirus transcription units, EMBO J. 7 (1988) 2107–2116.

[29] M.A. Rogers, H. Winter, C. Wolf, M. Heck, J. Schweizer, Characterization of a 190-kilobase pair domain of human type I hair keratin genes, J. Biol. Chem. 273 (1998) 26683–26691.

[30] M.A. Rogers, H. Winter, L. Langbein, C. Wolf, J. Schweizer, Characterization of a 300 kbp region of human DNA containing the type II hair keratin gene domain, J. Invest. Dermatol. 114 (2000) 464–472.

[31] P.J. MacKinnon, B.C. Powell, G.E. Rogers, Structure and expression of genes for a class of cysteine-rich proteins of the cuticle layers of differentiating wool and hair follicles, J. Cell Biol. 111 (1990) 2587–2600.

[32] E.C. Uberbacher, Y. Xu, R.J. Mural, Discovering and understanding genes in human DNA sequence using GRAIL, Methods Enzymol. 266 (1996) 259–281.

[33] C. Burge, S. Karlin, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol. 268 (1997) 78–94.

[34] M.Q. Zhang, Identification of protein coding regions in the human genome by quadratic discriminant analysis, Proc. Natl. Acad. Sci. USA 94 (1997) 565–568.

[35] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic local alignment search tool, J. Mol. Biol. 215 (1990) 403–410.

[36] E.L.L. Sonnhammer, R. Durbin, A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis, Gene 167 (1995) GC1–GC10.

[37] K. Shibuya, et al., Isolation of two novel genes, DSCR5 and DSCR6, from Down syndrome critical region on human chromosome 21q22.2, Biochem. Biophys. Res. Commun. 271 (2000) 693–698.