www.elsevier.com/locate/ygeno
Genomics 83 (2004) 679–693
A cluster of 21 keratin-associated protein genes within introns of another gene on human chromosome 21q22.3☆
Kazunori Shibuya,1 Izumi Obayashi,1 Shuichi Asakawa, Shinsei Minoshima,2
Jun Kudoh, and Nobuyoshi Shimizu*
Department of Molecular Biology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku, Tokyo 160-8582, Japan
Received 1 July 2003; accepted 30 September 2003
Abstract
Recently, we identified multiple unique sequences in the 21q22.3 region and predicted them to be a cluster of genes encoding hair-specific keratin-associated proteins (KAPs). Detailed computer-aided analysis of these clustered genes revealed that the cluster spans over 165 kb and consists of 21 KAP-related sequences, including 16 putative genes and 5 pseudogenes. These were further divided into two subfamilies, KRTAP12 (KRTAP12.1–12.4 and KRTAP12.5P) and KRTAP18 (KRTAP18.1–18.12 and KRTAP18.13P–18.16P). All 16 putative genes possess several intragenic repeat sequences and apparently belong to the high-sulfur KAP gene family (16–30% cysteine content) known from nonhuman mammalian species. Transcripts of all 16 putative KAP genes were detected by RT-PCR analysis; their expression was restricted to hair root cells (radix pili cells) and was not found in 28 other tissues, including skin. All 16 KAP genes produced unspliced transcripts, indicating that they are active intronless genes. Interestingly, all these KAP-related genes are located within introns of the recently identified gene TSPEAR (approved gene symbol C21orf29), which is 214 kb in size. Surprisingly, 8 of the 16 active genes are transcribed in the same direction as C21orf29/TSPEAR. This finding suggests a novel transcription mechanism in which C21orf29/TSPEAR transcription passes over the multiple transcriptional termination sites of the KAP genes.
© 2003 Elsevier Inc. All rights reserved.
Keywords: Chromosome 21; Keratin-associated protein; Gene cluster
Previously, as part of an international collaborative project, we reported the complete DNA sequence of the euchromatic region of human chromosome 21 and presented a comprehensive gene catalogue listing 127 confirmed genes and 98 predicted genes [1]. In that report, we described multiple unique sequences in the 21q22.3 region and predicted them to be a cluster of keratin-associated protein (KAP) genes, but the molecular nature of the individual genes in the cluster had not been fully characterized.
0888-7543/$ - see front matter © 2003 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygeno.2003.09.024
☆ Sequence data from this article have been deposited with the DDBJ/EMBL/GenBank Data Libraries under Accession Nos. AB076347–AB076364.
* Corresponding author. Fax: +81-3-3351-2370.
E-mail address: [email protected] (N. Shimizu).
1 These authors contributed equally to this work.
2 Present address: Photon Medical Research Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Hamamatsu 431-3192, Japan.

Mammalian hair is produced in hair follicles, which are composed of several different cell types that form the medulla, cortex, and outer cuticle of the hair shaft as well as the surrounding inner and outer root sheaths [2]. The major components of hair are keratin intermediate filaments (KIFs) and the above-mentioned KAPs. The KIFs, also called hair keratins, consist of a number of similar but distinct proteins, which are classified into two types (type I and type II). The KIF genes are located in two separate clusters on two different human chromosomes, 17 and 12 [3–9]. In contrast to human KIFs, little is known about human KAPs. However, nonhuman mammalian KAPs have been better characterized, and they are classified into three types based on their amino acid composition: ultrahigh-sulfur KAPs (>30% cysteine content), high-sulfur KAPs (16–30% cysteine content), and high-glycine/tyrosine KAPs (35–60% glycine and tyrosine content) [2]. These three types of mammalian KAPs are further divided into 17 subfamilies [2,10–15]. KAPs share little sequence similarity across subfamilies, although they harbor some common motifs consisting of several amino acid residues. Interestingly, a mouse gene cluster including the Krtap12-1 gene, which encodes a high-sulfur KAP, has been located in a particular region of mouse chromosome 10 that is known to be homologous to a distal part of human chromosome 21q22.3 [10].
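The three-way classification above is a thresholding rule on amino acid composition. The following is a minimal sketch, not taken from the study, that applies the cut-offs quoted in the text. Two choices are assumptions: Cys content is checked before Gly+Tyr content, and exactly 30% Cys is treated as high-sulfur. The function names are hypothetical.

```python
def residue_percent(protein: str, residues: str) -> float:
    """Percentage of positions in `protein` occupied by any residue in `residues`."""
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol, if present
    if not seq:
        raise ValueError("empty protein sequence")
    return 100.0 * sum(seq.count(r) for r in residues) / len(seq)


def classify_kap(protein: str) -> str:
    """Assign a KAP type from composition (thresholds from the text; check order assumed)."""
    cys = residue_percent(protein, "C")
    gly_tyr = residue_percent(protein, "GY")
    if cys > 30:
        return "ultrahigh-sulfur"
    if cys >= 16:
        return "high-sulfur"
    if 35 <= gly_tyr <= 60:
        return "high-glycine/tyrosine"
    return "unclassified"
```

Under this rule, a protein with the 26.59% Cys content reported for KRTAP18.1 in Table 1 falls in the high-sulfur range.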
Here, we report a novel cluster of human KAP-related genes consisting of 16 active genes and 5 pseudogenes, which were identified by computer-aided dot-matrix analysis of genomic DNA sequence and by experimental evidence. We describe the initial characterization of this cluster of KAP-related genes in terms of genomic organization, gene structure, transcript variants, tissue expression, and molecular evolution.
Results
Dot-matrix analysis of genomic DNA sequence of the 21q22.3 region
We carried out dot-matrix analysis of 200 kb of genomic sequence from the 21q22.3 region after masking high-frequency repetitive elements with RepeatMasker, and identified 21 KAP-related sequences. Further computer-aided manual inspection enabled us to predict the presence or absence of open reading frames (ORFs) in these 21 KAP-related sequences. Consequently, we classified them into 16 putative genes (KRTAP12.1–12.4 and KRTAP18.1–18.12) and 5 pseudogenes (KRTAP12.5P and KRTAP18.13P–18.16P) (Fig. 1). Although KRTAP18.13P is relatively conserved in length, it does not contain any ORF because it lacks an initiation codon and carries many mutations such as substitutions and insertions–deletions. The other pseudogenes (KRTAP12.5P and KRTAP18.14P–18.16P) are much shorter than the putative KAP genes (Table 1).
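The principle of a dot-matrix comparison can be sketched briefly. This is a simplified illustration, not the DOTTER algorithm: it slides a fixed window along every diagonal (50 nt by default, matching the window size in the Fig. 1 legend) and records a dot wherever the fraction of identical bases reaches a fixed cut-off. The `min_identity` value and the function name are assumptions, and DOTTER's dynamic threshold control is not reproduced.

```python
def dot_matrix(seq_a: str, seq_b: str, window: int = 50,
               min_identity: float = 0.8) -> list[tuple[int, int]]:
    """Return (i, j) window starts where seq_a[i:i+window] and seq_b[j:j+window]
    share at least min_identity * window identical bases."""
    a, b = seq_a.upper(), seq_b.upper()
    need = min_identity * window
    hits = []
    for d in range(-(len(a) - 1), len(b)):          # diagonal offset d = j - i
        i0 = max(0, -d)                              # first i on this diagonal
        length = min(len(a) - i0, len(b) - (i0 + d))
        if length < window:
            continue
        match = [a[i0 + k] == b[i0 + d + k] for k in range(length)]
        score = sum(match[:window])                  # identities in the first window
        for k in range(length - window + 1):
            if k > 0:                                # slide: add new base, drop old one
                score += match[k + window - 1] - match[k - 1]
            if score >= need:
                hits.append((i0 + k, i0 + d + k))
    return sorted(hits)
```

Comparing a sequence with itself always yields the main diagonal; off-diagonal runs of dots indicate internal repeats or, as in Fig. 1B, paralogous gene copies. The running sum keeps the cost proportional to the number of matrix cells rather than cells times window size.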
Interestingly, all 16 putative genes and 5 pseudogenes belong to the high-sulfur KAP gene family, and they were further classified into two subfamilies (KRTAP12 and KRTAP18) based on their nucleotide and amino acid sequences. More specifically, the 12 KAP genes of one subfamily (KRTAP18.1–18.12) are distinct from any KAP genes deposited in the DDBJ/EMBL/GenBank databases, whereas the 4 KAP genes of the other subfamily (KRTAP12.1–12.4) are very similar to mouse Krtap12.1, located on mouse chromosome 10 [10]. Thus, dot-matrix analysis followed by computer-aided manual inspection enabled us to identify all of these KAP-related genes, which would not have been possible if only prediction software (GENSCAN, MZEF, and Grail) had been used for sequence analysis. GENSCAN did correctly predict 2 KAP genes (KRTAP18.9 and KRTAP18.10), but it incorrectly predicted 1 KAP gene (KRTAP18.1) as part of a multiple-exon gene (Fig. 1).

Fig. 1. The KAP gene cluster region on human chromosome 21. (A) Location of the KAP gene cluster region and summary of computer analyses indicating genes, pseudogene, DNA marker, exons (predicted by GENSCAN, MZEF, and Xgrail), repeats, and GC content. Repeats are classified into seven types, i.e., Alu (red), MIR (green), LINE (blue), LTR (yellow), DNA element (aqua), simple repeat or low complexity (gray), and other (olive). Predictions by Xgrail are categorized with different colors into "excellent" (green), "good" (blue), and "marginal" (red). The two colors in predictions by GENSCAN denote an independent gene unit instead of exons. Lines above the GC content diagram indicate the direction from centromere to telomere, and lines below indicate the reverse direction. The eight known genes are TRPM2/TRPC7 (Accession No. AB001535), C21orf30 (Accession No. AL117578), C21orf90 (Accession Nos. AF426269 and AF426270), C21orf29/TSPEAR (Accession No. AJ487962), UBE2G2 (Accession No. AF032456), SMT3H1 (Accession No. X99584), C21orf1 (Accession No. Z50022), and ITGB2 (Accession No. M15395), and the pseudogene is IMMTP (Accession No. AP001755). (B) Dot-matrix analysis of the 21q22.3 region. The human 200-kb KAP gene cluster sequence was placed on both the horizontal and vertical axes. The strength of each dot reflects the sequence homology within a 50-nucleotide sliding window. Lines diagonal to both axes represent high sequence homology. KAP genes were categorized into putative genes (red) and pseudogenes (aqua).

A cluster of genes within introns of another gene
Recently, a novel cDNA for TSPEAR (thrombospondin-N domain and epilepsy-associated repeat domain) was reported [16]. The TSPEAR cDNA corresponds to the full-length cDNA of a putative gene, C21orf29 [1]. Although the TSPEAR cDNA sequence (GenBank Accession No. AJ487962) was reported to have been reconstructed from a human IMAGE cDNA clone (GenBank Accession No. BC021197), EST sequences from various organisms, and the corresponding genome sequence, precise information was not provided [16]. We therefore searched the EST databases of various species and found 22 mammalian ESTs, including 14 human, 4 bovine, 3 mouse, and 1 sheep EST. Because exon 1 of the TSPEAR gene was supported by only 1 bovine EST, we performed RT-PCR on cDNAs from various human tissues using a pair of primers located in exons 1 and 2 of the TSPEAR gene (Fig. 2). TSPEAR cDNA was amplified from mRNAs of various tissues, including kidney, pancreas, heart, lymph node, and fetal lung, but not from hair root (data not shown). We therefore concluded that the TSPEAR cDNA is supported by experimental evidence, including one IMAGE clone (Accession No. BC021197), two bovine ESTs (Accession Nos. BM088875 and BI847891), and a human cDNA fragment obtained by RT-PCR in this study (Fig. 2). We mapped the TSPEAR gene to a region overlapping the KAP gene cluster. More detailed sequence analysis revealed an intriguing genomic organization: all 16 putative KAP genes and 5 KAP-related sequences (pseudogenes) are located within the larger gene C21orf29/TSPEAR, specifically in its intron 1 or intron 2 (Fig. 2).

Expression of KAP-related genes in the 21q22.3 region
We examined the expression of KAP transcripts by RT-PCR with primer sets specific to each KAP gene (Table 1).
Table 1
A cluster of KAP genes on 21q22.3
Gene or transcript | cDNA accession no. | Primer position | Sense primer | Anti-sense primer | Annealing temp. (°C) | Enzyme^a | PCR product (bp) | CDS (bp) | G+C of CDS (%) | M.W. (Da) | Cys (%) | Ser (%) | Pro (%) | Isoelectric point
KRTAP18.1 AB076347 254806–255816b TCACTCACTCACACCTCCCG GAGACACGGGGACCCGTCCT 55 HF 1011 849 67.05 28,663.28 26.59 20.56 11.70 7.25
KRTAP18.2 AB076348 266056–267126b TCACTCACTCACACACCTCCCC ATCCCCAACCAGCGACCAGCGA 55 HF 1071 768 66.19 25,614.67 27.05 22.35 10.19 6.96
KRTAP18.2s AB076349 266056–267126b TCACTCACTCACACACCTCCCC ATCCCCAACCAGCGACCAGCGA 55 HF 366 381 67.52 13,398.10 3.17 15.87 14.28 5.86
KRTAP18.3 AB076350 273645–274381b TCACTCACTCACGTCTCCCC AACTCTGGAGAAACGGGACC 55 HF 737 666 67.97 22,403.29 25.33 19.90 12.66 7.38
KRTAP18.4 AB076351 289351–290732b AGCTCAACCCCCAGCACAGCA GTCAAAGTGCAGGAGCAATTC 55 HF 1382 1206 65.47 40,427.45 27.43 21.44 11.47 6.58
KRTAP18.5 AB076352 295309–296226b CCAGCTCACGTCTTCCCCAC CCTAACCCGAGTCAGGACCA 55 HF 918 816 66.11 27,566.05 26.93 19.55 12.17 6.78
KRTAP18.6 AB076353 306899–308136b CTCCACCAGTTCAACCCCAGCAT TAAGACAAAGAGCCTGCCCCAT 65 EL 1238 1098 65.42 36,779.97 26.57 22.46 10.13 5.87
KRTAP18.7 AB076354 316253–317435b CATCTCCTCCAGTTCAATCC TCAGGCTTTGGATGATCTTAAG 55 HF 1168 1113 65.06 37,370.88 26.48 20.54 11.35 6.80
KRTAP18.8 AB076355 327752–328627b ACCACCCAGTCCAGCACCCA AGGACAGGACCGGAGCCGGC 55 HF 876 780 65.97 26,316.62 25.48 19.30 10.81 7.30
KRTAP18.9 AB076356 3797–4975c TCACACACTCACTTACACCTCC CGTCCCCAACCAGCGACCAGCG 55 HF 1186 879 66.77 29,975.63 25.34 19.52 11.98 7.24
KRTAP18.9s AB076357 3797–4975c TCACACACTCACTTACACCTCC CGTCCCCAACCAGCGACCAGCG 55 HF 468 480 68.98 16,335.51 6.91 18.86 15.09 8.05
KRTAP18.10 AB076358 14045–14966c TCACTCACTCACACACCTCCCC CAAGACAAAGAGCCTGCCCCAC 55 HF 922 756 64.96 25,569.59 24.30 20.31 11.15 6.86
KRTAP18.11 AB076359 23088–24094c ACACTCACTTACACCTCCCCCA TCCTGAGACTGGAGAATCCTGC 65 EL 1007 897 64.83 30,264.94 26.17 21.47 10.40 7.22
KRTAP12.4 AB076360 30887–31333c AGACCAGCCCTGTCCTCTGCG GGAGTTCAGAGAGCCTGCTGG 55 HF 447 339 65.31 11,432.14 23.21 11.60 16.07 7.24
KRTAP12.3 AB076361 34606–35015c CAGACATCACCATCCTCCTCCC TCTGGGGGTCCACCAGATGCT 60 EL 410 291 64.14 9,965.42 22.91 17.70 14.58 7.65
KRTAP12.2 AB076362 42971–43580c TTATCCAGCCACACGCCACCATG CTGTCACATTCTCAATCCAGAA 55 EL 610 441 61.14 14,688.75 21.23 21.91 13.01 7.58
KRTAP12.1 AB076363 58341–58815c TTATCCAGCCACACGCCACCATG AGGGCTCCAGATCATTCTATTA 55 EL 475 291 61.25 9,737.06 21.87 19.79 13.54 7.67
KRTAP18.12 AB076364 73844–74716c AGCTCAACCCCCAGCACGGCT GAGCAGCCGAGGGGCCAGTAG 55 HF 873 738 67.12 25,106.39 25.71 19.59 12.65 7.32
KRTAP18.13P — 79059–79630c,d CAGCTCCTGCACGCCCTTGT AGTGGATAGGTAAGCCGTGGTTG 65 EL 572 — — — — — — —
KRTAP12.5P — 63030–63301c — — — — — — — — — — — —
KRTAP18.14P — 260420–260468b — — — — — — — — — — — —
KRTAP18.15P — 332739–332879b — — — — — — — — — — — —
KRTAP18.16P — 337092–337151b — — — — — — — — — — — —
a HF, Expand High Fidelity PCR System; EL, Expand Long Template PCR System.
b Positions of cDNA or pseudogenes on AP001754.
c Positions of cDNA or pseudogenes on AP001755.
d The KRTAP18.13P region homologous to the other KRTAP18s is located between positions 78891 and 79541 on AP001755.
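Several Table 1 columns can be cross-checked with simple arithmetic. This sketch assumes two conventions inferred from the table itself rather than stated in it: primer-position ranges are inclusive (255816 − 254806 + 1 = 1011 bp, the PCR product size listed for KRTAP18.1), and the CDS length includes the termination codon (849/3 − 1 = 282 residues for KRTAP18.1, of which 26.59% is 75 Cys). The function names are hypothetical.

```python
def pcr_product_size(start: int, end: int) -> int:
    """Length of an inclusive genomic interval, e.g. a primer-to-primer span."""
    return end - start + 1


def residues_from_cds(cds_bp: int) -> int:
    """Amino acids encoded by a CDS that includes its stop codon (assumed convention)."""
    if cds_bp % 3:
        raise ValueError("CDS length is not a multiple of 3")
    return cds_bp // 3 - 1


def residue_count(percent: float, n_residues: int) -> int:
    """Back-calculate an integer residue count from a rounded percentage."""
    return round(percent * n_residues / 100)
```

Under the same convention, the 1206-bp CDS of KRTAP18.4 encodes 401 residues, consistent with Leu401 being its final amino acid as discussed under Coding sequences.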
Fig. 2. Genomic organization of the KAP gene cluster region. The horizontal scale is consistent with that of Fig. 1A. Arrows indicate transcriptional direction. Vertical bars on the top and second lines indicate the 16 KAP genes, 5 KAP pseudogenes, and the IMMTP pseudogene; the numbers above or below the bars give the name of the respective KAP gene. "P" after a number indicates a KAP pseudogene. The numbers above the vertical bars on the third line indicate exons of the C21orf29/TSPEAR gene, which were determined from the TSPEAR cDNA sequence (Accession No. AJ487962). A human IMAGE clone (Accession No. BC021197), a bovine EST (Accession No. BM088875), and another bovine EST (Accession No. BI847891) are shown in the fourth, fifth, and sixth lines, respectively. Arrows on both ends of the dashed line indicate the region confirmed by RT-PCR in this study.
Using an mRNA preparation from hair root cells, we detected for each of the 16 putative KAP genes at least one RT-PCR product of the size expected from the primer positions on the genomic sequence (Fig. 3A, lanes +). More importantly, these RT-PCR products contained the predicted coding sequences (see below). Thus, all 16 putative KAP genes were confirmed to be transcriptionally active.
Smaller RT-PCR products were also detected for two KAP genes (KRTAP18.2 and KRTAP18.9) (Fig. 3A). Sequencing confirmed that these transcripts are spliced forms (KRTAP18.2s and KRTAP18.9s) (Fig. 3A, indicated by gray and black arrows, respectively). The splice donor and acceptor sites of KRTAP18.2s and KRTAP18.9s conform to the consensus "GT–AG" rule. In addition, one of the five pseudogenes (KRTAP18.13P), which has an aberrant coding sequence, generated a substantial amount of transcript (Fig. 3A).
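The GT–AG rule mentioned above means that an intron begins with the dinucleotide GT at its donor site and ends with AG at its acceptor site. The sketch below checks those two dinucleotides and removes the intron; the coordinates and the function name are hypothetical, and real splice-site recognition depends on more than these four bases.

```python
def splice_gt_ag(pre_mrna: str, intron_start: int, intron_end: int) -> str:
    """Remove pre_mrna[intron_start:intron_end] (0-based, half-open) after
    checking that the intron starts with GT and ends with AG."""
    seq = pre_mrna.upper()
    intron = seq[intron_start:intron_end]
    if len(intron) < 4 or not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError(f"intron {intron_start}-{intron_end} does not follow the GT-AG rule")
    return seq[:intron_start] + seq[intron_end:]
```

For example, `splice_gt_ag("AAAGTCCCAGTTT", 3, 10)` removes GTCCCAG and joins the flanking exons into AAATTT.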
Next, we examined the expression of some KAP-related genes (KRTAP18.1, KRTAP18.2, KRTAP18.11, and KRTAP18.12) in human tissues other than hair root cells. Expression of these KAP genes was detected neither in human skin (Fig. 3B) nor in any of the 27 human tissues examined (Fig. 3C). Moreover, KRTAP18.12 was amplified even from 40 and 4 pg of diluted hair root cDNA, that is, from 5- to 50-fold less template than the 0.2 ng used for each of the other tissues (Fig. 3C). Thus, at the sensitivity of RT-PCR, expression of these KAP genes appears to be restricted to hair root cells.
Isolation of KAP cDNAs and their sequence analysis
We cloned the cDNA fragments amplified by RT-PCR for all 16 KAPs from hair root cell mRNA and determined their nucleotide sequences. The ORFs of the 16 KAP cDNAs, all in intronless form, ranged from 291 to 1206 bp (Table 1). The deduced amino acid sequences revealed that the 16 human KAPs are rich in Ser and Pro as well as Cys (Table 1). The amino acid sequences deduced from the spliced forms (KRTAP18.2s and KRTAP18.9s) differ from those of the active KAP genes in several respects, especially in Cys content (3.17% and 6.91%, versus 27.05% and 25.34% for KRTAP18.2 and KRTAP18.9; Table 1). However, these proteins may not be functional, because only the first 26 or 59 N-terminal amino acids are identical to those of the corresponding active KAPs (KRTAP18.2 and KRTAP18.9) and most of their ORFs derive from the 3′ UTR (Fig. 4).
Moreover, several polymorphisms were detected during cDNA sequence analysis (Table 2). These included single-base substitutions at various positions, a 15-bp insertion, and a 7-bp deletion.

Fig. 3. Expression patterns of KAP genes. (A) RT-PCR analysis of KAP genes in hair roots. Gray and black arrows indicate KRTAP18.2s and KRTAP18.9s, respectively. The band indicated by an asterisk in the blot for KRTAP18.2 (lane +) was KRTAP18.9s, apparently generated by false priming of the primer set for KRTAP18.2. (B) RT-PCR analysis of KAP genes in skin. "+" and "−" indicate the presence or absence of reverse transcriptase. "G" indicates genomic DNA. "C" indicates G3PDH (glyceraldehyde-3-phosphate dehydrogenase) cDNA; RT-PCR using G3PDH primers served as a positive control. The detailed procedure is described under Materials and methods. (C) RT-PCR analysis of KRTAP18.12 in 27 human tissues (0.2 ng cDNA each) and in diluted hair root cDNA (40 and 4 pg).

Fig. 4. Two spliced forms, KRTAP18.2s and KRTAP18.9s. Bands of smaller size were detected for KRTAP18.2 and KRTAP18.9. Open boxes, closed black boxes, hatched boxes, and dotted boxes indicate noncoding regions, KAP-coding regions, coding regions translated only in the spliced forms, and coding regions predicted from genomic sequence (Accession Nos. AP001754 and AP001755) only in the spliced forms, respectively.
To determine the structural features of the KAPs, we compared the amino acid sequences of all members of each subfamily (Fig. 5). The ORFs of the 12 members of the KAP18 subfamily contain several repetitive units that vary in number and sequence (Fig. 5A). The KAP18 members contain several common motifs (CC[R/Q]P[S/T], CCRT, and CC[V/K]P) that are frequently found in ultrahigh-sulfur KAPs [2]; hence they may be considered ultrahigh-sulfur rather than high-sulfur KAPs. The four members of the KAP12 subfamily also contain common motifs such as CCVP, CCQP, and CCQPS, but at a much lower density than in KAP18 (Fig. 5B). These KAP12 proteins possess four to six units of about 20 amino acids with the consensus sequence CQX3CX4CX4CX3S (X, any amino acid). Similar direct repeats of 10 amino acids with a CQ motif at the N-terminus are also found in the mouse KAP13 protein [11] (Accession No. AF031485).
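Motif notation such as CQX3CX4CX4CX3S or CC[R/Q]P[S/T] maps directly onto regular expressions, which makes motif counts and densities easy to reproduce. In this sketch, Xn is read as "any n residues" and [R/Q] as "R or Q"; overlapping occurrences are counted through a lookahead. The helper names and the toy sequence are hypothetical, not data from Fig. 5.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate notation like 'CQX3CX4C' or 'CC[R/Q]P[S/T]' into a Python regex."""
    pattern = pattern.replace("/", "")  # [R/Q] -> [RQ]
    # X followed by a count n -> '.{n}'; a bare X -> '.'
    return re.sub(r"X(\d*)", lambda m: f".{{{m.group(1)}}}" if m.group(1) else ".", pattern)


def count_motifs(protein: str, motifs: list[str]) -> int:
    """Count (possibly overlapping) occurrences of all motifs in a protein."""
    seq = protein.upper()
    return sum(len(re.findall(f"(?={pattern_to_regex(m)})", seq)) for m in motifs)
```

Dividing `count_motifs(...)` by the protein length gives a per-residue motif density, one simple way to quantify the difference in motif density between KAP18 and KAP12.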
Sequence analysis of 5′ upstream regions of KAP genes
We analyzed the 5′ upstream sequences of the KAP genes by sequence alignment using the program CLUSTALW [17]. TATA-like sequences are located between −121 and −68 bp upstream of the initiation codon in the KRTAP18 subfamily members (Fig. 6A) and at −75 or −74 bp in the KRTAP12 subfamily members (Fig. 6B). Because an approximately 40-bp sequence is expanded about four and a half times in KRTAP18.12, its TATA-like sequence is exceptionally located at −237 bp upstream of the initiation codon (Fig. 6A). No typical CpG islands were identified in the 5′ upstream regions of the KAP genes.
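The positions above use the convention that the "A" of ATG is +1 and the base immediately upstream is −1, so there is no position 0. The sketch below converts 0-based string indices into that convention and reports the TATA-like hit closest to the initiation codon. The TATAWAW-style pattern is an assumption chosen for illustration: the study located TATA-like sequences by alignment, not by this regex, and the function names are hypothetical.

```python
import re
from typing import Optional

TATA_LIKE = re.compile(r"TATA[AT]A[AT]")  # hypothetical TATAWAW pattern (W = A or T)


def relative_position(index: int, atg_index: int) -> int:
    """Convert a 0-based index to ATG-relative numbering (A of ATG = +1, no 0)."""
    offset = index - atg_index
    return offset + 1 if offset >= 0 else offset


def find_tata_like(seq: str, atg_index: int) -> Optional[int]:
    """Relative start of the TATA-like match closest upstream of the ATG, if any."""
    upstream = seq[:atg_index].upper()
    hits = [m.start() for m in TATA_LIKE.finditer(upstream)]
    return relative_position(hits[-1], atg_index) if hits else None
```

For a toy sequence with TATAAAA starting 67 bases before the ATG, `find_tata_like` returns −67, the same kind of value as the −121 to −68 range reported for KRTAP18.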
To search further for a common sequence motif that might specifically regulate the expression of the KAP genes, we analyzed the 5′ upstream sequences of all 16 KAP genes using the Gibbs sampler program [18,19]. This revealed a highly conserved 14-bp consensus sequence, CAVCAACAAGGAAG (V = A, C, or G), located between −64 and −51 bp upstream of each TATA-like sequence (Fig. 6C). Analysis with MatInspector [20] indicated that this consensus sequence is not identical to any known transcription factor binding site, so it could represent a new type of binding site.
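A degenerate consensus such as CAVCAACAAGGAAG can be scanned for by expanding each IUPAC ambiguity code into a character class (here V becomes [ACG]). This minimal sketch searches only the given strand; scanning the reverse complement as well would be a straightforward extension. The function name is hypothetical.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_hits(seq: str, consensus: str) -> list[int]:
    """0-based start positions of (possibly overlapping) matches on the given strand."""
    core = "".join(IUPAC[base] for base in consensus.upper())
    return [m.start() for m in re.finditer(f"(?={core})", seq.upper())]
```

Here `consensus_hits("TTTCAGCAACAAGGAAGTTT", "CAVCAACAAGGAAG")` returns `[3]`, while a T at the V position gives no match.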
Evolutionary relationship of mammalian KAP family members
To elucidate the evolutionary relationship between known mammalian KAPs and the human KAPs identified in this study, we constructed a phylogenetic tree (Fig. 7). The KAP18 and KAP12 subfamilies are extremely closely related to each other and distinct from the other subfamilies. This suggests that the KAP genes of these two subfamilies may have arisen by duplication of a common ancestral gene, although the present-day amino acid sequences of these KAPs are distinct (Fig. 5).

Table 2
Polymorphisms identified in this study
Gene name | Position^a | Base | Codon | Amino acid
KRTAP18.1 | 116 | C | CCG | P
 | 116 | T | CTG | L
 | 937 | G | 3′ UTR |
 | 937 | T | 3′ UTR |
KRTAP18.2 | 43 | A | AAC | N
 | 43 | G | GAC | D
 | 319 | C | CCT | P
 | 319 | A | ACT | T
 | 324 | G | GTG | V
 | 324 | C | GTC | V
 | 349 | G | GCT | A
 | 349 | C | CCT | P
 | 530 | T | CTG | L
 | 530 | C | CCG | P
 | 624 | G | CCG | P
 | 624 | A | CCA | P
 | 721 | C | CGC | R
 | 721 | G | GGC | G
 | 859 | G | 3′ UTR |
 | 859 | A | 3′ UTR |
 | 925 | T | 3′ UTR |
 | 925 | C | 3′ UTR |
 | 936 | G | 3′ UTR |
 | 936 | C | 3′ UTR |
 | 958 | T | 3′ UTR |
 | 958 | G | 3′ UTR |
 | 967 | G | 3′ UTR |
 | 967 | T | 3′ UTR |
KRTAP18.4 | 184 | C | CGT | R
 | 184 | T | TGT | C
 | 216 | T | TGC | C
 | 216 | C | TGT | C
 | 475 | A | ATC | I
 | 475 | G | GTC | V
KRTAP18.5 | 58 | G | GAC | D
 | 58 | A | AAC | N
 | 703 | G | GTG | V
 | 703 | C | CTG | L
 | 803 | C | CCC | P
 | 803 | G | CGC | R
KRTAP18.6 | 81 | T | TCT | S
 | 81 | C | TCC | S
 | 126 | C | TGC | C
 | 126 | T | TGT | C
 | 144–145 | 15-bp insertion^b | |
 | 615 | A | ACA | T
 | 615 | G | ACG | T
 | 621 | A | TCA | S
 | 621 | G | TCG | S
 | 679 | G | GCC | A
 | 679 | A | ACC | T
KRTAP18.9 | 330 | T | TGT | C
 | 330 | C | TGC | C
 | 545 | G | TGC | C
 | 545 | A | TAC | Y
 | 622 | T | TTG | L
 | 622 | C | CTG | L
 | 769 | T | TGC | C
 | 769 | C | CGC | R
 | 837 | T | CCT | P
 | 837 | G | CCG | P
 | 1044 | C6 | 3′ UTR |
 | 1044 | C7 | 3′ UTR |
 | 1055 | G | 3′ UTR |
 | 1055 | A | 3′ UTR |
 | 1070 | G | 3′ UTR |
 | 1070 | T | 3′ UTR |
 | 1079 | T | 3′ UTR |
 | 1079 | G | 3′ UTR |
 | 1109–1115 | 7-bp deletion | 3′ UTR |
KRTAP18.10 | 214 | A | ACC | T
 | 214 | C | CCC | P
KRTAP18.11 | 638 | A | TAT | Y
 | 638 | C | TCT | S
KRTAP12.4 | 330 | T | ACT | T
 | 330 | C | ACC | T
KRTAP12.3 | 50 | G | CGC | R
 | 50 | A | CAC | H
a The "A" of the initiation codon ATG in each KAP gene is regarded as +1.
b The inserted sequence is CCCAGCTGCTGCGCC.
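Because positions in Table 2 are numbered from the "A" of ATG (+1), each coding-region variant can be assigned to a codon and a position within that codon, and then classified as synonymous or amino acid-changing. The sketch below does this with the standard genetic code; the function names are hypothetical and only single-base substitutions are handled.

```python
BASES = "TCAG"
# Standard genetic code, indexed as 16*first + 4*second + third over "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    x + y + z: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}


def codon_site(position: int) -> tuple[int, int]:
    """Map a CDS position (A of ATG = +1) to (codon number, base within codon), both 1-based."""
    if position < 1:
        raise ValueError("coding positions start at +1")
    return (position - 1) // 3 + 1, (position - 1) % 3 + 1


def classify_snv(ref_codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change in ref_codon at CDS `position`."""
    _, offset = codon_site(position)
    alt_codon = ref_codon[: offset - 1] + alt_base + ref_codon[offset:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"
```

For KRTAP18.1, position 116 is the second base of codon 39, and C to T turns CCG (Pro) into CTG (Leu), so `classify_snv("CCG", 116, "T")` returns "missense"; the KRTAP18.2 change GTG to GTC at position 324 is "synonymous". The same helper expresses the scenario proposed for KAP18.4 under Coding sequences: `classify_snv("TGT", 1206, "A")` returns "nonsense", although the ancestral TGT/TGC codon is the authors' speculation.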
Discussion
Gene clusters by tandem gene duplication
It has been postulated that tandem gene duplication has contributed to the diversification of the structure and function of a number of gene clusters, such as the T cell receptor [21], IgH [22], IgE [23], and Ign [24] clusters. Similarly, the diversity of KAPs may have been generated through tandem gene duplication. Why such diversity was necessary for establishing the physiological function of hair-forming KAP molecules during evolution remains an intriguing question.
Classification of KAP genes
The 16 KAP genes identified in this study fall into two subfamilies: one consists of 12 KAP genes and the other of 4. Because the genes of the first subfamily do not resemble any known KAP genes from other mammalian species, we designated them as a new group, KRTAP18.1–18.12. The second subfamily, KRTAP12.1–12.4, was named after the mouse Krtap12.1 gene because of their extreme similarity [10]. However, there are some discrepancies in naming (Fig. 7), and the nomenclature of the KAP genes will therefore need to be reconsidered in the near future.
Fig. 5. Multiple alignments of the amino acid sequences of KAPs on 21q22.3. (A) Multiple alignment of the amino acid sequences of the 12 KAP18s. (B) Multiple alignment of the amino acid sequences of the 4 KAP12s. White on black background, black on light gray background, and black on dark gray background signify identical, conserved, and similar amino acids, respectively. Colored sequences represent common sequence motifs reported previously [2].
Coding sequences
To predict the coding sequences, we first chose the longest possible ORF between the TATA-like sequence and the putative poly(A) signal in the genomic sequence of each KAP gene. The TATA-like sequences of the KAP genes lie ca. 100 bp upstream of the putative initiation codon (KRTAP18.12 being the exception noted above), whereas the putative poly(A) signals lie ca. 290 bp downstream of the termination codon in the genomic sequence. We then performed RT-PCR using primer sets (Table 1) designed to amplify the entire region of the longest ORF, to examine whether it is transcribed. For all 16 putative KRTAP genes (12 KRTAP18 genes and 4 KRTAP12 genes), RT-PCR revealed that mRNAs including the entire ORF are in fact transcribed. However, we could not rule out the possibility that the coding frame of the KRTAP18 genes starts at a common motif, MS[V/I]CSS, which is conserved in all KRTAP18 genes (Fig. 5A). For the KRTAP12 genes, we concluded that all of them have well-conserved ORFs that start with the common sequence MCHTS (Fig. 5B).
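The "longest possible ORF" step can be sketched as a three-frame scan for ATG-initiated reading frames that end in a stop codon. This sketch assumes the genomic segment between the TATA-like sequence and the putative poly(A) signal has already been extracted, and it considers only the given strand. As in Table 1, the reported span includes the termination codon. The function name is hypothetical.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end), 0-based and half-open, of the longest ATG...stop ORF
    on the given strand, or None if no terminated ORF exists."""
    s = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None:
                if codon == "ATG":          # first ATG opens a candidate ORF
                    start = i
            elif codon in STOP_CODONS:      # in-frame stop closes it
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best
```

The number of encoded residues is then `(end - start) // 3 - 1`; for example, `longest_orf("CCATGAAATGA")` returns `(2, 11)`, a two-residue ORF (Met-Lys) followed by TGA.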
Moreover, the C-terminal amino acid of all the KAPs is Cys, with one exception, KAP18.4 (Fig. 5). For KAP18.4, we speculate that the termination codon TGA following Leu401 was originally TGT or TGC (Cys) and that its third base was substituted with "A" during evolution, because in most other KAP18 members the amino acids neighboring the position corresponding to Leu401 are Cys. Before this substitution took place, KAP18.4 may have had 10 additional amino acids, CRPVCSRPAC, at its C-terminus, a sequence very similar to those of some of the other KAP18s. However, it is unclear how this C-terminal truncation influences the function of KAP18.4.
Expression specificity
Expression of the KAP genes on 21q22.3 was specific to hair root cells and could not be detected in any other human tissue at the RT-PCR level (Fig. 3). A BLAST search of the nucleotide sequence databases (DDBJ, EMBL, and GenBank) revealed several human ESTs that match KRTAP18.2, KRTAP18.11, and KRTAP18.12. Two ESTs matching KRTAP18.11 (Accession Nos. W70177 and W69912) are registered as derived from fetal heart, whereas six ESTs matching KRTAP18.12 (Accession Nos. BF057518, BF058205, BF057369, BF515903, BF221518, and BM719379) are registered as derived mainly from fibrotheoma of the ovary. Another EST matching KRTAP18.2 (Accession No. AJ003277) carries no tissue annotation. Because KAPs are major components of mammalian hair, it is unlikely that the reported expression in these tissues has any physiological significance. Indeed, expression in these tissues could not be confirmed by our expression study (Fig. 3C).
The 14-bp consensus sequence CAVCAACAAGGAAG (V = A, C, or G) identified in the 5′ upstream region contains the UHS-1 motif (ACAAGGAA), which is found in the promoter regions of hair ultrahigh-sulfur KAP genes expressed in human cuticle, mouse cortex/cuticle, and rabbit cortex [25]. The UHS-1 motif is therefore a candidate regulatory sequence for the hair root-specific expression of the KAP genes.
Genes within a gene
It is striking that all the KAP gene family members are located within introns of another gene, C21orf29/TSPEAR. Eight KAP genes are transcribed in the same direction as C21orf29/TSPEAR, whereas the remaining eight are transcribed in the opposite direction (Fig. 2). Intronic genes in the opposite orientation to their host gene have been reported in several cases, whereas intronic genes in the same orientation have not [26]. Further study of the same-orientation arrangement may therefore uncover a new mechanism of transcriptional regulation of the C21orf29/TSPEAR gene. RNA polymerase II must pass through at least eight transcriptional termination sites of the KAP genes while C21orf29/TSPEAR is being transcribed.
Three possibilities can be raised. (1) Incomplete mRNAs may be synthesized from C21orf29/TSPEAR, with transcription stopping at every transcriptional termination site of the KAP genes. In this case, a poly(A) tail would be added to the 3′ end of the incomplete mRNAs in the same way that KAP mRNAs are polyadenylated, and these mRNAs would therefore be good templates for RT-PCR of KAP mRNA. Because no PCR products were detected by RT-PCR with appropriate primer sets for the KAP genes in the tissues that express C21orf29/TSPEAR mRNA (kidney, pancreas, heart, lymph node, and fetal lung), this possibility may be ruled out. (2) There may be special sites that are recognized as termination sites only while the KAP genes are being transcribed and are not recognized as transcriptional termination sites during C21orf29/TSPEAR transcription. (3) There may be a novel mechanism that passes over the transcriptional termination sites of the KAP genes while the C21orf29/TSPEAR gene is being transcribed. We believe this possibility is the most likely, because a poly(A) signal introduced into an intron of the rabbit β-globin gene was shown not to function efficiently in an experimental system [27], which was attributed to the predominance of splicing over polyadenylation [28]. Thus, the transcriptional termination sites of the KAP genes in the introns of the C21orf29/TSPEAR gene may not be recognized as termination signals while C21orf29/TSPEAR is being transcribed, which would prevent the production of incomplete mRNA.

Fig. 6. Multiple alignments of the nucleotide sequences in the 5′ upstream regions of the KAP genes. (A) Multiple alignment of the 5′ upstream sequences of the KRTAP18 genes. Sequences colored red or green show the repetitive units of the inserted expansion in KRTAP18.12. (B) Multiple alignment of the 5′ upstream sequences of the KRTAP12 genes. Nucleotides are numbered with the "A" of the initiation codon as position +1. White on black background, black on light gray background, and black on dark gray background signify identical, conserved, and similar nucleotides, respectively. Pink indicates the putative TATA-like sequence. The double-headed arrow indicates a region, other than the TATA-like sequence, that is highly conserved among the genes. (C) Multiple alignment of the highly conserved sequences among the 16 active KAP genes. Numbering and shading are as in (A) and (B). Values in parentheses at the extreme right represent the probability of finding the conserved sequence motif.
Conclusion
In this study, we established the genomic organization of 21 KAP-related genes, including 16 active KAP genes, in the human chromosome 21q22.3 region. To our knowledge, 98 genes have been identified for the growth and formation of human hair: 9 type I KIF genes on 17q12–q21 [29], 6 type II KIF genes on 12q13 [30], 37 KAP genes on 17q12–q21 [15], 16 KAP genes on 21q22.3 (this study), and 30 KAP genes on 21q22.11 (I. Obayashi et al., unpublished). These multigene families appear consistent with the previous estimate that 50–100 keratin-related genes (KIFs and KAPs) are expressed in the growing hair [25,31]. In this regard, comparative DNA sequence analysis of KAP gene clusters among primates, rodents, and even marine mammals will be important for tracing the origin and evolution of mammalian hair.
Materials and methods
Computer analysis of DNA and protein sequences
Computer analysis was performed on a 200-kb genomic DNA sequence (Accession Nos. AP001754 and AP001755) that we had determined and assembled [1]. Repetitive elements in the genomic sequence were masked using the RepeatMasker2 Web server (Smit and Green; RepeatMasker at http://ftp.genome.washington.edu/RM/RepeatMasker.html). Exons were predicted using Xgrail version 1.3c [32], GENSCAN [33], and MZEF [34]. Homology searches of nucleotide and amino acid sequences were carried out through the BLAST server (http://www.ncbi.nlm.nih.gov/BLAST/) at NCBI using BLASTN, BLASTP, and BLASTX [35]. Multiple alignment of DNA and amino acid sequences and drawing of the phylogenetic
tree using the neighbor-joining method were performed with CLUSTALW version 1.81 [17]. Graphic representations of the amino acid and nucleotide sequence alignments were generated with BOXSHADE 2.15 (available at http://www.ch.embnet.org/software/BOX_form.html). Dot-matrix analysis was carried out using the DOTTER program [36] with dynamic threshold control. Conserved DNA sequence motifs were analyzed using the Gibbs sampler program [18,19], and transcription factor binding sites were searched for with MatInspector [20].
Fig. 7. Phylogenetic tree of mammalian KAPs, including the human KAPs identified in this study. The tree was constructed using the neighbor-joining method in CLUSTALW version 1.81 [17]. Database accession numbers are as follows: mouse KAP16.9, AF345299; mouse KAP16.2, AF345292; mouse KAP16.5, AF345295; mouse KAP16.1, AF345291; mouse KAP16.3, AF345293; mouse KAP16.4, AF345294; mouse KAP8.2/HGTpI.a, D86422; mouse HGTpII.3, D89901; mouse KAP16.8, AF345298; mouse KAP6.1/HGTpII.1, D86420 or D86421; mouse HGTpII.2, D86419; rabbit KAP6.1, M95718; sheep KAP6.1, M95719; mouse KAP16.7, AF345297; mouse KAP6.2/HGTpII.4, D89902; mouse HGTpIF, D86423; mouse KAP14/Pmg-1, AF003691; mouse KAP15/Pmg-2, AF162800; mouse KAP13, AF031485; mouse KAP5.4, M37760; mouse KAP5.1, M37759; mouse KAP9.1, M27685; rabbit KAP4L, X80035; mouse KAP12.1, AF081797; sheep KAP2.3/BIIIA3, U60024; mouse high-sulfur keratin protein, D86424; mouse KAP11.1/Hacl-1, U03686; human KAP1.1, AB052934; human KAP1.7, AB055057; human KAP1.6, AB052868; rat high-sulfur protein B2F, AB003753; and rat high-sulfur protein B2E, AB003753; for human KAP17.1, KAP9.4, KAP9.8, KAP9.3, KAP9.2, KAP9.9, KAP4.2, KAP4.12, KAP4.5, KAP4.7, KAP4.15, KAP4.14, KAP4.13, KAP4.4, KAP4.10, KAP2.4, KAP3.3, KAP3.2, KAP3.1, KAP1.5, KAP1.3, KAP9.7, KAP9.5, KAP9.6, KAP9.1, KAP4.6, KAP4.11, KAP4.9, KAP4.8, KAP4.3, KAP4.1, KAP2.3, KAP2.1A/B, KAP16.1, and KAP1.4, see [15].
Preparation of RNAs from human radix pili cells and dermal cells
One hundred fifty hairs were collected from 9 females and 11 males after informed consent was obtained, and only the hair roots were excised and kept in liquid nitrogen until use. Total RNA was extracted using the RNeasy Mini Kit (Qiagen). Briefly, frozen samples were solubilized in a denaturing buffer containing guanidine isothiocyanate and homogenized with intense stirring, and total RNA was then purified using the RNeasy mini spin column. To exclude genomic DNA contamination, the purified RNA was treated with 5 U of RNase-free DNase I (Nippon Gene) at 37 °C for 10 min, after which the DNase I was inactivated by heating at 80 °C for 10 min. In total, approximately 30 µg of RNA was purified from the 150 hairs. A piece of dermis of approximately 36 mg was surgically obtained from the brachium of one of the authors. The dermal sample was frozen in liquid nitrogen, ground using a Cryo-Press CP-50W (Microtec Co., Ltd.), and solubilized in a denaturing buffer containing guanidine isothiocyanate. Following the same procedure used for hair roots, approximately 6.2 µg of RNA was purified from the 36 mg of dermis.
RT-PCR of KAP cDNAs and partial TSPEAR cDNA
A 12-µl mixture containing 1 µg of total RNA and 50 pmol of oligo(dT)18VN primer [5′-T18VN-3′, where V = A, C, or G and N = A, C, G, or T] was incubated at 65 °C for 10 min, chilled on ice, mixed with 4 µl of 5× first-strand buffer, 2 µl of 0.1 M DTT, and 1 µl of 10 mM dNTP, and then incubated at 42 °C for 2 min. One microliter (200 units) of SuperScript II RNase H− reverse transcriptase (Invitrogen) was added, and cDNA was synthesized at 42 °C for 1 h. The reaction was then terminated at 70 °C for 15 min. RNase H (2 units) was added and the mixture was incubated at 37 °C for 20 min to degrade RNA. PCR was performed using the
Expand High Fidelity PCR System (Roche) or the Expand Long PCR System (Roche) with the primer sets shown in Table 1. Templates used for PCR included hair root cDNA (1 ng), hair root total RNA (25 ng), dermal cDNA (1 ng), and dermal total RNA (25 ng), as well as genomic DNA (0.5 ng) as a control. PCR was performed according to the manufacturer's protocol with 39 cycles of 94 °C for 30 s, 55, 60, or 65 °C for 1 min, and 72 °C for 2 min in an automated thermal cycler (Trio thermoblock 48, Biometra). The PCR conditions for each KAP gene are summarized in Table 1. Other templates included human tissue cDNAs (0.2 ng each) from Human Multiple Tissue cDNA panels (Clontech; panels I and II, fetal, and immune system) containing cDNAs from 27 human tissues. For KRTAP18.12, additional PCR was performed with diluted hair root cDNA (40 and 4 pg) as template under the same conditions, except that 35 cycles were used instead of 39.
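The first-strand reaction described above can be checked with the dilution relation C1V1 = C2V2. The Python sketch below computes final concentrations in the 20-µl reaction (12 + 4 + 2 + 1 + 1 µl) before RNase H is added, and the fold dilution of the 40- and 4-pg KRTAP18.12 templates relative to the standard 1 ng of hair root cDNA. It assumes, since the text does not say, that "10 mM dNTP" means 10 mM of each dNTP; the helper name `final_conc` is illustrative.

```python
def final_conc(stock, volume_added_ul, total_volume_ul):
    """Solve C1*V1 = C2*V2 for C2 (result has the same unit as the stock)."""
    return stock * volume_added_ul / total_volume_ul


# Volumes (µl) added to the first-strand reaction, as listed in the text
volumes = {"RNA + oligo(dT)18VN": 12, "5x buffer": 4, "0.1 M DTT": 2,
           "10 mM dNTP": 1, "SuperScript II": 1}
total_ul = sum(volumes.values())          # 20 µl

buffer_x = final_conc(5, 4, total_ul)     # 1.0, i.e. 1x buffer
dtt_mM = final_conc(100, 2, total_ul)     # 0.1 M = 100 mM stock -> 10.0 mM
dntp_mM = final_conc(10, 1, total_ul)     # 0.5 mM (each dNTP, by assumption)
primer_uM = 50 / total_ul                 # 50 pmol / 20 µl = 2.5 pmol/µl = 2.5 µM
rna_ng_per_ul = 1000 / total_ul           # 1 µg = 1000 ng -> 50.0 ng/µl
enzyme_u_per_ul = 200 / total_ul          # 10.0 U/µl

# KRTAP18.12 titration: fold dilution relative to 1 ng (1000 pg) of hair root cDNA
dilution = {pg: 1000 / pg for pg in (40, 4)}

print(total_ul, buffer_x, dtt_mM, dntp_mM, primer_uM, rna_ng_per_ul, enzyme_u_per_ul)
print(dilution)  # {40: 25.0, 4: 250.0}
```

The 40- and 4-pg templates therefore correspond to 25-fold and 250-fold dilutions of the standard hair root cDNA input.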
PCR of TSPEAR cDNA was performed using KOD-Plus- DNA polymerase (Toyobo) according to the manufacturer's protocol, with 35 cycles of 94 °C for 15 s, 55 °C for 30 s, and 68 °C for 2 min, using a pair of primers, 5′-CTGCCCTGCTGAGTCTGTGTTTTGT-3′ and 5′-AAATCCTGGATGCTGGGAAGCTCAT-3′, located in exon 1 and exon 2, respectively. Templates used for PCR included hair root cDNA (1 ng) and Marathon-Ready cDNAs (0.1 ng each; Clontech) from 25 human tissues (fetal brain, brain cerebellum, brain cerebral cortex, brain hippocampus, brain hypothalamus, whole brain, fetal liver, liver, fetal lung, lung, fetal thymus, thymus, fetal kidney, kidney, skeletal muscle, fetal skeletal muscle, retina, prostate, pancreas, placenta, ovary, heart, lymph node, testis, and colon).
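The primer sequences given in this section can be summarized with a few lines of code. The Python sketch below expands the degenerate oligo(dT)18VN primer used for first-strand synthesis into its 12 component sequences (V = A, C, or G; N = any base, per IUPAC convention) and reports length, GC fraction, and a rough melting temperature for the two TSPEAR primers. The Tm uses the simple approximation Tm = 64.9 + 41 × (G + C − 16.4) / L, where L is primer length; this formula ignores salt and oligonucleotide concentration, is an assumption for illustration rather than the method used to design these primers, and the function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes needed here (V = A/C/G; N = any base)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "N": "ACGT"}


def expand_degenerate(seq):
    """Return every concrete sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in seq))]


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def rough_tm(seq):
    """Approximate Tm in degrees C: 64.9 + 41 * (G + C - 16.4) / length."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)


oligo_dt_vn = "T" * 18 + "VN"
variants = expand_degenerate(oligo_dt_vn)
print(len(variants))  # 12 variants; the first is 18 T's followed by AA

tspear_primers = {
    "exon 1": "CTGCCCTGCTGAGTCTGTGTTTTGT",
    "exon 2": "AAATCCTGGATGCTGGGAAGCTCAT",
}
for name, seq in tspear_primers.items():
    print(name, len(seq), round(gc_fraction(seq), 2), round(rough_tm(seq), 1))
# exon 1 25 0.52 59.3
# exon 2 25 0.48 57.7
```

Both rough estimates fall a few degrees above the 55 °C annealing temperature used for the TSPEAR PCR.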
Sequencing of KAP cDNAs and partial TSPEAR cDNA
DNA sequencing was performed as described previously [37]. Briefly, fragments amplified by RT-PCR were cloned into pBluescript II SK(+) (Stratagene) or pUC18 (TaKaRa Biochemicals) and sequenced on DNA sequencers (Models 377 and 3100; Applied Biosystems) using a combination of the BigDye Terminator Cycle Sequencing Ready Reaction Kit v2.0 and the dRhodamine Terminator Cycle Sequencing FS Ready Reaction Kit (Applied Biosystems).
Acknowledgments
The authors thank Miho Tatsuyama and Marie L. Yaspo
for their interest and support during the initial stage of this
research. This work was supported in part by a Grant-in-Aid
for Scientific Research on Priority Areas from the Ministry
of Education, Culture, Sports, Science, and Technology
(MEXT) and a Grant-in-Aid for Scientific Research and the
fund for the ‘‘Research for the Future’’ program from the
Japan Society for the Promotion of Science and MEXT.
References
[1] The Chromosome 21 Mapping and Sequencing Consortium, The DNA sequence of human chromosome 21, Nature 405 (2000) 311–319.
[2] B.C. Powell, G.E. Rogers, The role of keratin proteins and their genes
in the growth, structure and properties of hair, in: P. Jolles, H. Zahn, H.
Hocker (Eds.), Formation and Structure of Human Hair, Birkhauser
Verlag, Basel, 1997, pp. 59–148.
[3] V. Romano, et al., Chromosomal assignments of human type I and
type II cytokeratin genes to different chromosomes, Cytogenet. Cell
Genet. 48 (1988) 148–151.
[4] M.A. Rogers, et al., Sequence data and chromosomal localization of
human type I and type II hair keratin genes, Exp. Cell Res. 220 (1995)
357–362.
[5] M. Rosenberg, A. RayChaudhury, T.B. Shows, M.M. Le Beau, E.
Fuchs, A group of type I keratin genes on human chromosome
17: characterization and expression, Mol. Cell. Biol. 8 (1988)
722–736.
[6] N. Ceratto, et al., Human type I cytokeratin genes are a compact
cluster, Cytogenet. Cell Genet. 77 (1997) 169–174.
[7] N.C. Popescu, P.E. Bowden, J.A. DiPaolo, Two type II keratin genes
are localized on human chromosome 12, Hum. Genet. 82 (1989)
109–112.
[8] M. Rosenberg, E. Fuchs, M.M. Le Beau, R.L. Eddy, T.B. Shows,
Three epidermal and one simple epithelial type II keratin genes
map to human chromosome 12, Cytogenet. Cell Genet. 57 (1991)
33–38.
[9] S.J. Yoon, J. LeBlanc-Straceski, D. Ward, K. Krauter, R. Kucherlapati, Organization of the human keratin type II gene cluster at 12q13, Genomics 24 (1994) 502–508.
[10] S.E. Cole, R.H. Reeves, A cluster of keratin-associated proteins on
mouse chromosome 10 in the region of conserved linkage with human
chromosome 21, Genomics 54 (1998) 437–442.
[11] M. Takaishi, Y. Takata, T. Kuroki, N. Huh, Isolation and characterization of a putative keratin-associated protein gene expressed in embryonic skin of mice, J. Invest. Dermatol. 111 (1998) 128–132.
[12] N. Aoki, K. Ito, M. Ito, Hair follicle has a novel anagen-specific
protein, mKAP13, J. Invest. Dermatol. 111 (1998) 804–809.
[13] F. Kuhn, et al., Pmg-1 and pmg-2 constitute a novel family of KAP genes differentially expressed during skin and mammary gland development, Mech. Dev. 86 (1999) 193–196.
[14] A.V. Tkatchenko, et al., Overexpression of Hoxc13 in differentiating
keratinocytes results in downregulation of a novel hair keratin gene
cluster and alopecia, Development 128 (2001) 1547–1558.
[15] M.A. Rogers, et al., Characterization of a cluster of human high/ultrahigh sulfur keratin-associated protein genes embedded in the type I keratin gene domain on chromosome 17q12–21, J. Biol. Chem. 276 (2001) 19440–19451.
[16] H. Scheel, S. Tomiuk, K. Hofmann, A common protein interaction
domain links two recently identified epilepsy genes, Hum. Mol. Genet.
11 (2002) 1757–1762.
[17] J.D. Thompson, D.G. Higgins, T.J. Gibson, CLUSTALW: improving
the sensitivity of progressive multiple sequence alignment through
sequence weighting, position-specific gap penalties and weight matrix
choice, Nucleic Acids Res. 22 (1994) 4673–4680.
[18] C.E. Lawrence, et al., Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment, Science 262 (1993) 208–214.
[19] A.F. Neuwald, J.S. Liu, C.E. Lawrence, Gibbs motif sampling: detection of bacterial outer membrane protein repeats, Protein Sci. 4 (1995) 1618–1632.
[20] K. Quandt, K. Frech, H. Karas, E. Wingender, T. Werner, MatInd and
MatInspector: new fast and versatile tools for detection of consensus
matches in nucleotide sequence data, Nucleic Acids Res. 23 (1995)
4878–4884.
[21] L. Rowen, B.F. Koop, L. Hood, The complete 685-kilobase DNA
sequence of the human beta T cell receptor locus, Science 272
(1996) 1755–1762.
[22] F. Matsuda, et al., The complete nucleotide sequence of the human
immunoglobulin heavy chain variable region locus, J. Exp. Med. 188
(1998) 2151–2162.
[23] K. Kawasaki, et al., One-megabase sequence analysis of the human
immunoglobulin lambda gene locus, Genome Res. 7 (1997) 250–261.
[24] K. Kawasaki, et al., Evolutionary dynamics of the human immunoglobulin kappa locus and the germline repertoire of the Vkappa genes, Eur. J. Immunol. 31 (2001) 1017–1028.
[25] B.C. Powell, A. Nesci, G.E. Rogers, Regulation of keratin gene expression in hair follicle differentiation, Ann. N. Y. Acad. Sci. 642 (1991) 1–20.
[26] I. Dunham, et al., The DNA sequence of human chromosome 22,
Nature 402 (1999) 489–495.
[27] N. Levitt, D. Briggs, A. Gil, N.J. Proudfoot, Definition of an efficient
synthetic poly(A) site, Genes Dev. 3 (1989) 1019–1025.
[28] G. Adami, J.R. Nevins, Splice site selection dominates over poly(A)
site choice in RNA production from complex adenovirus transcription
units, EMBO J. 7 (1988) 2107–2116.
[29] M.A. Rogers, H. Winter, C. Wolf, M. Heck, J. Schweizer, Characterization of a 190-kilobase pair domain of human type I hair keratin genes, J. Biol. Chem. 273 (1998) 26683–26691.
[30] M.A. Rogers, H. Winter, L. Langbein, C. Wolf, J. Schweizer,
Characterization of a 300 kbp region of human DNA containing
the type II hair keratin gene domain, J. Invest. Dermatol. 114
(2000) 464–472.
[31] P.J. MacKinnon, B.C. Powell, G.E. Rogers, Structure and expression
of genes for a class of cysteine-rich proteins of the cuticle layers of
differentiating wool and hair follicles, J. Cell Biol. 111 (1990)
2587–2600.
[32] E.C. Uberbacher, Y. Xu, R.J. Mural, Discovering and understanding
genes in human DNA sequence using GRAIL, Methods Enzymol.
266 (1996) 259–281.
[33] C. Burge, S. Karlin, Prediction of complete gene structures in human
genomic DNA, J. Mol. Biol. 268 (1997) 78–94.
[34] M.Q. Zhang, Identification of protein coding regions in the human
genome by quadratic discriminant analysis, Proc. Natl. Acad. Sci.
USA 94 (1997) 565–568.
[35] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic
local alignment search tool, J. Mol. Biol. 215 (1990) 403–410.
[36] E.L.L. Sonnhammer, R. Durbin, A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis, Gene 167 (1995) GC1–GC10.
[37] K. Shibuya, et al., Isolation of two novel genes, DSCR5 and DSCR6,
from Down syndrome critical region on human chromosome 21q22.2,
Biochem. Biophys. Res. Commun. 271 (2000) 693–698.