

  • REVIEW ARTICLE published: 20 June 2013

    doi: 10.3389/fneur.2013.00076

    Trinucleotide repeats: a structural perspective

    Bruno Almeida, Sara Fernandes†, Isabel A. Abreu† and Sandra Macedo-Ribeiro*

    Instituto de Biologia Molecular e Celular, Universidade do Porto, Porto, Portugal

    Edited by: Thomas M. Durcan, McGill University, Canada

    Reviewed by: Denis Soulet, Laval University, Canada; Thomas M. Durcan, McGill University, Canada

    *Correspondence: Sandra Macedo-Ribeiro, Instituto de Biologia Molecular e Celular, Universidade do Porto, Rua do Campo Alegre 823, 4150-180 Porto, Portugal. e-mail: [email protected]

    †Present address: Sara Fernandes, Shannon ABC, Limerick Institute of Technology, Limerick, Ireland; Isabel A. Abreu, GplantS, Instituto de Tecnologia Química e Biológica, Oeiras, Portugal.

    Trinucleotide repeat (TNR) expansions are present in a wide range of genes involved in several neurological disorders and are directly involved in the molecular mechanisms underlying pathogenesis, through modulation of gene expression and/or the function of the encoded RNA or protein. Structural and functional information on the role of TNR sequences in RNA and protein is crucial to understanding the effect of TNR expansions in neurodegeneration. This review therefore provides a structural and functional view of TNR and encoded homopeptide expansions, with particular emphasis on polyQ expansions and their role in inducing the self-assembly, aggregation, and functional alteration of the carrier protein, which culminate in neuronal toxicity and cell death. Particular attention is given to ataxin-3, the polyQ-containing protein that causes Machado-Joseph Disease, providing clues to the impact of polyQ expansion and its flanking regions on ataxin-3 molecular interactions, function, and aggregation.

    Keywords: amino acid-repeats, microsatellites, protein complexes, protein aggregation, amyloid, protein structure

    TRINUCLEOTIDE REPEATS AND HUMAN DISEASE
    Trinucleotide repeat (TNR) expansions and their association with neurological disorders have been known for the past 20 years (La Spada et al., 1991). Expansions of CAG, GCG, CTG, CGG, and GAA repeats located in coding or non-coding sequences of different genes (summarized in Table 1; Figures 1 and 2) are associated with a diverse range of human monogenic diseases such as Spinobulbar Muscular Atrophy (SBMA, a.k.a. Kennedy disease), Huntington Disease (HD), Spinocerebellar Ataxias (SCAs), Oculopharyngeal Muscular Dystrophy (OPMD), Myotonic Dystrophy Type 1 (DM1), Fragile X-Associated Tremor Ataxia Syndrome (FXTAS), and Friedreich Ataxia (FRDA) (for a review see Orr and Zoghbi, 2007), with longer repeats correlating with earlier age at onset and increased disease severity. These TNRs are highly unstable: the repeat tract length can change between affected individuals within the same family and can differ between tissues (La Spada, 1997; Brouwer et al., 2009). More interestingly, in the brains of patients affected by CAG expansions, differences in repeat instability have been found between specific cell types (Pearson et al., 2005; Gonitel et al., 2008; Lopez Castel et al., 2010). GCG repeats are usually shorter than CAG repeats and are more stable across tissues and generations. The dynamic nature of these DNA repeat expansions is a consequence of their ability to form different secondary structures, which interfere with the cellular mechanisms of replication, repair, recombination, and transcription (for a recent review see Lopez Castel et al., 2010). The molecular mechanisms underlying pathogenesis in these disorders, whether associated with mental retardation or with neuronal or muscular degeneration, might result from alterations in the levels of gene expression and/or in the function of the encoded RNA or protein, mechanisms that likely act in concert to influence the pattern of selective cell toxicity. Some of those toxicity mechanisms are briefly discussed below.
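    The normal and disease repeat-length ranges summarized in Table 1 can be read as a simple lookup. The sketch below is only an illustration of that reading, using four entries taken from Table 1; the dictionary layout and the function name `classify_repeat_length` are assumptions introduced here, and the output is not a diagnostic criterion.

```python
# Illustrative lookup of repeat-length ranges for a few genes from Table 1.
# The data layout and function name are hypothetical, chosen for this sketch.
REPEAT_RANGES = {
    # gene: (repeat unit, normal (min, max), disease (min, max))
    "AR": ("CAG", (9, 36), (38, 62)),
    "HTT": ("CAG", (6, 34), (36, 121)),
    "ATXN3": ("CAG", (10, 51), (55, 87)),
    "DMPK": ("CTG", (5, 37), (90, 6500)),
}


def classify_repeat_length(gene, n_repeats):
    """Place a repeat count in the normal or disease range listed for a gene."""
    _unit, normal, disease = REPEAT_RANGES[gene]
    if normal[0] <= n_repeats <= normal[1]:
        return "normal"
    if disease[0] <= n_repeats <= disease[1]:
        return "disease"
    # Counts between or beyond the tabulated ranges are left unclassified.
    return "outside tabulated ranges"


if __name__ == "__main__":
    for gene, count in [("HTT", 20), ("HTT", 45), ("AR", 37), ("DMPK", 1000)]:
        print(gene, count, classify_repeat_length(gene, count))
```

    Counts that fall between the two tabulated ranges are deliberately left unclassified, since Table 1 gives no range for them.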

    TRINUCLEOTIDE REPEATS AND RNA STRUCTURE
    The formation of hairpin structures within the TNR RNA is related to the gain of toxic RNA function, the major pathogenic mechanism associated with CUG and CGG repeat expansions in non-coding regions of DM1 and FXTAS transcripts, which has also been shown to contribute to pathogenesis in CAG repeat disorders such as HD and Machado-Joseph disease (MJD, a.k.a. SCA3) (reviewed in Krzyzosiak et al., 2012). These duplex structures, whose stability is positively correlated with repeat size (Napierala and Krzyzosiak, 1997), sequester dsRNA-binding proteins involved in mRNA splicing such as CUG-binding protein (CUGBP) and muscleblind-like protein 1 (MBNL1) (Miller et al., 2000), inducing aberrant splicing in affected cells, compromising multiple intracellular pathways, affecting cell-quality control regulation, and ultimately resulting in cell dysfunction (Li and Bonini, 2010). Structural studies on model trinucleotide CUG, CAG, and CGG repeats forming double-stranded chains revealed the features induced by periodic U-U, A-A, and G-G mismatches, and provided hints about the structural details of pathogenic RNAs that are recognized by RNA-binding proteins (Mooers et al., 2005; Kiliszek et al., 2010, 2011; Kumar et al., 2011; Parkesh et al., 2011). MBNL1 is composed of four zinc-containing RNA-binding domains arranged in two tandem segments, with the C-terminal zinc-finger pair displaying a GC-sequence recognition motif (Teplova and Patel, 2008) and interacting with the stem region of expanded CUG RNAs (Yuan et al., 2007). Electron microscopy analysis of MBNL1:CUG136 complexes showed that the pathogenic dsRNA forms a scaffold

    www.frontiersin.org June 2013 | Volume 4 | Article 76 | 1


  • Almeida et al. Structure and function of trinucleotide repeats

    Table 1 | Human diseases associated with nucleotide repeat expansions (adapted from Messaed and Rouleau, 2009; Lopez Castel et al., 2010; Matos et al., 2011).

    | Disease name | Repeat type | Repeat location | Gene | Protein (UniProt identifier, number of residues) | Biological process^a | Normal repeat length | Disease repeat length | Protein structure determined? |
    |---|---|---|---|---|---|---|---|---|
    | Spinal and bulbar muscular atrophy (SBMA) | CAG | Protein coding region (polyQ) | AR | Androgen receptor (P10275, 919 residues) | Transcription, transcription regulation | 9–36 | 38–62 | Residues 20–30 and 671–919 (PDB code 1xow) |
    | Huntington's disease (HD) | CAG | Protein coding region (polyQ) | HTT | Huntingtin (P42858, 3142 residues) | Apoptosis | 6–34 | 36–121 | Residues 5–18 (3lrh), Residues 1–17 (2ld0, 2ld2), Residues 1–64 (3io4, 3io6, 3ior, 3iot, 3iou, 3iov, 3iow) |
    | Dentatorubral-pallidoluysian atrophy (DRPLA) | CAG | Protein coding region (polyQ) | ATN1 | atrophin 1 (P54259, 1190 residues) | Transcription, transcription regulation | 7–34 | 49–88 | No structural information |
    | Spinocerebellar ataxia 1 (SCA1) | CAG | Protein coding region (polyQ) | ATXN1 | ataxin 1 (P54253, 815 residues) | Transcription, transcription regulation | 6–39 | 40–82 | Residues 563–693 (1oa8) |
    | Spinocerebellar ataxia 2 (SCA2) | CAG | Protein coding region (polyQ) | ATXN2 | ataxin 2 (Q99700, 1313 residues) | No associated GO keywords for biological process | 15–24 | 32–200 | Residues 912–928 (3ktr) |
    | Spinocerebellar ataxia 3 (SCA3) | CAG | Protein coding region (polyQ) | ATXN3/MJD | ataxin 3 (P54252, 364 residues) | Transcription, transcription regulation, Ubl conjugation pathway | 10–51 | 55–87 | Residues 1–182 (1yzb), Residues 222–263 (2klz) |
    | Spinocerebellar ataxia 6 (SCA6) | CAG | Protein coding region (polyQ) | CACNA1A | CACNA1A, P/Q-type α1A calcium channel subunit (O00555, 2505 residues) | Calcium transport, ion transport, transport | 4–20 | 20–29 | Residues 1955–1975 (3bxk) |
    | Spinocerebellar ataxia 7 (SCA7) | CAG | Protein coding region (polyQ) | ATXN7 | ataxin 7 (O15265, 892 residues) | Transcription, transcription regulation | 4–35 | 37–306 | Residues 330–401 (2kkr) |
    | Spinocerebellar ataxia 17 (SCA17) | CAG | Protein coding region (polyQ) | ATXN17 | TATA box binding protein (TBP) (P20226, 339 residues) | Transcription, transcription regulation, Host-virus interaction | 25–42 | 47–63 | Residues 159–337 (1cdw, 1c9b, 1jfi, 1nvp, 1tgh) |
    | Multiple skeletal dysplasias (COMP) | GAC | Protein coding region (polyaspartate) | COMP | cartilage oligomeric matrix protein (a.k.a. Thrombospondin-5) (P49747, 757 residues) | Apoptosis, cell adhesion | 5 | 4, 6, 7 | Residues 225–757 (3fby) |
    | Synpolydactyly (HOXD13) | GCG | Protein coding region (polyA) | HOXD13 | homeobox D13 (P35453, 343 residues) | Transcription, transcription regulation | 15 | 22–29 | No structural information |

    (Continued)

    Frontiers in Neurology | Neurodegeneration June 2013 | Volume 4 | Article 76 | 2


    Table 1 | Continued

    | Disease name | Repeat type | Repeat location | Gene | Protein (UniProt identifier, number of residues) | Biological process^a | Normal repeat length | Disease repeat length | Protein structure determined? |
    |---|---|---|---|---|---|---|---|---|
    | Oculopharyngeal Muscular Dystrophy (OPMD) | GCG | Protein coding region (polyA) | PABPN1 | Polyadenylate-binding protein 2 (Q86U42, 306 residues) | mRNA processing | 10 | 12–17 | Residues 167–254 (3b4d, 3b4m, 3ucg) |
    | Cleidocranial dysplasia (CBFA1) | GCG | Protein coding region (polyA) | RUNX2 | Runt-related transcription factor 2 (Q13950, 521 residues) | Transcription; transcription regulation | 17 | 27 | No structural information |
    | Holoprosencephaly (ZIC2) | GCG | Protein coding region (polyA) | ZIC2 | Zinc-finger protein ZIC 2 (O95409, 532 residues) | Differentiation, neurogenesis, transcription, transcription regulation | 15 | 25 | No structural information |
    | Hand-Foot-Genital Syndrome (HOXA13) | GCG | Protein coding region (polyA) | HOXA13 | homeobox A13 (P31271, 388 residues) | Transcription, transcription regulation | 18 | 24–26 | No structural information |
    | Blepharophimosis/ptosis/epicanthus inversus syndrome type II (FOXL2) | GCG | Protein coding region (polyA) | FOXL2 | Forkhead box like 2 (P58012, 376 residues) | Differentiation, transcription, transcription regulation | 14 | 22–24 | Residues 322–328 (2l7z) |
    | Infantile spasm syndrome (ARX) | GCG | Protein coding region (polyA) | ARX | Aristaless-related homeobox (Q96QS3, 562 residues) | Differentiation, neurogenesis, transcription, transcription regulation | 10–16 | 17–23 | No structural information |
    | Myotonic dystrophy type 1 (DM1) | CTG | 3′UTR | DMPK | Myotonic dystrophy protein kinase (DMPK) (Q09013, 639 residues) | No associated GO keywords for biological process | 5–37 | 90–6500 | Residues 11–420 (2vd5), Residues 460–537 (1wt6) |
    | Friedreich ataxia (FRDA) | GAA | Intron | FXN | Frataxin (Q16595, 210 residues) | Heme biosynthesis, Ion transport, Iron storage, Iron, transport | 6–32 | >200 | Residues 88–210 (1ekg), Residues 91–210 (1ly7), Residues 82–210 (3s4m, 3s5d, 3s5e, 3s5f, 3t3j, 3t3k, 3t3l, 3t3t, 3t3x) |
    | Spinocerebellar ataxia 8 (SCA8) | CTG | 3′UTR | ATXN8 | Ataxin-8 (a.k.a. protein 1C2; present in SCA8-specific 1C2-positive intranuclear inclusions) (Q156A1, 80 residues) | Cell death | 2–130 | >110 | No structural information |

    (Continued)


    Table 1 | Continued

    | Disease name | Repeat type | Repeat location | Gene | Protein (UniProt identifier, number of residues) | Biological process^a | Normal repeat length | Disease repeat length | Protein structure determined? |
    |---|---|---|---|---|---|---|---|---|
    | Spinocerebellar ataxia 12 (SCA12) | CAG | 5′UTR | PPP2R2B | Serine/threonine-protein phosphatase 2A 55 kDa regulatory subunit B β isoform (Q00005, 443 residues) | Apoptosis | 7–45 | 55–78 | No structural information |
    | Huntington disease-like 2 (HDL2) | CAG | Alternative splice isoform 2 – polyA-expansion | JPH3 | Junctophilin 3 (Q8WXH2, 748 residues) | No associated GO keywords for biological process | 6–27 | 51–57 | No structural information |
    | FRAXA: fragile X syndrome | CGG | 5′UTR | FMR1 | Fragile X mental retardation 1 protein (Q06787, 632 residues) | Transport; mRNA transport | 6–52 | 230–2000 | Residues 1–134 (2bkd), Residues 216–280 (2fmr), Residues 216–425 (2qnd), Residues 527–541 (2la5) |
    | FXTAS: fragile X tremor/ataxia syndrome | CGG | 5′UTR | FMR1 | Fragile X mental retardation 1 protein (Q06787, 632 residues) | Transport; mRNA transport | 6–52 | 59–230 | Residues 1–134 (2bkd), Residues 216–280 (2fmr), Residues 216–425 (2qnd), Residues 527–541 (2la5) |
    | FRAXE: fragile X syndrome | CGG | 5′UTR | FMR2 | Fragile X mental retardation 2 protein (P51816, 1311 residues) | mRNA processing, mRNA splicing | 4–39 | 200–900 | No structural information |

    UTR, untranslated region. ^a Biological function based on Gene Ontology as annotated in UniProt.


    FIGURE 1 | Structural variability of proteins encoded by TNR-containing genes. Illustrative domain graphics of the multi-domain structure of proteins associated with polyQ-expansion diseases. All proteins shown are referenced by their name as annotated in UniProt. The protein domains for which information is annotated in the Pfam database are shown as colored boxes, with the Pfam family accession code referenced above the domain box. Complete names of domains can be accessed by searching the specific Pfam accession code at http://pfam.sanger.ac.uk/. Numbers below the domain schemes represent amino acid residue numbers. Regions containing the amino acid repeats and with a prediction for formation of coiled-coils (as annotated in UniProt) are shown, as well as regions with known 3D structure (boxed in red, with PDB accession codes shown). Notice the predominant location of the repeat regions within the N-terminal regions of the proteins.

    with tandemly spaced MBNL1 binding sites where MBNL1 oligomers with a ring-like structure can assemble, possibly leading to the formation of the ribonuclear foci identified in cell models of these TNR diseases (Yuan et al., 2007; de Mezer et al., 2011). The structure and stability of the TNR hairpins depend on the presence of interruptions as well as on the nature of the flanking regions. This might be related to the ability of individual repeats to participate in the RNA toxicity mechanisms (Krzyzosiak et al., 2012).
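    The periodic U-U, A-A, and G-G mismatches reported for CUG, CAG, and CGG repeat duplexes follow directly from folding a pure repeat tract back on itself. The sketch below is a toy model written for illustration only: it pairs the two ends of the tract inward around a terminal loop of assumed size (`loop_size=4`), ignores thermodynamics, wobble pairs, interruptions, and flanking sequence, and its function names are hypothetical.

```python
# Toy model of a hairpin formed by a pure trinucleotide repeat RNA.
# Assumptions: perfect fold-back, a fixed terminal loop, no wobble pairs.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def hairpin_stem_pairs(unit, n_repeats, loop_size=4):
    """Pair the 5' and 3' ends inward until only the loop remains unpaired."""
    seq = unit * n_repeats
    pairs = []
    i, j = 0, len(seq) - 1
    # Keep pairing while at least loop_size nucleotides lie between i and j.
    while j - i - 1 >= loop_size:
        pairs.append((seq[i], seq[j]))
        i += 1
        j -= 1
    return pairs


def periodic_mismatches(unit, n_repeats, loop_size=4):
    """Return the set of non-Watson-Crick oppositions found in the stem."""
    return {
        f"{a}-{b}"
        for a, b in hairpin_stem_pairs(unit, n_repeats, loop_size)
        if (a, b) not in WATSON_CRICK
    }


if __name__ == "__main__":
    for unit in ("CUG", "CAG", "CGG"):
        print(unit, sorted(periodic_mismatches(unit, 6)))
```

    In this simplified register every third stem position opposes the middle nucleotide of the repeat unit with itself, flanked by C-G and G-C pairs, which is the pattern the structural studies cited above describe.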

    In FRDA and FXTAS, pathogenesis results predominantly from decreased expression of the associated genes (FXN and FMR1/FMR2) caused by the expansion of GAA and CGG repeats, respectively, which results in loss of function of key proteins involved in iron-sulfur cluster biogenesis and mRNA translation


    FIGURE 2 | Structural variability of proteins encoded by TNR-containing genes. Illustrative domain graphics of the multi-domain structure of proteins associated with polyD- and polyA-expansion diseases. All proteins shown are referenced by their name as annotated in UniProt. The protein domains for which information is annotated in the Pfam database are shown as colored boxes, with the Pfam family accession code referenced above the domain box. Complete names of domains can be accessed by searching the specific Pfam accession code at http://pfam.sanger.ac.uk/. Numbers below the domain schemes represent amino acid residue numbers. Regions containing the amino acid repeats and with a prediction for formation of coiled-coils (as annotated in UniProt) are shown, as well as regions with known 3D structure (boxed in red, with PDB accession codes shown). Notice the predominant location of the repeat regions within the N-terminal regions of the proteins.

    at synapses. Nevertheless, in FXTAS, RNA toxicity is also proposed to play a role in pathogenesis (Li and Bonini, 2010). The recently discovered mechanisms of pathogenesis in spinocerebellar ataxia type 8 (SCA8) revealed the extreme complexity of TNR disorders. SCA8 is caused by expansion of CTG/CAG repeats in the affected gene, which are transcribed bi-directionally, generating expanded CUG- and CAG-containing transcripts that are further translated into homopolymeric proteins, so that pathogenesis can be mediated by both RNA and protein toxicity (Merienne and Trottier, 2009). Curiously, recent data have highlighted the possibility of non-ATG translation across expanded TNRs in all possible reading frames, which might further contribute to the generation of novel toxic proteins and RNAs, adding to the multi-parametric character of the pathogenic mechanisms associated with TNR diseases (Li and Bonini, 2010; Pearson, 2011; Sicot et al., 2011).
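    To make the consequence of bidirectional transcription and translation in all reading frames concrete, the sketch below lists the homopolymer encoded by each frame of a pure CAG tract and of its CTG reverse complement. It is a minimal illustration using the standard genetic code restricted to the six codons that occur in such tracts; it works on DNA letters rather than RNA, does not model how non-ATG translation is initiated, and the function names are hypothetical.

```python
# Which homopolymers can a pure CAG/CTG tract encode, frame by frame?
# Only the codons that occur in such tracts are listed (standard genetic code).
CODON_TABLE = {
    "CAG": "Q", "AGC": "S", "GCA": "A",  # frames of the CAG strand
    "CTG": "L", "TGC": "C", "GCT": "A",  # frames of the CTG strand
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def translate_frame(dna, frame):
    """Translate complete codons of dna starting at offset frame (0, 1, or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def homopeptides_from_repeat(unit, n_repeats):
    """Map (strand, frame) to the peptide encoded by a pure repeat tract."""
    sense = unit * n_repeats
    antisense = reverse_complement(sense)
    result = {}
    for label, strand in (("sense", sense), ("antisense", antisense)):
        for frame in range(3):
            result[(label, frame)] = translate_frame(strand, frame)
    return result


if __name__ == "__main__":
    for key, peptide in homopeptides_from_repeat("CAG", 10).items():
        print(key, peptide)
```

    Under these assumptions the CAG strand yields polyglutamine, polyserine, and polyalanine tracts, and the CTG strand yields polyleucine, polycysteine, and polyalanine tracts, depending on the frame.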

    TRINUCLEOTIDE REPEATS WITHIN PROTEIN CODING REGIONS

    Over 20 years ago, the finding that the expansion of CAG repeats within the coding sequence of the androgen receptor gene was the genetic basis of SBMA (La Spada et al., 1991) represented a landmark in the discovery of these novel dynamic mutations and their association with human disease. Some years later, the identification of intracellular inclusions containing the expanded proteins (Paulson et al., 1997) provided a clue to pathogenesis, directing research in the field into an extensive search for the mechanisms of polyQ-induced protein aggregation. The moderate expansion of GCG and CAG repeats, which are translated into polyA and polyQ tracts in the affected proteins (Figures 1 and 2), results in protein misfolding and aggregation, in accordance with a general, although not always unique, toxic gain of function mechanism of pathogenesis (Williams and Paulson, 2008). The appearance of insoluble cytoplasmic or nuclear inclusions enriched in the expanded polyA- or polyQ-containing protein constitutes a characteristic fingerprint of these diseases (Messaed and Rouleau, 2009; Orr, 2012a), regardless of their controversial role in pathogenesis. While the proteins containing polyA repeats are predominantly transcription factors with a role in development (see Table 1 and Amiel et al., 2004; Messaed and Rouleau, 2009), most of the proteins linked to polyQ-expansion diseases are involved in DNA-dependent regulation of transcription or neurogenesis and often have multiple intermolecular partners (Butland et al., 2007). Despite the overall lack of sequence or structural homology, both polyQ- and polyA-repeat expansions are associated with formation of β-rich amyloid-like protein inclusions, and with the wider group of protein misfolding disorders. These inclusions are enriched in ubiquitin, proteasome subunits, and chaperones, and often recruit macromolecules that are part of the macromolecular interaction networks associated with the proteins’ native functions (Williams and Paulson, 2008). As an example, the poly(A)-binding protein PABPN1 forms insoluble inclusions upon alanine expansion, co-aggregating with poly(A)-mRNA, proteasome subunits, ubiquitin, heat-shock proteins, and SKIP, a transcription factor associated with muscle-specific gene expression (Brais, 2003; Tavanez et al., 2009; Winter et al., 2013).

    The simplistic view of the predominant role of the inclusions in polyQ-induced pathogenesis was later challenged by the failure of this mechanism to explain the cell-specific vulnerability characteristic for each disease and by the identification of numerous examples of neuronal toxicity in the absence of visible intracellular inclusions (Arrasate et al., 2004). Indeed, the inclusions were shown to be fibrillar and display amyloid-like properties both in vivo and in vitro (Huang et al., 1998; Bevivino and Loll, 2001; Sathasivam et al., 2010) and, in a mechanistic parallel with the pathogenic mechanisms proposed for “classical” amyloids, many studies suggested that the insoluble inclusions played a protective role, sequestering toxic and misfolded protein conformers (Arrasate et al., 2004; Rub et al., 2006; Miller et al., 2010). Indeed, soluble intermediates in the aggregation pathway such as misfolded β-sheet rich polyQ protein monomers and oligomers have later been identified and proposed to represent the major toxic species (Kayed et al., 2003; Gales et al., 2005; Nagai et al., 2007; Miller et al., 2011). Also, in OPMD, the primary toxic species are proposed to be the soluble variants of the expanded polyA-repeat protein PABPN1 (Messaed et al., 2007). It is currently accepted that in polyQ disorders the expanded region plays a role in inducing the self-assembly of the carrier protein, which engages in pathogenic interactions and leads to the formation of toxic monomers or oligomers (Takahashi et al., 2008; Weiss et al., 2008) later converted to insoluble intracellular amyloid-like oligomers where both expanded and “normal” protein are sequestered along with other macromolecular partners (reviewed in Williams and Paulson, 2008; Matos et al., 2011; Costa and Paulson, 2012). As more biochemical data are gathered, more is understood about the role of amino acid expansions in modulating the interaction with macromolecular partners. As an example, expansion of the polyA tract in PABPN1 results in increased association with Hsp70 chaperones and type I arginine methyl transferases (Tavanez et al., 2009). This indicates that the distinct neuropathological features arising from this amino acid-repeat expansion might at least partially result from alterations in the native biological functions and macromolecular interactions of the carrier protein, which might vary in different intracellular environments.

    Recent data have shown that expansion of polyA repeats is frequently associated with loss of normal function altering a multitude of cellular pathways with consequences in cell functionality (Amiel et al., 2004; Messaed and Rouleau, 2009), although protein aggregation might also play a dominant role in some of the polyA-associated disorders (Messaed and Rouleau, 2009; Winter et al., 2013). Studies with polyQ proteins have shown that pathogenesis might result from a subtle imbalance in the association of the mutant protein with multiple cellular partners and that toxicity and neuronal death could result from a combination of protein self-assembly and functional alterations (Friedman et al., 2007; Li et al., 2007b; Lim et al., 2008; Kratter and Finkbeiner, 2010; Orr, 2012b; Pastore and Temussi, 2012). In fact, neuronal death as a result of polyQ-expansion seems to resemble that of the linker cell in C. elegans (Pilar and Landmesser, 1976; Chu-Wang and Oppenheim, 1978; Blum et al., 2012, 2013), which involves the polyQ protein pqn-4, pointing to a common mechanism for linker cell death and neuronal death in polyQ diseases (Blum et al., 2013).

    Polyglutamine diseases constitute a representative and widely studied group of neurodegenerative disorders where considerable amounts of data have been collected on the role of expanded polyQ for disease pathogenesis. However, given the proposed function of polyQ regions in mediating protein–protein interactions, which might be modulated by polyQ-expansion (Schaefer et al., 2012), the information on the role of these regions for native protein function, structure, and dynamics is still limited. Structural and functional information on the role of these repeat sequences in protein function is crucial to better understand how expansion affects selected neuronal subpopulations. Below, we briefly discuss the current knowledge on the function and structure of polyQ repeats and their role in macromolecular interactions, and finally focus on the known structural and functional information on ataxin-3, the protein whose mutation causes MJD.

    FUNCTION OF PolyQ ON PROTEIN–PROTEIN INTERACTIONS AND EVOLUTION

    Until recently, the function of many amino acid-repeat-containing proteins and the role of homopeptide regions were somewhat obscure. However, several global analysis studies on single amino acid-repeat-containing proteins shed light onto their function and onto the biological significance of the repeated region, in particular of polyQ, the most prevalent amino acid repetition in humans (Alba and Guigo, 2004). It is now accepted that TNR, particularly those located within protein-coding regions, are important mutators providing the genetic variability required for driving evolution (King, 1994; Kashi et al., 1997; Kashi and King, 2006; Nithianantharajah and Hannan, 2007). In fact, simple or low-complexity amino acid-repeats are rare within prokaryotic but extremely abundant within eukaryotic proteins, particularly over-represented in Plasmodium (49–90% of the total proteome), D. discoideum (52%), D. melanogaster (20%), C. elegans (9%), and H. sapiens (14%) (Haerty and Golding, 2010). Among all homopolymeric repeats, the most common in eukaryotic proteins are glutamine, asparagine, alanine, and glutamate repeats (Faux et al., 2005). This seems to indicate that there has been a strong negative selection against the appearance of hydrophobic amino acid-repeats with high tendency to aggregate, such as polyisoleucine, polyleucine, polyphenylalanine, and polyvaline (Oma et al., 2005, 2007).

    The homopeptide regions seem to be particularly relevant for brain development and function, since these repeated regions can be found in various neurodevelopmental genes (Nithianantharajah and Hannan, 2007). Indeed, the sexual behavior of prairie voles (Hammock and Young, 2005), as well as human pair-bonding (Walum et al., 2008), seems to be dependent on the repeat length in the vasopressin 1A receptor gene. A wide study of the distribution and function of homopeptide-containing proteins also demonstrated a clear trend in humans, D. melanogaster, and C. elegans, with the majority of homopeptide-containing proteins performing roles in transcription/translation and signaling processes and to a lesser extent in transport and adhesion processes (Faux et al., 2005). A similar profile was also found in a comparative analysis of proteins with amino acid-repeats in human and rodents (Alba and Guigo, 2004) and also in a comparative genomic study in domestic dogs, which unveiled an association between morphological variations and the length of the repeated region in the transcription factor-encoding genes ALX4 and RUNX2 (Fondon and Garner, 2004). Analysis of the human genome also revealed the existence of 64 CAG repeat-containing genes involved in biological processes such as regulation of transcription, binding of transcriptional co-activators and transcription factors, and in neurogenesis in general (Butland et al., 2007). Additionally, a detailed analysis of the human polyQ database (http://pxgrid.med.monash.edu.au/polyq/) (Robertson et al., 2011) also indicated that the majority of polyQ-containing proteins display domains involved in development (Homeobox domain-containing proteins, Fibroblast growth factor receptor), chromatin remodeling (Bromodomain and PHD-containing proteins), and signal transduction (PDZ domain-containing proteins), all biological processes that are highly dependent on protein–protein interactions and associated with the formation of multicomponent protein complexes. As for humans, analysis of bovine polyQ proteins revealed an enrichment for large multi-domain transcriptional regulators (Whan et al., 2010).
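Homopeptide tracts of the kind catalogued in these surveys can be located with a simple scan of a protein sequence. The sketch below is a minimal illustration and not the method used in the cited studies: the function name, the toy sequence, and the minimum run length of five residues are all assumptions made here, and published analyses apply their own thresholds and may tolerate interruptions within a tract.

```python
import re


def find_homopeptides(sequence: str, residue: str = "Q", min_len: int = 5):
    """Return (start, end, length) for each uninterrupted run of `residue`.

    Positions are 1-based and inclusive. `min_len` is an assumed cutoff
    chosen for illustration, not a value taken from the literature.
    """
    pattern = re.compile(f"{re.escape(residue)}{{{min_len},}}")
    return [
        (m.start() + 1, m.end(), m.end() - m.start())
        for m in pattern.finditer(sequence.upper())
    ]


# Made-up toy sequence: a 12-residue polyQ tract and a short QQQ run.
toy = "MSTA" + "Q" * 12 + "PGLA" + "QQQ" + "KDE"
tracts = find_homopeptides(toy)  # the QQQ run falls below the cutoff
```

Passing `residue="A"` or another one-letter code scans for other homopolymeric repeats with the same logic.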

    It is currently accepted that the majority of repeat-containing proteins perform roles in processes that require the assembly of large multiprotein or protein/nucleic acid complexes (Faux et al., 2005; Hancock and Simon, 2005; Whan et al., 2010). Supporting this notion is the fact that homopolymeric amino acid-repeats are considered to be unstructured (Gojobori and Ueda, 2011) and that intrinsically unstructured regions are suggested to constitute macromolecular docking sites, which become structured only when bound to cognate ligand partners (Huntley and Golding, 2002; Simon and Hancock, 2009). In fact, “hub proteins” contain significantly longer and more frequent repeats or disordered regions, which facilitate binding to multiple partners (Dosztanyi et al., 2006). Recently, Fiumara et al. (2010) found an overrepresentation of coiled-coil domains in polyQ-containing proteins and in their interaction partners, which are able to form α-helical supersecondary structures, often inducing protein oligomerization (Parry et al., 2008). Thus, polyQ tracts, due to their intrinsic structural flexibility, which is largely influenced by the flanking residues (see PolyQ: A Simple Sequence Repeat with a Polymorphic Structure below), may act as stabilizers of intra- and intermolecular protein interactions, possibly by extending a neighboring coiled-coil region to promote its interaction with a coiled-coil region in an interacting protein partner (Schaefer et al., 2012). A detailed analysis revealed heptad repeats typical of coiled-coils in regions flanking or overlapping polyQ stretches, whose disruption is sufficient to impair CHIP-huntingtin interaction, indicating that coiled-coils are crucial for polyQ-mediated protein contacts. Importantly, coiled-coils also seem to be important for the regulation of aggregation and insolubility of polyQ-containing proteins (see below and Fiumara et al., 2010) as recently proposed by Petrakis et al. (2012), who discovered a recurrent presence of coiled-coil domains in ataxin-1 misfolding enhancers, while such domains were not present in suppressors.

    Based on the several observations on the function of polyQ-containing proteins it is suggested that a general function of polyQ, as for the majority of repeat sequences, is to aid in the assembly of macromolecular complexes, either by tethering distant domains or through interactions with the polyQ itself (Gerber et al., 1994; Korschen et al., 1999; Faux et al., 2005). By affecting protein interactions, and being present in particular functional classes such as transcription factors, polyQ is considered central to the evolution of this type of proteins and consequently crucial to the evolution of cellular signaling pathways (Hancock and Simon, 2005).

    A structural analysis of polyQ repeats and their flanking domains, as well as their role in protein aggregation, will be discussed in greater detail in the next sections.

    STRUCTURAL STUDIES ON PolyQ REPEATS

    Since the discovery that polyQ repeats are associated with human neurodegenerative diseases, a huge effort has been made to determine the structure of polyQ and to understand how expansion of the repeat affects the structure of the carrier protein and/or the normal interaction with molecular partners. The first evidence of the aggregation-prone character of polyQ-rich proteins came from studies with glutamine-rich cereal storage proteins and synthetic glutamine polypeptides (Beckwith et al., 1965; Krull et al., 1965). After the discovery that a number of neurological disorders were triggered by expansion of a polyQ tract in different and unrelated proteins (La Spada et al., 1994), and before intracellular inclusions enriched in the polyQ-expanded protein were identified as a major fingerprint in these diseases (Davies et al., 1997; Paulson et al., 1997), Perutz (1994) anticipated that the expanded polyQ tract could mediate protein–protein interactions causing protein aggregation in neurons and recruiting other polyQ-rich proteins such as transcription factors leading to cellular dysfunction. Below, the structural features and self-assembly properties of polyQ sequences are briefly discussed (for a detailed review on the biophysical and structural features of polyQ, see Wetzel, 2012).

    PolyQ: A SIMPLE SEQUENCE REPEAT WITH A POLYMORPHIC STRUCTURE

    In order to elucidate the structure of the glutamine repeat and to uncover the structural changes induced by polyQ expansion, several strategies have been put forward including (a) the structural analysis of polyQ-containing peptides of different lengths, (b) the characterization of proteins of well-known structure after insertion of an exogenous polyQ repeat, and structural determination of (c) polyQ-antibody complexes, or (d) natural polyQ-rich proteins.

    Using synthetic peptides containing 15 glutamine repeats, Perutz and coworkers proposed that polyQ stretches could self-associate forming hydrogen bonds between their side-chain amide groups and the main chain of a neighboring β-strand, to form cross-β structures (polar zippers) (Perutz, 1994). This study was followed by many reports where synthetic polyQ peptides were used as models of the biophysical properties of polyQ-rich proteins, which established that polyQ-containing peptides have a tendency toward self-assembly into amyloid-like structures (Chen et al., 2002a). Moreover, the results obtained in vitro reflected disease features observed in vivo such as the correlation between larger polyQ size, increased protein aggregation, and earlier disease onset (Chen et al., 2002b; Kar et al., 2011). Circular dichroism studies of polyQ peptides in solution have shown that their monomeric forms lack regular secondary structure (Altschuler et al., 1997; Klein et al., 2007) and additional biophysical experiments proposed that these peptides can adopt collapsed (Crick et al., 2006; Dougan et al., 2009; Peters-Libeu et al., 2012) or extended (Singh and Lapidus, 2008) coils in solution whose compactness was strongly correlated with the polyQ size (Walters and Murphy, 2009). The determination of the structure of monomeric polyQ peptides with atomic detail is however still lacking as a result of their intrinsic conformational flexibility and tendency to aggregate into heterogeneously sized β-rich oligomers. From the combination of experimental and theoretical methods a picture for polyQ structure and aggregation is emerging, where the monomeric polyQ adopts an ensemble of conformations lacking regular secondary structures that assemble into β-structures in a polyQ-length dependent fashion (Vitalis et al., 2009; Walters and Murphy, 2009, 2011; Williamson et al., 2010; Kar et al., 2011). Divergent results proposing the existence of predominantly extended or collapsed conformations or the minimum size for polyQ aggregation are likely due to the differences in the introduction of variable flanking residues (Kar et al., 2011). They might also result from the insertion of different polyQ tract interrupting residues (Walters and Murphy, 2011), or be a consequence of the protocols used for the preparation and disaggregation of the peptides used for the biophysical studies (Jayaraman et al., 2011). Most results obtained with these peptides do not generally take into account the possible effects of the protein context on the structural properties of the polyQ stretches, a particularly relevant feature considering that the role of non-polyQ domains in protein aggregation has been reported for ataxin-1 (de Chiara et al., 2005), ataxin-3 (Gales et al., 2005), and huntingtin (Tam et al., 2009; Thakur et al., 2009; Liebman and Meredith, 2010).

    In a pioneering work, Stott et al. (1995) inserted a G-Q10-G peptide into the inhibitory loop of chymotrypsin inhibitor 2 (CI2), a soluble small protein from barley seeds, showing that this CI2-polyQ chimera has an increased tendency for self-assembly. Even though a CI2 variant with four glutamines crystallized, the structure of the CI2-Q4 dimer showed that the polyQ region was disordered and that oligomerization was mediated by domain swapping (Figure 3A) and not by direct polyQ association (Chen et al., 1999). A structure resembling the proposed polar zipper was later observed between two asparagines in the hinge loop of the major domain swapped dimer of bovine pancreatic ribonuclease A (Liu et al., 2001) (Figure 3B). Insertion of a 10 glutamine repeat within this hinge loop of ribonuclease A resulted in domain swapping, oligomerization, and amyloid-like fiber formation, but strikingly the enzyme within the fibers was catalytically active, retaining its native fold (Sambashivan et al., 2005). However, although the structure of the domain swapped dimer was solved by X-ray crystallography, the repeat region was not visible in the electron density maps.

    FIGURE 3 | Structure of proteins/protein domains containing polyQ regions. (A) Cartoon representation of the domain swapped dimer of chymotrypsin inhibitor 2 with a 4 glutamine insertion [(Chen et al., 1999); PDB accession code 1cq4]; dotted lines represent the polyQ linker not visible in the X-ray crystal structure. (B) Cartoon representation of domain swapped major dimers of ribonuclease A. Inset shows a short segment resembling the polar zipper formed by asparagine residues in the linker region [(Liu et al., 2001); PDB accession code 1f0v]. (C) Surface representation of the Fv fragment of a monoclonal antibody in complex with a polyQ peptide shown as sticks [(Li et al., 2007a), PDB accession code 2otu]. (D) Cartoon representation of the glutamine-rich domain from HDAC4 showing details of the polar interactions (dotted lines) at the oligomer interfaces involving glutamine residues [(Guo et al., 2007), PDB accession code 2o94]. (E) Cartoon representation of the crystal structures of huntingtin exon-1 fragments observed in different crystal forms, highlighting the different orientations of the C-terminal polyQ residues shown as sticks. The 17 glutamine stretch adopts variable conformations in the structures: α helix, random coil, and extended loop. [(Kim et al., 2009), PDB accession codes 3io4, 3iow, 3iov, 3iou, 3iot, 3ior, 3io6].


    A first overview of a short polyQ stretch at atomic resolution resulted from the structure of a polyQ10 peptide (GQ10G) (Figure 3C) bound to MW1, an antibody against polyQ. This structure reveals that polyQ adopts an extended, coil-like structure in which contacts are made between side chains and/or main chain atoms of all 10 glutamines and the antibody-combining site (Li et al., 2007a). The peculiar structural features of these repeat-containing regions were also revealed by the crystallographic structure of a glutamine-rich domain of human histone deacetylase 4 (HDAC4), which folds into a tetramer-forming straight α-helix (Figure 3D). The protein interfaces consist of multiple hydrophobic patches separated by polar interaction networks, in which clusters of glutamines engage in extensive intra- and interhelical interactions (Guo et al., 2007). Further details on the structure of polyQ were unveiled by the high-resolution crystal structures of huntingtin (HD) exon 1, containing 17 glutamines (Htt17Q) (Kim et al., 2009). Htt17Q in fusion with maltose-binding protein (MBP) folds into an amino-terminal α-helix followed by a polyQ17 region that adopts multiple conformations in the different crystal forms, including α-helix, random coil, and extended loop, and a polyproline helix formed by the polyP11 and mixed P/Q regions (Figure 3E). The authors suggested that the shallow equilibrium between α-helical, random coil, and extended conformations can be subtly altered by the size of the polyQ sequence, the neighboring protein context, protein interactions, or by changes in cellular environment, and that this polymorphic behavior is a common characteristic of many amyloidogenic proteins (Kim et al., 2009).

    SELF-ASSEMBLY AND AGGREGATION OF PolyQ REPEATS

    The first approaches to characterize polyQ-induced protein aggregation and pathogenesis in the context of a full-length protein included the insertion of the polyQ peptides into well-known non-pathogenic protein carriers such as hypoxanthine phosphoribosyltransferase (HPRT), which resulted in a neurological phenotype mimicking that observed in mice expressing the mutant HD truncated protein (Ordway et al., 1997). In vitro studies aiming at better characterizing the structure and function of polyQ repeats in the context of full-length soluble proteins included the insertion of ectopic polyQ stretches into well-characterized and soluble proteins such as CI2 (Stott et al., 1995; Chen et al., 1999), myoglobin (Mb) (Tanaka et al., 2001; Tobelmann and Murphy, 2011), glutathione S transferase (GST) (Masino et al., 2002; Bulone et al., 2006) and the B domain from Staphylococcus aureus Protein A (SpA) (Saunders et al., 2011). Fusion of the polyQ sequences with stable and soluble proteins moderates the intrinsic polyQ peptide aggregation propensity, but induces the self-assembly of carrier proteins into fibrillar amyloid-like structures, a nucleation-dependent process whose kinetics is directly proportional to the size of the inserted polyQ repeat. Likewise, polyQ peptides are able to seed the aggregation of intracellular soluble polyQ-containing proteins when added to cell cultures, conferring a heritable phenotype of self-sustaining seeding, resembling a prion-like mechanism (Ren et al., 2009), reviewed in Cushman et al. (2010).

    The impact of the polyQ tract and its expansion on the perturbation of the structure of flanking sequences and domains is critically dependent on the location of the amino acid-repeats, revealing impressive location-dependent changes in structural stability and fibril morphology of the host proteins (Robertson et al., 2008; Saunders et al., 2011; Tobelmann and Murphy, 2011). Curiously, the studies with these model proteins showed that stability and structure of the carrier protein remained unaltered by polyQ expansion when the repeat was inserted at the N- or C-terminus of the structured domain (Robertson et al., 2008), mimicking the location of polyQ tracts in most disease-related proteins (Figure 1).

    The role of the flanking regions in modulating protein fibril formation in polyQ disease proteins is well supported by experimental data (de Chiara et al., 2005; Gales et al., 2005; Bhattacharyya et al., 2006; Saunders and Bottomley, 2009; Tam et al., 2009; Thakur et al., 2009; Liebman and Meredith, 2010), in agreement with the knowledge that different polyQ-containing proteins have a diverse threshold for aggregation. For example, addition of a polyproline extension after the polyQ repeat slows down aggregation (Bhattacharyya et al., 2006), while protein domains outside the polyQ tract [e.g., Josephin domain (JD) of ataxin-3 and AXH domain of ataxin-1] have been shown to contribute to protein aggregation (Masino et al., 2004; de Chiara et al., 2005; Gales et al., 2005; Ellisdon et al., 2006, 2007). The multitude of data on the polyQ-induced aggregation of disease and non-disease proteins highlights the complex interplay between the polyQ region and the adjacent protein domains. In light of the polymorphic nature of the polyQ and the modulation of its structural features by the protein context, two general mechanisms have been proposed for polyQ-mediated toxicity (Kim et al., 2009): (a) the expanded polyQ stretch adopts a novel conformation that mediates toxicity or is the precursor to toxic species; (b) intra- or intermolecular protein interactions mediated by expanded polyQ in the random coil conformation are sufficient to result in pathological effects. In both cases the affinity of the interactions involving the expanded polyQ region could be higher with selected target proteins, leading to a preference of the disease proteins for some of the protein partners, a fact that is in agreement with the hypothesis raised by Zuchner and Brundin (2008), who postulate that resistance to NMDA receptor-mediated excitotoxicity occurring in some mouse models for HD is a consequence of a differential binding of partner proteins, in a polyQ tract size dependent manner, to the proline-rich domain of huntingtin. In this context, differences in molecular interactions occurring in a cell- and tissue-specific manner would result in different toxicities according to particular cellular environments.

    Given the above mentioned studies, it is now clear that the polyQ region influences aggregation of proteins, but this process is highly dependent on the surrounding protein context. Therefore, even though the structural information on peptides and proteins with polyQ expansions is a useful guideline for the investigation of the pathogenic effects of polyQ expansion, each of the proteins involved in polyQ diseases shows distinctive characteristics, cellular roles, and structural properties, causing difficulties in the formulation of structural hypotheses that could explain how different monomeric conformations of polyQ lead to various aggregated species and how they contribute to neurotoxicity.


    PolyQ REPEATS IN ATAXIN-3 FUNCTION AND DYSFUNCTION

    Machado-Joseph disease is an inherited neurodegenerative disorder of adult onset originally described in people of Portuguese Azorean descent but later shown to be the most common autosomal dominant spinocerebellar ataxia worldwide. Clinically, it is characterized by ataxia, ophthalmoplegia, and pyramidal signs, associated in variable degree with dystonia, spasticity, peripheral neuropathy, and amyotrophy (Coutinho and Andrade, 1978). Pathologically, the disorder is associated with degeneration of the deep nuclei of the cerebellum, pontine nuclei, subthalamic nuclei, substantia nigra, and spinocerebellar nuclei (Coutinho et al., 1982; Rosenberg, 1992; Margolis and Ross, 2001). It is caused by an expansion of a repetitive CAG tract within the ATXN3 gene (Kawaguchi et al., 1994). While in the healthy population the number of CAG repeats ranges between 10 and 51, in MJD patients the length of the ataxin-3 polyQ tract exceeds 55 consecutive residues. Ataxin-3 is a modular protein, located both in the nucleus and the cytoplasm (Perez et al., 1999; Antony et al., 2009; Macedo-Ribeiro et al., 2009), encompassing an N-terminal globular JD, with structural similarity to cysteine proteases (Scheel et al., 2003; Albrecht et al., 2004), followed by an extended tail composed of two ubiquitin interaction motifs (UIMs), the expandable polyQ tract, and a C-terminal region (Matos et al., 2011). The C-terminal region of ataxin-3 may contain a third UIM, depending on the splice variant (Goto et al., 1997), with the 3UIM isoform of ataxin-3 being predominantly found in the brain (Harris et al., 2010). Currently, the physiological function of ataxin-3, as well as the molecular mechanism by which expanded polyQ sequences cause selective neurodegeneration, remain mostly unknown. However, since it is ubiquitously expressed and cell death is region specific, neurodegeneration is currently viewed as depending on sequence and structural features outside the ataxin-3 polyQ tract [reviewed in Matos et al. (2011) and references therein].
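The repeat ranges quoted above for ATXN3 can be expressed as a small helper, shown here only as an illustrative sketch. The thresholds are taken from the text (10 to 51 repeats in the healthy population, more than 55 in MJD patients); the function names, the "unclassified" label for values outside both ranges, and the counting of uninterrupted CAG units only are assumptions introduced for illustration, and the code is not a diagnostic tool.

```python
import re

NORMAL_RANGE = range(10, 52)  # 10 to 51 repeats, healthy population (from the text)
EXPANDED_MIN = 56             # "exceeds 55" in MJD patients (from the text)


def longest_cag_run(dna: str) -> int:
    """Count the repeat units in the longest uninterrupted CAG run.

    Assumption: interrupting codons end a run; real tracts may contain them.
    """
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_atxn3_repeat(n_repeats: int) -> str:
    """Place a repeat count in the ranges quoted in the text."""
    if n_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if n_repeats >= EXPANDED_MIN:
        return "expanded"
    if n_repeats in NORMAL_RANGE:
        return "normal"
    return "unclassified"  # hypothetical label for counts outside both ranges


# Made-up toy allele with 14 CAG units between arbitrary flanks.
toy_allele = "ATG" + "CAG" * 14 + "GGG"
status = classify_atxn3_repeat(longest_cag_run(toy_allele))
```

Counts of 52 to 55 are deliberately left unclassified because the text does not assign them to either group.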

    ATAXIN-3 BIOLOGICAL ROLES

    ATXN3 orthologs have been identified in eukaryotic organisms including protozoans, plants, fungi, and animals (Albrecht et al., 2004; Costa et al., 2004; Rodrigues et al., 2007). Several functions have been ascribed to ataxin-3 based on studies with orthologs. Specifically, a role in cell structure and/or motility was proposed for mouse ataxin-3 as it is highly abundant in all types of muscle and in ciliated epithelial cells (Costa et al., 2004). In fact, ataxin-3 is able to interact with tubulin through its JD (Figure 4), with nM affinity (Mazzucchelli et al., 2009), which supports a role in cell structure. Interestingly, data on the ataxin-3 C. elegans ortholog not only reinforce a function in structure/motility and signal transduction (Rodrigues et al., 2007), but also indicate a function in development as absence of ATXN3 strongly modifies expression of several development-related genes. ATXN3 knock-out animals showed no obvious deleterious phenotype, probably due to a putative redundant function between ataxin-3 and other JD-containing proteins, such as ataxin-3-like protein, Josephin 1 and Josephin 2, all containing a typical cysteine protease catalytic triad. However, the studies with ATXN3 knock-out animals revealed an overall increase in the levels of ubiquitinated proteins (Schmitt et al., 2007) and signs of altered expression of core sets of genes associated with the ubiquitin-proteasome and signal transduction pathways (Rodrigues et al., 2007), pointing to a dual function of ataxin-3 in the ubiquitin-proteasome system and transcriptional regulation (Matos et al., 2011; Orr, 2012a).

    Ataxin-3 function as transcriptional regulator

    The putative role of ataxin-3 in transcriptional regulation is proposed to entail the modulation of histone acetylation and deacetylation at selected promoters. Ataxin-3 interacts with the major histone acetyltransferases cAMP-response-element binding protein (CREB)-binding protein (CBP), p300, and p300/CREB-binding protein-associated factor (KAT2B/PCAF, Figures 4 and 5), and is proposed to inhibit transcription at specific promoters (e.g., MMP-2 promoter) either by blocking access to histone acetylation sites or through recruitment of histone deacetylase 3 (HDAC3) and nuclear receptor co-repressor (NCOR1; Figures 4 and 5) (Li et al., 2002; Evert et al., 2006). Although the interaction sites have not been mapped in detail for all these proteins, co-immunoprecipitation experiments showed that KAT2B/PCAF, p300, and CBP bind exclusively to the polyQ-containing C-terminal region of ataxin-3 (Figure 4), apparently in a polyQ-size-dependent manner (Li et al., 2002). Experimental evidence also indicates that ataxin-3 forms part of a CREB-containing complex, although no direct interaction has been observed between the two proteins (Li et al., 2002). In contrast, the N-terminal region of ataxin-3 directly binds histones H3 and H4 (Table 2; Figure 4) (Li et al., 2002). Of note, p300 and CBP, as well as NCOR1, also encompass amino acid repetitions in their sequences. Interestingly, in huntingtin and in ataxin-1, polyQ interferes with CBP-activated gene transcription via interaction of their glutamine-rich domains (Shimohata et al., 2000; Nucifora et al., 2001) and mutant huntingtin targets specific components of the core transcriptional machinery, in a glutamine-tract length-sensitive manner (Zhai et al., 2005), pinpointing once again the role of the amino acid-repeat region in the establishment of protein–protein interactions.

    Ataxin-3 molecular function: ubiquitin hydrolase

    A role for ataxin-3 in ubiquitin-dependent pathways was proposed by bioinformatic analysis (Scheel et al., 2003; Albrecht et al., 2004), and its ability to bind and cleave poly-ubiquitin chains and polyubiquitinated proteins was later demonstrated experimentally (Burnett et al., 2003; Chai et al., 2004). Importantly, inhibition of ataxin-3 catalytic activity results in an increase of polyubiquitinated proteins, resembling the effects of proteasome inhibition (Berke et al., 2005), indicating that ataxin-3 is involved with proteins targeted for proteasomal degradation. The function of ataxin-3 in the ubiquitin-proteasome system was further supported by the identification of its association with the ubiquitin-like domain of the human homologs of the yeast DNA repair protein Rad23, HHR23A and HHR23B (Wang et al., 2000; Doss-Pepe et al., 2003; Nicastro et al., 2005, 2009), with valosin-containing protein (VCP)/p97 (Hirabayashi et al., 2001; Doss-Pepe et al., 2003; Boeddrich et al., 2006; Zhong and Pittman, 2006), and with the ubiquitin ligase E4B (Matsumoto et al., 2004) (Figures 4 and 5). Strikingly, the weak direct association between ataxin-3 and E4B is strongly reinforced by the addition of VCP/p97, indicating that these proteins form part of a higher order macromolecular complex to regulate the degradation of misfolded ER proteins (Matsumoto et al., 2004; Zhong and Pittman, 2006) (Figure 5).

    FIGURE 4 | Overview of ataxin-3 structural information. Schematic illustration of ataxin-3 (isoform 2; a.k.a. 3UIM isoform) domain structure highlighting the regions involved in protein–protein interactions. The solution structures of the Josephin domain (PDB accession code 1yzb) and UIMs1-2 (PDB accession code 2klz) are shown colored from N- (blue) to C-terminus (red). JD-, UIM-, NLS-, and polyQ-mediated interactions are represented by blue, red, green, and purple arrows, respectively; blue arrows indicate the location of post-translational modification sites, resulting from the interaction and phosphorylation by CK2 and GSK3. Representative multi-subunit complexes where ataxin-3 participates are boxed (Li et al., 2002; Matsumoto et al., 2004; Scaglione et al., 2011; Durcan et al., 2012). One of the main questions in the quest for ataxin-3 interacting proteins is whether polyQ-expansion of the disease-protein modulates the binding affinities. Current data indicate that polyQ-expansion increases the ataxin-3 affinity for CHIP (Scaglione et al., 2011), VCP/p97 (Matsumoto et al., 2004; Boeddrich et al., 2006; Zhong and Pittman, 2006), and the transcription regulators p300, CBP, and PCAF (Li et al., 2002) (interactions represented by broken lines). Strikingly, all these interactions are mediated by the ataxin-3 flexible tail, which includes the polyQ tract. Moreover, the transcriptional regulators p300, CBP, and NCOR all contain amino acid repeats.

    Biochemical studies showed that ataxin-3 displays a strong preference for chains containing four or more ubiquitins (Chai et al., 2004) and that full-length ataxin-3 and its JD both display proteolytic activity toward either linear substrates containing a single ubiquitin molecule (Burnett et al., 2003; Chow et al., 2004b; Weeks et al., 2011) or K48/K63-linked poly-ubiquitin chains (Winborn et al., 2008; Todi et al., 2009), displaying also the capacity to bind the ubiquitin-like protein NEDD8 in a substrate-like fashion (Ferro et al., 2007). Moreover, ataxin-3-like protein, Josephin 1 and Josephin 2 also display ubiquitin protease activity (Tzvetkov and Breuer, 2007; Weeks et al., 2011), although the relative activities are highly variable in spite of their high sequence similarity. Characterization of ataxin-3 ubiquitin hydrolase activity has also revealed that the full-length protein preferentially cleaves Lys-63-linked and mixed-linkage chains with more than four ubiquitins (Burnett et al., 2003; Winborn et al., 2008). This specificity is dictated by the UIMs, as the isolated JD shows a preference toward the disassembly of Lys-48-linked chains (Nicastro et al., 2009, 2010). Altogether, this indicates that ataxin-3 ubiquitin hydrolase activity is likely to be associated with delivery of target substrates to the proteasome rather than with their rescue from degradation, as happens with most of the other deubiquitinases (Ventii and Wilkinson, 2008; Matos et al., 2011; Scaglione et al., 2011). Interestingly, ubiquitin hydrolase activity of ataxin-3 is not affected by polyQ expansion and both normal and expanded ataxin-3 are able to increase the cellular levels of a short-lived GFP normally degraded by the ubiquitin-proteasome pathway (Burnett et al., 2003).

    The 3D structures for JD alone or in the presence of ubiquitin as well as that of the tandem UIM1-UIM2 have already been determined (Mao et al., 2005; Nicastro et al., 2005, 2009; Song et al., 2010), giving a structural perspective on the ubiquitin hydrolase function of ataxin-3. The JD contains two ubiquitin binding sites, both of hydrophobic nature, with site 1 being negatively charged to


    Table 2 | Human ataxin-3 associated proteins.

    Ataxin-3 interacting protein (UniProt accession code) | Protein name | Direct interaction? | Interaction domain: ataxin-3 | Interaction domain: partner protein | Reference
    ------------------------------------------------------------------------------------------------------------------------
    CELL-QUALITY CONTROL (PROTEIN HOMEOSTASIS)
    HHR23A/B (P54725/P54727) | UV excision repair protein RAD23 homolog A/B | Yes, kD (JD:Ubl) = 12 µM | JD | Ubiquitin-like (Ubl) N-terminal domain | Wang et al. (2000), Doss-Pepe et al. (2003), Nicastro et al. (2005, 2009)
    Poly-ubiquitin (P0CG48/P0CG47) | Polyubiquitin-C/Polyubiquitin-B | Yes, kD (atxn3:K48-tetraUb) = 0.2 µM, kD (atxn3:Ub) = 50 µM | UIMs, JD | K48- and K63-linked Ub (≥4 Ub), K48-linked diUb | Burnett et al. (2003), Doss-Pepe et al. (2003), Chai et al. (2004), Nicastro et al. (2009, 2010)
    Ubiquilin-1 (Q9UMX0) | Protein linking IAP with cytoskeleton 1 | n.d. | n.d. | n.d. | Heir et al. (2006)
    NEDD8 (Q15843) | Ubiquitin-like protein Nedd8 | Yes | JD | NEDD8 | Ferro et al. (2007)
    Parkin (O60260) | E3 ubiquitin-protein ligase parkin | Yes | JD, UIMs | IBR domain, Ubiquitin-like (Ubl) domain | Durcan et al. (2011, 2012)
    Ubc7 (P62253) | Ubiquitin-conjugating enzyme E2 G1 | Yes (transient interaction detected using cross-linking reagents) | n.d. | n.d. | Durcan et al. (2012)
    p45 (P62195) | 26S proteasome regulatory subunit 8 | Yes | N-terminal atxn3 region (residues 1–133) | n.d. | Wang et al. (2007)
    20S Proteasome (P25786, P25787, P25788, P25789, P28066, P60900, O14818, P20618, P49721, P49720, P28070, P28074, P28072, Q99436) | Proteasome subunits α types 1-7 and β types 1-7 | n.d. | N-terminal atxn3 region (residues 1–150) | n.d. | Doss-Pepe et al. (2003)
    CHIP (Q9UNE7) | E3 ubiquitin-protein ligase CHIP | Yes, kD (atxn3:CHIP) = 2.[…]M, kD (atxn3:Ub-CHIP) = 0.[…]M | Atxn3 C-terminus (residues 133–357) | CHIP N-terminus | Jana et al. (2005), Scaglione et al. (2011)
    VCP/p97 (P55072) | Transitional endoplasmic reticulum ATPase | Yes | Residues 277–281 (includes arginine/lysine-rich NLS) | N domain, residues 1-199 | Hirabayashi et al. (2001), Doss-Pepe et al. (2003), Matsumoto et al. (2004, ?), Boeddrich et al. (2006), and Zhong and Pittman (2006)
    E4B (O95155) | Ubiquitin conjugation factor E4 B | Yes (with 79Q-ataxin-3) | n.d. | n.d. | Matsumoto et al. (2004)
    OTUB2 (Q96DC9) | Ubiquitin thioesterase OTUB2 | n.d. | n.d. | n.d. | Sowa et al. (2009)
    USP13 (Q92995) | Ubiquitin carboxyl-terminal hydrolase 13 | n.d. | n.d. | n.d. | Sowa et al. (2009)
    KCTD10 (Q9H3F6) | BTB/POZ domain-containing adapter for CUL3-mediated RhoA degradation protein 3 | n.d. | n.d. | n.d. | Sowa et al. (2009)
    Tubulin dimer (Q71U36/P68363) | Tubulin α-1A, Tubulin β-2B | Yes, kD (atxn3:tubulin) = 50–70 nM | JD | n.d. | Mazzucchelli et al. (2009)
    Dynein (Q9Y6G9) | Cytoplasmic dynein 1 light intermediate chain 1 | n.d. | n.d. | n.d. | Burnett and Pittman (2005)
    HDAC6 (Q9UBN7) | Histone deacetylase 6 | n.d. | n.d. | n.d. | Burnett and Pittman (2005)
    TRANSCRIPTIONAL REGULATION
    p300 (Q09472) | Histone acetyltransferase p300 | Yes | PolyQ-containing C terminus of atxn3 (residues 288–354) | n.d. | Li et al. (2002)
    CBP (Q92793) | cAMP-response-element binding protein (CREB)-binding protein | Yes | PolyQ-containing C terminus of atxn3 (residues 288–354) | n.d. | Li et al. (2002)
    PCAF (Q92831) | p300/CREB-binding protein-associated factor: histone acetyltransferase KAT2B | Yes | PolyQ-containing C terminus of atxn3 (residues 288–354) | n.d. | Li et al. (2002)
    Histone H3/H4 (P68431/P62805) | Histone | Yes | JD + UIM1 and 2 (residues 1–288) | n.d. | Li et al. (2002)
    HDAC3 (O15379) | Histone deacetylase 3 | Yes | n.d. | n.d. | Evert et al. (2006)
    NCOR1 (O75376) | Nuclear receptor corepressor 1 | n.d. | n.d. | n.d. | Evert et al. (2006)
    MAML3 (Q96JK9) | Mastermind-like protein 3 | n.d. | n.d. | n.d. | Ravasi et al. (2010)
    EWSR1 (Q01844) | RNA-binding protein EWS | n.d. | n.d. | n.d. | Vinayagam et al. (2011)
    SIGNAL TRANSDUCTION
    CK2 (P19784) | Casein kinase II subunit […] | Yes | n.d. | n.d. | Tao et al. (2008), Mueller et al. (2009)
    GSK3B (P49841) | Glycogen synthase kinase-3 β | Yes | n.d. | n.d. | Fei et al. (2007), Vinayagam et al. (2011)
    DNM2 (P50570) | Dynamin-2 | n.d. | n.d. | n.d. | Vinayagam et al. (2011)
    CDKN1A (P38936) | Cyclin-dependent kinase inhibitor 1 | n.d. | n.d. | n.d. | Vinayagam et al. (2011)
    ANXA7 (P20073) | Annexin A7 | n.d. | n.d. | n.d. | Vinayagam et al. (2011)
    RPS6AK1 (Q15418) | Ribosomal protein S6 kinase α-1 | n.d. | n.d. | n.d. | Vinayagam et al. (2011)
    TK1 (P04183) | Thymidine kinase, cytosolic | n.d. | n.d. | n.d. | Vinayagam et al. (2011)
    MKNK1 (Q9BUB5) | MAP kinase-interacting serine/threonine-protein kinase 1 | n.d. | n.d. | n.d. | Vinayagam et al. (2011)
    ATAXIOME
    TEX11 (Q8IYF3) | Testis-expressed sequence 11 protein | n.d. | n.d. | n.d. | Lim et al. (2006)
    C16orf70 (Q9BSU1) | UPF0183 protein C16orf70 | n.d. | n.d. | n.d. | Lim et al. (2006)
    ARHGAP19 (Q14CB8) | Rho GTPase-activating protein 19 | n.d. | n.d. | n.d. | Lim et al. (2006)
    PICK1 (Q9NRD5) | PRKCA-binding protein | n.d. | n.d. | n.d. | Lim et al. (2006)

    Boxes shaded in gray represent associations identified in high-throughput interactome screenings. Atxn3, ataxin-3; IBR, In Between Ring fingers; JD, Josephin domain; n.d., not determined; NLS, nuclear localization sequence; Ub, ubiquitin; UBA, ubiquitin associated domain; Ubl, ubiquitin-like domain; UIM, ubiquitin-interacting motifs.
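
    To put the dissociation constants listed in Table 2 on a common footing, the following sketch computes the fraction of ataxin-3 expected to be bound at a given free partner concentration, assuming a simple 1:1 equilibrium (fraction bound = [L] / (kD + [L])). The binding model, the 1 µM example concentration, and all identifiers are assumptions made for illustration; the kD values are those reported in Table 2, with the lower bound used for the tubulin range.

```python
# Dissociation constants from Table 2, converted to molar units.
KD_MOLAR = {
    "JD:HHR23 Ubl": 12e-6,
    "atxn3:K48-tetraUb": 0.2e-6,
    "atxn3:Ub": 50e-6,
    "atxn3:tubulin": 50e-9,  # lower bound of the reported 50-70 nM range
}


def fraction_bound(free_ligand_molar: float, kd_molar: float) -> float:
    """Fraction of protein bound under a 1:1 equilibrium binding model."""
    if free_ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and kD must be > 0")
    return free_ligand_molar / (kd_molar + free_ligand_molar)


if __name__ == "__main__":
    example_concentration = 1e-6  # hypothetical free partner concentration (1 µM)
    for complex_name, kd in KD_MOLAR.items():
        occupancy = fraction_bound(example_concentration, kd)
        print(f"{complex_name}: {occupancy:.2f}")
```

    Under these assumptions, a partner at a concentration equal to its kD occupies half of the available ataxin-3, which is why the 250-fold gap between K48-tetraUb and mono-ubiquitin translates into very different occupancies at the same concentration.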


    FIGURE 5 | Overview of ataxin-3 protein interaction network. Data on the ataxin-3 interactors were obtained by analysis of the Interactome3D (Mosca et al., 2012), MINT (Ceol et al., 2010), and Dr. PIAS (Sugaya and Furuya, 2011) protein interaction databases, and completed with data compiled from current literature on ataxin-3 protein associations obtained with a diverse set of experimental approaches (see complete information in Table 2). Red arrows indicate interactions for which structural data have been obtained, while orange arrows indicate that biophysical data on interaction affinity in vitro are known (Table 2). Broken arrows represent interactions that result from high-throughput interactome analysis and still require detailed biochemical and functional analysis. Proteins are grouped according to their biological role.

    facilitate docking of the positively charged ubiquitin C-terminus close to the catalytic site. Binding of ubiquitin to site 1 is of crucial importance for both JD and full-length ataxin-3 activity as ubiquitin hydrolase (Nicastro et al., 2010). Site 2 confers ubiquitin-chain linkage preference to ataxin-3 and it overlaps with the surface for interaction of the ubiquitin-like domain in HHR23B (Nicastro et al., 2005, 2010). The solution structure for the two UIMs (UIM1 and UIM2), which are separated by a short 2 amino acid spacer, revealed that they fold into two α-helices separated by a flexible linker (Song et al., 2010). Upon ubiquitin binding, this structure adopts a typical helix-loop-helix folding pattern, where hydrophobic interactions dominate the complex formation (Song et al., 2010). When in tandem, UIM1 and UIM2 show higher binding affinity for mono- or poly-ubiquitin than individual UIMs, suggesting a cooperative binding mechanism (Song et al., 2010). The effect of the presence of UIM3 on ataxin-3 binding affinity for ubiquitin has not been shown, but its role in ubiquitin chain binding and recognition is unlikely to be of relevance to ataxin-3 activity, since no differences in proteolytic activity were identified when the 2UIM and 3UIM isoforms were compared. In the model proposed for ataxin-3 ubiquitin chain proteolysis, the UIMs (UIM1-UIM2) select and recruit poly-ubiquitin substrates, presenting them to the catalytic JD for cleavage (Mao et al., 2005).

    Even though ataxin-3 functions as ubiquitin hydrolase, its proteolytic activity is rather low, indicating that either ataxin-3/JD requires additional factors (post-translational modifications, cofactors, intracellular interactions) to exhibit significant proteolytic activity or the substrates used in vitro so far are not optimal. Interestingly, only three amino acid mutations are sufficient to significantly increase the proteolytic activity of ataxin-3, to a value close to that of ataxin-3-like protein (Weeks et al., 2011). Under physiological conditions, one candidate for an activating signal is mono-ubiquitination at K117, which has been shown to increase the enzyme’s rate of cleavage of Lys-63 linked substrates (Todi et al., 2009). However, the molecular mechanism by which ubiquitination increases enzyme activity is still not clear, nor is it known whether other cellular signals (e.g., phosphorylation by CK2 or GSK3β; Fei et al., 2007; Tao et al., 2008) may also modulate the activity of ataxin-3. Interestingly, the JD-containing protein Josephin 1 was also demonstrated to cleave ubiquitin chains only after it is mono-ubiquitinated (Seki et al., 2013). The regulation of ataxin-3 activity through ubiquitination might depend on the interaction of ataxin-3 with several E3 ubiquitin ligases (Durcan and Fon, 2013), such as the C-terminus of 70 kDa heat-shock protein (Hsp70)-interacting protein (CHIP), parkin, and E4B (Figure 5), since all were shown to promote ataxin-3 ubiquitination and regulate its degradation by the proteasome (Matsumoto et al., 2004; Jana et al., 2005; Miller et al., 2005). Association of ataxin-3 with CHIP is a multistep process regulated by mono-ubiquitination of the N-terminal region of CHIP by the E2-conjugating enzyme Ube2w, and occurs through the region encompassing polyQ and UIM1 and 2 (Jana et al., 2005) (Figure 4). As observed for other interactions involving the C-terminal region of ataxin-3, the ataxin-3-CHIP complex is affected by polyQ expansion and the polyQ-expanded protein displays a sixfold increase in binding affinity (Scaglione et al., 2011). The presence of ataxin-3 in multicomponent E3-ligase complexes is also supported by the identification of a direct interaction with parkin, an association that stabilizes the interaction between parkin and the E2-conjugating enzyme Ubc7 (Durcan et al., 2011). In contrast with what is observed in the ataxin-3:CHIP complex, ataxin-3 association with parkin remains unaltered by polyQ expansion (Durcan et al., 2012) (Figure 4). However, we still do not understand the mechanisms that regulate shuttling of ataxin-3 between these functional complexes or how its distribution is modulated by polyQ expansion. Further biochemical studies are required to establish the correlation between these macromolecular interactions and their relevance for ataxin-3 aggregation and neurodegeneration in MJD patients.
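
    The sixfold increase in CHIP binding affinity reported for polyQ-expanded ataxin-3 can be restated as a change in binding free energy. The sketch below applies the standard relation ΔΔG = -RT ln(fold increase), assuming that the fold change reflects a ratio of equilibrium dissociation constants and assuming a temperature of 310 K; both assumptions, and the function name, are introduced here for illustration and are not taken from the cited study.

```python
import math

GAS_CONSTANT_J_PER_MOL_K = 8.314462618  # molar gas constant R


def binding_energy_shift_kj_per_mol(affinity_fold_increase: float,
                                    temperature_k: float = 310.0) -> float:
    """Return ddG = -RT ln(fold) in kJ/mol; negative values mean tighter binding."""
    if affinity_fold_increase <= 0 or temperature_k <= 0:
        raise ValueError("fold increase and temperature must be positive")
    rt = GAS_CONSTANT_J_PER_MOL_K * temperature_k
    return -rt * math.log(affinity_fold_increase) / 1000.0


if __name__ == "__main__":
    # Sixfold tighter binding of polyQ-expanded ataxin-3 to CHIP (value from the text).
    print(f"{binding_energy_shift_kj_per_mol(6.0):.2f} kJ/mol")
```

    At the assumed 310 K this corresponds to roughly -4.6 kJ/mol, a modest stabilization that nevertheless shifts the balance between competing complexes when partner concentrations are limiting.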

    ATAXIN-3 AGGREGATION: A MULTISTEP PATHWAY MODULATED BY THE PROTEIN CONTEXT

    A characteristic hallmark of MJD and other polyQ-expansion diseases is the appearance of intracellular inclusions enriched in the disease protein and containing components from the cell-quality control machinery (e.g., ubiquitin, proteasome subunits, and chaperones), indicating that these diseases form part of the larger family of protein misfolding disorders (Williams and Paulson, 2008). Early in vitro studies showed that expansion of the polyQ tract within the pathological range induced formation of insoluble β-rich fibrils with the capacity to bind amyloid-specific dyes (Bevivino and Loll, 2001). Later it was demonstrated that non-pathological ataxin-3 could also form insoluble fibrillar aggregates upon destabilization of its structure by temperature, pressure or denaturing agents (Marchal et