Przemysław Kaszubski Faculty of English From PICLE search to IFAConc Corpora in Poland PLM Session Poznań 8 Sept. 2012


Przemysław Kaszubski
Faculty of English

From PICLE search to IFAConc

Corpora in Poland PLM Session
Poznań 8 Sept. 2012

Outline

• Research goal(s) and questions
• Disclaimers and challenges
• The story 2006-2012:
  – evolving solutions (incl. demos)
    • PICLE search and Perl Concordancer
    • bigram and trigram tools
    • Error concordancer
    • IFA Student concordancer
    • IFAConc
• Some conclusions and plans

Overall research goal(s)

• Pedagogic “action-like” research:
  – probing the potential of corpus-based e-learning for (my) EAP writing instruction
  – exploring options for “seamless” integration with coursework
• Some questions:
  – Can students be (successful?) (corpus) explorers of language (for their own sake)?
  – (meta-)linguistic background knowledge (or remnants of it): facilitator or inhibitor for data-driven learning (DDL)?
  – can (controlled?) corpus exploration facilitate constructivist (practical) knowledge making (= knowing how to write (this) better / best)?
  – bottom-up exploration or / and top-down instruction?

Some challenges and disclaimers

• small corpora (100–300 k words)
• non-indexed text search
• self-made online tools
• questions of time:
  – speed of search
  – time of satisfactory data analysis (any tool!)

Why like this?

• “experimental” assumption:
  – if tools with these limitations can work with learners, then ...
• fun of creation
• flexibility and freedom of development
• availability of man-power:
  – student programmers
  – seminar students as corpora collectors
  – student writing groups as testers / testees
• EAP / EGAP / ESAP context
  – special(ist) corpora

The start: Briefly about PICLE

• Polish sub-corpus of the International Corpus of Learner English (ICLE)

• 330,000 words of running text (over 500 essays)
• Major part (c. 230,000 words) published on ICLE CD-ROM in 2002 (2006, 2nd ed), together with comparable English learner corpora collected in other EFL countries.
• A 50-thousand-word sampler has been error-tagged
• Can be (re-)searched online, unlike most other learner corpora

Some (lexical) research insights from PICLE (1)

Misuse
• 'HAVE/GIVE (sb) possibility to <do sth>' = *'MIEĆ/DAĆ (komuś) możliwość / sposobność <zrobienia czegoś>' (Polish: 'to have/give (sb) the possibility / opportunity <of doing sth>')
  – ... the adoptive parents have influenced their child, without giving him any choice or *possibility to "try out" other options.
  – For this reason we should reread a story because it gives us *possibility to look at the literary work from a perspective.
• BNC (chance, likelihood of):
  – [...] ... led him to the perception that man has the possibility of changing his state of consciousness.
  – The sample was so arranged as to be fully representative over the country as a whole, and everyone had the same possibility of being included.

Some (lexical) research insights from PICLE (2)

Overuse
• High frequency vocabulary
• adverbs of stance (boosters): definitely, certainly, undoubtedly, for sure

“favourite” phraseology:
• BE full of <sth>:
  – Our television is full of programmes unsuitable for young viewers, ...
• that/this BE why:
  – Since imagination belongs to one of the most important of our features we cannot deprive ourselves of it. That is why, many of us are (...).
• TAKE care (of <sb/sth>):
  – ... duties on the side of a woman, who is now expected not only to take care of her house and family but also to find time for professional work.

Some (lexical) research insights from PICLE (3)

Underuse / avoidance

• E.g.: collocation breadth: attributive adjectives before attitude

• To be sure:

Exclusive NS use:

The motivations for both sexes, to be sure, are different.

There is plenty of violence, to be sure, but it is a nice violence and no one gets killed.

Beginnings

• How (better?) to share (and discover?) such learner-corpus insights with learners?
  – items for (passive) study (usage alerts etc.)
  – items for study AND the corpus method ..? Let’s try!
• potential usefulness of DDL assumed

PICLE search / Perl Concordancer

• From one corpus to a range of (comparable) corpora
• Tool hub:
  – http://ifa.amu.edu.pl/~kprzemek/PICLE_search.php
• “Perl Concordancer(s)”:
  – http://ifa.amu.edu.pl/~kprzemek/concord2advr/search_adv_new.html
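The idea of a concordancer running as a plain, non-indexed text search can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the code behind the tools linked above; the function name `kwic`, the `width` parameter and the two sample sentences (adapted from the PICLE and BNC examples earlier in the talk) are assumptions made for illustration only.

```python
import re

def kwic(text, pattern, width=30):
    """Return keyword-in-context (KWIC) lines for every regex match in text."""
    lines = []
    # Non-indexed search: the raw text is scanned linearly on every query.
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so that the keywords form one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Illustrative mini-corpus (hypothetical, two sentences only).
sample = (
    "It gives us possibility to look at the literary work. "
    "Everyone had the same possibility of being included."
)

for line in kwic(sample, r"\bpossibility\b", width=20):
    print(line)
```

Because every query rescans the whole text, search time grows with corpus size, which is one reason why this approach suits corpora of a few hundred thousand words.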

Bigram and trigram tools

• towards more “search-worthy” items
• bigrams:
  – http://ifa.amu.edu.pl/~kprzemek/concord3/bigram.html
• trigrams counter:
  – http://ifa.amu.edu.pl/~kprzemek/concord3/trigram.php
• problems:
  – “geek” tools
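What a bigram or trigram counter does can be shown in a few lines. This is a rough Python sketch, not the implementation of the tools linked above; the function name `ngram_counts`, the simple tokenisation rule and the sample text are assumptions. As a simplification, n-grams are allowed to run across sentence boundaries.

```python
import re
from collections import Counter

def ngram_counts(text, n=2):
    """Count word n-grams in text (n=2 for bigrams, n=3 for trigrams)."""
    # Simplified tokenisation: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Zip n shifted copies of the token list to get consecutive n-word windows.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

# Hypothetical sample text echoing the learner phraseology discussed earlier.
sample = "That is why we take care of it. That is why many of us take care."

print(ngram_counts(sample, 2).most_common(3))
print(ngram_counts(sample, 3).most_common(2))
```

Frequent n-grams such as "that is why" or "take care" then become candidate search items for the concordancer.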

Error concordancing

• List-driven:
  – http://ifa.amu.edu.pl/~kprzemek/concord2adv/errors/errors.htm
• “direct error concordancer”:
  – http://ifa.amu.edu.pl/~kprzemek/concord2advr/error-builder.php
• problems:
  – direct interpretability?
  – away from the “error” corpus evidence towards exposure to, and noticing, NS usage...

IFA Student Concordancer

• IFAConc’s predecessor:
  – http://ifa.amu.edu.pl/~kprzemek/concord2-login/index.html
• Problems:
  – interface issues
  – search syntax
  – getting students to do it, e.g.:
    • need for integration of prompted and spontaneous work
    • integration with other (online) (writing) course tasks

IFAConc inspirations

• Tim Johns’s ‘Kibbitzer’
• Tom Cobb’s URL-driven concordance feedback
• Aston’s corpora for ESP + Hyland’s emphasis on ESAP
• Linguistic theory:
  – Hoey’s lexical priming
  – also: Sinclair’s ‘extended unit of meaning’, Stubbs’ ‘phrasal schemas’, Goldberg’s CCxG, Halliday’s metafunctions
• SLA and CALL theory: ‘default path’, DDL, constructionism and Web 2.0
• current DDL: Gavioli’s ‘samples’ vs ‘examples’, Widdowson’s authentication
• Coniam’s “concordancing oneself”
• online Cobuild Sampler (friendly search syntax)
• web search engines

IFAConc conceptions

• friendly (enough), but demanding (some) deep-level processing (noticing, interpretation, adoption/rejection):
  – patterns of use / meaning
  – variation
• bottom-up and top-down access
• recommended theoretical platform (cf. ‘default path’ in CALL)
• relevance
  – authentication
  – personalisation
• collaborative as well as individual

IFAConc success target: A good human concordancer

• initiates searches
• adjusts searches
• interprets searches
  – specific linguistic insights
  – awareness
• applies interpretation (authenticates)
  – on a task (e.g. revision, vocab learning activities)
  – personal record (annotation)
  – discusses / shares findings – co-annotation, discussion
• personalizes the tool
  – annotations
  – personal corpora

IFAConc initial technologies (1)

• each search is a web link
  – concordances are interactive, not static
• each search is recorded (user-logging)
  – user interface
  – teacher-admin interface
• each search can be annotated
  – possible interaction with admin/teacher
• contrasting corpora along a cline of specialisation
  – EGP -> EGAP -> ESAP
  – EFL varieties
  – personal(ised) corpora
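The principle that each search is a web link can be sketched as a round trip between search settings and a URL. The example below is a Python illustration under stated assumptions: the base address `https://example.org/concordancer` and the parameter names `q`, `corpus` and `sample` are hypothetical and do not describe IFAConc's actual URL scheme.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical address, used only for illustration (not IFAConc's URL).
BASE_URL = "https://example.org/concordancer"

def search_link(query, corpora, sample_size=20):
    """Encode a search (query, selected corpora, sample size) as a shareable link."""
    params = {"q": query, "corpus": corpora, "sample": sample_size}
    # doseq=True repeats the 'corpus' key once per selected corpus.
    return f"{BASE_URL}?{urlencode(params, doseq=True)}"

def parse_link(url):
    """Recover the search settings from a link so the search can be re-run or logged."""
    fields = parse_qs(urlsplit(url).query)
    return {
        "query": fields["q"][0],
        "corpora": fields.get("corpus", []),
        "sample_size": int(fields["sample"][0]),
    }

link = search_link("possibility of", ["EGAP", "ESAP"])
print(link)
print(parse_link(link))
```

Since the whole search is carried by the link, a teacher can paste it into feedback on a student text, and the same string can serve as the record in a search history.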

IFAConc initial technologies (2)

• corpora easily switchable on and off
• random sampling (e.g. 20 lines – Sinclair, after Hunston 2002)
• wider context view (cf. ‘shunting’, Halliday)
• Not only corpus search interface:
  – History – past work, possibly annotated
  – Resources – repository with recommended useful content and / or tasks
• enhancing web integration and publicity: RSS, blog, Moodle, traditional CALL
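Random sampling of concordance lines, mentioned above with 20 lines as an example, might look roughly like this in Python. The function name `sample_lines`, the `seed` parameter and the dummy hit list are assumptions for illustration, not a description of IFAConc's code.

```python
import random

def sample_lines(lines, k=20, seed=None):
    """Return at most k randomly chosen concordance lines, kept in corpus order."""
    rng = random.Random(seed)  # a fixed seed makes a sample repeatable
    if len(lines) <= k:
        return list(lines)
    # Sample positions without replacement, then sort to preserve corpus order.
    picked = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in picked]

# Hypothetical result set of 100 hits.
hits = [f"concordance line {i}" for i in range(1, 101)]
subset = sample_lines(hits, k=20, seed=1)
print(len(subset))
```

A small random sample keeps the reading load manageable for learners while still giving a fair picture of the pattern across the whole result set.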

IFAConc – brief phase 1 to phase 2 change log

• Tests (tasks, monitoring, questionnaires) performed up to 2010 showed:
  – Anybody can conduct a reasonably successful analysis (cf. > 200 extramurals)
  – Returning users (gradually) search and interpret better
  – procedure too teacher-intensive
  – need to increase breadth of searches / analyses
• 2010 improvements in UX (user experience), e.g.:
  – system more interactive
  – clearer highlighting of error-prone learner data
  – optimization of training and teacher-student interactions => towards an e-learning environment
  – Encouraging boost in annotation quality after the changes
• Latest enhancements (after 2011):
  – context reading mode
  – more sharing options (History entries / corpora)
• CAVEAT: General dev problem:
  – new features vs. (pedagogical / research) focus

IFAConc – some HIGHLIGHTS ...

IFAConc highlights: Corpora Search

IFAConc highlights: Feedback link in student text (by comparison)

IFAConc highlights: Shared entry task prompt

IFAConc highlights: Resources training page

IFAConc highlights: User monitoring (1)

IFAConc highlights: User monitoring (2) – S-T collaborative annotation

IFAConc highlights: User monitoring (3) – email notification

H-98307 'devoted' - annotation update by 'jagodawasik'

IFAConc highlights: Output: potential lexical primings: literary criticism vs. linguistics

IFAConc highlights: Output: Likely overuse / underuse cases

IFAConc – demo of today’s look

• http://ifa.amu.edu.pl/~ifaconc
• student interface
• teacher / admin interface

• Pardon the imperfections (server changes):

Some IFAConc lessons learned

• The system CAN work and could be self-sustainable, but:
• Enforced mode => free-use
  – research goal: fine-tuning of automatic student-tool interaction
  – problematic peer-feedback-based constructivism (“collectivist” culture? But: Is students’ web experience not changing that?)
• Prolonging the “novelty effect”
  – at least within one assumed user (learner) cycle
• Steady user experience enhancements:
  – increasing the ease of use
    • improvement of ways-in without sacrificing learner effort (interpretation, authentication)
    • facilitation of annotations (integration of Corpora Search and History)
  – enhancing interpretation options
    • new corpora, new / improved training tasks etc.

My relevant earlier presentations

• PALC (Lodz) 2007
• TALC (Lisbon) 2008
• PALC (Lodz) 2009
• CL (Liverpool) 2009
• TALC (Brno) 2010
• ALL (Tuebingen) 2011