New Meeting Corpus @ IDIAP
Daniel Gatica-Perez, Iain McCowan, Samy Bengio
Corpus Administration – Joanne Schulz
Technical Assistance – Thierry Collado, Olivier Masson
Features of the new corpus
Meeting room equipped with more sensors
Meetings based on scenarios
Majority of scenarios for real meetings
Participants act naturally
3-5 participants per meeting
Natural length: 30-80 minutes
Devices in the SMR
24 microphones (2 arrays, lapels, binaural manikin)
3 cameras
Projector capture device
Whiteboard capture device
Personal notes capture devices (Logitech pens)
Electronic versions of documents (e.g. papers)
Under evaluation
– close-view cameras
– EPFL camera whenever available
Features of each meeting
People entering/leaving get recorded
Real-life interruptions OK (e.g. latecomers, cell phones)
Varying visual clutter (photographs, bookshelves)
Meeting artifacts included (documents, laptops, coffee)
Some meetings have agendas
Single and multi-session meetings
So far, native and non-native speakers are mixed
Current look
Current scenarios
Corpus project meeting
– Weekly meeting with the people involved in the recording process to discuss its progress.
Conference report
– People returning from conferences hold a meeting to present what they learned and discuss emerging trends.
Current scenarios (2)
Book club
– Several clubs read non-technical books and meet to discuss them. Meetings will occur once or twice a month.
Technical reading group
– Staff members get together every two weeks to review and discuss relevant papers in a field of interest.
Current scenarios (3)
Presentation rehearsals
– Students/researchers rehearse a presentation in front of a group. Discussion occurs naturally.
Creation of reading area at IDIAP
– Staff and students meet to discuss location, furnishings…
Other scenarios
– To be specified and played out as the process goes on.
Current corpus status (1)
Recordings started in late November
18 meetings, ~14 hours (as of Jan 20th)
– Corpus project meeting: 5
– Conference report: 8
– Book club: 1
– Technical reading group: 2
– Presentation rehearsals: 1
– Other (Brno camera): 1
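As a quick consistency check on the figures above, the per-scenario counts can be tallied and compared with the stated total and the expected 30-80 minute meeting length. This is only a sketch: the variable names are illustrative, and "~14 hours" is an approximate figure, so the mean length is approximate too.

```python
# Per-scenario meeting counts, copied from the status list above.
meetings_per_scenario = {
    "Corpus project meeting": 5,
    "Conference report": 8,
    "Book club": 1,
    "Technical reading group": 2,
    "Presentation rehearsals": 1,
    "Other (Brno camera)": 1,
}

# Total number of recorded meetings across all scenarios.
total_meetings = sum(meetings_per_scenario.values())

# "~14 hours" is approximate, so the mean below is a rough estimate only.
approx_total_hours = 14
mean_minutes = approx_total_hours * 60 / total_meetings

print(f"total meetings: {total_meetings}")
print(f"approximate mean length: {mean_minutes:.1f} minutes")
```

The tally matches the 18 meetings reported, and the approximate mean length falls inside the 30-80 minute range given for natural meeting length.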
Current corpus status (2)
Media conversion procedure ready
– One script from DV tape to DivX
Under evaluation:
– Subcontractor for speech transcriptions
– Annotation tool for meeting actions (candidates: Noldus’ The Observer, Brno video annotation tool)
– Recruitment of annotators
The procedure
Each meeting coordinated by one person
Corpus administrator is present but “hidden”
Meetings booked via a web site (IDIAP only)
Room is ready: enter, hold meeting, leave
All feedback via e-mail
Ethical concerns
Steps to protect participants
Procedures to track the data and ensure respectful use
Creation of an internal ethics committee
Meeting participants will have the opportunity to ‘bleep’ data
Timeline and outlook
Core meeting set: 20 meetings
– Raw data on mmm: late February
– Speech transcriptions: mid Feb - ???
– Group action annotations: mid Feb - late March
What to do with the dataset?
– Define common processing tasks
– Define protocols for evaluation
Initial group action annotations
Group turn-taking
– floor, dialogue, discussion
Group focus-of-attention
– whiteboard, presentation, notes, table, unfocused
Group level-of-interest
– high, low
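One possible way to represent these three label sets in code is sketched below. The label values come from the slide; everything else is an assumption made for illustration: the class names, the `GroupActionSegment` record, the use of seconds for segment boundaries, and the choice to attach all three labels to a shared segment (the corpus may well annotate them as independent tiers).

```python
from dataclasses import dataclass
from enum import Enum


class TurnTaking(Enum):
    FLOOR = "floor"
    DIALOGUE = "dialogue"
    DISCUSSION = "discussion"


class FocusOfAttention(Enum):
    WHITEBOARD = "whiteboard"
    PRESENTATION = "presentation"
    NOTES = "notes"
    TABLE = "table"
    UNFOCUSED = "unfocused"


class LevelOfInterest(Enum):
    HIGH = "high"
    LOW = "low"


@dataclass(frozen=True)
class GroupActionSegment:
    """Hypothetical record: one time span carrying one label per category."""
    start_s: float
    end_s: float
    turn_taking: TurnTaking
    focus: FocusOfAttention
    interest: LevelOfInterest

    def __post_init__(self):
        if self.end_s <= self.start_s:
            raise ValueError("segment must have positive duration")


def time_per_label(segments, attribute):
    """Sum segment durations (seconds) for each label of the given attribute."""
    totals = {}
    for seg in segments:
        label = getattr(seg, attribute)
        totals[label] = totals.get(label, 0.0) + (seg.end_s - seg.start_s)
    return totals


# Made-up example segments, not taken from the corpus.
example = [
    GroupActionSegment(0.0, 120.0, TurnTaking.FLOOR,
                       FocusOfAttention.PRESENTATION, LevelOfInterest.LOW),
    GroupActionSegment(120.0, 300.0, TurnTaking.DISCUSSION,
                       FocusOfAttention.TABLE, LevelOfInterest.HIGH),
    GroupActionSegment(300.0, 360.0, TurnTaking.FLOOR,
                       FocusOfAttention.WHITEBOARD, LevelOfInterest.HIGH),
]
turn_totals = time_per_label(example, "turn_taking")
```

Closed enumerations keep annotators and downstream tools on the same vocabulary, and a per-label duration summary such as `time_per_label` is the kind of statistic a common processing task or evaluation protocol could start from.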
Feedback via e-mail