Plagiarism Workshop Mike Joy University of Bath, 29 February 2012


Page 1: Plagiarism Workshop Mike Joy University of Bath, 29 February 2012


Page 2: Administrative Issues

Emergency exits

Fire alarm

Toilets

Certificates of attendance

Page 3: Timetable

1.30 Introduction

1.50 What is plagiarism?

2.00 Our experiences

2.20 Text plagiarism

2.30 Computing and mathematics

2.50 Why do students plagiarise?

3.00 How do students plagiarise?

3.15 Break

3.30 Detection strategies and tools

3.45 Prevention strategies and university process

4.00 Discussion and conclusion

4.15 End

Page 4: What is Plagiarism?

Page 5: Definitions

“The action or practice of taking someone else's work, idea, etc., and passing it off as one's own; literary theft” (OED Online, 2012)

“To commit literary theft; to present as new and original an idea or product derived from an existing source” (Merriam-Webster Online, 2012)

These definitions are open to interpretation.

What about equations, computer programs, etc.?

“Academic integrity”

Page 6: Plagiarism vs. Cheating

Not all cheating is plagiarism.

For example, taking crib-sheets into an exam.

What about “contract cheating”, where a student pays another to write an assignment for them?

We adopt a broad interpretation of “plagiarism” (otherwise we may miss important types of cheating which are appropriate for us to cover here).

Page 7: Why is this Important?

Cheating is potentially illegal.

Not fair on the other students.

Compromises the learning process.

Wastes time

– Staff time!

– Paperwork, disciplinary process

We are required to deal with it!

– QAA Quality Code (B6)

Page 8: How big a Problem is Plagiarism?

“If you go to the bar at lunchtime you can buy a solution to any of our programming assignments. I reckon the incidence of plagiarism is over 50%” (source wishes to remain anonymous, dated 1999).

Around 5% in programming assignments at Warwick University (from detailed analyses of first-year programming assignments over several years, 2002-2004).

Documented cases (90 UK HEIs, all subjects) – 0.72% (source: AMBeR Project Report 2008).

Page 9: Why is this Interesting?

Detection is fun.

Algorithms can be applied to the detection process (so Computer Scientists can apply their skills).

Getting involved gives us insights into how students are conducting their studies.

Page 10: Our experiences

Rainbow Lorikeet, by René Modery, 2006

Page 11: Basic Theory

Foundations of the Louvre, photo by Ceronne, 2006

Page 12: Four Stages

Students must know and understand (clear University policy).

Detection must happen (the more the better!).

Due process (punishment).

Thus … four stages:

Collection, Detection, Confirmation, Investigation

(Culwin and Lancaster, 2002).

Page 13: Stage 1: Collection

Get all documents together online:

– So they can be processed;

– Document formats need to be considered;

– Security is an issue.

Coursemaster (Nottingham)

BOSS (Warwick)

Managed Learning Environment (Blackboard, Moodle)

Page 14: Stage 2: Detection

(1) Compare with other submissions (“intra-corpal”)

(2) Compare with external documents (“extra-corpal”)

– for essay-based assignments, can use Turnitin

– program code and equations may be a problem

(1) is (relatively) easy (can even be done by hand), but (2) is a big problem.
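As a rough illustration of why intra-corpal comparison is the easier case, the sketch below compares every pair of submissions in a collected set and reports the pairs whose similarity reaches a threshold. The class name `IntraCorpalCheck`, the word-set (Jaccard) similarity measure, and the threshold are assumptions made for this example; none of them comes from the tools named in the workshop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: pairwise comparison within one set of submissions.
final class IntraCorpalCheck {

    // Lower-cased set of words in a text (assumed tokenisation: split on non-word characters).
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        result.remove("");
        return result;
    }

    // Jaccard similarity of the two word sets: |A ∩ B| / |A ∪ B|.
    static double similarity(String a, String b) {
        Set<String> wa = words(a);
        Set<String> wb = words(b);
        if (wa.isEmpty() && wb.isEmpty()) return 1.0;
        Set<String> common = new HashSet<>(wa);
        common.retainAll(wb);
        Set<String> all = new HashSet<>(wa);
        all.addAll(wb);
        return (double) common.size() / all.size();
    }

    // Compare every unordered pair (i, j), i < j: n(n-1)/2 comparisons for n submissions.
    static List<int[]> suspiciousPairs(List<String> submissions, double threshold) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < submissions.size(); i++) {
            for (int j = i + 1; j < submissions.size(); j++) {
                if (similarity(submissions.get(i), submissions.get(j)) >= threshold) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "the cat sat on the mat",
                "The cat sat on a mat",
                "completely different words here");
        for (int[] p : suspiciousPairs(docs, 0.8)) {
            System.out.println("Similar: " + p[0] + " and " + p[1]);
        }
    }
}
```

Any pair reported this way is only a candidate for human inspection. The extra-corpal case has no such closed set to iterate over, which is what makes it hard.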

Page 15: Stage 3: Confirmation

Software tool says “A and B similar”.

Are they?

Never rely on a computer program!

Requires expert human judgement.

Evidence must be compelling.

Might go to court.

Page 16: Stage 4: Investigation

A from B, or B from A, or joint work?

If A from B, did B know?

– Open networked file?

– Printer output?

Did the culprit/s understand?

University processes must be followed:

– No shortcuts!

Page 17: Text Plagiarism

“Portrait of a Scribe” by Bartolomeo Passerotti (1529-1592)

Page 18: Essay time …

Page 19: Turnitin® UK

Funded mainly by subscriptions from institutions.

Cache of

– the Internet

– all documents submitted to it

– anything else it can find!

Compares text of documents submitted to it using a string-matching algorithm.

Page 20: Pedagogy

Can be used by academics to

– detect plagiarism

– provide evidence

Can be used by students to

– check their own work

Page 21: Turnitin (1)

Page 22: Turnitin (2)

Page 23: Turnitin (3)

Page 24: Algorithm and Functionality

Advantages

– Reasonably accurate

– Ease and speed of use

– Printed reports

– Comprehensive datastore

– Most formats

– Management tool

Disadvantages

– Algorithm can be fooled

– English only

– Quotes and references are poorly handled

– “False sense of security”

Page 25: Computing and Mathematics

A PowerMac G4 ("Mirrored Drive Doors" model) with open case showing the logic board. Photo by Alistair McMillan, 2006.

Page 26: Computing

Discipline specific:

Program code

Diagrams (UML, flowcharts, etc.)

Lab reports

Images (graphics, image processing)

Page 27: Mathematics

Discipline specific:

Equations

Theorems and proofs

Statistical analyses

MATLAB programs

Page 28: Why not use Turnitin?

It won’t work!

– String matching algorithm inappropriate

– Database does not contain (much) code

Commercial products exist, for example

– Black Duck Software

– Similix Corporation

Page 29: /* Programs 1 and 2 */

/* Program 1 */

public class Hello {
    public static void main(String[] argv) {
        System.out.println("Hello World");
    }
}

/* Program 2 */

public class HelloWorld {
    public static void main(String[] x) {
        System.out.println("hello world!");
    }
}

Page 30: /* Programs 3 and 4 */

Program 3

(Source code for MS Windows 7)

Program 4

(code 50% identical to the source code for MS Windows 7)

Page 31: /* Program 5 */

public class Sun {
    static final double latitude = 52.4;
    static final double longitude = -1.5;
    static final double tpi = 2.0 * Math.PI;
    /* ... */

    public static void main(String[] args) {
        calculate();
    }

    // Reduce an angle x (in radians) to the range [0, 2π).
    public static double FNrange(double x) {
        double b = x / tpi;
        double a = tpi * (b - (long) (b));
        if (a < 0) a = tpi + a;
        return a;
    }

    public static void calculate() { /* ... */ }
    /* ... */
}

Page 32: /* Program 6 */

public class SunsetCalculator {
    static float latitude = 52.4f;
    static float longitude = -1.5f;
    /* ... */

    public static void main(String[] args) {
        findSunsetTime();
    }

    public static double rangeCalc(float arg) {
        float x = arg / tpi;
        float y = 2 * 3.14159f * (x - (int) (x));
        if (y < 0) y = 2 * 3.14159f + y;
        return y;
    }

    public static void findSunsetTime() { /* ... */ }
    /* ... */
}

Page 33: What is Source-Code Plagiarism?

Apart from source-code re-use, need to think about:

Use of (object-oriented) templates

Converting code to a different language

Code-generator software

Getting source-code written by someone else

What constitutes minimal / moderate / extreme plagiarism?

Page 34: What do Students Misunderstand?

“Open Source” code

Translation between languages

Re-use of code from previous assignments

Placing references within technical documentation (comments)

Page 35: Mathematical Equations

Common equations such as E=mc² don’t need referencing.

Probably most others do.

Are there any “grey areas”?

Page 36: Why do Students Plagiarise?

Why, Arizona, by Ken Lund, 2010.

Page 37: Digression – Industry

Money

Career advancement

Company advancement

Tight deadlines

Poor ethics

What about academics?

Page 38: Students

Weak students

Lazy students

Students with poor time management skills

Overworked students

Peer pressure

Cultural factors

Lack of understanding

“Bad, sad or mad” (Culwin, 2006).

Page 39: How do Students Plagiarise?

Tiles on LaSalle Street, New Orleans, by Infrogmation, 2009.

Page 40: Where to Find Information

Google

Friends

Lecturer’s notes

Seeing what other students are doing

Textbooks

Code repositories

Forums

Cheat sites

Page 41: Contract Cheating

‘Rent-A-Coder’

Low rates ($10) – so quality of code?

Plagiarism by hired coders?

Private Internet sites make search engines ineffective.

Use of mobile devices and IM tools makes tracing difficult.

Page 42: Break

Photo by Vanderdecken, 2007.

Page 43: Detection Strategies

Sherlock Holmes and John H. Watson, by Sidney Paget (1860-1908)

Page 44: Tricks of the Trade

Google search on phrases

Abnormal style

Unusual phrases or spellings (incl. in program comments)

Unusual algorithm used by a program

Unusual formatting

– Fonts, indentation (wordprocessor)

– Brace style (etc.) (program)

Page 45: Detection Tools

Photo by Wolfen Silva, 2004.

Page 46: History (1)

Attribute counting systems (Halstead, 1972; Ottenstein, 1976):

Numbers of unique operators

Numbers of unique operands

Total numbers of operator occurrences

Total numbers of operand occurrences
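The four counts listed for attribute counting systems can be sketched in a few lines. This is a minimal illustration, not Halstead's or Ottenstein's actual procedure: the class name `AttributeCounts`, the small keyword list, and the rule that identifiers and numeric literals are operands while keywords and symbols are operators are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the four attribute counts for a fragment of Java-like code.
final class AttributeCounts {
    final Set<String> uniqueOperators = new HashSet<>();
    final Set<String> uniqueOperands = new HashSet<>();
    int totalOperators;
    int totalOperands;

    // Assumption: a short illustrative keyword list; a real tool would use the full grammar.
    private static final Set<String> KEYWORDS =
            Set.of("int", "float", "double", "for", "if", "else", "while", "return");

    // Identifiers, numbers, two-character operators, then single-character symbols.
    private static final Pattern TOKEN = Pattern.compile(
            "[A-Za-z_]\\w*|\\d+(?:\\.\\d+)?|\\+\\+|--|[-+*/%<>=!]=|&&|\\|\\||[-+*/%<>=!(){};,]");

    static AttributeCounts of(String source) {
        AttributeCounts c = new AttributeCounts();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String t = m.group();
            char first = t.charAt(0);
            boolean wordOrNumber = Character.isLetterOrDigit(first) || first == '_';
            if (wordOrNumber && !KEYWORDS.contains(t)) {
                // Operand: identifier or numeric literal.
                c.uniqueOperands.add(t);
                c.totalOperands++;
            } else {
                // Operator: keyword or symbol.
                c.uniqueOperators.add(t);
                c.totalOperators++;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        AttributeCounts c = of("int ans = 0; for (int j = 1; j <= 100; j++) { ans *= j; } return ans;");
        System.out.println("unique operators: " + c.uniqueOperators.size());
        System.out.println("unique operands:  " + c.uniqueOperands.size());
        System.out.println("total operators:  " + c.totalOperators);
        System.out.println("total operands:   " + c.totalOperands);
    }
}
```

Two programs whose four counts are close would be flagged as candidates for comparison. Because the counts discard all ordering information, unrelated programs can share a profile, which is one reason the structure-based systems on the next slide replaced this approach.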

Page 47: History (2)

Structure-based systems:

Each program is converted into token strings (or something similar)

Token streams are compared to determine similar source-code fragments

Tools: YAP3, JPlag, Plague, GPlag, XPlag, Plaggie, MOSS, Sherlock, Jones, Cogger, SID, SIM, …

Page 48: Commercial Products (examples)

CodeSuite (www.safe-corp.biz)

– Exact algorithm not published

– Patents apply

MossPlus (www.similix.com)

– Commercial version of MOSS

– “multi-million dollar copyright and criminal theft cases”

– Patents apply

Page 49: Example (Tokenwise Equivalence)

int calculate(String arg) {
    int ans = 0;
    for (int j = 1; j <= 100; j++) {
        ans *= j;
    }
    return ans;
}

float doit(String v) {
    float result = 0.0f;
    for (float f = 100.0f; f > 0.0f; f--)
        result *= f;
    return result;
}

type name(type name) start
    type name=number
    loop (type name=number name compare number operation name) start
        name operation name
    end
    return name
end
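The abstraction on this slide can be sketched as a small tokeniser that replaces each lexeme with its class (type, name, number, and so on) and then compares the resulting streams. The class name `TokenStream`, the list of recognised type names, and the decision to drop semicolons are assumptions for this example. The two inputs are lightly adapted from the slide so that both use braces around the loop body.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: reduce source code to a stream of abstract token classes.
final class TokenStream {
    // Assumption: a fixed list of names treated as types; a real tool would parse declarations.
    private static final Set<String> TYPES = Set.of("int", "float", "double", "String", "Integer");
    private static final Set<String> LOOPS = Set.of("for", "while");
    private static final Set<String> COMPARISONS = Set.of("<", ">", "<=", ">=", "==", "!=");

    // Identifiers, numbers, two-character operators, then single symbols.
    // Semicolons and whitespace are not matched, so they are dropped.
    private static final Pattern LEXEME = Pattern.compile(
            "[A-Za-z_]\\w*|\\d+(?:\\.\\d+)?|\\+\\+|--|[-+*/<>=!]=|[-+*/<>=(){}]");

    static List<String> tokenise(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LEXEME.matcher(source);
        while (m.find()) {
            tokens.add(classify(m.group()));
        }
        return tokens;
    }

    private static String classify(String lexeme) {
        char first = lexeme.charAt(0);
        if (TYPES.contains(lexeme)) return "type";
        if (LOOPS.contains(lexeme)) return "loop";
        if (lexeme.equals("return")) return "return";
        if (Character.isLetter(first) || first == '_') return "name";
        if (Character.isDigit(first)) return "number";
        if (lexeme.equals("{")) return "start";
        if (lexeme.equals("}")) return "end";
        if (COMPARISONS.contains(lexeme)) return "compare";
        if (lexeme.equals("=") || lexeme.equals("(") || lexeme.equals(")")) return lexeme;
        return "operation";   // ++, --, *=, +, - and similar
    }

    public static void main(String[] args) {
        String a = "int calculate(String arg) { int ans=0; for (int j=1; j<=100; j++) { ans *= j; } return ans; }";
        String b = "double doit(String v) { double result=0.0; for (double f=100.0; f > 0.0; f--) { result *= f; } return result; }";
        System.out.println(String.join(" ", tokenise(a)));
        System.out.println("Tokenwise equivalent: " + tokenise(a).equals(tokenise(b)));
    }
}
```

Renaming identifiers, changing numeric types, and reversing the loop direction all leave the token stream unchanged, which is why structure-based tools compare token streams rather than raw text.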

Page 50: Intermission …

Have a look at the program you have been given.

Can you spot the plagiarised bits?

How much is plagiarised?

What techniques have been used?

Page 51: JPlag

Guido Malpohl, Karlsruhe, Germany

Code fragment similarity values based on similar tokens found

Java, C#, C, C++, Scheme, and natural language text

Web-based: www.ipd.uni-karlsruhe.de/jplag

Algorithm: parse programs and tokenise, then compare pairwise using “Greedy String Tiling” (Prechelt et al., 2002)

– maximises percentage of common token strings

– worst case Θ(n³), average case linear

Programs must compile?
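To make the idea of tiling concrete, here is a naive sketch of greedy string tiling over two token lists: it repeatedly finds the longest common run of unmarked tokens, marks it as a tile, and stops when no run of at least `minMatch` tokens remains. The class name `GreedyTiling`, the `minMatch` parameter, and the similarity formula (twice the tiled length divided by the combined length) are assumptions for this example. This is not JPlag's implementation and it omits the optimisations a practical tool would need.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, unoptimised sketch of greedy string tiling on token lists.
final class GreedyTiling {

    // Number of tokens covered by tiles (the same count in a and in b).
    static int tiledLength(List<String> a, List<String> b, int minMatch) {
        boolean[] markedA = new boolean[a.size()];
        boolean[] markedB = new boolean[b.size()];
        int covered = 0;
        while (true) {
            int best = 0;
            List<int[]> matches = new ArrayList<>();   // {startA, startB} of the longest matches
            for (int i = 0; i < a.size(); i++) {
                for (int j = 0; j < b.size(); j++) {
                    int k = 0;
                    while (i + k < a.size() && j + k < b.size()
                            && !markedA[i + k] && !markedB[j + k]
                            && a.get(i + k).equals(b.get(j + k))) {
                        k++;
                    }
                    if (k > best) {
                        best = k;
                        matches.clear();
                    }
                    if (k == best && k > 0) {
                        matches.add(new int[] {i, j});
                    }
                }
            }
            if (best == 0 || best < minMatch) break;
            for (int[] m : matches) {
                // Skip a match that overlaps a tile already placed in this round.
                boolean free = true;
                for (int k = 0; k < best && free; k++) {
                    free = !markedA[m[0] + k] && !markedB[m[1] + k];
                }
                if (!free) continue;
                for (int k = 0; k < best; k++) {
                    markedA[m[0] + k] = true;
                    markedB[m[1] + k] = true;
                }
                covered += best;
            }
        }
        return covered;
    }

    // Assumed similarity measure: fraction of all tokens that lie inside tiles.
    static double similarity(List<String> a, List<String> b, int minMatch) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        return 2.0 * tiledLength(a, b, minMatch) / (a.size() + b.size());
    }

    public static void main(String[] args) {
        List<String> a = List.of("a", "b", "c", "d", "e");
        List<String> b = List.of("d", "e", "x", "a", "b", "c");
        System.out.println("Tiled tokens: " + tiledLength(a, b, 2));
        System.out.println("Similarity:   " + similarity(a, b, 2));
    }
}
```

In the usage example the two blocks `a b c` and `d e` appear in swapped order, yet both are still tiled. Reordering code blocks is a common disguise, and tiling is designed to be insensitive to it.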

Page 52: JPlag Example

Page 53: MOSS

Alex Aiken, Berkeley/Stanford, USA, 1994

Multilingual: C, C++, Java, Pascal, Ada, ML, Lisp, or Scheme programs

Web-based: theory.stanford.edu/~aiken/moss/

“Winnowing” (Schleimer et al., 2003)

– Local document fingerprinting algorithm

– Efficiency proven (within 33% of lower bound)

– Guarantees detection of matches longer than a certain threshold
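A minimal sketch of the winnowing idea: hash every k-character substring (k-gram), slide a window of w consecutive hashes across the document, and keep the minimum hash from each window as a fingerprint. Two documents sharing a substring of at least w + k − 1 characters then necessarily share a fingerprint, because some whole window lies inside the shared text. The class name `Winnow`, the use of `String.hashCode()` as the hash, and keeping only hash values rather than positions are simplifying assumptions. This is not MOSS's code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of winnowing-style document fingerprinting.
final class Winnow {

    // For every window of w consecutive k-gram hashes, select the minimum
    // hash (the rightmost one on ties) as a fingerprint.
    static Set<Integer> fingerprints(String text, int k, int w) {
        Set<Integer> selected = new LinkedHashSet<>();
        int n = text.length() - k + 1;             // number of k-grams
        if (n <= 0) return selected;
        int[] hashes = new int[n];
        for (int i = 0; i < n; i++) {
            hashes[i] = text.substring(i, i + k).hashCode();
        }
        int windows = Math.max(1, n - w + 1);      // a short text forms a single window
        for (int start = 0; start < windows; start++) {
            int end = Math.min(n, start + w);
            int min = start;
            for (int i = start + 1; i < end; i++) {
                if (hashes[i] <= hashes[min]) min = i;
            }
            selected.add(hashes[min]);
        }
        return selected;
    }

    // Number of fingerprints two texts have in common.
    static int sharedFingerprints(String a, String b, int k, int w) {
        Set<Integer> common = new LinkedHashSet<>(fingerprints(a, k, w));
        common.retainAll(fingerprints(b, k, w));
        return common.size();
    }

    public static void main(String[] args) {
        String a = "xxxxx" + "plagiarismdetection" + "yyyy";
        String b = "qq" + "plagiarismdetection" + "zzzzzz";
        System.out.println("Shared fingerprints: " + sharedFingerprints(a, b, 5, 4));
    }
}
```

Only a fraction of the k-gram hashes is retained, so documents can be compared by intersecting small fingerprint sets. In practice the text would first be normalised (for example by removing whitespace and case) before hashing.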

Page 54: MOSS Example

Page 55: Sherlock

University of Warwick

Open Source – sherlock.org.uk

Multilingual (including natural language), but works best on Java

Preprocesses code (not a full parse!) then simple string comparison. Preprocessing includes:

– Remove comments

– Remove whitespace

– Normalise formatting/indentation

– Tokenise
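The first two preprocessing steps can be sketched with regular expressions: strip comments, remove whitespace, and then compare the remaining strings directly. The class name `Preprocess` and the regular expressions are assumptions for this example and are not taken from Sherlock's source. In particular, the sketch does not handle comment markers that appear inside string literals.

```java
// Hypothetical sketch of comment and whitespace removal before string comparison.
final class Preprocess {

    static String normalise(String source) {
        String s = source.replaceAll("(?s)/\\*.*?\\*/", " ");   // block comments, across lines
        s = s.replaceAll("//[^\\n]*", " ");                      // line comments
        return s.replaceAll("\\s+", "");                         // all whitespace
    }

    // Simple string comparison of the preprocessed sources.
    static boolean sameAfterPreprocessing(String a, String b) {
        return normalise(a).equals(normalise(b));
    }

    public static void main(String[] args) {
        String original = "int x = 1; // set x\n/* block\n comment */ int   y=2;";
        String disguised = "int x=1;\n\n    int y = 2; // changed layout";
        System.out.println(normalise(original));
        System.out.println("Same after preprocessing: " + sameAfterPreprocessing(original, disguised));
    }
}
```

Because no parse is needed, this style of preprocessing also works on code that does not compile, at the cost of being easier to fool than a token-based comparison.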

Page 56: Sherlock Example

Page 57: Sherlock – Document Set

Page 58: Effectiveness

MOSS, JPlag and Sherlock are effective

Results returned are not identical, but similar

User interface issues are important

Reliable sets of test data are unavailable.

None of these tools pulls material from the Internet

Page 59: Other Approaches

Latent Semantic Analysis (Cosma and Joy, 2010)

– Documents as “bags of words”

– Known technique in IR

– Handles synonymy and polysemy

– Maths is nasty

Clone Detection (Brixtel et al., 2010; Koschke, 2007)

– Provenance of code in large software systems

– Use of very large datasets (e.g. SourceForge)

– Not targeted at plagiarism

– Tools include Dup and CCFinder
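The “bags of words” representation can be illustrated with a plain cosine similarity between word-count vectors. This sketch covers only the starting point: Latent Semantic Analysis goes further and applies a singular value decomposition to the term-by-document matrix, which is what lets it handle synonymy and polysemy, and that step is not shown here. The class name `BagOfWords` and the tokenisation rule are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: documents as bags of words, compared by cosine similarity.
final class BagOfWords {

    // Word counts for a text; word order is deliberately discarded.
    static Map<String, Integer> counts(String text) {
        Map<String, Integer> c = new HashMap<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) c.merge(w, 1, Integer::sum);
        }
        return c;
    }

    // Cosine of the angle between the two count vectors (0 if either is empty).
    static double cosine(String a, String b) {
        Map<String, Integer> ca = counts(a);
        Map<String, Integer> cb = counts(b);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : ca.entrySet()) {
            na += e.getValue() * e.getValue();
            dot += e.getValue() * cb.getOrDefault(e.getKey(), 0);
        }
        for (int v : cb.values()) {
            nb += v * v;
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        System.out.println(cosine("int sum = a + b", "b + a = sum int"));
        System.out.println(cosine("int sum = a + b", "float total = x * y"));
    }
}
```

Since ordering is ignored, a document whose statements have been shuffled keeps the same vector as the original.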

Page 60: Prevention Strategies

Page 61: Plagiarism vs. Collaboration

Sometimes students are asked to copy

– group assignments

We ask students to share ideas

– that’s what universities are for!

Real programmers re-use code

What is plagiarism?

– maybe not a simple question after all!

Page 62: Prevention and Cure (1)

Never re-use assignments.

Assess deeper levels of learning.

Use tasks allowing multiple solutions.

Integrate tasks.

Set tasks based on recent events / sources.

Configure assignments so each student is given a slightly different version.

Require assignments to be done in controlled conditions (labs).

Page 63: Prevention and Cure (2)

Define institution policy clearly.

Define rôles of institution bodies (exam board, tribunal, etc.).

Make disciplinary process also about learning.

Train staff.

Fast track procedure for minor cases.

Record and monitor.

Adapted from Carroll and Appleton (2001).

Page 64: Process

Old Bailey, 2006. Unattributed (Wikimedia).

Page 65: Typical Institution Policy

First offence (unless very serious, e.g. PhD), meeting with appropriate senior member of staff in Department:

– tutor / friend / SU representative allowed to accompany student

– nominal penalty available (e.g. mark of 0 for assignment)

– “formative” experience for the student

Second offence (or serious first offence)

– University tribunal

– tutor / friend / SU representative allowed to accompany student

– full range of penalties (including expulsion)

Page 66: University of Bath

Quality Assurance Code of Practice QA53.

Three levels of offence – Group 1 (minor), Group 2 (moderate), Group 3 (severe).

Possible penalties available for an offence specified by Group (see table in appendix to QA53).

Groups 1 and 2 offences dealt with by Department.

Group 3 offences initiate Board of Inquiry.

Appeals are allowed under certain conditions only.

Page 67: Discussion

The round table, Great Hall, Winchester Castle, by Graham Horn, 2009.

Page 68: Evaluation