A Semiotic Information Quality Framework:
Applications and Experiments
Mr Gregory Hill,
Prof. Graeme Shanks and Dr Rosanne Price
Clayton School of IT
Monash University, Australia
Overview
• Research Context
• Theoretical Basis
  – Semiotic Framework
  – Ontological Model
  – Information Theory
• Experiments
  – Impact of Data Quality Tagging
  – Impact of Data Quality Treatment
Research Context
• Semiotic Framework proposed
  – Shanks and Darke (1998)
• Further theoretical and empirical development
  – Shanks and Price (2002-2005) (assessment)
  – Hill (2004) (measurement)
Theoretical Basis: Semiotics
• Semiotics
  – Theory of signs and symbols
  – Applied in philosophy, linguistics and information systems
• Signs are understood at three levels
  – Syntactic (form)
  – Semantic (meaning)
  – Pragmatic (use)
Theoretical Basis: Semiotics - cont’d
• Syntactic Quality
  – Conformance to metadata
• Semantic Quality
  – Correspondence to the external world
• Pragmatic Quality
  – Stakeholder assessment
    • Ratings (scores)
    • Utility (prices)
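To make the syntactic and semantic levels concrete, the sketch below scores a few records two ways: syntactic quality as the share of values that conform to their metadata, and semantic quality as the share of recorded values that match the external world. The field names, the validation rules standing in for metadata, and the simple ratio scores are all hypothetical assumptions for illustration, not measures defined by the framework. Pragmatic quality is left out because it rests on stakeholder ratings or prices.

```python
from typing import Callable

# Hypothetical metadata: one validation rule per field (an illustrative assumption).
RULES: dict[str, Callable[[object], bool]] = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "postcode": lambda v: isinstance(v, str) and len(v) == 4 and v.isdigit(),
}

def syntactic_quality(records: list[dict]) -> float:
    """Share of field values that conform to their metadata rule."""
    checks = [RULES[f](r[f]) for r in records for f in RULES]
    return sum(checks) / len(checks)

def semantic_quality(records: list[dict], world: list[dict]) -> float:
    """Share of recorded field values that match the external-world value."""
    pairs = [(r[f], w[f]) for r, w in zip(records, world) for f in RULES]
    return sum(rec_v == actual_v for rec_v, actual_v in pairs) / len(pairs)

recorded = [{"age": 34, "postcode": "3800"},
            {"age": 210, "postcode": "3168"},   # age violates the range rule
            {"age": 51, "postcode": "3000"}]    # conforms, but postcode is wrong
actual = [{"age": 34, "postcode": "3800"},
          {"age": 21, "postcode": "3168"},
          {"age": 51, "postcode": "3004"}]

print(syntactic_quality(recorded))          # 5 of 6 values conform
print(semantic_quality(recorded, actual))   # 4 of 6 values are correct
```

The third record shows why the levels are kept apart: it passes every syntactic check yet still misrepresents the customer.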
Theoretical Basis: Ontological Model
• Proposed by Wand and Wang (1996)
  – Incompleteness
  – Ambiguity
  – Incorrectness (garbling)
  – Meaninglessness
• Measurement?
[Figure: state transitions in the External World (W) and in its Representation (X)]
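One way to make these deficiencies checkable is to treat the representation as a mapping from external-world states W to information-system states X, in the spirit of the state-mapping view of Wand and Wang (1996). The sketch below is an illustrative encoding under that assumption, with hypothetical state names: a world state with no image signals incompleteness, an information-system state reached from several world states signals ambiguity, an information-system state that no world state maps to signals meaninglessness, and an observed mapping that departs from the designed one signals garbling.

```python
from collections import defaultdict

def design_deficiencies(W: set[str], X: set[str],
                        rep: dict[str, str]) -> dict[str, set[str]]:
    """Classify design deficiencies of a representation mapping rep: W -> X."""
    preimages: dict[str, set[str]] = defaultdict(set)
    for w, x in rep.items():
        preimages[x].add(w)
    return {
        # world states that have no representation
        "incomplete": {w for w in W if w not in rep},
        # IS states that stand for more than one world state
        "ambiguous": {x for x, ws in preimages.items() if len(ws) > 1},
        # IS states that no world state maps to
        "meaningless": {x for x in X if x not in preimages},
    }

def garbled(designed: dict[str, str], observed: dict[str, str]) -> set[str]:
    """World states whose observed IS state differs from the designed one."""
    return {w for w, x in observed.items() if designed.get(w) != x}

# Hypothetical customer-status example.
W = {"active", "dormant", "closed", "deceased"}
X = {"A", "D", "C", "U"}
rep = {"active": "A", "dormant": "D", "closed": "D"}
observed = {"active": "D", "dormant": "D", "closed": "D"}

print(design_deficiencies(W, X, rep))
print(garbled(rep, observed))   # {'active'}
```

Counting the members of each set is one possible answer to the "Measurement?" question above, though it treats every state as equally important.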
Theoretical Basis: Information Theory
• Proposed by Shannon and Weaver (1949)
  – Quantifies the amount of information
  – Information is “uncertainty removed”
    • Entropy: $H(X) = -\mathbb{E}[\log p(x)] = -\sum_{x} p(x) \log p(x)$
    • Mutual Information: $I(X;Y) = H(X) - H(X \mid Y)$
• Used in information economics, psychology, genetics, game theory, cryptography, coding … but not information systems?
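The two formulas can be applied to data quality by letting X be the true state of an external-world attribute and Y its recorded value. The sketch below is an illustration under that assumption, not a measure taken from the framework. It computes H(X) and I(X;Y) = H(X) − H(X|Y) in bits from a joint distribution. A hypothetical noise model that flips a uniform binary attribute with a given probability shows garbling eroding the information the record carries about the world.

```python
import math

def entropy(p: list[float]) -> float:
    """H = -sum p log2 p, in bits; zero-probability terms contribute 0."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def mutual_information(joint: list[list[float]]) -> float:
    """I(X;Y) = H(X) - H(X|Y) for a joint distribution joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    # H(X|Y) = sum over y of p(y) * H(X | Y = y)
    h_x_given_y = sum(
        py[y] * entropy([joint[x][y] / py[y] for x in range(len(joint))])
        for y in range(len(py)) if py[y] > 0
    )
    return entropy(px) - h_x_given_y

def garbled_binary(flip: float) -> list[list[float]]:
    """Joint distribution of a uniform binary attribute X and its record Y,
    where the record is flipped with probability `flip` (hypothetical noise)."""
    return [[0.5 * (1 - flip), 0.5 * flip],
            [0.5 * flip, 0.5 * (1 - flip)]]

print(entropy([0.5, 0.5]))                                 # 1.0 bit
print(mutual_information(garbled_binary(0.0)))             # 1.0: record removes all uncertainty
print(round(mutual_information(garbled_binary(0.1)), 3))   # 0.531 bits survive 10% garbling
```

At a flip probability of 0.5 the record carries no information about the attribute at all, which is the limiting case of garbling.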
Theoretical Basis: Model Comparison
[Table: semiotic levels against empirical and economic approaches]
• Syntactic
  – Empirical: Integrity Rules
  – Economic: Integrity Rules
• Semantic
  – Empirical: Ontological Model; Subjective Assessment (product-based)
  – Economic: Ontological Model; Objective Measurement (Information Theory)
• Pragmatic
  – Empirical: Subjective Assessment (service-based)
  – Economic: Subjective Measurement (Utility Theory)
Experiment I: Impact of Data Quality Tagging
• Data quality tags for human decision-making
• Prior data quality tagging experiments
  – Chengalur-Smith et al. (1999)
  – Shanks and Tansley (2002)
  – Fisher et al. (2003)
• Form of data quality tags in prior experiments
  – Single criterion
  – Objective normalised score
Experiment I: Impact of Data Quality Tagging - cont’d
• Context-dependent tags
  – Semantic-level criteria
  – Organisational role and task
  – Administrative/geographic context
• Form of tags
  – Subjective (Likert-scale ratings)
  – Objective (for comparison)
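For objective tags to serve as a comparison with subjective ones, both need a common scale. A minimal sketch, assuming a 5-point Likert scale rescaled linearly onto [0, 1] and a hypothetical single-criterion objective score defined as the share of correct values; neither choice is specified here. The role names are also hypothetical and show how the same data item can carry different context-dependent tags.

```python
def likert_to_unit(rating: int, points: int = 5) -> float:
    """Linearly rescale a Likert rating in 1..points onto [0, 1]."""
    if not 1 <= rating <= points:
        raise ValueError(f"rating must be between 1 and {points}")
    return (rating - 1) / (points - 1)

def objective_score(correct: int, total: int) -> float:
    """Hypothetical single-criterion objective tag: share of values judged correct."""
    return correct / total

# Hypothetical subjective ratings of one data item by two organisational roles.
subjective = {"manager": 4, "clerk": 2}
tags = {role: likert_to_unit(r) for role, r in subjective.items()}

print(tags)                      # {'manager': 0.75, 'clerk': 0.25}
print(objective_score(45, 50))   # 0.9
```

A linear rescaling treats Likert points as equally spaced, which is itself an assumption worth stating when comparing the two kinds of tag.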
Experiment I: Impact of Data Quality Tagging - cont’d
[Figure: Experiment I research model]
• Independent variables: data quality tagging, task complexity, decision strategy
• Dependent variables: decision complacency, decision consensus, decision efficiency, decision confidence
• Measures: decision time, confidence rating, selected apartment
Experiment II: Impact of Data Quality Treatments
• Treatment of “dirty data” in CRM processes
• Simulation of “real-world” scenarios
  – Treatments (via garbling)
  – Outcomes (via pay-offs)
• Discover antecedents of value creation
  – Scenario (process, pay-offs, customer attributes)
  – Data quality treatment
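The simulation idea can be sketched as a small Monte Carlo experiment: a noise process garbles a customer attribute, a decision process acts on the recorded value, and pay-offs score each decision against the true value. Everything here is a hypothetical assumption for illustration: the binary "will respond" attribute, the 30% base rate, the pay-off table, and modelling a treatment as a lower garbling rate. It is not the value model from Hill (2004).

```python
import random

# Hypothetical pay-offs: (decision, true attribute) -> value per customer
PAYOFFS = {
    ("contact", True): 50.0,    # contacted a customer who responds
    ("contact", False): -10.0,  # wasted contact cost
    ("skip", True): 0.0,        # missed opportunity (no cash flow)
    ("skip", False): 0.0,
}

def garble(value: bool, rate: float, rng: random.Random) -> bool:
    """Noise process: flip the recorded value with probability `rate`."""
    return (not value) if rng.random() < rate else value

def decide(recorded: bool) -> str:
    """Decision process that sees only the recorded attribute."""
    return "contact" if recorded else "skip"

def mean_value(garble_rate: float, base_rate: float = 0.3,
               n: int = 100_000, seed: int = 1) -> float:
    """Average pay-off per customer under a given garbling rate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        true_value = rng.random() < base_rate            # external-world attribute
        recorded = garble(true_value, garble_rate, rng)  # information-system value
        total += PAYOFFS[(decide(recorded), true_value)]
    return total / n

untreated = mean_value(garble_rate=0.2)
treated = mean_value(garble_rate=0.05)   # treatment modelled as less garbling
print(f"value of treatment per customer: {treated - untreated:.2f}")
```

With these assumptions the expected pay-off per customer is 15 − 22g for garbling rate g, so the treatment is worth about 3.3 per customer and the simulated figure should land close to that. Changing the pay-offs or base rate changes the value of the same treatment, which is why the scenario appears as an antecedent alongside the treatment itself.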
Experiment II: Impact of Data Quality Treatments - cont’d
[Figure: simulation model spanning the External World and the Information System, with customer attributes, a noise process, a treatment, a decision process, pay-offs and outcomes]
Experiment II: Impact of Data Quality Treatments - cont’d
• Value model of CRM processes
  – Hill (2004)
• SIFT metrics for planning and monitoring
  – Stake (pragmatic)
  – Influence (pragmatic)
  – Fidelity (semantic)
  – Tweak (semantic)
Experiment II: Impact of Data Quality Treatments - cont’d
[Figure: Experiment II research model]
• Independent variables (constructs): scenario, decision process, treatment
• Dependent variable (construct): organisational impact
• Measures: Stake, Influence, Fidelity, Tweak, Value
Questions
[email protected]@greg-hill.id.au