What everybody knows but nobody says
can hurt interdisciplinary research
John V. Carlis, University of Minnesota
“What we have here is a lack of communication”
Messages
  What everybody knows, nobody says
  Missed (not mis-) communication: painful surprises
  Plan for success
  Exponential growth in data, beyond human scale
  Yeah! Good work to do: invent together
Inter-Disciplinary Research
IT-ist [CS/Eng. … /Math/Stat] Tool Builder [content neutral]
Biologist [soils/neuro/microbio/dent/biochem/ecol/vet] Content Seeker/Maker [tool user]
Surprises & Do
Inter-Alien Research
  Can an IT-ist become a Biologist, or vice versa?
  Well, life’s too short: specialization of labor
  Bioinformatics grad minor
  CBCB in our future?
  Ante-Disciplinary?
Business Surprise
  User: NO!
  IT: But I built what you told me to build.
  User: I gave you a typical example, but of course there are exceptions.
  IT: You didn’t tell me.
  User: You didn’t ask, and, besides, everybody knows that.
  Worse in science, but why?
Science for IT is harder
  Business: human decides complexity
  Science: reality >> models
  Exponential growth in data
  Competing models
  Lots of vocabulary
  Specific vs Abstract
  Vocabulary sloshes
Surprises
Surprise (1/5): Context
  pH is not pH: need to remember instrument used
  Annotation
  Beyond genome is harder: What, plus When & Where [microarray; mass spec]
  Harder to share/re-use data
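One way to picture "need to remember instrument used" is a record that keeps the reading and its context together. The sketch below is only an illustration under assumptions: the `Measurement` class, its field names, and the instrument labels are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A value plus the context needed to interpret it (all names hypothetical)."""
    quantity: str         # what was measured, e.g. "pH"
    value: float
    instrument: str       # which instrument or method produced the value
    annotation: str = ""  # free-form note about conditions


def difference(a: Measurement, b: Measurement) -> float:
    """Subtract two readings, refusing to mix quantities or instruments."""
    if a.quantity != b.quantity or a.instrument != b.instrument:
        raise ValueError("readings come from different contexts; not directly comparable")
    return a.value - b.value


# Illustrative values only; they are not data from the talk.
probe_1 = Measurement("pH", 6.5, "glass-electrode probe")
probe_2 = Measurement("pH", 6.0, "glass-electrode probe")
strip = Measurement("pH", 6.0, "paper test strip")

print(difference(probe_1, probe_2))  # same instrument, so the subtraction is allowed
```

Calling `difference(probe_1, strip)` raises `ValueError`. Refusing outright is a deliberately strict choice for the sketch; a real system might instead convert between instruments when a calibration is known.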
Surprise (2/5): Casual Vocabulary
  chimp
  chimp + baby
  chimp + offspring
  chimp + offspring + close personal friend
Surprise (3/5): Success Brings Pain
  Prosite’s curated protein patterns + descriptions: ~2 MB of free (con)text
  Human browses: toooo little, success, tooooooo many
  GenBank: obsolete fields, “misc”
  Parsing free text is hard & error-prone
Surprise (4/5): Vocabulary missing/overloaded/off
  Text readable only by those who already know
  Nouns: pretty good
  Verbs: Janeway’s “Immunology”: mediate, …
  “Pathway”
  BAD diagrams
248
e.g., “metadata”
Surprise (5/5): Idiosyncratic brain viewing
  Different machines, conditions & warping parameters
  Fuss ‘til it looks right: a day’s work!
  Requires scarce expertise
  Doesn’t scale to comparisons among images
  Processing plan is data too
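The point that the processing plan is data too can be sketched as a plan object that is recorded step by step and stored next to the image, so the same manipulation can be replayed or compared later. Everything named here (`ProcessingPlan`, `Step`, the operation names, and the parameters) is a hypothetical illustration, not the system described in the talk.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Step:
    """One manipulation and the parameters chosen for it (hypothetical names)."""
    operation: str
    parameters: dict


@dataclass
class ProcessingPlan:
    """An ordered record of what was done to an image, kept as ordinary data."""
    machine: str
    steps: list = field(default_factory=list)

    def record(self, operation, **parameters):
        """Append a step and return the plan so calls can be chained."""
        self.steps.append(Step(operation, parameters))
        return self

    def to_json(self):
        """Serialize the plan so it can be stored alongside the image."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild a plan from its stored form."""
        raw = json.loads(text)
        steps = [Step(s["operation"], s["parameters"]) for s in raw["steps"]]
        return cls(raw["machine"], steps)


# Illustrative values only: the operations and numbers are made up.
plan = (
    ProcessingPlan(machine="scanner-A")
    .record("warp", scale=1.2, rotate_deg=3)
    .record("threshold", level=0.4)
)
stored = plan.to_json()
restored = ProcessingPlan.from_json(stored)
print(restored == plan)
```

Keeping the plan as plain data means two images can be compared by their plans as well as by their pixels, which is one way the "fuss 'til it looks right" work might be made repeatable.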
Can IT-ist ignore performance?
  IT-ist expects specifications
  Short-run efficiency for given specs: get it working
  But cycles/space cheap/available
  Change? Plan for unplanned changes
  Not trained/rewarded; attitude; lack vision
Togetherness
Communication
Anchored/Enabled/Rewarded
Vocabulary
  Mantra: what do we mean by one of this type?
Data Model
  What to remember, not how
  Fine distinctions [singular/plural]: disease vs affliction; host vs pathogen
  Multi, not single, function, so clusters are not a partition
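The remark about multiple functions can be illustrated with a many-to-many assignment: once one entity carries more than one function, the groups overlap and no longer form a partition. This is a minimal sketch under assumptions; `FunctionAssignments` and the entity and function labels are hypothetical placeholders.

```python
from collections import defaultdict


class FunctionAssignments:
    """Many-to-many link between entities and their functions (hypothetical names)."""

    def __init__(self):
        self._functions_of = defaultdict(set)
        self._entities_with = defaultdict(set)

    def assign(self, entity, function):
        """Remember that this entity has this function; repeats are harmless."""
        self._functions_of[entity].add(function)
        self._entities_with[function].add(entity)

    def functions_of(self, entity):
        return frozenset(self._functions_of.get(entity, ()))

    def entities_with(self, function):
        return frozenset(self._entities_with.get(function, ()))

    def is_partition(self):
        """True only if every recorded entity falls in exactly one group."""
        return all(len(funcs) == 1 for funcs in self._functions_of.values())


# Placeholder labels, not real biology.
links = FunctionAssignments()
links.assign("protein_1", "function_a")
links.assign("protein_2", "function_b")
print(links.is_partition())  # each entity has one function so far

links.assign("protein_1", "function_b")  # protein_1 now has two functions
print(links.is_partition())
```

A clustering method that forces each entity into exactly one group would have to drop one of `protein_1`'s two assignments, which is the loss the slide warns about.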
Hit Limits
  DBMS extensions
  “Manual” brain image manipulations: new content-neutral operators
  “This” is a special case of what more general task?
  constant vector multi-hull
  Not “the” query; parachute in, then explore territory
Interdisciplinary Impedance Mismatch
  Mundane vs interesting
  Messy problem (seeking insights) vs optimal solution (irrelevant but hopeful)
  Good clusters/fast algorithm/DB not directly a Bio goal
  Some professional danger, but big potential reward
Good Work
  Expect to struggle to communicate
  Invent vocabulary; define verbs
  Seek visionary colleagues