What Data Do We Need and Why Do We Need It?
Jim Pepin
Chief Technology Officer
University of Southern California
Network Data: Research Depends on It
Solutions depend on understanding the problem…
Advances in many areas depend on analysis of real data
• Network Management: traffic engineering, net design
• Network Control: improving routing protocols
• High Performance: better transport protocols
• Security: tracking/stopping DoS and worm attacks
Over 30% of papers in the top networking conference (SIGCOMM'04) depended on data collected by others
Most common providers:
• ISPs (e.g., ATT, Sprint, I2)
• Service providers (e.g., Akamai)
• Individual campuses (e.g., UNC, UOregon, USC – some campuses give data only to local researchers)
Network Data: More than Just Packet Traces
Some data more sensitive than others
• Dynamic routing information: routing protocol advertisements
• Static design information: router configuration files, peering arrangements, policies
• Operational events: alarms, trouble tickets (very few sources of this important info!)
• Traffic logs: netflow records, packet header traces
• Application data: URLs, p2p filenames, DNS queries
Tension: how much correlation to permit?
• Data that can be correlated across multiple sites is most valuable in measuring network-wide events, e.g., worms
• Privacy techniques anonymize and blur identity
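The anonymization the slides mention can be sketched concretely. The deck does not specify a scheme, so as one common, hypothetical approach: scramble each address with a keyed hash (HMAC) so the same host always maps to the same opaque token – flows can still be correlated within and across traces scrambled with the same key, but the real address cannot be recovered without the provider-held key. All names here (`scramble_ip`, `scramble_header`, the sample record fields) are illustrative, not LANDER's actual interface.

```python
import hmac
import hashlib

def scramble_ip(ip: str, key: bytes) -> str:
    """Map an IP address to a stable, opaque 16-hex-digit token.

    Same key + same address -> same token, so correlation across
    records is preserved while the real address stays hidden.
    """
    digest = hmac.new(key, ip.encode(), hashlib.sha256).hexdigest()
    return digest[:16]

def scramble_header(record: dict, key: bytes) -> dict:
    """Return a copy of a flow record with endpoint addresses scrambled.

    Non-identifying fields (ports, counters) pass through unchanged.
    """
    out = dict(record)
    out["src"] = scramble_ip(record["src"], key)
    out["dst"] = scramble_ip(record["dst"], key)
    return out

if __name__ == "__main__":
    key = b"provider-secret-key"  # held by the data provider, never released
    rec = {"src": "128.125.1.1", "dst": "10.0.0.7", "sport": 443, "dport": 51234}
    print(scramble_header(rec, key))
```

Note the tension this illustrates: a per-provider key permits correlation only within one provider's data, while a shared key would enable the cross-site correlation the slides call most valuable – at higher re-identification risk. (Prefix-preserving schemes exist as well, which additionally keep subnet structure.)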
Example of Data Provider
DHS PREDICT
• DHS support for network research
• Not for operational use by DHS
• Major players
• Peer-review ground rules
• Generic sources for legitimate research

LANDER Project
• Example of a PREDICT supplier
• Joint project of the USC-ISI networking division and the USC/ISD Center for High Performance Computing and Communications
– USC-HPCC is manager of the WAN for USC/CIT/JPL
– ISI provides networking research background
– HPCC provides data storage and computational resources
– We work together on ground rules and MOUs
– LANDER funds collection systems, support staff, and disk/tape space
What is hard and easy
LANDER ground rules
• Scrambled headers are the primary product today
• Requires an MOU with the researcher
• No collection of data payloads
• Working on a very strict MOU for very limited use of non-scrambled header data, for very select uses, in a very controlled environment
• Build a collection management system integrated with other PREDICT sites

How we do this
• Very close co-operation between ISI, ISD, and university legal
• MOUs will be very clear and understandable for the researcher
• USC can reject any application
• USC will review any publication based on unscrambled headers, and all work processing these headers will be done inside HPCC
Why would we do this
The Internet needs to be studied and engineered
• What is the modern equivalent of Bell Labs for the phone system?
• How did we get to where we are today?
– Co-operation between researchers and operators
• We can't allow ourselves to have a complete bunker mentality
• We need to be selective in what we provide, but in cases of demonstrated need, provide what is needed consistent with policies
• If we don't do this, no one will
• The risks can be managed if we take the time and effort to work with campus management (legal, CIOs, etc.) to mitigate them
• Researchers can be brought into these discussions if cast correctly
• If we don't study how the network works, our ability to manage it will degrade to zero over time
Recommended