CS 498 Lecture 9: Traffic Control for QoS
Jennifer Hou
Department of Computer Science
University of Illinois at Urbana-Champaign
Reading: Chapter 18, The Linux Networking Architecture: Design and Implementation of Network Protocols in the Linux Kernel
Traffic Control
Two major functions:
Policing: usually implemented at the router. Data connections are monitored, and packets transmitted in violation of a specified strategy are discarded.
Traffic shaping: usually implemented at end hosts. Data connections are regulated to conform to a certain rate. Surplus packets are either marked and then sent, or delayed at the sender side until sending them no longer violates the rate constraint.
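The slides do not name a shaping algorithm; a token bucket is one common choice, so the sketch below assumes it. It shows the key difference from policing: a surplus packet is not discarded but delayed until enough tokens have accumulated. All names (struct tb_shaper, tb_init, tb_schedule) are hypothetical, and this is user-space illustration code, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define US_PER_S 1000000ULL

/* Hypothetical token-bucket shaper (illustrative, not kernel code).
 * Tokens are kept in byte-microseconds so that refilling at rate_Bps
 * bytes per second never loses fractions to integer division. */
struct tb_shaper {
    uint64_t rate_Bps;  /* sustained rate, bytes per second (> 0) */
    uint64_t burst;     /* bucket depth, bytes */
    uint64_t tokens;    /* current tokens, byte-microseconds */
    uint64_t last_us;   /* time up to which tokens have been credited */
};

static void tb_init(struct tb_shaper *s, uint64_t rate_Bps, uint64_t burst, uint64_t now_us)
{
    assert(rate_Bps > 0);
    s->rate_Bps = rate_Bps;
    s->burst = burst;
    s->tokens = burst * US_PER_S;   /* start with a full bucket */
    s->last_us = now_us;
}

/* Returns the time (microseconds) at which a packet of len bytes may be sent
 * and charges the bucket for it. Surplus packets are delayed, never dropped.
 * Packets leave in FIFO order, so none leaves before its predecessor. */
static uint64_t tb_schedule(struct tb_shaper *s, uint64_t len, uint64_t now_us)
{
    assert(len <= s->burst);                 /* a larger packet could never conform */
    uint64_t cap = s->burst * US_PER_S;
    uint64_t t = now_us > s->last_us ? now_us : s->last_us;

    /* Credit tokens for [last_us, t]; clamping avoids overflow after long idle gaps. */
    uint64_t elapsed = t - s->last_us;
    uint64_t fill_us = cap / s->rate_Bps + 1;   /* enough time to fill an empty bucket */
    if (elapsed > fill_us)
        elapsed = fill_us;
    s->tokens += elapsed * s->rate_Bps;
    if (s->tokens > cap)
        s->tokens = cap;

    uint64_t need = len * US_PER_S;
    if (s->tokens < need) {                  /* not conforming yet: wait for the deficit */
        uint64_t wait_us = (need - s->tokens + s->rate_Bps - 1) / s->rate_Bps;
        t += wait_us;
        s->tokens += wait_us * s->rate_Bps;  /* tokens earned while waiting */
    }
    s->tokens -= need;
    s->last_us = t;
    return t;
}
```

With rate_Bps = 1000 and burst = 1500, a 1500-byte packet leaves at once, and a 1000-byte packet arriving right after it is held for one second.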
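Processing of Network Data
[Figure: packets pass through ingress policing and input de-multiplexing, then go either up to the upper layers (TCP, UDP, …) or to forwarding; outgoing packets pass through output queuing, where traffic control is applied.]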
Traffic Control in Linux Kernel
[Figure: where traffic control sits in the kernel. Incoming direction: driver.c → net/core/dev.c → net/sched/sch_ingress.c (traffic control in the incoming direction) → net/ipv4/ip_input.c → forwarding or local delivery. Outgoing direction: forwarded and locally created data → net/core/dev.c (dev_queue_xmit) → net/sched/sch_*.c and net/sched/cls_*.c (traffic control in the outgoing direction) → dev->hard_start_xmit.]
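Traffic Control in Linux Kernel
[Figure: detailed packet path. Receive side: the driver's interrupt handler net_interrupt (driver.c) allocates a buffer with dev_alloc_skb(), identifies the protocol with eth_type_trans(), and calls netif_rx (dev.c), which appends the packet to the per-CPU queue softnet_data[cpu n].input_pkt_queue (CPU1, CPU2); the scheduler later runs do_softirq, whose net_rx_action hands packets to the bridge (handle_bridge in br_input.c, with CONFIG_BRIDGE) or to protocol handlers such as arp_rcv, ip_rcv and p8022_rcv (ETH_P_802_2). Transmit side: ip_queue_xmit and arp_send call dev_queue_xmit (dev.c), which enqueues via dev->qdisc->enqueue; qdisc_run and qdisc_restart take packets out via dev->qdisc->dequeue and pass them to the driver's dev->hard_start_xmit; net_tx_action resumes this work from the softirq. Interfaces: eth0, eth1.]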
Components of Traffic Control
Queuing disciplines: packets sent are passed to a queuing discipline and sorted within the queue according to specific rules. Packets can be removed no earlier than when the queuing discipline has marked them as ready for transmission.
Classes (within a queuing discipline): within a queuing discipline, packets can be allocated to different classes.
Filters: used to allocate packets to classes within a queuing discipline.
Queuing Discipline
Each network device has a queuing discipline.
It controls how packets enqueued on the device are treated. Possible operations: keep, drop, mark.
A simple one may consist of just a single queue.
[Figure: a queuing discipline consisting of a single queue.]
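A minimal sketch of a single-queue discipline, assuming a FIFO that applies the three operations named above: an arriving packet is kept, kept but marked once the backlog reaches a threshold, or dropped once the queue is full. The threshold-based marking rule and all names here are illustrative assumptions, not a specific Linux qdisc.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_CAP 64   /* hypothetical upper bound for the configurable limit */

struct pkt { int id; int marked; };

enum verdict { KEEP, MARK, DROP };

/* Hypothetical single-queue discipline: packets leave in arrival order,
 * and each arrival gets one of the three treatments based on the backlog. */
struct fifo_qdisc {
    struct pkt *slot[FIFO_CAP];
    size_t head, qlen;
    size_t mark_thresh;   /* backlog from which arrivals are marked */
    size_t limit;         /* backlog from which arrivals are dropped */
};

static void fifo_init(struct fifo_qdisc *q, size_t mark_thresh, size_t limit)
{
    assert(mark_thresh <= limit && limit <= FIFO_CAP);
    q->head = q->qlen = 0;
    q->mark_thresh = mark_thresh;
    q->limit = limit;
}

static enum verdict fifo_enqueue(struct fifo_qdisc *q, struct pkt *p)
{
    if (q->qlen >= q->limit)
        return DROP;                    /* queue full: discard the packet */
    enum verdict v = KEEP;
    if (q->qlen >= q->mark_thresh) {    /* queue building up: keep, but flag it */
        p->marked = 1;
        v = MARK;
    }
    q->slot[(q->head + q->qlen) % FIFO_CAP] = p;
    q->qlen++;
    return v;
}

/* Returns the oldest packet, or NULL if the queue is empty. */
static struct pkt *fifo_dequeue(struct fifo_qdisc *q)
{
    if (q->qlen == 0)
        return NULL;
    struct pkt *p = q->slot[q->head];
    q->head = (q->head + 1) % FIFO_CAP;
    q->qlen--;
    return p;
}
```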
Complex Queuing Discipline
A queuing discipline may use filters to distinguish among different classes of packets and process each class in a specific way.
Two filters can point to the same class.
Classes do not store packets; they use another queuing discipline to do that.
[Figure: an outer queuing discipline whose three filters assign packets to Class 1 and Class 2; each class holds its own inner queuing discipline; packets enter via enqueue and leave via dequeue.]
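To illustrate how filters, classes and inner queuing disciplines fit together, here is a hypothetical classful discipline with three port-based filters, two of which point to the same class. Each class only owns an inner FIFO and stores nothing itself. The port-matching rule, the default class, and the strict-priority dequeue order are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NCLASSES 2
#define INNER_CAP 16

struct pkt { int id; unsigned dport; };

/* A class does not store packets itself; it owns an inner queuing
 * discipline, here a plain FIFO. */
struct inner_fifo { struct pkt *slot[INNER_CAP]; size_t head, len; };

/* A filter matches one destination port and points at a class. */
struct filter { unsigned dport; int cls; };

struct classful_qdisc {
    const struct filter *filters;
    size_t nfilters;
    int default_cls;                     /* used when no filter matches */
    struct inner_fifo cls[NCLASSES];
};

static int classify(const struct classful_qdisc *q, const struct pkt *p)
{
    for (size_t i = 0; i < q->nfilters; i++)
        if (q->filters[i].dport == p->dport)
            return q->filters[i].cls;
    return q->default_cls;
}

/* Enqueue: the filters pick the class; the class's inner FIFO stores the packet. */
static int cq_enqueue(struct classful_qdisc *q, struct pkt *p)
{
    struct inner_fifo *f = &q->cls[classify(q, p)];
    if (f->len == INNER_CAP)
        return 0;                        /* inner queue full: drop */
    f->slot[(f->head + f->len) % INNER_CAP] = p;
    f->len++;
    return 1;
}

/* Dequeue with strict priority (an assumption): class 0 is always served first. */
static struct pkt *cq_dequeue(struct classful_qdisc *q)
{
    for (int c = 0; c < NCLASSES; c++) {
        struct inner_fifo *f = &q->cls[c];
        if (f->len > 0) {
            struct pkt *p = f->slot[f->head];
            f->head = (f->head + 1) % INNER_CAP;
            f->len--;
            return p;
        }
    }
    return NULL;
}
```

With filters {22 → class 0, 23 → class 0, 80 → class 1} and class 1 as default, SSH and Telnet traffic share class 0 and is always sent before web or unmatched traffic.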
Complex Queuing Discipline: Policing
When packets of a connection are enqueued, the connection can be policed by:
letting the packets go,
dropping the packets, or
letting the packets go but marking them.
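A sketch of a policer, assuming a token-bucket rate profile (the slides do not specify one). In contrast to shaping, it never delays a packet: in-profile packets pass, and out-of-profile packets get the configured action, drop or mark. The struct, the police() function, and the profile are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum police_action { POLICE_PASS, POLICE_DROP, POLICE_MARK };

/* Hypothetical token-bucket policer (illustrative, not kernel code).
 * Tokens are kept in byte-microseconds to avoid rounding losses. */
struct policer {
    uint64_t rate_Bps;               /* allowed rate, bytes per second (> 0) */
    uint64_t burst;                  /* bucket depth, bytes */
    uint64_t tokens;                 /* current tokens, byte-microseconds */
    uint64_t last_us;                /* time of the last update */
    enum police_action exceed;       /* POLICE_DROP or POLICE_MARK */
};

static enum police_action police(struct policer *p, uint64_t len, uint64_t now_us)
{
    assert(p->rate_Bps > 0);
    uint64_t cap = p->burst * 1000000u;
    uint64_t elapsed = now_us - p->last_us;  /* assumes time never goes backwards */
    uint64_t fill_us = cap / p->rate_Bps + 1;
    if (elapsed > fill_us)
        elapsed = fill_us;                   /* bucket is full by then anyway */
    p->tokens += elapsed * p->rate_Bps;
    if (p->tokens > cap)
        p->tokens = cap;
    p->last_us = now_us;

    uint64_t need = len * 1000000u;
    if (p->tokens >= need) {                 /* in profile: let it go */
        p->tokens -= need;
        return POLICE_PASS;
    }
    return p->exceed;                        /* out of profile: no tokens consumed */
}
```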
Data Structures
include/net/pkt_sched.h
include/net/sch_generic.h
Traffic Control in Linux Kernel
Traffic control kernel code resides mainly in net/sched.
Traffic control in the incoming direction is handled by net/sched/sch_ingress.c.
The queuing disciplines for the outgoing direction are implemented in net/sched/sch_*.c, and the filters (classifiers) in net/sched/cls_*.c.
Traffic Control in Linux Kernel
The interfaces used inside the kernel can be found in
/usr/src/linux-(version)/include/net/pkt_cls.h and /usr/src/linux-(version)/include/net/pkt_sched.h.
The interfaces between kernel traffic control and user-space programs are declared in
/usr/include/linux/pkt_cls.h and /usr/include/linux/pkt_sched.h.
Inserting Traffic Control
[Figure: traffic control on the transmit path (dev.c, net/sched/*, softirq.c, netdevice.h, driver.c). dev_queue_xmit calls dev->qdisc->enqueue and then qdisc_run; qdisc_run calls qdisc_restart, which takes a packet via dev->qdisc->dequeue and hands it to the driver's dev->hard_start_xmit. When transmission has to be resumed later, a timer handler calls netif_schedule, which raises NET_TX_SOFTIRQ via cpu_raise_softirq; do_softirq then runs net_tx_action, which calls qdisc_run again. Inset: a queuing discipline with three filters, Class 1 and Class 2, and inner queuing disciplines, with enqueue and dequeue.]
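The call chain in the figure can be mimicked in user space to see why a packet is put back and why a later softirq has to restart the loop. This is a simplified model under stated assumptions: a FIFO qdisc, a driver whose TX ring can be full, and sim_-prefixed functions that only echo the kernel names; the real functions also handle locking, device states and timers.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 32

/* Simplified user-space model of the transmit path (not kernel code):
 * a FIFO qdisc in front of a driver with a limited TX ring. */
struct sim_dev {
    int q[QCAP];
    size_t head, len;   /* qdisc backlog, packet ids */
    int ring_free;      /* free slots in the driver's TX ring */
    int sent;           /* packets accepted by the driver */
};

static int sim_enqueue(struct sim_dev *d, int pkt)
{
    if (d->len == QCAP)
        return 0;
    d->q[(d->head + d->len) % QCAP] = pkt;
    d->len++;
    return 1;
}

static int sim_dequeue(struct sim_dev *d, int *pkt)
{
    if (d->len == 0)
        return 0;
    *pkt = d->q[d->head];
    d->head = (d->head + 1) % QCAP;
    d->len--;
    return 1;
}

/* Put a packet back at the head, where it was before. */
static void sim_requeue(struct sim_dev *d, int pkt)
{
    d->head = (d->head + QCAP - 1) % QCAP;
    d->q[d->head] = pkt;
    d->len++;
}

/* Driver transmit hook: 0 on success, -1 if the TX ring is full. */
static int sim_hard_start_xmit(struct sim_dev *d, int pkt)
{
    (void)pkt;
    if (d->ring_free == 0)
        return -1;
    d->ring_free--;
    d->sent++;
    return 0;
}

/* One round: dequeue a packet and hand it to the driver.
 * Returns 1 if a packet went out, 0 if the qdisc is empty or the device is busy. */
static int sim_qdisc_restart(struct sim_dev *d)
{
    int pkt;
    if (!sim_dequeue(d, &pkt))
        return 0;
    if (sim_hard_start_xmit(d, pkt) == 0)
        return 1;
    sim_requeue(d, pkt);   /* device busy: keep the packet for a later attempt */
    return 0;
}

static void sim_qdisc_run(struct sim_dev *d)
{
    while (sim_qdisc_restart(d))
        ;
}

/* Entry point for outgoing packets: enqueue, then try to drain the qdisc. */
static int sim_dev_queue_xmit(struct sim_dev *d, int pkt)
{
    int ok = sim_enqueue(d, pkt);
    sim_qdisc_run(d);
    return ok;
}

/* Stand-in for the TX softirq: ring slots were freed, so run the qdisc again. */
static void sim_tx_action(struct sim_dev *d, int freed)
{
    assert(freed >= 0);
    d->ring_free += freed;
    sim_qdisc_run(d);
}
```

With two free ring slots, three calls to sim_dev_queue_xmit send two packets and leave one in the qdisc; only the later sim_tx_action call gets it out.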
Queueing Discipline: Qdisc
struct Qdisc {
    int (*enqueue)(struct sk_buff *skb, struct Qdisc *dev);
    struct sk_buff *(*dequeue)(struct Qdisc *dev);
    unsigned flags;
#define TCQ_F_BUILTIN   1
#define TCQ_F_THROTTLED 2
#define TCQ_F_INGRESS   4
    int padded;
    struct Qdisc_ops *ops;
    u32 handle;
    u32 parent;
    atomic_t refcnt;
    struct sk_buff_head q;
    struct net_device *dev;
    struct list_head list;

    struct gnet_stats_basic bstats;
    struct gnet_stats_queue qstats;
    struct gnet_stats_rate_est rate_est;
    spinlock_t *stats_lock;
    struct rcu_head q_rcu;
    int (*reshape_fail)(struct sk_buff *skb, struct Qdisc *q);
    struct Qdisc *__parent;
};
dev: the network device to which the Qdisc is allocated.
ops: the Qdisc_ops data structure.
q: the socket buffer queue governed by this qdisc.
reshape_fail: when an outer queue passes a packet to an inner queue, the packet may have to be discarded. If the outer queuing discipline implements the callback function reshape_fail, it can be invoked by the inner queuing discipline.
Queuing Disciplines: Qdisc_ops
struct Qdisc_ops {
    struct Qdisc_ops *next;
    struct Qdisc_class_ops *cl_ops;
    char id[IFNAMSIZ];
    int priv_size;
    int (*enqueue)(struct sk_buff *, struct Qdisc *);
    struct sk_buff *(*dequeue)(struct Qdisc *);
    int (*requeue)(struct sk_buff *, struct Qdisc *);
    unsigned int (*drop)(struct Qdisc *);
    int (*init)(struct Qdisc *, struct rtattr *arg);
    void (*reset)(struct Qdisc *);
    void (*destroy)(struct Qdisc *);
    int (*change)(struct Qdisc *, struct rtattr *arg);
    int (*dump)(struct Qdisc *, struct sk_buff *);
};
requeue(): the packet should be put back at the position in the queuing discipline where it was before.
A queueing discipline can be added via register_qdisc() in init_module()
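Registration amounts to linking a Qdisc_ops-like table into a global list keyed by its id, which configuration code can later search. The reduced struct and the sim_register_qdisc, sim_lookup_qdisc and sim_unregister_qdisc functions below are hypothetical stand-ins, not the kernel's signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced ops table: only the fields needed to show
 * registration. The real Qdisc_ops carries many more callbacks. */
struct sim_qdisc_ops {
    struct sim_qdisc_ops *next;   /* link in the list of registered disciplines */
    char id[16];                  /* name used to select the discipline */
    int (*enqueue)(void *skb, void *sch);
    void *(*dequeue)(void *sch);
};

static struct sim_qdisc_ops *qdisc_base;  /* head of the registry */

/* Register a discipline; returns 0 on success, -1 if the id is taken. */
static int sim_register_qdisc(struct sim_qdisc_ops *ops)
{
    assert(ops != NULL);
    for (struct sim_qdisc_ops *q = qdisc_base; q != NULL; q = q->next)
        if (strcmp(q->id, ops->id) == 0)
            return -1;
    ops->next = qdisc_base;
    qdisc_base = ops;
    return 0;
}

/* Unlink a discipline (e.g. when its module is unloaded); -1 if not found. */
static int sim_unregister_qdisc(struct sim_qdisc_ops *ops)
{
    for (struct sim_qdisc_ops **pp = &qdisc_base; *pp != NULL; pp = &(*pp)->next)
        if (*pp == ops) {
            *pp = ops->next;
            return 0;
        }
    return -1;
}

/* Find a discipline by name, as the configuration path would. */
static struct sim_qdisc_ops *sim_lookup_qdisc(const char *id)
{
    for (struct sim_qdisc_ops *q = qdisc_base; q != NULL; q = q->next)
        if (strcmp(q->id, id) == 0)
            return q;
    return NULL;
}
```

In a module, the registration call would sit in init_module() as the slide says, and the matching unregistration in the module's cleanup path.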
Qdisc_ops
enqueue(): enqueues a packet. Return values are:
NET_XMIT_SUCCESS, if the packet is accepted;
NET_XMIT_DROP, if the packet is discarded;
NET_XMIT_CN, if the packet is discarded because of buffer overflow;
NET_XMIT_POLICED, if the packet is discarded because of violation of a policing rule;
NET_XMIT_BYPASS, if the packet is accepted but will not leave the queue via the regular dequeue() function.
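A sketch of how an enqueue() implementation might choose among these return values. The SIM_XMIT_* enum only mirrors the meanings listed above (its values are arbitrary, not the kernel's), and the size-based policing rule and counter names are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes mirroring the NET_XMIT_* meanings on the slide. */
enum sim_xmit {
    SIM_XMIT_SUCCESS,   /* accepted */
    SIM_XMIT_DROP,      /* discarded */
    SIM_XMIT_CN,        /* discarded: buffer overflow */
    SIM_XMIT_POLICED,   /* discarded: policing rule violated */
    SIM_XMIT_BYPASS     /* accepted, but will not leave via dequeue() */
};

struct sim_q {
    size_t qlen, limit;     /* packets queued, maximum backlog */
    unsigned max_pkt;       /* policing rule: largest allowed packet, bytes */
    unsigned long drops, overlimits;
};

static enum sim_xmit sim_q_enqueue(struct sim_q *q, unsigned pkt_bytes)
{
    assert(q != NULL);
    if (pkt_bytes > q->max_pkt) {       /* violates the policing rule */
        q->drops++;
        q->overlimits++;
        return SIM_XMIT_POLICED;
    }
    if (q->qlen >= q->limit) {          /* no buffer space left */
        q->drops++;
        return SIM_XMIT_CN;
    }
    q->qlen++;                          /* a real qdisc would also store the skb */
    return SIM_XMIT_SUCCESS;
}

/* The caller mainly needs to know whether the qdisc kept the packet. */
static int sim_xmit_accepted(enum sim_xmit rc)
{
    return rc == SIM_XMIT_SUCCESS || rc == SIM_XMIT_BYPASS;
}
```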
Qdisc_ops
dequeue(): returns a pointer to a packet (skb) eligible for sending. A return value of NULL means that there are no packets ready to be sent. (The total number of packets in the queue is given by q->q.qlen, where q is the struct Qdisc *.)
requeue(): puts a packet back into the original position in the queue where it had been before. The number of packets running through the queue should not be increased.
drop(): drops one packet from the queue.
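The difference between dequeue(), requeue() and drop() can be shown on a small hypothetical FIFO. The passed counter plays the role of the packet statistics: requeue() restores the packet's old position without counting it a second time. Taking drop()'s victim from the tail is an assumption; each discipline chooses its own.

```c
#include <assert.h>
#include <stddef.h>

#define DQ_CAP 16

/* Hypothetical FIFO; `passed` counts packets that entered the queue. */
struct dq {
    int pkt[DQ_CAP];
    size_t head, len;
    unsigned long passed;
};

static int dq_enqueue(struct dq *q, int pkt)
{
    if (q->len == DQ_CAP)
        return 0;
    q->pkt[(q->head + q->len) % DQ_CAP] = pkt;
    q->len++;
    q->passed++;
    return 1;
}

/* Returns 1 and stores the head packet in *pkt, or 0 if nothing is ready. */
static int dq_dequeue(struct dq *q, int *pkt)
{
    if (q->len == 0)
        return 0;
    *pkt = q->pkt[q->head];
    q->head = (q->head + 1) % DQ_CAP;
    q->len--;
    return 1;
}

/* Put a packet back at the head, where it was before. `passed` is NOT
 * incremented: the packet was already counted on enqueue. */
static int dq_requeue(struct dq *q, int pkt)
{
    if (q->len == DQ_CAP)
        return 0;
    q->head = (q->head + DQ_CAP - 1) % DQ_CAP;
    q->pkt[q->head] = pkt;
    q->len++;
    return 1;
}

/* Drop one packet, here the newest one at the tail. Returns 1 on success. */
static int dq_drop(struct dq *q)
{
    assert(q != NULL);
    if (q->len == 0)
        return 0;
    q->len--;
    return 1;
}
```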
Qdisc_ops
init(): initializes the queuing discipline.
reset(): resets the queuing discipline to its initial state (empty queue, reset counters, delete timers).
destroy(): removes a queuing discipline and frees all the resources reserved during its runtime.
change(): changes the parameters of a queuing discipline.
dump(): returns the configuration parameters and statistics of a queuing discipline.
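A hypothetical FIFO with one parameter, limit, makes the life-cycle hooks concrete: init() allocates, change() applies a new limit, reset() clears state but keeps the configuration, dump() reports, and destroy() frees. The lc_ names and the text format of dump() are illustrative; the kernel's dump() fills a netlink message instead.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical FIFO with one parameter (limit). Not kernel code. */
struct lc_fifo {
    int *buf;
    size_t cap, head, len;
    size_t limit;            /* configured maximum backlog (<= cap) */
    unsigned long drops;
};

/* init(): acquire resources and apply the initial parameters. */
static int lc_init(struct lc_fifo *q, size_t limit, size_t cap)
{
    assert(cap > 0);
    q->buf = malloc(cap * sizeof *q->buf);
    if (q->buf == NULL)
        return -1;
    q->cap = cap;
    q->head = q->len = 0;
    q->drops = 0;
    q->limit = limit <= cap ? limit : cap;
    return 0;
}

static int lc_enqueue(struct lc_fifo *q, int pkt)
{
    if (q->len >= q->limit) {
        q->drops++;
        return 0;
    }
    q->buf[(q->head + q->len) % q->cap] = pkt;
    q->len++;
    return 1;
}

/* change(): apply a new limit; packets beyond it are dropped from the tail. */
static void lc_change(struct lc_fifo *q, size_t limit)
{
    q->limit = limit <= q->cap ? limit : q->cap;
    while (q->len > q->limit) {
        q->len--;
        q->drops++;
    }
}

/* reset(): empty queue and cleared counters; parameters are kept. */
static void lc_reset(struct lc_fifo *q)
{
    q->head = q->len = 0;
    q->drops = 0;
}

/* dump(): report configuration and statistics, here as one text line. */
static int lc_dump(const struct lc_fifo *q, char *out, size_t n)
{
    return snprintf(out, n, "limit %zu backlog %zu drops %lu", q->limit, q->len, q->drops);
}

/* destroy(): release everything acquired in init(). */
static void lc_destroy(struct lc_fifo *q)
{
    free(q->buf);
    q->buf = NULL;
    q->len = 0;
}
```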
Qdisc_class_ops
struct Qdisc_class_ops {
    /* Child qdisc manipulation */
    int (*graft)(struct Qdisc *, unsigned long cl, struct Qdisc *, struct Qdisc **);
    struct Qdisc *(*leaf)(struct Qdisc *, unsigned long cl);

    /* Class manipulation routines */
    unsigned long (*get)(struct Qdisc *, u32 classid);
    void (*put)(struct Qdisc *, unsigned long);
    int (*change)(struct Qdisc *, u32, u32, struct rtattr **, unsigned long *);
    int (*delete)(struct Qdisc *, unsigned long);
    void (*walk)(struct Qdisc *, struct qdisc_walker *arg);

    /* Filter manipulation */
    struct tcf_proto **(*tcf_chain)(struct Qdisc *, unsigned long);
    unsigned long (*bind_tcf)(struct Qdisc *, unsigned long, u32 classid);
    void (*unbind_tcf)(struct Qdisc *, unsigned long);

    /* rtnetlink specific */
    int (*dump)(struct Qdisc *, unsigned long, struct sk_buff *skb, struct tcmsg *);
    int (*dump_stats)(struct Qdisc *, unsigned long, struct gnet_dump *);
};
Qdisc_class_ops
graft(): binds a queuing discipline to a class.
leaf(): returns a pointer to the queuing discipline currently bound to the class.
get(): maps the classid to the internal identification and increments the reference counter by one. Each class is associated with two IDs:
the classid (of type u32) is used by the user and by the configuration tools in user space;
the internal identification (of type unsigned long) is used within the kernel.
put(): decrements the usage counter.
Qdisc_class_ops
change(): changes the class parameters.
delete(): checks whether the class is still referenced; if not, deletes the class.
walk(): walks through the linked list of all the classes of a queuing discipline and invokes the associated callback functions to obtain configuration and statistics data.
tcf_chain(): returns a pointer to the linked list of filters bound to the class.
bind_tcf(): binds a filter to a class.
dump(): gives configuration and statistics data of a class.
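The get()/put()/delete() contract is plain reference counting. In this sketch (all names hypothetical) the internal identification is a slot index plus one, so that 0 can signal a missing class, and delete() refuses while references remain. The 16-bit major/minor split of the classid is modeled on the kernel's TC_H_MAJ/TC_H_MIN convention.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit classid split into a 16-bit major (the qdisc) and a 16-bit minor
 * (the class), in the spirit of the kernel's TC_H_MAJ/TC_H_MIN macros. */
#define CLASSID(maj, min) (((uint32_t)(maj) << 16) | ((uint32_t)(min) & 0xFFFFu))
#define CLASSID_MIN(id)   ((id) & 0xFFFFu)

#define NCLS 8

struct sim_class {
    uint32_t classid;   /* id seen by users and configuration tools */
    int refcnt;         /* references taken via get() */
    int in_use;
};

/* get(): map a classid to the internal id (slot index + 1) and take a reference. */
static unsigned long cl_get(struct sim_class *cls, uint32_t classid)
{
    for (int i = 0; i < NCLS; i++)
        if (cls[i].in_use && cls[i].classid == classid) {
            cls[i].refcnt++;
            return (unsigned long)i + 1;
        }
    return 0;   /* no such class */
}

/* put(): drop a reference taken by get(). */
static void cl_put(struct sim_class *cls, unsigned long cl)
{
    assert(cl > 0 && cls[cl - 1].refcnt > 0);
    cls[cl - 1].refcnt--;
}

/* delete(): refuse while the class is still referenced. */
static int cl_delete(struct sim_class *cls, unsigned long cl)
{
    if (cls[cl - 1].refcnt > 0)
        return -1;
    cls[cl - 1].in_use = 0;
    return 0;
}
```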
tcf_proto
struct tcf_proto {
    /* Fast access part */
    struct tcf_proto *next;
    void *root;
    int (*classify)(struct sk_buff *, struct tcf_proto *, struct tcf_result *);
    u32 protocol;

    /* All the rest */
    u32 prio;
    u32 classid;
    struct Qdisc *q;
    void *data;
    struct tcf_proto_ops *ops;
};
tcf_proto_ops
struct tcf_proto_ops {
    struct tcf_proto_ops *next;
    char kind[IFNAMSIZ];
    int (*classify)(struct sk_buff *, struct tcf_proto *, struct tcf_result *);
    int (*init)(struct tcf_proto *);
    void (*destroy)(struct tcf_proto *);
    unsigned long (*get)(struct tcf_proto *, u32 handle);
    void (*put)(struct tcf_proto *, unsigned long);
    int (*change)(struct tcf_proto *, unsigned long, u32 handle, struct rtattr **, unsigned long *);
    int (*delete)(struct tcf_proto *, unsigned long);
    void (*walk)(struct tcf_proto *, struct tcf_walker *arg);

    /* rtnetlink specific */
    int (*dump)(struct tcf_proto *, unsigned long, struct sk_buff *skb, struct tcmsg *);
    struct module *owner;
};
tcf_proto_ops
classify(): classifies a packet (checks whether the filtering rule applies to the packet). Possible return values are:
TC_POLICE_OK: the packet is accepted by the filter.
TC_POLICE_RECLASSIFY: the packet violates agreed parameters and should be allocated to a different class.
TC_POLICE_SHOT: the packet was dropped because of violation of agreed parameters.
TC_POLICE_UNSPEC: the rule does not match the packet, and the packet should be passed to the next filter.
tcf_result contains the classid and the internal identification of the class.
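A sketch of a filter chain that uses these four outcomes, assuming each filter matches on a destination port and polices a maximum packet size (both rules are invented for the example). The walk stops at the first filter that returns anything other than UNSPEC. The POL_* enum mirrors the TC_POLICE_* meanings only; its values are arbitrary, and all struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical results named after the slide's TC_POLICE_* codes. */
enum pol_res { POL_UNSPEC, POL_OK, POL_RECLASSIFY, POL_SHOT };

struct sim_pkt { uint16_t dport; unsigned len; };
struct sim_result { uint32_t classid; };   /* plays the role of tcf_result */

struct sim_filter {
    struct sim_filter *next;   /* filters are tried in list order */
    uint16_t dport;            /* match rule */
    unsigned max_len;          /* agreed parameter: larger packets violate it */
    enum pol_res on_violation; /* POL_RECLASSIFY or POL_SHOT */
    uint32_t classid;          /* class to use when the packet matches */
};

/* One filter's classify(): UNSPEC if the rule does not match. */
static enum pol_res filter_classify(const struct sim_filter *f,
                                    const struct sim_pkt *p, struct sim_result *res)
{
    if (p->dport != f->dport)
        return POL_UNSPEC;
    if (p->len > f->max_len)
        return f->on_violation;
    res->classid = f->classid;
    return POL_OK;
}

/* Try each filter in turn until one returns something other than UNSPEC. */
static enum pol_res chain_classify(const struct sim_filter *chain,
                                   const struct sim_pkt *p, struct sim_result *res)
{
    assert(p != NULL && res != NULL);
    for (const struct sim_filter *f = chain; f != NULL; f = f->next) {
        enum pol_res r = filter_classify(f, p, res);
        if (r != POL_UNSPEC)
            return r;
    }
    return POL_UNSPEC;   /* nothing matched; e.g. the qdisc may use a default class */
}
```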
Queueing Discipline Example
net/sched/sch_red.c
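RED (Random Early Detection)
[Figure: dropping probability versus average queue length avg, with min_th and max_th on the horizontal axis and max_p and 1 on the vertical axis.]
$$p_b = \begin{cases} 0, & avg < min_{th} \\ max_p \cdot \dfrac{avg - min_{th}}{max_{th} - min_{th}}, & min_{th} \le avg < max_{th} \\ 1, & avg \ge max_{th} \end{cases}$$
$$p_a = p_b \cdot count \quad \text{(Linux implementation)}$$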
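The two formulas can be evaluated directly. The sketch below, in floating point for readability, includes the flat regions of the curve and caps p_a at 1; the function names and the cap are illustrative choices, not taken from sch_red.c. Here count is the number of packets since the last drop.

```c
#include <assert.h>

/* Initial drop probability p_b on the linear ramp (min_th <= avg < max_th). */
static double red_pb(double avg, double min_th, double max_th, double max_p)
{
    assert(min_th < max_th);
    return max_p * (avg - min_th) / (max_th - min_th);
}

/* Actual drop probability p_a = p_b * count, plus the two flat regions. */
static double red_pa(double avg, int count, double min_th, double max_th, double max_p)
{
    if (avg < min_th)
        return 0.0;            /* short queue: never drop */
    if (avg >= max_th)
        return 1.0;            /* long queue: always drop */
    double pa = red_pb(avg, min_th, max_th, max_p) * count;
    return pa > 1.0 ? 1.0 : pa;   /* cap at 1 for large counts */
}
```

With min_th = 10, max_th = 30, max_p = 0.1 and avg = 20, p_b = 0.05, so p_a grows from 0.05 for count = 1 to 0.2 for count = 4: the longer RED goes without a drop, the more likely the next one becomes.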
RED implementation I
struct red_sched_data {
    /* Parameters */
    u32 limit;       /* HARD maximal queue length */
    u32 qth_min;     /* Min average length threshold: A scaled */
    u32 qth_max;     /* Max average length threshold: A scaled */
    char Wlog;       /* log(W) */
    char Plog;       /* random number bits */
    …
    unsigned long qave;        /* Average queue length: A scaled */
    int qcount;                /* Packets since last random number generation */
    u32 qR;                    /* Cached random number */
    psched_time_t qidlestart;  /* Start of idle period */
    struct tc_red_xstats st;
};
RED implementation II: Compute average queue length
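We want: $avg \leftarrow avg \cdot (1 - w) + w \cdot backlog$
Code in Linux:
q->qave += sch->stats.backlog - (q->qave >> q->Wlog);
Why: $avg = qave \cdot w$ with $w = 2^{-Wlog}$. Dividing the desired update by $w$ gives $qave \leftarrow qave + backlog - qave \cdot w$, and $qave \cdot w$ is computed as qave >> Wlog.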
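A quick check that the shift-based update tracks the floating-point EWMA. Function names are hypothetical; with Wlog = 2 (w = 1/4), qave divided by 4 equals the exact average until the shift starts truncating.

```c
#include <assert.h>

/* Fixed-point EWMA as on the slide: qave holds avg scaled by 2^Wlog,
 * so the update needs only an add, a subtract and a shift. */
static unsigned long red_update_qave(unsigned long qave, unsigned long backlog, int Wlog)
{
    return qave + backlog - (qave >> Wlog);   /* qave >> Wlog <= qave: no underflow */
}

/* Floating-point reference: avg = avg*(1-w) + w*backlog with w = 2^-Wlog. */
static double red_update_avg(double avg, double backlog, int Wlog)
{
    assert(Wlog >= 0 && Wlog < 32);
    double w = 1.0 / (double)(1UL << Wlog);
    return avg * (1.0 - w) + w * backlog;
}
```

Starting from 0 with a backlog of 100 twice, both give an average of 25 and then 43.75 (qave = 100, then 175); on the next update the shift drops the fraction 0.75, so the fixed-point value drifts slightly from the exact one.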
RED implementation III
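Ideally, avg should be computed at a constant clock interval.
In Linux, it is updated per outgoing packet (when the packet is enqueued) rather than on a clock.
Care needs to be taken for idle periods: while the queue is idle no packets arrive, so without special handling avg would not decay.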
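The slides do not show how Linux handles idle periods. One approach, from the original RED design, is to treat the idle time as m packet transmissions with an empty queue and decay the average by (1 - w)^m. The sketch below follows that rule in floating point; the function name and pkt_time_us (the transmission time of a typical small packet) are assumptions, and the Linux code may approximate this differently.

```c
#include <assert.h>

/* Decay avg after an idle period: avg <- (1 - w)^m * avg, where m is the
 * number of whole packet times that fit into the idle period. */
static double red_idle_decay(double avg, double idle_us, double pkt_time_us, int Wlog)
{
    assert(pkt_time_us > 0.0 && Wlog >= 0 && Wlog < 32);
    double w = 1.0 / (double)(1UL << Wlog);
    long m = (long)(idle_us / pkt_time_us);
    /* Repeated multiplication avoids libm's pow(); stop once avg is negligible. */
    for (long i = 0; i < m && avg > 1e-9; i++)
        avg *= 1.0 - w;
    return avg;
}
```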
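RED implementation IV: Decide dropping probability
We want: enqueue if
$$count \cdot max_p \cdot \frac{avg - min_{th}}{max_{th} - min_{th}} < rnd$$
where rnd is a random number in [0, 1).
Linux code:
if (((q->qave - q->qth_min) >> q->Wlog) * q->qcount < q->qR) goto enqueue;
with
$max_p = (max_{th} - min_{th}) / 2^{Plog}$
q->qR = rnd * 2^Plog
Substituting $max_p$ turns the condition into $count \cdot (avg - min_{th}) < rnd \cdot 2^{Plog}$. Since qave and qth_min are both scaled by 2^Wlog, the shift yields avg - min_th, so the code compares exactly these two quantities.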
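The equivalence above can be checked numerically. The sketch puts the slide's integer test next to its floating-point form; the function names are hypothetical, and qave and the min threshold are passed already scaled by 2^Wlog, as in red_sched_data.

```c
#include <assert.h>
#include <stdint.h>

/* Integer test from the slide: returns 1 to enqueue, 0 to drop or mark.
 * qave and qth_min_scaled carry Wlog fractional bits; qR = rnd * 2^Plog.
 * Only reached when avg lies between the thresholds, so qave >= qth_min. */
static int red_keep_fixed(unsigned long qave, unsigned long qth_min_scaled,
                          int Wlog, int qcount, uint32_t qR)
{
    assert(qave >= qth_min_scaled && qcount >= 0);
    return ((qave - qth_min_scaled) >> Wlog) * (unsigned long)qcount < qR;
}

/* Floating-point form: count * max_p * (avg - min_th) / (max_th - min_th) < rnd,
 * with max_p = (max_th - min_th) / 2^Plog. */
static int red_keep_float(double avg, double min_th, double max_th,
                          int Plog, int count, double rnd)
{
    double max_p = (max_th - min_th) / (double)(1UL << Plog);
    return count * max_p * (avg - min_th) / (max_th - min_th) < rnd;
}
```

With Wlog = 2, Plog = 12, min_th = 1000, max_th = 3000, avg = 1500 and count = 3, both forms keep the packet for rnd = 0.5 (qR = 2048) and drop it for rnd = 0.25 (qR = 1024).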