
UDP Socket Creation The socketcall system call.helio/net/udpsock.pdf · UDP Socket Creation The socketcall system call. All standard library functions that operate upon sockets (e.g.,


UDP Socket Creation

The socketcall system call.

All standard library functions that operate upon sockets (e.g., socket(), bind(), listen(), accept()) share the single system call, sys_socketcall. This front end is defined in net/socket.c. The parameter, call, that is passed to the sys_socketcall front end identifies the specific operation to be performed.

The possible values for call are defined in include/linux/net.h:

    #define SYS_SOCKET       1   /* sys_socket(2) */
    #define SYS_BIND         2   /* sys_bind(2) */
    #define SYS_CONNECT      3   /* sys_connect(2) */
    #define SYS_LISTEN       4   /* sys_listen(2) */
    #define SYS_ACCEPT       5   /* sys_accept(2) */
    #define SYS_GETSOCKNAME  6   /* sys_getsockname(2) */
    #define SYS_GETPEERNAME  7   /* sys_getpeername(2) */
    #define SYS_SOCKETPAIR   8   /* sys_socketpair(2) */
    #define SYS_SEND         9   /* sys_send(2) */
    #define SYS_RECV        10   /* sys_recv(2) */
    #define SYS_SENDTO      11   /* sys_sendto(2) */
    #define SYS_RECVFROM    12   /* sys_recvfrom(2) */
    #define SYS_SHUTDOWN    13   /* sys_shutdown(2) */
    #define SYS_SETSOCKOPT  14   /* sys_setsockopt(2) */
    #define SYS_GETSOCKOPT  15   /* sys_getsockopt(2) */
    #define SYS_SENDMSG     16   /* sys_sendmsg(2) */
    #define SYS_RECVMSG     17   /* sys_recvmsg(2) */

In addition to one of the above function call identifiers, sys_socketcall() is also passed a pointer to the table of argument pointers via the parameter args.

    asmlinkage long sys_socketcall(int call, unsigned long *args)
    {
        unsigned long a[6];
        unsigned long a0, a1;
        int err;

It first checks whether the specific socket function call identifier is a valid one.

        if(call < 1 || call > SYS_RECVMSG)
            return -EINVAL;


An array of argument counts, nargs[18], has been defined in net/socket.c. There is one element for each of the 17 socket functions; element 0 is unused because call numbers start at 1. The value of nargs[j] is the size in bytes of the argument list required by function j. The macro AL(x) converts the number of arguments to the size of the argument list by multiplying by the size of a long integer. For example, the bind function has index j = 2 and has three arguments, and thus the value of nargs[2] is 12 on a system with 4-byte longs.

    int bind(int sockfd, struct sockaddr *addr, socklen_t addrlen);

The entry for bind is the third element, AL(3) at index 2, in the table below.

    /* Argument list sizes for sys_socketcall */
    #define AL(x) ((x) * sizeof(unsigned long))
    static unsigned char nargs[18]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3),
                                    AL(3),AL(3),AL(4),AL(4),AL(4),AL(6),
                                    AL(6),AL(2),AL(5),AL(5),AL(3),AL(3)};
    #undef AL
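The size computation can be tried out in user space. The following is only a sketch: the names demo_nargs_count and demo_arg_bytes are hypothetical, and the table stores argument counts rather than byte sizes so that the conversion performed by AL() is visible. On a host with 8-byte longs the result for bind is 24 rather than 12.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel's AL(x): argument count -> list size in bytes. */
#define AL(x) ((x) * sizeof(unsigned long))

/* Argument counts per call number, copied from the nargs[] table above.
 * Index 0 is a placeholder because call numbers start at 1. */
static const unsigned char demo_nargs_count[18] = {
    0, 3, 3, 3, 2, 3, 3, 3, 4, 4, 4, 6, 6, 2, 5, 5, 3, 3
};

/* Hypothetical helper: bytes that would be copied from user space
 * for the given call number. */
size_t demo_arg_bytes(int call)
{
    assert(call >= 1 && call <= 17);   /* same range sys_socketcall() accepts */
    return AL(demo_nargs_count[call]);
}
```

For example, demo_arg_bytes(2) gives the argument list size for bind, i.e. three times the size of an unsigned long on the host.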

The sys_socketcall() function next copies the arguments from user-space to the local array a[] in kernel-space using the size of argument list specified in the nargs[] array.

        /* copy_from_user should be SMP safe. */
        if (copy_from_user(a, args, nargs[call]))
            return -EFAULT;

All socket functions have at least two arguments. The first two are stored in a0 and a1.

        a0=a[0];
        a1=a[1];


A switch statement then calls the required socket service function identified by the socket function call identifier. A few of these are shown here:

        switch(call)
        {
            case SYS_SOCKET:
                err = sys_socket(a0,a1,a[2]);
                break;
            case SYS_BIND:
                err = sys_bind(a0,(struct sockaddr *)a1, a[2]);
                break;
            case SYS_CONNECT:
                err = sys_connect(a0, (struct sockaddr *)a1, a[2]);
                break;
            case SYS_LISTEN:
                err = sys_listen(a0,a1);
                break;
                :
                :
        }
        return err;
    }
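The multiplexing pattern itself is easy to model outside the kernel. The sketch below is an illustration under stated assumptions: all demo_ names are hypothetical, the service functions are stubs, and returning -ENOSYS for calls that are not modelled is a choice made for this sketch, not kernel behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { DEMO_SYS_SOCKET = 1, DEMO_SYS_BIND = 2, DEMO_SYS_LISTEN = 4,
       DEMO_SYS_RECVMSG = 17 };

/* Hypothetical stand-ins for the kernel service functions. */
static long demo_socket(unsigned long family, unsigned long type,
                        unsigned long protocol)
{
    (void)family; (void)type; (void)protocol;
    return 3;                      /* pretend descriptor */
}

static long demo_listen(unsigned long fd, unsigned long backlog)
{
    (void)fd; (void)backlog;
    return 0;
}

/* One entry point; the call number selects the service function. */
long demo_socketcall(int call, const unsigned long *a)
{
    if (call < 1 || call > DEMO_SYS_RECVMSG)
        return -EINVAL;            /* same range check as sys_socketcall() */
    assert(a != NULL);

    switch (call) {
    case DEMO_SYS_SOCKET:
        return demo_socket(a[0], a[1], a[2]);
    case DEMO_SYS_LISTEN:
        return demo_listen(a[0], a[1]);
    default:
        return -ENOSYS;            /* not modelled in this sketch */
    }
}
```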


Generic Socket Creation

A user program creates a socket via the socket() function call:

    int socket (int family, int type, int protocol);

    Returns: nonnegative descriptor if OK, -1 on error.

The family parameter specifies a protocol family such as PF_INET. For the PF_INET family, the socket type is normally SOCK_STREAM (TCP) or SOCK_DGRAM (UDP). The protocol field is normally set to zero (IPPROTO_IP) for PF_INET sockets. This causes the default protocol for the specified socket type to be selected. If the socket type is SOCK_RAW, then the protocol IPPROTO_UDP must be specified if a UDP socket is to be created (and the application must then build the UDP header itself, and the IP header as well if the IP_HDRINCL option is set).
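From the application side the call looks as follows. This is a minimal user-space sketch, assuming a POSIX system with IPv4 support; the helper names demo_open_udp and demo_socket_type are hypothetical. It passes 0 as the protocol and then asks the kernel, through the standard SO_TYPE option, which socket type it recorded.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>     /* close(), for callers of these helpers */

/* Create a UDP socket; protocol 0 selects the default for SOCK_DGRAM. */
int demo_open_udp(void)
{
    return socket(PF_INET, SOCK_DGRAM, 0);
}

/* Return the socket type the kernel recorded for fd, or -1 on error. */
int demo_socket_type(int fd)
{
    int type = -1;
    socklen_t len = sizeof(type);

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

A caller would check the descriptor returned by demo_open_udp() for -1 and close() it when finished.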

    asmlinkage long sys_socket(int family, int type, int protocol)
    {
        int retval;
        struct socket *sock;

The real work of socket creation is driven by the sock_create() function, which also resides in socket.c.

        retval = sock_create(family, type, protocol, &sock);
        if (retval < 0)
            goto out;

If the socket was created successfully, it must be mapped into the file space so that it can be accessed via the fd which is returned in retval.

        retval = sock_map_fd(sock);
        if (retval < 0)
            goto out_release;

    out:
        /* It may be already another descriptor 8) Not kernel problem. */
        return retval;

    out_release:
        sock_release(sock);
        return retval;
    }


The sock_create() function

The first step is to verify that both the family (PF_INET) and the type (SOCK_DGRAM) are at least in range. This test does not ensure that they are actually valid. Exactly one warning that the type SOCK_PACKET is deprecated is also generated if that type is specified. The ``ugly moron'' comment presumably refers to the dynamic loading of the protocol module which follows and not the SOCK_PACKET.

    int sock_create(int family, int type, int protocol, struct socket **res)
    {
        int i;
        struct socket *sock;

        /*
         * Check protocol is in range
         */
        if (family < 0 || family >= NPROTO)
            return -EAFNOSUPPORT;
        if (type < 0 || type >= SOCK_MAX)
            return -EINVAL;

        /* Compatibility.

           This uglymoron is moved from INET layer to here to
           avoid deadlock in module load.
         */
        if (family == PF_INET && type == SOCK_PACKET) {
            static int warned;
            if (!warned) {
                warned = 1;
                printk(KERN_INFO "%s uses obsolete (PF_INET,SOCK_PACKET)\n",
                       current->comm);
            }
            family = PF_PACKET;
        }


Linux supports the dynamic loading of complete protocol stacks as modules. If sock_create() finds the requested protocol family pointer in net_families[] to be null, it attempts to dynamically load it using the request_module() function. This function tries to locate and load the module using the alias entries in /etc/modules.conf and in the folder /lib/modules/'kernel-version'. For example, if

    #define PF_NEWPROTO 9

is requested then the module name requested will be net-pf-9. Since PF_INET always registers itself at boot time, this code is not relevant to creating sockets of type PF_INET.

    #if defined(CONFIG_KMOD) && defined(CONFIG_NET)
        /* Attempt to load a protocol module if the find
           failed. 12/09/1996 Marcin: But! this makes REALLY
           only sense, if the user requested real, full-
           featured networking support upon configuration.
           Otherwise module support will break!
         */
        if (net_families[family]==NULL)
        {
            char module_name[30];
            sprintf(module_name,"net-pf-%d",family);
            request_module(module_name);
        }
    #endif

        net_family_read_lock();
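The construction of the alias name can be reproduced in user space. This sketch uses a hypothetical helper, demo_module_name, and substitutes snprintf() for the sprintf() call in the kernel code so that the write is bounded by the caller's buffer size.

```c
#include <assert.h>
#include <stdio.h>

/* Build the alias name that would be handed to request_module().
 * Returns the number of characters in the formatted name. */
int demo_module_name(char *buf, size_t len, int family)
{
    assert(buf != NULL && len > 0);
    /* Bounded write; the kernel listing above formats into a 30-byte array. */
    return snprintf(buf, len, "net-pf-%d", family);
}
```

With family 9, the PF_NEWPROTO value used above, the buffer holds the string net-pf-9.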

After the attempt to load the protocol completes, if the protocol family remains unregistered then a fatal error is recognized. This error should never occur during creation of a UDP socket.

        if (net_families[family] == NULL) {
            i = -EAFNOSUPPORT;
            goto out;
        }


Next sock_create() attempts to allocate a socket structure. In reality this entails allocating an inode structure from the system's memory resident inode cache and then using the embedded socket structure.

        /*
           Allocate the socket and allow the family to set
           things up. If the protocol is 0, the family is
           instructed to select an appropriate default.
         */
        if (!(sock = sock_alloc()))
        {
            printk(KERN_WARNING "socket: no more sockets\n");
            i = -ENFILE;
            /* Not exactly a match, but its closest posix thing */
            goto out;
        }

When sock_alloc() returns the new socket structure address to sock_create(), the socket type (SOCK_DGRAM for UDP) is stored in the structure. Then the create function for the protocol family is called. For PF_INET this function is inet_create() which was previously encountered in the inet_init() function. Failure to provide a create function will cause a kernel oops here.

        sock->type = type;
        if ((i = net_families[family]->create(sock, protocol)) < 0)
        {
            sock_release(sock);
            goto out;
        }

On return from the indirectly invoked inet_create(), the variable i holds the return code. The sock_create() function stores the socket pointer in the location provided by sys_socket() and then returns i to sys_socket().

        *res = sock;

    out:
        net_family_read_unlock();
        return i;
    }
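The indirect call through net_families[] can be modelled with a small table of function pointers. The following is a simplified user-space sketch, not the kernel's data structures: every demo_ name is hypothetical, locking and module loading are omitted, and the stand-in create routine always picks protocol 17 (IPPROTO_UDP) as its default regardless of the socket type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_NPROTO 32

struct demo_socket { int type; int protocol; };

struct demo_family {
    int family;
    int (*create)(struct demo_socket *sock, int protocol);
};

/* One slot per protocol family; NULL means "not registered". */
static const struct demo_family *demo_families[DEMO_NPROTO];

int demo_register(const struct demo_family *f)
{
    assert(f != NULL && f->create != NULL);
    if (f->family < 0 || f->family >= DEMO_NPROTO)
        return -EINVAL;
    demo_families[f->family] = f;
    return 0;
}

int demo_sock_create(int family, int type, int protocol,
                     struct demo_socket *sock)
{
    if (family < 0 || family >= DEMO_NPROTO)
        return -EAFNOSUPPORT;
    if (demo_families[family] == NULL)
        return -EAFNOSUPPORT;          /* family never registered */
    sock->type = type;
    /* Family-specific setup, reached through the function pointer. */
    return demo_families[family]->create(sock, protocol);
}

/* Hypothetical create routine standing in for inet_create(). */
int demo_inet_create(struct demo_socket *sock, int protocol)
{
    sock->protocol = protocol ? protocol : 17;   /* 17 = IPPROTO_UDP */
    return 0;
}
```

Registering a demo_family whose family field is 2 (the value of PF_INET) makes demo_sock_create(2, ...) succeed; before registration the same call fails with -EAFNOSUPPORT.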


Socket allocation

The sock_alloc() function defined in net/socket.c allocates an inode and returns the address of the struct socket that is embedded in the inode.

    struct socket *sock_alloc(void)
    {
        struct inode * inode;
        struct socket * sock;

        inode = get_empty_inode();
        if (!inode)
            return NULL;

After get_empty_inode() returns, sock_alloc() continues to initialize various inode member variables. The superblock pointer is obtained from the mount point structure associated with the socket VFS that was created during system initialization.

        inode->i_sb = sock_mnt->mnt_sb;

The socki_lookup() function is an inline one-line function that returns a pointer to the struct socket embedded in the inode.

        sock = socki_lookup(inode);

The sock_alloc() function continues to initialize the inode variable. Here it sets flags in both i_mode and i_sock to indicate that this inode represents a socket. However, the bindings that cause sys_read() and sys_write() to be vectored to the networking code are established later in the call to sock_map_fd().

        inode->i_mode = S_IFSOCK|S_IRWXUGO;
        inode->i_sock = 1;
        inode->i_uid = current->fsuid;
        inode->i_gid = current->fsgid;

The struct socket itself is initialized here.

        sock->inode = inode;
        init_waitqueue_head(&sock->wait);
        sock->fasync_list = NULL;
        sock->state = SS_UNCONNECTED;
        sock->flags = 0;
        sock->ops = NULL;
        sock->sk = NULL;
        sock->file = NULL;

        sockets_in_use[smp_processor_id()].counter++;
        return sock;
    }


Inode allocation

The get_empty_inode() function is defined in fs/inode.c.

    struct inode * get_empty_inode(void)
    {
        static unsigned long last_ino;
        struct inode * inode;

        spin_lock_prefetch(&inode_lock);

        inode = alloc_inode();

The get_empty_inode() function ensures that inode allocation succeeded, then proceeds to enqueue the inode on the inode_in_use list and initialize several fields. The inode numbers appear to be sequentially assigned using the static variable last_ino.

        if (inode)
        {
            spin_lock(&inode_lock);
            inodes_stat.nr_inodes++;
            list_add(&inode->i_list, &inode_in_use);
            inode->i_sb = NULL;
            inode->i_dev = 0;
            inode->i_blkbits = 0;
            inode->i_ino = ++last_ino;
            inode->i_flags = 0;
            atomic_set(&inode->i_count, 1);
            inode->i_state = 0;
            spin_unlock(&inode_lock);
            clean_inode(inode);
        }
        return inode;
    }

The alloc_inode() macro defined in fs/inode.c allocates an inode from the inode cache.

    #define alloc_inode() \
        ((struct inode *)kmem_cache_alloc(inode_cachep, SLAB_KERNEL))

SLAB_KERNEL is defined in include/linux/slab.h as GFP_KERNEL. Slab allocator flags correspond to get_free_page() flags.


Recovery of a socket pointer from an inode

The socki_lookup() function retrieves the address of the socket structure embedded in the inode.

    extern __inline__ struct socket *socki_lookup(struct inode *inode)
    {
        return &inode->u.socket_i;
    }
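The embedding can be illustrated with two small structures. This is a user-space sketch with hypothetical demo_ types that keep only the fields needed to show the relationship: the socket lives inside a union in the inode, and the socket in turn holds a back pointer to its inode, as sock_alloc() arranges.

```c
#include <assert.h>
#include <stddef.h>

struct demo_inode;

struct demo_socket {
    struct demo_inode *inode;     /* back pointer, as set by sock_alloc() */
    int state;
};

struct demo_inode {
    unsigned long i_ino;
    union {
        struct demo_socket socket_i;   /* the socket is part of the inode */
        long other_fs_private;         /* placeholder for other users */
    } u;
};

/* Counterpart of socki_lookup(): no search, just an address computation. */
struct demo_socket *demo_socki_lookup(struct demo_inode *inode)
{
    assert(inode != NULL);
    return &inode->u.socket_i;
}

/* Counterpart of the linking step in sock_alloc(). */
struct demo_socket *demo_sock_alloc(struct demo_inode *inode)
{
    struct demo_socket *sock = demo_socki_lookup(inode);

    sock->inode = inode;
    sock->state = 0;
    return sock;
}
```

Because the socket is a member of the inode, no separate allocation or lookup table is needed to get from one to the other.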

Protocol family dependent initialization

The mission of IPV4 dependent socket creation is to create and initialize the struct sock. The struct socket and struct sock are two related and easy-to-confuse structures that are widely used within the TCP/IP implementation. To minimize the confusion a standard naming convention is used.

Variables named:

    sock    always refer to the generic BSD struct socket
    sk      always refer to the INET protocol dependent structure struct sock.

The IPV4 struct sock is defined in include/net/sock.h. Some of the members in this large structure include the following:

    struct sock {
        /* Socket demultiplex comparisons on incoming packets. */
        __u32           daddr;
        __u32           rcv_saddr;
        __u16           dport;
        unsigned short  num;
        int             bound_dev_if;

Two hash queues are provided. For raw sockets, the one associated with the next and pprev pointers is used to link all the sk's associated with a given struct proto. A pointer to the hash function is contained in the struct proto. For the raw struct proto the hash key is the low order 5 bits of the protocol number. In contrast, for UDP, the 64K port space is divided into 128 hash queues and the struct socks are mapped to a hash queue by local port number. The queuing mechanism is a bit unusual here. The next pointer points to the next struct sock in the queue, but the pprev pointer points to the next pointer of the previous element in the queue. The bind_next and bind_pprev pointers are not used in raw or UDP sockets. They are used by TCP to link all struct socks that are bound to a single port.

        /* Main hash linkage for various protocol lookup tables. */
        struct sock     *next;
        struct sock     **pprev;
        struct sock     *bind_next;
        struct sock     **bind_pprev;
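The benefit of the next/pprev arrangement is that an element can be unlinked without knowing which queue head it hangs from. The sketch below models this in user space with hypothetical demo_ names. It assumes that the queue is selected by masking the port with 127, which matches the 128 queues described above but is an assumption of this sketch, and it leaves out all locking.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HTABLE_SIZE 128

struct demo_sock {
    unsigned short num;           /* local port */
    struct demo_sock *next;       /* next element in the queue */
    struct demo_sock **pprev;     /* address of the pointer that points at us */
};

static struct demo_sock *demo_hash[DEMO_HTABLE_SIZE];

unsigned demo_hashfn(unsigned short port)
{
    return port & (DEMO_HTABLE_SIZE - 1);
}

/* Insert at the head of the queue selected by the local port. */
void demo_hash_add(struct demo_sock *sk)
{
    struct demo_sock **head = &demo_hash[demo_hashfn(sk->num)];

    assert(sk->pprev == NULL);    /* must not already be hashed */
    sk->next = *head;
    if (sk->next)
        sk->next->pprev = &sk->next;
    *head = sk;
    sk->pprev = head;
}

/* Unlink without looking at the queue head: pprev already points at
 * whatever pointer (head slot or predecessor's next) refers to sk. */
void demo_unhash(struct demo_sock *sk)
{
    if (sk->pprev == NULL)
        return;
    if (sk->next)
        sk->next->pprev = sk->pprev;
    *sk->pprev = sk->next;
    sk->next = NULL;
    sk->pprev = NULL;
}

struct demo_sock *demo_bucket(unsigned short port)
{
    return demo_hash[demo_hashfn(port)];
}
```

Ports 5 and 133 land in the same queue under this hash, which makes them convenient for exercising the linkage.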


Other fields are dependent upon system configuration and include areas that are private to specific protocols.

        /* This is where all the private (optional) areas that don't
         * overlap will eventually live.
         */
        union {
            void *destruct_hook;
            struct unix_opt af_unix;
    #if defined(CONFIG_INET) || defined (CONFIG_INET_MODULE)
            struct inet_opt af_inet;
    #endif
            ..Other Private Protocol Info Structures..
            ...
        } protinfo;

IP dependent socket creation with inet_create()

The inet_create() function resides in linux/net/ipv4/af_inet.c. It is passed a pointer to the generic struct socket and either the IP specific transport protocol number or (typically in the case of UDP) the wild-card value of 0. Its primary function is to create and initialize the IP dependent struct sock. Source line numbers are retained in the listings of this function because the text refers to them.

    319 static int inet_create(struct socket *sock, int protocol)
    320 {
    321     struct sock *sk;
    322     struct list_head *p;
    323     struct inet_protosw *answer;
    325     sock->state = SS_UNCONNECTED;
    326     sk = sk_alloc(PF_INET, GFP_KERNEL, 1);

On return, inet_create() ensures that the struct sock was created.

    327     if (sk == NULL)
    328         goto do_oom;


The protocol matching procedure

In the protocol lookup loop, the list of all the inet_protosw structures (normally only 1) associated with the socket type (SOCK_DGRAM for UDP) is searched. Since the input protocol is typically 0 (IPPROTO_IP), in line number 342 a protocol match is found. In line 343 protocol is set to the default protocol, namely IPPROTO_UDP for our SOCK_DGRAM socket type.

    330     /* Look for the requested type/protocol pair. */
    331     answer = NULL;
    332     br_read_lock_bh(BR_NETPROTO_LOCK);
    333     list_for_each(p, &inetsw[sock->type]) {
    334         answer = list_entry(p, struct inet_protosw, list);
    336         /* Check the non-wild match. */
    337         if (protocol == answer->protocol) {
    338             if (protocol != IPPROTO_IP)
    339                 break;
    340         } else {
    341             /* Check for the two wild cases. */
    342             if (IPPROTO_IP == protocol) {
    343                 protocol = answer->protocol;
    344                 break;
    345             }
    346             if (IPPROTO_IP == answer->protocol)
    347                 break;
    348         }
    349         answer = NULL;
    350     }
    351     br_read_unlock_bh(BR_NETPROTO_LOCK);
    353     if (!answer)
    354         goto free_and_badtype;
    355     if (answer->capability > 0 && !capable(answer->capability))
    356         goto free_and_badperm;
    357     if (!protocol)
    358         goto free_and_noproto;
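The three outcomes of the matching loop (exact match, wild-card request, wild-card entry) can be exercised in isolation. The following sketch replaces the kernel's linked list with a plain array and uses hypothetical demo_ names; locking and the capability check are left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_IPPROTO_IP   0     /* wild card */
#define DEMO_IPPROTO_TCP  6
#define DEMO_IPPROTO_UDP 17

struct demo_protosw { int protocol; };

/* Returns the matching entry or NULL. If the caller passed the wild
 * card, *protocol is overwritten with the entry's protocol. */
const struct demo_protosw *demo_match(const struct demo_protosw *list,
                                      size_t n, int *protocol)
{
    size_t i;

    assert(protocol != NULL);
    for (i = 0; i < n; i++) {
        const struct demo_protosw *answer = &list[i];

        if (*protocol == answer->protocol) {
            if (*protocol != DEMO_IPPROTO_IP)
                return answer;              /* exact, non-wild match */
        } else {
            if (*protocol == DEMO_IPPROTO_IP) {
                *protocol = answer->protocol;   /* take the default */
                return answer;
            }
            if (answer->protocol == DEMO_IPPROTO_IP)
                return answer;              /* entry accepts any protocol */
        }
    }
    return NULL;                            /* no entry matched */
}
```

With a single UDP entry in the list, a request for protocol 0 comes back with the protocol rewritten to 17, while a request for TCP finds nothing.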

Here the struct proto_ops pointer is copied from the inet_protosw structure to the struct socket. This contains pointers to the high level (somewhat transport independent) entry points to the AF_INET protocol stack.

    360     sock->ops = answer->ops;


For sockets of type SOCK_DGRAM, sock->ops points to the inet_dgram_ops structure which is defined in net/ipv4/af_inet.c.

    struct proto_ops inet_dgram_ops = {
        family:      PF_INET,

        release:     inet_release,
        bind:        inet_bind,
        connect:     inet_dgram_connect,
        socketpair:  sock_no_socketpair,
        accept:      sock_no_accept,
        getname:     inet_getname,
        poll:        datagram_poll,
        ioctl:       inet_ioctl,
        listen:      sock_no_listen,
        shutdown:    inet_shutdown,
        setsockopt:  inet_setsockopt,
        getsockopt:  inet_getsockopt,
        sendmsg:     inet_sendmsg,
        recvmsg:     inet_recvmsg,
        mmap:        sock_no_mmap,
        sendpage:    sock_no_sendpage,
    };

Next the sk->prot field is set to the struct proto pointer held in the struct inet_protosw that answer points to.

    361     sk->prot = answer->prot;

This structure contains pointers to entry points associated with the specific transport protocol being used, which in the case of UDP is udp_prot. The udp_prot structure is defined in net/ipv4/udp.c and holds the UDP protocol specific functions.

    struct proto udp_prot = {
        name:         "UDP",
        close:        udp_close,
        connect:      udp_connect,
        disconnect:   udp_disconnect,
        ioctl:        udp_ioctl,
        setsockopt:   ip_setsockopt,
        getsockopt:   ip_getsockopt,
        sendmsg:      udp_sendmsg,
        recvmsg:      udp_recvmsg,
        backlog_rcv:  udp_queue_rcv_skb,
        hash:         udp_v4_hash,
        unhash:       udp_v4_unhash,
        get_port:     udp_v4_get_port,
    };


The value of no_check for UDP is UDP_CSUM_DEFAULT and the value of flags is INET_PROTOSW_PERMANENT. Here we also see that sk->num is set to the protocol number for sockets of type raw. The comments in the structure definition say that this field is the local port number. Later in this section we will see that for non-raw sockets it does indeed contain the local port number.

    362     sk->no_check = answer->no_check;
    363     if (INET_PROTOSW_REUSE & answer->flags)
    364         sk->reuse = 1;
    366     if (SOCK_RAW == sock->type) {
    367         sk->num = protocol;
    368         if (IPPROTO_RAW == protocol)
    369             sk->protinfo.af_inet.hdrincl = 1;
    370     }
    372     if (ipv4_config.no_pmtu_disc)
    373         sk->protinfo.af_inet.pmtudisc = IP_PMTUDISC_DONT;
    374     else
    375         sk->protinfo.af_inet.pmtudisc = IP_PMTUDISC_WANT;
    377     sk->protinfo.af_inet.id = 0;

The sock_init_data() function defined in net/core/sock.c initialises the struct sock and links it with the struct socket. The two structures are doubly linked by pointers in each of them to the other. The sk pointer in the struct socket points to a struct sock. The owner pointer in struct sock points to the associated struct socket.

    379     sock_init_data(sock, sk);

Miscellaneous fields of the struct sock are initialized here.

    381     sk->destruct = inet_sock_destruct;
    383     sk->zapped = 0;
    384     sk->family = PF_INET;
    385     sk->protocol = protocol;
    387     sk->backlog_rcv = sk->prot->backlog_rcv;
    389     sk->protinfo.af_inet.ttl = sysctl_ip_default_ttl;
    391     sk->protinfo.af_inet.mc_loop = 1;
    392     sk->protinfo.af_inet.mc_ttl = 1;
    393     sk->protinfo.af_inet.mc_index = 0;
    394     sk->protinfo.af_inet.mc_list = NULL;
    396 #ifdef INET_REFCNT_DEBUG
    397     atomic_inc(&inet_sock_nr);
    398 #endif


The value of sk->num, as stated earlier, is overloaded. For raw sockets it was set to the protocol number in line 367 of this function. Thus, it is mandatory that the protocol provide a hash function. For UDP/TCP sockets created as SOCK_DGRAM or SOCK_STREAM, sk->num is a port number and will be 0 here.

    400     if (sk->num) {
    401         /* It assumes that any protocol which allows
    402          * the user to assign a number at socket
    403          * creation time automatically
    404          * shares.
    405          */
    406         sk->sport = htons(sk->num);
    408         /* Add to protocol hash chains. */
    409         sk->prot->hash(sk);
    410     }

Since the udp_prot structure doesn't provide an init function, the following code block does nothing in the case of a UDP socket.

    412     if (sk->prot->init) {
    413         int err = sk->prot->init(sk);
    414         if (err != 0) {
    415             inet_sock_release(sk);
    416             return err;
    417         }
    418     }
    419     return 0;
    435 }

The sk_alloc() function resides in net/core/sock.c and allocates the requested structure from the cache created earlier. Since the zero_it flag is set, the structure is initialized to all zeros.

    struct sock *sk_alloc(int family, int priority, int zero_it)
    {
        struct sock *sk = kmem_cache_alloc(sk_cachep, priority);

        if(sk && zero_it) {
            memset(sk, 0, sizeof(struct sock));
            sk->family = family;
            sock_lock_init(sk);
        }

        return sk;
    }


Initialization of the struct sock

    void sock_init_data(struct socket *sock, struct sock *sk)
    {

Buffer queues for packets awaiting local delivery or transmission.

        skb_queue_head_init(&sk->receive_queue);
        skb_queue_head_init(&sk->write_queue);
        skb_queue_head_init(&sk->error_queue);

        init_timer(&sk->timer);

Buffer space allocation flags and space quotas.

        sk->allocation = GFP_KERNEL;
        sk->rcvbuf = sysctl_rmem_default;
        sk->sndbuf = sysctl_wmem_default;
        sk->state = TCP_CLOSE;
        sk->zapped = 1;
        sk->socket = sock;

        if(sock)
        {
            sk->type = sock->type;
            sk->sleep = &sock->wait;
            sock->sk = sk;
        } else
            sk->sleep = NULL;

        sk->dst_lock = RW_LOCK_UNLOCKED;
        sk->callback_lock = RW_LOCK_UNLOCKED;

Function pointers for managing sleep/wakeup transitions.

        sk->state_change = sock_def_wakeup;
        sk->data_ready = sock_def_readable;
        sk->write_space = sock_def_write_space;
        sk->error_report = sock_def_error_report;
        sk->destruct = sock_def_destruct;

        sk->peercred.pid = 0;
        sk->peercred.uid = -1;
        sk->peercred.gid = -1;
        sk->rcvlowat = 1;
        sk->rcvtimeo = MAX_SCHEDULE_TIMEOUT;
        sk->sndtimeo = MAX_SCHEDULE_TIMEOUT;

        atomic_set(&sk->refcnt, 1);
    }


Mapping the socket structure into the file space

File descriptors are small integers used in Linux to identify open files via the files_struct pointer in the task_struct of the process.

        /* open file information */
        struct files_struct *files;

The files_struct contains the following fields:

    struct files_struct {
        atomic_t count;
        rwlock_t file_lock;
        int max_fds;
        int max_fdset;
        int next_fd;
        struct file **fd;       /* current fd array */
        fd_set *close_on_exec;
        fd_set *open_fds;
        fd_set close_on_exec_init;
        fd_set open_fds_init;
        struct file *fd_array[NR_OPEN_DEFAULT];
    };

    *fd_array[]          Maps file descriptors to file pointers.
    *open_fds            A bit map of all file descriptors presently in use.
    count                Used as a reference counter.
    close_on_exec_init   A bit map of all file descriptors used that are to be
                         closed when the system call exec() is issued.

The data type fd_set is large enough to hold NR_OPEN_DEFAULT bits. NR_OPEN_DEFAULT is set to the number of bits in a long integer (e.g., 32 on x86).


Mapping a file structure associated with the socket into the fd space of the current process is performed by sock_map_fd(), defined in net/socket.c. The following comment taken from the function describes some complicating factors: ``This function creates a file structure and maps it to the fd space of the current process. On success it returns the file descriptor and the file struct implicitly stored in sock->file. Note that another thread may close the file descriptor before we return from this function. We do not refer to socket after the mapping is complete. If one day we will need it, this function will increment reference count on file by 1. In any case returned fd may be not valid! This race condition is unavoidable with shared fd spaces. We cannot solve it inside the kernel, but we do take care of internal coherence.''

    static int sock_map_fd(struct socket *sock)
    {
        int fd;
        struct qstr this;
        char name[32];

        /*
         * Find a file descriptor suitable for return to the user.
         */
        fd = get_unused_fd();

On return, sock_map_fd(), now holding a valid fd number, calls get_empty_filp(), which creates a new file structure or allocates one from an existing pool of file structures maintained by the kernel.

        if (fd >= 0) {
            struct file *file = get_empty_filp();

The get_empty_filp() function, defined in fs/file_table.c, returns NULL if there are no more free file structures.

            if (!file) {
                put_unused_fd(fd);
                fd = -ENFILE;
                goto out;
            }

The variable this is of type struct qstr (quickstring), an efficient way of storing a string and its metadata (len, hash) for quick retrieval. The name of this pseudo file is the ASCII encoding of its inode number, enclosed in square brackets.

            sprintf(name, "[%lu]", sock->inode->i_ino);
            this.name = name;
            this.len = strlen(name);
            this.hash = sock->inode->i_ino;
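The naming step is simple enough to replay in user space. This sketch uses a hypothetical structure demo_qstr that carries only the three fields set above, and a hypothetical helper demo_sock_name; snprintf() is substituted for sprintf() so the write is bounded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cut-down counterpart of struct qstr. */
struct demo_qstr {
    const char *name;
    unsigned int len;
    unsigned int hash;
};

/* Fill q with the pseudo file name for a socket inode number.
 * buf must stay alive for as long as q->name is used. */
void demo_sock_name(struct demo_qstr *q, char *buf, size_t buflen,
                    unsigned long ino)
{
    assert(q != NULL && buf != NULL && buflen >= 32);
    snprintf(buf, buflen, "[%lu]", ino);
    q->name = buf;
    q->len = (unsigned int)strlen(buf);
    q->hash = (unsigned int)ino;    /* the inode number doubles as the hash */
}
```

For inode number 4711 the resulting name is [4711], with a length of 6.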


The d_alloc() function copies the contents of this into the allocated storage. Note that this operation is being performed on the socket VFS. Note also that f_dentry is a pointer to a directory entry.

            file->f_dentry = d_alloc(sock_mnt->mnt_sb->s_root, &this);
            if (!file->f_dentry) {
                put_filp(file);
                put_unused_fd(fd);
                fd = -ENOMEM;
                goto out;
            }
            file->f_dentry->d_op = &sockfs_dentry_operations;
            d_add(file->f_dentry, sock->inode);
            file->f_vfsmnt = mntget(sock_mnt);

            sock->file = file;
            file->f_op = sock->inode->i_fop = &socket_file_ops;
            file->f_mode = 3;
            file->f_flags = O_RDWR;
            file->f_pos = 0;
            fd_install(fd, file);
        }

    out:
        return fd;
    }

The socket_file_ops structure provides the bindings that vector standard file system operations to handlers in the socket file system.

    static struct file_operations socket_file_ops = {
        llseek:    sock_lseek,
        read:      sock_read,
        write:     sock_write,
        poll:      sock_poll,
        ioctl:     sock_ioctl,
        mmap:      sock_mmap,
        open:      sock_no_open,   /* code to deny open via /proc */
        release:   sock_close,
        fasync:    sock_fasync,
        readv:     sock_readv,
        writev:    sock_writev,
        sendpage:  sock_sendpage
    };


Allocating a free fd

The get_unused_fd() function is defined in fs/open.c.

    /*
     * Find an empty file descriptor entry, and mark it busy.
     */
    int get_unused_fd(void)
    {
        struct files_struct * files = current->files;
        int fd, error;

        error = -EMFILE;
        write_lock(&files->file_lock);

    repeat:
        fd = find_next_zero_bit(files->open_fds,
                                files->max_fdset,
                                files->next_fd);

The find_next_zero_bit() function, defined in include/asm/bitops.h, finds the next available fd in the fdset by locating the next zero bit, i.e., a descriptor that is not yet in use. When it returns, it is necessary to see if the number of open files exceeds the maximum.

        /*
         * N.B. For clone tasks sharing a files structure, this test
         * will limit the total number of files that can be opened.
         */
        if (fd >= current->rlim[RLIMIT_NOFILE].rlim_cur)
            goto out;

The fdset can only hold 1024 descriptors initially, but if that limit is reached it may be increased.

        /* Do we need to expand the fdset array? */
        if (fd >= files->max_fdset) {
            error = expand_fdset(files, fd);
            if (!error) {
                error = -EMFILE;
                goto repeat;
            }
            goto out;
        }


The fd array can only hold 32 (NR_OPEN_DEFAULT) file pointers initially, but it may also be expanded.

        /*
         * Check whether we need to expand the fd array.
         */
        if (fd >= files->max_fds) {
            error = expand_fd_array(files, fd);
            if (!error) {
                error = -EMFILE;
                goto repeat;
            }
            goto out;
        }

The FD_SET and FD_CLR macros set or clear the bit indexed by fd in the specified arrays.

        FD_SET(fd, files->open_fds);
        FD_CLR(fd, files->close_on_exec);
        files->next_fd = fd + 1;
    #if 1
        /* Sanity check */
        if (files->fd[fd] != NULL) {
            printk(KERN_WARNING "get_unused_fd:slot %d not NULL!\n", fd);
            files->fd[fd] = NULL;
        }
    #endif
        error = fd;

    out:
        write_unlock(&files->file_lock);
        return error;
    }
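The core of the allocation, searching a bitmap for the next zero bit starting at next_fd and then marking it, can be sketched in user space. All demo_ names below are hypothetical, the table has a fixed size of 64 descriptors, and locking, resource limits and table expansion are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_MAX_FDS 64
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS ((DEMO_MAX_FDS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

struct demo_files {
    unsigned long open_fds[DEMO_WORDS];   /* one bit per descriptor */
    int next_fd;                          /* where the next search starts */
};

static int demo_test_bit(const unsigned long *map, int bit)
{
    return (int)((map[bit / DEMO_BITS_PER_LONG] >>
                  (bit % DEMO_BITS_PER_LONG)) & 1UL);
}

/* Linear-scan stand-in for find_next_zero_bit(); returns size if none. */
int demo_find_next_zero_bit(const unsigned long *map, int size, int start)
{
    int bit;

    for (bit = start; bit < size; bit++)
        if (!demo_test_bit(map, bit))
            return bit;
    return size;
}

/* Find a free descriptor, mark it busy, and remember where to resume. */
int demo_get_unused_fd(struct demo_files *files)
{
    int fd;

    assert(files != NULL);
    fd = demo_find_next_zero_bit(files->open_fds, DEMO_MAX_FDS,
                                 files->next_fd);
    if (fd >= DEMO_MAX_FDS)
        return -EMFILE;                   /* table full in this sketch */
    files->open_fds[fd / DEMO_BITS_PER_LONG] |=
        1UL << (fd % DEMO_BITS_PER_LONG);
    files->next_fd = fd + 1;
    return fd;
}
```

Starting from an empty table, successive calls hand out 0, 1, 2 and so on, until the table is exhausted and -EMFILE is returned.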


The fd_install() function is defined in include/linux/file.h. It stores the new file structure pointer in the fd array. Potential race conditions exist here and are described in a contradictory way in comments at the beginning of the function. The comment:

``The VFS is full of places where we drop the files lock between setting the open_fds bitmap and installing the file pointer in the file array. At any such point, we are vulnerable to a dup2() race installing a file in the array before us. We need to detect this and fput() the struct file we are about to overwrite in this case.''

seems to be in conflict with the following statement:

``It should never happen - if we allow dup2() do it, _really_ bad things will follow.''

The code indicates that the latter comment is actually in effect here. If the target slot in the files->fd array is occupied at the time of the call, a system crash will be requested by BUG().

    static inline void fd_install(unsigned int fd, struct file * file)
    {
        struct files_struct *files = current->files;

        write_lock(&files->file_lock);
        if (files->fd[fd])
            BUG();
        files->fd[fd] = file;
        write_unlock(&files->file_lock);
    }
