Lines Matching +full:max +full:- +full:frame +full:- +full:size
1 .. SPDX-License-Identifier: GPL-2.0
22 - Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
23 - Johann Baudy
33 On the other hand, PACKET_MMAP is very efficient. PACKET_MMAP provides a size
67 [setup] socket() -------> creation of the capture socket
68 setsockopt() ---> allocation of the circular buffer (ring)
70 mmap() ---------> mapping of the allocated buffer to the
73 [capture] poll() ---------> to wait for incoming packets
75 [shutdown] close() --------> destruction of the capture socket and
88 supported and a link level pseudo-header is provided
107 [setup] socket() -------> creation of the transmission socket
108 setsockopt() ---> allocation of the circular buffer (ring)
110 bind() ---------> bind transmission socket with a network interface
111 mmap() ---------> mapping of the allocated buffer to the
114 [transmission] poll() ---------> wait for free packets (optional)
115 send() ---------> send all packets that are set as ready in
120 [shutdown] close() --------> destruction of the transmission socket and
134 know the header size of frames used in the circular buffer.
136 As with capture, each frame contains two parts::
138 --------------------
140 | | of this frame
141 |--------------------|
145 --------------------
159 ioctl(this->socket, SIOCGIFINDEX, &s_ifr);
167 bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
173 frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)
178 frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
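Because `<linux/if_packet.h>` defines TPACKET_HDRLEN as TPACKET_ALIGN(sizeof(struct tpacket_hdr)) + sizeof(struct sockaddr_ll), the two expressions above evaluate to the same offset. The following is a minimal sketch for a TPACKET_V1 frame; the helper names tx_data_offset() and frame_data() are hypothetical and are not part of any kernel or libc API.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/if_packet.h>

/* Hypothetical helper: byte offset of the default data area inside a
 * TPACKET_V1 frame (frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)). */
static size_t tx_data_offset(void)
{
	return TPACKET_HDRLEN - sizeof(struct sockaddr_ll);
}

/* Hypothetical helper: pointer to the data area of the frame whose base
 * address is `frame` (for example, a slot inside the mmap()ed ring). */
static uint8_t *frame_data(void *frame)
{
	return (uint8_t *)frame + tx_data_offset();
}
```

frame_data() can then be used when filling a TX frame whose base address was computed from the mmap()ed ring.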
181 the frame (for payload alignment with SOCK_RAW mode for instance) you
191 - Capture process::
195 - Transmission process::
204 unsigned int tp_block_size; /* Minimal size of contiguous block */
206 unsigned int tp_frame_size; /* Size of frame */
213 related meta-information like timestamps without requiring a system call.
235 +---------+---------+ +---------+---------+
236 | frame 1 | frame 2 | | frame 3 | frame 4 |
237 +---------+---------+ +---------+---------+
240 +---------+---------+ +---------+---------+
241 | frame 5 | frame 6 | | frame 7 | frame 8 |
242 +---------+---------+ +---------+---------+
244 A frame can be of any size, the only condition being that it fits in a block. A block
245 can only hold an integer number of frames; in other words, a frame cannot
257 Block size limit
258 ----------------
265 order=2 ==> 16384 bytes, etc. The maximum size of a
285 ------------------
291 called pg_vec, its size limits the number of blocks that can be allocated::
293 +---+---+---+---+
295 +---+---+---+---+
304 a pool of pre-determined sizes. This pool of memory is maintained by the slab
309 predetermined sizes that kmalloc uses can be checked in the "size-<bytes>"
317 PACKET_MMAP buffer size calculator
323 <size-max> is the maximum size allocatable with kmalloc
325 <pointer size> depends on the architecture -- ``sizeof(void *)``
326 <page size> depends on the architecture -- PAGE_SIZE or getpagesize (2)
327 <max-order> is the value defined with MAX_PAGE_ORDER
328 <frame size> is an upper bound of the frame's capture size (more on this later)
333 <block number> = <size-max>/<pointer size>
334 <block size> = <page size> << <max-order>
336 so, the max buffer size is::
338 <block number> * <block size>
342 <block number> * <block size> / <frame size>
347 <size-max> = 131072 bytes
348 <pointer size> = 4 bytes
350 <max-order> = 11
352 and a value for <frame size> of 2048 bytes. These parameters will yield::
355 <block size> = 4096 << 11 = 8 MiB.
357 and hence the buffer will have a size of 262144 MiB. So it can hold
360 Actually, this buffer size is not possible with an i386 architecture.
362 an i386 kernel's memory size is limited to 1GiB.
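The calculator reduces to plain integer arithmetic. This sketch only restates the formulas above (the struct ring_limits and the function names are hypothetical); it does not query the running kernel, so <size-max>, <max-order> and the other inputs must be supplied by the caller. With the i386 figures from the example it reproduces the 8 MiB block size and the 262144 MiB (256 GiB) total that, as noted, cannot actually be allocated.

```c
#include <stdint.h>

/* Inputs of the PACKET_MMAP buffer size calculator. Names are hypothetical
 * and mirror the <...> terms used in the text. */
struct ring_limits {
	uint64_t size_max;      /* <size-max>: largest kmalloc() allocation */
	uint64_t pointer_size;  /* <pointer size>: sizeof(void *) */
	uint64_t page_size;     /* <page size> */
	unsigned int max_order; /* <max-order> */
	uint64_t frame_size;    /* <frame size> */
};

/* <block number> = <size-max> / <pointer size> */
static uint64_t block_number(const struct ring_limits *l)
{
	return l->size_max / l->pointer_size;
}

/* <block size> = <page size> << <max-order> */
static uint64_t block_size(const struct ring_limits *l)
{
	return l->page_size << l->max_order;
}

/* max buffer size = <block number> * <block size> */
static uint64_t max_buffer_size(const struct ring_limits *l)
{
	return block_number(l) * block_size(l);
}

/* max frames = <block number> * <block size> / <frame size> */
static uint64_t max_frames(const struct ring_limits *l)
{
	return max_buffer_size(l) / l->frame_size;
}
```

64-bit arithmetic is used throughout so that the theoretical total does not overflow even when the calculation is done on a 32-bit host.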
370 -----------------
372 If you check the source code, you will see that what I draw here as a frame
373 is not only the link-level frame. At the beginning of each frame there is a
374 header called struct tpacket_hdr, used in PACKET_MMAP to hold the link-level frame's
375 meta information, such as timestamps. So what we draw here as a frame is really
379 Frame structure:
381 - Start. Frame must be aligned to TPACKET_ALIGNMENT=16
382 - struct tpacket_hdr
383 - pad to TPACKET_ALIGNMENT=16
384 - struct sockaddr_ll
385 - Gap, chosen so that packet data (Start+tp_net) aligns to
387 - Start+tp_mac: [ Optional MAC header ]
388 - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
389 - Pad to align to TPACKET_ALIGNMENT=16
394 - tp_block_size must be a multiple of PAGE_SIZE (1)
395 - tp_frame_size must be greater than TPACKET_HDRLEN (obvious)
396 - tp_frame_size must be a multiple of TPACKET_ALIGNMENT
397 - tp_frame_nr must be exactly frames_per_block*tp_block_nr
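The four conditions above can be checked in user space before calling setsockopt() with PACKET_RX_RING or PACKET_TX_RING, which makes a rejected request easier to diagnose. A minimal sketch, assuming TPACKET_V1 (TPACKET_HDRLEN and TPACKET_ALIGNMENT come from `<linux/if_packet.h>`); the function name is hypothetical, the page size is passed in by the caller, and the zero-block-size test is an extra assumption not listed in the text.

```c
#include <stdbool.h>
#include <linux/if_packet.h>

/* Hypothetical helper: checks a struct tpacket_req against the conditions
 * listed above. page_size would normally come from getpagesize() or
 * sysconf(_SC_PAGESIZE). */
static bool tpacket_req_is_valid(const struct tpacket_req *req,
				 unsigned int page_size)
{
	unsigned int frames_per_block;

	/* Assumption: an empty block is treated as invalid. */
	if (page_size == 0 || req->tp_block_size == 0)
		return false;
	/* tp_block_size must be a multiple of PAGE_SIZE */
	if (req->tp_block_size % page_size != 0)
		return false;
	/* tp_frame_size must be greater than TPACKET_HDRLEN */
	if (req->tp_frame_size <= TPACKET_HDRLEN)
		return false;
	/* tp_frame_size must be a multiple of TPACKET_ALIGNMENT */
	if (req->tp_frame_size % TPACKET_ALIGNMENT != 0)
		return false;

	/* tp_frame_nr must be exactly frames_per_block * tp_block_nr */
	frames_per_block = req->tp_block_size / req->tp_frame_size;
	return req->tp_frame_nr == frames_per_block * req->tp_block_nr;
}
```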
403 ---------------------------------------------
410 mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
415 the frames. This is because a frame cannot span two
425 rx_ring = mmap(0, size * 2, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
426 tx_ring = rx_ring + size;
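Because a frame never crosses a block boundary, the address of frame number idx is not simply idx * tp_frame_size when tp_frame_size does not divide tp_block_size. A sketch of the arithmetic follows; the helper name is hypothetical, and the same computation applies to either half of a combined mapping (pass rx_ring or tx_ring as the base).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of frame `idx` in a ring mapped at `ring`.
 * Frames never cross a block boundary, so when frame_size does not divide
 * block_size, the tail of every block is left unused (the gap). */
static uint8_t *ring_frame(uint8_t *ring, unsigned int block_size,
			   unsigned int frame_size, unsigned int idx)
{
	unsigned int frames_per_block = block_size / frame_size;
	size_t block = idx / frames_per_block; /* which block holds the frame */
	size_t slot = idx % frames_per_block;  /* position inside that block */

	return ring + block * block_size + slot * frame_size;
}
```

For example, with 4096-byte blocks and 1536-byte frames, each block holds two frames and leaves a 1024-byte gap, so frame 2 starts at offset 4096 rather than 3072.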
431 At the beginning of each frame there is a status field (see
432 struct tpacket_hdr). If this field is 0, the frame is ready
433 to be used by the kernel. If not, there is a frame the user can read
447 TP_STATUS_COPY This flag indicates that the frame (and associated
485 can use that frame buffer again.
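A minimal RX consumer for a TPACKET_V1 ring might look like the sketch below: check TP_STATUS_USER, read the frame, then store TP_STATUS_KERNEL to hand it back. It assumes a single block (so frames sit exactly frame_size bytes apart) and uses GCC's __sync_synchronize() as a conservative full barrier; the helper names are hypothetical, and a real program would poll() the socket instead of simply moving on when no frame is ready.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/if_packet.h>

/* True once the kernel has passed the frame to user space. */
static bool frame_ready(const volatile struct tpacket_hdr *hdr)
{
	return (hdr->tp_status & TP_STATUS_USER) != 0;
}

/* Hand the frame back to the kernel. The barrier makes sure all reads of
 * the frame are complete before the status store becomes visible. */
static void frame_release(volatile struct tpacket_hdr *hdr)
{
	__sync_synchronize();
	hdr->tp_status = TP_STATUS_KERNEL;
}

/* Hypothetical consumer: visits `nr` frames spaced `frame_size` bytes apart
 * (one block, so no inter-block gaps) and releases every ready frame.
 * Returns the number of frames consumed. */
static unsigned int drain_frames(uint8_t *base, unsigned int frame_size,
				 unsigned int nr)
{
	unsigned int consumed = 0;

	for (unsigned int i = 0; i < nr; ++i) {
		volatile struct tpacket_hdr *hdr =
			(volatile struct tpacket_hdr *)(base + (size_t)i * frame_size);

		if (!frame_ready(hdr))
			continue;
		__sync_synchronize(); /* status read before payload reads */
		/* ... process the packet here ... */
		frame_release(hdr);
		++consumed;
	}
	return consumed;
}
```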
507 #define TP_STATUS_AVAILABLE 0 // Frame is available
508 #define TP_STATUS_SEND_REQUEST 1 // Frame will be sent on next send()
509 #define TP_STATUS_SENDING 2 // Frame is currently in transmission
510 #define TP_STATUS_WRONG_FORMAT 4 // Frame format is not correct
513 packet, the user fills the data buffer of an available frame, sets tp_len to
514 the current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
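The TX steps just described (check that the frame is available, fill its data buffer, set tp_len, then set TP_STATUS_SEND_REQUEST) can be sketched as follows for a TPACKET_V1 frame. The helper name and its return convention are hypothetical; the data offset is the frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll) position used for TX frames, and a later send(fd, NULL, 0, 0) would hand every requested frame to the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/if_packet.h>

/* Hypothetical helper for a TPACKET_V1 TX ring: if the frame is available,
 * copy `len` bytes into its data area, set tp_len and mark it for sending.
 * Returns 0 on success, -1 if the frame is busy or the packet does not fit. */
static int tx_frame_fill(void *frame, unsigned int frame_size,
			 const void *pkt, unsigned int len)
{
	volatile struct tpacket_hdr *hdr = frame;
	size_t off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

	if (hdr->tp_status != TP_STATUS_AVAILABLE)
		return -1; /* still owned by the kernel */
	if (frame_size < off || len > frame_size - off)
		return -1; /* packet larger than the frame's data area */

	memcpy((uint8_t *)frame + off, pkt, len);
	hdr->tp_len = len;
	__sync_synchronize(); /* data and length visible before the status */
	hdr->tp_status = TP_STATUS_SEND_REQUEST;
	return 0;
}
```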
524 header->tp_len = in_i_size;
525 header->tp_status = TP_STATUS_SEND_REQUEST;
526 retval = send(this->socket, NULL, 0, 0);
552 - Default if not otherwise specified by setsockopt(2)
553 - RX_RING, TX_RING available
555 TPACKET_V1 --> TPACKET_V2:
556 - Made 64-bit clean due to unsigned long usage in TPACKET_V1
559 - Timestamp resolution in nanoseconds instead of microseconds
560 - RX_RING, TX_RING available
561 - VLAN metadata information available for packets
565 - TP_STATUS_VLAN_VALID bit being set into the tp_status field indicates
567 - TP_STATUS_VLAN_TPID_VALID bit being set into the tp_status field
570 - How to switch to TPACKET_V2:
579 TPACKET_V2 --> TPACKET_V3:
580 - Flexible buffer implementation for RX_RING:
581 1. Blocks can be configured with non-static frame sizes
582 2. Read/poll is at the block level (as opposed to the packet level)
583 3. Added poll timeout to avoid indefinite user-space wait
585 4. Added user-configurable knobs:
590 - RX Hash data available in user space
591 - TX_RING semantics are conceptually similar to TPACKET_V2;
596 Packets with non-zero values of tp_next_offset will be dropped.
606 - PACKET_FANOUT_HASH: schedule to socket by the skb's packet hash
607 - PACKET_FANOUT_LB: schedule to socket by round-robin
608 - PACKET_FANOUT_CPU: schedule to socket by the CPU the packet arrives on
609 - PACKET_FANOUT_RND: schedule to socket by random selection
610 - PACKET_FANOUT_ROLLOVER: if one socket is full, roll over to another
611 - PACKET_FANOUT_QM: schedule to socket by the skb's recorded queue_mapping
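A socket selects one of these modes when it joins a fanout group with setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)). The integer argument packs the group id into the low 16 bits and the mode (optionally ORed with flags) into the high 16 bits. The helper below is a hypothetical sketch of that packing only; it opens no socket.

```c
#include <stdint.h>
#include <linux/if_packet.h>

/* Hypothetical helper: builds the int passed with PACKET_FANOUT.
 * Low 16 bits: fanout group id. High 16 bits: mode such as
 * PACKET_FANOUT_HASH or PACKET_FANOUT_CPU, optionally ORed with
 * flags such as PACKET_FANOUT_FLAG_ROLLOVER. */
static int fanout_arg(uint16_t group_id, uint16_t mode)
{
	return (int)((uint32_t)group_id | ((uint32_t)mode << 16));
}
```

Every socket that should share the load passes the same group id and mode; the kernel then distributes packets among the group according to the chosen mode.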
691 while (limit-- > 0) {
739 case -1:
757 AF_PACKET's TPACKET_V3 ring buffer can be configured to use non-static frame
763 * ~15% - 20% reduction in CPU usage
767 * Non-static frame size to capture the entire packet payload
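With non-static frame sizes, packets inside a TPACKET_V3 block are not at fixed offsets; each struct tpacket3_hdr records where the next one starts in tp_next_offset. The sketch below walks a block the same way as the block-walking loop in the example below, but uses the field names from `<linux/if_packet.h>` (pbd->hdr.bh1 rather than the example's own pbd->h1). The fake_block() helper is hypothetical and exists only to build an in-memory block for testing; it imitates the chaining, not the kernel's exact layout.

```c
#include <stdint.h>
#include <string.h>
#include <linux/if_packet.h>

/* Walk one TPACKET_V3 block: start at offset_to_first_pkt, then follow
 * each tp_next_offset. Returns the total captured bytes (tp_snaplen). */
static uint64_t block_snap_bytes(const struct tpacket_block_desc *pbd)
{
	const uint8_t *base = (const uint8_t *)pbd;
	const struct tpacket3_hdr *ppd = (const struct tpacket3_hdr *)
		(base + pbd->hdr.bh1.offset_to_first_pkt);
	uint64_t bytes = 0;

	for (uint32_t i = 0; i < pbd->hdr.bh1.num_pkts; ++i) {
		bytes += ppd->tp_snaplen;
		ppd = (const struct tpacket3_hdr *)
			((const uint8_t *)ppd + ppd->tp_next_offset);
	}
	return bytes;
}

/* Test-only, hypothetical: lays out `n` packets of different snap lengths
 * back to back in `buf`. `buf` must be 16-byte aligned and large enough. */
static void fake_block(uint8_t *buf, const uint32_t *snaplens, uint32_t n)
{
	struct tpacket_block_desc *pbd = (struct tpacket_block_desc *)buf;
	uint32_t off = TPACKET_ALIGN(sizeof(*pbd));

	memset(pbd, 0, sizeof(*pbd));
	pbd->hdr.bh1.num_pkts = n;
	pbd->hdr.bh1.offset_to_first_pkt = off;

	for (uint32_t i = 0; i < n; ++i) {
		struct tpacket3_hdr *ppd = (struct tpacket3_hdr *)(buf + off);
		uint32_t step = TPACKET_ALIGN(sizeof(*ppd) + snaplens[i]);

		memset(ppd, 0, sizeof(*ppd));
		ppd->tp_snaplen = snaplens[i];
		ppd->tp_len = snaplens[i];
		/* last packet in the block: no successor */
		ppd->tp_next_offset = (i + 1 < n) ? step : 0;
		off += step;
	}
}
```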
772 it with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.)::
774 /* Written from scratch, but kernel-to-user space API usage
844 memset(&ring->req, 0, sizeof(ring->req));
845 ring->req.tp_block_size = blocksiz;
846 ring->req.tp_frame_size = framesiz;
847 ring->req.tp_block_nr = blocknum;
848 ring->req.tp_frame_nr = (blocksiz * blocknum) / framesiz;
849 ring->req.tp_retire_blk_tov = 60;
850 ring->req.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
852 err = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &ring->req,
853 sizeof(ring->req));
859 ring->map = mmap(NULL, ring->req.tp_block_size * ring->req.tp_block_nr,
861 if (ring->map == MAP_FAILED) {
866 ring->rd = malloc(ring->req.tp_block_nr * sizeof(*ring->rd));
867 assert(ring->rd);
868 for (i = 0; i < ring->req.tp_block_nr; ++i) {
869 ring->rd[i].iov_base = ring->map + (i * ring->req.tp_block_size);
870 ring->rd[i].iov_len = ring->req.tp_block_size;
892 struct ethhdr *eth = (struct ethhdr *) ((uint8_t *) ppd + ppd->tp_mac);
895 if (eth->h_proto == htons(ETH_P_IP)) {
901 ss.sin_addr.s_addr = ip->saddr;
907 sd.sin_addr.s_addr = ip->daddr;
911 printf("%s -> %s, ", sbuff, dbuff);
914 printf("rxhash: 0x%x\n", ppd->hv1.tp_rxhash);
919 int num_pkts = pbd->h1.num_pkts, i;
924 pbd->h1.offset_to_first_pkt);
926 bytes += ppd->tp_snaplen;
930 ppd->tp_next_offset);
939 pbd->h1.block_status = TP_STATUS_KERNEL;
944 munmap(ring->map, ring->req.tp_block_size * ring->req.tp_block_nr);
945 free(ring->rd);
967 fd = setup_socket(&ring, argp[argc - 1]);
978 if ((pbd->h1.block_status & TP_STATUS_USER) == 0) {
979 poll(&pfd, 1, -1);
1014 This has the side effect that packets sent through PF_PACKET will bypass the
1056 frames to be updated (respectively, the frame handed over to the application), iv) walk
1062 in a first step to see if the frame belongs to the application, and then
1077 - Packet sockets work well together with Linux socket filters, thus you also