Commit fdfc5c8

edumazet authored and davem330 committed
tcp: remove empty skb from write queue in error cases
Vladimir Rutsky reported stuck TCP sessions after memory pressure
events. Edge Trigger epoll() user would never receive an EPOLLOUT
notification allowing them to retry a sendmsg().

Jason tested the case of sk_stream_alloc_skb() returning NULL,
but there are other paths that could lead both sendmsg() and sendpage()
to return -1 (EAGAIN), with an empty skb queued on the write queue.

This patch makes sure we remove this empty skb so that
Jason code can detect that the queue is empty, and
call sk->sk_write_space(sk) accordingly.

Fixes: ce5ec44 ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky <rutsky@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
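For readers less familiar with why a missed EPOLLOUT stalls an edge-triggered sender, here is a minimal userspace sketch of the pattern the report describes: write until EAGAIN, then wait for a single EPOLLOUT edge before retrying. It is an illustration only, not part of the patch. It assumes Linux, uses an AF_UNIX socketpair as a stand-in for a TCP connection so it runs without a network, and does not reproduce the memory-pressure bug itself. The names fill_until_eagain(), drain() and demo_edge_triggered_retry() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send until the kernel refuses more data. Returns the number of bytes
 * queued when send() fails with EAGAIN/EWOULDBLOCK, or -1 on any other
 * error.
 */
static long fill_until_eagain(int fd)
{
	char buf[4096] = { 0 };
	long total = 0;

	for (;;) {
		ssize_t n = send(fd, buf, sizeof(buf),
				 MSG_DONTWAIT | MSG_NOSIGNAL);

		if (n > 0) {
			total += n;
			continue;
		}
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return total;
		return -1;
	}
}

/* Read everything currently queued on fd; returns the byte count. */
static long drain(int fd)
{
	char buf[4096];
	long total = 0;
	ssize_t n;

	while ((n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT)) > 0)
		total += n;
	return total;
}

/* Hypothetical demo. Returns 0 if the sender sees no event while the
 * socket is full, then gets EPOLLOUT once the peer drains, and can
 * send again. Returns -1 otherwise.
 */
int demo_edge_triggered_retry(void)
{
	struct epoll_event ev = { .events = EPOLLOUT | EPOLLET }, out;
	int sv[2], epfd, rc = -1;
	long queued;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto close_socks;

	queued = fill_until_eagain(sv[0]);
	if (queued <= 0)
		goto close_all;

	/* Register only after hitting EAGAIN, as an ET sender would. */
	ev.data.fd = sv[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
		goto close_all;

	/* Socket is still full: no event may be pending yet. */
	if (epoll_wait(epfd, &out, 1, 0) != 0)
		goto close_all;

	/* The peer frees space; this is the edge the sender depends on. */
	if (drain(sv[1]) != queued)
		goto close_all;

	if (epoll_wait(epfd, &out, 1, 1000) == 1 && (out.events & EPOLLOUT)) {
		assert(out.data.fd == sv[0]);
		if (send(sv[0], "x", 1, MSG_DONTWAIT | MSG_NOSIGNAL) == 1)
			rc = 0;
	}

close_all:
	close(epfd);
close_socks:
	close(sv[0]);
	close(sv[1]);
	return rc;
}
```

If the second epoll_wait() never reported EPOLLOUT, the sender would have no later event to wake it and would wait indefinitely, which matches the stuck-session symptom in the report.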
1 parent 7d0a065 commit fdfc5c8

1 file changed

Lines changed: 20 additions & 10 deletions

net/ipv4/tcp.c

@@ -935,6 +935,22 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
 	return mss_now;
 }
 
+/* In some cases, both sendpage() and sendmsg() could have added
+ * an skb to the write queue, but failed adding payload on it.
+ * We need to remove it to consume less memory, but more
+ * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
+ * users.
+ */
+static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
+{
+	if (skb && !skb->len) {
+		tcp_unlink_write_queue(skb, sk);
+		if (tcp_write_queue_empty(sk))
+			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+		sk_wmem_free_skb(sk, skb);
+	}
+}
+
 ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 			 size_t size, int flags)
 {
@@ -1064,6 +1080,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 	return copied;
 
 do_error:
+	tcp_remove_empty_skb(sk, tcp_write_queue_tail(sk));
 	if (copied)
 		goto out;
 out_err:
@@ -1388,18 +1405,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	sock_zerocopy_put(uarg);
 	return copied + copied_syn;
 
+do_error:
+	skb = tcp_write_queue_tail(sk);
 do_fault:
-	if (!skb->len) {
-		tcp_unlink_write_queue(skb, sk);
-		/* It is the one place in all of TCP, except connection
-		 * reset, where we can be unlinking the send_head.
-		 */
-		if (tcp_write_queue_empty(sk))
-			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
-		sk_wmem_free_skb(sk, skb);
-	}
+	tcp_remove_empty_skb(sk, skb);
 
-do_error:
 	if (copied + copied_syn)
 		goto out;
 out_err:
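
The effect of the new helper on the error path can be pictured with a small userspace model. This is a simplified sketch under stated assumptions, not kernel code: struct toy_buf, struct toy_queue and every toy_* function are hypothetical stand-ins for the skb, the socket write queue and the helpers in the diff, and chrono accounting and memory charging are left out. It shows only the property the commit message relies on: an empty buffer left at the tail keeps the queue non-empty, and removing it lets an emptiness check succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an skb: only the payload length matters here. */
struct toy_buf {
	size_t len;
	struct toy_buf *prev;
};

/* Toy stand-in for the write queue: tail pointer plus element count. */
struct toy_queue {
	struct toy_buf *tail;
	size_t count;
};

/* Queue a buffer carrying len bytes at the tail; NULL on allocation failure. */
struct toy_buf *toy_append(struct toy_queue *q, size_t len)
{
	struct toy_buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->len = len;
	b->prev = q->tail;
	q->tail = b;
	q->count++;
	return b;
}

bool toy_queue_empty(const struct toy_queue *q)
{
	return q->count == 0;
}

/* Model of the error-path cleanup: drop the tail buffer only if it exists
 * and carries no payload. Returns true if a buffer was removed.
 */
bool toy_remove_empty_tail(struct toy_queue *q)
{
	struct toy_buf *b = q->tail;

	if (b && b->len == 0) {
		q->tail = b->prev;
		q->count--;
		free(b);
		return true;
	}
	return false;
}
```

In this model, a failed send that leaves a zero-length buffer queued makes toy_queue_empty() return false until toy_remove_empty_tail() runs. A tail buffer that already holds payload is left alone, mirroring the `skb && !skb->len` test in the patch.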
