adplus-dvertising

Confusion about syn queue and accept queue

Asked 2 years ago
Viewed 1.31 k times

When reading TCP source code, I find a confused thing:

I know TCP has two queues in 3 way handshake:

  • The first queue stores connections which server has received the SYN and send back ACK + SYN, which we call as syn queue.
  • The second queue stores the connections that 3WHS are successful and connection established, which we call as accept queue.

But when reading codes, I find listen() will call inet_csk_listen_start(), which will call reqsk_queue_alloc() to create icsk_accept_queue. And that queue is used in accept(), when we find the queue is not empty, we will get a connection from it and return.

What's more, after tracing the receive process, the call stack is like

tcp_v4_rcv()->tcp_v4_do_rcv()->tcp_rcv_state_process()

The server status is LISTEN when receiving the first handshake. So it will call

`tcp_v4_conn_request()->tcp_conn_request()`

In tcp_conn_request()

if (!want_cookie)
    // Add the req into the queue
    inet_csk_reqsk_queue_hash_add(sk, req, tcp_timeout_init((struct sock *)req));

But here the queue is exactly the icsk_accept_queue, not a syn queue.

void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
                   unsigned long timeout)
{
    reqsk_queue_hash_req(req, timeout);
    inet_csk_reqsk_queue_added(sk);
}

static inline void inet_csk_reqsk_queue_added(struct sock *sk)
{
    reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue);
}

The accept() will return the established connection, which means icsk_accept_queue is the second queue, but where is the first queue?

Where does the connection changes from the first queue to the second?

Why does the Linux add new req into icsk_accept_queue?

asked 2 years ago

Correct Answer

In what follows we will follow the most typical code path and will ignore issues arising from packet loss, retransmission, and the use of atypical features such as TCP fast open (TFO in the code comments).

The call to accept is processed by intet_csk_accept, which calls reqsk_queue_remove to get a socket out of the accept queue &icsk->icsk_accept_queue from the listening socket:

struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
{
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct request_sock_queue *queue = &icsk->icsk_accept_queue;
    struct request_sock *req;
    struct sock *newsk;
    int error;

    lock_sock(sk);

    [...]

    req = reqsk_queue_remove(queue, sk);
    newsk = req->sk;

    [...]

    return newsk;

    [...]
}

In reqsk_queue_remove, it uses rskq_accept_head and rskq_accept_tail to pull a socket out of the queue and to call sk_acceptq_removed:

static inline struct request_sock *reqsk_queue_remove(struct request_sock_queue *queue,
                              struct sock *parent)
{
    struct request_sock *req;

    spin_lock_bh(&queue->rskq_lock);
    req = queue->rskq_accept_head;
    if (req) {
        sk_acceptq_removed(parent);
        WRITE_ONCE(queue->rskq_accept_head, req->dl_next);
        if (queue->rskq_accept_head == NULL)
            queue->rskq_accept_tail = NULL;
    }
    spin_unlock_bh(&queue->rskq_lock);
    return req;
}

And sk_acceptq_removed reduces the length of the queue of sockets waiting to be accepted in sk_ack_backlog:

static inline void sk_acceptq_removed(struct sock *sk)
{
    WRITE_ONCE(sk->sk_ack_backlog, sk->sk_ack_backlog - 1);
}

This, I think, is fully understood by the questioner. Now let's look at what happens when a SYN is recieved, and when the final ACK of the 3WH arrives.

First the receipt of the SYN. Again, let's assume that TFO and SYN cookies are not in play and look at the most common path (at least not when there is a SYN flood).

The SYN is processed in tcp_conn_request where the connection request (not a full blown socket) is stored (we shall see where soon) by calling inet_csk_reqsk_queue_hash_add and then calling send_synack to respond to the SYN:

int tcp_conn_request(struct request_sock_ops *rsk_ops,
             const struct tcp_request_sock_ops *af_ops,
             struct sock *sk, struct sk_buff *skb)
{

   [...] 

   if (!want_cookie)
            inet_csk_reqsk_queue_hash_add(sk, req,
                tcp_timeout_init((struct sock *)req));
   af_ops->send_synack(sk, dst, &fl, req, &foc,
                    !want_cookie ? TCP_SYNACK_NORMAL :
                           TCP_SYNACK_COOKIE);

   [...]

   return 0;

   [...]
}

inet_csk_reqsk_queue_hash_add calls reqsk_queue_hash_req and inet_csk_reqsk_queue_added to store the request.

void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
                   unsigned long timeout)
{
    reqsk_queue_hash_req(req, timeout);
    inet_csk_reqsk_queue_added(sk);
}

reqsk_queue_hash_req puts the request into the ehash.

static void reqsk_queue_hash_req(struct request_sock *req,
                 unsigned long timeout)
{
    [...]

    inet_ehash_insert(req_to_sk(req), NULL);

    [...]
}

And then inet_csk_reqsk_queue_added calls reqsk_queue_added with the icsk_accept_queue:

static inline void inet_csk_reqsk_queue_added(struct sock *sk)
{
    reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue);
}

Which increments the qlen (not sk_ack_backlog):

static inline void reqsk_queue_added(struct request_sock_queue *queue)
{
    atomic_inc(&queue->young);
    atomic_inc(&queue->qlen);
}

The ehash is where all the ESTABLISHED and TIMEWAIT sockets have been stored, and, more recently, where the SYN "queue" is stored.

Note that there is actually no purpose in storing the arrived connection requests in a proper queue. Their order is irrelevant (the final ACKs can arrive in any order) and by moving them out of the listening socket, it is not necessary to take a lock on the listening socket to process the final ACK.

See this commit for the code that effected this change.

Finally, we can watch the request get removed from ehash and added as a full socket to the accept queue.

The final ACK of the 3WH is processed by tcp_check_req which creates a full child socket and then calls inet_csk_complete_hashdance:

struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
               struct request_sock *req,
               bool fastopen, bool *req_stolen)
{

    [...]

    /* OK, ACK is valid, create big socket and
     * feed this segment to it. It will repeat all
     * the tests. THIS SEGMENT MUST MOVE SOCKET TO
     * ESTABLISHED STATE. If it will be dropped after
     * socket is created, wait for troubles.
     */
    child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL,
                             req, &own_req);

    [...]

    return inet_csk_complete_hashdance(sk, child, req, own_req);

    [...]

}

Then inet_csk_complete_hashdance calls inet_csk_reqsk_queue_drop and reqsk_queue_removed on the request, and inet_csk_reqsk_queue_add on the child:

struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
                     struct request_sock *req, bool own_req)
{
    if (own_req) {
        inet_csk_reqsk_queue_drop(sk, req);
        reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
        if (inet_csk_reqsk_queue_add(sk, req, child))
            return child;
    }
    [...]
}

inet_csk_reqsk_queue_drop calls reqsk_queue_unlink, which removes the request from the ehash, and reqsk_queue_removed which decrements the qlen:

void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
{
    if (reqsk_queue_unlink(req)) {
        reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
        reqsk_put(req);
    }
}

Finally, inet_csk_reqsk_queue_add adds the full socket to the accept queue.

struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
                      struct request_sock *req,
                      struct sock *child)
{
    struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue;

    spin_lock(&queue->rskq_lock);
    if (unlikely(sk->sk_state != TCP_LISTEN)) {
        inet_child_forget(sk, req, child);
        child = NULL;
    } else {
        req->sk = child;
        req->dl_next = NULL;
        if (queue->rskq_accept_head == NULL)
            WRITE_ONCE(queue->rskq_accept_head, req);
        else
            queue->rskq_accept_tail->dl_next = req;
        queue->rskq_accept_tail = req;
        sk_acceptq_added(sk);
    }
    spin_unlock(&queue->rskq_lock);
    return child;
}

TL;DR it is in the ehash, and the number of such SYNs is qlen (and not sk_ack_backlog, which holds the number of sockets in the accept queue).

answered 2 years ago

Other Answer

The short answer is that SYN queues are dangerous. The reason they are dangerous is that by sending a single packet (SYN), the sender can get the receiver to commit resources (memory for the SYN queue entry). if you send enough such packets fast enough, possibly with a forged origination address, you will either cause the receiver to exhaust its memory resources or to start refusing to accept legitimate connections.

For this reasons modern operating systems do not have a SYN queue. Instead, they will various techniques (the most common is called SYN cookies) that will allow them to only have a queue for connections that have already answered the initial SYN ACK packet, and thus proved they have dedicated resources themselves for this connection.

So, you are right - there is no SYN queue.

answered 2 years ago