========================================================================
CVE-2020-HSIZE -- Integer overflow in receive_msg()
========================================================================

During our work on Exim, we stumbled across the following commit:

commit 56ac062a3ff94fc4e1bbfc2293119c079a4e980b
Date:   Thu Jan 10 21:15:11 2019 +0000

    More checks on header line length during reception

...

+JH/41 Fix the loop reading a message header line to check for integer overflow,
+      and more-often against header_maxsize.  Previously a crafted message could
+      induce a crash of the recive process; now the message is cleanly rejected.

...

+    if (header_size >= INT_MAX/2)
+      goto OVERSIZE;
     header_size *= 2;

This vulnerability is exploitable in all Exim versions before 4.92 and
allows an unauthenticated remote attacker to execute arbitrary commands
as the "exim" user. Because this commit was not identified as a security
patch, it was not backported to LTS (Long Term Support) distributions.
For example, Debian oldstable's package (exim4_4.89-2+deb9u7) contains
all known security patches, but is vulnerable to CVE-2020-HSIZE and
hence remotely exploitable.

By default, Exim limits the size of a mail header to 1MB
(header_maxsize). Unfortunately, an attacker can bypass this limit by
sending only continuation lines (i.e., '\n' followed by ' ' or '\t'),
thereby overflowing the integer header_size at line 1782:

1778   if (ptr >= header_size - 4)
1779     {
1780     int oldsize = header_size;
1781     /* header_size += 256; */
1782     header_size *= 2;
1783     if (!store_extend(next->text, oldsize, header_size))
1784       {
1785       BOOL release_ok = store_last_get[store_pool] == next->text;
1786       uschar *newtext = store_get(header_size);
1787       memcpy(newtext, next->text, ptr);
1788       if (release_ok) store_release(next->text);
1789       next->text = newtext;
1790       }
1791     }

Ironically, this vulnerability was the most difficult to exploit:

- when the integer header_size overflows, it becomes negative (INT_MIN),
  but we cannot exploit the resulting back-jump at line 1786 (Digression
  1b), because the free size of the current memory block also becomes
  negative (because 0 - INT_MIN = INT_MIN, the "Leblancian Paradox"),
  which prevents us from writing to this back-jumped memory block;

- to overflow the integer header_size, we must send 1GB to Exim:
  consequently, our exploit must succeed after only a few tries (in
  particular, we cannot brute-force ASLR).

Note: we can actually overflow header_size by sending only 1GB / 2 =
512MB: if we send a first line that ends with "\r\n", then Exim
transforms every bare '\n' that we send into "\n " (a continuation
line):

1814   if (ch == '\n')
1815     {
1816     if (first_line_ended_crlf == TRUE_UNSET) first_line_ended_crlf = FALSE;
1817       else if (first_line_ended_crlf) receive_ungetc(' ');

To exploit this vulnerability:

1/ We send three separate mails (in the same SMTP session) to achieve
the following memory layout:

                              mmap memory
-----|-------------------|-------------------|-------------------|-----
 ... |n|l|    mblock3    |n|l|    mblock2    |n|l|    mblock1    | ...
-----|-------------------|-------------------|+------------------|-----
                                              |
                              heap memory     |
                    -------------------------|v-------------|-----
                     ... |N|L|N|L|N|L|N|L|N|L|n|l|  hblock  | ...
                    -------------------------|--------------|-----
                         <- fake storeblock ->

where n and l are the next and length members of a storeblock structure
(a linked list of allocated memory blocks):

 71 typedef struct storeblock {
 72   struct storeblock *next;
 73   size_t length;
 74 } storeblock;

- we first allocate a 1GB mmap block (mblock1) by sending a mail that
  contains a 256MB header of bare '\n' characters; the next member of
  mblock1's storeblock structure initially points to a heap block
  (hblock, which immediately follows data that we control);

- we allocate a second 1GB mmap block (mblock2) by sending a mail that
  also contains a 256MB header of bare '\n' characters;

- we allocate a third 1GB mmap block (mblock3) by sending a mail that
  contains a 512MB header; this overflows the integer header_size, and
  forward-overflows mblock3 (Digression 1a), into mblock2 and mblock1:
  we overwrite mblock2's next pointer with NULL (to avoid a crash in
  store_release() at line 1788) and we partially overwrite mblock1's
  next pointer (with a single null byte).

2/ After this overflow, store_reset() traverses the linked list of
allocated memory blocks and follows mblock1's overwritten next pointer,
to our own "fake storeblock" structure: a NULL next pointer N (to avoid
a crash in store_reset()), and a large length L that covers the entire
address space (for example, 0x7050505070505050). As a result, Exim's
allocator believes that the entire heap is one large, free block of
POOL_MAIN memory (Exim's main type of memory allocation).

This powerful exploit primitive gives us write access to the entire
heap, through POOL_MAIN allocations. But the heap also contains other
types of allocations: we exploit this primitive to overwrite POOL_MAIN
allocations with raw malloc()s (for information disclosure) and to
overwrite POOL_PERM allocations with POOL_MAIN allocations (for
arbitrary code execution).

3/ Information disclosure:

- First, we send an EHLO command that allocates a large string in raw
  malloc() memory.

- Second, we send an invalid RCPT TO command that allocates a small
  string in POOL_MAIN memory (an error message); this small POOL_MAIN
  string overwrites the beginning of the large malloc() string.

- Next, we send an invalid EHLO command that free()s the large malloc()
  string; this free() overwrites the beginning of the small POOL_MAIN
  string with a pointer to the libc (a member of libc's malloc_chunk
  structure).

- Last, we send an invalid DATA command that responds with an error
  message: the small, overwritten POOL_MAIN string, and hence the libc
  pointer. This information leak is essentially the technique that we
  used for CVE-2015-0235 (GHOST).

4/ Arbitrary code execution:

- First, we start a new mail (MAIL FROM, RCPT TO, and DATA commands);
  this calls dkim_exim_verify_init() and allocates a pdkim_ctx structure
  in POOL_PERM memory (DKIM is enabled by default since Exim 4.70):

249 typedef struct pdkim_ctx {
...
263   int(*dns_txt_callback)(char *, char *);
...
274 } pdkim_ctx;

- Second, we send a mail header that is allocated in POOL_MAIN memory,
  and overwrite the pdkim_ctx structure: we overwrite dns_txt_callback
  with a pointer to libc's system() function (we derive this pointer
  from the information-leaked libc pointer).

- Next, we send a "DKIM-Signature:" header (we particularly care about
  its "selector" field).

- Last, we end our mail; this calls dkim_exim_verify_finish(), which
  calls the overwritten dns_txt_callback with a first argument that we
  control (through the selector field of our "DKIM-Signature:" header):

1328 dns_txt_name = string_sprintf("%s._domainkey.%s.", sig->selector, sig->domain);
....
1333 if (  ctx->dns_txt_callback(CS dns_txt_name, CS dns_txt_reply) != PDKIM_OK

  In other words, we execute system() with an arbitrary command.

