Description
Hello, Wouter!
I am actively using unbound-1.13.1 (with our DNSTAP patches, issue #367), and sometimes unbound crashes under high load with a massive number of recursive TCP requests. Every abnormal termination comes from code in services/outside_network.c. I now have a core dump from one of these crashes:
(gdb) bt
#0 0x0000000800955c2a in thr_kill () from /lib/libc.so.7
#1 0x0000000800954084 in raise () from /lib/libc.so.7
#2 0x00000008008ca279 in abort () from /lib/libc.so.7
#3 0x0000000800464641 in ?? () from /usr/local/lib/libevent-2.1.so.7
#4 0x0000000800464939 in event_errx () from /usr/local/lib/libevent-2.1.so.7
#5 0x000000080045ec54 in evmap_io_del_ () from /usr/local/lib/libevent-2.1.so.7
#6 0x0000000800457e8f in event_del_nolock_ () from /usr/local/lib/libevent-2.1.so.7
#7 0x000000080045ada8 in event_del () from /usr/local/lib/libevent-2.1.so.7
#8 0x000000000030e25b in ub_event_del (ev=<optimized out>) at ./util/ub_event.c:395
#9 comm_point_close (c=0xdc97b7c00) at ./util/netevent.c:3860
#10 0x0000000000315bab in decommission_pending_tcp (outnet=<optimized out>, pend=0xdc9494980)
at ./services/outside_network.c:945
#11 0x00000000003147d6 in reuse_cb_and_decommission (outnet=0x18e75, pend=0x6, error=-2)
at ./services/outside_network.c:986
#12 0x0000000000317491 in outnet_tcptimer (arg=0xee67c2300) at ./services/outside_network.c:2033
#13 0x000000080045e0ed in ?? () from /usr/local/lib/libevent-2.1.so.7
#14 0x000000080045a09c in event_base_loop () from /usr/local/lib/libevent-2.1.so.7
#15 0x000000000024dc54 in thread_start (arg=0x8014c0800) at ./util/ub_event.c:280
#16 0x0000000800780fac in ?? () from /lib/libthr.so.3
#17 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf7fa000
(gdb)
If we select frame 12 (outnet_tcptimer) and print pend, we see the following:
(gdb) print pend
$15 = (struct pending_tcp *) 0x6
(gdb) print *pend
Cannot access memory at address 0x6
(gdb)
This corrupt pend pointer is then passed to reuse_cb_and_decommission() (frame 11) and on to the frames above it in the backtrace.
In outnet_tcptimer() (in services/outside_network.c) we can see the following code:
/* it was in use */
struct pending_tcp* pend=(struct pending_tcp*)w->next_waiting;
But w->next_waiting is declared as a pointer to waiting_tcp:
(gdb) print w->next_waiting
$18 = (struct waiting_tcp *) 0xdc9494980
(gdb)
So my question is: is the type cast in outnet_tcptimer() correct? And does this corrupt pend pointer cause the event_errx() abort in libevent?
In case it helps, I found a structure of type pending_tcp reachable from w:
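To make the question concrete, here is a minimal sketch of the pattern I suspect is in play. All names in it (`struct waiter`, `struct conn`, the `on_list` flag, and the helper functions) are hypothetical stand-ins invented for this example; they are not unbound's definitions. It only shows that a field declared as one pointer type can carry a pointer to a different struct, in which case a debugger still prints the field with its declared type, and that the cast back is valid only when that other type was really stored there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection record (role of pending_tcp). */
struct conn {
    struct conn *next_free;
    int fd;
};

/* Hypothetical stand-in for a queued query (role of waiting_tcp). */
struct waiter {
    /* Declared as waiter*, but in this sketch the field is overloaded:
     * when on_list == 0 it holds a conn* instead (assumption). */
    struct waiter *next;
    int on_list;
};

/* Store a conn pointer in the overloaded field. */
static void waiter_attach_conn(struct waiter *w, struct conn *c)
{
    w->on_list = 0;
    w->next = (struct waiter *)c;
}

/* Recover the conn pointer, or NULL if the field is a real list link. */
static struct conn *waiter_get_conn(const struct waiter *w)
{
    if (w->on_list)
        return NULL; /* casting here would produce a bogus conn* */
    return (struct conn *)w->next;
}

/* Returns 1 if the stored pointer round-trips through the cast. */
int demo_roundtrip(void)
{
    struct conn c = { NULL, 7 };
    struct waiter w = { NULL, 1 };

    if (waiter_get_conn(&w) != NULL)
        return 0; /* still on the wait list, so no conn is attached */
    waiter_attach_conn(&w, &c);
    return waiter_get_conn(&w) == &c && waiter_get_conn(&w)->fd == 7;
}
```

If unbound uses next_waiting this way, the cast itself would be fine and gdb showing `struct waiting_tcp *` would just reflect the declared type. The crash would then have to come from the pointed-to object being stale or already closed, which I cannot confirm from the core dump alone.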
(gdb) print w->outnet->tcp_free
$23 = (struct pending_tcp *) 0xdc9494980
(gdb)
(gdb) print *w->outnet->tcp_free
$24 = {next_free = 0xdc9493e40, pi = 0xd7da2c000, c = 0xdc97b7c00, query = 0x0, reuse = {node = {parent = 0xdc94953a0,
left = 0x3287d0 <rbtree_null_node>, right = 0x3287d0 <rbtree_null_node>, key = 0x0, color = 1 '\001'}, addr = {
ss_len = 0 '\000', ss_family = 2 '\002', __ss_pad1 = "\000\065X\320\017\067", __ss_align = 0,
__ss_pad2 = "\000\000\000\000\000\000\000\016", '\000' <repeats 103 times>}, addrlen = 16, is_ssl = 0,
lru_next = 0xdc9494ae0, lru_prev = 0x0, item_on_lru_list = 0, pending = 0xdc9494980, cp_more_read_again = 0,
cp_more_write_again = 0, tree_by_id = {root = 0x3287d0 <rbtree_null_node>, count = 0,
cmp = 0x3133e0 <reuse_id_cmp>}, write_wait_first = 0x0, write_wait_last = 0x0, outnet = 0xd7d805000}}
(gdb)
Big thank you in advance!
PS: I did not attach the core file itself because it is 31 GB in size.