kernel-hacking-2024-linux-steffo

unimore/kernel-hacking-2024-linux-steffo

Author	SHA1	Message	Date
Rami Rosen	04af8cf6f3	net: Remove unused parameter from fill method in fib_rules_ops. The netlink message header (struct nlmsghdr) is an unused parameter in fill method of fib_rules_ops struct. This patch removes this parameter from this method and fixes the places where this method is called. (include/net/fib_rules.h) Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-20 17:26:23 -07:00
Chris Friesen	9643f45512	ipv4: teach ipconfig about the MTU option in DHCP The DHCP spec allows the server to specify the MTU. This can be useful for netbooting with UDP-based NFS-root on a network using jumbo frames. This patch allows the kernel IP autoconfiguration to handle this option correctly. It would be possible to use initramfs and add a script to set the MTU, but that seems like a complicated solution if no initramfs is otherwise necessary, and would bloat the kernel image more than this code would. This patch was originally submitted to LKML in 2003 by Hans-Peter Jansen. Signed-off-by: Chris Friesen <cfriesen@nortel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-19 15:36:17 -07:00
Eric W. Biederman	9b8adb5ea0	net: Fix devinet_sysctl_forward sysctls are unregistered with the rntl_lock held making it unsafe to unconditionally grab the the rtnl_lock. Instead we need to call rtnl_trylock and restart the system call if we can not grab it. Otherwise we could deadlock at unregistration time. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-18 22:15:58 -07:00
David S. Miller	bb803cfbec	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/scsi/fcoe/fcoe.c	2009-05-18 21:08:20 -07:00
Rami Rosen	d23a9b5baa	ipv4: cleanup: remove unnecessary include. There is no need for net/icmp.h header in net/ipv4/fib_frontend.c. This patch removes the #include net/icmp.h from it. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-18 15:16:38 -07:00
Rami Rosen	e204a345a0	ipv4: cleanup - remove two unused parameters from fib_semantic_match(). Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-18 15:16:37 -07:00
Ilpo Järvinen	7752731318	tcp: fix MSG_PEEK race check Commit `518a09ef11` (tcp: Fix recvmsg MSG_PEEK influence of blocking behavior) lets the loop run longer than the race check did previously expect, so we need to be more careful with this check and consider the work we have been doing. I tried my best to deal with urg hole madness too which happens here: if (!sock_flag(sk, SOCK_URGINLINE)) { ++*seq; ... by using additional offset by one but I certainly have very little interest in testing that part. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: Frans Pop <elendil@planet.nl> Tested-by: Ian Zimmermann <itz@buug.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-18 15:05:40 -07:00
Chris Friesen	2513dfb83f	ipconfig: handle case of delayed DHCP server If a DHCP server is delayed, it's possible for the client to receive the DHCPOFFER after it has already sent out a new DHCPDISCOVER message from a second interface. The client then sends out a DHCPREQUEST from the second interface, but the server doesn't recognize the device and rejects the request. This patch simply tracks the current device being configured and throws away the OFFER if it is not intended for the current device. A more sophisticated approach would be to put the OFFER information into the struct ic_device rather than storing it globally. Signed-off-by: Chris Friesen <cfriesen@nortel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-17 20:39:33 -07:00
Rami Rosen	8b3521eeb7	ipv4: remove an unused parameter from configure method of fib_rules_ops. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-17 11:59:45 -07:00
David S. Miller	e81963b180	ipv4: Make INET_LRO a bool instead of tristate. This code is used as a library by several device drivers, which select INET_LRO. If some are modules and some are statically built into the kernel, we get build failures if INET_LRO is modular. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-08 12:45:26 -07:00
David S. Miller	22f6dacdfc	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: include/net/tcp.h	2009-05-08 02:48:30 -07:00
Arnaldo Carvalho de Melo	4dbc8ef7e1	net: Make inet_twsk_put similar to sock_put By separating the freeing code from the refcounting decrementing. Probably reducing icache pressure when we still have reference counts to go. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-06 16:50:52 -07:00
Shan Wei	ae8d7f884a	tcp:fix the code indent Signed-off-by: Shan Wei<shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-05 12:29:47 -07:00
Satoru SATOH	0c266898b4	tcp: Fix tcp_prequeue() to get correct rto_min value tcp_prequeue() refers to the constant value (TCP_RTO_MIN) regardless of the actual value might be tuned. The following patches fix this and make tcp_prequeue get the actual value returns from tcp_rto_min(). Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-04 11:11:01 -07:00
Ilpo Järvinen	255cac91c3	tcp: extend ECN sysctl to allow server-side only ECN This should be very safe compared with full enabled, so I see no reason why it shouldn't be done right away. As ECN can only be negotiated if the SYN sending party is also supporting it, somebody in the loop probably knows what he/she is doing. If SYN does not ask for ECN, the server side SYN-ACK is identical to what it is without ECN. Thus it's quite safe. The chosen value is safe w.r.t to existing configs which choose to currently set manually either 0 or 1 but silently upgrades those who have not explicitly requested ECN off. Whether to just enable both sides comes up time to time but unless that gets done now we can at least make the servers aware of ECN already. As there are some known problems to occur if ECN is enabled, it's currently questionable whether there's any real gain from enabling clients as servers mostly won't support it anyway (so we'd hit just the negative sides). After enabling the servers and getting that deployed, the client end enable really has some potential gain too. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-04 11:07:36 -07:00
David S. Miller	aba7453037	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/isdn/00-INDEX drivers/net/wireless/iwlwifi/iwl-scan.c drivers/net/wireless/rndis_wlan.c net/mac80211/main.c	2009-04-29 20:30:35 -07:00
Stephen Hemminger	942e4a2bd6	netfilter: revised locking for x_tables The x_tables are organized with a table structure and a per-cpu copies of the counters and rules. On older kernels there was a reader/writer lock per table which was a performance bottleneck. In 2.6.30-rc, this was converted to use RCU and the counters/rules which solved the performance problems for do_table but made replacing rules much slower because of the necessary RCU grace period. This version uses a per-cpu set of spinlocks and counters to allow to table processing to proceed without the cache thrashing of a global reader lock and keeps the same performance for table updates. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-28 22:36:33 -07:00
Arnaldo Carvalho de Melo	ac5978e7f8	inet_diag: Remove dup assignments These are later assigned to other values without being used meanwhile. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-28 08:03:26 -07:00
Herbert Xu	36e7b1b8da	gro: Fix COMPLETE checksum handling On a brand new GRO skb, we cannot call ip_hdr since the header may lie in the non-linear area. This patch adds the helper skb_gro_network_header to handle this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-27 05:44:45 -07:00
Anton Blanchard	c9503e0fe0	ipv4: Limit size of route cache hash table Right now we have no upper limit on the size of the route cache hash table. On a 128GB POWER6 box it ends up as 32MB: IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes) It would be nice to cap this for memory consumption reasons, but a massive hashtable also causes a significant spike when measuring OS jitter. With a 32MB hashtable and 4 million entries, rt_worker_func is taking 5 ms to complete. On another system with more memory it's taking 14 ms. Even though rt_worker_func does call cond_sched() to limit its impact, in an HPC environment we want to keep all sources of OS jitter to a minimum. With the patch applied we limit the number of entries to 512k which can still be overriden by using the rt_entries boot option: IP route cache hash table entries: 524288 (order: 6, 4194304 bytes) With this patch rt_worker_func now takes 0.460 ms on the same system. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-27 05:42:24 -07:00
Neil Horman	edf391ff17	snmp: add missing counters for RFC 4293 The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and OutMcastOctets: http://tools.ietf.org/html/rfc4293 But it seems we don't track those in any way that easy to separate from other protocols. This patch adds those missing counters to the stats file. Tested successfully by me With help from Eric Dumazet. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-27 02:45:02 -07:00
David S. Miller	e5e9743bb7	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/core/dev.c	2009-04-21 01:32:26 -07:00
Florian Westphal	a0f82f64e2	syncookies: remove last_synq_overflow from struct tcp_sock last_synq_overflow eats 4 or 8 bytes in struct tcp_sock, even though it is only used when a listening sockets syn queue is full. We can (ab)use rx_opt.ts_recent_stamp to store the same information; it is not used otherwise as long as a socket is in listen state. Move linger2 around to avoid splitting struct mtu_probe across cacheline boundary on 32 bit arches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-20 02:25:26 -07:00
Ilpo Järvinen	52cf3cc8ac	tcp: fix mid-wq adjustment helper Just noticed while doing some new work that the recent mid-wq adjustment logic will misbehave when FACK is not in use (happens either due sysctl'ed off or auto-detected reordering) because I forgot the relevant TCPCB tagbit. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-20 02:15:00 -07:00
Eric Dumazet	573636cbaf	[PATCH] net: remove superfluous call to synchronize_net() inet_register_protosw() function is responsible for adding a new inet protocol into a global table (inetsw[]) that is used with RCU rules. As soon as the store of the pointer is done, other cpus might see this new protocol in inetsw[], so we have to make sure new protocol is ready for use. All pending memory updates should thus be committed to memory before setting the pointer. This is correctly done using rcu_assign_pointer() synchronize_net() is typically used at unregister time, after unsetting the pointer, to make sure no other cpu is still using the object we want to dismantle. Using it at register time is only adding an artificial delay that could hide a real bug, and this bug could popup if/when synchronize_rcu() can proceed faster than now. This saves about 13 ms on boot time on a HZ=1000 8 cpus machine ;) (4 calls to inet_register_protosw(), and about 3200 us per call) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-17 04:52:48 -07:00
Herbert Xu	a0a69a0106	gro: Fix use after free in tcp_gro_receive After calling skb_gro_receive skb->len can no longer be relied on since if the skb was merged using frags, then its pages will have been removed and the length reduced. This caused tcp_gro_receive to prematurely end merging which resulted in suboptimal performance with ixgbe. The fix is to store skb->len on the stack. Reported-by: Mark Wagner <mwagner@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-17 02:34:38 -07:00
David S. Miller	134ffb4cad	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6	2009-04-16 16:32:29 -07:00
Patrick McHardy	98d500d66c	netfilter: nf_nat: add support for persistent mappings The removal of the SAME target accidentally removed one feature that is not available from the normal NAT targets so far, having multi-range mappings that use the same mapping for each connection from a single client. The current behaviour is to choose the address from the range based on source and destination IP, which breaks when communicating with sites having multiple addresses that require all connections to originate from the same IP address. Introduce a IP_NAT_RANGE_PERSISTENT option that controls whether the destination address is taken into account for selecting addresses. http://bugzilla.kernel.org/show_bug.cgi?id=12954 Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-04-16 18:33:01 +02:00
Ilpo Järvinen	86bcebafc5	tcp: fix >2 iw selection A long-standing feature in tcp_init_metrics() is such that any of its goto reset prevents call to tcp_init_cwnd(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-14 02:08:53 -07:00
Vlad Yasevich	499923c7a3	ipv6: Fix NULL pointer dereference with time-wait sockets Commit `b2f5e7cd3d` (ipv6: Fix conflict resolutions during ipv6 binding) introduced a regression where time-wait sockets were not treated correctly. This resulted in the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000062 IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70 ... Call Trace: [<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6] [<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6] [<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400 [<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6] [<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0 [<ffffffff8056ed49>] sys_bind+0x89/0x100 [<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b Tested-by: Brian Haley <brian.haley@hp.com> Tested-by: Ed Tomlinson <edt@aei.ca> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-11 01:53:06 -07:00
Linus Torvalds	ef8a97bbc9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (54 commits) glge: remove unused #include <version.h> dnet: remove unused #include <version.h> tcp: miscounts due to tcp_fragment pcount reset tcp: add helper for counter tweaking due mid-wq change hso: fix for the 'invalid frame length' messages hso: fix for crash when unplugging the device fsl_pq_mdio: Fix compile failure fsl_pq_mdio: Revive UCC MDIO support ucc_geth: Pass proper device to DMA routines, otherwise oops happens i.MX31: Fixing cs89x0 network building to i.MX31ADS tc35815: Fix build error if NAPI enabled hso: add Vendor/Product ID's for new devices ucc_geth: Remove unused header gianfar: Remove unused header kaweth: Fix locking to be SMP-safe net: allow multiple dev per napi with GRO r8169: reset IntrStatus after chip reset ixgbe: Fix potential memory leak/driver panic issue while setting up Tx & Rx ring parameters ixgbe: fix ethtool -A\|a behavior ixgbe: Patch to fix driver panic while freeing up tx & rx resources ...	2009-04-02 21:05:30 -07:00
Ilpo Järvinen	9eb9362e56	tcp: miscounts due to tcp_fragment pcount reset It seems that trivial reset of pcount to one was not sufficient in tcp_retransmit_skb. Multiple counters experience a positive miscount when skb's pcount gets lowered without the necessary adjustments (depending on skb's sacked bits which exactly), at worst a packets_out miscount can crash at RTO if the write queue is empty! Triggering this requires mss change, so bidir tcp or mtu probe or like. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by: Uwe Bugla <uwe.bugla@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-02 16:31:45 -07:00
Ilpo Järvinen	797108d134	tcp: add helper for counter tweaking due mid-wq change We need full-scale adjustment to fix a TCP miscount in the next patch, so just move it into a helper and call for that from the other places. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-02 16:31:44 -07:00
Eric Dumazet	fa9a86ddc8	netfilter: use rcu_read_bh() in ipt_do_table() Commit `784544739a` (netfilter: iptables: lock free counters) forgot to disable BH in arpt_do_table(), ipt_do_table() and ip6t_do_table() Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem. Reported-and-bisected-by: Roman Mindalev <r000n@r000n.net> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Patrick McHardy <kaber@trash.net> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-02 00:54:43 -07:00
Rami Rosen	377f0a08e4	ipv4: remove unused parameter from tcp_recv_urg(). Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-31 14:43:17 -07:00
Linus Torvalds	7541bba880	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: smack: Add a new '-CIPSO' option to the network address label configuration netlabel: Cleanup the Smack/NetLabel code to fix incoming TCP connections lsm: Remove the socket_post_accept() hook selinux: Remove the "compat_net" compatibility code netlabel: Label incoming TCP connections correctly in SELinux lsm: Relocate the IPv4 security_inet_conn_request() hooks TOMOYO: Fix a typo. smack: convert smack to standard linux lists	2009-03-28 17:30:42 -07:00
Paul Moore	389fb800ac	netlabel: Label incoming TCP connections correctly in SELinux The current NetLabel/SELinux behavior for incoming TCP connections works but only through a series of happy coincidences that rely on the limited nature of standard CIPSO (only able to convey MLS attributes) and the write equality imposed by the SELinux MLS constraints. The problem is that network sockets created as the result of an incoming TCP connection were not on-the-wire labeled based on the security attributes of the parent socket but rather based on the wire label of the remote peer. The issue had to do with how IP options were managed as part of the network stack and where the LSM hooks were in relation to the code which set the IP options on these newly created child sockets. While NetLabel/SELinux did correctly set the socket's on-the-wire label it was promptly cleared by the network stack and reset based on the IP options of the remote peer. This patch, in conjunction with a prior patch that adjusted the LSM hook locations, works to set the correct on-the-wire label format for new incoming connections through the security_inet_conn_request() hook. Besides the correct behavior there are many advantages to this change, the most significant is that all of the NetLabel socket labeling code in SELinux now lives in hooks which can return error codes to the core stack which allows us to finally get ride of the selinux_netlbl_inode_permission() logic which greatly simplfies the NetLabel/SELinux glue code. In the process of developing this patch I also ran into a small handful of AF_INET6 cleanliness issues that have been fixed which should make the code safer and easier to extend in the future. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-03-28 15:01:36 +11:00
Paul Moore	284904aa79	lsm: Relocate the IPv4 security_inet_conn_request() hooks The current placement of the security_inet_conn_request() hooks do not allow individual LSMs to override the IP options of the connection's request_sock. This is a problem as both SELinux and Smack have the ability to use labeled networking protocols which make use of IP options to carry security attributes and the inability to set the IP options at the start of the TCP handshake is problematic. This patch moves the IPv4 security_inet_conn_request() hooks past the code where the request_sock's IP options are set/reset so that the LSM can safely manipulate the IP options as needed. This patch intentionally does not change the related IPv6 hooks as IPv6 based labeling protocols which use IPv6 options are not currently implemented, once they are we will have a better idea of the correct placement for the IPv6 hooks. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: James Morris <jmorris@namei.org>	2009-03-28 15:01:36 +11:00
Ingo Molnar	82268da1b1	Merge branch 'linus' into percpu-cpumask-x86-for-linus-2 Conflicts: arch/sparc/kernel/time_64.c drivers/gpu/drm/drm_proc.c Manual merge to resolve build warning due to phys_addr_t type change on x86: drivers/gpu/drm/drm_info.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-28 04:26:01 +01:00
Ingo Molnar	6e15cf0486	Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2 Conflicts: arch/parisc/kernel/irq.c arch/x86/include/asm/fixmap_64.h arch/x86/include/asm/setup.h kernel/irq/handle.c Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-27 17:28:43 +01:00
David S. Miller	01e6de64d9	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2009-03-26 22:45:23 -07:00
David S. Miller	f0de70f8bb	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2009-03-26 01:22:01 -07:00
Holger Eitzenberger	a400c30edb	netfilter: nf_conntrack: calculate per-protocol nlattr size Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-03-25 21:53:39 +01:00
Eric Dumazet	ea781f197d	netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu() Use "hlist_nulls" infrastructure we added in 2.6.29 for RCUification of UDP & TCP. This permits an easy conversion from call_rcu() based hash lists to a SLAB_DESTROY_BY_RCU one. Avoiding call_rcu() delay at nf_conn freeing time has numerous gains. First, it doesnt fill RCU queues (up to 10000 elements per cpu). This reduces OOM possibility, if queued elements are not taken into account This reduces latency problems when RCU queue size hits hilimit and triggers emergency mode. - It allows fast reuse of just freed elements, permitting better use of CPU cache. - We delete rcu_head from "struct nf_conn", shrinking size of this structure by 8 or 16 bytes. This patch only takes care of "struct nf_conn". call_rcu() is still used for less critical conntrack parts, that may be converted later if necessary. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-03-25 21:05:46 +01:00
Patrick McHardy	1f9352ae22	netfilter: {ip,ip6,arp}_tables: fix incorrect loop detection Commit `e1b4b9f` ([NETFILTER]: {ip,ip6,arp}_tables: fix exponential worst-case search for loops) introduced a regression in the loop detection algorithm, causing sporadic incorrectly detected loops. When a chain has already been visited during the check, it is treated as having a standard target containing a RETURN verdict directly at the beginning in order to not check it again. The real target of the first rule is then incorrectly treated as STANDARD target and checked not to contain invalid verdicts. Fix by making sure the rule does actually contain a standard target. Based on patch by Francis Dupont <Francis_Dupont@isc.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-03-25 19:26:35 +01:00
Eric Dumazet	b8dfe49877	netfilter: factorize ifname_compare() We use same not trivial helper function in four places. We can factorize it. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-03-25 17:31:52 +01:00
Vlad Yasevich	b2f5e7cd3d	ipv6: Fix conflict resolutions during ipv6 binding The ipv6 version of bind_conflict code calls ipv6_rcv_saddr_equal() which at times wrongly identified intersections between addresses. It particularly broke down under a few instances and caused erroneous bind conflicts. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-24 19:49:11 -07:00
Eric Dumazet	35c7f6de73	arp_tables: ifname_compare() can assume 16bit alignment Arches without efficient unaligned access can still perform a loop assuming 16bit alignment in ifname_compare() Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-24 14:15:22 -07:00
David S. Miller	b5bb14386e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2009-03-24 13:24:36 -07:00
Vitaly Mayatskikh	30842f2989	udp: Wrong locking code in udp seq_file infrastructure Reading zero bytes from /proc/net/udp or other similar files which use the same seq_file udp infrastructure panics kernel in that way: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- read/1985 is trying to release lock (&table->hash[i].lock) at: [<ffffffff81321d83>] udp_seq_stop+0x27/0x29 but there are no more locks to release! other info that might help us debug this: 1 lock held by read/1985: #0: (&p->lock){--..}, at: [<ffffffff810eefb6>] seq_read+0x38/0x348 stack backtrace: Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9 Call Trace: [<ffffffff81321d83>] ? udp_seq_stop+0x27/0x29 [<ffffffff8106dab9>] print_unlock_inbalance_bug+0xd6/0xe1 [<ffffffff8106db62>] lock_release_non_nested+0x9e/0x1c6 [<ffffffff810ef030>] ? seq_read+0xb2/0x348 [<ffffffff8106bdba>] ? mark_held_locks+0x68/0x86 [<ffffffff81321d83>] ? udp_seq_stop+0x27/0x29 [<ffffffff8106dde7>] lock_release+0x15d/0x189 [<ffffffff8137163c>] _spin_unlock_bh+0x1e/0x34 [<ffffffff81321d83>] udp_seq_stop+0x27/0x29 [<ffffffff810ef239>] seq_read+0x2bb/0x348 [<ffffffff810eef7e>] ? seq_read+0x0/0x348 [<ffffffff8111aedd>] proc_reg_read+0x90/0xaf [<ffffffff810d878f>] vfs_read+0xa6/0x103 [<ffffffff8106bfac>] ? trace_hardirqs_on_caller+0x12f/0x153 [<ffffffff810d88a2>] sys_read+0x45/0x69 [<ffffffff8101123a>] system_call_fastpath+0x16/0x1b BUG: scheduling while atomic: read/1985/0xffffff00 INFO: lockdep is turned off. Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm ppdev snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event arc4 snd_s eq ecb thinkpad_acpi snd_seq_device iwl3945 hwmon sdhci_pci snd_pcm_oss sdhci rfkill mmc_core snd_mixer_oss i2c_i801 mac80211 yenta_socket ricoh_mmc i2c_core iTCO_wdt snd_pcm iTCO_vendor_support rs rc_nonstatic snd_timer snd lib80211 cfg80211 soundcore snd_page_alloc video parport_pc output parport e1000e [last unloaded: scsi_wait_scan] Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9 Call Trace: [<ffffffff8106b456>] ? __debug_show_held_locks+0x1b/0x24 [<ffffffff81043660>] __schedule_bug+0x7e/0x83 [<ffffffff8136ede9>] schedule+0xce/0x838 [<ffffffff810d7972>] ? fsnotify_access+0x5f/0x67 [<ffffffff810112d0>] ? sysret_careful+0xb/0x37 [<ffffffff8106be9c>] ? trace_hardirqs_on_caller+0x1f/0x153 [<ffffffff8137127b>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff810112f6>] sysret_careful+0x31/0x37 read[1985]: segfault at 7fffc479bfe8 ip 0000003e7420a180 sp 00007fffc479bfa0 error 6 Kernel panic - not syncing: Aiee, killing interrupt handler! udp_seq_stop() tries to unlock not yet locked spinlock. The lock was lost during splitting global udp_hash_lock to subsequent spinlocks. Signed-off by: Vitaly Mayatskikh <v.mayatskih@gmail.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-23 15:22:33 -07:00

1 2 3 4 5 ...

3166 commits