kernel-hacking-2024-linux-steffo

unimore/kernel-hacking-2024-linux-steffo

Author	SHA1	Message	Date
Jeff Layton	13d34ac6e5	Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread" This reverts commit `c510eff6be` ("fsnotify: destroy marks with call_srcu instead of dedicated thread"). Eryu reported that he was seeing some OOM kills kick in when running a testcase that adds and removes inotify marks on a file in a tight loop. The above commit changed the code to use call_srcu to clean up the marks. While that does (in principle) work, the srcu callback job is limited to cleaning up entries in small batches and only once per jiffy. It's easily possible to overwhelm that machinery with too many call_srcu callbacks, and Eryu's reproducer did just that. There's also another potential problem with using call_srcu here. While you can obviously sleep while holding the srcu_read_lock, the callbacks run under local_bh_disable, so you can't sleep there. It's possible when putting the last reference to the fsnotify_mark that we'll end up putting a chain of references including the fsnotify_group, uid, and associated keys. While I don't see any obvious ways that that could occur, it's probably still best to avoid using call_srcu here after all. This patch reverts the above patch. A later patch will take a different approach to eliminate the dedicated thread here. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reported-by: Eryu Guan <guaneryu@gmail.com> Tested-by: Eryu Guan <guaneryu@gmail.com> Cc: Jan Kara <jack@suse.com> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-02-18 16:23:24 -08:00
Kirill A. Shutemov	48f7df3294	mm: fix regression in remap_file_pages() emulation Grazvydas Ignotas has reported a regression in remap_file_pages() emulation. Testcase: #define _GNU_SOURCE #include <assert.h> #include <stdlib.h> #include <stdio.h> #include <sys/mman.h> #define SIZE (4096 * 3) int main(int argc, char **argv) { unsigned long *p; long i; p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); if (p == MAP_FAILED) { perror("mmap"); return -1; } for (i = 0; i < SIZE / 4096; i++) p[i * 4096 / sizeof(*p)] = i; if (remap_file_pages(p, 4096, 0, 1, 0)) { perror("remap_file_pages"); return -1; } if (remap_file_pages(p, 4096 * 2, 0, 1, 0)) { perror("remap_file_pages"); return -1; } assert(p[0] == 1); munmap(p, SIZE); return 0; } The second remap_file_pages() fails with -EINVAL. The reason is that remap_file_pages() emulation assumes that the target vma covers the whole area we want to over map. That assumption is broken by the first remap_file_pages() call: it splits the area into two vmas. The solution is to check next adjacent vmas, if they map the same file with the same flags. Fixes: `c8d78c1823` ("mm: replace remap_file_pages() syscall with emulation") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Grazvydas Ignotas <notasas@gmail.com> Tested-by: Grazvydas Ignotas <notasas@gmail.com> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-02-18 16:23:24 -08:00
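The adjacency check described in 48f7df3294 can be pictured with a small user-space sketch. This is not the kernel code: `struct range`, its fields, and `covers_whole_area()` are hypothetical names chosen for illustration, and the real VMA structures are far richer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open address range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
	int file_id;          /* assumed: identifies the backing file */
	unsigned long flags;  /* assumed: mapping flags */
	struct range *next;   /* next range in address order, or NULL */
};

/*
 * Return true if [start, end) is fully covered by `first` plus the ranges
 * that follow it, provided each one begins where the previous one ends and
 * maps the same file with the same flags.
 */
static bool covers_whole_area(const struct range *first,
			      unsigned long start, unsigned long end)
{
	const struct range *r = first;

	if (!r || start < r->start || start >= r->end)
		return false;

	while (r->end < end) {
		const struct range *n = r->next;

		if (!n || n->start != r->end ||
		    n->file_id != first->file_id || n->flags != first->flags)
			return false;
		r = n;
	}
	return true;
}
```

With two adjacent ranges that share a file and flags, a request spanning both is accepted; if the second range maps a different file, the request is rejected.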
Kirill A. Shutemov	69a8ec2d81	thp, dax: do not try to withdraw pgtable from non-anon VMA DAX doesn't deposit pgtables when it maps huge pages: nothing to withdraw. It can lead to crash. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-02-18 16:23:24 -08:00
Arnd Bergmann	f3bb23764f	USB: cdc_subset: only build when one driver is enabled This avoids a harmless randconfig warning I get when USB_NET_CDC_SUBSET is enabled, but all of the more specific drivers are not: drivers/net/usb/cdc_subset.c:241:2: #warning You need to configure some hardware for this driver The current behavior is clearly intentional, giving a warning when a user picks a configuration that won't do anything good. The only reason for even addressing this is that I'm getting close to eliminating all 'randconfig' warnings on ARM, and this came up a couple of times. My workaround is to not even build the module when none of the configurations are enabled. Alternatively we could simply remove the #warning (nothing wrong for compile-testing), turn it into a runtime warning, or change the Kconfig options into a menu to hide CONFIG_USB_NET_CDC_SUBSET. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 15:59:45 -05:00
Jiri Benc	f468a729a2	vxlan: do not use fdb in metadata mode In metadata mode, the vxlan interface is not supposed to use the fdb control plane but an external one (openvswitch or static routes). With the current code, packets may leak into the fdb handling code which usually causes them to be dropped anyway but may have strange side effects. Just drop the packets directly when in metadata mode if the destination data are not correctly provided on egress. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 15:01:14 -05:00
Anton Protopopov	e60b13e4f5	mISDN: prevent possible NULL pointer dereference A return value of the bchannel_get_rxbuf() function is compared with the positive ENOMEM value instead of the negative -ENOMEM value to detect a memory allocation problem. Thus, after a possible memory allocation failure the bc->bch.rx_skb will be NULL which will lead to a NULL pointer dereference. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 14:59:35 -05:00
Anton Protopopov	449f14f01f	net: caif: fix erroneous return value The cfrfml_receive() function might return positive value EPROTO Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 14:59:35 -05:00
Anton Protopopov	48bb230e87	appletalk: fix erroneous return value The atalk_sendmsg() function might return wrong value ENETUNREACH instead of -ENETUNREACH. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 14:59:34 -05:00
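The three fixes by Anton Protopopov above (e60b13e4f5, 449f14f01f, 48bb230e87) share one theme: kernel functions report failure as a negative errno value, so returning or comparing against the positive constant is a bug. A minimal user-space sketch of the comparison bug follows; `get_rxbuf()` and both check helpers are hypothetical and only mimic the convention.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical allocator wrapper following the kernel convention:
 * 0 on success, a negative errno value on failure. A length of 0 is
 * used here to simulate an allocation failure.
 */
static int get_rxbuf(void **buf, size_t len)
{
	*buf = len ? malloc(len) : NULL;
	return *buf ? 0 : -ENOMEM;
}

/* Buggy check: compares against positive ENOMEM, so the failure is missed. */
static int failure_seen_buggy(int ret)
{
	return ret == ENOMEM;
}

/* Correct check: compares against negative -ENOMEM. */
static int failure_seen_fixed(int ret)
{
	return ret == -ENOMEM;
}
```

With the buggy check the caller carries on with a NULL buffer, which is the dereference the mISDN commit message warns about.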
Amitoj Kaur Chawla	a09f4af177	lance: Return correct error code Failure of kzalloc should cause the enclosing function to return -ENOMEM, not -ENODEV. Additionally, removed the following checkpatch warnings: ERROR: spaces required around that '==' (ctx:VxV) ERROR: space required before the open parenthesis '(' CHECK: Comparison to NULL could be written "!lp" Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 14:58:47 -05:00
Phil Sutter	a813104d92	IFF_NO_QUEUE: Fix for drivers not calling ether_setup() My implementation around IFF_NO_QUEUE driver flag assumed that leaving tx_queue_len untouched (specifically: not setting it to zero) by drivers would make it possible to assign a regular qdisc to them without having to worry about setting tx_queue_len to a useful value. This was only partially true: I overlooked that some drivers don't call ether_setup() and therefore not initialize tx_queue_len to the default value of 1000. Consequently, removing the workarounds in place for that case in qdisc implementations which cared about it (namely, pfifo, bfifo, gred, htb, plug and sfb) leads to problems with these specific interface types and qdiscs. Luckily, there's already a sanitization point for drivers setting tx_queue_len to zero, which can be reused to assign the fallback value most qdisc implementations used, which is 1. Fixes: `348e3435cb` ("net: sched: drop all special handling of tx_queue_len == 0") Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 14:56:53 -05:00
Jiri Benc	d13b161c2c	gre: clear IFF_TX_SKB_SHARING ether_setup sets IFF_TX_SKB_SHARING but this is not supported by gre as it modifies the skb on xmit. Also, clean up whitespace in ipgre_tap_setup when we're already touching it. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 14:43:48 -05:00
Jiri Benc	fc41cdb322	geneve: clear IFF_TX_SKB_SHARING ether_setup sets IFF_TX_SKB_SHARING but this is not supported by geneve as it modifies the skb on xmit. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 14:43:47 -05:00
Jiri Benc	82a0f6b4ab	vxlan: clear IFF_TX_SKB_SHARING ether_setup sets IFF_TX_SKB_SHARING but this is not supported by vxlan as it modifies the skb on xmit. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 14:43:47 -05:00
David Wragg	aeee0e66c6	geneve: Refine MTU limit Calculate the maximum MTU taking into account the size of headers involved in GENEVE encapsulation, as for other tunnel types. Changes in v3: - Correct comment style Changes in v2: - Conform more closely to ip_tunnel_change_mtu - Exclude GENEVE options from max MTU calculation Signed-off-by: David Wragg <david@weave.works> Acked-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 13:57:15 -05:00
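The MTU limit in aeee0e66c6 amounts to subtracting the encapsulation headers from the largest possible IPv4 packet. The arithmetic can be sketched as below; the constants are assumed standard header sizes rather than values copied from the patch, and `tunnel_max_mtu()` and `tunnel_check_mtu()` are hypothetical names.

```c
/* Assumed sizes in bytes; illustrative values, not copied from the patch. */
enum {
	IP_MAX_PACKET     = 65535, /* largest IPv4 packet */
	IPV4_HDR_LEN      = 20,    /* outer IPv4 header without options */
	UDP_HDR_LEN       = 8,     /* outer UDP header */
	GENEVE_FIXED_HLEN = 8,     /* GENEVE header, options excluded */
	ETH_HDR_LEN       = 14,    /* inner Ethernet header */
	MIN_MTU           = 68     /* smallest MTU accepted in this sketch */
};

/* Largest inner MTU that still fits once all headers are added. */
static int tunnel_max_mtu(void)
{
	return IP_MAX_PACKET - IPV4_HDR_LEN - UDP_HDR_LEN -
	       GENEVE_FIXED_HLEN - ETH_HDR_LEN;
}

/* Return 0 if new_mtu is acceptable, -1 otherwise. */
static int tunnel_check_mtu(int new_mtu)
{
	if (new_mtu < MIN_MTU || new_mtu > tunnel_max_mtu())
		return -1;
	return 0;
}
```

Under these assumptions the limit works out to 65535 - 20 - 8 - 8 - 14 = 65485 bytes.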
Eric Dumazet	7716682cc5	tcp/dccp: fix another race at listener dismantle Ilya reported following lockdep splat: kernel: ========================= kernel: [ BUG: held lock freed! ] kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted kernel: ------------------------- kernel: swapper/5/0 is freeing memory ffff880035c9d200-ffff880035c9dbff, with a lock still held there! kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0 kernel: 4 locks held by swapper/5/0: kernel: #0: (rcu_read_lock){......}, at: [<ffffffff8169ef6b>] netif_receive_skb_internal+0x4b/0x1f0 kernel: #1: (rcu_read_lock){......}, at: [<ffffffff816e977f>] ip_local_deliver_finish+0x3f/0x380 kernel: #2: (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>] sk_clone_lock+0x19b/0x440 kernel: #3: (&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0 To properly fix this issue, inet_csk_reqsk_queue_add() needs to return to its callers if the child has been queued into accept queue. We also need to make sure listener is still there before calling sk->sk_data_ready(), by holding a reference on it, since the reference carried by the child can disappear as soon as the child is put on accept queue. Reported-by: Ilya Dryomov <idryomov@gmail.com> Fixes: `ebb516af60` ("tcp/dccp: fix race at listener dismantle phase") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 11:35:51 -05:00
Xin Long	deed49df73	route: check and remove route cache when we get route Since the gc of ipv4 routes was removed, a cached route has no chance of being removed; even after it has timed out it can still be used, because no code checks its expiry. Fix this issue by checking and removing the route cache entry when we get a route. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 11:31:36 -05:00
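The approach in deed49df73, checking expiry at lookup time because no garbage collector does it, is a general lazy-expiry pattern. A minimal sketch, assuming a hypothetical single-entry cache and ignoring the counter wraparound that real jiffies comparisons must handle:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry; the real route cache is more involved. */
struct cache_entry {
	bool valid;
	unsigned long expires; /* absolute time; 0 means "never expires" */
	int value;
};

/*
 * Look up the entry at time `now`. An expired entry is invalidated on the
 * spot and reported as a miss, so stale data is never handed out.
 */
static const struct cache_entry *cache_get(struct cache_entry *e,
					   unsigned long now)
{
	if (!e->valid)
		return NULL;
	if (e->expires && now >= e->expires) {
		e->valid = false; /* remove the stale entry during lookup */
		return NULL;
	}
	return e;
}
```

The trade-off of this design is that an entry nobody looks up stays in memory until something else reclaims it.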
Jamal Hadi Salim	619fe32640	net_sched fix: reclassification needs to consider ether protocol changes actions could change the etherproto in particular with ethernet tunnelled data. Typically such actions, after peeling the outer header, will ask for the packet to be reclassified. We then need to restart the classification with the new proto header. Example setup used to catch this: sudo tc qdisc add dev $ETH ingress sudo $TC filter add dev $ETH parent ffff: pref 1 protocol 802.1Q \ u32 match u32 0 0 flowid 1:1 \ action vlan pop reclassify Fixes: `3b3ae88026` ("net: sched: consolidate tc_classify{,_compat}") Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 11:14:19 -05:00
David S. Miller	39712e599b	Merge branch 'mlxsw-fixes' Jiri Pirko says: ==================== mlxsw fixes Another bulk of fixes from Ido. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 10:44:27 -05:00
Ido Schimmel	28a01d2d7d	mlxsw: spectrum: Allow for PVID deletion When PVID is toggled off on a port member in a VLAN filtering bridge or the PVID VLAN is deleted, make the port drop untagged packets. Reverse the operation when PVID is toggled back on. Set the PVID back to the default (1), when leaving the bridge so that untagged traffic will be directed to the CPU. Fixes: `56ade8fe3f` ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 10:44:26 -05:00
Ido Schimmel	148f472da5	mlxsw: reg: Add the Switch Port Acceptable Frame Types register When VLAN filtering is enabled on a bridge and PVID is deleted from a bridge port, then untagged frames are not allowed to ingress into the bridge from this port. Add the Switch Port Acceptable Frame Types (SPAFT) register, which configures the frame admittance of the port. Fixes: `56ade8fe3f` ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 10:44:26 -05:00
Alexandra Yates	1a1503c539	i2c: i801: Adding Intel Lewisburg support for iTCO Starting from Intel Sunrisepoint (Skylake PCH) the iTCO watchdog resources have been moved to reside under the i801 SMBus host controller whereas previously they were under the LPC device. This patch adds Intel lewisburg SMBus support for iTCO device. It allows to load watchdog dynamically when the hardware is present. Fixes: `cdc5a3110e` ("i2c: i801: add Intel Lewisburg device IDs") Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Alexandra Yates <alexandra.yates@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org	2016-02-18 13:18:48 +01:00
Takashi Iwai	67ec1072b0	ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream A non-atomic PCM stream may take snd_pcm_link_rwsem rw semaphore twice in the same code path, e.g. one in snd_pcm_action_nonatomic() and another in snd_pcm_stream_lock(). Usually this is OK, but when a write lock is issued between these two read locks, the problem happens: the write lock is blocked due to the first read lock, and the second read lock is also blocked by the write lock. This eventually deadlocks. The reason is the way rwsem manages waiters; it's queued like FIFO, so even if the writer itself doesn't take the lock yet, it blocks all the waiters (including reads) queued after it. As a workaround, in this patch, we replace the standard down_write() with a spinning loop. This is far from optimal, but it's good enough, as the spinning time is supposed to be relatively short for normal PCM operations, and the code paths requiring the write lock aren't called so often. Reported-by: Vinod Koul <vinod.koul@intel.com> Tested-by: Ramesh Babu <ramesh.babu@intel.com> Cc: <stable@vger.kernel.org> # v3.18+ Signed-off-by: Takashi Iwai <tiwai@suse.de>	2016-02-18 11:27:52 +01:00
Toshi Kani	f4eafd8bcd	x86/mm: Fix vmalloc_fault() to handle large pages properly A kernel page fault oops with the callstack below was observed when a read syscall was made to a pmem device after a huge amount (>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64 system: BUG: unable to handle kernel paging request at ffff880840000ff8 IP: vmalloc_fault+0x1be/0x300 PGD c7f03a067 PUD 0 Oops: 0000 [#1] SM Call Trace: __do_page_fault+0x285/0x3e0 do_page_fault+0x2f/0x80 ? put_prev_entity+0x35/0x7a0 page_fault+0x28/0x30 ? memcpy_erms+0x6/0x10 ? schedule+0x35/0x80 ? pmem_rw_bytes+0x6a/0x190 [nd_pmem] ? schedule_timeout+0x183/0x240 btt_log_read+0x63/0x140 [nd_btt] : ? __symbol_put+0x60/0x60 ? kernel_read+0x50/0x80 SyS_finit_module+0xb9/0xf0 entry_SYSCALL_64_fastpath+0x1a/0xa4 Since v4.1, ioremap() supports large page (pud/pmd) mappings in x86_64 and PAE. vmalloc_fault() however assumes that the vmalloc range is limited to pte mappings. vmalloc faults do not normally happen in ioremap'd ranges since ioremap() sets up the kernel page tables, which are shared by user processes. pgd_ctor() sets the kernel's PGD entries to user's during fork(). When allocation of the vmalloc ranges crosses a 512GB boundary, ioremap() allocates a new pud table and updates the kernel PGD entry to point it. If user process's PGD entry does not have this update yet, a read/write syscall to the range will cause a vmalloc fault, which hits the Oops above as it does not handle a large page properly. Following changes are made to vmalloc_fault(). 64-bit: - No change for the PGD sync operation as it handles large pages already. - Add pud_huge() and pmd_huge() to the validation code to handle large pages. - Change pud_page_vaddr() to pud_pfn() since an ioremap range is not directly mapped (while the if-statement still works with a bogus addr). - Change pmd_page() to pmd_pfn() since an ioremap range is not backed by struct page (while the if-statement still works with a bogus addr). 
32-bit: - No change for the sync operation since the index3 PGD entry covers the entire vmalloc range, which is always valid. (A separate change to sync PGD entry is necessary if this memory layout is changed regardless of the page size.) - Add pmd_huge() to the validation code to handle large pages. This is for completeness since vmalloc_fault() won't happen in ioremap'd ranges as its PGD entry is always valid. Reported-by: Henning Schild <henning.schild@siemens.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: <stable@vger.kernel.org> # 4.1+ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-02-18 09:02:59 +01:00
Insu Yun	562a9f91a0	et131x: check return value of dma_alloc_coherent For error handling, dma_alloc_coherent's return value needs to be checked, not argument. Signed-off-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 23:32:05 -05:00
David S. Miller	4fbe366ce3	Merge branch 'thunderx-fixes' Sunil Goutham says: ==================== net: thunderx: Miscellaneous fixes This patch series fixes a couple of issues w.r.t multiqset mode and receive packet statistics. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 22:24:57 -05:00
Sunil Goutham	ad2ecebd67	net: thunderx: Fix receive packet stats Counting rx packets for every CQE_RX in CQ irq handler is incorrect. Synchronization is missing when multiple queues are receiving packets simultaneously. Like transmit packet stats use HW stats here. Also removed unused 'cqe_type' parameter in nicvf_rcv_pkt_handler(). Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 22:24:57 -05:00
Sunil Goutham	8d210d54c5	net: thunderx: Fix for HW TSO not enabled for secondary qsets For secondary Qsets 'hw_tso' is not getting set as probe() returns much earlier. Fixed it by moving silicon revision check. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 22:24:57 -05:00
Sunil Goutham	6a9bab79bb	net: thunderx: Fix for multiqset not configured upon interface toggle When an interface is assigned more than 8 queues and the logical interface is toggled i.e down & up, additional queues or qsets are not initialized as secondary qset count is being set to zero while tearing down. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 22:24:57 -05:00
Insu Yun	1eea84b74c	tcp: correctly crypto_alloc_hash return check crypto_alloc_hash never returns NULL Signed-off-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 22:23:04 -05:00
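The point of 1eea84b74c is that an allocator which reports failure through an error-encoded pointer never returns NULL, so a NULL test cannot catch the failure. The sketch below re-creates the idea in user space; `err_ptr()`, `ptr_err()`, `is_err()` and `alloc_hash()` are hypothetical lowercase stand-ins modelled on the kernel's ERR_PTR convention, not the kernel macros themselves.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095 /* errors occupy the top 4095 pointer values */

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the range reserved for encoded errors. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that never returns NULL, only an error pointer. */
static void *alloc_hash(int fail)
{
	static int dummy;

	return fail ? err_ptr(-ENOMEM) : (void *)&dummy;
}
```

A caller that tests only `if (!h)` would treat the failed allocation as success; the failure is visible only through `is_err(h)`.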
Florian Fainelli	73dcb55653	net: dsa: Unregister slave_dev in error path With commit `0071f56e46` ("dsa: Register netdev before phy"), we are now trying to free a network device that has been previously registered, and in case of errors, this will make us hit the BUG_ON(dev->reg_state != NETREG_UNREGISTERED) condition. Fix this by adding a missing unregister_netdev() before free_netdev(). Fixes: `0071f56e46` ("dsa: Register netdev before phy") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 22:05:16 -05:00
Clemens Gruber	79be1a1c90	phy: marvell: Fix and unify reg-init behavior For the Marvell 88E1510, marvell_of_reg_init was called too late, in the config_aneg function. Since commit `113c74d83e` ("net: phy: turn carrier off on phy attach"), this led to the link not coming up at boot anymore, due to the phy state machine being stuck at waiting for interrupts (off by default on the 88E1510). For seven other Marvell PHYs, marvell_of_reg_init was not called at all. Add a generic marvell_config_init function, which in turn calls marvell_of_reg_init. PHYs, which already have a specific config_init function with a call to marvell_of_reg_init, are left untouched. The generic marvell_config_init function is called for all the others, to get consistent behavior across all Marvell PHYs. Fixes: `113c74d83e` ("net: phy: turn carrier off on phy attach") Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 16:20:40 -05:00
Jessica Yu	7dcd182bec	ftrace/module: remove ftrace module notifier Remove the ftrace module notifier in favor of directly calling ftrace_module_enable() and ftrace_release_mod() in the module loader. Hard-coding the function calls directly in the module loader removes dependence on the module notifier call chain and provides better visibility and control over what gets called when, which is important to kernel utilities such as livepatch. This fixes a notifier ordering issue in which the ftrace module notifier (and hence ftrace_module_enable()) for coming modules was being called after klp_module_notify(), which caused livepatch modules to initialize incorrectly. This patch removes dependence on the module notifier call chain in favor of hard coding the corresponding function calls in the module loader. This ensures that ftrace and livepatch code get called in the correct order on patch module load and unload. Fixes: `5156dca34a` ("ftrace: Fix the race between ftrace and insmod") Signed-off-by: Jessica Yu <jeyu@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-02-17 22:14:06 +01:00
Guillaume Nault	29e73269aa	pppoe: fix reference counting in PPPoE proxy Drop reference on the relay_po socket when __pppoe_xmit() succeeds. This is already handled correctly in the error path. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 16:02:01 -05:00
Geert Uytterhoeven	705bcdda81	ravb: Update DT binding example for final CPG/MSSR bindings The example in the DT binding documentation uses the preliminary DT bindings for the r8a7795 MSTP clocks, which never went upstream. Update the example to use the DT bindings for the upstream Clock Pulse Generator / Module Standby and Software Reset hardware block. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 15:55:50 -05:00
David S. Miller	9ffa8a1863	Merge branch 'mlxsw-fixes' Jiri Pirko says: ==================== mlxsw fixes Just a couple of fixes from Ido. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 15:52:59 -05:00
Ido Schimmel	6a9863a622	mlxsw: spectrum: Set STP state when leaving 802.1D bridge When a VLAN device leaves a bridge its STP state is set to DISABLED, which causes the hardware to discard any packets coming through the port with this VLAN. Fix that by setting STP state to FORWARDING when the device leaves its bridge and allow traffic to be directed to CPU. Fixes: `26f0e7fb15` ("mlxsw: spectrum: Add support for VLAN devices bridging") Reported-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 15:52:58 -05:00
Ido Schimmel	1e5ad30c64	mlxsw: Treat local port 64 as valid MLXSW_PORT_MAX_PORTS represents the maximum number of local ports, which is 65 for both ASICs (SwitchX-2 and Spectrum) supported by this driver. Fixes: `93c1edb27f` ("mlxsw: Introduce Mellanox switch driver core") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 15:52:58 -05:00
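The off-by-one behind 1e5ad30c64 is easy to see in a small sketch. The names below are hypothetical, and the split into 64 front-panel ports plus one extra port 0 is an assumption made for illustration; the commit message itself only states that the maximum number of local ports is 65.

```c
#include <stdbool.h>

#define MAX_PHY_PORTS 64                  /* assumed: ports 1..64 */
#define MAX_PORTS     (MAX_PHY_PORTS + 1) /* 65 local ports: 0..64 */

/* A local port number is valid if it indexes into a MAX_PORTS-sized table. */
static bool local_port_valid(unsigned int port)
{
	return port < MAX_PORTS;
}
```

Had `MAX_PORTS` been 64, the same check would have rejected local port 64.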
Mark Tomlinson	853effc55b	l2tp: Fix error creating L2TP tunnels A previous commit (`33f72e6`) added notification via netlink for tunnels when created/modified/deleted. If the notification returned an error, this error was returned from the tunnel function. If there were no listeners, the error code ESRCH was returned, even though having no listeners is not an error. Other calls to this and other similar notification functions either ignore the error code, or filter ESRCH. This patch checks for ESRCH and does not flag this as an error. Reviewed-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz> Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 15:34:47 -05:00
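The ESRCH filtering described in 853effc55b can be sketched as follows. `notify_listeners()` and `tunnel_create()` are hypothetical user-space stand-ins; only the idea that "no listeners" must not be reported as a failure comes from the commit message.

```c
#include <errno.h>

/* Hypothetical notifier: reports -ESRCH when nobody is listening. */
static int notify_listeners(int listener_count)
{
	return listener_count > 0 ? 0 : -ESRCH;
}

/*
 * Hypothetical tunnel operation: sends a notification and returns its
 * result, except that -ESRCH is filtered out because having no
 * listeners is not an error.
 */
static int tunnel_create(int listener_count)
{
	int ret = notify_listeners(listener_count);

	if (ret == -ESRCH)
		ret = 0;
	return ret;
}
```

Any other error from the notifier would still propagate to the caller unchanged.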
Linus Torvalds	2850713576	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A collection of fixes from the past few weeks that should go into 4.5. This contains: - Overflow fix for sysfs discard show function from Alan. - A stacking limit init fix for max_dev_sectors, so we don't end up artificially capping some use cases. From Keith. - Have blk-mq properly end unstarted requests on a dying queue, instead of pushing that to the driver. From Keith. - NVMe: - Update to Kconfig description for NVME_SCSI, since it was vague and having it on is important for some SUSE distros. From Christoph. - Set of fixes from Keith, around surprise removal. Also kills the no-merge flag, so it supports merging. - Set of fixes for lightnvm from Matias, Javier, and Wenwei. - Fix null_blk oops when asked for lightnvm, but not available. From Matias. - Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails if interrupted by a signal. - Two floppy fixes from Jiri, fixing signal handling and blocking open. - A use-after-free fix for O_DIRECT, from Mike Krinkin. - A block module ref count fix from Roman Pen. - An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini. - Smaller realloc fix for xen-blkfront from Bob Liu. - Removal of an unused struct member in the deadline IO scheduler, from Tahsin. - Also from Tahsin, properly initialize inode struct members associated with cgroup writeback, if enabled. 
- From Tejun, ensure that we keep the superblock pinned during cgroup writeback" * 'for-linus' of git://git.kernel.dk/linux-block: (25 commits) blk: fix overflow in queue_discard_max_hw_show writeback: initialize inode members that track writeback history writeback: keep superblock pinned during cgroup writeback association switches bio: return EINTR if copying to user space got interrupted NVMe: Rate limit nvme IO warnings NVMe: Poll device while still active during remove NVMe: Requeue requests on suspended queues NVMe: Allow request merges NVMe: Fix io incapable return values blk-mq: End unstarted requests on dying queue block: Initialize max_dev_sectors to 0 null_blk: oops when initializing without lightnvm block: fix module reference leak on put_disk() call for cgroups throttle nvme: fix Kconfig description for BLK_DEV_NVME_SCSI kernel/fs: fix I/O wait not accounted for RW O_DSYNC floppy: refactor open() flags handling lightnvm: allow to force mm initialization lightnvm: check overflow and correct mlc pairs lightnvm: fix request intersection locking in rrpc lightnvm: warn if irqs are disabled in lock laddr ...	2016-02-17 11:59:23 -08:00
Linus Torvalds	c28b947d04	DeviceTree fixes for 4.5-rc5: - Fix irq msi-map calculation for nonzero rid-base. - Binding doc updates for GICv3, fsl-imx-uart, and S3C RTC. Merge tag 'devicetree-fixes-for-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree fixes from Rob Herring. * tag 'devicetree-fixes-for-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: rtc: s3c: Document required clocks in the DT binding serial: fsl-imx-uart: Fix typo in fsl,dte-mode description dt-bindings: arm, gic-v3: require that reserved cells are always 0 of/irq: Fix msi-map calculation for nonzero rid-base	2016-02-17 11:50:53 -08:00
Linus Torvalds	35683dd326	Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "This has two main sets of fixes: - A bunch of Exynos fixes, mainly for their MIC component. - vblank regression fixes from Mario, apparently some changes in 4.4 caused some vblank breakage on radeon/nouveau, this set fixes all the issues seen. There is also a revert of one of the MST changes, that I was overzealous in including, that broke 30" MST monitors, and two qxl fixes" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/qxl: fix erroneous return value drm/nouveau/display: Enable vblank irqs after display engine is on again. drm/radeon/pm: Handle failure of drm_vblank_get. drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() (v2) drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 drm: Prevent vblank counter bumps > 1 with active vblank clients. (v2) drm: No-Op redundant calls to drm_vblank_off() (v2) drm/qxl: use kmalloc_array to alloc reloc_info in qxl_process_single_command Revert "drm/dp/mst: change MST detection scheme" drm/exynos/decon: fix disable clocks order drm/exynos: fix incorrect cpu address for dma_mmap_attrs() drm/exynos: exynos5433_decon: fix wrong state in decon_vblank_enable drm/exynos: exynos5433_decon: fix wrong state assignment in decon_enable drm/exynos: dsi: restore support for drm bridge drm/exynos: mic: make all functions static drm/exynos: mic: convert to component framework drm/exynos: mic: use devm_clk interface drm/exynos: fix types for compilation on 64bit architectures drm/exynos: ipp: fix incorrect format specifiers in debug messages drm/exynos: depend on ARCH_EXYNOS for DRM_EXYNOS	2016-02-17 11:45:10 -08:00
Linus Torvalds	a9f70bd4e7	Merge tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This includes two fixes. The first is something that has come up a few times and has been worked out individually, but it's come up now enough that the problem should be generic. Tracepoints are protected by RCU sched. There are several tracepoints within core infrastructure like kfree(). If a tracepoint is called when the CPU is going down, or when it's coming up but has yet to be recognized by RCU, an RCU warning is triggered. This is a true bug as that tracepoint is not protected by RCU. Usually, this is taken care of by testing for cpu online as a tracepoint condition. But as this is happening more often, moving it from an individual tracepoint to a check in the tracepoint infrastructure is more robust. Note, there is now a duplicate of a cpu online test, because this update does not remove the individual checks. But the overhead is small enough that the removal can be done in another release. The second change is strange linker breakage due to the branch tracer's builtin_constant_p() check failing, and treating the condition as a variable instead of a constant. Arnd Bergmann found that this can be fixed by testing !!(cond) instead of just (cond)" * tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix freak link error caused by branch tracer tracepoints: Do not trace when cpu is offline	2016-02-17 11:35:41 -08:00
Alan	18f922d037	blk: fix overflow in queue_discard_max_hw_show We get this right for queue_discard_max_show but not max_hw_show. Follow the same pattern as queue_discard_max_show instead so that we don't truncate. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-17 10:20:42 -07:00
David Rivshin	d148bbd37a	drivers: net: cpsw-phy-sel: add dev_warn() for unsupported PHY mode The cpsw-phy-sel driver supports only MII, RMII, and RGMII PHY modes, and silently handled any other values as if MII was specified. In a case where the PHY mode was incorrectly specified, or a bug elsewhere, there would be no indication of a problem. If MII was the correct mode, then this will go unnoticed, otherwise the symptom will be a failure to transmit/receive data over the RMII/RGMII link. Add a dev_warn() to make this condition obvious and provide a breadcrumb to follow. Cc: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David Rivshin <drivshin@allworx.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:50:15 -05:00
Woojung.Huh@microchip.com	cd772de358	phy: keep pause flags in phy driver features genphy_config_init() masked out pause flags set in phy driver structure. Pause flags need to be preserved in phydev->supported & phydev->advertising. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:48:07 -05:00
David S. Miller	1543b765d2	Merge branch 'mlx4-fixes' Or Gerlitz says: ==================== Mellanox 10/40G mlx4 driver fixes for 4.5-rc Bunch of fixes from the team to the mlx4 Eth and core drivers. Series generated against net commit `aac8d3c` "qmi_wwan: add "4G LTE usb-modem U901"" Please push patches 1,2 and 6 to -stable as well changes from v0: - handled another wrongly accounted HW counter in patch #1 (Rick) - fixed coding style issues in patch #4 (Sergei) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:27 -05:00
Eugenia Emantayev	925ab1aa93	net/mlx4_en: Avoid changing dev->features directly in run-time It's forbidden to manually change dev->features in run-time. Currently, this is done in the driver to make sure that GSO_UDP_TUNNEL is advertized only when VXLAN tunnel is set. However, since the stack actually does features intersection with hw_enc_features, we can safely revert to advertizing features early when registering the netdevice. Fixes: `f4a1edd561` ('net/mlx4_en: Advertize encapsulation offloads [...]') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:27 -05:00
Huy Nguyen	85743f1eb3	net/mlx4_core: Set UAR page size to 4KB regardless of system page size problem description: The current code sets UAR page size equal to system page size. The ConnectX-3 and ConnectX-3 Pro HWs require a minimum of 128 UAR pages. The mlx4 kernel drivers are not loaded if there are fewer than 128 UAR pages. solution: Always set UAR page size to 4KB. This allows more UAR pages if the OS has PAGE_SIZE larger than 4KB. For example, the PowerPC kernel uses a 64KB system page size; with a 4MB uar region, there are 4MB/2/64KB = 32 uars (half for uar, half for blueflame). This does not meet the minimum 128 UAR pages requirement. With a 4KB UAR page, there are 4MB/2/4KB = 512 uars, which meets the minimum requirement. Note that only the code in mlx4_core that deals with firmware knows that the uar page size is 4KB. Code that deals with usr page in cq and qp context (mlx4_ib, mlx4_en and part of mlx4_core) still assumes that the uar page size equals the system page size. Note that with this implementation, on a kernel with 64KB system page size, there are 16 uars per system page but only one uar is used. The other 15 uars are ignored because of the above assumption. Regarding SR-IOV, mlx4_core in the hypervisor will set the uar page size to 4KB and mlx4_core code in the virtual OS will obtain the uar page size from firmware. Regarding backward compatibility in SR-IOV, if the hypervisor has this new code, the virtual OS must be updated. If the hypervisor has old code and the virtual OS has this new code, the new code will be backward compatible with the old code. If the uar size is big enough, this new code in the VF continues to work with 64KB uar page size (on a PowerPC kernel). If the uar size does not meet the 128 uars requirement, this new code is not loaded in the VF and prints the same error message as the old code in the hypervisor. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:27 -05:00
Daniel Jurgens	22e3817e6c	net/mlx4_core: Do not BUG_ON during reset when PCI is offline The PCI channel could go offline during reset due to EEH. Don't bug on in this case, the error is recoverable. Fixes: `f6bc11e426` ('net/mlx4_core: Enhance the catas flow to support device reset') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:26 -05:00
Eran Ben Elisha	6b94bab0ee	net/mlx4_core: Fix potential corruption in counters database The error flow in procedure handle_existing_counter() is wrong. The procedure should exit after encountering the error, not continue as if everything is OK. Fixes: `68230242cd` ('net/mlx4_core: Add port attribute when tracking counters') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:26 -05:00


574963 commits