Commit graph

857219 commits

Author SHA1 Message Date
Linus Torvalds
94a76d9b52 This pull request contains the following fixes for UBIFS and JFFS2:
UBIFS:
 
 - Don't block too long in writeback_inodes_sb()
 - Fix for a possible overrun of the log head
 - Fix double unlock in orphan_delete()
 
 JFFS2:
 
 - Remove C++ style from UAPI header and unbreak picky toolchains
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl1ik14WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wbP2D/4xVW7YP5Yyt6YrABJuclfoib30
 2LI6eOz0+5OojQKUbOzXCN9N7Dv4TLJKrCjRc9qKYTIB1DiQXuBDqtYKg6CTBhHb
 MjiftEDiBQ6j3jVmRxkQRXZEB9I3Uu9CkA8s65+UmL8peJfgNElpH34omsU1fzup
 y0NhZhj77P5jsAG6r7yXvuaofCOTlZIZVPya9FX17J0Ra+3rMOCtVEqnaHk2E5RB
 EQPAEByqXUIx7+9mOi1Krw7B7fesB7oOVbCykE5knX1pZQCTURP64yNr35WxN+7Z
 crcpdEQtf54qWMCKf4ClIBHiPmmsDIHYJy3JXjgJKOwIYvrB3dZ5E170qPr3JixY
 nS+l8x69IYZhWUzHg8gxDizk92iFYKbO1h5vBwI7NUFHkHLzylsgonBK0KdaUnol
 OvI5oCO/rdJEMBPr5LEFpOjZJIEptPtXpDvQCpm5tWd5tuW+8edNpI38lDO9LThC
 O0diZZUQfsuzD1XrvKRORPU+4lskzGV5b1UA0DWXdGKALqM5VrQZo1XftvA74Zkv
 oZQcHNK5wdecQX81Oadfb/0a5SN7FGGtTUCKTpOyBIu0adarGIasC6TQr2aDiiNh
 7jLjBoV2XEGhXZQrK2lm8G+6rJ7Mp11B6aoTFgDELzt+SB7htp6dARR2+4aGWXh9
 iXgme0n9HXDDeuosag==
 =Bsgx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBIFS and JFFS2 fixes from Richard Weinberger:
 "UBIFS:
   - Don't block too long in writeback_inodes_sb()
   - Fix for a possible overrun of the log head
   - Fix double unlock in orphan_delete()

  JFFS2:
   - Remove C++ style from UAPI header and unbreak picky toolchains"

* tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubifs: Limit the number of pages in shrink_liability
  ubifs: Correctly initialize c->min_log_bytes
  ubifs: Fix double unlock around orphan_delete()
  jffs2: Remove C++ style comments from uapi header
2019-08-25 11:29:27 -07:00
Linus Torvalds
146c3d3220 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A few fixes for x86:

   - Fix a boot regression caused by the recent bootparam sanitizing
     change, which escaped the attention of all people who reviewed that
     code.

   - Address a boot problem on machines with broken E820 tables caused
     by an underflow which ended up placing the trampoline start at
     physical address 0.

   - Handle machines which do not advertise a legacy timer of any form,
     but need calibration of the local APIC timer gracefully by making
     the calibration routine independent from the tick interrupt. Marked
     for stable as well as there seems to be quite some new laptops
     rolled out which expose this.

   - Clear the RDRAND CPUID bit on AMD family 15h and 16h CPUs which are
     affected by broken firmware which does not initialize RDRAND
     correctly after resume. Add a command line parameter to override
     this for machine which either do not use suspend/resume or have a
     fixed BIOS. Unfortunately there is no way to detect this on boot,
     so the only safe decision is to turn it off by default.

   - Prevent RFLAGS from being clobbers in CALL_NOSPEC on 32bit which
     caused fast KVM instruction emulation to break.

   - Explain the Intel CPU model naming convention so that the repeating
     discussions come to an end"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
  x86/boot: Fix boot regression caused by bootparam sanitizing
  x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
  x86/boot/compressed/64: Fix boot on machines with broken E820 table
  x86/apic: Handle missing global clockevent gracefully
  x86/cpu: Explain Intel model naming convention
2019-08-25 10:10:15 -07:00
Linus Torvalds
5a13fc3d8b Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping fix from Thomas Gleixner:
 "A single fix for a regression caused by the generic VDSO
  implementation where a math overflow causes CLOCK_BOOTTIME to become a
  random number generator"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping/vsyscall: Prevent math overflow in BOOTTIME update
2019-08-25 10:08:01 -07:00
Linus Torvalds
8a04c2ee62 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
 "Handle the worker management in situations where a task is scheduled
  out on a PI lock contention correctly and schedule a new worker if
  possible"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Schedule new worker even if PI-blocked
2019-08-25 10:06:12 -07:00
Linus Torvalds
05bbb9360a Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Two small fixes for kprobes and perf:

   - Prevent a deadlock in kprobe_optimizer() causes by reverse lock
     ordering

   - Fix a comment typo"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes: Fix potential deadlock in kprobe_optimizer()
  perf/x86: Fix typo in comment
2019-08-25 10:03:32 -07:00
Linus Torvalds
44c471e436 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
 "A single fix for a imbalanced kobject operation in the irq decriptor
  code which was unearthed by the new warnings in the kobject code"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Properly pair kobject_del() with kobject_add()
2019-08-25 10:00:21 -07:00
Linus Torvalds
f47edb59bb Merge branch 'akpm' (patches from Andrew)
Mergr misc fixes from Andrew Morton:
 "11 fixes"

Mostly VM fixes, one psi polling fix, and one parisc build fix.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
  mm/zsmalloc.c: fix race condition in zs_destroy_pool
  mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
  mm, page_owner: handle THP splits correctly
  userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
  psi: get poll_work to run when calling poll syscall next time
  mm: memcontrol: flush percpu vmevents before releasing memcg
  mm: memcontrol: flush percpu vmstats before releasing memcg
  parisc: fix compilation errrors
  mm, page_alloc: move_freepages should not examine struct page of reserved memory
  mm/z3fold.c: fix race between migration and destruction
2019-08-25 09:56:27 -07:00
Takashi Iwai
75545304eb ALSA: seq: Fix potential concurrent access to the deleted pool
The input pool of a client might be deleted via the resize ioctl, the
the access to it should be covered by the proper locks.  Currently the
only missing place is the call in snd_seq_ioctl_get_client_pool(), and
this patch papers over it.

Reported-by: syzbot+4a75454b9ca2777f35c7@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-08-25 09:31:10 +02:00
Linus Torvalds
e67095fd2f dma-mapping fixes for 5.3-rc
Two fixes for regressions in this merge window:
 
  - select the Kconfig symbols for the noncoherent dma arch helpers
    on arm if swiotlb is selected, not just for LPAE to not break then
    Xen build, that uses swiotlb indirectly through swiotlb-xen
  - fix the page allocator fallback in dma_alloc_contiguous if the CMA
    allocation fails
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl1hvn4LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYON4w//Recfoy5T2Q4Gfjp1xVKGbr2sP7J93Vs7VCyQNZmX
 PrtzhmNKs4gxCEXVgHm+GVA+IJwQFqDtSFaPb8q3GQ+qM9NUDF4ScMFpfrLZsFr1
 dorm5kC1xcwrQtWjS1CQS/Gj0VBtWiMQOoUcAESMqgBIUo4ssj3Ny+vnh8hWgAOs
 oVDgOM4wt35bW0Pv/iY44uQzOq7xcYJUUYtPIiP9vMDrhPsxe6D1DgFQ4HZKJWix
 uS3BjZnsZDnLltXM/0CKdRV9wLF+jHYP/wJTztksRlr/A5V3FJ8lJIvgphxG1v3J
 tDfQs4BNuGWBjqdg+Qo6qOPEL9krvVYYVVql93DXwtPK/cJW1Z+0glgC2rbbHmIy
 ew35DFnYm9v0sFLZnbpuoHd6sQ9G59nTZstkqt/Z/hldBvKotwBpeuILAcMC9Nlw
 3iYW6Sz5L7cmkifC8OvopKKJWVoW5rVtMrVQw5niBiZVERtWbY825r/7ju2xYhZC
 iSAaUHT5wNtXsXQOTrFQ5LzTDBtgGyXRXgvNagEHhBf120jBQfOhvOCVT2HHOxdy
 5vx7xeeRS0M2HpxIsmd3XQjIUQEY9x1to4FKiYczGM1kcKeyWWBMFOXfLxe2Rmhg
 h14lbfsAxIEWdFkJAVFhjyjzC6IzxyVGtHCxw1iw0VgGzYATO/K6Oo8T2hG3HagR
 abQ=
 =DXk9
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
 "Two fixes for regressions in this merge window:

   - select the Kconfig symbols for the noncoherent dma arch helpers on
     arm if swiotlb is selected, not just for LPAE to not break then Xen
     build, that uses swiotlb indirectly through swiotlb-xen

   - fix the page allocator fallback in dma_alloc_contiguous if the CMA
     allocation fails"

* tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: fix zone selection after an unaddressable CMA allocation
  arm: select the dma-noncoherent symbols for all swiotlb builds
2019-08-24 20:00:11 -07:00
Andrey Ryabinin
00fb24a42a mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
The code like this:

	ptr = kmalloc(size, GFP_KERNEL);
	page = virt_to_page(ptr);
	offset = offset_in_page(ptr);
	kfree(page_address(page) + offset);

may produce false-positive invalid-free reports on the kernel with
CONFIG_KASAN_SW_TAGS=y.

In the example above we lose the original tag assigned to 'ptr', so
kfree() gets the pointer with 0xFF tag.  In kfree() we check that 0xFF
tag is different from the tag in shadow hence print false report.

Instead of just comparing tags, do the following:

1) Check that shadow doesn't contain KASAN_TAG_INVALID.  Otherwise it's
   double-free and it doesn't matter what tag the pointer have.

2) If pointer tag is different from 0xFF, make sure that tag in the
   shadow is the same as in the pointer.

Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com
Fixes: 7f94ffbc4c ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Walter Wu <walter-zh.wu@mediatek.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Henry Burns
701d678599 mm/zsmalloc.c: fix race condition in zs_destroy_pool
In zs_destroy_pool() we call flush_work(&pool->free_work).  However, we
have no guarantee that migration isn't happening in the background at
that time.

Since migration can't directly free pages, it relies on free_work being
scheduled to free the pages.  But there's nothing preventing an
in-progress migrate from queuing the work *after*
zs_unregister_migration() has called flush_work().  Which would mean
pages still pointing at the inode when we free it.

Since we know at destroy time all objects should be free, no new
migrations can come in (since zs_page_isolate() fails for fully-free
zspages).  This means it is sufficient to track a "# isolated zspages"
count by class, and have the destroy logic ensure all such pages have
drained before proceeding.  Keeping that state under the class spinlock
keeps the logic straightforward.

In this case a memory leak could lead to an eventual crash if compaction
hits the leaked page.  This crash would only occur if people are
changing their zswap backend at runtime (which eventually starts
destruction).

Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Henry Burns
1a87aa0359 mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
In zs_page_migrate() we call putback_zspage() after we have finished
migrating all pages in this zspage.  However, the return value is
ignored.  If a zs_free() races in between zs_page_isolate() and
zs_page_migrate(), freeing the last object in the zspage,
putback_zspage() will leave the page in ZS_EMPTY for potentially an
unbounded amount of time.

To fix this, we need to do the same thing as zs_page_putback() does:
schedule free_work to occur.

To avoid duplicated code, move the sequence to a new
putback_zspage_deferred() function which both zs_page_migrate() and
zs_page_putback() call.

Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com
Fixes: 48b4800a1c ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Vlastimil Babka
f7da677bc6 mm, page_owner: handle THP splits correctly
THP splitting path is missing the split_page_owner() call that
split_page() has.

As a result, split THP pages are wrongly reported in the page_owner file
as order-9 pages.  Furthermore when the former head page is freed, the
remaining former tail pages are not listed in the page_owner file at
all.  This patch fixes that by adding the split_page_owner() call into
__split_huge_page().

Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz
Fixes: a9627bc5e3 ("mm/page_owner: introduce split_page_owner and replace manual handling")
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Oleg Nesterov
46d0b24c5e userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if
mm->core_state != NULL.

Otherwise a page fault can see userfaultfd_missing() == T and use an
already freed userfaultfd_ctx.

Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
Fixes: 04f5866e41 ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Jason Xing
7b2b55da1d psi: get poll_work to run when calling poll syscall next time
Only when calling the poll syscall the first time can user receive
POLLPRI correctly.  After that, user always fails to acquire the event
signal.

Reproduce case:
 1. Get the monitor code in Documentation/accounting/psi.txt
 2. Run it, and wait for the event triggered.
 3. Kill and restart the process.

The question is why we can end up with poll_scheduled = 1 but the work
not running (which would reset it to 0).  And the answer is because the
scheduling side sees group->poll_kworker under RCU protection and then
schedules it, but here we cancel the work and destroy the worker.  The
cancel needs to pair with resetting the poll_scheduled flag.

Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Jason Xing <kerneljasonxing@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Roman Gushchin
bb65f89b7d mm: memcontrol: flush percpu vmevents before releasing memcg
Similar to vmstats, percpu caching of local vmevents leads to an
accumulation of errors on non-leaf levels.  This happens because some
leftovers may remain in percpu caches, so that they are never propagated
up by the cgroup tree and just disappear into nonexistence with on
releasing of the memory cgroup.

To fix this issue let's accumulate and propagate percpu vmevents values
before releasing the memory cgroup similar to what we're doing with
vmstats.

Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate
only over online cpus.

Link: http://lkml.kernel.org/r/20190819202338.363363-4-guro@fb.com
Fixes: 42a3003535 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Roman Gushchin
c350a99ea2 mm: memcontrol: flush percpu vmstats before releasing memcg
Percpu caching of local vmstats with the conditional propagation by the
cgroup tree leads to an accumulation of errors on non-leaf levels.

Let's imagine two nested memory cgroups A and A/B.  Say, a process
belonging to A/B allocates 100 pagecache pages on the CPU 0.  The percpu
cache will spill 3 times, so that 32*3=96 pages will be accounted to A/B
and A atomic vmstat counters, 4 pages will remain in the percpu cache.

Imagine A/B is nearby memory.max, so that every following allocation
triggers a direct reclaim on the local CPU.  Say, each such attempt will
free 16 pages on a new cpu.  That means every percpu cache will have -16
pages, except the first one, which will have 4 - 16 = -12.  A/B and A
atomic counters will not be touched at all.

Now a user removes A/B.  All percpu caches are freed and corresponding
vmstat numbers are forgotten.  A has 96 pages more than expected.

As memory cgroups are created and destroyed, errors do accumulate.  Even
1-2 pages differences can accumulate into large numbers.

To fix this issue let's accumulate and propagate percpu vmstat values
before releasing the memory cgroup.  At this point these numbers are
stable and cannot be changed.

Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate
only over online cpus.

Link: http://lkml.kernel.org/r/20190819202338.363363-2-guro@fb.com
Fixes: 42a3003535 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Qian Cai
bbcb03a97f parisc: fix compilation errrors
Commit 0cfaee2af3 ("include/asm-generic/5level-fixup.h: fix variable
'p4d' set but not used") converted a few functions from macros to static
inline, which causes parisc to complain,

  In file included from include/asm-generic/4level-fixup.h:38:0,
                   from arch/parisc/include/asm/pgtable.h:5,
                   from arch/parisc/include/asm/io.h:6,
                   from include/linux/io.h:13,
                   from sound/core/memory.c:9:
  include/asm-generic/5level-fixup.h:14:18: error: unknown type name 'pgd_t'; did you mean 'pid_t'?
   #define p4d_t    pgd_t
                    ^
  include/asm-generic/5level-fixup.h:24:28: note: in expansion of macro 'p4d_t'
   static inline int p4d_none(p4d_t p4d)
                              ^~~~~

It is because "4level-fixup.h" is included before "asm/page.h" where
"pgd_t" is defined.

Link: http://lkml.kernel.org/r/20190815205305.1382-1-cai@lca.pw
Fixes: 0cfaee2af3 ("include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used")
Signed-off-by: Qian Cai <cai@lca.pw>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
David Rientjes
cd96103838 mm, page_alloc: move_freepages should not examine struct page of reserved memory
After commit 907ec5fca3 ("mm: zero remaining unavailable struct
pages"), struct page of reserved memory is zeroed.  This causes
page->flags to be 0 and fixes issues related to reading
/proc/kpageflags, for example, of reserved memory.

The VM_BUG_ON() in move_freepages_block(), however, assumes that
page_zone() is meaningful even for reserved memory.  That assumption is
no longer true after the aforementioned commit.

There's no reason why move_freepages_block() should be testing the
legitimacy of page_zone() for reserved memory; its scope is limited only
to pages on the zone's freelist.

Note that pfn_valid() can be true for reserved memory: there is a
backing struct page.  The check for page_to_nid(page) is also buggy but
reserved memory normally only appears on node 0 so the zeroing doesn't
affect this.

Move the debug checks to after verifying PageBuddy is true.  This
isolates the scope of the checks to only be for buddy pages which are on
the zone's freelist which move_freepages_block() is operating on.  In
this case, an incorrect node or zone is a bug worthy of being warned
about (and the examination of struct page is acceptable bcause this
memory is not reserved).

Why does move_freepages_block() gets called on reserved memory? It's
simply math after finding a valid free page from the per-zone free area
to use as fallback.  We find the beginning and end of the pageblock of
the valid page and that can bring us into memory that was reserved per
the e820.  pfn_valid() is still true (it's backed by a struct page), but
since it's zero'd we shouldn't make any inferences here about comparing
its node or zone.  The current node check just happens to succeed most
of the time by luck because reserved memory typically appears on node 0.

The fix here is to validate that we actually have buddy pages before
testing if there's any type of zone or node strangeness going on.

We noticed it almost immediately after bringing 907ec5fca3 in on
CONFIG_DEBUG_VM builds.  It depends on finding specific free pages in
the per-zone free area where the math in move_freepages() will bring the
start or end pfn into reserved memory and wanting to claim that entire
pageblock as a new migratetype.  So the path will be rare, require
CONFIG_DEBUG_VM, and require fallback to a different migratetype.

Some struct pages were already zeroed from reserve pages before
907ec5fca3c so it theoretically could trigger before this commit.  I
think it's rare enough under a config option that most people don't run
that others may not have noticed.  I wouldn't argue against a stable tag
and the backport should be easy enough, but probably wouldn't single out
a commit that this is fixing.

Mel said:

: The overhead of the debugging check is higher with this patch although
: it'll only affect debug builds and the path is not particularly hot.
: If this was a concern, I think it would be reasonable to simply remove
: the debugging check as the zone boundaries are checked in
: move_freepages_block and we never expect a zone/node to be smaller than
: a pageblock and stuck in the middle of another zone.

Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1908122036560.10779@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Henry Burns
d776aaa989 mm/z3fold.c: fix race between migration and destruction
In z3fold_destroy_pool() we call destroy_workqueue(&pool->compact_wq).
However, we have no guarantee that migration isn't happening in the
background at that time.

Migration directly calls queue_work_on(pool->compact_wq), if destruction
wins that race we are using a destroyed workqueue.

Link: http://lkml.kernel.org/r/20190809213828.202833-1-henryburns@google.com
Signed-off-by: Henry Burns <henryburns@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Zhu Yanjun
e0e6d06282 net: rds: add service level support in rds-info
>From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
is used to identify different flows within an IBA subnet.
It is carried in the local route header of the packet.

Before this commit, run "rds-info -I". The outputs are as
below:
"
RDS IB Connections:
 LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
"
After this commit, the output is as below:
"
RDS IB Connections:
 LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
"

The commit fe3475af3b ("net: rds: add per rds connection cache
statistics") adds cache_allocs in struct rds_info_rdma_connection
as below:
struct rds_info_rdma_connection {
...
        __u32           rdma_mr_max;
        __u32           rdma_mr_size;
        __u8            tos;
        __u32           cache_allocs;
 };
The peer struct in rds-tools of struct rds_info_rdma_connection is as
below:
struct rds_info_rdma_connection {
...
        uint32_t        rdma_mr_max;
        uint32_t        rdma_mr_size;
        uint8_t         tos;
        uint8_t         sl;
        uint32_t        cache_allocs;
};
The difference between userspace and kernel is the member variable sl.
In the kernel struct, the member variable sl is missing. This will
introduce risks. So it is necessary to use this commit to avoid this risk.

Fixes: fe3475af3b ("net: rds: add per rds connection cache statistics")
CC: Joe Jin <joe.jin@oracle.com>
CC: JUNXIAO_BI <junxiao.bi@oracle.com>
Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 16:55:25 -07:00
John Fastabend
e93fb3e952 net: route dump netlink NLM_F_MULTI flag missing
An excerpt from netlink(7) man page,

  In multipart messages (multiple nlmsghdr headers with associated payload
  in one byte stream) the first and all following headers have the
  NLM_F_MULTI flag set, except for the last  header  which  has the type
  NLMSG_DONE.

but, after (ee28906) there is a missing NLM_F_MULTI flag in the middle of a
FIB dump. The result is user space applications following above man page
excerpt may get confused and may stop parsing msg believing something went
wrong.

In the golang netlink lib [0] the library logic stops parsing believing the
message is not a multipart message. Found this running Cilium[1] against
net-next while adding a feature to auto-detect routes. I noticed with
multiple route tables we no longer could detect the default routes on net
tree kernels because the library logic was not returning them.

Fix this by handling the fib_dump_info_fnhe() case the same way the
fib_dump_info() handles it by passing the flags argument through the
call chain and adding a flags argument to rt_fill_info().

Tested with Cilium stack and auto-detection of routes works again. Also
annotated libs to dump netlink msgs and inspected NLM_F_MULTI and
NLMSG_DONE flags look correct after this.

Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same
as is done for fib_dump_info() so this looks correct to me.

[0] https://github.com/vishvananda/netlink/
[1] https://github.com/cilium/

Fixes: ee28906fd7 ("ipv4: Dump route exceptions if requested")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 16:49:48 -07:00
Julian Wiedmann
292a50e3fc s390/qeth: reject oversized SNMP requests
Commit d4c08afafa ("s390/qeth: streamline SNMP cmd code") removed
the bounds checking for req_len, under the assumption that the check in
qeth_alloc_cmd() would suffice.

But that code path isn't sufficiently robust to handle a user-provided
data_length, which could overflow (when adding the cmd header overhead)
before being checked against QETH_BUFSIZE. We end up allocating just a
tiny iob, and the subsequent copy_from_user() writes past the end of
that iob.

Special-case this path and add a coarse bounds check, to protect against
maliciuous requests. This let's the subsequent code flow do its normal
job and precise checking, without risk of overflow.

Fixes: d4c08afafa ("s390/qeth: streamline SNMP cmd code")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 16:34:08 -07:00
zhanglin
b45ce32135 sock: fix potential memory leak in proto_register()
If protocols registered exceeded PROTO_INUSE_NR, prot will be
added to proto_list, but no available bit left for prot in
proto_inuse_idx.

Changes since v2:
* Propagate the error code properly

Signed-off-by: zhanglin <zhang.lin16@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 16:33:14 -07:00
David S. Miller
d37fb9758f mlx5-fixes-2019-08-22
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl1e/VgACgkQSD+KveBX
 +j6IgAgAt5c/f8Q06yrCBdUw1plX0sOXY+0TCT3iRCVQxkGsr4KSn8SmFK1Jn3PC
 CU1SF0dpmKHWMxjOBI+vONSErFayNEGhHfXIaiug5b/Bcs6VEi4hSzjyj6DDJSbn
 8sn/1enVR9S9ZlMprKexUu3YBb/sWXKAx9bQdZu82yfi8o/PKCr7kA3BimmFVJmH
 nXoGA++c6WJUOp/4vh8FqxI8zWjvPNt8cf4qEu/gjN2y/LtAMBx9BLyY1Zd3yqWV
 Pa0y741NA4LFtzWSVfJtLFS8rdVyvigbbDYrDO+yydQBO5VD/Qumu+ENnPOqWhe+
 CfMP9KAP/aBGXTBS8Atj9gQVoSXYeQ==
 =GQ0j
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2019-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-08-22

This series introduces some fixes to mlx5 driver.

1) Form Moshe, two fixes for firmware health reporter
2) From Eran, two ktls fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 16:27:09 -07:00
Andrew Lunn
0c69b19f92 MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT
Russell king maintains phylink, as part of the SFP module support.
However, much of the review work is about drivers swapping from phylib
to phylink. Such changes don't make changes to the phylink core, and
so the F: rules in MAINTAINERS don't match. Add a K:, keywork rule,
which hopefully get_maintainers will match against for patches to MAC
drivers swapping to phylink.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 16:14:55 -07:00
David S. Miller
9b45ff9106 Merge branch 'collect_md-mode-dev-null'
Hangbin Liu says:

====================
fix dev null pointer dereference when send packets larger than mtu in collect_md mode

When we send a packet larger than PMTU, we need to reply with
icmp_send(ICMP_FRAG_NEEDED) or icmpv6_send(ICMPV6_PKT_TOOBIG).

But with collect_md mode, kernel will crash while accessing the dst dev
as __metadata_dst_init() init dst->dev to NULL by default. Here is what
the code path looks like, for GRE:

- ip6gre_tunnel_xmit
  - ip6gre_xmit_ipv4
    - __gre6_xmit
      - ip6_tnl_xmit
        - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmp_send
      - net = dev_net(rt->dst.dev); <-- here
  - ip6gre_xmit_ipv6
    - __gre6_xmit
      - ip6_tnl_xmit
        - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmpv6_send
      ...
      - decode_session4
        - oif = skb_dst(skb)->dev->ifindex; <-- here
      - decode_session6
        - oif = skb_dst(skb)->dev->ifindex; <-- here

We could not fix it in __metadata_dst_init() as there is no dev supplied.
Look in to the __icmp_send()/decode_session{4,6} code we could find the dst
dev is actually not needed. In __icmp_send(), we could get the net by skb->dev.
For decode_session{4,6}, as it was called by xfrm_decode_session_reverse()
in this scenario, the oif is not used by
fl4->flowi4_oif = reverse ? skb->skb_iif : oif;

The reproducer is easy:

ovs-vsctl add-br br0
ip link set br0 up
ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=$dst_addr
ip link set gre0 up
ip addr add ${local_gre6}/64 dev br0
ping6 $remote_gre6 -s 1500

The kernel will crash like
[40595.821651] BUG: kernel NULL pointer dereference, address: 0000000000000108
[40595.822411] #PF: supervisor read access in kernel mode
[40595.822949] #PF: error_code(0x0000) - not-present page
[40595.823492] PGD 0 P4D 0
[40595.823767] Oops: 0000 [#1] SMP PTI
[40595.824139] CPU: 0 PID: 2831 Comm: handler12 Not tainted 5.2.0 #57
[40595.824788] Hardware name: Red Hat KVM, BIOS 1.11.1-3.module+el8.1.0+2983+b2ae9c0a 04/01/2014
[40595.825680] RIP: 0010:__xfrm_decode_session+0x6b/0x930
[40595.826219] Code: b7 c0 00 00 00 b8 06 00 00 00 66 85 d2 0f b7 ca 48 0f 45 c1 44 0f b6 2c 06 48 8b 47 58 48 83 e0 fe 0f 84 f4 04 00 00 48 8b 00 <44> 8b 80 08 01 00 00 41 f6 c4 01 4c 89 e7
ba 58 00 00 00 0f 85 47
[40595.828155] RSP: 0018:ffffc90000a73438 EFLAGS: 00010286
[40595.828705] RAX: 0000000000000000 RBX: ffff8881329d7100 RCX: 0000000000000000
[40595.829450] RDX: 0000000000000000 RSI: ffff8881339e70ce RDI: ffff8881329d7100
[40595.830191] RBP: ffffc90000a73470 R08: 0000000000000000 R09: 000000000000000a
[40595.830936] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a73490
[40595.831682] R13: 000000000000002c R14: ffff888132ff1301 R15: ffff8881329d7100
[40595.832427] FS:  00007f5bfcfd6700(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000
[40595.833266] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40595.833883] CR2: 0000000000000108 CR3: 000000013a368000 CR4: 00000000000006f0
[40595.834633] Call Trace:
[40595.835392]  ? rt6_multipath_hash+0x4c/0x390
[40595.835853]  icmpv6_route_lookup+0xcb/0x1d0
[40595.836296]  ? icmpv6_xrlim_allow+0x3e/0x140
[40595.836751]  icmp6_send+0x537/0x840
[40595.837125]  icmpv6_send+0x20/0x30
[40595.837494]  tnl_update_pmtu.isra.27+0x19d/0x2a0 [ip_tunnel]
[40595.838088]  ip_md_tunnel_xmit+0x1b6/0x510 [ip_tunnel]
[40595.838633]  gre_tap_xmit+0x10c/0x160 [ip_gre]
[40595.839103]  dev_hard_start_xmit+0x93/0x200
[40595.839551]  sch_direct_xmit+0x101/0x2d0
[40595.839967]  __dev_queue_xmit+0x69f/0x9c0
[40595.840399]  do_execute_actions+0x1717/0x1910 [openvswitch]
[40595.840987]  ? validate_set.isra.12+0x2f5/0x3d0 [openvswitch]
[40595.841596]  ? reserve_sfa_size+0x31/0x130 [openvswitch]
[40595.842154]  ? __ovs_nla_copy_actions+0x1b4/0xad0 [openvswitch]
[40595.842778]  ? __kmalloc_reserve.isra.50+0x2e/0x80
[40595.843285]  ? should_failslab+0xa/0x20
[40595.843696]  ? __kmalloc+0x188/0x220
[40595.844078]  ? __alloc_skb+0x97/0x270
[40595.844472]  ovs_execute_actions+0x47/0x120 [openvswitch]
[40595.845041]  ovs_packet_cmd_execute+0x27d/0x2b0 [openvswitch]
[40595.845648]  genl_family_rcv_msg+0x3a8/0x430
[40595.846101]  genl_rcv_msg+0x47/0x90
[40595.846476]  ? __alloc_skb+0x83/0x270
[40595.846866]  ? genl_family_rcv_msg+0x430/0x430
[40595.847335]  netlink_rcv_skb+0xcb/0x100
[40595.847777]  genl_rcv+0x24/0x40
[40595.848113]  netlink_unicast+0x17f/0x230
[40595.848535]  netlink_sendmsg+0x2ed/0x3e0
[40595.848951]  sock_sendmsg+0x4f/0x60
[40595.849323]  ___sys_sendmsg+0x2bd/0x2e0
[40595.849733]  ? sock_poll+0x6f/0xb0
[40595.850098]  ? ep_scan_ready_list.isra.14+0x20b/0x240
[40595.850634]  ? _cond_resched+0x15/0x30
[40595.851032]  ? ep_poll+0x11b/0x440
[40595.851401]  ? _copy_to_user+0x22/0x30
[40595.851799]  __sys_sendmsg+0x58/0xa0
[40595.852180]  do_syscall_64+0x5b/0x190
[40595.852574]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[40595.853105] RIP: 0033:0x7f5c00038c7d
[40595.853489] Code: c7 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 8e f7 ff ff 48 89 04 24 b8 2e 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 d7 f7 ff ff 48 89
d0 48 83 c4 08 48 3d 01
[40595.855443] RSP: 002b:00007f5bfcf73c00 EFLAGS: 00003293 ORIG_RAX: 000000000000002e
[40595.856244] RAX: ffffffffffffffda RBX: 00007f5bfcf74a60 RCX: 00007f5c00038c7d
[40595.856990] RDX: 0000000000000000 RSI: 00007f5bfcf73c60 RDI: 0000000000000015
[40595.857736] RBP: 0000000000000004 R08: 0000000000000b7c R09: 0000000000000110
[40595.858613] R10: 0001000800050004 R11: 0000000000003293 R12: 000055c2d8329da0
[40595.859401] R13: 00007f5bfcf74120 R14: 0000000000000347 R15: 00007f5bfcf73c60
[40595.860185] Modules linked in: ip_gre ip_tunnel gre openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sunrpc bochs_drm ttm drm_kms_helper drm pcspkr joydev i2c_piix4 qemu_fw_cfg xfs libcrc32c virtio_net net_failover serio_raw failover ata_generic virtio_blk pata_acpi floppy
[40595.863155] CR2: 0000000000000108
[40595.863551] ---[ end trace 22209bbcacb4addd ]---

v4: Julian Anastasov remind skb->dev also could be NULL in icmp_send. We'd
better still use dst.dev and do a check to avoid crash.

v3: only replace pkg to packets in cover letter. So I didn't update the version
info in the follow up patches.

v2: fix it in __icmp_send() and decode_session{4,6} separately instead of
updating shared dst dev in {ip_md, ip6}_tunnel_xmit.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 14:49:36 -07:00
Hangbin Liu
c3b4c3a47e xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode
In decode_session{4,6} there is a possibility that the skb dst dev is NULL,
e,g, with tunnel collect_md mode, which will cause kernel crash.
Here is what the code path looks like, for GRE:

- ip6gre_tunnel_xmit
  - ip6gre_xmit_ipv6
    - __gre6_xmit
      - ip6_tnl_xmit
        - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmpv6_send
      - icmpv6_route_lookup
        - xfrm_decode_session_reverse
          - decode_session4
            - oif = skb_dst(skb)->dev->ifindex; <-- here
          - decode_session6
            - oif = skb_dst(skb)->dev->ifindex; <-- here

The reason is __metadata_dst_init() init dst->dev to NULL by default.
We could not fix it in __metadata_dst_init() as there is no dev supplied.
On the other hand, the skb_dst(skb)->dev is actually not needed as we
called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not
used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif;

So make a dst dev check here should be clean and safe.

v4: No changes.

v3: No changes.

v2: fix the issue in decode_session{4,6} instead of updating shared dst dev
in {ip_md, ip6}_tunnel_xmit.

Fixes: 8d79266bc4 ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 14:49:35 -07:00
Hangbin Liu
e2c6939341 ipv4/icmp: fix rt dst dev null pointer dereference
In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
e,g, with tunnel collect_md mode, which will cause kernel crash.
Here is what the code path looks like, for GRE:

- ip6gre_tunnel_xmit
  - ip6gre_xmit_ipv4
    - __gre6_xmit
      - ip6_tnl_xmit
        - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmp_send
      - net = dev_net(rt->dst.dev); <-- here

The reason is __metadata_dst_init() init dst->dev to NULL by default.
We could not fix it in __metadata_dst_init() as there is no dev supplied.
On the other hand, the reason we need rt->dst.dev is to get the net.
So we can just try get it from skb->dev when rt->dst.dev is NULL.

v4: Julian Anastasov remind skb->dev also could be NULL. We'd better
still use dst.dev and do a check to avoid crash.

v3: No changes.

v2: fix the issue in __icmp_send() instead of updating shared dst dev
in {ip_md, ip6}_tunnel_xmit.

Fixes: c8b34e680a ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 14:49:35 -07:00
Linus Torvalds
083f0f2cd4 GPIO fixes for the v5.3 kernel:
- Fix not reporting open drain/source lines to userspace as "input"
 - Fix a minor build error found in randconfigs
 - Fix a chip select quirk on the Freescale SPI
 - Fix the irqchip initialization semantic order to reflect what
   it was using the old API
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAl1hoSwACgkQQRCzN7AZ
 XXP7Gg/9HeB5NKtBeMjLh0iRjKfHuIs0dFdJVxV8hYI/BTx9nJN38dW5CXkE5rBP
 0sqyO5/HqQJpArrc5KfgcXDay3rRvYgBvNwaB0ZFaOmGhqDUjRL3ZzKcP7SzKCVg
 imNUDuwJgyUsop3mAWe2MkNrc+YKr5GIDb8pyxOZDP+JDudajoUnZRfEDZYjUx2t
 X2LjfkqcUpFH5IjPLgND2A43oS4dSrgZ1XTW5R0Dp36a8ZpfJFUYtkXFGdiu9tIa
 J0Koc4npHgUFMieKedDTbFXob+y6YekXkD2uI0LMByqLwz7cZ+l+YLYZJJSTGuSM
 sr2etxthfatnmmObbmv+A9dixzcA7CgG8jycqYSyhu88QJyDqlmJBF/cDgTYQss4
 bJRVsPapLSBjsQ5+DNCMWPiYLNRvdsI7AFAmZxdpF/iOzBhc331EqgAYqU7nG+4t
 XnmvCig5SQwwiILEApHAq5lqgbj36yx900UL8OSXpLFX4SfJFFs0rSVDRGesyhvv
 WBaUtsJYDvH7mh24SdC8CX7C/V+8EEx9Dht+NT459DvhaCF7xu+xUixloSVpbb1p
 GkMGIvxV2Sy6I7TzKciAoKgWxI2sCigmqSKhN5oWK5cAWzgTbD/9hJsycKJ47sL6
 tBbI/Xc0GHC5lPfTolVfvjefFWIOblhixiEY2Qafw2HSm+3cjB0=
 =yuPv
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
 "Here is a (hopefully last) set of GPIO fixes for the v5.3 kernel
  cycle. Two are pretty core:

   - Fix not reporting open drain/source lines to userspace as "input"

   - Fix a minor build error found in randconfigs

   - Fix a chip select quirk on the Freescale SPI

   - Fix the irqchip initialization semantic order to reflect what it
     was using the old API"

* tag 'gpio-v5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: Fix irqchip initialization order
  gpio: of: fix Freescale SPI CS quirk handling
  gpio: Fix build error of function redefinition
  gpiolib: never report open-drain/source lines as 'input' to user-space
2019-08-24 14:45:33 -07:00
Yi-Hung Wei
12c6bc38f9 openvswitch: Fix log message in ovs conntrack
Fixes: 06bd2bdf19 ("openvswitch: Add timeout support to ct action")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 14:18:59 -07:00
David S. Miller
12e2e15d83 Merge branch 'ieee802154-for-davem-2019-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:

====================
pull-request: ieee802154 for net 2019-08-24

An update from ieee802154 for your *net* tree.

Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for
ieee802154 and Colin Ian King cleaned up a redundant variable assignment.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 13:46:57 -07:00
Linus Torvalds
361469211f - Fix for panics and network failures on PAE guests by Dexuan Cui.
- Fix of a memory leak (and related cleanups) in the hyper-v keyboard
 driver by Dexuan Cui.
 - Code cleanups for hyper-v clocksource driver during the merge window
 by Dexuan Cui.
 - Fix for a false positive warning in the userspace hyper-v KVP store by
 Vitaly Kuznetsov.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAl1hT7EACgkQ3qZv95d3
 LNy4PQ/+J8FDpxqBlg7gh5iv0FMM6z/knU2z6RNuwsXs5xPCF/rVvKb1MlW0V4pO
 apGWAqCZbpslQfxN8rNKgOtJQ9FRSXVKzLapfeVI9loVvkoSZCGxrLU1fS0NBEL2
 Eg1qzj9qAHMA6UyClNhQipKuK83ZBJJCPsupD2IjLzb0UcPKXgFS3FTJP7UkcREh
 jV8djoyzmwQJDFBWsbp1IY1ZAhDhnqeEYJ8GZXziU58TfoT9zS0zrlO5/Pf+0mPT
 5QNmLpK/e/l1gDra84DEY9zQUUYepHbCd54XGTMaMGPnUWXCCgDQcEAC4n1/i3ZP
 lUP5AG994F0Z5csN3DKtTnX2Mz7DSXmx9ro4jnHDWVqhHKwrFqd9Ml+ukDkICf/S
 6dgVcV9pUF1zvO235Dujo2Ht+95eHg+Dbh9aW21khzwKGUd/FyxzirpuTGQZol7N
 CjnKnWkVyz8lGjrE6UQwa8oguqyIOjVPAXQv1xI1+jrVnXlZ+EbpMwnVBy39A1SU
 BpimckYZwznmKVpmTU6L9MrIfM/cU1KTXt6mMLQfArdMef1a3O6AWsFg/mawFQPq
 7C5ixC2BwW9yA0pjO1igmNDgnOqqIjqsOxl7BtQOxgnu5QPkLIY/FVXHNkhygsU3
 CsWyo3C7QQK21MmIAFU0mFlrdMLIKuldNf/wkUVBcOTSMbX72CM=
 =GJPQ
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V fixes from Sasha Levin:

 - Fix for panics and network failures on PAE guests by Dexuan Cui.

 - Fix of a memory leak (and related cleanups) in the hyper-v keyboard
   driver by Dexuan Cui.

 - Code cleanups for hyper-v clocksource driver during the merge window
   by Dexuan Cui.

 - Fix for a false positive warning in the userspace hyper-v KVP store
   by Vitaly Kuznetsov.

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: vmbus: Fix virt_to_hvpfn() for X86_PAE
  Tools: hv: kvp: eliminate 'may be used uninitialized' warning
  Input: hyperv-keyboard: Use in-place iterator API in the channel callback
  Drivers: hv: vmbus: Remove the unused "tsc_page" from struct hv_context
2019-08-24 11:42:06 -07:00
Linus Torvalds
0a022eccf7 arm64 fixes for -rc6
- Two KVM fixes for MMIO emulation and UBSAN
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl1hJBkACgkQt6xw3ITB
 YzTmiAgAxjvbjzgSM2osucgNekCKHPR6VX+tWpm6bUFXbc5J/s5pZXwDonSZG/ys
 oEREohIu667kDb6+ryyefs+a9fCxK8LUf1oQHx0GdaPUEhVzghMgYxtJUbtL3kw3
 HKDk9e7sb0tB7mkDVPI7hWCYPD2AjrmVu9j1G/DSXi57Iqc5FuDZwp8DTht7YdJm
 rRAR7sEWNjrHFAPOIRUH9eshw1KDz8iOsLYSpir4oAFXprPG0XyEXkXbtTXgeeet
 2onPBEM5RnBOjXJGK4EVtLgRVhYxq1EnhJ/JXH7/XKdlSaZAwf5QM/8jEUcSp2Sp
 sv5gNG6gquf7TbQlDbAifHf5HB6Wsg==
 =5rNC
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "Two KVM/arm fixes for MMIO emulation and UBSAN.

  Unusually, we're routing them via the arm64 tree as per Paolo's
  request on the list:

    https://lore.kernel.org/kvm/21ae69a2-2546-29d0-bff6-2ea825e3d968@redhat.com/

  We don't actually have any other arm64 fixes pending at the moment
  (touch wood), so I've pulled from Marc, written a merge commit, tagged
  the result and run it through my build/boot/bisect scripts"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity
  KVM: arm/arm64: Only skip MMIO insn once
2019-08-24 11:35:25 -07:00
Linus Torvalds
17d0fbf47e SCSI fixes on 20190824
Four fixes, three for edge conditions which don't occur very often.
 The lpfc fix mitigates memory exhaustion for some high CPU systems.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXWEBrCYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishds+AQCyQlgV
 TzSFQ1zvbAb3SNFdNsCzzb8Aq2vJC+RojF2VFgD/cJfE2fix9E7Nk8PCGwH1sgnf
 m5Glsvv8BEmmtoikrb8=
 =Ti6g
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four fixes, three for edge conditions which don't occur very often.
  The lpfc fix mitigates memory exhaustion for some high CPU systems"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQ
  scsi: ufs: Fix NULL pointer dereference in ufshcd_config_vreg_hpm()
  scsi: target: tcmu: avoid use-after-free after command timeout
  scsi: qla2xxx: Fix gnl.l memory leak on adapter init failure
2019-08-24 11:26:51 -07:00
Linus Torvalds
8942230a7e Changes since last time:
- Fix a forgotten inode unlock when chown/chgrp fail due to quota.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl1gnj0ACgkQ+H93GTRK
 tOvAlA/8DE5Ff/itTrz7D+1JCGxZgLyD1osTn8ZFuqLn6gEOR36i/WD+7infM5Tr
 yowKvHXT3qOzAGGAyJFcjYkKx+wcYd7URR3105RFGVpd5FzW60lA/Cbzi7ecY7vL
 e2ukHeWBfOJGZsIuw/+E/sl6PeTmcq3NzHyLSHg2hYjcxTW6wxmvTbporC3Ns73L
 48AI39g1++1vz9W/T0wXNVGlDKih8gZIXtSTVqdbX3/sZ6C3dMiNqKUQTce+u/Nh
 KI6aELb8ClhWhBv8fBBlCRZ9Zl1iHKEB9Rj4vwotzK2Fm4jnYh1m0R6tuL8BK7jd
 H50qpokQ51RmtdWdicQ290S+XZi4kWpUaQiPl5f8Hf9UYj+M3Vg3zrwyx9O2xdnk
 Oj4LPG/gvkFtJM5A9hhmK2VvEUqmb04ikovdOy1cmUYJmfyX+78968uX7Fkq4kbR
 Gqk2m8zSxwbBxn8Io8jA0PsrQjrAU98rNibhHpcseSsmK2z44M6Ch+uXW8j9a4ws
 xllJ2R0wtm0o9phIaUiwhaBq8/j1m8fe+1haUSeeeByMOl3j/oHtk0T8p/zbMAvz
 EmMcF3Poe6vFeSXNZTqKuTVg9J445fKZizgouEtNmuBU/mYq9TkHjN6MaqwGDaMn
 n8zzzpgoW1YT9Yxf6u0CzBBVZgjapF9wg6Op4JuDdsl/DU//UI8=
 =gRWY
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.3-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Darrick Wong:
 "A single patch that fixes a xfs lockup problem when a chown/chgrp
  operation fails due to running out of quota. It has survived the usual
  xfstests runs and merges cleanly with this morning's master:

   - Fix a forgotten inode unlock when chown/chgrp fail due to quota"

* tag 'xfs-5.3-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT
2019-08-24 11:21:26 -07:00
Linus Torvalds
bc67b17eb9 drm fixes for 5.3-rc6 (part 2)
mediatek:
 - fix build in some cases
 
 nouveau:
 - fix hang with i2c and mst docks
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdYMgbAAoJEAx081l5xIa+GkIP/1k6WUcpg1dRKVT17KOz1dd1
 L03nK6yETBT2SrNRqZohQslkRhCY50BBXTZG93lvt2YQvnfLa6/CT2Obm6qzRaDG
 mcpkibniikt0UDPpU5QXixEndyfo22DNYc8lr80bFnzWa+AH3/VOH7wlkEiXGUr/
 hkc1FSu4LpBdHs0EQB8R5o3VJFOPDl/2ysvNitBZKXtZuAYswSja0ozxgasKSoG1
 NxTSVrV5FIlf5Fy1odoOmEpdC2KsJGEOM/hOxg2KxgIm3KqHRmjKyDFlrKCxrf2i
 TSu5ab4cgHDmyvZGb2nZsWS30eD/6Pw56RkqjGLTxT/CuuVMOVyzs6dfg+Jkpflr
 qEwzN6rT+aH8ZL4zQIBFZwwH5Z5Cq8TBPLKhh+86QS2IY9sZ7MQcc3R7IsLmbI3Q
 f1qcXeKeOZ66oLIhXttIDtlc8ZSUVy3fFAyaii0HwTGYDdbEkqgfW+rkI9tVob3m
 FyI61VVpimpwfX/F2a0gfwEUFPmCzi37CAreu+kbP+5kqJFpc8Vt/qjCLkOMC8ju
 ndXbvpw1dF4H6HQJqPT/SerluA/oWuHlhv/rSrdZhLPXxJPKTUZIGy78IMZxzIp5
 ZpnrN165oIO85l2MfiwkTPQ7YI6mIiLm2GKVvTkWuiZ9k+v6BLEEKcokmdXJ/TYB
 j4V9+SiVfMbJHarvb3CV
 =4afb
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2019-08-24' of git://anongit.freedesktop.org/drm/drm

Pull more drm fixes from Dave Airlie:
 "Although the tree built for me fine on arm here, it appears either
  header cleanups in next or some kconfig combo it breaks, so this
  contains a fix to mediatek to include dma-mapping.h explicitly.

  There was also one nouveau fix that came in late that I was going to
  leave until next week, but since I was sending this I thought it may
  as well be in here:

  mediatek:
   - fix build in some cases

  nouveau:
   - fix hang with i2c and mst docks"

* tag 'drm-fixes-2019-08-24' of git://anongit.freedesktop.org/drm/drm:
  drm/mediatek: include dma-mapping header
  drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUX
2019-08-24 11:16:04 -07:00
Will Deacon
087eeea9ad KVM/arm fixes for 5.3, take #3
- Don't overskip instructions on MMIO emulation
 - Fix UBSAN splat when initializing PPI priorities
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl1gE4sPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDvT0QALDyWgNXugpyOTbuH01zgTT5W2PxWPnLT6bl
 yCN84C2falMjLgsvBGJo+HuD8nOTwCsam+6mVnbVOmXjDpFtsp1z/unJ9Cv9T4e+
 1/TSDgp1Y1wJsdfVMqLOj2LOJterVC65e+eRp4ShEaCaGl0QsJLQIZNndoycen8K
 XcwLokABKoypctGz/1XJD9fX0GeJdgJQ2dASVuccaWxvo0lrD5qoRlZUIdWKjTmn
 OneayyIB8Dqn2Ju/bQ9bbTzg5VLfw2L9lnrVAlaFnWZETWHAtG6uCK6Zj/eDNZj8
 TFBwXtKLbdJPQO+JR7l40QjvK/qkHdVaOp4M1kB+4niYogK23WlWvh7kZ/sZbAUb
 A1PRQ39L6f0LJrJJtWeS/bJyUmBnX4PJkwZMNV4EN4fXDi2+79/DxUKXih+im/WN
 W26WMAqFwxKiMSEENLfl4ladmrgo9SUBeI8QAnEgvUChCcy9HGpKgQp/KjImM9b3
 ab87VS8BUYfyThF7PPshfBteWg3rHPQY2kjRn7B8yRhCcoWBErGtXkEdIhxvtfjk
 hUgvT8CPk4uoh4DynqRxvDR16xMPwpUTtedVoZzIkGgG6ZLHAdX0303OLaRZ/KDl
 j6vKdm8rU5I4samalckcHuoP+t2Hmmdbd0JNo+BaiorbtBXXXJQWd++85tCMniTg
 kWGoHUn5
 =FGm5
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm/fixes

Pull KVM/arm fixes from Marc Zyngier as per Paulo's request at:

  https://lkml.kernel.org/r/21ae69a2-2546-29d0-bff6-2ea825e3d968@redhat.com

  "One (hopefully last) set of fixes for KVM/arm for 5.3: an embarassing
   MMIO emulation regression, and a UBSAN splat. Oh well...

   - Don't overskip instructions on MMIO emulation

   - Fix UBSAN splat when initializing PPI priorities"

* tag 'kvmarm-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm:
  KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity
  KVM: arm/arm64: Only skip MMIO insn once
2019-08-24 12:46:30 +01:00
Dave Airlie
7837951a12 drm/mediatek: include dma-mapping header
Although it builds fine here in my arm cross compile, it seems
either via some other patches in -next or some Kconfig combination,
this fails to build for everyone.

Include linux/dma-mapping.h should fix it.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-08-24 15:09:20 +10:00
David S. Miller
211c462452 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2019-08-24

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei.

2) Fix a use-after-free in prog symbol exposure, from Daniel.

3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya.

4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan.

5) Fix a potential use-after-free on flow dissector detach, from Jakub.

6) Fix bpftool to close prog fd after showing metadata, from Quentin.

7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23 17:34:11 -07:00
Ilya Leoshkevich
2c238177bd bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
test_select_reuseport fails on s390 due to verifier rejecting
test_select_reuseport_kern.o with the following message:

	; data_check.eth_protocol = reuse_md->eth_protocol;
	18: (69) r1 = *(u16 *)(r6 +22)
	invalid bpf_context access off=22 size=2

This is because on big-endian machines casts from __u32 to __u16 are
generated by referencing the respective variable as __u16 with an offset
of 2 (as opposed to 0 on little-endian machines).

The verifier already has all the infrastructure in place to allow such
accesses, it's just that they are not explicitly enabled for
eth_protocol field. Enable them for eth_protocol field by using
bpf_ctx_range instead of offsetof.

Ditto for ip_protocol, bind_inany and len, since they already allow
narrowing, and the same problem can arise when working with them.

Fixes: 2dbb9b9e6d ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-24 01:25:41 +02:00
Daniel Borkmann
c751798aa2 bpf: fix use after free in prog symbol exposure
syzkaller managed to trigger the warning in bpf_jit_free() which checks via
bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
in kallsyms, and subsequently trips over GPF when walking kallsyms entries:

  [...]
  8021q: adding VLAN 0 to HW filter on device batadv0
  8021q: adding VLAN 0 to HW filter on device batadv0
  WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events bpf_prog_free_deferred
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x113/0x167 lib/dump_stack.c:113
   panic+0x212/0x40b kernel/panic.c:214
   __warn.cold.8+0x1b/0x38 kernel/panic.c:571
   report_bug+0x1a4/0x200 lib/bug.c:186
   fixup_bug arch/x86/kernel/traps.c:178 [inline]
   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
   do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
  RIP: 0010:bpf_jit_free+0x1e8/0x2a0
  Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff <0f> 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
  RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
  RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
  RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
  RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
  R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
  R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
  BUG: unable to handle kernel paging request at fffffbfff400d000
  #PF error: [normal kernel read fault]
  PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events bpf_prog_free_deferred
  RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
  RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
  RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
  RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
  RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
  Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
  [...]

Upon further debugging, it turns out that whenever we trigger this
issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
but yet bpf_jit_free() reported that the entry is /in use/.

Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
fd is exposed to the public, a parallel close request came in right
before we attempted to do the bpf_prog_kallsyms_add().

Given at this time the prog reference count is one, we start to rip
everything underneath us via bpf_prog_release() -> bpf_prog_put().
The memory is eventually released via deferred free, so we're seeing
that bpf_jit_free() has a kallsym entry because we added it from
bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.

Therefore, move both notifications /before/ we install the fd. The
issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
because upon bpf_prog_get_fd_by_id() we'll take another reference to
the BPF prog, so we're still holding the original reference from the
bpf_prog_load().

Fixes: 6ee52e2a3f ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
Fixes: 74451e66d5 ("bpf: make jited programs visible in traces")
Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Song Liu <songliubraving@fb.com>
2019-08-24 01:17:47 +02:00
Alexei Starovoitov
6754172c20 bpf: fix precision tracking in presence of bpf2bpf calls
While adding extra tests for precision tracking and extra infra
to adjust verifier heuristics the existing test
"calls: cross frame pruning - liveness propagation" started to fail.
The root cause is the same as described in verifer.c comment:

 * Also if parent's curframe > frame where backtracking started,
 * the verifier need to mark registers in both frames, otherwise callees
 * may incorrectly prune callers. This is similar to
 * commit 7640ead939 ("bpf: verifier: make sure callees don't prune with caller differences")
 * For now backtracking falls back into conservative marking.

Turned out though that returning -ENOTSUPP from backtrack_insn() and
doing mark_all_scalars_precise() in the current parentage chain is not enough.
Depending on how is_state_visited() heuristic is creating parentage chain
it's possible that callee will incorrectly prune caller.
Fix the issue by setting precise=true earlier and more aggressively.
Before this fix the precision tracking _within_ functions that don't do
bpf2bpf calls would still work. Whereas now precision tracking is completely
disabled when bpf2bpf calls are present anywhere in the program.

No difference in cilium tests (they don't have bpf2bpf calls).
No difference in test_progs though some of them have bpf2bpf calls,
but precision tracking wasn't effective there.

Fixes: b5dc0163d8 ("bpf: precise scalar_value tracking")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-24 01:17:12 +02:00
Jakub Sitnicki
db38de3968 flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to
free the program once a grace period has elapsed. The callback can run
together with new RCU readers that started after the last grace period.
New RCU readers can potentially see the "old" to-be-freed or already-freed
pointer to the program object before the RCU update-side NULLs it.

Reorder the operations so that the RCU update-side resets the protected
pointer before the end of the grace period after which the program will be
freed.

Fixes: d58e468b11 ("flow_dissector: implements flow dissector BPF hook")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-24 01:15:34 +02:00
Heiner Kallweit
345b93265b Revert "r8169: remove not needed call to dma_sync_single_for_device"
This reverts commit f072218cca.

As reported by Aaro this patch causes network problems on
MIPS Loongson platform. Therefore revert it.

Fixes: f072218cca ("r8169: remove not needed call to dma_sync_single_for_device")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23 15:11:26 -07:00
Linus Torvalds
9140d8bdd4 Pull request for 5.3-rc5
- Fix siw buffer mapping issue
 - Fix siw 32/64 casting issues
 - Fix a KASAN access issue in bnxt_re
 - Fix several memory leaks (hfi1, mlx4)
 - Fix a NULL deref in cma_cleanup
 - Fixes for UMR memory support in mlx5 (4 patch series)
 - Fix namespace check for restrack
 - Fixes for counter support
 - Fixes for hfi1 TID processing (5 patch series)
 - Fix potential NULL deref in siw
 - Fix memory page calculations in mlx5
 
 Signed-off-by: Doug Ledford <dledford@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEErmsb2hIrI7QmWxJ0uCajMw5XL90FAl1gLY8ACgkQuCajMw5X
 L930yxAAsPMWG9B8TYT80M/+4iA0SP2o9WqJ6VjWt5j8ArcjWKHb9aepmSxMViUq
 T+W+ZLqk9tCfqEv88Z4T4iXBhwlzqzz6Xj5goohL2L4sOViit+YoNdx9stfW3yNh
 DPOfoxwIehxMiy00OGViQ1F/nC4KeyTYtMtoPgnYeB/7Jqzc20ipkZNopi6MIufn
 xSAwwaatzvj00nB1b+DC1eu9IzLWBjvzMmhPI9GBpgYTC6if43Q6PBwp6+hdag0K
 jNMHvO2BvtjtMBiZsFtaO3wu2gKIgR5CMxVMsQ5wnhZMDE3kQI7Ilrl0za0RdwMa
 +XMjn7mzExUjeNy2kXIxl3i9oH5s/iSmA1bqmcsG1q0dRmg7eI0iaB+ZTonpzMOB
 +7oTeKnklR0Q6vQ+rKu24vEND86l4eAWcOhqFat2jzCWOYMgRsnq0NCxzvG6NduF
 MdBNDkOIN+SmC+/9tMIwuUerZSrXGm2C6x4T6+YPYgxDOf+iKFm6g6lkGDrNCLsR
 g7qDRCWxlhAOqjlnFuH2T56IEmlFPC8RDA9sICxUnjH295ucBfO7w5fGRX9eCc0N
 0WrAlKbE9/4Hu7B2RN3slwJf2WpmE3ceOdEHSvfY2lje1SVXzYAl8o1bus8tSg3G
 YfWKHkCwA+h8X81lS3KHmlJ9eJnPdUwwBmOrcPHR2aQAw7uboPw=
 =bLpd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
 "No beating around the bush: this is a monster pull request for an -rc5
  kernel. Intel hit me with a series of fixes for TID processing.
  Mellanox hit me with a series for their UMR memory support.

  And we had one fix for siw that fixes the 32bit build warnings and
  because of the number of casts that had to be changed to properly
  silence the warnings, that one patch alone is a full 40% of the LOC of
  this entire pull request. Given that this is the initial release
  kernel for siw, I'm trying to fix anything in it that we can, so that
  adds to the impetus to take fixes for it like this one.

  I had to do a rebase early in the week. Jason had thought he put a
  patch on the rc queue that he needed to be there so he could base some
  work off of it, and it had actually not been placed there. So he asked
  me (on Tuesday) to fix that up before pushing my wip branch to the
  official rc branch. I did, and that's why the early patches look like
  they were all committed at the same time on Tuesday. That bunch had
  been in my queue prior.

  The various patches all pass my test for being legitimate fixes and
  not attempts to slide new features or development into a late rc.
  Well, they were all fixes with the exception of a couple clean up
  patches people wrote for making the fixes they also wrote better (like
  a cleanup patch to move UMR checking into a function so that the
  remaining UMR fix patches can reference that function), so I left
  those in place too.

  My apologies for the LOC count and the number of patches here, it's
  just how the cards fell this cycle.

  Summary:

   - Fix siw buffer mapping issue

   - Fix siw 32/64 casting issues

   - Fix a KASAN access issue in bnxt_re

   - Fix several memory leaks (hfi1, mlx4)

   - Fix a NULL deref in cma_cleanup

   - Fixes for UMR memory support in mlx5 (4 patch series)

   - Fix namespace check for restrack

   - Fixes for counter support

   - Fixes for hfi1 TID processing (5 patch series)

   - Fix potential NULL deref in siw

   - Fix memory page calculations in mlx5"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (21 commits)
  RDMA/siw: Fix 64/32bit pointer inconsistency
  RDMA/siw: Fix SGL mapping issues
  RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_message
  infiniband: hfi1: fix memory leaks
  infiniband: hfi1: fix a memory leak bug
  IB/mlx4: Fix memory leaks
  RDMA/cma: fix null-ptr-deref Read in cma_cleanup
  IB/mlx5: Block MR WR if UMR is not possible
  IB/mlx5: Fix MR re-registration flow to use UMR properly
  IB/mlx5: Report and handle ODP support properly
  IB/mlx5: Consolidate use_umr checks into single function
  RDMA/restrack: Rewrite PID namespace check to be reliable
  RDMA/counters: Properly implement PID checks
  IB/core: Fix NULL pointer dereference when bind QP to counter
  IB/hfi1: Drop stale TID RDMA packets that cause TIDErr
  IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packet
  IB/hfi1: Add additional checks when handling TID RDMA READ RESP packet
  IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packet
  IB/hfi1: Drop stale TID RDMA packets
  RDMA/siw: Fix potential NULL de-ref
  ...
2019-08-23 14:53:09 -07:00
Sabrina Dubroca
db0b99f59a ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev
Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails,
ignoring the specific error value. This results in addrconf_add_dev
returning ENOBUFS in all cases, which is unfortunate in cases such as:

    # ip link add dummyX type dummy
    # ip link set dummyX mtu 1200 up
    # ip addr add 2000::/64 dev dummyX
    RTNETLINK answers: No buffer space available

Commit a317a2f19d ("ipv6: fail early when creating netdev named all
or default") introduced error returns in ipv6_add_dev. Before that,
that function would simply return NULL for all failures.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23 14:53:06 -07:00
Linus Torvalds
b9bd6806d0 for-linus-20190823
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl1gLIsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnNgD/9SVVtQ6DpSyPojSxVrcAfbH7n0Y+62Mfzs
 yWeCpYvmxTd2APWAVtGeBh74uH58MYqwHBp6IKF1713WwENDpv5cDXtHCNi+d3xI
 KulR9SQSC0wCIov7ak43TeKwuIUjn0cVz9VdrmaXLlp5f5nzEeNDixIlxaDXm1sf
 PGksrXxnMnxKJU00uaW3J05E7GW/6kUDYq2IuG26cIkdA6c4TCj+y8uSnn2RNIsc
 KeynzPx9UyX40weoLhb1HTi2HzZ+Cfz7t34kZZeluaJOiFkBdS5G/1sBf2MWdPwd
 ZdpKCC86SmZF87pk9B455DALj3tqrvtym3nCn2HQ8jiNsgSqmUl+qTseH5OpLLbB
 AL6OzSMh5HZ1g+hsBPgATVlb3GyJoSno3BZMAe+dTgu+wcv1sowajpm3p4rEQcbk
 p6RmdmCz8mdCGuC0wWpVtQVk7nE0EKIBDMggM2T3dvRPkSTiep2Zdjg1iu/6HNlW
 RSIWtcqo8H3CgOi7EcFjbHGLJ0kt98MUXcUHBTbwdGmRGhxbTUyKENL3FeWGiSZ/
 Ojmnv4grdBch2rI4wmyenqnL/eQ37Mzr1nW5ZkHkcf27MP/v8HEhRDwS1a+YQr1x
 acEsy7OC6nDyycsamWgSavm+x5t0zWWOjl6O92UbnZ3pvIkeoReXLbH9sjzzjj0c
 VvBO9UArSg==
 =uM7/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190823' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Here's a set of fixes that should go into this release. This contains:

   - Three minor fixes for NVMe.

   - Three minor tweaks for the io_uring polling logic.

   - Officially mark Song as the MD maintainer, after he's been filling
     that role sucessfully for the last 6 months or so"

* tag 'for-linus-20190823' of git://git.kernel.dk/linux-block:
  io_uring: add need_resched() check in inner poll loop
  md: update MAINTAINERS info
  io_uring: don't enter poll loop if we have CQEs pending
  nvme: Add quirk for LiteON CL1 devices running FW 22301111
  nvme: Fix cntlid validation when not using NVMEoF
  nvme-multipath: fix possible I/O hang when paths are updated
  io_uring: fix potential hang with polled IO
2019-08-23 14:45:45 -07:00
Linus Torvalds
dd469a4560 - Revert a DM bufio change from during the 5.3 merge window now that a
proper fix has been made to the block loopback driver.
 
 - Fix DM kcopyd to wakeup so failed subjobs get completed.
 
 - Various fixes to DM zoned target to address error handling, and other
   small tweaks (SPDX license identifiers and fix typos).
 
 - Fix DM integrity range locking race by tracking whether journal has
   changed.
 
 - Fix DM dust target to detect reads of badblocks beyond the first 512b
   sector (applicable if blocksize is larger than 512b).
 
 - Fix DM persistent-data issue in both the DM btree and DM
   space-map-metadata interfaces.
 
 - Fix out of bounds memory access with certain DM table configurations.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl1gCAITHHNuaXR6ZXJA
 cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWmKwB/kBsKiN2Vt1a4RuwUvLvEr9aijZ3HEe
 l6lwZ8rB6WRDAc4rEbteqKbCMvjg1RMZwkzL3RPrtWtjYdsdC/yJzHGETIym3Ckd
 0s1nfZgJ7jWFilwR5/RJ9bFYADjqUwAKdzc49sAT/aEPEaQywYrV7ZiD9rVZf/o5
 oQxDMps/zWbayeF2oS1tyb7m1qi8xN3yGe575vXaj+ag+10JbGiYcSObLUwyYCJu
 WqELCL3JMiaC6QkZjZWpV99V9+0yO/Px0zwuq6jRSx6VAgKGLV2CoFk0ibsRa/vI
 8IyeMwybRfSzUqMnzeh57F1H0FXrvYnD6c8obnDlGP28ZSRQQJvfm3TQ
 =R5Dn
 -----END PGP SIGNATURE-----

Merge tag 'for-5.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Revert a DM bufio change from during the 5.3 merge window now that a
   proper fix has been made to the block loopback driver.

 - Fix DM kcopyd to wakeup so failed subjobs get completed.

 - Various fixes to DM zoned target to address error handling, and other
   small tweaks (SPDX license identifiers and fix typos).

 - Fix DM integrity range locking race by tracking whether journal has
   changed.

 - Fix DM dust target to detect reads of badblocks beyond the first 512b
   sector (applicable if blocksize is larger than 512b).

 - Fix DM persistent-data issue in both the DM btree and DM
   space-map-metadata interfaces.

 - Fix out of bounds memory access with certain DM table configurations.

* tag 'for-5.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm table: fix invalid memory accesses with too high sector number
  dm space map metadata: fix missing store of apply_bops() return value
  dm btree: fix order of block initialization in btree_split_beneath
  dm raid: add missing cleanup in raid_ctr()
  dm zoned: fix potential NULL dereference in dmz_do_reclaim()
  dm dust: use dust block size for badblocklist index
  dm integrity: fix a crash due to BUG_ON in __journal_read_write()
  dm zoned: fix a few typos
  dm zoned: add SPDX license identifiers
  dm zoned: properly handle backing device failure
  dm zoned: improve error handling in i/o map code
  dm zoned: improve error handling in reclaim
  dm kcopyd: always complete failed jobs
  Revert "dm bufio: fix deadlock with loop device"
2019-08-23 10:53:34 -07:00
Linus Torvalds
f576518c9a Changes since last update:
- Fix missing compat ioctl handling for get/setlabel
 - Fix missing ioctl pointer sanitization on s390
 - Fix a page locking deadlock in the dedupe comparison code
 - Fix inadequate locking in reflink code w.r.t. concurrent directio
 - Fix broken error detection when breaking layouts
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl1cEXsACgkQ+H93GTRK
 tOsXlhAAiUowRArnwXnqR+5Z7e3nyFZOeL0DTJHVE3UpKABz/NBnevQgsy70Bqmk
 mo27ANMY8y9i7zatuCvM9UX8PXnOdaUKwoey8j5BB44iaEAkz9afeOt09PuCe141
 sNucDjq7yQWkhDNd38lujpcXMNqlVNDkDtpYGx8ArzdVaEJfudqgHFqR+lnL2LRH
 xylaJprOxcE6tCFmCVsvQmlnIbuCMWF1e7B5IA0Aoh6dLTWdD8nRNbPi9PNp3nbK
 c7UvsDcl2SrngXFbdgGCexmguKT29va8t/GkwRVPmhXgu/hslOIcZPhqIti/LG2w
 7u6CuvTa22xIA0yX9utCSq04HSKRsDKygPpYuI3U10caKmvUsvXpMFZ3goktqAgd
 8pUZpapMGORe2W+b5Wa1vi5/wv+MKMOxeeAoui38KyDJvFNOADT6hlQ//GfuJSph
 /4d7BKcZFykWEl/NI2tzaoiCzHy3ObdBTi3eloNjFE/KxVKKuBbjX/j6YisyhUpW
 i6/i4i1POp5E41tM3u17cC2DmgYiqFCzg799yrt1QBgqOCVZvGyOHR4X2B4AFWSh
 RALHKS2hBdzDIIRwLJVzA428kRMRptRviELgluJLLvx7fIrhGJ3URNzFBVty+fJi
 YG8d1WUHcxLamO3ayjydyWCgO7W8tWOP/jCOGe/2apU+hCNZFUk=
 =50ZB
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "Here are a few more bug fixes that trickled in since the last pull.
  They've survived the usual xfstests runs and merge cleanly with this
  morning's master.

  I expect there to be one more pull request tomorrow for the fix to
  that quota related inode unlock bug that we were reviewing last night,
  but it will continue to soak in the testing machine for several more
  hours.

   - Fix missing compat ioctl handling for get/setlabel

   - Fix missing ioctl pointer sanitization on s390

   - Fix a page locking deadlock in the dedupe comparison code

   - Fix inadequate locking in reflink code w.r.t. concurrent directio

   - Fix broken error detection when breaking layouts"

* tag 'xfs-5.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fs/xfs: Fix return code of xfs_break_leased_layouts()
  xfs: fix reflink source file racing with directio writes
  vfs: fix page locking deadlocks when deduping files
  xfs: compat_ioctl: use compat_ptr()
  xfs: fall back to native ioctls for unhandled compat ones
2019-08-23 10:49:44 -07:00