Commit graph

181036 commits

Author SHA1 Message Date
Adam Litke
1f34c71afe virtio: Fix scheduling while atomic in virtio_balloon stats
This is a fix for my earlier patch: "virtio: Add memory statistics reporting to
the balloon driver (V4)".

I discovered that all_vm_events() can sleep and therefore stats collection
cannot be done in interrupt context.  One solution is to handle the interrupt
by noting that stats need to be collected and waking the existing vballoon
kthread which will complete the work via stats_handle_request().  Rusty, is
this a saner way of doing business?

There is one issue that I would like a broader opinion on.  In stats_request, I
update vb->need_stats_update and then wake up the kthread.  The kthread uses
vb->need_stats_update as a condition variable.  Do I need a memory barrier
between the update and wake_up to ensure that my kthread sees the correct
value?  My testing suggests that it is not needed but I would like some
confirmation from the experts.

Signed-off-by: Adam Litke <agl@us.ibm.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 14:22:14 +10:30
Adam Litke
9564e138b1 virtio: Add memory statistics reporting to the balloon driver (V4)
Changes since V3:
 - Do not do endian conversions as they will be done in the host
 - Report stats that reference a quantity of memory in bytes
 - Minor coding style updates

Changes since V2:
 - Increase stat field size to 64 bits
 - Report all sizes in kb (not pages)
 - Drop anon_pages stat and fix endianness conversion

Changes since V1:
 - Use a virtqueue instead of the device config space

When using ballooning to manage overcommitted memory on a host, a system for
guests to communicate their memory usage to the host can provide information
that will minimize the impact of ballooning on the guests.  The current method
employs a daemon running in each guest that communicates memory statistics to a
host daemon at a specified time interval.  The host daemon aggregates this
information and inflates and/or deflates balloons according to the level of
host memory pressure.  This approach is effective but overly complex since a
daemon must be installed inside each guest and coordinated to communicate with
the host.  A simpler approach is to collect memory statistics in the virtio
balloon driver and communicate them directly to the hypervisor.

This patch enables the guest-side support by adding stats collection and
reporting to the virtio balloon driver.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor fixes)
2010-02-24 14:22:08 +10:30
Jamie Lokier
1f08b833dd Add __devexit_p around reference to virtio_pci_remove
This is needed to compile with CONFIG_VIRTIO_PCI=y,
because virtio_pci_remove is marked __devexit.

Signed-off-by: Jamie Lokier <jamie@shareable.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 14:22:04 +10:30
Linus Torvalds
75ef7cdda2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  net: bug fix for vlan + gro issue
  tc35815: Remove a wrong netif_wake_queue() call which triggers BUG_ON
  cdc_ether: new PID for Ericsson C3607w to the whitelist (resubmit)
  IPv6: better document max_addresses parameter
  MAINTAINERS: update mv643xx_eth maintenance status
  e1000: Fix DMA mapping error handling on RX
  iwlwifi: sanity check before counting number of tfds can be free
  iwlwifi: error checking for number of tfds in queue
  iwlwifi: set HT flags after channel in rxon
2010-02-23 19:44:07 -08:00
Ajit Khaparde
c4d49794ff net: bug fix for vlan + gro issue
Traffic (tcp) doesnot start on a vlan interface when gro is enabled.
Even the tcp handshake was not taking place.
This is because, the eth_type_trans call before the netif_receive_skb
in napi_gro_finish() resets the skb->dev to napi->dev from the previously
set vlan netdev interface. This causes the ip_route_input to drop the
incoming packet considering it as a packet coming from a martian source.

I could repro this on 2.6.32.7 (stable) and 2.6.33-rc7.
With this fix, the traffic starts and the test runs fine on both vlan
and non-vlan interfaces.

CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-23 19:09:31 -08:00
Linus Torvalds
be64c970f6 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPI: Be in TS_POLLING state during mwait based C-state entry
  ACPI: Fix regression where _PPC is not read at boot even when ignore_ppc=0
  acer-wmi: Respect current backlight level when loading
2010-02-23 18:15:05 -08:00
Linus Torvalds
34e3f91b4e Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/vmwgfx: Fix queries if no dma buffer thrashing is occuring.
  drm/nv50: fix vram ptes on IGPs to point at stolen system memory
  drm/nv50: fix instmem binding on IGPs to point at stolen system memory
  drm/nv50: improve vram page table construction
  drm/nv50: more efficient clearing of gpu page table entries
  drm/nv50: make nv50_mem_vm_{bind,unbind} operate only on vram
  drm/nouveau: Fix up pre-nv17 analog load detection.
2010-02-23 18:13:34 -08:00
Hedi Berriche
f7624c97b8 [IA64] Fix broken sn2 build
Revert the change made to arch/ia64/sn/kernel/setup.c by commit
204fba4aa3 as it breaks the build.

Fixing the build the b94b08081f way
breaks xpc because genksyms then fails to generate an CRC for
per_cpu____sn_cnodeid_to_nasid because of limitations in the
generic genksyms code.

Signed-off-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2010-02-23 16:07:25 -08:00
Bjorn Helgaas
7bc5e3f2be x86/PCI: use host bridge _CRS info by default on 2008 and newer machines
The main benefit of using ACPI host bridge window information is that
we can do better resource allocation in systems with multiple host bridges,
e.g., http://bugzilla.kernel.org/show_bug.cgi?id=14183

Sometimes we need _CRS information even if we only have one host bridge,
e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/341681

Most of these systems are relatively new, so this patch turns on
"pci=use_crs" only on machines with a BIOS date of 2008 or newer.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23 09:43:42 -08:00
Bjorn Helgaas
2fe2abf896 PCI: augment bus resource table with a list
Previously we used a table of size PCI_BUS_NUM_RESOURCES (16) for resources
forwarded to a bus by its upstream bridge.  We've increased this size
several times when the table overflowed.

But there's no good limit on the number of resources because host bridges
and subtractive decode bridges can forward any number of ranges to their
secondary buses.

This patch reduces the table to only PCI_BRIDGE_RESOURCE_NUM (4) entries,
which corresponds to the number of windows a PCI-to-PCI (3) or CardBus (4)
bridge can positively decode.  Any additional resources, e.g., PCI host
bridge windows or subtractively-decoded regions, are kept in a list.

I'd prefer a single list rather than this split table/list approach, but
that requires simultaneous changes to every architecture.  This approach
only requires immediate changes where we set up (a) host bridges with more
than four windows and (b) subtractive-decode P2P bridges, and we can
incrementally change other architectures to use the list.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23 09:43:37 -08:00
Bjorn Helgaas
89a74ecccd PCI: add pci_bus_for_each_resource(), remove direct bus->resource[] refs
No functional change; this converts loops that iterate from 0 to
PCI_BUS_NUM_RESOURCES through pci_bus resource[] table to use the
pci_bus_for_each_resource() iterator instead.

This doesn't change the way resources are stored; it merely removes
dependencies on the fact that they're in a table.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23 09:43:31 -08:00
Bjorn Helgaas
2adf75160b PCI: read bridge windows before filling in subtractive decode resources
No functional change; this fills in the bus subtractive decode resources
after reading the bridge window information rather than before.  Also,
print out the subtractive decode resources as we already do for the
positive decode windows.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23 09:43:25 -08:00
Bjorn Helgaas
fa27b2d108 PCI: split up pci_read_bridge_bases()
No functional change; this breaks up pci_read_bridge_bases() into separate
pieces for the I/O, memory, and prefetchable memory windows, similar to how
Yinghai recently split up pci_setup_bridge() in 68e84ff3bdc.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23 09:43:17 -08:00
David S. Miller
675c60706c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-02-23 01:27:05 -08:00
Atsushi Nemoto
662a96bd6f tc35815: Remove a wrong netif_wake_queue() call which triggers BUG_ON
The netif_wake_queue() is called correctly (i.e. only on !txfull
condition) from txdone routine.  So Unconditional call to the
netif_wake_queue() here is wrong.  This might cause calling of
start_xmit routine on txfull state and trigger BUG_ON.

This bug does not happen when NAPI disabled.  After txdone there
must be at least one free tx slot.  But with NAPI, this is not
true anymore and the BUG_ON can hits on heavy load.

In this driver NAPI was enabled on 2.6.33-rc1 so this is
regression from 2.6.32 kernel.

Reported-by: Ralf Roesch <ralf.roesch@rw-gmbh.de>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-23 01:25:01 -08:00
Torgny Johansson
cac43a1b7b cdc_ether: new PID for Ericsson C3607w to the whitelist (resubmit)
This patch adds a new vid/pid to the cdc_ether whitelist.

Device added:
- Ericsson Mobile Broadband variant C3607w

Signed-off-by: Torgny Johansson <torgny.johansson@gmail.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-23 01:25:01 -08:00
Brian Haley
e79dc48431 IPv6: better document max_addresses parameter
Andrew Morton wrote:
>> >From ip-sysctl.txt file in kernel documentation I can see following description
>> for max_addresses:
>> max_addresses - INTEGER
>>         Number of maximum addresses per interface.  0 disables limitation.
>>         It is recommended not set too large value (or 0) because it would
>>         be too easy way to crash kernel to allow to create too much of
>>         autoconfigured addresses.
           ^^^^^^^^^^^^^^

>> If this parameter applies only for auto-configured IP addressed, please state
>> it more clearly in docs or rename the parameter to show that it refers to
>> auto-configuration.

It did mention autoconfigured in the text, but the below makes it more obvious.

More clearly document IPv6 max_addresses parameter.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-23 01:25:00 -08:00
Lennert Buytenhek
f5ca8502f7 MAINTAINERS: update mv643xx_eth maintenance status
I am no longer with Marvell.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-23 01:25:00 -08:00
Anton Blanchard
b5abb028e2 e1000: Fix DMA mapping error handling on RX
Check for error return from pci_map_single/pci_map_page and clean up.

With this and the previous patch the driver was able to handle a significant
percentage of errors (I set the fault injection rate to 10% and could still
download large files at a reasonable speed).

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-23 01:24:59 -08:00
Jens Axboe
79da0644a8 Revert "block: improve queue_should_plug() by looking at IO depths"
This reverts commit fb1e75389b.

"Benjamin S." <sbenni@gmx.de> reports that the patch in question
causes a big drop in sequential throughput for him, dropping from
200MB/sec down to only 70MB/sec.

Needs to be investigated more fully, for now lets just revert the
offending commit.

Conflicts:

	include/linux/blkdev.h

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-02-23 08:40:43 +01:00
Thomas Hellstrom
4e4ddd4777 drm/vmwgfx: Fix queries if no dma buffer thrashing is occuring.
Intercept query commands and apply relocations to their guest pointers.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-02-23 15:42:36 +10:00
Dave Airlie
f7072e00f0 Merge remote branch 'nouveau/for-airlied' of ../drm-nouveau-next into drm-linus
* 'nouveau/for-airlied' of ../drm-nouveau-next:
  drm/nv50: fix vram ptes on IGPs to point at stolen system memory
  drm/nv50: fix instmem binding on IGPs to point at stolen system memory
  drm/nv50: improve vram page table construction
  drm/nv50: more efficient clearing of gpu page table entries
  drm/nv50: make nv50_mem_vm_{bind,unbind} operate only on vram
  drm/nouveau: Fix up pre-nv17 analog load detection.
2010-02-23 15:42:18 +10:00
Len Brown
b2cb9dcb98 Merge branch 'pcc' into release 2010-02-23 00:39:00 -05:00
Len Brown
e4f23f66ed Merge branches 'bugzilla-14207' and 'idle' into release 2010-02-23 00:19:48 -05:00
Linus Torvalds
9f3a628488 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  MIPS: BCM47xx: Fix 128MB RAM support
  MIPS: Highmem: Fix build error
2010-02-22 19:51:39 -08:00
Linus Torvalds
26b0833366 Merge branch 'parisc/tracehook' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland
* 'parisc/tracehook' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  Revert "parisc: HAVE_ARCH_TRACEHOOK"
2010-02-22 19:51:13 -08:00
Michael Neuling
a17e18790a fs/exec.c: fix initial stack reservation
803bf5ec25 ("fs/exec.c: restrict initial
stack space expansion to rlimit") attempts to limit the initial stack to
20*PAGE_SIZE.  Unfortunately, in attempting ensure the stack is not
reduced in size, we ended up not changing the stack at all.

This size reduction check is not necessary as the expand_stack call does
this already.

This caused a regression in UML resulting in most guest processes being
killed.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jouni Malinen <j@w1.fi>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-22 19:50:34 -08:00
Marcin Slusarz
89f3f21990 efifb: fix framebuffer handoff
Commit 4410f39109 ("fbdev: add support for
handoff from firmware to hw framebuffers") didn't add fb_destroy
operation to efifb.  Fix it and change aperture_size to match size
passed to request_mem_region.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=15151

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Reported-by: Alex Zhavnerchik <alex.vizor@gmail.com>
Tested-by: Alex Zhavnerchik <alex.vizor@gmail.com>
Acked-by: Peter Jones <pjones@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-22 19:50:34 -08:00
Jens Rottmann
115079aad9 geode-mfgpt: restore previous behavior for selecting IRQ
geode-mfgpt: restore previous behavior for selecting IRQ

The MFGPT IRQ used to be, in order of decreasing priority,
 * IRQ supplied by the user as a boot-time parameter,
 * IRQ previously set by the BIOS or another driver,
 * default IRQ given at compile time.

Return to this behavior, which got broken when splitting the
MFGPT/clocksource driver for 2.6.33-rc1.

Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de>
Acked-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-22 19:50:34 -08:00
Tejun Heo
d2e7276b6b idr: fix a critical misallocation bug, take#2
This is retry of reverted 859ddf0974
("idr: fix a critical misallocation bug") which contained two bugs.

* pa[idp->layers] should be cleared even if it's not used by
  sub_alloc() because it's used by mark idr_mark_full().

* The original condition check also assigned pa[l] to p which the new
  code didn't do thus leaving p pointing at the wrong layer.

Both problems have been fixed and the idr code has received good amount
testing using userland testing setup where simple bitmap allocator is
run parallel to verify the result of idr allocation.

The bug this patch fixes is caused by sub_alloc() optimization path
bypassing out-of-room condition check and restarting allocation loop
with starting value higher than maximum allowed value.  For detailed
description, please read commit message of 859ddf09.

Signed-off-by: Tejun Heo <tj@kernel.org>
Based-on-patch-from: Eric Paris <eparis@redhat.com>
Reported-by: Eric Paris <eparis@redhat.com>
Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-22 19:50:34 -08:00
Tetsuo Handa
701188374b kernel/sys.c: fix missing rcu protection for sys_getpriority()
find_task_by_vpid() is not safe without rcu_read_lock().  2.6.33-rc7 got
RCU protection for sys_setpriority() but missed it for sys_getpriority().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-22 19:50:34 -08:00
KAMEZAWA Hiroyuki
5a2d41961d memcg: fix oom killing a child process in an other cgroup
Presently the oom-killer is memcg aware and it finds the worst process
from processes under memcg(s) in oom.  Then, it kills victim's child
first.

It may kill a child in another cgroup and may not be any help for
recovery.  And it will break the assumption users have.

This patch fixes it.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-22 19:50:34 -08:00
Ben Skeggs
6c42966768 drm/nv50: fix vram ptes on IGPs to point at stolen system memory
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-02-23 13:50:24 +10:00
Ben Skeggs
76befb8c30 drm/nv50: fix instmem binding on IGPs to point at stolen system memory
This also modifies the unused PRAMIN PT entries to be all zeroes, can't
really recall why I used 9/0 initially, just that it didn't work for
some reason.  It was likely masking a bug elsewhere that's since been
fixed.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-02-23 13:50:21 +10:00
Ben Skeggs
531e77139f drm/nv50: improve vram page table construction
This commit changes nouveau to construct PTEs which look very much like
the ones the binary driver creates.

I presume that filling multiple PTEs identically with length flags and
the physical address of the start of a block of VRAM is a hint to the
memory controller that it need not perform additional page table lookups
for that range of addresses.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-02-23 13:50:03 +10:00
Ben Skeggs
4c27bd339d drm/nv50: more efficient clearing of gpu page table entries
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-02-23 13:49:58 +10:00
Ben Skeggs
66b6ebaccb drm/nv50: make nv50_mem_vm_{bind,unbind} operate only on vram
GART is handled elsewhere, no reason to have the code for it here too.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-02-23 13:41:16 +10:00
Francisco Jerez
e7e65caefd drm/nouveau: Fix up pre-nv17 analog load detection.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-02-23 13:41:05 +10:00
Kenji Kaneshige
b16694f70c PCIe PME: use pci_pcie_cap()
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe
capability offset. This reduces redundant search in PCI configuration
space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:21:21 -08:00
Rafael J. Wysocki
6cbf82148f PCI PM: Run-time callbacks for PCI bus type
Introduce run-time PM callbacks for the PCI bus type.  Make the new
callbacks work in analogy with the existing system sleep PM
callbacks, so that the drivers already converted to struct dev_pm_ops
can use their suspend and resume routines for run-time PM without
modifications.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:21:19 -08:00
Kenji Kaneshige
552be54cc4 PCIe PME: use pci_is_pcie()
Use pci_is_pcie() instead of looking at obsolete is_pcie field in
struct pci_dev.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:21:10 -08:00
Rafael J. Wysocki
b67ea76172 PCI / ACPI / PM: Platform support for PCI PME wake-up
Although the majority of PCI devices can generate PMEs that in
principle may be used to wake up devices suspended at run time,
platform support is generally necessary to convert PMEs into wake-up
events that can be delivered to the kernel.  If ACPI is used for this
purpose, PME signals generated by a PCI device will trigger the ACPI
GPE associated with the device to generate an ACPI wake-up event that
we can set up a handler for, provided that everything is configured
correctly.

Unfortunately, the subset of PCI devices that have GPEs associated
with them is quite limited.  The devices without dedicated GPEs have
to rely on the GPEs associated with other devices (in the majority of
cases their upstream bridges and, possibly, the root bridge) to
generate ACPI wake-up events in response to PME signals from them.

Add ACPI platform support for PCI PME wake-up:
o Add a framework making is possible to use ACPI system notify
  handlers for run-time PM.
o Add new PCI platform callback ->run_wake() to struct
  pci_platform_pm_ops allowing us to enable/disable the platform to
  generate wake-up events for given device.  Implemet this callback
  for the ACPI platform.
o Define ACPI wake-up handlers for PCI devices and PCI root buses and
  make the PCI-ACPI binding code register wake-up notifiers for all
  PCI devices present in the ACPI tables.
o Add function pci_dev_run_wake() which can be used by PCI drivers to
  check if given device is capable of generating wake-up events at
  run time.

Developed in cooperation with Matthew Garrett <mjg@redhat.com>.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:21:02 -08:00
Rafael J. Wysocki
3f0be67188 ACPI / ACPICA: Multiple system notify handlers per device
Currently it only is possible to install one system notify handler
per namespace node, but this is not enough for PCI run-time power
management, because we need to install power management notifiers for
devices that already have hotplug notifiers installed.  While in
principle this could be handled at the PCI level, that would be
suboptimal due to the way in which the ACPI-based PCI hotplug code is
designed.

For this reason, modify ACPICA so that it is possible to install more
than one system notify handler per namespace node.  Namely, make
acpi_install_notify_handler(), acpi_remove_notify_handler() and
acpi_ev_notify_dispatch() use a list of system notify handler objects
associated with a namespace node.

Make acpi_remove_notify_handler() call acpi_os_wait_events_complete()
upfront to avoid a situation in which concurrent instance of
acpi_remove_notify_handler() removes the handler from under us while
we're waiting for the event queues to flush.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:20:56 -08:00
Rafael J. Wysocki
f517709d65 ACPI / PM: Add more run-time wake-up fields
Use the run_wake flag to mark all devices for which run-time wake-up
events may be generated by the platform.  Introduce a new wake-up
flag, always_enabled, for marking devices that should be permanently
enabled to generate run-time events.  Also, introduce a reference
counter for run-wake devices and a function that will initialize all
of the run-time wake-up fields for given device.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:20:51 -08:00
Rafael J. Wysocki
9630bdd9b1 ACPI: Use GPE reference counting to support shared GPEs
ACPI GPEs may map to multiple devices.  The current GPE interface
only provides a mechanism for enabling and disabling GPEs, making
it difficult to change the state of GPEs at runtime without extensive
cooperation between devices.

Add an API to allow devices to indicate whether or not they want
their device's GPE to be enabled for both runtime and wakeup events.

Remove the old GPE type handling entirely, which gets rid of various
quirks, like the implicit disabling with GPE type setting. This
requires a small amount of rework in order to ensure that non-wake
GPEs are enabled by default to preserve existing behaviour.

Based on patches from Matthew Garrett <mjg@redhat.com>.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:20:45 -08:00
Rafael J. Wysocki
c39fae1416 PCI PM: Make it possible to force using INTx for PCIe PME signaling
Apparently, some machines may have problems with PCI run-time power
management if MSIs are used for the native PCIe PME signaling.  In
particular, on the MSI Wind U-100 PCIe PME interrupts are not
generated by a PCIe root port after a resume from suspend to RAM, if
the system wake-up was triggered by a PME from the device attached to
this port.  [It doesn't help to free the interrupt on suspend and
request it back on resume, even if that is done along with disabling
the MSI and re-enabling it, respectively.]  However, if INTx
interrupts are used for this purpose on the same machine, everything
works just fine.

For this reason, add a kernel command line switch allowing one to
request that MSIs be not used for the native PCIe PME signaling,
introduce a DMI table allowing us to blacklist machines that need
this switch to be set by default and put the MSI Wind U-100 into this
table.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:20:39 -08:00
Rafael J. Wysocki
c7f486567c PCI PM: PCIe PME root port service driver
PCIe native PME detection mechanism is based on interrupts generated
by root ports or event collectors every time a PCIe device sends a
PME message upstream.

Once a PME message has been sent by an endpoint device and received
by its root port (or event collector in the case of root complex
integrated endpoints), the Requester ID from the message header is
registered in the root port's Root Status register.  At the same
time, the PME Status bit of the Root Status register is set to
indicate that there's a PME to handle.  If PCIe PME interrupt is
enabled for the root port, it generates an interrupt once the PME
Status has been set.  After receiving the interrupt, the kernel can
identify the PCIe device that generated the PME using the Requester
ID from the root port's Root Status register. [For details, see PCI
Express Base Specification, Rev. 2.0.]

Implement a driver for the PCIe PME root port service working in
accordance with the above description.

Based on a patch from Shaohua Li <shaohua.li@intel.com>.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:20:31 -08:00
Rafael J. Wysocki
58ff463396 PCI PM: Add function for checking PME status of devices
Add function pci_check_pme_status() that will check the PME status
bit of given device and clear it along with the PME enable bit.  It
will be necessary for PCI run-time power management.

Based on a patch from Shaohua Li <shaohua.li@intel.com>

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:20:24 -08:00
Kenji Kaneshige
6d3be84aab PCI: mark is_pcie obsolete
The "is_pcie" field in struct pci_dev is no longer needed because
struct pci_dev has PCIe capability offset in "pcie_cap" field and
(pcie_cap != 0) means the device is PCIe capable. This patch marks
"is_pcie" fields obsolete.

Current users of "is_pcie" field are:

- drivers/ssb/scan.c
- drivers/net/wireless/ath/ath9k/pci.c
- drivers/net/wireless/ath/ath5k/attach.c
- drivers/net/wireless/ath/ath5k/reset.c
- drivers/acpi/hest.c
- drivers/pci/pcie/pme/pcie_pme.c

Will post patches for each to use pci_is_pcie() as a follow-up.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:17:26 -08:00
Yinghai Lu
9958610552 PCI: set PCI_PREF_RANGE_TYPE_64 in pci_bridge_check_ranges
Make pci_bridge_check_ranges() store the PCI_PREF_RANGE_TYPE_64 in
addition to IORESOURCE_MEM_64.  Just like pci_read_bridge_bases().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:17:25 -08:00