bpf-next-for-netdev

-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY4AC5QAKCRDbK58LschI
 g1e0AQCfAqduTy7mYd02jDNCV0wLphNp9FbPiP9OrQT37ABpKAEA1ulj1X59bX3d
 HnZdDKuatcPZT9MV5hDLM7MFJ9GjOA4=
 =fNmM
 -----END PGP SIGNATURE-----

Daniel Borkmann says:

====================
bpf-next 2022-11-25

We've added 101 non-merge commits during the last 11 day(s) which contain
a total of 109 files changed, 8827 insertions(+), 1129 deletions(-).

The main changes are:

1) Support for user defined BPF objects: the use case is to allocate own
   objects, build own object hierarchies and use the building blocks to
   build own data structures flexibly, for example, linked lists in BPF,
   from Kumar Kartikeya Dwivedi.

2) Add bpf_rcu_read_{,un}lock() support for sleepable programs,
   from Yonghong Song.

3) Add support storing struct task_struct objects as kptrs in maps,
   from David Vernet.

4) Batch of BPF map documentation improvements, from Maryam Tahhan
   and Donald Hunter.

5) Improve BPF verifier to propagate nullness information for branches
   of register to register comparisons, from Eduard Zingerman.

6) Fix cgroup BPF iter infra to hold reference on the start cgroup,
   from Hou Tao.

7) Fix BPF verifier to not mark fentry/fexit program arguments as trusted
   given it is not the case for them, from Alexei Starovoitov.

8) Improve BPF verifier's realloc handling to better play along with dynamic
   runtime analysis tools like KASAN and friends, from Kees Cook.

9) Remove legacy libbpf mode support from bpftool,
   from Sahid Orentino Ferdjaoui.

10) Rework zero-len skb redirection checks to avoid potentially breaking
    existing BPF test infra users, from Stanislav Fomichev.

11) Two small refactorings which are independent and have been split out
    of the XDP queueing RFC series, from Toke Høiland-Jørgensen.

12) Fix a memory leak in LSM cgroup BPF selftest, from Wang Yufen.

13) Documentation on how to run BPF CI without patch submission,
    from Daniel Müller.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================

Link: https://lore.kernel.org/r/20221125012450.441-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit is contained in:
Jakub Kicinski 2022-11-28 17:14:01 -08:00
commit d6dc62fca6
109 changed files with 8695 additions and 997 deletions

View file

@ -332,13 +332,14 @@ avoid defining types with 'bpf\_' prefix to not be broken in future releases.
In other words, no backwards compatibility is guaranteed if one using a type
in BTF with 'bpf\_' prefix.
Q: What is the compatibility story for special BPF types in local kptrs?
------------------------------------------------------------------------
Q: Same as above, but for local kptrs (i.e. pointers to objects allocated using
bpf_obj_new for user defined structures). Will the kernel preserve backwards
Q: What is the compatibility story for special BPF types in allocated objects?
------------------------------------------------------------------------------
Q: Same as above, but for allocated objects (i.e. objects allocated using
bpf_obj_new for user defined types). Will the kernel preserve backwards
compatibility for these features?
A: NO.
Unlike map value types, there are no stability guarantees for this case. The
whole local kptr API itself is unstable (since it is exposed through kfuncs).
whole API to work with allocated objects and any support for special fields
inside them is unstable (since it is exposed through kfuncs).

View file

@ -44,6 +44,33 @@ is a guarantee that the reported issue will be overlooked.**
Submitting patches
==================
Q: How do I run BPF CI on my changes before sending them out for review?
------------------------------------------------------------------------
A: BPF CI is GitHub based and hosted at https://github.com/kernel-patches/bpf.
While GitHub also provides a CLI that can be used to accomplish the same
results, here we focus on the UI based workflow.
The following steps lay out how to start a CI run for your patches:
- Create a fork of the aforementioned repository in your own account (one time
action)
- Clone the fork locally, check out a new branch tracking either the bpf-next
or bpf branch, and apply your to-be-tested patches on top of it
- Push the local branch to your fork and create a pull request against
kernel-patches/bpf's bpf-next_base or bpf_base branch, respectively
Shortly after the pull request has been created, the CI workflow will run. Note
that capacity is shared with patches submitted upstream being checked and so
depending on utilization the run can take a while to finish.
Note furthermore that both base branches (bpf-next_base and bpf_base) will be
updated as patches are pushed to the respective upstream branches they track. As
such, your patch set will automatically (be attempted to) be rebased as well.
This behavior can result in a CI run being aborted and restarted with the new
base line.
Q: To which mailing list do I need to submit my BPF patches?
------------------------------------------------------------
A: Please submit your BPF patches to the bpf kernel mailing list:

View file

@ -1062,4 +1062,9 @@ format.::
7. Testing
==========
Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests.
The kernel BPF selftest `tools/testing/selftests/bpf/prog_tests/btf.c`_
provides an extensive set of BTF-related tests.
.. Links
.. _tools/testing/selftests/bpf/prog_tests/btf.c:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/prog_tests/btf.c

View file

@ -29,6 +29,7 @@ that goes into great technical depth about the BPF Architecture.
clang-notes
linux-notes
other
redirect
.. only:: subproject and html

View file

@ -72,6 +72,30 @@ argument as its size. By default, without __sz annotation, the size of the type
of the pointer is used. Without __sz annotation, a kfunc cannot accept a void
pointer.
2.2.2 __k Annotation
--------------------
This annotation is only understood for scalar arguments, where it indicates that
the verifier must check the scalar argument to be a known constant, which does
not indicate a size parameter, and the value of the constant is relevant to the
safety of the program.
An example is given below::
void *bpf_obj_new(u32 local_type_id__k, ...)
{
...
}
Here, bpf_obj_new uses local_type_id argument to find out the size of that type
ID in program's BTF and return a sized pointer to it. Each type ID will have a
distinct size, hence it is crucial to treat each such call as distinct when
values don't match during verifier state pruning checks.
Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, __k
suffix should be used.
.. _BPF_kfunc_nodef:
2.3 Using an existing kernel function
@ -137,22 +161,20 @@ KF_ACQUIRE and KF_RET_NULL flags.
--------------------------
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that the all pointer arguments will always have a guaranteed lifetime,
and pointers to kernel objects are always passed to helpers in their unmodified
form (as obtained from acquire kfuncs).
indicates that the all pointer arguments are valid, and that all pointers to
BTF objects have been passed in their unmodified form (that is, at a zero
offset, and without having been obtained from walking another pointer).
It can be used to enforce that a pointer to a refcounted object acquired from a
kfunc or BPF helper is passed as an argument to this kfunc without any
modifications (e.g. pointer arithmetic) such that it is trusted and points to
the original object.
There are two types of pointers to kernel objects which are considered "valid":
Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
but those can have a non-zero offset.
1. Pointers which are passed as tracepoint or struct_ops callback arguments.
2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
This flag is often used for kfuncs that operate (change some property, perform
some operation) on an object that was obtained using an acquire kfunc. Such
kfuncs need an unchanged pointer to ensure the integrity of the operation being
performed on the expected object.
Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
The definition of "valid" pointers is subject to change at any time, and has
absolutely no ABI stability guarantees.
2.4.6 KF_SLEEPABLE flag
-----------------------

View file

@ -1,5 +1,7 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
.. _libbpf:
libbpf
======
@ -7,6 +9,7 @@ libbpf
:maxdepth: 1
API Documentation <https://libbpf.readthedocs.io/en/latest/api.html>
program_types
libbpf_naming_convention
libbpf_build

View file

@ -0,0 +1,203 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
.. _program_types_and_elf:
Program Types and ELF Sections
==============================
The table below lists the program types, their attach types where relevant and the ELF section
names supported by libbpf for them. The ELF section names follow these rules:
- ``type`` is an exact match, e.g. ``SEC("socket")``
- ``type+`` means it can be either exact ``SEC("type")`` or well-formed ``SEC("type/extras")``
with a '``/``' separator between ``type`` and ``extras``.
When ``extras`` are specified, they provide details of how to auto-attach the BPF program. The
format of ``extras`` depends on the program type, e.g. ``SEC("tracepoint/<category>/<name>")``
for tracepoints or ``SEC("usdt/<path>:<provider>:<name>")`` for USDT probes. The extras are
described in more detail in the footnotes.
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| Program Type | Attach Type | ELF Section Name | Sleepable |
+===========================================+========================================+==================================+===========+
| ``BPF_PROG_TYPE_CGROUP_DEVICE`` | ``BPF_CGROUP_DEVICE`` | ``cgroup/dev`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SKB`` | | ``cgroup/skb`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_EGRESS`` | ``cgroup_skb/egress`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_INGRESS`` | ``cgroup_skb/ingress`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` | ``BPF_CGROUP_GETSOCKOPT`` | ``cgroup/getsockopt`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_SETSOCKOPT`` | ``cgroup/setsockopt`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SOCK_ADDR`` | ``BPF_CGROUP_INET4_BIND`` | ``cgroup/bind4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET4_CONNECT`` | ``cgroup/connect4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET4_GETPEERNAME`` | ``cgroup/getpeername4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET4_GETSOCKNAME`` | ``cgroup/getsockname4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_BIND`` | ``cgroup/bind6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_CONNECT`` | ``cgroup/connect6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_GETPEERNAME`` | ``cgroup/getpeername6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_GETSOCKNAME`` | ``cgroup/getsockname6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP4_RECVMSG`` | ``cgroup/recvmsg4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP4_SENDMSG`` | ``cgroup/sendmsg4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP6_RECVMSG`` | ``cgroup/recvmsg6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_UDP6_SENDMSG`` | ``cgroup/sendmsg6`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SOCK`` | ``BPF_CGROUP_INET4_POST_BIND`` | ``cgroup/post_bind4`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET6_POST_BIND`` | ``cgroup/post_bind6`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_SOCK_CREATE`` | ``cgroup/sock_create`` | |
+ + +----------------------------------+-----------+
| | | ``cgroup/sock`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_CGROUP_INET_SOCK_RELEASE`` | ``cgroup/sock_release`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_CGROUP_SYSCTL`` | ``BPF_CGROUP_SYSCTL`` | ``cgroup/sysctl`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_EXT`` | | ``freplace+`` [#fentry]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_FLOW_DISSECTOR`` | ``BPF_FLOW_DISSECTOR`` | ``flow_dissector`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_KPROBE`` | | ``kprobe+`` [#kprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``kretprobe+`` [#kprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``ksyscall+`` [#ksyscall]_ | |
+ + +----------------------------------+-----------+
| | | ``kretsyscall+`` [#ksyscall]_ | |
+ + +----------------------------------+-----------+
| | | ``uprobe+`` [#uprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``uprobe.s+`` [#uprobe]_ | Yes |
+ + +----------------------------------+-----------+
| | | ``uretprobe+`` [#uprobe]_ | |
+ + +----------------------------------+-----------+
| | | ``uretprobe.s+`` [#uprobe]_ | Yes |
+ + +----------------------------------+-----------+
| | | ``usdt+`` [#usdt]_ | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_KPROBE_MULTI`` | ``kprobe.multi+`` [#kpmulti]_ | |
+ + +----------------------------------+-----------+
| | | ``kretprobe.multi+`` [#kpmulti]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LIRC_MODE2`` | ``BPF_LIRC_MODE2`` | ``lirc_mode2`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LSM`` | ``BPF_LSM_CGROUP`` | ``lsm_cgroup+`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_LSM_MAC`` | ``lsm+`` [#lsm]_ | |
+ + +----------------------------------+-----------+
| | | ``lsm.s+`` [#lsm]_ | Yes |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_IN`` | | ``lwt_in`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_OUT`` | | ``lwt_out`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_SEG6LOCAL`` | | ``lwt_seg6local`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LWT_XMIT`` | | ``lwt_xmit`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_PERF_EVENT`` | | ``perf_event`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE`` | | ``raw_tp.w+`` [#rawtp]_ | |
+ + +----------------------------------+-----------+
| | | ``raw_tracepoint.w+`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_RAW_TRACEPOINT`` | | ``raw_tp+`` [#rawtp]_ | |
+ + +----------------------------------+-----------+
| | | ``raw_tracepoint+`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SCHED_ACT`` | | ``action`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SCHED_CLS`` | | ``classifier`` | |
+ + +----------------------------------+-----------+
| | | ``tc`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_LOOKUP`` | ``BPF_SK_LOOKUP`` | ``sk_lookup`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_MSG`` | ``BPF_SK_MSG_VERDICT`` | ``sk_msg`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_REUSEPORT`` | ``BPF_SK_REUSEPORT_SELECT_OR_MIGRATE`` | ``sk_reuseport/migrate`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_SK_REUSEPORT_SELECT`` | ``sk_reuseport`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SK_SKB`` | | ``sk_skb`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_SK_SKB_STREAM_PARSER`` | ``sk_skb/stream_parser`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_SK_SKB_STREAM_VERDICT`` | ``sk_skb/stream_verdict`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SOCKET_FILTER`` | | ``socket`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SOCK_OPS`` | ``BPF_CGROUP_SOCK_OPS`` | ``sockops`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_STRUCT_OPS`` | | ``struct_ops+`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_SYSCALL`` | | ``syscall`` | Yes |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_TRACEPOINT`` | | ``tp+`` [#tp]_ | |
+ + +----------------------------------+-----------+
| | | ``tracepoint+`` [#tp]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_TRACING`` | ``BPF_MODIFY_RETURN`` | ``fmod_ret+`` [#fentry]_ | |
+ + +----------------------------------+-----------+
| | | ``fmod_ret.s+`` [#fentry]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_FENTRY`` | ``fentry+`` [#fentry]_ | |
+ + +----------------------------------+-----------+
| | | ``fentry.s+`` [#fentry]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_FEXIT`` | ``fexit+`` [#fentry]_ | |
+ + +----------------------------------+-----------+
| | | ``fexit.s+`` [#fentry]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_ITER`` | ``iter+`` [#iter]_ | |
+ + +----------------------------------+-----------+
| | | ``iter.s+`` [#iter]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_RAW_TP`` | ``tp_btf+`` [#fentry]_ | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_XDP`` | ``BPF_XDP_CPUMAP`` | ``xdp.frags/cpumap`` | |
+ + +----------------------------------+-----------+
| | | ``xdp/cpumap`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_XDP_DEVMAP`` | ``xdp.frags/devmap`` | |
+ + +----------------------------------+-----------+
| | | ``xdp/devmap`` | |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_XDP`` | ``xdp.frags`` | |
+ + +----------------------------------+-----------+
| | | ``xdp`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
.. rubric:: Footnotes
.. [#fentry] The ``fentry`` attach format is ``fentry[.s]/<function>``.
.. [#kprobe] The ``kprobe`` attach format is ``kprobe/<function>[+<offset>]``. Valid
characters for ``function`` are ``a-zA-Z0-9_.`` and ``offset`` must be a valid
non-negative integer.
.. [#ksyscall] The ``ksyscall`` attach format is ``ksyscall/<syscall>``.
.. [#uprobe] The ``uprobe`` attach format is ``uprobe[.s]/<path>:<function>[+<offset>]``.
.. [#usdt] The ``usdt`` attach format is ``usdt/<path>:<provider>:<name>``.
.. [#kpmulti] The ``kprobe.multi`` attach format is ``kprobe.multi/<pattern>`` where ``pattern``
supports ``*`` and ``?`` wildcards. Valid characters for pattern are
``a-zA-Z0-9_.*?``.
.. [#lsm] The ``lsm`` attachment format is ``lsm[.s]/<hook>``.
.. [#rawtp] The ``raw_tp`` attach format is ``raw_tracepoint[.w]/<tracepoint>``.
.. [#tp] The ``tracepoint`` attach format is ``tracepoint/<category>/<name>``.
.. [#iter] The ``iter`` attach format is ``iter[.s]/<struct-name>``.

View file

@ -32,7 +32,11 @@ Usage
Kernel BPF
----------
.. c:function::
bpf_map_lookup_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
Array elements can be retrieved using the ``bpf_map_lookup_elem()`` helper.
@ -40,7 +44,11 @@ This helper returns a pointer into the array element, so to avoid data races
with userspace reading the value, the user must use primitives like
``__sync_fetch_and_add()`` when updating the value in-place.
.. c:function::
bpf_map_update_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
Array elements can be updated using the ``bpf_map_update_elem()`` helper.
@ -53,7 +61,7 @@ To clear an array element, you may use ``bpf_map_update_elem()`` to insert a
zero value to that index.
Per CPU Array
~~~~~~~~~~~~~
-------------
Values stored in ``BPF_MAP_TYPE_ARRAY`` can be accessed by multiple programs
across different CPUs. To restrict storage to a single CPU, you may use a
@ -63,7 +71,11 @@ When using a ``BPF_MAP_TYPE_PERCPU_ARRAY`` the ``bpf_map_update_elem()`` and
``bpf_map_lookup_elem()`` helpers automatically access the slot for the current
CPU.
.. c:function::
bpf_map_lookup_percpu_elem()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the array
@ -119,7 +131,7 @@ This example BPF program shows how to access an array element.
index = ip.protocol;
value = bpf_map_lookup_elem(&my_map, &index);
if (value)
__sync_fetch_and_add(&value, skb->len);
__sync_fetch_and_add(value, skb->len);
return 0;
}

View file

@ -0,0 +1,174 @@
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.
=========================
BPF_MAP_TYPE_BLOOM_FILTER
=========================
.. note::
- ``BPF_MAP_TYPE_BLOOM_FILTER`` was introduced in kernel version 5.16
``BPF_MAP_TYPE_BLOOM_FILTER`` provides a BPF bloom filter map. Bloom
filters are a space-efficient probabilistic data structure used to
quickly test whether an element exists in a set. In a bloom filter,
false positives are possible whereas false negatives are not.
The bloom filter map does not have keys, only values. When the bloom
filter map is created, it must be created with a ``key_size`` of 0. The
bloom filter map supports two operations:
- push: adding an element to the map
- peek: determining whether an element is present in the map
BPF programs must use ``bpf_map_push_elem`` to add an element to the
bloom filter map and ``bpf_map_peek_elem`` to query the map. These
operations are exposed to userspace applications using the existing
``bpf`` syscall in the following way:
- ``BPF_MAP_UPDATE_ELEM`` -> push
- ``BPF_MAP_LOOKUP_ELEM`` -> peek
The ``max_entries`` size that is specified at map creation time is used
to approximate a reasonable bitmap size for the bloom filter, and is not
otherwise strictly enforced. If the user wishes to insert more entries
into the bloom filter than ``max_entries``, this may lead to a higher
false positive rate.
The number of hashes to use for the bloom filter is configurable using
the lower 4 bits of ``map_extra`` in ``union bpf_attr`` at map creation
time. If no number is specified, the default used will be 5 hash
functions. In general, using more hashes decreases both the false
positive rate and the speed of a lookup.
It is not possible to delete elements from a bloom filter map. A bloom
filter map may be used as an inner map. The user is responsible for
synchronising concurrent updates and lookups to ensure no false negative
lookups occur.
Usage
=====
Kernel BPF
----------
bpf_map_push_elem()
~~~~~~~~~~~~~~~~~~~
.. code-block:: c
long bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
A ``value`` can be added to a bloom filter using the
``bpf_map_push_elem()`` helper. The ``flags`` parameter must be set to
``BPF_ANY`` when adding an entry to the bloom filter. This helper
returns ``0`` on success, or negative error in case of failure.
bpf_map_peek_elem()
~~~~~~~~~~~~~~~~~~~
.. code-block:: c
long bpf_map_peek_elem(struct bpf_map *map, void *value)
The ``bpf_map_peek_elem()`` helper is used to determine whether
``value`` is present in the bloom filter map. This helper returns ``0``
if ``value`` is probably present in the map, or ``-ENOENT`` if ``value``
is definitely not present in the map.
Userspace
---------
bpf_map_update_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
int bpf_map_update_elem (int fd, const void *key, const void *value, __u64 flags)
A userspace program can add a ``value`` to a bloom filter using libbpf's
``bpf_map_update_elem`` function. The ``key`` parameter must be set to
``NULL`` and ``flags`` must be set to ``BPF_ANY``. Returns ``0`` on
success, or negative error in case of failure.
bpf_map_lookup_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
int bpf_map_lookup_elem (int fd, const void *key, void *value)
A userspace program can determine the presence of ``value`` in a bloom
filter using libbpf's ``bpf_map_lookup_elem`` function. The ``key``
parameter must be set to ``NULL``. Returns ``0`` if ``value`` is
probably present in the map, or ``-ENOENT`` if ``value`` is definitely
not present in the map.
Examples
========
Kernel BPF
----------
This snippet shows how to declare a bloom filter in a BPF program:
.. code-block:: c
struct {
__uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
__type(value, __u32);
__uint(max_entries, 1000);
__uint(map_extra, 3);
} bloom_filter SEC(".maps");
This snippet shows how to determine presence of a value in a bloom
filter in a BPF program:
.. code-block:: c
void *lookup(__u32 key)
{
if (bpf_map_peek_elem(&bloom_filter, &key) == 0) {
/* Verify not a false positive and fetch an associated
* value using a secondary lookup, e.g. in a hash table
*/
return bpf_map_lookup_elem(&hash_table, &key);
}
return 0;
}
Userspace
---------
This snippet shows how to use libbpf to create a bloom filter map from
userspace:
.. code-block:: c
int create_bloom()
{
LIBBPF_OPTS(bpf_map_create_opts, opts,
.map_extra = 3); /* number of hashes */
return bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER,
"ipv6_bloom", /* name */
0, /* key size, must be zero */
sizeof(ipv6_addr), /* value size */
10000, /* max entries */
&opts); /* create options */
}
This snippet shows how to add an element to a bloom filter from
userspace:
.. code-block:: c
int add_element(struct bpf_map *bloom_map, __u32 value)
{
int bloom_fd = bpf_map__fd(bloom_map);
return bpf_map_update_elem(bloom_fd, NULL, &value, BPF_ANY);
}
References
==========
https://lwn.net/ml/bpf/20210831225005.2762202-1-joannekoong@fb.com/

View file

@ -30,30 +30,35 @@ Usage
Kernel BPF
----------
.. c:function::
bpf_redirect_map()
^^^^^^^^^^^^^^^^^^
.. code-block:: c
long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
For ``BPF_MAP_TYPE_CPUMAP`` this map contains references to CPUs.
Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
For ``BPF_MAP_TYPE_CPUMAP`` this map contains references to CPUs.
The lower two bits of ``flags`` are used as the return code if the map lookup
fails. This is so that the return value can be one of the XDP program return
codes up to ``XDP_TX``, as chosen by the caller.
The lower two bits of ``flags`` are used as the return code if the map lookup
fails. This is so that the return value can be one of the XDP program return
codes up to ``XDP_TX``, as chosen by the caller.
Userspace
---------
User space
----------
.. note::
CPUMAP entries can only be updated/looked up/deleted from user space and not
from an eBPF program. Trying to call these functions from a kernel eBPF
program will result in the program failing to load and a verifier warning.
.. c:function::
int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags);
bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
CPU entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically. The ``value`` parameter
can be ``struct bpf_cpumap_val``.
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);
CPU entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically. The ``value`` parameter
can be ``struct bpf_cpumap_val``.
.. code-block:: c
@ -65,23 +70,29 @@ Userspace
} bpf_prog;
};
The flags argument can be one of the following:
The flags argument can be one of the following:
- BPF_ANY: Create a new element or update an existing element.
- BPF_NOEXIST: Create a new element only if it did not exist.
- BPF_EXIST: Update an existing element.
.. c:function::
bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
int bpf_map_lookup_elem(int fd, const void *key, void *value);
CPU entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.
CPU entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.
bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
.. c:function::
int bpf_map_delete_elem(int fd, const void *key);
CPU entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or negative error in case of
failure.
CPU entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or negative error in case of
failure.
Examples
========
@ -142,8 +153,8 @@ The following code snippet shows how to declare a ``BPF_MAP_TYPE_CPUMAP`` called
return bpf_redirect_map(&cpu_map, cpu_dest, 0);
}
Userspace
---------
User space
----------
The following code snippet shows how to dynamically set the max_entries for a
CPUMAP to the max number of cpus available on the system.

View file

@ -0,0 +1,238 @@
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.
=================================================
BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH
=================================================
.. note::
- ``BPF_MAP_TYPE_DEVMAP`` was introduced in kernel version 4.14
- ``BPF_MAP_TYPE_DEVMAP_HASH`` was introduced in kernel version 5.4
``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` are BPF maps primarily
used as backend maps for the XDP BPF helper call ``bpf_redirect_map()``.
``BPF_MAP_TYPE_DEVMAP`` is backed by an array that uses the key as
the index to lookup a reference to a net device. While ``BPF_MAP_TYPE_DEVMAP_HASH``
is backed by a hash table that uses a key to lookup a reference to a net device.
The user provides either <``key``/ ``ifindex``> or <``key``/ ``struct bpf_devmap_val``>
pairs to update the maps with new net devices.
.. note::
- The key to a hash map doesn't have to be an ``ifindex``.
- While ``BPF_MAP_TYPE_DEVMAP_HASH`` allows for densely packing the net devices
it comes at the cost of a hash of the key when performing a look up.
The setup and packet enqueue/send code is shared between the two types of
devmap; only the lookup and insertion is different.
Usage
=====
Kernel BPF
----------
bpf_redirect_map()
^^^^^^^^^^^^^^^^^^
.. code-block:: c
long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
For ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` this map contains
references to net devices (for forwarding packets through other ports).
The lower two bits of *flags* are used as the return code if the map lookup
fails. This is so that the return value can be one of the XDP program return
codes up to ``XDP_TX``, as chosen by the caller. The higher bits of ``flags``
can be set to ``BPF_F_BROADCAST`` or ``BPF_F_EXCLUDE_INGRESS`` as defined
below.
With ``BPF_F_BROADCAST`` the packet will be broadcast to all the interfaces
in the map, with ``BPF_F_EXCLUDE_INGRESS`` the ingress interface will be excluded
from the broadcast.
.. note::
- The key is ignored if BPF_F_BROADCAST is set.
- The broadcast feature can also be used to implement multicast forwarding:
simply create multiple DEVMAPs, each one corresponding to a single multicast group.
This helper will return ``XDP_REDIRECT`` on success, or the value of the two
lower bits of the ``flags`` argument if the map lookup fails.
More information about redirection can be found :doc:`redirect`
bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.
User space
----------
.. note::
DEVMAP entries can only be updated/deleted from user space and not
from an eBPF program. Trying to call these functions from a kernel eBPF
program will result in the program failing to load and a verifier warning.
bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);
Net device entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically. The ``value`` parameter
can be ``struct bpf_devmap_val`` or a simple ``int ifindex`` for backwards
compatibility.
.. code-block:: c
struct bpf_devmap_val {
__u32 ifindex; /* device index */
union {
int fd; /* prog fd on map write */
__u32 id; /* prog id on map read */
} bpf_prog;
};
The ``flags`` argument can be one of the following:
- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.
DEVMAPs can associate a program with a device entry by adding a ``bpf_prog.fd``
to ``struct bpf_devmap_val``. Programs are run after ``XDP_REDIRECT`` and have
access to both Rx device and Tx device. The program associated with the ``fd``
must have type XDP with expected attach type ``xdp_devmap``.
When a program is associated with a device index, the program is run on an
``XDP_REDIRECT`` and before the buffer is added to the per-cpu queue. Examples
of how to attach/use xdp_devmap progs can be found in the kernel selftests:
- ``tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c``
- ``tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c``
bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
.. c:function::
int bpf_map_lookup_elem(int fd, const void *key, void *value);
Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.
bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
.. c:function::
int bpf_map_delete_elem(int fd, const void *key);
Net device entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or negative error in case of
failure.
Examples
========
Kernel BPF
----------
The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP``
called tx_port.
.. code-block:: c
struct {
__uint(type, BPF_MAP_TYPE_DEVMAP);
__type(key, __u32);
__type(value, __u32);
__uint(max_entries, 256);
} tx_port SEC(".maps");
The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP_HASH``
called forward_map.
.. code-block:: c
struct {
__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
__type(key, __u32);
__type(value, struct bpf_devmap_val);
__uint(max_entries, 32);
} forward_map SEC(".maps");
.. note::
The value type in the DEVMAP above is a ``struct bpf_devmap_val``
The following code snippet shows a simple xdp_redirect_map program. This program
would work with a user space program that populates the devmap ``forward_map`` based
on ingress ifindexes. The BPF program (below) is redirecting packets using the
ingress ``ifindex`` as the ``key``.
.. code-block:: c
SEC("xdp")
int xdp_redirect_map_func(struct xdp_md *ctx)
{
int index = ctx->ingress_ifindex;
return bpf_redirect_map(&forward_map, index, 0);
}
The following code snippet shows a BPF program that is broadcasting packets to
all the interfaces in the ``tx_port`` devmap.
.. code-block:: c
SEC("xdp")
int xdp_redirect_map_func(struct xdp_md *ctx)
{
return bpf_redirect_map(&tx_port, 0, BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
}
User space
----------
The following code snippet shows how to update a devmap called ``tx_port``.
.. code-block:: c
int update_devmap(int ifindex, int redirect_ifindex)
{
int ret;
ret = bpf_map_update_elem(bpf_map__fd(tx_port), &ifindex, &redirect_ifindex, 0);
if (ret < 0) {
fprintf(stderr, "Failed to update devmap_ value: %s\n",
strerror(errno));
}
return ret;
}
The following code snippet shows how to update a hash_devmap called ``forward_map``.
.. code-block:: c
int update_devmap(int ifindex, int redirect_ifindex)
{
struct bpf_devmap_val devmap_val = { .ifindex = redirect_ifindex };
int ret;
ret = bpf_map_update_elem(bpf_map__fd(forward_map), &ifindex, &devmap_val, 0);
if (ret < 0) {
fprintf(stderr, "Failed to update devmap_ value: %s\n",
strerror(errno));
}
return ret;
}
References
===========
- https://lwn.net/Articles/728146/
- https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6f9d451ab1a33728adb72d7ff66a7b374d665176
- https://elixir.bootlin.com/linux/latest/source/net/core/filter.c#L4106

View file

@ -34,7 +34,14 @@ the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``.
Usage
=====
.. c:function::
Kernel BPF
----------
bpf_map_update_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
Hash entries can be added or updated using the ``bpf_map_update_elem()``
@ -49,14 +56,22 @@ parameter can be used to control the update behaviour:
``bpf_map_update_elem()`` returns 0 on success, or negative error in
case of failure.
.. c:function::
bpf_map_lookup_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
Hash entries can be retrieved using the ``bpf_map_lookup_elem()``
helper. This helper returns a pointer to the value associated with
``key``, or ``NULL`` if no entry was found.
.. c:function::
bpf_map_delete_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
long bpf_map_delete_elem(struct bpf_map *map, const void *key)
Hash entries can be deleted using the ``bpf_map_delete_elem()``
@ -70,7 +85,11 @@ For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers
automatically access the hash slot for the current CPU.
.. c:function::
bpf_map_lookup_percpu_elem()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the
@ -89,7 +108,11 @@ See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
Userspace
---------
.. c:function::
bpf_map_get_next_key()
~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)
In userspace, it is possible to iterate through the keys of a hash using

View file

@ -35,7 +35,11 @@ Usage
Kernel BPF
----------
.. c:function::
bpf_map_lookup_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
The longest prefix entry for a given data value can be found using the
@ -48,7 +52,11 @@ performing longest prefix lookups. For example, when searching for the
longest prefix match for an IPv4 address, ``prefixlen`` should be set to
``32``.
.. c:function::
bpf_map_update_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
Prefix entries can be added or updated using the ``bpf_map_update_elem()``
@ -61,7 +69,11 @@ case of failure.
The flags parameter must be one of BPF_ANY, BPF_NOEXIST or BPF_EXIST,
but the value is ignored, giving BPF_ANY semantics.
.. c:function::
bpf_map_delete_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
long bpf_map_delete_elem(struct bpf_map *map, const void *key)
Prefix entries can be deleted using the ``bpf_map_delete_elem()``
@ -74,7 +86,11 @@ Userspace
Access from userspace uses libbpf APIs with the same names as above, with
the map identified by ``fd``.
.. c:function::
bpf_map_get_next_key()
~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
int bpf_map_get_next_key (int fd, const void *cur_key, void *next_key)
A userspace program can iterate through the entries in an LPM trie using

View file

@ -45,7 +45,11 @@ Usage
Kernel BPF Helper
-----------------
.. c:function::
bpf_map_lookup_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
Inner maps can be retrieved using the ``bpf_map_lookup_elem()`` helper. This

View file

@ -28,7 +28,11 @@ Usage
Kernel BPF
----------
.. c:function::
bpf_map_push_elem()
~~~~~~~~~~~~~~~~~~~
.. code-block:: c
long bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
An element ``value`` can be added to a queue or stack using the
@ -38,14 +42,22 @@ when the queue or stack is full, the oldest element will be removed to
make room for ``value`` to be added. Returns ``0`` on success, or
negative error in case of failure.
.. c:function::
bpf_map_peek_elem()
~~~~~~~~~~~~~~~~~~~
.. code-block:: c
long bpf_map_peek_elem(struct bpf_map *map, void *value)
This helper fetches an element ``value`` from a queue or stack without
removing it. Returns ``0`` on success, or negative error in case of
failure.
.. c:function::
bpf_map_pop_elem()
~~~~~~~~~~~~~~~~~~
.. code-block:: c
long bpf_map_pop_elem(struct bpf_map *map, void *value)
This helper removes an element into ``value`` from a queue or
@ -55,7 +67,11 @@ stack. Returns ``0`` on success, or negative error in case of failure.
Userspace
---------
.. c:function::
bpf_map_update_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
int bpf_map_update_elem (int fd, const void *key, const void *value, __u64 flags)
A userspace program can push ``value`` onto a queue or stack using libbpf's
@ -64,7 +80,11 @@ A userspace program can push ``value`` onto a queue or stack using libbpf's
same semantics as the ``bpf_map_push_elem`` kernel helper. Returns ``0`` on
success, or negative error in case of failure.
.. c:function::
bpf_map_lookup_elem()
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
int bpf_map_lookup_elem (int fd, const void *key, void *value)
A userspace program can peek at the ``value`` at the head of a queue or stack
@ -72,7 +92,11 @@ using the libbpf ``bpf_map_lookup_elem`` function. The ``key`` parameter must be
set to ``NULL``. Returns ``0`` on success, or negative error in case of
failure.
.. c:function::
bpf_map_lookup_and_delete_elem()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: c
int bpf_map_lookup_and_delete_elem (int fd, const void *key, void *value)
A userspace program can pop a ``value`` from the head of a queue or stack using

View file

@ -0,0 +1,192 @@
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.
===================
BPF_MAP_TYPE_XSKMAP
===================
.. note::
- ``BPF_MAP_TYPE_XSKMAP`` was introduced in kernel version 4.18
The ``BPF_MAP_TYPE_XSKMAP`` is used as a backend map for XDP BPF helper
call ``bpf_redirect_map()`` and ``XDP_REDIRECT`` action, like 'devmap' and 'cpumap'.
This map type redirects raw XDP frames to `AF_XDP`_ sockets (XSKs), a new type of
address family in the kernel that allows redirection of frames from a driver to
user space without having to traverse the full network stack. An AF_XDP socket
binds to a single netdev queue. A mapping of XSKs to queues is shown below:
.. code-block:: none
+---------------------------------------------------+
| xsk A | xsk B | xsk C |<---+ User space
=========================================================|==========
| Queue 0 | Queue 1 | Queue 2 | | Kernel
+---------------------------------------------------+ |
| Netdev eth0 | |
+---------------------------------------------------+ |
| +=============+ | |
| | key | xsk | | |
| +---------+ +=============+ | |
| | | | 0 | xsk A | | |
| | | +-------------+ | |
| | | | 1 | xsk B | | |
| | BPF |-- redirect -->+-------------+-------------+
| | prog | | 2 | xsk C | |
| | | +-------------+ |
| | | |
| | | |
| +---------+ |
| |
+---------------------------------------------------+
.. note::
An AF_XDP socket that is bound to a certain <netdev/queue_id> will *only*
accept XDP frames from that <netdev/queue_id>. If an XDP program tries to redirect
from a <netdev/queue_id> other than what the socket is bound to, the frame will
not be received on the socket.
Typically an XSKMAP is created per netdev. This map contains an array of XSK File
Descriptors (FDs). The number of array elements is typically set or adjusted using
the ``max_entries`` map parameter. For AF_XDP ``max_entries`` is equal to the number
of queues supported by the netdev.
.. note::
Both the map key and map value size must be 4 bytes.
Usage
=====
Kernel BPF
----------
bpf_redirect_map()
^^^^^^^^^^^^^^^^^^
.. code-block:: c
long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
For ``BPF_MAP_TYPE_XSKMAP`` this map contains references to XSK FDs
for sockets attached to a netdev's queues.
.. note::
If the map is empty at an index, the packet is dropped. This means that it is
necessary to have an XDP program loaded with at least one XSK in the
XSKMAP to be able to get any traffic to user space through the socket.
bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
XSK entry references of type ``struct xdp_sock *`` can be retrieved using the
``bpf_map_lookup_elem()`` helper.
User space
----------
.. note::
XSK entries can only be updated/deleted from user space and not from
a BPF program. Trying to call these functions from a kernel BPF program will
result in the program failing to load and a verifier warning.
bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)
XSK entries can be added or updated using the ``bpf_map_update_elem()``
helper. The ``key`` parameter is equal to the queue_id of the queue the XSK
is attaching to. And the ``value`` parameter is the FD value of that socket.
Under the hood, the XSKMAP update function uses the XSK FD value to retrieve the
associated ``struct xdp_sock`` instance.
The flags argument can be one of the following:
- BPF_ANY: Create a new element or update an existing element.
- BPF_NOEXIST: Create a new element only if it did not exist.
- BPF_EXIST: Update an existing element.
bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
int bpf_map_lookup_elem(int fd, const void *key, void *value)
Returns ``struct xdp_sock *`` or negative error in case of failure.
bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c
int bpf_map_delete_elem(int fd, const void *key)
XSK entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or negative error in case of
failure.
.. note::
When `libxdp`_ deletes an XSK it also removes the associated socket
entry from the XSKMAP.
Examples
========
Kernel
------
The following code snippet shows how to declare a ``BPF_MAP_TYPE_XSKMAP`` called
``xsks_map`` and how to redirect packets to an XSK.
.. code-block:: c
struct {
__uint(type, BPF_MAP_TYPE_XSKMAP);
__type(key, __u32);
__type(value, __u32);
__uint(max_entries, 64);
} xsks_map SEC(".maps");
SEC("xdp")
int xsk_redir_prog(struct xdp_md *ctx)
{
__u32 index = ctx->rx_queue_index;
if (bpf_map_lookup_elem(&xsks_map, &index))
return bpf_redirect_map(&xsks_map, index, 0);
return XDP_PASS;
}
User space
----------
The following code snippet shows how to update an XSKMAP with an XSK entry.
.. code-block:: c
int update_xsks_map(struct bpf_map *xsks_map, int queue_id, int xsk_fd)
{
int ret;
ret = bpf_map_update_elem(bpf_map__fd(xsks_map), &queue_id, &xsk_fd, 0);
if (ret < 0)
fprintf(stderr, "Failed to update xsks_map: %s\n", strerror(errno));
return ret;
}
For an example on how create AF_XDP sockets, please see the AF_XDP-example and
AF_XDP-forwarding programs in the `bpf-examples`_ directory in the `libxdp`_ repository.
For a detailed explaination of the AF_XDP interface please see:
- `libxdp-readme`_.
- `AF_XDP`_ kernel documentation.
.. note::
The most comprehensive resource for using XSKMAPs and AF_XDP is `libxdp`_.
.. _libxdp: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp
.. _AF_XDP: https://www.kernel.org/doc/html/latest/networking/af_xdp.html
.. _bpf-examples: https://github.com/xdp-project/bpf-examples
.. _libxdp-readme: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp#using-af_xdp-sockets

View file

@ -7,3 +7,6 @@ Program Types
:glob:
prog_*
For a list of all program types, see :ref:`program_types_and_elf` in
the :ref:`libbpf` documentation.

View file

@ -0,0 +1,81 @@
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.
========
Redirect
========
XDP_REDIRECT
############
Supported maps
--------------
XDP_REDIRECT works with the following map types:
- ``BPF_MAP_TYPE_DEVMAP``
- ``BPF_MAP_TYPE_DEVMAP_HASH``
- ``BPF_MAP_TYPE_CPUMAP``
- ``BPF_MAP_TYPE_XSKMAP``
For more information on these maps, please see the specific map documentation.
Process
-------
.. kernel-doc:: net/core/filter.c
:doc: xdp redirect
.. note::
Not all drivers support transmitting frames after a redirect, and for
those that do, not all of them support non-linear frames. Non-linear xdp
bufs/frames are bufs/frames that contain more than one fragment.
Debugging packet drops
----------------------
Silent packet drops for XDP_REDIRECT can be debugged using:
- bpf_trace
- perf_record
bpf_trace
^^^^^^^^^
The following bpftrace command can be used to capture and count all XDP tracepoints:
.. code-block:: none
sudo bpftrace -e 'tracepoint:xdp:* { @cnt[probe] = count(); }'
Attaching 12 probes...
^C
@cnt[tracepoint:xdp:mem_connect]: 18
@cnt[tracepoint:xdp:mem_disconnect]: 18
@cnt[tracepoint:xdp:xdp_exception]: 19605
@cnt[tracepoint:xdp:xdp_devmap_xmit]: 1393604
@cnt[tracepoint:xdp:xdp_redirect]: 22292200
.. note::
The various xdp tracepoints can be found in ``source/include/trace/events/xdp.h``
The following bpftrace command can be used to extract the ``ERRNO`` being returned as
part of the err parameter:
.. code-block:: none
sudo bpftrace -e \
'tracepoint:xdp:xdp_redirect*_err {@redir_errno[-args->err] = count();}
tracepoint:xdp:xdp_devmap_xmit {@devmap_errno[-args->err] = count();}'
perf record
^^^^^^^^^^^
The perf tool also supports recording tracepoints:
.. code-block:: none
perf record -a -e xdp:xdp_redirect_err \
-e xdp:xdp_redirect_map_err \
-e xdp:xdp_exception \
-e xdp:xdp_devmap_xmit
References
===========
- https://github.com/xdp-project/xdp-tutorial/tree/master/tracing02-xdp-monitor

View file

@ -54,6 +54,8 @@ struct cgroup;
extern struct idr btf_idr;
extern spinlock_t btf_idr_lock;
extern struct kobject *btf_kobj;
extern struct bpf_mem_alloc bpf_global_ma;
extern bool bpf_global_ma_set;
typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
@ -85,7 +87,8 @@ struct bpf_map_ops {
int (*map_lookup_and_delete_batch)(struct bpf_map *map,
const union bpf_attr *attr,
union bpf_attr __user *uattr);
int (*map_update_batch)(struct bpf_map *map, const union bpf_attr *attr,
int (*map_update_batch)(struct bpf_map *map, struct file *map_file,
const union bpf_attr *attr,
union bpf_attr __user *uattr);
int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
union bpf_attr __user *uattr);
@ -135,7 +138,7 @@ struct bpf_map_ops {
struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void *owner);
/* Misc helpers.*/
int (*map_redirect)(struct bpf_map *map, u32 ifindex, u64 flags);
int (*map_redirect)(struct bpf_map *map, u64 key, u64 flags);
/* map_meta_equal must be implemented for maps that can be
* used as an inner map. It is a runtime check to ensure
@ -165,9 +168,8 @@ struct bpf_map_ops {
};
enum {
/* Support at most 8 pointers in a BTF type */
BTF_FIELDS_MAX = 10,
BPF_MAP_OFF_ARR_MAX = BTF_FIELDS_MAX,
/* Support at most 10 fields in a BTF type */
BTF_FIELDS_MAX = 10,
};
enum btf_field_type {
@ -176,6 +178,8 @@ enum btf_field_type {
BPF_KPTR_UNREF = (1 << 2),
BPF_KPTR_REF = (1 << 3),
BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF,
BPF_LIST_HEAD = (1 << 4),
BPF_LIST_NODE = (1 << 5),
};
struct btf_field_kptr {
@ -185,11 +189,19 @@ struct btf_field_kptr {
u32 btf_id;
};
struct btf_field_list_head {
struct btf *btf;
u32 value_btf_id;
u32 node_offset;
struct btf_record *value_rec;
};
struct btf_field {
u32 offset;
enum btf_field_type type;
union {
struct btf_field_kptr kptr;
struct btf_field_list_head list_head;
};
};
@ -203,8 +215,8 @@ struct btf_record {
struct btf_field_offs {
u32 cnt;
u32 field_off[BPF_MAP_OFF_ARR_MAX];
u8 field_sz[BPF_MAP_OFF_ARR_MAX];
u32 field_off[BTF_FIELDS_MAX];
u8 field_sz[BTF_FIELDS_MAX];
};
struct bpf_map {
@ -267,6 +279,10 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
return "kptr";
case BPF_LIST_HEAD:
return "bpf_list_head";
case BPF_LIST_NODE:
return "bpf_list_node";
default:
WARN_ON_ONCE(1);
return "unknown";
@ -283,6 +299,10 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
return sizeof(u64);
case BPF_LIST_HEAD:
return sizeof(struct bpf_list_head);
case BPF_LIST_NODE:
return sizeof(struct bpf_list_node);
default:
WARN_ON_ONCE(1);
return 0;
@ -299,6 +319,10 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
return __alignof__(u64);
case BPF_LIST_HEAD:
return __alignof__(struct bpf_list_head);
case BPF_LIST_NODE:
return __alignof__(struct bpf_list_node);
default:
WARN_ON_ONCE(1);
return 0;
@ -312,16 +336,19 @@ static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_f
return rec->field_mask & type;
}
static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
{
int i;
if (!foffs)
return;
for (i = 0; i < foffs->cnt; i++)
memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
}
static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
{
if (!IS_ERR_OR_NULL(map->record)) {
struct btf_field *fields = map->record->fields;
u32 cnt = map->record->cnt;
int i;
for (i = 0; i < cnt; i++)
memset(dst + fields[i].offset, 0, btf_field_type_size(fields[i].type));
}
bpf_obj_init(map->field_offs, dst);
}
/* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
@ -361,7 +388,7 @@ static inline void bpf_obj_memcpy(struct btf_field_offs *foffs,
u32 sz = next_off - curr_off;
memcpy(dst + curr_off, src + curr_off, sz);
curr_off = next_off + foffs->field_sz[i];
curr_off += foffs->field_sz[i] + sz;
}
memcpy(dst + curr_off, src + curr_off, size - curr_off);
}
@ -391,7 +418,7 @@ static inline void bpf_obj_memzero(struct btf_field_offs *foffs, void *dst, u32
u32 sz = next_off - curr_off;
memset(dst + curr_off, 0, sz);
curr_off = next_off + foffs->field_sz[i];
curr_off += foffs->field_sz[i] + sz;
}
memset(dst + curr_off, 0, size - curr_off);
}
@ -404,6 +431,9 @@ static inline void zero_map_value(struct bpf_map *map, void *dst)
void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
bool lock_src);
void bpf_timer_cancel_and_free(void *timer);
void bpf_list_head_free(const struct btf_field *field, void *list_head,
struct bpf_spin_lock *spin_lock);
int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size);
struct bpf_offload_dev;
@ -472,10 +502,8 @@ enum bpf_type_flag {
*/
MEM_RDONLY = BIT(1 + BPF_BASE_TYPE_BITS),
/* MEM was "allocated" from a different helper, and cannot be mixed
* with regular non-MEM_ALLOC'ed MEM types.
*/
MEM_ALLOC = BIT(2 + BPF_BASE_TYPE_BITS),
/* MEM points to BPF ring buffer reservation. */
MEM_RINGBUF = BIT(2 + BPF_BASE_TYPE_BITS),
/* MEM is in user address space. */
MEM_USER = BIT(3 + BPF_BASE_TYPE_BITS),
@ -510,6 +538,43 @@ enum bpf_type_flag {
/* Size is known at compile time. */
MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS),
/* MEM is of an allocated object of type in program BTF. This is used to
* tag PTR_TO_BTF_ID allocated using bpf_obj_new.
*/
MEM_ALLOC = BIT(11 + BPF_BASE_TYPE_BITS),
/* PTR was passed from the kernel in a trusted context, and may be
* passed to KF_TRUSTED_ARGS kfuncs or BPF helper functions.
* Confusingly, this is _not_ the opposite of PTR_UNTRUSTED above.
* PTR_UNTRUSTED refers to a kptr that was read directly from a map
* without invoking bpf_kptr_xchg(). What we really need to know is
* whether a pointer is safe to pass to a kfunc or BPF helper function.
* While PTR_UNTRUSTED pointers are unsafe to pass to kfuncs and BPF
* helpers, they do not cover all possible instances of unsafe
* pointers. For example, a pointer that was obtained from walking a
* struct will _not_ get the PTR_UNTRUSTED type modifier, despite the
* fact that it may be NULL, invalid, etc. This is due to backwards
* compatibility requirements, as this was the behavior that was first
* introduced when kptrs were added. The behavior is now considered
* deprecated, and PTR_UNTRUSTED will eventually be removed.
*
* PTR_TRUSTED, on the other hand, is a pointer that the kernel
* guarantees to be valid and safe to pass to kfuncs and BPF helpers.
* For example, pointers passed to tracepoint arguments are considered
* PTR_TRUSTED, as are pointers that are passed to struct_ops
* callbacks. As alluded to above, pointers that are obtained from
* walking PTR_TRUSTED pointers are _not_ trusted. For example, if a
* struct task_struct *task is PTR_TRUSTED, then accessing
* task->last_wakee will lose the PTR_TRUSTED modifier when it's stored
* in a BPF register. Similarly, pointers passed to certain programs
* types such as kretprobes are not guaranteed to be valid, as they may
* for example contain an object that was recently freed.
*/
PTR_TRUSTED = BIT(12 + BPF_BASE_TYPE_BITS),
/* MEM is tagged with rcu and memory access needs rcu_read_lock protection. */
MEM_RCU = BIT(13 + BPF_BASE_TYPE_BITS),
__BPF_TYPE_FLAG_MAX,
__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
};
@ -549,7 +614,7 @@ enum bpf_arg_type {
ARG_PTR_TO_LONG, /* pointer to long */
ARG_PTR_TO_SOCKET, /* pointer to bpf_sock (fullsock) */
ARG_PTR_TO_BTF_ID, /* pointer to in-kernel struct */
ARG_PTR_TO_ALLOC_MEM, /* pointer to dynamically allocated memory */
ARG_PTR_TO_RINGBUF_MEM, /* pointer to dynamically reserved ringbuf memory */
ARG_CONST_ALLOC_SIZE_OR_ZERO, /* number of allocated bytes requested */
ARG_PTR_TO_BTF_ID_SOCK_COMMON, /* pointer to in-kernel sock_common or bpf-mirrored bpf_sock */
ARG_PTR_TO_PERCPU_BTF_ID, /* pointer to in-kernel percpu type */
@ -566,7 +631,6 @@ enum bpf_arg_type {
ARG_PTR_TO_MEM_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_MEM,
ARG_PTR_TO_CTX_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_CTX,
ARG_PTR_TO_SOCKET_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
ARG_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
ARG_PTR_TO_STACK_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
ARG_PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID,
/* pointer to memory does not need to be initialized, helper function must fill
@ -591,7 +655,7 @@ enum bpf_return_type {
RET_PTR_TO_SOCKET, /* returns a pointer to a socket */
RET_PTR_TO_TCP_SOCK, /* returns a pointer to a tcp_sock */
RET_PTR_TO_SOCK_COMMON, /* returns a pointer to a sock_common */
RET_PTR_TO_ALLOC_MEM, /* returns a pointer to dynamically allocated memory */
RET_PTR_TO_MEM, /* returns a pointer to memory */
RET_PTR_TO_MEM_OR_BTF_ID, /* returns a pointer to a valid memory or a btf_id */
RET_PTR_TO_BTF_ID, /* returns a pointer to a btf_id */
__BPF_RET_TYPE_MAX,
@ -601,9 +665,10 @@ enum bpf_return_type {
RET_PTR_TO_SOCKET_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_SOCKET,
RET_PTR_TO_TCP_SOCK_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_TCP_SOCK,
RET_PTR_TO_SOCK_COMMON_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_SOCK_COMMON,
RET_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | MEM_ALLOC | RET_PTR_TO_ALLOC_MEM,
RET_PTR_TO_DYNPTR_MEM_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_ALLOC_MEM,
RET_PTR_TO_RINGBUF_MEM_OR_NULL = PTR_MAYBE_NULL | MEM_RINGBUF | RET_PTR_TO_MEM,
RET_PTR_TO_DYNPTR_MEM_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_MEM,
RET_PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
RET_PTR_TO_BTF_ID_TRUSTED = PTR_TRUSTED | RET_PTR_TO_BTF_ID,
/* This must be the last entry. Its purpose is to ensure the enum is
* wide enough to hold the higher bits reserved for bpf_type_flag.
@ -620,6 +685,7 @@ struct bpf_func_proto {
u64 (*func)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
bool gpl_only;
bool pkt_access;
bool might_sleep;
enum bpf_return_type ret_type;
union {
struct {
@ -758,6 +824,7 @@ struct bpf_prog_ops {
union bpf_attr __user *uattr);
};
struct bpf_reg_state;
struct bpf_verifier_ops {
/* return eBPF function prototype for verification */
const struct bpf_func_proto *
@ -779,9 +846,8 @@ struct bpf_verifier_ops {
struct bpf_insn *dst,
struct bpf_prog *prog, u32 *target_size);
int (*btf_struct_access)(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype,
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype,
u32 *next_btf_id, enum bpf_type_flag *flag);
};
@ -1794,7 +1860,7 @@ void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
int generic_map_lookup_batch(struct bpf_map *map,
const union bpf_attr *attr,
union bpf_attr __user *uattr);
int generic_map_update_batch(struct bpf_map *map,
int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
const union bpf_attr *attr,
union bpf_attr __user *uattr);
int generic_map_delete_batch(struct bpf_map *map,
@ -2085,9 +2151,9 @@ static inline bool bpf_tracing_btf_ctx_access(int off, int size,
return btf_ctx_access(off, size, type, prog, info);
}
int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype,
int btf_struct_access(struct bpf_verifier_log *log,
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype,
u32 *next_btf_id, enum bpf_type_flag *flag);
bool btf_struct_ids_match(struct bpf_verifier_log *log,
const struct btf *btf, u32 id, int off,
@ -2100,22 +2166,11 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
const char *func_name,
struct btf_func_model *m);
struct bpf_kfunc_arg_meta {
u64 r0_size;
bool r0_rdonly;
int ref_obj_id;
u32 flags;
};
struct bpf_reg_state;
int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *regs);
int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *regs);
int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
const struct btf *btf, u32 func_id,
struct bpf_reg_state *regs,
struct bpf_kfunc_arg_meta *meta);
int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *reg);
int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
@ -2338,9 +2393,8 @@ static inline struct bpf_prog *bpf_prog_by_id(u32 id)
}
static inline int btf_struct_access(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype,
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype,
u32 *next_btf_id, enum bpf_type_flag *flag)
{
return -EACCES;
@ -2797,4 +2851,10 @@ struct bpf_key {
bool has_ref;
};
#endif /* CONFIG_KEYS */
static inline bool type_is_alloc(u32 type)
{
return type & MEM_ALLOC;
}
#endif /* _LINUX_BPF_H */

View file

@ -19,7 +19,7 @@
*/
#define BPF_MAX_VAR_SIZ (1 << 29)
/* size of type_str_buf in bpf_verifier. */
#define TYPE_STR_BUF_LEN 64
#define TYPE_STR_BUF_LEN 128
/* Liveness marks, used for registers and spilled-regs (in stack slots).
* Read marks propagate upwards until they find a write mark; they record that
@ -223,6 +223,11 @@ struct bpf_reference_state {
* exiting a callback function.
*/
int callback_ref;
/* Mark the reference state to release the registers sharing the same id
* on bpf_spin_unlock (for nodes that we will lose ownership to but are
* safe to access inside the critical section).
*/
bool release_on_unlock;
};
/* state of the program:
@ -323,8 +328,23 @@ struct bpf_verifier_state {
u32 branches;
u32 insn_idx;
u32 curframe;
u32 active_spin_lock;
/* For every reg representing a map value or allocated object pointer,
* we consider the tuple of (ptr, id) for them to be unique in verifier
* context and conside them to not alias each other for the purposes of
* tracking lock state.
*/
struct {
/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
* there's no active lock held, and other fields have no
* meaning. If non-NULL, it indicates that a lock is held and
* id member has the reg->id of the register which can be >= 0.
*/
void *ptr;
/* This will be reg->id */
u32 id;
} active_lock;
bool speculative;
bool active_rcu_lock;
/* first and last insn idx of this verifier state */
u32 first_insn_idx;
@ -419,11 +439,14 @@ struct bpf_insn_aux_data {
*/
struct bpf_loop_inline_state loop_inline_state;
};
u64 obj_new_size; /* remember the size of type passed to bpf_obj_new to rewrite R1 */
struct btf_struct_meta *kptr_struct_meta;
u64 map_key_state; /* constant (32 bit) key tracking for maps */
int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */
bool zext_dst; /* this insn zero extends dst reg */
bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
u8 alu_state; /* used in combination with alu_limit */
/* below fields are initialized once */
@ -513,6 +536,7 @@ struct bpf_verifier_env {
bool bypass_spec_v1;
bool bypass_spec_v4;
bool seen_direct_write;
bool rcu_tag_supported;
struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
const struct bpf_line_info *prev_linfo;
struct bpf_verifier_log log;
@ -589,8 +613,6 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
int check_func_arg_reg_off(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg, int regno,
enum bpf_arg_type arg_type);
int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
u32 regno);
int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
u32 regno, u32 mem_size);
bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env,
@ -661,4 +683,11 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
}
}
#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | MEM_RCU | PTR_TRUSTED)
static inline bool bpf_type_has_unsafe_modifiers(u32 type)
{
return type_flag(type) & ~BPF_REG_TRUSTED_MODIFIERS;
}
#endif /* _LINUX_BPF_VERIFIER_H */

View file

@ -6,6 +6,8 @@
#include <linux/types.h>
#include <linux/bpfptr.h>
#include <linux/bsearch.h>
#include <linux/btf_ids.h>
#include <uapi/linux/btf.h>
#include <uapi/linux/bpf.h>
@ -17,36 +19,53 @@
#define KF_RELEASE (1 << 1) /* kfunc is a release function */
#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */
#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */
/* Trusted arguments are those which are meant to be referenced arguments with
* unchanged offset. It is used to enforce that pointers obtained from acquire
* kfuncs remain unmodified when being passed to helpers taking trusted args.
/* Trusted arguments are those which are guaranteed to be valid when passed to
* the kfunc. It is used to enforce that pointers obtained from either acquire
* kfuncs, or from the main kernel on a tracepoint or struct_ops callback
* invocation, remain unmodified when being passed to helpers taking trusted
* args.
*
* Consider
* struct foo {
* int data;
* struct foo *next;
* };
* Consider, for example, the following new task tracepoint:
*
* struct bar {
* int data;
* struct foo f;
* };
* SEC("tp_btf/task_newtask")
* int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
* {
* ...
* }
*
* struct foo *f = alloc_foo(); // Acquire kfunc
* struct bar *b = alloc_bar(); // Acquire kfunc
* And the following kfunc:
*
* If a kfunc set_foo_data() wants to operate only on the allocated object, it
* will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
* BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
*
* set_foo_data(f, 42); // Allowed
* set_foo_data(f->next, 42); // Rejected, non-referenced pointer
* set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type
* set_foo_data(&b->f, 42); // Rejected, referenced, but bad offset
* All invocations to the kfunc must pass the unmodified, unwalked task:
*
* In the final case, usually for the purposes of type matching, it is deduced
* by looking at the type of the member at the offset, but due to the
* requirement of trusted argument, this deduction will be strict and not done
* for this case.
* bpf_task_acquire(task); // Allowed
* bpf_task_acquire(task->last_wakee); // Rejected, walked task
*
* Programs may also pass referenced tasks directly to the kfunc:
*
* struct task_struct *acquired;
*
* acquired = bpf_task_acquire(task); // Allowed, same as above
* bpf_task_acquire(acquired); // Allowed
* bpf_task_acquire(task); // Allowed
* bpf_task_acquire(acquired->last_wakee); // Rejected, walked task
*
* Programs may _not_, however, pass a task from an arbitrary fentry/fexit, or
* kprobe/kretprobe to the kfunc, as BPF cannot guarantee that all of these
* pointers are guaranteed to be safe. For example, the following BPF program
* would be rejected:
*
* SEC("kretprobe/free_task")
* int BPF_PROG(free_task_probe, struct task_struct *tsk)
* {
* struct task_struct *acquired;
*
* acquired = bpf_task_acquire(acquired); // Rejected, not a trusted pointer
* bpf_task_release(acquired);
*
* return 0;
* }
*/
#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */
@ -78,6 +97,17 @@ struct btf_id_dtor_kfunc {
u32 kfunc_btf_id;
};
struct btf_struct_meta {
u32 btf_id;
struct btf_record *record;
struct btf_field_offs *field_offs;
};
struct btf_struct_metas {
u32 cnt;
struct btf_struct_meta types[];
};
typedef void (*btf_dtor_kfunc_t)(void *);
extern const struct file_operations btf_fops;
@ -165,6 +195,7 @@ int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
int btf_find_timer(const struct btf *btf, const struct btf_type *t);
struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
u32 field_mask, u32 value_size);
int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec);
struct btf_field_offs *btf_parse_field_offs(struct btf_record *rec);
bool btf_type_is_void(const struct btf_type *t);
s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
@ -324,6 +355,16 @@ static inline bool btf_type_is_struct(const struct btf_type *t)
return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
}
static inline bool __btf_type_is_struct(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT;
}
static inline bool btf_type_is_array(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
}
static inline u16 btf_type_vlen(const struct btf_type *t)
{
return BTF_INFO_VLEN(t->info);
@ -408,9 +449,27 @@ static inline struct btf_param *btf_params(const struct btf_type *t)
return (struct btf_param *)(t + 1);
}
#ifdef CONFIG_BPF_SYSCALL
struct bpf_prog;
static inline int btf_id_cmp_func(const void *a, const void *b)
{
const int *pa = a, *pb = b;
return *pa - *pb;
}
static inline bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
{
return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
}
static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
{
return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
}
struct bpf_prog;
struct bpf_verifier_log;
#ifdef CONFIG_BPF_SYSCALL
const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
const char *btf_name_by_offset(const struct btf *btf, u32 offset);
struct btf *btf_parse_vmlinux(void);
@ -423,6 +482,14 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
struct module *owner);
struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id);
const struct btf_member *
btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
const struct btf_type *t, enum bpf_prog_type prog_type,
int arg);
int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_type);
bool btf_types_are_same(const struct btf *btf1, u32 id1,
const struct btf *btf2, u32 id2);
#else
static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
u32 type_id)
@ -454,6 +521,26 @@ static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dt
{
return 0;
}
static inline struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id)
{
return NULL;
}
static inline const struct btf_member *
btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
const struct btf_type *t, enum bpf_prog_type prog_type,
int arg)
{
return NULL;
}
static inline int get_kern_ctx_btf_id(struct bpf_verifier_log *log,
enum bpf_prog_type prog_type) {
return -EINVAL;
}
static inline bool btf_types_are_same(const struct btf *btf1, u32 id1,
const struct btf *btf2, u32 id2)
{
return false;
}
#endif
static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)

View file

@ -204,7 +204,7 @@ extern struct btf_id_set8 name;
#else
#define BTF_ID_LIST(name) static u32 __maybe_unused name[5];
#define BTF_ID_LIST(name) static u32 __maybe_unused name[16];
#define BTF_ID(prefix, name)
#define BTF_ID_FLAGS(prefix, name, ...)
#define BTF_ID_UNUSED

View file

@ -49,7 +49,8 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { }
# endif
# define __iomem
# define __percpu BTF_TYPE_TAG(percpu)
# define __rcu
# define __rcu BTF_TYPE_TAG(rcu)
# define __chk_user_ptr(x) (void)0
# define __chk_io_ptr(x) (void)0
/* context/locking */

View file

@ -568,10 +568,10 @@ struct sk_filter {
DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
extern struct mutex nf_conn_btf_access_lock;
extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype, u32 *next_btf_id,
enum bpf_type_flag *flag);
extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype,
u32 *next_btf_id, enum bpf_type_flag *flag);
typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx,
const struct bpf_insn *insnsi,
@ -643,13 +643,13 @@ struct bpf_nh_params {
};
struct bpf_redirect_info {
u32 flags;
u32 tgt_index;
u64 tgt_index;
void *tgt_value;
struct bpf_map *map;
u32 flags;
u32 kern_flags;
u32 map_id;
enum bpf_map_type map_type;
u32 kern_flags;
struct bpf_nh_params nh;
};
@ -1504,7 +1504,7 @@ static inline bool bpf_sk_lookup_run_v6(struct net *net, int protocol,
}
#endif /* IS_ENABLED(CONFIG_IPV6) */
static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifindex,
static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u64 index,
u64 flags, const u64 flag_mask,
void *lookup_elem(struct bpf_map *map, u32 key))
{
@ -1515,7 +1515,7 @@ static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifind
if (unlikely(flags & ~(action_mask | flag_mask)))
return XDP_ABORTED;
ri->tgt_value = lookup_elem(map, ifindex);
ri->tgt_value = lookup_elem(map, index);
if (unlikely(!ri->tgt_value) && !(flags & BPF_F_BROADCAST)) {
/* If the lookup fails we want to clear out the state in the
* redirect_info struct completely, so that if an eBPF program
@ -1527,7 +1527,7 @@ static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifind
return flags & action_mask;
}
ri->tgt_index = ifindex;
ri->tgt_index = index;
ri->map_id = map->id;
ri->map_type = map->map_type;

View file

@ -3135,7 +3135,6 @@ struct softnet_data {
/* stats */
unsigned int processed;
unsigned int time_squeeze;
unsigned int received_rps;
#ifdef CONFIG_RPS
struct softnet_data *rps_ipi_list;
#endif
@ -3168,6 +3167,7 @@ struct softnet_data {
unsigned int cpu;
unsigned int input_queue_tail;
#endif
unsigned int received_rps;
unsigned int dropped;
struct sk_buff_head input_pkt_queue;
struct napi_struct backlog;

View file

@ -2584,14 +2584,19 @@ union bpf_attr {
* * **SOL_SOCKET**, which supports the following *optname*\ s:
* **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
* **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**,
* **SO_BINDTODEVICE**, **SO_KEEPALIVE**.
* **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**,
* **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**.
* * **IPPROTO_TCP**, which supports the following *optname*\ s:
* **TCP_CONGESTION**, **TCP_BPF_IW**,
* **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
* **TCP_KEEPIDLE**, **TCP_KEEPINTVL**, **TCP_KEEPCNT**,
* **TCP_SYNCNT**, **TCP_USER_TIMEOUT**, **TCP_NOTSENT_LOWAT**.
* **TCP_SYNCNT**, **TCP_USER_TIMEOUT**, **TCP_NOTSENT_LOWAT**,
* **TCP_NODELAY**, **TCP_MAXSEG**, **TCP_WINDOW_CLAMP**,
* **TCP_THIN_LINEAR_TIMEOUTS**, **TCP_BPF_DELACK_MAX**,
* **TCP_BPF_RTO_MIN**.
* * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
* * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
* * **IPPROTO_IPV6**, which supports the following *optname*\ s:
* **IPV6_TCLASS**, **IPV6_AUTOFLOWLABEL**.
* Return
* 0 on success, or a negative error in case of failure.
*
@ -2647,7 +2652,7 @@ union bpf_attr {
* Return
* 0 on success, or a negative error in case of failure.
*
* long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
* long bpf_redirect_map(struct bpf_map *map, u64 key, u64 flags)
* Description
* Redirect the packet to the endpoint referenced by *map* at
* index *key*. Depending on its type, this *map* can contain
@ -2808,12 +2813,10 @@ union bpf_attr {
* and **BPF_CGROUP_INET6_CONNECT**.
*
* This helper actually implements a subset of **getsockopt()**.
* It supports the following *level*\ s:
*
* * **IPPROTO_TCP**, which supports *optname*
* **TCP_CONGESTION**.
* * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
* * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
* It supports the same set of *optname*\ s that is supported by
* the **bpf_setsockopt**\ () helper. The exceptions are
* **TCP_BPF_*** is **bpf_setsockopt**\ () only and
* **TCP_SAVED_SYN** is **bpf_getsockopt**\ () only.
* Return
* 0 on success, or a negative error in case of failure.
*
@ -6888,6 +6891,16 @@ struct bpf_dynptr {
__u64 :64;
} __attribute__((aligned(8)));
struct bpf_list_head {
__u64 :64;
__u64 :64;
} __attribute__((aligned(8)));
struct bpf_list_node {
__u64 :64;
__u64 :64;
} __attribute__((aligned(8)));
struct bpf_sysctl {
__u32 write; /* Sysctl is being read (= 0) or written (= 1).
* Allows 1,2,4-byte read, but no write.

View file

@ -430,7 +430,6 @@ static void array_map_free(struct bpf_map *map)
for (i = 0; i < array->map.max_entries; i++)
bpf_obj_free_fields(map->record, array_map_elem_ptr(array, i));
}
bpf_map_free_record(map);
}
if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)

View file

@ -151,6 +151,7 @@ BTF_ID_LIST_SINGLE(bpf_ima_inode_hash_btf_ids, struct, inode)
static const struct bpf_func_proto bpf_ima_inode_hash_proto = {
.func = bpf_ima_inode_hash,
.gpl_only = false,
.might_sleep = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_BTF_ID,
.arg1_btf_id = &bpf_ima_inode_hash_btf_ids[0],
@ -169,6 +170,7 @@ BTF_ID_LIST_SINGLE(bpf_ima_file_hash_btf_ids, struct, file)
static const struct bpf_func_proto bpf_ima_file_hash_proto = {
.func = bpf_ima_file_hash,
.gpl_only = false,
.might_sleep = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_BTF_ID,
.arg1_btf_id = &bpf_ima_file_hash_btf_ids[0],
@ -221,9 +223,9 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_bprm_opts_set:
return &bpf_bprm_opts_set_proto;
case BPF_FUNC_ima_inode_hash:
return prog->aux->sleepable ? &bpf_ima_inode_hash_proto : NULL;
return &bpf_ima_inode_hash_proto;
case BPF_FUNC_ima_file_hash:
return prog->aux->sleepable ? &bpf_ima_file_hash_proto : NULL;
return &bpf_ima_file_hash_proto;
case BPF_FUNC_get_attach_cookie:
return bpf_prog_has_trampoline(prog) ? &bpf_get_attach_cookie_proto : NULL;
#ifdef CONFIG_NET

File diff suppressed because it is too large Load diff

View file

@ -164,16 +164,30 @@ static int cgroup_iter_seq_init(void *priv, struct bpf_iter_aux_info *aux)
struct cgroup_iter_priv *p = (struct cgroup_iter_priv *)priv;
struct cgroup *cgrp = aux->cgroup.start;
/* bpf_iter_attach_cgroup() has already acquired an extra reference
* for the start cgroup, but the reference may be released after
* cgroup_iter_seq_init(), so acquire another reference for the
* start cgroup.
*/
p->start_css = &cgrp->self;
css_get(p->start_css);
p->terminate = false;
p->visited_all = false;
p->order = aux->cgroup.order;
return 0;
}
static void cgroup_iter_seq_fini(void *priv)
{
struct cgroup_iter_priv *p = (struct cgroup_iter_priv *)priv;
css_put(p->start_css);
}
static const struct bpf_iter_seq_info cgroup_iter_seq_info = {
.seq_ops = &cgroup_iter_seq_ops,
.init_seq_private = cgroup_iter_seq_init,
.fini_seq_private = cgroup_iter_seq_fini,
.seq_priv_size = sizeof(struct cgroup_iter_priv),
};

View file

@ -34,6 +34,7 @@
#include <linux/log2.h>
#include <linux/bpf_verifier.h>
#include <linux/nodemask.h>
#include <linux/bpf_mem_alloc.h>
#include <asm/barrier.h>
#include <asm/unaligned.h>
@ -60,6 +61,9 @@
#define CTX regs[BPF_REG_CTX]
#define IMM insn->imm
struct bpf_mem_alloc bpf_global_ma;
bool bpf_global_ma_set;
/* No hurry in this branch
*
* Exported for the bpf jit load helper.
@ -2746,6 +2750,18 @@ int __weak bpf_arch_text_invalidate(void *dst, size_t len)
return -ENOTSUPP;
}
#ifdef CONFIG_BPF_SYSCALL
static int __init bpf_global_ma_init(void)
{
int ret;
ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
bpf_global_ma_set = !ret;
return ret;
}
late_initcall(bpf_global_ma_init);
#endif
DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
EXPORT_SYMBOL(bpf_stats_enabled_key);

View file

@ -667,9 +667,9 @@ static int cpu_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
return 0;
}
static int cpu_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags)
static int cpu_map_redirect(struct bpf_map *map, u64 index, u64 flags)
{
return __bpf_xdp_redirect_map(map, ifindex, flags, 0,
return __bpf_xdp_redirect_map(map, index, flags, 0,
__cpu_map_lookup_elem);
}

View file

@ -992,14 +992,14 @@ static int dev_map_hash_update_elem(struct bpf_map *map, void *key, void *value,
map, key, value, map_flags);
}
static int dev_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags)
static int dev_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
{
return __bpf_xdp_redirect_map(map, ifindex, flags,
BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS,
__dev_map_lookup_elem);
}
static int dev_hash_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags)
static int dev_hash_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
{
return __bpf_xdp_redirect_map(map, ifindex, flags,
BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS,

View file

@ -1511,7 +1511,6 @@ static void htab_map_free(struct bpf_map *map)
prealloc_destroy(htab);
}
bpf_map_free_record(map);
free_percpu(htab->extra_elems);
bpf_map_area_free(htab->buckets);
bpf_mem_alloc_destroy(&htab->pcpu_ma);

View file

@ -4,6 +4,7 @@
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/bpf-cgroup.h>
#include <linux/cgroup.h>
#include <linux/rcupdate.h>
#include <linux/random.h>
#include <linux/smp.h>
@ -19,6 +20,7 @@
#include <linux/proc_ns.h>
#include <linux/security.h>
#include <linux/btf_ids.h>
#include <linux/bpf_mem_alloc.h>
#include "../../lib/kstrtox.h"
@ -336,6 +338,7 @@ const struct bpf_func_proto bpf_spin_lock_proto = {
.gpl_only = false,
.ret_type = RET_VOID,
.arg1_type = ARG_PTR_TO_SPIN_LOCK,
.arg1_btf_id = BPF_PTR_POISON,
};
static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock)
@ -358,6 +361,7 @@ const struct bpf_func_proto bpf_spin_unlock_proto = {
.gpl_only = false,
.ret_type = RET_VOID,
.arg1_type = ARG_PTR_TO_SPIN_LOCK,
.arg1_btf_id = BPF_PTR_POISON,
};
void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
@ -657,6 +661,7 @@ BPF_CALL_3(bpf_copy_from_user, void *, dst, u32, size,
const struct bpf_func_proto bpf_copy_from_user_proto = {
.func = bpf_copy_from_user,
.gpl_only = false,
.might_sleep = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
@ -687,6 +692,7 @@ BPF_CALL_5(bpf_copy_from_user_task, void *, dst, u32, size,
const struct bpf_func_proto bpf_copy_from_user_task_proto = {
.func = bpf_copy_from_user_task,
.gpl_only = true,
.might_sleep = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
@ -1706,20 +1712,367 @@ bpf_base_func_proto(enum bpf_func_id func_id)
}
}
BTF_SET8_START(tracing_btf_ids)
void bpf_list_head_free(const struct btf_field *field, void *list_head,
struct bpf_spin_lock *spin_lock)
{
struct list_head *head = list_head, *orig_head = list_head;
BUILD_BUG_ON(sizeof(struct list_head) > sizeof(struct bpf_list_head));
BUILD_BUG_ON(__alignof__(struct list_head) > __alignof__(struct bpf_list_head));
/* Do the actual list draining outside the lock to not hold the lock for
* too long, and also prevent deadlocks if tracing programs end up
* executing on entry/exit of functions called inside the critical
* section, and end up doing map ops that call bpf_list_head_free for
* the same map value again.
*/
__bpf_spin_lock_irqsave(spin_lock);
if (!head->next || list_empty(head))
goto unlock;
head = head->next;
unlock:
INIT_LIST_HEAD(orig_head);
__bpf_spin_unlock_irqrestore(spin_lock);
while (head != orig_head) {
void *obj = head;
obj -= field->list_head.node_offset;
head = head->next;
/* The contained type can also have resources, including a
* bpf_list_head which needs to be freed.
*/
bpf_obj_free_fields(field->list_head.value_rec, obj);
/* bpf_mem_free requires migrate_disable(), since we can be
* called from map free path as well apart from BPF program (as
* part of map ops doing bpf_obj_free_fields).
*/
migrate_disable();
bpf_mem_free(&bpf_global_ma, obj);
migrate_enable();
}
}
__diag_push();
__diag_ignore_all("-Wmissing-prototypes",
"Global functions as their definitions will be in vmlinux BTF");
void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
{
struct btf_struct_meta *meta = meta__ign;
u64 size = local_type_id__k;
void *p;
p = bpf_mem_alloc(&bpf_global_ma, size);
if (!p)
return NULL;
if (meta)
bpf_obj_init(meta->field_offs, p);
return p;
}
void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
{
struct btf_struct_meta *meta = meta__ign;
void *p = p__alloc;
if (meta)
bpf_obj_free_fields(meta->record, p);
bpf_mem_free(&bpf_global_ma, p);
}
static void __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head, bool tail)
{
struct list_head *n = (void *)node, *h = (void *)head;
if (unlikely(!h->next))
INIT_LIST_HEAD(h);
if (unlikely(!n->next))
INIT_LIST_HEAD(n);
tail ? list_add_tail(n, h) : list_add(n, h);
}
void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node)
{
return __bpf_list_add(node, head, false);
}
void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node)
{
return __bpf_list_add(node, head, true);
}
static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail)
{
struct list_head *n, *h = (void *)head;
if (unlikely(!h->next))
INIT_LIST_HEAD(h);
if (list_empty(h))
return NULL;
n = tail ? h->prev : h->next;
list_del_init(n);
return (struct bpf_list_node *)n;
}
struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head)
{
return __bpf_list_del(head, false);
}
struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
{
return __bpf_list_del(head, true);
}
/**
* bpf_task_acquire - Acquire a reference to a task. A task acquired by this
* kfunc which is not stored in a map as a kptr, must be released by calling
* bpf_task_release().
* @p: The task on which a reference is being acquired.
*/
struct task_struct *bpf_task_acquire(struct task_struct *p)
{
refcount_inc(&p->rcu_users);
return p;
}
/**
* bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
* kptr acquired by this kfunc which is not subsequently stored in a map, must
* be released by calling bpf_task_release().
* @pp: A pointer to a task kptr on which a reference is being acquired.
*/
struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
{
struct task_struct *p;
rcu_read_lock();
p = READ_ONCE(*pp);
/* Another context could remove the task from the map and release it at
* any time, including after we've done the lookup above. This is safe
* because we're in an RCU read region, so the task is guaranteed to
* remain valid until at least the rcu_read_unlock() below.
*/
if (p && !refcount_inc_not_zero(&p->rcu_users))
/* If the task had been removed from the map and freed as
* described above, refcount_inc_not_zero() will return false.
* The task will be freed at some point after the current RCU
* gp has ended, so just return NULL to the user.
*/
p = NULL;
rcu_read_unlock();
return p;
}
/**
* bpf_task_release - Release the reference acquired on a struct task_struct *.
* If this kfunc is invoked in an RCU read region, the task_struct is
* guaranteed to not be freed until the current grace period has ended, even if
* its refcount drops to 0.
* @p: The task on which a reference is being released.
*/
void bpf_task_release(struct task_struct *p)
{
if (!p)
return;
put_task_struct_rcu_user(p);
}
#ifdef CONFIG_CGROUPS
/**
* bpf_cgroup_acquire - Acquire a reference to a cgroup. A cgroup acquired by
* this kfunc which is not stored in a map as a kptr, must be released by
* calling bpf_cgroup_release().
* @cgrp: The cgroup on which a reference is being acquired.
*/
struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp)
{
cgroup_get(cgrp);
return cgrp;
}
/**
* bpf_cgroup_kptr_get - Acquire a reference on a struct cgroup kptr. A cgroup
* kptr acquired by this kfunc which is not subsequently stored in a map, must
* be released by calling bpf_cgroup_release().
* @cgrpp: A pointer to a cgroup kptr on which a reference is being acquired.
*/
struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp)
{
struct cgroup *cgrp;
rcu_read_lock();
/* Another context could remove the cgroup from the map and release it
* at any time, including after we've done the lookup above. This is
* safe because we're in an RCU read region, so the cgroup is
* guaranteed to remain valid until at least the rcu_read_unlock()
* below.
*/
cgrp = READ_ONCE(*cgrpp);
if (cgrp && !cgroup_tryget(cgrp))
/* If the cgroup had been removed from the map and freed as
* described above, cgroup_tryget() will return false. The
* cgroup will be freed at some point after the current RCU gp
* has ended, so just return NULL to the user.
*/
cgrp = NULL;
rcu_read_unlock();
return cgrp;
}
/**
* bpf_cgroup_release - Release the reference acquired on a struct cgroup *.
* If this kfunc is invoked in an RCU read region, the cgroup is guaranteed to
* not be freed until the current grace period has ended, even if its refcount
* drops to 0.
* @cgrp: The cgroup on which a reference is being released.
*/
void bpf_cgroup_release(struct cgroup *cgrp)
{
if (!cgrp)
return;
cgroup_put(cgrp);
}
/**
* bpf_cgroup_ancestor - Perform a lookup on an entry in a cgroup's ancestor
* array. A cgroup returned by this kfunc which is not subsequently stored in a
* map, must be released by calling bpf_cgroup_release().
* @cgrp: The cgroup for which we're performing a lookup.
* @level: The level of ancestor to look up.
*/
struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level)
{
struct cgroup *ancestor;
if (level > cgrp->level || level < 0)
return NULL;
ancestor = cgrp->ancestors[level];
cgroup_get(ancestor);
return ancestor;
}
#endif /* CONFIG_CGROUPS */
/**
* bpf_task_from_pid - Find a struct task_struct from its pid by looking it up
* in the root pid namespace idr. If a task is returned, it must either be
* stored in a map, or released with bpf_task_release().
* @pid: The pid of the task being looked up.
*/
struct task_struct *bpf_task_from_pid(s32 pid)
{
struct task_struct *p;
rcu_read_lock();
p = find_task_by_pid_ns(pid, &init_pid_ns);
if (p)
bpf_task_acquire(p);
rcu_read_unlock();
return p;
}
void *bpf_cast_to_kern_ctx(void *obj)
{
return obj;
}
void *bpf_rdonly_cast(void *obj__ign, u32 btf_id__k)
{
return obj__ign;
}
void bpf_rcu_read_lock(void)
{
rcu_read_lock();
}
void bpf_rcu_read_unlock(void)
{
rcu_read_unlock();
}
__diag_pop();
BTF_SET8_START(generic_btf_ids)
#ifdef CONFIG_KEXEC_CORE
BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
#endif
BTF_SET8_END(tracing_btf_ids)
BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_list_push_front)
BTF_ID_FLAGS(func, bpf_list_push_back)
BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
#ifdef CONFIG_CGROUPS
BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_SET8_END(generic_btf_ids)
static const struct btf_kfunc_id_set tracing_kfunc_set = {
static const struct btf_kfunc_id_set generic_kfunc_set = {
.owner = THIS_MODULE,
.set = &tracing_btf_ids,
.set = &generic_btf_ids,
};
BTF_ID_LIST(generic_dtor_ids)
BTF_ID(struct, task_struct)
BTF_ID(func, bpf_task_release)
#ifdef CONFIG_CGROUPS
BTF_ID(struct, cgroup)
BTF_ID(func, bpf_cgroup_release)
#endif
BTF_SET8_START(common_btf_ids)
BTF_ID_FLAGS(func, bpf_cast_to_kern_ctx)
BTF_ID_FLAGS(func, bpf_rdonly_cast)
BTF_ID_FLAGS(func, bpf_rcu_read_lock)
BTF_ID_FLAGS(func, bpf_rcu_read_unlock)
BTF_SET8_END(common_btf_ids)
static const struct btf_kfunc_id_set common_kfunc_set = {
.owner = THIS_MODULE,
.set = &common_btf_ids,
};
static int __init kfunc_init(void)
{
return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &tracing_kfunc_set);
int ret;
const struct btf_id_dtor_kfunc generic_dtors[] = {
{
.btf_id = generic_dtor_ids[0],
.kfunc_btf_id = generic_dtor_ids[1]
},
#ifdef CONFIG_CGROUPS
{
.btf_id = generic_dtor_ids[2],
.kfunc_btf_id = generic_dtor_ids[3]
},
#endif
};
ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &generic_kfunc_set);
ret = ret ?: register_btf_id_dtor_kfuncs(generic_dtors,
ARRAY_SIZE(generic_dtors),
THIS_MODULE);
return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &common_kfunc_set);
}
late_initcall(kfunc_init);

View file

@ -12,6 +12,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
struct bpf_map *inner_map, *inner_map_meta;
u32 inner_map_meta_size;
struct fd f;
int ret;
f = fdget(inner_map_ufd);
inner_map = __bpf_map_get(f);
@ -20,18 +21,13 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
/* Does not support >1 level map-in-map */
if (inner_map->inner_map_meta) {
fdput(f);
return ERR_PTR(-EINVAL);
ret = -EINVAL;
goto put;
}
if (!inner_map->ops->map_meta_equal) {
fdput(f);
return ERR_PTR(-ENOTSUPP);
}
if (btf_record_has_field(inner_map->record, BPF_SPIN_LOCK)) {
fdput(f);
return ERR_PTR(-ENOTSUPP);
ret = -ENOTSUPP;
goto put;
}
inner_map_meta_size = sizeof(*inner_map_meta);
@ -41,8 +37,8 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
inner_map_meta = kzalloc(inner_map_meta_size, GFP_USER);
if (!inner_map_meta) {
fdput(f);
return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
goto put;
}
inner_map_meta->map_type = inner_map->map_type;
@ -50,15 +46,33 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
inner_map_meta->value_size = inner_map->value_size;
inner_map_meta->map_flags = inner_map->map_flags;
inner_map_meta->max_entries = inner_map->max_entries;
inner_map_meta->record = btf_record_dup(inner_map->record);
if (IS_ERR(inner_map_meta->record)) {
/* btf_record_dup returns NULL or valid pointer in case of
* invalid/empty/valid, but ERR_PTR in case of errors. During
* equality NULL or IS_ERR is equivalent.
*/
fdput(f);
return ERR_CAST(inner_map_meta->record);
ret = PTR_ERR(inner_map_meta->record);
goto free;
}
if (inner_map_meta->record) {
struct btf_field_offs *field_offs;
/* If btf_record is !IS_ERR_OR_NULL, then field_offs is always
* valid.
*/
field_offs = kmemdup(inner_map->field_offs, sizeof(*inner_map->field_offs), GFP_KERNEL | __GFP_NOWARN);
if (!field_offs) {
ret = -ENOMEM;
goto free_rec;
}
inner_map_meta->field_offs = field_offs;
}
/* Note: We must use the same BTF, as we also used btf_record_dup above
* which relies on BTF being same for both maps, as some members like
* record->fields.list_head have pointers like value_rec pointing into
* inner_map->btf.
*/
if (inner_map->btf) {
btf_get(inner_map->btf);
inner_map_meta->btf = inner_map->btf;
@ -74,10 +88,18 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
fdput(f);
return inner_map_meta;
free_rec:
btf_record_free(inner_map_meta->record);
free:
kfree(inner_map_meta);
put:
fdput(f);
return ERR_PTR(ret);
}
void bpf_map_meta_free(struct bpf_map *map_meta)
{
kfree(map_meta->field_offs);
bpf_map_free_record(map_meta);
btf_put(map_meta->btf);
kfree(map_meta);

View file

@ -447,7 +447,7 @@ BPF_CALL_3(bpf_ringbuf_reserve, struct bpf_map *, map, u64, size, u64, flags)
const struct bpf_func_proto bpf_ringbuf_reserve_proto = {
.func = bpf_ringbuf_reserve,
.ret_type = RET_PTR_TO_ALLOC_MEM_OR_NULL,
.ret_type = RET_PTR_TO_RINGBUF_MEM_OR_NULL,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_CONST_ALLOC_SIZE_OR_ZERO,
.arg3_type = ARG_ANYTHING,
@ -490,7 +490,7 @@ BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
const struct bpf_func_proto bpf_ringbuf_submit_proto = {
.func = bpf_ringbuf_submit,
.ret_type = RET_VOID,
.arg1_type = ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE,
.arg1_type = ARG_PTR_TO_RINGBUF_MEM | OBJ_RELEASE,
.arg2_type = ARG_ANYTHING,
};
@ -503,7 +503,7 @@ BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
const struct bpf_func_proto bpf_ringbuf_discard_proto = {
.func = bpf_ringbuf_discard,
.ret_type = RET_VOID,
.arg1_type = ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE,
.arg1_type = ARG_PTR_TO_RINGBUF_MEM | OBJ_RELEASE,
.arg2_type = ARG_ANYTHING,
};

View file

@ -175,8 +175,8 @@ static void maybe_wait_bpf_programs(struct bpf_map *map)
synchronize_rcu();
}
static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
void *value, __u64 flags)
static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
void *key, void *value, __u64 flags)
{
int err;
@ -190,7 +190,7 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
map->map_type == BPF_MAP_TYPE_SOCKMAP) {
return sock_map_update_elem_sys(map, key, value, flags);
} else if (IS_FD_PROG_ARRAY(map)) {
return bpf_fd_array_map_update_elem(map, f.file, key, value,
return bpf_fd_array_map_update_elem(map, map_file, key, value,
flags);
}
@ -205,12 +205,12 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
flags);
} else if (IS_FD_ARRAY(map)) {
rcu_read_lock();
err = bpf_fd_array_map_update_elem(map, f.file, key, value,
err = bpf_fd_array_map_update_elem(map, map_file, key, value,
flags);
rcu_read_unlock();
} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
rcu_read_lock();
err = bpf_fd_htab_map_update_elem(map, f.file, key, value,
err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
flags);
rcu_read_unlock();
} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
@ -536,6 +536,10 @@ void btf_record_free(struct btf_record *rec)
module_put(rec->fields[i].kptr.module);
btf_put(rec->fields[i].kptr.btf);
break;
case BPF_LIST_HEAD:
case BPF_LIST_NODE:
/* Nothing to release for bpf_list_head */
break;
default:
WARN_ON_ONCE(1);
continue;
@ -578,6 +582,10 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
goto free;
}
break;
case BPF_LIST_HEAD:
case BPF_LIST_NODE:
/* Nothing to acquire for bpf_list_head */
break;
default:
ret = -EFAULT;
WARN_ON_ONCE(1);
@ -603,6 +611,20 @@ bool btf_record_equal(const struct btf_record *rec_a, const struct btf_record *r
if (rec_a->cnt != rec_b->cnt)
return false;
size = offsetof(struct btf_record, fields[rec_a->cnt]);
/* btf_parse_fields uses kzalloc to allocate a btf_record, so unused
* members are zeroed out. So memcmp is safe to do without worrying
* about padding/unused fields.
*
* While spin_lock, timer, and kptr have no relation to map BTF,
* list_head metadata is specific to map BTF, the btf and value_rec
* members in particular. btf is the map BTF, while value_rec points to
* btf_record in that map BTF.
*
* So while by default, we don't rely on the map BTF (which the records
* were parsed from) matching for both records, which is not backwards
* compatible, in case list_head is part of it, we implicitly rely on
* that by way of depending on memcmp succeeding for it.
*/
return !memcmp(rec_a, rec_b, size);
}
@ -637,6 +659,13 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
case BPF_KPTR_REF:
field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));
break;
case BPF_LIST_HEAD:
if (WARN_ON_ONCE(rec->spin_lock_off < 0))
continue;
bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
break;
case BPF_LIST_NODE:
break;
default:
WARN_ON_ONCE(1);
continue;
@ -648,14 +677,24 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
static void bpf_map_free_deferred(struct work_struct *work)
{
struct bpf_map *map = container_of(work, struct bpf_map, work);
struct btf_field_offs *foffs = map->field_offs;
struct btf_record *rec = map->record;
security_bpf_map_free(map);
kfree(map->field_offs);
bpf_map_release_memcg(map);
/* implementation dependent freeing, map_free callback also does
* bpf_map_free_record, if needed.
*/
/* implementation dependent freeing */
map->ops->map_free(map);
/* Delay freeing of field_offs and btf_record for maps, as map_free
* callback usually needs access to them. It is better to do it here
* than require each callback to do the free itself manually.
*
* Note that the btf_record stashed in map->inner_map_meta->record was
* already freed using the map_free callback for map in map case which
* eventually calls bpf_map_free_meta, since inner_map_meta is only a
* template bpf_map struct used during verification.
*/
kfree(foffs);
btf_record_free(rec);
}
static void bpf_map_put_uref(struct bpf_map *map)
@ -965,7 +1004,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
if (!value_type || value_size != map->value_size)
return -EINVAL;
map->record = btf_parse_fields(btf, value_type, BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR,
map->record = btf_parse_fields(btf, value_type,
BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
map->value_size);
if (!IS_ERR_OR_NULL(map->record)) {
int i;
@ -998,7 +1038,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY) {
return -EOPNOTSUPP;
ret = -EOPNOTSUPP;
goto free_map_tab;
}
break;
@ -1012,6 +1052,14 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
goto free_map_tab;
}
break;
case BPF_LIST_HEAD:
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY) {
ret = -EOPNOTSUPP;
goto free_map_tab;
}
break;
default:
/* Fail if map_type checks are missing for a field type */
ret = -EOPNOTSUPP;
@ -1020,6 +1068,10 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
}
}
ret = btf_check_and_fixup_fields(btf, map->record);
if (ret < 0)
goto free_map_tab;
if (map->ops->map_check_btf) {
ret = map->ops->map_check_btf(map, btf, key_type, value_type);
if (ret < 0)
@ -1390,7 +1442,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
goto free_key;
}
err = bpf_map_update_value(map, f, key, value, attr->flags);
err = bpf_map_update_value(map, f.file, key, value, attr->flags);
kvfree(value);
free_key:
@ -1576,16 +1628,14 @@ int generic_map_delete_batch(struct bpf_map *map,
return err;
}
int generic_map_update_batch(struct bpf_map *map,
int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
void __user *values = u64_to_user_ptr(attr->batch.values);
void __user *keys = u64_to_user_ptr(attr->batch.keys);
u32 value_size, cp, max_count;
int ufd = attr->batch.map_fd;
void *key, *value;
struct fd f;
int err = 0;
if (attr->batch.elem_flags & ~BPF_F_LOCK)
@ -1612,7 +1662,6 @@ int generic_map_update_batch(struct bpf_map *map,
return -ENOMEM;
}
f = fdget(ufd); /* bpf_map_do_batch() guarantees ufd is valid */
for (cp = 0; cp < max_count; cp++) {
err = -EFAULT;
if (copy_from_user(key, keys + cp * map->key_size,
@ -1620,7 +1669,7 @@ int generic_map_update_batch(struct bpf_map *map,
copy_from_user(value, values + cp * value_size, value_size))
break;
err = bpf_map_update_value(map, f, key, value,
err = bpf_map_update_value(map, map_file, key, value,
attr->batch.elem_flags);
if (err)
@ -1633,7 +1682,6 @@ int generic_map_update_batch(struct bpf_map *map,
kvfree(value);
kvfree(key);
fdput(f);
return err;
}
@ -4426,13 +4474,13 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
#define BPF_MAP_BATCH_LAST_FIELD batch.flags
#define BPF_DO_BATCH(fn) \
#define BPF_DO_BATCH(fn, ...) \
do { \
if (!fn) { \
err = -ENOTSUPP; \
goto err_put; \
} \
err = fn(map, attr, uattr); \
err = fn(__VA_ARGS__); \
} while (0)
static int bpf_map_do_batch(const union bpf_attr *attr,
@ -4466,13 +4514,13 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
}
if (cmd == BPF_MAP_LOOKUP_BATCH)
BPF_DO_BATCH(map->ops->map_lookup_batch);
BPF_DO_BATCH(map->ops->map_lookup_batch, map, attr, uattr);
else if (cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH)
BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch);
BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch, map, attr, uattr);
else if (cmd == BPF_MAP_UPDATE_BATCH)
BPF_DO_BATCH(map->ops->map_update_batch);
BPF_DO_BATCH(map->ops->map_update_batch, map, f.file, attr, uattr);
else
BPF_DO_BATCH(map->ops->map_delete_batch);
BPF_DO_BATCH(map->ops->map_delete_batch, map, attr, uattr);
err_put:
if (has_write)
bpf_map_write_active_dec(map);

File diff suppressed because it is too large Load diff

View file

@ -774,7 +774,7 @@ BPF_CALL_0(bpf_get_current_task_btf)
const struct bpf_func_proto bpf_get_current_task_btf_proto = {
.func = bpf_get_current_task_btf,
.gpl_only = true,
.ret_type = RET_PTR_TO_BTF_ID,
.ret_type = RET_PTR_TO_BTF_ID_TRUSTED,
.ret_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
};
@ -1485,9 +1485,9 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_get_task_stack:
return &bpf_get_task_stack_proto;
case BPF_FUNC_copy_from_user:
return prog->aux->sleepable ? &bpf_copy_from_user_proto : NULL;
return &bpf_copy_from_user_proto;
case BPF_FUNC_copy_from_user_task:
return prog->aux->sleepable ? &bpf_copy_from_user_task_proto : NULL;
return &bpf_copy_from_user_task_proto;
case BPF_FUNC_snprintf_btf:
return &bpf_snprintf_btf_proto;
case BPF_FUNC_per_cpu_ptr:

View file

@ -156,29 +156,29 @@ static bool bpf_dummy_ops_is_valid_access(int off, int size,
}
static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off,
int size, enum bpf_access_type atype,
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype,
u32 *next_btf_id,
enum bpf_type_flag *flag)
{
const struct btf_type *state;
const struct btf_type *t;
s32 type_id;
int err;
type_id = btf_find_by_name_kind(btf, "bpf_dummy_ops_state",
type_id = btf_find_by_name_kind(reg->btf, "bpf_dummy_ops_state",
BTF_KIND_STRUCT);
if (type_id < 0)
return -EINVAL;
state = btf_type_by_id(btf, type_id);
t = btf_type_by_id(reg->btf, reg->btf_id);
state = btf_type_by_id(reg->btf, type_id);
if (t != state) {
bpf_log(log, "only access to bpf_dummy_ops_state is supported\n");
return -EACCES;
}
err = btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
flag);
err = btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
if (err < 0)
return err;

View file

@ -980,9 +980,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
{
struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
if (!skb->len)
return -EINVAL;
if (!__skb)
return 0;

View file

@ -2124,12 +2124,13 @@ static int __bpf_redirect_no_mac(struct sk_buff *skb, struct net_device *dev,
{
unsigned int mlen = skb_network_offset(skb);
if (unlikely(skb->len <= mlen)) {
kfree_skb(skb);
return -ERANGE;
}
if (mlen) {
__skb_pull(skb, mlen);
if (unlikely(!skb->len)) {
kfree_skb(skb);
return -ERANGE;
}
/* At ingress, the mac header has already been pulled once.
* At egress, skb_pospull_rcsum has to be done in case that
@ -2149,7 +2150,7 @@ static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
u32 flags)
{
/* Verify that a link layer header is carried */
if (unlikely(skb->mac_header >= skb->network_header)) {
if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) {
kfree_skb(skb);
return -ERANGE;
}
@ -4108,7 +4109,10 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
.arg2_type = ARG_ANYTHING,
};
/* XDP_REDIRECT works by a three-step process, implemented in the functions
/**
* DOC: xdp redirect
*
* XDP_REDIRECT works by a three-step process, implemented in the functions
* below:
*
* 1. The bpf_redirect() and bpf_redirect_map() helpers will lookup the target
@ -4123,7 +4127,8 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
* 3. Before exiting its NAPI poll loop, the driver will call xdp_do_flush(),
* which will flush all the different bulk queues, thus completing the
* redirect.
*
*/
/*
* Pointers to the map entries will be kept around for this whole sequence of
* steps, protected by RCU. However, there is no top-level rcu_read_lock() in
* the core code; instead, the RCU protection relies on everything happening
@ -4414,10 +4419,10 @@ static const struct bpf_func_proto bpf_xdp_redirect_proto = {
.arg2_type = ARG_ANYTHING,
};
BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u64, key,
u64, flags)
{
return map->ops->map_redirect(map, ifindex, flags);
return map->ops->map_redirect(map, key, flags);
}
static const struct bpf_func_proto bpf_xdp_redirect_map_proto = {
@ -8651,28 +8656,25 @@ static bool tc_cls_act_is_valid_access(int off, int size,
DEFINE_MUTEX(nf_conn_btf_access_lock);
EXPORT_SYMBOL_GPL(nf_conn_btf_access_lock);
int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype, u32 *next_btf_id,
enum bpf_type_flag *flag);
int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype,
u32 *next_btf_id, enum bpf_type_flag *flag);
EXPORT_SYMBOL_GPL(nfct_btf_struct_access);
static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off,
int size, enum bpf_access_type atype,
u32 *next_btf_id,
enum bpf_type_flag *flag)
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype,
u32 *next_btf_id, enum bpf_type_flag *flag)
{
int ret = -EACCES;
if (atype == BPF_READ)
return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
flag);
return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
mutex_lock(&nf_conn_btf_access_lock);
if (nfct_btf_struct_access)
ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
ret = nfct_btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
mutex_unlock(&nf_conn_btf_access_lock);
return ret;
@ -8738,21 +8740,18 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
static int xdp_btf_struct_access(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off,
int size, enum bpf_access_type atype,
u32 *next_btf_id,
enum bpf_type_flag *flag)
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype,
u32 *next_btf_id, enum bpf_type_flag *flag)
{
int ret = -EACCES;
if (atype == BPF_READ)
return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
flag);
return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
mutex_lock(&nf_conn_btf_access_lock);
if (nfct_btf_struct_access)
ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
ret = nfct_btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
mutex_unlock(&nf_conn_btf_access_lock);
return ret;

View file

@ -61,7 +61,9 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size,
if (!bpf_tracing_btf_ctx_access(off, size, type, prog, info))
return false;
if (info->reg_type == PTR_TO_BTF_ID && info->btf_id == sock_id)
if (base_type(info->reg_type) == PTR_TO_BTF_ID &&
!bpf_type_has_unsafe_modifiers(info->reg_type) &&
info->btf_id == sock_id)
/* promote it to tcp_sock */
info->btf_id = tcp_sock_id;
@ -69,18 +71,17 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size,
}
static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off,
int size, enum bpf_access_type atype,
u32 *next_btf_id,
enum bpf_type_flag *flag)
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype,
u32 *next_btf_id, enum bpf_type_flag *flag)
{
const struct btf_type *t;
size_t end;
if (atype == BPF_READ)
return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
flag);
return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
t = btf_type_by_id(reg->btf, reg->btf_id);
if (t != tcp_sock_type) {
bpf_log(log, "only read is supported\n");
return -EACCES;

View file

@ -191,19 +191,16 @@ BTF_ID(struct, nf_conn___init)
/* Check writes into `struct nf_conn` */
static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off,
int size, enum bpf_access_type atype,
u32 *next_btf_id,
enum bpf_type_flag *flag)
const struct bpf_reg_state *reg,
int off, int size, enum bpf_access_type atype,
u32 *next_btf_id, enum bpf_type_flag *flag)
{
const struct btf_type *ncit;
const struct btf_type *nct;
const struct btf_type *ncit, *nct, *t;
size_t end;
ncit = btf_type_by_id(btf, btf_nf_conn_ids[1]);
nct = btf_type_by_id(btf, btf_nf_conn_ids[0]);
ncit = btf_type_by_id(reg->btf, btf_nf_conn_ids[1]);
nct = btf_type_by_id(reg->btf, btf_nf_conn_ids[0]);
t = btf_type_by_id(reg->btf, reg->btf_id);
if (t != nct && t != ncit) {
bpf_log(log, "only read is supported\n");
return -EACCES;

View file

@ -231,9 +231,9 @@ static int xsk_map_delete_elem(struct bpf_map *map, void *key)
return 0;
}
static int xsk_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags)
static int xsk_map_redirect(struct bpf_map *map, u64 index, u64 flags)
{
return __bpf_xdp_redirect_map(map, ifindex, flags, 0,
return __bpf_xdp_redirect_map(map, index, flags, 0,
__xsk_map_lookup_elem);
}

View file

@ -115,7 +115,7 @@ do_exit() {
if [ "$DEBUG" == "yes" ] && [ "$MODE" != 'cleanuponly' ]
then
echo "------ DEBUG ------"
echo "mount: "; mount | egrep '(cgroup2|bpf)'; echo
echo "mount: "; mount | grep -E '(cgroup2|bpf)'; echo
echo "$CGRP2_TC_LEAF: "; ls -l $CGRP2_TC_LEAF; echo
if [ -d "$BPF_FS_TC_SHARE" ]
then

View file

@ -162,7 +162,7 @@ static void read_route(struct nlmsghdr *nh, int nll)
__be32 gw;
} *prefix_value;
prefix_key = alloca(sizeof(*prefix_key) + 3);
prefix_key = alloca(sizeof(*prefix_key) + 4);
prefix_value = alloca(sizeof(*prefix_value));
prefix_key->prefixlen = 32;

View file

@ -23,12 +23,3 @@
Print all logs available, even debug-level information. This includes
logs from libbpf as well as from the verifier, when attempting to
load programs.
-l, --legacy
Use legacy libbpf mode which has more relaxed BPF program
requirements. By default, bpftool has more strict requirements
about section names, changes pinning logic and doesn't support
some of the older non-BTF map declarations.
See https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0
for details.

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
.. |COMMON_OPTIONS| replace:: { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } | { **-l** | **--legacy** }
.. |COMMON_OPTIONS| replace:: { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** }

View file

@ -261,7 +261,7 @@ _bpftool()
# Deal with options
if [[ ${words[cword]} == -* ]]; then
local c='--version --json --pretty --bpffs --mapcompat --debug \
--use-loader --base-btf --legacy'
--use-loader --base-btf'
COMPREPLY=( $( compgen -W "$c" -- "$cur" ) )
return 0
fi

View file

@ -467,9 +467,8 @@ static int dump_btf_c(const struct btf *btf,
int err = 0, i;
d = btf_dump__new(btf, btf_dump_printf, NULL, NULL);
err = libbpf_get_error(d);
if (err)
return err;
if (!d)
return -errno;
printf("#ifndef __VMLINUX_H__\n");
printf("#define __VMLINUX_H__\n");
@ -512,11 +511,9 @@ static struct btf *get_vmlinux_btf_from_sysfs(void)
struct btf *base;
base = btf__parse(sysfs_vmlinux, NULL);
if (libbpf_get_error(base)) {
p_err("failed to parse vmlinux BTF at '%s': %ld\n",
sysfs_vmlinux, libbpf_get_error(base));
base = NULL;
}
if (!base)
p_err("failed to parse vmlinux BTF at '%s': %d\n",
sysfs_vmlinux, -errno);
return base;
}
@ -559,7 +556,7 @@ static int do_dump(int argc, char **argv)
__u32 btf_id = -1;
const char *src;
int fd = -1;
int err;
int err = 0;
if (!REQ_ARGS(2)) {
usage();
@ -634,8 +631,8 @@ static int do_dump(int argc, char **argv)
base = get_vmlinux_btf_from_sysfs();
btf = btf__parse_split(*argv, base ?: base_btf);
err = libbpf_get_error(btf);
if (!btf) {
err = -errno;
p_err("failed to load BTF from %s: %s",
*argv, strerror(errno));
goto done;
@ -681,8 +678,8 @@ static int do_dump(int argc, char **argv)
}
btf = btf__load_from_kernel_by_id_split(btf_id, base_btf);
err = libbpf_get_error(btf);
if (!btf) {
err = -errno;
p_err("get btf by id (%u): %s", btf_id, strerror(errno));
goto done;
}

View file

@ -75,7 +75,7 @@ static int dump_prog_id_as_func_ptr(const struct btf_dumper *d,
goto print;
prog_btf = btf__load_from_kernel_by_id(info.btf_id);
if (libbpf_get_error(prog_btf))
if (!prog_btf)
goto print;
func_type = btf__type_by_id(prog_btf, finfo.type_id);
if (!func_type || !btf_is_func(func_type))

View file

@ -252,9 +252,8 @@ static int codegen_datasecs(struct bpf_object *obj, const char *obj_name)
int err = 0;
d = btf_dump__new(btf, codegen_btf_dump_printf, NULL, NULL);
err = libbpf_get_error(d);
if (err)
return err;
if (!d)
return -errno;
bpf_object__for_each_map(map, obj) {
/* only generate definitions for memory-mapped internal maps */
@ -976,13 +975,12 @@ static int do_skeleton(int argc, char **argv)
/* log_level1 + log_level2 + stats, but not stable UAPI */
opts.kernel_log_level = 1 + 2 + 4;
obj = bpf_object__open_mem(obj_data, file_sz, &opts);
err = libbpf_get_error(obj);
if (err) {
if (!obj) {
char err_buf[256];
err = -errno;
libbpf_strerror(err, err_buf, sizeof(err_buf));
p_err("failed to open BPF object file: %s", err_buf);
obj = NULL;
goto out;
}

View file

@ -4,6 +4,7 @@
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <unistd.h>
#include <linux/err.h>
#include <bpf/libbpf.h>
@ -48,8 +49,8 @@ static int do_pin(int argc, char **argv)
}
obj = bpf_object__open(objfile);
err = libbpf_get_error(obj);
if (err) {
if (!obj) {
err = -errno;
p_err("can't open objfile %s", objfile);
goto close_map_fd;
}
@ -62,13 +63,14 @@ static int do_pin(int argc, char **argv)
prog = bpf_object__next_program(obj, NULL);
if (!prog) {
err = -errno;
p_err("can't find bpf program in objfile %s", objfile);
goto close_obj;
}
link = bpf_program__attach_iter(prog, &iter_opts);
err = libbpf_get_error(link);
if (err) {
if (!link) {
err = -errno;
p_err("attach_iter failed for program %s",
bpf_program__name(prog));
goto close_obj;

View file

@ -31,7 +31,6 @@ bool block_mount;
bool verifier_logs;
bool relaxed_maps;
bool use_loader;
bool legacy_libbpf;
struct btf *base_btf;
struct hashmap *refs_table;
@ -160,7 +159,6 @@ static int do_version(int argc, char **argv)
jsonw_start_object(json_wtr); /* features */
jsonw_bool_field(json_wtr, "libbfd", has_libbfd);
jsonw_bool_field(json_wtr, "llvm", has_llvm);
jsonw_bool_field(json_wtr, "libbpf_strict", !legacy_libbpf);
jsonw_bool_field(json_wtr, "skeletons", has_skeletons);
jsonw_bool_field(json_wtr, "bootstrap", bootstrap);
jsonw_end_object(json_wtr); /* features */
@ -179,7 +177,6 @@ static int do_version(int argc, char **argv)
printf("features:");
print_feature("libbfd", has_libbfd, &nb_features);
print_feature("llvm", has_llvm, &nb_features);
print_feature("libbpf_strict", !legacy_libbpf, &nb_features);
print_feature("skeletons", has_skeletons, &nb_features);
print_feature("bootstrap", bootstrap, &nb_features);
printf("\n");
@ -337,12 +334,12 @@ static int do_batch(int argc, char **argv)
if (argc < 2) {
p_err("too few parameters for batch");
return -1;
} else if (!is_prefix(*argv, "file")) {
p_err("expected 'file', got: %s", *argv);
return -1;
} else if (argc > 2) {
p_err("too many parameters for batch");
return -1;
} else if (!is_prefix(*argv, "file")) {
p_err("expected 'file', got: %s", *argv);
return -1;
}
NEXT_ARG();
@ -451,7 +448,6 @@ int main(int argc, char **argv)
{ "debug", no_argument, NULL, 'd' },
{ "use-loader", no_argument, NULL, 'L' },
{ "base-btf", required_argument, NULL, 'B' },
{ "legacy", no_argument, NULL, 'l' },
{ 0 }
};
bool version_requested = false;
@ -514,19 +510,15 @@ int main(int argc, char **argv)
break;
case 'B':
base_btf = btf__parse(optarg, NULL);
if (libbpf_get_error(base_btf)) {
p_err("failed to parse base BTF at '%s': %ld\n",
optarg, libbpf_get_error(base_btf));
base_btf = NULL;
if (!base_btf) {
p_err("failed to parse base BTF at '%s': %d\n",
optarg, -errno);
return -1;
}
break;
case 'L':
use_loader = true;
break;
case 'l':
legacy_libbpf = true;
break;
default:
p_err("unrecognized option '%s'", argv[optind - 1]);
if (json_output)
@ -536,14 +528,6 @@ int main(int argc, char **argv)
}
}
if (!legacy_libbpf) {
/* Allow legacy map definitions for skeleton generation.
* It will still be rejected if users use LIBBPF_STRICT_ALL
* mode for loading generated skeleton.
*/
libbpf_set_strict_mode(LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS);
}
argc -= optind;
argv += optind;
if (argc < 0)

View file

@ -57,7 +57,7 @@ static inline void *u64_to_ptr(__u64 ptr)
#define HELP_SPEC_PROGRAM \
"PROG := { id PROG_ID | pinned FILE | tag PROG_TAG | name PROG_NAME }"
#define HELP_SPEC_OPTIONS \
"OPTIONS := { {-j|--json} [{-p|--pretty}] | {-d|--debug} | {-l|--legacy}"
"OPTIONS := { {-j|--json} [{-p|--pretty}] | {-d|--debug}"
#define HELP_SPEC_MAP \
"MAP := { id MAP_ID | pinned FILE | name MAP_NAME }"
#define HELP_SPEC_LINK \
@ -82,7 +82,6 @@ extern bool block_mount;
extern bool verifier_logs;
extern bool relaxed_maps;
extern bool use_loader;
extern bool legacy_libbpf;
extern struct btf *base_btf;
extern struct hashmap *refs_table;

View file

@ -786,18 +786,18 @@ static int get_map_kv_btf(const struct bpf_map_info *info, struct btf **btf)
if (info->btf_vmlinux_value_type_id) {
if (!btf_vmlinux) {
btf_vmlinux = libbpf_find_kernel_btf();
err = libbpf_get_error(btf_vmlinux);
if (err) {
if (!btf_vmlinux) {
p_err("failed to get kernel btf");
return err;
return -errno;
}
}
*btf = btf_vmlinux;
} else if (info->btf_value_type_id) {
*btf = btf__load_from_kernel_by_id(info->btf_id);
err = libbpf_get_error(*btf);
if (err)
if (!*btf) {
err = -errno;
p_err("failed to get btf");
}
} else {
*btf = NULL;
}
@ -807,16 +807,10 @@ static int get_map_kv_btf(const struct bpf_map_info *info, struct btf **btf)
static void free_map_kv_btf(struct btf *btf)
{
if (!libbpf_get_error(btf) && btf != btf_vmlinux)
if (btf != btf_vmlinux)
btf__free(btf);
}
static void free_btf_vmlinux(void)
{
if (!libbpf_get_error(btf_vmlinux))
btf__free(btf_vmlinux);
}
static int
map_dump(int fd, struct bpf_map_info *info, json_writer_t *wtr,
bool show_header)
@ -953,7 +947,7 @@ static int do_dump(int argc, char **argv)
close(fds[i]);
exit_free:
free(fds);
free_btf_vmlinux();
btf__free(btf_vmlinux);
return err;
}

View file

@ -322,7 +322,7 @@ static void show_prog_metadata(int fd, __u32 num_maps)
return;
btf = btf__load_from_kernel_by_id(map_info.btf_id);
if (libbpf_get_error(btf))
if (!btf)
goto out_free;
t_datasec = btf__type_by_id(btf, map_info.btf_value_type_id);
@ -726,7 +726,7 @@ prog_dump(struct bpf_prog_info *info, enum dump_mode mode,
if (info->btf_id) {
btf = btf__load_from_kernel_by_id(info->btf_id);
if (libbpf_get_error(btf)) {
if (!btf) {
p_err("failed to get btf");
return -1;
}
@ -1663,7 +1663,7 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
open_opts.kernel_log_level = 1 + 2 + 4;
obj = bpf_object__open_file(file, &open_opts);
if (libbpf_get_error(obj)) {
if (!obj) {
p_err("failed to open object file");
goto err_free_reuse_maps;
}
@ -1802,11 +1802,6 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
else
bpf_object__unpin_programs(obj, pinfile);
err_close_obj:
if (!legacy_libbpf) {
p_info("Warning: bpftool is now running in libbpf strict mode and has more stringent requirements about BPF programs.\n"
"If it used to work for this object file but now doesn't, see --legacy option for more details.\n");
}
bpf_object__close(obj);
err_free_reuse_maps:
for (i = 0; i < old_map_fds; i++)
@ -1887,7 +1882,7 @@ static int do_loader(int argc, char **argv)
open_opts.kernel_log_level = 1 + 2 + 4;
obj = bpf_object__open_file(file, &open_opts);
if (libbpf_get_error(obj)) {
if (!obj) {
p_err("failed to open object file");
goto err_close_obj;
}
@ -2204,7 +2199,7 @@ static char *profile_target_name(int tgt_fd)
}
btf = btf__load_from_kernel_by_id(info.btf_id);
if (libbpf_get_error(btf)) {
if (!btf) {
p_err("failed to load btf for prog FD %d", tgt_fd);
goto out;
}

View file

@ -32,7 +32,7 @@ static const struct btf *get_btf_vmlinux(void)
return btf_vmlinux;
btf_vmlinux = libbpf_find_kernel_btf();
if (libbpf_get_error(btf_vmlinux))
if (!btf_vmlinux)
p_err("struct_ops requires kernel CONFIG_DEBUG_INFO_BTF=y");
return btf_vmlinux;
@ -45,7 +45,7 @@ static const char *get_kern_struct_ops_name(const struct bpf_map_info *info)
const char *st_ops_name;
kern_btf = get_btf_vmlinux();
if (libbpf_get_error(kern_btf))
if (!kern_btf)
return "<btf_vmlinux_not_found>";
t = btf__type_by_id(kern_btf, info->btf_vmlinux_value_type_id);
@ -63,10 +63,8 @@ static __s32 get_map_info_type_id(void)
return map_info_type_id;
kern_btf = get_btf_vmlinux();
if (libbpf_get_error(kern_btf)) {
map_info_type_id = PTR_ERR(kern_btf);
return map_info_type_id;
}
if (!kern_btf)
return 0;
map_info_type_id = btf__find_by_name_kind(kern_btf, "bpf_map_info",
BTF_KIND_STRUCT);
@ -415,7 +413,7 @@ static int do_dump(int argc, char **argv)
}
kern_btf = get_btf_vmlinux();
if (libbpf_get_error(kern_btf))
if (!kern_btf)
return -1;
if (!json_output) {
@ -498,7 +496,7 @@ static int do_register(int argc, char **argv)
open_opts.kernel_log_level = 1 + 2 + 4;
obj = bpf_object__open_file(file, &open_opts);
if (libbpf_get_error(obj))
if (!obj)
return -1;
set_max_rlimit();
@ -513,10 +511,9 @@ static int do_register(int argc, char **argv)
continue;
link = bpf_map__attach_struct_ops(map);
if (libbpf_get_error(link)) {
if (!link) {
p_err("can't register struct_ops %s: %s",
bpf_map__name(map),
strerror(-PTR_ERR(link)));
bpf_map__name(map), strerror(errno));
nr_errs++;
continue;
}
@ -593,8 +590,7 @@ int do_struct_ops(int argc, char **argv)
err = cmd_select(cmds, argc, argv, do_help);
if (!libbpf_get_error(btf_vmlinux))
btf__free(btf_vmlinux);
btf__free(btf_vmlinux);
return err;
}

View file

@ -2584,14 +2584,19 @@ union bpf_attr {
* * **SOL_SOCKET**, which supports the following *optname*\ s:
* **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
* **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**,
* **SO_BINDTODEVICE**, **SO_KEEPALIVE**.
* **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**,
* **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**.
* * **IPPROTO_TCP**, which supports the following *optname*\ s:
* **TCP_CONGESTION**, **TCP_BPF_IW**,
* **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
* **TCP_KEEPIDLE**, **TCP_KEEPINTVL**, **TCP_KEEPCNT**,
* **TCP_SYNCNT**, **TCP_USER_TIMEOUT**, **TCP_NOTSENT_LOWAT**.
* **TCP_SYNCNT**, **TCP_USER_TIMEOUT**, **TCP_NOTSENT_LOWAT**,
* **TCP_NODELAY**, **TCP_MAXSEG**, **TCP_WINDOW_CLAMP**,
* **TCP_THIN_LINEAR_TIMEOUTS**, **TCP_BPF_DELACK_MAX**,
* **TCP_BPF_RTO_MIN**.
* * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
* * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
* * **IPPROTO_IPV6**, which supports the following *optname*\ s:
* **IPV6_TCLASS**, **IPV6_AUTOFLOWLABEL**.
* Return
* 0 on success, or a negative error in case of failure.
*
@ -2647,7 +2652,7 @@ union bpf_attr {
* Return
* 0 on success, or a negative error in case of failure.
*
* long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
* long bpf_redirect_map(struct bpf_map *map, u64 key, u64 flags)
* Description
* Redirect the packet to the endpoint referenced by *map* at
* index *key*. Depending on its type, this *map* can contain
@ -2808,12 +2813,10 @@ union bpf_attr {
* and **BPF_CGROUP_INET6_CONNECT**.
*
* This helper actually implements a subset of **getsockopt()**.
* It supports the following *level*\ s:
*
* * **IPPROTO_TCP**, which supports *optname*
* **TCP_CONGESTION**.
* * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
* * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
* It supports the same set of *optname*\ s that is supported by
* the **bpf_setsockopt**\ () helper. The exceptions are
* **TCP_BPF_*** is **bpf_setsockopt**\ () only and
* **TCP_SAVED_SYN** is **bpf_getsockopt**\ () only.
* Return
* 0 on success, or a negative error in case of failure.
*
@ -6888,6 +6891,16 @@ struct bpf_dynptr {
__u64 :64;
} __attribute__((aligned(8)));
struct bpf_list_head {
__u64 :64;
__u64 :64;
} __attribute__((aligned(8)));
struct bpf_list_node {
__u64 :64;
__u64 :64;
} __attribute__((aligned(8)));
struct bpf_sysctl {
__u32 write; /* Sysctl is being read (= 0) or written (= 1).
* Allows 1,2,4-byte read, but no write.

View file

@ -1724,7 +1724,8 @@ int btf__add_btf(struct btf *btf, const struct btf *src_btf)
memset(btf->strs_data + old_strs_len, 0, btf->hdr->str_len - old_strs_len);
/* and now restore original strings section size; types data size
* wasn't modified, so doesn't need restoring, see big comment above */
* wasn't modified, so doesn't need restoring, see big comment above
*/
btf->hdr->str_len = old_strs_len;
hashmap__free(p.str_off_map);
@ -2329,7 +2330,7 @@ int btf__add_restrict(struct btf *btf, int ref_type_id)
*/
int btf__add_type_tag(struct btf *btf, const char *value, int ref_type_id)
{
if (!value|| !value[0])
if (!value || !value[0])
return libbpf_err(-EINVAL);
return btf_add_ref_kind(btf, BTF_KIND_TYPE_TAG, value, ref_type_id);

View file

@ -1543,7 +1543,7 @@ static size_t btf_dump_name_dups(struct btf_dump *d, struct hashmap *name_map,
if (!new_name)
return 1;
hashmap__find(name_map, orig_name, &dup_cnt);
(void)hashmap__find(name_map, orig_name, &dup_cnt);
dup_cnt++;
err = hashmap__set(name_map, new_name, dup_cnt, &old_name, NULL);
@ -1989,7 +1989,7 @@ static int btf_dump_struct_data(struct btf_dump *d,
{
const struct btf_member *m = btf_members(t);
__u16 n = btf_vlen(t);
int i, err;
int i, err = 0;
/* note that we increment depth before calling btf_dump_print() below;
* this is intentional. btf_dump_data_newline() will not print a

View file

@ -347,7 +347,8 @@ enum sec_def_flags {
SEC_ATTACHABLE = 2,
SEC_ATTACHABLE_OPT = SEC_ATTACHABLE | SEC_EXP_ATTACH_OPT,
/* attachment target is specified through BTF ID in either kernel or
* other BPF program's BTF object */
* other BPF program's BTF object
*/
SEC_ATTACH_BTF = 4,
/* BPF program type allows sleeping/blocking in kernel */
SEC_SLEEPABLE = 8,
@ -488,7 +489,7 @@ struct bpf_map {
char *name;
/* real_name is defined for special internal maps (.rodata*,
* .data*, .bss, .kconfig) and preserves their original ELF section
* name. This is important to be be able to find corresponding BTF
* name. This is important to be able to find corresponding BTF
* DATASEC information.
*/
char *real_name;
@ -1863,12 +1864,20 @@ static int set_kcfg_value_num(struct extern_desc *ext, void *ext_val,
return -ERANGE;
}
switch (ext->kcfg.sz) {
case 1: *(__u8 *)ext_val = value; break;
case 2: *(__u16 *)ext_val = value; break;
case 4: *(__u32 *)ext_val = value; break;
case 8: *(__u64 *)ext_val = value; break;
default:
return -EINVAL;
case 1:
*(__u8 *)ext_val = value;
break;
case 2:
*(__u16 *)ext_val = value;
break;
case 4:
*(__u32 *)ext_val = value;
break;
case 8:
*(__u64 *)ext_val = value;
break;
default:
return -EINVAL;
}
ext->is_set = true;
return 0;
@ -2770,7 +2779,7 @@ static int bpf_object__sanitize_btf(struct bpf_object *obj, struct btf *btf)
m->type = enum64_placeholder_id;
m->offset = 0;
}
}
}
}
return 0;
@ -3502,7 +3511,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
sec_desc->sec_type = SEC_RELO;
sec_desc->shdr = sh;
sec_desc->data = data;
} else if (sh->sh_type == SHT_NOBITS && strcmp(name, BSS_SEC) == 0) {
} else if (sh->sh_type == SHT_NOBITS && (strcmp(name, BSS_SEC) == 0 ||
str_has_pfx(name, BSS_SEC "."))) {
sec_desc->sec_type = SEC_BSS;
sec_desc->shdr = sh;
sec_desc->data = data;
@ -3518,7 +3528,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
}
/* sort BPF programs by section name and in-section instruction offset
* for faster search */
* for faster search
*/
if (obj->nr_programs)
qsort(obj->programs, obj->nr_programs, sizeof(*obj->programs), cmp_progs);
@ -3817,7 +3828,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
return -EINVAL;
}
ext->kcfg.type = find_kcfg_type(obj->btf, t->type,
&ext->kcfg.is_signed);
&ext->kcfg.is_signed);
if (ext->kcfg.type == KCFG_UNKNOWN) {
pr_warn("extern (kcfg) '%s': type is unsupported\n", ext_name);
return -ENOTSUP;
@ -4965,9 +4976,9 @@ bpf_object__reuse_map(struct bpf_map *map)
err = bpf_map__reuse_fd(map, pin_fd);
close(pin_fd);
if (err) {
if (err)
return err;
}
map->pinned = true;
pr_debug("reused pinned map at '%s'\n", map->pin_path);
@ -5485,7 +5496,7 @@ static int load_module_btfs(struct bpf_object *obj)
}
err = libbpf_ensure_mem((void **)&obj->btf_modules, &obj->btf_module_cap,
sizeof(*obj->btf_modules), obj->btf_module_cnt + 1);
sizeof(*obj->btf_modules), obj->btf_module_cnt + 1);
if (err)
goto err_out;
@ -6237,7 +6248,8 @@ bpf_object__reloc_code(struct bpf_object *obj, struct bpf_program *main_prog,
* prog; each main prog can have a different set of
* subprograms appended (potentially in different order as
* well), so position of any subprog can be different for
* different main programs */
* different main programs
*/
insn->imm = subprog->sub_insn_off - (prog->sub_insn_off + insn_idx) - 1;
pr_debug("prog '%s': insn #%zu relocated, imm %d points to subprog '%s' (now at %zu offset)\n",
@ -10995,7 +11007,7 @@ struct bpf_link *bpf_program__attach_usdt(const struct bpf_program *prog,
usdt_cookie = OPTS_GET(opts, usdt_cookie, 0);
link = usdt_manager_attach_usdt(obj->usdt_man, prog, pid, binary_path,
usdt_provider, usdt_name, usdt_cookie);
usdt_provider, usdt_name, usdt_cookie);
err = libbpf_get_error(link);
if (err)
return libbpf_err_ptr(err);
@ -12304,7 +12316,7 @@ int bpf_object__open_subskeleton(struct bpf_object_subskeleton *s)
btf = bpf_object__btf(s->obj);
if (!btf) {
pr_warn("subskeletons require BTF at runtime (object %s)\n",
bpf_object__name(s->obj));
bpf_object__name(s->obj));
return libbpf_err(-errno);
}

View file

@ -128,7 +128,7 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
/* Map read-only producer page and data pages. We map twice as big
* data size to allow simple reading of samples that wrap around the
* end of a ring buffer. See kernel implementation for details.
* */
*/
tmp = mmap(NULL, rb->page_size + 2 * info.max_entries, PROT_READ,
MAP_SHARED, map_fd, rb->page_size);
if (tmp == MAP_FAILED) {
@ -220,7 +220,7 @@ static inline int roundup_len(__u32 len)
return (len + 7) / 8 * 8;
}
static int64_t ringbuf_process_ring(struct ring* r)
static int64_t ringbuf_process_ring(struct ring *r)
{
int *len_ptr, len, err;
/* 64-bit to avoid overflow in case of extreme application behavior */

View file

@ -38,12 +38,14 @@ kprobe_multi_test/skel_api # kprobe_multi__attach unexpect
ksyms_module/libbpf # 'bpf_testmod_ksym_percpu': not found in kernel BTF
ksyms_module/lskel # test_ksyms_module_lskel__open_and_load unexpected error: -2
libbpf_get_fd_by_id_opts # test_libbpf_get_fd_by_id_opts__attach unexpected error: -524 (errno 524)
linked_list
lookup_key # test_lookup_key__attach unexpected error: -524 (errno 524)
lru_bug # lru_bug__attach unexpected error: -524 (errno 524)
modify_return # modify_return__attach failed unexpected error: -524 (errno 524)
module_attach # skel_attach skeleton attach failed: -524
mptcp/base # run_test mptcp unexpected error: -524 (errno 524)
netcnt # packets unexpected packets: actual 10001 != expected 10000
rcu_read_lock # failed to attach: ERROR: strerror_r(-524)=22
recursion # skel_attach unexpected error: -524 (errno 524)
ringbuf # skel_attach skeleton attachment failed: -1
setget_sockopt # attach_cgroup unexpected error: -524

View file

@ -10,6 +10,7 @@ bpf_nf # JIT does not support calling kernel f
bpf_tcp_ca # JIT does not support calling kernel function (kfunc)
cb_refs # expected error message unexpected error: -524 (trampoline)
cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc)
cgrp_kfunc # JIT does not support calling kernel function
cgrp_local_storage # prog_attach unexpected error: -524 (trampoline)
core_read_macros # unknown func bpf_probe_read#4 (overlapping)
d_path # failed to auto-attach program 'prog_stat': -524 (trampoline)
@ -33,6 +34,7 @@ ksyms_module # test_ksyms_module__open_and_load unex
ksyms_module_libbpf # JIT does not support calling kernel function (kfunc)
ksyms_module_lskel # test_ksyms_module_lskel__open_and_load unexpected error: -9 (?)
libbpf_get_fd_by_id_opts # failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
linked_list # JIT does not support calling kernel function (kfunc)
lookup_key # JIT does not support calling kernel function (kfunc)
lru_bug # prog 'printk': failed to auto-attach: -524
map_kptr # failed to open_and_load program: -524 (trampoline)
@ -41,6 +43,7 @@ module_attach # skel_attach skeleton attach failed: -
mptcp
netcnt # failed to load BPF skeleton 'netcnt_prog': -7 (?)
probe_user # check_kprobe_res wrong kprobe res from probe read (?)
rcu_read_lock # failed to find kernel BTF type ID of '__x64_sys_getpgid': -3 (?)
recursion # skel_attach unexpected error: -524 (trampoline)
ringbuf # skel_load skeleton load failed (?)
select_reuseport # intermittently fails on new s390x setup
@ -53,6 +56,7 @@ skc_to_unix_sock # could not attach BPF object unexpecte
socket_cookie # prog_attach unexpected error: -524 (trampoline)
stacktrace_build_id # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2 (?)
tailcalls # tail_calls are not allowed in non-JITed programs with bpf-to-bpf calls (?)
task_kfunc # JIT does not support calling kernel function
task_local_storage # failed to auto-attach program 'trace_exit_creds': -524 (trampoline)
test_bpffs # bpffs test failed 255 (iterator)
test_bprm_opts # failed to auto-attach program 'secure_exec': -524 (trampoline)
@ -69,6 +73,7 @@ trace_printk # trace_printk__load unexpected error:
trace_vprintk # trace_vprintk__open_and_load unexpected error: -9 (?)
tracing_struct # failed to auto-attach: -524 (trampoline)
trampoline_count # prog 'prog1': failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
type_cast # JIT does not support calling kernel function
unpriv_bpf_disabled # fentry
user_ringbuf # failed to find kernel BTF type ID of '__s390x_sys_prctl': -3 (?)
verif_stats # trace_vprintk__open_and_load unexpected error: -9 (?)

View file

@ -201,7 +201,7 @@ $(OUTPUT)/sign-file: ../../../../scripts/sign-file.c
$(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch])
$(call msg,MOD,,$@)
$(Q)$(RM) bpf_testmod/bpf_testmod.ko # force re-compilation
$(Q)$(MAKE) $(submake_extras) -C bpf_testmod
$(Q)$(MAKE) $(submake_extras) RESOLVE_BTFIDS=$(RESOLVE_BTFIDS) -C bpf_testmod
$(Q)cp bpf_testmod/bpf_testmod.ko $@
DEFAULT_BPFTOOL := $(HOST_SCRATCH_DIR)/sbin/bpftool
@ -310,9 +310,9 @@ $(RESOLVE_BTFIDS): $(HOST_BPFOBJ) | $(HOST_BUILD_DIR)/resolve_btfids \
# Use '-idirafter': Don't interfere with include mechanics except where the
# build would have failed anyways.
define get_sys_includes
$(shell $(1) -v -E - </dev/null 2>&1 \
$(shell $(1) $(2) -v -E - </dev/null 2>&1 \
| sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }') \
$(shell $(1) -dM -E - </dev/null | grep '__riscv_xlen ' | awk '{printf("-D__riscv_xlen=%d -D__BITS_PER_LONG=%d", $$3, $$3)}')
$(shell $(1) $(2) -dM -E - </dev/null | grep '__riscv_xlen ' | awk '{printf("-D__riscv_xlen=%d -D__BITS_PER_LONG=%d", $$3, $$3)}')
endef
# Determine target endianness.
@ -320,7 +320,11 @@ IS_LITTLE_ENDIAN = $(shell $(CC) -dM -E - </dev/null | \
grep 'define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__')
MENDIAN=$(if $(IS_LITTLE_ENDIAN),-mlittle-endian,-mbig-endian)
CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG))
ifneq ($(CROSS_COMPILE),)
CLANG_TARGET_ARCH = --target=$(notdir $(CROSS_COMPILE:%-=%))
endif
CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG),$(CLANG_TARGET_ARCH))
BPF_CFLAGS = -g -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) \
-I$(INCLUDE_DIR) -I$(CURDIR) -I$(APIDIR) \
-I$(abspath $(OUTPUT)/../usr/include)
@ -542,7 +546,7 @@ $(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32))
# Define test_progs BPF-GCC-flavored test runner.
ifneq ($(BPF_GCC),)
TRUNNER_BPF_BUILD_RULE := GCC_BPF_BUILD_RULE
TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(call get_sys_includes,gcc)
TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(call get_sys_includes,gcc,)
$(eval $(call DEFINE_TEST_RUNNER,test_progs,bpf_gcc))
endif

View file

@ -0,0 +1,68 @@
#ifndef __BPF_EXPERIMENTAL__
#define __BPF_EXPERIMENTAL__
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#define __contains(name, node) __attribute__((btf_decl_tag("contains:" #name ":" #node)))
/* Description
* Allocates an object of the type represented by 'local_type_id' in
* program BTF. User may use the bpf_core_type_id_local macro to pass the
* type ID of a struct in program BTF.
*
* The 'local_type_id' parameter must be a known constant.
* The 'meta' parameter is a hidden argument that is ignored.
* Returns
* A pointer to an object of the type corresponding to the passed in
* 'local_type_id', or NULL on failure.
*/
extern void *bpf_obj_new_impl(__u64 local_type_id, void *meta) __ksym;
/* Convenience macro to wrap over bpf_obj_new_impl */
#define bpf_obj_new(type) ((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))
/* Description
* Free an allocated object. All fields of the object that require
* destruction will be destructed before the storage is freed.
*
* The 'meta' parameter is a hidden argument that is ignored.
* Returns
* Void.
*/
extern void bpf_obj_drop_impl(void *kptr, void *meta) __ksym;
/* Convenience macro to wrap over bpf_obj_drop_impl */
#define bpf_obj_drop(kptr) bpf_obj_drop_impl(kptr, NULL)
/* Description
* Add a new entry to the beginning of the BPF linked list.
* Returns
* Void.
*/
extern void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node) __ksym;
/* Description
* Add a new entry to the end of the BPF linked list.
* Returns
* Void.
*/
extern void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node) __ksym;
/* Description
* Remove the entry at the beginning of the BPF linked list.
* Returns
* Pointer to bpf_list_node of deleted entry, or NULL if list is empty.
*/
extern struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) __ksym;
/* Description
* Remove the entry at the end of the BPF linked list.
* Returns
* Pointer to bpf_list_node of deleted entry, or NULL if list is empty.
*/
extern struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) __ksym;
#endif

View file

@ -333,6 +333,25 @@ int get_root_cgroup(void)
return fd;
}
/*
* remove_cgroup() - Remove a cgroup
* @relative_path: The cgroup path, relative to the workdir, to remove
*
* This function expects a cgroup to already be created, relative to the cgroup
* work dir. It also expects the cgroup doesn't have any children or live
* processes and it removes the cgroup.
*
* On failure, it will print an error to stderr.
*/
void remove_cgroup(const char *relative_path)
{
char cgroup_path[PATH_MAX + 1];
format_cgroup_path(cgroup_path, relative_path);
if (rmdir(cgroup_path))
log_err("rmdiring cgroup %s .. %s", relative_path, cgroup_path);
}
/**
* create_and_get_cgroup() - Create a cgroup, relative to workdir, and get the FD
* @relative_path: The cgroup path, relative to the workdir, to join

View file

@ -18,6 +18,7 @@ int write_cgroup_file_parent(const char *relative_path, const char *file,
int cgroup_setup_and_join(const char *relative_path);
int get_root_cgroup(void);
int create_and_get_cgroup(const char *relative_path);
void remove_cgroup(const char *relative_path);
unsigned long long get_cgroup_id(const char *relative_path);
int join_cgroup(const char *relative_path);

View file

@ -8,6 +8,7 @@ CONFIG_BPF_LIRC_MODE2=y
CONFIG_BPF_LSM=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_UNPRIV_DEFAULT_OFF=n
CONFIG_CGROUP_BPF=y
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_SHA256=y

View file

@ -426,6 +426,10 @@ static int setns_by_fd(int nsfd)
if (!ASSERT_OK(err, "mount /sys/fs/bpf"))
return err;
err = mount("debugfs", "/sys/kernel/debug", "debugfs", 0, NULL);
if (!ASSERT_OK(err, "mount /sys/kernel/debug"))
return err;
return 0;
}

View file

@ -3948,6 +3948,20 @@ static struct btf_raw_test raw_tests[] = {
.btf_load_err = true,
.err_str = "Invalid return type",
},
{
.descr = "decl_tag test #17, func proto, argument",
.raw_types = {
BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DECL_TAG, 0, 0), 4), (-1), /* [1] */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 0), /* [2] */
BTF_FUNC_PROTO_ENC(0, 1), /* [3] */
BTF_FUNC_PROTO_ARG_ENC(NAME_TBD, 1),
BTF_VAR_ENC(NAME_TBD, 2, 0), /* [4] */
BTF_END_RAW,
},
BTF_STR_SEC("\0local\0tag1\0var"),
.btf_load_err = true,
.err_str = "Invalid arg#1",
},
{
.descr = "type_tag test #1",
.raw_types = {

View file

@ -189,6 +189,80 @@ static void test_walk_self_only(struct cgroup_iter *skel)
BPF_CGROUP_ITER_SELF_ONLY, "self_only");
}
static void test_walk_dead_self_only(struct cgroup_iter *skel)
{
DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
char expected_output[128], buf[128];
const char *cgrp_name = "/dead";
union bpf_iter_link_info linfo;
int len, cgrp_fd, iter_fd;
struct bpf_link *link;
size_t left;
char *p;
cgrp_fd = create_and_get_cgroup(cgrp_name);
if (!ASSERT_GE(cgrp_fd, 0, "create cgrp"))
return;
/* The cgroup will be dead during read() iteration, so it only has
* epilogue in the output
*/
snprintf(expected_output, sizeof(expected_output), EPILOGUE);
memset(&linfo, 0, sizeof(linfo));
linfo.cgroup.cgroup_fd = cgrp_fd;
linfo.cgroup.order = BPF_CGROUP_ITER_SELF_ONLY;
opts.link_info = &linfo;
opts.link_info_len = sizeof(linfo);
link = bpf_program__attach_iter(skel->progs.cgroup_id_printer, &opts);
if (!ASSERT_OK_PTR(link, "attach_iter"))
goto close_cgrp;
iter_fd = bpf_iter_create(bpf_link__fd(link));
if (!ASSERT_GE(iter_fd, 0, "iter_create"))
goto free_link;
/* Close link fd and cgroup fd */
bpf_link__destroy(link);
close(cgrp_fd);
/* Remove cgroup to mark it as dead */
remove_cgroup(cgrp_name);
/* Two kern_sync_rcu() and usleep() pairs are used to wait for the
* releases of cgroup css, and the last kern_sync_rcu() and usleep()
* pair is used to wait for the free of cgroup itself.
*/
kern_sync_rcu();
usleep(8000);
kern_sync_rcu();
usleep(8000);
kern_sync_rcu();
usleep(1000);
memset(buf, 0, sizeof(buf));
left = ARRAY_SIZE(buf);
p = buf;
while ((len = read(iter_fd, p, left)) > 0) {
p += len;
left -= len;
}
ASSERT_STREQ(buf, expected_output, "dead cgroup output");
/* read() after iter finishes should be ok. */
if (len == 0)
ASSERT_OK(read(iter_fd, buf, sizeof(buf)), "second_read");
close(iter_fd);
return;
free_link:
bpf_link__destroy(link);
close_cgrp:
close(cgrp_fd);
}
void test_cgroup_iter(void)
{
struct cgroup_iter *skel = NULL;
@ -217,6 +291,8 @@ void test_cgroup_iter(void)
test_early_termination(skel);
if (test__start_subtest("cgroup_iter__self_only"))
test_walk_self_only(skel);
if (test__start_subtest("cgroup_iter__dead_self_only"))
test_walk_dead_self_only(skel);
out:
cgroup_iter__destroy(skel);
cleanup_cgroups();

View file

@ -0,0 +1,175 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#define _GNU_SOURCE
#include <cgroup_helpers.h>
#include <test_progs.h>
#include "cgrp_kfunc_failure.skel.h"
#include "cgrp_kfunc_success.skel.h"
static size_t log_buf_sz = 1 << 20; /* 1 MB */
static char obj_log_buf[1048576];
static struct cgrp_kfunc_success *open_load_cgrp_kfunc_skel(void)
{
struct cgrp_kfunc_success *skel;
int err;
skel = cgrp_kfunc_success__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return NULL;
skel->bss->pid = getpid();
err = cgrp_kfunc_success__load(skel);
if (!ASSERT_OK(err, "skel_load"))
goto cleanup;
return skel;
cleanup:
cgrp_kfunc_success__destroy(skel);
return NULL;
}
static int mkdir_rm_test_dir(void)
{
int fd;
const char *cgrp_path = "cgrp_kfunc";
fd = create_and_get_cgroup(cgrp_path);
if (!ASSERT_GT(fd, 0, "mkdir_cgrp_fd"))
return -1;
close(fd);
remove_cgroup(cgrp_path);
return 0;
}
static void run_success_test(const char *prog_name)
{
struct cgrp_kfunc_success *skel;
struct bpf_program *prog;
struct bpf_link *link = NULL;
skel = open_load_cgrp_kfunc_skel();
if (!ASSERT_OK_PTR(skel, "open_load_skel"))
return;
if (!ASSERT_OK(skel->bss->err, "pre_mkdir_err"))
goto cleanup;
prog = bpf_object__find_program_by_name(skel->obj, prog_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto cleanup;
link = bpf_program__attach(prog);
if (!ASSERT_OK_PTR(link, "attached_link"))
goto cleanup;
ASSERT_EQ(skel->bss->invocations, 0, "pre_rmdir_count");
if (!ASSERT_OK(mkdir_rm_test_dir(), "cgrp_mkdir"))
goto cleanup;
ASSERT_EQ(skel->bss->invocations, 1, "post_rmdir_count");
ASSERT_OK(skel->bss->err, "post_rmdir_err");
cleanup:
bpf_link__destroy(link);
cgrp_kfunc_success__destroy(skel);
}
static const char * const success_tests[] = {
"test_cgrp_acquire_release_argument",
"test_cgrp_acquire_leave_in_map",
"test_cgrp_xchg_release",
"test_cgrp_get_release",
"test_cgrp_get_ancestors",
};
static struct {
const char *prog_name;
const char *expected_err_msg;
} failure_tests[] = {
{"cgrp_kfunc_acquire_untrusted", "R1 must be referenced or trusted"},
{"cgrp_kfunc_acquire_fp", "arg#0 pointer type STRUCT cgroup must point"},
{"cgrp_kfunc_acquire_unsafe_kretprobe", "reg type unsupported for arg#0 function"},
{"cgrp_kfunc_acquire_trusted_walked", "R1 must be referenced or trusted"},
{"cgrp_kfunc_acquire_null", "arg#0 pointer type STRUCT cgroup must point"},
{"cgrp_kfunc_acquire_unreleased", "Unreleased reference"},
{"cgrp_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
{"cgrp_kfunc_get_non_kptr_acquired", "arg#0 expected pointer to map value"},
{"cgrp_kfunc_get_null", "arg#0 expected pointer to map value"},
{"cgrp_kfunc_xchg_unreleased", "Unreleased reference"},
{"cgrp_kfunc_get_unreleased", "Unreleased reference"},
{"cgrp_kfunc_release_untrusted", "arg#0 is untrusted_ptr_or_null_ expected ptr_ or socket"},
{"cgrp_kfunc_release_fp", "arg#0 pointer type STRUCT cgroup must point"},
{"cgrp_kfunc_release_null", "arg#0 is ptr_or_null_ expected ptr_ or socket"},
{"cgrp_kfunc_release_unacquired", "release kernel function bpf_cgroup_release expects"},
};
static void verify_fail(const char *prog_name, const char *expected_err_msg)
{
LIBBPF_OPTS(bpf_object_open_opts, opts);
struct cgrp_kfunc_failure *skel;
int err, i;
opts.kernel_log_buf = obj_log_buf;
opts.kernel_log_size = log_buf_sz;
opts.kernel_log_level = 1;
skel = cgrp_kfunc_failure__open_opts(&opts);
if (!ASSERT_OK_PTR(skel, "cgrp_kfunc_failure__open_opts"))
goto cleanup;
for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
struct bpf_program *prog;
const char *curr_name = failure_tests[i].prog_name;
prog = bpf_object__find_program_by_name(skel->obj, curr_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto cleanup;
bpf_program__set_autoload(prog, !strcmp(curr_name, prog_name));
}
err = cgrp_kfunc_failure__load(skel);
if (!ASSERT_ERR(err, "unexpected load success"))
goto cleanup;
if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
}
cleanup:
cgrp_kfunc_failure__destroy(skel);
}
void test_cgrp_kfunc(void)
{
int i, err;
err = setup_cgroup_environment();
if (!ASSERT_OK(err, "cgrp_env_setup"))
goto cleanup;
for (i = 0; i < ARRAY_SIZE(success_tests); i++) {
if (!test__start_subtest(success_tests[i]))
continue;
run_success_test(success_tests[i]);
}
for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
if (!test__start_subtest(failure_tests[i].prog_name))
continue;
verify_fail(failure_tests[i].prog_name, failure_tests[i].expected_err_msg);
}
cleanup:
cleanup_cgroup_environment();
}

View file

@ -17,7 +17,7 @@ static struct {
{"ringbuf_missing_release2", "Unreleased reference id=2"},
{"ringbuf_missing_release_callback", "Unreleased reference id"},
{"use_after_invalid", "Expected an initialized dynptr as arg #3"},
{"ringbuf_invalid_api", "type=mem expected=alloc_mem"},
{"ringbuf_invalid_api", "type=mem expected=ringbuf_mem"},
{"add_dynptr_to_map1", "invalid indirect read from stack"},
{"add_dynptr_to_map2", "invalid indirect read from stack"},
{"data_slice_out_of_bounds_ringbuf", "value is outside of the allowed memory range"},

View file

@ -0,0 +1,146 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <network_helpers.h>
#include <net/if.h>
#include "empty_skb.skel.h"
#define SYS(cmd) ({ \
if (!ASSERT_OK(system(cmd), (cmd))) \
goto out; \
})
void serial_test_empty_skb(void)
{
LIBBPF_OPTS(bpf_test_run_opts, tattr);
struct empty_skb *bpf_obj = NULL;
struct nstoken *tok = NULL;
struct bpf_program *prog;
char eth_hlen_pp[15];
char eth_hlen[14];
int veth_ifindex;
int ipip_ifindex;
int err;
int i;
struct {
const char *msg;
const void *data_in;
__u32 data_size_in;
int *ifindex;
int err;
int ret;
bool success_on_tc;
} tests[] = {
/* Empty packets are always rejected. */
{
/* BPF_PROG_RUN ETH_HLEN size check */
.msg = "veth empty ingress packet",
.data_in = NULL,
.data_size_in = 0,
.ifindex = &veth_ifindex,
.err = -EINVAL,
},
{
/* BPF_PROG_RUN ETH_HLEN size check */
.msg = "ipip empty ingress packet",
.data_in = NULL,
.data_size_in = 0,
.ifindex = &ipip_ifindex,
.err = -EINVAL,
},
/* ETH_HLEN-sized packets:
* - can not be redirected at LWT_XMIT
* - can be redirected at TC to non-tunneling dest
*/
{
/* __bpf_redirect_common */
.msg = "veth ETH_HLEN packet ingress",
.data_in = eth_hlen,
.data_size_in = sizeof(eth_hlen),
.ifindex = &veth_ifindex,
.ret = -ERANGE,
.success_on_tc = true,
},
{
/* __bpf_redirect_no_mac
*
* lwt: skb->len=0 <= skb_network_offset=0
* tc: skb->len=14 <= skb_network_offset=14
*/
.msg = "ipip ETH_HLEN packet ingress",
.data_in = eth_hlen,
.data_size_in = sizeof(eth_hlen),
.ifindex = &ipip_ifindex,
.ret = -ERANGE,
},
/* ETH_HLEN+1-sized packet should be redirected. */
{
.msg = "veth ETH_HLEN+1 packet ingress",
.data_in = eth_hlen_pp,
.data_size_in = sizeof(eth_hlen_pp),
.ifindex = &veth_ifindex,
},
{
.msg = "ipip ETH_HLEN+1 packet ingress",
.data_in = eth_hlen_pp,
.data_size_in = sizeof(eth_hlen_pp),
.ifindex = &ipip_ifindex,
},
};
SYS("ip netns add empty_skb");
tok = open_netns("empty_skb");
SYS("ip link add veth0 type veth peer veth1");
SYS("ip link set dev veth0 up");
SYS("ip link set dev veth1 up");
SYS("ip addr add 10.0.0.1/8 dev veth0");
SYS("ip addr add 10.0.0.2/8 dev veth1");
veth_ifindex = if_nametoindex("veth0");
SYS("ip link add ipip0 type ipip local 10.0.0.1 remote 10.0.0.2");
SYS("ip link set ipip0 up");
SYS("ip addr add 192.168.1.1/16 dev ipip0");
ipip_ifindex = if_nametoindex("ipip0");
bpf_obj = empty_skb__open_and_load();
if (!ASSERT_OK_PTR(bpf_obj, "open skeleton"))
goto out;
for (i = 0; i < ARRAY_SIZE(tests); i++) {
bpf_object__for_each_program(prog, bpf_obj->obj) {
char buf[128];
bool at_tc = !strncmp(bpf_program__section_name(prog), "tc", 2);
tattr.data_in = tests[i].data_in;
tattr.data_size_in = tests[i].data_size_in;
tattr.data_size_out = 0;
bpf_obj->bss->ifindex = *tests[i].ifindex;
bpf_obj->bss->ret = 0;
err = bpf_prog_test_run_opts(bpf_program__fd(prog), &tattr);
sprintf(buf, "err: %s [%s]", tests[i].msg, bpf_program__name(prog));
if (at_tc && tests[i].success_on_tc)
ASSERT_GE(err, 0, buf);
else
ASSERT_EQ(err, tests[i].err, buf);
sprintf(buf, "ret: %s [%s]", tests[i].msg, bpf_program__name(prog));
if (at_tc && tests[i].success_on_tc)
ASSERT_GE(bpf_obj->bss->ret, 0, buf);
else
ASSERT_EQ(bpf_obj->bss->ret, tests[i].ret, buf);
}
}
out:
if (bpf_obj)
empty_skb__destroy(bpf_obj);
if (tok)
close_netns(tok);
system("ip netns del empty_skb");
}

View file

@ -22,7 +22,7 @@ static struct {
"arg#0 pointer type STRUCT bpf_dynptr_kern points to unsupported dynamic pointer type", 0},
{"not_valid_dynptr",
"arg#0 pointer type STRUCT bpf_dynptr_kern must be valid and initialized", 0},
{"not_ptr_to_stack", "arg#0 pointer type STRUCT bpf_dynptr_kern not to stack", 0},
{"not_ptr_to_stack", "arg#0 expected pointer to stack", 0},
{"dynptr_data_null", NULL, -EBADMSG},
};

View file

@ -0,0 +1,740 @@
// SPDX-License-Identifier: GPL-2.0
#include <bpf/btf.h>
#include <test_btf.h>
#include <linux/btf.h>
#include <test_progs.h>
#include <network_helpers.h>
#include "linked_list.skel.h"
#include "linked_list_fail.skel.h"
static char log_buf[1024 * 1024];
static struct {
const char *prog_name;
const char *err_msg;
} linked_list_fail_tests[] = {
#define TEST(test, off) \
{ #test "_missing_lock_push_front", \
"bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, \
{ #test "_missing_lock_push_back", \
"bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, \
{ #test "_missing_lock_pop_front", \
"bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, \
{ #test "_missing_lock_pop_back", \
"bpf_spin_lock at off=" #off " must be held for bpf_list_head" },
TEST(kptr, 32)
TEST(global, 16)
TEST(map, 0)
TEST(inner_map, 0)
#undef TEST
#define TEST(test, op) \
{ #test "_kptr_incorrect_lock_" #op, \
"held lock and object are not in the same allocation\n" \
"bpf_spin_lock at off=32 must be held for bpf_list_head" }, \
{ #test "_global_incorrect_lock_" #op, \
"held lock and object are not in the same allocation\n" \
"bpf_spin_lock at off=16 must be held for bpf_list_head" }, \
{ #test "_map_incorrect_lock_" #op, \
"held lock and object are not in the same allocation\n" \
"bpf_spin_lock at off=0 must be held for bpf_list_head" }, \
{ #test "_inner_map_incorrect_lock_" #op, \
"held lock and object are not in the same allocation\n" \
"bpf_spin_lock at off=0 must be held for bpf_list_head" },
TEST(kptr, push_front)
TEST(kptr, push_back)
TEST(kptr, pop_front)
TEST(kptr, pop_back)
TEST(global, push_front)
TEST(global, push_back)
TEST(global, pop_front)
TEST(global, pop_back)
TEST(map, push_front)
TEST(map, push_back)
TEST(map, pop_front)
TEST(map, pop_back)
TEST(inner_map, push_front)
TEST(inner_map, push_back)
TEST(inner_map, pop_front)
TEST(inner_map, pop_back)
#undef TEST
{ "map_compat_kprobe", "tracing progs cannot use bpf_list_head yet" },
{ "map_compat_kretprobe", "tracing progs cannot use bpf_list_head yet" },
{ "map_compat_tp", "tracing progs cannot use bpf_list_head yet" },
{ "map_compat_perf", "tracing progs cannot use bpf_list_head yet" },
{ "map_compat_raw_tp", "tracing progs cannot use bpf_list_head yet" },
{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_list_head yet" },
{ "obj_type_id_oor", "local type ID argument must be in range [0, U32_MAX]" },
{ "obj_new_no_composite", "bpf_obj_new type ID argument must be of a struct" },
{ "obj_new_no_struct", "bpf_obj_new type ID argument must be of a struct" },
{ "obj_drop_non_zero_off", "R1 must have zero offset when passed to release func" },
{ "new_null_ret", "R0 invalid mem access 'ptr_or_null_'" },
{ "obj_new_acq", "Unreleased reference id=" },
{ "use_after_drop", "invalid mem access 'scalar'" },
{ "ptr_walk_scalar", "type=scalar expected=percpu_ptr_" },
{ "direct_read_lock", "direct access to bpf_spin_lock is disallowed" },
{ "direct_write_lock", "direct access to bpf_spin_lock is disallowed" },
{ "direct_read_head", "direct access to bpf_list_head is disallowed" },
{ "direct_write_head", "direct access to bpf_list_head is disallowed" },
{ "direct_read_node", "direct access to bpf_list_node is disallowed" },
{ "direct_write_node", "direct access to bpf_list_node is disallowed" },
{ "write_after_push_front", "only read is supported" },
{ "write_after_push_back", "only read is supported" },
{ "use_after_unlock_push_front", "invalid mem access 'scalar'" },
{ "use_after_unlock_push_back", "invalid mem access 'scalar'" },
{ "double_push_front", "arg#1 expected pointer to allocated object" },
{ "double_push_back", "arg#1 expected pointer to allocated object" },
{ "no_node_value_type", "bpf_list_node not found at offset=0" },
{ "incorrect_value_type",
"operation on bpf_list_head expects arg#1 bpf_list_node at offset=0 in struct foo, "
"but arg is at offset=0 in struct bar" },
{ "incorrect_node_var_off", "variable ptr_ access var_off=(0x0; 0xffffffff) disallowed" },
{ "incorrect_node_off1", "bpf_list_node not found at offset=1" },
{ "incorrect_node_off2", "arg#1 offset=40, but expected bpf_list_node at offset=0 in struct foo" },
{ "no_head_type", "bpf_list_head not found at offset=0" },
{ "incorrect_head_var_off1", "R1 doesn't have constant offset" },
{ "incorrect_head_var_off2", "variable ptr_ access var_off=(0x0; 0xffffffff) disallowed" },
{ "incorrect_head_off1", "bpf_list_head not found at offset=17" },
{ "incorrect_head_off2", "bpf_list_head not found at offset=1" },
{ "pop_front_off",
"15: (bf) r1 = r6 ; R1_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) "
"R6_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) refs=2,4\n"
"16: (85) call bpf_this_cpu_ptr#154\nR1 type=ptr_or_null_ expected=percpu_ptr_" },
{ "pop_back_off",
"15: (bf) r1 = r6 ; R1_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) "
"R6_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) refs=2,4\n"
"16: (85) call bpf_this_cpu_ptr#154\nR1 type=ptr_or_null_ expected=percpu_ptr_" },
};
static void test_linked_list_fail_prog(const char *prog_name, const char *err_msg)
{
LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_buf = log_buf,
.kernel_log_size = sizeof(log_buf),
.kernel_log_level = 1);
struct linked_list_fail *skel;
struct bpf_program *prog;
int ret;
skel = linked_list_fail__open_opts(&opts);
if (!ASSERT_OK_PTR(skel, "linked_list_fail__open_opts"))
return;
prog = bpf_object__find_program_by_name(skel->obj, prog_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto end;
bpf_program__set_autoload(prog, true);
ret = linked_list_fail__load(skel);
if (!ASSERT_ERR(ret, "linked_list_fail__load must fail"))
goto end;
if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) {
fprintf(stderr, "Expected: %s\n", err_msg);
fprintf(stderr, "Verifier: %s\n", log_buf);
}
end:
linked_list_fail__destroy(skel);
}
static void clear_fields(struct bpf_map *map)
{
char buf[24];
int key = 0;
memset(buf, 0xff, sizeof(buf));
ASSERT_OK(bpf_map__update_elem(map, &key, sizeof(key), buf, sizeof(buf), 0), "check_and_free_fields");
}
enum {
TEST_ALL,
PUSH_POP,
PUSH_POP_MULT,
LIST_IN_LIST,
};
static void test_linked_list_success(int mode, bool leave_in_map)
{
LIBBPF_OPTS(bpf_test_run_opts, opts,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.repeat = 1,
);
struct linked_list *skel;
int ret;
skel = linked_list__open_and_load();
if (!ASSERT_OK_PTR(skel, "linked_list__open_and_load"))
return;
if (mode == LIST_IN_LIST)
goto lil;
if (mode == PUSH_POP_MULT)
goto ppm;
ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_push_pop), &opts);
ASSERT_OK(ret, "map_list_push_pop");
ASSERT_OK(opts.retval, "map_list_push_pop retval");
if (!leave_in_map)
clear_fields(skel->maps.array_map);
ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_push_pop), &opts);
ASSERT_OK(ret, "inner_map_list_push_pop");
ASSERT_OK(opts.retval, "inner_map_list_push_pop retval");
if (!leave_in_map)
clear_fields(skel->maps.inner_map);
ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop), &opts);
ASSERT_OK(ret, "global_list_push_pop");
ASSERT_OK(opts.retval, "global_list_push_pop retval");
if (!leave_in_map)
clear_fields(skel->maps.bss_A);
if (mode == PUSH_POP)
goto end;
ppm:
ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_push_pop_multiple), &opts);
ASSERT_OK(ret, "map_list_push_pop_multiple");
ASSERT_OK(opts.retval, "map_list_push_pop_multiple retval");
if (!leave_in_map)
clear_fields(skel->maps.array_map);
ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_push_pop_multiple), &opts);
ASSERT_OK(ret, "inner_map_list_push_pop_multiple");
ASSERT_OK(opts.retval, "inner_map_list_push_pop_multiple retval");
if (!leave_in_map)
clear_fields(skel->maps.inner_map);
ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop_multiple), &opts);
ASSERT_OK(ret, "global_list_push_pop_multiple");
ASSERT_OK(opts.retval, "global_list_push_pop_multiple retval");
if (!leave_in_map)
clear_fields(skel->maps.bss_A);
if (mode == PUSH_POP_MULT)
goto end;
lil:
ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_in_list), &opts);
ASSERT_OK(ret, "map_list_in_list");
ASSERT_OK(opts.retval, "map_list_in_list retval");
if (!leave_in_map)
clear_fields(skel->maps.array_map);
ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_in_list), &opts);
ASSERT_OK(ret, "inner_map_list_in_list");
ASSERT_OK(opts.retval, "inner_map_list_in_list retval");
if (!leave_in_map)
clear_fields(skel->maps.inner_map);
ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_in_list), &opts);
ASSERT_OK(ret, "global_list_in_list");
ASSERT_OK(opts.retval, "global_list_in_list retval");
if (!leave_in_map)
clear_fields(skel->maps.bss_A);
end:
linked_list__destroy(skel);
}
#define SPIN_LOCK 2
#define LIST_HEAD 3
#define LIST_NODE 4
static struct btf *init_btf(void)
{
int id, lid, hid, nid;
struct btf *btf;
btf = btf__new_empty();
if (!ASSERT_OK_PTR(btf, "btf__new_empty"))
return NULL;
id = btf__add_int(btf, "int", 4, BTF_INT_SIGNED);
if (!ASSERT_EQ(id, 1, "btf__add_int"))
goto end;
lid = btf__add_struct(btf, "bpf_spin_lock", 4);
if (!ASSERT_EQ(lid, SPIN_LOCK, "btf__add_struct bpf_spin_lock"))
goto end;
hid = btf__add_struct(btf, "bpf_list_head", 16);
if (!ASSERT_EQ(hid, LIST_HEAD, "btf__add_struct bpf_list_head"))
goto end;
nid = btf__add_struct(btf, "bpf_list_node", 16);
if (!ASSERT_EQ(nid, LIST_NODE, "btf__add_struct bpf_list_node"))
goto end;
return btf;
end:
btf__free(btf);
return NULL;
}
static void test_btf(void)
{
struct btf *btf = NULL;
int id, err;
while (test__start_subtest("btf: too many locks")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 24);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0);
if (!ASSERT_OK(err, "btf__add_struct foo::a"))
break;
err = btf__add_field(btf, "b", SPIN_LOCK, 32, 0);
if (!ASSERT_OK(err, "btf__add_struct foo::a"))
break;
err = btf__add_field(btf, "c", LIST_HEAD, 64, 0);
if (!ASSERT_OK(err, "btf__add_struct foo::a"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -E2BIG, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: missing lock")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 16);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_struct foo::a"))
break;
id = btf__add_decl_tag(btf, "contains:baz:a", 5, 0);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:baz:a"))
break;
id = btf__add_struct(btf, "baz", 16);
if (!ASSERT_EQ(id, 7, "btf__add_struct baz"))
break;
err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
if (!ASSERT_OK(err, "btf__add_field baz::a"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -EINVAL, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: bad offset")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 36);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
err = btf__add_field(btf, "c", SPIN_LOCK, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::c"))
break;
id = btf__add_decl_tag(btf, "contains:foo:b", 5, 0);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:b"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -EEXIST, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: missing contains:")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 24);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", LIST_HEAD, 64, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -EINVAL, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: missing struct")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 24);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", LIST_HEAD, 64, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
id = btf__add_decl_tag(btf, "contains:bar:bar", 5, 1);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:bar"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -ENOENT, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: missing node")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 24);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", LIST_HEAD, 64, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
id = btf__add_decl_tag(btf, "contains:foo:c", 5, 1);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:c"))
break;
err = btf__load_into_kernel(btf);
btf__free(btf);
ASSERT_EQ(err, -ENOENT, "check btf");
break;
}
while (test__start_subtest("btf: node incorrect type")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 20);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
id = btf__add_decl_tag(btf, "contains:bar:a", 5, 0);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:a"))
break;
id = btf__add_struct(btf, "bar", 4);
if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
break;
err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0);
if (!ASSERT_OK(err, "btf__add_field bar::a"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -EINVAL, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: multiple bpf_list_node with name b")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 52);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 256, 0);
if (!ASSERT_OK(err, "btf__add_field foo::c"))
break;
err = btf__add_field(btf, "d", SPIN_LOCK, 384, 0);
if (!ASSERT_OK(err, "btf__add_field foo::d"))
break;
id = btf__add_decl_tag(btf, "contains:foo:b", 5, 0);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:b"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -EINVAL, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: owning | owned AA cycle")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 36);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
if (!ASSERT_OK(err, "btf__add_field foo::c"))
break;
id = btf__add_decl_tag(btf, "contains:foo:b", 5, 0);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:b"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -ELOOP, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: owning | owned ABA cycle")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 36);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
if (!ASSERT_OK(err, "btf__add_field foo::c"))
break;
id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
break;
id = btf__add_struct(btf, "bar", 36);
if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field bar::a"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
if (!ASSERT_OK(err, "btf__add_field bar::b"))
break;
err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
if (!ASSERT_OK(err, "btf__add_field bar::c"))
break;
id = btf__add_decl_tag(btf, "contains:foo:b", 7, 0);
if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:foo:b"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -ELOOP, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: owning -> owned")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 20);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
id = btf__add_decl_tag(btf, "contains:bar:a", 5, 0);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:a"))
break;
id = btf__add_struct(btf, "bar", 16);
if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
break;
err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
if (!ASSERT_OK(err, "btf__add_field bar::a"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, 0, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: owning -> owning | owned -> owned")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 20);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
break;
id = btf__add_struct(btf, "bar", 36);
if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field bar::a"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
if (!ASSERT_OK(err, "btf__add_field bar::b"))
break;
err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
if (!ASSERT_OK(err, "btf__add_field bar::c"))
break;
id = btf__add_decl_tag(btf, "contains:baz:a", 7, 0);
if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:baz:a"))
break;
id = btf__add_struct(btf, "baz", 16);
if (!ASSERT_EQ(id, 9, "btf__add_struct baz"))
break;
err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
if (!ASSERT_OK(err, "btf__add_field baz:a"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, 0, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: owning | owned -> owning | owned -> owned")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 36);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
if (!ASSERT_OK(err, "btf__add_field foo::c"))
break;
id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
break;
id = btf__add_struct(btf, "bar", 36);
if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field bar:a"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
if (!ASSERT_OK(err, "btf__add_field bar:b"))
break;
err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
if (!ASSERT_OK(err, "btf__add_field bar:c"))
break;
id = btf__add_decl_tag(btf, "contains:baz:a", 7, 0);
if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:baz:a"))
break;
id = btf__add_struct(btf, "baz", 16);
if (!ASSERT_EQ(id, 9, "btf__add_struct baz"))
break;
err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
if (!ASSERT_OK(err, "btf__add_field baz:a"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -ELOOP, "check btf");
btf__free(btf);
break;
}
while (test__start_subtest("btf: owning -> owning | owned -> owning | owned -> owned")) {
btf = init_btf();
if (!ASSERT_OK_PTR(btf, "init_btf"))
break;
id = btf__add_struct(btf, "foo", 20);
if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field foo::a"))
break;
err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
if (!ASSERT_OK(err, "btf__add_field foo::b"))
break;
id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
break;
id = btf__add_struct(btf, "bar", 36);
if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field bar::a"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
if (!ASSERT_OK(err, "btf__add_field bar::b"))
break;
err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
if (!ASSERT_OK(err, "btf__add_field bar::c"))
break;
id = btf__add_decl_tag(btf, "contains:baz:b", 7, 0);
if (!ASSERT_EQ(id, 8, "btf__add_decl_tag"))
break;
id = btf__add_struct(btf, "baz", 36);
if (!ASSERT_EQ(id, 9, "btf__add_struct baz"))
break;
err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
if (!ASSERT_OK(err, "btf__add_field bar::a"))
break;
err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
if (!ASSERT_OK(err, "btf__add_field bar::b"))
break;
err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
if (!ASSERT_OK(err, "btf__add_field bar::c"))
break;
id = btf__add_decl_tag(btf, "contains:bam:a", 9, 0);
if (!ASSERT_EQ(id, 10, "btf__add_decl_tag contains:bam:a"))
break;
id = btf__add_struct(btf, "bam", 16);
if (!ASSERT_EQ(id, 11, "btf__add_struct bam"))
break;
err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
if (!ASSERT_OK(err, "btf__add_field bam::a"))
break;
err = btf__load_into_kernel(btf);
ASSERT_EQ(err, -ELOOP, "check btf");
btf__free(btf);
break;
}
}
void test_linked_list(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(linked_list_fail_tests); i++) {
if (!test__start_subtest(linked_list_fail_tests[i].prog_name))
continue;
test_linked_list_fail_prog(linked_list_fail_tests[i].prog_name,
linked_list_fail_tests[i].err_msg);
}
test_btf();
test_linked_list_success(PUSH_POP, false);
test_linked_list_success(PUSH_POP, true);
test_linked_list_success(PUSH_POP_MULT, false);
test_linked_list_success(PUSH_POP_MULT, true);
test_linked_list_success(LIST_IN_LIST, false);
test_linked_list_success(LIST_IN_LIST, true);
test_linked_list_success(TEST_ALL, false);
}

View file

@ -173,10 +173,12 @@ static void test_lsm_cgroup_functional(void)
ASSERT_EQ(query_prog_cnt(cgroup_fd, NULL), 4, "total prog count");
ASSERT_EQ(query_prog_cnt(cgroup_fd2, NULL), 1, "total prog count");
/* AF_UNIX is prohibited. */
fd = socket(AF_UNIX, SOCK_STREAM, 0);
ASSERT_LT(fd, 0, "socket(AF_UNIX)");
if (!(skel->kconfig->CONFIG_SECURITY_APPARMOR
|| skel->kconfig->CONFIG_SECURITY_SELINUX
|| skel->kconfig->CONFIG_SECURITY_SMACK))
/* AF_UNIX is prohibited. */
ASSERT_LT(fd, 0, "socket(AF_UNIX)");
close(fd);
/* AF_INET6 gets default policy (sk_priority). */
@ -233,11 +235,18 @@ static void test_lsm_cgroup_functional(void)
/* AF_INET6+SOCK_STREAM
* AF_PACKET+SOCK_RAW
* AF_UNIX+SOCK_RAW if already have non-bpf lsms installed
* listen_fd
* client_fd
* accepted_fd
*/
ASSERT_EQ(skel->bss->called_socket_post_create2, 5, "called_create2");
if (skel->kconfig->CONFIG_SECURITY_APPARMOR
|| skel->kconfig->CONFIG_SECURITY_SELINUX
|| skel->kconfig->CONFIG_SECURITY_SMACK)
/* AF_UNIX+SOCK_RAW if already have non-bpf lsms installed */
ASSERT_EQ(skel->bss->called_socket_post_create2, 6, "called_create2");
else
ASSERT_EQ(skel->bss->called_socket_post_create2, 5, "called_create2");
/* start_server
* bind(ETH_P_ALL)

View file

@ -0,0 +1,158 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates.*/
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <test_progs.h>
#include <bpf/btf.h>
#include "rcu_read_lock.skel.h"
#include "cgroup_helpers.h"
static unsigned long long cgroup_id;
static void test_success(void)
{
struct rcu_read_lock *skel;
int err;
skel = rcu_read_lock__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return;
skel->bss->target_pid = syscall(SYS_gettid);
bpf_program__set_autoload(skel->progs.get_cgroup_id, true);
bpf_program__set_autoload(skel->progs.task_succ, true);
bpf_program__set_autoload(skel->progs.no_lock, true);
bpf_program__set_autoload(skel->progs.two_regions, true);
bpf_program__set_autoload(skel->progs.non_sleepable_1, true);
bpf_program__set_autoload(skel->progs.non_sleepable_2, true);
err = rcu_read_lock__load(skel);
if (!ASSERT_OK(err, "skel_load"))
goto out;
err = rcu_read_lock__attach(skel);
if (!ASSERT_OK(err, "skel_attach"))
goto out;
syscall(SYS_getpgid);
ASSERT_EQ(skel->bss->task_storage_val, 2, "task_storage_val");
ASSERT_EQ(skel->bss->cgroup_id, cgroup_id, "cgroup_id");
out:
rcu_read_lock__destroy(skel);
}
static void test_rcuptr_acquire(void)
{
struct rcu_read_lock *skel;
int err;
skel = rcu_read_lock__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return;
skel->bss->target_pid = syscall(SYS_gettid);
bpf_program__set_autoload(skel->progs.task_acquire, true);
err = rcu_read_lock__load(skel);
if (!ASSERT_OK(err, "skel_load"))
goto out;
err = rcu_read_lock__attach(skel);
ASSERT_OK(err, "skel_attach");
out:
rcu_read_lock__destroy(skel);
}
static const char * const inproper_region_tests[] = {
"miss_lock",
"miss_unlock",
"non_sleepable_rcu_mismatch",
"inproper_sleepable_helper",
"inproper_sleepable_kfunc",
"nested_rcu_region",
};
static void test_inproper_region(void)
{
struct rcu_read_lock *skel;
struct bpf_program *prog;
int i, err;
for (i = 0; i < ARRAY_SIZE(inproper_region_tests); i++) {
skel = rcu_read_lock__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return;
prog = bpf_object__find_program_by_name(skel->obj, inproper_region_tests[i]);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto out;
bpf_program__set_autoload(prog, true);
err = rcu_read_lock__load(skel);
ASSERT_ERR(err, "skel_load");
out:
rcu_read_lock__destroy(skel);
}
}
static const char * const rcuptr_misuse_tests[] = {
"task_untrusted_non_rcuptr",
"task_untrusted_rcuptr",
"cross_rcu_region",
};
static void test_rcuptr_misuse(void)
{
struct rcu_read_lock *skel;
struct bpf_program *prog;
int i, err;
for (i = 0; i < ARRAY_SIZE(rcuptr_misuse_tests); i++) {
skel = rcu_read_lock__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return;
prog = bpf_object__find_program_by_name(skel->obj, rcuptr_misuse_tests[i]);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto out;
bpf_program__set_autoload(prog, true);
err = rcu_read_lock__load(skel);
ASSERT_ERR(err, "skel_load");
out:
rcu_read_lock__destroy(skel);
}
}
void test_rcu_read_lock(void)
{
struct btf *vmlinux_btf;
int cgroup_fd;
vmlinux_btf = btf__load_vmlinux_btf();
if (!ASSERT_OK_PTR(vmlinux_btf, "could not load vmlinux BTF"))
return;
if (btf__find_by_name_kind(vmlinux_btf, "rcu", BTF_KIND_TYPE_TAG) < 0) {
test__skip();
goto out;
}
cgroup_fd = test__join_cgroup("/rcu_read_lock");
if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /rcu_read_lock"))
goto out;
cgroup_id = get_cgroup_id("/rcu_read_lock");
if (test__start_subtest("success"))
test_success();
if (test__start_subtest("rcuptr_acquire"))
test_rcuptr_acquire();
if (test__start_subtest("negative_tests_inproper_region"))
test_inproper_region();
if (test__start_subtest("negative_tests_rcuptr_misuse"))
test_rcuptr_misuse();
close(cgroup_fd);
out:
btf__free(vmlinux_btf);
}

View file

@ -0,0 +1,142 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <network_helpers.h>
#include "test_spin_lock.skel.h"
#include "test_spin_lock_fail.skel.h"
static char log_buf[1024 * 1024];
static struct {
const char *prog_name;
const char *err_msg;
} spin_lock_fail_tests[] = {
{ "lock_id_kptr_preserve",
"5: (bf) r1 = r0 ; R0_w=ptr_foo(id=2,ref_obj_id=2,off=0,imm=0) "
"R1_w=ptr_foo(id=2,ref_obj_id=2,off=0,imm=0) refs=2\n6: (85) call bpf_this_cpu_ptr#154\n"
"R1 type=ptr_ expected=percpu_ptr_" },
{ "lock_id_global_zero",
"; R1_w=map_value(off=0,ks=4,vs=4,imm=0)\n2: (85) call bpf_this_cpu_ptr#154\n"
"R1 type=map_value expected=percpu_ptr_" },
{ "lock_id_mapval_preserve",
"8: (bf) r1 = r0 ; R0_w=map_value(id=1,off=0,ks=4,vs=8,imm=0) "
"R1_w=map_value(id=1,off=0,ks=4,vs=8,imm=0)\n9: (85) call bpf_this_cpu_ptr#154\n"
"R1 type=map_value expected=percpu_ptr_" },
{ "lock_id_innermapval_preserve",
"13: (bf) r1 = r0 ; R0=map_value(id=2,off=0,ks=4,vs=8,imm=0) "
"R1_w=map_value(id=2,off=0,ks=4,vs=8,imm=0)\n14: (85) call bpf_this_cpu_ptr#154\n"
"R1 type=map_value expected=percpu_ptr_" },
{ "lock_id_mismatch_kptr_kptr", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_kptr_global", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_kptr_mapval", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_kptr_innermapval", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_global_global", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_global_kptr", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_global_mapval", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_global_innermapval", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_mapval_mapval", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_mapval_kptr", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_mapval_global", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_mapval_innermapval", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_innermapval_innermapval1", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_innermapval_innermapval2", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_innermapval_kptr", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_innermapval_global", "bpf_spin_unlock of different lock" },
{ "lock_id_mismatch_innermapval_mapval", "bpf_spin_unlock of different lock" },
};
static void test_spin_lock_fail_prog(const char *prog_name, const char *err_msg)
{
LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_buf = log_buf,
.kernel_log_size = sizeof(log_buf),
.kernel_log_level = 1);
struct test_spin_lock_fail *skel;
struct bpf_program *prog;
int ret;
skel = test_spin_lock_fail__open_opts(&opts);
if (!ASSERT_OK_PTR(skel, "test_spin_lock_fail__open_opts"))
return;
prog = bpf_object__find_program_by_name(skel->obj, prog_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto end;
bpf_program__set_autoload(prog, true);
ret = test_spin_lock_fail__load(skel);
if (!ASSERT_ERR(ret, "test_spin_lock_fail__load must fail"))
goto end;
/* Skip check if JIT does not support kfuncs */
if (strstr(log_buf, "JIT does not support calling kernel function")) {
test__skip();
goto end;
}
if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) {
fprintf(stderr, "Expected: %s\n", err_msg);
fprintf(stderr, "Verifier: %s\n", log_buf);
}
end:
test_spin_lock_fail__destroy(skel);
}
static void *spin_lock_thread(void *arg)
{
int err, prog_fd = *(u32 *) arg;
LIBBPF_OPTS(bpf_test_run_opts, topts,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.repeat = 10000,
);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "test_run");
ASSERT_OK(topts.retval, "test_run retval");
pthread_exit(arg);
}
void test_spin_lock_success(void)
{
struct test_spin_lock *skel;
pthread_t thread_id[4];
int prog_fd, i;
void *ret;
skel = test_spin_lock__open_and_load();
if (!ASSERT_OK_PTR(skel, "test_spin_lock__open_and_load"))
return;
prog_fd = bpf_program__fd(skel->progs.bpf_spin_lock_test);
for (i = 0; i < 4; i++) {
int err;
err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd);
if (!ASSERT_OK(err, "pthread_create"))
goto end;
}
for (i = 0; i < 4; i++) {
if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join"))
goto end;
if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd"))
goto end;
}
end:
test_spin_lock__destroy(skel);
}
void test_spin_lock(void)
{
int i;
test_spin_lock_success();
for (i = 0; i < ARRAY_SIZE(spin_lock_fail_tests); i++) {
if (!test__start_subtest(spin_lock_fail_tests[i].prog_name))
continue;
test_spin_lock_fail_prog(spin_lock_fail_tests[i].prog_name,
spin_lock_fail_tests[i].err_msg);
}
}

View file

@ -1,45 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <network_helpers.h>
static void *spin_lock_thread(void *arg)
{
int err, prog_fd = *(u32 *) arg;
LIBBPF_OPTS(bpf_test_run_opts, topts,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.repeat = 10000,
);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "test_run");
ASSERT_OK(topts.retval, "test_run retval");
pthread_exit(arg);
}
void test_spinlock(void)
{
const char *file = "./test_spin_lock.bpf.o";
pthread_t thread_id[4];
struct bpf_object *obj = NULL;
int prog_fd;
int err = 0, i;
void *ret;
err = bpf_prog_test_load(file, BPF_PROG_TYPE_CGROUP_SKB, &obj, &prog_fd);
if (CHECK_FAIL(err)) {
printf("test_spin_lock:bpf_prog_test_load errno %d\n", errno);
goto close_prog;
}
for (i = 0; i < 4; i++)
if (CHECK_FAIL(pthread_create(&thread_id[i], NULL,
&spin_lock_thread, &prog_fd)))
goto close_prog;
for (i = 0; i < 4; i++)
if (CHECK_FAIL(pthread_join(thread_id[i], &ret) ||
ret != (void *)&prog_fd))
goto close_prog;
close_prog:
bpf_object__close(obj);
}

View file

@ -0,0 +1,163 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#define _GNU_SOURCE
#include <sys/wait.h>
#include <test_progs.h>
#include <unistd.h>
#include "task_kfunc_failure.skel.h"
#include "task_kfunc_success.skel.h"
static size_t log_buf_sz = 1 << 20; /* 1 MB */
static char obj_log_buf[1048576];
static struct task_kfunc_success *open_load_task_kfunc_skel(void)
{
struct task_kfunc_success *skel;
int err;
skel = task_kfunc_success__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return NULL;
skel->bss->pid = getpid();
err = task_kfunc_success__load(skel);
if (!ASSERT_OK(err, "skel_load"))
goto cleanup;
return skel;
cleanup:
task_kfunc_success__destroy(skel);
return NULL;
}
static void run_success_test(const char *prog_name)
{
struct task_kfunc_success *skel;
int status;
pid_t child_pid;
struct bpf_program *prog;
struct bpf_link *link = NULL;
skel = open_load_task_kfunc_skel();
if (!ASSERT_OK_PTR(skel, "open_load_skel"))
return;
if (!ASSERT_OK(skel->bss->err, "pre_spawn_err"))
goto cleanup;
prog = bpf_object__find_program_by_name(skel->obj, prog_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto cleanup;
link = bpf_program__attach(prog);
if (!ASSERT_OK_PTR(link, "attached_link"))
goto cleanup;
child_pid = fork();
if (!ASSERT_GT(child_pid, -1, "child_pid"))
goto cleanup;
if (child_pid == 0)
_exit(0);
waitpid(child_pid, &status, 0);
ASSERT_OK(skel->bss->err, "post_wait_err");
cleanup:
bpf_link__destroy(link);
task_kfunc_success__destroy(skel);
}
static const char * const success_tests[] = {
"test_task_acquire_release_argument",
"test_task_acquire_release_current",
"test_task_acquire_leave_in_map",
"test_task_xchg_release",
"test_task_get_release",
"test_task_current_acquire_release",
"test_task_from_pid_arg",
"test_task_from_pid_current",
"test_task_from_pid_invalid",
};
static struct {
const char *prog_name;
const char *expected_err_msg;
} failure_tests[] = {
{"task_kfunc_acquire_untrusted", "R1 must be referenced or trusted"},
{"task_kfunc_acquire_fp", "arg#0 pointer type STRUCT task_struct must point"},
{"task_kfunc_acquire_unsafe_kretprobe", "reg type unsupported for arg#0 function"},
{"task_kfunc_acquire_trusted_walked", "R1 must be referenced or trusted"},
{"task_kfunc_acquire_null", "arg#0 pointer type STRUCT task_struct must point"},
{"task_kfunc_acquire_unreleased", "Unreleased reference"},
{"task_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
{"task_kfunc_get_non_kptr_acquired", "arg#0 expected pointer to map value"},
{"task_kfunc_get_null", "arg#0 expected pointer to map value"},
{"task_kfunc_xchg_unreleased", "Unreleased reference"},
{"task_kfunc_get_unreleased", "Unreleased reference"},
{"task_kfunc_release_untrusted", "arg#0 is untrusted_ptr_or_null_ expected ptr_ or socket"},
{"task_kfunc_release_fp", "arg#0 pointer type STRUCT task_struct must point"},
{"task_kfunc_release_null", "arg#0 is ptr_or_null_ expected ptr_ or socket"},
{"task_kfunc_release_unacquired", "release kernel function bpf_task_release expects"},
{"task_kfunc_from_pid_no_null_check", "arg#0 is ptr_or_null_ expected ptr_ or socket"},
};
static void verify_fail(const char *prog_name, const char *expected_err_msg)
{
LIBBPF_OPTS(bpf_object_open_opts, opts);
struct task_kfunc_failure *skel;
int err, i;
opts.kernel_log_buf = obj_log_buf;
opts.kernel_log_size = log_buf_sz;
opts.kernel_log_level = 1;
skel = task_kfunc_failure__open_opts(&opts);
if (!ASSERT_OK_PTR(skel, "task_kfunc_failure__open_opts"))
goto cleanup;
for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
struct bpf_program *prog;
const char *curr_name = failure_tests[i].prog_name;
prog = bpf_object__find_program_by_name(skel->obj, curr_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto cleanup;
bpf_program__set_autoload(prog, !strcmp(curr_name, prog_name));
}
err = task_kfunc_failure__load(skel);
if (!ASSERT_ERR(err, "unexpected load success"))
goto cleanup;
if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
}
cleanup:
task_kfunc_failure__destroy(skel);
}
void test_task_kfunc(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(success_tests); i++) {
if (!test__start_subtest(success_tests[i]))
continue;
run_success_test(success_tests[i]);
}
for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
if (!test__start_subtest(failure_tests[i].prog_name))
continue;
verify_fail(failure_tests[i].prog_name, failure_tests[i].expected_err_msg);
}
}

View file

@ -0,0 +1,114 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#include <test_progs.h>
#include <network_helpers.h>
#include "type_cast.skel.h"
static void test_xdp(void)
{
struct type_cast *skel;
int err, prog_fd;
char buf[128];
LIBBPF_OPTS(bpf_test_run_opts, topts,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.data_out = buf,
.data_size_out = sizeof(buf),
.repeat = 1,
);
skel = type_cast__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return;
bpf_program__set_autoload(skel->progs.md_xdp, true);
err = type_cast__load(skel);
if (!ASSERT_OK(err, "skel_load"))
goto out;
prog_fd = bpf_program__fd(skel->progs.md_xdp);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "test_run");
ASSERT_EQ(topts.retval, XDP_PASS, "xdp test_run retval");
ASSERT_EQ(skel->bss->ifindex, 1, "xdp_md ifindex");
ASSERT_EQ(skel->bss->ifindex, skel->bss->ingress_ifindex, "xdp_md ingress_ifindex");
ASSERT_STREQ(skel->bss->name, "lo", "xdp_md name");
ASSERT_NEQ(skel->bss->inum, 0, "xdp_md inum");
out:
type_cast__destroy(skel);
}
static void test_tc(void)
{
struct type_cast *skel;
int err, prog_fd;
LIBBPF_OPTS(bpf_test_run_opts, topts,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.repeat = 1,
);
skel = type_cast__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return;
bpf_program__set_autoload(skel->progs.md_skb, true);
err = type_cast__load(skel);
if (!ASSERT_OK(err, "skel_load"))
goto out;
prog_fd = bpf_program__fd(skel->progs.md_skb);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "test_run");
ASSERT_EQ(topts.retval, 0, "tc test_run retval");
ASSERT_EQ(skel->bss->meta_len, 0, "skb meta_len");
ASSERT_EQ(skel->bss->frag0_len, 0, "skb frag0_len");
ASSERT_NEQ(skel->bss->kskb_len, 0, "skb len");
ASSERT_NEQ(skel->bss->kskb2_len, 0, "skb2 len");
ASSERT_EQ(skel->bss->kskb_len, skel->bss->kskb2_len, "skb len compare");
out:
type_cast__destroy(skel);
}
static const char * const negative_tests[] = {
"untrusted_ptr",
"kctx_u64",
};
static void test_negative(void)
{
struct bpf_program *prog;
struct type_cast *skel;
int i, err;
for (i = 0; i < ARRAY_SIZE(negative_tests); i++) {
skel = type_cast__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return;
prog = bpf_object__find_program_by_name(skel->obj, negative_tests[i]);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto out;
bpf_program__set_autoload(prog, true);
err = type_cast__load(skel);
ASSERT_ERR(err, "skel_load");
out:
type_cast__destroy(skel);
}
}
void test_type_cast(void)
{
if (test__start_subtest("xdp"))
test_xdp();
if (test__start_subtest("tc"))
test_tc();
if (test__start_subtest("negative"))
test_negative();
}

View file

@ -85,7 +85,7 @@ static void test_max_pkt_size(int fd)
}
#define NUM_PKTS 10000
void test_xdp_do_redirect(void)
void serial_test_xdp_do_redirect(void)
{
int err, xdp_prog_fd, tc_prog_fd, ifindex_src, ifindex_dst;
char data[sizeof(pkt_udp) + sizeof(__u32)];

View file

@ -174,7 +174,7 @@ static void test_synproxy(bool xdp)
system("ip netns del synproxy");
}
void test_xdp_synproxy(void)
void serial_test_xdp_synproxy(void)
{
if (test__start_subtest("xdp"))
test_synproxy(true);

View file

@ -0,0 +1,72 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#ifndef _CGRP_KFUNC_COMMON_H
#define _CGRP_KFUNC_COMMON_H
#include <errno.h>
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
struct __cgrps_kfunc_map_value {
struct cgroup __kptr_ref * cgrp;
};
struct hash_map {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, int);
__type(value, struct __cgrps_kfunc_map_value);
__uint(max_entries, 1);
} __cgrps_kfunc_map SEC(".maps");
struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
struct cgroup *bpf_cgroup_kptr_get(struct cgroup **pp) __ksym;
void bpf_cgroup_release(struct cgroup *p) __ksym;
struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) __ksym;
static inline struct __cgrps_kfunc_map_value *cgrps_kfunc_map_value_lookup(struct cgroup *cgrp)
{
s32 id;
long status;
status = bpf_probe_read_kernel(&id, sizeof(id), &cgrp->self.id);
if (status)
return NULL;
return bpf_map_lookup_elem(&__cgrps_kfunc_map, &id);
}
static inline int cgrps_kfunc_map_insert(struct cgroup *cgrp)
{
struct __cgrps_kfunc_map_value local, *v;
long status;
struct cgroup *acquired, *old;
s32 id;
status = bpf_probe_read_kernel(&id, sizeof(id), &cgrp->self.id);
if (status)
return status;
local.cgrp = NULL;
status = bpf_map_update_elem(&__cgrps_kfunc_map, &id, &local, BPF_NOEXIST);
if (status)
return status;
v = bpf_map_lookup_elem(&__cgrps_kfunc_map, &id);
if (!v) {
bpf_map_delete_elem(&__cgrps_kfunc_map, &id);
return -ENOENT;
}
acquired = bpf_cgroup_acquire(cgrp);
old = bpf_kptr_xchg(&v->cgrp, acquired);
if (old) {
bpf_cgroup_release(old);
return -EEXIST;
}
return 0;
}
#endif /* _CGRP_KFUNC_COMMON_H */

View file

@ -0,0 +1,260 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include "cgrp_kfunc_common.h"
char _license[] SEC("license") = "GPL";
/* Prototype for all of the program trace events below:
*
* TRACE_EVENT(cgroup_mkdir,
* TP_PROTO(struct cgroup *cgrp, const char *path),
* TP_ARGS(cgrp, path)
*/
static struct __cgrps_kfunc_map_value *insert_lookup_cgrp(struct cgroup *cgrp)
{
int status;
status = cgrps_kfunc_map_insert(cgrp);
if (status)
return NULL;
return cgrps_kfunc_map_value_lookup(cgrp);
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_acquire_untrusted, struct cgroup *cgrp, const char *path)
{
struct cgroup *acquired;
struct __cgrps_kfunc_map_value *v;
v = insert_lookup_cgrp(cgrp);
if (!v)
return 0;
/* Can't invoke bpf_cgroup_acquire() on an untrusted pointer. */
acquired = bpf_cgroup_acquire(v->cgrp);
bpf_cgroup_release(acquired);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_acquire_fp, struct cgroup *cgrp, const char *path)
{
struct cgroup *acquired, *stack_cgrp = (struct cgroup *)&path;
/* Can't invoke bpf_cgroup_acquire() on a random frame pointer. */
acquired = bpf_cgroup_acquire((struct cgroup *)&stack_cgrp);
bpf_cgroup_release(acquired);
return 0;
}
SEC("kretprobe/cgroup_destroy_locked")
int BPF_PROG(cgrp_kfunc_acquire_unsafe_kretprobe, struct cgroup *cgrp)
{
struct cgroup *acquired;
/* Can't acquire an untrusted struct cgroup * pointer. */
acquired = bpf_cgroup_acquire(cgrp);
bpf_cgroup_release(acquired);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_acquire_trusted_walked, struct cgroup *cgrp, const char *path)
{
struct cgroup *acquired;
/* Can't invoke bpf_cgroup_acquire() on a pointer obtained from walking a trusted cgroup. */
acquired = bpf_cgroup_acquire(cgrp->old_dom_cgrp);
bpf_cgroup_release(acquired);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_acquire_null, struct cgroup *cgrp, const char *path)
{
struct cgroup *acquired;
/* Can't invoke bpf_cgroup_acquire() on a NULL pointer. */
acquired = bpf_cgroup_acquire(NULL);
if (!acquired)
return 0;
bpf_cgroup_release(acquired);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_acquire_unreleased, struct cgroup *cgrp, const char *path)
{
struct cgroup *acquired;
acquired = bpf_cgroup_acquire(cgrp);
/* Acquired cgroup is never released. */
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_get_non_kptr_param, struct cgroup *cgrp, const char *path)
{
struct cgroup *kptr;
/* Cannot use bpf_cgroup_kptr_get() on a non-kptr, even on a valid cgroup. */
kptr = bpf_cgroup_kptr_get(&cgrp);
if (!kptr)
return 0;
bpf_cgroup_release(kptr);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_get_non_kptr_acquired, struct cgroup *cgrp, const char *path)
{
struct cgroup *kptr, *acquired;
acquired = bpf_cgroup_acquire(cgrp);
/* Cannot use bpf_cgroup_kptr_get() on a non-map-value, even if the kptr was acquired. */
kptr = bpf_cgroup_kptr_get(&acquired);
bpf_cgroup_release(acquired);
if (!kptr)
return 0;
bpf_cgroup_release(kptr);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_get_null, struct cgroup *cgrp, const char *path)
{
struct cgroup *kptr;
/* Cannot use bpf_cgroup_kptr_get() on a NULL pointer. */
kptr = bpf_cgroup_kptr_get(NULL);
if (!kptr)
return 0;
bpf_cgroup_release(kptr);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_xchg_unreleased, struct cgroup *cgrp, const char *path)
{
struct cgroup *kptr;
struct __cgrps_kfunc_map_value *v;
v = insert_lookup_cgrp(cgrp);
if (!v)
return 0;
kptr = bpf_kptr_xchg(&v->cgrp, NULL);
if (!kptr)
return 0;
/* Kptr retrieved from map is never released. */
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_get_unreleased, struct cgroup *cgrp, const char *path)
{
struct cgroup *kptr;
struct __cgrps_kfunc_map_value *v;
v = insert_lookup_cgrp(cgrp);
if (!v)
return 0;
kptr = bpf_cgroup_kptr_get(&v->cgrp);
if (!kptr)
return 0;
/* Kptr acquired above is never released. */
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_release_untrusted, struct cgroup *cgrp, const char *path)
{
struct __cgrps_kfunc_map_value *v;
v = insert_lookup_cgrp(cgrp);
if (!v)
return 0;
/* Can't invoke bpf_cgroup_release() on an untrusted pointer. */
bpf_cgroup_release(v->cgrp);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_release_fp, struct cgroup *cgrp, const char *path)
{
struct cgroup *acquired = (struct cgroup *)&path;
/* Cannot release random frame pointer. */
bpf_cgroup_release(acquired);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_release_null, struct cgroup *cgrp, const char *path)
{
struct __cgrps_kfunc_map_value local, *v;
long status;
struct cgroup *acquired, *old;
s32 id;
status = bpf_probe_read_kernel(&id, sizeof(id), &cgrp->self.id);
if (status)
return 0;
local.cgrp = NULL;
status = bpf_map_update_elem(&__cgrps_kfunc_map, &id, &local, BPF_NOEXIST);
if (status)
return status;
v = bpf_map_lookup_elem(&__cgrps_kfunc_map, &id);
if (!v)
return -ENOENT;
acquired = bpf_cgroup_acquire(cgrp);
old = bpf_kptr_xchg(&v->cgrp, acquired);
/* old cannot be passed to bpf_cgroup_release() without a NULL check. */
bpf_cgroup_release(old);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_release_unacquired, struct cgroup *cgrp, const char *path)
{
/* Cannot release trusted cgroup pointer which was not acquired. */
bpf_cgroup_release(cgrp);
return 0;
}

View file

@ -0,0 +1,170 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include "cgrp_kfunc_common.h"
char _license[] SEC("license") = "GPL";
int err, pid, invocations;
/* Prototype for all of the program trace events below:
*
* TRACE_EVENT(cgroup_mkdir,
* TP_PROTO(struct cgroup *cgrp, const char *path),
* TP_ARGS(cgrp, path)
*/
static bool is_test_kfunc_task(void)
{
int cur_pid = bpf_get_current_pid_tgid() >> 32;
bool same = pid == cur_pid;
if (same)
__sync_fetch_and_add(&invocations, 1);
return same;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_acquire_release_argument, struct cgroup *cgrp, const char *path)
{
struct cgroup *acquired;
if (!is_test_kfunc_task())
return 0;
acquired = bpf_cgroup_acquire(cgrp);
bpf_cgroup_release(acquired);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_acquire_leave_in_map, struct cgroup *cgrp, const char *path)
{
long status;
if (!is_test_kfunc_task())
return 0;
status = cgrps_kfunc_map_insert(cgrp);
if (status)
err = 1;
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_xchg_release, struct cgroup *cgrp, const char *path)
{
struct cgroup *kptr;
struct __cgrps_kfunc_map_value *v;
long status;
if (!is_test_kfunc_task())
return 0;
status = cgrps_kfunc_map_insert(cgrp);
if (status) {
err = 1;
return 0;
}
v = cgrps_kfunc_map_value_lookup(cgrp);
if (!v) {
err = 2;
return 0;
}
kptr = bpf_kptr_xchg(&v->cgrp, NULL);
if (!kptr) {
err = 3;
return 0;
}
bpf_cgroup_release(kptr);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_release, struct cgroup *cgrp, const char *path)
{
struct cgroup *kptr;
struct __cgrps_kfunc_map_value *v;
long status;
if (!is_test_kfunc_task())
return 0;
status = cgrps_kfunc_map_insert(cgrp);
if (status) {
err = 1;
return 0;
}
v = cgrps_kfunc_map_value_lookup(cgrp);
if (!v) {
err = 2;
return 0;
}
kptr = bpf_cgroup_kptr_get(&v->cgrp);
if (!kptr) {
err = 3;
return 0;
}
bpf_cgroup_release(kptr);
return 0;
}
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp, const char *path)
{
struct cgroup *self, *ancestor1, *invalid;
if (!is_test_kfunc_task())
return 0;
self = bpf_cgroup_ancestor(cgrp, cgrp->level);
if (!self) {
err = 1;
return 0;
}
if (self->self.id != cgrp->self.id) {
bpf_cgroup_release(self);
err = 2;
return 0;
}
bpf_cgroup_release(self);
ancestor1 = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
if (!ancestor1) {
err = 3;
return 0;
}
bpf_cgroup_release(ancestor1);
invalid = bpf_cgroup_ancestor(cgrp, 10000);
if (invalid) {
bpf_cgroup_release(invalid);
err = 4;
return 0;
}
invalid = bpf_cgroup_ancestor(cgrp, -1);
if (invalid) {
bpf_cgroup_release(invalid);
err = 5;
return 0;
}
return 0;
}

View file

@ -0,0 +1,37 @@
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
char _license[] SEC("license") = "GPL";
int ifindex;
int ret;
SEC("lwt_xmit")
int redirect_ingress(struct __sk_buff *skb)
{
ret = bpf_clone_redirect(skb, ifindex, BPF_F_INGRESS);
return 0;
}
SEC("lwt_xmit")
int redirect_egress(struct __sk_buff *skb)
{
ret = bpf_clone_redirect(skb, ifindex, 0);
return 0;
}
SEC("tc")
int tc_redirect_ingress(struct __sk_buff *skb)
{
ret = bpf_clone_redirect(skb, ifindex, BPF_F_INGRESS);
return 0;
}
SEC("tc")
int tc_redirect_egress(struct __sk_buff *skb)
{
ret = bpf_clone_redirect(skb, ifindex, 0);
return 0;
}

View file

@ -0,0 +1,370 @@
// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include "bpf_experimental.h"
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif
#include "linked_list.h"
static __always_inline
int list_push_pop(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
{
struct bpf_list_node *n;
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 2;
bpf_spin_lock(lock);
n = bpf_list_pop_front(head);
bpf_spin_unlock(lock);
if (n) {
bpf_obj_drop(container_of(n, struct foo, node));
bpf_obj_drop(f);
return 3;
}
bpf_spin_lock(lock);
n = bpf_list_pop_back(head);
bpf_spin_unlock(lock);
if (n) {
bpf_obj_drop(container_of(n, struct foo, node));
bpf_obj_drop(f);
return 4;
}
bpf_spin_lock(lock);
f->data = 42;
bpf_list_push_front(head, &f->node);
bpf_spin_unlock(lock);
if (leave_in_map)
return 0;
bpf_spin_lock(lock);
n = bpf_list_pop_back(head);
bpf_spin_unlock(lock);
if (!n)
return 5;
f = container_of(n, struct foo, node);
if (f->data != 42) {
bpf_obj_drop(f);
return 6;
}
bpf_spin_lock(lock);
f->data = 13;
bpf_list_push_front(head, &f->node);
bpf_spin_unlock(lock);
bpf_spin_lock(lock);
n = bpf_list_pop_front(head);
bpf_spin_unlock(lock);
if (!n)
return 7;
f = container_of(n, struct foo, node);
if (f->data != 13) {
bpf_obj_drop(f);
return 8;
}
bpf_obj_drop(f);
bpf_spin_lock(lock);
n = bpf_list_pop_front(head);
bpf_spin_unlock(lock);
if (n) {
bpf_obj_drop(container_of(n, struct foo, node));
return 9;
}
bpf_spin_lock(lock);
n = bpf_list_pop_back(head);
bpf_spin_unlock(lock);
if (n) {
bpf_obj_drop(container_of(n, struct foo, node));
return 10;
}
return 0;
}
static __always_inline
int list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
{
struct bpf_list_node *n;
struct foo *f[8], *pf;
int i;
for (i = 0; i < ARRAY_SIZE(f); i++) {
f[i] = bpf_obj_new(typeof(**f));
if (!f[i])
return 2;
f[i]->data = i;
bpf_spin_lock(lock);
bpf_list_push_front(head, &f[i]->node);
bpf_spin_unlock(lock);
}
for (i = 0; i < ARRAY_SIZE(f); i++) {
bpf_spin_lock(lock);
n = bpf_list_pop_front(head);
bpf_spin_unlock(lock);
if (!n)
return 3;
pf = container_of(n, struct foo, node);
if (pf->data != (ARRAY_SIZE(f) - i - 1)) {
bpf_obj_drop(pf);
return 4;
}
bpf_spin_lock(lock);
bpf_list_push_back(head, &pf->node);
bpf_spin_unlock(lock);
}
if (leave_in_map)
return 0;
for (i = 0; i < ARRAY_SIZE(f); i++) {
bpf_spin_lock(lock);
n = bpf_list_pop_back(head);
bpf_spin_unlock(lock);
if (!n)
return 5;
pf = container_of(n, struct foo, node);
if (pf->data != i) {
bpf_obj_drop(pf);
return 6;
}
bpf_obj_drop(pf);
}
bpf_spin_lock(lock);
n = bpf_list_pop_back(head);
bpf_spin_unlock(lock);
if (n) {
bpf_obj_drop(container_of(n, struct foo, node));
return 7;
}
bpf_spin_lock(lock);
n = bpf_list_pop_front(head);
bpf_spin_unlock(lock);
if (n) {
bpf_obj_drop(container_of(n, struct foo, node));
return 8;
}
return 0;
}
static __always_inline
int list_in_list(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
{
struct bpf_list_node *n;
struct bar *ba[8], *b;
struct foo *f;
int i;
f = bpf_obj_new(typeof(*f));
if (!f)
return 2;
for (i = 0; i < ARRAY_SIZE(ba); i++) {
b = bpf_obj_new(typeof(*b));
if (!b) {
bpf_obj_drop(f);
return 3;
}
b->data = i;
bpf_spin_lock(&f->lock);
bpf_list_push_back(&f->head, &b->node);
bpf_spin_unlock(&f->lock);
}
bpf_spin_lock(lock);
f->data = 42;
bpf_list_push_front(head, &f->node);
bpf_spin_unlock(lock);
if (leave_in_map)
return 0;
bpf_spin_lock(lock);
n = bpf_list_pop_front(head);
bpf_spin_unlock(lock);
if (!n)
return 4;
f = container_of(n, struct foo, node);
if (f->data != 42) {
bpf_obj_drop(f);
return 5;
}
for (i = 0; i < ARRAY_SIZE(ba); i++) {
bpf_spin_lock(&f->lock);
n = bpf_list_pop_front(&f->head);
bpf_spin_unlock(&f->lock);
if (!n) {
bpf_obj_drop(f);
return 6;
}
b = container_of(n, struct bar, node);
if (b->data != i) {
bpf_obj_drop(f);
bpf_obj_drop(b);
return 7;
}
bpf_obj_drop(b);
}
bpf_spin_lock(&f->lock);
n = bpf_list_pop_front(&f->head);
bpf_spin_unlock(&f->lock);
if (n) {
bpf_obj_drop(f);
bpf_obj_drop(container_of(n, struct bar, node));
return 8;
}
bpf_obj_drop(f);
return 0;
}
static __always_inline
int test_list_push_pop(struct bpf_spin_lock *lock, struct bpf_list_head *head)
{
int ret;
ret = list_push_pop(lock, head, false);
if (ret)
return ret;
return list_push_pop(lock, head, true);
}
static __always_inline
int test_list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head *head)
{
int ret;
ret = list_push_pop_multiple(lock ,head, false);
if (ret)
return ret;
return list_push_pop_multiple(lock, head, true);
}
static __always_inline
int test_list_in_list(struct bpf_spin_lock *lock, struct bpf_list_head *head)
{
int ret;
ret = list_in_list(lock, head, false);
if (ret)
return ret;
return list_in_list(lock, head, true);
}
SEC("tc")
int map_list_push_pop(void *ctx)
{
struct map_value *v;
v = bpf_map_lookup_elem(&array_map, &(int){0});
if (!v)
return 1;
return test_list_push_pop(&v->lock, &v->head);
}
SEC("tc")
int inner_map_list_push_pop(void *ctx)
{
struct map_value *v;
void *map;
map = bpf_map_lookup_elem(&map_of_maps, &(int){0});
if (!map)
return 1;
v = bpf_map_lookup_elem(map, &(int){0});
if (!v)
return 1;
return test_list_push_pop(&v->lock, &v->head);
}
SEC("tc")
int global_list_push_pop(void *ctx)
{
return test_list_push_pop(&glock, &ghead);
}
SEC("tc")
int map_list_push_pop_multiple(void *ctx)
{
struct map_value *v;
int ret;
v = bpf_map_lookup_elem(&array_map, &(int){0});
if (!v)
return 1;
return test_list_push_pop_multiple(&v->lock, &v->head);
}
SEC("tc")
int inner_map_list_push_pop_multiple(void *ctx)
{
struct map_value *v;
void *map;
int ret;
map = bpf_map_lookup_elem(&map_of_maps, &(int){0});
if (!map)
return 1;
v = bpf_map_lookup_elem(map, &(int){0});
if (!v)
return 1;
return test_list_push_pop_multiple(&v->lock, &v->head);
}
SEC("tc")
int global_list_push_pop_multiple(void *ctx)
{
int ret;
ret = list_push_pop_multiple(&glock, &ghead, false);
if (ret)
return ret;
return list_push_pop_multiple(&glock, &ghead, true);
}
SEC("tc")
int map_list_in_list(void *ctx)
{
struct map_value *v;
int ret;
v = bpf_map_lookup_elem(&array_map, &(int){0});
if (!v)
return 1;
return test_list_in_list(&v->lock, &v->head);
}
SEC("tc")
int inner_map_list_in_list(void *ctx)
{
struct map_value *v;
void *map;
int ret;
map = bpf_map_lookup_elem(&map_of_maps, &(int){0});
if (!map)
return 1;
v = bpf_map_lookup_elem(map, &(int){0});
if (!v)
return 1;
return test_list_in_list(&v->lock, &v->head);
}
SEC("tc")
int global_list_in_list(void *ctx)
{
return test_list_in_list(&glock, &ghead);
}
char _license[] SEC("license") = "GPL";

View file

@ -0,0 +1,56 @@
// SPDX-License-Identifier: GPL-2.0
#ifndef LINKED_LIST_H
#define LINKED_LIST_H
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"
struct bar {
struct bpf_list_node node;
int data;
};
struct foo {
struct bpf_list_node node;
struct bpf_list_head head __contains(bar, node);
struct bpf_spin_lock lock;
int data;
struct bpf_list_node node2;
};
struct map_value {
struct bpf_spin_lock lock;
int data;
struct bpf_list_head head __contains(foo, node);
};
struct array_map {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, int);
__type(value, struct map_value);
__uint(max_entries, 1);
};
struct array_map array_map SEC(".maps");
struct array_map inner_map SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
__uint(max_entries, 1);
__type(key, int);
__type(value, int);
__array(values, struct array_map);
} map_of_maps SEC(".maps") = {
.values = {
[0] = &inner_map,
},
};
#define private(name) SEC(".bss." #name) __hidden __attribute__((aligned(8)))
private(A) struct bpf_spin_lock glock;
private(A) struct bpf_list_head ghead __contains(foo, node);
private(B) struct bpf_spin_lock glock2;
#endif

View file

@ -0,0 +1,581 @@
// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include "bpf_experimental.h"
#include "linked_list.h"
#define INIT \
struct map_value *v, *v2, *iv, *iv2; \
struct foo *f, *f1, *f2; \
struct bar *b; \
void *map; \
\
map = bpf_map_lookup_elem(&map_of_maps, &(int){ 0 }); \
if (!map) \
return 0; \
v = bpf_map_lookup_elem(&array_map, &(int){ 0 }); \
if (!v) \
return 0; \
v2 = bpf_map_lookup_elem(&array_map, &(int){ 0 }); \
if (!v2) \
return 0; \
iv = bpf_map_lookup_elem(map, &(int){ 0 }); \
if (!iv) \
return 0; \
iv2 = bpf_map_lookup_elem(map, &(int){ 0 }); \
if (!iv2) \
return 0; \
f = bpf_obj_new(typeof(*f)); \
if (!f) \
return 0; \
f1 = f; \
f2 = bpf_obj_new(typeof(*f2)); \
if (!f2) { \
bpf_obj_drop(f1); \
return 0; \
} \
b = bpf_obj_new(typeof(*b)); \
if (!b) { \
bpf_obj_drop(f2); \
bpf_obj_drop(f1); \
return 0; \
}
#define CHECK(test, op, hexpr) \
SEC("?tc") \
int test##_missing_lock_##op(void *ctx) \
{ \
INIT; \
void (*p)(void *) = (void *)&bpf_list_##op; \
p(hexpr); \
return 0; \
}
CHECK(kptr, push_front, &f->head);
CHECK(kptr, push_back, &f->head);
CHECK(kptr, pop_front, &f->head);
CHECK(kptr, pop_back, &f->head);
CHECK(global, push_front, &ghead);
CHECK(global, push_back, &ghead);
CHECK(global, pop_front, &ghead);
CHECK(global, pop_back, &ghead);
CHECK(map, push_front, &v->head);
CHECK(map, push_back, &v->head);
CHECK(map, pop_front, &v->head);
CHECK(map, pop_back, &v->head);
CHECK(inner_map, push_front, &iv->head);
CHECK(inner_map, push_back, &iv->head);
CHECK(inner_map, pop_front, &iv->head);
CHECK(inner_map, pop_back, &iv->head);
#undef CHECK
#define CHECK(test, op, lexpr, hexpr) \
SEC("?tc") \
int test##_incorrect_lock_##op(void *ctx) \
{ \
INIT; \
void (*p)(void *) = (void *)&bpf_list_##op; \
bpf_spin_lock(lexpr); \
p(hexpr); \
return 0; \
}
#define CHECK_OP(op) \
CHECK(kptr_kptr, op, &f1->lock, &f2->head); \
CHECK(kptr_global, op, &f1->lock, &ghead); \
CHECK(kptr_map, op, &f1->lock, &v->head); \
CHECK(kptr_inner_map, op, &f1->lock, &iv->head); \
\
CHECK(global_global, op, &glock2, &ghead); \
CHECK(global_kptr, op, &glock, &f1->head); \
CHECK(global_map, op, &glock, &v->head); \
CHECK(global_inner_map, op, &glock, &iv->head); \
\
CHECK(map_map, op, &v->lock, &v2->head); \
CHECK(map_kptr, op, &v->lock, &f2->head); \
CHECK(map_global, op, &v->lock, &ghead); \
CHECK(map_inner_map, op, &v->lock, &iv->head); \
\
CHECK(inner_map_inner_map, op, &iv->lock, &iv2->head); \
CHECK(inner_map_kptr, op, &iv->lock, &f2->head); \
CHECK(inner_map_global, op, &iv->lock, &ghead); \
CHECK(inner_map_map, op, &iv->lock, &v->head);
CHECK_OP(push_front);
CHECK_OP(push_back);
CHECK_OP(pop_front);
CHECK_OP(pop_back);
#undef CHECK
#undef CHECK_OP
#undef INIT
SEC("?kprobe/xyz")
int map_compat_kprobe(void *ctx)
{
bpf_list_push_front(&ghead, NULL);
return 0;
}
SEC("?kretprobe/xyz")
int map_compat_kretprobe(void *ctx)
{
bpf_list_push_front(&ghead, NULL);
return 0;
}
SEC("?tracepoint/xyz")
int map_compat_tp(void *ctx)
{
bpf_list_push_front(&ghead, NULL);
return 0;
}
SEC("?perf_event")
int map_compat_perf(void *ctx)
{
bpf_list_push_front(&ghead, NULL);
return 0;
}
SEC("?raw_tp/xyz")
int map_compat_raw_tp(void *ctx)
{
bpf_list_push_front(&ghead, NULL);
return 0;
}
SEC("?raw_tp.w/xyz")
int map_compat_raw_tp_w(void *ctx)
{
bpf_list_push_front(&ghead, NULL);
return 0;
}
SEC("?tc")
int obj_type_id_oor(void *ctx)
{
bpf_obj_new_impl(~0UL, NULL);
return 0;
}
SEC("?tc")
int obj_new_no_composite(void *ctx)
{
bpf_obj_new_impl(bpf_core_type_id_local(int), (void *)42);
return 0;
}
SEC("?tc")
int obj_new_no_struct(void *ctx)
{
bpf_obj_new(union { int data; unsigned udata; });
return 0;
}
SEC("?tc")
int obj_drop_non_zero_off(void *ctx)
{
void *f;
f = bpf_obj_new(struct foo);
if (!f)
return 0;
bpf_obj_drop(f+1);
return 0;
}
SEC("?tc")
int new_null_ret(void *ctx)
{
return bpf_obj_new(struct foo)->data;
}
SEC("?tc")
int obj_new_acq(void *ctx)
{
bpf_obj_new(struct foo);
return 0;
}
SEC("?tc")
int use_after_drop(void *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
bpf_obj_drop(f);
return f->data;
}
SEC("?tc")
int ptr_walk_scalar(void *ctx)
{
struct test1 {
struct test2 {
struct test2 *next;
} *ptr;
} *p;
p = bpf_obj_new(typeof(*p));
if (!p)
return 0;
bpf_this_cpu_ptr(p->ptr);
return 0;
}
SEC("?tc")
int direct_read_lock(void *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
return *(int *)&f->lock;
}
SEC("?tc")
int direct_write_lock(void *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
*(int *)&f->lock = 0;
return 0;
}
SEC("?tc")
int direct_read_head(void *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
return *(int *)&f->head;
}
SEC("?tc")
int direct_write_head(void *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
*(int *)&f->head = 0;
return 0;
}
SEC("?tc")
int direct_read_node(void *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
return *(int *)&f->node;
}
SEC("?tc")
int direct_write_node(void *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
*(int *)&f->node = 0;
return 0;
}
static __always_inline
int write_after_op(void (*push_op)(void *head, void *node))
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
bpf_spin_lock(&glock);
push_op(&ghead, &f->node);
f->data = 42;
bpf_spin_unlock(&glock);
return 0;
}
SEC("?tc")
int write_after_push_front(void *ctx)
{
return write_after_op((void *)bpf_list_push_front);
}
SEC("?tc")
int write_after_push_back(void *ctx)
{
return write_after_op((void *)bpf_list_push_back);
}
static __always_inline
int use_after_unlock(void (*op)(void *head, void *node))
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
bpf_spin_lock(&glock);
f->data = 42;
op(&ghead, &f->node);
bpf_spin_unlock(&glock);
return f->data;
}
SEC("?tc")
int use_after_unlock_push_front(void *ctx)
{
return use_after_unlock((void *)bpf_list_push_front);
}
SEC("?tc")
int use_after_unlock_push_back(void *ctx)
{
return use_after_unlock((void *)bpf_list_push_back);
}
static __always_inline
int list_double_add(void (*op)(void *head, void *node))
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
bpf_spin_lock(&glock);
op(&ghead, &f->node);
op(&ghead, &f->node);
bpf_spin_unlock(&glock);
return 0;
}
SEC("?tc")
int double_push_front(void *ctx)
{
return list_double_add((void *)bpf_list_push_front);
}
SEC("?tc")
int double_push_back(void *ctx)
{
return list_double_add((void *)bpf_list_push_back);
}
SEC("?tc")
int no_node_value_type(void *ctx)
{
void *p;
p = bpf_obj_new(struct { int data; });
if (!p)
return 0;
bpf_spin_lock(&glock);
bpf_list_push_front(&ghead, p);
bpf_spin_unlock(&glock);
return 0;
}
SEC("?tc")
int incorrect_value_type(void *ctx)
{
struct bar *b;
b = bpf_obj_new(typeof(*b));
if (!b)
return 0;
bpf_spin_lock(&glock);
bpf_list_push_front(&ghead, &b->node);
bpf_spin_unlock(&glock);
return 0;
}
SEC("?tc")
int incorrect_node_var_off(struct __sk_buff *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
bpf_spin_lock(&glock);
bpf_list_push_front(&ghead, (void *)&f->node + ctx->protocol);
bpf_spin_unlock(&glock);
return 0;
}
SEC("?tc")
int incorrect_node_off1(void *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
bpf_spin_lock(&glock);
bpf_list_push_front(&ghead, (void *)&f->node + 1);
bpf_spin_unlock(&glock);
return 0;
}
SEC("?tc")
int incorrect_node_off2(void *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
bpf_spin_lock(&glock);
bpf_list_push_front(&ghead, &f->node2);
bpf_spin_unlock(&glock);
return 0;
}
SEC("?tc")
int no_head_type(void *ctx)
{
void *p;
p = bpf_obj_new(typeof(struct { int data; }));
if (!p)
return 0;
bpf_spin_lock(&glock);
bpf_list_push_front(p, NULL);
bpf_spin_lock(&glock);
return 0;
}
SEC("?tc")
int incorrect_head_var_off1(struct __sk_buff *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
bpf_spin_lock(&glock);
bpf_list_push_front((void *)&ghead + ctx->protocol, &f->node);
bpf_spin_unlock(&glock);
return 0;
}
SEC("?tc")
int incorrect_head_var_off2(struct __sk_buff *ctx)
{
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
bpf_spin_lock(&glock);
bpf_list_push_front((void *)&f->head + ctx->protocol, &f->node);
bpf_spin_unlock(&glock);
return 0;
}
SEC("?tc")
int incorrect_head_off1(void *ctx)
{
struct foo *f;
struct bar *b;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
b = bpf_obj_new(typeof(*b));
if (!b) {
bpf_obj_drop(f);
return 0;
}
bpf_spin_lock(&f->lock);
bpf_list_push_front((void *)&f->head + 1, &b->node);
bpf_spin_unlock(&f->lock);
return 0;
}
SEC("?tc")
int incorrect_head_off2(void *ctx)
{
struct foo *f;
struct bar *b;
f = bpf_obj_new(typeof(*f));
if (!f)
return 0;
bpf_spin_lock(&glock);
bpf_list_push_front((void *)&ghead + 1, &f->node);
bpf_spin_unlock(&glock);
return 0;
}
static __always_inline
int pop_ptr_off(void *(*op)(void *head))
{
struct {
struct bpf_list_head head __contains(foo, node2);
struct bpf_spin_lock lock;
} *p;
struct bpf_list_node *n;
p = bpf_obj_new(typeof(*p));
if (!p)
return 0;
bpf_spin_lock(&p->lock);
n = op(&p->head);
bpf_spin_unlock(&p->lock);
bpf_this_cpu_ptr(n);
return 0;
}
SEC("?tc")
int pop_front_off(void *ctx)
{
return pop_ptr_off((void *)bpf_list_pop_front);
}
SEC("?tc")
int pop_back_off(void *ctx)
{
return pop_ptr_off((void *)bpf_list_pop_back);
}
char _license[] SEC("license") = "GPL";

View file

@ -7,6 +7,10 @@
char _license[] SEC("license") = "GPL";
extern bool CONFIG_SECURITY_SELINUX __kconfig __weak;
extern bool CONFIG_SECURITY_SMACK __kconfig __weak;
extern bool CONFIG_SECURITY_APPARMOR __kconfig __weak;
#ifndef AF_PACKET
#define AF_PACKET 17
#endif
@ -140,6 +144,10 @@ SEC("lsm_cgroup/sk_alloc_security")
int BPF_PROG(socket_alloc, struct sock *sk, int family, gfp_t priority)
{
called_socket_alloc++;
/* if already have non-bpf lsms installed, EPERM will cause memory leak of non-bpf lsms */
if (CONFIG_SECURITY_SELINUX || CONFIG_SECURITY_SMACK || CONFIG_SECURITY_APPARMOR)
return 1;
if (family == AF_UNIX)
return 0; /* EPERM */

View file

@ -0,0 +1,290 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "bpf_tracing_net.h"
#include "bpf_misc.h"
char _license[] SEC("license") = "GPL";
struct {
__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
__uint(map_flags, BPF_F_NO_PREALLOC);
__type(key, int);
__type(value, long);
} map_a SEC(".maps");
__u32 user_data, key_serial, target_pid;
__u64 flags, task_storage_val, cgroup_id;
struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
void bpf_key_put(struct bpf_key *key) __ksym;
void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;
struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;
SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int get_cgroup_id(void *ctx)
{
struct task_struct *task;
task = bpf_get_current_task_btf();
if (task->pid != target_pid)
return 0;
/* simulate bpf_get_current_cgroup_id() helper */
bpf_rcu_read_lock();
cgroup_id = task->cgroups->dfl_cgrp->kn->id;
bpf_rcu_read_unlock();
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int task_succ(void *ctx)
{
struct task_struct *task, *real_parent;
long init_val = 2;
long *ptr;
task = bpf_get_current_task_btf();
if (task->pid != target_pid)
return 0;
bpf_rcu_read_lock();
/* region including helper using rcu ptr real_parent */
real_parent = task->real_parent;
ptr = bpf_task_storage_get(&map_a, real_parent, &init_val,
BPF_LOCAL_STORAGE_GET_F_CREATE);
if (!ptr)
goto out;
ptr = bpf_task_storage_get(&map_a, real_parent, 0, 0);
if (!ptr)
goto out;
task_storage_val = *ptr;
out:
bpf_rcu_read_unlock();
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
int no_lock(void *ctx)
{
struct task_struct *task, *real_parent;
/* no bpf_rcu_read_lock(), old code still works */
task = bpf_get_current_task_btf();
real_parent = task->real_parent;
(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
int two_regions(void *ctx)
{
struct task_struct *task, *real_parent;
/* two regions */
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
bpf_rcu_read_unlock();
bpf_rcu_read_lock();
real_parent = task->real_parent;
(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
bpf_rcu_read_unlock();
return 0;
}
SEC("?fentry/" SYS_PREFIX "sys_getpgid")
int non_sleepable_1(void *ctx)
{
struct task_struct *task, *real_parent;
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
real_parent = task->real_parent;
(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
bpf_rcu_read_unlock();
return 0;
}
SEC("?fentry/" SYS_PREFIX "sys_getpgid")
int non_sleepable_2(void *ctx)
{
struct task_struct *task, *real_parent;
bpf_rcu_read_lock();
task = bpf_get_current_task_btf();
bpf_rcu_read_unlock();
bpf_rcu_read_lock();
real_parent = task->real_parent;
(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
bpf_rcu_read_unlock();
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
int task_acquire(void *ctx)
{
struct task_struct *task, *real_parent;
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
real_parent = task->real_parent;
/* acquire a reference which can be used outside rcu read lock region */
real_parent = bpf_task_acquire(real_parent);
bpf_rcu_read_unlock();
(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
bpf_task_release(real_parent);
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int miss_lock(void *ctx)
{
struct task_struct *task;
struct css_set *cgroups;
struct cgroup *dfl_cgrp;
/* missing bpf_rcu_read_lock() */
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
(void)bpf_task_storage_get(&map_a, task, 0, 0);
bpf_rcu_read_unlock();
bpf_rcu_read_unlock();
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int miss_unlock(void *ctx)
{
struct task_struct *task;
struct css_set *cgroups;
struct cgroup *dfl_cgrp;
/* missing bpf_rcu_read_unlock() */
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
(void)bpf_task_storage_get(&map_a, task, 0, 0);
return 0;
}
SEC("?fentry/" SYS_PREFIX "sys_getpgid")
int non_sleepable_rcu_mismatch(void *ctx)
{
struct task_struct *task, *real_parent;
task = bpf_get_current_task_btf();
/* non-sleepable: missing bpf_rcu_read_unlock() in one path */
bpf_rcu_read_lock();
real_parent = task->real_parent;
(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
if (real_parent)
bpf_rcu_read_unlock();
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int inproper_sleepable_helper(void *ctx)
{
struct task_struct *task, *real_parent;
struct pt_regs *regs;
__u32 value = 0;
void *ptr;
task = bpf_get_current_task_btf();
/* sleepable helper in rcu read lock region */
bpf_rcu_read_lock();
real_parent = task->real_parent;
regs = (struct pt_regs *)bpf_task_pt_regs(real_parent);
if (!regs) {
bpf_rcu_read_unlock();
return 0;
}
ptr = (void *)PT_REGS_IP(regs);
(void)bpf_copy_from_user_task(&value, sizeof(uint32_t), ptr, task, 0);
user_data = value;
(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
bpf_rcu_read_unlock();
return 0;
}
SEC("?lsm.s/bpf")
int BPF_PROG(inproper_sleepable_kfunc, int cmd, union bpf_attr *attr, unsigned int size)
{
struct bpf_key *bkey;
/* sleepable kfunc in rcu read lock region */
bpf_rcu_read_lock();
bkey = bpf_lookup_user_key(key_serial, flags);
bpf_rcu_read_unlock();
if (!bkey)
return -1;
bpf_key_put(bkey);
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
int nested_rcu_region(void *ctx)
{
struct task_struct *task, *real_parent;
/* nested rcu read lock regions */
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
bpf_rcu_read_lock();
real_parent = task->real_parent;
(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
bpf_rcu_read_unlock();
bpf_rcu_read_unlock();
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int task_untrusted_non_rcuptr(void *ctx)
{
struct task_struct *task, *last_wakee;
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
/* the pointer last_wakee marked as untrusted */
last_wakee = task->real_parent->last_wakee;
(void)bpf_task_storage_get(&map_a, last_wakee, 0, 0);
bpf_rcu_read_unlock();
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int task_untrusted_rcuptr(void *ctx)
{
struct task_struct *task, *real_parent;
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
real_parent = task->real_parent;
bpf_rcu_read_unlock();
/* helper use of rcu ptr outside the rcu read lock region */
(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
return 0;
}
SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
int cross_rcu_region(void *ctx)
{
struct task_struct *task, *real_parent;
/* rcu ptr define/use in different regions */
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
real_parent = task->real_parent;
bpf_rcu_read_unlock();
bpf_rcu_read_lock();
(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
bpf_rcu_read_unlock();
return 0;
}

View file

@ -0,0 +1,72 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#ifndef _TASK_KFUNC_COMMON_H
#define _TASK_KFUNC_COMMON_H
#include <errno.h>
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
struct __tasks_kfunc_map_value {
struct task_struct __kptr_ref * task;
};
struct hash_map {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, int);
__type(value, struct __tasks_kfunc_map_value);
__uint(max_entries, 1);
} __tasks_kfunc_map SEC(".maps");
struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
struct task_struct *bpf_task_kptr_get(struct task_struct **pp) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;
struct task_struct *bpf_task_from_pid(s32 pid) __ksym;
static inline struct __tasks_kfunc_map_value *tasks_kfunc_map_value_lookup(struct task_struct *p)
{
s32 pid;
long status;
status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
if (status)
return NULL;
return bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
}
static inline int tasks_kfunc_map_insert(struct task_struct *p)
{
struct __tasks_kfunc_map_value local, *v;
long status;
struct task_struct *acquired, *old;
s32 pid;
status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
if (status)
return status;
local.task = NULL;
status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
if (status)
return status;
v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
if (!v) {
bpf_map_delete_elem(&__tasks_kfunc_map, &pid);
return -ENOENT;
}
acquired = bpf_task_acquire(p);
old = bpf_kptr_xchg(&v->task, acquired);
if (old) {
bpf_task_release(old);
return -EEXIST;
}
return 0;
}
#endif /* _TASK_KFUNC_COMMON_H */

View file

@ -0,0 +1,273 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include "task_kfunc_common.h"
char _license[] SEC("license") = "GPL";
/* Prototype for all of the program trace events below:
*
* TRACE_EVENT(task_newtask,
* TP_PROTO(struct task_struct *p, u64 clone_flags)
*/
static struct __tasks_kfunc_map_value *insert_lookup_task(struct task_struct *task)
{
int status;
status = tasks_kfunc_map_insert(task);
if (status)
return NULL;
return tasks_kfunc_map_value_lookup(task);
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_acquire_untrusted, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired;
struct __tasks_kfunc_map_value *v;
v = insert_lookup_task(task);
if (!v)
return 0;
/* Can't invoke bpf_task_acquire() on an untrusted pointer. */
acquired = bpf_task_acquire(v->task);
bpf_task_release(acquired);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_acquire_fp, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired, *stack_task = (struct task_struct *)&clone_flags;
/* Can't invoke bpf_task_acquire() on a random frame pointer. */
acquired = bpf_task_acquire((struct task_struct *)&stack_task);
bpf_task_release(acquired);
return 0;
}
SEC("kretprobe/free_task")
int BPF_PROG(task_kfunc_acquire_unsafe_kretprobe, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired;
acquired = bpf_task_acquire(task);
/* Can't release a bpf_task_acquire()'d task without a NULL check. */
bpf_task_release(acquired);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_acquire_trusted_walked, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired;
/* Can't invoke bpf_task_acquire() on a trusted pointer obtained from walking a struct. */
acquired = bpf_task_acquire(task->last_wakee);
bpf_task_release(acquired);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_acquire_null, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired;
/* Can't invoke bpf_task_acquire() on a NULL pointer. */
acquired = bpf_task_acquire(NULL);
if (!acquired)
return 0;
bpf_task_release(acquired);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_acquire_unreleased, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired;
acquired = bpf_task_acquire(task);
/* Acquired task is never released. */
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_get_non_kptr_param, struct task_struct *task, u64 clone_flags)
{
struct task_struct *kptr;
/* Cannot use bpf_task_kptr_get() on a non-kptr, even on a valid task. */
kptr = bpf_task_kptr_get(&task);
if (!kptr)
return 0;
bpf_task_release(kptr);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_get_non_kptr_acquired, struct task_struct *task, u64 clone_flags)
{
struct task_struct *kptr, *acquired;
acquired = bpf_task_acquire(task);
/* Cannot use bpf_task_kptr_get() on a non-kptr, even if it was acquired. */
kptr = bpf_task_kptr_get(&acquired);
bpf_task_release(acquired);
if (!kptr)
return 0;
bpf_task_release(kptr);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_get_null, struct task_struct *task, u64 clone_flags)
{
struct task_struct *kptr;
/* Cannot use bpf_task_kptr_get() on a NULL pointer. */
kptr = bpf_task_kptr_get(NULL);
if (!kptr)
return 0;
bpf_task_release(kptr);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_xchg_unreleased, struct task_struct *task, u64 clone_flags)
{
struct task_struct *kptr;
struct __tasks_kfunc_map_value *v;
v = insert_lookup_task(task);
if (!v)
return 0;
kptr = bpf_kptr_xchg(&v->task, NULL);
if (!kptr)
return 0;
/* Kptr retrieved from map is never released. */
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_get_unreleased, struct task_struct *task, u64 clone_flags)
{
struct task_struct *kptr;
struct __tasks_kfunc_map_value *v;
v = insert_lookup_task(task);
if (!v)
return 0;
kptr = bpf_task_kptr_get(&v->task);
if (!kptr)
return 0;
/* Kptr acquired above is never released. */
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_release_untrusted, struct task_struct *task, u64 clone_flags)
{
struct __tasks_kfunc_map_value *v;
v = insert_lookup_task(task);
if (!v)
return 0;
/* Can't invoke bpf_task_release() on an untrusted pointer. */
bpf_task_release(v->task);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_release_fp, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired = (struct task_struct *)&clone_flags;
/* Cannot release random frame pointer. */
bpf_task_release(acquired);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_release_null, struct task_struct *task, u64 clone_flags)
{
struct __tasks_kfunc_map_value local, *v;
long status;
struct task_struct *acquired, *old;
s32 pid;
status = bpf_probe_read_kernel(&pid, sizeof(pid), &task->pid);
if (status)
return 0;
local.task = NULL;
status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
if (status)
return status;
v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
if (!v)
return -ENOENT;
acquired = bpf_task_acquire(task);
old = bpf_kptr_xchg(&v->task, acquired);
/* old cannot be passed to bpf_task_release() without a NULL check. */
bpf_task_release(old);
bpf_task_release(old);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_release_unacquired, struct task_struct *task, u64 clone_flags)
{
/* Cannot release trusted task pointer which was not acquired. */
bpf_task_release(task);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_from_pid_no_null_check, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired;
acquired = bpf_task_from_pid(task->pid);
/* Releasing bpf_task_from_pid() lookup without a NULL check. */
bpf_task_release(acquired);
return 0;
}

View file

@ -0,0 +1,222 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include "task_kfunc_common.h"
char _license[] SEC("license") = "GPL";
int err, pid;
/* Prototype for all of the program trace events below:
*
* TRACE_EVENT(task_newtask,
* TP_PROTO(struct task_struct *p, u64 clone_flags)
*/
static bool is_test_kfunc_task(void)
{
int cur_pid = bpf_get_current_pid_tgid() >> 32;
return pid == cur_pid;
}
static int test_acquire_release(struct task_struct *task)
{
struct task_struct *acquired;
acquired = bpf_task_acquire(task);
bpf_task_release(acquired);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_acquire_release_argument, struct task_struct *task, u64 clone_flags)
{
if (!is_test_kfunc_task())
return 0;
return test_acquire_release(task);
}
SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_acquire_release_current, struct task_struct *task, u64 clone_flags)
{
if (!is_test_kfunc_task())
return 0;
return test_acquire_release(bpf_get_current_task_btf());
}
SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_acquire_leave_in_map, struct task_struct *task, u64 clone_flags)
{
long status;
if (!is_test_kfunc_task())
return 0;
status = tasks_kfunc_map_insert(task);
if (status)
err = 1;
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_xchg_release, struct task_struct *task, u64 clone_flags)
{
struct task_struct *kptr;
struct __tasks_kfunc_map_value *v;
long status;
if (!is_test_kfunc_task())
return 0;
status = tasks_kfunc_map_insert(task);
if (status) {
err = 1;
return 0;
}
v = tasks_kfunc_map_value_lookup(task);
if (!v) {
err = 2;
return 0;
}
kptr = bpf_kptr_xchg(&v->task, NULL);
if (!kptr) {
err = 3;
return 0;
}
bpf_task_release(kptr);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_get_release, struct task_struct *task, u64 clone_flags)
{
struct task_struct *kptr;
struct __tasks_kfunc_map_value *v;
long status;
if (!is_test_kfunc_task())
return 0;
status = tasks_kfunc_map_insert(task);
if (status) {
err = 1;
return 0;
}
v = tasks_kfunc_map_value_lookup(task);
if (!v) {
err = 2;
return 0;
}
kptr = bpf_task_kptr_get(&v->task);
if (!kptr) {
err = 3;
return 0;
}
bpf_task_release(kptr);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_current_acquire_release, struct task_struct *task, u64 clone_flags)
{
struct task_struct *current, *acquired;
if (!is_test_kfunc_task())
return 0;
current = bpf_get_current_task_btf();
acquired = bpf_task_acquire(current);
bpf_task_release(acquired);
return 0;
}
static void lookup_compare_pid(const struct task_struct *p)
{
struct task_struct *acquired;
acquired = bpf_task_from_pid(p->pid);
if (!acquired) {
err = 1;
return;
}
if (acquired->pid != p->pid)
err = 2;
bpf_task_release(acquired);
}
SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_from_pid_arg, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired;
if (!is_test_kfunc_task())
return 0;
lookup_compare_pid(task);
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_from_pid_current, struct task_struct *task, u64 clone_flags)
{
struct task_struct *current, *acquired;
if (!is_test_kfunc_task())
return 0;
lookup_compare_pid(bpf_get_current_task_btf());
return 0;
}
static int is_pid_lookup_valid(s32 pid)
{
struct task_struct *acquired;
acquired = bpf_task_from_pid(pid);
if (acquired) {
bpf_task_release(acquired);
return 1;
}
return 0;
}
SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_from_pid_invalid, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired;
if (!is_test_kfunc_task())
return 0;
if (is_pid_lookup_valid(-1)) {
err = 1;
return 0;
}
if (is_pid_lookup_valid(0xcafef00d)) {
err = 2;
return 0;
}
return 0;
}

Some files were not shown because too many files have changed in this diff Show more