-
Notifications
You must be signed in to change notification settings - Fork 813
feat: Add NETopKV function. #1251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cb2af0c to
1f0778c
Compare
| * @publicapi | ||
| */ | ||
|
|
||
| #include "arm_compute/core/Types.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which types are needed from this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Status
If I remove it I get
In file included from src/runtime/NEON/functions/NETopKV.cpp:24:0:
./arm_compute/runtime/NEON/functions/NETopKV.h:79:12: error: ‘Status’ does not name a type; did you mean ‘static’?
static Status
^~~~~~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That class is defined under arm_compute/core/Error.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced Types.h with Error.h in next patchset
| */ | ||
| void configure(const ITensor *predictions, const ITensor *targets, ITensor *output, const unsigned int k); | ||
|
|
||
| /** Static function to check if given info will lead to a valid configuration of @ref CPPTopKVKernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CpuTopKVKernel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next patchset
src/cpu/kernels/CpuTopKVKernel.cpp
Outdated
| #include "src/core/helpers/WindowHelpers.h" | ||
| #include "src/cpu/kernels/topkv/list.h" | ||
|
|
||
| #include <array> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vector? also we need to add cstdint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next patchset
src/cpu/operators/CpuTopKV.h
Outdated
| void | ||
| configure(const ITensorInfo *predictions, const ITensorInfo *targets, ITensorInfo *output, const unsigned int k); | ||
|
|
||
| /** Static function to check if given info will lead to a valid configuration of @ref CPPTopKVKernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just say "similar to @ref CpuTopKV::configure()" to reduce duplication? Have a look at arm_compute/runtime/experimental/operators/CpuActivation.h
Same comment is valid for every header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next patchset
src/cpu/operators/CpuTopKV.cpp
Outdated
| ARM_COMPUTE_TRACE_EVENT(ARM_COMPUTE_PROF_CAT_CPU, ARM_COMPUTE_PROF_LVL_CPU, "CpuTopKV::run"); | ||
| ARM_COMPUTE_ERROR_ON_MSG(tensors.empty(), "No inputs provided"); | ||
|
|
||
| NEScheduler::get().schedule_op(_kernel.get(), 0, _kernel->window(), tensors); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of 0, we use Window::DimX
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/validation/reference/TopKV.cpp
Outdated
| *reinterpret_cast<const T *>(predictions(Coordinates{static_cast<int>(c), static_cast<int>(i)})); | ||
| const float v = static_cast<float>(vt); | ||
|
|
||
| // Mirror CPPTopKVKernel: (a - b > epsilon) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same way it's implemented in CPPTopKVKernel
| { | ||
|
|
||
| template <typename T> | ||
| SimpleTensor<uint8_t> topkv(SimpleTensor<T> &predictions, SimpleTensor<uint32_t> &targets, uint32_t k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to include
- SimpleTensor.h
- TensorShape.h
- cstdint (because of int types)
- CoreTypes.h for DataType
- Coordinates.h
Anything else? We don't need the big Types.h file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next patchset
| #include "tests/IAccessor.h" | ||
| #include "tests/validation/Helpers.h" | ||
| #include "tests/validation/reference/TopKV.h" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to include cstdint, type_traits, random and algorithm STL headers for the functions used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next patchset
| #include "tests/AssetsLibrary.h" | ||
| #include "tests/framework/Asserts.h" | ||
| #include "tests/framework/Fixture.h" | ||
| #include "tests/framework/ParametersLibrary.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need
- ParametersLibrary.h
- IAccessor.h
- tests/validation/Helpers.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in next patch
| #define ACL_TESTS_VALIDATION_FIXTURES_TOPKVLAYERFIXTURE_H | ||
|
|
||
| #include "arm_compute/core/TensorShape.h" | ||
| #include "arm_compute/core/Types.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need anything particular from this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed them in next patch
89cd2b8 to
cd41038
Compare
|
|
||
| #include "src/core/common/Macros.h" | ||
| #include "src/cpu/ICpuKernel.h" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at more carefully, I think we need to include the STL vector, string and cstdint headers in this file; not inside the corresponding cpp. Also, type_traits for std::add_pointer.
Same goes for ITensorInfo.h, ITensor.h.
We also need to include
- arm_compute/core/Window.h for Window
- arm_compute/core/CPP/CPPTypes.h for CPUInfo
- src/cpu/kernels/CpuKernelSelectionTypes.h for DataTypeSelector.. stuff
- arm_compute/core/Error.h for Status
In a nutshell, this file should be self-sufficient and we don't need to repeat the same includes in the corresponding .cpp file for brevity.
Can you please review this file and CpuTopKVKernek.cpp accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in next patch
src/cpu/kernels/CpuTopKVKernel.h
Outdated
| * @param[in] src0 First input tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32 | ||
| * @param[in] src1 Second input tensor info. Data types supported: U32 | ||
| * @param[out] dst The dst tensor info. Data types supported: U8 | ||
| * @param[in] policy Overflow policy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last argument is wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next patch
tests/validation/reference/TopKV.cpp
Outdated
| for (unsigned int i = 0; i < N; ++i) | ||
| { | ||
| // targets[i] (U32) | ||
| const uint32_t target_class = *reinterpret_cast<const uint32_t *>(targets(Coordinates{static_cast<int>(i)})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this part particularly difficult to read. Why not just targets[i]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| { | ||
| namespace cpu | ||
| { | ||
| void topkv_fp16_neon(const ITensor *in1, const ITensor *in2, ITensor *out, uint32_t k, const Window &win) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel there should be room to use templates in these implementations (especiall fps and quantized ones are very similar). The differences look like they can be handled using wrapper::.. type calls as in other kernels and type-enabled templates (e.g. for v7 variants, numeric_limits etc.). I think we could reduce the lines of code quite a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in next patch. Following the usual library pattern, common code will live in src/cpu/kernels/topkv/generic/neon/impl.h
src/cpu/operators/CpuTopKV.h
Outdated
| { | ||
| namespace cpu | ||
| { | ||
| /** Basic function to run @ref kernels::CpuActivationKernel */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CpuActivationKernel --> CpuTopKVKernel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solved in next patch
|
|
||
| for (unsigned int n = 0; n < N; ++n) | ||
| { | ||
| const uint32_t target = *reinterpret_cast<const uint32_t *>(targets(Coordinates{static_cast<int>(n)})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
targets[i]??
tests/validation/reference/TopKV.cpp
Outdated
| for (unsigned int c = 0; c < C; ++c) | ||
| { | ||
| const T vt = | ||
| *reinterpret_cast<const T *>(predictions(Coordinates{static_cast<int>(c), static_cast<int>(i)})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use int for all these data types, we'd reduce the number of static casts quite a lot and improve readability of this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| #include "tests/SimpleTensor.h" | ||
| #include "tests/validation/Helpers.h" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <cstdint>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tests/validation/reference/TopKV.h
Outdated
| #define ACL_TESTS_VALIDATION_REFERENCE_TOPKV_H | ||
|
|
||
| #include "tests/SimpleTensor.h" | ||
| #include "tests/validation/Helpers.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
| @@ -0,0 +1,87 @@ | |||
| /* | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add this operator to docs/user_guide/operator_list.dox
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next patch
541f913 to
e0090dc
Compare
* The Neon(TM) implementation of TopKV reduces execution time from 447.8 ms (CPP) to 11.65 ms for the same workload (F32, C=1000, N=32000, k=3, 6 threads), achieving an approximate 38× speedup. This gain comes from SIMD vectorization, removal of per-element branches, and a more efficient inner loop * Resolves ARMCL-1227. Change-Id: Ifdf161ce4254dc5ecd57aff9ae22410facd31705 Signed-off-by: Pablo Marquez Tello <[email protected]>
e0090dc to
11238b6
Compare
The Neon(TM) implementation of TopKV reduces execution time from 447.8 ms (scalar CPP) to 11.65 ms for the same workload (F32, C=1000, N=32000, k=3, 6 threads), achieving an approximate 38× speedup. This gain comes from SIMD vectorization, removal of per-element branches, and a more efficient inner loop.
Resolves ARMCL-1227
Change-Id: Ifdf161ce4254dc5ecd57aff9ae22410facd31705