How we solved a CPU bottleneck caused by Lua exceptions in a custom Kong plugin (using OpenResty XRay)
In this article, we share a success story of how one of our OpenResty XRay customers used our tool to identify and fix a CPU bottleneck in their Kong API gateway. Kong is a powerful and flexible API gateway built on top of our open-source OpenResty software. However, the custom plugins that extend Kong’s functionality may introduce performance issues or bugs that are hard to notice and debug. That’s why our customer decided to use OpenResty XRay, a non-invasive and efficient tool that can monitor and analyze the performance of any OpenResty application, including Kong.
The problem: high CPU usage in Kong servers
Our customer noticed that their Kong servers were consuming more CPU resources than expected, even though the incoming API traffic was not very high. They suspected that there might be some inefficiencies or errors in their custom plugins, but they had no clue where to look for them. They needed a tool that could help them pinpoint the root cause of the CPU bottleneck and provide actionable insights on how to fix it.
The analysis & report
The customer installed OpenResty XRay on their Kong servers and configured it to automatically sample the online Kong processes either periodically or when the CPU usage spiked. OpenResty XRay also automatically updated the analysis report for the current day every hour, so that our customer and our team could see the latest performance data.
One of the first things we noticed in the report was the following hot code path in the CPU section:
This code path showed that Lua exceptions were being thrown by the string.lower standard function, which converts all the characters of an input string to lowercase. Exceptions are expensive operations in most programming languages and their implementations, because they usually require stack unwinding and error handling. We wanted to know why these exceptions were happening and where they came from.
By hovering the mouse over the Lua or C functions in the code path, we could see the Lua source locations, including the file name and the line number.
The report also showed us a CPU flame graph with this hot code path highlighted in red:
To confirm our findings, we checked out the Errors & Exceptions section of the same report, which showed us this error message:
The error message said “bad argument #1 to ‘lower’ (string expected, got nil)”, which meant that the Lua code was passing nil values to the lower function, which expected a string argument. By looking at its parent function frame, [builtin#string.lower], we knew that it was the Lua builtin function string.lower. From the report, we also learned that the exception was thrown from line 35 of the source file .../kong/plugins/auth/handler.lua.
It was interesting to note that this Lua exception was caught by pcall, a Lua function that calls another function in protected mode, meaning that it can catch any errors without interrupting the whole Lua handler. That’s why there was nothing useful in Kong’s error log files.
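The effect can be reproduced with a minimal sketch in plain Lua. The function name normalize_header below is hypothetical and is not taken from the customer's plugin; it only stands in for whatever code ended up calling string.lower with a nil value.

```lua
-- Hypothetical stand-in for the plugin code; the real handler.lua is not shown here.
local function normalize_header(value)
    -- string.lower raises a Lua error when value is nil.
    local lowered = string.lower(value)
    return lowered
end

-- pcall runs the function in protected mode: the error comes back as a
-- return value instead of propagating, so nothing reaches the error log
-- unless the caller logs it explicitly.
local ok, err = pcall(normalize_header, nil)

print(ok)   -- false
print(err)  -- the message contains "string expected, got nil"
```

The request keeps going after such a call, but the cost of raising and catching the exception is still paid every time.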
At this point, we had enough information to conclude that there was a bug in the customer’s own auth Kong plugin, which misused the standard Lua API function string.lower. The fix was also simple: just avoid passing nil values to string.lower at line 35 of .../kong/plugins/auth/handler.lua.
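One way to apply such a fix is a small guard around the call. This is only a sketch under stated assumptions: the helper name safe_lower and the choice to return nil for non-string input are illustrative, since the actual code at line 35 is not reproduced in this article.

```lua
-- Hypothetical helper; not the customer's actual code.
local function safe_lower(value)
    -- Only call string.lower on real strings, so that no exception is
    -- raised (and then caught by pcall) on the hot path.
    if type(value) ~= "string" then
        return nil
    end
    return string.lower(value)
end

print(safe_lower("X-Auth-Token"))  -- x-auth-token
print(safe_lower(nil))             -- nil
```

Whether a nil value should be skipped, replaced by a default, or rejected with an explicit error response depends on what the plugin expects at that point.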
The result: improved performance and reduced CPU usage
After applying the fix to their custom auth plugin, our customer saw a dramatic improvement in their Kong server’s performance.
As we can see from this graph, for the same amount of incoming API traffic, the average CPU usage dropped from 80% to only 50%. That’s a 37.5% reduction in CPU consumption! Our customer was very happy with this result and thanked us for our help.
Conclusion
This article demonstrated how OpenResty XRay helped our customer find and fix a CPU bottleneck caused by unexpected Lua exceptions in their custom Kong plugin. By using OpenResty XRay, they were able to quickly identify the hot code path and the source location of the exception, as well as the error message and the stack trace. They were also able to see the performance improvement after applying the fix.
OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language, which targets various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the context.
About The Author
Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.
Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over two decades of programming and 16 years of open-source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.
OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.
As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.