How LinkedIn Identified a Kernel Lock Contention Issue Causing Recurring System Freezes

TL;DR · AI summary
LinkedIn traced recurring 10-15 second database freezes to contention on the kernel's mmap_lock, using eBPF-based off-CPU profiling that was triggered automatically whenever a freeze was detected.
Key points
- Conventional logs and metrics gave no hint of the root cause, so LinkedIn engineers captured off-CPU profiles with the BCC offcputime.py profiler during live freezes.
- The freezes were traced to an allocation of around 3.5 GB, triggered by a Rust HashMap resize, which held mmap_lock in write mode and blocked all other threads.
- Pre-allocating the HashMap resolved the issue, at the cost of an additional ~3 GB of resident memory at startup.
May 27, 2026 2 min read
by Sergio De Simone
When LinkedIn engineers encountered short-lived, recurring outages in which the database powering their user feed became unavailable and then recovered without leaving helpful traces, they had to devise a novel approach to uncover the root cause using _off-CPU profiling_ with eBPF.
As LinkedIn engineer Pratikmohan Srivastav explains, investigating those incidents was especially challenging because they were ephemeral, lasting only 10-15 seconds, and left no useful logs. Additionally, they recurred with no clear pattern and showed no clear external trigger.
A first clue emerged from correlating the incidents with system memory behavior: each event coincided with a momentary spike in memory allocation, after which memory usage quickly stabilized at a higher baseline. Further analysis ruled out other common causes, including CPU throttling, memory fragmentation and compaction, and file I/O.
Thus, the analysis based on conventional monitoring and metrics provided no hints at the root cause of the issue, which prompted LinkedIn engineers to dig deeper into OS and runtime-level behavior during the freezes. They turned to off-CPU profiling to understand which threads were blocked at the time.
Our solution was to build a trap. We wrote a monitoring script that would automatically capture an off-CPU profile the instant a freeze was detected. The script works as follows:
The script continuously monitored database health and, the moment a freeze was detected, triggered offcputime.py, the off-CPU profiler from the eBPF toolkit BCC, to record kernel stack traces of blocked or sleeping threads for 15 seconds. This allowed LinkedIn engineers to capture an off-CPU profile during a live freeze.
This was the key breakthrough - these events were too brief for conventional monitoring to capture the underlying cause, so the only way to observe the root cause was to have profiling instrumentation already in place when the freeze began.
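The trap described above can be sketched roughly as follows. This is a minimal illustration, not LinkedIn's script: the health check is a hypothetical callable supplied by the caller, the install path of the BCC tool is an assumption that varies by distribution, and the output file naming is made up for the example. The offcputime options used here are its kernel-stacks-only flag (-K), folded output (-f), process filter (-p), and a positional duration in seconds.

```python
import subprocess
import time
from typing import Callable, List, Optional

# Assumed install path for the BCC tools; it differs between distributions.
OFFCPUTIME = "/usr/share/bcc/tools/offcputime"


def build_profile_command(pid: int, seconds: int = 15) -> List[str]:
    """Build an offcputime invocation for one process.

    -K: kernel stacks only, -f: folded output (flame-graph friendly),
    -p: restrict to a PID, trailing argument: duration in seconds.
    """
    return [OFFCPUTIME, "-K", "-f", "-p", str(pid), str(seconds)]


def run_profiler(cmd: List[str], out_path: str) -> None:
    """Run the profiler and save its output (BCC tools need root)."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=False)


def watch(
    is_healthy: Callable[[], bool],
    pid: int,
    on_freeze: Callable[[List[str], str], None] = run_profiler,
    poll_interval: float = 0.5,
    max_polls: Optional[int] = None,
) -> int:
    """Poll a health check and capture a profile each time it fails.

    `is_healthy` is a hypothetical, caller-supplied probe (for example a
    cheap query with a short timeout). Returns the number of captures.
    """
    captures = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if not is_healthy():
            # Hypothetical output naming: one file per detected freeze.
            out_path = f"offcpu-{pid}-{int(time.time())}.folded"
            on_freeze(build_profile_command(pid), out_path)
            captures += 1
        time.sleep(poll_interval)
    return captures
```

Keeping the profiler call behind the `on_freeze` parameter makes it possible to exercise the detection loop without root privileges or BCC installed, which is useful for a script that is meant to sit idle until a rare event occurs.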
The root cause was traced to a single large memory allocation of around 3.5 GB, which required the kernel to take the process's mmap_lock semaphore in write mode, effectively blocking all other threads.
Any operation that modifies the process's virtual address space - such as a large mmap allocation - must hold this lock in write mode. While the write lock is held, all other threads that need any memory operation (including madvise for purging, and page fault handling for I/O) are blocked.
Further analysis revealed that the allocation was triggered by a Rust in-memory HashMap (pkey_vs_docref), which maps primary keys to internal document references. When it grew past 58,720,256 entries, it hit a resize threshold and doubled in size.
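The 58,720,256 figure is consistent with a table that holds a power-of-two number of slots and grows once it is 7/8 full. The article does not state these parameters, so the short calculation below treats them as assumptions; it also reads "around 3.5 GB" as 3.5 GiB to estimate the implied size per slot, which is likewise a guess.

```python
# Assumed maximum load factor of 7/8 (not stated in the article).
LOAD_FACTOR_NUM, LOAD_FACTOR_DEN = 7, 8


def growth_threshold(buckets: int) -> int:
    """Entries a table with `buckets` slots can hold before it must grow."""
    return buckets * LOAD_FACTOR_NUM // LOAD_FACTOR_DEN


buckets_before = 2 ** 26                 # 67,108,864 slots (assumed)
threshold = growth_threshold(buckets_before)
buckets_after = buckets_before * 2       # the table doubles on resize

# Implied bytes per slot if the new allocation is taken as 3.5 GiB.
bytes_per_slot = 3.5 * 2 ** 30 / buckets_after

print(threshold)       # 58720256
print(buckets_after)   # 134217728
print(bytes_per_slot)  # 28.0
```

Under these assumptions, the resize asks for all 134,217,728 slots in one allocation while the old table is still live, which fits the observed pattern of a sudden spike followed by a higher memory baseline.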
Once the root cause was identified, LinkedIn engineers quickly resolved the issue by pre-allocating the HashMap, thus preventing the resizing during operation. This came at the cost of an additional ~3 GB of resident memory at startup, which proved to be an acceptable trade-off.
This incident highlighted several important lessons, Srivastav says: pre-allocating large data structures can help prevent sudden memory spikes in latency-sensitive paths; eBPF-based off-CPU profiling is a powerful tool for diagnosing “silent freezes” that leave little to no trace; and for ephemeral issues, automated instrumentation that activates on failure conditions can be essential for capturing meaningful diagnostics when the problem occurs.
About the Author

#### Sergio De Simone
Sergio De Simone is a software engineer. Sergio has been working as a software engineer for over twenty five years across a range of different projects and companies, including such different work environments as Siemens, HP, and small startups. For the last 10+ years, his focus has been on development for mobile platforms and related technologies. He is currently working for BigML, Inc., where he leads iOS and macOS development.