Optimizing OpenStack Nova VM Port Attach/Detach Performance
The article analyzes why Nova VM port attach and detach operations can take 70‑90 seconds due to full‑cache refresh and coarse locks, proposes incremental cache updates and fine‑grained port locks, and shows benchmark results that cut attach time to 10‑17 s and detach time to about 7 s while eliminating lock contention and consistency errors.
Problem Overview
Three performance and consistency problems were observed when attaching or detaching virtual machine ports in OpenStack Nova:
Excessive latency: on VMs with multiple ports, each attach or detach took 70-90 seconds or more.
Task blocking: attach, detach, Neutron event callbacks, and the periodic cache sync all queued on the same instance-level lock, so one slow task blocked the others.
Concurrency consistency: rapid attach-detach-attach sequences on the same port could leave orphaned NICs or cause attach/detach failures.
Issue 1 – 70‑90 s per attach/detach
Root cause identification
Timing logs added to the critical path showed the bottleneck in _build_network_info_model, where the entire instance‑level network‑info cache is rebuilt.
Typical call chain for attach_interface:
attach_interface
├── allocate_port_for_instance      # request port from Neutron
├── driver.attach_interface         # libvirt hot-plug NIC
└── get_instance_nw_info            # refresh info_cache (bottleneck)
    └── _build_network_info_model
        ├── list_ports              # query all ports of the VM
        └── for each port: query Neutron for network info
Even when only one port is attached, the code re-queries network information for all ports of the VM. For a VM with 10 ports, each port triggers a Neutron request, inflating latency.
Log excerpts:
_build_network_info_model list_ports took: 0.8s
_build_network_info_model _gather_port_ids_and_networks took: 28.3s
_build_network_info_model _build_vif_model took: 1.2s
The _gather_port_ids_and_networks stage accounts for about 28 s of the roughly 30 s total, which corresponds to the per-port Neutron queries.
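Per-stage timings like the ones above can be collected with a small context manager. The following is a minimal sketch, not the instrumentation the authors used: the helper name log_duration and the sink dictionary are hypothetical, and time.sleep stands in for the real Neutron calls.

```python
import logging
import time
from contextlib import contextmanager

LOG = logging.getLogger(__name__)


@contextmanager
def log_duration(stage, sink=None):
    """Log how long the wrapped block took; optionally record it in sink."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        if sink is not None:
            sink[stage] = elapsed
        LOG.info("%s took: %.1fs", stage, elapsed)


# Hypothetical usage around one stage of the cache rebuild.
timings = {}
with log_duration("_gather_port_ids_and_networks", sink=timings):
    time.sleep(0.01)  # stand-in for the per-port Neutron queries
```

Wrapping each stage separately is what makes it possible to attribute the delay to one stage rather than to the attach call as a whole.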
Solution – Incremental cache update
A new helper incremental_instance_cache_with_nw_info updates the cache only for newly added ports instead of rebuilding the whole cache:
# nova/network/base_api.py
from nova import objects
from nova.network import model as network_model


def incremental_instance_cache_with_nw_info(impl, context,
                                            instance, nw_info=None,
                                            update_cells=True):
    """Incremental update: append new port network info to existing cache"""
    current_nw_info = instance.get_network_info()
    # nw_info defaults to None, so fall back to an empty list before
    # concatenating it with the cached entries.
    new_nw_info = network_model.NetworkInfo(
        list(current_nw_info) + list(nw_info or [])
    )
    ic = objects.InstanceInfoCache.get_by_instance_uuid(
        context, instance.uuid)
    ic.network_info = new_nw_info
    ic.save(update_cells=update_cells)
    instance.info_cache = ic
The attach flow keeps the driver hot-plug step and replaces the full cache refresh that followed it with the incremental cache update:
attach_interface (optimized)
├── allocate_port_for_instance
├── driver.attach_interface         # hot-plug NIC
└── incremental_instance_cache      # only append new port info
Benchmark comparison (before → after):
Attach port latency: ~71 s → 10‑17 s
Detach port latency: ~92 s → ~7 s
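The incremental idea can be shown on plain data, independent of Nova. The sketch below is illustrative only: plain dicts stand in for VIF entries, and merge_vifs and drop_vif are hypothetical helpers. Merging by port ID (so a retried attach does not append a duplicate) and the detach-side removal are assumptions; the article only shows the append path.

```python
def merge_vifs(cached_vifs, new_vifs):
    """Append VIFs whose id is not cached yet; refresh entries that are."""
    merged = {vif["id"]: vif for vif in cached_vifs}
    for vif in new_vifs or []:
        merged[vif["id"]] = vif
    return list(merged.values())


def drop_vif(cached_vifs, port_id):
    """Return the cache without the VIF for port_id (assumed detach path)."""
    return [vif for vif in cached_vifs if vif["id"] != port_id]


cache = [{"id": "port-a", "address": "fa:16:3e:00:00:01"}]
new_port = {"id": "port-b", "address": "fa:16:3e:00:00:02"}

cache = merge_vifs(cache, [new_port])
cache = merge_vifs(cache, [new_port])  # a retried attach adds no duplicate
cache = drop_vif(cache, "port-a")
```

Neither helper contacts Neutron for ports that are already cached, which is where the saving over a full rebuild comes from.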
Issue 2 – Operations blocking each other
Root cause identification
Originally, several operations shared the same instance‑level lock refresh_cache-{instance_uuid}:
Attach → get_instance_nw_info → refresh_cache lock
Detach → get_instance_nw_info → refresh_cache lock
Neutron network‑changed event → get_instance_nw_info → refresh_cache lock
Periodic task _heal_instance_info_cache → get_instance_nw_info → refresh_cache lock
If the periodic task holds the lock for ~30 s, any incoming attach/detach request must wait, leading to queueing.
Solution – Split lock granularity
The lock is divided into two distinct scopes:
port-{port_uuid} lock: controls driver-level attach/detach for a specific port.
refresh_cache-{instance_uuid} lock: used only for full-cache refreshes (periodic tasks, event callbacks).
Attach and detach now acquire only the port‑level lock, eliminating interference with the periodic cache refresh.
# nova/compute/manager.py
def attach_interface(self, context, instance, network_id, port_id, ...):
    if port_id:
        # Serialize all operations on this port.
        with lockutils.lock('port-%s' % port_id):
            return self._attach_interface(...)
    else:
        # No port ID was supplied, so there is no port to lock on yet.
        return self._attach_interface(...)

def detach_interface(self, context, instance, port_id):
    with lockutils.lock('port-%s' % port_id):
        # detach logic ...
Timeline after optimization:
Periodic task   ----[refresh_cache lock]----
attach request  ----[port-aaa lock]----  executes immediately
detach request  ----[port-bbb lock]----  executes immediately
Attach and detach therefore no longer wait on the periodic task or on operations against other ports.
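The independence of the two lock scopes can be reproduced with the standard library alone. This is a sketch under stated assumptions: named_lock is a hypothetical in-process stand-in for oslo.concurrency's lockutils.lock, and heal_cache and attach are simplified placeholders for the real Nova methods.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


@contextmanager
def named_lock(name):
    """Hold the process-local lock registered under name."""
    with _registry_guard:
        lock = _locks[name]
    with lock:
        yield


log = []
refresh_started = threading.Event()
release_refresh = threading.Event()


def heal_cache(instance_uuid):
    # Simulates a slow periodic full-cache refresh.
    with named_lock("refresh_cache-%s" % instance_uuid):
        refresh_started.set()
        release_refresh.wait(timeout=5)
        log.append("refresh done")


def attach(port_id):
    with named_lock("port-%s" % port_id):
        log.append("attach %s" % port_id)


healer = threading.Thread(target=heal_cache, args=("inst-1",))
healer.start()
refresh_started.wait(timeout=5)
attach("aaa")  # completes while the refresh lock is still held
release_refresh.set()
healer.join()
```

Here the attach is recorded before the refresh finishes. If attach took the refresh_cache lock instead, it would block until the refresh released it.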
Issue 3 – Consistency under high concurrency
Problem reproduction
When the same port undergoes rapid attach → detach → attach, two race conditions can occur:
Detach reaches the driver first, but attach has already written the cache, leaving a stale NIC entry (orphaned NIC).
Concurrent cache modifications corrupt the cache, causing subsequent operations to miss the port or duplicate it, leading to attach/detach failures.
Solution – Unified port‑level locking
The lock split already resolves this: all operations on a given port (attach, detach, network‑vif‑deleted event) acquire the same port-{port_uuid} lock, guaranteeing serial execution:
# attach
with lockutils.lock('port-%s' % port_id):
    self._attach_interface(...)

# detach
with lockutils.lock('port-%s' % port_id):
    self._detach_interface(...)

# neutron event
with lockutils.lock('port-%s' % event.tag):
    self._process_instance_vif_deleted_event(...)
Different ports operate independently, while the periodic task _heal_instance_info_cache continues to perform full cache sync as a safety net.
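The effect of serializing every operation on one port can be sketched with two pieces of state that must agree: what the driver has plugged and what the cache records. Everything here is a simplified assumption for illustration. The sets driver_nics and cache_ports and the helper port_lock are hypothetical, and the sleep marks the window in which an unlocked concurrent operation could interleave.

```python
import threading
import time
from collections import defaultdict

_guard = threading.Lock()
_port_locks = defaultdict(threading.Lock)


def port_lock(port_id):
    """Return the single lock shared by every operation on port_id."""
    with _guard:
        return _port_locks["port-%s" % port_id]


driver_nics = set()   # NICs the hypervisor driver has plugged
cache_ports = set()   # ports recorded in the instance info cache


def attach(port_id):
    with port_lock(port_id):
        driver_nics.add(port_id)
        time.sleep(0.01)  # gap between driver call and cache write
        cache_ports.add(port_id)


def detach(port_id):
    with port_lock(port_id):
        driver_nics.discard(port_id)
        time.sleep(0.01)
        cache_ports.discard(port_id)


threads = [threading.Thread(target=op, args=("aaa",))
           for op in (attach, detach, attach)]
for t in threads:
    t.start()
for t in threads:
    t.join()

consistent = driver_nics == cache_ports
```

The lock does not decide which request runs first, so the port may end up attached or detached. What it guarantees is that the driver state and the cache agree once all operations have finished.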
Lock hierarchy (simplified):
┌──────────────── Instance level ────────────────┐
│ refresh_cache-{instance_uuid} lock             │
│ ├── periodic task _heal_instance_info_cache    │
│ └── network-changed event callback             │
├────────────────── Port level ──────────────────┤
│ port-{port_uuid} lock                          │
│ ├── attach_interface                           │
│ ├── detach_interface                           │
│ └── network-vif-deleted event                  │
└────────────────────────────────────────────────┘
Optimization Results
Attach port latency: ~71 s → 10‑17 s
Detach port latency: ~92 s → ~7 s
Attach/Detach waiting for other tasks: Yes (queued) → No (independent execution)
High‑concurrency port consistency: possible orphaned NICs / failures → serial execution per port, consistent results
Conclusion
Three key changes enable the performance improvement while preserving data consistency:
Full → Incremental: replace the full cache rebuild with incremental updates for newly added ports.
Coarse lock → Fine lock: split the instance-level lock into per-port locks to reduce unnecessary waiting.
Unified entry lock: use a single port-level lock for all attach, detach, and related events, guaranteeing serial execution per port.
Combined with the periodic full‑cache sync, these changes dramatically reduce attach/detach latency and eliminate lock contention and race‑condition failures.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
