{"id":20823,"date":"2024-05-21T22:48:06","date_gmt":"2024-05-22T05:48:06","guid":{"rendered":"https:\/\/tdengine.com\/?p=20823"},"modified":"2025-03-30T23:31:46","modified_gmt":"2025-03-31T06:31:46","slug":"solving-long-query-performance-bottlenecks","status":"publish","type":"post","link":"https:\/\/tdengine.com\/solving-long-query-performance-bottlenecks\/","title":{"rendered":"Solving Long Query Performance Bottlenecks"},"content":{"rendered":"\n<p>Long query issues arise when a database that handles concurrent writes and queries must process large, time-consuming queries. These long queries can monopolize system resources and potentially block writes. In some cases, failing to call resource-release functions in query code can also manifest as a long query problem. Ensuring that long queries execute correctly without hindering data writes is a challenging problem.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-0205a119 gb-headline-text\">Introduction<\/h2>\n\n\n\n<p>Although most time-series use cases rarely encounter this issue, it can be quite troublesome when it does occur. To address it, the TDengine development team has continuously optimized the system to improve query performance and response speed. 
This article delves into this challenge and explores how to tackle and resolve long query issues to enhance TDengine&#8217;s performance in complex query scenarios.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-388ad440 gb-headline-text\">Data Write\/Read Mechanism<\/h2>\n\n\n\n<p>Before analyzing the long query problem, it&#8217;s essential to understand TDengine&#8217;s concurrent write\/query mechanism.<\/p>\n\n\n\n<figure class=\"gb-block-image gb-block-image-857feda0\"><img decoding=\"async\" width=\"1024\" height=\"742\" class=\"gb-image gb-image-857feda0\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-1-1024x742.jpeg?strip=all&sharp=1\" alt=\"TDengine's concurrent write\/query mechanism\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-1-1024x742.jpeg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-1-300x217.jpeg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-1-768x557.jpeg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-1.jpeg?strip=all&amp;sharp=1 1098w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-1.jpeg?strip=all&amp;sharp=1&amp;w=219 219w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-1.jpeg?strip=all&amp;sharp=1&amp;w=439 439w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-1.jpeg?strip=all&amp;sharp=1&amp;w=658 658w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-1.jpeg?strip=all&amp;sharp=1&amp;w=878 878w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"gb-headline gb-headline-ce210755 gb-headline-text\">Data Write Mechanism<\/h3>\n\n\n\n<p>In TDengine, a Vnode (Virtual Node) is the basic unit for storing and querying data. 
Here\u2019s an overview of its write mechanism:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li>When a Vnode is created, it is allocated an amount of memory determined by the database parameters.<\/li>\n\n\n\n<li>This memory is divided into three blocks within the Vnode.<\/li>\n\n\n\n<li>Each Vnode has a single write thread.<\/li>\n\n\n\n<li>To write data, the Vnode takes a memory block from the free list.<\/li>\n\n\n\n<li>Once the data in a memory block exceeds a threshold, that data is written to disk, and a new memory block is allocated for subsequent writes.<\/li>\n\n\n\n<li>If all memory blocks are in use and none is free, the write operation blocks until a memory block is released.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"gb-headline gb-headline-05fc9c05 gb-headline-text\">Data Query Mechanism<\/h3>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li>Queries are executed in batches. Each batch returns a portion of the data and then waits for the next fetch request.<\/li>\n\n\n\n<li>The query result combines in-memory (mem\/imem) data with on-disk data.<\/li>\n\n\n\n<li>When a query begins, it takes a snapshot that references mem\/imem and the disk files.<\/li>\n\n\n\n<li>When the query completes, it releases its references to mem\/imem. 
If the memory block&#8217;s reference count drops to zero, the block is returned to the free list.<\/li>\n<\/ol>\n\n\n\n<figure class=\"gb-block-image gb-block-image-19b15248\"><img decoding=\"async\" width=\"836\" height=\"156\" class=\"gb-image gb-image-19b15248\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-2.jpeg?strip=all&sharp=1\" alt=\"Data Query Mechanism\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-2.jpeg?strip=all&amp;sharp=1 836w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-2-300x56.jpeg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-2-768x143.jpeg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-2.jpeg?strip=all&amp;sharp=1&amp;w=167 167w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-2.jpeg?strip=all&amp;sharp=1&amp;w=501 501w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-2.jpeg?strip=all&amp;sharp=1&amp;w=668 668w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-2.jpeg?strip=all&amp;sharp=1&amp;w=450 450w\" sizes=\"(max-width: 836px) 100vw, 836px\" \/><\/figure>\n\n\n\n<h2 class=\"gb-headline gb-headline-37f98cef gb-headline-text\">The Long Query Problem<\/h2>\n\n\n\n<p>Most time-series queries are short. Typical examples are retrieving the last record of a table or supertable, or running aggregate queries such as count or sum. These queries release their MemTable references quickly, so the memory can be reused without affecting ongoing writes. Problems arise, however, with long-running queries (e.g., queries that run for more than an hour or a day). If several long queries run at the same time, they can occupy every memory block in a Vnode and stall writes.<\/p>\n\n\n\n<p>Additionally, bugs in query code that fail to close query handles can keep mem\/imem referenced for long periods, blocking writes. 
This issue has occurred in TDengine&#8217;s subscription and stream computing features.<\/p>\n\n\n\n<figure class=\"gb-block-image gb-block-image-33db3535\"><img decoding=\"async\" width=\"1024\" height=\"742\" class=\"gb-image gb-image-33db3535\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-3-1024x742.jpeg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-3-1024x742.jpeg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-3-300x217.jpeg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-3-768x557.jpeg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-3.jpeg?strip=all&amp;sharp=1 1098w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-3.jpeg?strip=all&amp;sharp=1&amp;w=219 219w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-3.jpeg?strip=all&amp;sharp=1&amp;w=439 439w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-3.jpeg?strip=all&amp;sharp=1&amp;w=658 658w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-3.jpeg?strip=all&amp;sharp=1&amp;w=878 878w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"gb-headline gb-headline-387cef83 gb-headline-text\">Solution to the Long Query Problem<\/h2>\n\n\n\n<p>We need a solution that ensures writes are not blocked and long queries do not fail, even when many long queries are running or when application code does not close query handles promptly. 
The proposed solution is as follows:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li>When a query takes a snapshot, it registers its query handle on every memory block it occupies, together with a reseek function.<\/li>\n\n\n\n<li>When the query ends, it unregisters the handle from all memory blocks it occupies.<\/li>\n\n\n\n<li>If a write operation finds no free memory block, it attempts to reclaim the oldest memory block that has already been committed but is still occupied by queries.<\/li>\n\n\n\n<li>To reclaim that block, the write thread traverses all query handles registered on it and calls the reseek function registered with each one.<\/li>\n\n\n\n<li>The reseek function locks the query handle if possible, sets the handle to the RESEEK state, saves the query state, and releases all memory blocks the handle occupies.<\/li>\n\n\n\n<li>The next time the query thread runs, it locks the handle and checks for the RESEEK state. If the state is set, the thread retakes the snapshot, restores the saved query state, and continues the query.<\/li>\n<\/ol>\n\n\n\n<figure class=\"gb-block-image gb-block-image-ae70b2e0\"><img decoding=\"async\" width=\"1024\" height=\"536\" class=\"gb-image gb-image-ae70b2e0\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-4.jpeg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-4.jpeg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-4-300x157.jpeg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-4-768x402.jpeg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-4.jpeg?strip=all&amp;sharp=1&amp;w=204 204w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-4.jpeg?strip=all&amp;sharp=1&amp;w=409 409w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-4.jpeg?strip=all&amp;sharp=1&amp;w=614 614w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-4.jpeg?strip=all&amp;sharp=1&amp;w=819 819w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" 
\/><\/figure>\n\n\n\n<figure class=\"gb-block-image gb-block-image-3ce3a10c\"><img decoding=\"async\" width=\"1024\" height=\"424\" class=\"gb-image gb-image-3ce3a10c\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-5.jpeg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-5.jpeg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-5-300x124.jpeg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-5-768x318.jpeg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-5.jpeg?strip=all&amp;sharp=1&amp;w=204 204w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-5.jpeg?strip=all&amp;sharp=1&amp;w=409 409w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-5.jpeg?strip=all&amp;sharp=1&amp;w=614 614w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-5.jpeg?strip=all&amp;sharp=1&amp;w=819 819w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"gb-headline gb-headline-03c1cc0d gb-headline-text\">Benefits of the Solution<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Writes can actively reclaim memory blocks held by inactive queries, preventing prolonged write blocking.<\/li>\n\n\n\n<li>Long queries resume by retaking their snapshots and continuing from the saved query state, so they do not fail.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"gb-headline gb-headline-dfa69641 gb-headline-text\">Q&amp;A<\/h2>\n\n\n\n<p>Q1: How are deadlocks avoided?<\/p>\n\n\n\n<p>A: When reclaiming a memory block, the write thread must lock the block&#8217;s list of registered query handles, and the reseek callback it then invokes needs the query handle lock. Meanwhile, a query thread may hold its handle lock while it retakes a snapshot and registers the handle on memory blocks again. Because each thread can end up waiting for a lock the other holds, a deadlock is possible. 
To prevent this:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Use trylock instead of lock<\/strong>: When reclaiming memory blocks, the write thread uses trylock to acquire the query handle lock. If the lock is already held, trylock returns immediately instead of blocking. This keeps the write thread from stalling and avoids the circular wait that leads to deadlock.<\/li>\n\n\n\n<li><strong>Multiple-attempt mechanism<\/strong>: Because a single trylock can fail, the write thread retries it several times to increase its chance of acquiring the lock and reclaiming the block.<\/li>\n<\/ol>\n\n\n\n<p>Q2: Can writes force a long query to reseek over and over, so that it keeps recovering and never finishes?<\/p>\n\n\n\n<p>A: No. When a query handle is opened, it receives a version number equal to the latest data version written to the Vnode, and the query can see only data up to that version. When it takes a snapshot, the query references only the mem\/imem data covered by this version. As writing continues, new mem\/imem blocks are not referenced by the long query, so only the mem and imem blocks that existed when the query was opened can trigger a reseek. This limits RESEEK to at most twice per long query.<\/p>\n\n\n\n<figure class=\"gb-block-image gb-block-image-4423dbef\"><img decoding=\"async\" width=\"1024\" height=\"742\" class=\"gb-image gb-image-4423dbef\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-6.jpeg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-6.jpeg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-6-300x217.jpeg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-6-768x557.jpeg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-6.jpeg?strip=all&amp;sharp=1&amp;w=204 204w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-6.jpeg?strip=all&amp;sharp=1&amp;w=409 409w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-6.jpeg?strip=all&amp;sharp=1&amp;w=614 614w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/LQ-6.jpeg?strip=all&amp;sharp=1&amp;w=819 819w\" sizes=\"(max-width: 1024px) 
100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"gb-headline gb-headline-6bfe25c0 gb-headline-text\">Conclusion<\/h2>\n\n\n\n<p>By optimizing the handling of long queries, TDengine can prevent long-duration queries from becoming a performance bottleneck. This ensures efficient concurrent writes and queries, maintaining system performance even in complex query scenarios.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Discover innovative solutions to resolve long query performance bottlenecks in TDengine, ensuring seamless concurrent data writes and queries.<\/p>\n","protected":false},"author":5,"featured_media":20829,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[21],"tags":[],"ppma_author":[119],"class_list":["post-20823","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-engineering"],"authors":[{"term_id":119,"user_id":5,"is_guest":0,"slug":"hzcheng","display_name":"Hongze 
Cheng","avatar_url":{"url":"https:\/\/tdengine.com\/wp-content\/uploads\/29.03-13-hzcheng.jpg","url2x":"https:\/\/tdengine.com\/wp-content\/uploads\/29.03-13-hzcheng.jpg"},"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts\/20823","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/comments?post=20823"}],"version-history":[{"count":11,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts\/20823\/revisions"}],"predecessor-version":[{"id":24723,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts\/20823\/revisions\/24723"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/media\/20829"}],"wp:attachment":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/media?parent=20823"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/categories?post=20823"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/tags?post=20823"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/ppma_author?post=20823"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}