{"id":23457,"date":"2024-12-19T00:07:21","date_gmt":"2024-12-19T08:07:21","guid":{"rendered":"https:\/\/tdengine.com\/?p=23457"},"modified":"2025-03-30T23:29:40","modified_gmt":"2025-03-31T06:29:40","slug":"in-depth-exploration-of-schemaless-ingestion-optimization-techniques-and-practices","status":"publish","type":"post","link":"https:\/\/tdengine.com\/in-depth-exploration-of-schemaless-ingestion-optimization-techniques-and-practices\/","title":{"rendered":"In-Depth Exploration of Schemaless Ingestion: Optimization Techniques and Practices"},"content":{"rendered":"\n<p>IoT applications often collect large volumes of data to support functions such as intelligent control, business analytics, and device monitoring. However, frequent changes in application logic or hardware can require continual updates to the data collection schema. This presents a significant challenge for time-series databases (TSDBs).<\/p>\n\n\n\n<p>To meet this need, TDengine offers a <strong>schemaless ingestion<\/strong> mode that lets developers insert data without defining a schema in advance. The system automatically determines the appropriate storage structure and adjusts the schema when necessary: if incoming data contains new columns, for example, they are added automatically so that the data is recorded accurately. For details on the processing logic, mapping rules, and schema change handling of schemaless ingestion, see the <a href=\"https:\/\/docs.tdengine.com\/developer-guide\/schemaless-ingestion\/\">TDengine technical documentation<\/a>.<\/p>\n\n\n\n<p>TDengine has invested significant effort in optimizing its schemaless ingestion functionality to improve both flexibility and efficiency. 
This article describes several of these optimizations and offers insights into how developers can use schemaless ingestion to improve application performance.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Performance Optimization Process<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identifying Performance Bottlenecks<\/h3>\n\n\n\n<p>By analyzing flame graphs of the data ingestion process, we can pinpoint where the system spends most of its time. In particular, the functions <code class=\"\" data-line=\"\">parseSmlKey<\/code>, <code class=\"\" data-line=\"\">parseSmlValue<\/code>, <code class=\"\" data-line=\"\">addChildTableDataPointsIntoSql<\/code>, <code class=\"\" data-line=\"\">taos_query_a<\/code>, and <code class=\"\" data-line=\"\">buildDataPointSchema<\/code> account for a large share of the processing time.<\/p>\n\n\n\n<figure class=\"gb-block-image gb-block-image-4c0f3572\"><img decoding=\"async\" width=\"1024\" height=\"494\" class=\"gb-image gb-image-4c0f3572\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless1-1024x494.jpg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless1-1024x494.jpg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless1-300x145.jpg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless1-768x370.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless1-1536x741.jpg?strip=all&amp;sharp=1 1536w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless1.jpg?strip=all&amp;sharp=1 1920w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless1.jpg?strip=all&amp;sharp=1&amp;w=384 384w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless1.jpg?strip=all&amp;sharp=1&amp;w=1152 1152w, 
https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless1.jpg?strip=all&amp;sharp=1&amp;w=450 450w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Addressing Performance Bottlenecks<\/h3>\n\n\n\n<p>For each of these functions, we can adopt one of two strategies to improve performance: either eliminate the bottleneck entirely or reduce its duration.<\/p>\n\n\n\n<p><strong>How can we eliminate these bottlenecks?<\/strong><\/p>\n\n\n\n<p>To answer this question, we first need to understand the current data parsing framework. Below is a simplified flowchart of how the existing framework processes data:<\/p>\n\n\n\n<figure class=\"gb-block-image gb-block-image-56f75500\"><img decoding=\"async\" width=\"1024\" height=\"444\" class=\"gb-image gb-image-56f75500\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-6-1024x444.jpg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-6-1024x444.jpg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-6-300x130.jpg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-6-768x333.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-6-1536x666.jpg?strip=all&amp;sharp=1 1536w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-6.jpg?strip=all&amp;sharp=1 1920w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-6.jpg?strip=all&amp;sharp=1&amp;w=384 384w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-6.jpg?strip=all&amp;sharp=1&amp;w=1152 1152w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-6.jpg?strip=all&amp;sharp=1&amp;w=450 450w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" 
\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Analysis of the Current Framework<\/h3>\n\n\n\n<p>In the current process, the client iterates over every record, extracts the measurement (measure), tag, and column (col) key-value pairs, and stores them in custom structures. It then sorts the tag keys and generates the corresponding subtable name according to predefined rules. Because this parsing, tag sorting, and subtable name generation are repeated for every record, the workflow spends unnecessary time and computation.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Schema Processing<\/h5>\n\n\n\n<p>After retrieving the schema metadata for a measurement, the system must check whether the schema needs to be updated. This requires traversing the tags and columns of every record to determine whether operations such as <code class=\"\" data-line=\"\">create stable<\/code>, <code class=\"\" data-line=\"\">add col<\/code>, <code class=\"\" data-line=\"\">add tag<\/code>, <code class=\"\" data-line=\"\">modify col<\/code>, or <code class=\"\" data-line=\"\">modify tag<\/code> are required.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Data Insertion<\/h5>\n\n\n\n<p>For datasets with fewer than 10 records, the system constructs SQL statements individually and inserts them through the <code class=\"\" data-line=\"\">taos_query<\/code> function. 
For larger datasets, it switches to batch insertion through the <code class=\"\" data-line=\"\">stmt<\/code> (parameter binding) interface, which is more efficient for bulk writes.<\/p>\n\n\n\n<p>The following code snippets illustrate these key functions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Primary Functions:<\/strong> <code class=\"\" data-line=\"\">tscParseLine<\/code>, <code class=\"\" data-line=\"\">parseSmlKvPairs<\/code>, <code class=\"\" data-line=\"\">tscSmlInsert<\/code>, <code class=\"\" data-line=\"\">buildDataPointSchemas<\/code>, <code class=\"\" data-line=\"\">modifyDBSchemas<\/code>, <code class=\"\" data-line=\"\">applyDataPointsWithSqlInsert<\/code><\/li>\n<\/ul>\n\n\n\n<figure class=\"gb-block-image gb-block-image-a9994f4a\"><img decoding=\"async\" width=\"1024\" height=\"642\" class=\"gb-image gb-image-a9994f4a\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless3-1024x642.jpg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless3-1024x642.jpg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless3-300x188.jpg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless3-768x482.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless3.jpg?strip=all&amp;sharp=1 1153w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless3.jpg?strip=all&amp;sharp=1&amp;w=230 230w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless3.jpg?strip=all&amp;sharp=1&amp;w=461 461w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless3.jpg?strip=all&amp;sharp=1&amp;w=691 691w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless3.jpg?strip=all&amp;sharp=1&amp;w=922 922w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>You can find the detailed code in the following GitHub links:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><a href=\"https:\/\/github.com\/taosdata\/TDengine\/blob\/2.6\/src\/client\/src\/tscParseLineProtocol.c\" rel=\"noopener\">tscParseLineProtocol.c<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/taosdata\/TDengine\/blob\/2.6\/src\/client\/src\/tscParseOpenTSDB.c\" rel=\"noopener\">tscParseOpenTSDB.c<\/a><\/li>\n<\/ul>\n\n\n\n<p>Now, let&#8217;s look at how the architecture was optimized.<\/p>\n\n\n\n<figure class=\"gb-block-image gb-block-image-d5f079fa\"><img decoding=\"async\" width=\"1024\" height=\"510\" class=\"gb-image gb-image-d5f079fa\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-7-1024x510.jpg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-7-1024x510.jpg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-7-300x150.jpg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-7-768x383.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-7-1536x766.jpg?strip=all&amp;sharp=1 1536w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-7.jpg?strip=all&amp;sharp=1 1920w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-7.jpg?strip=all&amp;sharp=1&amp;w=384 384w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-7.jpg?strip=all&amp;sharp=1&amp;w=1152 1152w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/whiteboard_exported_image-7.jpg?strip=all&amp;sharp=1&amp;w=450 450w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Major Optimizations<\/h3>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Line Protocol Parsing<\/strong><\/li>\n<\/ol>\n\n\n\n<p>Analysis of the data shows that tag strings sharing the same prefix are often identical. We can therefore pre-group the tag strings, placing those with the same prefix in the same group, and parse the tags only once per group instead of once per record.<\/p>\n\n\n\n<p>Additionally, by checking whether the data is already in order, we can bind values directly in sequence and avoid the overhead of hash lookups, which reduces computational cost.<\/p>\n\n\n\n<ol start=\"2\" class=\"wp-block-list\">\n<li><strong>Schema Processing<\/strong><\/li>\n<\/ol>\n\n\n\n<p>During parsing, we also check whether the schema requires modification, such as adding new columns or increasing column lengths. If no change is needed, the schema modification logic is skipped entirely.<\/p>\n\n\n\n<ol start=\"3\" class=\"wp-block-list\">\n<li><strong>Data Insertion<\/strong><\/li>\n<\/ol>\n\n\n\n<p>Rows are built while each line is being parsed, and the values are bound directly into the final <code class=\"\" data-line=\"\">STableDataCxt<\/code> structure. 
This eliminates the need for separate data binding and copying in later stages.<\/p>\n\n\n\n<p>Moreover, we bypass the SQL and <code class=\"\" data-line=\"\">stmt<\/code> interfaces entirely and construct <code class=\"\" data-line=\"\">BuildRow<\/code> data directly, avoiding a second parsing pass.<\/p>\n\n\n\n<p>Below are code snippets from the key functions involved in these optimizations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Key Functions:<\/strong> <code class=\"\" data-line=\"\">smlParseInfluxString<\/code>, <code class=\"\" data-line=\"\">smlParseTagKv<\/code>, <code class=\"\" data-line=\"\">smlParseColKv<\/code>, <code class=\"\" data-line=\"\">smlParseLineBottom<\/code>, <code class=\"\" data-line=\"\">smlModifyDBSchemas<\/code>, <code class=\"\" data-line=\"\">smlInsertData<\/code><\/li>\n<\/ul>\n\n\n\n<figure class=\"gb-block-image gb-block-image-226e4595\"><img decoding=\"async\" width=\"1024\" height=\"845\" class=\"gb-image gb-image-226e4595\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless4-1024x845.jpg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless4-1024x845.jpg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless4-300x248.jpg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless4-768x634.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless4.jpg?strip=all&amp;sharp=1 1212w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless4.jpg?strip=all&amp;sharp=1&amp;w=242 242w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless4.jpg?strip=all&amp;sharp=1&amp;w=484 484w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless4.jpg?strip=all&amp;sharp=1&amp;w=969 969w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"gb-block-image gb-block-image-f446e436\"><img 
decoding=\"async\" width=\"1024\" height=\"710\" class=\"gb-image gb-image-f446e436\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless5-1024x710.jpg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless5-1024x710.jpg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless5-300x208.jpg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless5-768x533.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless5.jpg?strip=all&amp;sharp=1 1103w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless5.jpg?strip=all&amp;sharp=1&amp;w=220 220w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless5.jpg?strip=all&amp;sharp=1&amp;w=441 441w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless5.jpg?strip=all&amp;sharp=1&amp;w=661 661w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless5.jpg?strip=all&amp;sharp=1&amp;w=882 882w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>You can view the full code in the following GitHub links:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/github.com\/taosdata\/TDengine\/blob\/3.0\/source\/client\/src\/clientSml.c\" rel=\"noopener\">clientSml.c<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/taosdata\/TDengine\/blob\/3.0\/source\/client\/src\/clientSmlLine.c\" rel=\"noopener\">clientSmlLine.c<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/taosdata\/TDengine\/blob\/3.0\/source\/client\/src\/clientSmlTelnet.c\" rel=\"noopener\">clientSmlTelnet.c<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/taosdata\/TDengine\/blob\/3.0\/source\/client\/src\/clientSmlJson.c\" rel=\"noopener\">clientSmlJson.c<\/a><\/li>\n<\/ul>\n\n\n\n<p>We also identified areas where memory usage could be further optimized during data parsing.<\/p>\n\n\n\n<h3 
class=\"wp-block-heading\">Memory Optimization<\/h3>\n\n\n\n<p>When parsing schemaless data, the original implementation performs numerous memory allocations and copies for measurements, tags, and columns. These steps are routine, but they also present an optimization opportunity.<\/p>\n\n\n\n<p>Instead of allocating and copying memory during parsing, we can record a pointer to each data item in the original input, along with its length. When the data is ready to be written to the database, it is copied from the original input using these recorded pointers.<\/p>\n\n\n\n<p>For example, instead of immediately allocating memory for <code class=\"\" data-line=\"\">t1\/t2\/t3\/c1\/c3\/c2<\/code> in the record below, we record their pointer locations and perform the memory copy only when needed.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Code Optimization Example:<\/h5>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">st,t1=3,t2=4,t3=t3 c1=3i64,c3=&quot;passit&quot;,c2=false,c4=4    1626006833639000000<\/code><\/pre>\n\n\n\n<p>In the original implementation, each key-value pair of such a record is parsed into a <code class=\"\" data-line=\"\">TAOS_SML_KV<\/code> structure that holds its own key and value:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">typedef struct {\n    char* key;               \/\/ tag or column name\n    uint8_t type;            \/\/ data type of the value\n    uint16_t length;         \/\/ length of the value\n    char* value;             \/\/ parsed value\n    uint32_t fieldSchemaIdx; \/\/ index of the field in the schema\n} TAOS_SML_KV;<\/code><\/pre>\n\n\n\n<figure class=\"gb-block-image gb-block-image-830d712b\"><img decoding=\"async\" width=\"822\" height=\"484\" class=\"gb-image gb-image-830d712b\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless6.jpg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless6.jpg?strip=all&amp;sharp=1 822w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless6-300x177.jpg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless6-768x452.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless6.jpg?strip=all&amp;sharp=1&amp;w=164 164w, 
https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless6.jpg?strip=all&amp;sharp=1&amp;w=493 493w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless6.jpg?strip=all&amp;sharp=1&amp;w=657 657w\" sizes=\"(max-width: 822px) 100vw, 822px\" \/><\/figure>\n\n\n\n<p>The optimized implementation instead uses pointers into the original line, together with the length of each part, to avoid unnecessary memory allocation:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">typedef struct {\n    char *measure;          \/\/ pointers into the original line (no copies)\n    char *tags;\n    char *cols;\n    char *timestamp;\n    char *measureTag;       \/\/ measurement plus tag set\n    int32_t measureLen;     \/\/ lengths of the parts referenced above\n    int32_t measureTagsLen;\n    int32_t tagsLen;\n    int32_t colsLen;\n    int32_t timestampLen;\n    SArray *colArray;       \/\/ parsed columns\n} SSmlLineInfo;<\/code><\/pre>\n\n\n\n<figure class=\"gb-block-image gb-block-image-51f50b08\"><img decoding=\"async\" width=\"1024\" height=\"470\" class=\"gb-image gb-image-51f50b08\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless7-1024x470.jpg?strip=all&sharp=1\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless7-1024x470.jpg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless7-300x138.jpg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless7-768x352.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless7.jpg?strip=all&amp;sharp=1 1145w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless7.jpg?strip=all&amp;sharp=1&amp;w=229 229w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless7.jpg?strip=all&amp;sharp=1&amp;w=458 458w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless7.jpg?strip=all&amp;sharp=1&amp;w=687 687w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless7.jpg?strip=all&amp;sharp=1&amp;w=916 916w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Time Precision Conversion 
Optimization<\/h4>\n\n\n\n<p>Replacing chains of conditional checks with a direct array lookup can significantly reduce processing time. For example, when converting timestamp precision, instead of testing the source precision against each possible value in turn, we index directly into a table of conversion factors.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">\/\/ Before: one branch per source precision (hours, minutes, seconds)\nif (fromPrecision == \/* hours *\/) { \/* ... *\/ }\nelse if (fromPrecision == \/* minutes *\/) { \/* ... *\/ }\nelse { \/* ... *\/ }\n\n\/\/ After: look up the factor that converts hours, minutes, or seconds to milliseconds\nint64_t smlToMilli&#091;3] = {3600000LL, 60000LL, 1000LL};\nint64_t unit = smlToMilli&#091;fromPrecision - TSDB_TIME_PRECISION_HOURS];\n\/\/ Overflow guard; the tsInt64 != 0 test avoids dividing by zero\nif (tsInt64 != 0 &amp;&amp; unit &gt; INT64_MAX \/ tsInt64) {\n  return -1;\n}\ntsInt64 *= unit;<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">JSON Parsing Optimization<\/h4>\n\n\n\n<p>Because the JSON format used for ingestion is usually fixed, we can precompute the offsets of elements such as the metric, timestamp, value, and tags, which speeds up data handling during parsing.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">&#091;\n  {\n    &quot;metric&quot;: &quot;meter_current&quot;,\n    &quot;timestamp&quot;: 1346846400,\n    &quot;value&quot;: 18,\n    &quot;tags&quot;: {\n      &quot;location&quot;: &quot;LosAngeles&quot;,\n      &quot;id&quot;: &quot;d1001&quot;\n    }\n  }\n]<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Conditional Logic Optimization<\/h4>\n\n\n\n<p>To reduce the number of comparisons each check performs, place the most likely <code class=\"\" data-line=\"\">if<\/code> conditions first. 
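For example, when identifying value types from suffixes and literals such as <code class=\"\" data-line=\"\">i64<\/code>, <code class=\"\" data-line=\"\">u64<\/code>, <code class=\"\" data-line=\"\">i8<\/code>, <code class=\"\" data-line=\"\">u8<\/code>, <code class=\"\" data-line=\"\">true<\/code>, and <code class=\"\" data-line=\"\">L&quot;&quot;<\/code>, branching on the first character narrows the candidates before any full string comparison:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">\/\/ Before: compare the whole string in every branch\nif (strcmp(str, &quot;i64&quot;) == 0) { \/* ... *\/ }\nelse if (strcmp(str, &quot;i32&quot;) == 0) { \/* ... *\/ }\nelse if (strcmp(str, &quot;u8&quot;) == 0) { \/* ... *\/ }\n\/\/ ...\n\n\/\/ After: branch on the first character, then narrow down within each group\nif (str&#091;0] == &#039;i&#039;) { \/* signed integer types *\/ }\nelse if (str&#091;0] == &#039;u&#039;) { \/* unsigned integer types *\/ }\n\/\/ ...<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Other Optimizations<\/h4>\n\n\n\n<p>In some cases, the <code class=\"\" data-line=\"\">likely<\/code> and <code class=\"\" data-line=\"\">unlikely<\/code> macros, which typically wrap the GCC\/Clang built-in <code class=\"\" data-line=\"\">__builtin_expect<\/code>, tell the compiler which branch is expected so that it can arrange the generated code to favor the common path.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">\/\/ An empty timestamp or a timestamp of &quot;0&quot; is rare, so the branch is marked unlikely;\n\/\/ in that case, the current time is returned, converted to the target precision\nif (unlikely(len == 0 || (len == 1 &amp;&amp; data&#091;0] == &#039;0&#039;))) {\n  return taosGetTimestampNs() \/ smlFactorNS&#091;toPrecision];\n}<\/code><\/pre>\n\n\n\n<p>Additionally, excessive logging, especially on frequently executed paths, can degrade performance. 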
In our tests, frequent logging increased processing time by about 10%.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Performance Comparison<\/h2>\n\n\n\n<table id=\"tablepress-73\" class=\"tablepress tablepress-id-73\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">Version<\/th><th class=\"column-2\">SQL<\/th><th class=\"column-3\">Line Protocol<\/th><th class=\"column-4\">Telnet Protocol<\/th><th class=\"column-5\">JSON Protocol<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">2.6(4ec22e8)<\/td><td class=\"column-2\">4543622<\/td><td class=\"column-3\">1458304<\/td><td class=\"column-4\">2161855<\/td><td class=\"column-5\">1272000<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">ver-3.0.0.0<\/td><td class=\"column-2\">1638498<\/td><td class=\"column-3\">1650033<\/td><td class=\"column-4\">1945982<\/td><td class=\"column-5\">800000<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">3.0(f6793d5)<\/td><td class=\"column-2\">3740774<\/td><td class=\"column-3\">3602947<\/td><td class=\"column-4\">4328447<\/td><td class=\"column-5\">5520000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-73 from cache -->\n\n\n<p>Comparing TDengine 3.0 (f6793d5) with 2.6 (4ec22e8), with throughput measured in records per second, shows a notable improvement for all three schemaless protocols:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Line Protocol:<\/strong> about 2.5x faster<\/li>\n\n\n\n<li><strong>Telnet:<\/strong> about 2x faster<\/li>\n\n\n\n<li><strong>JSON:<\/strong> about 4.3x faster<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Performance Analysis Tools<\/h2>\n\n\n\n<figure class=\"gb-block-image gb-block-image-bdd69322\"><img decoding=\"async\" width=\"1024\" height=\"479\" class=\"gb-image gb-image-bdd69322\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless8-1024x479.jpg?strip=all&sharp=1\" 
srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless8-1024x479.jpg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless8-300x140.jpg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless8-768x360.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless8-1536x719.jpg?strip=all&amp;sharp=1 1536w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless8.jpg?strip=all&amp;sharp=1 1920w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless8.jpg?strip=all&amp;sharp=1&amp;w=384 384w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless8.jpg?strip=all&amp;sharp=1&amp;w=1152 1152w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless8.jpg?strip=all&amp;sharp=1&amp;w=450 450w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>To analyze performance, we used tools like <strong>flame graphs<\/strong> and <strong>perf<\/strong>. 
Running <code class=\"\" data-line=\"\">perf top -p &lt;pid&gt;<\/code> against a running process shows, in real time, which functions consume the most CPU, so we can target them for further optimization.<\/p>\n\n\n\n<figure class=\"gb-block-image gb-block-image-e5e2f787\"><img decoding=\"async\" width=\"1588\" height=\"1134\" class=\"gb-image gb-image-e5e2f787\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless9-1.jpg?strip=all&sharp=1\" alt=\"\" title=\"schemaless9\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless9-1.jpg?strip=all&amp;sharp=1 1588w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless9-1-300x214.jpg?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless9-1-1024x731.jpg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless9-1-768x548.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless9-1-1536x1097.jpg?strip=all&amp;sharp=1 1536w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless9-1.jpg?strip=all&amp;sharp=1&amp;w=635 635w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless9-1.jpg?strip=all&amp;sharp=1&amp;w=952 952w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless9-1.jpg?strip=all&amp;sharp=1&amp;w=1270 1270w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless9-1.jpg?strip=all&amp;sharp=1&amp;w=450 450w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/figure>\n\n\n\n<figure class=\"gb-block-image gb-block-image-d0e6f817\"><img decoding=\"async\" width=\"1920\" height=\"1290\" class=\"gb-image gb-image-d0e6f817\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless10-1.jpg?strip=all&sharp=1\" alt=\"\" title=\"schemaless10\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless10-1.jpg?strip=all&amp;sharp=1 1920w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless10-1-300x202.jpg?strip=all&amp;sharp=1 
300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless10-1-1024x688.jpg?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless10-1-768x516.jpg?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless10-1-1536x1032.jpg?strip=all&amp;sharp=1 1536w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless10-1.jpg?strip=all&amp;sharp=1&amp;w=384 384w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless10-1.jpg?strip=all&amp;sharp=1&amp;w=1152 1152w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/schemaless10-1.jpg?strip=all&amp;sharp=1&amp;w=450 450w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Before undertaking performance optimization, it is essential to thoroughly understand the system&#8217;s architecture and flow. Performance improvements should focus on three key areas:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Architecture Evaluation:<\/strong> Make sure the architecture itself supports the performance you need; it is the foundation for every other improvement.<\/li>\n\n\n\n<li><strong>Bottleneck Identification:<\/strong> Address the most significant bottlenecks first and tackle secondary issues as needed.<\/li>\n\n\n\n<li><strong>Performance Analysis Techniques:<\/strong> Use tools like flame graphs and <code class=\"\" data-line=\"\">perf<\/code> to identify areas for improvement accurately.<\/li>\n<\/ol>\n\n\n\n<p>By considering factors like CPU usage, memory, network, compiler optimizations, and hardware constraints, you can achieve substantial performance gains. 
We hope this article gives developers working with TDengine&#8217;s schemaless ingestion useful insights for achieving higher performance in their applications.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>TDengine has optimized the schemaless ingestion feature to improve flexibility and efficiency.<\/p>\n","protected":false},"author":81,"featured_media":23475,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[21],"tags":[],"ppma_author":[167],"class_list":["post-23457","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-engineering"],"authors":[{"term_id":167,"user_id":81,"is_guest":0,"slug":"chait","display_name":"Chait Diwadkar","avatar_url":{"url":"https:\/\/tdengine.com\/wp-content\/uploads\/29.03-05-cdiwadkar.jpg","url2x":"https:\/\/tdengine.com\/wp-content\/uploads\/29.03-05-cdiwadkar.jpg"},"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts\/23457","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/users\/81"}],"replies":[{"embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/comments?post=23457"}],"version-history":[{"count":1,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts\/23457\/revisions"}],"predecessor-version":[{"id":23472,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts\/23457\/revisions\/23472"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/media\/23475"}],"wp:attachment":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/media?parent=23457"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\
/wp\/v2\/categories?post=23457"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/tags?post=23457"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/ppma_author?post=23457"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}