{"id":7696,"date":"2022-08-10T13:46:05","date_gmt":"2022-08-10T20:46:05","guid":{"rendered":"https:\/\/tdengine.com\/?p=7696"},"modified":"2025-03-30T20:11:48","modified_gmt":"2025-03-31T03:11:48","slug":"synchronizing-data-from-kafka-to-tdengine","status":"publish","type":"post","link":"https:\/\/tdengine.com\/synchronizing-data-from-kafka-to-tdengine\/","title":{"rendered":"Synchronizing Data from Kafka to TDengine"},"content":{"rendered":"\n<p>This article discusses how to use the TDengine Kafka Connector to synchronize data from Kafka to TDengine and provides a sample script to test its performance in a data synchronization scenario.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-bc360c46 gb-headline-text\">Background<\/h2>\n\n\n\n<p>Kafka is a general-purpose message broker featuring a distributed architecture. You can use the Kafka Connect component to read from and write to Kafka, and its plug-ins can be used to read from and write to a variety of data sources. Kafka Connect supports fault tolerance, restarting, logging, elastic scaling, and serialization and deserialization.<\/p>\n\n\n\n<p>To make integrating Kafka and TDengine a simpler process, the TDengine Team developed the TDengine Kafka Connector as a Kafka Connect plug-in. The TDengine Kafka Connector consists of the TDengine Source Connector and TDengine Sink Connector. In this article, the TDengine Sink Connector is used to integrate Kafka with TDengine.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-44590e4b gb-headline-text\">TDengine Sink Connector implementation<\/h2>\n\n\n\n<p>The TDengine Sink Connector synchronizes the data from specified Kafka topics to a TDengine <a href=\"https:\/\/tdengine.com\/what-is-a-time-series-database\/\">time-series database<\/a> (TSDB) in batches or in real time.<\/p>\n\n\n\n<p>Before running the TDengine Sink Connector, you must create a <code class=\"\" data-line=\"\">properties<\/code> file. 
For information about the <code class=\"\" data-line=\"\">properties<\/code> file, see the <a href=\"https:\/\/docs.tdengine.com\/third-party-tools\/data-collection\/kafka-connect\/#add-source-connector-configuration-file\">official documentation<\/a>.<\/p>\n\n\n\n<p>The implementation of the TDengine Sink Connector is as follows:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The Kafka Connect framework starts a specified number of consumer threads.<\/li>\n\n\n\n<li>These consumers simultaneously subscribe to data and deserialize the data according to the values of <strong>key.converter<\/strong> and <strong>value.converter<\/strong> set in the configuration file.<\/li>\n\n\n\n<li>The Kafka Connect framework sends the deserialized data to a specified number of SinkTask instances.<\/li>\n\n\n\n<li>SinkTask uses the schemaless write interface provided by TDengine to write data into the database.<\/li>\n<\/ol>\n\n\n\n<p>In this process, the first three steps are implemented automatically by the Kafka Connect framework, and the TDengine Sink Connector performs the final step on its own.<\/p>\n\n\n\n<h3 class=\"gb-headline gb-headline-ce42792d gb-headline-text\">Supported data formats<\/h3>\n\n\n\n<p>The TDengine Sink Connector writes data in schemaless mode. It supports the InfluxDB line format, OpenTSDB telnet format, and OpenTSDB JSON format. You can modify the value of the <strong>db.schemaless<\/strong> parameter to choose the format that you want to use. 
As an example, the following configuration enables InfluxDB line format:<\/p>\n\n\n\n<pre class=\"wp-block-code language-properties\"><code class=\"\" data-line=\"\">db.schemaless=line<\/code><\/pre>\n\n\n\n<p>If the data in your Kafka deployment is already in one of the three formats mentioned, set <strong>value.converter<\/strong> to the built-in Kafka Connect string converter:<\/p>\n\n\n\n<pre class=\"wp-block-code language-properties\"><code class=\"\" data-line=\"\">value.converter=org.apache.kafka.connect.storage.StringConverter<\/code><\/pre>\n\n\n\n<p>Otherwise, you must implement your own converter to process your data into one of the supported formats. For more information, see <a href=\"https:\/\/docs.confluent.io\/platform\/current\/connect\/concepts.html#converters\" rel=\"noopener\">Converters<\/a>.<\/p>\n\n\n\n<p>In this implementation, Kafka Connect is acting as the consumer. Consumer behavior can therefore be controlled by modifying the Kafka Connect configuration.<\/p>\n\n\n\n<h3 class=\"gb-headline gb-headline-45c8eca7 gb-headline-text\">Topic configuration<\/h3>\n\n\n\n<p>The topics to which the consumer subscribes are controlled by the <strong>topics<\/strong> parameter. To override the default value of any other consumer parameter, add the parameter to the properties file with the <strong>consumer.override<\/strong> prefix. For example, the following line changes the maximum records per poll to 3000:<\/p>\n\n\n\n<pre class=\"wp-block-code language-properties\"><code class=\"\" data-line=\"\">consumer.override.max.poll.records=3000<\/code><\/pre>\n\n\n\n<h3 class=\"gb-headline gb-headline-8ef33340 gb-headline-text\">Thread configuration<\/h3>\n\n\n\n<p>In a Kafka Connect sink connector, each task is a consumer thread that receives data from the partitions of a topic. You can use the <strong>tasks.max<\/strong> parameter to control the maximum number of tasks and thereby the maximum number of threads. 
However, the number of tasks that are actually initiated is related to the number of topic partitions. For example, if you have ten partitions and the value of the <strong>tasks.max<\/strong> parameter is 5, each task will receive data from two partitions and keep track of the offsets of two partitions.<\/p>\n\n\n\n<p>Note that if the value of the <strong>tasks.max<\/strong> parameter is larger than the number of partitions, the number of tasks that Kafka Connect initiates is equal to the number of partitions. The number of tasks is not related to the number of topics.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-74b77441 gb-headline-text\">Procedure<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Install Kafka.\n<ol class=\"wp-block-list\">\n<li>Download and decompress the Kafka installation package. <br><code class=\"\" data-line=\"\">wget https:\/\/dlcdn.apache.org\/kafka\/3.2.0\/kafka_2.13-3.2.0.tgz<\/code><br><code class=\"\" data-line=\"\">tar -xzf kafka_2.13-3.2.0.tgz<\/code><\/li>\n\n\n\n<li>Edit the <code class=\"\" data-line=\"\">.bash_profile<\/code> file and add the following text:<br><code class=\"\" data-line=\"\">export KAFKA_HOME=\/home\/bding\/kafka_2.13-3.2.0<\/code><br><code class=\"\" data-line=\"\">export PATH=$PATH:$KAFKA_HOME\/bin<\/code><\/li>\n\n\n\n<li>Reload the bash profile.<br><code class=\"\" data-line=\"\">source .bash_profile<\/code><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>Configure Kafka.\n<ol class=\"wp-block-list\">\n<li>Edit the Kafka Connect properties file:<br><code class=\"\" data-line=\"\">cd kafka_2.13-3.2.0\/config\/<\/code><br><code class=\"\" data-line=\"\">vi connect-standalone.properties<\/code><\/li>\n\n\n\n<li>Add the plug-in directory:<br><code class=\"\" data-line=\"\">plugin.path=\/home\/bding\/connectors<\/code><\/li>\n\n\n\n<li>Edit the log configuration of Kafka Connect:<br><code class=\"\" data-line=\"\">vi connect-log4j.properties<\/code><\/li>\n\n\n\n<li>Set the logging level for the TDengine Sink Connector to 
<code class=\"\" data-line=\"\">DEBUG<\/code>: <br><code class=\"\" data-line=\"\">log4j.logger.com.taosdata.kafka.connect.sink=DEBUG<\/code><br>This modification is necessary because these logs are used to calculate the time spent in synchronizing data.<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>Compile and install the TDengine Sink Connector.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code language-shell\"><code class=\"\" data-line=\"\">   git clone git@github.com:taosdata\/kafka-connect-tdengine.git\n   cd kafka-connect-tdengine\n   mvn clean package\n   unzip -d ~\/connectors target\/components\/packages\/taosdata-kafka-connect-tdengine-*.zip<\/code><\/pre>\n\n\n\n<ol start=\"4\" class=\"wp-block-list\">\n<li>Start the ZooKeeper and Kafka servers.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code language-shell\"><code class=\"\" data-line=\"\">   zookeeper-server-start.sh -daemon $KAFKA_HOME\/config\/zookeeper.properties\n   kafka-server-start.sh -daemon $KAFKA_HOME\/config\/server.properties<\/code><\/pre>\n\n\n\n<ol start=\"5\" class=\"wp-block-list\">\n<li>Create a topic.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code language-shell\"><code class=\"\" data-line=\"\">   kafka-topics.sh --create --topic meters --partitions 1 --bootstrap-server localhost:9092<\/code><\/pre>\n\n\n\n<ol start=\"6\" class=\"wp-block-list\">\n<li>Generate test data.\n<ol class=\"wp-block-list\">\n<li>Save the following script as <code class=\"\" data-line=\"\">gen-data.py<\/code>:<br><code class=\"\" data-line=\"\">#!\/usr\/bin\/python3<\/code><br><code class=\"\" data-line=\"\">import random<\/code><br><code class=\"\" data-line=\"\">import sys<\/code><br><code class=\"\" data-line=\"\">topic = sys.argv[1]<\/code><br><code class=\"\" data-line=\"\">count = int(sys.argv[2])<\/code><br><code class=\"\" data-line=\"\">start_ts = 1648432611249000000<\/code><br><code class=\"\" data-line=\"\">location = [&quot;SanFrancisco&quot;, &quot;LosAngeles&quot;, &quot;SanDiego&quot;]<\/code><br><code 
class=\"\" data-line=\"\">for i in range(count):<\/code><br>    <code class=\"\" data-line=\"\">ts = start_ts + i<\/code><br>    <code class=\"\" data-line=\"\">row = f&quot;{topic},location={location[i % 3]},groupid=2 current={random.random() * 10},voltage={random.randint(100, 300)},phase={random.random()} {ts}&quot;<\/code><br>    <code class=\"\" data-line=\"\">print(row)<\/code><\/li>\n\n\n\n<li>Run <code class=\"\" data-line=\"\">gen-data.py<\/code>:<br><code class=\"\" data-line=\"\">python3 gen-data.py meters 10000 | kafka-console-producer.sh --broker-list localhost:9092 --topic meters<\/code><p>The script generates 10,000 data points in InfluxDB line format and adds them to the meters topic. Each data point has two tags (location and groupid) and three data fields (current, voltage, and phase).<\/p><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>Start Kafka Connect.\n<ol class=\"wp-block-list\">\n<li>Save the following configuration as <code class=\"\" data-line=\"\">sink-test.properties<\/code>:<br><code class=\"\" data-line=\"\">name=TDengineSinkConnector<\/code><br><code class=\"\" data-line=\"\">connector.class=com.taosdata.kafka.connect.sink.TDengineSinkConnector<\/code><br><code class=\"\" data-line=\"\">tasks.max=1<\/code><br><code class=\"\" data-line=\"\">topics=meters<\/code><br><code class=\"\" data-line=\"\">connection.url=jdbc:TAOS:\/\/127.0.0.1:6030<\/code><br><code class=\"\" data-line=\"\">connection.user=root<\/code><br><code class=\"\" data-line=\"\">connection.password=taosdata<\/code><br><code class=\"\" data-line=\"\">connection.database=power<\/code><br><code class=\"\" data-line=\"\">db.schemaless=line<\/code><br><code class=\"\" data-line=\"\">key.converter=org.apache.kafka.connect.storage.StringConverter<\/code><br><code class=\"\" data-line=\"\">value.converter=org.apache.kafka.connect.storage.StringConverter<\/code><\/li>\n\n\n\n<li>Start Kafka Connect:<br><code class=\"\" data-line=\"\">connect-standalone.sh -daemon $KAFKA_HOME\/config\/connect-standalone.properties 
sink-test.properties<\/code><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>Use the TDengine CLI to query the meters table in the power database to verify that there are 10,000 data points.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-preformatted\">   [bding@vm95 test]$ taos\n\n   Welcome to the TDengine shell from Linux, Client Version:2.6.0.4\n   Copyright (c) 2022 by TAOS Data, Inc. All rights reserved.\n   taos&gt; select count(*) from power.meters;\n          count(*)        |\n   ========================\n                    10000 |<\/pre>\n\n\n\n<h2 class=\"gb-headline gb-headline-8c97fe18 gb-headline-text\">TDengine Sink Connector performance testing<\/h2>\n\n\n\n<p>This performance test is similar to steps 4 through 7 in the previous section. Note the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The performance test script takes two arguments. The first is the number of partitions, and the second is the number of data points to generate.<\/li>\n\n\n\n<li>The tasks.max parameter is set to the number of partitions. This controls the number of threads in each test.<\/li>\n\n\n\n<li>The test database must be empty before the test begins.<\/li>\n\n\n\n<li>After the test is complete, stop Kafka, ZooKeeper, and Kafka Connect manually.<\/li>\n<\/ul>\n\n\n\n<p>In each test, data is first written to Kafka and then synchronized by Kafka Connect to TDengine. This ensures that the sink connector handles the entire pressure caused by the synchronization task. 
The total time required for synchronization is calculated from when the sink connector receives the first batch of data to when it receives the last batch of data.<\/p>\n\n\n\n<p>To run the performance test, save the following script as <code class=\"\" data-line=\"\">run-test.sh<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code language-shell\"><code class=\"\" data-line=\"\">#!\/bin\/bash\nif &#091; $# -lt 2 ];then\n        echo  &quot;Usage: .\/run-test.sh &lt;num_of_partitions&gt;  &lt;total_records&gt;&quot;\n        exit 1\nfi\necho &quot;---------------------------TEST STARTED---------------------------------------&quot;\necho clean data and logs\ntaos -s &quot;DROP DATABASE IF EXISTS power&quot;\nrm -rf \/tmp\/kafka-logs \/tmp\/zookeeper\nrm -f $KAFKA_HOME\/logs\/connect.log\nnp=$1     # number of partitions\ntotal=$2  # number of records\necho number of partitions is $np, number of records is $total.\necho start zookeeper\nzookeeper-server-start.sh -daemon $KAFKA_HOME\/config\/zookeeper.properties\necho start kafka\nsleep 3\nkafka-server-start.sh -daemon $KAFKA_HOME\/config\/server.properties\nsleep 5\necho create topic\nkafka-topics.sh --create --topic meters --partitions $np --bootstrap-server localhost:9092\nkafka-topics.sh --describe --topic meters --bootstrap-server localhost:9092\necho generate test data\npython3 gen-data.py meters $total  | kafka-console-producer.sh --broker-list localhost:9092 --topic meters\n\necho alter connector configuration setting tasks.max=$np\nsed -i  &quot;s\/tasks.max=.*\/tasks.max=${np}\/&quot;  sink-test.properties\necho start kafka connect\nconnect-standalone.sh -daemon $KAFKA_HOME\/config\/connect-standalone.properties sink-test.properties\n     \necho -e &quot;\\e&#091;1;31m open another console to monitor connect.log. 
press enter when no more data received.\\e&#091;0m&quot;\nread\n     \necho stop connect\njps | grep ConnectStandalone | awk &#039;{print $1}&#039; | xargs kill\necho stop kafka server\nkafka-server-stop.sh\necho stop zookeeper\nzookeeper-server-stop.sh\n# extract timestamps of receiving the first batch of data and the last batch of data\ngrep &quot;records&quot; $KAFKA_HOME\/logs\/connect.log  | grep meters- &gt; tmp.log\nstart_time=`cat tmp.log | grep -Eo &quot;&#091;0-9]{4}-&#091;0-9]{2}-&#091;0-9]{2} &#091;0-9]{2}:&#091;0-9]{2}:&#091;0-9]{2},&#091;0-9]{3}&quot; | head -1`\nstop_time=`cat tmp.log | grep -Eo &quot;&#091;0-9]{4}-&#091;0-9]{2}-&#091;0-9]{2} &#091;0-9]{2}:&#091;0-9]{2}:&#091;0-9]{2},&#091;0-9]{3}&quot; | tail -1`\n    \necho &quot;--------------------------TEST FINISHED------------------------------------&quot;\necho &quot;| records | partitions | start time | stop time |&quot;\necho &quot;|---------|------------|------------|-----------|&quot;\necho &quot;| $total | $np | $start_time | $stop_time |&quot;\n<\/code><\/pre>\n\n\n\n<p>You can then run the script and specify a number of partitions and data points. 
As an example, run the following command for a performance test with one partition and 1 million data points:<\/p>\n\n\n\n<pre class=\"wp-block-code language-shell\"><code class=\"\" data-line=\"\">.\/run-test.sh 1 1000000<\/code><\/pre>\n\n\n\n<p>The test process is shown in the following figure.<\/p>\n\n\n\n<figure class=\"gb-block-image gb-block-image-c3c8c5dd\"><img decoding=\"async\" width=\"1182\" height=\"560\" class=\"gb-image gb-image-c3c8c5dd\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-01-testing.png?strip=all&sharp=1\" alt=\"\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-01-testing.png?strip=all&amp;sharp=1 1182w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-01-testing-300x142.png?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-01-testing-1024x485.png?strip=all&amp;sharp=1 1024w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-01-testing-768x364.png?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-01-testing.png?strip=all&amp;sharp=1&amp;w=236 236w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-01-testing.png?strip=all&amp;sharp=1&amp;w=472 472w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-01-testing.png?strip=all&amp;sharp=1&amp;w=709 709w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-01-testing.png?strip=all&amp;sharp=1&amp;w=945 945w\" sizes=\"(max-width: 1182px) 100vw, 1182px\" \/><\/figure>\n\n\n\n<p>Note that you must monitor the <code class=\"\" data-line=\"\">connect.log<\/code> file in a separate console and press <strong>Enter<\/strong> once all data has been consumed. 
You can use the <code class=\"\" data-line=\"\">tail -f connect.log<\/code> command to monitor progress:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[bding@vm95 ~]$ cd kafka_2.13-3.2.0\/logs\/\n[bding@vm95 logs]$ tail -f connect.log\n[2022-06-21 17:39:00,176] DEBUG [TDengineSinkConnector|task-0] Received 500 records. First record kafka coordinates:(meters-0-314496). Writing them to the database... (com.taosdata.kafka.connect.sink.TDengineSinkTask:101)\n[2022-06-21 17:39:00,180] DEBUG [TDengineSinkConnector|task-0] Received 500 records. First record kafka coordinates:(meters-0-314996). Writing them to the database... (com.taosdata.kafka.connect.sink.TDengineSinkTask:101)<\/pre>\n\n\n\n<p>When no new entries are being written to the log file, data consumption is complete.<\/p>\n\n\n\n<p>The following table shows the write performance measured in this test, in data points written per second, for each data set size (rows) and number of threads (columns).<\/p>\n\n\n\n<table id=\"tablepress-40\" class=\"tablepress tablepress-id-40\">\n<thead>\n<tr class=\"row-1\">\n\t<td class=\"column-1\"><\/td><th class=\"column-2\">1 thread<\/th><th class=\"column-3\">3 threads<\/th><th class=\"column-4\">5 threads<\/th><th class=\"column-5\">10 threads<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">1 million<\/td><td class=\"column-2\">105,219<\/td><td class=\"column-3\">232,937<\/td><td class=\"column-4\">333,000<\/td><td class=\"column-5\">489,956<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">3 million<\/td><td class=\"column-2\">107,650<\/td><td class=\"column-3\">239,330<\/td><td class=\"column-4\">363,240<\/td><td class=\"column-5\">573,175<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">5 million<\/td><td class=\"column-2\">108,321<\/td><td class=\"column-3\">246,573<\/td><td class=\"column-4\">364,087<\/td><td class=\"column-5\">580,720<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">10 million<\/td><td 
class=\"column-2\">107,803<\/td><td class=\"column-3\">248,855<\/td><td class=\"column-4\">372,912<\/td><td class=\"column-5\">562,936<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">15 million<\/td><td class=\"column-2\">106,651<\/td><td class=\"column-3\">249,671<\/td><td class=\"column-4\">377,283<\/td><td class=\"column-5\">541,927<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">20 million<\/td><td class=\"column-2\">103,626<\/td><td class=\"column-3\">244,921<\/td><td class=\"column-4\">371,402<\/td><td class=\"column-5\">557,460<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-40 from cache -->\n\n\n<figure class=\"gb-block-image gb-block-image-1db3f3b9\"><img decoding=\"async\" width=\"800\" height=\"420\" class=\"gb-image gb-image-1db3f3b9\" src=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-02-performance.png?strip=all&sharp=1\" alt=\"\" srcset=\"https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-02-performance.png?strip=all&amp;sharp=1 800w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-02-performance-300x158.png?strip=all&amp;sharp=1 300w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-02-performance-768x403.png?strip=all&amp;sharp=1 768w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-02-performance.png?strip=all&amp;sharp=1&amp;w=160 160w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-02-performance.png?strip=all&amp;sharp=1&amp;w=480 480w, https:\/\/eujqw4hwudm.exactdn.com\/wp-content\/uploads\/22.055-02-performance.png?strip=all&amp;sharp=1&amp;w=640 640w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/figure>\n\n\n\n<p>The data points indicate the average number of entries written per second for configurations with one, three, five, and ten threads.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-c9172105 gb-headline-text\">Conclusion<\/h2>\n\n\n\n<p>From the preceding figure, we can see that given the same size data 
set, write speed increases with the number of threads.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>With a single thread, approximately 100,000 data points are written per second.<\/li>\n\n\n\n<li>With five threads, the speed increases to approximately 350,000 data points per second.<\/li>\n\n\n\n<li>With ten threads, the speed increases to approximately 550,000 data points per second.<\/li>\n<\/ul>\n\n\n\n<p>The write speed is relatively stable and is not clearly associated with the total size of the data set.<\/p>\n\n\n\n<p>However, the performance gain per thread declines as the number of threads increases. Going from one to ten threads increases speed only by a factor of about five. This may be caused by uneven distribution of data among partitions. Some tasks take longer to complete than others, and this imbalance grows with the size of the data set.<\/p>\n\n\n\n<p>As an example, if the data set is created with 10 million data points spread across 10 partitions, the on-disk size of each partition is as follows:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[bding@vm95 kafka-logs]$ du -h .\/ -d 1\n125M    .\/meters-8\n149M    .\/meters-7\n119M    .\/meters-9\n138M    .\/meters-4\n110M    .\/meters-3\n158M    .\/meters-6\n131M    .\/meters-5\n105M    .\/meters-0\n113M    .\/meters-2\n99M     .\/meters-1\n<\/pre>\n\n\n\n<p>Another factor influencing multithreaded write speed is out-of-order data. Because this test allocates data randomly across partitions, only a single-partition configuration can result in strictly ordered data, which provides the highest performance. 
As the number of threads is raised, the degree to which the data is out of order increases.<\/p>\n\n\n\n<p>For this reason, it is recommended that all data contained within a subtable is stored in the same Kafka partition.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-f0bd140e gb-headline-text\">Appendix<\/h2>\n\n\n\n<p>The test environment used in this article is described as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OpenJDK 1.8.0<\/li>\n\n\n\n<li>CentOS 7.9<\/li>\n\n\n\n<li>64 GB of memory<\/li>\n\n\n\n<li>16-core i7-10700 (x86-64, 2.90 GHz)<\/li>\n\n\n\n<li>HDD storage<\/li>\n\n\n\n<li>TDengine 2.6.0.4<\/li>\n\n\n\n<li>Kafka 3.2<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The TDengine Team developed the TDengine Kafka Connector to simplify Kafka integration. This article shows how the connector syncs data from Kafka to TDengine.<\/p>\n","protected":false},"author":48,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[21],"tags":[],"ppma_author":[100],"class_list":["post-7696","post","type-post","status-publish","format-standard","hentry","category-engineering"],"authors":[{"term_id":100,"user_id":48,"is_guest":0,"slug":"sangshuduo","display_name":"Shuduo 
Sang","avatar_url":{"url":"https:\/\/tdengine.com\/wp-content\/uploads\/29.04-28-sdsang.jpg","url2x":"https:\/\/tdengine.com\/wp-content\/uploads\/29.04-28-sdsang.jpg"},"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts\/7696","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/users\/48"}],"replies":[{"embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/comments?post=7696"}],"version-history":[{"count":8,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts\/7696\/revisions"}],"predecessor-version":[{"id":24641,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/posts\/7696\/revisions\/24641"}],"wp:attachment":[{"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/media?parent=7696"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/categories?post=7696"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/tags?post=7696"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/tdengine.com\/wp-json\/wp\/v2\/ppma_author?post=7696"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}