How Fast Can We Insert? An Empirical Performance Evaluation of Apache Kafka
Guenter Hesse, Christoph Matthies, Matthias Uflacker
Hasso Plattner Institute, University of Potsdam
Germany
fi[email protected]
Copyright ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: will follow as soon as the paper is published.
Abstract—Message brokers see widespread adoption in modern IT landscapes, with Apache Kafka being one of the most employed platforms. These systems feature well-defined APIs for use and configuration and present flexible solutions for various data storage scenarios. Their ability to scale horizontally enables users to adapt to growing data volumes and changing environments. However, one of the main challenges concerning message brokers is the danger of them becoming a bottleneck within an IT architecture. To prevent this, knowledge about the amount of data a message broker using a specific configuration can handle needs to be available. In this paper, we propose a monitoring architecture for message brokers and similar Java Virtual Machine-based systems. We present a comprehensive performance analysis of the popular Apache Kafka platform using our approach. As part of the benchmark, we study selected data ingestion scenarios with respect to their maximum data ingestion rates. The results show that we can achieve an ingestion rate of about 420,000 messages/second on the used commodity hardware and with the developed data sender tool.
Index Terms—performance, benchmarking, big data, Apache Kafka
I. INTRODUCTION
In the current business landscape, with an ever-increasing growth in data and popularity of cloud-based applications, horizontal scalability is becoming an increasingly common and important requirement. Message brokers play a central role in modern IT systems as they satisfy this requirement and thus allow for adaptations of the IT landscape to data sources that grow both in volume and velocity. Moreover, they can be used to decouple disparate data sources from applications using this data. Usage scenarios where message brokers are employed are manifold and reach from, e.g., machine learning [1] to stream processing architectures [2], [3], [4] and general-purpose data processing [5].

In the context of a complex IT architecture, the degree to which a system aligns with its application scenarios and the functional and non-functional requirements derived from it is key [6]. If non-functional requirements related to performance are not satisfied, the system might become a bottleneck. This situation does not directly imply that the system itself is inadequate for the observed use case, but might indicate a suboptimal configuration. Therefore, it is crucial to be able to evaluate the capabilities of a system in certain environments and with distinct configurations. The knowledge about such study results is a prerequisite for making informed decisions about whether a system is suitable for the existing use cases. Additionally, it is also crucial for finding or fine-tuning appropriate system configurations.

The contributions of this research are as follows:
• We propose a user-centered and extensible monitoring framework, which includes tooling for analyzing any JVM-based system.
• We present an analysis that highlights the capabilities of Apache Kafka regarding the maximum achievable rate of incoming records per time unit.
• We enable reproducibility of the presented results by making all needed artifacts available online.

The rest of the paper is structured as follows: In Section II we give a brief introduction of Apache Kafka. Section III-A presents the benchmark setup and the developed data sender tool. Subsequently, we describe the results of the ingestion rate analyses. Section V introduces related work and Section VI elaborates on the lessons learned. The last section concludes the study and outlines areas of future work.

II. APACHE KAFKA
Apache Kafka is a distributed open-source message broker or messaging system originally developed at LinkedIn in 2010 [7]. The core of this publish-subscribe system is a distributed commit log, although it has extended its scope through extensions. An example is Kafka Streams [8], a client library for developing stream processing applications.

The high-level architecture of an exemplary Apache Kafka cluster is visualized in Figure 1. A cluster consists of multiple brokers, which are numbered and store data assigned to topics. Data producers send data to a certain topic stored in the cluster. Consumers subscribe to a topic and are forwarded new values sent to this topic as soon as they arrive.

Topics are divided into partitions. The number of topic partitions can be configured at the time of topic creation. Partitions of a single topic can be distributed across different brokers of a cluster. However, a message order across partitions is not guaranteed by Apache Kafka [10], [9].

Next to the number of partitions, it is possible to define a replication factor for each topic, one being the minimum.

Artifacts available at: https://github.com/guenter-hesse/KafkaAnalysisTools
Fig. 1. Apache Kafka cluster architecture (based on [9])
This allows preventing data loss in the case of a single broker failure. In the context of replication, Apache Kafka defines leaders and followers for each partition. The leader handles all reads and writes for the corresponding topic partition, whereas followers copy or replicate the inserted data. In Figure 1, the leader partitions are shown in bold type. The first topic, topic1, has two partitions and a replication factor of one, while topic2 has only one partition and a replication factor of two [10].

Figure 2 shows the structure of an Apache Kafka topic, specifically of a topic with two partitions. Each of these partitions is an ordered and immutable record sequence where new values are appended. A sequential number is assigned to each topic record within a partition, referred to as an offset. Apache Kafka itself provides the topic __consumer_offsets for storing the offsets. However, consumers must manage their offset. They can commit their current offset either automatically in certain intervals or manually. The latter can be done either synchronously or asynchronously. When polling data, a consumer needs to pass the offset to the cluster. Apache Kafka returns all messages with a greater offset, i.e., all new messages that have not already been sent to this consumer. As the consumer has control over its offset, it can also decide to start from the beginning and to reread messages [10].
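The partition and offset semantics described above can be illustrated with a minimal in-memory model (a simplified sketch for illustration only; the names MiniPartition and readFrom are ours, not part of the Kafka API):

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified model of a topic partition: an append-only record sequence
// in which each record receives a sequential offset.
class MiniPartition {
  private val log = ArrayBuffer.empty[String]

  // Append a record and return the offset assigned to it.
  def append(record: String): Long = {
    log += record
    (log.size - 1).toLong
  }

  // Return all records with an offset >= the given one, mimicking a
  // consumer poll that passes its current offset to the cluster.
  def readFrom(offset: Long): Seq[String] =
    log.drop(offset.toInt).toSeq
}
```

A consumer that "rereads from the beginning" in this model simply calls readFrom(0) again, mirroring the offset control described in the text.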
Fig. 2. Apache Kafka topic structure (based on [11])
Furthermore, Apache Kafka can be configured to use the LogAppendTime feature, which induces Apache Kafka to assign a timestamp to each message once it is appended to the log. The existing alternative, which represents the default value, is CreateTime. In this setting, the timestamp created by the Apache Kafka producer when creating the message, i.e., before sending it, is stored along with the message. For transmitting messages, a producer can require multiple retries, which would increase the difference between the timestamp assigned to a message and the time when it is appended to the log and thus made available for consuming applications [10].

III. BENCHMARK SETUP
This section introduces the monitoring architecture employed in the ingestion rate study as well as the developed data sender tool.
A. Monitoring Architecture
The architecture of the monitoring system is shown in Figure 3. We use Grafana [12], an open-source tool for creating and managing dashboards and exporting data, as the interface to the user. The presented benchmarks employ version 5.4.5 of its docker image. OS-level virtualization through docker is used for ease of installation and replicability of results. The OS base image used in this image allows a simple time zone configuration via an environment variable, which is important for time synchronization among all systems. Later versions of the image contain a different OS, specifically Alpine Linux [13], which no longer supports this feature.
Fig. 3. Monitoring architecture in Fundamental Modeling Concepts (FMC) [14]
Grafana fetches the data to display from Graphite [15], an open-source monitoring tool. It consists of three components: Carbon, Whisper, and Graphite-web. Carbon is a service that retrieves time-series data, which is stored in Whisper, a persistence library. Graphite-web includes an interface for designing dashboards. However, these dashboards are not as appealing and functionally comprehensive as the corresponding components of Grafana, which is why Grafana is employed. For the installation of Graphite, the official docker image in version 1.1.4 is used, again for time zone configuration reasons.

Graphite receives its input from two sources: collectd [16] and jmxtrans [17]. The former is a daemon collecting system and application performance metrics, which runs on the brokers' machines in the described setup. It offers plugins for gathering OS-level measurements, such as memory usage, system load, and received or transmitted packages over the network. jmxtrans, the other data source for Graphite, is a tool for collecting JVM runtime metrics. These metrics are provided via Java Management Extensions (JMX) [18]. Using jmxtrans we tracked internal metrics, such as JVM memory usage, the number of incoming bytes, and the number of messages entering Apache Kafka per time unit.

Apache Kafka is the system under test (SUT) in the evaluation of this paper. It can be exchanged for any other system running in a JVM, i.e., the proposed architecture is not limited to Apache Kafka or message brokers in general. The information gathered in Graphite is summarized in a Grafana dashboard. Exports of the collected Grafana data enable further, more detailed analysis.
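A container setup along the lines described above might be launched as follows (a hypothetical sketch; the image names, ports, and time zone are assumptions, not taken from the published artifacts — only the versions 1.1.4 and 5.4.5 and the TZ-based time zone configuration are from the text):

```shell
# Graphite (Carbon + Whisper + Graphite-web), pinned to version 1.1.4;
# image name is an assumption for this sketch.
docker run -d --name graphite \
  -e TZ=Europe/Berlin \
  -p 2003:2003 -p 8080:80 \
  graphiteapp/graphite-statsd:1.1.4

# Grafana dashboard front end, pinned to version 5.4.5.
# The TZ environment variable provides the time zone configuration
# that later image versions no longer support this way.
docker run -d --name grafana \
  -e TZ=Europe/Berlin \
  -p 3000:3000 \
  grafana/grafana:5.4.5
```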
TABLE I
CHARACTERISTICS OF THE APACHE KAFKA BROKER NODES

Characteristic      Value
Operating system    Ubuntu 18.04.2 LTS
CPU                 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, 8 cores
RAM                 32 GB
Network bandwidth   1 Gbit:
                    - measured bandwidth between nodes: 117.5 MB/s
                    - measured bandwidth of intra-node transfer: 908 MB/s
Disk                min. 13 Seagate ST320004CLAR2000 in RAID 6,
                    access via Fibre Channel with 8 Gbit/s;
                    measured write performance about 70 MB/s
Hypervisor          VMware ESXi 6.7.0
Kafka version       2.3.0
Java version        OpenJDK 1.8.0_222
Apache Kafka is installed on three virtual machines featuring identical hardware setups and configurations, which are shown in Table I. We use a commodity network setup whose bandwidth we determined using iperf3 [19]. The write performance is measured using the Unix command-line tool dd [20], specifically with the following command: "dd if=/dev/zero of=/opt/kafka/test1.img bs=1G count=1 oflag=dsync". The data sender is a Scala application compiled to a fat jar file and executed using OpenJDK 1.8 with the default parallel garbage collector (ParallelGC). The data sender is assigned an initial memory allocation pool of 1 GB while the maximum size of this pool is about 14 GB. Apache Kafka uses an initial and maximum memory allocation pool of 1 GB and the Garbage-First garbage collector (G1 GC). Additional arguments passed to the Apache Kafka JVMs are MaxGCPauseMillis=20, InitiatingHeapOccupancyPercent=35, as well as ExplicitGCInvokesConcurrent, which fine-tune the garbage collection behavior.
B. Data Sender
To study the attainable ingestion rates of Apache Kafka, we developed a configurable data sender tool in the Scala programming language, which is part of the published artifacts. It uses the Apache Kafka producer class for sending data to the message broker.
TABLE II
APACHE KAFKA DEFAULT PRODUCER PROPERTIES

Property                 Value
key-serializer-class     org.apache.kafka.common.serialization.StringSerializer
value-serializer-class   org.apache.kafka.common.serialization.StringSerializer
batch-size               16,384 bytes
buffer-memory-size       33,554,432 bytes
acks                     0
Table II shows the default configuration parameters, i.e., properties, that the data sender applies to the Apache Kafka producer. Unless otherwise stated, these are the parameters employed in the presented measurements in Section IV. An Apache Kafka producer batches messages to lower the number of requests, thereby increasing throughput. The batch-size property limits the size of these message packages. The used value is the default of 16,384 bytes, as defined in the Apache Kafka documentation [11]. The acks producer property determines the level of acknowledgments for sent messages. There are three different options for the acks configuration:
• 0: The producer does not wait for any acknowledgment and counts the message as sent as soon as it is added to the socket buffer.
• 1: The leader will send an acknowledgment to the producer as soon as the message is written to its local log. The leader will not wait until its followers, i.e., other brokers, have written it to their log.
• all: The leader waits until all in-sync replicas acknowledge the message before sending an acknowledgment to the producer. By default, the minimum number of in-sync replicas is set to one.

In addition to the configuration of the Apache Kafka producer, the developed data sender tool can be customized. The read-in-ram Boolean setting determines how inputs are read. If read-in-ram is not set, the data source object returns an iterator object of the records. If it is set, the source object first loads the entire data set into memory by converting it into a list and then returns an iterator object for the created data structure. Unless otherwise stated, read-in-ram is enabled in the presented results. The number of messages the data sender emits per time unit can be controlled using the java.util.concurrent.ScheduledThreadPoolExecutor class. It can execute a thread periodically by applying a configurable delay. Using this parameter, we can determine how many messages are to be sent per time unit. Each execution sends a single message to Apache Kafka. A configured delay of, e.g., 10K ns, leads to an input rate of 100K messages/second (MPS).
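The rate control mechanism described above can be sketched as follows (a simplified illustration of the scheduling approach, not the published data sender itself; the actual Kafka send call is replaced by a counter):

```scala
import java.util.concurrent.{ScheduledThreadPoolExecutor, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

object RateControlSketch {
  // Delay between two send actions for a target rate in messages/second,
  // e.g., 100,000 MPS -> 10,000 ns, as in the example from the text.
  def delayNanos(targetMps: Long): Long = 1000000000L / targetMps

  def main(args: Array[String]): Unit = {
    val sent = new AtomicLong(0)
    val executor = new ScheduledThreadPoolExecutor(1)
    // In the real tool, each execution sends a single message to Kafka;
    // here we only count executions.
    val task: Runnable = () => { sent.incrementAndGet(); () }
    executor.scheduleAtFixedRate(task, 0, delayNanos(100000), TimeUnit.NANOSECONDS)
    Thread.sleep(100)
    executor.shutdown()
    println(s"executions in 0.1 s: ${sent.get()}")
  }
}
```

Note that at nanosecond-scale periods the actual achievable rate is bounded by timer resolution and send latency, which is consistent with the configured rates not always being reached in Section IV.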
IV. INGESTION RATE ANALYSIS
This section presents the Apache Kafka ingestion rate analysis, starting with a description of the benchmark process. It comprises analyzing three selected input rates with varying configurations regarding acks levels, batch size, data sender locality, the read-in-ram option, and data sender processes.
A. Benchmark Execution Process
Each analysis run lasts ten minutes. The main characteristic studied is the number of incoming or ingested messages, particularly the one-minute rate of this key performance indicator (KPI), i.e., the number of incoming messages during the last minute. If not stated otherwise, the data sender is executed on the broker server where the topic is stored. To reduce the number of manual steps needed,
Ansible [21] is used for automation. Starting the Ansible script triggers a build of the data sender project, the creation of a topic, and the assignment of this topic to the first of our three Apache Kafka brokers. For all measurements, we use topics with a single partition and a replication factor of one. Having one partition is a setting used for scenarios in which the order of data is crucial. That is the case as Apache Kafka only makes guarantees for the correct message order within a partition, as outlined in Section II.

After the Apache Kafka topics are prepared, the data sender is started. Subsequently, a rise in the number of incoming messages of Apache Kafka can be observed using the Grafana dashboard. Once the configured send period is over, the Ansible script stops and the dashboard charts adapt correspondingly. The dashboard data is then exported as CSV. The timeframe of these exports is configurable in Grafana.

We incorporate the data set of the Grand Challenge published 2012 at the conference Distributed and Event-Based Systems (DEBS) [22] as input. It contains data captured from multiple sensors that are combined into single records by an embedded PC within the manufacturing equipment. One record comprises 66 columns with numerical and Boolean values. When the end of the input file is reached, the data sender starts again from the beginning.
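Restarting from the beginning once the input is exhausted can be sketched as a cycling iterator (a simplified stand-in for the tool's data source; the record values are invented for illustration):

```scala
object CyclingSourceSketch {
  // Returns an endless iterator that replays the given records in order,
  // mimicking a data sender that rereads its input file on EOF.
  def cycle[A](records: Seq[A]): Iterator[A] =
    Iterator.continually(records).flatMap(_.iterator)

  def main(args: Array[String]): Unit = {
    val records = Seq("r1", "r2", "r3")
    // Taking more elements than the input holds wraps around to the start.
    println(cycle(records).take(7).toList)
    // List(r1, r2, r3, r1, r2, r3, r1)
  }
}
```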
B. Result Overview
Figure 4 shows the maximum achieved input rates (ir) of Apache Kafka for the selected configurations. The input rates illustrated in all figures are the one-minute rates of incoming MPS, which is a KPI provided by Apache Kafka. For all benchmark scenarios with the maximum configured input of 1,000K MPS, we selected the runs with the most stable input rates.

The highest input rate with about 421K MPS was achieved with two distinct data sender processes, each sending 250K MPS. However, this is less than the configured input rate. With a single data sender configured to send 1,000K MPS, the input rates are lower. The results for the acks levels of 1 and all are similar, with input rates around 340K MPS. Surprisingly, sending messages without waiting for acknowledgment, i.e., acks set to 0, decreased the achieved input rate. The maximum is at about 294K MPS with increased batch size. In contrast to the other benchmark scenarios, the achievable input rate with acknowledgments disabled could be positively influenced by a higher batch size without harming the stability of the input rate.

Fig. 4. Ingested messages/second - one-minute rate
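The one-minute rate used throughout this KPI is a decaying average rather than a plain count. A minimal sketch of such a rate as an exponentially weighted moving average over fixed ticks (our simplified illustration of the concept, not Kafka's exact metric implementation):

```scala
object OneMinuteRateSketch {
  // Exponentially weighted moving average over 5-second ticks with a
  // one-minute decay window (simplified illustration of a one-minute rate).
  val tickSeconds: Double = 5.0
  val alpha: Double = 1.0 - math.exp(-tickSeconds / 60.0)

  // Fold a sequence of per-tick event counts into a rate in events/second.
  def rate(countsPerTick: Seq[Long]): Double =
    countsPerTick.foldLeft(0.0) { (ewma, count) =>
      val instantRate = count / tickSeconds
      ewma + alpha * (instantRate - ewma)
    }
}
```

This also explains the ramp-up and ramp-down visible at the start and end of every run: the reported one-minute rate needs time to converge toward the true send rate.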
C. Input Rate of 100,000 Messages/Second
Figure 5 visualizes the one-minute rate of incoming MPS for a configured input of 100K MPS. The parameters under investigation for this benchmark series are the data sender locality, the acks level, and the read-in-ram option. Similar to all other observations, an increase in the number of incoming MPS can be seen at the beginning. This is when the data sender is started and the one-minute rate begins to adapt accordingly. Also, a sudden decrease in ingested messages is present in all charts after the data sender has transmitted messages for the configured duration and has shut down. Consequently, the most interesting part of the evaluations is the data presented in the center of the plots.

Figure 5 further shows that almost all chosen settings reach the configured input of 100K MPS. The only exception is the remote; acks=0; read-in-ram=false configuration, which is represented by the blue line. In this setting, the data sender was executed remotely, specifically on a 2015 Apple MacBook Pro, which was connected using Ethernet. For the other benchmark runs, the data sender was executed on the broker where the topic is stored. All of the tested acks level and read-in-ram combinations reach the configured ingestion rate of 100K MPS.

D. Input Rate of 250,000 Messages/Second
Figure 6 shows the results for an input rate of 250K MPS. As we already identified the limits of the commodity hardware in Figure 5, we do not pursue further tests with the laptop configuration. Figure 6 highlights the significance of the read-in-ram configuration, detailed in Section III-B. Particularly, the three configurations where read-in-ram is set to true reach the configured input of 250K MPS, whereas the run where read-in-ram is set to false, represented in blue, does not. This configuration, where read-in-ram is not active, is not able to handle more than about 220K MPS. Thus, enabling read-in-ram has a positive influence on the achievable number of incoming messages per second, as the latency for accessing the main memory is lower than for accessing the disk. It is evident that with read-in-ram disabled, there is a bottleneck at the data sender side at this configured input rate. The data can not be read as fast as is required to achieve an ingestion rate of 250K MPS.

E. Input Rate of 1,000K Messages/Second
Figure 7 visualizes the results for an input rate of 1,000K MPS. As we discovered the limits of configurations with read-in-ram set to false previously, the parameter is enabled for all following measurements. Next to testing different acks levels, we analyze the effects of changes to the batch size. Particularly, we study the default size and a batch size increased by a factor of four, which results in 65.54 kB.

None of the configurations reach the configured ingestion rate. While the highest ingestion rates peak at about 420K MPS for a short period, the lowest one is at about 250K MPS. The two configurations that achieve this maximum peak are the ones with a batch size of 65.54 kB and acks set to 1 and all. However, these are also the only two scenarios where no steady ingestion rate could be established. The acks level of 0 combined with the default batch size reached the lowest ingestion rate. Changing acks to either 1 or all resulted in a rise to a rate of about 320K MPS. Concerning the batch size, the increase resulted in a higher ingestion rate for the scenarios without acknowledgments. Specifically, a rise of more than 20K MPS can be observed. For the other acknowledgment settings, the raised batch size led to an unstable ingestion rate.

Fig. 8. Incoming MB/second - one-minute rate, configured 1,000K MPS

Figure 8 shows the incoming data rate in MB/second, which is provided by Apache Kafka as the metric BytesInPerSec. The chart fits the corresponding ingestion rates shown in Figure 7. The highest peaks are at about 90 MB/second. The measured maximum network bandwidth between the Apache Kafka brokers is about 117.5 MB/second, see Table I. Therefore, if further network traffic is created that is not captured by the BytesInPerSec metric, the bandwidth of the employed commodity network could be a limiting factor in peak situations if data is sent from a remote host. As we executed the data sender on the node storing the corresponding topic partition, there was intra-node transfer and we used the loopback interface with its higher bandwidth of about 908 MB/second, which is not a bottleneck. The determined write performance of about 70 MB/second described in Table I is even closer to the observed limits in Figure 8. Depending on how optimized Apache Kafka writes to disk, the achievable performance might be higher. Nevertheless, the observations lead to the conclusion that the ingestion rate is likely to be disk-bound in the viewed benchmark setting.

Figure 9 shows the short-term system load of the broker containing the topic partition, which is the server where the data sender is executed. The system load gives an overview of the CPU and I/O utilization of a server, i.e., also reflecting performance limits regarding disk writes. It is defined as the number of processes demanding CPU time, specifically processes that are ready to run or waiting for disk I/O. Figure 9 shows one-minute averages of this KPI. As we are using servers with an eight-core CPU each, it is desirable that no node exceeds a system load of eight, so as not to over-utilize a machine.
Fig. 9. Short-term system load of the Apache Kafka broker containing the topic partition - one-minute rate, configured 1,000K MPS
Figure 9 reveals that in all settings which led to a steady input rate, the broker node has a system load lower than eight and thus seems to be not over-utilized from a system load perspective. The two remaining scenarios show the highest system loads with a value close to 15, which indicates an over-utilization that could limit the achievable ingestion rate. Interestingly, the system load is not proportional to the corresponding ingestion rates. At the time of the peak ingestion rate, e.g., the highest system load has not reached its maximum. That might be an indicator of a growing number of waiting write operations.
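As a rough cross-check of the peak numbers from Figures 7 and 8, the implied average on-wire size per message can be computed from the reported rates (our back-of-the-envelope arithmetic, not a figure from the measurements):

```scala
object RecordSizeEstimate {
  // Peak values reported for the 1,000K MPS runs.
  val peakMessagesPerSecond: Double = 420000.0
  val peakMegabytesPerSecond: Double = 90.0

  // Implied average size per message in bytes (using 1 MB = 10^6 bytes).
  def avgRecordBytes: Double =
    peakMegabytesPerSecond * 1000000.0 / peakMessagesPerSecond
}
```

At roughly 214 bytes per message, this is plausible for the 66-column DEBS records and supports reading the limits as disk- rather than network-bound for local senders.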
F. Input Rate of Overall 500K MPS with Two Data Senders
To see if server resources regarding CPU are a limiting factor, we distributed the data sender. We included the default batch size and left out the measurements where acks are set to all as they are, similar to the previously presented results, practically identical to the runs with acks set to 1. Figure 10 shows the achieved ingestion rates. The blue and green lines illustrate the runs where both senders run locally, i.e., on the broker node containing the topic partition. The blue line visualizes the results for disabled acks and the green line those for an acks level of 1. The purple line in Figure 10 represents the run where one data sender is invoked on the broker that stores the topic and one data sender at another broker. The brown line shows the results for the run where the data senders are executed on the two brokers that do not store the topic.

Our measurements show that the two settings where at least one data sender is executed remotely lead to the same result: a steady input rate of about 420K MPS. The benchmark runs having both senders run locally have a different outcome. Similar to the previous results, acks set to 1 overall outperforms acks set to 0. However, neither configuration reaches a steady input rate. Both have the highest spike at the beginning, which is a behavior observed before.

Next to almost identical trends regarding the ingestion rates compared to the two benchmark settings presented before, the system loads are also equal, with a value consistently approaching 15. This again indicates an over-utilization of the server. The system load never exceeds a value of three on any server with distributed data senders. Nevertheless, the input rates for the setting with two data senders are the overall highest on average, with a maximum input rate of about 460K MPS. Figure 11 shows the data size characteristics.

Fig. 11. Incoming MB/second - one-minute rate, configured 500K MPS in total with two data senders

The amount of incoming data in MB/second is visualized for the settings where both data senders were executed locally and remotely with acks set to 1. The maximally achieved input rate of Figure 10 corresponds to an input rate of about 100 MB/s. For the constant input rate where both senders were executed remotely, a size-wise input of close to 92 MB/s is reached. The amount of incoming MB/s exceeds the measured maximum write performance mentioned in Table I, which could be due to increased parallelization or an optimized way of storing messages implemented in Apache Kafka. As the fully distributed setting uses the eth0 interface to the broker, the network bandwidth of about 117.5 MB/second applies. Since the reported number of incoming bytes is close to this limit and metadata or further traffic might not be captured, the network represents a potential bottleneck.

Figure 12 visualizes the number of packages received on interface eth0, exemplary for three benchmark runs, with a logarithmic scale on the y-axis. This interface is the only one next to the loopback interface on the used servers. The figure highlights the differences caused by changes in data sender locality. While the number of received packages is not impacted if data is only sent from the node where the corresponding target topic is stored, transmitting data from a remote host significantly increases this KPI. Specifically, no remote data senders result in between 25 and 60 received packages on eth0. One remote data sender amounts to about 30K received packages, while two remote data senders approximately double this number.

G. Summary
Our benchmark results reveal two main insights: Firstly, although a single data sender can create an input rate of 250K MPS as shown in Figure 6, two independently executed data senders do not reach the expected input rate of 500K MPS. Secondly, we show the influence of where data senders are invoked. When two data senders are executed in parallel on the same host, they are able to overwhelm the server or impede each other, as the observed system load of about 15 indicates. Another limiting factor can be found in the write-to-disk performance of the used server and the network bandwidth when sending data from a remote host. The observed memory usage was never close to its limits for any of the presented benchmark scenarios.

The most promising configuration in the study, which led to stable input rates, has the default batch size and acks set to 1 or all. A stable rate is desirable as it leads to predictable system behavior. Multiple data senders distributed across nodes are able to increase the achievable ingestion rates. An input rate of about 250K MPS to Apache Kafka can be achieved using a single data sender.

V. RELATED WORK
Dobbelaere and Esmaili [23] compare Apache Kafka with RabbitMQ [24], another open-source message broker. In their work, they compare both solutions qualitatively and quantitatively. The impact of different acknowledgment levels is one of the factors the authors evaluated in their study. However, their results do not show a clear difference in the achieved throughput between an acks level of one and zero in the analyzed setting.

Noac'h, Costan, and Bougé [25] evaluate the performance of Apache Kafka in combination with stream processing systems. They also study the influence of Apache Kafka characteristics, the producer batch size being one of them. Similar to our results, their findings reveal that an increased batch size does not necessarily lead to a higher throughput.

Kreps et al. [9] present a performance analysis of three systems: Apache Kafka, RabbitMQ, and Apache ActiveMQ [26]. Similar to the work presented before, they analyze the influence of the batch size of the Apache Kafka producer. Next to the producer, they study the Apache Kafka consumer behavior and compare it to the other systems. The achieved throughput for Apache Kafka in [9] is in a similar range as the results of this paper.

Apache Pulsar [27] is a message broker originally developed at Yahoo!. It makes use of the distributed storage service Apache BookKeeper [28]. Similarly to Apache Kafka, Apache Pulsar employs the concept of topics to which producers can send messages and to which consumers can subscribe. The blog post [27] presents a brief performance analysis. The throughput that was achieved in their study using an SSD is 1,800K MPS. However, they do not give details about the test setup, making it hard to assess the results.

Next to these open-source systems, there are commercial solutions, such as Google Cloud Pub/Sub [29] and Amazon Simple Queue Service (SQS) [30]. Studying further systems was out of the scope of this paper.

VI. LESSONS LEARNED
Overall, the conducted performance study shows that esti-mates regarding the performance impact of different ApacheKafka producer configurations, based on experiences andperceptions, are not always true. We particularly emphasizetwo unexpected behaviors that are present in the collectedresults. These findings are related to two configuration optionsof the Kafka producer: the acknowledgments level and thebatch size. We theorized that a lower level of acknowledgmentswould necessarily lead to a higher input rate, as sending ofmessages and waiting for an acknowledgment of their arrivalrepresents an overhead. However, our study shows that thistheory does not hold in all observed cases.Similarly, this is true for the Kafka producer batch sizeconfiguration option. Our expectation was that increasingthe batch size would lead to a higher input rate, since thenumber of send actions can be reduced, also lowering theoverall overhead. The presented performance study revealedthat higher batch sizes can indeed increase the input rate asexpected, see Figure 7. However, although increasing the batchsize may lead to a higher peak input rate, it often caused anon-steady, fluctuating input rate. Additionally, the observedaverage input rates for acks set to and all are lower for theconfiguration with an increased batch size.s a result of these experiences, we highlight the importanceof benchmarking message brokers to explore their behavior inapplication scenarios and to obtain realistic KPI’s. This allowsbasing discussions regarding technology selection on facts.Additionally, it lowers the likelihood of wrong assessments.Regarding the tooling utilized for observing the performancecharacteristics of Apache Kafka, we made use of virtualizationtechnology in the form of Docker containers. This turned outto be a low-effort way of deploying different systems. 
Also, orchestrating these independently running containers, as in the proposed benchmarking architecture, did not introduce any significant management overhead. So for similar settings, we recommend using virtualization technologies due to the easy and fast setup. Additionally, moving to a different server or updating systems are tasks that can be done with low effort.

VII. CONCLUSION AND FUTURE WORK
We propose and implement a monitoring architecture for Apache Kafka and similar systems running in a JVM. We incorporate state-of-the-art technologies such as Grafana and collectd, with a focus on ease of use and adaptability for future measurements. We performed a performance study of Apache Kafka using our developed benchmarking setup. We evaluate and discuss our benchmark results for varying data sender configurations. The benchmark artifacts, such as the data sender tool and the
Grafana dashboard, are published for transparency and reproducibility.

In the configuration featuring a single topic with one partition and a replication factor of one, we achieve a maximum steady ingestion rate into Apache Kafka of about 420K MPS, or about 92 MB/s. We quantified the impact of the Apache Kafka producer batch size, the acknowledgment level, and the data sender locality, as well as of additional aspects, on the input rate performance. We analyzed the server's behavior during the benchmark runs to explore potential performance bottlenecks.

Our study highlights the influence of the chosen acknowledgment level. Configurations with enabled acknowledgments showed better performance, i.e., a higher message input rate, which was counter to our working hypothesis. An analysis of why this behavior was observed is part of future research. Moreover, a comparison of Apache Kafka to similar systems, such as Apache Pulsar or RabbitMQ, would be of interest to the research community. Further work should focus on the analysis of Apache Kafka producers employed in data stream processing frameworks, such as Apache Flink or Apache Beam. These systems often provide their own Apache Kafka producer implementations or interfaces. It would be interesting to investigate whether these embedded producers perform differently in comparable settings regarding the achievable input rate.

Furthermore, it is valuable to know how the input rate behaves when scaling via, e.g., the number of topic partitions. With a growing number of partitions, scaling the number of broker nodes becomes an additional dimension whose influence can be measured. The impact of higher replication factors is another open domain of future research.

REFERENCES

[1] J. Zhuang and Y. Liu, "PinText: A Multitask Text Embedding System in Pinterest," in ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 2653-2661.
[2] G. Hesse and M. Lorenz, "Conceptual Survey on Data Stream Processing Systems," in IEEE International Conference on Parallel and Distributed Systems, ICPADS, 2015, pp. 797-802.
[3] G. Hesse, C. Matthies, B. Reissaus, and M. Uflacker, "A New Application Benchmark for Data Stream Processing Architectures in an Enterprise Context: Doctoral Symposium," in ACM International Conference on Distributed and Event-based Systems, DEBS, 2017, pp. 359-362.
[4] G. Hesse, B. Reissaus, C. Matthies, M. Lorenz, M. Kraus, and M. Uflacker, "Senska - Towards an Enterprise Streaming Benchmark," in TPC Technology Conference, TPCTC, 2017, pp. 25-40.
[5] M. Zhang, T. Wo, X. Lin, T. Xie, and Y. Liu, "CarStream: An Industrial System of Big Data Processing for Internet-of-Vehicles," PVLDB, vol. 10, no. 12, pp. 1766-1777, 2017.
[6] M. Lorenz, J. Rudolph, G. Hesse, M. Uflacker, and H. Plattner, "Object-Relational Mapping Revisited - A Quantitative Study on the Impact of Database Technology on O/R Mapping Strategies," in Hawaii International Conference on System Sciences (HICSS), 2017, pp. 1-10.
[7] J. Koshy, "Kafka Ecosystem at LinkedIn," https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin, 2016, accessed: 2020-08-13.
[8] "Kafka Streams," https://kafka.apache.org/documentation/streams/, accessed: 2020-08-13.
[9] J. Kreps, N. Narkhede, and J. Rao, "Kafka: a Distributed Messaging System for Log Processing," in Proc. International Workshop on Networking Meets Databases, NetDB, 2011, pp. 1-7.
[10] "Documentation - Kafka 0.10.2 Documentation," https://kafka.apache.org/documentation/, accessed: 2017-04-24.
[11] "Kafka 2.3 Documentation," https://kafka.apache.org/documentation/, accessed: 2020-08-13.
[12] "Grafana Labs - The open platform for beautiful analytics and monitoring," https://grafana.com, accessed: 2020-08-13.
[13] "alpine linux," https://alpinelinux.org, accessed: 2020-08-13.
[14] A. Knöpfel, B. Gröne, and P. Tabeling, Fundamental Modeling Concepts: Effective Communication of IT Systems.
ACM International Conference on Distributed Event-Based Systems, DEBS, 2012, pp. 393-398.
[23] P. Dobbelaere and K. S. Esmaili, "Industry Paper: Kafka versus RabbitMQ: A comparative study of two industry reference publish/subscribe implementations," in ACM International Conference on Distributed and Event-based Systems, DEBS