Debugging Behaviour of Embedded-Software Developers: An Exploratory Study
Pansy Arafa, Daniel Solomon, Samaneh Navabpour, Sebastian Fischmeister
Dept. of Electrical and Computer Engineering
University of Waterloo, Canada
{parafa, d4solomo, snavabpo, sfischme}@uwaterloo.ca

Abstract—Many researchers have studied the behaviour of successful developers while debugging desktop software. In this paper, we investigate embedded-software debugging by intermediate programmers through an exploratory study. The bugs are semantic low-level errors, and the participants are students who completed a real-time operating systems course in addition to five other programming courses. We compare the behaviour involved in successful debugging attempts with that of unsuccessful ones, and we describe characteristics of smooth and successful debugging behaviour.
I. INTRODUCTION
Embedded systems represent 98% of all computers (DARPA, 2000). An embedded system must accurately fulfill its functional requirements in addition to strict timing and reliability constraints. Examples of embedded systems are high-performance networks and robotic controllers. Many applications of embedded systems are safety critical, including the new generation of airplane and spacecraft avionics and the braking controllers in automobiles. The correctness of these systems depends not only on the results they produce, but also on the time at which these results are produced [1]. The verification process of such systems is complicated, and consequently many testing techniques and tools exist to ensure system accuracy. This intensifies the need to enhance developers' debugging skills.

As an essential part of software-system development, debugging is a difficult and expensive task. A study in 2002 revealed that software engineers spend an average of 70%-80% of their time testing and debugging [2]. We can define debugging roughly as the process of error detection and maintenance of functional correctness. For desktop applications, many researchers have conducted experiments to determine the behaviour that promotes successful bug repair and time management.

In this paper, we focus on the characteristics of successful debugging of embedded software. Debugging embedded software is challenging due to (1) hardware interaction, e.g., loading code to the target board, (2) use of low-level language semantics, e.g., memory management in C, and (3) the need to respect the system's timing requirements. We believe that enhancing developers' debugging skills makes current research in the embedded-systems domain more promising and beneficial for the industrial market. That research includes testing of embedded automotive communication systems [3], time-aware instrumentation of embedded software [4], [5], and debugging of concurrent systems [6].

In this paper, we present an exploratory study in which 14 intermediate-level programmers debug semantic errors in embedded software. The study uses seven distinct bugs classified into two categories: incorrect hardware configuration and memory leaks. We explore the debugging behaviour of the participants and provide comparisons between successful and unsuccessful attempts. Moreover, we introduce the activity visitation pattern: a graphical representation of debugging behaviour.

II. RELATED WORK
Many program-comprehension studies since the 1970s have inspected the software-debugging techniques of developers. As one of the first, Gould examined the behaviour of experienced programmers while debugging non-syntactic errors in FORTRAN programs [7]. One goal of such studies is understanding the activities associated with the debugging and testing processes [8], [9], [10], [11]. For instance, Murphy et al. [10] presented a qualitative analysis of the debugging strategies of novice Java programmers. The authors stated twelve categories of debugging strategies used by the subjects and discussed their effectiveness, e.g., tracing, pattern matching, and using resources. Another objective is the identification of the difficulties faced by developers during software-change or maintenance tasks. Examples of debugging challenges include bug localization [11], understanding unfamiliar code [11], and gaining a broad understanding of the system [12]. Moreover, there exists a wide range of empirical and exploratory studies that examined the debugging techniques of programmers at different experience levels and compared successful to unsuccessful behaviour [13], [14], [15], [16]. Lastly, implications for teaching and evaluations of existing development tools based on the studies' insights are presented in [13], [12], [17], [10], and [14]. To our knowledge, this paper is the first work to address embedded-software debugging, which involves low-level programming and hardware interaction.

III. METHODOLOGY
A. Participants
The study's participants are intermediate-level developers. Each of them met the following criteria:
• They have completed a Real-time Operating Systems course in the previous year, in addition to five different programming courses throughout their undergraduate studies.
• In the Real-time Operating Systems course, they implemented a real-time operating system (RTOS) on a Janus MCF5307 ColdFire board similar to the study's platform. They have also worked on the same RTOS in another course.
• They have had an average of four co-operative work terms (about 2,400 working hours in total).
Participants meeting these criteria have sound knowledge of the board and the type of program they are going to debug. Hence, these participants have intermediate programming skills for embedded-software development. We scheduled each participant in a separate two-hour time slot.
B. System
The participants attempted to fix bugs contained within a small real-time operating system (RTOS) implemented on a Janus MCF5307-based microcontroller board. The RTOS is written in C; it consists of 23 C files and 15 header files, totalling 3,085 lines of code (LOC). The RTOS is similar to the one the participants implemented in the previous year, so the participants in our study are familiar with such low-level software and have reasonable experience with the RTOS. The RTOS supports a basic multiprogramming environment with five priority levels and preemption. It also supports simple memory management, inter-process communication, basic timing services, an I/O system console, and debugging support. The board contains a fully integrated timer, a dual UART (universal asynchronous receiver/transmitter), and several other peripheral interface devices. The RTOS and the application processes use up to one megabyte of RAM. Software development for the RTOS is supported by gcc and the ColdFire/68K cross-compiler. During the study sessions, we provided the participants with the RTOS documents and the Janus MCF5307 ColdFire board manual.
C. Bugs
This study includes seven semantic bugs categorized into incorrect-hardware-configuration bugs and memory leaks. Misconfiguration and memory errors constitute a notable share of the root causes of software bugs [18], [19]. Each bug is provided with a report that states the buggy behaviour of the system. A participant examines only one bug at a time.

The first bug category is incorrect hardware configuration. The Bug-1 report states the following: "Interaction with the OS does not result in understandable messages; corrupted text appears on the screen." Bug-1 is an incorrect assignment of the serial-port baud rate (Line 54 in Listing 1). Thus, the system shows corrupted text on the screen when communicating. Bug-2 is an incorrect value of the timer's match register, which causes the timer interval to be three times slower than intended. Bug-7 causes the board to reset about six seconds after launching the OS because the function resetWatchdog() resets the watchdog timer to a wrong value.

    int initUARTIProc() {
        ...
        /* Set the baud rate (19200) */
        SERIAL1_UBG1 = 0x00;
        SERIAL1_UBG2 = 0x49;
        /* Set clock mode */
        SERIAL1_UCSR = 0xDD;
        ...
    }

Listing 1. Bug-1 code
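For context on why a single wrong register value garbles the console, the sketch below shows how a UART baud-rate divisor is typically derived from the system clock and split across two divider registers. This is a minimal illustration, not the study's actual fix: the 45 MHz clock constant and the divide-by-32 scheme are assumptions, and the register names merely mirror Listing 1.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical clock frequency; the real value depends on the board. */
    #define SYSTEM_CLOCK_HZ 45000000UL
    #define BAUD_RATE       19200UL

    /* Stand-ins for the board's memory-mapped UART divider registers
     * (upper and lower byte of the baud-rate generator). */
    static volatile uint8_t SERIAL1_UBG1, SERIAL1_UBG2;

    int main(void) {
        /* Common UART scheme: divisor = clock / (32 * baud rate). */
        uint16_t divisor = (uint16_t)(SYSTEM_CLOCK_HZ / (32UL * BAUD_RATE));
        SERIAL1_UBG1 = (uint8_t)(divisor >> 8);   /* upper byte */
        SERIAL1_UBG2 = (uint8_t)(divisor & 0xFF); /* lower byte */
        printf("divisor = %u (UBG1=0x%02X, UBG2=0x%02X)\n",
               divisor, SERIAL1_UBG1, SERIAL1_UBG2);
        return 0;
    }

A divisor derived from the wrong clock frequency, or a byte written to the wrong half of the divider, yields a mismatched line speed, which produces exactly the corrupted-text symptom described in the Bug-1 report.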
The second-category bugs contain memory leaks. In Bug-3, the OS stops responding shortly after launching one of the user processes because the process fails to release messages that it receives. This causes the OS to run out of user memory after two iterations of the process. Listing 2 shows the code portion of Bug-3. Two statements are missing: release_memory_block(pMsgDelay) directly after Line 110, and release_memory_block(pMsgIn) directly before Line 115. In Bug-4, there is a missing function call in an interrupt process to release messages that it prints to the terminal. This causes the OS to run out of user memory after some interaction. A similar problem occurs in Bug-5; the Wallclock process fails to release the delay messages. As a result, the OS stops responding some time after launching the Wallclock process. In Bug-6, although the proper release_memory_block() calls exist in the OS, no memory is actually freed since the boundary checks of the user-memory address range are incorrect.

    void ProcC() {
        ...
        while (1) {
            ...
            if ((pMsgIn->pData)[0] == 0) {
                pMsgOut = request_memory_block();
                ...
                send_message(PID, pMsgOut);
                while (1) {
                    pMsgDelay = receive_message();
                    if (pMsgDelay->msg_type == wakeup) {
                        break;
                    } else {
                        ...
                    }
                }
            }
            release_processor();
        }
    }

Listing 2. Bug-3 code
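To make the leak concrete, the following sketch shows the shape of the receive loop with the two missing releases restored, per the description above. The message type, the wakeup constant, and the function name are assumptions modelled on Listing 2, not the RTOS's actual declarations.

    /* Hypothetical message type and RTOS primitives modelled on Listing 2;
     * the real declarations live in the RTOS headers. */
    typedef struct { int msg_type; char pData[4]; } Message;
    enum { wakeup = 1 };              /* assumed message-type value */
    extern Message *receive_message(void);
    extern int release_memory_block(void *block);

    /* Corrected shape of the Bug-3 loop body: every received block is
     * released exactly once before its pointer goes out of scope. */
    void wait_for_wakeup_and_release(Message *pMsgIn) {
        Message *pMsgDelay;
        while (1) {
            pMsgDelay = receive_message();
            if (pMsgDelay->msg_type == wakeup)
                break;
            /* ... handle (and release) other message types ... */
        }
        release_memory_block(pMsgDelay);  /* first missing statement */
        release_memory_block(pMsgIn);     /* second missing statement */
    }

The design point the bug illustrates is ownership: in this RTOS, the receiver of a message owns its memory block, so every receive_message() must be paired with exactly one release_memory_block() on some path.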
D. Data Collection
Our exploratory study involved three methods of data collection:
1) We video-recorded the computer screen during the study sessions using CamStudio [20]. Three observers coded the videos using the CowLog software [21] to mine the videos and extract information revealing the debugging behaviour. For each bug per participant, the video-mining process resulted in time-stamped files that show (1) the examined bug, (2) the activity, e.g., code browsing and editing, (3) the code module, and (4) the debugging technique, such as tracing and adding print statements.
2) Each time a participant compiled the system, the system was copied to a .try folder. The .try folders are used later to view the edits made in every compilation try.
3) Each participant filled out a pre-session and a post-session form, in addition to a post-bug form for each bug he examined.

IV. RESULTS AND DISCUSSION
This section lists our observations based on the collected data. These observations (1) highlight the debugging activities used by the developers to fix the low-level bugs, and (2) compare the successful debugging attempts with the unsuccessful ones. Table I presents a summary of the seven bugs in the study.
Definitions.
The total examining time of a bug is the whole time spent by the participant investigating that bug (including all debugging activities). The total editing time refers to the entire time spent editing the program, which includes insertion, modification, and deletion of code lines. We define the time spent to locate a bug as the time elapsed before the first edit to the right function. Additionally, we define the time to fix a bug as the difference between the total examining time and the time to locate the bug, calculated only for bugs that have been fixed. Finally, we consider a bug located if the participant edited the right function.
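In other words, for a fixed bug the quantities relate as follows (notation ours, introduced only to summarize the definitions above):

$t_{\text{fix}} = t_{\text{examine}} - t_{\text{locate}}$

For example, a participant who spends 40 minutes on a bug in total and makes the first edit to the right function at minute 25 has a locating time of 25 minutes and a fixing time of 15 minutes.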
Observation 1.
Locating the bug is harder than fixing it. In 93.75% of the post-bug forms filled out by the participants (30 out of 32), they considered the difficulty of locating the bug to be higher than or equal to the difficulty of fixing it. In all but one case, the time spent locating the bug is higher than the time spent fixing it. In total, the participants located only 63.82% of the bugs (30 out of 47).
Observation 2.
We investigated the activity visitation pattern, which shows the frequency of transitions from each activity to another while examining a bug. The activities are code browsing, code editing, document reading, compiling, and testing. Figure 1(a) shows the activity visitation pattern of the successful debugging attempts, i.e., the cases where the participants successfully fixed the bug. Figure 1(b) presents the pattern of the unsuccessful attempts, where the participant failed to fix the bug. Every node represents a debugging activity. We calculated the average number of transitions between each pair of activities and set the thickness of the transitions accordingly. Thus, the thickness of a transition represents the frequency of switching from the activity of the source node to that of the destination node.

In general, the unsuccessful attempts have a higher number of transitions than the successful ones. The average numbers of transitions between activities are 43.11 and 26.97 in the unsuccessful and successful attempts, respectively. We believe that the high number of transitions in the unsuccessful attempts conveys the indecisive behaviour of the participants, which can be a reason for failing to fix the bugs.

Figure 1(a) demonstrates the path of code browsing followed by code editing, then compiling, and finally testing. This sequence repeats until the bug is fixed. Although that path exists in Figure 1(b), many other transitions interpose it. We believe that the mentioned path supports a smooth and successful debugging process. In Figure 1(a), the transition from the code-browsing activity to the testing one is frequent. This may convey the attempts of the participants to comprehend the program based on its output.

Fig. 1. Activity visitation patterns: (a) successful attempts; (b) unsuccessful attempts.
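As an illustration of how such a pattern can be derived from the coded videos, the sketch below counts activity-to-activity transitions from a time-ordered sequence of activity codes. The event sequence and the encoding are hypothetical; the study's actual coding scheme lives in the CowLog output files.

    #include <stdio.h>

    /* The five debugging activities named in the paper. */
    enum activity { BROWSE, EDIT, READ_DOC, COMPILE, TEST, NUM_ACTIVITIES };

    static const char *names[NUM_ACTIVITIES] = {
        "browse", "edit", "read-doc", "compile", "test"
    };

    int main(void) {
        /* Hypothetical coded session: one activity label per video event. */
        enum activity session[] = { BROWSE, EDIT, COMPILE, TEST,
                                    BROWSE, EDIT, COMPILE, TEST };
        int n = sizeof(session) / sizeof(session[0]);
        int transitions[NUM_ACTIVITIES][NUM_ACTIVITIES] = { { 0 } };

        /* Count each consecutive pair: source activity -> destination. */
        for (int i = 0; i + 1 < n; i++)
            transitions[session[i]][session[i + 1]]++;

        /* Print the non-zero edges; the edge weight maps to arrow
         * thickness in a visitation-pattern graph like Fig. 1. */
        for (int s = 0; s < NUM_ACTIVITIES; s++)
            for (int d = 0; d < NUM_ACTIVITIES; d++)
                if (transitions[s][d] > 0)
                    printf("%s -> %s: %d\n",
                           names[s], names[d], transitions[s][d]);
        return 0;
    }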
Observation 3.
Successful debugging behaviour considers alternatives, which means considering that a bug can have multiple causes. In Bug-3, two lines of code are missing; however, many participants inserted only one line. Eleven participants examined this bug, and ten located it, but only three participants successfully fixed it.
Observation 4.
Successful debugging behaviour implies editing only one function per compilation try. This is equivalent to avoiding the multiple-edits behaviour. The absence of multiple edits in all tries correlates with low total editing time, low time to locate the bug, and low total examining time. However, the presence of multiple edits in any try does not necessarily correlate with high total editing time, high time to locate the bug, etc. The multiple-edits behaviour was avoided in 61.11% of the successful debugging attempts (11 out of 18).
TABLE I. BUG SUMMARY
ID | Name | Category | Examined By | Located By | Fixed By | Average Examining Time
Observation 5.
Successful debugging behaviour requires running the first compilation try without any changes to the program, to investigate the behaviour of the buggy system. 88.88% of the successful attempts maintained this behaviour (16 out of 18), and 86.66% of the located-bug cases maintained it (26 out of 30). 75% of the participants who edited the program before the first compilation try failed to fix the bug (6 out of 8).
Observation 6.
Successful debugging behaviour maintains a smooth approach to locating and fixing bugs, with almost no retesting of a previously tested function. That is equivalent to avoiding the ping-pong behaviour: moving, according to the editing location, far from the bug after approaching it at least twice. Figure 2 shows an example of the ping-pong behaviour in Participant-7's debugging session for Bug-3. The x-axis represents the number of the compilation try. The y-axis represents the location in the program that the participant edited in each try. There are four editing categories:
1) The participant edits the right function, which contains the bug. That means the participant located the bug.
2) In the right file, which includes the bug, the participant edits any function except the right one.
3) The participant edits somewhere other than the file containing the bug.
4) The participant compiles the system without making any modifications.
Figure 2 shows a non-smooth debugging process. The participant made no code changes in the first compilation try. In the second try, the participant edited a function somewhere other than the right file. Then, he modified the right function in the third and fourth tries, which means he located the bug; but in the fifth and sixth tries, he again edited a function far from the bug location. He switched between editing the right function and editing somewhere else four more times. This looks like random edits in different functions, files, and code modules. The absence of the ping-pong behaviour correlates with low time spent to locate the bug and low total examining time, but the presence of the ping-pong behaviour does not necessarily correlate with high time spent to locate the bug or high total examining time. 88.88% of the successful debugging attempts did not include the ping-pong behaviour (16 out of 18). Also, 80% of the located-bug cases did not include the ping-pong behaviour (24 out of 30), and 75% of the ping-pong-behaviour cases are unfixed (6 out of 8).
Fig. 2. Example of the ping-pong behaviour (Participant-7, Bug-3)
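As a sketch of how the ping-pong behaviour defined in Observation 6 could be detected mechanically from a session's per-try edit locations, the code below flags sessions where the participant approaches the right function and then moves away again at least twice. The encoding, threshold interpretation, and function name are ours, under the assumption that "moving far from the bug" means any edit outside the right function after having edited it.

    #include <stdbool.h>
    #include <stdio.h>

    /* The four editing categories from Observation 6, one per try. */
    enum edit_loc { NO_EDIT, SOMEWHERE_ELSE, RIGHT_FILE, RIGHT_FUNCTION };

    /* Ping-pong: the participant edits the right function (approaches
     * the bug) and later edits elsewhere, at least twice. */
    bool is_ping_pong(const enum edit_loc *tries, int n) {
        int retreats = 0;
        bool near_bug = false;
        for (int i = 0; i < n; i++) {
            if (tries[i] == RIGHT_FUNCTION) {
                near_bug = true;
            } else if (near_bug && tries[i] != NO_EDIT) {
                retreats++;    /* moved away after approaching the bug */
                near_bug = false;
            }
        }
        return retreats >= 2;
    }

    int main(void) {
        /* A session shaped like Fig. 2 (values illustrative). */
        enum edit_loc session[] = {
            NO_EDIT, SOMEWHERE_ELSE, RIGHT_FUNCTION, RIGHT_FUNCTION,
            SOMEWHERE_ELSE, SOMEWHERE_ELSE, RIGHT_FUNCTION,
            SOMEWHERE_ELSE, RIGHT_FUNCTION, SOMEWHERE_ELSE
        };
        int n = sizeof(session) / sizeof(session[0]);
        printf("ping-pong: %s\n", is_ping_pong(session, n) ? "yes" : "no");
        return 0;
    }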
V. THREATS TO VALIDITY
Video Coding.
The process of analyzing the videos is challenging due to the difficulty of categorizing human behaviour. Three observers coded the videos, and the videos of six participants were coded jointly by two observers. To validate the information extracted from the videos, we calculated the inter-rater reliability using Fleiss' kappa for categorical ratings. It equals 0.74, which indicates substantial agreement among the three observers.
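For reference, Fleiss' kappa compares the observed agreement among raters with the agreement expected by chance:

$\kappa = \dfrac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}$

where $\bar{P}$ is the mean observed agreement across rated items and $\bar{P}_e$ is the chance agreement implied by the marginal category proportions. Values between 0.61 and 0.80 are conventionally interpreted as substantial agreement, consistent with our value of 0.74.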
Learning Effect.
The bugs' identifiers did not introduce biased results. While the bugs are numbered in the same order for everyone, the participants were free to choose which bugs to inspect. They examined different sets of bugs (e.g., one participant examined Bugs 1, 3, 5, and 6; another examined Bugs 1, 2, 3, and 4). We found no correlation between the examining time and the bug identifier. Also, there is no correlation between the examining time and the order in which each participant examined the bugs.
Bug Localization.
We consider a bug located if the participant edited the right function, but in some cases, he might not recognize that he located it (e.g., when inserting a print statement). Since we cannot read the participant's mind, we had to make this assumption to provide an accurate measurement of locating bugs. On the other hand, it is a reasonable assumption since all the bug-containing functions are short; they have an average of 37 LOC.

VI. CONCLUSION
In this paper, we presented an exploratory study of the debugging behaviour of intermediate embedded-software developers. We also demonstrated the use of the activity visitation pattern. In general, debugging is a complex and time-consuming task, especially for embedded software. Understanding the debugging behaviour of developers should precede the creation of aiding tools. This work provided results that can guide future research on automated debugging tools for the embedded domain.
REFERENCES
[1] A. M. K. Cheng, Real-Time Systems: Scheduling, Analysis, and Verification. Wiley-Interscience, 2002.
[2] G. Tassey, "The Economic Impacts of Inadequate Infrastructure for Software Testing," National Institute of Standards and Technology, RTI Project, vol. 7007, no. 011, 2002.
[3] E. Armengaud, A. Steininger, and M. Horauer, "Towards a Systematic Test for Embedded Automotive Communication Systems," IEEE Transactions on Industrial Informatics, vol. 4, no. 3, pp. 146–155, August 2008.
[4] P. Arafa, H. Kashif, and S. Fischmeister, "DIME: Time-aware Dynamic Binary Instrumentation Using Rate-based Resource Allocation," in Proceedings of the Eleventh ACM International Conference on Embedded Software (EMSOFT '13). IEEE Press, 2013.
[5] H. Kashif, P. Arafa, and S. Fischmeister, "INSTEP: A Static Instrumentation Framework for Preserving Extra-Functional Properties," in International Conference on Embedded and Real-Time Computing Systems and Applications. IEEE, Aug 2013.
[6] M. Musuvathi, S. Qadeer, T. Ball, G. Basler, P. A. Nainar, and I. Neamtiu, "Finding and Reproducing Heisenbugs in Concurrent Programs," in Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI '08). Berkeley, CA, USA: USENIX Association, 2008, pp. 267–280.
[7] J. D. Gould, "Some Psychological Evidence on How People Debug Computer Programs," International Journal of Man-Machine Studies, vol. 7, no. 2, pp. 151–182, 1975.
[8] A. von Mayrhauser and A. M. Vans, "Program Understanding Behavior During Debugging of Large Scale Software," in Papers Presented at the Seventh Workshop on Empirical Studies of Programmers (ESP '97). New York, NY, USA: ACM, 1997, pp. 157–179.
[9] A. Karahasanović, A. K. Levine, and R. Thomas, "Comprehension Strategies and Difficulties in Maintaining Object-Oriented Systems: An Explorative Study," The Journal of Systems & Software, vol. 80, no. 9, pp. 1541–1559, 2007.
[10] L. Murphy, G. Lewandowski, R. McCauley, B. Simon, L. Thomas, and C. Zander, "Debugging: The Good, the Bad, and the Quirky – A Qualitative Analysis of Novices' Strategies," SIGCSE Bull., vol. 40, pp. 163–167, March 2008.
[11] S. Fitzgerald, R. McCauley, B. Hanks, L. Murphy, B. Simon, and C. Zander, "Debugging from the Student Perspective," IEEE Transactions on Education, pp. 390–396, 2010.
[12] J. Sillito, K. De Volder, B. Fisher, and G. Murphy, "Managing Software Change Tasks: An Exploratory Study," in 2005 International Symposium on Empirical Software Engineering. IEEE, 2005, pp. 10 pp.
[13] M. P. Robillard, W. Coelho, and G. C. Murphy, "How Effective Developers Investigate Source Code: An Exploratory Study," IEEE Transactions on Software Engineering, vol. 30, no. 12, pp. 889–903, Dec 2004.
[14] J. Sillito, G. C. Murphy, and K. De Volder, "Asking and Answering Questions During a Programming Change Task," IEEE Transactions on Software Engineering, vol. 34, no. 4, pp. 434–451, July 2008.
[15] L. Murphy, S. Fitzgerald, B. Hanks, and R. McCauley, "Pair Debugging: A Transactive Discourse Analysis," in Proceedings of the Sixth International Workshop on Computing Education Research (ICER '10). New York, NY, USA: ACM, 2010, pp. 51–58.
[16] A. von Mayrhauser and A. M. Vans, "Identification of Dynamic Comprehension Processes During Large Scale Maintenance," IEEE Transactions on Software Engineering, vol. 22, no. 6, pp. 424–437, Jun 1996.
[17] F. Ricca, M. Di Penta, M. Torchiano, P. Tonella, and M. Ceccato, "The Role of Experience and Ability in Comprehension Tasks Supported by UML Stereotypes," in 29th International Conference on Software Engineering (ICSE 2007). IEEE, 2007, pp. 375–384.
[18] Z. Li, L. Tan, X. Wang, S. Lu, Y. Zhou, and C. Zhai, "Have Things Changed Now?: An Empirical Study of Bug Characteristics in Modern Open Source Software," in Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability (ASID '06). ACM, 2006, pp. 25–33.
[19] Z. Yin, M. Caesar, and Y. Zhou, "Towards Understanding Bugs in Open Source Router Software," ACM SIGCOMM Computer Communication Review, vol. 40, no. 3, 2010.
[20] CamStudio: open-source screen recording software. [Online]. Available: http://camstudio.org
[21] L. Hänninen and M. Pastell, "CowLog: Open-Source Software for Coding Behaviors from Digital Video," Behavior Research Methods, vol. 41, no. 2, pp. 472–476, 2009.