Agreeing to Cross: How Drivers and Pedestrians Communicate
Amir Rasouli, Iuliia Kotseruba and John K. Tsotsos

Abstract — The contribution of this paper is twofold. The first is a novel dataset for studying the behaviors of traffic participants while crossing. Our dataset contains more than 650 samples of pedestrian behaviors in various street configurations and weather conditions. These examples were selected from approximately 240 hours of driving on city, suburban and urban roads. The second contribution is an analysis of our data from the point of view of joint attention. We identify what types of non-verbal communication cues road users use at the point of crossing, their responses, and under what circumstances the crossing event takes place. It was found that in more than % of the cases pedestrians gaze at the approaching cars prior to crossing at non-signalized crosswalks. The crossing action, however, depends on additional factors such as time to collision, the driver's explicit reaction or the structure of the crosswalk.

I. INTRODUCTION

The fascination with autonomously driving vehicles goes as far back as the mass production of early automobiles. Since the early 1920s the automotive industry has witnessed numerous attempts to achieve full autonomy in the form of radio-signal-controlled cars [1], wire-following vehicles [2], lane detection and car following [3] and, in more recent works, cars that can drive fully autonomously under certain conditions [4].

Despite such success stories in autonomous control systems, designing fully autonomous vehicles suitable for urban environments still remains an unsolved problem. Aside from the challenges associated with developing suitable infrastructure [5] and regulating autonomous behaviors [6], one of the major dilemmas faced by autonomous vehicles is how to communicate with other road users in a chaotic traffic scene [7].
In addition to the official rules that govern the flow of traffic, humans often rely on some form of informal rules resulting from non-verbal communication among them and anticipation of the other traffic participants' intentions. For instance, pedestrians intending to cross a street where there is no stop sign or traffic signal often establish eye contact with the driver to ensure that the approaching car will stop for them. Other forms of non-verbal communication such as hand gestures or body posture are also used to resolve ambiguities in typical traffic situations. Furthermore, the characteristics of a road user (e.g. age and gender), the physical environment (the structure of the crosswalk, weather, etc.) and even cultural differences make estimating the intention of traffic participants particularly challenging [8].

Fig. 1: An overview of joint attention in crossing. The timeline of events is recovered from the behavioral data and shows a single pedestrian crossing the parking lot. Initially, the driver is moving slowly and, as he notices the pedestrian ahead, slows down to let her pass. At the same time the pedestrian crosses without looking first, then turns to check if the road is safe, and, as she sees the driver yielding, continues to cross.

*This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the NSERC Strategic Network for Field Robotics (NCFRN), and the Canada Research Chairs Program through grants to JKT. The authors are with the Department of Electrical Engineering and Computer Science and the Center for Vision Research, York University, Toronto, Canada. {aras; yulia k; tsotsos}@eecs.yorku.ca

Our contribution in the proposed work is twofold. First, we introduce a novel visual dataset for the detection and analysis of pedestrians' behaviors while crossing (or attempting to cross) the street under various conditions. We call this dataset Joint Attention in Autonomous Driving (JAAD).
Then we present some of our findings regarding the courses of action taken and the non-verbal cues used by pedestrians in different crossing scenarios. We show that crossing behavior can be influenced by various contextual elements such as the crossway structure, the driver's behavior, the distance to the approaching vehicles, etc.

II. RELATED WORKS
A. Studies of driver and pedestrian interaction
Numerous psychological studies have examined the behaviors of drivers and pedestrians before crossing events. Usually, the following aspects are considered: the likelihood of the driver yielding ([9], [10], [11]), driver awareness of the pedestrian [12], [13] and the pedestrian's decision making [14], [15]. Multiple factors affecting these behaviors have been identified: vehicle speed and time to collision (TTC) ([16], [17]), the size of the gap between the vehicles [18], the geometry and other features of the road (signs and delineation) [14], weather conditions [15], crossing conditions (whether the pedestrian is crossing from a standstill or walking), the number of pedestrians crossing [18], the gender and age of the drivers and pedestrians [14], eye contact between the pedestrian and the driver ([11], [19]), etc.

Typically, the interactions between traffic participants are treated mechanistically. For instance, TTC takes into account the speed of the vehicle and the distance to the pedestrian and is thought to affect his/her crossing behavior ([20], [17], [21], [16]).

However, several recent studies show that non-verbal communication is also important for determining the intentions of traffic participants. For example, drivers are more likely to yield if they are looked at by the pedestrian waiting to cross ([11], [19]).

In a psychological experiment by Schmidt et al. [17], participants were unable to correctly evaluate pedestrians' crossing intentions based only on the trajectories of their motion, suggesting that parameters of body language (posture, leg and head movements) are valuable cues.

In computer vision and robotics, passive approaches are prevalent for predicting pedestrians' actions during the crossing.
These works mainly look at dynamic factors in the scene such as pedestrians' trajectories [22] and velocities [23] or try to predict changes in the behavior of pedestrians crossing as a group [24].

In more recent works, the pedestrian's body language is used as a means of predicting behavior [25], [26]. In these works, head orientation is associated with the pedestrian's level of awareness; however, the learning is crude and the context is not taken into account. For instance, the driver's reaction or the vehicle's speed, as well as the structure of the crossway such as the presence of a traffic signal or the width of the street, are not considered.

B. Existing Datasets
There are many datasets for pedestrian detection introduced by the computer vision and robotics communities. To name a few: KITTI [27], the Caltech pedestrian detection benchmark [28] and the Daimler Pedestrian Benchmark Dataset [29]. These datasets are accompanied by ground truth information in the form of bounding boxes, stereo information, sensor readings and occlusion tags.

To the best of our knowledge, there are no datasets facilitating the study of pedestrians' crossing behavior. Most of the data for the relevant psychological studies is collected at select locations and involves direct observation by the researchers on site. Another potential source is data collected for Naturalistic Driving Studies (NDS). These were introduced to eliminate the observer's effect and aggregate large volumes of data on everyday driving patterns over an extended period of time. A number of such studies have been launched in the USA [30], [31], Europe [32], Asia [33] and Australia [34]. Although these studies produced petabytes of video recordings of everyday driving situations, at present the processing of this data has been focused on identifying crash and near-crash events and the factors that caused them. Since access to the raw NDS data is restricted and only general anonymized statistics are available, we conducted a small-scale naturalistic driving study and extracted data on the non-verbal communication occurring between traffic participants in various situations. The following sections discuss the data collection procedure, general statistics and the preliminary results of our study.

III. THE JAAD DATASET
The JAAD dataset was created to study the behavior of traffic participants. The data consists of 346 high-resolution video clips (5-15 s) showing various situations typical for urban driving. These clips were extracted from approximately 240 hours of driving videos collected in several locations. Two vehicles equipped with wide-angle video cameras were used for data collection (Table I). Cameras were mounted inside the cars in the center of the windshield below the rear view mirror.

TABLE I: Properties of the samples in the database.
Clips | Location | Camera
55 | Toronto, Canada | GoPro HERO+
276 | Kremenchuk, Ukraine | Garmin GDR-35
6 | Hamburg, Germany | Highscreen Black Box Connect
5 | New York, USA | GoPro HERO+
4 | Lviv, Ukraine | Highscreen Black Box Connect
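As a quick sanity check on Table I, the per-location clip counts sum to the 346 clips stated in the text; a minimal sketch (the dictionary layout is our own transcription, not a dataset API):

```python
# Per-location clip counts transcribed from Table I.
clips_per_location = {
    "Toronto, Canada": 55,
    "Kremenchuk, Ukraine": 276,
    "Hamburg, Germany": 6,
    "New York, USA": 5,
    "Lviv, Ukraine": 4,
}

total = sum(clips_per_location.values())
print(total)  # 346, matching the dataset size given in the text
```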
The video clips represent a wide variety of scenarios involving pedestrians and other drivers. Most of the data is collected in urban areas (downtown and suburban); only a few clips are filmed in rural locations. The samples cover a variety of situations such as pedestrians crossing individually or as a group, pedestrians occluded by objects, pedestrians walking along the road and many more. The dataset contains fewer clips of interactions with other drivers; most of them occur at uncontrolled intersections, in parking lots or when another driver is moving across several lanes to make a turn.

The videos are recorded during different times of the day and under various weather and lighting conditions. Some of them are particularly challenging, for example, due to sun glare. The weather can also impact the behavior of road users; for example, during heavy snow or rain, people wearing hooded jackets or carrying umbrellas may have limited visibility of the road. Since their faces are obstructed, it is also harder to tell from the driver's perspective whether they are paying attention to the traffic.

We attempted to capture all of these conditions for further analysis by providing two kinds of annotations for the data: bounding boxes and textual annotations. Bounding boxes are provided only for cars and pedestrians that interact with or require the attention of the driver (e.g. another car yielding to the driver, a pedestrian waiting to cross the street, etc.). Bounding boxes for each video are written into an XML file with frame number, coordinates, width, height and an occlusion flag. The textual annotations are created using the BORIS software for video observations [35]. It allows assigning predefined behavior labels to different subjects seen in the video, and can also save some additional data, such as the video file id, the location where the observation was made, etc. (see Fig. 1 for an example).

http://data.nvision2.eecs.yorku.ca/JAAD dataset/. Ethics certificate

Fig. 2: Joint attention motifs of pedestrians.
Diagram a) shows a summary of 345 sequences of pedestrians' actions before and after crossing. Diagram b) shows 92 sequences of actions when pedestrians did not cross. Vertical bars represent actions color-coded as the precondition to crossing, attention, reaction to the driver's actions, crossing or ambiguous actions. Curved lines between the bars show connections between consecutive actions. The thickness of the lines reflects the frequency of the action in the 'crossing' or 'non-crossing' subset. Sequences longer than 10 actions (e.g. when the pedestrian hesitates to cross) are extremely rare and are not shown.

We save the following data for each video clip: weather, time of the day, age and gender of the pedestrians, location and whether it is a designated crosswalk. Each pedestrian is assigned a label (pedestrian1, pedestrian2, etc.). We also distinguish between the driver inside the car and other drivers, which are labeled as Driver and car1, car2, etc., respectively. This is necessary for situations where two or more drivers are interacting. Finally, a range of behaviors is defined for drivers and pedestrians: walking, standing, looking, moving, etc. A more detailed example of textual annotation can be found in [36].

IV. THE DATA
In our data, we observed high variability in the behaviors of pedestrians at the point of crossing/no-crossing, with more than 100 distinct patterns of actions. For instance, Fig. 2a shows sequences of actions during the completed crossing scenarios found in the dataset. Two typical patterns, "standing, looking, crossing" and "crossing, looking", cover only half of the situations observed in the dataset. Similarly, in / rd of non-crossing scenarios (Fig. 2b) pedestrians are waiting at the curb and looking at the traffic. Otherwise, the behaviors vary significantly both in the number of actions before and after crossing and in the meaning of particular actions (e.g. standing may be both a precondition and a reaction to the driver's actions).

For further analysis we split these behavioral patterns into 9 groups depending on the initial state of the pedestrian and whether attention or the act of crossing occurs. We list these actions and the number of samples in Table II. Here attention refers to the first moment the pedestrian assesses the environment and expresses his/her intention to the approaching vehicles; it is therefore considered a form of non-verbal communication.

Visual attention takes two forms: looking and glancing. Looking refers to scenarios in which the pedestrian inspects the approaching car (typically for 1 second or longer), assesses the environment and in some cases establishes eye contact with the driver. The other form of attention, glancing, usually lasts less than a second and is used to quickly assess the location or speed of the approaching vehicles. Pedestrians glance when they have a certain level of confidence in predicting the driver's behavior, e.g. the vehicle is stopped or moving very slowly, or is otherwise sufficiently far away and does not pose any immediate danger.

V. OBSERVATIONS AND ANALYSIS
Our data contains various scenarios in which pedestrians are observed during or prior to crossing. Two categories from Table II, crossing and action, are omitted from the analysis. Since these crossing scenarios do not demonstrate the full crossing event, it is difficult to assess the behavior of the pedestrians at the point of crossing. As for the action cases, the intentions of the pedestrians are ambiguous; for example, the pedestrians are not approaching the curb or are standing far away from the crossway.

TABLE II: The behavioral patterns observed in the data.
Behavior Sequence | Meaning | Number of Samples
Crossing | The pedestrian is observed at the point of crossing and no attention takes place |
Crossing + Attention | The pedestrian is observed at the point of crossing and some form of attention occurs |
Crossing + Attention + Reaction | The pedestrian is observed at the point of crossing, some form of attention occurs and the pedestrian changes behavior |
Precondition + Crossing | The pedestrian is walking/standing and crosses without paying attention |
Precondition + Attention + Crossing | The pedestrian is walking/standing and crosses after paying attention |
Precondition + Attention + Reaction + Crossing | The pedestrian is walking/standing, pays attention and changes behavior prior to crossing |
Action | The pedestrian is walking/standing and his/her intention is ambiguous |
Action + Attention | The pedestrian is about to cross and pays attention |
Action + Attention + Reaction | The pedestrian is about to cross, pays attention and responds |
Total |
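The nine patterns of Table II can be read as a simple function of which high-level tags appear in a pedestrian's annotated sequence. A minimal sketch, using our own simplified tag names rather than the dataset's exact label set:

```python
# Hypothetical mapping from the high-level tags present in an annotated
# pedestrian sequence to the nine behavioral patterns of Table II.
# Tag names ("precondition", "attention", "reaction", "crossing") are our
# own simplification of the annotation scheme described in the text.

def classify_sequence(tags):
    """Return the Table II pattern name for a set of observed tag types."""
    att = "attention" in tags
    rea = att and "reaction" in tags  # a reaction presupposes prior attention
    suffix = (" + Attention" if att else "") + (" + Reaction" if rea else "")
    if "crossing" not in tags:
        # Pedestrian never crosses: the "Action" patterns
        return "Action" + suffix
    if "precondition" in tags:
        # Walking/standing first, then (possibly) attention, then crossing
        return "Precondition" + suffix + " + Crossing"
    # Pedestrian observed already at the point of crossing
    return "Crossing" + suffix

print(classify_sequence({"precondition", "attention", "crossing"}))
# Precondition + Attention + Crossing
```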
A. Forms of non-verbal communication
In the course of a crossing event, pedestrians often use different forms of non-verbal communication (in more than % of the cases in our dataset). The most prominent signal to transmit the crossing intention is looking ( %) or glancing ( %) towards the oncoming traffic. Other forms of communication are rarer, e.g. nodding (as a form of gratitude and acknowledgement) and hand gestures (as a form of gratitude or yielding), and are usually performed in response to the driver's action.

The pedestrians' response to the communication is not always explicit and is often realized as a change in their behavior. For instance, when a pedestrian slows down or stops, it could be an indicator of noticing the approaching vehicle or of the driver not yielding. Table III summarizes the forms of communication and responses observed in the data. In this table we distinguish between the primary and secondary occurrence of attention. The primary attention is the first instance when the pedestrian inspects the environment prior to crossing. The secondary attention refers to subsequent inspection of the environment or checking the traffic while crossing.

TABLE III: Forms of pedestrian communication and response. Primary (PO) and secondary occurrence (SO) of attention.

Form of Communication | Number of Occurrences
Attention (PO): looking | 328
Attention (PO): glance | 37
Attention (SO): looking | 106
Attention (SO): glance | 19
Response: stop | 71
Response: clear path | 29
Response: slow down | 24
Response: speed up | 14
Response: hand gesture | 13
Response: nod | 11
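The counts in Table III can be turned into proportions directly; for example, among primary occurrences of attention, sustained looking dominates glancing. A small sketch over the table's numbers (the dictionary is our transcription, not a dataset API):

```python
# Primary-attention occurrence counts transcribed from Table III.
primary = {"looking": 328, "glance": 37}

# Share of primary attention realized as sustained looking rather than a glance
looking_share = primary["looking"] / (primary["looking"] + primary["glance"])
print(f"{looking_share:.0%}")  # ~90% of primary attention is looking
```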
B. Attention occurrence prior to crossing
As mentioned earlier, there are scenarios in which pedestrians do not pay attention to the moving traffic. To investigate the probability of attention occurrence, one important factor to consider is TTC, i.e. how long it takes the approaching vehicle to arrive at the position of the pedestrian, given that it maintains its current speed and trajectory.

The relationship between attention occurrence and TTC is illustrated in Fig. 3. Crossing without attention comprises only about % of all crossing scenarios, out of which more than % of the cases occurred when TTC is above 10 s (including situations where the approaching vehicle is stopping). There are also no cases of crossing without attention when TTC is less than s.

Fig. 3: Relationship between TTC and the probability of attention occurring prior to crossing.

Fig. 4: The pedestrian attention frequency at a) designated and b) non-designated crosswalks.

The context in which the crossing takes place also plays a role in crossing behavior. The context can be described by factors such as the weather conditions, the street structure and the driver's reaction. Since analyzing all of these factors is beyond the scope of this paper, here we only look at the effect of the street structure.

Two factors characterize a crosswalk: whether it is designated (there is a zebra crossing or traffic signal) and its width (measured as the number of lanes). In our samples, crossing without attention only happened at non-designated crosswalks when TTC was higher than seconds (see Fig. 4).

The full crossing events happen in streets with widths ranging from 1 (narrow one-way streets) to 4 lanes (main streets). We report on the data by dividing the results into 4 intervals with respect to the TTC values and, in each category, we group them based on the number of lanes (see Fig.
5). As illustrated, when TTC is below 3 s there is no occurrence of crossing without attention in streets wider than 2 lanes. In fact, only % of the crossings happened in streets wider than 2 lanes.

Fig. 5: Attention occurrence with respect to the number of lanes.

Fig. 6: Average duration of the pedestrian's attention prior to crossing based on TTC for different age groups.

The duration of attention, or how quickly pedestrians tend to begin crossing from the moment they gaze at the approaching car, also may vary. As illustrated in Fig. 6, the duration of looking depends on time to collision. The further away the vehicle is from the pedestrians, the longer it will take them to assess the intention of the driver, hence they will attend longer. The gaze duration increases up to a maximum safe TTC threshold (from 7 s for adults up to 8 s for the elderly), after which it declines dramatically when the vehicle is either far away or stopped. In addition, elderly pedestrians, in comparison to adults and children, tend to be more conservative and spend on average about 1 s longer looking prior to crossing.
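TTC throughout this section is the constant-velocity estimate described above: the time for the approaching vehicle to reach the pedestrian's position if it keeps its current speed and trajectory. A minimal sketch (the function name and units are ours):

```python
def time_to_collision(distance_m, speed_mps):
    """TTC in seconds under the constant speed-and-trajectory assumption."""
    if speed_mps <= 0.0:
        # A stopped vehicle never reaches the pedestrian under this model
        return float("inf")
    return distance_m / speed_mps

# e.g. a vehicle 30 m from the pedestrian's position at 10 m/s (36 km/h)
print(time_to_collision(30.0, 10.0))  # 3.0 seconds
```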
C. Crossing action post attention occurrence
Although the pedestrian's head orientation and attentive behavior are strong indicators of crossing intention, they are not always followed by a crossing event. In addition to TTC, which reflects both the approaching driver's speed and the distance to the contact point, the structure of the street and the driver's reaction can impact the pedestrian's level of confidence to cross.

Fig. 7: Pedestrians' crossing behavior at crosswalks with different properties: (a) crossing and crosswalk property, (b) non-designated, (c) zebra crossing, (d) traffic signal.

To investigate this, we divide the crosswalks into three categories: non-designated, without a zebra crossing or traffic signal; zebra-crossing, with a zebra crossing and/or a pedestrian crossing sign; and traffic signal, with a signal such as a traffic light or stop sign which forces the driver to stop.

Fig. 7a shows that pedestrians are less likely to cross the street after communicating their intention if the crosswalk is not designated, and more likely to cross if some form of signal or dedicated pathway is present.

To understand under what circumstances the crossing takes place at different crosswalks, we look at the driver's reaction to the pedestrian's intention of crossing. The driver's behavior can be grouped into speeds (when the driver either maintains the current speed or speeds up), slows down and stops.

Figs. 7b and 7c show that when there is no traffic signal present, in the majority of the cases pedestrians cross if the driver acknowledges their intention of crossing by slowing down or stopping. In a few scenarios, the pedestrian still crosses the street even though the vehicle accelerates. In these cases either TTC is very high (an average of 25.7 s) or the car is in traffic congestion and the pedestrian anticipates that the car will shortly stop. Moreover, crossing also might not take place when the driver slows down or stops (even in the presence of a traffic signal) (see Figs. 7b and 7d). In these cases the pedestrian either hesitates to cross or explicitly (often by some form of hand gesture) yields to the driver.

VI. CONCLUSIONS

Pedestrians often engage in various forms of non-verbal communication with other road users. These include gazing, gesturing, nodding or changing their behavior. At the point of crossing, in more than % of the cases pedestrians use some form of attention to communicate their intention of crossing.
The most prominent form of attention (or primary communication) is looking in the direction of the approaching vehicles. The duration of looking may also vary depending on the age of the pedestrian or the time to collision. Other forms of explicit communication such as nodding or hand gestures were observed in % of the cases as a response to the driver's action and were often used to show gratitude, acknowledgement or to yield to the driver.

The crossing event does not always follow the first communication of intention. Crossing depends on additional factors such as the structure of the street (e.g. designated/non-designated, the width of the street), the driver's reaction to the communication or the time to collision (how soon the driver arrives at the crosswalk).

Future work will include an analysis of pedestrians' gait patterns with and without attention during the crossing. In addition, to better assess the nature of communication it would be beneficial to record driver's data such as the driver's gestures, eye movements and any reaction that involves changing the state of the vehicle.

REFERENCES

[1] F. Kroger, "Automated driving in its social, historical and cultural contexts,"
Autonomous Driving, Technical, Legal and Social Aspects, pp. 41–68, 2016.
[2] M. Mann, "The car that drives itself," Popular Science, vol. 175, no. 5, p. 76, 1958.
[3] E. D. Dickmanns, B. Mysliwetz, and T. Christians, "An integrated spatio-temporal approach to automatic visual guidance of autonomous vehicles," IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, no. 6, pp. 1273–1284, December 1990.
[4] S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A. Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann, K. Lau, C. Oakley, M. Palatucci, V. Pratt, and P. Stang, "Stanley: The robot that won the DARPA Grand Challenge," Journal of Field Robotics, vol. 23, no. 9, pp. 661–692, 2006.
[5] B. Friedrich, "The effect of autonomous vehicles on traffic," Autonomous Driving, Technical, Legal and Social Aspects, pp. 317–334, 2016.
[6] T. M. Gasser, "Fundamental and special legal questions for autonomous vehicles," Autonomous Driving, Technical, Legal and Social Aspects, pp. 523–551, 2016.
[7] W. Knight, "Can this man make AI more human?" MIT Technology Review.
[8] Autonomous Driving, Technical, Legal and Social Aspects, pp. 103–124, 2016.
[9] D. Sun, S. Ukkusuri, R. F. Benekohal, and S. T. Waller, "Modeling of motorist-pedestrian interaction at uncontrolled mid-block crosswalks," Urbana, vol. 51, 2002.
[10] K. Salamati, B. Schroeder, D. Geruschat, and N. Rouphail, "Event-based modeling of driver yielding behavior to pedestrians at two-lane roundabout approaches," Transportation Research Record: Journal of the Transportation Research Board, no. 2389, 2013.
[11] N. Guéguen, S. Meineri, and C. Eyssartier, "A pedestrian's stare and drivers' stopping behavior: A field experiment at the pedestrian crossing," Safety Science, vol. 75, pp. 87–89, 2015.
[12] Y.-C. Lee, J. D. Lee, and L. N. Boyle, "The interaction of cognitive load and attention-directing cues in driving," Human Factors: The Journal of the Human Factors and Ergonomics Society, vol. 51, no. 3, pp. 272–280, 2009.
[13] Y. Fukagawa and K. Yamada, "Estimating driver awareness of pedestrians from driving behavior based on a probabilistic model," in Intelligent Vehicles Symposium (IV). IEEE, 2013, pp. 1155–1160.
[14] A. Tom and M.-A. Granié, "Gender differences in pedestrian rule compliance and visual search at signalized and unsignalized crossroads," Accident Analysis and Prevention, vol. 43, no. 5, pp. 1794–1801, 2011.
[15] R. Sun, X. Zhuang, C. Wu, G. Zhao, and K. Zhang, "The estimation of vehicle speed and stopping distance by pedestrians crossing streets in a naturalistic traffic environment," Transportation Research Part F: Traffic Psychology and Behaviour, vol. 30, pp. 97–106, 2015.
[16] N. Lubbe and J. Davidsson, "Drivers' comfort boundaries in pedestrian crossings: A study in driver braking characteristics as a function of pedestrian walking speed," Safety Science, vol. 75, pp. 100–106, 2015.
[17] S. Schmidt and B. Färber, "Pedestrians at the kerb – recognising the action intentions of humans," Transportation Research Part F: Traffic Psychology and Behaviour, vol. 12, no. 4, pp. 300–310, 2009.
[18] T. Wang, J. Wu, P. Zheng, and M. McDonald, "Study of pedestrians' gap acceptance behavior when they jaywalk outside crossing facilities," in , 2010.
[19] Z. Ren, X. Jiang, and W. Wang, "Analysis of the influence of pedestrians' eye contact on drivers' comfort boundary during the crossing conflict," Procedia Engineering, vol. 137, pp. 399–406, 2016.
[20] R. R. Oudejans, C. F. Michaels, B. van Dort, and E. J. P. Frissen, "To cross or not to cross: The effect of locomotion on street-crossing behavior," Ecological Psychology, vol. 8, no. 3, pp. 259–267, 1996.
[21] E. Du, K. Yang, F. Jiang, P. Jiang, R. Tian, M. Luzetski, Y. Chen, R. Sherony, and H. Takahashi, "Pedestrian behavior analysis using 110-car naturalistic driving data in USA," in , 2013.
[22] T. Bandyopadhyay, C. Z. Jie, D. Hsu, M. H. A. Jr., D. Rus, and E. Frazzoli, "Intention-aware pedestrian avoidance," in The 13th International Symposium on Experimental Robotics, 2013.
[23] S. Pellegrini, A. Ess, K. Schindler, and L. V. Gool, "You'll never walk alone: Modeling social behavior for multi-target tracking," in , 2009, pp. 261–268.
[24] W. Choi and S. Savarese, "Understanding collective activities of people from videos," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 6, pp. 1242–1257, 2014.
[25] J. Kooij, N. Schneider, F. Flohr, and D. M. Gavrila, "Context-based pedestrian path prediction," in European Conference on Computer Vision (ECCV), 2014, pp. 618–633.
[26] A. T. Schulz and R. Stiefelhagen, "Pedestrian intention recognition using latent-dynamic conditional random fields," in Intelligent Vehicles Symposium (IV), 2015.
[27] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The KITTI dataset," International Journal of Robotics Research (IJRR), 2013.
[28] P. Dollar, C. Wojek, B. Schiele, and P. Perona, "Pedestrian detection: An evaluation of the state of the art," PAMI, vol. 34, 2012.
[29] M. Enzweiler and D. M. Gavrila, "Monocular pedestrian detection: Survey and experiments," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2179–2195, 2009.
[30] SHRP2 naturalistic driving study. [Online]. Available: https://insight.shrp2nds.us/
[31] Virginia Tech Transportation Institute data warehouse. [Online]. Available: http://forums.vtti.vt.edu/index.php?/files/category/2-vtti-data-sets/
[32] Y. Barnard, F. Utesch, N. Nes, R. Eenink, and M. Baumann, "The study design of UDRIVE: the naturalistic driving study across Europe for cars, trucks and scooters," European Transport Research Review, vol. 8, no. 2, 2016.
[33] N. Uchida, M. Kawakoshi, T. Tagawa, and T. Mochida, "An investigation of factors contributing to major crash types in Japan based on naturalistic driving data," IATSS Research, vol. 34, no. 1, pp. 22–30, 2010.
[34] A. Williamson, R. Grzebieta, J. Eusebio, Y. Wu, J. Wall, J. L. Charlton, M. Lenne, J. Haley, B. Barnes, and A. Rakotonirainy, "The Australian naturalistic driving study: From beginnings to launch," in Proceedings of the 2015 Australasian Road Safety Conference, 2015.
[35] O. Friard and M. Gamba, "BORIS: a free, versatile open-source event-logging software for video/audio coding and live observations," Methods in Ecology and Evolution, vol. 7, no. 11, 2016.
[36] I. Kotseruba, A. Rasouli, and J. K. Tsotsos, "Joint attention in autonomous driving (JAAD)," arXiv preprint arXiv:1609.04741.