Research Challenges in Management and Compliance of Policies on the Web
arXiv [cs.CY], Jul
Holger M. Kienle∗, University of Victoria, Victoria, Canada, [email protected]
Hausi A. Müller, University of Victoria, Victoria, Canada, [email protected]
Abstract
In this paper we argue that policies are an increasing concern for organizations that are operating a web site. Examples of policies that are relevant in the domain of the web address issues such as privacy of personal data, accessibility for the disabled, user conduct, e-commerce, and intellectual property. Web site policies, and the overarching concept of web site governance, are cross-cutting concerns that have to be addressed and implemented at different levels (e.g., policy documents, legal statements, business processes, contracts, auditing, and software systems). For web sites, policies are also reflected in the legal statements that the web site posts, and in the behavior and features that the web site offers to its users. Both policies and software tend to evolve independently, but at the same time they both have to be kept in sync. This is a practical challenge for operators of web sites that is poorly addressed right now and is, we believe, a promising avenue for future research. In this paper, we discuss various challenges that policy poses for web sites with an emphasis on privacy and data protection and identify open issues for future research.
Keywords:
Internet, legal factors, compliance control
1. Introduction
“There is hardly a government in the world that does not have some form of policy about the World Wide Web.” – Diffily [11]
In this paper, we want to raise awareness that policy issues for web sites have to be addressed during the site’s whole life cycle, including requirements, design, development and maintenance. Policy compliance and management are parts of web site governance. Web sites are challenging from the perspectives of governance and policy, because they operate in a highly heterogeneous environment (e.g., multiple languages, corporate entities, and jurisdictions) with diverse users. There are increasingly complex sites (web 2.0 sites and sites of e-tailers) that offer functionality that rivals shrink-wrapped consumer products, and site complexity translates to policy complexity. Furthermore, there may be a number of different stakeholders involved that have different goals and objectives for the site. One can distinguish three important groups of stakeholders: the web site operator, the web site’s users, and lawmakers.
Understanding and addressing policy issues and requirements from the beginning has a number of potential benefits for the development of web sites. Policy requirements for the web site’s domain can be elicited during requirements engineering, thus making them a part of the entire software life cycle. This way, issues such as privacy and security that may impact the system’s architecture and design can be addressed early on. Thus, addressing policy issues from the start can prevent costly changes in subsequent development or maintenance activities.
Policies and software have to be kept in sync. A policy can evolve for various reasons (e.g., changing business needs, lobbying by stakeholders, or legal developments) and there needs to be process and tool support to trace policy changes down to the code. Conversely, changes in the code can (unwittingly) implement behavior that contradicts a policy.
Hence, there should be static and dynamic checks to verify policy compliance and to detect policy violations.
It should not be taken lightly by an organization if its web site violates a policy. Generally, policies such as privacy statements and terms of use can be seen as binding contracts between the operator of the site and its users. If the policy touches on users’ rights, a violation may result in a loss of reputation and trust, causing users to abandon the site. For example, studies show that users are often concerned about their privacy [5], and as a result may be outraged about a violation of the site’s privacy policy. If the policy is mandated by law, violations can cause lengthy and costly legal actions, severe penalties, and temporary shut-down of the site. Thus, risk mitigation demands that operators of web sites constantly re-evaluate whether the behavior of their site conforms to the various policies.
The rest of the paper is organized as follows. In Section 2 we start out with a primer on information technology governance in general and web site governance in particular because both topics are closely related to policy management. In Section 3 we give an overview of the diversity of policy issues that web sites have to address. In Section 4 we focus on privacy policies as an example to expose issues that need to be addressed in policy management and compliance for web sites. Privacy policies are an opportune example because organizations are free to define their own policies, but they are also constrained by national laws and consideration of users’ trust. Furthermore, privacy needs to be addressed for internal operation, but it also needs to be communicated to users of the web site.
Based on the discussion of privacy policies, we discuss in Section 5 the complexity of policy management for different kinds of web sites (brochure-ware, e-commerce, and web 2.0), and describe in Section 6 implications and research challenges that organizations have to tackle for policy management and compliance. Section 7 closes with conclusions.
2. IT and Web Site Governance
“Compliance with regulations regarding internal controls, financial reporting, and privacy is now a substantial catalyst for companies to understand and invest in addressing information risk challenges.” – Ernst & Young Global Information Security Survey [12]
Policy management and compliance of software systems are increasingly important activities that need to be addressed by most organizations. One of the trends identified by the Ernst & Young Global Information Security Survey is that “the impact of compliance continues to grow” [12].
Policy management and compliance are typically an integral part of information technology (IT) governance, which is driven by the realization that an organization’s software systems are the hub around which its business activities revolve. Consequently, an organization’s IT capabilities can no longer be treated as a black box by the company’s stakeholders [44]. Instead, various stakeholders across the organization have to coordinate and resolve policy issues. One key challenge for stakeholders is “balancing compliance at all costs, compared to compliance at an affordable cost” [30].
Web sites are part of an organization’s IT infrastructure and as such have to be incorporated into IT governance. Furthermore, web sites are an organization’s interface to the public. Many organizations and businesses use web sites to post information for, and to interact with, consumers. The term web site governance [11] has been suggested to emphasize the importance of web sites and to stress that web sites pose particular challenges for governance. In fact, policy compliance of a web site (e.g., a banking portal) is as important as that of a (backend) software system (e.g., a financial transactions system). Furthermore, there may be complex interactions between them.
Depending on the nature of an organization, governance can be more lightweight or heavyweight. Diffily believes that web site governance “does not require a huge and unwieldy bureaucracy, just some plainly written guidelines and clear executive oversight” [11, p. 288].
Diffily suggests having a web site management team (WMT) that is in charge of the web site and (1) defines the organization’s web strategy, (2) sets the site’s high-level goals and ensures the achievement of the goals, and (3) monitors overall performance. The WMT is also responsible for ensuring that processes for site management are in place and that these processes address policy issues adequately. In order to enable the WMT to operate effectively, there should be dedicated tool support that assists in the tasks of policy management and compliance checking. These tools should probably operate at different levels of granularity, including high-level dashboards [24] and lower-level information such as policy goals and ontologies.
3. Policy and Legal Issues of Web Sites
“No longer an information ‘wild, wild, west,’ the Internet increasingly is influenced by legal considerations.” – Baker in [18]
Policy issues for web sites touch on many diverse areas. An organization may have a security policy, privacy policy, corporate identity policy, ethics policy, customer care policy, etc.
Often policy issues interact with laws or regulations. For example, an increasing number of countries have data protection laws that web sites have to adhere to; however, even if a country has no such law it may still make good sense for a web site to address this issue in its privacy policy to increase consumer trust. Another cross-cutting issue is jurisdiction because web sites are accessible by users worldwide. Depending on the nature of the site (e.g., financial or healthcare), domain-specific policy and legal issues may emerge. An extreme example is a gambling web site, because it targets a domain that is heavily regulated by most states and countries.
Examples of typical policy issues for web sites that are influenced by legal considerations are [11] [19] [18] [38]:
criminal damage:
A web site may (inadvertently) cause harm to a site’s user. A site may host a virus (e.g., in a web page’s JavaScript or a downloadable file) that deletes data on the user’s computer. In this case, the site operator may be liable for negligence. To give another example, courts have applied trespass law to prohibit frequent, unwanted spidering of a web site if meaningful harm to the site’s computer system (e.g., resource drain) could be shown [34].
freedom of expression:
Certain kinds of content on web sites raise the issue of free speech and its restrictions (e.g., product defamation, slander and libel, company secrets, and hateful ideas). It is often difficult to decide whether free speech applies. For example, is unwanted but harmless email protected by free speech, or could such email be prohibited based on trespass law? Since the freedom of expression varies significantly by jurisdiction, the decision where to operate a site may be significant.
intellectual property:
The content, design, functionality, and domain name of a web site may be protected by intellectual property (i.e., copyright, trademarks, and patents). On the one hand, a web site has to protect its own intellectual property; on the other hand, the site has to ensure that it does not violate the intellectual property rights of third parties. In the context of copyright (and freedom of expression) it is significant that posting material to a web site is considered an act of publishing.
electronic commerce:
E-commerce refers to the trading of goods and services over the Internet. In response to the large volume of commerce on the Web, legislation has introduced rules to govern online transactions. As a result, agreements made via the Internet and electronic signatures are legally binding, consumer rights protection applies to goods purchased on the web, purchases on the web are taxable, etc.
accessibility for the disabled:
A web site should be accessible to users with disabilities. The U.S. has amended the Rehabilitation Act to require Federal agencies to make their web sites accessible (Section 508). As a result, persons with disabilities may file administrative complaints or bring civil actions in Federal court against agencies that fail to comply with the requirements of Section 508. There are commercial web sites that have chosen to address accessibility. For example, General Electric posts an explicit statement on its site that describes its current accessibility features.
The above issues are meant to illustrate the broad range of policies that interact with legal requirements. There are certainly other policy considerations that a web site needs to address. Another important topic is privacy and data protection, which is discussed in detail in Section 4.
4. Data Protection and Privacy Policies
“Traditionally, policy specification has not been an explicit part of the software development process. This isolation of policy specification from software development often results in policies that are not in compliance with system requirements and/or organizational security and privacy policies, leaving the system vulnerable to data breaches.” – He et al. [16]
Privacy can be defined as “the ability of an individual or group to seclude information about themselves and thereby reveal themselves selectively” [45]. In the context of information technology, an important concern is data protection of digitally stored information (e.g., health, criminal, financial, genetic, ethnic, and location information). Importantly, more low-level data should also be considered private, such as stated user preferences (e.g., language setting) and interactions with the system (e.g., executed search engine queries, and visited web pages and sites).
In response to growing privacy concerns, many countries have passed laws that govern the treatment of sensitive data. Like any software system, web sites have to meet legal obligations. The European Union enacted Directive 95/46/EC in 1995 [40], which requires organizations that collect personal data to register with the government and to take precautions against data misuse. Furthermore, organizations have to inform individuals about the reasons for collecting information about them, provide access to the data, and correct wrong data. This directive has to be implemented by states that are members of the EU (see http://ec.europa.eu/justice_home/fsj/privacy/law/implementation_en.htm). For example, the UK has the Data Protection Act (1998), and Germany has the Bundesdatenschutzgesetz (2001). Examples of non-EU countries with data protection laws are Canada (2004), Australia (2001), Japan (2005), and Switzerland (1993). In the U.S. there is no single act or law that addresses privacy. Instead, there are different laws that touch on privacy and data protection. For example, the Health Insurance Portability and Accountability Act (HIPAA) establishes regulations for the use and disclosure of health information, the Gramm-Leach-Bliley Act regulates the privacy of customers of financial institutions, and the Children’s Online Privacy Protection Act of 1998 (COPPA) addresses privacy issues for children under 13.
What constitutes private data is not easy to decide. Knight and Fitzsimons give the following example: “Some people regard receiving a flood of ‘junk mail’ as an invasion of privacy, others regard it as just a part of modern living, or even a chance to be informed” [21]. An important, but unresolved, question in law is who owns the (private) data of users. According to Taipale, “a fundamental issue, as yet not fully resolved to everyone’s satisfaction in the context of emerging technologies, is whether data about an individual (whether disclosed by that individual or otherwise obtained) should ‘belong’ to that individual in any kind of sense that would invoke legal mechanisms of ongoing control – i.e., some notion of property – or perhaps even a renewal of ‘expectations’ of privacy for secondary uses – after it [is] shared or otherwise becomes known” [37, p. 154]. For example, if a web site collects information about a user (e.g., search queries), does the user or the collecting entity own that data?
There are potential legal concerns whenever a web site collects, processes and stores data that contains information of or about its users. In the following, we briefly give examples of web site features that interact with privacy policies.
logging of data: The log files of a web site may store information about the interactions of a user with the system. Weitzner et al. advocate policy-aware transaction logs that are responsible for “recording information-use events that may be relevant to the assessment of accountability to some set of policies” [43]. There is often a clash between privacy protection on the one hand, and security and auditing concerns on the other. On the one hand, there may be legal requirements that make it necessary that data about users is retained for a certain period of time. On the other hand, logged data may have to be anonymized in order to protect the privacy of users. For privacy protection it is important whether the logs guarantee certain properties such as complete anonymity [13]. (The tension between auditing requirements and privacy for log files becomes apparent in the following pun: “If logs mention private information they are forbidden and if they do not, they are useless” [13].)
profiling and personalization: Many commercial web sites collect data of user interactions to personalize pages [22] [42]. For example, Amazon generates recommendations of books that are personalized by the user’s own history of viewing and buying books as well as the buying habits of other “similar” users. Google is working on personalized searches that take a person’s search history into account [36]; the user’s search history is kept in the Google Web History. For such services, web sites have to ensure that the site’s privacy policy is communicated to users, that the site does not expose private data to other users, and that users have a certain control over the collected information (e.g., correction of wrong information).
distribution and transmission of data: Users of a web site will potentially access it from all over the world. However, certain countries have policies that govern the transmission of personal data across borders. The EU privacy directive mentioned above prohibits the exporting of personal data to a country that does not provide an adequate level of privacy protection. So far the EU has recognized only a few countries, among them Canada and Switzerland. For countries not recognized by the EU, the user would be required to explicitly give consent to the web site operator.
communication of policies to the user:
Last but not least, web sites have to post their privacy policy. These policies are important for users to understand how their data will be treated by the site operator. Antón et al. say, “often, the only guide users have as to how an institution will use, disclose, and store sensitive information is via its online privacy policies. Thus, users should expect these privacy policy documents to accurately describe an institution’s privacy practices in a clear and easy-to-understand manner” [4]. The following is an excerpt from a legal notice on the web site of a large American corporation in 1998:
“Any visitor to the Valero web site who provides information to Valero agrees that Valero has unlimited rights to such information as provided, and that Valero may use such information in any way Valero chooses. Such information as provided by the visitor shall be non-confidential.”
Even though an organization is free to fashion its own policy, in practice there are many legal and ethical constraints that need to be taken into account. For example, web sites of healthcare providers have to reflect the legal requirements of HIPAA in their privacy policies [4]. It seems rather unlikely that the above legal notice would be considered adequate nowadays.
There are probably many more policy issues that need to be addressed besides the examples given above. However, these examples are sufficient to expose the complexity of managing privacy policies for web sites.
Furthermore, the interactions of policies and web sites increase with the complexity of the site. This issue is discussed in more detail in Section 5.
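To make the logging example above more concrete, the following is a minimal sketch of how identifying fields in a log record could be replaced before the record is retained for auditing. It is an illustration only and not a method proposed in the cited work: the field names (`user_id`, `ip`), the key, and the helper names are hypothetical, and a keyed hash yields pseudonyms rather than the complete anonymity discussed in [13].

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside the log pipeline.
SECRET_KEY = b"example-key-rotate-regularly"

# Hypothetical names of the identifying fields in a web server log record.
SENSITIVE_FIELDS = ("user_id", "ip")


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed hash of value; the same value and key give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated only to keep log lines short


def scrub_record(record: dict) -> dict:
    """Return a copy of a log record with identifying fields pseudonymized."""
    scrubbed = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in scrubbed:
            scrubbed[field] = pseudonymize(str(scrubbed[field]))
    return scrubbed


if __name__ == "__main__":
    entry = {"user_id": "alice", "ip": "192.0.2.7", "path": "/search?q=shoes"}
    print(scrub_record(entry))
```

Because the same user always maps to the same token, records can still be correlated for auditing, which is exactly the tension between auditing and privacy noted above. Fields such as the request path are left untouched in this sketch, although search queries may themselves be private and would need separate treatment.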
5. Policy Complexity of Web Sites
Footnote: http://ec.europa.eu/justice_home/fsj/privacy/thridcountries/index_en.htm
Policy issues are typically more pronounced for more complex web sites. For discussion, we define three kinds of sites with increasing sophistication in terms of functionality and user interactions:
brochure-ware: These sites provide information that users can browse (e.g., to obtain information about products and services that they can obtain off-line) [39]. Users do not have to log on to the site and the site is static in the sense that it looks the same for all users.
e-commerce:
These sites are run by companies that sell products online. They may be pure online retailers (e-tailers) or have a clicks-and-bricks hybrid business model [32]. To place orders, users have to create an account.
web 2.0:
These sites are characterized by sophisticated functionality that often rivals shrink-wrapped software products (e.g., Google GMail (http://mail.google.com) and Adobe Photoshop Express). These sites typically offer a participatory and interactive user experience [10], which is typically realized with technologies such as AJAX, mashups, blogs, Wikis, and RSS [27].
The above classification is an idealization because concrete web sites typically have features that blur into other groups. For example, a brochure-ware site may have a form or questionnaire that users can fill out to provide feedback to the site operator, and e-commerce sites often have some kind of personalization (e.g., Amazon’s wishlists) or user-generated content (e.g., book reviews of users).
The simplicity of brochure-ware sites makes policy management and compliance comparatively easy to accomplish. Since all content is supplied and maintained by sources within the organization, processes can be defined that mandate compliance checks by a central authority. For example, Diffily suggests that “all ideas for new information or applications must be approved by a WMT before work commences. . . . In order to get a development approved, a proposal must be submitted and a collective decision made about whether it is suitable for the site (perhaps based on advice from the editor).” [11, p. 294]. As a result, tracking and assessment of content-related issues such as intellectual property, accessibility, and freedom of expression can be implemented in a straightforward manner. Since a brochure-ware site can be realized with static HTML, many security threats are mitigated or do not exist (e.g., script-based and SQL injection attacks). Furthermore, since brochure-ware sites do not ask for personal information from users, the only privacy issues concern tracking the movements of users on the site.
Policy issues of e-commerce sites are significantly more difficult to handle than those for brochure-ware sites. The web site itself is more complex because it needs functionality to manage user accounts and purchasing. As a consequence, e-commerce sites have to manage personal data of users such as address and billing information. Furthermore, they store (permanently or temporarily) sensitive information about users such as purchased items and defaults in payments. As a consequence, the site has to adhere to data protection laws of various jurisdictions. For example, organizations may be required to report stolen data to affected people. Following California in 2002, many states in the U.S. have adopted security breach notification laws. These laws require an organization to notify all residents within the law’s state if it believes that personal, non-public information of residents has been stolen. The EU is considering a similar law (amending EU Directives 2002/22/EC and 2002/58/EC) that would force telecommunications companies to tell customers when personal data security has been breached. Since e-commerce sites do business with users, they have to adhere to consumer protection laws, handle taxation, and deal with fraudulent transactions. Even though complexity increases, it appears that many policy aspects can still be managed by a central authority.
Web 2.0 sites add further complexity for policy management and compliance due to several characteristics. These sites typically have user-generated content where users are conducers, that is, they “both consume creative works and simultaneously add creative content to those same works” [33]. As a result, content is no longer created exclusively within an organization and as such policy compliance of content is difficult to enforce. Specifically, web sites have to address liability issues for users’ misuse of intellectual property (e.g., copyright infringements), misinformation, slanderous comments or other harmful data.
Web 2.0 sites are realized by using web browsers as thin clients where most of the application logic and data storage is performed by distant Internet servers. This approach is discussed as cloud computing [15] and Platform-as-a-Service (PaaS) [23]. Hayes points out that cloud computing “raises awkward questions about control and ownership: If you move to a competing service provider, can you take your data with you? Could you lose access to your documents if you fail to pay your bill? Do you have the power to expunge documents that are no longer wanted?” [15].
From a privacy perspective, the service provider has to establish policies on how to manage and protect personal information entrusted by its users. In contrast to e-commerce sites, web 2.0 sites may collect personal data on a much larger scale. This is especially the case for consolidated information service providers such as Google that accumulate huge amounts of personal data depending on the extent of services used by a particular user (e.g., various kinds of searches, emails, appointments, contacts, documents, news alerts, and financials) [9]. Web sites that store personal data also have to decide whether and how they will defend their users’ privacy rights. Hayes describes the following scenario: “a government agency presents a subpoena or search warrant to the third party that has possession of your data. If you had retained physical custody, you might still have been compelled to surrender the information, but at least you would have been able to decide for yourself whether or not to contest the order. The third-party service is presumably less likely to go to court on your behalf” [15]. In fact, the Department of Justice in the U.S. ordered Google in 2005 to hand over two months of search queries. Google refused, primarily on the grounds of trade secrets but also because of privacy concerns, and a court decided in Google’s favor.
From the users’ perspective, published policies of web sites are important because they spell out the users’ rights and obligations. Examples of such statements are a site’s privacy policy and terms of use. With increasing complexity of the web site, these policies also increase in complexity. While brochure-ware sites can be satisfied by covering only general issues (e.g., license to use, disclaimer, linking, and intellectual property), e-commerce sites also have to address issues such as order acceptance, pricing information, exporting of goods, and disclaimers for special goods such as medicines. Web 2.0 sites are also more complex than brochure-ware because they have to address complex user interactions with the site involving personal data.
To summarize, web sites have to define policies that cover a diverse range of issues. Increasing complexity in a web site typically translates into increasing complexity of policy management. Also, a subset of a web site’s policies has to be communicated to users in the form of statements posted on the site.
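As one small illustration of how a legal obligation from this section can surface in application logic, the sketch below groups the accounts affected by a data breach by U.S. state, so that residents of states with a security breach notification law can be notified. It is a simplified assumption-laden example: the record layout (`id`, `state`) and the function name are hypothetical, and the set of states contains only California, the one state named in the text; a real list would need legal review.

```python
from collections import defaultdict

# Illustrative only: California is the one state named in the text above.
STATES_WITH_BREACH_LAWS = frozenset({"CA"})


def residents_to_notify(breached_accounts, states=STATES_WITH_BREACH_LAWS):
    """Group affected account ids by state, keeping only states with a law."""
    by_state = defaultdict(list)
    for account in breached_accounts:
        if account["state"] in states:
            by_state[account["state"]].append(account["id"])
    return dict(by_state)


if __name__ == "__main__":
    affected = [
        {"id": 1, "state": "CA"},
        {"id": 2, "state": "XX"},  # placeholder for a state without such a law
        {"id": 3, "state": "CA"},
    ]
    print(residents_to_notify(affected))
```

Keeping the set of jurisdictions as data rather than code is a deliberate choice in this sketch: when a further state adopts such a law, the policy change is a data update that can be traced and reviewed separately from the application logic.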
6. Implications and Research Challenges
Footnote: http://googleblog.blogspot.com/search/label/privacy
The implementation and management of policy issues are a major challenge for an organization. Many of the policies have to be reflected in the organization’s web site and kept in sync with internal policies and the web site’s functionality. Also, policies can be complex to define and maintain. For example, Antón et al. have analyzed the Privacy Rule of HIPAA, formalizing its content as restricted natural language statements [6]. They found 46 rights and 80 obligations that need to be addressed. Presumably, most of these rights and obligations apply to the web site’s content and application logic. In another study, Antón et al. show that privacy policies of web sites have indeed changed significantly after HIPAA came into effect [4]. Antón et al. also have evaluated web site policies in the financial sector; they concluded that “compliance with the existing legislation and standards is, at best, questionable” [3].
Policies have to be managed at different levels of abstraction and in different representations. Figure 1 groups policies into three tiers: high-level policies that apply to an organization as a whole, policies that are specific to the organization’s web site(s), and the implementation of these policies. Antón et al. introduce a framework for online privacy policies that is also based on three tiers; they distinguish the top tier (principles of privacy practices), middle tier (security policies), and bottom tier (enforcement in the physical layer) [2].
At the top tier in Figure 1 there are documents that define policies at a high level of abstraction and in natural language. Examples of such policies are legal texts (e.g., Acts enacted by the U.S. Congress), standards (e.g., W3C accessibility guidelines), and internal policies of the organization. The latter can be in natural language or a dedicated policy language.
Such high-level policies have to be substantiated into web site policies that are posted on the site. These policies are expressed in natural language so that users can read them, but also in formal languages for automated processing. Transforming a legal text into a formal language such as a goal model can be (partially) automated (e.g., [20]). An example of a dedicated formal language is the W3C’s Platform for Privacy Preferences (P3P), which is an XML-based format that enables web sites to encode their data collection and data-use practices. Finally, policies have to be encoded in the web site’s content and functionality. For example, an operator that decides not to collect personal information from children (in response to COPPA) may decide to post notices and to implement safeguards in their web site that communicate and enforce this particular policy. More generally, policies concerning data protection of personal information that are communicated to users have to be faithfully reflected in the site’s behavior. This means, for instance, that the part of a database that stores private information has to be safeguarded with access control mechanisms (e.g., Oracle supports row-level security with its virtual private database feature [28]). Similarly, the application logic needs access control mechanisms (e.g., implemented with OASIS’s eXtensible Access Control Markup Language (XACML) or IBM’s Enterprise Privacy Authorization Language (EPAL) [1]). Furthermore, logging and auditing data that contains private information must not violate the privacy policy, which can be ensured, for example, by anonymizing the data.
Footnote: For example, Google’s organizational goals with respect to privacy are outlined in its Code of Conduct available at http://investor.google.com/conduct.html: “As we develop great products that serve our users’ needs, always remember that we are asking users to trust us with their personal information. Preserving that trust requires that each of us respect and protect the privacy of that information.”
Footnote: For example, Google has a general Privacy Policy (about 1900 words) that is augmented with service-specific privacy policies.
Footnote: For AdWords, Google claims that its “conversion tracking server complies with P3P privacy policies” (http://adwords.google.com/support/bin/answer.py?hl=en&answer=6358).
[Figure 1 shows three tiers. An Organization’s High-Level Policies: Legal Texts (natural language), Standards (natural language), Org.-Internal Policy (formal/natural language). An Organization’s Web Site Policies: Privacy Policy (natural language), Privacy Policy (formal language), P3P Specification. Low-Level Implementation of Web Site Policies: Access Control (Oracle VPD and XACML), Logging/Auditing. The tiers are connected by vertical transformations and synchronization.]
Figure 1. Policies at different levels of abstractions and transformations of these policies
If policy issues are considered for web sites, they affect their whole life cycle and other cross-cutting concerns:
requirements analysis and policy representation:
Policies have to be treated as first-class entities in requirements engineering. Furthermore, they have to be formalized so that it is possible to analyze and reason about them. This may be accomplished with general formalisms such as goal models [41] or domain-specific representations such as P3P and EPAL. Policies have characteristics of both quality attributes (e.g., security) and functional requirements (e.g., data retention for a certain amount of time). Also, since policies are often fuzzy and ambiguous, it must be possible to model degrees of uncertainty.

Currently, approaches such as P3P are too limited to describe fine-grained privacy rules [2]. It is an important research challenge to come up with a suitable representation, which may be customized from a general approach or developed bottom-up specifically for the web.

policies and design models:
Policies are likely to affect the design of web sites. A trivial example would be a requirement that the privacy policy is accessible (via a hyperlink) from each page of the site. A more complex example is the impact of users’ privacy settings on the data visible to other users. Policy requirements should be reflected as constraints and other annotations in the web site’s application, navigation, and presentation models. For example, depending on privacy settings, the navigation model may enable or disable navigation paths to particular sensitive content, and the presentation model may replace one widget with another (e.g., a name alias instead of the user’s real name with his or her picture). In order to represent policies during design, web-design methodologies such as WebML [8] or OOHDM [35] have to be suitably augmented.

testing for compliance:
Web sites have to be tested for policy compliance before they are deployed. In fact, there are many examples of privacy violations caused by wrong implementations of privacy features. For example, in Facebook supposedly private annotations were made visible to all users [14]. Testing for policy compliance is complicated by the fact that web sites have to support diverse user and environmental profiles, resulting in highly varied behaviors at the client side [29].

Generally, testing of user interactions with web sites exhibits problems similar to those of GUI testing [25]. It is an open research problem how to effectively specify test cases for policy compliance and how to (automatically) evolve these tests as the web site evolves.

monitoring for compliance:
In addition to off-line testing, web sites should be monitored at run-time for compliance. A large corporate presence may be composed of multiple web sites that are administered by different entities in different countries. As a result, such sites may resemble systems of systems or even ultra-large scale systems in some respects [31]. For such systems, “policies will have to reconcile diverse and competing objectives while providing complete and unambiguous semantic content sufficiently to govern distributed system development, evolution, and operation” [31, p. 119]. For web sites that cannot be centrally controlled and tested, run-time monitoring becomes increasingly important. A simple example would be a crawler that periodically checks whether accessibility guidelines, linking policies, and content conform to company policies and legal requirements.

Ideas from autonomic computing could be adapted to monitor web sites in terms of their data, operations, and communication [26]. Furthermore, if a failure occurs in the web site (e.g., because low-level access control has detected a privacy policy violation), strategies for reconfiguring and self-healing could be employed (e.g., to gracefully degrade the site’s functionality, or to achieve the desired functionality without violating a particular access control path).

policy transformations:
Policies have to be represented at different levels of abstraction. As a result, it is necessary to implement vertical transformations of policies between the three tiers identified in Figure 1. For example, general privacy policies of the organization that are stated in natural language in the top tier have to be reflected in the middle tier (e.g., the web site’s privacy policy, its privacy seal, and P3P specification) as well as in the low-level implementation (e.g., database and client/server-side code).

There are also horizontal transformations (e.g., translation of a policy encoded in a formal-language representation into a human-readable web policy in the middle tier). It is an open question how to transform policy requirements into policy constraints. Furthermore, policy constraints should be automatically translated into testing code and run-time checks.

policy synchronization and traceability:
Because policies are encoded in different forms at different levels of abstraction (cf. Figure 1), they have to be re-synchronized if one policy changes. Synchronization is facilitated if policies are formalized in machine-readable representations and if they can be automatically transformed. However, the handling of natural-language policies may not be fully automatable. Furthermore, seemingly unrelated changes in the behavior and operations of the web site may contradict certain policy rules as an undesirable side effect. For example, a low-level data replication feature (that is transparent to upper layers) may be in violation of privacy law if the replication involves data transmissions that cross borders (e.g., to a data center that is located in another country).

To effectively synchronize policies and to handle errors, policies have to be traceable. For example, it should be possible to trace a high-level natural-language policy to its formal representation (e.g., part of a goal model) and further down to the code. Conversely, if a fault occurs because a low-level policy check is violated, traceability should help maintainers determine which higher-level policies are responsible for this particular check.

policy negotiation:
If a web site requests a service of another entity (e.g., in a service-oriented setting), there needs to be some form of policy matching and negotiation. For example, the requested service has to guarantee (e.g., in a service level agreement) that personal data is processed in accordance with the requesting web site’s policies. Policy negotiation may be fully automated (e.g., between communicating web services) or human-involved. In the latter case, dedicated tool support is needed that facilitates negotiation. SPARCLE is an example of such a privacy management tool [7]. The W3C has developed A P3P Preference Exchange Language (APPEL), which allows users to describe privacy policies that are acceptable to them and to check whether a given web site accommodates them.

Since different users and services may have different policy requirements, a web site has to be prepared to accommodate different policy settings. In this scenario, different privacy preferences of users may result in different behavior of the web site.

Another important concern besides the above challenges is how to make existing web sites more policy-aware. For simpler web sites, it seems feasible to reimplement them, but this may not be an option for complex e-commerce and web 2.0 sites. When migrating a web site, policy-aware features can be injected incrementally. For example, the code of the web site can be reverse engineered to extract stakeholder goal models and to establish traceability between the existing code and the goal model [46]. The goal model can then be augmented with policy-aware requirements and the code evolved to reflect the changes in the goal model.
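To make the testing-for-compliance challenge above more concrete, the following is a minimal sketch of a compliance test for the kind of defect described there, in which private annotations become visible to other users. All names (`Note`, `visible_notes`, `leaky_visible_notes`, `private_notes_hidden`) are hypothetical and chosen for illustration only; the sketch assumes that the site exposes a function that computes which notes a given viewer may see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """A user annotation; the fields are assumptions made for this sketch."""
    owner: str
    text: str
    private: bool


def visible_notes(notes, viewer):
    """Intended behavior: public notes plus the viewer's own private notes."""
    return [n for n in notes if not n.private or n.owner == viewer]


def leaky_visible_notes(notes, viewer):
    """Faulty behavior: every note is shown to every viewer."""
    return list(notes)


def private_notes_hidden(visibility_fn, notes, users):
    """Compliance check: no viewer sees another user's private note."""
    for viewer in users:
        for note in visibility_fn(notes, viewer):
            if note.private and note.owner != viewer:
                return False
    return True


NOTES = [Note("alice", "public post", False), Note("alice", "diary entry", True)]
USERS = ["alice", "bob"]

print(private_notes_hidden(visible_notes, NOTES, USERS))        # True
print(private_notes_hidden(leaky_visible_notes, NOTES, USERS))  # False
```

Passing the visibility function as a parameter keeps the check independent of one implementation, so the same test could be rerun unchanged whenever the site's code evolves.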
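The crawler mentioned under monitoring for compliance, combined with the design requirement that the privacy policy be reachable from every page, could be sketched as follows. This is an illustration under stated assumptions, not a complete monitor: the pages are held in an in-memory dictionary instead of being fetched, and the policy location `/privacy` as well as the names `LinkCollector` and `pages_missing_policy_link` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href targets of all <a> elements in one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pages_missing_policy_link(pages, policy_path="/privacy"):
    """Return the sorted URLs of pages that do not link to policy_path.

    pages maps a page URL to its HTML source; policy_path is an assumed
    location of the posted privacy policy.
    """
    missing = []
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if policy_path not in collector.hrefs:
            missing.append(url)
    return sorted(missing)


# Hypothetical site snapshot; a real monitor would fetch these pages.
SITE = {
    "/index.html": '<a href="/privacy">Privacy</a> <a href="/shop">Shop</a>',
    "/shop": '<a href="/index.html">Home</a>',
}

print(pages_missing_policy_link(SITE))  # ['/shop']
```

A periodic job could run such a check over all sites of a corporate presence and report the offending pages to the responsible operator.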
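The cross-border replication example under policy synchronization and traceability suggests a low-level check that carries a trace link back to the higher-level policy it implements. The sketch below assumes a simple model in which each constraint names its originating policy and a set of permitted destination countries; the classes, the identifier `ORG-PRIVACY-7`, and the country codes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyConstraint:
    """A low-level check that records which higher-level policy it implements."""
    check_id: str
    derived_from: str            # identifier of the natural-language policy
    allowed_countries: frozenset


@dataclass(frozen=True)
class Transfer:
    """One replication step for a data set that contains personal data."""
    dataset: str
    source_country: str
    target_country: str


def violations(constraint, transfers):
    """Return (transfer, originating policy) pairs for disallowed transfers."""
    return [
        (t, constraint.derived_from)
        for t in transfers
        if t.target_country not in constraint.allowed_countries
    ]


# Hypothetical constraint derived from a hypothetical organizational policy.
RESIDENCY = PolicyConstraint(
    check_id="replication-residency",
    derived_from="ORG-PRIVACY-7",
    allowed_countries=frozenset({"DE", "FR"}),
)
TRANSFERS = [
    Transfer("customers", "DE", "FR"),
    Transfer("customers", "DE", "US"),
]

for transfer, policy in violations(RESIDENCY, TRANSFERS):
    print(transfer.dataset, transfer.target_country, policy)
```

Because each reported violation includes the `derived_from` identifier, a maintainer who sees the failed check can follow it upward to the policy that motivated it.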
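The policy matching described under policy negotiation can be illustrated with a deliberately simplified model. The sketch does not use P3P or APPEL syntax; it assumes that both the site's declared practices and the user's preferences are expressed as a mapping from a data category to a set of purposes, and the function name and the example categories are hypothetical.

```python
def unacceptable_practices(site_practices, user_preferences):
    """Return the declared practices that the user has not accepted.

    Both arguments map a data category (e.g. "email") to a set of purposes.
    An empty result means that the site's practices match the preferences.
    """
    conflicts = {}
    for category, purposes in site_practices.items():
        accepted = user_preferences.get(category, set())
        rejected = purposes - accepted
        if rejected:
            conflicts[category] = rejected
    return conflicts


# Hypothetical declarations for one site and one user.
SITE_PRACTICES = {
    "email": {"order-confirmation", "marketing"},
    "address": {"delivery"},
}
USER_PREFERENCES = {
    "email": {"order-confirmation"},
    "address": {"delivery"},
}

print(unacceptable_practices(SITE_PRACTICES, USER_PREFERENCES))
# {'email': {'marketing'}}
```

A non-empty result could either end the interaction or trigger a negotiation step, for instance by offering the user a variant of the service that omits the rejected purpose.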
7. Conclusions

“Security is an emergent property of a system, not a feature. ... Because security is not a feature, it can’t be bolted on after other software features are codified, nor can it be patched in after attacks have occurred in the field.”
– Hope et al. [17]
In the above quote, Hope et al. stress that security is not a set of immovable features [17]. The same is true for policy compliance, which is a moving target because of various forces that an organization is confronted with. These forces can be internal ones, such as changes in strategy that affect policy, as well as external ones, such as new regulations. Given that policies will evolve, there have to be mechanisms in place to enable a policy-aware evolution of the site. Policy-aware evolution has to be addressed in the whole life cycle of the web site, starting with requirements.

In this paper, we have discussed policy management and compliance for web sites, giving examples of concrete policy and legal issues. We have then explored one particular policy issue in more detail, namely privacy and data protection. This issue exposes the complexity of policy management: web sites have to address it in logging of data; profiling and personalization; distribution and transmission of data; and in communicating their privacy policy to the users of the site. Furthermore, policy issues become more pronounced with increasing complexity of the site.

We believe that the policy issues that we have identified show that web sites can no longer be developed in a vacuum without consideration of policies and legal constraints. Treating policy requirements as first-class entities is justified by the potentially severe adverse consequences of ignoring them. This paper has exposed a number of policy challenges, which can serve as a starting point in formulating a research agenda to advance the current state of policy management and compliance for web sites.
Acknowledgments
Thanks to Crina Vasiliu for proofreading and commenting on an earlier draft of this paper.

This work has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Consortium for Software Engineering (CSER), and the Centre for Advanced Studies (CAS), IBM Canada Ltd.
References

[1] A. H. Anderson. A comparison of two privacy policy languages: EPAL and XACML, pages 53–60, Nov. 2006.
[2] A. I. Antón, E. Bertino, N. Li, and T. Yu. A roadmap for comprehensive online privacy policy management. Communications of the ACM, 50(7):109–116, July 2007.
[3] A. I. Antón, J. B. Earp, Q. He, W. Stufflebeam, D. Bolchini, and C. Jensen. Financial privacy policies and the need for standardization. IEEE Security & Privacy, 2(2):36–45, Mar./Apr. 2004.
[4] A. I. Antón, J. B. Earp, M. W. Vail, N. Jain, C. M. Gheen, and J. M. Frink. HIPAA’s effect on web site privacy policies. IEEE Security & Privacy, 5(1):45–52, Jan./Feb. 2007.
[5] B. Berendt, O. Günther, and S. Spiekermann. Privacy in e-commerce: Stated preferences vs. actual behavior. Communications of the ACM, 48(4):101–106, Apr. 2005.
[6] T. D. Breaux, M. W. Vail, and A. I. Antón. Towards regulatory compliance: Extracting rights and obligations to align requirements with regulations, pages 49–58, Sept. 2006.
[7] C. Brodie, C.-M. Karat, J. Karat, and J. Feng. Usable security and privacy: A case study of developing privacy management tools. ACM Symposium on Usable Privacy and Security 2005 (SOUPS’05), pages 35–43, July 2005.
[8] S. Ceri, P. Fraternali, and M. Matera. WebML application frameworks: a conceptual tool for enhancing design reuse. WWW10 Workshop Web Engineering, May 2001.
[9] G. Conti. Googling considered harmful, pages 67–76, Sept. 2006.
[10] G. Cormode and B. Krishnamurthy. Key differences between web 1.0 and web 2.0. First Monday, 13(6), June 2008.
[11] S. Diffily. The Website Manager’s Handbook. Lulu.com, 2006.
[12] Ernst & Young. Achieving success in a globalized world: Is your way secure, 2006.
[13] S. Etalle, F. Massacci, and A. Yautsiukhin. The meaning of logs. In C. Lambrinoudakis, G. Pernul, and A. M. Tjoa, editors, TrustBus 2007, volume 4657 of Lecture Notes in Computer Science, pages 145–154. Springer-Verlag, 2007.
[14] D. Goodin. Facebook bug dishes out notes designated private. The Register, Oct. 2007.
[15] B. Hayes. Cloud computing. Communications of the ACM, 51(7):9–11, July 2008.
[16] Q. He, P. Otto, A. I. Antón, and L. Jones. Ensuring compliance between policies, requirements and software design: A case study, Apr. 2006.
[17] P. Hope, G. McGraw, and A. I. Antón. Misuse and abuse cases: Getting past the positive. IEEE Security & Privacy, 2(3):90–92, May/June 2004.
[18] D. Isenberg. GigaLaw Guide to Internet Law. Random House, Oct. 2002.
[19] H. M. Kienle, D. German, S. Tilley, and H. A. Müller. Managing legal risks associated with web content. International Journal of Business Information Systems (IJBIS), 3(1):86–106, Dec. 2008.
[20] N. Kiyavitskaya, N. Zeni, T. D. Breaux, A. I. Antón, J. R. Cordy, L. Mich, and J. Mylopoulos. Automating the extraction of rights and obligations for regulatory compliance, Oct. 2008. http://research.cs.queensu.ca/~cordy/Papers/KZBACMM_ER08_GaiusT.pdf
[21] P. Knight and J. Fitzsimons. The Legal Environment of Computing. Addison-Wesley, 1990.
[22] A. Kobsa. Privacy-enhanced personalization. Communications of the ACM, 50(8):24–33, Aug. 2007.
[23] G. Lawton. Developing software online with platform-as-a-service technology. IEEE Computer, 41(6):13–15, June 2008.
[24] A. Marcus. Dashboards in your future. interactions, 13(1), Jan.+Feb. 2006.
[25] A. M. Memon, A. Nagarajan, and Q. Xie. Automating regression testing for evolving GUI software. Journal of Software Maintenance and Evolution: Research and Practice, 17(1):27–64, Jan./Feb. 2005.
[26] H. Müller. Bits of history, challenges for the future and autonomic computing technology, pages 9–18, Oct. 2006.
[27] S. Murugesan. Understanding Web 2.0. IT Pro, 9(4):34–41, July/Aug. 2007.
[28] A. Nanda. Keeping information private with VPD. Oracle Magazine, Mar./Apr. 2004.
[29] T. Parveen, S. Tilley, and G. Gonzalez. On the need for teaching web application testing. IEEE 9th International Symposium on Web Site Evolution (WSE’07), pages 51–55, Sept. 2007.
[30] PC-Welt. Canadian CIOs to focus on IT governance, Dec. 2005.
[31] W. Pollak, editor. Ultra-Large-Scale Systems: The Software Challenge of the Future. SEI, July 2006.
[32] P. Prasarnphanich and M. L. Gillenson. The hybrid clicks and bricks business model. Communications of the ACM, 46(12ve):178–185, Dec. 2003.
[33] E. Reuveni. Authorship in the age of the conducer. SSRN, 2008. http://ssrn.com/abstract=1113491
[34] P. Samuelson. Unsolicited communications as trespass? Communications of the ACM, 46(10):15–20, Oct. 2003.
[35] D. Schwabe and G. Rossi. An object oriented approach to web-based applications design. Theory and Practice of Object Systems, 4(4):207–225, Oct. 1998.
[36] D. Sullivan. Google ramps up personalized search. Search Engine Land, Feb. 2007. http://searchengineland.com/070202-224617.php
[37] K. A. Taipale. Technology, security and privacy: The fear of Frankenstein, the mythology of privacy and the lessons of King Ludd. Yale Journal of Law and Technology, 7(Fall):123–221, Dec. 2004.
[38] G. S. Takach. Computer Law. Irwin Law, second edition, 2003.
[39] S. Tilley and S. Huang. Evaluating the reverse engineering capabilities of Web tools for understanding site content and structure: A case study, pages 514–523, May 2001.
[40] European Union. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, 1995. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31995L0046:EN:HTML
[41] A. van Lamsweerde. Goal-oriented requirements engineering: A guided tour, pages 249–62, Aug. 2001.
[42] E. Volokh. Personalization and privacy: Does personalization jeopardize privacy? If so, what should the law do about it? Communications of the ACM, 43(8):84–88, Aug. 2000.
[43] D. J. Weitzner, H. Abelson, T. Berners-Lee, J. Feigenbaum, J. Hendler, and G. J. Sussman. Information accountability. Communications of the ACM, 51(6):82–87, June 2008.
[44] Wikipedia. Information technology governance. http://en.wikipedia.org/wiki/Information_technology_governance
[45] Wikipedia. Privacy. http://en.wikipedia.org/wiki/Privacy
[46] Y. Yu, Y. Wang, J. Mylopoulos, S. Liaskos, A. Lapouchnian, and J. C. S. do Prado Leite. Reverse engineering goal models from legacy code, pages 363–372, 2005.
This work is licensed under a Creative Commons Attribution-Noncommercial-ShareAlike 3.0 United States License. The license is available here: http://creativecommons.org/licenses/by-nc-sa/3.0/us/