Abstract

The increasing availability of personal data of a sequential nature, such as time-stamped transaction or location data, enables increasingly sophisticated sequential pattern mining techniques. However, privacy is at risk if the identity of individuals can be reconstructed from sequential data. It is therefore important to develop privacy-preserving techniques that support the publication of truly anonymous data without significantly altering the analysis results. In this paper, we apply the privacy-by-design paradigm to design a technological framework that counters the undesirable and unlawful effects of privacy violations on sequence data, without obstructing the knowledge discovery opportunities of data mining technologies. First, we introduce a k-anonymity framework for sequence data by defining the sequence linking attack model and its associated countermeasure, a k-anonymity notion for sequence datasets, which provides formal protection against the attack. Second, we instantiate this framework with a specific method for constructing the k-anonymous version of a sequence dataset. The method preserves the results of sequential pattern mining, together with several basic statistics and other analytical properties of the original data, including the clustering structure. A comprehensive experimental study on realistic datasets of process logs, web logs and GPS tracks shows empirically how, in the proposed method, the protection of privacy meets analytical utility.
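The abstract does not reproduce the formal definitions, so the following is only a minimal sketch of the idea behind a sequence linking attack, under a stated assumption: a subsequence is treated as risky when it is supported by at least one but fewer than k sequences in the dataset, since an attacker who knows that subsequence could narrow an individual down to fewer than k candidates. The function names (`is_subsequence`, `support`, `harmful_subsequences`) and the `max_len` bound are hypothetical and chosen for illustration; they are not the authors' method.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def support(pattern, dataset):
    """Count how many sequences in `dataset` contain `pattern`."""
    return sum(1 for seq in dataset if is_subsequence(pattern, seq))


def harmful_subsequences(dataset, k, max_len=3):
    """Map each subsequence (up to `max_len` items) to its support
    when that support is greater than 0 but smaller than k.

    Assumption: this "0 < support < k" test is an illustrative reading
    of the risk that a k-anonymity notion for sequences guards against.
    """
    candidates = set()
    for seq in dataset:
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                candidates.add(tuple(seq[i] for i in idx))

    harmful = {}
    for pattern in candidates:
        s = support(pattern, dataset)
        if 0 < s < k:
            harmful[pattern] = s
    return harmful


if __name__ == "__main__":
    toy = [("a", "b", "c"), ("a", "b", "c"), ("a", "c", "d")]
    for pattern, s in sorted(harmful_subsequences(toy, k=2).items()):
        print(pattern, s)
```

In the toy dataset, every subsequence containing `d` is supported by a single sequence, so for k = 2 those are the ones flagged. Under this reading, an anonymized version of the dataset would need each of them to have support of either 0 or at least k. The brute-force enumeration is exponential in `max_len` and is meant only to make the notion concrete.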

Notes

  1. In statistics, the problem has been extensively studied in the field of statistical disclosure control.

  2. In the original formulation, the requirement is that the support is ≤ a given threshold.

  3. http://www.think3.com.

  4. http://archive.ics.uci.edu/ml/.

  5. http://www.geopkdd.eu.

  6. http://www.processmining.org.

Author information

Corresponding author

Correspondence to Anna Monreale.

About this article

Cite this article

Monreale, A., Pedreschi, D., Pensa, R.G. et al. Anonymity preserving sequential pattern mining. Artif Intell Law 22, 141–173 (2014). https://doi.org/10.1007/s10506-014-9154-6

  • DOI: https://doi.org/10.1007/s10506-014-9154-6
