Ariel Stolerman
Research
I got my Ph.D. in Computer Science from Drexel University in 2015, under the supervision of Prof. Rachel Greenstadt at the Privacy, Security and Automation Lab (PSAL). My research mainly involved applied machine learning in security and privacy, specifically stylometry (authorship attribution) applications.
Publications
2015
-
Authorship Verification.
Stolerman, A.
PhD Dissertation, Drexel University, 2015.
In recent years, stylometry, the study of linguistic style, has become more prominent in security and privacy applications involving written language, mostly in digital and online domains. Although the literature is abundant with computational stylometry research, the field of authorship verification is relatively unexplored. Authorship verification is the binary, semi-open-world problem of determining whether or not a document was written by a given author. A key component of authorship verification techniques is confidence measurement: verification decisions are based on it and are expressed through acceptance thresholds that are selected and tuned as needed. This thesis demonstrates how using confidence-based approaches in stylometric applications, and combining them with traditional approaches, can improve classification accuracy and allow new domains and problems to be analyzed. We start by motivating the use of authorship verification approaches with two stylometric applications: native-language identification from non-native text and active linguistic user authentication. Next, we introduce the Classify-Verify algorithm, which integrates classification with binary verification, and apply it to several stylometric problems. Classify-Verify is proposed as an open-world alternative to restricted closed-world attribution methods, and is shown to be effective in dealing with possibly missing candidate authors by thwarting misclassifications, in coping with various domains and scales, and even in handling adversarial authors who try to fool the classifier.
@phdthesis{Stolerman_thesis:2015, author={Stolerman, Ariel}, advisor={Greenstadt, Rachel}, school={Drexel University}, year={2015}, title={Authorship Verification}, journal={ProQuest Dissertations and Theses}, pages={146}, note={Copyright - Copyright ProQuest, UMI Dissertations Publishing 2015; Last updated - 2015-05-12; First page - n/a}, keywords={Applied sciences; Authorship attribution; Authorship verification; Machine learning; Stylometry; Artificial intelligence; Computer science; 0984:Computer science; 0800:Artificial intelligence}, isbn={9781321696998}, language={English}, url={http://search.proquest.com/docview/1679459529?accountid=10559}, }
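The thesis abstract describes verification decisions as confidence scores compared against acceptance thresholds that are tuned as needed. As a rough illustration only, the sketch below assumes a verifier that already outputs one confidence score per trial (higher meaning more likely the claimed author) and picks the candidate threshold at which the false acceptance and false rejection rates are closest, one common way to tune such a threshold. The function names and toy scores are hypothetical and are not taken from the dissertation.

```python
"""Sketch: tuning an acceptance threshold for a confidence-based verifier.

Assumption (illustrative, not the dissertation's code): each trial yields one
confidence score; scores >= threshold are accepted as the claimed author.
"""
from typing import Sequence, Tuple


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when accepting every score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors let in
    frr = sum(s < threshold for s in genuine) / len(genuine)     # true authors rejected
    return far, frr


def balanced_threshold(genuine: Sequence[float],
                       impostor: Sequence[float]) -> float:
    """Pick the observed score minimizing |FAR - FRR| (an equal-error-rate style rule)."""
    candidates = sorted(set(genuine) | set(impostor))

    def gap(t: float) -> float:
        far, frr = error_rates(genuine, impostor, t)
        return abs(far - frr)

    return min(candidates, key=gap)


if __name__ == "__main__":
    genuine = [0.91, 0.85, 0.78, 0.66, 0.95]   # toy scores: documents by the claimed author
    impostor = [0.12, 0.40, 0.58, 0.70, 0.33]  # toy scores: documents by other authors
    t = balanced_threshold(genuine, impostor)
    print(t, error_rates(genuine, impostor, t))
```

With these toy scores the balanced threshold is 0.70, where one impostor trial is accepted and one genuine trial is rejected. Favoring a lower FAR (stricter security) or a lower FRR (less user friction) simply means choosing a different candidate threshold.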
2014
-
Multi-Modal Decision Fusion for Continuous Authentication.
Fridman, A., Stolerman, A., Acharya, S., Brennan, P., Juola, P., Greenstadt, R. and Kam, M.
Computers & Electrical Engineering, 2014 (to appear).
Active authentication is the process of continuously verifying a user based on their ongoing interaction with a computer. In this study, we consider a representative collection of behavioral biometrics: two low-level modalities of keystroke dynamics and mouse movement, and a high-level modality of stylometry. We develop a sensor for each modality and organize the sensors as a parallel binary decision fusion architecture. We consider several applications for this authentication system, with a particular focus on secure distributed communication. We test our approach on a dataset collected from 67 users, each working individually in an office environment for a period of approximately one week. We are able to characterize the performance of the system with respect to intruder detection time and robustness to adversarial attacks, and to quantify the contribution of each modality to the overall performance.
@article{fridman2014multimodal, author={Lex Fridman and Ariel Stolerman and Sayandeep Acharya and Patrick Brennan and Patrick Juola and Rachel Greenstadt and Moshe Kam}, journal={Computers and Electrical Engineering}, title={Multi-Modal Decision Fusion for Continuous Authentication}, note={Accepted}, year=2014, pdf={http://lexfridman.com/publications/papers/2014-elsevier-cee-fridman-stolerman-multi-modal-decision-fusion-for-continuous-authentication.pdf} }
-
Doppelgänger Finder: Taking Stylometry To The Underground.
Afroz, S., Caliskan-Islam, A., Stolerman, A., Greenstadt, R. and McCoy, D.
The 35th IEEE Symposium on Security and Privacy, May 2014 (to appear).
Stylometry is a method for identifying the authors of anonymous texts by analyzing their writing style. While stylometric methods have produced impressive results in previous experiments, we wanted to explore their performance on a challenging dataset of particular interest to the security research community. Analysis of underground forums can provide key information about who controls a given bot network or sells a service, and about the size and scope of the cybercrime underworld. Previous analyses have relied primarily on limited structured metadata and painstaking manual analysis. However, the key challenge is to automate this process, since this labor-intensive manual approach clearly does not scale.
We consider two scenarios. The first involves text written by an unknown cybercriminal and a set of potential suspects. This is a standard, supervised stylometry problem, made more difficult by multilingual forums that mix l33t-speak conversations with data dumps. In the second scenario, you want to feed a forum into an analysis engine and have it output possible doppelgängers, or users with multiple accounts. While other researchers have explored this problem, we propose a method that produces good results on actual separate accounts, as opposed to datasets created by artificially splitting authors into multiple identities.
For scenario 1, we achieve 77.2% to 84% accuracy on private messages. For scenario 2, we achieve 94% recall with 90.38% precision on blogs and 85.18% recall with 82.14% precision for underground forum users. We demonstrate the utility of our approach with a case study that applies our technique to the Carders forum and uses manual analysis to validate the results, enabling the discovery of previously undetected doppelgänger accounts.
@inproceedings{Afroz:2014:DFT:2650286.2650799, author = {Afroz, Sadia and Islam, Aylin Caliskan and Stolerman, Ariel and Greenstadt, Rachel and McCoy, Damon}, title = {Doppelg{\"a}nger Finder: Taking Stylometry to the Underground}, booktitle = {Proceedings of the 2014 IEEE Symposium on Security and Privacy}, series = {SP '14}, year = {2014}, isbn = {978-1-4799-4686-0}, pages = {212--226}, numpages = {15}, url = {http://dx.doi.org/10.1109/SP.2014.21}, doi = {10.1109/SP.2014.21}, acmid = {2650799}, publisher = {IEEE Computer Society}, address = {Washington, DC, USA}, keywords = {Stylometry, cybercrime, underground forum}, }
-
Active linguistic authentication revisited: Real-time stylometric evaluation towards multi-modal decision fusion.
Stolerman, A., Fridman, A., Greenstadt, R., Brennan, P. and Juola, P.
The Tenth Annual IFIP WG 11.9 International Conference on Digital Forensics, January 2014, Vienna, Austria.
Active authentication is the process of continuously verifying a user based on his/her ongoing interaction with the computer. Forensic stylometry is the study of linguistic style, applied to author (user) identification. We evaluate the Active Linguistic Authentication Dataset, collected from users working individually in an office environment for a period of one week. We consider a battery of stylometric modalities as a representative collection of high-level behavioral biometrics. Unlike the initial evaluation previously presented on this dataset with 14 users, we consider the fully collected dataset, which consists of data from 67 users. Another significant difference is the type of evaluation: instead of the day-based or data-based (number-of-characters) windows considered for classification, we evaluate time-based, overlapping sliding windows; our evaluation tests the ability to produce authentication decisions every 10-60 seconds, which is highly applicable to real-world active security systems. We evaluate the different sensors via cross-validation, measuring false acceptance and false rejection rates (FAR and FRR). We show that under these realistic settings, stylometric sensors perform with considerable effectiveness, down to 0/0.5 FAR/FRR for decisions produced every 60 seconds, available 95% of the time. This work is a step towards a decision-fusion approach that combines multiple modalities (e.g., keyboard and mouse dynamics) to make centralized, highly accurate authentication decisions.
@inproceedings{stolerman2013active, author = {Ariel Stolerman and Lex Fridman and Rachel Greenstadt and Patrick Brennan and Patrick Juola}, title = {{Active Linguistic Authentication Revisited: Real-Time Stylometric Evaluation towards Multi-Modal Decision Fusion}}, address = {Vienna, Austria}, booktitle = {Proceedings of the Tenth Annual IFIP WG 11.9 International Conference on Digital Forensics}, month = jan, year = 2014 }
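The evaluation above replaces day-based or character-count windows with time-based, overlapping sliding windows, each yielding one authentication decision. The sketch below shows one way such windows might be cut from timestamped typing; the event format and the `sliding_windows`, `width` and `step` names are hypothetical assumptions, not the paper's code, and the stylometric classifier that would score each window is left out.

```python
"""Sketch: time-based, overlapping sliding windows over timestamped input.

Assumptions (illustrative only): events are (timestamp_seconds, text) pairs;
each window spans `width` seconds and a new window starts every `step`
seconds, so consecutive windows overlap whenever step < width.
"""
from typing import Iterator, Sequence, Tuple

Event = Tuple[float, str]


def sliding_windows(events: Sequence[Event], width: float,
                    step: float) -> Iterator[Tuple[float, str]]:
    """Yield (window_end, text typed in [start, end)) for each non-empty window."""
    if not events:
        return
    ordered = sorted(events)  # order events by timestamp
    start = ordered[0][0]
    last = ordered[-1][0]
    while start <= last:
        end = start + width
        text = "".join(t for ts, t in ordered if start <= ts < end)
        if text:  # a window with no typing yields no authentication decision
            yield end, text
        start += step


if __name__ == "__main__":
    typed = [(0, "He"), (5, "llo"), (12, " wor"), (25, "ld")]
    for end, text in sliding_windows(typed, width=10, step=5):
        print(end, repr(text))
```

In this toy run the window covering seconds 15 to 25 contains no input and is skipped, which illustrates why a decision may not be available for every window.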
-
Classify, but Verify: Breaking the Closed-World Assumption in Stylometric Authorship Attribution.
Stolerman, A., Overdorf, R., Afroz, S. and Greenstadt, R.
The Tenth Annual IFIP WG 11.9 International Conference on Digital Forensics, January 2014, Vienna, Austria.
Forensic stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem, where the document's author is in a known suspect set. For open-world problems, where the author may not be in the suspect set, traditional classification methods are ineffective. We propose the Classify-Verify method, which augments classification with a binary verification step; we evaluate it on stylometric datasets, but it can be generalized to any domain. We suggest augmentations to an existing distance-based authorship verification method by adding per-feature standard deviations and per-author threshold normalization. The Classify-Verify method significantly outperforms traditional classifiers in open-world settings (p-val < 0.01) and attains an F1-score of 0.87, comparable to the performance of traditional classifiers in closed-world settings. Moreover, Classify-Verify successfully detects adversarial documents in which authors deliberately change their style, a setting where closed-world classifiers fail.
@inproceedings{StolermanOAG14, author = {Ariel Stolerman and Rebekah Overdorf and Sadia Afroz and Rachel Greenstadt}, title = {Breaking the Closed-World Assumption in Stylometric Authorship Attribution}, booktitle = {Advances in Digital Forensics {X} - 10th {IFIP} {WG} 11.9 International Conference, Vienna, Austria, January 8-10, 2014, Revised Selected Papers}, pages = {185--205}, year = {2014}, }
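To make the classify-then-verify idea concrete, here is a minimal sketch under loudly stated assumptions: documents are already feature vectors, the closed-world classifier is a plain nearest-centroid rule (a stand-in, not the classifier used in the paper), the verifier measures distance to the predicted author's centroid with each feature scaled by that author's standard deviation, and each author's threshold is the largest such distance on their own training documents times a hypothetical `slack` factor. That threshold rule is only a simple placeholder for the per-author normalization the abstract mentions, not its actual form.

```python
"""Sketch of a classify-then-verify pipeline (illustrative assumptions only)."""
import math
from statistics import mean, pstdev
from typing import Dict, List, Optional

Vector = List[float]


class ClassifyVerify:
    def __init__(self, training: Dict[str, List[Vector]], slack: float = 1.2):
        self.centroids: Dict[str, Vector] = {}
        self.stdevs: Dict[str, Vector] = {}
        self.thresholds: Dict[str, float] = {}
        for author, docs in training.items():
            columns = list(zip(*docs))  # one tuple of values per feature
            self.centroids[author] = [mean(c) for c in columns]
            # Floor tiny deviations so constant features do not divide by zero.
            self.stdevs[author] = [max(pstdev(c), 1e-6) for c in columns]
            own = [self._scaled_distance(author, d) for d in docs]
            # Hypothetical per-author threshold: widest own-document distance, loosened.
            self.thresholds[author] = max(own) * slack

    def _scaled_distance(self, author: str, doc: Vector) -> float:
        """Euclidean distance after dividing each feature gap by the author's std."""
        return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in
                             zip(doc, self.centroids[author], self.stdevs[author])))

    def classify(self, doc: Vector) -> str:
        """Closed-world step: always returns one of the known authors."""
        return min(self.centroids, key=lambda a: math.dist(doc, self.centroids[a]))

    def predict(self, doc: Vector) -> Optional[str]:
        """Open-world step: the classified author, or None if verification rejects."""
        author = self.classify(doc)
        if self._scaled_distance(author, doc) <= self.thresholds[author]:
            return author
        return None  # "unknown author": likely outside the suspect set
```

For example, trained on two toy authors whose feature vectors cluster around (1, 2) and (5, 5), a document at (3, 3) is still assigned to the closer author by `classify`, but `predict` returns `None` because it lies far outside that author's spread, which is the kind of misclassification the verification step is meant to thwart.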
2013
-
Keyboard Behavior Based Authentication for Security.
Juola, P., Noecker Jr., J., Stolerman, A., Ryan, M., Brennan, P. and Greenstadt, R.
IEEE IT Professional, vol. 15, no. 4, pp. 8-11, July-Aug. 2013.
We developed a large corpus of keyboard behavior based on temporary workers employed in a simulated office environment. Analysis of this corpus using stylometric techniques shows good accuracy in distinguishing users.
@article{DBLP:journals/itpro/JuolaNSRBG13, author = {Patrick Juola and John Noecker Jr. and Ariel Stolerman and Michael Ryan and Patrick Brennan and Rachel Greenstadt}, title = {Keyboard-Behavior-Based Authentication}, journal = {IT Professional}, volume = {15}, number = {4}, year = {2013}, pages = {8-11}, ee = {http://doi.ieeecomputersociety.org/10.1109/MITP.2013.49}, bibsource = {DBLP, http://dblp.uni-trier.de} }
-
Decision Fusion for Multi-Modal Active Authentication.
Fridman, A., Stolerman, A., Acharya, S., Brennan, P., Juola, P., Greenstadt, R. and Kam, M.
IEEE IT Professional, vol. 15, no. 4, pp. 29-33, July-Aug. 2013.
We consider a representative collection of behavioral biometrics: two low-level modalities of keystroke dynamics and mouse movement, and two high-level modalities of stylometry and web browsing behavior. To the best of our knowledge, the application of the latter two in the continuous authentication context has not been studied before. We develop a sensor for each modality and organize the sensors as a parallel binary detection decision fusion architecture. The decisions of each sensor are fed into a Decision Fusion Center (DFC), which applies the Chair-Varshney fusion algorithm to generate a global decision. We test our approach on a dataset collected from 19 users in a simulated work environment. We show that the fusion algorithm achieves a lower probability of error than the best individual sensor in the fused set, and we are able to quantify the contribution of each modality to the overall performance.
@article{DBLP:journals/itpro/FridmanSABJGK13, author = {Alex Fridman and Ariel Stolerman and Sayandeep Acharya and Patrick Brennan and Patrick Juola and Rachel Greenstadt and Moshe Kam}, title = {Decision Fusion for Multimodal Active Authentication}, journal = {IT Professional}, volume = {15}, number = {4}, year = {2013}, pages = {29-33}, ee = {http://doi.ieeecomputersociety.org/10.1109/MITP.2013.53}, bibsource = {DBLP, http://dblp.uni-trier.de} }
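The abstract names the Chair-Varshney rule as the algorithm applied at the Decision Fusion Center. In its standard form, that rule weights each sensor's binary vote by a log-likelihood ratio built from the sensor's detection and false-alarm probabilities, adds the prior log-odds, and decides by the sign of the total. The minimal sketch below assumes independent sensors; the function name and all sensor statistics in the usage example are made up for illustration and are not values from the paper.

```python
"""Sketch of the Chair-Varshney fusion rule for independent binary sensors.

Sensor i reports u_i in {0, 1} (1 = hypothesis H1, e.g. "intruder") and is
characterized by pd_i = P(u_i = 1 | H1) and pf_i = P(u_i = 1 | H0).
The prior p1 = P(H1) defaults to 0.5 here as an illustrative assumption.
"""
import math
from typing import Sequence


def chair_varshney(decisions: Sequence[int], pd: Sequence[float],
                   pf: Sequence[float], p1: float = 0.5) -> int:
    """Return the fused global decision (1 for H1, 0 for H0)."""
    # Prior log-odds log(P(H1) / P(H0)).
    score = math.log(p1 / (1.0 - p1))
    for u, d, f in zip(decisions, pd, pf):
        if u == 1:
            score += math.log(d / f)                  # weight of an H1 vote
        else:
            score += math.log((1.0 - d) / (1.0 - f))  # (negative) weight of an H0 vote
    return 1 if score > 0 else 0


if __name__ == "__main__":
    pd = [0.9, 0.7, 0.6]   # hypothetical detection probabilities
    pf = [0.05, 0.2, 0.3]  # hypothetical false-alarm probabilities
    print(chair_varshney([1, 0, 0], pd, pf))
    print(chair_varshney([0, 1, 1], pd, pf))
```

With these numbers, the votes [1, 0, 0] fuse to 1 and [0, 1, 1] fuse to 0: a single reliable sensor can outweigh a majority of weaker ones, which is what distinguishes this rule from simple majority voting.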
-
From Language to Family and Back: Native Language and Language Family Identification from English Text.
Stolerman, A., Caliskan Islam, A. and Greenstadt, R.
Proceedings of the 2013 NAACL HLT Student Research Workshop, pages 32-39, June 2013, Atlanta, Georgia.
Revealing an anonymous author's traits from text is a well-researched area. In this paper we aim to identify the native language and language family of a non-native English author, given his/her English writings. We extract features from the text based on prior work, extend or modify them to construct different feature sets, and use support vector machines for classification. We show that native language identification accuracy can be improved by up to 6.43% for a 9-class task, depending on the feature set, by introducing a novel method to incorporate language family information. In addition, we show that introducing grammar-based features improves the accuracy of both native language and language family identification.
@inproceedings{DBLP:conf/naacl/StolermanCG13, author = {Ariel Stolerman and Aylin Caliskan and Rachel Greenstadt}, title = {From Language to Family and Back: Native Language and Language Family Identification from English Text}, booktitle = {HLT-NAACL}, year = {2013}, pages = {32-39}, ee = {http://aclweb.org/anthology/N/N13/N13-2005.pdf}, crossref = {DBLP:conf/naacl/2013}, bibsource = {DBLP, http://dblp.uni-trier.de} }
@proceedings{DBLP:conf/naacl/2013, title = {Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 9-14, 2013, Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA}, booktitle = {HLT-NAACL}, publisher = {The Association for Computational Linguistics}, year = {2013}, bibsource = {DBLP, http://dblp.uni-trier.de} }
-
A Dataset for Active Linguistic Authentication.
Juola, P., Noecker Jr., J., Stolerman, A., Ryan, M., Brennan, P. and Greenstadt, R.
Proceedings of the Ninth Annual IFIP WG 11.9 International Conference on Digital Forensics, Orlando, Florida, USA: National Center for Forensic Science, January 2013.
Biometric technologies offer a new and potentially more effective way of securing computers against unauthorized access; linguistic technologies, and in particular authorship attribution technologies, may provide a means to this end. We report on a novel corpus developed to test this possibility. Using temporary workers in a simulated office environment, we collected a week's work-product for 19 subjects and demonstrate that techniques culled from the field of authorship attribution can identify workers with more than 90% accuracy.
@inproceedings{DBLP:conf/ifip11-9/JuolaNSRBG13, author = {Patrick Juola and John Noecker Jr. and Ariel Stolerman and Michael Ryan and Patrick Brennan and Rachel Greenstadt}, title = {Towards Active Linguistic Authentication}, booktitle = {IFIP Int. Conf. Digital Forensics}, year = {2013}, pages = {385-398}, ee = {http://dx.doi.org/10.1007/978-3-642-41148-9_25}, crossref = {DBLP:conf/ifip11-9/2013}, bibsource = {DBLP, http://dblp.uni-trier.de} }
@proceedings{DBLP:conf/ifip11-9/2013, editor = {Gilbert L. Peterson and Sujeet Shenoi}, title = {Advances in Digital Forensics IX - 9th IFIP WG 11.9 International Conference on Digital Forensics, Orlando, FL, USA, January 28-30, 2013, Revised Selected Papers}, booktitle = {IFIP Int. Conf. Digital Forensics}, publisher = {Springer}, series = {IFIP Advances in Information and Communication Technology}, volume = {410}, year = {2013}, isbn = {978-3-642-41147-2}, ee = {http://dx.doi.org/10.1007/978-3-642-41148-9}, bibsource = {DBLP, http://dblp.uni-trier.de} }
2012
-
Use Fewer Instances of the Letter "i": Toward Writing Style Anonymization.
McDonald, A., Afroz, S., Caliskan, A., Stolerman, A. and Greenstadt, R.
Privacy Enhancing Technologies Symposium (PETS) 2012, pp. 299-318.
This paper presents Anonymouth, a novel framework for anonymizing writing style. Without accounting for style, anonymous authors risk identification. This framework is necessary to provide a tool for testing the consistency of anonymized writing style and a mechanism for adaptive attacks against stylometry techniques. Our framework defines the steps necessary to anonymize documents and implements them. A key contribution of this work is this framework, including novel methods for identifying which features of documents need to change and how they must be changed to accomplish document anonymization. In our experiment, 80% of the user study participants were able to anonymize their documents with respect to a fixed corpus and the limited feature set used. However, modifying pre-written documents was found to be difficult, and the anonymization did not hold up against more extensive feature sets. It is important to note that Anonymouth is only the first step toward a tool that achieves stylometric anonymity with respect to state-of-the-art authorship attribution techniques. The topic needs further exploration in order to accomplish significant anonymity.
@inproceedings{DBLP:conf/pet/McDonaldACSG12, author = {Andrew W. E. McDonald and Sadia Afroz and Aylin Caliskan and Ariel Stolerman and Rachel Greenstadt}, title = {Use Fewer Instances of the Letter "i": Toward Writing Style Anonymization}, booktitle = {Privacy Enhancing Technologies}, year = {2012}, pages = {299-318}, ee = {http://dx.doi.org/10.1007/978-3-642-31680-7_16}, crossref = {DBLP:conf/pet/2012}, bibsource = {DBLP, http://dblp.uni-trier.de} }
@proceedings{DBLP:conf/pet/2012, editor = {Simone Fischer-H{\"u}bner and Matthew Wright}, title = {Privacy Enhancing Technologies - 12th International Symposium, PETS 2012, Vigo, Spain, July 11-13, 2012. Proceedings}, booktitle = {Privacy Enhancing Technologies}, publisher = {Springer}, series = {Lecture Notes in Computer Science}, volume = {7384}, year = {2012}, isbn = {978-3-642-31679-1}, ee = {http://dx.doi.org/10.1007/978-3-642-31680-7}, bibsource = {DBLP, http://dblp.uni-trier.de} }
Posters
2013
-
Classify, but Verify: Breaking the Closed-World Assumption in Stylometric Authorship Attribution.
Stolerman, A., Overdorf, R., Afroz, S. and Greenstadt, R.
USENIX Security, August 2013.
Forensic stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem, where the document's author is in a known suspect set. For open-world problems, where the author may not be in the suspect set, traditional classification methods are ineffective.
We propose the Classify-Verify method, which augments classification with a binary verification step; we evaluate it on stylometric datasets, but it can be generalized to any domain.
We suggest augmentations to an existing distance-based authorship verification method by adding per-feature standard deviations and per-author threshold normalization.
The Classify-Verify method significantly outperforms traditional classifiers in open-world settings (p-val < 0.01) and attains an F1-score of 0.87, comparable to the performance of traditional classifiers in closed-world settings. Moreover, Classify-Verify successfully detects adversarial documents in which authors deliberately change their style, a setting where closed-world classifiers fail.
In the News
-
Stylometric analysis to track anonymous users in the underground
Pierluigi Paganini, Security Affairs Blog, Jan 10, 2013
-
Linguistics identifies anonymous users
Darren Pauli, SC Magazine, Jan 9, 2013
-
Students release stylometry tools
Helen Nowotnik, The Triangle, Jan 13, 2012
-
Software Helps Identify Anonymous Writers or Helps Them Stay That Way
Nicole Perlroth, New York Times Bits Blog, Jan 3, 2012
-
Authorship recognition software from Drexel University lab to be released December
Christopher Wink, Technically Philly, Nov 15, 2011