Active Authentication through Psychometrics

paper, specified "short paper"
Authorship
  1. John Noecker Jr.

    Juola & Associates

Work text

What can your computer habits reveal about you? The answer might surprise you. Previous work (Juola et al., 2013) has shown that just a few minutes of computer usage can be used to identify who is at the keyboard, along with their demographic and psychological attributes, with a fairly high degree of accuracy. We expand upon this to show that the same usage data can be used to profile a previously unknown user and obtain valuable psychological information about that user.
Authorship attribution, the analysis of a document’s writing style to infer the author’s identity, is a well-established problem in text classification. Previously, we used classical authorship attribution techniques to identify “who was at the keyboard” using the DARPA Active Authentication Corpus (Juola et al., 2013). Researchers have successfully applied the analysis of language usage to infer authorship of written documents (Juola, 2006; Koppel et al., 2009; Stamatatos, 2009; Jockers & Witten, 2010), and stylometric analysis has also been applied to attributes such as gender (Argamon et al., 2006), personality (Luyckx & Daelemans, 2008), and even psychological disorders like depression (Rude et al., 2004).
Here, we attempt to apply the same technique to groups composed of individuals who share common psychological traits. Previous work on personality profiling (Luyckx & Daelemans, 2008; Noecker & Juola, 2013) has so far focused on analyzing previously written documents. In contrast, our system provides a method for real-time psychological profiling of a user based on his or her interactions with a computer over a relatively short period of time (approximately 30 minutes). The ultimate goal is twofold: to learn something about a previously unobserved user (traditional stylometric identification techniques require training data on a user before that user can be identified) and to use psychological traits as an enhancement to current user authentication methods.
Currently, accuracy on exact user identification in the authentication task is approximately 90%. This task becomes more difficult (and the accuracy becomes correspondingly lower) as the pool of potential author models grows. To improve the overall accuracy of the user authentication task, we propose to include these psychological profiling tools in the authentication system. If a given user can be identified as the most likely candidate with 90% probability, and several facets of that user’s personality can be confirmed with similarly high confidence, this will increase the overall robustness of the authentication system.
For our purposes, we used two personality/intelligence measurement systems to profile users: Myers-Briggs Type Indicator (MBTI) and Multiple Intelligences Developmental Assessment Scales (MIDAS).
The Myers-Briggs Type Indicator (MBTI) assigns four binary classifications to define personality (Myers & Myers, 1980):
Extroversion vs Introversion
iNtuition vs Sensing
Thinking vs Feeling
Judgement vs Perception
The Multiple Intelligences Developmental Assessment Scales (MIDAS) are based on the theory of multiple intelligences that Dr. Howard Gardner set out in his 1983 book “Frames of Mind” (Gardner, 1983). He used a distinctive definition of intelligence: “The ability to solve a problem or create a product that is valued within one or more cultures” (MI Research and Consulting). The MIDAS has eight primary intelligence scales, each of which has several subscales (MI Research and Consulting):
Musical: Vocal Ability, Instrumental Skill, Composer, Appreciation
Kinesthetic: Athletics, Dexterity
Logical-Mathematical: Everyday Math, School Math, Everyday Problem Solving, Strategy Games
Spatial: Space Awareness, Working with Objects, Artistic Design
Linguistic: Expressive Sensitivity, Rhetorical Skill, Written-academic
Interpersonal: Social Sensitivity, Social Persuasion, Interpersonal Work
Intrapersonal: Personal Knowledge / Efficacy, Effectiveness, Calculations, Spatial Problem Solving
Naturalist: Animal Care, Plant Care
We also include a 9th main scale, Leadership, with its own subscales: Communication, Management, and Social.
Materials and Methods

Corpus

In order to create the most accurate corpus possible, we set up a simulated office environment and hired 80 temporary workers for one week each. Workers were tasked to perform a long-term blogging project (research and write blog articles on topics “related to Pittsburgh in some way”) over the course of a normal workweek. For this study, we use the Free Key Logger output, which provides the exact text typed by each user. We do not include any information about the applications being used or any data the user pastes from the clipboard.
Feature Extraction

For our analysis, we used the Java Graphical Authorship Attribution Program (JGAAP) (Juola et al, 2009). JGAAP is a Java-based, modular program for textual analysis, text categorization, and authorship attribution. It provides a comprehensive framework, allowing us to rapidly test the effectiveness of different analysis techniques on the recorded data.
JGAAP divides analysis into several steps: Canonicization (Preprocessing), Event Set (Feature) Generation, and Analysis. In Canonicization, preprocessors are used to standardize the text. For this step, we converted all input letters to lower case (“Unify Case”) and converted all strings of whitespace characters into a single space character (“Normalize Whitespace”). At this stage, we also processed a variety of special keyboard characters, converting these non-printable characters into a printable placeholder (e.g. “backspace” was replaced with “β”). Finally, we divided the input data into blocks of 1,000 characters, representing about 30 minutes of computer usage.
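The canonicization and blocking steps described above can be sketched in a few lines of Python. This is an illustrative approximation, not JGAAP's own code: the bracketed token spelling for special keys (e.g. "[backspace]"), the placeholders other than “β”, and the decision to drop a short trailing block are all assumptions made for the example.

```python
import re

# Hypothetical mapping from logged special-key tokens to printable placeholders.
# Only the backspace -> "β" pairing comes from the text; the bracketed token
# spelling and the other two entries are assumptions for illustration.
SPECIAL_KEYS = {
    "[backspace]": "β",
    "[enter]": "ε",
    "[tab]": "τ",
}

BLOCK_SIZE = 1000  # characters per block, as described in the text


def canonicize(raw: str) -> str:
    """Unify case, replace special-key tokens, and normalize whitespace."""
    text = raw.lower()  # "Unify Case"
    for token, placeholder in SPECIAL_KEYS.items():
        text = text.replace(token, placeholder)
    # "Normalize Whitespace": any run of whitespace becomes a single space.
    return re.sub(r"\s+", " ", text)


def split_blocks(text: str, size: int = BLOCK_SIZE) -> list[str]:
    """Cut text into consecutive blocks of `size` characters.

    A trailing remainder shorter than `size` is dropped (an assumption).
    """
    return [text[i:i + size] for i in range(0, len(text) - size + 1, size)]
```

For example, canonicize("Hello   World[BACKSPACE]\n\tx") yields "hello worldβ x", and a 2,500-character log yields two blocks of 1,000 characters each.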
For the event set generation, we tested character N-grams for all N from 1 to 15, and word N-grams for N from 1 to 5. We then applied a number of analysis methods for each experiment: Cosine Distance, Intersection Distance, Manhattan Distance, and Matusita Distance. For each method, we used a centroid-based nearest neighbor classifier. We evaluated each configuration using leave-one-out cross-validation.
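As a rough illustration of this pipeline, the sketch below builds N-gram event sets, computes the four distances over relative-frequency histograms, and runs a centroid-based nearest neighbor classifier under leave-one-out cross-validation. It is a simplified stand-in rather than JGAAP's implementation: the function names are hypothetical, the Intersection Distance shown is one common set-based formulation (one minus the ratio of shared to total event types) that may differ from JGAAP's, and holding out one text block at a time is an assumption about how the cross-validation was organized.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """All overlapping character N-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """All overlapping word N-grams, splitting on whitespace."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def rel_freq(events):
    """Relative-frequency histogram of an event list."""
    counts = Counter(events)
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}


def cosine_distance(p, q):
    dot = sum(v * q.get(e, 0.0) for e, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * \
        math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0


def manhattan_distance(p, q):
    return sum(abs(p.get(e, 0.0) - q.get(e, 0.0)) for e in set(p) | set(q))


def matusita_distance(p, q):
    return math.sqrt(sum((math.sqrt(p.get(e, 0.0)) - math.sqrt(q.get(e, 0.0))) ** 2
                         for e in set(p) | set(q)))


def intersection_distance(p, q):
    # Assumed formulation: 1 - |shared event types| / |all event types|.
    union = set(p) | set(q)
    return 1.0 - len(set(p) & set(q)) / len(union) if union else 0.0


def centroid(histograms):
    """Average the histograms of one class into a single model."""
    total = Counter()
    for h in histograms:
        total.update(h)  # adds the float frequencies key by key
    return {e: v / len(histograms) for e, v in total.items()}


def classify(hist, centroids, distance):
    """Return the label whose centroid is nearest to the histogram."""
    return min(centroids, key=lambda label: distance(hist, centroids[label]))


def leave_one_out_accuracy(samples, n, distance):
    """samples: list of (label, text) pairs; each is held out once."""
    hists = [(label, rel_freq(char_ngrams(text, n))) for label, text in samples]
    correct = 0
    for i, (true_label, held_out) in enumerate(hists):
        by_label = {}
        for j, (label, h) in enumerate(hists):
            if j != i:
                by_label.setdefault(label, []).append(h)
        centroids = {label: centroid(hs) for label, hs in by_label.items()}
        correct += classify(held_out, centroids, distance) == true_label
    return correct / len(hists)
```

Swapping char_ngrams for word_ngrams inside leave_one_out_accuracy, or passing a different distance function, covers the other configurations named above.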
Models

For the MBTI classifiers, we built four binary classifiers (i.e. E vs I, N vs S, T vs F, and J vs P). For the MIDAS classifiers, we first built a single 9-way classifier to identify a user’s principal main scale. This was the scale along which the user scored highest (i.e. the scale for which the user showed the highest preference). For example, a user might have a preference for “Musical” or “Linguistic”. We also developed subscale classifiers, which identify a user’s preference within each major scale. For instance, a user might be identified as “(Musical) Vocal Ability” and “(Kinesthetic) Dexterity”, etc. Thus, each user was identified by a single main scale preference as well as nine subscale preferences.
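The class labels for these models can be derived from each user's test results roughly as follows. This is a minimal sketch under assumed data shapes: a four-letter MBTI type string and dictionaries of MIDAS scores keyed by scale name. The function names and the tie-breaking rule are hypothetical, not taken from the study.

```python
# Letter pairs for the four MBTI dichotomies, in the order used in the text.
DICHOTOMIES = {"E/I": "EI", "N/S": "NS", "T/F": "TF", "J/P": "JP"}


def mbti_binary_labels(mbti_type: str) -> dict[str, str]:
    """Split a four-letter type such as "INTJ" into one label per dichotomy."""
    letters = mbti_type.upper()
    if len(letters) != 4:
        raise ValueError(f"expected four letters, got {mbti_type!r}")
    labels = {}
    for (name, pair), letter in zip(DICHOTOMIES.items(), letters):
        if letter not in pair:
            raise ValueError(f"{letter!r} is not valid for {name}")
        labels[name] = letter
    return labels


def preferred_scale(scores: dict[str, float]) -> str:
    """Return the scale with the highest score (first one wins on ties)."""
    return max(scores, key=scores.get)


def midas_preferences(main_scores, subscale_scores):
    """Main-scale label plus one preferred subscale per main scale.

    main_scores: {main scale: score}
    subscale_scores: {main scale: {subscale: score}}
    """
    return {
        "main": preferred_scale(main_scores),
        "subscales": {scale: preferred_scale(subs)
                      for scale, subs in subscale_scores.items()},
    }
```

Each of the four MBTI labels then serves as the target of one binary classifier, while the "main" label and each per-scale subscale label serve as targets for the MIDAS classifiers.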
Results

MBTI

For the MBTI classifiers, the four binary classifiers achieved an average accuracy of 81.5%. The expected baseline average (assuming we pick the most prevalent personality type for each category) is 55%.
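The baseline described here is a majority-class baseline averaged over the four dichotomies. A minimal sketch of that calculation follows; the label lists in the usage note are made-up toy data, not the study's label distribution, and the function names are hypothetical.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always guessing the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def mean_majority_baseline(labels_by_category):
    """Average the majority-class baseline over several categories."""
    baselines = [majority_baseline(labels) for labels in labels_by_category.values()]
    return sum(baselines) / len(baselines)
```

With toy labels {"E/I": ["E", "E", "I", "I", "I"], "N/S": ["N", "N", "N", "S"]}, the per-category baselines are 0.6 and 0.75, so the averaged baseline is 0.675.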
MIDAS

For the MIDAS main category identification, our best performing classifier had an accuracy of 70.7%. This was achieved using character 15-grams with Intersection Distance. The expected baseline accuracy (achieved by choosing the most common main scale, “Linguistic”) was 22.1%.
For the MIDAS subscale identification, the best performing classifiers used a variety of character N-grams, again with Intersection Distance as the top performing analysis method. The average subscale accuracy was 81%.
Conclusion

We have shown here a method to reliably profile a computer user psychologically based on only a short period (about 30 minutes) of usage time. In addition to providing valuable information about the user in question, this method can also be used to provide additional layers of security for the active authentication system we have described previously. Even in an adversarial situation, the difficulty of imitating an individual user’s style while also mimicking that user’s psychological profile will provide additional security to the authentication system.
Also interesting to note is the limited usage data required to perform these analyses. The initial user psychological testing period took approximately 3 hours, but accurate results were obtained from only 30 minutes of computer usage. In addition, the three hours of testing were completely lost time: the users were able to work only on the tests during this time. In contrast, the 30 minutes of analysis can be done on whatever the user is working on at the time. No downtime is required to perform these analyses. We believe this system could be useful anywhere a non-intrusive analysis of a user might be beneficial (e.g. determining whether a potential employee would be a good fit).
For future work, we intend to focus on reducing the amount of data needed even further. Preliminary results on as little as 500 characters (about 15 minutes of usage time) have been promising. Additional work is also being done to integrate these methods into the broader active authentication system in order to bolster the overall reliability of the system.
References

Argamon, S., Koppel, M., Fine, J., Shimoni, A. R. (2006). “Gender, Genre, and Writing Style in Formal Written Texts”. Interdisciplinary Journal for the Study of Discourse. Volume 23. Issue 3. pp. 321-346.
“Broad Agency Announcement: Active Authentication”. (2012). DARPA. Solicitation No. DARPA-BAA-12-06. 12 Jan. 2012. <http://www.fbo.gov/index?tab=documents&tabmode=form&subtab=core&tabid=494b6b2c612c4fd3db6cb018d4467e21>.
Gardner, Howard (1983). “Frames of Mind: The Theory of Multiple Intelligences”. Basic Books.
Jockers, M. L., Witten, D. (2010) “A Comparative Study of Machine Learning Methods for Authorship Attribution”. Literary and Linguistic Computing, vol. 25, no. 2. pp. 215–23.
Juola, P. (2006) “Authorship Attribution”. Foundations and Trends in Information Retrieval, vol. 1, no. 3. pp. 233–334.
Juola, Patrick, Noecker Jr., John, Ryan, Mike, Speer, Sandy (2009). “JGAAP 4.0 – A Revised Authorship Attribution Tool”. Proc. Digital Humanities 2009. pp. 357–359. Maryland Institute for Technology in the Humanities. University of Maryland.
Juola, Patrick, Noecker Jr., John, Stolerman, Ariel, Ryan, Michael, Brennan, Patrick, Greenstadt, Rachel (2013). “Keyboard Behavior Based Authentication for Security”. IT Professional. 18 June 2013. IEEE Computer Society Digital Library. IEEE Computer Society. <http://doi.ieeecomputersociety.org/10.1109/MITP.2013.49>.
Koppel, M., Schler, J., Argamon, S. (2009) “Computational Methods in Authorship Attribution”. J. Amer. Soc. Information Science and Technology, vol. 60, no. 1. pp. 9–26.
Luyckx, K., Daelemans, W. (2008) “Personae, a Corpus for Author and Personality Prediction from Text”. Proceedings of the Sixth International Conference on Language Resources and Evaluation. Marrakech, Morocco.
MI Research and Consulting, Inc. “Multiple Intelligences Theory”. <www.miresearch.org/mi_theory.html>.
Myers, I. B., Myers, P. (1980) “Gifts Differing: Understanding Personality Type”. Palo Alto, CA. Consulting Psychologists Press.
Noecker Jr., J., Juola, P. (2013) “Psychological Profiling Through Textual Analysis”. Literary and Linguistic Computing.
Rude, S., Gortner, E., Pennebaker, J. (2004) “Language Use of Depressed and Depression-Vulnerable College Students”. Cognition and Emotion.
Stamatatos, E. (2009) “A Survey of Modern Authorship Attribution Methods”. J. Amer. Soc. Information Science and Technology, vol. 60, no. 3. pp. 538–556.
Zheng, N., Paloski, A., Wang, H. (2011) “An Efficient User Verification System via Mouse Movements”. Proc. 18th ACM Conf. Computer and Communications Security (CCS 11). ACM. pp. 139–150.

Conference Info

Complete

ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

377 works by 898 authors indexed

XML available from https://github.com/elliewix/DHAnalysis (needs to replace plaintext)

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO