You may not be able to determine whether a person has a mental illness but you can predict

From: Predicting future mental illness from social media: A big-data approach

Subreddits                                          ADHD   Anxiety   Bipolar   Depression   Total
Mental illness subreddits (Study 1)                 .83    .75       .75       .74          .77
All other subreddits (Study 2)                      .42    .30       .34       .44          .38
All other subreddits, future prediction (Study 3)   .39    .32       .37       .36          .36

  1. Performance is reported as F score, both for each disorder separately and averaged over all disorders. Chance performance is .25. All scores are computed on held-out test data.
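The note above can be checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the table does not state: that "F score" means the F1 score (the harmonic mean of precision and recall), and that the "Total" column is the unweighted mean of the four per-disorder scores. The helper names `f1` and `macro_average` are hypothetical. The per-disorder values are copied from the table.

```python
# Per-disorder F scores copied from the table.
scores = {
    "Study 1": {"ADHD": 0.83, "Anxiety": 0.75, "Bipolar": 0.75, "Depression": 0.74},
    "Study 2": {"ADHD": 0.42, "Anxiety": 0.30, "Bipolar": 0.34, "Depression": 0.44},
    "Study 3": {"ADHD": 0.39, "Anxiety": 0.32, "Bipolar": 0.37, "Depression": 0.36},
}
# "Total" column as reported in the table.
reported_total = {"Study 1": 0.77, "Study 2": 0.38, "Study 3": 0.36}


def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall.

    Assumption: the table's "F score" is F1; the source does not say so.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(per_class):
    """Unweighted mean of per-class scores (assumed meaning of "Total")."""
    return sum(per_class.values()) / len(per_class)


averages = {study: macro_average(vals) for study, vals in scores.items()}

# Each reported total should match the computed mean up to two-decimal rounding.
ROUNDING_TOLERANCE = 0.005 + 1e-9
consistent = {
    study: abs(averages[study] - reported_total[study]) <= ROUNDING_TOLERANCE
    for study in scores
}

# Assumption: with four equally frequent classes, uniform random guessing
# gives precision = recall = 1/4 for every class.
chance_f1 = f1(0.25, 0.25)

for study in scores:
    print(study, f"mean={averages[study]:.4f}", "consistent:", consistent[study])
print("chance F1 under the balanced four-class assumption:", chance_f1)
```

Under these assumptions the reported totals agree with the per-disorder means to within rounding, and the stated chance level of .25 is what uniform guessing among four balanced classes would give.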