I conduct fundamental, empirical and applied research at the nexus of data science, network analysis, natural language processing, machine learning and computational social science. My research agenda is driven by my belief that progress in data science requires interdisciplinary and mixed-methods approaches: approaches that enable the joint consideration of the structure and content of social interactions, and that allow us to assess both the impact of data provenance on research outcomes and the impact of information products on social agents. In my research lab, we develop computational solutions that are grounded in theories from the social sciences, humanities and linguistics. We bring these solutions into different application contexts to test their generalizability and to advance theory.
What impact do information products have on people beyond what simple frequency-based metrics capture?
We have been developing, implementing, evaluating and applying a theory-driven computational approach to assessing the impact of issue-focused information products on people, groups and society. For example, we built a theory-driven framework and a probabilistic prediction model for identifying different types of impact on individuals, such as change versus reinforcement in personal behavior, cognition and emotions...
How do limitations in, and a lack of transparency about, data quality and data provenance bias research outcomes, and how can we detect and mitigate these limitations?
For example, we have been investigating the impact of entity resolution errors on network analysis results. We found that commonly reported network metrics, and the implications derived from them, can deviate strongly from the truth (as established by gold-standard data or approximations thereof), depending on how much effort is dedicated to entity resolution...
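To illustrate why entity resolution matters for network metrics, the following minimal sketch compares a few standard metrics on a toy co-authorship edge list before and after merging name variants. The names, the `alias_map`, and the helper functions are hypothetical and purely illustrative; this is not the lab's actual pipeline, only a demonstration of how splitting one real person into several nodes deflates degree and density.

```python
from collections import defaultdict


def network_metrics(edges):
    """Compute simple metrics for an undirected simple graph given as (u, v) pairs.

    Self-loops (which can appear after merging aliases) are ignored, and
    duplicate edges collapse because neighbors are stored in sets.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # drop self-loops
        adj[u].add(v)
        adj[v].add(u)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    max_degree = max((len(nbrs) for nbrs in adj.values()), default=0)
    return {"nodes": n, "edges": m, "density": density, "max_degree": max_degree}


def resolve(edges, alias_map):
    """Map each name to its canonical entity; names not in alias_map are kept as-is."""
    return [(alias_map.get(u, u), alias_map.get(v, v)) for u, v in edges]


# Hypothetical toy data: one person recorded under three name variants.
raw_edges = [
    ("J. Smith", "A. Lee"),
    ("John Smith", "B. Chen"),
    ("Smith, John", "C. Diaz"),
    ("A. Lee", "B. Chen"),
]
alias_map = {"John Smith": "J. Smith", "Smith, John": "J. Smith"}

before = network_metrics(raw_edges)
after = network_metrics(resolve(raw_edges, alias_map))
print(before)
print(after)
```

In this toy case, the unresolved graph has six nodes and a maximum degree of 2, while the resolved graph has four nodes, a maximum degree of 3 and a higher density, so conclusions about the most central actor change with the resolution effort.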
How can we use user-generated content to construct, infer or refine network data?
We have been tackling this problem by leveraging the communication content produced and disseminated in social networks to enhance graph data. For example, we have used domain-adjusted sentiment analysis to label graph edges with valence values in order to enable the assessment of triadic balance. The resulting method enables fast and systematic sign detection, eliminates the need for surveys or manual link labeling, and reduces problems associated with relying on user-generated (meta)data...
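Once edges carry valence values, triadic balance can be checked with the classic structural-balance rule: a triangle is balanced when the product of its three edge signs is positive. The minimal sketch below assumes that per-edge valence scores are already available (here they are hypothetical numbers standing in for the output of a sentiment model) and that a score at or above a threshold of 0.0 maps to a positive sign; both the scores and the threshold are illustrative assumptions, not the lab's domain-adjusted method.

```python
from itertools import combinations


def edge_signs(valences, threshold=0.0):
    """Convert per-edge valence scores into +1/-1 signs.

    The threshold is an assumption; edges are treated as undirected.
    """
    return {
        frozenset((u, v)): 1 if score >= threshold else -1
        for (u, v), score in valences.items()
    }


def triad_balance(signs):
    """Count balanced and unbalanced closed triads in a signed graph."""
    nodes = sorted({node for edge in signs for node in edge})
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        trio = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(edge in signs for edge in trio):  # only closed triangles count
            product = signs[trio[0]] * signs[trio[1]] * signs[trio[2]]
            if product > 0:
                balanced += 1
            else:
                unbalanced += 1
    return balanced, unbalanced


# Hypothetical valence scores, e.g. averaged sentiment of messages on each link.
valences = {
    ("ana", "ben"): 0.8,
    ("ben", "cat"): -0.6,
    ("ana", "cat"): -0.4,
    ("cat", "dan"): 0.5,
    ("ben", "dan"): 0.3,
}

result = triad_balance(edge_signs(valences))
print(result)
```

Here the triangle ana-ben-cat (+, -, -) is balanced, while ben-cat-dan (-, +, +) is unbalanced. Exhaustive enumeration over node triples is simple but cubic in the number of nodes, so larger graphs would call for triangle enumeration over adjacency lists instead.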
How can we comply with rules and still innovate?
We study practical ethics for working with human-centered and online data. The collection and analysis of human-centered and/or online data are governed by multiple sets of norms and regulations. Problems can arise when researchers are unaware of the applicable rules, uninformed about their practical meaning and mutual compatibility, or insufficiently skilled in implementing them. We are developing and delivering educational modules to address this issue...
In our current work, funded by IMO (Intelligent Medical Objects), a private company that develops, manages and licenses medical vocabularies, we evaluate the coverage and accuracy of various medical terminologies and test strategies for increasing the precision of mapping medical reports to standardized terminologies...