Integrating methods from network analysis, natural language processing, and machine learning with theories from the social sciences to advance knowledge and discovery about interaction-based and communication-based social systems


Completed grants and projects:

Impact Assessment
Biases in Data and Technology, and Data Quality and Provenance; Scientometrics
KISTI (Korea Institute of Science and Technology Information) Projects

Project Description:
Dr. Jana Diesner led a three-phase project spanning three years (2014-2016) with three goals: to measure how data quality (name ambiguity) can affect micro- and macro-level findings from bibliometric data, including data generated and maintained by KISTI (Phase 1); to investigate, based on quality-controlled KISTI data, the structure and evolution of collaboration patterns among almost 700,000 Korean scholars over 65 years (Phase 2); and to identify which mechanisms govern collaborator selection and influence among scholars, drawing on social theories of human interaction (Phase 3).
Major findings include that (1) data quality can severely distort our understanding of bibliometric data, contrary to the widespread assumption among informetrics scholars that such distortion is negligible; (2) scholarly collaboration in Korea has grown exponentially but has trended toward a highly centralized, hierarchical structure that depends heavily on a small number of top scholars; and (3) traditional analyses based on network structure explain or predict little of the formation of collaborations among scholars, which led Dr. Diesner to propose a new framework for detecting the impact of influence in scientific collaboration.

Funder: KISTI (Korea Institute of Science and Technology Information)

Data Governance
Academia-Industry Big Data Collaboration for Early Career Researchers program: Developing organizational expertise and resources for the responsible conduct of research with human generated and publicly available data, 2016

Project Description:
Researchers frequently collect, use, and analyze publicly available data from social networking platforms, online production communities, and customer review websites, among other sources, as part of their research activities. Accessing and obtaining public data can be technically feasible and straightforward, whereas considering the ethics, norms, and regulations applicable to these data requires additional awareness, knowledge, and skills. This is partially because multiple types of rules may apply. We provide an overview of these types of regulations and clarify the common confusion between free as in “free speech” (i.e., freedom from restriction, “libre”) and free as in “free beer” (i.e., freedom from cost, “gratis”). We outline several approaches to responsibly conducting research with publicly available data, as well as solutions for enhancing researchers' expertise in implementing data regulations, especially in human-centered data science.

Funder: Midwest Big Data Hub & Computing Community Consortium

Information Extraction and Evaluation
