TextTransfer – Corpus-based detection of secondary use of scientific publications, 2017-2019
In this collaborative project, we use Natural Language Processing and Machine Learning to identify secondary practical uses of research findings in the final reports of grant-funded work. Such reports are often stored in specialized databases, where long-term archiving activities focus on standardization, interoperability, and information indexing and retrieval. However, secondary use of the reports themselves is often neither enabled nor enforced, limiting the replication and reusability of research. We identify practically relevant patterns in text data using information extraction techniques and detect transferable knowledge (from basic research to applications) in selected domains.
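As a minimal sketch of the pattern-based information extraction idea described above: the snippet below flags sentences in a report that contain cue phrases suggesting a transfer from research to application. The cue phrases, function name, and sample text are invented for illustration and are not the project's actual patterns or data.

```python
import re

# Hypothetical cue phrases that might signal secondary/practical use of
# research findings (illustrative only, not the project's real patterns).
CUE_PATTERNS = [
    r"\bapplied in\b",
    r"\btransferred to\b",
    r"\bused in practice\b",
    r"\bcommercializ(?:ed|ation)\b",
]

def find_transfer_sentences(text: str) -> list[str]:
    """Return sentences that match at least one application-cue pattern."""
    # Naive sentence split on end-of-sentence punctuation followed by space.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    cues = [re.compile(p, re.IGNORECASE) for p in CUE_PATTERNS]
    return [s for s in sentences if any(c.search(s) for c in cues)]

# Invented example report text:
report = (
    "We characterized the alloy's fatigue behavior. "
    "The resulting coating was later applied in turbine maintenance. "
    "Funding covered two doctoral positions."
)
print(find_transfer_sentences(report))
# → ['The resulting coating was later applied in turbine maintenance.']
```

A production system would replace the regex cues with trained classifiers or dependency patterns, but the basic pipeline — segment, match, rank candidate passages — stays the same.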
Funder: Federal Ministry of Education and Research (Germany) and Institute for German Language (IDS) (Germany)
Biases, Fairness, and Consequences of Limited Data Quality and Provenance
Alliance for IoBT Research on Evolving Intelligent Goal-driven Networks (IoBT REIGN), 2017-2022
Jana Diesner, assistant professor and PhD program director at the iSchool, is co-principal investigator on a multi-institutional initiative funded by the Army Research Lab to enable new predictive battlefield analytics and services. Diesner will collaborate on the project with researchers from Computer Science and Electrical and Computer Engineering. (Project description courtesy of the U of I Coordinated Science Lab)
Funder: Army Research Lab Cooperative Agreement W911NF-17-2-0196
Review and Assessment of the Usage of Computational Methods for Humanitarian Assistance and Disaster Relief (HADR) Efforts, and Scalable Measurement of Emergency Response from Text Data, 2017-2018
The Humanitarian Assistance and Disaster Relief (HADR) project produced a review of prior work that uses computational methods, as well as a review of the properties of data sources, for improving HADR efforts. We brought this knowledge into an application context by computing and then comparing the situational awareness gained about a disaster depending on the computational analysis methods and information sources used. For this purpose, we applied Natural Language Processing methods to large-scale text corpora.
Funder: Department of Homeland Security and Critical Infrastructure Resilience Institute
Data Governance and Practical Ethics
Regulations for Human-Centered Data Science, 2017-present
Data Science projects often involve the collection and analysis of online and open data. These data are governed by multiple sets of norms and regulations. Problems can arise when researchers are unaware of applicable rules, uninformed about their practical meaning, and insufficiently skilled in implementing them. To address this issue, we develop and provide educational materials, and conduct research on NLP for data regulations.
Funder: Office of the CIO/Technology Services at UIUC
ConText: A solution to support text and network analysis.
Completed grants and projects:
#0155-0370: Computational impact assessment of issue focused media and information products, 2015-2016
#0145-0558: Computational impact assessment of social justice documentaries and media, 2014-2015
#0125-6162: Computational impact assessment of social justice documentaries, 2012-2014
Funder: FORD Foundation
The overall goal of this project was to propose a solution for measuring the impact of social justice documentaries in a theoretically grounded, systematic, empirical, scalable, and rigorous fashion using computational approaches, and to gain novel substantive knowledge that provides actionable insights for filmmakers and funders.
The project was conducted over the 2012-2016 period, and all intended actions and goals were achieved and made publicly available in the form of innovative software (ConText), software training material and education for students, scholars, and practitioners, and publications in premier journals and conferences that document our improved and new methodological steps as well as the substantive knowledge gained by applying these novel methods. In particular, the proposed framework and technologies for impact measurement were tested against and applied to real-world media products. The project made meaningful progress on developing and evaluating new, empirical, and scalable ways to measure the impact of information products, especially issue-focused media, on individuals, groups, and society.
Predictive modeling for impact assessment, 2015-2016
Funder: National Center for Supercomputing Applications Faculty Fellowship
#2014-04922: Socio-technical data analytics for improving impact and impact assessment, 2014-2015
Funder: Anheuser Busch
Evaluation of the performance and usability of the ConText technology, 2013-2014
Funder: Center for Investigative Reporting
- Peace is Loud
- Shoot First Inc.
- Picture Motion
- VM People
- Robert Stone Productions
Bias, Data Quality and Provenance
Analysis of academic activity patterns in academic literature, 2017 (Co-PI: Vetle Torvik)
Selection versus Homophily in Scientific Collaboration Networks, 2016
#C3240: Modeling of nation-scale scientific collaboration networks and impact of entity disambiguation on big network data, 2015
#P14033: Authority data based scientist network analysis, 2014
Funder: KISTI (Korea Institute of Science and Technology Information)
Dr. Jana Diesner led a three-phase project spanning three years (2014-2016) to measure how data quality (name ambiguity) can affect micro- and macro-level findings from bibliometric data, including data generated and maintained by KISTI (Phase 1); to investigate, based on quality-controlled KISTI data, the structure and evolution of collaboration patterns among almost 700,000 Korean scholars over 65 years (Phase 2); and to identify which mechanisms govern collaborator selection and influence among scholars, drawing on social theories of human interaction (Phase 3).
Major findings include that (1) data quality issues can severely distort our understanding of bibliometric data, contrary to the wide assumption among informetrics scholars that such distortion is negligible, (2) scholarly collaboration in Korea has grown exponentially but has trended toward a highly centralized, hierarchical structure that depends heavily on a small number of top scholars, and (3) traditional network-structure-based analyses do not explain or predict much of the formation of collaboration among scholars, leading Dr. Diesner to propose a new framework for detecting the impact of influence in scientific collaboration.
Academia-Industry Big Data Collaboration for Early Career Researchers program: Developing organizational expertise and resources for the responsible conduct of research with human generated and publicly available data, 2016
Funder: Midwest Big Data Hub & Computing Community Consortium
Researchers frequently collect, use, and analyze publicly available data from social networking platforms, online production communities, and customer review websites, among other sources, as part of their research activities. Technically, it can be feasible and straightforward to access and obtain public data, while considering the ethics, norms, and regulations applicable to these data requires additional awareness, knowledge, and skills. This is partially because multiple types of rules may apply. We provide an overview of these types of regulations, and also clarify common misconceptions between free as in "free speech" (i.e., freedom from restriction, "libre") and free as in "free beer" (i.e., freedom from cost, "gratis"). We outline several approaches to responsibly conducting research with publicly available data, and solutions for enhancing researchers' expertise in implementing data regulations, especially in the case of human-centered data science.
Information Extraction and Evaluation
#2014-04797-00-00: Predictive modeling for detection and classification of medical entities and facts, 2015-2017
#2013-05716-00-00: Computational expansion of controlled medical vocabulary and identification of organizational social media footprint, 2013
Funder: Intelligent Medical Objects
Developing an entity extractor for the scalable construction of semantically rich socio-technical network data, 2013 (Co-PI: Brent Fegley)
Funder: Start-up allocation award from Extreme Science and Engineering Discovery Environment (XSEDE)