Online Workshop 3: What challenges do Machine Learning and AI raise in terms of privacy, ethics, research integrity, reproducibility, and bias?
This workshop will be held online, hosted on Zoom by Durham University, on Thursday 27th January from 2pm to 4:30pm and Friday 28th January from 3pm to 5:30pm UK time.
The workshop is free to attend, but registration is essential to receive the joining information. Please complete the Google form to register by Tuesday 25th January: https://forms.gle/RKexThosKPBidCDbA
This workshop, the third in the series, looks at privacy and the uses of AI. It asks how far we, as users and as information professionals, trust AI both in terms of how transparently algorithms are constructed, and what their creators say about them. How far are we able to cut through promotional hype and evaluate the affordances of AI for use in cultural heritage? How do we allow for potential biases in the construction of algorithms? How should we advise potential new users of such technologies?
The workshop will consist of two keynote talks, one on each day, each followed by an open discussion session in which participants will be encouraged to discuss questions and make recommendations for future action.
The keynote speakers will be Jason R. Baron (University of Maryland) and Alexandra Cristea (Durham University).
Alexandra Cristea (Thursday 27th January)
Bio: Alexandra I. Cristea is Professor, Deputy Head, Director of Research and Head of the Artificial Intelligence in Human Systems research group in the Department of Computer Science at Durham University. She is an Advisory Board Member at Ustinov College and the N8 CIR Digital Humanities team lead for Durham. Her research includes web science, learning analytics, user modelling and personalisation, the semantic web, the social web, and authoring, and she has written over 300 papers on these subjects. Her work on frameworks for adaptive systems has influenced many researchers and is highly cited. She was ranked among the top 50 researchers in the world in the area of educational computer-based research according to Microsoft Research. Recently she has taken giving back to the community to a different level, leading the project Empowering women in science through mentoring and exchanging experiences (2021-22; a UK-Brazil Gender Equality Partnership funded by the British Council) and co-leading the TechUP project series (2019-2020, training 100 women in computer science from various (BAME) backgrounds; Bootcamp 2021).
Title: Bias in AI.
Abstract: Artificial Intelligence is a thriving area in Computer Science. Especially trending are the sub-areas of Machine Learning and Deep Learning, including Data Analytics. However, these often come with various forms of bias. Bias in AI can be introduced in many forms, from data to methods and algorithms, and it negatively affects people as well as research quality. It also has an impact on an increasing number of areas, including sensitive ones such as healthcare, law, criminal justice, and hiring. Thus, an important task for researchers is to use AI to identify and reduce (human or machine) biases, as well as to improve AI systems so that they do not introduce or perpetuate bias.
Aspects of Bias in AI range:
- from statistical/theoretical perspectives – where bias should be avoided with new algorithmic solutions; methodologically correct procedures (addressing, e.g., bias induced by overlapping training/test sets, historically inaccurate time series, or reporting only average accuracy in classification); sensitivity analysis (including k-anonymity, l-diversity, t-closeness, k-safety, k-confusability, t-plausibility) for structured/unstructured data; or ways of quantifying uncertainty in deep learning, e.g., via adversarial learning, generative models, invertible networks, meta-learning nets.
- to human perspectives – where specific types of bias introduced by data or methodology can do harm, such as implicit racial, ethnic, gender, and ideological biases.
The former perspectives aim to produce correct or optimised results; the latter aim to lead to conversational explanations and explainable AI, in view of GDPR and increasing ethical concerns, and of the move from symbolic AI to sub-symbolic (deep) representations, which offer no direct answer to the classic AI questions of ‘Why’ and ‘How’. This includes the novel field of Machine Teaching, expanding on the classical field of knowledge extraction from (shallow or, more recently, deep) Neural Networks. This area should lead to novel insights into the accountability of AI. This talk will consider some of these aspects of Bias in AI and lead to thoughts and possibly a wider discussion on the social impact of AI.
Jason R. Baron (Friday 28th January)
Bio: Jason R. Baron is a Professor of the Practice in the College of Information Studies at the University of Maryland. Previously, he served as the first appointed director of litigation at the US National Archives and Records Administration, and before that as a trial lawyer and senior counsel at the Department of Justice. In those capacities, he acted as lead counsel on landmark lawsuits involving the preservation of White House email, and also played a leading role in improving federal electronic recordkeeping policies. Mr. Baron is a recipient of the international Emmett Leahy Award for his achievements in records and information management, including co-founding a Legal Track at the US National Institute of Standards and Technology Text Retrieval Conference (TREC), to evaluate the efficacy of machine learning methods as used in legal practice. He served as lead editor of the book Perspectives on Predictive Coding and Other Advanced Search Methods for the Legal Practitioner (2016), and is the author of over 100 published articles on e-discovery, electronic recordkeeping, and information governance. Mr. Baron received his B.A. magna cum laude with honors from Wesleyan University, and his J.D. from the Boston University School of Law.
Title: Challenges in Providing Access To The Digital Universe: Are Algorithms The Answer?
Abstract: The sheer volume of electronic and digital records in archives and other cultural institutions is already overwhelming the ability to provide meaningful access to patrons and the public at large. In the United States, the National Archives currently holds over a billion pages of White House emails and attachments going back to the 1980s, only a fraction of which are publicly available. Machine learning techniques, properly applied, may be useful in searching for relevant records and filtering those records for personal information and other sensitive content. But what are the privacy-related and other obstacles we presently encounter in using AI methods? Can we trust the algorithms used to open up archival collections? And if we don’t trust AI, does going digital mean for all practical purposes going dark for many decades to come?