Interview with Nisarg Shah: Understanding fairness in AI and machine learning
During the 33rd International Joint Conference on Artificial Intelligence (IJCAI), held in Jeju, I had the opportunity to meet with one of the keynote speakers, and winner of the 2024 IJCAI Computers and Thought Award, Professor Nisarg Shah. I asked him about his research, the role of theory in machine learning research, fairness and safety guarantees, regulation, conference reviews, and advice for those just starting out on their research journey.
Kumar Kshitij Patel (KKP): Professor Shah, thanks for joining me. Could you start by telling us about yourself, your career, and your education?
Nisarg Shah (NS): I grew up in India and went to IIT Bombay for my undergraduate degree. Ever since then, I knew that I wanted to go into higher education and academia. I actually did do an industrial placement after my undergrad, and I got a job offer that was very lucrative, certainly more lucrative than doing a PhD. However, that [money] is not why I wanted to do my PhD. I wanted to do my PhD because I was genuinely curious about different questions in this field, and I wanted to study more about them and have fun while doing it. And that brought me to a PhD at Carnegie Mellon University (CMU). I had a fantastic time and a fantastic advisor, Ariel Procaccia. After that, I got an offer from the University of Toronto, and I deferred it for a year to take a postdoc at Harvard with David Parkes and Yiling Chen. Then, finally, I took up the faculty position at the University of Toronto. I got tenure in July 2023, and I’m now an associate professor.
KKP: You said you got interested in the field during your undergraduate degree. Did you get interested in theoretical computer science generally, applied or theoretical machine learning, or have you always wanted to work on fairness-related questions?
NS: That’s an interesting question and one that many students ask me – “How do you know what field you want to work in?” And I think the answer is that you gradually discover it. Right from school age, I knew that I liked the more mathematically-oriented approaches. When I was looking at different disciplines for my undergraduate studies, I realized that theoretical computer science is really just applied math, and I can apply it to problems that I relate to and that interest and excite me. So that led me down the path of computer science within this general mathematically-oriented discipline. When I joined my PhD, I was open to exploring different areas. CMU has this great system where, for the first month, all you do is go and meet with different faculty; there are no classes. I was working with three different faculty members at the same time on three different projects in three completely different areas: one in theoretical computer science, one in computational social choice, and another one on auctions. After working on all of them, I realized that computational social choice seemed the most exciting to me, because it looks at problems that I can really see out there and relate to, and then finds very appealing and very clean mathematical formulations of them. And then, once you derive some results, you can actually see how those results directly have an impact in the real-world setting that you are trying to model. Even if your modeling is not completely perfect, it can still give you a lot of insight into what you should do in the real world, whereas some of the problems in theoretical computer science were a bit more abstract. So that led me down the computational social choice path, within which I then got more interested in fairness. Humans have been interested in fairness for millennia, and we have a lot of intuition about it. However, it was very, very underdeveloped mathematically, and I saw a lot of potential there. Now, I’m turning from fairness to the broader problem of value alignment – the value could be fairness, safety, or a lot of different things.
KKP: You already touched on this in your previous answer, but what do you think is the role of theory/foundation work in machine learning?
NS: That’s a great question. I mean, my thoughts on this subject are constantly evolving. Had you asked me this six months ago, the answer would have been very different. Currently, I think that there are certain aspects of machine learning systems where it’s not as important to know theoretically that things will work, as long as they tend to work in practice. The prominent example is accuracy. For example, if you are designing a translation system, a few errors here and there are not a big concern. You just want the system to work well on average. And there I think that the machine learning community has done a good job of focusing on driving empirical performance rather than being tied down to whatever we can prove theoretically.
But then there are some other aspects, such as fairness or safety, where we are not happy with just knowing that it seems to work most of the time. If you think about safety, even if the system fails just once and there is a terrible safety violation, that can have serious repercussions. So, there we really want to have a formal guarantee to provide to the users, where we can prove that the system is safe or fair. You cannot just deploy an AI model that’s going to affect people and then say, “it seems to be fair in all my experiments.” So that’s why, in certain aspects, we want provable guarantees. That’s the one big role of theory. And then for certain other aspects, even though theory may not be strictly necessary, it can still give great insight.
KKP: For systems where we do not even understand why they learn, can we hope to give guarantees about their fairness and robustness? And if not, is it the case that in many of the situations where we care about fairness and robustness, simpler machine learning systems end up being deployed? Is it possible to give both guarantees about optimization or generalization and guarantees about fairness and robustness?
NS: That is something I’ve been wrestling with myself. I’ve been thinking about reinforcement learning from human feedback (RLHF) for large language models (LLMs). A member of my group is exploring the idea of using sortition algorithms for annotator selection so that we can give formal guarantees on how representative those annotations of norms or values would be. But that’s not enough. Even if we have guarantees for the annotators’ norms and values, when we train our LLM using that feedback, we need guarantees on the final performance of the LLM. The reinforcement learning environment is just not structured enough. But there are lots of formal things you can prove for reinforcement learning, so it remains to be seen whether the fairness guarantees we have on the input to the reinforcement learning pipeline can be translated into guarantees on its output.
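To make the sortition idea concrete, here is a minimal illustrative sketch (not Shah’s group’s actual method): it stratifies a pool of candidate annotators by group and draws each group’s quota by lottery, so that the selected panel roughly mirrors assumed population shares. The group names, shares, and the simple rounding-based quota rule are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def sortition_panel(candidates, group_of, population_shares, panel_size, seed=0):
    """Select an annotator panel by lottery, with per-group quotas
    proportional to each group's (assumed) share of the population."""
    rng = random.Random(seed)

    # Quota per group: population share times panel size (simplistic rounding).
    quotas = {g: round(share * panel_size) for g, share in population_shares.items()}

    # Pool candidates by group, then draw each group's quota uniformly at random.
    pools = defaultdict(list)
    for c in candidates:
        pools[group_of[c]].append(c)

    panel = []
    for group, quota in quotas.items():
        pool = pools.get(group, [])
        panel.extend(rng.sample(pool, min(quota, len(pool))))
    return panel

# Example usage with made-up groups and shares (hypothetical data).
candidates = [f"annotator_{i}" for i in range(100)]
group_of = {c: ("group_A" if i % 3 else "group_B") for i, c in enumerate(candidates)}
shares = {"group_A": 0.6, "group_B": 0.4}
print(sortition_panel(candidates, group_of, shares, panel_size=10))
```

The point of the sketch is only that lottery-based selection with explicit quotas gives a representation guarantee you can state and check, which is the property Shah contrasts with the unstructured RLHF stage that follows.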
KKP: Let’s say we have a hypothetical scenario where we care about some fairness or robustness question, or there’s some other safety concern that arises because of the deployment of a machine learning system. Would you prefer to deploy a simpler system for which you can give these guarantees, or would you be fine with deploying a more accurate, larger system for which we can only give heuristic guarantees?
NS: It all depends on the trade-off, right? If you asked me to deploy just decision trees instead of the state-of-the-art systems, then no, I would not be willing to do that. But I think that we should be able to derive formal guarantees for systems that are much, much more powerful, powerful enough that we would rather deploy those systems with their provable formal guarantees than systems that are only slightly better but do not come with any provable guarantees.
KKP: I see. So until we get there, there’s sort of a murky ground…
NS: Until we get there, for sure. Right now, we are all using ChatGPT and other LLMs, and there’s a reason why we are doing so. There’s no getting around it: these systems are doing things that other systems just aren’t able to do. So, until we get there, we are certainly going to continue using them. But that’s why I think that it’s also important to boost funding for ethics and safety. Currently, a lot of the funding for AI comes from industry and a lot of that is on boosting core technology, the core capabilities of the systems. A large chunk of the ethics and safety funding comes from government sources, but it’s not enough. The governmental funding agencies are trying to balance the funding that they give out for core ML capability development versus ethics and safety development. But they sometimes don’t realize that they are not working in an isolated environment. There is a lot of industry funding that’s already going towards the core capability development. So, these government agencies need to give significantly higher levels of funding for ethics and safety. Each agency thinks that the right split is fifty-fifty (fifty for ethics and safety, fifty for capability development), not realizing that they are not the only player in the whole ecosystem.
KKP: That makes a lot of sense. So, given that you talked about your undergraduate degree and how you got interested in both math and theoretical computer science, how have you seen the relationship between theory and applied machine learning / AI change over the years, both in the broader ML/AI community and, more specifically, in fairness?
NS: I think that it’s been a roller coaster ride. There were years when math and formal approaches were getting increasing attention within ML and AI. Initially, when deep learning systems came out, everybody was trying to understand and prove generalization guarantees. Then, these systems took off, and many people lost their belief that we would ever be able to prove anything formal about systems as advanced as these transformer-based systems, and they suddenly became very uninterested in mathematical approaches. But now I think those approaches are making a comeback. There is a realization that, even though you may not need maths for everything, there are certain things for which you need it. There are lots of prominent people working on AI safety who have advocated that the only feasible approach to AI safety is one that gives formal guarantees, in other words, making sure that the system will never take an unsafe action. It’s not enough to just run simulations and know that in simulations the system seems to be working safely. You need to know that the system will not be unsafe in the real world. I think that with safety concerns at the forefront, mathematical approaches are making a comeback.

In the fairness community, people have always realized that, to some extent, we can have a lot of interesting, vague ideas about the social aspects of fairness, but at the end of the day, these AI systems are mathematical systems. Unless we find a way to translate those social ideas around fairness into technical approaches and encode them into the AI systems, the AI systems are not going to behave in a fair manner.

That’s something I actually explored in an assignment in a new graduate course that I designed just last semester. The course is on the mathematical foundations of algorithmic fairness. In one of the assignments, I cover one of the prominent social approaches to fairness, which is that, when you are in a company forming a team of data scientists and engineers, you should always strive to get people from various backgrounds, because the more diversity you have in your team, the more different viewpoints and thought processes you will have, and that leads to the team developing more ethical models. In the assignment, I probe the students to think about the different mechanisms through which a more diverse team of data scientists and engineers training an AI model can actually make the model fairer. They have to take this intuition (that a more diverse team is going to be better) and think about the mechanisms through which this can happen. We need to understand the exact mechanisms through which diversity and inclusion help with concrete aspects such as fairness or safety. If we understand exactly how diversity and inclusion can help with fairness, there may be interventions we can take that particularly boost that effect and lead to much fairer AI systems.
KKP: If you were given a free hand in a hypothetical world, what one AI policy would you like to either bring forward or change?
NS: I have a lot of thoughts on this. However, relating to my subject expertise of fairness, I would say that there are a few things that really need to happen. One is that all the AI regulations we have are extremely vague, to the point of essentially being meaningless. For example, the AI and Data Act that is going through parliament right now in Canada essentially has language about fairness that says you shouldn’t discriminate between certain groups of people – that’s nothing new compared to what the law already said. It doesn’t tell us what discrimination is; it doesn’t tell us about other groups that are not protected. It’s a very crude, limited, and vague approach to fairness, which doesn’t provide any operational guidance. I understand the need for vagueness, but if you think about the laws that we have in our society, they still provide a lot more operational guidance. For example, I know that I shouldn’t jaywalk or steal from somebody else, so there is a lot of concrete knowledge that I can derive from the laws and regulations that we have in the world. However, for AI systems, if you ask a data scientist or an engineer to read an entire regulation or law, what does it tell them about how they should train their AI system? I think it would tell them very, very close to zero. That’s something that we really need to improve. We need to find language and frameworks that capture different scenarios and, at the same time, provide more operational guidance. At the moment, it’s all being decided by the individual data scientist who is training the model inside a company, and I think that this is a matter of too great importance to be left in the hands of one person. We need broader societal discussion and input. Then, the discussion needs to be narrowed down into an operational law or regulation.

In terms of concrete approaches, I would start by holding more town halls with the public. I’m actually collaborating with a nonprofit, AIGS Canada, as an advisor. They are launching something called “National Conversation on AI,” and their plan is to hold town halls, debates, panels, and discussions across Canada. The first step would be to inform the public about what AI is and isn’t, then collect their more informed views and take those views to the policymakers. The next step would be to take all these collective societal norms and views and use them to form policies that are still very broadly applicable yet provide a lot of operational guidance for the data scientists and engineers who are actually training these models.
KKP: I want to ask the reverse question now. In your talk yesterday, you mentioned how AI, or fairness in general, can take inspiration from governance, in particular representative democracy. I wonder if you also have thoughts on how governance can improve by using AI, becoming more efficient or fairer? Also, where there are information asymmetries, can we break them by using technology, or AI in particular?
NS: That’s a great question. I am going to stick to democracy, which I’ve done a lot of work on. We have thought about a lot of approaches for using AI to improve democracy. In the social choice literature it’s often divided into two formulations or viewpoints. One is the viewpoint of subjective preferences, where every voter has their own subjective preferences, and there are no right or wrong preferences. Everybody just wants different things, and then the goal of the election is to find some sort of a consensus or trade-off. So, you find a candidate who will lead the country in a direction that may not be perfectly aligned with any group of voters, but it’s a good trade-off between what the different voters want.
Then there is a very different viewpoint where there is a ground truth answer. There is a candidate that’s truly the best for the country and each voter just has a noisy perception of which candidate is the right candidate. We want to aggregate all these votes and use the wisdom of the crowd to remove the noisy perception of the voters and figure out the ground truth.
I think that the true answer lies somewhere in between. So, even though I want a candidate who would, let’s say, optimize some objective function (be that social equality or improved GDP), I also have certain perceptions about which candidate would be the best one to move the country in that direction. And the thing is that my perception of these candidates is based on the speeches that the candidates gave and on what else I read in the media, and there could be some factual misinformation there. What I think we need to do in democratic systems is to help voters correct their factual misinformation. That’s where I think AI has a role to play, because AI systems may be able to detect the factual misinformation that you’ve been given. An LLM could probe voters through a dialogue and help them establish their own preferences, basing them on factual information rather than on what has been shown to be incorrect in speeches or in the media. That’s a role that AI can play in helping to correct misinformation.
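As a hedged illustration of the ground-truth viewpoint described above (a sketch added for this write-up, not something from the interview), the short simulation below shows the wisdom-of-the-crowd effect behind it: if each voter independently identifies the truly better of two candidates with probability just above one half, a simple majority recovers the ground truth with probability approaching one as the electorate grows. The voter accuracy and electorate sizes are made-up numbers.

```python
import random

def majority_recovers_truth(num_voters, p_correct, trials=10_000, seed=0):
    """Estimate how often a simple majority of independent, noisy voters
    picks the ground-truth-best of two candidates.

    p_correct: probability that an individual voter's noisy perception
    points at the truly better candidate (assumed > 0.5).
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        correct_votes = sum(rng.random() < p_correct for _ in range(num_voters))
        if correct_votes > num_voters / 2:  # strict majority for the true best
            successes += 1
    return successes / trials

# Each voter is only slightly better than a coin flip, yet the crowd
# becomes reliable as it grows (the Condorcet jury theorem intuition).
for n in (11, 101, 1001):
    print(n, majority_recovers_truth(n, p_correct=0.55))
```

The subjective-preferences viewpoint Shah describes first has no such simulation: there is no ground truth to recover, only a trade-off to negotiate, which is exactly why he argues the realistic setting sits between the two.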
KKP: How do you view big tech’s role in relation to some of these issues? In the past few years there have been a lot of controversies about what sort of messages they amplify versus subdue. And there have been a lot of changes at social media platforms, where they’ve been dismantling the teams that did much of the work of marking things as factual or not. So, where do you see big tech fitting in here?
NS: A lot of the information that we consume comes through the filter of big tech. So it certainly has a dramatic effect on society and, as such, I think that it does require very, very strong regulation now. Often, big companies like Google, Amazon, or Meta have a very interesting approach when it comes to regulation, which is that they are OK with certain types of regulation, such as GDPR, which work in their favor. A company like Google has the resources to change its infrastructure to comply with GDPR, whereas a lot of smaller companies may not, and they might be driven out of EU markets. However, when it comes to regulations that they believe go against their company interest, they object very strongly. For example, the UK AI Safety Institute demanded access to state-of-the-art models before they were deployed. When Anthropic recently deployed its next generative model, it did not provide any pre-deployment access window to the AI Safety Institute, and this caused a big uproar. Anthropic basically said that the idea is nice, but it’s not practically feasible, and they can’t give access to the model before it’s deployed. I think that we need more expertise within the government and the regulatory agencies so that they can understand the state-of-the-art systems as they are being developed and can play a significant technical role in auditing these systems, in guiding their development, and in putting in place frameworks that constrain these systems to be ethical and safe. That’s where I think the government needs to recruit talent that can help it keep up with the development of the AI community and take charge, rather than letting the companies self-regulate.
KKP: That makes a lot of sense. We’ve talked about a lot of things, and I want to zoom in a little on our own community and its health. I wanted to ask your opinion on peer review at both the large AI and machine learning conferences and at smaller conferences. How do you see peer review differing across conferences of different scales? Do you think we are in a healthy state as a community in terms of doing a good service, and what challenges do you see emerging here?
NS: Initially, I used to believe that it was all about scale: when you have a small conference, it’s easy to control the quality of the reviews, and when you have a large conference, there’s not much you can control. Then, over the past years, my experience has been quite different. I’ve seen a lot of small conferences having terribly low-information reviews and some large conferences having more informative reviews. In addition, sometimes the same large conference might have an extremely bad review process one year, and then the next year a review process that people are happy with. I think that a lot of this comes down to small choices made during the organization of the conference. We need to look at data from all the past conferences, collect author sentiment on the review process, and then try to detect patterns in which choices made the review process much better. But at a high level, scale is not really that important, because as fields grow and have more people, you get more submissions, but you also get more reviewing expertise. The issue is that, in some fields like machine learning, I’ve seen senior people not really participating in the review process because they have 20-plus PhD students that they’re managing and they feel that they do not have the time to review a paper, let alone serve as an area chair. So, I think that we need mechanisms to ensure that more senior people are involved in the review process. A lot of the senior people have a net negative contribution to the review process, in the sense that they submit a lot of papers and receive a lot of reviews, using up a lot of the community’s hours, but do not invest that many hours back. Just to mention another issue on the topic of reviewing: I’m not sure that reviewers are given sufficient instructions or guidance on what high-quality reviews look like.
KKP: Finally, do you have any advice for junior researchers or PhD students who are starting in the areas of fairness or theoretical machine learning that perhaps you would have liked to have known when you were in their shoes?
NS: We used to be in an era where sometimes it was OK to just put your head down into your research, find interesting mathematical problems, solve them, publish, move on, find new problems, solve, publish, move on. Now, we are in an era where the boundary between research and the real world is much more blurred than it was before, even for the theoretical disciplines. I think that now the process is starting to be very different – you think about what’s problematic or missing in the real world, and then you try to model that in your theoretical discipline. You should constantly keep in mind whether your solution is actually trying to fix the issue that you started from. If it’s not, then there’s something wrong with your modeling. The right approach is to go back and fix your model so that the solution you design actually says something meaningful about the problem from the real world that you were excited about in the first place. I think that this whole process of thinking about what your research means out there in the real world is something that’s becoming increasingly relevant, and I only started doing this a little bit later in my PhD. I would recommend to PhD students that they be attuned to this process at the start rather than having to change their mindset later on.
About Nisarg
Nisarg Shah is an Associate Professor of Computer Science at the University of Toronto. He is also a Research Lead for Ethics of AI at the Schwartz Reisman Institute for Technology and Society, a Faculty Affiliate of the Vector Institute for Artificial Intelligence, and an Advisor to the nonprofit Spliddit.org, which has helped more than 250,000 people make provably fair decisions in their everyday lives. He earned his PhD in computer science at Carnegie Mellon University and was a postdoctoral fellow at Harvard University.