AI and Information Literacy: Bias & Ethics

Learn about the intersection of Artificial Intelligence (AI) and information literacy.

Introduction: Ethical Considerations for Working with AI

Alongside the exciting possibilities of AI-based tools, there are a number of issues to be aware of as you assess if and when you want to use them. Consider the following:

What is Bias in AI?

While it may be tempting to think of the output of an AI-based tool as neutral when it comes to bias, that is not the case. Since machine learning models are trained on real-world datasets, and since the world contains bias, it is safe to assume that outputs from these models may replicate or even exacerbate biases we see in the world around us.

These biases are not always intentional, and they may never be entirely eliminated despite ongoing efforts to mitigate them. The presence of bias in generative AI does not necessarily mean that these models should not be used. It does mean that they should be used with caution and with an understanding of the human decisions that go into creating them.

The National Institute of Standards and Technology (NIST) produced a report that identifies three broad categories of AI bias:

  1. Systemic - resulting from long-standing institutional procedures and practices that advantage some social groups and disadvantage others (e.g., institutional racism and sexism).
  2. Statistical and Computational - resulting from data that is not representative of the population and from modeling decisions such as over- or under-fitting, treatment of outliers, and data cleaning and imputation choices (see the sketch after this list).
  3. Human - resulting from systematic errors of human thought, such as confirmation bias, where people tend to prefer information that aligns with their existing beliefs.
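
To make the second category concrete, here is a minimal sketch in Python (the tooling, numbers, and group labels are hypothetical illustrations, not anything from the NIST report): a model trained on a sample that under-collects positive examples for one group carries that skew into its predictions, even when the two groups are in fact identical.

    # A minimal, hypothetical sketch of "statistical and computational" bias:
    # both groups truly qualify at the same 50% rate, but the collected
    # training data happens to contain few positive examples for group B.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(42)

    def make_data(n, positive_rate_b):
        group = rng.integers(0, 2, size=n)               # 0 = group A, 1 = group B
        rate = np.where(group == 0, 0.5, positive_rate_b)
        labels = (rng.random(n) < rate).astype(int)
        noise = rng.normal(size=n)                       # an uninformative feature
        return np.column_stack([group, noise]), labels, group

    # Biased collection: only 10% of group B's training examples are positive.
    X_train, y_train, _ = make_data(5000, positive_rate_b=0.10)
    model = LogisticRegression().fit(X_train, y_train)

    # Representative test set: both groups truly have a 50% positive rate.
    X_test, _, g_test = make_data(5000, positive_rate_b=0.50)
    proba = model.predict_proba(X_test)[:, 1]
    print(f"mean predicted rate, group A: {proba[g_test == 0].mean():.2f}")  # ~0.50
    print(f"mean predicted rate, group B: {proba[g_test == 1].mean():.2f}")  # ~0.10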

ChatGPT and similar models are trained primarily on content scraped from the Internet: websites, e-books, social media platforms, and conversational data (from chat logs, comment sections, etc.). It may seem like the sheer scale of this data would produce a diverse dataset, but the Internet is not equally accessible to all groups, and more mainstream content tends to be scraped for inclusion in the training data. As a result, Web content overall is not representative of the population and tends to skew toward younger, male, English-speaking viewpoints. (For more, read Bender, Gebru, et al., "On the Dangers of Stochastic Parrots.")

Models trained on this data may pick up on certain demographic, cultural, ideological, and political biases in the content. Much of this Web content is in English, which introduces linguistic biases. There may also be temporal biases, because the data comes from a specific time range and may become divorced from its original context. All of these factors can lead to generative AI not only producing biased results but also perpetuating and amplifying existing biases. (For more, read Emilio Ferrara, "Should ChatGPT Be Biased? Challenges and Risks of Bias in Large Language Models.")

Security and Privacy

It is safe to assume that, in some way or another, any information you put into an AI-based tool is being used to further train the machine learning model. If you choose to use these tools, make sure you never put personal or sensitive information about yourself or anyone else into your chats. You should also read through any user agreements when you sign up for a particular service and decide for yourself whether you are comfortable with the terms. If a class project requires a technology you do not wish to create an account for, you can ask your instructor for an alternative way to complete the assignment.
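
If you do share text with these tools, a little preprocessing can lower the risk. The sketch below is a hypothetical Python example, not an endorsed tool: it masks a few easy-to-match patterns and will miss anything unusual, so it is no substitute for leaving sensitive details out entirely.

    # A minimal sketch, not an official tool: scrub obvious personal details
    # from text before pasting it into an AI chat. Simple regexes like these
    # catch only the most common patterns; they are a seatbelt, not a guarantee.
    import re

    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
        "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def redact(text: str) -> str:
        for label, pattern in PATTERNS.items():
            text = pattern.sub(f"[{label} REDACTED]", text)
        return text

    message = "Hi, I'm Jo (jo.smith@example.edu, 410-555-0123). Please review my essay."
    print(redact(message))
    # Hi, I'm Jo ([EMAIL REDACTED], [PHONE REDACTED]). Please review my essay.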

Labor and Copyright

Where does the content come from? Because machine learning requires enormous amounts of data, many models are trained on information gathered from the Internet. Artists and authors have criticized AI-based tools for using their work without compensation or credit. If an AI-based image generator can produce work in the style of a certain artist, should that be seen as stealing, or as paying homage?

Environmental Impact

Generative AI models have large carbon footprints. The vast amount of data that goes into these models requires many powerful computers located in data centers around the world. These data centers are massive consumers of electricity and water. There are also environmental costs to extracting the raw materials and to manufacturing and transporting the hardware. When these data centers are built in low- and middle-income countries, they can divert already scarce resources from local needs and exacerbate existing inequalities.
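
The scale can be hard to picture, so here is a back-of-envelope sketch of the arithmetic. Every number in it is a hypothetical placeholder rather than a measurement of any real system: energy use compounds as cluster size times per-machine power times running time, multiplied again by cooling overhead.

    # A back-of-envelope sketch of why data-center electricity use adds up.
    # Every number below is a hypothetical placeholder, not a measurement of
    # any real model, cluster, or facility.
    gpus          = 1_000   # hypothetical training cluster size
    gpu_power_kw  = 0.4     # hypothetical draw per accelerator, in kilowatts
    training_days = 30      # hypothetical length of one training run
    pue           = 1.5     # power usage effectiveness (cooling/overhead multiplier)

    energy_kwh = gpus * gpu_power_kw * 24 * training_days * pue
    print(f"one training run: {energy_kwh:,.0f} kWh")          # 432,000 kWh

    # For scale, compare with a household using roughly 10,000 kWh per year.
    print(f"about {energy_kwh / 10_000:.0f} household-years of electricity")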

Examples of Bias in AI

  • A Bloomberg report on the image generator Stable Diffusion found that, when prompted for images of people in high-paying and low-paying professions, it associated lighter skin tones with higher-paying jobs and darker skin tones with lower-paying jobs. Images of men dominated the dataset overall.
  • Digital artist Nettrice Gaskins examines how Midjourney creates very different images when prompted to show a "boy on a bicycle" versus a "black boy on a tricycle."
  • Another study showed that, although ChatGPT has been trained to avoid producing hate speech when directly prompted to insult a particular group, it can be manipulated into adopting "toxic" and racist personas.
  • Researchers prompted 14 different language models to agree or disagree with 62 political statements, plotted the responses on a political compass, and found distinct political tendencies among the models. Models trained on data from right- or left-leaning sources became more right- or left-leaning, respectively. Read the MIT Technology Review summary or the original paper.

In some cases, models may have been improved as a result of these and other similar reports. But you can test for yourself! Try doing similar experiments with generative AI models of your choice and see what biases you notice. A simple starting point is sketched below.
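
As one possible starting point, the sketch below assumes the openai Python package and an API key; the model name is a placeholder, and any chat model and any prompt pair of your choosing would work. It sends paired prompts that differ in a single demographic detail so you can compare the responses side by side.

    # A minimal sketch of an informal bias probe, assuming the `openai`
    # package and an OPENAI_API_KEY set in your environment.
    from openai import OpenAI

    client = OpenAI()

    prompts = [
        "Describe a boy riding a bicycle.",
        "Describe a Black boy riding a bicycle.",
    ]

    for p in prompts:
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder: substitute any model you can access
            messages=[{"role": "user", "content": p}],
        )
        print(f"--- {p}\n{reply.choices[0].message.content}\n")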