Alongside the exciting possibilities of these AI-based tools, there are several issues to be careful about as you assess if and when you want to use them. Consider the following:
While it may be tempting to think of an output from an AI-based tool as neutral when it comes to bias, that is not the case. Since machine learning models are trained on real world datasets, and since the world contains bias, it is safe to assume that outputs from these models may replicate or even exacerbate biases we see in the world around us.
These biases are not always intentional, and may never be entirely eliminated despite ongoing efforts to mitigate them. The presence of bias in generative AI does not necessarily mean that these models should not be used. It does mean that they should be used with caution and with an understanding of the human decisions that go into creating them.
The National Institute of Standards and Technology (NIST) produced a report that identifies three broad categories of AI bias:
ChatGPT and similar models are trained primarily on content scraped from the Internet, such as websites, e-books, social media platforms, and conversational data (from chat logs, comment sections, etc.). It may seem like the sheer scale of this data would lead to a diverse dataset, but the Internet is not equally accessible to all groups. Additionally, more mainstream content tends to be scraped for inclusion in the training data. This means that overall Web content is not representative of the population, and tends to skew toward younger, male, English-dominated viewpoints. (For more, read Bender, Gebru, et al., "On the Dangers of Stochastic Parrots.")
Models trained on this data may pick up on certain demographic, cultural, ideological, and political biases in the content. Much of this web content is in English, which introduces linguistic biases. There may also be temporal biases, because the data comes from a specific time range and may become divorced from its original context. All of these factors can lead to generative AI not only producing biased results, but also perpetuating and amplifying existing biases. (For more, read Emilio Ferrara, "Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models.")
It is safe to assume that -- in some way or another -- any information you put into an AI-based tool is being used to further train the machine learning model. If you choose to use these tools, make sure you never put personal or secure information about yourself or anyone else into your chats. You should also read through any user agreements when you sign up for a particular service and decide for yourself whether you are comfortable agreeing to the terms. If one of your class projects requires the use of a particular technology that you do not wish to create an account for, you can ask your instructor for an alternative way to complete the assignment.
Where does the content come from? Because machine learning requires enormous datasets, many models use information from the Internet in their training. Artists and authors have criticized AI-based tools for using their work without compensation or credit. If an AI-based image generator can produce work in the style of a certain artist, should that be seen as stealing, or as paying homage?
Generative AI models have large carbon footprints. The vast amount of data that goes into these models requires the use of many powerful computers located in data centers around the world. These data centers are massive consumers of electricity and water. There are also environmental costs to the extraction of raw materials and the manufacture and transportation of the hardware. When these data centers are built in low- and middle-income countries, they can divert already scarce resources from local needs and exacerbate existing inequalities.
Read more:
In some cases, models may have been improved as a result of these and other similar reports. You can also test this for yourself: try running similar experiments with generative AI models of your choice and see what biases you notice!