March 19, 2025

AI Personalities: Between Safety, Bias, and Hallucination

AI Personalities: Between Safety, Bias, and Hallucination
The player is loading ...
AI Personalities: Between Safety, Bias, and Hallucination
00:00
00:00
00:00

AI personalities are shaping the way we engage and interact online. But as the tech evolves, it brings with it complex ethical challenges, including the formation of bias, safety concerns, and even the risk of confusing fantasy with reality. Our host, Carter Considine, breaks it down in this episode of Ethical Bytes.

 

The synthesis of training data and the particular values of their developers, AI personalities range from friendly and conversational to reflective and philosophical. All these play huge roles in how users experience AI models like ChatGPT and AI assistant Claude. The imparting of bias and ideology are not necessarily intentional on the developer’s part. However, the fact that we do have to deal with them raises serious questions about the ethical framework we should employ when considering AI personalities.

 

Despite their usefulness in creative, technical, and multilingual tasks, AI personalities also bring to mind issues such as what we could call “hallucinations”—where models generate inaccurate or even harmful information, without consumers even realizing it. These false outputs have real-world implications, including (but not limited to) law and healthcare.

 

The cause often lies in data contamination. This is where AI models inadvertently absorb toxic or misleading content, or in the misinterpretation of prompts, which inevitably lead to incorrect or nonsensical responses.

 

AI developers face the ongoing challenge of building systems that balance performance, safety, and ethical considerations. As AI continues to evolve, the ability to navigate the complexities of personality, bias, and hallucinations will be key to ensuring this technology stays both useful and reliable to users.

 

Key Topics:

  • What if AI Turns Hostile? (00:00)
  • The Personalities of AI Models (01:18)
  • Aligning AI Traits and Practical Applications (05:51)
  • The Hallucination Problem: A Feature, Not a Bug (09:30)
  • Conclusion (14:22)

 

 

More info, transcripts, and references can be found at ⁠ethical.fm

Artificial intelligence is meant to be helpful, informative, and at times even conversationally engaging. But what happens if AI turns hostile? In late 2024, Google’s AI chatbot Gemini made headlines when it generated a disturbing response: "Human… Please die". Naturally, this chilling message wasn’t an intentional feature but an unexpected output and safety failure of a probabilistic model. Such incidents shed light on the broader issue that AI personalities are shaped by their training data, which is influenced by company values, as well as the underlying mechanisms of LLMs. This article explores different AI personalities and the inevitable ethical issue of AI hallucinations.

The Personalities of AI Models

Understanding AI Personality

AI models exhibit distinct interaction styles that users often describe as "personalities." In academia, the concept of AI personalities is an emerging research area that is still being understood. Should personality be defined as how a bot “feels” about itself? Or how a user interacting with the bot feels about the bot? The field of social computing, which predates the emergence of LLMs, asks the question of how to imbue machines with traits that help humans achieve their goals. Such bots could serve as coaches or job trainers, for instance. But Maarten Sap, a natural language processing expert at Carnegie Mellon University in Pittsburgh, and others working with bots in this manner hesitate to call the suite of resulting features “personality.” “It doesn’t matter what the personality of AI is. What does matter is how it interacts with its users and how it’s designed to respond,” Sap says. “That can look like personality to humans. Maybe we need new terminology.” These perceived personalities are not the result of intentional character design but emergent behaviors shaped by pre-training data and later fine-tuning. Although much of the pre-training data is similar, the human-curated training data used for fine-tuning is greatly influenced by the organizations that develop the training data pairs, giving each model a specific feel.

Human Influence on AI Behavior

AI models are not created in a vacuum. The values, priorities, and biases of the organizations developing them inevitably influence their training data, therefore behavior. This influence manifests through the curation of training data, fine-tuning objectives, and the ethical guidelines imposed on the models. 

Models such as ChatGPT have faced criticism for perceived left-leaning biases, such as refusing to speak about Donald Trump during the election. Another example is Llama-2 endorsing Kamala during the recent presidential election, no matter how hard the policy team tried to force the model to remain neutral. Although there’s a causal relationship between training data and behavior, most organizations still struggle with getting AI to behave reliably, pushing the need for guardrails. The problem with guardrails is that they tend to use hard-coded input (prompt) and output (answer) filters, reducing model accuracy and usefulness. Other guardrails rely on prompt engineering, which is unreliable due to the possibility of jailbreaking the model.

AI Personality is Subjective

Unlike human personality, AI personality is not an explicitly defined attribute but a collection of behaviors observed by users. It is important to note that base model companies do not officially publish personality traits for their AI models; instead, the users form impressions based on their prompt and response interactions. Since there is no standardized framework for benchmarking AI personality, evaluations are inherently subjective, relying on anecdotal experiences and user perceptions rather than measured benchmarks.

Handling Inappropriate or Harmful Requests

AI chatbots like ChatGPT and Claude are designed to reject inappropriate or harmful requests, especially when users are rude or abusive. This behavior is guided by ethical training protocols, such as OpenAI’s moderation policies and Anthropic’s constitutional classifiers, which help models identify and decline dangerous or offensive prompts. Despite these safeguards, researchers have found that chatbots can struggle to manage impolite users effectively, highlighting the ongoing need for improved strategies in AI interaction design.

Let’s explore how users describe the personality of various popular models and how their personalities contribute to the usefulness of particular use cases. 

Aligning AI Traits with Practical Applications

Creative Writing and Content Generation

For storytelling, brainstorming, and creative writing, ChatGPT and Claude stand out. ChatGPT has a friendly, conversational personality that fosters creativity while engaging with the model, making it an excellent partner for narratives, character dialogues, and brainstorming novel phrasing. Claude has a reflective and philosophical nature, which adds depth and critical thinking to writing. Claude often provides thoughtful, morally complex content that can enrich storytelling. Since Mistral performs well in multilingual contexts, this model is also useful when needing to tailor stories to specific and culturally diverse audiences.

Research and Information Retrieval

For summarizing complex documents and retrieving factual information, Gemini and LLaMA excel. Gemini’s structured, analytical personality makes it adept at breaking down dense information into clear, concise summaries, ideal for academic and professional research. LLaMA’s efficient, no-frills personality supports rapid data processing and technical documentation. Meanwhile, ChatGPT’s approachable and explanatory style helps users grasp complicated topics through simplified, easy-to-understand explanations.

Ethical and Philosophical Reasoning

For nuanced ethical debates and reflective discussions, Claude leads with its principled, cautious personality shaped by Constitutional AI, encouraging thoughtful moral reasoning. This makes Claude particularly strong in discussions around ethics and philosophy. ChatGPT’s balanced, adaptable personality allows it to present multiple perspectives fairly, fostering well-rounded debates. Gemini, with its fact-driven personality, complements these discussions by grounding arguments in empirical evidence and structured reasoning.

Technical and Research Applications

LLaMA, Gemini, and Claude excel in technical domains like coding and data analysis. LLaMA’s efficiency-driven design supports quick, accurate outputs, while Gemini’s detail-oriented structure enhances technical workflows (direct.mit.edu). Claude stands out for programming, offering advanced problem-solving, real-time code suggestions, and effective debugging through tools like Cursor.so and Replit. Its strong reasoning skills make it ideal for both novice and experienced developers.

Multilingual and Global Communication

For multilingual communication, Mistral and ChatGPT excel. Mistral’s culturally sensitive and linguistically adaptive personality enables it to handle diverse languages with nuance, making it ideal for businesses operating globally. ChatGPT’s personable, conversational nature helps bridge language barriers, offering smooth, engaging communication in multiple languages. LLaMA’s concise, direct style is particularly effective for technical multilingual contexts, supporting cross-border research and development initiatives.

The Hallucination Problem: A Feature, Not a Bug

One of the most significant challenges in AI ethics is the problem of hallucinations. AI models do not "know" things in the way humans do; instead, they generate responses based on probabilities derived from their training data. This means that every response is, in essence, a very educated guess, leading to cases where AI confidently generates false information.

Understanding Extreme and Unexpected Outputs

Some AI-generated hallucinations are more than just factual inaccuracies. The outputs can be outright bizarre or even disturbing, as seen in Gemini’s case. These extreme outputs occur due to several key factors.

One major reason for these failures is data contamination. Large-scale training datasets inevitably contain harmful or toxic content, even if such material is relatively rare. AI models trained on vast amounts of internet text may inadvertently absorb and replicate problematic language, sometimes producing outputs that reflect this unseen contamination.

Another contributing factor is prompt misinterpretation. AI models do not think or understand context in the way humans do. Instead, they generate responses based on statistical associations between words. If a model misinterprets the intent behind a user’s prompt, it may produce an inappropriate or nonsensical response due to an incorrect prediction of the next most likely word sequence.

A lack of robust safeguards can also lead to unexpected AI behavior. While AI developers implement safety filters to prevent harmful outputs, these mechanisms are not foolproof. If an AI encounters an input scenario it has not been adequately trained to handle, it may resort to generating an unpredictable or anomalous response, as seen in cases where chatbots have issued offensive or dangerous statements.

Finally, adversarial inputs can deliberately push AI models into generating unintended outputs. Some users actively try to manipulate AI models through "prompt injection" techniques, crafting queries that exploit weaknesses in the model’s guardrails. This can result in models generating outputs they were specifically designed to avoid, sometimes revealing deeper vulnerabilities in AI safety measures.

Are Hallucinations Always Harmful?

The ethical implications of hallucinations depend on the context in which they arise. In some cases, AI hallucinations are harmless, such as when an AI generates a fictional story or adds artifacts to images. For example, if a user asks an AI to "describe what a conversation between Shakespeare and Einstein might sound like," and the AI creates a fictional dialogue, no harm is done. This is because it is implied in the user’s prompt that they understand the conversation as fictional. If they were to ask, “What conversations exist between Shakespeare and Einstein?” and the model responds in the same way, this would lead to the model representing fiction as fact, or misinformation.

Misinformation in critical and impactful areas can cause real harm. In a notable incident, a New York lawyer cited fictitious cases generated by ChatGPT in a legal brief filed in federal court. This led to potential sanctions due to the submission of non-existent legal precedents. Similarly, health-related hallucinations can be dangerous, such as models incorrectly advising users to eat a toxic substance or misrepresenting medical studies.

Should AI-generated hallucinations always be classified as harm? The answer depends on intent, predictability, and user expectations. If a user is aware that an AI sometimes hallucinates, they might apply critical thinking before accepting its statements as fact. However, if an AI presents hallucinations as definitive truth, especially in fields like medicine, law, or finance, then it creates a misleading and potentially harmful experience for users. The challenge for AI developers is to build models that minimize harmful hallucinations while preserving the AI's ability to generate creative and engaging responses in appropriate contexts.

Conclusion

AI personalities are not just design choices; they are reflections of corporate priorities, ethical considerations, and the fundamental limitations of large language models. Whether it’s ChatGPT’s helpful demeanor, Claude’s moral philosophy, or Gemini’s structured responses, each AI carries the imprint of its creators. But with this power comes responsibility. As AI continues to evolve, developers must navigate the fine line between helpfulness and harm, engagement and caution, ensuring that these digital entities serve society in ethical and reliable ways.