LLMs

Large Language Models (LLMs)

  • A type of artificial intelligence system designed to understand and generate human language
  • A type of deep learning model designed specifically for natural language processing (NLP) tasks
  • LLMs are neural networks with billions or even trillions of parameters, trained on vast amounts of text data so that they learn patterns and relationships in language

  • LLMs are trained on diverse text from the internet, books, articles, and other sources
  • LLMs learn patterns in language without explicit programming of grammar rules
  • LLMs can perform various language tasks like translation, summarization, and question-answering
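The idea of learning patterns from text rather than from hand-written grammar rules can be illustrated with something far simpler than an LLM: a bigram model that counts which word follows which in a small, made-up corpus. This is a toy stand-in chosen for illustration only. Real LLMs are neural networks trained to predict the next token, not word counters, but the example shows predictions emerging purely from text statistics.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often every other word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; no grammar rules are given to the model
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # prints cat ("cat" follows "the" twice, others once)
print(predict_next(model, "sat"))  # prints on
```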

Popular examples of LLMs include:

  • GPT (Generative Pre-trained Transformer) models from OpenAI
  • Claude models from Anthropic
  • PaLM and Gemini from Google
  • Llama from Meta

Word Embeddings

  • LLMs use embeddings: numerical vectors that represent the meaning of words (more precisely, of tokens, which can be whole words or pieces of words)
  • Words with similar meanings are located closer together in the vector space
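Closeness in the vector space is commonly measured with cosine similarity. The sketch below uses tiny, hand-made 3-dimensional vectors (hypothetical values; real embeddings are learned and typically have hundreds or thousands of dimensions) just to show the computation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, invented for illustration only
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(f"{cosine_similarity(embeddings['cat'], embeddings['dog']):.2f}")  # prints 0.99
print(f"{cosine_similarity(embeddings['cat'], embeddings['car']):.2f}")  # prints 0.30
```

With these made-up vectors, "cat" and "dog" point in almost the same direction, while "cat" and "car" do not.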

Interactive word embedding demonstration

Using LLMs in data science

  • Select and adapt the LLM that works best for your problem
  • Prompt engineering: write prompts and refine them iteratively
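One lightweight way to keep prompts easy to refine is to store them as templates and fill in the variable parts in code, so each revision is a small change in one place. The template wording and the build_prompt helper below are hypothetical, for illustration only.

```python
# Hypothetical prompt template; {labels} and {review} are filled in at run time
SENTIMENT_TEMPLATE = (
    "Classify the sentiment of the movie review below as one of: {labels}.\n"
    "Answer with the label only.\n\n"
    "Review: {review}"
)


def build_prompt(review, labels=("positive", "negative", "neutral")):
    """Fill the template so the same prompt can be reused and revised in one place."""
    return SENTIMENT_TEMPLATE.format(labels=", ".join(labels), review=review.strip())


print(build_prompt("  A moving story with a brilliant cast.  "))
```

Changing the instructions (for example, adding "mixed" as a label) then means editing the template or the labels argument rather than every prompt string.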

In practice, the most convenient way to use LLMs from code is through their APIs (application programming interfaces)

Demonstration: the anthropic Python module (Anthropic's official Python SDK)

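Temperature

  • The temperature parameter controls how random the model's output is
  • Lower values make responses more focused and repeatable; higher values make them more varied
  • In the Anthropic API, temperature ranges from 0.0 to 1.0 and defaults to 1.0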
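Conceptually, temperature rescales the model's scores (logits) for candidate next tokens before a softmax turns them into probabilities: dividing by a small temperature sharpens the distribution, dividing by a large one flattens it. The logits below are made up, and each provider's exact sampling procedure may differ; this is a sketch of the general idea only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by `temperature`."""
    # Division by zero is undefined; implementations commonly treat
    # temperature 0 as always picking the highest-scoring token instead.
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words
logits = [2.0, 1.0, 0.5]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature almost all probability goes to the top-scoring word; at the highest, the three candidates become much closer in probability.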

Example: calling an LLM API from Python

import anthropic


def main():
    # The client reads the API key from the ANTHROPIC_API_KEY environment variable
    client = anthropic.Anthropic()

    # One user message containing two text blocks, one per movie review
    my_messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Here's the first review: What a load of nonsense: one hour to get to the point but we don't really understand how she got to it. The conclusion is anybody's guess. And someone needs to tell Kidman to stop tweaking her face or she'll end up like Meg Ryan."
                },
                {
                    "type": "text",
                    "text": "The second review: What a total waste of time. I have tried to write a review containing a few examples of the poor writing, editing and acting, but the eitire movie is included. I thought the little boy was a bright spot but even that dimmed pretty quickly. Nicole Kidman did herself an injustice (I think...is she REALLY that bad an actress perhaps?) in even accepting the script for this.Like I said, a complete waste of time."
                },
            ],
        }
    ]

    # Adjacent string literals are joined without the stray indentation
    # that a backslash line continuation inside the string would insert
    system_prompt = (
        "You are an AI assistant trained to categorize movie reviews with sentiment analysis. "
        "Your goal is to analyze each review and determine the sentiment "
        "(for example: positive, negative, or neutral) associated with each review's content."
    )

    message = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1000,
        temperature=1,  # 0.0 to 1.0; lower values give more consistent labels
        system=system_prompt,
        messages=my_messages,
    )

    # The response is a list of content blocks; print the text ones
    for item in message.content:
        if item.type == "text":
            print(item.text)


if __name__ == "__main__":
    main()

Challenge

We want to determine what differentiates a muffin from a cupcake

  • How would you set up your text file?
  • What prompt would you write?
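One possible setup, offered as a sketch rather than the intended answer: keep one description per line in a plain text file, each prefixed by its label (for example "muffin: dense crumb, little sugar, no frosting"), and turn the file into a single user message in the same format as the API example. The file layout, the load_descriptions and build_messages helpers, and the prompt wording are all hypothetical choices.

```python
from pathlib import Path


def load_descriptions(path):
    """Read lines like 'muffin: ...' or 'cupcake: ...' into (label, text) pairs."""
    pairs = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        label, text = line.split(":", 1)
        pairs.append((label.strip().lower(), text.strip()))
    return pairs


def build_messages(pairs):
    """Build a messages list (same shape as the API example) asking for the differences."""
    examples = "\n".join(f"- {label}: {text}" for label, text in pairs)
    prompt = (
        "Below are descriptions of baked goods, each labeled muffin or cupcake.\n"
        f"{examples}\n\n"
        "Based only on these descriptions, list the characteristics that "
        "differentiate a muffin from a cupcake."
    )
    return [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
```

The resulting list can be passed as the messages argument to client.messages.create, exactly as my_messages is in the example above; a lower temperature may make the answer more repeatable across runs.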