The description of the problem usually states that “you” are on the trolley. So maybe the model is reading that “you” as referring to itself?
The LLM might be using this definition from Wikipedia:
The trolley problem is a series of thought experiments in ethics, psychology and artificial intelligence involving stylized ethical dilemmas of whether to sacrifice one person to save a larger number.
Sorry, when I said description I meant the wording of the problem, for example the one that comes further down after that quote: You are standing some distance off in the train yard,… But yeah, that mention of AI might also be it. Or both.