
LessWrong (30+ Karma)

Available episodes

5 of 606
  • “Defensiveness does not equal guilt” by Kaj_Sotala
    I often see people treating defensiveness as proof of guilt. The thought seems to go that if someone is defensive, it's because they know they’ve done something wrong. There are even proverbs around this, such as “a hit dog will holler” or “the lady doth protest too much”. This has always felt false to me. Now, it's certainly true that having done something wrong can be the cause of defensiveness. But that's just one out of many options! Some situations that have made me defensive include times when someone has… … had a negative stereotype of some group that I belong to, and said something derogatory about it. … acted judgmentally about my choices without understanding my situation or wanting to hear my explanation. … dismissively rejected my input about a decision that I felt was important. In none of these situations did I feel guilty. I may [...] --- First published: August 29th, 2025 Source: https://www.lesswrong.com/posts/gcyurCxxPguPPdGKw/defensiveness-does-not-equal-guilt --- Narrated by TYPE III AUDIO.
    --------  
    5:48
  • “Wikipedia, but written by AIs” by Viliam
    I got this crazy idea; I wonder if anyone could try it. Let's make an online encyclopedia, similar to Wikipedia, with one difference: all articles would be edited by AIs. Why? (I mean, other than "because it's possible and sounds kinda cool".) First, because it's possible. If an AI can give you a report on a certain topic, it might as well create an encyclopedia article on the topic. But unlike asking the AI directly, when you read the encyclopedia you know that you are reading the same version everyone else is.[1] This avoids the problem of the AI telling you exactly what it thinks you want to hear. No more sycophancy - now the AI tells you what it believes.[2] Even if it lies, e.g. because the system prompt commands it to say certain things or avoid saying certain things, at least it lies the same way to [...] --- Outline: (01:10) The first version; (03:00) More ideas; (06:18) Some problems. The original text contained 10 footnotes which were omitted from this narration. --- First published: August 29th, 2025 Source: https://www.lesswrong.com/posts/yFTMGKh9Muqpdtrmb/wikipedia-but-written-by-ais --- Narrated by TYPE III AUDIO.
    --------  
    7:59
  • [Linkpost] “60 U.K. Lawmakers Accuse Google of Breaking AI Safety Pledge” by Joseph Miller
    This is a link post. PauseAI organised an open letter from UK lawmakers and civil society organisations to Demis Hassabis, CEO of Google DeepMind. PauseAI UK members emailed their MPs asking them to sign the letter. A cross-party group of 60 U.K. parliamentarians has accused Google DeepMind of violating international pledges to safely develop artificial intelligence, in an open letter shared exclusively with TIME ahead of publication. The letter, released on August 29 by activist group PauseAI U.K., says that Google's March release of Gemini 2.5 Pro without accompanying details on safety testing “sets a dangerous precedent.” The letter, whose signatories include digital rights campaigner Baroness Beeban Kidron and former Defence Secretary Des Browne, calls on Google to clarify its commitment. TIME confirmed for the first time that Google DeepMind did not provide the UK AISI with pre-deployment access to its Gemini 2.5 Pro model. After previously failing to address a [...] --- First published: August 29th, 2025 Source: https://www.lesswrong.com/posts/GvgmoDts5kphwGyS2/60-u-k-lawmakers-accuse-google-of-breaking-ai-safety-pledge Linkpost URL: https://time.com/7313320/google-deepmind-gemini-ai-safety-pledge/ --- Narrated by TYPE III AUDIO.
    --------  
    2:14
  • “Summary of our Workshop on Post-AGI Outcomes” by David Duvenaud, Raymond Douglas, Nora_Ammann, Jan_Kulveit
    Last month we held a workshop on Post-AGI outcomes. This post is a list of all the talks, with short summaries, as well as my personal takeaways. The first keynote was @Joe Carlsmith on “Can Goodness Compete?”. He asked: can anyone compete with “Locusts”: those who want to use all resources to replicate as fast as possible? Longer version with transcript. The second keynote was @Richard_Ngo on “Flourishing in a highly unequal world”. He argued that future beings will vary greatly in power and intelligence, so we should aim for “healthy asymmetric” relations, analogous to that between parent and child. Morgan MacInnes of U Toronto Political Science spoke on “The history of technologically provoked welfare erosion”. His work with Allan Dafoe argued that competitive pressure sometimes forces states to treat their own citizens badly. The next talk was a direct rebuttal to Morgan's [...] --- Outline: (04:39) My Takeaways; (06:01) What's next. --- First published: August 29th, 2025 Source: https://www.lesswrong.com/posts/csdn3e8wQ3h6nG6kN/summary-of-our-workshop-on-post-agi-outcomes --- Narrated by TYPE III AUDIO.
    --------  
    7:24
  • “18 Applications of Deception Probes” by Cleo Nardo
    Introduction. I’m excited by deception probes. When I mention this, I’m sometimes asked “Do deception probes work?” But I think there are many applications of deception probes, and each application will require probes with different properties, i.e. whether a deception probe works will depend on what you’re using it for. Furthermore, whether one deception probe works better than another will also depend on what you’re using them for. This remark sounds a bit trivial, but I didn’t appreciate it initially. In this document, I'll enumerate 18 different applications of deception probes, and what properties each application requires. These are just my very rough guesses, just to kickstart some public conversation. Applications of deception probes: 1. Monitor current models. Problem Statement: AIs at the current capability level may be important for future safety work, e.g. trusted monitoring. However, current LLMs engage in strategic deception. This is despite being post-trained to [...] --- Outline: (00:10) Introduction; (00:56) Applications of deception probes; (01:00) 1. Monitor current models; (03:31) 2. Discard unsafe actions; (05:22) 3. Inform trusted monitor; (07:43) 4. Allocate the auditing budget; (10:35) 5. Allocate the careful feedback budget; (14:24) 6. Augment AI Debate; (17:18) 7. Elicit Latent (External) Knowledge; (19:57) 8. Elicit Latent (Self) Knowledge; (23:29) 9. Elicit Latent (Activity) Knowledge; (27:09) 10. Compare deceptiveness of models; (28:38) 11. Compare deceptiveness of topics; (31:00) 12. Increase honesty by resampling; (33:55) 13. Increase honesty by modifying activations; (36:22) 14. Increase honesty by finetuning; (38:22) 15. Understand deception mechanistically; (40:24) 16. Motivate compliance to bargained agreements; (42:47) 17. Complement Chain-of-Thought monitoring; (44:36) 18. Safely pass the buck; (46:50) Conclusion. The original text contained 12 footnotes which were omitted from this narration.
    --- First published: August 28th, 2025 Source: https://www.lesswrong.com/posts/7zhAwcBri7yupStKy/18-applications-of-deception-probes --- Narrated by TYPE III AUDIO.
    --------  
    49:22


About LessWrong (30+ Karma)

Audio narrations of LessWrong posts.


