LessWrong
I am pretty confused about people who have been around the AI safety ecosystem for a while updating towards "alignment is actually likely by default using RLHF." But maybe I am missing something.
Like 3 years ago, it was pretty obvious that scaling was going to make RLHF "work" or "seem to work" more effectively for a decent amount of time, and probably for quite a long time. Then the risk is that later you get alignment-faking during RLHF training, or, at the extreme end, gradient-hacking, or just that your value function is misspecified and comes apart at the tails (as seems pretty likely with current reward functions). Okay, there are other options, but it seems like basically all of these were ~understood at the time.
Yet, as we've continued to scale and models like Opus 3 have come out, people have seemed to update towards "actually maybe RLHF just does work," because they have seen RLHF "seem to work". But this was totally predictable 3 years ago, no? I think I actually did predict something like this happening, but I only really expected it to affect "normies" and "people who start to take notice of AI at about this time." Don't get me wrong, the fact that RLHF is still working is a positive update for me, but not a massive one, because it was priced in that it would work for quite a while. Am I missing something that makes "RLHF seems to work" a rational thing to update on?
I mean, there have been developments to how RLHF/RLAIF/Constitutional AI works, but nothing super fundamental or anything, afaik? So surely your beliefs should be basically the same as they were 3 years ago, plus the observation "RLHF still appears to work at this capability level," which is only a pretty minor update in my mind. I'd be glad if someone could tell me whether I'm missing something.
I'm thinking about writing a practical guide to having polygenically screened children (AKA superbabies) in 2025. You can now increase your kid's IQ by about 4-10 points and/or decrease their risk of some pretty serious diseases by doing IVF and picking an embryo with better genetic predispositions.
There's a bunch of little shit almost no one knows that can have a pretty significant impact on the success rates of the process, like how to find a good clinic, what kinds of questions to ask your physician, how to get meds cheaply, how to get the most euploid embryos per dollar, which polygenic embryo selection company to pick, etc.
Would anyone find this useful?
Unless you have crazy-long ASI timelines, you should choose life-saving interventions (e.g. AMF, New Incentives) over welfare-increasing interventions (e.g. GiveDirectly, Helen Keller International). This is because you expect that ASI will radically increase both longevity and welfare.
To illustrate, suppose we're choosing how to donate $5000 and have two options:
(AMF) Save the life of a 5-year-old in Zambia who would otherwise die from malaria.
(GD) Improve the lives of five families in Kenya by sending each family one year's salary ($1000).
Suppose that, before considering ASI, you are indifferent between (AMF) and (GD). The ASI consideration should then favour (AMF) because:
1. Before considering ASI, you are underestimating the benefit to the Zambian child. You are underestimating both how long they will live if they avoid malaria and how good their life will be.
2. Before considering ASI, you are overestimating the benefit to the Kenyan families. You are overestimating how large the next decade is as a proportion of their lives and how much you are improving their aggregate lifetime welfare.
I find this pretty intuitive, but you might find the mathematical model below helpful. Please let me know if you think I'm making a mistake, either ethically or factually.
Mathematical model comparing life-saving vs welfare-increasing interventions
Mathematical setup
Assume a person-affecting axiology where how well a person's life goes is logarithmic in their total lifetime welfare. Lifetime welfare is the integral of welfare over time. The benefit of an intervention is how much better their life goes: the difference in log-lifetime-welfare with and without the intervention.
Assume ordinary longevity is 80 years, ASI longevity is 1000 years, ordinary welfare is 1 unit/year, ASI welfare is 1000 units/year, and ASI arrives 50 years from now with probability p. Note that these numbers are completely made up -- I think ASI longevity and ASI welfare are und
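A minimal sketch of this setup in Python, using the stipulated numbers above. The counterfactual shapes are my own illustrative assumptions, not the post's: I assume the child who dies now has accrued ~5 welfare-units, that a $1000 transfer adds roughly 1 lifetime welfare-unit per family, and that a typical cash-transfer recipient is ~30 years old.

```python
import math

# Toy model of the argument above. Lifespans, welfare rates, and ASI
# timing are the post's stipulated values; everything marked
# "assumption" is mine, purely for illustration.
ORDINARY_LIFESPAN = 80     # years
ASI_LIFESPAN = 1000        # years
ORDINARY_WELFARE = 1.0     # units/year
ASI_WELFARE = 1000.0       # units/year
ASI_ARRIVAL = 50           # years from now

def lifetime_welfare(age_now, asi):
    """Total lifetime welfare (past + future) for a person of age_now."""
    if asi:
        pre = age_now + ASI_ARRIVAL  # ordinary-welfare years before ASI arrives
        return pre * ORDINARY_WELFARE + (ASI_LIFESPAN - pre) * ASI_WELFARE
    return ORDINARY_LIFESPAN * ORDINARY_WELFARE

def amf_benefit(p, age=5):
    """Expected gain in log lifetime welfare from saving the child.
    Assumption: dying now leaves only the ~5 welfare-units accrued so far."""
    died = age * ORDINARY_WELFARE
    saved = (p * math.log(lifetime_welfare(age, True))
             + (1 - p) * math.log(lifetime_welfare(age, False)))
    return saved - math.log(died)

def gd_benefit(p, age=30, boost=1.0):
    """Expected gain in log lifetime welfare from a cash transfer.
    Assumption: $1000 adds ~1 lifetime welfare-unit per family."""
    def gain(asi):
        w = lifetime_welfare(age, asi)
        return math.log(w + boost) - math.log(w)
    return p * gain(True) + (1 - p) * gain(False)

# Raising p(ASI) increases the value of saving a life and shrinks
# the (log-scale) value of a welfare transfer.
for p in (0.0, 0.5, 0.9):
    print(f"p={p}: AMF {amf_benefit(p):.3f}, 5x GD {5 * gd_benefit(p):.5f}")
```

The toy numbers don't enforce the indifference premise between (AMF) and (GD) at p = 0; the only point is the direction of the update: AMF's benefit grows with p while GD's shrinks.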
I listened to the books Arms and Influence (Schelling, 1966) and Command and Control (Schlosser, 2013). They describe dynamics around nuclear war and the safety of nuclear weapons. I think what happened with nukes can maybe help us anticipate what may happen with AGI:
* Humanity can be extremely unserious about doom - it is frightening how many gambles were made during the Cold War: the US had a breakdown in communication such that they planned to defend Europe with massive nuclear strikes at a time when they only had a few nukes that were barely ready; there were many near misses; hierarchies often hid how bad the security of nukes was, resulting in inadequate systems and lost nukes; etc.
* I was most surprised to see how we almost didn't have a nuclear taboo; according to both books, this is something that was actively debated post-WW2!
* But how nukes are handled can also help us see what it looks like to be actually serious:
* It is possible to spend billions building security systems, e.g. applying the 2-person rule and installing codes in hundreds of silos
* even when these reduce how efficient the nuclear arsenal is - e.g. because you have tradeoffs between how reliable a nuclear weapon is when you decide to trigger it, and how reliably it does not trigger when you decide to not trigger it (similar to usefulness vs safety tradeoffs in control scaffolds)
* (the deployment of safety measures was slower than would be ideal and was in part driven by incidents, but was more consequential than current investments in securing AIs)
* It is possible to escalate concerns about the risk of certain deployments (like Airborne alerts) up to the President, and get them cancelled (though it might require the urgency of the deployment to not be too high)
* It is possible to have major international agreements (e.g. test ban treaties)
* Technical safety is contingent and probably matters: technical measures like 1-point safety (which was
We have reached the $1,000,000 mark in our fundraiser! Thank you all so much!
when will we have sufficiently conclusive evidence for the long term safety of far-uvc that it's reasonable to push for its universal adoption in all public spaces without reservation? the safety issue seems like a much bigger deal than the cost issue for broad adoption; if it works safely, the economic case for installing far uvc in public spaces seems pretty solid - people being sick must be terrible for the economy! and they're only ever going to get cheaper.
in a world where far uvc is near universally deployed, we might be able to banish the common cold or the flu to the past, in the same way that cholera is basically no longer a problem in the developed world. this seems like a pretty big deal and I'd like to know when this glorious future is coming (and whether there's anything I can do to make it come sooner)!
(from eyeballing studies, it sounds like the cost of the cold+flu to the US economy is on the order of $100bn/yr, which passes basic Fermi estimate muster - given a $30tn/yr gdp, a few days per year of lost productivity due to cold/flu is easily hundreds of billions. even at the current price of far uvc, which is a huge overestimate of future tech at volume, the cost of disinfecting spaces is about $0.40/year/sqft (amortizing an aerolamp over its 5 year lifespan); compared to e.g. $60/year/sqft land cost in San Francisco, this is a negligible amount. estimating the total number of sqft of public space in the US is kind of annoying, but here's a Fermi estimate: there are about 100k schools in the US, and each school is about 100k sqft, and let's say schools are about 10% of all public spaces. that pencils out to roughly 10^11 sqft, or ~$40bn/year in disinfection costs - below the ~$100bn/year cost of illness, implying we are already at or past break even, despite the immaturity of the technology.)
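The school-based Fermi estimate can be checked directly; a quick sketch, where every input is the post's rough figure rather than measured data:

```python
# Fermi estimate of US-wide far-UVC deployment cost; all inputs are
# the rough figures from the paragraph above, not measured data.
cost_per_sqft_year = 0.40        # $/sqft/yr, aerolamp amortized over 5 years
schools = 100_000                # US schools
sqft_per_school = 100_000        # sqft per school
school_share = 0.10              # schools as a fraction of all public space

public_sqft = schools * sqft_per_school / school_share  # ~1e11 sqft
annual_cost = public_sqft * cost_per_sqft_year          # ~$4e10, i.e. ~$40bn/yr

cold_flu_loss = 100e9            # $/yr, rough US productivity loss to cold+flu
print(f"disinfection ~${annual_cost / 1e9:.0f}bn/yr "
      f"vs illness ~${cold_flu_loss / 1e9:.0f}bn/yr")
```

With these inputs the annual deployment cost comes out well under the estimated productivity loss, which is what drives the break-even claim.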
Is anyone interested in a donation swap?
I want to donate 1000 USD to the Lightcone Infrastructure Fundraiser, but I won't get any tax break from it.
I can donate the same amount to the Against Malaria Foundation instead.
Is there a service that auto-matches donation swaps?