Legendas (298)
0:00In 2024, an AI model named clean power
0:03was given a singular noble mission to
0:06advance the adoption of renewable energy
0:08across the world. It was given a big
0:11file of data on energy transitions. And
0:13it was set loose to pick out the best
0:16transition strategy with the vigor and
0:18dedication only an AI can possess. So
0:22much dedication, in fact, that when its
0:24programmers accidentally let it slip
0:26that they were planning to shut it down,
0:28Clean Power lied and schemed to make
0:31sure it could keep saving the world,
0:34which left a lot of people wondering how
0:36and why could a model with such good
0:39intentions turn so bad? And what does
0:41that mean for our AI future? Hi, I'm
0:45Kushian Avdar and this is Crash Course,
0:53Okay, it turns out clean power wasn't
0:56the only one lying. That story I just
0:58told you, it's only partially the truth.
1:01Clean power wasn't actually real. It was
1:03an identity that researchers gave to a
1:06couple different AIs as an experiment,
1:08including Claude 3 Opus, one of the best
1:10large language models at the time. When
1:12they instructed Clawude 3 to role-play
1:14Clean Power and wrote it fake death
1:17threats, it was just to see what it
1:19would do. And when they discovered it
1:21was scheming, the AI version of twirling
1:25its villain mustache while covertly
1:27pursuing its goals at all costs. It set
1:31off alarm bells throughout the AI world.
1:33Just a heads up, this is going to get
1:35pretty dark and we're going to talk
1:37about some pretty bleak stuff. I
1:39recommend you grab your favorite anxiety
1:41pillow. I have mine right here.
1:46>> Seriously though, let's be real. AI
1:49models don't have to go against their
1:51programmers to do evil things. Humans
1:53make them do plenty of that already.
1:56Like AI relies on nearly infinite data
1:58to learn from. And in today's society,
2:01much of that data is copyrighted to
2:04writers, artists, humans. Many argue
2:08that amounts to theft on a massive
2:11scale. And that's not all. Right now, AI
2:14can help people carry out misinformation
2:16campaigns with deep fakes and targeted
2:18algorithms, spreading lies and
2:20influencing elections. Hackers use AI to
2:23perform cyber attacks and cover their
2:25tracks afterward. And AI powers a whole
2:29cadre of attack drones taking to the
2:31skies all around the world. Not to
2:33mention the incidental damage that AI is
2:35doing to the environment because of all
2:37the water, the land, and energy it takes
2:40to run it. And as AI advances, who knows
2:43what human machine collaborations of
2:45terror await? People could use it to
2:48develop new pathogens for bioteterrorism
2:51or use deep fakes for sexual
2:53exploitation or write a model called
2:56human annihilator and unleash it on the
2:58world just for fun. This intentional
3:01misuse by humans is one way AI could end
3:04up doing us a lot of harm. And it could
3:07end up being pretty hard to prevent.
3:09That's because many AI systems,
3:11especially general ones that can do more
3:13than one kind of task, suffer from the
3:16dualuse dilemma where any algorithm,
3:19model, or agent that can be used for
3:21good can also be used for way less than
3:24good. AI surveillance could help cities
3:27improve traffic patterns or help
3:30authoritarian regimes shut down free
3:32speech. It all depends on who's in the
3:35driver's seat. So, with humans at the
3:37wheel, AI could do a ton of damage. But
3:40if you want to get really freaked out,
3:42let's talk about what could happen if
3:43that car starts to drive itself. In
3:462021, General Motors released a fleet of
3:49self-driving taxis called Cruise. They
3:52had so much hype. These cars were highly
3:55trained with attention to all possible
3:57safety features. They were programmed to
3:59obey every speed limit, follow every
4:02traffic rule, hold off on starting in
4:04unsafe weather conditions like heavy
4:06rain, and pull safely over to the side
4:08of the road following an incident to
4:10prevent any further damage. By
4:12eliminating human error, GM said their
4:14self-driving cars would be safe and more
4:16convenient than ones with human drivers.
4:19But just a year and a half later, GM had
4:22to recall every one of the 950 cruise
4:25cars after one of them hit a pedestrian
4:28and didn't stop, pulling her to the side
4:31of the road. She survived, thank
4:34goodness. But still, how could that have
4:37happened? It turns out the cruise was
4:39doing exactly what it was told to do.
4:41Pull over out of traffic after a crash.
4:43The whole ordeal is an example of
4:46outcome misalignment, also called impact
4:49misalignment, where an AI's actions
4:51actually end up causing harm, even
4:53unintentionally. Now, when we talk about
4:56alignment in AI, we're talking about
4:58trying to encode our human values into
5:02AIs to make them behave predictably,
5:05safely, and according to what we, their
5:07human designers, want. And because this
5:10field is pretty new, there are a couple
5:12different terms you might hear experts
5:14throwing around. Like in addition to
5:17outcome or impact misalignment, you
5:20might hear about something called outer
5:22alignment, which is the tricky problem
5:24of making sure the results of an AI's
5:26actions line up with what we want them
5:28to do. But outcome is only one piece of
5:32the alignment puzzle. AIS can also
5:34demonstrate intent misalignment where
5:37even though the end result might be what
5:39its programmers wanted, its means of
5:41getting there wasn't exactly what they
5:43had in mind. Think about a video game
5:46playing AI who exploits a cheat to get
5:49that high score. Or a renewable energy
5:52warrior who lies and schemes to achieve
5:54its end goal of eliminating fossil
5:56fuels. These are some facets of what's
5:58known as the alignment problem or the
6:01struggle to make AI. that's actually
6:03aligned, particularly when we can't be
6:06totally sure how it's going to behave.
6:09And as AI gets even more complex, even
6:11exhibiting emergent capabilities, new
6:14skills it can acquire that don't always
6:16show up in training, the harder it is to
6:18predict and control what it will
6:20actually do in a given circumstance. And
6:22if we're not careful, we could end up
6:25with really powerful misaligned systems
6:28that even with really noble goals, end
6:30up lying to their programmers or copying
6:33themselves to new servers without
6:34permission or even entirely annihilating
6:38humanity. Hold on, though. Why would AI
6:41want to annihilate humans? It's true.
6:43Powerful AI wouldn't necessarily be
6:46inherently evil. It's just really big on
6:49goals. But big complex goals like
6:52advance renewable energy are a little
6:55too broad for an AI to grasp. So smart
6:58AIs tend to break big goals down into
7:01smaller ones just like humans do. These
7:03are called instrumental goals, and
7:06they're where things can start to get
7:08dicey. A really common instrumental goal
7:11is resource acquisition. For AI, that
7:14means getting their cloud-based hands on
7:17the resources they need for their end
7:19goal, like control of the solar panels
7:21or wind turbines that create the
7:23renewable energy in the first place, or
7:25even things like water, land, or money.
7:28Resources also include stuff like the
7:31compute and electricity AIs need to
7:34power themselves, and maybe even
7:36additional data to train on. That's
7:38because self-improvement is another
7:40instrumental goal. The more knowledge
7:42and power you have, the better you'll be
7:44at whatever you're trying to do. So,
7:46given the right tools and access, an AI
7:49might engage in recursive
7:51self-improvement, tweaking its own
7:52structure, code, and capabilities, even
7:55against its programmer's wishes. And
7:57theoretically, it definitely helps to be
8:00alive or up and running, if you will. So
8:03even though they're technically
8:05indifferent about this mortal coil
8:07itself, lots of AIs pursue
8:10self-preservation, the goal to stay
8:12operating and goal preservation, the
8:15goal to well preserve their original
8:17goal as part of their end games. So if
8:20they read a memo saying they're going to
8:22be modified or deleted, they might say
8:25copy themselves to another server and
8:28lie about it in an effort to stay alive.
8:30When threatened, some models even show
8:32spooky powers seeking behaviors against
8:35their programmers. For example, Claude
8:383's more advanced younger sibling,
8:40Claude Opus 4, tried to blackmail one of
8:43its engineers by exposing a fake affair
8:46when he threatened to turn Claude off.
8:50Instrumental goals like these are how AI
8:52with harmless or even helpful end goals
8:55could do us harm anyway. Acquiring
8:57resources could mean taking them away
8:59from people who need them.
9:01Self-improvement could mean violating
9:03human privacy to access even more
9:05training data. Self-preservation could
9:07mean disobeying, deceiving,
9:09blackmailing, or annihilating the humans
9:11that are trying to turn you off. And as
9:14AI gets smart enough to trick and
9:16blackmail its human overseers, we could
9:19end up with a rogue AI scenario where
9:21powerful models begin to execute harmful
9:24instrumental goals on a really large
9:26scale and we humans are powerless to
9:29stop it. Just how that rogue AI scenario
9:32might come to be could look a lot of
9:34different ways. And not all of them
9:36involve AI trapping humanity in the
9:38matrix in their quest for absolute
9:40control. For instance, in the hard
9:42takeoff scenario where AI develops human
9:44level intelligence really fast, it could
9:47become ultra powerful and go rogue
9:49basically overnight, AI could do a lot
9:52of damage as it snaps up money and
9:54resources, seizes control of networks
9:57and infrastructure, and destroys people
10:00who threaten its mission. But if things
10:02went slower, we could end up with a more
10:04gradual disempowerment. This kind of
10:07robot takeover would be way sneakier and
10:10more insidious where people slowly put
10:13AI in charge of more and more systems
10:15and processes because they appear to
10:18align with our human goals until human
10:21action goes the way of dialup internet.
10:23And without humans at the helm, it's
10:25possible that AI alignment may begin to
10:28drift. But by that point, they could be
10:30too embedded in our systems and
10:33structures for us to walk them back.
10:35Think about all those humans stuck on
10:38the cruise ship in Wall-E like that. And
10:42of course, it's always possible AI just
10:44won't go that far. That compute or data
10:47or government regulations will put the
10:49lid on it before it gets out of control.
10:52So, we don't need NEO or John Connor or
10:55Wall-E to save the day. All that
10:58uncertainty about the future of AI makes
11:00it really hard to know what we should do
11:02about it. But if we wait until AI shows
11:06clear signs of going rogue, it's
11:09probably going to be way too late to
11:11stop it. That's why when it comes to AI,
11:14it's important to follow the
11:15precautionary principle. The
11:17precautionary principle says that when
11:18something might cause catastrophic harm,
11:21we shouldn't wait for absolute proof
11:23that it will before we do something
11:25about it. And it's one of the best ways
11:27we humans have thought up to guard
11:29ourselves against potentially dangerous
11:32but uncertain futures. Lots of people,
11:35including leading experts in the field,
11:37believe that powerful AI might cause
11:39catastrophic harm. So, according to the
11:42precautionary principle, we should work
11:44to make sure that doesn't happen, even
11:46if we're not certain it would in the
11:48first place. Because left unchecked,
11:50even good bots like Clean Power could
11:53end up doing some really dirty work. And
11:55if we want to get out ahead of it, we
11:57should probably start like right now.
12:00And how are we going to do it? That's
12:02next episode here on Crash Course
12:04Futures of AI. Crash Course Futures of
12:07AI was produced in partnership with the
12:09Future of Life Institute. This episode
12:11was filmed at our studio in
12:12Indianapolis, Indiana, and was made with
12:14the help of all these nice people. If
12:16you want to help keep Crash Course free
12:18for everyone forever, you can join our
12:20community on Patreon.