What if all the world's biggest problems have the same solution?

Veritasium

0:00What if, all
0:01of the world's biggest problems from climate change,
0:04to curing diseases, to disposal of plastic waste,
0:07what if they all had the same solution?
0:10A solution so tiny it would be invisible.
0:14I'm inclined to believe this is possible,
0:16thanks to a recent breakthrough that solved one
0:18of the biggest problems of the last century.
0:21How to determine the structure of a protein?
0:23- It's been described to me as equivalent
0:25to Fermat's last theorem, but for biology.
0:28- Over six decades, tens of thousands
0:30of biologists painstakingly worked out
0:32the structure of 150,000 proteins.
0:35Then in just a few years, a team of
0:38around 15 determined the structure of 200 million.
0:42That's basically every protein known to exist in nature.
0:46So how did they do it
0:48and why does this have the potential
0:49to solve problems way outside the realm of biology?
0:55A protein starts simply as a string of amino acids.
0:59Each amino acid has a carbon atom at the center.
1:02Then on one side is an amine group,
1:04and on the other side is a carboxyl group.
1:07And the last thing it's bonded to could be one
1:09of 20 different side chains,
1:11and which one determines which
1:13of the 20 different amino acids this molecule is.
1:17The amine group from one amino acid can react
1:20with the carboxyl group of another to form a peptide bond.
1:23So a series of amino acids can bond to form a string
1:27and pushing and pulling
1:29between countless molecules, electrostatic forces,
1:31hydrogen bonds, solvent interactions
1:34can cause this string to coil up and fold onto itself.
1:37This ultimately determines the 3D structure of the protein.
1:41And this shape is the thing
1:43that really matters about the protein.
1:45It's built for a specific purpose,
1:48like how hemoglobin has the perfect binding site
1:50to carry around oxygen in your blood.
1:53- These are machines, they need
1:55to be in their correct orientation in order to work together
1:58to move, for example, the proteins in your muscles.
2:01They change their shape a little bit
2:02in order to pull and contract.
2:04- But it would take people a long time
2:06to get the structure of just one protein.
2:08- Absolutely.
2:09So what proteins should look like
2:11only really started to be answered
2:12with experimental techniques.
2:15- [Derek] The first way protein structure was determined
2:17was by creating a crystal out of that protein.
2:20This was then exposed to x-rays
2:22to get a diffraction pattern,
2:24and then scientists would work backwards
2:26to try to figure out what shape
2:27of molecules would create such a pattern.
2:31It took British biochemist, John Kendrew, 12 years
2:34to get the first protein structure.
2:36His target was an oxygen storing protein called myoglobin,
2:40an important protein in our hearts.
2:43He first tried a horse heart,
2:45but this produced rather small crystals
2:46because it didn't have enough myoglobin.
2:49He knew diving mammals would have lots
2:52of myoglobin in their muscles
2:53since they're the best at conserving oxygen.
2:55So he obtained a huge chunk of whale meat from Peru.
2:59This finally gave Kendrew large enough crystals
3:02to create an x-ray diffraction image.
3:04- And when it came out, it looked really weird.
3:07People expected something
3:08kind of logical, mathematical, understandable,
3:11and it almost looked, I wouldn't say ugly, but intricate
3:14and complex and kind
3:15of like if you see a rocket motor,
3:18and all the parts hanging off.
3:20- [Derek] This structure, which has been called
3:21"Turd of the century,"
3:23won Kendrew the 1962 Nobel Prize in Chemistry.
3:28Over the next two decades,
3:29only around a hundred more structures were resolved.
3:32Even today, protein crystallization remains a big challenge.
3:36- Frankly it is not uncommon
3:38that just a couple protein structures
3:41can be someone's entire PhD.
3:42Sometimes just one, sometimes even just progress toward one.
3:46- And it's expensive.
3:47X-ray crystallography can cost tens of thousands
3:50of dollars per protein.
3:52So scientists sought another way
3:53to work out protein structure.
3:55It only costs around a hundred dollars
3:57to find a protein's sequence of amino acids.
4:00So if you could use this to figure out
4:02how the protein would fold,
4:03that would save a lot of time, effort, and money.
4:07I kind of know how carbon behaves
4:08and I know how carbon sticks to a sulfur
4:10and how that might stick next to a nitrogen.
4:12And if these ones are here,
4:13then I can imagine this one folding, making that bond there.
4:15So it seems like if you have some sense
4:17of basic molecular dynamics, you might be able to figure out
4:21how this protein's gonna fold.
4:22- One of the few true predictions in biology
4:25was actually Linus Pauling looking at just the geometry
4:28of the building blocks of proteins
4:29and saying, actually they should make helices and sheets.
4:32That's what we call secondary structure,
4:34the very local kind of twists and turns of the protein.
4:37- But beyond helices and sheets,
4:39biochemists could not figure out any reliable patterns
4:42that would lead to the final structure of all proteins.
4:46One reason for this is that evolution didn't design proteins
4:49from the ground up.
4:50- It's kind of like a programmer
4:52that doesn't know what they're doing,
4:54and whenever it looked good,
4:54they just kept adding that kind of thing.
4:56And that's how you end up
4:57with these objects that are both amazing
4:59and incredibly complex and hard to describe.
5:02They don't have purpose underneath them in the same way
5:05as like a human designed machine would.
5:08- [Derek] To illustrate just how complicated this process
5:11can get, MIT biologist Cyrus Levinthal did
5:14a back-of-the-envelope calculation,
5:16and he showed that even a short protein chain
5:18with 35 amino acids can fold
5:21in an astronomical number of ways.
5:24So even if a computer checked the energy stability
5:27of 30,000 configurations every nanosecond,
5:30it would take 200 times the age of the universe
5:33to find the correct structure.
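
The arithmetic behind this estimate can be sketched in a few lines of Python. The video does not say how many shapes each amino acid can take, so the sketch assumes two backbone angles per residue with three possible values each (3^70 configurations in total); that assumption, the constant names, and the 13.8-billion-year age of the universe are mine, not the narrator's.

```python
# Back-of-the-envelope Levinthal estimate.
# Assumption (not stated in the video): each of the 35 residues has two
# backbone angles and each angle can take 3 values, giving 3**70 shapes.
RESIDUES = 35
STATES_PER_RESIDUE = 3 ** 2        # assumed: 3 options for each of 2 angles
CHECKS_PER_NANOSECOND = 30_000     # rate quoted in the video
AGE_OF_UNIVERSE_YEARS = 13.8e9     # approximate accepted value
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_ratio() -> float:
    """Return the brute-force search time as a multiple of the age of the universe."""
    configurations = STATES_PER_RESIDUE ** RESIDUES
    checks_per_second = CHECKS_PER_NANOSECOND * 1e9
    seconds_needed = configurations / checks_per_second
    return seconds_needed / (AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR)


print(f"{levinthal_ratio():.0f} times the age of the universe")
```

Under these assumptions the result lands a little under 200, in line with the figure quoted above.
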
5:37Refusing to give up,
5:39the University of Maryland professor John Moult
5:41started a competition called CASP in 1994.
5:45The challenge was simple, to design a computer model
5:48that could take an amino acid sequence
5:50and output its structure.
5:52The modelers would not know the correct structure
5:55beforehand, but the output from each model would be compared
5:59to the experimentally determined structure.
6:02A perfect match would get a score of a hundred,
6:04but anything over 90 was considered close enough
6:07that the structure was solved.
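
The video does not name the score. CASP's headline metric is GDT_TS, which averages the percentage of residues that land within 1, 2, 4 and 8 angstroms of the experimental positions. The sketch below assumes the two structures are already superimposed (the real metric also searches over superpositions), and the coordinates are made up for illustration.

```python
import math


def gdt_ts(predicted, experimental):
    """Average percentage of residues within 1, 2, 4 and 8 angstroms.

    Simplification: both structures are assumed to be superimposed already.
    """
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example (coordinates in angstroms).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
shifted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

print(gdt_ts(reference, reference))  # a perfect match scores 100.0
print(gdt_ts(shifted, reference))    # residues are off by 0.5, 1.5, 3 and 9 angstroms
```
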
6:09CASP competitors gathered at an old wooden chapel
6:12turned conference center in Monterey, California,
6:15and at any point where a prediction didn't make sense,
6:17they were encouraged to tap their feet as friendly banter.
6:21There was a lot of foot tapping.
6:23(foot tapping)
6:25In the first year, teams could not achieve scores
6:28higher than 40.
6:30The early front runner was an algorithm called Rosetta,
6:33created by University of Washington biologist David Baker.
6:37One of his innovations was to boost computation
6:39by pooling together processing power from idle computers
6:42in homes, schools, and libraries that volunteered
6:45to install his software called Rosetta@home.
6:49- As part of it, there was a screensaver
6:51that showed basically the course
6:52of the protein folding calculation.
6:55And then we started getting people writing in saying
6:57that they were watching the screensaver
6:59and they thought they could do better than the computer.
7:02- So Baker had an idea.
7:04He created a video game.
7:06(upbeat music)
7:08The game, called Foldit,
7:10set up a protein chain capable of twisting
7:12and turning into different arrangements.
7:14- But now instead of the computer making the moves,
7:17the game players, the humans could make the moves.
7:20- Within three weeks,
7:21more than 50,000 gamers pooled
7:23their efforts to decipher an enzyme
7:25that plays a key role in HIV.
7:27X-Ray crystallography showed their result was correct.
7:30The gamers even got credited
7:32as co-authors on the research paper.
7:36Now, one man who played Foldit
7:37was a former child chess prodigy named Demis Hassabis.
7:41Hassabis had recently started an AI company called DeepMind.
7:45Their AI algorithm, AlphaGo made headlines
7:47for beating world champion Lee Sedol at the game of Go.
7:51One of AlphaGo's moves, move 37, shook Sedol to his core.
7:56But Hassabis never forgot about his time as a Foldit gamer.
8:00- So of course I was fascinated by this
8:02just from a games design perspective.
8:03You know, wouldn't it be amazing if we could mimic
8:05the intuition of these gamers who were only, by the way,
8:08of course, amateur biologists.
8:11- After returning from Korea,
8:12DeepMind researchers had a week-long hackathon
8:15where they tried to train AI to play Foldit.
8:18This was the beginning of Hassabis' longstanding goal
8:21of using AI to advance science.
8:23He initiated a new project called AlphaFold
8:26to solve the protein folding problem.
8:30Meanwhile at CASP, the quality
8:31of prediction from the best performers,
8:33including Rosetta had plateaued.
8:36In fact, the performance went downhill after CASP 8.
8:40The predictions weren't good enough,
8:42even with faster computers
8:43and a growing number of structures
8:45in the Protein Data Bank to train on.
8:48DeepMind hoped to change this with AlphaFold.
8:52Its first iteration, AlphaFold 1,
8:54was a standard off-the-shelf deep neural network like
8:57the ones used for computer vision at that time.
8:59The researchers trained it on lots
9:01and lots of protein structures from the Protein Data Bank.
9:05As input, AlphaFold took the protein's amino acid sequence
9:09and an important set of clues given by evolution.
9:13Evolution is driven by mutations,
9:15changes in the genetic code,
9:17which in turn change the amino acids
9:19within a given protein sequence.
9:21But as species evolve, proteins need to retain the shape
9:24that allows them to perform their specific function.
9:27For instance, hemoglobin looks the same in humans, cats,
9:30horses, and basically any mammal.
9:33Evolution says, if it ain't broke, don't fix it.
9:36So we can compare sequences
9:37of the same protein across different species
9:40in this evolutionary table.
9:42Where sequences are similar,
9:44it's likely they are important
9:45in the protein structure and function.
9:48But even where the sequences are different,
9:50it's helpful to look at where mutations happen in pairs
9:54because they can identify which amino acids
9:56are close to each other in the final structure.
9:59Say two amino acids, a positively charged lysine
10:02and a negatively charged glutamic acid attract
10:05and hold each other in the folded protein.
10:08Now, if a mutation changes lysine
10:10to a negatively charged amino acid,
10:13it would repel glutamic acid
10:15and destabilize the whole protein.
10:17Therefore, another mutation must replace glutamic acid
10:20with a positively charged amino acid.
10:23This is known as co-evolution.
10:25These evolutionary tables
10:27were an important input for AlphaFold.
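
One simple way to spot such paired mutations is to measure how much knowing the amino acid in one column of the table tells you about another column. The sketch below uses mutual information for that; it is an illustration of the idea, not the method AlphaFold uses, and the six-sequence alignment is invented so that positions 1 and 3 swap lysine (K) and glutamic acid (E) together.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (made up): one row per species, one column per position.
toy_alignment = [
    "AKGEL",
    "AKGEL",
    "AEGKL",
    "AEGKL",
    "AKGEV",
    "AEGKV",
]

coupled = mutual_information(toy_alignment, 1, 3)    # K/E swap together
unrelated = mutual_information(toy_alignment, 1, 4)  # vary independently
print(coupled, unrelated)
```

Here the coupled pair scores about 1 bit and the unrelated pair about 0, which is the kind of signal that hints two positions touch in the folded protein.
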
10:31As output, instead of directly producing a 3D structure,
10:35AlphaFold predicted a simpler 2D pair representation
10:38of that structure.
10:40The amino acid sequence
10:41is laid out horizontally and vertically.
10:43Whenever two amino acids are close
10:45to each other in the final structure,
10:47their corresponding row-column intersection is bright.
10:52Distant amino acid pairs are dim.
10:56In addition to distances,
10:58the pair representation can also hold information
11:00on how amino acid molecules are twisted
11:03within the structure.
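
A distance-only version of this 2D picture can be computed directly from 3D coordinates, as in the sketch below. The 8-angstrom cutoff for calling two residues "close" and the hairpin-shaped toy coordinates are assumptions for illustration; the torsion information mentioned above is left out.

```python
import math


def distance_map(coords):
    """Matrix of pairwise distances between residue positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(coords, cutoff=8.0):
    """True where two residues are within `cutoff` angstroms (the 'bright' cells)."""
    return [[d <= cutoff for d in row] for row in distance_map(coords)]


# Hypothetical 8-residue hairpin: the chain runs out along x, then folds back.
hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]

for row in contact_map(hairpin):
    print("".join("#" if close else "." for close in row))
```

Residues 0 and 7 are far apart in the sequence but show up as a contact because the chain folds back on itself.
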
11:05AlphaFold 1 fed the protein sequence
11:07and its evolutionary table into its deep neural network,
11:10which it had trained to predict the pair representation.
11:14Once it had this, a separate algorithm folded
11:16the amino acid string based
11:18on the distance and torsion constraints.
11:20And this was the final protein structure prediction.
11:24With this framework, AlphaFold entered CASP 13
11:28and it immediately turned heads.
11:31It was the clear winner after many editions,
11:34but it wasn't perfect.
11:36Its score of 70 was not enough
11:38to clear the CASP threshold of 90.
11:42DeepMind needed to get back to the drawing board
11:44to get better results.
11:46So Hassabis recruited John Jumper to lead AlphaFold.
11:50- AlphaFold 2 was really a system about designing
11:52our deep learning,
11:54the individual blocks, to be good at learning about proteins,
11:57to have the types of geometric, physical, evolutionary concepts
12:01that were needed and put it into the middle of the network
12:03instead of a process around it.
12:04And that was a tremendous accuracy boost.
12:07- [Derek] There were three key steps to get better results with AI.
12:11First, maximum compute power.
12:13Here, DeepMind was already better positioned
12:15than anybody in the world.
12:18It had access to the enormous computing power of Google,
12:21including their tensor processing units.
12:24Second, they needed a large and diverse data set.
12:27Is data the biggest roadblock and why?
12:31- I think it's too easy to say data's the roadblock
12:33and we should be careful about it.
12:34AlphaFold 2 was trained on the exact same data with much,
12:37much better machine learning as AlphaFold 1.
12:41So everyone overestimates the data blockage
12:44because it gets less severe with better machine learning.
12:48- [Derek] And that was the third key element, better AI algorithms.
12:53Now AI is not just good at protein folding.
12:56It can do all kinds of tasks
12:57that no one likes from writing emails
12:59to answering phone calls.
13:01Something I hate is building and maintaining a website.
13:04It's so much work from optimizing the website
13:07for different platforms, finding a good design
13:09so it looks professional to constantly updating it
13:12with new information about the business as it grows.
13:16That's why we partnered with Hostinger,
13:18the sponsor of today's video.
13:20Hostinger makes it super easy to build a website
13:22for yourself or your business.
13:24And with their advanced AI tools, you can simply describe
13:28what you want your website to look like.
13:30And in just a few seconds,
13:31your personalized website is up and running.
13:34Hostinger is designed to be as easy as possible
13:36for beginners and professionals.
13:38So any tweaks you need to make
13:40after that are super easy too.
13:42Just drag and drop any pictures
13:43or videos you want, where you want them,
13:46or just type what you want to say
13:48or have the AI help you here too,
13:49if writing isn't your thing either.
13:51And if you still want that human touch, Hostinger
13:53is always available
13:54with 24/7 support if you ever run into any issues.
13:57But when you're done building in just a few clicks,
13:59your website is live.
14:01It's all incredibly affordable too, with a domain
14:04and business email included for free.
14:06So to take your big idea online today,
14:08visit hostinger.com/ve or scan this QR code right here.
14:13And when you sign up, remember to use code VE at checkout
14:17to get 10% off your plan.
14:19I wanna thank Hostinger for sponsoring
14:20this part of the video.
14:21And now back to protein folding.
14:24As the AlphaFold 2 team searched for better algorithms,
14:27they turned to the transformer.
14:29That's the T in ChatGPT.
14:31And it relies on a concept called attention.
14:35In the sentence,
14:35"The animal didn't cross the street because it was too tired,"
14:39attention recognizes that "it" refers to "animal"
14:42and not "street" based on the word "tired."
14:45Attention adds context to any kind of sequential information
14:49by breaking it down into chunks,
14:51converting these into numerical representations
14:53or embeddings and making connections between them.
14:56In this case, the words "it" and "animal."
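
The core calculation is small enough to sketch: score one word's embedding against every other word's, turn the scores into weights with a softmax, and blend the other embeddings by those weights. The two-dimensional embeddings below are invented for illustration, and real transformers learn separate query, key and value projections that this sketch skips.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


# Made-up embeddings: "it" points mostly in the same direction as "animal".
words = ["animal", "street"]
embeddings = [[1.0, 0.0], [0.0, 1.0]]
it_query = [0.9, 0.1]

weights, blended = attention(it_query, embeddings, embeddings)
for word, weight in zip(words, weights):
    print(word, round(weight, 3))
```
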
15:003Blue1Brown has a great series
15:01of videos specifically about transformers and attention.
15:06Large language models use attention
15:07to predict the most appropriate word to add to a sentence,
15:10but AlphaFold also has sequential information,
15:13not sentences, but amino acid sequences.
15:17And to analyze them,
15:18the AlphaFold team built their own version
15:20of the transformer called an Evoformer.
15:24The Evoformer contained two towers,
15:26evolutionary information in the biology tower
15:29and pair representations in the geometry tower.
15:33Gone was AlphaFold 1's deep neural network that started
15:36with one tower and predicted the other.
15:38Instead, AlphaFold 2's Evoformer
15:40builds each tower separately.
15:42It starts with some initial guesses,
15:44evolutionary tables taken from known data sets as before,
15:48and the pair representations
15:49based on similar known proteins.
15:51And this time there's a bridge connecting the two towers
15:54that conveys newly found biological and geometry clues
15:58back and forth.
16:00In the biology tower,
16:01attention applied on a column identifies
16:03amino acid sequences that have been conserved.
16:06While along a row, it finds amino acid mutations
16:09that have occurred together.
16:11Whenever the Evoformer finds two closely linked amino acids
16:14in the evolutionary table,
16:15it means they are important to structure
16:18and it sends this information to the geometry tower.
16:20Here attention is applied
16:22to help calculate distances between amino acids.
16:25- There's also this thing called triangular attention
16:28that got introduced,
16:30which is essentially about letting
16:31triplets attend to each other.
16:33- [Derek] For each triplet of amino acids,
16:35AlphaFold applies the triangle inequality.
16:37The sum of two sides must be greater than the third.
16:41This constrains how far apart
16:43these three amino acids can be.
16:45This information is used to update the pair representation,
16:49- And that helps the model produce like
16:51a self-consistent picture of the structure.
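
The constraint itself is easy to state in code. The sketch below is not AlphaFold's triangular attention, which is a learned network layer; it is only a plain check of the triangle inequality on a table of pairwise distances, with made-up numbers, to show what "self-consistent" means here.

```python
from itertools import combinations


def triangle_violations(dist, tol=1e-9):
    """Return every triplet (i, j, k) whose distances cannot form a triangle."""
    violations = []
    for i, j, k in combinations(range(len(dist)), 3):
        a, b, c = dist[i][j], dist[j][k], dist[i][k]
        # One side longer than the other two combined is geometrically impossible.
        if a > b + c + tol or b > a + c + tol or c > a + b + tol:
            violations.append((i, j, k))
    return violations


# Hypothetical predicted distances (angstroms) between three amino acids.
consistent = [[0, 4, 6], [4, 0, 5], [6, 5, 0]]
inconsistent = [[0, 1, 10], [1, 0, 1], [10, 1, 0]]

print(triangle_violations(consistent))
print(triangle_violations(inconsistent))
```

In the second table, residues 0 and 2 are each 1 angstrom from residue 1, so they cannot also be 10 angstroms from each other.
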
16:53- [Derek] If the geometry tower finds it's impossible
16:56for two amino acids to be close to each other,
16:58then it tells the first tower
17:00to ignore their relationship in the evolutionary table.
17:03This exchange of information within the Evoformer
17:06is repeated 48 times,
17:08until information within both towers is refined.
17:11The geometrical features learned by this network
17:13are passed on to AlphaFold 2's second main innovation,
17:17the structure module.
17:18- For each amino acid,
17:19we pick three special atoms in the amino acid
17:22and say that those define a frame.
17:24And what the network does is it imagines
17:26that all the amino acids start out at the origin
17:29and it has to predict the appropriate translation
17:31and rotation to move these frames
17:33to where they sit in the real structure.
17:35So that's essentially what the structure module does.
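
Moving a frame from the origin to its place in the structure is a rigid transform: rotate, then translate. The sketch below assumes the three special atoms are the backbone N, C-alpha and C, uses placeholder local coordinates rather than real bond geometry, and rotates only about the z axis to keep the example short.

```python
import math


def rotation_z(theta):
    """3x3 rotation matrix for an angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_frame(rotation, translation, point):
    """Rotate a local point, then shift it by the translation."""
    return tuple(
        sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
        for row in range(3)
    )


# Placeholder local coordinates for one residue sitting at the origin.
local_atoms = {"N": (-1.0, 0.5, 0.0), "CA": (0.0, 0.0, 0.0), "C": (1.0, 0.5, 0.0)}

# Hypothetical predicted frame: quarter turn, then move 10 angstroms along x.
rotation = rotation_z(math.pi / 2)
translation = (10.0, 0.0, 0.0)

placed = {name: apply_frame(rotation, translation, xyz) for name, xyz in local_atoms.items()}
for name, xyz in placed.items():
    print(name, tuple(round(v, 3) for v in xyz))
```

Because the transform is rigid, the distances between the three atoms are unchanged; only the residue's position and orientation move.
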
17:37- But the thing that sets the structure module apart
17:39is what it doesn't do.
17:41- Previously, people might have imagined that you would
17:44like to encode the fact that this is a chain, you know,
17:47and that certain residues should sit next to each other.
17:50We don't really explicitly tell AlphaFold that.
17:53It's more like we give it a bag of amino acids
17:56and it's allowed to position each of them separately.
17:58And some people have thought that that helps it
18:02to not get stuck in terms of where things should be placed.
18:05It doesn't have to always be thinking about the constraint
18:07of these things forming a chain,
18:09that's something that emerges naturally later.
18:11- [Derek] That's why live AlphaFold folding videos
18:13can show it doing some weirdly non-physical stuff.
18:20The structure module outputs a 3D protein,
18:22but it still isn't ready.
18:24It's recycled at least three more times
18:27through the Evoformer to gain a deeper understanding
18:29of the protein. Only then is the final prediction made.
18:35In December, 2020, DeepMind returned to a virtual CASP
18:39with AlphaFold 2, and this time they did it.
18:43- I'm going to read an email from John Moult.
18:46"Your group has performed amazingly well in CASP 14,
18:50both relative to other groups
18:52and in absolute model accuracy.
18:54Congratulations on this work."
18:57- [Derek] For many proteins, AlphaFold 2 predictions
19:00were virtually indistinguishable from the actual structures
19:03and they finally beat the gold standard score of 90.
19:10- For me, having worked on this problem so long,
19:13after many, many stops and starts,
19:16and suddenly this is a solution.
19:17We'd solved the problem.
19:19This gives you such excitement about the way science works.
19:23- [Derek] Over six decades, all of the scientists working
19:26around the world on proteins painstakingly found
19:29about 150,000 protein structures.
19:32Then in one fell swoop, AlphaFold came in
19:35and unveiled over 200 million of them.
19:39Nearly all proteins known to exist in nature.
19:43In just a few months, AlphaFold advanced the work
19:46of research labs worldwide by several decades.
19:51It has directly helped us develop a vaccine for malaria.
19:55It's made possible the breaking down
19:57of antibiotic resistance enzymes,
19:59which makes many life-saving drugs effective again.
20:02It's even helped us understand
20:03how protein mutations lead
20:04to various diseases from schizophrenia to cancer,
20:07and biologists studying little known
20:09and endangered species suddenly
20:11had access to proteins and their life mechanism.
20:15The AlphaFold 2 paper has been cited over 30,000 times.
20:19It has truly made a step function leap
20:22in our understanding of life.
20:24John Jumper and Demis Hassabis were awarded one half
20:27of the 2024 Nobel Prize in chemistry for this breakthrough.
20:30The other half went to David Baker,
20:33but not for predicting structures using Rosetta.
20:35Instead, it was for designing
20:37completely new proteins from scratch.
20:40- It was really hard to make brand new proteins
20:42that would do things.
20:43And so that's kind of the problem that we solved.
20:45- To do so, he uses the same kind of generative AI
20:48that makes art in programs like DALL-E.
20:51- You can say draw a picture
20:52of a kangaroo riding on a rabbit
20:54or something, and it will do that.
20:56And so it's exactly what we did with proteins.
20:58- His technique called "RFdiffusion" is trained
21:01by adding random noise to a known protein structure,
21:04and then the AI has to remove this noise.
21:07Once trained in this way, the AI can be asked
21:10to produce proteins for various functions.
21:12It's given a random noise input,
21:14and the AI figures out a brand new protein
21:17that does what you asked it to do.
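
The training setup described here can be sketched as building pairs of a noised structure and the clean structure it came from. The video only says "random noise", so the Gaussian noise on raw coordinates, the function names and the toy backbone below are assumptions; the denoising network itself is not shown.

```python
import random


def add_noise(coords, noise_level, rng):
    """Jitter every coordinate with Gaussian noise of the given standard deviation."""
    return [
        tuple(value + rng.gauss(0.0, noise_level) for value in atom)
        for atom in coords
    ]


def training_pair(coords, noise_level, rng):
    """Return (model input, target): the noised structure and the clean one."""
    return add_noise(coords, noise_level, rng), coords


# Hypothetical three-residue backbone trace (angstroms).
known_structure = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
noisy, target = training_pair(known_structure, noise_level=1.0, rng=rng)
print(noisy)
print(target)
```

At generation time the process runs the other way: start from pure noise and let the trained model remove it step by step.
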
21:20This work has huge implications.
21:22I mean, imagine you got bitten by a venomous snake.
21:25If you're lucky, you'll have access to anti-venom prepared
21:29by milking venom from the exact kind of snake,
21:32which is then injected into live animals,
21:35and the antibodies from that animal are extracted
21:37and refined and then given to you as an anti-venom.
21:41The trouble is often people have allergic reactions
21:44to these antibodies from other organisms.
21:46But your odds of survival can be a lot better
21:48with the latest synthetic proteins designed in Baker's lab.
21:52They've created human compatible antibodies
21:54that can neutralize lethal snake venom.
21:57This anti-venom could be manufactured in large quantities
22:00and easily transported to the places where it's needed.
22:03With these tiny molecular machines,
22:05the possibilities are endless.
22:07What are the applications you're most excited about?
22:10- So I think vaccines are gonna be really powerful.
22:12We have a number of proteins
22:14that are in human clinical trials for cancer,
22:16and we're working on autoimmune disease now.
22:18We're really excited about problems
22:19like capturing greenhouse gases.
22:21So we're designing enzymes
22:22that can fix methane, break down plastic.
22:26- What makes this approach so effective is
22:28how fast they can create and iterate the proteins.
22:32- It's really quite miraculous
22:34for anyone who's a conventional school biochemist
22:36or protein scientist.
22:38We can now have designs on the computer,
22:40get the amino acid sequence of the designed proteins,
22:43and then in just a couple days we can get the protein out.
22:48Yeah. We've given a name to this,
22:49which is "Cowboy Biochemistry"
22:51because we just, like, you've just got to kind of go for it
22:54as fast as you can, and it turns out to work pretty well.
22:58- What AI has done for proteins is just a hint
23:00of what it can do in other fields
23:02and on larger scales.
23:05In materials science, for example,
23:06DeepMind's GNoME program has found 2.2 million new crystals,
23:11including over 400,000 stable materials
23:14that could power future technologies
23:15from superconductors to batteries.
23:18AI is creating transformative leaps in science
23:21by helping to solve some of the fundamental problems
23:24that have blocked human progress.
23:26- If we think of the whole tree of knowledge,
23:28you know there are certain problems
23:29where, you know, they're root node problems.
23:31If you unlock them, if you discover a solution to them,
23:33it would unlock a whole new branch or avenue of discovery.
23:38- And with this, AI is pushing forward the boundaries
23:41of human knowledge at a rate never seen before.
23:44- Speed ups of 2x are nice,
23:46they're great, we love them.
23:48Speed ups of 100,000x change what you do.
23:53You do fundamentally different stuff
23:55and you start to rebuild your science
23:58around the things that got easy.
24:01- And that's what I'm excited about.
24:03These discoveries represent real step function
24:06changes in science.
24:08Even if AI doesn't advance beyond where it is today,
24:10we will be reaping the benefits
24:12of these breakthroughs for decades.
24:15And assuming AI does continue to develop,
24:17well, it will open up opportunities
24:19that were previously thought impossible.
24:21Whether that's curing all diseases,
24:23creating novel materials,
24:25or restoring the environment to a pristine state.
24:29This sounds like an amazing future as long
24:32as the AI doesn't take over and destroy us all first.
24:36(slow cosmic music)