What if all the world's biggest problems have the same solution?

Veritasium


0:00What if all

0:01of the world's biggest problems from climate change,

0:04to curing diseases, to disposal of plastic waste,

0:07what if they all had the same solution?

0:10A solution so tiny it would be invisible.

0:14I'm inclined to believe this is possible,

0:16thanks to a recent breakthrough that solved one

0:18of the biggest problems of the last century.

0:21How to determine the structure of a protein?

0:23- It's been described to me as equivalent

0:25to Fermat's Last Theorem, but for biology.

0:28- Over six decades, tens of thousands

0:30of biologists painstakingly worked out

0:32the structure of 150,000 proteins.

0:35Then in just a few years, a team of

0:38around 15 determined the structure of 200 million.

0:42That's basically every protein known to exist in nature.

0:46So how did they do it

0:48and why does this have the potential

0:49to solve problems way outside the realm of biology?

0:55A protein starts simply as a string of amino acids.

0:59Each amino acid has a carbon atom at the center.

1:02Then on one side is an amine group,

1:04and on the other side is a carboxyl group.

1:07And the last thing it's bonded to could be one

1:09of 20 different side chains,

1:11and which one determines which

1:13of the 20 different amino acids this molecule is.

1:17The amine group from one amino acid can react

1:20with the carboxyl group of another to form a peptide bond.

1:23So a series of amino acids can bond to form a string,

1:27and pushing and pulling

1:29between countless molecules (electrostatic forces,

1:31hydrogen bonds, solvent interactions)

1:34can cause this string to coil up and fold onto itself.

1:37This ultimately determines the 3D structure of the protein.

1:41And this shape is the thing

1:43that really matters about the protein.

1:45It's built for a specific purpose,

1:48like how hemoglobin has the perfect binding site

1:50to carry around oxygen in your blood.

1:53- These are machines. They need

1:55to be in their correct orientation in order to work together

1:58to move, for example, the proteins in your muscles.

2:01They change their shape a little bit

2:02in order to pull and contract.

2:04- But it would take people a long time

2:06to get the structure of just one protein.

2:08- Absolutely.

2:09The question of what proteins look like

2:11only really started to be answered

2:12with experimental techniques.

2:15- [Derek] The first way protein structure was determined

2:17was by creating a crystal out of that protein.

2:20This was then exposed to x-rays

2:22to get a diffraction pattern,

2:24and then scientists would work backwards

2:26to try to figure out what shape

2:27of molecules would create such a pattern.

2:31It took British biochemist John Kendrew 12 years

2:34to get the first protein structure.

2:36His target was an oxygen-storing protein called myoglobin,

2:40an important protein in our hearts.

2:43He first tried a horse heart,

2:45but this produced rather small crystals

2:46because it didn't have enough myoglobin.

2:49He knew diving mammals would have lots

2:52of myoglobin in their muscles

2:53since they're the best at conserving oxygen.

2:55So he obtained a huge chunk of whale meat from Peru.

2:59This finally gave Kendrew large enough crystals

3:02to create an x-ray diffraction image.

3:04- And when it came out, it looked really weird.

3:07People expected something

3:08kind of logical, mathematical, understandable,

3:11and it almost looked, I wouldn't say ugly, but intricate

3:14and complex and kind

3:15of like if you see a rocket motor,

3:18and all the parts hanging off.

3:20- [Derek] This structure, which has been called

3:21"Turd of the century,"

3:23won Kendrew the 1962 Nobel Prize in Chemistry.

3:28Over the next two decades,

3:29only around a hundred more structures were resolved.

3:32Even today, protein crystallization remains a big challenge.

3:36- Frankly, it is not uncommon

3:38that just a couple of protein structures

3:41can be someone's entire PhD.

3:42Sometimes just one, sometimes even just progress toward one.

3:46- And it's expensive.

3:47X-ray crystallography can cost tens of thousands

3:50of dollars per protein.

3:52So scientists sought another way

3:53to work out protein structure.

3:55It only costs around a hundred dollars

3:57to determine a protein's sequence of amino acids.

4:00So if you could use this to figure out

4:02how the protein would fold,

4:03that would save a lot of time, effort, and money.

4:07I kind of know how carbon behaves

4:08and I know how carbon sticks to a sulfur

4:10and how that might stick next to a nitrogen.

4:12And if these ones are here,

4:13then I can imagine this one folding, making that bond there.

4:15So it seems like if you have some sense

4:17of basic molecular dynamics, you might be able to figure out

4:21how this protein's gonna fold.

4:22- One of the few true predictions in biology

4:25was actually Linus Pauling looking at just the geometry

4:28of the building blocks of proteins

4:29and saying, actually they should make helices and sheets.

4:32That's what we call secondary structure,

4:34the very local kind of twists and turns of the protein.

4:37- But beyond helices and sheets,

4:39biochemists could not figure out any reliable patterns

4:42that would lead to the final structure of all proteins.

4:46One reason for this is that evolution didn't design proteins

4:49from the ground up.

4:50- It's kind of like a programmer

4:52that doesn't know what they're doing,

4:54and whenever it looked good,

4:54they just kept adding that kind of thing.

4:56And that's how you end up

4:57with these objects that are both amazing

4:59and incredibly complex and hard to describe.

5:02They don't have purpose underneath them in the same way

5:05as, like, a human-designed machine would.

5:08- [Derek] To illustrate just how complicated this process

5:11can get, MIT biologist Cyrus Levinthal did

5:14a back-of-the-envelope calculation,

5:16and he showed that even a short protein chain

5:18with 35 amino acids can fold

5:21in an astronomical number of ways.

5:24So even if a computer checked the energetic stability

5:27of 30,000 configurations every nanosecond,

5:30it would take 200 times the age of the universe

5:33to find the correct structure.
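Levinthal's arithmetic can be reproduced in a few lines of Python. The transcript does not say how the number of possible folds was counted, so this sketch assumes a common textbook framing: each of the 35 amino acids has two rotatable backbone bonds, and each bond can settle into one of three angles, giving 3^70 configurations. The checking rate (30,000 per nanosecond) comes from the text; the 13.8-billion-year age of the universe is the standard estimate.

```python
# Levinthal-style estimate of a brute-force search over protein folds.
# ASSUMPTION: 2 rotatable backbone bonds per residue, 3 allowed angles per bond.
RESIDUES = 35
BONDS_PER_RESIDUE = 2
ANGLES_PER_BOND = 3

CHECKS_PER_SECOND = 30_000 * 10**9            # 30,000 configurations per nanosecond
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR


def levinthal_estimate(residues: int) -> tuple[int, float]:
    """Return (number of configurations, search time in ages of the universe)."""
    configurations = ANGLES_PER_BOND ** (BONDS_PER_RESIDUE * residues)
    seconds = configurations / CHECKS_PER_SECOND
    return configurations, seconds / AGE_OF_UNIVERSE_S


if __name__ == "__main__":
    n, ages = levinthal_estimate(RESIDUES)
    print(f"{n:.2e} configurations, about {ages:.0f} ages of the universe")
```

Under these assumptions the search takes roughly 190 ages of the universe, in line with the "200 times" figure. The result is extremely sensitive to the assumed number of angles per bond, which is the point of the exercise: the space is astronomically large either way.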

5:37Refusing to give up,

5:39University of Maryland professor John Moult

5:41started a competition called CASP in 1994.

5:45The challenge was simple: design a computer model

5:48that could take an amino acid sequence

5:50and output its structure.

5:52The modelers would not know the correct structure

5:55beforehand, but the output from each model would be compared

5:59to the experimentally determined structure.

6:02A perfect match would get a score of a hundred,

6:04but anything over 90 was considered close enough

6:07that the structure was solved.
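The transcript describes the CASP score only as a number out of 100. CASP's main metric is the Global Distance Test (GDT_TS), which averages the percentage of residues (typically their C-alpha atoms) whose predicted position lies within 1, 2, 4 and 8 Å of the experimental one. The sketch below is simplified: it assumes the prediction has already been superimposed on the experimental structure, whereas the real calculation searches over superpositions to maximize each percentage. The coordinates are hypothetical.

```python
import math

Point = tuple[float, float, float]


def gdt_ts(predicted: list[Point], actual: list[Point]) -> float:
    """Simplified GDT_TS score on a 0-100 scale.

    ASSUMPTION: the structures are already superimposed. The real score
    searches over superpositions to maximise the fraction at each cutoff.
    """
    if not actual or len(predicted) != len(actual):
        raise ValueError("structures must be non-empty and the same length")
    cutoffs = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in angstroms
    errors = [math.dist(p, a) for p, a in zip(predicted, actual)]
    fractions = [sum(e <= c for e in errors) / len(errors) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Hypothetical C-alpha positions: a straight 10-residue chain.
    experimental = [(3.8 * i, 0.0, 0.0) for i in range(10)]
    prediction = [(x, y + 3.0, z) for x, y, z in experimental]  # every residue off by 3 Å
    print(gdt_ts(experimental, experimental), gdt_ts(prediction, experimental))
```

A perfect match scores 100. Shifting every residue by 3 Å passes only the 4 Å and 8 Å cutoffs, so that prediction scores 50.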

6:09CASP competitors gathered at an old wooden chapel

6:12turned into a conference center in Monterey, California,

6:15and at any point where a prediction didn't make sense,

6:17they were encouraged to tap their feet as friendly banter.

6:21There was a lot of foot tapping.

6:23(foot tapping)

6:25In the first year, teams could not achieve scores

6:28higher than 40.

6:30The early front runner was an algorithm called Rosetta,

6:33created by University of Washington biologist David Baker.

6:37One of his innovations was to boost computation

6:39by pooling together processing power from idle computers

6:42in homes, schools, and libraries that volunteered

6:45to install his software, called Rosetta@home.

6:49- As part of it, there was a screensaver

6:51that showed basically the course

6:52of the protein folding calculation.

6:55And then we started getting people writing in saying

6:57that they were watching the screensaver

6:59and they thought they could do better than the computer.

7:02- So Baker had an idea.

7:04He created a video game.

7:06(upbeat music)

7:08The game, called Foldit,

7:10set up a protein chain capable of twisting

7:12and turning into different arrangements.

7:14- But now instead of the computer making the moves,

7:17the game players, the humans could make the moves.

7:20- Within three weeks,

7:21more than 50,000 gamers pooled

7:23their efforts to decipher an enzyme

7:25that plays a key role in HIV.

7:27X-Ray crystallography showed their result was correct.

7:30The gamers even got credited

7:32as co-authors on the research paper.

7:36Now, one man who played Foldit

7:37was a former child chess prodigy named Demis Hassabis.

7:41Hassabis had recently started an AI company called DeepMind.

7:45Their AI algorithm, AlphaGo, made headlines

7:47for beating world champion Lee Sedol at the game of Go.

7:51One of AlphaGo's moves, move 37, shook Lee Sedol to his core.

7:56But Hassabis never forgot about his time as a Foldit gamer.

8:00- So of course I was fascinated by this

8:02just from a games design perspective.

8:03You know, wouldn't it be amazing if we could mimic

8:05the intuition of these gamers who were only, by the way,

8:08of course, amateur biologists.

8:11- After returning from Korea,

8:12DeepMind researchers had a week-long hackathon

8:15where they tried to train AI to play Foldit.

8:18This was the beginning of Hassabis' work toward his longstanding goal

8:21of using AI to advance science.

8:23He initiated a new project called AlphaFold

8:26to solve the protein folding problem.

8:30Meanwhile at CASP, the quality

8:31of prediction from the best performers,

8:33including Rosetta, had plateaued.

8:36In fact, the performance went downhill after CASP 8.

8:40The predictions weren't good enough,

8:42even with faster computers

8:43and a growing number of structures

8:45in the Protein Data Bank to train on.

8:48DeepMind hoped to change this with AlphaFold.

8:52Its first iteration, AlphaFold 1,

8:54was a standard off-the-shelf deep neural network like

8:57the ones used for computer vision at that time.

8:59The researchers trained it on lots

9:01and lots of protein structures from the Protein Data Bank.

9:05As input, AlphaFold took the protein's amino acid sequence

9:09and an important set of clues given by evolution.

9:13Evolution is driven by mutations,

9:15changes in the genetic code,

9:17which in turn change the amino acids

9:19within a given protein sequence.

9:21But as species evolve, proteins need to retain the shape

9:24that allows them to perform their specific function.

9:27For instance, hemoglobin looks the same in humans, cats,

9:30horses, and basically any mammal.

9:33Evolution says, if it ain't broke, don't fix it.

9:36So we can compare sequences

9:37of the same protein across different species

9:40in this evolutionary table.

9:42Where sequences are similar,

9:44it's likely they are important

9:45in the protein structure and function.
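One simple way to turn "where sequences are similar" into a number is the Shannon entropy of each column of the evolutionary table (a multiple sequence alignment): a column where every species has the same amino acid scores 0. This is an illustrative measure chosen here, not necessarily what AlphaFold computes internally, and the sequences are made up.

```python
import math
from collections import Counter


def column_entropy(msa: list[str]) -> list[float]:
    """Shannon entropy (in bits) of each column of an alignment.

    0.0 means every species has the same amino acid at that position
    (fully conserved); higher values mean more variation.
    """
    if len({len(seq) for seq in msa}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    entropies = []
    for column in zip(*msa):
        total = len(column)
        counts = Counter(column)
        entropies.append(sum((c / total) * math.log2(total / c) for c in counts.values()))
    return entropies


# Hypothetical evolutionary table: one row per species, one column per position.
msa = [
    "MKVLE",
    "MKILE",
    "MRVLD",
    "MKVLE",
]

if __name__ == "__main__":
    for pos, h in enumerate(column_entropy(msa)):
        print(pos, round(h, 3))
```

Positions 0 and 3 are identical in every species and score 0; the other positions vary in one species out of four and score about 0.81 bits.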

9:48But even where the sequences are different,

9:50it's helpful to look at where mutations happen in pairs

9:54because they can identify which amino acids

9:56are close to each other in the final structure.

9:59Say two amino acids, a positively charged lysine

10:02and a negatively charged glutamic acid attract

10:05and hold each other in the folded protein.

10:08Now, if a mutation changes lysine

10:10to a negatively charged amino acid,

10:13it would repel glutamic acid

10:15and destabilize the whole protein.

10:17Therefore, another mutation must replace glutamic acid

10:20with a positively charged amino acid.

10:23This is known as co-evolution.
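Co-evolution can be spotted by looking for pairs of columns whose amino acids change together. The sketch below scores every column pair with mutual information, one of the simplest covariation measures. It is a generic illustration, not necessarily the statistic AlphaFold relies on, and practical tools add corrections (for example, for closely related sequences). The toy alignment is hypothetical: positions 1 and 4 swap between lysine (K) and glutamic acid (E) together, mirroring the charge-swap example.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def coevolving_pairs(msa: list[str], top: int = 3) -> list[tuple[int, int, float]]:
    """Return the `top` column pairs with the highest mutual information."""
    columns = ["".join(col) for col in zip(*msa)]
    scores = [
        (i, j, mutual_information(columns[i], columns[j]))
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sorted(scores, key=lambda s: s[2], reverse=True)[:top]


# Hypothetical alignment: positions 1 and 4 are either K...E or E...K,
# while position 3 varies independently of them.
msa = ["AKGVE", "AKGLE", "AEGVK", "AEGLK", "AKGIE", "AEGIK"]

if __name__ == "__main__":
    for i, j, mi in coevolving_pairs(msa):
        print(i, j, round(mi, 3))
```

The pair (1, 4) comes out on top with 1 bit of mutual information, while position 3, which mutates on its own, shares essentially none with either of them.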

10:25These evolutionary tables

10:27were an important input for AlphaFold.

10:31As output, instead of directly producing a 3D structure,

10:35AlphaFold predicted a simpler 2D pair representation

10:38of that structure.

10:40The amino acid sequence

10:41is laid out horizontally and vertically.

10:43Whenever two amino acids are close

10:45to each other in the final structure,

10:47their corresponding row-column intersection is bright.

10:52Distant amino acid pairs are dim.

10:56In addition to distances,

10:58the pair representation can also hold information

11:00on how amino acid molecules are twisted

11:03within the structure.
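The bright/dim grid is essentially a distance map. Given 3D coordinates with one point per amino acid (here a hypothetical five-residue chain, using something like the C-alpha atom), the sketch computes every pairwise distance and marks a pair as "bright" when it is within a cutoff. The 8 Å cutoff is an assumed convention that is common in contact prediction, not a value from the text.

```python
import math

Point = tuple[float, float, float]


def distance_map(coords: list[Point]) -> list[list[float]]:
    """Pairwise distances between residues, laid out as a square grid."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(coords: list[Point], cutoff: float = 8.0) -> list[list[bool]]:
    """True ("bright") where two residues are within `cutoff` angstroms."""
    return [[d <= cutoff for d in row] for row in distance_map(coords)]


# Hypothetical chain that bends back on itself (made-up coordinates, in Å).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 3.8, 0.0), (3.8, 3.8, 0.0)]

if __name__ == "__main__":
    for row in contact_map(coords):
        print("".join("#" if bright else "." for bright in row))
```

The map is symmetric, the diagonal is always bright, and the last residue lights up against the first because the chain folds back toward its start.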

11:05AlphaFold 1 fed the protein sequence

11:07and its evolutionary table into its deep neural network,

11:10which it had trained to predict the pair representation.

11:14Once it had this, a separate algorithm folded

11:16the amino acid string based

11:18on the distance and torsion constraints.

11:20And this was the final protein structure prediction.
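The transcript does not describe the separate algorithm AlphaFold 1 used to turn predicted distances into coordinates. As a stand-in, here is classical multidimensional scaling (MDS), a textbook method that recovers 3D coordinates from a complete, exact distance matrix, up to rotation, translation and mirror image. Real predictions are noisy and incomplete, so practical pipelines optimize a structure against the constraints instead; treat this only as an illustration that distances can pin down a shape. Distances alone cannot tell a structure from its mirror image, which is one reason extra information such as torsion angles is useful.

```python
import numpy as np


def coords_from_distances(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Classical MDS: recover coordinates from a complete distance matrix,
    up to rotation, translation and reflection."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)              # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]                # keep the `dim` largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))    # clip tiny negative round-off
    return eigvecs[:, top] * scale


def pairwise_distances(x: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_coords = rng.normal(size=(10, 3)) * 5.0         # a made-up "structure"
    rebuilt = coords_from_distances(pairwise_distances(true_coords))
    err = np.abs(pairwise_distances(rebuilt) - pairwise_distances(true_coords)).max()
    print(f"max distance error: {err:.2e}")
```

The rebuilt points will not sit at the original coordinates, but every pairwise distance matches, which is all a distance map can constrain.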

11:24With this framework, AlphaFold entered CASP 13

11:28and it immediately turned heads.

11:31It was the clear winner after many editions,

11:34but it wasn't perfect.

11:36Its score of 70 was not enough

11:38to clear the CASP threshold of 90.

11:42DeepMind needed to go back to the drawing board

11:44to get better results.

11:46So Hassabis recruited John Jumper to lead AlphaFold.

11:50- AlphaFold 2 was really a system about designing

11:52our deep learning,

11:54the individual blocks, to be good at learning about proteins:

11:57to have the types of geometric, physical, and evolutionary concepts

12:01that were needed, and to put them into the middle of the network

12:03instead of in a process around it.

12:04And that was a tremendous accuracy boost.

12:07- [Derek] There were three key steps to get better results with AI.

12:11First, maximum compute power.

12:13Here, DeepMind was already better positioned

12:15than anybody in the world.

12:18It had access to the enormous computing power of Google,

12:21including their tensor processing units.

12:24Second, they needed a large and diverse data set.

12:27Is data the biggest roadblock and why?

12:31- I think it's too easy to say data's the roadblock

12:33and we should be careful about it.

12:34AlphaFold 2 was trained on the exact same data as AlphaFold 1,

12:37with much, much better machine learning.

12:41So everyone overestimates the data blockage

12:44because it gets less severe with better machine learning.

12:48- [Derek] And that was the third key element, better AI algorithms.


14:24As the AlphaFold 2 team searched for better algorithms,

14:27they turned to the transformer.

14:29That's the T in ChatGPT.

14:31And it relies on a concept called attention.

14:35In the sentence,

14:35"The animal didn't cross the street because it was too tired,"

14:39attention recognizes that "it" refers to "animal"

14:42and not "street," based on the word "tired."

14:45Attention adds context to any kind of sequential information

14:49by breaking it down into chunks,

14:51converting these into numerical representations

14:53or embeddings and making connections between them.

14:56In this case, between the words "it" and "animal."
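Here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the transformer. Each token's embedding is projected into a query, a key and a value; comparing queries with keys gives a table of weights, and each output is a weighted mix of the values. The embeddings and projection matrices below are random stand-ins, so the weights will not actually link "it" to "animal"; in a trained model those matrices are learned.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Scaled dot-product attention.

    q, k: (tokens, d_k); v: (tokens, d_v).
    Returns the mixed values and the (tokens, tokens) attention weights.
    """
    scores = q @ k.T / np.sqrt(k.shape[-1])   # similarity of each query to each key
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = ["the", "animal", "didn't", "cross", "the", "street",
              "because", "it", "was", "too", "tired"]
    x = rng.normal(size=(len(tokens), 8))     # random stand-in embeddings
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
    out, weights = attention(x @ w_q, x @ w_k, x @ w_v)
    it = tokens.index("it")
    print("'it' attends most to:", tokens[int(weights[it].argmax())])
```

Row `i` of `weights` says how much token `i` draws on every other token; training adjusts the projections until, for example, the row for "it" concentrates on "animal".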

15:003Blue1Brown has a great series

15:01of videos specifically about transformers and attention.

15:06Large language models use attention

15:07to predict the most appropriate word to add to a sentence,

15:10and AlphaFold also works with sequential information:

15:13not sentences, but amino acid sequences.

15:17And to analyze them,

15:18the AlphaFold team built their own version

15:20of the transformer, called the Evoformer.

15:24The Evoformer contained two towers:

15:26evolutionary information in the biology tower

15:29and pair representations in the geometry tower.

15:33Gone was AlphaFold 1's deep neural network that started

15:36with one tower and predicted the other.

15:38Instead, AlphaFold 2's Evoformer

15:40builds each tower separately.

15:42It starts with some initial guesses:

15:44evolutionary tables taken from known data sets as before,

15:48and the pair representations

15:49based on similar known proteins.

15:51And this time there's a bridge connecting the two towers

15:54that conveys newly found biological and geometric clues

15:58back and forth.

16:00In the biology tower,

16:01attention applied on a column identifies

16:03amino acid sequences that have been conserved,

16:06while along a row, it finds amino acid mutations

16:09that have occurred together.

16:11Whenever the Evoformer finds two closely linked amino acids

16:14in the evolutionary table,

16:15it means they are important to the structure,

16:18and it sends this information to the geometry tower.

16:20Here attention is applied

16:22to help calculate distances between amino acids.

16:25- There's also this thing called triangular attention

16:28that got introduced,

16:30which is essentially about letting

16:31triplets attend to each other.

16:33- [Derek] For each triplet of amino acids,

16:35AlphaFold applies the triangle inequality.

16:37The sum of two sides must be greater than the third.

16:41This constrains how far apart

16:43these three amino acids can be.

16:45This information is used to update the pair representation.

16:49- And that helps the model produce like

16:51a self-consistent picture of the structure.
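AlphaFold's triangle updates are learned operations inside the network, not a hard-coded rule. To show the constraint itself, the sketch below uses classical triangle-inequality bound smoothing from distance geometry: if amino acids i and k are at most 5 Å apart, and k and j at most 6 Å apart, then i and j can be at most 11 Å apart. The bounds are hypothetical values in ångströms.

```python
def tighten_upper_bounds(upper: list[list[float]]) -> list[list[float]]:
    """Shrink pairwise upper distance bounds using the triangle inequality.

    For any triplet (i, j, k): d(i, j) <= d(i, k) + d(k, j).
    Sweeping over every k is a Floyd-Warshall-style pass.
    """
    n = len(upper)
    ub = [row[:] for row in upper]            # copy, leave the input untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = ub[i][k] + ub[k][j]
                if via_k < ub[i][j]:
                    ub[i][j] = via_k
    return ub


INF = float("inf")
# Hypothetical bounds: 0-1 and 1-2 are known to be close, 0-2 is unconstrained.
bounds = [[0.0, 5.0, INF],
          [5.0, 0.0, 6.0],
          [INF, 6.0, 0.0]]

if __name__ == "__main__":
    for row in tighten_upper_bounds(bounds):
        print(row)
```

The unknown 0-2 distance becomes bounded at 11 Å purely from the other two sides, which is the kind of self-consistency the triplet reasoning enforces.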

16:53- [Derek] If the geometry tower finds it's impossible

16:56for two amino acids to be close to each other,

16:58then it tells the first tower

17:00to ignore their relationship in the evolutionary table.

17:03This exchange of information within the Evoformer

17:06repeats 48 times,

17:08until information within both towers is refined.

17:11The geometrical features learned by this network

17:13are passed onto AlphaFold 2's second main innovation,

17:17the structure module.

17:18- For each amino acid,

17:19we pick three special atoms in the amino acid

17:22and say that those define a frame.

17:24And what the network does is it imagines

17:26that all the amino acids start out at the origin

17:29and it has to predict the appropriate translation

17:31and rotation to move these frames

17:33to where they sit in the real structure.

17:35So that's essentially what the structure module does.
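The frame idea can be sketched with NumPy. In AlphaFold 2 the three atoms are the backbone N, C-alpha and C; this sketch places the origin at C-alpha and builds orthogonal axes with Gram-Schmidt, one standard construction (AlphaFold's exact axis convention may differ). A frame is then just a rotation R and a translation t, and a residue "starting at the origin" corresponds to R = identity and t = 0. The atom coordinates are made up.

```python
import numpy as np


def frame_from_three_atoms(n: np.ndarray, ca: np.ndarray, c: np.ndarray):
    """Build a rigid frame (rotation R, translation t) from three backbone atoms.

    Origin at C-alpha; axes from Gram-Schmidt orthogonalisation.
    """
    e1 = c - ca
    e1 = e1 / np.linalg.norm(e1)
    u = n - ca
    e2 = u - np.dot(u, e1) * e1               # remove the component along e1
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)                     # right-handed third axis
    rotation = np.stack([e1, e2, e3], axis=1) # columns are the frame axes
    return rotation, ca


def to_global(rotation: np.ndarray, translation: np.ndarray, local_points: np.ndarray) -> np.ndarray:
    """Map points from a residue's local frame to global coordinates: R @ x + t."""
    return local_points @ rotation.T + translation


if __name__ == "__main__":
    # Hypothetical backbone atoms of one residue (angstroms).
    n_atom, ca, c = np.array([1.2, 0.9, 0.0]), np.array([2.0, 0.0, 0.0]), np.array([3.5, 0.3, 0.4])
    R, t = frame_from_three_atoms(n_atom, ca, c)
    local = (np.stack([n_atom, ca, c]) - t) @ R  # the same atoms seen from inside the frame
    print(np.round(local, 3))
```

Seen from inside its own frame, C-alpha sits at the origin and C lies on the first axis; the structure module's job is to predict each residue's R and t so that these local pieces land in the right global places.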

17:37- But the thing that sets the structure module apart

17:39is what it doesn't do.

17:41- Previously, people might have imagined that you would

17:44like to encode the fact that this is a chain, you know,

17:47and that certain residues should sit next to each other.

17:50We don't really explicitly tell AlphaFold that.

17:53It's more like we give it a bag of amino acids

17:56and it's allowed to position each of them separately.

17:58And some people have thought that that helps it

18:02to not get stuck in terms of where things should be placed.

18:05It doesn't have to always be thinking about the constraint

18:07of these things forming a chain,

18:09that's something that emerges naturally later.

18:11- [Derek] That's why live AlphaFold folding videos

18:13can show it doing some weirdly non-physical stuff.

18:20The structure module outputs a 3D protein,

18:22but it still isn't ready.

18:24It's recycled at least three more times

18:27through the Evoformer to gain a deeper understanding

18:29of the protein. Only then is the final prediction made.

18:35In December 2020, DeepMind returned to a virtual CASP

18:39with AlphaFold 2, and this time they did it.

18:43- I'm going to read an email from John Moult.

18:46"Your group has performed amazingly well in CASP 14,

18:50both relative to other groups

18:52and in absolute model accuracy.

18:54Congratulations on this work."

18:57- [Derek] For many proteins, AlphaFold 2 predictions

19:00were virtually indistinguishable from the actual structures

19:03and they finally beat the gold standard score of 90.

19:10- For me, having worked on this problem so long,

19:13after many, many stops and starts,

19:16and suddenly this is a solution.

19:17We'd solved the problem.

19:19This gives you such excitement about the way science works.

19:23- [Derek] Over six decades, all of the scientists working

19:26around the world on proteins painstakingly found

19:29about 150,000 protein structures.

19:32Then in one fell swoop, AlphaFold came in

19:35and unveiled over 200 million of them.

19:39Nearly all proteins known to exist in nature.

19:43In just a few months, AlphaFold advanced the work

19:46of research labs worldwide by several decades.

19:51It has directly helped us develop a vaccine for malaria.

19:55It's made it possible to break down

19:57antibiotic-resistance enzymes,

19:59making many life-saving drugs effective again.

20:02It's even helped us understand

20:03how protein mutations lead

20:04to various diseases from schizophrenia to cancer,

20:07and biologists studying little-known

20:09and endangered species suddenly

20:11had access to their proteins and life mechanisms.

20:15The AlphaFold 2 paper has been cited over 30,000 times.

20:19It has truly made a step function leap

20:22in our understanding of life.

20:24John Jumper and Demis Hassabis were awarded one half

20:27of the 2024 Nobel Prize in Chemistry for this breakthrough.

20:30The other half went to David Baker,

20:33but not for predicting structures using Rosetta.

20:35Instead, it was for designing

20:37completely new proteins from scratch.

20:40- It was really hard to make brand new proteins

20:42that would do things.

20:43And so that's kind of the problem that we solved.

20:45- To do so, he uses the same kind of generative AI

20:48that makes art in programs like DALL-E.

20:51- You can say draw a picture

20:52of a kangaroo riding on a rabbit

20:54or something, and it will do that.

20:56And so it's exactly what we did with proteins.

20:58- His technique, called RFdiffusion, is trained

21:01by adding random noise to a known protein structure,

21:04and then the AI has to remove this noise.

21:07Once trained in this way, the AI can be asked

21:10to produce proteins for various functions.

21:12It's given a random noise input,

21:14and the AI figures out a brand new protein

21:17that does what you asked it to do.
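The training recipe described here follows the general denoising-diffusion idea. The sketch shows only the noising half, in a generic DDPM style, applied to a made-up array of 3D coordinates with an assumed linear noise schedule; RFdiffusion's actual scheme also handles residue orientations and differs in detail. A denoising network (not shown) would be trained to predict the added noise `eps` from the noisy coordinates and the step `t`, and at generation time it starts from pure noise and removes it step by step.

```python
import numpy as np


def make_schedule(steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """alpha_bar[t]: fraction of the original signal left after t+1 noising steps.

    ASSUMPTION: a linear beta schedule, as in the original DDPM paper.
    """
    betas = np.linspace(beta_start, beta_end, steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Jump directly to step t: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.normal(size=x0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(scale=10.0, size=(64, 3))   # hypothetical C-alpha positions
    alpha_bar = make_schedule()
    for t in (0, 250, 999):
        xt, eps = add_noise(coords, t, alpha_bar, rng)
        # (xt, t) -> eps is one training example for the (hypothetical) denoiser.
        print(t, round(float(alpha_bar[t]), 4))
```

Early steps barely perturb the structure, while by the last step almost nothing of the original coordinates survives, which is why generation can begin from pure random noise.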

21:20This work has huge implications.

21:22I mean, imagine you got bitten by a venomous snake.

21:25If you're lucky, you'll have access to anti-venom prepared

21:29by milking venom from the exact kind of snake,

21:32which is then injected into live animals,

21:35and the antibodies from that animal are extracted

21:37and refined and then given to you as an anti-venom.

21:41The trouble is often people have allergic reactions

21:44to these antibodies from other organisms.

21:46But your odds of survival can be a lot better

21:48with the latest synthetic proteins designed in Baker's lab.

21:52They've created human-compatible proteins

21:54that can neutralize lethal snake venom.

21:57This anti-venom could be manufactured in large quantities

22:00and easily transported to the places where it's needed.

22:03With these tiny molecular machines,

22:05the possibilities are endless.

22:07What are the applications you're most excited about?

22:10- So I think vaccines are gonna be really powerful.

22:12We have a number of proteins

22:14that are in human clinical trials for cancer,

22:16and we're working on autoimmune disease now.

22:18We're really excited about problems

22:19like capturing greenhouse gases.

22:21So we're designing enzymes

22:22that can fix methane, break down plastic.

22:26- What makes this approach so effective is

22:28how fast they can create and iterate the proteins.

22:32- It's really quite miraculous

22:34for anyone who's a conventional school biochemist

22:36or protein scientist.

22:38We can now have designs on the computer,

22:40get the amino acid sequence of the designed proteins,

22:43and then in just a couple days we can get the protein out.

22:48Yeah. We've given a name to this,

22:49which is "Cowboy Biochemistry"

22:51because we just, like, you just kind of go for it

22:54as fast as you can, and it turns out to work pretty well.

22:58- What AI has done for proteins is just a hint

23:00of what it can do in other fields

23:02and on larger scales.

23:05In materials science, for example,

23:06DeepMind's GNoME program has found 2.2 million new crystals,

23:11including over 400,000 stable materials

23:14that could power future technologies

23:15from superconductors to batteries.

23:18AI is creating transformative leaps in science

23:21by helping to solve some of the fundamental problems

23:24that have blocked human progress.

23:26- If we think of the whole tree of knowledge,

23:28you know, there are certain problems

23:29that are, you know, root node problems.

23:31If you unlock them, if you discover a solution to them,

23:33it would unlock a whole new branch or avenue of discovery.

23:38- And with this, AI is pushing forward the boundaries

23:41of human knowledge at a rate never seen before.

23:44- Speedups of 2x are nice,

23:46they're great, we love them.

23:48Speedups of 100,000x change what you do.

23:53You do fundamentally different stuff

23:55and you start to rebuild your science

23:58around the things that got easy.

24:01- And that's what I'm excited about.

24:03These discoveries represent real step function

24:06changes in science.

24:08Even if AI doesn't advance beyond where it is today,

24:10we will be reaping the benefits

24:12of these breakthroughs for decades.

24:15And assuming AI does continue to develop,

24:17well, it will open up opportunities

24:19that were previously thought impossible.

24:21Whether that's curing all diseases,

24:23creating novel materials,

24:25or restoring the environment to a pristine state.

24:29This sounds like an amazing future as long

24:32as the AI doesn't take over and destroy us all first.

24:36(slow cosmic music)