0:01What if all of the world's biggest
problems from climate change,
0:04to curing diseases, to
disposal of plastic waste,
0:07what if they all had the same solution?
0:10A solution so tiny it would be invisible.
0:14I'm inclined to believe this is possible,
0:16thanks to a recent
breakthrough that solved one
0:18of the biggest problems
of the last century.
0:21How to determine the
structure of a protein?
0:23- It's been described to me as equivalent
0:25to Fermat's last theorem, but for biology.
0:28- Over six decades, tens of thousands
0:30of biologists painstakingly worked out
0:32the structure of 150,000 proteins.
0:35Then in just a few years, a team of
0:38around 15 determined the
structure of 200 million.
0:42That's basically every protein
known to exist in nature.
0:46So how did they do it
0:48and why does this have the potential
0:49to solve problems way
outside the realm of biology?
0:55A protein starts simply as
a string of amino acids.
0:59Each amino acid has a
carbon atom at the center.
1:02Then on one side is an amine group,
1:04and on the other side is a carboxyl group.
1:07And the last thing it's
bonded to could be one
1:09of 20 different side chains,
1:11and which one determines which
1:13of the 20 different amino
acids this molecule is.
1:17The amine group from
one amino acid can react
1:20with the carboxyl group of
another to form a peptide bond.
1:23So a series of amino acids
can bond to form a string
1:27and pushing and pulling
1:29between countless molecules,
electrostatic forces,
1:31hydrogen bonds, solvent interactions
1:34can cause this string to
coil up and fold onto itself.
1:37This ultimately determines the
3D structure of the protein.
1:41And this shape is the thing
1:43that really matters about the protein.
1:45It's built for a specific purpose,
1:48like how hemoglobin has
the perfect binding site
1:50to carry around oxygen in your blood.
1:53- These are machines, they need
1:55to be in their correct orientation
in order to work together
1:58to move, for example, the
proteins in your muscles.
2:01They change their shape a little bit
2:02in order to pull and contract.
2:04- But it would take people a long time
2:06to get the structure of just one protein.
2:09So what should proteins look like?
2:11Was only started to answer really
2:12with experimental techniques.
2:15- [Derek] The first way protein
structure was determined
2:17was by creating a crystal
out of that protein.
2:20This was then exposed to x-rays
2:22to get a diffraction pattern,
2:24and then scientists would work backwards
2:26to try to figure out what shape
2:27of molecules would create such a pattern.
2:31It took British biochemist,
John Kendrew, 12 years
2:34to get the first protein structure.
2:36His target was an oxygen storing
protein called myoglobin,
2:40an important protein in our hearts.
2:43He first tried a horse heart,
2:45but this produced rather small crystals
2:46because it didn't have enough myoglobin.
2:49He knew diving mammals would have lots
2:52of myoglobin in their muscles
2:53since they're the best
at conserving oxygen.
2:55So he obtained a huge chunk
of whale meat from Peru.
2:59This finally gave Kendrew
large enough crystals
3:02to create an x-ray diffraction image.
3:04- And when it came out,
it looked really weird.
3:07People expected something
3:08kind of logical,
mathematical, understandable,
3:11and it almost looked, I
wouldn't say ugly, but intricate
3:15of like if you see a rocket motor,
3:18and all the parts hanging off.
3:20- [Derek] This structure,
which has been called
3:21"Turd of the century,"
3:23won Kendrew, the 1962
Nobel Prize in chemistry.
3:28Over the next two decades,
3:29only around a hundred more
structures were resolved.
3:32Even today, protein crystallization
remains a big challenge.
3:36- Frankly it is not uncommon
3:38that just a couple protein structures
3:41can be someone's entire PhD.
3:42Sometimes just one, sometimes
even just progress toward one.
3:46- And it's expensive.
3:47X-ray crystallography
can cost tens of thousands
3:50of dollars per protein.
3:52So scientists sought another way
3:53to work out protein structure.
3:55It only costs around a hundred dollars
3:57to find a protein sequence of amino acids.
4:00So if you could use this to figure out
4:02how the protein would fold,
4:03that would save a lot of
time, effort, and money.
4:07I kind of know how carbon behaves
4:08and I know how carbon sticks to a sulfur
4:10and how that might stick
next to a nitrogen.
4:12And if these ones are here,
4:13then I can imagine this one
folding, making that bond there.
4:15So it seems like if you have some sense
4:17of basic molecular dynamics,
you might be able to figure out
4:21how this protein's gonna fold.
4:22- One of the few true
predictions in biology
4:25was actually Linus Pauling
looking at just the geometry
4:28of the building blocks of proteins
4:29and saying, actually they
should make helices and sheets.
4:32That's what we call secondary structure,
4:34the very local kind of twists
and turns of the protein.
4:37- But beyond helices and sheets,
4:39biochemists could not figure
out any reliable patterns
4:42that would lead to the final
structure of all proteins.
4:46One reason for this is that
evolution didn't design proteins.
4:50- It's kind of like a programmer
4:52that doesn't know what they're doing,
4:54and whenever it looked good,
4:54they just kept adding that kind of thing.
4:56And that's how you end up
4:57with these both amazing objects
4:59and incredibly complex
and hard to describe.
5:02They don't have purpose
underneath them in the same way
5:05as like a human designed machine would.
5:08- [Derek] To illustrate just
how complicated this process
5:11can get, MIT biologist Cyrus Levinthal did
5:14a back-of-the-envelope calculation,
5:16and he showed that even
a short protein chain
5:18with 35 amino acids can fold
5:21in an astronomical number of ways.
5:24So even if a computer checked
the energy stability
5:27of 30,000 configurations every nanosecond,
5:30it would take 200 times
the age of the universe
5:33to find the correct structure.
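The arithmetic behind that estimate can be roughly reproduced. The video does not say how the configurations were counted, so this sketch assumes two backbone torsion angles per amino acid and three possible values per angle, which is one common way of presenting Levinthal's argument. Only the 35 residues and the 30,000 checks per nanosecond come from the narration.

```python
# Back-of-the-envelope estimate in the spirit of Levinthal's argument.
# ASSUMPTION (not stated in the video): each residue contributes two backbone
# torsion angles, and each angle can take three distinct values.
RESIDUES = 35                     # chain length quoted in the video
ANGLES_PER_RESIDUE = 2            # assumed
STATES_PER_ANGLE = 3              # assumed
CHECKS_PER_NANOSECOND = 30_000    # rate quoted in the video
AGE_OF_UNIVERSE_YEARS = 13.8e9    # approximate
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Total number of distinct chain configurations under the assumptions above.
configurations = STATES_PER_ANGLE ** (RESIDUES * ANGLES_PER_RESIDUE)

# Time to check every configuration once, converted from nanoseconds to seconds.
search_seconds = configurations / CHECKS_PER_NANOSECOND * 1e-9
universe_ages = search_seconds / (AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR)

print(f"{configurations:.2e} configurations")
print(f"about {universe_ages:.0f} times the age of the universe")
```

Under these assumptions the result lands close to the figure of 200 quoted above. A different choice of states per angle would change the number by many orders of magnitude, which is why this is only an order-of-magnitude argument.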
5:39the University of Maryland
professor John Moult
5:41started a competition called CASP in 1994.
5:45The challenge was simple,
to design a computer model
5:48that could take an amino acid sequence
5:50and output its structure.
5:52The modelers would not
know the correct structure
5:55beforehand, but the output from
each model would be compared
5:59to the experimentally
determined structure.
6:02A perfect match would
get a score of a hundred,
6:04but anything over 90 was
considered close enough
6:07that the structure was solved.
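The video does not name the scoring metric. CASP's headline score is GDT_TS, which averages the percentage of residues that fall within 1, 2, 4, and 8 angstroms of their experimental positions. The sketch below is a simplified version that assumes the two structures are already superimposed (the real score also searches over superpositions), and the coordinates are hypothetical.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score for two pre-superimposed structures.

    Both arguments are lists of (x, y, z) C-alpha coordinates in angstroms.
    The real CASP score also optimizes the superposition; this sketch skips that.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty structures of equal length")
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues within each distance cutoff, averaged over cutoffs.
    fractions = [sum(d <= c for d in deviations) / len(deviations) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical four-residue example: deviations of 0.5, 1.5, 3.0 and 9.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.5), (3.8, 0.0, 1.5), (7.6, 0.0, 3.0), (11.4, 0.0, 9.0)]

print(gdt_ts(experimental, experimental))
print(gdt_ts(predicted, experimental))
```

A perfect prediction scores 100, and every residue that drifts past a cutoff pulls the average down.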
6:09CASP competitors gathered
at an old wooden chapel
6:12turned conference center
in Monterey, California,
6:15and at any point where a
prediction didn't make sense,
6:17they were encouraged to tap
their feet as friendly banter.
6:21There was a lot of foot tapping.
6:25In the first year, teams
could not achieve scores
6:30The early front runner was
an algorithm called Rosetta,
6:33created by University of
Washington biologist David Baker.
6:37One of his innovations
was to boost computation
6:39by pooling together processing
power from idle computers
6:42in homes, schools, and
libraries that volunteered
6:45to install his software
called Rosetta@home.
6:49- As part of it, there was a screensaver
6:51that showed basically the course
6:52of the protein folding calculation.
6:55And then we started getting
people writing in saying
6:57that they were watching the screensaver
6:59and they thought they could
do better than the computer.
7:02- So Baker had an idea.
7:04He created a video game.
7:08The game, called Foldit,
7:10set up a protein chain capable of twisting
7:12and turning into different arrangements.
7:14- But now instead of the
computer making the moves,
7:17the game players, the
humans could make the moves.
7:20- Within three weeks,
7:21more than 50,000 gamers pooled
7:23their efforts to decipher an enzyme
7:25that plays a key role in HIV.
7:27X-Ray crystallography showed
their result was correct.
7:30The gamers even got credited
7:32as co-authors on the research paper.
7:36Now, one man who played Foldit
7:37was a former child chess
prodigy named Demis Hassabis.
7:41Hassabis had recently started
an AI company called DeepMind.
7:45Their AI algorithm, AlphaGo, made headlines
7:47for beating world champion
Lee Sedol at the game of Go.
7:51One of AlphaGo's moves, move
37, shook Sedol to his core.
7:56But Hassabis never forgot about
his time as a Foldit gamer.
8:00- So of course I was fascinated this
8:02just from games design perspective.
8:03You know, wouldn't it be
amazing if we could mimic
8:05the intuition of these gamers
who were only, by the way,
8:08of course, amateur biologists.
8:11- After returning from Korea,
8:12DeepMind researchers had
a week-long hackathon
8:15where they tried to
train AI to play Foldit.
8:18This was the beginning of
Hassabis' longstanding goal
8:21of using AI to advance science.
8:23He initiated a new
project called AlphaFold
8:26to solve the protein folding problem.
8:30Meanwhile at CASP, the quality
8:31of prediction from the best performers,
8:33including Rosetta had plateaued.
8:36In fact, the performance went
downhill after CASP 8.
8:40The predictions weren't good enough,
8:42even with faster computers
8:43and a growing number of structures
8:45in the Protein Data Bank to train on.
8:48DeepMind hoped to change
this with AlphaFold.
8:52Its first iteration, AlphaFold 1,
8:54was a standard off-the-shelf
deep neural network like
8:57the ones used for computer
vision at that time.
8:59The researchers trained it on lots
9:01and lots of protein structures
from the Protein Data Bank.
9:05As input, AlphaFold took the
protein's amino acid sequence
9:09and an important set of
clues given by evolution.
9:13Evolution is driven by mutations,
9:15changes in the genetic code,
9:17which in turn change the amino acids
9:19within a given protein sequence.
9:21But as species evolve, proteins
need to retain the shape
9:24that allows them to perform
their specific function.
9:27For instance, hemoglobin looks
the same in humans, cats,
9:30horses, and basically any mammal.
9:33Evolution says, if it
ain't broke, don't fix it.
9:36So we can compare sequences
9:37of the same protein
across different species
9:40in this evolutionary table.
9:42Where sequences are similar,
9:44it's likely they are important
9:45in the protein structure and function.
9:48But even where the
sequences are different,
9:50it's helpful to look at where
mutations happen in pairs
9:54because they can identify
which amino acids
9:56are close to each other
in the final structure.
9:59Say two amino acids, a
positively charged lysine
10:02and a negatively charged
glutamic acid attract
10:05and hold each other in the folded protein.
10:08Now, if a mutation changes lysine
10:10to a negatively charged amino acid,
10:13it would repel glutamic acid
10:15and destabilize the whole protein.
10:17Therefore, another mutation
must replace glutamic acid
10:20with a positively charged amino acid.
10:23This is known as co-evolution.
10:25These evolutionary tables
10:27were an important input for AlphaFold.
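One classic way to spot paired mutations in such a table is to measure the mutual information between its columns. The sketch below uses a toy alignment invented for illustration, and mutual information is a stand-in here: it is not the method AlphaFold itself uses, which learns these patterns with a neural network.

```python
import math
from collections import Counter
from itertools import combinations

# Toy alignment, invented for illustration: one row per species, one column
# per position. Column 0 is fully conserved. Columns 1 and 3 swap K and E
# together, mimicking a lysine / glutamic acid charge pair.
MSA = ["MKAEL", "MKGEL", "MEAKV", "MEGKL", "MKAEV", "MEGKV"]

def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    return sum(
        (count / n) * math.log2((count / n) / ((col_i[a] / n) * (col_j[b] / n)))
        for (a, b), count in pairs.items()
    )

# Score every pair of columns and pick the one that co-varies most strongly.
scores = {
    (i, j): mutual_information(MSA, i, j)
    for i, j in combinations(range(len(MSA[0])), 2)
}
best_pair = max(scores, key=scores.get)
print(best_pair, round(scores[best_pair], 3))
```

The conserved column scores zero against everything because it never changes, while the two columns that always mutate together stand out as the likely contact.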
10:31As output, instead of directly
producing a 3D structure,
10:35AlphaFold predicted a simpler
2D pair representation
10:40The amino acid sequence
10:41is laid out horizontally and vertically.
10:43Whenever two amino acids are close
10:45to each other in the final structure,
10:47their corresponding row-column
intersection is bright.
10:52Distant amino acid pairs are dim.
10:56In addition to distances,
10:58the pair representation
can also hold information
11:00on how amino acid molecules are twisted
11:03within the structure.
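The distance part of a pair representation is easy to compute when the structure is already known. This is a minimal sketch with hypothetical coordinates for a six-residue hairpin. The 8 angstrom contact cutoff is an assumption chosen for illustration, not a value from the video.

```python
import math

def distance_map(coords):
    """Square matrix of pairwise distances between residue positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_map(coords, cutoff=8.0):
    """True ("bright") where two residues are within the cutoff, else False ("dim")."""
    return [[d <= cutoff for d in row] for row in distance_map(coords)]

# Hypothetical coordinates in angstroms: a chain that runs out and folds back,
# so the first and last residues end up next to each other.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]

distances = distance_map(coords)
contacts = contact_map(coords)
for row in contacts:
    print("".join("#" if close else "." for close in row))
```

Residues 0 and 5 are far apart along the chain but close in space, so their intersection is bright, which is exactly the kind of information the map is meant to capture.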
11:05AlphaFold 1 fed the protein sequence
11:07and its evolutionary table
into its deep neural network,
11:10which it had trained to predict
the pair representation.
11:14Once it had this, a
separate algorithm folded
11:16the amino acid string based
11:18on the distance and torsion constraints.
11:20And this was the final
protein structure prediction.
11:24With this framework,
AlphaFold entered CASP 13
11:28and it immediately turned heads.
11:31It was the clear winner
after many editions,
11:34but it wasn't perfect.
11:36Its score of 70 was not enough
11:38to clear the CASP threshold of 90.
11:42DeepMind needed to get
back to the drawing board
11:44to get better results.
11:46So Hassabis recruited John
Jumper to lead AlphaFold.
11:50- AlphaFold 2 was really
a system about designing
11:54the individual blocks to be
good at learning about proteins,
11:57have the types of geometric,
physical, evolutionary concepts
12:01that were needed and put it
into the middle of the network
12:03instead of a process around it.
12:04And that was a tremendous accuracy boost.
12:07- [Derek] There were three key steps
to get better results with AI.
12:11First, maximum compute power.
12:13Here, DeepMind was
already better positioned
12:15than anybody in the world.
12:18It had access to the enormous
computing power of Google,
12:21including their tensor processing units.
12:24Second, they needed a
large and diverse data set.
12:27Is data the biggest roadblock and why?
12:31- I think it's too easy to
say data's the roadblock
12:33and we should be careful about it.
12:34AlphaFold 2 was trained on
the exact same data with much,
12:37much better machine
learning as AlphaFold 1.
12:41So everyone overestimates
the data blockage
12:44because it gets less severe
with better machine learning.
12:48- [Derek] And that was the third key
element, better AI algorithms.
12:53Now AI is not just good
at protein folding.
12:56It can do all kinds of tasks
12:57that no one likes from writing emails
12:59to answering phone calls.
13:01Something I hate is building
and maintaining a website.
13:04It's so much work from
optimizing the website
13:07for different platforms,
finding a good design
13:09so it looks professional
to constantly updating it
13:12with new information about
the business as it grows.
13:16That's why we partnered with Hostinger,
13:18the sponsor of today's video.
13:20Hostinger makes it super
easy to build a website
13:22for yourself or your business.
13:24And with their advanced AI
tools, you can simply describe
13:28what you want your website to look like.
13:30And in just a few seconds,
13:31your personalized website
is up and running.
13:34Hostinger is designed to
be as easy as possible
13:36for beginners and professionals.
13:38So any tweaks you need to make
13:40after that are super easy too.
13:42Just drag and drop any pictures
13:43or videos you want, where you want them,
13:46or just type what you want to say
13:48or have the AI help you here too,
13:49if writing isn't your thing either.
13:51And if you still want that
human touch, Hostinger
13:54with 24/7 support if you
ever run into any issues.
13:57But when you're done building
in just a few clicks,
13:59your website is live.
14:01It's all incredibly
affordable too, with a domain
14:04and business email included for free.
14:06So to take your big idea online today,
14:08visit hostinger.com/ve or
scan this QR code right here.
14:13And when you sign up, remember
to use code VE at checkout
14:17to get 10% off your plan.
14:19I wanna thank Hostinger for sponsoring
14:20this part of the video.
14:21And now back to protein folding.
14:24As the AlphaFold 2 team
searched for better algorithms,
14:27they turned to the transformer.
14:29That's the T in ChatGPT.
14:31And it relies on a
concept called attention.
14:35"The animal didn't cross the
street because it was too tired."
14:39Attention recognizes
that it refers to animal
14:42and not street based on the word tired.
14:45Attention adds context to any
kind of sequential information
14:49by breaking it down into chunks,
14:51converting these into
numerical representations
14:53or embeddings and making
connections between them.
14:56In this case, the words "it" and "animal."
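The core of attention can be sketched in a few lines. The example below is plain scaled dot-product self-attention with no learned weights, and the two-dimensional embeddings are made up so that "it" points in nearly the same direction as "animal". A real transformer learns its embeddings and uses separate query, key, and value projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product self-attention with queries = keys = values.

    ASSUMPTION: no learned projection matrices, to keep the sketch short.
    """
    dim = len(embeddings[0])
    weights, outputs = [], []
    for query in embeddings:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted mix of all the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(row, embeddings)) for d in range(dim)]
        )
    return weights, outputs

TOKENS = ["animal", "street", "it", "tired"]
# Made-up embeddings for illustration only.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.8, 0.2]]

weights, outputs = self_attention(EMBEDDINGS)
it_row = weights[TOKENS.index("it")]
for token, weight in zip(TOKENS, it_row):
    print(f"it -> {token}: {weight:.2f}")
```

With these toy vectors, "it" gives more weight to "animal" than to "street", which is the behavior described above.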
15:003Blue1Brown has a great series
15:01of videos specifically about
transformers and attention.
15:06Large language models use attention
15:07to predict the most appropriate
word to add to a sentence,
15:10but AlphaFold also has
sequential information,
15:13not sentences, but amino acid sequences.
15:17And to analyze them,
15:18the AlphaFold team built their own version
15:20of the transformer called an Evoformer.
15:24The Evoformer contained two towers,
15:26evolutionary information
in the biology tower
15:29and pair representations
in the geometry tower.
15:33Gone was AlphaFold 1's deep
neural network that started
15:36with one tower and predicted the other.
15:38Instead, AlphaFold 2's Evoformer
15:40builds each tower separately.
15:42It starts with some initial guesses,
15:44evolutionary tables taken from
known data sets as before,
15:48and the pair representations
15:49based on similar known proteins.
15:51And this time there's a bridge
connecting the two towers
15:54that conveys newly found
biological and geometry clues
16:00In the biology tower,
16:01attention applied on a column identifies
16:03amino acid sequences
that have been conserved.
16:06While along a row, it
finds amino acid mutations
16:09that have occurred together.
16:11Whenever the Evoformer finds
two closely linked amino acids
16:14in the evolutionary table,
16:15it means they are important to structure
16:18and it sends this information
to the geometry tower.
16:20Here attention is applied
16:22to help calculate distances
between amino acids.
16:25- There's also this thing
called triangular attention
16:28that got introduced,
16:30which is essentially about letting
16:31triplets attend to each other.
16:33- [Derek] For each triplet of amino acids,
16:35AlphaFold applies the triangle inequality.
16:37The sum of two sides must
be greater than the third.
16:41This constrains how far apart
16:43these three amino acids can be.
16:45This information is used to
update the pair representation.
16:49- And that helps the model produce like
16:51a self-consistent
picture of the structure.
16:53- [Derek] If the geometry
tower finds it's impossible
16:56for two amino acids to
be close to each other,
16:58then it tells the first tower
17:00to ignore their relationship
in the evolutionary table.
17:03This exchange of information
within the Evoformer
17:06is repeated 48 times,
17:08until information within
both towers is refined.
17:11The geometrical features
learned by this network
17:13are passed onto AlphaFold
2's second main innovation,
17:17the structure module.
17:18- For each amino acid,
17:19we pick three special
atoms in the amino acid
17:22and say that those define a frame.
17:24And what the network does is it imagines
17:26that all the amino acids
start out with the origin
17:29and it has to predict the
appropriate translation
17:31and rotation to move these frames
17:33to where they sit in the real structure.
17:35So that's essentially what
the structure module does.
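The idea of a frame can be made concrete with a little vector math. This sketch assumes the three special atoms are the backbone N, C-alpha, and C atoms, and builds an orthonormal frame from them with a Gram-Schmidt step. The coordinates are hypothetical and the helper names are invented for this example.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def frame_from_three_atoms(n, ca, c):
    """Rotation matrix and translation for the frame defined by N, CA, C.

    ASSUMPTION: origin at CA, x axis toward C, N lying in the x-y plane.
    """
    e1 = normalize(sub(c, ca))
    v2 = sub(n, ca)
    # Gram-Schmidt: remove the part of v2 that points along e1.
    e2 = normalize(sub(v2, [dot(e1, v2) * x for x in e1]))
    e3 = cross(e1, e2)
    rotation = [[e1[i], e2[i], e3[i]] for i in range(3)]  # columns are the axes
    return rotation, list(ca)

def apply_frame(rotation, translation, local_point):
    """Move a point from the frame's local coordinates into global space."""
    return [dot(rotation[i], local_point) + translation[i] for i in range(3)]

# Hypothetical atom positions in angstroms.
N_ATOM, CA_ATOM, C_ATOM = [0.5, 3.4, 3.0], [1.0, 2.0, 3.0], [2.0, 3.0, 3.0]
rotation, translation = frame_from_three_atoms(N_ATOM, CA_ATOM, C_ATOM)

# Every frame "starts at the origin": its local origin maps onto the CA atom.
print(apply_frame(rotation, translation, [0.0, 0.0, 0.0]))
```

Predicting a structure then amounts to predicting one rotation and one translation per amino acid, rather than placing every atom independently.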
17:37- But the thing that sets
the structure module apart
17:39is what it doesn't do.
17:41- Previously, people might
have imagined that you would
17:44like to encode the fact that
this is a chain, you know,
17:47and that certain residue
should sit next to each other.
17:50We don't really explicitly
tell AlphaFold that.
17:53It's more like we give
it a bag of amino acids
17:56and it's allowed to position
each of them separately.
17:58And some people have
thought that that helps it
18:02to not get stuck in terms of
where things should be placed.
18:05It doesn't have to always be
thinking about the constraint
18:07of these things forming a chain,
18:09that's something that
emerges naturally later.
18:11- [Derek] That's why live
AlphaFold folding videos
18:13can show it doing some
weirdly non-physical stuff.
18:20The structure module outputs a 3D protein,
18:22but it still isn't ready.
18:24It's recycled at least three more times
18:27through the Evoformer to
gain a deeper understanding
18:29of the protein. Only then is the
final prediction made.
18:35In December 2020, DeepMind
returned to a virtual CASP
18:39with AlphaFold 2, and
this time they did it.
18:43- I'm going to read an
email from John Moult.
18:46"Your group has performed
amazingly well in CASP 14,
18:50both relative to other groups
18:52and in absolute model accuracy.
18:54Congratulations on this work."
18:57- [Derek] For many proteins,
AlphaFold 2 predictions
19:00were virtually indistinguishable
from the actual structures
19:03and they finally beat the
gold standard score of 90.
19:10- For me, having worked
on this problem so long,
19:13after many, many stops and starts,
19:16and suddenly this is a solution.
19:17We'd solved the problem.
19:19This gives you such excitement
about the way science works.
19:23- [Derek] Over six decades,
all of the scientists working
19:26around the world on
proteins painstakingly found
19:29about 150,000 protein structures.
19:32Then in one fell swoop, AlphaFold came in
19:35and unveiled over 200 million of them.
19:39Nearly all proteins
known to exist in nature.
19:43In just a few months,
AlphaFold advanced the work
19:46of research labs worldwide
by several decades.
19:51It has directly helped us
develop a vaccine for malaria.
19:55It's made possible the breaking down
19:57of antibiotic resistance enzymes,
19:59which makes many life-saving
drugs effective again.
20:02It's even helped us understand
20:03how protein mutations lead
20:04to various diseases from
schizophrenia to cancer,
20:07and biologists studying little known
20:09and endangered species suddenly
20:11had access to proteins
and their life mechanism.
20:15The AlphaFold 2 paper has
been cited over 30,000 times.
20:19It has truly made a step function leap
20:22in our understanding of life.
20:24John Jumper and Demis
Hassabis were awarded one half
20:27of the 2024 Nobel Prize in
chemistry for this breakthrough.
20:30The other half went to David Baker,
20:33but not for predicting
structures using Rosetta.
20:35Instead, it was for designing
20:37completely new proteins from scratch.
20:40- It was really hard to
make brand new proteins
20:42that would do things.
20:43And so that's kind of the
problem that we solved.
20:45- To do so, he uses the
same kind of generative AI
20:48that makes art in programs like DALL-E.
20:51- You can say draw a picture
20:52of a kangaroo riding on a rabbit
20:54or something, and it will do that.
20:56And so it's exactly what
we did with proteins.
20:58- His technique, called
"RFdiffusion," is trained
21:01by adding random noise to
a known protein structure,
21:04and then the AI has to remove this noise.
21:07Once trained in this
way, the AI can be asked
21:10to produce proteins for various functions.
21:12It's given a random noise input,
21:14and the AI figures out a brand new protein
21:17that does what you asked it to do.
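The "adding random noise" half of that training recipe can be sketched with a generic diffusion-style blending step. This is a minimal illustration on hypothetical coordinates, not RFdiffusion's actual procedure. The function name and the `alpha_bar` parameter are invented for this example, and the learned denoising network is omitted entirely.

```python
import math
import random

def add_noise(coords, alpha_bar, rng):
    """Blend atom coordinates with Gaussian noise.

    alpha_bar = 1.0 keeps the structure untouched; alpha_bar = 0.0 replaces it
    with pure noise. Values in between give partially corrupted structures.
    """
    keep = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(keep * x + noise * rng.gauss(0.0, 1.0) for x in atom)
        for atom in coords
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical three-residue structure, coordinates in angstroms.
structure = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

for alpha_bar in (1.0, 0.5, 0.0):
    noisy = add_noise(structure, alpha_bar, rng)
    print(alpha_bar, [tuple(round(x, 2) for x in atom) for atom in noisy])
```

During training the model sees the noisy version and has to recover the clean one. At generation time the process runs in reverse, starting from pure noise.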
21:20This work has huge implications.
21:22I mean, imagine you got
bitten by a venomous snake.
21:25If you're lucky, you'll have
access to anti-venom prepared
21:29by milking venom from
the exact kind of snake,
21:32which is then injected into live animals,
21:35and the antibodies from
that animal are extracted
21:37and refined and then given
to you as an anti-venom.
21:41The trouble is often people
have allergic reactions
21:44to these antibodies from other organisms.
21:46But your odds of survival
can be a lot better
21:48with the latest synthetic
proteins designed in Baker's lab.
21:52They've created human
compatible antibodies
21:54that can neutralize lethal snake venom.
21:57This anti-venom could be
manufactured in large quantities
22:00and easily transported to
the places where it's needed.
22:03With these tiny molecular machines,
22:05the possibilities are endless.
22:07What are the applications
you're most excited about?
22:10- So I think vaccines are
gonna be really powerful.
22:12We have a number of proteins
22:14that are in human clinical
trials for cancer,
22:16and we're working on
autoimmune disease now.
22:18We're really excited about problems
22:19like capturing greenhouse gases.
22:21So we're designing enzymes
22:22that can fix methane, break down plastic.
22:26- What makes this approach so effective is
22:28how fast they can create
and iterate the proteins.
22:32- It's really quite miraculous
22:34for anyone who's a
conventional school biochemist
22:36or protein scientist.
22:38We can now have designs on the computer,
22:40get the amino acid sequence
of the designed proteins,
22:43and then in just a couple days
we can get the protein out.
22:48Yeah. We've given a name to this,
22:49which is "Cowboy Biochemistry"
22:51because we just like, you
just got kind of go for it
22:54as fast as you can, and it
turns out to work pretty well.
22:58- What AI has done for
proteins is just a hint
23:00of what it can do in other fields
23:02and on larger scales.
23:05In materials science, for example,
23:06DeepMind's GNoME program has
found 2.2 million new crystals,
23:11including over 400,000 stable materials
23:14that could power future technologies
23:15from superconductors to batteries.
23:18AI is creating transformative
leaps in science
23:21by helping to solve some
of the fundamental problems
23:24that have blocked human progress.
23:26- If we think of the
whole tree of knowledge,
23:28you know there are certain problems
23:29where, you know, they're root node problems.
23:31If you unlock them, if you
discover a solution to them,
23:33it would unlock a whole new
branch or avenue of discovery.
23:38- And with this, AI is
pushing forward the boundaries
23:41of human knowledge at a
rate never seen before.
23:44- Speed ups of 2x are nice,
23:46they're great, we love them.
23:48Speed ups of a 100,000x, change what you do.
23:53You do fundamentally different stuff
23:55and you start to rebuild your science
23:58around the things that got easy.
24:01- And that's what I'm excited about.
24:03These discoveries represent
real step function
24:08Even if AI doesn't advance
beyond where it is today,
24:10we will be reaping the benefits
24:12of these breakthroughs for decades.
24:15And assuming AI does continue to develop,
24:17well, it will open up opportunities
24:19that were previously thought impossible.
24:21Whether that's curing all diseases,
24:23creating novel materials,
24:25or restoring the environment
to a pristine state.
24:29This sounds like an amazing future as long
24:32as the AI doesn't take over
and destroy us all first.