Statistical Thinking in Science: Crash Course Scientific Thinking #2

CrashCourse

Subtitles (295)

0:00I am going to die eventually, which is

0:02pretty important to me personally. So,

0:04I'd like to know roughly at what age I

0:07am most likely to die. You might guess

0:09something like 70, which, based on a

0:12national data set, was the average age of

0:14death in the US for men who died between

0:162018 and 2023. But it might be that 79

0:21is the more accurate answer, which is an

0:24extra 9 years. So, how can I make sure

0:26I'm using the best number to answer my

0:28question? Can stats really tell me when

0:31I might die? And is there a way to look

0:33at these numbers and not have an

0:35existential crisis? Hi, I'm Hank Green

0:38and this is Crash Course Scientific

0:39Thinking.

0:44Do not worry, I'm not going to teach you

0:46how to do statistics today. We have a

0:48whole other course about that. What

0:50we're talking about here is how to make

0:52sense of the stats you encounter in your

0:54everyday life. Statistics are vital for

0:57so much of what goes on around us, from

0:59designing video games to creating

1:01impactful government health policies.

1:03But statistics can be misleading. It's

1:06not because the numbers are lying. It's

1:08that if we don't understand how the

1:09numbers are being used, we might get the

1:11wrong impression about their meaning.

1:13Scientists use statistics to understand

1:15data, but when they're looking at those

1:17numbers, they have all of the context

1:19that goes along with them. By the time

1:21these stats are reported on in the news,

1:23they often lose some of that context,

1:26which can have big impacts on the ways

1:28that we see the world as consumers of

1:30science news.

1:35Scientists rely on numbers to build

1:37knowledge. But since they can't measure

1:39every person, they use samples, smaller

1:42groups they can measure to better

1:44understand a larger group, which means

1:46there's always some uncertainty. So,

1:48while stats could never tell me, Hank

1:50Green, exactly when I will die, they can

1:52tell me when a person like me is most

1:55likely to die. So, what is the typical

1:57age of death for an American man? Well,

1:59when it comes to statistics, there's a

2:01few different ways of determining what's

2:04typical. One of the most common is to

2:06find the mean or average, the sum of all

2:08the numbers in a sample divided by how

2:10many numbers are in that sample. That's

2:13where we get the first number from.

2:14Based on a large sample of residents who

2:17died between 2018 and 2023, the average

2:20or mean age of death of a man in the US

2:23is 70. But that mean is dragged down by

2:27people who died way younger than 70,

2:29even though there are fewer of them. So

2:31maybe I don't actually want the average.

2:33Maybe instead I want to know the most

2:35common age of death or the mode. That

2:38answer is actually way different from

2:40the mean. The mode is the number that

2:42shows up the most in the data, which is

2:44where we get 79 from. But actually, most

2:47of the numbers in the sample are to the

2:49left of the mode. So, it's actually more

2:52likely that I'd land on one of the

2:53numbers under 79 than that I'd land

2:56squarely on or after 79. So, say then I

2:59want to find an age somewhat close to

3:02the average age when someone like me

3:04would die. I can look at the numbers in

3:06the graph and find the standard

3:07deviation, which tells me how spread out

3:10the other points in the sample are from

3:12the mean, which in turn can help me

3:15figure out how typical that number

3:17really is. If the standard deviation is

3:19small, that tells me most people in this

3:21sample are dying at ages pretty close to

3:24the average age. Another number that

3:26might be helpful is the median or the

3:28point right in the middle of the group

3:30where an equal number of US men died

3:32before and after. And that would be 73.
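
To see how mean, median, mode, and standard deviation can disagree on the same data, here is a minimal Python sketch. The list of ages is invented purely for illustration and is not the national data set mentioned in the video; the variable names are assumptions too. A few very young deaths pull the mean down, while the median and mode barely notice them.

```python
import statistics

# Hypothetical ages at death, made up for illustration only.
ages = [2, 35, 58, 66, 71, 74, 77, 79, 79, 79, 84, 88]

mean_age = statistics.mean(ages)      # sum of the values divided by how many there are
median_age = statistics.median(ages)  # middle value (average of the two middle values here)
mode_age = statistics.mode(ages)      # value that appears most often
spread = statistics.pstdev(ages)      # standard deviation: typical distance from the mean

print(f"mean   = {mean_age}")    # 66
print(f"median = {median_age}")  # 75.5
print(f"mode   = {mode_age}")    # 79
print(f"standard deviation = {spread:.1f}")
```

In this made-up sample the order is mean, then median, then mode, the same order as the 70, 73, and 79 in the video, because the low outliers drag the mean down more than the other two.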

3:35Still relatively close to 70 and 79, but

3:39different enough to matter. Because the

3:41median is always the number directly in

3:43the middle of the data set, it is less

3:45likely to be skewed one way or the other

3:47the way a mean might be. So, it might

3:49tell me way more about when American men

3:52tend to die, though of course it still

3:54cannot tell me when I'll die. The point

3:56is averages like mean, median, and mode

3:59are different ways of telling you what

4:00might be typical. But they're way more

4:02useful when you understand how each one

4:05operates differently. And they're even

4:07more useful when combined with the

4:09standard deviation, which tells us how

4:12typical typical really is. There's

4:15always a degree of uncertainty when it

4:17comes to statistics. So, another useful

4:19question is, okay, but how certain are

4:22we of these stats? For a stat to really

4:24mean anything, I need to know how much

4:26confidence to have in it. How likely is

4:28it that if I ran the numbers again, I'd

4:31get those same results? For that, I'd

4:33need to calculate a confidence interval,

4:35or a range of numbers that I can expect

4:37a result to fall within a certain

4:39percentage of the time. A 95% confidence

4:42interval means that if scientists

4:44repeated the study 100 times with new

4:46samples, the statistic they're measuring

4:48would fall in that range about 95 times.
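
A small simulation can make the "95 out of 100" idea concrete. The sketch below follows the usual textbook reading: each repeated study builds its own interval, and roughly 95 of every 100 of those intervals should contain the true population value. The population (normally distributed, mean 73, standard deviation 15), the sample size, and the helper name `mean_confidence_interval` are all assumptions chosen for illustration.

```python
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% interval for the mean, using the normal approximation."""
    m = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / len(sample) ** 0.5
    return m - z * standard_error, m + z * standard_error

rng = random.Random(0)                  # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 73.0, 15.0         # assumed population, not real data
N_STUDIES, SAMPLE_SIZE = 1000, 200

hits = 0
for _ in range(N_STUDIES):
    # Each "study" draws a fresh sample and builds its own interval.
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    low, high = mean_confidence_interval(sample)
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / N_STUDIES
print(f"{coverage:.1%} of the intervals contained the true mean")
```

The printed share should land near 95%, though not exactly on it, which is the point: the interval describes how much a result is expected to vary from sample to sample.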

4:51It shows how much that number might vary

4:53and how much trust can be put into it. A

4:56stat with a high confidence interval is

4:58quite predictive, but it is not perfect.

5:00So when encountering statistics in the

5:02real world, it's good to remember that

5:04every stat actually has two pieces:

5:07first, the number, and second, how

5:09precisely scientists know that number.

5:12And it is way better to be roughly right

5:14than precisely wrong. Hold on for a

5:17moment. I'm being told that we have a

5:18special guest on the way. It sounds like

5:20it's time for some sage advice.

5:32>> Hi Hank. Did you know that women also

5:34die? >> Yes, I did sadly know that.

5:36>> You just talk about dudes a lot. For

5:38example, consider this updated birth

5:39control pill. According to the news, it

5:41raised the risk of developing deadly

5:43blood clots by 100%.

5:46>> That's definitely a big statistic.

5:48>> It sounds like it, right? With the old

5:50pill, 1 in 7,000 people were at risk of

5:53developing blood clots. With the new

5:55pill, the risk doubled. Do you know what

5:58it became?

5:59>> Yeah. If it doubled, I'd guess it went

6:01from one to two.

6:02>> What a great guess. It sounds like a lot

6:05when someone says risk has increased by

6:07100%. But that's just what scientists

6:10call the relative risk or how much the

6:12likelihood of something happening gets

6:14bigger or smaller relative to something

6:16else, which can be helpful to know, but

6:18it doesn't tell us the whole story. For

6:20that, we need the absolute risk or the

6:22number of people actually experiencing

6:24an event in relation to the population

6:26at risk.

6:27>> The absolute risk stayed relatively low,

6:29>> right? It increased to 2 in 7,000. Still

6:33important, but people need that context

6:36you talked about earlier to make

6:37informed decisions. At the time though,

6:39a lot of people only learned about this

6:41risk in relative terms through the news.
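
The arithmetic behind that pill example can be sketched in a few lines of Python. The two risks (1 in 7,000 and 2 in 7,000) come from the conversation above; the variable names are just illustrative.

```python
from fractions import Fraction

old_risk = Fraction(1, 7000)   # old pill: 1 in 7,000
new_risk = Fraction(2, 7000)   # new pill: 2 in 7,000

# Relative risk increase: how much bigger the risk got compared with where it started.
relative_increase = (new_risk - old_risk) / old_risk

# Absolute risk increase: how many more people are actually affected.
absolute_increase = new_risk - old_risk

print(f"Relative increase: {float(relative_increase):.0%}")   # 100%
print(f"Absolute increase: {float(absolute_increase):.4%}")   # 0.0143%
print(f"That is {absolute_increase.numerator} extra case "
      f"per {absolute_increase.denominator:,} people")        # 1 extra case per 7,000 people
```

The same change reads as "100%" in relative terms and as about one extra case per 7,000 people in absolute terms, which is why a headline that reports only one of them can mislead.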

6:45So, people switched to less effective

6:46pregnancy prevention methods. And you

6:48know what poses a higher risk of

6:50life-threatening blood clots than the

6:51birth control pill? Pregnancy. So, the

6:53more we understand numbers in context,

6:56the better we'll be at making informed

6:58decisions for our lives. And that's been

7:00today's Sage advice.

7:03>> Thanks, Sage. Sage is correct.

7:05Understanding the difference between

7:07absolute risk and relative risk can help

7:09us make sense of so many of the stats we

7:12encounter in our daily lives. Like, how

7:14great is my risk of developing cancer if

7:17I go to the beach every day and don't

7:18wear sunscreen? Which actually brings me

7:21to my next point. Scientists often

7:22analyze relationships in data like the

7:25relationship between sunscreen and skin

7:27cancer. These are known as correlations.

7:30A correlation is a relationship between

7:32two or more variables which are

7:34basically anything that can be measured

7:36or counted. A correlation between two

7:38variables can be loose or it can be

7:40tight, which we quantify with their R

7:43value. It's a number from -1 to

7:45one that shows how tightly two things

7:48move together. One means a perfect

7:50match. Negative one means perfect

7:52opposite and zero means no connection.
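
Here is a minimal sketch of how that value can be computed, assuming the common Pearson formula for a straight-line relationship. The helper name `pearson_r` and the tiny data sets are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: how tightly two lists move together, from -1 to 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together...
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # ...scaled by how much each one varies on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

xs = [1, 2, 3, 4, 5]
perfect_match = pearson_r(xs, [2, 4, 6, 8, 10])     # rises in lockstep
perfect_opposite = pearson_r(xs, [10, 8, 6, 4, 2])  # falls in lockstep
no_linear_link = pearson_r(xs, [4, 1, 0, 1, 4])     # symmetric U-shape

print(perfect_match, perfect_opposite, no_linear_link)
```

The three results come out at 1, -1, and 0. The last data set clearly has a pattern (a U-shape), so a value of zero here only says there is no straight-line connection.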

7:55The simplest kind of correlation is

7:57linear between just two variables. A

7:59correlation can be negative meaning one

8:01variable gets smaller as the other gets

8:03bigger. Like for example how higher

8:05rates of wearing sunscreen correlate to

8:08lower rates of skin cancer. Or it can be

8:10positive, like if, say, higher rates of ice

8:13cream sales correlate to higher rates of

8:15shark attacks. You might have heard the

8:16saying correlation doesn't equal

8:18causation. But there's actually more to

8:20it than that. Like in the case of

8:21sunscreen, there's a lot of good

8:23evidence that wearing it really does

8:24lower the risk of cancer. There is a

8:26causal link in the correlation. But in

8:29the case of shark attacks, it's safe to

8:30say that the ice cream isn't causing

8:32them. Warm weather is indirectly leading

8:34to both. In this case, weather is a

8:37confounding variable or a factor that

8:39influences the outcome of a study

8:41without being controlled for. These can

8:43blur what's actually going on in the

8:45data if scientists don't measure and

8:47account for them. For example, some

8:48studies have shown a positive

8:50correlation between personal health and

8:52visits to the beach. But it's hard to

8:54know if beaches make people healthier,

8:56if healthy people are more likely to go

8:58to the beach, or if there's some third

9:00confounding variable like the level of

9:03wealth that results in both better

9:05health and more beach visits. And even

9:07if scientists do a good job of

9:08controlling for all of these variables,

9:10they still have to ask, is it possible

9:12this result was just a fluke in our

9:14data? In other words, was it

9:16statistically significant? Statistical

9:18significance means the result is strong

9:20enough that it would be surprising to

9:21get by random chance. But don't let this

9:24phrasing mislead you either. In science,

9:26significant doesn't mean important. Like

9:28how I say Doritos are a significant part

9:30of my life. That means they're important

9:32to me. But that's different from

9:34statistical significance. Statistical

9:35significance doesn't even necessarily

9:37mean meaningful in the real world. It's

9:40more like it would be surprising to get

9:42this result at random, so we should dig

9:45deeper. And digging deeper is something

9:47we can all do when it comes to

9:49statistics. And that begins by

9:51understanding that there will always be

9:53some uncertainty. Scientists can't

9:55possibly measure every version of

9:57everything they want to study. But stats

10:00can help them measure the uncertainty.
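
One way to put a number on "could this just be a fluke?" is a permutation test, sketched below. It is only one of several approaches scientists use, and everything here is an assumption for illustration: the health scores are invented, and `permutation_p_value` is a hypothetical helper name. The idea is to shuffle the group labels many times and count how often chance alone produces a gap as large as the one observed.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Share of random relabelings whose mean gap is at least as large as the observed gap."""
    rng = random.Random(seed)
    observed_gap = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend group membership was assigned at random
        gap = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if gap >= observed_gap:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical health scores, made up for this example.
beach_goers = [72, 75, 78, 80, 81, 83, 85, 88]
non_goers = [60, 62, 65, 66, 68, 70, 71, 74]

p_value = permutation_p_value(beach_goers, non_goers)
print(f"p-value: {p_value:.4f}")
```

A small p-value says the gap would be surprising under random labeling. It does not say the gap is large enough to matter, and it does not say what caused it.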

10:02And understanding what numbers can and

10:05can't tell us about ourselves, each

10:07other, and the world can help us not

10:09only better understand the way that

10:11science works, but also help us make

10:13more informed judgments about our own

10:16lives.
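
Going back to the ice cream and shark attack example, the effect of a confounding variable can be sketched with a simulation. All of the numbers below are invented: temperature drives both ice cream sales and a hypothetical shark attack index, and neither one affects the other. The helper `pearson_r` is an illustrative name for the standard Pearson correlation.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

rng = random.Random(1)
days = []
for _ in range(5000):
    temperature = rng.uniform(0, 35)                          # the confounder
    ice_cream = 20 + 3 * temperature + rng.gauss(0, 10)       # depends only on temperature
    shark_index = 0.1 * temperature + rng.gauss(0, 1)         # depends only on temperature
    days.append((temperature, ice_cream, shark_index))

# Across all days, ice cream and sharks look strongly linked.
overall_r = pearson_r([d[1] for d in days], [d[2] for d in days])

# Controlling for the confounder: compare only days with nearly the same temperature.
similar_days = [d for d in days if 24 <= d[0] <= 26]
within_band_r = pearson_r([d[1] for d in similar_days], [d[2] for d in similar_days])

print(f"all days:          r = {overall_r:.2f}")
print(f"24 to 26 degrees:  r = {within_band_r:.2f}")
```

Across all days the correlation is clearly positive, but among days with similar weather it shrinks toward zero, which is what accounting for a confounding variable is meant to reveal.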

10:18In our next episode, we're going to look

10:19at how rare it actually is for a single

10:22experiment to change our understanding

10:24of science. I'll see you then. This

10:26episode of Crash Course Scientific

10:28Thinking was produced in partnership

10:29with HHMI BioInteractive, bringing real

10:32science stories to thousands of high

10:34school and undergrad life science

10:35classrooms. If you're a teacher, visit

10:37their website for resources that explore

10:39the topics we discussed in this video

10:41today. Thanks for watching this episode

10:43of Crash Course Scientific Thinking,

10:44which was filmed in Missoula, Montana,

10:46and was made with the help of all these

10:47nice people. If you would like to help

10:49us keep Crash Course free for everyone

10:51forever, you can join our community on

10:53Patreon.