0:00I am going to die eventually, which is
0:02pretty important to me personally. So,
0:04I'd like to know roughly at what age I
0:07am most likely to die. You might guess
0:09something like 70, which based on a
0:12national data set was the average age of
0:14death in the US for men who died between
0:162018 and 2023. But it might be that 79
0:21is the more accurate answer, which is an
0:24extra 9 years. So, how can I make sure
0:26I'm using the best number to answer my
0:28question? Can stats really tell me when
0:31I might die? And is there a way to look
0:33at these numbers and not have an
0:35existential crisis? Hi, I'm Hank Green
0:38and this is Crash Course Scientific Thinking.
0:44Do not worry, I'm not going to teach you
0:46how to do statistics today. We have a
0:48whole other course about that. What
0:50we're talking about here is how to make
0:52sense of the stats you encounter in your
0:54everyday life. Statistics are vital for
0:57so much of what goes on around us, from
0:59designing video games to creating
1:01impactful government health policies.
1:03But statistics can be misleading. It's
1:06not because the numbers are lying. It's
1:08that if we don't understand how the
1:09numbers are being used, we might get the
1:11wrong impression about their meaning.
1:13Scientists use statistics to understand
1:15data, but when they're looking at those
1:17numbers, they have all of the context
1:19that goes along with them. By the time
1:21these stats are reported on in the news,
1:23they often lose some of that context,
1:26which can have big impacts on the ways
1:28that we see the world as consumers of
1:35Scientists rely on numbers to build
1:37knowledge. But since they can't measure
1:39every person, they use samples, smaller
1:42groups they can measure to better
1:44understand a larger group, which means
1:46there's always some uncertainty. So,
1:48while stats could never tell me, Hank
1:50Green, exactly when I will die, they can
1:52tell me when a person like me is most
1:55likely to die. So, what is the typical
1:57age of death for an American man? Well,
1:59when it comes to statistics, there are a
2:01few different ways of determining what's
2:04typical. One of the most common is to
2:06find the mean or average, the sum of all
2:08the numbers in a sample divided by how
2:10many numbers are in that sample. That's
2:13where we get the first number from.
2:14Based on a large sample of residents who
2:17died between 2018 and 2023, the average
2:20or mean age of death of a man in the US
2:23is 70. But that mean is dragged down by
2:27people who died way younger than 70,
2:29even though there are fewer of them. So
2:31maybe I don't actually want the average.
2:33Maybe instead I want to know the most
2:35common age of death or the mode. That
2:38answer is actually way different from
2:40the mean. The mode is the number that
2:42shows up the most in the data, which is
2:44where we get 79 from. But actually, most
2:47of the numbers in the sample are to the
2:49left of the mode. So, it's actually more
2:52likely that I'd land on one of the
2:53numbers under 79 than that I'd land
2:56squarely on or after 79. So, say then I
2:59want to find an age somewhat close to
3:02the average age when someone like me
3:04would die. I can look at the numbers in
3:06the graph and find the standard
3:07deviation, which tells me how spread out
3:10the other points in the sample are from
3:12the mean, which in turn can help me
3:15figure out how typical that number
3:17really is. If the standard deviation is
3:19small, that tells me most people in this
3:21sample are dying at ages pretty close to
3:24the average age. Another number that
3:26might be helpful is the median or the
3:28point right in the middle of the group
3:30where an equal number of US men died
3:32before and after. And that would be 73.
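To see how the three "typical" values can disagree, here is a minimal Python sketch using the standard library's `statistics` module. The list of ages is made up for illustration and is not the national data set the video draws on. A few early deaths give it a long left tail, which is the situation described above.

```python
import statistics

# Hypothetical ages at death, invented for illustration (NOT real data).
# A handful of early deaths create a long tail on the left.
ages = [22, 35, 48, 55, 61, 66, 70, 73, 75, 77, 79, 79, 79, 81, 84, 88]

mean = statistics.mean(ages)      # sum of values / number of values
median = statistics.median(ages)  # middle value (average of the two middle ones here)
mode = statistics.mode(ages)      # most frequent value
spread = statistics.stdev(ages)   # sample standard deviation around the mean

print(f"mean={mean} median={median} mode={mode} sd={spread:.1f}")
```

With these invented numbers the mean is 67, the median is 74, and the mode is 79. The early deaths pull the mean down the most, while the median barely moves and the mode ignores them entirely.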
3:35Still relatively close to 70 and 79, but
3:39different enough to matter. Because the
3:41median is always the number directly in
3:43the middle of the data set, it is less
3:45likely to be skewed one way or the other
3:47the way a mean might be. So, it might
3:49tell me way more about when American men
3:52tend to die, though of course it still
3:54cannot tell me when I'll die. The point
3:56is averages like mean, median, and mode
3:59are different ways of telling you what
4:00might be typical. But they're way more
4:02useful when you understand how each one
4:05operates differently. And they're even
4:07more useful when combined with the
4:09standard deviation, which tells us how
4:12typical typical really is. There's
4:15always a degree of uncertainty when it
4:17comes to statistics. So, another useful
4:19question is, okay, but how certain are
4:22we of these stats? For a stat to really
4:24mean anything, I need to know how much
4:26confidence to have in it. How likely is
4:28it that if I ran the numbers again, I'd
4:31get those same results? For that, I'd
4:33need to calculate a confidence interval,
4:35a range of numbers, built from the sample,
4:37that is designed to capture the true value a certain
4:39percentage of the time. A 95% confidence
4:42interval means that if scientists
4:44repeated the study 100 times with new
4:46samples, about 95 of the ranges they calculated
4:48would contain the true value they're measuring.
4:51It shows how much that number might vary
4:53and how much trust can be put into it. A
4:56stat with a narrow confidence interval is
4:58quite precise, but it is not perfect.
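The "repeat the study 100 times" idea can be simulated directly. The Python sketch below makes several assumptions. The population is normally distributed with a made-up true mean of 73 and standard deviation of 12. Each hypothetical study draws 50 people. The 95% interval uses the common normal approximation (mean plus or minus 1.96 standard errors), though real studies may use other methods, such as t-based intervals. The sketch counts how often each study's interval contains the true mean.

```python
import math
import random
import statistics

def mean_ci_95(sample):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error
    return m - 1.96 * se, m + 1.96 * se

def coverage(true_mean=73.0, sd=12.0, n=50, repeats=1000, seed=1):
    """Repeat a hypothetical study many times (all parameters are invented)
    and return the fraction of intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = mean_ci_95(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / repeats

print(coverage())  # close to 0.95; the exact value depends on the seed
```

The true mean never moves in this simulation. What changes from one repeat to the next is the interval, and roughly 95% of those intervals catch the true value.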
5:00So when encountering statistics in the
5:02real world, it's good to remember that
5:04every stat actually has two pieces:
5:07first, the number, and second, how
5:09precisely scientists know that number.
5:12And it is way better to be roughly right
5:14than precisely wrong. Hold on for a
5:17moment. I'm being told that we have a
5:18special guest on the way. It sounds like
5:20it's time for some sage advice.
5:32>> Hi Hank. Did you know that women also
5:34die? >> Yes, I did, sadly, know that.
5:36>> You just talk about dudes a lot. For
5:38example, consider this updated birth
5:39control pill. According to the news, it
5:41raised the risk of developing deadly blood clots by 100%.
5:46>> That's definitely a big statistic.
5:48>> It sounds like it, right? With the old
5:50pill, 1 in 7,000 people were at risk of
5:53developing blood clots. With the new
5:55pill, the risk doubled. Do you know what it went up to?
5:59>> Yeah. If it doubled, I'd guess it went up to 2 in 7,000.
6:02>> What a great guess. It sounds like a lot
6:05when someone says risk has increased by
6:07100%. But that's just what scientists
6:10call the relative risk or how much the
6:12likelihood of something happening gets
6:14bigger or smaller relative to something
6:16else, which can be helpful to know, but
6:18it doesn't tell us the whole story. For
6:20that, we need the absolute risk or the
6:22number of people actually experiencing
6:24an event in relation to the population.
6:27>> The absolute risk stayed relatively low.
6:29>> Right. It increased to 2 in 7,000. Still
6:33important, but people need that context
6:36you talked about earlier to make
6:37informed decisions. At the time, though,
6:39a lot of people only learned about this
6:41risk in relative terms through the news.
6:45So, people switched to less effective
6:46pregnancy prevention methods. And you
6:48know what poses a higher risk of
6:50life-threatening blood clots than the
6:51birth control pill? Pregnancy. So, the
6:53more we understand numbers in context,
6:56the better we'll be at making informed
6:58decisions for our lives. And that's been Sage advice.
7:03>> Thanks, Sage. Sage is correct.
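The birth control pill example works out as plain arithmetic. Here is a small Python sketch that uses the 1 in 7,000 and 2 in 7,000 figures quoted above. The helper names `relative_risk` and `absolute_change` are illustrative only and do not come from any library. `Fraction` keeps the numbers exact, so no rounding hides the result.

```python
from fractions import Fraction

def relative_risk(new_risk, old_risk):
    """How many times larger the new risk is than the old one."""
    return new_risk / old_risk

def absolute_change(new_risk, old_risk):
    """Difference in absolute risk: extra cases per person exposed."""
    return new_risk - old_risk

old = Fraction(1, 7000)  # old pill: 1 in 7,000 (figure quoted in the video)
new = Fraction(2, 7000)  # new pill: 2 in 7,000

rr = relative_risk(new, old)
extra = absolute_change(new, old)

print(f"relative risk: {rr}x (a {float((rr - 1) * 100):.0f}% increase)")
print(f"absolute risk: {float(old):.5%} -> {float(new):.5%}")
print(f"extra cases: {extra.numerator} per {extra.denominator:,} people")
```

The same change reads as "a 100% increase" in relative terms but as only one extra case per 7,000 people in absolute terms.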
7:05Understanding the difference between
7:07absolute risk and relative risk can help
7:09us make sense of so many of the stats we
7:12encounter in our daily lives. Like, how
7:14great is my risk of developing cancer if
7:17I go to the beach every day and don't
7:18wear sunscreen? Which actually brings me
7:21to my next point. Scientists often
7:22analyze relationships in data like the
7:25relationship between sunscreen and skin
7:27cancer. These are known as correlations.
7:30A correlation is a relationship between
7:32two or more variables which are
7:34basically anything that can be measured
7:36or counted. A correlation between two
7:38variables can be loose or it can be
7:40tight, which we quantify with its r
7:43value. It's a number from -1 to
7:451 that shows how tightly two things
7:48move together. One means a perfect
7:50match, negative one means a perfect
7:52opposite, and zero means no linear connection.
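Here is a minimal Python sketch of the Pearson correlation coefficient r, the usual "r value." It is the covariance of two variables divided by the product of their spreads, which keeps the result between -1 and 1. The example inputs are toy numbers. The last pair is arch-shaped: y clearly depends on x, yet r is 0. This is why r measures only a *linear* relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of products of deviations (covariance, up to a constant factor).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Sums of squared deviations (spreads, up to the same constant factor).
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfect match
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: perfect opposite
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))  # 0.0: no *linear* connection
```

On Python 3.10 or later, `statistics.correlation(xs, ys)` computes the same Pearson coefficient without a hand-written helper.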
7:55The simplest kind of correlation is
7:57a linear one between just two variables. A
7:59correlation can be negative meaning one
8:01variable gets smaller as the other gets
8:03bigger. Like for example how higher
8:05rates of wearing sunscreen correlate to
8:08lower rates of skin cancer or it can be
8:10positive like if say higher rates of ice
8:13cream sales correlate to higher rates of
8:15shark attacks. You might have heard the
8:16saying correlation doesn't equal
8:18causation. But there's actually more to
8:20it than that. Like in the case of
8:21sunscreen, there's a lot of good
8:23evidence that wearing it really does
8:24lower the risk of cancer. There is a
8:26causal link in the correlation. But in
8:29the case of shark attacks, it's safe to
8:30say that the ice cream isn't causing
8:32them. Warm weather is indirectly leading
8:34to both. In this case, weather is a
8:37confounding variable or a factor that
8:39influences the outcome of a study
8:41without being controlled for. These can
8:43blur what's actually going on in the
8:45data if scientists don't measure and
8:47account for them. For example, some
8:48studies have shown a positive
8:50correlation between personal health and
8:52visits to the beach. But it's hard to
8:54know if beaches make people healthier,
8:56if healthy people are more likely to go
8:58to the beach, or if there's some third
9:00confounding variable like the level of
9:03wealth that results in both better
9:05health and more beach visits. And even
9:07if scientists do a good job of
9:08controlling for all of these variables,
9:10they still have to ask, is it possible
9:12this result was just a fluke in our
9:14data? In other words, was it
9:16statistically significant? Statistical
9:18significance means the result is strong
9:20enough that it would be surprising to
9:21get by random chance. But don't let this
9:24phrasing mislead you either. In science,
9:26significant doesn't mean important. Like
9:28how I say Doritos are a significant part
9:30of my life. That means they're important
9:32to me. But that's different from
9:34statistical significance. Statistical
9:35significance doesn't even necessarily
9:37mean meaningful in the real world. It's
9:40more like it would be surprising to get
9:42this result at random, so we should dig
9:45deeper. And digging deeper is something
9:47we can all do when it comes to
9:49statistics. And that begins by
9:51understanding that there will always be
9:53some uncertainty. Scientists can't
9:55possibly measure every version of
9:57everything they want to study. But stats
10:00can help them measure the uncertainty.
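The video doesn't say how statistical significance is actually calculated. One common approach, used here for illustration, is a permutation test. You shuffle the group labels many times and see how often chance alone produces a difference at least as large as the one observed. The "health scores" and both groups below are invented. A small p-value only says the gap is unlikely to be a fluke. It does not say the gap is important, or that beaches cause it, since a confounder like wealth could still be at work.

```python
import random

def mean_diff(a, b):
    """Difference between the averages of two groups."""
    return sum(a) / len(a) - sum(b) / len(b)

def permutation_p_value(group_a, group_b, trials=10_000, seed=0):
    """Estimate how often randomly relabeled data shows a difference in
    means at least as large as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(mean_diff(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # pretend group membership is pure chance
        if abs(mean_diff(pooled[:n_a], pooled[n_a:])) >= observed:
            extreme += 1
    # Adding 1 to both counts is a common convention that keeps p above zero.
    return (extreme + 1) / (trials + 1)

# Invented health scores for two hypothetical groups.
beach_goers = [8, 9, 7, 9, 8, 9, 8, 9]
non_beach_goers = [5, 6, 5, 4, 6, 5, 6, 5]

p = permutation_p_value(beach_goers, non_beach_goers)
print(f"p = {p:.4f}")  # small: a gap this big rarely appears by chance
```

If the two groups were identical, every shuffle would match or beat the observed difference of zero, and the p-value would come out as 1.0.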
10:02And understanding what numbers can and
10:05can't tell us about ourselves, each
10:07other, and the world can help us not
10:09only better understand the way that
10:11science works, but also help us make
10:13more informed judgments about our own lives.
10:18In our next episode, we're going to look
10:19at how rare it actually is for a single
10:22experiment to change our understanding
10:24of science. I'll see you then. This
10:26episode of Crash Course Scientific
10:28Thinking was produced in partnership
10:29with HHMI BioInteractive, bringing real
10:32science stories to thousands of high
10:34school and undergrad life science
10:35classrooms. If you're a teacher, visit
10:37their website for resources that explore
10:39the topics we discussed in this video
10:41today. Thanks for watching this episode
10:43of Crash Course Scientific Thinking,
10:44which was filmed in Missoula, Montana,
10:46and was made with the help of all these
10:47nice people. If you would like to help
10:49us keep Crash Course free for everyone
10:51forever, you can join our community on