License: Not specified / Included

Information

Category: mViewer GX Creator Lua TI-Nspire
Author: bendo07
Type: Document 3.6
Page(s): 41
Size: 3.39 MB
Uploaded: 12/09/2019 - 16:49:47
Visibility: Public archive

Description

Chapter 2
Displaying and Describing Categorical Data

2.1 Summarizing a Categorical Variable
A frequency table organizes data by recording the category names
and the total count for each category, as in the table below.

The names of the categories label each row in the frequency
table.

Some tables report counts, others report percentages, and many
report both.

Example: Super Bowl
The Super Bowl, the championship game of the National Football League of
the United States, is an important annual social event for Americans, with
tens of millions of viewers. The ads that air during the game are expensive: a

Polls often ask whether respondents are more interested in the game or the
commercials. Here are 40 responses from one such poll.

Make a frequency table for this variable. Include counts and percentages.
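
A minimal Python sketch of such a frequency table. The responses listed
here are hypothetical stand-ins, not the 40 poll responses from the
example, and the function name frequency_table is an assumption made for
illustration.

```python
from collections import Counter

# Hypothetical responses (NOT the poll data from the example).
responses = ["Game"] * 10 + ["Commercials"] * 6 + ["Won't watch"] * 4

def frequency_table(values):
    """Return (category, count, percent) rows, largest count first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

table = frequency_table(responses)
for category, count, percent in table:
    print(f"{category:<12} {count:>5} {percent:>6.1f}%")
```

Each row pairs a category name with its count and its share of all
responses, so the percentages add up to 100.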

2.2 Displaying a Categorical Variable
The Three Rules of Data Analysis
Make a picture. Make a picture. Make a picture. Pictures …

• reveal things that can’t be seen in a table of numbers.

• show important features and patterns in the data.

• provide an excellent means for reporting findings to others.

The Area Principle

The figure given distorts the data from the frequency table.

The best data displays observe the area principle: the area
occupied by a part of the graph should correspond to the
magnitude of the value it represents.

Bar Charts

A bar chart displays the distribution of a categorical variable,
showing the counts for each category next to each other for easy
comparison.

The bar graph here gives a more accurate visual impression of the
sandal data, though it may not be as visually entertaining.

If the counts are replaced with percentages, the data can be
displayed in a relative frequency bar chart.

The relative frequency bar chart looks the same as the bar chart,
but shows the proportion of visits in each category rather than
counts.
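
A short Python sketch of why the two charts look the same: dividing every
count by the same total does not change the relative bar lengths. The
category names and counts are hypothetical, and bar_lengths is a helper
invented for this illustration. Because every bar has the same width,
area is proportional to length, which is what the area principle asks for.

```python
def bar_lengths(values, width=40):
    """Scale each value so the largest bar is `width` characters long."""
    largest = max(values.values())
    return {cat: round(width * v / largest) for cat, v in values.items()}

# Hypothetical counts (not the data behind the original figure).
counts = {"A": 30, "B": 15, "C": 5}
total = sum(counts.values())
proportions = {cat: n / total for cat, n in counts.items()}

count_bars = bar_lengths(counts)
proportion_bars = bar_lengths(proportions)

# Text bar chart: same bars, labelled with both counts and percentages.
for cat in counts:
    print(f"{cat} {'#' * count_bars[cat]} {counts[cat]} ({proportions[cat]:.0%})")
```

Only the axis labels differ between the two charts: counts on one,
proportions or percentages on the other.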

Pie Charts

Pie charts show the whole group of cases as a circle sliced
into pieces with sizes proportional to the fraction of the whole
in each category. The KEEN Inc. data is displayed below.

Before making a bar chart or pie chart,

• check that the data satisfy the Categorical Data Condition: the
data are counts or percentages of individuals in categories.

• be sure the categories don’t overlap.

• consider what you are attempting to communicate about the
data.
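
One way to catch overlapping categories before charting, sketched in
Python: if individuals can fall into more than one category, the
percentages typically add up to more than 100. The function
check_percentages, its tolerance, and the survey figures are all
hypothetical.

```python
def check_percentages(percents, tolerance=0.5):
    """Return a list of problems found in a {category: percent} table."""
    problems = []
    if any(p < 0 for p in percents.values()):
        problems.append("negative percentage")
    total = sum(percents.values())
    # A total far from 100 suggests overlapping or missing categories.
    if abs(total - 100) > tolerance:
        problems.append(f"percentages sum to {total:g}, not 100")
    return problems

# Hypothetical survey where respondents could pick several answers.
overlapping = {"Email": 60, "Phone": 55, "Chat": 20}
clean = {"Email": 50, "Phone": 30, "Chat": 20}

print(check_percentages(overlapping))
print(check_percentages(clean))
```

A table that fails this check should not be drawn as a pie chart, since
the slices would not represent parts of a single whole.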

2.3 Exploring Two Categorical Variables: Contingency Tables

• To show how two categorical variables are related, we can
create a contingency table.

• Contingency tables show how individuals are distributed along
each variable depending on the value of the other variable.

• The marginal distribution of a variable in a contingency table
is the total count that occurs when the value of that variable is
held constant.

• Each cell of a contingency table (any intersection of a row
and column of the table) gives the count for a combination of
values of the two variables.

• Rather than displaying the data as counts, a table may
display the data as a percentage – as a total percent, row
percent, or column percent, which show percentages with
respect to the total count, row count, or column count,
respectively.

Example: Pew Research
One question of interest to business decision makers is how
common it is for citizens of different countries to use social
networking and whether they have it available to them.

But if we want to target our online customer relations with social
networks differently in different countries, wouldn’t it be more
interesting to know how social networking use varies from country
to country?

Conditional Distributions

Variables may be restricted to show the distribution for just those
cases that satisfy a specified condition. This is called a conditional
distribution.

The more interesting questions are contingent on something.

We’d like to know, for example, whether these countries are
similar in use and availability of social networking.

The conditional distribution of Social Networking conditioned on
two values of Country. This table shows column percentages.

Variables can be related in many ways, so it is typically easier to
ask if they are not related.

In a contingency table, when the distribution of one variable is the
same for all categories of another variable, we say that the
variables are independent.

This tells us there is no association between these variables.
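
A rough descriptive check of this idea in Python: compute the conditional
distribution within each group and look at the largest gap between
groups. The group names, counts, and the helper max_difference are
hypothetical. In real samples the distributions rarely match exactly, so
the gap is only a description, not a formal test.

```python
def conditional_distribution(counts):
    """Proportion of each category within one group."""
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def max_difference(groups):
    """Largest gap in any category's proportion between any two groups."""
    dists = [conditional_distribution(c) for c in groups.values()]
    categories = dists[0].keys()
    return max(max(d[cat] for d in dists) - min(d[cat] for d in dists)
               for cat in categories)

# Hypothetical tables: same distribution in both groups vs. not.
same = {"Group 1": {"Yes": 30, "No": 70}, "Group 2": {"Yes": 60, "No": 140}}
different = {"Group 1": {"Yes": 30, "No": 70}, "Group 2": {"Yes": 120, "No": 80}}

print(max_difference(same))       # no gap: consistent with independence
print(max_difference(different))  # large gap: suggests an association
```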

Example: Super Bowl
Here is a contingency table of the responses for 1008 adult U.S. respondents
to the question about watching the Super Bowl discussed previously:

Does it seem that there is an association between what viewers are
interested in watching and their sex?

First, find the conditional distributions of the four responses for each sex:

Next, display the two distributions with side-by-side bar charts:

Based on this poll, there appears to be an association between the
viewer’s sex and what the viewer is most looking forward to.

2.4 Segmented Bar Charts and Mosaic Plots

To further visualize conditional distributions, we can
create segmented bar charts and mosaic plots.

A segmented bar chart treats each bar as the “whole” and
divides it proportionally into segments corresponding to
the percentage in each group.

A variant of the segmented bar chart, a mosaic plot, looks
like a segmented bar chart, but obeys the area principle
better by making the bars proportional to the sizes of the
groups.

Everyone knows what happened in the North Atlantic on the night
of April 14, 1912. The Titanic sank, leaving almost 1500
passengers and crew members on board to meet their icy fate.
Here is a contingency table of the 2201 people on board:

Here is a side-by-side bar chart showing the conditional
distribution of Survival for each category of ticket Class:

Here is a segmented bar (or stacked bar) chart. We can clearly see
that the distributions of ticket Class are different, indicating
again that survival was not independent of ticket class:

Finally, here is a mosaic
plot for Class by Survival.

Also called a
Heat Map
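
The geometry behind a mosaic plot can be sketched in Python. The counts
below are an assumption: they are the commonly published class-by-survival
figures for the 2201 people on board, not values taken from this text.
The layout (one column per Class, split by Survival) is one possible
orientation, and mosaic_tiles is a name invented for this sketch.

```python
# Assumed counts: commonly published Titanic figures (total 2201).
titanic = {
    "First":  {"Alive": 203, "Dead": 122},
    "Second": {"Alive": 118, "Dead": 167},
    "Third":  {"Alive": 178, "Dead": 528},
    "Crew":   {"Alive": 212, "Dead": 673},
}

def mosaic_tiles(table):
    """Map (group, outcome) to (width, height) as fractions of the plot."""
    grand_total = sum(sum(counts.values()) for counts in table.values())
    tiles = {}
    for group, counts in table.items():
        group_total = sum(counts.values())
        width = group_total / grand_total  # column width follows group size
        for outcome, n in counts.items():
            # Height is the conditional proportion within the group.
            tiles[(group, outcome)] = (width, n / group_total)
    return tiles

tiles = mosaic_tiles(titanic)
for (group, outcome), (width, height) in tiles.items():
    print(f"{group:<7} {outcome:<6} width={width:.3f} height={height:.3f}")
```

Each tile's area (width times height) equals that cell's share of
everyone on board, which is how the mosaic plot respects the area
principle. A segmented bar chart uses the same heights but gives every
column equal width.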

Combining percentages across very different values or groups
can give confu...

