data source: https://docs.google.com/spreadsheets/d/1HEiR7hqHVv_tBcpdR1Wn67_5DH2jvT14pFsCrS3MIus/edit?usp=sharing
Ref: https://www.unodc.org/documents/data-and-analysis/Crime-statistics/Data_Table-final.xls
# 2-Way Frequency Table
attach(mydata)
mytable <- table(A,B) # A will be rows, B will be columns
mytable # print table
margin.table(mytable, 1) # A frequencies (summed over B)
margin.table(mytable, 2) # B frequencies (summed over A)
prop.table(mytable) # cell proportions (each cell / grand total)
prop.table(mytable, 1) # row proportions (each row sums to 1)
prop.table(mytable, 2) # column proportions (each column sums to 1)
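As a minimal sketch of the template above, the following builds a two-way table from a small invented data frame. The values in `mydata` are hypothetical and exist only so the calls can be run; the columns are passed explicitly instead of using attach().

```r
# Hypothetical toy data; the names mydata, A and B mirror the template above.
mydata <- data.frame(
  A = c("x", "x", "y", "y", "y", "x"),
  B = c("p", "q", "p", "p", "q", "p")
)

# A becomes the rows, B the columns.
mytable <- table(mydata$A, mydata$B)
mytable

row_totals <- margin.table(mytable, 1)  # counts per level of A
col_totals <- margin.table(mytable, 2)  # counts per level of B
row_props  <- prop.table(mytable, 1)    # proportions within each row
row_props
```

Multiplying the result of prop.table() by 100 gives percentages.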
xtabs
The xtabs( ) function allows you to create crosstabulations using formula style input.
# 3-Way Frequency Table
mytable <- xtabs(~A+B+C, data=mydata)
ftable(mytable) # print table
summary(mytable) # chi-square test of independence
If a variable is included on the left side of the formula, it
is assumed to be a vector of frequencies (useful if the data have
already been tabulated).
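A short sketch of the left-side case, assuming the data arrive already counted. The data frame `tabulated` and its column `Freq` are hypothetical names chosen for illustration.

```r
# Hypothetical pre-tabulated data: one row per (A, B) cell with its count.
tabulated <- data.frame(
  A    = c("x", "x", "y", "y"),
  B    = c("p", "q", "p", "q"),
  Freq = c(2, 1, 2, 1)
)

# Freq on the left side tells xtabs() to sum these counts within each cell
# instead of counting rows.
counted <- xtabs(Freq ~ A + B, data = tabulated)
ftable(counted)
```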
FREQUENCY:
In statistics, the frequency (or absolute frequency) of an event is the number of times the event occurred in an experiment or study; that is, how often a particular value occurs in a data set. A frequency table can be grouped or ungrouped, and the same data can also be shown with graphs such as histograms, bar graphs, and frequency polygons.
Frequency Distribution Graphs
A frequency distribution can also be shown graphically, which makes the collected data easier to read. Common choices are:
- Bar Graphs: Bar graphs represent data using rectangular bars of uniform width with equal spacing between them; the height of each bar shows the frequency.
- Histograms: A histogram shows numeric data grouped into class intervals using rectangular bars whose heights give the frequency of each interval. In a histogram, there is no space between the rectangular bars.
- Pie Chart: A pie chart displays data as a circle divided into sectors, each showing one category's share of the whole.
- Frequency Polygon: A frequency polygon is drawn by joining the mid-points of the tops of the bars in a histogram.
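The four graph types listed above can be sketched in base R as follows. The vector `x` is invented for illustration, and the plots are sent to a null PDF device so the code also runs without a display; remove the pdf() and dev.off() lines to see them on screen.

```r
# Hypothetical sample values, invented for illustration.
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 7)
freq <- table(x)

pdf(NULL)                              # null device: nothing is written to disk
barplot(freq, main = "Bar graph")      # one separated bar per distinct value
pie(freq, main = "Pie chart")          # one sector per distinct value
h <- hist(x, breaks = seq(0, 8, by = 2), main = "Histogram")
# Frequency polygon: join the mid-points of the tops of the histogram bars.
lines(h$mids, h$counts, type = "b")
invisible(dev.off())

h$counts   # frequency in each class interval
h$mids     # mid-point of each class interval
```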
Types of Frequency Distribution
There are four types of frequency distribution in statistics, explained below:
- Ungrouped frequency distribution: It shows the frequency of each separate data value rather than of groups of data values. In an ungrouped frequency distribution table we do not make class intervals; we record the exact frequency of each individual value. The table below has two columns: the marks obtained in a test and the frequency (number of students).
Marks obtained in Test | No. of Students |
---|---|
5 | 3 |
10 | 4 |
15 | 5 |
18 | 4 |
20 | 4 |
Total | 20 |
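The table above can be reproduced in R with table(). This is a sketch: the raw `marks` vector is rebuilt from the counts in the table, since the individual scores are not listed in these notes.

```r
# Rebuild the 20 individual marks from the counts shown in the table.
marks <- rep(c(5, 10, 15, 18, 20), times = c(3, 4, 5, 4, 4))

# Ungrouped frequency table: one entry per distinct mark.
freq <- table(marks)
freq
sum(freq)   # total number of students
```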
- Grouped frequency distribution: In this type, the data is arranged and separated into groups called class intervals. The frequency of data belonging to each class interval is noted in a frequency distribution table. The grouped frequency table shows the distribution of frequencies in class intervals.
> summary(rate$V1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 1.400 3.300 8.634 10.075 72.800
> hist(rate$V1)
> par(mfrow=c(1,2))
> hist(rate$V1)
> hist(rate$V1,breaks=2)
> table(ratecut)
ratecut
least less moderate high
287 308 312 294
> barplot(table(ratecut))
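The transcript does not show how `ratecut` was created. One way a four-level grouping of this kind could be built is to cut the rates at their quartiles; whether the original factor was made this way is an assumption, and the `rates` vector below is invented.

```r
# Hypothetical rates; not the homicide data used in the transcript.
rates <- c(0.0, 0.8, 1.4, 2.1, 3.3, 4.7, 8.6, 10.1, 25.0, 72.8)

# Class boundaries at the minimum, the three quartiles and the maximum.
breaks <- quantile(rates, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum value inside the first class.
ratecut <- cut(rates, breaks = breaks,
               labels = c("least", "less", "moderate", "high"),
               include.lowest = TRUE)

# Grouped frequency table: number of observations per class interval.
table(ratecut)
```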
- Relative frequency distribution: It tells the proportion of the total number of observations associated with each category.
> barplot(table(ratecut))
> lines(ratecut)
> table(ratecut)
ratecut
least less moderate high
287 308 312 294
> barplot(table(ratecut))
> sum(table(ratecut))
[1] 1201
> table(ratecut)/1201
ratecut
least less moderate high
0.2389675 0.2564530 0.2597835 0.2447960
> (table(ratecut)/1201)*100
ratecut
least less moderate high
23.89675 25.64530 25.97835 24.47960
> pie(table(ratecut))
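The division by a hard-coded 1201 can be avoided with prop.table(), which divides by the total itself. A sketch using the four counts printed in the transcript (the vector name `counts` is introduced here for illustration):

```r
# Counts copied from the table(ratecut) output in the transcript.
counts <- c(least = 287, less = 308, moderate = 312, high = 294)

rel <- prop.table(counts)   # same as counts / sum(counts)
rel
round(100 * rel, 2)         # relative frequencies as percentages
```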
- Cumulative frequency distribution: The cumulative frequency of a value is the sum of its own frequency and the frequencies of all values before it in the distribution. Add the first frequency to the second, add that sum to the third, and so on to the last. The last cumulative frequency equals the total of all frequencies.
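In R the running total is given by cumsum(). A small sketch, using the student counts from the marks table earlier in these notes (the vector name `freq` is chosen here for illustration):

```r
# Frequencies from the marks table: names are the marks, values the counts.
freq <- c("5" = 3, "10" = 4, "15" = 5, "18" = 4, "20" = 4)

# Each entry is the sum of that frequency and all frequencies before it.
cum_freq <- cumsum(freq)
cum_freq
```

The last entry of `cum_freq` equals `sum(freq)`, the total number of students.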
> attach(crimedata)
> table(region)
region
Africa Americas Asia Europe Oceania
167 296 289 438 30
> table(subregion2)
subregion2
Australia and New Zealand Caribbean
19 12 71
Central America Central Asia Eastern Africa
82 52 58
Eastern Asia Eastern Europe Melanesia
30 114 6
Micronesia Middle Africa Northern Africa
10 14 28
Northern Europe Polynesia South-Eastern Asia
120 2 43
South America Southern Africa Southern Asia
124 26 50
Southern Europe Western Africa Western Asia
122 41 114
Western Europe
82
> table(sourcelevel)
sourcelevel
Police Public Health
846 374
> table(sourcelevel2)
sourcelevel2
Government International National NGO
196 992 6 26
> table(metadata)
metadata
Asesinatos, homicidios
12
Assassinats, meurtres
3
Citing MoJ, Includes attempts
1
Citing National Police data
1
Citing National Police, Murder
6
Data provided by Policia de Investigaciones de Chile
6
Excluding Transdniestr
5
Homicide
11
Homicide and abetment to commit suicide
4
Homicidio
15
Homicidio doloso, denuncias registradas
5
Homicidio, infanticidio y parricidio delitos investigados
5
Homicidios
11
Homicidios dolosos
11
Homicidios dolosos, citing Policia Nacional
4
Homicidios, citing Fiscalia General de la Republica
6
Homicidios, citing National Police
4
Homicidios, citing Polica Nacional Civil
4
Homicidios, citing Policia Nacional
2
Homicidios, muertes
4
Includes attempts
23
Includes killing unwittingly
2
Intentional homicide
2
Intentional killing, reported cases
6
Intentional murder
3
Intentional murder, father kills son, and assault leading to death
1
Intentional murder, father kills son, killing a wife, and assault leading to death
2
Killing and beating till death. Includes attempts.
5
Meures violentas: homicidios y caidos en acciones legales
1
Meurtes violentas
3
Meurtes violentas: homicidios y muertos por P.N. en desempeno de sus funciones
1
Murder
76
Murder and dacoity with murder, data for 2002/03
1
Murder and dacoity with murder, data for 2003/04
1
Murder and dacoity with murder, data for 2004/05
1
Murder and dacoity with murder, data for 2005/06
1
Murder and dacoity with murder, data for 2006/07
1
Murder recorded by judicial police
3
Murder reported to police
6
Murder, citing National Police
3
Murder, infanticide
5
Murder, murder of newly born child, infanticide
5
nf
7
NULL
934
Persons killed
2
Victimas de homicidio doloso
5
> summary(rate)
Rate
Min. : 0.000
1st Qu.: 1.400
Median : 3.300
Mean : 8.634
3rd Qu.:10.075
Max. :72.800
class(rate)
hist(rate$Rate)
> mytable <- table(region$Region,sourcelevel2$Source.Level.2)
> mytable
Government International National NGO
Africa 30 131 0 6
Americas 95 178 6 17
Asia 54 232 0 3
Europe 12 426 0 0
Oceania 5 25 0 0
> margin.table(mytable,1)
Africa Americas Asia Europe Oceania
167 296 289 438 30
> margin.table(mytable,2)
Government International National NGO
196 992 6 26
> prop.table(mytable,1)
Government International National NGO
Africa 0.17964072 0.78443114 0.00000000 0.03592814
Americas 0.32094595 0.60135135 0.02027027 0.05743243
Asia 0.18685121 0.80276817 0.00000000 0.01038062
Europe 0.02739726 0.97260274 0.00000000 0.00000000
Oceania 0.16666667 0.83333333 0.00000000 0.00000000
> prop.table(mytable,2)
Government International National NGO
Africa 0.15306122 0.13205645 0.00000000 0.23076923
Americas 0.48469388 0.17943548 1.00000000 0.65384615
Asia 0.27551020 0.23387097 0.00000000 0.11538462
Europe 0.06122449 0.42943548 0.00000000 0.00000000
Oceania 0.02551020 0.02520161 0.00000000 0.00000000
slicing the data
> head(crimedata)
V1 V1.1 V1.2 V1.3 V1.4 V1.5 V1.6
1 Africa Not available 37.4 2004 Public Health WHO NULL
2 Africa Not available 11.9 2004 Public Health WHO NULL
3 Africa Not available 3.4 2004 Public Health WHO NULL
4 Africa Not available 16.1 2004 Public Health WHO NULL
5 Africa Not available 6.4 2004 Police Interpol NULL
6 Africa Not available 20.5 2004 Public Health WHO NULL
> names(crimedata)=c("region","subregion","rate","year","source1","source2","metadata")
> head(crimedata)
region subregion rate year source1 source2 metadata
1 Africa Not available 37.4 2004 Public Health WHO NULL
2 Africa Not available 11.9 2004 Public Health WHO NULL
3 Africa Not available 3.4 2004 Public Health WHO NULL
4 Africa Not available 16.1 2004 Public Health WHO NULL
5 Africa Not available 6.4 2004 Police Interpol NULL
6 Africa Not available 20.5 2004 Public Health WHO NULL
> crimedata[2,]
region subregion rate year source1 source2 metadata
2 Africa Not available 11.9 2004 Public Health WHO NULL
> crimedata[2:3,]
region subregion rate year source1 source2 metadata
2 Africa Not available 11.9 2004 Public Health WHO NULL
3 Africa Not available 3.4 2004 Public Health WHO NULL
> crimedata[3:6,]
region subregion rate year source1 source2 metadata
3 Africa Not available 3.4 2004 Public Health WHO NULL
4 Africa Not available 16.1 2004 Public Health WHO NULL
5 Africa Not available 6.4 2004 Police Interpol NULL
6 Africa Not available 20.5 2004 Public Health WHO NULL
> crimedata[3:6,3]
[1] 3.4 16.1 6.4 20.5
> crimedata[3:6,3,drop=F]
rate
3 3.4
4 16.1
5 6.4
6 20.5
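The difference between the last two calls is the `drop` argument: selecting a single column returns a plain vector by default, while `drop=FALSE` keeps a one-column data frame with its row names. A minimal sketch on an invented three-row frame (`df` and its values are hypothetical):

```r
# Hypothetical three-row frame reusing two of the column names above.
df <- data.frame(region = c("Africa", "Asia", "Europe"),
                 rate   = c(3.4, 16.1, 6.4))

v <- df[2:3, "rate"]                 # numeric vector, row names are lost
d <- df[2:3, "rate", drop = FALSE]   # one-column data frame, row names kept

class(v)
class(d)
```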
> subset(crimedata,year==2004) # all columns, rows where year is 2004
> crimedata$region[crimedata$year==2004] # only the region values for those rows
table(crimedata$Region,crimedata$Source.Level.1)
Police Public Health
Africa 115 52
Americas 197 99
Asia 228 61
Europe 289 149
Oceania 17 13
> barplot(table(crimedata$Region,crimedata$Source.Level.1),col=c(1:5))
> my.table=table(crimedata$Region,crimedata$Source.Level.1)
> barplot(my.table)
> barplot(my.table,legend=rownames(my.table),col=c(1:5),main="Vertical Distribution of data sources")
> barplot(my.table,legend=rownames(my.table),col=c(1:5),beside=T,main="Vertical Distribution of data sources")
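The two barplot() calls differ only in `beside`: without it the regions are stacked inside one bar per source, with it each region gets its own bar. The sketch below rebuilds the table as a matrix from the counts printed above so it can run without the spreadsheet. The plot titles are made up, and the null PDF device is only there so the code runs without a display.

```r
# Counts copied from the region-by-source table printed above.
my.table <- matrix(
  c(115, 197, 228, 289, 17,    # Police
    52,  99,  61, 149, 13),    # Public Health
  nrow = 5,
  dimnames = list(c("Africa", "Americas", "Asia", "Europe", "Oceania"),
                  c("Police", "Public Health"))
)

pdf(NULL)   # null device: nothing is written to disk
# Stacked: one bar per column, split by region.
stacked <- barplot(my.table, col = 1:5, legend.text = rownames(my.table),
                   main = "Stacked by region")
# Grouped: one bar per region within each column.
grouped <- barplot(my.table, col = 1:5, legend.text = rownames(my.table),
                   beside = TRUE, main = "Grouped by region")
invisible(dev.off())

colSums(my.table)   # total records per source
```

barplot() returns the bar mid-points: a vector with one value per column for the stacked plot and a 5 x 2 matrix for the grouped one.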
> ratecut=cut(rate$Rate,breaks=2,labels = c("low","high"))
> table(ratecut)
ratecut
low high
1155 65
> crimedata1=data.frame(crimedata,ratecut)
> table(crimedata1$Region,crimedata1$ratecut)
low high
Africa 151 16
Americas 247 49
Asia 289 0
Europe 438 0
Oceania 30 0
> ratecut=cut(rate$Rate,breaks=c(0,4,10,72),labels=c("very.low","medium","very.high"))
> table(ratecut)
ratecut
very.low medium very.high
644 252 304
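These three counts sum to 1200, while the region totals earlier in these notes sum to 1220. A likely explanation, stated here as an assumption, is that cut() returns NA for values outside its intervals. With breaks 0, 4, 10, 72 the intervals are (0,4], (4,10] and (10,72], so a rate of exactly 0 or a rate above 72 is left out (the summary reports a minimum of 0.000 and a maximum of 72.800). A sketch on invented rates:

```r
# Hypothetical rates spanning the same range as the summary above.
rates <- c(0.0, 1.4, 3.3, 8.6, 10.1, 72.8)
labs  <- c("very.low", "medium", "very.high")

# Same breaks as the transcript: 0.0 and 72.8 fall outside (0, 72].
narrow <- cut(rates, breaks = c(0, 4, 10, 72), labels = labs)
sum(is.na(narrow))   # number of values that were dropped

# include.lowest = TRUE keeps 0, and an upper break of Inf keeps the maximum.
wide <- cut(rates, breaks = c(0, 4, 10, Inf), labels = labs,
            include.lowest = TRUE)
table(wide)
```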
> crimedata1=data.frame(crimedata,ratecut)
> table(crimedata1$Region,crimedata1$ratecut)
very.low medium very.high
Africa 77 23 66
Americas 21 79 195
Asia 193 69 26
Europe 331 75 16
Oceania 22 6 1
> my.table=table(crimedata1$Region,crimedata1$ratecut)
> margin.table(my.table,1)
Africa Americas Asia Europe Oceania
166 295 288 422 29
> margin.table(my.table,2)
very.low medium very.high
644 252 304
> round(prop.table(my.table,1),2)
very.low medium very.high
Africa 0.46 0.14 0.40
Americas 0.07 0.27 0.66
Asia 0.67 0.24 0.09
Europe 0.78 0.18 0.04
Oceania 0.76 0.21 0.03
> round(prop.table(my.table,2),2)
very.low medium very.high
Africa 0.12 0.09 0.22
Americas 0.03 0.31 0.64
Asia 0.30 0.27 0.09
Europe 0.51 0.30 0.05
Oceania 0.03 0.02 0.00