Tuesday, September 21, 2021

Exploratory data analysis using R

data source: https://docs.google.com/spreadsheets/d/1HEiR7hqHVv_tBcpdR1Wn67_5DH2jvT14pFsCrS3MIus/edit?usp=sharing

Ref: https://www.unodc.org/documents/data-and-analysis/Crime-statistics/Data_Table-final.xls

 # 2-Way Frequency Table
attach(mydata)
mytable <- table(A,B) # A will be rows, B will be columns
mytable # print table

margin.table(mytable, 1) # A frequencies (summed over B)
margin.table(mytable, 2) # B frequencies (summed over A)

prop.table(mytable) # cell proportions (multiply by 100 for percentages)
prop.table(mytable, 1) # row proportions (each row sums to 1)
prop.table(mytable, 2) # column proportions (each column sums to 1)
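
As a minimal sketch of the calls above, assume a small made-up data frame `mydata` with two categorical columns `A` and `B`. The values are hypothetical and are not the crime data used later in this post. Columns are referenced explicitly, so `attach()` is not needed here.

```r
# Hypothetical toy data: two categorical variables
mydata <- data.frame(
  A = c("x", "x", "y", "y", "y", "x"),
  B = c("p", "q", "p", "p", "q", "p")
)

mytable <- table(mydata$A, mydata$B)    # A in rows, B in columns

row_totals <- margin.table(mytable, 1)  # counts per level of A
col_totals <- margin.table(mytable, 2)  # counts per level of B

cell_prop <- prop.table(mytable)        # each cell divided by the grand total
row_prop  <- prop.table(mytable, 1)     # each row sums to 1
col_prop  <- prop.table(mytable, 2)     # each column sums to 1

print(mytable)
print(round(row_prop * 100, 1))         # multiply by 100 to get percentages
```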

xtabs

The xtabs( ) function allows you to create crosstabulations using formula style input.

# 3-Way Frequency Table
mytable <- xtabs(~A+B+C, data=mydata)
ftable(mytable) # print table
summary(mytable) # chi-square test of independence

If a variable is included on the left side of the formula, it is assumed to be a vector of frequencies (useful if the data have already been tabulated).
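
A short sketch of that left-hand-side form, assuming a hypothetical data frame `tab_df` in which each row is one combination of `A` and `B` and the column `Freq` holds the count that has already been tabulated:

```r
# Hypothetical pre-tabulated data: one row per combination plus a count column
tab_df <- data.frame(
  A    = c("x", "x", "y", "y"),
  B    = c("p", "q", "p", "q"),
  Freq = c(2, 1, 2, 1)
)

# Freq on the left side of the formula is used as the cell counts
mytable <- xtabs(Freq ~ A + B, data = tab_df)
print(mytable)
```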

 

Frequency: In statistics, the frequency (or absolute frequency) of an event is the number of times the event occurred in an experiment or study. In other words, it is a count of how often a particular value occurs in a data set. A frequency table can be ungrouped or grouped, and frequencies can be displayed with graphs such as histograms, bar graphs, and frequency polygons.

Frequency Distribution Graphs

A frequency distribution can also be shown as a graph, which often makes the collected data easier to understand. Common graphical representations are:

  • Bar Graphs: Bar graphs represent categories using rectangular bars of uniform width with equal spacing between the bars; the height of each bar shows the frequency.
  • Histograms: A histogram represents numeric data using adjacent rectangular bars, one per class interval, whose heights show the frequency in that interval. There is no space between the bars.
  • Pie Chart: A pie chart displays data as a circle divided into sectors, where each sector shows one category's share of the whole.
  • Frequency Polygon: A frequency polygon is drawn by joining the mid-points of the tops of the bars in a histogram.
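
The frequency polygon description can be sketched in base R as follows. The vector `x` and the break points are hypothetical values chosen for illustration. `hist()` returns the class mid-points (`$mids`) and counts (`$counts`), which `lines()` then joins on top of the histogram.

```r
# Hypothetical sample values (not the crime rates from this post)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 8, 9)

# Draw the histogram and keep the object it returns
h <- hist(x, breaks = seq(0, 10, by = 2),
          main = "Histogram with frequency polygon")

# Join the mid-points of the bar tops to form the frequency polygon
lines(h$mids, h$counts, type = "b")
```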

Types of Frequency Distribution

Four types of frequency distribution are described below:

  • Ungrouped frequency distribution: It shows the frequency of each separate data value rather than of groups of values. In an ungrouped frequency distribution table, no class intervals are made; the exact frequency of each individual value is recorded. For example, the table below has two columns: the marks obtained in a test and the frequency (number of students).
    Marks obtained in test    No. of students
    5                         3
    10                        4
    15                        5
    18                        4
    20                        4
    Total                     20
  • Grouped frequency distribution: In this type, the data is arranged and separated into groups called class intervals. The frequency of data belonging to each class interval is noted in a frequency distribution table. The grouped frequency table shows the distribution of frequencies in class intervals.

> summary(rate$V1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.400   3.300   8.634  10.075  72.800 
> hist(rate$V1)
> par(mfrow=c(1,2))
> hist(rate$V1)
> hist(rate$V1,breaks=2)



> table(ratecut)
ratecut
   least     less moderate     high 
     287      308      312      294 
> barplot(table(ratecut))
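
The post does not show how the four-level `ratecut` was created. One plausible way, and this is only an assumption, is to cut the rates at their quartiles, which would give four groups of roughly equal size as in the table above. The sketch below uses a hypothetical vector `rates` in place of `rate$V1`.

```r
# Hypothetical rates standing in for rate$V1
rates <- c(0, 0.5, 1.2, 1.4, 2.8, 3.3, 5.0, 9.7, 10.1, 25.3, 40.0, 72.8)

# Quartile boundaries used as class limits (an assumption, not the post's method)
brks <- quantile(rates, probs = seq(0, 1, by = 0.25))

ratecut <- cut(rates, breaks = brks,
               labels = c("least", "less", "moderate", "high"),
               include.lowest = TRUE)   # keep the minimum in the first class

print(table(ratecut))
```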


  • Relative frequency distribution: It gives the proportion of the total number of observations that falls in each category, that is, each frequency divided by the total.
> barplot(table(ratecut))
> lines(ratecut)
> table(ratecut)
ratecut
   least     less moderate     high 
     287      308      312      294 
> sum(table(ratecut))
[1] 1201
> table(ratecut)/1201
ratecut
    least      less  moderate      high 
0.2389675 0.2564530 0.2597835 0.2447960 
> (table(ratecut)/1201)*100
ratecut
   least     less moderate     high 
23.89675 25.64530 25.97835 24.47960 
> pie(table(ratecut))

  • Cumulative frequency distribution: The cumulative frequency of a class is its own frequency plus the frequencies of all classes before it. Add the first frequency to the second, add that sum to the third, and so on up to the last class. The last cumulative frequency equals the total of all frequencies.
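
A minimal sketch of a cumulative frequency distribution, using the four `ratecut` counts printed earlier in this post. The variable names `freq`, `cum_freq` and `cum_rel` are chosen here for illustration.

```r
# Frequencies copied from the ratecut table shown in this post
freq <- c(least = 287, less = 308, moderate = 312, high = 294)

cum_freq <- cumsum(freq)           # running total of the frequencies
cum_rel  <- cum_freq / sum(freq)   # cumulative relative frequency

print(cum_freq)
print(round(cum_rel, 3))
```

The last entry of `cum_freq` equals `sum(freq)`, which matches the total of 1201 reported above.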



      > attach(crimedata)
      > table(region)
      region
        Africa Americas     Asia   Europe  Oceania 
           167      296      289      438       30 
      > table(subregion2)
      subregion2
                                Australia and New Zealand                 Caribbean 
                             19                        12                        71 
                Central America              Central Asia            Eastern Africa 
                             82                        52                        58 
                   Eastern Asia            Eastern Europe                 Melanesia 
                             30                       114                         6 
                     Micronesia             Middle Africa           Northern Africa 
                             10                        14                        28 
                Northern Europe                 Polynesia        South-Eastern Asia 
                            120                         2                        43 
                  South America           Southern Africa             Southern Asia 
                            124                        26                        50 
                Southern Europe            Western Africa              Western Asia 
                            122                        41                       114 
                 Western Europe 
                             82 
      > table(sourcelevel)
      sourcelevel
             Police Public Health 
                846           374 
      > table(sourcelevel2)
      sourcelevel2
         Government International      National           NGO 
                196           992             6            26 
      > table(metadata)
      metadata
                                                                  Asesinatos, homicidios 
                                                                                      12 
                                                                   Assassinats, meurtres 
                                                                                       3 
                                                           Citing MoJ, Includes attempts 
                                                                                       1 
                                                             Citing National Police data 
                                                                                       1 
                                                          Citing National Police, Murder 
                                                                                       6 
                                    Data provided by Policia de Investigaciones de Chile 
                                                                                       6 
                                                                  Excluding Transdniestr 
                                                                                       5 
                                                                                Homicide 
                                                                                      11 
                                                 Homicide and abetment to commit suicide 
                                                                                       4 
                                                                               Homicidio 
                                                                                      15 
                                                 Homicidio doloso, denuncias registradas 
                                                                                       5 
                               Homicidio, infanticidio y parricidio delitos investigados 
                                                                                       5 
                                                                              Homicidios 
                                                                                      11 
                                                                      Homicidios dolosos 
                                                                                      11 
                                             Homicidios dolosos, citing Policia Nacional 
                                                                                       4 
                                     Homicidios, citing Fiscalia General de la Republica 
                                                                                       6 
                                                      Homicidios, citing National Police 
                                                                                       4 
                                                Homicidios, citing Polica Nacional Civil 
                                                                                       4 
                                                     Homicidios, citing Policia Nacional 
                                                                                       2 
                                                                     Homicidios, muertes 
                                                                                       4 
                                                                       Includes attempts 
                                                                                      23 
                                                            Includes killing unwittingly 
                                                                                       2 
                                                                    Intentional homicide 
                                                                                       2 
                                                     Intentional killing, reported cases 
                                                                                       6 
                                                                      Intentional murder 
                                                                                       3 
                      Intentional murder, father kills son, and assault leading to death 
                                                                                       1 
      Intentional murder, father kills son, killing a wife, and assault leading to death 
                                                                                       2 
                                      Killing and beating till death. Includes attempts. 
                                                                                       5 
                               Meures violentas: homicidios y caidos en acciones legales 
                                                                                       1 
                                                                       Meurtes violentas 
                                                                                       3 
          Meurtes violentas: homicidios y muertos por P.N. en desempeno de sus funciones 
                                                                                       1 
                                                                                  Murder 
                                                                                      76 
                                        Murder and dacoity with murder, data for 2002/03 
                                                                                       1 
                                        Murder and dacoity with murder, data for 2003/04 
                                                                                       1 
                                        Murder and dacoity with murder, data for 2004/05 
                                                                                       1 
                                        Murder and dacoity with murder, data for 2005/06 
                                                                                       1 
                                        Murder and dacoity with murder, data for 2006/07 
                                                                                       1 
                                                      Murder recorded by judicial police 
                                                                                       3 
                                                               Murder reported to police 
                                                                                       6 
                                                          Murder, citing National Police 
                                                                                       3 
                                                                     Murder, infanticide 
                                                                                       5 
                                         Murder, murder of newly born child, infanticide 
                                                                                       5 
                                                                                      nf 
                                                                                       7 
                                                                                    NULL 
                                                                                     934 
                                                                          Persons killed 
                                                                                       2 
                                                            Victimas de homicidio doloso 
                                                                                       5 
      

       
      > summary(rate)
            Rate       
       Min.   : 0.000  
       1st Qu.: 1.400  
       Median : 3.300  
       Mean   : 8.634  
       3rd Qu.:10.075  
       Max.   :72.800 
       
      class(rate) 
       
      hist(rate$Rate)
       
       > mytable=table(region$Region,sourcelevel2$Source.Level.2)
       > mytable
                
                 Government International National NGO
        Africa           30           131        0   6
        Americas         95           178        6  17
        Asia             54           232        0   3
        Europe           12           426        0   0
        Oceania           5            25        0   0
       
       
       
      > margin.table(mytable,1)
      
        Africa Americas     Asia   Europe  Oceania 
           167      296      289      438       30 
      > margin.table(mytable,2)
      
         Government International      National           NGO 
                196           992             6            26 
      

      > prop.table(mytable,1)
                
                 Government International   National        NGO
        Africa   0.17964072    0.78443114 0.00000000 0.03592814
        Americas 0.32094595    0.60135135 0.02027027 0.05743243
        Asia     0.18685121    0.80276817 0.00000000 0.01038062
        Europe   0.02739726    0.97260274 0.00000000 0.00000000
        Oceania  0.16666667    0.83333333 0.00000000 0.00000000
      > prop.table(mytable,2)
                
                 Government International   National        NGO
        Africa   0.15306122    0.13205645 0.00000000 0.23076923
        Americas 0.48469388    0.17943548 1.00000000 0.65384615
        Asia     0.27551020    0.23387097 0.00000000 0.11538462
        Europe   0.06122449    0.42943548 0.00000000 0.00000000
        Oceania  0.02551020    0.02520161 0.00000000 0.00000000
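
As a sketch, the region by source cross-tabulation can be rebuilt from the counts printed above, so the margin and proportion calls can be tried without the original spreadsheet. `addmargins()` is not used in the post; it is added here as one way to see the table and its totals together.

```r
# Counts copied from the region-by-source table printed above
mytable <- as.table(matrix(
  c(30, 131, 0,  6,
    95, 178, 6, 17,
    54, 232, 0,  3,
    12, 426, 0,  0,
     5,  25, 0,  0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Africa", "Americas", "Asia", "Europe", "Oceania"),
    c("Government", "International", "National", "NGO")
  )
))

row_totals   <- margin.table(mytable, 1)                # totals per region
with_margins <- addmargins(mytable)                     # appends a "Sum" row and column
row_pct      <- round(prop.table(mytable, 1) * 100, 1)  # row percentages

print(with_margins)
print(row_pct)
```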
      

      slicing the data

      > head(crimedata)
            V1          V1.1 V1.2 V1.3          V1.4     V1.5 V1.6
      1 Africa Not available 37.4 2004 Public Health      WHO NULL
      2 Africa Not available 11.9 2004 Public Health      WHO NULL
      3 Africa Not available  3.4 2004 Public Health      WHO NULL
      4 Africa Not available 16.1 2004 Public Health      WHO NULL
      5 Africa Not available  6.4 2004        Police Interpol NULL
      6 Africa Not available 20.5 2004 Public Health      WHO NULL
      > names(crimedata)=c("region","subregion","rate","year","source1","source2","metadata")
      > head(crimedata)
        region     subregion rate year       source1  source2 metadata
      1 Africa Not available 37.4 2004 Public Health      WHO     NULL
      2 Africa Not available 11.9 2004 Public Health      WHO     NULL
      3 Africa Not available  3.4 2004 Public Health      WHO     NULL
      4 Africa Not available 16.1 2004 Public Health      WHO     NULL
      5 Africa Not available  6.4 2004        Police Interpol     NULL
      6 Africa Not available 20.5 2004 Public Health      WHO     NULL
          
      > crimedata[2,]
        region     subregion rate year       source1 source2 metadata
      2 Africa Not available 11.9 2004 Public Health     WHO     NULL
      > crimedata[2:3,]
        region     subregion rate year       source1 source2 metadata
      2 Africa Not available 11.9 2004 Public Health     WHO     NULL
      3 Africa Not available  3.4 2004 Public Health     WHO     NULL

      > crimedata[3:6,]
        region     subregion rate year       source1  source2 metadata
      3 Africa Not available  3.4 2004 Public Health      WHO     NULL
      4 Africa Not available 16.1 2004 Public Health      WHO     NULL
      5 Africa Not available  6.4 2004        Police Interpol     NULL
      6 Africa Not available 20.5 2004 Public Health      WHO     NULL

      > crimedata[3:6,3]
      [1]  3.4 16.1  6.4 20.5

      > crimedata[3:6,3,drop=F]
        rate
      3  3.4
      4 16.1
      5  6.4
      6 20.5

      subset(crimedata,year==2004)                       # all columns, rows from 2004
      subset(crimedata$region,crimedata$year==2004)      # region values only, rows from 2004
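
The slicing calls above can be tried on a small stand-in data frame. The sketch below assumes a hypothetical `crime_small` with three of the renamed columns; the values are made up apart from the first two rates.

```r
# Hypothetical miniature version of crimedata with renamed columns
crime_small <- data.frame(
  region = c("Africa", "Africa", "Asia", "Europe"),
  rate   = c(37.4, 11.9, 2.1, 1.3),
  year   = c(2004, 2005, 2004, 2004)
)

rows_2004    <- subset(crime_small, year == 2004)                   # all columns
regions_2004 <- subset(crime_small, year == 2004, select = region)  # one column, still a data frame

# Bracket indexing with a logical condition does the same kind of filtering
high_2004 <- crime_small[crime_small$year == 2004 & crime_small$rate > 2, ]

print(rows_2004)
print(high_2004)
```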

      table(crimedata$Region,crimedata$Source.Level.1)
                
                 Police Public Health
        Africa      115            52
        Americas    197            99
        Asia        228            61
        Europe      289           149
        Oceania      17            13
      barplot(table(crimedata$Region,crimedata$Source.Level.1),col=c(1:5))

      
      > my.table=table(crimedata$Region,crimedata$Source.Level.1)
      > barplot(my.table)


      barplot(my.table,legend=rownames(my.table),col=c(1:5),main="Vertical Distribution of data sources")
      


      

      >barplot(my.table,legend=rownames(my.table),col=c(1:5),beside=T,main="Vertical Distribution of data sources")
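
To compare the stacked and grouped (`beside=T`) layouts without the original data, the table can be rebuilt from the region by source counts printed above. This is a sketch; the plot titles are chosen here for illustration.

```r
# Counts copied from the region-by-source table printed above
my.table <- as.table(matrix(
  c(115,  52,
    197,  99,
    228,  61,
    289, 149,
     17,  13),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Africa", "Americas", "Asia", "Europe", "Oceania"),
    c("Police", "Public Health")
  )
))

par(mfrow = c(1, 2))   # two plots side by side

# Stacked: one bar per column, regions stacked within it
stacked <- barplot(my.table, col = 1:5, legend.text = rownames(my.table),
                   main = "Stacked")

# Grouped: one bar per region within each column
grouped <- barplot(my.table, col = 1:5, beside = TRUE,
                   main = "Grouped")
```

`barplot()` returns the bar mid-points: a vector for the stacked plot and a matrix with one row per region for the grouped plot.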


      > ratecut=cut(rate$Rate,breaks=2,labels = c("low","high"))
      > table(ratecut)
      ratecut
       low high 
      1155   65 


      > crimedata1=data.frame(crimedata,ratecut)
      > table(crimedata1$Region,crimedata1$ratecut)
                
                 low high
        Africa   151   16
        Americas 247   49
        Asia     289    0
        Europe   438    0
        Oceania   30    0

      ratecut=cut(rate$Rate,breaks=c(0,4,10,72),labels=c("very.low","medium","very.high"))
      > table(ratecut)
      ratecut
       very.low    medium very.high 
            644       252       304 
      > crimedata1=data.frame(crimedata,ratecut)
      > table(crimedata1$Region,crimedata1$ratecut)
                
                 very.low medium very.high
        Africa         77     23        66
        Americas       21     79       195
        Asia          193     69        26
        Europe        331     75        16
        Oceania        22      6         1
      >my.table=table(crimedata1$Region,crimedata1$ratecut)
      > margin.table(my.table,1)
      
        Africa Americas     Asia   Europe  Oceania 
           166      295      288      422       29 
      > margin.table(my.table,2)
      
       very.low    medium very.high 
            644       252       304
      >round(prop.table(my.table,1),2)
                
                 very.low medium very.high
        Africa       0.46   0.14      0.40
        Americas     0.07   0.27      0.66
        Asia         0.67   0.24      0.09
        Europe       0.78   0.18      0.04
        Oceania      0.76   0.21      0.03
      > round(prop.table(my.table,2),2)
                
                 very.low medium very.high
        Africa       0.12   0.09      0.22
        Americas     0.03   0.31      0.64
        Asia         0.30   0.27      0.09
        Europe       0.51   0.30      0.05
        Oceania      0.03   0.02      0.00 
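
The three-level table above adds up to 1200 observations, while the two-level table adds up to 1220 (for example, Africa has 166 rows here and 167 earlier). A likely cause, stated here as an assumption, is that `cut()` by default excludes the lowest break and that the top break of 72 lies below the maximum rate of 72.8, so those values become `NA` and `table()` drops them. The sketch below uses a hypothetical vector `rates` to show the effect and one possible remedy.

```r
# Hypothetical rates that include the extremes reported by summary(): 0 and 72.8
rates <- c(0, 1.4, 3.3, 4, 8.6, 10, 10.1, 72.8)

# Breaks as in the post: 0 and 72.8 fall outside the intervals and become NA
cut_post <- cut(rates, breaks = c(0, 4, 10, 72),
                labels = c("very.low", "medium", "very.high"))

# Including the lowest break and using Inf as the top break keeps every value
cut_all <- cut(rates, breaks = c(0, 4, 10, Inf),
               labels = c("very.low", "medium", "very.high"),
               include.lowest = TRUE)

print(table(cut_post, useNA = "ifany"))
print(table(cut_all))
```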





