Thursday, June 21, 2018

R-Programming on Data Distribution

    Today we will learn about data distribution. Data distribution is very important for data analysis as well as data analytics.
    When we are dealing with big data we have to think about how the data are distributed. There are different types of distribution, such as
    Binomial distribution
    Poisson distribution
    Normal distribution
    Uniform distribution
    Beta distribution
    Gamma distribution
    And so on...
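As a rough illustration of the list above, base R has a random-number function for each of these distributions. The sample size, the seed and all parameter values below are assumptions chosen only for the example; they are not from the lecture.

```r
# Draw a few random values from each distribution named in the list.
# n, the seed and every parameter value are illustrative assumptions.
set.seed(42)
n <- 5

samples <- list(
  binomial = rbinom(n, size = 10, prob = 0.5),  # 10 trials, success probability 0.5
  poisson  = rpois(n, lambda = 3),              # average count of 3
  normal   = rnorm(n, mean = 0, sd = 1),        # standard normal
  uniform  = runif(n, min = 0, max = 1),        # flat between 0 and 1
  beta     = rbeta(n, shape1 = 2, shape2 = 5),  # values between 0 and 1
  gamma    = rgamma(n, shape = 2, rate = 1)     # positive values
)

print(samples)
```

Each function takes the number of values first and the distribution parameters after it.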
    Another thing is that we *can customise our distribution*.
    Here I am creating one vector called sohini:
    *sohini=(1:100)*
    It means the sohini vector holds the integers from 1 to 100.
    1 is the lower bound, that is the minimum, and 100 is the upper bound, that is the maximum. Let's calculate the mean and the sd of the sohini vector.
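A minimal sketch of that calculation, using the same vector name as the text. The helper names sohini_mean and sohini_sd are introduced here only for the example.

```r
# Build the vector 1..100 and compute its mean and standard deviation.
sohini <- 1:100

sohini_mean <- mean(sohini)  # arithmetic mean of the values
sohini_sd   <- sd(sohini)    # sample standard deviation (divides by n - 1)

print(sohini_mean)
print(sohini_sd)
```

Note that sd() returns the sample standard deviation, so it divides by n - 1 rather than n.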
    Since the values in sohini are evenly spread, the box-whisker plot from *boxplot(sohini)* looks like that of a uniform distribution: the median sits in the middle of the box and the two whiskers have the same length. The command *plot(sohini)* plots each value against its position, which gives a straight line from 1 to 100; to see the roughly flat density itself we can use *plot(density(sohini))*.
    With *summary(sohini)* we get the minimum, 1st quartile, median, mean, 3rd quartile and the maximum.
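The plotting and summary commands can be tried together as below. This is only a sketch; the name sohini_summary is an assumption added so that the result can be inspected afterwards.

```r
# Look at the vector 1..100 in three ways and then summarise it.
sohini <- 1:100

boxplot(sohini)        # box-whisker plot: symmetric box and whiskers
plot(sohini)           # value against position: a straight rising line
plot(density(sohini))  # kernel density estimate: roughly flat in the middle

# summary() returns Min., 1st Qu., Median, Mean, 3rd Qu. and Max. in that order
sohini_summary <- summary(sohini)
print(sohini_summary)
```

When the script is run non-interactively, R writes the plots to a file instead of opening a window.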
    runif is the command that generates random values from a uniform distribution. So the commands are
    rai=10
    akshita=50
    sohini =runif(sohini, rai, akshita)
    Be careful with capital and small letters, because R is case sensitive. Functions like mean, sd and summary are written in lower case, but there are some commands like NROW and NCOL that must be written in capitals.
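A small sketch of why the case matters: R also has lower-case nrow and ncol, and on a plain vector they behave differently from NROW and NCOL. The vector x is a made-up example.

```r
# NROW/NCOL treat a plain vector as a one-column matrix;
# nrow/ncol return NULL because a plain vector has no dimensions.
x <- 1:100

upper_rows <- NROW(x)  # number of elements in the vector
upper_cols <- NCOL(x)  # a vector counts as one column
lower_rows <- nrow(x)  # NULL for a plain vector

print(upper_rows)
print(upper_cols)
print(lower_rows)
```

For data frames and matrices both spellings give the same answer; the difference only shows up on vectors.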
    Now you will find two different plots. Earlier you got the plot of the values from one to hundred; here you get the plot of a uniform distribution whose minimum is ten and maximum is fifty. So when you write *sohini=runif(sohini, rai, akshita)*,
    the first argument sohini stands for 100 values (the length of the vector), rai is the minimum and akshita is the maximum.
    So this is the range.
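The runif call can be checked with a short sketch like the one below. The seed and the name sohini_unif are assumptions added for the example; the lecture overwrites sohini itself.

```r
# Generate 100 uniform random values between rai (10) and akshita (50).
set.seed(1)  # assumption: fixed seed so the run can be repeated

rai <- 10
akshita <- 50
sohini <- 1:100

# When the first argument is a vector, runif() uses its length
# as the number of values to generate (here 100).
sohini_unif <- runif(sohini, min = rai, max = akshita)

print(length(sohini_unif))
print(range(sohini_unif))  # both ends fall between 10 and 50

hist(sohini_unif)          # bars of roughly similar height
```

Writing runif(length(sohini), rai, akshita) gives the same number of values and makes the intent more explicit.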
    18.6.18
    Akshita Bindal wrote on R programming

Tuesday, June 5, 2018

My lecture on R-Programming in Whatsapp ...Box-whisker plot

Box-whisker plot


Today we started the discussion with how to *compare two similar datasets* with the help of a box-whisker plot.
This is useful when the data types are the same,
for example when all the data are continuous, and when the data size is very big.
Sometimes we then need to compare the training data or the test data with the final data.
Suppose we have collected 200 responses on a Likert scale measuring attitude towards job satisfaction,
and we want to compare two sets of data: one from 1 to 100 and another from 101 to 200.
We are interested *to prepare the box whisker plot of the two sets of data*.
We can understand by using iris data:
Step 1
*iris1=iris$Sepal.Length[1:75]*
*iris2=iris$Sepal.Length[76:150]*
Step 2
Check the data size of iris1 and iris2
*NROW(iris1)*
*NROW(iris2)*
Step 3
Make one data frame with iris1 and iris2, like
*iris1_2=data.frame(iris1,iris2)*
Step 4
Make box whisker plot for these two sets of data
*boxplot(iris1_2)*
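Steps 1 to 4 can be run together as in the sketch below, which uses the built-in iris data. The axis label is an assumption added for readability.

```r
# Step 1: split Sepal.Length into two sets of 75 values each
iris1 <- iris$Sepal.Length[1:75]
iris2 <- iris$Sepal.Length[76:150]

# Step 2: check that both sets have the same size
print(NROW(iris1))
print(NROW(iris2))

# Step 3: put the two sets side by side in one data frame
iris1_2 <- data.frame(iris1, iris2)

# Step 4: one box per column, drawn on a common scale
boxplot(iris1_2, ylab = "Sepal length")
```

Because both columns share one axis, the medians, spreads and outliers of the two sets can be compared directly.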
Step 5
Find out the value of the outlier
7 is the outlier. We can get it by using *boxplot.stats(iris1_2$iris1)$out*
Here iris1_2 is the data frame made in Step 3.
The same data frame can also be built with *cbind*:
*iris3=data.frame(cbind(iris1,iris2))*
Note that *c(iris1,iris2)* would join the two sets into one long vector of 150 values, so it cannot keep iris1 and iris2 apart.
To get the different parameters of iris1 and iris2 we can use
*sapply(iris3,mean)*
Likewise median/sd/range/summary
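A possible way to try Step 5 end to end is sketched below. It assumes iris3 is a data frame with one column per set, named iris1 and iris2; the names out1 and col_means are introduced only for the example.

```r
# Rebuild the two sets so the example runs on its own.
iris1 <- iris$Sepal.Length[1:75]
iris2 <- iris$Sepal.Length[76:150]

# Assumption: iris3 holds the two sets as separate columns.
iris3 <- data.frame(iris1, iris2)

# boxplot.stats() returns the points beyond the whiskers in $out
out1 <- boxplot.stats(iris3$iris1)$out
print(out1)

# sapply() applies a function to every column of the data frame
col_means <- sapply(iris3, mean)
print(col_means)
print(sapply(iris3, median))
print(sapply(iris3, sd))
print(sapply(iris3, range))
print(summary(iris3))
```

sapply works column by column only because iris3 is a data frame; on a single long vector it would apply the function to every value separately.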
========================================================================