Posts

Final Project: Serially looking at your Cereal Choices

  Hi everyone!  Almost everyone starts their mornings with a nice bowl of cereal. In the grocery aisle, you choose a box of Cheerios or chocolate puffs. However, your decision to buy that cereal may not have been completely your choice at all. You just picked from the options that were conveniently placed in front of you.  My project will analyze the relationship between the shelf a cereal is placed on and the sugar, fiber, and calories of that cereal. Do the ingredients of cereals, either healthy or sugary, play a role in what shelf cereals are placed on and their ratings?  Two Part Analysis  Do mean sugar, calorie, and fiber content differ between cereals placed on shelf 1, 2, or 3 (counting from the ground)? I will use boxplots to understand each ingredient's distribution across the shelves and perform ANOVA tests to determine if each ingredient is significant for the placement of the cereal. ANOVA i...

Module 12: Timely Time Series

  Hi everyone!  This week we are looking into time series, where we dive into the past to predict the future! Our goals in using time series analysis include identifying patterns in the periodic data, predicting short-term trends, and modeling these findings. Time series data has several components, the most common being the trend and seasonal components. If we see a lot of fluctuation around a common line or "zone" in our data, we can focus on methods to understand the trend component, like simple exponential smoothing. If the data instead fluctuates on a regular period, we use methods like decomposing and seasonally adjusting. We will be focusing on exponential smoothing in our data today! This is useful for short-term forecasting, with an alpha coefficient between 0 and 1 indicating the weight given to the most recent observation. A smaller alpha means the data is getting more smoothed while a higher alpha indicates less smoothing.  Let's get started! The table...

Module 11: Logical Logistic Regression

  Hi everyone!  This week we looked into logistic regression where, like the other forms of regression analysis we have looked at, we estimate the dependent variable based on the effect of the independent variable. The unique aspect of logistic regression is that the dependent variable is binary, so the outcome has two options like pass or fail, yes or no, etc.  The formula is: y = (e^(b0 + b1x)) / (1 + e^(b0 + b1x)) where y equals the output proportion between 0 and 1. The best b values will result in y being close to 1 for one outcome and close to 0 for the other. In R, we calculate logistic regression using the function glm(). 10.1 1. Set up an additive model for the  ashina  data, as part of ISwR package. 2. This data contains additive effects on subjects, period and treatment.  Compare  the results with those obtained from t tests.  The treatment is significant in the ANOVA results at a p value of 0.01228. The t test of getting the treatment vs not shows a significant p ...

Module 10: Varying Multivariate Regression

  Hi everyone!  This week we dive into a more complex regression analysis than the linear regression we have done. Before, we had one predictor variable and one outcome variable. In multiple regression, we have multiple predictor (or independent) variables but one outcome variable we are analyzing. The formula looks like  y = b1x1 + b2x2 + … + bnxn + error.  We will look into the relationships between the variables collectively using ANOVA analysis and individually through linear regression (lm) to understand which variables are contributing to the significance of the relationship. R squared will be an important factor to look into because it shows the amount of variation in the dependent variable explained by the regression; thus a larger R squared indicates the predictors account for more of the variation in the outcome.  9.1.  Conduct ANOVA (analysis of variance) and Regression coefficients to the data from cyst...

Module 9: Tabulating Tabular Data in R

  Hi everyone!  This week's assignment focuses on using R to make tables from data frames. We also explore adding sum totals for both rows and columns in our contingency tables and creating proportion tables. 1. Generate a simple table in R that consists of four rows: Country, age, salary and purchased. 2. Generate a contingency table, also known as an r x c table (Chapter 7, p.135), using the  mtcars  dataset. > assignment9 <- table(mtcars$gear, mtcars$cyl, dnn = c("fill out here")) 2.1 Add the  addmargins()  function to report on the sum totals of the rows and columns of the assignment9 table > addmargins(assignment9) 2.2 Add the  prop.table()  function, and report on the proportional weight of each value in the assignment9 table 2.3 Add  margin  = 1 to the argument under the  prop.table()  function, and report on the row proportions found in the assignment9 table. -Ramya's POV
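Steps 2 through 2.3 above can be sketched in a few lines of R. This is a minimal illustration rather than the assignment's official solution: the dimension labels "Gears" and "Cylinders" and the variable names with_totals, overall_prop, and row_prop are assumptions, since the assignment leaves dnn to be filled in.

```r
# Contingency table of gears (rows) by cylinders (columns) from the built-in mtcars data.
# "Gears" and "Cylinders" are assumed labels; the assignment leaves dnn to be filled in.
assignment9 <- table(mtcars$gear, mtcars$cyl, dnn = c("Gears", "Cylinders"))

# 2.1: append row and column sum totals
with_totals <- addmargins(assignment9)

# 2.2: each cell as a proportion of the grand total
overall_prop <- prop.table(assignment9)

# 2.3: each cell as a proportion of its row total (margin = 1 means rows)
row_prop <- prop.table(assignment9, margin = 1)

print(with_totals)
print(round(overall_prop, 3))
print(round(row_prop, 3))
```

With margin = 1 each row of row_prop sums to 1, so the cells read as "of the cars with this many gears, what share has each cylinder count".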

Module 8: Persona of ANOVA

  Hi everyone!  This week we learned another form of testing, specifically for analyzing whether the groups of a categorical variable, like being male or female, have a true mean difference in a quantitative variable, like hours spent studying for a test. This form of hypothesis testing is called ANOVA. The characteristic or persona that makes ANOVA different from a t-test for difference in means is that we can compare more than 2 groups at once. Let's get started! A researcher is interested in the effects of a drug against stress reaction. She gives a reaction time test to three different groups of subjects: one group that is under a great deal of stress, one group under a moderate amount of stress, and a third group that is under almost no stress. The subjects of the study were instructed to take the drug test during their next stress episode and to report their stress on a scale of 1 to 10 (10 being most pain). High Stress Moderate Stres...

Module 7: Progression in Regression Analysis

Hi everyone!  This week we are focusing on regression analysis, from linear regression with one predictor variable to multivariable regression! Regression can be simply put as an estimation or best fit of the relationship between two variables, giving us a tool to predict the outcome of one variable based on its linear relationship to another variable. 1. In this assignment's segment, we will use the following regression equation    Y = a + bX + e Where: Y  is the value of the  Dependent variable (Y) , what is being predicted or explained a  or Alpha, a constant; equals the value of Y when the value of X=0 b  or Beta, the coefficient of X; the slope of the regression line; how much Y changes for each one-unit change in X. X  is the value of the Independent variable (X), what is predicting or explaining the value of Y e  is the error term; the error in predicting the value of Y, given the value of X  (it is not displayed in most regression eq...