Posts

Showing posts from January, 2020

Zillow Data Time Series Analysis

In this blog post we carry out a time series analysis of the Zillow data set, which details the average house price in every zipcode between April 1996 and April 2018. Our analysis can be found in this github repo. Not all zipcodes have data starting from April 1996. There are 14723 zipcodes overall, which is a lot of data. The question we address in this project is the following. Imagine a real estate company has contracted us to analyze the data. The question they want answered is: what are the 5 best zipcodes to invest in? The important question, of course, is what one means by best. That is why we are being paid the big bucks. We will use several definitions of best and analyze the data accordingly. Let us begin by describing the data set and gaining some intuition from it. The dataset looks like the following, when stored in the 'wide' format with one column for the average price for ea...
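For readers unfamiliar with the 'wide' layout mentioned in the excerpt, here is a minimal sketch of reshaping it into a 'long' layout with one record per zipcode and month, which is usually easier to feed into time series tools. The column name `RegionName`, the month labels, and the prices are assumptions made up for illustration; they are not taken from the post. In pandas the same reshaping is what `DataFrame.melt` does.

```python
# Hypothetical wide-format rows: one record per zipcode, one key per month.
# Column names and prices are invented for illustration only.
wide_rows = [
    {"RegionName": "60657", "1996-04": 334200.0, "1996-05": 335400.0, "1996-06": None},
    {"RegionName": "75070", "1996-04": None, "1996-05": 235700.0, "1996-06": 236900.0},
]

ID_COLUMNS = {"RegionName"}  # columns that identify the zipcode rather than a month


def wide_to_long(rows, id_columns=ID_COLUMNS):
    """Turn one-column-per-month rows into (zipcode, month, price) records."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            if value is None:
                # Not every zipcode has data for every month, so skip the gaps.
                continue
            long_rows.append(
                {"zipcode": row["RegionName"], "month": column, "price": value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows)
for record in long_rows:
    print(record)
```

Skipping missing months, rather than filling them in, keeps the sketch neutral about how gaps in the early years should be treated.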

Northwind Dataset - Statistical Analysis

Data description: The data comes from a fictional "Northwind trading" company that seems to sell food products and beverages. The data is stored as a SQL database in multiple tables. The schema of the SQL database is shown below. We will perform statistical tests to investigate various questions of business interest. The code for the analysis can be found here.
Questions investigated:
Does discount amount have a statistically significant effect on the quantity of a product in an order? If so, at what level(s) of discount?
Do sales get better year on year?
Does the month when the products were ordered matter?
Is there a standout employee in terms of the distribution from which quantities sold are drawn?
Are there employees who are shipping their orders significantly late?
Are there regions that are growing or decreasing in overall sales?
Are there product categories that are growing or decreasing in ove...
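The excerpt does not say which statistical tests the post uses. As one possible way to approach the first question (does a discount affect the quantity ordered?), here is a minimal sketch of a two-sided permutation test for a difference in means. It uses only the Python standard library. The function name `permutation_test` and the quantities are hypothetical and invented for illustration; they are not Northwind data.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up quantities per order line, for illustration only.
quantity_with_discount = [30, 25, 40, 35, 28, 32, 45, 38]
quantity_without_discount = [20, 22, 18, 25, 21, 19, 24, 23]

observed, p_value = permutation_test(quantity_with_discount, quantity_without_discount)
print(f"difference in mean quantity: {observed:.3f}, p-value: {p_value:.4f}")
```

A permutation test makes no normality assumption about the quantities, which is why it is used in this sketch; a t-test or ANOVA would be a common alternative when comparing several discount levels.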

King County Housing - Beginner Analysis

This blog post analyzes the King County housing data set, which can be found here and which describes housing prices in King County between May 2014 and May 2015. The data set I use is slightly modified and needs a little more preprocessing and cleaning. This analysis can be found in this github repository. Let's begin by taking a look at the contents of the data set.
id - unique identifier for a house
date - Date house was sold
price - Price (the prediction target)
bedrooms - Number of Bedrooms/House
bathrooms - Number of bathrooms/bedrooms
sqft_living - square footage of the home
sqft_lot - square footage of the lot
floors - Total floors (levels) in house
waterfront - House which has a view to a waterfront
view - Has been viewed
condition - How good the condition is (Overall)
grade - overall grade given to the housing unit, based on King County grading system
sqft_above - square footage of...