Predictive Model for eBay Auctions

Preparing Data Sets

Predictive models are important tools for evaluating patient outcomes. A predictive model can be built with regression analysis on a data set formed from a number of representative patients. The apparent performance of the model on this training set will be better than its performance on another data set, even if that test set consists of patients from the same population (Petruseva, Sherrod, Pancovska, & Petrovski, 2016). This optimism is a well-known statistical phenomenon, and several approaches have been proposed to assess the performance of the model in independent subjects more accurately than a naive evaluation on the training sample (Caldeira, Brandao, & Pereira, 2014).

Generally, logistic regression is well suited to describing and testing hypotheses about relationships between a categorical outcome variable and one or more categorical or continuous predictor variables. Because the outcome is dichotomous, its values plotted against a predictor fall on two parallel lines, which are difficult to describe with an ordinary least squares regression equation. Instead, one can create categories for the predictor and compute the mean of the outcome variable for each category (Andorfer & Liebe, 2015; Chan & Liu, 2017).

A simple and relatively popular approach is to divide the training data into two parts at random: one for developing the model and another for measuring its performance. This split-sample approach assesses the performance of the model on comparable but independent data. A more complex approach is cross-validation, which can be considered an extension of the split-sample method. In split-half cross-validation, the model is developed on one random half and tested on the other, and vice versa (Zhang et al., 2018). The average of the two results is taken as the estimate of performance.
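The split-sample approach and its cross-validation extension can be sketched as follows. This is a minimal illustration in Python using only the standard library. The function names, the toy model (which simply predicts the training mean), and the squared-error measure are assumptions made for the sketch, not the procedure used in the cited studies.

```python
import random

def split_half(records, seed=0):
    """Randomly split records into two halves (sizes differ by at most one)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def fit_mean(train):
    """Toy model (assumption): predict the training mean for every case."""
    return sum(train) / len(train)

def mean_squared_error(prediction, test):
    """Average squared difference between one constant prediction and the test outcomes."""
    return sum((y - prediction) ** 2 for y in test) / len(test)

def split_half_cross_validation(records, seed=0):
    """Develop on one half, test on the other, then swap roles and average."""
    half_a, half_b = split_half(records, seed)
    err_ab = mean_squared_error(fit_mean(half_a), half_b)
    err_ba = mean_squared_error(fit_mean(half_b), half_a)
    return (err_ab + err_ba) / 2

outcomes = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]  # hypothetical binary outcomes
cv_error = split_half_cross_validation(outcomes)
```

A plain split-sample evaluation would stop after the first of the two error computations; the cross-validated estimate uses every observation once for testing.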

Dummy variables were created for the categorical predictors. These covered 18 types of “categories”, three types of “currencies” (USD, GBP, EURO), the seven days of the week in “EndDay” (Monday to Sunday), and five levels of “Duration” (1, 3, 5, 7, and 10 days). The data set was partitioned after outliers, or extreme values, had been removed from the continuous predictors. The training data set consisted of a randomly selected 70% of the observations (N = 1268), and the validation data set consisted of the remaining 30% (N = 544), drawn from the 1810 observations of the outlier-removed data set.
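The indicator coding of the categorical predictors and the random 70/30 partition can be sketched as below. This is an illustration only: the helper names are hypothetical, treating the first level as the reference category is an assumption (the text does not say which level was used as reference), and the row identifiers are made up.

```python
import random

def dummy_code(value, levels):
    """Return 0/1 indicators for every level except the first (the reference)."""
    return [1 if value == level else 0 for level in levels[1:]]

def partition(rows, train_fraction=0.7, seed=1):
    """Shuffle the rows and split them into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

currencies = ["USD", "GBP", "EURO"]
durations = [1, 3, 5, 7, 10]

gbp_dummies = dummy_code("GBP", currencies)  # indicators for GBP, EURO
seven_day = dummy_code(7, durations)         # indicators for 3, 5, 7, 10 days

rows = list(range(20))                       # hypothetical row identifiers
train, valid = partition(rows)
```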

Initially, the model was fitted to a training dataset, which is a set of observations used to fit the model’s parameters. The model was trained on the training data using a supervised learning method. The current model was run on the training set and produced a result that was then compared with the target for each input vector in the training dataset. Model fitting can include both variable selection and parameter estimation.

The fitted model was then used to predict responses for the observations in a second dataset, called the validation dataset. A validation dataset provides an unbiased evaluation of the model fitted on the training dataset while the model’s hyperparameters are tuned. The validation dataset was also used to adjust the previously fitted model. Finally, a test dataset is a set of data used to provide an unbiased evaluation of the final model fitted on the training dataset.

Training Vs Validation

The three scale, or continuous, variables were seller rating, close price, and open price. The average seller rating was 3560.24 (SD = 5973.01), the average close price was 36.49 (SD = 89.49), and the average open price was 12.93 (SD = 38.85). All three continuous variables showed high variability, and the box plots of the variables revealed a large number of outliers. The median seller rating was 1853, well below the average. Seller rating was strongly positively skewed and required outlier elimination following the 6 sigma spread rule. Half of the close prices were below the median value of 9.99, and a high positive skewness (S = 6.06) was noted because of high positive outlying close values. A similar trend was observed for open prices, with a median value of 4.5, whereas the average open price was higher (12.93) because of high positive extreme values.

The 6 sigma spread was evaluated for the above three continuous predictors of auction competitiveness. The spread was calculated as [-14358.8, 21479.3] for seller rating, [-232.0, 304.9] for close price, and [-103.6, 129.5] for open price. The removal of 162 outlying observations reduced the data set to 1810 observations (van Smeden et al., 2018). The data set was then partitioned for training and validation purposes. Note that the new variables still contained extreme values, or outliers, but these were extreme relative to the new averages and standard deviations. Figure 3 and Figure 4 show the variables after the outliers were removed from the previous data set.
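The spread calculation can be illustrated with a short sketch. It assumes that the 6 sigma spread means the interval three standard deviations on either side of the mean, which is consistent with the reported lower bounds. The means and standard deviations are those reported in the text; the function names and the sample close prices are hypothetical.

```python
def six_sigma_bounds(mean, sd):
    """Return (lower, upper) as the mean minus and plus 3 standard deviations."""
    return mean - 3 * sd, mean + 3 * sd

def remove_outliers(values, mean, sd):
    """Keep only the values that fall inside the 6 sigma spread."""
    lower, upper = six_sigma_bounds(mean, sd)
    return [v for v in values if lower <= v <= upper]

# Means and standard deviations as reported in the text.
reported = {
    "seller_rating": (3560.24, 5973.01),
    "close_price": (36.49, 89.49),
    "open_price": (12.93, 38.85),
}
bounds = {name: six_sigma_bounds(m, s) for name, (m, s) in reported.items()}

sample_close_prices = [0.99, 9.99, 36.49, 250.0, 999.0]  # hypothetical values
kept = remove_outliers(sample_close_prices, *reported["close_price"])
```

Because prices and ratings cannot be negative, only the upper bound of each interval removes observations in practice.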

A descriptive summary of the categorical predictors is given in Table 1. Figure 5 shows the dominance of the US dollar in the auctions. A duration of 7 days was the most common, accounting for about half (50.44%) of the auctions in the data set. According to Table 1, the largest numbers of transactions were observed on Monday, Saturday, and Sunday. Wednesday was the least likely day for auctions.

A correlation analysis of the continuous independent variables was performed to check whether any of them might cause multicollinearity. The analysis was required for the logit regression model of the article. Table 2 shows a significant positive correlation between open prices and close prices and a significant negative correlation between close price and seller rating. No correlation was found between seller rating and open price. Hence, no multicollinearity issue was identified for the regression model (Chatterjee & Hadi, 2015).
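A pairwise correlation check of this kind can be sketched as follows. The Pearson coefficient is assumed here because the text does not name the coefficient behind Table 2, and the price values are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

open_price = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical values
close_price = [2.0, 4.1, 5.9, 8.2, 9.8]   # hypothetical values
r = pearson_r(open_price, close_price)
```

A coefficient close to 1 or -1 between two predictors would be the warning sign for multicollinearity; values near 0 indicate no linear association.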

As an illustration, we used data on auctions held on eBay.com in May-June 2004, applied logistic regression to them, and interpreted the results. The data were organized by auction, and the goal was to create a model that distinguishes between competitive and non-competitive auctions. A competitive auction was defined as an auction in which at least two bids were placed on the item being auctioned. The data contain variables for the auction category, seller rating, price, currency, and the day of the week on which the auction ended. In addition, we have the price at which the auction closed. The goal was to predict whether or not an auction would be competitive. We implemented logistic regression on the training set using the stepwise attribute selection method (Harrell, 2015; McBride, Sullivan, Swinson, & Wang, 2014).

Pre-Processing

In the overall evaluation of the model, a logistic model is said to provide a better fit to the data if it demonstrates an improvement over the intercept-only model. The classification table of the stepwise regression model showed that the optimal model for estimating the dichotomous competitiveness outcome was reached at step 6. Correct estimation was possible in 85.7% of cases for the “0” level, and competitiveness was accurately estimated in 71.7% of cases. The improvement over this baseline was investigated with three statistical tests: the likelihood ratio, score, and Wald tests (Lombardo, Cama, Conoscenti, Märker, & Rotigliano, 2015; Shivaswamy, Ge, & Yuan, 2016).
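Percentages of this kind come from a classification table that compares predicted and observed classes. The sketch below shows one way to compute them. The 0.5 cutoff is an assumption (the text does not state the cutoff used), and the outcomes and probabilities are invented.

```python
def classify(probabilities, cutoff=0.5):
    """Assign class 1 when the predicted probability reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]

def classification_rates(actual, predicted):
    """Return percent correct for class 0, for class 1, and overall."""
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for a, p in zip(actual, predicted):
        total[a] += 1
        if a == p:
            correct[a] += 1
    rate0 = 100 * correct[0] / total[0]
    rate1 = 100 * correct[1] / total[1]
    overall = 100 * (correct[0] + correct[1]) / len(actual)
    return rate0, rate1, overall

actual = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # hypothetical outcomes
probs = [0.1, 0.4, 0.3, 0.7, 0.8, 0.6, 0.2, 0.9, 0.55, 0.45]  # hypothetical
rates = classification_rates(actual, classify(probs))
```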

The statistical significance of the individual predictors, that is, of the individual regression coefficients, was then checked. According to the model, close price was significant at every step (p < 0.05). It was the first variable to enter the model. In the current dataset, the test result (p < 0.05) suggested that an alternative model without an intercept could be applied to the data.

Stepwise selection led to an average performance that was slightly worse than that of the full predictor model. When the stepwise model was evaluated with the test data, the slope of seller rating was found to have no impact on the competitiveness of the auctions. The slopes of open price and end day were significantly negative. Close price, currency, and auction category were found to affect the competitive nature of auctions positively. The change in R-square across the models was assessed with the Cox & Snell R-square. In linear regression, R-square is the proportion of variation in the dependent variable that can be explained by the model predictors; the Cox & Snell statistic is an attempt to create a similar concept for a logistic model. The change in the R-square value from the initial to the final step indicated an improvement in explaining the variability of the binary outcome variable. The predictors in the final model explained 28.2% of the variability in the competitiveness of the auctions. The bias in the apparent performance was slightly larger than that observed for prespecified models (Zhang et al., 2018).
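The Cox & Snell R-square compares the likelihood of the fitted model with that of the intercept-only model. A minimal sketch is given below, using the standard definition 1 - (L0 / LM)^(2/n) written in terms of log-likelihoods. The outcomes and predicted probabilities are invented and do not reproduce the 28.2% reported above.

```python
import math

def log_likelihood(actual, probabilities):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for y, p in zip(actual, probabilities)
    )

def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell R-square: 1 - (L_null / L_model) ** (2 / n)."""
    return 1 - math.exp(2 * (ll_null - ll_model) / n)

actual = [0, 0, 1, 1]                  # hypothetical outcomes
null_probs = [0.5, 0.5, 0.5, 0.5]      # intercept-only model: base rate of 0.5
model_probs = [0.2, 0.3, 0.7, 0.8]     # hypothetical fitted probabilities

r2 = cox_snell_r2(
    log_likelihood(actual, null_probs),
    log_likelihood(actual, model_probs),
    len(actual),
)
```

Unlike the linear-regression R-square, this statistic cannot reach 1 even for a perfect model, which is one reason it is read as a relative rather than an absolute measure.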

Goodness-of-fit statistics assess how well the logistic model fits the actual outcomes. The inferential test used was the Hosmer and Lemeshow (HL) test, which gave a significant result. The test therefore revealed that the assumption that the logit regression model fitted the data adequately did not hold. The contingency table for the stepwise model showed the differences between the expected and observed values that enter the chi-square test statistic.
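The Hosmer and Lemeshow statistic can be sketched as below. Cases are sorted by predicted probability and cut into groups, and observed and expected counts are compared in each group. This is a simplified illustration under stated assumptions: equal-sized groups formed by position, no special handling of ties across group boundaries, and predicted probabilities strictly between 0 and 1. The data are invented.

```python
def hosmer_lemeshow(actual, probabilities, groups=10):
    """Return the HL chi-square statistic and its degrees of freedom."""
    pairs = sorted(zip(probabilities, actual))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected_1 = sum(p for p, _ in chunk)   # expected number of 1s
        expected_0 = len(chunk) - expected_1    # expected number of 0s
        observed_1 = sum(y for _, y in chunk)
        observed_0 = len(chunk) - observed_1
        statistic += (observed_1 - expected_1) ** 2 / expected_1
        statistic += (observed_0 - expected_0) ** 2 / expected_0
    return statistic, groups - 2

probs = [0.1, 0.1, 0.3, 0.3, 0.6, 0.6, 0.9, 0.9]  # hypothetical probabilities
actual = [0, 0, 0, 1, 1, 1, 1, 1]                 # hypothetical outcomes
hl_stat, hl_df = hosmer_lemeshow(actual, probs, groups=4)
```

The statistic is compared with a chi-square distribution on the returned degrees of freedom; a significant result, as reported in the text, indicates that observed and expected counts differ more than chance would allow.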

Here, the predicted logit of competitiveness represents the natural logarithm of the odds in favor of competitiveness in the auctions. The exact value of the odds in favor of competitiveness can be obtained by taking the antilog of both sides of the estimated equation.
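The conversion from a predicted logit to odds and to a probability can be written out in a few lines. The function names and the logit value are hypothetical; the estimated coefficients of the article's model are not used here.

```python
import math

def odds_from_logit(logit):
    """Antilog of the logit: odds in favor of the outcome."""
    return math.exp(logit)

def probability_from_odds(odds):
    """Convert odds in favor into a probability."""
    return odds / (1 + odds)

logit = math.log(3)                  # hypothetical predicted logit
odds = odds_from_logit(logit)        # odds of about 3 to 1 in favor
probability = probability_from_odds(odds)
```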

Following the logistic model fitted on the training dataset, we ran the regression on the validation data using the “Enter” method. The one-step logit regression model provided the estimated equation.

Outlier Detection

Four of the categorical variables in the model were found to be insignificant (p > 0.05): currency, auction category, end day of the auction, and duration of the auction had no statistically significant impact on the competitiveness of the auctions. Only the open and close prices were significant predictors in the one-step logistic model. The Wald test statistic implied that the constant term in the equation was statistically significant. The Omnibus test rejected, at the 5% level of significance, the null hypothesis that the independent predictors have no impact on the dichotomous outcome variable, indicating that the model as a whole was statistically significant. The predictors explained 25.2% of the variation in the outcome. However, the Hosmer and Lemeshow test statistic was statistically significant (p < 0.05), which revealed that, like the stepwise model, the one-step model failed to fit the validation data set.

The close price was statistically significant at the 5% level (p < 0.05). Its odds ratio, or exp(beta), was greater than one, meaning that a higher close price increased the odds in favor of competitiveness. Hence, the close price had practical significance in estimating auction competitiveness.
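The interpretation of exp(beta) as an odds ratio can be checked numerically: raising the predictor by one unit multiplies the odds by exp(beta). The intercept and coefficient below are hypothetical and are not the estimates from the article.

```python
import math

def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit rise in the predictor."""
    return math.exp(beta)

def odds(intercept, beta, x):
    """Odds in favor of the outcome for predictor value x."""
    return math.exp(intercept + beta * x)

intercept = -0.5    # hypothetical constant term
beta_close = 0.05   # hypothetical coefficient for close price

# Ratio of the odds at a close price of 11 to the odds at a close price of 10.
ratio = odds(intercept, beta_close, 11.0) / odds(intercept, beta_close, 10.0)
```

A positive coefficient gives an odds ratio above one, a negative coefficient gives one below one, and a coefficient of zero leaves the odds unchanged.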

In this article we showed that logistic regression can be a powerful analytical method when the outcome variable is dichotomous. The logistic model was supported by significance tests of the model against the baseline model, the significance test of each predictor, and descriptive and inferential goodness-of-fit indices. The article started with a dataset of auction details from eBay in 2004, from which the competitiveness of the auctions was estimated. Binary logistic models were chosen for the estimation because the outcome was dichotomous in nature, a choice based on the previous literature (Wojciechowska, 2018).

The analysis found major outliers in the continuous predictors, and these were removed in accordance with the 6 sigma rule. The dataset was partitioned into training and validation sets containing 70% and 30% of the observations, respectively. Two logit models were fitted, one to each data set, using the stepwise and enter regression methods, respectively. The stepwise model predicted competitiveness with greater accuracy than the one-step model. The risk of bias and the shortcomings associated with stepwise models are well known in the literature. Biased regression coefficients and ambiguous p-values were among the practical problems with the model.

References

Andorfer, V. A., & Liebe, U. (2015). Do information, price, or morals influence ethical consumption? A natural field experiment and customer survey on the purchase of Fair Trade coffee. Social science research, 52, 330-350.

Caldeira, E., Brandao, G., & Pereira, A. C. (2014, October). Fraud Analysis and Prevention in e-Commerce Transactions. In 2014 9th Latin American Web Congress (LA-WEB) (pp. 42-49). IEEE.

Chan, N. H., & Liu, W. W. (2017). Modeling and forecasting online auction prices: a semiparametric regression analysis. Journal of Forecasting, 36(2), 156-164.

Chatterjee, S., & Hadi, A. S. (2015). Regression analysis by example. John Wiley & Sons.

Harrell, F. E. (2015). Introduction. In Regression Modeling Strategies (pp. 1-11). Springer, Cham.

Kimeldorf, H., Meyer, R., Prasad, M., & Robinson, I. (2006). Consumers with a conscience: Will they pay more? Contexts, 5(1), 24-29.

Lombardo, L., Cama, M., Conoscenti, C., Märker, M., & Rotigliano, E. (2015). Binary logistic regression versus stochastic gradient boosted decision trees in assessing landslide susceptibility for multiple-occurring landslide events: application to the 2009 storm event in Messina (Sicily, southern Italy). Natural Hazards, 79(3), 1621-1648.

Malik, S., & Bajwa, I. (2012). A rule based approach for business rule generation from business process models. doi:10.1007/978-3-642-32689-9_8.

McBride, J., Sullivan, T. J., Swinson, M., & Wang, Z. (2014). U.S. Patent No. 8,868,480. Washington, DC: U.S. Patent and Trademark Office.

Petruseva, S., Sherrod, P., Pancovska, V. Z., & Petrovski, A. (2016). Predicting Bidding Price in Construction using Support Vector Machine. TEM Journal, 5(2), 143.

Shivaswamy, G. H., Ge, H., & Yuan, L. (2016). U.S. Patent Application No. 14/517,505.

van Smeden, M., Moons, K. G., de Groot, J. A., Collins, G. S., Altman, D. G., Eijkemans, M. J., & Reitsma, J. B. (2018). Sample size for binary logistic prediction models: Beyond events per variable criteria. Statistical methods in medical research, 0962280218784726.

Wojciechowska, O. (2018). Online Auctions-Examination Of Bidders’ Strategies: Theory And Data Analysis (Doctoral dissertation, University of Warwick).

Zhang, Z., Cortese, G., Combescure, C., Marshall, R., Lee, M., Lim, H. J., & Haller, B. (2018). Overview of model validation for survival regression model with competing risks using melanoma study data. Annals of translational medicine, 6(16).
