Talk:Linear regression
From Wikipedia, the free encyclopedia
[edit] Mergers?
The front page of this article tells me that someone has suggested it be merged with Multiple regression. I agree that it should be. Also, there are articles on
could also be merged in.
Please add to the above list if there are others.
Personally, I'd prefer "Linear model" as the title.
Since this is a subject on which a great many books have been written, an article on it is not going to be anything like comprehensive. It might therefore be sensible to lay down some rules on the content, such as the level of mathematical and theoretical rigour.
Perhaps someone should start a Wikibook to cover the gaps...
—The preceding unsigned comment was added by Tolstoy the Cat (talk • contribs) .
- Regression analysis is about a far broader topic than linear regression. Obviously a merger, if any, should go in the other direction. But I would oppose any such merger. Obviously, many sorts of regression are neither linear models nor generalized linear models, so "linear model" would not be an appropriate title. Also, an analysis of variance can accompany any of various different sorts of regression models, but that doesn't mean analysis of variance should not be a separate article. Michael Hardy 00:23, 5 June 2006 (UTC)
- PS: Could you sign your comment so that we know who wrote it without doing tedious detective work with the edit history? Michael Hardy 00:24, 5 June 2006 (UTC)
-
- I would also prefer not to merge. There is a lot of good content here which would only be overwhelming detail for most readers of the more general Regression analysis article. -- Avenue 03:18, 6 June 2006 (UTC)
- Shouldn't merge those. If anything, the entry on linear models should be expanded. That entry, by the way, has the confusing line 'Ordinary linear regression is a very closely related topic,' which is technically correct but implies that linear regression is somehow more related to linear models than ANOVA or ANCOVA. It's important to keep in mind the difference between linear modeling and linear regression. Linear models follow Y = X*B+E, where Y and E and B are vectors and X is a design matrix. Linear regression models follow Y = X*m+b, where Y and X are data vectors and m and b are scalars. I think the superficial similarity between those two formulas creates the impression that linear regression and linear modeling somehow share a special connection, beyond linear regression just being a special case, as is AN(C)OVA. So I'm going to change that line :-) -- —The preceding unsigned comment was added by 134.58.253.130 (talk • contribs).
[edit] Edits in 2004
In reference to recent edits which change stuff like <math>x_i</math> to ''x''<sub>''i''</sub> -- the math-tag processor is smart enough to use html markup in simple cases (instead of generating an image via latex). It seems that some of those changes weren't all that helpful, as the displayed text is unchanged and the html markup is harder to edit. I agree the in-line <math>x_1,\ldots,x_n</math> wasn't pretty; however, it does seem necessary to clarify "variables" for the benefit of readers who won't immediately see x as a vector. Wile E. Heresiarch 16:41, 2 Feb 2004 (UTC)
In reference to my recent edit on the paragraph containing the eqn y = a + b x + c^2 + e, I moved the discussion of that eqn up into the section titled "Statement of the linear regression model" since it has to do with characterizing the class of models which are called "linear regression" models. I don't think it could be readily found in the middle of the discussion about parameter estimation. Wile E. Heresiarch 00:43, 10 Feb 2004 (UTC)
I have a question about the stronger set of assumptions (independent, normally distributed, equal variance, mean zero). What can be proven from these that can't be proven from assuming uncorrelated, equal variance, mean zero? Presumably there is some result stronger than the Gauss-Markov theorem. Wile E. Heresiarch 02:42, 10 Feb 2004 (UTC)
- At least a partial answer is that the validity of such things as the confidence interval for the slope of the regression line, using a t-distribution, relies on the normality assumptions. More later, maybe ... Michael Hardy 19:51, 10 Feb 2004 (UTC)
-
- It occurs to me that if independent, Gaussian, equal variance errors are assumed, a stronger result is that the least-squares estimates are the maximum likelihood estimates -- right? Happy editing, Wile E. Heresiarch 14:57, 29 Mar 2004 (UTC)
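That's right: with the error variance held fixed, the Gaussian log-likelihood is a decreasing function of the residual sum of squares, so the least-squares estimates maximize the likelihood. A numerical sketch (the data and variable names here are my own illustration):

```python
import numpy as np

def neg_log_lik(m, b, x, y, sigma=1.0):
    """Gaussian negative log-likelihood for the model y = m*x + b + N(0, sigma^2)."""
    r = y - (m * x + b)
    n = len(y)
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + 0.5 * np.sum(r**2) / sigma**2

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, 50)

# Closed-form least-squares estimates of slope and intercept
m_hat = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean()**2)
b_hat = y.mean() - m_hat * x.mean()

# For fixed sigma, minimizing the NLL is the same as minimizing the sum of
# squared residuals, so the least-squares fit is also the likelihood maximum:
# perturbing either parameter can only raise the NLL.
nll_at_ls = neg_log_lik(m_hat, b_hat, x, y)
```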
-
-
- I think this is better illustrated by an example. Imagine that x are the numbers from 1 to 100, our model is y = 2 x - 1 + error (or something equally simple), but our error, instead of being a nice normal distribution, is the variable that has a 90% chance of being N(0,1) and a 10% chance of being 1000 x N(0,1). It's intuitively obvious that doing the least-squares fit over this will usually give a wrong analysis, while a method that seeks and destroys the outliers before the least-squares fit would be better. Albmont 14:09, 23 November 2006 (UTC)
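To make the example concrete, here is a small simulation in the spirit of the comment above; I have simplified it so the outcome is deterministic (the contaminated errors are a fixed +1000 rather than 1000 x N(0,1)), and a crude residual-trimming pass stands in for a real robust method:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.arange(1.0, 101.0)
noise = rng.normal(0.0, 1.0, x.size)
noise[::10] = 1000.0            # every 10th error is a gross outlier

y = 2.0 * x - 1.0 + noise       # true model: y = 2x - 1

def fit_line(x, y):
    """Closed-form least-squares slope and intercept."""
    m = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean()**2)
    return m, y.mean() - m * x.mean()

m_naive, b_naive = fit_line(x, y)

# "Seek and destroy" pass: drop points whose residual from the first fit
# exceeds two residual standard deviations, then refit on what remains.
resid = y - (m_naive * x + b_naive)
keep = np.abs(resid) < 2.0 * resid.std()
m_robust, b_robust = fit_line(x[keep], y[keep])
```

With this contamination the naive intercept is off by more than 100, while the trimmed refit lands close to the true line.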
-
[edit] Galton
Hello. In taking a closer look at Galton's 1885 paper, I see that he used a variety of terms -- "mean filial regression towards mediocrity", "regression", "regression towards mediocrity" (p 1207), "law of regression", "filial regression" (p 1209), "average regression of the offspring", "filial regression" (p 1210), "ratio of regression", "tend to regress" and "tendency to regress", "mean regression", "regression" (p 1212) -- although not exactly "regression to the mean". So it seems that the claim that Galton specifically used the term "regression to the mean" should be substantiated. -- Also this same paper shows that Galton was aware that regression works the other way too (parents are less exceptional than their children). I'll probably tinker with the history section in a day or two. Happy editing, Wile E. Heresiarch 06:00, 27 Mar 2004 (UTC)
[edit] Beta
I am confused -- I don't like the notation of d being the solution vector -- how about using Beta1 and Beta0?
[edit] Scientific applications of regression
The treatment is excellent but largely theoretical. It would be helpful to include additional material describing how regression is actually used by scientists. The following paragraph is a draft or outline of an introduction to this aspect of linear regression. (Needs work.)
Linear regression is widely used in biological and behavioral sciences to describe relationships between variables. It ranks as one of the most important tools used in these disciplines. For example, early evidence relating cigarette smoking to mortality came from studies employing regression. Researchers usually include several variables in their regression analysis in an effort to remove factors that might produce spurious correlations. For the cigarette smoking example, researchers might include socio-economic status in addition to smoking to ensure that any observed effect of smoking on mortality is not due to some effect of education or income. However, it is never possible to include all possible confounding variables in a study employing regression. For the smoking example, a hypothetical gene might increase mortality and also cause people to smoke more. For this reason, randomized experiments are considered to be more trustworthy than a regression analysis.
I also don't see any treatment of statistical probability testing with regression. Scientific researchers commonly test the statistical significance of the observed regression and place considerable emphasis on p values associated with r-squared and the coefficients of the equation. It would be nice to see a practical discussion of how one does this and how to interpret the results of such an analysis. --anon
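As an entirely hypothetical illustration of the confounding point above (none of the numbers come from real smoking data), a simulation in which an unobserved trait drives both "smoking" and "mortality":

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
# Hypothetical confounder that raises both "smoking" and "mortality risk";
# the true causal effect of smoking itself is set to 0.5.
confounder = rng.normal(size=n)
smoking = confounder + rng.normal(size=n)
mortality = 0.5 * smoking + 1.0 * confounder + rng.normal(size=n)

# Simple regression of mortality on smoking alone: the smoking
# coefficient absorbs the confounder and is biased upward (toward 1.0).
X1 = np.column_stack([np.ones(n), smoking])
b1 = np.linalg.lstsq(X1, mortality, rcond=None)[0]

# Multiple regression including the confounder: recovers roughly 0.5.
X2 = np.column_stack([np.ones(n), smoking, confounder])
b2 = np.linalg.lstsq(X2, mortality, rcond=None)[0]
```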
- I will add it in. Please feel free to improve on it. Oleg Alexandrov 16:46, 6 August 2005 (UTC)
[edit] Use of Terms Independent and Dependent
I changed the terms independent / dependent variable to explanatory / response variable. This was to make the article more in line with the terminology used by the majority of statistics textbooks, and because independent / dependent variables are not statistically independent. Predictor / response might be ideal terminology, I'll see what textbooks are using.--Theblackgecko 23:06, 11 April 2006 (UTC)
The terms independent and dependent are actually more precise in describing linear regression as such - terms like predictor/response or explanatory/response are related to the application of regression to different problems. Try checking a non-applied stats book.
Scientists tend to speak of independent/dependent variables, but statistics texts (such as mine) prefer explanatory/response. (These two pairs are not strictly interchangeable, in any event, though usually they are.) Here is a table a professor wrote up for the terms: http://www.tufts.edu/~gdallal/slr.htm.
[edit] Merge?
At a broad and simplistic level, we can say that Linear Regression is generally used to estimate or project, often from a sample to a population. It is an estimating technique. Multiple Regression is often used as a measure of the proportion of variability explained by the Linear Regression.
Multiple Regression can be used in an entirely different scenario. For example, when the data collected are a census rather than a sample, Linear Regression is not necessary for estimating. In this case, Multiple Regression can be used to develop a measure of the proportionate contribution of the independent variables.
Consequently, I would propose that merging Multiple Regression into the discussion of Linear Regression would bury it in an inappropriate and, in the case of the example given, relatively unrelated topic.
--69.119.103.162 20:42, 5 August 2006 (UTC)A. McCready Aug. 5, 2006
- Multiple regression (or more accurately, multiple linear regression) is just the general case of linear regression. Most of the article deals with simple linear regression, which is a special case. I think the section on multiple regression in this article suffices. I'm going to go ahead and merge the two. --JRavn talk 13:54, 31 August 2006 (UTC)
-
- As the discussion on merging Trend line into this article did not go anywhere, I propose we delete the warning template from both pages. Classical geographer 16:50, 19 February 2007 (UTC)
-
- I still think it should be merged. — Chris53516 (Talk) 17:01, 19 February 2007 (UTC)
[edit] Tools?
Would it be within the scope of Wikipedia to mention or reference how to do linear regressions in Excel and scientific / financial calculators? I think it would be very helpful because that's how 99% of people will actually do a linear regression - no one cares about the stupid math to get to the coefficients and the R squared.
In Excel, there is the scatter plot for single variable linear regressions, and the linear regression data analysis and the linest() family of functions for multi-variable linear regressions.
I believe the hp 12c only supports single variable linear regressions. —The preceding unsigned comment was added by 12.196.4.146 (talk • contribs) 15:21, 15 August 2006 (UTC)
- It would be useful, but I don't think it should be included in a wikipedia article. Tutorials and how-tos are generally not considered encyclopedic. An external link to an excel tutorial should be ok though. --JRavn talk 22:12, 18 August 2006 (UTC)
-
- It would not be useful to suggest using Excel; it would be dangerous. Excel is notoriously bad at many statistical tasks, including regression. A readable summary is online here. Try searching for something like excel statistical errors to read more. -- Avenue 11:35, 19 August 2006 (UTC)
-
-
- Someone new to the topic who has quick access to a ubiquitous program like Excel could quickly fire it up and try out some simple linear regression first hand. I don't think an external link amounts to a suggestion as to the best tool. We should be trying to accommodate people unfamiliar with the topic, not people trying to figure out what's the best linear regression tool (that would be an article in itself). --JRavn talk 03:37, 22 August 2006 (UTC)
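For readers who do want to try it outside Excel, a simple linear regression takes only a few lines in a scientific computing environment; for example, in Python with NumPy (the data points are made up):

```python
import numpy as np

# Toy data: roughly y = 2x with a little scatter (hypothetical numbers)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

slope, intercept = np.polyfit(x, y, 1)   # degree-1 polynomial = a line
r = np.corrcoef(x, y)[0, 1]              # Pearson correlation
r_squared = r**2
```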
-
[edit] Useful comments to accommodate a wider audience
I am not a mathematician, but a scientist. I am most familiar with the use of linear regression to calculate a best fit y = mx + b for a particular set of ordered pairs of data that are likely to be in a linear relationship. I was wondering if a much simpler explanation of how to calculate this without a calculator or matrix notation might fit into the article somewhere. The texts at my level of mathematical understanding merely instruct the student to plug the coordinates or other information into a graphing calculator, giving a 'black box' sort of feel to the discussion. I suspect many others who would consult Wikipedia about this sort of calculation would not be the same audience that the discussion in this entry seems to address, enlightening though it may be. I am not suggesting that the whole article be 'dumbed down' for us feeble non-mathematicians, but merely that it include a simplified explanation for the modern student wishing for a slightly better explanation than "pick up your calculator and press these buttons", as is currently provided in many popular college math texts. The24frans 16:21, 18 September 2006 (UTC)frannie
- I agree that this would be helpful. Although I have used this in my graduate coursework, I want to use the same for analysis on the job. Having a simplified explanation as the prior user suggests would be helpful, especially in my trying to explain the theory to my non-analytical staff. 216.114.84.68 20:23, 3 October 2006 (UTC)dofl
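For reference, the "non-black-box" calculation asked for above needs nothing beyond running sums. A minimal sketch of the standard textbook formulas m = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2) and b = (sum(y) - m*sum(x)) / n:

```python
# Hand calculation of a best-fit line y = m x + b from running sums,
# exactly as one would on paper -- no matrices or calculator routines.
def fit_line(points):
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

m, b = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])  # data lying exactly on y = 2x + 1
```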
[edit] Independent and dependent variables
Why are these labeled as "technically incorrect"? There is no explanation or citation given for this reasoning. The terms describe the relationship between the two variables. One is independent because it determines the other; thus the second variable is dependent on the first. Linear regressions have independent variables, and it is not incorrect to describe them as such. If one is examining, say, the amount of time spent on homework and its effect on GPA, then the hypothetical equation would be:
GPA = m * (homework time) + b
where homework time is the independent variable (independent from GPA), and GPA is dependent (on homework time, as shown in the equation). I will remove the references to these terms as technically incorrect unless someone can refute my reasoning. -- Chris53516 21:06, 3 October 2006 (UTC)
--->I added the terms, and I also added the note that they are technically incorrect. They are technically incorrect because they are considered to imply causation. (unsigned comment by 72.87.187.241)
[edit] Endogenous/exogenous
The uses of "endogenous" and "exogenous" variables here are not consistent with the only way I've ever heard them used. Exogenous means outside of the model -- i.e., a latent/hidden variable. Endogenous describes a variable that IS accounted for by your model, be it independent OR dependent. See Wiki entry for "exogenous," which supports this.
I recommend that these two words be deleted from the list of alternate names for predictor and criterion variables. (unsigned comments by 72.87.187.241)
- I have found economists use exogenous in linear models to mean non-response variables. The contrast is endogenous variables, which appear on the right-hand side of one or more other variables' equations but also on the left-hand side of their own regression. Pdbailey 00:06, 7 October 2006 (UTC)
[edit] Forecasting
I ran into linear regression for the purpose of forecasting. I believe this is done reasonably often for business planning; however, I suspect it is statistically incorrect to extend the model outside the original range of the independent variables. Not that I am by any means a statistician. Worthy of a mention? -K
If you're going to use a statistical model for forecasting, you inevitably will have to use it outside the data used to fit it. That's not the main issue. One issue is that people sometimes plug time-series data into a regression model and use it to produce a forecast, along with a confidence interval for the forecast. The problem is that in a time series the error terms are usually correlated, which is a violation of the regression model assumptions. The effect is that the series' standard deviation is underestimated and the confidence intervals are too small. The model coefficients are also estimated inefficiently, but they are not biased. For business planning it is common to first remove the seasonal pattern and then fit a regression line to obtain the trend. All that is fine. More sophisticated time-series methods use regression ideas, but use them in a way that correctly accounts for the serial correlation in the error terms, so that valid confidence intervals for the forecasts can be produced. Yes, I do think it's worth a mention. Blaise 09:08, 11 October 2006 (UTC)
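The effect described above, naive standard errors being too small when the errors are serially correlated, is easy to demonstrate by simulation (the AR(1) coefficient and other parameters are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_errors(n, rho, rng):
    """Serially correlated errors: e[t] = rho * e[t-1] + white noise."""
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + rng.normal()
    return e

n, rho, reps = 50, 0.9, 500
t = np.arange(n, dtype=float)
sxx = np.sum((t - t.mean()) ** 2)

slopes, naive_ses = [], []
for _ in range(reps):
    y = 1.0 + 0.1 * t + ar1_errors(n, rho, rng)   # trend + AR(1) noise
    m = np.sum((t - t.mean()) * (y - y.mean())) / sxx
    b = y.mean() - m * t.mean()
    resid = y - (m * t + b)
    s2 = np.sum(resid ** 2) / (n - 2)             # usual error-variance estimate
    slopes.append(m)
    naive_ses.append(np.sqrt(s2 / sxx))           # textbook SE, assumes iid errors

empirical_sd = np.std(slopes)                     # actual spread of the slope
mean_naive_se = np.mean(naive_ses)                # what the iid formula claims
```

With strongly correlated errors the actual sampling spread of the slope is substantially larger than the iid-formula standard error, so nominal confidence intervals are far too narrow.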
- If you want a precise forecast (meaning that you get both the prediction and its error), it's necessary to know the a posteriori distributions of the coefficients (slope, intercept and standard deviation of the errors); see the Uncertainties section below. Albmont 13:58, 23 November 2006 (UTC)
[edit] Regression Problem
I have the problem of combining some indicators using a weighted sum. All weights have to lie in the range from 0 to 1, and the weights should add up to one.
The probability distribution is rather irregular; therefore the application of the EM algorithm would be rather difficult.
Therefore I am thinking about using a linear regression with a Lagrange condition that all weights sum to one.
One problem that can emerge is that a weight derived by linear regression might be negative. My idea is to filter out indicators with negative weights and redo the linear regression with the remaining indicators until all weights are positive.
Is this sensible, or does someone know a better solution? Or is it better to use neural networks? (unsigned comments of Nulli)
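For what it's worth, the iterative scheme described above can be prototyped directly; a sketch with made-up data (not a recommendation -- a principled alternative would be constrained or non-negative least squares):

```python
import numpy as np

def constrained_weights(X, y):
    """Weights w >= 0 with sum(w) == 1 approximating y by X @ w, via the
    iterative scheme described above: fit with the sum-to-one constraint,
    drop the most negative weight, and refit until all weights are positive."""
    k = X.shape[1]
    active = list(range(k))
    while True:
        Xa = X[:, active]
        # Enforce sum-to-one by eliminating the last active weight:
        # w_last = 1 - sum(w_rest), so regress (y - x_last) on (x_j - x_last).
        base = Xa[:, -1]
        A = Xa[:, :-1] - base[:, None]
        b = y - base
        w_rest, *_ = np.linalg.lstsq(A, b, rcond=None)
        w = np.append(w_rest, 1.0 - w_rest.sum())
        if np.all(w >= 0) or len(active) == 1:
            full = np.zeros(k)
            full[active] = np.clip(w, 0.0, None)
            return full / full.sum()
        active.pop(int(np.argmin(w)))   # drop the worst offender, refit

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))                  # four hypothetical indicators
y = X @ np.array([0.6, 0.4, 0.0, 0.0]) + 0.01 * rng.normal(size=200)
w = constrained_weights(X, y)
```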
[edit] Uncertainties
The page gives useful formulae for estimating alpha and beta, but it does not give equations for the level of uncertainty in each (standard error, I think). For you mathematicians out there, I'd love it if this were available on wikipedia so I don't have to hunt through books. 99of9 00:11, 26 October 2006 (UTC)
- I got these formulas from many sources (some of which were wrong, but it was possible to correct them by running a lot of test cases); for a regression y = m x + c + sigma N(0,1), we have that (mbar - m) * sqrt(n-2) * sigma(x) / s is distributed as a Student's t with (n-2) degrees of freedom, and n * s^2 / sigma^2 as a chi-square with (n-2) degrees of freedom, where
- sigma(x)^2 is (sum(x^2) - sum(x)^2/n)/n
- s^2 is (sum(y^2) - cbar sum(y) - mbar sum(xy))/n
- however, I could not find a distribution for cbar. The test cases suggest that mbar and s are independent, and cbar and s are also independent, but mbar and cbar are highly negatively correlated (something like -sqrt(3)/4) - but this may be a quirk of the test case. I would also appreciate it if the exact formulas were given. Albmont 13:52, 23 November 2006 (UTC)
-
- I think I got those numbers now, after some derivations and a lot of simulations. The above formulas for the distributions of mbar and s are right; the distribution for cbar is weird, because there's no decent way to estimate it. OTOH, if we take c1 (for lack of a better name) = cbar + mbar xbar, then the variable (c1 - (m xbar + c)) * sqrt(n-2) / s is distributed as a Student's t with (n-2) degrees of freedom, and the triple (mbar, c1, s) will be independent. Albmont 16:30, 23 November 2006 (UTC)
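For anyone following along, the textbook versions of these quantities (using the unbiased variance estimate SSE/(n-2) rather than the /n convention above) are easy to compute. Note also that Cov(mbar, cbar) = -xbar s^2 / Sxx, which explains the negative correlation observed above and why re-centring via c1 = cbar + mbar xbar removes it. A sketch:

```python
import numpy as np

def slope_intercept_se(x, y):
    """Least-squares fit of y = m x + c with the usual standard errors:
    Var(m_hat) = s^2 / Sxx,  Var(c_hat) = s^2 * (1/n + xbar^2 / Sxx),
    where s^2 = SSE / (n - 2) is the unbiased error-variance estimate."""
    n = len(x)
    xbar, ybar = np.mean(x), np.mean(y)
    sxx = np.sum((x - xbar) ** 2)
    m = np.sum((x - xbar) * (y - ybar)) / sxx
    c = ybar - m * xbar
    sse = np.sum((y - (m * x + c)) ** 2)
    s2 = sse / (n - 2)
    se_m = np.sqrt(s2 / sxx)
    se_c = np.sqrt(s2 * (1.0 / n + xbar**2 / sxx))
    return m, c, se_m, se_c

# Sanity check on data lying exactly on a line: zero residuals, zero SEs.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 2.0, 3.0])
m, c, se_m, se_c = slope_intercept_se(x, y)
```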
[edit] Polynomial regression
Would it be worth mentioning in this article that the linear regression is simply a special case of a polynomial fit? Below I copy part of the article rephrased to the more general case:
- By recognizing that the regression model y_i = α_0 + α_1 x_i + α_2 x_i^2 + ... + α_m x_i^m + ε_i
is a system of polynomial equations of order m, we can express the model using a data matrix X, a target vector Y and a parameter vector δ. The ith row of X and Y will contain the x and y values for the ith data sample. Then the model can be written as Y = Xδ + ε.
I recall using this in the past and it worked quite well for fitting polynomials to data. Simply zero out the epsilons and solve for the alpha coefficients, and you have a nice polynomial. It works as long as m<n. For the linear regression, of course, m=1. -Amatulic 18:57, 30 October 2006 (UTC)
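To make the point concrete -- the fitted curve is a polynomial, but the problem is linear in the coefficients -- here is a sketch using a Vandermonde design matrix as the X above (variable names are mine):

```python
import numpy as np

def poly_fit(x, y, m):
    """Fit an order-m polynomial by ordinary least squares: build the
    Vandermonde design matrix with columns 1, x, x^2, ..., x^m and solve
    the resulting linear system for the coefficients."""
    X = np.vander(x, m + 1, increasing=True)   # columns: x^0, x^1, ..., x^m
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

x = np.linspace(-1.0, 1.0, 20)
y = 1.0 - 2.0 * x + 3.0 * x**2                 # exact quadratic, no noise
coeffs = poly_fit(x, y, 2)                     # recovers [1, -2, 3]
```

Setting m = 1 gives ordinary simple linear regression as the special case discussed above.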
- Currently the article concludes with some material on multiple linear regression, which is actually more general than polynomial regression. However, noting polynomial regression there as perhaps an example would seem appropriate. Baccyak4H 19:20, 30 October 2006 (UTC)
-
- Multiple linear regression is more general in the sense that it extends the basic linear regression to more variables; however, the equations are still first-order, and the section on multiple regressions isn't clear that (or how) it can be applied to polynomials. It isn't immediately obvious to me that the matrix expression in that section, when substituting powers of x for the other variables, is equivalent to the expression I wrote above.
-
-
- the issue of being first order is really a semantic one; if you define the "Y" to be "X^2", it can be seen to be first order too. In fact, as you note below, in some sense it has to be first order to be considered linear regression. You're right about the ambiguity between your polynomial formulation and the current normal equations one. However, they should be fundamentally equivalent. That is why I suggested using the polynomial example here. Baccyak4H 19:57, 30 October 2006 (UTC)
-
-
- What I suggested above is a replacement for the start of the Parameter Estimation section, presenting a more general case with polynomials and mentioning that solving for the coefficients of a line is the specific case where m=1. I'm probably committing an error in calling it a "polynomial regression"; it's still a linear regression because the equations to be solved are linear in the coefficients.
- Either there or at the end would be appropriate, I think. -Amatulic 19:43, 30 October 2006 (UTC)
-
-
- Currently most of the article is about the y=mx+b case, what my edits referred to as "simple linear regression", and the section at the end generalizes. I notice there is a part in the estimation section that uses the matrix formulation of the problem which would be valid in the general case. Perhaps that could be moved into the multiple section as well as your content? Baccyak4H 19:57, 30 October 2006 (UTC)
-
[edit] Part on multiple regression
This section should be rewritten, as it is not general: polynomial regression is only a special case of multiple regression. And there is no formula for the correlation coefficient, which, in the case of multiple regression, is called the coefficient of determination. Any comments?
TomyDuby 19:29, 2 January 2007 (UTC)
- The section on Multiple Regression already does describe the general case. Polynomial fitting is a subsection under that, and the text in that subsection correctly describes polynomial fitting as a special case of multiple regression. I'm not seeing the problem you perceive. -Amatulic 21:29, 2 January 2007 (UTC)
I accept your comment. But I would like to see the comment I made above about the correlation coefficient fixed. TomyDuby 21:24, 10 January 2007 (UTC)
- I didn't address your correlation coefficient comment because I assumed you would just go ahead and make the change to the article. I just made a slight change pointing to the coefficient of determination article, but I didn't add a formula. Instead I added a link to the main article. -Amatulic 23:15, 10 January 2007 (UTC)
Thanks! I made one more change. I consider the issue closed. TomyDuby 13:34, 11 January 2007 (UTC)
[edit] Major Rewrite
The French version of this article seems to be better written and more organised. If there are no objections, I intend to translate that article into English and incorporate it into this article in the next month or so. Another point: all the articles on regression tend to be poorly written, repetitive, and unclear on the terminology, including linear regression, least squares and its many derivatives, multiple linear regression... Woollymammoth 00:08, 21 January 2007 (UTC)
- I disagree with your analysis. Please DO NOT replace this page with a translation. If this page needs to be improved, improve it on its own. Wikipedia is not a source for translations of other webpages, even other Wikipedia pages. — Chris53516 (Talk) 14:51, 23 January 2007 (UTC)
-
- 2nd opinion: I disagree with both of you. Foreign articles can contain valuable content that enhances articles in English. I suggest to Woollymammoth that you create a sub-page off your userpage or your talk page, and put the translated article there. Then post a wikilink to it here so we can review it. There may be a consensus to replace the whole existing article, or to use bits and pieces of your efforts to improve the existing article. -Amatulic 17:10, 23 January 2007 (UTC)
-
- The French Wikipedia is just like the English Wikipedia - it's not an original source and should probably not be cited or used, IMO. — Chris53516 (Talk) 18:03, 23 January 2007 (UTC)
-
-
- I never said it should be cited. At the extreme, I'd prefer an English translation of a foreign article over an English stub article on the same subject any day. That's my point: if an article has value in another language, it has value in English. That may not be the case here; I only suggested that Woollymammoth make the translation available for evaluation. -Amatulic 02:57, 24 January 2007 (UTC)
-
[edit] Regarding Line of Best Fit Song
I used {{redirect5|Line of best fit|the song "Line of Best Fit"|You Can Play These Songs with Chords}}
for the disambiguation template. It appears to be the best one to use based on Wikipedia:Template messages/General. The album page shows "Line of Best Fit" in 3 different sections, so linking to 1 section is pointless and misleading. — Chris53516 (Talk) 17:25, 30 January 2007 (UTC)
- Dear Chris: Concerning your reverts of the above, please check the link again. I edited it to take account of your concern. That is, the freshly edited link takes someone to the first mention of the song, which notes where the only other reference to that song in the article is. I would appreciate either your responding on the Talk page for Linear regression, or my Talk page, or meeting the concern I expressed in my Edit summary. My thx. Thomasmeeks 19:38, 30 January 2007 (UTC)
- The link you posted was not to the first instance, but the second. Check the page again. I don't see the point in debating this. Why are you so concerned that this points to a particular place in the article? It isn't necessary to worry about the length of the disambiguation note, and having anything above the article is distracting anyway (but there's nowhere else to put it), so both of your concerns don't really matter. — Chris53516 (Talk) 19:42, 30 January 2007 (UTC)
- Thx for response. Sorry about overlooking first mention (I did use Find but from later section). No debate there. The explanation for Redirect5 template is:
- Top of articles about well known topics with a redirect which could also refer to another article. (emph. added)
- The song is not the CD or the band, so no mention of either is necessary. Anyone trying to get to the song can get there with a briefer disambig. A disambig that is longer than necessary can be a distraction. It looks like spam. And that can be annoying to a lot of people. Thx. -- Thomasmeeks 21:45, 30 January 2007 (UTC)
-
- Well, perhaps we could use the same text, just replace the link to the album with the name of the song. How about:
-
-
"Line of best fit" redirects here. For the song "Line of Best Fit", see [[You Can Play These Songs with Chords|Line of Best Fit]].
-
-
- That sound okay? — Chris53516 (Talk) 23:04, 30 January 2007 (UTC)
- Better than worst, but the first sentence is unnecessary (those who got there via "Line of best fit" know it & those who didn't don't care). The 2nd sentence repeats "Line of best fit" unnecessarily ("here" works just as well and is shorter), but "See" is good. Convergence(;). BW, Thomasmeeks 02:39, 31 January 2007 (UTC)
Dude! Instead of wasting my time, did you even check "line of best fit"?? It doesn't even redirect here!! — Chris53516 (Talk) 23:11, 30 January 2007 (UTC)
- It used to redirect here, but was recently changed to trend line, which isn't as good a fit. I have restored line of best fit to redirect to this article. -Amatulic 00:10, 31 January 2007 (UTC)
[edit] Merge of trend line
There seemed to be no objections to merging trend line into this article, so I went ahead. Cleanup would still be useful. Jeremy Tobacman 01:00, 23 February 2007 (UTC)
[edit] Major Rewrite
I have performed a major rewrite removing most of the repetitive and redundant information. A lot of the information has been, or soon will be, moved to the article least squares, where it belongs. This article should, in my opinion, contain information about different types of linear regression. There seem to be at least two, if not more, different types: least squares and robust regression. All the theoretical information about least squares should be in the article on least squares. -- Woollymammoth 02:03, 24 February 2007 (UTC)
[edit] Confidence interval for βi
It's still not clear (to me) what's the meaning of this:
- The 100(1 − α)% confidence interval for the parameter βi is computed as follows: beta_hat_i ± t s sqrt((A^TA)^-1_ii),
- where t follows the Student's t-distribution with m − n degrees of freedom.
The problem is that I don't know what (A^TA)^-1_ii is -- is it the ii-th diagonal element of the matrix (A^TA)^-1? (now that I ask, it seems that it is...). Could we say that beta_hat follows some multivariate Student distribution with parameters β and s^2 (A^TA)^-1? Or is there some other expression for the distribution of beta_hat? Albmont 20:05, 6 March 2007 (UTC)
- I am sorry for the confusion. I should have defined that notation. In fact, (A^TA)^-1_ii is the element located at the i-th row and i-th column of the matrix (A^TA)^-1. —The preceding unsigned comment was added by Woollymammoth (talk • contribs) 19:34, 13 March 2007 (UTC).
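To pin the notation down in code (a sketch with my own variable names; A is the design matrix with m observations and n parameters, matching the m − n degrees of freedom above):

```python
import numpy as np

def coef_standard_errors(A, y):
    """OLS fit beta = (A^T A)^{-1} A^T y with per-coefficient standard
    errors se_i = sqrt(s^2 * (A^T A)^{-1}_ii); the 100(1-alpha)% interval
    is then beta_i +/- t_{m-n} * se_i with the appropriate t quantile."""
    m, n = A.shape                         # m observations, n parameters
    AtA_inv = np.linalg.inv(A.T @ A)
    beta = AtA_inv @ A.T @ y
    resid = y - A @ beta
    s2 = resid @ resid / (m - n)           # unbiased error-variance estimate
    se = np.sqrt(s2 * np.diag(AtA_inv))    # the (A^T A)^{-1}_ii in question
    return beta, se

# Sanity check: data generated exactly by y = 2 + 3x, so se is ~0.
x = np.arange(5.0)
A = np.column_stack([np.ones(5), x])
y = 2.0 + 3.0 * x
beta, se = coef_standard_errors(A, y)
```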
[edit] Slope/Intercept equations for linear regression
http://en.wikipedia.org/w/index.php?title=Linear_regression&oldid=110403867
I know this is simple-minded and hardly advanced statistics, but when I needed the equations to conduct a linear fit for some data points I expected Wikipedia, and more specifically this page, to have them, but they have been removed. See this page for the last occurrence. It would be nice if they could be worked back in. --vossman 17:41, 29 March 2007 (UTC)
Estimating beta (the slope)
We use the summary statistics above to calculate beta_hat = (sum(x y) - n xbar ybar) / (sum(x^2) - n xbar^2), the estimate of β.
Estimating alpha (the intercept)
We use the estimate of β and the other statistics to estimate α by: alpha_hat = ybar - beta_hat xbar.
A consequence of this estimate is that the regression line will always pass through the "center" (xbar, ybar).
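Until the equations are restored to the article, here is the computation they describe, as a sketch in code (deviation-score form; the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=30)
y = 1.5 * x + 0.5 + rng.normal(size=30)   # true slope 1.5, intercept 0.5

# Slope estimate in deviation-score form (equivalent to the sum formulas)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# The fitted line always passes through the "center" (xbar, ybar):
center_prediction = alpha_hat + beta_hat * x.mean()
```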