Friday, November 25, 2011
Command prompt window just appears and disappears!
The solution is to remove the AutoRun value (which the virus sets to EXIT) under the registry key HKLM\Software\Microsoft\Command Processor. Because cmd.exe runs the AutoRun command every time it starts, that EXIT entry makes every command prompt window close as soon as it opens.
For more information about what the virus does, see this page (in French): http://net-studio.org/fra/patch/patch/8-patch-pour-supprimer-le-virus-cradle-of-filth-vbe.html
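If you prefer to script the cleanup rather than edit the registry by hand, here is a minimal Python sketch (assuming Python 3 on Windows, run from an elevated prompt); doing the same thing in regedit works just as well.

# Minimal sketch: remove the AutoRun value the virus adds under
# HKLM\Software\Microsoft\Command Processor (run from an elevated prompt).
import winreg

KEY_PATH = r"Software\Microsoft\Command Processor"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                    winreg.KEY_QUERY_VALUE | winreg.KEY_SET_VALUE) as key:
    try:
        value, _ = winreg.QueryValueEx(key, "AutoRun")
        print("AutoRun is currently set to:", value)   # typically "EXIT" if infected
        winreg.DeleteValue(key, "AutoRun")
        print("AutoRun value removed.")
    except FileNotFoundError:
        print("No AutoRun value found; nothing to remove.")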
Wednesday, November 23, 2011
Stat
measures of location -- statistics that describe a location within a data set. Measures of central tendency describe the center of the distribution (a short Python sketch of these descriptive measures follows after this group of definitions)
mean -- the average; the value obtained by summing all elements in a set and dividing by the number of elements
mode -- a measure of central tendency given as the value that occurs the most in a sample distribution
median -- a measure of central tendency given as the value above which half of the values fall and below which half of the values fall
measures of variability -- statistics that indicate the distribution's dispersion
range -- the difference between the largest and smallest values of a distribution
interquartile range -- the range of a distribution encompassing the middle 50% of the observations
variance -- the mean squared deviation of all the values from the mean
standard deviation -- the square root of the variance
coefficient of variation -- the standard deviation expressed as a percentage of the mean; a useful measure of relative dispersion in sampling theory
skewness -- a characteristic of a distribution that assesses its symmetry about the mean
kurtosis -- a measure of the relative peakedness or flatness of the curve defined by the frequency distribution
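To make the descriptive measures above concrete, here is a small Python sketch that computes each of them for an invented sample; numpy and scipy are assumed to be installed, and the numbers are made up purely for illustration.

# Descriptive statistics for a small invented sample.
import numpy as np
from scipy import stats

x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

mean = x.mean()                                    # the average
median = np.median(x)                              # half the values above, half below
values, counts = np.unique(x, return_counts=True)
mode = values[counts.argmax()]                     # the most frequent value
value_range = x.max() - x.min()                    # largest minus smallest
iqr = np.percentile(x, 75) - np.percentile(x, 25)  # spread of the middle 50% of observations
variance = x.var(ddof=1)                           # mean squared deviation from the mean (sample)
std_dev = np.sqrt(variance)                        # square root of the variance
coef_var = 100 * std_dev / mean                    # standard deviation as a percentage of the mean
skewness = stats.skew(x)                           # symmetry about the mean
kurtosis = stats.kurtosis(x)                       # peakedness/flatness (excess kurtosis)

print(mean, median, mode, value_range, iqr, variance, std_dev,
      coef_var, skewness, kurtosis)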
null hypothesis -- a statement in which no difference or effect is expected. If the null hypothesis is not rejected, no changes will be made
alternative hypothesis -- a statement that some difference or effect is expected. Accepting the alternative hypothesis will lead to changes in opinions or actions
one tailed test -- a test of the null hypothesis where the alternative hypothesis is expressed directionally
two tailed test -- a test of the null hypothesis where the alternative hypothesis is not expressed directionally
test statistic -- a measure of how close the sample has come to the null hypothesis. It often follows a well-known distribution, such as the normal, t, or chi-square distribution
type I error -- also known as alpha error, occurs when the sample results lead to the rejection of a null hypothesis that is in fact true
level of significance -- the probability of making a type I error (a small simulation sketch of type I error and power follows below)
type II error -- also known as beta error, occurs when the sample results lead to the non-rejection of a null hypothesis that is in fact false
power of a test -- the probability of rejecting the null hypothesis when it is in fact false and should be rejected
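A rough simulation sketch of the error concepts above, with invented parameters: drawing repeated samples while the null hypothesis is true shows the rejection rate settling near the significance level (the type I error rate), and drawing samples while it is false estimates the power of the test.

# Estimating type I error rate and power of a one-sample t test by simulation (invented parameters).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 10_000

# Null hypothesis true: the population mean really is 0.
rejections_h0 = sum(
    stats.ttest_1samp(rng.normal(0.0, 1.0, n), 0.0).pvalue < alpha
    for _ in range(trials)
)
print("estimated type I error rate:", rejections_h0 / trials)  # should be close to alpha

# Null hypothesis false: the population mean is actually 0.5.
rejections_h1 = sum(
    stats.ttest_1samp(rng.normal(0.5, 1.0, n), 0.0).pvalue < alpha
    for _ in range(trials)
)
print("estimated power:", rejections_h1 / trials)  # probability of correctly rejecting H0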
Cross tabulation -- a statistical technique that describes two or more variables simultaneously and results in tables that reflect the joint distribution of two or more variables that have a limited number of categories or distinct values
contingency table -- a cross tabulation table. It contains a cell for every combination of categories of the two variables
chi-square statistic -- the statistic used to test the statistical significance of the observed association in a cross tabulation. It assists us in determining whether a systematic association exists between the two variables (a short sketch computing it and the association measures below follows after these definitions)
chi-square distribution -- a skewed distribution whose shape depends solely on the number of degrees of freedom. As the number of degrees of freedom increases, the chi-square distribution becomes more symmetrical
phi coefficient -- a measure of the strength of association in the special case of a table with two rows and two columns
contingency coefficient (C) -- a measure of the strength of association in a table of any size
Cramer's V -- a measure of the strength of association used in tables larger than 2 x 2
asymmetric lambda -- a measure of the percentage improvement in predicting the value of the dependent variable, given the value of the independent variable, in contingency table analysis. Lambda varies between zero and one
symmetric lambda -- the symmetric lambda does not make an assumption about which variable is dependent. It measures the overall improvement when prediction is done in both directions
tau b -- test statistic that measures the association between two ordinal-level variables. It makes an adjustment for ties and is most appropriate when the table of variables is square
tau c -- test statistic that measures the association between two ordinal-level variables. It makes an adjustment for ties and is most appropriate when the table of variables is not square but rectangular
Gamma -- test statistic that measures the association between two ordinal-level variables. It does not make an adjustment for ties
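As a sketch of the cross-tabulation measures above, the following Python snippet builds an invented 2 x 2 table and computes the chi-square statistic together with phi, the contingency coefficient, and Cramer's V; the counts are made up, and scipy is assumed to be installed.

# Chi-square test and strength-of-association measures for an invented 2 x 2 cross tabulation.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: two groups of respondents; columns: purchased / did not purchase (made-up counts).
table = np.array([[30, 20],
                  [10, 40]])

# correction=False to match the textbook chi-square formula (no Yates' continuity correction).
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
n = table.sum()
r, c = table.shape

phi = np.sqrt(chi2 / n)                            # 2 x 2 tables only
contingency_c = np.sqrt(chi2 / (chi2 + n))         # tables of any size
cramers_v = np.sqrt(chi2 / (n * (min(r, c) - 1)))  # tables larger than 2 x 2 (equals phi here)

print(f"chi2={chi2:.3f}, p={p_value:.4f}, dof={dof}")
print(f"phi={phi:.3f}, C={contingency_c:.3f}, V={cramers_v:.3f}")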
parametric tests -- hypothesis testing procedures that assume that the variables of interest are measured on at least an interval scale
non-parametric tests -- hypothesis testing procedures that assume that the variables are measured on a nominal or ordinal scale
t test -- a univariate hypothesis test using the t distribution, which is used when the standard deviation is unknown and the sample size is small (a sketch applying this and several of the nonparametric tests below follows after these definitions)
t statistic -- a statistic that assumes that the variable has a symmetric bell-shaped distribution, that the mean is known (or assumed to be known), and that the population variance is estimated from the sample
t distribution -- a symmetric bell-shaped distribution that is useful for small sample testing
z test -- a univariate hypothesis test using the standard normal distribution
independent samples -- two samples that are not experimentally related. The measurement of one sample has no effect on the values of the second sample
f test -- a statistical test of the equality of the variances of two populations
f statistic -- the f statistic is computed as the ratio of two sample variances
f distribution -- a frequency distribution that depends on two sets of degrees of freedom -- the degrees of freedom in the numerator and the degrees of freedom in the denominator
paired samples -- in hypothesis testing, observations that are paired so that the two sets of observations relate to the same respondents
paired samples t test -- a test for differences in the means of paired samples
Kolmogorov-Smirnov one-sample test -- a one-sample nonparametric goodness of fit test that compares the cumulative distribution function for a variable with a specified distribution
runs test -- a test of randomness for a dichotomous variable
binomial test -- a goodness of fit statistical test for dichotomous variables. It tests the goodness of fit of the observed number of observations in each category to the number expected under a specified binomial distribution
Mann-Whitney U test -- a statistical test for a variable measured on an ordinal scale, comparing the difference in the location of two populations based on observations from two independent samples
two-sample median test -- non-parametric test statistic that determines whether two groups are drawn from populations with the same median. This test is not as powerful as the Mann-Whitney U
Kolmogorov-Smirnov two-sample test -- nonparametric test statistic that determines whether two distributions are the same. It takes into account any differences in the two distributions including median, dispersion, and skewness
Wilcoxon matched-pairs signed-ranks test -- a nonparametric test that analyzes the differences between the paired observations, taking into account the magnitude of the differences
sign test -- a nonparametric test for examining differences in the location of two populations, based on paired observations, that compares only the signs of the differences between pairs of variables without taking into account the magnitude of the differences
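Here is a short Python sketch applying a few of the tests above to invented data (scipy assumed installed): an independent-samples t test with its nonparametric counterpart, the Mann-Whitney U test; a paired-samples t test with the Wilcoxon matched-pairs signed-ranks test; and a Kolmogorov-Smirnov one-sample goodness of fit test.

# A few of the tests above applied to invented data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(5.0, 1.0, 25)    # independent sample 1
group_b = rng.normal(5.5, 1.0, 25)    # independent sample 2
before = rng.normal(100, 10, 20)      # paired observations, same respondents
after = before + rng.normal(2, 5, 20)

# Independent-samples t test (parametric) and Mann-Whitney U (nonparametric counterpart).
t_stat, t_p = stats.ttest_ind(group_a, group_b)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)

# Paired-samples t test and its nonparametric counterpart, the Wilcoxon signed-ranks test.
pt_stat, pt_p = stats.ttest_rel(before, after)
w_stat, w_p = stats.wilcoxon(before, after)

# Kolmogorov-Smirnov one-sample goodness of fit test against a normal distribution.
ks_stat, ks_p = stats.kstest(group_a, "norm", args=(group_a.mean(), group_a.std(ddof=1)))

print(f"t test:        t={t_stat:.3f}, p={t_p:.4f}")
print(f"Mann-Whitney:  U={u_stat:.3f}, p={u_p:.4f}")
print(f"paired t test: t={pt_stat:.3f}, p={pt_p:.4f}")
print(f"Wilcoxon:      W={w_stat:.3f}, p={w_p:.4f}")
print(f"K-S test:      D={ks_stat:.3f}, p={ks_p:.4f}")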
Thursday, November 17, 2011
Types of Survey Errors
Coverage errors occur when the sampling frame excludes some segments of the target population. Phone books are very convenient, but unlisted households are excluded. Election polls sample from the frame of registered voters, but the target population is the subset who are going to vote.
Nonresponse errors can cause serious bias in survey results. Phone interviewers find it easiest to reach families with young children, and hardest to reach young singles. Mail surveys tend to be answered by those who feel strongly about an issue, or by those who feel more civic responsibility, neither of which is a representative cross-section of the population.
Measurement errors occur when respondents answer inaccurately because of question wording, question ordering, interviewer effect, or other external influences. For example, the answer to "Do you approve of affirmative action?" may be influenced by the gender of the interviewer. Also, the answer to "Do you like living in Davis Hall?" may be influenced by preceding it with the question "Do you have enough parking spots in Davis Hall?".
Survey Sampling Methods
It is incumbent on the researcher to clearly define the target population. There are no strict rules to follow, and the researcher must rely on logic and judgment. The population is defined in keeping with the objectives of the study.
Sometimes, the entire population will be sufficiently small, and the researcher can include the entire population in the study. This type of research is called a census study because data is gathered on every member of the population.
Usually, the population is too large for the researcher to attempt to survey all of its members. A small, but carefully chosen sample can be used to represent the population. The sample reflects the characteristics of the population from which it is drawn.
Sampling methods are classified as either probability or nonprobability. In probability samples, each member of the population has a known non-zero probability of being selected. Probability methods include random sampling, systematic sampling, and stratified sampling. In nonprobability sampling, members are selected from the population in some nonrandom manner. These include convenience sampling, judgment sampling, quota sampling, and snowball sampling. The advantage of probability sampling is that sampling error can be calculated. Sampling error is the degree to which a sample might differ from the population. When inferring to the population, results are reported plus or minus the sampling error. In nonprobability sampling, the degree to which the sample differs from the population remains unknown.
Random sampling is the purest form of probability sampling. Each member of the population has an equal and known chance of being selected. When there are very large populations, it is often difficult or impossible to identify every member of the population, so the pool of available subjects becomes biased.
Systematic sampling is often used instead of random sampling. It is also called an Nth name selection technique. After the required sample size has been calculated, every Nth record is selected from a list of population members. As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method. Its only advantage over the random sampling technique is simplicity. Systematic sampling is frequently used to select a specified number of records from a computer file.
Stratified sampling is a commonly used probability method that is superior to random sampling because it reduces sampling error. A stratum is a subset of the population that shares at least one common characteristic. Examples of strata might be males and females, or managers and non-managers. The researcher first identifies the relevant strata and their actual representation in the population. Random sampling is then used to select a sufficient number of subjects from each stratum. "Sufficient" refers to a sample size large enough for us to be reasonably confident that the stratum represents the population. Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.
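As a concrete illustration of the systematic and stratified methods just described, here is a small Python sketch over an invented population frame; the group labels, sizes, and proportions are made up.

# Sketch of systematic and stratified sampling over an invented population frame.
import numpy as np

rng = np.random.default_rng(2)

# Invented sampling frame: 1,000 people, each labelled manager / non-manager.
population = np.arange(1000)
strata_labels = np.where(population < 200, "manager", "non-manager")

# Systematic sampling: select every Nth record after a random start.
sample_size = 50
step = len(population) // sample_size          # N in the "Nth name" technique
start = rng.integers(0, step)
systematic_sample = population[start::step][:sample_size]

# Stratified sampling: random-sample within each stratum, proportional to its size.
stratified_sample = []
for label in np.unique(strata_labels):
    stratum = population[strata_labels == label]
    k = round(sample_size * len(stratum) / len(population))
    stratified_sample.extend(rng.choice(stratum, size=k, replace=False))

print(len(systematic_sample), len(stratified_sample))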
Convenience sampling is used in exploratory research where the researcher is interested in getting an inexpensive approximation of the truth. As the name implies, the sample is selected because it is convenient. This nonprobability method is often used during preliminary research efforts to get a gross estimate of the results, without incurring the cost or time required to select a random sample.
Judgment sampling is a common nonprobability method. The researcher selects the sample based on judgment. This is usually an extension of convenience sampling. For example, a researcher may decide to draw the entire sample from one "representative" city, even though the population includes all cities. When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population.
Quota sampling is the nonprobability equivalent of stratified sampling. Like stratified sampling, the researcher first identifies the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required number of subjects from each stratum. This differs from stratified sampling, where the strata are filled by random sampling.
Snowball sampling is a special nonprobability method used when the desired sample characteristic is rare. It may be extremely difficult or cost prohibitive to locate respondents in these situations. Snowball sampling relies on referrals from initial subjects to generate additional subjects. While this technique can dramatically lower search costs, it comes at the expense of introducing bias because the technique itself reduces the likelihood that the sample will represent a good cross section from the population.