Variance reduction

Monte Carlo can be considered a straightforward (but important) technique. Its main issue, though, is variance. One way to deal with it is simply to increase the number of samples N. However, each time we want to halve the error we need four times as many samples (for the error to decrease linearly, the number of samples must grow quadratically). Because of that, a great deal of research has been invested into discovering whether the variance can be decreased by some means other than simply increasing N. Fortunately, mathematics can help answer this question. Monte Carlo integration typically has an error variance of the form σ²/N. Even though we get a better answer by sampling with a larger value of N, the computing time grows with N. Sometimes it is possible to decrease σ instead. To do this, we construct a new Monte Carlo problem with the same answer as our original one but with a lower σ. This leads to an entire branch of Monte Carlo research focused on what are known as variance reduction methods.
When do we need variance reduction? It is mostly needed in simulations of highly complex models with a high computation cost per replication, and in simulations of rare events, which would require far too many replications with crude Monte Carlo. These phenomena appear in a variety of fields, for example physics, finance, statistics, reliability and systems biology.
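As a simple illustration, the following sketch estimates an integral by plain Monte Carlo and shows the estimator's variance shrinking roughly like σ²/N as N grows. The integrand exp(x) on (0, 1) is a hypothetical example chosen only for illustration; its exact integral is e − 1.
<syntaxhighlight lang="python">
# A minimal sketch of plain Monte Carlo integration, illustrating that the
# error variance of the estimator behaves like sigma^2 / N.
import random
import math

def mc_estimate(n):
    """Plain Monte Carlo estimate of the integral of exp(x) over (0, 1)."""
    total = 0.0
    for _ in range(n):
        u = random.random()          # U ~ Uniform(0, 1)
        total += math.exp(u)
    return total / n

# Repeating the experiment shows the spread of the estimator shrinking
# roughly by half each time N is quadrupled (error ~ sigma / sqrt(N)).
for n in (1_000, 4_000, 16_000):
    estimates = [mc_estimate(n) for _ in range(200)]
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    print(f"N={n:6d}  estimate={mean:.4f}  estimator variance={var:.2e}")
</syntaxhighlight>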


Variance reduction techniques

This chapter describes some of the popular classic variance reduction techniques, focusing mainly on antithetic variables and common random numbers (CRN). CRN is typically used in a simulation study in which we want to compare performance measures of two distinct systems. The other methods (for instance antithetic variables and stratification) are useful when we want to estimate a performance measure of a single system. These methods all improve efficiency by sampling the input values more strategically. Next, consider conditioning and control variates. These methods take advantage of closed-form solutions to problems similar to the given one. Another major method is importance sampling. Like some of the other methods, importance sampling also changes where we take the sample values, but rather than distributing them in more balanced ways it deliberately oversamples from some regions and then corrects for this distortion by reweighting. It is therefore an extreme reformulation of the problem and can be tricky to do well.
Another group of methods consists of heuristic variance reduction techniques. Among them are such popular techniques as weighting in lieu of absorption, splitting, forced collisions and Russian roulette. The underlying idea is that we choose the probability distribution we want to sample from, rather than using the one that the physical data indicates.
The last group we will mention here are more modern methods, which include quasi-Monte Carlo and multilevel Monte Carlo estimator techniques.


Common Random Numbers

Common random numbers (CRN) is the simplest and presumably the most widely used method for increasing the efficiency of comparisons via simulation. It is intuitively appealing and easy to implement, in either a custom-made simulation or a simulation package. In its simplest form, CRN just requires that the systems under study be simulated with the same stream of random numbers. This naturally appears to be reasonable, since it guarantees that the systems are compared under the same conditions. The key idea in the method is that if X and Y are two random variables, then Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y). Thus, when X and Y are positively correlated, i.e. Cov(X, Y) > 0, the variance of X − Y will be smaller than when X and Y are independent. In general, the use of common random numbers leads to positive correlation between the outcomes of simulations of the two systems. As a consequence, it is better to use common random numbers instead of independent random numbers when we compare two different systems.
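The following minimal sketch illustrates the effect. The two "systems" below are purely hypothetical stand-in functions of the same uniform inputs, not any particular real model; comparing them with a common stream versus independent streams shows the drop in Var(X − Y).
<syntaxhighlight lang="python">
# A minimal sketch of common random numbers (CRN), assuming two hypothetical
# "systems" that are simple functions of the same uniform inputs.
import random

def system_a(us):
    # hypothetical performance measure of system A
    return sum(u ** 2 for u in us) / len(us)

def system_b(us):
    # hypothetical performance measure of system B (a slight modification of A)
    return sum(0.9 * u ** 2 + 0.05 for u in us) / len(us)

def difference(use_crn, m=50):
    us = [random.random() for _ in range(m)]
    if use_crn:
        return system_a(us) - system_b(us)      # same stream drives both systems
    vs = [random.random() for _ in range(m)]
    return system_a(us) - system_b(vs)          # independent streams

for use_crn in (False, True):
    diffs = [difference(use_crn) for _ in range(2000)]
    mean = sum(diffs) / len(diffs)
    var = sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)
    label = "CRN" if use_crn else "independent"
    print(f"{label:12s}  Var(X - Y) = {var:.2e}")
</syntaxhighlight>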


Antithetic variables

The method of antithetic variables is a frequently suggested technique for reducing the variability of estimators in computer simulation experiments. In practice the method does not appear to work as well as it might and hence is not used as often as it should be. The idea behind antithetic variables is as follows: it makes use of the fact that if U is uniformly distributed on (0, 1) then so is 1 − U, and furthermore U and 1 − U are negatively correlated. The key observation is that, if W1 and W2 are the outcomes of two successive simulation runs, then Var((W1 + W2)/2) = 1/4 Var(W1) + 1/4 Var(W2) + 1/2 Cov(W1, W2). Hence, when W1 and W2 are negatively correlated, the variance of (W1 + W2)/2 will be smaller than when W1 and W2 are independent. The question remains how we can ensure that the outcomes of two successive simulation runs are negatively correlated. From the fact that U and 1 − U are negatively correlated, we may expect that if we use the random numbers U1,...,Um to compute W1, the outcome of the first simulation run, and after that 1 − U1,...,1 − Um to compute W2, the outcome of the second simulation run, then W1 and W2 will also be negatively correlated. The intuition is that, for example, in the simulation of the G/G/1 queue, large realizations of the Ui's correspond to large service times and lead to long waiting times in the first run. Using the antithetic variables, this gives small realizations of the 1 − Ui's, corresponding to small service times and hence short waiting times in the second run.
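A minimal sketch of the idea, assuming the same hypothetical monotone integrand exp(u) on (0, 1) as above: each antithetic pair uses U and 1 − U, and the variance of the pair average is compared against two independent runs.
<syntaxhighlight lang="python">
# A minimal sketch of antithetic variables for estimating E[exp(U)],
# U ~ Uniform(0, 1); exp is a hypothetical integrand chosen for illustration.
import random
import math

def independent_pair():
    # average of two independent runs
    return (math.exp(random.random()) + math.exp(random.random())) / 2

def antithetic_pair():
    # the second run reuses 1 - U, so W1 and W2 are negatively correlated
    u = random.random()
    return (math.exp(u) + math.exp(1 - u)) / 2

for name, pair in (("independent", independent_pair), ("antithetic", antithetic_pair)):
    samples = [pair() for _ in range(5000)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    print(f"{name:12s}  estimate={mean:.4f}  variance of pair average={var:.2e}")
</syntaxhighlight>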

Stratification

Now let us briefly mention the idea behind stratified sampling, which is to split the domain D of X into separate regions, take a well-defined sample of points from each such region, and combine the results to estimate E(f(X)). Naturally, if every region receives some of the points, we should get an improved answer. Figure 1 shows two small examples of stratified domains. We may be able to do better still by oversampling within the important strata and undersampling those in which f is nearly constant (similar to the idea of importance sampling).
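A minimal sketch of stratified sampling on (0, 1), again with the hypothetical integrand exp(x): the domain is split into equal-width strata with the same number of points drawn from each, and the estimator's variance is compared with plain Monte Carlo at the same total sample size.
<syntaxhighlight lang="python">
# A minimal sketch of stratified sampling versus plain Monte Carlo
# for the integral of exp(x) over (0, 1).
import random
import math

def plain_mc(n):
    return sum(math.exp(random.random()) for _ in range(n)) / n

def stratified_mc(n_strata, per_stratum):
    total = 0.0
    for k in range(n_strata):
        lo = k / n_strata                          # left edge of stratum k
        for _ in range(per_stratum):
            u = lo + random.random() / n_strata    # uniform draw within the stratum
            total += math.exp(u)
    return total / (n_strata * per_stratum)

n_strata, per_stratum = 100, 10
methods = (("plain", lambda: plain_mc(n_strata * per_stratum)),
           ("stratified", lambda: stratified_mc(n_strata, per_stratum)))
for name, fn in methods:
    estimates = [fn() for _ in range(200)]
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    print(f"{name:10s}  estimate={mean:.4f}  estimator variance={var:.2e}")
</syntaxhighlight>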

Measuring variance reduction efficiency

Using variance reduction can sometimes bring tremendous improvements compared with plain Monte Carlo. It is not exceptional for the value σ² to be reduced a thousand times (or more). Unfortunately, it is also possible for a variance reduction technique to bring a very modest improvement, perhaps equivalent to reducing σ² by only 10%. What is worse, a few methods will raise σ² in unfavourable circumstances. The value of a variance reduction depends on more than the change in σ². It also depends on the computer's running time, perhaps the memory consumed, and, significantly, the human time taken to program and test the code. Suppose, for simplicity, that a baseline method is unbiased and estimates the desired quantity with variance σ0²/n, at a cost of n·c0, when n function evaluations are used. To get an error variance of τ² we need n = σ0²/τ², and this will cost c0σ0²/τ². Here we are assuming that cost is measured in time and that overhead cost is small. If an alternative unbiased method has variance σ1²/n and cost n·c1 under these conditions, then it will cost us c1σ1²/τ² to achieve the same error variance τ² that the baseline method achieved. The efficiency of the new method relative to the standard method is E = c0σ0²/(c1σ1²). At any fixed level of accuracy, the old method takes E times as much work as the new one. The efficiency has two factors, σ0²/σ1² and c0/c1. The first is a mathematical property of the two methods that we can often handle theoretically. The second is more complicated. It can depend heavily on the algorithms used for each method. It can also depend on details of the computing environment, including the computer hardware, operating system and implementation language. Note that numerical outcomes for c0/c1 obtained in one setting do not necessarily apply to another.
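As a purely hypothetical numerical illustration of the formula E = c0σ0²/(c1σ1²), with made-up costs and variances chosen only to show the arithmetic:
<syntaxhighlight lang="python">
# Hypothetical numbers: the new method is 2.5x as costly per evaluation
# but has a much smaller variance factor.
c0, sigma0_sq = 1.0, 4.0      # baseline method: cost per evaluation, variance factor
c1, sigma1_sq = 2.5, 0.1      # alternative method

E = (c0 * sigma0_sq) / (c1 * sigma1_sq)
print(f"efficiency E = {E:.1f}")                    # E = 16.0

# Work needed by each method to reach the same error variance tau^2:
tau_sq = 0.01
print("baseline cost:  ", c0 * sigma0_sq / tau_sq)  # 400.0
print("new method cost:", c1 * sigma1_sq / tau_sq)  # 25.0, i.e. 16 times less work
</syntaxhighlight>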
There is no settled guideline for how substantial an efficiency improvement must be to make it worth using. In some settings, such as rendering computer graphics for movies, where a large number of CPUs are kept busy for a considerable length of time, a 10% improvement (i.e., E = 1.1) brings important savings. In other settings, such as an occasional calculation, a 60-fold gain (i.e., E = 60), which turns a one-minute wait into a one-second wait, may not justify the expense of programming a more complicated solution.
Computation costs so much less than human effort that we commonly require substantial efficiency gains to offset the time spent programming a variance reduction. The impetus to seek out an efficiency improvement may only come when we find ourselves waiting a long time for a result, for instance when we have to put our whole Monte Carlo computation inside a loop representing many variations of the problem. A very slow computation costs more than just the computer's time. It may also waste the time of those waiting for the answer. In addition, slow computations reduce the number of alternatives that one can explore, leaving fewer options to choose from than would be preferable in many situations.
The efficiency gain needed to justify using a method is smaller if the programming effort can be amortized over many applications. The threshold is very high for a one-time program, lower for something that we are adding to our personal library, lower still for code to share with a few colleagues, and even lower for code to be put into a library or simulation tool for general use. It is important to note that all of the methods mentioned in the previous chapter are capable of a great range of efficiency improvements.

Recommended reading

https://web.maths.unsw.edu.au/~zdravkobotev/variancereductionCorrection.pdf
http://sas.uwaterloo.ca/~dlmcleis/s906/chapt4.pdf
https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-mathematical-foundations/variance-and-standard-deviation
https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice/variance-reduction-methods
https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice/introduction-quasi-monte-carlo
https://www0.gsb.columbia.edu/mygsb/faculty/research/pubfiles/4261/glasserman_yao_guidelines.pdf