Attractions in USA

On this page, I’ll try to list all the attractions that I’ve visited and found to be worth the money. I’ve tried to be as detailed as possible about what it costs to visit each place, how to find the best price, where to get parking, etc.

If you have maps that you would like to share or collaborate on, let me know, and I can add them here. That way, we can build a very good database of “attractions to visit in USA”. These will mostly include places that are of general interest to Indians, such as general attractions, temples, Indian or other restaurants that suit Indian taste, or anything else that you think deserves your money.

Attractions:

Most popular destinations in USA have a couple of common attractions. I list a few here:

  • Tall Buildings: Every city in USA will have a tall building, and it will be listed as the top tourist attraction. Don't fall for that. Once you have been up one in New York, or in any other city, there's no point wasting more money. They are all the same.
  • Museums: Museums are some of the most boring things to do in any city. But if you want to teach your kids a lesson (NOT literally, I meant torture them), then museums are the way to go. There are a few exceptional museums (one in Chicago, another in New York). Once you have done them, you are done.
  • Cruises: If the city has a lake or river, or is on the coastline, then cruises are usually the top attractions. These are worth the money, especially if the cruise is on the sea or a long river.
  • Big Bus rides: These are buses that you can ride to see the city. They have a guide, and usually you can hop on and hop off at various stops. So, they are a nice way to tour the city without having to drive and pay tons of money to park your car. They do save you money on parking, but it's not worth paying a high price for the bus ride itself, as they waste a lot of your time. As part of a pass (see below), they are OK.

Cities:

These are the top cities/places in USA:

  1. Puerto Rico (PR)
  2. San Francisco and Vicinity (Bay area)
  3. Miami
  4. New York
  5. Chicago
  6. Las Vegas
  7. Yellowstone Park

 


 

Passes:

One of the best ways to visit a city's attractions is through passes. If you are with kids and plan to show them around, passes are one of the cheaper options. These passes are offered by many companies, and they allow you to see a lot of included attractions for a fixed price. They generally save you 50% or more over paying for each attraction individually. There are 3 large companies that offer these passes for 30+ cities around the world (GoCity, The Sightseeing Pass and CityPass). They all have options for unlimited attractions for limited days (i.e. unlimited attractions allowed within a fixed number of consecutive days, usually 1-10 days), or limited attractions for weeks/months (i.e. a limited number of attractions, but with a longer duration of a few weeks to complete them). The 2nd option is usually more expensive, as the companies know that most visitors will be able to finish those attractions given so much time. All of them offer almost the same top attractions, but there are variations in the less popular attractions. Be sure to check the attractions list for all of them before jumping on one or the other. There are also passes offered by local companies for many cities, which are not that well advertised.

The passes are activated once you scan them at your first attraction. So, scan early in the morning on the day you want to start your pass. Since most attractions close by 4PM, to maximize your pass you should be at your first attraction by 8AM every day. That way you can cover 3-4 attractions/day, which will be worth the money you paid for your pass.

Sale: Most of these passes can be bought at a discount, as there are frequent sales. Groupon and Expedia usually have some discount on these passes from time to time. There's also nice cashback that you can get from cashback websites if you use these 2 portals to buy passes. I've found local passes too for many cities on these 2 portals. On top of that, the pass companies have sales on their own websites too. Usually, it's cheaper to buy on Groupon and Expedia. Compare prices from all 3 places (Groupon, Expedia and the company website) before making a purchase. The 3 big players and local companies all compete with each other in the most popular tourist destinations such as New York, San Francisco, Orlando, etc. So, you may get better deals in these cities. In other, less popular destinations, you may have only one or two of them offering passes, so passes will be more expensive due to less competition.

A few things to keep in mind when buying these passes:

  • A 10 day pass will cost about 2X of a 2 or 3 day pass. So, it's best to go with the max length pass (i.e. a 5 or 10 day pass is much more economical per day than a 3 day pass).
  • Look for coupons/sales on the internet. Also use cashback sites to get some more % back. If Groupon has the pass for the same price, buy from Groupon, as you get higher cashback. Expedia usually has lower cashback.
  • Prioritize visiting places that usually have high prices. That way you maximize your return (in case you can't visit everything in that time period).

Below are more details on each company:

  • GoCity: This is the best known and most popular one. They have sales from time to time where you can save 30%+ on their passes. Signing up for email saves you 5% more. They have passes for US cities (Las Vegas, New York, San Francisco, Chicago, Orlando, Miami, Los Angeles, etc), as well as for international destinations (London, Paris, Amsterdam, Rome, Sydney, etc).
  • The Sightseeing Pass: This is less known and seems to be a smaller company. They also have coupons and sales, so never pay full price for passes. They have passes for US cities (Las Vegas, New York, San Francisco, Orlando, Miami, Los Angeles, etc), as well as for international destinations (London, Malaga (Spain), etc).
    • Link => https://www.sightseeingpass.com
    • Groupon sale link => Could not find the landing page, though sales for The Sightseeing Pass exist on Groupon. Just search there.
  • CityPass: This is usually confused with "local company" passes, but it's a smaller company covering only US cities as of 2023 (they do have Toronto in Canada as an international destination). US cities covered are New York, San Francisco, Chicago, Orlando, Seattle, etc.
  • Local Pass: These are passes offered by local companies or the city itself. These passes are very competitive and sometimes offer better value than the 3 big companies. They are harder to find, but I'll list them in the individual city sections where they are available.

 


 

Maps:

If you need help creating maps with Google Maps, and then sharing them or collaborating with others, follow the link below for guidance.

Google maps help

 

Below is the map for various attractions in Austin TX, USA:


Austin, TX

 

In terms of attractions, there is not much in Austin that would make you plan a trip specifically for it.


 

Probability Distribution

We looked at the pdf (probability density function) earlier. A probability distribution can be either univariate or multivariate:

  1. Univariate: A univariate distribution gives the probabilities of a single random variable taking on various alternative values.
  2. Multivariate: A multivariate distribution gives the probabilities of a random vector (a set of two or more random variables) taking on various combinations of values.

Normal Distribution:

There are many kinds of univariate/multivariate distribution functions, but we'll mostly talk about the "Normal Distribution", aka "Gaussian distribution" (or bell-shaped distribution). The normal distribution is what you will encounter in almost all practical examples in semiconductors, AI, etc., so it makes sense to study it in detail. You can read about many other kinds of distributions via the Wikipedia links below:

1. Univariate normal distribution:

https://en.wikipedia.org/wiki/Normal_distribution

pdf function is:

f(x) = 1/(σ√(2π)) * exp(-1/2 * ((x-μ)/σ)^2) => Here μ=mean, σ=std deviation (or σ^2=variance). We divide by σ√(2π), so that the integral of f(x) over all x is 1.
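
As a quick sanity check of the pdf above, here is a minimal Python sketch (standard library only) that evaluates f(x) and numerically confirms that it integrates to about 1. The helper names normal_pdf and integrate, and the values of mu and sigma, are my own choices for illustration and are not from any library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at the point x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 2.0, 0.4

# Integrating over +/- 10 sigma captures essentially all of the area.
area = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
print(f"area under pdf = {area:.6f}")
```

Changing sigma changes the height and width of the bell, but the area stays at 1, which is exactly what the 1/(σ√(2π)) factor is for.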

The standard normal distribution is the simplest normal dist, with μ=0 and σ=1.

The way we write that a random variable X follows a normal distribution is via this notation:

X ~ N(μ, σ^2) => Here N means normal distribution, with the mean and variance given as its two parameters.

We often hear the terms 1σ, 2σ, etc. These refer to σ in the normal dist. If we draw the pdf for a normal distribution and calculate how many samples lie within +/- 1σ of the mean, we see that 68% of the values are within 1σ, or 1 std deviation. Similarly for 2σ it's 95%, while for 3σ it's 99.7%. 3σ is often loosely referred to as 1 out of 1000 being outside the range, i.e. 3σ is roughly taken as 99.9%, even though the computed value is 99.7% (about 1 out of 370 outside the range).

As 3σ is taken as 1 out of 10^3 or 10^-3, 4σ is taken as 10^-4, 5σ as 10^-5 and 6σ as a 10^-6 event. So, 6σ implies only a 1 out of 1M chance of the sample being outside the range. These are rough rules of thumb; the exact normal tail probabilities beyond 4σ, 5σ and 6σ are smaller than these figures. 6σ is used very commonly in industry. Many products have a requirement of 6σ defects, i.e. 1 ppm defects (only 1 out of 1M parts is allowed to be defective). In semiconductors, a 3σ defect rate is targeted for a lot of parameters.
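
The exact fraction inside +/- kσ can be computed from the error function, since P(|X-μ| < kσ) = erf(k/√2) for a normal random variable. The short Python sketch below prints this for k = 1 to 6; the function name coverage is my own, and math.erf is from the standard library.

```python
import math

def coverage(k):
    """P(|X - mu| < k*sigma) for a normal random variable X."""
    return math.erf(k / math.sqrt(2.0))

for k in range(1, 7):
    inside = coverage(k)
    print(f"{k} sigma: inside = {inside:.10f}, outside = {1.0 - inside:.3e}")
```

Comparing the printed "outside" values with the 10^-3 ... 10^-6 rule of thumb shows how rough that convention is.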

 

2. Multivariate normal distribution: It's a generalization of the one dimensional univariate normal dist to higher dimensions.

 https://en.wikipedia.org/wiki/Multivariate_normal_distribution

A random vector X = (X1, X2, ..., Xn) is said to have a multivariate normal distribution if every linear combination Y = a1*X1 + a2*X2 + ... + an*Xn of its components is normally distributed.

A multivariate normal dist is hard to visualize, and not that common. A more common case is the bivariate normal dist, which is a multivariate normal dist with dimension=2.

Bivariate normal distribution: Given 2 random variables X and Y, the bivariate pdf is:

f(x,y) = 1/(2π σx σy √(1-ρ^2)) * exp( -1/(2(1-ρ^2)) * [ ((x-μx)/σx)^2 - 2ρ((x-μx)/σx)((y-μy)/σy) + ((y-μy)/σy)^2 ] ) => Here μx, μy = means and σx, σy = std deviations (or σx^2, σy^2 = variances) of X and Y. We defined a new term rho (ρ), which is the Pearson correlation coefficient R b/w X and Y. It's the same Pearson coeff that we saw earlier in the stats section. rho (ρ) captures the linear dependence of Y on X. If Y is independent of X, then ρ=0, while if Y is completely (linearly) dependent on X, then ρ=1 (or ρ=-1 if Y decreases as X increases). We will see more examples of this later. We divide this expr by the complex looking term 2π σx σy √(1-ρ^2), so that the 2D integral of f(x,y) is 1.

2D plot of f(x,y): We will use gnuplot to plot these.

This is the gnuplot pgm (look in gnuplot section for cmds and usage). f_bi is the final func for Bivariate normal dist func.

gnuplot> set pm3d
gnuplot> set contour both
gnuplot> set isosamples 100
gnuplot> set xrange[-1:1]
gnuplot> set yrange[-1:1]
gnuplot> f1(x,y,mux,muy,sigx,sigy,rho)=((x-mux)/sigx)**2 - 2*rho*((x-mux)/sigx)*((y-muy)/sigy) + ((y-muy)/sigy)**2
gnuplot> f_bi(x,y,mux,muy,sigx,sigy,rho)=1/(2*pi*sigx*sigy*(1-rho**2)**0.5)*exp((-1/(2*(1 - rho**2)))*f1(x,y,mux,muy,sigx,sigy,rho))

1. f(x,y) with ρ=0 => Let's say we have a sample of people where X is their height and Y is their IQ. We don't expect to have any dependence between the two. So, here f(X) on the X axis is the pdf of people's height, which is a 1D normal distribution around some mean. Similarly f(Y) on the Y axis is the pdf of people's IQ, which is again a 1D normal distribution around some mean. If we plot a 2D pdf of this, then we are basically multiplying the probability of X with the probability of Y to get the probability at point (x,y). Superimposing f(X) and f(Y) gives a contour that is a circle (when both axes are scaled by their σ), because X=mean+sigma and X=mean-sigma yield the same distribution for Y: the probability of Y doesn't change based on what X is. In fact this is the property and definition of independence => if f(x,y)=f(x).f(y), then X and Y are independent. We can see that setting ρ=0 yields that. Below is the gnuplot function and the plot

gnuplot> splot f_bi(x,y,0,0,0.4,0.4,0.0)
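
The independence property f(x,y)=f(x).f(y) for ρ=0 can also be checked numerically. The Python sketch below mirrors the gnuplot function f_bi with the same parameters as the splot command above; the function names and the test point (0.3, -0.2) are my own arbitrary choices.

```python
import math

def normal_pdf(x, mu, sigma):
    """1D normal density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bivariate_pdf(x, y, mux, muy, sigx, sigy, rho):
    """Bivariate normal density; mirrors the gnuplot function f_bi."""
    zx = (x - mux) / sigx
    zy = (y - muy) / sigy
    q = zx * zx - 2.0 * rho * zx * zy + zy * zy
    norm = 2.0 * math.pi * sigx * sigy * math.sqrt(1.0 - rho * rho)
    return math.exp(-q / (2.0 * (1.0 - rho * rho))) / norm

# Arbitrary test point; means and sigmas match the splot command.
x, y = 0.3, -0.2
joint = bivariate_pdf(x, y, 0.0, 0.0, 0.4, 0.4, 0.0)
product = normal_pdf(x, 0.0, 0.4) * normal_pdf(y, 0.0, 0.4)
correlated = bivariate_pdf(x, y, 0.0, 0.0, 0.4, 0.4, 0.5)

print("rho=0 joint pdf      :", joint)
print("product of 1D pdfs   :", product)
print("rho=0.5 joint pdf    :", correlated)
```

With ρ=0 the joint value equals the product of the two 1D pdfs; with ρ=0.5 it does not, which is the numerical signature of dependence.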

2. f(x,y) with ρ=0.5 => Here we can consider the example of the same people as above, but plot weight Y vs height X. We hope to see some correlation. What this means is that pdf(Y) varies depending on which point X is chosen. So, if we are at X=mean, then pdf(Y) has some shape, and if we choose X=mean + sigma, then pdf(Y) has some other shape (but both shapes are normal). So, pdf(Y) plotted independently on the Y axis as f(Y) is for a particular X. We have to find pdf(Y) for each value of X, and then draw the 2D plot for all such X. This data is going to come from field observation, and the 2D plot that we get will determine what the value of ρ is. Here the contour plot starts becoming an ellipse instead of a circle. You can find proofs on the internet that this eqn indeed becomes an ellipse (a circle is a special case of an ellipse, where the major and minor axes are the same). There is one such proof here: https://www.michaelchughes.com/blog/2013/01/why-contours-for-multivariate-gaussian-are-elliptical/

In this case, when we draw pdf(X) and pdf(Y) on the 2 axes, these are the marginal pdfs, which look the same as in case 1 above whatever the value of ρ. You can think of pdf(X) as the pdf of height X irrespective of what the weight Y is. Of course the pdf of height X is different for different weights Y, but we are drawing the global pdf distribution, the same as we drew in case 1 above. Similarly we do it for the pdf of weight Y. So, remember this distinction - the pdf plots on the X and Y axes in case 2 are still the pdf plots from case 1 above. Only when we start plotting the 2D points do we know whether it's an ellipse or a circle, which gives us the value of ρ.

gnuplot> splot f_bi(x,y,0,0,0.4,0.4,0.5)

 

3. f(x,y) with ρ=0.95 => Here correlation goes to the extreme. We can consider the example of the same people as above, but with the Y axis as "score in Algebra2" and the X axis as "score in Algebra1". We hope to see very strong correlation, as someone who scores well in Algebra1 will have a high probability of scoring well in Algebra2. Similarly someone who scored badly in Algebra1 will have a high probability of scoring badly in Algebra2 as well. The plot here starts becoming a narrow ellipse, and in the extreme case of ρ=1 becomes a 1D slanted copy of the pdf of X. What that means is that Y doesn't even have a distribution given X, i.e. if we are told that X=57 is the score, then Y is fixed to be Y=59 => Y doesn't have a distribution anymore given a particular X. In real life, Y will likely have a distribution, for ex. from Y=54 to Y=60 (-3σ to +3σ range). This data is again going to come from field observation.


gnuplot> splot f_bi(x,y,0,0,0.4,0.4,0.95)

 

Let's see the example in detail once again for all values of ρ => If there are 5 kids with Algebra1 scores of (8,11,6,9,10) at -3σ, then if we go and look at Algebra2 score of these 5 kids, that will tell us the value of ρ. If scores in Algebra2 are all over the place from 0 to 100 (i.e 89, 9, 50, 75, 32) then we have no dependence and 2D contour plot looks like circle. However, if we see Algebra2 scores for these 5 kids are in narrow range as (7,10, 6,11,12), then this has a high dependence and the 2D contour plot looks like a narrow ellipse. This indicates a high value of ρ.
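
To make this concrete, the Python sketch below computes the sample Pearson coefficient for the 5 Algebra1 scores against each of the two sets of Algebra2 scores quoted above. The function name pearson is my own; the formula is the standard sample correlation (covariance divided by the product of the std deviations).

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Scores taken from the example in the text.
algebra1 = [8, 11, 6, 9, 10]
scattered = [89, 9, 50, 75, 32]
narrow = [7, 10, 6, 11, 12]

r_scattered = pearson(algebra1, scattered)
r_narrow = pearson(algebra1, narrow)
print(f"scattered Algebra2 scores: r = {r_scattered:.3f}")
print(f"narrow Algebra2 scores   : r = {r_narrow:.3f}")
```

The narrow set gives r of roughly 0.81. The scattered set gives roughly -0.56 rather than 0, which is a reminder that with only 5 points the sample estimate of ρ is very noisy; a real estimate needs many more observations.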

Also, we observe that as ρ goes from ρ=0 (plot 1) towards ρ=1 (plot 3), the circle starts moving inwards and gets squeezed into an ellipse. So points with some probability for plot 1 (let's say 0.01 is the combined pdf for point A on the circle) have moved inwards for the same probability for plot 2, and further in for plot 3. Also, the height of the 3D plot goes up, as the total volume under the pdf has to remain 1 for any ρ. It's not too difficult to visualize this. Consider a (-3σ, -3σ) point on the X and Y axes. This point has a probability of 0.003*0.003 ≈ 0.00001 for plot 1 where ρ=0 (i.e. X and Y are independent). Now with ρ=1 (plot 3), the -3σ point on the X axis has a probability of 0.003, but the -3σ point on the Y axis has a probability of 1 (since with full correlation, Y has 100% probability of being at -3σ when X is at -3σ). So, the probability of the (-3σ, -3σ) point is 0.003*1=0.003. So, this point now moves inward into the ellipse. The original point of 0.00001 probability is not the (-3σ, -3σ) point anymore. The (-4σ, -4σ) point now lies closer to that original point, since its probability is 10^-4*1=0.0001. Even this is higher. Maybe something more like the (-4.5σ, -4.5σ) point lies on that point. So, we see how the correlation factor moves the σ points inwards.

2D plot for different samples:

In all the above plots we considered a sample of people and plotted different attributes of the same sample of people. However, if we are plotting attributes of different samples, then it gets tricky. For ex, let's say we plot height of women vs height of men. What does it mean? Given the pdf of height of men and the pdf of height of women, what does the combined pdf mean? Does it mean => given men of height=5ft with prob=0.1, and women of height=4ft with prob=0.2, what is the combined probability of finding men of height=5ft AND women of height=4ft? The best we can say is that they are independent, and so combined prob=0.1*0.2=0.02. So, we expect to see a plot similar to plot 1 above (with ρ=0). But how do we get field data for this sample to draw a 2D plot? Do we choose a man, and then choose a woman? The combined 2D pdf doesn't make sense, as men and women are 2 different samples.

However, we know that in a population where people are shorter, both men and women tend to be shorter, and in a population where people are taller, both men and women tend to be taller. So, if we take a sample of people where men's height varied from 6ft to 7ft, and plotted women's height from that community, we might see that their height varies from 5.5ft to 6.5ft. Similarly, for a population where men's height varied from 5ft to 6ft, if we plotted women's height from that community, we might see that their height varies from 4.5ft to 5.5ft. These are local variations within a subset, instead of global variation. If we take all of these local plots and combine them into a global plot, then we can get the dependence data. They suggest some correlation. If we plot all of these on our 2D plot, we may see that ρ≠0. We will see an ellipse instead of a circle for the iso contours of this 2D plot. These kinds of plots are very common in semiconductors, as we will see later.

Properties: A lot of cool properties of normal distribution appear, if we take the random variables to be independent, i.e ρ=0. Let's look at some of these properties:

Sum of independent Normal RV: The result below is true ONLY for independent RVs. If the RVs have ρ≠0, the variance formula below no longer holds, because covariance terms appear.

If X1, X2, ..., Xn are independent normal random variables with means μ1, μ2, ..., μn and standard deviations σ1, σ2, ..., σn, then their sum X1+X2+...+Xn is also normally distributed, with mean μ1 + μ2 + ... + μn and variance σ1^2 + σ2^2 + ... + σn^2.

A proof exists here: https://online.stat.psu.edu/stat414/lesson/26/26.1
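
The mean and variance of the sum can also be checked by simulation. The Python sketch below draws two independent normal variables many times and compares the sample mean and variance of their sum with μ1+μ2 and σ1^2+σ2^2. The parameter values, the seed and the sample count are hypothetical choices for illustration; this is a numerical check, not a proof.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical parameters for two independent normal variables.
mu1, sigma1 = 1.0, 3.0
mu2, sigma2 = -2.0, 4.0
n_samples = 100_000

# Each sample of the sum uses two independent draws.
sums = [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n_samples)]

sample_mean = sum(sums) / n_samples
sample_var = sum((s - sample_mean) ** 2 for s in sums) / n_samples

print(f"sample mean = {sample_mean:.3f}, expected {mu1 + mu2}")
print(f"sample var  = {sample_var:.3f}, expected {sigma1**2 + sigma2**2}")
```

Note that it is the variances that add (9 + 16 = 25), not the std deviations (3 + 4 = 7); the std deviation of the sum is 5.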

 

 

 

Probability & Statistics

These 2 go together. Probability is the basic foundation for statistics. Basic knowledge of it is needed in a couple of things that we do in AI and in VLSI.

Basic probability:

https://en.wikipedia.org/wiki/Probability_theory

Probability is a number from 0 to 1 => 0 means 0% probability and 1 means 100% probability. The probability of an event is represented by the letter "P" => P(event). The sum or integral of the probabilities of all possible outcomes will always be 1.

Discrete Probability Distribution: This is for events that are countable, i.e. throwing a die, tossing a coin, etc.

P(X) = 0.4 => Probability of event "X" happening is 40%.

If we roll a die, then the probability of any particular number from 1 to 6 showing up is 1/6. P(dice=1)=1/6, P(dice=6)=1/6

Continuous Probability Distribution: This is for events that occur in continuous space, i.e. temperature of water, etc.

PDF: Probability density function: When the outcome is continuous, then instead of having a discrete probability number for each outcome, we have a continuous probability function. This is called the pdf. The integral of the pdf over all possible outcomes will be 1 (just as in the discrete case, where the sum was 1).

P(x1<x<x2) = ∫ f(x)dx, where f(x) is the pdf, and the integral is taken over the limits x1 to x2

Factorial:

The factorial of a positive integer is defined as the product of all positive integers less than or equal to that number. It's denoted by the ! mark. So, 3!=3*2*1. 1!=1. n!=n*(n-1)...*2*1

n! = n*(n-1)!.

We define 0! as 1, as that keeps it consistent with other mathematical formulas used in Permutation and Combination shown below. It seems like 0! should be 0, but keeping it 1 allows it to blend nicely with Permutation formula for non-zero numbers.  We'll see that below.

Permutation and Combination:

The most important concept related to probability is figuring out all outcomes that are asked for a given event and divide it by all possible outcomes. As an ex, if probability of getting a 7 on throwing 2 dice is to be calculated, we can calculate as follows:

Number of ways 7 is possible E(sum=7)= (1,6), (2,5), (3,4), (4,3), (5,2) and (6,1) = 6 ways

Total number of possibilities of any number E(any sum) = 6 possibilities of 1st dice (1..6) * 6 possibilities of 2nd dice (1..6) = 6*6 = 36 ways

So, probability of getting 7 = E(sum=7)/E(any sum) = 6/36 = 1/6
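
The same counting can be done by brute force. This Python sketch enumerates all outcomes of 2 dice and counts the ones that sum to 7; the variable names are my own.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes asked for by the event "sum is 7".
favourable = [roll for roll in outcomes if sum(roll) == 7]

p_seven = Fraction(len(favourable), len(outcomes))
print(favourable)
print(f"{len(favourable)} / {len(outcomes)} = {p_seven}")
```

Changing the target sum in the filter gives the probability of any other total the same way.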

Another general probability question is when we have to choose few things out of a given set of things, and we want to know of all different ways of doing it. This is where Permutation/Combination comes in. There are 2 handy formulas that we can use.

This link has very good explanation with the formula at the end: https://www.mathsisfun.com/combinatorics/combinations-permutations.html

  1. Permutation: Here order matters in arranging (P means Position, easy way to remember). ex is 4 digit gate lock code. It's a unique 4 digit code, so order of number matters (i.e 4756 is different than 4567).
    • Repetition not allowed: Given n things, if we have to choose r things, then total number of permutations possible = n*(n-1)*...*(n-r+1) = n!/(n-r)! where ! represents factorial. It's rep as nPr = n!/(n-r)!
      • ex: If we have to choose 3 balls out of 5 different colored balls, it can be done in 5*4*3=5!/2!=60 ways. If we have to choose 1 ball, then it's 5!/4!=5 ways (as any of the 5 colored balls can be chosen). If we have to choose 5 balls, then we can do it in 5*4*3*2*1=120 ways. So, 5!/(5-5)! = 5!/0!= 120 ways. The only way this could be 120 is if we define 0! as 1. That's why we saw in the factorial section that 0!=1. If we have to choose 0 balls, then we can do it in 1 way (as there's nothing to choose, the empty set is chosen, which is just one way of choosing). So, it's 5!/(5-0)! = 5!/5! = 1
    • Repetition allowed: Given n things, if we have to choose r things, then total number of permutations possible = n^r
  2. Combination: Here order doesn't matter in arranging. ex is choose 3 socks out of a bag full of socks of different colors. The order in which you take the socks out doesn't matter, as we are concerned only about what color the socks are (i.e "red green blue" is no different than "blue green red").
    • Repetition not allowed: Given n things, if we have to choose r things, then we saw the total number of permutations possible in the above permutation case: It's nPr = n!/(n-r)!. We can easily see that for each set of "r distinct things", we have r! possible permutations. These all need to be grouped into one possibility, since we don't care about different permutations anymore. We are just interested in each such group of "r distinct things". So, we can divide the result by r! to get all combinations possible. So, number of combinations are = n!/((n-r)!*r!). It's rep as nCr = n!/((n-r)!.r!).
    • Repetition allowed: Given n things, if we have to choose r things where things can be repeated and order doesn't matter, then it's not as straight forward as the permutation case. An ex of this would be choosing 3 scoops from 5 flavors of ice cream, where each scoop can be any flavor. How many such combinations exist. One way to solve it would be divide it in 3 diff cases:
      • case 1: all 3 flavors are same => 5 such combinations possible.
      • case 2: 2 flavors are same. Here for each duplicated flavor, the remaining flavor can be one of the 4 remaining ones, so 4 possibilities. But the duplicated flavor itself can be any of 5 types, so total possibilities = 5*4=20
      • case 3: all 3 flavors are different. This is case of "repetition not allowed for combination" which is nCr = 5C3 = 5!/(3!*2!)=10
      • So, total number of combinations possible = 5+20+10=35
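      • However, we can solve it another way, suggested in the link above. We can put a circle for each flavor selected and an "x" for each time we move to the next flavor. So, "x" serves to separate out diff flavors. In this ex, we'll have exactly 3 circles and 4 "x". So, basically we are looking for all ways of arranging 3 circles in 7 positions, which is 7C3 = 7!/(3!*4!) = 35. This matches the count from the 3 cases above. In general, the number of combinations with repetition is (n+r-1)Cr = (n+r-1)!/(r!*(n-1)!).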
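
The four cases above (permutation or combination, with or without repetition) can be checked with a few lines of Python. math.perm and math.comb need Python 3.8 or later. The lock-code line assumes the code uses the 10 digits 0-9, which is my assumption; the other numbers come from the examples in the text.

```python
import math
from itertools import combinations_with_replacement

# Permutation, no repetition: pick 3 of 5 different colored balls, order matters.
perm_no_rep = math.perm(5, 3)

# Permutation with repetition: 4 digit lock code, assuming digits 0-9 (n^r).
perm_rep = 10 ** 4

# Combination, no repetition: pick 3 of 5, order doesn't matter.
comb_no_rep = math.comb(5, 3)

# Combination with repetition: 3 scoops from 5 flavors, by brute-force enumeration.
scoops = list(combinations_with_replacement(range(5), 3))
comb_rep = len(scoops)

# Same count from the circles-and-x argument: 3 circles in 3 + (5 - 1) = 7 positions.
comb_rep_formula = math.comb(5 + 3 - 1, 3)

print(perm_no_rep, perm_rep, comb_no_rep, comb_rep, comb_rep_formula)
```

The brute-force enumeration and the circles-and-x formula agree, which is a handy way to double check which formula applies to a given problem.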

Problems: Permutation + Combination

One of the biggest confusions in solving permutation/combination problems is to figure out whether the problem is a permutation problem or a combinatorial one. Many times it's not clear, and sometimes it's a mix of the 2. We'll look at some common problems below.

  1. Permutation of identical things: Let's say we have 2 balls, they may be identical or different. We have 5 slots arranged in a line, in which we need to put these balls, with only one ball going into 1 slot. How many ways can we arrange this?
    • 1st case: different balls: Let's say the balls are of 2 colors = red and blue. The first ball has 5 places to go, while the 2nd ball has 4 places to go, so the total is 5*4=20 possibilities. This is a simple permutation problem. The 3 remaining slots stay empty, and there's no permutation possible amongst the 3 empty slots, as empty slots are identical. The formula is nPr, where we are placing r different things in n slots.
    • 2nd case: identical balls: Now let's say the 2 balls are identical. So, now we have to see in how many ways can this pair of 2 balls be arranged. The 1st ball can go in 5 places, just like before. 2nd ball can still go in 4 places, but many of the cases are now repeated, as balls are identical.
      • So, let's solve it other way as shown below.
        • Keeping 1st ball in position 1, we have 4 positions for ball 2. => Total ways = 4 ways
        • Keeping 1st ball in position 2, we have 3 positions for ball 2. => We can't put 2nd ball in position 1, as we already covered that case above. Total ways = 3 ways
        • Keeping 1st ball in position 3, we have 2 positions for ball 2. => We can't put 2nd ball in position 1 or position 2, as we already covered those cases above. Total ways = 2 ways
        • Keeping 1st ball in position 4, we have 1 position for ball 2. => On similar lines as above, other cases are already covered. Total ways = 1 way
        • Keeping 1st ball in position 5, we have no more unique places left for ball 2 to go. So Total = 0 ways.
        • So, In total, we get => 4+3+2+1 = 10 ways for the pair of 2 balls to go.
      • There's one other way to solve it. Since we are choosing 2 balls, and the order of the balls doesn't matter (since they are identical), we have to cut down the 20 permutations that we had with the red and blue balls, since red and blue are the same color now. So, this is a "combination" problem, where the order doesn't matter. So, we can divide 20 by 2!, since 2! is the number of ways that the red+blue balls were permuted amongst each other. So, total number of ways = 10. This way is a lot easier to understand.
      • So, our formula for this permutation problem uses the combinatorial part too. In short, when we have groups of r1, r2, ..., rk identical things making up a total of r things (r1+r2+...+rk = r), and n slots, the formula is:
        • nPr / ((r1!)(r2!)...(rk!)) => We just took the permutation part and divided it by the factorial of how many identical things we have in each group.
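
The 2-balls-in-5-slots example can be verified by enumeration, and the formula above can be wrapped in a small helper. In this Python sketch the function name arrangements and its argument names are my own, hypothetical choices.

```python
import math
from itertools import permutations

slots = range(1, 6)

# Different balls: ordered pairs (red_slot, blue_slot).
distinct = list(permutations(slots, 2))

# Identical balls: only the set of occupied slots matters.
identical = {frozenset(pair) for pair in distinct}

def arrangements(n_slots, group_sizes):
    """nPr / (r1! * r2! * ... * rk!), where r is the sum of the group sizes."""
    r = sum(group_sizes)
    total = math.perm(n_slots, r)
    for size in group_sizes:
        total //= math.factorial(size)
    return total

print(len(distinct), len(identical))
print(arrangements(5, [1, 1]), arrangements(5, [2]))
```

Here [1, 1] means two groups of one ball each (red and blue), and [2] means one group of two identical balls.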

 

Basic Statistics:

https://en.wikipedia.org/wiki/Mathematical_statistics

Statistics is widely used in AI. There is a channel called "StatQuest" on YouTube that I found very helpful for learning basic statistics:

https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw

For any sample X, where x1, x2, ..., xn are the individual samples in X, we define various terms that are very important in stats. Let's review these terms:

  • Mean(X) = 1/n * ∑ X  => Mean or average of any sample is sum of all values divided by number of sample points.
  • Variance(X) = 1/n * ∑ (X-Xmean)^2 => variance measures how far values are scattered from their mean value, on average. So, if X values are close to each other, Var(X) is small. Std deviation is just the square root of variance, i.e. std_deviation(X) = σ(X) = √ Variance(X). So, variance(X) = σ^2(X). Std deviation is more helpful in practical scenarios, since it represents variation from the mean, and NOT the square of variation from the mean.
  • Covariance(X,Y) = 1/n * ∑ ( (X-Xmean) * (Y-Ymean) ) => covariance measures the joint variance of X and Y (Y is the corresponding value for a given X, for a set of n samples). Covariance is large and positive when X,Y move together, and is negative when they move opposite to each other. Covariance is close to 0 when there is no relation b/w X and Y (i.e. X,Y are scattered all over the place). Covariance measures the relationship b/w 2 different sets of data, i.e. are they related in some way or are they totally unrelated.
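
These three definitions translate directly into code. The Python sketch below uses the same 1/n form as above; the function names and the small data set are hypothetical, chosen so that Y is exactly 2*X.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Average squared distance from the mean (1/n form)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Average product of deviations from the two means (1/n form)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: ys moves exactly together with xs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

print("mean     :", mean(xs))
print("variance :", variance(xs))
print("std dev  :", math.sqrt(variance(xs)))
print("cov(X,Y) :", covariance(xs, ys))
print("cov(X,-Y):", covariance(xs, [-y for y in ys]))
```

Flipping the sign of Y flips the sign of the covariance, matching the "move opposite" case described above.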

 

Central Limit Theorem (CLT):

It is one of the great results of mathematics. It's used both in probability and statistics. It's not going to be used anywhere in our material, but it's good to know. It establishes the importance of the "Normal Distribution". The theorem is stated in the link below:

https://en.wikipedia.org/wiki/Central_limit_theorem
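
A small simulation gives a feel for what the theorem says: averages of many independent draws from a non-normal distribution behave like a normal distribution. The Python sketch below averages n uniform(0,1) draws, standardizes the average, and checks how often it falls within 1σ and 2σ. The values of n, trials and the seed are my own arbitrary choices; this is an illustration, not a proof.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

n = 30          # draws averaged per trial
trials = 20000  # number of averages we compute

# Mean and std deviation of a single uniform(0,1) draw.
mu = 0.5
sigma = math.sqrt(1.0 / 12.0)

def standardized_mean():
    """Average of n uniform draws, rescaled to mean 0 and std deviation 1."""
    avg = sum(rng.random() for _ in range(n)) / n
    return (avg - mu) / (sigma / math.sqrt(n))

zs = [standardized_mean() for _ in range(trials)]
within_1 = sum(abs(z) < 1 for z in zs) / trials
within_2 = sum(abs(z) < 2 for z in zs) / trials

print(f"within 1 sigma: {within_1:.3f} (normal: 0.683)")
print(f"within 2 sigma: {within_2:.3f} (normal: 0.954)")
```

Even though each individual draw is uniform (flat, not bell shaped), the fractions come out close to the 68% and 95% figures of the normal distribution.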