# One-Dimensional Statistical Tools

When we're presented with raw data from a set of observations, we need to be able to systematically quantify its various properties.
This is the purpose of most one-dimensional descriptive statistics:

- Quantify the "average" value present (i.e. the mean)

$$ \bar{x} := \frac{1}{N} \sum_{i=1}^{N} x_i $$

- The "center" or "middle value" (i.e. the median)
- The value(s) that appear most frequently (i.e. the mode)
- How broad/extreme are the observations (i.e. min/max, these can be at odds with the above)
- The "spread" of the data, how varied is it?
- Variance is:

$$ var(X) := \sigma^2_x := \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 $$

- Standard deviation is then:

$$ std(X) := \sqrt{\sigma^2_x} $$

- Covariance is similar to the variance, but instead of the average squared distance to the mean, it is the average product of the differences from their respective means:
$$ cov(X,Y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x}) (y_i - \bar{y}) $$
Note that covariance generalizes variance in the sense that
$$ var(X) = cov(X,X) $$
- Correlation between two sample populations/events/processes is a measure of their relationship. It is typically given by a 'correlation coefficient' in the range [-1, 1]. A positive correlation means that when the value of the first process/observation is higher, the other tends to be higher too; i.e. they increase together. A negative correlation implies an inverse relationship: when one grows, the other shrinks. To define it, we normalize the covariance between the samples by their standard deviations:
$$ \sigma_{x,y} := \frac{cov(X,Y)}{\sigma_x \sigma_y} $$
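
As a quick sanity check of the formulas above, here is a minimal pure-Python sketch that evaluates each definition on two small samples. The data values and variable names are made up for illustration and are not part of the original notes.

```python
import math

# Two small, made-up samples of equal length (illustrative only)
x = [3.0, 5.0, 7.0, 9.0]
y = [1.0, 4.0, 4.0, 9.0]
n = len(x)

# Sample means
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample variance and standard deviation (note the N - 1 denominator)
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
std_x = math.sqrt(var_x)
std_y = math.sqrt(var_y)

# Sample covariance: average product of deviations from the respective means
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Correlation coefficient: covariance normalized by both standard deviations
corr_xy = cov_xy / (std_x * std_y)

print(mean_x, var_x, cov_xy, corr_xy)
```

With NumPy, `np.var(x, ddof=1)`, `np.std(x, ddof=1)`, `np.cov(x, y)` and `np.corrcoef(x, y)` should reproduce these numbers. Note that `np.var` and `np.std` default to `ddof=0`, i.e. they divide by N rather than N-1 unless told otherwise.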

## First, some discussion of using Python modules

In Python, we want to be able to import not only nice built-in libraries, but also code we wrote ourselves!
This is similar to being able to `#include` other `.cpp` files in `c++`.

We do this with Python `modules`, which for us will just mean single files with function definitions and utility variables inside.

Idea:
- Write your Python tools, functions, etc. in some file that ends in `.py`, e.g. `stats.py`
- If the file sits in the same directory as your current Python file or notebook, you can simply `import` it by name (without the `.py`), e.g. `import stats`
- This then loads all the constituent definitions into a scope named by the import, e.g. a function called `mean` defined in `stats.py` will be accessible via `stats.mean`
- NB: in a notebook or other kernel-based environment, after editing the module you either have to unload and reload it to refresh its contents or restart your kernel (unloading stuff left as exercise)
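
The reload step mentioned in the last bullet can be sketched with the standard library's `importlib.reload`. The example below is only an illustration under stated assumptions: it writes a throwaway module named `mystats` (a hypothetical name, chosen to avoid clashing with a real `stats.py`) into a temporary directory, imports it, edits the file, and reloads it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so this quick demo always re-reads the source file
sys.dont_write_bytecode = True

# Create a throwaway module on disk; "mystats" is a hypothetical name
module_dir = Path(tempfile.mkdtemp())
module_path = module_dir / "mystats.py"
module_path.write_text(
    "def mean(x):\n"
    "    return sum(x) / len(x)\n"
)

# Make the directory importable, then import the module by file name
sys.path.insert(0, str(module_dir))
import mystats

first = mystats.mean([3, 5, 7, 9])

# Simulate editing the file: add a second (hypothetical) function
module_path.write_text(
    "def mean(x):\n"
    "    return sum(x) / len(x)\n"
    "\n"
    "def spread(x):\n"
    "    return max(x) - min(x)\n"
)

# Re-execute the module's source so the new definition becomes visible
importlib.reload(mystats)
second = mystats.spread([3, 5, 7, 9])

print(first, second)
```

In a notebook, the equivalent would be running `importlib.reload(stats)` in a cell after saving your changes to `stats.py`. Names brought in earlier with `from stats import mean` keep pointing at the old function until that import is run again.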

In [1]:
# every variable, function, etc. in the top-level scope is now here(!) 
# and located under the "stats" scope.
# When you import, it essentially does `python stats.py` and stores the definitions/variables
import numpy as np
import stats

# fancier imports, if you dislike the stats.XXX style we can import and load into the global scope
#from stats import mean # only imports the mean function but puts it in the global scope
#from stats import * # to get everything in the global scope

In [2]:
# with `import stats`
stats.mean(np.array([3,5,7,9]))

# with `from stats import mean`
#mean(np.array([3,5,7,9]))

6.0

In [3]:
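# Uncommenting this would rebind `len` to a string and shadow the built-in function
# len = 'hello'

In [4]:
# Likewise, defining our own `len` would shadow the built-in
# def len(x):
#     print('ha')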

In [5]:
len([1,2,4,5])

4

In [6]:
stats.median([2,5,4,3,1])

3

In [7]:
stats.median([2,5,4,3])

3.5

In [8]:
sum([1,2,32,4,5])

44

In [9]:
from collections import Counter

In [10]:
xs = [1,2,3,4,3,2,3,1,2,1,1,2,3,2]

In [11]:
counts = Counter(xs)

In [12]:
counts

Counter({1: 4, 2: 5, 3: 4, 4: 1})

In [13]:
counts.values()

dict_values([4, 5, 4, 1])

In [14]:
max(counts.values())

5

In [15]:
counts.items()

dict_items([(1, 4), (2, 5), (3, 4), (4, 1)])

In [16]:
[x[0] for x in counts.items() if x[1] == max(counts.values())]

[2]

In [17]:
x = []
for val, count in counts.items():
    if count == max(counts.values()):
        x.append(val)  # equivalent to: x += [val]
x

[2]
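
The steps in the cells above can be collected into a small helper. This is a minimal sketch, not necessarily how `stats.py` does it; the function name `mode` and the choice to return a list of all tied values are assumptions.

```python
from collections import Counter


def mode(values):
    """Return a list of the most frequent value(s) in `values`."""
    counts = Counter(values)
    # Compute the highest frequency once instead of on every loop iteration
    highest = max(counts.values())
    return [val for val, count in counts.items() if count == highest]


xs = [1, 2, 3, 4, 3, 2, 3, 1, 2, 1, 1, 2, 3, 2]
print(mode(xs))             # [2]
print(mode([1, 1, 2, 2]))   # [1, 2], both values are tied
```

Returning a list keeps ties visible rather than silently picking one value. An empty input makes `max` raise a `ValueError`, which this sketch does not handle.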

In [18]:
sorted([1,2,3,4,3,2,3,1,2,1,1,2,3,2])

[1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4]

In [19]:
len([1,2,3,4,3,2,3,1,2,1,1,2,3,2])

14
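
Sorting the data and knowing its length are the two ingredients needed for a median. The sketch below is one common way to combine them; it is an assumption about how a function like `stats.median` could be written, not the actual contents of `stats.py`.

```python
def median(values):
    """Return the middle value of `values` (mean of the two middle values if the length is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd length: a single middle element exists
        return ordered[mid]
    # Even length: average the two elements around the middle
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 5, 4, 3, 1]))  # 3
print(median([2, 5, 4, 3]))     # 3.5
```

For the 14-element list above, the two middle entries of the sorted data are both 2, so this sketch returns 2.0.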

In [20]:
help(stats.mean)

Help on function mean in module stats:

mean(x)
    calculate and return the mean of a numpy array

