{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["<div style='background-image: url(\"../share/images/header.svg\") ; padding: 0px ; background-size: cover ; border-radius: 5px ; height: 250px'>\n", "    <div style=\"float: right ; margin: 50px ; padding: 20px ; background: rgba(255 , 255 , 255 , 0.7) ; width: 50% ; height: 150px\">\n", "        <div style=\"position: relative ; top: 50% ; transform: translatey(-50%)\">\n", "            <div style=\"font-size: xx-large ; font-weight: 900 ; color: rgba(0 , 0 , 0 , 0.8) ; line-height: 100%\">Signal Processing</div>\n", "            <div style=\"font-size: large ; padding-top: 20px ; color: rgba(0 , 0 , 0 , 0.5)\">Spectral Analysis + Preprocessing - Solution</div>\n", "        </div>\n", "    </div>\n", "</div>"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Seismo-Live: http://seismo-live.org\n", "\n", "##### Authors:\n", "* Stefanie Donner ([@stefdonner](https://github.com/stefdonner))\n", "* Celine Hadziioannou ([@hadzii](https://github.com/hadzii))\n", "* Ceri Nunn ([@cerinunn](https://github.com/cerinunn))\n", "\n", "\n", "---"]}, {"cell_type": "markdown", "metadata": {}, "source": ["<h1>Basics in spectral analysis + preprocessing</h1>\n", "<br>"]}, {"cell_type": "code", "execution_count": null, "metadata": {"code_folding": [0]}, "outputs": [], "source": ["# Cell 0 - Preparation: load packages, set some basic options  \n", "%matplotlib inline\n", "import obspy\n", "from obspy.signal.invsim import cosine_taper \n", "from obspy.signal.filter import lowpass\n", "from obspy.clients.fdsn import Client\n", "from obspy.core import UTCDateTime\n", "from matplotlib import rcParams\n", "import numpy as np\n", "import matplotlib.pylab as plt\n", "plt.style.use('ggplot')\n", "plt.rcParams['figure.figsize'] = 15, 7\n", "rcParams[\"figure.subplot.hspace\"] = (0.5)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## The sampling theorem\n", "\n", "In the digital world, we have neither continuous nor periodic signals. We always work with sampled signals. Therefore, we need to define some terms characterising the sampling:   \n", "* The *sampling interval* is the temporal distance $\\Delta t$ between the single sample values. The position on the time axis of one sample $k$ of the signal is thus given as $t = k\\Delta t$ with $k=[1,2, ..., N]$ and $N$ being the maximum number of samples within the signal.  \n", "* The *period* of the signal is then given as $T = N\\Delta t$.\n", "* The *sampling rate* - also called the *sampling frequency* - is $\\Delta f = 1/T$.   \n", "\n", "The *sampling theorem* states that at least 2 samples per period are needed to correctly reproduce the highest frequency of a signal. Or in other words: A signal can only reproduced properly when it does not contain frequency components above one-half of the sampling frequency. This is the definition of the *Nyquist frequency*:\n", "\n", "$$ f_{Ny} = \\frac{1}{2\\Delta t} = \\frac{\\Delta f}{2}$$\n", "\n", "An important consequence of this theorem is the necessity to low-pass data *before* resampling them to a lower sample rate. The corner frequency of the low-pass filter is maximum the Nyquist frequency of the *new* sample rate. If this necessity is not considered, we obtain *aliasing* effects.\n", "\n", "When dealing with spectral analysis, it should be clear that it is fundamentally based on the characterisitics and properties of the *[Fourier series/Fourier transformation](fourier_transform.ipynb)*. Make sure, that you have understood this lecture as well. \n", "\n", "\n", "---"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Nyquist frequency and aliasing\n", "\n", "In the following cell, we first create a signal made of two different frequencies (4 and 8 Hz) and sampled at 20 Hz (black). Then, we downsample it to 10 Hz in two ways: first, we just reduce the sample rate (red); second, we low-pass filter the data before downsampling (green). Finally, all three signals are plotted in the time- and frequency-domain to see the effects.   \n", "\n", "1) What are the Nyquist frequencies for the original and the downsampled data?   \n", "2) Comparing the original (black) with the purely downsampled data (red), what do you observe and can you explain it?   \n", "3) Comparing the original (black) with the low-passed and downsampled signal, what do you observe now? "]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Cell 1: downsampling + Nyquist\n", "npts = 512                              # number of samples\n", "nsec = 4.0                              # length of signal in seconds\n", "df = 20.0                               # sampling rate\n", "fNy = df / 2.0                          # Nyquist frequency\n", "fg1 = 8                                 # generator frequency 1 (initial: 8 Hz)\n", "fg2 = 4                                 # generator frequency 2 (initial: 4 Hz)\n", "time = np.linspace(0,nsec,(nsec*df)+1)  # time axis for plotting\n", "\n", "y  = np.sin(2 * np.pi * fg1 * time)     # set up a test signal from two frequencies\n", "y += np.sin(2 * np.pi * fg2 * time) \n", "\n", "# downsample to 10 Hz by taking every second element\n", "y_2 = y[::2]\n", "\n", "# downsample after lowpassing the signal\n", "y_l = lowpass(y, 5.0, df=df, corners=4, zerophase=False)\n", "y_new = y_l[::2]\n", "\n", "y_f = np.fft.rfft(y)                    # transform all 3 signals into frequency domain \n", "y_f2 = np.fft.rfft(y_2)                 # applying Fourier transformation via FFT\n", "y_fnew = np.fft.rfft(y_l)\n", "freq = np.linspace(0, fNy, len(y_f))    # frequency axis for plotting\n", "\n", "# plot\n", "plt.subplot(211)\n", "plt.plot(time, y, 'k', label=\"Original data\", lw=1.5)\n", "plt.plot(time[::2], y_2, 'r--', label=\"Downsample without lowpass\", lw=2)\n", "plt.plot(time[::2], y_new, 'g', label=\"Downsample with lowpass\", lw=2)\n", "plt.legend()\n", "plt.ylim(-2, 4.5)\n", "plt.title('Time Domain')\n", "plt.xlabel('Time [s]')\n", "plt.ylabel('Amplitude')\n", "\n", "plt.subplot(212)\n", "plt.plot(freq, abs(y_f), 'k', label=\"Original frequencies\", lw=1.5)\n", "plt.plot(freq[:len(y_f2)], abs(y_f2), 'r--', label=\"Downsample without lowpass\", lw=2)\n", "plt.plot(freq[:len(y_fnew)], abs(y_fnew), 'g--', label=\"Downsample with lowpass\", lw=3)\n", "plt.legend()\n", "plt.ylim(0, 55)\n", "plt.title('Frequency Domain')\n", "plt.xlabel('Frequency [Hz]')\n", "plt.ylabel('Amplitude')\n", "\n", "plt.show()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["---\n", "\n", "### Spectral leakage\n", "\n", "Now, we create a sinusoid with frequency 4 Hz sampled at 100 Hz, transfer it in the frequency-domain via FFT and plot both. \n", "\n", "4) What would you expect theoretically for the frequency-domain plot.   \n", "5) Enlarge the length of the time-domain signal by setting the variable $leng$ step by step higher. What do you observe in the plot for the frequency domain? Can you explain it?   \n", "6) In reality we cannot just enlarge the length of our signal. Can you think about a way to still get the same effect?"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Cell 2: spectral leakage\n", "leng = 2.0                           # length of signal in seconds (initial: 2 s)\n", "dt = 1./100.                         # sampling interval\n", "ny = 1/(2.*dt)                       # Nyquist frequency\n", "t = np.arange(0, leng, dt)           # time axis for plotting\n", "sin  = np.sin(2 * 4 * np.pi * t)     # set up a sine wave as signal\n", "\n", "Fsin = np.fft.rfft(sin)              # FFT to frequency domain\n", "f = np.linspace(0, ny, len(Fsin))    # frequency axis for plotting\n", "\n", "# plot\n", "plt.subplot(211)\n", "plt.plot(t, sin, 'b')\n", "plt.title('Time Domain')\n", "plt.ylim(-1.1,1.1)\n", "plt.xlabel('Time [s]')\n", "plt.ylabel('Amplitude')\n", "\n", "plt.subplot(212)\n", "plt.plot(f, abs(Fsin), 'b')\n", "plt.xlim(1,8)\n", "plt.title('Frequency Domain')\n", "plt.xlabel('Frequency [Hz]')\n", "plt.ylabel('Amplitude')\n", "\n", "plt.show()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Time-frequency plots - spectrograms \n", "\n", "So far, we analysed the data only in either time- or frequency-domain. Sometimes it is helpful to look at both dimensions together. Such a time-frequency plot is called *spectrogram*. In cell 3, we first download and prepare data from the $M_w$ 9.1 Tohoku earthquake from 11 March 2011. In cell 4, we then create a spectrogram from these data.    \n", "\n", "7) What happens when you increase the number of sampe points $NFFT$ in cell 4 and why?     \n", "8) Zoom in to the start of the signal by changing $xstart$ and $xend$. Does a longer or shorter window length allow you to identify the start of the signal more easily? Why?     \n", "9) Now adapt the time limits to look at the surface waves, what do you observe? "]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Cell 3: prepare data from Tohoku earthquake. \n", "client = Client(\"BGR\")\n", "t1 = UTCDateTime(\"2011-03-11T05:00:00.000\")\n", "st = client.get_waveforms(\"GR\", \"WET\", \"\", \"BHZ\", t1, t1 + 6 * 60 * 60, \n", "                          attach_response = True)\n", "st.remove_response(output=\"VEL\")\n", "st.detrend('linear')\n", "st.detrend('demean')\n", "st.plot()"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Cell 4 - spectrogram\n", "tr = st[0]\n", "NFFT = 256    # length of spectrogram window in sample points (initial: 256)\n", "# number of sample points that the sliding window overlaps, must be less than NFFT\n", "noverlap = 50  \n", "xstart = 0    # x axis limits in the plot\n", "xend = 21627  # max. length of signal: 21627 sec\n", "\n", "# plot\n", "ax1 = plt.subplot(211)\n", "plt.plot(tr.times(), tr.data, linewidth=0.5)\n", "plt.xlabel('time [sec]')\n", "plt.ylabel('velocity [m/s]')\n", "\n", "plt.subplot(212, sharex=ax1)\n", "plt.title('spectrogram, window length %s pts' % NFFT)\n", "Pxx, freqs, bins, im = plt.specgram(\n", "    tr.data, NFFT=NFFT, Fs=tr.stats.sampling_rate, \n", "    noverlap=noverlap,cmap=plt.cm.gist_heat)\n", "\n", "# Pxx is the segments x freqs array of instantaneous power, freqs is\n", "# the frequency vector, bins are the centers of the time bins in which\n", "# the power is computed, and im is the matplotlib.image.AxesImage instance\n", "plt.ylabel('frequency [Hz]')\n", "plt.xlabel('time [sec]')\n", "plt.ylim(0,0.2)\n", "\n", "plt.xlim(xstart, xend)\n", "plt.show()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The example above uses the `specgram` function of `matplotlib.pylab`. In case of very long signals or signals with a high sampling rate, this is highly recommended due to the high computational effort. In case of shorter time signals, `ObsPy` also offers a `spectrogram` function. An example can be found [here](https://docs.obspy.org/tutorial/code_snippets/plotting_spectrograms.html)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["---\n", "\n", "## Preprocessing data\n", "\n", "The basics of the spectral analysis are the foundation for *preprocessing data*. Preprocessing data is an essential task before the data can be used for any scientific analysis. First, the instrument characteristic needs to be removed from the data and second, we need to make sure that the data are in the ideal shape for the specific task. The 'good-practise rules' for preprocessing depend on the characteristics/properties of spectral analysis. They include the following steps in more or less this order:\n", "\n", "- Restitution, also called decomposition or instrument correction   \n", "- Low-pass filtering\n", "- Downsample/resample the data\n", "- Cut the data to a specific window\n", "- Detrend and demean the data\n", "- Taper the data\n", "- Pad with zeros\n", "\n", "Not always all these steps  and sometimes further preprocessing steps are necessary, depending on the specific task the data shall be used for. However, these steps are the main and most important ones. The restitution is a very special process with a lot of details in its own. Therefore, we skip it here and refer to the [separate notebook on instrument correction](../General Seismology/instrument_response.ipynb).\n", "\n", "---\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Windows and Tapers\n", "\n", "Sometimes, we have a very long signal but only need a small portion of it, e.g. a recording of an earthquake in a file covering the entire day. Then, to speed up processing and make plots more clear, we need to cut that part, we are interested in. This is also called *windowing* the signal because it is similar to applying a boxcar/window function to the data in the frequency-domain (see the [notebook on filter basics](...ipynb) for more details).  \n", "\n", "In Cell 5, you have several possibilities to set-up an inital signal. Just comment/uncomment the lines in the second code block as you want. In the fourth code block the taper is defined. Here, we use a simple $cos$ to taper. Working on real data using `ObsPy` you have several [taper windows](https://docs.obspy.org/master/packages/autogen/obspy.core.trace.Trace.taper.html) to chose from. In general, the percentage of taper applied has a much larger effect then the choosen window. \n", "\n", "10) Why do we apply a taper to the data and what is a taper actually?   \n", "11) What happens at the beginning and end of the taper and what happens in the middle part?    \n", "12) What happens when you increase the percentage of tappering?    "]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Cell 5 - tapers  \n", "samp = 200                                # number of sample point (initial: 200)\n", "delta = 0.05                              # sample spacing (initial: 0.05)\n", "freq1 = 2.15                              # generator freq to create signal (initial: 2.15)\n", "freq2 = 7.8                               # generator freq to create signal (initial: 7.8)\n", "temp = np.linspace(0.0, samp*delta, samp) # time axis in seconds\n", "\n", "# generate freq1 Hz sine wave signal\n", "dat = np.sin(freq1 * 2.0 * np.pi * temp)\n", "# add an offset for the zero point of the sinewave\n", "dat = np.sin(freq1 * 2.0 * np.pi * temp + np.pi/3)\n", "## add another sinusoid to signal with freq2 Hz\n", "#dat = dat + np.sin(freq2 * 2.0 * np.pi * temp + np.pi/3)\n", "#noise_amplitude = 0.7\n", "## add noise to the signal\n", "#dat = dat + np.random.randn(len(dat)) * noise_amplitude \n", "# determine max. amplitude of data (for plotting)\n", "maximum = max(dat)\n", "\n", "print('Before Taper')\n", "print('amplitude of first sample point:%6.1f' %dat[0])\n", "print('amplitude of last sample point:%6.1f' %(dat[len(dat)-1]))\n", "\n", "# percentage of taper applied [0. ; 1.] (initial: 0.1)\n", "taper_percentage = 0.1\n", "# define taper window\n", "taper = cosine_taper(samp,taper_percentage)\n", "# taper the signal\n", "dat_taper = dat * taper\n", "\n", "print('After Taper')\n", "print('amplitude of first sample point:%6.1f' %dat_taper[0])\n", "print('amplitude of last sample point:%6.1f' %(dat_taper[len(dat_taper)-1]))\n", "\n", "# FFT data into frequency-domain\n", "Fdat = np.fft.rfft(dat, n=samp)\n", "Fdat_taper = np.fft.rfft(dat_taper, n=samp)\n", "# x-axis in f-domain for plotting\n", "xf = np.linspace(0.0, 1.0/(2.0*delta), (samp/2)+1)\n", "\n", "# plot\n", "plt.subplot(211)\n", "plt.title('Time Domain')\n", "plt.plot(temp, dat, label=\"Original Data\", color='b', linewidth=1.5)\n", "plt.plot(temp, dat_taper, label=\"Tapered Data\", color='g',linewidth=1.5)\n", "plt.plot(temp, taper, label=\"The Taper\", color='r', linewidth=2)\n", "plt.legend(loc='lower center')\n", "plt.xlabel('Time [s]')\n", "plt.ylabel('Amplitude')\n", "plt.ylim(-(maximum+0.2),maximum+0.2)\n", "\n", "plt.subplot(212)\n", "plt.title('Frequency Domain')\n", "plt.plot(xf, 2.0/samp * np.abs(Fdat), color='b',label=\"no Taper\",linewidth=1.5)\n", "plt.plot(xf, 2.0/samp * np.abs(Fdat_taper), label=\"with Taper\", color='g',linewidth=1.5)\n", "plt.legend()\n", "plt.xlabel('Frequency [Hz]')\n", "plt.ylabel('Amplitude')\n", "\n", "plt.show()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["---\n", "\n", "### Deman / detrend\n", "\n", "Real data are definitely not perfect. Worse luck! So, they can have jumps, trends, offsets, spikes, etc. ... Luckily, the most common ones are easy to handle.    \n", "In Cell 6 we load a stream which is uncluded in the `ObsPy` installation as an example to start with. In the following cells, we artifically introduce some data errors and see how they blow-up our processing. As an example, we use a bandpass filter. `ObsPy` provide us with the right [tool](http://docs.obspy.org/packages/autogen/obspy.core.trace.Trace.detrend.html#obspy.core.trace.Trace.detrend) to correct our data before further (pre-)processing.\n", "     \n", "13) Run the cells and describe what effects you can see. Then uncomment the line *correction* and run the cells again. What do you observe.    \n", "14) Can you explain, why all these spoilers blow-up the further processing?"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Cell 6 - load the data\n", "st = obspy.read()             # read in example seismogram included in obspy\n", "print(st)\n", "print('')\n", "tr = st[0]                    # take only the vertical trace of the stream\n", "print(tr)\n", "tr.filter(\"highpass\", freq=2) # removing long-period noise\n", "tr.plot()\n", "tr_safe = tr.copy()           # safety copy to not override our data"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Cell 7 - offsets \n", "\n", "tr = tr_safe.copy()       # load a fresh version of the data\n", "tr.data += 500            # creating an offset\n", "tr.plot()\n", "#tr.detrend('demean')     # correction\n", "tr.filter('bandpass', freqmin=0.01, freqmax=5)\n", "tr.plot()"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Cell 8 - linear trend \n", "\n", "tr = tr_safe.copy()         # load a fresh version of the data\n", "tr.data += 50. * tr.times() # creating a linear trend\n", "tr.plot()\n", "#tr.detrend('linear')       # correction\n", "#tr.plot()\n", "tr.filter('bandpass', freqmin=0.01, freqmax=5.)\n", "tr.plot()"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# Cell 9 - any polynomial trend\n", "\n", "# load a fresh version of the data\n", "tr = tr_safe.copy()\n", "# \"spoiling\" the data\n", "tr.data += 6000 + 4 * tr.times() ** 2\n", "# further \"spoiling\" the data\n", "tr.data -= 0.1 * tr.times() ** 3 + 0.00001 * tr.times() ** 5\n", "\n", "tr.plot()\n", "tmp = tr.copy()\n", "tmp.filter('bandpass', freqmin=0.01, freqmax=5)\n", "tmp.plot()\n", "\n", "# taking the data as array out of trace container\n", "data = tr.data                                           \n", "# correct the spoiled data and plot+print it\n", "obspy.signal.detrend.polynomial(data, order=3, plot=True) "]}, {"cell_type": "markdown", "metadata": {"tags": ["solution"]}, "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}}, "nbformat": 4, "nbformat_minor": 2}