180A Python Examples
180A Python Examples
∗
Dylan Neff
Part I
Introduction
The goal of this document is not to teach Python fundamentals but, rather, to teach by examples
relevant to analysis in this nuclear physics experimental course. The examples will roughly follow
the flow of the analyses for the course, gradually adding tools and building upon those introduced
in previous examples. The code will be written in functions, heavily commented, and attempt to
use only the core Python packages (numpy, matplotlib, scipy, etc.). Screenshots of output will be
taken from the Spyder IDE, but the code presented is general and should run equally well on any
user environment. If any of this material is out of date/incorrect, please email the issue to Dylan
Neff. As a disclaimer, the author is by no means a Python expert so suggestions and coding style
should be considered with this in mind.
∗
Physics Department, UCLA, Los Angeles, CA 90024, USA
1
Contents
I Introduction 1
II Python Basics 4
1 Hello World 4
1.1 Example Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Use of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Python Control Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Reading Files 5
2.1 Example Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 File Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Discussion of Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Plotting 8
3.1 Example Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Discussion of Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4 Fitting 14
4.1 Finding Optimal Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.2 Example Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.3 Discussion of Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1 Histograms 18
1.1 Example Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2 Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 Simple Plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4 Fitting and Plotting with Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2 Earthquake Example 23
2.1 Analysis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 Analysis Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Python Dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Python lambda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5 Filtering Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6 Discrete Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
IV Full Scripts 30
2
1 Hello World 30
2 Read File 31
3 Plotting 32
4 Fitting 34
5 Histograms 36
6 Earthquakes 38
3
Part II
Python Basics
One major strength of Python is its popularity. This, along with its high level syntax, makes
learning Python substantially easier than other languages. There is a plethora of material available
online for learning Python and, more importantly, any question about the language has likely
already been asked and answered somewhere online. For this reason, this document will provide
only cursory introductory examples and will instead focus on unique tools needed for nuclear
physics. It is left to the reader to fill in any gaps via Google and documentation pages for Python
packages. Some good general python tutorials can be found at codeacademy and learnpython.
1 Hello World
As is tradition in programming introductions, the first example will print hello world to the termi-
nal:
if __name__ == '__main__':
main() # If this file is run as a script directly, this statement will execute and main will be run
Upon running this script, the terminal should display ”hello world” as shown in Figure 1.
4
While the use of functions is by no means mandatory, it is heavily encouraged to develop good
coding practices. Object oriented programming is a somewhat related paradigm which will be
dodged in this document but can be applied in this course.
2 Reading Files
Reading data from files stored on the local machine will be crucial for this course. In this example,
a set of US corona virus data is stored in a comma separated values (csv) file with three columns:
the date, number of cases on that date, and the number of deaths on that date. This file can be
opened and viewed in any text editor or spreadsheet software. A screenshot of the first lines are
shown in Figure 2.
This example shows how to open, read, and print the data in the file to the terminal. This
is split into two functions, read_cases and print_cases which are called in turn from the main
function. The defined path is passed to read_cases which returns the date and cases data. These
are then passed to print_cases which prints the data to the terminal.
5
2.1 Example Code
def main():
"""
Read US corona virus case data from file and print to terminal.
:return:
"""
path = 'C:/Users/Dylan/PycharmProjects/UCLA_Nuclear_Lab/Example_Code/Ex2_Read_File/us_12_14_20.csv'
dates, cases = read_cases(path)
print_cases(dates, cases)
print('donzo')
def read_cases(path):
"""
Read corona virus case data from file at the input path and return lists of dates and case numbers
:param path: Path to corona virus case data
:return: List of dates and list of cases
"""
with open(path, 'r') as file: # Open the contents at given path and name this opened object file
lines = file.readlines() # Read the opened file into a list with each entry a string of the line
for line in lines[1:]: # Iterate over each line, skipping the first line since it's a header
line_data = line.split(',') # Split each line string into a list of items separated by commas
dates.append(line_data[0]) # Append zeroth element of line_data to list of dates
cases.append(int(line_data[1])) # Append first element of line_data to list of cases as an integer
for date, case in zip(dates, cases): # Iterate through dates and cases list together
print(f'date: {date}, cases: {case}') # Print each date/case pair on a new terminal line
6
2.2 File Paths
A key aspect of reading and writing files on your computer is knowing where those files are located.
Computer data is stored in a filesystem structure, in which folders or directories are used to organize
and retrieve data. A file path is the location of a file within this file system and uniquely specifies
the file from all others on the computer. As demonstrated in the example code, the path must
be specified in order for Python to open the correct file. The path of a file can be found by right
clicking on it and selecting ‘Properties’ in Windows/Linux or ‘Get Info’ on a Mac.
7
Figure 3: Portion of terminal output from reading files example.
3 Plotting
This example builds on the previous, using the read_cases function verbatim. The goal of this
section is to demonstrate Python plotting with the matplotlib module. Once the dates and cases
are read, they are passed to three independent functions: plot_cases_simple, plot_cases and
plot_case_rate. The plot_cases_simple function simply plots the data with the cumulative
number of cases on the y-axis and the case index on the x-axis. The plot_cases function does
the same but instead uses the date values for the x-axis and adds a title. The plot_case_rate
function plots the number of daily new cases vs the date along with uncertainty bands to showcase
the feature.
8
3.1 Example Code
def main():
"""
Read US corona virus case data from file and plot.
:return:
"""
path = 'C:/Users/Dylan/PycharmProjects/UCLA_Nuclear_Lab/Example_Code/Ex2_Read_File/us_12_14_20.csv'
dates, cases = read_cases(path)
plot_cases_simple(cases)
plot_cases(dates, cases)
plot_case_rate(dates, cases)
print('donzo')
def plot_cases_simple(cases):
"""
Plot cumulative US corona virus cases vs the case index using matplotlib.
:param cases: List of coronavirus case numbers for each day
:return:
"""
case_indices = range(len(cases)) # Produce a list of integers from {0, 1, ..., number of cases}
plt.plot(case_indices, cases) # Plot case indices on x-axis and number of cases on y-axis.
plt.show() # Once plot is set up and ready, call this to display the plot. Not displayed if this is not called
dates = [dt.strptime(date, '%Y-%m-%d').date() for date in dates] # Convert date strings to datetime objects
plt.plot(dates, cases) # Plot dates on x-axis and cases on y-axis. This can be called alone if no formatting needed
plt.show() # Once plot is set up and ready, call this to display the plot. Not displayed if this is not called
fig, ax = plt.subplots() # Set matplotlib subplots, only needed when axis (ax) object needed for manipulation
ax.set_title('Daily New Covid Cases in US') # Set plot title
dates = [dt.strptime(date, '%Y-%m-%d').date() for date in dates] # Convert date strings to datetime objects
plt.plot(dates[1:], rates, color='green') # Plot dates (skipping first) on x-axis rates on y, set color to green
plt.fill_between(dates[1:], rates - rates_err, rates + rates_err, facecolor='green', alpha=0.3,
interpolate=True) # Plot uncertainty bands about data, alpha is transparency
plt.show() # Once plot is set up and ready, call this to display the plot. Not displayed if this is not called.
9
3.2 Discussion of Code
The plot_cases_simple function simply takes the case data and plots them on the y-axis with
the case list index on the x-axis. To generate this list of case indices, the number of cases in the list
is first calculated with the built-in len function. The (also built-in) range function then generates
a list of integers starting from zero up to the number of cases minus 1. The case_indices and
cases are then passed to matplotlib for plotting.
The two most important and ubiquitous lines used in matplotlib plotting are plt.plot and
plt.show. The plt.plot function is versatile but here takes the two arguments: a list of x-axis
values first followed by a list of y-axis values. Both lists must have the same number of elements.
This produces, by default, a plot in which each pair or x and y points are connected by a line. The
plot function can take many additional options and perusing the documentation page is highly
encouraged. The final line calls the plt.show function which is required to actually display the
figures that have been created and will, depending upon the Python environment or IDE, create a
pop-up matplotlib window with the plot and possibly hold execution of the code until this window
is closed. This pop-up figure window, shown in Figure 4 is interactive and displays the line-plot of
cases vs date just created. This window can be resized and offers some simple options for formatting
the plot.
If using Spyder IDE, matplotlib figures will likely be produced in the ‘Plots’ pane or inline on
the terminal by default. To change these settings to instead have an interactive matplotlib
window pop up (suggested) from the top tool bar click ‘Tools’→‘Preferences’. From the list
on the right hand side select ‘IPython console’ and then select the ‘Graphics’ tab. Here the
default behavior of various plotting aspects can be changed. In order to produce plots in
their own interactive windows, select ‘Automatic’ from the ‘Backend’ drop-down menu under
the ‘Graphics backend’ section. Once selected, click ‘Apply’ and the ‘OK’ to close. You will
likely need to manually restart Spyder for these changes to properly take effect.
10
Figure 4: Simple plot of cumulative US corona cases vs case index in an interactive matplotlib
window.
The plot_cases function again uses plt.plot and plt.show as in the plot_cases_simple
function but now plots the dates on the x-axis and adds a title. The plt.subplots method is
first called to set the figure (fig) and axis (ax) objects. These objects are set automatically when
plt.plot is called if none have previously been created. Calling subplots is useful when the figure
or axis objects are needed for direct manipulation or formatting. The plot title is set by calling the
axis set_title function.
Datetime Package
The built in Python datetime package will not be necessary for this course but is extremely
useful when working with dates and will be explained in the next paragraph for completeness.
To plot dates properly a bit of work must be done. The date string is first converted to a Python
datetime object which is a module in the Python standard library (datetime object in datetime
module imported and succinctly renamed “dt”). This is accomplished by a list comprehension
which iterates over the original strings, converts each to a datetime object, then stores the results
in a list (this list is stored as “dates”, overwriting the original list of strings which are no longer
needed in this function). A matplotlib formatter is created with the DateFormatter class, telling
the plot to display only the month and day separated by a ‘/’. This formatter object is passed
to the axis object’s xaxis with set_major_formatter and the dates are rotated by 30° with the
set_tick_params function. Once this formatting is done, plt.plot and plt.show are called as
before and the figure is displayed as in Figure 5.
11
Figure 5: Plot of cumulative US corona cases vs date with title.
The plot_case_rate re-uses many of the tools presented in plot_cases and expands on the
capabilities of matplotlib plotting. Instead of analyzing the cumulative number of covid cases, a
somewhat more interesting data set is the number of new cases reported per day. This is obtained
from the cumulative cases by subtracting the total cases on day i from the total number of cases on
day i+1. These N −1 rate values are calculated from the N case values by iterating through all but
the last case value, subtracting the ith case value from the (i + 1)th and storing these differences
collected via list comprehension as a numpy array. In addition, the uncertainty on these rates
can be very roughly estimated under the assumption of counting statistics as the square root of
the rate values. The benefit of numpy arrays is that they are easy to manipulate algebraically as
demonstrated by np.sqrt(rates) which takes the square root of each value of the array individually
and stores the new array as rates_err.
It is prudent to note here that the conversion from cases to rates is a step logically indepen-
dent from the plotting procedure. The conversion can be seen as a manipulation or analysis
of the original case data and would likely be better partitioned into its own function. This
would especially hold true if the manipulation took any more than a few lines of code to
accomplish.
12
After obtaining a numpy array of the rates and one for the associated error estimates, the figure
and axis are created and formatted in a manner similar to the plot_cases function. Next, a line-
plot is again created though this time the first date is skipped since the change in case numbers
is being plotted and the color of the line is set to green. To showcase a feature of matplotlib,
the fill_between method is used with the rate error estimates to fill a transparent (dictated by
the ”alpha” parameter) band about the rate data. The resulting plot is shown in Figure 6 and a
zoomed version in Figure 7 to show the error bands.
Figure 6: Plot of daily change of US corona Figure 7: Zoomed plot of daily change of
virus cases vs date. cases to show filled uncertainty bands.
13
4 Fitting
Finding the best fit of a given function to data is an extremely important tool in analysis. Given
a set of data, a function with an arbitrary set of parameters, and a metric with which to compare
the function output to the data, a “best fit” of the function to the data is obtained by finding
the parameter values that optimize the goodness of fit metric. This goodness of fit metric is more
generally called a loss function and in this document a simple quadratic loss function will be utilized
via least squares regression.
14
4.2 Example Code
def main():
"""
Read US corona virus case data from file and fit with exponential.
:return:
"""
path = 'C:/Users/Dylan/PycharmProjects/UCLA_Nuclear_Lab/Example_Code/Ex2_Read_File/us_12_14_20.csv'
dates, cases = read_cases(path)
fit_cases(dates, cases)
print('donzo')
dates = [dt.strptime(date, '%Y-%m-%d').date() for date in dates] # Convert date strings to datetime objects
# Convert to days since first day for fitting, int64 for large integers, default int32 max is 5236926
days = np.asarray([(date - dates[0]).days for date in dates], dtype=np.int64)
cases = np.asarray(cases, dtype=np.int64) # Convert cases from Python list to numpy array
plt.plot(dates, cases, color='blue', label='data') # Plot dates on x-axis and cases on y-axis
plt.plot(dates, exp(days, *popt), 'r--', label='fit') # Plot best fit curve in red dashes
plt.show() # Once plot is set up and ready, call this to display the plot. Not displayed if this is not called
15
4.3 Discussion of Code
The main function of this script again reuses the read_cases function to pull the dates and cases
lists from the text file at the given path. The data are then passed to the fit_cases function
where they are fit to a simple exponential function.
The fit_cases function repeats much of the matplotlib setup and formatting described in
Section 3. The fitting is accomplished with the curve_fit function from the scipy package. This
function takes the fitting function as its first parameter followed by the x data (independent variable)
and then the y data (dependent variable). The fitting function must take the x data as its first
parameter followed by an arbitrary number of parameters and return a value to compare to the data.
In the example code, an exp function is defined following this format. It takes parameters a which
is the amplitude of the exponential and lam which determines the curvature of the exponential.
It then calculates and returns the value of the exponential given these two parameters and the x
value. The x and y data passed to curve_fit are the same lists passed to the plot function. The
date values are stored as datetime objects which can be used by matplotlib but are not directly
compatible with scipy curve_fit. To work with curve_fit, the dates are converted to the integer
number of days since the first date. In addition, these are stored explicitly as 64 bit integers using
the dtype argument rather than the default 32 bit integers whose max is 5236926, smaller than the
max values encountered in the cases data. The cases are also converted to a numpy array of 64 bit
integers.
The curve_fit function varies the input parameters and compares the goodness of fit for each
variation, searching for an optimal set of parameter values. As such, it is often important, especially
with many parameters (many dimensional space to explore), to set specific initial parameters from
which curve_fit will start its search. In addition, bounds may be set on these parameters though
this should only be done if necessary as it may affect error estimation. In this example, the initial
parameters are very roughly set with an amplitude of 1 (expected number of cases at day 0) and
the exponential constant of 0.04 (positive curvature but slower growth). In general it is often useful
to plot the data and the fit function with guesses for best parameters before curve_fitting in
order to find good initial parameter values that give decent agreement between the data and the
function.
The return values from curve_fit are the optimal parameters found and the covariance matrix.
The estimated errors on the optimal parameters are the square root of the diagonal elements of
the covariance matrix. The example code prints the optimal values from the fit along with the
estimated parameter errors and the full covariance matrix, shown in Figure 8. Without going into
detail, the covariance matrix is symmetric and the off diagonal elements give the covariance between
the parameters.
16
Figure 8: Optimal parameters as determined by fit on first line followed by estimated error on this
parameters. Full covariance matrix printed at the end.
The cases are then plotted as usual with an additional label argument which sets the plot’s
name in the figure legend. Next, the best fit curve is plotted on top of the raw cases data by calling
the exp function with days as the x-data and the optimal parameters which are unpacked and
passed with *popt. Finally ax.legend() is called to construct the figure legend and plt.show()
is called to show the figure as in Figure 9.
Figure 9: Plot of cumulative US coronavirus cases vs date with best fit exponential function
superimposed and a legend drawn. Notice the x-axis title is cut off if the plot is saved directly.
Tweaking the plot in the interactive matplotlib window can quickly remedy this issue.
17
Part III
Nuclear Physics Specifics
While the previous part outlined a rather general type of analysis and showcased some of the
capabilities of Python, nuclear physics relies on some tools more heavily than the general Python
user community. Many nuclear physics experiments and analyses, at their core, tend to involve
counting the number of particles meeting some criteria. Counting data such as this are most often
visualized and analyzed as histograms which will be the focus of the first section. The next section
will give a full example analysis similar to what would be expected for an experiment in this course.
The general Python aspects of these examples will be less heavily discussed, favoring instead to
focus on the specifics important to nuclear physics.
1 Histograms
Histograms are tools used more heavily in nuclear physics than in general. For this reason, it is
prudent to show how to use and modify Python’s more popular tools in order to accurately set,
plot, and fit histograms.
The following example takes height and weight data found online and histograms the height
data. Python’s matplotlib package has a built in histogram method which works well for most use
cases. The major feature missing from plt.hist is the ability to draw error bars on the histogram.
As of the time of writing, there’s no workaround for this absence. Instead, the matplotlib bar charts
will be used. The histogrammed bins and counts will be produced by numpy’s histogram function
and the plotting will be done with matplotlib’s bar method.
The heights and weights are first read from the file with the read_heights_weights function
and returned as numpy arrays. The heights are then passed to the get_statistics function to
calculate the mean and standard deviation of the data set. The height data is then plotted naively
with the auto_plot_hist function which uses automatic binning and the bin_plot_hist which
takes an optional binning argument. The heights are then passed to the fit_gaus function which
more carefully plots the data and fits a Gaussian distribution.
def read_heights_weights(path):
"""
Read file at file_path and return height and weight data to as numpy arrays.
:param path: Path to data file.
:return heights: height data as a numpy float array
:return weights: weight data as a numpy float array
"""
heights = [] # Make a list to store height data
weights = [] # Make a list to store weight data
with open(path, 'r') as file: # Open file_path in read ('r') mode, reference it as file inside with scope
lines = file.readlines() # Read each line from file, lines is then a list of strings, one for each line in file
for line in lines[1:]: # Iterate over each line in lines, [1:] -> Skip first line to get rid of file headers
18
line = line.strip() # Remove any white space or newline at begging or end of line (optional)
line = line.split() # Split line on argument. With no argument, default is space or tab
heights.append(float(line[1])) # Convert element in second column (line[1]) to float, add to heights list
weights.append(float(line[2])) # Same as above but with element in third column of the line
return np.asarray(heights), np.asarray(weights) # Convert lists to numpy arrays to allow for easier math/plotting
def get_statistics(data):
"""
Get mean and standard deviation of data and print to terminal.
:param data: Sequence of data to calculate mean/sd of.
:return:
"""
mean = np.mean(data) # Calculate mean of data
sd = np.std(data) # Calculate standard deviation of data
print(f'Data mean: {mean}, standard deviation {sd}') # Print mean and standard deviation with f-string
def auto_plot_hist(data):
"""
Plot histogram of data with matplotlib hist method with the default 10 bins
:param data: Sequence of data to be histogrammed and plotted.
:return:
"""
fig, ax = plt.subplots() # Set matplotlib subplots, only needed when axis (ax) object needed for manipulation
ax.hist(data, align='mid') # Plot histogrammed data with bins centered in the middle of the bin range
plt.show()
counts, bin_edges = np.histogram(data, bins) # Histogram via numpy with binning passed, return counts and bin edges
bin_centers = 0.5 * (bin_edges[1:] + bin_edges[:-1]) # Calculate N bin centers from N+1 bin edges
bin_widths = bin_edges[:-1] - bin_edges[1:] # Calculate N bin widths from N+1 bin edges
counting_err = [np.sqrt(count) if count > 0 else 1 for count in counts] # Calculate counting uncertainties
ax.bar(bin_centers, counts, bin_widths, yerr=counting_err, zorder=1, label='height data') # Plot histogram via bar
19
def gaus(x, a, x0, sigma):
"""
Calculate and return gaussian value at point x
:param x: Point at which to compute gaussian
:param a: Amplitude of gaussian
:param x0: Center of gaussian (equal to mean)
:param sigma: Width of gaussian (equal to standard deviation)
:return: Value of gaussian at point x
"""
return a * np.exp(-(x - x0)**2 / (2 * sigma**2))
1.2 Statistics
In the get_statistics function the mean and the standard deviation are calculated for the raw
(unhistogrammed) heights data. This is accomplished using numpy’s mean and std functions which
calculate these statistics for a sequence of data. The resulting statistics are then printed to the
terminal via an f-string.
It is important to note that when a data set is histogrammed, all information within each bin
is lost. For instance, if you have a bin from 1 to 4, you will be unable to distinguish a 2 from a 3
after histogramming and information is thus lost. For this reason, it is important to think carefully
before constructing a histogram and to pay attention to binning once one is constructed.
20
Figure 10: Histogram of height data made with the matplotlib hist method.
Figure 11: Histogram of height data made with the matplotlib hist method with a binning value
of 15 passed.
21
1.4 Fitting and Plotting with Errors
Fitting histograms is where analysis in Python requires a bit of extra work though it ensures the
user knows what they are doing. The scipy curve_fit function used in Section 4 will again be used
for fitting histograms. As before, the function will take the fitting function along with x and y data
lists of the same size. For histograms, the y-data corresponds to the number of counts in each bin
and the x-data usually corresponds to the center value of each bin (what point in the bin is used
is somewhat arbitrary but the center is a somewhat logical choice). The bin centers therefore need
to be calculated from the bin edges which fully define the binning. This can be done in Python in
one line utilizing the convenient list slicing, adding adjacent edges and dividing by two to get an
average. Once the histogram counts and bin centers have been obtained, fitting works just like for
a scatter plot as in Section 4.
In typical counting experiments (where histograms are most useful) uncertainties on each bin
can be estimated as the square root of the counts in that bin (with the exception of zero counts) as
will be discussed at some point in lecture. In order to plot these uncertainties on the histogram with
matplotlib, a bar chart will be used and slightly modified. In the example code the bin_centers
and counts are passed to the matplotlib bar method followed by the bin_widths. These bin widths
must also be calculated “manually” by subtracting each bin edge from the subsequent edge. Next
the calculated list of uncertainties, counting_err, is passed as the optional yerr parameter to
show the bin error bars. In addition, the zorder optional parameter determines how plot objects
are displayed in relation to each other. The lower zorder objects will be displayed on top of
(overlapping) higher zorder objects.
When data has associated uncertainties it is usually necessary to weigh fits on these uncertain-
ties. A list of weights are passed to curve_fit via the optional sigma parameter. These weights
can be arbitrary but, in the case that they are proper uncertainties for the y-data points, the
optional parameter absolute_sigma can passed as True which will allow the covariance matrix to
be properly calculated taking these errors into account.
In this example the height data is fit with a Gaussian function. This best fit curve is superim-
posed on the histogram along with a highly transparent (low alpha) red fill under the fit curve for
aesthetics (plotted on top of histogram with high zorder). One difference from previous examples
is that the x-points used to plot the Gaussian are manually set via numpys’s linspace function
which, in this example, generates a list of 300 equally spaced number from the first bin edge to the
last edge. This allows for a higher resolution (less jagged) function to be drawn as compared to
what would result from using the bin edges or bin centers.
The final new feature introduced is the matplotlib annotate method which is a slightly more
feature filled variant of the base text method. This allows the user to draw arbitrary text/an-
notations on the plot area. In this case, the optimal fit parameters along with their associated
uncertainties are written in a round, tan, transparent text box. The final plot is shown in Figure
12.
22
Figure 12: Histogram of height data with best fit Gaussian superimposed.
2 Earthquake Example
In this final, more comprehensive example analysis, Southern California earthquake data from
Caltech is used to demonstrate many of the features already introduced along with a few extras.
This example will have fewer comments and the description will focus on the overall analysis
along with specifics of the features not introduced in previous examples. While the text will be
pedagogical, the analysis itself will be similar to what is expected for experiments in this course.
The data is a rather large (2.7MB) file that lists individual earthquakes on each line with
earthquake characteristics separated by spaces (Figure 13). These earthquakes took place in 2020
between latitudes 32.0 and 37.0 and longitudes -122.0 and -114.0, roughly visualized in Figure 14.
Many questions can and undoubtedly are asked of this data by geologists and others. This example
will ignore much of the detail (which will help introduce filters) and focus on a simple question
within the purview of the author’s very limited knowledge of earthquakes.
23
Figure 13: First few lines of the Caltech earthquake data.
Figure 14: Map of the approximate area (shaded) of recorded earthquakes in the Caltech data set.
24
list is then sent to the plot_quake_days function where it is histogrammed with numpy, fit to a
Poisson distribution, and plotted with matplotlib.
The resulting plot of the filtered daily earthquake counts along with superimposed best fit
Poisson is shown in Figure 15. From this plot it is apparent that there are outlying data points,
specifically, the day with 48 recorded earthquakes as shown with a y-axis zoom in Figure 16. In
a more thorough analysis it would likely be beneficial to investigate this outlier, which would be
easy to do with the quakes list of dictionaries (find the date in which there were 48 earthquakes
meeting the criteria and then look into the quakes characteristics or cross reference this date with
events online). This analysis is more interested in modeling the data as Poisson, so the plot is
simply zoomed on the x-axis to produce Figure 17. From these plots it is apparent that while
the number of earthquakes each day aren’t a perfect Poisson process, a Poisson fit does a decent
job of describing the data. From here, this model could be used to estimate the probability of N
earthquakes occurring in a single day in Southern California. Other analyses in this same vein can
rather easily be carried out using modifications of this script.
Figure 15: Plot of number of days (y-axis) with number of earthquakes (x-axis) of magnitude
greater than or equal to 3 and less than 9.
25
Figure 16: Zooming in on the y-axis shows the days with a large number of earthquakes per day.
While a Poisson distribution is expected to have sizable probability for rare tail events, the number
of days with more than 5 earthquakes seems to deviate from the Poisson expectation, especially the
outlier with 48 earthquakes. This likely indicates a failure of the Poisson model and may indicate
some temporal correlations between earthquakes.
Figure 17: Zooming in on the x-axis shows that, while a Poisson model of the data certainly isn’t
perfect, it does capture much of the behavior and may, therefore, still be useful.
26
2.2 Analysis Code
def main():
"""
Plot histogram of daily number of SoCal earthquakes of magnitude between 3 and 9. Fit to Poisson distribution.
:return:
"""
path = 'C:/Users/Dylan/PycharmProjects/UCLA_Nuclear_Lab/Example_Code/Ex6_Discrete_Hist/' \
'caltech_earthquakes_2020.catalog.txt'
quakes = read_quakes(path)
quake_days = hist_daily_quakes(quakes, mag_3_filter)
plot_quake_days(quake_days)
print('donzo')
def read_quakes(path):
"""
Read caltech earthquake data at path. Return list of dictionaries with info about each quake.
:param path: Path to caltech 2020 earthquake data file
:return:
"""
quakes = []
with open(path, 'r') as file:
lines = file.readlines()
for line in lines:
new_quake_dict = {}
line = line.strip().split()
if len(line) != 13:
continue
try:
new_quake_dict.update({'date_time': dt.strptime(line[0] + ' ' + line[1], '%Y/%m/%d %H:%M:%S.%f')})
except ValueError:
continue
new_quake_dict.update({'et': line[2]})
new_quake_dict.update({'gt': line[3]})
new_quake_dict.update({'mag': float(line[4])})
new_quake_dict.update({'m': line[5]})
new_quake_dict.update({'lat': float(line[6])})
new_quake_dict.update({'lon': float(line[7])})
new_quake_dict.update({'depth': float(line[8])})
new_quake_dict.update({'q': line[9]})
new_quake_dict.update({'evid': int(line[10])})
new_quake_dict.update({'nph': int(line[11])})
new_quake_dict.update({'ngrm': int(line[12])})
quakes.append(new_quake_dict)
return quakes
return list(quake_days.values())
def plot_quake_days(quake_days):
counts, bin_edges = np.histogram(quake_days, np.arange(-0.5, 50.5, 1))
27
bin_centers = (bin_edges[1:] + bin_edges[:-1]) / 2
counting_err = [np.sqrt(count) if count > 0 else 1 for count in counts]
plt.bar(bin_centers, counts, 1.0, yerr=counting_err, label='quake data')
def mag_3_filter(quake):
"""
Given a quake dictionary, return True if the magnitude is between 3 and 9
:param quake: Quake dictionary. Must include 'mag' keyword with corresponding magnitude value of the earthquake.
:return: True if quake magnitude greater than or equal to three and less than 9
"""
if 3 <= quake['mag'] < 9:
return True
else:
return False
28
The filter function takes as it’s first argument a criteria function and a data sequence as its
second. This function must take as an input parameter an element of the data sequence and
return True if this element should be kept and False if the element should be filtered out. The
filter function then returns a filtered sequence of data which can then be manipulated. Lambda
functions are often used as the filter criteria if the expression is simple enough.
29
Part IV
Full Scripts
The full scripts used in this document’s examples can be found below.
1 Hello World
1 #!/usr/bin/env python3
2 # -*- coding: utf-8 -*-
3 """
4 Created on December 14 2:06 PM 2020
5 Created in PyCharm
6 Created as UCLA_Nuclear_Lab/hello_world
7
8 @author: Dylan Neff, Dylan
9 """
10
11
12 def main():
13 print('hello world') # Print 'hello world' string to the terminal
14
15
16 if __name__ == '__main__':
17 main() # If this file is run as a script directly, this statement will execute and main will be run
30
2 Read File
1 #!/usr/bin/env python3
2 # -*- coding: utf-8 -*-
3 """
4 Created on December 14 3:21 PM 2020
5 Created in PyCharm
6 Created as UCLA_Nuclear_Lab/read_file
7
8 @author: Dylan Neff, Dylan
9 """
10
11
12 def main():
13 """
14 Read US corona virus case data from file and print to terminal.
15 :return:
16 """
17
18 path = 'C:/Users/Dylan/PycharmProjects/UCLA_Nuclear_Lab/Example_Code/Ex2_Read_File/us_12_14_20.csv'
19 dates, cases = read_cases(path)
20 print_cases(dates, cases)
21 print('donzo')
22
23
24 def read_cases(path):
25 """
26 Read corona virus case data from file at the input path and return lists of dates and case numbers
27 :param path: Path to corona virus case data
28 :return: List of dates and list of cases
29 """
30
31 dates = [] # List of dates, left as strings
32 cases = [] # List of cases, converted to integers
33
34 with open(path, 'r') as file: # Open the contents at given path and name this opened object file
35 lines = file.readlines() # Read the opened file into a list with each entry a string of the line
36 for line in lines[1:]: # Iterate over each line, skipping the first line since it's a header
37 line_data = line.split(',') # Split each line string into a list of items separated by commas
38 dates.append(line_data[0]) # Append zeroth element of line_data to list of dates
39 cases.append(int(line_data[1])) # Append first element of line_data to list of cases as an integer
40
41 return dates, cases # Return the two lists
42
43
44 def print_cases(dates, cases):
45 """
46 Print the corona virus date and case data passed as lists
47 :param dates: List of dates
48 :param cases: List of cases
49 :return:
50 """
51
52 print(dates) # Print the list of dates
53 print(cases) # Print the list of cases
54
55 for date, case in zip(dates, cases): # Iterate through dates and cases list together
56 print(f'date: {date}, cases: {case}') # Print each date/case pair on a new terminal line
57
58
59 if __name__ == '__main__':
60 main()
31
3 Plotting
1 #!/usr/bin/env python3
2 # -*- coding: utf-8 -*-
3 """
4 Created on December 14 4:37 PM 2020
5 Created in PyCharm
6 Created as UCLA_Nuclear_Lab/plotting
7
8 @author: Dylan Neff, Dylan
9 """
10
11
12 import numpy as np
13 import matplotlib.pyplot as plt
14 from matplotlib.dates import DateFormatter
15 from datetime import datetime as dt
16
17
18 def main():
19 """
20 Read US corona virus case data from file and plot.
21 :return:
22 """
23
24 path = 'C:/Users/Dylan/PycharmProjects/UCLA_Nuclear_Lab/Example_Code/Ex2_Read_File/us_12_14_20.csv'
25 dates, cases = read_cases(path)
26 plot_cases_simple(cases)
27 plot_cases(dates, cases)
28 plot_case_rate(dates, cases)
29 print('donzo')
30
31
32 def read_cases(path):
33 """
34 Read corona virus case data from file at the input path and return lists of dates and case numbers
35 :param path: Path to corona virus case data
36 :return: List of dates and list of cases
37 """
38
39 dates = [] # List of dates, left as strings
40 cases = [] # List of cases, converted to integers
41
42 with open(path, 'r') as file: # Open the contents at given path and name this opened object file
43 lines = file.readlines() # Read the opened file into a list with each entry a string of the line
44 for line in lines[1:]: # Iterate over each line, skipping the first line since it's a header
45 line_data = line.split(',') # Split each line string into a list of items separated by commas
46 dates.append(line_data[0]) # Append zeroth element of line_data to list of dates
47 cases.append(int(line_data[1])) # Append first element of line_data to list of cases as an integer
48
49 return dates, cases # Return the two lists
50
51
52 def plot_cases_simple(cases):
53 """
54 Plot cumulative US corona virus cases vs the case index using matplotlib.
55 :param cases: List of coronavirus case numbers for each day
56 :return:
57 """
58 case_indices = range(len(cases)) # Produce a list of integers from {0, 1, ..., number of cases}
59 plt.plot(case_indices, cases) # Plot case indices on x-axis and number of cases on y-axis.
60 plt.show() # Once plot is set up and ready, call this to display the plot. Not displayed if this is not called
61
62
63 def plot_cases(dates, cases):
64 """
65 Plot cumulative US corona virus cases vs the date using matplotlib.
32
66 :param dates: List of dates for coronoavirus cases
67 :param cases: List of coronavirus case numbers for each corresponding date in dates
68 :return:
69 """
70 fig, ax = plt.subplots() # Set matplotlib subplots, only needed when axis (ax) object needed for manipulation
71 ax.set_title('Total Covid Cases in US') # Set title of plot
72
73 dates = [dt.strptime(date, '%Y-%m-%d').date() for date in dates] # Convert date strings to datetime objects
74
75 formatter = DateFormatter('%m/%d') # Create date formatting for plot
76 ax.xaxis.set_major_formatter(formatter) # Set date formatting on plot
77 ax.xaxis.set_tick_params(rotation=30) # Set rotation of date labels to 30 degrees
78
79 plt.plot(dates, cases) # Plot dates on x-axis and cases on y-axis. This can be called alone if no formatting needed
80 plt.show() # Once plot is set up and ready, call this to display the plot. Not displayed if this is not called
81
82
83 def plot_case_rate(dates, cases):
84 """
85 Plot number of daily new coronavirus cases in US vs the date. Also represent estimated errors with filled bands.
86 :param dates: List of dates for coronoavirus cases
87 :param cases: List of coronavirus case numbers for each corresponding date in dates
88 :return:
89 """
90 rates = np.asarray([cases[i+1] - cases[i] for i in range(len(cases) - 1)]) # Calculate rate, daily change in cases
91 rates_err = np.sqrt(rates) # Estimate uncertainties on rates based on counting statistics, not robust
92
93 fig, ax = plt.subplots() # Set matplotlib subplots, only needed when axis (ax) object needed for manipulation
94 ax.set_title('Daily New Covid Cases in US') # Set plot title
95
96 dates = [dt.strptime(date, '%Y-%m-%d').date() for date in dates] # Convert date strings to datetime objects
97
98 formatter = DateFormatter('%m/%d') # Create date formatting for plot
99 ax.xaxis.set_major_formatter(formatter) # Set date formatting on plot
100 ax.xaxis.set_tick_params(rotation=30) # Set rotation of date labels to 30 degrees
101
102 plt.plot(dates[1:], rates, color='green') # Plot dates (skipping first) on x-axis rates on y, set color to green
103 plt.fill_between(dates[1:], rates - rates_err, rates + rates_err, facecolor='green', alpha=0.3,
104 interpolate=True) # Plot uncertainty bands about data, alpha is transparency
105
106 plt.show() # Once plot is set up and ready, call this to display the plot. Not displayed if this is not called.
107
108
109 if __name__ == '__main__':
110 main()
33
4 Fitting
1 #!/usr/bin/env python3
2 # -*- coding: utf-8 -*-
3 """
4 Created on December 16 2:27 PM 2020
5 Created in PyCharm
6 Created as UCLA_Nuclear_Lab/fitting
7
8 @author: Dylan Neff, Dyn04
9 """
10
11
12 import numpy as np
13 import matplotlib.pyplot as plt
14 from matplotlib.dates import DateFormatter
15 from datetime import datetime as dt
16 from scipy.optimize import curve_fit
17
18
19 def main():
20 """
21 Read US corona virus case data from file and fit with exponential.
22 :return:
23 """
24
25 path = 'C:/Users/Dylan/PycharmProjects/UCLA_Nuclear_Lab/Example_Code/Ex2_Read_File/us_12_14_20.csv'
26 dates, cases = read_cases(path)
27 fit_cases(dates, cases)
28 print('donzo')
29
30
31 def read_cases(path):
32 """
33 Read corona virus case data from file at the input path and return lists of dates and case numbers
34 :param path: Path to corona virus case data
35 :return: List of dates and list of cases
36 """
37
38 dates = [] # List of dates, left as strings
39 cases = [] # List of cases, converted to integers
40
41 with open(path, 'r') as file: # Open the contents at given path and name this opened object file
42 lines = file.readlines() # Read the opened file into a list with each entry a string of the line
43 for line in lines[1:]: # Iterate over each line, skipping the first line since it's a header
44 line_data = line.split(',') # Split each line string into a list of items separated by commas
45 dates.append(line_data[0]) # Append zeroth element of line_data to list of dates
46 cases.append(int(line_data[1])) # Append first element of line_data to list of cases as an integer
47
48 return dates, cases # Return the two lists
49
50
51 def fit_cases(dates, cases):
52 """
53 Fit the US cumulative coronavirus cases vs date data to an exponential function and plot the result
54 :param dates: List of dates for coronoavirus cases
55 :param cases: List of coronavirus case numbers for each corresponding date in dates
56 :return:
57 """
58 fig, ax = plt.subplots() # Set matplotlib subplots, only needed when axis (ax) object needed for manipulation
59 ax.set_title('Total Covid Cases in US with Exponential Fit') # Set title of plot
60 ax.set_xlabel('Days since Jan 21 2020') # Set x-axis title
61
62 formatter = DateFormatter('%m/%d') # Create date formatting for plot
63 ax.xaxis.set_major_formatter(formatter) # Set date formatting on plot
64 ax.xaxis.set_tick_params(rotation=30) # Set rotation of date labels to 30 degrees
65
34
66 dates = [dt.strptime(date, '%Y-%m-%d').date() for date in dates] # Convert date strings to datetime objects
67
68 # Convert to days since first day for fitting, int64 for large integers, default int32 max is 5236926
69 days = np.asarray([(date - dates[0]).days for date in dates], dtype=np.int64)
70 cases = np.asarray(cases, dtype=np.int64) # Convert cases from Python list to numpy array
71
72 pinit = [1, 0.04] # Set initial exponential parameters for fitting
73 popt, pcov = curve_fit(exp, days, cases, p0=pinit) # Fit data to exponential
74 print(f'Optimal parameters: {popt}') # Print list of best fit parameters
75 print(f'Error on parameters: {np.sqrt(np.diag(pcov))}') # Print estimated parameter errors on fit
76 print(f'Covariance matrix:\n{pcov}') # Print fit covariance matrix
77
78 plt.plot(dates, cases, color='blue', label='data') # Plot dates on x-axis and cases on y-axis
79 plt.plot(dates, exp(days, *popt), 'r--', label='fit') # Plot best fit curve in red dashes
80
81 ax.legend() # Display the legend on plot
82
83 plt.show() # Once plot is set up and ready, call this to display the plot. Not displayed if this is not called
84
85
86 def exp(x, a, lam):
87 """
88 Exponential function
89 :param x: Data x values
90 :param a: Amplitude
91 :param lam: Decay/growth parameter
92 :return: Value of exponential
93 """
94 return a * np.exp(lam * x)
95
96
97 if __name__ == '__main__':
98 main()
35
5 Histograms
1 #!/usr/bin/env python3
2 # -*- coding: utf-8 -*-
3 """
4 Created on January 11 1:44 PM 2020
5 Created in PyCharm
6 Created as UCLA_Nuclear_Lab/read_txt_example
7
8 @author: Dylan Neff, Dylan
9 """
10
11 import numpy as np
12 import matplotlib.pyplot as plt
13 from scipy.optimize import curve_fit
14
15
16 def main():
17 path = 'C:/Users/Dylan/PycharmProjects/UCLA_Nuclear_Lab/Example_Code/Ex5_Plot_Hist/height_weight.txt'
18 heights, weights = read_heights_weights(path)
19 get_statistics(heights)
20 auto_plot_hist(heights)
21 bin_plot_hist(heights, 15)
22 fit_gaus(heights, 15)
23 print('donzo')
24
25
26 def read_heights_weights(path):
27 """
28 Read file at file_path and return height and weight data to as numpy arrays.
29 :param path: Path to data file.
30 :return heights: height data as a numpy float array
31 :return weights: weight data as a numpy float array
32 """
33 heights = [] # Make a list to store height data
34 weights = [] # Make a list to store weight data
35 with open(path, 'r') as file: # Open file_path in read ('r') mode, reference it as file inside with scope
36 lines = file.readlines() # Read each line from file, lines is then a list of strings, one for each line in file
37 for line in lines[1:]: # Iterate over each line in lines, [1:] -> Skip first line to get rid of file headers
38 line = line.strip() # Remove any white space or newline at begging or end of line (optional)
39 line = line.split() # Split line on argument. With no argument, default is space or tab
40 heights.append(float(line[1])) # Convert element in second column (line[1]) to float, add to heights list
41 weights.append(float(line[2])) # Same as above but with element in third column of the line
42
43 return np.asarray(heights), np.asarray(weights) # Convert lists to numpy arrays to allow for easier math/plotting
44
45
46 def get_statistics(data):
47 """
48 Get mean and standard deviation of data and print to terminal.
49 :param data: Sequence of data to calculate mean/sd of.
50 :return:
51 """
52 mean = np.mean(data) # Calculate mean of data
53 sd = np.std(data) # Calculate standard deviation of data
54 print(f'Data mean: {mean}, standard deviation {sd}') # Print mean and standard deviation with f-string
55
56
57 def auto_plot_hist(data):
58 """
59 Plot histogram of data with matplotlib hist method with the default 10 bins
60 :param data: Sequence of data to be histogrammed and plotted.
61 :return:
62 """
63 fig, ax = plt.subplots() # Set matplotlib subplots, only needed when axis (ax) object needed for manipulation
64 ax.hist(data, align='mid') # Plot histogrammed data with bins centered in the middle of the bin range
65 plt.show()
36
66
67
68 def bin_plot_hist(data, bins=10):
69 """
70 Plot histogram of data with matplotlib hist method with input optional bins parameter passed for binning
71 :param data: Sequence of data to be histogrammed and plotted
72 :param bins: Optional parameter with default 10. Passed to matplotlib hist as histogram binning
73 :return:
74 """
75 fig, ax = plt.subplots() # Set matplotlib subplots, only needed when axis (ax) object needed for manipulation
76 ax.hist(data, bins, align='mid') # Plot histogrammed data with input binning with bins centered middle of range
77 plt.show()
78
79
80 def fit_gaus(data, bins=10):
81 """
82 Histogram data and plot manually as bar chart with uncertainties. Fit gaussian to histogram and superimpose fit.
83 :param data: Data to be histogrammed and fit
84 :param bins: Optional parameter for binning
85 :return:
86 """
87 fig, ax = plt.subplots() # Set matplotlib subplots, only needed when axis (ax) object needed for manipulation
88
89 counts, bin_edges = np.histogram(data, bins) # Histogram via numpy with binning passed, return counts and bin edges
90 bin_centers = 0.5 * (bin_edges[1:] + bin_edges[:-1]) # Calculate N bin centers from N+1 bin edges
91 bin_widths = bin_edges[:-1] - bin_edges[1:] # Calculate N bin widths from N+1 bin edges
92 counting_err = [np.sqrt(count) if count > 0 else 1 for count in counts] # Calculate counting uncertainties
93 ax.bar(bin_centers, counts, bin_widths, yerr=counting_err, zorder=1, label='height data') # Plot histogram via bar
94
95 pinit = [20, 68, 3] # Set initial conditions for gaussian fit
96 popt, pcov = curve_fit(gaus, bin_centers, counts, p0=pinit, sigma=counting_err, absolute_sigma=True) # Fit to gaus
97 perr = np.sqrt(np.diag(pcov)) # Calculate uncertainties on optimal parameters via sqrt of diagonals of cov matrix
98 print(f'Optimal parameters: {popt}') # Print list of best fit parameters
99 print(f'Error on parameters: {np.sqrt(np.diag(pcov))}') # Print estimated parameter errors on fit
100 print(f'Covariance matrix:\n{pcov}') # Print fit covariance matrix
101 gaus_range = np.linspace(bin_edges[0], bin_edges[-1], 300) # Set range of points to plot for fit gaussian
102 ax.plot(gaus_range, gaus(gaus_range, *popt), 'r--', zorder=3, label='gaus fit') # Plot best fit gaus over hist
103 ax.fill_between(gaus_range, gaus(gaus_range, *popt), 0, color='red', alpha=0.2, zorder=2) # Fill area under gaus
104 ax.legend() # Draw legend on plot.
105 ax.annotate(f'a: {popt[0]:.1f}±{perr[0]:.1f}\nx0: {popt[1]:.1f}±{perr[1]:.1f}\nsigma: {popt[2]:.1f}±{perr[2]:.1f}',
106 xy=(0.05, 0.95), xycoords='axes fraction', bbox=dict(boxstyle='round', facecolor='tan', alpha=0.3),
107 verticalalignment='top') # Display box on plot with optimal fit parameters
108 plt.show()
109
110
111 def gaus(x, a, x0, sigma):
112 """
113 Calculate and return gaussian value at point x
114 :param x: Point at which to compute gaussian
115 :param a: Amplitude of gaussian
116 :param x0: Center of gaussian (equal to mean)
117 :param sigma: Width of gaussian (equal to standard deviation)
118 :return: Value of gaussian at point x
119 """
120 return a * np.exp(-(x - x0)**2 / (2 * sigma**2))
121
122
123 if __name__ == '__main__':
124 main()
37
6 Earthquakes
1 #!/usr/bin/env python3
2 # -*- coding: utf-8 -*-
3 """
4 Created on December 22 9:37 PM 2020
5 Created in PyCharm
6 Created as UCLA_Nuclear_Lab/hist_earthquakes
7
8 @author: Dylan Neff, Dyn04
9 """
10
11
12 import numpy as np
13 import matplotlib.pyplot as plt
14 from datetime import datetime as dt
15 from scipy.optimize import curve_fit
16 from scipy.special import factorial
17
18
19 def main():
20 """
21 Plot histogram of daily number of SoCal earthquakes of magnitude between 3 and 9. Fit to Poisson distribution.
22 :return:
23 """
24 path = 'C:/Users/Dylan/PycharmProjects/UCLA_Nuclear_Lab/Example_Code/Ex6_Discrete_Hist/' \
25 'caltech_earthquakes_2020.catalog.txt'
26 quakes = read_quakes(path)
27 quake_days = hist_daily_quakes(quakes, mag_3_filter)
28 plot_quake_days(quake_days)
29 print('donzo')
30
31
32 def read_quakes(path):
33 """
34 Read caltech earthquake data at path. Return list of dictionaries with info about each quake.
35 :param path: Path to caltech 2020 earthquake data file
36 :return:
37 """
38 quakes = []
39 with open(path, 'r') as file:
40 lines = file.readlines()
41 for line in lines:
42 new_quake_dict = {}
43 line = line.strip().split()
44 if len(line) != 13:
45 continue
46 try:
47 new_quake_dict.update({'date_time': dt.strptime(line[0] + ' ' + line[1], '%Y/%m/%d %H:%M:%S.%f')})
48 except ValueError:
49 continue
50 new_quake_dict.update({'et': line[2]})
51 new_quake_dict.update({'gt': line[3]})
52 new_quake_dict.update({'mag': float(line[4])})
53 new_quake_dict.update({'m': line[5]})
54 new_quake_dict.update({'lat': float(line[6])})
55 new_quake_dict.update({'lon': float(line[7])})
56 new_quake_dict.update({'depth': float(line[8])})
57 new_quake_dict.update({'q': line[9]})
58 new_quake_dict.update({'evid': int(line[10])})
59 new_quake_dict.update({'nph': int(line[11])})
60 new_quake_dict.update({'ngrm': int(line[12])})
61 quakes.append(new_quake_dict)
62
63 return quakes
64
65
38
66 def hist_daily_quakes(quakes, criteria=lambda quake: True):
67 """
68 Given a list of earthquake dictionaries and a filter function, generate a list of the number earthquakes that
69 occurred each day passing the filter criteria.
70 :param quakes: Quake dictionary. Must include 'mag' keyword with corresponding magnitude value of the earthquake.
71 :param criteria: Function to determine which earthquakes to count per day. Defaults to counting all quakes each day.
72 :return: List of number of quakes passing the given criteria each day.
73 """
74 quake_days = {}
75 for quake in quakes:
76 if quake['date_time'].date() not in quake_days:
77 quake_days.update({quake['date_time'].date(): 0})
78
79 quakes = filter(criteria, quakes)
80 for quake in quakes:
81 quake_days[quake['date_time'].date()] += 1
82
83 return list(quake_days.values())
84
85
86 def plot_quake_days(quake_days):
87 counts, bin_edges = np.histogram(quake_days, np.arange(-0.5, 50.5, 1))
88 bin_centers = (bin_edges[1:] + bin_edges[:-1]) / 2
89 counting_err = [np.sqrt(count) if count > 0 else 1 for count in counts]
90 plt.bar(bin_centers, counts, 1.0, yerr=counting_err, label='quake data')
91
92 pinit = [100, 1] # Very rough initial guess
93 popt, pcov = curve_fit(poisson, bin_centers, counts, p0=pinit)
94 perr = np.sqrt(np.diag(pcov))
95 print(f'Optimal parameters: {popt}') # Print list of best fit parameters
96 print(f'Error on parameters: {np.sqrt(np.diag(pcov))}') # Print estimated parameter errors on fit
97 print(f'Covariance matrix:\n{pcov}') # Print fit covariance matrix
98 poisson_range = np.linspace(0, bin_edges[-1], 300)
99 plt.plot(poisson_range, poisson(poisson_range, *popt), 'r--', label='poisson fit')
100 plt.title('Number of Daily Earthquakes with 3 <= mag < 9')
101 plt.ylabel('Number of Days')
102 plt.xlabel('Number of Earthquakes in Day')
103 plt.annotate(f'a: {popt[0]:.0f}±{perr[0]:.0f}\nlam: {popt[1]:.3f}±{perr[1]:.3f}',
104 xy=(0.5, 0.95), xycoords='axes fraction', bbox=dict(boxstyle='round', facecolor='tan', alpha=0.3),
105 verticalalignment='top', horizontalalignment='center')
106 plt.legend()
107 plt.show()
108
109
110 def mag_3_filter(quake):
111 """
112 Given a quake dictionary, return True if the magnitude is between 3 and 9
113 :param quake: Quake dictionary. Must include 'mag' keyword with corresponding magnitude value of the earthquake.
114 :return: True if quake magnitude greater than or equal to three and less than 9
115 """
116 if 3 <= quake['mag'] < 9:
117 return True
118 else:
119 return False
120
121
122 def poisson(k, a, lam):
123 return a * lam ** k / factorial(k) * np.exp(-lam)
124
125
126 if __name__ == '__main__':
127 main()
39