The importance of the oil and gas industry, and the enormous amounts of money it generates, needs no introduction. What does need mentioning is how important it is to know the quality of a reservoir before deciding whether it is worth the money, time, and labor to exploit it.
That is why we created this script using machine learning and well-log data. It checks porosity (the reservoir's capacity to store fluids in its pores), permeability (the connectivity of those pores, which lets fluid circulate through the reservoir), and saturation (the percentage of a fluid, oil in particular, in the pores). The script labels each of these parameters as good or bad to determine whether the reservoir quality as a whole is good or bad.
Let’s get started!
Table of Contents
- Necessary Libraries
- Imports
- Loading Data
- Plotting Well Logs
- Evaluating Reservoir Quality
- Training The Model
- Determining Exploitation Suitability
- Main Function to Load and Analyze Data
- Creating the GUI
- Running the GUI
- Example
- Full Code
Necessary Libraries
Let’s set up everything for the code to function properly. Install the following libraries by running these commands:
$ pip install pandas
$ pip install numpy
$ pip install matplotlib
$ pip install seaborn
$ pip install scikit-learn
Note that tkinter ships with most Python installations, so it usually needs no separate install; on some Linux distributions you may need the system package instead (for example, python3-tk).
Imports
Before we dive into the exciting world of predicting reservoir quality using machine learning and well-log data, we need to gather our tools. Think of it as assembling our superhero kit! Now, let’s gear up and import the libraries we’ll be using today:
- pandas: to manipulate and analyze data.
- numpy: to provide support for multi-dimensional arrays and matrices.
- matplotlib.pyplot and seaborn: for data visualization, basically for plotting and drawing informative statistical graphics.
- train_test_split: to split the dataset into training and testing sets.
- LinearRegression: to perform linear regression.
- mean_squared_error and r2_score: to evaluate the performance of the regression model.
- tkinter: to create a graphical user interface.
- filedialog, messagebox, and Text: to open file-selection dialogs, show message boxes, and create text widgets.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import tkinter as tk
from tkinter import filedialog, messagebox, Text
Loading Data
This is where we load the CSV file we want to analyze and return it as a DataFrame using pandas.read_csv().
def load_data(file_path):
    return pd.read_csv(file_path)
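One thing worth spelling out: the rest of the script expects the CSV to contain the columns Depth, GR, Resistivity, Porosity, Saturation, Permeability, and Oil_Production (these names come straight from the functions below). A file starting like this, with purely hypothetical values, would work:

Depth,GR,Resistivity,Porosity,Saturation,Permeability,Oil_Production
1000,75.2,20.5,0.18,0.80,150.0,1200.0
1001,80.1,18.3,0.12,0.70,90.0,950.0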
Plotting Well Logs
After loading our log data, it is time to visualize it. The plot_well_logs() function creates five subplots, each showing a different well-log parameter against depth. Each parameter is plotted using sns.lineplot(), with the parameter in question on the x-axis and depth on the y-axis (inverted, so depth increases downward).
This function also adds labels and titles to each subplot for clarity and uses plt.tight_layout() to add some padding between plots. Finally, the plots are displayed with plt.show().
def plot_well_logs(data):
    fig, ax = plt.subplots(nrows=1, ncols=5, figsize=(25, 10), sharey=True)
    sns.lineplot(x='GR', y='Depth', data=data, ax=ax[0], color='g')
    ax[0].set_title('Gamma Ray (GR)')
    ax[0].invert_yaxis()
    ax[0].set_ylabel('Depth (ft)')
    ax[0].set_xlabel('GR (API)')
    sns.lineplot(x='Resistivity', y='Depth', data=data, ax=ax[1], color='b')
    ax[1].set_title('Resistivity')
    ax[1].set_xlabel('Resistivity (ohm.m)')
    sns.lineplot(x='Porosity', y='Depth', data=data, ax=ax[2], color='r')
    ax[2].set_title('Porosity')
    ax[2].set_xlabel('Porosity (fraction)')
    sns.lineplot(x='Saturation', y='Depth', data=data, ax=ax[3], color='m')
    ax[3].set_title('Saturation')
    ax[3].set_xlabel('Saturation (fraction)')
    sns.lineplot(x='Permeability', y='Depth', data=data, ax=ax[4], color='c')
    ax[4].set_title('Permeability')
    ax[4].set_xlabel('Permeability (mD)')
    plt.tight_layout()
    plt.show()
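If you want to preview the plots without going through the GUI, you can chain the two functions we have defined so far; the filename below is just a placeholder for your own CSV:

    data = load_data("well_logs.csv")  # hypothetical filename; use your own CSV
    plot_well_logs(data)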
Evaluating Reservoir Quality
Now that we have the visualization of our well logs, the next step is to decode the quality of our reservoir.
How do we do that? By checking whether each depth sample in our log data meets the criteria for porosity, permeability, and saturation using np.select(). This function applies the criteria and assigns the corresponding label (Good or Bad) to each row of the log data.
def evaluate_reservoir(data):
    conditions = [
        (data['Porosity'] >= 0.15) & (data['Saturation'] >= 0.75) & (data['Permeability'] >= 100),
        (data['Porosity'] < 0.15) | (data['Saturation'] < 0.75) | (data['Permeability'] < 100)
    ]
    choices = ['Good', 'Bad']
    data['Reservoir_Quality'] = np.select(conditions, choices, default='Bad')
    return data
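To see the labeling in action, here is a tiny sanity check with made-up values, assuming the imports and the evaluate_reservoir() function defined above; only the first row clears all three thresholds:

    sample = pd.DataFrame({
        'Porosity': [0.20, 0.10, 0.18],
        'Saturation': [0.80, 0.90, 0.60],
        'Permeability': [150, 200, 120],
    })
    # Row 0 passes all three cutoffs; row 1 fails porosity, row 2 fails saturation
    print(evaluate_reservoir(sample)['Reservoir_Quality'].tolist())
    # ['Good', 'Bad', 'Bad']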
Training The Model
What do you think comes after establishing the conditions that make a reservoir good quality? It’s time to train our model! That’s the goal of our train_model() function. This function begins by setting up the five log parameters as our independent variables (X) and oil production as our dependent variable (y). We then split this data into training and testing sets using train_test_split().
Next, we create a linear regression model that learns from the training data and makes predictions on the test set. To see how well our model performs, we use mean_squared_error() and r2_score(). Finally, the function returns all these results, giving us a clear picture of our model’s accuracy and effectiveness.
def train_model(data):
    X = data[['GR', 'Resistivity', 'Porosity', 'Saturation', 'Permeability']]
    y = data['Oil_Production']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    return model, X_test, y_test, y_pred, mse, r2
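A quick aside on the metrics: r2_score() computes R² = 1 − SS_res / SS_tot, so 1.0 is a perfect fit and 0.0 is no better than always predicting the mean. This little check with made-up numbers confirms the formula against scikit-learn (it assumes the imports above):

    # Verify R² = 1 - SS_res / SS_tot on toy values
    y_true = np.array([3.0, 5.0, 7.0, 9.0])
    y_hat = np.array([2.8, 5.3, 6.9, 9.4])
    ss_res = np.sum((y_true - y_hat) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    print(1 - ss_res / ss_tot)      # 0.985, computed manually
    print(r2_score(y_true, y_hat))  # same value from scikit-learn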
Determining Exploitation Suitability
With our log data visualized, conditions established, and model trained, the next step is to determine whether the well is worth exploiting. The determine_exploitation() function calculates the average predicted production and the average actual production over the test set. It then compares the average predicted production to 80% of the average actual production. If the prediction clears that threshold and the majority Reservoir_Quality label is Good, exploitation is recommended; otherwise, it is not.
def determine_exploitation(data, y_test, y_pred):
    avg_pred_production = np.mean(y_pred)
    avg_actual_production = np.mean(y_test)
    reservoir_quality = data['Reservoir_Quality'].value_counts().idxmax()
    if avg_pred_production > avg_actual_production * 0.8 and reservoir_quality == 'Good':
        return "This well is good for exploitation."
    else:
        return "This well is not ideal for exploitation."
Main Function to Load and Analyze Data
We have finally reached the brain of our script: the load_and_analyze() function, which orchestrates all the previous functions. How does it do this? Let’s go through the process step by step:
First, it uses filedialog to let the user select a CSV file. Once the user selects a file, it calls the load_data() function to load the data as previously described. Then, it calls the evaluate_reservoir() function to assess the reservoir quality. If all steps up to this point succeed, you should receive a success message indicating that the data is loaded and analyzed.
Next, the plot_well_logs() function is called to visualize the well logs. Following this, the script trains our model using the train_model() function. Finally, the determine_exploitation() function is used to judge whether the reservoir can be exploited or not. The results and data are then displayed in the GUI text boxes.
def load_and_analyze():
    file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")])
    if not file_path:
        return
    try:
        well_log_data = load_data(file_path)
        well_log_data = evaluate_reservoir(well_log_data)
        messagebox.showinfo("Success", "Data loaded and analyzed successfully.")
        plot_well_logs(well_log_data)
        model, X_test, y_test, y_pred, mse, r2 = train_model(well_log_data)
        messagebox.showinfo("Success", "Model trained successfully. Performance metrics are shown in the results box.")
        result = determine_exploitation(well_log_data, y_test, y_pred)
        messagebox.showinfo("Result", result)
        # Display results in the text box
        results_text.delete("1.0", tk.END)
        results_text.insert(tk.END, f"Mean Squared Error: {mse}\n")
        results_text.insert(tk.END, f"R-squared: {r2}\n")
        results_text.insert(tk.END, f"Reservoir Quality: {well_log_data['Reservoir_Quality'].value_counts().idxmax()}\n")
        results_text.insert(tk.END, f"Exploitation Recommendation: {result}\n")
        # Rearrange columns to have Reservoir_Quality next to Oil_Production
        cols = list(well_log_data.columns)
        cols.insert(cols.index('Oil_Production') + 1, cols.pop(cols.index('Reservoir_Quality')))
        well_log_data = well_log_data[cols]
        # Display the DataFrame in the text box
        data_text.delete("1.0", tk.END)
        data_text.insert(tk.END, well_log_data.to_string())
    except Exception as e:
        messagebox.showerror("Error", f"An error occurred: {e}")
Creating the GUI
Welcome to the command center of our script! This is where all the magic happens, combining the elements we created earlier into one seamless interface. Enter the create_gui() function, your control panel for this adventure.
First, it crafts the main window, setting the title and defining its geometry. Then, it rolls out a canvas for layout purposes and sets up a frame to house the “Load and Analyze Well Log Data” button, which triggers our powerful load_and_analyze() function.
But that’s not all! It also builds a results_frame and a data_frame to display results and data through the results_text and data_text widgets. Finally, it launches the main event loop with root.mainloop(), ensuring our command center remains active and responsive to your commands. Get ready to take control and see your data come to life!
def create_gui():
    global results_text, data_text
    root = tk.Tk()
    root.title("Well Log Data Analyzer and Reservoir Quality Predictor - The Pycodes")
    root.geometry("1000x700")
    canvas = tk.Canvas(root, height=600, width=800)
    canvas.pack()
    frame = tk.Frame(root, bg='#80c1ff', bd=5)
    frame.place(relx=0.5, rely=0.1, relwidth=0.8, relheight=0.2, anchor='n')
    button = tk.Button(frame, text="Load and Analyze Well Log Data", padx=10, pady=5, fg="white", bg="#263D42",
                       command=load_and_analyze)
    button.pack()
    results_frame = tk.Frame(root, bg='#80c1ff', bd=5)
    results_frame.place(relx=0.5, rely=0.35, relwidth=0.8, relheight=0.25, anchor='n')
    results_text = Text(results_frame, wrap=tk.WORD)
    results_text.pack(expand=True, fill='both')
    data_frame = tk.Frame(root, bg='#80c1ff', bd=5)
    data_frame.place(relx=0.5, rely=0.65, relwidth=0.8, relheight=0.3, anchor='n')
    data_text = Text(data_frame, wrap=tk.WORD)
    data_text.pack(expand=True, fill='both')
    root.mainloop()
Running the GUI
Lastly, this final part of the code ensures that the create_gui() function only runs when this script is executed directly, and not when it is imported as a module.
if __name__ == "__main__":
    create_gui()
Example
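If you don’t have real well-log data handy, here is a minimal sketch (not part of the script itself) that generates a synthetic CSV with the columns the script expects; every value is random and purely illustrative, including the made-up production response:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    n = 200
    df = pd.DataFrame({
        'Depth': np.arange(1000, 1000 + n),        # ft
        'GR': rng.uniform(20, 150, n),             # API
        'Resistivity': rng.uniform(1, 100, n),     # ohm.m
        'Porosity': rng.uniform(0.05, 0.30, n),    # fraction
        'Saturation': rng.uniform(0.40, 1.00, n),  # fraction
        'Permeability': rng.uniform(10, 500, n),   # mD
    })
    # A fabricated production target so train_model() has something to fit
    df['Oil_Production'] = (5000 * df['Porosity'] * df['Saturation']
                            + 2 * df['Permeability']
                            + rng.normal(0, 50, n))
    df.to_csv('synthetic_well_logs.csv', index=False)

Run it once, then point the “Load and Analyze Well Log Data” button at synthetic_well_logs.csv to see the plots, metrics, and recommendation.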
Full Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import tkinter as tk
from tkinter import filedialog, messagebox, Text


def load_data(file_path):
    return pd.read_csv(file_path)


def plot_well_logs(data):
    fig, ax = plt.subplots(nrows=1, ncols=5, figsize=(25, 10), sharey=True)
    sns.lineplot(x='GR', y='Depth', data=data, ax=ax[0], color='g')
    ax[0].set_title('Gamma Ray (GR)')
    ax[0].invert_yaxis()
    ax[0].set_ylabel('Depth (ft)')
    ax[0].set_xlabel('GR (API)')
    sns.lineplot(x='Resistivity', y='Depth', data=data, ax=ax[1], color='b')
    ax[1].set_title('Resistivity')
    ax[1].set_xlabel('Resistivity (ohm.m)')
    sns.lineplot(x='Porosity', y='Depth', data=data, ax=ax[2], color='r')
    ax[2].set_title('Porosity')
    ax[2].set_xlabel('Porosity (fraction)')
    sns.lineplot(x='Saturation', y='Depth', data=data, ax=ax[3], color='m')
    ax[3].set_title('Saturation')
    ax[3].set_xlabel('Saturation (fraction)')
    sns.lineplot(x='Permeability', y='Depth', data=data, ax=ax[4], color='c')
    ax[4].set_title('Permeability')
    ax[4].set_xlabel('Permeability (mD)')
    plt.tight_layout()
    plt.show()


def evaluate_reservoir(data):
    conditions = [
        (data['Porosity'] >= 0.15) & (data['Saturation'] >= 0.75) & (data['Permeability'] >= 100),
        (data['Porosity'] < 0.15) | (data['Saturation'] < 0.75) | (data['Permeability'] < 100)
    ]
    choices = ['Good', 'Bad']
    data['Reservoir_Quality'] = np.select(conditions, choices, default='Bad')
    return data


def train_model(data):
    X = data[['GR', 'Resistivity', 'Porosity', 'Saturation', 'Permeability']]
    y = data['Oil_Production']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    return model, X_test, y_test, y_pred, mse, r2


def determine_exploitation(data, y_test, y_pred):
    avg_pred_production = np.mean(y_pred)
    avg_actual_production = np.mean(y_test)
    reservoir_quality = data['Reservoir_Quality'].value_counts().idxmax()
    if avg_pred_production > avg_actual_production * 0.8 and reservoir_quality == 'Good':
        return "This well is good for exploitation."
    else:
        return "This well is not ideal for exploitation."


def load_and_analyze():
    file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")])
    if not file_path:
        return
    try:
        well_log_data = load_data(file_path)
        well_log_data = evaluate_reservoir(well_log_data)
        messagebox.showinfo("Success", "Data loaded and analyzed successfully.")
        plot_well_logs(well_log_data)
        model, X_test, y_test, y_pred, mse, r2 = train_model(well_log_data)
        messagebox.showinfo("Success", "Model trained successfully. Performance metrics are shown in the results box.")
        result = determine_exploitation(well_log_data, y_test, y_pred)
        messagebox.showinfo("Result", result)
        # Display results in the text box
        results_text.delete("1.0", tk.END)
        results_text.insert(tk.END, f"Mean Squared Error: {mse}\n")
        results_text.insert(tk.END, f"R-squared: {r2}\n")
        results_text.insert(tk.END, f"Reservoir Quality: {well_log_data['Reservoir_Quality'].value_counts().idxmax()}\n")
        results_text.insert(tk.END, f"Exploitation Recommendation: {result}\n")
        # Rearrange columns to have Reservoir_Quality next to Oil_Production
        cols = list(well_log_data.columns)
        cols.insert(cols.index('Oil_Production') + 1, cols.pop(cols.index('Reservoir_Quality')))
        well_log_data = well_log_data[cols]
        # Display the DataFrame in the text box
        data_text.delete("1.0", tk.END)
        data_text.insert(tk.END, well_log_data.to_string())
    except Exception as e:
        messagebox.showerror("Error", f"An error occurred: {e}")


def create_gui():
    global results_text, data_text
    root = tk.Tk()
    root.title("Well Log Data Analyzer and Reservoir Quality Predictor - The Pycodes")
    root.geometry("1000x700")
    canvas = tk.Canvas(root, height=600, width=800)
    canvas.pack()
    frame = tk.Frame(root, bg='#80c1ff', bd=5)
    frame.place(relx=0.5, rely=0.1, relwidth=0.8, relheight=0.2, anchor='n')
    button = tk.Button(frame, text="Load and Analyze Well Log Data", padx=10, pady=5, fg="white", bg="#263D42",
                       command=load_and_analyze)
    button.pack()
    results_frame = tk.Frame(root, bg='#80c1ff', bd=5)
    results_frame.place(relx=0.5, rely=0.35, relwidth=0.8, relheight=0.25, anchor='n')
    results_text = Text(results_frame, wrap=tk.WORD)
    results_text.pack(expand=True, fill='both')
    data_frame = tk.Frame(root, bg='#80c1ff', bd=5)
    data_frame.place(relx=0.5, rely=0.65, relwidth=0.8, relheight=0.3, anchor='n')
    data_text = Text(data_frame, wrap=tk.WORD)
    data_text.pack(expand=True, fill='both')
    root.mainloop()


if __name__ == "__main__":
    create_gui()
Happy Coding!