How the Dewaxing LLM Works
The Dewaxing LLM project uses large language models to automate the extraction and structuring of job summaries and run details from WellView job logs. This enables more efficient data capture, reporting, and optimization of dewaxing operations.
Model Background
Job Summary
Modeling Strategy
For dewaxing jobs in WellView, comments from the job time log are extracted and used as input for the large language model (LLM). A labeled dataset is used to create prompt instructions and provide examples for few-shot prompting. A subset of unseen labeled examples is reserved for evaluation.
Objective: The model generates the number of runs, the relevant dates, and a comprehensive job summary. The job summary is then post-processed to produce a separate summary for each day of the job.
Example Input and Output:
Log Data:
Job Id: FD1D2182A580450DAFC7607319174610 Well Name: 7GEN HZ 102 KARR 13-11-64-4 Pad Name: 16-16-064-04W6 Job Category: Workover Time Log Start Date: 2022-08-23 00:00:00 Time Log End Date: 2022-08-23 11:00:00 Comments: Well flowing through Production … (additional time log entries)
Summary Output:
- Job Summary:
Date: ["2022-08-31"]
Number of Runs: 10
Summary:
Crew time on well: 300 minutes (11:30 to 16:30).
4 runs with barbed spear reaching depths from 100 m to 750 m, retrieving brown medium wax (100 L flowed back per run).
3 runs with wax knife to 750 m, retrieving brown medium wax: 2 runs at 100%, then 1 run at 10% (100 L flowed back per run).
2 runs with gauge, no issues, to 750 m and 2763 m.
1 run with BHBS assembly set in profile at 2763 m.
Bumper spring preparation: Yes (BHBS installed and plunger cycling).
- Daily Summary:
Crew time on well: 300 minutes (11:30 to 16:30).
4 runs with barbed spear reaching depths from 100 m to 750 m, retrieving brown medium wax (100 L flowed back per run).
3 runs with wax knife to 750 m, retrieving brown medium wax: 2 runs at 100%, then 1 run at 10% (100 L flowed back per run).
2 runs with gauge, no issues, to 750 m and 2763 m.
1 run with BHBS assembly set in profile at 2763 m.
Bumper spring preparation: Yes (BHBS installed and plunger cycling).
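The structured output above can be thought of as a small typed container. The sketch below is hypothetical (the production output structure is defined in the configuration file referenced later, not here); field names follow the inference-table columns:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class JobSummaryOutput:
    """Hypothetical container for the structured LLM output;
    field names follow the inference-table columns."""
    Date: List[str]          # ISO dates covered by the job
    NumberOfRuns: int        # total runs extracted from the time log
    JobSummary: str          # comprehensive whole-job summary
    DailySummary: List[str]  # one post-processed summary per date

summary = JobSummaryOutput(
    Date=["2022-08-31"],
    NumberOfRuns=10,
    JobSummary="Crew time on well: 300 minutes (11:30 to 16:30). ...",
    DailySummary=["Crew time on well: 300 minutes (11:30 to 16:30). ..."],
)
assert len(summary.Date) == len(summary.DailySummary)
```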
Model Selection and Configuration
At the time of this report, the model in use was "Claude Sonnet 4" with few-shot prompting and structured output.
For the most up-to-date model pipeline, refer to the main branch of the code repository: src/data_science/models/sources/fewshot_llm_model.py
For configuration, model serving, prompts, few-shot examples, and output structures, see: src/data_science/models/sources/job_summary_llm_config.yaml
Model Update History
| Version | Date | Updates |
|---|---|---|
| 1 |  | Initial release of model |
Using the Model
The model is registered in the Unity Catalog at: prd_zone3.dewaxingllm.job_summary_llm_model
Inference results are stored at: prd_zone3.dewaxingllm.job_summary_inference
Key Columns in the Inference Table:
| Column | Type | Description |
|---|---|---|
| APIUWI | string | Unique well identifier |
| WellID | string | WellView ID for the well |
| JOBID | string | WellView ID for the job |
| JobStartDate | timestamp | Start date of the job |
| JobEndDate | timestamp | End date of the job |
| PRIMARYJOBTYPE | string | Type of the job |
| SECONDARYJOBTYPE | string | Sub type of the job |
| LogData | string | Concatenated string containing job and time log details for LLM input |
| job_summary_predicted | struct | Extracted job summary from LogData using LLM |
| Date | array | Dates, extracted from job_summary_predicted |
| NumberOfRuns | int | Number of runs, extracted from job_summary_predicted |
| JobSummary | string | Job summary, extracted from job_summary_predicted |
| DailySummary | array | Daily summary, extracted from job_summary_predicted |
| mlflow_run_id | string | Unique identifier for each MLflow run |
| model_registry_name | string | Three-level name of the registered model in Unity Catalog (catalog.schema.model) |
| model_type | string | Type of the machine learning model used |
| model_version | string | Version of the machine learning model used |
| load_datetime_utc | timestamp | Date and time of data loading in Coordinated Universal Time (UTC) |
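As an illustration of consuming this table, the sketch below keeps only the most recently loaded inference row per job. It is a pure-Python stand-in with toy rows (`latest_per_job` is an illustrative name); in practice the table would be read with Spark:

```python
from datetime import datetime

# Toy rows standing in for a few key columns of
# prd_zone3.dewaxingllm.job_summary_inference.
rows = [
    {"JOBID": "J1", "NumberOfRuns": 9,  "load_datetime_utc": datetime(2024, 1, 1)},
    {"JOBID": "J1", "NumberOfRuns": 10, "load_datetime_utc": datetime(2024, 2, 1)},
    {"JOBID": "J2", "NumberOfRuns": 4,  "load_datetime_utc": datetime(2024, 1, 15)},
]

def latest_per_job(rows):
    """Keep only the most recently loaded inference row for each JOBID."""
    latest = {}
    for row in rows:
        current = latest.get(row["JOBID"])
        if current is None or row["load_datetime_utc"] > current["load_datetime_utc"]:
            latest[row["JOBID"]] = row
    return latest

assert latest_per_job(rows)["J1"]["NumberOfRuns"] == 10
```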
Integration with WellView API:
- Job summaries are sent to the summary field in the Jobs table.
- Daily summaries are sent to the summary field in the Daily Operations table.
- All AI-generated values are tagged with ‘AI Generated’.
Run Details
Modeling Strategy
For dewaxing jobs in WellView, comments from the job time log are extracted and used as input for the LLM. The objective is to identify and extract details from each run within a job. To achieve this, a labeled dataset is used to create prompt instructions and provide examples for few-shot prompting. A subset of unseen labeled examples is reserved for evaluation.
Data Structure and Field Descriptions:
- SwabNo (integer): Sequential numbers beginning with 1 at the start of the job.
- StartDate (string): Timestamp in the format YYYY-MM-DD HH:MM:SS.
- SolventOrSteam (string; enum: “0”, “0.1”, “1”, “1.1”):
- 0 = no steam, no solvent
- 0.1 = solvent only
- 1 = steam only
- 1.1 = steam and solvent
- 0 if not mentioned.
- WaxProperties (string or null; enum):
- Wax type code determined by properties mentioned in the log.
- null if not mentioned for that run.
- WaxPercentage (number or null):
- Decimal value of wax percentage.
- 0.0 if the log indicates no wax was observed.
- null if percentage is not mentioned or cannot be inferred.
- DepthPull (number or null):
- Depth reached during the run in meters.
- null if not mentioned.
- VolFluidRec (number or null):
- Volume in cubic meters (e.g., 0.05 for 50 L).
- null if not mentioned.
- TempWH (number or null):
- Wellhead temperature in Celsius. Usually recorded only for the first swab (SwabNo 1) at the start of the log.
- null if not mentioned.
- Com (string or null; enum: null, “spear”, “wax knife”, “gauge ring”, “bumper spring”, “plunger”):
- Specify the tool used only if it matches listed categories.
- null if no tool is mentioned or does not match.
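The field rules above can be captured in a small validator. This is a hypothetical sketch (the production pipeline enforces the schema through the structured-output configuration, not through a function like this), and it assumes WaxPercentage is expressed as a fraction in [0, 1]:

```python
SOLVENT_OR_STEAM = {"0", "0.1", "1", "1.1"}
TOOLS = {None, "spear", "wax knife", "gauge ring", "bumper spring", "plunger"}

def validate_run(run: dict) -> list:
    """Return a list of problems for one extracted run record,
    based on the field rules described above."""
    problems = []
    if not isinstance(run.get("SwabNo"), int) or run["SwabNo"] < 1:
        problems.append("SwabNo must be a positive integer")
    if run.get("SolventOrSteam") not in SOLVENT_OR_STEAM:
        problems.append("SolventOrSteam must be one of '0', '0.1', '1', '1.1'")
    if run.get("Com") not in TOOLS:
        problems.append("Com must be a listed tool or null")
    pct = run.get("WaxPercentage")
    # Assumption: percentages are decimals in [0, 1] (0.1 for 10%).
    if pct is not None and not 0.0 <= pct <= 1.0:
        problems.append("WaxPercentage must be in [0, 1] or null")
    return problems

run = {"SwabNo": 1, "StartDate": "2019-12-17 09:30:00",
       "SolventOrSteam": "0", "WaxPercentage": None,
       "DepthPull": 10.0, "VolFluidRec": 0.1, "Com": "spear"}
assert validate_run(run) == []
```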
Example Input and Output:
Log Data:
Job Id: 0F95E3E0E3104E5684F6E55DD5A9D062 Well Name: 7GEN HZ KAKWA 4-30-64-4 Pad Name: 16-17-064-04W6 Job Category: Workover Time Log Start Date: 2019-12-17 00:00:00 Time Log End Date: 2019-12-18 00:00:00 Comments: Record pressures FTP=2992kPa SICP=5764kPa 09:30 - RIH with spear to 10m, flow back to P-tank 100L 09:45 - RIH with spear to 50m, flow back to P-tank 100L … (additional time log entries)
Extracted Run Details:
- SwabNo: 1, StartDate: 2019-12-17 09:30:00, SolventOrSteam: '0', WaxProperties: null, WaxPercentage: null, DepthPull: 10.0, VolFluidRec: 0.1, TempWH: null, Com: spear, AI Generated
- SwabNo: 2, StartDate: 2019-12-17 09:45:00, SolventOrSteam: '0', WaxProperties: null, WaxPercentage: null, DepthPull: 50.0, VolFluidRec: 0.1, TempWH: null, Com: spear, AI Generated (Additional runs follow the same structure.)
Model Selection and Configuration
At the time of writing, the model in use was "Llama 4 Maverick" with few-shot prompting and structured output.
For the latest model pipeline, refer to the main branch of the code repository: src/data_science/models/sources/fewshot_llm_model.py
For configuration, model serving, prompts, few-shot examples, and output structures, see: src/data_science/models/sources/run_details_llm_config.yaml
Model Update History
| Version | Date | Updates |
|---|---|---|
| 1 |  | Initial release of model |
Model Usage
The model is registered in the Unity Catalog at: prd_zone3.dewaxingllm.run_details_llm_model
Inference results are stored at: prd_zone3.dewaxingllm.run_details_inference
Key Columns in the Inference Table:
| Column | Type | Description |
|---|---|---|
| APIUWI | string | Unique well identifier |
| WellID | string | WellView ID for the well |
| JOBID | string | WellView ID for the job |
| JobStartDate | timestamp | Start date of the job |
| JobEndDate | timestamp | End date of the job |
| PRIMARYJOBTYPE | string | Type of the job |
| SECONDARYJOBTYPE | string | Sub type of the job |
| LogData | string | Concatenated string containing job and time log details for LLM input |
| run_details_predicted | struct | Extracted run details from LogData using LLM |
| SwabNo | bigint | Sequential run number within the job, extracted from run_details_predicted |
| StartDate | timestamp | Date the run occurred, extracted from run_details_predicted |
| SolventOrSteam | string | Coded value indicating steam and/or solvent usage, extracted from run_details_predicted, captured as PresTub column in WellView database |
| WaxProperties | string | Coded value describing wax color and hardness, extracted from run_details_predicted, captured as PresCas column in WellView database |
| WaxPercentage | double | Percentage of tool capacity filled with wax (null if unspecified, 0 if noted as empty), extracted from run_details_predicted, captured as TankGauge column in WellView database |
| DepthPull | double | Maximum depth reached during the run, extracted from run_details_predicted |
| VolFluidRec | double | Volume of fluid recovered (in cubic meters), extracted from run_details_predicted |
| TempWH | double | Wellhead temperature; entered on the first swab when mentioned, extracted from run_details_predicted |
| Com | string | Tool used for the run and AI Generated, extracted from run_details_predicted |
| mlflow_run_id | string | Unique identifier for each MLflow run |
| model_registry_name | string | Three-level name of the registered model in Unity Catalog (catalog.schema.model) |
| model_type | string | Type of the machine learning model used |
| model_version | string | Version of the machine learning model used |
| load_datetime_utc | timestamp | Date and time of data loading in Coordinated Universal Time (UTC) |
Integration with WellView API:
- Job run details are sent to the Swab Details table.
- All AI-generated values are tagged as ‘AI Generated’.
Pipeline Overview
Training Workflow: prd-dewaxingllm-training
The training workflow consists of several steps to ensure robust model development and deployment:
Source Data Validation: Source data is tested to confirm adherence to the expected schema. If the data does not meet these criteria, the pipeline fails.
| # | Name | Schema | Validations | Key Notes |
|---|---|---|---|---|
| 1 | jobtimelog_v1 | prd_zone2.wellviewetl.jobtimelogv1 | Schema | Columns: STARTDATE, COMMENTS, ENDDATE, IDJOB |
| 2 | job_v1 | prd_zone2.wellviewetl.jobv1 | Schema | Columns: JOBCATEGORY, ID, PRIMARYJOBTYPE, SECONDARYJOBTYPE, STARTDATE, ENDDATE |
| 3 | wells_v1 | prd_zone2.wellviewetl.wellsv1 | Schema | Columns: PADNAME, WELLNAME, APIUWI, ID |

For detailed validation criteria, refer to src/tests/data/conftest.yml.
Feature Engineering: The feature engineering pipeline processes the entire historical dataset, consolidating time logs for each job.
Model Training: Model training is executed in parallel for both job summary and run details models:
- job_summary_training
- run_details_training
Model Validation: A set of labeled sample data is curated for both job summary and run details tasks, and stored in:
src/data_science/models/sources/job_summary_labeled_samples.yaml
src/data_science/models/sources/run_details_labeled_samples.yaml
After training, the model is evaluated using these labeled samples. Final accuracy metrics are calculated and logged in both the metrics table and MLflow. Model validation compares these metrics against the defined criteria; if the criteria are met, the model advances to the deployment step. Validation is performed in parallel for both models:
- Job Summary Model Validation Metrics:
- Compares 'NumberOfRuns' (difference ≤ 1), 'Date' arrays (exact match), and 'JobSummary' character length (difference ≤ 500).
- Computes per-column accuracy and overall accuracy across all columns.
- Job Summary Model Validation Criteria:
- Overall Accuracy: The model must achieve at least 80% overall accuracy.
- Number of Runs Accuracy: At least 70% accuracy in correctly identifying the number of runs.
- Date Accuracy: At least 70% accuracy in extracting relevant dates.
- Job Summary Length Accuracy: At least 70% accuracy in generating job summaries of appropriate length.
- Run Details Model Validation Metrics:
- Checks for exact matches between all columns, including StartDate, SolventOrSteam, WaxProperties, WaxPercentage, DepthPull, VolFluidRec, TempWH, and Com.
- Calculates per-column accuracy and overall accuracy across all columns.
- Run Details Model Validation Criteria:
- Overall Accuracy: The model must achieve at least 80% overall accuracy.
- Start Date Accuracy: At least 70% accuracy in correctly identifying start dates.
- Solvent or Steam Usage Accuracy: At least 70% accuracy in identifying solvent or steam usage.
- Wax Properties Accuracy: At least 70% accuracy in identifying wax properties.
- Wax Percentage Accuracy: At least 70% accuracy in identifying wax percentage.
- Depth Pull Accuracy: At least 70% accuracy in identifying the depth reached during each run.
- Recovered Fluid Volume Accuracy: At least 70% accuracy in identifying the volume of fluid recovered.
- Wellhead Temperature Accuracy: At least 70% accuracy in identifying wellhead temperature.
- Tool Used Accuracy: At least 70% accuracy in identifying the tool used in each run.
- Extra Predictions: The proportion of extra (unnecessary) predictions must not exceed 10%.
- Missing Predictions: The proportion of missing (unreported) predictions must not exceed 10%.
- This pipeline runs in parallel for both models:
- job_summary_model_validation
- run_details_model_validation
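The job-summary checks above amount to a per-column accuracy computation. The sketch below is illustrative (`job_summary_metrics` is a hypothetical name, not the function in the repository):

```python
def job_summary_metrics(predicted, labeled):
    """Per-column and overall accuracy (in %) for the job-summary checks:
    NumberOfRuns within +/-1, Date arrays exactly equal, and JobSummary
    length within 500 characters of the label."""
    checks = {
        "NumberOfRuns": lambda p, l: abs(p["NumberOfRuns"] - l["NumberOfRuns"]) <= 1,
        "Date": lambda p, l: p["Date"] == l["Date"],
        "JobSummary": lambda p, l: abs(len(p["JobSummary"]) - len(l["JobSummary"])) <= 500,
    }
    metrics = {
        name: 100.0 * sum(ok(p, l) for p, l in zip(predicted, labeled)) / len(labeled)
        for name, ok in checks.items()
    }
    # Overall accuracy is the mean of the per-column accuracies.
    metrics["Overall"] = sum(metrics.values()) / len(checks)
    return metrics

predicted = [{"NumberOfRuns": 10, "Date": ["2022-08-31"], "JobSummary": "a" * 100}]
labeled = [{"NumberOfRuns": 8, "Date": ["2022-08-31"], "JobSummary": "a" * 300}]
metrics = job_summary_metrics(predicted, labeled)
assert metrics["NumberOfRuns"] == 0.0  # off by 2 runs fails the +/-1 check
```

The run-details metrics follow the same pattern with exact-match checks on every column.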
Model Deployment: Once training is complete, the model is assigned the “Challenger” alias. If the model passes all minimum validation criteria, it is promoted to “Champion.” This pipeline runs in parallel for both models:
- job_summary_model_deployment
- run_details_model_deployment
Inference: Inference is performed on a small subset of data to test and create the inference table if not already available. This runs in parallel for both models:
- job_summary_inference
- run_details_inference
Output Data Validation: Output data is validated for schema, comment structure, null percentage, value ranges, etc. If the data does not meet these criteria, the pipeline will fail.
| # | Name | Schema | Validations | Key Notes |
|---|---|---|---|---|
| 1 | input_features | prd_zone3.dewaxingllm.input_features | Schema, Allowed Values, Null Thresholds, Compound Uniqueness, Column Comments | Allowed values for job types; uniqueness on (JOBID, WELLID); all key fields non-null |
| 2 | run_details_metrics | prd_zone3.dewaxingllm.run_details_metrics | Schema, Ranges, Null Thresholds, Compound Uniqueness, Column Comments | Accuracy metrics 0–100; uniqueness on (mlflow_run_id, load_datetime_utc); all columns non-null |
| 3 | run_details_prediction | prd_zone3.dewaxingllm.run_details_prediction | Schema, Null Thresholds, Compound Uniqueness, Column Comments | All critical fields non-null; uniqueness on (WellID, JOBID, mlflow_run_id, load_datetime_utc) |
| 4 | run_details_inference | prd_zone3.dewaxingllm.run_details_inference | Schema, Null Thresholds, Compound Uniqueness, Column Comments | Partial nulls allowed (e.g., WaxProperties ≤ 99%); uniqueness on (WellID, JOBID, SwabNo) |
| 5 | job_summary_metrics | prd_zone3.dewaxingllm.job_summary_metrics | Schema, Ranges, Null Thresholds, Compound Uniqueness, Column Comments | Accuracy fields 0–100; uniqueness on (mlflow_run_id, load_datetime_utc); all columns non-null |
| 6 | job_summary_prediction | prd_zone3.dewaxingllm.job_summary_prediction | Schema, Null Thresholds, Compound Uniqueness, Column Comments | No nulls in core fields; uniqueness on (WellID, JOBID, mlflow_run_id, load_datetime_utc) |
| 7 | job_summary_inference | prd_zone3.dewaxingllm.job_summary_inference | Schema, Ranges, Null Thresholds, Compound Uniqueness, Column Comments | Allows small null tolerance (JobSummary ≤ 10%); NumberOfRuns ≥ 0; uniqueness on (WellID, JOBID) |

For more details, refer to src/tests/data/conftest.yml.
Schedule: This pipeline is not scheduled for automatic retraining and will run on demand.
Inference Workflow: prd-dewaxingllm-inference
The inference pipeline performs the following steps to update feature and inference tables and send data to the Peloton WellView API. Tasks are parameterized to support real-time, backfill, or CI testing processes. For backfill, the time range of jobs and job IDs can be specified.
Feature Engineering: Runs feature engineering for real-time or backfill processes.
Inference: Executes inference for both models in parallel, with parameters defining the type of inference (CI_test, backfill, realtime) and the time range or job IDs for backfill.
- job_summary_inference
- run_details_inference
WellView API Push: Sends data from inference tables to WellView tables. The operation type (CI_test, backfill, realtime), time range, job IDs, and model selection (jobsummary, run_details, or both) can be specified.
Rules for API Integration:
Job Summary:
- Data is sent to the summary column in the Jobs (wvJob) table with the system tag ‘AI Generated’.
- If an existing summary exceeds 100 characters and does not begin with ‘Summary generated by AI’, it will not be updated.
- The summary will be updated if it is missing, shorter than 100 characters, or begins with ‘Summary generated by AI’.
- All AI-generated summaries begin with ‘Summary generated by AI’.
- If the summary exceeds WellView’s 2000-character limit, the existing human-generated summary is retained or a default message is written: “AI-generated job summary exceeded the 2000-character limit. Please see the daily summaries for full details.”
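The update rules above amount to a small decision function. A hedged sketch (`resolve_summary` is a hypothetical name; the behavior at exactly 100 characters and the precedence when the 2000-character limit is hit are interpretations of the rules):

```python
AI_PREFIX = "Summary generated by AI"
WELLVIEW_LIMIT = 2000
FALLBACK = ("AI-generated job summary exceeded the 2000-character limit. "
            "Please see the daily summaries for full details.")

def resolve_summary(existing, generated):
    """Return the value to write to the wvJob summary column,
    or None to leave the existing summary untouched."""
    # Never overwrite a substantial human-written summary.
    if existing and len(existing) > 100 and not existing.startswith(AI_PREFIX):
        return None
    # Every AI summary carries the required prefix.
    candidate = generated if generated.startswith(AI_PREFIX) else f"{AI_PREFIX}: {generated}"
    if len(candidate) > WELLVIEW_LIMIT:
        # Interpretation: keep an existing summary if there is one,
        # otherwise write the default fallback message.
        return None if existing else FALLBACK
    return candidate

assert resolve_summary(None, "Summary generated by AI: short") == "Summary generated by AI: short"
```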
Daily Summary:
- Data is sent to the summary (summaryops) column in the Daily Operations ([wvJobReport]) table with the system tag ‘AI Generated’.
- The same character limit and update rules apply as for job summaries.
Run Details: Data is sent to the Swab Details (wvSwabDetails) table, populating available fields:
- SwabNo: Sequential run number (integer)
- StartDate: Time of the run (datetime)
- PresTub (SolventOrSteam): Solvent or steam usage (double)
- PresCas (WaxProperties): Wax properties (double)
- TankGauge (WaxPercentage): Wax percentage (double)
- DepthPull: Maximum depth reached (double, meters)
- VolFluidRec: Volume of fluid recovered (double, cubic meters)
- TempWH: Wellhead temperature (double, °C; entered only for the first swab)
- Com: Tool used for the run, marked as AI Generated (string, max 2000 characters)
- If no parent record exists in the Swabs table (wvSwab), a new record is created using the run start time.
- All new records are tagged as ‘AI Generated’.
- Existing swab records are retained, and new AI-generated records are added.
- All Com columns include the ‘AI Generated’ tag.
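The field mapping above can be sketched as a plain dictionary transform. This is illustrative only (`to_swab_details` is a hypothetical name, the push itself goes through the Peloton WellView API, and casting the coded strings to doubles is an assumption based on the PresTub/PresCas column types):

```python
def to_swab_details(run):
    """Map one extracted run onto the wvSwabDetails fields listed above."""
    tool = run.get("Com")
    return {
        "SwabNo": run["SwabNo"],
        "StartDate": run["StartDate"],
        # WellView stores these coded values as doubles; assumes the
        # extracted string codes are numeric.
        "PresTub": float(run["SolventOrSteam"]) if run.get("SolventOrSteam") is not None else None,
        "PresCas": float(run["WaxProperties"]) if run.get("WaxProperties") is not None else None,
        "TankGauge": run.get("WaxPercentage"),
        "DepthPull": run.get("DepthPull"),
        "VolFluidRec": run.get("VolFluidRec"),
        "TempWH": run.get("TempWH"),
        # Com carries the tool name plus the mandatory provenance tag.
        "Com": f"{tool}, AI Generated" if tool else "AI Generated",
    }

record = to_swab_details({"SwabNo": 1, "StartDate": "2019-12-17 09:30:00",
                          "SolventOrSteam": "0", "WaxProperties": None,
                          "DepthPull": 10.0, "VolFluidRec": 0.1, "Com": "spear"})
assert record["Com"] == "spear, AI Generated"
```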
API Integration Logs: All logs with error/success status are saved in prd_zone3.dewaxingllm.api_integration_log.
Schedule: The pipeline runs daily at 3:30 AM Mountain Time.
Monitoring Workflow: prd-dewaxingllm-monitoring
The monitoring pipeline tracks both the job summary and run details models, detecting anomalies, missing data, and performance degradation, and logging and raising alerts for action.
- Job Summary Monitoring:
- Verifies new data has been added within the past week.
- Checks for null values outside defined ranges within the past week.
- Identifies missing records or errors between API integration logs and inference tables within the past week.
- Uses a custom LLM to judge completeness, formatting, and relevance of a sample of data from the past week.
- Run Details Monitoring:
- Verifies new data has been added within the past week.
- Checks for null values outside defined ranges within the past week.
- Identifies missing records or errors between API integration logs and inference tables within the past week.
- Uses a custom LLM to judge formatting and relevance of a sample of data from the past week.
- Overall Model Monitoring and Alerts:
- All models are monitored weekly, with results stored in prd_zone3.dewaxingllm.monitoring_summary.
- The model_drifted_overall column flags any failed monitoring checks, with additional columns providing details on specific issues.
- If any table has model_drifted_overall = 1, a SQL query in the Databricks workspace automatically triggers an email alert.
Schedule: The monitoring pipeline runs weekly on Tuesdays at 1 AM Mountain Time.