How the Dewaxing LLM Works
The Dewaxing LLM project uses large language models to automate the extraction and structuring of job summaries and run details from WellView job logs. This enables more efficient data capture, reporting, and optimization of dewaxing operations.
Model Background
Job Summary
Modeling Strategy
For dewaxing jobs in WellView, comments from the job time log are extracted and used as input for the large language model (LLM). A labeled dataset is used to create prompt instructions and provide examples for few-shot prompting. A subset of unseen labeled examples is reserved for evaluation.
Objective: The model generates the number of runs, the relevant dates, and a comprehensive job summary. The job summary is then post-processed to produce a separate summary for each day of the job.
Example Input and Output:
Log Data:
Job Id: FD1D2182A580450DAFC7607319174610 Well Name: 7GEN HZ 102 KARR 13-11-64-4 Pad Name: 16-16-064-04W6 Job Category: Workover Time Log Start Date: 2022-08-23 00:00:00 Time Log End Date: 2022-08-23 11:00:00 Comments: Well flowing through Production … (additional time log entries)
Summary Output:
- Job Summary:
Date: ["2022-08-31"]
Number of Runs: 10
Summary:
Crew time on well: 300 minutes (11:30 to 16:30).
4 runs with barbed spear reaching depths from 100 m to 750 m, retrieving brown medium wax (100 L flowed back per run).
3 runs with wax knife to 750 m, retrieving brown medium wax: 2 runs at 100%, then 1 run at 10% (100 L flowed back per run).
2 runs with gauge, no issues, to 750 m and 2763 m.
1 run with BHBS assembly set in profile at 2763 m.
Bumper spring preparation: Yes (BHBS installed and plunger cycling).
- Daily Summary:
Crew time on well: 300 minutes (11:30 to 16:30).
4 runs with barbed spear reaching depths from 100 m to 750 m, retrieving brown medium wax (100 L flowed back per run).
3 runs with wax knife to 750 m, retrieving brown medium wax: 2 runs at 100%, then 1 run at 10% (100 L flowed back per run).
2 runs with gauge, no issues, to 750 m and 2763 m.
1 run with BHBS assembly set in profile at 2763 m.
Bumper spring preparation: Yes (BHBS installed and plunger cycling).
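The structured output above can be thought of as a small typed container. The sketch below is hypothetical (the production output structure is defined in the configuration file referenced later, not here); field names follow the inference-table columns:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class JobSummaryOutput:
    """Hypothetical container for the structured LLM output;
    field names follow the inference-table columns."""
    Date: List[str]          # ISO dates covered by the job
    NumberOfRuns: int        # total runs extracted from the time log
    JobSummary: str          # comprehensive whole-job summary
    DailySummary: List[str]  # one post-processed summary per date

summary = JobSummaryOutput(
    Date=["2022-08-31"],
    NumberOfRuns=10,
    JobSummary="Crew time on well: 300 minutes (11:30 to 16:30). ...",
    DailySummary=["Crew time on well: 300 minutes (11:30 to 16:30). ..."],
)
assert len(summary.Date) == len(summary.DailySummary)
```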
Model Selection and Configuration
At the time of this report, the model in use was "Claude Sonnet 4" with few-shot prompting and structured output.
For the most up-to-date model pipeline, refer to the main branch of the code repository: src/data_science/models/sources/fewshot_llm_model.py
For configuration, model serving, prompts, few-shot examples, and output structures, see: src/data_science/models/sources/job_summary_llm_config.yaml
Model Update History
| Version | Date | Updates |
|---|---|---|
| 1 |  | Initial release of model |
Using the Model
The model is registered in the Unity Catalog at: prd_zone3.dewaxingllm.job_summary_llm_model
Inference results are stored at: prd_zone3.dewaxingllm.job_summary_inference
Key Columns in the Inference Table:
| Column | Type | Description |
|---|---|---|
| APIUWI | string | Unique well identifier |
| WellID | string | WellView ID for the well |
| JOBID | string | WellView ID for the job |
| JobStartDate | timestamp | Start date of the job |
| JobEndDate | timestamp | End date of the job |
| PRIMARYJOBTYPE | string | Type of the job |
| SECONDARYJOBTYPE | string | Sub type of the job |
| LogData | string | Concatenated string containing job and time log details for LLM input |
| job_summary_predicted | struct | Extracted job summary from LogData using LLM |
| Date | array | Dates, extracted from job_summary_predicted |
| NumberOfRuns | int | Number of runs, extracted from job_summary_predicted |
| JobSummary | string | Job summary, extracted from job_summary_predicted |
| DailySummary | array | Daily summary, extracted from job_summary_predicted |
| mlflow_run_id | string | Unique identifier for each MLflow run |
| model_registry_name | string | Three-level name of the registered model in Unity Catalog (catalog.schema.model) |
| model_type | string | Type of the machine learning model used |
| model_version | string | Version of the machine learning model used |
| load_datetime_utc | timestamp | Date and time of data loading in Coordinated Universal Time (UTC) |
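As an illustration of consuming this table, the sketch below keeps only the most recently loaded inference row per job. It is a pure-Python stand-in with toy rows (`latest_per_job` is an illustrative name); in practice the table would be read with Spark:

```python
from datetime import datetime

# Toy rows standing in for a few key columns of
# prd_zone3.dewaxingllm.job_summary_inference.
rows = [
    {"JOBID": "J1", "NumberOfRuns": 9,  "load_datetime_utc": datetime(2024, 1, 1)},
    {"JOBID": "J1", "NumberOfRuns": 10, "load_datetime_utc": datetime(2024, 2, 1)},
    {"JOBID": "J2", "NumberOfRuns": 4,  "load_datetime_utc": datetime(2024, 1, 15)},
]

def latest_per_job(rows):
    """Keep only the most recently loaded inference row for each JOBID."""
    latest = {}
    for row in rows:
        current = latest.get(row["JOBID"])
        if current is None or row["load_datetime_utc"] > current["load_datetime_utc"]:
            latest[row["JOBID"]] = row
    return latest

assert latest_per_job(rows)["J1"]["NumberOfRuns"] == 10
```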
Integration with WellView API:
- Job summaries are sent to the summary field in the Jobs table.
- Daily summaries are sent to the summary field in the Daily Operations table.
- All AI-generated values are tagged with ‘AI Generated’.
Run Details
Modeling Strategy
For dewaxing jobs in WellView, comments from the job time log are extracted and used as input for the LLM. The objective is to identify and extract details from each run within a job. To achieve this, a labeled dataset is used to create prompt instructions and provide examples for few-shot prompting. A subset of unseen labeled examples is reserved for evaluation.
Data Structure and Field Descriptions:
- SwabNo (integer): Sequential numbers beginning with 1 at the start of the job.
- StartDate (string): Timestamp in the format YYYY-MM-DD HH:MM:SS.
- SolventOrSteam (string; enum: “0”, “0.1”, “1”, “1.1”):
- 0 = no steam, no solvent
- 0.1 = solvent only
- 1 = steam only
- 1.1 = steam and solvent
- 0 if not mentioned.
- WaxProperties (string or null; enum):
- Wax type code determined by properties mentioned in the log.
- null if not mentioned for that run.
- WaxPercentage (number or null):
- Decimal value of wax percentage.
- 0.0 if the log indicates no wax was observed.
- null if percentage is not mentioned or cannot be inferred.
- DepthPull (number or null):
- Depth reached during the run in meters.
- null if not mentioned.
- VolFluidRec (number or null):
- Volume in cubic meters (e.g., 0.05 for 50 L).
- null if not mentioned.
- TempWH (number or null):
- Wellhead temperature in Celsius. Usually recorded only for the first swab (SwabNo 1) at the start of the log.
- null if not mentioned.
- Com (string or null; enum: null, “spear”, “wax knife”, “gauge ring”, “bumper spring”, “plunger”):
- Specify the tool used only if it matches listed categories.
- null if no tool is mentioned or does not match.
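The field rules above can be captured in a small validator. This is a hypothetical sketch (the production pipeline enforces the schema through the structured-output configuration, not through a function like this), and it assumes WaxPercentage is expressed as a fraction in [0, 1]:

```python
SOLVENT_OR_STEAM = {"0", "0.1", "1", "1.1"}
TOOLS = {None, "spear", "wax knife", "gauge ring", "bumper spring", "plunger"}

def validate_run(run: dict) -> list:
    """Return a list of problems for one extracted run record,
    based on the field rules described above."""
    problems = []
    if not isinstance(run.get("SwabNo"), int) or run["SwabNo"] < 1:
        problems.append("SwabNo must be a positive integer")
    if run.get("SolventOrSteam") not in SOLVENT_OR_STEAM:
        problems.append("SolventOrSteam must be one of '0', '0.1', '1', '1.1'")
    if run.get("Com") not in TOOLS:
        problems.append("Com must be a listed tool or null")
    pct = run.get("WaxPercentage")
    # Assumption: percentages are decimals in [0, 1] (0.1 for 10%).
    if pct is not None and not 0.0 <= pct <= 1.0:
        problems.append("WaxPercentage must be in [0, 1] or null")
    return problems

run = {"SwabNo": 1, "StartDate": "2019-12-17 09:30:00",
       "SolventOrSteam": "0", "WaxPercentage": None,
       "DepthPull": 10.0, "VolFluidRec": 0.1, "Com": "spear"}
assert validate_run(run) == []
```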
Example Input and Output:
Log Data:
Job Id: 0F95E3E0E3104E5684F6E55DD5A9D062 Well Name: 7GEN HZ KAKWA 4-30-64-4 Pad Name: 16-17-064-04W6 Job Category: Workover Time Log Start Date: 2019-12-17 00:00:00 Time Log End Date: 2019-12-18 00:00:00 Comments: Record pressures FTP=2992kPa SICP=5764kPa 09:30 - RIH with spear to 10m, flow back to P-tank 100L 09:45 - RIH with spear to 50m, flow back to P-tank 100L … (additional time log entries)
Extracted Run Details:
- SwabNo: 1, StartDate: 2019-12-17 09:30:00, SolventOrSteam: '0', WaxProperties: null, WaxPercentage: null, DepthPull: 10.0, VolFluidRec: 0.1, TempWH: null, Com: spear, AI Generated
- SwabNo: 2, StartDate: 2019-12-17 09:45:00, SolventOrSteam: '0', WaxProperties: null, WaxPercentage: null, DepthPull: 50.0, VolFluidRec: 0.1, TempWH: null, Com: spear, AI Generated (Additional runs follow the same structure.)
Model Selection and Configuration
At the time of writing, the model in use was "Llama 4 Maverick" with few-shot prompting and structured output.
For the latest model pipeline, refer to the main branch of the code repository: src/data_science/models/sources/fewshot_llm_model.py
For configuration, model serving, prompts, few-shot examples, and output structures, see: src/data_science/models/sources/run_details_llm_config.yaml
Model Update History
| Version | Date | Updates |
|---|---|---|
| 1 |  | Initial release of model |
Model Usage
The model is registered in the Unity Catalog at: prd_zone3.dewaxingllm.run_details_llm_model
Inference results are stored at: prd_zone3.dewaxingllm.run_details_inference
Key Columns in the Inference Table:
| Column | Type | Description |
|---|---|---|
| APIUWI | string | Unique well identifier |
| WellID | string | WellView ID for the well |
| JOBID | string | WellView ID for the job |
| JobStartDate | timestamp | Start date of the job |
| JobEndDate | timestamp | End date of the job |
| PRIMARYJOBTYPE | string | Type of the job |
| SECONDARYJOBTYPE | string | Sub type of the job |
| LogData | string | Concatenated string containing job and time log details for LLM input |
| run_details_predicted | struct | Extracted run details from LogData using LLM |
| SwabNo | bigint | Sequential run number within the job, extracted from run_details_predicted |
| StartDate | timestamp | Date the run occurred, extracted from run_details_predicted |
| SolventOrSteam | string | Coded value indicating steam and/or solvent usage, extracted from run_details_predicted, captured as PresTub column in WellView database |
| WaxProperties | string | Coded value describing wax color and hardness, extracted from run_details_predicted, captured as PresCas column in WellView database |
| WaxPercentage | double | Percentage of tool capacity filled with wax (null if unspecified, 0 if noted as empty), extracted from run_details_predicted, captured as TankGauge column in WellView database |
| DepthPull | double | Maximum depth reached during the run, extracted from run_details_predicted |
| VolFluidRec | double | Volume of fluid recovered (in cubic meters), extracted from run_details_predicted |
| TempWH | double | Wellhead temperature; entered on the first swab when mentioned, extracted from run_details_predicted |
| Com | string | Tool used for the run and AI Generated, extracted from run_details_predicted |
| mlflow_run_id | string | Unique identifier for each MLflow run |
| model_registry_name | string | Three-level name of the registered model in Unity Catalog (catalog.schema.model) |
| model_type | string | Type of the machine learning model used |
| model_version | string | Version of the machine learning model used |
| load_datetime_utc | timestamp | Date and time of data loading in Coordinated Universal Time (UTC) |
Integration with WellView API:
- Job run details are sent to the Swab Details table.
- All AI-generated values are tagged as ‘AI Generated’.
Pipeline Overview
Training Workflow: prd-dewaxingllm-training
The training workflow consists of several steps to ensure robust model development and deployment:
Source Data Validation: Source data is tested to confirm adherence to the expected schema. If the data does not meet these criteria, the pipeline fails.
| # | Name | Schema | Validations | Key Notes |
|---|---|---|---|---|
| 1 | jobtimelog_v1 | prd_zone2.wellviewetl.jobtimelogv1 | Schema | Columns: STARTDATE, COMMENTS, ENDDATE, IDJOB |
| 2 | job_v1 | prd_zone2.wellviewetl.jobv1 | Schema | Columns: JOBCATEGORY, ID, PRIMARYJOBTYPE, SECONDARYJOBTYPE, STARTDATE, ENDDATE |
| 3 | wells_v1 | prd_zone2.wellviewetl.wellsv1 | Schema | Columns: PADNAME, WELLNAME, APIUWI, ID |

For detailed validation criteria, refer to src/tests/data/conftest.yml.
Feature Engineering: The feature engineering pipeline processes the entire historical dataset, consolidating time logs for each job.
Model Training: Model training is executed in parallel for both job summary and run details models:
- job_summary_training
- run_details_training
Model Validation: A set of labeled sample data is curated for both job summary and run details tasks, and stored in:
src/data_science/models/sources/job_summary_labeled_samples.yaml
src/data_science/models/sources/run_details_labeled_samples.yaml
After training, the model is evaluated using these labeled samples. Final accuracy metrics are calculated and logged in both the metrics table and MLflow. Model validation compares these metrics against the defined criteria; if the criteria are met, the model advances to the deployment step. Validation is performed in parallel for both models:
- Job Summary Model Validation Metrics:
- Compares 'NumberOfRuns' (difference ≤ 1), 'Date' arrays (exact match), and 'JobSummary' character length (difference ≤ 500).
- Computes per-column accuracy and overall accuracy across all columns.
- Job Summary Model Validation Criteria:
- Overall Accuracy: The model must achieve at least 80% overall accuracy.
- Number of Runs Accuracy: At least 70% accuracy in correctly identifying the number of runs.
- Date Accuracy: At least 70% accuracy in extracting relevant dates.
- Job Summary Length Accuracy: At least 70% accuracy in generating job summaries of appropriate length.
- Run Details Model Validation Metrics:
- Checks for exact matches between all columns, including StartDate, SolventOrSteam, WaxProperties, WaxPercentage, DepthPull, VolFluidRec, TempWH, and Com.
- Calculates per-column accuracy and overall accuracy across all columns.
- Run Details Model Validation Criteria:
- Overall Accuracy: The model must achieve at least 80% overall accuracy.
- Start Date Accuracy: At least 70% accuracy in correctly identifying start dates.
- Solvent or Steam Usage Accuracy: At least 70% accuracy in identifying solvent or steam usage.
- Wax Properties Accuracy: At least 70% accuracy in identifying wax properties.
- Wax Percentage Accuracy: At least 70% accuracy in identifying wax percentage.
- Depth Pull Accuracy: At least 70% accuracy in identifying the depth reached during each run.
- Recovered Fluid Volume Accuracy: At least 70% accuracy in identifying the volume of fluid recovered.
- Wellhead Temperature Accuracy: At least 70% accuracy in identifying wellhead temperature.
- Tool Used Accuracy: At least 70% accuracy in identifying the tool used in each run.
- Extra Predictions: The proportion of extra (unnecessary) predictions must not exceed 10%.
- Missing Predictions: The proportion of missing (unreported) predictions must not exceed 10%.
- This pipeline runs in parallel for both models:
- job_summary_model_validation
- run_details_model_validation
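The job-summary checks above amount to a per-column accuracy computation. The sketch below is illustrative (`job_summary_metrics` is a hypothetical name, not the function in the repository):

```python
def job_summary_metrics(predicted, labeled):
    """Per-column and overall accuracy (in %) for the job-summary checks:
    NumberOfRuns within +/-1, Date arrays exactly equal, and JobSummary
    length within 500 characters of the label."""
    checks = {
        "NumberOfRuns": lambda p, l: abs(p["NumberOfRuns"] - l["NumberOfRuns"]) <= 1,
        "Date": lambda p, l: p["Date"] == l["Date"],
        "JobSummary": lambda p, l: abs(len(p["JobSummary"]) - len(l["JobSummary"])) <= 500,
    }
    metrics = {
        name: 100.0 * sum(ok(p, l) for p, l in zip(predicted, labeled)) / len(labeled)
        for name, ok in checks.items()
    }
    # Overall accuracy is the mean of the per-column accuracies.
    metrics["Overall"] = sum(metrics.values()) / len(checks)
    return metrics

predicted = [{"NumberOfRuns": 10, "Date": ["2022-08-31"], "JobSummary": "a" * 100}]
labeled = [{"NumberOfRuns": 8, "Date": ["2022-08-31"], "JobSummary": "a" * 300}]
metrics = job_summary_metrics(predicted, labeled)
assert metrics["NumberOfRuns"] == 0.0  # off by 2 runs fails the +/-1 check
```

The run-details metrics follow the same pattern with exact-match checks on every column.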
Model Deployment: Once training is complete, the model is assigned the “Challenger” alias. If the model passes all minimum validation criteria, it is promoted to “Champion.” This pipeline runs in parallel for both models:
- job_summary_model_deployment
- run_details_model_deployment
Inference: Inference is performed on a small subset of data to test and create the inference table if not already available. This runs in parallel for both models:
- job_summary_inference
- run_details_inference
Output Data Validation: Output data is validated for schema, comment structure, null percentage, value ranges, etc. If the data does not meet these criteria, the pipeline will fail.
| # | Name | Schema | Validations | Key Notes |
|---|---|---|---|---|
| 1 | input_features | prd_zone3.dewaxingllm.input_features | Schema, Allowed Values, Null Thresholds, Compound Uniqueness, Column Comments | Allowed values for job types; uniqueness on (JOBID, WELLID); all key fields non-null |
| 2 | run_details_metrics | prd_zone3.dewaxingllm.run_details_metrics | Schema, Ranges, Null Thresholds, Compound Uniqueness, Column Comments | Accuracy metrics 0–100; uniqueness on (mlflow_run_id, load_datetime_utc); all columns non-null |
| 3 | run_details_prediction | prd_zone3.dewaxingllm.run_details_prediction | Schema, Null Thresholds, Compound Uniqueness, Column Comments | All critical fields non-null; uniqueness on (WellID, JOBID, mlflow_run_id, load_datetime_utc) |
| 4 | run_details_inference | prd_zone3.dewaxingllm.run_details_inference | Schema, Null Thresholds, Compound Uniqueness, Column Comments | Partial nulls allowed (e.g., WaxProperties ≤ 99%); uniqueness on (WellID, JOBID, SwabNo) |
| 5 | job_summary_metrics | prd_zone3.dewaxingllm.job_summary_metrics | Schema, Ranges, Null Thresholds, Compound Uniqueness, Column Comments | Accuracy fields 0–100; uniqueness on (mlflow_run_id, load_datetime_utc); all columns non-null |
| 6 | job_summary_prediction | prd_zone3.dewaxingllm.job_summary_prediction | Schema, Null Thresholds, Compound Uniqueness, Column Comments | No nulls in core fields; uniqueness on (WellID, JOBID, mlflow_run_id, load_datetime_utc) |
| 7 | job_summary_inference | prd_zone3.dewaxingllm.job_summary_inference | Schema, Ranges, Null Thresholds, Compound Uniqueness, Column Comments | Allows small null tolerance (JobSummary ≤ 10%); NumberOfRuns ≥ 0; uniqueness on (WellID, JOBID) |

For more details, refer to src/tests/data/conftest.yml.
Schedule: This pipeline is not scheduled for automatic retraining and will run on demand.
Inference Workflow: prd-dewaxingllm-inference
The inference pipeline performs the following steps to update feature and inference tables and send data to the Peloton WellView API. Tasks are parameterized to support real-time, backfill, or CI testing processes. For backfill, the time range of jobs and job IDs can be specified.
Feature Engineering: Runs feature engineering for real-time or backfill processes.
Inference: Executes inference for both models in parallel, with parameters defining the type of inference (CI_test, backfill, realtime) and the time range or job IDs for backfill.
- job_summary_inference
- run_details_inference
WellView API Push: Sends data from inference tables to WellView tables. The operation type (CI_test, backfill, realtime), time range, job IDs, and model selection (jobsummary, run_details, or both) can be specified.
Rules for API Integration:
Job Summary:
- Data is sent to the summary column in the Jobs (wvJob) table with the system tag ‘AI Generated’.
- If an existing summary exceeds 100 characters and does not begin with ‘Summary generated by AI’, it will not be updated.
- The summary will be updated if it is missing, shorter than 100 characters, or begins with ‘Summary generated by AI’.
- All AI-generated summaries begin with ‘Summary generated by AI’.
- If the summary exceeds WellView’s 2000-character limit, the existing human-generated summary is retained or a default message is written: “AI-generated job summary exceeded the 2000-character limit. Please see the daily summaries for full details.”
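The update rules above amount to a small decision function. A hedged sketch (`resolve_summary` is a hypothetical name; the behavior at exactly 100 characters and the precedence when the 2000-character limit is hit are interpretations of the rules):

```python
AI_PREFIX = "Summary generated by AI"
WELLVIEW_LIMIT = 2000
FALLBACK = ("AI-generated job summary exceeded the 2000-character limit. "
            "Please see the daily summaries for full details.")

def resolve_summary(existing, generated):
    """Return the value to write to the wvJob summary column,
    or None to leave the existing summary untouched."""
    # Never overwrite a substantial human-written summary.
    if existing and len(existing) > 100 and not existing.startswith(AI_PREFIX):
        return None
    # Every AI summary carries the required prefix.
    candidate = generated if generated.startswith(AI_PREFIX) else f"{AI_PREFIX}: {generated}"
    if len(candidate) > WELLVIEW_LIMIT:
        # Interpretation: keep an existing summary if there is one,
        # otherwise write the default fallback message.
        return None if existing else FALLBACK
    return candidate

assert resolve_summary(None, "Summary generated by AI: short") == "Summary generated by AI: short"
```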
Daily Summary:
- Data is sent to the summary (summaryops) column in the Daily Operations ([wvJobReport]) table with the system tag ‘AI Generated’.
- The same character limit and update rules apply as for job summaries.
Run Details: Data is sent to the Swab Details (wvSwabDetails) table, populating available fields:
- SwabNo: Sequential run number (integer)
- StartDate: Time of the run (datetime)
- PresTub (SolventOrSteam): Solvent or steam usage (double)
- PresCas (WaxProperties): Wax properties (double)
- TankGauge (WaxPercentage): Wax percentage (double)
- DepthPull: Maximum depth reached (double, meters)
- VolFluidRec: Volume of fluid recovered (double, cubic meters)
- TempWH: Wellhead temperature (double, °C; entered only for the first swab)
- Com: Tool used for the run, marked as AI Generated (string, max 2000 characters)
- If no parent record exists in the Swabs table (wvSwab), a new record is created using the run start time.
- All new records are tagged as ‘AI Generated’.
- Existing swab records are retained, and new AI-generated records are added.
- All Com columns include the ‘AI Generated’ tag.
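The field mapping above can be sketched as a plain dictionary transform. This is illustrative only (`to_swab_details` is a hypothetical name, the push itself goes through the Peloton WellView API, and casting the coded strings to doubles is an assumption based on the PresTub/PresCas column types):

```python
def to_swab_details(run):
    """Map one extracted run onto the wvSwabDetails fields listed above."""
    tool = run.get("Com")
    return {
        "SwabNo": run["SwabNo"],
        "StartDate": run["StartDate"],
        # WellView stores these coded values as doubles; assumes the
        # extracted string codes are numeric.
        "PresTub": float(run["SolventOrSteam"]) if run.get("SolventOrSteam") is not None else None,
        "PresCas": float(run["WaxProperties"]) if run.get("WaxProperties") is not None else None,
        "TankGauge": run.get("WaxPercentage"),
        "DepthPull": run.get("DepthPull"),
        "VolFluidRec": run.get("VolFluidRec"),
        "TempWH": run.get("TempWH"),
        # Com carries the tool name plus the mandatory provenance tag.
        "Com": f"{tool}, AI Generated" if tool else "AI Generated",
    }

record = to_swab_details({"SwabNo": 1, "StartDate": "2019-12-17 09:30:00",
                          "SolventOrSteam": "0", "WaxProperties": None,
                          "DepthPull": 10.0, "VolFluidRec": 0.1, "Com": "spear"})
assert record["Com"] == "spear, AI Generated"
```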
API Integration Logs: All logs with error/success status are saved in prd_zone3.dewaxingllm.api_integration_log.
Schedule: The pipeline runs daily at 3:30 AM Mountain Time.
Monitoring Workflow: prd-dewaxingllm-monitoring
The monitoring pipeline tracks both the job summary and run details models, detecting anomalies, missing data, and performance degradation, and logging and raising alerts for action.
- Job Summary Monitoring:
- Verifies new data has been added within the past week.
- Checks for null values outside defined ranges within the past week.
- Identifies missing records or errors between API integration logs and inference tables within the past week.
- Uses a custom LLM to judge completeness, formatting, and relevance of a sample of data from the past week.
- Run Details Monitoring:
- Verifies new data has been added within the past week.
- Checks for null values outside defined ranges within the past week.
- Identifies missing records or errors between API integration logs and inference tables within the past week.
- Uses a custom LLM to judge formatting and relevance of a sample of data from the past week.
- Overall Model Monitoring and Alerts:
- All models are monitored weekly, with results stored in prd_zone3.dewaxingllm.monitoring_summary.
- The model_drifted_overall column flags any failed monitoring checks, with additional columns providing details on specific issues.
- If any table has model_drifted_overall = 1, a SQL query in the Databricks workspace automatically triggers an email alert.
Schedule: The monitoring pipeline runs weekly on Tuesdays at 1 AM Mountain Time.