processor
create_audit_record
create_audit_record (batch_id:str, source_id:str, total_questions:int, processing_time_seconds:float, llm_config:Dict[str,Any], summary_stats:Dict[str,Any])
*Create an audit record for the extraction batch.*

Args:
- batch_id: Unique identifier for the batch
- source_id: Identifier for the source document
- total_questions: Number of questions processed
- processing_time_seconds: Time taken for processing
- llm_config: Configuration used for LLM calls
- summary_stats: Summary statistics from create_summary_stats

Returns: Audit record dictionary
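A minimal sketch of the shape such an audit record might take. The exact field names and the timestamp field are assumptions for illustration, not guaranteed by the library:

```python
# Sketch of assembling an audit record from batch metadata; the exact
# field names (e.g. "created_at") are assumptions, not the library's API.
import time
import uuid
from typing import Any, Dict

def build_audit_record_sketch(
    batch_id: str,
    source_id: str,
    total_questions: int,
    processing_time_seconds: float,
    llm_config: Dict[str, Any],
    summary_stats: Dict[str, Any],
) -> Dict[str, Any]:
    # Combine identifiers, timing, config, and stats into one flat record.
    return {
        "batch_id": batch_id,
        "source_id": source_id,
        "total_questions": total_questions,
        "processing_time_seconds": processing_time_seconds,
        "llm_config": llm_config,
        "summary_stats": summary_stats,
        "created_at": time.time(),  # hypothetical timestamp field
    }

record = build_audit_record_sketch(
    batch_id=str(uuid.uuid4()),
    source_id="doc-001",
    total_questions=12,
    processing_time_seconds=3.5,
    llm_config={"model": "gpt-4", "temperature": 0.0},
    summary_stats={"success": 11, "failed": 1},
)
```

Keeping the llm_config and summary_stats nested (rather than flattening them) makes the record self-describing for later auditing.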
create_summary_stats
create_summary_stats (results:List[llm_data_extractor.models.ExtractionResult])
*Create summary statistics for a batch of extraction results.*

Args:
- results: List of ExtractionResult objects

Returns: Dictionary with summary statistics
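A sketch of what summarizing a batch might look like. ExtractionResult's real fields are not shown in this doc, so the stand-in below (with a success flag and optional confidence score) is an assumption for illustration:

```python
# Sketch of batch summary statistics; FakeResult is a hypothetical
# stand-in for llm_data_extractor.models.ExtractionResult.
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class FakeResult:
    question_id: str
    success: bool
    confidence: Optional[float] = None

def summary_stats_sketch(results: List[FakeResult]) -> Dict[str, Any]:
    successes = [r for r in results if r.success]
    confidences = [r.confidence for r in successes if r.confidence is not None]
    return {
        "total": len(results),
        "succeeded": len(successes),
        "failed": len(results) - len(successes),
        "avg_confidence": sum(confidences) / len(confidences) if confidences else None,
    }

stats = summary_stats_sketch([
    FakeResult("q1", True, 1.0),
    FakeResult("q2", True, 0.5),
    FakeResult("q3", False),
])
# stats == {"total": 3, "succeeded": 2, "failed": 1, "avg_confidence": 0.75}
```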
format_for_target_tables
format_for_target_tables (results:List[llm_data_extractor.models.ExtractionResult], questions_map:Dict[str,Any])
*Format results grouped by target table for direct insertion into business tables.*

Args:
- results: List of ExtractionResult objects
- questions_map: Map of question_id to Question objects

Returns: Dictionary with table names as keys and records as values
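A sketch of the grouping this describes: each result is routed, via its question, to the table that question targets. The result and Question shapes (target_table, target_column, value) are assumptions for illustration:

```python
# Sketch of grouping extraction results by target table. The dict shapes
# standing in for ExtractionResult and Question are hypothetical.
from collections import defaultdict
from typing import Any, Dict, List

def group_by_target_table_sketch(
    results: List[Dict[str, Any]],
    questions_map: Dict[str, Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    tables: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for r in results:
        question = questions_map[r["question_id"]]
        # Route the extracted value to the table/column the question targets.
        tables[question["target_table"]].append(
            {question["target_column"]: r["value"]}
        )
    return dict(tables)

grouped = group_by_target_table_sketch(
    [{"question_id": "q1", "value": "ACME"},
     {"question_id": "q2", "value": 42}],
    {"q1": {"target_table": "companies", "target_column": "name"},
     "q2": {"target_table": "metrics", "target_column": "headcount"}},
)
# grouped == {"companies": [{"name": "ACME"}], "metrics": [{"headcount": 42}]}
```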
format_for_db
format_for_db (results:List[llm_data_extractor.models.ExtractionResult], source_id:Optional[str]=None, batch_id:Optional[str]=None)
*Format extraction results for database insertion.*

Args:
- results: List of ExtractionResult objects
- source_id: Optional identifier for the source document/text
- batch_id: Optional identifier for the processing batch
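A sketch of the likely behavior: flatten each result into a row and stamp the optional provenance identifiers onto every row. The per-result fields are assumptions for illustration:

```python
# Sketch of formatting results as flat rows with optional provenance
# columns; the per-result dict fields are hypothetical.
from typing import Any, Dict, List, Optional

def format_for_db_sketch(
    results: List[Dict[str, Any]],
    source_id: Optional[str] = None,
    batch_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    rows = []
    for r in results:
        row = dict(r)  # one flat dict per result
        # Attach the optional provenance identifiers to every row.
        if source_id is not None:
            row["source_id"] = source_id
        if batch_id is not None:
            row["batch_id"] = batch_id
        rows.append(row)
    return rows

rows = format_for_db_sketch(
    [{"question_id": "q1", "value": "ACME"}],
    source_id="doc-001",
    batch_id="batch-42",
)
# rows == [{"question_id": "q1", "value": "ACME",
#           "source_id": "doc-001", "batch_id": "batch-42"}]
```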
process_query
process_query (query:str, llm_config:llm_data_extractor.models.LLMConfig, db_config:llm_data_extractor.models.DBConfig, results_table_name:str, max_workers:int=4)
*Fetches data from Snowflake, processes it in parallel using an LLM, and inserts results back.*

Args:
- query: SQL query to fetch the data to be processed.
- llm_config: Configuration for the LLM.
- db_config: Configuration for the database connection.
- results_table_name: Name of the table to store the results.
- max_workers: The maximum number of threads to use for parallel processing.

Returns: A dictionary containing summary statistics of the processing run.
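An end-to-end sketch of the fetch, parallel-LLM, insert pipeline this describes, using a ThreadPoolExecutor to mirror the max_workers parameter. The Snowflake query and LLM call are stubs; the real function drives them through its DBConfig and LLMConfig objects:

```python
# Sketch of the process_query pipeline: fetch rows, process them in
# parallel threads, insert the results back. fetch_rows and call_llm are
# stubs standing in for the Snowflake and LLM calls.
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

def fetch_rows(query: str) -> List[Dict[str, Any]]:
    # Stub for the Snowflake fetch.
    return [{"id": i, "text": f"row {i}"} for i in range(5)]

def call_llm(row: Dict[str, Any]) -> Dict[str, Any]:
    # Stub for one LLM extraction call on one row.
    return {"id": row["id"], "extracted": row["text"].upper()}

def process_query_sketch(query: str, max_workers: int = 4) -> Dict[str, Any]:
    rows = fetch_rows(query)
    # Process rows concurrently; max_workers caps the thread pool size.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(call_llm, rows))
    inserted = len(results)  # stand-in for the insert-back step
    return {"fetched": len(rows), "processed": len(results), "inserted": inserted}

summary = process_query_sketch("SELECT * FROM docs", max_workers=2)
# summary == {"fetched": 5, "processed": 5, "inserted": 5}
```

A thread pool fits here because the work is I/O-bound (network calls to the LLM and the database), so threads overlap waiting time rather than competing for the GIL.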