Log Ingestion Pipeline Toolbox
Personal Project · Independent Researcher and Developer
Log Ingestion Pipeline Toolbox: Modular ETL Framework for PostgreSQL
This Python-based toolkit provides a flexible, modular approach to transforming log files and loading them into PostgreSQL databases. It handles raw logs, CSVs, and relational schemas, against local or remote servers, and simplifies the end-to-end process through automation scripts and a GUI-based workflow.
Designed for data engineers, analysts, and developers, it supports a wide range of ingestion scenarios with intuitive interfaces and scalable code.
As the Creator and Lead Developer, I was responsible for:
- Raw Log to CSV Conversion: Developed log_to_csv.py to clean and convert unstructured log files into delimited CSVs for preprocessing or migration.
- Local ETL Automation: Implemented pipelineAutomation.py to automate the full local ingestion pipeline: CSV generation, PostgreSQL DB/table creation, and data loading.
- Remote Ingestion Support: Built remoteConnectionPipelineAutomation.py to handle secure ingestion into remote PostgreSQL instances, with duplicate handling and schema creation logic.
- User-Friendly GUI: Designed remotePipelineGUI.py with a Tkinter interface, allowing non-technical users to run the pipeline by inputting credentials and selecting files.
- Relational Schema Ingestion: Created twoTablePipeline.py to directly ingest logs into normalized PostgreSQL schemas using foreign keys, tracking file origin with a FileRegistry and PDWData structure.
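To illustrate the raw-log-to-CSV step, here is a minimal sketch using only the standard library. It is not the actual contents of log_to_csv.py: the log layout (timestamp, bracketed level, message), the column names, and the function signature are all assumptions made for the example.

```python
import csv
import io
import re

# Hypothetical log layout: "2024-01-15 10:32:07 [INFO] message text".
# The real log_to_csv.py may parse a different format.
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<level>[A-Z]+)\]\s+"
    r"(?P<message>.*)$"
)
FIELDS = ["timestamp", "level", "message"]


def log_to_csv(lines, out_file):
    """Write one CSV row per parseable log line; return (written, skipped)."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    written = skipped = 0
    for raw in lines:
        match = LINE_PATTERN.match(raw.strip())
        if match is None:
            skipped += 1  # blank or malformed line: leave it out of the CSV
            continue
        writer.writerow(match.groupdict())
        written += 1
    return written, skipped


sample = [
    "2024-01-15 10:32:07 [INFO] Pipeline started",
    "garbage line without a timestamp",
    "2024-01-15 10:32:09 [ERROR] Failed to parse record, retrying",
    "",
]
buffer = io.StringIO()
counts = log_to_csv(sample, buffer)  # (rows written, lines skipped)
csv_text = buffer.getvalue()
```

Writing through the csv module rather than joining strings by hand means a message containing a comma is quoted automatically, so the resulting file can be loaded into a table without column drift.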
Key Features:
- Convert unstructured logs into CSV for analysis or migration.
- Ingest CSVs into PostgreSQL with local and remote support.
- Create and manage relational schemas across multiple data sources.
- Simple GUI with responsive layout and hover effects.
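The relational-schema and duplicate-handling features can be sketched as follows. To keep the example runnable without a database server, it swaps in an in-memory SQLite database (3.24 or newer) for PostgreSQL. Only the table names FileRegistry and PDWData come from the project; every column name and the ingest_file helper are hypothetical. On PostgreSQL with psycopg2 the same idea would use %s placeholders and a SERIAL or IDENTITY key column.

```python
import sqlite3

# Column names are hypothetical; only the two table names come from the project.
SCHEMA = """
CREATE TABLE IF NOT EXISTS FileRegistry (
    file_id   INTEGER PRIMARY KEY,
    file_name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS PDWData (
    row_id  INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES FileRegistry(file_id),
    line_no INTEGER NOT NULL,
    payload TEXT NOT NULL,
    UNIQUE (file_id, line_no)
);
"""


def ingest_file(conn, file_name, records):
    """Register file_name once, then load its records; return rows added."""
    cur = conn.cursor()
    # Re-registering a known file is a no-op thanks to the UNIQUE constraint.
    cur.execute(
        "INSERT INTO FileRegistry (file_name) VALUES (?) "
        "ON CONFLICT (file_name) DO NOTHING",
        (file_name,),
    )
    cur.execute("SELECT file_id FROM FileRegistry WHERE file_name = ?", (file_name,))
    file_id = cur.fetchone()[0]

    count_sql = "SELECT COUNT(*) FROM PDWData WHERE file_id = ?"
    before = cur.execute(count_sql, (file_id,)).fetchone()[0]
    # Rows already loaded for this file (same line number) are skipped.
    cur.executemany(
        "INSERT INTO PDWData (file_id, line_no, payload) VALUES (?, ?, ?) "
        "ON CONFLICT (file_id, line_no) DO NOTHING",
        [(file_id, n, text) for n, text in enumerate(records, start=1)],
    )
    after = cur.execute(count_sql, (file_id,)).fetchone()[0]
    conn.commit()
    return after - before


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(SCHEMA)
first = ingest_file(conn, "run_01.log", ["pulse a", "pulse b"])
second = ingest_file(conn, "run_01.log", ["pulse a", "pulse b"])  # same file again
file_count = conn.execute("SELECT COUNT(*) FROM FileRegistry").fetchone()[0]
```

Keying each data row to its source file through a foreign key means that loading the same file twice adds nothing, and every row can still be traced back to the file it came from.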
Use Cases:
- Engineering teams ingesting PDW or sensor logs.
- Data analysts working with raw batch logs.
- PostgreSQL ETL demonstrations and workshops.
Technologies Used:
- Python 3.7+
- PostgreSQL
- psycopg2 (third-party PostgreSQL driver), plus tkinter and other standard-library modules
Impact: This toolbox offers a comprehensive and extensible approach to data ingestion, transforming complex log workflows into manageable pipelines. Whether automated via CLI or simplified through a GUI, it empowers both technical and non-technical users to interact with structured data reliably and securely.
Joshua Fields — full portfolio