
About this project
This three-month project was a term project in Data Science Practicum, an elective course for students in the Bachelor of Science in Statistics program at Chulalongkorn University. Its main objective was to understand the end-to-end process of a data science project. Our team worked on a time series forecasting problem: predicting solar irradiance with a machine learning approach.
Our work
- Data source: The experimental data were collected from 2017-01-01 to 2018-12-31 by sensors installed on top of the Electrical Engineering Building, Chulalongkorn University.
- EDA: Performed exploratory data analysis with Matplotlib to identify seasonal patterns in irradiance at different times of day.
- Data cleaning: Applied linear interpolation, removed erroneous readings caused by sensor bugs, and replaced NaN values with the average irradiance at the corresponding time point.
- Feature engineering: Generated predictors including the exponential moving average (EMA) of past irradiance and the clear-sky irradiance.
- Train/validation/test split: Because the dataset was small, we split the data into bi-monthly (two-month) batches to avoid seasonality effects in irradiance.
- Modeling: Built a baseline model, tree-based models (Decision Tree, Random Forest and XGBoost) and linear models (Linear, Ridge and Lasso regression, and Elastic Net) to predict solar irradiance.
- Error analysis: Inspected the prediction errors to plan the next steps for improving the project.
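The EDA step looks for seasonal patterns at different times of day. One way to tabulate this, sketched below under assumptions, is a pivot of mean irradiance by hour of day (rows) and month (columns), drawn as one diurnal curve per month. The function names, the pandas-based layout (a Series indexed by timestamps) and the W/m² unit label are illustrative assumptions, not the project's code.

```python
import pandas as pd


def hourly_profile_by_month(s: pd.Series) -> pd.DataFrame:
    """Mean irradiance per (hour of day, month); rows = hour, columns = month.

    Assumes `s` is indexed by a DatetimeIndex (hypothetical layout).
    """
    df = pd.DataFrame({"irr": s.to_numpy(), "hour": s.index.hour, "month": s.index.month})
    return df.pivot_table(index="hour", columns="month", values="irr", aggfunc="mean")


def plot_profiles(profile: pd.DataFrame) -> None:
    """Draw one diurnal irradiance curve per month with Matplotlib."""
    # Imported here so the table above can be computed without Matplotlib installed.
    import matplotlib.pyplot as plt

    ax = profile.plot(figsize=(8, 4), marker=".")
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Mean irradiance (W/m², assumed unit)")
    ax.set_title("Diurnal irradiance profile by month")
    plt.tight_layout()
    plt.show()
```

Comparing the columns of the pivot shows how the daily curve shifts in height and width across months.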
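The cleaning bullet combines three operations. A minimal pandas sketch of that sequence follows, assuming a regularly sampled Series with a DatetimeIndex. The "sensor bug" rule used here (readings below 0 or above a hypothetical `max_valid` of 1500 W/m²) and the interpolation limit `max_gap` are stand-ins, since the actual criteria are not described.

```python
import numpy as np
import pandas as pd


def clean_irradiance(s: pd.Series, max_valid: float = 1500.0, max_gap: int = 3) -> pd.Series:
    """Clean a regularly sampled irradiance series (hypothetical thresholds)."""
    s = s.astype(float).copy()
    # 1) Treat physically implausible readings as missing (assumed sensor-bug rule).
    s[(s < 0) | (s > max_valid)] = np.nan
    # 2) Linear interpolation, filling at most `max_gap` consecutive NaNs per gap,
    #    and only NaNs that have valid readings on both sides.
    s = s.interpolate(method="linear", limit=max_gap, limit_area="inside")
    # 3) Fill whatever is still missing with the mean irradiance at the same time of day.
    tod_mean = s.groupby(s.index.time).transform("mean")
    return s.fillna(tod_mean)
```

Interpolating before the time-of-day fill keeps short gaps faithful to the neighbouring readings, while the time-of-day mean covers long outages and gaps at the edges of the series.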
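For the EMA predictor, the key detail is that the feature at time t must use only irradiance observed before t. A sketch with pandas `Series.ewm` follows. The spans (3, 6, 12 samples) and the `ema_<span>` column names are hypothetical choices.

```python
import pandas as pd


def add_ema_features(s: pd.Series, spans=(3, 6, 12)) -> pd.DataFrame:
    """EMA features of *past* irradiance, one column per span (hypothetical spans)."""
    # shift(1) makes the feature at row t depend only on values up to t-1,
    # so the target at t does not leak into its own predictor.
    past = s.shift(1)
    feats = {f"ema_{span}": past.ewm(span=span, adjust=False).mean() for span in spans}
    return pd.DataFrame(feats, index=s.index)
```

With `adjust=False` and `span=3`, the smoothing factor is α = 2 / (3 + 1) = 0.5, so for past values 10, 20, 30 the EMA runs 10, 15, 22.5.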
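The clear-sky model used in the project is not stated. As one possibility, the sketch below computes the Haurwitz clear-sky global horizontal irradiance, GHI = 1098 · cos z · exp(−0.059 / cos z), from an approximate solar zenith angle. It uses Cooper's declination formula and a textbook equation-of-time approximation. The site coordinates (about 13.74° N, 100.53° E) and the UTC+7 standard meridian (105° E) are assumptions for illustration.

```python
import math
from datetime import datetime

# Assumed, approximate site parameters (not from the original project).
LATITUDE_DEG = 13.74
LONGITUDE_DEG = 100.53
STANDARD_MERIDIAN_DEG = 105.0  # UTC+7


def cos_zenith(ts: datetime, lat: float = LATITUDE_DEG, lon: float = LONGITUDE_DEG,
               std_meridian: float = STANDARD_MERIDIAN_DEG) -> float:
    """Approximate cosine of the solar zenith angle for a local standard time."""
    n = ts.timetuple().tm_yday
    # Cooper's approximation of the solar declination.
    decl = math.radians(23.45 * math.sin(math.radians(360.0 * (284 + n) / 365.0)))
    # Equation of time in minutes (common textbook approximation).
    b = math.radians(360.0 * (n - 81) / 364.0)
    eot = 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)
    # Apparent solar time: clock time + 4 min per degree of longitude offset + EoT.
    clock_hours = ts.hour + ts.minute / 60.0 + ts.second / 3600.0
    solar_hours = clock_hours + (4.0 * (lon - std_meridian) + eot) / 60.0
    hour_angle = math.radians(15.0 * (solar_hours - 12.0))
    phi = math.radians(lat)
    return math.sin(phi) * math.sin(decl) + math.cos(phi) * math.cos(decl) * math.cos(hour_angle)


def haurwitz_clear_sky_ghi(ts: datetime) -> float:
    """Haurwitz clear-sky GHI in W/m^2; 0 when the sun is below the horizon."""
    cz = cos_zenith(ts)
    if cz <= 0:
        return 0.0
    return 1098.0 * cz * math.exp(-0.059 / cz)
```

Because this quantity depends only on time and location, it can be computed for future time steps as well, which makes it a convenient predictor.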
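The page does not say how each bi-monthly batch was divided. One plausible reading is that every two-month batch is split chronologically, so training, validation and test data all cover every season. The sketch below implements that assumption. The 60/20/20 fractions and the function name are hypothetical.

```python
import numpy as np
import pandas as pd


def bimonthly_split(index: pd.DatetimeIndex, train_frac: float = 0.6,
                    val_frac: float = 0.2) -> pd.Series:
    """Label each timestamp 'train', 'val' or 'test' within two-month batches.

    Assumes `index` is sorted in time. Within each batch, the earliest `train_frac`
    of samples are training, the next `val_frac` validation, the rest test.
    """
    # Batch id: Jan-Feb, Mar-Apr, ... of each year get distinct integers.
    batch = np.asarray(index.year * 6 + (index.month - 1) // 2)
    s = pd.Series(batch, index=index)
    pos = s.groupby(batch).cumcount().to_numpy()            # 0-based position in batch
    size = s.groupby(batch).transform("count").to_numpy()   # samples in that batch
    frac = pos / size
    labels = np.where(frac < train_frac, "train",
                      np.where(frac < train_frac + val_frac, "val", "test"))
    return pd.Series(labels, index=index)
```

A chronological split inside each batch also keeps validation and test samples later than the training samples they are compared against, which limits look-ahead leakage within a batch.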
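The page does not specify the baseline model. A common choice for irradiance forecasting is persistence (predict the previous value), so the sketch below assumes that. It also shows closed-form ridge regression in NumPy to make the linear-model family concrete. The project presumably used library implementations, and Lasso, Elastic Net and the tree-based models are not reproduced here.

```python
import numpy as np


def persistence_forecast(y: np.ndarray) -> np.ndarray:
    """Assumed baseline: predict y[t] with y[t-1]; the first prediction is NaN."""
    pred = np.empty(len(y), dtype=float)
    pred[0] = np.nan
    pred[1:] = y[:-1]
    return pred


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the coefficients.
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error, ignoring positions where the prediction is NaN."""
    mask = ~np.isnan(y_pred)
    return float(np.sqrt(np.mean((y_true[mask] - y_pred[mask]) ** 2)))
```

Setting `alpha=0` reduces ridge to ordinary least squares, and any candidate model is only worth keeping if its validation RMSE beats the persistence baseline.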
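For error analysis, one simple way to decide where to focus next is to break the absolute error down by hour of day. This shows whether, for example, midday peaks or sunrise and sunset dominate the error. The sketch assumes predictions and observations are pandas Series sharing a DatetimeIndex. The grouping by hour is an illustrative choice, not necessarily what the team inspected.

```python
import pandas as pd


def error_by_hour(y_true: pd.Series, y_pred: pd.Series) -> pd.DataFrame:
    """Mean, max and count of absolute error per hour of day."""
    abs_err = (y_true - y_pred).abs()
    return abs_err.groupby(abs_err.index.hour).agg(["mean", "max", "count"])
```

Grouping by month instead (`abs_err.index.month`) gives the seasonal counterpart of the same table.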