About this project

This three-month project was a term project in Data Science Practicum (an elective for students in the Bachelor of Science in Statistics program, Chulalongkorn University). The main objective of the project was to understand the end-to-end process of a data science project. Our team focused on a time series forecasting problem, predicting solar irradiance, using a machine learning approach.


Our work

  • Data source: The experimental data were collected from 2017-01-01 to 2018-12-31 using sensors installed on top of the Electrical Engineering Building, Chulalongkorn University.
  • EDA: Performed exploratory data analysis with Matplotlib to identify seasonal patterns of irradiance at different times of day.
  • Data cleaning: Performed linear interpolation, removed readings caused by sensor bugs, and replaced NaN values with the average irradiance at that time point.
  • Feature engineering: Generated predictors consisting of an exponential moving average of past irradiance and the clear-sky irradiance.
  • Train/validation/test split: Due to the small size of the data, we split the data into bi-monthly batches to avoid the seasonality effect of irradiance.
  • Modeling: Built a baseline model, tree-based models (Decision Tree, Random Forest and XGBoost) and linear models (Linear/Ridge/Lasso regression and Elastic Net) to predict solar irradiance.
  • Error analysis: Inspected the prediction errors to plan the next steps for improving the project.
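
The cleaning, feature engineering and splitting steps above can be sketched roughly as follows with pandas. This is a minimal illustration, not the project's actual code: the function names, the interpolation limit (max_gap), the EMA span, the 70/15/15 fractions and the reading of "bi-monthly" as two-month blocks are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_irradiance(series, max_gap=3):
    """Interpolate short gaps, then fall back to the time-of-day mean."""
    # Linear interpolation, filling at most `max_gap` consecutive missing
    # samples (the limit is an assumed value, not taken from the project).
    filled = series.interpolate(method="linear", limit=max_gap, limit_area="inside")
    # Mean of the observed irradiance at each clock time (minute of the day).
    minute_of_day = np.asarray(series.index.hour * 60 + series.index.minute)
    time_of_day_mean = series.groupby(minute_of_day).transform("mean")
    # Remaining NaN values are replaced by the mean for that time of day.
    return filled.fillna(time_of_day_mean)


def ema_of_past(series, span=6):
    """Exponential moving average that uses only values before each time step."""
    # shift(1) keeps the current observation out of its own feature.
    return series.shift(1).ewm(span=span, adjust=False).mean()


def bimonthly_split(index, train_frac=0.7, val_frac=0.15):
    """Label rows train/validation/test inside each two-month block (assumed scheme)."""
    # Jan-Feb, Mar-Apr, ... of each year share one block id.
    block = np.asarray((index.year * 12 + index.month - 1) // 2)
    grouped = pd.Series(0, index=index).groupby(block)
    # Relative chronological position of each row within its block, in [0, 1).
    position = grouped.cumcount() / grouped.transform("count")
    labels = np.where(
        position < train_frac,
        "train",
        np.where(position < train_frac + val_frac, "validation", "test"),
    )
    return pd.Series(labels, index=index, name="split")


# Usage on synthetic hourly data (made-up values, for illustration only).
idx = pd.date_range("2017-01-01", "2017-04-30 23:00", freq="h")
raw = pd.Series(np.asarray(idx.hour, dtype=float), index=idx)
raw.iloc[5] = np.nan        # a short gap
raw.iloc[34:44] = np.nan    # a gap longer than max_gap
cleaned = clean_irradiance(raw)
ema = ema_of_past(cleaned)
split = bimonthly_split(cleaned.index)
```

Splitting inside every two-month block means each of the three sets sees every part of the year, which is one way to keep seasonal differences from leaking into the evaluation.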