Say we have a dataset that has the following attributes:

- customer_id: There are a total of 1000 customers, each of them with a unique customer_id

- observation_date: The date on which a given customer's water consumption was observed (yyyy/mm/dd)

- amount_consumed: The amount of water (in liters) the customer consumed between the previous observation date and this observation date

As mentioned, there are 1000 customers in total, and their observations are taken at irregular intervals. The intervals range from 30 to 45 days and mostly differ from one reading to the next for each customer. The data spans 2 years, and I have about 19 to 24 observations per customer_id.

(I can also obtain data spanning 14 years.)
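Because the readings arrive every 30 to 45 days rather than once a month, a common first step is to turn them into a regular series. Below is a minimal pandas sketch with made-up readings (not my real data), assuming usage is spread evenly over each interval: it converts each reading into liters per day, spreads that rate over the days the interval covers, and sums per calendar month. The first reading of each customer is dropped because its start date is unknown.

```python
import pandas as pd

# Hypothetical readings in the layout described above (illustrative values only).
# amount_consumed = liters used since that customer's previous reading.
readings = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "observation_date": ["2020/01/01", "2020/01/31", "2020/03/01",
                         "2020/01/10", "2020/02/24"],
    "amount_consumed": [3000.0, 3000.0, 6000.0, 2000.0, 4500.0],
})
readings["observation_date"] = pd.to_datetime(readings["observation_date"], format="%Y/%m/%d")
readings = readings.sort_values(["customer_id", "observation_date"])

# The first reading of each customer has no known start date, so it is dropped.
readings["prev_date"] = readings.groupby("customer_id")["observation_date"].shift()
valid = readings.dropna(subset=["prev_date"]).copy()
valid["interval_days"] = (valid["observation_date"] - valid["prev_date"]).dt.days
valid["liters_per_day"] = valid["amount_consumed"] / valid["interval_days"]

# Assumption: constant usage within an interval. Spread the daily rate over the
# days it covers (day after the previous reading through this reading, inclusive).
pieces = []
for r in valid.itertuples(index=False):
    days = pd.date_range(r.prev_date + pd.Timedelta(days=1), r.observation_date, freq="D")
    pieces.append(pd.DataFrame({"customer_id": r.customer_id, "day": days,
                                "liters": r.liters_per_day}))
daily = pd.concat(pieces, ignore_index=True)

# Calendar-month totals per customer: a regular monthly series per customer_id.
monthly = daily.groupby(["customer_id", daily["day"].dt.to_period("M")])["liters"].sum()
print(monthly)
```

Each interval's total is preserved (customer 1's 3000 + 6000 liters still sum to 9000), while the interval that straddles a month boundary is split between the two months in proportion to its days.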

Now, I have a couple of questions:

1. From what I understand, we need to use time series forecasting methods on this data. If I want to forecast how much water each customer will consume next month, do I have to train a separate time series model for every customer? (e.g., train 1000 different time series models, one per customer_id)

2. What do you think would be the best algorithm to achieve this? From what I gather, machine learning algorithms tend to perform poorly when we don't have a lot of data points. So what method would you suggest for this particular task?

3. Would including monthly weather data (temperature, humidity, etc.) help improve accuracy, given that we don't have many data points?

4. Could we use methods such as Matrix Factorization here?
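To make question 4 concrete, here is a rough NumPy sketch of what a matrix factorization view could look like, assuming the readings have already been converted to a complete customer × month matrix of liters. The data is synthetic (each customer is a random mix of a flat base load and a 12-month seasonal profile), and the function name `low_rank_approx` is hypothetical. It uses a plain truncated SVD, which requires a matrix with no missing entries; missing months would need a method that handles gaps (e.g., alternating least squares), and the factorization by itself does not extrapolate to a future month.

```python
import numpy as np

def low_rank_approx(X, k):
    """Best rank-k approximation of X in the least-squares sense (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    customer_factors = U[:, :k] * s[:k]  # shape (n_customers, k)
    month_factors = Vt[:k, :]            # shape (k, n_months)
    return customer_factors, month_factors

rng = np.random.default_rng(0)
n_customers, n_months = 1000, 24

# Synthetic data, not real consumption: each row mixes a flat base load and a
# seasonal profile with random customer-specific weights.
months = np.arange(n_months)
base = np.ones(n_months)
season = np.sin(2 * np.pi * months / 12)
weights = rng.uniform(low=[3000, 0], high=[6000, 1500], size=(n_customers, 2))
X = weights @ np.vstack([base, season])  # customer x month matrix, rank 2

C, M = low_rank_approx(X, k=2)
X_hat = C @ M  # reconstruction from 2 customer factors and 2 month profiles
print(C.shape, M.shape)
```

Here the rows of `M` play the role of shared monthly patterns and each row of `C` says how strongly a customer follows them, so information is pooled across customers instead of fitting each one in isolation.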

Thanks in advance!
