Understanding the ITMS Problem statement

The problem statement asks us to predict the estimated time of arrival from five parameters. The inputs to the algorithm, i.e. Route_ID, Direction, Bus_Name, Time and Stop_Code, are all static. This means the inputs do not capture anomalies, and the problem boils down to building a simple model or a timetable. In other words, it answers the simple question “At what time is the bus supposed to come?” from existing statistics, and doesn’t really “predict” anything.

Is my understanding right?

PS. I think including the ‘current_location’ would play a big role in making a dynamic “prediction”. Also, modeling the “type of anomaly” could improve the prediction, though that would be an entirely different problem statement.

Hi Raghavendra,
Your understanding is correct.
The problem statement specifies that, given a stop_id, a time and a route_id, you are to find the estimated arrival time of the bus on that route. You may specify your result as a distribution with a mean and a variance.
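
To illustrate what a result "as a distribution with a mean and variance" could look like, here is a minimal sketch of the timetable-style baseline. It is only one possible reading of the task: the record layout (route_id, stop_id, scheduled minute, actual minute), the function names and the sample values are all assumptions made up for illustration, not part of the ITMS dataset.

```python
from collections import defaultdict
from statistics import mean, pvariance


def fit_delay_model(history):
    """Group historical delays by (route_id, stop_id).

    `history` is a hypothetical list of tuples:
    (route_id, stop_id, scheduled_minute, actual_minute),
    with times given as minutes after midnight.
    """
    delays = defaultdict(list)
    for route_id, stop_id, scheduled, actual in history:
        delays[(route_id, stop_id)].append(actual - scheduled)
    # Store the mean and population variance of the delay for each key.
    return {key: (mean(d), pvariance(d)) for key, d in delays.items()}


def predict_arrival(model, route_id, stop_id, scheduled):
    """Return (mean arrival minute, variance) for one scheduled arrival."""
    mu, var = model[(route_id, stop_id)]
    return scheduled + mu, var


# Made-up sample data: three observations of the 09:00 arrival at one stop.
history = [
    ("R1", "S5", 540, 543),
    ("R1", "S5", 540, 545),
    ("R1", "S5", 540, 541),
]
model = fit_delay_model(history)
eta_mean, eta_var = predict_arrival(model, "R1", "S5", 540)
print(eta_mean, eta_var)
```

A model like this only reproduces historical averages, which is why it behaves like a timetable rather than a live prediction.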

To your point about the problem boiling down to a simple modelling problem:
You may obtain an estimate from a simple regression on the bus data. But that would be a simple model and would not earn you a good score. You could make the model a little more complex by including adjacent routes and their correlation with the estimated times on the chosen route. This will earn you more marks.
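
As a rough sketch of the "adjacent routes" idea, the snippet below fits an ordinary least-squares line that predicts the delay on the chosen route from the delay observed on one adjacent route. The choice of a single feature, the function name `fit_line` and the numbers are assumptions for illustration only; a real submission would likely use more features and real data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up delays in minutes, observed at the same times on two routes.
adjacent_delay = [0.0, 2.0, 4.0, 6.0]
chosen_delay = [1.0, 2.0, 3.0, 4.0]

slope, intercept = fit_line(adjacent_delay, chosen_delay)

# Expected delay on the chosen route when the adjacent route runs 5 minutes late.
predicted = intercept + slope * 5.0
print(slope, intercept, predicted)
```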

Modelling the type of anomaly will fetch you marks too, in both the data-science portion and the applications portion.

I hope this answers your question.

Hi Raghavendra,

The need to use the current location of the bus is implicit in the problem statement. You could check whether the bus is currently in transit and what its last known location was to get better estimates.
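
One way this could look in practice, as a sketch under stated assumptions: if the last stop the bus passed and the time it passed are known, the estimate is that time plus the average travel times of the remaining segments. The stop order, the per-segment averages and the function name are hypothetical and chosen only to illustrate the idea.

```python
def eta_from_last_location(stop_order, segment_minutes, last_stop,
                           last_seen_minute, target_stop):
    """Estimate arrival at target_stop from the last stop the bus passed.

    stop_order:       stops along the route, in travel order (assumed known)
    segment_minutes:  average travel time between consecutive stops;
                      segment_minutes[k] covers stop k to stop k + 1
    last_seen_minute: minutes after midnight when the bus passed last_stop
    """
    i = stop_order.index(last_stop)
    j = stop_order.index(target_stop)
    if j < i:
        raise ValueError("bus has already passed the target stop")
    # Add the remaining segment times to the last observation time.
    return last_seen_minute + sum(segment_minutes[i:j])


# Made-up route with four stops and three segments.
stops = ["S1", "S2", "S3", "S4"]
segments = [4.0, 6.5, 3.0]

# Bus passed S2 at 10:00 (minute 600); estimate its arrival at S4.
eta = eta_from_last_location(stops, segments, "S2", 600, "S4")
print(eta)
```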
