Pollution exposure test points

During the testing of the pollution exposure prediction model, the test points will be from the locations where sensors are currently present, right?

The test points should include both locations with sensors and other nearby locations.

Hi. Test points can also be locations where there are no sensors. For your own testing, you may deliberately hold out a sensor and evaluate how well the algorithm predicts its readings. When we evaluate your algorithm, we will test it against a reference sensor that we have not exposed in the catalogue.
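To make the hold-out idea concrete, here is a minimal sketch of a leave-one-sensor-out check. The function names, the layout of `readings` and the numbers are made up for illustration, and `mean_baseline` is only a placeholder predictor. It is not the procedure that will be used for the official evaluation.

```python
from math import sqrt


def leave_one_sensor_out(readings, predict):
    """Hold out each sensor in turn and return the RMSE of the predictions.

    readings: dict of sensor_id -> (lat, lon, pm25)  (hypothetical layout)
    predict:  callable(train_readings, lat, lon) -> predicted pm25
    """
    errors = []
    for held_out, (lat, lon, actual) in readings.items():
        # Train only on the sensors that were not held out.
        train = {k: v for k, v in readings.items() if k != held_out}
        errors.append(predict(train, lat, lon) - actual)
    return sqrt(sum(e * e for e in errors) / len(errors))


def mean_baseline(train, lat, lon):
    """Placeholder predictor: ignores location, returns the training mean."""
    values = [pm25 for _, _, pm25 in train.values()]
    return sum(values) / len(values)


# Made-up readings, for illustration only.
readings = {
    "sensor_a": (18.52, 73.85, 40.0),
    "sensor_b": (18.54, 73.86, 50.0),
    "sensor_c": (18.56, 73.87, 60.0),
}
rmse = leave_one_sensor_out(readings, mean_baseline)
print(f"leave-one-sensor-out RMSE: {rmse:.2f}")
```

Swapping `mean_baseline` for your own model gives a rough idea of how it behaves at a location it has never seen.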

Thank you for the reply. The pollution exposure problem statement mentions that data may be available from IITM AQM stations. Is that data already available? The data I have come across so far from the sensors [0 and 1] in the catalogue looks faulty: the PM10_MAX value is less than the PM2_MAX value, which is wrong.

Yes, it's available. You can search for IITM on pudx.catalogue.iudx.org.in by selecting tags and find the ID.
Also, you can follow the same steps we used in the Jupyter notebook for the AQM-Climo sensors to obtain the data for IITM. You will have to add "iitm" to the tags, e.g.,
attributes = {"tags": ["aqi", "aqm", "iitm"]}

Please try it and let me know if you encounter any issues.


Thanks a lot. Much appreciated.

Can we have the IITM and Climo data until the 25th of November? The data available right now runs only until the 10th of November. This year it rained until the second week of November, so the usual winter weather resumed only after that week. As of now there are few or no training points for predicting PM values in the usual winter period, so I would like to request data until the 25th of November if possible. Thank you.

Hi Ritesh,
You can use the APIs as shown in the AQM Jupyter notebook to obtain data up to the 25th of November. Please note that "during" queries for AQM and IITM have a time limit: you can only query 15 days at a time. Since the archives from the download API are available only up to the 10th, you can use the "during" function of pyIUDX or the "during" time series query.
Here are a few examples -
Curl (For a particular AQM Resource)

curl -X POST \
  -H 'Content-Type: application/json' \
  -H 'Postman-Token: 28c34cf7-01ac-4707-99fd-40893fa31abc' \
  -H 'cache-control: no-cache' \
  -d '{
    "id": "rbccps.org/aa9d66a000d94a78895de8d4c0b3a67f3450e531/pudx-resource-server/aqm-bosch-climo/ABC Farm House Junction_4",
    "time": "2019-11-20T00:00:00+05:30/2019-11-25T00:00:00+05:30",
    "TRelation": "during"
  }'

Using the SDK (this will give you data from a bunch of sensors) -

from pyIUDX.cat import cat
from pyIUDX.rs import item

# Catalogue client for the PUDX catalogue
cat = cat.Catalogue("https://pudx.catalogue.iudx.org.in/catalogue/v1")
# Circular search area around the given point
geo1 = {"circle": {"lat": 18.539107, "lon": 73.853987, "radius": 3000}}
attributes = {"tags": ["aqm"]}
filters = ["id"]
# IDs of all AQM resource items inside the circle
allAQMItemsByID = cat.getManyResourceItems(attributes, filters, geo=geo1)
aqms = item.Items("https://pudx.catalogue.iudx.org.in/catalogue/v1", allAQMItemsByID)
# Fetch the readings between the two timestamps (at most 15 days per query)
aqms.during("2019-10-26T00:00:00.000+05:30", "2019-11-02T00:00:00.000+05:30")

This is also shown in the Jupyter notebook.
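Since a "during" query can only cover 15 days at a time, a longer period has to be split into several windows. Here is a rough sketch of one way to do that. The helper name `during_windows` and the dates are made up for illustration, and it assumes a window of exactly 15 days is still accepted by the server.

```python
from datetime import datetime, timedelta


def during_windows(start, end, max_days=15):
    """Split [start, end] into consecutive windows of at most max_days each."""
    windows = []
    step = timedelta(days=max_days)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


# Example period, chosen only for illustration.
start = datetime.fromisoformat("2019-11-01T00:00:00+05:30")
end = datetime.fromisoformat("2019-11-25T00:00:00+05:30")

for w_start, w_end in during_windows(start, end):
    # Same timestamp format as in the SDK example, with milliseconds.
    t0 = w_start.isoformat(timespec="milliseconds")
    t1 = w_end.isoformat(timespec="milliseconds")
    print(t0, t1)  # each pair could be passed to aqms.during(t0, t1)
```

Adjacent windows share their boundary timestamp, so you may want to drop duplicate readings after combining the results.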

For IITM -
You can find the ID of the resource on pudx.catalogue.iudx.org.in by searching for "iitm" as a tag.
One such ID is -

An example of a curl command for the same is -

curl -X POST \
  -H 'Content-Type: application/json' \
  -H 'Postman-Token: 4efcf6a3-8729-4113-adec-c48b1546d55a' \
  -H 'cache-control: no-cache' \
  -d '{
    "id": "rbccps.org/aa9d66a000d94a78895de8d4c0b3a67f3450e531/pudx-resource-server/pune-iitm-aqi/Shivajinagar",
    "time": "2019-11-20T00:00:00+05:30/2019-11-25T00:00:00+05:30",
    "TRelation": "during"
  }'

You can do the same with the SDK.

Let me know if you have any further queries.

Also, regarding your question about PM10_MAX being less than PM2_MAX: you will have to take the data as it is. We do not expect you to write algorithms to calibrate the sensors, though you may do so if you have an idea of how to do it.
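If you do want to look at how often this happens before deciding what to do with such readings, a small check like the one below may help. It is only a sketch: the field names PM10_MAX and PM2_MAX are the ones mentioned in this thread, while the row layout (a list of dicts with numeric values) and the sample numbers are assumptions made for illustration.

```python
def flag_inconsistent(rows):
    """Return the rows in which PM10_MAX is lower than PM2_MAX.

    PM10 includes the PM2.5 fraction, so a lower PM10 value points to a
    sensor or logging problem rather than to real air quality.
    """
    return [row for row in rows if row["PM10_MAX"] < row["PM2_MAX"]]


# Made-up readings, for illustration only.
rows = [
    {"time": "2019-11-01T10:00:00+05:30", "PM2_MAX": 42.0, "PM10_MAX": 80.0},
    {"time": "2019-11-01T11:00:00+05:30", "PM2_MAX": 55.0, "PM10_MAX": 30.0},
    {"time": "2019-11-01T12:00:00+05:30", "PM2_MAX": 38.0, "PM10_MAX": 38.0},
]

suspect = flag_inconsistent(rows)
print(f"{len(suspect)} of {len(rows)} rows have PM10_MAX < PM2_MAX")
```

Whether to drop, down-weight or keep the flagged rows is your call, since calibration is not required.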

For predicting PM values, could you clarify whether the timestamps of the test data set will be near the current date or in the distant future? If the dates are far from the current date, we might lose the temporal structure.
Also, will the catalogue remain available for use after the test data set is released?

We will test with timestamps in the past, within the period for which training data has been made available; we have blanked out the data for a few days. We will also give your algorithm test points from days after the submission due date.

Thanks for the reply. So is this a problem of data imputation, PM2.5 forecasting, or interpolating PM2.5 across the city (predicting PM2.5 at locations where sensors are not present)?

Hi Sourabh,
The problem is primarily one of predicting PM2.5 at different times, at locations both with and without sensors. It therefore includes all the aspects you have mentioned (data imputation, spatial interpolation, etc.). The final goal is to provide a PM2.5 value for any given lat-long in Pune over a specified time interval.
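As a rough illustration of the spatial interpolation part, here is a simple inverse-distance-weighting (IDW) baseline. This is a swapped-in textbook technique, not the expected solution. The function names and the sample values are made up, and each sensor value is assumed to be already averaged over the time interval of interest.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat-long points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def idw_pm25(sensors, lat, lon, power=2):
    """Estimate PM2.5 at (lat, lon) by inverse-distance weighting.

    sensors: list of (lat, lon, pm25), one interval-averaged value per sensor.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for s_lat, s_lon, pm25 in sensors:
        distance = haversine_km(lat, lon, s_lat, s_lon)
        if distance == 0:
            # The query point coincides with a sensor: use its value directly.
            return pm25
        weight = 1.0 / distance ** power
        weighted_sum += weight * pm25
        weight_total += weight
    return weighted_sum / weight_total


# Made-up sensor averages, for illustration only.
sensors = [(18.50, 73.85, 40.0), (18.54, 73.85, 60.0)]
estimate = idw_pm25(sensors, 18.52, 73.85)
print(f"estimated PM2.5: {estimate:.1f}")
```

A baseline like this ignores time, wind and traffic, so it is mainly useful as something to compare a real model against.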
