From: Mark Rose

Date: April 9, 2007 4:23:46 PM PDT

To: Mitch Gordon, Mark Showalter

Subject: Comments on CIRS reformatted data set

Hi Mitch and Mark,

I failed to get this to you guys on Friday, sorry. I've just put the comments in-line in this email, but I could reformat them into a spreadsheet or Word document, whatever's easiest for you guys.

Also, it looks like all the problems I reported to Mark early last summer have been fixed.

-- Mark

-----

Scope of comments

I have necessarily limited my comments to the data structure and the contents of descriptive files. I have looked over a number of the volumes, but have pulled all examples from COCIRS_5509, the last volume in the set.

General data organization

The organization is nice. If you don't need to read GEO files, it's very straightforward to step through the data by reading files in parallel. If you need the GEO data, it's a little trickier, but manageable.

I checked a couple of the volumes to ensure that all the data rows had the right SCET values, but did not do this exhaustively on all volumes. (I presumed you had already done that.)

DATA directory

1. FP3 data files have non-FP3 detector numbers. E.g., on COCIRS_5509:

- APODSPEC/ISPM0509160800_FP3.TAB line 6 has detector 12

- POIDATA/POI0509160800_FP3.TAB line 6 has detector 12

- RINDATA/RIN0509160800_FP3.TAB line 6 has detector 12

- TARDATA/TAR0509160800_FP3.TAB line 6 has detector 12 (= FP4 pixel 2)

2. Different focal planes sometimes have different target sets. This may be correct (since the three focal planes have slightly different pointing), but I wanted to make sure. Sometimes it's just a target missing from one or more focal planes, but sometimes there is a completely different satellite, which may be correct but looks suspect. E.g., on COCIRS_5509 at 0509022120, we have:

- FP1 targets: 602, 611, 699

- FP3, FP4 targets: 602, 603, 699

(another example is at 0509160800)

We believe that this may be related to the distinction between detector number and pixel number. We will investigate this and clean it up. If appropriate, we will update the documentation to clarify this distinction.

3. Targets are not identified correctly via TARGET_NAME in label files. For example, on COCIRS_5509 at 0509022120:

- APODSPEC/ISPM0509022120_FP1.LBL, _FP3.LBL, and _FP4.LBL all show TETHYS as the only target in TARGET_NAME keyword. However, there are 3 targets according to GEO files.

- Corresponding POIDATA, RINDATA, and TARDATA files also show only TETHYS.

PDS standards require that TARGET_NAME be a single-valued parameter. However, TARGET_LIST can have multiple values. We propose to add this to the labels but not necessarily to the index.

4. ISPM*.LBL: SAMPLING_PARAMETER_UNIT = "INVERSE CENTIMETER". Is this correct? It is different from the "CM^-1" which would be used in a unit expression. If "CM^-1" is legal here, it would be better to be consistent.

The correct expression, according to PDS standards, is "CM**-1". We will make this change.

CALIB directory

5. All the calibration files are really table files rather than text files. They should be made more like tables via one of these two fixes:

a. change to a real PDS TABLE, remove the header lines and add column specifications to the label;

b. or, keep as text, but make the headers line up with the data. (There are extra spaces in the header line that cause the header labels not to line up with the data columns.)

We believe that these files were copied directly from the original CIRS volumes. However, it should be possible to change them to TABLE objects with minimal effort.

CATALOG directory

All examples from COCIRS_5509.

6. CATINFO.TXT has a PUBLICATION_DATE that is prior to the publication date of the original publication date of COCIRS_0509 (2006-04-28 vs. 2006-07-01).

We will fix this.

7. DATASET.CAT

- DATA_SET_TERSE_DESC: Should mention "Saturn" to help with full-text search and to make result rows more readable.

We will fix.

- ABSTRACT_DESC: Should mention "Saturn" to help with full-text search and to make result rows more readable.

We will fix.

- DATA_SET_TARGET descriptions: These are identical with original volumes. Do original volumes include satellites discovered during the observations for the volume? If not, need to augment them here.

Consensus was that this change is unnecessary, because none of the new satellites are large enough to be detectable in the data set.

- START_TIME doesn't match the value of the original volume (COCIRS_0509).

We will fix.

- line 533: "This dataset is composed of CIRS Time Sequential Data Records": TDSR is deemphasized elsewhere within this file (data set description, etc.). Are the reformatted tables also supposed to be called TDSRs, or does that imply Vanilla formatting?

We will investigate the precise meaning of "TSDR"---perhaps it is just another name for the Vanilla format. We'll update the documentation as appropriate.

DOCUMENT directory

8. Publication dates in the labels are sometimes less then the publication date in the corresponding file from the original CIRS volume.

- E.g., COCIRS_5509, CASSINI-RSP.LBL: COCIRS_0509=2006-06-08, COCIRS_5509=2006-03-08

We will fix.

9. CIRS_FOV_OVERVIEW.TXT: It appears that all \r\n newline sequences in the original file have been replaced with \n\n. I think this is wrong, since the PDS standard (I think) is to have \r\n. Secondly, the new file prints double-spaced.

We will fix.

10. DATASIS.TEXT: It appears that all \r\n newline sequences in the original file have been replaced with \n\n. I think this is wrong, since the PDS standard (I think) is to have \r\n. Secondly, the new file prints double-spaced.

We will fix.

11. DATASIS_OCR.PDF and .LBL: Perhaps merge this label information into DATASIS.LBL, as is done for the TEX and PDF files, to make more explicit the fact that this is a 3rd, alternate format.

This is a new file added to the directory by the Rings Node. It overcomes the fact that the original file originated in TeX and therefore uses a non-standard font, which prevents the user from copying text out of the original DATASIS.PDF. We prefer not to modify the other files in the directory. This is already noted in DOCINFO.TXT. We will make sure it is explained more clearly.

CALIB directory

12. Publication dates in the labels are sometimes less then the publication date in the corresponding file from the original CIRS volume.

We will fix.

INDEX directory

13. Publication dates in the labels are sometimes less then the publication date in the corresponding file from the original CIRS volume.

We will fix.

14. It would be more convenient if the bandwidth and resolution were in the POIINDEX.TAB and RININDEX.TAB files, as well as the ISPMINDEX.TAB file. Reason: Since some ISPM files won't be in the RININDEX.TAB file (and perhaps not in the POIINDEX.TAB file either, since some ISPM files are just SKY or, perhaps, just rings), correlating the files is more complicated. As well, some indication of the mode might be nice, but a single character, as in the original CIRS volumes (A/C/E/O/P/B) would be better than the separate flag approach.

It was our intention that each file have exactly the same sequence of rows, and Mark agreed that this change is unnecessary if the records do match. We will confirm that the file records match as intended. We will make this point clearer in the tutorial file, including the note that some records are filled with values of -200, which means that they do not contain any valid information.

15. ISPMINDEX.TAB: It would be better to have a single mode flag (all/even/odd/pairs/centers/blinking), rather than the 6 separate flags.

Mark retracted this suggestion. No change.

16. The original volumes included tables of the CIRS requests and their times. This by itself is not that useful for the reformatted volumes, since the products are arranged around the CIRS requests. However, some other information from the science kernel might be appropriate, such as a table of CIRS request IDs and their description or science objective. (Or perhaps a scientist could tell me what would be appropriate. And as I mentioned, the info is in the science kernel, so a sophisticated user could get the info anyway by using INSPEKT or writing a SPICE program.)

This is not intended as a replacement for the original volume, so we did not attempt to duplicate all the information. Nevertheless, we agree that a table of CIRS requests with their times and science objectives would be useful. We'll include it if practical.

It was also suggested that the volume indicate more clearly the absence of raw data on this volume. This will be better documented in DATAINFO.TXT, AAREADME.TXT and TUTORIAL.TXT, with references back to the original CIRS volume from which the re-formatted data files were derived.

Mark Rose

PSGS / NASA Ames Research Center

650.xxx.xxxx

=====================================================

FOLLOW-UP MESSAGE...

=====================================================

From: Mark Rose

Date: April 10, 2007 3:13:24 PM PDT

To: Mitch Gordon, Mark Showalter

Subject: Detector # label problems

Hi Mark and Mitch,

Thanks for all your work. I think the reformatted volumes look very good. I hope I gave you guys some useful comments. (I feel a little hampered in my ability to comment since I can't give very good feedback on the science side of things.)

A little more info on a few of the issues we discussed today:

1. FP3 data files have non-FP3 detector numbers.

It looks like the detector number problems are all in the .FMT files. All these files have bad docs for the detector numbers:

APODSPEC/ISPM_ASCII.FMT

POIDATA/POI_ASCII.FMT

RINDATA/RIN_ASCII.FMT

TARDATA/TAR_ASCII.FMT

It looks like the documentation in these files was copied from the original CIRS volumes. All have the following as the documentation for the "DET" column. You'll notice that this description doesn't match the SIS. The SIS in table 1 on page 11 correctly states that 1-20=FP3 and 21-40=FP4, as Conor noted.

OBJECT = COLUMN

NAME = DET

DATA_TYPE = ASCII_INTEGER

START_BYTE = 12

BYTES = 2

FORMAT = "I2"

DESCRIPTION = "Detector ID. Values are:

0: FP1, pixel 0

1: FP3, pixel 1

2: FP3, pixel 2

3: FP3, pixel 3

4: FP3, pixel 4

5: FP3, pixel 5

6: FP3, pixel 6

7: FP3, pixel 7

8: FP3, pixel 8

9: FP3, pixel 9

10: FP3, pixel 10

11: FP4, pixel 1

12: FP4, pixel 2

13: FP4, pixel 3

14: FP4, pixel 4

15: FP4, pixel 5

16: FP4, pixel 6

17: FP4, pixel 7

18: FP4, pixel 8

19: FP4, pixel 9

20: FP4, pixel 10

21: FP3, pixels 1+2

22: FP3, pixels 3+4

23: FP3, pixels 5+6

24: FP3, pixels 7+8

25: FP3, pixels 9+10

26: FP4, pixels 1+2

27: FP4, pixels 3+4

28: FP4, pixels 5+6

29: FP4, pixels 7+8

30: FP4, pixels 9+10

END_OBJECT = COLUMN

I checked all files on COCIRS_5509, and all have the expected ranges (0=FP1, 1-20=FP3, 21-40=FP4).

OK, thanks for looking into this. We will fix it.

14. It would be more convenient if the bandwidth and resolution were in the POIINDEX.TAB and RININDEX.TAB files, as well as the ISPMINDEX.TAB file.

I looked over the INDEX directory contents again, and I think I understand how it's designed. I was expecting something slightly different, which was the basis for my original comment.

What I was expecting: A way to look up observations, by geometry, time, or observation ID, and get the set of products in the observation.

What I think is there: A way to look up products based on geometry, time, instrument mode, and/or observation ID, depending on the index file used.

(I'm not sure if I'm making this clear.)

The fact this is confusing means that, at minimum, that we must clarify the documentation. It also means we should consider a better way to index the data set. To explain...

INDEX.TAB is a required file for all PDS volumes, and it contains one record per labeled PDS data file. That means that it mixes information about files of different types (POI, RIN, ISPM, etc.) so it is of limited utility for searching. However, it does contain a complete list of all the data files on the volume.

The other index files, RININDEX, POIINDEX and ISPMINDEX, are the ones intended to support searches. However, as you've noted, they do not match well. RININDEX and POIINDEX contain one record per {CIRS request + focal plane}, whereas ISPMINDEX contains one record per {CIRS request + focal plane + resolution change}. This, as you note, makes the files difficult to read in parallel and use for searching.

I think the natural solution is to produce a single index that contains one record per {request + focal plane}. It could also note the range of resolutions used. That is the key information that I would want to load into a database. Another index could summarize all the binary files associated with a particular {request + focal plane}, and this is the file that would indicate the distinct resolution found in each of the A,B,C... suffixed binary files.

Now that I've thought about this, I also don't like the layout of the POI, RIN, and TAR files as much as I did previously. I think I'd find them more useful if they had the same suffix structure as the binary ISPM files. In fact, if all the files in an observation were broken up by instrument mode and focal plane, I think they'd be the most useful. To tie everything together, it might be nice to have an index file, perhaps in an OBS directory, that lists all files for an observation, broken up by time and instrument mode. I think it's too late to consider this structure, but if you're interested, I'd be glad to either flesh this out with an example, or stop by SETI and draw on a whiteboard for a few minutes.

In any case, you can probably ignore my original comment (#14), as I think what's there is sufficient.

This represents a radical departure from our design and would also require an almost complete rewrite of our re-formatting code. Furthermore, we think that {request + focal plane} is the more natural way to divide up the data set from the viewpoint of a scientist; resolution changes are a secondary consideration, except that they have the unfortunate side-effect of forcing us to start a new binary file. We hope that a more unified approach to indexing, as discussed above, would solve the key problem that concerns you.

One new comment

- There is a slight problem with ISPMINDEX.TAB in the INDEX directory: there is a space after the FILE_SPECIFICATION_NAME in column 2, before the closing quote. This should be fixed.

We will fix.

Mark

Mark Rose

PSGS / NASA Ames Research Center

650.xxx.xxxx