Description of data and analysis:

Our device is similar in function and performance to the device described in the PDF "Pulse Glucometry- A new approach for noninvasive blood glucose measurement using instantaneous differential NIR spectrophotometry" (Yamakoshi).  In short, spectral variations caused by arterial pulse modulation (systolic vs. diastolic) are used to produce a "pulse differential spectrum" (PDS).

We have acquired absorbance data on a number of test subjects, which we call "users".  We have 6 unique users.  Each time a set of data is collected on a user, we call it a "session".  We have 22 sessions.  Within a session, each time the user places their finger on our device, we call it a "sitting".  The sessions have varying numbers of sittings.  During a sitting, 60 seconds of data is collected.  We use 10 second segments of data to produce a single spectrum, so there are 6 spectra per sitting.  There are a total of 2478 spectra across all of the users and sessions.
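The counts above can be cross-checked with a little arithmetic.  The sketch below is in Python (chosen only for illustration; it is not part of the data file) and assumes that every sitting contributed exactly 6 spectra.  The 125 Hz frame rate is taken from the acquisition description later in this document, and all constant names are hypothetical.

```python
# Figures quoted in this description.
SAMPLE_RATE_HZ = 125     # raw spectra per second
SITTING_SECONDS = 60     # data collected per sitting
SEGMENT_SECONDS = 10     # data used to produce one spectrum
TOTAL_SPECTRA = 2478     # spectra across all users and sessions

# Spectra produced by one sitting.
spectra_per_sitting = SITTING_SECONDS // SEGMENT_SECONDS

# Raw 1x256 frames contained in one 10 second segment (before any averaging).
raw_frames_per_spectrum = SAMPLE_RATE_HZ * SEGMENT_SECONDS

# Number of sittings implied by the total, assuming 6 spectra per sitting.
implied_sittings, leftover = divmod(TOTAL_SPECTRA, spectra_per_sitting)

print(spectra_per_sitting, raw_frames_per_spectrum, implied_sittings, leftover)
```

A leftover of zero means the total is consistent with a whole number of complete sittings.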

For each sitting, at least 1 glucose concentration reference measurement is acquired.  There were 5 reference meters used during the testing: Dexcom CGM, One Touch Ultra, Free Style Lite, Oxiline, and One Touch Verio.  For every spectrum in the data set, there is a glucose reference from a One Touch device.  When building regression calibrations, we have been using the data from the One Touch devices as the primary glucose concentration reference value.

Our goal is to produce a regression algorithm that accurately predicts the reference value (to within 20%).  If needed, a classification/anomaly/novelty approach to reject spectral measurements that produce high residual error (>20%) is also of interest.  We have had some success with a partial least squares regression on normalized PDS data.  However, the predictions often require a scale and offset correction.  We have had similar success using a Gaussian process regression with either a rational quadratic or a Matern 5/2 kernel.  Both approaches usually leave some spectra with high residual error, and a one-class support vector machine novelty detector fails to flag most of them.  We have mostly evaluated regression on the PDS data computed from raw data averaged in wavelength x 4 and in time x 5.  We have also had some interesting results using the DC-only data (dc450 below) with a Gaussian process regression using an exponential kernel, but we are not sure whether this has physical relevance.
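To make the 20% criterion and the scale and offset correction concrete, here is a minimal Python sketch.  It assumes the 20% tolerance is measured relative to the reference value and that the correction is an ordinary least-squares line fit of reference against prediction; neither detail is specified above.  The function names and the numbers are hypothetical and are not taken from the data set.

```python
def fit_scale_offset(predicted, reference):
    """Least-squares fit of reference ~ scale * predicted + offset."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    scale = cov / var_p
    offset = mean_r - scale * mean_p
    return scale, offset


def within_tolerance(predicted, reference, tol=0.20):
    """True where the prediction is within tol (as a fraction) of the reference."""
    return [abs(p - r) <= tol * abs(r) for p, r in zip(predicted, reference)]


# Made-up predictions and references, for illustration only.
predicted = [50.0, 60.0, 70.0, 80.0]
reference = [90.0, 110.0, 130.0, 150.0]

before = within_tolerance(predicted, reference)
scale, offset = fit_scale_offset(predicted, reference)
corrected = [scale * p + offset for p in predicted]
after = within_tolerance(corrected, reference)

print(before, scale, offset, after)
```

Fitting the correction on the same spectra used to judge the result gives an optimistic picture, so in practice the scale and offset would be estimated on separate data.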

The raw spectral data was collected at a rate of 125 Hz, i.e. one 1x256 spectrum every 8 ms.  We have included spectral data, as indicated below, that may have been averaged in the spectral and/or time dimensions before the ac and dc amplitudes were computed.  We implement the absorbance calculation as pds = log10((dc+ac)./(dc-ac)), applied element by element across the spectrum.
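The absorbance formula above can be sketched in Python as follows.  This is an illustration only, assuming dc and ac are equal-length sequences of amplitudes for one spectrum and that dc exceeds ac at every pixel; the function name and the guard against non-positive dc-ac are our additions.

```python
import math


def pulse_differential_spectrum(dc, ac):
    """Element-wise pds = log10((dc + ac) / (dc - ac)) for one spectrum."""
    if len(dc) != len(ac):
        raise ValueError("dc and ac must have the same length")
    pds = []
    for d, a in zip(dc, ac):
        if d - a <= 0:
            # The logarithm is undefined here; flag it instead of guessing.
            raise ValueError("dc must be greater than ac at every pixel")
        pds.append(math.log10((d + a) / (d - a)))
    return pds


# Made-up amplitudes, for illustration only.
example = pulse_differential_spectrum([1000.0, 11.0, 500.0], [10.0, 9.0, 0.0])
print(example)
```

A pixel with zero ac amplitude yields a pds of 0, and the value grows as the pulsatile component becomes a larger fraction of the dc level.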

Variables in the data file:

id: each row corresponds to a row in any of the spectral or reference matrices.  The columns contain the user #, session #, sitting #, spectrum # and timestamp.

ref: each row corresponds to a row in any of the spectral matrices.  Any non-zero values are actual glucose concentration references for that row (spectrum).

Spectral variables are labeled with a prefix of x, ac or dc to indicate absorbance (pds), computed ac amplitude, or computed dc amplitude.  The 3 digits following the prefix indicate the number of samples averaged prior to ac/dc amplitude computation and whether the spectrum is normalized: the first digit is the number of pixels averaged, the second digit is the number of raw spectra averaged, and the third digit is 1 if normalized, 0 if not.  Similar to the normalization approach described in the above paper, we normalize a spectrum by subtracting the minimum value from the spectrum and then scaling to make the magnitude of the first element in the spectrum equal to 1.
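The normalization just described can be sketched in Python as below.  This is our reading of the description, not the code used to build the data file: the function name is hypothetical, and the error raised when the first element is also the minimum (which would make the scale factor zero) is an assumption about how that case should be handled.

```python
def normalize_spectrum(spectrum):
    """Subtract the minimum, then scale so the first element has magnitude 1."""
    lowest = min(spectrum)
    shifted = [value - lowest for value in spectrum]
    first_magnitude = abs(shifted[0])
    if first_magnitude == 0:
        # The first element is the minimum, so the scaling is undefined.
        raise ValueError("first element equals the minimum; cannot scale")
    return [value / first_magnitude for value in shifted]


# Made-up spectrum, for illustration only.
normalized = normalize_spectrum([4.0, 2.0, 6.0])
print(normalized)
```

After normalization the smallest element is 0 and the first element is 1, so spectra with different overall amplitudes can be compared by shape.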

ac110: ac amplitude computed from raw data; no averaging

ac410: ac amplitude computed from raw data averaged in wavelength x 4

ac450: ac amplitude computed from raw data averaged in wavelength x 4, averaged in time x 5

dc110: dc amplitude computed from raw data; no averaging

dc410: dc amplitude computed from raw data averaged in wavelength x 4

dc450: dc amplitude computed from raw data averaged in wavelength x 4, averaged in time x 5

x110: pds; no averaging, not normalized

x410: pds; raw data averaged in wavelength x 4, not normalized

x450: pds; raw data averaged in wavelength x 4, averaged in time x 5, not normalized

x111: pds; no averaging, normalized

x411: pds; raw data averaged in wavelength x 4, normalized

x451: pds; raw data averaged in wavelength x 4, averaged in time x 5, normalized
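The naming convention used in the list above can be decoded mechanically.  The helper below is a small Python sketch with a hypothetical function name and hypothetical dictionary keys; it assumes every variable name is a prefix of x, ac or dc followed by exactly 3 digits, as in the names listed here.

```python
def decode_variable_name(name):
    """Split a name such as 'x451' into its prefix and three digit fields."""
    quantities = {"x": "pds", "ac": "ac amplitude", "dc": "dc amplitude"}
    prefix, digits = name[:-3], name[-3:]
    if prefix not in quantities or len(digits) != 3 or not digits.isdigit():
        raise ValueError("unrecognized variable name: " + name)
    return {
        "quantity": quantities[prefix],
        "pixels_averaged": int(digits[0]),       # wavelength averaging factor
        "raw_spectra_averaged": int(digits[1]),  # time averaging factor
        "normalized": digits[2] == "1",
    }


print(decode_variable_name("x451"))
print(decode_variable_name("dc410"))
```

A factor of 1 in either averaging field means no averaging was applied in that dimension.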