Feature selection and rapid characterization of bloodstains on different substrates

2020, Applied Spectroscopy

Establishing the precise timeline of a crime can be challenging due to the need for rapid and non-destructive analysis of body fluids encountered at crime scenes. Raman spectroscopy has demonstrated great potential in forensic science as it provides direct information about structural and molecular changes without the need for processing or extracting samples. However, its current applicability is limited to pure body fluids, as signals from the substrate underlying these fluids greatly influence the current models used for age estimation. In this study, we utilized Raman spectroscopy to identify selective spectral markers that delineate the bloodstain age in the presence of interfering signals from the substrate. Least absolute shrinkage and selection operator (LASSO) regression was employed to guide the feature selection process in the presence of interference from substrates to accurately predict bloodstain age. Substrate-specific regression models guided by an automated feature selection algorithm yielded low values of predictive root-mean-squared-error (0.207, 0.204, 0.222) and high R2 (0.924, 0.926, 0.913) on test data consisting of blood spectra on floor-tile, facial-tissue and linoleum substrates respectively. This framework of automated feature selection relies entirely on pure bloodstain spectra to train substrate-specific models for estimating the age of composite (blood on substrate) spectra. The model can thus be easily applied to any new composite spectra and is highly scalable to new environments. This study demonstrates that Raman spectroscopy coupled with LASSO can serve as a reliable and nondestructive technique to determine the age of bloodstains on any surface while aiding forensic investigations in real-world scenarios.

Rekha Gautam1, Deandra Peoples1, Kiana Jansen1, Maggie O’Connor1, Giju Thomas1, Sandeep Vanga2, Isaac Pence1,3, Anita Mahadevan-Jansen1

1 Department of Biomedical Engineering, Vanderbilt University, TN, USA
2 Episode Solutions LLC, Nashville, TN, USA
3 Wellman Center for Photomedicine, Massachusetts General Hospital, Boston, MA, USA

anita.mahadevan-jansen@vanderbilt.edu

Keywords: Machine Learning, Forensic, Raman Spectroscopy, LASSO Regression

Introduction

In recent years, forensic investigations have undergone a surge in popularity among scientists and researchers.
Focus has progressively shifted to developing novel, non-destructive techniques for rapid analysis of evidence out in the field. One critical area of focus has been the determination of a crime timeline in the absence of a witness or corpse. In these scenarios, body fluids such as blood that are frequently encountered at a crime scene are analyzed to predict when the crime may have occurred. Investigators have looked at RNA degradation1, electron paramagnetic resonance spectroscopy2, atomic force microscopy3, diffuse reflectance spectroscopy4, fluorescence lifetime measurements5, hyperspectral imaging6, and ATR-FTIR spectroscopy7 in order to determine the age of bloodstains. Most of these techniques are either time consuming, destructive or provide very low temporal precision over an extended period of time8, 4. Spectroscopy-based techniques provide detailed physicochemical information and therefore are highly valued in forensic temporal examinations, as methodically summarized by Zadora et al.9 Among these, Raman spectroscopy has shown great potential in the field of forensic science for the identification of drugs, explosives, gunshot residues and different body fluids10-12. Raman spectroscopy is based on inelastic scattering of light by molecules and provides a molecular fingerprint that represents their vibrational modes13, 14. The advent of fiber probes14, 15 and portable instruments16 can now ensure easy implementation of Raman spectroscopy in the field as a nondestructive and rapid technique. This is a key reason for the extensive exploration of Raman spectroscopy in the field of forensic science.
Numerous studies have revealed the ability of Raman spectroscopy to analyze blood and its components for identification of different disease states, gender, blood type, species and aging of red blood cells (RBCs) in cold storage16-22. These studies have shown that Raman spectroscopy is sensitive to changes in blood analyte concentration and different oxidative states of hemoglobin (Hb)17, 18.

Several reports have investigated the potential of Raman spectroscopy in estimating the age of bloodstains based on Raman spectral markers23-27. ‘Age’ is the amount of time blood has been outside the body and exposed to ambient air, which is usually described by ‘time since deposition’ (TSD). Our prior work on bloodstain analysis has shown that Hb, which constitutes approximately 90% of the dry weight of RBCs, is the primary component contributing to the spectral changes that occur during aging25. Hb contains four heme molecules, each composed of a protoporphyrin ring with an iron (Fe) atom at its center9. The cascade of biochemical changes that occurs during aging typically involves saturation of deoxyhemoglobin (deoxyHb) to oxyhemoglobin (oxyHb) in the presence of ambient air, followed by autooxidation of oxyHb to methemoglobin (metHb), eventually forming other degraded products such as hemi- and hemochromes9. Lemler et al. specifically analyzed laser-induced changes in blood and identified that photodamage causes saturation of deoxyHb and autooxidation of oxyHb, which looks similar to the natural aging process of blood over time26. This study emphasized the need to use low power and exposure time to eliminate local heating, which induces heme aggregates during these measurements from blood26.
Raman spectroscopy has shown potential in determining not only the age of bloodstains23 but also the chronological age of blood donors28. More recently, Doty et al. used Raman spectral signatures along with multivariate analysis methods to predict the age of bloodstains from two volunteers with high accuracy (R2=0.97)24. Most of these prior studies analyzed blood samples on substrates such as aluminum, which has no interfering Raman signals. However, in real scenarios, blood spectra at a crime scene are influenced by the Raman signatures of underlying substrates such as floor-tile or paper tissue, or of contaminants like dust or sand.

In the field of Raman spectroscopy, numerous experimental and data processing approaches have been employed to deal with background signals from substrates10, 19, 29-32. The most common experimental approach to avoid substrate interference involves reconstitution of blood after extraction using water. As discussed by Boyd et al.10, water extracts were prepared by immersing and mixing small pieces of stained fabrics in 500 μL of water. Although only a small portion of sample is required for Raman analysis, the extraction process is destructive and laborious. Techniques such as shifted excitation Raman difference spectroscopy30 and automated background subtraction based on least-squares polynomial curve-fitting31 are widely adopted to eliminate interfering broad fluorescence background but do not help in removing Raman peaks from substrates.
Subtraction of the pure substrate spectrum from the composite (sample on substrate) spectra may perhaps be the most intuitive solution; however, it is challenging with substrates that possess both strong Raman peaks and heterogeneous fluorescent backgrounds29. Multivariate analysis-based methods have also been explored to circumvent the issue of interference from the substrate32, 33 and to differentiate composite spectra19. Sikirzhytskaya et al.33 employed alternating least squares statistics and multivariate curve resolution to fit blood signatures to the contaminated experimental spectra and estimated the blood contribution in the presence of contaminants. This method worked well for identification of bloodstains in the presence of contaminants such as sand, soil or dust33 but not for predicting age. Gautam et al. used partial least squares-discriminant analysis (PLS-DA) to differentiate young (6-8 days) and old (35-42 days) stored blood samples with high accuracy in the presence of polymer interference19. However, this study assumes that the polymer is homogeneous and contributes equally to all the spectra. In general, these postprocessing and multivariate analysis methods have yielded poor precision and limited usage. To our knowledge, no automated and versatile method has been established for estimating the age of bloodstains in the presence of substrate signals.

The goal of this study is to evaluate the ability of Raman spectroscopy to predict the age of bloodstains in the presence of signal interference from substrates typically found at crime scenes. Here, we propose a framework using a least absolute shrinkage and selection operator (LASSO) regression model34-36 to efficiently deal with substrate interference and extract blood age information.
This regression model has the inherent capability of denoising and compression by using an L1-penalty, which continuously shrinks the smallest estimated regression coefficients towards zero to induce sparsity36, 37. The novel aspects of the study involve (i) training the model on pure bloodstain spectra aged over time, rendering the model applicable to any substrate, and (ii) automatically selecting a subset of Raman features (free from substrate signals) to accurately predict the age of bloodstains in the presence of interference from different substrates. To fully understand the biochemical process of in vitro blood degradation, we also performed ratiometric analysis using specific band intensities. This study revealed that Raman spectroscopy is sensitive to the structural changes in Hb in its various states, including oxyHb, metHb and hemichrome, and that, along with LASSO-based analysis, it can be employed to determine the age of bloodstains in the field.

Materials and methods

Sample collection and processing: Fresh blood samples were obtained using the finger prick method from four healthy adult volunteers, two males and two females, with approval from the Vanderbilt Institutional Review Board (IRB-151532). Blood samples, without anticoagulants, were placed on an aluminum plate and three different substrates (floor-tile, facial-tissue, linoleum-polymer) for Raman spectral measurements under ambient conditions (room temperature).

Raman spectroscopy: Raman spectra were recorded using a Raman micro-spectrometer (InVia Renishaw Inc., UK). A 785 nm diode laser (Innovative Photonic Solutions, NJ) was used to excite the samples using a 20x (NA=0.4) objective.
The input power and exposure time were optimized to avoid potential unwanted heating/damage to the sample as discussed previously25, 26. The system was calibrated to the 520.5 cm-1 line using an internal silicon reference before acquiring sample spectra. Spectra were recorded with 15 s (3 seconds x 5 accumulations) integration time at ~2 mW laser power. Measurements taken from liquid blood drops for the first 20 minutes were considered fresh bloodstain spectra, and spectra were collected over a time course of two weeks as tabulated below (Table 1). Spectra were recorded from five different bloodstains (drops) for each volunteer. Spectra from five spatially different locations within a stain were averaged to obtain one spectrum per bloodstain. This approach prevented photodamage and accounted for the spatial variability within a bloodstain. For each donor, 115 spectra (23 time points x 5 spectra per time point) were obtained, and a total of 460 spectra (4 volunteers x 115 spectra) were analyzed.

For the study of bloodstains on substrates (composite stains), blood drops from a donor were placed on each substrate (floor-tile, facial-tissue and linoleum-polymer) and aged in ambient conditions. For composite (blood on substrate) spectral acquisition, each spectrum was integrated for 15 s and measured over a time course of two weeks. Spectra were obtained in a similar manner as described earlier, and a total of 225 spectra (15 time points x 5 spectra per time point x 3 substrates) were analyzed for all three substrates (Table 1). The testing of regression models was done on these composite spectra, which were not used at any training phase.
To obtain a pure substrate (no blood) spectrum, 15 spectra from spatially different locations on the substrate were recorded and averaged for each of the substrates separately.

Table 1. Time points analyzed over a period of two weeks.

Sample type | Time points over a period of two weeks (in hrs.)
Pure bloodstains | Fresh, 1, 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 24, 48, 96, 144, 192, 240, 288, 336
Bloodstains on substrate | Fresh, 1.5, 4, 6, 8, 10, 12, 24, 48, 96, 144, 192, 240, 288, 336

Data preprocessing: Cosmic ray removal was performed using Renishaw WiRE 4.2 software immediately after acquiring each spectrum. Further data processing and analysis was carried out using the R software (R core team 2018) and Origin 2008 (Origin Lab Corporation, MA, USA). First, band alignment was performed using local regression to calculate intensities at a pre-defined common spectral axis in order to correct for small instrumental spectral shifts38. All spectra were baseline corrected using the Asymmetric Least Squares method39 with lambda=4 and P=0.0005, where lambda defines how closely the baseline fits to the data and P defines the asymmetry of positive versus negative residuals. Normalization was performed using a Standard Normal Variate transformation across the whole spectral range to eliminate the influence of inter/intra spectral variability and to ensure that all spectra contribute equally to the model38, 40.

Regression models: Linear regression is a widely used technique where the relationship between a dependent variable (outcome) and observed independent variables (predictors) is assumed to be linear.
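The Standard Normal Variate step named under Data preprocessing can be sketched as follows. This is a minimal illustration in plain Python, not the R code used in the study; the function name `snv`, the made-up intensities, and the choice of the sample (n - 1) standard deviation are assumptions.

```python
import math

def snv(spectrum):
    """Standard Normal Variate: centre one spectrum on its own mean
    and scale it by its own standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1): an assumption, the source
    # does not say which form was used.
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]

# Made-up intensities, for illustration only.
raw = [10.0, 12.0, 18.0, 12.0, 10.0]
# Same spectral shape recorded with a different gain and offset.
rescaled = [3.0 * v + 5.0 for v in raw]

normalized_raw = snv(raw)
normalized_rescaled = snv(rescaled)
```

Because each spectrum is centred and scaled by its own statistics, the two toy spectra above, which differ only in gain and offset, map to the same normalized spectrum.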
In general, model parameters are estimated by minimizing the squared error between estimated and true values of the dependent variable on a training data set. The estimation of the age of bloodstains can be modeled using linear regression, where the age of the bloodstain (TSD) is the desired outcome variable and Raman features serve as the predictor variables. Though the Raman spectrum from a bloodstain contains a large number of intensity variables (features), these features increase the computational complexity and hinder the model efficiency due to the noise contribution from undesirable features. To avoid the noisy and/or irrelevant information which usually degrades model performance, a variety of techniques such as principal component regression (PCR) and partial least squares (PLS) have been employed9, 24. Herein, we employed PCR as a benchmark for estimating the age of bloodstains based on 10 principal components (PCs). The performance of the calibration model was evaluated in terms of root-mean-square-error (RMSE) as described previously9, 24.

In order to evaluate a more robust and automated approach, we also implemented a LASSO regression model, a regularization technique which permits the compression (feature selection) of data. Notably, LASSO does not require mapping the predictors (Raman features) into any subspace. Therefore, better interpretability and the possibility of fusing the information obtained from LASSO with the substrate Raman spectrum are attainable for tuning the substrate-specific model performance34, 35, 37. The LASSO uses a penalized variant of least squares (L1-regularizer) on model parameters (regression coefficients).
Thus, the model parameters are estimated by minimizing an additional term (the L1-regularizer, $\lambda \sum_{j=1}^{m} |\beta_j|$) along with the squared error. This continuously shrinks the smallest estimated regression coefficients towards zero, keeping only those features that efficiently explain the variance in the dependent variable34, 36, 37. This process allows the exclusion of some features (independent variables) from the model without any significant performance loss. The number of zero-valued model parameters increases as a function of the regularization parameter ($\lambda$). The resulting sparse model can be used for both feature selection and age prediction. The value of the dependent variable (age of bloodstain) for the ith sample is given by:

$$y_i = \sum_{j=1}^{m} \beta_j X_{ij} + \epsilon_i$$

where i = 1, 2, 3, …, N; $X_{ij}$ is the intensity of the jth predictor variable (wavenumber) for the ith sample; $\beta_j$ is the model parameter (coefficient) corresponding to the jth predictor variable; $\epsilon_i$ is the residual for the ith sample that is unexplained by the regression model; N is the number of samples in the training data set; and m is the number of Raman features in the data set. Here, the goal is to estimate the coefficient $\beta_j$ (model parameter) for each predictor variable by minimizing the following cost function, C, on the training data set:

$$C = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{m} |\beta_j|$$

where $\hat{y}_i = \sum_{j=1}^{m} \beta_j X_{ij}$.

Performance of the calibration model was evaluated in terms of RMSE, which is defined as below:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N}}$$

where $y_i$ is the true value of the age of the bloodstain for the ith sample; $\hat{y}_i$ is the value of age as predicted by the regression model for the ith sample; and N is the number of samples in the training data.
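To make the cost function and the RMSE definition concrete, the following is a minimal sketch that minimizes C by cyclic coordinate descent with soft-thresholding. It is written in plain Python for illustration and is not the glmnet implementation; glmnet also fits an intercept and scales the squared-error term differently, so its λ values are not directly comparable. The function names and the four-sample toy data are assumptions.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value towards zero; anything inside the band becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_fit(X, y, lam, n_sweeps=100):
    """Minimise sum_i (y_i - sum_j beta_j X_ij)^2 + lam * sum_j |beta_j|
    by cyclic coordinate descent (no intercept, as in the equations above)."""
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    residual = list(y)  # y - X.beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(n_sweeps):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            # Inner product of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n))
            new_beta = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_beta - beta[j]
            for i in range(n):
                residual[i] -= X[i][j] * delta
            beta[j] = new_beta
    return beta

def predict(X, beta):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy data (made up): y depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [1.0, -1.0, -1.0]]
y = [2.1, 1.9, 2.1, 1.9]
beta = lasso_fit(X, y, lam=1.0)
fit_error = rmse(y, predict(X, beta))
```

In this toy case the penalty sets the coefficients of the weak and the irrelevant feature to exactly zero and shrinks the dominant coefficient slightly, which is the sparsity behaviour the text describes.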
Here the LASSO was implemented using the glmnet package of the R software34, 35, 37.

The robustness of the PCR and LASSO models was verified by cross-validation using the Venetian blinds algorithm with ten data splits. This approach calibrates the model based on 90% of the data from four volunteers, cross-validates the remaining 10% against that model, and repeats the process through ten iterations to assess its predictive power and optimize the corresponding parameters (the number of PCs in PCR and the regularization parameter $\lambda$ in LASSO). Finally, the model was tested on 70 (5 spectra x 14 time points) composite spectra for each substrate separately, using the corresponding model trained on the specific subset of features selected by the proposed algorithm.

Automatic Feature Selection Algorithm: In this study, the regression models were trained on a dataset consisting of 420 spectra (5 spectra x 21 time points x 4 donors) of pure bloodstains. Spectra obtained below the 1.5 hrs time point were excluded due to variation in drying time for different volunteers. With the guidance of LASSO, an algorithm was devised to select Raman features that predominantly explained the variability in the age of bloodstains and at the same time avoided strong signals from the given substrate. At first, a pool of Raman features that mainly contributed to age estimation was derived from pure bloodstain spectra using LASSO. These Raman features, their corresponding LASSO coefficients and the pure substrate Raman spectrum were used to select a subset of Raman features that are best suited for a given substrate.
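The substrate-specific selection can be sketched as follows: LASSO-selected features are screened against the min-max normalized pure substrate spectrum, kept outright in 'silent regions', and kept in 'signal regions' only when the substrate intensity there falls below a threshold of one tenth of the sum of the five smallest peak intensities. This is a minimal illustration under stated assumptions: how the regions and peaks were detected is not given in the text, so the region mask and peak indices are supplied by hand, and all names and toy values are hypothetical.

```python
def min_max(values):
    """Rescale intensities to the 0 to 1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def select_features(lasso_idx, substrate, in_signal_region, peak_idx):
    """Return the LASSO-selected feature indices that avoid strong substrate signal.

    lasso_idx        : indices with nonzero LASSO coefficients (pure blood model)
    substrate        : pure substrate spectrum on the same spectral axis
    in_signal_region : True where the substrate shows peaks (assumed given)
    peak_idx         : indices of substrate peak maxima (assumed given)
    """
    norm = min_max(substrate)
    # One tenth of the sum of the five smallest normalized peak intensities
    # (uses all peaks if fewer than five are supplied).
    smallest = sorted(norm[i] for i in peak_idx)[:5]
    threshold = sum(smallest) / 10.0
    kept = []
    for j in lasso_idx:
        if not in_signal_region[j]:
            kept.append(j)      # silent region: always kept
        elif norm[j] < threshold:
            kept.append(j)      # signal region, but substrate intensity is weak
    return kept, threshold

# Toy substrate spectrum with two peaks (made-up numbers).
substrate = [0.0, 1.0, 10.0, 1.0, 0.0, 0.0, 2.0, 5.0, 2.0, 0.0]
in_signal_region = [False, True, True, True, False, False, True, True, True, False]
peak_idx = [2, 7]
lasso_idx = [0, 1, 2, 5, 6]

kept, threshold = select_features(lasso_idx, substrate, in_signal_region, peak_idx)
```

With these toy inputs, features 0 and 5 lie in silent regions, feature 1 sits on a weak shoulder below the threshold, and features 2 and 6 are rejected because the substrate signal there is too strong.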
This subset of features was selected in three steps. First, min-max normalization was applied to the pure substrate spectrum to bring all intensity values into the 0 to 1 range. This normalized substrate spectrum was divided (labeled) into ‘silent regions’, which are free of substrate interference (peaks), and ‘signal regions’, which include substrate peaks. Second, LASSO-selected Raman features derived from pure bloodstain spectra which overlapped with ‘silent regions’ of the substrate spectrum were included in the subset. Last, LASSO-selected Raman features that overlapped with ‘signal regions’ were included in the subset only if their intensities were below a set threshold (1/10th of the sum of the five smallest peak intensities) in the normalized substrate spectrum. This process is described using a flow chart in Figure 1. Three different subsets of features associated with floor-tile, facial-tissue and linoleum-polymer were derived respectively. These subsets of features were used to train new LASSO regression models on the data set consisting of pure bloodstain spectra (from 4 donors), which were then tested on composite (blood on substrate) spectra of the corresponding substrate. Performance of each model was evaluated in terms of RMSE as described above.

Results

The Raman spectral changes in human blood with aging were analyzed over a duration of two weeks (336 hrs) under ambient temperature. Blood spectra acquired on the aluminum substrate were considered pure bloodstain spectra, as there is no interference from the substrate at 785 nm excitation. Figure 2 displays spectral signatures of pure bloodstains at various time points from a donor, plotted with an offset for clarity. Obvious changes in the signal were observed in the first 12 hrs (Figure 2A), some of which leveled off thereafter (Figure 2B).
Among these changes, the appearance of peaks at 971 and 1248 cm-1 and the disappearance of the 1638 cm-1 band, along with several shifts and broadening in other bands, were very distinctive; these features are highlighted in Figure 3. The band at 971 cm-1 was assigned to δ(pyrrole deformation) asymmetric in-plane deformation (ν46) and/or γ(=CbH2) symmetric out-of-plane deformation, and the band at 1248 cm-1 was assigned to δ(CmH) in-plane deformation (ν13)18, 26. The redshifts in the bands at 1376 cm-1 and 1583 cm-1 were also clearly noticeable, as illustrated in Figure 3C and 3D. Bands at 1638 cm-1 and 1583 cm-1 were assigned to ν(CαCm) asymmetric stretch (ν10) and (ν37) respectively9, 18, while the band at 1376 cm-1 was assigned to ν(pyrrole half-ring, CaN) symmetric stretch (ν4)26, 41, 42. All these band vibrations are mainly associated with various forms of Hb. Further, these spectral changes were examined with respect to intensity ratios at 971/937, 1248/1224, 1371/1376 and 1638/1577 over the course of two weeks (Figure 4). As the trend in these ratios appeared exponential, the changes were fit with the model $Y = Y_0 + A e^{R_0 X}$, where the offset $Y_0$, initial value $A$, and rate $R_0$ were obtained by minimizing the sum of squared errors between the measured and estimated ratios. The exponential model fit well for all ratiometric changes and yielded R2 values greater than 0.94 (Figure 4A-4D).

Analyzing pure bloodstains over time provides insight into the mechanism of their aging. However, to mimic a crime scene, it is important to assess how the spectra are modulated in the presence of a substrate. Figure 5 displays the Raman spectra of bloodstains on three substrates (floor-tile, facial-tissue, linoleum-polymer) commonly found at crime scenes.
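The exponential fit of the band-intensity ratios, Y = Y0 + A·exp(R0·X), can be sketched as below. The text does not say which optimizer was used, so this illustration makes a simple assumption: for each candidate rate R0 on a grid, Y0 and A are solved by ordinary least squares, and the rate with the smallest sum of squared errors is kept. The function names, the rate grid, and the synthetic ratio values are all hypothetical.

```python
import math

def fit_linear_part(x, y, rate):
    """For a fixed rate, solve for offset Y0 and amplitude A by least squares."""
    e = [math.exp(rate * xi) for xi in x]
    n = len(x)
    e_mean = sum(e) / n
    y_mean = sum(y) / n
    s_ee = sum((ei - e_mean) ** 2 for ei in e)
    s_ey = sum((ei - e_mean) * (yi - y_mean) for ei, yi in zip(e, y))
    amplitude = s_ey / s_ee
    offset = y_mean - amplitude * e_mean
    sse = sum((yi - (offset + amplitude * ei)) ** 2 for ei, yi in zip(e, y))
    return offset, amplitude, sse

def fit_exponential(x, y, rates):
    """Scan candidate rates and keep the one with the smallest squared error."""
    best = None
    for rate in rates:
        offset, amplitude, sse = fit_linear_part(x, y, rate)
        if best is None or sse < best[3]:
            best = (offset, amplitude, rate, sse)
    return best

# Synthetic, noise-free ratio values at a few time points (hours); made up.
hours = [1.5, 4.0, 8.0, 12.0, 24.0, 48.0, 96.0, 192.0, 336.0]
ratios = [1.0 + 0.8 * math.exp(-0.05 * h) for h in hours]

candidate_rates = [-k / 1000.0 for k in range(1, 201)]  # -0.001 ... -0.200 per hour
offset, amplitude, rate, sse = fit_exponential(hours, ratios, candidate_rates)
```

A general-purpose nonlinear least-squares routine would normally replace the grid scan; the grid is used here only to keep the sketch dependency-free.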
For analysis, spectra were recorded from fresh bloodstains on the substrate, at 1.5, 4, 6, 8, 10 and 12 hrs on day one, and then at 24, 48, 96, 144, 192, 240, 288 and 336 hrs over a period of two weeks. Composite (blood on substrate) spectra include spectral contributions from both components, blood and the substrate, as observed in Figure 5, where composite spectra are compared with those of the pure substrate and pure bloodstains.

Our goal is to build a multivariate linear regression model which utilizes the changes in Raman spectra to predict the TSD of bloodstains on various substrates. Here we examined two regression models, PCR and LASSO, for estimating the age of bloodstains on various substrates. We observed that the drying process differed on the three substrates (floor-tile, facial-tissue, linoleum-polymer). A blood drop on facial tissue immediately spread and was absorbed by the tissue fibers, while the rough tile surface absorbed the water in blood slowly. In comparison, blood remained in fluid form for a relatively longer time on the linoleum polymer, which may be due to the hydrophobic nature of the surface. Nevertheless, the bloodstains dried out within 1.5 hrs TSD on all three substrates under ambient conditions. Therefore, we excluded <1.5 hrs bloodstain data before generating the models. Pure bloodstain spectra at 21 time points from four donors were assigned to the training data set used to train all substrate-based models. The PCR model was trained and cross-validated using 420 pure bloodstain spectra. By using 10 PCs, the model accurately estimated TSD with R2=0.974 and RMSECV=0.121. The universal LASSO regression was constructed using all Raman features as input.
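The ten-split Venetian blinds cross-validation behind the RMSECV values can be sketched as follows. This is a minimal plain-Python illustration, assuming the common definition in which every tenth sample goes to the same split; the function names are hypothetical, and the `fit` and `predict` arguments stand in for whichever regression model (PCR or LASSO) is being validated.

```python
import math

def venetian_blinds(n_samples, n_splits=10):
    """Assign sample i to split i mod n_splits; return (train, test) index lists."""
    folds = []
    for k in range(n_splits):
        test = list(range(k, n_samples, n_splits))
        train = [i for i in range(n_samples) if i % n_splits != k]
        folds.append((train, test))
    return folds

def rmsecv(X, y, fit, predict, n_splits=10):
    """Pool squared errors over all held-out samples, then take the root mean."""
    squared_error = 0.0
    for train, test in venetian_blinds(len(y), n_splits):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            squared_error += (y[i] - predict(model, X[i])) ** 2
    return math.sqrt(squared_error / len(y))

# With 420 training spectra, each split calibrates on 378 and validates on 42.
folds = venetian_blinds(420, n_splits=10)

# Placeholder model (predicts the training mean), only to show the call pattern.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, row):
    return model

toy_X = [[float(i)] for i in range(20)]
toy_y = [3.0] * 20
toy_rmsecv = rmsecv(toy_X, toy_y, fit_mean, predict_mean)
```

Because the splits interleave samples in their stored order, the composition of each split depends on how the spectra are sorted before validation.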
LASSO has the inherent ability to set the contribution of certain (irrelevant) features to zero and thus identifies the spectral signatures that contribute to the model through their nonzero coefficient estimates36, 37. The coefficients represent the extent to which the selected features (wavenumbers) contribute to accurately predicting TSD. The important spectral features that actually contributed to the universal LASSO regression model of pure bloodstains are depicted in Figure 6(i). The wavenumber values of these features are given in Table 2. This universal model, trained and cross-validated on pure bloodstain spectra, accurately estimated TSD with R2=0.984 and RMSECV=0.096. The important features derived from pure bloodstain spectra using the universal LASSO regression model were plotted with pure substrate spectra for comparison (Figure 6). Some of these features, marked with dotted lines, coincide with strong substrate signals. Any interference from the strong substrate signals increases the uncertainty in predicting the age of bloodstains. Thus, an automated feature selection method (Figure 1) was devised to extract a substrate-specific subset of features that are free from substrate interference. This subset of features was used to predict the age of composite (blood on substrate) spectra. For the three substrates, three separate substrate-specific LASSO regression models were obtained using the selected subset of features in each case. These subsets of features are marked on their respective substrate spectra for comparison, as shown in Figure 6(ii), 6(iii) and 6(iv), and their wavenumber values are given in Table 2.
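The selection idea described above can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the text and the Figure 1 caption give only the threshold definition (1/10th of the sum of the five smallest peak intensities in the min-max normalized substrate spectrum), so the rule used here to apply it (keep a LASSO feature only where the normalized substrate intensity is below the threshold), the regularization strength `alpha`, all function names and the synthetic data are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.linear_model import Lasso


def lasso_selected_features(spectra, tsd, alpha=0.05):
    """Return indices of wavenumbers whose LASSO coefficients are nonzero.

    alpha is a hypothetical regularization strength, not a value from the study.
    """
    model = Lasso(alpha=alpha, max_iter=50000).fit(spectra, tsd)
    return np.flatnonzero(model.coef_)


def substrate_threshold(substrate_spectrum):
    """Min-max normalize the substrate spectrum and compute the threshold:
    1/10th of the sum of the five smallest peak intensities."""
    s = np.asarray(substrate_spectrum, dtype=float)
    norm = (s - s.min()) / (s.max() - s.min())
    peaks, _ = find_peaks(norm)  # local maxima of the substrate spectrum
    five_smallest = np.sort(norm[peaks])[:5]
    return norm, 0.1 * five_smallest.sum()


def substrate_specific_subset(selected, substrate_spectrum):
    """ASSUMPTION: a LASSO feature is kept only when the normalized substrate
    intensity at that wavenumber is below the threshold."""
    norm, threshold = substrate_threshold(substrate_spectrum)
    selected = np.asarray(selected, dtype=int)
    return selected[norm[selected] < threshold]


def fit_substrate_model(spectra, tsd, subset, alpha=0.05):
    """Refit LASSO on pure-blood spectra restricted to the chosen subset."""
    return Lasso(alpha=alpha, max_iter=50000).fit(spectra[:, subset], tsd)


# Synthetic illustration only: 100 "wavenumber" channels, no real spectra.
rng = np.random.default_rng(0)
channels = np.arange(100)
blood = rng.normal(size=(60, 100))
age = 2.0 * blood[:, 5] - 3.0 * blood[:, 35] + 1.5 * blood[:, 50]

# Hypothetical substrate with three Gaussian bands at channels 20, 50 and 80.
substrate = sum(h * np.exp(-((channels - c) ** 2) / 8.0)
                for c, h in [(20, 1.0), (50, 0.5), (80, 0.3)])

universal = lasso_selected_features(blood, age)
subset = substrate_specific_subset(universal, substrate)
model = fit_substrate_model(blood, age, subset)
```

In this toy case the feature at channel 50 carries age information but sits on a substrate band, so it is dropped from the substrate-specific subset, while the features at channels 5 and 35 are kept. Note that the model is still trained only on the pure-blood spectra; the substrate spectrum is used solely to decide which features to exclude.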
Variations in the accuracy of predictions using the universal LASSO model at different time points are illustrated in Figure 7A, where the error of prediction can be considered a measure of uncertainty. Results from the substrate-specific LASSO models are presented in Figures 7B-7D for floor-tile, facial-tissue and linoleum-polymer respectively. In Figures 7A-7D, grey squares represent training data results for the corresponding models. The substrate-specific models were tested using separate test data sets, consisting of composite (blood on substrate) spectra from each respective substrate, to predict the TSD of bloodstains, and the results are marked as circles in Figures 7B-7D. The performance of the regression models on the test data sets consisting of 70 composite (blood on substrate) spectra is tabulated in Table 3. Interestingly, the substrate-specific models trained on a subset of features show improved accuracy in comparison with the respective PCR models that include all Raman features (Table 3).

Discussion

Raman spectroscopy has been explored extensively in forensic science, including its use in estimating the age of pure bloodstains based on spectral markers23-26. In practice, bloodstains at a crime scene are contaminated by signals from the surfaces they are on, which interfere with the bloodstain spectra. The resulting interference can prove problematic in estimating the age of bloodstains accurately. Thus, we employed Raman spectroscopy in combination with regression analysis to estimate the age of bloodstains in the presence of the underlying substrate signals.
In this study, special care was taken to identify the appropriate objective lens, exposure time and laser power needed to obtain Raman spectra with a good signal-to-noise ratio and to avoid any local heating of bloodstains, as discussed by Lemler et al.26 As shown in Figures 2 and 3, the acquired Raman spectra revealed conformational/structural changes that occurred in Hb as a result of drying (up to 1.5 hrs) and aging over two weeks. The appearance of both the 971 and 1248 cm-1 bands (Figure 3A and 3B) has previously been identified as a marker of denaturation and aggregation of Hb26. When iron oxidizes to the Fe3+ state, it loses its ability to carry oxygen (as in metHb). In metHb, Fe3+ remains in the high-spin state, which subsequently denatures to low-spin Fe3+ hemichrome. The redshift in the ν4 band at 1376 cm-1 (Fe oxidation state marker) indicates conversion of oxyHb to Fe3+ metHb/hemichrome (Figure 3C)41, 42, which is corroborated by a redshift in the 1583 cm-1 band (Figure 3D)9, 18. The conversion of oxyHb to metHb is further evidenced by a decrease in the band at 1638 cm-1, which denotes the planar porphyrin ring in oxyHb and is also known as an oxygenation marker9, 23. Binding of oxygen to Fe2+ heme leads to a decrease in the size of the iron atom, causing iron to move into the plane of the porphyrin ring of Hb and resulting in a slight conformational adjustment of the porphyrin and associated globin9. This process can also be monitored by assessing the 1200-1230 cm-1 region associated with C-H in-plane bending vibrations of the methine hydrogen. These vibrations are observed at 1207 cm-1 for deoxyHb/metHb (high-spin domed porphyrin) and at 1224 cm-1 for oxyHb (low-spin planar porphyrin)16, 18, as shown in Figure 3B.
It has previously been shown that the biochemical changes in Hb begin as soon as blood exits the human body26. However, aging/storage conditions have a significant effect on the degradation of blood cells. Studies on RBCs in blood bags stored at 4℃ do not show instant saturation of deoxyHb to oxyHb16, 18, 19, as the conversion occurs gradually over a period of 42 days owing to limited contact with ambient oxygen. This contrasts with bloodstains aged in ambient air, where oxygen saturation completes as soon as the drop is deposited on the substrate and, within 1 hour of aging, the band at 1638 cm-1 starts to decrease due to autooxidation of oxyHb to metHb (Figure 3D). Previous reports have shown an increase in the intensity of the band at 1638 cm-1 (oxygenation marker)16, 18, 19 over a period of 42 days at 4℃ in a blood bag. It is likely that autooxidation occurs at a slower pace in a blood bag at 4℃18, 30 than in bloodstains aging at room temperature under ambient conditions. This could be because RBCs remain in a liquid state in the blood bag at 4℃, where the enzymes responsible for converting metHb back to deoxyHb remain active for a longer duration9, as opposed to dry bloodstains.

Freshly deposited bloodstains transformed to a dry state within 1.5 hrs of deposition due to evaporation of water as well as coagulation. These macroscopic changes are clearly reflected in the corresponding Raman spectra (Figures 2 and 3). The ratiometric analysis of these spectral changes, which followed exponential trends, illustrated the denaturation/aggregation of Hb and the transformation of oxyHb to metHb/hemichrome over the course of two weeks (336 hrs).
The exponential fit suggests that the transition of oxyHb into metHb and hemichrome is rapid for the initial 12 hrs and then slows down (Figure 4). Even though the ratiometric analysis of specific Raman bands is semi-quantitative, the biphasic autooxidation of oxyHb in bloodstains can be visualized, as discussed by Bremmer et al.43 They observed that the initial oxidation of oxyHb is rapid and slows down thereafter. Bremmer et al.43 correlated the biphasic decay with changes in the water and salt content of blood, both of which vary with environmental factors such as temperature and humidity. Although the present study was performed in an ambient environment, some of these Raman spectral changes were clearly visible even in the presence of substrate signals (Figure 5).

Several studies have evaluated blood spectra in the presence of unwanted background signals10, 29, 33. Reconstitution of blood spectra by immersing bloodstains (on substrate) in water has shown promising results10; however, adding water to bloodstains affects the changes that occur over time43. Another approach is based on identifying an adequate laser source specific to a substrate to reduce the background contribution, combined with post-collection data treatment29, 33. However, the post-collection data processing works well only for a particular combination of substrate and excitation laser wavelength29. Although these studies focused on identifying body fluids in the presence of contaminants using multivariate analysis, none of them used composite (blood on substrate) spectra to estimate the TSD. In this study, we used composite Raman spectra along with multivariate analysis tools to estimate the TSD of bloodstains.
The performance of our cross-validated PCR model built using four donors was comparable with that of a previously published PCR model built using only one donor24. This supports the robustness of our model despite donor-to-donor variation. The novel aspect of this study is the implementation of a LASSO regression guided feature (wavenumber) selection algorithm designed to avoid interference from substrate signals. The universal LASSO model trained on pure blood spectra selects the features that reveal an explicit relationship with the age of bloodstains, with their respective coefficient estimates. An automated feature selection algorithm was devised to use these features along with pure substrate spectra to obtain a substrate-specific subset of features. The common features selected for all three substrates are related to the spin states and oxygenation states of Hb (Figure 6). It must be noted that even though the universal LASSO model uses a sparse set of features (Figure 6i) compared with PCR, which uses the entire Raman spectrum, the performance of both models is comparable, demonstrating that not all features are required to estimate age. However, the substrate-specific LASSO models, based on subsets of features (wavenumbers) free from substrate interference, outperformed the respective PCR models (Table 3). The relatively poor performance of PCR using composite spectra is likely due to the interference of the substrate background, the contribution of which varies with the thickness of the bloodstains. Therefore, use of the PCR model, which is widely used to predict the age of bloodstains, may prove counterproductive for predicting TSD from composite (blood on substrate) spectra.
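For reference, a cross-validated PCR baseline of the kind discussed above (principal component scores followed by linear regression, evaluated by R2 and RMSECV) might be sketched as below. The number of folds, the random seed, the function names and the synthetic data are assumptions made for this illustration, and the conventional definition of R2 is assumed since the text does not spell one out.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline


def r2_and_rmse(y_true, y_pred):
    """Conventional R^2 (1 - SS_res / SS_tot) and root-mean-squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, float(np.sqrt(ss_res / y_true.size))


def pcr_cross_validate(spectra, tsd, n_components=10, n_splits=5, seed=0):
    """PCR evaluated on out-of-fold predictions.

    n_splits and seed are hypothetical choices; the text states only that
    10 PCs were used and that the model was cross-validated.
    """
    model = make_pipeline(PCA(n_components=n_components), LinearRegression())
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    predicted = cross_val_predict(model, spectra, tsd, cv=cv)
    return r2_and_rmse(tsd, predicted)


# Synthetic illustration only: 420 low-rank "spectra" with 200 channels.
rng = np.random.default_rng(0)
latent = rng.normal(size=(420, 3))
loadings = rng.normal(size=(3, 200))
spectra = latent @ loadings + 0.01 * rng.normal(size=(420, 200))
tsd = latent @ np.array([1.0, -0.5, 0.3])

r2_cv, rmsecv = pcr_cross_validate(spectra, tsd)

# Predictions with a constant offset: worse than predicting the mean,
# so R^2 computed this way is negative.
r2_offset, rmse_offset = r2_and_rmse([1.0, 2.0, 3.0], [2.5, 3.5, 4.5])
```

Under this definition R2 is not bounded below by zero: whenever the squared prediction error exceeds the variance of the true values, R2 becomes negative, which is one way a value such as the -0.6951 reported in Table 3 can arise for a model applied to composite spectra.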
To the best of our knowledge, this is the first report to predict the TSD of bloodstains on test data sets consisting of composite (blood on substrate) spectra (Figure 7). The feature selection algorithm (Figure 1) can be used to extract the subsets of features for any new substrate encountered at a crime scene. Importantly, training the models on pure bloodstain data eliminates the need to acquire training data onsite; training can be done in house with multiple volunteers and timepoints to enhance the robustness of the model, leading to faster onsite processing. Accurate predictions of TSD for composite spectra (test data) using LASSO models trained on pure bloodstain spectra (Figure 7) demonstrate the efficacy of our approach and its scalability to other substrates. However, the LASSO model needs to be tested with other substrates and over longer time points, and the effect of the environment on bloodstain aging remains to be understood.

Conclusions

Raman spectroscopy is sensitive to changes in blood as a function of time and therefore has the potential to establish a forensic timeline of bloodstain age. In this study, we demonstrated the feasibility of using Raman spectroscopy to determine the age of bloodstains in the presence of spectral contributions from different substrates commonly encountered at crime scenes. As a bloodstain aged, exponential trends were identified in specific spectral band ratios that were indicative of oxyHb to metHb conversion and the formation of degradation products such as hemichrome. A LASSO regression model was developed on pure bloodstain spectra to extract the important features contributing to the age estimation of bloodstains.
Further, an automated feature selection algorithm was devised to use these features along with pure substrate bands to obtain a substrate-specific subset of features that maximizes model performance and minimizes substrate interference. The substrate-specific LASSO models, trained on pure bloodstain spectra using only the subset of features, were tested on composite (blood on substrate) spectra for each substrate and yielded superior accuracy (R2 and RMSE) compared with the PCR models. Importantly, our current approach can easily be extended to other substrates. The recent emergence of portable probe-based systems, along with the proposed LASSO selection approach, could potentially enable on-site use of Raman spectroscopy for determining the age of bloodstains in the presence of contaminants.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

The authors would like to thank all the volunteers who agreed to take part in the study.

References

1. Alshehhi, S.; McCallum, N. A.; Haddrill, P. R., Quantification of RNA degradation of blood-specific markers to indicate the age of bloodstains. Forensic Science International: Genetics Supplement Series 2017, 6, e453-e455.
2. Fujita, Y.; Tsuchiya, K.; Abe, S.; Takiguchi, Y.; Kubo, S.-i.; Sakurai, H., Estimation of the age of human bloodstains by electron paramagnetic resonance spectroscopy: Long-term controlled experiment on the effects of environmental factors. Forensic Science International 2005, 152 (1), 39-43.
3. Strasser, S.; Zink, A.; Kada, G.; Hinterdorfer, P.; Peschel, O.; Heckl, W. M.; Nerlich, A. G.; Thalhammer, S., Age determination of blood spots in forensic medicine by force spectroscopy.
Forensic Science International 2007, 170 (1), 8-14.
4. Bremmer, R. H.; Nadort, A.; van Leeuwen, T. G.; van Gemert, M. J. C.; Aalders, M. C. G., Age estimation of blood stains by hemoglobin derivative determination using reflectance spectroscopy. Forensic Science International 2011, 206 (1), 166-171.
5. Shine, S. M.; Suhling, K.; Beavil, A.; Daniel, B.; Frascione, N., The applicability of fluorescence lifetime to determine the time since the deposition of biological stains. Analytical Methods 2017, 9 (13), 2007-2013.
6. Majda, A.; Wietecha-Posłuszny, R.; Mendys, A.; Wójtowicz, A.; Łydżba-Kopczyńska, B., Hyperspectral imaging and multivariate analysis in the dried blood spots investigations. Applied Physics A 2018, 124 (4), 312.
7. Lin, H.; Zhang, Y.; Wang, Q.; Li, B.; Huang, P.; Wang, Z., Estimation of the age of human bloodstains under the simulated indoor and outdoor crime scene conditions by ATR-FTIR spectroscopy. Scientific Reports 2017, 7 (1), 13254.
8. Bremmer, R. H.; de Bruin, K. G.; van Gemert, M. J. C.; van Leeuwen, T. G.; Aalders, M. C. G., Forensic quest for age determination of bloodstains. Forensic Science International 2012, 216 (1), 1-11.
9. Zadora, G.; Menżyk, A., In the pursuit of the holy grail of forensic science – Spectroscopic studies on the estimation of time since deposition of bloodstains. TrAC Trends in Analytical Chemistry 2018, 105, 137-165.
10. Boyd, S.; Bertino, M. F.; Seashols, S. J., Raman spectroscopy of blood samples for forensic applications. Forensic Science International 2011, 208 (1), 124-128.
11. Muro, C. K.; Doty, K. C.; de Souza Fernandes, L.; Lednev, I. K., Forensic body fluid identification and differentiation by Raman spectroscopy.
Forensic Chemistry 2016, 1, 31-38.
12. Khandasammy, S. R.; Fikiet, M. A.; Mistek, E.; Ahmed, Y.; Halámková, L.; Bueno, J.; Lednev, I. K., Bloodstains, paintings, and drugs: Raman spectroscopy applications in forensic science. Forensic Chemistry 2018, 8, 111-133.
13. Mahadevan-Jansen, A.; Richards-Kortum, R. R. In Raman spectroscopy for the detection of cancers and precancers, SPIE: 1996; p 40.
14. Pence, I.; Mahadevan-Jansen, A., Clinical instrumentation and applications of Raman spectroscopy. Chemical Society Reviews 2016, 45 (7), 1958-1979.
15. O'Brien, C. M.; Cochran, K. J.; Masson, L. E.; Goldberg, M.; Marple, E.; Bennett, K. A.; Reese, J.; Slaughter, J. C.; Newton, J. M.; Mahadevan-Jansen, A., Development of a visually guided Raman spectroscopy probe for cervical assessment during pregnancy. J Biophotonics 2018, 0 (0), e201800138.
16. Vardaki, M. Z.; Atkins, C. G.; Schulze, H. G.; Devine, D. V.; Serrano, K.; Blades, M. W.; Turner, R. F. B., Raman spectroscopy of stored red blood cell concentrate within sealed transfusion blood bags. Analyst 2018, 143 (24), 6006-6013.
17. Atkins, C. G.; Buckley, K.; Blades, M. W.; Turner, R. F. B., Raman Spectroscopy of Blood and Blood Components. Appl Spectrosc 2017, 71 (5), 767-793.
18. Gautam, R.; Oh, J. Y.; Marques, M. B.; Dluhy, R. A.; Patel, R. P., Characterization of Storage-Induced Red Blood Cell Hemolysis Using Raman Spectroscopy. Lab Med 2018, 49 (4), 298-310.
19. Gautam, R.; Oh, J. Y.; Patel, R. P.; Dluhy, R. A., Non-invasive analysis of stored red blood cells using diffuse resonance Raman spectroscopy. Analyst 2018, 143 (24), 5950-5958.
20. McLaughlin, G.; Doty, K. C.; Lednev, I. K., Raman Spectroscopy of Blood for Species Identification.
Analytical Chemistry 2014, 86 (23), 11628-11633.
21. Sikirzhytskaya, A.; Sikirzhytski, V.; Lednev, I. K., Determining Gender by Raman Spectroscopy of a Bloodstain. Analytical Chemistry 2017, 89 (3), 1486-1492.
22. Lin, D.; Zheng, Z.; Wang, Q.; Huang, H.; Huang, Z.; Yu, Y.; Qiu, S.; Wen, C.; Cheng, M.; Feng, S., Label-free optical sensor based on red blood cells laser tweezers Raman spectroscopy analysis for ABO blood typing. Optics Express 2016, 24 (21), 24750-24759.
23. Doty, K. C.; McLaughlin, G.; Lednev, I. K., A Raman “spectroscopic clock” for bloodstain age determination: the first week after deposition. Analytical and Bioanalytical Chemistry 2016, 408 (15), 3993-4001.
24. Doty, K. C.; Muro, C. K.; Lednev, I. K., Predicting the time of the crime: Bloodstain aging estimation for up to two years. Forensic Chemistry 2017, 5, 1-7.
25. Maggie O’Connor, K. J., Joseph Hodge, Christine O’Brien, Isaac Pence, and Anita Mahadevan-Jansen Shedding New Light on Forensic Timelines. Spectroscopy 2016, pp 40-45.
26. Lemler, P.; Premasiri, W. R.; DelMonaco, A.; Ziegler, L. D., NIR Raman spectra of whole human blood: effects of laser-induced and in vitro hemoglobin denaturation. Analytical and Bioanalytical Chemistry 2014, 406 (1), 193-200.
27. Menżyk, A.; Damin, A.; Martyna, A.; Alladio, E.; Vincenti, M.; Martra, G.; Zadora, G., Toward a novel framework for bloodstains dating by Raman spectroscopy: How to avoid sample photodamage and subsampling errors. Talanta 2019, 120565.
28. Doty, K. C.; Lednev, I. K., Differentiating Donor Age Groups Based on Raman Spectroscopy of Bloodstains for Forensic Purposes. ACS Central Science 2018.
29. McLaughlin, G.; Sikirzhytski, V.; Lednev, I.
K., Circumventing substrate interference in the Raman spectroscopic identification of blood stains. Forensic Science International 2013, 231 (1), 157-166.
30. Gebrekidan, M. T.; Knipfer, C.; Stelzle, F.; Popp, J.; Will, S.; Braeuer, A., A shifted-excitation Raman difference spectroscopy (SERDS) evaluation strategy for the efficient isolation of Raman spectra from extreme fluorescence interference. Journal of Raman Spectroscopy 2016, 47 (2), 198-209.
31. Lieber, C. A.; Mahadevan-Jansen, A., Automated Method for Subtraction of Fluorescence from Biological Raman Spectra. Applied Spectroscopy 2003, 57 (11), 1363-1367.
32. Sharma, V.; Kumar, R., Trends of chemometrics in bloodstain investigations. TrAC Trends in Analytical Chemistry 2018, 107, 181-195.
33. Sikirzhytskaya, A.; Sikirzhytski, V.; McLaughlin, G.; Lednev, I. K., Forensic Identification of Blood in the Presence of Contaminations Using Raman Microspectroscopy Coupled with Advanced Statistics: Effect of Sand, Dust, and Soil. Journal of Forensic Sciences 2013, 58 (5), 1141-1148.
34. Tibshirani, R., Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological) 1996, 58 (1), 267-288.
35. Bi, X.; Rexer, B.; Arteaga, C. L.; Guo, M.; Mahadevan-Jansen, A., Evaluating HER2 amplification status and acquired drug resistance in breast cancer cells using Raman spectroscopy. Journal of biomedical optics 2014, 19 (2), 025001-025001.
36. Chen, G.; Lin, X.; Lin, D.; Ge, X.; Feng, S.; Pan, J.; Lin, J.; Huang, Z.; Huang, X.; Chen, R., Identification of different tumor states in nasopharyngeal cancer using surface-enhanced Raman spectroscopy combined with Lasso-PLS-DA algorithm. RSC Advances 2016, 6 (10), 7760-7764.
37. Zhao, J.; Zeng, H.; Kalia, S.; Lui, H., Wavenumber selection based analysis in Raman spectroscopy improves skin cancer diagnostic specificity. Analyst 2016, 141 (3), 1034-1043.
38. Gautam, R.; Vanga, S.; Ariese, F.; Umapathy, S., Review of multidimensional data processing approaches for Raman and infrared spectroscopy. EPJ Techniques and Instrumentation 2015, 2 (1), 8.
39. Felten, J.; Hall, H.; Jaumot, J.; Tauler, R.; de Juan, A.; Gorzsás, A., Vibrational spectroscopic image analysis of biological material using multivariate curve resolution–alternating least squares (MCR-ALS). Nature Protocols 2015, 10, 217.
40. Gautam, R.; Vanga, S.; Madan, A.; Gayathri, N.; Nongthomba, U.; Umapathy, S., Raman spectroscopic studies on screening of myopathies. Anal Chem 2015, 87 (4), 2187-94.
41. Rao, S.; Bálint, Š.; Cossins, B.; Guallar, V.; Petrov, D., Raman Study of Mechanically Induced Oxygenation State Transition of Red Blood Cells Using Optical Tweezers. Biophysical Journal 2009, 96 (1), 209-216.
42. Asghari-Khiavi, M.; Mechler, A.; Bambery, K. R.; McNaughton, D.; Wood, B. R., A resonance Raman spectroscopic investigation into the effects of fixation and dehydration on heme environment of hemoglobin. Journal of Raman Spectroscopy 2009, 40 (11), 1668-1674.
43. Bremmer, R. H.; de Bruin, D. M.; de Joode, M.; Buma, W. J.; van Leeuwen, T. G.; Aalders, M. C. G., Biphasic Oxidation of Oxy-Hemoglobin in Bloodstains. PLOS ONE 2011, 6 (7), e21845.

Tables

Table 2.
LASSO features selected for analysis from pure bloodstains and the subset of features (wavenumbers) selected by the algorithm for Floor-tile, Facial-tissue and Linoleum-polymer substrates respectively as shown in Figure 6.

Substrate: Features selected (wavenumber, cm-1)

Pure bloodstains: 653.5, 654.5, 710.5, 722.5, 734.5, 745, 764.5, 794, 794.5, 798.5, 808.5, 824.5, 839, 839.5, 852.5, 873, 881, 906.5, 936.5, 954.5, 973, 977.5, 978, 980.5, 1002, 1015, 1059.5, 1080, 1089, 1126, 1126.5, 1127, 1127.5, 1132, 1144.5, 1157.5, 1186.5, 1198.5, 1208.5, 1220, 1220.5, 1223, 1225, 1225.5, 1226, 1230.5, 1231, 1231.5, 1278, 1334.5, 1363, 1366.5, 1367, 1368, 1368.5, 1369.5, 1375, 1375.5, 1376.5, 1381, 1381.5, 1382, 1388, 1400.5, 1423.5, 1425, 1426, 1431.5, 1432, 1453, 1453.5, 1455.5, 1456.5, 1461, 1461.5, 1464.5, 1473, 1481, 1481.5, 1517, 1518, 1570, 1571, 1575.5, 1578, 1582, 1582.5, 1583, 1598, 1599, 1602, 1602.5, 1603, 1606.5, 1612, 1624.5, 1625.5, 1630.5, 1631.5, 1660, 1707.5, 1720

Floor-tile: 653.5, 654.5, 722.5, 734.5, 745, 764.5, 794, 794.5, 798.5, 808.5, 824.5, 839, 839.5, 852.5, 873, 881, 906.5, 936.5, 954.5, 973, 977.5, 978, 980.5, 1002, 1015, 1059.5, 1126, 1126.5, 1127, 1127.5, 1132, 1144.5, 1157.5, 1186.5, 1198.5, 1208.5, 1220, 1220.5, 1223, 1225, 1225.5, 1226, 1388, 1400.5, 1423.5, 1425, 1426, 1431.5, 1432, 1453, 1453.5, 1455.5, 1456.5, 1461, 1461.5, 1464.5, 1473, 1481, 1481.5, 1517, 1518, 1625.5, 1630.5, 1631.5, 1660, 1707.5, 1720

Facial-tissue: 653.5, 654.5, 852.5, 936.5, 954.5, 1208.5, 1220, 1570, 1571, 1575.5, 1578, 1582, 1582.5, 1583, 1707.5, 1720

Linoleum-polymer: 734.5, 745, 764.5, 1015, 1144.5, 1198.5, 1208.5, 1220, 1220.5, 1223, 1225, 1225.5, 1226, 1230.5, 1231, 1231.5, 1517, 1518, 1570, 1571, 1575.5, 1578, 1630.5, 1631.5, 1660

Table 3.
Comparison between the performance parameters for the PCR and LASSO derived models to estimate age of the bloodstains on three substrates.

Test set           Metric   All variables          LASSO-guided feature selection
                            LASSO      PCR         LASSO
Floor-tile         R2       0.7762     0.7119      0.9240
                   RMSE     0.3161     0.4039      0.2074
Facial-tissue      R2       0.8196     0.6039      0.9262
                   RMSE     0.3926     0.4736      0.2044
Linoleum-polymer   R2       -0.6951    0.4406      0.9132
                   RMSE     0.9798     0.5629      0.2216

Figures

Figure 1. Flow chart describing the steps of the automated feature selection process. The threshold is set to a value equal to 1/10th of the sum of five smallest peak intensities in the min-max (0 to 1) normalized substrate spectrum.

Figure 2. Representative Raman spectra of pure bloodstains recorded over a period of two weeks on aluminum substrate, separated for clarity. (A) Radical spectral changes occur in the first 12 hrs of time since deposition (TSD); (B) gradual shifts and broadening of the bands are observed with aging from day 2 (24 hrs) to day 15 (336 hrs) TSD. Dashed lines indicate selected wavenumbers at which clear differences were observed over time.

Figure 3.
Raman spectral changes in pure bloodstains over a period of two weeks as indicated by dashed arrows: (A) appearance of the 971 cm-1 band associated with disorder in the protein, (B) appearance of the 1248 cm-1 band and decrease of the 1224 cm-1 band, assigned to heme aggregation and oxyHb respectively, (C) increase in the 1371 cm-1 band associated with the Fe3+ oxidation state and (D) disappearance of the 1638 cm-1 band and increase in the 1577 cm-1 band, associated with oxyHb and metHb respectively.

Figure 4. Ratiometric analysis of specific Raman band intensities: (A) 971 cm-1/937 cm-1, associated with denaturation of protein, (B) 1248 cm-1/1224 cm-1, associated with an increase in protein aggregation and a decrease in oxyHb, (C) 1371 cm-1/1376 cm-1, an increase in ferric Fe at the cost of ferrous Fe, and (D) 1638 cm-1/1577 cm-1, related to the disappearance of oxyHb (1638 cm-1) and an increase in metHb (1577 cm-1).

Figure 5. Representative Raman spectra of blood on substrate plotted with pure bloodstain (fresh) and pure substrate spectra for comparison: (A) Floor-tile, (B) Facial-tissue and (C) Linoleum-polymer. Shaded areas indicate regions affected by substrate background.

Figure 6.
Comparison of substrate signals with LASSO features: (i) LASSO features selected for analysis from pure bloodstains, solid black lines are marked on pure substrate spectrum to indicate the subset of features (wavenumbers) selected by the algorithm for (ii) Floor-tile, (iii) Facial-tissue and (iv) Linoleum-polymer respectively. The wavenumber values corresponding to the marked lines are tabulated in Table 2. Dotted lines represent the LASSO features overlapped with strong substrate peaks. https://mc.manuscriptcentral.com/asp Applied Spectroscopy iew ev rR ee rP Fo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Figure 7. LASSO regression results for (A) pure bloodstains showing cross validation (CV) on training data set and B) bloodstains on floor-tile, (C) bloodstains on facial-tissue, (D) bloodstains on linoleum-polymer showing training and test datasets. Training was performed on pure bloodstains spectra using selected features and testing was performed on test data sets consisting of composite (blood on substrate) spectra. All bloodstains were aged up to two weeks. https://mc.manuscriptcentral.com/asp Page 34 of 40 Page 35 of 40 rR ee rP Fo Figure 1. Flow chart describing the steps of the automated feature selection process. The threshold is set to a value equal to 1/10th of the sum of five smallest peak intensities in the min-max (0 to 1) normalized substrate spectrum. ev 80x59mm (300 x 300 DPI) iew 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Applied Spectroscopy https://mc.manuscriptcentral.com/asp Applied Spectroscopy Fo Figure 2. Representative Raman spectra of pure bloodstains recorded over a period of two weeks on aluminum substrate separated for clarity. 
(A) Radical spectral changes occur in the first 12 h of time since deposition (TSD); (B) gradual shifts and broadening of the bands are observed with aging from day 2 (24 h) to day 15 (336 h) TSD. Dashed lines indicate selected wavenumbers at which clear differences were observed over time.
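
The band ratios described in the Figure 4 caption can be computed directly from a single spectrum. The following is a minimal Python sketch, assuming a spectrum is stored as two parallel lists (wavenumbers in cm-1 and intensities). The function names, the nearest-point lookup and the toy values are illustrative assumptions, not the processing code used in the study.

```python
def intensity_at(wavenumbers, intensities, target):
    """Return the intensity at the sampled wavenumber closest to `target` (cm-1)."""
    idx = min(range(len(wavenumbers)), key=lambda i: abs(wavenumbers[i] - target))
    return intensities[idx]


def band_ratio(wavenumbers, intensities, numerator, denominator):
    """Ratio of two band intensities, e.g. 1638 cm-1 over 1577 cm-1."""
    return (intensity_at(wavenumbers, intensities, numerator)
            / intensity_at(wavenumbers, intensities, denominator))


# Band pairs named in the Figure 4 caption, as (numerator, denominator) in cm-1.
FIGURE_4_PAIRS = [(971, 937), (1248, 1224), (1371, 1376), (1638, 1577)]

# Hypothetical toy spectrum for illustration only (not measured data).
toy_wavenumbers = [937, 971, 1224, 1248, 1371, 1376, 1577, 1638]
toy_intensities = [0.40, 0.20, 0.50, 0.25, 0.60, 0.30, 0.80, 0.40]

ratios = {pair: band_ratio(toy_wavenumbers, toy_intensities, *pair)
          for pair in FIGURE_4_PAIRS}
```

Tracking each ratio across spectra recorded at increasing time since deposition would give one curve per panel. Real spectra would normally be baseline-corrected and normalized first, which this sketch does not attempt.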
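
The threshold rule stated in the Figure 1 caption (one tenth of the sum of the five smallest peak intensities in the min-max normalized substrate spectrum) can be written out as a short calculation. The sketch below is a hedged illustration in plain Python. The peak definition (a simple local maximum), the function names and the toy spectrum are assumptions introduced here, since the caption does not say how peaks were detected.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("spectrum is constant; cannot normalize")
    return [(v - lo) / (hi - lo) for v in values]


def local_peak_intensities(values):
    """Intensities of simple local maxima (an assumed peak definition)."""
    return [values[i] for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]


def substrate_threshold(substrate_intensities, n_smallest=5, fraction=0.1):
    """Fraction (default 1/10) of the sum of the n smallest peak intensities."""
    normalized = min_max_normalize(substrate_intensities)
    peaks = sorted(local_peak_intensities(normalized))
    if len(peaks) < n_smallest:
        raise ValueError("fewer detected peaks than n_smallest")
    return fraction * sum(peaks[:n_smallest])


# Hypothetical toy substrate spectrum with six peaks (not measured data).
toy_substrate = [0, 2, 0, 4, 0, 6, 0, 8, 0, 10, 0, 20, 0]
threshold = substrate_threshold(toy_substrate)
```

For the toy spectrum, the normalized peak heights are 0.1, 0.2, 0.3, 0.4, 0.5 and 1.0, so the five smallest sum to 1.5 and the threshold is about 0.15. How the threshold is then applied to the candidate features is defined by the flow chart itself and is not modeled here.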