Governance by Evidence: Regulated Predictors in Decision-Tree Models
arXiv:2512.15955v1 Announce Type: new
Abstract: Decision-tree methods are widely used on structured tabular data and are valued for interpretability across many sectors. Published studies often list the predictors they use (for example, age, diagnosis codes, location), and privacy laws increasingly regulate such data types. We use published decision-tree papers as a proxy for real-world use of legally governed data. We compile a corpus of decision-tree studies and assign each reported predictor to a regulated data category (for example, health data, biometric identifiers, children's data, financial attributes, location traces, and government IDs). We then link each category to specific excerpts in European Union and United States privacy laws. We find that many reported predictors fall into regulated categories, with the largest shares in healthcare and clear differences across industries. We analyze prevalence, industry composition, and temporal patterns, and summarize regulation-aligned timing using each framework's reference year. Our evidence supports privacy-preserving methods and governance checks, and can inform ML practice beyond decision trees.
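The coding step described in the abstract (assigning each reported predictor to a regulated category, then comparing prevalence across industries) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the keyword lists in `CATEGORY_KEYWORDS`, the helper names `categorize` and `regulated_share_by_industry`, and the toy corpus are hypothetical and are not the authors' coding scheme or data.

```python
from collections import defaultdict

# Hypothetical keyword lists for illustration only; the paper's actual
# category definitions and legal excerpts are not reproduced here.
CATEGORY_KEYWORDS = {
    "health": ("diagnosis", "blood pressure", "medication"),
    "biometric": ("fingerprint", "iris"),
    "financial": ("income", "credit"),
    "location": ("zip code", "gps", "location"),
    "government_id": ("ssn", "passport"),
}


def categorize(predictor):
    """Return the first category whose keyword occurs in the name, else None."""
    name = predictor.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return category
    return None


def regulated_share_by_industry(studies):
    """Compute the fraction of predictors per industry that map to a category.

    `studies` is an iterable of (industry, list_of_predictor_names) pairs.
    """
    total = defaultdict(int)
    regulated = defaultdict(int)
    for industry, predictors in studies:
        for predictor in predictors:
            total[industry] += 1
            if categorize(predictor) is not None:
                regulated[industry] += 1
    return {industry: regulated[industry] / total[industry] for industry in total}


# Toy corpus (fabricated placeholder names, not data from the paper).
toy_corpus = [
    ("healthcare", ["diagnosis code", "age", "medication count", "zip code"]),
    ("retail", ["basket size", "store location", "discount rate", "weekday"]),
]
shares = regulated_share_by_industry(toy_corpus)
```

Substring matching of this kind is crude: it leaves a predictor such as "age" uncategorized even though age can be regulated in some contexts (for example, children's data), so a real coding pass would likely need manual review or a richer rule set.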