Predictive modeling of crime has become an important tool for law enforcement and public safety planning. In this section, we explore how machine learning techniques can be used to predict the type of crime a victim might experience based on demographic characteristics and contextual factors. By understanding these patterns, authorities can develop more targeted prevention strategies and allocate resources more effectively.
To build our predictive models, we used the Los Angeles crime dataset with features including victim age, gender, ethnicity, time of occurrence, and location type. We preprocessed the data by:
Feature | Unique Values | Examples |
---|---|---|
Crime Categories | 7 | Other, Robbery, Fraud |
Victim Gender | 3 | Male, Unknown, Female |
Age Groups | 5 | 50-69, 30-49, 18-29 |
Premise Types | 10 | Residential, Commercial, Bank |
Time of Day | 4 | Night, Evening, Morning, Afternoon |
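The grouped features in the table above imply that raw victim age and occurrence time were bucketed into categories. A minimal sketch of such bucketing follows. The bin edges, the labels `Under 18` and `70+`, the hour boundaries, and the function names are all assumptions for illustration; they are not necessarily the boundaries used in this analysis.

```python
from bisect import bisect_right

# Assumed bin edges; only 18-29, 30-49, and 50-69 appear in the table above.
AGE_EDGES = [18, 30, 50, 70]
AGE_LABELS = ["Under 18", "18-29", "30-49", "50-69", "70+"]

# Assumed hour boundaries for the four time-of-day groups.
HOUR_EDGES = [6, 12, 18, 22]
HOUR_LABELS = ["Night", "Morning", "Afternoon", "Evening", "Night"]


def age_group(age: int) -> str:
    """Map a victim age in years to one of five age groups."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def time_of_day(hour: int) -> str:
    """Map an hour (0-23) to Night, Morning, Afternoon, or Evening."""
    return HOUR_LABELS[bisect_right(HOUR_EDGES, hour)]


if __name__ == "__main__":
    print(age_group(24), time_of_day(20))
```

Keeping the edges in one place makes it easy to test how sensitive the models are to the choice of grouping.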
We built and evaluated several machine learning models to predict crime types:
Each model was evaluated on the test set using accuracy, class-wise sensitivity, and Top-N accuracy metrics.
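Class-wise sensitivity is the share of true members of a class that the model labels correctly, so it can be very low for a class even when overall accuracy looks acceptable. The sketch below shows one way to compute both quantities from true and predicted labels. The function names and the toy labels are illustrative assumptions, not the evaluation code used for this analysis.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_sensitivity(y_true, y_pred):
    """Per-class sensitivity (recall): correct predictions / true members."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Toy labels for illustration only; not drawn from the crime dataset.
    y_true = ["Theft", "Theft", "Assault", "Fraud", "Fraud", "Fraud"]
    y_pred = ["Theft", "Assault", "Assault", "Theft", "Fraud", "Theft"]
    print(overall_accuracy(y_true, y_pred))
    print(class_sensitivity(y_true, y_pred))
```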
Crime Category | Accuracy (%) |
---|---|
Overall (Validation) | 46.53 |
Overall (Test) | 46.89 |
Class: Assault | 66.24 |
Class: Fraud | 11.46 |
Class: Property_Damage | 32.96 |
Class: Public_Order | 0.10 |
Class: Sexual_Crimes | 0.06 |
Class: Theft | 62.29 |
Class: Violent_Crimes | 0.00 |
The logistic regression model achieved moderate overall accuracy (46.89% on the test set). It performed best on common crime types such as Assault (66.24%) and Theft (62.29%) but struggled with less frequent categories: Fraud reached only 11.46%, while Public Order (0.10%), Sexual Crimes (0.06%), and Violent Crimes (0.00%) were almost never identified. This pattern suggests the model is strongly influenced by class imbalance.
Crime Category | Accuracy (%) |
---|---|
Overall (Validation) | 33.54 |
Overall (Test) | 33.65 |
Class: Assault | 13.86 |
Class: Fraud | 67.34 |
Class: Property_Damage | 36.45 |
Class: Public_Order | 27.96 |
Class: Sexual_Crimes | 53.02 |
Class: Theft | 37.98 |
Class: Violent_Crimes | 53.49 |
The weighted Random Forest has lower overall test accuracy than logistic regression (33.65% versus 46.89%) but more balanced performance across crime categories. The class weighting strategy improved sensitivity for minority classes: relative to logistic regression, Fraud rises from 11.46% to 67.34% and Public Order from 0.10% to 27.96%. This comes at the cost of majority classes, with Assault falling from 66.24% to 13.86% and Theft from 62.29% to 37.98%.
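The exact weighting scheme is not stated here. One common choice, sketched below as an assumption, is inverse-frequency ("balanced") weighting, where each class receives weight `n_samples / (n_classes * class_count)`. This is the same formula scikit-learn applies for `class_weight="balanced"`. The function name and toy label counts are illustrative.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


if __name__ == "__main__":
    # Toy label distribution for illustration only.
    labels = ["Theft"] * 6 + ["Assault"] * 3 + ["Fraud"] * 1
    for label, weight in balanced_class_weights(labels).items():
        print(f"{label}: {weight:.2f}")
```

Rare classes receive the largest weights, so errors on them cost more during training. That is consistent with the trade-off seen above, where minority-class sensitivity rises while majority-class sensitivity falls.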
Top-N | Validation Accuracy (%) | Test Accuracy (%) |
---|---|---|
Top-1 | 49.11 | 49.50 |
Top-2 | 73.28 | 73.62 |
Top-3 | 85.61 | 85.84 |
Top-4 | 92.93 | 93.02 |
Top-5 | 97.31 | 97.26 |
Top-N | Validation Accuracy (%) | Test Accuracy (%) |
---|---|---|
Top-1 | 49.01 | 49.07 |
Top-2 | 73.27 | 73.58 |
Top-3 | 85.78 | 85.92 |
Top-4 | 93.17 | 93.12 |
Top-5 | 97.38 | 97.34 |
Under Top-N accuracy, performance improves substantially beyond Top-1: in both tables above, test accuracy rises from about 49% at Top-1 to about 74% at Top-2 and about 86% at Top-3. Most notably, the ensemble methods (Random Forest and XGBoost) exceed 85% accuracy when considering their top 3 predictions and 97% with their top 5. This suggests that while predicting the exact crime type is challenging, identifying a small set of likely crime types is highly feasible.
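Top-N accuracy counts a prediction as correct when the true class appears among the N classes with the highest predicted probability. A minimal sketch of the computation follows. The function name, the dict-based probability format, and the toy probabilities are assumptions for illustration, not output from the models above.

```python
def top_n_accuracy(y_true, probabilities, n):
    """Share of samples whose true label is among the n most probable classes.

    `probabilities` is a list of dicts mapping class label -> probability.
    """
    hits = 0
    for truth, probs in zip(y_true, probabilities):
        # Rank class labels from most to least probable.
        ranked = sorted(probs, key=probs.get, reverse=True)
        if truth in ranked[:n]:
            hits += 1
    return hits / len(y_true)


if __name__ == "__main__":
    # Toy probabilities for illustration only.
    row = {"Theft": 0.5, "Assault": 0.3, "Fraud": 0.2}
    y_true = ["Theft", "Fraud", "Assault"]
    probabilities = [row, row, row]
    for n in (1, 2, 3):
        print(n, top_n_accuracy(y_true, probabilities, n))
```

Top-N accuracy can only stay the same or increase as N grows, reaching 100% when N equals the number of classes, so the useful question is how small N can be while coverage remains acceptable.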
The confusion matrix helps us understand where our models make mistakes and identify patterns in misclassification.
The confusion matrix reveals several important patterns:
Based on our comprehensive model evaluation, we recommend the following deployment strategy:
Scenario | Recommended Model | Justification |
---|---|---|
Emergency Response Prioritization | XGBoost with Top-3 Predictions | Balance between speed and accuracy; provides actionable multiple scenarios |
Community Risk Awareness | Random Forest with Top-5 Predictions | Higher coverage ensures most potential risks are identified for community education |
Resource Allocation | Logistic Regression with Class-Specific Thresholds | Simple interpretable model with adjustable thresholds for different resource types |
Individual Safety Planning | XGBoost with Probability Calibration | Best calibrated probability estimates for personal risk assessment |
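To illustrate what class-specific thresholds could look like in practice, the sketch below flags every crime type whose predicted probability meets its own threshold and falls back to the single most probable class when nothing is flagged. The function name, the threshold values, and the probabilities are hypothetical; the thresholds used in any deployment would need to be tuned on validation data.

```python
def flag_classes(probs, thresholds, default_threshold=0.5):
    """Return classes whose probability meets their class-specific threshold.

    Falls back to the single most probable class when nothing is flagged.
    Results are ordered from most to least probable.
    """
    flagged = [
        label
        for label, p in probs.items()
        if p >= thresholds.get(label, default_threshold)
    ]
    if not flagged:
        flagged = [max(probs, key=probs.get)]
    return sorted(flagged, key=probs.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical probabilities and thresholds for illustration only.
    probs = {"Theft": 0.45, "Assault": 0.35, "Fraud": 0.12, "Public_Order": 0.08}
    thresholds = {"Theft": 0.40, "Assault": 0.40, "Fraud": 0.10}
    print(flag_classes(probs, thresholds))
```

Lowering the threshold for a rare class such as Fraud trades more false alarms for fewer missed cases, which is the kind of adjustment the resource allocation scenario calls for.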
Below is a demonstration of how our crime prediction model could be implemented in an interactive tool. Using demographic information and location details, the tool estimates the probability of different crime types.
Scenario | Gender | Age | Location | Time |
---|---|---|---|---|
Scenario 1 | Female | 18-29 | Commercial | Evening |
Scenario 2 | Male | 30-49 | Street | Night |
Scenario 3 | Female | 50-69 | Residential | Morning |
While our models show promising results, several limitations should be acknowledged:
Future research directions include:
Our analysis demonstrates that machine learning can predict crime types from victim demographics and contextual factors with useful, though imperfect, accuracy. No model exceeds roughly 50% Top-1 accuracy, but ensemble methods like Random Forest and XGBoost place the true crime type among their top 3 predictions about 86% of the time, which is practical for real-world applications.
The most important predictors of crime type are victim age, location type, and time of day. Different demographic groups face distinctly different risk profiles, which supports targeted prevention strategies rather than one-size-fits-all approaches.
For practical deployment, we recommend using a Top-3 prediction system that presents the most likely crime scenarios rather than a single prediction. This approach balances accuracy with actionable information, enabling more effective resource allocation and safety planning.