Jarrar Haider

Graduate student at the University of Maryland's Smith School of Business, studying Business Analytics.
I'm skilled in statistical analysis, programming (R, SQL, and Python), and data visualization, and I use modern software tools and technologies to handle big data and generate actionable insights.

Hey there, I'm Jarrar👋

I work at the intersection of business, data, and technology, and I am interested in education, learning design, and user experience research.

Beyond the data, I'm a fan of fiction (I absolutely love the works of Gabriel Garcia Marquez!), poetry, and cooking delicious South Asian delicacies (Biryani, anyone?). So, whether you want to chat about data insights or the latest novel you're reading, let's connect and geek out together!

Below is a list of projects that I have worked on over the last few years.

Classification Model for Twitter Spam Detection based on User and Tweet Characteristics

GitHub

In response to escalating cyber threats and online fraud on social media platforms, this data mining project aims to create a predictive model for identifying Twitter spammers based on user accounts and tweet characteristics. Detecting patterns indicative of cyber attacks and malicious activities enables social media companies to refine their cybersecurity measures, preserving platform integrity.

The project seeks to answer three key questions: Can data patterns unveil Twitter accounts engaged in cyber attacks or malicious activities? How does the behavior of Twitter spammers differ from legitimate users concerning account and tweet characteristics, and how can this enhance cybersecurity measures? Can we predict a Twitter account's spam likelihood based on its attributes, and which factors are most influential?

The study employed seven classification techniques: Logistic Regression, K-Nearest Neighbors, Naive Bayes, Classification Trees, Random Forests, Bagging, and Boosting. The model with the best performance metrics (Accuracy and AUC) on a test set was selected, then fine-tuned and validated to confirm its accuracy.
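The selection step described above (compare candidate models on test-set Accuracy and AUC, keep the best one) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the 0.5 cutoff, the tie-breaking order (accuracy first, then AUC), and the toy labels and scores are all assumptions.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Share of predictions that match the true label at the given cutoff."""
    predictions = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(p == t for p, t in zip(predictions, y_true))
    return correct / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for s, t in zip(y_score, y_true) if t == 1]
    negatives = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def pick_best(y_true, scores_by_model):
    """Compute (accuracy, AUC) per model and return the model with the highest pair."""
    metrics = {
        name: (accuracy(y_true, scores), roc_auc(y_true, scores))
        for name, scores in scores_by_model.items()
    }
    best = max(metrics, key=lambda name: metrics[name])
    return best, metrics


# Toy test labels (1 = spam) and made-up predicted probabilities, for illustration only.
y_test = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.3, 0.2, 0.1],
    "model_b": [0.6, 0.4, 0.7, 0.8, 0.3, 0.2],
}
best_name, all_metrics = pick_best(y_test, toy_scores)
print(best_name, all_metrics)
```

Looking at both metrics together is useful because accuracy depends on the chosen cutoff, while AUC summarizes how well the scores rank spammers above legitimate accounts across all cutoffs.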

Twitter's popularity has attracted malicious actors over the past decade. This research deploys a model that effectively classifies spammers by incorporating user and tweet features. Among the seven techniques tested, Random Forest stood out with an 88.25% accuracy rate. While not exhaustive, the findings offer valuable insights for Twitter to identify spam accounts and have broader implications for cybersecurity. This study underscores the importance of data mining techniques in combating malicious activity, encouraging social media and website platforms to consider their adoption.

Analyzing Demographics in Fatal Crashes: Insights for Washington Traffic Safety Policy

Our project focused on the 'Washington Fatal Crash Files,' aiming to provide insights, predictions, and policy recommendations to the Washington Traffic Safety Commission (WTSC). The primary objective was to assess whether individuals involved in fatal crashes in particular communities were residents of those communities and to pinpoint ZIP codes with a high prevalence of high-risk drivers, especially in areas near the borders of Oregon and Idaho.

Our methodology involved exploratory data analysis to identify trends and patterns and logistic regression to predict crash fatality probabilities and evaluate the impact of individual variables on fatality odds. The analysis revealed a significant proportion of non-resident drivers involved in fatal crashes (28% residents vs. 72% non-residents) and disparities in crash types and behavioral factors between residents and non-residents. Several high-risk ZIP codes were also identified, along with specific population demographics associated with increased risk.
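To make the phrase "impact of individual variables on fatality odds" concrete: in a logistic regression, exponentiating a coefficient gives the odds ratio for a one-unit change in that variable. The sketch below shows that conversion and the resulting predicted probability. The coefficient value and the variable name are hypothetical and do not come from the project's fitted model.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficients, features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficient for a 0/1 "non-resident driver" indicator (illustration only).
beta_non_resident = 0.5

print(odds_ratio(beta_non_resident))                          # odds multiplier when the indicator is 1
print(predicted_probability(-1.0, [beta_non_resident], [0]))  # baseline case
print(predicted_probability(-1.0, [beta_non_resident], [1]))  # indicator switched on
```

An odds ratio above 1 means the variable is associated with higher odds of the outcome, and a value below 1 means lower odds, holding the other variables fixed.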

In summary, this analysis underscores the importance of understanding driver demographics and commuting patterns in specific communities where fatal crashes occur. The findings offer valuable insights for policymakers and program managers seeking to develop effective traffic safety initiatives and interventions, ultimately contributing to improved road safety in Washington.

Workforce Retention Analytics: Predicting Employee Attrition for Improved HR Management

The HR analytics project aimed to predict employee attrition, an issue of significant concern in the business world. More than 4.25 million people left their jobs in the U.S. in January 2022, and the cost of replacing an employee is estimated at 1.5 to 2 times their annual salary. The project underscores the importance of addressing attrition early to prevent long-lasting damage to organizations, citing contributors to employee burnout such as unfair treatment, unmanageable workloads, and lack of role clarity.

The project used HR Analytics data from Kaggle, comprising 14,999 observations and 10 features, including satisfaction levels, performance evaluations, and number of projects. The objective was to provide insights and predictions to help company leadership mitigate attrition.

Four machine learning models were implemented, with Random Forest achieving the highest accuracy, followed by Decision Tree, Logistic Regression, and Naive Bayes. Despite its accuracy, Random Forest showed signs of possible overfitting.
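One common way to check for the kind of overfitting mentioned above is to compare accuracy on the training data with accuracy on held-out data: a large gap suggests the model has memorized the training set. The sketch below assumes this approach. The function names, the 0.05 tolerance, and the example numbers are illustrative assumptions, not the project's results.

```python
def generalization_gap(train_accuracy, test_accuracy):
    """Difference between training and held-out accuracy (positive means worse on new data)."""
    return train_accuracy - test_accuracy


def looks_overfit(train_accuracy, test_accuracy, tolerance=0.05):
    """Flag a model whose training accuracy exceeds test accuracy by more than the tolerance."""
    return generalization_gap(train_accuracy, test_accuracy) > tolerance


# Made-up accuracies, for illustration only.
print(looks_overfit(train_accuracy=1.00, test_accuracy=0.90))  # large gap
print(looks_overfit(train_accuracy=0.80, test_accuracy=0.79))  # small gap
```

The tolerance is a judgment call. Cross-validation gives a more reliable picture than a single train/test split.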

In conclusion, this analysis quantifies the performance of the machine learning models and highlights the strengths and weaknesses of each, offering valuable insights for organizations to address and potentially reduce employee attrition.

Sports Analytics Database Design: Leveraging DDL, DML, and Tableau for Performance Optimization

In this data analytics project, we expertly employed Data Definition Language (DDL) and Data Manipulation Language (DML) techniques to efficiently extract, transform, and load data into a meticulously designed database. This approach significantly bolstered the organization's data analysis capabilities.

Our project centered on creating an extensive database for a fictional football team, covering all team players. We designed an ER diagram, implemented it in MS SQL Server using DDL with primary and foreign keys to ensure data integrity and consistency across tables, and then loaded the data.

A critical aspect of our project involved the formulation and execution of queries utilizing DML, effectively extracting pertinent data from the database to address pressing business challenges. To make the data more accessible and insightful, we harnessed Tableau to generate interactive dashboards. These dashboards provided a visual representation of the extracted data, allowing us to envision a practical scenario where this data-driven approach could significantly benefit football clubs in optimizing their team's performance through insightful data analysis.
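The DDL and DML workflow described above can be sketched with Python's built-in sqlite3 module. SQLite stands in here for MS SQL Server purely so the example is self-contained. The table names, columns, and rows are hypothetical and are not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# DDL: define tables with primary and foreign keys.
conn.executescript("""
CREATE TABLE player (
    player_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    position  TEXT NOT NULL
);
CREATE TABLE match_stat (
    stat_id   INTEGER PRIMARY KEY,
    player_id INTEGER NOT NULL REFERENCES player(player_id),
    goals     INTEGER NOT NULL
);
""")

# DML: load rows, then query them.
conn.executemany(
    "INSERT INTO player (player_id, name, position) VALUES (?, ?, ?)",
    [(1, "Player A", "Forward"), (2, "Player B", "Midfielder")],
)
conn.executemany(
    "INSERT INTO match_stat (player_id, goals) VALUES (?, ?)",
    [(1, 2), (1, 1), (2, 1)],
)
conn.commit()

rows = conn.execute("""
SELECT p.name, SUM(m.goals) AS total_goals
FROM player AS p
JOIN match_stat AS m ON m.player_id = p.player_id
GROUP BY p.player_id, p.name
ORDER BY total_goals DESC
""").fetchall()
print(rows)

# A row that points at a non-existent player is rejected by the foreign key.
try:
    conn.execute("INSERT INTO match_stat (player_id, goals) VALUES (?, ?)", (99, 1))
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
```

An aggregate query like this one is the kind of result set that could then feed a dashboard in Tableau.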

In summary, our project harnessed DDL and DML techniques along with SQL and Tableau to build a comprehensive database and interactive visualization tools, offering a data-centric solution that holds the potential to revolutionize football team management. This approach provides clubs with the means to make informed decisions and enhance their team's performance, ushering in a new era of data-driven sports management.

Reimagining Staycations in the Heart of NYC: A Data-Driven Apartment Rental Model

During an exciting hackathon, we had the unique opportunity to develop an innovative business model for a short-term apartment rental platform located in the vibrant heart of New York City. This exhilarating journey allowed us to channel our creative, technical, and strategic capabilities into crafting a groundbreaking business model that is set to redefine the future of staycations in the city that never sleeps.

Our approach involved a comprehensive analysis of market trends, user behaviors, and competitive landscapes to identify key opportunities. Leveraging data-driven tools such as Python, Machine Learning, and Tableau, we formulated a compelling pitch that not only fulfills customer needs but also aligns with the preferences of potential investors.

To enhance our insights, we incorporated data sources such as reviews per month, construction year, rental prices, and information from The New York Times, which provided data on crime rates and transportation connectivity for various neighborhoods. By aggregating this information, we calculated scores for each neighborhood, shedding light on their relative attractiveness.
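The neighborhood scoring described above can be illustrated with a simple weighted sum of rescaled features. The exact formula we used is not reproduced here, so the sketch assumes min-max scaling and equal weights. The feature names, neighborhood labels, and values are placeholders.

```python
def min_max(values):
    """Rescale values to the 0-1 range; a constant column maps to 0.5."""
    low, high = min(values), max(values)
    if high == low:
        return [0.5 for _ in values]
    return [(v - low) / (high - low) for v in values]


def neighborhood_scores(rows, weights, lower_is_better=()):
    """Weighted average of rescaled features, flipping those where lower is better."""
    names = [r["name"] for r in rows]
    totals = [0.0] * len(rows)
    for feature, weight in weights.items():
        scaled = min_max([r[feature] for r in rows])
        if feature in lower_is_better:
            scaled = [1.0 - s for s in scaled]
        totals = [t + weight * s for t, s in zip(totals, scaled)]
    weight_sum = sum(weights.values())
    return {n: t / weight_sum for n, t in zip(names, totals)}


# Placeholder data, for illustration only.
toy_rows = [
    {"name": "Neighborhood A", "reviews_per_month": 4.0, "crime_rate": 10.0, "transit_score": 9.0},
    {"name": "Neighborhood B", "reviews_per_month": 2.0, "crime_rate": 30.0, "transit_score": 5.0},
    {"name": "Neighborhood C", "reviews_per_month": 3.0, "crime_rate": 20.0, "transit_score": 7.0},
]
equal_weights = {"reviews_per_month": 1.0, "crime_rate": 1.0, "transit_score": 1.0}
scores = neighborhood_scores(toy_rows, equal_weights, lower_is_better=("crime_rate",))
top_neighborhood = max(scores, key=scores.get)
print(scores, top_neighborhood)
```

Rescaling first keeps a feature with large raw values, such as rental price, from dominating the score. The weights then encode how much each factor matters to the business case.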

While our methodology showcased the top neighborhoods in Manhattan, it's important to acknowledge its limitations. Our approach did not account for dynamic factors like market trends, seasonal fluctuations, or occupancy rates, among other considerations.

Moreover, we harnessed the power of neural networks to explore the potential of real estate analysis. Neural networks offer promise due to their ability to learn intricate patterns within vast datasets. These networks can be trained on a wide array of input features, including location, property size, room count, building age, and historical sales and rental prices. By integrating neural networks into our approach, we aim to enhance our capacity to predict market trends accurately, identify lucrative investment opportunities, and make informed decisions within the dynamic real estate landscape.

As the culmination of our efforts, we identified three neighborhoods and proposed innovative strategies to attract both tourists and business travelers, ensuring a modern and appealing experience for all.

The Cloudy Story: Assessing the Risks of Amazon Web Services

I had the honor of representing the University of Maryland at the prestigious International Business Ethics and Sustainability Case Competition held at Loyola Marymount University in Los Angeles, CA. Our team tackled a significant case focusing on Amazon Web Services (AWS), the largest cloud computing company, which currently holds a 34% global market share among cloud service providers. Our case emphasized the environmental impact of energy production for data centers and the responsible disposal of electronic waste generated by its operations.

Key highlights:

● Advocated for advancing sustainability and responsible consumption within the cloud computing industry. Proposed cleaner energy solutions and responsible practices to mitigate the environmental impact of data center energy consumption and electronic waste disposal.

● Recommended increasing AWS's utilization of renewable energy sources and enhancing transparency in reporting renewable energy adoption to the public. These measures would demonstrate a strong commitment to reducing carbon emissions and promoting a greener cloud computing infrastructure.

● Proposed the establishment of an internal recycling center dedicated to reusing and reprocessing electronic waste generated by AWS's data centers. This initiative would contribute to minimizing electronic waste and promoting a circular economy within the cloud computing industry.

● Advocated for partnerships with hardware manufacturers to drive the development of eco-friendly hardware specifically designed for cloud computing. This collaboration would introduce sustainable practices at the hardware level, further enhancing AWS's commitment to long-term environmental sustainability.
