Vid-20220607-wa0021mp4 Online

0021 (The 21st media file saved on that specific day) How to Programmatically Create Features

If you are building a machine learning model or an organized archive, you should break this string into multiple sub-features. Feature Name Media_Type Extracted from the VID prefix Timestamp 2022-06-07 Parsed from YYYYMMDD format Year Useful for long-term trend analysis Month Useful for seasonal patterns Day_of_Week Derived from the date (2022-06-07 was a Tuesday) Source_App Identified by the WA string Daily_Index The unique counter for that day 2. Python Implementation (High-Speed Extraction) VID-20220607-WA0021mp4

import re from datetime import datetime filename = "VID-20220607-WA0021.mp4" # Fast extraction using Regex match = re.search(r"VID-(\d{4})(\d{2})(\d{2})-WA(\d+)", filename) if match: year, month, day, seq = match.groups() date_obj = datetime(int(year), int(month), int(day)) feature_set = { "is_video": True, "date": date_obj.strftime('%Y-%m-%d'), "day_name": date_obj.strftime('%A'), "whatsapp_sequence": int(seq) } print(feature_set) Use code with caution. 0021 (The 21st media file saved on that

In data science, keeping the raw filename is "noisy." By converting it into (e.g., "Tuesday") and numerical (e.g., sequence "21") values, you allow an algorithm to understand that a video sent on a weekend might be different from one sent during work hours, or that the 21st video of the day implies a high-activity period. In data science, keeping the raw filename is "noisy