Vid-20220607-wa0021mp4 Online
0021 (The 21st media file saved on that specific day) How to Programmatically Create Features
If you are building a machine learning model or an organized archive, you should break this string into multiple sub-features. Feature Name Media_Type Extracted from the VID prefix Timestamp 2022-06-07 Parsed from YYYYMMDD format Year Useful for long-term trend analysis Month Useful for seasonal patterns Day_of_Week Derived from the date (2022-06-07 was a Tuesday) Source_App Identified by the WA string Daily_Index The unique counter for that day 2. Python Implementation (High-Speed Extraction) VID-20220607-WA0021mp4
import re from datetime import datetime filename = "VID-20220607-WA0021.mp4" # Fast extraction using Regex match = re.search(r"VID-(\d{4})(\d{2})(\d{2})-WA(\d+)", filename) if match: year, month, day, seq = match.groups() date_obj = datetime(int(year), int(month), int(day)) feature_set = { "is_video": True, "date": date_obj.strftime('%Y-%m-%d'), "day_name": date_obj.strftime('%A'), "whatsapp_sequence": int(seq) } print(feature_set) Use code with caution. 0021 (The 21st media file saved on that
In data science, keeping the raw filename is "noisy." By converting it into (e.g., "Tuesday") and numerical (e.g., sequence "21") values, you allow an algorithm to understand that a video sent on a weekend might be different from one sent during work hours, or that the 21st video of the day implies a high-activity period. In data science, keeping the raw filename is "noisy
