Output Specifications

All of the analytical data is output in a single .json file. Certain datapoints exist regardless of the platform the VOD is from, while others are specific to the platform.

Common fields:

Chat Analytics Data

The Chat Analytics object is directly transformed into JSON data.

class chat_analyzer.dataformat.ChatAnalytics(duration: float, interval: int, description: str, program_version: str, platform: str, duration_text: str = '', interval_text: str = '', mediaTitle: str = 'No Media Title', mediaSource: str = 'No Media Source', samples: List[Sample] = <factory>, totalActivity: int = 0, totalChatMessages: int = 0, totalUniqueUsers: int = 0, overallAvgActivityPerSecond: float = 0, overallAvgChatMessagesPerSecond: float = 0, overallAvgUniqueUsersPerSecond: float = 0, highlights: List[Highlight] = <factory>, highlights_duration: float = 0, highlights_duration_text: str = '', highlight_percentile: float = 0, highlight_metric: str = '', spikes: List[Spike] = <factory>, _overallUserChats: dict = <factory>, _currentSample: Optional[Sample] = None)

Bases: ABC

Class that contains the results of the chat data analysis/processing.

An instance of a subclass is created and then modified throughout the analysis process. After the processing of the data is complete, the object will contain all relevant results we are looking for.

This class cannot be directly instantiated; see the subclasses YoutubeChatAnalytics & TwitchChatAnalytics. YouTube and Twitch chats report/record data differently and contain site-specific events, so we centralize common data/functionality and separate specifics into subclasses.

The object can then be converted to JSON/printed/manipulated as desired to format/output the results as necessary.
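The base-class/subclass split described above can be sketched roughly as follows. This is only an illustration: the class names, the abstract process_message, and the "kind" message key are assumptions made for the sketch, not the actual chat_analyzer code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class BaseAnalytics(ABC):
    """Hypothetical stand-in for ChatAnalytics: shared fields live here."""
    platform: str
    totalActivity: int = 0

    @abstractmethod
    def process_message(self, msg: dict) -> None:
        """Each platform decides how a message updates its statistics."""


@dataclass
class TwitchLikeAnalytics(BaseAnalytics):
    """Hypothetical stand-in for TwitchChatAnalytics."""
    totalSubscriptions: int = 0

    def process_message(self, msg: dict) -> None:
        self.totalActivity += 1
        # "kind" is an invented key for this sketch, not a real message field.
        if msg.get("kind") == "subscription":
            self.totalSubscriptions += 1


twitch = TwitchLikeAnalytics(platform="www.twitch.tv")
twitch.process_message({"kind": "subscription"})
twitch.process_message({"kind": "chat"})

# The abstract base refuses to be instantiated directly.
try:
    BaseAnalytics(platform="www.twitch.tv")
    base_is_abstract = False
except TypeError:
    base_is_abstract = True

print(twitch.totalActivity, twitch.totalSubscriptions, base_is_abstract)
```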

—

[Defined when class Initialized]:

duration: float: The total duration (in seconds) of the associated video/media. Message times correspond to the video times.
interval: int: The time interval (in seconds) at which to compress datapoints into samples, i.e. the duration of each sample. The smaller the interval, the more granular the analytics are. At interval=5, each sample contains 5 seconds of cumulative data, with the exception of the last sample, which may be shorter than the interval because the media duration is not necessarily divisible by the interval. The number of samples is about (video duration/interval), plus 1 if necessary to encompass the remaining non-divisible data at the end.
description: str: A description included to help distinguish it from other analytical data.
program_version: str: The version of the chat analytics program that was used to generate the data. Helps identify outdated/version-specific data formats.
platform: str: Used to store the platform the data came from: ‘www.youtube.com’, ‘www.twitch.tv’, ‘youtu.be’… While it technically can be determined by the type of subclass, this makes for easier conversion to JSON/output
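As a rough illustration of the interval arithmetic described for the interval field, the sketch below computes the sample boundaries for a given duration. The helper name sample_bounds is hypothetical, and the real program may count samples slightly differently.

```python
import math
from typing import List, Tuple


def sample_bounds(duration: float, interval: int) -> List[Tuple[float, float]]:
    """Return (startTime, endTime) pairs covering [0, duration).

    Hypothetical helper: every sample spans `interval` seconds except
    possibly the last one, which is cut off at `duration`.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    count = math.ceil(duration / interval)
    return [(i * interval, min((i + 1) * interval, duration)) for i in range(count)]


bounds = sample_bounds(7386.016, 5)
print(len(bounds))   # number of samples
print(bounds[-1])    # the final, shorter sample
```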

[Automatically re-defined on post-init]:

duration_text: str: String representation of the media duration time.
interval_text: str: String representation of the interval time.
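The text representations could be produced by something like the sketch below. The format is inferred from the example output at the end of this page ("0:05", "9:00", "2:03:06"); the function name seconds_to_text is hypothetical and the actual formatting code may differ.

```python
def seconds_to_text(seconds: float) -> str:
    """Format seconds as m:ss, or h:mm:ss once an hour is reached.

    Fractions of a second are dropped (an assumption of this sketch).
    """
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


print(seconds_to_text(7386.016))
print(seconds_to_text(5))
```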

[Defined w/ default and modified DURING analysis]:

mediaTitle: str: The title of the media associated with the chatlog.
mediaSource: str: The link to the media associated with the chatlog (URL that it was originally downloaded from, or the filepath of a chatfile).
samples: List[Sample]: An array of sequential samples, each corresponding to data about a section of chat ‘interval’ seconds long. Each sample has specific data corresponding to a time interval of the video. See the ‘Sample’ class.
totalActivity: int: The total number of messages/things (of any type!) that appeared in chat (the sum of activity from all samples). Includes messages, notifications, subscriptions, superchats, … anything that appeared in chat.
totalChatMessages: int: The total number of chats sent by human (non-system) users (what is traditionally thought of as a chat) NOTE: Difficult to discern bots from humans other than just creating a known list of popular bots and blacklisting, because not all sites (YT/Twitch) provide information on whether chat was sent by a registered bot or not.
highlight_percentile: float: The cutoff percentile that samples must meet to be considered a highlight
highlight_metric: str: The metric to use for engagement analysis to build highlights. NOTE: must be converted into actual Sample field name before use.

[Defined w/ default and modified AFTER analysis]:

totalUniqueUsers: int: The total number of unique users that sent a chat message (human users that sent at least one traditional chat)
overallAvgActivityPerSecond: float: The average activity per second across the whole chatlog. (totalActivity/totalDuration)
overallAvgChatMessagesPerSecond: float: The average number of chat messages per second across the whole chatlog. (totalChatMessages/totalDuration)
overallAvgUniqueUsersPerSecond: float: The average number of unique users chatting per second.
highlights: List[Highlight]: A list of the high engagement sections of the chatlog.
highlights_duration: float: The cumulative duration of the highlights (in seconds)
highlights_duration_text: str: The cumulative duration of the highlights represented in text format (i.e. hh:mm:ss)
spikes: List[Spike]: TODO: not yet implemented. A list of the calculated spikes in the chatlog. May contain spikes of different types, identifiable by the spike’s type field.

chatlog_post_process(settings: ProcessSettings)

After we have finished iterating through the chatlog and constructing all of the samples, we call chatlog_post_process() to process the cumulative data points (so we don’t have to do this every time we add a sample).

This step is sometimes referred to as “analysis”.

Also removes the internal fields that don’t need to be output in the JSON object.

Parameters: settings (ProcessSettings) – Utility class for passing information from the analyzer to the chatlog processor and post-processor
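A minimal sketch of this kind of post-processing step is shown below, using a cut-down stand-in class. The assumptions are labeled in the comments: the unique-user count is taken from the size of the internal dictionary, and the internal field is emptied rather than deleted (the example output on this page shows such fields present but empty).

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MiniAnalytics:
    """Simplified, hypothetical stand-in for ChatAnalytics."""
    duration: float
    totalActivity: int = 0
    totalChatMessages: int = 0
    totalUniqueUsers: int = 0
    overallAvgActivityPerSecond: float = 0
    overallAvgChatMessagesPerSecond: float = 0
    _overallUserChats: dict = field(default_factory=dict)

    def post_process(self) -> None:
        # Derive the cumulative values once, after every sample exists.
        # Assumption: one dictionary entry per user who chatted.
        self.totalUniqueUsers = len(self._overallUserChats)
        self.overallAvgActivityPerSecond = self.totalActivity / self.duration
        self.overallAvgChatMessagesPerSecond = self.totalChatMessages / self.duration
        # Assumption: internal bookkeeping is emptied, not deleted.
        self._overallUserChats = {}


analytics = MiniAnalytics(
    duration=100.0,
    totalActivity=250,
    totalChatMessages=200,
    _overallUserChats={"alice": 120, "bob": 80},
)
analytics.post_process()
print(json.dumps(asdict(analytics), indent=4))
```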

create_new_sample()

Post-processes the previous sample, then appends & creates a new sample following the previous sample sequentially. If a previous sample doesn’t exist, creates the first sample.

NOTE: If there are only 2 chats, one at time 0:03 and the other at 5:09:12, there are still a lot of empty samples in between (because we still want to graph/track the silence times with temporal stability).
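The effect described in the note can be reproduced with a small sketch that buckets message times into fixed-width samples, keeping the empty ones. The function name and the dictionary layout are hypothetical; the real code builds Sample objects incrementally instead.

```python
from typing import Dict, List


def bucket_message_times(times: List[float], duration: float, interval: int) -> List[Dict]:
    """Count messages per sample, keeping samples that received no messages."""
    samples = []
    start = 0.0
    while start < duration:
        end = min(start + interval, duration)
        samples.append({"startTime": start, "endTime": end, "activity": 0})
        start = end
    for t in times:
        if 0 <= t < duration:
            # Half-open ranges [startTime, endTime): t == endTime goes to the next sample.
            samples[int(t // interval)]["activity"] += 1
    return samples


# Two chats, at 0:03 and 5:09:12 (18552 seconds), in a 5:10:00 video.
samples = bucket_message_times([3, 18552], duration=18600, interval=5)
empty = sum(1 for s in samples if s["activity"] == 0)
print(len(samples), "samples,", empty, "of them empty")
```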

get_highlights(highlight_metric: str, highlight_percentile: float)

Highlights reference a contiguous period of time where the provided metric remains above the percentile threshold. Finds and returns a list of highlights, each referencing the start and end times of a contiguous run of samples whose highlight_metric falls within the highlight_percentile.

A highlight may reference more than one sample if contiguous samples meet the percentile cutoff.

Samples in the top ‘percentile’% of the selected engagement metric will be considered high-engagement samples and included in the highlights output list. The larger the percentile, the greater the metric requirement before being reported. If ‘engagement-percentile’=93.0, any sample in the 93rd percentile (top 7.0%) of the selected metric will be considered an engagement highlight.

These high-engagement portions of the chatlog are stored as highlights, and may last for multiple samples.

This method should only be called after the averages have been calculated, ensuring accurate results when determining periods of high engagement.

Parameters

highlight_metric – The metric by which samples are compared to determine if they are high-engagement samples. NOTE: Internally converted to the actual field name of a sample field.
highlight_percentile – The cutoff percentile that the samples must meet to be included in a highlight

Returns

a list of highlights referencing samples that met the percentile cutoff requirements for the provided metric

Return type

List[Highlight]
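The percentile cutoff and the merging of contiguous samples can be sketched as follows. This is an illustration under assumptions: the percentile is computed by linear interpolation, samples are plain dictionaries, and the function names are hypothetical. The actual get_highlights may compute the cutoff differently.

```python
from typing import Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Linear-interpolation percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def find_highlights(samples: List[Dict], metric: str, pct: float) -> List[Dict]:
    """Merge runs of adjacent samples whose metric is at or above the cutoff."""
    cutoff = percentile([s[metric] for s in samples], pct)
    highlights, run = [], []
    for s in samples + [None]:  # the trailing None flushes the final run
        if s is not None and s[metric] >= cutoff:
            run.append(s)
            continue
        if run:
            vals = [r[metric] for r in run]
            highlights.append({
                "startTime": run[0]["startTime"],
                "endTime": run[-1]["endTime"],
                "type": metric,
                "peak": max(vals),
                "avg": sum(vals) / len(vals),
            })
            run = []
    return highlights


values = [1, 1, 1, 9, 8, 1, 1, 1, 1, 7]
demo_samples = [
    {"startTime": i * 5, "endTime": (i + 1) * 5, "avgActivityPerSecond": v}
    for i, v in enumerate(values)
]
found = find_highlights(demo_samples, "avgActivityPerSecond", 70)
for h in found:
    print(h)
```

Here the two adjacent busy samples are merged into one highlight, while the isolated busy sample at the end becomes its own highlight.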

get_spikes(spike_sensitivity, spike_metric)

A spike is a point in the chatlog where from one sample to the next, there is a sharp increase in the provided metric.

Open question: are spikes sustained or..? An alternative definition: a spike is a point in the chatlog where the activity is significantly different from the average activity. Activity is significantly different if it is > avg*SPIKE_MULT_THRESHOLD. We detect a spike if the high activity level is maintained for at least SPIKE_SUSTAIN_REQUIREMENT samples.

print_process_progress(msg, idx, finished=False)

Prints progress of the chat download/process to the console.

If finished is true, normal printing is skipped and the last bar of progress is printed. This is important because we print progress every UPDATE_PROGRESS_INTERVAL messages, and the total number of messages is not usually divisible by this. We therefore have to slightly change the approach to printing progress for this special case.
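The special case for the final update can be illustrated with a short sketch. The value of UPDATE_PROGRESS_INTERVAL and the message wording below are assumptions made for the example, and the lines are collected in a list rather than printed as a progress bar.

```python
from typing import List

UPDATE_PROGRESS_INTERVAL = 1000  # assumed value, for illustration only


def progress_lines(total_messages: int, every: int = UPDATE_PROGRESS_INTERVAL) -> List[str]:
    """Collect the progress lines that would be reported for a chatlog."""
    lines = []
    for idx in range(1, total_messages + 1):
        if idx % every == 0:
            lines.append(f"processed {idx} messages")
    # The final count is rarely a multiple of `every`, so report it explicitly.
    if total_messages % every != 0:
        lines.append(f"processed {total_messages} messages (done)")
    return lines


print("\n".join(progress_lines(2500)))
```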

process_chatlog(chatlog: Chat, source: str, settings: ProcessSettings)

Iterates through the whole chatlog and calculates the analytical data (Modifies and stores in a ChatAnalytics object).

Parameters

chatlog (chat_downloader.sites.common.Chat) – The chatlog we have downloaded
source (str) – The source of the media associated with the chatlog. URL of the media we have downloaded the log from, or a filepath
settings (ProcessSettings) – Utility class for passing information from the analyzer to the chatlog processor and post-processor

process_message(msg): Given a msg object from chat, update appropriate statistics based on the chat

Sample Data

The main JSON data contains a samples field consisting of a list of Sample objects.

class chat_analyzer.dataformat.Sample(startTime: float, endTime: float, sampleDuration: float = -1, startTime_text: str = '', endTime_text: str = '', activity: int = 0, chatMessages: int = 0, firstTimeChatters: int = 0, uniqueUsers: int = 0, avgActivityPerSecond: float = 0, avgChatMessagesPerSecond: float = 0, avgUniqueUsersPerSecond: float = 0, _userChats: dict = <factory>)

Bases: object

Class that contains data of a specific time interval of the chat. Messages will be included in a sample if they are contained within [startTime, endTime)

—

[Defined when class Initialized]:

startTime: float: The start time (inclusive) (in seconds) corresponding to a sample.
endTime: float: The end time (exclusive) (in seconds) corresponding to a sample.

[Automatically Defined on init]:

startTime_text: str: The start time represented in text format (i.e. hh:mm:ss)
endTime_text: str: The end time represented in text format (i.e. hh:mm:ss)
sampleDuration: float: The duration (in seconds) of the sample (end-start). NOTE: Should equal the selected interval for every sample except possibly the last, which is shorter if the total duration of the chat is not divisible by the interval.

[Defined w/ default and modified DURING analysis of sample]:

activity: int: The total number of messages/things (of any type!) that appeared in chat within the start/endTime of this sample. Includes messages, notifications, subscriptions, superchats, … anything that appeared in chat.
chatMessages: int: The total number of chats sent by human (non-system) users (what is traditionally thought of as a chat) NOTE: Difficult to discern bots from humans other than just creating a known list of popular bots and blacklisting, because not all sites (YT/Twitch) provide information on whether chat was sent by a registered bot or not.
firstTimeChatters: int: The total number of users who sent their first message of the whole stream during this sample interval

[Defined w/ default and modified AFTER analysis of sample]:

uniqueUsers: int: The total number of unique users that sent a chat message across this sample interval (len(self._userChats))
avgActivityPerSecond: float: The average activity per second across this sample interval. (activity/sampleDuration)
avgChatMessagesPerSecond: float: The average number of chat messages per second across this sample interval. (chatMessages/sampleDuration)
avgUniqueUsersPerSecond: float: The average number of unique users that sent a chat across this sample interval. (uniqueUsers/sampleDuration)

sample_post_process()

After we have finished adding messages to a particular sample (moving on to the next sample), we call sample_post_process() to process the cumulative data points (so we don’t have to do this every time we add a message)

Also removes the internal fields that don’t need to be output in the JSON object.
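The per-sample formulas listed above can be put together in a small sketch. MiniSample is a hypothetical, cut-down stand-in for Sample, and emptying _userChats (rather than deleting it) is an assumption based on the example output on this page.

```python
from dataclasses import dataclass, field


@dataclass
class MiniSample:
    """Simplified, hypothetical stand-in for Sample."""
    startTime: float
    endTime: float
    sampleDuration: float = -1
    activity: int = 0
    chatMessages: int = 0
    uniqueUsers: int = 0
    avgActivityPerSecond: float = 0
    avgChatMessagesPerSecond: float = 0
    avgUniqueUsersPerSecond: float = 0
    _userChats: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Derived on init: end - start.
        self.sampleDuration = self.endTime - self.startTime

    def add_chat(self, author: str) -> None:
        # Cheap counters only; averages wait until the sample is complete.
        self.activity += 1
        self.chatMessages += 1
        self._userChats[author] = self._userChats.get(author, 0) + 1

    def post_process(self) -> None:
        self.uniqueUsers = len(self._userChats)
        self.avgActivityPerSecond = self.activity / self.sampleDuration
        self.avgChatMessagesPerSecond = self.chatMessages / self.sampleDuration
        self.avgUniqueUsersPerSecond = self.uniqueUsers / self.sampleDuration
        # Assumption: the internal dictionary is emptied, not deleted.
        self._userChats = {}


sample = MiniSample(startTime=0, endTime=5)
for author in ["alice", "bob", "alice"]:
    sample.add_chat(author)
sample.post_process()
print(sample)
```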

Highlight Data

The main JSON data contains a highlights field consisting of a list of Highlight objects. Currently, there are no platform-specific fields corresponding to Highlights (i.e. highlight objects look the same for all platforms).

class chat_analyzer.dataformat.Highlight(startTime: float, endTime: float, description: str, type: str, peak: float, avg: float)

Bases: Section

Highlights reference a contiguous period of time where the provided metric remains above the percentile threshold.

—

type: str: The engagement metric. i.e. “avgActivityPerSecond”, “avgChatMessagesPerSecond”, “avgUniqueUsersPerSecond”, etc. NOTE: It is stored as its converted value (the name of the actual field), NOT the metric str the user provided in the CLI.
peak: float: The maximum value of the engagement metric throughout the whole Highlight (among the samples in the Highlight).
avg: float: The average value of the engagement metric throughout the whole Highlight (among the samples in the Highlight).

Twitch-specific fields:

Chat Analytics Data (Twitch)

class chat_analyzer.dataformat.TwitchChatAnalytics(duration: float, interval: int, description: str, program_version: str, platform: str, duration_text: str = '', interval_text: str = '', mediaTitle: str = 'No Media Title', mediaSource: str = 'No Media Source', samples: List[Sample] = <factory>, totalActivity: int = 0, totalChatMessages: int = 0, totalUniqueUsers: int = 0, overallAvgActivityPerSecond: float = 0, overallAvgChatMessagesPerSecond: float = 0, overallAvgUniqueUsersPerSecond: float = 0, highlights: List[Highlight] = <factory>, highlights_duration: float = 0, highlights_duration_text: str = '', highlight_percentile: float = 0, highlight_metric: str = '', spikes: List[Spike] = <factory>, _overallUserChats: dict = <factory>, _currentSample: Optional[Sample] = None, totalSubscriptions: int = 0, totalGiftSubscriptions: int = 0, totalUpgradeSubscriptions: int = 0)

Bases: ChatAnalytics

Extension of the ChatAnalytics class, meant to contain data that all chats have and data specific to Twitch chats.

NOTE: For most Twitch-specific attributes, it doesn’t make much sense to continuously report a per-second value, so we don’t!

—

(See ChatAnalytics class for common fields)

[Defined w/ default and modified DURING analysis]:

totalSubscriptions: int: The total number of subscriptions that appeared in the chat (which people purchased themselves).
totalGiftSubscriptions: int: The total number of gift subscriptions that appeared in the chat.
totalUpgradeSubscriptions: int: The total number of upgraded subscriptions that appeared in the chat.

chatlog_post_process(settings)

After we have finished iterating through the chatlog and constructing all of the samples, we call chatlog_post_process() to process the cumulative data points (so we don’t have to do this every time we add a sample).

This step is sometimes referred to as “analysis”.

Also removes the internal fields that don’t need to be output in the JSON object.

Parameters: settings (ProcessSettings) – Utility class for passing information from the analyzer to the chatlog processor and post-processor

process_message(msg): Given a msg object from chat, update common fields and twitch-specific fields

Sample Data (Twitch)

class chat_analyzer.dataformat.TwitchSample(startTime: float, endTime: float, sampleDuration: float = -1, startTime_text: str = '', endTime_text: str = '', activity: int = 0, chatMessages: int = 0, firstTimeChatters: int = 0, uniqueUsers: int = 0, avgActivityPerSecond: float = 0, avgChatMessagesPerSecond: float = 0, avgUniqueUsersPerSecond: float = 0, _userChats: dict = <factory>, subscriptions: int = 0, giftSubscriptions: int = 0, upgradeSubscriptions: int = 0)

Bases: Sample

Class that contains data specific to Twitch of a specific time interval of the chat.

—

[Defined w/ default and modified DURING analysis of sample]:

subscriptions: int: The total number of subscriptions (that people purchased themselves) that appeared in chat within the start/endTime of this sample.
giftSubscriptions: int: The total number of gift subscriptions that appeared in chat within the start/endTime of this sample.
upgradeSubscriptions: int: The total number of upgraded subscriptions that appeared in chat within the start/endTime of this sample.

YouTube-specific fields:

Chat Analytics Data (YouTube)

class chat_analyzer.dataformat.YoutubeChatAnalytics(duration: float, interval: int, description: str, program_version: str, platform: str, duration_text: str = '', interval_text: str = '', mediaTitle: str = 'No Media Title', mediaSource: str = 'No Media Source', samples: List[Sample] = <factory>, totalActivity: int = 0, totalChatMessages: int = 0, totalUniqueUsers: int = 0, overallAvgActivityPerSecond: float = 0, overallAvgChatMessagesPerSecond: float = 0, overallAvgUniqueUsersPerSecond: float = 0, highlights: List[Highlight] = <factory>, highlights_duration: float = 0, highlights_duration_text: str = '', highlight_percentile: float = 0, highlight_metric: str = '', spikes: List[Spike] = <factory>, _overallUserChats: dict = <factory>, _currentSample: Optional[Sample] = None, totalSuperchats: int = 0, totalMemberships: int = 0)

Bases: ChatAnalytics

Extension of the ChatAnalytics class, meant to contain data that all chats have and data specific to YouTube chats.

NOTE: For most YouTube-specific attributes, it doesn’t make much sense to continuously report a per-second value, so we don’t!

—

(See ChatAnalytics class for common fields and descriptions)

[Defined w/ default and modified DURING analysis]:

totalSuperchats: int: The total number of superchats (regular/ticker) that appeared in the chat. NOTE: A creator doesn’t necessarily care what form a superchat takes, so we just combine regular and ticker superchats.
totalMemberships: int: The total number of memberships that appeared in the chat.

process_message(msg): Given a msg object from chat, update common fields and youtube-specific fields

Sample Data (YouTube)

Example JSON output:

An output JSON file might look something like the following. Note that only generic fields are shown. Platform-specific fields would be included in their respective sections: the main analytics data in the main body of the JSON, and the sample data within each sample.

{
    "duration": 7386.016,
    "interval": 5,
    "description": "description ",
    "program_version": "1.0.0b5",
    "platform": "www.....com",
    "duration_text": "2:03:06",
    "interval_text": "0:05",
    "mediaTitle": "The title of the VOD",
    "mediaSource": "https://www...",
    "samples": [
        {
            "startTime": 0,
            "endTime": 5,
            "sampleDuration": 5,
            "startTime_text": "0:00",
            "endTime_text": "0:05",
            "activity": 10,
            "chatMessages": 9,
            "firstTimeChatters": 9,
            "uniqueUsers": 9,
            "avgActivityPerSecond": 2.0,
            "avgChatMessagesPerSecond": 1.8,
            "avgUniqueUsersPerSecond": 1.8,
            "_userChats": {}
        },
        ...
    ],
    "totalActivity": 42547,
    "totalChatMessages": 42034,
    "totalUniqueUsers": 12533,
    "overallAvgActivityPerSecond": 5.760480345561126,
    "overallAvgChatMessagesPerSecond": 5.691024768968819,
    "overallAvgUniqueUsersPerSecond": 5.66955345060893,
    "highlights": [
        {
            "startTime": 4405,
            "endTime": 4420,
            "description": "avgUniqueUsersPerSecond sustained at or above [8.6]",
            "type": "avgUniqueUsersPerSecond",
            "peak": 11.2,
            "avg": 9.866666666666665,
            "duration": 15,
            "duration_text": "0:15",
            "startTime_text": "1:13:25",
            "endTime_text": "1:13:40"
        },
        ...
    ],
    "highlights_duration": 540,
    "highlights_duration_text": "9:00",
    "highlight_percentile": 93.0,
    "highlight_metric": "usersPSec",
    "spikes": [],
    "_overallUserChats": {},
    "_currentSample": null
}
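Since the output is plain JSON, it can be read back with the standard json module. The sketch below uses a trimmed, made-up document in place of a real output file; the field names are the ones documented above, while the values are invented for the example.

```python
import json

# Trimmed, hypothetical stand-in for the contents of an output file.
raw = """
{
    "duration": 20.0,
    "interval": 5,
    "samples": [
        {"startTime": 0, "endTime": 5, "avgActivityPerSecond": 2.0},
        {"startTime": 5, "endTime": 10, "avgActivityPerSecond": 0.4}
    ],
    "highlights": [
        {"startTime_text": "0:00", "endTime_text": "0:05",
         "type": "avgActivityPerSecond", "peak": 2.0, "avg": 2.0}
    ]
}
"""

# With a real file, json.load() on the opened file would be used instead.
data = json.loads(raw)

highlight_lines = [
    f'{h["startTime_text"]}-{h["endTime_text"]}: {h["type"]} peaked at {h["peak"]}'
    for h in data["highlights"]
]
busiest = max(data["samples"], key=lambda s: s["avgActivityPerSecond"])

print("\n".join(highlight_lines))
print("Busiest sample starts at", busiest["startTime"], "seconds")
```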