spam filter
[2019-1-14, 2018-11-5]
Table: user_actions
ds (STRING) | user_id (BIGINT) | post_id (BIGINT) | action (STRING) | extra (STRING)
'2018-07-01'| 1209283021 | 329482048384792 | 'view' |
'2018-07-01'| 1209283021 | 329482048384792 | 'like' |
'2018-07-01'| 1938409273 | 349573908750923 | 'reaction' | 'LOVE'
'2018-07-01'| 1209283021 | 329482048384792 | 'comment' | 'Such nice Raybans'
'2018-07-01'| 1238472931 | 329482048384792 | 'report' | 'SPAM'
'2018-07-01'| 1298349287 | 328472938472087 | 'report' | 'NUDITY'
'2018-07-01'| 1238712388 | 329482048384792 | 'reshare' | 'I wanted to share with you all'
Table: reviewer_removals
ds (STRING) | reviewer_id (BIGINT) | post_id (BIGINT) |
'2018-07-01'| 3894729384729078 | 329482048384792 |
'2018-07-01'| 8477594743909585 | 388573002873499 |
Q1: How many posts were reported yesterday for each report reason?
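One possible way to answer Q1, sketched in Python with an in-memory SQLite copy of a few sample rows. The notes do not record an answer, so the query is an assumption: it counts distinct posts per reason, and "yesterday" is pinned to the sample date so the result is reproducible.

```python
import sqlite3

# In-memory copy of a few user_actions rows from the sample table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_actions "
    "(ds TEXT, user_id INTEGER, post_id INTEGER, action TEXT, extra TEXT)"
)
conn.executemany(
    "INSERT INTO user_actions VALUES (?, ?, ?, ?, ?)",
    [
        ("2018-07-01", 1209283021, 329482048384792, "view", None),
        ("2018-07-01", 1238472931, 329482048384792, "report", "SPAM"),
        ("2018-07-01", 1298349287, 328472938472087, "report", "NUDITY"),
    ],
)

# Assumption: "yesterday" arrives as a ds string; pinned to the sample date here.
yesterday = "2018-07-01"

# COUNT(DISTINCT post_id) so a post reported several times for the same
# reason is counted once.
reports_by_reason = conn.execute(
    """
    SELECT extra AS reason, COUNT(DISTINCT post_id) AS n_posts
    FROM user_actions
    WHERE action = 'report' AND ds = ?
    GROUP BY extra
    ORDER BY reason
    """,
    (yesterday,),
).fetchall()
print(reports_by_reason)
```

If the interviewer wants the number of reports rather than the number of reported posts, COUNT(*) would replace COUNT(DISTINCT post_id).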
Q2: What percent of the content that users view on Facebook each day is actually spam?
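A sketch for Q2 under a stated assumption: a post counts as "actually spam" when it was reported with reason 'SPAM' and also appears in reviewer_removals. The notes do not define spam, so this definition and the small integer ids below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_actions
        (ds TEXT, user_id INTEGER, post_id INTEGER, action TEXT, extra TEXT);
    CREATE TABLE reviewer_removals
        (ds TEXT, reviewer_id INTEGER, post_id INTEGER);
    """
)
# Hypothetical rows: four views, two of them on post 100, which was
# reported as SPAM and then removed by a reviewer.
conn.executemany(
    "INSERT INTO user_actions VALUES (?, ?, ?, ?, ?)",
    [
        ("2018-07-01", 1, 100, "view", None),
        ("2018-07-01", 2, 100, "view", None),
        ("2018-07-01", 1, 200, "view", None),
        ("2018-07-01", 3, 300, "view", None),
        ("2018-07-01", 3, 100, "report", "SPAM"),
    ],
)
conn.execute("INSERT INTO reviewer_removals VALUES ('2018-07-01', 9, 100)")

pct_spam_by_day = conn.execute(
    """
    WITH spam_posts AS (
        -- Assumed definition of spam: removed AND reported as SPAM.
        SELECT DISTINCT r.post_id
        FROM reviewer_removals r
        JOIN user_actions a ON a.post_id = r.post_id
        WHERE a.action = 'report' AND a.extra = 'SPAM'
    )
    SELECT v.ds,
           -- COUNT(s.post_id) counts only views that matched a spam post.
           100.0 * COUNT(s.post_id) / COUNT(*) AS pct_spam_views
    FROM user_actions v
    LEFT JOIN spam_posts s ON s.post_id = v.post_id
    WHERE v.action = 'view'
    GROUP BY v.ds
    """
).fetchall()
print(pct_spam_by_day)
```

This measures the share of views; counting distinct viewed posts instead would answer a slightly different reading of the question.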
Q3: How would you find users who abuse the spam reporting system?
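One way to approach this question, as a sketch only: treat a user as a likely abuser when they report many posts and almost none of those posts end up removed by reviewers. The thresholds (min_reports, max_removal_rate) and the sample ids are hypothetical and not from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_actions
        (ds TEXT, user_id INTEGER, post_id INTEGER, action TEXT, extra TEXT);
    CREATE TABLE reviewer_removals
        (ds TEXT, reviewer_id INTEGER, post_id INTEGER);
    """
)
# Hypothetical reports: user 11 reports three posts that were never removed,
# user 12 reports one removed post, user 13 reports three posts, one removed.
conn.executemany(
    "INSERT INTO user_actions VALUES ('2018-07-01', ?, ?, 'report', 'SPAM')",
    [(11, 100), (11, 200), (11, 300), (12, 400), (13, 500), (13, 600), (13, 400)],
)
conn.execute("INSERT INTO reviewer_removals VALUES ('2018-07-01', 9, 400)")

min_reports = 3          # hypothetical: ignore users with few reports
max_removal_rate = 0.1   # hypothetical: flag if at most 10% were removed

suspected_abusers = conn.execute(
    """
    SELECT a.user_id,
           COUNT(DISTINCT a.post_id) AS n_reported,
           COUNT(DISTINCT r.post_id) AS n_removed
    FROM user_actions a
    LEFT JOIN reviewer_removals r ON r.post_id = a.post_id
    WHERE a.action = 'report'
    GROUP BY a.user_id
    HAVING COUNT(DISTINCT a.post_id) >= ?
       AND 1.0 * COUNT(DISTINCT r.post_id) / COUNT(DISTINCT a.post_id) <= ?
    ORDER BY n_reported DESC
    """,
    (min_reports, max_removal_rate),
).fetchall()
print(suspected_abusers)
```

A low removal rate could also mean the post has not been reviewed yet, so in practice the window would likely need to exclude very recent reports.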
Q3: Facebook has decided to be proactive about spam instead of merely reactive. We decide to address the spam problem with a machine learning solution that predicts whether a given post is indeed spam. We want to use the predictions to downrank/deprioritize suspected spam in News Feed. The question here is how to evaluate whether this machine learning model is actually useful.
PRODUCT:
Q1. Facebook used machine learning to build a model that ranks content in order to filter spam. Which metrics should you look at to evaluate this model?
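The notes do not record an answer to this question. As one common starting point, and only as an assumption about what an answer might include, precision and recall can be computed by comparing the posts the model flags against posts that reviewers actually removed. The function name and the post ids below are hypothetical.

```python
def precision_recall(predicted_spam, actual_spam):
    """Return (precision, recall) for two sets of post ids.

    predicted_spam: posts the model flagged as spam.
    actual_spam: posts treated as ground-truth spam (assumed here to be
    posts removed by reviewers).
    """
    true_positives = len(predicted_spam & actual_spam)
    precision = true_positives / len(predicted_spam) if predicted_spam else 0.0
    recall = true_positives / len(actual_spam) if actual_spam else 0.0
    return precision, recall


# Hypothetical example: the model flags four posts, three are truly spam,
# and two of the flagged posts are among them.
predicted = {100, 200, 300, 400}
actual = {100, 200, 500}
precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

Low precision means legitimate posts get downranked, while low recall means spam still reaches the feed, so the two usually have to be traded off against each other.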
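Q2. During A/B testing, revenue dropped after the new spam model was introduced. The interviewer first confirmed that the model does not touch ads, i.e. ads are not filtered out. DAU/WAU/MAU and time spent also did not change, i.e. nothing changed on the user side. What could the cause be?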
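I then asked the interviewer where revenue mainly comes from, and the interviewer said ad clicks. I said that ad-click revenue can be broken down as #users × CTR/CTP × price per click. In this situation the only factor that can change is CTR: with the new model, the overall quality of content on the platform is higher, so users prefer to spend more of their time exploring that content, the time they spend clicking ads shrinks correspondingly, and revenue drops. The interviewer agreed: after adopting the new model, users may spend more time watching videos and similar content, so less time goes to ads.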
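The breakdown above can be illustrated with a small worked example. All numbers are made up, and the middle factor is treated as ad clicks per user (the term the notes call CTR), which is an assumption about how the breakdown is meant.

```python
def ad_click_revenue(n_users, clicks_per_user, price_per_click):
    """Revenue as #users x clicks per user x price per click."""
    return n_users * clicks_per_user * price_per_click


# Hypothetical A/B test: same user count and price in both arms,
# but users in the treatment arm click ads less often.
control = ad_click_revenue(1_000_000, 0.05, 0.20)
treatment = ad_click_revenue(1_000_000, 0.04, 0.20)

relative_change = (treatment - control) / control
print(f"revenue change: {relative_change:.1%}")
```

With users and price held fixed, the relative drop in revenue equals the relative drop in the click factor, which is why the answer focuses on CTR.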