Interview Question
Data Scientist Interview
-
MetaData challenge was very similar to the ads analysis challenge on the book the collection of data science takehome challenge, so that was easy (if you have done your homework). SQL was: you have a table where you have date, user_id, song_id and count. It shows at the end of each day how many times in her history a user has listened to a given song. So count is cumulative sum. You have to update this on a daily basis based on a second table that records in real time when a user listens to a given song. Basically, at the end of each day, you go to this second table and pull a count of each user/song combination and then add this count to the first table that has the lifetime count. If it is the first time a user has listened to a given song, you won't have this pair in the lifetime table, so you have to create the pair there and then add the count of the last day. Onsite: lots of ads related and machine learning questions. How to build an ad model, how to test it, describe a model. I didn't do well in some of these.
Interview Answers
18 Answers
Can't tell you the solution of the ads analysis challenge. I would recommend getting in touch with the book author though. It was really useful to prep for all these interviews. SQL is a full outer join between life time count and last day count and then sum the two.
Anonymous on
Can you post here your solution for the ads analysis from the takehome challenge book. I also bought the book and was interested in comparing the solutions. Also can you post here how you solved the SQL question?
Anonymous on
for the SQL, I think both should work. Outer join between lifetime count and new day count and then sum columns replacing NULLs with 0, or union all between those two, group by and then sum.
Anonymous on
REPLACE INTO summary_table(name,song,tot_count) SELECT d_name,d_song,d_cnt+ IFNULL(tot_count,0) FROM (select name d_name,song d_song,count(*) d_cnt from today_table t where to_date ='12-26-2013' group by name,song) LEFT JOIN (select name s_name,song s_song,tot_count from summary_table s) ON s_name=d_name and s_song=d_song;
Anonymous on
I had the interview and queries are like this. Not hard, but boy, coding live in front of a screen is very different than reading a question on glassdoor and trying to answer it!
Anonymous on
Thanks so much for this! So helpful!
Anonymous on
insert into summary_table select d.* from (select b.user_id, b._date, b.song, b.todaysum+coalesce(c.maxsum,0) as cumsum from today_summary b left join (select a.user_id, a.song, max(cumsum) as maxsum from summary_table a group by a.user_id, a.song) c on b.user_id=c.user_id and b.song=c.song) d;
Anonymous on
Main query works, having trouble with update statement though... SELECT current_date, t.user_id, t.song_id, SUM(t.count) FROM ( SELECT * FROM cumu UNION ALL SELECT * FROM curr) t GROUP BY t.user_id, t.song_id;
Anonymous on
I've seen many reviews where the reviewer referred to this book, but didn't explain much beyond that. I'd say they're trying to make you buy the book.
AdsAnalysisChallenge on
Could you please explain a little bit more about "ads analysis challenge". What is given and what is the objective?
Erin on
Could you please explain a little bit more about "ads analysis challenge". What is given and what is the objective?
Erin on
I made a composite primary key (user_id, song_id) REPLACE INTO usersong1 SELECT f.datert, f.user_id, f.song_id, f.daily+f.counts FROM (select * from usersong1 a JOIN (select datert, user_id, song_id, COUNT(user_id) as daily from rt group by rt.user_id, rt.song_id ) e ON (a.user_id = e.user_id) AND (a.song_id = e.song_id)) f;
Anonymous on
for the SQL, I think Union all should be used instead of outer join
Anonymous on
Update cuml set cuml.cnt = (cuml.cnt + cur.cnt) From Cuml cuml inner Join Curr cur On cuml.Userid = cur.userid and cuml.songid = cur.songid
Anonymous on
INSERT INTO History(date, user_id, song_id, count) SELECT date, user_id, song_id, SUM(count) FROM Daily WHERE Daily.user_id = History.user_id AND Daily.song_id = History.song_id ON DUPLICATE KEY UPDATE SET History.count = History.count + Daily.count;
RC on
INSERT INTO History(date, user_id, song_id, count) SELECT date, user_id, song_id, SUM(count) FROM Daily GROUP BY date, user_id, song_id WHERE Daily.user_id = History.user_id AND Daily.song_id = History.song_id ON DUPLICATE KEY UPDATE SET History.count = History.count + Daily.count;
RC on
Note: In Facebook they want you to write efficient queries and union all is not efficient. SELECT (CASE WHEN C.User_id IS NOT NULL THEN C.User_id ELSE D.User_id END) AS USERS, (CASE WHEN C.song_id IS NOT NULL THEN C.song_id ELSE D.song_id END) AS SONG, sum(ISNULL(Cum,0))+sum(isnull(daily_count,0)) as sum FROM [dbo].[Q_15_Cum] AS C FULL JOIN (SELECT User_id, song_id, COUNT(*) AS DAILY_COUNT FROM [dbo].[Q_15_Daily] GROUP BY User_id, song_id )D ON C.User_id=D.User_id AND C.song_id=D.song_id group by(CASE WHEN C.User_id IS NOT NULL THEN C.User_id ELSE D.User_id END) , (CASE WHEN C.song_id IS NOT NULL THEN C.song_id ELSE D.song_id END) order by 1,2
SQL_Master on
-> first update cumm table for the user_id,song_id combinations that exist in both tables UPDATE cumm AS v SET date = s.date, count = v.count + s.count FROM (select date,user_id,song_id,count(*) as count from curr group by 1,2,3) AS s where v.user_id=s.user_id and v.song_id=s.song_id -> then insert in cumm table the combinations that dont exist in cumm table Insert into cumm select a.* from (select date,user_id,song_id,count(*) as count from curr group by 1,2,3) a left join cumm b on a.user_id=b.user_id and a.song_id=b.song_id where b.user_id is null
MM on