Interview Question

Data Scientist, Analytics Interview

-Menlo Park, CA

Meta

Write a SQL query to compute a frequency table of a certain attribute involving two joins. What if you want to GROUP or ORDER BY some attribute? What changes would you need to make? How would you account for NULLs?

Tags:sql facebook
AnswerAdd Tags

Interview Answers

25 Answers

3

select number_comments, count(submission_id) as number_posts from ( # more than zero comments select submission_id, count(post_id) as number_comments from ( select submission_id, case when parent_id is null 1 else 0 end as post, case when parent_id is not null parent_id else null end as post_id, body from Submissions )k where post =0 group by submission_id ) k1 group by number_comments union select number_comments, count(submission_id) as number_posts from ( # comments= 0 select submission_id, 0 as number_comments from ( select submission_id, case when parent_id is null 1 else 0 end as post, case when parent_id is not null parent_id else null end as post_id, body from Submissions )k where post =1 group by submission_id ) k1 group by number_comments

Anonymous on

2

I've tested all these on a mock data set and none of them work! Does anyone have the correct solution? I'm stuck on this one..

QR on

2

Posts and comments in the same table looks weird. Here's my attempt (made easy with CASE) to exclude all the posts from the table and grouping/counting comments. SEL parent_id ,COUNT(*) as comment_count ( SEL * ,CASE WHEN perent_id IS NULL THEN 'Post' ELSE 'comment' END as post_or_comment FROM Submissions ) a WHERE post_or_comment = 'comment'

Alberto on

2

Here is the solution. You need a left self join that accounts for posts with zero comments. Select children , count(submission_id) from ( Select a.submission_id, count(b.submission_id) as children from Submissions a Left Join submissions b on On a.submission_id=b.parent_id Where a.parent_id is null Group by a.submission_id ) a Group by children

Anonymous on

1

@ RLeung shouldn't you use left join? You are effectively losing all posts with zero comment.

anonymous on

7

Can you explain why you would even need the self-join here? Can you not just group by parent_id and do the COUNT() on each group, since the parent_id values correspond to the post values when they're not null?

Anonymous on

5

If you group by parent_id, you'll be leaving out all posts with zero comments.

Anonymous on

0

select t.post_id, count(t.submission_id) -1 from (select submission_id, case when parent_id is null then submission_id else parent_id end as post_id from submissions) t group by post_id

Anonymous on

0

select comments_count, count(submission_id) as post_count from ( select submission_id, count( distinct parent_id) as comments_count from Table A group by submission_id )A group by comments_count

Anonymous on

0

I think all of the Posts are missing Parent_ID. I am editing the code shared above. This will solve the duplicate problem select parent_id as post, count(parent_id) as num_of_comments from submissions group by parent_id union select submission_id as post, 0 as num_of_comments from submissions where parent.id not in (select submission_id from submissions)

Anonymous on

0

Not the shortest answer but I think much clearer than anything posted here. Also gives output table that could actually be fed directly into a histogram which was part of the question. SELECT CASE WHEN num_comments IS NULL THEN 0 ELSE num_comments END AS num_comments, COUNT(parent_post_id) AS cnt_posts FROM ( SELECT submission_id AS parent_post_id, comment_count.num_comments FROM Submissions WHERE parent_id IS NULL LEFT JOIN ( SELECT parent_id, COUNT(parent_id) AS num_comments FROM Submissions WHERE parent_id IS NOT NULL GROUP BY 1 ) comment_count ON submission_id = comment_count.parent_id ) GROUP BY 1 ORDER BY 1

Cam on

0

select p.parent_id as posts, count(c.submission_id) as commentcount from submissions c inner join submissions p on c.parent_id = p.submission_id group by p.parent_id;

Harish on

0

select case when parent_id is not null then parent_id else sub_id end as post_id, sum(case when parent_id is not null then 1 else 0 end) as comment_count from submissions group by case when parent_id is not null then parent_id else sub_id end;

Priyanka Shukla on

0

Create table: create table submissions ( submission_id int null, body varchar(500) null, parent_id int null ); Insert records: (change your database name) INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (1, 'POST1', null); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (null, 'C1', 1); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (2, 'POST2', null); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (3, 'POST3', null); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (null, 'C2', 3); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (null, 'C3', 3); Solution: SELECT a.submission_id AS post_id, a.body, sum(CASE WHEN t.parent_id > 0 THEN 1 ELSE IFNULL(t.parent_id,0) END) AS comment_id FROM submissions AS a LEFT JOIN (SELECT b.parent_id FROM submissions AS b) t ON a.submission_id = t.parent_id WHERE a.submission_id IS NOT NULL GROUP BY post_id; Results: 1 POST1 1 2 POST2 0 3 POST3 2

Complete solution on

0

CREATE TABLE users( sid INT , pid INT , body Varchar(255)); insert into users Values ( 2,null, "cover"), (1,2,"Ami is"),(3,2,"hi"),(4,2,"good pic"),(5,null ,"profil pic"),(6,5,"nice"); (select pid , COUNT(pid) as total from users where pid is not null group by pid)

Anonymous on

0

create table subs( sub_id integer, parent_id integer ) insert into subs values(1,null); insert into subs values(2,null); insert into subs values(3,null); insert into subs values(4,null); commit; insert into subs values(5,1); insert into subs values(6,1); insert into subs values(7,1); insert into subs values(8,1); insert into subs values(9,2); insert into subs values(10,2); insert into subs values(11,3); insert into subs values(12,3); insert into subs values(12,4); commit; select * from subs select cc, count(sub_id) from ( select a.sub_id, count(b.sub_id) cc from subs a inner join subs b on(b.parent_id = a.sub_id) group by 1) group by 1

Anonymous on

0

I found it easier to explain when I broke it out into named sub tables to handle the case when there are no comments on a post and you want the end result to be the histogram of the number of comments per post: with parent_comment_ct as ( SELECT parent_id, COUNT(parent_id) AS num_comments FROM submissions WHERE parent_id IS NOT NULL GROUP BY parent_id ), submission_comment_ct as ( SELECT su.submission_id AS parent_post_id, pcc.num_comments AS num_comments FROM submissions su LEFT JOIN parent_comment_ct pcc ON su.submission_id = pcc.parent_id WHERE su.parent_id IS NULL ) SELECT CASE WHEN scc.num_comments IS NULL THEN 0 ELSE scc.num_comments END AS num_comments, COUNT(scc.parent_post_id) AS cnt_posts FROM submission_comment_ct scc GROUP BY 1 ORDER BY 1

icarus on

0

SELECT recommended_page FROM (SELECT f.user1_id as users, f.user2_id as freinds, l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user2_id = l.user_id WHERE f.user1_id = 1 UNION ALL SELECT f.user2_id as users,f.user1_id as friends,l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user1_id = l.user_id WHERE f.user2_id = 1) MINUS (SELECT page_id as recommended_page FROM likes WHERE user_id = 1);

Anonymous on

0

SELECT recommended_page FROM (SELECT f.user1_id as users, f.user2_id as freinds, l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user2_id = l.user_id WHERE f.user1_id = 1 UNION ALL SELECT f.user2_id as users,f.user1_id as friends,l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user1_id = l.user_id WHERE f.user2_id = 1) MINUS (SELECT page_id as recommended_page FROM likes WHERE user_id = 1);

Anonymous on

0

select c.subID as SubmissionID, count(c.body)-1 as Counts_Comments from subm c LEFT JOIN subm b ON c.subID = b.pID where b.pID is null AND c.pID is NULL group by c.subID UNION ALL select a.pID as SubmissionID, count(a.body) as Counts_Comments from ( select *, case when pID IS NULL then 'P' Else 'C' END as P_O_C from subm)a where P_O_C = 'C' group by a.pID Order by SubmissionID;

Aslesha on

0

select a.user_name,b.user_name,page_liked from services_db.pages_liked a, services_db.user_friends b where 1=1 and a.user_name = b.friend_user and a.page_liked not in ( select page_liked from services_db.pages_liked c where 1=1 and c.user_name = b.user_name ) ;

GV on

0

select parent_id as post, count(parent_id) as num_of_comments from submissions group by parent_id union select submission_id as post, 0 as num_of_comments from submissions where parent.id=null

Anonymous on

0

select k.post_id, count(submission_id) -1 from (select submission_id, case when parent_id is null then submission_id else parent_id end as post_id from submissions) t group by post_id

Anonymous on

3

In my case this question was like: 'you have a table Submissions with the submission_id, the body, and the parent_id. Submissions can be posts, or comments to a post. In posts, parent_id is null, and in comments, the parent_id is the post the comment is commenting about. How would you go and make a histogram of number of posts per comment_count?' I think i solved it along the lines of: SELECT comment_counts.n_comments, count distinct(n_comments.submission_id) ( select s1.submission_id, COUNT DISTINCT(s2.parent_id) as n_comments OUTER join submissions on s1.submission_id = s2.parent_id group by submission_id) comment_counts GROUP BY comment_counts.n_comments

Anonymous on

0

I think it is pretty straight forward. All the posts will have null parent_id. Considering the table schema to be something like this: CREATE TABLE submissions ( submission_id INT, body VARCHAR(500), parent_id INT ); SELECT DISTINCT nvl(parent_id::TEXT,'Post with no comments') AS post_id, COUNT(CASE WHEN parent_id IS NOT NULL THEN submission_id ELSE 0 END) AS number_of_comments_or_post FROM submissions GROUP BY 1; This will give results like this: post_id number_of_comments_or_post Post with no comments 8 1 10 7 11 13 8 19 9 25 7 So, the first row will give the number of posts with no comments which is 8 and remaining rows tell the number of comments per post. Is there a flaw in this?

ig on

Add Answers or Comments

To comment on this, Sign In or Sign Up.