Coursera | Big Data Specialization | Big Data Modeling and Management Systems | Quiz Week 3 Data Models Quiz
1。
What is a possible pitfall of utilizing Excel as a way to manipulate
small databases?
Excel is a user program and thus cannot run on a server.
Excel does not allow algorithms for data manipulation.
Excel does not enforce many principles of relational data
models.
2。
What does the term “atomic” mean in the context of relational
databases?
A tuple that cannot be reduced.
A column or row of data. Depends on the context.
One unit of information that cannot be decomposed.
For more information about the following concept, please
view here.
Fixed schema of a particular database.
3。
What is the Pareto-Optimality problem?
Find the best possible path given two or more
optimization criteria where neither constraint can be
fully optimized simultaneously.
Find the shortest path from source node to target node.
Find the optimal path that requires going through
specific nodes given by the user.
4。
What constitutes a community within a graph?
High density of nodes at a certain location.
Many anomalous neighborhoods within the same
vicinity.
A neighborhood defined by an integer constant K around
a specific node. All K+1 nodes belong in another
community.
A dense amount of edge connections between nodes in
a community and a few connections across
communities.
5。
Why are trees useful for semi-structured data such as XML and
JSON?
Computers can easily visualize the data with a tree
structure.
It is not always the case that XML and JSON can be
represented as trees.
Trees take advantage of the parent-child relationship of
the data for easy navigation.
For more information about the following concept, please
view here.
They are only useful for XML data as tree-like structure is
apparent with tags. While JSON does not contain a treelike structure as it contains arrays.
6。
What is the general purpose of modeling data as vectors?
Enables image searching.
The ability to normalize vectors allowing probability
distributions.
Enables weighting of the query.
Results can be ordered by similarity using vector
projection.
For more information about the following concept, please
view here.
7。
For the following questions 7, 8, and 9, suppose a registration
website creates data with the following fields for each person
registered (note: if the user does not input a value, NULL is stored
instead): Name, Date, Address, and Account Number.
Suppose we collect data month by month. Each month, we would
have a batch of data containing the fields listed above. At the end
of the year, we want to summarize our registrant activities for the
entire year, so we would remove redundancies in our data by
removing any records with duplicate account numbers from month
to month. What type of operation do we use in this scenario?
Not an Operation
Union
Join
Subsetting
8。
From the information given in question 7, what are the constraints,
if any, which we have placed on the Account Number field for the
end of year collection?
There are no constraints.
If we had n duplicate Account Numbers then we will
remove n-1 duplicate fields.
Account Number should be unique.
Account should have at most n digits.
9。
Suppose 100 people signup for our system and of the 100 people,
60 of them did not input an address. The system lists the values as
NULL for these empty entries in the address field. Would this
situation still have structure for our data?
Yes the data has structure because we have placed a
structural constraint on the data, thus the data will
always have the originally defined structure.
For more information about the following concept, please
view here.
No because the majority of data do not have a specific
field filled, thus our originally defined structure is lost.
Data Models Quiz Week 3
Note: Correct answers are in the bold and underlined word.1。
What is a possible pitfall of utilizing Excel as a way to manipulate
small databases?
Excel is a user program and thus cannot run on a server.
Excel does not allow algorithms for data manipulation.
Excel does not enforce many principles of relational data
models.
2。
What does the term “atomic” mean in the context of relational
databases?
A tuple that cannot be reduced.
A column or row of data. Depends on the context.
One unit of information that cannot be decomposed.
For more information about the following concept, please
view here.
Fixed schema of a particular database.
3。
What is the Pareto-Optimality problem?
Find the best possible path given two or more
optimization criteria where neither constraint can be
fully optimized simultaneously.
Find the shortest path from source node to target node.
Find the optimal path that requires going through
specific nodes given by the user.
4。
What constitutes a community within a graph?
High density of nodes at a certain location.
Many anomalous neighborhoods within the same
vicinity.
A neighborhood defined by an integer constant K around
a specific node. All K+1 nodes belong in another
community.
A dense amount of edge connections between nodes in
a community and a few connections across
communities.
5。
Why are trees useful for semi-structured data such as XML and
JSON?
Computers can easily visualize the data with a tree
structure.
It is not always the case that XML and JSON can be
represented as trees.
Trees take advantage of the parent-child relationship of
the data for easy navigation.
For more information about the following concept, please
view here.
They are only useful for XML data as tree-like structure is
apparent with tags. While JSON does not contain a treelike structure as it contains arrays.
6。
What is the general purpose of modeling data as vectors?
Enables image searching.
The ability to normalize vectors allowing probability
distributions.
Enables weighting of the query.
Results can be ordered by similarity using vector
projection.
For more information about the following concept, please
view here.
7。
For the following questions 7, 8, and 9, suppose a registration
website creates data with the following fields for each person
registered (note: if the user does not input a value, NULL is stored
instead): Name, Date, Address, and Account Number.
Suppose we collect data month by month. Each month, we would
have a batch of data containing the fields listed above. At the end
of the year, we want to summarize our registrant activities for the
entire year, so we would remove redundancies in our data by
removing any records with duplicate account numbers from month
to month. What type of operation do we use in this scenario?
Not an Operation
Union
Join
Subsetting
8。
From the information given in question 7, what are the constraints,
if any, which we have placed on the Account Number field for the
end of year collection?
There are no constraints.
If we had n duplicate Account Numbers then we will
remove n-1 duplicate fields.
Account Number should be unique.
Account should have at most n digits.
9。
Suppose 100 people signup for our system and of the 100 people,
60 of them did not input an address. The system lists the values as
NULL for these empty entries in the address field. Would this
situation still have structure for our data?
Yes the data has structure because we have placed a
structural constraint on the data, thus the data will
always have the originally defined structure.
For more information about the following concept, please
view here.
No because the majority of data do not have a specific
field filled, thus our originally defined structure is lost.
I am really enjoying reading your well written articles. It looks like you spend a lot of effort and time on your blog. I have bookmarked it and I am looking forward to reading new articles. Keep up the good work.data science course
ReplyDeleteNice blog
ReplyDeleteData power training
Data Stage training
Dynamic CRM Training
Informatica idq training