IP portfolios and evolution of biomedical additive manufacturing applications
Amy J. C. Trappey1 • Charles V. Trappey2 •
Curry L. S. Chung1
Received: 4 February 2016 / Published online: 10 February 2017 © Akadémiai Kiadó, Budapest, Hungary 2017
Abstract Additive manufacturing (AM) or 3D printing includes techniques capable of
manufacturing regular and irregular shapes for small batches of customized products. The
ability to customize unusual shapes makes the process particularly suitable for prosthetic
products used in biomedical applications. AM adoption in the field of biomedical applications
(called bio-AM in this research) has seen significant growth over the last few years.
This research develops an Intellectual Property (IP) analytical methodology to explore the
portfolios and evolution of patents, as well as their relevance to Taiwan’s Ministry of
Science and Technology (MOST) research projects in the bio-AM domain. Specifically, global
and domestic IP portfolios for bio-AM innovations are studied using the proposed method.
First, the domain documents (of US patents and MOST projects) are collected from a
global patent database and MOST project database. The key term frequency counts and
technical clustering analysis of the collected documents are derived. The key terms and
appearance frequencies in documents form the basis for document clustering and similarity
analysis. The ontology of bio-AM is constructed based on the clustering results. Finally,
the patents and projects in the adjusted clusters are subject to evolution analysis using
concept lattice analysis. This research provides a computer supported IP evolution analysis
system, based on the developed algorithms, for the decision support of IP and R&D
strategic planning.
1 Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu, Taiwan
2 Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan
Scientometrics (2017) 111:139–157 DOI 10.1007/s11192-017-2273-6
Keywords Additive manufacturing · 3D printing · Biomedical · Patent analysis · Ontology · Evolution analysis
Introduction
Additive manufacturing (AM) is a technology used to print three dimensional (3D) objects
via an ongoing process of material additions. By using 3D digital modeling data, the 3D
objects are constructed through layer-by-layer material adherences. The advantage of 3D
printing (or AM) is the ability to fabricate irregular shaped objects with a high degree of
curvature and/or void spaces within the object. AM is currently restricted to the production
of small batches of products. Small batch manufacturing, however, is not necessarily a
hindrance for prosthetic products used in medical applications since many objects are
custom made, one of a kind, for individual patients.
The worldwide market for bio-AM (i.e., 3D printing in medical applications) is
expected to reach US$1 billion by 2019 (Transparency Market Research 2013). The
applications of bio-AM have created unique business opportunities with intense compe-
tition emerging across the industry. Strategically realizing sustainable opportunities in the
bio-AM market requires that companies search for effective strategies to develop and apply
technologies within time, within budget constraints, and without infringing upon the
intellectual property (IP) of others. The common approach used to accomplish these goals
begins with patent analysis. Patent documents contain important research results (in enough
detail for a person knowledgeable in the art to reconstruct the invention), which are of great
value to industry, legal researchers, and policy advocates in science and technology R&D
(Tseng et al. 2007). Since patents provide legal protection for the IP owners, these
intangible rights may be licensed for creating a continuous revenue stream, sold to others
for substantial profits, or used to enhance the brand values of the patent assignees and
licensees. Hence, planning an appropriate patent strategy is important for a technology
company to leverage its IP rights, to create patent portfolios for better management, to
enhance the value of the firm, and to secure inventions from possible IP infringement
litigation.
This research focuses on developing an innovative methodology to analyze a company’s
IP portfolio, compare the critical strength and position relative to its competitors (Narin
et al. 1987), identify the domain technology evolution, and trace the technology pathways
of the past, current, and future developments (Zhou et al. 2014). In our case study, a unique
patent analytical method is applied to study global and domestic portfolios and the evo-
lution of bio-AM technologies archived in the USPTO patent database and the database of
Taiwan Ministry of Science and Technology (MOST) funded research proposals. The
MOST bio-AM proposals are evaluated against the global bio-AM patent trends. For the
study, R statistical programs are used for text mining and cluster analysis. The normalized
term frequency-inverse document frequency (NTF–IDF) is the measure for evaluating the
importance of extracted key terms. The computer assisted R-program system counts the
normalized term frequencies (NTF) and inverse document frequencies (IDF) across all
documents and constructs a NTF–IDF matrix. The research calculates the correlations
between all pairs of documents, based on the common term frequencies, as the inputs for
clustering algorithms. The K means, K medoids, and Ward’s hierarchical algorithms are
used to cluster the documents. The R package clValid (Brock et al. 2008) is applied to
evaluate the performance and select the best clustering results. The graphical clustered
concept evolution is derived using a modified formal concept analysis (MFCA, also called
concept lattice) algorithm. The proposed approach provides an objective basis to recom-
mend, within the given knowledge domain, which research development strategies should
be considered as being competitive, potentially profitable, and clear of infringement lia-
bilities. The key sections of the entire research paper, including introduction, literature
reviews, general methodology development, computer-supported prototype implementa-
tion, and bio-AM case study, are outlined in Fig. 1.
Literature review
This research relies on text mining, data mining, and clustering methods to construct the
ontology of bio-AM technologies and applications. The computer-assisted system, incor-
porating the algorithmic subroutines, is built using R programs. Text mining extracts
information from documents and the extracted term frequencies are used for accurately
clustering these documents. Without accurate clustering, a suitable ontology is difficult, if
not impossible, to derive. In the following sub-sections, the background and literature of
the fundamental algorithms and previous research methodologies are reviewed.
Text mining of patent documents
Most of the information in the patent document is text although illustrations are used to
clarify the uniqueness of claims and evaluate the advancement of technology beyond the
prior art. Text mining is used to extract knowledge from the unstructured language of
patent claims and descriptions (Sanchez et al. 2008). The outcome of the process provides
unique information about the invention and helps to correlate similar information found in
Fig. 1 The research framework and the paper outline
other patents (Sullivan 2001). Text mining technology is often used in combination with
term frequency analysis and the approach is divided into text refining and knowledge
distillation. Text refining transforms free form text documents into an intermediate form,
and knowledge distillation derives patterns or knowledge from the intermediate form (Tan
1999). Text mining requires a substantial number of text documents to create statistically
valid relationships and provide a sufficient pool of term frequencies to analyze and match
the results of patent analysis. Common patent analysis methods include categorical anal-
ysis, cluster analysis, and relational analysis.
Most text mining methods are based on words, but the existence of synonyms and
polysemy affects the results unless controlled. Zhong et al. (2012) present an effective
pattern discovery technique to improve the effectiveness of the text mining technique. The
first step of text mining begins with data pre-processing (Mierswa et al. 2006) and the basic
text processing steps (Te Liew et al. 2014) include importing and classifying text files,
transforming upper and lower cases, creating tokens (e.g., the terms ‘greenhouse gases’ and
‘greenhouse gas’ are replaced with the token ‘GHG’), splitting text into a sequence of
tokens, filtering English stop-words, filtering user defined stop-words, and generating
n grams (n grams are a contiguous sequence of n objects used to capture phrases from the
text).
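The pre-processing steps listed above can be sketched in a few lines. This is a minimal Python illustration, not the R routines used in this research; the stop-word list, the `TOKEN_MAP` dictionary, and the function name `preprocess` are hypothetical and chosen only for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real list is much longer.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "with"}

# Hypothetical phrase-to-token map, following the 'GHG' example in the text.
TOKEN_MAP = {"greenhouse gases": "GHG", "greenhouse gas": "GHG"}


def preprocess(text, n=2, user_stop_words=frozenset()):
    """Return (tokens, n_grams) for one document."""
    text = text.lower()  # transform cases
    # Replace known phrases with a single token, longest phrase first.
    for phrase in sorted(TOKEN_MAP, key=len, reverse=True):
        text = text.replace(phrase, TOKEN_MAP[phrase].lower())
    tokens = re.findall(r"[a-z0-9]+", text)  # split into a token sequence
    stop = STOP_WORDS | set(user_stop_words)  # English + user-defined stop words
    tokens = [t for t in tokens if t not in stop]
    # Contiguous sequences of n tokens capture phrases.
    n_grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return tokens, n_grams
```

Replacing phrases before tokenizing keeps multi-word concepts together, which matters when the later frequency counts are taken per term.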
After data pre-processing, methods to count term frequencies, variance, and term
frequency-inverse document frequency (TF-IDF) are used to extract keywords from patent
documents. Lee et al. (2009) conducted variance and frequency comparison analyses to
extract keywords. A high variance means that a keyword has a high frequency in specific
documents and a low frequency in other documents, and better identifies the technical
features of patent documents (Bermudez-Edo et al. 2015).
Clustering
Clustering is used as an unsupervised learning method with the goal to discover a new set
of categories (Maimon and Rokach 2005). For practical clustering, the goal is to minimize
the distances (similarity maximization) among entities in the same cluster and maximize
the distance (similarity minimization) between entities in different clusters. The clustering
method is divided into two categories (Jain et al. 1999):
1. Partitioning algorithms that produce non-hierarchical clusters. This method assigns a
number of clustering centers and an iteration algorithm determines the cluster group
centers. Frequently used partitioning algorithms include Self-Organizing Maps (SOM)
and K means clustering (Trappey et al. 2013). Among these clustering algorithms,
K means clustering is most widely used and studied (Kanungo et al. 2002). K means
clustering is very efficient in terms of computational time, but it is sensitive to outliers.
For this reason, K medoids clustering is used to reduce outlier bias.
2. Hierarchical algorithms produce hierarchical clusters using divisive and agglomerative
approaches (Rokach and Maimon 2005). The divisive approach is rarely used. The
agglomerative approach views each sample as a small cluster, iteratively combined
into a larger cluster using techniques such as Ward’s method.
Ontology
Ontology is a branch of philosophy often called a synonym of metaphysics (Floridi 2008).
The underlying concept of ontology is to discuss the existence or features of real objects.
Gruninger and Fox (1995) defined ontology as a formal description of a domain set of
entities, their properties, behaviors, and relations.
With the development of Information Technology (IT), ontologies are applied in the
area of Knowledge Sharing (KS) for innovative product or service development (Lee et al.
2015; Yan et al. 2005, 2009). An ontology defines the basic terms and relations comprising
the vocabulary of the domain knowledge as well as the rules for combining terms and
relations to define extensions to the vocabulary (Neches et al. 1991). Gruber (1995) defines
an ontology as an explicit specification of a conceptualization, which describes a knowledge
domain consisting of a set of relations, objects, concepts, and functions.
A formal methodology to develop ontologies was proposed by Noy and McGuinness
(2001). The systematic steps include determining the domain and scope of the ontology,
considering the reuse of existing ontologies, enumerating important terms in the ontology,
defining the classes, their hierarchy, and their properties as class facets, and finally creating
instances of ontology classes.
Using R subroutines for text and data analytics
The R language is a programming language and environment for statistical analysis and
graphical visualization, originally described in 1996 (Ihaka and Gentleman 1996). R
has a similar architecture to the earlier statistical programming language S. After
years of development and enhancement, R has become a widely used open-source statistical
software language.
Methodology
This section describes the approaches and algorithms used for the IP portfolio and evolution
analytics. The proposed methodology includes steps for extracting key terms, performing
cosine similarity analysis, creating and validating technical clusters, and tracing
the technology evolution (Fig. 2).
Key term extraction
This research uses the normalized term frequency-inverse document frequency (NTF–
IDF) ranking method to identify key terms. Term Frequency (TF) prioritizes the words
appearing in a text document. Due to variations in length of patents or project documents,
we normalize the term frequencies for the entire set of text documents. The normalized
term frequency (NTF) is used to provide a precise representation of term frequency
occurrences in a document. The mathematical expression of NTF is shown in Eq. (1)
\[ ntf_{ij} = tf_{ij} \times \frac{\sum_{s=1}^{N} dn_s}{N \times dn_j} \tag{1} \]
tfij: The number of times that term i appears in document j; dnj: The total number of words
in document j; N: The total number of documents.
Inverse document frequency (IDF) is a measure used to calculate a term’s importance
and its ability to distinguish a document from others. The IDF value is calculated in two
steps. The first step divides the total number of documents (N) by the number of
documents in which the term i appears (dfi). The second step takes the logarithm of that
quotient. The mathematical formula is shown by
Eq. (2)
\[ idf_i = \log\left(\frac{N}{df_i}\right) \tag{2} \]
dfi: The number of documents where the term i appears.
The NTF–IDF value is the product of Eqs. (1) and (2). The reason for using NTF–IDF is
to reduce bias if a term appears frequently in one document but rarely appears in other
documents. This allows NTF–IDF to be used as a method to distinguish between different
categories and is very suitable for clustering. Lower NTF–IDF values represent the lower
degrees of discrimination and higher values represent higher degrees of discrimination.
Thus, NTF–IDF values are a means to distinguish important terms. R statistical software
with text mining functions is adopted to calculate all NTF–IDF values of a given set of
documents as shown in Table 1, where NTF–IDFij is the product of ntfij and idfi. The
NTF–IDF data are used as inputs for cosine similarity analysis.
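Equations (1) and (2) can be combined into a small routine that builds the document versus key term matrix. The following is a Python sketch under stated assumptions, not the R implementation used in this research: the function name `ntf_idf` is hypothetical, the natural logarithm is assumed because the base is not stated, and every document is assumed to contain at least one token.

```python
import math


def ntf_idf(docs):
    """docs: list of token lists (one per document, none empty).

    Returns (terms, matrix) where matrix[j][i] is the NTF-IDF value of
    term i in document j, following Eqs. (1) and (2).
    """
    n_docs = len(docs)
    lengths = [len(d) for d in docs]            # dn_j
    avg_len = sum(lengths) / n_docs             # sum(dn_s) / N
    terms = sorted({t for d in docs for t in d})
    df = {t: sum(1 for d in docs if t in d) for t in terms}  # df_i
    matrix = []
    for d, dn in zip(docs, lengths):
        row = []
        for t in terms:
            tf = d.count(t)                     # tf_ij
            ntf = tf * avg_len / dn             # Eq. (1)
            idf = math.log(n_docs / df[t])      # Eq. (2), natural log assumed
            row.append(ntf * idf)
        matrix.append(row)
    return terms, matrix
```

A term that appears in every document receives an IDF of zero, so its whole column vanishes; this is the sense in which low NTF–IDF values indicate low discrimination.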
Cosine similarity analysis
Cosine similarity (Salton et al. 1975) is used to determine the similarity
between documents. The cosine of the angle between two document vectors is their
cosine similarity. Since the NTF–IDF values of terms are non-negative, the cosine
similarity of two documents ranges from zero to one. The mathematical formula is
shown in Eq. (3). Referring to Eq. (3), the numerator represents the inner product of the
two key term vectors x and y and the denominator represents the product of the vector
lengths.
Fig. 2 The step-by-step procedure of the IP evolution method and algorithm sequence
\[ \mathrm{Similarity}(x, y) = \cos(\theta) = \frac{x \cdot y}{\|x\| \, \|y\|} \tag{3} \]
x · y: inner product of vectors x and y; ‖x‖: length of vector x; ‖y‖: length of vector y.
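Equation (3) translates directly into code. The sketch below is a plain Python illustration with hypothetical function names; returning zero for an all-zero vector is an assumption made here to avoid division by zero and is not part of the formula.

```python
import math


def cosine_similarity(x, y):
    """Eq. (3): inner product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an empty document is similar to nothing
    return dot / (norm_x * norm_y)


def similarity_matrix(rows):
    """Pairwise cosine similarity between all document rows."""
    return [[cosine_similarity(a, b) for b in rows] for a in rows]
```

Because the measure depends only on direction, a long document and a short document with the same term proportions score close to one.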
Clustering algorithms and performance validation
Brock et al. (2008) presented three types of cluster validation approaches: internal, sta-
bility, and biological. This research uses internal and stability validation to measure the
quality of clustering and to determine which algorithm is most suitable for analyzing the
bio-AM case.
For internal validation, there are three measures: connectivity, silhouette
width, and the Dunn index. Connectivity measures the connectedness of observations, whereas
silhouette width and the Dunn index are non-linear combinations of the compactness
and separation of the cluster partitions.
For stability validation, there are four measures: the average proportion of non-overlap
(APN), the average distance (AD), the average distance between means (ADM),
and the figure of merit (FOM). These measures compare the clustering results based on the
full set of data with the results obtained after removing one column of the data at a time.
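As an illustration of an internal validation measure, the average silhouette width can be computed from a pairwise distance matrix and a vector of cluster labels. This is a Python sketch using the standard silhouette definition, not the clValid implementation; the function name is hypothetical, at least two clusters are assumed, and singleton clusters are given a silhouette of zero by convention.

```python
def silhouette_width(dist, labels):
    """Average silhouette width for a distance matrix and cluster labels."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)
```

Values near one indicate compact, well separated clusters; values near zero or below indicate overlapping or misassigned observations.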
In this research, three clustering algorithms (K means algorithm, K medoids algorithm,
and Ward’s algorithm) are applied to cluster the same set of documents. The K means
approach (MacQueen 1967) requires the specification of the number of clusters (k), then
through iteration, reduces the differences between data within a cluster and the cluster
center until cluster members are fixed. K means clustering is divided into the following five
steps (Velmurugan and Santhanam 2010):
1. Specify the number of clusters (k) and randomly select k data points in the data set as
the initial cluster centers.
2. Calculate the distance from each cluster center to each data point, and assign each data
point to the nearest cluster center. This will form a cluster boundary resulting in an initial
cluster of members.
3. According to the cluster boundary, calculate the mean point of each cluster as the new
cluster center.
Table 1 Document versus key term NTF–IDF matrix
NTF–IDFij                       Terms (Ti; i = 1, 2, …, m)
Documents (Dj; j = 1, 2, …, n)  T1                     T2                     …   Tm
D1                              NTFIDF1,1              NTFIDF2,1              …   NTFIDFm,1
D2                              NTFIDF1,2              NTFIDF2,2              …   NTFIDFm,2
D3                              NTFIDF1,3              NTFIDF2,3              …   NTFIDFm,3
…                               …                      …                      …   …
Dn                              NTFIDF1,n              NTFIDF2,n              …   NTFIDFm,n
Sum value                       Σ(j=1..n) NTF–IDF1,j   Σ(j=1..n) NTF–IDF2,j   …   Σ(j=1..n) NTF–IDFm,j
4. After specifying the new cluster center, calculate the distances again and redistribute
the data to the nearest cluster centers.
5. Steps 3 and 4 are repeated until there are no more changes in the cluster members.
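The five steps above can be sketched as follows. This is a minimal pure Python illustration with hypothetical function names, not the R routine used in this research; squared Euclidean distance, the seeded random initialization, and the iteration cap are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    """Return (labels, centers) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # step 1: random initial centers
    labels = None
    for _ in range(max_iter):
        # steps 2 and 4: assign every point to its nearest center
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:               # step 5: memberships are fixed
            break
        labels = new_labels
        # step 3: move each center to the mean point of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                        # keep the old center if empty
                centers[c] = mean_point(members)
    return labels, centers
```

Because the result depends on the initial centers, it is common to run the procedure with several seeds and keep the partition with the smallest within-cluster distance.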
The number K will affect the clustering result, so the R Squared (RS) and root-mean-square
standard deviation (RMSSTD) statistics are used to estimate the number of clusters.
RS indicates the degree of difference among clusters. Analysis of variance (ANOVA) is used
to calculate the RS values, which are between zero and one. RMSSTD is a measure of
homogeneity within clusters. Large values of RS mean that there is a greater difference
between clusters. Small values of RMSSTD indicate that the clusters are homogeneous.
The optimal number of clusters is selected using larger RS values and smaller RMSSTD
values.
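The two statistics can be illustrated with a short Python sketch. The formulas below are commonly used definitions and are an assumption here, since this paper does not spell them out: RS is taken as the between-cluster share of the total sum of squares, and RMSSTD as the square root of the pooled within-cluster variance over all variables. The function name is hypothetical.

```python
import math


def rs_and_rmsstd(points, labels):
    """Return (RS, RMSSTD) for numeric tuples and their cluster labels."""
    dim = len(points[0])
    grand_mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    ss_total = sum((p[d] - grand_mean[d]) ** 2 for p in points for d in range(dim))

    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    ss_within = 0.0
    degrees = 0
    for members in clusters.values():
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        ss_within += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
        degrees += dim * (len(members) - 1)

    rs = (ss_total - ss_within) / ss_total if ss_total > 0 else 0.0
    rmsstd = math.sqrt(ss_within / degrees) if degrees > 0 else 0.0
    return rs, rmsstd
```

Evaluating both statistics over a range of k and looking for a high RS together with a low RMSSTD gives a simple way to compare candidate cluster counts.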
The K medoids clustering algorithm is an algorithm derived from the K means algo-
rithm. Kaufman and Rousseeuw (1990) demonstrated that Partitioning Around Medoids
(PAM) is an improved version of the K medoids algorithm. The PAM algorithm is
described as follows (Velmurugan and Santhanam 2010):
1. Arbitrarily choose k objects in the data set as the initial clustering medoids.
2. Assign each remaining object to the cluster with the nearest medoid.
3. Randomly select a non-medoid object O_random in each cluster and compute the change in
total distance S that results from swapping it with the medoid. If S < 0, replace the original medoid with O_random.
4. Repeat steps 2 and 3 until there is no change.
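A swap-based medoid search in the spirit of the steps above can be sketched in Python. Two points are assumptions of this sketch and differ from the description: candidate swaps are tried exhaustively rather than by random selection, and the first k objects serve as the arbitrary initial medoids. The input is a pairwise distance matrix, for example one minus the cosine similarity; the function names are hypothetical.

```python
def total_cost(dist, medoids):
    """Sum of distances from every object to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Return (medoids, labels) for a square distance matrix."""
    n = len(dist)
    medoids = list(range(k))                   # step 1: arbitrary initial medoids
    cost = total_cost(dist, medoids)
    improved = True
    while improved:                            # step 4: repeat until no change
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost - cost < 0:      # step 3: accept the swap if S < 0
                    medoids, cost, improved = trial, trial_cost, True
    # step 2: assign every object to the cluster of its nearest medoid
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels
```

Since every accepted swap strictly lowers the total cost and the number of medoid sets is finite, the loop always terminates.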
Hierarchical clustering is a method where the data are repeatedly divided or combined
to produce the final tree structure. The most commonly used method is agglomerative
hierarchical clustering which aggregates layers from the bottom of the tree to form clusters
hierarchically. The algorithm follows four steps:
1. Each data point is treated as a cluster Ci, i = 1 to n.
2. Find the closest two clusters among all the clusters, Ci and Cj.
3. Merge the two clusters Ci and Cj into a new cluster.
4. If the current number of clusters exceeds the expected number, then repeat steps 2 and 3
until the number of clusters satisfies the requirement.
In step 2, two close clusters are found. The common algorithms used are a single-
linkage agglomerative algorithm, complete-linkage agglomerative algorithm, average-
linkage agglomerative algorithm and Ward’s method (Jain et al. 1999). In this research, we
use Ward’s method to find the two closest clusters. Ward’s (1963) method is a common
algorithm for hierarchical clustering and its mathematical representation is shown in
Eq. (4):
\[ d(C_i, C_j) = \sum_{a \in C_i \cup C_j} \| a - \mu \| \tag{4} \]
μ: The mean vector of C_i ∪ C_j.
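The agglomerative procedure with the merge criterion of Eq. (4) can be sketched as follows. This Python illustration scores each candidate merge exactly as Eq. (4) is written, as the sum of distances to the mean of the merged cluster; textbook statements of Ward's method usually minimize the increase in the sum of squared distances instead, so the sketch should not be read as a reference implementation. Function names are hypothetical.

```python
import math
from itertools import combinations


def merge_cost(ci, cj):
    """Eq. (4): sum of distances from every member to the mean of Ci U Cj."""
    union = ci + cj
    dim = len(union[0])
    mu = [sum(p[d] for p in union) / len(union) for d in range(dim)]
    return sum(math.dist(p, mu) for p in union)


def agglomerate(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]           # step 1: one cluster per point
    while len(clusters) > n_clusters:          # step 4: stop at the target count
        # step 2: find the pair with the smallest merge cost
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # step 3: merge (i < j)
        del clusters[j]
    return clusters
```

Recording the order of the merges, rather than only the final partition, would yield the tree structure that hierarchical clustering is named for.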
Tracing technology evolution
Based on the NTF–IDF matrix and the clustering results, a dynamic concept lattice graph
for a given set of technical documents (patents and proposals) is constructed. First, the
NTF–IDF matrix of entire document set provides the basic data for clustering and each
cluster also has its own NTF–IDF matrix. Second, all NTF–IDF values are ranked in the
NTF–IDF matrices in ascending order. The top quartiles of documents are included for
evolution analysis. Thus, if an NTF–IDF value exceeds the first quartile threshold value,
the NTF–IDF index value is 1; otherwise the index is 0. Finally, a correlation
threshold value is defined (e.g., 0.3). If the correlation value between two documents is greater than
the threshold, there is a strong relationship (link) between these two nodes. If the linked
documents have common terms in a branch, a solid line is used to connect the documents. If not, a
dotted line is used to link the documents. The evolution algorithm’s pseudo code and
explanatory comments are provided in Appendix 1. The portfolio and evolution analysis of
bio-AM patents and project proposals are described in ‘‘Case analysis of bio-AM patents
and projects’’ section.
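The binarization and linking rules described in this section can be illustrated with a short Python sketch. It is not the algorithm of Appendix 1, which is not reproduced here. Two readings are assumptions of the sketch: the quartile cut-off is taken as the upper quartile of all values in the matrix, and a solid line means the two documents share at least one term with index 1. The function names are hypothetical; the 0.3 default follows the example threshold in the text.

```python
from itertools import combinations
from statistics import quantiles


def binarize(matrix):
    """Index value 1 where an NTF-IDF value exceeds the quartile cut-off."""
    values = [v for row in matrix for v in row]
    threshold = quantiles(values, n=4)[2]      # assumed: upper quartile cut point
    return [[1 if v > threshold else 0 for v in row] for row in matrix]


def evolution_links(binary, similarity, link_threshold=0.3):
    """Return (doc_a, doc_b, line_style) for every strongly related pair."""
    links = []
    for a, b in combinations(range(len(binary)), 2):
        if similarity[a][b] > link_threshold:  # strong relationship between nodes
            shared = any(x and y for x, y in zip(binary[a], binary[b]))
            links.append((a, b, "solid" if shared else "dotted"))
    return links
```

Ordering the linked documents by filing or proposal date would then give the direction of each evolutionary link.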
Case analysis of bio-AM patents and projects
The global bio-AM patents, excluding sole dental applications, are retrieved and archived
using a strategic combination of keywords. The search keywords are divided into parts
including 3D printing (additive manufacture) and biomedical scaffolding. First, the search
keywords are defined as additive manufacture, 3d, three dimension, print, fabricate, and
manufacture. Then these keywords are extended and combined with the keywords, such as
biomedical, bionic scaffold, and implant, to improve the search accuracy within biomedical
applications. The research uses the word stemming function commonly available in
patent search engines (e.g., FreePatentsOnline, FPO). Thus, we can also identify patents
which use inflected forms rather than the exact search words.
Among 220 granted US patents identified from the preliminary search, 58 patents are
eventually selected by the domain expert as having a high technological impact. A set of
nine Taiwan Ministry of Science and Technology (MOST) bio-AM research proposals and
58 published patents (P10–P67) are archived for the lattice construction. A total of 67
documents are selected to demonstrate the methodology used in the case study.
Key term extraction
The documents are pre-processed in R: English stop words are filtered out and the
stemDocument function reduces the remaining words to their stems. Stop-word filtering
reduces the number of generic English function words in the text. The text
mining packages (e.g., tm, NLP, textir, lsa) are used to generate the NTF–IDF matrix. A
partial example of the NTF–IDF matrix is shown in Table 2, where NTF–IDF values are
summed by column to calculate the total NTF–IDF values of key terms. The top ranked 56
terms for the case document set are listed in Appendix 2.
Table 2 A partial NTF–IDF matrix for the case analysis
        Implant  Scaffold  Bone    Guide   Model
P1      0.00     48.09     0.00    0.00    4.06
P2      0.00     0.00      0.00    0.00    4.90
P3      1.39     0.00      14.46   0.00    0.00
P4      0.00     8.49      0.00    0.00    0.00
…       …        …         …       …       …
Total   369.80   348.14    302.22  282.30  277.29
In order to reduce the bias of choosing similar key phrases, key term selection rules are
defined. The first rule is that if the terms describe an identical or near identical concept, the
most generalized term is selected as the representative token. For example, we select
‘‘scaffold’’ instead of ‘‘cellular scaffold.’’ The second rule is that if the phrase and its
constituent words are all in the top 100 rankings, the words in the phrase are selected. For
example, we select both ‘‘tissue’’ and ‘‘construct’’ as key terms, instead of ‘‘tissue
construct.’’ The third rule is to select the terms which have higher NTF–IDF values. Before
clustering the documents, the NTF–IDF matrix is converted into a document similarity matrix
using the cosine similarity algorithm. The resulting partial matrix is shown in Table 3.
Bio-AM technology clustering and the ontology schema
Before clustering, the R package (clValid) is used to evaluate the most suitable number of
clusters. Three popular algorithms, K means, Partitioning Around Medoids (PAM or
K medoids), and Ward’s hierarchical clustering, are selected. The comparative analysis
of the clustering algorithms, based on their internal and stability validation values
(Table 4), suggests that PAM with six clusters and hierarchical clustering with two clusters
yield good results.
For the case analysis, the six cluster K medoids clustering analysis is performed and the
result is listed in Appendix 3. This algorithm generates a six cluster result that is a better
representation with more distinguished cluster features than the other clustering algo-
rithms. As observed, cluster one identifies the tissue engineering applications. Cluster two
is similar to cluster one but is focused on methods for bio-printing. Cluster three identifies
the methods of manufacturing bone implants. Cluster four identifies the material compo-
sition for implants and prosthesis. Cluster five is related to surgical guides and models.
Finally, cluster six focuses on the joint implant applications. As shown in Appendix 4, this
research adjusts the six K medoids clusters into three meaningful sub-domain categories
for the technology evolution analysis (in the ‘‘Bio-AM technology evolution analysis’’
section). These three sub-domains for bio-AM applications are (1) tissue engineering and
bioprinting, (2) surgical guides and modeling, and (3) implants and prosthesis as identified
in the literature (Rengier et al. 2010; Klein et al. 2013; Gross et al. 2014).
Using the frequently appearing key terms identified through text mining of the case
documents, the biomedical-AM ontology schema is constructed as shown in Fig. 3. The
scaffold node is subdivided into mechanical aids including in vitro and in vivo implants.
Mechanical aids in vitro are linked to selective laser sintering (SLS) and selective laser
melting (SLM) techniques to produce prosthetics and use polyamide (PA) and Ti–6Al–4V
as construction materials. In vivo implants are used to construct scaffolds for growing
organs and tissues. The materials commonly used include polycaprolactone (PCL), poly-
lactic acid (PLA), sodium carboxymethyl cellulose, biosynthetic cellulose, nano-cellulose,
Table 3 Similarity matrix (partial)
P1 P2 P3 P4 P5
P1 1.000 0.035 0.001 0.157 0.003
P2 0.035 1.000 0.001 0.009 0.346
P3 0.001 0.001 1.000 0.012 0.429
P4 0.157 0.009 0.012 1.000 0.024
P5 0.003 0.346 0.429 0.024 1.000
Connective Tissue Growth Factor (CTGF), and Transforming Growth Factor β3 (TGFβ3). The
bone node in the ontology schema is in the implant sub-domain, which uses biomedical
grade metal powders (e.g., Ti–6Al–4V and Co–Cr–Mo) to build prostheses to replace the
original bone structure. Commonly used AM techniques are selective laser sintering (SLS)
and selective laser melting (SLM). The soft organ category has a key link to stem cells.
Stem cells are obtained from umbilical cord blood, bone marrow mesenchymal and adipose
tissue. The 3D printing bio ink is created from biological stem cells and sprayed onto the
Table 4 The validation result of hierarchical, k means and PAM clustering

Internal validation
Clustering     Validation     Cluster size
method         measure        2        3        4        5        6
hierarchical   Connectivity   3.8972   8.6143   21.5067  28.7341  32.0742
               Dunn           0.4880   0.5451   0.4734   0.4919   0.4919
               Silhouette     0.2756   0.2713   0.2461   0.2531   0.2611
k-means        Connectivity   3.8972   8.6143   22.2504  28.7722  37.2702
               Dunn           0.4880   0.5451   0.3529   0.3666   0.3785
               Silhouette     0.2756   0.2713   0.2518   0.2673   0.2654
pam            Connectivity   3.8972   8.6143   27.6004  34.0972  40.2456
               Dunn           0.4880   0.5451   0.3459   0.3648   0.4060
               Silhouette     0.2756   0.2713   0.2347   0.2554   0.2766

Optimal scores (internal validation)
Measure        Score    Method        Clusters
Connectivity   3.8972   hierarchical  2
Dunn           0.5451   hierarchical  3
Silhouette     0.2766   pam           6

Stability validation
Clustering     Validation     Cluster size
method         measure        2        3        4        5        6
hierarchical   APN            0.0000   0.0170   0.0033   0.0261   0.0856
               AD             1.8397   1.6874   1.5195   1.4357   1.4002
               ADM            0.0000   0.0274   0.0059   0.0822   0.2009
               FOM            0.1643   0.1603   0.1417   0.1414   0.1416
k-means        APN            0.0000   0.0157   0.0004   0.0285   0.0458
               AD             1.8397   1.6874   1.5114   1.4150   1.3852
               ADM            0.0000   0.0274   0.0008   0.0781   0.2126
               FOM            0.1643   0.1603   0.1409   0.1390   0.1394
pam            APN            0.0253   0.1682   0.1194   0.1503   0.0705
               AD             1.8569   1.7598   1.5824   1.4950   1.3373
               ADM            0.0519   0.3013   0.2162   0.3015   0.1332
               FOM            0.1636   0.1605   0.1468   0.1397   0.1347

Optimal scores (stability validation)
Measure        Score    Method        Clusters
APN            0.0000   hierarchical  2
AD             1.3373   pam           6
ADM            0.0000   hierarchical  2
FOM            0.1347   pam           6
scaffold with a hydrogel used to attach the cells. These cells are grown in a bioreactor
breeding vessel which enables the cells to grow and generate tissues and organs. Afterward,
the scaffold is biodegraded. The tissues and cells collection branch is similar to the
soft organ category. This category is a unique innovation and is included as an AM
application.
Bio-AM technology evolution analysis
The clusters and evolution pathways for bio-AM patents and research projects, derived using the
concept lattice algorithm, are shown in Fig. 4. The evolution of the derived IP portfolio (including patents
and MOST projects) for the adjusted sub-domain clusters is depicted in this
figure. The evolution of the bio-AM sub-domain for surgical guide production and modeling
is described in Fig. 5.
As shown in the example case in Fig. 5, the bio-AM IPs for the surgical guide sub-domain
include two US patents and two MOST projects (detailed in Table 5) and the
evolutionary relationships are drawn using the concept lattice algorithm. The key terms of
each patent/project are shown in Fig. 4 and are highlighted in the boxes.
Patent 8352056 illustrates a method of manufacturing a surgical implant guide to
increase the precision, safety and reliability of the surgery. Patent 8758357 discloses a
system and method for developing customized apparatus for use in one or more surgical
procedures. The common key terms of these two patents are data, image, surgical, implant,
and guide. In the concept lattice, these shared key terms indicate that both patents
use image data to manufacture surgical implant guides. PJ17 describes the clinical
applications of manufacturing bone structures. The common key terms for patent 8352056
and PJ17 are medical, image, tissue, surgical, and guide, which identify their strong
evolutionary link algorithmically. PJ21 develops software for 3D modeling to create
surgical guide fixtures and other devices for high-precision clinical applications. The
Fig. 3 The ontology schema of bio-AM technologies and key terms
common key terms with patent 8352056 are medical, image, surgical, and guide. These key
terms identify PJ21 as an application for surgical guides and models.
Patent 8352056 describes planning procedures for implanting screws into affected tissues
that can be used to construct replacement teeth. Patent 8758357 describes a patient-specific
pedicle screw guide that anatomically mates with the spinous processes of a
particular vertebral body (key terms: anatomic, pedicle screw). PJ17 is a clinical application
of additive manufacturing for bone (key terms: bone, screw, orthopedic). PJ21 describes
not only a surgical guide but also the clinical application of the pedicle screw. These four
patents and projects are linked by evolutionary relationships, and their content describes
applications for surgical guides and models. An important finding of this research is
that patent 8352056 is from a Taiwan assignee (Chang-Gung University). This patent was
developed earlier than the other patents, which indicates that Taiwan has developed significant
bio-AM information technology that provides a strategic advantage for manufacturing and
commercializing related surgical applications.
Fig. 4 The evolution map of patents and projects in three sub-domain clusters
Fig. 5 The key terms indicate the evolving links between patents and projects
Conclusions
This research uses additive manufacturing (AM, also called 3D printing) applied in the
biomedical field as a case example to construct the domain ontology and perform IP
portfolio evolution analysis. The algorithms have advantages and limitations. The main
advantage is that the approach is a computer-supported system that can objectively
mine knowledge from the text and data of patent and project documents. The system
extracts key terms and performs document similarity analysis. The documents are clustered,
and the evolutionary trends of technologies, based on the text mining, can be
interpreted and extrapolated. The evolution analysis calculates the similarity between
documents over time and treats documents as correlated only when their similarity meets or
exceeds a threshold value. The key terms of each document can be interpreted for their
context and meaning. The limitation of this method is that the sample of documents and the
selection of key terms may not completely represent the body of knowledge under
investigation. Although this research defines rules for selecting key terms, a domain expert
is still required to check and validate the results. For future research, additional validation
and verification methods for the analytical results will be derived and tested to ensure valid
and reliable decision support for defining IP trends and R&D strategies.
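The thresholded similarity test summarized above can be illustrated with a short Python sketch. This section does not name the similarity measure, so the sketch assumes cosine similarity over key term frequency vectors. The function names and the default threshold of 0.5 are hypothetical and are not taken from the study.

```python
import math


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two {key term: frequency} dictionaries."""
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[t] * freq_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document without key terms is similar to nothing
    return dot / (norm_a * norm_b)


def is_correlated(freq_a, freq_b, threshold=0.5):
    """True when similarity meets or exceeds the (assumed) threshold."""
    return cosine_similarity(freq_a, freq_b) >= threshold
```

Two documents that share no key terms score 0 and are never linked, whereas documents with proportional term frequencies score 1 regardless of document length.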
Acknowledgements This research is partially supported by the Ministry of Science and Technology research Grants (MOST 104-2218-E-007-015-MY2).
Table 5 Patent/project number and titles related to bio-AM surgical guide and modeling sub-domain
Patent/project number    Title
8352056    Surgical implant guide and method of manufacturing the same
8758357    Patient matching surgical guide and method for using the same
PJ17       Clinical applications of geometry-based techniques on musculoskeletal disorders: an investigation on forearm brace manufacture and fracture reduction
PJ21       Software development in 3d modeling and manipulation for precisely clinical applications
Appendices
Appendix 1
Patent evolution algorithm.
PatentList = Sort ( PatentCotext, IssuedDate )
for ( i = 1 to X of patents ) // From 1 to X of patents
{
    if ( PatentList(i) is composed of new key terms )
    {
        // Make a node for the ith patent without linkage, with radius r
        Make_Node ( Property = PatentList(i), Radius = r )
    }
    else if ( PatentList(i) is composed of new and existing key terms )
    {
        // Make a node for the ith patent
        Make_Node ( Property = PatentList(i), Radius = r )
        // Use a solid or dotted line to link to related patents
        if ( Related_Node Similarity >= threshold )
        {
            if ( Common Keyterms appear in all branch node )
                { Link ( Related Node, Property is solid line ) }
            else
                { Link ( Related Node, Property is dotted line ) }
        }
    }
    else
    {
        if ( PatentList(i) is composed of existing key terms in the same year )
        {
            // Add the patent to the existing node
            Update_existingNode ( Property add PatentList(i), Radius = r * the number of patents in the node )
        }
        else
        {
            // Make a node for the ith patent
            Make_Node ( Property = PatentList(i), Radius = r )
            // Use a solid or dotted line to link to related patents
            if ( Related_Node Similarity >= threshold )
            {
                if ( Common Keyterms appear in all branch node )
                    { Link ( Related Node, Property is solid line ) }
                else
                    { Link ( Related Node, Property is dotted line ) }
            }
        }
    }
}
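The pseudocode above leaves several details open, so the following Python sketch is one possible reading rather than the authors' implementation. It rests on these assumptions: each patent is a tuple of identifier, issue year, and key term set; similarity is Jaccard similarity over key term sets; a patent is merged into an existing node only when a node from the same year has an identical key term set; and a link is solid when the terms shared with a related node also appear in every other related node. The names `Node`, `jaccard`, and `build_evolution_map` and the default threshold of 0.3 are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    patents: list
    year: int
    terms: frozenset
    radius: float
    # Each link is (index of an earlier node, "solid" or "dotted").
    links: list = field(default_factory=list)


def jaccard(a, b):
    """Jaccard similarity of two key term sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_evolution_map(patents, threshold=0.3, r=1.0):
    """Build nodes and links from (patent_id, issue_year, key_terms) tuples."""
    nodes, seen = [], set()
    for pid, year, terms in sorted(patents, key=lambda p: p[1]):
        terms = frozenset(terms)
        has_new = bool(terms - seen)   # at least one unseen key term
        has_old = bool(terms & seen)   # at least one existing key term
        if has_old and not has_new:
            # Only existing terms: merge into a same-year node if one matches.
            twin = next((n for n in nodes
                         if n.year == year and n.terms == terms), None)
            if twin is not None:
                twin.patents.append(pid)
                twin.radius = r * len(twin.patents)
                continue
        node = Node([pid], year, terms, r)
        if has_old:
            # Link to every earlier node whose similarity meets the threshold.
            related = [i for i, n in enumerate(nodes)
                       if jaccard(terms, n.terms) >= threshold]
            for i in related:
                shared = terms & nodes[i].terms
                solid = all(shared <= nodes[j].terms for j in related)
                node.links.append((i, "solid" if solid else "dotted"))
        nodes.append(node)
        seen |= terms
    return nodes
```

A patent made up entirely of new key terms produces an unlinked node, which matches the first branch of the pseudocode. The node radius grows with the number of patents merged into it, as in the `Update_existingNode` step.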
Appendix 2
See Table 6.
Appendix 3
See Table 7.
Table 6 Top 56 key terms with assigned IDs
ID Keyterms ID Keyterms ID Keyterms ID Keyterms
K1 Implant K16 Polymer K31 Reconstruct K46 Limb
K2 Scaffold K17 Metal K32 Composite K47 Simulate
K3 Bone K18 Leg K33 Liver K48 Knee
K4 Guide K19 Tissue K34 Biocompatible K49 Ligament
K5 Model K20 Anatomic K35 Medical K50 Cartilage
K6 Joint K21 Prosthetic K36 Condylar K51 Lung
K7 Porous K22 Elastomer K37 Splint K52 Polycaprolactone
K8 Bioprint K23 Ink K38 Graft K53 Polyurethane
K9 Data K24 Virtual K39 Jaw K54 Skull
K10 Image K25 Digital K40 Tibial K55 Cellulose
K11 Surgical K26 Prosthesis K41 Pedicle K56 Cardiovascular
K12 Mold K27 Contact K42 Engineering
K13 Bio K28 Acid K43 Biodegradable
K14 Orthopedic K29 Femoral K44 Screw
K15 Articular K30 Cell K45 Osteotomy
Table 7 K medoids clustering result (6 clusters)
Cluster 1: PJ02, US7968026, US8071007, US8090540, US9034378, US9168328, US9255178
Cluster 2: PJ10, US8931880, US9039998, US9149952, US9227339
Cluster 3: PJ12, PJ22, US8532806, US8551173, US8679189, US8843229, US8888862, US9015922, US9211129, US9241772
Cluster 4: PJ14, PJ16, PJ23, US8086336, US8142886, US8172907, US8236350, US8303746, US8366789, US8369925, US8454705, US8457930, US8470231, US8527244, US8582841, US8623397, US8691974, US8775133, US8852192, US8868226
Cluster 5: PJ17, PJ21, US8350186, US8352056, US8425227, US8496663, US8706285, US8758357, US8790408, US8984731
Cluster 6: US8036729, US8545569, US8617242, US8682052, US8715291, US8735773, US9237950
Appendix 4
See Table 8.
Table 8 The adjusted clusters with three sub-domain interpretations
Tissue engineering/bioprint: PJ02, PJ10, PJ14, PJ23, US7968026, US8071007, US8090540, US8236350, US8470231, US8582841, US8623397, US8691974, US8931880, US9034378, US9039998, US9043191, US9149952, US9168328, US9180029, US9222932, US9227339, US9255178
Surgical guide/model: PJ17, PJ21, PJ22, US8352056, US8369925, US8425227, US8496663, US8527244, US8706285, US8758357, US8790408, US8843229, US8984731, US9241772
Implant/prosthesis: PJ12, PJ16, US8036729, US8086336, US8142886, US8172907, US8303746, US8350186, US8366789, US8454705, US8457930, US8532806, US8545569, US8551173, US8617242, US8679189, US8682052, US8715291, US8735773, US8775133, US8852192, US8868226, US8888862, US8920512, US8974535, US8992825, US9015922
Table 7 continued
Cluster 4 (continued): US8920512, US8974535, US8992825, US9043191, US9060810, US9180029, US9222932, US9226827
Table 8 continued
Implant/prosthesis (continued): US9060810, US9211129, US9226827, US9237950