# C4.5 How to select the split point (threshold) for a Continuous Attribute

GreekFire Asks: C4.5 How to select the split point (threshold) for a Continuous Attribute
Using the "play golf" or "play ball" data (listed at the bottom), to pick the root node we look at Outlook, Temperature, Humidity, and Wind, to see which has the highest GainRatio.

Outlook ends up being chosen, since it has the highest GainRatio. What confuses me is Humidity (a continuous attribute): the split point 80 is selected with GainRatio = 0.1087, even though 65 has a higher GainRatio = 0.1285. The split point 80 does have the higher Gain, but not the higher GainRatio.

I have seen literature say, roughly, "pick the split point for a continuous attribute to be the one giving the most gain". It seems counterintuitive to me that the split point is chosen on Gain alone, whereas when comparing the attributes against each other you select the one with the highest GainRatio as the next decision node.

I hope to gain some clarity here.

Thanks.

EDIT: The crux of the question is: what is the appropriate method for selecting the threshold split point of a continuous attribute? Is it (1) the Gain or (2) the Gain Ratio?

The calculations are as follows:

OUTLOOK:
Gain = 0.2467
SplitInfo = 1.5774
Gain Ratio = 0.1564

TEMPERATURE:
Gain = 0.0292
SplitInfo = 1.5566
Gain Ratio = 0.0187

HUMIDITY:
Possible split points = { 65, 70, 75, 78, 80, 85, 90, 95, 96 }

Split 65:
Gain = 0.0477
SplitInfo = 0.3712
Gain Ratio = 0.1285

Split 80:
Gain = 0.1022
SplitInfo = 0.9402
Gain Ratio = 0.1087

WIND:
Gain = 0.0481
SplitInfo = 0.9852
Gain Ratio = 0.0488
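
The Humidity figures above can be reproduced with a short script. This is a minimal sketch, not C4.5 itself: it assumes a binary test of the form `Humidity <= threshold`, takes every distinct value except the largest as a candidate, and uses helper names (`entropy`, `evaluate_split`) that are made up for illustration. It reports both criteria side by side and does not settle which one a given C4.5 release applies.

```python
from collections import Counter
from math import log2

# (humidity, play) pairs copied from the data listed in the question
rows = [(85, "no"), (90, "no"), (78, "yes"), (96, "yes"), (80, "yes"),
        (70, "no"), (65, "yes"), (95, "no"), (70, "yes"), (80, "yes"),
        (70, "yes"), (90, "yes"), (75, "yes"), (80, "no")]


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def evaluate_split(rows, threshold):
    """Return (gain, split_info, gain_ratio) for the test x <= threshold."""
    left = [y for x, y in rows if x <= threshold]
    right = [y for x, y in rows if x > threshold]
    n = len(rows)
    parent = entropy([y for _, y in rows])
    # Weighted entropy of the two branches after the split
    children = sum(len(part) / n * entropy(part) for part in (left, right))
    gain = parent - children
    # SplitInfo is the entropy of the branch sizes, ignoring class labels
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain, split_info, gain / split_info


# Assumption: the largest value is skipped because it leaves one branch empty
candidates = sorted({x for x, _ in rows})[:-1]
results = {t: evaluate_split(rows, t) for t in candidates}

for t, (gain, split_info, ratio) in results.items():
    print(f"<= {t}: Gain={gain:.4f} SplitInfo={split_info:.4f} GainRatio={ratio:.4f}")

best_by_gain = max(results, key=lambda t: results[t][0])
print("threshold with the highest Gain:", best_by_gain)
```

On this data the 1-versus-13 split has a very small SplitInfo, which is what lifts its GainRatio above that of the more balanced split at 80 even though its Gain is lower.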

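DATA:

| Outlook | Temperature | Humidity | Wind | Play |
|----------|-------------|----------|------|------|
| sun | hot | 85 | low | no |
| sun | hot | 90 | high | no |
| overcast | hot | 78 | low | yes |
| rain | sweet | 96 | low | yes |
| rain | cold | 80 | low | yes |
| rain | cold | 70 | high | no |
| overcast | cold | 65 | high | yes |
| sun | sweet | 95 | low | no |
| sun | cold | 70 | low | yes |
| rain | sweet | 80 | low | yes |
| sun | sweet | 70 | high | yes |
| overcast | sweet | 90 | high | yes |
| overcast | hot | 75 | low | yes |
| rain | sweet | 80 | high | no |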

• sorry, could not format data nicely
SolveForum.com may not be responsible for the answers or solutions given to any question asked by the users. All Answers or responses are user generated answers and we do not have proof of its validity or correctness. Please vote for the answer that helped you in order to help others find out which is the most helpful answer. Questions labeled as solved may be solved or may not be solved depending on the type of question and the date posted for some posts may be scheduled to be deleted periodically. Do not hesitate to share your thoughts here to help others.

#### [Solved] Extract Data from Corrupted TerraSync file, Pathfinder can't correct & won't open

• Kevin Nolan
• Geography
• Replies: 0
Kevin Nolan Asks: Extract Data from Corrupted TerraSync file, Pathfinder can't correct & won't open
I had an employee recording points for a project with a Trimble Geo XT 7 series receiver attached to a Zephyr 2 external antenna. The receiver went to sleep between collecting episodes. Upon attempting to re-open, the file came back as corrupted and TerraSync was unable to fix and open the file. I have never had an unrecoverable corruption and didn't expect this to be a permanent problem. We started a new file and finished collecting data.

However, Pathfinder now cannot fix the corrupted file and refuses to open it. I copied the corrupted file parts to a separate folder manually (Pathfinder won't transfer the file). I can open the individual files that show up in Windows Explorer (*.dd, *.gic, *.gip, *.gis, *.giw, *.gix, *.obs, *.obx) in various programs (Notepad, Word, Excel), but only user-entered data (comments, notes, text fields from the data dictionary form) shows up as readable text. The rest is various combinations of numbers, letters, and symbols. I need to extract the coordinate data from this file, even if I can't postprocess and correct it.

Is there any way to translate the data displayed as symbols into text and numbers so that I can see the coordinates?

#### [Solved] How to snap lines from the same layer together?

• Geowhat
• Geography
• Replies: 0
Geowhat Asks: How to snap lines from the same layer together?
I did a Network Analysis and this is the outcome. How can I snap lines from the same layer together?

#### [Solved] Publish a Query Layer as a Feature Access Service - ArcGIS Server

a1234 Asks: Publish a Query Layer as a Feature Access Service - ArcGIS Server
I am connecting to a SQL Server 2016 database, and adding data into ArcMap 10.3.1 as a Query Layer.

I want to publish this data as a Feature Access service to ArcGIS Server 10.3.1. I have no problems publishing this data as a mapping service. However, when I try to enable Feature Access, I get the error "00223: Query layer has virtual column".

I have read the documentation on this error, and the only solution is to "Change the query definition of your layer so it no longer contains a virtual column or columns".

I have tried several different ways of altering the SQL query (all very simple queries), and every one still produces the same error.

What is a virtual column? How can I publish a Query Layer as a Feature Access service?

#### Doubt about Electromagnetic Attraction b/w Electromagnet-Iron vs Electromagnet-Electromagnet

• J.Doe
• Physics
• Replies: 0
https://www.supermagnete.de/eng/faq...the-combination-magnet-magnet-and-magnet-iron

At full contact, the attractive force between a raw magnet and an iron plate is the same as the attractive force between two raw magnets. However, with increasing distance, the attraction diminishes faster than the attraction between two raw magnets (see graph)

I am confused by this:

1. Consider two identical electromagnets touching each other in one case, and a single electromagnet touching an iron plate in a second case. Won't the attraction be stronger in the first case, since both magnets are exerting a pull on each other, whereas in the second case the magnet is pulling on a passive iron plate (which has only temporarily induced magnetism)?

2. For two cylindrical electromagnets with poles at distance $x$ repelling each other (they are relatively close by i.e. $10*R > x$ where $R$=radius of pole) how does the force exerted on each electromagnet change with increase in current ($I$)? Can someone help with this formula?

#### Closing a Graph

sam wolfe Asks: Closing a Graph
Consider the following graph g

I want to "close" it by adding the following edges

Alternatively, I would also be happy with something like

where I've added four "corner" vertices.

Now, this question is, in a way, a follow up to this answer, where I initially have a picture on which I draw a graph, as follows

Is it possible to turn the picture into a rectangle-shaped graph and then join them in somehow? This would lead to something like the second example.

Alternatively, I also thought about simply uniting 1-degree vertices that are close to each other, creating the first example graph.

In the end, I want a graph that defines a mesh, thus the need of "closing" the graph in this fashion. Ideally, I want KVertexConnectedGraphQ[g] = True for the closed graph.

Any ideas?

Edit 1: Note that my goal is to be able to do this for general graphs. For example, considering the graph

I want to get something like

I guess using something like ConvexHull (which doesn't correspond to what is drawn) in some manner could help my goal, but at this point I'm not entirely sure how.

Edit 2: In order to be more practical consider the graph given by

Code:
g = Graph[{1 \[UndirectedEdge] 10, 2 \[UndirectedEdge] 9,
3 \[UndirectedEdge] 9, 4 \[UndirectedEdge] 12,
5 \[UndirectedEdge] 8, 6 \[UndirectedEdge] 13,
7 \[UndirectedEdge] 14, 8 \[UndirectedEdge] 11,
8 \[UndirectedEdge] 17, 9 \[UndirectedEdge] 20,
10 \[UndirectedEdge] 11, 10 \[UndirectedEdge] 21,
11 \[UndirectedEdge] 25, 12 \[UndirectedEdge] 18,
12 \[UndirectedEdge] 19, 13 \[UndirectedEdge] 18,
13 \[UndirectedEdge] 27, 14 \[UndirectedEdge] 15,
14 \[UndirectedEdge] 19, 16 \[UndirectedEdge] 17,
17 \[UndirectedEdge] 23, 18 \[UndirectedEdge] 24,
19 \[UndirectedEdge] 22, 20 \[UndirectedEdge] 21,
20 \[UndirectedEdge] 27, 21 \[UndirectedEdge] 34,
22 \[UndirectedEdge] 26, 22 \[UndirectedEdge] 29,
23 \[UndirectedEdge] 31, 23 \[UndirectedEdge] 33,
24 \[UndirectedEdge] 29, 24 \[UndirectedEdge] 30,
25 \[UndirectedEdge] 31, 25 \[UndirectedEdge] 34,
27 \[UndirectedEdge] 28, 28 \[UndirectedEdge] 32,
28 \[UndirectedEdge] 38, 29 \[UndirectedEdge] 37,
30 \[UndirectedEdge] 32, 30 \[UndirectedEdge] 35,
31 \[UndirectedEdge] 39, 32 \[UndirectedEdge] 41,
34 \[UndirectedEdge] 36, 35 \[UndirectedEdge] 42,
35 \[UndirectedEdge] 44, 36 \[UndirectedEdge] 45,
36 \[UndirectedEdge] 54, 37 \[UndirectedEdge] 40,
37 \[UndirectedEdge] 44, 38 \[UndirectedEdge] 45,
38 \[UndirectedEdge] 48, 39 \[UndirectedEdge] 47,
39 \[UndirectedEdge] 53, 41 \[UndirectedEdge] 42,
41 \[UndirectedEdge] 48, 42 \[UndirectedEdge] 56,
43 \[UndirectedEdge] 46, 44 \[UndirectedEdge] 51,
45 \[UndirectedEdge] 55, 46 \[UndirectedEdge] 47,
46 \[UndirectedEdge] 52, 47 \[UndirectedEdge] 49,
48 \[UndirectedEdge] 50},
VertexCoordinates -> {{102.5, 175.5}, {84.5, 152.5}, {108.5,
175.5}, {133.5, 153.5}, {152.5, 175.5}, {244.5,
175.5}, {254.5, 148.5}, {43.5, 174.5}, {43.5,
170.5}, {196.5, 174.5}, {202.5, 147.5}, {297.5,
174.5}, {309.5, 147.5}, {63.5, 148.5}, {10.5,
141.5}, {143.5, 117.5}, {119.5, 109.5}, {67.5,
94.5}, {236.5, 131.5}, {293.5, 127.5}, {180.5,
89.5}, {312.5, 146.5}, {4.5, 143.5}, {18.5,
97.5}, {253.5, 95.5}, {301.5, 98.5}, {110.5,
75.5}, {313.5, 93.5}, {286.5, 83.5}, {52.5, 80.5}, {4.5,
76.5}, {236.5, 82.5}, {181.5, 86.5}, {187.5,
80.5}, {168.5, 31.5}, {297.5, 37.5}, {244.5,
49.5}, {59.5, 29.5}, {216.5, 27.5}, {125.5,
38.5}, {225.5, 26.5}, {280.5, 23.5}, {152.5,
20.5}, {110.5, 3.5}, {313.5, 29.5}, {199.5, 7.5}, {32.5,
7.5}, {85.5, 3.5}, {236.5, 3.5}, {4.5, 25.5}, {10.5,
16.5}, {281.5, 4.5}, {155.5, 3.5}, {4.5, 3.5}, {34.5,
4.5}, {199.5, 4.5}},
VertexSize -> 3 {1, 1}, VertexStyle -> Red,
EdgeStyle -> Directive[Black]]

which yields the first graph g. Then, the code

Code:
hm = ConvexHullMesh[
Transpose[
Select[{GraphEmbedding[g], VertexDegree[g]} //
Transpose, #[[2]] == 1 &]][[1]]]
gb = Graph[hm["Edges"], VertexCoordinates -> MeshCoordinates[hm],
VertexSize -> 3 {1, 1}, VertexStyle -> Red,
EdgeStyle -> Directive[Black]]

yields

Now, how do I merge both graphs? I tried GraphUnion, but I would need the correct VertexCoordinates. Could it be simply an ordering problem? Any suggestion?

#### Showing an inequality in vector space.

Hamdiken Asks: Showing an inequality in vector space.
Let $E$ be a normed vector space. We define a linear form $T:E\rightarrow \mathbb R$. Denote $H=Ker(T)$.

• Show that if $T$ is continuous, then $H$ is a closed set.

$H=T^{-1}(\{0\})$ is the preimage of the closed set $\{0\}$ under the continuous map $T$, hence closed.

• Suppose that $H$ is closed. Let $a\in E$ such that $T(a)=1$. Show that the set $a+H$ is closed and does not contain $0$.

My guess is that since the set $\{a\}$ is bounded, and since $H$ is closed in a vector space, which means it's also bounded, gives the result that $a+H$ is closed. As for the $0$, for every $y=a+x\in a+H$ with $x\in H$ we have $T(y)=T(a+x)=T(a)+T(x)=1$. However $T(0)=0\neq1$, so $0\notin a+H$.

• Deduce the existence of $r>0$ such that $B(0,r)\cap(a+H)=\emptyset$.

Since $0\notin a+H$, there exists a neighbourhood $V_0$ of $0$ such that $V_0\cap(a+H)=\emptyset$, hence the existence of an open ball $B(0,r)$ such that $B(0,r)\cap(a+H)=\emptyset$.

• Show that for every $x\in B(0,r)$, we have $$|Tx|\leq 1$$

This is where I stopped. I thought of the orthogonal projection but didn't understand how to proceed.

Any help and correction of the answers above will be highly appreciated.
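
One possible way to finish the last step, offered as a sketch rather than a definitive solution. It assumes $B(0,r)$ is the open ball obtained in the previous step and argues by contradiction, without any orthogonal projection. Suppose some $x\in B(0,r)$ had $|T(x)|\geq 1$, and set $y=x/T(x)$ (the name $y$ is introduced here). Then

$$\|y\|=\frac{\|x\|}{|T(x)|}\leq\|x\|<r,\qquad T(y)=\frac{T(x)}{T(x)}=1=T(a),$$

so $T(y-a)=0$, that is, $y-a\in H$ and $y\in B(0,r)\cap(a+H)$, which contradicts $B(0,r)\cap(a+H)=\emptyset$. Hence $|T(x)|<1$ for every $x\in B(0,r)$, which in particular gives $|Tx|\leq 1$.

On the second step, note that a closed subspace need not be bounded (a subspace other than $\{0\}$ never is). The closedness of $a+H$ can instead be obtained from the fact that the translation $x\mapsto x+a$ is a homeomorphism of $E$, so it maps the closed set $H$ onto a closed set.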

#### Monotone likelihood ratio for logistic distribution $f(x;\theta) = e^{-x - \theta} (1 + e^{-x - \theta})^{-2}$

Balkys Asks: Monotone likelihood ratio for logistic distribution $f(x;\theta) = e^{-x - \theta} (1 + e^{-x - \theta})^{-2}$
This is a question from Problems 9.4 of An Introduction to Probability and Statistics by Rohatgi.

Let $X$ have logistic distribution with the PDF $$f(x;\theta) = e^{-x - \theta} (1 + e^{-x - \theta})^{-2}, x \in \mathbb{R}.$$ Does ${f}$ have a monotone likelihood ratio?

I have tried to look at this related question, but the theory was beyond what I have learned.

My attempt: Let $\theta_1, \theta_0 \in \mathbb{R}$. Let $\theta_1 > \theta_0.$

Consider the likelihood ratio, $T_{NP}(\mathbf{x}) = \dfrac{f_n(\mathbf{x};\theta_1)}{f_n(\mathbf{x};\theta_0)} = \dfrac{ e^{-n \bar{x} - n \theta_1} \prod (1+e^{-x_i - \theta_1})^{-2} }{ e^{ -n \bar{x} - n \theta_0} \prod (1+e^{-x_i - \theta_0})^{-2} }.$

After simplifying more, I get $T_{NP}(\mathbf{x}) = e^{n(\theta_0 - \theta_1)} \prod \left(\dfrac{1+e^{-x_i - \theta_0}}{1+e^{-x_i - \theta_1}}\right)^2.$

I found that $\dfrac{d}{dx} \dfrac{1+e^{-x - \theta_0}}{1+e^{-x - \theta_1}}$ was negative for all $x$.
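
The sign can be confirmed by a direct computation. A sketch, using the shorthand $a=e^{-\theta_0}$ and $b=e^{-\theta_1}$ (introduced here for brevity), so that $\theta_1>\theta_0$ gives $a>b>0$:

$$\frac{d}{dx}\,\frac{1+ae^{-x}}{1+be^{-x}}=\frac{-ae^{-x}(1+be^{-x})+be^{-x}(1+ae^{-x})}{(1+be^{-x})^{2}}=\frac{(b-a)\,e^{-x}}{(1+be^{-x})^{2}}<0.$$

For a single observation ($n=1$) the likelihood ratio is $e^{\theta_0-\theta_1}$ times the square of this positive, decreasing function, so it is strictly decreasing in $x$, or equivalently increasing in $-x$.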

I tried to relate this to what I had done so far. I used that $\log$ is monotone increasing. Taking the log of the likelihood ratio, I get $\log T_{NP}(\mathbf{x}) = n(\theta_0 - \theta_1) + 2\sum \log \left(\dfrac{1+e^{-x_i - \theta_0}}{1+e^{-x_i - \theta_1}}\right)$, where each summand is decreasing in $x_i$. However, I am not sure how to get the required test statistic that is independent of the parameter $\theta$.

I think that a relevant result is that for a distribution from the exponential family, $f(x;\theta) = c(\theta)h(x)\exp(\pi(\theta)T(x))$, the likelihood ratio is monotone in $T(x)$.

I tried to use this by writing $f(x;\theta) = \exp{(-x-\theta)} \exp{(-2\log(1+e^{-x-\theta}))}.$ But I cannot identify $T(x)$ from this form.