Hopefully this post goes better..
I am stuck on one feature of a program: given a keyword, it should return the whole word in which that keyword appears.
For example, if I tell it to look for "I=" in the string "blah blah blah blah I=1mV blah blah etc?", it should return the whole word containing it, which in this case is I=1mV.
I have tried a bunch of different approaches, such as,
    text = "One of the values, I=1mV is used"
    print(re.split('I=', text))
However, this returns the string split around I=, with the I= itself removed, so it returns
['One of the values, ', '1mV is used']
If I try regex solutions, I run into the problem that the number could be more than one digit, so the code below only works when the number is a single digit. If the value were I=10mV, it would only return I=1, but if I put [/0-9] in twice, the code no longer works for a one-digit value.
    text = "One of the values, I=1mV is used"
    print(re.findall("I=[/0-9]", text))
    ['I=1']
When I tried using re.search, I only got a match object back:
    text = "One of the values, I=1mV is used"
    print(re.search("I=", text))
    <_sre.SRE_Match object at 0x02408BF0>
What is a good way to retrieve the word (In this case, I want to retrieve I=1mV) and cut out the rest of the string?
A better way would be to split the text into words first:
    >>> text = "One of the values, I=1mV is used"
    >>> words = text.split()
    >>> words
    ['One', 'of', 'the', 'values,', 'I=1mV', 'is', 'used']
And then filter the words to find the one you need:
    >>> [w for w in words if 'I=' in w]
    ['I=1mV']
This returns a list of all words with I= in them. We can then just take the first element found:
    >>> [w for w in words if 'I=' in w][0]
    'I=1mV'
Done! To clean this up a bit, we can stop at the first match rather than checking every word. A generator expression passed to next() does that:
    >>> next(w for w in words if 'I=' in w)
    'I=1mV'
Of course you could adapt the if condition to fit your needs better. For example, you could use str.startswith() to check whether the word starts with a certain string, or re.match() to check whether the word matches a pattern.
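The variations mentioned above can be sketched as two small helpers. The function names are made up for illustration, and passing None as the default to next() is an addition here so that a missing keyword does not raise StopIteration.

```python
import re

def first_word_starting_with(text, prefix):
    """Return the first whitespace-separated word that starts with prefix, or None."""
    return next((w for w in text.split() if w.startswith(prefix)), None)

def first_word_matching(text, pattern):
    """Return the first word that matches pattern at its start, or None."""
    regex = re.compile(pattern)
    return next((w for w in text.split() if regex.match(w)), None)

text = "One of the values, I=1mV is used"
print(first_word_starting_with(text, "I="))   # I=1mV
print(first_word_starting_with(text, "R="))   # None
print(first_word_matching(text, r"I=\d"))     # I=1mV
```

Because the words come from a plain whitespace split, trailing punctuation stays attached: "I=1mV," would be returned with its comma.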
For the record, your attempt to split the string into two halves, using I= as the separator, was nearly correct. Instead of a split (re.split() or str.split()), which discards the separator, you could have used str.partition(), which keeps it.
    >>> my_text = "Loadflow current was I=30.63kA"
    >>> my_text.partition("I=")
    ('Loadflow current was ', 'I=', '30.63kA')
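To get from the partition result to the word itself, one option is to glue the separator back onto the first word of the tail. This is only a sketch; word_after is a hypothetical helper name, not something from the standard library.

```python
def word_after(text, keyword):
    """Return keyword plus the first word that follows it, or None if absent."""
    before, sep, after = text.partition(keyword)
    if not sep:
        # partition() returns an empty separator when keyword is not found
        return None
    tail = after.split()
    return sep + (tail[0] if tail else "")

print(word_after("Loadflow current was I=30.63kA", "I="))    # I=30.63kA
print(word_after("One of the values, I=1mV is used", "I="))  # I=1mV
print(word_after("no current here", "I="))                   # None
```

Note that this drops any whitespace between the keyword and the value, so "I= 1mV" also comes back as "I=1mV".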
A more flexible and robust solution is to use a regular expression:
    >>> import re
    >>> pattern = r"""
    ... I=         # specific string "I="
    ... \s*        # possible whitespace
    ... -?         # possible minus sign
    ... \s*        # possible whitespace
    ... \d+        # at least one digit
    ... (\.\d+)?   # possible decimal part
    ... """
    >>> m = re.search(pattern, my_text, re.VERBOSE)
    >>> m
    <_sre.SRE_Match object at 0x044CCFA0>
    >>> m.group()
    'I=30.63'
This accounts for a lot more possibilities (negative numbers, integer or decimal numbers).
Note the use of:
- quantifiers:
  a* - zero or more
  a+ - at least one
  a? - "optional" - one or zero
- a verbose regular expression (the re.VERBOSE flag) with comments - much easier to understand than the non-verbose equivalent, I=\s*-?\s*\d+(\.\d+)?
- raw strings r"..." instead of plain strings "..." - this means that literal backslashes don't have to be escaped. The pattern above happens to work as a plain string too, because \s, \d and \. are not escape sequences that Python recognizes (newer Python versions warn about them), but one day you'll need to match C:\Program Files\... and on that day you will need raw strings.
Exercise 1: How do you extend this so that it can match the unit as well? And how do you extend this so that it can match the unit as either kA or some other unit (say A or mA)? Hint: "Alternation operator".
Exercise 2: How do you extend this so that it can match numbers in engineering notation, i.e. "1.00e3", or "-3.141e-4"?
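One possible answer to both exercises, as a sketch rather than the only solution. The unit list kA, mA and A is an assumption made for illustration, and the group names value and unit are made up here.

```python
import re

pattern = re.compile(r"""
    I=                        # specific string "I="
    \s*                       # possible whitespace
    (?P<value>
        -?\s*\d+(?:\.\d+)?    # sign, digits, possible decimal part
        (?:[eE][+-]?\d+)?     # possible exponent, e.g. e3 or e-4
    )
    \s*                       # possible whitespace before the unit
    (?P<unit>kA|mA|A)?        # alternation: assumed set of units, optional
""", re.VERBOSE)

for text in ("Loadflow current was I=30.63kA",
             "I=-3.141e-4 A",
             "I=1.00e3mA"):
    m = pattern.search(text)
    print(m.group("value"), m.group("unit"))
```

The unit group is optional, so a bare number such as I=5 still matches and m.group("unit") is then None.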
    import re
    text = "One of the values, I=1mV is used"
    l = re.split('I=', text)
    # l[0] is the text before the first I=; l[1] starts right after it
    print(l[1].split()[0])
If you have more than one I=, do the above for every element of l after index 0, since element 0 is the text before the first I=.
This works well because someone can write "One of the values, I= 1mV is used", and I guess you still want to get that I is 1mV.
BTW, I is current, and its unit is amperes, not volts :)
With your re.findall attempt you would want to add a +, which means one or more.
Here are some examples:
    import re
    test = "This is a test with I=1mV, I=1.414mv, I=10mv and I=1.618mv."
    result = re.findall(r'I=[\d\.]+m[vV]', test)
    print(result)
    test = "One of the values, I=1mV is used"
    result = re.search(r'I=([\d\.]+m[vV])', test)
    print(result.group(1))
The first print is:
    ['I=1mV', 'I=1.414mv', 'I=10mv', 'I=1.618mv']
I've grouped everything other than I= in the re.search example, so the second print is:
    1mV
in case you are interested in extracting that.
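If the unit is not known in advance, a looser pattern may be enough: match the keyword followed by any run of non-whitespace characters. This is a sketch of that idea, not part of the examples above; find_words is a hypothetical helper name, and stripping trailing punctuation is an assumption about what counts as part of the word.

```python
import re

def find_words(text, keyword="I="):
    """Return every keyword-prefixed run of non-whitespace, minus trailing punctuation."""
    pattern = re.escape(keyword) + r"\S+"   # escape the keyword, then take the rest of the word
    return [w.rstrip(".,;:") for w in re.findall(pattern, text)]

test = "This is a test with I=1mV, I=1.414mv, I=10mv and I=1.618mv."
print(find_words(test))
# ['I=1mV', 'I=1.414mv', 'I=10mv', 'I=1.618mv']
```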