Greetings Venerable Masters and Dear Friends,

With a 'large' HTML text file that needs about 500 - 600 text edits (deletions), the code below takes 10+ seconds to execute, and longer over a LAN.

My only question, then: Is there a faster way to make 500 text deletions in a large text file? Or, how would you approach multiple edits on large text files?

Thanks in advance for any comments, thoughts, rebukes, etc.



*-- (3b) Delete redundant (non-Print) entries && TAKES 8 LONG SECONDS

LOCAL lNo,lDeleteReference,lTempBody,lTempStart,lMidPosition,lStartPosition,lEndPosition,lExtracted
lNo = 1
lDeleteReference = [')"></A></div>] && lDeleteReference = [href=fn_DPW('29262')"></A></div>]
DO WHILE ATC(lDeleteReference,lHTMLtext) > 0
   WAIT WINDOW NOWAIT "Calendar Adjustment ... " + ALLTRIM(STR(lNo))
   lTempBody = lHTMLtext
   *-- Locate the marker, then back up 170 characters to look for the opening <Div
   lMidPosition = ATC(lDeleteReference,lTempBody)
   lEndPosition = lMidPosition + LEN(lDeleteReference)
   lStartPosition = lMidPosition - 170 &&135
   lTempStart = RIGHT(lTempBody,LEN(lTempBody) - lStartPosition)
   lStartPosition = lStartPosition + ATC([<Div],lTempStart) -1
   lTempBody = RIGHT(lTempBody,LEN(lTempBody) - lStartPosition ) && again refined
   *-- Extract the whole <Div ... </div> block and remove it from the text
   lExtracted = SUBSTR(lTempBody,1,lEndPosition - lStartPosition - 1)
   lHTMLtext = STRTRAN(lHTMLtext,lExtracted)
   lNo = lNo + 1
ENDDO



Re: SUBSTR() STRTRAN() with Large Text Files

Tamar E. Granor

Can't you just use STRTRAN() on the whole thing:

cModified = STRTRAN(cOriginal, cThingsToDelete, "")


Re: SUBSTR() STRTRAN() with Large Text Files


Are you removing all div sections that have <A> tags in them? You can do it more quickly with the DOM or STREXTRACT().
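
A minimal sketch of the STREXTRACT() idea, not benchmarked against the original loop. It assumes VFP 9 (flag 4, which returns the delimiters along with the text, is as far as I know not available in earlier versions), that the unwanted blocks are not nested inside other divs, and that lHTMLtext and lDeleteReference are set up as in the original post. The variable names lnOccur and lcBlock are made up for illustration.

```foxpro
LOCAL lnOccur, lcBlock
lnOccur = 1
DO WHILE .T.
   * Flags: 1 = case-insensitive delimiters, 4 = include the delimiters in the result
   lcBlock = STREXTRACT(lHTMLtext, [<div], [</div>], lnOccur, 1 + 4)
   IF EMPTY(lcBlock)
      EXIT      && no further <div ... </div> block found
   ENDIF
   IF ATC(lDeleteReference, lcBlock) > 0
      * Unwanted block: remove it; the next block now sits at the same occurrence number
      lHTMLtext = STRTRAN(lHTMLtext, lcBlock, "")
   ELSE
      lnOccur = lnOccur + 1   && keep this block and move on
   ENDIF
ENDDO
```

Each STREXTRACT() call still scans from the start of the string, so this may not beat a single-pass approach on very large files.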

Re: SUBSTR() STRTRAN() with Large Text Files


Thanks Tamar, I'll try. (The data is tricky) ...

Re: SUBSTR() STRTRAN() with Large Text Files


Thanks so much Cetin for your thoughtful reply.

I'm removing certain *peculiar* div sections, not all. I'll research/experiment DOM and strextract() and see what becomes.

Re: SUBSTR() STRTRAN() with Large Text Files


You're doing repeated ATC() calls on the entire file. The larger the file, the slower it will be, because you're basically rereading the file roughly (number of appearances / 2) times.

If your search / replace scope is too complicated for a single STRTRAN(), maybe a solution would be to parse the file in sections: read a section, do the ATC() on that section, and write it out to a new file. One way to do this is with low-level file I/O such as FOPEN() and a series of FGETS() calls.
I think this method should be faster than using STREXTRACT() but probably slower than a single SIMPLE STRTRAN().

Hope you don't need this but still...


Re: SUBSTR() STRTRAN() with Large Text Files


Thank you so much Aleniko for your extremely thoughtful reply,

I've never been comfortable with low-level file functions (A.K.A. I'm clueless). Last night, I tried much of what Cetin and Tamar suggested, but to no avail, due to the search / replace complexity of the multiple STRTRAN() calls ... as you have observed.

As for parsing "sections", that seems promising based on your logic: Multiple ATC()s (..etc.) spread out over 5-10 smaller sections, temp-files (or memo-files) ... to be re-appended ... may speed things up, yes?

I'll try to remember to let you know the results, especially if favorable.

Re: SUBSTR() STRTRAN() with Large Text Files


You don't need to divide the file into 5-10 smaller sections.
Just do a series of fgets. fgets reads until it reaches chr(13). So:

nextline = fgets(filehandle,8192) will read up to 8192 chars from your file or until it reaches chr(13).

So, you kinda read the file line by line. Run your find/replace algorithm on the line and then write it to a new file using fputs.

You will need:

fopen (To open input file)
fcreate (To open output file)
fgets (To read a line, which could be up to 8192 chars long)
fputs (To write to new file)
fclose (To close both files)
And of course your search replace routine.
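
Put together, the steps above might look roughly like the sketch below. The file names are hypothetical, and the "search replace routine" is reduced to its simplest form: any line containing the marker from the original post is skipped. That only works if each unwanted div sits on a line of its own and no line is longer than 8192 characters, which is an assumption about the file, not a given.

```foxpro
LOCAL lnIn, lnOut, lcLine, lDeleteReference
lDeleteReference = [')"></A></div>]
lnIn  = FOPEN("report.htm")           && hypothetical input file, opened read-only
lnOut = FCREATE("report_clean.htm")   && hypothetical output file
IF lnIn < 0 OR lnOut < 0
   * Either call returns -1 on failure; close whichever handle did open
   IF lnIn >= 0
      = FCLOSE(lnIn)
   ENDIF
   IF lnOut >= 0
      = FCLOSE(lnOut)
   ENDIF
   WAIT WINDOW "Could not open the input or output file"
   RETURN
ENDIF
DO WHILE !FEOF(lnIn)
   lcLine = FGETS(lnIn, 8192)         && read up to the next carriage return
   IF ATC(lDeleteReference, lcLine) = 0
      = FPUTS(lnOut, lcLine)          && FPUTS appends CR+LF to what it writes
   ENDIF
ENDDO
= FCLOSE(lnIn)
= FCLOSE(lnOut)
```

The file is read once from start to finish, so the cost grows with the file size rather than with the number of deletions.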

Good luck.

Re: SUBSTR() STRTRAN() with Large Text Files


Thank you (much) again Aleniko,

I'll study your routine with the low-level functions, see what comes up, and try to remember to report back to you with results.

Re: SUBSTR() STRTRAN() with Large Text Files


OK Aleniko,

I did try the low file stuff, but was not able to figure a reference nor a chr(13) point to reach.

Then I stumbled across ALINES(). This reduced the time to a fraction of a second (vs. 8 - 10 seconds minimum). Here is the code:

lDeleteReference = [')"></A></div>] && lDeleteReference = [href=fn_DPW('29262')"></A></div>]

*-- Very Fast:
lHTMLtext = STRTRAN(lHTMLtext,'</div>','</div>' + CHR(13) + CHR(10)) && Place into CRLF increments for necessary lines
lNumofLines = ALINES(lHTMLary,lHTMLtext)
lHTMLtext =""
FOR N = 1 TO lNumofLines
   *-- Keep only non-empty lines that do not contain the marker
   IF ATC(lDeleteReference,lHTMLary(N)) = 0 AND !EMPTY(lHTMLary(N))
      lHTMLtext = lHTMLtext + lHTMLary(N)
   ENDIF
ENDFOR



Again thank you for encouraging me through this obstacle.

Re: SUBSTR() STRTRAN() with Large Text Files


Great. One last thing to consider: your HTML may not have CR+LF in some cases. HTML disregards CRLF, so you could in principle have an HTML file with no CRLFs at all, or one that has them in some places and not in others.
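
One way to make the ALINES() approach independent of whatever line breaks the file already has is to normalize them first. The sketch below assumes VFP 9, that line breaks carry no meaning in this particular HTML (no PRE blocks, for instance), and that lHTMLtext and lDeleteReference are set up as in the earlier posts. The other variable names are made up for illustration.

```foxpro
LOCAL lcCRLF, lnLines, lnI, lcClean
LOCAL ARRAY laHTML[1]
lcCRLF = CHR(13) + CHR(10)
* Turn every existing CR and LF into a space so no div block is split across lines
lHTMLtext = CHRTRAN(lHTMLtext, lcCRLF, "  ")
* Break after every closing div; flag 1 makes the search case-insensitive
lHTMLtext = STRTRAN(lHTMLtext, "</div>", "</div>" + lcCRLF, -1, -1, 1)
lnLines = ALINES(laHTML, lHTMLtext, 4)   && 4 = do not include empty elements
lcClean = ""
FOR lnI = 1 TO lnLines
   IF ATC(lDeleteReference, laHTML[lnI]) = 0
      lcClean = lcClean + laHTML[lnI] + lcCRLF   && keep the line, restore its CRLF
   ENDIF
ENDFOR
lHTMLtext = lcClean
```

Each array element then holds everything up to and including one closing div, regardless of how the report listener laid out the original lines.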

Re: SUBSTR() STRTRAN() with Large Text Files


My peculiar HTML text file was generated by a report listener; wherein the CRLFs may have defaulted via VFP-9 settings. I discovered that chr(13) had to be added to each line for 'A href' triggers to be operational.

(Note: I noticed that in Windows Notepad, chr(13) SANS chr(10) shows up as 'micro-square' characters (vs. new lines), so I added the LFs for tidiness' sake only.)

Sincere regards,